Python NLTK Sentiment Inspection using Naïve Bayes Classifier
Y. Jeevan Nagendra Kumar1, B. Mani Sai2, Varagiri Shailaja3, Singanamalli Renuka4, Bharathi Panduri5

1Dr. Y. Jeevan Nagendra Kumar, Department of Information Technology, Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad (Telangana), India.
2B. Mani Sai, Department of Information Technology, Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad (Telangana), India.
3Varagiri Shailaja, Department of Information Technology, Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad (Telangana), India.
4Singanamalli Renuka, Department of Information Technology, Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad (Telangana), India.
5Bharathi Panduri, Department of Information Technology, Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad (Telangana), India.
Manuscript received on 16 October 2019 | Revised Manuscript received on 25 October 2019 | Manuscript Published on 02 November 2019 | PP: 2684-2687 | Volume-8 Issue-2S11 September 2019 | Retrieval Number: B13280982S1119/2019©BEIESP | DOI: 10.35940/ijrte.B1328.0982S1119
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: The Web is one of the richest sources of consumer reviews and opinions. Many websites host customer opinions in the form of reviews, blogs, discussion groups, and forums. This project focuses on customer reviews of restaurants. It predicts whether a given comment is positive or negative using supervised machine learning techniques. The project uses a dataset from the Kaggle website, consisting of comments and the type of each comment (i.e., positive or negative). It studies classification algorithms and text mining approaches for identifying the type of a comment. First, duplicates are removed from the dataset. Text pre-processing follows: punctuation marks and stop words are removed, and the text is then converted into vector format. This conversion is an essential step because English text cannot be used directly in an analysis based on linear algebra, so we use CountVectorizer to convert the data to vector format. Finally comes classification, for which we use the Naive Bayes algorithm. The classifier assigns each comment to one of the two classes mentioned above. We use 70 percent of the data as the training set and 30 percent as the test set.
Keywords: Multinomial Naïve Bayes, NLTK, Text Pre-processing, CountVectorizer, Classification, Django Web Framework, TextBlob, Confusion Matrix, Accuracy.
Scope of the Article: Classification
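
The pipeline described in the abstract (duplicate removal, punctuation and stop-word removal, word counting, multinomial Naive Bayes, and a 70/30 split) can be illustrated with a minimal sketch. This is not the authors' code. It uses only the Python standard library so that the mechanics are visible: a hand-written word count stands in for CountVectorizer, a small hard-coded stop-word set stands in for NLTK's list, and the reviews are made-up examples rather than the Kaggle dataset. The function names and the smoothing value alpha=1.0 are assumptions made for illustration.

```python
import math
import string
from collections import Counter

# Hypothetical stop-word list for illustration; NLTK ships a much larger one.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "to", "of", "it", "this"}

def preprocess(text):
    """Lowercase, strip punctuation marks, and drop stop words."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOP_WORDS]

def train_multinomial_nb(rows, alpha=1.0):
    """Fit class priors and Laplace-smoothed per-class word log-likelihoods."""
    class_docs = Counter()   # number of training comments per class
    word_counts = {}         # class label -> Counter of word occurrences
    vocab = set()
    for text, label in rows:
        tokens = preprocess(text)
        class_docs[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    total_docs = sum(class_docs.values())
    model = {"log_prior": {}, "log_lik": {}}
    for label, n_docs in class_docs.items():
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        model["log_prior"][label] = math.log(n_docs / total_docs)
        model["log_lik"][label] = {
            w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab
        }
    return model

def predict(model, text):
    """Return the class with the highest log-posterior score."""
    tokens = preprocess(text)
    scores = {}
    for label, log_prior in model["log_prior"].items():
        log_lik = model["log_lik"][label]
        # Words never seen in training are ignored.
        scores[label] = log_prior + sum(log_lik[t] for t in tokens if t in log_lik)
    return max(scores, key=scores.get)

# Made-up restaurant comments (not the Kaggle data); one exact duplicate.
raw_rows = [
    ("The food was great and the service was friendly.", "positive"),
    ("Terrible food, rude staff.", "negative"),
    ("Loved the pasta, great place!", "positive"),
    ("The soup was cold and bland.", "negative"),
    ("Great food, friendly staff.", "positive"),
    ("Rude waiter and terrible service.", "negative"),
    ("The food was great and the service was friendly.", "positive"),
    ("Friendly service and tasty dessert.", "positive"),
    ("Bland pasta, terrible experience.", "negative"),
    ("Great dessert, loved it!", "positive"),
    ("Cold food and rude staff.", "negative"),
]

# Step 1: remove duplicates, keeping the first occurrence of each row.
rows = list(dict.fromkeys(raw_rows))

# Step 2: 70/30 split. No shuffling here, to keep the sketch deterministic;
# a real experiment would normally shuffle before splitting.
split = len(rows) * 7 // 10
train_rows, test_rows = rows[:split], rows[split:]

# Step 3: train, predict, and evaluate.
model = train_multinomial_nb(train_rows)
actual = [label for _, label in test_rows]
predictions = [predict(model, text) for text, _ in test_rows]

# Confusion matrix as counts of (actual, predicted) pairs.
confusion = Counter(zip(actual, predictions))
accuracy = sum(a == p for a, p in zip(actual, predictions)) / len(test_rows)

print("confusion:", dict(confusion))
print("accuracy:", accuracy)
```

With scikit-learn, the counting and classification steps would typically be replaced by CountVectorizer and MultinomialNB, and the split by train_test_split with test_size=0.3. The toy data here is far too small for the reported accuracy to mean anything beyond showing how the pieces fit together.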