Interrogation of Sentiment Perusing with Hash Counting Vectorizer and Term Inverse Frequency Transformer using Machine Learning Classification
Kota Venkateswara Rao1, M. Shyamala Devi2
1Kota Venkateswara Rao, Research Scholar, Computer Science and Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Avadi, Chennai (Tamil Nadu) India.
2M. Shyamala Devi, Associate Professor, Computer Science and Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Avadi, Chennai (Tamil Nadu) India.
Manuscript received on November 10, 2019. | Revised Manuscript received on November 17, 2019. | Manuscript published on 30 November, 2019. | PP: 3895-3901 | Volume-8 Issue-4, November 2019. | Retrieval Number: D8303118419/2019©BEIESP | DOI: 10.35940/ijrte.D8303.118419
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: With fast-growing technology, businesses seek to increase their profit by interpreting customer satisfaction. Customer satisfaction can be analyzed in many ways, and it is the responsibility of a business to analyze it in order to improve turnover and profit. Customers now commonly give their feedback through mobile devices and the internet. Against this background, this paper attempts to analyze the sentiment of customer feedback on movies. The Sentiment Analysis on Movie Reviews dataset from the Kaggle machine learning repository is used for the implementation. The sentiment classes are predicted as follows. Firstly, the sentiment count for each class is displayed, and the top feature words for each sentiment class are extracted from the dataset. Secondly, the dataset is vectorized with the count vectorizer and then fitted with the following classifiers: Logistic Regression, Linear SVM, Multinomial Naive Bayes, Gradient Boosting, Gaussian Naive Bayes, Random Forest, Decision Tree and Extra Tree. Thirdly, the dataset is vectorized with the hashing vectorizer and then fitted with the same classifiers. Fourthly, the dataset is vectorized with the TF-IDF vectorizer and then fitted with the same classifiers. Fifthly, the dataset is transformed with the TF-IDF transformer and then fitted with the same classifiers. Sixthly, the performance of the classifiers is analyzed using the metrics Precision, Recall, F-score and Accuracy. The implementation is carried out in Python using the Spyder IDE (IPython console) under Anaconda Navigator.
Experimental results show that the random forest classifier is the most effective for sentiment analysis, with an accuracy of 89% for the count vectorizer and the TF-IDF transformer, 87% for the hashing vectorizer, and 88% for the TF-IDF vectorizer.
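The four feature-extraction routes described in the abstract can be sketched with scikit-learn as below. This is a minimal illustration under stated assumptions, not the authors' code: the toy reviews, the two-class labels, the helper names `build_pipelines` and `evaluate`, the `n_features` value, macro averaging, and the random forest settings are all hypothetical, and only one of the eight classifiers is shown.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import (
    CountVectorizer,
    HashingVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.pipeline import make_pipeline

# Hypothetical toy reviews standing in for the Kaggle movie review data.
train_texts = [
    "a wonderful and moving film",
    "great acting and a great story",
    "i loved every minute of it",
    "brilliant direction and superb cast",
    "a dull and boring movie",
    "terrible plot and awful acting",
    "i hated the slow pacing",
    "a complete waste of time",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = positive, 0 = negative (assumed)
test_texts = ["a great and moving story", "boring plot and awful pacing"]
test_labels = [1, 0]


def build_pipelines(seed=0):
    """Return one pipeline per feature-extraction route named in the abstract."""

    def forest():
        # Assumed settings; the paper does not state its hyperparameters.
        return RandomForestClassifier(n_estimators=50, random_state=seed)

    return {
        "count_vectorizer": make_pipeline(CountVectorizer(), forest()),
        "hashing_vectorizer": make_pipeline(
            HashingVectorizer(n_features=2**10), forest()
        ),
        "tfidf_vectorizer": make_pipeline(TfidfVectorizer(), forest()),
        # TfidfTransformer reweights an existing count matrix.
        "tfidf_transformer": make_pipeline(
            CountVectorizer(), TfidfTransformer(), forest()
        ),
    }


def evaluate(pipelines, x_train, y_train, x_test, y_test):
    """Fit each pipeline and collect precision, recall, F-score and accuracy."""
    results = {}
    for name, pipe in pipelines.items():
        pipe.fit(x_train, y_train)
        pred = pipe.predict(x_test)
        precision, recall, fscore, _ = precision_recall_fscore_support(
            y_test, pred, average="macro", zero_division=0
        )
        results[name] = {
            "precision": precision,
            "recall": recall,
            "fscore": fscore,
            "accuracy": accuracy_score(y_test, pred),
        }
    return results


results = evaluate(
    build_pipelines(), train_texts, train_labels, test_texts, test_labels
)
for name, scores in results.items():
    print(name, scores)
```

Swapping in the other classifiers is mostly a matter of replacing `forest()`, with two caveats: `MultinomialNB` requires non-negative features, so `HashingVectorizer` would need `alternate_sign=False`, and `GaussianNB` expects dense arrays rather than the sparse matrices the vectorizers produce.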
Keywords: Machine Learning, Precision, Recall, FScore and Accuracy.
Scope of the Article: Machine Learning.