Classification of Sentiment Based on Movie Feedback Given By Audiences
Sumedh Shah1, Alwin Anuse2, Rupali Kute3
1Sumedh Shah, Electronics and Telecommunication, Maharashtra Institute of Technology, Pune, India.
2Alwin Anuse, Electronics and Telecommunication, Maharashtra Institute of Technology, Pune, India.
3Rupali Kute, Electronics and Telecommunication, Maharashtra Institute of Technology, Pune, India.
Manuscript received on November 22, 2019. | Revised Manuscript received on November 28, 2019. | Manuscript published on November 30, 2019. | PP: 2594-2602 | Volume-8 Issue-4, November 2019. | Retrieval Number: D7237118419/2019©BEIESP | DOI: 10.35940/ijrte.D7237.118419
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Automated sentiment analysis of audience feedback is increasingly needed, because reading through every movie review manually is tedious. This study therefore uses machine learning models to predict the polarity of a movie from its reviews, with the IMDB movie reviews dataset used for training and testing. It also examines two practical problems, class imbalance and the choice of train-test split, and proposes solutions to them. Class imbalance affects many predictive applications, such as cancer detection and the detection of fraudulent bank transactions, so the study applies undersampling to improve accuracy on the imbalanced classes. Bag of Words and Term Frequency-Inverse Document Frequency are used to extract features from the reviews, and Logistic Regression and SVM classifiers are used to classify them. In addition to accuracy, the confusion matrix is computed to show how class imbalance affects the accuracy.
Keywords: Bag of Words (BOW), Class Imbalance, Term Frequency Inverse Document Frequency (TF-IDF), Support Vector Machine (SVM).
Scope of the Article: Classification.
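The techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses toy data rather than the IMDB dataset, only the Python standard library, and no classifier is trained. The function names (bag_of_words, tf_idf, undersample, confusion_matrix) are hypothetical, the tokenizer is a plain whitespace split, and the IDF variant log(N / df) is an assumption, since several variants are in common use.

```python
import math
import random
from collections import Counter


def bag_of_words(text):
    """Count lowercase, whitespace-separated tokens (assumed tokenizer)."""
    return Counter(text.lower().split())


def tf_idf(docs):
    """Per-document TF-IDF weights, assuming idf = log(N / df)."""
    counts = [bag_of_words(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for c in counts for term in c)
    weights = []
    for c in counts:
        total = sum(c.values())
        weights.append(
            {t: (k / total) * math.log(n_docs / df[t]) for t, k in c.items()}
        )
    return weights


def undersample(samples, labels, seed=0):
    """Randomly drop samples until every class matches the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    n_min = min(len(items) for items in by_class.values())
    balanced = []
    for y, items in by_class.items():
        balanced.extend((s, y) for s in rng.sample(items, n_min))
    rng.shuffle(balanced)
    return [s for s, _ in balanced], [y for _, y in balanced]


def confusion_matrix(y_true, y_pred, positive="pos"):
    """Return counts of tp, fp, fn, tn for a binary problem."""
    cells = Counter()
    for t, p in zip(y_true, y_pred):
        if p == positive:
            cells["tp" if t == positive else "fp"] += 1
        else:
            cells["fn" if t == positive else "tn"] += 1
    return cells


# Toy imbalanced data: 90 positive reviews, 10 negative reviews.
labels = ["pos"] * 90 + ["neg"] * 10
reviews = [f"review {i}" for i in range(100)]

# A trivial model that always predicts the majority class looks accurate,
# but the confusion matrix shows it never identifies a negative review.
always_pos = ["pos"] * len(labels)
cm = confusion_matrix(labels, always_pos)
accuracy = (cm["tp"] + cm["tn"]) / len(labels)

# Undersampling leaves 10 reviews per class.
bal_reviews, bal_labels = undersample(reviews, labels)
```

In this toy setup the majority-class predictor reaches an accuracy of 0.9 while producing zero true negatives, which is why a confusion matrix is worth reporting alongside accuracy when classes are imbalanced.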