Spam Detection using NLP Techniques
Bollam Pragna1, M. Rama Bai2

1Bollam Pragna, Software Development Engineer, Sunnyvale, California, United States.
2Dr. M. Rama Bai, Professor and Head, Department of Information Technology, MGIT, Hyderabad (Telangana), India.
Manuscript received on 15 October 2019 | Revised Manuscript received on 24 October 2019 | Manuscript Published on 02 November 2019 | PP: 2423-2426 | Volume-8 Issue-2S11 September 2019 | Retrieval Number: B12800982S1119/2019©BEIESP | DOI: 10.35940/ijrte.B1280.0982S1119
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Natural Language Processing (NLP) is an important field of research with applications across many subjects. Text classification is an NLP task in which text is first converted into a machine-readable form by applying methods such as tokenizing, part-of-speech tagging, stemming, and chunking. Applying these methods to our data yields preprocessed data on which we train models to classify messages as spam or ham using Scikit-Learn classifiers. We propose a model for this task and analyze the relative strengths of several machine learning algorithms, namely K-Nearest Neighbors (KNN), Decision Tree Classifier, Random Forest Classifier, Logistic Regression, SGD Classifier, Multinomial Naive Bayes (NB), and Support Vector Machine (SVM), in order to compare the performance measures of the methods used in this research. The proposed approach achieved an average accuracy of 98.49% with the SVM model on the ‘SMS Spam Collection’ dataset.
Keywords: Spam Detection Techniques, Natural Language Processing.
Scope of the Article: Natural Language Processing
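
The pipeline summarized in the abstract (tokenize, stem, then train a classifier on the resulting tokens) can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the paper trains Scikit-Learn classifiers, whereas this sketch swaps in a from-scratch multinomial Naive Bayes with add-one smoothing so that it runs with the standard library alone. The toy messages in `TRAIN`, the `crude_stem` suffix stripper (a stand-in for a real stemmer), and the function names `preprocess`, `train_nb`, and `predict` are all assumptions introduced for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Toy messages invented for illustration; NOT taken from the SMS Spam Collection dataset.
TRAIN = [
    ("ham", "Are we still meeting for lunch today"),
    ("ham", "Call me when you get home"),
    ("ham", "See you at the meeting tomorrow"),
    ("spam", "Win a free prize now, call this number"),
    ("spam", "Free entry, claim your prize today"),
    ("spam", "Winner! Claim free cash now"),
]


def crude_stem(token):
    """Very rough suffix stripping; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize on runs of letters, then stem each token."""
    return [crude_stem(t) for t in re.findall(r"[a-z]+", text.lower())]


def train_nb(examples):
    """Count documents per class and tokens per class; collect the vocabulary."""
    class_docs = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for label, text in examples:
        class_docs[label] += 1
        for tok in preprocess(text):
            token_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, token_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior under multinomial NB."""
    class_docs, token_counts, vocab = model
    total_docs = sum(class_docs.values())
    tokens = [t for t in preprocess(text) if t in vocab]  # skip unseen tokens
    scores = {}
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)  # log prior
        denom = sum(token_counts[label].values()) + len(vocab)  # add-one smoothing
        for tok in tokens:
            score += math.log((token_counts[label][tok] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


model = train_nb(TRAIN)
print(predict(model, "Claim your free prize"))
print(predict(model, "See you at lunch"))
```

With only six training messages the output says nothing about real-world accuracy; the sketch only shows how preprocessed tokens feed a classifier. Reproducing the paper's comparison would mean replacing `crude_stem` with a proper stemmer and the hand-written classifier with the Scikit-Learn estimators listed in the abstract.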