A Progressive Classification Framework for Detecting SPAM emails and Identification of Authors
I V S Venugopal1, D Lalitha Bhaskari2, M N Seetaramanath3

1I V S Venugopal, Department of IT, G V P College of Engineering (A), Visakhapatnam, (Andhra Pradesh), India.
2Dr D Lalitha Bhaskari, Department of CS&SE, AUCE(A), Andhra Pradesh, Visakhapatnam, (Andhra Pradesh), India.
3Dr M N Seetaramanath, Department of IT, G V P College of Engineering (A), Visakhapatnam, (Andhra Pradesh), India.

Manuscript received on 13 March 2019 | Revised Manuscript received on 20 March 2019 | Manuscript published on 30 March 2019 | PP: 147-157 | Volume-7 Issue-6, March 2019 | Retrieval Number: F2155037619/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Email is the most popular form of communication in cyberspace. In the recent past, many users have shifted to instant communication methods such as instant messaging or video-based services for interaction. Nevertheless, for detailed communication there is no replacement for email. A number of surveys have reported that between 200 and 250 million emails are exchanged every day, including personal, business, and promotional emails. Given such a massive volume of information exchange, it goes without saying that email becomes a target for information misuse. One of the biggest threats to email communication is spam: emails that contain unsolicited information or, in many cases, ask for critical information from the recipients. Most email service providers help their users by incorporating a spam filtering process on the email servers. Nonetheless, the complex nature of the language used in communication makes spam detection highly difficult. The fundamental strategy followed by most filters is to detect spam emails based on specified keywords. However, in different domains of business or study, some keywords carry different significance and cannot be blacklisted. Also, wrongly classifying a legitimate email as spam may lead to severe information loss. A good number of research attempts have been made in the recent past to build a framework that detects spam as accurately as possible. However, due to the restrictions mentioned above, a bottleneck still persists between email filtration and spam detection accuracy. Thus, this work proposes a novel automatic framework for detecting spam emails across a wide range of domains. The accuracy obtained by this framework is significantly high because of the multi-layered approach adopted.
The framework first classifies emails into various domains and then applies a keyword-based filtration process with term frequency analysis, together with identification of the nature of the sender as confirmation. The result is a progressive classification intended to make email communication highly secure and satisfactory.
Keywords: Spam filtering, Term Frequency, Term Relation, Domain Knowledge, Author identification, Progressive classification

Scope of the Article: Knowledge Discovery
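
Illustrative sketch: the layered process summarised in the abstract (domain classification, then keyword filtration based on term frequency, then a check on the sender) might be outlined roughly as follows. This is a minimal sketch under stated assumptions and not the authors' implementation. The keyword lists, the trusted-sender set, the threshold value, and all function names are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical domain vocabularies (assumption: not taken from the paper).
DOMAIN_TERMS = {
    "finance": {"invoice", "payment", "account", "loan"},
    "academic": {"paper", "conference", "review", "submission"},
}
# Hypothetical spam keywords (assumption).
SPAM_TERMS = {"winner", "free", "prize", "loan", "urgent"}
# Hypothetical set of senders treated as known and legitimate (assumption).
TRUSTED_SENDERS = {"billing@bank.example"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def term_frequency(tokens):
    """Return the relative frequency of each term in the token list."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {term: count / total for term, count in counts.items()}


def classify_domain(tf):
    """Layer 1: pick the domain whose vocabulary carries the most weight."""
    scores = {
        domain: sum(tf.get(term, 0.0) for term in terms)
        for domain, terms in DOMAIN_TERMS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"


def spam_score(tf, domain):
    """Layer 2: sum the frequency of spam keywords, skipping terms that
    are ordinary vocabulary in the detected domain."""
    allowed = DOMAIN_TERMS.get(domain, set())
    return sum(
        freq for term, freq in tf.items()
        if term in SPAM_TERMS and term not in allowed
    )


def is_spam(sender, body, threshold=0.1):
    """Layer 3: flag as spam only if the keyword score reaches the
    threshold and the sender is not in the trusted set."""
    tf = term_frequency(tokenize(body))
    domain = classify_domain(tf)
    if spam_score(tf, domain) < threshold:
        return False
    return sender not in TRUSTED_SENDERS
```

In this sketch the word "loan" counts as a spam keyword in general but is exempt once an email has been classified as belonging to the finance domain, which mirrors the point made in the abstract that some keywords cannot be blacklisted across all domains.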