Feature Selection using Stochastic Diffusion Search Algorithm in Big Data Analysis
Sumitra Srinivas K¹, Gangadhara Rao Kancharla²

¹Sumitra Srinivas K, Ph.D., Department of Computer Science, Sri Acharya Nagarjuna University, Guntur (Andhra Pradesh), India.
²Dr. Gangadhara Rao Kancharla, Department of Computer Science, Sri Acharya Nagarjuna University, Guntur (Andhra Pradesh), India.
Manuscript received on 20 January 2020 | Revised Manuscript received on 02 February 2020 | Manuscript Published on 05 February 2020 | PP: 236-242 | Volume-8 Issue-4S5 December 2019 | Retrieval Number: D10511284S519/2019©BEIESP | DOI: 10.35940/ijrte.D1051.1284S519
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Big Data analysis is the processing, or mining, of massive amounts of data to retrieve useful information from large datasets. Among the methods used to analyse Big Data, feature selection has proven highly effective. A common approach searches for a subset of relevant features that represents the dataset as faithfully as its original description does. However, searching for such a subset is a combinatorial problem and is therefore time-consuming, so meta-heuristic algorithms are commonly used to facilitate feature selection. The Stochastic Diffusion Search (SDS) algorithm is a multi-agent global search algorithm, based on simple interactions between agents, that is well suited to combinatorial problems. In this work, SDS selects the feature subset for a classification task, and the Classification and Regression Tree (CART), Naïve Bayes (NB), Support Vector Machine (SVM) and K-Nearest Neighbour (KNN) classifiers are applied to the selected features to improve performance. The results show that the proposed method achieves better performance than existing techniques.
Keywords: Big Data Analysis, Feature Selection, Stochastic Diffusion Search (SDS) Algorithm, K-Nearest Neighbour (KNN) Classifier, Naïve Bayes (NB) Classifier, Classification and Regression Tree (CART) Classifier and Support Vector Machine (SVM) Classifier.
Scope of the Article: Big Data Quality Validation
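The abstract does not give the details of how SDS scores a feature subset, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. Each agent holds a hypothesis (a feature mask). In the test phase it performs a cheap partial evaluation: it picks one random sample and checks whether a leave-one-out 1-nearest-neighbour classifier, using only the selected features, labels it correctly. In the diffusion phase each inactive agent polls a random agent and copies its mask if that agent is active, otherwise it restarts from a fresh random mask. The mask shared by the largest cluster of agents is returned. The toy dataset (`make_dataset`), the 1-NN fitness test (`nn_predict`) and all parameter values (`n_agents`, `n_iters`, `seed`) are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def make_dataset(n_samples, n_features, rng):
    """Hypothetical toy data: only features 0 and 1 determine the label."""
    X, y = [], []
    for _ in range(n_samples):
        row = [rng.random() for _ in range(n_features)]
        X.append(row)
        y.append(1 if row[0] + row[1] > 1.0 else 0)
    return X, y


def nn_predict(X, y, mask, query_idx):
    """1-NN label for X[query_idx] using only masked features, excluding the point itself."""
    best_dist, best_label = float("inf"), None
    q = X[query_idx]
    for j, row in enumerate(X):
        if j == query_idx:
            continue
        d = sum((q[k] - row[k]) ** 2 for k in range(len(mask)) if mask[k])
        if d < best_dist:
            best_dist, best_label = d, y[j]
    return best_label


def random_subset(n_features, rng):
    """Include each feature with probability 0.5; reject the empty subset."""
    while True:
        mask = tuple(rng.random() < 0.5 for _ in range(n_features))
        if any(mask):
            return mask


def sds_feature_selection(X, y, n_agents=30, n_iters=150, seed=0):
    rng = random.Random(seed)
    n_features = len(X[0])
    hyps = [random_subset(n_features, rng) for _ in range(n_agents)]
    active = [False] * n_agents
    for _ in range(n_iters):
        # Test phase: partial evaluation of each hypothesis on ONE random sample.
        for a in range(n_agents):
            i = rng.randrange(len(X))
            active[a] = nn_predict(X, y, hyps[a], i) == y[i]
        # Diffusion phase: inactive agents copy an active agent's subset,
        # or restart from a random subset if the polled agent is inactive.
        new_hyps = list(hyps)
        for a in range(n_agents):
            if active[a]:
                continue
            b = rng.randrange(n_agents)
            new_hyps[a] = hyps[b] if active[b] else random_subset(n_features, rng)
        hyps = new_hyps
    # The largest cluster of agents sharing one subset is taken as the result.
    best, _ = Counter(hyps).most_common(1)[0]
    return best


if __name__ == "__main__":
    X, y = make_dataset(60, 6, random.Random(42))
    best = sds_feature_selection(X, y)
    print("selected features:", [k for k, keep in enumerate(best) if keep])
```

Because the test phase looks at a single sample, each step is cheap regardless of dataset size, which is what makes SDS attractive for large data; the selection pressure comes from agents with more accurate subsets staying active, and therefore being copied, more often. In the setting described above, the returned subset would then be passed to CART, NB, SVM or KNN for the final classification.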