Cluster Sampling to Improve Classifier Accuracy for Numeric Data
Lakshmi Sreenivasa Reddy D1, M. Rajini2 

1Dr Lakshmi Sreenivasa Reddy D, Department of Information Technology, Chaitanya Bharathi Institute of Technology, Hyderabad, India.
2M. Rajini, Department of Computer Science, Stanley Degree and PG College for Women, Hyderabad, India.

Manuscript received on 01 March 2019 | Revised Manuscript received on 05 March 2019 | Manuscript published on 30 July 2019 | PP: 3685-3692 | Volume-8 Issue-2, July 2019 | Retrieval Number: B2848078219/19©BEIESP | DOI: 10.35940/ijrte.B2848.078219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license.

Abstract: Clustering is one of the essential techniques for grouping similar data. Improving model accuracy remains a challenge across all varieties of data, and training and testing a classifier on an entire large-scale data set is often infeasible. Sampling is therefore necessary for any modeling and is an important aspect of data mining. Models are typically trained and tested on samples drawn by traditional techniques, such as those used in the random forest ensemble method. In this paper, we propose cluster sampling, which outperforms other sampling methods in improving classifier accuracy. Samples drawn by the usual methods cannot cover the full variety of the original data. Cluster sampling is a two-step approach: first, it clusters the entire data set; second, it selects samples from each cluster. These samples contain all varieties of data in equal proportion. Cluster sampling leverages tree-based ensembles to handle categorical, numerical, and mixed types of data. Classifiers modeled on cluster-sampled data show superior accuracy to classifiers modeled on samples drawn by other techniques.
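The two-step procedure described in the abstract (cluster the data, then sample from each cluster in equal proportion) can be sketched as follows. This is a minimal, illustrative implementation, not the authors' code: it uses a simple k-means for the clustering step, and the function names, parameters, and the choice of k-means itself are assumptions for the sake of the example.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    # Illustrative k-means for numeric data (lists of floats), used here
    # only as a stand-in for the paper's clustering step.
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest center (squared Euclidean distance)
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # recompute each center as the mean of its cluster (keep old center if empty)
        centers = [[sum(dim) / len(c) for dim in zip(*c)] if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters

def cluster_sample(points, k, fraction, seed=0):
    # Step 1: cluster the entire data set.
    # Step 2: draw the same fraction from every cluster, so each
    # variety of data is represented in equal proportion.
    rng = random.Random(seed)
    sample = []
    for cluster in kmeans(points, k, seed=seed):
        n = max(1, round(fraction * len(cluster)))
        sample.extend(rng.sample(cluster, min(n, len(cluster))))
    return sample
```

A classifier would then be trained on the returned sample instead of a plain random sample; because every cluster contributes proportionally, rare varieties of the data are not missed the way they can be under uniform random sampling.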
Keywords: Clustering, Categorical Data, Numerical Data, Random forest, Classifier, Sampling

Scope of the Article: Classification