Newly Proposed k-NN Method for More Efficient Classification
Danail Sandakchiev, Master degree in Business Analytics from Sofia University “St. Kliment Ohridski”.
Manuscript received on November 12, 2019. | Revised Manuscript received on November 25, 2019. | Manuscript published on 30 November, 2019. | PP: 5417-5424 | Volume-8 Issue-4, November 2019. | Retrieval Number: D7314118419/2019©BEIESP | DOI: 10.35940/ijrteD7314.118419
Open Access | Ethics and Policies | Cite | Mendeley | Indexing and Abstracting
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: The purpose of this paper is to examine a new classification algorithm based on the well-known k nearest neighbors technique that achieves better efficiency in terms of accuracy, precision and time when classifying test observations in comparison to classic k nearest neighbors.The proposed methodology splits the input dataset into n folds containing all observations. Each record is allocated to one of the folds. One of the folds is saved for testing purposes and the rest of the folds are used for training. The process is executed n times. The pair of train/test subsets which produces the highest accuracy result is selected as final model for the respective input data.18 different datasets are used for experiments. For each dataset, the classic k-NN is compared to the proposed method (Mk-NN) using accuracy, F1 score and execution time as metrics. The proposed approach achieves better results than classic k-NN according to all used metrics. Based on experiments with validation subsets, evidence of overfitting was not found. This paper suggests a novel method for improvement in accuracy, precision, recall and time when classifying test observations from a dataset. The approach is based on the concept of k nearest neighbors. However, what separates it from classic k nearest neighbors is that it tries to find train and test subsets of the original dataset that best represent the input dataset using the k-fold method.
Keywords: Classification problems, k Nearest Neighbors, Machine Learning Algorithms.
Scope of the Article: Machine Learning.