REDIC K-Prototype Clustering Algorithm for Mixed Data (Numerical and Categorical Data)
Khyati R.1, Nirmal2, K.V.V. Satyanarayana3
1Khyati R., Scholar, Department of CSE, Koneru Lakshmaiah Education Foundation, Green Fields, Vaddeswaram, (A.P), India.
2Nirmal, Research Scholar, Department of CSE, Koneru Lakshmaiah Education Foundation, Green Fields, Vaddeswaram, (A.P), India.
3K.V.V. Satyanarayana, Department of CSE, Koneru Lakshmaiah Education Foundation, Green Fields, Vaddeswaram (A.P), India.
Manuscript received on 13 March 2019 | Revised Manuscript received on 20 March 2019 | Manuscript published on 30 March 2019 | PP: 1-6 | Volume-7 Issue-6, March 2019 | Retrieval Number: E1924017519/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: In unsupervised learning, clustering is the task of finding hidden structure in data without any prior knowledge and deriving interesting patterns from the given data objects. Furthermore, most real-world datasets combine numerical and categorical attributes. The K-prototype clustering algorithm is widely used to group such mixed data because it is easy to implement. Its efficiency depends on the strategy used to select the initial centroids, which are selected at random in the standard algorithm. Another constraint is that the number of clusters must be supplied as input, which requires domain-specific knowledge; an inappropriate choice for the number of clusters affects the complexity of the algorithm. This paper proposes the REDIC (Removal Dependency on K and Initial Centroid Selection) K-prototype clustering algorithm, which eliminates the dependency on the input parameter and creates clusters using an incremental approach. In place of the bit-by-bit comparison of categorical attributes, a frequency-based method is used to calculate the dissimilarity between two categorical instances. Experiments are conducted on standard datasets, and the results are compared with those of the traditional K-prototype algorithm. The better results of the REDIC K-prototype clustering algorithm demonstrate its efficiency and its independence from initial parameter selection.
Keywords: Cluster Analysis; K-Prototype Clustering; Initial Centroid; Number of Clusters; Frequency-based Similarity Measurement.
Scope of the Article: Clustering
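Illustrative sketch: the abstract contrasts the bit-by-bit (simple matching) comparison of categorical attributes with a frequency-based dissimilarity, but does not state the exact formula. The Python sketch below is therefore only an assumption of what such a measure might look like, not the authors' method. It scores a categorical value by one minus its relative frequency within a cluster, and combines that with squared Euclidean distance on the numerical attributes through a weight gamma, as in the usual K-prototype cost. The function names, the gamma parameter, and the sample data are all hypothetical.

```python
from collections import Counter


def simple_matching(a, b):
    """Bit-by-bit comparison: number of categorical attributes where a and b differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def frequency_dissimilarity(values, cluster_rows):
    """Assumed frequency-based measure (not taken from the paper).

    For each categorical attribute j, add 1 - (share of cluster members
    whose attribute j equals values[j]). Common values cost little,
    rare or unseen values cost close to 1.
    """
    n = len(cluster_rows)
    total = 0.0
    for j, value in enumerate(values):
        counts = Counter(row[j] for row in cluster_rows)
        total += 1.0 - counts[value] / n
    return total


def mixed_dissimilarity(num, cat, centroid_num, cluster_cat_rows, gamma=1.0):
    """Squared Euclidean distance on numerical attributes plus gamma times
    the assumed frequency-based dissimilarity on categorical attributes."""
    squared = sum((x - c) ** 2 for x, c in zip(num, centroid_num))
    return squared + gamma * frequency_dissimilarity(cat, cluster_cat_rows)


# Hypothetical cluster with two categorical attributes (colour, size).
cluster = [("red", "s"), ("red", "m"), ("blue", "s"), ("red", "s")]

matching_score = simple_matching(("red", "s"), ("blue", "s"))
frequency_score = frequency_dissimilarity(("red", "s"), cluster)
mixed_score = mixed_dissimilarity((1.0, 2.0), ("red", "s"), (0.0, 0.0), cluster, gamma=2.0)
```

Unlike simple matching, which compares an instance with a single mode and yields only 0 or 1 per attribute, this kind of measure grades each attribute by how typical the value is for the whole cluster.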