Khyati R. Nirmal1, K.V.V. Satyanarayana2
1Shehna Sherafudeen, Research Scholar, ICFAI Business School, IBS Hyderabad (Telangana), India.
2Dr. Debajani Sahoo*, Associate Professor, Department of Marketing and Strategy, ICFAI Business School, IBS Hyderabad (Telangana), India.
Manuscript received on 23 May 2022 | Revised Manuscript received on 28 May 2022 | Manuscript Accepted on 15 July 2022 | Manuscript published on 30 July 2022 | PP: 21-28 | Volume-11 Issue-2, July 2022 | Retrieval Number: E1924017519/19©BEIES
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (

Abstract: In unsupervised learning, clustering is the task of
finding hidden structure in data without any prior knowledge and
deriving interesting patterns from the given data objects.
Furthermore, most real-world datasets combine numerical and
categorical attributes. The K-prototype clustering algorithm is
widely used to group such mixed data because it is easy to
implement. Its efficiency depends on the strategy used to select
the initial centroids, which the traditional algorithm chooses at
random. A further constraint is that the number of clusters must
be supplied as input, which requires domain-specific knowledge. An
inappropriate choice for the number of clusters affects the
complexity of the algorithm. This paper proposes the REDIC
(Removal Dependency on K and Initial Centroid Selection)
K-prototype clustering algorithm, which eliminates the dependency
on these input parameters and creates the clusters using an
incremental approach. In place of the bit-by-bit comparison of
categorical attributes, a frequency-based method is used to
calculate the dissimilarity between two categorical instances.
Experiments are conducted on standard datasets and the results are
compared with the traditional K-prototype algorithm. The better
results of the REDIC K-prototype clustering algorithm demonstrate
its efficiency and show that the dependency on the initial
parameters is removed.

Keywords: Cluster Analysis; K-Prototype Clustering; Initial
Centroid; Number of Clusters; Frequency-Based Similarity
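
The contrast the abstract draws between bit-by-bit matching and a frequency-based measure for categorical attributes can be illustrated with a minimal sketch. The abstract does not give the REDIC formula, so the frequency-based function below is an assumption: it scores a categorical object against a cluster as the sum, over attributes, of one minus the relative frequency of the object's value in that cluster. All function names and the weight gamma are hypothetical and chosen for illustration only; the mixed dissimilarity follows the usual K-prototype form (squared Euclidean distance on numerical attributes plus a weighted categorical term).

```python
from collections import Counter


def simple_matching(a, b):
    """Bit-by-bit comparison: number of attributes whose values differ."""
    return sum(x != y for x, y in zip(a, b))


def category_frequencies(members):
    """Relative frequency of each value, per categorical attribute, in a cluster."""
    n = len(members)
    return [{v: c / n for v, c in Counter(col).items()} for col in zip(*members)]


def frequency_dissimilarity(obj, freqs):
    """Assumed frequency-based measure (not taken from the paper):
    each attribute contributes 1 minus the relative frequency of the
    object's value in the cluster; unseen values contribute 1."""
    return sum(1.0 - f.get(v, 0.0) for v, f in zip(obj, freqs))


def mixed_dissimilarity(num, cat, num_center, freqs, gamma=1.0):
    """K-prototype style mixed measure: squared Euclidean distance on the
    numerical part plus gamma times the categorical part."""
    numeric = sum((x - m) ** 2 for x, m in zip(num, num_center))
    return numeric + gamma * frequency_dissimilarity(cat, freqs)


# Toy cluster with two categorical attributes (hypothetical data).
cluster_cats = [("red", "small"), ("red", "large"),
                ("blue", "small"), ("red", "small")]
freqs = category_frequencies(cluster_cats)

print(simple_matching(("red", "small"), ("blue", "small")))
print(frequency_dissimilarity(("red", "small"), freqs))
print(frequency_dissimilarity(("blue", "large"), freqs))
print(mixed_dissimilarity((1.0, 2.0), ("red", "small"), (1.0, 4.0), freqs, gamma=2.0))
```

Under this assumed measure, an object whose values are common in the cluster scores lower than one whose values are rare, whereas simple matching against a single mode treats every mismatch as equally costly.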