An Improved Clustering Realized Relational Data Anonymization with Optimal Privacy and Utility Measures
G. Sasirekha1, S. Kishore Verma2, S. Sheik Faritha Begum3, J. S. Adeline Johnsana4
1G. Sasirekha, Department of Computer Science and Engineering, C. Abdul Hakeem College of Engineering and Technology, Vellore (Tamil Nadu), India.
2S. Kishore Verma, Department of Computer Science and Engineering, C. Abdul Hakeem College of Engineering and Technology, Vellore (Tamil Nadu), India.
3S. Sheik Faritha Begum, Department of Computer Science and Engineering, C. Abdul Hakeem College of Engineering and Technology, Vellore (Tamil Nadu), India.
4J. S. Adeline Johnsana, Department of Information Technology, Adhiparasakthi College of Engineering, Kalavai (Tamil Nadu), India.
Manuscript received on 22 May 2019 | Revised Manuscript received on 08 June 2019 | Manuscript Published on 15 June 2019 | PP: 380-387 | Volume-8 Issue-1S2 May 2019 | Retrieval Number: A00890581S219/2019©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: The massive growth of technology has increased the use of computers in day-to-day life. Users feed in millions of data items every minute. The process of converting this raw data into useful information is called data mining. Protecting the privacy of data during this process is the aim of Privacy Preserving Data Mining (PPDM). In recent years, privacy preserving data mining has become more crucial because government agencies, corporations, hospitals, banks and similar organizations store growing digital collections of data about their users. These collections contain many sensitive attributes, and once stolen by hackers they can reveal the identity of the users when combined with publicly available data. To prevent this, a protection model called k-anonymization was introduced. The k-anonymity model preserves individual identity through generalization and suppression. Privacy and utility are inversely related, so maintaining a tradeoff between them is a vital factor in PPDM. In this paper, CARD (Clustered Anonymization of Relational Data) is presented to reduce the information loss of utility-aware anonymization. Utility-aware anonymization means k-anonymizing the dataset while accounting for two novel factors, transformation pattern loss (tpl) and null value count, both of which are kept to a minimum. This utility-aware anonymization is carried out for Cell oriented Anonymization (CoA), Attribute oriented Anonymization (AoA) and Record oriented Anonymization (RoA). CARD first clusters the given dataset with benchmark clustering algorithms such as Simple K-Means (KMeans), Farthest First (FF), Expectation Maximization (EM), Partition around Medoids (PAM) and the Gower method; the clustered dataset is then subjected to the utility-aware CoA, AoA and RoA anonymization approaches.
Classification analyses such as logistic regression, naïve Bayes and random forest are performed on the clustered anonymized dataset to assess the privacy and utility of the proposed approach, based on the Information Loss, Re-Identification Risk and Classification Accuracy of the clustered dataset before it is published. Our experimental results are better than those of the non-clustered anonymization procedures. Among the five clustering algorithms, Gower and Partition around Medoids (PAM) give the best results in our analysis in terms of privacy and utility, since both are well suited to clustering mixed data types (numerical and categorical).
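As a rough illustration of the k-anonymity model the abstract refers to, the sketch below generalizes two quasi-identifiers and then suppresses whatever still falls in an equivalence class smaller than k. It is a generic textbook-style example, not the CARD procedure: the attribute names, the generalization rules (10-year age bands, 3-digit zip prefixes), the sample records, and the choice of dropping whole records rather than nulling cells are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical quasi-identifiers; "disease" plays the role of the sensitive attribute.
QUASI_IDENTIFIERS = ("age", "zip")


def generalize(record):
    """Coarsen quasi-identifiers: age -> 10-year band, zip -> 3-digit prefix."""
    low = (record["age"] // 10) * 10
    return {
        "age": f"{low}-{low + 9}",
        "zip": record["zip"][:3] + "**",
        "disease": record["disease"],  # sensitive value is left untouched
    }


def qi_key(record):
    """Tuple of quasi-identifier values that defines an equivalence class."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def is_k_anonymous(records, k):
    """True if every equivalence class contains at least k records."""
    sizes = Counter(qi_key(r) for r in records)
    return all(n >= k for n in sizes.values())


def k_anonymize(records, k):
    """Generalize every record, then suppress records in classes smaller than k."""
    generalized = [generalize(r) for r in records]
    sizes = Counter(qi_key(r) for r in generalized)
    return [r for r in generalized if sizes[qi_key(r)] >= k]


# Made-up sample data, for illustration only.
records = [
    {"age": 23, "zip": "63201", "disease": "flu"},
    {"age": 27, "zip": "63209", "disease": "asthma"},
    {"age": 34, "zip": "63215", "disease": "flu"},
    {"age": 38, "zip": "63211", "disease": "diabetes"},
    {"age": 61, "zip": "60001", "disease": "flu"},
]

released = k_anonymize(records, k=2)
```

In this toy run the single record in the 60-69 age band has no partner after generalization, so it is suppressed; the remaining four records form two equivalence classes of size two. The suppressed record is the kind of utility cost that the privacy and utility tradeoff in the abstract is about.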
Keywords: K Anonymization, Cell Oriented Anonymization, Attribute Oriented Anonymization, Record Oriented Anonymization, Partition around Medoids, Gower.
Scope of the Article: Data Mining
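The abstract credits the Gower and PAM approaches with handling mixed numerical and categorical data. A minimal sketch of the standard Gower dissimilarity is given below to show why it suits such data: numeric attributes contribute a range-normalized absolute difference, categorical attributes contribute 0 on a match and 1 otherwise, and the contributions are averaged. The function names, the equal attribute weights, and the sample rows are assumptions for illustration; the paper's actual implementation is not described in this section.

```python
def numeric_ranges(rows, numeric_attrs):
    """Map each numeric attribute to (max - min) over the dataset."""
    return {
        attr: max(r[attr] for r in rows) - min(r[attr] for r in rows)
        for attr in numeric_attrs
    }


def gower_distance(a, b, ranges):
    """Gower dissimilarity between two records given as dicts.

    Attributes present in `ranges` are treated as numeric; all other
    attributes are treated as categorical. Every attribute has equal weight.
    """
    total = 0.0
    for attr in a:
        if attr in ranges:
            span = ranges[attr]
            # Range-normalized difference; a constant column contributes 0.
            total += abs(a[attr] - b[attr]) / span if span else 0.0
        else:
            total += 0.0 if a[attr] == b[attr] else 1.0
    return total / len(a)


# Made-up mixed-type rows, for illustration only.
rows = [
    {"age": 20, "sex": "F", "job": "nurse"},
    {"age": 30, "sex": "F", "job": "clerk"},
    {"age": 60, "sex": "M", "job": "clerk"},
]

ranges = numeric_ranges(rows, ["age"])
d01 = gower_distance(rows[0], rows[1], ranges)
d02 = gower_distance(rows[0], rows[2], ranges)
d12 = gower_distance(rows[1], rows[2], ranges)
```

A medoid-based method such as PAM only needs pairwise dissimilarities like these, which is one reason the two are commonly paired for mixed-type tables.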