Anonymization Based Fisher–Yates Shuffle Method for Streaming of Twitter Data
AR. Arunachalam¹, G. Michael², D. Vimala³
¹Dr. AR. Arunachalam, Department of Computer Science and Engineering, Bharath Institute of Higher Education and Research, Chennai (Tamil Nadu), India.
²G. Michael, Department of Computer Science and Engineering, Bharath Institute of Higher Education and Research, Chennai (Tamil Nadu), India.
³D. Vimala, Department of Computer Science and Engineering, Bharath Institute of Higher Education and Research, Chennai (Tamil Nadu), India.
Manuscript received on 15 August 2019 | Revised Manuscript received on 06 September 2019 | Manuscript Published on 17 September 2019 | PP: 408-411 | Volume-8 Issue-2S8 August 2019 | Retrieval Number: B13970882S819/2019©BEIESP | DOI: 10.35940/ijrte.B1397.0882S819
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: In the era of Big Data, many organizations handle personal data that must be protected for privacy reasons. Individuals can be re-identified through quasi-identifiers (QIs). To preserve privacy, anonymization converts personal data into data that can no longer be linked to an individual. Many organizations generate large volumes of data in real time. Hadoop components such as HDFS and MapReduce, together with the wider Hadoop ecosystem, make it possible to process such volumes in real time. Basic data anonymization techniques include cryptographic methods, substitution, character masking, shuffling, nulling out, date variance, and number variance. Here, privacy preservation for streaming data is achieved by combining one of these techniques, shuffling, with Big Data tooling. K-anonymity, l-diversity, and t-closeness are commonly used privacy models, but they do not adequately limit information loss or preserve data utility. The Dynamically Anonymizing Data Shuffling (DADS) technique is used to reduce this information loss and to improve data utility in streaming data.
Keywords: Big Data, Privacy Preservation, Data Anonymization, Data Masking, Shuffling, Hadoop, Flume, Twitter.
Scope of the Article: Data Mining
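To make the shuffling idea in the abstract concrete, the following is a minimal sketch of a Fisher–Yates shuffle applied to one quasi-identifier column of a small batch of records. It is an illustration only and not the authors' DADS implementation, whose details are not given here. The function names `fisher_yates_shuffle` and `shuffle_column`, the `seed` parameter, and the idea of treating a stream as fixed batches of dictionaries are all assumptions made for this example.

```python
import random
from typing import Dict, List, Optional


def fisher_yates_shuffle(values: List[str], rng: random.Random) -> List[str]:
    """Return a uniformly random permutation of `values`; the input is not modified."""
    shuffled = list(values)
    # Walk from the last index down to 1 and swap each position with a
    # randomly chosen position at or before it (the Fisher-Yates step).
    for i in range(len(shuffled) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled


def shuffle_column(
    records: List[Dict[str, str]],
    column: str,
    seed: Optional[int] = None,
) -> List[Dict[str, str]]:
    """Shuffle one column across a batch of records (hypothetical helper).

    The values of `column` are permuted among the records, so the column's
    overall distribution is unchanged while the link between each value and
    its original record is broken. All other fields stay in place.
    """
    rng = random.Random(seed)
    shuffled = fisher_yates_shuffle([r[column] for r in records], rng)
    return [{**r, column: v} for r, v in zip(records, shuffled)]
```

Because the column is only permuted, aggregate statistics over it (for example, counts per location) are the same before and after, which is the sense in which shuffling keeps data utility. For a stream, one plausible design is to buffer incoming records into batches and call `shuffle_column` on each batch; note that a very small batch offers little protection, since few candidate records remain for each value.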