A Content Level Based Deduplication on Streaming Data using Poisson Process Filter Technique (PPFT)
A. Sahaya Jenitha1, V. Sinthu Janita Prakash2

1Mrs. A. Sahaya Jenitha, Associate Professor, Department of Computer Science, Cauvery College for Women, Tiruchirappalli, India.
2Dr. V. Sinthu Janita Prakash, Professor and Head, PG & Research Department of Computer Science, Cauvery College for Women, Tiruchirappalli, India.

Manuscript received on 12 August 2019. | Revised Manuscript received on 17 August 2019. | Manuscript published on 30 September 2019. | PP: 4084-4089 | Volume-8 Issue-3 September 2019 | Retrieval Number: C5453098319/2019©BEIESP | DOI: 10.35940/ijrte.C5453.098319
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: This paper proposes a Poisson process-based algorithm to carry out content-level deduplication on streaming data. Because Poisson processes count events occurring over a period of time and space, they are well suited to identifying duplicate data as it is streamed, allowing the deduplication process to be carried out in tandem with the stream. Much of the existing research on deduplication has focused on file-level and block-level deduplication, whereas the focus can shift to the content level as data are streamed live and become more dynamic. With this approach, content-level deduplication allows the data to be scanned intelligently while reducing the time taken by the deduplication operation. Streaming data is also inherently random, and a Poisson process-based deduplication scheme addresses this random behaviour of data transfer and works efficiently in a dynamically connected environment. The proposed method identifies the unique data to be stored in the database. In the experiments, the Poisson process-based algorithm achieved an Area Under the Curve (AUC) of 0.912 on real-world streaming data; an AUC above 0.8 generally indicates good performance. The machine intelligence-based deduplication model therefore produced reliable and robust deduplication on streaming data compared with existing approaches.
Keywords: Deduplication, Poisson Process, Semantic Level, Classification Algorithms, Streaming Data.

Scope of the Article: Classification
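The sketch below is a minimal, illustrative reading of the idea described in the abstract: fingerprint each record's content as it arrives and treat repeat arrivals of the same content within an observation window as duplicates, with the per-window arrival count standing in for the Poisson event rate. The class name, window parameter, and thresholding are hypothetical assumptions for illustration and do not reproduce the authors' PPFT implementation.

```python
# Illustrative sketch only: content-level deduplication on a stream, loosely
# inspired by the Poisson-process idea of counting event arrivals over time.
# Names such as PoissonDedupFilter and window_seconds are hypothetical.
import hashlib
import time
from collections import defaultdict


class PoissonDedupFilter:
    def __init__(self, window_seconds=60.0):
        self.window = window_seconds            # observation window for arrival counts
        self.last_seen = {}                     # content hash -> last arrival timestamp
        self.arrival_counts = defaultdict(int)  # content hash -> arrivals observed

    def _fingerprint(self, record):
        # Content-level fingerprint (not file- or block-level): hash the record body.
        return hashlib.sha256(record.encode("utf-8")).hexdigest()

    def is_duplicate(self, record, now=None):
        now = time.time() if now is None else now
        key = self._fingerprint(record)
        previous = self.last_seen.get(key)
        self.last_seen[key] = now
        self.arrival_counts[key] += 1
        if previous is None:
            return False                        # first arrival: unique, store it
        # Repeat arrivals within the window count as duplicates; the arrival
        # count per window approximates the event rate for this content.
        return (now - previous) <= self.window


# Usage: keep only the unique records from the incoming stream.
if __name__ == "__main__":
    stream = ["sensor:42,temp:21.5", "sensor:42,temp:21.5", "sensor:7,temp:19.0"]
    dedup = PoissonDedupFilter(window_seconds=60.0)
    unique = [r for r in stream if not dedup.is_duplicate(r)]
    print(unique)  # ['sensor:42,temp:21.5', 'sensor:7,temp:19.0']
```

In this sketch, only records judged unique would be written to the database, which mirrors the abstract's goal of storing unique data while the stream is processed.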