Complexity Analysis of Compressing Genomic Sequence Data with Chained Hash Indexing in Multiple Dictionary-based LZW
A.S. Keerthy¹, S. Manju Priya²
¹A.S. Keerthy, Department of Computer Science, Karpagam Academy of Higher Education, Coimbatore (Tamil Nadu), India.
²S. Manju Priya, Department of CS, CA & IT, Karpagam Academy of Higher Education, Coimbatore (Tamil Nadu), India.
Manuscript received on 24 April 2019 | Revised Manuscript received on 02 May 2019 | Manuscript Published on 08 May 2019 | PP: 468-471 | Volume-7 Issue-5S3 February 2019 | Retrieval Number: E11830275S19/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Data compression is a widely discussed topic among researchers and practitioners in the data industry. Huge volumes of data arrive from different sources and in a variety of formats, such as audio, video, pictures, text, and numeric data. Of the data available to researchers, genomic data produced by biological research labs is among the most prominent. With the advent of high-speed sequencing machinery and techniques, the amount of genomic data being produced is outpacing the growth predicted by Moore’s Law. Compression lets researchers store this data compactly and use it efficiently. Given the nature of genomic data, the compression method must be lossless. With these factors in mind, a multiple-dictionary-based LZW (MDLZW) compression technique was proposed and implemented. This paper analyzes the complexity of that technique and compares it with existing methods.
Keywords: Complexity Analysis, Compression, Genomic Data, Lossless, MDLZW.
Scope of the Article: Data Analytics