Evaluation of Various DR Techniques in Massive Patient Datasets using HDFS
K. B. V. Brahma Rao1, R Krishnam Raju Indukuri2, P. Suresh Varma3, M. V. Rama Sundari4

1Dr. K. B. V. Brahma Rao*, Ph.D, Department of Computer Science and Engineering, Adikavi Nannaya University, Rajamahendravaram (A. P), India.
2Dr. R Krishnam Raju Indukuri, Ph.D, Department of Computer Science and Engineering, Adikavi Nannaya University, Rajamahendravaram (A. P), India.

3Dr. Suresh Varma Penumatsa, Professor & Dean of Academics, Department of Computer Science & Engineering, Adikavi Nannaya University, Rajamahendravaram (A. P), India.
4Dr. M. V. Rama Sundari, Ph.D, Department of Computer Science and Systems Engineering, Andhra University, Visakhapatnam (A. P), India.
Manuscript received on September 26, 2021. | Revised Manuscript received on September 30, 2021. | Manuscript published on November 30, 2021. | PP: 1-6 | Volume-10 Issue-4, November 2021. | Retrieval Number: 100.1/ijrte.D65081110421 | DOI: 10.35940/ijrte.D6508.1110421
© The Authors. Published By: Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: The objective of comparing various dimensionality reduction techniques is to reduce feature sets so that attributes can be grouped effectively with less computational processing time and memory utilization. Reduction algorithms can decrease the dimensionality of a dataset consisting of a huge number of interrelated variables while retaining as much as possible of the dissimilarity present in the dataset. In this paper we apply the Standard Deviation, Variance, Principal Component Analysis, Linear Discriminant Analysis, Factor Analysis, Positive Region, Information Entropy and Independent Component Analysis reduction algorithms to massive patient datasets stored in the Hadoop Distributed File System, in order to achieve lossless data reduction and to acquire the required knowledge. The experimental results demonstrate that the ICA technique can operate efficiently on massive datasets, eliminating irrelevant data without loss of accuracy and reducing both the storage space for the data and the computation time compared with the other techniques.
Keywords: Dimensionality Reduction, Data Mining, Independent Component Analysis, Knowledge Reduction, HDFS
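
As a rough illustration of the simplest technique named in the abstract (variance-based reduction), the sketch below drops attributes whose variance does not exceed a threshold. It is not the authors' implementation and does not involve HDFS; the function names, the threshold value and the toy patient records are all hypothetical, chosen only to show the idea on a small in-memory table.

```python
from statistics import pvariance


def low_variance_filter(rows, threshold):
    """Return the indices of columns whose population variance exceeds threshold."""
    columns = list(zip(*rows))  # transpose rows into columns
    return [i for i, col in enumerate(columns) if pvariance(col) > threshold]


def project(rows, keep):
    """Keep only the selected column indices in every row."""
    return [[row[i] for i in keep] for row in rows]


# Hypothetical toy records: [age, constant_flag, systolic_bp]
records = [
    [34, 1, 120],
    [51, 1, 135],
    [47, 1, 128],
    [62, 1, 142],
]

kept = low_variance_filter(records, threshold=0.0)  # assumed threshold
reduced = project(records, kept)
print(kept)
print(reduced)
```

Here the constant column carries no information for grouping patients, so it is removed while the other two attributes are kept unchanged. Techniques such as PCA or ICA go further by transforming the attributes rather than merely selecting among them.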