Frequent Itemset Mining Using Amended k-NN Technique on Heterogeneous Hadoop Clusters
V. Seethalakshmi1, V. Govindasamy2, V. Akila3

1V. Seethalakshmi, Research Scholar, Department of Computer Science and Engineering, Pondicherry Engineering College, Puducherry (Tamil Nadu), India.
2V. Govindasamy, Associate Professor, Department of Information Technology, Pondicherry Engineering College, Puducherry (Tamil Nadu), India.
3V. Akila, Assistant Professor, Department of Computer Science and Engineering, Pondicherry Engineering College, Puducherry (Tamil Nadu), India.
Manuscript received on 22 April 2019 | Revised Manuscript received on 01 May 2019 | Manuscript Published on 07 May 2019 | PP: 8-17 | Volume-7 Issue-6S3 April 2019 | Retrieval Number: F1003376S19/2019©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: With advances in information technology, the amount of data produced and used has grown exponentially in recent years. The need to store and retrieve data in order to derive value from inaccurate information has paved the way for parallel, distributed frameworks such as Hadoop. The existing Hadoop implementation assumes that the computing capacity of all nodes in a cluster is homogeneous, but a cloud infrastructure is composed of different hardware systems with distinct equipment configurations. It is therefore necessary to redesign the data placement strategy in Hadoop so that data is distributed according to the computing capacity of each node. In this paper, we propose a dynamic block placement policy for Hadoop that distributes input data blocks among the nodes according to the processing power of each node. The proposed technique is named Amended k-Nearest Neighbor (Amended k-NN). It can dynamically adjust, balance, and rearrange the input data in a heterogeneous Hadoop environment according to the processing power of every node. The proposed data placement strategy distributes the stored data across the heterogeneous cluster so as to enhance data-processing capability. Our method, at the level of the Hadoop Distributed File System (HDFS), does not consider the correlations among application data. Experimental results reveal that Amended k-Nearest Neighbor improves the Frequent Itemset Mining (FIM) efficiency over the existing FiDoop-DP results by up to 31 percent, with an average of 18 percent.
The proposed Amended k-Nearest Neighbor (Amended k-NN) technique reduces execution time, redundant transactions, and computation cost. We plan to integrate Amended k-Nearest Neighbor with a data-placement system in HDFS on heterogeneous clusters in order to further improve the load balancing performed in HDFS.
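The capacity-aware placement idea described in the abstract can be illustrated with a minimal sketch. This is not the Amended k-NN technique itself, and it does not use any HDFS API; it only shows one plausible way to split a number of input blocks among nodes in proportion to their processing power. The function name `allocate_blocks`, the node names, and the capacity scores are hypothetical and chosen purely for illustration.

```python
from typing import Dict


def allocate_blocks(total_blocks: int, capacity: Dict[str, float]) -> Dict[str, int]:
    """Split total_blocks among nodes in proportion to their capacity.

    Illustrative assumption: `capacity` maps a node name to a positive
    relative processing-power score. Rounding uses the largest-remainder
    method so that the per-node counts always sum to total_blocks.
    """
    if total_blocks < 0:
        raise ValueError("total_blocks must be non-negative")
    if not capacity or any(c <= 0 for c in capacity.values()):
        raise ValueError("capacity must be non-empty with positive scores")

    total_capacity = sum(capacity.values())

    # Exact (fractional) share of blocks for each node.
    exact = {node: total_blocks * c / total_capacity for node, c in capacity.items()}

    # Start from the integer part of each share.
    alloc = {node: int(share) for node, share in exact.items()}

    # Hand out the blocks lost to rounding, largest fractional part first.
    leftover = total_blocks - sum(alloc.values())
    by_remainder = sorted(capacity, key=lambda n: exact[n] - alloc[n], reverse=True)
    for node in by_remainder[:leftover]:
        alloc[node] += 1

    return alloc


if __name__ == "__main__":
    # Hypothetical cluster: "fast" is rated three times as powerful as "slow".
    print(allocate_blocks(12, {"slow": 1.0, "fast": 3.0}))
```

In this sketch a node rated three times as powerful receives three times as many blocks. A dynamic policy such as the one proposed would additionally rebalance blocks as the measured processing power changes, which is not modeled here.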
Keywords: Amended k-NN, k-Nearest Neighbor, Frequent Itemset Mining, FiDoop-DP, Clustering, Data Placement Policy.
Scope of the Article: Clustering