Frequent Item Set Mining for Data Streaming using Spark with Pincer Search Algorithm
Biman Giri¹, Sivagami M², Maheswari N³

¹Biman Giri, SCSE, Vellore Institute of Technology, Chennai (Tamil Nadu), India.
²Sivagami M, SCSE, Vellore Institute of Technology, Chennai (Tamil Nadu), India.
³Maheswari N, SCSE, Vellore Institute of Technology, Chennai (Tamil Nadu), India.
Manuscript received on 22 April 2019 | Revised Manuscript received on 01 May 2019 | Manuscript Published on 07 May 2019 | PP: 40-45 | Volume-7 Issue-6S3 April 2019 | Retrieval Number: F1009376S19/2019©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Data streaming is the continuous, high-speed transfer of large volumes of data over a network. Streaming data comes in many forms, such as log files generated by web or mobile applications, e-commerce purchases, information from social networks, financial trading floors, or geospatial services, and telemetry from connected devices. A particularly challenging task in e-commerce applications is finding frequent item sets: the high rate of data transfer makes it difficult to find the maximal frequent item sets in online transactions. Applications such as market analysis, network security, sensor networks, and web tracking currently use association rules to find frequent item sets in data streams. Mining maximal frequent item sets goes one step beyond mining association rules: it aims to find the frequent item sets that have no frequent superset, so that every other frequent item set is a subset of one of them. Pincer Search is a well-known algorithm for finding maximal frequent item sets and, through them, all of their subsets. It combines top-down and bottom-up search to find the item sets whose frequency meets a threshold value. The proposed work adapts and tunes the Pincer Search algorithm for real-time data streaming applications. The proposed system generates the streaming data using Apache Flume. Data is then sampled randomly from the real-time stream, and the Pincer Search algorithm is applied to it on the Apache Spark platform. A sample result screen is shown to demonstrate the use of the Pincer Search algorithm in data streaming applications.
Keywords: Maximal Frequent Item Set, Data Streaming, Web Tracking, Pincer Search Algorithm, Apache Flume and Apache Spark.
Scope of the Article: Data Mining
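
The two-directional search described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and not the full Pincer Search algorithm: it runs in plain Python on an in-memory list of transactions (no Apache Flume or Apache Spark), and it omits the pruning and recovery steps that let Pincer Search skip bottom-up levels. The function name `maximal_frequent_itemsets`, the parameter `min_support` (taken here as an absolute transaction count), and the sample transactions are assumptions made for illustration. The bottom-up pass is an Apriori-style level-wise search; the top-down pass keeps a set of maximal frequent candidates (`mfcs`) that is split whenever the bottom-up pass finds an infrequent item set.

```python
from itertools import combinations


def maximal_frequent_itemsets(transactions, min_support):
    """Return the maximal frequent itemsets as a set of frozensets.

    Simplified two-directional search (hypothetical helper): no pruning
    or recovery steps, min_support is an absolute count (assumption).
    """
    transactions = [frozenset(t) for t in transactions]
    items = frozenset().union(*transactions)
    if not items:
        return set()
    cache = {}  # itemset -> True/False, so each itemset is counted once

    def is_frequent(itemset):
        if itemset not in cache:
            count = sum(1 for t in transactions if itemset <= t)
            cache[itemset] = count >= min_support
        return cache[itemset]

    mfcs = {items}  # top-down: maximal frequent candidate set
    mfs = set()     # MFCS elements already confirmed frequent
    candidates = {frozenset([i]) for i in items}  # bottom-up level k = 1
    k = 1
    while candidates:
        # Top-down: a frequent MFCS element is a maximal frequent itemset.
        mfs |= {m for m in mfcs if is_frequent(m)}
        frequent, infrequent = set(), set()
        for c in candidates:
            # Subsets of a known maximal frequent set need no counting.
            covered = any(c <= m for m in mfs)
            (frequent if covered or is_frequent(c) else infrequent).add(c)
        # Split every MFCS element that contains an infrequent itemset.
        for s in infrequent:
            for m in [m for m in mfcs if s <= m]:
                mfcs.discard(m)
                for item in s:
                    smaller = m - {item}
                    if smaller and not any(smaller <= o for o in mfcs):
                        mfcs.add(smaller)
        # Bottom-up: Apriori join, then keep candidates whose
        # (k-1)-item subsets are all frequent.
        k += 1
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        candidates = {c for c in joined
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
    # Every minimal infrequent itemset has been used to split the MFCS,
    # so the remaining elements are exactly the maximal frequent itemsets.
    return mfcs


demo = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b", "d"}, {"b", "d"}, {"c"}]
result = maximal_frequent_itemsets(demo, min_support=2)
print(sorted(sorted(m) for m in result))  # [['a', 'b', 'c'], ['b', 'd']]
```

In this sample, the item set {a, b, c} is confirmed frequent by the top-down pass at the third level, so the bottom-up candidate {a, b, c} is accepted without being counted again. In the full algorithm this kind of shortcut is what allows whole bottom-up levels to be skipped when maximal item sets are long.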