Data Cleaning in Cloud Platform
V Ramya1, Jayasimha S R2

1V Ramya, PG Student Department of Master of Computer Applications RV College of Engineering®, Bangalore.
2Jayasimha S R, Assistant Professor Department of Master of Computer Applications RV College of Engineering®, Bangalore.

Manuscript received on April 30, 2020. | Revised Manuscript received on May 06, 2020. | Manuscript published on May 30, 2020. | PP: 2535-2539 | Volume-9 Issue-1, May 2020. | Retrieval Number: A3088059120/2020©BEIESP | DOI: 10.35940/ijrte.A3088.059120
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license.

Abstract: Data is valuable and is generated in large volumes. Using high-quality data has become a major challenge, because such data helps people make better decisions, analyses, and predictions. Much of the data around us contains errors, and data cleaning is a slow, complicated, and costly task. Data polishing is important because errors must be removed before the data is transferred to a data warehouse; eliminating poor-quality data is necessary to obtain the desired results. Error-free data produces precise and accurate results when queried, so consistent and correct data is required for decision making. The two components of data polishing are data association and data repairing. Association identifies homogeneous objects and links each one to its most closely associated object. Repairing makes the database reliable by finding and fixing faults. Big data applications do not use all of the existing data; they use only subsets of appropriate data. Association converts extensive amounts of raw data into such useful subsets. Once the appropriate data is obtained, it is analyzed, which leads to knowledge [14]. Multiple approaches are used to associate the given data and to obtain meaningful, useful knowledge for fixing or repairing it [12]. Data polishing refers to maintaining the quality of data. The objectives of data polishing are usually not well defined. This paper discusses the goals of data cleaning and different approaches for data cleaning platforms.
Keywords: Polishing, Clustering, Association, Deduplication, Repairing.
Scope of the Article: Clustering
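
As an illustration of the two components named in the abstract, the following is a minimal sketch of repairing followed by association and deduplication. It is not the method of this paper: the function names (`repair`, `associate`, `merge`, `polish`), the sample records, and the choice of the email field as the matching key are all assumptions made for the example.

```python
from collections import defaultdict

def normalize(value):
    """Trim whitespace, collapse inner spaces, and lowercase a string.
    Lowercasing is a simplification chosen for this sketch."""
    return " ".join(value.split()).lower()

def repair(record):
    """Repairing: return a cleaned copy; empty strings become None."""
    fixed = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = normalize(value)
            fixed[key] = value if value else None
        else:
            fixed[key] = value
    return fixed

def associate(records, key_fields):
    """Association: group records that agree on the chosen key fields."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record.get(field) for field in key_fields)
        groups[key].append(record)
    return groups

def merge(group):
    """Deduplication: keep the first non-missing value for each field."""
    merged = {}
    for record in group:
        for key, value in record.items():
            if merged.get(key) is None:
                merged[key] = value
    return merged

def polish(records, key_fields):
    """Repair every record, then associate and merge duplicates."""
    repaired = [repair(r) for r in records]
    return [merge(g) for g in associate(repaired, key_fields).values()]

# Hypothetical sample data: the first two rows describe the same person.
raw = [
    {"name": "  Asha  Rao ", "email": "ASHA@example.com", "city": ""},
    {"name": "asha rao", "email": "asha@example.com ", "city": "Bangalore"},
    {"name": "Vikram N", "email": "vikram@example.com", "city": "Mysore"},
]
cleaned = polish(raw, key_fields=("email",))
for row in cleaned:
    print(row)
```

In this sketch the repair step runs first so that records differing only in case or spacing fall into the same group; exact key matching is the simplest form of association, and a real platform would likely use fuzzy matching or clustering instead.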