Drought Prediction using Geo-Spatial Big Data
A Abisha1, R Beulah Jayakumari2, D Doreen Hephzibah Miriam3

1A Abisha, PG Student, Department of Computer Science and Engineering, Jerusalem College of Engineering, Chennai (Tamil Nadu), India.
2Dr. R Beulah Jayakumari, Associate Professor, Department of Computer Science and Engineering, Jerusalem College of Engineering, Chennai (Tamil Nadu), India.
3Dr. D Doreen Hephzibah Miriam, Director, Computational Intelligence Research Foundation, Chennai (Tamil Nadu), India.
Manuscript received on 14 July 2019 | Revised Manuscript received on 10 August 2019 | Manuscript Published on 29 August 2019 | PP: 132-136 | Volume-8 Issue-2S5 July 2019 | Retrieval Number: B10280682S519/2019©BEIESP | DOI: 10.35940/ijrte.B1028.0782S519
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Digital processing requires large amounts of storage space. The continuous growth of data such as text, images, audio, video, data-center data, and backups leads to several problems in both storage and retrieval. In this paper, drought analysis and prediction are carried out using big data processing tools such as Hadoop and Hive, which can process huge volumes of data at higher speed. Previously, drought was analyzed and predicted with traditional techniques such as the AVISO model, which are complex to process, require more processing time, cannot handle huge data, and have more security issues, such as malware in the database and abuse of privileges. For the analysis, a drought dataset with more than ten lakh (one million) entries is processed, and the drought type is found using a MapReduce algorithm that maps and reduces the data using numerical summarization. The drought types D0, D1, D2, D3, and D4 are analyzed to obtain the reduced output, and the resulting drought types are clustered using Hive. For prediction, a random forest algorithm acts as the predictor: it creates multiple decision trees and finds the best split among them. Finally, the predicted output is visualized using a time series model. Hadoop is an open-source software framework for storing and processing data efficiently, even when the data size is very large. It uses the Hadoop Distributed File System (HDFS) for storage and MapReduce for processing. Hive is a query processing tool built on top of Hadoop that provides a Structured Query Language (SQL)-like language called HiveQL (HQL). In this paper, Hive is used to cluster the data obtained from MapReduce. Using Big Data tools thus improves performance by more than 50% compared to the traditional system.
Keywords: Big Data, Hadoop, Hive, Random Forest.
Scope of the Article: Big Data Networking
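
Illustrative sketch: the map and reduce steps with numerical summarization mentioned in the abstract can be pictured roughly as follows. This is a minimal single-process Python illustration, not the authors' Hadoop job. The record layout (region, drought category, percent of area affected), the sample values, and all function names are assumptions made only for this example.

```python
from collections import defaultdict

# Hypothetical input records: (region, drought_category, percent_area_affected).
# These values are invented for illustration and are not from the paper's dataset.
RECORDS = [
    ("A", "D0", 40.0),
    ("A", "D1", 20.0),
    ("B", "D0", 60.0),
    ("B", "D3", 10.0),
    ("C", "D1", 30.0),
]


def map_record(record):
    """Map step: emit one (category, (count, area)) pair per input record."""
    _region, category, area = record
    yield category, (1, area)


def shuffle(pairs):
    """Group mapped values by key, as a MapReduce framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_category(category, values):
    """Reduce step: numerical summarization (count, min, max, mean) for one key."""
    count = sum(c for c, _area in values)
    areas = [area for _c, area in values]
    stats = {
        "count": count,
        "min": min(areas),
        "max": max(areas),
        "mean": sum(areas) / count,
    }
    return category, stats


def summarize(records):
    """Run map, shuffle, and reduce in sequence over an in-memory record list."""
    mapped = (pair for record in records for pair in map_record(record))
    grouped = shuffle(mapped)
    return dict(reduce_category(key, vals) for key, vals in sorted(grouped.items()))


if __name__ == "__main__":
    for category, stats in summarize(RECORDS).items():
        print(category, stats)
```

In an actual Hadoop deployment the grouping between the two phases is handled by the framework across nodes; here it is simulated in memory so that the summarization logic can be read on its own.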