A Framework for Forecasting Outbreak of Infectious Diseases Based on Climate Variability and Social Media Content
Juliet Johny1, Linda Sara Mathew2

1Juliet Johny, Computer Science and Engineering, Mar Athanasius College of Engineering, Kothamangalam, Kerala, India.
2Linda Sara Mathew, Computer Science and Engineering, Mar Athanasius College of Engineering, Kothamangalam, Kerala, India.

Manuscript received on January 06, 2021. | Revised Manuscript received on January 15, 2021. | Manuscript published on January 30, 2021. | PP: 118-124 | Volume-9 Issue-5, January 2021. | Retrieval Number: 100.1/ijrte.E5204019521 | DOI: 10.35940/ijrte.E5204.019521
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: The amount of data has risen significantly over the last few years, owing to the popularity of data generation sources such as social media, electronic health records, sensors and online shopping sites. Analyzing, processing and storing this data is important because it helps uncover hidden patterns and unknown correlations. In this context, a big data analysis and prediction system is proposed that combines weather observations, health data and social media content to forecast outbreaks of infectious diseases in a locality. Information about the determinants of disease outbreaks is required to reduce their effects on populations. An in-mapper combiner based MapReduce algorithm is used to calculate the mean of daily measurements of climate parameters such as temperature, atmospheric pressure, relative humidity, solar and wind. The climatic parameter that may lead to the outbreak of a disease is identified by computing the correlation between each parameter and the disease incidence count. To evaluate how users' tweeting patterns and sentiments match the outbreak of diseases, all tweets containing disease-related keywords are collected using the Twitter streaming API, then analyzed and processed using the Spark framework. The performance of the proposed model improves when tweet processing is included. This indicates that real-time analysis of social media data can provide more effective results than working on historical data.
Keywords: Apache Spark, Hadoop MapReduce, Kafka, Spark MLlib
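The in-mapper combining pattern mentioned in the abstract can be illustrated with a small single-process simulation. This is a sketch under stated assumptions, not the authors' implementation: the record format `date,parameter,value` and the names `InMapperCombiningMapper`, `reduce_mean` and `run_job` are hypothetical. Each mapper keeps running (sum, count) pairs in memory across its whole input split and emits them once at the end (in Hadoop's Java API this step would typically sit in the Mapper's `cleanup()` method), so far fewer pairs are shuffled than with one pair per record.

```python
from collections import defaultdict


class InMapperCombiningMapper:
    """Hypothetical mapper: aggregates (sum, count) per key in memory
    across all records of its split and emits them only in close()."""

    def __init__(self):
        # (date, parameter) -> [running_sum, running_count]
        self.partial = defaultdict(lambda: [0.0, 0])

    def map(self, line):
        # Assumed record format: "date,parameter,value"
        date, parameter, value = line.strip().split(",")
        acc = self.partial[(date, parameter)]
        acc[0] += float(value)
        acc[1] += 1

    def close(self):
        # One (sum, count) pair per key instead of one pair per record.
        for key, (total, count) in self.partial.items():
            yield key, (total, count)


def reduce_mean(key, pairs):
    """Combine partial (sum, count) pairs into the exact mean.
    Partial means are not emitted, because an average of averages
    is wrong when the splits hold different numbers of records."""
    total, count = 0.0, 0
    for s, c in pairs:
        total += s
        count += c
    return key, total / count


def run_job(splits):
    """Simulate the map, shuffle and reduce phases in one process."""
    shuffled = defaultdict(list)
    for split in splits:
        mapper = InMapperCombiningMapper()
        for line in split:
            mapper.map(line)
        for key, value in mapper.close():
            shuffled[key].append(value)
    return dict(reduce_mean(k, v) for k, v in sorted(shuffled.items()))


# Illustrative readings only (not data from the paper), spread over two splits.
splits = [
    ["2020-07-01,temperature,28.0", "2020-07-01,temperature,30.0",
     "2020-07-01,humidity,80"],
    ["2020-07-01,temperature,32.0", "2020-07-01,humidity,90"],
]
means = run_job(splits)
print(means)
```

With these inputs the daily mean temperature is (28 + 30 + 32) / 3 = 30.0; averaging the per-split means (29.0 and 32.0) would instead give 30.5, which is why the sketch carries sums and counts through the shuffle.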
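The abstract identifies the climatic parameter linked to an outbreak by correlating each parameter with the disease incidence count, but it does not say which coefficient is used. The sketch below assumes the Pearson correlation coefficient and ranks parameters by its absolute value; the function names `pearson` and `rank_parameters` and all numbers are hypothetical, chosen only to illustrate the computation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def rank_parameters(daily_means, incidence):
    """Return (parameter, r) pairs sorted by |r|, strongest first.
    A negative r still counts as a strong (inverse) association."""
    scores = {p: pearson(vals, incidence) for p, vals in daily_means.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical daily means for five days and the matching case counts.
incidence = [2, 4, 6, 8, 10]
daily_means = {
    "temperature": [30, 29, 31, 30, 29],
    "humidity": [70, 75, 80, 85, 90],
    "pressure": [1012, 1011, 1008, 1007, 1004],
}
ranking = rank_parameters(daily_means, incidence)
for name, r in ranking:
    print(f"{name}: r = {r:+.3f}")
```

On this toy data humidity correlates perfectly with incidence, pressure strongly and inversely, and temperature only weakly. A real analysis would also need to consider time lags between weather and reported cases, and a high correlation alone does not establish that a parameter causes an outbreak.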