Prediction of Diabetics using Machine Learning
G. Geetha¹, K. Mohana Prasad²

¹G. Geetha*, Research Scholar, Sathyabama Institute of Science and Technology, Chennai.
²Dr. K. Mohana Prasad, Associate Professor, Sathyabama Institute of Science and Technology, Chennai.
Manuscript received on January 02, 2020. | Revised Manuscript received on January 15, 2020. | Manuscript published on January 30, 2020. | PP: 1119-1124 | Volume-8 Issue-5, January 2020. | Retrieval Number: E6290018520/2020©BEIESP | DOI: 10.35940/ijrte.E6290.018520

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license.

Abstract: Around 50.9 million people in India suffer from diabetes, and Tamil Nadu ranks second among Indian states. The main objective of this paper is to develop a predictive model from the medical data of patients with and without diabetes. Through this paper, we aim to create hybrid models that doctors can easily use when treating patients with diabetes. Naïve Bayes and Random Forest algorithms are used to predict whether or not a person has diabetes on the basis of their health measurements. This enables doctors to group and classify cases by disease type so that appropriate treatment can be given. The Pima Indian dataset was used to study and analyze the data, alongside data mining techniques. The data originate from the National Institute of Diabetes and Digestive and Kidney Diseases and contain several medical predictor variables and one target variable. Initially, we replace the null values in the dataset with the mean values of the respective columns. We then split the dataset into a training set and a testing set in four different ratios, 85/15, 80/20, 70/30 and 60/40, and perform the analysis on each. After preparing the dataset, we apply the Naïve Bayes and Random Forest algorithms to it. The Naïve Bayes algorithm estimates the probability of each class from the features, treating the features/columns as conditionally independent given the class. The dataset is given as input and the prediction is made by the NB model. The Random Forest algorithm is used here to perform feature selection. It takes n inputs from the dataset and builds numerous, largely decorrelated decision trees during training. It then outputs the class that is the mode of the classes predicted by the individual trees.
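The preprocessing and Naïve Bayes steps above can be sketched without external libraries. This is a minimal illustration, not the authors' code: it assumes a Gaussian likelihood per feature (the abstract does not say which Naïve Bayes variant was used), and the helpers `impute_column_means`, `train_test_split`, `GaussianNaiveBayes`, `make_toy_data` and `evaluate_splits`, as well as the two-feature synthetic data, are hypothetical stand-ins rather than Pima records.

```python
# Illustrative sketch: all names and the synthetic data below are hypothetical, not from the paper.
import math
import random
from statistics import mean, pvariance


def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in the same column."""
    n_cols = len(rows[0])
    col_means = [mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def train_test_split(X, y, test_fraction, seed=0):
    """Shuffle the row indices and hold out `test_fraction` of them for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(X) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx])


class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian likelihood per (class, feature) pair."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.priors, self.params = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.priors[c] = len(rows) / len(X)
            # zip(*rows) yields the class's columns; the epsilon avoids zero variance
            self.params[c] = [(mean(col), pvariance(col) + 1e-9) for col in zip(*rows)]
        return self

    def log_posterior(self, x, c):
        """Unnormalized log P(c | x) = log P(c) + sum_j log N(x_j; mu_cj, var_cj)."""
        total = math.log(self.priors[c])
        for xj, (mu, var) in zip(x, self.params[c]):
            total += -0.5 * math.log(2 * math.pi * var) - (xj - mu) ** 2 / (2 * var)
        return total

    def predict(self, X):
        return [max(self.classes, key=lambda c: self.log_posterior(x, c)) for x in X]


def make_toy_data(n_per_class=100, seed=42):
    """Hypothetical glucose-like and BMI-like features; synthetic, NOT Pima records."""
    rng = random.Random(seed)
    X, y = [], []
    for label, (glucose_mu, bmi_mu) in enumerate([(100.0, 24.0), (160.0, 36.0)]):
        for _ in range(n_per_class):
            X.append([rng.gauss(glucose_mu, 10.0), rng.gauss(bmi_mu, 3.0)])
            y.append(label)
    for i in range(0, len(X), 17):  # blank out some entries to mimic missing values
        X[i][i % 2] = None
    return X, y


def evaluate_splits(X, y, test_fractions=(0.15, 0.20, 0.30, 0.40)):
    """Naive Bayes accuracy for the 85/15, 80/20, 70/30 and 60/40 splits."""
    results = {}
    for frac in test_fractions:
        X_tr, y_tr, X_te, y_te = train_test_split(X, y, frac)
        predictions = GaussianNaiveBayes().fit(X_tr, y_tr).predict(X_te)
        results[frac] = sum(p == t for p, t in zip(predictions, y_te)) / len(y_te)
    return results


X, y = make_toy_data()
X = impute_column_means(X)  # imputation on the full dataset, then splitting
for frac, acc in evaluate_splits(X, y).items():
    print(f"{round((1 - frac) * 100)}/{round(frac * 100)} split: accuracy {acc:.3f}")
```

Imputing before splitting, in the order the abstract describes, lets test rows influence the column means; computing the means on the training split alone and reusing them for the test split avoids that leakage. In practice, scikit-learn's `SimpleImputer`, `train_test_split` and `GaussianNB` cover the same steps.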
Keywords: Diabetes, Machine learning, Random Forest, Data mining, Naïve Bayes Classification.
Scope of the Article: Machine Learning.
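The Random Forest step summarized in the abstract (trees built on resampled data, prediction by the mode of the trees' outputs, use for feature selection) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: each "tree" is a depth-one stump rather than a full decision tree, feature relevance is scored by mean decrease in Gini impurity, and `RandomForestStumps`, `fit_stump`, `make_data` and the synthetic three-feature data are hypothetical. The abstract does not state the tree depth, the number of trees, or how features were ranked.

```python
# Illustrative sketch: all names and the synthetic data below are hypothetical, not from the paper.
import random
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def fit_stump(X, y, features):
    """Depth-one tree: the (feature, threshold) split with the largest Gini gain.

    Returns (feature, threshold, left_label, right_label, gain).
    """
    majority = Counter(y).most_common(1)[0][0]
    best = (features[0], float("inf"), majority, majority, 0.0)  # fallback: one leaf
    parent = gini(y)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not right:  # the largest value leaves the right side empty
                continue
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if gain > best[4]:
                best = (f, t, Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0], gain)
    return best


class RandomForestStumps:
    """Bagged decision stumps with random feature subsets and majority voting."""

    def __init__(self, n_trees=15, max_features=2, seed=0):
        self.n_trees, self.max_features = n_trees, max_features
        self.rng = random.Random(seed)

    def fit(self, X, y):
        n, n_features = len(X), len(X[0])
        self.trees, self.importances = [], [0.0] * n_features
        for _ in range(self.n_trees):
            boot = [self.rng.randrange(n) for _ in range(n)]  # rows drawn with replacement
            feats = self.rng.sample(range(n_features), self.max_features)
            stump = fit_stump([X[i] for i in boot], [y[i] for i in boot], feats)
            self.trees.append(stump)
            self.importances[stump[0]] += stump[4] / self.n_trees  # mean Gini decrease
        return self

    def predict(self, X):
        # Each stump votes; the forest outputs the mode of the votes.
        return [Counter(left if x[f] <= t else right
                        for f, t, left, right, _ in self.trees).most_common(1)[0][0]
                for x in X]


def make_data(n_per_class, seed):
    """Hypothetical data: features 0 and 1 depend on the class, feature 2 is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(4.0 * label, 1.0), rng.gauss(3.0 * label, 1.0),
                      rng.gauss(0.0, 1.0)])
            y.append(label)
    return X, y


X_train, y_train = make_data(60, seed=1)
X_test, y_test = make_data(40, seed=2)
forest = RandomForestStumps(n_trees=15, max_features=2, seed=0).fit(X_train, y_train)
predictions = forest.predict(X_test)
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(f"test accuracy: {accuracy:.3f}")
print("importances:", [round(v, 3) for v in forest.importances])
```

With `max_features=2` out of three features, every stump sees at least one informative feature, so the noise column should receive the lowest importance score and would be the first candidate to drop in a feature-selection pass. An odd number of trees also avoids tied votes in this two-class setting. scikit-learn's `RandomForestClassifier` exposes a comparable, normalized impurity-based score as `feature_importances_`.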