Diabetes Mellitus Prediction using Ensemble Machine Learning Techniques
Jyoti¹, Peri Arjun²

¹Dr. Jyoti Verma, Assistant Professor, Department of Computer Science, J.C. Bose University of Science and Technology, Haryana, India.
²Peri Arjun, Student, Department of Computer Science, J.C. Bose University of Science and Technology, Haryana, India.

Manuscript received on May 25, 2020. | Revised Manuscript received on June 29, 2020. | Manuscript published on July 30, 2020. | PP: 312-316 | Volume-9 Issue-2, July 2020. | Retrieval Number: B3480079220/2020©BEIESP | DOI: 10.35940/ijrte.B3480.079220
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: The healthcare industry is inundated with patient data, and the volume grows manifold each day. Researchers have been using this data to help the healthcare industry improve the way major diseases are handled, and to inform patients in time of symptoms so that the major hazards associated with them can be avoided. Diabetes is one such disease, and it is growing at an alarming rate today. It can cause numerous severe complications, including blurred vision, myopia, burning extremities, and kidney and heart failure. It occurs when blood sugar levels exceed a certain threshold, or when the body cannot produce enough insulin to keep them below that threshold. Patients affected by diabetes must therefore be informed so that proper treatment can be taken to control it, which makes early prediction and classification of diabetes significant. This work uses machine learning algorithms to improve the accuracy of diabetes prediction. The dataset, after processing with principal component analysis (PCA) and K-means clustering, was fed to an ensemble model. Its results were compared with those of random forest, Support Vector Machine (SVM), Decision Tree, Multilayer Perceptron (MLP), and Naïve Bayes classifiers applied to the same dataset, with all methods evaluated using 10-fold cross-validation. Our ensemble method produced only eight incorrectly classified instances, the fewest among the methods compared. The experiments also showed that the ensemble classifier models performed better than the base classifiers alone.
Keywords: Diabetes, Machine learning, Ensemble, Dataset.
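The abstract reports results as the number of incorrectly classified instances under 10-fold cross-validation, with an ensemble compared against its base classifiers. The sketch below is a minimal, hedged illustration of that evaluation protocol only, under several assumptions: the ensemble combines its members by hard majority voting (the abstract does not state the combination rule), the data is a hypothetical synthetic two-feature set rather than the paper's dataset, and three simple stand-in classifiers (nearest centroid, 3-nearest-neighbours, a one-feature decision stump) replace the paper's random forest, SVM, Decision Tree, MLP, and Naïve Bayes models. The PCA and K-means preprocessing is omitted. All function names (`make_toy_data`, `fit_majority_vote`, `cross_val_errors`, and so on) are hypothetical.

```python
import random
from collections import Counter


def make_toy_data(n=100, seed=0):
    """Hypothetical synthetic data (NOT the paper's dataset): 2 features, label 1 = diabetic."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 1.5 if label else -1.5
        data.append(([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)], label))
    return data


def mean(values):
    return sum(values) / len(values)


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


# Each "fit" function takes training pairs (features, label) and returns a predict(x) function.
def fit_nearest_centroid(train):
    # One centroid per class; predict the class whose centroid is closest.
    centroids = {}
    for label in {y for _, y in train}:
        points = [x for x, y in train if y == label]
        centroids[label] = [mean(col) for col in zip(*points)]
    return lambda x: min(centroids, key=lambda c: sq_dist(x, centroids[c]))


def fit_knn(train, k=3):
    # Majority label among the k closest training points.
    def predict(x):
        nearest = sorted(train, key=lambda pair: sq_dist(x, pair[0]))[:k]
        return Counter(y for _, y in nearest).most_common(1)[0][0]
    return predict


def make_stump(feature):
    # Decision stump: threshold one feature halfway between the two class means.
    def fit(train):
        m0 = mean([x[feature] for x, y in train if y == 0])
        m1 = mean([x[feature] for x, y in train if y == 1])
        threshold = (m0 + m1) / 2
        high = 1 if m1 > m0 else 0
        return lambda x: high if x[feature] > threshold else 1 - high
    return fit


def fit_majority_vote(fitters):
    # Assumption: hard majority voting over the base classifiers; the paper's
    # combination rule is not stated. An odd number of voters avoids ties.
    def fit(train):
        models = [f(train) for f in fitters]
        return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]
    return fit


def cross_val_errors(data, fit, k=10, seed=1):
    # k-fold cross-validation: every instance is held out exactly once;
    # returns the total number of incorrectly classified instances.
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in order if i not in held_out]
        predict = fit(train)
        errors += sum(predict(data[i][0]) != data[i][1] for i in fold)
    return errors


if __name__ == "__main__":
    data = make_toy_data()
    base = {
        "nearest centroid": fit_nearest_centroid,
        "3-NN": fit_knn,
        "stump on feature 0": make_stump(0),
    }
    methods = dict(base, ensemble=fit_majority_vote(list(base.values())))
    for name, fit in methods.items():
        print(f"{name}: {cross_val_errors(data, fit)} misclassified of {len(data)}")
```

Run as a script, it prints one misclassification count per method. On synthetic data these counts say nothing about the paper's results, and majority voting is not guaranteed to beat its strongest member; the point is only how "incorrectly classified instances" accumulates across the ten held-out folds.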