Prediction, Cross Validation and Classification in the Presence COVID-19 of Indian States and Union Territories using Machine Learning Algorithms
P. Arumugam¹, V. Kadhirveni², R. Lakshmi Priya³, Manimannan G⁴

¹ P. Arumugam, Professor, Department of Statistics, Manonmaniam Sundaranar University, Tirunelveli (Tamil Nadu), India.
² V. Kadhirveni, Research Scholar, Department of Statistics, Manonmaniam Sundaranar University, Tirunelveli (Tamil Nadu), India.
³ R. Lakshmi Priya, Assistant Professor, Department of Statistics, Dr. Ambedkar Government Arts College, Vyasarpadi, Chennai (Tamil Nadu), India.
⁴ Manimannan G*, Assistant Professor, Department of Statistics, TMG College of Arts and Science, Chennai (Tamil Nadu), India.

Manuscript received on March 11, 2021. | Revised Manuscript received on April 30, 2021. | Manuscript published on May 30, 2021. | PP: 16-20 | Volume-10 Issue-1, May 2021. | Retrieval Number: 100.1/ijrte.A56590510121 | DOI: 10.35940/ijrte.A5659.0510121
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license.

Abstract: The present study predicts, cross-validates and classifies COVID-19 data using four machine learning algorithms on four major parameters, namely confirmed cases, recoveries, deaths and active cases. The secondary data were collected from the Ministry of Health and Family Welfare Department (MHFWD) for the Indian States and Union Territories up to March 2021. The algorithms used for classification and prediction are Support Vector Machine (SVM), k-Nearest Neighbours (k-NN), Random Forest and Logistic Regression. First, k-means cluster analysis was performed, which identified five meaningful clusters labelled Very Low, Low, Moderate, High and Very High according to the average values of the four major parameters. The five clusters were then cross-validated using the four machine learning algorithms, and the affected States were visualized using the predicted classes and their probabilities. The four models achieved cross-validation accuracies of 88%, 97%, 91% and 91%. The results clearly identify Maharashtra as the Very Highly Affected State; Tamil Nadu, Kerala, Andhra Pradesh and Karnataka as Highly Affected States; Delhi, Uttar Pradesh and West Bengal as Moderately Affected States; and Assam, Bihar, Chhattisgarh, Haryana, Gujarat, Madhya Pradesh, Odisha, Punjab, Rajasthan and Telangana as Low Affected States. The remaining States and Union Territories have Very Low levels of COVID-19 cases.
Keywords: COVID-19, Machine Learning Algorithms, Prediction, Cross Validation and Classification.
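The workflow summarized in the abstract (k-means with five clusters, naming the clusters by their average values, then checking the cluster labels with a cross-validated classifier) can be illustrated with a minimal sketch in plain Python. This is not the authors' code. Several choices are assumptions: the z-score standardization, the deterministic farthest-point seeding for k-means, ranking clusters by the mean of their centroid coordinates, and round-robin 5-fold cross-validation. Only k-NN (with a hypothetical n_neighbors=3) is shown; SVM, Random Forest and Logistic Regression are not reproduced. The helper names and the data are hypothetical and synthetic, not MHFWD figures.

```python
import math
from collections import Counter
from statistics import mean, pstdev

# Severity names from the abstract, lowest to highest.
SEVERITY = ["Very Low", "Low", "Moderate", "High", "Very High"]


def standardize(rows):
    """Z-score each column so no single parameter dominates the distances (assumption)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def farthest_point_init(points, k):
    """Deterministic seeding (assumption): start at the first point, then
    repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means: alternate nearest-centroid assignment and mean update."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels, centroids


def name_clusters(centroids):
    """Rank clusters by the average of their centroid coordinates and map
    the ranks onto the five severity names (expects exactly five clusters)."""
    order = sorted(range(len(centroids)), key=lambda c: mean(centroids[c]))
    return {c: SEVERITY[rank] for rank, c in enumerate(order)}


def knn_predict(train_X, train_y, x, n_neighbors=3):
    """Majority vote among the n_neighbors closest training rows."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in nearest[:n_neighbors])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, folds=5):
    """k-fold CV with round-robin fold assignment (row i goes to fold i % folds);
    every row is predicted exactly once by a model that never saw it."""
    correct = 0
    for f in range(folds):
        test = [i for i in range(len(X)) if i % folds == f]
        train = [i for i in range(len(X)) if i % folds != f]
        tx, ty = [X[i] for i in train], [y[i] for i in train]
        correct += sum(knn_predict(tx, ty, X[i]) == y[i] for i in test)
    return correct / len(X)


# Hypothetical synthetic data (not MHFWD figures): 5 groups of 6 regions,
# each row standing in for [confirmed, recovered, deaths, active] on an arbitrary scale.
data = [[100 * level + j + f for f in range(4)] for level in range(5) for j in range(6)]
X = standardize(data)
labels, centroids = kmeans(X, k=5)
names = name_clusters(centroids)
y = [names[c] for c in labels]
acc = cross_val_accuracy(X, y, folds=5)
print(y[0], y[-1], acc)  # Very Low Very High 1.0
```

Because the synthetic groups are far apart, every region lands in its intended cluster and k-NN recovers the labels perfectly. On real, overlapping State-level data the accuracy would be lower. Swapping k-NN for another classifier only requires replacing `knn_predict`. Round-robin folds are used here for reproducibility; shuffled or stratified folds are common alternatives.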