Managing Student Performance: A Predictive Analytics using Imbalanced Data
Usman Ashfaq1, Booma P. M.2, Raheem Mafas3
1Usman Ashfaq, Department of Computing, Engineering & Technology, Asia Pacific University of Technology & Innovation (APU), Kuala Lumpur, Malaysia.
2Dr. Booma P. M., Department of Computing, Engineering & Technology, Asia Pacific University of Technology & Innovation (APU), Kuala Lumpur, Malaysia.
3Raheem Mafas, Department of Computing, Engineering & Technology, Asia Pacific University of Technology & Innovation (APU), Kuala Lumpur, Malaysia.
Manuscript received on March 16, 2020. | Revised Manuscript received on March 24, 2020. | Manuscript published on March 30, 2020. | PP: 2277-2283 | Volume-8 Issue-6, March 2020. | Retrieval Number: E7008018520/2020©BEIESP | DOI: 10.35940/ijrte.E7008.038620
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Big data has revolutionized every field of life, including human learning. The field of education has progressed over the past couple of decades, and rapid growth in the number of educational institutions has created tough competition among them. The massive accumulation of data in the educational sector has created considerable scope for EDM (Educational Data Mining) supported by robust predictive models. Regularly examining student performance is necessary to help students perform better, which in turn helps maintain the reputation of the institution. This study proposes a predictive model that forecasts student performance from various student characteristics. The KDD (Knowledge Discovery in Databases) methodology was followed stepwise to develop the predictive models. The data balancing techniques SMOTE (Synthetic Minority Over-sampling Technique) and ADASYN (Adaptive Synthetic Sampling) were employed to handle the class imbalance in the data, which causes biased predictions. To select significant features, the techniques FCBF (Fast Correlation Based Feature selection) and RFE (Recursive Feature Elimination) were used. The EDM algorithms Random Forest (RF), Support Vector Machine (SVM) and Artificial Neural Network (ANN) were used to predict student performance, with hyper-parameters tuned by random search to enhance model performance. The results were cross-validated using an ensemble method and benchmarked against previous studies. The random forest model achieved the highest accuracy, 86%, after data balancing and careful selection of significant features.
Keywords: Predictive Algorithm, Data Balancing, Educational Data Mining, Feature Selection, Student Performance.
Scope of the Article: Data Mining.
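Illustrative note: the SMOTE balancing step named in the abstract can be sketched in a few lines. This is a minimal sketch of the general SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the authors' implementation; the function name `smote`, its parameters, and the toy data are all hypothetical and chosen only for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper illustrating the SMOTE idea; not the paper's code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # random position along the segment from base to neighbour
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (assumed for illustration): 6 majority vs 3 minority points
majority = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.4], [0.4, 0.2]]
minority = [[2.0, 2.0], [2.2, 2.1], [2.1, 2.4]]

new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

After oversampling, both classes have the same number of samples, and every synthetic point lies on a line segment between two real minority samples. In practice a maintained library implementation would normally be preferred over a hand-written one, and oversampling would be applied to the training split only so that synthetic points do not leak into evaluation.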