Mining of Completion Rate of Higher Education Based on Fuzzy Feature Selection Model and Machine Learning Techniques
Tahseen A. Wotaifi1, Eman S. Al-Shamery2

1Tahseen A. Wotaifi, College of Information Technology, University of Babylon, Hillah, Babil, Iraq.
2Eman S. Al-Shamery, College of Information Technology, University of Babylon, Hillah, Babil, Iraq.
Manuscript received on 19 September 2019 | Revised Manuscript received on 06 October 2019 | Manuscript Published on 11 October 2019 | PP: 393-400 | Volume-8 Issue-2S10 September 2019 | Retrieval Number: B10670982S1019/2019©BEIESP | DOI: 10.35940/ijrte.B1067.0982S1019
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (

Abstract: Given the major changes in the labor market and the higher education sector, considerable attention is paid to individuals who hold an academic degree, the so-called graduate class. However, each educational institution takes a different approach towards students who wish to complete their university degree. This study aims at (1) identifying the most important factors that directly affect completion, and (2) predicting the completion rates of students for university degrees under the higher education system of the United States. Unlike previous studies, this work applies fuzzy logic to three feature selection methods, namely Correlation Attribute Evaluation, Relief Attribute Evaluation, and the Gain Ratio method. Since these three methods assign different weights to the same attribute, fuzzy logic is used to derive a single weight for each attribute. A major challenge faced throughout this study is the curse of dimensionality, because the College Scorecard dataset released by the US Department of Education contains approximately 8,000 educational institutions and 1,825 features. Applying the proposed method to identify the important features reduces them to only 79. Two models are then used to predict the completion rates of students for their university studies, Random Forest and Support Vector Regression, which achieve a Mean Absolute Error (MAE) of 0.068 and 0.097, respectively.
Keywords: Completion Prediction of Students, Fuzzy-Selection Method, Filter Method, Mining Higher Education, Random Forest, and Support Vector Regression.
Scope of the Article: Machine Learning
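
Illustrative sketch: the abstract states that fuzzy logic merges the three feature selection weights into one weight per attribute, and that the models are compared by MAE, but it does not give the membership functions or rules. The Python sketch below is therefore only one possible reading, not the authors' method. Everything in it is an assumption made for illustration: the min-max normalization, the three triangular sets (low, medium, high), taking each set's strength as the maximum over the three selectors, the weighted-average defuzzification, and the made-up scores.

```python
from typing import Dict, List, Sequence, Tuple


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale raw selector scores to [0, 1] so the selectors are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function with feet at a and c and peak at b."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets over the normalized range [0, 1].
# The outer feet lie outside the range so that 0 and 1 get full membership.
FUZZY_SETS: Dict[str, Tuple[float, float, float]] = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}
# Hypothetical crisp output value attached to each set.
CENTERS: Dict[str, float] = {"low": 0.0, "medium": 0.5, "high": 1.0}


def fuse(scores: Sequence[float]) -> float:
    """Fuse the normalized scores of one feature into a single weight."""
    # Strength of each set = strongest membership among the selectors.
    strength = {
        name: max(triangular(s, *abc) for s in scores)
        for name, abc in FUZZY_SETS.items()
    }
    total = sum(strength.values())
    # Weighted-average defuzzification over the set centers.
    return sum(strength[name] * CENTERS[name] for name in FUZZY_SETS) / total


def fuse_weights(*selector_scores: Sequence[float]) -> List[float]:
    """Normalize each selector's scores, then fuse them feature by feature."""
    normalized = [min_max(scores) for scores in selector_scores]
    return [fuse(per_feature) for per_feature in zip(*normalized)]


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error, the metric reported in the abstract."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Made-up scores for four features from the three selectors.
    correlation = [0.10, 0.40, 0.90, 0.05]
    relief = [0.02, 0.03, 0.08, 0.01]
    gain_ratio = [0.20, 0.60, 0.70, 0.10]

    weights = fuse_weights(correlation, relief, gain_ratio)
    # Keep the two features with the largest fused weight.
    ranked = sorted(range(len(weights)), key=weights.__getitem__, reverse=True)
    print("fused weights:", weights)
    print("top features:", ranked[:2])
```

In a setting like the one described, the fused weights would be used to rank all 1,825 features and keep the top-ranked subset before training the regressors; the cut-off that yields 79 features is not given in the abstract and is not assumed here.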