Evaluating the Quality of Exam Questions: A Multidimensional Item Response Analysis
Faiz Zulkifli1, Rozaimah Zainal Abidin2, Zulkifley Mohamed3

1Faiz Zulkifli, Department of Computer and Mathematical Sciences, Universiti Teknologi MARA, Perak Branch, Tapah Campus, Tapah Road, Perak, Malaysia.
2Rozaimah Zainal Abidin, Department of Computer and Mathematical Sciences, Universiti Teknologi MARA, Perak Branch, Tapah Campus, Tapah Road, Perak, Malaysia.
3Zulkifley Mohamed, Department of Mathematics Science and Mathematics, Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, Malaysia.
Manuscript received on 11 October 2019 | Revised Manuscript received on 20 October 2019 | Manuscript Published on 02 November 2019 | PP: 606-612 | Volume-8 Issue-2S11 September 2019 | Retrieval Number: B10940982S1119/2019©BEIESP | DOI: 10.35940/ijrte.B1094.0982S1119
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: This research proposes a new approach for evaluating the quality of exam questions. Exam results were obtained from students taking the statistics and probability course at Universiti Teknologi MARA (UiTM). The exam comprises 10 questions with 30 items of varying difficulty. The results of 214 students were extracted from the iCGPA system. Multidimensional Item Response Analysis (MIRA) was applied with the 1PL (Rasch), 2PL and 3PL models to evaluate the quality of the exam questions. The models were estimated using the Metropolis-Hastings Robbins-Monro (MH-RM) algorithm in an R package. Model fit was compared using the log-likelihood, SE, AIC and BIC statistics. An item-fit statistic and the Zh statistic were calculated to identify item misfit and person misfit. In terms of model fit, all three models yield acceptable and nearly identical statistics. 5 items are considered misfit by the 1PL model. For the 2PL and 3PL models, 5 items are categorized as misfit. The reduction in the number of misfit items can be attributed to the additional information incorporated into the IRA model. On the other hand, the analysis of person fit gives different misfit percentages across the IRA models. This is probably because most students answered all the questions very well. In conclusion, the quality of the exam questions for the statistics and probability course needs to be improved by increasing the difficulty of questions that incorporate higher-order thinking skills.
Keywords: Exam Quality, Item Misfit, Multidimensional Item Response Analysis, Person Misfit.
Scope of the Article: Quality Assurance Process, Standards, and Systems
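
The abstract compares 1PL, 2PL and 3PL models using the log-likelihood, AIC and BIC. As a rough illustration only, the Python sketch below shows the standard compensatory multidimensional 3PL item response function (which reduces to the 2PL when the guessing parameter is zero, and to the Rasch/1PL form when all slopes are additionally fixed to one) together with the usual AIC and BIC formulas. It is not the authors' code: the study used an R package, and the function names, parameter names and all numeric values here are hypothetical and chosen purely for illustration.

```python
import math


def m3pl_probability(theta, slopes, intercept, guessing=0.0):
    """Probability of a correct response under a compensatory
    multidimensional 3PL model in slope-intercept form.

    theta     -- latent trait values of one person (one per dimension)
    slopes    -- item discrimination parameters (one per dimension)
    intercept -- item easiness parameter
    guessing  -- lower asymptote; 0.0 gives the 2PL model
    """
    if len(theta) != len(slopes):
        raise ValueError("theta and slopes must have the same length")
    logit = sum(a * t for a, t in zip(slopes, theta)) + intercept
    return guessing + (1.0 - guessing) / (1.0 + math.exp(-logit))


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 log L (smaller is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


if __name__ == "__main__":
    # Hypothetical person and item, for illustration only.
    person = [0.0, 0.0]
    print(m3pl_probability(person, slopes=[1.0, 1.0], intercept=0.0))
    print(m3pl_probability(person, slopes=[1.0, 1.0], intercept=0.0, guessing=0.2))

    # Hypothetical log-likelihood and parameter count; n_obs matches the
    # 214 students mentioned in the abstract.
    print(aic(-100.0, n_params=3))
    print(bic(-100.0, n_params=3, n_obs=214))
```

Because the 3PL model adds a guessing parameter per item, AIC and BIC penalize it more heavily than the 2PL or 1PL models; near-identical values across the three models, as reported in the abstract, would suggest that the extra parameters add little to overall fit.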