Enemy item detection using data mining methods

UNCG Author/Contributor (non-UNCG co-authors, if there are any, appear on document)
John B. Weir II (Creator)
Institution
The University of North Carolina at Greensboro (UNCG)
Web Site: http://library.uncg.edu/
Advisor
Richard Luecht

Abstract: Enemy items are any two items that should not appear on the same test form. These items may address the same material, or one may provide clues about the answer to another. Most enemy item pairs are identified before forms are published; subject matter experts (SMEs) manually review forms for enemy pairs, a process that can be both cognitively taxing and expensive. Some have suggested statistical approaches for identifying enemy item pairs; for instance, response data might show violations of local independence caused by clueing. One drawback of these approaches, however, is that they are post hoc: the forms must already have been administered to a sufficient number of examinees. This study proposed a method of identifying enemy item pairs that capitalized on two data mining approaches: latent Dirichlet allocation (LDA), an unsupervised topic model, and a random forest classifier, a supervised ensemble learning algorithm. Output from the LDA model was used to calculate the Jensen-Shannon distance (JSD) between items. Random forests were trained with and without the JSD as a predictor, alongside several other item-level variables. Item pairs were scored using the resulting random forest classifiers, and SMEs evaluated the output. The random forest classifier was then retrained using input from the SMEs. This study suggests that random forest models can be useful in the identification of enemy item pairs: information derived from the LDA topic model improves the performance of the random forest classifier, and integrating feedback from SMEs improves it further.
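The abstract describes computing the Jensen-Shannon distance (JSD) between items from LDA output. A minimal Python sketch of that step follows. It assumes each item has already been reduced to a vector of topic proportions by a fitted LDA model (for example, the per-document topic distribution) and assumes base-2 logarithms, so the distance falls between 0 and 1. The function names and sample items are hypothetical and are not taken from the dissertation.

```python
import math
from itertools import combinations


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits (base-2 logs, an assumption).

    Terms where p_i == 0 contribute nothing, by the usual 0 * log(0) = 0 convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence between two topic distributions.

    With base-2 logs the divergence, and therefore the distance, lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("topic distributions must have the same number of topics")
    # Midpoint distribution M = (P + Q) / 2; m_i > 0 wherever p_i > 0 or q_i > 0,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding error


def pairwise_jsd(item_topics):
    """Return {(item_a, item_b): distance} for every unordered pair of items.

    item_topics maps a (hypothetical) item ID to its LDA topic proportions.
    """
    return {
        (a, b): jensen_shannon_distance(item_topics[a], item_topics[b])
        for a, b in combinations(sorted(item_topics), 2)
    }


if __name__ == "__main__":
    # Hypothetical three-topic proportions for three items (illustrative only).
    item_topics = {
        "item_001": [0.70, 0.20, 0.10],
        "item_002": [0.65, 0.25, 0.10],
        "item_003": [0.05, 0.15, 0.80],
    }
    # Smaller distances indicate more similar topic content, i.e. pairs that
    # might merit a closer look as potential enemies.
    for pair, d in sorted(pairwise_jsd(item_topics).items(), key=lambda kv: kv[1]):
        print(pair, round(d, 3))
```

Each pair-level distance could then serve as one feature in a pair classifier such as a random forest. How the dissertation combined it with its other item-level variables is not reproduced here.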

Additional Information

Publication
Dissertation
Language: English
Date: 2019
Keywords
Enemy Items, Latent Dirichlet Allocation, Random Forests, Test Assembly
Subjects
Examinations -- Design and construction
Machine learning
Data mining