The optimal design of the dual-purpose test

UNCG Author/Contributor (non-UNCG co-authors, if there are any, appear on document)
Xiao Luo (Creator)
The University of North Carolina at Greensboro (UNCG)
Richard Luecht

Abstract: Traditional test development focused on a single purpose, either ranking test-takers or providing diagnostic profiles for them. Embedding both the ranking and diagnostic purposes in one assessment instrument would be a great advancement in test functionality and utility. Our understanding of how such a dual-purpose test should be optimally designed and analyzed, however, was dwarfed by the growing need for it in practice. Potential psychometric challenges related to dual-purpose testing were not fully addressed in the literature. The present study provided a systematic comparison of plausible design and analysis paradigms for the dual-purpose test under conditions varying in test length and in the dimensionality of true abilities. Results suggested that, in order to obtain accurate and reliable total scores and subscores, the test should be designed to be multidimensional, with at least 10 items per domain, and analyzed using a multidimensional IRT model. Specifically, the unidimensional dual-purpose test was able to produce reliable and accurate, but not diagnostically meaningful, scores. Subscores obtained from an essentially unidimensional test either failed to provide added value over the total score according to the PRMSE criterion or were homogeneous with each other according to disattenuated correlations. The idiosyncratic multidimensional design was able to yield accurate, reliable, and diagnostically useful scores, but the validity of the diagnostic subscores was questionable because their correlations disagreed with the true correlational structure. Consequently, even though the subscores were identified as distinct from the total score according to the PRMSE criterion, they were still nearly identical to each other according to the disattenuated correlations.
On the other hand, the principled multidimensional design showed slightly lower accuracy and reliability in scores because of the principled "simple structure" of the test design, but this sacrifice ensured the interpretability and validity of the diagnostic subscores, whose empirical correlational structure approximated the true structure. Furthermore, with respect to calibration methods, unidimensional calibration was found to fail to distinguish among subscores, and thus to fail to give them useful diagnostic information, even though the subscores sometimes appeared more accurate and reliable than those obtained with the other two calibrations. The confirmatory multidimensional calibration and the separate unidimensional calibration delivered very comparable results. Finally, alternative scoring methods were found to be either inappropriate to use or to offer only insignificant improvements over the raw scores.
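
The disattenuated correlation mentioned in the abstract is commonly computed by dividing the observed correlation between two subscores by the square root of the product of their reliabilities. The following is a minimal sketch of that standard correction, not the study's own code; the function name and the numeric values are hypothetical and chosen only for illustration.

```python
import math


def disattenuated_correlation(r_xy, rel_x, rel_y):
    """Correct an observed correlation for unreliability in both scores.

    r_xy  : observed correlation between two subscores
    rel_x : reliability of the first subscore (0 < rel_x <= 1)
    rel_y : reliability of the second subscore (0 < rel_y <= 1)
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical values, not taken from the study.
observed_r = 0.72
reliability_a = 0.80
reliability_b = 0.80
print(round(disattenuated_correlation(observed_r, reliability_a, reliability_b), 3))
```

A corrected value close to 1 suggests that two subscores measure nearly the same thing, which is the sense in which the abstract describes subscores as "nearly identical to each other."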

Additional Information

Language: English
Date: 2013
Keywords: Diagnostic score, Dual-purpose test, Multidimensional test, Principled design
Subjects: Educational tests and measurements -- Design and construction
