A comparison of observed score approaches to detecting differential item functioning among multiple groups

UNCG Author/Contributor (non-UNCG co-authors, if there are any, appear on document)
Jonathan Darrell Rollins III (Creator)
The University of North Carolina at Greensboro (UNCG)
Web Site: http://library.uncg.edu/
Richard Luecht

Abstract: The overall purpose of this dissertation was to compare various observed score approaches for detecting differential item functioning (DIF) among multiple examinee groups simultaneously. Specifically, this study contributes to the literature by investigating a lasso-constrained observed score method (i.e., logistic regression lasso; LR lasso) in the context of multiple groups, as well as features of test design related to test information targets. Because a lasso-constrained method had not previously been extended to multiple groups using observed scores, comparisons were made with other observed score techniques (i.e., the generalized Mantel-Haenszel χ² and generalized logistic regression) while item response theory (IRT) was used to generate the data, thus avoiding model-data congruity complications in the study design. Multiple variables were manipulated in a simulation study at the test level (e.g., the location of the test information target relative to the central tendency of the examinee population, and the shape of the test information function), at the item level (e.g., the location of DIF items relative to the test information target, and the percentage of DIF items), and for simulees (e.g., the amount of impact and the balance of sample sizes). The relative scarcity of literature exploring DIF in relation to target test information functions, and the near absence of such targets in studies using IRT generation models, motivated their inclusion in this study. Practitioners may find the results useful in judging the merit of adopting the newer lasso method for detecting DIF among multiple groups rather than pre-existing methods. Furthermore, the test design features of this study make the interpretation less theoretical and better aligned with standard operational practices, such as building exams optimized at test information targets.
The results converge in showing that the LR lasso method has inflated Type I error overall with no additional benefit in power. In fact, even when Type I error rates are comparable across methods, LR lasso has a lower hit rate in many instances (i.e., a higher Type II error rate). The sensitivity of LR lasso in detecting DIF items appears to be substantially influenced by an increased number of DIF items on a form. Recommendations for practitioners, as well as limitations and directions for future research, are also provided. Taken collectively, the results of the simulation study support the claim that LR lasso fails to perform comparably with more established methods for multiple-groups DIF detection in numerous instances, though it could have merit in practical situations that have yet to be explored. While some limitations of LR lasso were noted within this study, a variety of other conditions should be explored before practitioners discard the method altogether (a few such studies are suggested). It may well be that the added complexity of regularizing the group-specific model parameters through lasso constraints confounds the detection of DIF items.
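To make the compared method concrete, the following is a minimal, self-contained sketch of a lasso-penalized logistic regression DIF screen for multiple groups, roughly in the spirit of the LR lasso approach discussed in the abstract. It is not the author's implementation: the function name, the penalty strength `C`, and the use of scikit-learn's uniform L1 penalty (which also shrinks the matching-score slope, whereas a faithful LR lasso penalizes only the group and score-by-group terms) are all simplifying assumptions.

```python
# Hedged sketch (not the dissertation's code): per-item L1-penalized
# logistic regression with group dummies (uniform DIF) and
# score-by-group interactions (nonuniform DIF). An item is flagged
# as DIF if any group-related coefficient survives the lasso.
# Simplification: scikit-learn's L1 penalty shrinks every coefficient,
# not just the group terms.
import numpy as np
from sklearn.linear_model import LogisticRegression

def lasso_dif_flags(responses, groups, C=0.5):
    """Flag items whose group-related coefficients are nonzero after lasso.

    responses : (n_examinees, n_items) 0/1 item-response matrix
    groups    : (n_examinees,) integer group labels (0 = reference group)
    C         : inverse lasso strength (smaller = heavier shrinkage)
    """
    n, n_items = responses.shape
    score = responses.sum(axis=1, keepdims=True).astype(float)
    score = (score - score.mean()) / score.std()      # standardized total score
    levels = np.unique(groups)
    # One dummy column per focal group (reference group dropped).
    dummies = np.column_stack([(groups == g).astype(float)
                               for g in levels[1:]])
    interactions = dummies * score                    # nonuniform DIF terms
    X = np.hstack([score, dummies, interactions])
    flags = []
    for j in range(n_items):
        model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        model.fit(X, responses[:, j])
        group_coefs = model.coef_[0][1:]              # skip the score slope
        flags.append(bool(np.any(np.abs(group_coefs) > 1e-8)))
    return flags
```

In use, one would simulate responses from an IRT model (as in the study design), pass the 0/1 response matrix and group labels to `lasso_dif_flags`, and inspect which items are flagged; the choice of `C` plays the role of the lasso tuning parameter and drives the Type I / Type II trade-off discussed above.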

Additional Information

Language: English
Date: 2018
Keywords: Differential Item Functioning, Multiple Groups DIF, Psychometrics, Simulation Study
Educational tests and measurements -- Design and construction
Psychological tests -- Design and construction
Examinations -- Design and construction
