Comparing three multilevel frameworks for the detection of differential item functioning

Elizabeth Adele Patton (Creator)
The University of North Carolina at Greensboro (UNCG)
Robert Henson

Abstract: Multilevel data complicate the accumulation of validation evidence. Applying a single-level approach to differential item functioning (DIF) in the presence of multilevel data is both theoretically and statistically unsound. This simulation study compares three multilevel frameworks for the detection of DIF: Begg's Mantel-Haenszel adjustment, the multilevel Rasch model, and the SIBTEST bootstrapped standard error adjustment. Five conditions were varied: the magnitude of DIF, the social-unit-level sample size, the presence of impact, the degree of correlation within clusters, and the ratio of the reference group to the focal group. The results suggest that Begg's Mantel-Haenszel adjustment is superior with respect to Type I error and power rates. However, the multilevel Rasch model produced more accurate and precise estimates of effect size. Additionally, the multilevel Rasch model has the potential to provide more nuanced information regarding the causes of item bias.
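To make the baseline concrete, the following is a minimal sketch of the standard (single-level) Mantel-Haenszel DIF statistic that Begg's adjustment builds on. It is not the author's implementation, and the clustering adjustment itself is not shown; the example data are invented for illustration. Each stratum is a 2x2 table, conventionally tabulated at one total-score level, of reference/focal group members answering the studied item correctly or incorrectly.

```python
import math

def mh_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across score-level strata.

    strata: list of (A, B, C, D) tuples, where A/B are the reference
    group's correct/incorrect counts and C/D are the focal group's
    correct/incorrect counts at that score level.
    """
    num = sum(A * D / (A + B + C + D) for A, B, C, D in strata)
    den = sum(B * C / (A + B + C + D) for A, B, C, D in strata)
    return num / den

def ets_delta(alpha_mh):
    """Transform the MH odds ratio to the ETS delta scale.

    Negative values indicate DIF against the focal group; by ETS
    convention, |delta| >= 1.5 is commonly flagged as large DIF.
    """
    return -2.35 * math.log(alpha_mh)

# Two invented strata where both groups perform identically,
# so the odds ratio is 1 and the delta is 0 (no DIF).
strata = [(10, 10, 10, 10), (20, 5, 20, 5)]
alpha = mh_odds_ratio(strata)
print(alpha, ets_delta(alpha))
```

Begg's adjustment keeps this statistic but inflates its variance estimate to account for the within-cluster correlation that multilevel data induce, which is why it appears in the study as a multilevel framework.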

Additional Information

Language: English
Date: 2019
Keywords: Accountability, Differential item functioning, Multilevel differential item functioning, Validation
Educational tests and measurements -- Statistics
Educational tests and measurements -- Validity
Test bias -- Evaluation
