Scoring methods of innovative items

UNCG Author/Contributor (non-UNCG co-authors, if there are any, appear on document)
Bradley J. Ungurait (Creator)
The University of North Carolina at Greensboro (UNCG)
Richard Luecht

Abstract: Advancements in technology and computer-based testing have allowed for greater flexibility in assessing examinee knowledge on large-scale, high-stakes assessments. Computer-based delivery makes it possible to assess cognitive abilities and skills effectively and cost-efficiently, and to measure domains that are difficult or even impossible to measure with traditional paper-and-pencil assessments. Current educational methodology focuses on providing realistic problems that require examinees to connect knowledge, processes, and strategies in order to find a solution. As large-scale assessment programs move away from a paper-based format to computer-based delivery, they are beginning to investigate and incorporate innovative item types. A contemporary measurement technique widely used by large-scale assessment programs to model examinee data is Item Response Theory (IRT). Several key assumptions in IRT must be fulfilled to ensure that item and examinee parameter estimates are valid. Local item independence is one critical assumption that is directly related to the estimation process. When this assumption is violated and not accounted for during item calibration and latent ability estimation, bias can be introduced into the parameter estimates. The purpose of this study is to explore the effects that two scoring strategies have on the residual covariance structure of an assessment of reading comprehension. The residual covariance of item scores can be analyzed as an indicator of departure from unidimensionality and item independence. What is being assessed is whether the violation of unidimensionality is small enough for item parameter estimation to be insensitive to it. No assessment, however, strictly satisfies the assumption of unidimensionality. Any nuisance factors that are not detected in a test for dimensionality could eventually accumulate in the model-fit statistics.
Even if the assumption of unidimensionality is met, it is important to identify items that display dependency and to find ways to handle it. Results from this dissertation suggest that when items are scored as unique measurement opportunities, items within a context-based assessment display item dependency. The degree of item dependency among a subset of items affects internal reliability estimation but does not affect the unidimensional structure of the data. However, the impact of item dependency on testlet information differs across scoring methods and calibration models. When a high degree of item dependency is present, testlet information is overestimated when items are scored as unique measurement opportunities. Estimates of examinee ability are highly correlated regardless of which scoring method or IRT model is used.
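
The residual-covariance check mentioned in the abstract can be illustrated with a minimal sketch. This is not the dissertation's procedure. It assumes a Rasch (one-parameter) model with known ability and difficulty values, uses simulated rather than real responses, and correlates item residuals in the style of Yen's Q3 statistic. The function names, the parameter values, and the mechanism used to induce dependence between two items are all hypothetical.

```python
import numpy as np

def rasch_prob(theta, b):
    """P(correct) under a Rasch model; rows are examinees, columns are items."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

def residual_correlations(responses, theta, b):
    """Correlate item residuals (observed minus model-expected) across examinees."""
    residuals = responses - rasch_prob(theta, b)
    return np.corrcoef(residuals, rowvar=False)

# Simulated data (assumption: abilities and difficulties are known, not estimated).
rng = np.random.default_rng(0)
n_examinees = 5000
theta = rng.normal(size=n_examinees)
b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])

# Responses that satisfy local independence given theta.
responses = (rng.random((n_examinees, b.size)) < rasch_prob(theta, b)).astype(float)

# Hypothetical dependence: for about 60% of examinees, item 1 repeats item 0's score.
copy_mask = rng.random(n_examinees) < 0.6
responses[copy_mask, 1] = responses[copy_mask, 0]

q3 = residual_correlations(responses, theta, b)
print(np.round(q3, 2))
```

In this simulation the residual correlation for the dependent pair (items 0 and 1) is clearly positive, while pairs that are independent given ability stay near zero. In practice ability and difficulty are estimated from the data, which shifts the expected residual correlation of independent items slightly below zero.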

Additional Information

Language: English
Date: 2021
Innovative Items, Local Item Dependence, Scoring
Item response theory
Educational tests and measurements
