Title | Date | Views | Brief Description |
An Examination of the Residual Covariance Structures of Complex Performance Exercises Under Various Scaling and Scoring Methods |
2008 |
4843 |
Large-scale assessment programs are increasingly including complex performance exercises along with traditional multiple-choice items in a given test. These performance assessments are developed in part to measure sets of skills that are part of the ... |
Modeling differential pacing trajectories in high stakes computer adaptive testing using hierarchical linear modeling and structural equation modeling |
2006 |
2159 |
"This study compares two statistical methods for modeling changes in response latency (timing) patterns on a high-stakes adaptive test: (1) hierarchical linear modeling (HLM2) and (2) growth modeling using structural equation modeling (SEM). The test... |
Relationships between examinee pacing and observed item responses: results from a multi-factor simulation study and an operational high stakes assessment |
2009 |
2210 |
The use of response time in testing has a relatively long history, ranging from concerns over test speededness to using response times as performance indicators (e.g., speed and accuracy). This model-based investigation examined the relationship betw... |
A comparison of traditional test blueprinting and item development to assessment engineering in a licensure context |
2010 |
3741 |
With the need for larger and larger banks of items to support adaptive testing and to meet security concerns, large-scale item generation is a requirement for many certification and licensure programs. As part of the mass production of items, it i... |
Item parameter changes and equating: an examination of the effects of lack of item parameter invariance on equating and score accuracy for different proficiency levels |
2013 |
6288 |
The impact of particular types of context effects on actual scores is less understood although there has been some research carried out regarding certain types of context effects under the nonequivalent anchor test (NEAT) design. In addition, the iss... |
The optimal design of the dual-purpose test |
2013 |
2112 |
Traditional test development focused on one purpose of the test, either ranking test-takers or providing diagnostic profiles for test-takers. Embedding both the ranking and diagnostic purposes in one assessment instrument would be a great advancement... |
An investigation on computer-adaptive multistage testing panels for multidimensional assessment |
2013 |
3542 |
Computer-adaptive multistage testing (ca-MST) has been developed as an alternative to computerized adaptive testing (CAT) and has been increasingly adopted in large-scale assessments. Current research and practice only focus on ca-MST panels for cre... |
Conditions affecting the accuracy of classical equating methods for small samples under the NEAT design: a simulation study |
2011 |
5242 |
Small sample equating remains a largely unexplored area of research. This study attempts to fill in some of the research gaps via a large-scale, IRT-based simulation study that evaluates the performance of seven small-sample equating methods under va... |
A comparison of observed score approaches to detecting differential item functioning among multiple groups |
2018 |
984 |
The overall purpose of this dissertation was to compare various observed score approaches in detecting differential item functioning among multiple examinee groups simultaneously. Specifically, this study contributes to the literature base by investi... |
Operationalizing item difficulty modeling in a medical certification context |
2020 |
661 |
This research study modeled item difficulty in general pediatric test items using content, cognitive complexity, linguistic, and text-based variables. The research first presents an introduction which addresses the current shortcomings found in item ... |
Principled assessment as a foundation for standard setting |
2015 |
1908 |
This study investigated the impact of using Assessment Engineering (AE) task models as the unit of judgment in a standard setting workshop. The proposed method, or Task Model-based Standard Setting (TMSS), used a procedure similar to that of the Book... |
A simulation study to investigate optimal equating anchor set construction practices under the NEAT design |
2018 |
983 |
This study examines anchor set construction techniques in observed score test equating under the non-equivalent with anchor-test design. It differs from other studies in that it seeks to understand the interaction between the examinee abilities, test... |
Optimal characteristics of anchor tests in vertical scaling: a special case of non equivalent groups with anchor test (NEAT) design in vertical scaling |
2019 |
638 |
There are multiple empirical issues and complications associated with vertical scaling methods that have not been sufficiently explicated even though there has been scanty research conducted within the general framework of the nonequivalent group wit... |
A reconceptualization of IRT calibration with DIF items in a PROMIS Fatigue measure |
2022 |
45 |
Differential item functioning (DIF) is a statistical procedure intended for examining and evaluating test fairness. After DIF items are detected, there are three methods to deal with DIF items, which are to ignore DIF items, remove DIF items, and cre... |
Detecting test cheating using a deterministic, gated item response theory model |
2010 |
7478 |
High-stakes tests are widely used as measurement tools to make inferences about test takers' proficiency, achievement, competence or knowledge. The stakes may be directly related to test performance, such as obtaining a high-school diploma, being gra... |
The effects of routing and scoring within a computer adaptive multi-stage framework |
2014 |
1884 |
This dissertation examined the overall effects of routing and scoring within a computer adaptive multi-stage framework (ca-MST). Testing in a ca-MST environment has become extremely popular in the testing industry. Testing companies enjoy its efficie... |
Enemy item detection using data mining methods |
2019 |
1905 |
Enemy items are any two items that should not appear on the same test form. These items may address the same material, or one may provide clues about the answer to another. Most enemy item pairs are identified before forms are published; subject matt... |
Quality control and the impact of variation and prediction errors on item family design |
2024 |
34 |
This two-part study examined the impact of variation within item families and errors associated with predicted item difficulty parameters on examinee test scores. Part A served as an extension of Shu et al.’s (2010) study to address how much variatio... |
Data collection design for equivalent groups equating: using a matrix stratification framework for mixed-format assessment |
2012 |
3694 |
Mixed-format assessments are increasingly being used in large scale standardized assessments to measure a continuum of skills ranging from basic recall to higher order thinking skills. These assessments are usually comprised of a combination of (a) m... |
Scoring methods of innovative items |
2021 |
197 |
Advancements in technology and computer-based testing have allowed for greater flexibility in assessing examinee knowledge on large-scale, high-stakes assessments. Through computer-based delivery, cognitive ability and skills can be effectively assess... |