Open Access
Modeling Rater Effects and Complex Learning Progressions using Item Response Models
Hyo Jeong Shin
- 01 Jan 2015
TL;DR: Shin investigated extensions and applications of multilevel and multidimensional item response models, with a primary focus on detecting rater effects in double-scored performance assessments, monitoring human raters with an automated scoring engine, and developing measurement models for complicated learning progressions.
Abstract: Author(s): Shin, Hyo Jeong | Advisor(s): Wilson, Mark | Abstract: This dissertation comprises three papers that propose and apply psychometric models to address complexities and challenges in large-scale assessments, focusing on modeling rater effects and complex learning progressions. Specifically, the three papers investigate extensions and applications of multilevel and multidimensional item response models, with a primary focus on (1) detecting rater effects in double-scored performance assessments, (2) monitoring human raters with an automated scoring engine, and (3) developing measurement models for complicated learning progressions.

The first paper applies and assesses the trifactor model for multiple-ratings data in double-scored performance assessments, in which two different raters independently score the same responses (e.g., the GRE essay). In addition to the general dimension (e.g., examinees), the trifactor model incorporates a cross-classified structure (e.g., items and raters). The paper includes a simulation design modeled on the GRE example to reflect the incompleteness and imbalance of real-world assessments. The simulations investigate the effects of the missingness rate in the data and of ignoring differences among raters. The use of the trifactor model is illustrated with empirical data.

The second paper applies mixed-effects ordered probit models to examine the effectiveness and efficiency of using scores from automated scoring engines (AE), compared with scores from human experts (HE), to monitor and provide diagnostic feedback to human raters in training.
Using data from an actual rater training study, three types of rater effects (the severity, accuracy, and centrality of each rater) are related to model parameters and compared for two cases: (a) when the AE scores are treated as the true scores and (b) when the HE scores are treated as the true scores.

The third paper proposes a structured constructs model based on change-point analysis to handle complicated learning progressions in which relations between levels across multiple constructs are hypothesized in advance. Building on change-point analysis and on reparameterizations of the multidimensional Rasch model and the partial credit model, the model incorporates cut score parameters, which classify examinees into the levels of the learning progressions, and discontinuity parameters, which model each hypothesized relation as an advantage that examinees at a given level of one construct have in reaching a level of another construct. Simulations assess the parameter recovery of the proposed model and the consequences of ignoring the hypothesized relations. The use of the proposed model is illustrated with empirical data, and the results are interpreted as contributing validity evidence for the hypothesized relations.
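The third paper's model description can be made more concrete with a small sketch. It assumes two constructs, partial credit model (PCM) item responses, cut scores on construct 1 that classify examinees into levels, and a single discontinuity parameter (called `gamma` here) that adds an advantage on construct 2 for examinees at or above a target level on construct 1. The function names (`pcm_probs`, `level_of`, `scm_item_probs`) and this exact parameterization are hypothetical illustrations, not the dissertation's specification, and person abilities are treated as known rather than estimated.

```python
import math
from bisect import bisect_right


def pcm_probs(theta, deltas):
    """Partial credit model category probabilities for one item.

    theta  -- person ability on the item's construct (logits)
    deltas -- step difficulties delta_1..delta_m (logits)
    Returns probabilities for scores 0..m. A dichotomous Rasch item
    is the special case with a single step difficulty.
    """
    # Cumulative sums of (theta - delta_j); score 0 has an empty sum of 0.
    numerators = [0.0]
    for d in deltas:
        numerators.append(numerators[-1] + (theta - d))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(numerators)
    weights = [math.exp(n - top) for n in numerators]
    total = sum(weights)
    return [w / total for w in weights]


def level_of(theta, cut_scores):
    """Classify an ability into level 0..len(cut_scores).

    cut_scores must be sorted ascending; an ability equal to a cut
    score is placed in the higher level.
    """
    return bisect_right(cut_scores, theta)


def scm_item_probs(theta1, theta2, deltas2, cuts1, target_level, gamma):
    """Hypothetical structured-constructs-style item on construct 2.

    Examinees whose construct-1 level is at least target_level receive
    an additive advantage gamma (the assumed discontinuity parameter)
    on construct 2 before the PCM probabilities are computed.
    """
    bonus = gamma if level_of(theta1, cuts1) >= target_level else 0.0
    return pcm_probs(theta2 + bonus, deltas2)


if __name__ == "__main__":
    cuts1 = [-0.5, 0.8]    # two cut scores give three levels on construct 1
    deltas2 = [-0.3, 0.6]  # a three-category item on construct 2
    # Same construct-2 ability, different construct-1 levels (0 vs. 2).
    low = scm_item_probs(-1.0, 0.2, deltas2, cuts1, target_level=2, gamma=0.7)
    high = scm_item_probs(1.2, 0.2, deltas2, cuts1, target_level=2, gamma=0.7)
    print("level 0 on construct 1:", [round(p, 3) for p in low])
    print("level 2 on construct 1:", [round(p, 3) for p in high])
```

With the same construct-2 ability, the examinee classified at level 2 on construct 1 has a higher probability of the top score on the construct-2 item. Under these assumptions, that shift is how a discontinuity parameter encodes a hypothesized cross-construct relation.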