An Evaluation of IntelliMetric™ Essay Scoring System
Keywords:
Automated Essay Scoring, Raters, Rater Consistency
Abstract
This report provides a two-part evaluation of the IntelliMetric™ automated essay scoring system based on its performance scoring essays from the Analytic Writing Assessment (AWA) of the Graduate Management Admission Test™ (GMAT™). The IntelliMetric system's performance is first compared to that of individual human raters, a Bayesian system employing simple word counts, and a weighted probability model, using more than 750 responses to each of six prompts. The second, larger evaluation compares the IntelliMetric system's ratings to those of human raters using approximately 500 responses to each of 101 prompts. Results from both evaluations suggest the IntelliMetric system is a consistent, reliable system for scoring AWA essays, with perfect + adjacent agreement on 96% to 98% of instances in Evaluation 1 and 92% to 100% in Evaluation 2. The Pearson r correlations of agreement between human raters and the IntelliMetric system averaged .83 in both evaluations.
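The agreement statistics reported in the abstract (perfect + adjacent agreement and the Pearson r) can be computed directly from paired human and machine scores. The following Python sketch is not taken from the article; it is a minimal illustration assuming integer essay ratings on the GMAT AWA's 0 to 6 scale, and all function and variable names are hypothetical.

```python
import numpy as np

def agreement_stats(human, machine):
    """Compute perfect, adjacent (within one point), and
    perfect + adjacent agreement rates, plus the Pearson r,
    between two paired sets of essay scores."""
    human = np.asarray(human, dtype=float)
    machine = np.asarray(machine, dtype=float)
    diff = np.abs(human - machine)

    perfect = np.mean(diff == 0)            # identical scores
    within_one = np.mean(diff <= 1)         # identical or one point apart
    adjacent = within_one - perfect         # exactly one point apart
    pearson_r = np.corrcoef(human, machine)[0, 1]

    return {
        "perfect": perfect,
        "adjacent": adjacent,
        "perfect_plus_adjacent": within_one,
        "pearson_r": pearson_r,
    }

# Hypothetical example: six essays scored by a human rater and by the system.
human_scores = [4, 5, 3, 6, 4, 2]
machine_scores = [4, 5, 4, 6, 3, 2]
print(agreement_stats(human_scores, machine_scores))
```

Under this convention, the 96% to 98% figures cited above correspond to `perfect_plus_adjacent`, the proportion of essays on which the two raters differ by at most one score point.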
Published
2006-03-29
How to Cite
Rudner, L. M., Garcia, V., & Welch, C. (2006). An Evaluation of IntelliMetric™ Essay Scoring System. The Journal of Technology, Learning and Assessment, 4(4). Retrieved from https://ejournals.bc.edu/index.php/jtla/article/view/1651