An Evaluation of IntelliMetric™ Essay Scoring System

Authors

  • Lawrence M. Rudner, GMAC
  • Veronica Garcia, GMAC
  • Catherine Welch, Assessment Innovations at ACT, Inc.

Keywords

Automated Essay Scoring, Raters, Rater consistency

Abstract

This report provides a two-part evaluation of the IntelliMetric™ automated essay scoring system based on its performance scoring essays from the Analytic Writing Assessment (AWA) of the Graduate Management Admission Test™ (GMAT™). The IntelliMetric system's performance is first compared to that of individual human raters, a Bayesian system employing simple word counts, and a weighted probability model, using more than 750 responses to each of six prompts. The second, larger evaluation compares the IntelliMetric system's ratings to those of human raters using approximately 500 responses to each of 101 prompts. Results from both evaluations suggest the IntelliMetric system is a consistent, reliable system for scoring AWA essays, with perfect-plus-adjacent agreement on 96% to 98% of responses in the first evaluation and 92% to 100% in the second. Pearson r correlations between human raters and the IntelliMetric system averaged .83 in both evaluations.
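The agreement measures named in the abstract (exact agreement, adjacent agreement, and Pearson r) are standard rater-consistency statistics. As an illustration only, not the authors' code, here is a minimal Python sketch that computes them for two sets of scores, assuming integer ratings on the AWA's 0-6 scale and treating scores within one point as adjacent:

```python
import numpy as np

def agreement_stats(human, machine, adjacent_within=1.0):
    """Exact, adjacent, and exact-plus-adjacent agreement,
    plus the Pearson r, between two sets of essay scores."""
    h = np.asarray(human, dtype=float)
    m = np.asarray(machine, dtype=float)
    diff = np.abs(h - m)
    exact = np.mean(diff == 0)                          # identical scores
    adjacent = np.mean((diff > 0) & (diff <= adjacent_within))
    pearson_r = np.corrcoef(h, m)[0, 1]                 # linear correlation
    return {
        "exact": exact,
        "adjacent": adjacent,
        "exact_plus_adjacent": exact + adjacent,
        "pearson_r": pearson_r,
    }

# Hypothetical scores for eight essays (0-6 scale); not data from the study.
human   = [4, 5, 3, 6, 4, 2, 5, 4]
machine = [4, 4, 3, 6, 5, 2, 5, 3]
print(agreement_stats(human, machine))
```

In this framing, the abstract's "perfect + adjacent agreement" corresponds to the `exact_plus_adjacent` value, i.e., the proportion of essays on which the two raters differ by at most one score point.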

Published

2006-03-29

How to Cite

Rudner, L. M., Garcia, V., & Welch, C. (2006). An Evaluation of IntelliMetric™ Essay Scoring System. The Journal of Technology, Learning and Assessment, 4(4). Retrieved from https://ejournals.bc.edu/index.php/jtla/article/view/1651
