An Evaluation of IntelliMetric™ Essay Scoring System

Lawrence M. Rudner, Veronica Garcia, Catherine Welch

Abstract


This report provides a two-part evaluation of the IntelliMetric™ automated essay scoring system, based on its performance in scoring essays from the Analytic Writing Assessment (AWA) of the Graduate Management Admission Test™ (GMAT™). The first evaluation uses more than 750 responses to each of six prompts and compares the IntelliMetric system's performance with that of individual human raters, a Bayesian system employing simple word counts, and a weighted probability model. The second, larger evaluation compares IntelliMetric ratings with human ratings using approximately 500 responses to each of 101 prompts. Results from both evaluations suggest that the IntelliMetric system scores AWA essays consistently and reliably. Perfect-plus-adjacent agreement (identical scores or scores differing by one point) ranged from 96% to 98% of instances in evaluation 1 and from 92% to 100% in evaluation 2. Pearson correlations between human and IntelliMetric scores averaged .83 in both evaluations.
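The two agreement statistics reported above can be illustrated with a short sketch. It assumes integer essay scores and defines "adjacent" as scores differing by exactly one point. The function names and the sample scores are hypothetical and are not data from this study. The sketch shows how perfect, adjacent, and combined agreement rates and a Pearson correlation are computed from paired human and machine scores.

```python
import math
from statistics import mean


def agreement_rates(human, machine):
    """Return (perfect, adjacent, perfect_plus_adjacent) agreement proportions.

    Assumes paired integer scores; "adjacent" means the scores differ by exactly 1.
    """
    if len(human) != len(machine) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    diffs = [abs(h - m) for h, m in zip(human, machine)]
    n = len(diffs)
    perfect = sum(d == 0 for d in diffs) / n
    adjacent = sum(d == 1 for d in diffs) / n
    return perfect, adjacent, perfect + adjacent


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list has zero variance")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical scores for eight essays (illustration only).
    human = [4, 5, 3, 6, 2, 4, 5, 3]
    machine = [4, 4, 3, 5, 4, 4, 5, 2]
    print(agreement_rates(human, machine))  # (0.5, 0.375, 0.875)
    print(round(pearson_r(human, machine), 3))
```

Perfect-plus-adjacent agreement tolerates one-point disagreements, while the Pearson correlation measures how closely the two sets of scores move together across essays. The two statistics are therefore complementary rather than interchangeable.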

Keywords


Automated Essay Scoring; Raters; Rater consistency
