An Evaluation of IntelliMetric™ Essay Scoring System

Lawrence M. Rudner; Veronica Garcia; Catherine Welch

Authors

Lawrence M. Rudner GMAC
Veronica Garcia GMAC
Catherine Welch Assessment Innovations at ACT, Inc.

Keywords:

Automated Essay Scoring, Raters, Rater consistency

Abstract

This report provides a two-part evaluation of the IntelliMetric™ automated essay scoring system based on its performance in scoring essays from the Analytic Writing Assessment (AWA) of the Graduate Management Admission Test™ (GMAT™). In the first evaluation, the IntelliMetric system's performance is compared with that of individual human raters, a Bayesian system employing simple word counts, and a weighted probability model, using more than 750 responses to each of six prompts. The second, larger evaluation compares the IntelliMetric system's ratings with those of human raters using approximately 500 responses to each of 101 prompts. Results from both evaluations suggest that the IntelliMetric system is a consistent, reliable system for scoring AWA essays, with perfect + adjacent agreement in 96% to 98% of instances in evaluation 1 and 92% to 100% of instances in evaluation 2. Pearson r correlations between human ratings and IntelliMetric system scores averaged .83 in both evaluations.
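The two agreement statistics named in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions: perfect agreement means the two scores are identical, adjacent agreement means they differ by exactly one point, and Pearson r is the product-moment correlation between the two sets of scores. The function names and the score lists are hypothetical and are not taken from the report.

```python
from math import sqrt


def agreement_rates(human, machine):
    """Return (perfect, adjacent, perfect_plus_adjacent) as fractions of essays."""
    if len(human) != len(machine) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(human)
    # Perfect agreement: both scores are identical.
    perfect = sum(1 for h, m in zip(human, machine) if h == m) / n
    # Adjacent agreement: scores differ by exactly one point.
    adjacent = sum(1 for h, m in zip(human, machine) if abs(h - m) == 1) / n
    return perfect, adjacent, perfect + adjacent


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list has no variance")
    return cov / sqrt(var_x * var_y)


# Hypothetical integer scores for eight essays (illustrative only, not report data).
human = [4, 5, 3, 6, 2, 4, 5, 3]
machine = [4, 4, 3, 6, 3, 4, 5, 5]

perfect, adjacent, combined = agreement_rates(human, machine)
print(f"perfect={perfect:.3f} adjacent={adjacent:.3f} combined={combined:.3f}")
print(f"pearson r={pearson_r(human, machine):.3f}")
```

In this toy data, five of the eight pairs match exactly and two differ by one point, so perfect + adjacent agreement is 7/8. The two statistics answer different questions: agreement rates count how often the scores land close together, while Pearson r measures how consistently the two raters rank the essays.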
