Automated Essay Scoring Versus Human Scoring: A Comparative Study

Jinhao Wang, Michelle Stallone Brown

Abstract


The current research was conducted to investigate the validity of automated essay scoring (AES) by comparing group mean scores assigned by AES and human raters. Data were collected from two standardized writing tests: WritePlacer Plus and the Texas Higher Education Assessment (THEA) writing test. The research sample of 107 participants was drawn from a Hispanic-serving institution in South Texas. A one-way repeated-measures ANOVA and follow-up paired-samples t tests were conducted to examine group mean differences. Results indicated that the mean score assigned by IntelliMetric™ was significantly higher than the faculty human raters’ mean score on the WritePlacer Plus test, and that the IntelliMetric™ mean score was also significantly higher than the THEA mean score assigned by human raters from National Evaluation Systems. A statistically significant difference also existed between the human raters’ mean score on WritePlacer Plus and the human raters’ mean score on THEA. These findings did not corroborate previous studies that reported nonsignificant mean score differences between AES and human scoring.
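For readers who wish to replicate this analytic approach, the following is a minimal sketch in Python (pandas, SciPy, statsmodels) of a one-way repeated-measures ANOVA followed by paired-samples t tests. The scores are randomly generated placeholders and the variable names are illustrative assumptions; only the sample size (n = 107) and the three scoring conditions come from the abstract.

import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(42)
n = 107  # sample size reported in the abstract

# Long-format data: one row per participant per scoring condition.
# The three conditions mirror the abstract's comparisons; the scores
# themselves are synthetic stand-ins, not the study's data.
conditions = ["AES_WritePlacer", "Human_WritePlacer", "Human_THEA"]
scores = pd.DataFrame({
    "subject": np.tile(np.arange(n), len(conditions)),
    "condition": np.repeat(conditions, n),
    "score": np.concatenate([
        rng.normal(7.0, 1.2, n),  # hypothetical IntelliMetric scores
        rng.normal(6.4, 1.2, n),  # hypothetical human WritePlacer scores
        rng.normal(6.0, 1.2, n),  # hypothetical human THEA scores
    ]),
})

# One-way repeated-measures ANOVA across the three scoring conditions.
print(AnovaRM(scores, depvar="score", subject="subject",
              within=["condition"]).fit())

# Follow-up paired-samples t tests; with three comparisons on the same
# participants, a Bonferroni-adjusted alpha (0.05 / 3) is typical.
wide = scores.pivot(index="subject", columns="condition", values="score")
for a, b in [("AES_WritePlacer", "Human_WritePlacer"),
             ("AES_WritePlacer", "Human_THEA"),
             ("Human_WritePlacer", "Human_THEA")]:
    t, p = stats.ttest_rel(wide[a], wide[b])
    print(f"{a} vs {b}: t = {t:.2f}, p = {p:.4f}")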

Keywords


Automated essay scoring; human raters; group mean scores; WritePlacer; Texas Higher Education Assessment; one-way repeated-measures ANOVA; paired-samples t test; technology; computer
