Abstract This study aimed to investigate the validity of using automated essay scoring (AES) systems to score essays written by non-native university female students of English whose native language is Arabic. For this purpose, the performance of the AES program MY Access!, which is supported by the IntelliMetric scoring system, was compared with that of human raters in assigning scores. The data were obtained by using the IntelliMetric scoring system to score 55 essays and by asking three qualified, experienced human raters to score the same essays. Four-point informative analytic and holistic scoring rubrics were used; the analytic rubric included five traits. The scores were then aggregated, and descriptive statistics, mean differences, and Pearson correlation coefficients were calculated. The results showed that across the five traits the correlations between the human raters' and IntelliMetric scores were weak to moderate, ranging from 0.308 to 0.435. The correlation between IntelliMetric and the first human rater (H1) on holistic scoring was 0.278, and 0.288 with the second human rater (H2); there was no significant correlation between IntelliMetric and the third human rater (H3) on holistic scoring. Across the five traits, the results of a one-way analysis of variance (ANOVA) indicated a statistically significant difference among the mean scores of IntelliMetric, H1, H2, and H3. The least significant difference (LSD) test showed that IntelliMetric and H3 were not statistically different on three traits besides holistic scoring: focus and meaning, content and development, and mechanics and conventions.
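For readers unfamiliar with the statistic, the Pearson correlation coefficient reported above (the measure of rater agreement used throughout the study) can be sketched as follows. The score vectors here are hypothetical illustrations, not the study's actual data:

```python
# Minimal sketch of the Pearson correlation coefficient between an AES
# program's scores and one human rater's scores on the same essays.
# The score lists below are hypothetical, not the study's data.
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical 4-point holistic scores for ten essays.
intellimetric = [3, 2, 4, 3, 2, 3, 4, 2, 3, 3]
rater_h1      = [3, 3, 4, 2, 2, 3, 3, 2, 4, 3]

print(f"r = {pearson(intellimetric, rater_h1):.3f}")
```

A value near 1 would indicate close agreement between the machine and the human rater; the weak-to-moderate values the study reports (0.308 to 0.435 across traits) fall well short of that.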