A challenge for large-scale oral communication assessments is to feasibly assess a broad construct that includes interactional competence. One possible approach to addressing this challenge is to use a spoken dialog system (SDS), with the computer acting as a peer to elicit a ratable speech sample. To this end, an SDS was built, and four trained human raters assessed the discourse elicited from 40 test takers who completed a paired oral task with both a human and a computer partner. Test takers were evaluated on the operational analytic oral communication rating scales, which included interactional competence, fluency, pronunciation, and grammar/vocabulary. Repeated-measures ANOVA indicated that fluency, pronunciation, and grammar/vocabulary were scored similarly across the two conditions, whereas interactional competence was scored substantially higher in the human partner condition. A g-study indicated that the computer partner was more reliable in assessing interactional competence, and rater questionnaire and interview data suggested that the computer provided a more standardized assessment. However, raters generally favored the human partner, in part because of its perceived authenticity and naturalness.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]