A challenge in large-scale oral communication assessment is to feasibly assess a broad construct that includes interactional competence. One possible approach to addressing this challenge is to use a spoken dialog system (SDS), with the computer acting as a peer to elicit a ratable speech sample. With this aim, an SDS was built, and four trained human raters assessed the discourse elicited from 40 test takers who completed a paired oral task with both a human and a computer partner. The test takers were evaluated using the operational analytic oral communication rating scales, which covered interactional competence, fluency, pronunciation, and grammar/vocabulary. Repeated-measures ANOVA indicated that fluency, pronunciation, and grammar/vocabulary were scored similarly across the two conditions, while interactional competence was scored substantially higher in the human partner condition. A generalizability study (G-study) indicated that interactional competence was assessed more reliably in the computer partner condition, and rater questionnaire and interview data suggested that the computer provided a more standardized assessment. Nevertheless, raters generally favored the human partner, in part because of its perceived authenticity and naturalness.

