A substantive body of research has been revolving around the linguistic features that distinguish different levels of students’ writing samples (e.g. Crossley and McNamara 2012; McNamara et al. 2015; Lu 2017). Nevertheless, it is somewhat difficult to generalize the findings across various empirical studies, given that different criteria were adopted to measure language learners’ proficiency levels (Chen and Baker 2016). Some researchers suggested using the Common European Framework of Reference for Languages (CEFR) (Council of Europe 2001) as the common standard of evaluating and describing students’ proficiency levels. Therefore, the current research intends to identify the linguistic features that distinguish students’ writing samples across CEFR levels by adopting a machine-learning method, decision tree, which provides the direct visualization of decisions made in each step of the classification procedure. The linguistic features that emerged as predicative of CEFR levels could be employed to (i) inform L2 writing instruction, (ii) track long-term development of writing ability, and (iii) facilitate experts’ judgment in the practice of aligning writing tests/samples with CEFR.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/pages/standard-publication-reuse-rights)