UNITE corpus

Please use the following text to cite this item:
Ferraresi, Adriano; et al., 2026, UNITE corpus, ILC-CNR for CLARIN-IT repository hosted at Institute for Computational Linguistics "A. Zampolli", http://hdl.handle.net/20.500.11752/OPEN-2156
Date issued
2026-02-28
Size
722,316 tokens, 326 texts
Language(s)
English
Description
The UNITE corpus is a learner corpus of English consisting of written interactions between Italian university students and AI-based chatbots. The data were collected in 2024 in English as a Foreign Language (EFL) learning scenarios, in which students interacted with chatbots in small-talk and role-play tasks. Data collection took place at three Italian universities: the University of Bologna, the University of Macerata, and the University of Naples "L'Orientale".

The corpus contains 326 interactions (722,537 tokens), each produced by a different learner. The learners were aged 19–25, were mostly at low- to upper-intermediate proficiency levels, and were enrolled in non-language degree courses. The participants also include learners with disabilities and specific learning disorders, reflecting the project’s focus on inclusive language learning practices.

The UNITE corpus is described using the Core Metadata Schema for Learner Corpora (Paquot et al. 2024) and is distributed in this repository in two versions. The first is a minimally annotated text version that includes metadata at the text, learner, task, and turn levels. The second features learner-error annotation based on an adapted version of the Louvain Error Tagging scheme (Granger et al. 2022). The corpus is also accessible through the NoSketch Engine platform hosted at https://corpora.dipintra.it, where it is further enriched with linguistic annotation (part-of-speech tags and lemmas).
Files in this item
Name
readme-UNITE_corpus.txt
Size
11.48 KB
Format
text/plain
Description
Text
MD5
837a0066068faa2a6a694295b086fbe7
Name
UNITE_corpus.zip
Size
5.63 MB
Format
application/zip
Description
Zip
MD5
07d731ee2d1bd7de39c292213ff02f87
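The MD5 values listed for each file can be used to check that a download is complete and uncorrupted. The following is a minimal Python sketch using only the standard library (`hashlib`, `pathlib`). It assumes both files have been saved under their listed names in a single local folder; the helper names `md5_of` and `verify_downloads` are illustrative and are not part of the distribution.

```python
import hashlib
import sys
from pathlib import Path

# Checksums as listed for each file in this item.
EXPECTED_MD5 = {
    "readme-UNITE_corpus.txt": "837a0066068faa2a6a694295b086fbe7",
    "UNITE_corpus.zip": "07d731ee2d1bd7de39c292213ff02f87",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory: Path) -> dict:
    """Map each expected file name to True if it exists and its MD5 matches.

    Files missing from `directory` are reported as False.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = directory / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    # Assumption: the download folder is passed as the first argument,
    # defaulting to the current directory.
    folder = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for name, ok in verify_downloads(folder).items():
        print(f"{name}: {'OK' if ok else 'MISMATCH or missing'}")
```

Reading in chunks keeps memory use flat regardless of file size. A mismatch for `UNITE_corpus.zip` usually means the download should simply be repeated before unpacking.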