UNITE corpus
Please use the following text to cite this item:
Ferraresi, Adriano; et al., 2026, UNITE corpus, ILC-CNR for CLARIN-IT repository hosted at Institute for Computational Linguistics "A. Zampolli", http://hdl.handle.net/20.500.11752/OPEN-2156
Authors
Ferraresi, Adriano; et al.
Item identifier
Project URL
Demo URL
Referenced by
Date issued
2026-02-28
Size
722,316 tokens, 326 texts
Language(s)
Description
The UNITE corpus is a learner corpus of English consisting of written interactions between Italian university students and AI-based chatbots. The data were collected in 2024 in English as a Foreign Language (EFL) learning scenarios in which students interacted with chatbots in small talk and role-play tasks. Data collection took place at three universities in Italy: the University of Bologna, the University of Macerata, and the University of Naples "L'Orientale".
The corpus contains 326 interactions (722,537 tokens) produced by as many learners aged 19–25, with mostly low- to upper-intermediate proficiency levels, enrolled in non-linguistic degree courses. The participants also include learners with disabilities and specific learning disorders, reflecting the project’s focus on inclusive language learning practices.
The UNITE corpus is described using the Core Metadata Schema for Learner Corpora (Paquot et al. 2024) and is distributed as part of this repository in two versions. The first is a minimally annotated text version including metadata at the text, learner, task, and turn levels. The second version features learner-error annotation based on an adapted version of the Louvain Error Tagging scheme (Granger et al. 2022).
The corpus is also accessible through the NoSketch Engine platform hosted at https://corpora.dipintra.it. The version of the corpus available there is further enriched with linguistic annotation (part-of-speech tags and lemmas).
Acknowledgement
Ministero dell’Università e della Ricerca (MUR), Italy
Project code: 2022JB5KAL
Project name: PRIN 2022
Collections
This item is Publicly Available
and licensed under:
Files in this item
- Name: readme-UNITE_corpus.txt
- Size: 11.48 KB
- Format: text/plain
- Description: Text
- MD5: 837a0066068faa2a6a694295b086fbe7

- Name: UNITE_corpus.zip
- Size: 5.63 MB
- Format: application/zip
- Description: Zip
- MD5: 07d731ee2d1bd7de39c292213ff02f87


