Please use the following text to cite this item or export to a predefined format:
Ferraresi, Adriano; et al., 2026, UNITE corpus, ILC-CNR for CLARIN-IT repository hosted at Institute for Computational Linguistics "A. Zampolli", http://hdl.handle.net/20.500.11752/OPEN-2156
| dc.contributor.author | Ferraresi, Adriano |
| dc.contributor.author | Raffi, Francesca |
| dc.contributor.author | Mongibello, Anna |
| dc.contributor.author | Polizzi, Daniele |
| dc.contributor.author | Palmieri, Giada |
| dc.contributor.author | Moretti, Beatrice |
| dc.contributor.author | De Brasi, Valentina |
| dc.date.accessioned | 2026-03-18T11:34:06Z |
| dc.date.available | 2026-03-18T11:34:06Z |
| dc.date.issued | 2026-02-28 |
| dc.description | The UNITE corpus is a learner corpus of English consisting of written interactions between Italian university students and AI-based chatbots. The data were collected in 2024 in English as a Foreign Language (EFL) learning scenarios in which students interacted with chatbots in small talk and role-play tasks. Data collection took place at three universities in Italy: the University of Bologna, the University of Macerata, and the University of Naples "L'Orientale". The corpus contains 326 interactions (722,537 tokens) produced by as many learners aged 19–25, with mostly low- to upper-intermediate proficiency levels, enrolled in non-linguistic degree courses. The participants also include learners with disabilities and specific learning disorders, reflecting the project’s focus on inclusive language learning practices. The UNITE corpus is described using the Core Metadata Schema for Learner Corpora (Paquot et al. 2024) and is distributed as part of this repository in two versions. The first is a minimally annotated text version including metadata at the text, learner, task, and turn levels. The second version features learner-error annotation based on an adapted version of the Louvain Error Tagging scheme (Granger et al. 2022). The corpus is also accessible through the NoSketch Engine platform hosted at https://corpora.dipintra.it. The version of the corpus available there is further enriched with linguistic annotation (part-of-speech tags and lemmas). |
| dc.identifier.uri | http://hdl.handle.net/20.500.11752/OPEN-2156 |
| dc.language.iso | eng |
| dc.publisher | Alma Mater Studiorum – Università di Bologna |
| dc.relation.isreferencedby | https://doi.org/10.5281/zenodo.18945548 |
| dc.rights | Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
| dc.rights.label | PUB |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ |
| dc.source.uri | https://site.unibo.it/unite/en |
| dc.subject | Learner corpus |
| dc.subject | human-AI written dialogue |
| dc.subject | English as a foreign language |
| dc.title | UNITE corpus |
| dc.type | corpus |
| local.contact.person | Adriano Ferraresi adriano.ferraresi@unibo.it Alma Mater Studiorum – Università di Bologna |
| local.demo.uri | https://corpora.dipintra.it |
| local.files.count | 2 |
| local.files.size | 5919227 |
| local.has.files | yes |
| local.language.name | English |
| local.size.info | 722316 tokens |
| local.size.info | 326 texts |
| local.sponsor | nationalFunds 2022JB5KAL Ministero dell’Università e della Ricerca (MUR), Italy PRIN 2022 |
| metashare.ResourceInfo#ContentInfo.mediaType | text |
This item is publicly available and licensed under Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
Files in this item
| Name | readme-UNITE_corpus.txt |
| Size | 11.48 KB |
| Format | text/plain |
| Description | Text |
| MD5 | 837a0066068faa2a6a694295b086fbe7 |

| Name | UNITE_corpus.zip |
| Size | 5.63 MB |
| Format | application/zip |
| Description | Zip |
| MD5 | 07d731ee2d1bd7de39c292213ff02f87 |
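
The MD5 values listed for the two files can be used to check that a download completed without corruption. The following is a minimal Python sketch, assuming both files were saved under their listed names in the current working directory; the function names `md5_of_file` and `verify_downloads` are illustrative and not part of the repository or the corpus distribution.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed in this record for each file.
EXPECTED_MD5 = {
    "readme-UNITE_corpus.txt": "837a0066068faa2a6a694295b086fbe7",
    "UNITE_corpus.zip": "07d731ee2d1bd7de39c292213ff02f87",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name  # assumed download location
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(f"{name}: {'OK' if ok else 'MISSING OR MISMATCH'}")
```

A file reported as `MISSING OR MISMATCH` is either absent from the directory or differs from the copy described in this record, in which case downloading it again is the simplest remedy.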


