Please use the following text to cite this item:
Ferraresi, Adriano; et al., 2026, UNITE corpus, ILC-CNR for CLARIN-IT repository hosted at Institute for Computational Linguistics "A. Zampolli", http://hdl.handle.net/20.500.11752/OPEN-2156
dc.contributor.author: Ferraresi, Adriano
dc.contributor.author: Raffi, Francesca
dc.contributor.author: Mongibello, Anna
dc.contributor.author: Polizzi, Daniele
dc.contributor.author: Palmieri, Giada
dc.contributor.author: Moretti, Beatrice
dc.contributor.author: De Brasi, Valentina
dc.date.accessioned: 2026-03-18T11:34:06Z
dc.date.available: 2026-03-18T11:34:06Z
dc.date.issued: 2026-02-28
dc.description: The UNITE corpus is a learner corpus of English consisting of written interactions between Italian university students and AI-based chatbots. The data were collected in 2024 in English as a Foreign Language (EFL) learning scenarios in which students interacted with chatbots in small talk and role-play tasks. Data collection took place at three universities in Italy: the University of Bologna, the University of Macerata, and the University of Naples "L'Orientale". The corpus contains 326 interactions (722,537 tokens) produced by as many learners aged 19–25, with mostly low- to upper-intermediate proficiency levels, enrolled in non-linguistic degree courses. The participants also include learners with disabilities and specific learning disorders, reflecting the project’s focus on inclusive language learning practices. The UNITE corpus is described using the Core Metadata Schema for Learner Corpora (Paquot et al. 2024) and is distributed as part of this repository in two versions. The first is a minimally annotated text version including metadata at the text, learner, task, and turn levels. The second version features learner-error annotation based on an adapted version of the Louvain Error Tagging scheme (Granger et al. 2022). The corpus is also accessible through the NoSketch Engine platform hosted at https://corpora.dipintra.it. The version of the corpus available there is further enriched with linguistic annotation (part-of-speech tags and lemmas).
dc.identifier.uri: http://hdl.handle.net/20.500.11752/OPEN-2156
dc.language.iso: eng
dc.publisher: Alma Mater Studiorum – Università di Bologna
dc.relation.isreferencedby: https://doi.org/10.5281/zenodo.18945548
dc.rights: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.label: PUB
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.source.uri: https://site.unibo.it/unite/en
dc.subject: Learner corpus
dc.subject: human-AI written dialogue
dc.subject: English as a foreign language
dc.title: UNITE corpus
dc.type: corpus
local.contact.person: Adriano Ferraresi adriano.ferraresi@unibo.it Alma Mater Studiorum – Università di Bologna
local.demo.uri: https://corpora.dipintra.it
local.files.count: 2
local.files.size: 5919227
local.has.files: yes
local.language.name: English
local.size.info: 722316 tokens
local.size.info: 326 texts
local.sponsor: nationalFunds 2022JB5KAL Ministero dell’Università e della Ricerca (MUR), Italy PRIN 2022
metashare.ResourceInfo#ContentInfo.mediaType: text
Files in this item

Name: readme-UNITE_corpus.txt
Size: 11.48 KB
Format: text/plain
Description: Text
MD5: 837a0066068faa2a6a694295b086fbe7

Name: UNITE_corpus.zip
Size: 5.63 MB
Format: application/zip
Description: Zip
MD5: 07d731ee2d1bd7de39c292213ff02f87
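
To confirm that a download is complete and uncorrupted, the MD5 checksums listed for the two files can be compared against digests computed locally. The following is a minimal Python sketch, assuming both files were saved under their listed names in a single directory; `md5_of_file` and `verify_downloads` are hypothetical helper names for illustration, not tools provided by the repository.

```python
import hashlib
from pathlib import Path

# Checksums as listed in the "Files in this item" section of this record.
# Assumption: files are saved locally under exactly these names.
EXPECTED_MD5 = {
    "readme-UNITE_corpus.txt": "837a0066068faa2a6a694295b086fbe7",
    "UNITE_corpus.zip": "07d731ee2d1bd7de39c292213ff02f87",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected file name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == expected) if path.exists() else None
    return results


if __name__ == "__main__":
    for name, status in verify_downloads().items():
        label = "missing" if status is None else ("OK" if status else "MISMATCH")
        print(f"{name}: {label}")
```

Reading the file in chunks keeps memory use constant regardless of file size; a mismatch usually indicates an interrupted or altered download, so the file should be fetched again.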