UNITE corpus

Ferraresi, Adriano

Please use the following text to cite this item:

Ferraresi, Adriano; et al., 2026, UNITE corpus, ILC-CNR for CLARIN-IT repository hosted at Institute for Computational Linguistics "A. Zampolli", http://hdl.handle.net/20.500.11752/OPEN-2156

dc.contributor.author	Ferraresi, Adriano
dc.contributor.author	Raffi, Francesca
dc.contributor.author	Mongibello, Anna
dc.contributor.author	Polizzi, Daniele
dc.contributor.author	Palmieri, Giada
dc.contributor.author	Moretti, Beatrice
dc.contributor.author	De Brasi, Valentina
dc.date.accessioned	2026-03-18T11:34:06Z
dc.date.available	2026-03-18T11:34:06Z
dc.date.issued	2026-02-28
dc.description	The UNITE corpus is a learner corpus of English consisting of written interactions between Italian university students and AI-based chatbots. The data were collected in 2024 in English as a Foreign Language (EFL) learning scenarios in which students interacted with chatbots in small talk and role-play tasks. Data collection took place at three universities in Italy: the University of Bologna, the University of Macerata, and the University of Naples "L'Orientale". The corpus contains 326 interactions (722,537 tokens) produced by as many learners aged 19–25, with mostly low- to upper-intermediate proficiency levels, enrolled in non-linguistic degree courses. The participants also include learners with disabilities and specific learning disorders, reflecting the project’s focus on inclusive language learning practices. The UNITE corpus is described using the Core Metadata Schema for Learner Corpora (Paquot et al. 2024) and is distributed as part of this repository in two versions. The first is a minimally annotated text version including metadata at the text, learner, task, and turn levels. The second version features learner-error annotation based on an adapted version of the Louvain Error Tagging scheme (Granger et al. 2022). The corpus is also accessible through the NoSketch Engine platform hosted at https://corpora.dipintra.it. The version of the corpus available there is further enriched with linguistic annotation (part-of-speech tags and lemmas).
dc.identifier.uri	http://hdl.handle.net/20.500.11752/OPEN-2156
dc.language.iso	eng
dc.publisher	Alma Mater Studiorum – Università di Bologna
dc.relation.isreferencedby	https://doi.org/10.5281/zenodo.18945548
dc.rights	Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.label	PUB
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.source.uri	https://site.unibo.it/unite/en
dc.subject	Learner corpus
dc.subject	human-AI written dialogue
dc.subject	English as a foreign language
dc.title	UNITE corpus
dc.type	corpus
local.contact.person	Adriano Ferraresi adriano.ferraresi@unibo.it Alma Mater Studiorum – Università di Bologna
local.demo.uri	https://corpora.dipintra.it
local.files.count	2
local.files.size	5914562
local.has.files	yes
local.language.name	English
local.size.info	722316 tokens
local.size.info	326 texts
local.sponsor	nationalFunds 2022JB5KAL Ministero dell’Università e della Ricerca (MUR), Italy PRIN 2022
metashare.ResourceInfo#ContentInfo.mediaType	text