This is a new version of the repository. Do let us know (dspace-clarin-it-ilc-help@ilc.cnr.it) if you encounter any issues.
What's New
corpusOPEN
Author(s):
Description:
This corpus contains 75 written autobiographical narratives related to affective objects, extracted from a larger dataset of 236 autobiographical narratives produced by Italian university students as part of a bilingual Italian–Spanish research protocol on self-concept and identity. Participants were asked to describe an object, person, or place they considered important or indispensable in their lives; this subcorpus includes exclusively the 75 responses in which participants chose to write about an object. Participants were prompted with the following question: "C'è qualche oggetto, persona o luogo importante per te o indispensabile nella tua vita? Descrivilo in dettaglio: la sua funzione, perché è così importante per te, ecc." (in English: "Is there an object, person, or place that is important to you or indispensable in your life? Describe it in detail: its function, why it is so important to you, etc."). Data were collected at three institutions in Emilia-Romagna: Università di Bologna (Political Science, Social and International Sciences), Università di Bologna (Modern Languages and Civilizations), and Università di Parma (Modern Languages and Civilizations). All participants provided written informed consent. The corpus is intended for research in corpus linguistics, psycholinguistics, and the study of autobiographical memory, identity, material culture, and anthropology. The corpus is structured as a CSV file with four columns: participant ID, gender (M/F), degree programme and institution, and narrative text.
This item contains 1 file (37.79 KB).
Publicly Available
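For the affective-objects corpus described above, the four-column CSV layout can be explored with a few lines of Python. This is a minimal sketch, not the authors' tooling: the column names in COLUMNS, the absence of a header row, and the three sample rows are all assumptions made for illustration (the sample text is invented, not corpus data).

```python
import csv
import io
from collections import Counter

# Hypothetical column names: the description lists four columns
# but does not give their actual header labels.
COLUMNS = ["participant_id", "gender", "programme_institution", "narrative"]

# Invented stand-in rows, NOT real corpus data.
SAMPLE = (
    'P001,F,Modern Languages - Bologna,"Il mio diario, che porto sempre con me."\n'
    "P002,M,Political Science - Bologna,La chitarra di mio nonno.\n"
    "P003,F,Modern Languages - Parma,Un anello.\n"
)


def load_narratives(handle, has_header=False):
    """Read CSV rows into dicts keyed by the assumed column names."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the header row if the file has one
    return [dict(zip(COLUMNS, row)) for row in reader if row]


def summarize(rows):
    """Return narrative counts per gender and mean whitespace-token length."""
    by_gender = Counter(row["gender"] for row in rows)
    mean_tokens = sum(len(row["narrative"].split()) for row in rows) / len(rows)
    return by_gender, mean_tokens


rows = load_narratives(io.StringIO(SAMPLE))
by_gender, mean_tokens = summarize(rows)
print(dict(by_gender), mean_tokens)
```

To use it on the downloaded file, pass an open file handle instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`, and set `has_header=True` if the first row turns out to contain labels.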
corpusILC
Author(s):
Description:
The Plain OIIC version is a corpus of transcribed online interactions collected in academic contexts where intercomprehension among Romance languages is used as a pedagogical approach. The data consist of transcriptions of videorecorded interactions involving university students and tutors engaged in collaborative tasks, classroom discussions, and project-related activities in multilingual educational settings. This version of the corpus contains transcripts only, without linguistic annotation. The transcripts are segmented into turns and include speaker identification and timestamps. This version is intended for users who wish to work directly on the transcripts, for example for discourse analysis, conversation analysis, classroom interaction studies, or for developing and testing annotation schemes. The Plain OIIC version can also be used for teaching purposes, training in transcription and annotation, and methodological studies on multilingual interaction and intercomprehension practices. Since no annotation is included, this version allows researchers to apply their own analytical frameworks and annotation systems. The corpus is accompanied by metadata files describing recording sessions and participants, as well as documentation of the data collection context and transcription conventions.
This item contains 2 files (259.79 KB).
Publicly Available
corpusILC
Author(s):
Description:
The OIIC Annotated Version is a corpus of transcribed and annotated online interactions collected in academic contexts where intercomprehension among Romance languages is used as a pedagogical approach. The data consist of transcriptions of videorecorded interactions involving university students and tutors engaged in collaborative tasks, classroom discussions, and project-related activities in multilingual settings. This version of the corpus includes multilayer annotation of intercomprehension strategies performed in ELAN. The annotation system captures several dimensions of interaction, including interactional strategies (e.g. clarification requests, confirmation of understanding, agreement and disagreement), metadiscursive strategies (e.g. reformulations, metalinguistic comments), lexical phenomena (e.g. code-switching, calques, specialized vocabulary, translation), non-verbal and prosodic strategies (e.g. slowed speech, emphasis, gestures), participatory dominance phenomena (e.g. cooperative and competitive overlaps or interruptions), and misunderstanding sequences (resolved or unresolved). The annotated corpus is distributed in multiple complementary formats: ELAN files (.eaf) with time-aligned multilayer annotation, annotated transcripts in text format with embedded annotation labels, and tabular files (.csv) derived from the ELAN annotations to support quantitative analysis. The ELAN files constitute the primary annotated data, while the text and tabular versions are provided to facilitate qualitative reading, corpus exploration, and statistical analysis. The corpus is accompanied by metadata files describing sessions and participants, as well as documentation of the annotation system, including annotation guidelines and a table of annotation labels and their correspondence with ELAN labels. 
This annotated version of the corpus is particularly suitable for research on intercomprehension, multilingual interaction, mediation strategies, classroom discourse, translanguaging, and interactional practices in educational contexts. It can be used for both qualitative and quantitative analyses.
This item contains 2 files (1.41 MB).
Publicly Available
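Since the ELAN files are the primary annotated data for the OIIC Annotated Version, a short sketch of reading time-aligned annotations from an .eaf file may be useful. ELAN's .eaf format is XML, with time slots listed under TIME_ORDER and annotations grouped by TIER. The tier names ("interactional", "lexical") and the label values below are hypothetical placeholders; the corpus's actual tier and label inventory is given in its annotation guidelines.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Minimal invented EAF fragment. Tier IDs and labels are hypothetical,
# chosen only to mirror the annotation dimensions named in the description.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="4200"/>
  </TIME_ORDER>
  <TIER TIER_ID="interactional">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>clarification_request</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="lexical">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>code_switching</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_annotations(eaf_text):
    """Return (tier_id, start, end, label) tuples for time-aligned annotations."""
    root = ET.fromstring(eaf_text)
    # Map each time slot ID to its time value as stored in the file.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }
    annotations = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            annotations.append((
                tier.get("TIER_ID"),
                slots.get(ann.get("TIME_SLOT_REF1")),
                slots.get(ann.get("TIME_SLOT_REF2")),
                ann.findtext("ANNOTATION_VALUE", default=""),
            ))
    return annotations


annotations = read_annotations(SAMPLE_EAF)
labels_per_tier = Counter(tier_id for tier_id, _, _, _ in annotations)
print(annotations)
print(dict(labels_per_tier))
```

The sketch only handles time-aligned annotations; tiers whose annotations refer to a parent annotation rather than to time slots would need additional handling. For purely quantitative work, the derived .csv files distributed with the corpus are likely the simpler starting point.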
Most Viewed Items - Last Month
lexicalConceptualResourceILC
Author(s):
Description:
ItalWordNet (IWN) is a lexical-semantic database developed in the framework of two research projects: EuroWordNet (EWN) and Sistema Integrato per il Trattamento Automatico del Linguaggio (SI-TAL). IWN is structured in the same way as the Princeton WordNet, namely around the notion of synset. Following the model designed in EWN, IWN encodes a rich set of semantic relations. In addition to the language-internal relations, equivalence relations were also encoded between Italian synsets and the closest concepts in an Inter-Lingual Index (ILI), a separate language-independent module containing all WN1.5 synsets but not the relations among them. IWN now contains information about Italian nouns, verbs, adjectives and adverbs. This SQL version of IWN v2.0 contains a corrected and revised version of the original IWN: 49350 synsets (of which 3459 proper nouns, 32073 nominal, 8903 verbal, 4374 adjectival, 541 adverbial); 48416 lemmas (of which 3918 proper nouns, 29527 nouns, 8015 verbs, 5808 adjectives, 1090 adverbs); and 68478 senses.
This item contains 2 files (4.81 MB).
Publicly Available
corpusOPEN
Author(s):
Description:
HOW TO CITE: Cereser E., Mastrantonio, D. (a cura di), ALEF. Archivio per lo studio della Lingua degli Elaborati studenteschi - Ca’ Foscari, progetto digitale di F. Boschetti, Venezia, Università Ca’ Foscari, 2026. ALEF is an archive of Università Ca’ Foscari that collects student compositions from upper secondary schools. The collection and study of student writing is part of the work of Ce.Do.Di (Centro di documentazione e ricerca sulla scuola e la didattica, the centre for documentation and research on schooling and teaching of the Dipartimento di Studi Umanistici at Università Ca’ Foscari). The mediation of the school section of ASLI (Associazione per la Storia della Lingua Italiana), formerly coordinated by Rita Fresu, was important in building a network of contacts with schools. The texts were acquired by Ca’ Foscari through agreements signed between the Dipartimento di Studi Umanistici and the various Italian schools. For privacy reasons, the names of the many teachers who made the collection possible are not given here. In its current form (March 2026), ALEF contains 227 texts written by students of upper secondary schools (scuole secondarie di secondo grado) from various regions of Italy. The set of texts currently published (texts 270 to 496) is linked to the doctoral thesis of Eugenio Cereser, Analisi e classificazione degli errori lessicali per un archivio digitale di testi studenteschi contemporanei, 38th cycle, funded through PNRR, Dottorato di Italianistica at Università Ca’ Foscari, supervised by Davide Mastrantonio, Federico Boschetti and Michele Colombo (thesis defence scheduled for April 2026). Other sets of texts, not yet published, had previously been collected for the master's theses of Chiara Marino (Strategie argomentative degli studenti delle scuole secondarie di secondo grado, Università Ca’ Foscari, academic year 2022/2023, supervisor D. Mastrantonio) and Giulia Corrocher (Connettivi e relazioni logiche negli elaborati degli studenti delle scuole secondarie di secondo grado, Università Ca’ Foscari, academic year 2022/2023, supervisor D. Mastrantonio), both of whom helped transcribe the texts. Each text in ALEF is identified by the following information: school year, year of schooling, type of school, region, type of test according to the Fedeli reform, test prompt, and sequential number of the individual text. The texts come from classroom exercises and in-class tests, and the prompts depend on the individual classes. Each text was transcribed in full in order to provide a diplomatic transcription of the compositions. The criteria adopted were as follows: line breaks were preserved; legible deletions were rendered in strikethrough; illegible deletions were marked with asterisks (*); additions above the line were transcribed in superscript; any spelling errors are flagged with (sic).
This item contains 1 file (1.66 MB).
Publicly Available
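Two of the ALEF transcription conventions described above, asterisks for illegible deletions and (sic) for spelling errors, are plain-text markers that can be counted directly. The sketch below assumes the transcription is available as a plain string and that each unbroken run of asterisks marks one deletion; neither assumption is confirmed by the description, and the sample sentence is invented. Strikethrough and superscript depend on the file format and are not handled.

```python
import re

# Markers named in the transcription criteria.
SIC = re.compile(r"\(sic\)")
# Assumption: one unbroken run of asterisks marks one illegible deletion.
ILLEGIBLE = re.compile(r"\*+")


def count_markers(text):
    """Count (sic) flags and asterisk runs in a transcribed text."""
    return {
        "sic": len(SIC.findall(text)),
        "illegible_deletions": len(ILLEGIBLE.findall(text)),
    }


# Invented example sentence, not taken from the archive.
sample = "Il protagonista *** torna a casa perchè (sic) era stanco, qual'è (sic) il motivo? **"
counts = count_markers(sample)
print(counts)
```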
corpusOPEN
Author(s):
Description:
Musisque Deoque, a corpus of the whole of Latin poetry from its beginnings to the end of the 7th century, was established at the end of 2005 with the main goal of creating a single database of Latin poetry, supported by an electronic critical and exegetical apparatus. At present, the main collections of classical texts have been transferred to digital media, and resources, mostly online, allow quicker lexical searches. In most cases, however, a search engine query only returns matches for a key within a fixed and ‘authoritarian’ text. The aim of Musisque Deoque is to overcome these limitations by making it possible to locate not only the forms chosen in the text of a reference edition, but also the variants in its critical apparatus. The website has recently been extended with new functions. The most important are the following. Epigraphica: a dedicated treatment of the Carmina Latina Epigraphica, with search by corpora and by incipit, further information on place of origin and dating, the prose paratext where one exists, etc.; in addition, a photographic archive of the catalogued inscriptions has been set up. Witnesses: in the apparatuses, the site now provides a standard nomenclature for the manuscripts, giving the current proper names of city, library and collection and the shelfmark; a list of poets and works present in the same manuscript; and a link to the library’s website and, where available, to the digitized images of the codex. Search by lemmas: available in the advanced search. Metrical scansion of all works in dactylic verse, performed by the Pedecerto application. Co-occurrences: starting from a chosen source text, the whole corpus is searched for verbal or non-verbal rhythmic similarities. Hellenica: a digital archive of Greek poetry.
This item contains 1 file (16.96 MB).
Publicly Available