Linguistic Data and NLP Tools
Find
Citation Support (with Persistent IDs)
Deposit Free and Safe
License of your Choice (Open licenses encouraged)
Easy to Find
Easy to Cite
“There ought to be only one grand dépôt of art in the world, to which the artist might repair with his works, and on presenting them receive what he required...” Ludwig van Beethoven, 1801

toolServiceILC
Author(s):
Description:
DigItAnt-search is the GUI web application of the DigItAnt platform, designed to explore, visualise and navigate the different sources of information created or linked within the national ItAnt project (https://www.prin-italia-antica.unifi.it/). DigItAnt is an innovative platform designed to support historical linguistic and epigraphic studies, and to assist researchers in the creation, management and consultation of digital linguistic resources for fragmentary ancient languages.
DigItAnt-search lets users interactively explore various sources of information in a unified and easily accessible environment.
The development of DigItAnt was funded by the Ministry of University and Research under the program Research Projects of Relevant National Interest (PRIN) 2017.
This front-end application was developed by Michele Mallia under the supervision of Valeria Quochi and thanks to continuous discussion and exchange with the team, composed of: Andrea Bellandi, Alessandro Tommasi, Cesare Zavattari, Silvia Piccini, Michela Bandini, Chiara Fazzone.
This item contains no files.
toolServiceILC
Author(s):
Michele Mallia ; et al.
Description:
EpiLexO is a user-friendly web application for creating and editing an integrated system of language resources for ancient fragmentary languages, centered on the lexicon and compliant with current digital humanities and Linked Open Data principles. EpiLexO supports the editing of lexica with all relevant cross-references: links to their testimonies, to bibliographic information, and to other (external) resources and common vocabularies. This front-end application rests on a Service-Oriented Architecture with two main back-end components, the LexO-server and the CASH-server, which manage lexica and textual documents respectively via RESTful web services, plus additional services for other aspects such as access and authentication, XML rendering, etc. All code is available at https://github.com/DigItAnt/. The application was developed in the context of a project on the fragmentarily attested languages of ancient Italy, but can be applied to other similar contexts.
This item contains 1 file (568.36 KB).
Publicly Available
corpusILC
Author(s):
Frontini, Francesca ; et al.
Description:
The StarwarsNER French Italian Corpus - sample is a multilingual benchmark resource for Named Entity Recognition (NER) in the wastewater and stormwater management domain.
It supports research in:
- Information extraction
- Relation extraction
- Entity linking
The corpus consists of manually annotated parallel French and Italian documents, aligned at the sentence level. Annotations follow a domain-specific schema based on the Sewer Network Ontology <http://hdl.handle.net/20.500.11752/ILC-1037>.
For copyright reasons, this release contains only a sample of the original corpus, namely 8 French documents from public administrations and their Italian translations.
---
## Resource Creation
1. **French corpus**
- Collected from reports, regulations, and local media texts.
- Manually annotated according to the STARWARS schema.
2. **Italian corpus**
- Produced via machine translation of the French texts.
- Reviewed and corrected by bilingual translation students and expert hydrologists.
3. **Annotation process**
- Conducted with the **INCEpTION** annotation platform.
- Ensured consistent alignment between French and Italian.
For details, please refer to the publication:
F.A. Cardillo, F. Debole, F. Frontini, M. Aelami, N. Chahinian, S. Conrad (2025) “Novel Benchmark for NER in the Wastewater and Stormwater Domain”, Proceedings of the 6th IEEE MNLP Conf. (CiST-MNLP’2025) 4-10 October 2025, Marrakech, Morocco. <https://arxiv.org/abs/2506.01938>
---
## Contents of this Package
- **Texts**: Provided in plain text.
- **Annotations**: Provided in **CoNLL-2003 format, as exported from INCEpTION**.
- **Annotation guidelines**: Included in both **French** and **Italian**, as used by annotators.
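
As a rough illustration of how CoNLL-style annotation files like these are typically consumed, the sketch below groups token lines into sentences and collects entity spans from BIO tags. It assumes one token per line, whitespace-separated columns with the NER tag in the last column, and blank lines between sentences; the exact column layout of the INCEpTION export should be checked against the actual files. The sample text and the labels `NETWORK` and `LOCATION` are hypothetical and are not taken from the STARWARS schema.

```python
from typing import Iterable, List, Optional, Tuple

Sentence = List[Tuple[str, str]]  # (token, tag) pairs


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group CoNLL-style lines into sentences of (token, tag) pairs."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        # Blank lines and document markers end the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        # Assumption: token in the first column, NER tag in the last.
        current.append((columns[0], columns[-1]))
    if current:
        sentences.append(current)
    return sentences


def extract_entities(sentence: Sentence) -> List[Tuple[str, str]]:
    """Collect (entity text, label) pairs from BIO tags."""
    entities: List[Tuple[str, str]] = []
    tokens: List[str] = []
    label: Optional[str] = None
    for token, tag in sentence:
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if starts_new:
            if tokens:
                entities.append((" ".join(tokens), label))
            tokens, label = [token], tag[2:]
        elif tag.startswith("I-"):
            tokens.append(token)
        else:
            if tokens:
                entities.append((" ".join(tokens), label))
            tokens, label = [], None
    if tokens:
        entities.append((" ".join(tokens), label))
    return entities


# Hypothetical sample; labels are illustrative only.
SAMPLE = """\
-DOCSTART- O

Le O
réseau B-NETWORK
d'assainissement I-NETWORK
de O
Montpellier B-LOCATION
. O

Il O
pleut O
"""

if __name__ == "__main__":
    for sentence in read_conll(SAMPLE.splitlines()):
        print(extract_entities(sentence))
```

Because the French and Italian documents are aligned at the sentence level, the same reader could be run on both language versions and the resulting sentence lists compared index by index, provided the export preserves sentence order.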
This item contains 1 file (559.42 KB).
Publicly Available
Most Viewed Items - Last Month
lexicalConceptualResourceILC
Author(s):
Description:
ItalWordNet (IWN) is a lexical-semantic database developed in the framework of two different research projects: EuroWordNet (EWN) and Sistema Integrato per il Trattamento Automatico del Linguaggio (SI-TAL).
IWN is structured in the same way as the Princeton WordNet, namely around the notion of synset. Following the model designed in EWN, IWN encodes a rich set of semantic relations. In addition to the internal language relations, equivalence relations were also encoded between Italian synsets and the closest concepts in an Inter-Lingual Index (ILI), a separate language-independent module containing all WN1.5 synsets but not the relations among them.
IWN now contains information about Italian Nouns, Verbs, Adjectives and Adverbs.
This SQL version of IWN v2.0 contains a corrected and revised version of the original IWN:
49350 Synsets (of which: 3459 proper nouns, 32073 nominal, 8903 verbal, 4374 adjectival, 541 adverbial)
48416 Lemmas (of which: 3918 proper nouns, 29527 nouns, 8015 verbs, 5808 adjectives, 1090 adverbs)
68478 Senses
This item contains 2 files (4.81 MB).
Publicly Available






