StarwarsNER French Italian Corpus - sample 2.0
Please use the following text to cite this item:
Frontini, Francesca; et al., 2026, StarwarsNER French Italian Corpus - sample 2.0, ILC-CNR for CLARIN-IT repository hosted at Institute for Computational Linguistics "A. Zampolli", http://hdl.handle.net/20.500.11752/ILC-2180
Authors
Frontini, Francesca; et al.
Date issued
2026-05-19
Size
8 files
Description
The StarwarsNER French Italian Corpus - sample is a multilingual benchmark resource for Named Entity Recognition (NER) in the wastewater and stormwater management domain.
Compared with **StarwarsNER French Italian Corpus - sample 1.0**, this version adds English translations of the annotation guidelines and of the plain texts, so that readers who do not speak French or Italian can better understand and potentially replicate the experiment.
The corpus supports research in:
- Information extraction
- Relation extraction
- Entity linking
The corpus consists of manually annotated parallel French and Italian documents, aligned at the sentence level. Annotations follow a domain-specific schema based on the Sewer Network Ontology <http://hdl.handle.net/20.500.11752/ILC-1037>.
For copyright reasons, this release contains only a sample of the original corpus, namely 8 French documents from public administrations and their Italian translations.
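Because the French and Italian documents are aligned at the sentence level, users will typically iterate over sentence pairs. The following is a minimal sketch, assuming (hypothetically) that a document's French and Italian sentences are already available as two Python lists in the same order; the function name `pair_sentences` and the sample sentences are illustrative and are not part of the release.

```python
from typing import List, Tuple


def pair_sentences(fr: List[str], it: List[str]) -> List[Tuple[str, str]]:
    """Pair French and Italian sentences by position.

    Assumes both lists hold the same document's sentences in corpus order,
    so the i-th French sentence is aligned with the i-th Italian one.
    """
    if len(fr) != len(it):
        # A count mismatch means the positional alignment cannot be trusted.
        raise ValueError(
            f"sentence counts differ: {len(fr)} French vs {len(it)} Italian"
        )
    return list(zip(fr, it))


# Illustrative sentences, not taken from the corpus.
fr_sents = ["Le réseau est unitaire.", "Le bassin déborde."]
it_sents = ["La rete è unitaria.", "Il bacino tracima."]

for fr_s, it_s in pair_sentences(fr_sents, it_sents):
    print(f"FR: {fr_s} | IT: {it_s}")
```

Raising an error on a count mismatch, rather than silently truncating as `zip` would, makes a broken alignment visible immediately.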
---
## Resource Creation
1. **French corpus**
   - Collected from reports, regulations, and local media texts.
   - Manually annotated according to the STARWARS schema.
2. **Italian (and English) corpus**
   - Produced via machine translation of the French texts.
   - Reviewed and corrected by bilingual translation students and expert hydrologists.
3. **Annotation process**
   - Conducted with the **INCEpTION** annotation platform.
   - Ensured consistent alignment between French and Italian.
For details, please refer to the publication:
F.A. Cardillo, F. Debole, F. Frontini, M. Aelami, N. Chahinian, S. Conrad (2025) “Novel Benchmark for NER in the Wastewater and Stormwater Domain”, Proceedings of the 6th IEEE MNLP Conf. (CiST-MNLP’2025), 4-10 October 2025, Marrakech, Morocco. <https://arxiv.org/abs/2506.01938>
---
## Contents of this Package
- **Texts**: Provided in plain text, in French, with translations into Italian and English (the latter for reference only).
- **Annotations**: Provided in **CoNLL 2003 format, as exported from INCEpTION**, for the French and Italian texts.
- **Annotation guidelines**: Included in **French**, with translations into **English** and **Italian**, as used by annotators.
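The CoNLL 2003 annotation files can be read with a few lines of Python. The sketch below is not an official loader: it assumes that each non-blank line holds whitespace-separated columns with the token first and the entity tag (`B-`/`I-`/`O`) last, that blank lines separate sentences, and that `-DOCSTART-` lines only mark document boundaries. The helper names `read_conll2003` and `entity_spans`, and the entity labels `Network` and `Structure` in the sample, are hypothetical and do not come from the STARWARS schema.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (token, entity tag)

# Hypothetical excerpt in a CoNLL 2003-like layout; the middle columns are
# placeholders and are ignored by the reader below.
SAMPLE = """\
-DOCSTART- -X- -X- O

Le _ _ O
réseau _ _ B-Network
unitaire _ _ I-Network
déborde _ _ O

Le _ _ O
bassin _ _ B-Structure
. _ _ O
"""


def read_conll2003(text: str) -> List[List[Token]]:
    """Split CoNLL 2003-style text into sentences of (token, tag) pairs."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("-DOCSTART-"):
            # Blank line or document marker: close the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split()
        current.append((cols[0], cols[-1]))  # first column = token, last = tag
    if current:
        sentences.append(current)
    return sentences


def entity_spans(sentence: List[Token]) -> List[Tuple[str, str]]:
    """Group tagged tokens into (entity type, surface text) spans.

    A span starts on a B- tag, or on an I- tag whose type differs from the
    open span, so both IOB1 and IOB2 tagging are handled.
    """
    spans: List[Tuple[str, str]] = []
    words: List[str] = []
    label: Optional[str] = None
    for word, tag in sentence:
        prefix, _, etype = tag.partition("-")
        inside = prefix in ("B", "I") and etype != ""
        # Close the open span on O, on a new B- tag, or on a type change.
        if label is not None and (not inside or prefix == "B" or etype != label):
            spans.append((label, " ".join(words)))
            words, label = [], None
        if inside:
            if label is None:
                label = etype
            words.append(word)
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans


for sent in read_conll2003(SAMPLE):
    print(entity_spans(sent))
# [('Network', 'réseau unitaire')]
# [('Structure', 'bassin')]
```

Reading only the first and last columns keeps the sketch independent of how the intermediate columns are filled in a given export; check an actual file from the archive before relying on that assumption.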
Acknowledgement
European Union's Horizon research and innovation program
Project code: euFunds 10108625
Project name: STARWARS (STormwAteR and WastewAteR networkS heterogeneous data AI-driven management)
This item is Publicly Available and licensed under:
Files in this item

| Name | Size | Format | Description | MD5 |
|---|---|---|---|---|
| StarwarsCorpus - 2.0.zip | 825.38 KB | application/zip | Zip | 0b0225f8061be7a7fee3dce4c5853fb6 |
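Before unpacking the archive, the download can be checked against the published MD5 value `0b0225f8061be7a7fee3dce4c5853fb6`. This is a minimal sketch using Python's standard `hashlib` module; the local file name and location are assumptions about where the archive was saved.

```python
import hashlib
from pathlib import Path

# MD5 published on this item page for "StarwarsCorpus - 2.0.zip".
EXPECTED_MD5 = "0b0225f8061be7a7fee3dce4c5853fb6"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """Compare a file's MD5 with the expected (case-insensitive) hex value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("StarwarsCorpus - 2.0.zip")  # assumed download location
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

Reading in chunks keeps memory use constant regardless of file size, although at 825.38 KB the archive could also be hashed in one read.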


