Please use the following text to cite this item or export to a predefined format:
Frontini, Francesca; et al., 2025, StarwarsNER French Italian Corpus - sample, ILC-CNR for CLARIN-IT repository hosted at Institute for Computational Linguistics "A. Zampolli", http://hdl.handle.net/20.500.11752/ILC-1052
| dc.contributor.author | Frontini, Francesca |
| dc.contributor.author | Chahinian, Nanée |
| dc.contributor.author | Aelami, Mitra |
| dc.contributor.author | Cardillo, Franco Alberto |
| dc.contributor.author | Conrad, Serge |
| dc.contributor.author | Debole, Franca |
| dc.date.accessioned | 2025-10-17T18:39:28Z |
| dc.date.available | 2025-10-17T18:39:28Z |
| dc.date.issued | 2025-10-07 |
| dc.description | The StarwarsNER French Italian Corpus - sample is a multilingual benchmark resource for Named Entity Recognition (NER) in the wastewater and stormwater management domain. It supports research in: - Information extraction - Relation extraction - Entity linking The corpus consists of manually annotated parallel French and Italian documents, aligned at the sentence level. Annotations follow a domain-specific schema based on the Sewer Network Ontology <http://hdl.handle.net/20.500.11752/ILC-1037>. For copyright reasons, this release contains only a sample of the original corpus, namely 8 French documents from public administrations and their Italian translations. --- ## Resource Creation 1. **French corpus** - Collected from reports, regulations, and local media texts. - Manually annotated according to the STARWARS schema. 2. **Italian corpus** - Produced via machine translation of the French texts. - Reviewed and corrected by bilingual translation students and expert hydrologists. 3. **Annotation process** - Conducted with the **INCEpTION** annotation platform. - Ensured consistent alignment between French and Italian. For details, please refer to the publication: F.A. Cardillo, F. Debole, F. Frontini, M. Aelami, N. Chahinian, S. Conrad (2025) “Novel Benchmark for NER in the Wastewater and Stormwater Domain”, Proceedings of the 6th IEEE MNLP Conf. (CiST-MNLP’2025) 4-10 October 2025, Marrakech, Morocco. <https://arxiv.org/abs/2506.01938> --- ## Contents of this Package - **Texts**: Provided in plain text. - **Annotations**: Provided in **CONLL 2003 format, as exported from INCEpTION**. - **Annotation guidelines**: Included in both **French** and **Italian**, as used by annotators. |
| dc.identifier.uri | http://hdl.handle.net/20.500.11752/ILC-1052 |
| dc.language.iso | ita |
| dc.language.iso | fra |
| dc.publisher | Istituto di Linguistica Computazionale “A. Zampolli” - Consiglio Nazionale delle Ricerche (ILC-CNR) |
| dc.publisher | Institute of Information Science and Technologies "Alessandro Faedo" - National Research Council of Italy (ISTI CNR) |
| dc.publisher | Institut de Recherche pour le Développement |
| dc.publisher | Université de Montpellier |
| dc.relation.isreferencedby | https://arxiv.org/abs/2506.01938 |
| dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
| dc.rights.label | PUB |
| dc.rights.uri | https://creativecommons.org/licenses/by/4.0 |
| dc.source.uri | https://sites.google.com/view/horizoneurope2020-starwars/ |
| dc.subject | Named Entity Recognition |
| dc.subject | Sewer Network |
| dc.title | StarwarsNER French Italian Corpus - sample |
| dc.type | corpus |
| local.branding | ILC |
| local.contact.person | Francesca Frontini francesca.frontini@cnr.it Istituto di Linguistica Computazionale “A. Zampolli” - Consiglio Nazionale delle Ricerche (ILC-CNR) |
| local.files.count | 1 |
| local.files.size | 0 |
| local.has.files | yes |
| local.language.name | Italian |
| local.language.name | French |
| local.size.info | 8 files |
| local.sponsor | euFunds 10108625 European Union's Horizon research and innovation program STARWARS (STormwAteR and WastewAteR networkS heterogeneous data AI-driven management) |
| metashare.ResourceInfo#ContentInfo.mediaType | text |
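
The description field above states that the annotations are provided in CoNLL 2003 format as exported from INCEpTION. The following is a minimal sketch of reading such a file into sentences of (token, tag) pairs. It assumes whitespace-separated columns with the token in the first column and the entity tag in the last, blank lines between sentences, and optional `-DOCSTART-` marker lines. The exact column layout of the files in this package has not been checked here, and the function name `read_conll` and the tag `B-OBJ` in the usage note are hypothetical.

```python
from typing import Iterable, List, Tuple

# One sentence is a list of (token, entity_tag) pairs.
Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group CoNLL-style lines into sentences.

    Assumption: first column is the token, last column is the entity tag.
    Blank lines and -DOCSTART- lines end the current sentence.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split()
        current.append((cols[0], cols[-1]))
    if current:  # flush the last sentence if the input has no trailing blank line
        sentences.append(current)
    return sentences
```

For example, `read_conll(["Le X X O", "réseau X X B-OBJ", ""])` groups the two token lines into a single sentence. To read a file from the package, pass an open file handle, for instance `read_conll(open(path, encoding="utf-8"))`, where `path` points at one of the annotation files.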
This item is publicly available and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Files in this item
| Name | StarwarsCorpus.zip |
| Size | 559.42 KB |
| Format | application/zip |
| Description | Zip |
| MD5 | 335f0f1037273b0ba3c1f347842cf962 |
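
The MD5 value listed for StarwarsCorpus.zip can be used to check that a download is intact. The sketch below is one way to do that with Python's standard `hashlib` module; the function names `md5_of_file` and `verify_download` and the local file path are assumptions for illustration, not part of the repository.

```python
import hashlib

# MD5 checksum as listed for StarwarsCorpus.zip in this record.
EXPECTED_MD5 = "335f0f1037273b0ba3c1f347842cf962"


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download("StarwarsCorpus.zip")` on the downloaded archive returns `True` only when the digest matches the listed value. MD5 is suitable here as an integrity check against transfer errors, not as a security guarantee.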


