Please use the following text to cite this item:
Frontini, Francesca; et al., 2026, StarwarsNER French Italian Corpus - sample 2.0, ILC-CNR for CLARIN-IT repository hosted at Institute for Computational Linguistics "A. Zampolli", http://hdl.handle.net/20.500.11752/ILC-2180
dc.contributor.author: Frontini, Francesca
dc.contributor.author: Chahinian, Nanée
dc.contributor.author: Aelami, Mitra
dc.contributor.author: Cardillo, Franco Alberto
dc.contributor.author: Conard, Serge
dc.contributor.author: Debole, Franca
dc.date.accessioned: 2026-05-21T13:09:23Z
dc.date.available: 2025-10-17T18:39:28Z
dc.date.available: 2026-05-21T13:09:23Z
dc.date.issued: 2026-05-19
dc.description: The StarwarsNER French Italian Corpus - sample is a multilingual benchmark resource for Named Entity Recognition (NER) in the wastewater and stormwater management domain. Compared with **StarwarsNER French Italian Corpus - sample 1.0**, this new version adds English translations of the annotation guidelines and of the plain texts, so that readers who do not speak Italian or French can better understand and potentially replicate the experiment.

It supports research in:
- Information extraction
- Relation extraction
- Entity linking

The corpus consists of manually annotated parallel French and Italian documents, aligned at the sentence level. Annotations follow a domain-specific schema based on the Sewer Network Ontology <http://hdl.handle.net/20.500.11752/ILC-1037>. For copyright reasons, this release contains only a sample of the original corpus, namely 8 French documents from public administrations and their Italian translations.

---

## Resource Creation

1. **French corpus**
   - Collected from reports, regulations, and local media texts.
   - Manually annotated according to the STARWARS schema.
2. **Italian (and English) corpus**
   - Produced via machine translation of the French texts.
   - Reviewed and corrected by bilingual translation students and expert hydrologists.
3. **Annotation process**
   - Conducted with the **INCEpTION** annotation platform.
   - Ensured consistent alignment between the French and Italian annotations.

For details, please refer to the publication:
F.A. Cardillo, F. Debole, F. Frontini, M. Aelami, N. Chahinian, S. Conrad (2025) “Novel Benchmark for NER in the Wastewater and Stormwater Domain”, Proceedings of the 6th IEEE MNLP Conf. (CiST-MNLP’2025), 4-10 October 2025, Marrakech, Morocco. <https://arxiv.org/abs/2506.01938>

---

## Contents of this Package

- **Texts**: Provided as plain text in French, with translations in Italian and English (the English for reference only).
- **Annotations**: Provided in **CoNLL-2003 format, as exported from INCEpTION**, for the French and Italian texts.
- **Annotation guidelines**: Included in **French**, with translations in **English** and **Italian**, as used by the annotators.
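The annotations are described only as CoNLL-2003 format as exported from INCEpTION. As a minimal sketch for loading them, assuming whitespace-separated columns with the token first and the named-entity tag last, blank lines between sentences, and IOB-style tags (verify these assumptions against the actual files, since the exact column layout of the export is not documented here), the following Python splits a file into sentences and groups tagged tokens into entity spans. The function names `parse_conll` and `entity_spans` are hypothetical, and the entity labels themselves come from the STARWARS schema described in the annotation guidelines.

```python
from typing import Iterable, List, Optional, Tuple

# (surface form, named-entity tag); the tag is ASSUMED to be the last column
Token = Tuple[str, str]


def parse_conll(lines: Iterable[str]) -> List[List[Token]]:
    """Split CoNLL-2003-style lines into sentences of (token, tag) pairs.

    Assumed layout (hypothetical, verify against the files): whitespace-
    separated columns, token first, NE tag last, blank lines between
    sentences, optional -DOCSTART- lines.
    """
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("-DOCSTART-"):
            # A blank or document-start line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split()
        current.append((cols[0], cols[-1]))
    if current:
        sentences.append(current)
    return sentences


def entity_spans(sentence: List[Token]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (entity type, entity text) spans.

    Handles IOB2 (every span starts with B-) and IOB1 (B- only between
    adjacent spans of the same type): a span starts at B-X, or at I-X
    when no X span is open.
    """
    spans: List[Tuple[str, str]] = []
    kind: Optional[str] = None
    words: List[str] = []
    for token, tag in sentence:
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != kind):
            if kind is not None:
                spans.append((kind, " ".join(words)))
            kind, words = label, [token]
        elif prefix == "I":
            words.append(token)
        else:
            # "O" (or any other tag) closes the open span, if any.
            if kind is not None:
                spans.append((kind, " ".join(words)))
            kind, words = None, []
    if kind is not None:
        spans.append((kind, " ".join(words)))
    return spans
```

Pass an open text file (for example one of the annotation files extracted from the archive, opened with `encoding="utf-8"`) to `parse_conll`, then call `entity_spans` on each sentence. Accepting both IOB1 and IOB2 keeps the sketch independent of which tagging variant the export actually uses.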
dc.identifier.uri: http://hdl.handle.net/20.500.11752/ILC-2180
dc.language.iso: ita
dc.language.iso: fra
dc.publisher: Istituto di Linguistica Computazionale “A. Zampolli” - Consiglio Nazionale delle Ricerche (ILC-CNR)
dc.publisher: Institute of Information Science and Technologies "Alessandro Faedo" - National Research Council of Italy (ISTI CNR)
dc.publisher: Institut de Recherche pour le Développement
dc.publisher: Université de Montpellier
dc.relation.isreferencedby: https://doi.org/10.1109/CiSt65886.2025.11224095
dc.relation.isreplacedby: http://hdl.handle.net/20.500.11752/ILC-2169
dc.relation.replaces: http://hdl.handle.net/20.500.11752/ILC-1052
dc.rights: Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.label: PUB
dc.rights.uri: https://creativecommons.org/licenses/by/4.0
dc.source.uri: https://sites.google.com/view/horizoneurope2020-starwars/
dc.subject: Named Entity Recognition
dc.subject: Sewer Network
dc.title: StarwarsNER French Italian Corpus - sample 2.0
dc.type: corpus
local.branding: ILC
local.contact.person: Francesca Frontini francesca.frontini@cnr.it Istituto di Linguistica Computazionale “A. Zampolli” - Consiglio Nazionale delle Ricerche (ILC-CNR)
local.files.count: 1
local.files.size: 845192
local.has.files: yes
local.language.name: Italian
local.language.name: French
local.size.info: 8 files
local.sponsor: Other euFunds 10108625 European Union's Horizon research and innovation program STARWARS (STormwAteR and WastewAteR networkS heterogeneous data AI-driven management)
metashare.ResourceInfo#ContentInfo.mediaType: text

Version History

- Version 3* (2026-05-19 15:19:45): Adding English translations for corpus files
- Earlier version (2025-10-17 18:39:28)

* Selected version
This item is publicly available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0).
Files in this item
Name: StarwarsCorpus - 2.0.zip
Size: 825.38 KB
Format: application/zip
Description: Zip
MD5: 0b0225f8061be7a7fee3dce4c5853fb6
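The MD5 checksum above can be used to confirm that a download is complete and uncorrupted. A minimal sketch using only the Python standard library (`hashlib`, `zipfile`): it hashes the archive in chunks, compares the result with the published value, and then lists the archive's members. The helper names are hypothetical, and the path passed in is simply wherever the download was saved.

```python
import hashlib
import zipfile
from typing import Iterable, Iterator, List

# Checksum published on this page for "StarwarsCorpus - 2.0.zip".
EXPECTED_MD5 = "0b0225f8061be7a7fee3dce4c5853fb6"


def read_chunks(path: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file's bytes in fixed-size chunks so it need not fit in memory."""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            yield chunk


def md5_hex(chunks: Iterable[bytes]) -> str:
    """Return the hexadecimal MD5 digest of a stream of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def verify_and_list(path: str, expected: str = EXPECTED_MD5) -> List[str]:
    """Check the archive against the published MD5, then list its members."""
    actual = md5_hex(read_chunks(path))
    if actual != expected.lower():
        raise ValueError(f"MD5 mismatch: expected {expected}, got {actual}")
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()
```

For example, `verify_and_list("StarwarsCorpus - 2.0.zip")` returns the member names if the checksum matches and raises `ValueError` otherwise. MD5 detects accidental corruption during transfer; it is not a defence against deliberate tampering.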