dc.contributor.author Sprugnoli, Rachele
dc.contributor.author Pellegrini, Matteo
dc.contributor.author Cecchini, Flavio Massimiliano
dc.contributor.author Passarotti, Marco
dc.date.accessioned 2021-03-09T10:26:37Z
dc.date.available 2021-03-09T10:26:37Z
dc.date.issued 2020
dc.identifier.uri http://hdl.handle.net/20.500.11752/OPEN-526
dc.description Training and gold test data released in EvaLatin 2020, the evaluation campaign of NLP tools for Latin. The two shared tasks proposed in EvaLatin 2020, i.e. Lemmatization and Part-of-Speech tagging, were aimed at fostering research in the field of language technologies for Classical languages. The shared dataset consists of texts taken from the Perseus Digital Library, processed with UDPipe models and then manually corrected by Latin experts. The training set includes only prose texts by Classical authors. The test set includes prose texts by the same authors represented in the training set, as well as poetry and texts from the Medieval period.
dc.language.iso lat
dc.publisher CIRCSE Research Centre, Università Cattolica del Sacro Cuore
dc.relation info:eu-repo/grantAgreement/EC/H2020/769994
dc.relation.isreferencedby https://www.aclweb.org/anthology/2020.lt4hala-1.16.pdf
dc.rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.rights.label PUB
dc.source.uri https://github.com/CIRCSE/LT4HALA/tree/master/data_and_doc
dc.subject Latin
dc.subject POS tagging
dc.subject Lemmatization
dc.title EvaLatin 2020: data
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding OPEN
demo.uri https://github.com/CIRCSE/LT4HALA/blob/master/data_and_doc/gold_EvaLatin/Horatius-Carmina_GOLD.conllu
contact.person Rachele Sprugnoli rachele.sprugnoli@unicatt.it Università Cattolica del Sacro Cuore
sponsor European Union EC/H2020/769994 LiLa - Linking Latin. Building a Knowledge Base of Linguistic Resources for Latin euFunds info:eu-repo/grantAgreement/EC/H2020/769994
size.info 341,419 tokens
size.info 16 files
files.size 3061734
files.count 1


 Files in this item

Name: EvaLatin-dataset.zip
Size: 2.92 MB
Format: application/zip
Description: Dataset of EvaLatin 2020
MD5: d6b806a96bd69e2ad35bfa90174ddd33
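
The MD5 value listed for the archive can be used to check that a download is intact. The following is a minimal sketch, not part of the record itself: the function names `md5_of_file` and `verify_download` are hypothetical, and the local path in the usage note is an assumption about where the file was saved.

```python
import hashlib
from pathlib import Path

# MD5 checksum as listed in this record for EvaLatin-dataset.zip.
EXPECTED_MD5 = "d6b806a96bd69e2ad35bfa90174ddd33"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected_md5.lower()
```

Assuming the archive was saved in the current directory, `verify_download("EvaLatin-dataset.zip")` should return `True` for an uncorrupted copy. MD5 is suitable here only as an integrity check against transfer errors, not as a security guarantee.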
 File Preview  
  • EvaLatin-dataset
    • test_gold_data
      • Cicero_InCatilinam_GOLD.conllu (489 kB)
      • Seneca_DeVitaBeata_GOLD.conllu (283 kB)
      • Seneca_DeProvidentia_GOLD.conllu (160 kB)
      • Plinius_Epistulae_10_GOLD.conllu (395 kB)
      • Tacitus_Agricola_GOLD.conllu (273 kB)
      • Caesar_BellumCivile1_GOLD.conllu (444 kB)
      • Tacitus_Germania_GOLD.conllu (221 kB)
      • Horatius-Carmina_GOLD.conllu (524 kB)
      • SummaContraGentiles_IV_GOLD.conllu (451 kB)
    • training_data
      • Caesar_BellumCivile_LiberII.conllu (258 kB)
      • Caesar_BellumGallicum.conllu (1 MB)
      • Pliny_Younger_Epistulae_1-8.conllu (1 MB)
      • Seneca_DeClementia.conllu (320 kB)
      • Seneca_DeBeneficiis.conllu (1 MB)
      • Cicero_Philippica.conllu (2 MB)
      • Tacitus_Historiae.conllu (2 MB)
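
The `.conllu` files listed above use the CoNLL-U format: one token per line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, and so on), `#` comment lines, and a blank line between sentences. The sketch below shows one way the lemma and part-of-speech columns relevant to the two shared tasks might be read. The function name `read_conllu` is hypothetical, and `SAMPLE` is a hand-written illustration, not text copied from the dataset; which of the remaining columns are filled in the released files is not stated in this record.

```python
def read_conllu(lines):
    """Yield sentences as lists of (form, lemma, upos) tuples.

    `lines` is any iterable of strings, e.g. an open file object.
    """
    sentence = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment, e.g. "# sent_id = ..."
        cols = line.split("\t")
        if len(cols) < 4:
            continue  # not a token line with at least ID, FORM, LEMMA, UPOS
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            continue  # skip multiword-token ranges and empty nodes
        sentence.append((cols[1], cols[2], cols[3]))
    if sentence:
        # The last sentence may not be followed by a blank line.
        yield sentence


# Hand-written sample for illustration only (not taken from the corpus).
SAMPLE = (
    "# sent_id = demo-1\n"
    "1\tGallia\tGallia\tPROPN\t_\t_\t_\t_\t_\t_\n"
    "2\test\tsum\tAUX\t_\t_\t_\t_\t_\t_\n"
    "3\tomnis\tomnis\tDET\t_\t_\t_\t_\t_\t_\n"
    "4\tdivisa\tdivido\tVERB\t_\t_\t_\t_\t_\t_\n"
    "\n"
)

sentences = list(read_conllu(SAMPLE.splitlines()))
```

To read one of the released files instead of the sample, the same function can be called on a file opened with `open(path, encoding="utf-8")`, where `path` points to a `.conllu` file extracted from the archive.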
