ILC-CNR for CLARIN-IT repository
About and Policies

Mission Statement

The ultimate objective of CLARIN ERIC is to advance research in humanities and social sciences by giving researchers unified single sign-on access to a platform which integrates language-based resources and advanced tools at a European level. This shall be implemented by the construction and operation of a shared distributed infrastructure that aims at making language resources, technology and expertise available to the humanities and social sciences (henceforth abbreviated HSS) research communities at large.

To learn more about CLARIN ERIC, see CLARIN-ShortGuide.pdf.

Terms of Service

To achieve our mission statement, we set out some ground rules through the Terms of Service. By accessing or using any kind of data or services provided by the Repository, you agree to abide by the Terms contained in the above-mentioned document.

Data in the ILC-CNR for CLARIN-IT repository are made available under the licence attached to each resource. Where no licence is attached, data are made freely available for access, printing and download for the purposes of non-commercial research or private study. In any publication, users must acknowledge the Deposited Work using a persistent identifier (see Citing Data), its original author(s)/creator(s), and any publisher where applicable. Full items must not be harvested by robots except transiently for full-text indexing or citation analysis. Full items must not be sold commercially without the formal permission of the copyright holders, unless the attached licence explicitly allows it.

About Repository

The repository is like a library for linguistic data and tools.

  • Search for data and tools and easily download them.
  • Deposit your data and be sure that it is safely stored and that everyone can find it, use it, and cite it correctly (giving you credit).


Founded as an independent institute of CNR in 1980, ILC has a consolidated position as a leading centre of reference in the field of Computational Linguistics at both the national and international levels. The Institute is involved in research, enhancement, technological transfer and training activities in strategic scientific areas of the discipline.
The main areas of competence of the Institute are: Text Processing and Computational Philology; Natural Language Processing and Knowledge Extraction; Resources, Standards and Infrastructures; Computational Models of Language Usage. The studies carried out within each area are highly interdisciplinary and involve different professional skills and expertise that extend across the disciplines of Linguistics, Computational Linguistics, Computer Science and Bio-Engineering.
The wide range of competence areas of ILC, together with the variety of its lines of activity and research projects, makes the Institute unique in Italy and one of the few of its kind at the international level. ILC activities range from innovative research in the field of Digital Humanities to the definition of representation standards and distributed research infrastructures. Advanced methods and techniques for the knowledge management of Web content or document collections within Intranets are combined with computational models of language learning in ecological contexts of communicative interaction.
Research at ILC combines basic research, also including frontier research, with applied (goal-oriented) research within a virtuous circle, with a potential impact on culture, society and the economic system. Research is carried out within a consolidated network of national and international collaborations with research institutes, universities and public bodies, as well as companies involved in European, national and regional research projects.

License Agreement and Contracts

At the moment, ILC-CNR distinguishes three types of contracts.

  • For every deposit, we enter into a standard contract with the submitter, the so-called "Distribution License Agreement", in which we describe our rights and duties and the submitter acknowledges that they have the right to submit the data and gives us (the repository centre) the right to distribute the data on their behalf.
  • Everyone who downloads data is bound by the licence assigned to the item. To download protected data, a user must be authenticated and must electronically sign the licence. A list of available licences in our repository can be found here.
  • Submitters can assign custom licences to items during the submission workflow.

Intellectual Property Rights

As mentioned in the section License Agreement and Contracts, we require the depositor of data or tools to sign a Distribution License Agreement, which specifies that they have the right to submit the data and gives us (the repository centre) the right to distribute the data on their behalf. This means that depositors are solely responsible for resolving IPR issues before publishing data or tools by submitting them to us.
Anyone who suspects that a dataset or tool in our repository violates Intellectual Property Rights should contact us immediately at our help desk.

Privacy Policy

Read our Privacy Policy in order to learn how we manage personal data collected by the ILC-CNR for CLARIN-IT repository and services.

Metadata Policy

Deposited content must be accompanied by sufficient metadata describing its content, provenance and formats in order to support its preservation and dissemination. Metadata are freely accessible and are distributed in the public domain (under CC0). However, we reserve the right to be informed about any commercial use of metadata from the ILC-CNR for CLARIN-IT repository; please send a description of your use case to the Help Desk.

Preservation Policy

ILC-CNR for CLARIN-IT is committed to the long-term care of items deposited in the repository. ILC-CNR for CLARIN-IT also strives to adopt the current best practice in digital preservation to keep the data (and metadata) available and make the research results replicable by reusing datasets and tools. Indeed, the repository follows the best practice guidelines, standards, and regulations defined in CLARIN ERIC and OAIS.

As required by CLARIN ERIC, ILC-CNR for CLARIN-IT undergoes periodic assessments (every 3 years). The assessment is performed through the CoreTrustSeal (CTS), formerly the Data Seal of Approval (DSA).

Depending on the deposit, every submission is ingested and distributed under a clear licence (see License Agreement and Contracts). This can be the metadata distribution license (see "Distribution License Agreement") or one of the public or academic licenses described here. When a submitted item is restricted, it is made available to authorized users only.

Technically, the ILC-CNR for CLARIN-IT repository is built on top of DSpace. DSpace has several export options that allow data (along with their metadata) to be moved back and forth among various repository systems.

In addition, DSpace identifies two levels of digital preservation: bit preservation and functional preservation. Bit preservation guarantees the integrity of data and metadata over time (and the export options preserve that integrity across different storage systems). Functional preservation is designed to manage change over time (a change of format, for example).
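As an illustration of what bit preservation means in practice, the following minimal Python sketch recomputes a file's checksum and compares it with a value recorded at deposit time. It is not the repository's or DSpace's own mechanism: the function names `sha256_of` and `verify_fixity` and the choice of SHA-256 are assumptions made for this example only.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded_checksum: str) -> bool:
    """Hypothetical helper: True if the file still matches the checksum
    recorded when it was deposited (comparison ignores hex letter case)."""
    return sha256_of(path) == recorded_checksum.lower()
```

A depositor could run the same kind of check on a downloaded copy: if `verify_fixity` returns `False`, the bits have changed since the checksum was recorded.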

Format migration is a primary requisite for functional preservation. Some file formats can be easily preserved (text files, images, well-documented formats...) while others (especially proprietary ones) need specific functionality. To limit this problem, ILC-CNR for CLARIN-IT promotes the use of specific standard formats.

ILC-CNR for CLARIN-IT also provides a "new-version-of" option which allows users to link two different versions of the same resource. For preservation purposes, the old version can be kept for replicability, reproducibility, and reuse, while new versions can adopt different formats.
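The idea of linking versions can be pictured with a small Python sketch. This is only a conceptual model, not the repository's data model: the `Item` class, its `new_version_of` field, and the identifiers used are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Item:
    """Hypothetical record for a deposited resource."""
    identifier: str
    file_format: str
    new_version_of: Optional[str] = None  # identifier of the older version, if any


def version_history(items: Dict[str, Item], identifier: str) -> List[str]:
    """Follow the new-version-of links from an item back to its oldest version."""
    history: List[str] = []
    seen = set()
    current: Optional[str] = identifier
    while current is not None and current not in seen:
        item = items.get(current)
        if item is None:
            break  # the link points outside the known collection
        seen.add(current)  # guard against accidental cycles
        history.append(current)
        current = item.new_version_of
    return history
```

With this model, the older item stays untouched and citable for reproducing earlier results, while the newer item can be stored in a different format and simply points back to it.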