Multiword Extractor

 
ILC
  Authors
Rubino, Francesco; Quochi, Valeria; Frontini, Francesca
  Item identifier
http://hdl.handle.net/20.500.11752/ILC-91
 Project URL
http://www.panacea-lr.eu/system/deliverables/PANACEA_D6.2.pdf
 Demo URL
https://ilc4clarin.ilc.cnr.it/en/services/multiword-extractor
 Referenced by
http://www.aclweb.org/anthology/C12-1140
 Date issued
2012-12-12
 Type
toolService
 Description
This is a lexical acquisition web service for the automatic extraction of multiword expressions (MWEs) from large corpora. The service takes as input a POS-tagged corpus in CoNLL-X format, plus a pair of POS tags for the first and last word of an MWE, and outputs a list of extracted candidate multiword expressions together with a set of linguistic and statistical information. The output can then be post-processed through filters that refine the candidate list and improve the accuracy of the extraction, and finally converted into an LMF-compliant XML lexical resource. The tool's source code is available at https://github.com/francescafrontini/MWExtractor.

Further details can be found in:

  • Quochi Valeria & Frontini Francesca & Rubino Francesco. 2012. A MWE Acquisition and Lexicon Builder Web Service. In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), December 10-14, 2012, IIT Bombay, Mumbai, India.
  • Frontini Francesca & Rubino Francesco & Quochi Valeria. 2012. Automatic Creation of quality multi-word Lexica from noisy text data. In Proceedings of the Sixth Workshop on Analytics for Noisy Unstructured Text Data (AND 2012), December 9, 2012, IIT Bombay, Mumbai, India (co-located with COLING 2012).
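The input/output contract described above can be illustrated with a minimal sketch. It is not the service's actual implementation: the function names `read_conllx` and `extract_candidates`, the `max_len` parameter, the use of the coarse POS column (CPOSTAG), and the raw frequency count standing in for the "statistical information" are all assumptions made for illustration. The only details taken from the description are the CoNLL-X input and the pair of POS tags for the first and last word.

```python
from collections import Counter


def read_conllx(text):
    """Yield sentences as lists of (form, pos) pairs from CoNLL-X text.

    CoNLL-X rows are tab-separated with ten columns; column 2 is FORM and
    column 4 is CPOSTAG. Sentences are separated by blank lines.
    """
    sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        sentence.append((cols[1], cols[3]))
    if sentence:
        yield sentence


def extract_candidates(sentences, first_pos, last_pos, max_len=3):
    """Count token spans that start with first_pos and end with last_pos.

    max_len (hypothetical parameter) caps the number of tokens per candidate.
    """
    counts = Counter()
    for sent in sentences:
        for i, (_, pos) in enumerate(sent):
            if pos != first_pos:
                continue
            # j is the index of the candidate's last token
            for j in range(i + 1, min(i + max_len, len(sent))):
                if sent[j][1] == last_pos:
                    span = " ".join(form.lower() for form, _ in sent[i:j + 1])
                    counts[span] += 1
    return counts


# Tiny made-up corpus: two sentences in CoNLL-X layout.
sample = "\n".join([
    "1\tmachine\tmachine\tNOUN\tNN\t_\t2\tnmod\t_\t_",
    "2\ttranslation\ttranslation\tNOUN\tNN\t_\t3\tnsubj\t_\t_",
    "3\tworks\twork\tVERB\tVBZ\t_\t0\troot\t_\t_",
    "",
    "1\tthe\tthe\tDET\tDT\t_\t4\tdet\t_\t_",
    "2\tmachine\tmachine\tNOUN\tNN\t_\t3\tnmod\t_\t_",
    "3\ttranslation\ttranslation\tNOUN\tNN\t_\t4\tnmod\t_\t_",
    "4\tsystem\tsystem\tNOUN\tNN\t_\t0\troot\t_\t_",
])

counts = extract_candidates(read_conllx(sample), "NOUN", "NOUN")
for candidate, freq in counts.most_common():
    print(freq, candidate)
```

In this sketch the frequency-ranked list plays the role of the candidate list; the filtering and LMF conversion steps mentioned in the description are not modelled.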
 Publisher
Istituto di Linguistica Computazionale “A. Zampolli” - Consiglio Nazionale delle Ricerche (ILC-CNR)
 Acknowledgement

European Commission

Project code: FP7-STREP-GA248064

Project name: Platform for Automatic, Normalized Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies

 Subject(s)
Multiword Extraction; Automatic lexical acquisition
 Collection(s)
ILC4CLARIN : ILC Data & Tools