Marseille INRIA Columbia AT&T (MICA) Parser
Key Features
- MICA is a dependency parser trained on the Penn Treebank.
- MICA can associate dependency parses with rich linguistic
information such as voice, the presence of empty subjects (PRO),
wh-movement, and whether a verb heads a relative clause.
- MICA is fast (450 words per second, plus 6 seconds of initialization,
on a standard high-end machine) and has state-of-the-art performance
(87.6% unlabeled dependency accuracy on the Penn Treebank).
- MICA consists of two processes: the supertagger, which associates
tags representing rich syntactic information with the input word
sequence, and the parser proper, based on the INRIA SYNTAX system,
which derives the syntactic structure from the n-best chosen
supertags. Only the supertagger uses lexical information; the parser
sees only the supertag hypotheses.
- MICA returns n-best parses for arbitrary n; parse trees are
associated with probabilities. A packed forest can also be returned.
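The speed and accuracy figures quoted above can be made concrete with a small sketch. It is not part of MICA: the function names are hypothetical, the runtime model simply assumes a fixed 6-second startup followed by a constant 450 words per second (real throughput will vary with sentence length and hardware), and the accuracy function assumes each word's head is given as an integer index, with 0 standing for the root.

```python
# Illustrative only: these helpers are hypothetical and not part of MICA.

INIT_SECONDS = 6.0        # initialization time quoted in the feature list
WORDS_PER_SECOND = 450.0  # parsing speed quoted in the feature list


def estimated_seconds(n_words: int) -> float:
    """Rough runtime estimate: fixed startup cost plus linear parsing time.

    Assumes throughput stays constant regardless of sentence length.
    """
    if n_words < 0:
        raise ValueError("n_words must be non-negative")
    return INIT_SECONDS + n_words / WORDS_PER_SECOND


def unlabeled_accuracy(gold_heads, predicted_heads) -> float:
    """Fraction of words whose predicted head equals the gold head.

    Heads are integer word indices (assumed convention: 0 = root).
    Dependency labels are ignored, hence "unlabeled".
    """
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("head sequences must have the same length")
    if not gold_heads:
        raise ValueError("cannot score an empty sequence")
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)


if __name__ == "__main__":
    # 45,000 words: 6 s of startup plus 100 s of parsing.
    print(estimated_seconds(45000))
    # Toy 4-word sentence in which one head is wrong.
    print(unlabeled_accuracy([2, 0, 2, 3], [2, 0, 2, 2]))
```

Under these assumptions the startup cost dominates for short inputs, so batching many sentences into one run should be cheaper than invoking the parser once per sentence.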
Documentation
MICA is still under development. The documentation in particular is
a work in progress.
- A minimal installation and user guide can be found in the Readme
file (see the Download section).
- A description of the output format of MICA can be found here
- An overview of the parser can be found in the paper
- Theoretical aspects of the parser are described in a series
of technical papers:
- Supertagging
- S. Bangalore, P. Haffner, Classification of Large Label Sets,
in Proceedings of the Snowbird Learning Workshop, 2005
- S. Bangalore, A. Joshi, Supertagging: An Approach to
Almost Parsing, in Computational Linguistics, 25(2), 1999
- Parsing
- A. Nasr, O. Rambow, Parsing with Lexicalized Probabilistic Recursive Transition Networks,
in Finite-State Methods and Natural Language Processing 2005, LNAI 4002
- A. Nasr, O. Rambow, A Simple String-Rewriting Formalism for Dependency Grammar, in Workshop on Recent Advances in Dependency Grammar,
COLING 2004
- A. Nasr, O. Rambow, Supertagging and Full Parsing, in 7th International Workshop
on Tree Adjoining Grammars and Related Frameworks (TAG+), 2004
Download
Mailing Lists
Information concerning MICA is provided through two mailing lists:
- mica-announce is dedicated to messages concerning new releases of MICA.
- mica-users is for the exchange of information among MICA users. Check the
mailing list archive for past contributions.