Marseille INRIA Columbia AT&T (MICA)
PARSER

Key Features

MICA is a dependency parser trained on the Penn Treebank.
MICA can associate dependency parses with rich linguistic information such as voice, the presence of empty subjects (PRO), wh-movement, and whether a verb heads a relative clause.
MICA is fast (450 words per second plus 6 seconds initialization on a standard high-end machine) and has state-of-the-art performance (87.6% unlabeled dependency accuracy on the Penn Treebank).
MICA consists of two processes: the supertagger, which associates tags representing rich syntactic information with the input word sequence, and the actual parser, based on the INRIA SYNTAX system, which derives the syntactic structure from the n-best chosen supertags. Only the supertagger uses lexical information; the parser sees only the supertag hypotheses.
MICA returns n-best parses for arbitrary n; parse trees are associated with probabilities. A packed forest can also be returned.

Documentation

MICA is still under development. The documentation in particular is a work in progress.

A minimalist installation and user guide can be found in the Readme file (see section Download).
A description of the output format of MICA can be found here
An overview of the parser can be found in the paper:
- MICA: A Probabilistic Dependency Parser Based on Tree Insertion Grammars, in Proc. North American Chapter of the Association for Computational Linguistics (NAACL) 2009
The grammar implemented in MICA can be found in this file

Each line in the file corresponds to an elementary tree.

Here is an example:

t27 S##1#l# NP#0#2#l#s NP#0#2#r#s VP##3#l# V##4#l#h V##4#r#h NP#1#5#l#s NP#1#5#r#s VP##3#r# S##1#r#

This is the tree t27 for the basic transitive verb.
The nodes of the tree are listed in a depth-first, left-to-right traversal.
Each node is listed twice: once when descending, and again when ascending.
This means that for a leaf node, the two entries (left and right) appear right next to each other.
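
Because every node is opened by its left entry and closed by its right entry, the nesting can be recovered with a stack. The following is a minimal sketch in Python, not part of the MICA distribution. It assumes that the fields of each entry are separated by `#`, with the label first, the node number third and the l/r marker fourth, as in the example above. The names `build_tree` and `bracketed` are hypothetical.

```python
def build_tree(line):
    """Rebuild the nested structure of one grammar line.

    Returns (tree_name, root), where each node is a
    (label, number, children) tuple.
    """
    name, *entries = line.split()
    # Sentinel parent so that the root is handled like any other node.
    stack = [("TOP", None, [])]
    for entry in entries:
        label, _arg, number, side, _kind = entry.split("#")
        number = int(number)
        if side == "l":
            # Descending: open a new node under the current parent.
            node = (label, number, [])
            stack[-1][2].append(node)
            stack.append(node)
        elif side == "r":
            # Ascending: the entry must close the most recently opened node.
            opened = stack.pop()
            if opened[1] != number:
                raise ValueError(f"entry {entry!r} closes node {opened[1]}")
        else:
            raise ValueError(f"unexpected side marker in {entry!r}")
    if len(stack) != 1 or len(stack[0][2]) != 1:
        raise ValueError("line does not describe exactly one complete tree")
    return name, stack[0][2][0]


def bracketed(node):
    """Render a node in bracketed notation, for inspection only."""
    label, _number, children = node
    if not children:
        return label
    return "(" + label + " " + " ".join(bracketed(c) for c in children) + ")"


EXAMPLE = ("t27 S##1#l# NP#0#2#l#s NP#0#2#r#s VP##3#l# V##4#l#h V##4#r#h "
           "NP#1#5#l#s NP#1#5#r#s VP##3#r# S##1#r#")
tree_name, root = build_tree(EXAMPLE)
print(tree_name, bracketed(root))
```

For the example line, this prints the tree name followed by the bracketed structure `(S NP (VP V NP))`.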

The format for each node is (using an example to explain): NP#0#2#l#s
- NP - node label
- 0 - deep argument position (this is only filled for substitution nodes)
- 2 - the number of the node in the tree; each node has a unique number, but since each node is listed twice in this enumeration, each number will occur exactly twice
- l - this is the left version of the node, the right version (r) will also occur, in this case right next to it since it is a leaf node
- s - the type of node; this can be s for substitution, h for head, c for co-head (for strongly governed prepositions) or nothing
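
The five fields of a node entry can be decoded as follows. This is a minimal sketch in Python, not part of the MICA distribution. It assumes that every entry has exactly five `#`-separated fields, as in the example. The names `NodeEntry`, `parse_entry` and the value `unmarked` (used here for an empty type field) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Type codes as listed above; "unmarked" is this sketch's own name
# for an empty type field.
NODE_TYPES = {"s": "substitution", "h": "head", "c": "co-head", "": "unmarked"}


@dataclass(frozen=True)
class NodeEntry:
    label: str                   # node label, e.g. NP
    arg_position: Optional[int]  # deep argument position; None when empty
    number: int                  # node number, unique within the tree
    side: str                    # "l" (descending) or "r" (ascending)
    node_type: str               # one of the values of NODE_TYPES


def parse_entry(entry):
    """Decode one entry such as 'NP#0#2#l#s'."""
    label, arg, number, side, kind = entry.split("#")
    if side not in ("l", "r"):
        raise ValueError(f"unexpected side marker in {entry!r}")
    if kind not in NODE_TYPES:
        raise ValueError(f"unexpected node type in {entry!r}")
    return NodeEntry(
        label=label,
        arg_position=int(arg) if arg else None,
        number=int(number),
        side=side,
        node_type=NODE_TYPES[kind],
    )


print(parse_entry("NP#0#2#l#s"))
print(parse_entry("S##1#r#"))
```

The argument position is kept as `None` rather than 0 when the field is empty, since 0 is itself a valid position.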
A more readable version of the grammar is available here. Do not print it: it is over 1,000 pages.
Theoretical aspects of the parser are described in a series of technical papers:
- Supertagging
  - S. Bangalore, P. Haffner, Classification of Large Label Sets, in Proceedings of the Snowbird Learning Workshop 2005
  - S. Bangalore, A. Joshi, Supertagging: An approach to almost parsing, in Computational Linguistics, 25(2) 1999
- Parsing
  - A. Nasr, O. Rambow, Parsing with Lexicalized Probabilistic Recursive Transition Networks, in Finite-State Methods and Natural Language Processing 2005 - LNAI 4002
  - A. Nasr, O. Rambow, A Simple String-Rewriting Formalism for Dependency Grammar, in Workshop on Recent Advances in Dependency Grammar - COLING 2004
  - A. Nasr, O. Rambow, Supertagging and Full Parsing, in 7th International Workshop on Tree Adjoining Grammars and Related Frameworks - TAG+ 2004

Download

Version	Architecture	Package	Documentation
1.0	Linux X86	mica-1_0.x86_32.tgz	Readme

Mailing Lists

Information concerning MICA is provided through two mailing lists:

mica-announce is dedicated to messages concerning new releases of MICA.
mica-users is for exchange of information among MICA users. Check the mailing list archive for past contributions.
