Parsing Techniques
for
Lexicalized
Context-Free Grammars*
Giorgio Satta
University of Padua
* Joint work with: Jason Eisner, Mark-Jan Nederhof
Summary
• Part I: Lexicalized Context-Free Grammars
– motivations and definition
– relation with other formalisms
• Part II: standard parsing
– TD techniques
– BU techniques
• Part III: novel algorithms
– BU enhanced
– TD enhanced
Lexicalized grammars
• each rule specialized for one or more
lexical items
• advantages over non-lexicalized
formalisms:
– express syntactic preferences that are
sensitive to lexical words
– control word selection
Syntactic preferences
• adjuncts
Workers [ dumped sacks ] into a bin
*Workers dumped [ sacks into a bin ]
• N-N compound
[ hydrogen ion ] exchange
*hydrogen [ ion exchange ]
Word selection
• lexical
Nora convened the meeting
?Nora convened the party
• semantics
Peggy solved two puzzles
?Peggy solved two goats
• world knowledge
Mary shelved some books
?Mary shelved some cooks
Lexicalized CFG
Motivations:
• study computational properties common
to generative formalisms used in
state-of-the-art real-world parsers
• develop parsing algorithms that can be
directly applied to these formalisms
Lexicalized CFG
Example parse for "dumped sacks into a bin":

VP[dump][sack]
├─ VP[dump][sack]
│  ├─ V[dump] → dumped
│  └─ NP[sack]
│     └─ N[sack] → sacks
└─ PP[into][bin]
   ├─ P[into] → into
   └─ NP[bin]
      ├─ Det[a] → a
      └─ N[bin] → bin
Lexicalized CFG
Context-free grammars with:
• alphabet VT:
– dumped, sacks, into, ...
• delexicalized nonterminals VD:
– NP, VP, ...
• nonterminals VN:
– NP[sack], VP[dump][sack], ...
Lexicalized CFG
Delexicalized nonterminals encode:
• word sense
N, V, ...
• grammatical features
number, tense, ...
• structural information
bar level, subcategorization state, ...
• other constraints
distribution, contextual features, ...
Lexicalized CFG
• productions have two forms:
– V[dump] → dumped
– VP[dump][sack] → VP[dump][sack] PP[into][bin]
• lexical elements in lhs inherited from rhs
Lexicalized CFG
• a production is k-lexical:
k occurrences of lexical elements in the rhs
– NP[bin] → Det[a] N[bin] is 2-lexical
– VP[dump][sack] → VP[dump][sack] PP[into][bin] is 4-lexical
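The k-lexicality of a production can be read off directly from its right-hand side. A small sketch (the tuple representation of nonterminals is an assumption, not the tutorial's notation): a nonterminal is a pair of a delexicalized label and a tuple of lexical heads.

```python
# A nonterminal is modeled as (delexicalized label, tuple of lexical heads);
# the lexicality of a production is the number of lexical-element
# occurrences on its right-hand side.

def lexicality(rhs):
    """rhs: sequence of (label, heads) nonterminals."""
    return sum(len(heads) for _label, heads in rhs)

# NP[bin] -> Det[a] N[bin] is 2-lexical:
assert lexicality([("Det", ("a",)), ("N", ("bin",))]) == 2
# VP[dump][sack] -> VP[dump][sack] PP[into][bin] is 4-lexical:
assert lexicality([("VP", ("dump", "sack")), ("PP", ("into", "bin"))]) == 4
```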
LCFG at work
• 2-lexical CFG
– Alshawi 1996: Head Automata
– Eisner 1996: Dependency Grammars
– Charniak 1997: CFG
– Collins 1997: generative model
LCFG at work
A probabilistic LCFG G is strongly equivalent to a
probabilistic grammar G' iff:
• one-to-one mapping between derivations
• each direction is a homomorphism
• derivation probabilities are preserved
LCFG at work
From Charniak 1997 to 2-lex CFG:
NP^S[profits] → ADJ^NP[corporate] N^NP[profits]
with probability
Pr1(corporate | ADJ, NP, profits) × Pr1(profits | N, NP, profits) × Pr2(NP → ADJ N | NP, S, profits)
LCFG at work
From Collins 1997 (Model #2) to 2-lex CFG:
VP⟨S, {NP-C}, Δleft, ΔS⟩[bought]
→ NP⟨Δ⟩[IBM] VP⟨S, {}, Δleft ∪ Δ, ΔS ∪ Δ⟩[bought]
with probability
Pr_left(NP, IBM | VP, S, bought, Δleft, {NP-C})
LCFG at work
Major limitation:
cannot capture relations involving
lexical items outside the actual constituent
(cf. history-based models)
(figure: V[d0] governing NP[d1]; when computing the attachment of PP[d2][d3] within the NP, the grammar cannot look at d0)
LCFG at work
• lexicalized context-free parsers that are not LCFG:
– Magerman 1995: Shift-Reduce+
– Ratnaparkhi 1997: Shift-Reduce+
– Chelba & Jelinek 1998: Shift-Reduce+
– Hermjakob & Mooney 1997: LR
Related work
Other frameworks for the study of lexicalized grammars:
• Carroll & Weir 1997: Stochastic Lexicalized Grammars;
emphasis on expressiveness
• Goodman 1997: Probabilistic Feature Grammars;
emphasis on parameter estimation
Summary
• Part I: Lexicalized Context-Free Grammars
– motivations and definition
– relation with other formalisms
• Part II: standard parsing
– TD techniques
– BU techniques
• Part III: novel algorithms
– BU enhanced
– TD enhanced
Standard Parsing
• standard parsing algorithms (CKY, Earley, LC, ...)
run on LCFG in time O(|G| · |w|³)
• for 2-lex CFG (simplest case), |G| grows
with |VD|³ · |VT|² !!
Goal:
get rid of the |VT| factors
Standard Parsing: TD
Result (to be refined):
algorithms satisfying the correct-prefix
property are “unlikely” to run on LCFG in
time independent of VT
Correct-prefix property
Earley, Left-Corner, GLR, ...: at each left-to-right
reading position, the input prefix scanned so far
can be extended to some string derivable from S
(figure: partial parse from S over w, with the reading position marked)
On-line parsing
No grammar precompilation (Earley):
G, w → Parser → Output
Standard Parsing: TD
Result:
on-line parsers with the correct-prefix property
cannot run in time O(f(|VD|, |w|)),
for any function f
Off-line parsing
Grammar is precompiled (Left-Corner, LR):
G → PreComp → C(G)
C(G), w → Parser → Output
Standard Parsing: TD
Fact:
we can simulate a nondeterministic
FA M on w in time O(|M| · |w|)
Conjecture:
fix a polynomial p; we cannot simulate M on w
in time p(|w|) unless we spend exponential
time in precompiling M
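The fact above can be sketched directly: track the set of reachable states, one pass over the transition relation per input symbol, giving O(|M| · |w|) time. The dict-based transition encoding is an assumption for illustration.

```python
# Simulate a nondeterministic FA M on input w in O(|M| * |w|) time
# by maintaining the set of currently reachable states.

def nfa_accepts(delta, start, finals, w):
    """delta maps (state, symbol) to a set of successor states."""
    current = {start}
    for a in w:
        current = {q for s in current for q in delta.get((s, a), ())}
        if not current:          # no reachable state: reject early
            return False
    return bool(current & finals)

# Example M: accepts strings over {a, b} containing the factor "ab".
delta = {(0, "a"): {0, 1}, (0, "b"): {0},
         (1, "b"): {2},
         (2, "a"): {2}, (2, "b"): {2}}
assert nfa_accepts(delta, 0, {2}, "bbab")
assert not nfa_accepts(delta, 0, {2}, "ba")
```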
Standard Parsing: TD
Assume our conjecture holds true.
Result:
off-line parsers with the correct-prefix property
cannot run in time O(p(|VD|, |w|)),
for any polynomial p, unless we spend
exponential time in precompiling G
Standard Parsing: BU
Common practice in lexicalized grammar parsing:
• select productions that are lexically
grounded in w
• parse BU with the selected subset of G
Problem:
the algorithm removes the |VT| factors but
introduces new |w| factors !!
Standard Parsing: BU
Time charged by the basic step deriving A[d2] over (i, j)
from B[d1] over (i, k) and C[d2] over (k, j):
• i, k, j : |w|³
• A, B, C : |VD|³
• d1, d2 : |w|²
Running time is O(|VD|³ · |w|⁵) !!
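The count above can be made concrete by enumerating what the naive step instantiates. A counting sketch (the helper name is mine): five string positions (i, k, j, d1, d2) and three delexicalized nonterminals (A, B, C), hence O(|VD|³ · |w|⁵) overall.

```python
# Count the instantiations of the naive BU step A[d2] -> B[d1] C[d2]
# over spans 0 <= i <= k <= j <= n, with head positions d1, d2 ranging
# over the n words, and |VD|^3 choices of delexicalized nonterminals.

def naive_bu_instantiations(n, n_delex):
    spans = sum(1 for i in range(n + 1)
                  for k in range(i, n + 1)
                  for j in range(k, n + 1))   # ~ |w|^3 span triples
    return spans * n * n * n_delex ** 3       # heads d1, d2: |w|^2 more

# For |w| = 10 there are C(13, 3) = 286 span triples:
assert naive_bu_instantiations(10, 1) == 286 * 10 * 10
```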
Standard BU: Exhaustive
(log-log plot, parsing time vs. sentence length, exhaustive search: BU naive fits y = c · x^5.2019)
Standard BU: Pruning
(log-log plot, parsing time vs. sentence length, with pruning: BU naive fits y = c · x^3.8282)
Summary
• Part I: Lexicalized Context-Free Grammars
– motivations and definition
– relation with other formalisms
• Part II: standard parsing
– TD techniques
– BU techniques
• Part III: novel algorithms
– BU enhanced
– TD enhanced
BU enhanced
Result:
parsing with 2-lex CFG in time O(|VD|³ · |w|⁴)
Remark:
result transfers to models in Alshawi 1996,
Eisner 1996, Charniak 1997, Collins 1997
Remark:
technique extends to improve parsing of
Lexicalized Tree-Adjoining Grammars
Algorithm #1
Basic step in naive BU:
combine B[d1] over (i, k) with C[d2] over (k, j)
into A[d2] over (i, j)
Idea:
indices d1 and j can be processed independently
Algorithm #1
• Step 1: combine B[d1] over (i, k) with the production
A[d2] → B[d1] C[d2], discarding d1 and B; this yields
an intermediate item A[d2]/C[d2] over (i, k)
(indices i, k, d1, d2)
• Step 2: combine A[d2]/C[d2] over (i, k) with a
complete C[d2] over (k, j); this yields A[d2] over (i, j)
(indices i, k, j, d2)
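The two steps above can be sketched over chart items. This is a minimal illustration, not the authors' pseudocode: the item encoding, the hook notation A[d2]/C[d2], and the function names are assumptions. Step 1 sums out d1 and B before the right boundary j is ever bound; Step 2 never sees d1.

```python
# Step 1 / Step 2 decomposition for rules A[d2] -> B[d1] C[d2].
# Complete items are (label, head position, i, j); intermediate "hook"
# items (A, C, d2, i, k) stand for an A[d2] still missing C[d2].

def step1(b_items, rules, n):
    """Combine complete B[d1] over (i, k) with rules (A, B, C),
    the head being inherited from C; d1 and B are summed out."""
    hooks = set()
    for (B, d1, i, k) in b_items:
        for (A, B2, C) in rules:
            if B2 == B:
                for d2 in range(k + 1, n + 1):  # candidate head position
                    hooks.add((A, C, d2, i, k))
    return hooks

def step2(hooks, c_items):
    """Combine A[d2]/C[d2] over (i, k) with complete C[d2] over (k, j)."""
    return {(A, d2, i, j)
            for (A, C, d2, i, k) in hooks
            for (C2, e, k2, j) in c_items
            if (C2, e, k2) == (C, d2, k)}

hooks = step1({("B", 1, 0, 1)}, [("A", "B", "C")], n=3)
assert ("A", 2, 0, 3) in step2(hooks, {("C", 2, 1, 3)})
```

Each call binds at most four string positions at once, which is where the |w|⁵ → |w|⁴ improvement comes from.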
BU enhanced
Upper bound provided by Algorithm #1: O(|w|⁴)
Goal:
can we go down to O(|w|³)?
Spine
The spine of a parse tree is the path from
the root to the root's head.

S[buy]
├─ S[buy]
│  ├─ NP[IBM] → IBM
│  └─ VP[buy]
│     ├─ V[buy] → bought
│     └─ NP[Lotus] → Lotus
└─ AdvP[week] → last week

Spine: S[buy], S[buy], VP[buy], V[buy], bought
Spine projection
The spine projection is the yield of the subtree
composed of the spine and all its sibling nodes.

S[buy]
├─ S[buy]
│  ├─ NP[IBM] → IBM
│  └─ VP[buy]
│     ├─ V[buy] → bought
│     └─ NP[Lotus] → Lotus
└─ AdvP[week] → last week

Spine projection: NP[IBM] bought NP[Lotus] AdvP[week]
Split Grammars
Split spine projections at the head:
NP[IBM] bought | NP[Lotus] AdvP[week]
Problem:
how much information do we need to store
in order to construct new grammatical
spine projections from splits?
Split Grammars
Fact:
the set of spine projections is a linear context-free language
Definition:
a 2-lex CFG is split if its set of spine projections
is a regular language
Remark:
for split grammars, we can recombine
splits using finite information
Split Grammars
Non-split grammar:
S[d] → AdvP[a] S1[d]
S1[d] → S[d] AdvP[b]
• unbounded # of dependencies between left and
right dependents of the head
• linguistically unattested and unlikely
Split Grammars
Split grammar:
S[d] → AdvP[a] S[d]
S[d] → S[d] AdvP[b]
• finite # of dependencies between left and
right dependents of the lexical head
Split Grammars
Precompile the grammar so that splits are derived separately:

S[buy] → S[buy] AdvP[week]        S[buy] → NP[IBM] r3[buy]
S[buy] → NP[IBM] VP[buy]     ⇒    r3[buy] → r2[buy] AdvP[week]
VP[buy] → V[buy] NP[Lotus]        r2[buy] → r1[buy] NP[Lotus]
V[buy] → bought                   r1[buy] → bought

r1[buy], r2[buy], r3[buy] are split symbols
Split Grammars
• t : max # of states per spine automaton
• g : max # of split symbols per spine automaton (g < t)
• m : # of delexicalized nonterminals that are
maximal projections
BU enhanced
Result:
parsing with split 2-lexical CFG in time
O(t² · g² · m² · |w|³)
Remark:
models in Alshawi 1996, Charniak 1997
and Collins 1997 are not split
Algorithm #2
Idea:
• recognize left and right splits separately
• collect head dependents one split at a time
(figure: a complete constituent B[d] is assembled from a left split and a right split that meet at the head d and share a split symbol s)
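The same split-head idea, in the simpler setting of projective dependency parsing, is what gives Eisner-style O(|w|³) parsing: each head collects its left and right dependents in separate half-constituents. A sketch under assumed arc scores (score[h][m] = score of making word m a dependent of word h; word 0 acts as the root); this illustrates the principle, not the tutorial's Algorithm #2 itself.

```python
# Eisner-style split-head chart: I[i][j][d] = "incomplete" span with an
# arc between i and j being built; C[i][j][d] = "complete" half-
# constituent; direction d = 0 means head at j, d = 1 means head at i.

def eisner(score):
    """Return the score of the best projective dependency tree."""
    n = len(score)
    NEG = float("-inf")
    I = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    C = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    for i in range(n):
        C[i][i][0] = C[i][i][1] = 0.0
    for length in range(1, n):
        for i in range(n - length):
            j = i + length
            # join two facing complete halves and add a single arc
            best = max(C[i][r][1] + C[r + 1][j][0] for r in range(i, j))
            I[i][j][0] = best + score[j][i]      # arc j -> i
            I[i][j][1] = best + score[i][j]      # arc i -> j
            # absorb an incomplete span: one dependent at a time
            C[i][j][0] = max(C[i][r][0] + I[r][j][0] for r in range(i, j))
            C[i][j][1] = max(I[i][r][1] + C[r][j][1]
                             for r in range(i + 1, j + 1))
    return C[0][n - 1][1]

# Best tree: root -> word1 (10) and word1 -> word2 (5), total 15.
assert eisner([[0, 10, 1], [0, 0, 5], [0, 2, 0]]) == 15.0
```

Because left and right halves never carry both boundary heads at once, no item binds more than three positions, which is the source of the cubic bound.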
Algorithm #2
Example: NP[IBM] bought NP[Lotus] AdvP[week], split at the head bought
Algorithm #2
(figure: the two combination steps; Step 1 attaches a dependent half-constituent B[d1] with split symbol s1 at boundary k to a split of head d2 in state r2; Step 2 combines splits with matching split symbol s2 at head d2)
Algorithm #2: Exhaustive
(log-log plot, parsing time vs. sentence length, exhaustive search: BU naive fits y = c · x^5.2019, BU split fits y = c · x^3.328)
Algorithm #2: Pruning
(log-log plot, parsing time vs. sentence length, with pruning: BU naive fits y = c · x^3.8282, BU split fits y = c · x^2.8179)
Related work
Cubic-time algorithms for lexicalized grammars:
• Sleator & Temperley 1991: Link Grammars
• Eisner 1997: Bilexical Grammars
(improved by transfer of Algorithm #2)
TD enhanced
Goal:
introduce TD prediction for 2-lexical
CFG parsing, without |VT| factors
Remark:
must relax left-to-right parsing
(because of previous results)
TD enhanced
Result:
TD parsing with 2-lex CFG in time O(|VD|³ · |w|⁴)
Open:
O(|w|³) extension to split grammars
TD enhanced
Strongest version of the correct-prefix property:
(figure: partial parse from S over w, with the current reading position marked)
Data Structures
Productions with lhs A[d]:
• A[d] → X1[d1] X2[d2]
• A[d] → Y1[d3] Y2[d2]
• A[d] → Z1[d2] Z2[d1]
Trie for A[d] stores the rhs lexical elements,
rightmost first, sharing common prefixes:
• d1-d2
• d2-d1
• d2-d3 (the last two branches share the prefix d2)
Data Structures
Rightmost subsequence recognition
by precompiling the input w into a
deterministic FA
(figure: DFA built over an example input with symbols a, b, c)
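The precompilation above can be sketched with a table of rightmost occurrences (the interface and names are assumptions): prev[k][a] is the rightmost position j < k with w[j] == a, so a sequence of lexical elements can be matched as a subsequence of w[0..k), as far to the right as possible, in one right-to-left scan.

```python
# Precompile w into a rightmost-occurrence table, then match symbol
# sequences as rightmost subsequences of a prefix of w.

def build_prev(w):
    prev = [dict()]                  # prev[k] summarizes w[0..k)
    for i, a in enumerate(w):
        row = dict(prev[-1])
        row[a] = i
        prev.append(row)
    return prev

def rightmost_match(prev, symbols, k):
    """Rightmost embedding of `symbols` as a subsequence of w[0..k);
    returns the matched positions, or None if there is none."""
    pos = []
    for a in reversed(symbols):
        j = prev[k].get(a)
        if j is None:
            return None
        pos.append(j)
        k = j                        # continue strictly to the left
    return pos[::-1]

prev = build_prev("abacab")
assert rightmost_match(prev, "ab", 6) == [4, 5]
assert rightmost_match(prev, "ab", 5) == [0, 1]
assert rightmost_match(prev, "cc", 6) is None
```

Each query costs time proportional to the length of the matched sequence, independently of |w|.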
Algorithm #3
Item representation: (A[d], i, j, k)
• i, j indicate the extension of the A[d] partial analysis
• k indicates the rightmost possible position
for the completion of the A[d] analysis
Algorithm #3: Prediction
• Step 1: find the rightmost subsequence before k
for some A[d2] production
(figure: production A[d2] → B[d1] C[d2] with lexical elements d1, d2 found at rightmost positions k' ≤ k)
• Step 2: make an Earley prediction
Conclusions
• standard parsing techniques are not
suitable for processing lexicalized grammars
• novel algorithms have been introduced
using enhanced dynamic programming
• work to be done:
extension to history-based models
The End
Many thanks for helpful discussion to:
Jason Eisner, Mark-Jan Nederhof