Parsing Techniques for Lexicalized Context-Free Grammars
Giorgio Satta, University of Padua
Joint work with: Jason Eisner, Mark-Jan Nederhof

Summary
• Part I: Lexicalized Context-Free Grammars
– motivations and definition
– relation with other formalisms
• Part II: standard parsing
– TD techniques
– BU techniques
• Part III: novel algorithms
– BU enhanced
– TD enhanced

Lexicalized grammars
• each rule is specialized for one or more lexical items
• advantages over non-lexicalized formalisms:
– express syntactic preferences that are sensitive to lexical words
– control word selection

Syntactic preferences
• adjuncts:
Workers [ dumped sacks ] into a bin
*Workers dumped [ sacks into a bin ]
• N-N compounds:
[ hydrogen ion ] exchange
*hydrogen [ ion exchange ]

Word selection
• lexical: Nora convened the meeting / ?Nora convened the party
• semantics: Peggy solved two puzzles / ?Peggy solved two goats
• world knowledge: Mary shelved some books / ?Mary shelved some cooks

Lexicalized CFG
Motivations:
• study computational properties common to the generative formalisms used in state-of-the-art real-world parsers
• develop parsing algorithms that can be directly applied to these formalisms

Lexicalized CFG
Example parse:
(VP[dump][sack]
  (VP[dump][sack] (V[dump] dumped) (NP[sack] sacks))
  (PP[into][bin] (P[into] into) (NP[bin] (Det[a] a) (N[bin] bin))))

Lexicalized CFG
Context-free grammars with:
• alphabet VT: dumped, sacks, into, ...
• delexicalized nonterminals VD: NP, VP, ...
• nonterminals VN: NP[sack], VP[dump][sack], ...

Lexicalized CFG
Delexicalized nonterminals encode:
• word sense: N, V, ...
• grammatical features: number, tense, ...
• structural information: bar level, subcategorization state, ...
• other constraints: distribution, contextual features, ...
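The definitions above can be made concrete with a small sketch (illustrative only; the class and variable names are mine, not the deck's): a lexicalized nonterminal pairs a delexicalized nonterminal from VD with one or two lexical items from VT, and a production's lhs inherits its lexical elements from the rhs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LexNT:
    """A nonterminal of a lexicalized CFG: a delexicalized
    nonterminal from VD plus its lexical items from VT."""
    cat: str       # e.g. "VP" (member of VD)
    heads: tuple   # e.g. ("dump", "sack") (members of VT)

    def __str__(self):
        return self.cat + "".join(f"[{h}]" for h in self.heads)

vp = LexNT("VP", ("dump", "sack"))
pp = LexNT("PP", ("into", "bin"))
# the production VP[dump][sack] -> VP[dump][sack] PP[into][bin]:
lhs, rhs = LexNT("VP", ("dump", "sack")), (vp, pp)
# the lhs inherits all of its lexical elements from the rhs:
assert set(lhs.heads) <= {h for nt in rhs for h in nt.heads}
print(vp)  # VP[dump][sack]
```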
Lexicalized CFG
• productions have two forms:
– V[dump] → dumped
– VP[dump][sack] → VP[dump][sack] PP[into][bin]
• lexical elements in the lhs are inherited from the rhs

Lexicalized CFG
• a production is k-lexical if its rhs contains k occurrences of lexical elements:
– NP[bin] → Det[a] N[bin] is 2-lexical
– VP[dump][sack] → VP[dump][sack] PP[into][bin] is 4-lexical

LCFG at work
• 2-lexical CFG:
– Alshawi 1996: Head Automata
– Eisner 1996: Dependency Grammars
– Charniak 1997: CFG
– Collins 1997: generative model

LCFG at work
A probabilistic LCFG G is strongly equivalent to a probabilistic grammar G′ iff:
• there is a one-to-one mapping between derivations
• each direction of the mapping is a homomorphism
• derivation probabilities are preserved

LCFG at work
From Charniak 1997 to 2-lex CFG (subscript = parent category):
NP_S[profits] → ADJ_NP[corporate] N_NP[profits]
with probability Pr1(corporate | ADJ, NP, profits) · Pr1(profits | N, NP, profits) · Pr2(NP → ADJ N | NP, S, profits)

LCFG at work
From Collins 1997 (Model 2) to 2-lex CFG (delexicalized nonterminals record the parent category, the subcategorization requirements, and distance features):
VP⟨S, {NP-C}, Δleft⟩[bought] → N⟨D⟩[IBM] VP⟨S, {}, Δ′left⟩[bought]
with probability Prleft(NP, IBM | VP, S, bought, Δleft, {NP-C})

LCFG at work
Major limitation: LCFG cannot capture relations involving lexical items outside the actual constituent (cf.
history-based models).
[figure: a verb V[d0] with object NP[d1]; when PP[d2][d3] is attached inside NP[d1], the grammar cannot look at d0 to decide the PP attachment]

LCFG at work
• lexicalized context-free parsers that are not LCFG:
– Magerman 1995: Shift-Reduce+
– Ratnaparkhi 1997: Shift-Reduce+
– Chelba & Jelinek 1998: Shift-Reduce+
– Hermjakob & Mooney 1997: LR

Related work
Other frameworks for the study of lexicalized grammars:
• Carroll & Weir 1997: Stochastic Lexicalized Grammars; emphasis on expressiveness
• Goodman 1997: Probabilistic Feature Grammars; emphasis on parameter estimation

Part II: standard parsing – TD techniques, BU techniques

Standard parsing
• standard parsing algorithms (CKY, Earley, LC, ...) run on an LCFG in time O(|G| |w|^3)
• for 2-lex CFG (the simplest case), |G| grows with |VD|^3 |VT|^2!
Goal: get rid of the |VT| factors

Standard parsing: TD
Result (to be refined): algorithms satisfying the correct-prefix property are "unlikely" to run on LCFG in time independent of VT

Correct-prefix property
Earley, Left-Corner, GLR, ...: at every reading position, the input consumed so far, read left to right, is a valid prefix of some string generated by the grammar

On-line parsing
No grammar precompilation (Earley): the parser reads G and w directly and produces its output

Standard parsing: TD
Result: on-line parsers with the correct-prefix property cannot run in time O(f(|VD|, |w|)), for any function f

Off-line parsing
The grammar is precompiled (Left-Corner, LR): a compiled form C(G) is computed once, and the parser reads C(G) and w

Standard parsing: TD
Fact: we can simulate a nondeterministic FA M on w in time O(|M| |w|)
Conjecture: fix a polynomial p.
We cannot simulate M on w in time p(|w|) unless we spend exponential time precompiling M.

Standard parsing: TD
Assume the conjecture holds true.
Result: off-line parsers with the correct-prefix property cannot run in time O(p(|VD|, |w|)), for any polynomial p, unless exponential time is spent precompiling G

Standard parsing: BU
Common practice in lexicalized grammar parsing:
• select the productions that are lexically grounded in w
• parse BU with the selected subset of G
Problem: the algorithm removes the |VT| factors but introduces new |w| factors!

Standard parsing: BU
Time charged by the basic step, which combines B[d1] spanning (i, k) with C[d2] spanning (k, j) into A[d2] spanning (i, j):
• positions i, k, j: |w|^3
• nonterminals A, B, C: |VD|^3
• head positions d1, d2: |w|^2
Running time is O(|VD|^3 |w|^5)!

Standard BU: exhaustive
[log-log plot of parsing time vs. sentence length; naive BU fits y = c·x^5.2019]

Standard BU: pruning
[log-log plot of parsing time vs. sentence length; naive BU fits y = c·x^3.8282]

Part III: novel algorithms – BU enhanced, TD enhanced

BU enhanced
Result: parsing with 2-lex CFG in time O(|VD|^3 |w|^4)
Remark: the result transfers to the models in Alshawi 1996, Eisner 1996, Charniak 1997, Collins 1997
Remark: the technique extends to improve parsing of Lexicalized Tree-Adjoining Grammars

Algorithm #1
Basic step in naive BU: combine B[d1] spanning (i, k) with C[d2] spanning (k, j) into A[d2] spanning (i, j)
Idea: the indices d1 and j can be processed independently

Algorithm #1
• Step 1: combine B[d1] spanning (i, k) with the production, discarding d1; the result is an intermediate item for A[d2] spanning (i, k) that still awaits C[d2]
• Step 2: combine the intermediate item with C[d2] spanning (k, j), yielding A[d2] spanning (i, j)

BU enhanced
Upper bound provided by Algorithm #1: O(|w|^4)
Goal: can we go down to O(|w|^3)?
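Viewed as a recognizer, the two-step idea can be sketched in a few lines (a toy illustration, not the deck's exact formulation; the rule encoding and all names are mine). Step 1 consumes the non-head child's head index d1, so no single step ranges over all five position indices i, k, j, d1, d2 at once; with a suitably indexed chart this is the source of the O(|w|^4) bound.

```python
def recognize(words, lex, rules):
    """Boolean recognizer for a toy 2-lexical CFG using the two-step
    combination of Algorithm #1 (a correctness sketch, not optimized).

    lex:   set of (A, w)                 meaning  A[w] -> w
    rules: set of (A, B, C, wB, wC, hd)  meaning  A[wB] -> B[wB] C[wC]  if hd == "left"
                                                  A[wC] -> B[wB] C[wC]  if hd == "right"
    Chart items (X, h, i, j): X[words[h]] derives words[i:j].
    """
    n = len(words)
    chart = {(A, h, h, h + 1) for h, w in enumerate(words) for (A, lw) in lex if lw == w}
    changed = True
    while changed:
        changed = False
        # Step 1: attach the rule to the NON-head child, forgetting that
        # child's head index; the intermediate item records only the
        # candidate parent head h (4 position indices instead of 5).
        mid = set()
        for (X, hX, a, b) in chart:
            for (A, B, C, wB, wC, hd) in rules:
                if hd == "right" and X == B and words[hX] == wB:
                    for h in range(b, n):
                        if words[h] == wC:       # awaiting C[words[h]] on the right
                            mid.add(("R", A, C, h, a, b))
                if hd == "left" and X == C and words[hX] == wC:
                    for h in range(a):
                        if words[h] == wB:       # awaiting B[words[h]] on the left
                            mid.add(("L", A, B, h, a, b))
        # Step 2: attach the head child to the intermediate item.
        for (side, A, Y, h, a, b) in mid:
            for (X, hX, c, d) in list(chart):
                if X == Y and hX == h:
                    if side == "R" and c == b:
                        new = (A, h, a, d)
                    elif side == "L" and d == a:
                        new = (A, h, c, b)
                    else:
                        continue
                    if new not in chart:
                        chart.add(new)
                        changed = True
    return chart

words = "dumped sacks into bins".split()
lex = {("V", "dumped"), ("NP", "sacks"), ("P", "into"), ("NP", "bins")}
rules = {("VP", "V", "NP", "dumped", "sacks", "left"),
         ("PP", "P", "NP", "into", "bins", "left"),
         ("VP", "VP", "PP", "dumped", "into", "left")}
print(("VP", 0, 0, 4) in recognize(words, lex, rules))  # True
```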
Spine
The spine of a parse tree is the path from the root to the root's head.
Example ("IBM bought Lotus last week"): the spine is S[buy] – S[buy] – VP[buy] – V[buy] – bought

Spine projection
The spine projection is the yield of the subtree composed of the spine and all sibling nodes of spine nodes.
Example: NP[IBM] bought NP[Lotus] AdvP[week]

Split grammars
Split the spine projections at the head.
Problem: how much information do we need to store in order to construct new grammatical spine projections from splits?

Split grammars
Fact: the set of spine projections is a linear context-free language
Definition: a 2-lex CFG is split if its set of spine projections is a regular language
Remark: for split grammars, we can recombine splits using finite information

Split grammars
Non-split grammar, e.g. spines with matched nesting of the form S[d] → AdvP[a] S1[d], S1[d] → S[d] AdvP[b]:
• unbounded number of dependencies between left and right dependents of the head
• linguistically unattested and unlikely

Split grammars
Split grammar, e.g. S[d] → AdvP[a] S[d], S[d] → S[d] AdvP[b]:
• finite number of dependencies between left and right dependents of the lexical head

Split grammars
Precompile the grammar so that the two splits are derived separately.
Example: the spine below S[buy] is rewritten with split symbols r1[buy], r2[buy], r3[buy], so that the left dependent NP[IBM] and the right dependents NP[Lotus], AdvP[week] are collected in separate derivation phases; r3[buy] is a split symbol

Split grammars
• t: maximum number of states per spine automaton
• g: maximum number of split symbols per spine automaton (g < t)
• m: number of delexicalized nonterminals that are maximal projections

BU enhanced
Result: parsing with split 2-lexical CFG in time O(t^2 g^2 m^2 |w|^3)
Remark: the models in Alshawi 1996, Charniak 1997 and Collins 1997 are not split

Algorithm #2
Idea:
• recognize left and right splits separately
• collect the head dependents one split at a time
Example: NP[IBM] bought NP[Lotus] AdvP[week]

Algorithm #2
• Step 1: combine a split item for B[d1] with a dependent headed at d2, discarding the internal structure of the dependent
• Step 2: extend the resulting split item across the dependent's span, one split symbol at a time
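The payoff of splitting is easiest to see in the stripped-down setting of projective dependency parsing, where the split-head idea is essentially the cubic-time algorithm of Eisner's work cited under Related work: each head collects its left and right dependents in two independent half-constituents ("splits") that are recombined only at the head. A sketch under that simplification (the arc-set encoding and all names are mine, not the deck's):

```python
def has_projective_parse(n, arcs, root):
    """Split-head recognition for projective dependency structures,
    a sketch of the idea behind Eisner-style O(n^3) parsing.

    arcs: set of allowed (head, dependent) position pairs.
    Items are half-constituents:
      comp (i, j, "R"): head i has collected right dependents covering i..j
      comp (i, j, "L"): head j has collected left dependents covering i..j
      inc  (i, j, "R"): arc i -> j built, right half not yet closed
      inc  (i, j, "L"): arc j -> i built, left half not yet closed
    """
    comp = {(i, i, d) for i in range(n) for d in ("L", "R")}
    inc = set()
    for length in range(1, n):
        for i in range(n - length):
            j = i + length
            # build an arc over the span: the two inner halves must be complete
            if any((i, k, "R") in comp and (k + 1, j, "L") in comp for k in range(i, j)):
                if (i, j) in arcs:
                    inc.add((i, j, "R"))
                if (j, i) in arcs:
                    inc.add((i, j, "L"))
            # close a half-constituent by absorbing a dependent's own half
            if any((i, k, "R") in inc and (k, j, "R") in comp for k in range(i + 1, j + 1)):
                comp.add((i, j, "R"))
            if any((i, k, "L") in comp and (k, j, "L") in inc for k in range(i, j)):
                comp.add((i, j, "L"))
    # recombine the root's two halves at the head
    return (0, root, "L") in comp and (root, n - 1, "R") in comp

# "IBM bought Lotus": bought (position 1) heads both arguments
print(has_projective_parse(3, {(1, 0), (1, 2)}, root=1))  # True
```

Because every item carries a single boundary head rather than a (head, span-endpoint) pair on each side, only three free positions appear in any combination step, which is where the cubic bound comes from.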
Algorithm #2: exhaustive
[log-log plot of parsing time vs. sentence length; naive BU fits y = c·x^5.2019, split BU fits y = c·x^3.328]

Algorithm #2: pruning
[log-log plot of parsing time vs. sentence length; naive BU fits y = c·x^3.8282, split BU fits y = c·x^2.8179]

Related work
Cubic-time algorithms for lexicalized grammars:
• Sleator & Temperley 1991: Link Grammars
• Eisner 1997: Bilexical Grammars (improved by transfer of Algorithm #2)

TD enhanced
Goal: introduce TD prediction for 2-lexical CFG parsing, without |VT| factors
Remark: we must relax left-to-right parsing (because of the previous results)

TD enhanced
Result: TD parsing with 2-lex CFG in time O(|VD|^3 |w|^4)
Open: an O(|w|^3) extension to split grammars

TD enhanced
The algorithm maintains the strongest version of the correct-prefix property: at every reading position, the analysis built so far can be extended to a complete parse of some string in the language

Data structures
Productions with lhs A[d] are stored in a trie, indexed by the sequence of lexical elements in their rhs:
• A[d] → X1[d1] X2[d2] under the path d1 d2
• A[d] → Y1[d3] Y2[d2] under the path d3 d2
• A[d] → Z1[d2] Z2[d1] under the path d2 d1

Data structures
Rightmost subsequence recognition, by precompiling the input w into a deterministic FA (example input: a b a b c a c a c a b b)

Algorithm #3
Item representation (A[d], i, j, k):
• i, j indicate the extension of the partial analysis of A[d]
• k indicates the rightmost possible position for the completion of the analysis of A[d]

Algorithm #3: prediction
• Step 1: find the rightmost subsequence before position k matching the lexical elements of some production for A[d2]
• Step 2: make an Earley prediction

Conclusions
• standard parsing techniques are not suitable for processing lexicalized grammars
• novel algorithms have been introduced, using enhanced dynamic programming
• work to be done: extension to history-based models

The End
Many thanks for helpful discussion to: Jason Eisner, Mark-Jan Nederhof
