Statistical XFER:
Hybrid Statistical Rule-based
Machine Translation
Alon Lavie
Language Technologies Institute
Carnegie Mellon University
Joint work with:
Jaime Carbonell, Lori Levin, Bob Frederking, Erik Peterson,
Christian Monson, Vamshi Ambati, Greg Hanneman, Kathrin Probst,
Ariadna Font-Llitjos, Alison Alvarez, Roberto Aranovich
Outline
• Background and Rationale
• Stat-XFER Framework Overview
• Elicitation
• Learning Transfer Rules
• Automatic Rule Refinement
• Example Prototypes
• Major Research Challenges
Aug 29, 2007
Statistical XFER MT
2
Progression of MT
• Started with rule-based systems
– Very large expert human effort to construct language-specific resources (grammars, lexicons)
– High-quality MT extremely expensive, so it exists for only a handful of
language pairs
• Along came EBMT and then Statistical MT…
– Replaced human effort with extremely large volumes of
parallel text data
– Less expensive, but still only feasible for a small number of
language pairs
– We “traded” human labor for data
• Where does this take us in 5-10 years?
– Large parallel corpora for maybe 25-50 language pairs
• What about all the other languages?
• Is all this data (with very shallow representation of
language structure) really necessary?
• Can we build MT approaches that learn deeper levels of
language structure and how they map from one
language to another?
Rule-based vs. Statistical MT
• Traditional Rule-based MT:
– Expressive and linguistically-rich formalisms capable of
describing complex mappings between the two languages
– Accurate “clean” resources
– Everything constructed manually by experts
– Main challenge: obtaining broad coverage
• Phrase-based Statistical MT:
– Learn word and phrase correspondences automatically
from large volumes of parallel data
– Search-based “decoding” framework:
• Models propose many alternative translations
• Effective search algorithms find the “best” translation
– Main challenge: obtaining high translation accuracy
Main Principles of Stat-XFER
• Integrate the major strengths of rule-based and
statistical MT within a common framework:
– Linguistically rich formalism that can express complex and
abstract compositional transfer rules
– Rules can be written by human experts and also acquired
automatically from data
– Easy integration of morphological analyzers and generators
– Word and basic phrase correspondences (e.g. base NPs)
can be automatically acquired from parallel text when
available
– Search-based decoding from statistical MT adapted to find
the best translation within the search space: multi-feature
scoring, beam-search, parameter optimization, etc.
– Framework suitable for both resource-rich and resource-poor language scenarios
Stat-XFER MT Approach
[Figure: the MT pyramid from Source (e.g. Quechua) to Target (e.g. English). Analysis side: Syntactic Parsing, Semantic Analysis; top: Interlingua; generation side: Sentence Planning, Text Generation; transfer level: Transfer Rules (Statistical-XFER); bottom: Direct (SMT, EBMT)]
[Figure: Stat-XFER system architecture: Source Input -> Preprocessing -> Morphology -> Transfer Engine (uses Transfer Rules and Translation Lexicon) -> Translation Output Lattice -> Decoder (uses Language Model + Additional Features) -> English Output]
Source Input
בשורה הבאה
Transfer Rules
{NP1,3}
NP1::NP1 [NP1 "H" ADJ] -> [ADJ NP1]
((X3::Y1)
(X1::Y2)
((X1 def) = +)
((X1 status) =c absolute)
((X1 num) = (X3 num))
((X1 gen) = (X3 gen))
(X0 = X1))
Translation Lexicon
N::N |: ["$WR"] -> ["BULL"]
((X1::Y1)
((X0 NUM) = s)
((Y0 lex) = "BULL"))
N::N |: ["$WRH"] -> ["LINE"]
((X1::Y1)
((X0 NUM) = s)
((Y0 lex) = "LINE"))
Translation Output Lattice
(0 1 "IN" @PREP)
(1 1 "THE" @DET)
(2 2 "LINE" @N)
(1 2 "THE LINE" @NP)
(0 2 "IN LINE" @PP)
(0 4 "IN THE NEXT LINE" @PP)
English Output
in the next line
Transfer Rule Formalism
;SL: the old man, TL: ha-ish ha-zaqen
; type information and part-of-speech/constituent information:
NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
; alignments:
(X1::Y1)
(X1::Y3)
(X2::Y4)
(X3::Y2)
; x-side constraints:
((X1 AGR) = *3-SING)
((X1 DEF) = *DEF)
((X3 AGR) = *3-SING)
((X3 COUNT) = +)
; y-side constraints:
((Y1 DEF) = *DEF)
((Y3 DEF) = *DEF)
((Y2 AGR) = *3-SING)
((Y2 GENDER) = (Y4 GENDER))
)
; xy-constraints (none in this rule), e.g. ((Y1 AGR) = (X1 AGR))
Transfer Rule Formalism (II)
;SL: the old man, TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
(X1::Y1)
(X1::Y3)
(X2::Y4)
(X3::Y2)
; value constraints:
((X1 AGR) = *3-SING)
((X1 DEF) = *DEF)
((X3 AGR) = *3-SING)
((X3 COUNT) = +)
((Y1 DEF) = *DEF)
((Y3 DEF) = *DEF)
((Y2 AGR) = *3-SING)
; agreement constraint:
((Y2 GENDER) = (Y4 GENDER))
)
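The difference between value and agreement constraints can be illustrated with a small Python sketch. It is not the XFER engine's unifier: feature structures are assumed to be flat dicts of atomic values, `=` is treated as unification (set the feature if unset, fail on a clash), and `=c` (used in the Hebrew rules) as a check that the value is already present. All function names are hypothetical.

```python
from typing import Dict

FeatureStruct = Dict[str, str]  # hypothetical flat representation: feature -> atomic value


def unify_value(fs: FeatureStruct, feat: str, value: str) -> bool:
    """((Xi feat) = value): set the feature if unset, fail if it clashes."""
    current = fs.get(feat)
    if current is None:
        fs[feat] = value
        return True
    return current == value


def check_value(fs: FeatureStruct, feat: str, value: str) -> bool:
    """((Xi feat) =c value): succeed only if the value is already present."""
    return fs.get(feat) == value


def unify_agreement(a: FeatureStruct, b: FeatureStruct, feat: str) -> bool:
    """((Yi feat) = (Yj feat)): both constituents must end up with one shared value.

    Simplification: if neither side has a value yet, a real unifier would link the
    two features so that a later value reaches both; this sketch just succeeds.
    """
    va, vb = a.get(feat), b.get(feat)
    if va is not None and vb is not None:
        return va == vb
    shared = va if va is not None else vb
    if shared is not None:
        a[feat] = shared
        b[feat] = shared
    return True


# Illustrative values for the y-side of the rule: ha-ish (Y2) and ha-zaqen (Y4).
y2 = {"AGR": "*3-SING", "GENDER": "masc"}
y4 = {}
print(unify_agreement(y2, y4, "GENDER"), y4)  # True {'GENDER': 'masc'}
print(unify_agreement({"GENDER": "masc"}, {"GENDER": "fem"}, "GENDER"))  # False
```

Copying values instead of sharing them keeps the sketch short; it is enough to show why an agreement constraint can propagate a value from one side to the other while a value constraint only fixes or tests a single feature.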
Hebrew Manual Transfer Grammar
(human-developed)
• Initially developed in a couple of days, with
some later revisions by a CL post-doc
• Current grammar has 36 rules:
– 21 NP rules
– one PP rule
– 6 verb complexes and VP rules
– 8 higher-phrase and sentence-level rules
• Captures the most common (mostly local)
structural differences between Hebrew and
English
Hebrew Transfer Grammar
Example Rules
{NP1,2}
;;SL: $MLH ADWMH
;;TL: A RED DRESS
NP1::NP1 [NP1 ADJ] -> [ADJ NP1]
(
(X2::Y1)
(X1::Y2)
((X1 def) = -)
((X1 status) =c absolute)
((X1 num) = (X2 num))
((X1 gen) = (X2 gen))
(X0 = X1)
)
{NP1,3}
;;SL: H $MLWT H ADWMWT
;;TL: THE RED DRESSES
NP1::NP1 [NP1 "H" ADJ] -> [ADJ NP1]
(
(X3::Y1)
(X1::Y2)
((X1 def) = +)
((X1 status) =c absolute)
((X1 num) = (X3 num))
((X1 gen) = (X3 gen))
(X0 = X1)
)
The XFER Engine
• Input: source-language input sentence, or source-language confusion network
• Output: lattice representing collection of translation
fragments at all levels supported by transfer rules
• Basic Algorithm: “bottom-up” integrated “parsing-transfer-generation” guided by the transfer rules
– Start with translations of individual words and phrases
from translation lexicon
– Create translations of larger constituents by applying
applicable transfer rules to previously created lattice
entries
– Beam-search controls the exponential combinatorics of the
search-space, using multiple scoring features
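The bottom-up algorithm can be sketched in Python as a chart over source spans. This is a simplified illustration under stated assumptions, not the XFER engine: rules are plain category sequences with a target reordering (no unification constraints, no literals such as "H", no unary rules), each lexical entry covers one source word, scores are summed log-scores, and the beam keeps the top arcs per span. The class and function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class Arc:
    start: int              # first source position covered
    end: int                # one past the last source position covered
    cat: str                # constituent label, e.g. "NP1"
    words: Tuple[str, ...]  # target-language words generated so far
    score: float            # log-score, higher is better


@dataclass(frozen=True)
class Rule:
    lhs: str                # category built by the rule
    src: Tuple[str, ...]    # source-side categories, in order
    order: Tuple[int, ...]  # target order, as indices into `src`
    score: float            # rule log-score


Lexicon = Dict[str, List[Tuple[str, str, float]]]  # word -> [(target, category, log-score)]


def xfer(sentence: List[str], lexicon: Lexicon, rules: List[Rule], beam: int = 5) -> List[Arc]:
    chart: Dict[Tuple[int, int], List[Arc]] = defaultdict(list)
    # 1. Seed the lattice with lexical translations of single words.
    for i, word in enumerate(sentence):
        for target, cat, s in lexicon.get(word, []):
            chart[(i, i + 1)].append(Arc(i, i + 1, cat, tuple(target.split()), s))

    def matches(i: int, j: int, cats: Tuple[str, ...]) -> Iterator[List[Arc]]:
        """Yield sequences of existing arcs labeled `cats` that exactly cover [i, j)."""
        if not cats:
            if i == j:
                yield []
            return
        for k in range(i + 1, j + 1):
            for arc in chart.get((i, k), []):
                if arc.cat == cats[0]:
                    for rest in matches(k, j, cats[1:]):
                        yield [arc] + rest

    # 2. Build larger constituents bottom-up, shortest spans first.
    n = len(sentence)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            new: List[Arc] = []
            for rule in rules:
                if len(rule.src) < 2:  # unary rules are skipped in this sketch
                    continue
                for kids in matches(i, j, rule.src):
                    # Parsing, transfer and generation in one step: reorder the
                    # children's target words as the rule dictates.
                    words = tuple(w for idx in rule.order for w in kids[idx].words)
                    score = rule.score + sum(kid.score for kid in kids)
                    new.append(Arc(i, j, rule.lhs, words, score))
            # 3. Beam: keep only the best `beam` new arcs for this span.
            new.sort(key=lambda arc: arc.score, reverse=True)
            chart[(i, j)].extend(new[:beam])
    return [arc for arcs in chart.values() for arc in arcs]


# Toy data modeled on the {NP1,2} rule: $MLH ADWMH -> RED DRESS (scores are made up).
lexicon: Lexicon = {"$MLH": [("DRESS", "NP1", -1.0)], "ADWMH": [("RED", "ADJ", -0.5)]}
rules = [Rule("NP1", ("NP1", "ADJ"), (1, 0), 0.0)]
for arc in xfer(["$MLH", "ADWMH"], lexicon, rules):
    print(arc.start, arc.end, arc.cat, " ".join(arc.words), arc.score)
```

Every arc in the returned list is a lattice entry, so partial translations at all levels survive for the decoder, matching the "Output" bullet above.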
Source-language Confusion Network
Hebrew Example
• Input word: B$WRH
• Confusion network over positions 0 to 4, with three alternative segmentations:
– B$WRH
– B | $WR | H
– B | H | $WRH
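A confusion network can be held as a list of arcs between numbered nodes, with each path from the first to the last node being one segmentation. The minimal Python sketch below assumes that representation; the node numbers are chosen for illustration and need not match the slide's positions. The engine itself would apply the lexicon to every arc rather than enumerate paths, since the number of paths can grow exponentially.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Arc = Tuple[int, int, str]  # (from node, to node, token)


def segmentations(arcs: List[Arc], start: int, final: int) -> List[List[str]]:
    """Enumerate the token sequence along every path from `start` to `final`."""
    outgoing: Dict[int, List[Tuple[int, str]]] = defaultdict(list)
    for src, dst, token in arcs:
        outgoing[src].append((dst, token))
    paths: List[List[str]] = []

    def walk(node: int, tokens: List[str]) -> None:
        if node == final:
            paths.append(list(tokens))
            return
        for dst, token in outgoing[node]:
            tokens.append(token)
            walk(dst, tokens)
            tokens.pop()

    walk(start, [])
    return paths


# Illustrative node numbering for B$WRH (assumed, not taken from the slide).
network = [(0, 4, "B$WRH"),
           (0, 1, "B"), (1, 2, "$WR"), (2, 4, "H"),
           (1, 3, "H"), (3, 4, "$WRH")]
for seg in segmentations(network, 0, 4):
    print(" + ".join(seg))
```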
XFER Output Lattice
(28 28 "AND" -5.6988 "W" "(CONJ,0 'AND')")
(29 29 "SINCE" -8.20817 "MAZ " "(ADVP,0 (ADV,5 'SINCE')) ")
(29 29 "SINCE THEN" -12.0165 "MAZ " "(ADVP,0 (ADV,6 'SINCE THEN')) ")
(29 29 "EVER SINCE" -12.5564 "MAZ " "(ADVP,0 (ADV,4 'EVER SINCE')) ")
(30 30 "WORKED" -10.9913 "&BD " "(VERB,0 (V,11 'WORKED')) ")
(30 30 "FUNCTIONED" -16.0023 "&BD " "(VERB,0 (V,10 'FUNCTIONED')) ")
(30 30 "WORSHIPPED" -17.3393 "&BD " "(VERB,0 (V,12 'WORSHIPPED')) ")
(30 30 "SERVED" -11.5161 "&BD " "(VERB,0 (V,14 'SERVED')) ")
(30 30 "SLAVE" -13.9523 "&BD " "(NP0,0 (N,34 'SLAVE')) ")
(30 30 "BONDSMAN" -18.0325 "&BD " "(NP0,0 (N,36 'BONDSMAN')) ")
(30 30 "A SLAVE" -16.8671 "&BD " "(NP,1 (LITERAL 'A') (NP2,0 (NP1,0 (NP0,0
(N,34 'SLAVE')) ) ) ) ")
(30 30 "A BONDSMAN" -21.0649 "&BD " "(NP,1 (LITERAL 'A') (NP2,0 (NP1,0
(NP0,0 (N,36 'BONDSMAN')) ) ) ) ")
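To work with such a listing programmatically, each entry can be parsed into its fields. The Python sketch below assumes each entry has the shape (start end "target" score "source" "tree") and that the score is a log-score where higher is better; both are readings of the listing above rather than a documented format, and the names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed entry shape: (start end "TARGET" score "SOURCE" "TREE"), whitespace normalized.
ENTRY = re.compile(r'\((\d+) (\d+) "([^"]*)" (-?\d+(?:\.\d+)?) "([^"]*)" "(.*)"\)')


@dataclass
class LatticeArc:
    start: int
    end: int
    target: str
    score: float
    source: str
    tree: str


def parse_entry(text: str) -> LatticeArc:
    """Parse one (possibly line-wrapped) lattice entry."""
    m = ENTRY.fullmatch(" ".join(text.split()))  # collapse line breaks and runs of spaces
    if m is None:
        raise ValueError(f"unrecognized lattice entry: {text!r}")
    start, end, target, score, source, tree = m.groups()
    return LatticeArc(int(start), int(end), target, float(score), source.strip(), tree.strip())


def best_per_span(arcs: List[LatticeArc]) -> Dict[Tuple[int, int], LatticeArc]:
    """Keep the highest-scoring arc for each (start, end) pair."""
    best: Dict[Tuple[int, int], LatticeArc] = {}
    for arc in arcs:
        key = (arc.start, arc.end)
        if key not in best or arc.score > best[key].score:
            best[key] = arc
    return best


entries = [
    '(28 28 "AND" -5.6988 "W" "(CONJ,0 \'AND\')")',
    '(30 30 "WORKED" -10.9913 "&BD " "(VERB,0 (V,11 \'WORKED\')) ")',
    '(30 30 "SERVED" -11.5161 "&BD " "(VERB,0 (V,14 \'SERVED\')) ")',
    '(30 30 "A SLAVE" -16.8671 "&BD " "(NP,1 (LITERAL \'A\') (NP2,0 (NP1,0 (NP0,0\n(N,34 \'SLAVE\')) ) ) ) ")',
]
for span, arc in sorted(best_per_span([parse_entry(e) for e in entries]).items()):
    print(span, arc.target, arc.score)
```

Picking the best arc per span is only a way to inspect the lattice; the decoder instead combines arcs across spans.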
The Lattice Decoder
• Simple Stack Decoder, similar in principle to simple
Statistical MT decoders
• Searches for best-scoring path of non-overlapping
lattice arcs
• No reordering during decoding
• Scoring based on log-linear combination of scoring
components, with weights trained using MERT
• Scoring components:
– Statistical Language Model
– Fragmentation: how many arcs to cover the entire
translation?
– Length Penalty
– Rule Scores
– Lexical Probabilities
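A minimal Python sketch of monotone decoding over lattice arcs with a log-linear score follows. It makes simplifying assumptions: the language model is omitted, each arc carries one combined rule/lexical score, and fragmentation is a plain arc count. Because every feature is then local to an arc, a dynamic program over positions finds the best path exactly; an n-gram LM would add the last words to the hypothesis state, which is where a stack decoder with beams comes in. The weights are hypothetical, not MERT-tuned values.

```python
from typing import Dict, List, Optional, Tuple

Arc = Tuple[int, int, Tuple[str, ...], float]  # (start, end, target words, arc log-score)


def arc_features(arc: Arc) -> Dict[str, float]:
    """Per-arc features; a real system would score LM, rules and lexicon separately."""
    _, _, words, score = arc
    return {"arc": score, "frag": 1.0, "length": float(len(words))}


def decode(arcs: List[Arc], n: int, weights: Dict[str, float]) -> Optional[Tuple[float, List[Arc]]]:
    """Best-scoring monotone path of non-overlapping arcs covering positions 0..n."""
    best: List[Optional[Tuple[float, List[Arc]]]] = [None] * (n + 1)
    best[0] = (0.0, [])
    for pos in range(n):  # arcs always move forward, so this is a topological order
        if best[pos] is None:
            continue
        total, path = best[pos]
        for arc in arcs:
            if arc[0] != pos:
                continue
            feats = arc_features(arc)
            score = total + sum(weights[name] * value for name, value in feats.items())
            end = arc[1]
            if best[end] is None or score > best[end][0]:
                best[end] = (score, path + [arc])
    return best[n]


arcs = [(0, 1, ("ON",), -2.0), (1, 2, ("THE",), -1.0), (2, 3, ("DAY",), -1.0),
        (1, 3, ("THE", "DAY"), -3.0), (0, 3, ("ON", "THE", "DAY"), -4.5)]
weights = {"arc": 1.0, "frag": -1.0, "length": 0.0}  # hypothetical weights
score, path = decode(arcs, 3, weights)
print(score, [" ".join(a[2]) for a in path])  # -5.5 ['ON THE DAY']
```

The negative fragmentation weight is what lets the single larger arc win here even though the three one-word arcs have a better summed arc score.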
XFER Lattice Decoder
00
ON THE FOURTH DAY THE LION ATE THE RABBIT TO A MORNING MEAL
Overall: -8.18323, Prob: -94.382, Rules: 0, Frag: 0.153846, Length: 0,
Words: 13,13
235 < 0 8 -19.7602: B H IWM RBI&I (PP,0 (PREP,3 'ON')(NP,2 (LITERAL 'THE')
(NP2,0 (NP1,1 (ADJ,2 (QUANT,0 'FOURTH'))(NP1,0 (NP0,1 (N,6 'DAY')))))))>
918 < 8 14 -46.2973: H ARIH AKL AT H $PN (S,2 (NP,2 (LITERAL 'THE') (NP2,0
(NP1,0 (NP0,1 (N,17 'LION')))))(VERB,0 (V,0 'ATE'))(NP,100
(NP,2 (LITERAL 'THE') (NP2,0 (NP1,0 (NP0,1 (N,24 'RABBIT')))))))>
584 < 14 17 -30.6607: L ARWXH BWQR (PP,0 (PREP,6 'TO')(NP,1 (LITERAL 'A')
(NP2,0 (NP1,0 (NNP,3 (NP0,0 (N,32 'MORNING'))(NP0,0 (N,27 'MEAL')))))))>
Data Elicitation for Languages with
Limited Resources
• Rationale:
– Large volumes of parallel text not available, so create
a small maximally-diverse parallel corpus that
directly supports the learning task
– Bilingual native informant(s) can translate and align
a small pre-designed elicitation corpus, using
elicitation tool
– Elicitation corpus designed to be typologically and
structurally comprehensive and compositional
– Transfer-rule engine and new learning approach
support acquisition of generalized transfer-rules from
the data
Elicitation Tool: Examples
[Figure: elicitation tool screenshots for English-Chinese (two slides), English-Hindi, English-Arabic and Spanish-Mapudungun]
Designing Elicitation Corpora
• Goal: Create a small representative parallel corpus that
contains examples of the most important translation
correspondences and divergences between the two languages
• Method:
– Elicit translations and word alignments for a broad diversity of
linguistic phenomena and constructions
• Current Elicitation Corpus: ~3100 sentences and phrases,
constructed based on a broad feature-based specification
• Open Research Issues:
– Feature Detection: discover what features exist in the language
and where/how they are marked
• Example: does the language mark gender of nouns? How and where
are these marked?
– Dynamic corpus navigation based on feature detection: no need to
elicit for combinations involving non-existent features
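Feature detection and corpus navigation can be illustrated with one simple heuristic, assumed here rather than taken from the slides: elicit minimal pairs that differ in a single feature, treat the feature as marked if their translations differ, and skip later items that probe unmarked features. The Python sketch below uses hypothetical informant translations and hypothetical function names.

```python
from typing import Dict, List, Set, Tuple

MinimalPair = Tuple[str, str, str]  # (feature, sentence A, sentence B differing only in that feature)


def detect_features(pairs: List[MinimalPair], translation: Dict[str, str]) -> Dict[str, bool]:
    """Heuristic: a feature counts as marked if some minimal pair gets different translations."""
    marked: Dict[str, bool] = {}
    for feature, a, b in pairs:
        marked[feature] = marked.get(feature, False) or translation[a] != translation[b]
    return marked


def navigate(items: List[Tuple[str, Set[str]]], marked: Dict[str, bool]) -> List[str]:
    """Keep items whose features are all marked; untested features default to kept."""
    return [text for text, feats in items if all(marked.get(f, True) for f in feats)]


# Hypothetical informant translations for a language that does not mark plural.
translation = {"The dog sleeps.": "ko mi", "The dogs sleep.": "ko mi"}
marked = detect_features([("number", "The dog sleeps.", "The dogs sleep.")], translation)
items = [("The two dogs are sleeping.", {"number"}),
         ("The dog slept.", {"tense"})]
print(marked, navigate(items, marked))  # {'number': False} ['The dog slept.']
```

Identical translations are weak evidence on their own (marking can be optional or context-dependent), which is part of why feature detection remains an open issue.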
Rule Learning - Overview
• Goal: Acquire Syntactic Transfer Rules
• Use available knowledge from the source
side (grammatical structure)
• Three steps:
1. Flat Seed Generation: first guesses at
transfer rules; flat syntactic structure
2. Compositionality Learning: use previously
learned rules to learn hierarchical structure
3. Constraint Learning: refine rules by
learning appropriate feature constraints
Flat Seed Rule Generation
Learning Example: NP
Eng: the big apple
Heb: ha-tapuax ha-gadol
Generated Seed Rule:
NP::NP [ART ADJ N] -> [ART N ART ADJ]
((X1::Y1)
(X1::Y3)
(X2::Y4)
(X3::Y2))
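A flat seed rule can be read off one word-aligned example. In the Python sketch below, the target-side categories are projected from the aligned source words, and unaligned target words are kept as quoted literals; both are assumptions for illustration, as is the function name.

```python
from typing import List, Tuple


def seed_rule(cat: str, src_tags: List[str], tgt_words: List[str],
              alignment: List[Tuple[int, int]]) -> str:
    """Flat seed rule from one aligned example; alignment pairs are 1-based (source, target)."""
    tgt_tags = []
    for j, word in enumerate(tgt_words, start=1):
        aligned = sorted(i for i, jj in alignment if jj == j)
        # Assumption: a target word takes the category of its first aligned source
        # word; an unaligned target word is kept as a quoted literal.
        tgt_tags.append(src_tags[aligned[0] - 1] if aligned else f'"{word}"')
    header = f"{cat}::{cat} [{' '.join(src_tags)}] -> [{' '.join(tgt_tags)}]"
    links = " ".join(f"(X{i}::Y{j})" for i, j in sorted(alignment))
    return f"{header}\n({links})"


print(seed_rule("NP", ["ART", "ADJ", "N"], ["ha", "tapuax", "ha", "gadol"],
                [(1, 1), (1, 3), (2, 4), (3, 2)]))
# NP::NP [ART ADJ N] -> [ART N ART ADJ]
# ((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
```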
Compositionality Learning
Initial Flat Rules:
S::S
[ART ADJ N V ART N] -> [ART N ART ADJ V P ART N]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2) (X4::Y5) (X5::Y7) (X6::Y8))
NP::NP [ART ADJ N] -> [ART N ART ADJ]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
NP::NP [ART N] -> [ART N]
((X1::Y1) (X2::Y2))
Generated Compositional Rule:
S::S [NP V NP] -> [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4))
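The compositionality step can be sketched as a greedy left-to-right rewrite: wherever a lower-level rule matches a source span whose aligned target words form a contiguous block with the expected categories (and no outside word aligns into it), the span collapses to the rule's left-hand side on both sides. This Python sketch is an illustration under those assumptions; it ignores the sub-rules' internal alignments and uses a hypothetical function name.

```python
from typing import List, Tuple

SubRule = Tuple[str, List[str], List[str]]  # (lhs, source categories, target categories)


def compose(cat: str, src: List[str], tgt: List[str],
            align: List[Tuple[int, int]], sub_rules: List[SubRule]) -> str:
    """Rewrite a flat rule by replacing spans covered by lower-level rules with their LHS."""
    src_units: List[Tuple[str, List[int]]] = []          # (label, 1-based source positions)
    tgt_label = {j: t for j, t in enumerate(tgt, start=1)}
    tgt_owner = {j: j for j in tgt_label}                # target position -> first position of its unit
    i = 1
    while i <= len(src):
        for lhs, s_tags, t_tags in sorted(sub_rules, key=lambda r: -len(r[1])):
            if src[i - 1:i - 1 + len(s_tags)] != s_tags:
                continue
            span = list(range(i, i + len(s_tags)))
            tpos = sorted({t for s, t in align if s in span})
            contiguous = bool(tpos) and tpos == list(range(tpos[0], tpos[-1] + 1))
            leaks = any(t in tpos and s not in span for s, t in align)
            if contiguous and not leaks and [tgt[t - 1] for t in tpos] == t_tags:
                src_units.append((lhs, span))
                for t in tpos:
                    tgt_owner[t] = tpos[0]
                tgt_label[tpos[0]] = lhs
                i += len(s_tags)
                break
        else:  # no lower-level rule applies: keep the word's own category
            src_units.append((src[i - 1], [i]))
            i += 1
    reps = sorted(set(tgt_owner.values()))
    y_index = {r: n for n, r in enumerate(reps, start=1)}
    links = sorted({(x, y_index[tgt_owner[t]])
                    for x, (_, span) in enumerate(src_units, start=1)
                    for s, t in align if s in span})
    header = (f"{cat}::{cat} [{' '.join(label for label, _ in src_units)}] -> "
              f"[{' '.join(tgt_label[r] for r in reps)}]")
    return header + "\n(" + " ".join(f"(X{x}::Y{y})" for x, y in links) + ")"


flat_src = ["ART", "ADJ", "N", "V", "ART", "N"]
flat_tgt = ["ART", "N", "ART", "ADJ", "V", "P", "ART", "N"]
flat_align = [(1, 1), (1, 3), (2, 4), (3, 2), (4, 5), (5, 7), (6, 8)]
np_rules = [("NP", ["ART", "ADJ", "N"], ["ART", "N", "ART", "ADJ"]),
            ("NP", ["ART", "N"], ["ART", "N"])]
print(compose("S", flat_src, flat_tgt, flat_align, np_rules))
# S::S [NP V NP] -> [NP V P NP]
# ((X1::Y1) (X2::Y2) (X3::Y4))
```

On the flat S rule above, this reproduces the compositional rule shown on the slide; the unaligned P stays as its own target constituent.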
Constraint Learning
Input: Rules and their Example Sets
S::S [NP V NP] -> [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4))
{ex1,ex12,ex17,ex26}
NP::NP [ART ADJ N] -> [ART N ART ADJ]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
{ex2,ex3,ex13}
NP::NP [ART N] -> [ART N]
((X1::Y1) (X2::Y2))
{ex4,ex5,ex6,ex8,ex10,ex11}
Output: Rules with Feature Constraints:
S::S [NP V NP] -> [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4)
((X1 NUM) = (X2 NUM))
((Y1 NUM) = (Y2 NUM))
((X1 NUM) = (Y1 NUM)))
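One simple way to propose agreement constraints from a rule's example set is sketched below in Python: suggest ((A f) = (B f)) for every pair of constituents whose feature values match in all examples, and skip pairs whose shared value never varies. This heuristic and the example feature values are assumptions for illustration, not the learner described on the slide; it also proposes transitively redundant constraints that a real learner would prune.

```python
from itertools import combinations
from typing import Dict, List

Example = Dict[str, Dict[str, str]]  # slot ("X1", "Y2", ...) -> feature -> value


def learn_agreement(examples: List[Example], feature: str) -> List[str]:
    """Propose ((A f) = (B f)) for slot pairs that agree in every example.

    Pairs whose shared value never varies are skipped: agreement seen with only
    one value (e.g. always plural) is not evidence for a constraint.
    """
    slots = sorted({slot for ex in examples for slot, fs in ex.items() if feature in fs})
    constraints = []
    for a, b in combinations(slots, 2):
        values = []
        for ex in examples:
            va = ex.get(a, {}).get(feature)
            vb = ex.get(b, {}).get(feature)
            if va is None or va != vb:
                break
            values.append(va)
        else:  # every example agreed
            if len(set(values)) > 1:
                constraints.append(f"(({a} {feature}) = ({b} {feature}))")
    return constraints


# Hypothetical NUM values for two examples of the S::S [NP V NP] rule.
ex1 = {"X1": {"NUM": "sg"}, "X2": {"NUM": "sg"}, "X3": {"NUM": "pl"},
       "Y1": {"NUM": "sg"}, "Y2": {"NUM": "sg"}, "Y4": {"NUM": "pl"}}
ex2 = {"X1": {"NUM": "pl"}, "X2": {"NUM": "pl"}, "X3": {"NUM": "pl"},
       "Y1": {"NUM": "pl"}, "Y2": {"NUM": "pl"}, "Y4": {"NUM": "pl"}}
learned = learn_agreement([ex1, ex2], "NUM")
for c in learned:
    print(c)
```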
Automated Rule Refinement
• Bilingual informants can identify translation
errors and pinpoint where they occur
• A sophisticated trace of the translation path
can identify likely sources for the error and do
“Blame Assignment”
• Rule Refinement operators can be developed
to modify the underlying translation grammar
(and lexicon) based on characteristics of the
error source:
– Add or delete feature constraints from a rule
– Bifurcate a rule into two rules (general and specific)
– Add or correct lexical entries
• See [Font-Llitjos, Carbonell & Lavie, 2005]
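The three refinement operators can be pictured as small functions over an immutable rule. The Python sketch below is a hypothetical illustration, not the implementation from [Font-Llitjos, Carbonell & Lavie, 2005]: a rule is reduced to a name plus a tuple of constraint strings, and bifurcation keeps the general rule while adding a more specific copy.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    constraints: Tuple[str, ...]


def add_constraint(rule: Rule, constraint: str) -> Rule:
    return replace(rule, constraints=rule.constraints + (constraint,))


def delete_constraint(rule: Rule, constraint: str) -> Rule:
    return replace(rule, constraints=tuple(c for c in rule.constraints if c != constraint))


def bifurcate(rule: Rule, constraint: str, suffix: str = "-specific") -> Tuple[Rule, Rule]:
    """Keep the general rule unchanged and add a specific copy restricted by `constraint`."""
    specific = replace(add_constraint(rule, constraint), name=rule.name + suffix)
    return rule, specific


# A trimmed, hypothetical version of the {NP1,2} rule.
np_rule = Rule("{NP1,2}", ("((X1 def) = -)", "((X1 num) = (X2 num))"))
general, specific = bifurcate(np_rule, "((X1 gen) = (X2 gen))")
print(general.constraints)
print(specific.name, specific.constraints)
```

Keeping rules immutable makes each refinement a new rule version, which would let a refinement be compared against, or rolled back to, the rule it came from.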
Stat-XFER MT Prototypes
• General Statistical XFER framework under development for
past five years (funded by NSF and DARPA)
• Prototype systems so far:
– Chinese-to-English
– Dutch-to-English
– French-to-English
– Hindi-to-English
– Hebrew-to-English
– Mapudungun-to-Spanish
• In progress or planned:
– Brazilian Portuguese-to-English
– Native-Brazilian languages to Brazilian Portuguese
– Hebrew-to-Arabic
– Iñupiaq-to-English
– Urdu-to-English
– Turkish-to-English
Chinese-English Stat-XFER System
• Bilingual lexicon: over 1.1 million entries (multiple
resources, incl. ADSO, Wikipedia, extracted base NPs)
• Manual syntactic XFER grammar: 76 rules! (mostly
NPs, a few PPs, and reordering of NPs/PPs within VPs)
• Multiple overlapping Chinese word segmentations
• English morphology generation
• Uses CMU SMT-group’s Suffix-Array LM toolkit for LM
• Current Performance (GALE dev-test; B = BLEU, M = METEOR):
– Newswire (NW):
• XFER: 10.89(B)/0.4509(M)
• Best (UMD): 15.58(B)/0.4769(M)
– Newsgroups (NG):
• XFER: 8.92(B)/0.4229(M)
• Best (UMD): 12.96(B)/0.4455(M)
• In Progress:
– Automatic extraction of “clean” base NPs from parallel data
– Automatic learning and extraction of high-quality transfer rules from parallel data
Translation Example
• REFERENCE: When responding to whether it is possible to extend Russian fleet's stationing deadline at the Crimean peninsula, Yanukovych replied, "Without a doubt.
• Stat-XFER (0.3989): In reply to whether the possibility to extend the Russian fleet stationed in Crimea Pen. left the deadline of the problem , Yanukovich replied : " of course .
• IBM-ylee (0.2203): In response to the possibility to extend the deadline for the presence in Crimea peninsula , the Queen Vic said : " of course .
• CMU-SMT (0.2067): In response to a possible extension of the fleet in the Crimean Peninsula stay on the issue , Yanukovych vetch replied : " of course .
• maryland-hiero (0.1878): In response to the possibility of extending the mandate of the Crimean peninsula in , replied: "of course.
• IBM-smt (0.1862): The answer is likely to be extended the Crimean peninsula of the presence of the problem, Yanukovych said: " Of course.
• CMU-syntax (0.1639): In response to the possibility of extension of the presence in the Crimean Peninsula , replied : " of course .
Major Research Directions
• Automatic Transfer Rule Learning:
– From manually word-aligned elicitation corpus
– From large volumes of automatically word-aligned
“wild” parallel data
– In the absence of morphology or POS annotated
lexica
– Compositionality and generalization
– Identifying “good” rules from “bad” rules
– Effective models for rule scoring for
• Decoding: using scores at runtime
• Pruning the large collections of learned rules
– Learning Unification Constraints
Major Research Directions
• Extraction of Base-NP translations from parallel data:
– Base-NPs are extremely important “building blocks” for
transfer-based MT systems
• Frequent, often align 1-to-1, improve coverage
• Correctly identifying them greatly helps automatic word-alignment of parallel sentences
– Parsers (or NP-chunkers) available for both languages:
Extract base-NPs independently on both sides and find
their correspondences
– Parsers (or NP-chunkers) available for only one language
(e.g. English): Extract base-NPs on one side, and find
reliable correspondences for them using word-alignment,
frequency distributions, other features…
• Promising preliminary results
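For the one-parser case, the word-alignment part of finding a correspondence can be sketched as projecting the base-NP span through the alignment links and rejecting it if any target word inside the projected span aligns outside the NP. This Python sketch uses that standard consistency check as an assumption; frequency and other features mentioned above are not modeled, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple


def project_span(src_span: Tuple[int, int],
                 links: List[Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Project an inclusive 0-based source span through word-alignment links.

    Returns the smallest target span covering all aligned words, or None if the
    span is unaligned or some target word inside it aligns outside the source span.
    """
    i, j = src_span
    targets = [t for s, t in links if i <= s <= j]
    if not targets:
        return None
    a, b = min(targets), max(targets)
    for s, t in links:
        if (a <= t <= b) and not (i <= s <= j):
            return None
    return a, b


# English: the red dresses   Hebrew: H $MLWT H ADWMWT   (links are (English, Hebrew) indices)
links = [(0, 0), (0, 2), (1, 3), (2, 1)]
print(project_span((0, 2), links))  # (0, 3): the whole base NP
print(project_span((1, 2), links))  # None: the inner H belongs to "the"
```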
Major Research Directions
• Algorithms for XFER and Decoding
– Integration and optimization of multiple
features into search-based XFER parser
– Complexity and efficiency improvements
(e.g. “Cube Pruning”)
– Non-monotonicity issues (LM scores,
unification constraints) and their
consequences on search
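The general cube-pruning idea can be shown on its smallest case: combining two score-sorted lists of partial translations while exploring only the most promising cells of their product grid. In this Python sketch the function name is hypothetical; the search is exact when the combination is monotone (as with plain addition) and only approximate once non-monotone terms such as LM scores on the joined string enter, which is the non-monotonicity issue listed above.

```python
import heapq
from typing import Callable, List, Tuple


def cube_top_k(left: List[float], right: List[float], k: int,
               combine: Callable[[float, float], float]) -> List[Tuple[float, int, int]]:
    """Lazily enumerate up to k best (score, i, j) combinations of two lists sorted best-first."""
    if not left or not right or k <= 0:
        return []
    heap = [(-combine(left[0], right[0]), 0, 0)]
    seen = {(0, 0)}
    best: List[Tuple[float, int, int]] = []
    while heap and len(best) < k:
        neg_score, i, j = heapq.heappop(heap)
        best.append((-neg_score, i, j))
        for ni, nj in ((i + 1, j), (i, j + 1)):  # push the two grid neighbors
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-combine(left[ni], right[nj]), ni, nj))
    return best


print(cube_top_k([-1.0, -2.0, -4.0], [-0.5, -3.0], 3, lambda a, b: a + b))
# [(-1.5, 0, 0), (-2.5, 1, 0), (-4.0, 0, 1)]
```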
Major Research Directions
• Discriminative Language Modeling for MT:
– Current standard statistical LMs provide only weak
discrimination between good and bad translation
hypotheses
– New Idea: Use “occurrence-based” statistics:
• Extract instances of lexical, syntactic and semantic features
from each translation hypothesis
• Determine whether these instances have been “seen before”
(at least once) in a large monolingual corpus
– The Conjecture: more grammatical MT hypotheses are
likely to contain higher proportions of feature instances
that have been seen in a corpus of grammatical sentences.
– Goals:
• Find the set of features that provides the best discrimination
between good and bad translations
• Learn how to combine these into a LM-like function for scoring
alternative MT hypotheses
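The occurrence-based idea can be sketched for lexical features only. In this Python sketch, n-grams of orders 1 to 3 stand in for "feature instances" (an assumption; syntactic and semantic features are not covered), a set records every n-gram seen at least once in a monolingual corpus, and each hypothesis gets a vector of seen-before ratios. Learning how to weight those ratios into an LM-like score is not shown. Names and the toy corpus are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Gram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> List[Gram]:
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_seen(corpus: Iterable[List[str]], max_n: int = 3) -> Set[Gram]:
    """Record every n-gram (n = 1..max_n) that occurs at least once in the corpus."""
    seen: Set[Gram] = set()
    for sentence in corpus:
        for n in range(1, max_n + 1):
            seen.update(ngrams(sentence, n))
    return seen


def seen_ratios(hyp: List[str], seen: Set[Gram], max_n: int = 3) -> List[float]:
    """Feature vector: fraction of the hypothesis's n-grams seen before, per order n."""
    ratios = []
    for n in range(1, max_n + 1):
        grams = ngrams(hyp, n)
        ratios.append(sum(g in seen for g in grams) / len(grams) if grams else 0.0)
    return ratios


seen = build_seen([["the", "lion", "ate", "the", "rabbit"], ["the", "rabbit", "ate"]])
print(seen_ratios("the lion ate the rabbit".split(), seen))  # [1.0, 1.0, 1.0]
print(seen_ratios("lion the ate rabbit the".split(), seen))  # [1.0, 0.0, 0.0]
```

The second hypothesis uses only known words, so unigram ratios cannot separate the two; the higher-order ratios are what carry the conjectured grammaticality signal.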
Major Research Directions
• Building Elicitation Corpora:
– Feature Detection
– Corpus Navigation
• Automatic Rule Refinement
• Translation for highly polysynthetic
languages such as Mapudungun and
Iñupiaq
Questions?