CS60057
Speech & Natural Language
Processing
Autumn 2008
Lecture 16
3 Sep 2008
Outline for MT





Intro and a little history
Language Similarities and Divergences
Three classic MT Approaches
 Transfer
 Interlingua
 Direct
Modern Statistical MT
Evaluation
What is MT?

Translating a text from one language to another
automatically.
Machine Translation

dai yu zi zai chuang shang gan nian bao chai you ting
jian chuang wai zhu shao xiang ye zhe shang, yu sheng
xi li, qing han tou mu, bu jue you di xia lei lai.

Dai-yu alone on bed top think-of-with-gratitude Bao-chai again listen
to window outside bamboo tip plantain leaf of on-top rain sound sigh
drop clear cold penetrate curtain not feeling again fall down tears
come
As she lay there alone, Dai-yu’s thoughts turned to Bao-chai… Then
she listened to the insistent rustle of the rain on the bamboos and
plantains outside her window. The coldness penetrated the curtains
of her bed. Almost without noticing it she had begun to cry.

Machine Translation
Machine Translation



The Story of the Stone
= The Dream of the Red Chamber (Cao Xueqin 1792)
Issues:
 Word segmentation
 Sentence segmentation: 4 English sentences to 1 Chinese
 Grammatical differences




Chinese rarely marks tense:
 As, turned to, had begun,
 tou -> penetrated
Zero anaphora
No articles
Stylistic and cultural differences



Bamboo tip plantain leaf -> bamboos and plantains
Ma ‘curtain’ -> curtains of her bed
Rain sound sigh drop -> insistent rustle of the rain
Not just literature

Hansards: Canadian parliamentary proceedings
What is MT not good for?

Really hard stuff
 Literature
 Natural spontaneous speech (meetings, court reporting)

Really important stuff
 Medical translation in hospitals, 911
What is MT good for?



Tasks for which a rough translation is fine
  Web pages, email
Tasks for which MT can be post-edited
  MT as first pass
  "Computer-aided human translation"
Tasks in sublanguage domains where high-quality MT is possible
  FAHQT: fully automatic high-quality translation
Sublanguage domain



Weather forecasting
  "Cloudy with a chance of showers today and Thursday"
  "Low tonight 4"
Can be modeled completely enough to use raw MT output
Word classes and semantic features like MONTH, PLACE,
DIRECTION, TIME POINT
MT History






1946 Booth and Weaver discuss MT at the Rockefeller Foundation in New York
1947-48 idea of dictionary-based direct translation
1949 Weaver memorandum popularized the idea
1952 all 18 MT researchers in the world meet at MIT
1954 IBM/Georgetown demo of Russian-English MT
1955-65 lots of labs take up MT
History of MT: Pessimism

1959/1960: Bar-Hillel “Report on the state of MT in US and GB”
 Argued FAHQT too hard (semantic ambiguity, etc)
 Should work on semi-automatic instead of automatic
 His argument
Little John was looking for his toy box. Finally, he found it. The
box was in the pen. John was very happy.
Only human knowledge lets us know that 'playpens' are bigger
than boxes, but ‘writing pens’ are smaller
 His claim: we would have to encode all of human knowledge
History of MT: Pessimism

The ALPAC report
 Headed by John R. Pierce of Bell Labs
 Conclusions:
   Supply of human translators exceeds demand
   All the Soviet literature is already being translated
   MT has been a failure: all current MT work had to be post-edited
   Sponsored evaluations showed that intelligibility and informativeness were worse than for human translations
 Results:
   MT research suffered
    Funding loss
    Number of research labs declined
    Association for Machine Translation and Computational Linguistics dropped MT from its name
History of MT





1976 Meteo, weather forecasts from English to French
Systran (Babelfish) has been used for 40 years
1970’s:
 European focus in MT; mainly ignored in US
1980’s
 ideas of using AI techniques in MT (KBMT, CMU)
1990’s
 Commercial MT systems
 Statistical MT
 Speech-to-speech translation
Language Similarities and Divergences



Some aspects of human language are universal or near-universal; others diverge greatly.
Typology: the study of systematic cross-linguistic similarities and differences
What are the dimensions along which human languages vary?
Morphological Variation




Isolating languages
 Cantonese, Vietnamese: each word generally has one
morpheme
Vs. Polysynthetic languages
 Siberian Yupik (`Eskimo’): single word may have very many
morphemes
Agglutinative languages
 Turkish: morphemes have clean boundaries
Vs. Fusion languages
 Russian: single affix may have many morphemes
Syntactic Variation


SVO (Subject-Verb-Object) languages
 English, German, French, Mandarin
SOV Languages
 Japanese, Hindi
VSO languages
  Irish, Classical Arabic
SVO languages generally have prepositions: to Yuriko
SOV languages generally have postpositions: Yuriko ni
Segmentation Variation


Not every writing system has word boundaries marked
 Chinese, Japanese, Thai, Vietnamese
Some languages tend to have sentences that are quite
long, closer to English paragraphs than sentences:
 Modern Standard Arabic, Chinese
Inferential Load: cold vs. hot lgs


Some ‘cold’ languages require the hearer to do more
“figuring out” of who the various actors in the various
events are:
 Japanese, Chinese,
Other ‘hot’ languages are pretty explicit about saying
who did what to whom.
 English
Inferential Load (2)
All the noun phrases shown in blue do not appear in the Chinese text, but they are needed for a good translation.
Lexical Divergences


Word to phrase:
  English "computer science" = French "informatique"
POS divergences
  Eng. 'she likes/VERB to sing'
  Ger. 'Sie singt gerne/ADV'
  Eng. 'I'm hungry/ADJ'
  Sp. 'tengo hambre/NOUN'
Lexical Divergences: Specificity

Grammatical constraints
 English has gender on pronouns; Mandarin does not.
   So when translating a 3rd-person pronoun from Chinese to English, we need to figure out the gender of the person!
   Similarly from English "they" to French "ils/elles"
Semantic constraints
 English `brother’
 Mandarin ‘gege’ (older) versus ‘didi’ (younger)
 English ‘wall’
 German ‘Wand’ (inside) ‘Mauer’ (outside)
 German ‘Berg’
 English ‘hill’ or ‘mountain’
Lexical Divergence: many-to-many
Lexical Divergence: lexical gaps

Japanese: no word for privacy
English: no word for Cantonese ‘haauseun’ or Japanese
‘oyakoko’ (something like `filial piety’)

English ‘cow’ versus ‘beef’, Cantonese ‘ngau’

Event-to-argument divergences




English
 The bottle floated out.
Spanish
 La botella salió flotando.
 The bottle exited floating
Verb-framed lg: mark direction of motion on verb
 Spanish, French, Arabic, Hebrew, Japanese, Tamil, Polynesian, Mayan, Bantu families
Satellite-framed lg: mark direction of motion on satellite
 Crawl out, float off, jump down, walk over to, run after
 Rest of Indo-European, Hungarian, Finnish, Chinese
Structural divergences


G: Wir treffen uns am Mittwoch
E: We’ll meet on Wednesday
Head Swapping






E: X swim across Y
S: X cruzar Y nadando
E: I like to eat
G: Ich esse gern
E: I’d prefer vanilla
G: Mir wäre Vanille lieber
Thematic divergence




S: Me gusta Y
E: I like Y
G: Mir fällt der Termin ein
E: The date occurs to me (i.e., I remember the date)
Divergence counts from Bonnie Dorr

32% of sentences in the UN Spanish/English Corpus (5K) show divergences:

  Divergence type   Spanish example        English example     %
  Categorial        X tener hambre         Y have hunger       98%
  Conflational      X dar puñaladas a Z    X stab Z            83%
  Structural        X entrar en Y          X enter Y           35%
  Head Swapping     X cruzar Y nadando     X swim across Y      8%
  Thematic          X gustar a Y           Y likes X            6%
MT on the web


Babelfish:
  http://babelfish.altavista.com/
Google:
  http://www.google.com/search?hl=en&lr=&client=safari&rls=en&q="1+taza+de+jugo"+%28zumo%29+de+naranja+5+cucharadas+de+azucar+morena&btnG=Search
3 methods for MT



Direct
Transfer
Interlingua
Three MT Approaches:
Direct, Transfer, Interlingual
Direct Translation





Proceed word-by-word through text
Translating each word
No intermediate structures except morphology
Knowledge is in the form of
 Huge bilingual dictionary
 word-to-word translation information
After word translation, can do simple reordering
 Adjective ordering English -> French/Spanish
Direct MT Dictionary entry
Direct MT
Problems with direct MT

German

Chinese
The Transfer Model


Idea: apply contrastive knowledge, i.e., knowledge about
the difference between two languages
Steps:
 Analysis: Syntactically parse Source language
 Transfer: Rules to turn this parse into parse for Target
language
 Generation: Generate Target sentence from parse
tree
English to French

Generally
 English: Adjective Noun
 French: Noun Adjective
 Note: not always true




Route mauvaise ‘bad road, badly-paved road’
Mauvaise route ‘wrong road’)
But is a reasonable first approximation
Rule:
Transfer rules
Lexical transfer






Transfer-based systems also need lexical transfer rules
Bilingual dictionary (like for direct MT)
English home:
German
 nach Hause (going home)
 Heim (home game)
 Heimat (homeland, home country)
 zu Hause (at home)
Can list “at home <-> zu Hause”
Or do Word Sense Disambiguation
Systran: combining direct and transfer
Analysis
  Morphological analysis, POS tagging
  Chunking of NPs, PPs, phrases
  Shallow dependency parsing
Transfer
  Translation of idioms
  Word sense disambiguation
  Assigning prepositions based on governing verbs
Synthesis
  Apply rich bilingual dictionary
  Deal with reordering
  Morphological generation
Transfer: some problems



N² sets of transfer rules!
Grammar and lexicon full of language-specific stuff
Hard to build, hard to maintain
Interlingua


Intuition: instead of language-to-language knowledge rules, use the meaning of the sentence to help
Steps:
 1) translate source sentence into meaning
representation
 2) generate target sentence from meaning.
Interlingua for
Mary did not slap the green witch
Interlingua



Idea is that some of the MT work that we need to do is
part of other NLP tasks
E.g., disambiguating E:book S:‘libro’ from E:book
S:‘reservar’
So we could have concepts like BOOKVOLUME and
RESERVE and solve this problem once for each
language
Direct MT: pros and cons (Bonnie Dorr)


Pros
 Fast
 Simple
 Cheap
 No translation rules hidden in lexicon
Cons
 Unreliable
 Not powerful
 Rule proliferation
 Requires lots of context
 Major restructuring after lexical substitution
Interlingual MT: pros and cons (B. Dorr)


Pros
 Avoids the N² problem
 Easier to write rules
Cons:
 Semantics is HARD
 Useful information lost (paraphrase)
The impossibility of translation

Hebrew "adonoi roi" ('the Lord is my shepherd') for a culture without sheep or shepherds
 Something fluent and understandable, but not faithful:


“The Lord will look after me”
Something faithful, but not fluent and natural

“The Lord is for me like somebody who looks after animals
with cotton-like hair”
What makes a good translation



Translators often talk about two factors we want to
maximize:
Faithfulness or fidelity
 How close is the meaning of the translation to the
meaning of the original
 (Even better: does the translation cause the reader to
draw the same inferences as the original would have)
Fluency or naturalness
 How natural the translation is, just considering its
fluency in the target language
Statistical MT Systems
Spanish/English Bilingual Text  ->  Statistical Analysis
English Text                    ->  Statistical Analysis

Spanish:         Que hambre tengo yo
Broken English:  What hunger have I, / Hungry I am so, / I am so hungry, / Have I that hunger …
English:         I am so hungry

Slide from Kevin Knight
Statistical MT Systems
Spanish/English Bilingual Text  ->  Statistical Analysis  ->  Translation Model P(s|e)
English Text                    ->  Statistical Analysis  ->  Language Model P(e)

Spanish: Que hambre tengo yo
  -> Translation Model P(s|e)  ->  Broken English
  -> Language Model P(e) + Decoding algorithm: argmax_e P(e) * P(s|e)
  -> English: I am so hungry

Slide from Kevin Knight
Statistical MT:
Faithfulness and Fluency formalized!

Best translation T-hat of a source sentence S:

   T-hat = argmax_T  fluency(T) * faithfulness(T, S)

Developed by researchers who were originally in speech recognition at IBM
Called the IBM model
Three Problems for Statistical MT

Language model
  Given an English string e, assigns P(e) by formula
   good English string                -> high P(e)
   random word sequence               -> low P(e)
Translation model
  Given a pair of strings <f,e>, assigns P(f | e) by formula
   <f,e> look like translations       -> high P(f | e)
   <f,e> don't look like translations -> low P(f | e)
Decoding algorithm
  Given a language model, a translation model, and a new sentence f … find the translation e maximizing P(e) * P(f | e)

Slide from Kevin Knight

The IBM model

Hmm, those two factors might look familiar…

   T-hat = argmax_T  fluency(T) * faithfulness(T, S)

Yup, it's Bayes' rule:

   T-hat = argmax_T  P(T) * P(S | T)
More formally


Assume we are translating from a foreign language sentence F to an English sentence E:
  F = f1, f2, f3, …, fm
We want to find the best English sentence E-hat = e1, e2, e3, …, en:
  E-hat = argmax_E P(E|F)
        = argmax_E P(F|E) P(E) / P(F)
        = argmax_E P(F|E) P(E)
  where P(F|E) is the Translation Model and P(E) is the Language Model
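To make the noisy-channel decision rule concrete, here is a minimal Python sketch that rescores a handful of candidate English translations of a Spanish sentence by P(F|E) * P(E). The candidate list and all probability values are invented toy numbers for illustration, not values from any real model.

def lm_prob(e):
    # Toy "language model" P(E): hand-assigned fluency scores (assumed values).
    scores = {
        "I am so hungry": 0.01,
        "What hunger have I": 0.0001,
        "Hungry I am so": 0.0002,
    }
    return scores.get(e, 1e-8)

def tm_prob(f, e):
    # Toy "translation model" P(F|E): how well e explains the Spanish input (assumed values).
    scores = {
        ("Que hambre tengo yo", "I am so hungry"): 0.2,
        ("Que hambre tengo yo", "What hunger have I"): 0.3,
        ("Que hambre tengo yo", "Hungry I am so"): 0.25,
    }
    return scores.get((f, e), 1e-8)

def best_translation(f, candidates):
    # Noisy-channel decision rule: E-hat = argmax_E P(F|E) * P(E)
    return max(candidates, key=lambda e: tm_prob(f, e) * lm_prob(e))

print(best_translation("Que hambre tengo yo",
                       ["I am so hungry", "What hunger have I", "Hungry I am so"]))
# -> "I am so hungry": a candidate that is both faithful and fluent wins.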
The noisy channel model for MT
Fluency: P(T)





How to measure that this sentence
 That car was almost crash onto me
is less fluent than this one:
 That car almost hit me.
Answer: language models (N-grams!)
 For example P(hit|almost) > P(was|almost)
But can use any other more sophisticated model of
grammar
Advantage: this is monolingual knowledge!
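As a concrete illustration of using N-grams for the fluency score P(T), here is a minimal add-one-smoothed bigram language model in Python. The tiny training corpus and the smoothing choice are assumptions made only for this sketch.

from collections import Counter

corpus = ["that car almost hit me", "the car was parked", "he almost fell"]
tokens = [w for s in corpus for w in ("<s> " + s + " </s>").split()]
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))   # crosses sentence joins; fine for a sketch
vocab = len(unigrams)

def p_bigram(w1, w2):
    # Add-one smoothed P(w2 | w1)
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab)

def p_sentence(sent):
    words = ("<s> " + sent + " </s>").split()
    p = 1.0
    for w1, w2 in zip(words, words[1:]):
        p *= p_bigram(w1, w2)
    return p

print(p_sentence("that car almost hit me") > p_sentence("that car was almost crash onto me"))
# -> True: the fluent sentence gets the higher P(T)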
Faithfulness: P(S|T)




French: ça me plait [that me pleases]
English:
 that pleases me - most fluent
 I like it
 I’ll take that one
How to quantify this?
Intuition: degree to which words in one sentence are
plausible translations of words in other sentence
 Product of probabilities that each word in target
sentence would generate each word in source
sentence.
Faithfulness P(S|T)



Need to know, for every target language word,
probability of it mapping to every source language word.
How do we learn these probabilities?
Parallel texts!
 Lots of times we have two texts that are translations
of each other
 If we knew which word in Source Text mapped to
each word in Target Text, we could just count!
Faithfulness P(S|T)


Sentence alignment:
 Figuring out which source language sentence maps
to which target language sentence
Word alignment
 Figuring out which source language word maps to
which target language word
Big Point about Faithfulness and
Fluency



The job of the faithfulness model P(S|T) is just to model the "bag of words": which words map from, say, English to Spanish.
P(S|T) doesn't have to worry about internal facts about Spanish word order: that's the job of P(T)
P(T) can do Bag generation: put the following words in
order (from Kevin Knight)
 have programming a seen never I language better
-actual the hashing is since not collision-free usually
the is less perfectly the of somewhat capacity table
P(T) and bag generation:
the answer

“Usually the actual capacity of the table is somewhat
less, since the hashing is not collision-free”

How about:
 loves Mary John
Three Problems for Statistical MT

Language model
  Given an English string e, assigns P(e) by formula
   good English string                -> high P(e)
   random word sequence               -> low P(e)
Translation model
  Given a pair of strings <f,e>, assigns P(f | e) by formula
   <f,e> look like translations       -> high P(f | e)
   <f,e> don't look like translations -> low P(f | e)
Decoding algorithm
  Given a language model, a translation model, and a new sentence f … find the translation e maximizing P(e) * P(f | e)

Slide from Kevin Knight

The Classic Language Model
Word N-Grams
Goal of the language model -- choose among:
He is on the soccer field
He is in the soccer field
Is table the on cup the
The cup is on the table
Rice shrine
American shrine
Rice company
American company
Slide from Kevin Knight
Intuition of phrase-based
translation (Koehn et al. 2003)

Generative story has three steps
1) Group words into phrases
2) Translate each phrase
3) Move the phrases around
Generative story again
1) Group English source words into phrases e1, e2, …, en
2) Translate each English phrase ei into a Spanish phrase fj
    The probability of doing this is φ(fj | ei)
3) Then (optionally) reorder each Spanish phrase
    We do this with a distortion probability
    A measure of distance between the positions of a corresponding phrase in the 2 languages
    "What is the probability that a phrase in position X in the English sentence moves to position Y in the Spanish sentence?"
Distortion probability



The distortion probability is parameterized by a_i − b_(i−1)
  where a_i is the start position of the foreign (Spanish) phrase generated by the i-th English phrase e_i,
  and b_(i−1) is the end position of the foreign (Spanish) phrase generated by the (i−1)-th English phrase e_(i−1).
We'll call the distortion probability d(a_i − b_(i−1)).
And we'll have a really stupid model:
  d(a_i − b_(i−1)) = α^|a_i − b_(i−1)|
  where α is some small constant.
Final translation model for phrase-based MT

   P(F | E) = ∏_{i=1..l}  φ(f_i, e_i) * d(a_i − b_(i−1))

Let's look at a simple example with no distortion
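A minimal Python sketch of evaluating the phrase-based model above, P(F|E) = ∏ φ(f_i, e_i) * d(a_i − b_(i−1)) with d(x) = α^|x|. The tiny phrase table, the value of α, and the example segmentation and positions are all assumptions for illustration, not learned values.

ALPHA = 0.5   # the small constant in the distortion model (assumed value)

phrase_table = {   # phi(spanish_phrase | english_phrase), toy values
    ("Maria", "Mary"): 0.9,
    ("no", "did not"): 0.6,
    ("dio una bofetada a", "slap"): 0.3,
    ("la bruja verde", "the green witch"): 0.5,
}

def p_f_given_e(aligned_phrases):
    """aligned_phrases: list of (english_phrase, spanish_phrase, a_i, b_i)
    where a_i / b_i are start/end word positions of the Spanish phrase."""
    prob = 1.0
    prev_end = 0                                    # b_0: position just before the sentence
    for e, f, start, end in aligned_phrases:
        prob *= phrase_table.get((f, e), 1e-6)      # phi(f | e)
        prob *= ALPHA ** abs(start - prev_end)      # d(a_i - b_(i-1))
        prev_end = end
    return prob

# Monotone (no-distortion) example: each Spanish phrase starts right after
# the previous one, so every |a_i - b_(i-1)| is 1.
example = [
    ("Mary", "Maria", 1, 1),
    ("did not", "no", 2, 2),
    ("slap", "dio una bofetada a", 3, 6),
    ("the green witch", "la bruja verde", 7, 9),
]
print(p_f_given_e(example))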
Phrase-based MT



Language model P(E)
Translation model P(F|E)
 Model
 How to train the model
Decoder: finding the sentence E that is most probable
Training P(F|E)





What we mainly need to train is φ(fj | ei)
Suppose we had a large bilingual training corpus
 A bitext
 In which each English sentence is paired with a
Spanish sentence
And suppose we knew exactly which phrase in Spanish
was the translation of which phrase in the English
We call this a phrase alignment
If we had this, we could just count-and-divide:
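A minimal sketch of the count-and-divide estimate for φ(f | e), assuming (counterfactually, as the next slide notes) that the phrase alignments are already given. The toy aligned corpus is an assumption for illustration.

from collections import Counter

# Each training example: list of (english_phrase, spanish_phrase) pairs.
aligned_corpus = [
    [("green witch", "bruja verde"), ("the", "la")],
    [("green witch", "bruja verde"), ("slap", "dar una bofetada a")],
    [("the", "el"), ("slap", "dar una bofetada a")],
]

pair_counts = Counter()
e_counts = Counter()
for sentence in aligned_corpus:
    for e, f in sentence:
        pair_counts[(e, f)] += 1
        e_counts[e] += 1

def phi(f, e):
    # phi(f | e) = count(e, f) / count(e)
    return pair_counts[(e, f)] / e_counts[e]

print(phi("bruja verde", "green witch"))   # 1.0
print(phi("la", "the"))                    # 0.5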
But we don’t have phrase
alignments

What we have instead are word alignments:
Getting phrase alignments

To get phrase alignments:
1) We first get word alignments
2) Then we “symmetrize” the word alignments
into phrase alignments
How to get Word Alignments




Word alignment: a mapping between the source words
and the target words in a set of parallel sentences.
Restriction: each foreign word comes from exactly 1
English word
Advantage: represent an alignment by the index of the
English word that the French word comes from
Alignment above is thus 2,3,4,5,6,6,6
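A small sketch of this index representation, using an invented sentence pair and alignment; index 0 stands for the NULL word introduced on the next slide.

english = ["NULL", "Mary", "did", "not", "slap", "the", "green", "witch"]
spanish = ["Maria", "no", "dio", "una", "bofetada", "a", "la", "verde", "bruja"]

# alignment[j] = index of the English word that generated spanish[j]
alignment = [1, 3, 4, 4, 4, 0, 5, 6, 7]

for f_word, i in zip(spanish, alignment):
    print(f"{f_word:10s} <- {english[i]}")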
One addition: spurious words




A word in the foreign sentence
That doesn’t align with any word in the English sentence
Is called a spurious word.
We model these by pretending they are generated by an
English word e0:
More sophisticated models of
alignment
Computing word alignments: IBM
Model 1






For phrase-based machine translation
We want a word-alignment
To extract a set of phrases
A word alignment algorithm gives us P(F,E)
We want this to train our phrase probabilities φ(fj | ei) as part of P(F|E)
But a word-alignment algorithm can also be part of a
mini-translation model itself.
IBM Model 1
IBM Model 1
How does the generative story assign
P(F|E) for a Spanish sentence F?

Terminology:

Suppose we had done steps 1 and 2, I.e. we already
knew the Spanish length J and the alignment A (and
English source E):
Let’s formalize steps 1 and 2






We want P(A|E) of an alignment A (of length J) given an
English sentence E
IBM Model 1 makes the (very) simplifying assumption
that each alignment is equally likely.
How many possible alignments are there between
English sentence of length I and Spanish sentence of
length J?
Hint: Each Spanish word must come from one of the
English source words (or the NULL word)
(I+1)^J
Let’s assume probability of choosing length J is small
constant epsilon
Model 1 continued

Prob of choosing a length and then one of the possible
alignments:

Combining with step 3:

The total probability of a given foreign sentence F:
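Under the assumptions above (uniform alignments, ε for the length choice), the standard form of the Model 1 total probability is P(F|E) = ε / (I+1)^J * ∏_j Σ_i t(f_j | e_i). Here is a minimal Python sketch of that computation; the translation table t, ε, and the fallback probability for unseen pairs are illustrative assumptions.

EPSILON = 1.0   # probability of choosing the Spanish length J (a constant here)

t = {   # t(f | e): toy word-translation probabilities (assumed values)
    ("casa", "house"): 0.8, ("casa", "NULL"): 0.05,
    ("azul", "blue"): 0.7,  ("azul", "NULL"): 0.05,
}

def model1_prob(f_words, e_words):
    e_words = ["NULL"] + e_words                       # e_0 is the NULL word
    prob = EPSILON / (len(e_words) ** len(f_words))    # epsilon / (I+1)^J
    for f in f_words:
        prob *= sum(t.get((f, e), 1e-6) for e in e_words)   # sum over alignments of f
    return prob

print(model1_prob(["casa", "azul"], ["blue", "house"]))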
Decoding

How do we find the best A?
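For Model 1 specifically, finding the best alignment is easy: since every alignment is a priori equally likely and each foreign word is generated independently, the best A simply picks, for each foreign word, the English position (or NULL) with the highest t(f_j | e_i). A minimal sketch, with an assumed toy table:

def best_alignment(f_words, e_words, t):
    e_words = ["NULL"] + e_words
    alignment = []
    for f in f_words:
        # choose the English index with the highest t(f | e); unseen pairs count as 0
        best_i = max(range(len(e_words)), key=lambda i: t.get((f, e_words[i]), 0.0))
        alignment.append(best_i)
    return alignment   # alignment[j] = index of English word for f_j (0 = NULL)

t = {("casa", "house"): 0.8, ("azul", "blue"): 0.7}
print(best_alignment(["casa", "azul"], ["blue", "house"], t))   # [2, 1]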
Training alignment probabilities

Step 1: get a parallel corpus
 Hansards




Canadian parliamentary proceedings, in French and English
Hong Kong Hansards: English and Chinese
Step 2: sentence alignment
Step 3: use EM (Expectation Maximization) to train word
alignments
Step 1: Parallel corpora

Example from DE-News (8/1/1996):

  English: Diverging opinions about planned tax reform
  German:  Unterschiedliche Meinungen zur geplanten Steuerreform

  English: The discussion around the envisaged major tax reform continues .
  German:  Die Diskussion um die vorgesehene grosse Steuerreform dauert an .

  English: The FDP economics expert , Graf Lambsdorff , today came out in favor of advancing the enactment of significant parts of the overhaul , currently planned for 1999 .
  German:  Der FDP - Wirtschaftsexperte Graf Lambsdorff sprach sich heute dafuer aus , wesentliche Teile der fuer 1999 geplanten Reform vorzuziehen .

Slide from Christof Monz
Step 2: Sentence Alignment
English: The old man is happy. He has fished many times. His wife talks to him. The fish are jumping. The sharks await.

Spanish: El viejo está feliz porque ha pescado muchos veces. Su mujer habla con él. Los tiburones esperan.

Intuition:
 - use length in words or chars
 - together with dynamic programming
 - or use a simpler MT model

Slide from Kevin Knight
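A minimal Python sketch of the length-plus-dynamic-programming intuition. The cost function below (absolute character-length difference plus fixed penalties for 1-2, 2-1, and unaligned beads) is a simplification assumed only for illustration; it is not the Gale & Church formula.

def align_sentences(src, tgt):
    """src, tgt: lists of sentence strings.
    Returns a list of (src_indices, tgt_indices) beads, in order."""
    INF = float("inf")
    # Allowed bead shapes and a fixed penalty for each (assumed values).
    PENALTY = {(1, 1): 0, (1, 2): 10, (2, 1): 10, (1, 0): 30, (0, 1): 30}

    def cost(si, sj, ti, tj):
        # Cost of aligning src[si:sj] with tgt[ti:tj]: length mismatch + shape penalty.
        ls = sum(len(s) for s in src[si:sj])
        lt = sum(len(t) for t in tgt[ti:tj])
        return abs(ls - lt) + PENALTY[(sj - si, tj - ti)]

    n, m = len(src), len(tgt)
    best = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0
    for i in range(n + 1):                       # dynamic programming over prefixes
        for j in range(m + 1):
            if best[i][j] == INF:
                continue
            for di, dj in PENALTY:
                if i + di <= n and j + dj <= m:
                    c = best[i][j] + cost(i, i + di, j, j + dj)
                    if c < best[i + di][j + dj]:
                        best[i + di][j + dj] = c
                        back[i + di][j + dj] = (i, j)
    # Recover the beads by walking the backpointers.
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return list(reversed(beads))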
Sentence Alignment
English:
 1. The old man is happy.
 2. He has fished many times.
 3. His wife talks to him.
 4. The fish are jumping.
 5. The sharks await.

Spanish:
 1. El viejo está feliz porque ha pescado muchos veces.
 2. Su mujer habla con él.
 3. Los tiburones esperan.

Slide from Kevin Knight
Sentence Alignment
English:
 1. The old man is happy.
 2. He has fished many times.
 3. His wife talks to him.
 4. The fish are jumping.
 5. The sharks await.

Spanish:
 1. El viejo está feliz porque ha pescado muchos veces.
 2. Su mujer habla con él.
 3. Los tiburones esperan.

Slide from Kevin Knight
Sentence Alignment
1. The old man is happy. He has fished many times.   <->   El viejo está feliz porque ha pescado muchos veces.
2. His wife talks to him.                             <->   Su mujer habla con él.
3. The sharks await.                                  <->   Los tiburones esperan.

Note that unaligned sentences are thrown out, and sentences are merged in n-to-m alignments (n, m > 0).

Slide from Kevin Knight
Step 3: word alignments



It turns out we can bootstrap alignments from a sentence-aligned bilingual corpus
We use the Expectation-Maximization (EM) algorithm
EM for training alignment probs
… la maison … la maison bleue … la fleur …
… the house … the blue house … the flower …
All word alignments equally likely
All P(french-word | english-word) equally likely
Slide from Kevin Knight
EM for training alignment probs
… la maison … la maison bleue … la fleur …
… the house … the blue house … the flower …
“la” and “the” observed to co-occur frequently,
so P(la | the) is increased.
Slide from Kevin Knight
EM for training alignment probs
… la maison … la maison bleue … la fleur …
… the house … the blue house … the flower …
“house” co-occurs with both “la” and “maison”, but
P(maison | house) can be raised without limit, to 1.0,
while P(la | house) is limited because of “the”
(pigeonhole principle)
Slide from Kevin Knight
EM for training alignment probs
… la maison … la maison bleue … la fleur …
… the house … the blue house … the flower …
settling down after another iteration
Slide from Kevin Knight
EM for training alignment probs
… la maison … la maison bleue … la fleur …
… the house … the blue house … the flower …
Inherent hidden structure revealed by EM training!
For details, see:
•Section 24.6.1 in the chapter
• “A Statistical MT Tutorial Workbook” (Knight, 1999).
• “The Mathematics of Statistical Machine Translation” (Brown et al, 1993)
• Software: GIZA++
Slide from Kevin Knight
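A minimal Python sketch of this EM loop for the word-translation probabilities t(f | e), in the spirit of the la/maison example above (see Knight 1999 for the full derivation). The tiny corpus is the one from the figure; handling of the NULL word is omitted to keep the sketch short.

from collections import defaultdict

corpus = [
    ("la maison".split(), "the house".split()),
    ("la maison bleue".split(), "the blue house".split()),
    ("la fleur".split(), "the flower".split()),
]

# Initialization: all t(f | e) equally likely.
french_vocab = {f for fs, _ in corpus for f in fs}
t = defaultdict(lambda: 1.0 / len(french_vocab))

for iteration in range(20):
    count = defaultdict(float)   # expected count(f, e)
    total = defaultdict(float)   # expected count(e)
    # E-step: collect fractional counts under the current t.
    for fs, es in corpus:
        for f in fs:
            norm = sum(t[(f, e)] for e in es)
            for e in es:
                c = t[(f, e)] / norm
                count[(f, e)] += c
                total[e] += c
    # M-step: re-estimate t(f | e) = count(f, e) / count(e).
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]

print(round(t[("maison", "house")], 3))   # approaches 1.0 as EM settles down
print(round(t[("la", "the")], 3))         # approaches 1.0 ("the" is left to explain "la")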
Statistical Machine Translation
… la maison … la maison bleue … la fleur …
… the house … the blue house … the flower …

P(juste | fair)    = 0.411
P(juste | correct) = 0.027
P(juste | right)   = 0.020
…

new French sentence  ->  possible English translations, to be rescored by language model

Slide from Kevin Knight
A more complex model: IBM Model 3
(Brown et al., 1993)

Generative approach:
  Mary did not slap the green witch
   -> (fertility, n(3 | slap))     Mary not slap slap slap the green witch
   -> (NULL insertion, P-Null)     Mary not slap slap slap NULL the green witch
   -> (translation, t(la | the))   Maria no dió una bofetada a la verde bruja
   -> (distortion, d(j | i))       Maria no dió una bofetada a la bruja verde

Probabilities can be learned from raw bilingual text.
How do we evaluate MT? Human
tests for fluency

Rating tests: give the raters a scale (1 to 5) and ask them to rate
  or use distinct scales for Clarity, Naturalness, Style
  or check for specific problems:
    Cohesion (lexical chains, anaphora, ellipsis): hand-checking for cohesion
    Well-formedness: 5-point scale of syntactic correctness
Comprehensibility tests
  Noise test
  Multiple choice questionnaire
  Readability tests: cloze
How do we evaluate MT? Human
tests for fidelity

Adequacy
 Does it convey the information in the original?
 Ask raters to rate on a scale
 Bilingual
raters: give them source and target
sentence, ask how much information is preserved
 Monolingual raters: give them target + a good
human translation

Informativeness
 Task based: is there enough info to do some
task?
 Give raters multiple-choice questions about
content
Evaluating MT: Problems




Asking humans to judge sentences on a 5-point scale for
10 factors takes time and $$$ (weeks or months!)
We can’t build language engineering systems if we can
only evaluate them once every quarter!!!!
We need a metric that we can run every time we change
our algorithm.
It would be OK if it wasn't perfect, but just tended to correlate with the expensive human metrics, which we could still run quarterly.
Slide from Bonnie Dorr
Automatic evaluation



Miller and Beebe-Center (1958)
Assume we have one or more human translations of the
source passage
Compare the automatic translation to these human
translations
 Bleu
 NIST
 Meteor
 Precision/Recall
BiLingual Evaluation Understudy
(BLEU —Papineni, 2001)
http://www.research.ibm.com/people/k/kishore/RC22176.pdf



Automatic Technique, but ….
Requires the pre-existence of Human (Reference) Translations
Approach:
 Produce corpus of high-quality human translations
 Judge “closeness” numerically (word-error rate)
 Compare n-gram matches between candidate translation and
1 or more reference translations
Slide from Bonnie Dorr
BLEU Evaluation Metric
(Papineni et al, ACL-2002)
Reference (human) translation:
  The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport .

Machine translation:
  The American [?] international airport and its the office all receives one calls self the sand Arab rich business [?] and so on electronic mail , which sends out ; The threat will be able after public place and so on the airport to start the biochemistry attack , [?] highly alerts after the maintenance .

• N-gram precision (score is between 0 & 1)
  – What percentage of machine n-grams can be found in the reference translation?
  – An n-gram is a sequence of n words
  – Not allowed to use the same portion of the reference translation twice (can't cheat by typing out "the the the the the")
• Brevity penalty
  – Can't just type out the single word "the" (precision 1.0!)

*** Amazingly hard to "game" the system (i.e., find a way to change machine output so that BLEU goes up, but quality doesn't)

Slide from Bonnie Dorr
BLEU Evaluation Metric
(Papineni et al, ACL-2002)
Reference (human) translation and machine translation: as on the previous slide.

• BLEU4 formula (counts n-grams up to length 4):
    BLEU4 = exp( 1.0 * log p1 + 0.5 * log p2 + 0.25 * log p3 + 0.125 * log p4
                 − max(words-in-reference / words-in-machine − 1, 0) )
  where p1 = 1-gram precision, p2 = 2-gram precision, p3 = 3-gram precision, p4 = 4-gram precision

Slide from Bonnie Dorr
Multiple Reference Translations
Reference translation 1:
  The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport .

Reference translation 2:
  Guam International Airport and its offices are maintaining a high state of alert after receiving an e-mail that was from a person claiming to be the wealthy Saudi Arabian businessman Bin Laden and that threatened to launch a biological and chemical attack on the airport and other public places .

Reference translation 3:
  The US International Airport of Guam and its office has received an email from a self-claimed Arabian millionaire named Laden , which threatens to launch a biochemical attack on such public places as airport . Guam authority has been on alert .

Reference translation 4:
  US Guam International Airport and its office received an email from Mr. Bin Laden and other rich businessman from Saudi Arabia . They said there would be biochemistry air raid to Guam Airport and other public places . Guam needs to be in high precaution about this matter .

Machine translation:
  The American [?] international airport and its the office all receives one calls self the sand Arab rich business [?] and so on electronic mail , which sends out ; The threat will be able after public place and so on the airport to start the biochemistry attack , [?] highly alerts after the maintenance .

Slide from Bonnie Dorr
Bleu Comparison
Chinese-English Translation Example:
Candidate 1: It is a guide to action which ensures that the military
always obeys the commands of the party.
Candidate 2: It is to insure the troops forever hearing the activity
guidebook that party direct.
Reference 1: It is a guide to action that ensures that the military
will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the
military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to
heed the directions of the party.
Slide from Bonnie Dorr
How Do We Compute
Bleu Scores?




Intuition: "What percentage of words in the candidate occurred in some human translation?"
Proposal: count up the number of candidate translation words (unigrams) that occur in any reference translation, and divide by the total number of words in the candidate translation
But we can't just count the total number of overlapping n-grams!
  Candidate: the the the the the the
  Reference 1: The cat is on the mat
Solution: a reference word should be considered exhausted after a matching candidate word is identified.
Slide from Bonnie Dorr
“Modified n-gram precision”




For each word compute:
  (1) the total number of times it occurs in any single reference translation
  (2) the number of times it occurs in the candidate translation
Instead of using count (2) directly, use the minimum of (1) and (2), i.e., clip the counts at the maximum for the reference translation.
Now use that modified count, and divide by the number of candidate words.
Slide from Bonnie Dorr
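A minimal Python sketch of this clipped ("modified") n-gram precision; the lowercased toy example reuses the "the the the …" candidate discussed earlier.

from collections import Counter

def ngrams(words, n):
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def modified_precision(candidate, references, n):
    cand = Counter(ngrams(candidate.split(), n))
    clipped = 0
    for gram, count in cand.items():
        # clip the candidate count at the max count seen in any single reference
        max_ref = max(Counter(ngrams(ref.split(), n))[gram] for ref in references)
        clipped += min(count, max_ref)
    total = sum(cand.values())
    return clipped / total if total else 0.0

refs = ["the cat is on the mat", "there is a cat on the mat"]
print(modified_precision("the the the the the the the", refs, 1))   # 2/7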
Modified Unigram Precision:
Candidate #1
It(1) is(1) a(1) guide(1) to(1) action(1) which(1) ensures(1) that(2)
the(4) military(1) always(1) obeys(0) the commands(1) of(1) the
party(1)
Reference 1: It is a guide to action that ensures that the
military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the
military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to
heed the directions of the party.
What’s the answer???
17/18
Slide from Bonnie Dorr
Modified Unigram Precision:
Candidate #2
It(1) is(1) to(1) insure(0) the(4) troops(0) forever(1) hearing(0)
the activity(0) guidebook(0) that(2) party(1) direct(0)
Reference 1: It is a guide to action that ensures that the
military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the
military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to
heed the directions of the party.
What's the answer????   8/14

Slide from Bonnie Dorr
Modified Bigram Precision:
Candidate #1
It is(1) is a(1) a guide(1) guide to(1) to action(1) action
which(0) which ensures(0) ensures that(1) that the(1) the
military(1) military always(0) always obeys(0) obeys the(0) the
commands(0) commands of(0) of the(1) the party(1)
Reference 1: It is a guide to action that ensures that the military will forever
heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces
always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the
directions of the party.
What's the answer????   10/17

Slide from Bonnie Dorr
Modified Bigram Precision:
Candidate #2
It is(1) is to(0) to insure(0) insure the(0) the troops(0)
troops forever(0) forever hearing(0) hearing the(0) the
activity(0) activity guidebook(0) guidebook that(0) that
party(0) party direct(0)
Reference 1: It is a guide to action that ensures that the
military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military
forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the
directions of the party.
What's the answer????   1/13

Slide from Bonnie Dorr
Catching Cheaters
the(2) the the the(0) the(0) the(0) the(0)
Reference 1: The cat is on the mat
Reference 2: There is a cat on the mat
What's the unigram answer?   2/7
What's the bigram answer?    0/7

Slide from Bonnie Dorr
Bleu distinguishes human from
machine translations
Slide from Bonnie Dorr
Bleu problems with sentence length

Candidate: of the
Reference 1: It is a guide to action that ensures that the
military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military
forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the
directions of the party.
Problem: modified unigram precision is 2/2, bigram 1/1!

Solution: brevity penalty; prefers candidate translations which are the same length as one of the references
Slide from Bonnie Dorr
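A minimal, self-contained Python sketch that adds the brevity penalty and combines the clipped precisions following the BLEU4 formula shown earlier (weights 1.0, 0.5, 0.25, 0.125). Choosing the reference length closest to the candidate, and returning 0 when any precision is 0, are simplifying assumptions.

import math
from collections import Counter

def ngrams(words, n):
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def modified_precision(candidate, references, n):
    cand = Counter(ngrams(candidate.split(), n))
    clipped = sum(min(c, max(Counter(ngrams(r.split(), n))[g] for r in references))
                  for g, c in cand.items())
    total = sum(cand.values())
    return clipped / total if total else 0.0

def bleu4(candidate, references):
    weights = [1.0, 0.5, 0.25, 0.125]           # weights from the slide's BLEU4 formula
    precisions = [modified_precision(candidate, references, n) for n in range(1, 5)]
    if min(precisions) == 0:
        return 0.0                               # avoid log(0); very short or poor candidates score 0
    c = len(candidate.split())
    r = min((len(ref.split()) for ref in references), key=lambda length: abs(length - c))
    penalty = max(r / c - 1, 0)                  # brevity penalty term from the slide
    return math.exp(sum(w * math.log(p) for w, p in zip(weights, precisions)) - penalty)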
BLEU Tends to Predict Human Judgments
[Scatter plot: NIST score (a variant of BLEU) against human judgments of Adequacy and Fluency, with linear fits; Adequacy R² = 88.0%, Fluency R² = 90.2%.]

Slide from G. Doddington (NIST)
Summary




Intro and a little history
Language Similarities and Divergences
Four main MT Approaches
 Transfer
 Interlingua
 Direct
 Statistical
Evaluation
Classes


LINGUIST 139M/239M. Human and Machine
Translation. (Martin Kay)
CS 224N. Natural Language Processing (Chris Manning)