CS60057
Speech & Natural Language
Processing
Autumn 2007
Lecture 14b
24 August 2007
Lecture 1, 7/21/2005
Natural Language Processing
1
LING 180 SYMBSYS 138
Intro to Computer Speech and
Language Processing
Lecture 9: Machine Translation (I)
November 7, 2006
Dan Jurafsky
Thanks to Bonnie Dorr for some of these slides!!
Outline for MT Week





Intro and a little history
Language Similarities and Divergences
Three classic MT Approaches
 Transfer
 Interlingua
 Direct
Modern Statistical MT
Evaluation
What is MT?

Translating a text from one language to another
automatically.
Machine Translation

dai yu zi zai chuang shang gan nian bao chai you ting
jian chuang wai zhu shao xiang ye zhe shang, yu sheng
xi li, qing han tou mu, bu jue you di xia lei lai.

Dai-yu alone on bed top think-of-with-gratitude Bao-chai again listen
to window outside bamboo tip plantain leaf of on-top rain sound sigh
drop clear cold penetrate curtain not feeling again fall down tears
come
As she lay there alone, Dai-yu’s thoughts turned to Bao-chai… Then
she listened to the insistent rustle of the rain on the bamboos and
plantains outside her window. The coldness penetrated the curtains
of her bed. Almost without noticing it she had begun to cry.

Machine Translation



The Story of the Stone = The Dream of the Red Chamber (Cao Xueqin, 1792)
Issues:
 Word segmentation
 Sentence segmentation: 4 English sentences to 1 Chinese
 Grammatical differences




Chinese rarely marks tense:
 As, turned to, had begun,
 tou -> penetrated
Zero anaphora
No articles
Stylistic and cultural differences



Bamboo tip plantain leaf -> bamboos and plantains
Mu ‘curtain’ -> curtains of her bed
Rain sound sigh drop -> insistent rustle of the rain
Not just literature

Hansards: Canadian parliamentary proceedings
What is MT not good for?

Really hard stuff
 Literature
 Natural spoken speech (meetings, court reporting)

Really important stuff
 Medical translation in hospitals, 911
What is MT good for?



Tasks for which a rough translation is fine
 Web pages, email
Tasks for which MT can be post-edited
 MT as first pass
 “Computer-aided human translation”
Tasks in sublanguage domains where high-quality MT is possible
 FAHQT (Fully Automatic High-Quality Translation)
Sublanguage domain



Weather forecasting
 “Cloudy with a chance of showers today and Thursday”
 “Low tonight 4”
Can be modeled completely enough to use raw MT output
Word classes and semantic features like MONTH, PLACE,
DIRECTION, TIME POINT
MT History






1946 Booth and Weaver discuss MT at the Rockefeller Foundation in New York
1947-48 idea of dictionary-based direct translation
1949 Weaver memorandum popularized idea
1952 all 18 MT researchers in world meet at MIT
1954 IBM/Georgetown Demo Russian-English MT
1955-65 lots of labs take up MT
History of MT: Pessimism

1959/1960: Bar-Hillel “Report on the state of MT in US and GB”
 Argued FAHQT too hard (semantic ambiguity, etc)
 Should work on semi-automatic instead of automatic
 His argument
Little John was looking for his toy box. Finally, he found it. The
box was in the pen. John was very happy.
 Only human knowledge lets us know that ‘playpens’ are bigger
than boxes, but ‘writing pens’ are smaller
 His claim: we would have to encode all of human knowledge
History of MT: Pessimism

The ALPAC report (1966)
 Headed by John R. Pierce of Bell Labs
 Conclusions:





Supply of human translators exceeds demand
All the Soviet literature is already being translated
MT has been a failure: all current MT work had to be post-edited
Sponsored evaluations which showed that intelligibility and
informativeness were worse than in human translations
Results:

MT research suffered
 Funding loss
 Number of research labs declined
 Association for Machine Translation and Computational
Linguistics dropped MT from its name
History of MT





1976 Meteo, weather forecasts from English to French
Systran (Babelfish) has been used for 40 years
1970’s:
 European focus in MT; mainly ignored in US
1980’s
 ideas of using AI techniques in MT (KBMT, CMU)
1990’s
 Commercial MT systems
 Statistical MT
 Speech-to-speech translation
Language Similarities and Divergences



Some aspects of human language are universal or near-universal; others diverge greatly.
Typology: the study of systematic cross-linguistic
similarities and differences
What are the dimensions along which human languages
vary?
Morphological Variation




Isolating languages
 Cantonese, Vietnamese: each word generally has one
morpheme
Vs. Polysynthetic languages
 Siberian Yupik (`Eskimo’): single word may have very many
morphemes
Agglutinative languages
 Turkish: morphemes have clean boundaries
Vs. Fusional languages
 Russian: single affix may have many morphemes
Syntactic Variation


SVO (Subject-Verb-Object) languages
 English, German, French, Mandarin
SOV Languages
 Japanese, Hindi
VSO languages
 Irish, Classical Arabic
 SVO lgs generally prepositions: to Yuriko

SOV lgs generally postpositions: Yuriko ni
Segmentation Variation


Not every writing system has word boundaries marked
 Chinese, Japanese, Thai, Vietnamese
Some languages tend to have sentences that are quite
long, closer to English paragraphs than sentences:
 Modern Standard Arabic, Chinese
Inferential Load: cold vs. hot lgs


Some ‘cold’ languages require the hearer to do more
“figuring out” of who the various actors in the various
events are:
 Japanese, Chinese,
Other ‘hot’ languages are pretty explicit about saying
who did what to whom.
 English
Inferential Load (2)
[Figure: example translation.] All noun phrases in blue do not appear in the Chinese text … but they are needed for a good translation.
Lexical Divergences


Word to phrases:
 English “computer science” = French “informatique”
POS divergences
 Eng. ‘she likes/VERB to sing’
 Ger. ‘Sie singt gerne/ADV’
 Eng. ‘I’m hungry/ADJ’
 Sp. ‘tengo hambre/NOUN’
Lexical Divergences: Specificity

Grammatical constraints
 English has gender on pronouns, Mandarin not.



So translating “3rd person” from Chinese to English, need to figure
out gender of the person!
Similarly from English “they” to French “ils/elles”
Semantic constraints
 English `brother’
 Mandarin ‘gege’ (older) versus ‘didi’ (younger)
 English ‘wall’
 German ‘Wand’ (inside) ‘Mauer’ (outside)
 German ‘Berg’
 English ‘hill’ or ‘mountain’
Lexical Divergence: many-to-many
Lexical Divergence: lexical gaps

Japanese: no word for privacy
English: no word for Cantonese ‘haauseun’ or Japanese
‘oyakoko’ (something like `filial piety’)

English ‘cow’ versus ‘beef’, Cantonese ‘ngau’

Event-to-argument divergences




English
 The bottle floated out.
Spanish
 La botella salió flotando.
 The bottle exited floating
Verb-framed lg: mark direction of motion on verb
 Spanish, French, Arabic, Hebrew, Japanese, Tamil, Polynesian,
Mayan, Bantu families
Satellite-framed lg: mark direction of motion on satellite
 Crawl out, float off, jump down, walk over to, run after
 Rest of Indo-European, Hungarian, Finnish, Chinese
Structural divergences


G: Wir treffen uns am Mittwoch
E: We’ll meet on Wednesday
Head Swapping






E: X swim across Y
S: X cruzar Y nadando
E: I like to eat
G: Ich esse gern
E: I’d prefer vanilla
G: Mir wäre Vanille lieber
Thematic divergence




S: Y me gusta
E: I like Y
G: Mir fällt der Termin ein
E: I remember the date
Divergence counts from Bonnie Dorr

32% of sentences in UN Spanish/English Corpus (5K)

Divergence      Spanish                English            %
Categorial      X tener hambre         Y have hunger      98%
Conflational    X dar puñaladas a Z    X stab Z           83%
Structural      X entrar en Y          X enter Y          35%
Head Swapping   X cruzar Y nadando     X swim across Y    8%
Thematic        X gustar a Y           Y likes X          6%
MT on the web


Babelfish:
 http://babelfish.altavista.com/
Google:
 http://www.google.com/search?hl=en&lr=&client=safari&rls=en&q="1+taza+de+jugo"+%28zumo%29+de+naranja+5+cucharadas+de+azucar+morena&btnG=Search
3 methods for MT



Direct
Transfer
Interlingua
Three MT Approaches:
Direct, Transfer, Interlingual
Direct Translation





Proceed word-by-word through text
Translating each word
No intermediate structures except morphology
Knowledge is in the form of
 Huge bilingual dictionary
 word-to-word translation information
After word translation, can do simple reordering
 Adjective ordering English -> French/Spanish
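
The direct approach above can be sketched in a few lines: look each word up in a bilingual dictionary, then apply one local reordering rule. This is only a toy illustration; the English-to-Spanish lexicon, its part-of-speech tags, and the function name `direct_translate` are assumptions made for this example, not part of any real system.

```python
# Toy English -> Spanish lexicon with target-side POS tags.
# All entries are illustrative assumptions.
LEXICON = {
    "mary": ("Mary", "N"),
    "slapped": ("abofeteó", "V"),
    "the": ("la", "DET"),
    "green": ("verde", "ADJ"),
    "witch": ("bruja", "N"),
}


def direct_translate(sentence):
    """Word-by-word lookup followed by simple ADJ N -> N ADJ reordering."""
    # Step 1: translate each word; unknown words are copied through.
    translated = [LEXICON.get(w.lower(), (w, "UNK")) for w in sentence.split()]
    # Step 2: local reordering on the translated words.
    out = []
    i = 0
    while i < len(translated):
        if (i + 1 < len(translated)
                and translated[i][1] == "ADJ"
                and translated[i + 1][1] == "N"):
            out.extend([translated[i + 1], translated[i]])
            i += 2
        else:
            out.append(translated[i])
            i += 1
    return " ".join(word for word, _ in out)


print(direct_translate("Mary slapped the green witch"))
```

Note that the output still lacks the Spanish personal "a" before the object, which is the kind of structural knowledge a pure word-by-word method has no place to store.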
Direct MT Dictionary entry
Direct MT
Problems with direct MT

German

Chinese
The Transfer Model


Idea: apply contrastive knowledge, i.e., knowledge about
the difference between two languages
Steps:
 Analysis: Syntactically parse Source language
 Transfer: Rules to turn this parse into parse for Target
language
 Generation: Generate Target sentence from parse
tree
English to French

Generally
 English: Adjective Noun
 French: Noun Adjective
 Note: not always true




Route mauvaise ‘bad road, badly-paved road’
Mauvaise route ‘wrong road’
But is a reasonable first approximation
Rule: Adjective Noun -> Noun Adjective
Transfer rules
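
A syntactic transfer rule operates on a parse tree rather than on a flat word string. The sketch below applies the English-to-French adjective-noun rule inside every NP of a hand-built tree. The tree encoding (a `(label, children)` tuple, with a string in place of children at the leaves) and the function names are assumptions chosen for this example; a real system would take its trees from a parser.

```python
def transfer(tree):
    """Recursively apply NP -> ... ADJ N  =>  NP -> ... N ADJ."""
    label, children = tree
    if isinstance(children, str):  # leaf node: (POS, word)
        return tree
    kids = [transfer(child) for child in children]
    if label == "NP":
        i = 0
        while i + 1 < len(kids):
            if kids[i][0] == "ADJ" and kids[i + 1][0] == "N":
                kids[i], kids[i + 1] = kids[i + 1], kids[i]
                i += 2
            else:
                i += 1
    return (label, kids)


def leaves(tree):
    """Read the words off a tree, left to right."""
    label, children = tree
    if isinstance(children, str):
        return [children]
    return [word for child in children for word in leaves(child)]


# Hand-built parse of "Mary slapped the green witch" (an assumed analysis).
SOURCE_TREE = ("S", [
    ("NP", [("N", "Mary")]),
    ("VP", [
        ("V", "slapped"),
        ("NP", [("DET", "the"), ("ADJ", "green"), ("N", "witch")]),
    ]),
])

print(leaves(transfer(SOURCE_TREE)))
```

Only the structural step is shown here; the words are still English and would be replaced by lexical transfer before generation.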
Lexical transfer






Transfer-based systems also need lexical transfer rules
Bilingual dictionary (like for direct MT)
English home:
German
 nach Hause (going home)
 Heim (home game)
 Heimat (homeland, home country)
 zu Hause (at home)
Can list “at home <-> zu Hause”
Or do Word Sense Disambiguation
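
One simple way to realize the listing option is a table of context-specific entries that is consulted before a default entry. The sketch below keys on the preceding word; the rule table, the default choice of "Heimat", and the function name `lexical_transfer` are all assumptions for illustration.

```python
# Context-specific entries: (previous word, word) -> German rendering.
# "zu Hause" renders the whole phrase "at home", so the preposition is absorbed.
CONTEXT_RULES = {
    ("at", "home"): "zu Hause",
    ("going", "home"): "nach Hause",
}

# Fallback entries used when no context rule matches (assumed defaults).
DEFAULTS = {"home": "Heimat"}


def lexical_transfer(prev_word, word):
    """Pick a German rendering of `word`, preferring a context-specific rule."""
    key = (prev_word.lower(), word.lower())
    if key in CONTEXT_RULES:
        return CONTEXT_RULES[key]
    return DEFAULTS.get(word.lower(), word)


print(lexical_transfer("at", "home"))
print(lexical_transfer("going", "home"))
```

A word sense disambiguation component would replace the hand-written context table with a classifier over a wider context window.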
Systran: combining direct and transfer
Analysis
 Morphological analysis, POS tagging
 Chunking of NPs, PPs, phrases
 Shallow dependency parsing
Transfer
 Translation of idioms
 Word sense disambiguation
 Assigning prepositions based on governing verbs
Synthesis
 Apply rich bilingual dictionary
 Deal with reordering
 Morphological generation
Transfer: some problems



N² sets of transfer rules!
Grammar and lexicon full of language-specific stuff
Hard to build, hard to maintain
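
To see how quickly the rule sets pile up, the sketch below counts one rule set per ordered language pair, which gives N(N-1) and grows like N². For comparison it also counts what a meaning-based design would need under the assumption of one analyzer and one generator per language. Both function names are hypothetical.

```python
def transfer_rule_sets(n_languages):
    """One rule set per ordered (source, target) pair of distinct languages."""
    return n_languages * (n_languages - 1)


def interlingua_components(n_languages):
    """Assumes one analyzer plus one generator per language."""
    return 2 * n_languages


for n in (2, 5, 10, 20):
    print(n, transfer_rule_sets(n), interlingua_components(n))
```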
Interlingua


Intuition: Instead of lg-lg knowledge rules, use the
meaning of the sentence to help
Steps:
 1) translate source sentence into meaning
representation
 2) generate target sentence from meaning.
Interlingua for “Mary did not slap the green witch”
Interlingua



Idea is that some of the MT work that we need to do is
part of other NLP tasks
E.g., disambiguating E:book S:‘libro’ from E:book
S:‘reservar’
So we could have concepts like BOOKVOLUME and
RESERVE and solve this problem once for each
language
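
The two-step idea can be sketched with a pair of lookup tables: one mapping source words to language-neutral concepts, one mapping concepts to target words. Using the part of speech as the disambiguating cue is an assumption made here to keep the example small; the table contents beyond the BOOKVOLUME and RESERVE example are likewise hypothetical.

```python
# Analysis side: (English word, POS) -> concept.
ENGLISH_TO_CONCEPT = {
    ("book", "NOUN"): "BOOKVOLUME",
    ("book", "VERB"): "RESERVE",
}

# Generation side: concept -> Spanish word.
CONCEPT_TO_SPANISH = {
    "BOOKVOLUME": "libro",
    "RESERVE": "reservar",
}


def translate_via_interlingua(word, pos):
    """Map a source word to a concept, then generate the target word."""
    concept = ENGLISH_TO_CONCEPT[(word.lower(), pos)]
    return CONCEPT_TO_SPANISH[concept]


print(translate_via_interlingua("book", "NOUN"))
print(translate_via_interlingua("book", "VERB"))
```

Adding a new target language only requires a new concept-to-word table; the English analysis table is reused unchanged.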
Direct MT: pros and cons (Bonnie Dorr)


Pros
 Fast
 Simple
 Cheap
 No translation rules hidden in lexicon
Cons
 Unreliable
 Not powerful
 Rule proliferation
 Requires lots of context
 Major restructuring after lexical substitution
Interlingual MT: pros and cons (B. Dorr)


Pros
 Avoids the N² problem
 Easier to write rules
Cons:
 Semantics is HARD
 Useful information lost (paraphrase)
The impossibility of translation

Hebrew “adonoi roi” for a culture without sheep or
shepherds
 Something fluent and understandable, but not faithful:


“The Lord will look after me”
Something faithful, but not fluent and natural

“The Lord is for me like somebody who looks after animals
with cotton-like hair”
What makes a good translation



Translators often talk about two factors we want to
maximize:
Faithfulness or fidelity
 How close is the meaning of the translation to the
meaning of the original
 (Even better: does the translation cause the reader to
draw the same inferences as the original would have)
Fluency or naturalness
 How natural the translation is, just considering its
fluency in the target language
Statistical MT:
Faithfulness and Fluency formalized!

Best translation of a source sentence S:
$$\hat{T} = \operatorname*{argmax}_{T} \; \text{fluency}(T) \cdot \text{faithfulness}(T, S)$$
Developed by researchers who were originally in speech
recognition at IBM
Called the IBM model
The IBM model

Hmm, those two factors might look familiar…
$$\hat{T} = \operatorname*{argmax}_{T} \; \text{fluency}(T) \cdot \text{faithfulness}(T, S)$$
Yup, it’s Bayes rule:
$$\hat{T} = \operatorname*{argmax}_{T} \; P(T) \, P(S \mid T)$$

More formally


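Assume we are translating from a foreign language
sentence F to an English sentence E:
 $F = f_1, f_2, f_3, \ldots, f_m$
We want to find the best English sentence
 $\hat{E} = e_1, e_2, e_3, \ldots, e_n$
 $\hat{E} = \operatorname*{argmax}_{E} P(E \mid F)$
 $\phantom{\hat{E}} = \operatorname*{argmax}_{E} P(F \mid E) \, P(E) / P(F)$
 $\phantom{\hat{E}} = \operatorname*{argmax}_{E} P(F \mid E) \, P(E)$
P(F) can be dropped because it is the same for every candidate E.
$P(F \mid E)$ is the Translation Model
$P(E)$ is the Language Model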
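
As a rough illustration of the argmax above, the sketch below picks the best candidate from a small fixed list by adding log translation-model and log language-model scores. The candidate sentences and every probability value are invented for this example; a real decoder searches a far larger space and gets its scores from trained models.

```python
import math


def best_translation(candidates, translation_model, language_model):
    """Return argmax over E of P(F|E) * P(E), computed in log space."""
    return max(
        candidates,
        key=lambda e: math.log(translation_model[e]) + math.log(language_model[e]),
    )


# Hypothetical scores for a single fixed foreign sentence F.
TM = {"that pleases me": 0.30, "I like it": 0.20, "me that pleases": 0.30}    # P(F|E)
LM = {"that pleases me": 0.002, "I like it": 0.010, "me that pleases": 0.00001}  # P(E)

print(best_translation(list(TM), TM, LM))
```

With these made-up numbers the language model outweighs the slightly lower translation-model score of "I like it", and it rules out the scrambled candidate even though its translation-model score is as high as any.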
The noisy channel model for MT
Fluency: P(T)





How to measure that this sentence
 That car was almost crash onto me
is less fluent than this one:
 That car almost hit me.
Answer: language models (N-grams!)
 For example P(hit|almost) > P(was|almost)
But can use any other more sophisticated model of
grammar
Advantage: this is monolingual knowledge!
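
A bigram language model of this kind can be sketched as follows. The four-sentence training corpus is made up, and add-one smoothing is an assumption added here so that unseen bigrams do not get probability zero; neither comes from the lecture.

```python
from collections import Counter

# Tiny made-up monolingual corpus (illustration only).
CORPUS = [
    "that car almost hit me",
    "the ball almost hit me",
    "that car was red",
    "he almost hit the car",
]


def train_bigram(corpus):
    """Count bigram contexts and bigrams, with sentence boundary markers."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)


def bigram_prob(word, prev, model):
    """P(word | prev) with add-one smoothing."""
    contexts, bigrams, vocab_size = model
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)


def fluency(sentence, model):
    """Approximate P(T) as a product of bigram probabilities."""
    tokens = ["<s>"] + sentence.lower().rstrip(".").split() + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= bigram_prob(word, prev, model)
    return p


MODEL = train_bigram(CORPUS)
print(fluency("That car almost hit me.", MODEL))
print(fluency("That car was almost crash onto me", MODEL))
```

Only English text is needed to train this model, which is the point of the last bullet above.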
Faithfulness: P(S|T)




French: ça me plait [that me pleases]
English:
 that pleases me - most fluent
 I like it
 I’ll take that one
How to quantify this?
Intuition: degree to which words in one sentence are
plausible translations of words in other sentence
 Product of probabilities that each word in target
sentence would generate each word in source
sentence.
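
One way to turn this intuition into a number is a simplified score in the style of IBM Model 1: for each source word, average the word-translation probabilities over all target words, then multiply across source words. This is a sketch under assumptions: it omits the NULL word and the length term of the real model, and every probability in the table is invented.

```python
# Hypothetical word-translation probabilities t(source_word | target_word).
T_TABLE = {
    ("ça", "that"): 0.7,
    ("ça", "it"): 0.3,
    ("me", "me"): 0.8,
    ("me", "I"): 0.2,
    ("plait", "pleases"): 0.6,
    ("plait", "like"): 0.3,
}


def faithfulness(source_words, target_words, t_table):
    """Product over source words of the average t(s | t) over target words."""
    score = 1.0
    for s in source_words:
        total = sum(t_table.get((s, tw), 0.0) for tw in target_words)
        score *= total / len(target_words)
    return score


SOURCE = ["ça", "me", "plait"]
print(faithfulness(SOURCE, ["that", "pleases", "me"], T_TABLE))
print(faithfulness(SOURCE, ["I", "like", "it"], T_TABLE))
```

The score ignores target word order entirely, so any reordering of the same target words receives the same value.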
Faithfulness P(S|T)



Need to know, for every target language word,
probability of it mapping to every source language word.
How do we learn these probabilities?
Parallel texts!
 Lots of times we have two texts that are translations
of each other
 If we knew which word in Source Text mapped to
each word in Target Text, we could just count!
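
If the word alignment were given, "just counting" amounts to relative-frequency estimation. The sketch below assumes the aligned corpus has already been flattened into a list of (source word, target word) links; the example links and the function name are hypothetical.

```python
from collections import Counter


def estimate_translation_probs(aligned_links):
    """Estimate P(source_word | target_word) by relative frequency.

    aligned_links: list of (source_word, target_word) alignment links.
    """
    pair_counts = Counter(aligned_links)
    target_counts = Counter(target for _, target in aligned_links)
    return {
        (source, target): count / target_counts[target]
        for (source, target), count in pair_counts.items()
    }


# Made-up alignment links from a French/English parallel text.
LINKS = [
    ("maison", "house"),
    ("maison", "house"),
    ("domicile", "house"),
    ("bleu", "blue"),
]

print(estimate_translation_probs(LINKS))
```

In practice the links are not observed, which is why word alignment has to be inferred from the parallel text first.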
Faithfulness P(S|T)


Sentence alignment:
 Figuring out which source language sentence maps
to which target language sentence
Word alignment
 Figuring out which source language word maps to
which target language word
Big Point about Faithfulness and
Fluency



Job of the faithfulness model P(S|T) is just to model
“bag of words”; which words come from say English to
Spanish.
P(S|T) doesn’t have to worry about internal facts about
Spanish word order: that’s the job of P(T)
P(T) can do Bag generation: put the following words in
order (from Kevin Knight)
 have programming a seen never I language better
 actual the hashing is since not collision-free usually
the is less perfectly the of somewhat capacity table
P(T) and bag generation:
the answer

“Usually the actual capacity of the table is somewhat
less, since the hashing is not collision-free”

How about:
 loves Mary John
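
Bag generation can be sketched by scoring every ordering of the bag with a bigram model and keeping the best one. The bigram table below is invented and deliberately symmetric between "John" and "Mary"; brute-force enumeration is only workable for very small bags.

```python
from itertools import permutations

# Hypothetical bigram scores, symmetric between "John" and "Mary".
BIGRAMS = {
    ("<s>", "John"): 0.5, ("<s>", "Mary"): 0.5,
    ("John", "loves"): 0.4, ("Mary", "loves"): 0.4,
    ("loves", "John"): 0.3, ("loves", "Mary"): 0.3,
    ("John", "</s>"): 0.5, ("Mary", "</s>"): 0.5,
}


def sequence_score(order, bigrams, unseen=1e-6):
    """Product of bigram scores over the sequence; unseen bigrams get a tiny value."""
    tokens = ("<s>",) + tuple(order) + ("</s>",)
    score = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        score *= bigrams.get((prev, word), unseen)
    return score


def best_order(bag, bigrams):
    """Try every ordering of the bag and keep the highest-scoring one."""
    return max(permutations(bag), key=lambda order: sequence_score(order, bigrams))


print(best_order(["loves", "Mary", "John"], BIGRAMS))
print(sequence_score(("John", "loves", "Mary"), BIGRAMS))
print(sequence_score(("Mary", "loves", "John"), BIGRAMS))
```

With this table "John loves Mary" and "Mary loves John" get the same score, so P(T) alone cannot choose between them; deciding who loves whom needs information from the source sentence.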
Summary





Intro and a little history
Language Similarities and Divergences
Three classic MT Approaches
 Transfer
 Interlingua
 Direct
Modern Statistical MT
Evaluation