Web-Based Machine Translation
Andy Way
School of Computing
Email: [email protected]
URL: www.computing.dcu.ie/~away
Room: L245
Phone: (700)5644
Plan of Attack (1)
• What is MT?
• Why do we do it? How much is it used?
How much more could it be used?
• Is it any good? What exactly is it good for?
What is it not good for?
• What MT methods are there?
• Do on-line MT systems translate word-for-word? How might we be able to tell?
Plan of Attack (2)
• Do pairs of on-line MT systems work the
same in both directions?
• How can we help these MT systems help us?
• The Future (?!)
• Further Reading/More Information
What is MT?
MAHT — Machine-Aided Human Translation (on-line dictionaries, termbanks, TM etc …)
HAMT — Human-Aided Machine Translation (resolving ambiguity etc …)
Why do we do MT?
• To communicate in languages other than the
ones we know …
• (If we’re a company) To increase/maintain
market share
• To speed up the translation process
• etc etc ...
How much is it used?
• In 2000, MT specialist Scott Bennett said “Altavista's
BabelFish ... initiated in late 1997, is now used a million
times per day”.
• In 2001, Softissimo announced that "the Internet translation
request volume processed by its Reverso translation engine
(www.reverso.net) has now reached several million
translation requests (of Web pages, e-mail, short texts and
results of search engine requests) per month on its main
translation portal and the portals of its Internet partners."
• V.d. Meer (2003): "Every day, portals like Altavista and
Google process nearly 10 million requests for automatic
translation."
How much more could it be used?
• The volume of text requiring translation
currently exceeds translators’ capacity
(demand outstrips supply). This imbalance
will only get worse, cf. the accession of new
Member States to the EU.
• NB, also the Official Languages Act 2003
Solution: automation (the only solution).
How much more could it be used?
• The translation and localisation industry has focused
on product documentation, which represents
probably less than 20% of all text-based
information repositories that need to be localised
• time pressure: five times the current volume of text
will need to be translated in practically no time.
Corporate decision makers will have to begin
supporting multilingual communication initiatives
and strategies.
How much more could it be used?
• GIL market growing from $4.2 billion in 2001 to
$8.9 billion in 2006, an annual growth rate of
16.3%. Localisation and translation services form
by far the largest part of this market with 69.8% of
the total, i.e. $2.9 billion in 2001 and $5.8 billion
in 2006, an annual growth rate of 14.6%.
• Crosslingual applications are expected to grow
from less than 1% of the total market in 2001
($42 million) to $193 million in 2006, a 35% annual
growth rate.
Is MT any good? (1)
It depends … on what you want to use it for and
how you use it!!
Is MT any good? (2)
• No pre-editing → Lots of post-editing!
• Lots of pre-editing → No(t much) post-editing!
Is MT any good? (3)
• Sometimes no pre-editing is required:
– for gisting;
– for company-internal circulation;
– etc etc …
• What it’s not good for is literary translation;
i.e. MT won’t take translators’ jobs, but will free
them up for new (more interesting) tasks
and create new niche markets
MT Developers
• So MT is of use, and will become used
much more than it is currently, so …
• … we need people out there who can
improve current systems and develop new ones …
→ let’s look at how people currently “design”
MT systems …
MT Methods
• Rule-Based MT: Transfer, Interlingua
• Data-Driven MT
[Diagram: the Vauquois Pyramid for MT, relating these approaches by depth of analysis]
Examples of MT methods: Transfer
English is SVO, Irish is VSO, Japanese is SOV,
so translation between them is complicated by
differences in word order.
But at a ‘deeper’ level, the languages are more
similar ...
Transfer (cont’d)
e.g. John saw Mary → Chonaic Seán Máire
At the ‘deeper’ level: feic Seán Máire
Examples of MT methods: Transfer
e.g. John likes Mary → Marie plaît à Jean
Rule: like(A1,A2) → plaire(A2’,A1’),
i.e. the arguments are switched (sketched below).
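A minimal Prolog sketch of this transfer rule (transfer/2 and the lexical facts are invented names for illustration, in the style of the word-for-word fragment later in these slides):

transfer(like(A1, A2), plaire(B2, B1)) :-   % switch the arguments, as the rule says
    transfer(A1, B1),
    transfer(A2, B2).
transfer(john, jean).                       % invented lexical transfer facts
transfer(mary, marie).

A query transfer(like(john, mary), F) then yields F = plaire(marie, jean).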
Examples of MT methods: Interlingua
John likes Mary → Marie plaît à Jean
Examples of MT methods: EBMT
Data-driven, compiles probabilities for
translations … Needs:
• bilingual aligned corpora;
• find best match(es) of $_source;
• establish translational equivalents;
• recombine to generate $_target.
EBMT - translation chunks
• Sentence aligned:
The man swims → L’homme nage.
The woman laughs → La femme rit.
• Sub-sententially aligned:
the man → L’homme, swims → nage, the → l’, man → homme, the → la, woman → femme, laughs → rit ...
EBMT: deriving translations
Let’s now translate The man laughs …
Best matches:
• the man → L’homme
• laughs → rit
Combined together, we get: L’homme rit
Great, but can you see any problems?! We can fix
these by looking on the Web … (see the sketch below)
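Here is a minimal Prolog sketch of the matching and recombination steps (chunk/2 and ebmt/2 are invented names; the chunk memory just records the alignments above):

chunk([the, man],   ['L''homme']).   % chunk memory from the alignments above
chunk([the, woman], ['La femme']).
chunk([swims],      [nage]).
chunk([laughs],     [rit]).

ebmt([], []).
ebmt(Source, Target) :-              % cover the input with known chunks,
    append(Chunk, Rest, Source),     % left to right, and concatenate
    Chunk \= [],                     % their translations
    chunk(Chunk, French),
    ebmt(Rest, FrenchRest),
    append(French, FrenchRest, Target).

The query ebmt([the, man, laughs], T) then gives T = ['L''homme', rit], i.e. L’homme rit.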
Web Validation of Translations
Input string:
the personal computers
Chunks retrieved:
personal computers → ordinateurs personnels
the → le / la / l’ / les
Via Altavista, we get:
1. Les ordinateurs personnels: 980 hits
2. L’ ordinateurs personnels: 0 hits
3. La ordinateurs personnels: 0 hits
4. Le ordinateurs personnels: 0 hits
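A sketch of how these hit counts might be used to choose between the candidates (hits/2 simply hard-codes the Altavista counts above; a real system would query a search engine instead):

hits('Les ordinateurs personnels', 980).
hits('L'' ordinateurs personnels', 0).
hits('La ordinateurs personnels', 0).
hits('Le ordinateurs personnels', 0).

best(Best) :-
    findall(N-S, hits(S, N), Pairs),  % pair each candidate with its hit count
    keysort(Pairs, Ascending),        % sort by hit count, lowest first
    last(Ascending, _-Best).          % keep the candidate with the most hits

best(B) then returns B = 'Les ordinateurs personnels', the only validated string.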
Examples of MT methods: SMT
• bilingual aligned corpora;
• statistical models of languages and of translation.
Works by assuming that French is like English
in a noisy channel, i.e. in code!
cf. Speech Processing models!
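In the standard noisy-channel formulation (as in speech recognition), translating a French sentence f means picking the English sentence e that maximises:

\hat{e} = \arg\max_e P(e \mid f) = \arg\max_e P(e)\, P(f \mid e)

where P(e) is a monolingual language model and P(f|e) is a translation model, both estimated from the bilingual aligned corpora above.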
Examples of MT methods: Hybridity
Rule-based Methods:
• generate good translations (if they work!);
• encode rule-governed phenomena.
Examples of MT methods: Hybridity
Statistical Methods:
• are robust;
• can get a lot right automatically;
• don’t need specialised linguistic knowledge
of source, target, and how they relate to one another.
So let’s choose the best bits from each ...
Do MT systems translate word-for-word?
translate([], []).
translate([Head1|Tail1], [Head2|Tail2]) :-
    biling_lex(Head1, Head2),   % look each word up in a bilingual lexicon
    translate(Tail1, Tail2).    % then translate the rest of the sentence
etc etc ….
Well, the MT systems we’re using are a black
box (as opposed to a glass box), so we can’t
look at the rules to tell definitively …
Translating word-for-word
How can we tell then?
Compare the input and the output for a suite
of test sentences and try and work out
what’s going on …
Translating word-for-word
If on-line MT systems did translate word-for-word:
– they would pick the most likely translation of each word
each time (i.e. no translational variation ever);
– we could build up the translation of the
sentence compositionally.
• Let’s see if this is what happens by looking
at some real systems ...
Translating word-for-word
Let’s translate We have just finished reading
this book → French
Word-for word we get (from Babelfish):
we:nous, have:ayez, just:juste, finished:fini,
reading:lecture, this:ceci, book:livre
Model 0 Translation: Nous ayez juste fini
lecture ceci livre - hopeless!
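This is just what the earlier translate/2 fragment would produce, given biling_lex/2 facts recording Babelfish’s word choices (a sketch; the facts below simply restate the list above):

biling_lex(we, nous).
biling_lex(have, ayez).
biling_lex(just, juste).
biling_lex(finished, fini).
biling_lex(reading, lecture).
biling_lex(this, ceci).
biling_lex(book, livre).

The query translate([we, have, just, finished, reading, this, book], T) then returns exactly the Model 0 translation.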
Translating word-for-word
Let’s give the MT system larger chunks:
we have:nous avons, just finished reading:
lecture finie just, this book:ce livre
have just finished reading: ont juste fini la lecture
have just … this book: ont juste … ce livre
Translating word-for-word
Typing in the whole sentence, we get:
nous avons juste fini de lire ce livre, not bad!
Capitalizing the ‘we’ and adding a full stop
makes no difference to the translation here.
Oracle translation: nous venons de finir de
lire ce livre, so you can see Babelfish hasn’t
done too badly here ...
Translating word-for-word
Let’s try another sentence, The thief was
kicking the policeman
Word-for-word we get (from Reverso):
the:le, thief:Voleur, was:Était, kicking:coup de
pied, policeman:policier
Model 0 Translation: le Voleur Était coup de
pied le Policier, not very good!
Translating word-for-word
Building the translation up compositionally:
the thief:Le voleur,
was kicking:Donnait un coup de pied,
the policeman:Le policier
Final translation: Le voleur donnait un coup
de pied le policier, pretty good!
ENFR = FR EN?!
• That is, do both components use the same
rules and dictionaries?
• Are the translation components reversible?
• Are the structural and lexical rules reversible?
Only one way to find out … let’s see!
ENFR = FR EN?!
For our 2 strings, we get:
Babelfish: Nous venons de finir de lire ce livre
Reverso: Nous venons de finir de lire ce livre
--------------------------------------------------------------
Reverso: Le voleur donnait un coup de pied au policier
Babelfish: Le voleur donnait un coup de pied le policier
ENFR = FR EN?!
Let’s see the pairwise translations. Babelfish:
We have just finished reading this book →
Nous avons juste fini de lire ce livre
Nous venons de finir de lire ce livre →
We have just finished reading this book
ENFR = FR EN?!
Babelfish, 2nd sentence pair:
The thief was kicking the policeman → Le
voleur donnait un coup de pied le policier
Le voleur donnait un coup de pied au policier
→ The robber gave a kick to the police
ENFR = FR EN?!
Reverso, 1st sentence pair:
We have just finished reading this book →
Nous venons de finir de lire ce livre
Nous venons de finir de lire ce livre →
We have just stopped reading this book
ENFR = FR EN?!
Reverso, 2nd sentence pair:
The thief was kicking the policeman → Le
voleur donnait un coup de pied au policier
Le voleur donnait un coup de pied au policier
→ The thief kicked the policeman
How can we help MT Systems help us?
• These on-line MT systems are general-purpose
systems. Generally, the problems
are so great that we will never achieve
FAHQMT (fully automatic high-quality MT)
for such unrestricted language …
• But, we have more chance of success if we
restrict the sorts of texts with which we
confront our MT systems ...
How to restrict MT Input?
• By constraining subject domain: construct
sublanguage MT systems, e.g. Météo
• By constraining the language used, i.e. by
using controlled languages
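As a toy illustration of the controlled-language idea (all the limits and words here are invented), a checker might simply reject sentences that are too long or that contain words known to translate badly:

controlled(Sentence) :-
    length(Sentence, N),
    N =< 20,                      % invented length limit
    \+ (member(Word, Sentence),   % no word from the 'avoid' list
        avoid(Word)).

avoid(kicking).                   % invented examples of hard-to-translate words
avoid(like).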
How can we help MT Systems help us?
• Update dictionaries/glossaries/rules to the
domain/text type we need to translate!
The Future?
• More of us will use MT, for more things …
• It’ll become (almost as) widely used as web
browsers …
• Speech to Speech Translation …
• MT for specific websites, documents etc ...
→ we need people like you to get interested in
MT and improve/develop systems!!
Further Reading/More Information
In the first instance, go to:
I’ll add more specific pointers suitable for 1st
year students soon.