Web-Based Machine Translation
Andy Way
School of Computing
Email: firstname.lastname@example.org
URL: www.computing.dcu.ie/~away
Room: L245
Phone: (700)5644

Plan of Attack (1)
• What is MT?
• Why do we do it? How much is it used? How much more could it be used?
• Is it any good? What exactly is it good for? What is it not good for?
• What MT methods are there?
• Do on-line MT systems translate word-for-word? How might we be able to tell?

Plan of Attack (2)
• Do pairs of on-line MT systems work the same in both directions?
• How can we help these MT systems help us?
• The Future (?!)
• Further Reading/More Information

What is MT?
• MT = FAHQMT (fully automatic high-quality MT)
• MAHT (machine-aided human translation): on-line dictionaries, termbanks, TM etc …
• CAT (computer-aided translation)
• HAMT (human-aided MT): resolving ambiguity etc …

Why do we do MT?
• To communicate in languages other than the ones we know …
• (If we’re a company) To increase/maintain market share
• To speed up the translation process
• etc etc ...

How much is it used?
• In 2000, MT specialist Scott Bennett said “Altavista's BabelFish ... initiated in late 1997, is now used a million times per day”.
• In 2001, Softissimo announced that the Internet translation request volume processed by its Reverso translation engine (www.reverso.net) had reached several million translation requests (of Web pages, e-mail, short texts and results of search engine requests) per month on its mail translation portal and the portals of its Internet partners.
• V.d. Meer (2003): "Every day, portals like Altavista and Google process nearly 10 million requests for automatic translation."

How much more could it be used?
• The volume of text that needs to be translated currently exceeds translators’ capacity (demand outstrips supply). This imbalance will only get worse, cf. the accession of new Member States to the EU.
• NB, also the Official Languages Act 2003.
Solution: automation (the only solution).

How much more could it be used?
• The translation and localisation industry has focussed on product documentation, which probably represents less than 20% of all text-based information repositories that need to be localised.
• Time: five times the volume of text needs to be translated in practically no time. Corporate decision makers will have to begin supporting multilingual communication initiatives and strategies.

How much more could it be used?
• The GIL market is growing from $4.2 billion in 2001 to $8.9 billion in 2006, an annual growth rate of 16.3%. Localisation and translation services form by far the largest part of this market with 69.8% of the total, i.e. $2.9 billion in 2001 and $5.8 billion in 2006, an annual growth rate of 14.6%.
• Crosslingual applications are expected to grow from less than 1% of the total market in 2001 ($42 million) to $193 million in 2006, i.e. 35% annual growth.

Is MT any good? (1)
Depends … on what you want to use it for and how you use it!!
[Diagram: Input → MT → Output; Cost]

Is MT any good? (2)
• No pre-editing → lots of post-editing!
• Lots of pre-editing → no(t much) post-editing!
GARBAGE IN, GARBAGE OUT!!!

Is MT any good? (3)
• Sometimes no pre-editing is required:
– for gisting;
– for company-internal circulation;
– etc etc …
• What it’s not good for is literary translation, i.e. MT won’t take translators’ jobs: it will free them up for new (more interesting) tasks and create new niche markets.

MT Developers
• So MT is of use, and will be used much more than it is currently, so …
• … we need people out there who can improve current systems and develop new ones.
Let’s look at how people currently “design” MT systems …

MT Methods
• Rule-Based MT
– Transfer
– Interlingua
• Data-Driven MT
– EBMT
– SMT

The Vauquois Pyramid for MT
[Diagram: Interlingua at the apex; Analysis up the source-language side; Generation down the target-language side; Transfer across the middle; Direct translation along the base.]

Examples of MT methods: Transfer
English is SVO, Irish is VSO, Japanese is SOV, so translation between them is complicated by differences in word order. But at a ‘deeper’ level, the languages are more similar ...
Transfer (cont’d)
e.g. John saw Mary ↔ Chonaic Seán Máire
English: S [HEAD see, SUBJ John, OBJ Mary]
Irish: S [GOV feic, SUBJ Seán, OBJ Máire]

Examples of MT methods: Transfer
e.g. John (SUBJ) likes Mary (OBJ) ↔ Marie (SUBJ) plaît à Jean (IOBJ)
Rule: like(A1,A2) ↔ plaire(A2’,A1’), i.e. the arguments are switched.

Examples of MT methods: Interlingua
John likes Mary ↔ Marie plaît à Jean
Shared representation:
• lex=like/plaire
• sem=Experiencer, lex=John/Jean
• sem=Patient, lex=Mary/Marie

Examples of MT methods: EBMT
Data-driven, compiles probabilities for translations …
Needs:
• bilingual aligned corpora;
• find best match(es) of the source string;
• establish translational equivalents;
• recombine to generate the target string.

EBMT - translation chunks
• Sentence aligned:
– The man swims ↔ L’homme nage.
– The woman laughs ↔ La femme rit.
• Sub-sententially aligned:
– the man ↔ L’homme, swims ↔ nage, the ↔ l’, man ↔ homme, the ↔ la, woman ↔ femme, laughs ↔ rit ...

EBMT: deriving translations
Let’s now translate The man laughs …
Best matches:
• the man ↔ L’homme
• laughs ↔ rit
Combined together, we get: L’homme rit
Great, but can you see any problems?! We can fix these by looking on the Web …

Web Validation of Translations
Input string: the personal computers
Chunks retrieved:
• personal computers ↔ ordinateurs personnels
• the ↔ le / la / l’ / les
Via Altavista, we get:
1. Les ordinateurs personnels: 980 hits
2. L’ ordinateurs personnels: 0 hits
3. La ordinateurs personnels: 0 hits
4. Le ordinateurs personnels: 0 hits

Examples of MT methods: SMT
Needs:
• bilingual aligned corpora;
• statistical models of languages and translation.
Works by treating French as English that has passed through a noisy channel, i.e. as English in code! Cf. speech-processing models!

Examples of MT methods: Hybridity
Rule-based Methods:
• generate good translations (if they work!);
• encode rule-based phenomena:
sent(Num) --> nounphrase(Num), verbphrase(Num).
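The agreement rule on the Hybridity slide can be tried out as a small definite clause grammar. This is only a sketch: the nounphrase, verbphrase, det, noun and verb rules and the toy lexicon are assumptions added for illustration and do not come from the slides; only the sent(Num) rule does.

```prolog
% From the slide: a sentence is a noun phrase followed by a verb phrase,
% and both must carry the same number feature Num.
sent(Num) --> nounphrase(Num), verbphrase(Num).

% Hypothetical expansion rules (not in the slides).
nounphrase(Num) --> det(Num), noun(Num).
verbphrase(Num) --> verb(Num).

% Hypothetical toy lexicon (not in the slides), marked for number.
det(_)         --> [the].
noun(singular) --> [man].
noun(plural)   --> [men].
verb(singular) --> [swims].
verb(plural)   --> [swim].
```

With these clauses loaded, the query phrase(sent(Num), [the, man, swims]) succeeds with Num = singular, whereas phrase(sent(_), [the, men, swims]) fails because the subject and verb disagree in number. Sharing the single variable Num across the rule is what enforces the agreement.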
Examples of MT methods: Hybridity
Statistical Methods:
• are robust;
• can get a lot right automatically;
• don’t need specialised linguistic knowledge of source, target, and how they relate to one another.
So let’s choose the best bits from each ...

Do MT systems translate word-for-word?
% Base case: an empty sentence translates to an empty sentence.
translate([], []).
% Translate the first word via the bilingual lexicon, then the rest.
translate([Head1|Tail1], [Head2|Tail2]) :-
    biling_lex(Head1, Head2),
    translate(Tail1, Tail2).
biling_lex(john, jean).
biling_lex(swims, nage).
etc etc ….
Well, the MT systems we’re using are black boxes (as opposed to glass boxes), so we can’t look at the rules to tell definitively …

Translating word-for-word
How can we tell then? Compare the input and the output for a suite of test sentences and try to work out what’s going on …

Translating word-for-word
If on-line MT systems did translate word-for-word:
– they would pick the most likely translation of each word each time (i.e. no translational variation ever);
– we could build up the translation of the sentence compositionally.
• Let’s see if this is what happens by looking at some real systems ...

Translating word-for-word
Let’s translate We have just finished reading this book → French
Word-for-word we get (from Babelfish):
we → nous, have → ayez, just → juste, finished → fini, reading → lecture, this → ceci, book → livre
Model 0 Translation: Nous ayez juste fini lecture ceci livre, hopeless!

Translating word-for-word
Let’s give the MT system larger chunks:
• we have → nous avons
• just finished reading → lecture finie just
• this book → ce livre
• have just finished reading → ont juste fini la lecture
• have just … this book → ont juste … ce livre

Translating word-for-word
Typing in the whole sentence, we get: nous avons juste fini de lire ce livre, not bad!
Capitalizing the ‘we’ and adding a full stop makes no difference to the translation here.
Oracle translation: nous venons de finir de lire ce livre, so you can see Babelfish hasn’t done too badly here ...
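The “Model 0” experiment above can be reproduced with a complete version of the translate predicate. This is a minimal sketch, assuming the lexicon contains exactly the single-word outputs that the slides report from Babelfish; it says nothing about how Babelfish itself works.

```prolog
% Bilingual lexicon: the single-word translations reported on the slide.
biling_lex(we, nous).
biling_lex(have, ayez).
biling_lex(just, juste).
biling_lex(finished, fini).
biling_lex(reading, lecture).
biling_lex(this, ceci).
biling_lex(book, livre).

% translate(+Source, -Target): translate a list of words one at a time.
% Base case: nothing left to translate.
translate([], []).
% Recursive case: look up the first word, then translate the remainder.
translate([Head1|Tail1], [Head2|Tail2]) :-
    biling_lex(Head1, Head2),
    translate(Tail1, Tail2).
```

The query translate([we, have, just, finished, reading, this, book], French) binds French to [nous, ayez, juste, fini, lecture, ceci, livre], i.e. the hopeless Model 0 string. Each word is translated in isolation, so the program cannot choose avons over ayez or ce over ceci from context, and a word missing from the lexicon makes the whole query fail.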
Translating word-for-word
Let’s try another sentence, The thief was kicking the policeman
Word-for-word we get (from Reverso):
the → le, thief → Voleur, was → Était, kicking → coup de pied, policeman → policier
Model 0 Translation: le Voleur Était coup de pied le Policier, not very good!

Translating word-for-word
Building the translation up compositionally:
the thief → Le voleur, was kicking → Donnait un coup de pied, the policeman → Le policier
Final translation: Le voleur donnait un coup de pied le policier, pretty good!

EN→FR = FR→EN?!
• That is, do both components use the same rules and dictionaries?
• Are the translation components reversible?
• Are the structural and lexical rules bidirectional?
Only one way to find out … let’s see!

EN→FR = FR→EN?!
For our 2 strings, we get:
Babelfish: Nous venons de finir de lire ce livre
Reverso: Nous venons de finir de lire ce livre

Reverso: Le voleur donnait un coup de pied au policier
Babelfish: Le voleur donnait un coup de pied le policier

EN→FR = FR→EN?!
Let’s see the pairwise translations.
Babelfish, 1st sentence pair:
We have just finished reading this book → Nous avons juste fini de lire ce livre
Nous venons de finir de lire ce livre → We have just finished reading this book
Aha!

EN→FR = FR→EN?!
Babelfish, 2nd sentence pair:
The thief was kicking the policeman → Le voleur donnait un coup de pied le policier
Le voleur donnait un coup de pied au policier → The robber gave a kick to the police officer
Aha!

EN→FR = FR→EN?!
Reverso, 1st sentence pair:
We have just finished reading this book → Nous venons de finir de lire ce livre
Nous venons de finir de lire ce livre → We have just stopped reading this book
Aha!

EN→FR = FR→EN?!
Reverso, 2nd sentence pair:
The thief was kicking the policeman → Le voleur donnait un coup de pied au policier
Le voleur donnait un coup de pied au policier → The thief kicked the policeman
Aha!

How can we help MT Systems help us?
• These on-line MT systems are general-purpose systems.
Generally, the problems are so great that we will never achieve FAHQMT for such unrestricted language …
• But we have more chance of success if we restrict the sorts of texts with which we confront our MT systems ...

How to restrict MT Input?
• By constraining the subject domain: construct sublanguage MT systems, e.g. Météo
• By constraining the language used, i.e. by using controlled languages

How can we help MT Systems help us?
• Update dictionaries/glossaries/rules to the domain/text type we need to translate!
[Diagram: Savings, Time, Customisation]

The Future?
• More of us will use MT, for more things …
• It’ll become (almost as) widely used as web browsers …
• Speech-to-Speech Translation …
• MT for specific websites, documents etc ...
We need people like you to get interested in MT and improve/develop systems!!

Further Reading/More Information
In the first instance, go to: http://www.computing.dcu.ie/~away/MT/mt.html
I’ll add more specific pointers suitable for 1st year students soon.