OT12 online seminar: Translation Memory Tools
Paul Filkin, Director of Client Communities, SDL Language Technologies

The Agenda… or things we’ll cover
• how Trados was developed and established itself as industry leader
• how translation memory tools work
• what their benefits for open (and professional) translators are
• what the particular distinguishing features of SDL Trados Studio are
• what the future is for translation memory software

SDL Trados… a brief history

Translation Production
Content is either …
• Translated by a professional translator
• Or, by the “occasional” translator
  – Non-linguist, subject matter specialist (reviewer), crowd sourced, …
• Or, left untranslated
  – Not relevant, too costly, too much overhead involved, …
This presentation focuses on content produced by professional translators

Productivity Environments
• Today, content workers utilize specialized productivity environment(s)

  Content Worker               Application Class                  Prominent Example
  Graphic Designers            Graphic tools                      Adobe Photoshop
  Audio Producers, Musicians   DAW (Digital Audio Workstation)    Steinberg Cubase
  Architects                   3D modeling program                Google SketchUp
  Engineers                    CAD (Computer Aided Design)        Autodesk AutoCAD
  Game Developer               Game Engine                        Epic Games Unreal Engine
  Translators                  CAT (Computer Aided Translation)   SDL TRADOS TWB / SDL Studio

All mentioned trademarks are property of their respective owners.

Translation Editor is at the core of any CAT
Professional Translation can be done …
• In principle, in any authoring editor (desktop/browser)
  – However, with limited productivity (in the range 800-1500 words per day) and high effort in maintaining consistency and accuracy
• Using Microsoft Word + Plug-ins
  – Plug-in to translation productivity tool
  – Structured content is hard to deal with
• Using a Dedicated Translation Editor (CAT or TEnT)
  – Depending on various factors: productivity boost in the range 2000 to 5000 words per day
  – Well established market for professionals

What is CAT Technology?
• CAT: Computer-Aided Translation
  – A generic term used to describe software which assists users during the localization/translation process
  – Sometimes referred to as TEnT: Translation Environment Tool
• Our CAT technology is an integrated toolset, offering:
  – Translation Memory (TM)
  – Termbase
  – Editing environments
  – Project Management functionality
  – Software Localization
  – OpenExchange
Public ProZ poll, August 24: replies from 1670 translators (http://www.proz.com/polls/5474)

What is CAT Technology?
• CAT technology incorporates the concept of translation memory and termbase
• Translation memory: a database consisting of translation units
  – Translation unit: source and translated sentence or paragraph
  – During translation, the technology searches for exact or similar matches to the current source segment for translation
  – Matches found can be reused or edited
• Termbase: multilingual database consisting of term entries
  – Term entries: terms, synonyms, acronyms, etc.
  – Contextual data: definition, part of speech, gender, etc.
• Translators work with a translation memory and termbase to reuse previous translations and ensure consistency of terminology during translation

Translation Memory Overview
• A translation memory is a searchable database containing source and translated sentences or paragraphs
  – A segment or phrase needs to be translated only once, as each translation is stored in the database
  – During a translation project, when the source segment re-occurs, the translation memory remembers the translation (by searching the database) and inserts it into the new document
  – The translator may accept the previous translation or edit the translation, if necessary

Terminology Management Overview
• A termbase is a searchable database which contains a list of multilingual terms and contextual term data
  – Term data gives details about the origin and use of the term, such as definition, gender, context, etc.
  – The termbase can be used in monolingual form during source content creation
    • Ensure consistency of terminology in source documentation
    • Facilitate translation for the global marketplace
  – The termbase can be used in bilingual form in conjunction with translation memory technology to increase translation accuracy
    • Ensure consistency of terminology in translated documentation

Key Productivity Accelerators
• Topic Level (document, page, fragment, chunk, …)
  – Exclusion from translation through markup
  – “Perfect Matching” utilizing bi-lingual representations
• Segment Level (sentence, header, footnote, table cell, …)
  – Translation Memory
  – Automated Translation
  – Auto-propagation
• Subsegment Level (phrase, word, …)
  – Auto-suggest (dictionary based auto-completions)
  – Placeables, Terms
  – Concordance
• Impact on effective handling of update translations
• Impact on effective handling of new translations
• Impact on effective handling of document internal redundancies
• Impact on consistency & quality

Topic (Document, …) Level
“Don’t translate if it hasn’t changed” (but show it to provide context for the text that has actually changed or been added)
• Markup exclusions
  – Use ITS / other convention to lock text
  – Custom arrangements between CMS + Translation System
  – Significant productivity gains dependent on update frequency
• Perfect Matching
  – Compare text with predecessor translation project and lock what hasn’t changed
  – But, high overhead in managing corresponding projects

Segment Level: TM
“Don’t re-translate if you can reuse an (approved) existing translation” (but adapt as you need)
• Increasingly sophisticated match type differentiation
  – 100%, Fuzzies, Context Matches (CM), (ICE)
• Cascaded TMs, Ranking of TMs
• Significant productivity gains dependent on
  – Availability of relevant TMs
  – Similar content produced again and again

Segment Level: Automated Translations
“Adapt an automated translation proposal” (instead of translating from scratch)
• Increasingly accepted by professional translators
  – Especially using Statistical Machine Translation (SMT)
• Significant productivity gains depending on
  – SMT engine trained with sufficient, relevant (in-domain), high quality (professional translator output) data
  – Translators are able to dynamically select “in-domain” trained engine [e.g. “Touchpoints”]
  – Trust scores

Segment Level: Auto-propagation
“Auto-propagate translations for identical source segments” (and ripple through any changes when you change your translation)
• Productivity gain if text has internal repetitions
  – Simplifies updating identical segments throughout the content
• Requires parameters to control behavior

Subsegment Level: Auto-suggest
“While I type, provide a list of relevant candidates so that I can quickly auto-complete this part of my translation”
• Productivity gain highly dependent on available data-sources and proposal strategy
  – Optimal configurations reduce keystrokes by 30 to 50%
  – Avoidance of typos, impact on consistency

Subsegment Level: Placeables, terms
“While I type, make it easy for me to place tags, recognised terms and other placeables so I can focus on the translatable text.”
• Productivity gain highly dependent on available data-sources for terminology or translator diligence, and the complexity of the tags
  – Avoidance of typos, impact on consistency, robust target documents

Subsegment Level: Concordance
“Make it easy for me to search through Translation Memories, in either source or target text and from wherever I am in the document I’m translating”
• Biggest impact is in being able to find things you’ve translated before that are similar to, or the same as, the current text, and in making them easy to reuse
  – Impacts the quality of the work you deliver
  – Impacts the time it takes to find the right words for complicated texts

Key technology advances…
• Whereas the key technology advances are in the area of subsegment reuse and statistical machine translation (SMT), the actual productivity gains for a Professional Translator relate to the ergonomics of how systems allow users to interact with, control and automate the various data sources:
  – Access, creation, chaining, weighting and sharing of TMs
  – Access to SMT pointing to specific engines
  – Compilation of phrase dictionaries on the fly

What Happens When Teams Grow?
When teams of three or more work together, new factors must be considered to work effectively and properly collaborate
[Diagram labels: Project Managers, Reviewers, Translators]

Typical Package-based Workflows
[Diagram labels: Project Manager, Translator, Reviewer or Translator, Project Manager, Reviewer]

...x 5 languages...
[Diagram label: Project Manager]

Typical Project Workflow with SDL Studio GroupShare
1. Project Manager creates a project
  – Performs analysis, pre-translation using SDL Trados Studio connected to a TM on TM Server
2. Project Manager publishes project
  – Uses Publish command in Studio, selects server and location, and Studio takes care of the rest
  – Contacts team via email, phone
[Diagram label: Project Manager]

Typical Project Workflow with SDL Studio GroupShare
3. Team Accesses Project
  – Use Studio 2011 to open project
  – Check out files as required for translation, review, or signoff
    • Studio only gets files as needed
    • Project Server tracks file versions
  – Studio and Project Server synchronize metadata
[Diagram labels: Reviewer, Translator, Project Manager]

Looking forward…
• Current theme for CAT tools: reviewer productivity
  – Inclusion of track changes and commenting mechanisms in translation editor
• Automation in the broader production chain

… and the Studio “Platform” which includes the OpenExchange

The SDL OpenExchange… current state of affairs
• 57 Apps on the OpenExchange
• 42 are completely free
• 29,804 downloads (August 2012)
• 7,141 app users (August 2012)
• 396 developers (August 2012)

Copyright © 2008-2012 SDL plc. All rights reserved. All company names, brand names, trademarks, service marks, images and logos are the property of their respective owners.
This presentation and its content are SDL confidential unless otherwise specified, and may not be copied, used or distributed except as authorised by SDL.
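
Illustration: translation memory lookup
The lookup described under Translation Memory Overview (search the database for exact or similar matches to the current source segment, then reuse or edit the stored translation) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not how SDL Trados Studio computes matches: the `TM` dictionary, the `lookup` function, the 75% threshold and the use of `difflib.SequenceMatcher` as the similarity measure are all illustrative choices, and the sample segments are invented.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory translation memory: each entry is one translation
# unit, mapping a source segment to its stored translation.
TM = {
    "Click the Save button.": "Klicken Sie auf die Schaltfläche Speichern.",
    "Close the file.": "Schließen Sie die Datei.",
}


def lookup(segment, tm, threshold=0.75):
    """Return (match_percent, stored_source, stored_target) for the best
    match at or above the threshold, or None if nothing is close enough.

    Similarity here is character-based (difflib), which is an assumption
    made for this sketch; real CAT tools use their own scoring.
    """
    best = None
    for source, target in tm.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if best is None or score > best[0]:
            best = (score, source, target)
    if best is None or best[0] < threshold:
        return None
    return (round(best[0] * 100), best[1], best[2])


if __name__ == "__main__":
    # Identical source segment: a 100% match that can be reused as-is.
    print(lookup("Close the file.", TM))
    # Similar source segment: a fuzzy match for the translator to edit.
    print(lookup("Click the Cancel button.", TM))
    # No sufficiently similar segment: translate from scratch.
    print(lookup("Invoice 12345", TM))
```

An identical segment comes back as a 100% match, a similar one as a fuzzy match that the translator adapts, and anything below the threshold returns nothing, which corresponds to translating from scratch.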