Chris Pratley
Group Program Manager
Microsoft Word
Office Unicode history and strategy
Implementation
Benefits of Unicode to Office users
Demo of Word
Office97 Unicode Strategy
Office97 driving factors
– Customers operate world-wide (US only 40%)
– Need to handle multiple code pages in Europe
Office97 goals
– Enable loss-less file exchange world-wide
– Solve code page problems in Europe
– Development efficiency for Asian and Euro versions
• Unified source code base – but still different executables
• Unified development process
• Delta between language versions shrinks from 18 to 2 months
– Lay foundation for future
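The code page problem named in the driving factors can be shown in a few lines. This is a minimal sketch using Python's standard codecs, not anything taken from Office itself: the same byte means a different letter under each Windows code page, and a text that mixes Central and Western European letters fits in neither code page but does fit in Unicode. The helper `roundtrips` and the sample string are made up for the illustration.

```python
# One byte, three Windows code pages: its meaning depends on the code page in effect.
raw = bytes([0xE6])
for codepage in ("cp1252", "cp1250", "cp1251"):
    print(codepage, raw.decode(codepage))


def roundtrips(text: str, encoding: str) -> bool:
    """Return True if text survives an encode/decode round trip in encoding."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        return False


# Polish and Danish place names in one string (illustrative sample).
mixed = "Łódź and Ærø"
for encoding in ("cp1250", "cp1252", "utf-16-le"):
    print(encoding, roundtrips(mixed, encoding))
```

Only the Unicode encoding reports a successful round trip for the mixed string, which is the loss-less exchange the goals ask for.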
Office2000 Unicode Strategy
Office2000 goals
– Reduce Total Cost of Ownership for large corporations
• Single version to deploy and administer globally
• Configurable interface to handle local needs
– Language of User Interface can be changed
– Additional language features can be enabled as needed
– Emulate any localized version
• “Français”, “日本語”, “한글”, “عربي”, “עברית”, etc.
– Streamline development process further
• Core “US” team ships global product
• Integrate bi-directional version team (Arabic, Hebrew)
– Focus on needs of bilingual and multilingual users
Office “10” Unicode Strategy
Office10 goals
– Finish the globalization work begun in Office2000
• Extend functionality to all applications
• Integrate Complex Scripts support (Indic, Thai, Vietnamese)
– हिन्दी, தமிழில், ภาษาไทย, Việt
– Streamline development process further
• Single build process from start to finish
• Integrate complex scripts team
– Deepen Unicode support
• Unicode 3.0 languages (ᐃᓄᒃᑎᑐᑦ, አማርኛ, etc.)
• UTF-16 (esp. plane 2)
• More complex script and limited combining diacritic coverage
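For readers unfamiliar with the UTF-16 bullet above: code points beyond U+FFFF, including all of plane 2, are stored as a pair of 16-bit "surrogate" code units. The sketch below shows the standard arithmetic. The function names are hypothetical, and the result is cross-checked against Python's own UTF-16 codec.

```python
from typing import Tuple


def to_surrogates(code_point: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = 0x20000  # first code point of plane 2
high, low = to_surrogates(code_point)
print(hex(high), hex(low))

# Cross-check against the built-in codec.
encoded = chr(code_point).encode("utf-16-be")
assert encoded == high.to_bytes(2, "big") + low.to_bytes(2, "big")
assert from_surrogates(high, low) == code_point
```

An application that counts characters by 16-bit unit will see two units here, which is why editing, cursor movement, and layout code all need to be surrogate-aware.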
The Word Family Tree
[Figure: family tree of Word versions: US/Euro, JPN, KOR, CHT, CHS, Bi-Di, Thai/Indic (now w/ Indic); label: “A single …”]
Core applications are Unicode internally
– Word, Excel, PowerPoint (Office97)
– Access, Publisher (new in Office2000)
• Databases and drivers are Unicode
– Outlook, FrontPage (new in Office10)
• New Outlook local storage is Unicode
Difficulties encountered with Unicode
– Lack of full system support in Win9x
– Every app needed different solution
• MFC-based apps were hardest
– Missing system services (e.g. font-linking)
– Interoperation with code-page based systems
– Educating test team about Unicode
• Testing issues different vs. MBCS
• Lack of expertise in uncommon languages
Office shared code services
– Central Win32 Unicode text API “wrappers”
• Simulate nearly full support on Win9x
– ExtTextOutW and others
• Provide optional font-linked output
– Hardcode “preferred fonts” by script, style
– User-specified font-fallbacks via reg key (if any)
– Font categorization by script range (use MLANG.DLL)
• Font substituted if glyph not available
– Word modifies font settings in the document
– Other apps do only at display time
– Insert Symbol dialog (Unicode 3.0 support)
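The font-linking bullets above describe a lookup: classify a character by script range, keep the requested font if it covers that script, and otherwise substitute a fallback. The sketch below is one way such a lookup could work, not the Office implementation. Every table, font name, and function in it is hypothetical, it does not call MLANG.DLL, and the choice to try user-specified fallbacks before the hardcoded preferred font is an assumption.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical script ranges (inclusive code point bounds), for illustration only.
SCRIPT_RANGES: List[Tuple[int, int, str]] = [
    (0x0000, 0x024F, "latin"),
    (0x0590, 0x05FF, "hebrew"),
    (0x0E00, 0x0E7F, "thai"),
    (0x3040, 0x30FF, "kana"),
]

# Hypothetical fonts and the scripts each one has glyphs for.
FONT_COVERAGE: Dict[str, Set[str]] = {
    "ExampleSerif": {"latin", "hebrew"},
    "ExampleThai": {"latin", "thai"},
    "ExampleThaiAlt": {"thai"},
    "ExampleMincho": {"latin", "kana"},
}

# Hardcoded "preferred font" per script (hypothetical values).
PREFERRED_FONTS: Dict[str, str] = {
    "latin": "ExampleSerif",
    "hebrew": "ExampleSerif",
    "thai": "ExampleThai",
    "kana": "ExampleMincho",
}


def script_of(ch: str) -> Optional[str]:
    """Categorize a character by script range; None if the range is unknown."""
    code_point = ord(ch)
    for low, high, script in SCRIPT_RANGES:
        if low <= code_point <= high:
            return script
    return None


def covers(font: str, script: str) -> bool:
    return script in FONT_COVERAGE.get(font, set())


def choose_font(ch: str, requested: str,
                user_fallbacks: Optional[Dict[str, List[str]]] = None) -> str:
    """Keep the requested font if it covers the character, else substitute."""
    script = script_of(ch)
    if script is None or covers(requested, script):
        return requested
    # Assumption: user-specified fallbacks are tried before the hardcoded one.
    candidates = list((user_fallbacks or {}).get(script, []))
    if script in PREFERRED_FONTS:
        candidates.append(PREFERRED_FONTS[script])
    for font in candidates:
        if covers(font, script):
            return font
    return requested  # nothing better is known; the glyph may render as missing


def font_runs(text: str, requested: str,
              user_fallbacks: Optional[Dict[str, List[str]]] = None
              ) -> List[Tuple[str, str]]:
    """Split text into (font, substring) runs after substitution."""
    runs: List[Tuple[str, str]] = []
    for ch in text:
        font = choose_font(ch, requested, user_fallbacks)
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
    return runs


print(font_runs("abc \u0e01\u0e02", "ExampleSerif"))
```

In terms of the slide, an application could write the resulting runs back into the document's font settings, as Word does, or compute them only when drawing, as the other applications do.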
Office Users Benefit
Single binary world-wide
Shared world-wide file formats
Multilingual word/data processing
Unicode HTML
Unicode e-mail (HTML, RTF, plain)
Single Binary
Easier to deploy, administer
– One set-up image to install world-wide
– One set of service packs for all machines
All features available in all “versions”
– Still have local version packages
– Multilingual users can use “foreign” features
User Interface language is configurable
– Your language follows you when you travel
Major cost savings for customers
– Less testing of corporate solutions
– Lower internal tech support costs
Single File Format
Multinational corporations use Office
– Need to exchange documents company-wide
Office unified file formats via Unicode
– Word95 had 7 different file formats
– Word97 had 1 file format but no editing, layout for languages covered by other versions
– Word2000 adds editing, layout, and full-roundtrip
– Word10 adds full complex script support
Multilingual Usage
English Office10: input/display/edit/layout of
– European languages
• any similar left-right scripts if fonts/NLS available
– E.g. Canadian Syllabics (Inuktitut), Ethiopic, Cherokee
• Some combining diacritic support (African languages)
– East Asian languages (including UTF-16 “surrogates”)
• Chinese (Traditional and Simplified), Japanese, Korean
– Complex Script and Bi-directional scripts (need enabled system)
• Arabic (incl. Farsi, Urdu), Hebrew
• Thai
• Hindi, Tamil, Oriya, Telugu, Punjabi, Bengali, Gujarati, etc.
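The combining diacritic bullet above matters because some letters have no single precomposed code point. The sketch below uses Python's standard `unicodedata` module to show the difference. It illustrates the Unicode behavior only and says nothing about how Office stores or renders such text. The Yoruba letter is an example chosen here, not one taken from the slides.

```python
import unicodedata

# "e" followed by COMBINING ACUTE ACCENT composes to the single code point U+00E9.
simple = unicodedata.normalize("NFC", "e\u0301")
print(len(simple))

# "e" with acute and dot below, as used in Yoruba for instance, typed acute first.
# NFC reorders the marks and composes the dot below, but no precomposed code point
# exists for the full letter, so one combining mark always remains.
yoruba = unicodedata.normalize("NFC", "e\u0301\u0323")
print([f"U+{ord(ch):04X}" for ch in yoruba])


def combining_marks(text: str) -> list:
    """Return the characters in text that are combining marks."""
    return [ch for ch in text if unicodedata.combining(ch)]


print(len(combining_marks(yoruba)))
```

Display and editing code therefore has to position the remaining mark over the base letter and treat the pair as one unit for selection and deletion.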
Multilingual Usage
Most documents are monolingual
– Most users are bilingual
• Local language
• English
Optimize UI for using one, two or three languages
– Over 100 supported – rare usage
Detect 20+ languages while typing (Word)
– Automatically install and use the correct proofing tools
Plain text I/O in any encoding (Word, Excel)
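Plain text I/O in any encoding comes down to converting between a byte encoding on disk and Unicode in memory. The sketch below is a minimal illustration under stated assumptions, not Word's or Excel's logic: a byte order mark, when present, identifies the file as Unicode, and otherwise the caller supplies the encoding. The function names and sample strings are hypothetical.

```python
import codecs
import os
import tempfile
from typing import List, Optional

# Byte order marks that identify a Unicode encoding at the start of a file.
# UTF-32 is left out to keep the sketch short.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def sniff_bom(data: bytes) -> Optional[str]:
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def read_text(path: str, fallback_encoding: str) -> str:
    """Read a text file into a Unicode string, preferring a BOM if present."""
    with open(path, "rb") as handle:
        data = handle.read()
    return data.decode(sniff_bom(data) or fallback_encoding)


def write_text(path: str, text: str, encoding: str) -> None:
    """Write a Unicode string to disk in the requested encoding."""
    with open(path, "wb") as handle:
        handle.write(text.encode(encoding))


def demo() -> List[bool]:
    """Round-trip sample strings through legacy and Unicode encodings."""
    samples = [
        ("日本語のテキスト", "shift_jis", "shift_jis"),
        ("Ελληνικά", "cp1253", "cp1253"),
        # Written as UTF-16 with a BOM; the wrong fallback is ignored on read.
        ("日本語 and Ελληνικά", "utf-16", "cp1252"),
    ]
    results = []
    with tempfile.TemporaryDirectory() as folder:
        for index, (text, encoding, fallback) in enumerate(samples):
            path = os.path.join(folder, f"sample{index}.txt")
            write_text(path, text, encoding)
            results.append(read_text(path, fallback) == text)
    return results


print(demo())
```

A file without a BOM carries no label, so the fallback encoding has to come from the user or from a heuristic, which this sketch does not attempt.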
Multilingual Word Processing
Proofing tool interfaces are Unicode
– SDKs available for 3rd party development
Tools for over 35 languages available
– European languages, Japanese, Chinese, Korean, Arabic, Hebrew, Thai, Hindi…
– Spelling, Grammar, Hyphenation, Thesaurus
• Traditional/Simplified Chinese conversion
• Japanese character usage consistency checker
• Hangul/Hanja conversion
– Translation dictionaries (available offline)
– Automatic translation web services
Multilingual Data Processing
Access databases are Unicode
– Hook up to SQL7.x/2000 Unicode databases
Excel workbooks are Unicode
– Hook up to Unicode databases using OLE-DB
– Create Pivot lists and manipulate Unicode data
PowerPoint creates multilingual multimedia
– Web sites, animations
Web sites
URLs transmitted in UTF-8 (before the “?”)
FrontPage
– Create and edit web pages in Unicode
– WYSIWYG Web pages
– Save in full or “filtered” HTML
– Display Unicode 3.0 pages
Mail and PIM
– New local storage is Unicode
• Contacts, Calendar, Tasks etc.
– Display and message handling all Unicode
– Send/receive mail in any encoding
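The bullet on URLs transmitted in UTF-8 (before the “?”) can be illustrated with Python's standard `urllib.parse`. This is a sketch of the general technique, percent-encoding the UTF-8 bytes of the path, and not a description of what Office sends on the wire. The function name, host, and path are made up, the host is assumed to be ASCII, and the query string is passed through untouched because the slide only covers the part before the “?”.

```python
from urllib.parse import quote, unquote


def build_url(base: str, path: str, query: str = "") -> str:
    """Percent-encode path as UTF-8 and append an already-encoded query."""
    url = base + quote(path, safe="/", encoding="utf-8")
    return url + "?" + query if query else url


url = build_url("http://example.com", "/docs/日本語.html", "q=1")
print(url)

# Decoding the escaped path as UTF-8 recovers the original characters.
escaped_path = url[len("http://example.com"):].split("?")[0]
print(unquote(escaped_path, encoding="utf-8"))
```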
Unicode HTML
HTML is a companion file format
– Roundtrip all formatting
• Optional HTML Filter cuts file size for publishing
– Save to web servers directly
– Roundtrip Unicode data in any encoding
• UTF-8 and UTF-16 are supported too
– HTML is tagged with encoding
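Round-tripping Unicode data in any encoding, with the HTML tagged with that encoding, can be sketched as follows. One common technique, assumed here and not confirmed by the slides as what Office does, is to write characters the target encoding cannot represent as numeric character references, so nothing is lost even in a legacy code page. The function name and sample text are hypothetical.

```python
import html


def to_html(body_text: str, encoding: str) -> bytes:
    """Serialize text as a small HTML page tagged with its encoding."""
    page = (
        '<html><head><meta http-equiv="Content-Type" '
        f'content="text/html; charset={encoding}"></head>'
        f"<body><p>{html.escape(body_text)}</p></body></html>"
    )
    # Characters outside the target encoding become &#NNNN; references.
    return page.encode(encoding, errors="xmlcharrefreplace")


text = "Größe 日本語"

legacy = to_html(text, "windows-1252")
print(legacy)

# Reading the page back: decode with the tagged encoding, then resolve references.
restored = html.unescape(legacy.decode("windows-1252"))
print(text in restored)

# With a Unicode encoding, no references are needed.
print(text.encode("utf-8") in to_html(text, "utf-8"))
```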
Unicode e-mail
Office2000/10 provides fully multilingual email
– HTML mail uses internet standards
– All Unicode content preserved
Plugs into Outlook, Outlook Express, Exchange
– Use Word to compose replies and new messages
– Send in plain text, RTF, or HTML
All applications can mail documents as HTML
Future Directions
Help Windows build a worldwide platform
– Ensure system support is useful to app writers
– Unicode 3.0 languages too
Extend Unicode support to more apps
– Visual Basic Editor and Forms
Microsoft “Word10”

Microsoft Office97 and Unicode