Internationalization
An Introduction
Part II: Enabling
License
This presentation and its associated materials are licensed under a
Creative Commons Attribution-Noncommercial-No Derivative
Works 2.5 License.
You may use these materials without obtaining permission from the
author. Any materials used or redistributed must contain this notice.
[Derivative works may be permitted with permission of the author.]
This work is copyright © 2008-2011 by Addison P. Phillips
Presenter and Presentation
• Addison Phillips
– Globalization Architect, Lab126
• This Presentation
– Part I of the Internationalization and Unicode Conference tutorial:
“Internationalization: An Introduction”
Character Encodings and Unicode
Who is this guy?
• Globalization Architect, Lab126
– We make the technology behind the Kindle
• Chair, W3C Internationalization WG
Internationalization is:
• the design and development of a product that
is enabled for target audiences that vary in
culture, region, or language. [W3C]
• a fundamental architectural approach to
software development
Related Concepts
• Localization: creation of a product tailored to
a particular target market
• Translation: process of converting text from
one language to another
• Globalization: unified approach to creating
global products, especially those that support
multiple geographies simultaneously
Opinions differ on
capitalization (C12N);
choose from:
• i18n
• i18N
• I18n
• I18N
Very geeky; not very
internationalized
(I19G?)
Mystic Numbering (M4C N7G)
I + 18 letters (nternationalizatio) + N = I18N
Internationalization = I18N
Localization = L10N
Globalization = G11N
Canonicalization = C14N
Accessibility = A11Y
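The numbering rule is mechanical enough to write down. A minimal sketch; the class `Numeronym` and its method `of` are hypothetical names: keep the first and last letters and replace the letters between them with their count.

```java
// Hypothetical helper: builds a numeronym such as "i18n" from "internationalization".
class Numeronym {
    static String of(String word) {
        // Short words have nothing worth abbreviating.
        if (word.length() <= 3) {
            return word;
        }
        int middle = word.length() - 2; // letters between the first and the last
        return word.charAt(0) + Integer.toString(middle) + word.charAt(word.length() - 1);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"internationalization", "localization",
                "globalization", "canonicalization", "accessibility"}) {
            System.out.println(w + " -> " + of(w));
        }
    }
}
```

Case is preserved, so the capitalization debate above stays a matter of taste.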
A Global Approach
• Internationalization turns technical problems
into business decisions
• Balance priorities based on real user
distribution/requirements
– Consider global user population as a whole
– Consider specific market requirements on an equal
footing
– Potential markets for the product
Buy In: The Key to Success
• For internationalization to be a success over
time, there must be commitment:
– Management
– Product Team
– Development Team
• All developers, not a splinter group
Addressable
Market:
Why Do
Internationalization?
Globalized Product Development
Internationalization turns technical problems into
business decisions.
– Localization: Choose which markets to translate user
interface or documentation for with no engineering.
– Deployment: Choose whether to serve applications from
a single site, cluster of sites, or in each target market.
– Development: Add content and features to products as
necessary in each target market.
– Integration and Interoperability: Servers and products
can work together around the world, so customers can
truly create “Enterprise” solutions.
Aspects of Internationalization
• Enabling—the same code supports multiple regions or
cultures. Sometimes called a “global binary”.
• Externalization—plan for localizability by separating
“content” from code. This makes localization for specific
languages, regions, or cultures easy, fast, and cheap.
• Customization—add culturally specific functionality,
presentation, or content to an application.
What, me worry?
• We (wrote it in Java/C#, used Unicode, etc.), so it is internationalized.
• We made the assumption that the product would only ever have English screens: all our users understand it anyway.
• A localized product is internationalized.
• An internationalized product is slow/slower.
• It takes longer to write internationalized code.
• We can’t read the screens/it is too hard to test.
• We have no intention of localizing, so no need to internationalize.
• We don’t have any customers there.
• The users in (some country) never complained, so it must work.
• This product is 100% fully internationalized.
Development Methodologies
• Independent of development methodology
• Agile? Waterfall? You make the choice.
• Encompasses the full development cycle:
– Requirements
– Design
– Development
– QC
– Release
– Support
Diagram (cycle): Develop Roadmap (global deployment) → Develop Requirements & Architecture (all customers) → Design (internationalized) → Code (Enable, externalize, customizable) → Test (non-English/non-ASCII) → RTM/GA (by market)
The Customization Approach
• “Internationalization is something remedial”
– “Didn’t we do internationalization in the last release?!?”
– Internationalization involves a lot of arcane knowledge (“we
don’t know what to do”)
– “It will interrupt or slow down development.”
– “International features are not important to our U.S.
customers—and they represent our largest market.”
– “The guys in-country have always figured it out before.”
– “Let’s outsource it”
– “We’ll get to it next time”
How That Model Really Looks
Diagram (over time): the main line moves from 1.0 through 1.0a (bug fixes) to 2.0 (sexy new features). An International Branch produces International Release 1.0 (1.0i) late, with lots of cost to get there, merges and fixes, and lots more people and cost. Functionality gaps remain: intl users are waiting for 2.0i now. The result is lost $ and opportunity.
The Problem with Customization
• Code forks. (double, triple coding)
• Lag time for international releases.
• Non-adoption of localized release.
• Full regression of every language.
• Quality or commitment perception.
• Lack of data exchange between language versions.
• Difficult to repeat (every version is a repeat)
• Proliferation of bugs and of support problems.
• International features are cancelled.
• Core product still doesn’t work/can’t address similar markets.
• Loss of market share.
The Internationalization Approach
• Gather requirements globally
• Enable
• Externalize
• Customize
• Test and support globally
• Localize
Large Animal Pictures
ANALYZING AND DEVELOPING A DESIGN
Large Animal Pictures
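Diagram: a Software Component built from Global Code takes Input, produces Output, performs I/O, and loads Resources.
Enterprise Animal Pictures
Diagram: clients, a Front End, Business Logic, a Data Store, and the Operating Env., exposed through an API; a data feed from a partner or provider has its own API, Business Logic, Data Store, and Operating Env.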
Internationalization Issues
• Text Processing
– Character encodings, including Unicode, spelling, word breaks, collation, and so on
• Language
– Of the software (localization)
– Of solutions built using the software (localizability, data)
• Locale-affected formats
– dates, numbers and the like
• Regionally-affected formats
– names, addresses, currency, and the like
• Time-related issues
– time zone, calendar, holidays, work rules and the like
• Cultural adaptation
– presentation, style, position, color use, and the like
• Legal requirements
– accessibility, SOX, DRM, moderation, security, content, and the like
“Well, it depends…”
Making Good Design Decisions
• Generalize designs
– Locale independent data structures
– Locale sensitive display
• Externalize cultural or linguistic variations
• Customize as a last resort
Levels of Enablement
• Not Enabled
• Single-Language-at-a-Time (SLAAT)
All components run in the same language and
encoding environment correctly.
• Multi-Locale
Unicode support; components run in different
locales, languages, encodings, and time zones
Test Your Assumptions
Gender: ○ Male × Female
Choose Your Language
How is this company doing?
Making Code Aware of Culture
ENABLING
What is “enabling”?
• Enabled software:
adapts the display, processing, validation, storage,
and transmission of data according to the cultural,
linguistic, and regional needs of the users
– Text, Characters, and Encodings
– Locale Awareness
– Times and Time Zones
A “global binary” is a single
object-code version that is
used in all markets, regardless
of localization.
Don’t Code What You Think You Know
• 5/2/7: sometime in February? sometime in May? sometime in 2005?
• 1.234: more than 1000? less than 2?
• 4.32.MD: number, time, currency? morning or afternoon?
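“1.234” shows why parsing must be locale-aware. A minimal sketch with `java.text.NumberFormat` (the class and method names are hypothetical): the same characters parse to two different numbers depending on whether the period is the locale’s decimal separator or its grouping separator.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

class AmbiguousNumber {
    // Parses text according to the number conventions of the given locale.
    static double parse(String text, Locale locale) {
        try {
            return NumberFormat.getInstance(locale).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number in " + locale + ": " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("1.234", Locale.US));      // period read as a decimal separator
        System.out.println(parse("1.234", Locale.GERMANY)); // period read as a grouping separator
    }
}
```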
Date Formats
Culture, format (separator): example
U.S.A., mdy (/): 2/16/05
France, dmy (.): 16.2.05
France, dmy (-): 16-2-05
CJKT, ymd (/): 2005/2/16
CJKT, ymd (年月日): 2005年2月16日
Japan, era + ymd (年月日): 平成17年2月16日
Japan, era year + md (/): 17/2/16
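Rather than hard-coding any row of this table, ask the platform for the locale’s pattern. A minimal `java.time` sketch (names hypothetical). The exact patterns come from the JDK’s locale data (CLDR in recent releases) and can differ between Java versions, so treat the printed output as illustrative.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

class LocalizedDates {
    // Formats a date in the locale's own "short" style instead of a hard-coded pattern.
    static String shortDate(LocalDate date, Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT)
                .withLocale(locale)
                .format(date);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2005, 2, 16);
        for (Locale l : new Locale[] {Locale.US, Locale.FRANCE, Locale.JAPAN, Locale.CHINA}) {
            System.out.println(l.toLanguageTag() + ": " + shortDate(date, l));
        }
    }
}
```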
Time Formats
• U.S.A.: 4:00 p.m.
• France: 16.00
• Japan: 1600
• Japan: ごご4:00
• Korea: 오후 4:32
• Thai: 16:32 น.
• Albanian: 4.32.MD
• Arabic: 04:32 م
Illustration: clocks showing 5:00 AM, 5:00 PM, and 10:00 PM.
Don’t forget to
identify time zone!
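A bare “4:00 p.m.” is ambiguous until a zone is attached. A minimal `java.time` sketch (class and method names hypothetical) that renders one stored instant as wall-clock time in three zones:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

class ZoneAwareTime {
    // Converts one instant into the local wall-clock time of a zone.
    static ZonedDateTime wallTime(Instant instant, String zoneId) {
        return instant.atZone(ZoneId.of(zoneId));
    }

    public static void main(String[] args) {
        Instant meeting = Instant.parse("2011-01-20T00:00:00Z");
        for (String zone : new String[] {"America/Los_Angeles", "Europe/Paris", "Asia/Tokyo"}) {
            System.out.println(zone + ": " + wallTime(meeting, zone));
        }
    }
}
```

The instant is the stored value; each zone only changes the presentation (here, 4 PM on the 19th in Los Angeles and 9 AM on the 20th in Tokyo).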
More Examples
Assumptions about date tokens:
Days:
• USA: Sun, Mon, Tue (3 positions, titlecase)
• French: lun. mar. mer. (four positions, lowercase)
• Russian: Пн Вт Ср (two positions, Cyrillic)
Months:
• USA: Jan, Feb, Mar (3 positions, titlecase)
• French: janv. févr. mars avr. (variable (4 or 5) positions, lowercase)
• Spanish (Spain): ene, feb, mar (not titlecase)
• Spanish (Americas): Ene, Feb, Mar (titlecase)
Calendars: What Year Is It?
• Legal, ceremonial, or popular requirement
Gregorian: 2007
Japan Emperor: 19 Heisei (平成19年)
Thailand (Buddhist): 2550 (Gregorian + 543)
Chinese (traditional): 4704 (lunar)
Hebrew: 5767 (תשס״ז, lunar)
Hijri (Islamic): 1428 (lunar)
Armenian: 1456 (ԹՎ ՌՆԾԶ)
etc. etc. etc.
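`java.time.chrono` ships a few of these calendars. A minimal sketch, assuming Java 8 or later (class and method names hypothetical): Japanese, Thai Buddhist, and Hijri (Umm al-Qura) years for a day in 2007. Hebrew, Chinese, and Armenian calendars are not in the JDK; ICU4J, for example, provides Hebrew and Chinese calendars.

```java
import java.time.LocalDate;
import java.time.chrono.HijrahDate;
import java.time.chrono.JapaneseDate;
import java.time.chrono.ThaiBuddhistDate;
import java.time.temporal.ChronoField;

class WhatYearIsIt {
    // Each chronology computes its own era and year for the same ISO date.
    static JapaneseDate japanese(LocalDate day) {
        return JapaneseDate.from(day);
    }

    static int thaiYear(LocalDate day) {
        return ThaiBuddhistDate.from(day).get(ChronoField.YEAR_OF_ERA);
    }

    static int hijriYear(LocalDate day) {
        return HijrahDate.from(day).get(ChronoField.YEAR_OF_ERA);
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2007, 6, 1); // an arbitrary day in 2007
        JapaneseDate jp = japanese(day);
        System.out.println("Japanese: " + jp.getEra() + " " + jp.get(ChronoField.YEAR_OF_ERA));
        System.out.println("Thai Buddhist: " + thaiYear(day));
        System.out.println("Hijri: " + hijriYear(day));
    }
}
```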
Weekends and Holidays
• When is the weekend?
– Friday is part of the weekend in some countries.
• Both official and unofficial holidays vary widely in number. Here are a few to watch for:
– USA: July 4, MLK, Presidents’ Day, Veterans Day, Flag Day, Columbus Day, Thanksgiving…
– Japan: Golden Week
– China: New Year’s
– Britain: Guy Fawkes Day, Boxing Day
– France: Bastille Day
– Spain: Reyes Magos (Three Kings’ Day)
Calendar Display
Number and List Formats
Grouping and decimal separators:
England: 12,345.67
Germany: 12.345,67
Switzerland: 12’345,67
Swiss money: 12’345.67
France: 12 345,67
India: 12,34,567.89
• France uses a non-breaking space!
• India: number of digits in groupings changes!
List delimiters & separators can conflict. French example:
2 345,67, 1 012,34, 45,67 (hard to read)
2 345,67; 1 012,34; 45,67 (easier to read)
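Formatting through the locale handles most of these rows. A minimal sketch with `java.text.NumberFormat` (class name hypothetical). One hedge: `java.text.DecimalFormat` supports only a single grouping size, so it does not produce India’s 12,34,567.89; ICU4J’s number formatter does support that secondary grouping.

```java
import java.text.NumberFormat;
import java.util.Locale;

class Separators {
    // Formats with exactly two fraction digits using the locale's own separators.
    static String format(double value, Locale locale) {
        NumberFormat nf = NumberFormat.getInstance(locale);
        nf.setMinimumFractionDigits(2);
        nf.setMaximumFractionDigits(2);
        return nf.format(value);
    }

    public static void main(String[] args) {
        for (Locale l : new Locale[] {Locale.UK, Locale.GERMANY, Locale.FRANCE,
                Locale.forLanguageTag("de-CH")}) {
            System.out.println(l.toLanguageTag() + ": " + format(12345.67, l));
        }
    }
}
```

For French, the grouping separator is a non-breaking space character (which one depends on the JDK’s locale data), never an ordinary space.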
Collation
(A fancy word for “sorting”)
English: ABC...RSTUVWXYZ
German: AÄB...NOÖ...SßTUÜV…YZ
Swedish/Finnish: AB...STUVWXYZÅÄÖ
Norwegian: AB...VWXYÜZÆØÅ
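A `java.text.Collator` for the right locale applies these rules, whereas plain `String.compareTo` just compares UTF-16 code units. A minimal sketch (class name hypothetical), assuming a JDK that includes the standard `jdk.localedata` module with Swedish collation data:

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

class AlphabetOrder {
    // Sorts a copy of the words using the collation rules of the given locale.
    static List<String> sorted(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("zebra", "Äpple", "apa");
        List<String> binary = new ArrayList<>(words);
        binary.sort(Comparator.naturalOrder()); // code-unit order, not linguistic order
        System.out.println("binary:  " + binary);
        System.out.println("German:  " + sorted(words, Locale.GERMAN));
        System.out.println("Swedish: " + sorted(words, Locale.forLanguageTag("sv")));
    }
}
```

German places “Äpple” next to “apa”; Swedish sorts Ä after Z, as in the table above.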
Organizing Information
• “Alphabet” differences
• Additional information
– for example: yomi (the phonetic reading of a Japanese name or word, used for sorting)
• ASCII vs. the world
• Mixed information sets
“Should I be writing all of this down…”
• Wide range of
variation
• Obscure formats
• Difficult to obtain
reliable information
on formats
• Lots of work to
implement and
maintain
Enabling means not
having to know
(m)any of the details
Supporting International Formats
• Use neutral data
structures
– Makes code
independent of locale
– Most data types are
locale-neutral:
• Boolean
• String, char
• Number classes
• Date, Calendar
• Encapsulate
formatting/validation
in a function
– Format style chosen
dynamically at runtime
– Format details don’t
have to be specified or
researched
– APIs know the gory
details
Essence of Enabling
• Object to Presentation, Presentation to Object
– Integers
– Floats
– Percents
– Currencies
– Dates
– Times
– Durations
– Collation (lists)
– Weights/measures/sizes
– Resources (user interface strings)
Diagram: object ⇄ locale ⇄ user presentation
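Both directions can be sketched with one localized formatter. A minimal `java.time` example (class and method names hypothetical): a locale-neutral `LocalDate` goes out as locale-specific text and comes back as the same object.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

class DateRoundTrip {
    // The locale chooses the pattern at runtime; no format details are hard-coded.
    private static DateTimeFormatter formatter(Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(FormatStyle.MEDIUM).withLocale(locale);
    }

    // Object to presentation.
    static String toPresentation(LocalDate date, Locale locale) {
        return formatter(locale).format(date);
    }

    // Presentation to object.
    static LocalDate toObject(String text, Locale locale) {
        return LocalDate.parse(text, formatter(locale));
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2005, 2, 16);
        for (Locale l : new Locale[] {Locale.US, Locale.GERMANY}) {
            String text = toPresentation(date, l);
            System.out.println(l.toLanguageTag() + ": " + text + " -> " + toObject(text, l));
        }
    }
}
```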
Locale
• an identifier or data structure that allows
programmers to access culturally and
linguistically affected functionality in a system.
• Many systems are now based on IETF BCP 47; for example
JavaScript, Java SE 7, and CLDR
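Since Java SE 7, `java.util.Locale` parses and produces BCP 47 language tags. A minimal sketch (class and method names hypothetical):

```java
import java.util.Locale;

class LanguageTags {
    // Splits a BCP 47 tag into the subtags Locale understands.
    static String describe(String tag) {
        Locale l = Locale.forLanguageTag(tag);
        return "language=" + l.getLanguage() + ", script=" + l.getScript()
                + ", region=" + l.getCountry();
    }

    // Round-trips a tag; Locale normalizes the case of each subtag.
    static String normalize(String tag) {
        return Locale.forLanguageTag(tag).toLanguageTag();
    }

    public static void main(String[] args) {
        System.out.println(describe("zh-Hans-SG"));
        System.out.println(normalize("ZH-hans-sg"));
    }
}
```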
Supporting International Formats:
Numbers
French vs. Swiss
NumberFormat Demo Code
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Locale;

// Formats n for display in the given demo column, using locale l.
public String formatNumber(int column, Number n, Locale l) {
    NumberFormat format;
    Currency c;
    switch (column) {
        case 0:
            // raw, locale-independent value for comparison
            return n.toString();
        case 2:
            format = NumberFormat.getIntegerInstance(l);
            break;
        case 3:
            format = NumberFormat.getPercentInstance(l);
            break;
        case 4:
            format = NumberFormat.getCurrencyInstance(l);
            try {
                c = Currency.getInstance(l);
            } catch (IllegalArgumentException e) {
                // can get here if you specify a locale with no
                // country or one whose territory isn't supported
                c = null;
            }
            if (c == null) {
                // territories with no currency (like my favorite
                // territory 'AQ') return null: use the Almighty Buck
                c = Currency.getInstance("USD");
            }
            format.setCurrency(c);
            break;
        case 1:
        default:
            format = NumberFormat.getInstance(l);
            break;
    }
    return format.format(n);
}
Collation Example
Break Iterator
Break iterators allow you to break text
into characters, words, lines, and
sentences.
In the demo, we use a word break
iterator to find word-breaks. We also
use a character break iterator to find
approximate glyph breaks.
BreakIterator iter =
    BreakIterator.getWordInstance(b_locale);
iter.setText(str);
int last = iter.first(); // points to the start of the string
int pos = iter.next();   // so move to the next break
int longest = 0;
while (pos != BreakIterator.DONE) {
    String sub = str.substring(last, pos).trim();
    // …
    last = pos;
    pos = iter.next();
}
Collator
Collator is the class that does
linguistically correct sorting. In the
demo, it’s really easy to use: Java
Collections can take a comparator and
do all the work internally. All we have
to do is provide the right one.
Collator nativeCol =
Collator.getInstance(b_locale);
bMap = new TreeSet(nativeCol);
Complex Types
• Data structures, APIs, or classes built from basic types must
include similar capabilities.
– Store data in a locale-neutral or independent format.
– Display in a language/regional/culturally sensitive manner
– Convert from locale format to locale-neutral or locale-independent
storage format.
Design Time and Data Structures
• Identify your own “locale bias”
– Field names matter!
• “Postal Code”, not “ZIP code”.
• Family Name/Given Name, not First Name/Last Name
– Avoid problematic fields
• Postal address parsing? Area code? Etc.
Currency
• Currency formatting is usually similar to number formatting. But things can vary widely here, too:
– $1,100.00 [USA]
– 1 100,00 € [France-Euro]
– ¥1,100 [Japan]
– 1.100$00 Esc. [Portugal, obsolete]
– SFr. 1’000.00 [Switzerland]
• Currency associated with the locale doesn’t always apply. Store the currency type with the value.
– Use ISO 4217 std. codes (USD, JPY, EUR, RUB)
• Not always one symbol.
• Not always two decimal places.
• $100 + ¥100 = $101
• Consider neutral displays!
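“Store the currency type with the value” can be sketched as a tiny value class. `Money` here is hypothetical, not a standard Java API (JSR 354’s `javax.money` is a separate library): it keeps the ISO 4217 code with the amount, uses the currency’s own number of decimal places, and refuses to add different currencies.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.text.NumberFormat;
import java.util.Currency;
import java.util.Locale;

// Hypothetical value type: an amount is never stored without its currency.
final class Money {
    final BigDecimal amount;
    final Currency currency;

    Money(String amount, String isoCode) {
        this.currency = Currency.getInstance(isoCode);
        // Scale to the currency's minor units: 2 for USD and EUR, 0 for JPY.
        int digits = Math.max(0, currency.getDefaultFractionDigits());
        this.amount = new BigDecimal(amount).setScale(digits, RoundingMode.HALF_EVEN);
    }

    // Display in the user's locale, but with the stored currency and its decimals.
    String format(Locale userLocale) {
        NumberFormat nf = NumberFormat.getCurrencyInstance(userLocale);
        nf.setCurrency(currency);
        nf.setMinimumFractionDigits(amount.scale());
        nf.setMaximumFractionDigits(amount.scale());
        return nf.format(amount);
    }

    Money plus(Money other) {
        if (!currency.equals(other.currency)) {
            // $100 + ¥100 needs an exchange rate, not addition.
            throw new IllegalArgumentException("Currency mismatch: " + currency + " vs " + other.currency);
        }
        return new Money(amount.add(other.amount).toPlainString(), currency.getCurrencyCode());
    }

    public static void main(String[] args) {
        System.out.println(new Money("1100", "USD").format(Locale.US));
        System.out.println(new Money("1100", "JPY").format(Locale.US));
        System.out.println(new Money("1100", "EUR").format(Locale.FRANCE));
    }
}
```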
Being Locale Neutral
• Avoid or reduce locale-affected display to
increase portability
– Use unambiguous formats, such as ISO 8601-like dates, especially in log files and the like
• 2005-04-01 14:17:00 UTC
– Use consistent formats (‘user locale’),
especially in columns or collections of data
Amount, Currency (consistent, user locale):
351,234.56 USD
102,556.78 EUR
65,336.00 JPY
212,345.00 INR
Amount, Currency (mixed, each currency’s local format):
351,234.56 USD
102 556,78 EUR
65336 JPY
2,12,345.00 INR
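For logs, a fixed ISO 8601 form in UTC avoids every locale question. A minimal `java.time` sketch (names hypothetical) producing both the strict ISO form and the “ISO 8601-like” form shown above:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class LogTimestamp {
    // Strict ISO 8601 instant, always in UTC ("Z").
    static String iso(Instant instant) {
        return DateTimeFormatter.ISO_INSTANT.format(instant);
    }

    // ISO 8601-like variant with a space and an explicit "UTC" suffix.
    private static final DateTimeFormatter LOG_STYLE =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss 'UTC'", Locale.ROOT)
                    .withZone(ZoneOffset.UTC);

    static String logStyle(Instant instant) {
        return LOG_STYLE.format(instant);
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2005-04-01T14:17:00Z");
        System.out.println(iso(t));
        System.out.println(logStyle(t));
    }
}
```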
“String is the Thing”
• Text doesn’t get translated on the fly.
• Don’t use text as an identifier or foreign key.
– Use ID Numbers or not-human-readable values instead of requiring text
fields to match.
– “Intrinsic” data value versus “display” data value.
• Enumerated values displayed as strings.
• Use display strings.
Enumerated: ACCOUNTS_PAYABLE
Displayed: “Accounts Payable”, “pagável de clientes”
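A minimal sketch of keeping the intrinsic value separate from its display value. `AccountType`, `AccountLabels`, and the in-memory table are hypothetical; a real product would load the display strings from externalized resources. The Portuguese labels are illustrative.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// The intrinsic value: stored, compared, and used as a key. Never translated.
enum AccountType { ACCOUNTS_PAYABLE, ACCOUNTS_RECEIVABLE }

class AccountLabels {
    // Hypothetical display table keyed by language code.
    private static final Map<String, Map<AccountType, String>> LABELS = new HashMap<>();
    static {
        Map<AccountType, String> en = new EnumMap<>(AccountType.class);
        en.put(AccountType.ACCOUNTS_PAYABLE, "Accounts Payable");
        en.put(AccountType.ACCOUNTS_RECEIVABLE, "Accounts Receivable");
        LABELS.put("en", en);
        Map<AccountType, String> pt = new EnumMap<>(AccountType.class);
        pt.put(AccountType.ACCOUNTS_PAYABLE, "Contas a pagar");
        pt.put(AccountType.ACCOUNTS_RECEIVABLE, "Contas a receber");
        LABELS.put("pt", pt);
    }

    // The display value: looked up at runtime, falling back to English.
    static String display(AccountType type, Locale locale) {
        return LABELS.getOrDefault(locale.getLanguage(), LABELS.get("en")).get(type);
    }

    // Stored data is matched on the identifier, never on display text.
    static AccountType fromStorage(String storedId) {
        return AccountType.valueOf(storedId);
    }

    public static void main(String[] args) {
        AccountType t = fromStorage("ACCOUNTS_PAYABLE");
        System.out.println(display(t, Locale.ENGLISH) + " / " + display(t, Locale.forLanguageTag("pt-BR")));
    }
}
```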
English-like Construction
• Concatenation
– String1 + string2
• Pluralization
– Dog + “s” = “dogs” (sheeps??)
• Lists
– 1.23, 2.23, 3.36
– 1,23, 2,23, 3,36?
This topic will be covered in
greater depth in the section on
localization.
Databases
• Most databases can only handle one collation sequence per instance
or one collation per index.
– Remove reliance on alphalists.
– Self-collate short lists.
– Pre-collate long lists?
• Example: NLS_SORT controls the way Oracle returns data (collation
sequence).
– Global environment variable.
– Not necessarily under your control.
– Indices are built on a predetermined or binary sort.
Enabling Summary
• Understand Encodings and Unicode
– All text has an encoding!
• Be Locale-Aware
– Create locale-neutral data structures
– Separate display from storage
Dates, Times, Durations, Calendars
and Time Zones
IT’S ABOUT TIME
Observed Time
Incremental Time
• Computed time based on “clock ticks” in an
“epoch”
– The epochal date is arbitrary. The UNIX epoch is
midnight, January 1, 1970, UTC.
• Some systems have data types for “field
based” time also.
What is a Time Zone
• A time zone is a geographical region or area
that has common rules for determining the
local observed time as it relates to monotonic
(computer) time.
• Distinctions include:
– Offset from UTC
– Daylight Saving Time (Summer Time) behavior
– Historic changes in offset or DST behavior
– Political control
Time Zone Affected Scenarios
• Zone independent
– only “incremental” times
are necessary
• Local time, past only
– future changes to time
zone rules not applicable
– example: logging system
• Local time, both past and
future
– time zone rule changes
may affect some time
values
– example: calendar
program
• Floating times
– events not tied to a specific
time zone
– example: birthdate, start date,
definition of “night” for phone
usage
• Recurring events
– events that recur—sometimes
during and sometimes not
during daylight savings.
– example: weekly status
meeting
Time Zone Identifiers
• Often based on the time zone information database (the tz database, also called zoneinfo). These identifiers are sometimes called the Olson IDs.
Offset: Etc/UTC, Etc/GMT+1
Continent/Region/City: America/Indiana/Indianapolis
Ocean/Island(City): Atlantic/Canary, Pacific/Auckland, Pacific/Pago_Pago
Continent/City: America/Los_Angeles, Europe/Paris, Asia/Tokyo, Antarctica/DumontDUrville
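These IDs can be used directly with `java.time.ZoneId`. One trap worth showing: the `Etc/GMT±N` names use POSIX-style signs, so `Etc/GMT+1` is one hour behind UTC, not ahead. A minimal sketch (names hypothetical):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

class ZoneIds {
    // Asks the tz rules for a zone's UTC offset at a given instant.
    static ZoneOffset offsetAt(String zoneId, Instant when) {
        return ZoneId.of(zoneId).getRules().getOffset(when);
    }

    public static void main(String[] args) {
        Instant winter = Instant.parse("2011-01-15T12:00:00Z");
        for (String id : new String[] {"Etc/UTC", "Etc/GMT+1",
                "America/Indiana/Indianapolis", "Antarctica/DumontDUrville"}) {
            System.out.println(id + ": " + offsetAt(id, winter));
        }
    }
}
```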
Time Zone Hints
• Only 21 countries have more than one time
zone (if you know the country, you often know
the time zone)
• Argentina, Australia, Brazil, Canada, Chile, Democratic Republic of the Congo, Ecuador, France, Greenland, Indonesia, Kazakhstan, Kiribati, Mexico, Micronesia, Mongolia, New Zealand, Portugal, Russia, Spain, and the United States.
– Of these, most have maritime or overseas regions.
Examples:
• Ecuador: Galapagos Islands
• Chile: Easter Island
• Portugal: Azores
Locale-Neutral Formats
• Use locale-neutral formats for interchange:
– ISO 8601
– Incremental time values (e.g. time_t)
– Distinguish time zone if necessary for
interpretation
• Offset is not the same as time zone
SQL data types and XML formats
are often field-based, while
programming languages are
usually incremental.
At any given time, in UTC, it is
the same time everywhere that
time is measured.
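“Offset is not the same as time zone” is easy to demonstrate: one zone ID yields different offsets on different dates, so storing only “-08:00” loses the rules needed to compute other local times. A minimal `java.time` sketch (names hypothetical):

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;

class OffsetVsZone {
    // The offset a zone uses at the start of a given local date.
    static ZoneOffset offsetOn(String zoneId, LocalDate date) {
        return date.atStartOfDay(ZoneId.of(zoneId)).getOffset();
    }

    public static void main(String[] args) {
        String zone = "America/Los_Angeles";
        System.out.println("January: " + offsetOn(zone, LocalDate.of(2011, 1, 15)));
        System.out.println("July:    " + offsetOn(zone, LocalDate.of(2011, 7, 15)));
    }
}
```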
Durations and Repeating Events
Wall-time:
this meeting is at 2 PM Pacific
time every Tuesday
– interval between meetings
may vary in number of
seconds
• Daylight time transitions
• Changes in DST rules
Fixed-duration:
run the virus scanner every
57 minutes
– interval is always 3,420,000
milliseconds
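The two kinds of recurrence map onto different `java.time` operations: adding a calendar period keeps the wall time, while adding a `Duration` keeps the elapsed time. A minimal sketch (names hypothetical) using a Tuesday meeting that spans the 2011 U.S. DST change on March 13:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

class Recurrence {
    static final ZoneId PACIFIC = ZoneId.of("America/Los_Angeles");

    // Wall-time recurrence: same local time next week, whatever the offset does.
    static ZonedDateTime nextWeeklyMeeting(ZonedDateTime meeting) {
        return meeting.plusWeeks(1);
    }

    // Fixed-duration recurrence: always the same number of elapsed milliseconds.
    static ZonedDateTime nextVirusScan(ZonedDateTime scan) {
        return scan.plus(Duration.ofMinutes(57));
    }

    public static void main(String[] args) {
        ZonedDateTime meeting = ZonedDateTime.of(LocalDateTime.of(2011, 3, 8, 14, 0), PACIFIC);
        ZonedDateTime next = nextWeeklyMeeting(meeting);
        System.out.println(next + " after " + Duration.between(meeting, next).toHours() + " hours");
    }
}
```

Across that weekend, the weekly meeting stays at 2 PM but the interval is 167 hours instead of 168.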
Calendars
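• Gregorian
• Japanese Imperial
• Hijri
• Thai Buddhist
• Chinese Traditional
• Jewish
• Astronomy
The same date in several calendars and formats:
Friday, January 20, 2006 (Gregorian, English)
الجمعة، 20 ذو الحجة، 1426 (Hijri, Arabic)
2006年1月20日星期五 (Gregorian, Chinese)
二○○六年一月二十日星期五 (Gregorian, Chinese numerals)
平成18年1月20日 (Japanese Imperial)
平成十八年一月二十日 (Japanese Imperial, Japanese numerals)
วันศุกร์ที่ 20 มกราคม พ.ศ. 2549 (Thai Buddhist)
วันศุกร์ที่ ๒๐ มกราคม พ.ศ. ๒๕๔๙ (Thai Buddhist, Thai digits)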
Calendars affect the field values
calculated for a given event. “Roll”
of values such as month, week, day,
etc. depends on such relationships.
Calendar code then converts to
incremental times.
Formatting Dates and Times
October 10, 14H 6:05:45 AM JST
Requires more than just a locale!
• date (1034197545321L): the value being formatted
• time zone (Asia/Tokyo): defines relation to “wall time”
• calendar (Japanese Imperial): defines rules for calculating field values
Example: Java Date Formatting
Computer Time (Data Structure)
java.util.Date: long integer, milliseconds since “epoch” of January 1,
1970, 00:00 UTC
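The three ingredients map onto `java.time`: an `Instant` for the computer time, a `ZoneId` for the wall time, and a chronology for the fields. A minimal sketch (names hypothetical) using the same value, 1034197545321L:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.chrono.JapaneseChronology;
import java.time.chrono.JapaneseDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

class TokyoImperial {
    static final long MILLIS = 1034197545321L; // computer time: no zone, no calendar

    // The time zone turns the instant into wall time.
    static ZonedDateTime wallTime() {
        return Instant.ofEpochMilli(MILLIS).atZone(ZoneId.of("Asia/Tokyo"));
    }

    // The calendar computes field values (era, year) for that wall date.
    static JapaneseDate imperialDate() {
        return JapaneseDate.from(wallTime().toLocalDate());
    }

    public static void main(String[] args) {
        DateTimeFormatter f = DateTimeFormatter.ofLocalizedDate(FormatStyle.LONG)
                .withChronology(JapaneseChronology.INSTANCE)
                .withLocale(Locale.JAPAN);
        System.out.println(wallTime());
        System.out.println(f.format(wallTime()));
    }
}
```

The wall time works out to October 10, 2002, 6:05:45 AM in Tokyo, which falls in Heisei 14.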
Externalization
Moving language and culturally
affected data and components out of
code.
What is Localization?
• The process of tailoring a product to a specific
target market.
– Translation of messages
– Adaptation to local preferences
– Addition (or subtraction) of content or features
Localization is obvious…
• “Localization” is not “Internationalization”!
• Localizability is internationalization.
– Externalize text
– Externalize presentation
– Dynamic composition
– Distribution of language content
– “Plug-in” features
Avoiding Forks
Diagram: instead of forked builds (English Version, version française, Deutsche Version, 日本語版), one Global Binary with four sets of Resources (English, French, German, Japanese).
Forked Code Woes
• Hard to fix and maintain
• Different versions in the field
• Delays in releasing localized product
• Different functionality by region
• Confusing for customers/users
• Versions are not interoperable and might not be able to exchange data!
Other Benefits
• Rename or re-brand product
• Fix spelling or grammar mistakes
• Fix usability
• Make terminology consistent
… all without a rebuild!
What is a ‘Resource’?
any application component loaded
dynamically at runtime, rather than
compiled into the application
– in Localization: source code files containing
language, region, or culturally-affected
materials
    $SET 1 Prompts
    1 ENTER FIRST NAME
    2 ENTER LAST NAME
    $
    $set 2 Error Messages
    1 NAME NOT ON DATA BASE
    2 ILLEGAL INPUT
(a gencat message catalog file)
– Text
– Error messages
– Icons
– Pictures
– Fonts
– Colors
– Graphics
– Sizes
– Positions
– Magic Numbers
– Mnemonics (“Alt+G”, “F4”, etc.)
– File Locations
– Dictionaries
– Glossaries
– Grammar Rules
– Code
Non-Translatable Resources
• Some content should be externalized but not translated
– Sometimes referred to as “DNT” for “do not translate”
• Externalize? Yes…
– Segregate DNT material from translated material if possible (by
using separate resource files or separate resource blocks within a
file).
– Developers can’t always tell when something should or should not
be DNT… and neither can translators (context is missing)
The “Locale” in “Localization”
• Resources “fall back”
to find the best match
Falling back (Global Binary + Resources):
zh-Hans-SG (Chinese, Simplified script, Singapore)
→ zh-Hans (Chinese, Simplified script)
→ zh (Chinese)
→ (root)
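Java’s `ResourceBundle` computes a similar chain. A minimal sketch that asks the default `ResourceBundle.Control` for its candidate list; the base name "Messages" is a placeholder and does not need to exist for this call. Note that the JDK’s default chain also tries zh-SG between zh-Hans and zh.

```java
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

class FallbackChain {
    // Lists the locales ResourceBundle would try, most specific first.
    static List<Locale> candidates(String languageTag) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        return control.getCandidateLocales("Messages", Locale.forLanguageTag(languageTag));
    }

    public static void main(String[] args) {
        for (Locale l : candidates("zh-Hans-SG")) {
            System.out.println(l.equals(Locale.ROOT) ? "(root)" : l.toLanguageTag());
        }
    }
}
```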
Sparse Population
• A given language resource may not contain a
complete set of resources.
– Some resource formats fall back separately for each sub-resource (such as a particular value)
French resources:
“appName” “Démo”
“dialogTitle” “Bonjour monde”
Root resources:
“appName” “Demo”
“maxRows” 57
“dialogTitle” “Hello World”
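Per-key fallback is what `java.util.ResourceBundle` does through its parent chain. To keep the sketch self-contained, this version uses a hypothetical map-based store instead of real bundle files:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical resource store: each locale tag maps keys to values; "root" is the default.
class SparseResources {
    private final Map<String, Map<String, Object>> tables = new HashMap<>();

    void put(String tag, String key, Object value) {
        tables.computeIfAbsent(tag, t -> new HashMap<>()).put(key, value);
    }

    // Walks the fallback chain key by key: the first table that has the key wins.
    Object get(List<String> chain, String key) {
        for (String tag : chain) {
            Map<String, Object> table = tables.get(tag);
            if (table != null && table.containsKey(key)) {
                return table.get(key);
            }
        }
        throw new IllegalArgumentException("Missing resource: " + key);
    }

    static SparseResources demo() {
        SparseResources r = new SparseResources();
        r.put("root", "appName", "Demo");
        r.put("root", "maxRows", 57);
        r.put("root", "dialogTitle", "Hello World");
        r.put("fr", "appName", "Démo");
        r.put("fr", "dialogTitle", "Bonjour monde");
        return r;
    }

    public static void main(String[] args) {
        List<String> chain = Arrays.asList("fr", "root");
        for (String key : new String[] {"appName", "dialogTitle", "maxRows"}) {
            System.out.println(key + " = " + demo().get(chain, key));
        }
    }
}
```

With the chain fr → root, appName and dialogTitle come from the French table, and maxRows falls back to root’s 57.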
Getting the Right Locale
Diagram: the client locale, server locale, API request locale, and system management locale can each apply as a request passes through the client, Front End, Business Logic, API, Data Store, and Operating Env.
One request might serve
multiple purposes or be
seen in multiple
contexts
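When the locale arrives with a request, for example in an HTTP Accept-Language header, Java 8’s `Locale.LanguageRange` and `Locale.lookup` implement RFC 4647 lookup. A minimal sketch; the supported list and the English default are hypothetical product choices:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class Negotiation {
    // Hypothetical set of languages the product ships resources for.
    static final List<Locale> SUPPORTED = Arrays.asList(Locale.ENGLISH, Locale.FRENCH, Locale.GERMAN);

    // Chooses the best supported locale for an Accept-Language header value.
    static Locale choose(String acceptLanguage) {
        List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
        Locale match = Locale.lookup(ranges, SUPPORTED);
        return match != null ? match : Locale.ENGLISH; // hypothetical product default
    }

    public static void main(String[] args) {
        System.out.println(choose("fr-CH,fr;q=0.9,en;q=0.8"));
        System.out.println(choose("ja-JP"));
    }
}
```

fr-CH, with no Swiss French resources, falls back to fr; an unsupported ja-JP gets the default.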
Resources and Translation
“key”, “display string”
“dialogTitle”, “Dialog Title”
“aMessage”, “This is a message.”
“key”, “ðìsplàÿ stríñg”
“dialogTitle”, “Ðîálòg Tïtlè”
“aMessage”, “Thìß ís â Mésßãgê.”
Pseudo-Translation
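A pseudo-translator is only a few lines. A minimal sketch with a hypothetical character map: it swaps letters for accented look-alikes, pads about 30% for expansion, brackets the string so truncation shows, and leaves `{…}` placeholders alone. It does not handle MessageFormat’s apostrophe quoting.

```java
class PseudoLocalizer {
    // Hypothetical one-to-one map from plain letters to accented look-alikes.
    private static final String PLAIN  = "aeiouyAEIOUnNcCsS";
    private static final String ACCENT = "àéîõüÿÀÉÎÕÜñÑçÇšŠ";

    static String pseudo(String source) {
        StringBuilder sb = new StringBuilder("[");
        int depth = 0; // > 0 while inside a {…} placeholder
        for (char ch : source.toCharArray()) {
            if (ch == '{') depth++;
            int idx = depth == 0 ? PLAIN.indexOf(ch) : -1;
            sb.append(idx >= 0 ? ACCENT.charAt(idx) : ch);
            if (ch == '}' && depth > 0) depth--;
        }
        // Pad by ceil(30% of the length) to simulate text expansion.
        int padding = (source.length() * 3 + 9) / 10;
        for (int i = 0; i < padding; i++) sb.append('~');
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println(pseudo("Dialog Title"));
        System.out.println(pseudo("There were {0,number,integer} tables."));
    }
}
```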
Don’t Build From Text Fragments
• Text fragments are hard to translate
– Fragments may not follow grammar rules
• Cannot know which parts go together
• Parts can be reused in incompatible ways
String1 = There are
String2 = no
String3 = tables in
String4 = files.
There are files.
There are no files.
There are 50 files.
There are tables in files.
There are no tables in files.
[] files out of [] were deleted.
An error occurred at [] on [].
Page [] of []
Processing: []% complete.
Issues With Text Composition
• Count:
– There were one errors found.
– You have earned your 22th set of bonus points.
• Gender:
– “Documenti del Chris” (Chris’s documents, masculine)
– “Documenti della Chris” (feminine)
– “Documenti - Chris” (gender-neutral workaround)
• Case
• Grammatical Structure
– SOV, SVO, etc.
• Word Order and Inter-word Dependency
Sentence Parts Must Agree
• Endings, Gender, Plurality, Case
– e.g. Japanese counting uses different words for
different kinds of objects
– e.g. Slavic languages use different endings for
singular, few, many…
Message Format APIs
There were {0} tables on {1}.
There were {0,number,integer} tables on {1,date,short}.
{1,date}に{0,number,integer}のテーブルがあった。 (Japanese: “On {1,date} there were {0,number,integer} tables.”)
• Number replacement variables.
• Provide typing and formatting information where possible.
• Externalize as a single unitary string.
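`java.text.MessageFormat` implements this pattern style. A minimal sketch (names hypothetical); note that in MessageFormat patterns a literal apostrophe must be written as two ('').

```java
import java.text.MessageFormat;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

class TableMessage {
    // One externalized, unitary pattern per language; translators may reorder arguments.
    static final String EN = "There were {0,number,integer} tables on {1,date,short}.";
    static final String JA = "{1,date}に{0,number,integer}のテーブルがあった。";

    static String format(String pattern, Locale locale, int count, Date date) {
        return new MessageFormat(pattern, locale).format(new Object[] {count, date});
    }

    public static void main(String[] args) {
        Date date = new GregorianCalendar(2005, GregorianCalendar.FEBRUARY, 16).getTime();
        System.out.println(format(EN, Locale.US, 1234, date));
        System.out.println(format(JA, Locale.JAPAN, 1234, date));
    }
}
```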
Complex Message Formatting
There were no errors.
There was 1 error.
There were 2 errors.
0:There were no errors.
1:There was {0} error.
2:There were {0} errors.
Russian:
0:не было ошибок
1:была {0} ошибка
2:были {0} ошибки
5:было {0} ошибок
• “Choice format” APIs allow different resources to be used based on runtime values. Examples:
– ordinal numbers (1st, 2nd, 3rd, 4th, etc.)
– complex messages, such as “27 seconds ago” vs. “10 minutes ago”
• The number of resources may need to vary by locale or language.
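In Java, a `choice` argument inside `java.text.MessageFormat` does this, and the text it selects can itself contain `{0}`. A minimal English sketch (names hypothetical). A hedge: choice format selects by numeric range, which cannot express rules like Russian’s, where 21 takes the singular form; ICU4J’s MessageFormat with `plural` arguments uses CLDR plural rules instead.

```java
import java.text.MessageFormat;
import java.util.Locale;

class ErrorCountMessage {
    // 0 -> "no errors", exactly 1 -> singular, more than 1 -> plural with a formatted count.
    static final String EN =
            "{0,choice,0#There were no errors.|1#There was {0} error.|1<There were {0,number,integer} errors.}";

    static String format(int errors) {
        return new MessageFormat(EN, Locale.US).format(new Object[] {errors});
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 2, 1500}) {
            System.out.println(format(n));
        }
    }
}
```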
Images and Icons
• Avoid metaphors
• Avoid cultural sensitivities
• Avoid body parts
• Replace as necessary
• Avoid putting text into graphics
– Graphic: $20
– Text: $0.06
Images and Culture
• Beware your
biases—even
“good” ones.
Meet your friends on our
new social website for
India
Isn’t it Swell?
English is very succinct.
– Words in other languages
are often longer
– Sentences may be longer
– Characters may be larger
(taller, wider, or require a
bigger point size)
More Swollen Text
• 30% in length (alphabetics, abjads, etc.)
• 30% in height (ideographics)
• But… a rule of thumb, not a “fact”
– Measure your results with care.
GUI Layout
Managing English Text
String
Building??
Abbrev.
Eng.
String is the Thing
String is the
Thing?
Dereferencing
• Minimize sentence building
• Minimize arguments per string
• Use subject:predicate wherever possible
Don’t do this:
Your balance is $100.00.
When you can do this:
Balance: $100.00
Dynamic vs. Static Layout
• Magic numbers
• Externalized layouts
• Mnemonics
• Colors
Localizing Styles
• Bolding is not universal for emphasis
– Italicization, Capitalization, etc. are also not
universal (some scripts don’t have these
attributes)
• Use Logical not Presentational names
– Describe the function not the appearance. For
example, use “emphasis” instead of “italics”.
中国, emphasized two ways:
Amikake (halftone shading)
Wakiten (emphasis dots beside the characters)
Use of Color
“Going Down”
“Going Up”
Keyboards
Input Method Editors
Some languages require software to
assemble keystrokes into characters
• Asian languages with very large character sets
• Complex scripts with vowel-killers and other contextual editing requirements
Applications that interact directly with key-press events can disable or disrupt IME input.
• On- and over-the-spot editing
Customization
When is it okay?
• Content should be highly
localized or have locale-specific
requirements:
– customization lets you address
this requirement in the most
localized possible manner
Externalization again
Diagram: Your Application, surrounded by externalized dates, numbers, images, colors, addresses, local rules, etc.
Examples: local rules, regulatory requirements, postal addresses, default bookmark lists, your company’s customer service phone numbers
Externalization again
Locale-independent
global binary
Locale-dependent
resources
(includes code)
Customization Examples
• Postal address validation
• Postal code validation
• Telephone number formatter
• “Personality” questions
– blood type vs. sun sign
• Personal name formatter
– first/last position, space, highlighting, formality, etc.
• Tax codes and shipping schedules
Diagram: a Generic API with a Generic Implementation plus US, DE, and other (Impl) implementations
Example: Postal Addresses
US-centric schema:
address1 varchar(32)
address2 varchar(32)
city varchar(16)
state char(2)
zip char(5)
i18n schema:
country char(2)
address1 varchar(64)
address2 varchar(64)
city varchar(64)
province varchar(64)
postcode varchar(64)
public interface Address { … }
public class genericAddress implements Address { … }
public class USAddress extends genericAddress { … }
public class UKAddress extends genericAddress { … }
country=US, postcode='WC2 1GH' // error
country=UK, postcode='95111' // error
country=DE, postcode='1A4喪' // okay?
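A sketch of the generic-plus-customization shape, with deliberately simplified validation rules. The class names follow Java conventions rather than the slide’s, the `Addresses.of` factory is hypothetical, and the postcode patterns are illustrative, not complete. It uses GB, since that is the ISO 3166 code for the United Kingdom.

```java
import java.util.regex.Pattern;

interface Address {
    String country();
    String postcode();
    boolean hasValidPostcode();
}

// Generic implementation: keeps the postcode as free text and only rejects blanks.
class GenericAddress implements Address {
    private final String country;
    private final String postcode;

    GenericAddress(String country, String postcode) {
        this.country = country;
        this.postcode = postcode.trim();
    }

    public String country() { return country; }
    public String postcode() { return postcode; }
    public boolean hasValidPostcode() { return !postcode.isEmpty(); }
}

// US customization: ZIP or ZIP+4 (simplified).
class USAddress extends GenericAddress {
    private static final Pattern ZIP = Pattern.compile("\\d{5}(-\\d{4})?");
    USAddress(String postcode) { super("US", postcode); }
    @Override public boolean hasValidPostcode() { return ZIP.matcher(postcode()).matches(); }
}

// UK customization: a simplified "outward inward" shape such as "WC2 1GH".
class UKAddress extends GenericAddress {
    private static final Pattern POSTCODE =
            Pattern.compile("[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}");
    UKAddress(String postcode) { super("GB", postcode); }
    @Override public boolean hasValidPostcode() { return POSTCODE.matcher(postcode()).matches(); }
}

class Addresses {
    // Picks a customization by ISO 3166 country code, else the generic implementation.
    static Address of(String country, String postcode) {
        switch (country) {
            case "US": return new USAddress(postcode);
            case "GB": return new UKAddress(postcode);
            default:   return new GenericAddress(country, postcode);
        }
    }

    public static void main(String[] args) {
        System.out.println(of("US", "WC2 1GH").hasValidPostcode());
        System.out.println(of("GB", "WC2 1GH").hasValidPostcode());
    }
}
```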
Building Global Software
Beyond Just Coding:
Localization, QA, and all that
The internationalization cycle
• Encompasses the full
development cycle:
– Requirements
– Design
– Development
– QC
– Release
– Support
Diagram (cycle): Develop Roadmap (where is the product going?) → Develop Requirements & Architecture → Design (internationalized) → Code (Enable, externalize, modularize) → Test (non-English/non-ASCII) → RTM/GA (by market) → Support Issues and Requests (all customers)
What is “internationalization QA”?
• Does the enabled product work correctly?
– Non-English configurations
– Non-ASCII data and encoding support
– Cross time zone support
– Market specific features or customizations
• Does localization appear correctly?
– Is the product localizable?
What makes this different
from “regular” QA?
Growing (and Pruning) the Matrix
Include non-English configurations in your test
matrix; include non-ASCII data in your tests.
Be prepared to prune the test
matrix.
What to Test With
– Test Non-English configurations
• Non-English locales (lying to your machine)
• Native configurations (when does it make sense?)
– Test Non-ASCII data
• Encodings, encodings, everywhere
• Non-ASCII character values
– Test Across Time Zones
• Two or more time zones; consider international date
line (“it’s tomorrow in Japan”) and DST issues
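Non-ASCII test data is cheap to add to ordinary unit tests. A minimal sketch of a round-trip check (names hypothetical); the sample mixes Latin accents, CJK, and a character outside the Basic Multilingual Plane, which takes two UTF-16 code units in Java.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingCheck {
    // "Größe 日本語 " plus U+2000B, written as escapes so the source encoding cannot corrupt it.
    static final String SAMPLE = "Gr\u00f6\u00dfe \u65e5\u672c\u8a9e \uD840\uDC0B";

    // Does the text survive being encoded and decoded with this charset?
    static boolean survives(String text, Charset charset) {
        return new String(text.getBytes(charset), charset).equals(text);
    }

    public static void main(String[] args) {
        System.out.println("UTF-8:      " + survives(SAMPLE, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1: " + survives(SAMPLE, StandardCharsets.ISO_8859_1));
        System.out.println("length " + SAMPLE.length()
                + ", code points " + SAMPLE.codePointCount(0, SAMPLE.length()));
    }
}
```

The sample survives UTF-8 but not ISO-8859-1, and its `length()` (12) differs from its code point count (11).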
Planning Testing
Initially
• Get tools that are
enabled!
– Automation allows
greater coverage, but
only if it works.
• Plan encodings and
locales as part of the
test matrix.
• Acquire third-party
products as necessary.
Increasing Maturity
• Use test driven
development practices.
• Get developers to write
unit tests that are
internationalized.
• Put the ‘i18n’ bugs into
the regression suite.
Configuring Machines
Create both native and simulated environments:
– Native operating systems may have minor but
sometimes critical differences (folder names,
keywords, localized registry entries)
– Most features don’t run into native differences
(easier to work with English-localized machines)
– Don’t buy physical keyboards (use software
keyboards) unless your application relies on scan
codes from keys
Incorporate Localization
Localization is part of the release process too.
– Changes to the user interface cost the localization
team time and money.
– (Changes to the product cost the documentation
and QA folks too)
• May need to institute change control or a UI
freeze
Simultaneous Shipment (Simship)
Ideally, to maximize opportunity, ship the target
languages the same day as the source language.
– It might not make sense for your product.
– But it might not be as difficult as you think it is. It
might even be good for you.
Distribution of Content
• How does the localized text get into the
running product?
– Satellite assemblies, DLLs, shared libraries
– Message catalogs
– Special directory
– Database
– Etc.
More Distribution
• “Specific Language” (per-language): separate English, German, and French products
• “Language Included” (one or more languages): English, German, and French in one product
• “Language Pack” (product plus something): an English Global Binary, plus German and French packs
Completing the Product
• Static content is often under source control and
can be localized “normally”
• Dynamic content may include the initial set of
data or other items which need to be localized
beyond software.
– Demos and Demo Data
– Dictionary, Language add-ons
– Local offers, links to Web store, etc.
– Packaging
– Regulatory
Quality Checking and Development
Methodologies
• Translation is a human-oriented
task.
– Translation timelines are linear
with volume.
• Localized product should be
tested for functionality
– translation can break things
– usually the first language finds
most of the bugs
• Translations should be checked
for quality
• Development cycle has to
include time for translators and
quality assurance to catch up.
– This does not mean “no agile”
or “no changes”
– Do pilot language(s) or moving-target translation; do better UI
design and usability reviews;
etc.
Summary
Internationalization
… is a fundamental architectural approach: it is
how software is built.
– Design
– Enabling
– Externalization
– Customization
– Testing and Support
– Lifecycle
Q&A
Would you write the code for I18N on the
whiteboard before you go?
#define UNICODE
#import I18N.h
inter-locale.com