Strategies for Developing
Non-English Websites
Elizabeth J. Pyatt
Instructional Designer
[email protected]
Education Technology Services
Supporting Multiple Languages
Unpopular Language Support (Easy):
All English Alphabet, all the time.
“Escribes vous Russki (Russian)? No”
Preferred Language Support (Harder):
Display native scripts and punctuation
Display appropriate punctuation/symbols
«¿Escribes vous Русский? ¡Sí!»
Script versus Language
Arabic Script used for – Arabic, Ottoman
Turkish, Persian (Farsi), etc.
Cyrillic Script used for – Russian,
Ukrainian, Uzbek, Bulgarian, etc.
Serbo-Croatian (1 language)
Cyrillic Text = “Serbian”
Roman (English alphabet) Text = “Croatian”
Hindi-Urdu (also 1 language)
(Hin = Devanagari / Urd = Arabic script)
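A minimal Python sketch of the script/language distinction. The helper name script_of is made up for illustration, and it guesses the script from the first word of each character's Unicode name, which is a rough heuristic rather than an official script lookup. Russian and Serbian words both come back as Cyrillic, so the script alone does not identify the language.

```python
import unicodedata

def script_of(text):
    """Guess the scripts used in text from Unicode character names (heuristic)."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            # e.g. "CYRILLIC CAPITAL LETTER DE" -> "CYRILLIC"
            scripts.add(unicodedata.name(ch).split()[0])
    return scripts

print(script_of("Русский"))   # a Russian word
print(script_of("Српски"))    # a Serbian word in the same script
print(script_of("Hrvatski"))  # a Croatian word in Roman script
```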
Language of Scripts
i18n = internationalization
Roman/Latin alphabet = English alphabet
Cyrillic = Russian
RTL = Right to Left (e.g. Arabic/Hebrew)
CJK = Chinese-Japanese-Korean
Chinese has largest character count
South Asian = Scripts of India (many)
Taxonomy of scripts
C = Consonant; V = Vowel
Alphabet - 1 letter = 1 vowel or consonant
Roman, Cyrillic, Greek, Runes, Georgian,
Armenian, etc
Typing - map single letters to character
Syllabary - 1 character = 1 CV syllable
Japanese, Cherokee, Ethiopic, Sumerian
Typing - map CV sequence into character
(e.g. Japanese Katakana na-wa = ナワ)
Taxonomy of scripts
C = Consonant; V = Vowel
Ideographic (Chinese) - 1 character / 1 meaning
Symbols combined to make compounds
Typing - map CV sequence to list of possible characters
Ideographic scripts can have syllabary component
Consonantal Syllabary - letters are consonants;
vowels are diacritics on C’s
Korean, Thai, languages of India, Cree, etc.
Typing uses CV sequences. Fonts must alter
characters depending on surrounding sounds
E.g. Susi = suis
Scripts & Encoding
ASCII - assign a number to a character
Excel Formula =CHAR(65) results in “A”
Modern Encoding expands the repertoire
beyond ASCII but with inconsistent
implementations for different
Know the encoding for your
script/language. Needed for debugging.
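The Excel formula has a direct Python equivalent. The sketch below shows the character/number mapping in both directions; the helper is_ascii is a hypothetical name used only for this illustration.

```python
# Each character is stored as a number (its code point).
print(ord("A"))  # the number assigned to "A"
print(chr(65))   # the character assigned to 65, like Excel's =CHAR(65)

def is_ascii(text):
    """True if every character fits in the 0-127 ASCII range."""
    return all(ord(ch) < 128 for ch in text)

# Accented and non-Roman letters fall outside ASCII,
# which is why a larger encoding is needed for them.
print(is_ascii("Russki"))
print(is_ascii("Русский"))
```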
Some Notable Encodings
Latin 1 (ISO-8859-1)
English, Most W. Europe, Africa, Pacific Is., Nat. American
Latin 2 (ISO-8859-2) (Latin 3/Latin 4…)
Central Europe (Hungarian, Polish, Czech)
Big5 (Chinese only), Shift-JIS (Japanese only), etc.
“ISO” vs. “Windows” Parallel Encodings (e.g. Hebrew)
• ISO-8859-8 (Visual Hebrew)
• Windows-1255 (Windows Hebrew) (also MacHebrew)
• Parallel ISO/Windows for many scripts (Arabic, Cyrillic, etc)
Unicode (Super Encoding, all scripts)
“Exotic Latin Alphabet” - Welsh, Hawaiian, Old Irish etc.
Also Chinese, Japanese, Cyrillic, Arabic, Hebrew, Greek…
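One way to check which encoding supports a given text is simply to try encoding it. This is a sketch in Python using its built-in codecs; the function name supports and the sample words are chosen here for illustration.

```python
def supports(text, encoding):
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

samples = {"Spanish": "Espiño", "Hungarian": "Erdős", "Russian": "Русский"}
for language, word in samples.items():
    for encoding in ("iso-8859-1", "iso-8859-2", "utf-8"):
        print(language, encoding, supports(word, encoding))
```

Only UTF-8 accepts all three samples, which is the practical argument for treating Unicode as the default.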
Now What do I do?
Step 1 - Select target languages (don’t
forget English)
Step 2 - Determine which encoding
supports language.
Step 3 - Develop properly encoded page.
Aim for Unicode (even English).
Step 4 - Declare encoding & language in
HTML Meta tags
How do I get properly encoded
Latin 1 (English, Spanish, French,
Use entity codes (e.g. &ntilde; for ñ)
Declare encoding
Major World Language
Set up keyboards
Type in text editor/HTML editor
Declare encoding & language
Undersupported Language
Get correct fonts/keyboards or “PDF it”.
Character Codes (Latin 1 Langs)
Applies to “Western European” languages only
Always use for backwards compatibility
Some examples:
Accent codes - e.g. &ntilde; = ñ
Punctuation - e.g. &copy; = ©
Old Math - e.g. &deg; = °
New Math (recent browsers only)
&Sigma; = Σ
&int; = ∫
&sigma; = σ
&ne; = ≠
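Entity codes like these can be generated rather than looked up by hand. The sketch below uses the entity table in Python's standard html.entities module; the function name to_entities is hypothetical, and it deliberately leaves ASCII untouched (so it does not escape &, < or >).

```python
import html
from html.entities import codepoint2name

def to_entities(text):
    """Replace non-ASCII characters with named HTML entities where one exists."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                        # plain ASCII stays as typed
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # named entity, e.g. &ntilde;
        else:
            out.append(f"&#{cp};")                 # no name: numeric fallback
    return "".join(out)

encoded = to_entities("José Espiño")
print(encoded)
print(html.unescape(encoded))  # round-trips back to the original text
```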
Encoding & Language Tags
Set encoding in header
Latin 1
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
Unicode (UTF-8)
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Shift_JIS (Japanese)
<meta http-equiv="Content-Type" content="text/html; charset=shift_jis">
Declare Page Language (ISO-639 code)
<html lang="en-us">
Spanish/French/German/Japanese Document
<html lang="es">
fr = French, de = German, zh = Chinese, ja = Japanese, etc.
Spanish P (or any HTML text tag)
<p lang="es">
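The tags above only describe the page; the file itself still has to be saved in the declared encoding. A minimal Python sketch, assuming a hypothetical helper named build_page, that writes both declarations and then encodes the text to match:

```python
def build_page(body, lang="en-us", charset="utf-8"):
    """Return a small HTML page that declares its language and encoding."""
    return (
        f'<html lang="{lang}">\n'
        "<head>\n"
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">\n'
        "</head>\n"
        f"<body>{body}</body>\n"
        "</html>\n"
    )

page = build_page('<p lang="es">¡Sí!</p>')
data = page.encode("utf-8")  # the saved bytes must match the declared charset
print(page)
```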
Challenge Set 1:
How do you insert the name José Espiño
into HTML?
How do you declare the language
Spanish? (multiple options)
What encoding is needed? (Assume an English
page with a Spanish word.)
Stray Unicode Characters
You can hard-code a four-digit Unicode
numeric code to force a character to
appear. E.g. (Cyrillic “D” Д = &#1044; or
&#x0414; (hex))
Best used for small spans of text or
“exotic” Latin characters (e.g. a#/a()
If you use the hex version, add the “x” prefix
and a leading zero (to make 4 digits)
Set encoding to “utf-8” with meta-tag
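The decimal and hex forms can be computed from the character itself. This Python sketch (numeric_refs is a name invented for the example) produces both forms, padding the hex version to four digits as described above.

```python
import html

def numeric_refs(text, hex_form=False):
    """Replace non-ASCII characters with numeric character references."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif hex_form:
            out.append(f"&#x{cp:04X};")  # "x" prefix, zero-padded to 4 digits
        else:
            out.append(f"&#{cp};")
    return "".join(out)

print(numeric_refs("Д"))                 # decimal form
print(numeric_refs("Д", hex_form=True))  # hex form
print(html.unescape("&#1044;"))          # decodes back to the Cyrillic letter
```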
Challenge 2:
How do you insert the phrase ¿Escribes
vous Русский? ¡Sí! into HTML?
(Note: 1st letter capital in Cyrillic)
How do you declare the page to
be Unicode?
Setting Up Keyboards for Other
Activate required keyboards from Control
Panel (Windows) or System Preferences (OS X)
You may need to install language utilities
for East Asian and other unusual scripts
from the System Disk
Quick Demo
Typing with Encoded Fonts
Keyboarding utilities which match the
“keys” to the right encoded number must
be installed.
Keyboards can arrange one encoding in
several layouts
QWERTY (AKA “transliterated/phonetic”)
• Preferred by U.S. students
Native layout (native script typewriters)
• Preferred by native speakers (e.g. instructors)
Dreamweaver/FrontPage:
Options for Inputting Text
Switch keyboard (editor may add meta tag)
Or cut and paste encoded text
Or Import from international text editors via
Save As HTML
Global Writer (Windows)
SimpleText (free from Apple)
Others for specific scripts
Avoid import from Word
Mini Demo 2
Challenge 3 (Research):
What encodings can I use for
How about Modern Greek vs.
Ancient Greek?
Undersupported Scripts
Ultimate Challenge
“Undersupported” = minority languages,
ancient/medieval, small populations
Third Party utilities may be needed
Unicode font (TrueType .ttf format)
Keyboard Utility (if you can get it)
Print Font for PDFs (the last resort)
Test, Test, Test (esp. Mac vs. Win)
Print Font
1. Replaces ASCII characters with random
2. Both parties must have same font to read document correctly
3. Ideal for print/PDF documents when no data transmission occurs
4. E.g. Symbol, Webdings
Web Font
1. Complies with some encoding (e.g. ASCII)
2. Alternative fonts with same encoding can be used (e.g. Times or Arial)
3. Ideal for Web transmission, still difficult for typing
4. E.g. Arial Unicode, Lucida Sans Unicode, Lucida Grande, TITUS Cyberbit (free) etc.
When Websites show Gibberish
Problem: No Encoding Specified (see
Go to View menu and manually switch
Problem: No HTML entity codes for
(See gibberish for accented letters)
Try switching View to Latin 1, Windows-1252,
MacRoman, UTF-8 (Unicode)
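Switching the View encoding amounts to re-reading the same bytes under a different assumption. The Python sketch below reproduces the gibberish by decoding UTF-8 bytes with the encodings listed above; try_encodings is a hypothetical helper, and cp1252 and mac_roman are Python's codec names for Windows-1252 and MacRoman.

```python
def try_encodings(raw, encodings):
    """Decode the same bytes under several encodings and collect the results."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None  # these bytes are not valid in this encoding
    return results

raw = "José Espiño".encode("utf-8")  # a page saved as UTF-8
guesses = try_encodings(raw, ["iso-8859-1", "cp1252", "mac_roman", "utf-8"])
for enc, text in guesses.items():
    print(enc, text)
```

Only the guess that matches how the page was actually saved shows the accented letters correctly; the others show two wrong characters per accented letter.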
ANGEL & Other Web Tools
1. Activate keyboards for needed scripts
2. Open Netscape 7/Mozilla
3. Go to ANGEL or other Web tool
4. Switch keyboards
Users can view in Netscape 7/Mozilla, IE5+
(Win) or Safari (OSX)
Where to Find Out More
Penn State Computing with Accents
Titus Cyberbit Unicode Font (free)
Look under “Instrumentalia”
¡Escribez Русский!
