Strategies for Developing Non-English Websites
Elizabeth J. Pyatt
Instructional Designer
firstname.lastname@example.org
Education Technology Services

Supporting Multiple Languages
- Unpopular Language Support (Easy):
  • All English alphabet, all the time.
  • “Escribes vous Russki (Russian)? No”
- Preferred Language Support (Harder):
  • Display native scripts and punctuation
  • Display appropriate punctuation/symbols
  • «¿Escribes vous Русский? ¡Sí!»

Script versus Language
- Arabic script is used for Arabic, Ottoman Turkish, Persian (Farsi), etc.
- Cyrillic script is used for Russian, Ukrainian, Uzbek, Bulgarian, etc.
- Serbo-Croatian (1 language)
  • Cyrillic text = “Serbian”
  • Roman (English alphabet) text = “Croatian”
- Hindi-Urdu (also 1 language)
  • Hindi = Devanagari script / Urdu = Arabic script

Language of Scripts
- i18n = internationalization
- Roman/Latin alphabet = English alphabet
- Cyrillic = Russian alphabet
- RTL = Right to Left (e.g. Arabic/Hebrew)
- CJK = Chinese-Japanese-Korean
  • Chinese has the largest character count
- South Asian = Scripts of India (many)

Taxonomy of scripts (C = Consonant; V = Vowel)
- Alphabet: 1 letter = 1 vowel or consonant
  • Roman, Cyrillic, Greek, Runes, Georgian, Armenian, etc.
  • Typing: map single letters to characters
- Syllabary: 1 character = 1 CV syllable
  • Japanese, Cherokee, Ethiopic, Sumerian
  • Typing: map a CV sequence to a character (e.g. Japanese Katakana na-wa = ナワ)

Taxonomy of scripts, continued (C = Consonant; V = Vowel)
- Ideographic (Chinese): 1 character = 1 meaning
  • Symbols are combined to make compounds
  • Typing: map a CV sequence to a list of possible characters
  • Ideographic scripts can have a syllabary component
- Consonantal Syllabary: letters are consonants; vowels are diacritics on the consonants
  • Korean, Thai, languages of India, Cree, etc.
  • Typing uses CV sequences.
  • Fonts must alter characters depending on surrounding sounds. E.g.
    Susi = suis

Scripts & Encoding
- ASCII: assigns a number to each character
  • Excel formula =CHAR(65) results in “A”
- Modern encodings expand the repertoire beyond ASCII, but with inconsistent implementations across platforms and scripts
- Know the encoding for your script/language. You need it for debugging.

Some Notable Encodings
- Latin 1 (ISO-8859-1)
  • English, most of Western Europe, Africa, Pacific Islands, Native American languages
- Latin 2 (ISO-8859-2) (Latin 3/Latin 4…)
  • Central Europe (Hungarian, Polish, Czech)
- Big5 (Chinese only), Shift-JIS (Japanese only), etc.
- “ISO” vs. “Windows” parallel encodings (e.g. Hebrew)
  • ISO-8859-8 (Visual Hebrew)
  • Windows-1255 (Windows Hebrew) (also MacHebrew)
  • Parallel ISO/Windows encodings exist for many scripts (Arabic, Cyrillic, etc.)
- Unicode (super encoding, all scripts)
  • “Exotic Latin alphabet”: Welsh, Hawaiian, Old Irish, etc.
  • Also Chinese, Japanese, Cyrillic, Arabic, Hebrew, Greek…

Now What Do I Do?
- Step 1 - Select target languages (don’t forget English)
- Step 2 - Determine which encoding supports each language.
- Step 3 - Develop a properly encoded page. Aim for Unicode (even for English).
- Step 4 - Declare encoding & language in the HTML (meta tag and lang attribute)

How do I get properly encoded text?
- Latin 1 (English, Spanish, French, German)
  • Use entity codes (e.g. &ntilde; for ñ)
  • Declare encoding
- Major World Language
  • Set up keyboards
  • Type in text editor/HTML editor
  • Declare encoding & language
- Undersupported Language
  • Get correct fonts/keyboards or “PDF it”.

Character Codes (Latin 1 Languages)
- Apply to “Western European” languages only
- Always use for backwards compatibility
- Some examples:
  • Accent codes - e.g. &ntilde; = ñ
  • Punctuation - e.g. &copy; = ©
  • Old Math - e.g.
    &deg; = °
  • New Math (recent browsers only)
    &Sigma; = Σ    &int; = ∫    &sigma; = σ    &ne; = ≠

Encoding & Language Tags
- Set encoding in header
  • Latin 1
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
  • Unicode
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  • Shift_JIS (Japanese)
    <meta http-equiv="Content-Type" content="text/html; charset=shift_jis">
- Declare page language (ISO-639 code)
  • English-U.S.
    <html lang="en-us">
  • Spanish/French/German/Japanese document
    <html lang="es">
    fr = French, de = German, zh = Chinese, ja = Japanese, etc.
  • Spanish P (or any HTML text tag)
    <p lang="es">

Challenge Set 1:
- How do you insert the name José Espiño into HTML?
- How do you declare the language Spanish? (multiple options)
- What encoding is needed? (assume an English page with a Spanish word)

Stray Unicode Characters
- You can hard-code a four-digit Unicode numeric code to force a character to appear.
  • E.g. Cyrillic “D” Д = &#1044; (decimal) or &#x0414; (hex)
- Best used for small spans of text or “exotic” Latin characters
- If you use the hex version, add the “x” prefix and leading zeros (to make 4 digits total)
- Set encoding to “utf-8” with a meta tag

Challenge 2:
- How do you insert ¿Escribes vous Русский? ¡Sí! into HTML? (Note: 1st letter capital in Cyrillic)
- How do you declare the page to be Unicode?

Setting Up Keyboards for Other Scripts
- Activate required keyboards from Control Panel (Windows) or System Preferences (OS X)
- You may need to install language utilities for East Asian and other unusual scripts from the system disk
- Quick Demo

Typing with Encoded Fonts
- Keyboarding utilities that match the “keys” to the right encoded number must be installed.
- Keyboards can arrange one encoding in several layouts
  • QWERTY (AKA “transliterated/phonetic”): preferred by U.S. students
  • Native layout (native script typewriters): preferred by native speakers (e.g. instructors)

Dreamweaver/Front Page: Options for Inputting Text
- Switch keyboard (editor may add meta tag)
- Type
- Or cut and paste encoded text
- Or import from international text editors via Save As HTML
  • Global Writer (Windows)
  • Simple Text (free from Apple)
  • Others for specific scripts
- Avoid import from Word
- Mini Demo 2

Challenge 3 (Research):
- What encodings can I use for Russian?
  http://ourworld.compuserve.com/homepages/PaulGor/
  http://www.brama.com/compute/encode.html
- How about Modern Greek vs. Ancient Greek?
  http://www.hri.org/fonts/
  http://www.stoa.org/unicode/quickstart.html

Undersupported Scripts: Ultimate Challenge
- “Undersupported” = minority languages, ancient/medieval languages, small populations
- Third-party utilities may be needed
  • Unicode font (TrueType .ttf format)
  • Keyboard utility (if you can get it)
  • Print font for PDFs (the last resort)
- Test, Test, Test (esp. Mac vs. Win)

Print Font vs. Web Font
- Print Font
  1. Replaces ASCII characters with random characters
  2. Both parties must have the same font to read the document correctly
  3. Ideal for print/PDF documents when no data transmission occurs
  4. E.g. Symbol, Webdings
- Web Font
  1. Complies with some encoding (e.g. ASCII)
  2. Alternative fonts with the same encoding can be used (e.g. Times or Arial)
  3. Ideal for Web transmission, still difficult for typing purposes
  4. E.g. Arial Unicode, Lucida Sans Unicode, Lucida Grande, TITUS Cyberbit (free), etc.

When Websites Show Gibberish
- Problem: No encoding specified (you see gibberish)
  • Go to the View menu and manually switch the encoding
- Problem: No HTML entity codes for accents (you see gibberish for accented letters)
  • Try switching View to Latin 1, Windows-1252, MacRoman, or UTF-8 (Unicode)

ANGEL & Other Web Tools
1. Activate keyboards for needed scripts
2. See http://tlt.psu.edu/suggestions/international/keyboards
3. Open Netscape 7/Mozilla
4. Go to ANGEL or other Web tool
5. Switch keyboards
6. Type!
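
The gibberish described under “When Websites show Gibberish” appears when bytes written in one encoding are read as another. The sketch below is only an illustration under stated assumptions: Python is not used anywhere in these slides, the helper names `misread` and `candidates` are hypothetical, and the sample text is borrowed from Challenge Set 1. It reproduces the problem and then tries the same encodings the View menu offers.

```python
# Illustrative sketch only: reproduce the "gibberish" a browser shows when it
# decodes a page with the wrong encoding, then try several candidate encodings.
# Function names and sample text are hypothetical, not part of the slides.

def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text in one encoding, then decode the bytes as another."""
    return text.encode(written_as).decode(read_as, errors="replace")


def candidates(raw: bytes,
               encodings=("utf-8", "iso-8859-1", "windows-1252", "mac_roman")):
    """Decode raw bytes under each encoding, like trying View-menu options."""
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None  # bytes are not valid in this encoding
    return results


sample = "José Espiño"

# A UTF-8 page read as Latin 1: each accented letter becomes two characters.
garbled = misread(sample, "utf-8", "iso-8859-1")
print(garbled)

# Try each encoding in turn and inspect which one gives readable text.
page_bytes = sample.encode("utf-8")
for name, decoded in candidates(page_bytes).items():
    print(f"{name:>13}: {decoded}")
```

Only a human reader (or a known sample word) can say which decoding is the right one, which is why declaring the encoding in the page is still the real fix.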
Users can view in Netscape 7/Mozilla, IE5+ (Win) or Safari (OS X)

Where to Find Out More
- Penn State Computing with Accents
  http://tlt.psu.edu/suggestions/international
- Titus Cyberbit Unicode Font (free)
  http://titus.uni-frankfurt.de/indexe.htm
  Look under “Instrumentalia”

¡Escribez Русский!
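
As a worked illustration of the “Stray Unicode Characters” slide and the two challenges, the following sketch converts non-ASCII characters into HTML numeric codes in decimal or four-digit hex form. It is a minimal sketch under stated assumptions: Python is not part of the original slides, and the helper name `to_numeric_refs` is hypothetical. It does not escape `&` or `<`, so it is meant for short spans of plain text only.

```python
# Illustrative sketch only: convert non-ASCII characters to HTML numeric
# character codes (decimal or hex). The function name is hypothetical.
import html


def to_numeric_refs(text: str, hex_form: bool = False) -> str:
    """Replace every non-ASCII character with &#N; or &#xHHHH;."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)                  # plain ASCII passes through
        elif hex_form:
            out.append(f"&#x{code:04X};")   # "x" prefix, padded to 4 digits
        else:
            out.append(f"&#{code};")        # decimal form
    return "".join(out)


print(to_numeric_refs("Д"))                 # &#1044;
print(to_numeric_refs("Д", hex_form=True))  # &#x0414;
print(to_numeric_refs("José Espiño"))       # Jos&#233; Espi&#241;o

# Round trip: decoding the codes gives back the original text.
phrase = "¿Escribes vous Русский? ¡Sí!"
encoded = to_numeric_refs(phrase)
print(encoded)
print(html.unescape(encoded) == phrase)     # True
```

Python's built-in `"Д".encode("ascii", "xmlcharrefreplace")` produces the same decimal form as bytes, so the helper above is mainly useful when the hex form is wanted.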