Language Technology @ CIIL
Prof. Udaya Narayan Singh
Central Institute of Indian Languages
Set up on July 17, 1969
Located in Mysore, Karnataka
Overall Structure
Functions under the Department of Secondary & Higher
Education, Ministry of Human Resource Development
Guided by a Governing Committee chaired by the
Hon’ble HRM
Headed by a Director
Assisted by seven Deputy Directors
Supported by seven Principals of RLCs
Administered with the help of an Assistant Director
Main Objectives
 Advises and assists both Central and State Governments in
matters of language
 Promotes all Indian languages by creating content and
 Protects and documents minor, minority and tribal languages
 CCCK program for officials in Karnataka
 Radio courses in Hindi for listeners
 Offers 3-month courses in Communication
 Orientation Courses for Mother-tongue teachers
 Refresher Courses under Academic Staff College
 Organizes more than 100 international and national
seminars/workshops
Regional Language Centres
Promote Linguistic harmony by teaching
15 Indian languages to non-native learners
 10-month L2 teaching: 8000 teachers trained
 National Integration Camps and Refresher courses
 Distance Courses in Tamil/Telugu/Bengali/Urdu
 Originally conceived as only four RLCs, in the four
corners of India, with the following aims
 NRLC at Patiala to handle Kashmiri, Urdu & Panjabi
 SRLC, Mysore to handle all four Dravidian languages
 WRLC at Pune to handle Marathi, Sindhi & Gujarati
 ERLC to handle Oriya, Bengali & Assamese
 Two more were added later: UTRC at Solan in 1973
and UTRC at Lucknow in 1981.
 The latest addition is the NERLC at Gauhati, in 1999
Human Resource
Language Specialists
Information Scientists
Hardware Persons
Software Persons
Supporting Staff
Own printing press with all the facilities
Published 515 books
Intensive Courses
Common Vocab.
Apni Boli (for KVS)
Pict. Glossaries
Bibliographies, etc.
Some other achievements
Archived data of 118 languages
Studied 80 Tribal/Border languages
Cassette Courses in Four Languages
Kashmiri on the net
Radio courses in Hindi through Kannada
 150-node LAN set up at CIIL and separate 10-node
LANs at NRLC and ERLC
 Itanium Web server and database server at CIIL for
launching sites
 High speed V-SAT connection through STPI
 Analog audiotick computerized lab at SRLC and ERLC
 Digital audiotick computerized labs at NRLC
 2400 Electronic Journals acquired for CIIL & RLCs
 Browsing section in the library
Web based language resources
Spoken language corpus
The Speech Science lab has the following hardware and software
Computerized Speech Lab. Model 4100
Developed by: Kay Elemetrics Corp.
Lincoln Park, N. J. 07035-1488.
Software (dependent on CSL Hardware)
1. Computerized Speech Lab Main Programme, Version 2.5.2
2. Real-Time Spectrogram, Model 5129, Version 2.5.2
3. Video Phonetics Program and Database, Model 5150, Version 2.5.2
4. Multi-Dimensional Voice Program, Model 5105, Version 2.5.2
5. Multi-Dimensional Voice Program Advanced, Model 5105, Version 2.5.2
6. Real-Time Pitch, Model 5121, Version 2.5.2
7. Analysis Synthesis Laboratory, Model 5104, Version 2.5.2
Software (without any hardware dependency)
1. Multi-Speech Signal Analysis Workstation, Model 3700, Version 2.5.2
2. Real-Time Spectrogram, Model 5129, Version 2.5.2
3. Video Phonetics Program and Database, Model 5150, Version 2.5.2
4. Real-Time Pitch, Model 5121, Version 2.5.2
5. Analysis Synthesis Laboratory, Model 5104, Version 2.5.2
Speech Production and Perception (CD-ROM developed by Sensimetrics)
Branches of study in Speech Science
 Articulatory Phonetics
 Experimental Phonetics
 Biological & Clinical Linguistics
 Speech Technology
 Forensic Phonetics
Phonetic Readers
Angami, Ao-Naga, Balti, Bengali, Brokskat, Gojri,
Gujarati, Kashmiri, Khasi, Kota, Kurux, Kuvi,
Ladakhi, Lotha, Manipuri, Mishmi, Mundari, Sema,
Shina, Tangkhul-Naga, Thaadou, Tripuri
Major Events
 International Institute of Phonetics
 Seminar-cum-Workshop on Voice Modulation and Culture
 Workshop on Aspiration
 Seminar on Voice Quality
 Workshop on Nasalization
 Workshop on Multilingual Speech Analysis and Synthesis
 Instrumental Analysis of Phonetic Features across Major Indian Languages
 Analysis of Retroflex Sounds, etc.
Training / orientation programmes in phonetics
for the teachers from
Tamil Nadu
Himachal Pradesh
Uttar Pradesh
Jammu & Kashmir
Arunachal Pradesh
Madhya Pradesh
Web based language resources
Text corpora in major and minor Indian languages
Web based Indian Languages Grammars
Web based Indian Language Courses
Web based books and journals
Web based Translation services
In collaboration with Sahitya Akademi & NBT
Electronic journal - Translation Today and
Tools for translation
Electronic dictionaries
Annotated corpus & tools
Parallel corpora
Translational dictionaries
Cultural Glossaries
Word finders
Technical terminologies
Linguistic Data Consortium for
Indian Languages (LDC-IL)
Takes advantage of the giant strides in
Information Technology
Model: Linguistic Data Consortium (LDC) hosted by the
University of Pennsylvania, USA.
Budget: one crore per year, ten crore over ten years.
Funded by the Ministry of Human Resource Development
Preliminary discussion held in:
International Workshop on Creation of Linguistic Data
Consortium for Indian Languages on August 16-17, 2003.
Meeting of the lead institutions to create LDC-IL on August 18,
2003 at IISc, Bangalore.
LDC-IL will focus on:
Becoming a repository of linguistic resources in all
Indian languages in the form of text, speech and
lexical corpora.
Facilitating creation of such databases by different
member organizations.
Setting standards for data collection and storage of
corpora for different research and development activities.
Supporting development and sharing of tools for data
collection and management.
Facilitating training through workshops,
seminars etc. in technical as well as process
related issues.
Creating and maintaining the LDC-IL website
that would be the primary gateway for accessing
LDC-IL resources.
Designing or providing help in creation of
appropriate language technology for mass use.
Providing the necessary linkages between
academic institutions, individual researchers
and the masses
Major areas of languages covered:
Speech corpora
Handwritten corpora
Text corpora including parallel corpora
Natural Language Processing
Several by-products such as lexicons, thesauri, etc.
Participating Institutions:
Indian Institute of Science, Bangalore,
Indian Institute of Technology, Bombay,
Indian Institute of Technology, Madras,
International Institute of Information Technology, Hyderabad
ISI Calcutta; TIFR Mumbai; HP Labs India; BM; C-DOT;
C-DAC; Tata InfoTech; all other IITs; KHS; NCPUL;
Rashtriya Sanskrit Sansthan; TDIL, MIT
All academic institutes, research organizations and
corporate R&D groups from India and abroad
working on Indian languages will be encouraged to
participate in LDC-IL.
Different Indian universities with major departments of
Linguistics and Computer Science/Artificial Intelligence
Web Based Language Information Services
General Information
 Language/ Area Profile:
Geolinguistic; Sociolinguistic; Cultural; Literary
 Language/Area History:
Genealogical; Archaeological; Cultural; Textual
 Language Vitality:
Attitudinal; Utilitarian; Socio-political;
 Grammatical Information:
Phonetic; Graphemic; Phonological;
Morphological; Lexical; Syntactic; Semantic
Biblio search
Link to LIS site
Website for Modern Indian Literary Classics
in Translation
In collaboration with Sahitya Akademi and NBT
 To promote the celebrated Indian fiction writers of
the last 150 years, both within the country and abroad,
through a series of initiatives.
 A library of 100 major contemporary works of fiction
in English and several other European languages.
Digital Library and Manuscriptorium
Special Library with linguistics and allied disciplines
as focus
 Over 65000 books
 Subscription to over 270 journals
 Subscription to 4200 online journals
 Back volumes of all the journals
 7 RLC libraries with collections in Indian languages
 Has CDs (worth 50 lakhs) in Indian languages in digital form
 Library automation through VTLS package
 Bhasha Bharati will have display galleries as well as
scanned copies of writings.
 Audio and video tapes of interviews
 Lecture notes and recordings
 Their own as well as professional recitations
 Films, tele-films and serials
 Documentaries
Bhasha Bharati
will also house and create hypertexts of Indian-language
classics.
It will provide a service to members of the public, who may
visit in person or virtually and seek answers
to their queries.
It will handle questions on different topics, ranging
from the interpretation of a literary or religious text
to information on a speech group,
or even on a word or an expression.
Web based information on Indian Scripts
Linguistic Integration Project of India
LIPIKA will promote greater understanding among
Indian people, produce useful learning materials,
create web-based information.
LIPIKA will show unity in India's apparently
diverse writing systems.
LIPIKA will also help generate software with
necessary tools such as spell-checkers and grammar
checkers.
Task 1
Preparation of
a brief history of various writing systems of India,
such as Brahmi, Kharosthi, etc.;
a learners' manual (aimed at both foreigners and
Indians) on the structure of the syllabic writing
systems prevalent in India, including a
comparison of the apparently divergent scripts used by
Indian languages today.
(a) Preparation of a CD/Video version of the
Learners' manual, based on the expertise of CDAC/NCST/CIIL
(b) Placing the learning software in the public
domain, for propagation of Indian writing systems.
(a) Creation of new fonts and images for
Devanagari and a few other major Indian writing
systems through a series of workshops involving
(i) calligraphists,
(ii) print making experts,
(iii) computer experts,
(iv) creative persons
Some of the important collaborators of CIIL
 All IITs, IIIT Hyderabad, IISc., SIDA
 Government of Karnataka
MGI-CIIL from
 Andaman & Nicobar
 Government of Singapore
NCPUL and many
 Lancaster University
HP Labs
University of Hyderabad
Delhi Univ
NBT
Sahitya Akademi
Konkani Academy
Dogri Sansthan
Karnataka Nataka
Director’s Speech