Multilingual HLT in Europe and
the development of ASR
Louis C.W. Pols
Institute of Phonetic Sciences
University of Amsterdam
The Netherlands
PRASA2001 – Franschhoek, South Africa
30 Nov. 2001, keynote
Some history





Liesbeth Botha spent half a year at our
institute during the second half of 1996
ever since, the possible organization of a
workshop or a major conference in South
Africa has been considered
(cancelled) AST Workshop on ‘Human
Language Technologies for E-Governance
in a Multilingual Society’, Stellenbosch
PRASA2001 – Franschhoek, 29-30 Nov.,
incl. Speech Processing and AST project
I always wanted to visit South Africa!
30 Nov. 2001
PRASA2001 - Franschhoek
2
Overview








Multilingual Europe (vs. Multilingual South Africa)
EU Framework Programs; Human Language
Technology (HLT)
Other (European) programs and organizations
ISCA
Dutch speech database initiatives (vs. AST)
Speech science and technology; ASR development
Academia (knowledge) and industry (applications)
Conclusions
Multilingual Europe

Europe (West, Central, East)
EU-countries
Candidate-EU-countries
Schengen countries (internally no boundary control)
Euro countries (300 M people)



many nations and even more languages
multilingual community and (open) market
e-commerce, telebanking, infokiosk, etc.
EU Framework Program FP5





Human Language Technologies RTD (HLT)
http://www.hltcentral.org/
part of Information Society Technologies (IST),
Key Action III (Multimedia Contents and Tools)
part of fifth Framework Program ’98-’02 (FP5)
IST 3600 M€ (26.5% of FP5); HLT 125 M€
HLT: Multilingual communication
Natural Interactivity
Cross-lingual information management
Support & Accompanying Measures
6th Framework program





FP6 (’02-’06) the way forward
proposal published Febr. 2001
one of 7 priority themes:
Information Society Technologies
also networks of excellence
IST budget 3600 M€
Complaints from academia





too much application & user oriented
little room for research (reaction Commission: it is
time for HLT to show its usefulness!),
but .... pendulum swings!
speech data not freely available
(only with delay and at (high) costs via ELRA)
still: several very interesting projects
we participated before (SAM, EuroCocosda,
somewhat in SpeechDat) but barely do so anymore;
(KPN Research and) Nijmegen University still do
Some HLT ‘speech’ projects







C-ORAL-ROM Integrated Reference Corpora for Spoken Romance
Languages (1/01, 36 mo)
CORETEX Improving Core Speech Recognition Technology (4/00, 36 mo)
I-EYE Interacting with Eyes: Gaze Assisted Access to Information in
Multiple Languages (1/00, 30 mo)
NESPOLE! NEgotiating through SPOken Lang. in E-comm. (1/00, 30 mo)
SIRIDUS Specification, Interaction and Reconfiguration In Dialogue
Understanding Systems (1/00, 36 mo)
SMADA Sp. Driven Multimodal Automatic Directory Assist. (1/00, 36 mo)
(finalizing ITRW ’Advanced ASR for Telecom Appl.’, Nov. 2002, Avignon)
SPEECON Sp. Driven Interfaces for Consumer Applications (2/00, 24 mo)
Some ‘past’ HLT projects

ARISE Automatic Railway Systems for Europe (10/96, 24 mo)

CAVE Caller Verification in Bank and Telecommunication (11/95, 24 mo)

EAGLES Expert Advisory Group on Language Engineering Standards
(11/97, 24 mo)

ELRA European Language Resources Association (9/95, 50 mo)

ELSE Evaluation in Language and Speech Engineering (1/98, 16 mo)



SPEECHDAT Speech Databases for Creation of Voice Driven Teleservices
(3/96, 34 mo)
SPEECHDAT-CAR (3/98, 30 mo) + variants
VODIS Advanced Speech Technologies for Voice-operated Driver
Information Systems (11/95, 43 mo)
Some HLT ‘support’ projects




CLASS Collaboration in Language and Speech Science
and technology
(Int. WS on ‘Information Presentation and Natural
Multimodal Dialogue’, Verona Italy, Dec 14-15, 2001)
ELSNET-HLT The European Network of Excellence in
Human Language Technologies
HOPE HLT Opportunity Promotion in Europe, Euromap
ISLE-HLT Int. Standards for Language Engineering
(Eagles follow-up) incl. I/O Meta Data Initiative
(IMDI), see also COREX
eContent



eContent part of eEurope initiative
European Digital Content on the Global
Networks, ’01-’05, 100 M€, 1st call 3/2001
Action Line 2 (AL2) addresses the intersection of the
content and language industries, more specifically the design,
production and distribution of high-quality European digital
content for the global networks in an increasingly multilingual
and multicultural socio-economic environment

http://www.hltcentral.org/econtent/
MLIS

Multilingual Information Society Program




Supporting the creation of a framework of
services for European language resources
Encouraging the use of language technologies,
resources and standards
Promoting the use of advanced language tools in
the Community and Member States public sector
one call in June ’99, 15 M€, some 30 proj.

e.g. NL-TRANSLEX: Machine Translation for Dutch
and English/French/German
INTAS




International Association for the promotion of cooperation with scientists from the New Independent
States of the former Soviet Union (NIS)
established June 1993
Open + Thematic Call 2000 (budget 16 M €)
max budget 150 k€/project (max 30 k€/NIS partner)

INTAS 915 ‘Spontaneous Speech of Typologically Unrelated
Languages (Russian, Finnish and Dutch): Comparison of
Phonetic Properties’ (90 k€, 7/01, 36 mo)
Euromap

HLT Opportunity Promotion in Europe (HOPE)
(2/00, 24 mo, 8 national focus points)
to raise awareness of the benefits of human language
technologies (HLT) with companies, organizations and
users; to accelerate technology transfer from the research
base to the market; to stimulate community building in
specific domains (tourism and e-commerce).

General: http://www.hltcentral.org/euromap/
Dutch site: http://www.taalunieversum.org/tst/en/
European Language
Resources Association

A non-profit organization to promote the creation,
verification, and distribution of language resources.








US counterpart: LDC
173 resources sold in 2000.
organizer of LREC conferences (third one in May 2002 in
Las Palmas, Spain)
speech & related resources ~200
written resources ~145
terminological resources
tools and software
http://www.icp.grenet.fr/ELRA/home.html
ELSNET





European Network of Excellence in Human
Language Technologies
one of the ~20 networks within FP5
Transfer of knowledge and expertise; Shared goals;
Evaluation; Shared language resources; Promotion
of best practice; Interoperability by means of
standardization
yearly Elsnet Summer Schools:
July 15-26, 2002 Odense, Denmark, ‘Evaluation
and Assessment of Text and Speech Systems’
Newsletter Elsnews; http://www.elsnet.org
COCOSDA



Internat. organization for coordinating the globalized efforts
in spoken language resources and sp. technology evaluation
yearly, jointly, with Eurospeech and ICSLP since Chiavari,
Italy, Sept. ’91 (Eurosp.’91) and before; Oriental Cocosda
topic domains







Evaluation of Speech Underst. and Dialogue Systems (W. Minker)
Multi-modal corpora (S. Nakamura)
Justus Roux
Corpus Annotation Tools (S. Bird)
Local Languages (D. Gibbon)
regional programs (Europe; Asia; Oceania; Africa; Latin America)
data center representatives (LDC, S. Bird; ELRA, K. Choukri)
http://www.itl.atr.co.jp/cocosda
COCOSDA matrix
COST

European Cooperation in the field of Scientific and
Technical Research
(~60 k€ per action, for additional costs only):






COST 249: Continuous Speech Recognition over the
Telephone (19 countries; start 5/94; 6 yrs; final report)
COST 250: Speaker Recognition in Telephony
COST 258: The Naturalness of Synthetic Speech
COST 277: Nonlinear Speech Processing
COST 278: Spoken Language Interaction in Telecommun.
http://cost.cordis.lu/src/home.cfm
EURESCOM


the European Institute for Research and
Strategic Studies in Telecommunications
20 shareholders from 19 European countries
(major European network operators and
service providers)

e.g. MUST - MUltimodal, multilingual information
Services with small mobile Terminals (P1104)
ISCA








European Speech Comm. Association founded in ’88
from ESCA to ISCA at Eurospeech’99 in Budapest
membership organization
organizer of Eurospeech/ICSLP - Interspeech
organizer of specialized workshops (ITRWs)
Special interest groups (SIGs)
Speech Communication Journal
(http://www.elsevier.com/locate/specom)
http://www.isca-speech.org/
Eurospeech-ICSLP-Interspeech
no.  odd years (Eurospeech)   even years (ICSLP)
     (in Europe)              (elsewhere)
1    Paris      ’89           Kobe          ’90
2    Genoa      ’91           Banff         ’92
3    Berlin     ’93           Yokohama      ’94
4    Madrid     ’95           Philadelphia  ’96
5    Rhodes     ’97           Sydney        ’98
6    Budapest   ’99           Beijing       ’00
7    Aalborg    ’01           Denver        ’02
8    Geneva     ’03           Seoul         ’04
9    Lisbon     ’05           ??            ’06
past: up to and including Aalborg ’01; future: from Denver ’02 onwards
ISCA SIGs









Speech Synthesis - SynSig
Audio Visual Speech - AVISA
Speech And Language Technology for MInority Languages - SALTMIL
Integration of Speech Technology in (Language) Learning - InSTIL
SPeaker and Language Characterization - SPLC
Education in the Field of Speech Communication - EduSIG
Speech Prosody - SProSIG
Dialogue Processing - SigDial (also within ACL)
Groupe Francophone de la Communication Parlée - GFCP
ISCA ITRWs (forthcoming)




Prosody in Speech Recognition and Understanding - Prosody 2001
Molly Pitcher Inn, Red Bank, NJ. October 22-24, 2001
TIPS - Temporal Integration in the Perception of Speech
Aix-en-Provence, France, 8-10 April 2002
Multi-Modal Dialogue in Mobile Environments
Kloster Irsee, Germany, June 17-21, 2002
Advanced ASR for Telecom Applications
Palais des Papes, Avignon, France, November 27-29, 2002
Supported but not organized by ISCA:


2001 International Workshop on Automatic Sp. Recogn. and Underst.
Madonna di Campiglio (Trento), Italy, December 9-13, 2001
Speech Prosody 2002
Aix-en-Provence, France, 11-13 April, 2002
IEEE

IEEE Signal Processing Society
MMSP’01, Workshop on Multimedia Signal Processing,
Cannes, France, October 3-5, 2001
ASRU’01, Automatic Speech Recognition and Understanding
Workshop, Madonna di Campiglio (Trento), Italy,
December 9-13, 2001
2002 International Workshop on Multimedia Signal
Processing, US Virgin Islands, December 9-11, 2002
IEEE Trans. on Signal Processing / Speech and
Audio Processing / Multimedia / Neural Networks
 http://www.ieee.org/

DARPA NIST

DARPA Projects and Yearly evaluations





CSR (Continuous Speech Recognition);
LVCSR (Large Vocabulary Conversational Speech
Recognition);
ATIS (Air Travel Information System);
Language Recognition (Identification and
Verification);
Speaker Recognition (Identification and
Verification)
NATO-ASI






ASI = Advanced Study Institute
many different domains
certain restrictions on NATO vs. non-NATO
participants, free registration, some funding
Dynamics of Speech Production and Perception, Il
Ciocco, Italy, June 23 – July 6, 2002
send application before Jan. 15, 2002 to
[email protected]
Organizing Committee: Pierre L. Divenyi & Klára Vicsi
European national programs




German Verbmobil; SmartKom (since 9/99)
Bavarian Archive for Speech Signals (BAS)
Spoken Dutch Corpus
French AUP
Swedish Centre for Speech Technology (CTT)
Swedish National Graduate School in
Language Technology (GSLT)
Dutch speech database initiatives








Speech Processing Expertise Center SPEX
5,000 speakers Polyphone
1,000 speakers SpeechDat + variants
NWO Priority program TST-OVIS (public
transportation information system over telephone)
1,000 hrs CGN (Dutch-Flemish)
5.5 hrs ‘open source’ IFA-corpus
TST Platform
ToDI (Transcription of Dutch Intonation)
Spoken Dutch Corpus

4.6 M€, 5 yrs, 10 M words, ~ 1000 hrs of speech











Corpus design and compilation
Recording and digitization
Orthographic transcription (all)
Lemmatization and POS tagging (all)
Lexicon link-up (all)
Broad phonetic transcription (1 M)
Word segmentation (1 M)
Syntactic annotation (1 M)
Prosodic annotation (250 k)
Development of exploitation software COREX
http://lands.let.kun.nl/cgn/home.htm
IFA corpus




5.5 hrs of high-quality-recorded speech
4 male and 4 female speakers
more than 30 min. per speaker
various speaking styles per speaker
from conversational and read speech, to isolated
sentences, words and syllables



everything phonemically segmented & labeled
free access via SQL query language
http://www.fon.hum.uva.nl/IFAcorpus
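To give an impression of what SQL access to a phonemically segmented corpus can look like, here is a minimal sketch in Python using an in-memory SQLite database. The table name `segments`, its columns, the speaker labels and all timing values are hypothetical and chosen only for illustration; they are not the actual IFA corpus schema or data.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a few made-up phoneme segments."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per labeled phoneme segment.
    conn.execute(
        "CREATE TABLE segments ("
        "speaker TEXT, style TEXT, phoneme TEXT, start_s REAL, end_s REAL)"
    )
    # Invented example rows (speaker, speaking style, phoneme, start, end).
    rows = [
        ("F1", "read", "i", 0.10, 0.18),
        ("F1", "read", "i", 0.50, 0.60),
        ("M1", "conversational", "i", 0.20, 0.26),
        ("M1", "conversational", "a", 0.30, 0.40),
    ]
    conn.executemany("INSERT INTO segments VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def mean_duration_ms(conn, phoneme):
    """Return (style, mean duration in ms) per speaking style for one phoneme."""
    query = (
        "SELECT style, AVG(end_s - start_s) * 1000.0 "
        "FROM segments WHERE phoneme = ? "
        "GROUP BY style ORDER BY style"
    )
    return conn.execute(query, (phoneme,)).fetchall()


if __name__ == "__main__":
    connection = build_demo_db()
    for style, mean_ms in mean_duration_ms(connection, "i"):
        print(f"{style}: {mean_ms:.0f} ms")
```

A query of this kind (segment durations grouped by speaking style) is one example of how segmented and labeled speech data can be explored without writing dedicated software.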
Speech science and
speech technology


we should try to bridge that gap
see my keynotes at ICPhS ’99 and Eurospeech’01:
“Flexible, robust and efficient human speech processing
versus present-day speech technology”
“Acquiring and implementing phonetic knowledge”




we have to understand each other in order to be
able to communicate and to contribute
probabilistic vs. knowledge driven
adding (multiple) knowledge (sources) to improve
performance
much knowledge in speech databases
Phonetics ↔ Speech Techn.
AFFINITY (from → to)
from phonetics to phonetics: source / filter; individuality; context; prosody
from phonetics to speech technology: human performance; specific knowledge; regularities; multiple features
from speech technology to phonetics: more data; new models; probabilities; speech vs. NLP
from speech technology to speech technology: EU FPV, DARPA; applications; user orientation; evaluation
Do recognizers need
intelligent ears?




intelligent ears → front-end pre-processor
only if it improves performance
humans are generally better speech
processors than machines; perhaps system
developers can learn from human behavior
robustness at stake (noise, reverberation,
incompleteness, restoration, competing
speakers, variable speaking rate, context,
dialects, non-nativeness, style, emotion)
What is (phonetic) knowledge?






phonetic textbook knowledge
probabilistic knowledge from databases
fixed set of features vs. adaptable set
trading relations, selectivity
knowledge of the world, expectation
global vs. detailed
How good is
human/machine speech recogn.?
corpus                    description              vocabulary        recognition   % word error
                                                   size              perplexity    machine   human
TI digits                 read digits              10                10            0.72      0.009
alphabet                  read letters             26                26            5         1.6
Resource Management       read sentences           1,000             60-1,000      17        2
NAB                       read sentences           5,000-unlimited   45-160        6.6       0.4
Switchboard CSR           spontaneous telephone    2,000-unlimited   80-150        43        4
                          conversations
Switchboard wordspotting  idem                     20 keywords       -             31.1      7.4
Adapted from Lippmann (SpeCom, 1997)
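The size of the human/machine gap is easier to see as a ratio of error rates. The sketch below only does that arithmetic on the word error percentages listed above; the function and variable names are illustrative and not from the original slides.

```python
# Word error rates in % (machine, human) as listed on the slide,
# adapted from Lippmann (SpeCom, 1997).
ERROR_RATES = {
    "TI digits": (0.72, 0.009),
    "alphabet": (5.0, 1.6),
    "Resource Management": (17.0, 2.0),
    "NAB": (6.6, 0.4),
    "Switchboard CSR": (43.0, 4.0),
    "Switchboard wordspotting": (31.1, 7.4),
}


def error_ratio(machine, human):
    """Return how many times larger the machine error is than the human error."""
    return machine / human


def ratios(table):
    """Map each corpus name to its machine/human error ratio."""
    return {corpus: error_ratio(m, h) for corpus, (m, h) in table.items()}


if __name__ == "__main__":
    for corpus, ratio in ratios(ERROR_RATES).items():
        print(f"{corpus:26s} machine/human error ratio: {ratio:5.1f}")
```

On these figures the machine error is several times to almost two orders of magnitude larger than the human error, depending on the task.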
Human vs. machine (ASR)


machine surprisingly good for certain tasks
machine could be better for many others


robustness, outliers
what are the limits of human performance?



in noise
for degraded speech
missing information (trading)
Human word intelligibility vs. noise
[Figure: word intelligibility as a function of noise. Annotations: humans start to have some trouble; recognizers do have trouble!]
Robustness to degraded speech


speech = time-modulated signal in frequency bands
relatively insensitive to (spectral) distortions



temporal smearing of envelope modulation



prerequisite for digital hearing aid
modulating spectral slope: -5 to +5 dB/oct, 0.25-2 Hz
ca. 4 Hz max. in modulation spectrum → syllable
LP>4 Hz and HP<8 Hz little effect on intelligibility
spectral envelope smearing

for BW>1/3 oct masked SRT starts to degrade
Robustness to degraded speech
and missing information

partly reversed speech (Saberi & Perrott, Nature, 4/99)
fixed duration segments time reversed or shifted
in time: perfect sentence intelligibility up to 50 ms
(demo: every 50 ms reversed; original)
low frequency modulation envelope (3-8 Hz) vs.
acoustic spectrum
syllable as information unit? (S. Greenberg)
gap and click restoration (Warren)

gating experiments
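The local time-reversal manipulation mentioned above (fixed-duration segments, each played backwards) can be sketched in a few lines. This is a minimal illustration that assumes consecutive, non-overlapping segments and a plain list of samples; it is not the exact procedure of Saberi & Perrott, and the function name and parameters are chosen here for illustration.

```python
def reverse_segments(samples, sample_rate_hz, segment_ms):
    """Time-reverse consecutive fixed-duration segments of a sampled signal.

    samples        -- sequence of sample values
    sample_rate_hz -- sampling rate in Hz
    segment_ms     -- segment duration in milliseconds (e.g. 50)
    """
    # Number of samples per segment, at least one.
    seg_len = max(1, int(round(sample_rate_hz * segment_ms / 1000.0)))
    out = []
    for start in range(0, len(samples), seg_len):
        segment = samples[start:start + seg_len]
        # Each segment is reversed in place; segment order is kept.
        out.extend(reversed(segment))
    return out


if __name__ == "__main__":
    # Toy signal: 10 samples at 1 kHz, reversed in 4 ms (= 4 sample) segments.
    toy = list(range(10))
    print(reverse_segments(toy, 1000, 4))
```

Because each segment is reversed independently, applying the function twice with the same settings restores the original signal, which is a convenient sanity check.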
Desired pre-processor
characteristics in ASR


basic sensitivity for stationary and dynamic sounds
robustness to degraded speech



robustness to noise and reverberation
filter characteristics



rather insensitive to spectral and temporal smearing
are BP, PLP, MFCC, RASTA, TRAPS good enough?
lateral inhibition (spectral sharpening); dynamics
what can be neglected?

non-linearities, limited dynamic range, active elements,
co-modulation, secondary pitch, etc.
Caricature of present-day
speech recognizers


fixed pre-processor, fixed features
trained with a variety of speech input




monaural, uni-modal input
pitch extractor generally not operational
performs well on average behavior



much global information, but ..... no interrelations
but ..... does poorly on any type of outlier (OOV, nonnative, fast or whispered speech, other communication
channel, new topic, new speaker)
neglects lots of useful (phonetic) information
heavily relies on language model
Useful information: durational variability
[Figure: tree of /iy/ durations, giving count, mean and s.d. (in ms) per node, split successively by the factors R, S, Lw and Lu. Root /iy/: count 4626, mean 95, s.d. 39. Labelled means: overall average = 95 ms; normal rate = 95; primary stress = 104; word final = 136; utterance final = 186.]
Adapted from Wang (1998)
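To make the size of this durational variability concrete, the sketch below expresses the mean durations labelled on the slide relative to the overall /iy/ mean and to the root standard deviation. Only the numbers come from the slide; the function names and the choice of these two measures are assumptions made here for illustration.

```python
# Values read from the slide (ms); root node of the /iy/ duration tree.
ROOT_MEAN_MS = 95.0
ROOT_SD_MS = 39.0

# Mean durations labelled on the slide for successive conditions.
LABELLED_MEANS_MS = {
    "normal rate": 95.0,
    "primary stress": 104.0,
    "word final": 136.0,
    "utterance final": 186.0,
}


def lengthening(mean_ms, reference_ms=ROOT_MEAN_MS):
    """Duration relative to the reference mean (1.0 means no change)."""
    return mean_ms / reference_ms


def z_score(mean_ms, reference_ms=ROOT_MEAN_MS, sd_ms=ROOT_SD_MS):
    """Distance from the reference mean in units of the root s.d."""
    return (mean_ms - reference_ms) / sd_ms


if __name__ == "__main__":
    for label, mean_ms in LABELLED_MEANS_MS.items():
        print(f"{label:16s} x{lengthening(mean_ms):.2f}  z={z_score(mean_ms):+.2f}")
```

On these figures an utterance-final /iy/ is on average roughly twice as long as the overall mean, which is the kind of systematic information a recognizer with fixed features tends to ignore.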
Academia (knowledge)
and industry (applications)







what do industry and universities expect from
each other? (panel discussion at Eurospeech’01)
proper education and training → E-masters
good exchange between academia & industry
participation in joint projects → speech DB
adapt to requirements → CAIP Symposium
open source approach → Linux, Praat, HTK
complaints: sometimes bad management and
high risk (puts HLT in bad spotlight, e.g. L&H)
Information Technology for
Homeland Security

Center for Advanced Information Processing,
CAIP Symposium, Rutgers Univ., Nov. 29



“subsequent to events of Sept. 11, CAIP modified its
traditional Annual Research Review”
“Symposium identifies issues in Homeland Security and
encourages research, particularly with university-industry
cooperation”
e.g., biometric and voice identification; fusing voice and
face data; multimodal interfaces for asset deployment;
face-tracking for identification; microphone array for
speaker tracking
E-masters in
Language and Speech

Course Content:








Theoretical Linguistics
Natural Language Processing
Phonetics and Phonology
Cognitive models for speech and language processing
Speech signal processing
Pattern recognition
Language engineering applications
http://www.cstr.ed.ac.uk/euromasters/
Conclusions





collecting speech corpora in national
languages (like in SA) is an excellent basis,
both for research and for applications
combine industrial and academic skills
make proper use of experiences elsewhere
that’s why we are all here at this workshop!
good luck and thank you for your attention