Brain and Communication
Mainz
Friday, 24 November 2000
Computers that read, hear
and understand
Prof. Wolfgang Wahlster
German Research Center for
Artificial Intelligence, DFKI GmbH
Stuhlsatzenhausweg 3
66123 Saarbruecken, Germany
phone: (+49 681) 302-5252/4162
fax: (+49 681) 302-5341
e-mail: [email protected]
WWW: http://www.dfki.de/~wahlster
Pervasive Speech and Language Technology
A cappuccino in
10 minutes, please!
Speech-controlled
coffee machine
Let's go to Baker
Street in Berkeley!
Speech-based
car navigation
I would like to hear
Mozart's piano concerto
No. 3!
Speech-enabled
music selection
Send the following email to
Mark Maybury: Hi Mark,
please forward the following
agenda to your project
partners!
Dictation
© Wolfgang Wahlster, DFKI
Pervasive Speech and Language Technology
Show me all CNN news of the
last 3 months that feature Bill
Clinton discussing health
care!
What has Jim Hendler said
about DAML during our
recent Dagstuhl seminar?
I would like to make an
appointment with
Dr. Kurematsu in Kyoto next
week!
Information on demand
Audio Mining
Speech-to-Speech
Translation
Three Levels of Language Processing
Speech Input
Speech Recognition (Acoustic Language Models, Word Lists): What has the speaker said? 100 Alternatives
Speech Analysis (Grammar, Lexical Meaning): What has the speaker meant? 10 Alternatives
Speech Understanding (Discourse Context, Knowledge about Domain of Discourse): What does the speaker want? Unambiguous Understanding in the Dialog Context
Reduction of Uncertainty
Challenges for Language Engineering
(rows ordered by increasing complexity)
Input Conditions | Naturalness | Adaptability | Dialog Capabilities
Close-Speaking Microphone/Headset, Push-to-talk | Isolated Words | Speaker Dependent | Monolog Dictation
Telephone, Pause-based Segmentation | Read Continuous Speech | Speaker Independent | Information-seeking Dialog
Open Microphone, GSM Quality | Spontaneous Speech | Speaker Adaptive | Multiparty Negotiation
Context-Sensitive Speech-to-Speech Translation
Wann fährt der nächste
Zug nach Hamburg ab?
When does the next
train to Hamburg depart?
Wo befindet sich
das nächste
Hotel?
Where is the nearest
hotel?
Verbmobil
Server
Mobile Speech-to-Speech Translation of
Spontaneous Dialogs
Verbmobil Speech
Translation Server
Solution: Conference Call: The Verbmobil Speech Translation Server
is accessed by GSM mobile phones.
Speech-to-Speech Translation
The Control Panel of Verbmobil
General Speech Recognition Task
Audio Signal
Recognizers
Word Hypotheses Graph
German
English
Japanese
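A word hypotheses graph can be pictured as a small directed graph whose edges carry competing word candidates. The following is only a minimal sketch under stated assumptions: the edge list, the node numbering, and the scores are invented for illustration and are not Verbmobil data or its actual search procedure.

```python
# Hypothetical word hypotheses graph: (start_node, end_node, word, score).
# Scores are made-up log-probabilities; higher (closer to 0) is better.
EDGES = [
    (0, 1, "I", -0.1),
    (1, 2, "need", -0.3),
    (1, 2, "neat", -1.2),
    (2, 3, "a", -0.2),
    (3, 4, "car", -0.4),
    (3, 4, "card", -0.9),
]


def best_path(edges, start, goal):
    """Return (score, words) for the best-scoring path from start to goal.

    Assumes node ids increase along every edge, so visiting edges sorted
    by start node is a valid topological order.
    """
    best = {start: (0.0, [])}
    for u, v, word, score in sorted(edges):
        if u not in best:
            continue  # node u is not reachable from the start node
        candidate = (best[u][0] + score, best[u][1] + [word])
        if v not in best or candidate[0] > best[v][0]:
            best[v] = candidate
    return best.get(goal)


score, words = best_path(EDGES, 0, 4)
print(" ".join(words))
```

Later processing stages can then rescore or prune such a graph instead of committing early to a single word sequence.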
Extracting Statistical Properties from Large Corpora
Transcribed
Speech Data
Segmented
Speech
with Prosodic
Labels
Annotated
Dialogs with
Dialog Acts
Treebanks &
Predicate-Argument
Structures
Aligned
Bilingual
Corpora
Machine Learning
for the Integration of Statistical Properties into
Symbolic Models for Speech Recognition, Parsing,
Dialog Processing, Translation
Hidden
Markov
Models
Neural Nets,
Multilayered
Perceptrons
Probabilistic
Automata
Probabilistic
Grammars
Probabilistic
Transfer
Rules
The Use of Prosodic Information
at All Processing Stages
Speech Signal
Word Hypotheses Graph
Multilingual Prosody Module
Prosodic features:
- duration
- pitch
- energy
- pause
Boundary
Information
Boundary
Information
Sentence
Mood
Accented
Words
Prosodic Feature
Vector
Search Space
Restriction
Dialog Act
Segmentation and
Recognition
Constraints for
Transfer
Lexical
Choice
Speaker
Adaptation
Parsing
Dialog
Understanding
Translation
Generation
Speech
Synthesis
The Understanding of Spontaneous Speech Repairs
Original Utterance
Editing Phase
Repair Phase
I need a car next Tuesday
oops
Monday
Hesitation
Reparans
Reparandum
Recognition of
Substitutions
Transformation of the
Word Hypothesis Graph
I need a car next Monday
Verbmobil technology understands speech repairs and extracts the
intended meaning.
Dictation systems such as ViaVoice, VoiceXpress, FreeSpeech, and NaturallySpeaking
cannot deal with spontaneous speech; they transcribe
the corrupted utterances.
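The substitution idea (reparandum, editing term, reparans) can be sketched on a plain word string. This is a deliberately naive illustration, not the Verbmobil method, which operates on the word hypothesis graph. It assumes a hypothetical list of editing terms, that the reparans runs to the end of the utterance, and that it replaces exactly as many words as it contains.

```python
# Assumed cue words that mark the editing phase (illustrative only).
EDITING_TERMS = {"oops", "uh", "äh"}


def repair(utterance):
    """Replace the reparandum by the reparans at the first editing term.

    Naive assumptions: the reparans is everything after the editing term,
    and the reparandum is the same number of words just before it.
    """
    tokens = utterance.replace(",", " ").split()
    for i, token in enumerate(tokens):
        if token.lower() in EDITING_TERMS:
            reparans = tokens[i + 1:]
            kept = tokens[:max(i - len(reparans), 0)]
            return " ".join(kept + reparans)
    return " ".join(tokens)  # no editing term found: nothing to repair


print(repair("I need a car next Tuesday oops Monday"))
print(repair("We are meeting in Mannheim, oops, in Saarbruecken."))
```

A real system would also have to decide how far the reparandum extends, which is where prosodic and syntactic cues come in.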
Automatic Understanding and Correction of Speech
Repairs in Spontaneous Telephone Dialogs
Wir treffen uns in
Mannheim, äh,
in Saarbrücken.
(We are meeting in
Mannheim, oops,
in Saarbruecken.)
German
English
We are meeting
in Saarbruecken.
Spoken Dialogs about Schedules
Fielded applications
- Train schedules (German Railway System, DB)
- TABA (Philips): +49 241 60 40 20
- OSCAR (DaimlerChrysler): +49 1805 99 66 22
- Flight Schedules (Lufthansa)
- ALF (Philips): +49 1803 00 00 74
Technical Challenges: phone-based dialogs, many proper names, clarification
subdialogs
Linguatronic: Spoken Dialogs with Mercedes-Benz
Please call Doris Wahlster.
Microphone
Open the left window in the back.
Push-to-talk
Switch
I want to hear the weather channel.
When will I reach the next gas station?
Where is the next parking lot?
- Speech control of: cellular phone, radio, windows / AC, route guidance system
- Option for S-, C-, and E-Class of Mercedes and BMW
- Speaker-independent, garbage models for non-speech (blinker, AC, wheels)
Speech-based Interaction with an Organizer
on a WAP Phone (Voice In - WML out)
With Maier
on 25 October,
with Tetzlaff,
and with Streit too.
Oops, not with Streit.
From 2 to 3.
Okay!
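The organizer dialog above fills an appointment step by step and lets the user retract an entry. As a rough sketch only, this behaviour can be imitated with a small slot-filling function. The regular expressions, the slot names, and the `update` helper are assumptions made for illustration; they are not part of the system shown on the slide.

```python
import re


def update(appointment, utterance):
    """Update a hypothetical appointment record from one user utterance."""
    # Names that the user retracts ("not with X").
    removed = re.findall(r"\bnot with (\w+)", utterance, flags=re.I)
    # Strip the retractions so they are not counted as additions.
    text = re.sub(r"\bnot with \w+", "", utterance, flags=re.I)
    for name in re.findall(r"\bwith (\w+)", text, flags=re.I):
        appointment["participants"].add(name)
    for name in removed:
        appointment["participants"].discard(name)
    date = re.search(r"\bon (\d{1,2} \w+)", text, flags=re.I)
    if date:
        appointment["date"] = date.group(1)
    time = re.search(r"\bfrom (\d{1,2}) to (\d{1,2})", text, flags=re.I)
    if time:
        appointment["time"] = (int(time.group(1)), int(time.group(2)))
    return appointment


appointment = {"participants": set(), "date": None, "time": None}
for turn in [
    "With Maier on 25 October, with Tetzlaff, and with Streit too.",
    "Oops, not with Streit.",
    "From 2 to 3.",
]:
    update(appointment, turn)
print(appointment)
```

Keeping the appointment as mutable dialog state is what allows a later turn to correct an earlier one without restarting the dialog.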
Augmented Reality: Combining Speech, Gestures and
Graphics for Mobile Access to a Digital Library
Mobile Dialog with a Virtual Tourist
Guide for the Heidelberg Castle
Location-adaptive
Query Interpretation
Augmented Reality: Combining Speech, Gestures and
Graphics for Mobile Access to a Digital Library
Multimodal Route Description
Mobile Speech Translation and
Multilingual Information Access
Augmented Reality: Combining Speech, Gestures and
Graphics for Mobile Access to a Digital Library
Speech-based Access
to 3D Virtual Views
Multimodal Output from
a Digital Library and
Speech-based Access
to Internet Content
International Research Trends in Multilingual Systems
Multilingual Language Technology
Speech Recognition, Language Understanding, Language Generation,
and Speech Synthesis
Dialog Translation
- Call Centers
- E-Commerce
- Mobile Travel Assistance
- Telephone Translations
Verbmobil
Multilingual Indexing and Annotation of Videos
- Video Archives
- News Archives
Multilingual Audio Retrieval and Audio Mining
- Discussions
- Lecture Notes
- Organizers
Speech-based Web Access to Multilingual Web pages
- WAP Phones
- WebTV
Multilingual and Mobile Communication Assistants
- Multimodal Interfaces
SmartKom
Spontaneous Speech, Robust Processing and Translation, Semantic and Pragmatic Understanding
Open Problems for the Next Decade
- Problems with current machine learning approaches
  - Expensive data collection
  - Cognitively unrealistic training data
  - Data sparseness
- Problems with current hand-crafted knowledge sources
  - Brittleness
  - Domain dependence
  - Limited scalability
A Speculative Conclusion (+50 years)
-500 years: Oral Society
News and knowledge are passed orally
No mass storage
No automatic processing
No automatic retrieval
TODAY: Textual Society
News and knowledge are passed textually
Mass storage of texts
Text Processing
Text Retrieval
+50 years: Oral Society
News and knowledge are passed orally
Mass storage of speech
Speech Processing
Audio Retrieval