Interactive Robot
Theatre as a future toy
Integration of Machine Learning, Quantum
Networks and software-hardware methodology in
humanoid robots
Marek Perkowski, Dept. Electrical Engineering PSU, and
Department of Electronics and Computer Science,
Korea Advanced Institute of Science and Technology
Talk presented at Department of Electronics,
Technical University of Warsaw, December 2004
Toys are a very serious business
Talking Robots
• Many talking toys exist,
but they are still very
primitive
• Actors for robot theatre,
agents for advertisement,
education and
entertainment.
• Designing inexpensive
natural size humanoid
caricature and realistic
robot heads
Dog.com from Japan
We concentrate on Machine Learning
techniques used to teach robots
behaviors, natural language dialogs
and facial gestures.
Work in progress
Robot with a Personality?
• Future robots will interact
closely with non-sophisticated
users, children and the elderly, so
the question arises: what
should they look like?
• If a robot has a human face, then
what kind of face?
• Handsome or average, realistic
or simplified, normal size or
enlarged?
•The famous example
of a robot head
is Kismet from MIT.
• Why is Kismet so successful?
•We believe that a robot that will interact with humans
should have some kind of “personality” and Kismet so far
is the only robot with “personality”.
Robot face should be friendly and funny
The Muppets of Jim Henson are hard to match examples of
puppet artistry and animation perfection.
We are interested in
robot’s personality
as expressed by
its:
– behavior,
– facial gestures,
– emotions,
– learned speech patterns.
Behavior, Dialog
and Learning
Words communicate only about 35 % of the
information transmitted from a sender to a
receiver in a human-to-human
communication.
The remaining information is included in
para-language.
Emotions, thoughts, decisions and
intentions of a speaker can be
recognized earlier than they are
verbalized. NASA
• Robot activity as a mapping of the sensed environment and
internal states to behaviors and new internal states
(emotions, energy levels, etc).
• Our goal is to uniformly integrate verbal and non-verbal
robot behaviors.
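The mapping in the first bullet can be read as a Mealy-style transition function. The following is a minimal sketch in Python, assuming hypothetical sensor features (`light_left`, `sound_left`), a hypothetical internal state (`emotion`, `energy`) and invented behavior names; none of these identifiers come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sensed:
    # Hypothetical features extracted from the sensors.
    light_left: bool
    sound_left: bool


@dataclass(frozen=True)
class Internal:
    # Hypothetical internal state: an emotion label and an energy level.
    emotion: str
    energy: int


def step(sensed, state):
    """Map (sensed environment, internal state) to (behavior, new internal state)."""
    if state.energy <= 0:
        return "rest", Internal("tired", 0)
    if sensed.sound_left:
        return "turn_head_left", Internal("curious", state.energy - 1)
    if sensed.light_left:
        return "turn_head_right", Internal("annoyed", state.energy - 1)
    return "idle", Internal(state.emotion, state.energy)
```

Calling `step` repeatedly with fresh sensor readings yields a stream of behaviors while the internal state evolves, which is one simple way to keep verbal and non-verbal outputs under a single mapping.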
Morita’s Theory
Fig. 1. Learning Behaviors as Mappings from
environment’s features to interaction procedures
[Block diagram. Inputs: speech from microphones; image features from cameras; sonars and other sensors. Center: automatic software construction from examples (decision tree, bi-decomposition, Ashenhurst, DNF), together with emotions and knowledge memory; one arc is labeled "probability". Outputs: verbal response generation (text response and TTS); stored sounds; head movements and facial emotions generation; neck and shoulders movement generation.]
Robot Head Construction, 1999
Furby head with new control
Jonas
We animate various kinds of humanoid heads with 4 to 20
DOF, looking for comical and entertaining values.
Mister Butcher
Latex skin from
Hollywood
4 degree of
freedom neck
Robot Head Construction, 2000
Skeleton
Alien
We use inexpensive servos from Hitec and Futaba, plastic, plywood and
aluminum.
The robots are either PC-interfaced, use simple micro-controllers such as
Basic Stamp, or are radio controlled from a PC or by the user.
Technical Construction, 2001
Details
Adam
Marvin the Crazy Robot
Virginia Woolf
2001
heads equipped with microphones, USB cameras, sonars
and CDS light sensors
2002
Max
BUG (Big Ugly Robot)
Image processing and pattern recognition uses software developed at
PSU, CMU and Intel (public domain software available on WWW).
Software is in Visual C++, Visual Basic, Lisp and Prolog.
Visual Feedback and Learning based on
Constructive Induction
2002
2002, Japan
Professor Perky
Professor Perky with automated
speech recognition (ASR) and
text-to-speech (TTS) capabilities
• We compared several
commercial speech systems
from Microsoft, Sensory and
Fonix.
•Based on experiences in
highly noisy environments and
with a variety of speakers, we
selected Fonix for both ASR
and TTS for Professor Perky
and Maria robots.
1 dollar latex skin
from China
• We use microphone array
from Andrea Electronics.
Maria,
2002/2003
20 DOF
Construction
details of Maria
location
of head
servos
skull
location of
controlling
rods
Custom
designed skin
location
of remote
servos
Animation of eyes and eyelids
Software/Hardware Architecture
•Network- 10 processors, ultimately 100 processors.
•Robotics Processors. ACS 16
•Speech cards on Intel grant
•More cameras
•Tracking in all robots.
•Robotic languages – Alice and Cyc-like technologies.
Cynthia,
2004, June
Currently
the hands
are not
moveable.
We have a
separate
hand design
project.
HAHOE KAIST ROBOT THEATRE, KOREA,
SUMMER 2004
Sonbi, the Confucian Scholar
Paekchong, the bad butcher
Yangban
the
Aristocrat
and Pune
his
concubine
The Narrator
The Narrator
We base all
our robots on
inexpensive
radio-controlled
servo
technology.
We are
familiar with
latex and
polyester
technologies
for faces
New Silicone Skins
Probabilistic State Machines to describe
emotions
States: Happy state, Ironic state, Unhappy state
Transitions (input / response, probability):
“you are beautiful” / “Thanks for a compliment”, P=1
“you are blonde!” / “I am not an idiot”, P=0.3
“you are blonde!” / “Do you suggest I am an idiot?”, P=0.7
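A probabilistic state machine of this kind can be sketched in a few lines of Python. The phrases, responses and probabilities are taken from the slide; the source and target states of each arc (`happy` to `unhappy` or `ironic`) are an assumption, since the flattened diagram does not show them, and the function name `react` is hypothetical.

```python
import random

# (state, heard phrase) -> list of (probability, reply, next state).
# Arc endpoints are assumed; only phrases, replies and probabilities are from the slide.
TRANSITIONS = {
    ("happy", "you are beautiful"): [
        (1.0, "Thanks for a compliment", "happy"),
    ],
    ("happy", "you are blonde!"): [
        (0.3, "I am not an idiot", "unhappy"),
        (0.7, "Do you suggest I am an idiot?", "ironic"),
    ],
}


def react(state, phrase, rng=random):
    """Pick one outgoing arc at random, weighted by its probability."""
    options = TRANSITIONS.get((state, phrase))
    if options is None:
        return "", state  # unknown input: say nothing and stay in the same state
    weights = [p for p, _, _ in options]
    _, reply, next_state = rng.choices(options, weights=weights, k=1)[0]
    return reply, next_state
```

Passing a seeded `random.Random` instance as `rng` makes a performance reproducible, which is convenient when rehearsing a scene.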
Facial Behaviors of Maria
Maria asks: Do I look younger than twenty-three?
Response “yes”: Maria smiles
Response “no”: Maria smiles (P=0.3) or Maria frowns (P=0.7)
Probabilistic Grammars for performances
Who?
– Speak “Professor Perky”, blinks eyes twice (P=0.1)
– Speak “Professor Perky” (P=0.3)
– Speak “Doctor Lee” (P=0.5)
Where?
– Speak “in some location”, smiles broadly (P=0.5)
– Speak “In the classroom”, shakes head (P=0.5)
What?
– Speak “Was singing and dancing” (P=0.1)
– ….
– Speak “Was drinking wine” (P=0.1)
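Sampling from such a grammar can be sketched as follows. This is an illustration only: the three slots and the spoken phrases follow the slide, but the weights are illustrative values chosen for the example, and the names `GRAMMAR` and `perform` are hypothetical.

```python
import random

# slot -> list of (weight, speech, gesture or None).
# Weights are illustrative, not the values from the slide.
GRAMMAR = {
    "Who": [
        (0.5, "Doctor Lee", None),
        (0.3, "Professor Perky", None),
        (0.2, "Professor Perky", "blinks eyes twice"),
    ],
    "Where": [
        (0.5, "in some location", "smiles broadly"),
        (0.5, "in the classroom", "shakes head"),
    ],
    "What": [
        (0.5, "was singing and dancing", None),
        (0.5, "was drinking wine", None),
    ],
}


def perform(rng=random):
    """Generate one performance as a list of (speech, gesture) steps."""
    steps = []
    for slot in ("Who", "Where", "What"):
        options = GRAMMAR[slot]
        weights = [w for w, _, _ in options]
        _, speech, gesture = rng.choices(options, weights=weights, k=1)[0]
        steps.append((speech, gesture))
    return steps
```

Each call produces a different sentence with synchronized gestures, so repeated performances of the same scene do not look identical.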
Human-controlled modes of
dialog/interaction
[State diagram. Modes: Robot performs, Robot asks, Human asks, Human teaches, Human commands. Spoken commands labeling the transitions: “Hello Maria”, “Stop performance”, “Question”, “Thanks, I have a question”, “Questioning finished”, “Thanks, I have a lesson”, “Lesson finished”, “Thanks, I have a command”, “Command finished”.]
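The mode switching can be sketched as a small finite-state machine keyed on the spoken commands. The command strings are from the slide; which modes each arc connects is an assumption inferred from the dialog slides, and the `idle` start state and the function names are hypothetical.

```python
# (current mode, spoken command) -> next mode.
# Arc endpoints are assumed; "idle" is a hypothetical start state.
MODE_TRANSITIONS = {
    ("idle", "Hello Maria"): "robot_asks",
    ("robot_performs", "Stop performance"): "robot_asks",
    ("robot_asks", "Question"): "human_asks",
    ("robot_asks", "Thanks, I have a question"): "human_asks",
    ("human_asks", "Questioning finished"): "robot_asks",
    ("robot_asks", "Thanks, I have a lesson"): "human_teaches",
    ("human_teaches", "Lesson finished"): "robot_asks",
    ("robot_asks", "Thanks, I have a command"): "human_commands",
    ("human_commands", "Command finished"): "robot_asks",
}


def next_mode(mode, utterance):
    """Return the new mode, or stay in the current one if the command does not apply."""
    return MODE_TRANSITIONS.get((mode, utterance), mode)


def run(utterances, mode="idle"):
    """Feed a sequence of utterances through the mode machine."""
    for utterance in utterances:
        mode = next_mode(mode, utterance)
    return mode
```

Ignoring commands that do not apply in the current mode keeps the robot from leaving a lesson or a questioning session by accident.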
Robot-Receptionist Initiated
Conversation
Operation mode: Robot asks
Robot: What can I do for you?
Human: I would like to order a table for two
Robot: Smoking or nonsmoking?
Human: I do not understand
Robot: Do you want a table in a smoking or non-smoking section of the restaurant? Non-smoking section is near the terrace.
Human: A table near the terrace, please
Human-Initiated Conversation
Operation mode: Robot asks
Human: Hello Maria (initialization)
Robot: What can I do for you?
Human-Asking
Mode change: Robot asks → Human asks on “Question”
Human: Question
Robot: Yes, you ask a question.
Human: What book wrote Lee?
Robot: I have no sure information.
Human: Try to guess.
Robot: Lee wrote book “Flowers”.
Human: This is not true.
Human ends questioning
Human-Teaching
Mode changes: Human asks → Robot asks on “Questioning finished”; Robot asks → Human teaches on “Thanks, I have a lesson”
Human: Questioning finished
Robot: What can I do for you?
Human: Thanks, I have a lesson
Robot: Yes
Human: I give you question-answer pattern
Robot: Yes
Human: Question pattern: What book Smith wrote?
Robot: Yes
Human: Answer pattern: Smith wrote book “Automata Theory”
Robot: Yes
Human: Checking question: What book wrote Smith?
Robot: Smith wrote book “Automata Theory”
Human: I give you question-answer pattern
Robot: Yes
Human: Question pattern: Where is room of Lee?
Robot: Yes
Human: Answer pattern: Lee is in room 332
Human-Checking what robot
learned
Mode changes: Human teaches → Robot asks on “Lesson finished”; Robot asks → Human asks on “Question”
Human: Lesson finished
Robot: What can I do for you?
Human: Question
Human-Asking
Robot: Yes, you ask a question.
Human: What book wrote Lee?
Robot: I have no sure information.
Human: Try to guess.
Robot: Lee wrote book “Automata Theory”
Observe that the robot found a similarity between Smith and
Lee and generalized (incorrectly)
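The teach, check and guess steps of this dialog can be imitated with a very small pattern memory. The sketch below is not the constructive-induction method of the talk; it is a simplified stand-in that matches questions as bags of words and guesses by substituting the single differing word. The class and method names are hypothetical.

```python
import re


def words(text):
    """Split text into alphanumeric words, dropping punctuation."""
    return re.findall(r"[A-Za-z0-9]+", text)


class PatternMemory:
    def __init__(self):
        self.pairs = []  # list of (question words, answer text)

    def teach(self, question, answer):
        self.pairs.append((words(question), answer))

    def ask(self, question):
        """Answer only if the same words were taught, in any order."""
        key = sorted(w.lower() for w in words(question))
        for q, answer in self.pairs:
            if sorted(w.lower() for w in q) == key:
                return answer
        return "I have no sure information."

    def guess(self, question):
        """Generalize by analogy: reuse an answer whose question differs in one word."""
        asked = words(question)
        asked_set = {w.lower() for w in asked}
        for q, answer in self.pairs:
            known_set = {w.lower() for w in q}
            old = known_set - asked_set
            new = asked_set - known_set
            if len(old) == 1 and len(new) == 1:
                old_word = next(w for w in q if w.lower() in old)
                new_word = next(w for w in asked if w.lower() in new)
                return re.sub(rf"\b{re.escape(old_word)}\b", new_word, answer)
        return "I have no sure information."
```

After teaching the two patterns from the lesson, `ask("What book wrote Lee?")` stays cautious, while `guess("What book wrote Lee?")` reproduces the over-generalized answer about "Automata Theory" noted on the slide.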
Behavior, Dialog and Learning
• The dialog/behavior has the following components:
– (1) Eliza-like natural language dialogs based on pattern
matching and limited parsing.
• Commercial products like Memoni, Dog.Com, Heart, Alice,
and Doctor all use this technology very successfully; for
instance, the Alice program won the 2001 Loebner Prize, a Turing-test competition.
– This is a “conversational” part of the robot brain, based
on pattern-matching, parsing and black-board principles.
– It is also a kind of “operating system” of the robot, which
supervises other subroutines.
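An Eliza-like responder of the kind described in item (1) reduces to an ordered list of patterns with reply templates. The rules and the fallback reply below are invented for illustration and are not taken from any of the products named above.

```python
import re

# Ordered (pattern, reply template) rules; the first match wins.
# All rules here are hypothetical examples.
RULES = [
    (re.compile(r"\bI am (.+)", re.I), "Why are you {0}?"),
    (re.compile(r"\bwhere is (.+?)\??$", re.I), "I have not seen {0}."),
    (re.compile(r"\bhello\b", re.I), "Hello, what can I do for you?"),
]
FALLBACK = "Tell me more."


def reply(utterance):
    """Return the reply of the first matching rule, filling in captured text."""
    text = utterance.strip()
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    return FALLBACK
```

Because the rule list is consulted in order, the same loop can act as the dispatcher that hands an utterance to other subroutines when a more specific pattern matches.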
Behavior, Dialog and Learning
• (2) Subroutines with logical database and natural
language parsing (CHAT).
– This is the logical part of the brain used to find
connections between places, timings and all kinds of
logical and relational reasoning, such as answering
questions about Japanese geography.
• (3) Use of generalization and analogy in dialog on
many levels.
– Random and intentional linking of spoken language, sound effects and
facial gestures.
– Use of the Constructive Induction approach to help generalization, reasoning
by analogy and probabilistic generation in verbal and non-verbal dialog,
like learning when to smile or turn the head away from the partner.
Behavior, Dialog and Learning
• (4) Model of the robot, model of the user, scenario of the
situation, history of the dialog, all used in the
conversation.
• (5) Use of word spotting in speech recognition rather
than single word or continuous speech recognition.
• (6) Continuous speech recognition (Microsoft)
• (7) Avoidance of “I do not know”, “I do not
understand” answers from the robot.
– Our robot will always have something to say; in the worst case it will be
over-generalized, based on invalid analogies, or even nonsensical
and random.
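Items (5) and (7) can be combined in a toy example. Real word spotting works on the audio signal; the sketch below is only a text-level stand-in that scans a recognized transcript for known keywords and otherwise falls back to a filler line, so the robot never answers "I do not understand". The keyword table, action names and filler lines are all hypothetical.

```python
import random

# Hypothetical keyword -> action table for the receptionist scenario.
KEYWORDS = {
    "table": "reserve_table",
    "smoking": "ask_section",
    "terrace": "seat_near_terrace",
}
# Hypothetical filler lines used when no keyword is spotted.
FILLERS = ["Interesting, tell me more.", "That reminds me of the theatre."]


def spot(transcript):
    """Return the actions for every known keyword, ignoring all other words."""
    tokens = transcript.lower().replace(",", " ").replace(".", " ").split()
    return [KEYWORDS[t] for t in tokens if t in KEYWORDS]


def respond(transcript, rng=random):
    """Act on the first spotted keyword, or say a filler line instead of giving up."""
    actions = spot(transcript)
    if actions:
        return actions[0]
    return rng.choice(FILLERS)
```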
Recent Works
• Multi-brain: sub-brains communicate
through natural language:
– Devil, angel and myself.
– Egoist and moralist
• CAM – Contents Addressable Memory.
Cypress funded project in 2005.
Fig. 2. Seven examples (4-input, 2-output minterms) are
given by the teacher as correct robot behaviors
[Karnaugh map with rows AB and columns CD, each ordered 00, 01, 11, 10. Each specified cell holds an output pair Head_Horiz, Eye_Blink (entries shown: 1,0; 2,0; 0,0; 1,0; 1,1; 0,0; 0,0); all other cells are don't cares (-).]
Inputs:
A - right microphone
B - left light sensor
C - right light sensor
D - left microphone
Annotated example behaviors:
– Robot turns head right, away from light in left
– Robot turns head left, away from light in right, towards sound in left
– Robot turns head left with equal front lighting and no sound. It blinks eyes
– Robot does nothing
Generalization of
the Ashenhurst-Curtis
decomposition
model
Tables of this kind are known from
Rough Sets, Decision Trees, etc.
(Data Mining)
Decomposition is hierarchical
At every step many
decompositions exist
Constructive Induction:
Technical Details
• U. Wong and M. Perkowski, A New Approach to Robot’s
Imitation of Behaviors by Decomposition of Multiple-Valued
Relations, Proc. 5th Intern. Workshop on Boolean Problems,
Freiberg, Germany, Sept. 19-20, 2002, pp. 265-270.
• A. Mishchenko, B. Steinbach and M. Perkowski, An Algorithm for
Bi-Decomposition of Logic Functions, Proc. DAC 2001, June 18-22, Las Vegas, pp. 103-108.
• A. Mishchenko, B. Steinbach and M. Perkowski, Bi-Decomposition
of Multi-Valued Relations, Proc. 10th IWLS, pp.
35-40, Granlibakken, CA, June 12-15, 2001. IEEE Computer
Society and ACM SIGDA.
Constructive Induction
• Decision Trees, Ashenhurst/Curtis hierarchical
decomposition and Bi-Decomposition algorithms are
used in our software
• These methods form our subset of the MVSIS system
developed under Prof. Robert Brayton at University of
California at Berkeley [2].
– The entire MVSIS system can also be used.
• The system generates robot’s behaviors (C program
codes) from examples given by the users.
• This method is used for embedded system design, but
we use it specifically for robot interaction.
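To make the idea of generating a behavior from a few examples concrete, here is a deliberately simple stand-in: instead of decomposition, it generalizes by nearest-neighbour lookup in Hamming distance. This is not the method used in the project, and the example table (inputs A, B, C, D mapped to output pairs Head_Horiz, Eye_Blink) contains hypothetical values.

```python
# Inputs (A, B, C, D) -> outputs (Head_Horiz, Eye_Blink).
# All rows are hypothetical teacher examples.
EXAMPLES = {
    (0, 1, 0, 0): (1, 0),
    (0, 0, 1, 1): (2, 0),
    (0, 0, 0, 0): (0, 0),
    (0, 1, 1, 0): (2, 1),
}


def hamming(u, v):
    """Number of positions in which two input vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def behave(inputs):
    """Return the taught output, or the output of the closest taught example."""
    if inputs in EXAMPLES:
        return EXAMPLES[inputs]
    # Sorting first makes tie-breaking deterministic.
    nearest = min(sorted(EXAMPLES), key=lambda e: hamming(e, inputs))
    return EXAMPLES[nearest]
```

Every unspecified input combination (a don't care in the teacher's table) still receives some behavior, which is the property the induced program needs, even though a decomposition-based learner would typically generalize differently.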
Braitenberg Vehicles
Quantum Circuits
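Toffoli gate: universal, uses controlled square root of NOT
[Circuit diagrams: a controlled-U gate shown for control inputs |0⟩ and |1⟩ with target |x⟩; the Toffoli gate built from controlled-V and controlled-V† gates, where V is the square root of NOT, with target outputs |x⟩ and V|x⟩ shown for basis-state controls.]
Example 1: Simulation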
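The construction can be checked numerically on basis states. The sketch below assumes the standard five-gate arrangement (controlled-V, CNOT, controlled-V†, CNOT, controlled-V), which is assumed to correspond to the diagram, and uses the usual matrix for V, the square root of NOT. Qubits are indexed 0 and 1 (controls) and 2 (target); the function names are hypothetical.

```python
# V is a square root of NOT: V * V = X. V_DAG is its conjugate transpose.
V = [[(1 + 1j) / 2, (1 - 1j) / 2], [(1 - 1j) / 2, (1 + 1j) / 2]]
V_DAG = [[(1 - 1j) / 2, (1 + 1j) / 2], [(1 + 1j) / 2, (1 - 1j) / 2]]
X = [[0, 1], [1, 0]]


def controlled(u, control, target, state):
    """Apply the 2x2 matrix u to qubit `target` wherever qubit `control` is 1."""
    out = {}
    for bits, amp in state.items():
        if bits[control] == 0:
            out[bits] = out.get(bits, 0) + amp
            continue
        for new_bit in (0, 1):
            new_bits = list(bits)
            new_bits[target] = new_bit
            key = tuple(new_bits)
            out[key] = out.get(key, 0) + u[new_bit][bits[target]] * amp
    return out


def toffoli(a, b, t):
    """Simulate the five-gate circuit on the basis state |a b t>."""
    state = {(a, b, t): 1}
    state = controlled(V, 1, 2, state)      # controlled-V from b to target
    state = controlled(X, 0, 1, state)      # CNOT from a to b
    state = controlled(V_DAG, 1, 2, state)  # controlled-V-dagger from b to target
    state = controlled(X, 0, 1, state)      # CNOT from a to b (restores b)
    state = controlled(V, 0, 2, state)      # controlled-V from a to target
    return {bits: amp for bits, amp in state.items() if abs(amp) > 1e-9}
```

For every basis input the only surviving amplitude sits on |a, b, t XOR (a AND b)⟩: the target sees V twice (that is, NOT) only when both controls are 1, and V followed or preceded by V† (the identity) when exactly one control is 1.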
Quantum Portland Faces
Conclusion. What did we learn
• (1) the more degrees of freedom the better
the animation realism.
• (2) synchronization of spoken text and head
(especially jaw) movements is important
but difficult.
• (3) gestures and speech intonation of the
head should be slightly exaggerated.
Conclusion. What did we learn(cont)
• (4) the sound should be loud to cover noises coming from
motors and gears and for a better theatrical effect.
• (5) noise of servos can also be reduced by appropriate
animation and synchronization.
• (6) best available ASR and TTS packages should be
applied.
• (7) OpenCV from Intel is excellent.
• (8) use puppet theatre experiences.
Conclusion. What did we learn(cont)
• (9) because learning is too slow, improved parameterized
learning methods will be developed, still based on
constructive induction.
• (10) open question: funny versus beautiful.
• (11) either high quality voice recognition from headset or
low quality in noisy room. YOU CANNOT HAVE BOTH
WITH CURRENT ASR TOOLS.
• The bi-decomposer of relations and other useful software
used in this project can be downloaded from http://wwwcad.eecs.berkeley.edu/mvsis/.
• This is the most advanced
humanoid robot theatre
project outside of Japan
• Open to international
collaboration
What to emphasize in future
cooperation?
• We want to develop a general methodology for
prototyping software/hardware systems for
interactive robots that work in human
environment.
• Image processing, voice recognition, speech
synthesis, expressing emotions, recognizing
human emotions.
• Machine Learning technologies.
• Safety, not hitting humans.
Can we do
this in
Poland?
Yes, engineers from Technical
University of Gliwice already produce a
commercially available hexapod
International Intel Science Talent
Competition and PDXBOT 2004
Additional
Slides with
Background
Robot Toy Market - Robosapiens
toy, poses in front of
Globalization
• Globalization implies that images,
technologies and messages are everywhere,
but at the same time disconnected from a
particular social structure or context. (Alain
Touraine)
• The need of a constantly expanding market
for its products chases the bourgeoisie over the
whole surface of the globe. It must nestle
everywhere, settle everywhere, establish
connections everywhere. (Marx & Engels,
1848)
India and China - what’s
different?
• They started at the same level of wealth and exports in
1980
• China today exports $ 184 Bn vs $ 34 Bn for India
• China’s export industry employs today over 50 million
people (vs. 2 million in software by 2008, and 20 million in the entire
organized sector in India today!)
• China’s export industry consists of toys (> 60% of the
world market), bicycles (10 m to the US alone last year),
and textiles (a vision of having a share of > 50% of the
world market by 2008)
Learning from Korea and Singapore
• The importance of Learning
– To manufacture efficiently
– To open the door to foreign technology and
investment
– To have sufficient pride in one’s own ability to open
the door and go out and build one’s own
proprietary identity
• To invest in fundamentals like Education
• To have the right cultural prerequisites for catching up
• To have pragmatism rule, not ideology
Samsung
1979 Started making microwaves
1980 First export order (foreign brand)
1983 OEM contracts with General Electric
1985 All GE microwaves made by Samsung
1987 All GE microwaves designed by Samsung
1990 The world’s largest microwave manufacturer without its own brand
1990 Launch own brand outside Korea
2000 Samsung microwaves # 1 worldwide, twelve
factories in twelve countries (including India, China
and the US)
2003 – the largest electronics company in
the world
How did Samsung do it?
• By learning from GE and other buyers
• By working very hard - 70 hour weeks, 10 days
holiday
• By being very productive - 9 microwaves per
person per day vs 4 at GE
• By meeting every delivery on time, even if it
meant working 7-day weeks for six months
• By developing new models so well that it got
GE to stop developing their own
Ashenhurst Functional Decomposition
Evaluates the data function and attempts to
decompose into simpler functions.
$F(X) = H(G(B), A)$, $X = A \cup B$
B - bound set
A - free set
if $A \cap B = \emptyset$, it is disjoint decomposition
if $A \cap B \neq \emptyset$, it is non-disjoint decomposition
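Whether a disjoint decomposition with a given bound set is worthwhile can be judged by the column multiplicity of the decomposition chart: the number of distinct columns when bound-set values index the columns and free-set values index the rows. The sketch below is a brute-force check for small Boolean functions; the example function `f` and the helper name `column_multiplicity` are hypothetical. A multiplicity of at most 2 means a single binary output of G(B) is enough.

```python
from itertools import product


def column_multiplicity(f, n, bound):
    """Count distinct columns of the decomposition chart of f for a bound set.

    f takes a list of n bits; `bound` lists the indices of the bound-set variables.
    """
    free = [i for i in range(n) if i not in bound]
    columns = set()
    for b_vals in product((0, 1), repeat=len(bound)):
        column = []
        for a_vals in product((0, 1), repeat=len(free)):
            x = [0] * n
            for i, v in zip(bound, b_vals):
                x[i] = v
            for i, v in zip(free, a_vals):
                x[i] = v
            column.append(f(x))
        columns.add(tuple(column))
    return len(columns)


def f(x):
    """Hypothetical example: (x0 XOR x1) AND (x2 OR x3)."""
    return (x[0] ^ x[1]) & (x[2] | x[3])
```

For this `f`, the bound set {x0, x1} has only two distinct columns, so G can be the single function x0 XOR x1; the bound set {x0, x2} has four distinct columns and would need two outputs from G.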
A Standard Map of
function ‘z’
[Map of z with rows ab = 00, 01, 02, 10, 11, 12, 20, 21, 22 (Free Set) and columns c = 0, 1, 2 (Bound Set). Cells hold single values, don't cares (-), or generalized don't cares such as 0,1 and 2,3.]
Explain the concept of
generalized don’t cares
Columns 0 and 1 and columns 0 and 2 are compatible
column compatibility = 2
NEW Decomposition of Multi-Valued
Relations
$F(X) = H(G(B), A)$, $X = A \cup B$
[Diagram labels: X, A, B, Relation]
if $A \cap B = \emptyset$, it is disjoint decomposition
if $A \cap B \neq \emptyset$, it is non-disjoint decomposition
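Forming a CCG from a K-Map
[Map of z with rows ab = 00, 01, 02, 10, 11, 12, 20, 21, 22 (Free Set) and columns c = 0, 1, 2 (Bound Set). Cells hold single values, don't cares (-), or generalized don't cares such as 0,1 and 2,3.]
Columns 0 and 1 and columns 0 and
2 are compatible
column compatibility index = 2
[Column Compatibility Graph on nodes C0, C1, C2]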
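Forming a CIG from a K-Map
[Map of z with rows ab = 00, 01, 02, 10, 11, 12, 20, 21, 22 and columns c = 0, 1, 2. Cells hold single values, don't cares (-), or generalized don't cares such as 0,1 and 2,3.]
Columns 1 and 2 are incompatible
chromatic number = 2
[Column Incompatibility Graph on nodes C0, C1, C2]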
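The column incompatibility graph and its coloring can be sketched as follows. The map below is a small hypothetical three-row example, not the map from the slide: each cell is a set of allowed values (a generalized don't care) or `DC` for a full don't care, and two columns are treated as compatible when their cells share an allowed value in every row. The greedy coloring is a heuristic that gives an upper bound on the chromatic number.

```python
from itertools import combinations

DC = None  # full don't care: any value is allowed

# Hypothetical map: rows are free-set values, columns C0..C2 are bound-set values.
KMAP = [
    [{1}, DC, {0, 1}],
    [DC, {1}, {2, 3}],
    [{2}, {2}, DC],
]


def cells_compatible(p, q):
    """Two cells agree if either is a don't care or they share an allowed value."""
    return p is DC or q is DC or bool(p & q)


def incompatibility_graph(kmap):
    """Return the set of column pairs (i, j), i < j, that conflict in some row."""
    n = len(kmap[0])
    return {
        (i, j)
        for i, j in combinations(range(n), 2)
        if not all(cells_compatible(row[i], row[j]) for row in kmap)
    }


def greedy_coloring(n, edges):
    """Give each column the smallest color not used by an incompatible column."""
    colors = {}
    for v in range(n):
        used = {colors[u] for u in colors if (u, v) in edges or (v, u) in edges}
        colors[v] = next(c for c in range(n) if c not in used)
    return colors
```

In this example only columns 1 and 2 conflict, so two colors suffice and columns 0 and 1 can be merged. With set-valued cells, pairwise compatibility alone does not guarantee that a whole color class can be merged, so a full implementation would also intersect the cells of the merged columns.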
Constructive Induction
• A unified internal language is used to describe
behaviors in which text generation and facial
gestures are unified.
• This language is for learned behaviors.
• Expressions (programs) in this language are
either created by humans or induced
automatically from examples given by trainers.
Is it worth building humanoid robots?
• Man’s design versus robot’s design
• The humanoid robot is versatile and adaptive; it takes its form
from a human, a design well-verified by Nature.
• Complete isomorphism of a humanoid robot with a human is
very difficult to achieve (walking) and not even entirely
desired.
• All we need is to adapt the robot maximally to the needs
of humans – elderly, disabled, children, entertainment.
• Replicating human motor or sensor functionality is based on
mechanistic methodologies, but adaptations and upgrades are
possible – for instance brain wave control or wheels
• Is it cheating?
Is it worth building humanoid robots?
• Can building a mechanistic digital synthetic version of man be
anything less than a cheat when man is not mechanistic, digital
nor synthetic?
• If reference for the “ultimate” robot is man, then there is little
confusion about one’s aim to replace man with a machine.
Man & Machine
• Main reason to build machines in our
likeness is to facilitate their integration in
our social space:
– SOCIAL ROBOTICS
• Robot should do many things that we do, like
climbing stairs, but not necessarily in the way we
do it – airplane and bird analogy.
• Humanoid robots/social robots should make our
life easier.
The Social Robot
• “developing a brain”:
– Cognitive abilities as developed from classical AI to modern
cognitive ideas (neural networks, multi-agent systems, genetic
algorithms…)
• “giving the brain a body”:
– Physical embodiment, as indicated by Brooks [Bro86], Steels
[Ste94], etc.
• “a world of bodies”:
– Social embodiment
• A Social Robot is:
– A physical entity embodied in a complex, dynamic, and social
environment sufficiently empowered to behave in a manner
conducive to its own goals and those of its community.
Anthropomorphism
• Social interaction involves an adaptation on
both sides to rationalise each other’s actions,
and the interpretation of the other’s actions
based on one’s references
• Projective Intelligence: the observer
ascribes a degree of “intelligence” to the
system through their rationalisation of its
actions
Anthropomorphism & The Social Robot
• Objectives
– Augment human-robot sociality
– Understand and rationalize robot behavior
• Embrace anthropomorphism
• BUT - How does the robot not become trapped by
behavioral expectations?
• REQUIRED: A balance between anthropomorphic
features and behaviors leading to the robot’s own
identity
Finding the Balance
• Movement
– Behavior (afraid of the light)
– Facial Action Coding System
• Form
– Physical construction
– Degrees of freedom
• Interaction
– Communication (robot-like vs. human voice)
– Social cues/timing
• Autonomy
• Function & role
– machine vs. human capabilities
Emotion Robots Experiments
• Autonomous mobile robots
• Emotion through motion
• “Projective emotion”
• Anthropomorphism
• Social behaviors
• Qualitative and quantitative analysis to a wide
audience through online web-based experiments
The perception learning tasks
• Robot Vision:
1. Where is a face? (Face detection)
2. Who is this person? (Face recognition, learning with
supervisor, person’s name is given in the process.)
3. Age and gender of the person.
4. Hand gestures.
5. Emotions expressed as facial gestures (smile, eye
movements, etc.)
6. Objects held by the person.
7. Lip reading for speech recognition.
8. Body language.
The perception learning tasks
• Speech recognition:
1. Who is this person? (Voice-based speaker
recognition, learning with supervisor, person’s name
is given in the process.)
2. Isolated word recognition for word spotting.
3. Sentence recognition.
• Sensors:
1. Temperature
2. Touch
3. Movement
The behavior learning tasks
• Facial and upper body gestures:
1. Face/neck gesticulation for interactive dialog.
2. Face/neck gesticulation for theatre plays.
3. Face/neck gesticulation for singing/dancing.
• Hand gestures and manipulation:
1. Hand gesticulation for interactive dialog.
2. Hand gesticulation for theatre plays.
3. Hand gesticulation for singing/dancing.
Learning the perception/behavior
mappings
1. Tracking the human.
2. Full gesticulation as a response to human
behavior in dialogs and dancing/singing.
3. Modification of semi-autonomous behaviors such
as breathing, eye blinking, mechanical hand
withdrawals, speech acts as response to person’s
behaviors.
4. Playing games with humans.
5. Body contact with human such as safe
gesticulation close to human and hand shaking.