Distributed representations in AI:
Building the world model and
analogical reasoning
Dr. Dmitri Rachkovskij
Dept. of Neural Information Processing Technologies
International Research and Training Center of
Information Technologies and Systems, Kiev, Ukraine
National Ukrainian Academy of Sciences
[email protected]
Slide number is in the right lower corner:
1
Distributed representations in AI:
Building the world model and
analogical reasoning: Plan
1. Intro: The world model - content and organization
The world model: models of attributes, objects, relations, episodes
Part-whole hierarchy
Classification hierarchy
2. The world model based on distributed representations
Symbolic, local, and distributed representations
General architecture of the world model
Representation and processing of simple structures
Representations of sequences and episodes
3. Analogical reasoning
Analogy and its study and modeling by cognitive psychologists
Modeling of analogy with distributed representations
2
The Agent's internal world model –
the system of knowledge about the domain and the Agent itself
Necessary for organizing intelligent behavior
The world model stores
episodes and situations
encountered by the Agent
reactions to situations
evaluations of results
etc.
[Figure: the Agent's BRAIN contains a WORLD MODEL of the WORLD; the Agent observes and acts on the World at successive times t0 < t1 < t2]
The world model is used for
recognition
analysis
prediction
reaction
etc.
3
Models of attributes, objects, relations, situations, ...
Content of models - appearance, structure, behavior, etc. of objects
Models of
Attributes
black, furry, barking, big, four-legged, ...
Objects - real or ideal
physical bodies (table)
animals (dog)
unreal (centaur)
Episodes and situations
many objects and relations (hunting, war, ...)
Relations
spatial (above)
temporal (after)
part-whole (part-of)
A relation R(X,Y,…) requires several objects X,Y,…
Undirected (X and Y are neighbors)
Directed (X above Y, X sold Y a BOOK)
Arguments of directed relations have roles (agent X, object Y, etc.)
4
Compositional structure of the world model:
part-whole relations and hierarchy
Models are interrelated. The model of an object (car) is associated with
the models of its parts (body, motor, wheels) and attributes (color, etc.)
The division into objects and attributes is not absolute.
The model of a part (e.g., wheel) can have
its own models-parts (tire, rim, cap) and
attributes (shape - ring, texture - "protector", color - black, material - rubber)
[Figure: structural attributes - the model-whole "car" is linked to models-parts "wheel, body, motor, ..."; the model-whole "wheel" is linked to models-parts "tire, rim, cap, ...". Attributes of appearance - the model-whole "wheel" is linked to attribute models-parts: color (black), texture ("protector"), shape (ring)]
5
The part-whole hierarchy is also called
"meronymy-holonymy", "aggregate", "modular",
"compositional", or "structural"
Model of a situation:
"A dog approached a ball, the dog bit the ball, the ball exploded"
[Figure: the Situation is decomposed into Events - Approach(dog, ball), Bit(dog, ball), Explode(ball) - and further into Objects and relations: dog, ball, Approach, Bit, Explode]
A model-whole may include (associate)
models-parts of goals, actions, costs, evaluations, feelings, etc.
6
Examples of hierarchical structures
Logical or symbolic propositions
Patterns in structural or syntactic recognition
Complex chemical substances
Proteins in molecular biology
Computer programs
Knowledge bases
etc.
[Figures: the expression tree of F(a, f(y), f(y, F(a, b))); a chemical structure (NH2, N, ...); a pattern graph with edge labels a and b]
7
Classification structure of a world model:
is-a relations and hierarchy
Models of classes - combinations of attribute models
The is-a relation (a cat is an animal) is also hierarchical: there exist more abstract (general) and less abstract
(specific) classes
[Figure: is-a hierarchy - ANIMALS at the top; CATS and DOGS below; breeds such as SPANIELS below DOGS; the individual spaniel FIDO at the bottom]
8
Operations with class models
Classes allow transferring experience to new objects and situations
and making predictions. E.g., similar-looking objects often
behave similarly.
[Figure: observable attributes (appearance) support RECOGNITION of the model of an object or class, which in turn supports PREDICTION of unobservable attributes (structure, behavior, ...)]
9
Available "world models"
World models ~ Knowledge Bases ~ Ontologies
are beginning to be used in diverse applications
CYC - a general ontology for commonsense knowledge
(Lenat and Guha)
UMLS (Unified Medical Language System)
- an ontology of medical concepts
(Humphreys and Lindberg)
WORDNET - one of the most comprehensive lexical ontologies
(Miller)
etc.
10
Representation of information
in the world models
Representation schemes used in ontologies:
Conceptual graphs
Semantic networks
Frames
Prolog predicates
Other special representation languages
All these representation schemes are based on
traditional symbolic and local representations of
information - they have drawbacks
11
Distributed representations
(1) directly reflect the degree of similarity
(2) provide high information capacity
Allow:
(3) use of associative memory
(4) formation of part-whole hierarchies
(5) modeling of analogical reasoning
12
Local and symbolic implementations of models
Representation of information in computer memory:
[Figure: records with an address and fields, e.g. name_A "A"; name_B "B"; name_AB "AB" with pointers address1, address2, ...]
Pyramidal networks:
[Figure: units A, B, C, ..., N with higher-level units AB, AC, ...]
Bracketed representations:
((A B) (A C D)) ... (...)
Symbolic representations:
entities (sun planet)
expressions (((mass sun) :name mass-sun)
((mass planet) :name mass-planet)
((greater mass-sun mass-planet) :name ...))
Local representations ("1 of N" coding):
[Figure: a resource pool of units #1...#z, one unit per item:
A = 1 0 0 0 0 0 ... 0
B = 0 1 0 0 0 0 ... 0
C = 0 0 1 0 0 0 ... 0
...
Z = 0 0 0 0 0 0 ... 1]
13
Problems with symbolic and local representations
All-or-none similarity
Models are either identical or dissimilar
A is identical to A
A is non-identical to B, C, ..., Z, ...
A appears only as a pointer in structures such as ((B C) ((B D) (X Y Z)) ...
Information capacity
Coding "1 of N": N units hold up to N models
[Figure: separate local units are needed for items A, B, C, for ordered pairs AB, BA, AC, CA, BC, CB, and for their combinations (AB&AC, BC&AB, ...) - the number of required units grows combinatorially]
14
Problems with symbolic and local representations
Comparison and estimation of similarity of complex
structured models is computationally hard (comparison
of graphs - finding partial isomorphism)
[Figure: two graph structures with nodes X, Y, ... whose partial isomorphism must be found]
15
Problems with symbolic and local representations
Graph isomorphism alone is not enough to account for human
judgments of similarity
(1) The fascists invaded France, causing people to flee France
(2) Rats infested the apartment, causing the people to leave the
apartment
(3) The game show host kissed the contestant, inviting the
audience to applaud the contestant
Abstract structure of 3 sentences:
Relation12 [Relation1(X,Y), Relation2(Z,Y)]
Correspondences due to the abstract scheme:
The fascists ↔ the game show host (X) ??
France ↔ the contestant (Y) ??
invaded ↔ kissed (Relation1) ??
The people ↔ the audience (Z) ??
to flee ↔ to applaud (Relation2) ?
16
Distributed representations
[Figure: a resource pool of units #1...#N; each item is coded by many active units, e.g.
A = 0 1 0 1 0 1 ... 0
B = 0 1 0 0 0 1 ... 0
C = 0 0 1 0 0 0 ... 1
...
Z = 0 0 1 0 0 0 ... 0]
In distributed representations, any object is represented by a distributed pattern of units' activity
Long binary sparse stochastic codevectors:
binary (elements 0 and 1)
number of elements N ~ 100,000
number of 1s M ~ 1,000 << N; density p = M/N
1s have (pseudo)random positions
Overlap of two independent codevectors: O = p*p*N = M*M/N = 10, i.e. about 1% of M
17
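A minimal sketch (in Python with NumPy, not the author's implementation) of codevectors with the parameters quoted above; it generates two independent sparse binary vectors and checks that their overlap is close to O = M*M/N:

import numpy as np

N = 100_000   # codevector dimensionality
M = 1_000     # number of 1s; density p = M / N = 0.01
rng = np.random.default_rng(0)

def random_codevector(n=N, m=M):
    """Binary codevector with m ones at (pseudo)random positions."""
    v = np.zeros(n, dtype=np.uint8)
    v[rng.choice(n, size=m, replace=False)] = 1
    return v

A = random_codevector()
B = random_codevector()

# Expected overlap of two independent codevectors:
# O = p * p * N = M * M / N = 10, i.e. about 1% of the M ones in each vector.
print("overlap |A & B| =", int((A & B).sum()))   # ~10 on average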
Advantages of distributed representations
Efficient use of resources for information representation
Up to C(N, M) items (compare with N for local representations)
Natural representation and estimation of similarity.
Similarity of X and Y is calculated by the dot product:
S(X, Y) = |X & Y| = Σ XiYi, i = 1, ..., N
For binary vectors, the dot product = the number of overlapping 1s
[Figure: example codevectors. For unrelated A and B, the overlap A & B is almost empty and S ≈ 0.01. For a B that shares half of its 1s with A, S ≈ 0.5. For identical vectors, S ≈ 1]
18
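A small sketch, under the same assumptions as above, of the dot-product similarity S(X, Y) = |X & Y|; the constructed vector C is only an illustration of a codevector sharing half of its 1s with A:

import numpy as np

rng = np.random.default_rng(1)
N, M = 100_000, 1_000

def random_codevector():
    v = np.zeros(N, dtype=np.uint8)
    v[rng.choice(N, size=M, replace=False)] = 1
    return v

def S(x, y):
    """Dot-product similarity |X & Y|, normalized by M so that S(A, A) = 1."""
    return (x & y).sum() / M

A, B = random_codevector(), random_codevector()

# C shares half of its 1s with A and half with B (roughly what thinning of
# the superposition <A v B> would produce).
half = M // 2
C = np.zeros(N, dtype=np.uint8)
C[rng.choice(np.flatnonzero(A), size=half, replace=False)] = 1
C[rng.choice(np.flatnonzero(B), size=half, replace=False)] = 1

print(S(A, A))   # 1.0   : identical codevectors
print(S(A, C))   # ~0.5  : half of the 1s coincide
print(S(A, B))   # ~0.01 : unrelated codevectors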
Distributed representations
(1) directly reflect the degree of similarity
(2) provide high information capacity
Allow:
(3) use of associative memory
(4) formation of part-whole hierarchies
(5) modeling of analogical reasoning
19
Some history
Academician N. M. Amosov founded the Dept. of
Biocybernetics, Inst. of Cybernetics, Kiev, in the 1960s
Modeling of Thinking and the Mind, Spartan Books, USA, 1967.
M-networks
Amosov, N.M., Basilevsky E.B., Kasatkin A.M., Kasatkina L.M., Luk A.N.,
Kussul E.M., & Talayev S.A. (1972) M-network as possible basis for
construction of heuristic models, Cybernetica 3, pp. 169-186.
1974 - first-in-the-world autonomous vehicle controlled by neural
networks. The vehicle could move in a natural environment
Amosov, N.M., Kasatkin, A.M., & Kasatkina L.M. (1975) Active semantic
networks in robots with an autonomous control. Fourth Intern. Joint
Conference on Artificial intelligence, v.9, pp. 11-20
Amosov, N. M., Kussul, E. M., & Fomenko, V. D. (1975) Transport robot
with a neural network control system. Advance papers of the Fourth
Intern. Joint Conference on Artificial intelligence v.9 pp.1-10
20
Associative-Projective Neural Networks
APNNs - proposed by Dr. Kussul in 1983
E. Kussul (1992)
Associative neuron-like structures
T. Baidyk (2002)
Neural networks and problems of Artificial
Intelligence
Amosov, Baidyk, Goltsev, Kasatkin, Kasatkina,
Kussul, Rachkovskij
Neurocomputers and Intelligent Robots
(a Spanish translation is in preparation)
Donald Hebb (1949) Organization of Behavior
21
The APNN architecture of the world model
based on distributed representations
Module M includes buffer fields
BUF and an associative field ASC.
BUF fields perform bitwise operations.
ASC returns the most similar
codevector from its memory.
[Figure: a module M built from several BUF fields and an ASC field; parallel schemes of modules for objects (vision, hearing, touch, name, ...) and for sensory attributes (shape, texture, color, size, taste, olfactory, ...)]
22
The APNN architecture: associative field
Storage: vectors A, B, C, ..., XY (L vectors) are written into the autoassociative memory.
Retrieval: a corrupted vector A' presented to the autoassociative memory returns the stored vector A.
23
The APNN architecture: associative field
Local version
Storage: vectors A, B, C, ..., XY (L vectors of dimensionality N) are stored as separate rows.
Retrieval: a corrupted vector A' is compared with all stored vectors; the stored vector with the maximum dot product (here A) is returned.
Distributed version (Hopfield-type)
Storage: the vectors are superimposed into an N x N connection matrix.
Retrieval: the vector-matrix product of A' with the matrix, followed by thresholding, returns A.
24
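A minimal sketch of the "local version" of the associative field described above: codevectors are stored as separate rows and retrieval returns the row with the maximum dot product. The class and names are illustrative, not the APNN implementation; the Hopfield-type variant is only noted in a comment:

import numpy as np

class AssociativeField:
    """ASC field, "local version": stored codevectors are kept as separate
    rows; retrieval returns the row with the maximum dot product."""
    def __init__(self):
        self.rows = []

    def store(self, v):
        self.rows.append(v.astype(np.uint8))

    def retrieve(self, query):
        sims = [int((v & query).sum()) for v in self.rows]
        return self.rows[int(np.argmax(sims))]
    # A distributed (Hopfield-type) version would instead accumulate the
    # vectors in an N x N weight matrix and retrieve by a thresholded
    # vector-matrix product.

# usage: a corrupted copy A' of A still retrieves A
rng = np.random.default_rng(2)
N, M = 10_000, 100          # small toy sizes

def rand_cv():
    v = np.zeros(N, dtype=np.uint8)
    v[rng.choice(N, size=M, replace=False)] = 1
    return v

A, B, C = rand_cv(), rand_cv(), rand_cv()
mem = AssociativeField()
for v in (A, B, C):
    mem.store(v)

A_noisy = A.copy()
A_noisy[rng.choice(np.flatnonzero(A), size=M // 2, replace=False)] = 0
assert np.array_equal(mem.retrieve(A_noisy), A)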
A simple parallel scheme of construction and
processing of part-whole structures
Construction by superposition (disjunction) & thinning:
M1 holds A (one of A, B, C, ...), M2 holds α (one of α, β, γ, ...), M3 holds the whole <A ∨ α>
<...> = thinning = grouping
|<A ∨ α> & A| = 0.5 |A| = 0.5 |α|
Decoding of the model-whole through its parts:
the codevector <A ∨ α> in M3 is similar to both A and α, so presenting it to M1 and M2 retrieves A and α
Searching for the most similar model:
if A' is similar to A and α' is similar to α, then <A' ∨ α'> is similar to <A ∨ α>
25
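A simplified stand-in for "superposition (disjunction) & thinning": the component codevectors are OR-ed and the result is thinned back to about M ones. The APNN literature uses Context-Dependent Thinning; plain random subsampling is used here only to reproduce the 0.5 overlaps quoted above:

import numpy as np

rng = np.random.default_rng(3)
N, M = 100_000, 1_000

def rand_cv():
    v = np.zeros(N, dtype=np.uint8)
    v[rng.choice(N, size=M, replace=False)] = 1
    return v

def thin(v, m=M):
    """Keep a random subset of m ones of v (restores the working density).
    This is only an illustration of the densities involved, not the
    Context-Dependent Thinning procedure itself."""
    ones = np.flatnonzero(v)
    out = np.zeros_like(v)
    out[rng.choice(ones, size=min(m, ones.size), replace=False)] = 1
    return out

A, alpha = rand_cv(), rand_cv()
whole = thin(A | alpha)          # codevector of the model-whole <A v alpha>

print((whole & A).sum() / M)     # ~0.5, i.e. |<A v alpha> & A| = 0.5 |A|
print((whole & alpha).sum() / M) # ~0.5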
A simple parallel scheme of construction and
processing of part-whole structures
Retrieval of the model-whole by its model-part:
[Figure: presenting A to M1 retrieves the whole <A ∨ α> in M3]
Interpretations of <A ∨ α> - an association between two models-parts, e.g.:
attribute - attribute
part - part
object - object
variable - value
situation - reaction
situation - evaluation
26
Architecture of the world model
using distributed representations
[Figure: "vertical" and "parallel" module schemes connecting object modules (vision, hearing, touch, name, ...) with sensory attribute modules (shape, texture, color, size, taste, olfactory, ...)]
27
Representation of sequences and hierarchical episodes
Order matters: AB vs BA. X>>n binds X to position or role n: <A>>1 ∨ B>>2> vs <B>>1 ∨ A>>2>.
(…, (…), (…, (…, …))) - a labeled ordered acyclic graph is encoded by binding each element to its position (>>1, >>2, >>3) at each level and superimposing the results.
Example: "Spot bit Jane, causing Jane to flee from Spot"
P = < cause ∨ BITE ∨ FLEE ∨ BITE>>cause_antc ∨ FLEE>>cause_cnsq >
BITE = < bite ∨ spot ∨ jane ∨ spot>>agent ∨ jane>>object >
FLEE = < flee ∨ spot ∨ jane ∨ jane>>agent ∨ spot>>object >
spot = < dog ∨ id_spot >
jane = < human ∨ id_jane >
[Figure: the corresponding hierarchy over the base codevectors bite, flee, cause, dog, human, id_spot, id_jane]
28
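A hedged sketch of how the episode codevector P above could be assembled bottom-up. Role binding X>>role is approximated here by a role-specific cyclic shift and thinning by random subsampling; both are assumptions for illustration rather than the exact APNN operations:

import numpy as np

rng = np.random.default_rng(4)
N, M = 100_000, 1_000

def rand_cv():
    v = np.zeros(N, dtype=np.uint8)
    v[rng.choice(N, size=M, replace=False)] = 1
    return v

def thin(v, m=M):
    # simplified stand-in for thinning <...>: keep m random ones
    ones = np.flatnonzero(v)
    out = np.zeros_like(v)
    out[rng.choice(ones, size=min(m, ones.size), replace=False)] = 1
    return out

ROLE_SHIFT = {"agent": 1, "object": 2, "cause_antc": 3, "cause_cnsq": 4}
def bind(x, role):
    # X >> role modeled as a role-specific cyclic shift of the codevector
    # (an assumption for illustration, not necessarily the APNN operation)
    return np.roll(x, ROLE_SHIFT[role])

dog, human, id_spot, id_jane = (rand_cv() for _ in range(4))
bite, flee, cause = rand_cv(), rand_cv(), rand_cv()

spot = thin(dog | id_spot)
jane = thin(human | id_jane)
BITE = thin(bite | spot | jane | bind(spot, "agent") | bind(jane, "object"))
FLEE = thin(flee | spot | jane | bind(jane, "agent") | bind(spot, "object"))
P    = thin(cause | BITE | FLEE | bind(BITE, "cause_antc") | bind(FLEE, "cause_cnsq"))

# the codevector of the whole stays similar to its parts at every level
print((P & BITE).sum() / M, (BITE & spot).sum() / M, (spot & dog).sum() / M)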
Properties of distributed representations of
complex hierarchical models
Codevectors of parts and wholes have the same
dimensionality
The codevector of a model-whole is similar to the codevectors of
its models-parts
Hierarchical models that are similar (with respect to objects and relations)
have similar codevectors
Associations
(between parts and wholes, attributes and objects,
attributes and classes, objects and situations, etc.)
are made by similarity of codevectors
(not by connections or pointers,
as in local and symbolic representations)
29
Distributed representations
(1) directly reflect the degree of similarity
(2) provide high information capacity
Allow:
(3) use of associative memory
(4) formation of part-whole hierarchies
(5) modeling of analogical reasoning
30
Comparison of hierarchical structures and
analogical reasoning
The ability to easily estimate the similarity of complex structured
representations is essential for many AI problems
One interesting problem is the modeling of analogical reasoning
Gentner & Markman (1995, 1997, 2003);
Hummel & Holyoak (1997); Eliasmith & Thagard (2001)
Analogy, metaphor - a comparison process that allows
consideration of one domain from the point of view of
a different domain (Gentner & Markman 1995, 2003)
Rutherford: solar system → atom
31
Similarity of analogs
Analogs are hierarchically structured episodes or situations
Analogs are compared not only by "surface similarity"
(common or similar elements - objects, relations, etc.)
"Structural similarity" is very important - how the
elements are grouped in the analogs ("structural
consistency", "isomorphism")
[Figure: two structured analogs X and Y]
32
3 stages of analogy processing
1. Access (retrieval, recall) - the process of finding in
memory the most similar base analog, given the target
(input, cue, probe) episode
2. Mapping - the process of finding correspondences
between the elements of the two analogs
3. Inferences about the target analog, based on
information from the base
[Figure: the target episode X is matched against base episodes 1, 2, ..., Y, ..., N stored in memory]
33
Analogy in everyday life
Problem solving. The base analog is used as a source
of ideas about the target problem
Explanation. The base analog is used for understanding
the target analog
Formation and evaluation of hypotheses
Justification of a point of view (in political, historical,
etc. discussions)
In literature
Etc.
34
Analogy in solving problems
Target analog - heat flow. Base analog - water flow.
[Figure: the water-flow / heat-flow analogy. Water flow side: CAUSE, GREATER, PRESSURE(beaker), PRESSURE(vial), DIAMETER, FLOW(beaker, vial, water, pipe), LIQUID, FLAT-TOP. Heat flow side: coffee, ice cube. Legend: relation or attribute; function; entity (no border); match hypotheses link corresponding elements]
35
Study of analogy in humans. Various types of analog
similarity. Episodes with animals.
Example - episodes with animals, adapted from Thagard, Holyoak,
Nelson & Gochfeld (1990).
General scheme: R0 (R1(X,Y), R2(Y,X))
The episodes have the same relations as the Probe, but various types of
similarity to it
Probe (P): Spot bit Jane → Jane flee from Spot
Literal Similarity (LS): Fido bit John → John flee from Fido
Cross-Mapping (CM): Fred bit Rover → Rover flee from Fred
Surface Features (SF): John flee from Fido → Fido bit John
Analogy (AN): Mort bit Felix → Felix flee from Mort
1st-Order Relations only (FOR): Mort flee from Felix → Felix bit Mort
36
Modeling analogical reasoning with distributed
representations
Access to analogical episodes
Similar structures have similar codevectors. Total similarity of
structures is evaluated by the overlap of their codevectors
Access to analogical episode is done by finding in long-term memory
a codevector most similar to the codevector of the input episode
[Figure: the codevector of the target episode X is compared with the codevectors of base episodes 1, 2, ..., Y, ..., N stored in memory]
37
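A minimal sketch of the access stage, assuming each stored base episode is already encoded as a binary codevector (NumPy array): the base episode whose codevector overlaps most with the target's is returned.

def access(target_cv, memory):
    """memory: dict name -> binary codevector (NumPy array).
    Returns the name of the most similar base episode and all
    similarities, sorted in decreasing order."""
    sims = {name: int((cv & target_cv).sum()) for name, cv in memory.items()}
    ranking = sorted(sims.items(), key=lambda kv: -kv[1])
    return ranking[0][0], ranking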
Access to analogical episodes
Episodes with animals
Humans demonstrate the following pattern of retrieving analogs from
long-term memory: LS > CM ≈ SF > AN > FOR
(Forbus, Gentner & Law 1995; Ross 1989; Wharton & Holyoak 1994)
Similarity values between codevectors of episodes (our model)
Type of similarity   Episode                                  Similarity value
P                    Spot bit Jane → Jane flee from Spot      1.00
LS                   Fido bit John → John flee from Fido      0.40
CM                   Fred bit Rover → Rover flee from Fred    0.30
SF                   John flee from Fido → Fido bit John      0.24
AN                   Mort bit Felix → Felix flee from Mort    0.14
FOR                  Mort flee from Felix → Felix bit Mort    0.09
Range of chance similarity (unrelated codevectors): 0.005-0.008
38
Mapping analogs
with distributed representations
Mapping (interpretation) of an analogy - finding the corresponding
elements of the analogs
In our model, many analogs can be mapped by direct
similarity of the codevectors of their elements
39
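A minimal sketch of mapping by direct codevector similarity, assuming the element codevectors of both analogs are available as binary NumPy arrays; each target element is put in correspondence with the most similar base element:

def map_by_similarity(target_elems, base_elems):
    """target_elems, base_elems: dict name -> binary codevector (NumPy array).
    Returns, for each target element, the most similar base element and the
    normalized overlap of their codevectors."""
    mapping = {}
    for t_name, t_cv in target_elems.items():
        sims = {b_name: (t_cv & b_cv).sum() / t_cv.sum()
                for b_name, b_cv in base_elems.items()}
        best = max(sims, key=sims.get)
        mapping[t_name] = (best, float(sims[best]))
    return mapping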
Mapping by
similarity
The Probe ("Spot bit Jane, causing Jane to flee from Spot") is encoded as before:
P = < cause ∨ BITE ∨ FLEE ∨ BITE>>cause_antc ∨ FLEE>>cause_cnsq >
BITE = < bite ∨ spot ∨ jane ∨ spot>>agent ∨ jane>>object >
FLEE = < flee ∨ spot ∨ jane ∨ jane>>agent ∨ spot>>object >
spot = < dog ∨ id_spot >
jane = < human ∨ id_jane >
[Table: similarities between the codevectors of the Probe's elements and the elements of the base episode E_AN at four levels (level 4: the whole episode, ≈0.25; level 3: Antc, Cnsq, ≈0.33-0.35; level 2: Bite, Flee, ≈0.14-0.16; level 1: bite_a, bite_o, flee_a, flee_o, ≈0.25-0.26). In each row the corresponding element shows the largest similarity; similarities to non-corresponding elements are smaller]
40
Mapping analogs with distributed
representations. More sequential scheme.
[Figure: a more sequential mapping scheme, illustrated with the Probe and the base episode E_FOR, both decomposed level by level:
level 4: Probe = < Cnsq ∨ Antc >, E_FOR = < Cnsq ∨ Antc > (similarity 0.57)
level 3: Antc = < Bite ∨ cause_antc >, Cnsq = < Flee ∨ cause_cnsq >; in E_FOR, Antc = < cause_antc ∨ Flee >, Cnsq = < cause_cnsq ∨ Bite >
level 2: Bite = < bite_a ∨ bite_o >, Flee = < flee_a ∨ flee_o >
level 1: bite_a = < spot ∨ bite_agt >, bite_o = < jane ∨ bite_obj >, flee_a = < flee_agt ∨ jane >, flee_o = < flee_obj ∨ spot >; in E_FOR, felix and mort fill the corresponding places
Below level 1 are the base codevectors of the characters (spot, jane, felix, mort) and of the agent and object roles. At each level, elements of the Probe are matched to elements of E_FOR by the similarity of their codevectors; values such as 0.67 vs 0.28 at level 3, 0.5 vs 0.02 at level 2, and 0.33-0.68 vs 0.01-0.03 at level 1 are shown next to the competing correspondences]
41
Distributed representations in APNNs
(1) directly reflect the degree of similarity
(2) provide high information capacity
Allow:
(3) use of distributed associative memory
(4) formation of part-whole hierarchies
(5) modeling of analogical reasoning
42
43