NEW TIES WP2
Agent and learning mechanisms
Decision making and learning

- Agents have a controller: a decision Q-tree (DQT)
- Input: the situation as perceived (seen / heard / interpreted)
- Output: an action
- Decision making = using the DQT
- Learning = modifying the DQT
- Decisions also depend on inheritable "attitude genes" (learned through evolution)
Example of a DQT
[Figure: an example DQT. A bias (B) node at the root stochastically selects a subtree; test (T) nodes such as VISUAL: FRONT FOOD REACHABLE and BAG: FOOD branch on YES/NO; action (A) leaves are MOVE, TURN LEFT, TURN RIGHT, PICKUP, and EAT. Numbers on edges are genetic biases. Legend: B = bias node, T = test node, A = action node, numeric edge label = genetic bias, YES/NO = boolean choice.]
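To make the structure concrete, here is a minimal Python sketch of how such a tree could be represented and evaluated. The class names, concept strings, and the decide() interface are illustrative assumptions, not the project's actual API; the bias node below simply draws a child by weight, with the full bias machinery covered on the next slide.

import random

# Sketch of a DQT with three node kinds: bias (B) nodes pick a child
# stochastically, test (T) nodes branch on a boolean concept, and
# action (A) nodes are leaves. All names are illustrative.

class ActionNode:
    def __init__(self, action):
        self.action = action            # e.g. "MOVE", "EAT", "PICKUP"

    def decide(self, situation):
        return self.action

class TestNode:
    def __init__(self, concept, yes, no):
        self.concept = concept          # e.g. "visual:front_food_reachable"
        self.yes, self.no = yes, no

    def decide(self, situation):
        branch = self.yes if situation.get(self.concept, False) else self.no
        return branch.decide(situation)

class BiasNode:
    def __init__(self, children, weights):
        self.children = children        # candidate subtrees
        self.weights = weights          # one selection weight per child

    def decide(self, situation):
        child = random.choices(self.children, weights=self.weights)[0]
        return child.decide(situation)

# A fragment of the example tree: test food visibility, then act.
tree = TestNode(
    "visual:front_food_reachable",
    yes=TestNode("bag:food", yes=ActionNode("EAT"), no=ActionNode("PICKUP")),
    no=BiasNode([ActionNode("MOVE"), ActionNode("TURN_LEFT"),
                 ActionNode("TURN_RIGHT")], weights=[0.6, 0.2, 0.2]),
)
print(tree.decide({"visual:front_food_reachable": False}))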
Interaction between evolution & individual learning

- A bias node has n children, each with a bias b_i
- Bias ≠ probability
  - Bias b_i is learned and changes over time (the "learned bias")
  - Genetic bias g_i is inherited, part of the genome, and constant
- Actual probability of choosing child i: p(b_i, g_i) = b_i + (1 - b_i) · g_i
- Learned and inherited behaviour are thus linked through this formula
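A small sketch of this formula in code. The slides do not say how the per-child values are turned into a distribution over all n children, so the normalization step below is an assumption.

def child_probabilities(learned, genetic):
    """Combine learned biases b_i and genetic biases g_i using the
    slide's formula p(b, g) = b + (1 - b) * g, then normalize so the
    values over all n children sum to 1 (the normalization step is
    an assumption)."""
    raw = [b + (1.0 - b) * g for b, g in zip(learned, genetic)]
    total = sum(raw)
    return [p / total for p in raw]

# Two children: the learned bias dominates as b -> 1, while the
# genetic bias still matters when b is small.
print(child_probabilities(learned=[0.5, 0.1], genetic=[0.2, 0.6]))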
DQT nodes & parameters cont'd

- Test node language: native concepts + emerging concepts
- Native: see_agent, see_mother, see_food, have_food, see_mate, …
- New concepts can emerge by categorisation (the discrimination game)
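A toy sketch of a discrimination game over a single perceptual feature in [0, 1], as one way such categorisation could work. The refinement strategy (splitting halfway between the topic and its nearest confusable object) is an assumption, not the project's actual mechanism.

import bisect

# Categories are the intervals between split points; on failure, the
# repertoire is refined so the topic becomes distinguishable.

splits = []  # sorted split points; one catch-all category to start

def category(x):
    return bisect.bisect(splits, x)   # index of the interval containing x

def play(context, topic):
    """Return True if the topic's category is unique in the context;
    otherwise refine the category repertoire and return False."""
    others = [x for x in context if x != topic]
    if all(category(x) != category(topic) for x in others):
        return True
    # Refine: split halfway between the topic and the nearest
    # confusable object (one possible refinement strategy, assumed).
    clash = min((x for x in others if category(x) == category(topic)),
                key=lambda x: abs(x - topic))
    bisect.insort(splits, (topic + clash) / 2)
    return False

ctx = [0.2, 0.25, 0.8]
while not play(ctx, topic=0.25):
    pass
print(splits, [category(x) for x in ctx])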
Learning: the heart of the emergence engine

- Evolutionary learning:
  - not within an agent (not during its lifetime), but over generations
  - by variation + selection
- Individual learning:
  - within one agent, during its lifetime
  - by reinforcement learning
- Social learning:
  - during lifetime, between interacting agents
  - by sending/receiving + adopting knowledge pieces
Types of learning: properties

- Evolutionary learning:
  - The agent does not create new knowledge during its lifetime
  - The basic DQT + genetic biases are inheritable
  - "Knowledge creator" = crossover and mutation
- Individual learning:
  - The agent does create new knowledge during its lifetime
  - The DQT + learned biases are modified
  - "Knowledge creator" = reinforcement learning (driven by rewards)
  - Individually learnt knowledge dies with its host agent
- Social learning:
  - The agent imports knowledge already created elsewhere (new? not new?)
  - Adoption of imported knowledge ≈ crossover
  - Importing knowledge pieces can save effort for the recipient and can create novel combinations
  - Exporting knowledge helps its preservation after the death of its host
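The slides only state that individual learning is reinforcement learning driven by rewards. Below is a minimal sketch of one possible reward-driven update for a learned bias; the specific rule and the learning rate are assumptions.

def update_learned_bias(b, reward, rate=0.1):
    """One possible reward-driven update for a learned bias b in [0, 1]:
    positive reward pulls b toward 1, negative toward 0. The slides give
    no concrete rule, so this particular update is an assumption."""
    target = 1.0 if reward > 0 else 0.0
    return b + rate * abs(reward) * (target - b)

b = 0.5
for r in [1.0, 1.0, -0.5]:          # two rewards, then a punishment
    b = update_learned_bias(b, r)
print(round(b, 3))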
Present status of the types of learning

- Evolutionary learning:
  - Demonstrated in 2 NT scenarios
  - Autonomous selection/reproduction causes problems with population stability (implosion/explosion)
- Individual learning:
  - Implemented in code, but never demonstrated in NT scenarios
- Social learning:
  - Under construction/design, based on the "telepathy" approach
  - Communication protocols + adoption mechanisms are needed
Evolution: variation operators

- Operators for the DQT:
  - Crossover = subtree swap
  - Mutation:
    - Substitute a subtree with a random subtree
    - Change concepts in test nodes
    - Change the bias on an edge
- Operators for attitude genes:
  - Crossover = full arithmetic crossover
  - Mutation:
    - Add Gaussian noise
    - Replace with a random value
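A sketch of the attitude-gene operators named above. The gene range [0, 1], the mutation rates, and the blend factor are illustrative assumptions; the slides name only the operator types.

import random

def arithmetic_crossover(mum, dad, alpha=0.5):
    """Full arithmetic crossover: every child gene is the alpha-blend
    of the corresponding parent genes."""
    return [alpha * m + (1 - alpha) * d for m, d in zip(mum, dad)]

def mutate(genes, p_gauss=0.1, p_reset=0.02, sigma=0.05):
    """Per gene: replace with a fresh random value with probability
    p_reset, otherwise add Gaussian noise with probability p_gauss.
    All rates are assumptions."""
    out = []
    for g in genes:
        if random.random() < p_reset:
            g = random.random()
        elif random.random() < p_gauss:
            g = min(1.0, max(0.0, g + random.gauss(0.0, sigma)))
        out.append(g)
    return out

child = mutate(arithmetic_crossover([0.2, 0.9, 0.4], [0.6, 0.1, 0.4]))
print(child)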
Evolution: selection operators

- Mate selection:
  - The mate action is chosen by the DQT
  - Mating follows a propose – accept-proposal protocol
  - Both partners must have reached adulthood
- Survivor selection:
  - Dead if too old (age ≥ 80 years)
  - Dead if energy reaches zero
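Survivor selection reduces to two checks, transcribed directly below; the age threshold is from the slide, the function name is illustrative.

MAX_AGE = 80  # years; agents die at or beyond this age

def survives(age, energy):
    """Survivor selection as stated on the slide: an agent dies when it
    is too old (age >= 80 years) or when its energy reaches zero."""
    return age < MAX_AGE and energy > 0

assert survives(30, 5.0)
assert not survives(80, 5.0)   # too old
assert not survives(30, 0.0)   # out of energy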
Experiment: Simple world
Setup: Environment

- World size: 200 × 200 grid cells
- Agents and food only (no tokens, roads, etc.); both are variable in number
- Initial distribution of agents (500): in the upper-left corner
- Initial distribution of food (10,000): 5,000 each in the upper-left and lower-right corners
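A sketch of this initialization. The size of a "corner" region (50 × 50 cells) is an assumption; the slides do not specify it.

import random

SIZE, CORNER = 200, 50

def corner_cell(x0, y0):
    """Random cell inside a CORNER x CORNER region anchored at (x0, y0)."""
    return (x0 + random.randrange(CORNER), y0 + random.randrange(CORNER))

agents = [corner_cell(0, 0) for _ in range(500)]                # upper left
food = ([corner_cell(0, 0) for _ in range(5000)] +              # upper left
        [corner_cell(SIZE - CORNER, SIZE - CORNER)              # lower right
         for _ in range(5000)])
print(len(agents), len(food))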
Experiment: Simple world
Setup: Agents

- Native knowledge (concepts and DQT subtrees):
  - Navigating (random walk)
  - Eating (identify, pick up, and eat plants)
  - Mating (identify mates, propose/agree)
- Random DQT branches:
  - Differ per agent
  - Based on the "pool" of native concepts
Experiment: Simple world

- The simulation ran for 3 months of real time to test stability
Experiment: Poisonous Food
Setup: Environment

- Two types of food: poisonous (decreases energy) and edible (increases energy)
- World size: 200 × 200 grid cells
- Agents and food only (no tokens, roads, etc.); both are variable in number
- Initial distribution of agents (500): uniform random over the grid
- Initial distribution of food (10,000): 5,000 of each type, uniform random over the same grid space as the agents
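A sketch of the corresponding setup and of the energy effect of eating. The energy deltas (±10) are illustrative assumptions, since the slides give no magnitudes.

import random

SIZE = 200
food = [{"pos": (random.randrange(SIZE), random.randrange(SIZE)),
         "poisonous": kind} for kind in (True, False) for _ in range(5000)]

def eat(energy, item):
    """Poisonous food decreases energy, edible food increases it;
    the magnitude 10 is an assumption."""
    return energy - 10 if item["poisonous"] else energy + 10

print(len(food), eat(100, food[0]), eat(100, food[-1]))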
Experiment: Poisonous Food
Setup: Agents

- Native knowledge:
  - Identical to the simple-world experiment
- Additional native knowledge:
  - Agents can distinguish poisonous from edible plants
  - The relation with eating/picking up is not given natively (it must be learned)
- No random DQT branches
Experiment: Poisonous Food
Measures

- Population size
- Welfare (energy)
- Number of poisonous and edible plants
- Complexity of the controller (number of nodes)
- Age
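A sketch of how these measures could be logged per timestep; the aggregation choices (means over living agents) are assumptions.

def measure(agents, plants):
    """Collect the slide's five measures for one timestep."""
    n = max(len(agents), 1)
    return {
        "population_size": len(agents),
        "mean_energy": sum(a["energy"] for a in agents) / n,
        "edible_plants": sum(not p["poisonous"] for p in plants),
        "poisonous_plants": sum(p["poisonous"] for p in plants),
        "mean_dqt_nodes": sum(a["dqt_nodes"] for a in agents) / n,
        "mean_age": sum(a["age"] for a in agents) / n,
    }

agents = [{"energy": 90, "dqt_nodes": 12, "age": 3},
          {"energy": 50, "dqt_nodes": 20, "age": 7}]
plants = [{"poisonous": True}, {"poisonous": False}, {"poisonous": False}]
print(measure(agents, plants))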
Experiment: Poisonous Food
Demo
Experiment: Poisonous Food
Results

[Figure: time series over 15,000 timesteps plotting population size, healthy plants (×10), poisonous plants (×10), and average agent energy (×100) against timestep; the vertical axis runs from 0 to 2500.]
Environmental challenges