1
 GP usually does recombination using subtree
crossover.
 The idea:
 We have two individuals.
 In each individual, select a random subtree (which may be the root).
 Then swap those two subtrees.
 Commonly we pick leaf nodes 10% of the time and non-leaf nodes 90% of the time.
2
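The swap itself is easy to sketch. Below is a minimal Python sketch, assuming a hypothetical Node class with a value and a list of children (not any particular GP library), and the 10%/90% leaf bias described above:

```python
import random

class Node:
    """A minimal GP parse-tree node (hypothetical representation)."""
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []

def all_nodes(root):
    """Collect every node in a tree via DFS."""
    stack, nodes = [root], []
    while stack:
        n = stack.pop()
        nodes.append(n)
        stack.extend(n.children)
    return nodes

def pick_node(root, leaf_prob=0.1):
    """Pick a random node, favoring non-leaves 90% of the time."""
    nodes = all_nodes(root)
    leaves = [n for n in nodes if not n.children]
    internals = [n for n in nodes if n.children]
    if internals and random.random() >= leaf_prob:
        return random.choice(internals)
    return random.choice(leaves)

def subtree_crossover(a, b, leaf_prob=0.1):
    """Swap the contents of one randomly chosen subtree in each parent."""
    na, nb = pick_node(a, leaf_prob), pick_node(b, leaf_prob)
    na.value, nb.value = nb.value, na.value
    na.children, nb.children = nb.children, na.children
    return a, b
```

Swapping value and children in place lets the chosen subtree be the root without any special case.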
May also work as a node selection algorithm.
3
4
The node-selection procedure (the figure) works in two passes, both essentially DFS:
 The first pass counts only the desired nodes and their 'desired' children recursively, in DFS manner; afterward, c holds the number of desired nodes.
 We then randomly 'select' a node among the desired nodes by drawing a random integer a.
 The second pass (PickNode) skips the 'desired' nodes and their 'desired' children recursively, in DFS manner, until the random integer a is crossed; once crossed, it returns that node, which is the randomly 'selected' node. This is also a DFS.
 Note that c is made 0 before a call to PickNode.
5
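The two-pass count-then-pick procedure might look like this in Python; a sketch assuming trees are nested lists `[value, child, child, ...]` and `desired` is an arbitrary predicate:

```python
import random

def count_desired(tree, desired):
    """First DFS pass: count the nodes satisfying the 'desired' predicate.
    A tree is a nested list: [value, child, child, ...] (assumed shape)."""
    c = 1 if desired(tree) else 0
    for child in tree[1:]:
        c += count_desired(child, desired)
    return c

def pick_node(tree, desired):
    """Select a desired node uniformly at random with two DFS passes."""
    c = count_desired(tree, desired)
    a = random.randint(1, c)          # index of the node to return
    picked, _ = _walk(tree, desired, a, 0)
    return picked

def _walk(tree, desired, a, seen):
    """Second DFS pass: skip desired nodes until the counter crosses a."""
    if desired(tree):
        seen += 1
        if seen == a:
            return tree, seen
    for child in tree[1:]:
        found, seen = _walk(child, desired, a, seen)
        if found is not None:
            return found, seen
    return None, seen
```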
6
 Normally GP does not use mutation.
 Why? Because the crossover operator is:
 highly mutative,
 and non-homologous.
 In homologous crossover, an individual crossing over with itself will just make copies of itself.
7
 Subtree mutation: pick a random subtree and replace it with a randomly-generated subtree.
 Commonly Grow is used with a max depth of 5.
 Again, leaf nodes are often picked 10% of the time and non-leaf nodes 90% of the time.
 Other variants:
 Replace a random non-leaf node with one of its subtrees.
 Pick a random non-leaf node and swap its subtrees.
 Mutate ERC (ephemeral random constant) nodes with some noise.
 Select two subtrees in the individual such that neither is contained within the other, and swap them with one another.
8
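Subtree mutation with Grow might be sketched as below; the function set, terminal set, and the 0.3 early-stopping probability inside `grow` are illustrative assumptions, not values from the slides:

```python
import random

FUNCTIONS = {'+': 2, '*': 2}      # operator -> arity (hypothetical set)
TERMINALS = ['x', 'y', '1.0']

def grow(max_depth):
    """Grow a random subtree: at each level, maybe stop with a terminal."""
    if max_depth == 0 or random.random() < 0.3:
        return [random.choice(TERMINALS)]
    op = random.choice(list(FUNCTIONS))
    return [op] + [grow(max_depth - 1) for _ in range(FUNCTIONS[op])]

def subtree_mutation(tree, max_depth=5, leaf_prob=0.1):
    """Replace a random subtree (leaves picked 10% of the time) in place."""
    nodes, stack = [], [tree]
    while stack:
        n = stack.pop()
        nodes.append(n)
        stack.extend(n[1:])
    leaves = [n for n in nodes if len(n) == 1]
    internals = [n for n in nodes if len(n) > 1]
    target = (random.choice(leaves)
              if not internals or random.random() < leaf_prob
              else random.choice(internals))
    target[:] = grow(max_depth)       # in-place replacement
    return tree
```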
9
 Genetic Programming isn't constrained to a single tree;
 it's perfectly reasonable to have a genotype in the form of a vector of trees, i.e., a forest.
10
 Consider programs for a robot soccer team.
 Here an individual is an entire robot team.
 Each robot program is represented as two trees:
 one tree is called when the robot is far from the ball (it returns a vector indicating where to run),
 another tree is called when the robot is close enough to the ball to kick it (it returns a vector indicating the direction to kick).
11
 The individual consists of some number n of these tree pairs:
 perhaps one pair per robot (22 trees!),
 or one per robot class (goalies, forwards, etc.),
 or one for every robot to use (a homogeneous team: 2 trees only!).
 So a soccer individual might have from 2 to 22 trees!
12
 Trees can also be used to define automatically defined functions (ADFs).
 These ADFs can be called by a primary tree.
 ADFs can be used if ideal solutions are likely to be represented by:
 large trees
 with often-repeated subtrees within them.
13
 In such cases, an individual may consist of one or two subfunctions.
 These subfunctions are called repeatedly from a main tree.
 We add to the individual a second tree (the ADF tree).
 We include special nodes (function calls to that second tree) in the original parent tree's function set.
14
15
 We first evaluate the main tree.
 When it's time to call an ADF1 node, we first call its two children and store away their results;
 call them result1 and result2.
 We then call the ADF1 tree.
 When its ARG1 function is called, it automatically returns result1.
 Likewise ARG2 automatically returns result2.
 When the ADF1 tree is finished, we store away its return value;
 let's call it final.
 And we continue execution where we left off in the Main tree.
16
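The evaluation order just described can be sketched as a small interpreter. This assumes nested-list trees and a single two-argument ADF, with the names ADF1/ARG1/ARG2 from the slides; the `+`, `*`, and constant-leaf cases are illustrative:

```python
# A minimal sketch of ADF evaluation over nested-list trees.

def evaluate(tree, adf_tree, args=None):
    """Evaluate a tree; ARG1/ARG2 read from the current ADF call's args."""
    op = tree[0]
    if op == 'ARG1':
        return args[0]
    if op == 'ARG2':
        return args[1]
    if op == 'ADF1':
        # Evaluate the ADF1 node's children first (result1, result2)...
        result1 = evaluate(tree[1], adf_tree, args)
        result2 = evaluate(tree[2], adf_tree, args)
        # ...then run the ADF1 tree with ARG1/ARG2 bound to them.
        return evaluate(adf_tree, adf_tree, (result1, result2))
    if op == '+':
        return evaluate(tree[1], adf_tree, args) + evaluate(tree[2], adf_tree, args)
    if op == '*':
        return evaluate(tree[1], adf_tree, args) * evaluate(tree[2], adf_tree, args)
    return op  # a numeric constant leaf
```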
 The heuristic here is one of modularity.
 Modularity lets us search very large spaces if we
know that good solutions in them are likely to be
repetitive:
 We don’t need the individual to contain all of the repetitions perfectly.
 Rather, we can break the individual into modules, with an overarching section of the genotype defining how those modules are arranged.
 So, we have a primary tree having ADFs as nodes…
17
 Note that you could have more than one ADF tree.
 In theory you could have recursive calls, that is, ADF
trees which call each other.
 But your individuals won’t be smart enough to build a
base case automatically,
 so to keep the system from going into an infinite
recursive loop, you’ll need to have some maximum call
depth built in.
18
 When the ADF1 node is called, we jump immediately to the ADF1 tree,
 without bothering to call the children of the ADF1 node first.
 Whenever ARG1 in the ADF1 tree is called:
 we jump back to the main tree for a second,
 call the first child,
 get its result,
 come back to the ADF1 tree,
 and have ARG1 return that value.
 Likewise for ARG2.
19
20
 STGP is a variant of Genetic Programming
 In STGP we need to assign “Type Constraints” to nodes
 type constraints specify:
 which nodes are permitted to hook up with which
others
 and in what way.
21
 If each node in the tree returns the same kind of thing, then it’s easy;
 for example, in symbolic regression, all nodes return floating-point values.
 But in more complex programs, this isn’t really an option.
 For example, consider a special operator, “If”:
 “If” takes three arguments: a boolean test, the then-value to return if the test is true, and the else-value to return if the test is false.
 “If” returns floating-point values like the other nodes, but its test argument requires a node which returns a boolean value.
 This means we’ll need to add some nodes which return only boolean values:
 leaf nodes and some non-leaf node operators like And or Not.
 So, we must pay attention to which nodes are permitted to be children of which other nodes, and where.
 What happens, for example, if we try to multiply sin(x) by “false”?
22
 Atomic Typing
 Set Typing
 Polymorphic Typing
23
 It is the simplest approach.
 Here, each type is just a symbol or integer.
 Each of the following is assigned a type:
 the return value of each node,
 the expected child types for each node,
 and the expected return type for the tree as a whole.
 A node may attach as the child of another, or act as the root node of the tree, only if the types match.
24
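Atomic typing is easy to check mechanically. A sketch, assuming each node declares its return type and its children's expected types in a lookup table (the table entries here are made up for illustration):

```python
# A minimal atomic-typing check over nested-list trees.

NODE_TYPES = {
    'if':   {'returns': 'float', 'children': ['bool', 'float', 'float']},
    'and':  {'returns': 'bool',  'children': ['bool', 'bool']},
    'sin':  {'returns': 'float', 'children': ['float']},
    'x':    {'returns': 'float', 'children': []},
    'true': {'returns': 'bool',  'children': []},
}

def type_checks(tree, expected='float'):
    """A node may attach only where its return type matches what is expected."""
    name, children = tree[0], tree[1:]
    spec = NODE_TYPES[name]
    if spec['returns'] != expected:
        return False
    if len(children) != len(spec['children']):
        return False
    return all(type_checks(c, t) for c, t in zip(children, spec['children']))
```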
 Here the types are sets of symbols.
 Two types are said to match if their intersection is nonempty.
 Set typing can provide sufficient typing information for a lot of things.
25
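The matching test itself is a one-liner, assuming types are represented as Python sets of symbols:

```python
def set_types_match(type_a, type_b):
    """In set typing, two types match if their intersection is nonempty."""
    return bool(set(type_a) & set(type_b))
```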
 Atomic and set typing presume a finite number of
symbols.
 Polymorphic typing relies on type-resolution algorithms similar to those found in polymorphically typed programming languages.
 It is quite complex.
26
Example:
 Consider a matrix-multiply node which takes two
children providing matrices and multiplies them,
returning a new matrix.
 Clearly, the dimensions of the returned matrix are
functions of the two children matrices.
 What if we change one of the children to a subtree
which returns a new, differently-sized matrix?
 It’s possible to do this if we can reconcile it by
changing the return type of the parent.
 This may trigger a cascade of changes to return types,
or to the types of children, as the tree readjusts itself.
27
28
 In cellular encoding, trees are used as short programs to instruct an interpreter how to create a second data structure (usually a graph).
 This second data structure is then used as the phenotype.
29
The general idea:
 take a seed (perhaps a graph consisting of a
single node or a single edge) and hand it to
the root of the tree.
 The root operator modifies and expands the
graph, then hands certain expanded elements
off to its children.
 They then expand the graph further, handing expanded pieces to their children, and so on.
 This continues until the tree is exhausted.
 The fully expanded graph is then used as the phenotype.
30
 The original formulation of Cellular Encoding
operated on graph nodes
 used mostly for neural networks
 requires a fairly complicated mechanism.
 An alternative would be to operate on graph edges
 doesn’t allow all possible graphs
 but is fairly useful for sparse or planar graphs
 often found in electrical circuits or finite-state
automata.
 This is known as Edge Encoding.
 Edge Encoding is relatively easy to describe.
31
 Edge (Cellular) Encoding tree nodes:
 take things from parents
 operate on them
 and then hand them to their children.
 The Figure shows an Edge Encoding operator called
double.
 It takes an edge (E) handed to it by its parent
 and creates a duplicate edge (F) connecting the same two
nodes
 It then hands one edge each to its two children.
32
 The Edge Encoding operators:
 double: duplicates an edge.
 reverse: reverses an edge.
 loop: creates a self-loop edge on the … node of loop’s edge.
 bud: creates a new node and then a new edge from the head node of bud’s edge out to the new node.
 split: splits its edge into an edge from split’s tail node out to a new node, and then another edge back to ….
 (\epsilon, 1, 0): label their edge.
 start, accept: … node of their edge.
39
The initial Edge
40
After Applying double
Duplicates an
edge; hands one
edge each to its
two children.
41
After Applying reverse
Reverses an edge
42
After applying loop:
Creates a self-loop on the node of loop’s edge.
43
After applying loop, then \epsilon.
44
After applying loop, \epsilon, then start.
45
After applying loop, \epsilon, start, then 0.
46
After applying bud:
Creates a new node and then a new edge from the head node of bud’s edge out to the new node.
47
After applying split:
Splits its edge into an edge from split’s tail node out to a new node, and then another edge back to ….
48
After applying split, then 0.
49
After applying split, 0, then accept.
50
After applying split, 0, accept, then 1.
51
After applying 1.
52
So the edge encoding gives us a full finite-state automaton which recognizes the regular language (1|0)*01.
53
 Cellular and Edge encoding are examples of an indirect
or developmental encoding
 a representation which contains a set of rules to develop
a secondary data structure which is then used as the
phenotype.
 Indirect encodings are a popular research topic for two
reasons.
 First, there’s the biological attraction:
 DNA is an indirect encoding, as it creates RNA and protein which then go on to do the heavy lifting in living organisms.
 Second, we can add compactness and modularity by…
54
55
 In stack languages code takes the form of a stream
of instructions, usually in postfix notation.
 These languages assume the presence of a stack onto
which temporary variables, and in some cases chunks
of code, can be pushed and popped.
56
Example:
 Consider the expression 5 × (3 + 4).
 A stack language might say 5 3 4 + ×.
 This pushes 5, 3, and 4 on the stack,
 then pops the last two numbers (4 and 3) off the stack, adds them,
 and pushes the result (7) on the stack,
 then pops the remaining two numbers (7 and 5) off the stack,
 multiplies them,
 and pushes the result (35) back on.
57
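The push/pop sequence above can be sketched as a tiny postfix interpreter, supporting just numbers, + and × (written `*` here):

```python
# A minimal postfix stack evaluator for the example above.

def run(program):
    """Execute a stream of postfix tokens against a stack."""
    stack = []
    for token in program.split():
        if token == '+':
            b, a = stack.pop(), stack.pop()   # pops 4 then 3
            stack.append(a + b)
        elif token == '*':
            b, a = stack.pop(), stack.pop()   # pops 7 then 5
            stack.append(a * b)
        else:
            stack.append(float(token))        # push a number
    return stack
```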
 Stack languages often create subroutines by pushing
chunks of code onto the stack, then executing them
from the stack multiple times.
58
Example:
 We may generalize the procedure for evaluating a × (b+c) into a subroutine by wrapping its operators in parentheses.
 We use two special operators:
 “push (+×)”, which pushes the subroutine (+×), i.e., a × (b+c), onto the stack;
 and “do”, which pops a subroutine off the stack, executes it n times, and pushes it back on the stack.
 Then “5 7 9 2 4 3 6 5 9 push (+×) 4 do” computes 5×(7+9)×(2+4)×(3+6)×(5+9).
59
Example:
 “5 7 9 2 4 3 6 5 9 push (+×) 4 do”
 Pop do ⇒ it looks for an n.
 Pop 4 ⇒ now it knows it needs to do the ‘ops’ 4 times.
 Pop (+×) ⇒ a × (b+c) ⇒ apply the ‘+×’ operations 4 times:
 ×(5+9)
 ×(3+6)×(5+9)
 ×(2+4)×(3+6)×(5+9)
 ×(7+9)×(2+4)×(3+6)×(5+9) [4 times done]
 Now pop the last number, 5, which ends the job:
 5×(7+9)×(2+4)×(3+6)×(5+9).
60
 Stack languages have long been used in GP.
 Among the most well-known is Lee Spector’s GP stack language, named Push.
 Push maintains multiple stacks, one for each data type, allowing code to operate over different kinds of data cleanly.
 Push also includes special stacks for storing, modifying, and executing code.
 This allows Push programs to modify their own code as they are executing it.
 This makes possible, for example, the automatic creation of…
61
 If the language simply forms a stream of symbols with no constraints, we can just use a list representation (to be discussed shortly).
 But most stack languages at least require that the parentheses used to delimit code be paired.
 There are many ways to guarantee this constraint.
 In some stack languages a left parenthesis must always be followed by a non-parenthesis.
 This is easy to do (use the idea of a symbolic regression tree).
 Some stack languages allow parentheses immediately after left parentheses,
 e.g., ((a b) ( ) ((c)));
 here you could just use the left parenthesis as the root node of a subtree and the elements inside the parentheses as the children of that node.
 Both approaches require that tree nodes have arbitrary arity.
62
 Another approach is to use nested linked lists.
 Each parenthesized expression (like (a b)) forms one linked list, and elements in the expression can be…
 Nodes in each linked list are called cons cells.
 The left child of a cons cell holds a list element;
 the right child points to the next cons cell in the list, or to □, indicating the end of the list.
63
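A sketch of this representation, assuming Python tuples (car, cdr) stand in for cons cells, with None as the □ end-of-list marker:

```python
# Nested lists such as (a (b c)) built from cons cells.

def cons_list(items):
    """Build a linked list of cons cells from a Python sequence."""
    cell = None                      # the □ end-of-list marker
    for item in reversed(items):
        if isinstance(item, (list,)):
            item = cons_list(item)   # nested expression -> nested list
        cell = (item, cell)          # a cons cell: (element, rest-of-list)
    return cell

def to_python(cell):
    """Convert back to nested Python lists for inspection."""
    out = []
    while cell is not None:
        car, cell = cell
        out.append(to_python(car) if isinstance(car, tuple) else car)
    return out
```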
64
 Parse trees aren’t the only way to represent programs.
 Some GP practitioners represent programs as
arbitrary-length lists (or strings) of machine
language instructions.
 Individuals are evaluated by converting the lists
into functions and executing them.
 This is commonly known as Linear Genetic
Programming
65
 Executing arbitrary machine code strings can be
dangerous if closure isn’t maintained.
 Certainly your individual wouldn’t be just a bit-string;
 that would allow all sorts of machine language instructions, even undesirable or nonsense ones.
 Clearly it’d have to be a list of instructions chosen from a carefully-selected set.
*Closure refers to producing valid individuals from previous ones.
66
 If the instruction set is finite in length, we could just
assign a unique integer to each instruction and
represent a genotype as a list of integers.
 Usually these schemes employ a finite set of registers
as well:
 this allows the machine code lists to operate essentially
like directed acyclic graphs (DAGs),
 early instructions affecting instructions much further
down in the list due to their shared register.
 Additionally, there are some special instructions that operate on constants (Add 2, etc.).
67
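A register-based linear-GP interpreter can be sketched as follows; the instruction encoding `(op, dst, src1, src2)` and the operator table are illustrative assumptions, not a real machine-code scheme:

```python
# A minimal linear-GP sketch: a genotype is a list of
# register-to-register instructions, executed in order.

import operator

OPS = {0: operator.add, 1: operator.sub, 2: operator.mul}

def execute(genome, inputs, n_registers=4):
    """Run a list of (op, dst, src1, src2) integer tuples over registers."""
    regs = [0.0] * n_registers
    regs[:len(inputs)] = inputs      # load inputs into the first registers
    for op, dst, a, b in genome:
        regs[dst] = OPS[op](regs[a], regs[b])
    return regs[0]                   # register 0 holds the output
```

Because later instructions read registers written by earlier ones, the program behaves like a DAG rather than a tree, as the slide notes.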
 GE’s representation is a list of integers or boolean values.
 It then uses this list as the decision points in a predefined tree grammar to build a GP tree.
 The tree is then evaluated in GP style to assess fitness.
 This somewhat complex approach is yet another example of an indirect encoding,
 but it can straightforwardly define any tree for any desired language.
68
An arbitrary individual: <false false true true false true false true true false false>
 Consider the (arbitrary and ridiculous) grammar and an individual represented as a list.
 To interpret this, we start with tree, and use the first element in the list to decide how to expand it:
 false expands to the first item,
 true expands to the second item.
 Once we expand, we expand the remaining undefined variables in a depth-first fashion.
69
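The decision-point decoding can be sketched like this, with a made-up two-choice grammar (not the slide's grammar); false picks the first expansion, true the second, and expansion proceeds depth-first:

```python
# A minimal grammatical-evolution decoder: each boolean in the genome
# picks between the two expansions of the next nonterminal.

GRAMMAR = {
    'tree': [['op'], ['leaf']],                  # false -> op, true -> leaf
    'op':   [['+', 'tree', 'tree'], ['*', 'tree', 'tree']],
    'leaf': [['x'], ['y']],
}

def decode(genome):
    """Expand 'tree' depth-first, consuming one decision bit per nonterminal."""
    bits = iter(genome)

    def expand(symbol):
        if symbol not in GRAMMAR:
            return symbol                        # terminal symbol
        choice = GRAMMAR[symbol][int(next(bits))]
        parts = [expand(s) for s in choice]
        return parts[0] if len(parts) == 1 else parts

    return expand('tree')
```

A real GE system would also wrap around to the start of the genome when the bits run out, as noted on a later slide; this sketch simply assumes enough bits.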
 Notice that we wound up not using the last 4 bits in the individual (true true false false).
 What if the list is too short and we don’t have enough decision points?
 To solve this, typically one just wraps around to the beginning of the list again.
 Or bypasses evaluation and just assigns the individual the worst possible fitness.
78
 GE allows us to construct any valid tree for a given
grammar
 This is a lot more flexible than standard Tree-based GP:
 GE does not need strong typing.
 The downside is that this representation is naturally
un-smooth in certain places:
 tiny changes early in the list result in gigantic changes in
the tree.
 This can be a problem.
79
80
 How new lists are generated largely
depends on the domain-specific needs
of the method involved.
 Two main issues:
 specifying the length of the list,
 populating it.
81
 Specifying the length:
 sample a length from the geometric distribution.
 Beware that the distribution will have a very high number of small lists;
 you may wish to use a flatter distribution.
 Populating the list:
 just march through the list and set each of its values to something random but appropriate.
 Remember that for some problems there may be constraints on which elements may appear after other elements.
82
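A sketch of this initialization: length sampled from a geometric distribution (p = 0.2 is an arbitrary choice, giving a mean length of 5), then each slot filled with a random element:

```python
import random

def random_list(p=0.2, elements=('a', 'b', 'c')):
    """Create a random list whose length ~ geometric(p), length >= 1."""
    length = 1
    while random.random() > p:     # each failed 'stop' trial extends the list
        length += 1
    return [random.choice(elements) for _ in range(length)]
```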
83
Mutation in lists has two parts:
 changing the contents of the list.
 changing the size of the list
84
 Contents may be changed in exactly the
same way that you do for fixed-length
vectors:
 using a bit-flip mutation or integer
randomization, etc.
 Remember that you may not be able to change
some elements without changing others due to
certain constraints among the elements.
85
 Changing the length likewise depends on
the problem:
 some problems prefer to only add to the end of a
list.
 One simple approach is to sample from some distribution, then add (or subtract, if negative) that amount to the list length:
 we could do a random walk starting at 0, repeatedly flipping a coin until it comes up tails.
 The number we arrive at is what we add to (or delete from, if it’s negative) the end of the list.
86
In the random-walk size mutation (the figure): a larger b means a longer and more diffuse walk, and p must be initialized here as well. We take a step with some probability depending on p, and we continue the walk based on the value of b.
Recall that we did a similar random walk to determine the noise with which to mutate in Random Walk Mutation.
87
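The coin-flip walk can be sketched as below; b is the continuation probability, and the ±1 steps are an illustrative choice:

```python
import random

def random_walk_size(b=0.5):
    """Random walk from 0: take a +/-1 step, continue with probability b."""
    n = 0
    while random.random() < b:                 # larger b -> longer walk
        n += random.choice((-1, 1))
    return n
```

The returned value is how many elements to add to (or, if negative, delete from) the end of the list.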
 Lists can’t be any smaller than 1,
 but can be arbitrarily large.
 A random walk like this may cause the individual lists to become fairly large;
 you may need to add some countering force to keep your population from growing simply due to your mutation operator.
88
 In Grammatical Evolution, the early elements in the list are
far more important than the later elements.
 the early elements determine the early choices in the tree
grammar, and changing them radically changes the tree;
 whereas the later elements only change small subtrees or
individual elements
 if the list is too long, they don’t change anything at all!
 This has a huge effect on the smoothness of the landscape,
and you want to make sure your mutation procedure
reflects this.
 You might only occasionally change the elements at the
beginning of the list, and much more often change the
elements near the end of the list.
89
90
 There are various ways to do crossover among variable-length lists:
 one-point list crossover,
 two-point list crossover,
 and variations on the above.
 In one-point list crossover:
 we pick a (possibly different) point in each list,
 then cross over the segments to the right of the points.
 The two segments need not be the same length (and may be empty).
91
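A sketch of one-point list crossover over plain Python lists:

```python
import random

def one_point_list_crossover(a, b):
    """Pick a (possibly different) point in each list and swap the tails."""
    i = random.randint(0, len(a))
    j = random.randint(0, len(b))
    return a[:i] + b[j:], b[:j] + a[i:]
```

Note that the children can be longer or shorter than either parent, which is exactly how list lengths drift under this operator.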
Doing the swap
92
 In two-point list crossover, we pick two points in each individual and swap the mid-sections.
 Again, note that the points don’t have to be the same.
 The one- and two-point crossovers have quite different dynamics.
 We need to carefully consider many things,
 e.g.: is the representation reliant on the particulars of what’s going on in the middle, and sensitive to disruption there?
93
Ensuring that c < d and e < f
Doing the swap
94
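The two-point version, ensuring c < d and e < f as in the figure, can be sketched as:

```python
import random

def two_point_list_crossover(a, b):
    """Pick two points in each list (c < d, e < f) and swap the mid-sections."""
    c, d = sorted(random.sample(range(len(a) + 1), 2))
    e, f = sorted(random.sample(range(len(b) + 1), 2))
    return a[:c] + b[e:f] + a[d:], b[:e] + a[c:d] + b[f:]
```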
 certain elements of the list may be more important
than others and more sensitive to being messed up via
crossover.
 So in Grammatical Evolution for example you might
want to consider picking two-point crossover points
near to the end of the list more often than ones near
the front.
 Or stick with one-point crossover
95
96
 Rules in rulesets usually take a form which looks like if ---> then.
 The if part is commonly called the body of the rule,
 and the then part is commonly called the head.
 There are two common kinds of rulesets:
 state-action rulesets
 production rulesets.
97
 State-action rules are designed to
perform some action (the then) when
some situation or event has occurred in
the world (the if ).
 e.g., a robot’s sensors might trigger a
rule which causes the robot to turn
left.
98
 Production rules are different in that some
rules’ then actions trigger other rules’ if
portions.
 e.g., if a rule a ---> b fires, it would then
cause some other rule b ---> c to fire.
 Production rules are mostly used to
construct indirect encodings which grow
graph structures etc.
 The interconnections among the rules in a production ruleset resemble a directed graph structure.
99
 We could use a variable-sized vector structure like a list (most commonly used!).
 Or we could use a hash table which stores the elements as keys and arbitrary things as values (a faster method!).
 The basic closure constraint in a set is its uniqueness property:
 often you have to make sure that when you create sets, mutate them, or cross them over, the rules remain all different.
 Unless you have a mutation or crossover operation which does this naturally, you may need to go back into the set and remove duplicates.
100
In the duplicate-removal algorithm (the figure): h keeps track of the unique elements (h may be a hash table), and \ell’ in the end gives the length without duplicates. If v_i is a duplicate, swap it with the last element and decrement \ell’, since the number of unique elements is now one less; if v_i is not a duplicate, keep it in h. In the end, v_1, v_2, …, v_\ell’ hold the unique elements of vector v; put them in a blank vector of length \ell’.
101
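That swap-to-the-tail scheme can be sketched directly in Python (using a set for h):

```python
def remove_duplicates(v):
    """In-place duplicate removal mirroring the described algorithm:
    h tracks seen elements; duplicates are swapped out to the tail."""
    h = set()
    length = len(v)
    i = 0
    while i < length:
        if v[i] in h:                          # v_i is a duplicate
            v[i], v[length - 1] = v[length - 1], v[i]
            length -= 1                        # one fewer unique slot
        else:
            h.add(v[i])                        # v_i is not a duplicate; keep it
            i += 1
    return v[:length]                          # the unique elements
```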
102
 An agent is an autonomous
computational entity
 It manipulates the world on its own
 in response to feedback it receives from
the world.
 Agents include autonomous robots, game
agents, entities in simulations, etc.
103
 One common kind of program an agent might follow
is a policy:
 a collection of simple rules to tell the agent what to do in
each possible situation it may find itself in.
 These rules are often called state-action rules.
 Example state-action rules for an agent to get
around in a city:
 are you downtown? Then get on the train.
 Are you on the train? Then take the train to the wharf.
 Did you miss your stop? Then get off the train and get on
the return train.
104
State description: some feature about the current world that might or might not be true.
Action (or class): what we should do if that feature is true.
A robot might have rules like the following:
105
1. Under-Specification:
 no rules match the current condition
 there are holes in the space which no
rule covers.
 This is often handled by requiring a
default rule which fires when no
other rule fires.
106
2. Over-Specification:
 more than one rule matches the current condition, but
those rules disagree in their heads in an incompatible way
 e.g., one says “Turn Left” and one says “Turn Right”
 We’ll need to employ some kind of arbitration scheme to decide what to do.
 Most commonly, if we have lots of rules, we might have a vote.
 Another way is to pick a rule at random.
 Note that a state space can be simultaneously under- and over-specified.
107
 As we move the agent around, we may not only assess the fitness of the individual itself but also assess the fitness of the individual rules inside the ruleset individual.
 At the very least, this can be done by breaking the rules into those which fired during the course of running the individual and those which never fired
 and thus aren’t responsible for the wonderful/terrible outcome that resulted.
 We can then punish or reward only the rules which fired.
 Or if, after turning left, the robot received an electric shock, we might penalize the series of rules whose firings led up to that shock.
 We might then be more inclined to mutate or eliminate (by crossover) the more-penalized rules.
108
109
 Production rules are similar to state-action rules
except that the “actions” are used to trigger the states
of other rules.
 production rules are fired (typically) by a single event
(triggered by some other rule usually) and then this
causes them to trigger multiple downstream rules in
turn.
 Many of the uses for production rules are to enable modular indirect encodings.
110
 Typically the symbols which appear in the heads (the
right side) of production rules are of two forms:
 nonterminal symbols, which may also appear in the
bodies of rules, and
 terminal symbols, which normally may not appear
in the bodies of rules
 Terminal symbols normally don’t expand any further.
 Note that for most production systems, there’s a fixed
number of rules, one per nonterminal.
111
 Suppose we are trying to create an 8-node directed,
unlabeled graph structure.
 Our ruleset might look like the following (numbers are
terminal symbols):
112
Start
113
Start
We then apply the rule which matches a
114
Start
we then apply rules to each of the elements in that matrix, expanding them into
their 2×2 elements
115
Start
Now we expand again!
120
Start
Now we are out of non-terminals!
However, we can still expand 0 and 1.
We can continue for a predefined number of times.
124
Start
So, this is our adjacency matrix
125
 These are sets of production rules which produce a
string of symbols.
 That string is then interpreted as a small computer
program to produce some final object such as a plant
or tree, fractal or pattern, or machine.
 A simple example of an L-System is one which creates
the Koch Curve, a fractal pattern.
126
 The rule system consists of the single rule F ---> F + F − F − F + F.
 We start with a single F. Applying this rule, this expands to F + F
− F − F + F.
 Expanding each of these F’s using the rule, we get: F + F − F − F +
F+F+F−F−F+F−F+F−F−F+F−F+F−F−F+F+F+F−
F−F+F
 Expanding yet again, we get: F + F − F − F + F + F + F − F − F + F −
F + F − F − F + F − F + F − F − F + F + F + F − F − F + F+ F + F − F −
F+F+F+F−F−F+F−F+F−F−F+F−F+F−F−F+F+F+
F − F − F + F− F + F − F − F + F + F + F − F − F + F − F + F − F − F +
F − F + F − F − F + F + F + F − F − F + F− F + F − F − F + F + F + F −
F−F+F−F+F−F−F+F−F+F−F−F+F+F+F−F−F+F+
F+F−F−F+F+F+F−F−F+F−F+F−F−F+F−F+F−F−
F+F+F+F−F−F+F
127
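The expansion process above can be sketched as a tiny rewriting function over strings, using the Koch-curve rule F ---> F+F-F-F+F:

```python
# A minimal L-system expander: every symbol with a rule is rewritten
# in parallel; symbols without a rule (+ and -) pass through unchanged.

def expand(axiom, rules, steps):
    """Apply the production rules to every symbol, 'steps' times."""
    s = axiom
    for _ in range(steps):
        s = ''.join(rules.get(ch, ch) for ch in s)
    return s
```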
 Let us use the following
interpretation:
 F => “draw a line forward”
 + => “turn left”
 − => “turn right”
 Then we get the Koch Curve
shown in Figure
 Further expansions create more
complex patterns.
128
 One interesting use of L-Systems with evolutionary
computation was in discovering useful designs such as
novel chairs or tables.
 L-Systems have also been applied together with Edge Encoding to discover animal body forms and finite-state-automata-like graph structures.
 The L-System ruleset expanded into a string, which was then interpreted as a series of Edge Encoding instructions (double, split, etc.) to produce the final graph.
129
130
 Building rulesets is mostly a matter of determining
how many elements you want, and then creating
them.
 We begin by picking a desired ruleset size n, using some distribution (the Geometric Distribution is probably fine).
 We then create a ruleset out of n randomly-generated elements.
131
 For production rules, there are some additional
constraints.
 Specifically, the various symbols which appear in the
heads of the rules need to match symbols in the bodies
of the rules.
 Likewise, you probably won’t want two rules that have
the same body
 If we have two production rules of the form a ---> b, c
and a ---> d, e, f , which one should fire?
 Arbitration doesn’t make much sense in production
rules, unlike state-action rules, unless perhaps your
production rules are probabilistic.
132
133
There is no rule for c! What gets triggered from c events?
Duplicate rules for a!
Recursive rules!
This is a disconnected (junk) rule! Nothing will trigger this!
 During initialization you’ll need to handle some of these
situations.
 Two ways:
 You could generate rules at random and then try to “fix” things.
 Or you could create some n nonterminal symbols and then construct
rules for each of them.
 Issues:
 whether to allow recursive rules or not,
 whether or not to permit disconnected rules (that is, ones never
triggered).
138
In the figure’s initialization algorithm: v is the set of nonterminals, and v_1 will be our start symbol; \ell is the size of the current rule, and h, of size \ell, keeps the right side of the current rule. We get the ith rule in turn; recursion is avoided, and junk rules are handled, by restricting the right-side symbols to the nonterminals to the right of v_i, up to v_\ell.
139
140
 Mutation in sets is often similar to mutation in lists; it has two parts:
 changing the size of the ruleset (if you’re allowed to),
 and mutating the rules in the set.
 One way to change the size is to sample a small value from
the geometric distribution, then either add or delete that
number of rules from the set
 You might select victims at random
 Likewise, you could mutate rules in the set in the same
manner as bit-flip mutation:
 mutate each rule with a certain independent probability.
 Production rules, as usual, have additional constraints.
141
142
 If you have a fixed number of rules, then recombination may be easy:
 just use uniform crossover to swap some subset of the rules.
 If your number of rules is arbitrary, you may need to pick a subset of rules in each individual and swap them.
 If you’ve got constraints on your rules (such as in production rules), you need to be careful about crossover:
 what happens if you disconnect a rule? (Or do you care?)
 What happens if you eliminate a rule for which some other rule had an event that triggered it? Who gets triggered now?
 One of the biggest issues in crossing over arbitrary-length production rulesets is in merging the symbols:
 you may have symbols in one ruleset which don’t match up with symbols in the other ruleset. As a result, the rulesets are essentially disjoint.
 And there isn’t any good guidance here: like graphs, it’s fairly ad hoc.
143
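For the fixed-size case, uniform crossover over rulesets can be sketched as below (rules here are just opaque values; p = 0.5 is the usual swap probability):

```python
import random

def uniform_ruleset_crossover(a, b, p=0.5):
    """Swap each rule position between two equal-length rulesets with prob p."""
    a, b = list(a), list(b)
    for i in range(len(a)):
        if random.random() < p:
            a[i], b[i] = b[i], a[i]
    return a, b
```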