Learning to Support Constraint Programmers
Susan L. Epstein¹, Gene Freuder², and Rick Wallace²
¹ Department of Computer Science, Hunter College and The Graduate Center of The City University of New York
² Cork Constraint Computation Centre
ACE:
• Learns to solve constraint satisfaction problems
• Learns search heuristics
• Can transfer what it learns on simple problems to solve more difficult ones
• Can export knowledge to ordinary constraint solvers
• Both a learner and a test bed
• Heuristic but complete: will find a solution, eventually, if one exists
• Guarantees high-quality, not optimal, solutions
• Begins with substantial domain knowledge
Outline
• Performance results
• Reasoning mechanism
• Learning
• Representations
The Problem Space
• Constraint satisfaction problem <X, D, C>
• Solution: assign a value to every variable, consistent with the constraints
• Many real-world problems can be represented and solved this way (design and configuration, planning and scheduling, diagnosis and testing)

Example problem:
Variables: A, B, C, D
Domains: A ∈ {1,2,3}, B ∈ {1,2,4,5,6}, C ∈ {1,2}, D ∈ {1,3}
Constraints: A = B, A > D, C ≠ D
Allowed pairs: A–B: (1 1) (2 2); A–D: (2 1) (3 1) (3 2); C–D: (1 3) (2 1) (2 3)
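The example problem can be written down directly; a minimal sketch in Python (the dict-of-predicates representation here is illustrative, not ACE's):

```python
from itertools import product

# The slide's example problem: four variables, three binary constraints.
DOMAINS = {"A": [1, 2, 3], "B": [1, 2, 4, 5, 6], "C": [1, 2], "D": [1, 3]}
CONSTRAINTS = [
    (("A", "B"), lambda a, b: a == b),   # A = B
    (("A", "D"), lambda a, d: a > d),    # A > D
    (("C", "D"), lambda c, d: c != d),   # C ≠ D
]

def solutions(domains, constraints):
    """Enumerate every total assignment consistent with all constraints."""
    names = list(domains)
    for values in product(*(domains[v] for v in names)):
        assignment = dict(zip(names, values))
        if all(check(assignment[x], assignment[y])
               for (x, y), check in constraints):
            yield assignment

print(list(solutions(DOMAINS, CONSTRAINTS)))  # [{'A': 2, 'B': 2, 'C': 2, 'D': 1}]
```

Brute-force enumeration like this is feasible only for tiny problems; the slides that follow show how search does better.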
A Challenging Domain
• Constraint solving is NP-hard
• Problem class parameters <n, k, d, t>:
  – n = number of variables
  – k = maximum domain size
  – d = edge density (% of possible constraints)
  – t = tightness (% of value pairs excluded)
• Complexity peak: the values of d and t that make problems hardest
• Heavy-tailed difficulty distribution [Gomes et al., 2002]
• A problem may have multiple solutions or none
• Unexplored choices may be good
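A problem class <n, k, d, t> can be sampled as follows; a hedged sketch, assuming every variable shares the domain {0..k-1} and that density and tightness are applied as simple sampling fractions (the function and parameter names are ours, not ACE's generator):

```python
import random
from itertools import combinations

def random_csp(n, k, d, t, seed=0):
    """Sample one problem from class <n, k, d, t>: n variables over the
    domain {0..k-1}, each possible edge kept with probability d, and on
    each kept edge a fraction t of the k*k value pairs excluded."""
    rng = random.Random(seed)
    domains = {v: list(range(k)) for v in range(n)}
    pairs = [(a, b) for a in range(k) for b in range(k)]
    constraints = {}
    for edge in combinations(range(n), 2):
        if rng.random() < d:
            excluded = set(rng.sample(pairs, int(t * len(pairs))))
            constraints[edge] = [p for p in pairs if p not in excluded]
    return domains, constraints

# A sample from the hard class <30, 8, .24, .66> used later in the talk:
domains, constraints = random_csp(30, 8, 0.24, 0.66, seed=42)
print(len(domains), len(constraints))  # 30 variables, roughly 24% of the 435 edges
```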
Finding a Path to a Solution
• Search proceeds from the initial state to a goal state
• A solution path is a sequence of decision pairs (select a variable, assign it a value)
• Optimal path length: 2n for n variables
• For n variables with domain size d, there are (d+1)^n possible states

[Search-tree diagram on the example problem: after B=1 and A=1, C=1 leads to both D=1 and D=3 failing ("No"); search continues in the A=2 branch. Levels alternate variable selection with value assignment.]
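The alternating select-variable / assign-value search can be sketched as chronological backtracking on the example problem (a minimal illustration of the solution method, not ACE's solver):

```python
# The slide's example problem again, as domains plus binary predicates.
DOMAINS = {"A": [1, 2, 3], "B": [1, 2, 4, 5, 6], "C": [1, 2], "D": [1, 3]}
CONSTRAINTS = [
    (("A", "B"), lambda a, b: a == b),
    (("A", "D"), lambda a, d: a > d),
    (("C", "D"), lambda c, d: c != d),
]

def consistent(assignment):
    """True if no constraint with both endpoints assigned is violated."""
    return all(check(assignment[x], assignment[y])
               for (x, y), check in CONSTRAINTS
               if x in assignment and y in assignment)

def backtrack(assignment=None):
    """One (select variable, assign value) decision pair per level;
    on inconsistency, retract the value and try the next one."""
    if assignment is None:
        assignment = {}
    if len(assignment) == len(DOMAINS):
        return dict(assignment)
    var = next(v for v in DOMAINS if v not in assignment)  # naive lexical ordering
    for val in DOMAINS[var]:
        assignment[var] = val
        if consistent(assignment):
            result = backtrack(assignment)
            if result is not None:
                return result
        del assignment[var]  # retract
    return None

print(backtrack())  # {'A': 2, 'B': 2, 'C': 2, 'D': 1}
```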
Consistency Maintenance
• Some values may initially be inconsistent
• Value assignment can restrict the domains of other variables

[Diagram on the example problem: assigning B=1 restricts the remaining domains; A=1 then leaves no consistent possibilities, so search moves on to B=2.]
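Forward checking, one of the consistency maintenance methods ACE knows, can be sketched as pruning neighbouring domains after each assignment (the problem representation matches the earlier example; function names are illustrative):

```python
DOMAINS = {"A": [1, 2, 3], "B": [1, 2, 4, 5, 6], "C": [1, 2], "D": [1, 3]}
CONSTRAINTS = [
    (("A", "B"), lambda a, b: a == b),
    (("A", "D"), lambda a, d: a > d),
    (("C", "D"), lambda c, d: c != d),
]

def forward_check(var, val, domains, constraints):
    """Prune neighbouring domains against the assignment var = val.
    Returns the reduced domains, or None if some domain is wiped out."""
    reduced = {v: list(dom) for v, dom in domains.items()}
    reduced[var] = [val]
    for (x, y), check in constraints:
        if x == var:
            reduced[y] = [b for b in reduced[y] if check(val, b)]
            if not reduced[y]:
                return None
        elif y == var:
            reduced[x] = [a for a in reduced[x] if check(a, val)]
            if not reduced[x]:
                return None
    return reduced

print(forward_check("B", 1, DOMAINS, CONSTRAINTS)["A"])  # [1]: A = B forces A's domain
print(forward_check("A", 1, DOMAINS, CONSTRAINTS))       # None: no d in {1,3} has 1 > d
```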
Retraction
• When an inconsistency arises, a retraction method removes a value and returns search to an earlier state

[Diagram on the example problem: after B=1, A=1 fails ("No!"); the retraction method must locate where the error was made ("Where's the error? Here!") before trying B=2.]
Variable Ordering
lA
good variable ordering can speed search
Domains
A  {1,2,3}
B  {1,2,4,5,6}
C  {1,2}
D  {1,3}
B  {1,2}
C  {1,2}
D  {1,2}
A
A=1
B  {1,2}
C  {1,2}
D
No
A=2
…
(1 1) (2 2)
A
B
(2 1) (3 1) (3 2)
C
D
(1 3) (2 1) (2 3)
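A classic example of such an ordering is MinDomain, the "off-the-shelf" heuristic ACE is compared against later in the talk; a one-line sketch:

```python
DOMAINS = {"A": [1, 2, 3], "B": [1, 2, 4, 5, 6], "C": [1, 2], "D": [1, 3]}

def min_domain(unassigned, domains):
    """MinDomain: choose the unassigned variable with the smallest
    remaining domain (ties broken by order of listing)."""
    return min(unassigned, key=lambda v: len(domains[v]))

print(min_domain(["A", "B", "C", "D"], DOMAINS))  # C: its domain has only 2 values
```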
Value Ordering
• A good value ordering can speed search too

[Diagram on the example problem: trying A=2 first leads directly to the solution A=2, B=2, C=2, D=1.]
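As one illustration of a value ordering, the sketch below prefers values that leave more consistent values in neighbouring domains; this is a simple promise-style measure of our choosing, not a heuristic named on the slide:

```python
DOMAINS = {"A": [1, 2, 3], "B": [1, 2, 4, 5, 6], "C": [1, 2], "D": [1, 3]}
CONSTRAINTS = [
    (("A", "B"), lambda a, b: a == b),
    (("A", "D"), lambda a, d: a > d),
    (("C", "D"), lambda c, d: c != d),
]

def promise(var, domains, constraints):
    """Order var's values so that the values leaving the most consistent
    values in neighbouring domains come first."""
    def support(val):
        total = 0
        for (x, y), check in constraints:
            if x == var:
                total += sum(1 for b in domains[y] if check(val, b))
            elif y == var:
                total += sum(1 for a in domains[x] if check(a, val))
        return total
    return sorted(domains[var], key=support, reverse=True)

print(promise("A", DOMAINS, CONSTRAINTS))  # [2, 1, 3]: A=2 keeps the most options open
```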
Constraint Solvers Know…
• Several consistency methods
• Several retraction methods
• Many variable ordering heuristics
• Many value ordering heuristics
… but the interactions among them are not well understood, nor is any one combination best for all problem classes.
Goals of the ACE Project
• Characterize problem classes
• Learn to solve classes of problems well
• Evaluate mixtures of known heuristics
• Develop new heuristics
• Explore the role of planning in solution
Outline
• Performance results
• Reasoning mechanism
• Learning
• Representation
Experimental Design
• Specify the problem class, consistency method, and retraction method
• Average performance across 10 runs
• Learn on L problems (halt at 10,000 steps)
• To-completion testing on T new problems
• During testing, use only heuristics judged accurate during learning
• Evaluate performance on:
  – Steps to solution
  – Constraint checks
  – Retractions
  – Elapsed time
ACE Learns to Solve Hard Problems
• Class <30, 8, .24, .66>, near the complexity peak
• Learn on 80 problems
• 10 runs, binned in sets of 10 learning problems
• Outperforms MinDomain, an "off-the-shelf" heuristic

[Bar chart: steps to solution (0–2500), average and median, for each bin of 10 learning problems (bins 1–10 through 71–80); means in blue, medians in red.]
ACE Rediscovers the Brélaz Heuristic
• Graph coloring: assign different colors to adjacent nodes
• Graph coloring is a kind of constraint satisfaction problem
• Brélaz: minimize the dynamic domain, breaking ties with maximum forward degree
• ACE learned this consistently on different classes of graph coloring problems
Example task: color each vertex red, blue, or green so that each pair of adjacent vertices gets different colors.
[Epstein & Freuder, 2001]
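The Brélaz rule as stated above can be sketched directly; the mid-search snapshot below (remaining colors and forward degrees) is hypothetical data for illustration:

```python
def brelaz(unassigned, domains, forward_degree):
    """Brélaz: minimise dynamic domain size; break ties by maximum
    forward degree (degree among not-yet-colored neighbours)."""
    return min(unassigned, key=lambda v: (len(domains[v]), -forward_degree[v]))

# Hypothetical snapshot of a 3-coloring problem mid-search:
doms = {"X": ["red", "blue"], "Y": ["red", "blue"], "Z": ["red", "blue", "green"]}
deg = {"X": 1, "Y": 3, "Z": 2}
print(brelaz(["X", "Y", "Z"], doms, deg))  # Y: smallest domain, then highest degree
```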
ACE Discovers a New Heuristic
• "Maximize the product of degree and forward degree at the top of the search tree"
• Benchmark heuristics: Min Domain; Min Domain/Degree; Min Domain with a degree preorder
• Learned on small problems, but tested in 10 runs on n = 150, domain size 5, density .05, tightness .24
• Reduced search tree size by 25% – 96%
[Epstein, Freuder, Wallace, Morozov, & Samuels 2002]
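The discovered rule can be sketched as a scoring function; the degrees below are hypothetical, and the staging ACE learned (applying the rule only near the top of the search tree) is not modelled here:

```python
def degree_product(unassigned, degree, forward_degree):
    """The discovered heuristic: pick the unassigned variable that
    maximises degree * forward degree."""
    return max(unassigned, key=lambda v: degree[v] * forward_degree[v])

# Hypothetical static and forward degrees for four variables:
degree = {"V1": 4, "V2": 3, "V3": 5, "V4": 2}
fwd = {"V1": 2, "V2": 3, "V3": 1, "V4": 2}
print(degree_product(["V1", "V2", "V3", "V4"], degree, fwd))  # V2: 3*3 = 9 is largest
```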
Outline
• Performance results
• Reasoning mechanism
• Learning
• Representation
A Constraint-Solving Heuristic
• Uses domain knowledge
• What problem classes does it work well on?
• Is it valid throughout a single solution?
• Can its dual also be valid?
• How can heuristics be combined?
• … and where do new heuristics come from?
FORR (For the Right Reasons)
• General architecture for learning and problem solving
• Multiple learning methods, multiple representations, multiple decision rationales
• Specialized by domain knowledge
• Learns useful knowledge to support reasoning
• Each rationale is specified as either correct or heuristic
• Learns to combine rationales to improve problem solving
[Epstein 1992]
Advisors
• Class-independent action-selection rationales
• Support or oppose actions through comments
• Express the direction of an opinion through strengths
• Limitedly-rational procedures that map the current problem state to actions
Advisors are organized in three tiers:
• Tier 1: rationales that correctly select a single action
• Tier 2: rationales that produce a set of actions directed to a subgoal
• Tier 3: heuristic rationales that select a single action

Choosing an Action
[Flowchart: given the current state and possible actions, the tier-1 reactions (T-1₁ … T-1ₙ, e.g. Victory, drawn from perfect knowledge) either take an action or pass control to tier 2 (planners P-1 … P-k, triggered by situation recognition), which either begins a plan or passes control to tier 3, where the heuristic reactions (T-3₁ … T-3ₘ) vote and the winning action is taken.]
ACE's Domain Knowledge
• Consistency maintenance methods: forward checking, arc consistency
• Backtracking methods: chronological
• 21 variable ordering heuristics
• 19 value ordering heuristics
• 3 languages whose expressions have interpretations as heuristics
• Graph theory knowledge, e.g., connected, acyclic
• Constraint solving knowledge, e.g., "only one arc consistency pass is required on a tree"
An Overview of ACE
• Performance results
• Reasoning mechanism
• Learning
• Representation
What ACE Learns
• A weighted linear combination of comment strengths
  – For voting in tier 3 only
  – Includes only valuable heuristics
  – Indicates the relative accuracy of valuable heuristics
• New, learned heuristics
• How to restructure tier 3
• When random choice is the right thing to do
• Knowledge that supports heuristics (e.g., typical solution path length)
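Tier-3 voting with a weighted linear combination of comment strengths might look like the sketch below; the Advisor names, comment strengths, and weights are invented for illustration, not ACE's learned values:

```python
def vote(actions, advisors, weights):
    """Tier-3 voting sketch: each heuristic Advisor comments on actions
    with strengths; strengths are combined by a learned weighted linear
    function and the top-scoring action wins."""
    scores = {a: 0.0 for a in actions}
    for name, advise in advisors.items():
        for action, strength in advise(actions).items():
            scores[action] += weights.get(name, 0.0) * strength
    return max(actions, key=lambda a: scores[a])

# Hypothetical Advisors (as comment functions) and learned weights:
advisors = {
    "MinDomain": lambda acts: {"pick-C": 1.0, "pick-B": 0.2},
    "MaxDegree": lambda acts: {"pick-B": 1.0},
}
weights = {"MinDomain": 0.9, "MaxDegree": 0.4}
print(vote(["pick-B", "pick-C"], advisors, weights))  # pick-C: 0.9 beats 0.58
```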
Digression-based Weight Learning
• Learn from the trace of each solved problem
• Reward decisions on the perfect solution path
  – Shorter paths reward variable ordering
  – Longer paths reward value ordering
• Blame digression-producing decisions in proportion to the error

[Diagram: along the alternating (select variable, assign value) path to a solution, a digression branches away from the solution path; its decisions are blamed.]
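The reward/blame scheme can be sketched as a simple update rule; the trace format and the rate parameter below are our assumptions for illustration, not ACE's actual implementation:

```python
def update_weights(weights, decisions, rate=0.1):
    """Digression-based sketch: each decision records the Advisors that
    supported it and the size of any digression it produced (0 means it
    lay on the perfect solution path). On-path support is rewarded;
    digression support is blamed in proportion to the error."""
    new = dict(weights)
    for supporters, digression in decisions:
        for name in supporters:
            if digression == 0:
                new[name] = new.get(name, 0.0) + rate                # reward
            else:
                new[name] = new.get(name, 0.0) - rate * digression   # blame
    return new

# Hypothetical trace: two on-path decisions and one 3-step digression.
trace = [({"MinDomain"}, 0), ({"MaxDegree"}, 3), ({"MinDomain"}, 0)]
print(update_weights({"MinDomain": 0.5, "MaxDegree": 0.5}, trace))
```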
Learning new heuristics:
• An Advisor grammar on pairs of concerns
  – Maximize or minimize
  – Product or quotient
  – Stage
• Monitor all expressions
• Use good ones collectively
• Use the best ones individually
Outline
• Performance results
• Reasoning mechanism
• Learning
• Representation
Representation of Experience
• A state describes variables and value assignments, impossible future values, the prior state, connected components, constraint checks incurred, dynamic edges, and trees
• The history of successful decisions, plus other significant decisions, becomes the training examples

Example state:
Variable   Is   Can be   Cannot be
A          —    1        2
B          2    —        —
C          —    1,2      —
D          —    1,3      —
Checks incurred: 4; one acyclic component: A, C, D
Representation of Learned Knowledge
• Solution size distribution
• Latest error: the greatest number of variables bound at retraction
ACE's Status Report
• 41 Advisors in tiers 1 and 3
• 5 experimental planners
• Problem classes: random, coloring, geometric, logic, n-queens, small world, and quasigroup (with and without holes)
• Learns to solve hard problems
  – Learns new heuristics
  – Transfers to harder problems
  – Divides and conquers problems
  – Learns when not to reason
Current ACE Research
• Further weight-learning refinements
• Learn appropriate restart parameters
• More problem classes, consistency methods, retraction methods, planners, and Advisor languages
• Learn appropriate consistency checking methods
• Learn appropriate backtracking methods
• Learn to bias initial weights
• Metaheuristics to reformulate the architecture
• Modeling strategies
• … and, coming soon, ACE on the Web
Acknowledgements
Continued thanks for their ideas and efforts go to:
Diarmuid Grimes
Mark Hennessey
Tiziana Ligorio
Anton Morozov
Smiljana Petrovic
Bruce Samuels
Students of the FORR study group
The Cork Constraint Computation Centre
and, for their support, to:
The National Science Foundation
Science Foundation Ireland
Is ACE Reinforcement Learning?
• Similarities:
  – Unsupervised learning through trial and error
  – Delayed rewards
  – Learns a policy
• Primary differences:
  – Reinforcement learning learns a policy represented as the estimated values of states it has experienced repeatedly… but ACE is unlikely to revisit a state; instead it learns how to act in any state
  – Q-learning learns state–action preferences… but ACE learns a policy that combines action preferences
How is ACE like STAGGER?

                  STAGGER              ACE
Learns            Boolean classifier   Search control preference function for a
                                       sequence of decisions in a class of problems
Represents        Weighted booleans    Weighted linear function
Supervised        Yes                  No
New elements      Failure-driven       Success-driven
Initial bias      Yes                  Under construction
Real attributes   No                   Yes

[Schlimmer 1987]
How is ACE like SAGE.2?

Both learn search control from unsupervised experience, reinforce decisions on a successful path, gradually introduce new factors, specify a threshold, and transfer to harder problems, but…

                                   SAGE.2           ACE
Learns on                          —                Different problems in a class
Represents                         Symbolic rules   Weighted linear function
Failure response                   Revise           Reduce weight
Reinforces proportional to error   No               Yes
Compares states                    Yes              No
Random benchmarks                  No               Yes
Subgoals                           No               Yes
Learns during search               Yes              No

[Langley 1985]