Learning to Support Constraint Programmers
Susan L. Epstein (Department of Computer Science, Hunter College and The Graduate Center of The City University of New York)
Gene Freuder and Rick Wallace (Cork Constraint Computation Centre)

Facts about ACE
- Learns to solve constraint satisfaction problems
- Learns search heuristics
- Can transfer what it learns on simple problems to solve more difficult ones
- Can export knowledge to ordinary constraint solvers
- Both a learner and a test bed
- Heuristic but complete: will eventually find a solution if one exists
- Guarantees high-quality, not optimal, solutions
- Begins with substantial domain knowledge

Outline
- The task: constraint satisfaction
- Performance results
- Reasoning mechanism
- Learning
- Representations

The Problem Space
- A constraint satisfaction problem is a triple <X, D, C>: variables, domains, and constraints.
- Solution: assign a value to every variable consistent with the constraints.
- Many real-world problems can be represented and solved this way: design and configuration, planning and scheduling, diagnosis and testing.

Running example used throughout:
- Variables: A, B, C, D
- Domains: A {1,2,3}, B {1,2,4,5,6}, C {1,2}, D {1,3}
- Constraints: A=B, A>D, C≠D
- Allowed value pairs: A-B {(1 1), (2 2)}; A-D {(2 1), (3 1), (3 2)}; C-D {(1 3), (2 1), (2 3)}

A Challenging Domain
- Constraint solving is NP-hard.
- Problem class parameters <n, k, d, t>:
  n = number of variables
  k = maximum domain size
  d = edge density (% of possible constraints)
  t = tightness (% of value pairs excluded)
- Complexity peak: values of d and t that make problems hardest
- Heavy-tailed distribution of difficulty [Gomes et al., 2002]
- A problem may have multiple solutions or none
- Unexplored choices may be good

Finding a Path to a Solution
- Search moves from the initial state to a goal through a sequence of decision pairs: select a variable, then assign it a value.
- Optimal path length: 2n decisions for n variables.
- For n variables with domain size d, there are (d+1)^n possible states.

Solution Method
- Naive backtracking on the example: assign B=1, then A=1, then C=1; now D=1 and D=3 both fail (no value of D satisfies A>D when A=1), so search retracts and tries A=2, and so on.

Consistency Maintenance
- Some values may be inconsistent even in the initial state.
- A value assignment can restrict the domains of other variables. In the example, B=1 forces A=1 (by A=B), which leaves no value for D (by A>D), so search must move on to B=2.

Retraction
- When an inconsistency arises, a retraction method removes a value and returns the search to an earlier state. The difficulty is locating where the error was actually made: the failure may surface several decisions after the culprit assignment.

Variable Ordering
- A good variable ordering can speed search. In the example, choosing A first immediately exposes the tight interaction among A, B, and D.

Value Ordering
- A good value ordering can speed search too. Trying A=2 first leads directly to the solution A=2, B=2, C=2, D=1.

Constraint Solvers Know…
- Several consistency methods
- Several retraction methods
- Many variable ordering heuristics
- Many value ordering heuristics
… but the interactions among them are not well understood, nor is any one combination best for all problem classes.
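The pieces above — variable selection, value assignment, forward checking, and retraction — fit together in a few lines. Below is an illustrative toy solver on the running example, using MinDomain variable ordering and chronological backtracking; it is a sketch of the generic search scheme, not ACE's implementation.

```python
# Toy backtracking CSP solver with forward checking and MinDomain
# variable ordering, run on the slides' running example.

domains = {'A': {1, 2, 3}, 'B': {1, 2, 4, 5, 6}, 'C': {1, 2}, 'D': {1, 3}}

# Binary constraints as (var1, var2, predicate) triples.
constraints = [
    ('A', 'B', lambda a, b: a == b),   # A = B
    ('A', 'D', lambda a, d: a > d),    # A > D
    ('C', 'D', lambda c, d: c != d),   # C != D
]

def consistent(var, val, assignment):
    """Is var=val compatible with every constraint on assigned variables?"""
    for x, y, pred in constraints:
        if x == var and y in assignment and not pred(val, assignment[y]):
            return False
        if y == var and x in assignment and not pred(assignment[x], val):
            return False
    return True

def solve(assignment, domains):
    if len(assignment) == len(domains):
        return dict(assignment)
    # MinDomain: choose the unassigned variable with the fewest live values.
    var = min((v for v in domains if v not in assignment),
              key=lambda v: len(domains[v]))
    for val in sorted(domains[var]):
        if not consistent(var, val, assignment):
            continue
        assignment[var] = val
        # Forward checking: prune values now inconsistent with the assignment.
        pruned = {v: domains[v] if v in assignment else
                     {w for w in domains[v] if consistent(v, w, assignment)}
                  for v in domains}
        if all(pruned.values()):           # no domain wiped out
            result = solve(assignment, pruned)
            if result is not None:
                return result
        del assignment[var]                # retract and try the next value
    return None

print(sorted(solve({}, domains).items()))
# [('A', 2), ('B', 2), ('C', 2), ('D', 1)]
```

The solver finds the unique solution A=2, B=2, C=2, D=1 from the Value Ordering slide; forward checking detects the B=1 dead end without ever descending to D.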
Goals of the ACE Project
- Characterize problem classes
- Learn to solve classes of problems well
- Evaluate mixtures of known heuristics
- Develop new heuristics
- Explore the role of planning in solution

Outline
- The task: constraint satisfaction
- Performance results
- ACE
- Reasoning mechanism
- Learning
- Representation

Experimental Design
- Specify the problem class, consistency method, and retraction method
- Average performance across 10 runs
- Learn on L problems (halt at 10,000 steps)
- To-completion testing on T new problems
- During testing, use only heuristics judged accurate during learning
- Evaluate performance on: steps to solution, constraint checks, retractions, elapsed time

ACE Learns to Solve Hard Problems
- Class <30, 8, .24, .66>, near the complexity peak
- Learn on 80 problems
- 10 runs, binned in sets of 10 learning problems
- Discards 26 of 38 heuristics
- Outperforms MinDomain, an "off-the-shelf" heuristic
[Chart: steps to solution (0–2500) per learning-task bin, bins 1–10 through 71–80; means in blue, medians in red]

ACE Rediscovers the Brélaz Heuristic
- Graph coloring: color each vertex (e.g., red, blue, or green) so that every pair of adjacent vertices gets different colors. Graph coloring is a kind of constraint satisfaction problem.
- Brélaz: minimize dynamic domain size, breaking ties by maximum forward degree.
- ACE learned this consistently on different classes of graph coloring problems.
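The Brélaz rule is easy to state in code: among uncoloured vertices, prefer the one with the fewest still-legal colours (smallest dynamic domain), and break ties by the most uncoloured neighbours (maximum forward degree). The sketch below applies it greedily; function and variable names are illustrative, and a complete solver would backtrack at a dead end.

```python
# Greedy graph colouring with the Brelaz ordering. Sketch only: the
# ordering is the point, not the (incomplete) greedy control structure.

def brelaz_colour(graph, colours):
    """graph: adjacency dict {vertex: set_of_neighbours}."""
    assignment = {}
    while len(assignment) < len(graph):
        def brelaz_key(v):
            # Dynamic domain: colours not yet used by any neighbour.
            legal = sum(1 for c in colours
                        if all(assignment.get(u) != c for u in graph[v]))
            # Forward degree: uncoloured neighbours.
            forward = sum(1 for u in graph[v] if u not in assignment)
            return (legal, -forward)   # min legal, ties by max forward degree
        v = min((u for u in graph if u not in assignment), key=brelaz_key)
        for c in colours:
            if all(assignment.get(u) != c for u in graph[v]):
                assignment[v] = c
                break
        else:
            return None                # dead end; greedy gives up here
    return assignment

# A 5-cycle needs 3 colours because the cycle is odd.
cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
colouring = brelaz_colour(cycle5, ['red', 'green', 'blue'])
assert all(colouring[u] != colouring[v] for u in cycle5 for v in cycle5[u])
```

On any graph of maximum degree 2, at most two colours are ever forbidden at a vertex, so the greedy pass with three colours cannot fail; harder graphs are where the ordering (and backtracking) earn their keep.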
[Epstein & Freuder, 2001]

ACE Discovers a New Heuristic
- "Maximize the product of degree and forward degree at the top of the search tree"
- Exported to several traditional approaches: MinDomain, MinDomain/Degree, MinDomain + degree preorder
- Learned on small problems, but tested in 10 runs on n = 150, domain size 5, density .05, tightness .24
- Reduced search tree size by 25%–96%
[Epstein, Freuder, Wallace, Morozov, & Samuels 2002]

Outline
- The task: constraint satisfaction
- Performance results
- Reasoning mechanism
- Learning
- Representation

A Constraint-Solving Heuristic
- Uses domain knowledge
- What problem classes does it work well on?
- Is it valid throughout a single solution?
- Can its dual also be valid?
- How can heuristics be combined?
- … and where do new heuristics come from?

FORR (For the Right Reasons)
- A general architecture for learning and problem solving
- Multiple learning methods, multiple representations, multiple decision rationales
- Specialized by domain knowledge
- Learns useful knowledge to support reasoning
- Specifies whether a rationale is correct or heuristic
- Learns to combine rationales to improve problem solving
[Epstein 1992]

An Advisor Implements a Rationale
- A class-independent action-selection rationale
- Supports or opposes actions by comments of the form <strength, action, Advisor>
- Expresses the direction of its opinion by strengths
- A limitedly-rational procedure mapping the current problem state and candidate actions to comments

Advisor Categories
- Tier 1: rationales that correctly select a single action (reaction from perfect knowledge)
- Tier 2: rationales that produce a set of actions directed to a subgoal (planning triggered by situation recognition)
- Tier 3: heuristic rationales that select a single action

Choosing an Action
- Given the current state and the possible actions, the tier-1 Advisors (T-11 … T-1n) are consulted first; if one decides, that action is taken at once.
- Otherwise the tier-2 planners (P-1 … P-k), triggered by situation recognition, may begin a plan.
- Otherwise the tier-3 heuristic reactions (T-31 … T-3m) vote, and the winning action is taken.
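The discovered heuristic "maximize the product of degree and forward degree at the top of the search tree" is simply an ordering key. Here is a hedged sketch of how it might drive variable selection near the root; the function names, the depth threshold, and the MinDegree fallback are illustrative assumptions, not ACE's exact formulation.

```python
# Sketch: the learned rule as a variable-ordering key applied only near
# the top of the search tree. `top`, the fallback rule, and all names
# are illustrative assumptions.

def pick_variable(unassigned, neighbours, assigned, depth, top=5):
    def degree(v):
        return len(neighbours[v])                        # static degree
    def forward_degree(v):
        return sum(1 for u in neighbours[v] if u not in assigned)
    if depth < top:                                      # "top of the tree"
        return max(unassigned, key=lambda v: degree(v) * forward_degree(v))
    return min(unassigned, key=degree)                   # fallback elsewhere

# Constraint graph of the running example: edges A-B, A-D, C-D.
neighbours = {'A': {'B', 'D'}, 'B': {'A'}, 'C': {'D'}, 'D': {'A', 'C'}}
print(pick_variable({'A', 'B', 'C'}, neighbours, {'D'}, depth=0))
# 'A': degree 2 * forward degree 1 = 2 beats B (1) and C (0)
```

With D already assigned, A scores 2×1 = 2, B scores 1×1 = 1, and C scores 1×0 = 0, so A is selected.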
ACE's Domain Knowledge
- Consistency maintenance methods: forward checking, arc consistency
- Backtracking methods: chronological
- 21 variable ordering heuristics
- 19 value ordering heuristics
- 3 languages whose expressions have interpretations as heuristics
- Graph theory knowledge, e.g., connected, acyclic
- Constraint solving knowledge, e.g., "only one arc consistency pass is required on a tree"

An Overview of ACE
- The task: constraint satisfaction
- Performance results
- ACE
- Reasoning mechanism
- Learning
- Representation

What ACE Learns
- A weighted linear combination for comment strengths
  - For voting in tier 3 only
  - Includes only valuable heuristics
  - Indicates the relative accuracy of valuable heuristics
- New, learned heuristics
- How to restructure tier 3
- When random choice is the right thing to do
- Knowledge that supports heuristics (e.g., typical solution path length)

Digression-Based Weight Learning
- Learn from the trace of each solved problem
- Reward decisions on the perfect solution path
  - Shorter paths reward variable ordering
  - Longer paths reward value ordering
- Blame digression-producing decisions in proportion to the error
- A valuable Advisor's weight exceeds the baseline's

Learning New Advisors
- An Advisor grammar over pairs of concerns: maximize or minimize, product or quotient
- Staged use: monitor all expressions, then use good ones collectively, then use the best ones individually

Outline
- The task: constraint satisfaction
- Performance results
- ACE
- Reasoning mechanism
- Learning
- Representation

Representation of Experience
- A state describes variables and value assignments, impossible future values, the prior state, connected components, constraint checks incurred, dynamic edges, and trees.
- The history of successful decisions, plus other significant decisions, becomes training examples.

Example state:

  Variable  Is  Can be  Cannot be
  A         —   1       2
  B         2   —       —
  C         —   1,2     —
  D         —   1,3     —

  Checks incurred: 4
  1 acyclic component: A, C, D
  Dynamic edges: AD, CD
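Tier-3 voting and the digression-based update can be sketched as follows. The weighted linear combination matches the description above, but the update rule and learning rate are a plausible simplification under stated assumptions, not ACE's published algorithm.

```python
# Sketch of tier-3 voting (weighted linear combination of comment
# strengths) and a digression-style weight update: reward Advisors whose
# comments backed decisions on the solution path, blame those that
# backed digressions in proportion to the digression's size.
# The update form and `rate` are illustrative.

def vote(comments, weights):
    """comments: list of (advisor, action, strength); returns best action."""
    scores = {}
    for advisor, action, strength in comments:
        scores[action] = scores.get(action, 0.0) + weights[advisor] * strength
    return max(scores, key=scores.get)

def update_weights(weights, trace, rate=0.1):
    """trace: list of (advisor, strength, on_path, digression_size)."""
    for advisor, strength, on_path, digression_size in trace:
        if on_path:
            weights[advisor] += rate * strength                      # reward
        else:
            weights[advisor] -= rate * strength * digression_size    # blame
    return weights

weights = {'MinDomain': 1.0, 'MaxDegree': 1.0}
comments = [('MinDomain', 'pick A', 8), ('MaxDegree', 'pick B', 5),
            ('MinDomain', 'pick B', 2)]
print(vote(comments, weights))   # 'pick A': 8.0 beats 5.0 + 2.0 for B
```

After weight learning, an Advisor whose comments repeatedly backed digressions sinks below the baseline and is discarded, which is how ACE ends up keeping only 12 of its 38 heuristics on the hard class above.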
Representation of Learned Knowledge
- Weights for Advisors
- Solution size distribution
- Latest error: the greatest number of variables bound at a retraction

ACE's Status Report
- 41 Advisors in tiers 1 and 3
- 3 languages in which to express additional Advisors
- 5 experimental planners
- Problem classes: random, coloring, geometric, logic, n-queens, small world, and quasigroup (with and without holes)
- Learns to solve hard problems
- Learns new heuristics
- Transfers to harder problems
- Divides and conquers problems
- Learns when not to reason

Current ACE Research
- Further weight-learning refinements
- Learn appropriate restart parameters
- More problem classes, consistency methods, retraction methods, planners, and Advisor languages
- Learn appropriate consistency checking methods
- Learn appropriate backtracking methods
- Learn to bias initial weights
- Metaheuristics to reformulate the architecture
- Modeling strategies
- … and, coming soon, ACE on the Web

Acknowledgements
Continued thanks for their ideas and efforts go to: Diarmuid Grimes, Mark Hennessey, Tiziana Ligorio, Anton Morozov, Smiljana Petrovic, Bruce Samuels, the students of the FORR study group, and the Cork Constraint Computation Centre; and, for their support, to the National Science Foundation and Science Foundation Ireland.

Is ACE Reinforcement Learning?
- Similarities: unsupervised learning through trial and error; delayed rewards; learns a policy.
- Primary differences:
  - Reinforcement learning learns a policy represented as the estimated values of states it has experienced repeatedly, but ACE is unlikely to revisit a state; instead it learns how to act in any state.
  - Q-learning learns state-action preferences, but ACE learns a policy that combines action preferences.

How Is ACE Like STAGGER?

                    STAGGER             ACE
  Learns            Boolean classifier  Search-control preference function for a
                                        sequence of decisions in a class of problems
  Represents        Weighted booleans   Weighted linear function
  Supervised        Yes                 No
  New elements      Failure-driven      Success-driven
  Initial bias      Yes                 Under construction
  Real attributes   No                  Yes

[Schlimmer 1987]

How Is ACE Like SAGE.2?
Both learn search control from unsupervised experience, reinforce decisions on a successful path, gradually introduce new factors, specify a threshold, and transfer to harder problems, but…

                         SAGE.2           ACE
  Learns on              Same task        Different problems in a class
  Represents             Symbolic rules   Weighted linear function
  Reinforces             Repeating rules  Correct comments
  Failure response       Revise           Reduce weight
  Proportional to error  No               Yes
  Compares states        Yes              No
  Random benchmarks      No               Yes
  Subgoals               No               Yes
  Learns during search   Yes              No

[Langley 1985]
