Fall 2008 The Chinese University of Hong Kong CSC 3130: Automata theory and formal languages Polynomial time Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130 Efficient algorithms PCP ATM • The running time of an algorithm depends on the input • For longer inputs, we allow more time • Efficiency is measured as a function of input size Examples of running time n = input size problem algorithm running time problem running time parsing short paths matching LR(1) CYK Dijkstra Edmonds O(n) O(n2) O(n log n) O(n3) 0n 1 n O(n) routing scheduling 2O(n) 2O(n logn) theorem proving 2O(n) Input representation • Since we measure efficiency in terms of input size, how the input is represented will make a difference • For us, any “reasonable” representation will be okay The number 17 17 10001 (17 in base two) 11111111111111111 1 3 OK OK NO 2 4 This graph 0000,0010,0001,0010 OK (2,3),(3,4) OK Measuring running time • What does it mean when we say: “This algorithm runs in 1000 steps” • One step in java if (x > 0) y = 5*y + x; RAM machine write r3; all mean different things! Turing Machine d(q3, a) = (q7, b, R) Example L = {0n1n: n > 0} But how about: in java: RAM machine? M(string x) { n = x.len; if n % 2 == 0 reject; else for (i = 0; i <= n/2; i++) if x[i] != x[n-i] reject; accept; } running time = O(n) Turing Machine? multitape Turing Machine? nondeterministic TM? Efficiency and the Church-Turing thesis • The Church-Turing thesis says all these models are equivalent in power… java UNIVAC Turing Machine RAM machine multitape TM … but not in running time! The Cobham-Edmonds thesis • However, there is an extension to the Church-Turing thesis that says For any realistic models of computation M1 and M2: M1 can be simulated on M2 with at most polynomial slowdown • So any task that takes time T on M1 can be done in time (say) T2 or T3 on M2 Efficient simulation • The running time of a program depends on the model of computation… ordinary TM slow multitape TM RAM machine java fast … but in the grand scheme, this is irrelevant Every reasonable model of computation can be simulated efficiently on every other Example of efficient simulation • Recall simulating multiple tapes on a single tape M S 0 1 0 1 1 0 # 0 0 G = {0, 1, ☐} … … 0 … 1 0 # 0 1 # 1 0 … 0 # G = {0, 1, ☐, 0, 1, ☐, #} Running time of simulation • Each move of the multiple tape TM might require traversing the whole single tape 1 step of 3-tape TM O(s) steps of single tape TM s = rightmost cell ever visited after t steps s ≤ 3t + 4 t steps of 3-tape O(ts) = O(t2) single tape steps multi-tape TM quadratic slowdown single tape TM Simulation slowdown java O(t) O(t) multi-tape TM O(t2) O(t2) O(t) O(t) RAM machine single tape TM • Cobham-Edmonds Thesis: M1 can be simulated on M2 with at most polynomial slowdown Running time of nondeterministic TM • What about nondeterministic TMs? • For ordinary TMs, the running time of M on input x is the number of transitions M makes before it halts • But a nondeterministic TM can run for a different time on different “computation paths” Example qrej qacc q0 10001 1/1R q1 0/0R what is the running time? 5 • Definition of running time for nondeterministic TM computation path: any possible sequence of transitions running time = max length of any computation path Simulation of nondeterministic TM N M nondet TM multi-tape TM 0 input tape x … 0 1 0 simulation tape z … 1 0 address tape a For all k > 0 … 1 2 2 1 For all possible strings a of length k Copy x to z. represents possible choices Simulate N on input z using a as choices at each step If a specifies an invalid choice or simulation loops/rejects, abandon simulation. each a describes If N enters its accept state, accept and halt.a possible path If N rejected on all as of length k, reject and computation halt. Simulation slowdown for nondeterminism For all k > 0 For all possible strings a of length k Copy x to z. Simulate N on input z using a as choices If a specifies an invalid choice or simulation loops/rejects, abandon simulation. If N enters its accept state, accept and halt. If N rejected on all as of length k, reject and halt. running time of N is t simulation will halt when k = t running time of simulation = (running time for specific a) × (number of as of length ≤ t) = O(t) × 2O(t) = 2O(t) Simulation slowdown java O(t) multi-tape TM O(t) O(t) O(t2) O(t) RAM machine O(t2) 2O(t) single tape TM nondeterministic TM Do nondeterministic TM violate the Cobham-Edmonds thesis? Nondeterminism and the CE thesis • Cobham-Edmonds Thesis says: Any two realistic models of computation can be simulated with polynomial slowdown • But is nondetermistic computation realistic? Example • Recall the scheduling problem Can you schedule final exams so that there are no conflicts? CSC 3230 CSC 2110 • Scheduling with nondeterminism: schedule(int n, Edges edges) { for i := 1 to n: choose { c[i] := Y; } or { c[i] := R; } or { c[i] := B; } for all e in edges: if c[e.left] == c[e.right] reject; accept; } CSC 3130 CSC 3160 Exams → vertices Conflicts → edges Slots → colors Y R B Example schedule(int n, Edges edges) { for i := 1 to n: choose { c[i] := Y; } or { c[i] := R; } or { c[i] := B; } for all e in edges: if c[e.left] == c[e.right] reject; accept; } In reality, programming languages don’t allow us to choose We have to tell the computer how to make these choices Nondeterminism does not seem like a realistic feature of a programming language or computer ... but if we had it, we could schedule in linear time! Nondeterministic simulation nondeterministic TM 2O(t) slowdown multi-tape TM Is this the best we can do? • If we can do better, this would improve all known combinatorial optimization algorithms! Millenium prize problems • Recall how in 1900, Hilbert gave 23 problems that guided mathematics in the 20th century • In 2000, the Clay Mathematical Institute gave 7 problems for the 21st century 1 P versus NP computer science 2 The Hodge conjecture Perelman 2006 (refused money) 3 The Poincaré conjecture 4 The Riemann hypothesis Hilbert’s 8th problem 5 Yang–Mills existence and mass gap $1,000,000 6 Navier–Stokes existence and smoothness 7 The Birch and Swinnerton-Dyer conjecture The P versus NP question nondeterministic TM poly(t) ordinary TM Can nondeterministic TM be simulated on ordinary TM with polynomial slowdown? • Among other things, this asks: – Is nondeterminism realistic feature computation? Mostapeople think ofnot, – Can the choose construct be efficiently implemented? but nobody knows for sure! – Can we efficiently optimize any “well-posed” problem? The class P P is the class of all languages that can be decided on an ordinary TM whose running time is some polynomial in the length of the input By the CE thesis, we can replace “ordinary TM” by any realistic model of computation java RAM multi-tape TM Examples of languages in P problem 0n1 n algorithm running time O(n) parsing short paths matching LR(1) CYK Dijkstra Edmonds O(n) O(n2) O(n log n) O(n3) n = input size L01 = {0n1n: n > 0} LG = {x: x is generated by G} G is some CFG PATH = {(G, a, b, L): G is a graph with a path of length L from a to b} MATCH = {G, a, b, L: G is a graph with a “perfect” matching} decidable P (efficient) MATCH PATH LG context-free L01 Languages believed to be outside P problem running time routing scheduling thm-proving 2O(n) of best-known algorithm We do not know if these problems have faster algorithms, but we suspect not To explain why, first we need to understand what these problems have in common 2O(n) 2O(n) decidable ? SCHED ROUTE PROVE P (efficient) MATCH PATH L G More problems 2 1 A clique is a subset of vertices that are all interconnected {1, 4}, {2, 3, 4}, {1} are cliques 4 3 Graph G An independent set is a subset of vertices so that no pair is connected {1, 2}, {1, 3}, {4} are independent sets there is no independent set of size 3 A vertex cover is a set of vertices that touches (covers) all edges {2, 4}, {3, 4}, {1, 2, 3} are vertex covers Boolean formula satisfiability • A boolean formula is an expression made up of variables, ands, ors, and negations, like (x1∨x2 ) ∧ (x2 ∨x3 ∨x4) ∧ (x1) • The formula is satisfiable if one can assign values to the variables so the expression evaluates to true Above formula is satisfiable because this assignment makes it true: x1 = F x2 = F x3 = T x4 = T Status of these problems CLIQUE = {(G, k): G is a graph with a clique of k vertices} IS = {(G, k): G is a graph with an independent set of k vertices} VC = {(G, k): G is a graph with a vertex cover of k vertices} SAT = {f: f is a satisfiable Boolean formula} problem running time CLIQUE IS VC SAT 2O(n) 2O(n) 2O(n) 2O(n) of best-known algorithm What do these problems have in common? Checking solutions efficiently • We don’t know how to solve them efficiently • But if someone told us the solution, we would be able to check it very quickly Example: Is (G, 5) in CLIQUE? 12 9 1,5,9,12,14 1 13 2 4 3 14 5 6 15 7 11 10 8 Cliques via nondeterminism • Checking solutions efficiently is equivalent to designing efficient nondeterministic algorithms Example: Is (G, k) in CLIQUE? clique(Graph G, int k) { C = {}; for i := 1 to G.n: choose { C := union(C, {i}); } or {} if size(C) != k reject; for i := 1 to G.n: for j := 1 to G.n: if i in C and j in C if G.isedge(i,j) == false reject; accept; } % potential clique % choose clique % check size is k % check all edges % are in Example: Formula satisfiability f = (x1∨x2 ) ∧ (x2 ∨x3 ∨x4) ∧ (x1) Checking solution: Nondeterministic algorithm: FFTT sat(Formula f) { x = new bool[f.n]; for i := 1 to n: choose { x[i] := true; } or { x[i] := false; } if f.eval(x) == true accept; else reject; } substitute x1 = F x2 = F x3 = T x4 = T evaluate formula f = (F ∨T ) ∧ (F∨T∨F) ∧ (T) can be done in linear time The class NP L can be solved on a nondeterministic TM in polynomial time iff its solutions can be checked in time polynomial in the input length • The class NP: NP is the class of all languages that can be decided on a nondeterministic TM whose running time is some polynomial in the length of the input P versus NP P is contained in NP because an ordinary TM is only weaker than a nondeterministic one decidable NP (efficiently checkable) IS SAT CLIQUE Conceptually, finding solutions can only be harder than checking them VC P (efficient) MATCH PATH L G P versus NP • The answer to the question $1,000,000 Is P equal to NP? is not known. But one reason it is believed to be negative is because, intuitively, searching is harder than verifying • For example, solving homework problems (searching for solutions) is harder than grading (verifying the solution is correct) Searching versus verifying Mathematician: Given a mathematical claim, come up with a proof for it. Scientist: Given a collection of data on some phenomena, ﬁnd a theory explaining it. Engineer: Given a set of constraints (on cost, physical laws, etc.) come up with a design (of an engine, bridge, etc) which meets them. Detective: Given the crime scene, ﬁnd “who’s done it”.

Descargar
# Slide 1