```Fall 2008
The Chinese University of Hong Kong
CSC 3130: Automata theory and formal languages
Polynomial time
Andrej Bogdanov
http://www.cse.cuhk.edu.hk/~andrejb/csc3130
Efficient algorithms
PCP
ATM
• The running time of an
algorithm depends on
the input
• For longer inputs, we
allow more time
• Efficiency is measured as
a function of input size
Examples of running time
n = input size
problem
algorithm
running time
problem
running time
parsing
short paths
matching
LR(1) CYK
Dijkstra
Edmonds
O(n) O(n2)
O(n log n)
O(n3)
0n 1 n
O(n)
routing
scheduling
2O(n)
2O(n logn)
theorem proving
2O(n)
Input representation
• Since we measure efficiency in terms of input size,
how the input is represented will make a difference
• For us, any “reasonable” representation will be okay
The number 17
17
10001
(17 in base two)
11111111111111111
1
3
OK
OK
NO
2
4
This graph
0000,0010,0001,0010 OK
(2,3),(3,4)
OK
Measuring running time
• What does it mean when we say:
“This algorithm runs in 1000 steps”
• One step in
java
if (x > 0)
y = 5*y + x;
RAM machine
write r3;
all mean different things!
Turing Machine
d(q3, a) = (q7, b, R)
Example
L = {0n1n: n > 0}
in java:
RAM machine?
M(string x) {
n = x.len;
if n % 2 == 0 reject;
else
for (i = 0; i <= n/2; i++)
if x[i] != x[n-i] reject;
accept; }
running time = O(n)
Turing Machine?
multitape Turing Machine?
nondeterministic TM?
Efficiency and the Church-Turing thesis
• The Church-Turing thesis says all these models are
equivalent in power…
java
UNIVAC
Turing Machine
RAM machine
multitape TM
… but not in running time!
The Cobham-Edmonds thesis
• However, there is an extension to the Church-Turing
thesis that says
For any realistic models of computation M1 and M2:
M1 can be simulated on M2 with at most
polynomial slowdown
• So any task that takes time T on M1 can be done in
time (say) T2 or T3 on M2
Efficient simulation
• The running time of a program depends on the model
of computation…
ordinary TM
slow
multitape TM
RAM machine
java
fast
… but in the grand scheme, this is irrelevant
Every reasonable model of computation can be
simulated efficiently on every other
Example of efficient simulation
• Recall simulating multiple tapes on a single tape
M
S
0
1
0
1
1
0

# 0
0
G = {0, 1, ☐}
…
…
0
…
1
0 # 0
1

# 1

0
…
0 #
G = {0, 1, ☐, 0, 1, ☐, #}



Running time of simulation
• Each move of the multiple tape TM might require
traversing the whole single tape
1 step of 3-tape TM
O(s) steps of single tape TM
s = rightmost cell ever visited
after t steps
s ≤ 3t + 4
t steps of 3-tape
O(ts) = O(t2) single tape steps
multi-tape TM
slowdown
single tape TM
Simulation slowdown
java
O(t)
O(t)
multi-tape TM
O(t2)
O(t2) O(t)
O(t)
RAM machine
single tape TM
• Cobham-Edmonds Thesis:
M1 can be simulated on M2 with at most
polynomial slowdown
Running time of nondeterministic TM
• For ordinary TMs, the running time of M on input x is
the number of transitions M makes before it halts
• But a nondeterministic TM can run for a different
time on different “computation paths”
Example
qrej
qacc
q0
10001
1/1R
q1
0/0R
what is the running time? 5
• Definition of running time for nondeterministic TM
computation path:
any possible sequence of transitions
running time =
max length of any computation path
Simulation of nondeterministic TM
N
M
nondet TM
multi-tape TM
0
input tape x
…
0
1
0
simulation tape z
…
1 0
For all k > 0
…
1 2 2 1
For all possible strings a of length k
Copy x to z.
represents
possible choices
Simulate N on input z using a as choices
at each step
If a specifies an invalid choice or
simulation loops/rejects, abandon simulation.
each a describes
If N enters its accept state, accept and halt.a possible
path
If N rejected on all as of length k, reject and computation
halt.
Simulation slowdown for nondeterminism
For all k > 0
For all possible strings a of length k
Copy x to z.
Simulate N on input z using a as choices
If a specifies an invalid choice or
simulation loops/rejects, abandon simulation.
If N enters its accept state, accept and halt.
If N rejected on all as of length k, reject and halt.
running time of N is t
simulation will halt when k = t
running time of simulation = (running time for specific a)
× (number of as of length ≤ t)
= O(t) × 2O(t) = 2O(t)
Simulation slowdown
java
O(t)
multi-tape TM
O(t)
O(t)
O(t2)
O(t)
RAM machine
O(t2)
2O(t)
single tape TM
nondeterministic TM
Do nondeterministic TM violate the Cobham-Edmonds thesis?
Nondeterminism and the CE thesis
• Cobham-Edmonds Thesis says:
Any two realistic models of computation
can be simulated with polynomial slowdown
• But is nondetermistic computation realistic?
Example
• Recall the scheduling problem
Can you schedule final exams
so that there are no conflicts?
CSC 3230
CSC 2110
• Scheduling with nondeterminism:
schedule(int n, Edges edges) {
for i := 1 to n:
choose { c[i] := Y; }
or { c[i] := R; }
or { c[i] := B; }
for all e in edges:
if c[e.left] == c[e.right]
reject;
accept;
}
CSC 3130
CSC 3160
Exams → vertices
Conflicts → edges
Slots → colors Y
R
B
Example
schedule(int n, Edges edges) {
for i := 1 to n:
choose { c[i] := Y; }
or { c[i] := R; }
or { c[i] := B; }
for all e in edges:
if c[e.left] == c[e.right]
reject;
accept;
}
In reality, programming
languages don’t allow us
to choose
We have to tell the
computer how to make
these choices
Nondeterminism does not seem like a realistic
feature of a programming language or computer
... but if we had it, we could schedule in linear time!
Nondeterministic simulation
nondeterministic TM
2O(t) slowdown
multi-tape TM
Is this the best we can do?
• If we can do better, this would improve all known
combinatorial optimization algorithms!
Millenium prize problems
• Recall how in 1900, Hilbert gave 23 problems that
guided mathematics in the 20th century
• In 2000, the Clay Mathematical Institute gave 7
problems for the 21st century
1 P versus NP
computer science
2 The Hodge conjecture
Perelman 2006 (refused money)
3 The Poincaré conjecture
4 The Riemann hypothesis
Hilbert’s 8th problem
5 Yang–Mills existence and mass gap
\$1,000,000
6 Navier–Stokes existence and smoothness
7 The Birch and Swinnerton-Dyer conjecture
The P versus NP question
nondeterministic TM
poly(t)
ordinary TM
Can nondeterministic TM be simulated on
ordinary TM with polynomial slowdown?
• Among other things, this asks:
– Is nondeterminism
realistic feature
computation?
Mostapeople
think ofnot,
– Can the choose
construct
be efficiently
implemented?
but nobody
knows
for sure!
– Can we efficiently optimize any “well-posed” problem?
The class P
P is the class of all languages
that can be decided on an
ordinary TM whose running
time is some polynomial
in the length of the input
By the CE thesis, we can replace
“ordinary TM” by any realistic
model of computation
java
RAM
multi-tape TM
Examples of languages in P
problem
0n1 n
algorithm
running time
O(n)
parsing
short paths
matching
LR(1) CYK
Dijkstra
Edmonds
O(n) O(n2)
O(n log n)
O(n3)
n = input size
L01 = {0n1n: n > 0}
LG = {x: x is generated by G} G is some CFG
PATH = {(G, a, b, L): G is a graph with a
path of length L from a to b}
MATCH = {G, a, b, L: G is a graph with a
“perfect” matching}
decidable
P (efficient)
MATCH
PATH
LG
context-free
L01
Languages believed to be outside P
problem
running time
routing scheduling thm-proving
2O(n)
of best-known algorithm
We do not know if these
problems have faster algorithms,
but we suspect not
To explain why, first we need
to understand what these
problems have in common
2O(n)
2O(n)
decidable
?
SCHED
ROUTE
PROVE
P (efficient)
MATCH
PATH L
G
More problems
2
1
A clique is a subset of vertices
that are all interconnected
{1, 4}, {2, 3, 4}, {1} are cliques
4
3
Graph G
An independent set is a subset of
vertices so that no pair is connected
{1, 2}, {1, 3}, {4} are independent sets
there is no independent set of size 3
A vertex cover is a set of vertices
that touches (covers) all edges
{2, 4}, {3, 4}, {1, 2, 3} are vertex covers
Boolean formula satisfiability
• A boolean formula is an expression made up of
variables, ands, ors, and negations, like
(x1∨x2 ) ∧ (x2 ∨x3 ∨x4) ∧ (x1)
• The formula is satisfiable if one can assign values to
the variables so the expression evaluates to true
Above formula is satisfiable because this assignment
makes it true: x1 = F x2 = F x3 = T x4 = T
Status of these problems
CLIQUE = {(G, k): G is a graph with a clique of k vertices}
IS = {(G, k): G is a graph with an independent set of k vertices}
VC = {(G, k): G is a graph with a vertex cover of k vertices}
SAT = {f: f is a satisfiable Boolean formula}
problem
running time
CLIQUE
IS
VC
SAT
2O(n)
2O(n)
2O(n)
2O(n)
of best-known algorithm
What do these problems have in common?
Checking solutions efficiently
• We don’t know how to solve them efficiently
• But if someone told us the solution, we would be able
to check it very quickly
Example:
Is (G, 5) in CLIQUE?
12
9
1,5,9,12,14
1
13
2
4
3
14
5
6
15
7
11
10
8
Cliques via nondeterminism
• Checking solutions efficiently is equivalent to
designing efficient nondeterministic algorithms
Example: Is (G, k) in CLIQUE?
clique(Graph G, int k) {
C = {};
for i := 1 to G.n:
choose { C := union(C, {i}); }
or {}
if size(C) != k reject;
for i := 1 to G.n:
for j := 1 to G.n:
if i in C and j in C
if G.isedge(i,j) == false
reject;
accept;
}
% potential clique
% choose clique
% check size is k
% check all edges
%
are in
Example: Formula satisfiability
f = (x1∨x2 ) ∧ (x2 ∨x3 ∨x4) ∧ (x1)
Checking solution:
Nondeterministic algorithm:
FFTT
sat(Formula f) {
x = new bool[f.n];
for i := 1 to n:
choose { x[i] := true; }
or { x[i] := false; }
if f.eval(x) == true accept;
else reject;
}
substitute
x1 = F x2 = F x3 = T x4 = T
evaluate formula
f = (F ∨T ) ∧ (F∨T∨F) ∧ (T)
can be done in linear time
The class NP
L can be solved on a nondeterministic TM in
polynomial time iff its solutions can be checked in
time polynomial in the input length
• The class NP:
NP is the class of all languages that can be decided
on a nondeterministic TM whose running time is
some polynomial in the length of the input
P versus NP
P is contained in NP
because an ordinary TM
is only weaker than a
nondeterministic one
decidable
NP (efficiently checkable)
IS
SAT
CLIQUE
Conceptually, finding
solutions can only be
harder than checking
them
VC
P (efficient)
MATCH
PATH L
G
P versus NP
• The answer to the question
\$1,000,000
Is P equal to
NP?
is not known. But one reason it is believed to be
negative is because, intuitively, searching is harder
than verifying
• For example, solving homework problems (searching
for solutions) is harder than grading (verifying the
solution is correct)
Searching versus verifying
Mathematician:
Given a mathematical claim, come up with a proof for it.
Scientist:
Given a collection of data on some phenomena, ﬁnd a theory
explaining it.
Engineer:
Given a set of constraints (on cost, physical laws, etc.) come up
with a design (of an engine, bridge, etc) which meets them.
Detective:
Given the crime scene, ﬁnd “who’s done it”.
```