```Parsing Techniques
A Practical Guide
by
Dick Grune and
Ceriel J. H. Jacobs
Book in slide format
• Fantastic book: Parsing Techniques
• I went through chapter 4 of the book and created slides of it.
• That is, the following slides is chapter 4, in slide form.
• Additionally, there are several slides from:
• Personal correspondence with one of the authors, Dick Grune.
• Material from other sources.
• Slides that I created, applying the concepts to XML.
Roger L. Costello
April 17, 2015
General Non-Directional
Parsing
We will learn about two parsing methods
• This chapter presents two non-directional parsing methods:
• Unger’s method
• CYK method
• They are called non-directional because they access the input in a
seemingly arbitrary order.
• They require the entire input to be in memory before parsing can
start.
Unger’s method
• Unger’s method is top-down.
• Suppose we wish to determine if the input is derivable from this
grammar:
S → A1 A2 … A m
• Then a portion of the input, t1, must be derivable from A1, a portion
of the input, t2, must be derivable from A2, etc.
S
A1
A2
t1
t2
…
Am
tm
Find a partition of the input
• Unger’s method tries to find such a partitioning of the input.
• This partitioning problem is recursive: if a non-terminal Ai is to derive
a certain part of the input, there must be a partition of this part that
fits a right-hand of Ai.
partition (symbol, input)
{
if terminal(symbol) then return symbol
for each Ai=RHS(symbol)
{
ti = subset(input)
return partition(Ai, ti)
}
}
Ultimately a right-hand side must consist of
terminal symbols only, and these can easily be
match with the current part of the input.
CYK method
• The CYK method is bottom-up.
• It finds occurrences of right-hand sides in the input; whenever it finds
one, it replaces the input with the left-hand side.
A → aa
Input: babaa
babA
This sentential form is again the subject
of a search for right-hand sides, etc.
Ultimately we may find a sentential
form that derives the input and is a
right-hand side of the start symbol.
4.1 Unger’s Parsing Method
The input to Unger’s method is a context-free (CF) grammar and an
input sentence:
CF grammar
input sentence
Unger Parser
We first discuss Unger’s parsing method for grammars without ε-rules
and without loops (see Section 2.9.4).Then the problems introduced by
ε-rules will be discussed and the parsing method will be modified to
allow for all CF grammars.
4.1.1 Unger’s Method
without ε-Rules or Loops
Consider the problem of putting m marbles
into c cups
• Suppose there are c cups and m marbles.
• The cups are numbered c1, c2, …, cc.
• The marbles are numbered m1, m2, …, mm.
• Problem: find all possible partitions of the marbles into the cups such
that each cup has at least one marble and no cup has a higher
numbered marble than an earlier cup.
Partitions for 3 marbles and 2 cups
m1
m2
m2
m1
c1
c2
c1
m3
Partition #1
m3
c2
Partition #2
Here's another way of schematically
representing the partitions
marbles
cup1 cup2
m1
m2, m3
m1, m2
m3
Partition algorithm
partition (marbles, cups)
{
Put marble 1 in cup 1, then generate all partitions of
the other m-1 marbles over the other c-1 cups.
Next, put marble 1 and marble 2 in cup 1, then generate all
partitions of the other m-2 marbles over the other c-1 cups.
etc.
If the number of marbles is less than the number of cups,
no partition is possible.
}
Unger method: find all possible partitions on
the input string
Example: suppose we have a grammar rule:
S → ABC | DE | F
and we want to find out whether S derives the input sentence pqrs.
The initial parsing problem can then be schematically represented as:
S
pqrs
For each right-hand side we must generate all possible partitions of the
input sentence.
Partition pqrs over ABC
Partitioning the input corresponds to partitioning the input symbols
over the right-hand side symbols. If a right-hand side has more symbols
than the sentence, no partition can be found (since there are no εrules). Here are all possible partitions for the first right-hand side:
S
S → ABC | DE | F
A
B
C
p
p
pq
q
qr
r
rs
s
s
Now there are a series of sub-problems
S
A
B
C
p
p
pq
q
qr
r
rs
s
s
This first partition results in the following sub-problems:
does A derive p, does B derive q, and does C derive rs?
These sub-problems must all be answered in the affirmative,
or the partition is not the right one.
We've gone from this big problem:
Problem: Does S derive pqrs?
to these smaller problems:
Problem: Does A derive p?
Problem: Does B derive q?
Problem: Does C derive rs?
Partition pqrs over DE
S → ABC | DE | F
S
D
E
p
pq
pqr
qrs
rs
s
Partition pqrs over F
S → ABC | DE | F
S
F
pqrs
This partition does not result in a simpler problem.
Here's why we have disallowed loops in the
grammar
S
F
pqrs
Suppose the rule for F loops back to S:
F → S
the partition for F resulted in the original string. With
this loop we get the original problem back again and
again.
Let's take an example
In the following discussion we will use this arithmetic expression
grammar:
Expr
Term
Factor
→
→
→
Expr + Term | Term
Term * Factor | Factor
( Expr ) | i
We will use this sentence as the input:
(i + i) * i
Partition the input over Expr + Term
Expr
Expr → Expr + Term | Term
Expr
+
Term
(
(
(
(
(
(i
(i
(i
(i
(i+
(i+
(i+
(i+i
(i+i
(i+i)
i
i+
i+i
i+i)
i+i)*
+
+i
+i)
+i)*
i
i)
i)*
)
)*
*
+i)*i
i)*i
)*i
*i
i
i)*i
)*i
*i
i
)*i
*i
i
*i
i
i
15 partitions. The unoptimized version of the Unger algorithm
would examine each of the 15 partitions. But a simple check can
result in huge savings (see next slide).
Need only examine one of the partitions
Expr
Expr
+
Term
(
(
(
(
(
(i
(i
(i
(i
(i+
(i+
(i+
(i+i
(i+i
(i+i)
i
i+
i+i
i+i)
i+i)*
+
+i
+i)
+i)*
i
i)
i)*
)
)*
*
+i)*i
i)*i
)*i
*i
i
i)*i
)*i
*i
i
)*i
*i
i
*i
i
i
This is the only partition that has a chance of succeeding
because it is the only one that matches the middle
column (the + terminal symbol). The other partitions
can be rejected without further investigation.
First sub-problem
Expr
Expr
+
Term
(i
+
i)*i
The first sub-problem is to find
out whether Expr derives (i
Can rule out: Expr → Expr + Term
Expr
Expr
+
Term
(i
+
i)*i
We cannot partition (i into three
non-empty parts because (i only
consists of 2 symbols. So this rule
cannot apply.
Expr
Term
Factor
→
→
→
Expr + Term | Term
Term * Factor | Factor
( Expr ) | i
Only applicable rule: Expr → Term
Expr
Expr
+
Term
(i
+
i)*i
The only rule that we can apply is
this rule: Expr → Term
Expr
Term
Factor
→
→
→
Expr + Term | Term
Term * Factor | Factor
( Expr ) | i
Only applicable rule: Term → Factor
Expr
Term
(i
The only rule that we can apply is
this rule: Term → Factor
Expr
Term
Factor
→
→
→
Expr + Term | Term
Term * Factor | Factor
( Expr ) | i
No applicable rule
Expr
Term
Factor
(i
It is impossible for Factor to
derive (i because the first righthand side of Factor has too
Expr
many symbols and the second
Term
one consists of one terminal
Factor
symbol only.
→
→
→
Expr + Term | Term
Term * Factor | Factor
( Expr ) | i
Here's what we've been investigating:
Does this rule …
Expr
Term
Factor
→
→
→
Expr + Term | Term
Term * Factor | Factor
( Expr ) | i
… derive this input
(i + i) * i
Conclusion: Expr → Expr + Term does not
derive the input
Expr
Expr
+
Term
(
(
(
(
(
(i
(i
(i
(i
(i+
(i+
(i+
(i+i
(i+i
(i+i)
i
i+
i+i
i+i)
i+i)*
+
+i
+i)
+i)*
i
i)
i)*
)
)*
*
+i)*i
i)*i
)*i
*i
i
i)*i
)*i
*i
i
)*i
*i
i
*i
i
i
This is the only partition that had a chance of
succeeding. But we ruled it out as well because we
found that Expr could not derive (i
Partition the input over Term
Expr
Term
(i+i)*i
1 partition
The second right-hand side of Expr consists of
only one symbol, so there is only one partition.
Expr
Term
Factor
→
→
→
Expr + Term | Term
Term * Factor | Factor
( Expr ) | i
Partition the input over Term * Factor
Expr
Term
Term
*
Factor
(
(
(
(
(
(i
(i
(i
(i
(i+
(i+
(i+
(i+i
(i+i
(i+i)
i
i+
i+i
i+i)
i+i)*
+
+i
+i)
+i)*
i
i)
i)*
)
)*
*
+i)*i
i)*i
)*i
*i
i
i)*i
)*i
*i
i
)*i
*i
i
*i
i
i
15 partitions
Expr
Term
Factor
→
→
→
Expr + Term | Term
Term * Factor | Factor
( Expr ) | i
Need only examine one of the partitions
Expr
Term
Expr
Term
Expr
*+
( (
ii
i+
( (
i+
i+i
( (
i+i
i+i)
( (
i+i)
i+i)*
( (
i+i)*
(i (i
++
+i
(i (i
+i
+i)
(i (i
+i)
+i)*
(i (i
+i)*
(i+(i+
ii
i)
(i+(i+
i)
i)*
(i+(i+
i)*
(i+i
(i+i
))
(i+i
)*
(i+i
)*
(i+i)
(i+i)
**
Factor
Term
+i)*i
+i)*i
i)*i
i)*i
)*i
)*i
*i
*i
ii
i)*i
i)*i
)*i
)*i
*i
*i
ii
)*i
)*i
*i
*i
ii
*i
*i
ii
ii
This is the only partition that has a chance of succeeding
because it is the only one that matches the middle
column (the * terminal symbol).
Solve this sub-problem
Expr
Term
Expr
Term
Expr
*+
(i+i)
*
Factor
Term
i
Find out whether Term derives
(i+i)
Can rule out: Term → Term * Factor
Expr
Term
Expr
Term
Expr
*+
(i+i)
*
Factor
Term
i
This uses a + terminal whereas
Term * Factor uses a *
terminal
Expr
Term
Factor
→
→
→
Expr + Term | Term
Term * Factor | Factor
( Expr ) | i
Only applicable rule: Term → Factor
Expr
Term
Expr
Term
Expr
*+
(i+i)
*
Factor
Term
i
The only rule that we can apply is
this rule: Term → Factor
Expr
Term
Factor
→
→
→
Expr + Term | Term
Term * Factor | Factor
( Expr ) | i
Applicable rule: Factor → ( Expr )
Expr
Term
Expr
Term
Factor
(i+i)
Partition this over ( Expr )
Expr
Term
Factor
→
→
→
Expr + Term | Term
Term * Factor | Factor
( Expr ) | i
Partition (i + i) over ( Expr )
Expr
Term
Expr
Term
Factor
(Expr Expr
+
(
i+i
)
Term
)
This is the only partition that has
a chance of succeeding
Expr
Term
Factor
→
→
→
Expr + Term | Term
Term * Factor | Factor
( Expr ) | i
Partition i + i over Expr
Continuing our search we find the following derivation:
Expr
i+i
Expr + Term
i+i
Expr
i
Term
i
Term
i
Factor
i
Factor
i
Expr
Term
Factor
→
→
→
Expr + Term | Term
Term * Factor | Factor
( Expr ) | i
The example demonstrates several aspects of
the Unger method:
1. Even small examples require a considerable amount of work.
2. Some simple checks can result in huge savings.
• Example: matching the terminal symbol in a right-hand side with the partition
at hand often leads to the rejection of the partition without investigating the
partition any further.
Expr
Expr
+
Term
(
(
(
(
(
(i
(i
(i
(i
(i+
(i+
(i+
(i+i
(i+i
(i+i)
i
i+
i+i
i+i)
i+i)*
+
+i
+i)
+i)*
i
i)
i)*
)
)*
*
+i)*i
i)*i
)*i
*i
i
i)*i
)*i
*i
i
)*i
*i
i
*i
i
i
These partitions can be rejected
because their middle parts don't
match the terminal symbol +
Another optimization
Unger [1] presents several checks that can result in huge savings. For
example, one can compute the minimum length of the strings that
would derive from a non-terminal. Any partitions with a shorter length
can be immediately rejected.
[1] Communications of the ACM, 11(4):240-247, April 1968.
4.1.2 Unger’s Method with
ε-Rules
Consider the problem of putting m marbles
into c cups and cups may be empty
• As before, suppose there are c cups and m marbles.
• The cups are numbered c1, c2, …, cc.
• The marbles are numbered m1, m2, …, mm.
• Problem: find all possible partitions of the marbles into the cups such
that each cup has zero or more marbles and no cup has a higher
numbered marble than an earlier cup.
All partitions of 3 marbles and 2 cups
c1
m1 m3
m2
m1
m2
c2
c1
c2
Partition #1
m3
Partition #2
m2
m1
m3
m2 m3
m1
c1
c2
c1
Partition #3
c2
Partition #4
More partitions are possible when
the cups can be empty
Here's another way of schematically
representing the partitions
marbles
cup1
m1
m1, m2
m1, m2, m3
cup2
m1, m2, m3
m2, m3
m3
More partitions when grammar has ε-rules
• Consider the grammar rule S → ABC and the input sentence pqr.
• Does the rule derive the input sentence?
• If we allow ε-rules many more partitions will have to be investigated
because each of the non-terminals A, B, and C may derive the empty
string.
S
A
B
p
pq
pqr
p
p
p
pq
pq
pqr
q
qr
C
pqr
qr
r
qr
r
r
r
10 partitions.
Partition pqr over SD
• Now suppose we investigate whether B derives pqr.
• Suppose there is a rule B → SD.
• Then the following partitions must be investigated:
B
S
p
pq
pqr
D
pqr
qr
r
Here's why ε-rules cause trouble
B
S
p
pq
pqr
D
pqr
qr
r
In the process of finding whether S derives pqr, we end up asking the same
question again, in a different context. If we are not careful and do not detect this,
our parser will loop forever, or run out of memory.
General formulation of the problem with ε-rules
• When searching along a path, we are looking for a derivation that has
a recursive loop in the grammar, of the form S → … → αSβ.
• If the grammar contains ε-rules then the parser must assume that α
and β can produce ε, so this loop will cause the parser to ask the
question “does S derive pqr?” over and over again.
Avoid the looping problem by cutting off the
search process:
• Maintain a list of questions that we are currently investigating.
• Before starting to investigate a new question (for example, “does S
derive pqr?”), check that the question does not already appear in the
list. If it does, do not investigate the question. Instead, proceed as if
Example grammar with ε-rules
The below grammar generates sequences of d's.
S
L
D
→
→
→
LSD | ε
ε
d
Here's how it generates one d:
S → LSD → SD → D → d
Here's how it generates two d's:
S → LSD → SD → LSDD → SDD → DD → dD → dd
Does S generate d?
• Let's use the Unger method to see if S can generate d.
• The next slide shows the complete search path for the question
*
S→
d? (does S generate d?)
* d?
Complete search path for: S →
* ε?
L→
* ε?
S→
* ε?
ε→
yes
L S D
* ε?
L→
ε ε ε
* ε?
ε→
yes
* d?
D→
* d?
d→
yes
* ε?
L→
* ε?
ε→
yes
L S D
* d?
S→
ε ε d
ε d ε
d ε
ε
* d? no
ε→
* d?
S→
* d?
L→
cut-off: no
* d?
ε→
no
* ε?
S→
* ε?
ε→
cut-off: no
yes
S
L
D
→
→
→
LSD | ε
ε
d
* ε?
L→
partition d over LSD
* ε?
S→
* ε?
ε→
yes
L S D
* ε?
L→
ε ε ε
* ε?
ε→
yes
* d?
D→
* d?
d→
yes
* ε?
L→
* ε?
ε→
yes
L S D
* d?
S→
ε ε d
ε d ε
d ε
ε
* d? no
ε→
* d?
S→
* d?
L→
cut-off: no
* d?
ε→
no
* ε?
S→
* ε?
ε→
cut-off: no
yes
S
L
D
→
→
→
LSD | ε
ε
d
* ε?
L→
partition d over ε
* ε?
S→
* ε?
ε→
yes
L S D
* ε?
L→
ε ε ε
* ε?
ε→
yes
* d?
D→
* d?
d→
yes
* ε?
L→
* ε?
ε→
yes
L S D
* d?
S→
ε ε d
ε d ε
d ε
ε
* d? no
ε→
* d?
S→
* d?
L→
cut-off: no
* d?
ε→
no
* ε?
S→
* ε?
ε→
cut-off: no
yes
S
L
D
→
→
→
LSD | ε
ε
d
* ε?
L→
Does L derive ε?
* ε?
S→
* ε?
ε→
yes
L S D
* ε?
L→
ε ε ε
* ε?
ε→
yes
* d?
D→
* d?
d→
yes
* ε?
L→
* ε?
ε→
yes
L S D
* d?
S→
ε ε d
ε d ε
d ε
ε
* d? no
ε→
* d?
S→
* d?
L→
cut-off: no
* d?
ε→
no
* ε?
S→
* ε?
ε→
cut-off: no
yes
S
L
D
→
→
→
LSD | ε
ε
d
* ε?
L→
Does S derive ε?
* ε?
S→
* ε?
ε→
yes
L S D
* ε?
L→
ε ε ε
* ε?
ε→
yes
* d?
D→
* d?
d→
yes
* ε?
L→
* ε?
ε→
yes
L S D
* d?
S→
ε ε d
ε d ε
d ε
ε
* d? no
ε→
* d?
S→
* d?
L→
cut-off: no
* d?
ε→
no
* ε?
S→
* ε?
ε→
cut-off: no
yes
S
L
D
→
→
→
LSD | ε
ε
d
* ε?
L→
Does S derive ε?
* ε?
S→
* ε?
ε→
yes
L S D
* ε?
L→
ε ε ε
* ε?
ε→
yes
* d?
D→
* d?
d→
yes
* ε?
L→
* ε?
ε→
yes
* ε?
S→
* ε?
ε→
yes
cut-off: no
L S D
* d?
S→
ε ε d
ε d ε
d ε
ε
* d? no
ε→
* d?
S→
* d?
L→
cut-off: no
* d?
ε→
no
Does S derive ε?
so we terminate this search path.
S
L
D
→
→
→
LSD | ε
ε
d
* ε?
L→
Does S derive ε?
* ε?
S→
ε → ε?
yes
L S D
* ε?
L→
ε ε ε
* ε?
ε→
yes
* d?
D→
* d?
d→
yes
* ε?
L→
* ε?
ε→
yes
L S D
* d?
S→
ε ε d
ε d ε
d ε
ε
* d? no
ε→
* d?
S→
* d?
L→
cut-off: no
* d?
ε→
no
* ε?
S→
* ε?
ε→
cut-off: no
yes
S
L
D
→
→
→
LSD | ε
ε
d
* ε?
L→
Does D derive d?
* ε?
S→
ε → ε?
yes
L S D
* ε?
L→
ε ε ε
* ε?
ε→
yes
* d?
D→
* d?
d→
yes
* ε?
L→
* ε?
ε→
yes
L S D
* d?
S→
ε ε d
ε d ε
d ε
ε
* d? no
ε→
* d?
S→
* d?
L→
cut-off: no
* d?
ε→
no
* ε?
S→
* ε?
ε→
cut-off: no
yes
Does S derive d? Yes!
S
L
D
→
→
→
* ε?
L→
LSD | ε
ε
d
* ε?
S→
* ε?
ε→
yes
L S D
* ε?
ε→
* ε?
L→
ε ε ε
* ε?
ε→
yes
* d?
D→
* d?
d→
yes
* ε?
L→
* ε?
ε→
yes
* ε?
S→
yes
cut-off: no
L S D
* d?
S→
ε ε d
ε d ε
d ε
ε
* d? no
ε→
* d?
S→
* d?
L→
cut-off: no
* d?
ε→
S
L
S
ε
ε
yes yes
no
D
d
yes
Yes, S derives d
Why can we cut-off with repeats?
* ε?
L→
Does S derive ε?
* ε?
S→
* ε?
ε→
yes
L S D
* ε?
L→
ε ε ε
* ε?
ε→
yes
* d?
D→
* d?
d→
yes
* ε?
L→
* ε?
ε→
yes
* ε?
S→
* ε?
ε→
yes
cut-off: no
L S D
* d?
S→
ε ε d
ε d ε
d ε
ε
* d? no
ε→
* d?
S→
* d?
L→
cut-off: no
* d?
ε→
no
Does S derive ε?
so we terminate this search path.
If we didn't get a "yes" for the first time that
isn't going to produce a different answer.
Does S generate dd?
• Let's use the Unger method to see if S can generate dd.
• The next slide shows the complete search path for the question
*
S→
dd? (does S generate dd?)
* ε?
L→
* ε?
S→
* ε?
ε→
L S D
* dd?
S→
S
ε
d
dd
ε
d
ε
D
dd
d
ε
d
ε
ε
* dd? no
ε→
* ε?
L→
ε ε ε
* ε?
ε→
L
ε
ε
ε
d
d
dd
yes
yes
* dd?
D→
* dd? no
d→
* ε?
L→
* ε?
ε→
* d?
S→
yes
yes: see previous example
* d?
D→
* d? yes
d→
* ε?
L→
* ε?
ε→
yes
* dd? cut-off: no
S→
* d?
L→
* d?
ε→
no
* d?
L→
* d?
ε→
no
* dd?
L→
* dd? no
ε→
* ε?
S→
* ε?
ε→
cut-off: no
yes
* ε?
L→
* ε?
S→
* ε?
ε→
L S D
* dd?
S→
S
ε
d
dd
ε
d
ε
D
dd
d
ε
d
ε
ε
* dd? no
ε→
* ε?
L→
ε ε ε
* ε?
ε→
L
ε
ε
ε
d
d
dd
yes
yes
* dd?
D→
* dd? no
d→
* ε?
L→
* ε?
ε→
* d?
S→
* ε?
S→
* ε?
ε→
yes
cut-off: no
yes
yes: see previous example
* d?
D→
* d? yes
d→
* ε?
L→
* ε?
ε→
yes
* dd? cut-off: no
S→
* d?
L→
* d?
ε→
no
* d?
L→
* d?
ε→
no
* dd?
L→
* dd? no
ε→
yes, yes, yes. S does derive dd.
* ε?
L→
* ε?
S→
* ε?
ε→
L S D
* dd?
S→
S
ε
d
dd
ε
d
ε
D
dd
d
ε
d
ε
ε
* dd? no
ε→
yes
* dd?
D→
* dd? no
d→
* ε?
L→
* ε?
ε→
* d?
S→
* ε?
ε→
* ε?
L→
ε ε ε
* ε?
ε→
L
ε
ε
ε
d
d
dd
yes
* ε?
S→
yes
cut-off: no
yes
yes: see previous example
* d?
D→
* d? yes
d→
* ε?
L→
* ε?
ε→
yes
* dd? cut-off: no
S→
* dd?
S→
* d?
L→
* d?
ε→
no
* d?
L→
* d?
ε→
no
* dd?
L→
* dd? no
ε→
* ε?
L→
* ε?
L→
* d?
S→
* ε?
S→
* d?
D→
* d?
D→
* ε?
L→
* ε?
S→
* ε?
ε→
L S D
* dd?
S→
S
ε
d
dd
ε
d
ε
D
dd
d
ε
d
ε
ε
* dd? no
ε→
yes
* dd?
D→
* dd? no
d→
* ε?
L→
* ε?
ε→
* d?
S→
* ε?
ε→
* ε?
L→
ε ε ε
* ε?
ε→
L
ε
ε
ε
d
d
dd
yes
* ε?
S→
yes
cut-off: no
yes
yes: see previous example
* d?
D→
* d? yes
d→
* ε?
L→
* ε?
ε→
yes
* dd? cut-off: no
S→
* dd?
S→
* d?
L→
* d?
ε→
no
* d?
L→
* d?
ε→
no
* dd?
L→
* dd? no
ε→
* ε?
L→
* ε?
L→
* d?
S→
* ε?
S→
* d?
D→
* d?
D→
S → LSD → SD → LSDD → SDD →
DD → dD → dd
* ε?
L→
* ε?
S→
* ε?
ε→
L S D
* dd?
S→
S
ε
d
dd
ε
d
ε
D
dd
d
ε
d
ε
ε
* dd? no
ε→
* ε?
L→
ε ε ε
* ε?
ε→
L
ε
ε
ε
d
d
dd
yes
yes
* dd?
D→
* dd? no
d→
* ε?
L→
* ε?
ε→
* d?
S→
* ε?
S→
* ε?
ε→
yes
cut-off: no
yes
yes: see previous example
* d?
D→
* d? yes
d→
* ε?
L→
* ε?
ε→
yes
* dd? cut-off: no
S→
* d?
L→
* d?
ε→
no
* d?
L→
* d?
ε→
no
* dd?
L→
* dd? no
ε→
The question whether L derives ε is asked
many times: see the 4 yellow ovals.
* ε?
L→
* ε?
S→
* ε?
ε→
L S D
* dd?
S→
S
ε
d
dd
ε
d
ε
D
dd
d
ε
d
ε
ε
* dd? no
ε→
* ε?
L→
ε ε ε
* ε?
ε→
L
ε
ε
ε
d
d
dd
yes
yes
* dd?
D→
* dd? no
d→
* ε?
L→
* ε?
ε→
* d?
S→
* ε?
S→
* ε?
ε→
yes
cut-off: no
yes
yes: see previous example
* d?
D→
* d? yes
d→
* ε?
L→
* ε?
ε→
yes
* dd? cut-off: no
S→
* d?
L→
* d?
ε→
no
* d?
L→
* d?
ε→
no
* dd?
L→
* dd? no
ε→
Optimization: We can save much time by
remembering the answers to the questions.
In fact, by remembering answers to
questions the efficiency is improved
dramatically, from exponential to polynomial.
The term memoization is used for the
questions.
Another optimization
• Another possible optimization is achieved by computing in advance
which non-terminals can derive ε.
• In fact, this is a special case of computing the minimum length of a
terminal string that each non-terminal derives.
• If a non-terminal derives ε, the minimum length is 0.
4.1.3 Getting Parse-Forest
Grammars from Unger Parsing
Recall the notation
Expr_1_2
length
starting position
73
Let's take an example
Create a parse-forest grammar for the arithmetic expression grammar
that we examined in Section 4.1.1:
Expr
Term
Factor
→
→
→
Expr + Term | Term
Term * Factor | Factor
( Expr ) | i
As before, we will use this sentence as the input:
(i + i) * i
Partition (i + i) * i over Expr + Term
Expr
Expr
+
Term
(
(
(
(
(
(i
(i
(i
(i
(i+
(i+
(i+
(i+i
(i+i
(i+i)
i
i+
i+i
i+i)
i+i)*
+
+i
+i)
+i)*
i
i)
i)*
)
)*
*
+i)*i
i)*i
)*i
*i
i
i)*i
)*i
*i
i
)*i
*i
i
*i
i
i
15 partitions. Create a parse-forest grammar rule for each of these
partitions. The grammar clean-up algorithm from Section 2.9.5 will
remove all the rules except the one with the arrow next to it.
Parse-forest rule for each partition
Expr
Expr
+
Term
(
(
(
(
(
(i
(i
(i
(i
(i+
(i+
(i+
(i+i
(i+i
(i+i)
i
i+
i+i
i+i)
i+i)*
+
+i
+i)
+i)*
i
i)
i)*
)
)*
*
+i)*i
i)*i
)*i
*i
i
i)*i
)*i
*i
i
)*i
*i
i
*i
i
i
Expr_1_7 → Expr_1_1 +_2_1 Term_3_5
...
Expr_1_7 → Expr_1_2 +_3_1 Term_4_4
...
Undefined terminal
Expr
Expr
+
Term
(
(
(
(
(
(i
(i
(i
(i
(i+
(i+
(i+
(i+i
(i+i
(i+i)
i
i+
i+i
i+i)
i+i)*
+
+i
+i)
+i)*
i
i)
i)*
)
)*
*
+i)*i
i)*i
)*i
*i
i
i)*i
)*i
*i
i
)*i
*i
i
*i
i
i
Expr_1_7 → Expr_1_1 +_2_1 Term_3_5
...
This says there is a + at position 2. But
there isn't. So this terminal is
undefined. Consequently, the rule will
be removed by the clean-up algorithm.
Further attempt described in Section 4.1.1:
Expr
Expr
+
Term
(i
+
i)*i
Parse-forest rule:
Expr_1_2 → Term_1_2
Expr
Term
Factor
→
→
→
Expr + Term | Term
Term * Factor | Factor
( Expr ) | i
Further attempt described in Section 4.1.1:
Expr
Term
Parse-forest rule:
Term_1_2 → Factor_1_2
(i
Expr
Term
Factor
→
→
→
Expr + Term | Term
Term * Factor | Factor
( Expr ) | i
Further attempt described in Section 4.1.1:
Expr
Term
Parse-forest rule:
Factor_1_2 → i_1_2
Factor
This says there is an i at position 1. But
there isn't. So this terminal is
undefined. Consequently, the rule will
be removed by the clean-up algorithm.
(i
Expr
Term
Factor
→
→
→
Expr + Term | Term
Term * Factor | Factor
( Expr ) | i
• Unger parsing is a top-down parsing method.
• It creates a lot of undefined non-terminals and undefined terminals.
• These undefined items represent hypotheses of the top-down
process that did not materialize.
Cleaned-up parse-forest grammar
• For the expression grammar the parsing process generates a parseforest grammar of 294 rules.
• After clean-up the below parse-forest grammar remains, with 11
rules.
Expr_1_7
Term_1_7
Term_1_5
Factor_1_5
Expr_2_3
Expr_2_1
Term_2_1
Factor_2_1
Term_4_1
Factor_4_1
Factor_7_1
→
→
→
→
→
→
→
→
→
→
→
Term_1_7
Term_1_5 *_6_1 Factor_7_1
Factor_1_5
(__1_1 Expr_2_3 )_5_1
Expr_2_1 +_3_1 Term_4_1
Term_2_1
Factor_2_1
i_2_1
Factor_4_1
i_4_1
i_7_1
4.2 The CYK Parsing Method
Discoverers
• This parsing method is attributed to J. Cocke, D. H. Younger, and T.
Kasami, who independently discovered variations of the method.
• It is now known as the Cocke-Younger-Kasami method, or the CYK
method.
Inputs
grammar
sentence
CYK
algorithm
Two phases
CYK Algorithm
Phase 1
(recognition phase)
Table that shows which nonterminal(s) derive which
substrings of the input sentence.
(recognition table)
Phase 2
All possible derivations
of the input sentence
4.2.1 CYK Recognition with
General CF Grammars
Learn CYK using this grammar for numbers
Number
Integer
Real
Fraction
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
Integer | Real
Digit | Integer Digit
Integer Fraction Scale
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Here's how to derive 32.5e+1
Number → Real → Integer Fraction Scale → Integer Digit Fraction Scale → Digit Digit Fraction Scale →
3 Digit Fraction Scale → 3 2 Fraction Scale → 3 2 . Integer Scale → 3 2 . Digit Scale → 3 2 . 5 Scale →
3 2 . 5 e Sign Integer → 3 2 . 5 e + Integer → 3 2 . 5 e + Digit → 3 2 . 5 e + 1
Unit rules (a.k.a. single rules, chain rules)
• Unit rules are grammar rules of the form A → B, where A and B are
non-terminals. Such rules are also called single rules or chain rules.
• Here are some unit rules from the grammar on the previous slide:
Number → Integer
Integer → Digit
• In fact, those two unit rules can be chained together:
Number → Integer → Digit
Derive strings of length 1 first
• The CYK algorithm first concentrates on substrings of the input
sentence, shortest strings first, and then works its way up.
• The following substrings of length 1 can be derived directly from the
grammar:
Digit Digit
3
2
Digit
.
5
e
Sign
Digit
+
1
Unit rules can chain to Digit
Digit Digit
3
2
Digit
.
5
e
Sign
Digit
+
1
Digit → 3
Other non-terminals can chain to Digit to derive 3:
Integer → Digit
→ 3
Number → Integer → Digit → 3
Next step: apply unit rules
The next step is to apply the unit rules, repetitively, to find out which other
non-terminals derive the substrings of length 1. This gives the following
result:
Number, Number,
Integer, Integer,
Digit
Digit
3
2
Number,
Integer,
Digit
.
5
e
Sign
Number,
Integer,
Digit
+
1
Next, find combinations that are in the
grammar
An Integer followed by a Digit is derived from Integer
Number, Number,
Integer, Integer,
Digit
Digit
3
2
Number,
Integer,
Digit
.
5
e
Sign
Number,
Integer,
Digit
+
1
Number
Integer
Real
Fraction
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
Integer | Real
Digit | Integer Digit
Integer Fraction Scale
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Integer
Number, Number,
Integer, Integer,
Digit
Digit
3
2
Number,
Integer,
Digit
.
5
e
Sign
Number,
Integer,
Digit
+
1
A unit rule can chain to Integer
Integer
Number, Number,
Integer, Integer,
Digit
Digit
3
2
Number,
Integer,
Digit
.
5
e
Sign
Number,
Integer,
Digit
+
1
Integer → … → 3 2
Other non-terminals can chain to Integer to derive 3 2:
Number → Integer → … → 3 2
Number
Integer
Real
Fraction
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
Integer | Real
Digit | Integer Digit
Integer Fraction Scale
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Number, Integer
Number, Number,
Integer, Integer,
Digit
Digit
3
2
Number,
Integer,
Digit
.
5
e
Sign
Number,
Integer,
Digit
+
1
Find other combinations that are in the
grammar
Dot (.) followed by a Integer is derived from Fraction
Number, Integer
Number, Number,
Integer, Integer,
Digit
Digit
3
2
Number,
Integer,
Digit
.
5
e
Sign
Number,
Integer,
Digit
+
1
Number
Integer
Real
Fraction
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
Integer | Real
Digit | Integer Digit
Integer Fraction Scale
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Number, Integer
Fraction
Number, Number,
Integer, Integer,
Digit
Digit
3
2
Number,
Integer,
Digit
.
5
e
Sign
Number,
Integer,
Digit
+
1
Find other combinations that are in the
grammar
An e followed by a Sign followed by an Integer is derived from Scale
Number, Integer
Fraction
Number, Number,
Integer, Integer,
Digit
Digit
3
2
Number,
Integer,
Digit
.
5
e
Sign
Number,
Integer,
Digit
+
1
Number
Integer
Real
Fraction
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
Integer | Real
Digit | Integer Digit
Integer Fraction Scale
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Number, Integer
Fraction
Number, Number,
Integer, Integer,
Digit
Digit
3
2
Scale
Number,
Integer,
Digit
.
5
e
Sign
Number,
Integer,
Digit
+
1
There are no unit rules that chain to Fraction
or Scale
Number, Integer
Fraction
Number, Number,
Integer, Integer,
Digit
Digit
3
2
Scale
Number,
Integer,
Digit
.
5
e
Sign
Number,
Integer,
Digit
+
1
Number
Integer
Real
Fraction
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
Integer | Real
Digit | Integer Digit
Integer Fraction Scale
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Find other combinations that are in the
grammar
An Integer followed by a Fraction followed by a Scale is derived from Real
Number, Integer
Fraction
Number, Number,
Integer, Integer,
Digit
Digit
3
2
Scale
Number,
Integer,
Digit
.
5
e
Sign
Number,
Integer,
Digit
+
1
Number
Integer
Real
Fraction
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
Integer | Real
Digit | Integer Digit
Integer Fraction Scale
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Real
Number, Integer
Fraction
Number, Number,
Integer, Integer,
Digit
Digit
3
2
Scale
Number,
Integer,
Digit
.
5
e
Sign
Number,
Integer,
Digit
+
1
A unit rule can chain to Real
Number, Real
Number, Integer
Fraction
Number, Number,
Integer, Integer,
Digit
Digit
3
2
Scale
Number,
Integer,
Digit
.
5
e
Sign
Number,
Integer,
Digit
+
1
Real → … → 2.5e+1
Other non-terminals can chain to Real to derive 2.5e+1:
Number → Real → … → 2.5e+1
Number
Integer
Real
Fraction
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
Integer | Real
Digit | Integer Digit
Integer Fraction Scale
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Can use the rule for Real at a bigger level
This box represents Integer. An Integer followed by a Fraction followed by a Scale is derived from Real
Number, Real
Number, Integer
Fraction
Number, Number,
Integer, Integer,
Digit
Digit
3
2
Scale
Number,
Integer,
Digit
.
5
e
Sign
Number,
Integer,
Digit
+
1
Number
Integer
Real
Fraction
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
Integer | Real
Digit | Integer Digit
Integer Fraction Scale
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Real
Number, Real
Number, Integer
Fraction
Number, Number,
Integer, Integer,
Digit
Digit
3
2
Scale
Number,
Integer,
Digit
.
5
e
Sign
Number,
Integer,
Digit
+
1
A unit rule can chain to Real
Number, Real
Number, Real
Number, Integer
Fraction
Number, Number,
Integer, Integer,
Digit
Digit
3
2
Scale
Number,
Integer,
Digit
.
5
e
Sign
Number,
Integer,
Digit
+
1
Real → … → 32.5e+1
Other non-terminals can chain to Real to derive 32.5e+1:
Number → Real → … → 32.5e+1
Number
Integer
Real
Fraction
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
Integer | Real
Digit | Integer Digit
Integer Fraction Scale
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
So we find that Number does derive 32.5e+1
Number, Real
Number, Real
Number, Integer
Fraction
Number, Number,
Integer, Integer,
Digit
Digit
3
2
Scale
Number,
Integer,
Digit
.
5
e
Sign
Number,
Integer,
Digit
+
1
Complications
• In the previous example we see that unit rules complicate things a bit.
• Another complication is from ε-rules, which we examine next.
Example involving an ε-rule
If we want to recognize 43.1 we have to realize that Scale derives ε,
so we get the following picture:
Number, Real
Number, Real
Number, Integer
Fraction
Number, Number,
Integer, Integer,
Digit
Digit
4
3
Scale
Number,
Integer,
Digit
.
1
ε
Number
Integer
Real
Fraction
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
Integer | Real
Digit | Integer Digit
Integer Fraction Scale
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
• We must take into account the fact that several non-terminals can
derive ε between any two adjacent terminal symbols in the input
sentence, and also in front of the input sentence or at the back.
• We will see that the problems caused by these kinds of rules can be
solved, albeit at a certain cost.
CYK algorithm: recognize short substrings first
As we've seen, the CYK algorithm works by determining which nonterminals derive which substrings, shortest substrings first.
Also note that CYK does not process the input from
left to right. It processes the input in arbitrary order,
i.e., CYK is a non-directional parsing algorithm.
ε is the shortest substring
• Although we skipped them in the example, the shortest substrings of
any input sentence are the ε-substrings
• We must find the set of non-terminals that derive ε
• We will call this set Rε
• The following slides describe a closure algorithm to compute Rε
"R" stands for recognition set. Rε is the set of nonterminals that recognize the empty string.
Closure algorithms
• Closure algorithms are characterized by two components:
1. Initialization: what we know initially.
2. Inference rule: a rule telling how knowledge from several places is to be
combined.
• The inference rule is repeated until nothing changes any more.
114
Initialization
• The set Rε is initialized to the set of non-terminals A for which A → ε
• For the example grammar, Rε is initially this set: { Empty }
non-terminal derives ε
Number
Integer
Real
Fraction
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
Integer | Real
Digit | Integer Digit
Integer Fraction Scale
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Inference
• A grammar rule derives ε if all symbols on its right-hand side derive ε.
• So, check each grammar rule: If a right-hand side consists only of
symbols that are a member of Rε, we add the left-hand side to Rε
Use a closure algorithm to find Rε
Initial knowledge: The set Rε is initialized to the set of non-terminals A
for which A → ε
Grammar rule
Rε
Number → Integer | Real
{ }
Integer → Digit | Integer Digit
{ }
Real → Integer Fraction Scale
{ }
Fraction → . Integer
{ }
{ }
Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
{ }
Sign → + | -
{}
Empty → ε
{ Empty }
First application of the inference rule
Grammar rule
Rε
Number → Integer | Real
{ }
Integer → Digit | Integer Digit
{ }
Real → Integer Fraction Scale
{ }
Fraction → . Integer
{ }
{ Scale }
Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
{ }
Sign → + | -
{}
Empty → ε
{ Empty }
Second application of the inference rule
Grammar rule
Rε
Number → Integer | Real
{ }
Integer → Digit | Integer Digit
{ }
Real → Integer Fraction Scale
{ }
Fraction → . Integer
{ }
{ Scale }
Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
{ }
Sign → + | -
{}
Empty → ε
{ Empty }
No changes, so Rε = {Empty, Scale }
Next, non-empty substrings
• We saw how to find the non-terminals that derive substrings of
length zero.
• We now direct our attention to the non-empty substrings of
the input sentence.
120
Set of non-terminals that derive a substring
Suppose we have an input sentence t = t1t2…tn and we want to
compute the set of non-terminals that derive the substring of t starting
at position i, of length l.
t1t2…titi+1…ti+l-1…tn
compute the set of non-terminals
that derive this substring: starts at
position i, of length l.
Notation for the substring
si,l
substring
length l
start at position i
Notation (cont.)
si,l
= the substring starting at position i, of length l.
= titi+1…ti+l-1
Non-terminals that derive si,l?
We want to compute the set of non-terminals
that derive the substring si,l
Notation for the set of non-terminals
Ri,l = the set of non-terminals that derive the substring si,l
Ri,l is a recognition set because it is the set of non-terminals that recognizes substring si,l
Determine if S is a member of R1,n
• Suppose we want to determine if an input string t with length n can
be derived from a grammar.
• The start symbol S derives t if and only if S is a member of R1,n.
S → A1 A2 … Am
A1 → …
A2 → …
input string
CYK Parser
The grammar derives the input string iff S ∈ R1,n
The notations can be extended for substrings
of length 0
si,0 = ε
Ri,0 = Rε
for all i
Graphical presentation of the substring
notation, using a sentence of 4 symbols
s1,4
4
s2,3
3
s1,3
s3,2
2
s2,2
s1,2
s1,1
s1,0
t1
s2,1
s2,0
t2
s3,1
s3,0
s4,1
s4,0
t3
position
t4
1
0
length
Compute non-terminals that derive l-length
substrings
• Because shorter substrings are dealt with first, we can assume that
we are at a stage in the algorithm where all information on substrings
with length smaller than l is available.
• Using this information, we check each right-hand side in the grammar,
to see if it derives si,l, as follows:
next slide 
Does grammar rule R → A1…Am derive si,l
Suppose we have a right-hand side A1…Am. We divide si,l into m
(possibly empty) segments, such that A1 derives the first segment, A2
the second, etc. We start with A1. If A1…Am is to derive si,l, A1 has to
derive a first part of it, say of length k. That is, A1 must derive si,k (A1
must be a member of Ri,k), and A2…Am must derive the rest:
ti
…
A1
A2
…
Am
ti+k-1
ti+k
ti+k+1
…
ti+l-1
Example
• Suppose the substring si,l is, say, of length 10.
• Suppose we have already found the non-terminals that derive
substrings, starting at position i, with length less than 10. The nonterminals are in these sets: Ri,0, Ri,1, Ri,2, Ri,3, Ri,4, Ri,5, Ri,6, Ri,7, Ri,8,
and Ri,9.
• So, if A1 is a non-terminal, we can check to see if it is in any of those
sets.
• Suppose we find A1 in Ri,4. That means non-terminal A1 derives,
starting at position i, a substring of length 4. Then A2…Am must derive
the rest. So we repeat the process with A2 for a substring of length 6,
starting at position i+4.
Example (cont.)
• Suppose we find A1 in Ri,2 and Ri,4. That means non-terminal A1
derives, starting at position i, a substring of length 2 and of length 4.
Then A2…Am must derive either the remaining 8 symbols or the
remaining 6 symbols. We repeat the process with A2 for a substring of
length 8 and of length 6, starting at position i+4.
The CYK algorithm
• If A1…Am is to derive si,l, A1 has to derive a first part of it, say of length
k. That is, A1 must derive si,k (A1 must be a member of Ri,k), and
A2…Am must derive the rest.
• This is attempted for every k for which A1 is a member of Ri,k,
including k=0.
• If A1 is a terminal, then A1 must equal t1, and k is 1.
• Checking if A2…Am derives ti+k..ti+l-1 is done the same way. (Yay!
Recursion!)
Difference from Unger's method
The CYK method does not have to try all partitions because we already
know which non-terminals derive which substrings.
Problem #1 with the CYK algorithm
• Recall that we are evaluating a right-hand side A1…Am. Suppose m is 1
and A1 is a non-terminal. That is, we want to determine if A1 derives
si,l.
• This means we are dealing with a unit rule: S → A1
• If A1 is to derive substring si,l, A1 has to derive the entire substring. So
A1 must be a member of Ri,l.
• But we are trying to compute the set Ri,l. So to compute Ri,l we need
to already know Ri,l … that's a problem!
Ri,l closure algorithm
• Suppose the substring is: abc
• That is, si,3 is: abc
• Suppose we want to know if this rule derives the substring: S → A1
• Suppose A1 is defined with this rule: A1 → B
*
• Suppose B ultimately derives abc: B → … C →
abc
• So A1 does derive the substring and A1 should become a member of
the set Ri,3.
Ri,l closure algorithm (cont.)
Somewhere along the derivation there must be a first step that does
not use a unit rule.
*
S → A1 → B → … C →
abc
We know how to process non-unit rules!
Ri,l closure algorithm (cont.)
*
• S → A1 → B → … C →
abc
• At a certain moment in the process of computing the set Ri,3, C will be
• Now, if we repeat the computation of Ri,3 again and again, at some
point B will be added, and during the next repetition, A1 will be
• So we have to repeat the process until no new non-terminals are
• This, like the computation of Rε, is an example of a closure algorithm.
Solution to Problem #1
The solution to unit rules is to apply the Ri,l closure algorithm
Problem #2 with the CYK algorithm
• Problem: determine if A1…Am derives si,l, where all of the Ai's derive ε
except one.
• This problem is basically equivalent to the problem of unit rules.
S → A1…Ai…Am
ε
si,l
ε
Solution to Problem #2
As with the unit rules problem, the solution requires recomputation of
the entries of S until nothing changes any more, again using a closure
algorithm.
CYK algorithm
• Suppose we want to determine if a string t with length n can be
derived from a grammar.
• Compute R1,n by using the process described on the previous slides,
which involves computing all the Ri,l sets.
• The start symbol S derives t if and only if S is a member of R1,n.
CYK is a complicated process
Part of the complexity stems from the ε-rules and the unit rules. Their
presence forces us to do the Ri,l computation repeatedly; this is
inefficient because after the first computation of Ri,l recomputations
yield little new information.
A lot of work to try all possibilities
A right-hand side may consist of arbitrarily many non-terminals, and
trying all possibilities can be a lot of work.
• Recall our earlier example where we find A1 in Ri,2 and Ri,4. That means nonterminal A1 derives, starting at position i, a substring of length 2 and of length
4. Then A2…Am must derive either the remaining 8 symbols or the remaining 6
symbols. We repeat the process with A2 for a substring of length 8 and of
length 6, starting at position i+4.
Cost to compute one rule
Suppose we have a right-hand side A1…Am. We divide si,l into m
(possibly empty) segments. So there are m-1 segment ends. Finding a
segment end costs   actions since a list (Ri,l) proportional to the
length of the input has to be scanned. Further, each segment must
combine with the previous ones. So, finding the required m-1 segment
ends costs  −1 .
ti
…
A1
A2
…
Am
ti+k-1
ti+k
ti+k+1
…
segment end #1
ti+l-1
segment end #m-1
Cost to compute
• There are  2 elements in R, filling it completely costs  +1 , so
the time requirement is exponential in the maximum length of the
right-hand sides of the grammar.
• Example: the longest right-hand side in the below grammar is 3, so
the time requirement is  4 . This is far more efficient than
exhaustive search, which requires a time that is exponential in the
length of the input sentence, but still heavy enough to worry about.
Number
Integer
Real
Fraction
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
Integer | Real
Digit | Integer Digit
Integer Fraction Scale
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
longest right-hand side is 3
• Summary of problems with the CYK method: complicated process due
to unit rules and ε-rules, and big time-complexity.
• Imposing certain restrictions on the rules may solve these problems
to a large extent. However, these restrictions should not limit the
generative power of the grammar significantly.
4.2.2 CYK Recognition with a
Grammar in Chomosky Normal
Form
Desired restrictions
• Two of the restrictions that we want to impose on the grammars are:
• No unit rules
• No ε-rules
• We would also like to limit the maximum length of a right-hand side
to 2; this would reduce the time complexity to O(n3).
There is a grammar form with these
restrictions
• There is a form for CF grammars that exactly fits these restrictions:
Chomsky Normal Form (CNF).
• It is as if this if this normal form was invented for the CYK algorithm.
Chomsky Normal Form
A grammar is in Chomsky Normal Form when all rules either have the
form:
A→a
or:
A → BC
where a is a terminal and B, C are non-terminals.
CF to CNF
Any CF grammar can be mechanically transformed into a CNF grammar.
Let's see how the CYK algorithm works for a
grammar in CNF
• There are no ε-rules in a CNF grammar, so Rε is empty.
• The sets for substrings of length 1 (Ri,1) can be read directly from the
rules: they are determined by the rules of the form A → a.
• Rules of the form A → BC can never derive a single terminal because there
are no ε-rules.
• Next we proceed iteratively, first processing all substrings of length 2,
then all substrings of length 3, etc.
BC derives a substring of length l
When a right-hand side BC is to derive a substring of length l, B has to
derive the first part (which is non-empty) and C the rest (also nonempty):
B
ti
…
C
ti+k-1
ti+k
…
ti+l-1
So B must derive Si,k, that is, B must be a member of Ri,k, and likewise C
must derive Si+k,l-k, that is, C must be a member of Ri+k,l-k. Determining
if such a k exists is easy, just try all possibilities; they range from 1 to l1. All sets Ri,k and Ri+k,l-k have already been computed at this point.
A much simpler process
The process described on the previous slide is much less complicated
than the one we saw before, which worked with a general CF grammar,
for two reasons:
1. No unit rules: We do not have to repeat the process again and again until no
new non-terminals are added to Ri,l. Here, the substrings we are dealing
with are really substrings: they cannot be equal to the string we started out
with.
2. Two non-terminals: We have to find only one place where the substring
must split in two because the right-hand side consists of only two nonterminals. In ambiguous grammars there can be several different splittings,
but at this point that does not worry us. Ambiguity is a parsing issue, not a
recognition issue.
Recognizing a string for a grammar in CNF
Grammar:
S → AB
A→a
B→b
Does the grammar derive this string? ab
Compute the sets for substrings of length 1 (Ri,1): What non-terminal(s) can derive a? Answer:
A. So R1,1 is { A }. What non-terminal(s) can derive b? Answer: B. So R2,1 is { B }.
R1,1 : { A }
R2,1 : { B }
Compute the sets for substrings of length 2 (Ri,2): Can AB derive ab? A must derive a and B must
derive b. Is A a member of R1,1? Yes. Is B a member of R2,1? Yes. So AB can derive ab and therefore
S is a member of R1,2. Therefore the grammar's start symbol S can derive the string ab.
R1,2 : { S }
The algorithm results in a complete collection
of sets Ri,l
• Suppose an input sentence contains 10 symbols.
• What are the sets of Ri,l?
•
•
•
•
•
There will be 10 sets with length 1: R1,1, R2,1, …, R10,1
There will be 9 sets with length 2: R1,2, R2,2, …, R9,8
There will be 8 sets with length 3: R1,3, R2,3, …, R8,3
…
There will be 1 set with length 10: R1,10
No sets that would exceed 10 characters
The previous slide said:
There will be 1 set with length 10: R1,10
So there will not be, say, a set R2,10 because that would require the input string to be of length 11.
Decreasing numbers of sets
• Suppose an input sentence contains 10 symbols.
• What are the set of Ri,l?
• There will be 10 sets with length 1: R1,1, R2,1, …, R10,1
• There will be 9 sets with length 2: R1,2, R2,2, …, R9,8
The number of sets is decreasing
• There will be 8 sets with length 3: R1,3, R2,3, …, R8,3
•…
• There will be 1 set with length 10: R1,10
Here are the sets Ri,l
R1,10
R1,9
R1,8
R1,7
R1,6
R1,5
R1,4
R1,3
R1,2
R1,1
R2,9
R2,8
R2,7
R2,6
R2,5
R2,4
R2,3
R2,2
R2,1
R3,8
R3,7
R3,6
R3,5
R3,4
R3,3
R3,2
R3,1
R4,7
R4,6
R4,5
R4,4
R4,3
R4,2
R4,1
R5,6
R5,5
R5,4
R5,3
R5,2
R5,1
R6,5
R6,4
R6,3
R6,2
R6,1
R7,4
R7,3
R7,2
R7,1
R8,3
R8,2
R8,1
R9,2
R9,1
R10,1
Recognition table
R1,10
R1,9
R1,8
R1,7
R1,6
R1,5
R1,4
R1,3
R1,2
R1,1
R2,9
R2,8
R2,7
R2,6
R2,5
R2,4
R2,3
R2,2
R2,1
R3,8
R3,7
R3,6
R3,5
R3,4
R3,3
R3,2
R3,1
R4,7
R4,6
R4,5
R4,4
R4,3
R4,2
R4,1
The Ri,l sets are arranged in a triangular table. This table is
called the recognition table, or the well-formed substring table.
R5,6
R5,5
R5,4
R5,3
R5,2
R5,1
R6,5
R6,4
R6,3
R6,2
R6,1
R7,4
R7,3
R7,2
R7,1
R8,3
R8,2
R8,1
R9,2
R9,1
R10,1
Input with n symbols
• Suppose an input sentence t contains n symbols.
• What are the sets of Ri,l?
•
•
•
•
•
There will be n sets with length 1: R1,1, R2,1, …, Rn,1
There will be n-1 sets with length 2: R1,2, R2,2, …, Rn-1,8
There will be n-2 sets with length 3: R1,3, R2,3, …, Rn-3,3
…
There will be 1 set with length n: R1,n
Max length of a substring
Consider a substring starting at position i:
t1t2…titi+1…ti+l-1…tn
No substring can be longer than n + 1 – i.
t1
t2
t3
t4
t5
t6
t7
t8
t9
t10
No substring can be longer than 10 + 1 – 4 = 7.
Max values of i, l combination in Si,l
t1t2…titi+1…ti+l-1…tn
Recall that we denote a substring by the
notation Si,l. The value of i+l cannot be
greater than n+1.
t1
t2
t3
t4
t5
t6
t7
S4,7
t8
t9
t10
Recognition table
R1,n-1
R1,n-1 R2,n-1
R1,n-2 R2,n-2 R3,n-2
…
…
…
…
…
…
…
…
R1,l
…
…
R1,1
…
…
…
…
…
…
…
…
Ri,l
…
…
Ri,1
…
…
…
…
…
…
…
…
…
…
…
Ri+l-1,1
…
…
Rn,1
Consider Ri,l
R1,n-1
R1,n-1 R2,n-1
R1,n-2 R2,n-2 R3,n-2
…
…
…
…
…
…
…
…
R1,l
…
…
R1,1
…
…
…
…
…
…
…
…
Ri,l
…
…
Ri,1
The set of all non-terminals that
derive substrings of length l starting
at index i.
…
…
…
…
…
…
…
…
…
…
…
Ri+l-1,1
…
…
Rn,1
What non-terminals are in R3,4?
R1,10
R1,9
R1,8
R1,7
R1,6
R1,5
R1,4
R1,3
R1,2
R1,1
R2,9
R2,8
R2,7
R2,6
R2,5
R2,4
R2,3
R2,2
R2,1
R3,8
R3,7
R3,6
R3,5
R3,4
R3,3
R3,2
R3,1
R4,7
R4,6
R4,5
R4,4
R4,3
R4,2
R4,1
R3,4 is computed from what sets?
R5,6
R5,5
R5,4
R5,3
R5,2
R5,1
R6,5
R6,4
R6,3
R6,2
R6,1
R7,4
R7,3
R7,2
R7,1
R8,3
R8,2
R8,1
R9,2
R9,1
R10,1
Non-terminals in R3,4?
R1,10
R1,9
R1,8
R1,7
R1,6
R1,5
R1,4
R1,3
R1,2
R1,1
R2,9
R2,8
R2,7
R2,6
R2,5
R2,4
R2,3
R2,2
R2,1
R3,8
R3,7
R3,6
R3,5
R3,4
R3,3
R3,2
R3,1
R4,7
R4,6
R4,5
R4,4
R4,3
R4,2
R4,1
R5,6
R5,5
R5,4
R5,3
R5,2
R5,1
R6,5
R6,4
R6,3
R6,2
R6,1
Consider a non-terminal A → BC and suppose B can derive the
substring S3,3 and C can derive S6,1. Then B will be in R3,3 and C
will be in R6,1 and A will be in R3,4. That is, set R3,4 is computed
from sets R3,3 and R6,1.
R7,4
R7,3
R7,2
R7,1
R8,3
R8,2
R8,1
R9,2
R9,1
R10,1
Non-terminals in R3,4? (cont.)
R1,10
R1,9
R1,8
R1,7
R1,6
R1,5
R1,4
R1,3
R1,2
R1,1
R2,9
R2,8
R2,7
R2,6
R2,5
R2,4
R2,3
R2,2
R2,1
R3,8
R3,7
R3,6
R3,5
R3,4
R3,3
R3,2
R3,1
R4,7
R4,6
R4,5
R4,4
R4,3
R4,2
R4,1
R5,6
R5,5
R5,4
R5,3
R5,2
R5,1
R6,5
R6,4
R6,3
R6,2
R6,1
Consider a non-terminal A → BC and suppose B can derive the
substring S3,2 and C can derive S5,2. Then B will be in R3,2 and C
will be in R5,2 and A will be in R3,4. That is, set R3,4 is computed
from sets R3,2 and R5,2.
R7,4
R7,3
R7,2
R7,1
R8,3
R8,2
R8,1
R9,2
R9,1
R10,1
Non-terminals in R3,4? (cont.)
R1,10
R1,9
R1,8
R1,7
R1,6
R1,5
R1,4
R1,3
R1,2
R1,1
R2,9
R2,8
R2,7
R2,6
R2,5
R2,4
R2,3
R2,2
R2,1
R3,8
R3,7
R3,6
R3,5
R3,4
R3,3
R3,2
R3,1
R4,7
R4,6
R4,5
R4,4
R4,3
R4,2
R4,1
R5,6
R5,5
R5,4
R5,3
R5,2
R5,1
R6,5
R6,4
R6,3
R6,2
R6,1
Consider a non-terminal A → BC and suppose B can derive the
substring S3,1 and C can derive S4,3. Then B will be in R3,1 and C
will be in R4,3 and A will be in R3,4. That is, set R3,4 is computed
from sets R3,1 and R4,3.
R7,4
R7,3
R7,2
R7,1
R8,3
R8,2
R8,1
R9,2
R9,1
R10,1
Non-terminals in R3,4? (cont.)
R1,10
R1,9
R1,8
R1,7
R1,6
R1,5
R1,4
R1,3
R1,2
R1,1
R2,9
R2,8
R2,7
R2,6
R2,5
R2,4
R2,3
R2,2
R2,1
R3,8
R3,7
R3,6
R3,5
R3,4
R3,3
R3,2
R3,1
Set R3,4 is computed from the sets in yellow.
R4,7
R4,6
R4,5
R4,4
R4,3
R4,2
R4,1
R5,6
R5,5
R5,4
R5,3
R5,2
R5,1
R6,5
R6,4
R6,3
R6,2
R6,1
R7,4
R7,3
R7,2
R7,1
R8,3
R8,2
R8,1
R9,2
R9,1
R10,1
Computing set Ri,l
R1,n-1
R1,n-1 R2,n-1
R1,n-2 R2,n-2 R3,n-2
…
…
…
…
…
…
…
…
R1,l
…
…
R1,1
…
…
…
…
…
…
…
…
Ri,l
…
V…
Ri,1
The set Ri,l is computed from sets
along the arrows V and W.
…
…
…
…
…
…
…
…W
…
…
…
Ri+l-1,1
…
…
Rn,1
Computing set Ri,l
R1,n-1
R1,n-1 R2,n-1
R1,n-2 R2,n-2 R3,n-2
…
…
…
…
…
…
…
…
R1,l
…
…
R1,1
…
…
…
…
…
…
…
…
Ri,l
…
V…
Ri,1
…
…
…
R…
i+1,l-1 …
…
…W
…
…
…
…
Ri+l-1,1
…
…
Rn,1
Combine each of the Bs in Ri,1 with each of the Cs in Ri+1,l-1, and for each pair
B and C for which there is a rule A → BC in the grammar, we insert A in Ri,l.
Computing set Ri,l
R1,n-1
R1,n-1 R2,n-1
R1,n-2 R2,n-2 R3,n-2
…
…
…
…
…
…
…
…
R1,l
…
…
R1,1
…
…
…
…
…
…
…
…
Ri,l
V…
R…
i,2
Ri,1
…
…
R…
i+1,l-1
…
…
…
…
…
R
…i+2,l-2 …
W
…
Ri+l-1,1
…
…
Rn,1
Combine each of the Bs in Ri,2 with each of the Cs in Ri+2,l-2, and for each pair
B and C for which there is a rule A → BC in the grammar, we insert A in Ri,l.
Computing set Ri,l
R1,n-1
R1,n-1 R2,n-1
R1,n-2 R2,n-2 R3,n-2
…
…
…
…
…
…
…
…
R1,l
…
…
R1,1
…
…
…
…
…
…
…
Ri,l
R…
…
i,l-1 R…
i+1,l-1
V
…
…
R…i,2
…
…
R
i,1
…
…
…
R
…i+2,l-2 …
W
…
Ri+l-1,1
…
…
Rn,1
Continue in this way, moving up V until reaching Ri,l-1 (the endpoint of V) and
moving down W until reaching Ri+l-1,1 (the endpoint of W).
Order of computing the sets
R1,n-1
R1,n-1 R2,n-1
R1,n-2 R2,n-2 R3,n-2
…
…
…
…
…
…
…
…
R1,l
…
…
R1,1
…
…
…
…
…
…
…
Ri,l
R…
…
i,l-1 R…
i+1,l-1
V
…
…
R…i,2
…
…
R
i,1
…
…
…
R
…i+2,l-2 …
W
…
Ri+l-1,1
…
…
Rn,1
The set Ri,l cannot be computed until all sets below it are known in the
triangle of which it is the top. This restricts the order in which the sets can be
computed.
Here's one way to compute the recognition
table
Compute all recognition sets for substrings of length 1, then compute all recognition
sets for substrings of length 2, and so forth. That is, no substring of length l is
recognized until all strings of length l-1 have been recognized. This order is best
suited for off-line parsing, where the number of symbols is known in advance.
Computing the recognition table
R1,10
R1,9
R1,8
R1,7
R1,6
R1,5
R1,4
R1,3
R1,2
R1,1
R2,9
R2,8
R2,7
R2,6
R2,5
R2,4
R2,3
R2,2
R2,1
R3,8
R3,7
R3,6
R3,5
R3,4
R3,3
R3,2
R3,1
R4,7
R4,6
R4,5
R4,4
R4,3
R4,2
R4,1
R5,6
R5,5
R5,4
R5,3
R5,2
R5,1
R6,5
R6,4
R6,3
R6,2
R6,1
R7,4
R7,3
R7,2
R7,1
Compute these after computing these
R8,3
R8,2
R8,1
R9,2
R9,1
R10,1
Another way to compute the recognition
table
R1,10
Compute R1,2 after
computing the sets in
the triangle under it.
R1,9
R1,8
R1,7
R1,6
R1,5
R1,4
R1,3
R1,2
R1,1
R2,9
R2,8
R2,7
R2,6
R2,5
R2,4
R2,3
R2,2
R2,1
R3,8
R3,7
R3,6
R3,5
R3,4
R3,3
R3,2
R3,1
R4,7
R4,6
R4,5
R4,4
R4,3
R4,2
R4,1
R5,6
R5,5
R5,4
R5,3
R5,2
R5,1
R6,5
R6,4
R6,3
R6,2
R6,1
R7,4
R7,3
R7,2
R7,1
R8,3
R8,2
R8,1
R9,2
R9,1
R10,1
Another way to compute the recognition
table
R1,10
3
R1,9
R1,8
R1,7
R1,6
R1,5
R1,4
R1,3
R1,2
R1,1
R2,9
R2,8
R2,7
R2,6
R2,5
R2,4
R2,3
R2,2
R2,1
R3,8
R3,7
R3,6
R3,5
R3,4
R3,3
R3,2
R3,1
R4,7
R4,6
R4,5
R4,4
R4,3
R4,2
R4,1
R5,6
R5,5
R5,4
R5,3
R5,2
R5,1
R6,5
R6,4
R6,3
R6,2
R6,1
R7,4
R7,3
R7,2
R7,1
R8,3
R8,2
R8,1
This is the set of nonR9,2
terminals that derive the
R9,1 R10,1
first two characters in the
input string.
1 This is the set of non-terminals that derive the first character in the input string.
2 This is the set of non-terminals that derive the second character in the input string.
Order of computation
R1,10
R1,9
R1,8
R1,7
R1,6
R1,5
R1,4
R1,3
3
1
R2,9
R2,8
R2,7
R2,6
R2,5
R2,4
R2,3
R2,2
2
R3,8
R3,7
R3,6
R3,5
R3,4
R3,3
R3,2
R3,1
R4,7
R4,6
R4,5
R4,4
R4,3
R4,2
R4,1
R5,6
R5,5
R5,4
R5,3
R5,2
R5,1
R6,5
R6,4
R6,3
R6,2
R6,1
R7,4
R7,3
R7,2
R7,1
R8,3
R8,2
R8,1
R9,2
R9,1
R10,1
Another way (cont.)
R1,10
Compute R2,2 after
computing the sets in
the triangle under it.
R1,9
R1,8
R1,7
R1,6
R1,5
R1,4
R1,3
R1,2
R1,1
R2,9
R2,8
R2,7
R2,6
R2,5
R2,4
R2,3
R2,2
R2,1
R3,8
R3,7
R3,6
R3,5
R3,4
R3,3
R3,2
R3,1
R4,7
R4,6
R4,5
R4,4
R4,3
R4,2
R4,1
R5,6
R5,5
R5,4
R5,3
R5,2
R5,1
R6,5
R6,4
R6,3
R6,2
R6,1
R7,4
R7,3
R7,2
R7,1
R8,3
R8,2
R8,1
R9,2
R9,1
R10,1
Another way (cont.)
R1,10
Compute R1,3 after
computing the sets in
the triangle under it.
R1,9
R1,8
R1,7
R1,6
R1,5
R1,4
R1,3
R1,2
R1,1
R2,9
R2,8
R2,7
R2,6
R2,5
R2,4
R2,3
R2,2
R2,1
R3,8
R3,7
R3,6
R3,5
R3,4
R3,3
R3,2
R3,1
R4,7
R4,6
R4,5
R4,4
R4,3
R4,2
R4,1
R5,6
R5,5
R5,4
R5,3
R5,2
R5,1
R6,5
R6,4
R6,3
R6,2
R6,1
R7,4
R7,3
R7,2
R7,1
R8,3
R8,2
R8,1
R9,2
R9,1
R10,1
Another way (cont.)
R1,10
R1,9
R1,8
R1,7
R1,6
R1,5
R1,4
R1,3
R1,2
R1,1
R2,9
R2,8
R2,7
R2,6
R2,5
R2,4
R2,3
R2,2
R2,1
R3,8
R3,7
R3,6
R3,5
R3,4
R3,3
R3,2
R3,1
R4,7
R4,6
R4,5
R4,4
R4,3
R4,2
R4,1
R5,6
R5,5
R5,4
R5,3
R5,2
R5,1
R6,5
R6,4
R6,3
R6,2
R6,1
R7,4
R7,3
R7,2
R7,1
R8,3
R8,2
R8,1
R9,2
R9,1
R10,1
Create the recognition sets as the input string is consumed.
Order of computation
R1,10
R1,9
R1,8
R1,7
R1,6
R1,5
R1,4
6
3
1
R2,9
R2,8
R2,7
R2,6
R2,5
R2,4
R2,3
5
2
R3,8
R3,7
R3,6
R3,5
R3,4
R3,3
R3,2
4
R4,7
R4,6
R4,5
R4,4
R4,3
R4,2
R4,1
R5,6
R5,5
R5,4
R5,3
R5,2
R5,1
R6,5
R6,4
R6,3
R6,2
R6,1
R7,4
R7,3
R7,2
R7,1
R8,3
R8,2
R8,1
R9,2
R9,1
R10,1
Another way (cont.)
In this order, Ri,l is computed as soon as all sets and input symbols needed for its
computation are available. This order is particularly suited for on-line parsing,
where the number of symbols is not known in advance, and additional information
is computed each time a new symbol is read.
How many R's are computed?
R1,10
R1,9
R1,8
R1,7
R1,6
R1,5
R1,4
R1,3
R1,2
R1,1
R2,9
R2,8
R2,7
R2,6
R2,5
R2,4
R2,3
R2,2
R1,2
R3,8
R3,7
R3,6
R3,5
R3,4
R3,3
R3,2
R1,3
R4,7
R4,6
R4,5
R4,4
R4,3
R4,2
R4,1
R5,6
R5,5
R5,4
R5,3
R5,2
R5,1
R6,5
R6,4
R6,3
R6,2
R6,1
R7,4
R7,3
R7,2
R7,1
R8,3
R8,2
R8,1
R9,2
R9,1
10 + 9 + 8 + 7 + 6 + 5 + 4 + 3 + 2
R10,1
+ 1
How many R's are computed?
R1,10
R1,9
R1,8
R1,7
R1,6
R1,5
R1,4
R1,3
R1,2
R1,1
R2,9
R2,8
R2,7
R2,6
R2,5
R2,4
R2,3
R2,2
R1,2
R3,8
R3,7
R3,6
R3,5
R3,4
R3,3
R3,2
R1,3
R4,7
R4,6
R4,5
R4,4
R4,3
R4,2
R4,1
R5,6
R5,5
R5,4
R5,3
R5,2
R5,1
R6,5
R6,4
R6,3
R6,2
R6,1
R7,4
R7,3
R7,2
R7,1
R8,3
R8,2
R8,1
R9,2
R9,1
10 + 9 + 8 + 7 + 6 + 5 + 4 + 3 + 2
10 + 1 = 11
R10,1
+ 1
How many R's are computed?
R1,10
R1,9
R1,8
R1,7
R1,6
R1,5
R1,4
R1,3
R1,2
R1,1
R2,9
R2,8
R2,7
R2,6
R2,5
R2,4
R2,3
R2,2
R1,2
R3,8
R3,7
R3,6
R3,5
R3,4
R3,3
R3,2
R1,3
R4,7
R4,6
R4,5
R4,4
R4,3
R4,2
R4,1
R5,6
R5,5
R5,4
R5,3
R5,2
R5,1
R6,5
R6,4
R6,3
R6,2
R6,1
R7,4
R7,3
R7,2
R7,1
R8,3
R8,2
R8,1
R9,2
R9,1
10 + 9 + 8 + 7 + 6 + 5 + 4 + 3 + 2
9 + 2 = 11
R10,1
+ 1
How many R's are computed?
R1,10
R1,9
R1,8
R1,7
R1,6
R1,5
R1,4
R1,3
R1,2
R1,1
R2,9
R2,8
R2,7
R2,6
R2,5
R2,4
R2,3
R2,2
R1,2
R3,8
R3,7
R3,6
R3,5
R3,4
R3,3
R3,2
R1,3
R4,7
R4,6
R4,5
R4,4
R4,3
R4,2
R4,1
R5,6
R5,5
R5,4
R5,3
R5,2
R5,1
R6,5
R6,4
R6,3
R6,2
R6,1
R7,4
R7,3
R7,2
R7,1
R8,3
R8,2
R8,1
R9,2
R9,1
10 + 9 + 8 + 7 + 6 + 5 + 4 + 3 + 2
8 + 3 = 11
R10,1
+ 1
How many R's are computed?
R1,10
R1,9
R1,8
R1,7
R1,6
R1,5
R1,4
R1,3
R1,2
R1,1
R2,9
R2,8
R2,7
R2,6
R2,5
R2,4
R2,3
R2,2
R1,2
R3,8
R3,7
R3,6
R3,5
R3,4
R3,3
R3,2
R1,3
R4,7
R4,6
R4,5
R4,4
R4,3
R4,2
R4,1
R5,6
R5,5
R5,4
R5,3
R5,2
R5,1
R6,5
R6,4
R6,3
R6,2
R6,1
R7,4
R7,3
R7,2
R7,1
R8,3
R8,2
R8,1
R9,2
R9,1
10 + 9 + 8 + 7 + 6 + 5 + 4 + 3 + 2
7 + 4 = 11
R10,1
+ 1
How many R's are computed?
R1,10
R1,9
R1,8
R1,7
R1,6
R1,5
R1,4
R1,3
R1,2
R1,1
R2,9
R2,8
R2,7
R2,6
R2,5
R2,4
R2,3
R2,2
R1,2
R3,8
R3,7
R3,6
R3,5
R3,4
R3,3
R3,2
R1,3
R4,7
R4,6
R4,5
R4,4
R4,3
R4,2
R4,1
R5,6
R5,5
R5,4
R5,3
R5,2
R5,1
R6,5
R6,4
R6,3
R6,2
R6,1
R7,4
R7,3
R7,2
R7,1
R8,3
R8,2
R8,1
R9,2
R9,1
10 + 9 + 8 + 7 + 6 + 5 + 4 + 3 + 2
6 + 5 = 11
R10,1
+ 1
How many R's are computed?
R1,10
R1,9
R1,8
R1,7
R1,6
R1,5
R1,4
R1,3
R1,2
R1,1
R2,9
R2,8
R2,7
R2,6
R2,5
R2,4
R2,3
R2,2
R1,2
R3,8
R3,7
R3,6
R3,5
R3,4
R3,3
R3,2
R1,3
R4,7
R4,6
R4,5
R4,4
R4,3
R4,2
R4,1
R5,6
R5,5
R5,4
R5,3
R5,2
R5,1
R6,5
R6,4
R6,3
R6,2
R6,1
R7,4
R7,3
R7,2
R7,1
R8,3
R8,2
R8,1
R9,2
R9,1
10 + 9 + 8 + 7 + 6 + 5 + 4 + 3 + 2
10
2
x 11 = 55
R10,1
+ 1
How many R's are computed?
R1,n-1
R1,n-1 R2,n-1
R1,n-2 R2,n-2 R3,n-2
…
…
…
…
…
…
…
…
R1,l
…
…
R1,1
…
…
…
…
…
…
…
…
Ri,l
…
…
Ri,1

2
…
…
…
…
…
…
…
…
…
x (n+1)
…
…
Ri+l-1,1
…
…
Rn,1
How many sets are examined to compute R?
R1,10
R1,9
R1,8
R1,7
R1,6
R1,5
R1,4
R1,3
R1,2
R1,1
R2,9
R2,8
R2,7
R2,6
R2,5
R2,4
R2,3
R2,2
R2,1
R3,8
R3,7
R3,6
R3,5
R3,4
R3,3
R3,2
R3,1
R4,7
R4,6
R4,5
R4,4
R4,3
R4,2
R4,1
R5,6
R5,5
R5,4
R5,3
R5,2
R5,1
R6,5
R6,4
R6,3
R6,2
R6,1
Set R3,4 is computed by examining at most 3
sets down (along the V arrow). Once that
downward set is fixed, the corresponding set
along the diagonal (W arrow) is fixed.
R7,4
R7,3
R7,2
R7,1
R8,3
R8,2
R8,1
R9,2
R9,1
R10,1
How many sets are examined to compute R?
R1,10
R1,9
R1,8
R1,7
R1,6
R1,5
R1,4
R1,3
R1,2
R1,1
R2,9
R2,8
R2,7
R2,6
R2,5
R2,4
R2,3
R2,2
R2,1
R3,8
R3,7
R3,6
R3,5
R3,4
R3,3
R3,2
R3,1
R4,7
R4,6
R4,5
R4,4
R4,3
R4,2
R4,1
R5,6
R5,5
R5,4
R5,3
R5,2
R5,1
R6,5
R6,4
R6,3
R6,2
R6,1
Set R1,10 is computed by examining at most
9 sets down (along the V arrow). Once that
downward set is fixed, the corresponding set
along the diagonal (W arrow) is fixed.
R7,4
R7,3
R7,2
R7,1
R8,3
R8,2
R8,1
R9,2
R9,1
R10,1
How many sets are examined to compute R?
R1,n-1
R1,n-1 R2,n-1
R1,n-2 R2,n-2 R3,n-2
…
…
…
…
…
…
…
…
R1,l
…
…
R1,1
…
…
…
…
…
…
…
…
Ri,l
…
V…
Ri,1
Set Ri,l is computed by examining at most n
sets down (along the V arrow). Once that
downward set is fixed, the corresponding set
along the diagonal (W arrow) is fixed.
…
…
…
…
…
…
…
…W
…
…
…
Ri+l-1,1
…
…
Rn,1
How many sets are examined to compute R?
R1,n-1
R1,n-1 R2,n-1
R1,n-2 R2,n-2 R3,n-2
…
…
…
…
…
…
…
…
R1,l
…
…
R1,1
…
…
…
…
…
…
…
…
Ri,l
…
V…
Ri,1
…
…
…
…
…
…
…
…W
…
Set Ri,l is computed by examining at most n
sets down (along the V arrow). Usually there
are fewer than n sets to be examined and in
practical situations many of the sets are
empty and need not be examined at all. We
will call the number of sets that really have
… to be considered nocc (number of
it is usually much smaller than
…
… occurrences);
n and for many grammars it is even a
…
Ri+l-1,1
n,1worst-case estimates it
constant, but R
for
should be replaced by n.
How many sets are examined to compute R?
R1,n-1
R1,n-1 R2,n-1
R1,n-2 R2,n-2 R3,n-2
…
…
…
…
…
…
…
…
R1,l
…
…
R1,1
…
…
…
…
…
…
…
…
Ri,l
…
V…
Ri,1
Once the set on the arrow V has been
chosen, the corresponding set on the arrow
W is fixed, so the cost of finding it does not
depend on n.
…
…
…
…
…
…
…
…W
…
…
…
Ri+l-1,1
…
…
Rn,1
Time complexity of CYK algorithm on
grammars in Chomsky Normal Form
Number of R sets computed:

2
x (n+1)≈ 2
Number of R sets examined to compute an R:
nocc
O(n2nocc)
The algorithm operates in a time proportional to the cube
of the length of the input sentence in the worst case.
Recall the combining step
R1,n-1
R1,n-1 R2,n-1
R1,n-2 R2,n-2 R3,n-2
…
…
…
…
…
…
…
…
R1,l
…
…
R1,1
…
…
…
…
…
…
…
…
Ri,l
…
V…
Ri,1
…
…
…
R…
i+1,l-1 …
…
…W
…
…
…
…
Ri+l-1,1
…
…
Rn,1
Combine each of the Bs in Ri,1 with each of the Cs in Ri+1,l-1, and for each pair
B and C for which there is a rule A → BC in the grammar, we insert A in Ri,l.
Time to find the rule that combines B and C
into A
R1,n-1
R1,n-1 R2,n-1
R1,n-2 R2,n-2 R3,n-2
…
…
…
…
…
…
…
…
R1,l
…
…
R1,1
…
…
…
…
…
…
…
…
Ri,l
…
V…
Ri,1
…
…
…
R…
i+1,l-1 …
…
…W
…
…
…
…
Ri+l-1,1
…
…
Rn,1
Combine each of the Bs in Ri,1 with each of the Cs in Ri+1,l-1, and for each pair
B and C for which there is a rule A → BC in the grammar, we insert A in Ri,l.
Finding the rule in the grammar that combines B and C into an A can be done
in constant time, by hashing or precomputation.
Cost to perform all the combinations
R1,n-1
R1,n-1 R2,n-1
R1,n-2 R2,n-2 R3,n-2
…
…
…
…
…
…
…
…
R1,l
…
…
R1,1
…
…
…
…
…
…
…
…
Ri,l
…
V…
Ri,1
…
…
…
R…
i+1,l-1 …
…
…W
…
…
…
…
Ri+l-1,1
…
…
Rn,1
Each R along the V and W arrows can contain at most |VN| non-terminals,
where |VN| is the number of non-terminals in the grammar (see section 2.2).
Cost to perform all the combinations (cont.)
R1,n-1
R1,n-1 R2,n-1
R1,n-2 R2,n-2 R3,n-2
…
…
…
…
…
…
…
…
R1,l
…
…
R1,1
…
…
…
…
…
…
…
…
Ri,l
…
V…
Ri,1
…
…
…
R…
i+1,l-1 …
…
…W
…
…
…
…
Ri+l-1,1
…
…
Rn,1
Each R along the V and W arrows can contain at most |VN| non-terminals,
where |VN| is the number of non-terminals in the grammar (see section 2.2).
But again the actual number is usually much lower, since usually only a very
limited subset of the non-terminals can produce a segment of the input of a
given length in a given position; we will indicate the number by |VN|occ.
Cost to perform all the combinations (cont.)
R1,n-1
R1,n-1 R2,n-1
R1,n-2 R2,n-2 R3,n-2
…
…
…
…
…
…
…
…
R1,l
…
…
R1,1
…
…
…
…
…
…
…
…
Ri,l
…
V…
Ri,1
…
…
…
R…
i+1,l-1 …
…
…W
…
…
…
…
Ri+l-1,1
…
…
Rn,1
As we said, the number of non-terminals in each R is: |VN|occ. In the worst
case, we have to combine each of them with each of the other nonterminals. So the cost of one combination step is O(| |2 ).
Overall time complexity of CYK algorithm on
grammars in Chomsky Normal Form
R1,n-1
R1,n-1 R2,n-1
R1,n-2 R2,n-2 R3,n-2
…
…
…
…
…
…
…
…
R1,l
…
…
R1,1
…
…
…
…
…
…
…
…
Ri,l
…
V…
Ri,1
…
…
…
R…
i+1,l-1 …
…
…W
…
…
…
…
Ri+l-1,1
…
…
Rn,1
Earlier we said the total number of R's is O(n2nocc). The cost of one
combination step is O(| |2 ). So the overall time requirement is
O(| |2 n2nocc).
Distinguish between |VN| and n
|VN| = the number of non-terminals in the grammar
n = the length of the input sentence
Beautiful algorithms
• I really enjoy learning these algorithms. There is such beauty in them.
• Algorithms have a precision similar to mathematics but deal with
problems that are much more complex than mathematics.
4.2.3 Transforming a CF
Grammar into Chomsky Normal
Form
It is worthwhile transforming to CNF
• The previous section has demonstrated that it is certainly worthwhile
to try to transform a general CF grammar into CNF.
• In this section we will discuss this transformation, using this number
grammar:
Number
Integer
Real
Fraction
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
Integer | Real
Digit | Integer Digit
Integer Fraction Scale
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
The transformation is done in stages
1.
2.
3.
4.
Eliminate ε-rules
Eliminate unit rules
(optional) Clean the grammar (see Section 2.9.5)
Modify the remaining rules and add rules until they all have the
desired form: either A → a or A → BC.
4.2.3.1 Eliminating the
ε-Rules
Can't simply remove ε-rules
• Suppose we have a grammar G, with an ε-rule A → ε, and we want to
eliminate this rule. We cannot simply remove the rule, as this would
change the language defined by the non-terminal A, and also
probably the language defined by the grammar G.
• Example: Below is a grammar with and without the ε-rule A → ε;
notice that they define different languages.
S → xA
A → aA
A → b
A → ε
Defines this language:
{x, xa, xb, xaa, xab, xaaa, xaab, ...}
S → xA
A → aA
A → b
Defines this language:
{xb, xab, xaab, xaaab,...}
Step 1: Replace rules containing A on the
right-hand side
So something has to be done about the occurrences of A in the righthand sides of the grammar rules. Whenever A occurs in a grammar rule
B → αAβ, we replace this rule with two others: B → αA'β, where A' is a
new non-terminal, for which we shall add rules later (these rules will be
the non-empty grammar rules of A), and B → αβ, which handles the
case where A derives ε in a derivation using the B → αAβ rule.
B → αA'β
B → αAβ
B → αβ
When we are through, there will be no occurrence of A left in the grammar.
Step 1
S
A
A
A
→
→
→
→
xA
aA
b
ε
step 1
S
S
A
A
A
A
→
→
→
→
→
→
x
xA'
a
aA'
b
ε
During the process new ε-rules may originate
B → A'
B→A
B→ε
B may derive the empty string because A may derive the empty string.
The process makes all ε-derivations explicit
New ε-rules (cont.)
• The newly created ε-rules must be dealt with in exactly the same way.
• Ultimately this process will stop because the number of nonterminals that derive ε is finite and in the end none of these nonterminals occur in any right-hand side any more.
Step 2: add rules for the new non-terminals,
A'
• For each non-ε rule A → α, add a rule A' → α.
• If A does not have a non-ε rule (i.e., A only has A → ε), remove all
rules using A'.
Step 2
S
A
A
A
→
→
→
→
xA
aA
b
ε
step 1
S
S
A
A
A
A
→
→
→
→
→
→
x
xA'
a
aA'
b
ε
step 2
S → x
S → xA'
A → a
A → aA'
A → b
A → ε
A' → a
A' → aA'
A' → b
Defines this language:
{x, xa, xb, xaa, xab, xaaa, xaab, ...}
Inaccessible ε-rules
All this leaves us with a grammar that still contains ε-rules. However,
none of the non-terminals having an ε-rule is reachable from the start
symbol.
S → x
S → xA'
A → a
A → aA' A is not reachable from S
A → b
A → ε
A' → a
A' → aA'
A' → b
Consider this grammar
S → aS
S → ε
Defines this language:
{ε, a, aa, aaa, ...}
ε is a member of the language defined by the grammar.
Let's eliminate the ε-rules
S → aS
S → ε
S → a
S → aS'
S → ε
step 1
step 2
S → a
S → aS'
S → ε
S → a
S → aS'
S → ε
S' → a
S' → aS'
Yikes! This ε-rule is reachable from the start symbol.
Inaccessible ε-rules (cont.)
All this leaves us with a grammar that still contains ε-rules. However,
none of the non-terminals having an ε-rule is reachable from the start
symbol, with one important exception: the start symbol itself. In
particular, we now have a rule S → ε if and only if ε is a member of the
language defined by the grammar G.
Let's apply the ε-eliminating scheme to this
grammar
S
L
L
M
M
→
→
→
→
→
L a M
L M
ε
M M
ε
Defines this language:
{a}
Let's apply the ε-eliminating scheme to this
grammar
S
L
L
M
M
→
→
→
→
→
L a M
L M
ε
M M
ε
step 1
S
S
S
S
L
L
M
M
→
→
→
→
→
→
→
→
a
a M'
L' a
L' a M'
L M
ε
M M
ε
S
S
S
S
L
L
M
M
→
→
→
→
→
→
→
→
a
a M'
L' a
L' a M'
L M
ε
M M
ε
step 1
S
S
S
S
L
L
L
L
M
M
→
→
→
→
→
→
→
→
→
→
a
a M'
L' a
L' a M'
L'
M'
L' M'
ε
M M
ε
S
S
S
S
L
L
L
L
M
M
→
→
→
→
→
→
→
→
→
→
a
a M'
L' a
L' a M'
L'
M'
L' M'
ε
M M
ε
step 1
S
S
S
S
L
L
L
L
M
M
M
→
→
→
→
→
→
→
→
→
→
→
a
a M'
L' a
L' a M'
L'
M'
L' M'
ε
M'
M' M'
ε
S
S
S
S
L
L
L
L
M
M
M
→
→
→
→
→
→
→
→
→
→
→
a
a M'
L' a
L' a M'
L'
M'
L' M'
ε
M'
M' M'
ε
step 2
S → a
S → a M'
S → L' a
S → L' a M'
L' → L'
L' → M'
L' → L' M'
M' → M'
M' → M' M'
L → L'
L → M'
L → L' M'
L → ε
M → M'
M → M' M'
M → ε
S → a
S → a M'
S → L' a
S → L' a M'
L' → L'
L' → M'
L' → L' M'
M' → M'
M' → M' M'
L → L'
L → M'
L → L' M'
L → ε
M → M'
M → M' M'
M → ε
S → a
cleanup
Let's apply the ε-eliminating scheme to the
number grammar
Number
Integer
Real
Fraction
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
Integer | Real
Digit | Integer Digit
Integer Fraction Scale
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Number
Integer
Real
Fraction
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
Integer | Real
Digit | Integer Digit
Integer Fraction Scale
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
step 1
Number
Integer
Real
Fraction
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
Integer | Real
Digit | Integer Digit
Integer Fraction Scale
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Number
Integer
Real
Fraction
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
Integer | Real
Digit | Integer Digit
Integer Fraction Scale
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
step 1
Number
Integer
Real
Real
Fraction
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
→
Integer | Real
Digit | Integer Digit
Integer Fraction
Integer Fraction Scale'
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Number
Integer
Real
Real
Fraction
Scale
Digit
Sign
Empty → ε
→
→
→
→
→
→
→
→
Integer | Real
Digit | Integer Digit
Integer Fraction
Integer Fraction Scale'
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
step 2
Number
Integer
Real
Real
Fraction
Scale'
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
→
→
Integer | Real
Digit | Integer Digit
Integer Fraction
Integer Fraction Scale'
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Done!
4.2.3.2 Eliminating Unit Rules
Recall unit rules
Unit rules are rules of the form: A → B
If A → B and B → C then A → C
• If a rule A → B is used in a derivation, it must be followed at some
point in the derivation by the use of the rule B → α.
• Therefore, if we have a rule A → B, and the rules for B are
B → α1 | α2 | ... | αn
we can replace the rule A → B by
A → α1 | α2 | ... | αn
Example
A
B
C
C
→
→
→
→
B
C
B
c
replace A → B with A → C
A
B
C
C
→
→
→
→
C
C
B
c
This has resulted in another unit rule so
we repeat the replacement procedure
again.
replace A → C with A → B and A → c
A
A
B
C
C
→
→
→
→
→
B
c
C
B
c
Yikes! We are back to where we began.
This is an infinitely ambiguous grammar.
What language is generated by this grammar?
A
B
C
C
→
→
→
→
B
C
A
c
A →
A →
A →
A →
Etc.
B→
B→
B→
B→
C
C
C
C
→
→
→
→
c
A → B → C → c
A → B → C → A → B → C → c
A → B → C → A → B → C→ A → B → C → c
The grammar defines this language:
{c}
Infinitely ambiguous grammar
• Therefore, if we have a rule A → B, and the rules for B are
B → α1 | α2 | ... | αn
we can replace the rule A → B by
A → α1 | α2 | ... | αn
• In doing this, we can introduce new unit rules. In particular, when
repeating this process, we could at some point again get the rule
A → B. In this case we have an infinitely ambiguous grammar,
because this means that B derives B:
A → B → ... → B
Short-cut derivations like A → B → ... → B
For derivations like this:
A → B → ... → B
omit this unit rule
Recall this grammar
A
B
C
C
→
→
→
→
B
C
A
c
A →
A →
A →
A →
Etc.
B→
B→
B→
B→
C
C
C
C
→
→
→
→
c
A → B → C → c
A → B → C → A → B → C → c
A → B → C → A → B → C→ A → B → C → c
The grammar has derivations of this form: A → B → ... → B
So we can omit this rule: A → B
Recall this grammar
A
B
C
C
→
→
→
→
B omit this rule
C
A
c
B→ C → c
This is the only derivation that leads to a terminal symbol.
Grammars are no longer infinitely ambiguous
• Also rules of the form A → A are omitted.
• A pleasant side effect of removing ε-rules and unit rules is that the
resulting grammar is not infinitely ambiguous any more.
Let's remove the unit rules from our ε-free
number grammar
Number
Integer
Real
Real
Fraction
Scale'
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
→
→
Integer | Real
Digit | Integer Digit
Integer Fraction
Integer Fraction Scale'
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Number
Number
Integer
Integer
Real
Real
Fraction
Scale'
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
→
→
→
→
Integer
Real
Digit
Integer Digit
Integer Fraction
Integer Fraction Scale'
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Number
Number
Number
Integer
Integer
Real
Real
Fraction
Scale'
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
→
→
→
→
→
Digit
Integer Digit
Real
Digit
Integer Digit
Integer Fraction
Integer Fraction Scale'
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Number
Number
Number
Integer
Integer
Real
Real
Fraction
Scale'
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
→
→
→
→
→
Digit
Integer Digit
Real
Digit
Integer Digit
Integer Fraction
Integer Fraction Scale'
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Number
Number
Number
Integer
Integer
Real
Real
Fraction
Scale'
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Real
Digit
Integer Digit
Integer Fraction
Integer Fraction Scale'
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
0, 1,...,9 are terminal symbols,
so, for example, Number → 0
is not a unit rule.
Number
Number
Number
Integer
Integer
Real
Real
Fraction
Scale'
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Real
Digit
Integer Digit
Integer Fraction
Integer Fraction Scale'
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Number
Number
Number
Number
Integer
Integer
Real
Real
Fraction
Scale'
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
Integer Fraction Scale'
Digit
Integer Digit
Integer Fraction
Integer Fraction Scale'
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Number
Number
Number
Number
Integer
Integer
Real
Real
Fraction
Scale'
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
Integer Fraction Scale'
Digit
Integer Digit
Integer Fraction
Integer Fraction Scale'
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Number
Number
Number
Number
Integer
Integer
Real
Real
Fraction
Scale'
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
Integer Fraction Scale'
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
Integer Fraction Scale'
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Done!
4.2.3.3 Cleaning up the
Grammar
Unreachable rules
Number
Number
Number
Number
Integer
Integer
Real
Real
Fraction
Scale'
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
Integer Fraction Scale'
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
Integer Fraction Scale'
. Integer
These rules cannot be reached from
the start rule, Number.
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Remove the unreachable rules?
• The CYK algorithm will work equally well with or without the
unreachable rules, so cleaning up the grammar, as described in
Section 2.9.5, is optional.
• For conceptual and descriptional simplicity we will clean up our
grammar here, but further on (Section 4.2.6) we shall see that this is
The cleaned-up grammar
Number
Number
Number
Number
Integer
Integer
Fraction
Scale'
Digit
Sign
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
Integer Fraction Scale'
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
4.2.3.4 Finally, to the
Chomsky Normal Form
After all the grammar transformations ...
• We have a grammar without e-rules.
• We have a grammar without unit rules.
• All non-terminals are reachable.
• There are no non-productive rules.
We are left with two types of rules
A→a
Rules in this form are in Chomsky Normal Form
A → X1X2...Xm, with m≥2
Need to process this
Replace terminal symbols
A → X1X2...b...Xm
Replace this terminal symbol with a new non-terminal Tb and
add the rule Tb → b.
Now we have this
A → X1X2...Tb...Xm
Tb → b This is in Chomsky Normal Form
Repeat, for each terminal symbol in A
A → X1X2...Tb...Xm
For each terminal symbol i, replace i with
a non-terminal Ti and add a rule Ti → i.
Now, the only rules not yet in CNF are of this
form
A → X1X2... Xm, with m≥3 and
all Xi are non-terminals
Split the rule and replace by two rules
A → X1X2... Xm
A → A1X3... Xm
A1 → X1X2
new non-terminal
This is in CNF
Repeat the splitting
A → X1X2X3X4X5
A → A1X3X4X5
A1 → X1X2
A → A2X4X5
A1 → X1X2
A2 → A1X3
A → A3X5
A1 → X1X2
A2 → A1X3
A3 → A2X4
Let's apply the splitting to the number
grammar
Number
Number
Number
Number
Integer
Integer
Fraction
Scale'
Digit
Sign
Number
Number
Number
Number
Integer
Integer
Fraction
T1
Scale'
Digit
Sign
→
→
→
→
→
→
→
→
→
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
Integer Fraction Scale'
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
. Integer
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
Integer Fraction Scale'
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Number
Number
Number
Number
Integer
Integer
Fraction
T1
Scale'
Digit
Sign
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
Integer Fraction Scale'
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Number
Number
Number
Number
Integer
Integer
Fraction
T1
Scale'
T2
Digit
Sign
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
Integer Fraction Scale'
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Number
Number
Number
Number
Integer
Integer
Fraction
T1
Scale'
T2
Digit
Sign
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
Integer Fraction Scale'
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
T2
Digit
Sign
→
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
T2
Digit
Sign
→
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
→
→
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Number grammar in Chomsky Normal Form
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
→
→
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
4.2.4 The Example Revisited
Apply CYK to grammar in CNF
• Now let us see how the CYK algorithm works with our example
grammar, which we have just transferred to CNF.
• Again, our sentence is 32.5e+1.
• The recognition table is shown on the following slide with some
explanatory slides following it.
Recognition table for the input 32.5e+1
7
Number
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
Scale'
Number,
Integer
Number,
Integer,
Digit
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
The bottom row is read directly from the
grammar
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
Scale'
Number,
Integer
Number,
Integer,
Digit
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
→
→
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
The bottom row is read directly from the
grammar
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
3
1
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Scale'
Number,
Integer
Number,
Integer,
Digit
→
→
→
→
→
→
→
→
→
→
→
→
→
→
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
Notice that for each symbol a in the
sentence there must be at least one
non-terminal A with a production rule
A → a, or else the sentence cannot be
derived from the grammar.
Deriving the input.5
7
Number
Number
6
Fraction can derive .5 using T1 followed by
Integer (i.e., the non-terminal(s) that can derive the
portion of the string that starts at index 3, with length 2).
5
l
4
Number,
N1
Number,
N1
3
2
1
Scale'
Number,
Integer
Number,
Integer,
Digit
3
1
→ T1 Integer
→ 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
→ .
Fraction
Integer
T1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
Deriving the input 2.5
7
Number
6
→
→
→
→
Number
Integer
Fraction
T1
Number
5
l
4
Number,
N1
Number,
N1
3
2
1
Scale'
Number,
Integer
Number,
Integer,
Digit
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
T1 Integer
.
Another derivation of the input 2.5
7
Number
6
→
→
→
→
N1
Integer
Fraction
T1
Number
5
l
4
Number,
N1
Number,
N1
3
2
1
Scale'
Number,
Integer
Number,
Integer,
Digit
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
T1 Integer
.
Two ways to compute a certain Ri,l
• The first way is to check each right-hand side in the grammar.
• For example, suppose we want to check whether the right-hand side
N1 Scale' derives the string 2.5e (= S2,4). (See next slide)
We want to compute R2,4
7
We are filling in the recognition table and we have arrived
at this cell of the table. We would like to know if Number
can be added to the cell. That is, can the substring 2.5e
be derived from this right-hand side?
6
5
l
4
Number,
N1
1
Number
Number,
N1
3
2
?
Scale'
Number,
Integer
Number,
Integer,
Digit
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
→ N1 Scale'
Can N1 derive 2?
N1 is not a member of R2,1.
7
6
5
l
4
Number,
N1
Number,
N1
3
2
1
?
Scale'
Number,
Integer
Number,
Integer,
Digit
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
Can N1 derive 2.?
N1 is not a member of R2,2.
7
6
5
l
4
Number,
N1
Number,
N1
3
2
1
?
Scale'
Number,
Integer
Number,
Integer,
Digit
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
Can N1 derive 2.3? Can Scale' derive 5?
N1 is a member of R2,3 but
Scale' is not a member of R5,1.
7
6
5
l
4
Number,
N1
Number,
N1
3
2
1
?
Scale'
Number,
Integer
Number,
Integer,
Digit
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
Conclusion
7
We conclude that this right-hand side does not derive the
substring.
6
5
l
4
1
?
Number,
N1
3
2
Number
Number,
N1
Scale'
Number,
Integer
Number,
Integer,
Digit
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
→ N1 Scale'
Repeat the method
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
7
6
5
l
4
Number,
N1
Number,
N1
3
2
1
?
3
1
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Scale'
Number,
Integer
Number,
Integer,
Digit
→
→
→
→
→
→
→
→
→
→
→
→
→
→
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
repeat for all the other right-hand sides
Second method
• The second method is to compute possible right-hand sides from the
recognition table computed so far.
• For example, R2,4 is the set of non-terminals that have a right-hand
side AB where either
• A is a member of R2,1 and B is a member of R3,3, or
• A is a member of R2,2 and B is a member of R4,2, or
• A is a member of R2,3 and B is a member of R5,1.
We want to compute R2,4
l
7
?
6
?
?
5
?
?
?
4
Number,
N1
?
?
?
3
-
Number,
N1
-
-
Scale'
2
Number,
Integer
-
Fraction
-
N2
Number,
Integer,
Digit
Number,
Integer,
Digit
T1
Number,
Integer,
Digit
1
3
1
2
2
.
3
i
5
4
We have filled in the lower rows of the
table (and the preceding cells on the same
row) and now want to fill in this cell.
-
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
7
Since B is empty, we can move on.
6
5
B
l
4
Number,
N1
Number,
N1
3
2
1
A
?
Scale'
Number,
Integer
Number,
Integer,
Digit
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
7
Since A and B are empty, we can move on.
6
5
B
l
4
Number,
N1
Number,
N1
3
2
1
A
?
Scale'
Number,
Integer
Number,
Integer,
Digit
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
Is there a non-terminal in the below grammar with a righthand side AB, where A is a member of {Number, N1} and B
is a member of {T2}? No!
7
6
5
l
4
Number,
N1
1
A
B
Number,
N1
3
2
?
Scale'
Number,
Integer
Number,
Integer,
Digit
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
→
→
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
4.2.5 CYK Parsing with
Chomsky Normal Form
Can recognize if a sentence belongs to the
language
We now have an algorithm that determines whether a sentence
belongs to a language or not, and it is much faster than exhaustive
search.
Want to know: can the input can be derived
from the grammar?
Most of us, however, not only want to know whether a sentence
belongs to a language, but also, if so, how it can be derived from the
grammar. If it can be derived in more than one way, we probably want
to know all possible derivations.
The recognition table contains the info we
want
The recognition table contains the information on all derivations of
substrings of the input sentence that we could possibly make. So the
recognition table contains the information on how an input can be
derived from the grammar.
Use the recognition table multiple ways
7
Number
The recognition table shows that we can recognize the
string 32.5e+1. The table can also be used to derive the
string. Let's see how.
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
Scale'
Number,
Integer
Number,
Integer,
Digit
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
How to derive a sentence using the
recognition table
The first step of the derivation of the input sentence t, with length n,
can be read from the grammar, together with the recognition table. If
n=1, there must be a rule S → t; if n≥2, we have to examine all rules
S → AB, where A derives the first k symbols of t, and B the rest, that is,
A is a member of R1,k and B is a member of Rk+1,n-k, for some k. There
must be at least one such rule, or else S would not derive t.
Try each S → AB
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
3
1
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Scale'
Number,
Integer
Number,
Integer,
Digit
→
→
→
→
→
→
→
→
→
→
→
→
→
→
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
Examine the recognition table to see if
Integer can derive the first k symbols
and Digit can derive the remaining 7-k
symbols.
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
No Digit
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
Integer
1
3
1
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Scale'
Number,
Integer
Number,
Integer,
Digit
→
→
→
→
→
→
→
→
→
→
→
→
→
→
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
Integer can derive the first 1 symbol
but Digit cannot derive the remaining 6
symbols.
7
Number
6
No Digit
5
l
4
Number,
N1
Number,
N1
3
Integer
2
1
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Scale'
Number,
Integer
Number,
Integer,
Digit
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
→
→
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Can't derive input using this rule
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
3
1
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Scale'
Number,
Integer
Number,
Integer,
Digit
→
→
→
→
→
→
→
→
→
→
→
→
→
→
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
No more cells containing Integer in the
first column. Conclusion: cannot derive
the input string using the rule
Number → Integer Digit.
Recognition table contains too much info
The recognition table contains too much information, so much that it
hides what we want to know. The table may contain information about
non-terminals deriving substrings, where these derivations cannot be
used in the derivation of the input sentence from the start symbol S.
For example, we just saw that R1,2 contains Integer, but the fact that
Integer derives 32 cannot be used in the derivation of 32.5e+1
from Number.
Try next rule
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
3
1
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Scale'
Number,
Integer
Number,
Integer,
Digit
→
→
→
→
→
→
→
→
→
→
→
→
→
→
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
Let's see if we can derive the input string
using the rule
Number → Integer Fraction.
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
No Fraction
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
Integer
1
Scale'
Number,
Integer
Number,
Integer,
Digit
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
→
→
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
7
Number
6
No Fraction
5
l
4
Number,
N1
Number,
N1
3
Integer
2
1
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Scale'
Number,
Integer
Number,
Integer,
Digit
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
→
→
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Can't derive input using this rule
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
3
1
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Scale'
Number,
Integer
Number,
Integer,
Digit
→
→
→
→
→
→
→
→
→
→
→
→
→
→
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
No more cells containing Integer in the
first column. Conclusion: cannot derive
the input string using the rule
Number → Integer Fraction.
Try next rule
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
3
1
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Scale'
Number,
Integer
Number,
Integer,
Digit
→
→
→
→
→
→
→
→
→
→
→
→
→
→
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
Let's see if we can derive the input string
using the rule
Number → N1 Scale'.
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
N1
l
4
Number,
N1
Number,
N1
3
2
1
Number,
Integer
Number,
Integer,
Digit
3
1
Scale'
Scale'
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
→
→
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Can derive input using this rule
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
N1
l
4
Number,
N1
Number,
N1
3
2
1
Number,
Integer,
Digit
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Scale'
Scale'
Number,
Integer
→
→
→
→
→
→
→
→
→
→
→
→
→
→
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
Yay! We can derive the input string using
the rule
Number → N1 Scale'.
Derive 32.5 from N1
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
3
1
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Scale'
Number,
Integer
Number,
Integer,
Digit
→
→
→
→
→
→
→
→
→
→
→
→
→
→
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
Let's explore the grey part of the
recognition table to see if we can derive
the substring 32.5 from N1. See if
Integer can derive the first k symbols
and Fraction derive the remaining 4-k
symbols.
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
Integer
1
Scale'
Number,
Integer
Number,
Integer,
Digit
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
No Fraction
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
→
→
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Can derive the substring
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
l
4
Number,
N1
Number,
N1
3
Integer
2
1
3
1
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Scale'
Number,
Integer
Number,
Integer,
Digit
→
→
→
→
→
→
→
→
→
→
→
→
→
→
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
No Fraction
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
Yay! We can derive the substring 32.5
using the rule
N1 → Integer Fraction.
Can derive the substring
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
Integer
1
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
Digit
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Scale'
Number,
Integer
Number,
Integer,
Digit
→
→
→
→
→
→
→
→
→
→
→
→
→
→
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
Yay! We can derive the substring 32
using the rule
Integer → Integer Digit.
Can derive the substring
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
3
1
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Scale'
Number,
Integer
Number,
Integer,
Digit
→
→
→
→
→
→
→
→
→
→
→
→
→
→
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
N2
Number,
Integer,
Digit
5
4
i
T1
Digit
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
Yay! We can derive the substring .5
using the rule
Fraction → T1 Integer.
Can derive the substring
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
3
1
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Scale'
Number,
Integer
Number,
Integer,
Digit
→
→
→
→
→
→
→
→
→
→
→
→
→
→
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
i
T1
Digit
Yay! We can derive the substring e+1
using the rule
Scale' → N2 Integer.
Here's the derivation
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
Scale'
Number,
Integer
Number,
Integer,
Digit
Fraction
Number,
Integer,
Digit
3
1
T1
2
2
.
3
i
Number → N1 Scale'
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
→
→
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
Scale'
Number,
Integer
Number,
Integer,
Digit
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
i
Number → N1 Scale'→ Integer Fraction Scale'
→
→
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
3
1
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Scale'
Number,
Integer
Number,
Integer,
Digit
→
→
→
→
→
→
→
→
→
→
→
→
→
→
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
i
Number → N1 Scale'→ Integer Fraction Scale'→ Integer Digit Fraction Scale'
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
3
1
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Scale'
Number,
Integer
Number,
Integer,
Digit
→
→
→
→
→
→
→
→
→
→
→
→
→
→
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
i
Number → N1 Scale'→ Integer Fraction Scale'→ Integer Digit Fraction Scale' →
3 Digit Fraction Scale'
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
3
1
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Scale'
Number,
Integer
Number,
Integer,
Digit
→
→
→
→
→
→
→
→
→
→
→
→
→
→
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
i
Number → N1 Scale'→ Integer Fraction Scale'→ Integer Digit Fraction Scale' →
3 Digit Fraction Scale' → 3 2 Fraction Scale'
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
3
1
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Scale'
Number,
Integer
Number,
Integer,
Digit
→
→
→
→
→
→
→
→
→
→
→
→
→
→
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
i
Number → N1 Scale'→ Integer Fraction Scale'→ Integer Digit Fraction Scale' →
3 Digit Fraction Scale' → 3 2 Fraction Scale' → 3 2 T1 Integer Scale'
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
3
1
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Scale'
Number,
Integer
Number,
Integer,
Digit
→
→
→
→
→
→
→
→
→
→
→
→
→
→
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
i
Number → N1 Scale'→ Integer Fraction Scale'→ Integer Digit Fraction Scale' →
3 Digit Fraction Scale' → 3 2 Fraction Scale' → 3 2 T1 Integer Scale' →
3 2 . Integer Scale'
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
3
1
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Scale'
Number,
Integer
Number,
Integer,
Digit
→
→
→
→
→
→
→
→
→
→
→
→
→
→
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
i
Number → N1 Scale'→ Integer Fraction Scale'→ Integer Digit Fraction Scale' →
3 Digit Fraction Scale' → 3 2 Fraction Scale' → 3 2 T1 Integer Scale' →
3 2 . Integer Scale' → 3 2 . 5 Scale'
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
3
1
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Scale'
Number,
Integer
Number,
Integer,
Digit
→
→
→
→
→
→
→
→
→
→
→
→
→
→
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
i
Number → N1 Scale'→ Integer Fraction Scale'→ Integer Digit Fraction Scale' →
3 Digit Fraction Scale' → 3 2 Fraction Scale' → 3 2 T1 Integer Scale' →
3 2 . Integer Scale' → 3 2 . 5 Scale' → 3 2 . 5 N2 Integer
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
3
1
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Scale'
Number,
Integer
Number,
Integer,
Digit
→
→
→
→
→
→
→
→
→
→
→
→
→
→
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
i
Number → N1 Scale'→ Integer Fraction Scale'→ Integer Digit Fraction Scale' →
3 Digit Fraction Scale' → 3 2 Fraction Scale' → 3 2 T1 Integer Scale' →
3 2 . Integer Scale' → 3 2 . 5 Scale' → 3 2 . 5 N2 Integer →
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
3
1
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Scale'
Number,
Integer
Number,
Integer,
Digit
→
→
→
→
→
→
→
→
→
→
→
→
→
→
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
i
Number → N1 Scale'→ Integer Fraction Scale'→ Integer Digit Fraction Scale' →
3 Digit Fraction Scale' → 3 2 Fraction Scale' → 3 2 T1 Integer Scale' →
3 2 . Integer Scale' → 3 2 . 5 Scale' → 3 2 . 5 N2 Integer →
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
3
1
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Scale'
Number,
Integer
Number,
Integer,
Digit
→
→
→
→
→
→
→
→
→
→
→
→
→
→
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
i
Number → N1 Scale'→ Integer Fraction Scale'→ Integer Digit Fraction Scale' →
3 Digit Fraction Scale' → 3 2 Fraction Scale' → 3 2 T1 Integer Scale' →
3 2 . Integer Scale' → 3 2 . 5 Scale' → 3 2 . 5 N2 Integer →
3 2 . 5 T2 Sign Integer → 3 2 . 5 e Sign Integer → 3 2 . 5 e + Integer
7
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Number
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
3
1
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Scale'
Number,
Integer
Number,
Integer,
Digit
→
→
→
→
→
→
→
→
→
→
→
→
→
→
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
i
Number → N1 Scale'→ Integer Fraction Scale'→ Integer Digit Fraction Scale' →
3 Digit Fraction Scale' → 3 2 Fraction Scale' → 3 2 T1 Integer Scale' →
3 2 . Integer Scale' → 3 2 . 5 Scale' → 3 2 . 5 N2 Integer →
3 2 . 5 T2 Sign Integer → 3 2 . 5 e Sign Integer → 3 2 . 5 e + Integer →
3 2 . 5 e + 7
Derivation from wrong grammar – yikes!
Unfortunately, this derivation is not exactly what we want, because it is
a derivation that uses the rules of the grammar that has been
converted to Chomsky Normal Form, not the original grammar.
4.2.6 Undoing the Effect of
the CNF Transformation
What non-terminals are in the grammar but
not the recognition table?
7
Number
The original grammar, before transforming to CNF:
Number
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Empty
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
Scale'
Number,
Integer
Number,
Integer,
Digit
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
→
→
→
→
→
→
→
→
→
Integer | Real
Digit | Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
7
Number
Number
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Empty
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
Scale'
Number,
Integer
Number,
Integer,
Digit
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
5
4
Integer | Real
Digit | Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Empty was removed from the grammar
because it derives the empty string.
N2
Number,
Integer,
Digit
→
→
→
→
→
→
→
→
→
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
7
Number
Number
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Empty
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
Scale'
Number,
Integer
Number,
Integer,
Digit
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
5
4
Integer | Real
Digit | Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
This Scale was removed from the grammar
because it derives the empty string.
N2
Number,
Integer,
Digit
→
→
→
→
→
→
→
→
→
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
7
Number
Number
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Empty
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
Scale'
Number,
Integer
Number,
Integer,
Digit
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
5
4
Integer | Real
Digit | Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Real was removed from the grammar because
it became unreachable after eliminating the unit
rules.
N2
Number,
Integer,
Digit
→
→
→
→
→
→
→
→
→
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
Recap: When we examine the original grammar and the recognition table, we see that the recognition table
contains the information we need on most of the non-terminals of the original grammar. However, there are a
few non-terminals missing in the recognition table: Scale, Real, and Empty.
7
Number
Number
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Empty
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
Scale'
Number,
Integer
Number,
Integer,
Digit
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
→
→
→
→
→
→
→
→
→
Integer | Real
Digit | Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Scale was replaced by Scale', where Scale' derives exactly the same as Scale, except for the empty
string.
7
Number
Number
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Empty
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
Scale'
Number,
Integer
Number,
Integer,
Digit
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
→
→
→
→
→
→
→
→
→
Integer | Real
Digit | Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Scale was replaced by Scale', where Scale' derives exactly the same as Scale, except for the empty
string. We can use this to add some more information to the recognition table: at every occurrence of
7
Number
Number
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Empty
Number
6
5
l
4
Number,
N1
Number,
N1
3
2
1
Scale',
Scale
Number,
Integer
Number,
Integer,
Digit
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
→
→
→
→
→
→
→
→
→
Integer | Real
Digit | Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Revisit two earlier slides
• We saw the next two slides previously.
• They talk about removing unreachable non-terminals
Unreachable rules
Number
Number
Number
Number
Integer
Integer
Real
Real
Fraction
Scale'
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
Integer Fraction Scale'
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
Integer Fraction Scale'
. Integer
These rules cannot be reached from
the start rule, Number.
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Remove the unreachable rules?
• The CYK algorithm will work equally well with or without the
unreachable rules, so cleaning up the grammar, as described in
Section 2.9.5, is optional.
• For conceptual and descriptional simplicity we will clean up our
grammar here, but further on (Section 4.2.6) we shall see that this is
The CYK algorithm does not require that all non-terminals be reachable. We could just as well have left the
non-terminal Real in the grammar, and transformed its rules to CNF. The CYK algorithm would then have
added Real to the recognition table, wherever that would be appropriate.
Number
Number
Number
Number
Integer
Integer
Real
Real
Fraction
Scale'
Scale
Digit
Sign
Empty
→
→
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
Integer Fraction Scale'
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
Integer Fraction Scale'
. Integer
These rules cannot be reached from
the start rule, Number.
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
This rule can be used, unchanged
Real
Real
→ Integer Fraction
→ Integer Fraction Scale'
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
→
→
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Real
Real
→ Integer Fraction
→ Integer Fraction Scale'
Replace this with N1
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
→
→
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
Real
Real
→ Integer Fraction
→ Integer Fraction Scale'
This rule was in the
original grammar
Number
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Real
Real
→
→
→
→
→
→
→
→
→
→
→
→
→
→
→
→
→
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
→
→
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | -
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
Real
N1 Scale'
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | Integer Fraction
N1 Scale'
7
Number,
Real
Number,
Real
6
Since Real can derive an Integer
followed by a Fraction.
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
Scale',
Scale
Number,
Integer
Number,
Integer,
Digit
3
1
Fraction
Number,
Integer,
Digit
T1
2
2
.
3
i
N2
Number,
Integer,
Digit
5
4
Number
Number
Number
Number
Number
N1
Integer
Integer
Fraction
T1
Scale'
N2
T2
Digit
Sign
Real
Real
T2
Sign
Number,
Integer,
Digit
e
5
+
6
1
7
→
→
→
→
→
→
→
→
→
→
→
→
→
→
→
→
→
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
Integer Fraction
N1 Scale'
Real
Integer Fraction
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Integer Digit
T1 Integer
.
N2 Integer
T2 Sign
e
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | Integer Fraction
N1 Scale'
Next, let's add a row to the bottom of the recognition table. This extra
row represents the non-terminals that derive the empty string (strings
of length zero).
7
Number
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Empty
Number,
Real
Number,
Real
6
5
l
4
1
0
Integer | Real
Digit | Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+ | ε
Number,
Real,
N1
Number,
Real,
N1
3
2
→
→
→
→
→
→
→
→
→
Scale',
Scale
Number,
Integer
Fraction
Number,
Integer,
Digit
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
3
1
2
2
T1
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
.
3
i
N2
5
4
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
8
e
5
These non-terminals can be
considered as possibly occurring
between two adjacent symbols in the
sentence, and also in front of or at
the end of the sentence.
The recognition table is complete!
The recognition table now contains all the information we need to
parse a sentence with the original grammar. Again, a derivation starts
with the start symbol S. If A1A2...Am is a right-hand side of S, we want to
know if this rule can be applied, that is, if A1A2...Am derives s1,n.
Try each S → A1A2...Am
7
Number,
Real
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number,
Real
6
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
0
Scale',
Scale
Number,
Integer
Fraction
Number,
Integer,
Digit
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
3
1
2
2
T1
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
8
i
5
4
e
5
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Examine the recognition table to see if
Integer can derive the entire input
string.
N2
Number,
Integer,
Digit
.
3
→
→
→
→
→
→
→
→
→
→
→
→
7
No Integer
Number,
Real
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number,
Real
6
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
0
Scale',
Scale
Number,
Integer
Fraction
Number,
Integer,
Digit
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
3
1
2
2
T1
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
8
i
5
4
e
5
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Integer cannot derive the entire input
string.
N2
Number,
Integer,
Digit
.
3
→
→
→
→
→
→
→
→
→
→
→
→
7
Number,
Real
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number,
Real
6
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
0
Scale',
Scale
Number,
Integer
Fraction
Number,
Integer,
Digit
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
3
1
2
2
T1
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
8
i
5
4
e
5
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
See if Real can derive the entire input
string.
N2
Number,
Integer,
Digit
.
3
→
→
→
→
→
→
→
→
→
→
→
→
Can derive input using this rule
7
Number,
Real
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number,
Real
6
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
0
Scale',
Scale
Number,
Integer
Fraction
Number,
Integer,
Digit
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
3
1
2
2
T1
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
8
i
5
4
e
5
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Yay! We can derive the input string using
the rules
Number → Real
Real → Integer Fraction Scale.
N2
Number,
Integer,
Digit
.
3
→
→
→
→
→
→
→
→
→
→
→
→
Can derive 32 using this rule
7
Number,
Real
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number,
Real
6
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
0
Scale',
Scale
Number,
Integer
Fraction
Number,
Integer,
Digit
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
3
1
2
2
T1
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
8
i
5
4
e
5
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Yay! We can derive 32 using the rule
Integer → Integer Digit.
N2
Number,
Integer,
Digit
.
3
→
→
→
→
→
→
→
→
→
→
→
→
Can derive .5 using this rule
7
Number,
Real
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number,
Real
6
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
0
Scale',
Scale
Number,
Integer
Fraction
Number,
Integer,
Digit
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
3
1
2
2
T1
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
8
i
5
4
e
5
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Yay! We can derive .5 using the rule
Fraction → . Integer.
N2
Number,
Integer,
Digit
.
3
→
→
→
→
→
→
→
→
→
→
→
→
Can derive e+1 using this rule
7
Number,
Real
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number,
Real
6
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
0
Scale',
Scale
Number,
Integer
Fraction
Number,
Integer,
Digit
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
3
1
2
2
T1
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
8
i
5
4
e
5
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Yay! We can derive e+1 using the rule
N2
Number,
Integer,
Digit
.
3
→
→
→
→
→
→
→
→
→
→
→
→
The sentence is split up into these parts
number
real
integer
integer
fraction
integer
digit
digit
3
scale
sign
digit
2
.
5
integer
digit
e
+
1
The recognition table is complete!
The recognition table now contains all the information we need to
parse a sentence with the original grammar. Again, a derivation starts
with the start symbol S. If A1A2...Am is a right-hand side of S, we want to
know if this rule can be applied, that is, if A1A2...Am derives s1,n.
A1A2...Am is checked, starting with A1. There are two cases:
• A1 is a terminal symbol. In this case, it must be the first symbol of S1,n, or this
rule is not applicable. Then (if the rule is applicable), we must check if A2...Am
derives S2,n-1, in the same way that we are now checking if A1A2...Am derives
S1,n.
• A1 is a non-terminal. In this case, it must be a member of R1,k, for some k, or
this rule is not applicable. Then (if the rule is applicable), we must check if
A2...Am derives Sk+1,n-k, in the same way that we are now checking if A1A2...Am
derives S1,n.
Recursion!
The recognition table now contains all the information we need to
parse a sentence with the original grammar. Again, a derivation starts
with the start symbol S. If A1A2...Am is a right-hand side of S, we want to
know if this rule can be applied, that is, if A1A2...Am derives s1,n. That is
checked, starting with A1. There are two cases:
• A1 is a terminal symbol. In this case, it must be the first symbol of S1,n, or this
rule is not applicable. Then (if the rule is applicable), we must check if A2...Am
derives S2,n-1, in the same way that we are now checking if A1A2...Am derives
S1,n.
• A1 is a non-terminal. In this case, it must be a member of R1,k, for some k, or
this rule is not applicable. Then (if the rule is applicable), we must check if
A2...Am derives Sk+1,n-k, in the same way that we are now checking if A1A2...Am
derives S1,n.
Can generate all parsings
The recognition table now contains all the information we need to parse a
sentence with the original grammar. Again, a derivation starts with the start
symbol S. If A1A2...Am is a right-hand side of S, we want to know if this rule
can be applied, that is, if A1A2...Am derives s1,n. That is checked, starting with
A1. There are two cases:
• A1 is a terminal symbol. In this case, it must be the first symbol of S1,n, or this rule is
not applicable. Then (if the rule is applicable), we must check if A2...Am derives S2,n-1,
in the same way that we are now checking if A1A2...Am derives S1,n.
• A1 is a non-terminal. In this case, it must be a member of R1,k, for some k, or this rule
is not applicable. Then (if the rule is applicable), we must check if A2...Am derives
Sk+1,n-k, in the same way that we are now checking if A1A2...Am derives S1,n. If we want
all parsings, we must do this for every k for which A1 is a member of R1,k.
Can handle non-terminals that derive ε
The recognition table now contains all the information we need to parse a
sentence with the original grammar. Again, a derivation starts with the start
symbol S. If A1A2...Am is a right-hand side of S, we want to know if this rule
can be applied, that is, if A1A2...Am derives s1,n. That is checked, starting with
A1. There are two cases:
• A1 is a terminal symbol. In this case, it must be the first symbol of S1,n, or this rule is
not applicable. Then (if the rule is applicable), we must check if A2...Am derives S2,n-1,
in the same way that we are now checking if A1A2...Am derives S1,n.
• A1 is a non-terminal. In this case, it must be a member of R1,k, for some k, or this rule
is not applicable. Then (if the rule is applicable), we must check if A2...Am derives
Sk+1,n-k, in the same way that we are now checking if A1A2...Am derives S1,n. If we want
all parsings, we must do this for every k for which A1 is a member of R1,k. Notice that
non-terminals deriving the empty string pose no problem at all, because they appear
as a member of Ri,0 for all i.
4.2.7 A Short Retrospective
of CYK
We have come a long way ...
We started with building a recognition table using the original
grammar. Then we found that using the original grammar with its unit
rules and ε-rules is somewhat complicated, although it certainly can be
done. We proceeded by transforming the grammar to Chomsky Normal
Form (CNF). CNF does not contain unit rules or ε-rules. Our gain in this
respect was that the algorithm for constructing the recognition table
became much simpler. The limitation of the maximum length of a righthand side to 2 was a gain in efficiency, and also a little in simplicity.
Gain in efficiency?
The limitation of the maximum length of a right-hand side to 2 was a
gain in efficiency, and also a little in simplicity. However, Sheil [1] has
demonstrated that the efficiency only depends on the maximum
number of non-terminals occurring in a right-hand side of the grammar,
not on the length of the right-hand side per se.
• This can be easily understood, once one realizes that the efficiency depends
on (among other things) the number of cuts in a substring that are "difficult"
to find, when checking whether a right-hand side derives this substring. This
number of "difficult" cuts only depends on the number of non-terminals in
the right-hand side. So, for efficiency, Chomsky Normal Form is a bit too
restrictive.
[1] Sheil, B. Observations on context-free parsing. Statistical Methods in Linguistics, pages 79-109, 1976.
Recover lost information
A disadvantage of this transformation to CNF is that the resulting
recognition table lacks some information that we need to construct a
derivation using the original grammar. In the transformation process,
some non-terminals were thrown away, because they became nonproductive. Fortunately, the missing information could easily be
recovered. Ultimately, this process resulted in almost the same
recognition table that we would get with our first attempt using the
original grammar. It only contains some extra information on nonterminals that were added during the transformation of the grammar
to CNF. More importantly, however, it was obtained in a simpler and
much more efficient way.
For a more elaborate version of the CYK algorithm, applied to Tree
4.2.8 Getting Parse-Forest
Grammars from CYK Parsing
Parse forest versus Parse-forest grammar
• Parse forest: a tree of parse trees. Ambiguous grammars have
multiple ways of parsing the sentence; a parse forest contains all the
parse trees, connected by a root node. Duplicate branches are
merged, for efficiency reasons.
• Parse-forest grammar: a grammar that represents the parse forest.
Sum_1_5
length
starting position
The parse forest grammar rule Sum_1_5 shows
that in the parse tree grammar Sum produces an
input segment of length 5, starting at position 1.
Obtaining a parse-forest grammar during CYK
parsing
A non-terminal A is entered into entry Ri,l of the recognition table
because there is a rule A → BC. Also, B is in Ri,k and C is in Ri+k,l-k. The
rule
A_i_l → B_i_k C_m_n
is added to the parse forest grammar, where m=i+k and n=l-k.
Let's create a parse-forest grammar
7
Number,
Real
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number,
Real
6
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
0
Scale',
Scale
Number,
Integer
Fraction
Number,
Integer,
Digit
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
3
1
2
2
T1
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
8
i
5
4
e
5
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Number_1_7 → Real_1_7
N2
Number,
Integer,
Digit
.
3
→
→
→
→
→
→
→
→
→
→
→
→
7
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number,
Real
Number,
Real
6
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
0
Fraction
Number,
Integer,
Digit
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
3
1
2
2
i
N2
T1
Number,
Integer,
Digit
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
8
.
3
5
4
e
5
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Number_1_7 → Real_1_7
Real_1_7 → Integer_1_2 Fraction_3_2 Scale_5_3
Scale',
Scale
Number,
Integer
→
→
→
→
→
→
→
→
→
→
→
→
7
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number,
Real
Number,
Real
6
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
0
Fraction
Number,
Integer,
Digit
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
3
1
2
2
i
N2
T1
Number,
Integer,
Digit
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
8
.
3
5
4
e
5
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Number_1_7 → Real_1_7
Real_1_7 → Integer_1_2 Fraction_3_2 Scale_5_3
Number_1_4 → Real_1_4
Scale',
Scale
Number,
Integer
→
→
→
→
→
→
→
→
→
→
→
→
7
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number,
Real
Number,
Real
6
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
0
Fraction
Number,
Integer,
Digit
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
3
1
2
2
i
N2
T1
Number,
Integer,
Digit
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
8
.
3
5
4
e
5
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Number_1_7 → Real_1_7
Real_1_7 → Integer_1_2 Fraction_3_2 Scale_5_3
Number_1_4 → Real_1_4
Real_1_4 → Integer_1_2 Fraction_3_2 Scale_5_0
Scale',
Scale
Number,
Integer
→
→
→
→
→
→
→
→
→
→
→
→
7
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number,
Real
Number,
Real
6
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
0
Fraction
Number,
Integer,
Digit
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
3
1
2
2
i
N2
T1
Number,
Integer,
Digit
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
8
.
3
5
4
e
5
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Number_1_7 → Real_1_7
Real_1_7 → Integer_1_2 Fraction_3_2 Scale_5_3
Number_1_4 → Real_1_4
Real_1_4 → Integer_1_2 Fraction_3_2 Scale_5_0
Number_1_2 → Integer_1_2
Scale',
Scale
Number,
Integer
→
→
→
→
→
→
→
→
→
→
→
→
7
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number,
Real
Number,
Real
6
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
0
Scale',
Scale
Number,
Integer
Fraction
N2
Number,
Integer,
Digit
T1
Number,
Integer,
Digit
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
8
2
2
.
3
i
5
4
e
5
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Number_1_7 → Real_1_7
Real_1_7 → Integer_1_2 Fraction_3_2 Scale_5_3
Number_1_4 → Real_1_4
Real_1_4 → Integer_1_2 Fraction_3_2 Scale_5_0
Number_1_2 → Integer_1_2
Integer_1_2 → Integer_1_1 Digit_2_1
Number,
Integer,
Digit
3
1
→
→
→
→
→
→
→
→
→
→
→
→
7
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number,
Real
Number,
Real
6
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
0
Scale',
Scale
Number,
Integer
Fraction
Number,
Integer,
Digit
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
3
1
2
2
T1
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
.
3
i
N2
5
4
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Number_1_7 → Real_1_7
Real_1_7 → Integer_1_2 Fraction_3_2 Scale_5_3
Number_1_4 → Real_1_4
Real_1_4 → Integer_1_2 Fraction_3_2 Scale_5_0
Number_1_2 → Integer_1_2
Integer_1_2 → Integer_1_1 Digit_2_1
Number_1_1 → Integer_1_1
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
8
e
5
→
→
→
→
→
→
→
→
→
→
→
→
7
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number,
Real
Number,
Real
6
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
0
Scale',
Scale
Number,
Integer
Fraction
Number,
Integer,
Digit
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
3
1
2
2
T1
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
.
3
i
N2
5
4
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Number_1_7 → Real_1_7
Real_1_7 → Integer_1_2 Fraction_3_2 Scale_5_3
Number_1_4 → Real_1_4
Real_1_4 → Integer_1_2 Fraction_3_2 Scale_5_0
Number_1_2 → Integer_1_2
Integer_1_2 → Integer_1_1 Digit_2_1
Number_1_1 → Integer_1_1
Integer_1_1 → Digit_1_1
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
8
e
5
→
→
→
→
→
→
→
→
→
→
→
→
7
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number,
Real
Number,
Real
6
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
0
Scale',
Scale
Number,
Integer
Fraction
N2
Number,
Integer,
Digit
T1
Number,
Integer,
Digit
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Empty
+
6
1
7
8
2
2
.
3
i
5
4
e
5
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Number_1_7 → Real_1_7
Real_1_7 → Integer_1_2 Fraction_3_2 Scale_5_3
Number_1_4 → Real_1_4
Real_1_4 → Integer_1_2 Fraction_3_2 Scale_5_0
Number_1_2 → Integer_1_2
Integer_1_2 → Integer_1_1 Digit_2_1
Number_1_1 → Integer_1_1
Integer_1_1 → Digit_1_1
Scale,
Digit_1_1 → 3_1_1
Number,
Integer,
Digit
3
1
→
→
→
→
→
→
→
→
→
→
→
→
7
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number,
Real
Number,
Real
6
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
0
Scale',
Scale
Number,
Integer
Fraction
N2
Number,
Integer,
Digit
Number,
Integer,
Digit
T1
Number,
Integer,
Digit
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
3
1
2
2
.
3
i
5
4
e
5
→
→
→
→
→
→
→
→
→
→
→
→
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Number_1_7 → Real_1_7
Real_1_7 → Integer_1_2 Fraction_3_2 Scale_5_3
Number_1_4 → Real_1_4
Real_1_4 → Integer_1_2 Fraction_3_2 Scale_5_0
Number_1_2 → Integer_1_2
Integer_1_2 → Integer_1_1 Digit_2_1
Number_1_1 → Integer_1_1
Integer_1_1 → Digit_1_1
Scale,
Digit_1_1 → 3_1_1
Empty
Scale_1_0 → Empty_1_0
8
7
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number,
Real
Number,
Real
6
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
0
Scale',
Scale
Number,
Integer
Fraction
N2
Number,
Integer,
Digit
Number,
Integer,
Digit
T1
Number,
Integer,
Digit
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
3
1
2
2
.
3
i
5
4
e
5
→
→
→
→
→
→
→
→
→
→
→
→
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Number_1_7 → Real_1_7
Real_1_7 → Integer_1_2 Fraction_3_2 Scale_5_3
Number_1_4 → Real_1_4
Real_1_4 → Integer_1_2 Fraction_3_2 Scale_5_0
Number_1_2 → Integer_1_2
Integer_1_2 → Integer_1_1 Digit_2_1
Number_1_1 → Integer_1_1
Integer_1_1 → Digit_1_1
Scale,
Digit_1_1 → 3_1_1
Empty
Scale_1_0 → Empty_1_0
Empty_1_0 → ε
8
7
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number,
Real
Number,
Real
6
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
0
Scale',
Scale
Number,
Integer
Fraction
N2
Number,
Integer,
Digit
Number,
Integer,
Digit
T1
Number,
Integer,
Digit
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
3
1
2
2
.
3
i
5
4
e
5
→
→
→
→
→
→
→
→
→
→
→
→
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Number_1_7 → Real_1_7
Real_1_7 → Integer_1_2 Fraction_3_2 Scale_5_3
Number_1_4 → Real_1_4
Real_1_4 → Integer_1_2 Fraction_3_2 Scale_5_0
Number_1_2 → Integer_1_2
Integer_1_2 → Integer_1_1 Digit_2_1
Number_1_1 → Integer_1_1
Integer_1_1 → Digit_1_1
Scale,
Digit_1_1 → 3_1_1
Empty
Scale_1_0 → Empty_1_0
Empty_1_0 → ε
Number_2_6 → Real_2_6
8
7
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number,
Real
Number,
Real
6
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
0
Scale',
Scale
Number,
Integer
Fraction
N2
Number,
Integer,
Digit
Number,
Integer,
Digit
T1
Number,
Integer,
Digit
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
3
1
2
2
.
3
i
5
4
e
5
→
→
→
→
→
→
→
→
→
→
→
→
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Number_1_7 → Real_1_7
Real_1_7 → Integer_1_2 Fraction_3_2 Scale_5_3
Number_1_4 → Real_1_4
Real_1_4 → Integer_1_2 Fraction_3_2 Scale_5_0
Number_1_2 → Integer_1_2
Integer_1_2 → Integer_1_1 Digit_2_1
Number_1_1 → Integer_1_1
Integer_1_1 → Digit_1_1
Scale,
Digit_1_1 → 3_1_1
Empty
Scale_1_0 → Empty_1_0
Empty_1_0 → ε
Number_2_6 → Real_2_6
8
Real_2_6 → Integer_2_1 Fraction_3_1 Scale_5_3
7
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number,
Real
Number,
Real
6
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
0
Scale',
Scale
Number,
Integer
Fraction
N2
Number,
Integer,
Digit
Number,
Integer,
Digit
T1
Number,
Integer,
Digit
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
3
1
2
2
.
3
i
5
4
e
5
→
→
→
→
→
→
→
→
→
→
→
→
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Number_1_7 → Real_1_7
Real_1_7 → Integer_1_2 Fraction_3_2 Scale_5_3
Number_1_4 → Real_1_4
Real_1_4 → Integer_1_2 Fraction_3_2 Scale_5_0
Number_1_2 → Integer_1_2
Integer_1_2 → Integer_1_1 Digit_2_1
Number_1_1 → Integer_1_1
Integer_1_1 → Digit_1_1
Scale,
Digit_1_1 → 3_1_1
Empty
Scale_1_0 → Empty_1_0
Empty_1_0 → ε
Number_2_6 → Real_2_6
8
Real_2_6 → Integer_2_1 Fraction_3_1 Scale_5_3
Number_2_3 → Real_2_3
7
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number,
Real
Number,
Real
6
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
0
Scale',
Scale
Number,
Integer
Fraction
N2
Number,
Integer,
Digit
Number,
Integer,
Digit
T1
Number,
Integer,
Digit
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
3
1
2
2
.
3
i
5
4
e
5
→
→
→
→
→
→
→
→
→
→
→
→
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Number_1_7 → Real_1_7
Real_1_7 → Integer_1_2 Fraction_3_2
Number_1_4 → Real_1_4
Real_1_4 → Integer_1_2 Fraction_3_2
Number_1_2 → Integer_1_2
Integer_1_2 → Integer_1_1 Digit_2_1
Number_1_1 → Integer_1_1
Integer_1_1 → Digit_1_1
Scale,
Digit_1_1 → 3_1_1
Empty
Scale_1_0 → Empty_1_0
Empty_1_0 → ε
Number_2_6 → Real_2_6
8
Real_2_6 → Integer_2_1 Fraction_3_1
Number_2_3 → Real_2_3
Real_2_3 → Integer_2_1 Fraction_3_2
Scale_5_3
Scale_5_0
Scale_5_3
Scale_5_0
7
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number,
Real
Number,
Real
6
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
0
Scale',
Scale
Number,
Integer
Fraction
N2
Number,
Integer,
Digit
Number,
Integer,
Digit
T1
Number,
Integer,
Digit
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
3
1
2
2
.
3
i
5
4
e
5
→
→
→
→
→
→
→
→
→
→
→
→
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Number_1_7 → Real_1_7
Real_1_7 → Integer_1_2 Fraction_3_2
Number_1_4 → Real_1_4
Real_1_4 → Integer_1_2 Fraction_3_2
Number_1_2 → Integer_1_2
Integer_1_2 → Integer_1_1 Digit_2_1
Number_1_1 → Integer_1_1
Integer_1_1 → Digit_1_1
Scale,
Digit_1_1 → 3_1_1
Empty
Scale_1_0 → Empty_1_0
Empty_1_0 → ε
Number_2_6 → Real_2_6
8
Real_2_6 → Integer_2_1 Fraction_3_1
Number_2_3 → Real_2_3
Real_2_3 → Integer_2_1 Fraction_3_2
Number_2_1 → Integer_2_1
Scale_5_3
Scale_5_0
Scale_5_3
Scale_5_0
7
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number,
Real
Number,
Real
6
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
0
Scale',
Scale
Number,
Integer
Fraction
N2
Number,
Integer,
Digit
Number,
Integer,
Digit
T1
Number,
Integer,
Digit
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
3
1
2
2
.
3
i
5
4
e
5
→
→
→
→
→
→
→
→
→
→
→
→
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Number_1_7 → Real_1_7
Real_1_7 → Integer_1_2 Fraction_3_2
Number_1_4 → Real_1_4
Real_1_4 → Integer_1_2 Fraction_3_2
Number_1_2 → Integer_1_2
Integer_1_2 → Integer_1_1 Digit_2_1
Number_1_1 → Integer_1_1
Integer_1_1 → Digit_1_1
Scale,
Digit_1_1 → 3_1_1
Empty
Scale_1_0 → Empty_1_0
Empty_1_0 → ε
Number_2_6 → Real_2_6
8
Real_2_6 → Integer_2_1 Fraction_3_1
Number_2_3 → Real_2_3
Real_2_3 → Integer_2_1 Fraction_3_2
Number_2_1 → Integer_2_1
Integer_2_1 → Digit_2_1
Scale_5_3
Scale_5_0
Scale_5_3
Scale_5_0
7
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number,
Real
Number,
Real
6
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
0
Scale',
Scale
Number,
Integer
Fraction
N2
Number,
Integer,
Digit
Number,
Integer,
Digit
T1
Number,
Integer,
Digit
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
3
1
2
2
.
3
i
5
4
e
5
→
→
→
→
→
→
→
→
→
→
→
→
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Number_1_7 → Real_1_7
Real_1_7 → Integer_1_2 Fraction_3_2
Number_1_4 → Real_1_4
Real_1_4 → Integer_1_2 Fraction_3_2
Number_1_2 → Integer_1_2
Integer_1_2 → Integer_1_1 Digit_2_1
Number_1_1 → Integer_1_1
Integer_1_1 → Digit_1_1
Scale,
Digit_1_1 → 3_1_1
Empty
Scale_1_0 → Empty_1_0
Empty_1_0 → ε
Number_2_6 → Real_2_6
8
Real_2_6 → Integer_2_1 Fraction_3_1
Number_2_3 → Real_2_3
Real_2_3 → Integer_2_1 Fraction_3_2
Number_2_1 → Integer_2_1
Integer_2_1 → Digit_2_1
Digit_2_1 → 2_2_1
Scale_5_3
Scale_5_0
Scale_5_3
Scale_5_0
7
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number,
Real
Number,
Real
6
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
0
Scale',
Scale
Number,
Integer
Fraction
N2
Number,
Integer,
Digit
Number,
Integer,
Digit
T1
Number,
Integer,
Digit
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
3
1
2
2
.
3
i
5
4
e
5
→
→
→
→
→
→
→
→
→
→
→
→
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Number_1_7 → Real_1_7
Real_1_7 → Integer_1_2 Fraction_3_2
Number_1_4 → Real_1_4
Real_1_4 → Integer_1_2 Fraction_3_2
Number_1_2 → Integer_1_2
Integer_1_2 → Integer_1_1 Digit_2_1
Number_1_1 → Integer_1_1
Integer_1_1 → Digit_1_1
Scale,
Digit_1_1 → 3_1_1
Empty
Scale_1_0 → Empty_1_0
Empty_1_0 → ε
Number_2_6 → Real_2_6
8
Real_2_6 → Integer_2_1 Fraction_3_1
Number_2_3 → Real_2_3
Real_2_3 → Integer_2_1 Fraction_3_2
Number_2_1 → Integer_2_1
Integer_2_1 → Digit_2_1
Digit_2_1 → 2_2_1
Scale_2_0 → Empty_2_0
Scale_5_3
Scale_5_0
Scale_5_3
Scale_5_0
7
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number,
Real
Number,
Real
6
5
l
4
Number,
Real,
N1
Number,
Real,
N1
3
2
1
0
Scale',
Scale
Number,
Integer
Fraction
N2
Number,
Integer,
Digit
Number,
Integer,
Digit
T1
Number,
Integer,
Digit
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
3
1
2
2
.
3
i
5
4
e
5
→
→
→
→
→
→
→
→
→
→
→
→
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Number_1_7 → Real_1_7
Real_1_7 → Integer_1_2 Fraction_3_2
Number_1_4 → Real_1_4
Real_1_4 → Integer_1_2 Fraction_3_2
Number_1_2 → Integer_1_2
Integer_1_2 → Integer_1_1 Digit_2_1
Number_1_1 → Integer_1_1
Integer_1_1 → Digit_1_1
Digit_1_1 → 3_1_1
Scale,
Scale_1_0 → Empty_1_0
Empty
Empty_1_0 → ε
Number_2_6 → Real_2_6
Real_2_6 → Integer_2_1 Fraction_3_1
8
Number_2_3 → Real_2_3
Real_2_3 → Integer_2_1 Fraction_3_2
Number_2_1 → Integer_2_1
Integer_2_1 → Digit_2_1
Digit_2_1 → 2_2_1
Scale_2_0 → Empty_2_0
Empty_2_0 → ε
Scale_5_3
Scale_5_0
Scale_5_3
Scale_5_0
And so forth ...
Many unreachable rules
The parse-forest grammar contains many unreachable non-terminals, for
example Scale_1_0, Number_2_6, etc.
Number_1_7 → Real_1_7
Real_1_7 → Integer_1_2 Fraction_3_2
Number_1_4 → Real_1_4
Real_1_4 → Integer_1_2 Fraction_3_2
Number_1_2 → Integer_1_2
Integer_1_2 → Integer_1_1 Digit_2_1
Number_1_1 → Integer_1_1
Integer_1_1 → Digit_1_1
Digit_1_1 → 3_1_1
Scale_1_0 → Empty_1_0
Empty_1_0 → ε
Number_2_6 → Real_2_6
Real_2_6 → Integer_2_1 Fraction_3_1
Number_2_3 → Real_2_3
Real_2_3 → Integer_2_1 Fraction_3_2
Number_2_1 → Integer_2_1
Integer_2_1 → Digit_2_1
Digit_2_1 → 2_2_1
Scale_2_0 → Empty_2_0
Empty_2_0 → ε
...
Scale_5_3
Scale_5_0
Scale_5_3
Scale_5_0
Cleaned parse-forest grammar obtained by CYK
parsing
Removing the
unreachable nonterminals yields this
parse-forest
grammar:
Number_1_7 → Real_1_7
Real_1_7 → Integer_1_2 Fraction_3_2 Scale_5_3
Integer_1_2 → Integer_1_1 Digit_2_1
Integer_1_1 → Digit_1_1
Digit_1_1 → 3_1_1
Digit_2_1 → 2_2_1
Fraction_3_2 → ._3_1 Integer_4_1
Integer_4_1 → Digit_4_1
Digit_4_1 → 5_4_1
Scale_5_3 → e_5_1 Sign_6_1 Integer_7_1
Sign_6_1 → +_6_1
Integer_7_1 → Digit_7_1
Digit_7_1 → 1_7_1
4.3 Tabular Parsing
A more elementary data structure
• We have drawn the CYK recognition tables as two-dimensional
triangular matrices, but the complexity of the entries – sets of nonterminals – already shows that this representation is not in its most
elementary form.
• Simplicity and insight can be gained by realizing that a CYK
recognition table is a superposition of a number of tables, one for
each non-terminal in the grammar; the entries in these tables are just
bits, saying "Present" or "Not Present".
Replace with a series of bit tables
This data structure ...
{A}
{B}
{A,C}
... can be represented by
these more elementary
data structures
•
•
A
•
B
•
C
•
7
Number,
Real
•
•
•
••
Number,
Real
6
•
• •
Number
•
•• •
•
Integer
5
l
4
1
0
•
Number,
Real,
N1
3
2
•
Number,
Real,
N1
•
•
•
The eight faces of the
recognition table
Scale',
Scale
Real
Number,
Integer
Fraction
Fraction
N2
Number,
Integer,
Digit
Number,
Integer,
Digit
T1
Number,
Integer,
Digit
T2
Sign
Number,
Integer,
Digit
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
Scale,
Empty
+
6
1
7
•
Scale,
Empty
••••••••
Scale
3
1
2
2
.
3
5
4
e
5
•• •
•
Digit
8
i
•
Sign
••••••••
Empty
How to interpret a dot in a cell
A Number of length 7 has
been recognized in position 1
l
7
6
5
4
3
2
1
0
•
•
•
•
•
••
1 2 3
• •
4 5 6 7 8
i
Number
A Number of length 1 has
been recognized in position 7
A Number of lengths 1, 2,
4, and 7 has been
recognized in position 1
7
6
5
4
3
2
1
0
•
•
•
•
•
••
• •
An Integer of lengths 1
and 2 has been
recognized in position 1
7
6
5
4
3
2
1
0
Number
A Real of lengths 4 and 7
has been recognized in
position 1
7
6
5
4
3
2
1
0
•
•
7
6
5
4
•
•
3
2
1
0
7
6
5
4
3
2
1
0
•
Integer
Real
A Scale of length 0 has
been recognized in
position 1
•
•• •
Fraction
•
••••••••
A Digit of length 1 has
been recognized in
position 1
7
6
5
4
3
2
1
0
Scale
•
Sign
•• •
•
Digit
7
6
5
4
3
2
1
0
•
An Empty of length 0
has been recognized in
position 1
7
6
5
4
3
2
1
0
••••••••
Empty
Superposition of bits
Imagine the 8 tables standing upright in the order Number ... Empty,
perhaps cut out of transparent plastic, glued together into a single
block. Now topple the block backwards, away from you. A new matrix
appears, T, on what was the bottom of the block before you toppled it,
as shown on the next slide, where the old recognition table is still
visible on what is now the top.
Number
1,2
4,7
1,3,6
1
1
Integer
1,2
1
1
1
Real
4,7
3,6
2
Fraction
Scale
0
0
Digit
1
1
0
0
0,3
1
0
0
1
1
Sign
Empty
0
0
0
0
0
0
0
0
3
2
.
5
e
+
1
1
2
3
4
5
6
7
0
8
This entry is: TNumber, 1
Number
1,2
4,7
1,3,6
1
1
Integer
1,2
1
1
1
Real
4,7
3,6
2
Fraction
Scale
0
0
Digit
1
1
0
0
0,3
1
0
0
1
1
Sign
Empty
0
0
0
0
0
0
0
0
3
2
.
5
e
+
1
1
2
3
4
5
6
7
0
8
Each entry is a list of lengths.
Number
1,2
4,7
1,3,6
1
1
Integer
1,2
1
1
1
Real
4,7
3,6
2
Fraction
Scale
0
0
Digit
1
1
0
0
0,3
1
0
0
1
1
Sign
Empty
0
0
0
0
0
0
0
0
3
2
.
5
e
+
1
1
2
3
4
5
6
7
0
8
These numbers show the length of
strings that can be recognized by
Number in position 1.
Number
1,2
4,7
1,3,6
1
1
Integer
1,2
1
1
1
Real
4,7
3,6
2
Fraction
Scale
0
0
Digit
1
1
0
0
0,3
1
0
0
1
1
Sign
Empty
0
0
0
0
0
0
0
0
3
2
.
5
e
+
1
1
2
3
4
5
6
7
0
8
“Productions of Number of lengths
1, 2, 4, and 7 can be recognized in
position 1.”
Number
1,2
4,7
1,3,6
1
1
Integer
1,2
1
1
1
Real
4,7
3,6
2
Fraction
Scale
0
0
Digit
1
1
0
0
0,3
1
0
0
1
1
Sign
Empty
0
0
0
0
0
0
0
0
3
2
.
5
e
+
1
1
2
3
4
5
6
7
0
8
“Productions of Number of lengths
1, 3, and 6 can be recognized in
position 2.”
Number
1,2
4,7
1,3,6
1
1
Integer
1,2
1
1
1
Real
4,7
3,6
2
Fraction
Scale
0
0
Digit
1
1
0
0
0,3
1
0
0
1
1
Sign
Empty
0
0
0
0
0
0
0
0
3
2
.
5
e
+
1
1
2
3
4
5
6
7
0
8
Tabular parsing algorithms
• Parsing algorithms that use mainly the representation shown on the
previous slides are called tabular parsing algorithms.
• No information is gained or lost in the transformation to this form.
• The “tabular representation” has its own advantages and
Initializing T
The table T is initialized by putting a 1 in all entries TA,i where the input
has a token t in position i and the grammar has a rule A → t.
That is, put a 1 in each entry TA,i where
A directly produces the input symbol.
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number
Integer
→
→
→
→
→
→
→
→
→
→
→
→
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Real
Fraction
Scale
Digit
The table T is initialized by putting a
1 in all entries TA,i where the input
has a token t in position i and the
grammar has a rule A → t.
1
Sign
Put a 1 in entry TDigit,1 because the
input has a token 3 in position 1
and the grammar has a rule
Digit → 3.
Empty
3
2
.
5
e
+
1
1
2
3
4
5
6
7
8
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number
Integer
→
→
→
→
→
→
→
→
→
→
→
→
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Real
Fraction
Scale
Digit
1
1
The table T is initialized by putting a
1 in all entries TA,i where the input
has a token t in position i and the
grammar has a rule A → t.
1
1
1
Sign
Put a 1 in entry TSign,6 because the
input has a token + in position 6
and the grammar has a rule
Sign → +.
Empty
3
2
.
5
e
+
1
1
2
3
4
5
6
7
8
Initializing T (cont.)
The table T is initialized by putting a 0 in all entries TA,i where the
grammar has a rule A → ε.
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number
Integer
Real
Fraction
Scale
Digit
1
1
1
1
1
Sign
Empty
0
0
0
0
0
0
0
3
2
.
5
e
+
1
1
2
3
4
5
6
7
0
8
→
→
→
→
→
→
→
→
→
→
→
→
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Fill the rest of the table
There are two ways to fill the rest of the table, top-down and
bottom-up.
4.3.1 Top-Down Tabular
Parsing
Use T for recognition
For recognition we are only interested in one element in one entry in
the table T:
Does TS,1 contain n?
where S is the start symbol and n is the length of the input.
This entry contains 7, so the
grammar recognizes the input.
Number
1,2
4,7
1,3,6
1
1
Integer
1,2
1
1
1
Real
4,7
3,6
2
Fraction
Scale
0
0
Digit
1
1
0
0
0,3
1
0
0
1
1
Sign
Empty
0
0
0
0
0
0
0
0
3
2
.
5
e
+
1
1
2
3
4
5
6
7
0
8
Hold on!
T is filled in on the previous slide. Right now we are learning how to fill
in T, using a top-down approach. Recall that we initialized T.
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number
Integer
Real
Fraction
Scale
Digit
1
1
1
1
1
Sign
Empty
0
0
0
0
0
0
0
3
2
.
5
e
+
1
1
2
3
4
5
6
7
0
8
→
→
→
→
→
→
→
→
→
→
→
→
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Here's how to answer the query
So our query is this:
Does TS,1 contain n?
Push that query on a stack. Using all grammar rules for S, draw up a list
of possibilities for TS,1 to contain n. For a rule like S → AB these
include:
Does TA,1 contain 0 and does TB,1 contain n?
Does TA,1 contain 1 and does TB,2 contain n-1?
Does TA,1 contain 2 and does TB,3 contain n-2?
...
Does TA,1 contain n and does TB,n contain 0?
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number
Integer
→
→
→
→
→
→
→
→
→
→
→
→
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Real
Fraction
Scale
Digit
1
1
1
1
1
Sign
Empty
0
0
0
0
0
0
0
3
2
.
5
e
+
1
1
2
3
4
5
6
7
0
8
Does TInteger,1 contain 7?
Does TReal,1 contain 7?
Does TNumber,1 contain 7?
Stack
Number
Number
Integer
Integer
Real
Fraction
Scale
Scale
Digit
Sign
Sign
Empty
Number
Integer
→
→
→
→
→
→
→
→
→
→
→
→
Integer
Real
Digit
Integer Digit
Integer Fraction Scale
. Integer
Empty
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
ε
Real
Fraction
Scale
Digit
1
1
1
1
1
Sign
Empty
0
0
0
0
0
0
0
3
2
.
5
e
+
1
1
2
3
4
5
6
7
0
8
...
Does TInteger,1 contain 1
and does TDigit,2 contain
7?
Does TInteger,1 contain 0
and does TDigit,1 contain
7?
Does TDigit,1 contain 7?
Does TInteger,1 contain 7?
Does TReal,1 contain 7?
Does TNumber,1 contain 7?
Stack
Recurse down to terminal queries
Eventually the queries will develop into "terminal queries", queries that
can be resolved without creating new queries. Examples are "does Ta,k
contain 1?", which can be answered by checking if the input contains
an a in position k, and "does TP,k contain 0?", which is equivalent to
"does P produce ε?".
Once we have obtained an answer to a query we store it in the proper
position in the table, and we do this not only for the top query, but also
for all intermediate, generated queries. This is very important since
now we can obtain the answer without further computation if the
same query turns up again, which it frequently does. Note that this
requires the possibility to store negative information (actually positive
information about absence) in the entries of table T: an entry like TA,k
can contain information like "does not contain 7".
Memoization
• The technique of storing results of computations in a table in order to
replace recomputation is called memoization. It is a very useful and
widely application device in algorithmics, and can often reduce the
time requirements of an algorithm from exponential to polynomial, as
it does in the present example.
• Memoization was invented in 1968 by Michie [1] and was introduced
in parsing by Sheil [2], who did not yet use the term "memorization".
[1] Michie, D. "memo" functions and machine learning. Nature, 218(5136):19-22, April 6, 1968.
[2] Sheil, B. Observations on context-free parsing. Statistical Methods in Linguistics, pages 71-109, 1976.
(required reading for anybody who wants to write or use a general CF grammar. Many practical hints and
opinions are given)
Handle left-recursive non-terminals
• In processing the queries we must also concern ourselves with leftrecursive non-terminals. If the non-terminal A is left-recursive, the
query "does TA,1 contain n" will eventually again create the query
"does TA,1 contain n", and thus start an endless loop. The loop can be
prevented by just discarding this recursive occurrence of the same
query, since a second computation would not bring in any
information not already obtained by the first. Whether a generated
query is a recursive occurrence can be determined by looking it up
the stack of queries.
• A full implementation of this algorithm is discussed in Section 17.3
4.3.2 Bottom-Up Tabular
Parsing
Assume CNF, fill with bits
• The bottom-up tabular parsing algorithm works most efficiently for
grammars that are in Chomsky Normal Form, and we will assume our
grammar to be so.
• In the previous section we filled the front panel of T with lists of
integers. In this section we fill the three-dimensional blocks with
Booleans (bits).
7
6
•
A terminal production of Number with
length 7 starts in position 1. We will write
that as: T1,Number,7
5
l
3
4
2
1
0
Number
Integer
Real
Fraction
Scale
Digit
Sign
Empty
3
2
.
1
2
3
i
5
e
+
1
4
5
6
7
8
Ti,A,k
A terminal production of A with length k starts in position i.
To fill entry Ti,A,k ...
Bottom-up tabular parsing fills the whole recognition table by filling
columns starting from the right end. To fill entry Ti,A,k, we find a rule of
the form A → BC from the grammar, and we access Ti,B,1. If this entry is
set, there is a segment at i of length 1 produced by B. So we access
Ti+1,C,k-1, and if it is also set, there is a segment at i+1 of length k-1
produced by C. From this we conclude that the input segment at i of
length k holds a terminal production of A and we set the entry Ti,A,k. If
not we try again with Ti,B,2 and Ti+2,C,k-2, etc., until Ti,B,k-1 and Ti+k-1,C,1,
just as in the CYK algorithm.
Computation order
We can stop at Ti+k-1,C,1, because grammars in CNF have no ε-rules, so
Ti+k,C,0 does not exist, This means that the computation of the entry
Ti,A,k involves entries Ti,B,j with j<k only. So if we compute all entries Ti,P,j
before all entries Ti,Q,k for j<k and arbitrary P and Q, the values of all
entries needed for the computation of Ti,A,k are guaranteed to be ready.
This imposes a particular computation order on the algorithm:
for all input positions i from n to 1
for all non-terminals A in the grammar
for all length k from 1 to i
compute Ti,A,k
Computing cost
The cost of computing one entry Ti,A,k is    , where  is the
length of the input and  the average number of production rules
of a non-terminal. As we saw, this computation is repeated
times, where  is proportional to the size of the grammar. So the time
requirements of this parsing algorithm is  3   or  3 x
.
• Bottom-up tabular parsing is applied in a number of algorithms in
Section 12 and 15.7.
• Nederhof and Satta [1] have written a tutorial on tabular parsing,
applying it to a wide selection of non-deterministic parsing
algorithms.
[1] Nederhof, Mark-Jan and Satta, Giorio, Tabular parsing. In Formal Languages and Applications,
volume 148 of Fuzziness and Soft Computing, pages 529-549, Springer, April 2004.
```