```Regular Expressions and
Non-regular Languages
http://cis.k.hosei.ac.jp/~yukita/
Expressions and their values
Arithmetic
Expression
Value
(5  3)  4
32
expression
( 0  1)  0
Regular
or
expression
Subexpress
The language
strings
( 0  1) 0
ions
*
*
begining
followed
that consists
of
with 0 or 1
by an arbitrary
number
of 0s.
0, 1, 0  1, and 0 have values {0}, {1}, and
*
*
{0,1}, and { 0} , respective ly. The value of the whole expression
*
is { 0 ,1}  { 0} .
2
Definition 1.26
R is a regular
expression
if
1. R  a   ,
2. R   ,
3. R   ,
4. R  ( R1  R 2 ), where
5. R  ( R1  R 2 ), where
*
6. R  ( R1 ), where
R1 and R 2 are regular
R1 and R 2 are regular
R1 is a regular
expression
expression
expression
s,
s, or
.
3
The values of atomic expressions
expression
value
a
{a}

{ }

  {}
In many real programmin
alphabet
written
meaning
The language that consists
of
only one string a   .
The language that consists
of
*
only the empty string    .
The empty language
*
g languages,
we must distinguis
a   and string a   ; the former should
*
h
be
as ' a' while the latter " a". But we will not follow
this convention
.
4
Example 1.27
Let   { 0 ,1} throughou
t in the following
examples.
1. 0 10  { w | w has exactly a single 1}.
*
*
2.  1  { w | w has at least one 1}.
*
*
3.  001   { w | w contains
*
*
the string 001 as a substring }.
4. (  )  { w | w is a string of even length }.
*
7. 0  0  1 1  0  1  { w | w starts and ends with the
*
*
same symbol}.
8. ( 0   )1  01  1 .
*
*
*
10. 1    .
*
11.   { }.
*
5
Units for the binary operations
R    R shows that  is the unit for the union operation.
R   R
shows that  is the unit for the concatenat ion operation.
We denote by L ( R ) the language
of R .
R   may not equal R .
For example,
In general,
For example,
if R  0 , then L ( R )  { 0} but L ( R   )  { 0 ,  }.
R    R.
if R  0 , then L ( R )  { 0} but L ( R   )   .
6
Theorem 1.28
• A language is regular if and only if some
regular expression describes it.
We break down this theorem as follows.
• Lemma 1.29
– If a language is described by a regular
expression, then it is regular.
• Lemma 1.32
– If a langulage is regular, then it is described
by a regular expression.
7
Proof of Lemma 1.29
We shall convert
R into an NFA N .
a
Case 1 : R  a .
Case 2 : R   .
Case 3 : R   .
Case 4 : R  R1  R 2 .
Case 5 : R  R1  R 2 .
Case 6 : R  R1
Next three slides
*
Proof of Lemma 1.29
8
Case 4: Let N1, N2, and N correspond to R1,
R2, and R, respectively.
N
N1


N2
Proof of Lemma 1.29
9
Case 5: Let N1, N2, and N correspond to R1,
R2, and R, respectively.
N2
N1

N

Proof of Lemma 1.29
10
Case 6: Let N1 and N correspond to R1 and
R, respectively.
N

N1


Proof of Lemma 1.29
11
Generalized Nondeterministic
Finite Automaton
• is roughly a NFA in which the transition arrows may have
regular expressions as labels.
• We assume the following standard form for convenience,
which can always be attained with an easy modification.
– There is only one accept state and different from the start state.
– The start state has transition arrows going to every other state
but no arrows coming in from any other state.
– There is only a single accept state, and it has arrows coming in
from any other state but no arrows going to any other state.
– Except for the start and accept states, one arrow goes from
every state to every other state and also from each state to itself.
12
Standard Form of GNFA
...
13
Standard Form of GNFA
Q  { q start }  Q   { q accept }
is a disjont
union.
For each pair ( q i , q j ) in ({ q start }  Q  )  ( Q   Q  )  Q   { q accept }
there is one and only one directed arrow going from q i to q j .
14
Equivalent GNFA with one fewer state
qi
R4
R1
qj
qi
( R1 )( R 2 ) * ( R 3 )  ( R 4 )
qj
R3
qrip
R2
15
Definition 1.33
A generalize
d nondetermi
nistic finite automaton
( Q ,  ,  , q start , q accept ), where R is the set of regular
is 5 - tuple
expression
s,
1. Q is the finite set of states,
2.  is the input alphabet,
3.  : ( Q  { q accept })  ( Q  { q start })  R ,
4. q start is the start state, and
5. q accept is the accept state.
16
Computation with GNFA
A GNFA accepts a string w  w1 w 2  w k   with w i   ,
*
and a sequence
*
of states q 0 , q 1 ,  , q k exists such that
1. q 0  q start ,
2. q k  q accept ,
3. for each i , we have w i  L ( R i ), where R i   ( q i 1 , q i ).
17
Converting GNFA
Convert ( G ) :
1. Let k be the number
of states of G .
2 . If k  2 , then return the
regular
expression
appearing
on the
only arrow.
3. If k  2 , we select any state q rip  Q  { q start , q accept } and let
G  be the GNFA ( Q ,  ,  , q start , q accept ), where Q   Q  { q rip },
and for any q i  Q   { q accept } and q j  Q   { q start } let
  ( q i , q j )  ( R1 )( R 2 ) * ( R 3 )  ( R 4 ),
for R1   ( q i , q rip ), R 2   ( q rip , q rip ), R 3   ( q rip , q j ), and R 4   ( q i , q j ).
4. Compute
Convert ( G  ) and return thi s value.
18
Claim 1.34 For any GNFA G, Convert(G) is
equivalent to G.
Basis(k  2) : Obvious.
Proof.
Induction
step : Assume
that the
claim is true for k  1 states.
Suppose that G accepts an input w . Then, in an accepting
of the computatio
n, G enters a sequence
branch
of states
q start , q 1 , q 2 , q 3 ,  , q accept .
If q rip  { q start , q 1 , q 2 , q 3 ,  , q accept }, clearly G  also accepts w .
If q rip  { q start , q 1 , q 2 , q 3 ,  , q accept }, removing
consecutiv
e q rip states forms an accepting
The states q i and q j bracketing
computatio
n for G .
a run have a new regular
on the arrow between th em that describes
in via
each run of
q rip on G . So G  accepts w .
all strings
expression
taking q i to q j
19
Proof continued
For the other direction,
suppose that G  accepts an input w .
As each arrow between any two states q i and q j in G  describes
the collection
of strings
taking q i and q j in G , either directly
or via q rip , G must also accept w . Thus G and G  are equivalent
The induction
itself
hypothesis
the algorithm
recursivel y on input G , the result is a regular
that is equivalent
regular
states that when
expression
.
calls
expression
to G  because G  has k  1 states. Hence the
also is equivalent
to G , and the algorithm
is proved correct.
20
Non-regularity
B , C , and D seem to require machines
to recognize
with infinite
number
of states
them.
B  { 0 1 | n  0}
n n
C  { w | w has an equal number
of 0s and 1s}
D  { w | w has an equal number
of occurrence s of
01 and 10 as substrings
}.
B and C will turn out to be nonregular
while
D regular.
See, problem 1.41.
21
Theorem 1.37 Pumping Lemma
If A is a regular
(the pumping
language,
length)
then s can be divided
s  xyz , satisfying
then ther e is a number
p
where, if s  A with | s | p ,
into three pieces,
the following
conditions
:
1. for each i  0 , xy z  A ,
i
2. | y | 0 , and
3. | xy | p .
22
Proof of Th 1.37
s  s1 s 2 
s k  sl  s n
  


q1 q 2  q k  q k  q a
 ( s1 s 2  )( s k  s l )(  s n )  xyz
where
q a is an accepting
By the pigeonhole
principle,
we can take the pumping
as the number
state.
length
of states plus one.
23
Example 1.38
Claim : B  { 0 1 | n  0} is not regular.
n n
Proof.
Assume
the contrary t hat B is regular.
Let p be the pumping
length. Then, s  0 1 can be decomposed
p
p
as s  xyz with | y | 1, and xy z  B for any n  0 .
n
There can be three cases y  0 , y  0 1 , and y  1 , for some
k
nonzero
k l
l
k , l . In each case, we can eazily see that xy z  B , which
n
24
Example 1.39
Claim : C  { w | w has an equal number
of 0s and 1s }
is not regular.
Proof.
Assume
the contrary t hat C is regular.
Let p be the pumping
length. Then, s  0 1 can be decomposed
p
p
as s  xyz with | y | 1 and | xy | p , and xy z  C for any n  0 .
n
Then we
must have y  0 for some nonzero
k
k.
We can eazily see that xy z  C , which contradict s the assumption
n
.
25
Alternative proof of 1.39
The class of regular
operation.
languages
is closed under the
This is eazy to prove if we run two
accept only strings
itnersecti on
DFAs parallely
and
which are accepted by both of the DFAs.
Now, assume C is regular. Then,
C  0 1  B is also regular.
* *
proved in Example
1.38.
26
Example 1.40
Claim : F  { ww | w  { 0 ,1} } is not regular.
*
Proof.
Assume
the contrary t hat F is regular.
Let p be the pumping
length and let s  0 10 1  F . This s can be split
p
p
into pieces like s  xyz with | y | 1 and | xy | p , and xy z  F for any n  0 .
n
Then we
must have y  0 for some nonzero
k
k.
We can eazily see that xy z  F , which contradict s the assumption
n
.
27
Example 1.41 Unary Language
Claim : D  {1
Proof.
Assume
n
2
| n  0} is not regular.
the contrary t hat D is regular.
Let p be the pumping
length and let s  1
p
2
 D . This s can be split
into pieces like s  xyz with | y | 1 and | xy | p , and xy z  D
n
for any n  0 .
n
The length of xy z grows linearly w
of strings
ith n , while the lengths
in D grows as 0 ,1, 4 , 9 ,16 , 25 , 36 , 49 , 
These two facts are incompatib
le as can be easily seen.
28
Example 1.42 Pumping Down
Claim : E  { 0 1 | i  j } is not regular.
i
Proof.
Assume
j
the contrary t hat E is regular.
Let p be the pumping
length and let s  0
p 1 p
1 . This s can be split
into s  xyz with | y | 1 and | xy | p , and xy z  E for any n  0 .
n
Then we
must have y  0 for some nonzero
k
k.
We can eazily see that xy z  xz  E , which contradict s the assumption
0
.
29
Problem 1.41 Differential Encoding
Claim : D  { w | w contains
substrings
equal number
of occurrence s of the
01 and 10 } is regular.
0
1
1
0
0xx0
0xx1
0
qstart
0
1
1xx1
1xx0
1
1
0
30
```