Regular Expressions and Non-regular Languages http://cis.k.hosei.ac.jp/~yukita/ Expressions and their values Arithmetic Expression Value (5 3) 4 32 expression ( 0 1) 0 Regular or expression Subexpress The language strings ( 0 1) 0 ions * * begining followed that consists of with 0 or 1 by an arbitrary number of 0s. 0, 1, 0 1, and 0 have values {0}, {1}, and * * {0,1}, and { 0} , respective ly. The value of the whole expression * is { 0 ,1} { 0} . 2 Definition 1.26 R is a regular expression if 1. R a , 2. R , 3. R , 4. R ( R1 R 2 ), where 5. R ( R1 R 2 ), where * 6. R ( R1 ), where R1 and R 2 are regular R1 and R 2 are regular R1 is a regular expression expression expression s, s, or . 3 The values of atomic expressions expression value a {a} { } {} In many real programmin alphabet written meaning The language that consists of only one string a . The language that consists of * only the empty string . The empty language * g languages, we must distinguis a and string a ; the former should * h be as ' a' while the latter " a". But we will not follow this convention . 4 Example 1.27 Let { 0 ,1} throughou t in the following examples. 1. 0 10 { w | w has exactly a single 1}. * * 2. 1 { w | w has at least one 1}. * * 3. 001 { w | w contains * * the string 001 as a substring }. 4. ( ) { w | w is a string of even length }. * 7. 0 0 1 1 0 1 { w | w starts and ends with the * * same symbol}. 8. ( 0 )1 01 1 . * * * 10. 1 . * 11. { }. * 5 Units for the binary operations R R shows that is the unit for the union operation. R R shows that is the unit for the concatenat ion operation. We denote by L ( R ) the language of R . R may not equal R . For example, In general, For example, if R 0 , then L ( R ) { 0} but L ( R ) { 0 , }. R R. if R 0 , then L ( R ) { 0} but L ( R ) . 6 Theorem 1.28 • A language is regular if and only if some regular expression describes it. We break down this theorem as follows. • Lemma 1.29 – If a language is described by a regular expression, then it is regular. • Lemma 1.32 – If a langulage is regular, then it is described by a regular expression. 7 Proof of Lemma 1.29 We shall convert R into an NFA N . a Case 1 : R a . Case 2 : R . Case 3 : R . Case 4 : R R1 R 2 . Case 5 : R R1 R 2 . Case 6 : R R1 Next three slides * Proof of Lemma 1.29 8 Case 4: Let N1, N2, and N correspond to R1, R2, and R, respectively. N N1 N2 Proof of Lemma 1.29 9 Case 5: Let N1, N2, and N correspond to R1, R2, and R, respectively. N2 N1 N Proof of Lemma 1.29 10 Case 6: Let N1 and N correspond to R1 and R, respectively. N N1 Proof of Lemma 1.29 11 Generalized Nondeterministic Finite Automaton • is roughly a NFA in which the transition arrows may have regular expressions as labels. • We assume the following standard form for convenience, which can always be attained with an easy modification. – There is only one accept state and different from the start state. – The start state has transition arrows going to every other state but no arrows coming in from any other state. – There is only a single accept state, and it has arrows coming in from any other state but no arrows going to any other state. – Except for the start and accept states, one arrow goes from every state to every other state and also from each state to itself. 12 Standard Form of GNFA ... 13 Standard Form of GNFA Q { q start } Q { q accept } is a disjont union. For each pair ( q i , q j ) in ({ q start } Q ) ( Q Q ) Q { q accept } there is one and only one directed arrow going from q i to q j . 14 Equivalent GNFA with one fewer state qi R4 R1 qj qi ( R1 )( R 2 ) * ( R 3 ) ( R 4 ) qj R3 qrip R2 15 Definition 1.33 A generalize d nondetermi nistic finite automaton ( Q , , , q start , q accept ), where R is the set of regular is 5 - tuple expression s, 1. Q is the finite set of states, 2. is the input alphabet, 3. : ( Q { q accept }) ( Q { q start }) R , 4. q start is the start state, and 5. q accept is the accept state. 16 Computation with GNFA A GNFA accepts a string w w1 w 2 w k with w i , * and a sequence * of states q 0 , q 1 , , q k exists such that 1. q 0 q start , 2. q k q accept , 3. for each i , we have w i L ( R i ), where R i ( q i 1 , q i ). 17 Converting GNFA Convert ( G ) : 1. Let k be the number of states of G . 2 . If k 2 , then return the regular expression appearing on the only arrow. 3. If k 2 , we select any state q rip Q { q start , q accept } and let G be the GNFA ( Q , , , q start , q accept ), where Q Q { q rip }, and for any q i Q { q accept } and q j Q { q start } let ( q i , q j ) ( R1 )( R 2 ) * ( R 3 ) ( R 4 ), for R1 ( q i , q rip ), R 2 ( q rip , q rip ), R 3 ( q rip , q j ), and R 4 ( q i , q j ). 4. Compute Convert ( G ) and return thi s value. 18 Claim 1.34 For any GNFA G, Convert(G) is equivalent to G. Basis(k 2) : Obvious. Proof. Induction step : Assume that the claim is true for k 1 states. Suppose that G accepts an input w . Then, in an accepting of the computatio n, G enters a sequence branch of states q start , q 1 , q 2 , q 3 , , q accept . If q rip { q start , q 1 , q 2 , q 3 , , q accept }, clearly G also accepts w . If q rip { q start , q 1 , q 2 , q 3 , , q accept }, removing consecutiv e q rip states forms an accepting The states q i and q j bracketing computatio n for G . a run have a new regular on the arrow between th em that describes in via each run of q rip on G . So G accepts w . all strings expression taking q i to q j 19 Proof continued For the other direction, suppose that G accepts an input w . As each arrow between any two states q i and q j in G describes the collection of strings taking q i and q j in G , either directly or via q rip , G must also accept w . Thus G and G are equivalent The induction itself hypothesis the algorithm recursivel y on input G , the result is a regular that is equivalent regular states that when expression . calls expression to G because G has k 1 states. Hence the also is equivalent to G , and the algorithm is proved correct. 20 Non-regularity B , C , and D seem to require machines to recognize with infinite number of states them. B { 0 1 | n 0} n n C { w | w has an equal number of 0s and 1s} D { w | w has an equal number of occurrence s of 01 and 10 as substrings }. B and C will turn out to be nonregular while D regular. See, problem 1.41. 21 Theorem 1.37 Pumping Lemma If A is a regular (the pumping language, length) then s can be divided s xyz , satisfying then ther e is a number p where, if s A with | s | p , into three pieces, the following conditions : 1. for each i 0 , xy z A , i 2. | y | 0 , and 3. | xy | p . 22 Proof of Th 1.37 s s1 s 2 s k sl s n q1 q 2 q k q k q a ( s1 s 2 )( s k s l )( s n ) xyz where q a is an accepting By the pigeonhole principle, we can take the pumping as the number state. length of states plus one. 23 Example 1.38 Claim : B { 0 1 | n 0} is not regular. n n Proof. Assume the contrary t hat B is regular. Let p be the pumping length. Then, s 0 1 can be decomposed p p as s xyz with | y | 1, and xy z B for any n 0 . n There can be three cases y 0 , y 0 1 , and y 1 , for some k nonzero k l l k , l . In each case, we can eazily see that xy z B , which n leads to contradict ion. ( n 2 ) 24 Example 1.39 Claim : C { w | w has an equal number of 0s and 1s } is not regular. Proof. Assume the contrary t hat C is regular. Let p be the pumping length. Then, s 0 1 can be decomposed p p as s xyz with | y | 1 and | xy | p , and xy z C for any n 0 . n Then we must have y 0 for some nonzero k k. We can eazily see that xy z C , which contradict s the assumption n . 25 Alternative proof of 1.39 The class of regular operation. languages is closed under the This is eazy to prove if we run two accept only strings itnersecti on DFAs parallely and which are accepted by both of the DFAs. Now, assume C is regular. Then, This contradict s what we C 0 1 B is also regular. * * proved in Example 1.38. 26 Example 1.40 Claim : F { ww | w { 0 ,1} } is not regular. * Proof. Assume the contrary t hat F is regular. Let p be the pumping length and let s 0 10 1 F . This s can be split p p into pieces like s xyz with | y | 1 and | xy | p , and xy z F for any n 0 . n Then we must have y 0 for some nonzero k k. We can eazily see that xy z F , which contradict s the assumption n . 27 Example 1.41 Unary Language Claim : D {1 Proof. Assume n 2 | n 0} is not regular. the contrary t hat D is regular. Let p be the pumping length and let s 1 p 2 D . This s can be split into pieces like s xyz with | y | 1 and | xy | p , and xy z D n for any n 0 . n The length of xy z grows linearly w of strings ith n , while the lengths in D grows as 0 ,1, 4 , 9 ,16 , 25 , 36 , 49 , These two facts are incompatib le as can be easily seen. 28 Example 1.42 Pumping Down Claim : E { 0 1 | i j } is not regular. i Proof. Assume j the contrary t hat E is regular. Let p be the pumping length and let s 0 p 1 p 1 . This s can be split into s xyz with | y | 1 and | xy | p , and xy z E for any n 0 . n Then we must have y 0 for some nonzero k k. We can eazily see that xy z xz E , which contradict s the assumption 0 . 29 Problem 1.41 Differential Encoding Claim : D { w | w contains substrings equal number of occurrence s of the 01 and 10 } is regular. 0 1 1 0 0xx0 0xx1 0 qstart 0 1 1xx1 1xx0 1 1 0 30

Descargar
# Regular Expressions and Non