Tree Automata
First: A reminder on Automata on
words
Typing semistructured data
Finite state automata on words
( , Q , q0 , F ,  )
Transitions
 :   Q  P (Q )
Alphabet
State
Initial state
q0  Q
Accepting states
F Q
Typing semistructured data
Nondeterministic automaton: Example
  a , q 0   q 0 , q 1 
  a , b 
 b , q 1   q 0 
Q  q 0 , q 1 , q 2 , q 3 
   , q 1   q 2 
F  q 2 
   , q 2   q 3 
   , q 3   q 3 
a
q0
b
q0
q1
a
q0
a
b
q0
q0
q1
q1
a
q0
q0
KO
b
q0
q1
a
q0
q0
q1
q2
OK
Reminder
• Deterministic
   , q   q 0 
 a , q 0   q 0 , q1 
– No  transition
– No alternative transitions such as
• Determinization
– It is possible to obtain an equivalent deterministic automaton
– State of new automaton = set of states of the original one
– Possible exponential blow-up
• Minimization
• Limitations – cannot do
– Context-free languages
a
n
• Essential tool – e.g., lexical analysis
b ,n Ν
n

Reminder (2)
•
•
•
•
L(A) = set of words accepted by automata A
Regular languages
Can be described by regular expressions, e.g. a(b+c)*d
Closed under complement
 *  L ( A)
• Closed under union, intersection
L ( A)  L ( B )
L ( A)  L ( B )
– Product automata with states (s,s’)
where s is from A and s’ is from A’
Automata on words versus trees
a
Left to right
a
b
b
a
Right to left
B
o
t
t
o
m
u
p
b
b
b
b
a
a
No difference
a
T
o
p
d
o
w
n
b
Differences
Automata
Automata on ranked trees
Typing semistructured data
Binary tree automata
• Parallel evaluation
( , Q , F ,  )
a
• For leaves:
 :   P (Q )
• For other nodes:
 :   Q  Q  P (Q )
B
o
t
t
o
m
u
p
q”
q1
b
b
q’
b
b
a
q
a
q
Typing semistructured data
q2
a
q”
q
b
q’
Bottom-up tree automata
  a , q , q '   r , r '
• Bottom-up: if a node labeled a has its children in
states q, q’ then the node moves
nondeterministically to state r or r’
• Accepts is the root is in some state in F
• Not deterministic if alternatives or -transitions:
  a , q , q '   { r , r ' }    , r   r '
Example: deterministic bottom-up
 1 0   q 0 
 1 1  q1 
  0 ,1,  ,  
Q  q 0 , q 1 
F  q 1 
 2   , q 1 , q 1   q 1 
 2   , q 0 , q 0 ,  2   , q 1 , q 0 ,  2   , q 0 , q 1   q 0 
 2  , q 0 , q 0   q 0 
 2  , q 1 , q 1 ,  2  , q 1 , q 0 ,  2  , q 0 , q 1   q 1 
Boolean circuit evaluation
q0
v
1
q1
1
0
q1
q1
 1 1  q1 
v
1
q1
0
q0
q1
OK
 1 0   q 0 
1
 2   , q 1 , q 1   q 1 
q1
v
q1
v
v
v
q1
q1
 2   , q 0 , q 0   q 0 
 2   , q 1 , q 0   q 0 
q1
 2   , q 0 , q 1   q 0 
1
 2  , q 1 , q 1   q 1 
q1
 2  , q 0 , q 0   q 0 
 2  , q 1 , q 0   q 1 
 2  , q 0 , q 1   q 1 
Regular tree language = set of trees
accepted by a bottom-up tree
automaton
Typing semistructured data
Regular tree languages
Theorem: the following are equivalent
– L is a regular tree language
– L is accepted by a nondeterministic bottom-up
automaton
– L is accepted by a deterministic bottom-up
automaton
– L is accepted by a nondeterministic top-down
automaton
Deterministic top-down is weaker
Top-down tree automata
 a , q "  
 q , q ' 
• Top-down: if a node labeled a is in state q”,
then its left child moves to state q, right to q’
• Accepts is all leaves are in states in F
• Not deterministic if
 a , q "  
 q , q ' ,  r , r ' 
Why deterministic
top-down is weaker?
• Consider the language
– L = { <r> <a\>,<b\> <\r>, <r> <b\>,<a\><\r>) }
• It can be accepted by a bottom-up TA
– Exercise: write a BUTA A such that L = L(A)
• Suppose that B is a deterministic top-down TA that
accepts both trees in L
– Exercise: Show that B also accepts <r> <a\><a\> <\r>
– A contradiction
Fact: No deterministic top-down tree automata accepts
exactly L
Ranked trees automata: Properties
•
•
•
•
Like for words
Determinization
Minimization
Closed under
– Complement
– Intersection
– Union
But…
• XML documents are unranked:
book (intro,section*,conclusion)
Automata
Automata on unranked tree
Typing semistructured data
Unranked tree automata
 2   , t   t 
 2  , f    f 
 2  , t   t 
 2  , f    f 
 2   , t , t   t 
 2   , t , t , t   t ...
 2  , t , f    f 
 2  , t , f   t 
 2   , f , t    f ...
 2  , f , t   t ...
 2  , f , f    f 
 2  , f , f , f    f ...
Issue: represent an infinite set of transitions
Solution: a regular language
Unranked tree automata (2)




a
,
L
(
Q
)

r
1,..., r m 
• Rule:
• Meaning: if the states of the children of some
node labeled a form a word in L(Q), this node
moves to some state in {r1,…,rm}
 2   , And 1   t 
where
 2   , And 0    f 
 2  , Or 1   t 
 2  , Or 0    f 
where
where
where
And 1  t 
And 0  ( t  f ) * f ( t  f ) *
Or 1  ( t  f ) * t ( t  f ) *
Or 0  f 
Building on ranked trees
a
a
b
b
a
b
b
b
a
b
b
a
b
b
b
a
b
b
Ranked tree: FirstChild-NextSibling
F: encoding into a ranked tree
F is a bijection
F-1: decoding
Building on
bottom-up ranked trees (2)
• For each Unranked TA A, there is a Ranked TA
accepting F(L(A))
• For each Ranked TA A, there is an unranked TA
accepting F-1(L(A))
• Both are easy to construct
Consequence: Unranked TA are closed under union,
intersection, complement
Determinaztaion also possible, a bit more tricky
Descargar

Tree Automata - The Blavatnik School of Computer Science