Syntactic Pattern Recognition •Statistical PR: Find a feature vector x • Train a system using a set of labeled patterns • Classify unknown patterns •Ignores relational information contained in the structure •Most structural methods use hierarchical decomposition •Note similarity between a sentence structure and pattern description Picture A f A c b B a g C Rectangle C Triangle B e d edge edge edge a b c edge edge d e edge edge f g Language •Alphabet is a finite set of symbols, V={x1,x2, …,xn} •Sentence over B is a finite string of ordered symbols (left to right) from V •Example: V = {a,b,c}, valid sentences are “abb”, “abba”, “aaa”, null •Length of a sentence s, |s| is the number of symbols •s1os2 is the concatenation of the two sentences •VoVoV…oV = Vn is the set of all sentences with n symbols over V •V+=VUV2UV3…. is the set of all non-empty sentences over V •V* is the closure of V •Language is an arbitrary subset L of V* •Example: V={0,1}, then L1 = {001, 110, 111, 0, null} is a finite language • L2 = {s|s = 1n021m, n>=1, 1<=m<=10} is an infinite language Languages •L1oL2 = {s|s = s1s2, s1 belongs to L1 and s2 belongs to L2} is concatenation •L1it = {s|s = s1s2…sn, n>=0, si belongs to L1} is the iterate of L1 •L1oL2 and L1it are both languages •Example: V = {a,b} L1 = {aa,ab,bb} L2 = {a,b} •L1oL2 = {a3,aba,b2a,a2b,ab2,b3} •L1it is infinite; for n={0,1,2} , a 2 2 4 3 2 2 2 3 2 2 2 4 , ab , b , a , a b , a b , aba , abab , ab , b a , b ab , b } •s is called a sub-string of t if t =usv for some strings u,v belonging to V* •Every string is a substring of itself as u and/or v can be null Grammars •Grammar G = {VT, VN, P, S} has 4 entities • VT is a set of terminal symbols, called primitives or constants • VN is a set of non-terminal symbols, called variables • VT and VN belong to V; • VT V N , VT V N V P is the set of production rules A->B where A has at least one variable and B is a mix of variables and constants • S is the starting symbol or the root; S belongs to VN •L(G) is a formal language ( a set of strings) generated by the grammar G • Each string is composed of only primitives • Each string can be derived from S using the production rules P • Example: VT = {a,b}, VN = {S}; P = {S->aSb, S->ab} => L(G) : anbn, n>=1 •Grammar is used to : •(I) generate the strings (sentences) accepted by L(G), (ii) check if a sentence belongs to a grammar, (iii) analyze the structure of a sentences Grammar Types V ; V ; A , B V N ; a , b VT ; * UnRestricted Grammar (UR) 1 2 Context Sensitive Grammar (CS) A 1 2 Context Free Grammar (CF) A1 2 Finite State Grammar (FS) A 1 a , orA Example: VT = {a,b,c}; VN = {S, A, B} UR CS CF 1 B 2a FS S abA S aBbc S aAc S aA A c S ab S bB S bB S cB aB aAc S AB A B aA Ac abc A aB A a B Bb Acb A c B b B b B c Finite State Grammars, and Graphical Representations •Nodes are nonterminals in VN and an additional terminal node T not in V •Productions of type Ai->aAj represented by edge a directed from Ai to Aj •Productions of type Ai->a represented by edge a directed from Ai to T S aA a S bB A cB B a A c A S a B a a a T For a FS grammar G, an arbitrary string x=x1x2..xn, xi in VT is in L(G) iff there exists at least one path (x1,x2,..,xn) from S to T Syntactic Pattern Recognition C2-class problem C1 and C2 are composed of features from a set VT Let G be a grammar such that L(G) consists only of sentences (patterns) from C1 Example: VT = {a,b} VN = {S,A} P:{S->aSb S->b} L(G): {b; anbn+1, n>=1} Classification Rule x belongs to C1 iff x belongs to L(G) x belongs to C2 iff otherwise Classification algorithm has to correctly answer whether or not a given string is grammatically correct. Pattern Grammars 2-class problem: rectangles and other quadilaterals Select primitives: Set of rectangles: a: 0o edge b: 90o edge c: 180o edge d: 270o edge L { a 0 b 0 c 0 d 0 ; n , m 1, 2 ,3 ...} n m n m If a0, b0, c0, d0 represent unit length lines L {a b c ;1 n 3} n Consider, a: n n 0o horizontal unit length b: 120o unit length c: 240o unit length L(G) represents the class of equilateral triangles What is the grammar? Make it up from domain knowledge There is no unique solution FS Grammar solution VT = {a,b,c} VN = {S, A, B, C, D, E, F, G, H, I, J, K} S aA C bI H bK S aA S aC D bF I c A aD A aB F bJ J cI A aD E bG K cJ B aE G bH 2 2 a b c 2 D bF F bJ J cI I c CS Grammar solution VT = {a,b,c} P VN = {S, A, B, C, D, E, F} S aAF A aBF D bc A b B aEF C b A aDF E bD F c Syntax Analysis •Let x be the unknown pattern. Recognition task is finding L(Gi) such that x belongs to L(Gi) •i.e. Given a string x and a grammar G, construct a triangle with the top vertex S and the bottom side x inside which will be the derivation parse tree •Top-down and Bottom-up parsing methods can be used L i , G i 1 i m C i 1 i m S x Stochastic Languages Probabilities are associated with production rules- stochastic grammar Stochastic language is one obtained by such a grammar Probability of obtaining x is p ( x ) p ( P1 ) p ( P 2 | P1 ) p ( P n | P1 , P 2 , P n 1 ) Tree representations A string s1 is directly derived from string s2 in G ( s 1 s 2 ) if there exists a rule in G such that s1 is the result of replacing by . In general, s is derived from the initial symbol of G, S, if there exists a sequence of strings from which we can derive s G from S, i.e., S s Parsing is the reverse of generation

Descargar
# Slide 1