PODS 2011 2011 ACM SIGMOD/PODS Conference Athens, Greece Finding a Minimal Tree Pattern Under Neighborhood Constraints Benny Kimelfeld Yehoshua Sagiv IBM Research – Almaden The Hebrew University of Jerusalem carrey williams DB (Graph) Search widom ullman search search XKeyword [Balmin et al.] (XML) BANKS [Bhalotia & al.] (rel.) eu brussels search [Achiezra, K, S] (XML) DBXploer [Agrawal et al.] DISCOVER [Hristidis et al.] SPARK [Lou et al.] Blinks [He et al.] RDF search [Tran et al.] … 2 DB Search: 2 Approaches graph algorithm DB data graph e.g., top-k Steiner-trees search results query1 query2 schema query gen. query3 DB query engine SQL, XQuery, … search results3 Aided Query Formulation ExQueX [K et al.] QUICK [Zenz et al.] 4 Building Integration Forms Q System [Talukdar et al.] concepts (as keywords) 5 The Problem (Informal) employee schema: eid name type dept dept id name head (Mary’s department SELECT * FROM employee e, dept d WHERE e.name=‘Mary’ and e.dept=d.id { Mary, dept } { Mary, manager } (managers named Mary) SELECT * FROM employee e WHERE e.name=‘Mary’ AND e.type=‘manager’ (Mary’s manager) SELECT * FROM employee e1, employee e2, dept d WHERE e1.name=‘Mary’ AND e1.dept=d.id AND d.head=e2.eid AND e2.type=‘manager’ { Jacob, Mary, manager } { dept, dept } { Mary, dept, dept } bags of labels … 6 And in XML DTD <!ELEMENT department (facility|(branch*,headq))> <!ELEMENT branch (facility)> ... { facility, headq } department[branch/facility][headq] department { facility, facility } branch headq facility department branch branch facility facility department[branch/facility][branch/facility] 7 Techniques for Query Extraction incremental query construction (candidate networks) subtree enumeration on the schema graph • Discover/XKeyword [Balmin et al.] • DBXploer [Agrawal et al.] • SPARK [Lou et al.] • ExQueX [K et al.] • QUICK [Zenz et al.] • Q System [Talukdar et al.] • Acyclic queries ─ tree patterns • Ranking dominated by size: smaller = better 8 Example of Query Extraction employee eid name dept type dept id e22 Marie manager a a db e22 e68 Jacob b b sales e78 regular name head regular employee regular employee regular employee regular dept manager dept employee dept regular employee employee manager regular manager manager employee manager dept employee manager 9 Neighborhood Constraints employee dept dept regular manager dept #≤1 #≤1 #=1 , ( regular ? | ( manager , dept* ) ) possible neighbors of employee dept employee employee #=1 possible neighbors of dept 10 Efficiency Matters! Existing solutions usually experiment on tiny schemas; an exception: Q System [Talukdar et al, VLDB’08] • 408 relations • 1366 table references w/o taking constraints into account, larger K needed! Speed is important since patterns are typically extracted following a user query… Schemas can be much larger, e.g., SAP: …company data model with over 13,000 database tables. To manage the meta data (e.g., types and interrelationships) of these tables… Kemper et al., “Performance tuning for SAP R/3”, IEEE Data Eng. Bull., 1999. 11 Techniques for Query Extraction incremental query construction (candidate networks) subtree enumeration on the schema graph • Discover/XKeyword [Balmin et al.] • DBXploer [Agrawal et al.] • SPARK [Lou et al.] • ExQueX [K et al.] • QUICK [Zenz et al.] • Q System [Talukdar et al.] No guarantee on the running time Can construct exponentially many intermediate, partial patterns, before finding even 1 tree pattern Repeated labels are not allowed employee regular dept employee manager No neighborhood constraints 12 In the Paper Finding a Minimal Tree Pattern Under Neighborhood Constraints • 1st provably efficient algorithms allowing expressive constraints (and repeated labels) • Future work: extending to top-k – In the paper: subtle issues in defining top-k • Existing techniques for top-k repeatedly solve top-1, w/ additional constraints – – – – Shortest simple paths [Yen, 1971] Smallest Steiner trees [K & S, 2006] Most probable answers over Markov sequences [K & Ré, 2010] … 13 Abstraction: Schema (undirected) finite set of labels label1 label2 bag-of-labels1 bag-of-labels2 bag-of-labels3 bag-of-labels17 bag-of-labels18 bag-of-labels19 .. . .. . ... labeln bag-of-labels54 bag-of-labels55 bag-of-labels56 .. . neighborhood constrains each constraint is a (possibly infinite) set of bags, in some rep. language 14 Problem Definition (undirected model) label1 label2 bag-of-labels1 bag-of-labels2 bag-of-labels3 bag-of-labels17 bag-of-labels18 bag-of-labels19 .. . ... .. . schema S ... labeln input weights allowed bag Λ of labels bag-of-labels54 bag-of-labels55 bag-of-labels56 e.g., { Mary, manager}, {country, country, border} .. . goal: minimal tree T, s.t. • T contains Λ • T is consistent with S ∀ nodes v∈T: labels( nbrs(v) ) ⊆ some bag-of-labelsi of those of label(v) v label2 ⊆ bag-of-labels17 bag-of-labels18 bag-of-labels19 .. . 15 label1 label2 labeln bag-of-labels bag-of-labels11 bag-of-labels bag-of-labels22 bag-of-labels bag-of-labels3 bag-of-labels bag-of-labels1717 bag-of-labels bag-of-labels1818 bag-of-labels bag-of-labels19 bag-of-labels bag-of-labels5454 bag-of-labels bag-of-labels5555 bag-of-labels bag-of-labels56 Simple Example publication #=1 title ... ... #≤1 journal ... ... ... 19 conference#≤1 VLDB publication ... ... 3 PODS 56 author author journal conference publication publication VLDB PODS conference conference VLDB PODS publication conference conference VLDB PODS 16 Complexity Measures input: Λ (bag of labels) S (schema) output: min T s.t. Λ ⊆ T & T is consistent w/ S (constraints in rep. language ) NP-hard, already under trivial rep. languages… (even to approx.) But Λ is typically tiny! derived from a user-phrased (search) query fixed |Λ| “efficient” = e.g., O(|S||Λ|) possible under a very general condition on the rep. language (not discussed, see paper) even better: FPT, i.e. Fixed-Parameter Tractable “efficient” = e.g., O(2|Λ||S|2) next slides 17 Rep: Mutual-Exclusion Graphs #≤1 journal publication author #≤5 conference#≤1 interval graph #≤1 conf-date title #= 1 edition #≤1 Theorem: not FPT (under standard assumptions; FPT reduction from parameterized Independent-Set, which is W[1]-hard [Abrahamson et al., 93]) Theorem: FPT for classes of mux graphs: disjoint cliques more generally, interval graphs more generally, circular-arc graphs 18 Rep: Regular Expressions e := label | ε | e* | e? | e,e | e|e publication title, author*, (journal | (conference, conf-date) | edition) Theorem: FPT 19 Proof Strategy (Algorithm) input: Λ (bag of labels) S (schema) output: min T s.t. Λ ⊆ T & T is consistent w/ S (nontrivial) adaptation of Dreyfus & Wagner’s algorithm for Steiner trees reduction labeled bag cover generalized minimum set cover: bags instead of sets, cover needs to satisfy a neighborhood constraint dynamic prog. for disjoint-cliques mux graphs works for interval graphs regular expressions Thanks: Ryan & Virginia Williams circular-arc graphs 20 Summary Questions? • Plethora of tools extract minimal tree patterns from the schema (relational / XML /…) – Existing solutions: exptime and/or no repeated labels; not allowing even basic neighborhood constraints • Simple & general abstraction of the problem – allowing neighborhood constraints • Algorithms – In particular, our FPT alg.s find a min pattern under 2 constraint languages: regexp & mutual exclusion • Next step: based on this work, top-k 21

Descargar
# Document