Infomaster: An Information Integration Tool
O. M. Duschka and M. R. Genesereth
Presentation by Cui Tao

Introduction
Huge amount of information online:
– Distribution: not every query can be answered by the data in a single database
  • Fragmentation: horizontal, vertical
– Heterogeneity
  • Notational heterogeneity: different access languages and protocols (parsing HTML, SQL, OQL, Z39.50)
  • Conceptual heterogeneity: semantic mismatches
– Instability

Intelligent agents
– Search and find desired information
– Convert formats
– Translate between different contexts
– Etc.
– Not feasible yet
– Considerable research in ontologies and natural language understanding is still required

Infomaster: an information integration tool
– Provide integrated access
– Manage evolving information sources
– Add new information sources
– Remove outdated information sources

Architecture

Tested Application Areas
Newspaper classifieds
– Provide a uniform search interface
– Gather corresponding classifieds from all relevant newspapers
Product catalogs
– Provide terminology translation
Campus databases

Abstraction Hierarchy

Descriptions of Relationships
Interface relations and site relations are described in terms of base relations.
– Interface relation vs. base relation (figure: Interface, Base)
– Site relation vs. base relation (figure: Site, Base, Base)

Query Processing
Example: BMWs built in 1996 that are for sale at a price below their average market value.
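To make the example query concrete, here is a minimal Python sketch of what the query asks for, evaluated over in-memory tuples. The relation names `CarAd` and `MarketValue`, their fields, and the sample rows are hypothetical and are not taken from Infomaster; they only illustrate the selection and join that the query involves.

```python
from typing import NamedTuple


class CarAd(NamedTuple):
    make: str
    model: str
    year: int
    price: int


class MarketValue(NamedTuple):
    make: str
    model: str
    year: int
    average_value: int


def below_market_bmws(ads, values, year=1996):
    """Return ads for BMWs of the given year priced below average market value."""
    # Index market values by (make, model, year) so the join is a dict lookup.
    value_of = {(v.make, v.model, v.year): v.average_value for v in values}
    result = []
    for ad in ads:
        key = (ad.make, ad.model, ad.year)
        # Selection: make and year; join: a market value must exist for the car.
        if ad.make == "BMW" and ad.year == year and key in value_of:
            if ad.price < value_of[key]:
                result.append(ad)
    return result


# Hypothetical sample data, invented for illustration only.
ads = [
    CarAd("BMW", "318i", 1996, 18000),
    CarAd("BMW", "318i", 1996, 24000),
    CarAd("BMW", "528i", 1995, 20000),
    CarAd("Honda", "Civic", 1996, 9000),
]
values = [
    MarketValue("BMW", "318i", 1996, 21000),
    MarketValue("BMW", "528i", 1995, 26000),
    MarketValue("Honda", "Civic", 1996, 11000),
]

matches = below_market_bmws(ads, values)
```

In an integration setting the two inputs would typically come from different sources, which is why the query cannot be answered from a single database.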

Reduction: interface relations to base relations
– Simple: User's query --- Interface relation --- Base relation
– Example rewritten query:

Abduction: base relations to site relations
– Site relations are expressed in terms of base relations, but not vice versa
– Query rewriting problem: answering queries using views
– Abduction: uses a standard model elimination theorem prover
– : The set of all descriptions of the site relations
– : A set of site relations
– : The rewritten user query after the reduction step
– The example query plans:

Optimization
Assume: all ads in sjmn are in sfc

Conclusions
The first integration system:
– Arbitrary positive relational algebra user queries
– DB description
Efficient optimization by use of:
– Integrity constraints
– Local completeness information
Flexible use of query planning:
– Expressive description language
– Constraints
– Background theories

Related Work
Information Manifold project and SIMS project:
– Explore the use of description logics for describing information sources
Occam project
– Uses general AI planning techniques to generate information gathering plans
TSIMMIS project
– Uses pattern matching techniques to match user queries against predefined queries
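
The Optimization slide assumes that all ads in sjmn are also in sfc. One way to read this is that a plan which unions results from both sources can skip sjmn without losing answers. The following is a simplified Python sketch of that idea, not Infomaster's actual optimizer: the function name `prune_redundant_sources` and the `contained_in` dictionary are hypothetical, and a plan is reduced here to a plain list of source names whose results are unioned.

```python
def prune_redundant_sources(plan_sources, contained_in):
    """Drop sources whose contents are covered by another source in the plan.

    plan_sources: list of source names whose results the plan unions.
    contained_in: dict mapping a source name to the set of sources assumed
                  to contain all of its tuples (hypothetical completeness facts).
    """
    remaining = list(plan_sources)
    for source in plan_sources:
        covers = contained_in.get(source, set())
        # Only sources still in the plan count as covers, so two sources
        # that contain each other are never both dropped.
        if any(other != source and other in remaining for other in covers):
            remaining.remove(source)
    return remaining


# The assumption from the slide: every ad in sjmn is also in sfc.
pruned = prune_redundant_sources(["sjmn", "sfc"], {"sjmn": {"sfc"}})
```

The result is always safe in the sense that every dropped source is covered by one that is kept, but it is only guaranteed to be minimal when the containment facts are transitively closed.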