Finding knowledge, data and answers on the Semantic Web
Tim Finin
University of Maryland, Baltimore County
http://ebiquity.umbc.edu/resource/html/id/223/
Joint work with Li Ding, Anupam Joshi, Cynthia Parr,
Joel Sachs, Andriy Parafiynyk and Lushan Han
UMBC
an Honors University in Maryland
http://creativecommons.org/licenses/by-nc-sa/2.0/
This work was partially supported by DARPA contract F30602-97-1-0215 and NSF grants CCR007080 and IIS9875433
1
This talk
• Motivation
• Semantic Web background
• Swoogle Semantic Web search engine
• Use cases and applications
• Social Semantic Web
• Conclusions
2
Google has made us smarter
3
But what about our agents?
[Figure: agent diagram with “tell” and “register” messages]
Agents still have a very minimal
understanding of text and images.
4
But what about our agents?
[Figure: agent diagram with “tell” and “register” messages; every agent is labeled “Swoogle”]
A Google for knowledge on the Semantic Web
is needed by software agents and programs
5
6
Brief history of the Semantic Web
Tim Berners-Lee’s original 1989 WWW proposal described a web of relationships among named objects, unifying many information management tasks.
• Guha’s MCF (~94)
• XML+MCF=>RDF (~96)
• Semantic Web coined (~97)
• RDF+OO=>RDFS (~99)
• RDFS+KR=>DAML+OIL (00)
• W3C’s SW activity (01)
• W3C’s OWL (03)
• SPARQL (06)
• Rules, RDFa, ….
http://www.w3.org/History/1989/proposal.html
7
Interest is high
• Interest in industry, government and VCs is
high
• RDF is in Adobe’s products, Oracle 10g and
11g, Microsoft Vista, and Yahoo’s food portal
• Several high-visibility startups use RDF
– Joost (internet TV), Teranode (Bioinformatics),
Garlik (personal info monitoring)
• And, if you want more evidence that interest
is high …
8
$1795
$695
CD Only
9
What do we mean by “Semantic Web”?
[Diagram labels: Semantic Web; PowerSet; explicit semantics; “NLP”; Folksonomies; XML; Tags; “a smarter Google”; KR based; topic maps; ad hoc approaches; Microformats; RDF+OWL; other structured; Google Base; Freebase]
10
RDF is the first SW language
RDF Data Model
• Graph: good for human viewing
• XML encoding: good for machine processing
<rdf:RDF ……..>
<….>
<….>
</rdf:RDF>
• Triples: good for reasoning
stmt(docInst, rdf_type, Document)
stmt(personInst, rdf_type, Person)
stmt(inroomInst, rdf_type, InRoom)
stmt(personInst, holding, docInst)
stmt(inroomInst, person, personInst)
• RDF is a simple language for building graph-based representations
• Grounded in web standards
• With terms to support ontologies, description logic, rules and much of first-order logic
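The triple view on this slide can be sketched in a few lines of Python. This is only an illustration: the `match` helper and the use of `None` as a wildcard are assumptions made for this sketch, not part of RDF or of any RDF library.

```python
# The stmt(...) triples from the slide as (subject, predicate, object) tuples.
TRIPLES = [
    ("docInst", "rdf_type", "Document"),
    ("personInst", "rdf_type", "Person"),
    ("inroomInst", "rdf_type", "InRoom"),
    ("personInst", "holding", "docInst"),
    ("inroomInst", "person", "personInst"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every non-None field.

    None acts as a wildcard (an assumption of this sketch).
    """
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# What is personInst holding?
held = [o for (_, _, o) in match(TRIPLES, s="personInst", p="holding")]

# Which instances have a declared type?
typed = [s for (s, _, _) in match(TRIPLES, p="rdf_type")]
```

Because every statement has the same three-part shape, one generic pattern matcher is enough to answer both questions.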
11
IMHO
• Better NLP will help search engines, but it’s a long-term, incremental project
• We need a well-defined and extensible representation system for explicit knowledge
• It should be backed by open, non-proprietary standards supported by industry, government and other interested parties
• The W3C approach is not perfect
• But “The perfect is the enemy of the good.”
• “Semantic Web” vs. “semantic web”
12
13
• http://swoogle.umbc.edu/
• Running since summer 2004
• 2.1M RDF docs, 420M triples, 10K ontologies,
15K namespaces, 1.5M classes, 185K properties,
49M instances, 800 registered users
14
Swoogle Architecture
[Diagram labels: Analysis; SWD classifier; Ranking; Index; IR Indexer; SWD Indexer; Search Services; Semantic Web metadata; Web Server (html, human); Web Service (rdf/xml, machine); Discovery; SwoogleBot; Bounded Web Crawler; Google Crawler; Candidate URLs; document cache; the Web; Semantic Web. Legend: information flow; Swoogle’s web interface]
15
A Hybrid Harvesting Framework
[Diagram labels: Submissions & pings; Inductive learner; Seeds M (Meta crawling); Seeds R (RDF crawling); Seeds H (Bounded HTML crawling); Google API call; Swoogle Sample Dataset; the Web]
16
19
Applications and use cases
1 Supporting Semantic Web developers
– Ontology designers, vocabulary discovery, who’s using my
ontologies or data?, use analysis, errors, statistics, etc.
2 Searching specialized collections
– Spire: aggregating observations and data from biologists
– InferenceWeb: searching over and enhancing proofs
– SemNews: Text Meaning of news stories
3 Supporting SW tools
– Triple shop: finding data for SPARQL queries
20
1
21
80 ontologies were found that
had these three terms
By default, ontologies are ordered
by their ‘popularity’, but they can
also be ordered by recency or size.
Let’s look at this one
22
Basic Metadata
hasDateDiscovered: 2005-01-17
hasDatePing: 2006-03-21
hasPingState: PingModified
type: SemanticWebDocument
isEmbedded: false
hasGrammar: RDFXML
hasParseState: ParseSuccess
hasDateLastmodified: 2005-04-29
hasDateCache: 2006-03-21
hasEncoding: ISO-8859-1
hasLength: 18K
hasCntTriple: 311.00
hasOntoRatio: 0.98
hasCntSwt: 94.00
hasCntSwtDef: 72.00
hasCntInstance: 8.00
23
Who uses this ontology and
how do they access it?
24
rdfs:range was used 41 times to assert a value.
owl:ObjectProperty was instantiated 28 times.
time:Cal… was defined once and used 24 times (e.g., as range).
25
These are the namespaces this
ontology uses. Clicking on one
shows all of the documents using
the namespace.
All of this is available
in RDF form for the
agents among us.
26
Here’s what the agent sees.
Note the swoogle and wob
(web of belief) ontologies.
27
We can also search for
terms (classes, properties)
like terms for “person”.
28
10K terms associated with
“person”! Ordered by use.
Let’s look at foaf:Person’s metadata
29
Metadata stored for a term is information about its definition:
both what and by whom
30
31
How do other terms use foaf:Person? 100
documents assert that foaf:publication is a
property of a foaf:Person
32
87K documents used foaf:gender with a
foaf:Person instance as the subject
33
3K documents used dc:creator with a
foaf:Person instance as the object
34
Swoogle’s archive saves every
version of a SWD it’s seen.
35
36
2
An NSF ITR collaborative project with
• University of Maryland, Baltimore County
• University of Maryland, College Park
• University of California, Davis
• Rocky Mountain Biological Laboratory
37
An invasive species scenario
• Nile Tilapia fish have been found in a California lake.
• Can this invasive species thrive in this environment?
• If so, what will be the likely
consequences for the
ecology?
• So…we need to understand
the effects of introducing
this fish into the food web
of a typical California lake
38
Food Webs
• A food web models the trophic (feeding)
relationships between organisms in an ecology
– Food web simulators are used to explore the
consequences of changes in the ecology, such as the
introduction or removal of a species
– A location’s food web is usually constructed from studies
of the frequencies of the species found there and the
known trophic relations among them.
• Goal: automatically construct a food web for a new
location using existing data and knowledge
• ELVIS: Ecosystem Location Visualization and
Information System
39
East River Valley Trophic Web
http://www.foodwebs.org/
40
Species List Constructor
Click a county, get a species list
41
The problem
• We have data on what species are known to be in
the location and can further restrict and fill in with
other ecological models
• But we don’t know which of these the Nile Tilapia
eats or who might eat it.
• We can reason from taxonomic data (similar
species) and known natural history data (size,
mass, habitat, etc.) to fill in the gaps.
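One way to picture this gap-filling is the sketch below. It is a guess at the general idea, not the ELVIS implementation: the `predict_prey` helper is hypothetical, and every taxon, species and diet entry is invented placeholder data.

```python
# Invented placeholder data: the parent taxon of each species.
PARENT_TAXON = {
    "Nile Tilapia": "genus_T",
    "relative_1": "genus_T",
    "relative_2": "genus_T",
    "unrelated_1": "genus_U",
}

# Invented placeholder data: known prey of species that have been studied.
KNOWN_PREY = {
    "relative_1": {"algae", "insect larvae"},
    "relative_2": {"algae", "zooplankton"},
    "unrelated_1": {"small fish"},
}

# Invented placeholder data: species known to be present at the location.
LOCAL_SPECIES = {"algae", "zooplankton", "ostracods", "small fish"}


def predict_prey(species, parent_taxon, known_prey, local_species):
    """Guess prey for `species` from what its taxonomic relatives eat,
    keeping only candidates that actually occur at the location."""
    taxon = parent_taxon[species]
    relatives = [s for s, t in parent_taxon.items()
                 if t == taxon and s != species]
    candidates = set()
    for relative in relatives:
        candidates |= known_prey.get(relative, set())
    return candidates & local_species


PREDICTED = predict_prey("Nile Tilapia", PARENT_TAXON, KNOWN_PREY, LOCAL_SPECIES)
```

A fuller version would also weigh natural history data such as size, mass and habitat, as the slide suggests. This sketch uses taxonomic similarity alone.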
42
43
Food Web Constructor
Predict food web links using database and taxonomic reasoning.
In a new estuary, Nile Tilapia could compete with ostracods (green) to eat algae.
Predators (red) and prey (blue) of ostracods may be affected.
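The kind of question this slide asks of a food web can be sketched as lookups over a list of feeding links. The sketch is an illustration only: the helper names are hypothetical, and apart from the slide's own claim that both Nile Tilapia and ostracods eat algae, the links are invented.

```python
# Feeding links as (predator, prey) pairs. Only the two "eats algae" links
# come from the slide; the rest are invented for illustration.
LINKS = [
    ("ostracods", "algae"),
    ("Nile Tilapia", "algae"),
    ("small fish", "ostracods"),
    ("heron", "small fish"),
]


def prey_of(links, species):
    """Species that `species` eats."""
    return sorted(prey for pred, prey in links if pred == species)


def predators_of(links, species):
    """Species that eat `species`."""
    return sorted(pred for pred, prey in links if prey == species)


def competitors_of(links, species):
    """Other species that share at least one prey item with `species`."""
    own_prey = set(prey_of(links, species))
    return sorted({pred for pred, prey in links
                   if prey in own_prey and pred != species})


COMPETITORS = competitors_of(LINKS, "Nile Tilapia")
AFFECTED = predators_of(LINKS, "ostracods") + prey_of(LINKS, "ostracods")
```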
44
Evidence Provider
45
Status
• ELVIS (Ecosystem Location Visualization and
Information System) as an integrated set of web services
for constructing food webs for a given location.
• Background ontologies
– SpireEcoConcepts: concepts and properties to represent food
webs, and ELVIS related tasks, inputs and outputs
– ETHAN (Evolutionary Trees and Natural History): concepts and
properties for ‘natural history’ information on species derived
from data in the Animal Diversity Web and other taxonomic
sources. 250K classes on plants and animals
• Under development
– Connect to visualization software
– Connect to triple shop to discover more data
46
Supporting SW Tools
3
• Semantic Web applications can access Swoogle
through a REST-based Web interface or via
SQL.
• Two examples:
– A system to help scientists construct datasets from
RDF documents on the Web
– Tools to manage Semantic Web data in Blogs and
other forms of social media
47
UMBC Triple Shop
• http://sparql.cs.umbc.edu/
• Online SPARQL RDF query processing with several
interesting features
• Automatically finds SWDs for given queries using
the Swoogle backend database
• Datasets, queries and results can be saved, tagged,
annotated, shared, searched for, etc.
• RDF datasets as first class objects
– Can be stored on our server or downloaded
– Can be materialized in a database or
(soon) as a Jena model
48
What’s SPARQL?
• SPARQL is the standard language (& protocol)
for querying RDF graphs
• Think: SQL for RDF
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name ?email
FROM <http://rdf.example.org/people.rdf>
WHERE { ?person a foaf:Person .
?person foaf:name ?name .
OPTIONAL {?person foaf:mbox ?email} .
}
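To make the role of OPTIONAL concrete, here is a rough Python sketch of what the query above asks for, evaluated by hand over a small in-memory list of triples. It is not a SPARQL engine: the data stands in for the example people.rdf file and is invented, and `people_query` is a hypothetical helper written for this one query.

```python
# Invented stand-in for the contents of http://rdf.example.org/people.rdf
TRIPLES = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:bob", "foaf:name", "Bob"),        # no mbox: the OPTIONAL part is unbound
    ("ex:carol", "foaf:name", "Carol"),    # not typed as foaf:Person: excluded
]


def objects(triples, subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


def people_query(triples):
    """Hand-written equivalent of the SELECT ?person ?name ?email query."""
    rows = []
    # "?person a foaf:Person" ('a' is SPARQL shorthand for rdf:type)
    persons = [s for (s, p, o) in triples
               if p == "rdf:type" and o == "foaf:Person"]
    for person in persons:
        # "?person foaf:name ?name" is required
        for name in objects(triples, person, "foaf:name"):
            # "OPTIONAL { ?person foaf:mbox ?email }" keeps the row either way
            emails = objects(triples, person, "foaf:mbox") or [None]
            for email in emails:
                rows.append((person, name, email))
    return rows


ROWS = people_query(TRIPLES)
```

Bob still appears in the result with an unbound email, which is the difference between an OPTIONAL pattern and a required one.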
49
Who knows Anupam Joshi?
Show me their names, email address
and pictures
52
The UMBC ebiquity
site publishes lots of
RDF data, including
FOAF profiles
53
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?p2name ?p2mbox ?p2pix
FROM ???   # No FROM clause!
WHERE { ?p1 foaf:surname "Joshi" .
?p1 foaf:firstName "Anupam" .
?p1 foaf:mbox ?p1mbox .
?p2 foaf:knows ?p3 .
?p3 foaf:mbox ?p1mbox .
?p2 foaf:name ?p2name .
?p2 foaf:mbox ?p2mbox .
OPTIONAL { ?p2 foaf:depiction ?p2pix } .
}
ORDER BY ?p2name
54
log in
specify dataset
Enter query w/o
FROM clause!
55
We want to create a reusable dataset
56
Find RDF data using terms found in the query
that also satisfies some simple constraints (e.g., for trust)
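The first step, pulling the vocabulary terms out of a query so they can be used as search keys, might look roughly like the sketch below. This is an assumption about the general approach, not the Triple Shop code: `query_terms` is a hypothetical helper, and the regular expressions only handle prefixed names, not full IRIs or the `a` shorthand.

```python
import re

# The "Who knows Anupam Joshi?" query, without a FROM clause.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?p2name ?p2mbox ?p2pix
WHERE { ?p1 foaf:surname "Joshi" .
  ?p1 foaf:firstName "Anupam" .
  ?p1 foaf:mbox ?p1mbox .
  ?p2 foaf:knows ?p3 .
  ?p3 foaf:mbox ?p1mbox .
  ?p2 foaf:name ?p2name .
  ?p2 foaf:mbox ?p2mbox .
  OPTIONAL { ?p2 foaf:depiction ?p2pix } .
}
ORDER BY ?p2name
"""


def query_terms(query):
    """Return the full URIs of the prefixed terms used in a SPARQL query."""
    # Map each declared prefix to its namespace URI.
    prefixes = dict(re.findall(r"PREFIX\s+(\w+):\s*<([^>]+)>", query))
    # Drop the PREFIX declarations and string literals before scanning.
    body = re.sub(r"PREFIX\s+\w+:\s*<[^>]+>", "", query)
    body = re.sub(r'"[^"]*"', "", body)
    terms = set()
    for prefix, local in re.findall(r"\b(\w+):(\w+)", body):
        if prefix in prefixes:
            terms.add(prefixes[prefix] + local)
    return sorted(terms)


TERMS = query_terms(QUERY)
```

Each resulting URI could then be looked up in an index of which documents use which terms, with the trust constraints applied as a filter on the candidate documents.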
57
302 RDF documents
were found that
might have useful
data.
58
We’ll select them all
and add them to
the current dataset.
59
We’ll run the query
against this dataset
to see if the results
are as expected.
60
The results can be
produced in any of
several formats
61
62
Looks like a useful
dataset. Let’s save it
and also materialize
it in the TS triple store.
An extension will let us ask
that it be automatically
updated when constituents
change
63
We can also
annotate, save and
share queries.
65
66
• Social media sites have become the
biggest source of new content on the Web
• Blogs, Wikis, Photo sites, forums, etc.
• Accounting for ~1/3 of new Web content
67
• Social media sites have embraced new
ways of letting users add semantic
information
• Showing users the potential of semantics
69
Social Media and the Semantic Web
• Many are exploring how Semantic Web technology
can work with social media
• Social media like blogs are typically temporally
organized
– valued for their timely and dynamic information!
• If static pages form the Web’s long term memory,
then the Blogosphere is its stream of consciousness
• Maybe we can (1) help people publish data in RDF
on their blogs and (2) mine social media sites for
useful information
70
A BioBlitz involves going out
to an area and recording
every organism you see
The OWL icon links
to the data in RDF
71
72
A good Semantic Web opportunity
• We want to make it easy for scientists to enter
and collect information from social media
–Professionals, students and amateurs!
• Two early examples
–SPOTter – a tool to add Semantic Web data
to blogs
–Splickr – a system to mine Flickr for images
of organisms
73
SPOTter: SPire Observation Tool
• We’ve developed some simple components to help
people add RDF data to blogs and ping Swoogle to get
it indexed.
• SPOTter is an initial prototype that uses the ETHAN
ontology and is being used in some BioBlitz activities
with students.
• We’re working toward a version that uses Twitter so
that people can make blog entries from their cell
phones via SMS
– The SPOTter agent will get the entries (via RSS)
and index the data
74
SPOTter
button
Once entered, the data is
embedded into the blog post
and Swoogle is pinged to
index it
75
Prototype SPOTter search engine
• We can draw a bounding box on
the map and find observations
• An RSS feed is provided for each
query
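A bounding-box search of this kind reduces to a simple coordinate filter. The sketch below is an illustration under stated assumptions: the `Observation` record and its fields are hypothetical, the coordinates are invented, and the box is assumed not to cross the 180° meridian.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical record for one observation (not the SPOTter schema)."""
    species: str
    lat: float
    lon: float


def in_box(obs, south, west, north, east):
    """True if the observation lies inside the box (edges included).

    Assumes west <= east, i.e. the box does not cross the 180° meridian.
    """
    return south <= obs.lat <= north and west <= obs.lon <= east


# Invented sample data.
OBSERVATIONS = [
    Observation("species_a", 39.25, -76.71),
    Observation("species_b", 39.30, -76.60),
    Observation("species_c", 38.50, -121.70),
]

FOUND = [o.species for o in OBSERVATIONS
         if in_box(o, south=39.0, west=-77.0, north=39.5, east=-76.5)]
```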
76
Flickr
• The Flickr “photo sharing” site has millions of
photographs
– Many of plants and animals
• Most of them have descriptions, timestamps, tags and
even geo-tags
– Flickr has even introduced “machine tags” that can
be mapped into RDF
• Any Flickr user (human or bot) can add comments
and annotations
• There’s a good API
• It could be a good source of ecological information
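Machine tags have the form namespace:predicate=value, so mapping one into a triple is mostly a matter of splitting it. The sketch below shows one possible mapping and is not the Splickr implementation: the photo identifier is made up, and turning a namespace such as `taxonomy` into a real RDF namespace URI is left out.

```python
import re

# namespace:predicate=value
MACHINE_TAG = re.compile(
    r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)$"
)


def machine_tag_to_triple(photo_uri, tag):
    """Map one machine tag on a photo to a (subject, predicate, object) triple.

    Returns None for ordinary tags that are not machine tags.
    """
    m = MACHINE_TAG.match(tag)
    if m is None:
        return None
    namespace, predicate, value = m.groups()
    return (photo_uri, f"{namespace}:{predicate}", value)


# Hypothetical photo identifier and tags.
TAGS = ["fish", "taxonomy:binomial=Oreochromis niloticus", "geo:lat=38.5"]
TRIPLES = [t for t in (machine_tag_to_triple("photo:123", tag) for tag in TAGS)
           if t is not None]
```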
77
78
79
Results for people and machines
80
81
Conclusion
• The web will contain the world’s knowledge in
forms accessible to people and computers
– We need better ways to discover, index, search and
reason over SW knowledge
• SW search engines address different tasks than
HTML search engines
– So they require different techniques and APIs
• Swoogle-like systems can help create consensus
ontologies and foster best practices
• Social media provide new challenges and
opportunities for the Semantic Web
82
For more information
http://ebiquity.umbc.edu/
Annotated
in OWL
83