Semantic Web Technologies:
A Tutorial
Li Ding
University of Maryland Baltimore County
Joint work with Deborah McGuinness, Tim Finin and Anupam Joshi
Presented at Kodak Research Laboratories, Rochester, New York
18 July 2006
@
The Web has made people smarter
Examples: craigslist, del.icio.us
@
But what about machines?
Machines still have a very minimal understanding of text and images.
@
Motivation: machine-friendly data
- Natural language
  - as seen by a person: Li Ding is a person
  - as seen by a machine: LiDigisasaon
- XML: represent structures
  - as seen by a person: <person>Li Ding</person>
  - as seen by a machine: <on>LiDig</on>
- Semantic Web: represent more semantics
  - represent structures
  - enable common vocabulary
  - associate symbols with logic interpretation for inference
@
Semantic Web Technologies
@
Semantic Web Layers
[Figure: the W3C Semantic Web layer stack, divided into a Web aspect (built on HTTP) and a Semantic aspect]
"The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." – Berners-Lee, Hendler & Lassila, Scientific American, 2001
Image source: http://en.wikipedia.org/wiki/Image:W3c_semantic_web_stack.jpg
@
The Semantic Web is simple
- Each URI denotes a concept
  - Don't say "colour", say <http://example.com/2002/std6#col>
- URIs are connected by triples
  - Relational database
  - RDF (Resource Description Framework)
- Machines read data as a directed RDF graph
Source: Tim Berners-Lee, Putting the Web back into Semantic Web, ISWC2005 Keynote
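A minimal sketch of "URIs connected by triples" follows. Each statement is stored as a (subject, predicate, object) tuple, and the tuples are indexed as a directed, labeled graph. The namespace comes from the slide. The resources car1, owner, alice, red and blue are hypothetical.

```python
from collections import defaultdict

def build_graph(triples):
    """Map each subject node to its outgoing (predicate, object) edges."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return graph

EX = "http://example.com/2002/std6#"   # namespace taken from the slide

# Hypothetical statements: every term is a URI, so machines need no guessing
triples = [
    (EX + "car1", EX + "col", EX + "red"),
    (EX + "car1", EX + "owner", EX + "alice"),
    (EX + "alice", EX + "col", EX + "blue"),
]

graph = build_graph(triples)
for p, o in graph[EX + "car1"]:        # walk the edges leaving car1
    print(p, "->", o)
```

Because the same URI can be the object of one triple and the subject of another (alice here), triples from different sources merge into one graph simply by sharing URIs.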
@
Example: RDF graph and syntax
RDF graph: nodes are URIs, literals, or blank nodes (BNodes); each edge is a triple
  t1: (blank node) --http://xmlns.com/foaf/0.1/name--> "Li Ding"
  t2: (blank node) --http://www.w3.org/1999/02/22-rdf-syntax-ns#type--> http://xmlns.com/foaf/0.1/Person
The entire graph means: there exists a person whose name is "Li Ding".
Data encoded in RDF/XML syntax (XML and Unicode, namespaces, URIs as tags):
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <foaf:Person>
    <foaf:name>Li Ding</foaf:name>
  </foaf:Person>
</rdf:RDF>
Alternative RDF syntax languages: N3 (Notation 3), N-Triples, Turtle
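The RDF/XML above can be read back into triples with Python's standard xml.etree.ElementTree. This is a minimal sketch, not a general RDF/XML parser. It handles only the slide's pattern: typed node elements whose children are literal-valued properties. The blank-node label _:b1 is generated by the sketch.

```python
import xml.etree.ElementTree as ET
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_ABOUT = "{" + RDF + "}about"       # ElementTree's key for the rdf:about attribute

DOC = """<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <foaf:Person>
    <foaf:name>Li Ding</foaf:name>
  </foaf:Person>
</rdf:RDF>"""

def expand(tag):
    """ElementTree writes names as '{namespace}local'; RDF uses namespace+local."""
    ns, local = tag[1:].split("}")
    return ns + local

def parse_rdfxml(text):
    root = ET.fromstring(text.encode("utf-8"))
    bnodes = count(1)
    triples = []
    for node in root:                        # each child of rdf:RDF is a node element
        subject = node.get(RDF_ABOUT) or f"_:b{next(bnodes)}"
        if expand(node.tag) != RDF + "Description":
            # <foaf:Person> is shorthand for an rdf:type triple
            triples.append((subject, RDF + "type", expand(node.tag)))
        for prop in node:                    # property elements with literal values
            triples.append((subject, expand(prop.tag), (prop.text or "").strip()))
    return triples

for s, p, o in parse_rdfxml(DOC):
    print(s, p, o)
```

The output is the two triples t1 and t2, written with full URIs. This is close to what the N-Triples syntax spells out explicitly.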
@
Example: Surfing RDF graphs
G1: http://cs.umbc.edu/~dingli1/foaf.rdf
  http://cs.umbc.edu/~dingli1/foaf.rdf#dingli has rdf:type foaf:Person and foaf:name "Li Ding"
  it foaf:knows a person with foaf:mbox mailto:[email protected] and rdfs:seeAlso http://cs.umbc.edu/~finin/foaf.rdf
Surf to another instance (follow rdfs:seeAlso) to reach G2: http://cs.umbc.edu/~finin/foaf.rdf
  a foaf:Person with foaf:mbox mailto:[email protected], foaf:firstName "Tim", foaf:surname "Finin"
Surf to definition (follow the foaf: namespace) to reach G3: http://xmlns.com/foaf/0.1/
  foaf:Person rdf:type rdfs:Class; foaf:Person rdfs:subClassOf wordNet:Agent
  foaf:mbox rdf:type rdf:Property; foaf:mbox rdfs:domain foaf:Person
Namespaces:
  rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
  rdfs: http://www.w3.org/2000/01/rdf-schema#
  foaf: http://xmlns.com/foaf/0.1/
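Surfing from instance to instance can be sketched as a breadth-first crawl over rdfs:seeAlso links. This is a minimal sketch under two assumptions. An in-memory dictionary stands in for HTTP fetches (WEB is hypothetical). The triple contents are simplified from the slide, with blank-node labels invented for illustration.

```python
from collections import deque

RDFS_SEE_ALSO = "rdfs:seeAlso"

# Hypothetical stand-in for the Web: document URL -> triples it contains
WEB = {
    "http://cs.umbc.edu/~dingli1/foaf.rdf": [
        ("#dingli", "foaf:name", "Li Ding"),
        ("#dingli", "foaf:knows", "_:p"),
        ("_:p", RDFS_SEE_ALSO, "http://cs.umbc.edu/~finin/foaf.rdf"),
    ],
    "http://cs.umbc.edu/~finin/foaf.rdf": [
        ("_:me", "foaf:firstName", "Tim"),
        ("_:me", "foaf:surname", "Finin"),
    ],
}

def surf(start, web):
    """Breadth-first crawl over rdfs:seeAlso links; returns documents in visit order."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        url = queue.popleft()
        order.append(url)
        for s, p, o in web.get(url, []):
            if p == RDFS_SEE_ALSO and o not in seen:   # only follow unseen links
                seen.add(o)
                queue.append(o)
    return order

print(surf("http://cs.umbc.edu/~dingli1/foaf.rdf", WEB))
```

Surfing to a definition would work the same way. The crawler would dereference the namespace URI of each predicate or class (for example, the foaf: namespace) instead of rdfs:seeAlso targets.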
@
Example: Serving human & machine
- The original RDF/XML serves machines
- The HTML for people is generated by applying XSLT to the RDF/XML
@
Ontology Spectrum
[Figure: a spectrum from simple taxonomies to expressive ontologies: Catalog/ID, Terms/glossary, Thesauri ("narrower term" relation), Informal is-a, Formal is-a, Formal instance, Frames (properties), Value restriction, General logical constraints, Disjointness, Inverse, part of… Example languages and resources placed along the spectrum: UMLS, Wordnet, RDF, RDFS, DB Schema, OO, DAML, OWL, CYC, IEEE SUO]
Source: Originally by Deborah L. McGuinness (KSL, Stanford), modified by Tim Finin
@
Ontology Languages: RDFS and OWL
- RDFS
  - Set theory – rdfs:Class
  - Relation – rdf:Property, rdfs:domain, rdfs:range
  - Hierarchy – rdfs:subClassOf, rdfs:subPropertyOf
  - Built-in datatypes – xsd:string, xsd:dateTime
- OWL
  - Description Logic
  - Class axioms
    - Class, Thing, Nothing
    - oneOf, disjointWith, unionOf, complementOf, intersectionOf …
    - Restriction, onProperty, cardinality, hasValue …
  - Property axioms
    - DatatypeProperty, ObjectProperty, AnnotationProperty, …
    - inverseOf, TransitiveProperty, SymmetricProperty
    - FunctionalProperty, InverseFunctionalProperty
  - Equality – equivalentClass, sameAs, differentFrom …
  - Ontology annotation – Ontology, imports, versionInfo
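OWL property axioms can produce equality statements. If p is an owl:InverseFunctionalProperty, then (x p v) and (y p v) together imply (x owl:sameAs y). FOAF declares foaf:mbox inverse-functional in this way. The sketch below applies only this one rule. The people and mailbox addresses are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

IFPS = {"foaf:mbox"}  # properties assumed to be owl:InverseFunctionalProperty

triples = [
    ("_:a", "foaf:name", "Tim Finin"),
    ("_:a", "foaf:mbox", "mailto:tim@example.org"),
    ("_:b", "foaf:surname", "Finin"),
    ("_:b", "foaf:mbox", "mailto:tim@example.org"),
    ("_:c", "foaf:mbox", "mailto:other@example.org"),
]

def infer_same_as(triples, ifps):
    """Group subjects by (IFP, value); every pair within a group is owl:sameAs."""
    groups = defaultdict(set)
    for s, p, o in triples:
        if p in ifps:
            groups[(p, o)].add(s)
    same = set()
    for subjects in groups.values():
        for x, y in combinations(sorted(subjects), 2):
            same.add((x, "owl:sameAs", y))
    return same

print(infer_same_as(triples, IFPS))
# {('_:a', 'owl:sameAs', '_:b')}
```

This is the kind of inference that lets descriptions of the same person from different documents be merged ("smushed") even when neither description gives the person a URI.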
@
Example: Inference using ontologies
- Ontology languages (RDFS, OWL) have formal foundations that allow us to infer additional (implicit) statements
  - RDFS provides basic ones, e.g. sub-class, sub-property, domain
  - OWL adds many more axioms, e.g. inverse-property, equality
- SWRL (Semantic Web Rule Language) enables a general-purpose solution
  - Supports rule representation
  - But also requires inference support beyond RDFS and OWL
[Figure: #Deborah hasBrother #Joe and #Deborah hasChild #Louise are asserted; three statements are inferred:
  hasBrother rdfs:subPropertyOf hasSibling gives #Deborah hasSibling #Joe
  hasChild owl:inverseOf hasParent gives #Louise hasParent #Deborah
  SWRL: (x hasParent y) (y hasBrother z) => (x hasUncle z) gives #Louise hasUncle #Joe]
Source: Semantic Web tutorial (AAAI 2005) by Deborah L. McGuinness
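The three inferences in this example can be reproduced by naive forward chaining. The loop applies the rules repeatedly until no new triples appear (a fixpoint). This is a minimal sketch that hard-codes the slide's subPropertyOf, inverseOf and SWRL rule. Real reasoners such as Pellet or Jena use far more efficient strategies.

```python
facts = {
    ("#Deborah", "hasBrother", "#Joe"),
    ("#Deborah", "hasChild", "#Louise"),
}
SUB_PROPERTY = {"hasBrother": "hasSibling"}   # hasBrother rdfs:subPropertyOf hasSibling
INVERSE = {"hasChild": "hasParent"}           # hasChild owl:inverseOf hasParent

def step(facts):
    """Apply every rule once; return only the triples not already known."""
    new = set()
    for s, p, o in facts:
        if p in SUB_PROPERTY:                 # (s p o), p subPropertyOf q => (s q o)
            new.add((s, SUB_PROPERTY[p], o))
        if p in INVERSE:                      # (s p o), p inverseOf q => (o q s)
            new.add((o, INVERSE[p], s))
    # SWRL: (x hasParent y) (y hasBrother z) => (x hasUncle z)
    for x, p1, y in facts:
        if p1 != "hasParent":
            continue
        for y2, p2, z in facts:
            if p2 == "hasBrother" and y2 == y:
                new.add((x, "hasUncle", z))
    return new - facts

def closure(facts):
    facts = set(facts)
    while True:
        new = step(facts)
        if not new:                           # fixpoint reached
            return facts
        facts |= new

for triple in sorted(closure(facts) - facts):
    print(triple)
```

The uncle rule only fires in the second round, after the inverseOf rule has produced #Louise hasParent #Deborah. That ordering is why the loop runs to a fixpoint rather than making a single pass.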
@
More languages and more ontologies
Languages (require special inference engines)
- [Trust/Uncertainty] BayesOWL
- [Proof] PML (Proof Markup Language)
- [Query/Data Access] SPARQL Query Language for RDF
- [Rule] SWRL (Semantic Web Rule Language)
- [Policy] Rei: A Policy Specification Language
- [Service] OWL-S by DAML (1.2 preview available)
- [Service] SAWSDL (Semantic Annotations for WSDL)
- [Thesauri] SKOS (Simple Knowledge Organization System)
Ontologies (only need RDFS and/or OWL inference)
- Upper ontologies: OpenCyc, WordNet, OntoSem, SUO
- Specialized common ontologies: FOAF, Dublin Core, RSS
- Domain ontologies: bibtex, biology, and many more…
Li Ding, Pranam Kolari, Zhongli Ding, and Sasikanth Avancha, "Using Ontologies in the Semantic Web: A Survey", in Ontologies in the Context of Information Systems (book chapter), 2005. http://ebiquity.umbc.edu/paper/html/id/257/
@
Semantic Web Tools
[Figure: tools grouped by task: create, publish, browse, inference, and managing ontologies (instance, update, extend, integrate)]
- Editors: Protégé, Swoop
- Online registries: DAML Ontology Library, Schema Web
- Search engines: Swoogle, Semantic Web Search
- Browsers: Tabulator, IsaViz, Piggybank, Arago, Horus, Mspace, Magpie
- Reasoners: Pellet (DL), Racer (DL), FaCT++ (DL), Jena, JTP, F-OWL, Euler, CWM, Jena (SPARQL)
- Triple stores, instance stores, RDF stores: KAON, Kowari, Sesame, OWLIM, 3store, Redland, Tap, Yars, IBM IODT, RDFLib, RDF gateway, allegro, Oracle 10
- Mapping tools: ONION, PROMPT, OntoMapper, Glue, OntoMerge, Ontomorph
source1: http://ebiquity.umbc.edu/paper/html/id/257/Using-Ontologies-in-the-Semantic-Web-A-Survey
source2: http://www.wiwiss.fu-berlin.de/suhl/bizer/toolkits/
@
Semantic Web data
@
Semantic Web data sources
- Text editor: I write RDF/XML manually
- Semantic Web editors: Protégé, Swoop
- Information extraction (consumer side)
  - NLP (hard), e.g. SemNews
  - Heuristic scraping (regular expressions), e.g. Semagix Freedom
- Wrapped database content (publisher side)
  - Blogs and social network websites, e.g. livejournal.com
  - Academic sites: http://www.mindswap.org/, http://ebiquity.umbc.edu
- Generated by software
  - Creative Commons license embedded in HTML
  - Embedded metadata in JPEG and PDF (XMP)
  - Agent communication messages
- …
@
The Scale of the Semantic Web
Statistics based on Semantic Web data indexed by Swoogle

Year               Terms      Documents   Individuals   Triples     Bytes
                   (million)  (million)   (million)     (million)   (billion)
2004               0.15       0.33        7.3           48          4.3
2006               1.9        1.6         16            276         47
2008 (projected)   10         100         1000          20,000      3000

Estimated number of documents based on Google queries
               Docs    Corresponding Google query
Optimistic     10^9    rdf OR inurl:rss OR inurl:foaf -filetype:html
Conservative   10^5    rdf filetype:rdf
@
Where does the data come from?
- "com" has contributed the largest portion of websites (71%) and pure SWDs (39%), because industry has adopted virtual hosting technology as well as ontologies such as RSS and FOAF
- Most SWOs are from "org" (46%, e.g. www.w3.org) and "edu" (14%, e.g. spire.umbc.edu), because of the deep interest of academia and non-profit organizations in developing ontologies
SWDs: Semantic Web documents; SWOs: Semantic Web ontologies; pure SWD: not embedded
Note: Statistics of top-level domains are also used in characterizing the Web (Henzinger and Lawrence 2004)
@
Source websites of SWD
[Figure: two log-log plots of y, the number of websites hosting >= m SWDs, against m, the number of SWDs, each with a power-law fit:
  Jan 2005 - Aug 2005: y = 6236.7 m^-0.6629, R² = 0.9622
  Jan 2005 - Mar 2006: y = 6598.8 m^-0.7305, R² = 0.9649]
- Invariant found!
  - The number of websites hosting at least m SWDs follows a power-law distribution
  - Similar to the Web
  - Head: virtual hosting
  - Tail: crawling strategy
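Power-law trend lines like the ones above are commonly fitted by ordinary least squares in log-log space, since log y = log a + b log m is linear. The slide does not say how its fits were computed, so treat this as an assumed method. The sample data are synthetic and lie exactly on the first fitted curve, so the sketch should recover a ≈ 6236.7, b ≈ -0.6629 and R² ≈ 1. Here R² is measured on the log-transformed values.

```python
import math

def fit_power_law(ms, ys):
    """Return (a, b, r2) for y = a * m**b, fitted as log y = log a + b log m."""
    xs = [math.log(m) for m in ms]
    ls = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ls) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (l - my) for x, l in zip(xs, ls))
    b = sxy / sxx                     # slope in log-log space = exponent
    a = math.exp(my - b * mx)         # intercept gives the coefficient
    ss_res = sum((l - (my + b * (x - mx))) ** 2 for x, l in zip(xs, ls))
    ss_tot = sum((l - my) ** 2 for l in ls)
    return a, b, 1 - ss_res / ss_tot

# Synthetic points on the Jan-Aug 2005 curve from the slide
ms = [1, 10, 100, 1000, 10000]
ys = [6236.7 * m ** -0.6629 for m in ms]
a, b, r2 = fit_power_law(ms, ys)
print(f"y = {a:.1f} m^{b:.4f}, R^2 = {r2:.4f}")
```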
@
Size of SWD
[Figure: number of SWDs and number of SWOs by # of triples]
- Embedded SWDs are small
  - 69% have 3 triples
  - 96% have <10 triples
- Pure SWDs
  - 60% have 5 to 1000 triples
  - Special size of RSS: 130 (17 triples for the channel, 7 triples for each of the 15 items)
- SWOs
  - Biased by PML
  - Small ones from RDF test
  - Largest is 1M
@
Age of SWD
- Measured by the last-modified time of SWD
- PSWD: exponential distribution
- SWO: flat tail; has interest in ontology development decreased?
[Figure: number of documents (log scale) by last-modified date, from 7/20/1995 to 7/2/2006, for pswd and swo (pml filtered), with an exponential fit for pswd: y = 2E-48 e^(0.0032x)]
@
How are Semantic Web terms used?
- All usage distributions follow a power-law distribution
- Few SWTs have been well populated
  - 371 SWTs have more than 100 class instances
  - 1208 SWTs have more than 100 property instances
@
Swoogle Rank (citation based)
[Figure: weighted citation graph among the top 10 namespaces ranked by Swoogle; each node is labeled with its rank, indegree, and mean(inflow), and edges carry flow weights. Namespaces shown: http://www.w3.org/2000/01/rdf-schema, http://www.w3.org/1999/02/22-rdf-syntax-ns, http://www.w3.org/2002/07/owl, http://purl.org/rss/1.0, http://web.resource.org/cc, http://purl.org/dc/elements/1.1, http://www.hackcraft.net/bookrdf/vocab/0_1/, http://www.w3.org/2001/vcard-rdf/3.0, http://purl.org/dc/terms, http://xmlns.com/foaf/0.1/index.rdf]
Computed using Swoogle metadata by May 2006
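Citation-based ranking can be illustrated with textbook PageRank, computed by power iteration over a small namespace citation graph. This is plain PageRank, used here only to show the idea. Swoogle's own ranking formula differs, and the edges below are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict node -> list of nodes it cites. Returns dict node -> score."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1 - damping) / n for v in nodes}    # random-jump share
        for v in nodes:
            targets = links.get(v, [])
            if targets:                                # split rank over citations
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:                                      # dangling node: spread evenly
                for t in nodes:
                    new[t] += damping * rank[v] / n
        rank = new
    return rank

# Hypothetical citations between namespaces
links = {
    "foaf": ["rdfs", "rdf", "owl"],
    "owl": ["rdfs", "rdf"],
    "rdfs": ["rdf"],
    "rdf": ["rdfs"],
    "dc": ["rdfs"],
}
for ns, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(f"{ns:5s} {score:.3f}")
```

As in the figure, the core vocabularies (rdf, rdfs) collect most of the score, because almost every other namespace cites them.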
@
Semantic Web Applications
@
TAGA: Travel Agent Game in Agentcities
Motivation
- Market dynamics
- Auction theory (TAC)
- Semantic Web
- Agent collaboration (FIPA & Agentcities)
Features
- Open market framework
- Auction services
- OWL message content
- OWL ontologies
- Global agent community
Technologies
- FIPA (JADE, April Agent Platform)
- Semantic Web (RDF, OWL)
- Web (SOAP, WSDL, DAML-S)
- Internet (Java Web Start)
Ontologies (http://taga.umbc.edu/ontologies/)
- travel.owl – travel concepts
- fipaowl.owl – FIPA content language
- auction.owl – auction services
- tagaql.owl – query language
[Figure: TAGA architecture. Customer Agents, Travel Agents, the Auction Service Agent, Web Service Agents, the Bulletin Board Agent, and the Market Oversight Agent exchange proposals, contracts, and direct buys, and report travel packages, direct-buy transactions, and auction transactions. OWL is used as a content language, for protocol description, for representation and reasoning, and for service descriptions.]
FIPA platform infrastructure services, including directory facilitators enhanced to use OWL-S for service discovery
http://taga.umbc.edu (offline now)
@
Semantic Content Publishing
- Data stored in a database
- PHP generates both HTML and OWL
- HTML pages link to the corresponding OWL
- No more web scraping
[Figure: a MySQL database feeds two PHP scripts, one producing the HTML page http://ebiquity.umbc.edu/person/html/Li/Ding/ and the other the FOAF document http://ebiquity.umbc.edu/person/foaf/Li/Ding/foaf.rdf]
http://ebiquity.umbc.edu/ -- ebiquity group website
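The publishing pattern can be sketched as one record driving two renderings: an HTML page for people and an RDF/XML (FOAF) document for machines, with the HTML linking to the RDF. The site used PHP and MySQL; this sketch uses Python, with a dictionary standing in for the database row. The <link rel="meta" type="application/rdf+xml"> convention is an assumption about how the HTML points to its FOAF. The URLs are the ones on the slide.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical database row
record = {"name": "Li Ding", "homepage": "http://ebiquity.umbc.edu/person/html/Li/Ding/"}
FOAF_URL = "http://ebiquity.umbc.edu/person/foaf/Li/Ding/foaf.rdf"

def to_html(rec, foaf_url):
    """Human-readable page that advertises its machine-readable twin."""
    return (
        "<html><head>"
        f'<link rel="meta" type="application/rdf+xml" title="FOAF" href={quoteattr(foaf_url)}/>'
        f"</head><body><h1>{escape(rec['name'])}</h1></body></html>"
    )

def to_foaf(rec):
    """Machine-readable FOAF description built from the same record."""
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"\n'
        '         xmlns:foaf="http://xmlns.com/foaf/0.1/">\n'
        "  <foaf:Person>\n"
        f"    <foaf:name>{escape(rec['name'])}</foaf:name>\n"
        f"    <foaf:homepage rdf:resource={quoteattr(rec['homepage'])}/>\n"
        "  </foaf:Person>\n"
        "</rdf:RDF>\n"
    )

print(to_html(record, FOAF_URL))
print(to_foaf(record))
```

Because both outputs come from the same row, they cannot drift apart. That consistency is what removes the need to scrape the HTML.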
@
Rei Policy Language
- Rei is a declarative policy language for describing policies over actions
  - Reasons over domain-dependent information
  - Currently represented in OWL + logical variables
- Based on deontic concepts
  - Permission, Prohibition, Obligation, Dispensation
- Models speech acts
  - Delegation, Revocation, Request, Cancel
- Meta policies
  - Priority, modality preference
- Policy engineering tools
  - Reasoner, IDE for Rei policies in Eclipse
http://rei.umbc.edu/
@
Example: enforcing privacy policy
- The speaker doesn't want others to know the specific room that he's in, but is willing for others to know he's on campus
- He defines the following privacy policy:
  - Share my location with a granularity >= "State"
- The broker answers:
  - isLocated(US) => Yes!
  - isLocated(Maryland) => Yes!
  - isLocated(UMBC) => Uncertain
  - isLocated(ITE-RM210) => Uncertain
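The broker's behavior can be sketched as a granularity check. Queries about places at or above the policy's threshold level are answered truthfully. Finer-grained queries get "Uncertain" rather than a denial, so the answer itself leaks nothing. The level names, the containment chain, and the "No" case are assumptions modeled on the slide, not Rei's actual semantics.

```python
LEVELS = ["Room", "Campus", "State", "Country"]          # fine -> coarse
LEVEL_OF = {"ITE-RM210": "Room", "UMBC": "Campus", "Maryland": "State", "US": "Country"}
ACTUAL_CHAIN = ["ITE-RM210", "UMBC", "Maryland", "US"]   # places containing the speaker
MIN_LEVEL = "State"                                      # policy: granularity >= "State"

def answer(place):
    """Answer isLocated(place) without revealing anything finer than MIN_LEVEL."""
    if LEVELS.index(LEVEL_OF[place]) < LEVELS.index(MIN_LEVEL):
        return "Uncertain"              # too fine-grained: withhold, do not deny
    return "Yes" if place in ACTUAL_CHAIN else "No"

for place in ["US", "Maryland", "UMBC", "ITE-RM210"]:
    print(f"isLocated({place}) => {answer(place)}")
```

Answering "Uncertain" instead of "No" matters. If the broker said "No" for wrong rooms and withheld only the correct one, an observer could infer the room by elimination.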
@
Cobra: Context Broker Architecture
- Ontology
- Agents
- Service
- Inference
- Policy
http://cobra.umbc.edu/
@
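Web-scale semantic web data access
[Figure: an agent talks to a data access service, which uses indexes of RDF data collected from the Web]
1. The agent asks the data access service: ask("person")
2. Search vocabulary: search URIrefs in the SW vocabulary, which informs "foaf:Person"
3. Compose query: ask("?x rdf:type foaf:Person")
4. Search URLs in the SWD index, which informs the matching doc URLs
5. Fetch docs from the Web and populate the local RDF database
6. Query the local RDF database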
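The six steps above can be sketched end to end, with in-memory dictionaries standing in for the vocabulary index, the SWD index, and the Web. All index contents and document URLs here are hypothetical. The sketch keeps each triple's source URL because blank-node labels are only meaningful inside their own document.

```python
TERM_INDEX = {"person": ["foaf:Person"]}                 # keyword -> SW terms
DOC_INDEX = {"foaf:Person": ["http://a.example/1.rdf", "http://b.example/2.rdf"]}
WEB = {
    "http://a.example/1.rdf": [("_:x", "rdf:type", "foaf:Person"), ("_:x", "foaf:name", "Ann")],
    "http://b.example/2.rdf": [("_:y", "rdf:type", "foaf:Person")],
}

def access(keyword):
    term = TERM_INDEX[keyword][0]                  # search vocabulary
    _, prop, cls = ("?x", "rdf:type", term)        # compose query pattern
    local_db = set()
    for url in DOC_INDEX.get(term, []):            # search URLs in SWD index
        for s, p, o in WEB[url]:                   # fetch docs, populate local DB
            local_db.add((url, s, p, o))           # keep source: bnodes are doc-local
    return [(url, s) for url, s, p, o in local_db  # query local RDF database
            if p == prop and o == cls]

print(sorted(access("person")))
```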
@
Swoogle Semantic Web Search Engine
- Harvests Semantic Web data from the Web
- Provides search/navigation services for machines (via REST + RDF/XML)
  - Digests of documents, terms, and namespaces
  - Links
- Also serves human users
Status
- Running since summer 2004
- 1.6M RDF documents, 300M RDF triples, 10K ontologies
http://swoogle.umbc.edu/
@
Ontology Dictionary
- From web of documents to web of data
- Aggregate from multiple sources
- Inductively learned definition
[Figure: the definitions of foaf:Person, foaf:Agent, and foaf:name are aggregated from Onto 1 and Onto 2 (rdf:type owl:Class, rdfs:subClassOf, rdfs:domain) and from instance data in SWD3 (a foaf:Person with foaf:name "Tim Finin" and dc:title "Dr."); usage observed in instance data is recorded with wob:hasInstanceDomain (e.g. foaf:Person, foaf:Agent)]
http://swoogle.umbc.edu/2005/modules.php?name=Ontology_Dictionary
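An "inductively learned definition" can be sketched as counting how properties are actually used. For every triple (s p o) whose subject s is typed C in the same document, C is recorded as an observed instance domain of p. The property name wob:hasInstanceDomain comes from the slide. The counting rule is an assumption for illustration, not necessarily Swoogle's exact method.

```python
from collections import Counter, defaultdict

# Instance data modeled on SWD3 in the figure
swds = {
    "SWD3": [("_:t", "rdf:type", "foaf:Person"),
             ("_:t", "foaf:name", "Tim Finin"),
             ("_:t", "dc:title", "Dr.")],
}

def instance_domains(documents):
    """Return property -> Counter of classes observed as its subjects' types."""
    learned = defaultdict(Counter)
    for triples in documents.values():
        types = defaultdict(set)                 # types are resolved per document
        for s, p, o in triples:
            if p == "rdf:type":
                types[s].add(o)
        for s, p, o in triples:
            if p != "rdf:type":
                for cls in types[s]:
                    learned[p][cls] += 1
    return learned

for prop, domains in sorted(instance_domains(swds).items()):
    for cls, n in domains.items():
        print(prop, "wob:hasInstanceDomain", cls, f"({n} observed)")
```

Note that dc:title shows up with domain foaf:Person even though no ontology declares this. Definitions learned from usage can therefore complement what the ontologies themselves say.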
@
Semantic Web Challenges - Winners
- 2003: CS AKTive Space (CAS) is an integrated Semantic Web application which provides a way to explore the UK Computer Science Research domain across multiple dimensions for multiple stakeholders, from funding agencies to individual researchers.
- 2004: Flink itself is also likely to be unique as a crossover between a social experiment and a semantic application.
- 2005: CONFOTO is a browsing and annotation service for conference photos.
http://challenge.semanticweb.org/
@
Triple Shop: SPARQL dataset finder
Example request: Who knows Anupam Joshi? Show me their names, email addresses and pictures.
1. Compose a SPARQL query without a FROM clause
2. Parse the SPARQL query, search Swoogle for related URLs, and compose a dataset
3. Run the SPARQL query on the dataset
http://sparql.cs.umbc.edu/tripleshop2/
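Step 3 centers on matching a SPARQL basic graph pattern against the composed dataset. This is a minimal sketch of that matching via backtracking over variable bindings. The query (shown as a comment) drops the pictures to stay short. The dataset and mailbox address are hypothetical, and real SPARQL engines also handle FILTER, OPTIONAL, datatypes and much more.

```python
#   SELECT ?name ?mbox WHERE {
#     ?x foaf:knows ?y .  ?y foaf:name "Anupam Joshi" .
#     ?x foaf:name ?name . ?x foaf:mbox ?mbox }
PATTERN = [("?x", "foaf:knows", "?y"),
           ("?y", "foaf:name", "Anupam Joshi"),
           ("?x", "foaf:name", "?name"),
           ("?x", "foaf:mbox", "?mbox")]

DATASET = {
    ("_:a", "foaf:name", "Anupam Joshi"),
    ("_:b", "foaf:name", "Tim Finin"), ("_:b", "foaf:mbox", "mailto:tim@example.org"),
    ("_:b", "foaf:knows", "_:a"),
    ("_:c", "foaf:name", "Li Ding"), ("_:c", "foaf:knows", "_:b"),
}

def match(term, value, binding):
    """Unify one pattern term with a data value; return the extended binding or None."""
    if term.startswith("?"):
        if term in binding:
            return binding if binding[term] == value else None
        return {**binding, term: value}
    return binding if term == value else None

def evaluate(patterns, data, binding=None):
    """Yield every binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in data:
        b = binding
        for term, value in zip(first, triple):
            b = match(term, value, b) if b is not None else None
        if b is not None:
            yield from evaluate(rest, data, b)   # bind the remaining patterns

for b in evaluate(PATTERN, DATASET):
    print(b["?name"], b["?mbox"])
```

Only Tim Finin matches. Li Ding knows _:b, but _:b's name is not "Anupam Joshi", so that branch is pruned at the second pattern.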
@
Integrating Social Networks
[Figure: three data sources are integrated:
- FOAF network: FOAF data in RDF/XML with "knows" links among people such as L. Ding, H. Chen, P. Kolari, J. Hendler, F. Perich, T. Finin, L. Kagal, A. Joshi
- Golbeck's Trust Network (J. Golbeck)
- DBLP coauthor network: a coauthor database published as HTML, with weighted co-author links among people such as L. Ding, Y. Peng, L. Kagal, T. Finin, A. Sheth, A. Joshi, M. P. Singh, H. Chen, F. Perich
Network structures labeled: hub, sink, island. Reputation systems shown for comparison: Citeseer Rank, Google PageRank]
- Trust
- Reputation
- Trust network computation
  - Entity mapping (sameName)
  - Tie strength
  - Trust aggregation
@
Inference Web Infrastructure
[Figure: question answerers and their languages (WWW, SDS, OWL-S/BPEL (DAML/SNRC), CWM (TAMI) with N3, JTP (DAML/NIMD) with KIF, SPARK (CALO) with SPARK-L, UIMA Text Analytics (NIMD/Exp Agg)) publish justifications in the Proof Markup Language (PML). The Inference Web toolkit covers trust, justification, and provenance:
- IWTrust: trust computation
- IW Explainer/Abstractor: end-user friendly visualization
- IWBrowser: expert friendly visualization
- IWSearch: search engine based publishing
- IWBase: provenance registration]
[Inference Web] Framework for explaining question answering tasks by abstracting, storing, exchanging, combining, annotating, filtering, segmenting, comparing, and rendering proofs and proof fragments provided by question answerers.
@
PML: Proof Markup Language
[Figure: a PML justification trace.
- Query foo:query1 ((type TonysSpecialty ?x)) isQueryFor Question foo:question1 ((what is Tony's Specialty)), with hasLanguage pointing to a Language
- The query hasAnswer NodeSet foo:ns1 ((hasConclusion …)); fromQuery and fromAnswer link the NodeSet back to the query
- Each NodeSet isConsequentOf an InferenceStep with hasInferenceEngine (InferenceEngine), hasRule (InferenceRule), hasAntecedent (e.g. NodeSet foo:ns2, (hasConclusion …)), and hasVariableMapping (Mapping)
- Steps that use sources record hasSourceUsage: a SourceUsage with hasSource (Source) and usageTime …
- Provenance entries (Language, InferenceEngine, InferenceRule, Source) are registered in IWBase]
@
IWBrowser – Justification and Provenance
@
Tracking Provenance via RDF Molecule
[Figure: an RDF graph G is decomposed into its RDF molecules; each molecule is then matched as a sub-graph against Web pages, discovered by Swoogle, that contain one or more molecules. Triples in G:
  t1: http://www.cs.umbc.edu/~dingli1 foaf:name "Li Ding"
  t2: http://www.cs.umbc.edu/~dingli1 foaf:knows (blank node)
  t3: (blank node) foaf:name "Tim Finin"
  t4: (blank node) foaf:mbox mailto:[email protected]]
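Decomposition into molecules can be sketched with union-find. Triples that share a blank node are merged into one molecule, and ground triples (no blank nodes) stand alone. This is an assumed simplification: the RDF molecule paper also uses functional and inverse-functional properties to split molecules further, which this sketch ignores. The triple labels follow the slide, and the mailbox address is hypothetical.

```python
def molecules(triples):
    """triples: dict label -> (s, p, o). Returns a list of sets of triple labels."""
    parent = {t: t for t in triples}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]    # path halving keeps trees shallow
            t = parent[t]
        return t

    first_seen = {}                          # blank node -> first triple using it
    for label, (s, _, o) in triples.items():
        for node in (s, o):
            if node.startswith("_:"):
                if node in first_seen:       # shared blank node: merge molecules
                    parent[find(label)] = find(first_seen[node])
                else:
                    first_seen[node] = label
    groups = {}
    for t in triples:
        groups.setdefault(find(t), set()).add(t)
    return sorted(groups.values(), key=sorted)

G = {
    "t1": ("http://www.cs.umbc.edu/~dingli1", "foaf:name", "Li Ding"),
    "t2": ("http://www.cs.umbc.edu/~dingli1", "foaf:knows", "_:f"),
    "t3": ("_:f", "foaf:name", "Tim Finin"),
    "t4": ("_:f", "foaf:mbox", "mailto:finin@example.org"),
}
print(molecules(G))
```

Each molecule is the smallest unit that can be matched against a Web page on its own. A blank node has no global name, so the triples that mention it only make sense together.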
Ding, L.; Finin, T.; Peng, Y.; Pinheiro da Silva, P.; McGuinness, D.L. Tracking RDF Graph Provenance using RDF Molecules. Proceedings of the Fourth International Semantic Web Conference (poster), November 2005.
http://www-ksl.stanford.edu/KSL_Abstracts/KSL-05-06.html
@
Conclusion
- The Semantic Web
  - simple but powerful
  - standardized by W3C: RDF, RDFS, OWL
  - but cannot do everything
- Current focuses
  - Query: SPARQL
  - Rules: SWRL, RIF
  - Web services: OWL-S, WSDL-S, SAWSDL
  - Best practice and deployment
- Open questions
  - Business model, industry adoption?
  - Privacy?
@
Recommended Readings
Tutorials
- Semantic Web Road map (since 1998), Tim Berners-Lee
- The Semantic Web, Scientific American, May 2001, Tim Berners-Lee, James Hendler and Ora Lassila
- Ontology Development 101: A Guide to Creating Your First Ontology, 2001, Natalya F. Noy and Deborah L. McGuinness
- Semantic Web Tutorials, http://www.w3.org/2001/sw/BestPractices/Tutorials
Starting points
- W3C Semantic Web activity, http://www.w3.org/2001/sw/
- W3C Semantic Web Interest Group, http://www.w3.org/2001/sw/interest/
- W3C Semantic Web News, http://www.w3.org/2001/sw/news
- Planet RDF - aggregated blogs, http://planetrdf.com/
- Dave Beckett's Resource Description Framework (RDF) Resource Guide
- Swoogle Semantic Web Search Engine, http://swoogle.umbc.edu
- Semantic Web reference card, http://ebiquity.umbc.edu/resource/html/id/94/
Conferences and Journals
- International Semantic Web Conference (ISWC)
- European Semantic Web Conference (ESWC)
- Semantic Technology Conference (SemTech)
- Journal of Web Semantics
@
Ongoing W3C Semantic Web Activity
- RDF Data Access Working Group
  - RDQL… => SPARQL
- Rules Interchange Working Group
  - RuleML => SWRL => RIF
- Best Practices Working Group
  - Vocabulary management, e.g. WordNet
  - Thesauri: SKOS (Simple Knowledge Organization System)
  - Image Annotation
  - DOAP (Description of a Project)
  - Many tutorials and demos
- Semantic Annotations for Web Services Description Language Working Group
  - OWL-S and WSDL-S
  - WSDL 2.0