http://ebiquity.umbc.edu/
Research Overview
23 August 2006
UMBC and Ebiquity
• UMBC is a research-extensive university with a
major focus on Information Technology
• Ebiquity is a large and active research group with
the goal of
“Building intelligent systems in open,
heterogeneous, dynamic, distributed
environments”
• Current research includes mobile and pervasive
computing, security/trust/privacy, semantic web,
multiagent systems, advanced databases, and high
performance computing
10/7/2015
Page 2
People and funding
• Faculty: Finin, Yesha, Joshi, Peng, Halem
• Colleagues: Oates, desJardins, Pinkston, Segall, …
• Students: ~10 PhD, ~10 MS, ~5 undergrad
• Funding
• Current: DARPA (Trauma Pod, STTRs), NSF (two
ITRs, Cybertrust, NSG, …), Intelligence community,
NASA, NIST, Industry (IBM, Fujitsu, …)
• Recent: DARPA (CoABS, GENOA II, DAML), NSF
(CAREER)
Ebiquity Research Space
[Figure: map of research areas. Area labels: AI, Intelligent Information Systems, DB, Networking & Systems, Security. Topics shown: KR, machine learning, user modeling, web services/SOC, semantic web, information extraction, IR, data mining, knowledge management, wearable computing, HPCC, wireless, mobility, context awareness, pervasive computing, policies, assurance, trust, intrusion detection, privacy, DRM.]
Ebiquity Research Space
[Figure: research-area map with the group's goal overlaid: “Building intelligent systems in open, heterogeneous, dynamic, distributed environments”. Area labels: AI, Intelligent Information Systems, DB, Networking & Systems, Security. Topics shown: language technology, robotics, HCI, planning, KR, user modeling, semantic web, data mining, machine learning, web services, knowledge management, IR, service oriented computing, wearable computing, policies, wireless, mobility, context awareness, pervasive computing, intrusion detection, assurance, privacy, trust.]
Some Current and Recent Projects
Pervasive and mobile computing
(1) Trauma Pod
(2) Context aware pervasive computing
(3) Mogatu: Tivo for mobile computing
Semantic Web
(4) Swoogle: searching and indexing Semantic Web data
(5) Semnews: text understanding and extraction
(6) Agents and the Semantic Web
(7) Spire: Semantic Web for data discovery and integration
Security and trust
(8) Semantic policy languages
(9) Semdis: Discovering Semantic Links
(10) Securing ad hoc networks
(11) Privacy for passive RFID tags
Information extraction and retrieval
(12) Recognizing spam weblogs
(13) Extracting opinions from weblogs
(14) Modeling the Spread of Influence on the Blogosphere
Semantic Web
"The Semantic Web is an
extension of the current web
in which information is given
well-defined meaning, better
enabling computers and
people to work in cooperation."
-- Berners-Lee, Hendler and Lassila, The Semantic
Web, Scientific American, 2001
Swoogle Search
[Figure: Swoogle architecture. Components shown, by layer:]
• Discovery: Bounded Web Crawler, Google Crawler, SwoogleBot, Candidate URLs
• Index: IR Indexer, SWD Indexer, Semantic Web metadata, document cache
• Analysis: SWD classifier, Ranking, …
• Search Services: Web Server (html, for humans), Web Service (rdf/xml, for machines)
• Doc. & Term Ranking: OntoRank for RDF documents; TermRank for semantic web terms (RDF terms)
• Other labels: Ontology Dictionary, Swoogle Statistics, Semantic Web Archive, Swoogle Triple Shop
Swoogle Statistics (July 2006)
• SW docs.: 1.6M
• classes: 1.3M
• embedded: 400K
• properties: 175K
• triples: 300M
• individuals: 43.1M
• ontologies: 10K
• registered users: 417
http://swoogle.umbc.edu/
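The slide names OntoRank as Swoogle's ranking for RDF documents but gives no details. As a rough illustration of link-based document ranking in general, here is a minimal PageRank-style power iteration. It is a stand-in under stated assumptions, not Swoogle's actual algorithm: the function name, the damping value, and the example document names are all hypothetical.

```python
def rank_documents(links, damping=0.85, iterations=50):
    """PageRank-style power iteration (a generic stand-in, not OntoRank itself).

    links maps each document to the list of documents it references.
    """
    docs = set(links) | {d for targets in links.values() for d in targets}
    n = len(docs)
    rank = {d: 1.0 / n for d in docs}
    for _ in range(iterations):
        new_rank = {d: (1.0 - damping) / n for d in docs}
        for doc in docs:
            targets = links.get(doc, [])
            if targets:
                # Split this document's rank among the documents it references.
                share = damping * rank[doc] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling document: spread its rank evenly over all documents.
                share = damping * rank[doc] / n
                for d in docs:
                    new_rank[d] += share
        rank = new_rank
    return rank


# Hypothetical documents: two instance files that reference two ontologies.
example_links = {
    "profile.rdf": ["foaf.owl"],
    "feed.rdf": ["foaf.owl", "rss.owl"],
    "foaf.owl": [],
    "rss.owl": [],
}
ranks = rank_documents(example_links)
```

In this toy graph the ontology referenced by both instance files ends up ranked highest, which is the intuition behind ranking ontologies by how widely they are used.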
Filip Perich
Contributors: Tim Finin, Li Ding, Rong Pan, Pavan Reddivari, Pranam Kolari, Akshay Java, Anupam Joshi, Yun Peng, R. Scott Cost, Jim Mayfield, Joel Sachs, and Drew Ogle. Partial research support was provided by DARPA contract F30602-00-0591 and by NSF awards NSF-ITR-IIS-0326460 and NSF-ITR-IDM-0219649. April 2005.
Ebiquity • CAIN • RMBL • Mindswap
http://spire.umbc.edu/
SPIRE: Semantic Prototypes in Research Ecoinformatics
What are likely predators and prey of an invader in a new environment?
ELVIS: Ecosystem Location Visualization and Information System
• Species List Constructor: Click a county, get a species list.
• Food Web Constructor: Predict food web links using database and taxonomic reasoning.
• Evidence Provider: Examine evidence for predicted links.
Web Ontologies
For intelligent agents
• SpireEcoConcepts: food web database concepts and results of Food Web Constructor
• ETHAN
For humans
Predictions
$$CertaintyIdx_{XY} = \sum_{i=1}^{N} weight_i \cdot discount(LinkValue)$$
$$Weight_{AB} = \frac{1}{1 + (Distance_{XA} \cdot Penalty_{XA}) + (Distance_{YB} \cdot Penalty_{YB})}$$
In a new estuary, Nile tilapia (Oreochromis niloticus) could compete with ostracods (green) to eat algae. Predators (red) and prey (blue) of ostracods may be affected.
Evolutionary Trees and Natural History: for online scientific databases about species and higher taxonomic levels.
For machines
TripleShop: A SPARQL Workshop for Triples
Use the semantic web to find and share body masses of fish that eat fish.
• Query: Enter a SPARQL query.
• Create a dataset: Find semantic web docs that can answer the query.
• Get results: Apply the query to the dataset with semantic reasoning.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX spec: <http://spire.umbc.edu/ontologies/SpireEcoConcepts.owl#>
PREFIX ethan: <http://spire.umbc.edu/ontologies/ethan.owl#>
PREFIX kw: <http://spire.umbc.edu/ontologies/ethan_keywords.owl#>
SELECT DISTINCT ?predator ?prey ?preymaxmass ?predatormaxmass
WHERE {
  ?link rdf:type spec:ConfirmedFoodWebLink .
  ?link spec:predator ?predator .
  ?link spec:prey ?prey .
  ?predator rdfs:subClassOf ethan:Actinopterygii .
  ?prey rdfs:subClassOf ethan:Actinopterygii .
  OPTIONAL { ?prey kw:mass_kg_high ?preymaxmass } .
  OPTIONAL { ?predator kw:mass_kg_high ?predatormaxmass }
}
Esox_lucius.owl • webs_publisher.php?published_study=11 • Actinopterygii.owl
Name, tag, and share with group members or public.
Export results for further analysis
(e.g. in Food Web Constructor).
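To make the query's behavior concrete, the sketch below evaluates the same graph pattern over a tiny hand-made set of triples in plain Python. It is an illustration only: the triples (including the `ex:` names and the mass value) are invented, it is not a SPARQL engine, and it matches `rdfs:subClassOf` directly rather than applying the semantic reasoning TripleShop uses.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
FISH = "ethan:Actinopterygii"

# Invented example triples; real data would come from the documents TripleShop finds.
triples = {
    ("ex:link1", RDF_TYPE, "spec:ConfirmedFoodWebLink"),
    ("ex:link1", "spec:predator", "ethan:Esox_lucius"),
    ("ex:link1", "spec:prey", "ex:PreyFish"),
    ("ex:link2", RDF_TYPE, "spec:ConfirmedFoodWebLink"),
    ("ex:link2", "spec:predator", "ethan:Esox_lucius"),
    ("ex:link2", "spec:prey", "ex:SomeInsect"),
    ("ethan:Esox_lucius", SUBCLASS, FISH),
    ("ex:PreyFish", SUBCLASS, FISH),
    ("ethan:Esox_lucius", "kw:mass_kg_high", 10.0),  # made-up value
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the dataset."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def fish_eating_fish():
    """Rows (predator, prey, prey_max_mass, predator_max_mass), as in the SELECT."""
    rows = set()
    for link, predicate, obj in triples:
        if predicate != RDF_TYPE or obj != "spec:ConfirmedFoodWebLink":
            continue
        for predator in objects(link, "spec:predator"):
            for prey in objects(link, "spec:prey"):
                # Both ends of the link must be (direct) subclasses of Actinopterygii.
                if FISH not in objects(predator, SUBCLASS):
                    continue
                if FISH not in objects(prey, SUBCLASS):
                    continue
                # OPTIONAL: keep the row even when a mass is missing (None).
                for prey_mass in objects(prey, "kw:mass_kg_high") or [None]:
                    for predator_mass in objects(predator, "kw:mass_kg_high") or [None]:
                        rows.add((predator, prey, prey_mass, predator_mass))
    return rows


results = fish_eating_fish()
```

The second link is dropped because its prey is not a fish, while the first link is kept even though the prey has no recorded mass. That is the practical effect of the two OPTIONAL clauses.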
UMich Animal Diversity Web
http://www.animaldiversity.org
SEMDIS
On Homeland Security and the Semantic Web:
a Provenance and Trust Aware Inference Framework
Semantic Association Discovery and Evaluation
1 Motivation
Semantic association between X and Bin Laden
• Provenance
• Multiple sources contribute unique fragments of association
• Multiple sources confirm a fragment in different belief states
• Rank: only some discovered associations are interesting
• Trust: some information sources are not sufficiently trustworthy
UMBC
an Honors University in Maryland
2 Architecture
Collaborative implementation
• University of Georgia
• Extracting knowledge from the Web
• Discovering complex semantic association
• Ranking semantic association by content
• UMBC
• Tracking provenance of semantic association
• Trusting semantic association by context
• Enabling best-first search using trust heuristics
3 Provenance
Provenance of an RDF graph or sub-graph
• Three kinds of provenance for an RDF graph G
• where-provenance: the web documents that serialize G
• whom-provenance: the person who created/published G
• why-provenance: the RDF graphs which logically imply G
4 Trust
Trustworthiness of an RDF graph
The hypothesis “Mr X is associated with Bin Laden” is proved by a four-triple semantic association (SA). How should the SA's trustworthiness be evaluated?
S1: eg:MrX eg:isPresidentOf eg:companyA
S2: eg:organizationB eg:invests eg:companyA
S3: eg:organizationB eg:isOwnedBy eg:MrY
S4: eg:MrY eg:relatesTo eg:BinLaden
Trust relation between agents helps propagate belief states
case1: [belief concatenation] exactly one source per triple
case2: [belief aggregation] multiple sources for a triple
case3: [social dependency] sources are dependent through a social network
RDF graph provenance service
• Observations
• provenance information is part of context information
• provenance is not required for most inference tasks
• provenance is useful for context based trust analysis
• provenance can be used to group knowledge
• Approach
• provide a stand alone service that queries provenance of a given
RDF graph or sub-graph
We assume all triples are semantically independent
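The first two trust cases can be sketched numerically. The poster does not give its formulas, so the combination rules below are assumptions chosen for illustration: a product for belief concatenation (which relies on the stated independence of triples) and a noisy-OR for belief aggregation (which additionally assumes independent sources). Case 3, social dependency, is not modeled, and the belief values are invented.

```python
from math import prod


def concatenate(beliefs):
    """Case 1 (assumed rule): every triple must hold, so multiply the beliefs."""
    return prod(beliefs)


def aggregate(source_beliefs):
    """Case 2 (assumed rule): noisy-OR over independent sources for one triple."""
    return 1.0 - prod(1.0 - b for b in source_beliefs)


def association_trust(per_triple_sources):
    """Trust in a semantic association given the source beliefs for each triple."""
    return concatenate(aggregate(sources) for sources in per_triple_sources)


# Invented belief values for S1..S4; S3 is asserted by two sources.
beliefs_for_sa = [[0.9], [0.8], [0.5, 0.5], [0.7]]
trust = association_trust(beliefs_for_sa)
```

With one source per triple the result reduces to the plain product, so a single weakly trusted triple lowers trust in the whole association.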
http://semnews.umbc.edu/
SemNews: A Semantic News Framework
Facts from NL: provides an RDF version of the news.
[Figure: NLP tools connect the Web of documents (WWW: natural language text, images, audio, video) to the Web of data (Semantic Web: RDF/OWL ontologies, instances, triples).]
• Intelligent agents need knowledge and information.
• The majority of content on the Web remains in natural language.
• The Semantic Web (SW) can benefit NLP tools in their language understanding task.
Semanticizing RSS
[Figure: SemNews architecture with numbered steps 1 to 15. Components shown: News Feeds, RSS Aggregator (Data Aggregators), OntoSem (Language Processing), TMRs, Fact Repository (FR), Dekade Editor (Knowledge Editor Environment), OntoSem2OWL, OntoSem Ontology (OWL), Inferred Triples, Fact Repository Interface (Ontology & Instance browser, Text Search, RDQL Query, Swoogle Index, Semantic RSS), Semantic Web Tools.]
• Browsing Facts: View structured representation of RSS news stories. Browse facts, not just news.
• Ontologically linked News: Find news stories by browsing through the OntoSem ontology.
• Tracking Named Entities: Find stories about a specific named entity.
• Semantic Alerts: Alerts can be specified as ontological concepts, keywords, or RDQL queries. Subscribe to the results as an RSS feed.
• Semantic Queries (RDQL): Structured queries over text converted to RDF representation.
Ontological Semantics
[Figure labels: NL Text, Lexicon, OntoSem Ontology, TMR, Fact Repository.]
• OntoSem is an NLP system that processes text and converts it into facts.
• It is supported by a constructed world model encoded in a rich ontology. Its ontology has more than 8000 concepts, with an average of 16 properties per concept.
OntoSem to OWL
[Figure labels: OntoSem Ontology, TMR, OntoSem2OWL, OWL Ontology, TMRs in OWL.]
OntoSem2OWL is a rule-based conversion engine that maps the frame-based OntoSem ontology and fact repository to OWL. Over 102189 triples were generated.
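A rule-based mapping from frames to triples can be pictured as follows. This is a minimal sketch with two hypothetical rules, not the actual OntoSem2OWL rule set: the frame layout (a concept with slots and fillers), the slot names, the base URI, and the example concept are all assumptions made for illustration.

```python
def frame_to_triples(concept, frame, base="http://example.org/ontosem#"):
    """Map one frame (slot -> list of fillers) to (subject, predicate, object) triples.

    Hypothetical rules:
      * every concept becomes an owl:Class
      * an "is-a" slot becomes rdfs:subClassOf
      * any other slot becomes a property with the concept as domain
        and each filler as range
    """
    subject = base + concept
    triples = [(subject, "rdf:type", "owl:Class")]
    for slot, fillers in frame.items():
        for filler in fillers:
            if slot == "is-a":
                triples.append((subject, "rdfs:subClassOf", base + filler))
            else:
                prop = base + slot
                triples.append((prop, "rdfs:domain", subject))
                triples.append((prop, "rdfs:range", base + filler))
    return triples


# Invented frame for illustration.
dog_frame = {"is-a": ["mammal"], "agent-of": ["bark"]}
dog_triples = frame_to_triples("dog", dog_frame)
```

Keeping each rule as a small, separate branch makes it easy to see which frame construct produced which triple. That matters when a frame language has constructs with no direct OWL counterpart.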
Analyzing Weblogs
• Blogs are an important new communication technology
• Social networking + user generated content
• On the Blogosphere or in an Intranet (e.g., Sun, Intel)
• Modeling and understanding blog-based systems
• Extracting and computing metadata
• Topic modeling and splog detection
• Community recognition; influence and information flow
• Opinion extraction (TREC 06)
• More to come: fact extraction, recognizing bias, trend mapping, event detection, monitoring for “surprises”
• If the Web is our common “brain”, the Blogosphere is
its consciousness
Detecting Spam Blogs: A Machine Learning Approach
What are Spam Blogs (Splogs)?
• Blogs hosting machine generated posts, each adding to web spam
• Posts have content hijacked from other blogs and/or stuffed keywords
• Posts with links interspersed between random text
• Blogs with context based ads to fool users into clicking ads (See 3)
A Case of Content Plagiarism
[Figures: (1) Original Post by Ebiquity, (2) Infiltration in Search Results, (3) A splog result]
Why are Splogs a Problem?
• Splogs undermine ranking algorithms (See 6)
• Splogs water down search results (See 6)
• Splogs threaten the Web advertising model (See 3)
• Splogs indulge in “plagiarism” (See 1,2,3)
• Splogs skew results of social research tools (See 4)
• Splogs stress the Blogosphere infrastructure (See 4,5)
[Figure: splog software advertisement. Text shown: $197; “Holy Grail Of Advertising...”; “Easy Dominate Any Market, Any Search Engine, Any Keyword”]
This Work
• Formalizes the Splog Detection Problem
• Supervised Machine Learning Technique
• Training set of 1400 hand labeled examples
• Effectiveness of Specialized features
• Local Models for Fast Splog Detection
• Global link-based models good for (delayed) detection
• Precision/Recall of 87% for bag-of-words
• See http://memeta.umbc.edu/splog/
[Figures: AUC versus Feature Size (100, 500, 1000, 5000, 10000) for the Linear, RBF, and Sigmoid series, with one panel each for URL Tokens as Features, Anchor Text as Features, 2-Word Grams as Features, and Words as Features. Further panels: Link (Global) Distribution; AUC versus Probability Thresholds for Local Model.]
Blog Features: We, what, was, my, org, flickr, paper, words, me, thank, go, archives
Splog Features: Find, info, news, website, best, articles, perfect, Products, uncategorized, hot, Resources, inc, copyright
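The poster compares several feature types (words, 2-word grams, URL tokens, anchor text) without showing how they are extracted. The sketch below shows one plausible way to build such bag-of-features counts. The tokenization rule and function names are assumptions, and the classifier that would consume these counts is not shown.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[a-z0-9]+")


def word_features(text):
    """Bag-of-words: lowercase alphanumeric tokens and their counts."""
    return Counter(TOKEN.findall(text.lower()))


def bigram_features(text):
    """2-word grams: counts of adjacent token pairs."""
    words = TOKEN.findall(text.lower())
    return Counter(zip(words, words[1:]))


def url_token_features(url):
    """URL tokens: split the URL on every non-alphanumeric character."""
    return Counter(TOKEN.findall(url.lower()))


# Hypothetical inputs for illustration.
post_words = word_features("Find the best news, find info")
post_bigrams = bigram_features("Find the best news, find info")
url_tokens = url_token_features("http://best-cheap-deals.example.com/")
```

URL tokens are attractive for a fast local model because they are available before the page content is even fetched.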
Feeds That Matter
General Statistics
• 83,204 publicly listed subscribers.
• 2,786,687 feeds of which 496,879 are unique.
• 26,2436 users (35%) use folders to organize subscriptions
• Data collected in May 2006.
[Figure captions:]
• The number of subscribers per feed follows a power law distribution.
• The distribution of domains in the Bloglines dataset.
• The number of folders per user. Most users tend to use a modest number of folders.
• Scatter plot showing the relation between the number of folders and the number of feeds subscribed. As more feeds are subscribed, users tend to organize feeds into folders.
Applications
• Feed Recommendations: Two feeds are similar if they are categorized under similar folder names. The chart shows the feed recommendations and the corresponding text-based cosine similarity.
• Finding Influential Feeds for a Topic: Leading blogs on the topic “Politics”. The seed set is the top blogs in “politics” from Bloglines, and the blog graph used is from the Blogpulse dataset.
Feeds That Matter
Tag Cloud Before Merge: tag cloud generated by using the folder names as labels (top 200 folder names).
Tag Cloud After Merge: tag cloud generated by merging related folders (top 200 folder names).
Folder names are used as a substitute for topics. A lower ranked folder is merged into a higher ranked folder if there is an overlap and high cosine similarity.
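The merge rule can be sketched as follows, under several assumptions the slide does not state: a folder's rank is its total subscription count, “overlap” means one folder name contains the other, each folder is represented by a vector of feed counts, and the similarity threshold is 0.5. The folder and feed names are invented.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def merge_folders(folders, threshold=0.5):
    """Merge lower ranked folders into similar, higher ranked ones.

    folders maps a folder name to a Counter of feed -> subscription count.
    """
    # Higher total count means higher rank; ties are broken by name.
    ranked = sorted(folders, key=lambda name: (-sum(folders[name].values()), name))
    merged = {}
    for name in ranked:
        vector = folders[name]
        target = None
        for kept in merged:
            names_overlap = name in kept or kept in name
            if names_overlap and cosine(vector, merged[kept]) >= threshold:
                target = kept
                break
        if target is None:
            merged[name] = Counter(vector)
        else:
            merged[target].update(vector)  # add the merged folder's counts
    return merged


# Invented folders for illustration.
example_folders = {
    "politics": Counter({"feedA": 5, "feedB": 3}),
    "us politics": Counter({"feedA": 2, "feedB": 1}),
    "tech": Counter({"feedC": 4}),
}
merged_folders = merge_folders(example_folders)
```

Requiring both a name overlap and a high cosine similarity keeps folders with similar names but unrelated feeds apart.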
Opinion extraction from Blogs
• Data: 3M posts from 83K blogs over two weeks
• Task: given a topic (e.g., March of the Penguins), find
blog posts that express an opinion about it.
• Some features of our system:
• Clean data by removing splogs (~12% of posts) and non-content (e.g., ads, headers, footers, blogrolls)
• Use Google to find opinion words relevant to the topic and
induced topic category (e.g., heavy=bad for digital cameras)
• Multiple hand-crafted heuristic scoring functions to measure
opinionatedness given the topic phrase
• Train an SVM to learn appropriate scorer weights
Security and Trust in Open Environments
• Many new information systems are open, heterogeneous
and dynamic
• Examples: the web, web services, P2P systems, Grid
computing, pervasive computing, MANETs, etc.
• Providing security and privacy in such systems is
challenging
• We cannot rely on traditional authentication-based
schemes
• Recognizing “bad actors” in such systems is hard
• We are exploring new approaches using computational
policies, trust and reputation.
Trust & Security for the Semantic Web
• Autonomous agents need policies as “norms of behavior”
• In OS, networking, data management, applications, multiagent systems, pervasive environments, etc.
• Especially to secure complex open, distributed, dynamic environments
An early policy for agents:
1 A robot may not injure a human being, or, through inaction, allow a human being to come to harm.
2 A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
3 A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
• Traditional “hard coded” rules like DB access
control & file permissions depending on known
entities won’t work!
• Trust associations based on attributes are needed
• Interesting issues abound, like how to
• Resolve conflicts among agents governed by multiple policies
• Enforce policies via sanctions, reputation, escalation, etc.
• Modify policies dynamically according to context
• Make policy engineering easier than software engineering
Rei Policy Language
• Developed several versions of Rei, a policy
specification language, encoded in (1)
Prolog, (2) RDFS, (3) OWL
• Used to model different kinds of policies
• Authorization for services
• Privacy in pervasive computing and the web
• Conversations between agents
• Team formation, collaboration & maintenance
• The OWL grounding enables policies that
reason over SW descriptions of actions,
agents, targets and context
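To give a feel for attribute-based policies of this kind, here is a minimal sketch of rules that permit or prohibit an action based on properties of the agent and the context. It does not use Rei's syntax or engine: the rule structure, the attribute names, and the choice that a prohibition overrides a permission are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    effect: str                                # "permit" or "prohibit"
    action: str
    agent_attrs: frozenset                     # attributes the agent must have
    context_attrs: frozenset = frozenset()     # attributes the context must have


def decide(rules, action, agent_attrs, context_attrs=frozenset()):
    """Return "permit", "prohibit", or "undecided" for a requested action."""
    matched = [
        rule.effect
        for rule in rules
        if rule.action == action
        and rule.agent_attrs <= set(agent_attrs)
        and rule.context_attrs <= set(context_attrs)
    ]
    # Assumed conflict resolution: a matching prohibition always wins.
    if "prohibit" in matched:
        return "prohibit"
    if "permit" in matched:
        return "permit"
    return "undecided"


# Hypothetical policy: students may print, but not after hours.
policy = [
    Rule("permit", "use_printer", frozenset({"student"})),
    Rule("prohibit", "use_printer", frozenset({"student"}), frozenset({"after_hours"})),
]
daytime = decide(policy, "use_printer", {"student"})
night = decide(policy, "use_printer", {"student"}, {"after_hours"})
visitor = decide(policy, "use_printer", {"visitor"})
```

Because the rules refer to attributes rather than named individuals, the same policy applies to agents the system has never seen before. That is the point of moving away from hard-coded access lists.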
[Figure: Rei implementation stack. Layers shown: USER, JAVA API, REI INTERFACE, YAJXB, REI, FOWL, FLORA, XSB.]
Applications – past, present & future
• Coordinating access in supply chain
management system
• Authorization policies in a pervasive
computing environment
• Policies for team formation, collaboration,
information flow in multi-agent systems
• Security in semantic web services
• Privacy and trust on the Internet
• Privacy in pervasive computing
environments
[Timeline shown alongside the list: 1999, 2002, 2003, …, 2004, …]
Enhancing Web Privacy via Policies and Trust
Motivation
[Figure: P3P Compliance. Compliant versus non-compliant sites, August 2002 and January 2004 (top 5000 sites, January 2004).]
[Figure: Consumer Confidence. Consumer trust in published privacy policies: trust versus distrust of website policies.]
Key Points
• Web sites optionally publish P3P policies
• Clients specify privacy preferences using a policy language, for instance Rei
• Privacy Expert is the privacy enhancement enabler, binding together the entities of the system
• Rei Engine evaluates policies of users against website attributes
• Website Recommender Network propagates and builds a model of websites based on reputation
• FOAF enables the creation of the website recommender network
[Figure: W3C specified P3P Architecture. Labels: Web Server, publish (optionally), P3P Policy, Clients.]
[Figure: system architecture. Labels: Clients, Personal agents, Intelligent Privacy Proxy*, Privacy Expert, Rei Engine, XSLT Transformer, Rei Privacy Policy (RDF based, enhancements over APPEL), Ontologies, Trust rules, Website Recommender Network, FOAF Trusted Agent Network#, Web Server, publish.]
Website Evaluation Ontology
[Figure: example for www.slashdot.org. Properties shown: serviceType, popularity, hasP3P, owner, isBasedOutOf, hasPrivacyCertifier, hasTextPolicy, subDomainOf, policySimilarTo, lawEnforcedBy, hasPolicyEnforcement. Values shown: DiscussionGroup, 9, URI, OSDN, USA, US, Yes, --.]
Securing Ad-Hoc Networks
Monitoring and Response
• Active Response Framework
• Nodes Snoop Locally
• Send Signed Accusations to Other Nodes
• Each Node Makes Decision Locally based on Policy
• Accusations can be Corroborated and lead to increase in
reputation
• False Accusations Can Be Flagged and lead to loss of
reputation (or even sanctions)
• Nodes Can Choose Not To Communicate Through
Suspected Nodes
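The local bookkeeping behind such a framework might look like the sketch below, where each node keeps its own reputation table. It is a simplified illustration: the integer scores, step size, and threshold are invented, lowering the accused node's score on a corroborated accusation is an assumption, and signing and verifying accusations is not modeled.

```python
class ReputationTable:
    """One node's local view of the reputation of other nodes (scores 0 to 10)."""

    def __init__(self, initial=5, step=2, threshold=3):
        self.initial = initial
        self.step = step
        self.threshold = threshold
        self.scores = {}

    def score(self, node):
        return self.scores.get(node, self.initial)

    def _adjust(self, node, delta):
        self.scores[node] = max(0, min(10, self.score(node) + delta))

    def corroborated(self, accuser, accused):
        """A corroborated accusation raises the accuser and lowers the accused."""
        self._adjust(accuser, +self.step)
        self._adjust(accused, -self.step)

    def false_accusation(self, accuser):
        """A flagged false accusation costs the accuser reputation."""
        self._adjust(accuser, -self.step)

    def usable(self, node):
        """Local policy: only communicate through nodes at or above the threshold."""
        return self.score(node) >= self.threshold

    def usable_routes(self, routes):
        """Keep only routes whose nodes are all currently usable."""
        return [route for route in routes if all(self.usable(n) for n in route)]


# Hypothetical scenario: node X is accused twice and both accusations hold up.
table = ReputationTable()
table.corroborated("A", "X")
still_usable = table.usable("X")
table.corroborated("B", "X")
routes = table.usable_routes([["A", "X", "C"], ["A", "B", "C"]])
```

Since every node applies its own table and threshold, no central authority is needed and different nodes may reach different decisions about the same neighbor.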
SWANS: Secure and Adaptive WSNs
• A holistic policy driven approach to designing secure
and adaptive wireless sensor networks
• Secure self-organization
• Centralized and distributed protocols
• State determination
• Parameters to define “raw” state
• Node-level logical construct to identify complete state
• Network-level logical construct to help identify global state
• A set of policies to adapt to changes in state
ORs will be data rich
[Figure: data sources shown: Drugs, Tools, CAST, Patient Monitors, Staff]
• ORs will be awash in low-level data, much of it noisy or
incomplete
• Challenges include coping with the noise and interpreting the
low-level data to recognize high-level events and activities
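One simple way to picture both challenges is to smooth a noisy stream of raw sensor reads and then turn the smoothed signal into higher-level events. The sketch below does this with a sliding-window majority vote. The technique, the window size, the tool name, and the sample readings are illustrative assumptions, not the system described in these slides.

```python
from collections import deque


def smooth_presence(readings, window=3):
    """Majority vote over the last `window` raw reads (True means the tag was seen)."""
    recent = deque(maxlen=window)
    smoothed = []
    for seen in readings:
        recent.append(seen)
        smoothed.append(sum(recent) * 2 > len(recent))
    return smoothed


def detect_events(smoothed, name):
    """Turn a smoothed presence signal into entered/left events with time indexes."""
    events = []
    previous = False
    for t, present in enumerate(smoothed):
        if present and not previous:
            events.append((t, name + " entered"))
        elif previous and not present:
            events.append((t, name + " left"))
        previous = present
    return events


# Invented raw reads for one tagged tool; the read at index 3 is a dropout.
raw_reads = [False, True, True, False, True, True, False, False, False]
events = detect_events(smooth_presence(raw_reads), "scalpel")
```

The single missed read does not produce a spurious "left" event, at the cost of reporting real changes a step or two late.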
System Architecture
[Figure: components shown: Patient Monitor, Medicines, Tools, Staff, RFID System, Video Clipper, Stream Processor (TelegraphCQ), Continuous Queries, Trend Analyzer, Context Aware Agent, Rule Base, and a Database holding Patient History, Medical Encounter Record, Staff, and Medical Supplies.]