Ontologies and Semantic
Applications in Earth Sciences
Peter Fox (TWC/RPI; formerly HAO/NCAR)
Thanks to many.
Projects funded by NSF/OCI and NASA/ACCESS/ESTO
20081118 Fox OOS meeting
Background
Scientists should be able to access a global, distributed
knowledge base of scientific data that:
• appears to be integrated
• appears to be locally available
But… data is obtained by multiple means (models and
instruments), using various protocols, in differing
vocabularies, using (sometimes unstated)
assumptions, with inconsistent (or non-existent)
metadata. It may be inconsistent, incomplete,
evolving, and distributed.
And… there exist(ed) significant levels of semantic
heterogeneity, large-scale data, complex data
types, legacy systems, and inflexible and unsustainable
implementation technology.
[Figure: Data-types as service; limited interoperability]
• Applications: VO App1, VO App2, VO App3
• VO layer (lightweight semantics; limited meaning, hard coded): VOTable, Simple Image Access Protocol, Simple Spectrum Access Protocol, Simple Time Access Protocol
• Open Geospatial Consortium: Web {Feature, Coverage, Mapping} Service
• Sensor Web Enablement: Sensor {Observation, Planning, Analysis} Service
• Other labels: "use the same approach", "Limited extensibility", "Under review"
• Data sources: DB1, DB2, DB3, …, DBn
[Figure: "Knowledge" as service! Semantic interoperability]
• Added value: education, clearinghouses, other services, disciplines, etc.
• Access points: VO Portal, Web Serv., VO API
• Semantic mediation layer - mid-upper-level
• Semantic mediation layer - VSTO - low level
• Side labels: "Semantic query, hypothesis and inference"; "Query, access and use of data"; "Metadata, schema, data"; "Standard, or not, vocabularies and schema"
• Data sources: DB1, DB2, DB3, …, DBn
Mediation Layer
• Ontology - capturing concepts of Parameters, Instruments, Date/Time, Data Product (and associated classes, properties) and Service Classes
• Maps queries to underlying data
• Generates access requests for metadata, data
• Allows queries, reasoning, analysis, new hypothesis generation, testing, explanation, etc.
20080602 Fox VSTO et al.
Semantic Web Methodology and
Technology Development Process
• Establish and improve a well-defined methodology vision for
Semantic Technology based application development
• Leverage any existing vocabularies
[Figure: iterative development cycle with the stages Use Case; Small Team, mixed skills; Analysis; Develop model/ontology; Use Tools; Science/Expert Review & Iteration; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy]
E.g. Science and technical use cases
Find data which represents the state of the neutral
atmosphere anywhere above 100km and toward the
arctic circle (above 45N) at any time of high
geomagnetic activity.
– Extract information from the use-case - encode knowledge
– Translate this into a complete query for data - inference and
integration of data from instruments, indices and models
Provide semantically-enabled, smart data query services
via a SOAP web service for the Virtual Ionosphere-Thermosphere-Mesosphere Observatory that retrieve
data, filtered by constraints on Instrument, Date-Time,
and Parameter in any order and with constraints
included in any combination.
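The idea of constraints supplied "in any order and in any combination" can be sketched with optional filters. This is a minimal illustration, not the VSTO service interface: the `DataRecord` fields, the `query` function, and the catalogue entries are all hypothetical names invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DataRecord:
    """One catalogue entry; fields are illustrative, not the VSTO schema."""
    instrument: str
    parameter: str
    start: datetime
    end: datetime


def query(
    records: List[DataRecord],
    instrument: Optional[str] = None,
    parameter: Optional[str] = None,
    time_range: Optional[Tuple[datetime, datetime]] = None,
) -> List[DataRecord]:
    """Return records matching every constraint that was supplied.

    Each constraint is optional, so callers may give none, one, or any
    combination, and keyword arguments may appear in any order.
    """
    matches = []
    for rec in records:
        if instrument is not None and rec.instrument != instrument:
            continue
        if parameter is not None and rec.parameter != parameter:
            continue
        if time_range is not None:
            lo, hi = time_range
            # Keep the record only if its coverage overlaps the requested range.
            if rec.end < lo or rec.start > hi:
                continue
        matches.append(rec)
    return matches


# Hypothetical catalogue used only for this illustration.
CATALOGUE = [
    DataRecord("instrument_a", "neutral_temperature",
               datetime(2005, 1, 1), datetime(2005, 1, 31)),
    DataRecord("instrument_a", "neutral_wind",
               datetime(2005, 2, 1), datetime(2005, 2, 28)),
    DataRecord("instrument_b", "neutral_temperature",
               datetime(2005, 1, 15), datetime(2005, 3, 1)),
]
```

A call such as `query(CATALOGUE, parameter="neutral_temperature", time_range=(datetime(2005, 2, 1), datetime(2005, 2, 10)))` constrains Parameter and Date-Time while leaving Instrument open.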
VSTO - semantics and ontologies in an operational
environment: vsto.hao.ucar.edu, www.vsto.org
[Figure labels: Web Service; Existing OPeNDAP Service]
Semantic Web Services
Semantic Web Services
The OWL document returned, which uses the VSTO ontology,
can be used either syntactically or semantically
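The difference between syntactic and semantic use of a returned document can be sketched as follows. This is a toy illustration under stated assumptions: the RDF/XML snippet, the `http://example.org/demo#` namespace, and the class names are invented and are not taken from the VSTO ontology, and the subclass walk is a stand-in for what an OWL reasoner would do, not a reasoner itself.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
EX = "http://example.org/demo#"  # hypothetical namespace

ABOUT = f"{{{RDF}}}about"
RESOURCE = f"{{{RDF}}}resource"

# Hypothetical document: a small class hierarchy plus one typed individual.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}"
                   xmlns:owl="{OWL}" xmlns:ex="{EX}">
  <owl:Class rdf:about="{EX}Instrument"/>
  <owl:Class rdf:about="{EX}OpticalInstrument">
    <rdfs:subClassOf rdf:resource="{EX}Instrument"/>
  </owl:Class>
  <owl:Class rdf:about="{EX}Interferometer">
    <rdfs:subClassOf rdf:resource="{EX}OpticalInstrument"/>
  </owl:Class>
  <ex:Interferometer rdf:about="{EX}device1"/>
</rdf:RDF>"""

root = ET.fromstring(DOC)


def syntactic_instances(class_local):
    """Syntactic use: match elements purely by their tag name."""
    return [el.get(ABOUT) for el in root.findall(f"{{{EX}}}{class_local}")]


def subclasses_of(class_iri):
    """Return class_iri plus every class reaching it via rdfs:subClassOf."""
    found = {class_iri}
    changed = True
    while changed:
        changed = False
        for cls in root.findall(f"{{{OWL}}}Class"):
            child = cls.get(ABOUT)
            for sup in cls.findall(f"{{{RDFS}}}subClassOf"):
                if sup.get(RESOURCE) in found and child not in found:
                    found.add(child)
                    changed = True
    return found


def semantic_instances(class_local):
    """Semantic use: also return individuals typed by any subclass."""
    wanted = subclasses_of(EX + class_local)
    result = []
    for el in root:
        # An element tag "{namespace}local" corresponds to the IRI namespace+local.
        iri = el.tag[1:].replace("}", "", 1)
        if iri in wanted:
            result.append(el.get(ABOUT))
    return result
```

Read syntactically, the document contains no `Instrument` element at all; read with the subclass axioms, `device1` is an `Instrument` because `Interferometer` is a subclass of `OpticalInstrument`, which is a subclass of `Instrument`.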
Semantic Web Benefits
• Unified/abstracted query workflow: Parameters, Instruments, Date-Time
across widely different disciplines
• Decreased input requirements for query: in one case reducing the
number of selections from eight to three
• Semantic query support: by using background ontologies and a
reasoner, our application has the opportunity to only expose coherent
queries (portal and services)
• Semantic integration: in the past users had to remember (and maintain
codes) to account for numerous different ways to combine and plot the
data whereas now semantic mediation provides the level of sensible data
integration required, and exposed as smart web services
– understanding of coordinate systems, relationships, data synthesis,
transformations, etc.
– returns independent variables and related parameters
• A broader range of potential users (PhD scientists, students,
professional research associates and those from outside the fields)
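The "only expose coherent queries" benefit can be sketched as a lookup that narrows the remaining choices once one selection is made. This is a minimal sketch assuming a hand-written table in place of the background ontology and reasoner; the instrument and parameter names, `MEASURES`, and both functions are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical knowledge: which parameters each instrument can measure.
# In the application described above this would come from the ontology.
MEASURES: Dict[str, Set[str]] = {
    "instrument_a": {"neutral_temperature", "neutral_wind"},
    "instrument_b": {"electron_density", "neutral_temperature"},
}


def coherent_parameters(instrument: Optional[str] = None) -> List[str]:
    """Parameters worth offering, given the instrument chosen so far."""
    if instrument is None:
        return sorted(set().union(*MEASURES.values()))
    return sorted(MEASURES.get(instrument, set()))


def coherent_instruments(parameter: Optional[str] = None) -> List[str]:
    """Instruments worth offering, given the parameter chosen so far."""
    return sorted(
        name
        for name, params in MEASURES.items()
        if parameter is None or parameter in params
    )
```

Because either selection can come first, the same table supports the workflow in any order: choosing `electron_density` leaves only `instrument_b`, and choosing `instrument_a` leaves only its two parameters.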
VSTO: http://vsto.hao.ucar.edu, http://www.vsto.org
http://dataportal.ucar.edu/schemas/vsto_all.owl
(1.0, 2.0 coming)
Fox RPI: Semantic Data Frameworks, May 14, 2008
Ingest/pipelines: problem definition
• Data is coming in faster, in greater volumes and outstripping our ability to
perform adequate quality control
• Data is being used in new ways and we frequently do not have sufficient
information on what happened to the data along the processing stages to
determine if it is suitable for a use we did not envision
• We often fail to capture, represent and propagate manually generated
information that needs to go with the data flows
• Each time we develop a new instrument, we develop a new data ingest
procedure, collect different metadata, and organize it differently. It is then
hard to use with previous projects
• The task of event determination and feature classification is onerous and we
don't do it until after we get the data
Use cases
• Who (person or program) added the comments
to the science data file for the best vignetted,
rectangular polarization brightness image from
January 26, 2005, 18:49:09 UT taken by the
ACOS Mark IV polarimeter?
• What were the cloud cover and atmospheric
seeing conditions during the local morning of
January 26, 2005 at MLSO?
• Find all good images on March 21, 2008.
• Why are the quick look images from March 21,
2008, 19:00 UT missing?
• Why does this image look bad?
Provenance
• Origin or source from which something
comes, intention for use, who or what it was
generated for, manner of manufacture,
history of subsequent owners, sense of
place and time of manufacture, production
or discovery, documented in detail
sufficient to allow reproducibility
• Knowledge provenance; enrich with
ontologies and ontology-aware tools
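A provenance question such as "who added the comments to this data file?" can be sketched as a walk back through recorded processing steps. This is a minimal sketch, not PML or any existing provenance schema: the `ProvenanceStep` fields, the product and agent names, and both functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ProvenanceStep:
    """One recorded step: which agent produced which product, and from what."""
    product: str
    action: str
    agent: str
    derived_from: Optional[str] = None


def lineage(steps: List[ProvenanceStep], product: str) -> List[ProvenanceStep]:
    """Walk from a product back to its origin, newest step first."""
    by_product: Dict[str, ProvenanceStep] = {s.product: s for s in steps}
    chain: List[ProvenanceStep] = []
    seen = set()  # guards against accidental cycles in the records
    current: Optional[str] = product
    while current is not None and current in by_product and current not in seen:
        seen.add(current)
        step = by_product[current]
        chain.append(step)
        current = step.derived_from
    return chain


def who_did(steps: List[ProvenanceStep], product: str, action: str) -> List[str]:
    """Agents that performed `action` anywhere in the product's history."""
    return [s.agent for s in lineage(steps, product) if s.action == action]


# Hypothetical processing history for one image.
STEPS = [
    ProvenanceStep("raw_image", "acquire", "instrument_controller"),
    ProvenanceStep("calibrated_image", "calibrate", "calibration_pipeline", "raw_image"),
    ProvenanceStep("annotated_image", "add_comments", "observer_1", "calibrated_image"),
]
```

Here `who_did(STEPS, "annotated_image", "add_comments")` answers the "who (person or program)" question, and `lineage` gives the history needed to judge whether a product suits a use that was not originally envisioned.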
Quick look browse
Visual browse
Search and structured query
Search
Data Integration Use Case
• Determine the statistical signatures of both
volcanic and solar forcings on the height of the
tropopause
Detection and attribution relations…
SWEET 2.0
Semantic framework indicating how volcano and atmospheric
parameters and databases can immediately be plugged in to the
semantic data framework to enable data integration.
Faceted Search
Summary
• Level of ontology encoding relates to use,
e.g.
– VSTO:
– SPCDIS:
– SESDI: Data integration needs higher level of
curation of ontologies and mapping to data
• Languages and tools
– Rapid prototyping (PHP, Semantic MediaWiki)
– Clean and simple (RDFS, Perl and SPARQL)
– Complex and rich (Java, Protégé, Jena,
Pellet, ELMO, Maven, Eclipse)
Modified GEON Solution Framework
[Figure: framework spanning Data Discovery and Data Integration; A.K. Sinha, Virginia Tech, 2006]
• Level 1: Data Registration at the Discovery Level, e.g. volcano location and activity
• Level 2: Data Registration at the Inventory Level, e.g. list of datasets by types, times, products
• Level 3: Data Registration at the Item Detail Level, e.g. access to individual quantities
• Earth Sciences Virtual Database: a Data Warehouse where the schema heterogeneity problem is solved; schema based integration
• Ontology based Data Integration
Spare material
Example 1: Registration of
Volcanic Data
Location Codes:
• U - Above the 180° turn at
Holei Pali (upper Chain of
Craters Road)
• L - Below Holei Pali (lower
Chain of Craters Road)
• UL - Individual traverses
were made both above and
below the 180° turn at Holei
Pali
• H - Highway 11
SO2 emission from Kilauea east rift zone, vehicle-based (Source: HVO)
Abbreviations: t/d = metric tonne (1000 kg)/day,
SD = standard deviation, WS = wind speed, WD = wind
direction east of true north, N = number of traverses
Registering Volcanic Data (2)
• No explicit lat/long data
• Volcano identified by name
• Volcano ontology framework will link
name to location
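Linking a volcano name to a location can be sketched as a gazetteer lookup applied at registration time. This is a minimal sketch assuming a hand-written table in place of the volcano ontology framework: `VOLCANO_LOCATIONS`, `locate`, and `register_record` are hypothetical names, and the Kilauea coordinates are approximate summit values included only for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical gazetteer standing in for the volcano ontology.
# Coordinates are approximate (latitude, longitude) in decimal degrees.
VOLCANO_LOCATIONS: Dict[str, Tuple[float, float]] = {
    "kilauea": (19.42, -155.29),
}


def locate(name: str) -> Optional[Tuple[float, float]]:
    """Resolve a volcano name to (lat, lon), ignoring case and padding."""
    return VOLCANO_LOCATIONS.get(name.strip().lower())


def register_record(record: Dict[str, object]) -> Dict[str, object]:
    """Return a copy of the record with lat/lon filled in from its name.

    Records whose volcano name is unknown get None for both fields, so the
    gap stays visible instead of being silently guessed.
    """
    location = locate(str(record.get("volcano", "")))
    lat, lon = location if location is not None else (None, None)
    return {**record, "lat": lat, "lon": lon}
```

A record that carries only `{"volcano": "Kilauea"}` comes back with `lat` and `lon` attached, which is the step that lets name-only tables be registered alongside geolocated data.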
Registering Atmospheric Data (2)
Building blocks
• Data formats and metadata: IAU standard FITS, with SOHO keyword
convention, JPEG, GIF
• Ontologies: OWL-DL and RDF
• The Proof Markup Language (PML) provides an interlingua for
capturing the information agents need to understand results and to
justify why they should believe the results.
• The Inference Web toolkit provides a suite of tools for manipulating,
presenting, summarizing, analyzing, and searching PML in efforts to
provide a set of tools that will let end users understand information
and its derivation, thereby facilitating trust in and reuse of
information.
• Capturing semantics of data quality, event, and feature detection
within suitable community ontology packages (SWEET, VSTO)