Using a Resource Description Framework (RDF) to carry metadata for climate datasets
M. Benno Blumenthal and John del Corral
International Research Institute for Climate and Society
http://iridl.ldeo.columbia.edu/ontologies/
Why RDF?
Make implicit semantics explicit
Web-based system for interoperating
semantics
RDF/OWL is an emerging technology, so
tools are being built that help solve the
semantic problems in handling data
Standard Metadata
[Diagram: Tools, Datasets, and Users around one Standard Metadata Schema/Data Services]
Many Data Communities
[Diagram: five data communities, each with its own Standard Metadata Schema, Tools, Users, and Datasets]
Super Schema
[Diagram: a central standard metadata schema together with five community Standard Metadata Schemas, each with its own Tools, Users, and Datasets]
Super Schema: direct
[Diagram: a central standard metadata schema/data service together with five community Standard Metadata Schemas, each with its own Tools, Users, and Datasets]
Flaws
• A lot of work
• Super Schema/Service is the Lowest-Common-Denominator
• Science keeps evolving, so that standards
either fall behind or constantly change
RDF Standard Data Model Exchange
[Diagram: a central standard metadata schema and five community Standard Metadata Schemas, with RDF between the schemas; each community keeps its own Tools, Users, and Datasets]
RDF Data Model Exchange
[Diagram: a central standard metadata schema and five community Standard Metadata Schemas, with RDF links among the schemas and each community's Tools, Users, and Datasets]
Why is this better?
• Maps the original dataset metadata into a standard
format that can be transported and manipulated
• Still the same impedance mismatch when mapped to the
least-common-denominator standard metadata, but
• When a better standard comes along, the original
complete-but-nonstandard metadata is already there to
be remapped, and “late semantic binding” means
everyone can use the new semantic mapping
• Can use enhanced mappings between models that have
common concepts beyond the least-common-denominator
• EASIER – tools to enhance the mapping process,
mappings build on other mappings
RDF Architecture
[Diagram: queries run against virtual (derived) RDF, which is derived from many RDF sources]
Example: Search Interface
Additional Semantics
Dataset Ontology
Search Ontology
Search Interface
Datasets
Users
Sample Tool: Faceted Search
http://iridl.ldeo.columbia.edu/ontologies/query2.pl?...
Distinctive Features of the search
• Search terms are interrelated
• terms that describe the set of returns are
displayed (spanning and not)
• Returned items also have structure (subitems and superseded items are not
shown)
Architectural Features of the
search
http://iridl.ldeo.columbia.edu/ontologies/query2.pl
• Multiple search structures possible
• Multiple languages possible
• Search structure is kept in the database,
not in the code
RDF: framework for writing
connections
Triplets of
• Subject
• Property (or Predicate)
• Object
URIs identify things, i.e. most of the above
Namespaces are used as a convenient
shorthand for the URIs
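A minimal sketch of the triple idea in Python, using plain tuples rather than any RDF library. The dc namespace URI is the Dublin Core element set; the iridl URI and the WOA identifier are placeholders assumed for illustration only.

```python
# Prefix table: shorthand -> full namespace URI.
# The iridl entry is a hypothetical placeholder, not the real IRIDL namespace.
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "iridl": "http://example.org/iridl#",
}


def expand(qname):
    """Expand a prefix:local shorthand into a full URI."""
    prefix, local = qname.split(":", 1)
    return NAMESPACES[prefix] + local


# A triple is (subject, property, object).
# Subject and property are URIs; this object is a plain string value.
triple = (expand("iridl:WOA"), expand("dc:title"), "NOAA NODC WOA01")
```

The shorthand only saves typing: two names are the same thing exactly when their expanded URIs are equal.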
Datatype Properties
{WOA} dc:title “NOAA NODC WOA01”
{WOA} dc:description “NOAA NODC
WOA01: World Ocean Atlas 2001, an atlas
of objectively analyzed fields of major
ocean parameters at monthly, seasonal,
and annual time scales. Resolution: 1x1;
Longitude: global; Latitude: global; Depth:
[0 m,5500 m]; Time: [Jan,Dec]; monthly”
Object Properties
{WOA} iridl:isContainerOf {Grid-1x1},
{Grid-1x1} iridl:isContainerOf {Monthly}
WOA01 diagram
Standard Properties
{WOA} dcterm:hasPart {Grid-1x1},
{Grid-1x1} dcterm:hasPart {Monthly}
Alternatively
{WOA} iridl:isContainerOf {Grid-1x1},
{iridl:isContainerOf} rdfs:subPropertyOf
{dcterm:hasPart}
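The "Alternatively" form can be sketched as follows: state the local property once, declare it a sub-property of the standard one, and let the standard statements be derived. This is a hand-rolled illustration of the RDFS sub-property rule over tuples, not the reasoner used by the authors; the function name is an assumption.

```python
def infer_subproperty(triples):
    """Return the input triples plus those implied by rdfs:subPropertyOf."""
    triples = set(triples)
    while True:
        sub = {(s, o) for s, p, o in triples if p == "rdfs:subPropertyOf"}
        new = set()
        # If (x child y) and (child subPropertyOf parent), then (x parent y).
        for s, p, o in triples:
            for child, parent in sub:
                if p == child:
                    new.add((s, parent, o))
        # subPropertyOf is transitive.
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, "rdfs:subPropertyOf", d))
        if new <= triples:
            return triples
        triples |= new


triples = [
    ("WOA", "iridl:isContainerOf", "Grid-1x1"),
    ("Grid-1x1", "iridl:isContainerOf", "Monthly"),
    ("iridl:isContainerOf", "rdfs:subPropertyOf", "dcterm:hasPart"),
]
virtual = infer_subproperty(triples)
```

With this input, `virtual` contains the two dcterm:hasPart statements of the "Standard Properties" slide without anyone having written them by hand.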
Data Structures in RDF
{SST} rdf:type {cfatt:non_coordinate_variable},
{SST} cfobj:standard_name {cf:sea_surface_temperature},
{SST} netcdf:hasDimension {longitude}
Object properties provide a framework for
explicitly writing down relationships
between data objects/components, e.g.
vague meaning of nesting is made explicit
Properties also can be related, since they
are objects too
Search Interface Term
• http://iri.columbia.edu/~benno/sampleterm.pdf
Virtual Triples
Use Conventions to connect concepts to
established sets of concepts
Generate additional “virtual” triples from the
original set and semantics
RDFS – some property/class semantics
OWL – additional property/class semantics:
more sophisticated (ontological)
relationships
SWRL – rules for constructing virtual triples
OWL
Language for expressing ontologies, i.e. the
semantics are very important. However, even
without a reasoner to generate the implied RDF
statements, OWL classes and properties
represent a sophistication of the RDF Schema
There are, however, many world views on how to
express concepts: concepts as classes vs
concepts as individuals vs concepts as predicates
Define terms
• Attribute Ontology
• Object Ontology
• Term Ontology
Attribute Ontology
• Subjects are the only object-type nodes
• Predicates are “attributes”
• Objects are datatype values
• Isomorphic to simple data tables
• Isomorphic to netcdf attributes of datasets
• Some faceted browsers: predicate = facet
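The table/attribute isomorphism can be sketched in a few lines: each attribute of a variable becomes one triple whose object is a plain value, and the row can be rebuilt from the triples. The function names and the sample units value are assumptions made for illustration; only cfatt:standard_name appears in these slides.

```python
def attributes_to_triples(subject, attributes, prefix="cfatt"):
    """One triple per attribute; every object is a plain datatype value."""
    return [(subject, f"{prefix}:{name}", value)
            for name, value in attributes.items()]


def triples_to_row(subject, triples):
    """Rebuild the table row (attribute dictionary) for one subject."""
    return {p.split(":", 1)[1]: o for s, p, o in triples if s == subject}


# Hypothetical attributes of a variable, as they might sit in a netcdf file.
sst_attributes = {"standard_name": "sea_surface_temperature", "units": "degree_Celsius"}
sst_triples = attributes_to_triples("SST", sst_attributes)
```

Because nothing but literal values sits in the object position, the round trip loses no information, which is what "isomorphic" means here.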
Object Ontology
• Objects are object-type
• Isomorphic to “belongs to”
• Isomorphic to multiple data tables connected by
keys
• Express the concept behind netcdf attributes
which name variables
• Concepts as objects can be cross-walked
• Concepts as objects can be interrelated
Example: controlled vocabulary
{variable} cfatt:standard_name {“string”}
Where string has to belong to a list of
possibilities.
{variable} cfobj:standard_name {stdnam}
Where stdnam is an individual of the class
cfobj:StandardName
Example: controlled vocabulary
A bidirectional crosswalk between the two is
fairly trivial, which means all my
objects will have both
cfatt:standard_name
and
cfobj:standard_name
Example: controlled vocabulary
If I am writing software to read/write netcdf
files, I use the cfatt ontology and in
particular cfatt:standard_name
If I am making connections/cross-walks to
other variable naming standards, I use
cfobj:standard_name
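A rough sketch of the two-way crosswalk, assuming the controlled vocabulary is available as a mapping from the string form to the individual form. The function name and the single-entry vocabulary are illustrative assumptions; strings outside the list are left alone rather than guessed at.

```python
def crosswalk(triples, vocabulary):
    """Add whichever standard_name form is missing for each variable.

    vocabulary maps the string form to the individual form.
    """
    to_obj = dict(vocabulary)
    to_str = {individual: text for text, individual in to_obj.items()}
    out = set(triples)
    for s, p, o in triples:
        if p == "cfatt:standard_name" and o in to_obj:
            out.add((s, "cfobj:standard_name", to_obj[o]))
        elif p == "cfobj:standard_name" and o in to_str:
            out.add((s, "cfatt:standard_name", to_str[o]))
    return out


# Hypothetical one-entry vocabulary.
vocabulary = {"sea_surface_temperature": "cf:sea_surface_temperature"}
from_file = {("SST", "cfatt:standard_name", "sea_surface_temperature")}
both = crosswalk(from_file, vocabulary)
```

Software that reads netcdf files can keep using the string form, while cross-walks to other naming standards hang off the individual form.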
Term Ontology
Concepts as individuals
Simple Knowledge Organization System
(SKOS) is a prime example
The ontology used here is slightly different:
facets are classes of terms rather than
being top_concepts
Nuanced tagging
Concepts as objects can be interrelated:
specific terms imply broader terms
Objects end up being tagged with terms
ranging from general to specific.
Search can then be nuanced
Tagging can proceed in the absence of perfect
information
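One way to picture "specific terms imply broader terms": follow the implication links from each tag until nothing new turns up. The term names and the mapping below are invented for illustration.

```python
def implied_terms(tags, directly_implies):
    """All terms an item carries once the broader terms are added."""
    result = set(tags)
    stack = list(tags)
    while stack:
        term = stack.pop()
        for broader in directly_implies.get(term, ()):
            if broader not in result:
                result.add(broader)
                stack.append(broader)
    return result


# Hypothetical term relationships: specific -> directly implied broader terms.
directly_implies = {
    "sea_surface_temperature": ["temperature", "ocean"],
    "temperature": ["physical_quantity"],
}
precise = implied_terms({"sea_surface_temperature"}, directly_implies)
vague = implied_terms({"temperature"}, directly_implies)
```

An item tagged precisely is found by both narrow and broad searches; an item about which only "temperature" is known can still be tagged and found at that level.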
Faceted Search Explicated
Search Interface
• Items (datasets/maps)
• Terms
• Facets
• Taxa
Search Interface Semantic API
{item} dc:title dc:description rss:link iridl:icon
dcterm:isPartOf {item2}
dcterm:isReplacedBy {item2}
{item} trm:isDescribedBy {term}
{term} a {facet} of {taxa} of {trm:Term},
{facet} a {trm:Facet}, {taxa} a {trm:Taxa},
{term} trm:directlyImplies {term2}
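A small in-memory sketch of how a search could use these statements. It assumes each item's terms (trm:isDescribedBy) have already been collected into a set, and it assumes that an item is hidden when its parent (dcterm:isPartOf) or its replacement (dcterm:isReplacedBy) is itself among the hits; the actual query2.pl logic is not shown in these slides. Item names are invented.

```python
from collections import Counter


def search(items, selected_terms):
    """Items described by every selected term, hiding sub-items and superseded items."""
    selected = set(selected_terms)
    hits = {name for name, item in items.items() if selected <= item["terms"]}
    visible = []
    for name in sorted(hits):
        item = items[name]
        # Assumption: hide an item when its parent or replacement is also a hit.
        if item.get("isPartOf") in hits or item.get("isReplacedBy") in hits:
            continue
        visible.append(name)
    return visible


def describe_results(items, names):
    """Split the terms on the results into spanning (on every result) and the rest."""
    counts = Counter(t for n in names for t in items[n]["terms"])
    spanning = {t for t, c in counts.items() if c == len(names)}
    return spanning, set(counts) - spanning


# Hypothetical catalogue.
items = {
    "atlas": {"terms": {"ocean", "temperature"}},
    "atlas_monthly": {"terms": {"ocean", "temperature", "monthly"}, "isPartOf": "atlas"},
    "atlas_old": {"terms": {"ocean", "temperature"}, "isReplacedBy": "atlas"},
    "rain": {"terms": {"atmosphere", "monthly"}},
}
ocean_results = search(items, {"ocean"})
monthly_results = search(items, {"monthly"})
```

Spanning terms describe the whole result set; non-spanning terms are the ones that would narrow it further if selected.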
Faceted Search w/Queries
http://iridl.ldeo.columbia.edu/ontologies/query2.pl?...
RDF Architecture
[Diagram: queries run against virtual (derived) RDF, which is derived from many RDF sources]
IRI RDF Architecture
MMI
Data Servers
Ontologies
JPL
bibliography
Start Point
Standards
Organizations
RDF Crawler
RDFS Semantics
OWL Semantics
SWRL Rules
SeRQL CONSTRUCT
Sesame
Search Queries
Search Interface
Location Canonicalizer
Time Canonicalizer
Cast of Characters
NC – netcdf data file format
CF – Climate and Forecast metadata
convention for netcdf
SWEET – Semantic Web for Earth and
Environmental Terminology (OWL
Ontology)
IRIDL – IRI Data Library
[Diagram labels: NC basic attributes; CF attributes; IRIDL attributes/objects; CF data objects; SWEET Ontologies (OWL); CF Standard Names (RDF object); Location; CF Standard Names As Terms; SWEET as Terms; Search Terms; Gazetteer Terms; IRIDL Terms]
Thoughts
• Pure RDF framework seems currently
viable for a moderate collection of data
• Potential for making a lot of implicit data
conventions explicit
• Explicit conventions can improve
interoperability
• Simple RDF concepts can greatly impact
searches
Future Work Possibilities
More Usable Search Interface
Tagging Interface that uses tag interrelationships
to simplify choices
Data Format translation using semantics
“Related Object Browsing”: given a dataset, find
related data, papers, images
Document/execute/create analysis trees
Stovepipe conventions/bash-to-fit
Less Monolithic IRI Data Library
Question: Canonical Objects
Given a set of individuals related by
owl:sameAs, semantics says predicates
that point to one individual should point to
all the individuals. But as a programmer I
usually want to work with one (the
canonical object).
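One possible programmer's workaround, offered as a sketch rather than an answer to the question: group the owl:sameAs individuals with a union-find, pick one representative per group, and rewrite every triple to use it. Choosing the lexicographically smallest name as the canonical one is an arbitrary assumption, and the individual names are invented.

```python
def canonicalize(triples):
    """Rewrite triples so each owl:sameAs group is represented by one individual."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            lo, hi = sorted((ra, rb))
            parent[hi] = lo  # the smallest name stays the root

    for s, p, o in triples:
        if p == "owl:sameAs":
            union(s, o)

    def canon(x):
        # Names never mentioned in a sameAs statement are left as they are.
        return find(x) if x in parent else x

    return {(canon(s), p, canon(o)) for s, p, o in triples if p != "owl:sameAs"}


# Hypothetical individuals a, b, c declared to be the same thing.
raw = {
    ("a", "owl:sameAs", "b"),
    ("b", "owl:sameAs", "c"),
    ("c", "dc:title", "T"),
    ("x", "dcterm:hasPart", "b"),
}
merged = canonicalize(raw)
```

This trades the full owl:sameAs semantics (every statement repeated for every alias) for a single working copy, which is the convenience the question is after.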
Question: Concept Class
Membership
Seemingly the only way to use a concept
which is expressed as an OWL class is
rdf:type, i.e. membership. Should I really
be putting datasets into conceptual
classes? Shouldn’t there be more choice
of relationship between a concept and a
dataset?
Question: OWL/SKOS Crosswalk
Given a concept in both OWL and Term
(SKOS) frameworks, is it possible to
crosswalk between them, in particular
preserving the ability of Term to have
different predicates with the item being
described?
I’ve tried dual-defined objects, not sure it
works yet …
Stovepipe Conventions
• Fixed Schema
• Agreed upon metadata domain
• Agreed upon data domain
• Designed to be a partial solution
General server software needs to decide
whether data legitimately fits the standard
User contemplates bash-to-fit
Overview
Specialized Data Tools
Maproom
Generalized Data Tools
Data Viewer
Data Language
IRI Data Collection
[Diagram: dataset tree (Dataset, Dataset, Dataset, Variable, ivar, ivar)]
multidimensional
URL/URI for data,
calculations, figs, etc
IRI Data Collection
Ocean/Atm
“geolocated by lat/lon”
multidimensional
spectral harmonics
equal-area grids
GRIB grid codes
climate divisions
Economics
Public Health
“geolocated by entity”
[Diagram: IRI Data Collection dataset tree (Dataset, Dataset, Dataset, Variable, ivar, ivar)]
multidimensional
GIS
“geolocation by vector object or projection metadata”
IRI Data Collection
GRIB
netCDF
images
binary
spreadsheets
shapefiles
Database
Tables
queries
Servers
OpenDAP
THREDDS
[Diagram: IRI Data Collection dataset tree (Dataset, Dataset, Dataset, Variable, ivar, ivar)]
images w/proj
Calculations
“virtual variables”
Data Files
netcdf
binary
Images
GeoTiff
images
graphics
descriptive and navigational pages
Clients
OpenDAP
THREDDS
Tables
OpenGIS
WMS v1.3
WCS
IRI General Data Tools
Data Page
IRI General Data Tools
Data Viewer
IRI General Data Tools
Cut and Paste
IRI Map Room
Malaria Early Warning System
• Front page
illustrates most
recent dekadal
rainfall estimates
(FEWS RFE)
• Administrative and
epidemiological
overlays available
• Change dates to
view different time
periods
• Click and drag box
across map to zoom