VO Query Language
GSFC XML Group
Ed Shaya
Brian Thomas
Kirk Borne
VOQL Requirements



Provide a means for users to submit general requests for
astronomical information from a distributed set of repositories.
Allow for the science use cases.
Easy to learn and use:
– Hide obvious but tedious steps from the user.
– May require several levels of language, with only the top level being easy.
Allow for web form entry.
Independent of the internal arrangement of data at repositories.
Plug-n-play metadata and ontology.
Span a distributed set of heterogeneous services:
– Each VO query can be transformed into multiple queries in local dialects.
– Workflow of interactions between registries, services, and user.
– Integration of multiple responses.
Integration of multiple responses
IVO Meeting @ Cambridge
May 12-16 2003
More VOQL Requirements


Easy to parse and transform into other forms
Extensible:
– Sites can extend the query language through local namespaces.
– The VO namespace can add language elements in the future.
XML Query Language



Compatible XML and human-readable versions
XQuery is a superset of XPath
Based on Quilt, XQL, and XML-QL
– Quilt is based on Object Query Language (OQL)
– OQL is based on Structured Query Language (SQL)
If/then/else, case switch, basic functions, user-defined functions
FLWR (for, let, where, return):
for $i in (1 to 3)
let $j := (1 to $i)
Results in:
$i = 1, $j = 1
$i = 2, $j = (1,2)
$i = 3, $j = (1,2,3)
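The binding behaviour above can be mimicked outside XQuery. This Python sketch is only an illustration, and the function name `flwr_bindings` is hypothetical. The `for` clause produces one binding tuple per item, while the `let` clause binds a whole sequence to a single variable.

```python
def flwr_bindings(n):
    """Emulate `for $i in (1 to n) let $j := (1 to $i)` as a list of (i, j) tuples."""
    bindings = []
    for i in range(1, n + 1):        # for: one tuple per item of (1 to n)
        j = list(range(1, i + 1))    # let: the whole sequence (1 to $i) bound at once
        bindings.append((i, j))
    return bindings

for i, j in flwr_bindings(3):
    print(f"$i = {i}, $j = {j}")
```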
XQuery Continued
for $s in doc("bright_stars.xml")/*/id_main
let $b := doc("photometry.xml")/*/star[name = $s]/band
where count($b) > 1
return
  <colors>
    <starName>{ string($s) }</starName>
    {
      (: one color per adjacent pair of bands; "$j - 1" needs spaces,
         since "$j-1" would parse as a variable named "j-1" :)
      for $j in (2 to count($b))
      return
        <color name="{ $b[$j]/@name } - { $b[$j - 1]/@name }">
          { $b[$j]/value - $b[$j - 1]/value }
        </color>
    }
  </colors>
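For readers without an XQuery processor, here is a rough Python equivalent of the query above using `xml.etree.ElementTree`. The inline documents are hypothetical stand-ins for bright_stars.xml and photometry.xml. Their element layout (`id_main`, `star/name`, `band/@name`, `band/value`) is assumed from the paths in the query, and the values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents; structure and values are illustrative only.
BRIGHT_STARS = "<stars><id_main>Star A</id_main><id_main>Star B</id_main></stars>"
PHOTOMETRY = """<photometry>
  <star><name>Star A</name>
    <band name="B"><value>1.5</value></band>
    <band name="V"><value>1.0</value></band>
  </star>
  <star><name>Star B</name>
    <band name="V"><value>2.0</value></band>
  </star>
</photometry>"""

def colors(stars_xml, phot_xml):
    stars = ET.fromstring(stars_xml)
    phot = ET.fromstring(phot_xml)
    result = {}
    for s in stars.findall("id_main"):                      # for $s in .../id_main
        name = s.text
        bands = [b for star in phot.findall("star")         # let $b := star[name = $s]/band
                 if star.findtext("name") == name
                 for b in star.findall("band")]
        if len(bands) <= 1:                                  # where count($b) > 1
            continue
        result[name] = {                                     # j runs over 2..count($b), 1-based
            f"{bands[j].get('name')} - {bands[j - 1].get('name')}":
                float(bands[j].findtext("value")) - float(bands[j - 1].findtext("value"))
            for j in range(1, len(bands))
        }
    return result

print(colors(BRIGHT_STARS, PHOTOMETRY))
```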
OLAP/XMLA




On-Line Analytical Processing (OLAP)
Reduces the bandwidth and time needed to get data out
Statistical-package add-on to databases
Analysis of data cubes
– Hierarchy of axis values, e.g.:
  Years, months, days, hours, minutes
  Degrees, minutes, seconds
  Interior, core, mantle, atmosphere, mesosphere, exosphere
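A hierarchy of axis values is what lets an OLAP server "roll up" a cube and return a few aggregates rather than every row. The Python sketch below illustrates this for the time hierarchy. The function name `roll_up` and the sample observations are hypothetical, and summing is only one possible aggregate.

```python
from collections import defaultdict
from datetime import datetime

# Time-axis hierarchy, coarsest level first (as listed on the slide).
LEVELS = ("year", "month", "day", "hour", "minute")

def roll_up(observations, level):
    """Sum values into cells keyed by the time hierarchy truncated at `level`."""
    depth = LEVELS.index(level) + 1
    cube = defaultdict(float)
    for t, value in observations:
        key = tuple(getattr(t, name) for name in LEVELS[:depth])
        cube[key] += value
    return dict(cube)

# Hypothetical observations: (timestamp, measured quantity).
OBS = [
    (datetime(2003, 5, 12, 10, 0), 1.0),
    (datetime(2003, 5, 12, 11, 30), 2.0),
    (datetime(2003, 5, 13, 9, 0), 4.0),
]
print(roll_up(OBS, "day"))
```

Only the per-day or per-month cells would cross the network, which is the bandwidth saving noted above.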
JVO Query Language – Naoki Yasuda


Retrieves catalog data and images from multiple data
servers via a single user interface
Extension of SQL:
– Catalog.UCD
– Box(Point(c1.ra, c1.dec), width1, height1)
– XMATCH(c1, c2, !c3, …) < 3 arcsec
– Select Catalog: Keyword1 & Keyword2
– Select by [[MAX|MIN](PROPERTY) | ALL] [NAME]
– Area: [inside|outside] area0
– Area1 [overlap|union] area2 | shape
– SHAPE: box, circle, oval, triangle, point
– DIFF(x.obs_date, y.obs_date) > 30 days
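A condition such as `XMATCH(c1, c2) < 3 arcsec` pairs sources whose angular separation is below the threshold. The following is a minimal brute-force sketch in Python that uses the haversine formula for the great-circle separation. The row layout `(id, ra, dec)` in degrees and the function names are assumptions. The `!c3` (no-match) case is not handled, and a real service would use a spatial index rather than comparing every pair.

```python
import math

def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation via the haversine formula; inputs in degrees, result in arcsec."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600.0

def xmatch(c1, c2, radius_arcsec=3.0):
    """Return (id1, id2) pairs from two catalogs of (id, ra, dec) rows closer than radius_arcsec."""
    return [(a[0], b[0]) for a in c1 for b in c2
            if angular_sep_arcsec(a[1], a[2], b[1], b[2]) < radius_arcsec]
```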
Data mining


Beyond finding data: intensive data filtering,
conditioning, and knowledge synthesis.
Grid Services?
– Principal Component Analysis
– Iterative solutions
– Genetic algorithms
– Maximum-likelihood functions
– Neural nets
– Decision trees
– Cluster analysis
– Regression analysis
Data Objects

Dataset
– Tables
  – Fields
    – Units
    – Class (UCD)
    – Range
    – Values
– Images
  – Axes
  – Coordinate maps
  – Data values
– Spectra
  – Wavelength
  – Intensity
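The object hierarchy above maps naturally onto nested record types. The following Python dataclass sketch mirrors the slide's tree. The class and attribute names are illustrative assumptions, not a published VO schema, and the `coordinate_map` string stands in for whatever coordinate description a real service would carry.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Field:
    name: str
    units: str
    ucd: str                                         # "Class (UCD)" on the slide
    value_range: Optional[Tuple[float, float]] = None
    values: List[float] = field(default_factory=list)

@dataclass
class Table:
    name: str
    fields: List[Field] = field(default_factory=list)

@dataclass
class Image:
    axes: List[str]
    coordinate_map: str                              # hypothetical placeholder for a coordinate mapping
    data_values: List[List[float]]

@dataclass
class Spectrum:
    wavelength: List[float]
    intensity: List[float]

@dataclass
class Dataset:
    title: str
    tables: List[Table] = field(default_factory=list)
    images: List[Image] = field(default_factory=list)
    spectra: List[Spectrum] = field(default_factory=list)

    def fields_with_ucd(self, ucd):
        """Locate (table, field) pairs by UCD, independent of local field names."""
        return [(t.name, f.name) for t in self.tables for f in t.fields if f.ucd == ucd]
```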
ADQL

Obtain data sets
– By bibliographic query: author, date published, title, journal, volume
– By description: keywords, abstract, mission name
Obtain tables
– By title, table #, field names
– By XPath, e.g. /LocalGroup/[galaxy="M31"]/region7/v-band
Obtain table data by UCDs or field names
– Min/max of range, regular expression
Obtain N-cube data
– Subset by axis values
– Subset by ra, dec, radius, or more generally Func(axes1..)
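Selecting table data "by UCDs or field names" with a min/max range and a regular expression can be sketched as follows. This is a hypothetical Python helper, not ADQL itself. It assumes the regular expression is applied to field names and that the range must hold for every selected value in a row.

```python
import re

def select_table_data(columns, rows, ucd=None, name_pattern=None,
                      value_min=None, value_max=None):
    """columns: list of (name, ucd); rows: list of tuples.
    Picks columns matching `ucd` or the regex `name_pattern`, then keeps rows whose
    picked values all lie in [value_min, value_max]. Returns (names, rows)."""
    idx = [i for i, (name, col_ucd) in enumerate(columns)
           if (ucd is not None and col_ucd == ucd)
           or (name_pattern is not None and re.fullmatch(name_pattern, name))]
    kept = []
    for row in rows:
        picked = tuple(row[i] for i in idx)
        if all((value_min is None or v >= value_min) and
               (value_max is None or v <= value_max) for v in picked):
            kept.append(picked)
    return [columns[i][0] for i in idx], kept
```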
Astronomy Data Query Language (ADQL)
ADQL/Query Schema
Knowledge Based Query


Class -> Instance objects
Property (V-band) -> Instance -> value (-1.4)
– Measurement property values are data
– Modifier (aperture) -> Instance -> value (3 arcsec)
Aggregate property: member, region, component
– Modifier (inequality) -> Instance -> value (before, not)
– Values are bags of objects
SubclassOf property: a subclass has a restricted
property value range or a restricted list of properties.
Property space: N properties form a space.
A bit of math is needed to relate values.
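One consequence of separating property values from modifiers is that two values of the same property are only directly comparable when their modifiers agree. For example, a V-band magnitude measured in a 3 arcsec aperture is not the same quantity as one measured in a 10 arcsec aperture. The Python sketch below encodes that rule. The `Measurement` type and the `comparable` function are hypothetical illustrations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Measurement:
    prop: str               # property, e.g. "V-band"
    value: float            # the property value: this is the data
    modifiers: tuple = ()   # (name, value) pairs qualifying the measurement, e.g. aperture

    def modifier(self, name):
        return dict(self.modifiers).get(name)

def comparable(a, b):
    """Same property measured under the same modifiers."""
    return a.prop == b.prop and dict(a.modifiers) == dict(b.modifiers)
```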
Problem Statement Language: Root
PSL Constraint
PSL AstroObject
Dataset Schema
<dataset subject="astronomy">
<title>AC 2000.2: The Astrographic Catalogue on the Hipparcos System</title>
<altname type="ADC">1275</altname>
<altname type="CDS">I/275</altname>
<altname type="brief">The AC 2000.2 Catalogue</altname>
<references type="source">
<reference>
<title>AC 2000.2: The Astrographic Catalogue on the Hipparcos System</title>
<author><initial>S</initial><initial>E</initial><lastName>Urban</lastName></author>
<author><initial>T</initial><initial>E</initial><lastName>Corbin</lastName></author>
<author><initial>G</initial><initial>L</initial><lastName>Wycoff</lastName></author>
<author><initial>E</initial><lastName>Hoeg</lastName></author>
<author><initial>C</initial><lastName>Fabricius</lastName></author>
<author><initial>V</initial><initial>V</initial><lastName>Makarov</lastName></author>
<journal><name>Astron. J.</name><volume>115</volume><pageno>1212</pageno>
<date><year>1998</year></date><bibcode>1998AJ....115.1212U</bibcode>
</journal>
</reference>
</references>
Dataset Continued
<keywords
xml:base="http://adc.gsfc.nasa.gov/keywordLists/adc/"
parentListURL="adc_keywordList.html">
<keyword xlink:href="kw_p.html#Positional_data">Positional data</keyword>
<keyword xlink:href="kw_a.html#Astrographic_zones">Astrographic zones</keyword>
<keyword xlink:href="kw_s.html#Surveys">Surveys</keyword>
</keywords>
<descriptions>
<description>
<para>
The AC 2000.2 is a revised version of the 1997 release of the AC 2000 (Cat. &lt;I/247&gt;). It was
decided that the availability of an improved reference catalogue and the inclusion of
photometry from the Tycho-2 catalogue would be sufficient to warrant a complete re-reduction of the data and a new distribution of the catalogue. The AC 2000.2 catalog contains
positions of 4,621,751 stars at the average epoch of plate exposures for each star (average
1907).
</para>
</description>
Case Study 0: Setting up the Query


Return RA, Dec, and Vmag for stars with 13 < Vmag < 15,
10:12:53.5 < RA < 13:13:43, and 18:38:00 < DE < 18:40:00.
PSL:
<object class="star">
  <property name="Vmag">
    <range min="13" max="15"/>
    <value>?vmag</value>
  </property>
  <property name="RA">
    <range min="10:12:53.5" max="13:13:43"/>
    <value>?ra</value>
  </property>
  <property name="DE">
    <range min="18:38:00" max="18:40:00"/>
    <value>?de</value>
  </property>
</object>
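Before these ranges can be compared against catalog columns, the sexagesimal bounds must be converted to numbers. The Python sketch below makes two assumptions: RA is given in hours (1 h = 15 deg) and DE in degrees, which is conventional but not stated in the PSL. It also applies strict inequalities, following "13 < Vmag < 15" in the problem statement. The function names are hypothetical.

```python
def sexagesimal_to_degrees(text, hours=False):
    """Convert '[+-]D:M:S' to decimal degrees; with hours=True the first
    field is read as hours of right ascension (1 h = 15 deg)."""
    s = text.strip()
    sign = -1.0 if s.startswith("-") else 1.0
    d, m, sec = (float(p) for p in s.lstrip("+-").split(":"))
    return sign * (d + m / 60.0 + sec / 3600.0) * (15.0 if hours else 1.0)

# Bounds from the case study (assumed: RA in hours, DE in degrees).
RA_MIN = sexagesimal_to_degrees("10:12:53.5", hours=True)
RA_MAX = sexagesimal_to_degrees("13:13:43", hours=True)
DE_MIN = sexagesimal_to_degrees("18:38:00")
DE_MAX = sexagesimal_to_degrees("18:40:00")

def satisfies_case_study(vmag, ra_deg, de_deg):
    """True if a star lies inside all three ranges of the PSL request."""
    return 13 < vmag < 15 and RA_MIN < ra_deg < RA_MAX and DE_MIN < de_deg < DE_MAX
```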
Case Study 0: Mapping Query to Metadata

Search for tables with metadata that satisfy:
– object[@class="star"] -search-> keyword, description
– property[@name="Vmag"] -search-> field/UCD, name
– property[@name="RA"] -search-> field/UCD, name
– property[@name="DE"] -search-> field/UCD, name
– property/range -search-> field/min and field/max, or coverage attributes
For all such tables, return:
?vmag, ?ra, ?de
Also return group/field[@name="error"] for the group with
Vmag info.
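The field-level part of this mapping can be sketched as a search over table metadata. Each PSL property is matched to a field by UCD or by name, and the field's advertised [min, max] coverage must overlap the requested range. In this Python sketch, the property-to-UCD table uses UCD1-style strings chosen for illustration, and `FieldMeta` and `match_fields` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class FieldMeta:
    name: str
    ucd: str
    min: float
    max: float

# Hypothetical mapping from PSL property names to UCDs (illustrative UCD1-style strings).
PROPERTY_UCDS = {"Vmag": "PHOT_JHN_V", "RA": "POS_EQ_RA_MAIN", "DE": "POS_EQ_DEC_MAIN"}

def match_fields(constraints, fields):
    """constraints: property -> (lo, hi). Returns property -> field name when every
    property maps to a field whose coverage overlaps the range; otherwise None."""
    result = {}
    for prop, (lo, hi) in constraints.items():
        ucd = PROPERTY_UCDS.get(prop)
        candidates = [f for f in fields
                      if (f.ucd == ucd or f.name == prop) and f.min <= hi and f.max >= lo]
        if not candidates:
            return None          # this table cannot answer the request
        result[prop] = candidates[0].name
    return result
```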
PSL pull-down: AndConstraints, AndProperties
Property name pull-down: Name, Class, etc.
MathML pull-down: *, -, /, +, sum, avg, <, >, etc.
Problem Statement Language (PSL)
Begin Request
Constraint
Find astronomical objects with the following properties:
AND these properties
1. Name: assign to var1
2. Class is "cluster of galaxies | galaxy cluster"
3. Measurement quantities satisfy:
   a. X-ray brightness > 3.3E7 Jy: assign to var2
      1. Time interval of measurement: 1998Y-1999Y
Using the above variables, satisfy the math formulae:
1. (var2 + var3) < (var1 - log[var4])
OR these constraints
[several constraints, at least one of which must be true, etc.]
Return a table with the following sequence of fields:
var1 var2
End Request
Brian Thomas’ Infrastructure
Tony Linde’s Infrastructure

VO activity
– User
– Problem Assistant: service to help the user state the problem
– Ontology: terms and relationships derived from existing data
– Workflow: to retrieve data, merge it, analyze it, and reduce it
– Registry: lists all services and their high-level metadata
– Job Control: decides which jobs run and when
– Data Centre: receiver of queries for all internal data sources
– Data Source Service: uses the translator to restate the query
– Translator: from the data query language to the implemented service
Languages
– Problem Statement Language (PSL)
– Workflow Language (WFL)
– Astronomical Dataset Query Language (ADQL)
– Ontology Query Language (OQL)
– Registry Query Language (RQL)
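The Translator's role (restating a VO-level query in a site's own dialect) can be illustrated with a very small case: turning range constraints into an SQL statement for one data source. In this Python sketch, the table name, the per-site column map, and the function name are all hypothetical. The output uses DB-API `?` placeholders, so values are passed separately rather than spliced into the SQL text.

```python
def translate_to_sql(table, column_map, ranges, returns):
    """Restate VO-level range constraints as one site's SQL.
    column_map: VO property name -> local column name (site-specific, hypothetical).
    ranges: property -> (min, max), treated as exclusive bounds.
    returns: properties to select, in order. Returns (sql, params)."""
    select = ", ".join(column_map[p] for p in returns)
    clauses, params = [], []
    for prop, (lo, hi) in ranges.items():
        col = column_map[prop]
        clauses.append(f"{col} > ? AND {col} < ?")
        params.extend([lo, hi])
    sql = f"SELECT {select} FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

print(translate_to_sql("ac2000", {"Vmag": "vmag", "RA": "ra_deg"},
                       {"Vmag": (13, 15)}, ["RA", "Vmag"]))
```

A second site would reuse the same VO-level request with a different `column_map`, which is the "one VO query, many local dialects" requirement in miniature.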
Conclusion



Metadata should clearly distinguish between values
that are property values and those that are modifiers
of properties.
Then a mapping from a natural(ish), knowledge-based
scientific language (PSL) to a request language for
items common to data centers (ADQL) is possible.
A federated system with a VO-wide vocabulary plus
specialized (local) namespaces is best for getting
started right away while permitting evolution.