1
XPath 2.0
http://www.w3.org/TR/xpath20/
http://www.w3.org/TR/xquery-operators/
Roger L. Costello
6 March 2010
2
Set this to XPath 2.0
3
Using Namespaces in
Oxygen
• Suppose in the Oxygen XPath expression
evaluator tool you would like to write expressions
such as this:
current-dateTime() - xs:dateTime('2008-01-14T00:00:00')
• How do you tell Oxygen what namespace the "xs"
prefix maps to? Here's how:
– Go to:
Options ► Preferences ► XML ► XSLT-FO-XQuery ► XPath
and in the Default prefix-namespace mappings table
add a new entry mapping xs to the XML Schema
namespace http://www.w3.org/2001/XMLSchema
4
XML Document
We will use this XML
document throughout
this tutorial, so spend
a minute or two
familiarizing yourself
with it.
It is planets.xml in the
example01 folder.
Please load it into
Oxygen XML.
<?xml version="1.0" encoding="UTF-8"?>
<planets>
<planet>
<name>Mercury</name>
<mass units="(Earth = 1)">.0553</mass>
<day units="days">58.65</day>
<radius units="miles">1516</radius>
<density units="(Earth = 1)">.983</density>
<distance units="millions miles">43.4</distance>
</planet>
<planet>
<name>Venus</name>
<mass units="(Earth = 1)">.815</mass>
<day units="days">116.75</day>
<radius units="miles">3716</radius>
<density units="(Earth = 1)">.943</density>
<distance units="millions miles">66.8</distance>
</planet>
<planet>
<name>Earth</name>
<mass units="(Earth = 1)">1</mass>
<day units="days">1</day>
<radius units="miles">2107</radius>
<density units="(Earth = 1)">1</density>
<distance units="millions miles">128.4</distance>
</planet>
</planets>
planets.xml
5
Sequences
• Sequences are central to XPath 2.0
• XPath 2.0 operates on sequences, and
generates sequences.
• A sequence is an ordered collection of
nodes and/or atomic values.
6
Example Sequences
• This sequence is composed of three atomic values:
(1, 2, 3)
• This sequence is also composed of three atomic
values:
('red', 'white', 'blue')
• This XPath expression will generate a sequence
composed of three <name> nodes:
(//planet/name)
See example01
http://www.w3.org/TR/xpath20/#id-sequence-expressions
7
More Sequence Examples
• The following XPath generates a sequence of six
nodes; the first three are <mass> nodes, the
next three are <name> nodes:
(//planet/mass, //planet/name)
• This sequence contains nodes
followed by atomic values:
(//planet/name, 1, 2, 3)
See example02
8
Definition of Sequence
• A sequence is an ordered collection of zero or more items.
• An item is either an atomic value or a node.
• An atomic value is a single, non-variable piece of data, e.g.
10, true, 2007, "hello world". (An atomic value is an XML
Schema simpleType value)
• There are seven kinds of nodes:
– element, text, attribute, document, PI, comment, namespace
• A sequence containing exactly one item is called a
singleton sequence.
• A sequence containing zero items is called an empty
sequence.
http://www.w3.org/TR/xpath20/#dt-item
9
Sequence Constructor
• A sequence is constructed by enclosing an
expression in parentheses.
• Each item is separated by a comma.
– The comma is called the sequence constructor
operator.
10
No Nested Sequences
• If you have a sequence (1, 2) and nest it in
another sequence
((1, 2), 3)
the resulting sequence is flattened to simply
(1, 2, 3)
• A nested empty sequence is removed
(1, (2, 3), (), 4, 5, 6)
the resulting sequence is flattened to
simply:
(1, 2, 3, 4, 5, 6)
See example03
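The flattening rule can be modeled outside XPath. The following is a minimal Python sketch, assuming a sequence is represented as a Python list and a parenthesized sub-sequence as a nested tuple; the function name flatten is illustrative and not part of XPath.

```python
# Sketch only: Python lists/tuples stand in for XPath 2.0 sequences.
def flatten(items):
    """Return a flat list; nested sub-sequences are spliced in, empty ones vanish."""
    flat = []
    for item in items:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten(item))  # splice the inner sequence in place
        else:
            flat.append(item)           # atomic value: keep as-is
    return flat


nested = (1, (2, 3), (), 4, 5, 6)
print(flatten(nested))       # [1, 2, 3, 4, 5, 6]
print(len(flatten(nested)))  # 6
```

An XPath processor does this flattening automatically; the sketch only makes the rule explicit.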
11
Extract Items from a
Sequence
• You can extract items from a sequence
using the […] operator (predicate):
(4, 5, 6)[2]
returns the singleton sequence:
(5)
• This XPath expression:
//planet[2]
returns the second planet
See example04
12
The index must be an
integer
• To select an item by position, the predicate
value should be an integer (more specifically,
an XML Schema integer datatype). A boolean
predicate filters the sequence instead.
(sequence)[index]
The index must be an integer
13
Initializing
• Example: suppose an element may or may
not have an attribute, discount. If the
element has the discount attribute then
return its value; otherwise, return 0.
(@discount, 0)[1]
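Why this idiom works: if @discount is absent it contributes the empty sequence, so 0 becomes the first item. A minimal Python sketch of that reasoning, assuming each operand of the comma is modeled as a list (empty when the attribute is missing); first_item is a hypothetical helper name.

```python
# Sketch only: each argument models one operand of the XPath comma operator.
def first_item(*parts):
    """Concatenate the parts into one flat sequence and return its first item."""
    sequence = [item for part in parts for item in part]
    return sequence[0]


print(first_item(["15"], [0]))  # 15 (attribute present: its value comes first)
print(first_item([], [0]))      # 0  (attribute absent: the default comes first)
```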
14
Context Item
• Dot "." stands for the current context item.
• The context item can be a node, e.g.
//planet[.]
or it can be an atomic value, e.g.
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)[. mod 2 = 0]
See example05
15
count(sequence)
• This function returns an integer,
representing the number of items in the
sequence.
See example03.b
http://www.w3.org/TR/xquery-operators/#func-count
16
Why Nested Parentheses?
Compare these two:
count((1, 2, 3))
Notice the nested
parentheses
Why is this one correct
and the other one
incorrect?
count(1, 2, 3)
17
Answer
• The count function has only one argument.
• This form:
count(1, 2, 3)
provides three arguments to count, which is
incorrect.
• This form:
count((1, 2, 3))
provides one argument to count (the argument is a
sequence with three items).
18
Sequence of Sequences?
count((//planet, (1, 2, 3), ('red', 'white', 'blue')))
sequence of sequences?
• There is no such thing as a sequence of
sequences!
• There's only one sequence; all subsequences
get flattened into a single sequence.
19
The value of a nonexistent node is the empty
sequence, ()
/planets/planet[999]
There is no 999th planet,
so the result of evaluating this
XPath expression is the
empty sequence, denoted by ()
20
() is not equal to ''
• An empty sequence is not equal to a string
of length zero.
('a', 'b', (), 'c') is not equal to ('a', 'b', '', 'c')
count(('a', 'b', (), 'c')) = 3
count(('a', 'b', '', 'c')) = 4
See example03.a
21
This predicate [.]
eliminates empty strings
The value of ('a', '')[.] is just ('a')
The value of ('a', 'b', '', 'c')[.] is just ('a', 'b', 'c')
22
Two built-in functions
true()
false()
http://www.w3.org/TR/xquery-operators/#func-false
http://www.w3.org/TR/xquery-operators/#func-true
23
index-of(sequence, value)
• The index-of() function allows you to obtain
the position of value in sequence.
index-of((1,3,5,7,9,11), 7)
Output: (4)
7 is at the 4th index position.
http://www.w3.org/TR/xquery-operators/#func-index-of
24
Suppose the value occurs at
multiple locations in the
sequence
• index-of returns a sequence of index
locations. In the last example the result was
a sequence of length 1.
multiple 7's in the sequence
index-of((1,3,5,7,9,11,7,7), 7)
Output: (4, 7, 8)
See example05.1
25
remove(sequence, position)
• The remove function enables you to remove
a value at a specified position from a
sequence.
remove((1,3,5,7,9,11), 4)
The value at position 4 (the 7) is removed.
Output: (1, 3, 5, 9, 11)
http://www.w3.org/TR/xquery-operators/#func-remove
See example05.2
26
The "to" Range Operator
• The range operator, "to", can be used to generate a
sequence of consecutive integers:
(1 to 10)
returns the sequence:
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
• This expression:
(1 to 100)[(. mod 10) = 0]
returns the sequence:
(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
• This expression:
(1, 2, 10 to 14, 34, 99)
returns this disjointed sequence:
(1, 2, 10, 11, 12, 13, 14, 34, 99)
See example06
27
The operands of "to"
must be integers
This is not valid:
('a' to 'z')
Error message you will get:
"Error: Required type of first operand of 'to'
is integer; supplied value has type string"
28
insert-before(sequence, position, value)
insert-before((1,3,4,5,6,7,8,9), 2, 2)
(note: '2' is missing from the sequence)
This inserts the value 2 before position 2.
Output: (1, 2, 3, 4, 5, 6, 7, 8, 9)
http://www.w3.org/TR/xquery-operators/#func-insert-before
29
Appending a value to the
end
Specify a position greater than the
length of the sequence
insert-before(1 to 10, count(1 to 10) + 1, 2)
Output: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2)
30
The inserted value can be
a sequence
insert-before((1,3,4,5,6,7,8,9),2,(2,3))
Here the third argument, (2,3), is a sequence of values.
Output: (1, 2, 3, 3, 4, 5, 6, 7, 8, 9)
See example05.3
31
Sequence Functions
• index-of() returns the index (position) of a
value
• [idx] returns the value at idx
• remove() returns the sequence minus the
item whose index (position) is specified
• insert-before() returns the sequence plus a
new value
Do Lab8
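The three functions summarized above all use 1-based positions. The following Python sketch models their behavior on lists; the helper names mirror the XPath functions but are illustrative only, and the clamping of out-of-range positions in insert_before follows the behavior described on the "Appending a value to the end" slide.

```python
# Sketch only: Python lists stand in for XPath sequences; positions are 1-based.
def index_of(sequence, value):
    """Return every position at which value occurs."""
    return [pos for pos, item in enumerate(sequence, start=1) if item == value]


def remove(sequence, position):
    """Return the sequence without the item at position."""
    return [item for pos, item in enumerate(sequence, start=1) if pos != position]


def insert_before(sequence, position, inserts):
    """Insert the items of inserts before position; large positions append."""
    cut = min(max(position, 1), len(sequence) + 1) - 1
    return list(sequence[:cut]) + list(inserts) + list(sequence[cut:])


print(index_of([1, 3, 5, 7, 9, 11, 7, 7], 7))                 # [4, 7, 8]
print(remove([1, 3, 5, 7, 9, 11], 4))                         # [1, 3, 5, 9, 11]
print(insert_before([1, 3, 4, 5, 6, 7, 8, 9], 2, [2]))        # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```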
32
Sequences are Ordered
• Order matters.
• This generates a sequence composed of the
<mass> elements followed by the <name>
elements:
(//planet/mass, //planet/name)
See example07
33
reverse(sequence)
• This function reverses the items in
sequence.
Notice in the first example the items are wrapped
in parentheses (thus creating a sequence).
See example07.1
http://www.w3.org/TR/xquery-operators/#func-reverse
34
The for Expression
• Use the for expression to loop (iterate) over all
items in a sequence. This is its general form:
for variable in sequence return expression
• Here's an example which iterates over the integers
1-10, multiplying each integer by two:
for $i in (1 to 10) return $i * 2
returns
(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
See example08
http://www.w3.org/TR/xpath20/#id-for-expressions
35
for Expression Examples
• This iterates over each <planet> element, and
returns its <radius> element:
for $p in /planets/planet return $p/radius
• This iterates over each <radius> element, and
returns itself (the sequence generated is identical
to above):
for $r in /planets/planet/radius return $r
• This iterates over each letter of the alphabet:
for $i in ('a','b','c','d','e','f','g','h','i','j','k','l',
'm','n','o','p','q','r','s','t','u','v','w','x','y','z')
return $i
See example09
36
More for Examples
• This returns the radius converted to
kilometers (it returns numbers, not nodes):
for $r in /planets/planet/radius return $r * 1.61
• This applies the avg() function to the
sequence of nodes returned by the for
expression:
avg(for $r in /planets/planet/radius return $r)
See example10
37
Terminology
for variable in sequence return expression
– variable is the range variable
– sequence is the input sequence
– expression is the return expression
The return expression is evaluated once
for each item in the input sequence.
38
Multiple Variables
Multiple variables can be used, separated by commas:
for variable in sequence, variable in sequence, …
return expression
39
Example of Multiple
Variables
for $x in (1, 2), $y in (3, 4) return ($x * $y)
returns (3, 4, 6, 8)
Do Lab9
See example11
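A for expression with two range variables behaves like nested loops: the first variable is the outer loop and the second is the inner loop. A minimal Python sketch of the example above, using a list comprehension as the stand-in (the variable name products is illustrative):

```python
# Sketch only: models  for $x in (1, 2), $y in (3, 4) return ($x * $y)
# The leftmost variable varies slowest, exactly like an outer loop.
products = [x * y for x in (1, 2) for y in (3, 4)]
print(products)  # [3, 4, 6, 8]
```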
40
The if Expression
• The form of the if expression is:
if (boolean expression) then expression1 else expression2
• If the boolean expression evaluates to true then the
result is expression1, else the result is expression2
• This if expression finds the minimum of two
numbers:
if (10 &lt; 20) then 10 else 20
• This for loop returns all the positive numbers in
the sequence:
for $i in (0, -3, 5, 7, -1, 2) return if ($i &gt; 0) then $i else ()
See example12
http://www.w3.org/TR/xpath20/#id-conditionals
41
Nested if-then-else
if (boolean expr) then expr1 else expr2
expr1 and expr2 can each be another if-then-else expression.
42
Notes about the if
Expression
1. You must wrap the boolean expression in
parentheses.
2. You must have an "else" part. There is no
if-then expression, only an if-then-else
Do Lab10
43
The some Expression
• The form of the some expression is:
some variable in sequence satisfies boolean expression
• The result of the expression is either true or
false.
• Using the some expression means that at
least one item in the sequence satisfies the
boolean expression.
http://www.w3.org/TR/xpath20/#id-quantified-expressions
44
Examples of the some
Expression
• This example determines if there are some (one or
more) negative values in the sequence:
some $i in (2, 6, -1, 3, 9) satisfies $i &lt; 0
• Note that this produces the same boolean result:
(2, 6, -1, 3, 9) &lt; 0
because "<" is a general comparison operator, i.e.
it compares each item in the sequence until a
match is found.
See example13
45
More Examples of "some"
• Is there some planet that has a radius
greater than 2000?
some $i in /planets/planet satisfies $i/radius &gt; 2000
• Note that this produces the same boolean
result:
/planets/planet/radius &gt; 2000
See example14
46
The every Expression
• The form of the every expression is:
every variable in sequence satisfies boolean expression
• The result of the expression is either true or
false.
• Using the every expression means that
every item in the sequence satisfies the
boolean expression.
http://www.w3.org/TR/xpath20/#id-quantified-expressions
47
Examples of the every
Expression
• This example determines if every item in
the sequence is positive:
every $i in (2, 6, -1, 3, 9) satisfies $i &gt; 0
• Note that this produces the same boolean
result:
not((2, 6, -1, 3, 9) &lt;= 0)
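The some and every expressions correspond closely to existential and universal tests. As a rough analogy, here is a Python sketch using any() and all() on the same values; the variable names are illustrative.

```python
# Sketch only: any()/all() as stand-ins for XPath's some/every.
values = [2, 6, -1, 3, 9]

# some $i in (2, 6, -1, 3, 9) satisfies $i < 0
some_negative = any(i < 0 for i in values)

# every $i in (2, 6, -1, 3, 9) satisfies $i > 0
every_positive = all(i > 0 for i in values)

# not((2, 6, -1, 3, 9) <= 0): no item is less than or equal to zero
no_item_le_zero = not any(i <= 0 for i in values)

print(some_negative)    # True
print(every_positive)   # False
print(no_item_le_zero)  # False
```

The last two results agree, which is the equivalence stated on the slide.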
48
Multiple Universal
Quantifiers
• An XPath expression can have multiple
universal quantifiers, separated by commas:
every variable in sequence, variable in sequence, …
satisfies condition
See example15
49
Union Operator
• The union operator is used to combine two
node sequences (cannot union atomic
sequences).
• Example:
/planets/planet/mass union /planets/planet/radius
produces the sequence:
<mass units="(Earth = 1)">.0553</mass>
<radius units="miles">1516</radius>
<mass units="(Earth = 1)">.815</mass>
<radius units="miles">3716</radius>
<mass units="(Earth = 1)">1</mass>
<radius units="miles">2107</radius>
http://www.w3.org/TR/xpath20/#combining_seq
50
Equivalent
/planets/planet/mass union /planets/planet/radius
/planets/planet/mass | /planets/planet/radius
The union and | operators are equivalent.
51
Duplicates are Eliminated
• When you union two node sets, any
duplicates are eliminated.
• This yields 3 nodes, not 6:
/planets/planet/mass union /planets/planet/mass
See example16
52
Intersect Operator
• The intersect operator returns the intersection of
two node sequences.
• Example: find all planets with mass over .8 and
radius over 2000:
/planets/planet[mass &gt; .8] intersect /planets/planet[radius &gt; 2000]
http://www.w3.org/TR/xpath20/#combining_seq
<planet>
<name>Venus</name>
<mass units="(Earth = 1)">.815</mass>
<day units="days">116.75</day>
<radius units="miles">3716</radius>
<density units="(Earth = 1)">.943</density>
<distance units="millions miles">66.8</distance>
</planet>
<planet>
<name>Earth</name>
<mass units="(Earth = 1)">1</mass>
<day units="days">1</day>
<radius units="miles">2107</radius>
<density units="(Earth = 1)">1</density>
<distance units="millions miles">128.4</distance>
</planet>
53
Equivalent
/planets/planet[mass &gt; .8] intersect /planets/planet[radius &gt; 2000]
/planets/planet[(mass &gt; .8) and (radius &gt; 2000)]
54
Duplicates are Eliminated
• When you intersect two node sets, any
duplicates are eliminated.
• This yields 2 nodes, not 4:
/planets/planet[mass &gt; .8] intersect /planets/planet[mass &gt; .8]
See example17
55
Except Operator
• The except operator returns the difference
between two node sequences.
• Example: get all planets except Earth:
/planets/planet except /planets/planet[name='Earth']
<planet>
<name>Mercury</name>
<mass units="(Earth = 1)">.0553</mass>
<day units="days">58.65</day>
<radius units="miles">1516</radius>
<density units="(Earth = 1)">.983</density>
<distance units="millions miles">43.4</distance>
</planet>
<planet>
<name>Venus</name>
<mass units="(Earth = 1)">.815</mass>
<day units="days">116.75</day>
<radius units="miles">3716</radius>
<density units="(Earth = 1)">.943</density>
<distance units="millions miles">66.8</distance>
</planet>
http://www.w3.org/TR/xpath20/#combining_seq
56
Equivalent
/planets/planet except /planets/planet[name='Earth']
/planets/planet[name!='Earth']
See example18
57
I posed a challenge to the xml-dev list, challenging them
to simplify an XPath expression. Their answer is
awesome.
Problem: create an XPath expression for this:
There must be one child Title element and
there must be zero or more child Author elements and
there must be one child Date element and
nothing else.
Here's the XPath 2.0 expression I created:
count(Title) eq 1 and
count(Author) ge 0 and
count(Date) eq 1 and
count(*[not(name() = ('Title','Author','Date'))]) eq 0
See next slide for the solution created by the XPath
masters on xml-dev
58
Title and Date and empty(* except (Title[1], Date[1], Author))
Incredible, don't you think?
59
No Duplicates, Document
Order
• The union, intersect, and except operators
return their results as sequences in
document order, without any duplicate
items in the result sequence.
60
"Duplicate" is Based on
Identity,
Not Value
• Two nodes are duplicates iff they are the
exact same node.
• These two <p> elements have the same
value, but different identities
<div>
<p>Box 1</p>
<p>Box 1</p>
</div>
Do Lab11
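The difference between value and identity can be seen with any XML API. The sketch below uses Python's standard xml.etree.ElementTree on the <div> shown above; the union helper is a hypothetical model that removes duplicates by identity and keeps order of first appearance (a simplification of document order).

```python
import xml.etree.ElementTree as ET

# Sketch only: two <p> elements with the same value but different identities.
div = ET.fromstring("<div><p>Box 1</p><p>Box 1</p></div>")
first, second = div.findall("p")

same_value = first.text == second.text  # compares the text content
same_node = first is second             # compares identity


def union(*node_lists):
    """Combine node lists, dropping a node only if that exact node was seen."""
    seen, result = set(), []
    for nodes in node_lists:
        for node in nodes:
            if id(node) not in seen:
                seen.add(id(node))
                result.append(node)
    return result


print(same_value)                          # True
print(same_node)                           # False
print(len(union([first, second], [first])))  # 2
```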
61
Multiple Node Tests
• Recall that in XPath 1.0 an XPath
expression is composed of steps separated
by slashes:
node-test slash node-test slash …
• At each step you can only specify one node
test.
• In XPath 2.0 you can specify multiple node
tests on each step.
62
Example of Multiple Node
Tests
• Example: select the mass and radius for
each planet:
/planets/planet/(mass|radius)
<mass units="(Earth = 1)">.0553</mass>
<radius units="miles">1516</radius>
<mass units="(Earth = 1)">.815</mass>
<radius units="miles">3716</radius>
<mass units="(Earth = 1)">1</mass>
<radius units="miles">2107</radius>
63
Equivalent
/planets/planet/(mass|radius)
/planets/planet/(mass union radius)
/planets/planet/mass | /planets/planet/radius
/planets/planet/*[(self::mass) or (self::radius)]
See example19
64
Examples of Multiple Node
Tests using Union and
Intersect Operators
XML:
<test>
<a>A</a>
<b>B</b>
<c>C</c>
<d>D</d>
<e>E</e>
</test>
XPath:
/test/(a, b) union /test/(c, d, e)
Output:
<a>A</a>
<b>B</b>
<c>C</c>
<d>D</d>
<e>E</e>
XPath:
/test/(a, b, c) intersect /test/(b, c, d)
Output:
<b>B</b>
<c>C</c>
See example20
65
Feed Nodes into a
Function
• In XPath 1.0 an expression following a
slash identifies node(s).
• In XPath 2.0 an expression following a
slash can be a function. Each value
preceding the slash is fed into the function.
/planets/planet/name/substring(.,1,1)
The name of each planet is fed into the substring() function.
Output: ("M", "V", "E")
See example21
66
Feed Nodes into a for loop
/planets/planet/day/(for $i in . return $i * 2)
Output: (117.3, 233.5, 2)
Note: be sure you wrap the for-loop in parentheses.
See example22
67
Can't Feed Atomic Values
• The previous slides showed feeding nodes into a
function and for-loop.
• You cannot feed atomic values, e.g., this is illegal:
(1 to 10)/(for $i in . return $i)
Here's the error message you get:
Error: Required item type of first operand of /
is node(); supplied value has item type
xs:integer
Do Lab12
See example22.a
68
Comments
• XPath 2.0 expressions may be commented
using this syntax:
(: comment :)
(: multiply each day by two :) /planets/planet/day/(for $i in . return $i * 2)
69
General Comparison
Operators
• Here are the general comparison operators:
=, !=, <, <=, >, >=
• These operators are used to compare
sequences.
• Each item in one sequence is compared
against each item in the other sequence; the
comparison evaluates to true if one or more
item-item comparisons evaluate to true.
http://www.w3.org/TR/xpath20/#id-general-comparisons
70
How General Comparison
Works
(item1, item2) op (item3, item4)
is evaluated as:
(item1 op item3) or (item1 op item4) or (item2 op item3) or (item2 op item4)
(1, 2) = (2, 3)
is evaluated as:
(1 = 2) or (1 = 3) or (2 = 2) or (2 = 3)
thus it returns true
(1, 2) = (3, 4)
returns false because there are no equal values between the sequences
See example23
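The pairwise rule above can be written down directly. Here is a minimal Python sketch, assuming sequences are modeled as lists and the operator is passed in as a function; general_compare is an illustrative name, not an XPath function.

```python
import operator

# Sketch only: a general comparison is true if ANY item-item pair satisfies op.
def general_compare(left, op, right):
    return any(op(a, b) for a in left for b in right)


print(general_compare([1, 2], operator.eq, [2, 3]))  # True  (2 = 2)
print(general_compare([1, 2], operator.eq, [3, 4]))  # False (no equal pair)
```

Note that an empty sequence on either side makes the result false, because there are no pairs to compare.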
71
Example
/planets/planet[mass &gt; .8] = /planets/planet[density &gt; .9]
• The left side returns a sequence of two
planets (Venus, Earth), and the right side
returns a sequence of three planets
(Mercury, Venus, Earth).
• The result is true.
See example24
72
Definition of Equal
• Two nodes are equivalent if:
– their node values are the same
– the order of the values is the same
– the number of values is the same
• The tag names can be different.
Comparison is based on data, not markup.
73
Example
• The document below has two <planet> elements.
Their child elements use different tag names.
/planets/planet[1] = /planets/planet[2] returns true.
<planets>
<planet>
<name>Mercury</name>
<mass units="(Earth = 1)">.0553</mass>
<day units="days">58.65</day>
<radius units="miles">1516</radius>
<density units="(Earth = 1)">.983</density>
<distance units="millions miles">43.4</distance>
</planet>
<planet>
<n>Mercury</n>
<m units="(Earth = 1)">.0553</m>
<d units="days">58.65</d>
<r units="miles">1516</r>
<d units="(Earth = 1)">.983</d>
<d units="millions miles">43.4</d>
</planet>
</planets>
See example25
74
Equivalent?
• Problem: find all planets whose name is not
in this sequence ('Earth', 'Mars')
• Are these equivalent?
/planets/planet[not(name = ('Earth', 'Mars'))]
/planets/planet[name != ('Earth', 'Mars')]
75
Not Equivalent!
/planets/planet[not(name = ('Earth', 'Mars'))]
<planet>
<name>Mercury</name>
<mass units="(Earth = 1)">.0553</mass>
<day units="days">58.65</day>
<radius units="miles">1516</radius>
<density units="(Earth = 1)">.983</density>
<distance units="millions miles">43.4</distance>
</planet>
<planet>
<name>Venus</name>
<mass units="(Earth = 1)">.815</mass>
<day units="days">116.75</day>
<radius units="miles">3716</radius>
<density units="(Earth = 1)">.943</density>
<distance units="millions miles">66.8</distance>
</planet>
/planets/planet[name != ('Earth', 'Mars')]
<planet>
<name>Mercury</name>
<mass units="(Earth = 1)">.0553</mass>
<day units="days">58.65</day>
<radius units="miles">1516</radius>
<density units="(Earth = 1)">.983</density>
<distance units="millions miles">43.4</distance>
</planet>
<planet>
<name>Venus</name>
<mass units="(Earth = 1)">.815</mass>
<day units="days">116.75</day>
<radius units="miles">3716</radius>
<density units="(Earth = 1)">.943</density>
<distance units="millions miles">66.8</distance>
</planet>
<planet>
<name>Earth</name>
<mass units="(Earth = 1)">1</mass>
<day units="days">1</day>
<radius units="miles">2107</radius>
<density units="(Earth = 1)">1</density>
<distance units="millions miles">128.4</distance>
</planet>
76
Explanation
/planets/planet[not(name = ('Earth', 'Mars'))]
for each planet
is its name 'Earth' or 'Mars'?
if so, don't return it
otherwise return it
Consider the planet whose name is Earth:
not((Earth equal Earth) or (Earth equal Mars))
not(true or false)
not(true)
false
So Earth is not returned.
/planets/planet[name != ('Earth', 'Mars')]
for each planet
is its name not 'Earth' or not 'Mars'?
if so, return it
otherwise don't return it
Consider the planet whose name is Earth:
(Earth not equal Earth) or (Earth not equal Mars)
false or true
true
So Earth is returned. (Every planet's name differs from 'Earth' or
from 'Mars', so every planet is returned.)
See example26
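The same reasoning can be checked mechanically. This Python sketch models both predicates over the three planet names; the list names are illustrative.

```python
# Sketch only: models the two predicates over the planet names.
names = ["Mercury", "Venus", "Earth"]
excluded = ("Earth", "Mars")

# [not(name = ('Earth', 'Mars'))]: keep a name if it equals NONE of the items
not_equal_to_any = [n for n in names if not any(n == x for x in excluded)]

# [name != ('Earth', 'Mars')]: keep a name if it differs from AT LEAST ONE item
differs_from_some = [n for n in names if any(n != x for x in excluded)]

print(not_equal_to_any)   # ['Mercury', 'Venus']
print(differs_from_some)  # ['Mercury', 'Venus', 'Earth']
```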
77
Value Comparison
Operators
• Here are the value comparison operators:
eq, ne, lt, le, gt, ge
• These operators are used to compare atomic
values.
• Example: 10 lt 30 returns true
• Example:
/planets/planet[1]/name eq 'Mercury'
returns true
http://www.w3.org/TR/xpath20/#id-value-comparisons
See example27
78
No Sequences Allowed!
• Suppose the third planet contains two <name>
elements:
<planet>
<name>Earth</name>
<name>Mother Earth</name>
</planet>
then
/planets/planet[3]/name eq 'Earth'
raises an error:
"Error! A sequence of more than one
item is not allowed as the first operand
of 'eq'."
See example28
79
However, this works
Note that:
/planets/planet[3]/name = 'Earth'
returns true because the "=" operator is used with sequences.
See example29
80
is Operator
• You can compare two nodes to see if they are the
same nodes by using the "is" operator:
expr1 is expr2
returns true only if expr1 and expr2 identify the
same node. expr1 and expr2 must be singleton
sequences.
This expression
//planet[mass = .815] is //planet[day = 116.75]
returns true
because both expressions identify the same <planet> element
http://www.w3.org/TR/xpath20/#id-node-comparisons
See example30
81
<< Operator
• This expression
expr1 << expr2
returns true if the node identified by expr1
comes before the node identified by expr2
in the document.
This expression
//planet[mass = .0553] &lt;&lt; //planet[mass = .815]
returns true
because the left expression identifies Mercury, the right
expression identifies Venus, and Mercury comes before
Venus in the document
http://www.w3.org/TR/xpath20/#id-comparisons
See example31
82
>> Operator
• This expression
expr1 >> expr2
returns true if the node identified by expr1
comes after the node identified by expr2 in
the document.
This expression
//planet[mass = .815] &gt;&gt; //planet[mass = .0553]
returns true
because the left expression identifies Venus, the right
expression identifies Mercury, and Venus comes after
Mercury in the document
http://www.w3.org/TR/xpath20/#id-comparisons
Do Lab13
See example32
83
Arithmetic Operators
• Here are the arithmetic operators:
+, -, *, div, mod, idiv
• The idiv operates on integers and returns an
integer rounded toward zero, e.g.
3 idiv 2 returns 1
-5 idiv 2 returns -2
http://www.w3.org/TR/xpath20/#id-arithmetic
See example33
84
Equivalent
n idiv m
floor(n div m)
if n and m have the same sign
ceiling(n div m)
if n and m have opposite signs
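Rounding toward zero means floor for a positive quotient and ceiling for a negative one. The following Python sketch models idiv on integers and checks it against floor and ceiling; idiv here is a hypothetical helper (Python's own // operator rounds toward negative infinity, so it is not used directly on signed values).

```python
import math

# Sketch only: integer division that truncates toward zero, like XPath's idiv.
def idiv(n, m):
    quotient = abs(n) // abs(m)  # magnitude of the result
    return quotient if (n < 0) == (m < 0) else -quotient


print(idiv(3, 2))    # 1
print(idiv(-5, 2))   # -2
print(idiv(-5, -2))  # 2
print(idiv(-5, 2) == math.ceil(-5 / 2))     # True (opposite signs: ceiling)
print(idiv(-5, -2) == math.floor(-5 / -2))  # True (same sign: floor)
```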
85
current-dateTime Function
• current-dateTime() is an XPath 2.0 function
that returns the current date and time, e.g.
2008-01-19T14:19:26.406-05:00
• The value returned by this function is of
type xs:dateTime (the XML Schema
dateTime datatype).
http://www.w3.org/TR/xquery-operators/#func-current-dateTime
See example34
86
The matches() Function
• The form of the matches function is:
matches(input string, regex)
• It is a boolean function. It returns true if the
input string matches the regular expression,
false otherwise.
if (matches(/planets/planet[2]/name, 'Venus')) then 'Success' else 'Failure'
The matches() function evaluates to true; the result is 'Success'
http://www.w3.org/TR/xpath-functions/#func-matches
87
The matches() Function
if (matches(/planets/planet[2]/name, 'V[a-z]+s')) then 'Success' else 'Failure'
This regex says: any string that contains
'V', followed by one or more lowercase
letters, followed by 's'.
See example44
88
Regular Expressions
• The following 4 slides show examples of
regular expressions:
Regular Expression      Examples
Chapter \d              Chapter 1
Chapter&#x020;\d        Chapter 1
a*b                     b, ab, aab, aaab, …
[xyz]b                  xb, yb, zb
a?b                     b, ab
a+b                     ab, aab, aaab, …
[a-c]x                  ax, bx, cx
89
Regular Expressions
(cont.)
Regular Expression      Examples
[a-c]x                  ax, bx, cx
[-ac]x                  -x, ax, cx
[ac-]x                  ax, cx, -x
[^0-9]x                 any non-digit char followed by x
\Dx                     any non-digit char followed by x
Chapter\s\d             Chapter followed by a blank followed by a digit
(ho){2} there           hoho there
(ho\s){2} there         ho ho there
.abc                    any (one) char followed by abc
(a|b)+x                 ax, bx, aax, bbx, abx, bax,...
90
Regular Expressions
(cont.)
a{1,3}x                 ax, aax, aaax
a{2,}x                  aax, aaax, aaaax, …
\w\s\w                  word character (letter, digit, or similar; not punctuation or whitespace) followed by a whitespace character followed by a word character
[a-zA-Z-[Ol]]*          A string composed of any lower and upper case letters, except "O" and "l"
\.                      The period "." (Without the backward slash the period means "any character")
91
Regular Expressions
(cont.)
^Hello                  Hello (and it must be at the beginning)
Hello$                  Hello (and it must be at the end)
^Hello$                 Hello (and it must be the only value)
92
Regular Expressions
(cont.)
\n                      linefeed
\r                      carriage return
\t                      tab
\\                      The backward slash \
\|                      The vertical bar |
\-                      The hyphen -
\^                      The caret ^
\?                      The question mark ?
\*                      The asterisk *
\+                      The plus sign +
\{                      The open curly brace {
\}                      The close curly brace }
\(                      The open paren (
\)                      The close paren )
\[                      The open square bracket [
\]                      The close square bracket ]
93
Regular Expressions
(concluded)
\p{L}                   A letter, from any language
\p{Lu}                  An uppercase letter, from any language
\p{Ll}                  A lowercase letter, from any language
\p{N}                   A number - Roman, fractions, etc
\p{Nd}                  A digit from any language
\p{P}                   A punctuation symbol
\p{Sc}                  A currency sign, from any language
\p{Sc}\p{Nd}+(\.\p{Nd}\p{Nd})?
"currency sign from any language, followed by one or more digits
from any language, optionally followed by a period and two digits
from any language"
94
Different from the Regex in the
XML Schema Pattern Facet
Consider this XML Schema element declaration:
<element name="Free-text">
<simpleType>
<restriction base="string">
<pattern value="Hello" />
</restriction>
</simpleType>
</element>
And suppose this is the input:
<Free-text>Hello</Free-text>
The input validates against the schema. That is, the
string "Hello" matches the regex in the pattern facet.
Likewise, using the same input and regex, the
matches function succeeds:
if (matches(//Free-text, 'Hello')) then 'Success' else 'Failure'
95
Different from the Regex in the
XML Schema Pattern Facet
Next, consider this input:
<Free-text>He said Hello World</Free-text>
The input does not validate against the schema. That
is, the string "He said Hello World" does not match
the regex in the pattern facet.
In contrast, the matches function does succeed:
if (matches(//Free-text, 'Hello')) then 'Success' else 'Failure'
http://www.w3.org/TR/xpath-functions/#regex-syntax
96
XSD Regexes are
Implicitly Anchored
• When you give a regex in a pattern facet,
there are "implicit anchors" in the regex.
• The regex "Hello" is actually this:
^Hello$
The $ matches the end of the input
The ^ matches the start of the input
Thus "Hello" matches only input that starts with H, ends with o, and in between is ello.
97
No Implicit Anchors in
XPath Regexes
• The regex "Hello" in XPath has no implicit
anchors. Any anchors must be explicitly
specified.
• Thus, the regex "Hello" matches any input
that contains the string Hello
if (matches(//Free-text, 'Hello')) then 'Success' else 'Failure'
is equivalent to:
if (contains(//Free-text, 'Hello')) then 'Success' else 'Failure'
See example45
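The anchored versus unanchored distinction exists in most regex libraries. As an analogy only, the Python sketch below uses re.fullmatch to imitate the pattern facet's implicit anchors and re.search to imitate the XPath matches() function. Python's regex dialect is not identical to the XSD/XPath dialect, and both helper names are hypothetical.

```python
import re

# Sketch only: Python's re module as an analogy, not the XSD/XPath regex engine.
def xsd_pattern_matches(value, regex):
    """Whole value must match, like a pattern facet (implicit ^ and $)."""
    return re.fullmatch(regex, value) is not None


def xpath_matches(value, regex):
    """A match anywhere in the value is enough, like matches()."""
    return re.search(regex, value) is not None


print(xsd_pattern_matches("Hello", "Hello"))                # True
print(xsd_pattern_matches("He said Hello World", "Hello"))  # False
print(xpath_matches("He said Hello World", "Hello"))        # True
```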
98
Case-Insensitivity Mode
• The matches function has an optional third
argument:
matches(input, regex, flags)
• The "i" flag is used to perform a case-insensitive
comparison of the input and the regex.
Example: suppose this is the input:
<Free-text>He said HELLO WORLD</Free-text>
Consider this XPath:
if (matches(//Free-text, 'Hello', 'i')) then 'Success' else 'Failure'
The result is 'Success' because the input is checked to see if it
contains 'Hello', 'hello', 'HELLO', 'HeLLO', etc.
99
The Default is Case-Sensitive
• If the "i" flag is not used in the matches
function, it defaults to a case-sensitive
comparison.
Consider this XPath:
if (matches(//Free-text, 'Hello')) then 'Success' else 'Failure'
The result is 'Failure' because the input is checked to see if it
contains 'Hello' with exactly that capitalization; the input contains only 'HELLO'.
See example46
100
Multiline Mode
• The "m" flag is used to indicate that the input
should be treated as composed of one or more
lines, each line has a start and end, and the regex
should be compared against each line.
Example: suppose this is the input:
<Free-text>He said
Hello World</Free-text>
Consider this XPath:
if (matches(//Free-text, '^Hello', 'm')) then 'Success' else 'Failure'
The result is 'Success'. The regex asks: does the input start with the
string 'Hello'? The 'm' flag says: check each line. Thus, the result is
'Success' since the second line starts with 'Hello'.
101
The Default is
One Long String
• If the "m" flag is not used in the matches
function, it defaults to treating the input as
one long string, with one start and one end.
Consider this XPath:
if (matches(//Free-text, '^Hello')) then 'Success' else 'Failure'
The result is 'Failure' because the input is treated as one long
string and 'Hello' does not start the string.
See example47
102
Dot-all Mode
• The "s" flag is used to indicate that the dot (.) character
matches every character, including the newline (x0A)
character.
• If the "s" flag is not used, the default behavior is for the dot
character to match every character except the newline
character.
if (matches('Hello
World', 'H.*World')) then 'Success' else 'Failure'
The result is 'Failure'
if (matches('Hello
World', 'H.*World', 's')) then 'Success' else 'Failure'
The result is 'Success'
See example48
103
Ignore Whitespace Mode
• The "x" flag is used to indicate that whitespace in
a regex should be ignored.
• If the "x" flag is not used then any whitespace in
the regex is treated as part of the regex.
if (matches('abcabc', '(a b c)+')) then 'Success' else 'Failure'
The result is 'Failure'. The spaces are part of the regex, so it only matches
input containing a b c (with the spaces).
if (matches('abcabc', '(a b c)+', 'x')) then 'Success' else 'Failure'
The result is 'Success'. With the flag, the spaces are ignored, so the regex
matches inputs such as: abc, abcabc, etc.
See example49
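The same experiment can be sketched in Python, where re.VERBOSE is assumed to be the closest counterpart of the "x" flag. Note one difference: Python's verbose mode also treats # as the start of a comment. Treat this as an illustration, not as XPath behavior.

```python
import re

pattern = r"(a b c)+"

# Without the flag, the spaces in the pattern must appear in the input.
plain_on_abcabc = bool(re.search(pattern, "abcabc"))
plain_on_spaced = bool(re.search(pattern, "a b c"))

# With VERBOSE, whitespace in the pattern is ignored, so it behaves like (abc)+.
verbose_on_abcabc = bool(re.search(pattern, "abcabc", re.VERBOSE))

print(plain_on_abcabc, plain_on_spaced, verbose_on_abcabc)  # False True True
```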
104
Multiple Flags
• Zero or more flags can be specified.
• The default value is used for modes not
specified.
if (matches('Hello
World', '^WORLD$', 'im')) then 'Success' else 'Failure'
The result is 'Success'. The regex says: a line must begin and end
with the literal string 'WORLD'. The flags say: ignore case, treat
the input as 2 lines, and compare each line.
Do Lab14
See example50
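To see that both flags are needed, the sketch below repeats the test in Python (used as a stand-in for the XPath regex engine, which is an assumption of this example). re.IGNORECASE and re.MULTILINE correspond to "i" and "m".

```python
import re

two_lines = "Hello\nWorld"
pattern = r"^WORLD$"

no_flags = bool(re.search(pattern, two_lines))
only_i = bool(re.search(pattern, two_lines, re.IGNORECASE))
only_m = bool(re.search(pattern, two_lines, re.MULTILINE))
both = bool(re.search(pattern, two_lines, re.IGNORECASE | re.MULTILINE))

# Only the combination matches: 'World' is a whole line (m) and differs
# from 'WORLD' only in case (i).
print(no_flags, only_i, only_m, both)  # False False False True
```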
105
The tokenize() Function
• Use to split up a string into pieces (tokens).
• A regex specifies the characters that
separate the tokens.
for $i in tokenize('12, 16, 3, 99', ',\s*') return $i
The result is: 12 16 3 99
http://www.w3.org/TR/xpath-functions/#func-tokenize
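For readers who want to experiment without an XPath processor, re.split in Python behaves like tokenize() for this input. This is a sketch in a different language, shown only as an analogy; the argument order is reversed (pattern first, then input).

```python
import re

# Split on a comma followed by optional whitespace, as in the XPath example.
tokens = re.split(r",\s*", "12, 16, 3, 99")

print(tokens)  # ['12', '16', '3', '99']
```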
106
Use Flags with tokenize()
• The flags (i, m, s, x) we saw with the
matches() function are also available with
tokenize()
for $i in tokenize('12xx16XX3xX99', 'xx', 'i') return $i
The result is: 12 16 3 99
See example51
107
Separators are Discarded
• The separators are specified using a regex.
• The input string is processed from left to
right, looking for substrings that match the
regex.
• The separators are discarded, the remaining
strings are collected and yield the output
sequence.
108
Example: Footnote
References as Separators
• Tokenize the input using [n] as the
separators.
• For example, tokenize this:
XPath[1] XSLT[2]
into these tokens: XPath XSLT
Will this work?
tokenize('XPath[1] XSLT[2]', '\[.+\]')
109
+ is a Greedy Quantifier
• The regex on the previous slide does not
produce the desired result: it yields 'XPath'
followed by a zero-length string.
• Here's why: the + quantifier searches for the
longest string that matches. It is called a
greedy quantifier.
\[.+\]
Read as: find the longest string that starts
with '[' and ends with ']'
See example52
110
Why Does This Work?
tokenize('XPath[1] XSLT[2]', '\[\d+\]')
111
Regex is for [digit(s)]
tokenize('XPath[1] XSLT[2]', '\[\d+\]')
Only permit digits in the brackets
See example53
112
+? is a non-Greedy Operator
• If you want to match the shortest possible
substring, add a '?' after the quantifier to
make it non-greedy.
\[.+?\]
Read as: find the shortest string that starts with '[' and ends with ']'
tokenize('XPath[1] XSLT[2]', '\[.+?\]')
Yields the tokens 'XPath' and ' XSLT' (note the leading space), plus a
trailing zero-length string because the input ends with a separator.
See example54
113
* and + are Greedy
• Above we saw that + is greedy
• * is also greedy
• To make them non-greedy append a '?'
*? and +?
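The three footnote regexes can be compared side by side. The sketch below uses Python's re.split as a stand-in for tokenize() (an assumption of this example; greedy and non-greedy quantifiers behave the same way in both engines for this input). Printing the full lists also shows details that are easy to miss: the leading space in ' XSLT' and the trailing zero-length string.

```python
import re

text = "XPath[1] XSLT[2]"

greedy = re.split(r"\[.+\]", text)    # .+ runs on to the last ']'
digits = re.split(r"\[\d+\]", text)   # only digits allowed between brackets
lazy = re.split(r"\[.+?\]", text)     # .+? stops at the first ']'

print(greedy)  # ['XPath', '']
print(digits)  # ['XPath', ' XSLT', '']
print(lazy)    # ['XPath', ' XSLT', '']
```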
114
Regex with 2 Alternatives,
and Both Match
• Consider this XPath: tokenize('bab', 'a|ab')
• What tokens will be generated?
{b, b} or {b}
115
First Alternative Wins!
• If multiple alternatives match, the first one
is used.
• Thus, the result is: {b, b}
• Suppose that's not what we want. We want
the longest alternative ('ab') used whenever
possible.
See example55
116
Solution
• Both of these regexes give the desired result:
ab|a or ab?
See example56
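The effect of the order of alternatives can be reproduced with Python's re.split, used here as an analogy for tokenize() (an assumption of this example). Note that when 'ab' is used as the separator it sits at the end of the input, so the token list ends with a zero-length string.

```python
import re

first_wins = re.split(r"a|ab", "bab")    # 'a' is tried first and matches
longest_first = re.split(r"ab|a", "bab") # 'ab' is tried first and matches
optional_b = re.split(r"ab?", "bab")     # greedy ? takes the 'b' when present

print(first_wins)     # ['b', 'b']
print(longest_first)  # ['b', '']
print(optional_b)     # ['b', '']
```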
117
Separator Matches
Beginning and Ending
• Consider this XPath: tokenize('aba', 'a')
• The input string starts with the separator
and ends with the separator
• What will be the result?
118
Zero-length Strings
• The output is a zero-length string, 'b', and another zero-length string:
{'', 'b', ''}
See example57
119
Regex Doesn't Match Input
• If the regex doesn't match the input string
then the result is the input string:
tokenize('bbb', 'a') produces {'bbb'}
Do Lab15
See example58
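Both edge cases (separators at the ends, and no separator at all) can be checked quickly with Python's re.split, shown here as an analogy for tokenize() and not as XPath itself.

```python
import re

# Separator at the beginning and at the end: zero-length strings appear.
edges = re.split(r"a", "aba")

# Separator never matches: the whole input comes back as a single token.
no_match = re.split(r"a", "bbb")

print(edges)     # ['', 'b', '']
print(no_match)  # ['bbb']
```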
120
What Separator?
• Suppose you want to split (tokenize) this
string W151TBH into
{'W', '151', 'TBH'}
• That is, separate the numeric from the
alphabetic.
• What regex would you use?
121
Need More Knowledge
• The problem can't be solved given what we
currently know.
• However, it can be solved by using the
tokenize() function with the replace()
function, so let's learn about replace().
122
The replace() Function
• The replace() function replaces any string
that matches the regex with a replacement
string:
replace(input, regex, replacement)
• Example: this removes all vowels:
replace('Hello World', '[aeiou]', '')
returns:
{'Hll Wrld'}
http://www.w3.org/TR/xpath-functions/#func-replace
See example59
123
Example
• What is the result of this replace:
replace('banana', '(an)*a', '#')
See example60
124
* is a Greedy Operator
• The result of: replace('banana', '(an)*a', '#')
is b#
• (an)* looks for the longest string of 'anan…'
• The * is a greedy operator
• To make it non-greedy, append ? to the *
replace('banana', '(an)*?a', '#')
• The result is: b#n#n#
See example61
125
Two Matching Alternatives
• Suppose the regex contains two alternatives,
and both match:
replace('banana', 'a|an', '#')
• What will be the result?
126
Leftmost Alternative Wins
• The rule is that the first (leftmost) alternative
wins:
replace('banana', 'a|an', '#')
results in:
b#n#n#
• Switching the alternatives:
replace('banana', 'an|a', '#')
results in:
b###
See example62
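The replace() results above can be reproduced with Python's re.sub, used here as a stand-in (an assumption of this example; the argument order differs from XPath: pattern, replacement, input).

```python
import re

word = "banana"

greedy = re.sub(r"(an)*a", "#", word)      # (an)* grabs 'anan', then 'a'
lazy = re.sub(r"(an)*?a", "#", word)       # (an)*? takes as few 'an' as possible
short_first = re.sub(r"a|an", "#", word)   # leftmost alternative 'a' wins
long_first = re.sub(r"an|a", "#", word)    # leftmost alternative 'an' wins

print(greedy)       # b#
print(lazy)         # b#n#n#
print(short_first)  # b#n#n#
print(long_first)   # b###
```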
127
Using Variables in the
Replacement String
• Consider a regex composed of a sequence
of parenthesized expressions:
( … )( … )( … )
$1
$2
$3
$1 stands for the characters matched by the first parenthesized expression
$2 stands for the characters matched by the second parenthesized expression
…
$9 stands for the characters matched by the ninth parenthesized expression
128
Example: Insert Hyphens
into a Date
replace('12March2008', '([0-9]+)([a-zA-Z]+)([0-9]+)', '$1-$2-$3')
The result is: 12-March-2008
See example63
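The same substitution can be sketched in Python for comparison. The one notable difference is the syntax of the group references: XPath writes $1, $2, $3, whereas Python's re.sub writes \1, \2, \3. This is an analogy in another language, not part of the tutorial's examples.

```python
import re

# Three groups: leading digits, letters, trailing digits.
pattern = r"([0-9]+)([a-zA-Z]+)([0-9]+)"

# \1, \2 and \3 refer back to what each parenthesized group matched.
hyphenated = re.sub(pattern, r"\1-\2-\3", "12March2008")

print(hyphenated)  # 12-March-2008
```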
129
Regex Doesn't Match Input
• If the regex doesn't match the input then the
result will be unchanged:
replace('aaaa', 'b', '#')
The result is: aaaa
See example64
130
Use Flags with replace()
• replace() uses the same flags as matches()
and tokenize(): i, m, s, x
• Example: replace('Haha', 'h', 'b', 'i')
returns:
baba
Do Lab16
See example65
131
Tokenize this String
• How would you separate the numeric parts
from the character parts:
W151TBH
{'W', '151', 'TBH'}
132
Step 1
• Use replace() to append a hash mark (#)
onto the end of each part:
W151TBH
W#151#TBH#
This is accomplished using replace:
replace('W151TBH', '([0-9]+|[a-zA-Z]+)', '$1#')
See example66
133
Step 2
• Tokenize using # as the separator:
W#151#TBH#
{'W', '151', 'TBH', ''}
This is accomplished by this:
tokenize('W#151#TBH#', '#')
See example67
134
Step 3
• Remove the zero-length string
('W', '151', 'TBH', '')[.]
The predicate [.] keeps only the items whose effective boolean value is
true; a zero-length string is false, so it is dropped.
Recall that the value of ('a', '')[.] is just ('a')
See example68
135
Putting it all Together
tokenize(replace('W151TBH', '([0-9]+|[a-zA-Z]+)', '$1#'), '#')[.]
This produces:
('W', '151', 'TBH')
See example69
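The three steps can be traced one at a time in Python, which is used here only as a stand-in for the XPath functions (re.sub for replace(), str.split for tokenize(), and a list comprehension for the [.] predicate; these pairings are assumptions of this sketch).

```python
import re

# Step 1: append '#' to every run of digits or letters.
marked = re.sub(r"([0-9]+|[a-zA-Z]+)", r"\1#", "W151TBH")

# Step 2: split on '#'; the trailing '#' produces a zero-length string.
tokens = marked.split("#")

# Step 3: drop zero-length strings (an empty string is falsy in Python,
# just as it has a false effective boolean value in XPath).
parts = [token for token in tokens if token]

print(marked)  # W#151#TBH#
print(tokens)  # ['W', '151', 'TBH', '']
print(parts)   # ['W', '151', 'TBH']
```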
136
What does the
predicate apply to?
• What is the result of these statements?
//name[1]
(//name)[1]
137
Answer
• //name[1] returns the first <name> element
in each <planet> element.
– Number of elements returned: 3
• (//name)[1] returns the first <name>
element among all the <name> elements in
all the <planet> elements.
– Number of elements returned: 1
See example70
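The difference can be observed with Python's xml.etree.ElementTree, which supports a small XPath subset. This is a sketch under assumptions: the document is a cut-down planets.xml with only the name elements, and because ElementTree cannot parse (//name)[1], that expression is emulated by taking the first item of the full result list.

```python
import xml.etree.ElementTree as ET

planets = ET.fromstring(
    "<planets>"
    "<planet><name>Mercury</name></planet>"
    "<planet><name>Venus</name></planet>"
    "<planet><name>Earth</name></planet>"
    "</planets>"
)

# Like //name[1]: the predicate applies per parent, so every <planet>
# contributes its first <name>.
first_in_each_planet = [n.text for n in planets.findall(".//name[1]")]

# Like (//name)[1]: collect all <name> elements first, then take the first.
first_overall = planets.findall(".//name")[0].text

print(first_in_each_planet)  # ['Mercury', 'Venus', 'Earth']
print(first_overall)         # Mercury
```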
138
Select the first Book
by each Author
<BookStore>
<Book>
<Title>Illusions The Adventures of a Reluctant Messiah</Title>
<Author>Richard Bach</Author>
<Date>1977</Date>
<ISBN>0-440-34319-4</ISBN>
<Publisher>Dell Publishing Co.</Publisher>
</Book>
<Book>
<Title>The First and Last Freedom</Title>
<Author>J. Krishnamurti</Author>
<Date>1954</Date>
<ISBN>0-06-064831-7</ISBN>
<Publisher>Harper &amp; Row</Publisher>
</Book>
<Book>
<Title>Jonathan Livingston Seagull</Title>
<Author>Richard Bach</Author>
<Date>1970</Date>
<ISBN>0-684-84684-5</ISBN>
<Publisher>Simon &amp; Schuster</Publisher>
</Book>
</BookStore>
Select these two
139
Select the first Book
by each Author
//Book[not(Author = preceding::Book/Author)]
The predicate evaluates to true if the Author of the Book
is not the same as the Author of a preceding Book
Do Lab17
See example71
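The logic of the predicate (keep a Book only if no earlier Book has the same Author) can be spelled out procedurally. The sketch below does this in Python with a cut-down BookStore document; it does not evaluate the XPath expression itself, and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

bookstore = ET.fromstring(
    "<BookStore>"
    "<Book><Title>Illusions The Adventures of a Reluctant Messiah</Title>"
    "<Author>Richard Bach</Author></Book>"
    "<Book><Title>The First and Last Freedom</Title>"
    "<Author>J. Krishnamurti</Author></Book>"
    "<Book><Title>Jonathan Livingston Seagull</Title>"
    "<Author>Richard Bach</Author></Book>"
    "</BookStore>"
)

seen_authors = set()
first_books = []
for book in bookstore.findall("Book"):  # document order
    author = book.findtext("Author")
    # Equivalent of not(Author = preceding::Book/Author)
    if author not in seen_authors:
        seen_authors.add(author)
        first_books.append(book.findtext("Title"))

print(first_books)
```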
140
XPath Functions
• http://www.w3schools.com/Xpath/xpath_functions.asp
• http://www.w3.org/TR/xquery-operators/#contents
141
XPath 2.0 Functions
142
distinct-values(values)
• This XPath function will return a sequence
composed of unique values.
distinct-values((2, 2, 3, 4, 1, 4, 2, 6, 3, 9))
Output:
2 3 4 1 6 9
See example72
http://www.w3.org/TR/xquery-operators/#func-distinct-values
Note that the sequence of
integers is wrapped within
a pair of parentheses. Why?
Because the function takes
only one argument.
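As a point of comparison, removing duplicates while keeping the first occurrence of each value can be sketched in Python with dict.fromkeys. This is an analogy only: the order in which an XPath processor returns distinct values is implementation dependent, whereas this sketch always keeps first-seen order.

```python
values = [2, 2, 3, 4, 1, 4, 2, 6, 3, 9]

# dict keys are unique and keep insertion order, so duplicates collapse
# onto their first occurrence.
distinct = list(dict.fromkeys(values))

print(distinct)  # [2, 3, 4, 1, 6, 9]
```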
143
<?xml version="1.0"?>
<FitnessCenter>
<Member id="1" level="platinum">
<Name>Jeff</Name>
<FavoriteColor>lightgrey</FavoriteColor>
</Member>
<Member id="2" level="gold">
<Name>David</Name>
<FavoriteColor>lightblue</FavoriteColor>
</Member>
<Member id="3" level="platinum">
<Name>Roger</Name>
<FavoriteColor>lightyellow</FavoriteColor>
</Member>
<Member id="4" level="platinum">
<Name>Sally</Name>
<FavoriteColor>lightgrey</FavoriteColor>
</Member>
<Member id="5" level="platinum">
<Name>Linda</Name>
<FavoriteColor>purple</FavoriteColor>
</Member>
</FitnessCenter>
Another Example
distinct-values(/FitnessCenter/Member/FavoriteColor)
Output:
lightgrey
lightblue
lightyellow
purple
Do Lab18
See example73
144
doc(url)
• The doc(url) function is used to retrieve
data from another XML document.
doc('FitnessCenter2.xml')
You must put quotes around the file name.
Actually, the argument to doc() is a URL.
http://www.w3.org/TR/xquery-operators/#func-doc
See example74
145
data(item)
• This function returns the (atomic) value of
a node, i.e., it "atomizes" the node.
• It behaves like the string(item) function,
except that string() always returns the value
of the item as a string, whereas data(item)
returns the value of the item with its type
intact.
http://www.w3.org/TR/xquery-operators/#func-data
146
data(item)
string(/FitnessCenter/Member[1]/MembershipFee) + 1
Output: error
data(/FitnessCenter/Member[1]/MembershipFee) + 1
Output: 341
data(340) + 1
Output: 341
See example75
147
error(QName?, description)
• You can raise an error in your XPath using
the error() function.
for $i in /FitnessCenter/Member return
if (number($i/MembershipFee) lt 0) then
error((), 'Invalid value for MembershipFee')
else
true()
http://www.w3.org/TR/xquery-operators/#func-error
See example76
148
trace(value, message)
• This is used for debugging, to monitor the execution.
• The trace() function does two things:
– it returns (outputs) value
– it displays message and information about value
for $i in /FitnessCenter/Member return trace($i/MembershipFee, 'The membership fee is:')
Output:
<MembershipFee>340</MembershipFee>
<MembershipFee>-500</MembershipFee>
<MembershipFee>340</MembershipFee>
Screen:
The membership fee is: [1]: element(MembershipFee, untyped): /FitnessCenter/Member[1]/MembershipFee[1]
The membership fee is: [1]: element(MembershipFee, untyped): /FitnessCenter/Member[2]/MembershipFee[1]
The membership fee is: [1]: element(MembershipFee, untyped): /FitnessCenter/Member[3]/MembershipFee[1]
http://www.w3.org/TR/xquery-operators/#func-trace
See example77
149
compare(string1, string2)
• This function performs a string comparison of string1
against string2.
• If string1 is less than string2 then it returns -1
• If string1 is equal to string2 then it returns 0
• If string1 is greater than string2 then it returns 1
compare('ab','abc')
compare('ab','ab')
compare('abc','ab')
Output:
-1
0
1
http://www.w3.org/TR/xquery-operators/#func-compare
See example78
150
string-join(sequence,
separator)
• The first argument identifies any number of
values.
• The function will concatenate all the values,
placing separator between each value.
string-join(('a','b','c'),' ')
string-join(/FitnessCenter/Member/Name,'/')
Output:
a b c
Jeff/David/Roger
http://www.w3.org/TR/xquery-operators/#func-string-join
See example79
151
An elegant way of creating
the XPath to any node
string-join(for $i in ancestor-or-self::* return name($i),'/')
This returns the name of the current
node (self) plus all its ancestors
Example: Suppose that the current
node is FavoriteColor. Then this will
return:
FitnessCenter Member FavoriteColor
And string-join() will concatenate these
values together, separating each value with /
Thus, the output is:
FitnessCenter/Member/FavoriteColor
Do Lab19
See example80
152
starts-with(string-to-test,
string)
• This function returns true if string-to-test
starts with string, false otherwise.
starts-with('abc', 'a')
starts-with(/FitnessCenter/Member[1]/FavoriteColor, 'light')
Output:
true
true
Note: this XPath function is also present in version 1.0
http://www.w3.org/TR/xquery-operators/#func-starts-with
See example81
153
ends-with(string-to-test,
string)
• This function returns true if string-to-test
ends with string, false otherwise.
ends-with('xyz', 'yz')
ends-with(/FitnessCenter/Member[1]/FavoriteColor, 'grey')
Output:
true
true
Note: this XPath function is not present in version 1.0
http://www.w3.org/TR/xquery-operators/#func-ends-with
See example82
154
String Functions You
Already Know
• contains(string-to-test, string)
• substring(string, starting-loc, length?)
• substring-before(string, match-string)
• substring-after(string, match-string)
• translate(string, from-pattern, to-pattern)
http://www.w3.org/TR/xquery-operators/#contents
See example83
155
normalize-space(string)
• This function strips leading and trailing
whitespace (space, carriage return, tab), and
replaces multiple whitespaces within the
data by a single space.
normalize-space(' A cat ate the mouse ')
normalize-space('There are
two lines')
Output:
A cat ate the mouse
There are two lines
http://www.w3.org/TR/xquery-operators/#func-normalize-space
See example84
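The two effects of normalize-space() (trimming the ends and collapsing inner runs of whitespace) can be imitated in Python by splitting on whitespace and re-joining with single spaces. This is a sketch, with one assumption worth noting: str.split() also treats some Unicode whitespace characters as separators, which normalize-space() does not.

```python
def normalize_space(text):
    # split() with no argument drops leading/trailing whitespace and
    # treats any run of whitespace (spaces, tabs, newlines) as one separator.
    return " ".join(text.split())

padded = normalize_space("   A cat   ate the mouse   ")
two_lines = normalize_space("There are\ntwo lines")

print(padded)     # A cat ate the mouse
print(two_lines)  # There are two lines
```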
156
upper-case(string)
lower-case(string)
upper-case('hello world')
Output:
HELLO WORLD
lower-case('BLUE SKY')
Output:
blue sky
http://www.w3.org/TR/xquery-operators/#func-upper-case
http://www.w3.org/TR/xquery-operators/#func-lower-case
See example85
157
escape-html-uri(uri)
• This function makes a URI usable by browsers, by
escaping non-ASCII characters.
escape-html-uri('http://www.example.com?value=Π')
Output:
http://www.example.com?value=%CE%A0
http://www.w3.org/TR/xquery-operators/#func-escape-html-uri
See example86
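The escaping shown above (the character Π becomes the UTF-8 bytes CE A0, each written as %XX) can be reproduced with Python's urllib.parse.quote. This is a sketch, not the XPath function: the safe argument below is an assumption chosen for this particular URI so that its ASCII punctuation is left alone, as escape-html-uri() would leave it.

```python
from urllib.parse import quote

uri = "http://www.example.com?value=Π"

# Leave ':', '/', '?' and '=' untouched; non-ASCII characters are
# percent-encoded from their UTF-8 bytes.
escaped = quote(uri, safe=":/?=")

print(escaped)  # http://www.example.com?value=%CE%A0
```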
158
year-from-date(xs:date)
• The argument of this function is a date as defined in XML
Schemas.
• Recall that the format of a date is: CCYY-MM-DD
year-from-date(xs:date('2009-09-19'))
Output:
2009
http://www.w3.org/TR/xquery-operators/#func-year-from-date
See example87
159
Many Date, Time
Functions!
year-from-dateTime(xs:dateTime)
month-from-dateTime(xs:dateTime)
day-from-dateTime(xs:dateTime)
hours-from-dateTime(xs:dateTime)
minutes-from-dateTime(xs:dateTime)
seconds-from-dateTime(xs:dateTime)
timezone-from-dateTime(xs:dateTime)
year-from-date(xs:date)
month-from-date(xs:date)
day-from-date(xs:date)
timezone-from-date(xs:date)
hours-from-time(xs:time)
minutes-from-time(xs:time)
seconds-from-time(xs:time)
timezone-from-time(xs:time)
http://www.w3.org/TR/xquery-operators/#component-extraction-functions
See example88
160
root(node?)
The root() function returns
the document node
[Figure: node tree of the FitnessCenter document. The document node (/) is at the root; beneath it are a node labeled PI (<?xml version="1.0"?>) and the FitnessCenter element, which contains three Member elements. Each Member has Name and FavoriteColor element children holding text nodes: Jeff/lightgrey, David/lightblue, Roger/lightyellow.]
161
Useful if working with
multiple documents
• The root() function can be very useful if you are working with multiple
documents.
• The following XPath expression outputs the name of every element in the
document, regardless of what document is currently being processed.
for $i in root()//* return name($i)
http://www.w3.org/TR/xquery-operators/#func-root
See example89
162
subsequence(sequence, start-loc,
length?)
• This function returns a portion of sequence. Namely, it returns the
items in sequence starting at index position start-loc. If length is not
specified then it returns all the following items in the sequence.
Otherwise, it returns length items.
subsequence((1 to 10), 2, 5)
subsequence(//Name, 2)
Output:
2,3,4,5,6
<Name>David</Name>
<Name>Roger</Name>
http://www.w3.org/TR/xquery-operators/#func-subsequence
Do Lab20
See example90
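Because XPath positions start at 1 while many programming languages index from 0, it can help to see the same selection written as a slice. The sketch below is a simplified Python imitation of subsequence(); it assumes start_loc is an integer of at least 1 and ignores the rounding rules that the real function applies to other values.

```python
def subsequence(items, start_loc, length=None):
    # XPath positions are 1-based; Python slices are 0-based.
    first = start_loc - 1
    if length is None:
        return items[first:]          # everything from start_loc onward
    return items[first:first + length]

numbers = subsequence(list(range(1, 11)), 2, 5)
names = subsequence(["Jeff", "David", "Roger"], 2)

print(numbers)  # [2, 3, 4, 5, 6]
print(names)    # ['David', 'Roger']
```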
163
zero-or-one(sequence)
one-or-more(sequence)
exactly-one(sequence)
• These functions are used to assert that a sequence contains
the number of occurrences that you expect.
• Each function will generate an error if the sequence does
not contain the expected number of occurrences. If the
sequence does contain the expected number of occurrences
then it simply returns the sequence
zero-or-one(/FitnessCenter/Member[1]/Name)
one-or-more(/FitnessCenter/Member[1]/Phone)
exactly-one(/FitnessCenter/Member[1]/FavoriteColor)
http://www.w3.org/TR/xquery-operators/#func-zero-or-one
See example91
164
avg(sequence)
avg((1 to 100))
avg(//MembershipFee)
Output:
50.5
393.3333333333
Note that the avg() function has only one argument.
Consequently, in the first XPath expression it was
necessary to wrap the items with parentheses.
http://www.w3.org/TR/xquery-operators/#func-avg
See example92
165
max(sequence)
• The max() function enables you to obtain
the maximum value among a sequence of
values.
max((5, 3, 19, 2, -7))
max(//MembershipFee)
Output:
19
500
http://www.w3.org/TR/xpath-functions/#func-max
See example93
166
min(sequence)
• The min() function enables you to obtain
the minimum value among a sequence of
values.
min((5, 3, 19, 2, -7))
min(//MembershipFee)
Output:
-7
340
http://www.w3.org/TR/xpath-functions/#func-min
See example94
167
Why 2 sets of
parentheses?
• Did you notice that I used two sets of
parentheses in the min and max functions?
– min((2,1,3)) and max((2,1,3))
• In fact, if you omit the inner parentheses
you get an error message.
– min(2,1,3) and max(2,1,3)
Error!
168
Reason for 2 parentheses
• Both the min and max functions have an optional second argument,
collation:
min(sequence, collation?)
max(sequence, collation?)
• The collation argument enables you to specify the collating sequence
that should be used to determine the min/max value. We will typically
just use the default collating sequence. Consequently, we will not use
the second argument.
• Do you now understand the need for the 2 parentheses?
min(2,1)
Is this a member of the sequence, or is it a collation?
Instead, you must do this: min((2,1))
169
number(value), string(value)
number(value) … "Hey, treat value as a number".
string(value) … "Hey, treat value as a string".
09 represents the number 9, which has a string value of '9'
http://www.w3.org/TR/xquery-operators/#func-number
http://www.w3.org/TR/xquery-operators/#func-string
See example95
170
Lesson Learned
• When you are doing a comparison of two
values it is very good practice to wrap your
values within either number() or string().
That way you are explicitly telling the
XSLT Processor how you want the values
compared - as numeric values or as string
values.
171
exists() function
• This function returns either true or false.
• This function is used to determine if an
element exists.
if (exists(/FitnessCenter/Member[3])) then 'There is a 3rd Member' else 'Error! No 3rd Member'
Output:
There is a 3rd Member
if (exists(/FitnessCenter/Member[99])) then 'There is a 99th Member' else 'Error! No 99th Member'
Output:
Error! No 99th Member
http://www.w3.org/TR/xquery-operators/#func-exists
172
exists(()) = false
exists(())
Output:
false
"The empty sequence does not exist"
See example96
173
empty() function
• This function returns either true or false.
• This function is used to determine if an
element does not exist.
if (empty(/FitnessCenter/Member[3])) then 'No 3rd Member' else 'Error! There is a 3rd Member'
Output:
Error! There is a 3rd Member
if (empty(/FitnessCenter/Member[99])) then 'No 99th Member' else 'Error! There is a 99th Member'
Output:
No 99th Member
http://www.w3.org/TR/xquery-operators/#func-empty
See example97
174
empty(()) = true
empty(())
Output:
true
"The empty sequence is empty"
See example97
175
empty() = not(exists())
empty(/FitnessCenter/Member[3]) eq not(exists(/FitnessCenter/Member[3]))
Output: true
empty(/FitnessCenter/Member[99]) eq not(exists(/FitnessCenter/Member[99]))
Output: true
See example98
176
deep-equal(sequence1,
sequence2)
• This function returns true if the two
sequences are identical in value and
position.
http://www.w3.org/TR/xquery-operators/#func-deep-equal
See example99
177
operand instance of
datatype
• You can use the XPath instance of boolean
operator to determine if an operand is of a
particular datatype.
• The operand must not be a node. You must first
atomize the node, using data(.)
• instance of checks the datatype label on the
operand. The label must match datatype. Thus 340
is an instance of xs:integer, but not
xs:positiveInteger
http://www.w3.org/TR/xpath20/#id-instance-of
178
operand instance of
datatype
http://www.w3.org/TR/xpath20/#id-instance-of
See example100
179
operand cast as datatype
• You can use the XPath cast as operator
to convert an operand to a particular
datatype:
equivalent
http://www.w3.org/TR/xpath20/#id-cast
See example101
180
operand castable as
datatype
• You can use the XPath castable as boolean
operator to determine if an operand can be
cast to a particular datatype:
if (//Member[1]/MembershipFee castable as xs:integer) then
(//Member[1]/MembershipFee cast as xs:integer) * 2
else
false()
http://www.w3.org/TR/xpath20/#id-castable
See example102
181
name, local-name,
namespace-uri
• name() returns whatever is inside <…>
• local-name() returns the name that's after the colon
<…:…>
• namespace-uri() returns the namespace
See example103
182
string(node)
• This extracts the data of a node and returns
it as a string.
http://www.w3.org/TR/xquery-operators/#func-string
See example104
183
base-uri(node?),
document-uri(node)
• These return the file path or URL of the
XML document being processed.
http://www.w3.org/TR/xquery-operators/#func-base-uri
http://www.w3.org/TR/xquery-operators/#func-document-uri
See example105
184
Kind Tests
• Here are different ways to select a kind of item:
node(): selects any kind of node (element, attribute, text, comment, PI, namespace)
text(): selects a text node
element(): selects an element node
element(Member): selects Member element nodes
attribute(): selects attribute nodes
attribute(id): selects id attribute nodes
document-node(): selects the document node
comment(): selects a comment node
processing-instruction(): selects a PI node
185
Occurrence Indicators
• Use + to indicate one or more
• Use * to indicate zero or more
• Use ? to indicate zero or one
186
Please look at these
examples; they illustrate
the kind test and
occurrence indicators
See example107
187
XPath 2.0 is a
Strongly Typed Language
• Each XPath 2.0 function returns a value of a
specific datatype. The argument(s) that are passed
to the function must be of the required datatype.
• Also, the XPath 2.0 operators require the operands
be of a required datatype. For example, you
cannot perform arithmetic operations on strings
without explicitly telling the processor to treat
your strings like numbers.
188
XPath 2.0 is a
Strongly Typed Language
• Consider this expression:
'3' + 2
Here's the error message that you will get:
Arithmetic operator is not defined for
arguments of types (xs:string, xs:integer)
• Conversely, in XPath 1.0 the processor
automatically coerces the string into a
number.
See example35
189
Advantages of a
Strongly Typed System
• Early and reliable identification of errors.
– Example: '3' + 2 will generate an error because the type
of the first operand is not appropriate for the operator.
• Implementations (XPath processors) can optimize
performance if they know about the types of the
data.
– Example: Consider this comparison:
//planet/* = 'mars'
If the processor knows the datatypes of each child of
<planet> then it can just compare the string children
against 'mars'
190
Disadvantages of a
Strongly Typed System
• XPath authoring is complicated because more
attention must be paid to types.
– Example: if you want to compare a number against a
number that is represented as a string then you have to
explicitly cast the number to a string and then do the
comparison.
• Supporting an extensive type system puts a burden
on implementers of XPath. This is why schema
awareness is optional for implementers.
191
XML Schema Datatypes
• XPath 2.0 uses the datatypes defined in the
XML Schema Datatypes Specification
192
193
XPath Functions are Strongly
Typed
• Each XPath function requires arguments to be of a
certain datatype.
• Each XPath function returns a result as a certain
datatype.
• Example: here is the signature of the current-dateTime function:
current-dateTime() as xs:dateTime
Read as: "The current-dateTime function is
invoked without any arguments; it returns a value
that has the datatype: XML Schema dateTime."
194
XPath Operators are Strongly
Typed
• Each XPath operator requires the operands to be
of a certain datatype.
• Each XPath operator returns a result as a certain
datatype.
• Example: you can subtract two dateTime values
and the result is of type xs:dayTimeDuration
current-dateTime() - xs:dateTime('1970-01-01T00:00:00Z') returns P14275DT15H49M28.796S
Read as: "The duration between now (Jan. 31,
2009, 10:49am) and Jan. 01, 1970 is 14,275 days,
15 hours, 49 minutes, 28.796 seconds."
See example36
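The arithmetic behind that duration can be checked with Python's datetime module, where subtracting two datetime values likewise yields a separate duration type (timedelta). This is an analogy, with one assumption: the slide's "now" of 10:49am local time is taken to be 15:49:28.796 UTC, which is what the duration P14275DT15H49M28.796S implies.

```python
from datetime import datetime

# The slide's "now" expressed in UTC (assumed), and the epoch it subtracts.
now = datetime(2009, 1, 31, 15, 49, 28, 796000)
epoch = datetime(1970, 1, 1)

elapsed = now - epoch  # a timedelta, i.e. a days-and-time duration

hours, remainder = divmod(elapsed.seconds, 3600)
minutes, seconds = divmod(remainder, 60)

print(elapsed.days, hours, minutes, seconds, elapsed.microseconds)
# 14275 15 49 28 796000
```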
195
Constructor Functions
• Constructor functions are used to construct atomic values with the
specified types.
• Example: the constructor:
xs:dateTime('1970-01-01T00:00:00Z')
constructs an atomic value whose type is xs:dateTime.
• The signature of the xs:dateTime constructor is:
xs:dateTime($arg as xs:anyAtomicType?) as xs:dateTime?
• There is a constructor function for each of the W3C built-in atomic
types.
• If the argument is a node, the atomic value is extracted and that value
is cast to the type.
• If the argument is an empty sequence, the result is an empty sequence.
• The complete list of constructor functions:
xs:string($arg as xs:anyAtomicType?) as xs:string?
xs:boolean($arg as xs:anyAtomicType?) as xs:boolean?
xs:decimal($arg as xs:anyAtomicType?) as xs:decimal?
xs:float($arg as xs:anyAtomicType?) as xs:float?
Implementations may return negative zero for xs:float("-0.0E0").
xs:duration($arg as xs:anyAtomicType?) as xs:duration?
xs:dateTime($arg as xs:anyAtomicType?) as xs:dateTime?
xs:time($arg as xs:anyAtomicType?) as xs:time?
xs:date($arg as xs:anyAtomicType?) as xs:date?
xs:gYearMonth($arg as xs:anyAtomicType?) as xs:gYearMonth?
xs:gYear($arg as xs:anyAtomicType?) as xs:gYear?
xs:gMonthDay($arg as xs:anyAtomicType?) as xs:gMonthDay?
xs:gDay($arg as xs:anyAtomicType?) as xs:gDay?
xs:gMonth($arg as xs:anyAtomicType?) as xs:gMonth?
xs:hexBinary($arg as xs:anyAtomicType?) as xs:hexBinary?
xs:base64Binary($arg as xs:anyAtomicType?) as xs:base64Binary?
xs:anyURI($arg as xs:anyAtomicType?) as xs:anyURI?
xs:QName($arg as xs:anyAtomicType) as xs:QName?
xs:normalizedString($arg as xs:anyAtomicType?) as xs:normalizedString?
xs:token($arg as xs:anyAtomicType?) as xs:token?
xs:language($arg as xs:anyAtomicType?) as xs:language?
xs:NMTOKEN($arg as xs:anyAtomicType?) as xs:NMTOKEN?
xs:Name($arg as xs:anyAtomicType?) as xs:Name?
xs:NCName($arg as xs:anyAtomicType?) as xs:NCName?
xs:ID($arg as xs:anyAtomicType?) as xs:ID?
xs:IDREF($arg as xs:anyAtomicType?) as xs:IDREF?
xs:ENTITY($arg as xs:anyAtomicType?) as xs:ENTITY?
xs:integer($arg as xs:anyAtomicType?) as xs:integer?
xs:nonPositiveInteger($arg as xs:anyAtomicType?) as xs:nonPositiveInteger?
xs:negativeInteger($arg as xs:anyAtomicType?) as xs:negativeInteger?
xs:long($arg as xs:anyAtomicType?) as xs:long?
xs:int($arg as xs:anyAtomicType?) as xs:int?
xs:short($arg as xs:anyAtomicType?) as xs:short?
xs:byte($arg as xs:anyAtomicType?) as xs:byte?
xs:nonNegativeInteger($arg as xs:anyAtomicType?) as xs:nonNegativeInteger?
xs:unsignedLong($arg as xs:anyAtomicType?) as xs:unsignedLong?
xs:unsignedInt($arg as xs:anyAtomicType?) as xs:unsignedInt?
xs:unsignedShort($arg as xs:anyAtomicType?) as xs:unsignedShort?
xs:unsignedByte($arg as xs:anyAtomicType?) as xs:unsignedByte?
xs:positiveInteger($arg as xs:anyAtomicType?) as xs:positiveInteger?
xs:yearMonthDuration($arg as xs:anyAtomicType?) as xs:yearMonthDuration?
xs:dayTimeDuration($arg as xs:anyAtomicType?) as xs:dayTimeDuration?
xs:untypedAtomic($arg as xs:anyAtomicType?) as xs:untypedAtomic?
196
197
New Datatypes
• The XPath 2.0 working group decided that
the XML Schema datatypes are not
complete, so they created a few new ones
and added them to the XML Schema
datatypes.
198
xs:anyAtomicType
• xs:anyAtomicType is an abstract type that is
the base type of all atomic values.
• All atomic datatypes, including the original
XML Schema datatypes, are subtypes of
xs:anyAtomicType
• "Abstract" means that it cannot be used
directly; instead, a subtype must be used.
199
xs:untypedAtomic
• Any value that has not been associated with
a schema type has the type
xs:untypedAtomic.
200
xs:dayTimeDuration
• This is a subtype of xs:duration. It has only
day, hour, minute, and second components.
• Subtracting two xs:date values yields a
result of type xs:dayTimeDuration
current-date() - xs:date('1970-01-01')
• Example xs:duration value: P1Y2M3DT10H30M12.3S
• Example value of its subtype xs:dayTimeDuration: P428DT10H30M12.3S
See example37
201
Subtracting Two Dates
• Here's an example of subtracting two xs:date
values:
current-date() - xs:date('1970-01-01')
• The resulting value is an xs:dayTimeDuration
value.
• Here's how it is specified in the XQuery 1.0 and
XPath 2.0 Functions and Operators specification:
op:subtract-dates($arg1 as xs:date, $arg2 as xs:date) as xs:dayTimeDuration?
"When subtracting two values, each of type xs:date, the resulting value is of
type xs:dayTimeDuration."
http://www.w3.org/TR/xquery-operators/#func-subtract-dates
202
xs:yearMonthDuration
• This is also a subtype of xs:duration. It has
only the year and month components.
• Example xs:duration value: P1Y2M3DT10H30M12.3S
• Example value of its subtype xs:yearMonthDuration: P1Y2M
203
Datatype of Literals and
Expressions
• datatype of current-dateTime() - xs:dateTime('1970-01-01T00:00:00Z') is xs:dayTimeDuration
• datatype of current-date() - xs:date('1970-01-01') is xs:dayTimeDuration
• datatype of 3 is xs:integer
• datatype of 3.14 is xs:decimal
• datatype of "3" is xs:string
• datatype of true is Unknown xs:untypedAtomic
• datatype of true() is xs:boolean
• datatype of 1E3 is xs:double
See example38
204
Datatype of Input Data
Unassociated with a Schema
• datatype of //planet[1]/mass is Unknown
xs:untypedAtomic
• datatype of //planet[1]/mass/text() is Unknown
xs:untypedAtomic
See example39
205
Datatype of Arithmetic
Operations
• datatype of 2 + 2 is xs:integer
• datatype of 2.0 + 2.0 is xs:decimal
• datatype of 2.0 + 2 is xs:decimal
• datatype of 6 div 2 is xs:decimal (div on two integers returns a decimal; use idiv for an xs:integer result)
• datatype of 6.0 div 2.0 is xs:decimal
• datatype of 6.0 div 2 is xs:decimal
See example40
206
Numeric Types
• The 4 main numeric types supported in XPath 2.0
are:
– xs:decimal
– xs:integer
– xs:float
– xs:double
• All arithmetic operators and functions that can be
performed on these types can also be performed
on their subtypes.
207
xs:decimal
• Numeric literals that contain only digits and
a decimal point (no letter E or e) are
considered to be decimal numbers with the
type xs:decimal.
• Example: 25.5 and 25.0 are xs:decimal
values.
208
xs:integer
• Numeric literals that contain only digits (no
decimal point or the letter E or e) are
considered to be integer numbers with the
type xs:integer.
• Example: 25 is an integer value.
209
xs:float and xs:double
• Numeric literals that contain the letter E or e
are considered to be double numbers with
the type xs:double.
• Example: 1E3 and 1e3 are xs:double values.
See example41
210
How a Value
becomes Numeric
• The value is a numeric literal
• The value is selected from an input document that is
associated with a schema that declares it to have a numeric
type
• The value is the result of a function that returns a number,
e.g. count(…) returns xs:integer
• The value is the result of a numeric constructor function,
e.g. xs:float("25.83") returns a xs:float value
• The value is the result of an explicit cast, e.g.,
//planet[1]/mass cast as xs:decimal
• The value is cast automatically when it is passed to a
function
211
The number() Function
• The number() function is almost equivalent
to the xs:double() constructor function.
• Both return a value of type xs:double.
• Differences:
– number("hi") = NaN
– xs:double("hi") = error
– number(()) = NaN
– xs:double(()) = () (the empty sequence)
See example42
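The contrast between a conversion that fails loudly and one that falls back to NaN can be sketched in Python. The two helper functions below are hypothetical names that only imitate the behavior for string input; Python's float() accepts some spellings (such as 'inf') that the XPath functions treat differently, so this is an illustration and not a faithful implementation.

```python
import math

def xs_double(text):
    # Like the constructor: invalid text is an error (ValueError here).
    return float(text)

def number(text):
    # Like number(): invalid text quietly becomes NaN.
    try:
        return float(text)
    except ValueError:
        return math.nan

valid = xs_double("25.83")
fallback = number("hi")

try:
    xs_double("hi")
    constructor_failed = False
except ValueError:
    constructor_failed = True

print(valid, fallback, constructor_failed)  # 25.83 nan True
```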
212
Numeric Type Promotion
• If an operation, such as comparison or an
arithmetic operation, is performed on values
of two different primitive numeric types,
one value's type is promoted to the type of
the other.
213
Numeric Type Promotion
Operand #1    Operand #2    Promoted to
xs:decimal    xs:float      xs:float
xs:decimal    xs:double     xs:double
xs:float      xs:double     xs:double
214
Numeric Type Promotion
Example: 1.0 + 1.2E0 = 2.2E0
1.0 is xs:decimal and 1.2E0 is xs:double, so 1.0 is promoted to
xs:double; the result 2.2E0 is xs:double.
Numeric type promotion happens automatically in
arithmetic expressions and comparison expressions. It
also occurs in calls to functions that expect numeric
values.
See example43
215
Subtype Substitution
• Wherever a type is expected, you can
substitute it with any of its derived types.
• Example: a function that expects a
xs:decimal value can be invoked with an
xs:integer value since integer derives from
decimal.
216