Uniform Resource Identifiers
Jacek Kopecký
WSML Working Group
June 2004
Overview
• History of URIs
• URI syntax
• URI references and their resolution
• Good practices for creating URIs
• Interesting issues
June 2004
Jacek Kopecký, [email protected]
URI History
• Universal Resource Identifiers (RFC 1630, June 1994)
• Uniform Resource Locators and Names
• RFC 2396 (URI generic syntax), August 1998
• 2396bis, the revision of RFC 2396, in development
• Originally “Universal”, later “Uniform” as a compromise
• “Universal” again preferred by TimBL (Tim Berners-Lee)
URLs and URNs
• Locators (addresses) vs. Names
• URNs are not easily dereferenceable
• URNs can be made dereferenceable by infrastructure
• URLs are perceived as less persistent
• URLs and URNs are drifting towards middle ground
  • http://www.w3.org/DesignIssues/NameMyth.html
• No point in making the distinction any more
Uniform Resource Identifiers
• URIs “identify” “resources”
• Identification doesn’t imply interaction
• A resource is a sameness of characteristics over time; these are three different resources:
  • Latest blog rant
  • Latest blog rant on politics
  • Blog rant on politics from 2004-06-22
• A resource need not be accessible when its URI is created
  • Pictures from my future trip to London will be at http://jacek.cz/photos/2004-08-london
URI Syntax
• According to 2396bis
  • http://www.apache.org/~fielding/uri/rev-2002/rfc2396bis.html
• Examples
  • http://www.ietf.org/rfc/rfc2396.txt
  • mailto:[email protected]
  • news:comp.infosystems.www.servers.unix
  • telnet://melvyl.ucop.edu/
• URI Syntax - simplified
  • scheme: [//authority] [/path] [?query] [#fragid]
• Relative URI references omit “scheme:”
• Dot path segments (‘.’ and ‘..’) are treated specially
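The simplified syntax can be tried out with a short sketch. It assumes Python and its standard `urllib.parse` module, neither of which comes from the slides; the helper name `components` is made up for illustration, and `urlsplit` calls the authority component `netloc`.

```python
from urllib.parse import urlsplit

def components(uri):
    """Split a URI into the five generic components of the simplified syntax."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,  # urllib's name for the authority
        "path": parts.path,
        "query": parts.query,
        "fragid": parts.fragment,
    }

# Three of the example URIs from the slide.
for uri in (
    "http://www.ietf.org/rfc/rfc2396.txt",
    "news:comp.infosystems.www.servers.unix",
    "telnet://melvyl.ucop.edu/",
):
    print(uri, "->", components(uri))
```

Note that the news: example has no authority at all: everything after the scheme is path.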
URI Syntax cont’d
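• Reserved characters (like /:?#@$&+* ) act as delimiters
• Many characters are allowed unencoded (letters, digits, some punctuation)
• The rest of Unicode is encoded as UTF-8 and then percent-encoded
  • http://google.com/search?q=kopeck%C3%BD
• Percent-encoding an allowed character creates an equivalent URI
• But namespaces are compared char-by-char, so such URIs name different namespaces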
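A small sketch of both points, assuming Python's standard `urllib.parse` (not mentioned in the slides). The `http://example.com/~jacek` URIs are made up for illustration.

```python
from urllib.parse import quote, unquote

# "ý" is U+00FD; its UTF-8 encoding is the two bytes C3 BD.
encoded = quote("kopecký")
print(encoded)  # kopeck%C3%BD

# "~" is an allowed character, so encoding it yields an equivalent URI...
a = "http://example.com/~jacek"
b = "http://example.com/%7Ejacek"
print(unquote(b) == a)  # True

# ...but a char-by-char comparison, as used for namespaces, sees two names.
print(a == b)  # False
```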
URI Reference Resolution
• Resolving URI reference A against base URI B
• Components in order: scheme, authority, path, query, fragment
• Going from the left, take from B every component that comes before the first one defined in A
• From the first component defined in A onward, everything is taken from A
• Path resolution is special
  • If A has an absolute path, that path is taken
  • A relative path from A is resolved against the path from B, removing dot segments from the result
• The fragment is always taken from A
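The path step is the least obvious part, so here is a minimal sketch of it in Python. The function names `remove_dot_segments` and `resolve_path` are made up for this illustration, and the sketch assumes the base path is absolute (starts with ‘/’). It covers only the path component, not a full resolver.

```python
def remove_dot_segments(path):
    """Drop '.' segments and let each '..' cancel the segment before it."""
    segments = path.split("/")
    output = []
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        if segment == ".":
            if is_last:
                output.append("")  # keep the trailing slash
        elif segment == "..":
            if len(output) > 1:    # never climb above the root
                output.pop()
            if is_last:
                output.append("")  # keep the trailing slash
        else:
            output.append(segment)
    return "/".join(output)

def resolve_path(base_path, ref_path):
    """Resolve the path of a reference against the path of the base URI."""
    if ref_path == "":
        return base_path           # no path in the reference: keep the base path
    if ref_path.startswith("/"):
        merged = ref_path          # an absolute path replaces the base path
    else:
        # A relative path replaces everything after the last '/' of the base.
        merged = base_path[: base_path.rfind("/") + 1] + ref_path
    return remove_dot_segments(merged)

print(resolve_path("/b/c/d", "../g"))
print(resolve_path("/b/c/d", "../../../g"))
```

Excess ‘..’ segments are dropped at the root rather than kept in the result.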
URI Ref. Resolution Examples
• Base URI: http://a/b/c/d?e#f
 1. g           = http://a/b/c/g
 2. .           = http://a/b/c/
 3. ./          = http://a/b/c/
 4. ./g         = http://a/b/c/g
 5. ..          = http://a/b/
 6. ../         = http://a/b/
 7. ../g        = http://a/b/g
 8. ../../g     = http://a/g
 9. ../../../g  = http://a/g
URI Ref. Resolution Examples
• Base URI: http://a/b/c/d?e#f
10. /./g   = http://a/g
11. //g    = http://g
12. #s     = http://a/b/c/d?e#s
13. g#s    = http://a/b/c/g#s
14. ?y     = http://a/b/c/d?y
15. g?y    = http://a/b/c/g?y
16. g?y#s  = http://a/b/c/g?y#s
17. g:h    = g:h
18. ./g:h  = http://a/b/c/g:h
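Resolution examples like these can be checked mechanically. The sketch below assumes Python 3.5 or later, where the standard `urllib.parse.urljoin` is documented to follow the RFC 3986 resolution semantics. The names `EXPECTED` and `check_examples` are made up for illustration.

```python
from urllib.parse import urljoin

BASE = "http://a/b/c/d?e#f"

# Reference -> resolved URI for all eighteen examples.
EXPECTED = {
    "g": "http://a/b/c/g",
    ".": "http://a/b/c/",
    "./": "http://a/b/c/",
    "./g": "http://a/b/c/g",
    "..": "http://a/b/",
    "../": "http://a/b/",
    "../g": "http://a/b/g",
    "../../g": "http://a/g",
    "../../../g": "http://a/g",
    "/./g": "http://a/g",
    "//g": "http://g",
    "#s": "http://a/b/c/d?e#s",
    "g#s": "http://a/b/c/g#s",
    "?y": "http://a/b/c/d?y",
    "g?y": "http://a/b/c/g?y",
    "g?y#s": "http://a/b/c/g?y#s",
    "g:h": "g:h",
    "./g:h": "http://a/b/c/g:h",
}

def check_examples(base=BASE, expected=EXPECTED):
    """Return the references whose urljoin result differs from the expectation."""
    mismatches = {}
    for reference, wanted in expected.items():
        actual = urljoin(base, reference)
        if actual != wanted:
            mismatches[reference] = actual
    return mismatches

# An empty dict means urljoin agrees with every entry.
print(check_examples())
```

Example 17 is the one to watch: `g:h` parses as scheme `g`, so it is already absolute, while the leading `./` in example 18 forces it to be read as a relative path.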
Base URIs
• Necessary when resolving URI references
• Sources of the base URI, in order of precedence:
1. Explicit base URI embedded in content
  • <link xml:base="http://example.com/bar/" href="x.html" />
2. URI of the document
  • Usual in HTML files on the web
3. App-dependent base URI default
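A sketch of the first two sources, assuming Python's standard `xml.etree.ElementTree` and `urllib.parse` (not part of the slides). The function name `resolved_href` and the document URI `http://example.org/docs/index.xml` are made up for illustration. An embedded `xml:base` wins, and the document URI is the fallback.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes the xml: prefix through its fixed namespace URI.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

def resolved_href(xml_text, document_uri):
    """Resolve the href of a single element against its effective base URI."""
    element = ET.fromstring(xml_text)
    # xml:base may itself be relative, so resolve it against the document URI.
    # With no xml:base, urljoin(document_uri, "") returns the document URI.
    base = urljoin(document_uri, element.get(XML_BASE, ""))
    return urljoin(base, element.get("href"))

document_uri = "http://example.org/docs/index.xml"  # hypothetical

with_base = '<link xml:base="http://example.com/bar/" href="x.html" />'
print(resolved_href(with_base, document_uri))     # http://example.com/bar/x.html

without_base = '<link href="x.html" />'
print(resolved_href(without_base, document_uri))  # http://example.org/docs/x.html
```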
URI Equivalence
• Do two URIs identify the same resource?
• Comparing without accessing the resources
• Various applications for URI comparison
  • Increasing cache efficiency
  • Comparing the namespaces of two symbols
• Algorithms must avoid false positives
• False negatives are unavoidable
  • http://weather.example.com/innsbruck
  • http://jacek.cz/innsbruckweather redirects to the above, which no comparison can detect
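One way to compare without accessing anything is to normalize both URIs and compare the results. The sketch below is a deliberately conservative illustration, not a complete normalizer. It lowercases the scheme and host and decodes percent-encoded unreserved characters, and it leaves everything else alone so that it errs towards false negatives. The function names are made up for illustration.

```python
import re

UNRESERVED = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)
PREFIX = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*:)(//[^/?#]*)?")
ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")

def _lower_prefix(match):
    scheme, authority = match.group(1), match.group(2) or ""
    if "@" not in authority:  # userinfo may be case-sensitive: leave it alone
        authority = authority.lower()
    return scheme.lower() + authority

def _decode_unreserved(match):
    char = chr(int(match.group(1), 16))
    # Decode only unreserved characters; just uppercase the hex digits otherwise.
    return char if char in UNRESERVED else match.group(0).upper()

def normalize(uri):
    """Return a conservatively normalized form of an absolute URI."""
    return ESCAPE.sub(_decode_unreserved, PREFIX.sub(_lower_prefix, uri, count=1))

def probably_equivalent(a, b):
    """True only when the normalized forms match; False proves nothing."""
    return normalize(a) == normalize(b)

print(probably_equivalent("HTTP://Example.COM/%7Ejacek", "http://example.com/~jacek"))
print(probably_equivalent("http://weather.example.com/innsbruck",
                          "http://jacek.cz/innsbruckweather"))
```

The second call returns False even though both URIs lead to the same weather report: that is the unavoidable false negative.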
Uses of URIs
• Addresses on the Web
• Namespaces in XML QNames
• Namespaces in QNames in other languages
• Identifiers of things and concepts (e.g. RDF)
• Unique keys (e.g. MIME message ID)
QName
• Introduced in XML Namespaces
• Name of an XML namespace-qualified element
• RDF uses QNames for brevity of URI notation
• XML Schema expanded the use of QNames to further things (6 symbol spaces)
• Every later language uses QNames as identifiers
• Each has a number of independent symbol spaces
• => Turning QNames into URIs is cumbersome
• Should have been as simple as in RDF (IMHO)
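The RDF approach really is simple: the URI is the namespace URI with the local name appended. A sketch in Python follows. The function name `qname_to_uri` and the `ex` prefix with its namespace are made up for illustration; the `rdf` namespace URI is the real one.

```python
def qname_to_uri(qname, prefixes):
    """Expand prefix:local into a URI by plain concatenation, as RDF does."""
    prefix, separator, local = qname.partition(":")
    if not separator:
        raise ValueError("expected a QName of the form prefix:local")
    return prefixes[prefix] + local

PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ex": "http://example.org/vehicles#",  # hypothetical namespace
}

print(qname_to_uri("rdf:type", PREFIXES))
# http://www.w3.org/1999/02/22-rdf-syntax-ns#type
print(qname_to_uri("ex:car", PREFIXES))
# http://example.org/vehicles#car
```

With several independent symbol spaces per namespace, concatenation alone is ambiguous, because the same local name can mean different things. That is why the mapping gets cumbersome elsewhere.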
Creating URIs for Web Resources
• Versioning approach for persistence
  • http://w3.org/TR/soap vs.
  • http://w3.org/TR/soap12 vs.
  • http://w3.org/TR/2003/REC-soap12-part1-20030624/
• Simple, memorable URIs
  • http://jacek.cz/blog
  • Can be scribbled on a napkin
  • Correcting spelling and case helps (mod_speling)
  • Making the “www.” prefix optional (both ways) helps
• Content negotiation: drop .html (.php, .asp)
• Changing URIs is harmful
Creating Example URIs
• http://example.com
• http://example.net
• http://example.org
• Reserved for precisely this purpose
• Or use your own domain (deri.org, wsmo.org)
• http://foo.com is not good: the domain may belong to someone else
Creating URIs for Namespaces
• Dereferenceable, ending with ‘/’ or ‘#’
• Canonical URIs: no unnecessary dot segments or percent-encoding
  • Namespaces are compared char-by-char
• Namespace document
  • Preferably in the language that uses the namespace, which enables automatic discovery
  • With human-oriented descriptions
  • To allow for the above, don’t share one namespace URI between a schema and a WSDL document
Creating URIs for Concepts
• Group concepts in a common, dereferenceable namespace
• Each concept identified by its fragID
  • In RDF/XML, namespace ends with ‘#’
• Namespace document describes the concepts
• Two problems
  • FragIDs depend on media types
  • Can http://example.com/#car identify a car?
Fragment IDs in URIs
• Fragment ID identifies a secondary resource
• Interpretation of fragment IDs depends on media type
  • In HTML <a name="foo">
  • In XML <element xml:id="foo"/>
  • No meaning in JPEG
• xml:id in development
  • So far language-dependent (often DTD) solutions
• Fragment IDs should mean the same thing across media types with content negotiation
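A sketch of the XML case, assuming Python's standard `urllib.parse` and `xml.etree.ElementTree`. The fragment is split off the URI and then looked up inside the representation according to that media type's rules, here by `xml:id`. The function name `secondary_resource`, the sample document, and the URI `http://example.com/doc.xml` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def secondary_resource(uri, xml_text):
    """Return the element named by the fragment ID of uri, or None."""
    _primary, fragid = urldefrag(uri)  # primary resource URI, fragment ID
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if element.get(XML_ID) == fragid:
            return element
    return None

document = (
    '<doc>'
    '<element xml:id="foo">first</element>'
    '<element xml:id="bar">second</element>'
    '</doc>'
)

found = secondary_resource("http://example.com/doc.xml#foo", document)
print(found.text)  # first
print(secondary_resource("http://example.com/doc.xml#missing", document))  # None
```

The same URI against an HTML or JPEG representation would need a different lookup, or would have none at all.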
Range of HTTP URIs?
• Open W3C TAG issue (httpRange-14)
• Can an http: URI identify a car?
• Can I say http://jacek.cz/dragstar/ is my motorbike?
• TimBL doesn’t seem to think so
• Is it necessary to distinguish between a thing and a description of that thing?
Other Interesting Issues
• data: URI scheme: the URI is the resource
  • RFC 2397
  • …
• mailto: scheme is a misnomer
  • URIs are identifiers; they don’t specify actions
• uuid: scheme for unique identifiers
  • Good for transient identification in closed systems
• Mismatches between perceived and intended meaning of a resource
  • http://w3.org/tr/soap
• Should URIs be human-readable?
  • http://www.bscw.semanticweb.org/bscw/bscw.cgi/0/21621
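To illustrate “the URI is the resource”, here is a minimal sketch that recovers the content of a data: URI without any network access. It assumes Python's standard `base64` and `urllib.parse` modules. The function name `decode_data_uri` is made up, and the sketch handles only the basic RFC 2397 forms, with no media-type parameter parsing.

```python
import base64
from urllib.parse import unquote_to_bytes

def decode_data_uri(uri):
    """Return (media_type, payload_bytes) for a data: URI."""
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    header, separator, payload = uri[len("data:"):].partition(",")
    if not separator:
        raise ValueError("missing ',' between header and data")
    raw = unquote_to_bytes(payload)  # undo percent-encoding first
    if header.endswith(";base64"):
        media_type = header[: -len(";base64")]
        data = base64.b64decode(raw)
    else:
        media_type = header
        data = raw
    # RFC 2397 default when the media type is omitted.
    return media_type or "text/plain;charset=US-ASCII", data

print(decode_data_uri("data:text/plain;base64,SGVsbG8="))
print(decode_data_uri("data:,Hello%20World"))
```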
Main Points
• Cool URIs don’t change
• URIs can be (and are) scribbled on napkins
• URIs don’t (necessarily) point to documents
• Dereferenceable URIs are also good as names
• The URL/URN distinction is obsolete
References
• http://www.apache.org/~fielding/uri/rev-2002/rfc2396bis.html
• http://www.ietf.org/rfc/rfc2396.txt
• http://www.w3.org/Provider/Style/URI
• http://www.w3.org/DesignIssues/Architecture.html
• http://www.w3.org/DesignIssues/Axioms.html
• http://www.w3.org/DesignIssues/NameMyth.html
Hope it Helped
• Thanks for your attention
• Questions? Comments?
• [email protected]