Tutorial
OAI and OAI-PMH for Beginners
An introduction to the Open Archives Initiative
and the Protocol for Metadata Harvesting
Uwe Müller
Humboldt University Berlin, Germany
[email protected]
Andy Powell
UKOLN, University of Bath
[email protected]
Agenda
 Part I
History and overview
 Part II
Technical introduction
 Coffee/tea break
 Part III
Implementation issues – data provider and service
provider
 Part IV
Implementation issues – XML schema and supporting
multiple record formats
2nd OAForum workshop - Lisbon - 5th-7th December 2002 - Tutorial: OAI and OAI-PMH for Beginners
Acknowledgements
 Some of the slides presented here are our own!
 Many of them have been kindly donated by (taken
from!):
Herbert Van de Sompel
Carl Lagoze
Michael Nelson
Simeon Warner
(and others probably!)
Part I: History and overview
Andy Powell
UKOLN, University of Bath
[email protected]
OAI roots…
 the roots of OAI lie in the development of eprint
archives…
arXiv, CogPrints, NACA (NASA), RePEc, NDLTD,
NCSTRL
 each offered Web interface for deposit of articles
and for end-user searches
 difficult for end-users to work across archives
without having to learn multiple different interfaces
 recognised need for single search interface to all
archives
Universal Pre-print Service (UPS)
Searching vs. harvesting
 two possible approaches to building the UPS…
 cross-searching multiple archives based on
protocol like Z39.50
 harvesting metadata into one or more ‘central’
services – bulk move data to the user-interface
 US digital library experience in this area (e.g.
NCSTRL) indicated that cross-searching not
preferred approach - distributed searching of N
nodes viable, but only for small values of N
NCSTRL: N > 100; bad
Problems of cross-searching
 collection description
how do you know which targets to search?
 query-language problem
syntax varies and drifts over time between the various
nodes
 rank-merging problem
how do you meaningfully merge multiple result sets?
 performance
tends to be limited by slowest target
 difficult to build browse interface
Universal Preprint Service
 a cross-archive digital library (DL) that provides services on
a collection of metadata harvested from multiple
archives
based on NCSTRL+; a modified version of Dienst
 demonstrated at Santa Fe NM, October 21-22,
1999
http://ups.cs.odu.edu/
D-Lib Magazine, 6(2) 2000 (2 articles)
http://www.dlib.org/dlib/february00/02contents.html
 UPS was soon renamed the Open Archives
Initiative (OAI) http://www.openarchives.org/
RDN experience
 similar experience within the UK Resource
Discovery Network (RDN)
 cross-searching of only 5 subject gateways
 problems with cross-searching approach
performance
central browse interface
 looking for metadata harvesting solution
Data and service providers
 UPS identified two logical groups of services…
 data providers
handle deposit/publishing of resources in archive
expose metadata about resources in archive
 service providers
harvest metadata from data providers
use it to offer single user-interface across all harvested
metadata
 note:
data provider may also be responsible for human-oriented
(i.e. Web) interface to archive
both functions may be offered by same ‘service’
Metadata harvesting requirements
 in order that harvesting approach can work need
agreements about…
 transport protocols – HTTP vs. FTP vs. …
 metadata formats – DC vs. MARC vs. …
 quality assurance – mandatory elements,
mechanisms for naming of people, subjects, etc.,
handling duplicated records, best-practice
 intellectual property and usage rights – who can
do what with the records
 work in this area resulted in the “Santa Fe
Convention”
OAI-PMH v 1.0 [01/2001]
 goal: optimise discovery of document-like objects
 inputs…
 Santa Fe Convention
 various DLF meetings on metadata harvesting
 deliberations at Cornell
 alpha-testers of OAI-PMH v 1.0
 recognition of DC as ‘best’ core metadata
format for interoperability across multiple
archives
OAI-PMH v 1.0 [01/2001]
 low-barrier interoperability specification
 metadata harvesting model: data provider /
service provider
 focus on document-like objects
 autonomous protocol
 HTTP based
 XML responses
 unqualified Dublin Core
 experimental: 12-18 months
What’s in a name?
 Open
the protocol is openly documented, and metadata is “exposed” to at least some
peer group (note: rights management can still apply!)
 Archives
archive defined as a “collection of stuff”, not the archivist’s definition of
“archive”; “Repository” used in most OAI documents
 Initiative
OAI is happening at break-neck speed...
OAI timeline before v. 2.0
 October 21-22, 1999 - initial UPS meeting
 February 15, 2000 - Santa Fe Convention published in D-Lib Magazine
precursor to the OAI metadata harvesting protocol
 June 3, 2000 - workshop at ACM DL 2000 (Texas)
 August 25, 2000 - OAI steering committee formed, DLF/CNI support
 September 7-8, 2000 - technical meeting at Cornell University
defined the core of the current OAI metadata harvesting protocol
 September 21, 2000 - workshop at ECDL 2000 (Portugal)
 November 1, 2000 - Alpha test group announced (~15 organizations)
 January 23, 2001 - OAI protocol 1.0 announced, OAI Open Day in the U.S. (Washington DC)
purpose: freeze protocol for 12-16 months, generate critical mass
 February 26, 2001 - OAI Open Day in Europe (Berlin)
 July 3, 2001 - OAI protocol 1.1 announced
to reflect changes in the latest W3C XML Schema recommendation
 September 8, 2001 - workshop at ECDL 2001 (Darmstadt)
OAI-PMH v.2.0 [06/2002]
 goal: recurrent exchange of metadata about
resources between systems
 inputs:
 OAI-PMH v.1.0
 feedback on the OAI-implementers list
 deliberations by OAI-tech [09/01 - 06/02]
 alpha test group of OAI-PMH v.2.0 [03/02 - 06/02]
 officially released June 14, 2002
OAI-PMH v.2.0 [06/2002]
 low-barrier interoperability specification
 metadata harvesting model: data provider /
service provider
 metadata about resources
 autonomous protocol
 HTTP based
 XML responses
 unqualified Dublin Core
 stable
            Santa Fe convention   OAI-PMH v.1.0/1.1         OAI-PMH v.2.0
nature      experimental          experimental              stable
verbs       Dienst                OAI-PMH                   OAI-PMH
requests    HTTP GET/POST         HTTP GET/POST             HTTP GET/POST
responses   XML                   XML                       XML
transport   HTTP                  HTTP                      HTTP
metadata    OAMS                  unqualified Dublin Core   unqualified Dublin Core
about       eprints               document-like objects     resources
model       metadata harvesting   metadata harvesting       metadata harvesting
Flexible deployment
 simple protocol based on HTTP and XML allows
for rapid deployment
 a number of toolkits available – see part III
 systems can be deployed in variety of
configurations
 multiple service providers can harvest from
multiple data providers
 aggregators can sit between data and service
providers
 harvesting approach can be complemented with
searching based on Z39.50 or SRW
Multiple data and service providers
[Figure: multiple service providers harvest from multiple data providers; harvesting based on OAI-PMH]
Aggregators
[Figure: an aggregator sits between data providers and service providers]
Can be mixed with cross-searching
[Figure: service providers reach data providers both by harvesting based on OAI-PMH and by searching based on Z39.50 or SRW]
Summary
 OAI-PMH – OAI Protocol for Metadata Harvesting
 low-cost mechanism for harvesting metadata
records from one system to another
from ‘data providers’ to ‘service providers’
 development over last 2-3 years has seen move
from specific (discovery of e-prints) to generic
(sharing descriptions of any resources)
 based on HTTP and XML – Web-friendly
 allows client to say ‘give me some or all of your
records’ where ‘some’ is based on
date-stamps, sets, metadata formats
Summary (2)
 mandates simple DC as record format but
extensible to any format encoded in XML
 OAI-PMH is not a search protocol
but its use can underpin search-based services based on
Z39.50 or SRW or …
 metadata and full-text typically made freely
available – but not a requirement
OAI-PMH can be used between closed groups
 access-control and compression mechanisms
based on underlying HTTP protocol
 simple protocol allows easy deployment
systems can be combined in variety of ways
Important resources
 OAI Web site:
http://www.openarchives.org/
 OAI-PMH specification:
http://www.openarchives.org/OAI/openarchivesprotocol.html
 Implementation guidelines:
http://www.openarchives.org/OAI/2.0/guidelines.htm
 Discussion lists:
http://www.openarchives.org/mailman/listinfo/oai-general
http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers
 Repository explorer:
http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai
 Tools: http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai
Part II: Technical Introduction
Uwe Müller
Humboldt University Berlin, Germany
[email protected]
Agenda
1. Protocol Basics
2. Protocol Details
3. Request Types
4. Examples
The Open Archives Initiative (OAI)
 Main ideas
world-wide consolidation of scholarly archives
free access to the archives (at least: metadata)
consistent interfaces for archives and service providers
low barrier protocol / effortless implementation
based on existing standards (e.g. HTTP, XML, DC)
 Basic functioning
[Figure: the service provider (harvester) sends requests (based on HTTP) to the data provider (repository), which holds metadata (and documents); the data provider returns metadata (encoded in XML), on which the service provider builds its “service”]
OAI: General Assumptions
 two groups of ‘participants’
 Data Providers (Open Archives, Repositories)
free access of metadata
not necessarily: free access to full texts / resources
easy to implement, low barriers
 Service Providers
use OAI interfaces of the Data Providers
harvest and store metadata (no live requests!)
may select certain subsets from Data Providers
(set hierarchy, date stamp)
may enrich metadata
offer (value-added) service on the basis of the metadata
OAI-PMH: Structure Model
[Figure: a service provider (harvester) sends requests to several data providers (repositories) and receives responses; the data providers shown include e-print servers, images, an OPAC, a museum and an archive]
Requests: Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, GetRecord
Responses: general information, metadata formats, set structure, record identifier, metadata
OAI-PMH: Protocol Overview
protocol based on HTTP
request arguments as GET or POST parameters
six request types
e.g. http://archive.org?
verb=ListRecords&from=2002-11-01
responses are encoded in XML syntax
supports any metadata format (at least: Dublin Core)
logical set hierarchy (definition: data providers)
date stamps (last change of metadata set)
error messages
flow control
Protocol Details: Definitions
Harvester
client application issuing OAI-PMH requests
Repository
network accessible server, able to process OAI-PMH requests
correctly
Resource
object the metadata is “about”, nature of resources is not defined in
the OAI-PMH
Item
component of a repository from which metadata about a resource
can be disseminated
has a unique identifier
Record
metadata in a specific metadata format
Identifier
unique key for an item in a repository
Set
optional construct for grouping items in a repository
Protocol Details: Definitions (2)
[Figure: a resource (here: David) is represented in the repository by an item; item = identifier, all available metadata about David; the item is disseminated as records: Dublin Core metadata, MARC metadata, SPECTRUM metadata]
Protocol Details: Records
 metadata of a resource in a specific format
 three parts
1. header (mandatory)
identifier (1)
datestamp (1)
setSpec elements (*)
status attribute for deleted item (?)
2. metadata (mandatory)
XML encoded metadata with root tag, namespace
repositories must support Dublin Core
3. about (optional)
rights statements
provenance statements
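The header part of a record can be read with a few lines of code. The following is a minimal sketch using Python's standard library; the sample header is made up for illustration (it reuses an identifier from the examples in this tutorial), and the function name parse_header is an assumption, not part of the protocol.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Illustrative header; the status attribute is only present for deleted items.
SAMPLE_HEADER = """
<header xmlns="http://www.openarchives.org/OAI/2.0/" status="deleted">
  <identifier>oai:HUBerlin.de:3000819</identifier>
  <datestamp>2002-01-08</datestamp>
  <setSpec>doctypes</setSpec>
  <setSpec>doctypes:dissertations</setSpec>
</header>
"""


def parse_header(xml_text):
    """Return identifier (1), datestamp (1), setSpecs (*) and deleted flag (?)."""
    header = ET.fromstring(xml_text)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=OAI_NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=OAI_NS),
        "set_specs": [e.text for e in header.findall("oai:setSpec", OAI_NS)],
        "deleted": header.get("status") == "deleted",
    }


parsed = parse_header(SAMPLE_HEADER)
print(parsed)
```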
Protocol Details: Datestamps
 date of last modification of a metadata set
 mandatory characteristic of every item
 two possible granularities:
YYYY-MM-DD, YYYY-MM-DDThh:mm:ssZ
 function: information on metadata, selective
harvesting (from and until arguments)
 applications: incremental update mechanisms
 modification, creation, deletion
 deletion: three support levels
no, persistent, transient
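A repository has to decide whether a from or until argument matches one of the two granularities. The sketch below shows one possible check; the function name datestamp_granularity and the labels "day" and "second" are assumptions made for this illustration.

```python
import re
from datetime import datetime

# Shape and strptime format for each of the two granularities.
GRANULARITIES = {
    "day": (r"\d{4}-\d{2}-\d{2}", "%Y-%m-%d"),
    "second": (r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z", "%Y-%m-%dT%H:%M:%SZ"),
}


def datestamp_granularity(value):
    """Return 'day' or 'second' for a valid datestamp, otherwise None."""
    for name, (shape, fmt) in GRANULARITIES.items():
        if re.fullmatch(shape, value) is None:
            continue
        try:
            datetime.strptime(value, fmt)  # rejects impossible dates such as month 13
        except ValueError:
            return None
        return name
    return None


print(datestamp_granularity("2002-12-01"))
print(datestamp_granularity("2002-12-01T13:45:00Z"))
print(datestamp_granularity("2002-12-01-13:45:00"))
```

A value for which the function returns None would be answered with a badArgument error.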
Protocol Details: Metadata Schema
 OAI-PMH supports dissemination of multiple
metadata formats from a repository
 properties of metadata formats
id string to specify the format (metadataPrefix)
metadata schema URL (XML schema to test validity)
XML namespace URI (global identifier for metadata
format)
 repositories must be able to disseminate
unqualified Dublin Core
 arbitrary metadata formats can be defined and
transported via the OAI-PMH
 returned metadata must comply with XML
namespace specification
Protocol Details: Metadata Schema (2)

minimum standard: unqualified Dublin Core
http://dublincore.org/
Dublin Core Metadata Element Set contains 15 elements
elements are optional
elements may be repeated
The Dublin Core Metadata Element Set:
Title
Contributor
Source
Creator
Date
Language
Subject
Type
Relation
Description
Format
Coverage
Publisher
Identifier
Rights
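Because every element is optional and repeatable, an unqualified Dublin Core record is easy to generate. The following sketch builds the oai_dc container with Python's standard library; the function name build_oai_dc and the sample values are assumptions for illustration, and the xsi:schemaLocation attribute that a full response carries is left out for brevity.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

# Ask ElementTree to serialise with the customary prefixes.
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def build_oai_dc(fields):
    """fields maps a DC element name to a list of values (elements may repeat)."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, values in fields.items():
        for value in values:
            ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = build_oai_dc({
    "title": ["An example title"],
    "creator": ["Smith, Adam", "Nash, John"],  # one element per creator
    "date": ["2002-12-05"],
})
print(xml_text)
```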
Protocol Details: Sets
 logical partitioning of repositories
 optional – archives do not have to define sets
 no recommendations
 not necessarily exhaustive
 not necessarily strictly hierarchical
 function: selective harvesting (set parameter)
 applications:
subject gateways, dissertation search engine, …
 examples (Germany, see http://www.dini.de)
publication types (thesis, article, …)
document types (text, audio, image, …)
content sets, according to DNB (medicine, biology, …)
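Selective harvesting by set can be sketched as a simple filter. The example assumes that hierarchy levels in a setSpec are separated by colons (as in doctypes:dissertations, which appears in the example responses of this tutorial); the function names and the sample data structure are assumptions for illustration.

```python
def belongs_to_set(set_spec, requested_set):
    """True if set_spec is the requested set or lies below it in the hierarchy."""
    return set_spec == requested_set or set_spec.startswith(requested_set + ":")


def select_by_set(headers, requested_set):
    """headers is a list of (identifier, [setSpec, ...]) pairs."""
    return [
        identifier
        for identifier, set_specs in headers
        if any(belongs_to_set(s, requested_set) for s in set_specs)
    ]


headers = [
    ("oai:HUBerlin.de:3000819", ["doctypes", "doctypes:dissertations", "dnb", "dnb:dnb33"]),
    ("oai:HUBerlin.de:3000831", ["doctypes", "doctypes:dissertations", "dnb", "dnb:dnb27"]),
]
print(select_by_set(headers, "dnb:dnb33"))
print(select_by_set(headers, "doctypes"))
```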
Protocol Details: Request Format
 requests must be submitted using the GET or
POST methods of HTTP
 repositories must support both methods
 at least one key=value pair: verb=[RequestType]
 additional key=value pairs depend on request
type
 example for GET request: http://archive.org/oai?
verb=ListRecords&metadataPrefix=oai_dc
 encoding of special characters
e.g. “:” (host port separator) becomes “%3A”
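A GET request of this kind can be assembled with Python's standard library, which also takes care of the encoding of special characters. This is a minimal sketch; the helper name build_request_url is an assumption, and the base URL is the placeholder used on these slides.

```python
from urllib.parse import urlencode


def build_request_url(base_url, verb, **arguments):
    """Return a GET URL: verb first, then the other key=value pairs."""
    pairs = [("verb", verb)] + sorted(arguments.items())
    return base_url + "?" + urlencode(pairs)  # urlencode escapes ':' as %3A


url = build_request_url(
    "http://archive.org/oai",
    "GetRecord",
    identifier="oai:HUBerlin.de:3000218",
    metadataPrefix="oai_dc",
)
print(url)
```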
Protocol Details: Response
 formatted as HTTP responses
 content type must be text/xml
 status codes (distinguished from OAI-PMH errors)
e.g. 302 (redirect), 503 (service not available)
 compression: optional in OAI-PMH,
only identity encoding is mandatory
 response format: well formed XML with markup:
1. XML declaration
(<?xml version="1.0" encoding="UTF-8" ?>)
2. root element named OAI-PMH with three attributes
(xmlns, xmlns:xsi, xsi:schemaLocation)
3. three child elements
  1. responseDate (UTC datetime)
  2. request (request that generated this response)
  3. a) error (in case of an error or exception condition)
     b) element with the name of the OAI-PMH request
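On the harvester side, the third child element decides how a response is handled. The sketch below separates error responses from regular ones; the function name read_response and both sample documents are assumptions written for this illustration (the xsi attributes are omitted to keep them short).

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

ERROR_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-12-05T15:00:00Z</responseDate>
  <request>http://archive.org/oai</request>
  <error code="badVerb">Illegal verb</error>
</OAI-PMH>"""

OK_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-12-05T15:00:00Z</responseDate>
  <request verb="ListSets">http://archive.org/oai</request>
  <ListSets/>
</OAI-PMH>"""


def read_response(xml_text):
    """Return the response date, any errors, and the name of the payload element."""
    root = ET.fromstring(xml_text)
    errors = [(e.get("code"), e.text) for e in root.findall(OAI + "error")]
    fixed = {OAI + "responseDate", OAI + "request", OAI + "error"}
    payload = [child.tag[len(OAI):] for child in root if child.tag not in fixed]
    return {
        "response_date": root.findtext(OAI + "responseDate"),
        "errors": errors,
        "payload": payload[0] if payload else None,
    }


print(read_response(ERROR_RESPONSE))
print(read_response(OK_RESPONSE))
```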
Protocol Details: Flow Control
 four of the request types return a list of entries
 three of them may return ‘large’ lists
 OAI-PMH supports partitioning
 decision on partitioning: repository
 response to a request includes
incomplete list
resumption token
+ expiration date, size of complete list, cursor (optional)
 new request with same request type
resumption token as parameter
all other parameters omitted!
 response includes
next (maybe last) section of the list
resumption token (empty if last section of list enclosed)
Protocol Details: Flow Control (2)
Example
Service Provider (Harvester): “want to have all your records”
archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc
Data Provider (Repository): “have 267, but give you only 100”
100 records + resumptionToken “anyID1”
Service Provider (Harvester): “want more of this”
archive.org/oai?verb=ListRecords&resumptionToken=anyID1
Data Provider (Repository): “have 267, give you another 100”
100 records + resumptionToken “anyID2”
Service Provider (Harvester): “want more of this”
archive.org/oai?verb=ListRecords&resumptionToken=anyID2
Data Provider (Repository): “have 267, give you my last 67”
67 records + resumptionToken “”
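The exchange in this example corresponds to a simple loop on the harvester side. The sketch below keeps the network out of the picture: fetch_page stands for whatever function issues the HTTP request and parses the response, and fake_fetch is a stand-in repository with 267 records and a page size of 100. All names are assumptions for illustration.

```python
def harvest(fetch_page, verb, **arguments):
    """Collect all entries of a list request, following resumption tokens."""
    entries, token = fetch_page({"verb": verb, **arguments})
    collected = list(entries)
    while token:  # an empty token marks the last section of the list
        # Follow-up requests carry only the verb and the resumption token.
        entries, token = fetch_page({"verb": verb, "resumptionToken": token})
        collected.extend(entries)
    return collected


ALL_RECORDS = [f"record-{i}" for i in range(267)]
requests_seen = []


def fake_fetch(params):
    """Stand-in for a repository: the token encodes the next start position."""
    requests_seen.append(params)
    start = int(params.get("resumptionToken", 0))
    end = start + 100
    token = str(end) if end < len(ALL_RECORDS) else ""
    return ALL_RECORDS[start:end], token


records = harvest(fake_fetch, "ListRecords", metadataPrefix="oai_dc")
print(len(records), len(requests_seen))
```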
Protocol Details: Errors and Exceptions
 repositories must indicate OAI-PMH errors
 inclusion of one or more error elements
 defined error identifiers
badArgument
badResumptionToken
badVerb
cannotDisseminateFormat
idDoesNotExist
noRecordsMatch
noMetadataFormats
noSetHierarchy
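An error response is an ordinary OAI-PMH response whose third child is an error element. The following sketch generates one; the function name error_response is an assumption, and the xmlns:xsi and xsi:schemaLocation attributes that a complete response carries are omitted here for brevity.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ERROR_CODES = {
    "badArgument", "badResumptionToken", "badVerb", "cannotDisseminateFormat",
    "idDoesNotExist", "noRecordsMatch", "noMetadataFormats", "noSetHierarchy",
}


def error_response(base_url, code, message):
    """Build a minimal OAI-PMH response containing a single error element."""
    if code not in ERROR_CODES:
        raise ValueError(f"unknown OAI-PMH error code: {code}")
    root = ET.Element("OAI-PMH", {"xmlns": "http://www.openarchives.org/OAI/2.0/"})
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    ET.SubElement(root, "responseDate").text = now
    ET.SubElement(root, "request").text = base_url
    ET.SubElement(root, "error", {"code": code}).text = message
    return ET.tostring(root, encoding="unicode")


error_xml = error_response("http://archive.org/oai", "badVerb", "Illegal verb")
print(error_xml)
```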
Request Types
 six different request types
1. Identify
2. ListMetadataFormats
3. ListSets
4. ListIdentifiers
5. ListRecords
6. GetRecord
 a harvester does not have to use all types
 a repository must implement all types
 required and optional arguments depend on request types
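The argument rules for the six request types can be written down as a small table and checked before any database access. This is a sketch of one possible argument parser; the names RULES and check_arguments are assumptions, and the check covers only the presence of arguments, not their values.

```python
# verb -> (required arguments, optional arguments, accepts resumptionToken)
RULES = {
    "Identify": (set(), set(), False),
    "ListMetadataFormats": (set(), {"identifier"}, False),
    "ListSets": (set(), set(), True),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}, True),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}, True),
    "GetRecord": ({"identifier", "metadataPrefix"}, set(), False),
}


def check_arguments(params):
    """Return None if the arguments are acceptable, else an OAI-PMH error code."""
    verb = params.get("verb")
    if verb not in RULES:
        return "badVerb"
    required, optional, resumable = RULES[verb]
    given = set(params) - {"verb"}
    if "resumptionToken" in given:
        # resumptionToken is exclusive: no other argument may accompany it.
        ok = resumable and given == {"resumptionToken"}
        return None if ok else "badArgument"
    if required <= given <= required | optional:
        return None
    return "badArgument"


print(check_arguments({"verb": "Identify", "set": "biology"}))
print(check_arguments({"verb": "ListRecords", "metadataPrefix": "oai_dc"}))
```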
Request Type: Identify
function
description of an archive
example
archive.org/oai-script?verb=Identify
parameters
none
errors / exceptions
badArgument
e.g. archive.org/oai-script?verb=Identify&
set=biology
Request Type: Identify (2)
response format
(# = occurrences: 1 exactly once, + one or more, * zero or more)
Element             Example                               #
repositoryName      My Archive                            1
baseURL             http://archive.org/oai                1
protocolVersion     2.0                                   1
earliestDatestamp   1999-01-01                            1
deletedRecord       no, transient, persistent             1
granularity         YYYY-MM-DD, YYYY-MM-DDThh:mm:ssZ      1
adminEmail          [email protected]                     +
compression         deflate, compress, …                  *
description         oai-identifier, eprints, friends, …   *
Request Type: ListMetadataFormats
function
retrieve available metadata formats from archive
example
archive.org/oai-script?verb=ListMetadataFormats&
identifier=oai:HUBerlin.de:3000218
parameters
identifier (optional)
errors / exceptions
badArgument
idDoesNotExist
e.g. archive.org/oai-script?verb=ListMetadataFormats&
identifier=really-wrong-identifier
noMetadataFormats
Request Type: ListSets
function
retrieve set structure of a repository
example
archive.org/oai-script?verb=ListSets
parameters
resumptionToken (exclusive)
errors / exceptions
badArgument
badResumptionToken
e.g. archive.org/oai-script?verb=ListSets&
resumptionToken=any-wrong-token
noSetHierarchy
Request Type: ListIdentifiers
function
abbreviated form of ListRecords, retrieving only headers
example
archive.org/oai-script?verb=ListIdentifiers&
metadataPrefix=oai_dc&from=2002-12-01
parameters
from (optional)
until (optional)
metadataPrefix (required)
set (optional)
resumptionToken (exclusive)
errors / exceptions
badArgument, e.g. …&from=2002-12-01-13:45:00
badResumptionToken
cannotDisseminateFormat
noRecordsMatch
noSetHierarchy
Request Type: ListRecords
function
harvest records from a repository
example
archive.org/oai-script?verb=ListRecords&
metadataPrefix=oai_dc&set=biology
parameters
from (optional)
until (optional)
metadataPrefix (required)
set (optional)
resumptionToken (exclusive)
errors / exceptions
badArgument
badResumptionToken
cannotDisseminateFormat
noRecordsMatch
noSetHierarchy
Request Type: GetRecord
function
retrieve individual metadata record from a repository
example
archive.org/oai-script?verb=GetRecord&
identifier=oai:HUBerlin.de:3000218&
metadataPrefix=oai_dc
parameters
identifier (required)
metadataPrefix (required)
errors / exceptions
badArgument
cannotDisseminateFormat
idDoesNotExist
Example: http://edoc.hu-berlin.de/OAI-2.0?
verb=ListIdentifiers&from=2002-01-03&until=2002-01-08&
metadataPrefix=oai_dc&set=doctypes:dissertations
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
         http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2002-10-22T17:49:49+01:00</responseDate>
  <request verb="ListIdentifiers" from="2002-01-03" until="2002-01-08" metadataPrefix="oai_dc"
           set="doctypes:dissertations">http://edoc.hu-berlin.de/OAI-2.0</request>
  <ListIdentifiers>
    <header>
      <identifier>oai:HUBerlin.de:3000819</identifier>
      <datestamp>2002-01-08</datestamp>
      <setSpec>doctypes</setSpec>
      <setSpec>doctypes:dissertations</setSpec>
      <setSpec>dnb</setSpec>
      <setSpec>dnb:dnb33</setSpec>
    </header>
    <header>
      <identifier>oai:HUBerlin.de:3000831</identifier>
      <datestamp>2002-01-07</datestamp>
      <setSpec>doctypes</setSpec>
      <setSpec>doctypes:dissertations</setSpec>
      <setSpec>dnb</setSpec>
      <setSpec>dnb:dnb27</setSpec>
    </header>
  </ListIdentifiers>
</OAI-PMH>
Example: http://edoc.hu-berlin.de/OAI-2.0?
verb=GetRecord&identifier=oai:HUBerlin.de:3000819&
metadataPrefix=oai_dc
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
         http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2002-11-27T14:57:01+01:00</responseDate>
  <request verb="GetRecord" metadataPrefix="oai_dc"
           identifier="oai:HUBerlin.de:3000819">http://edoc.hu-berlin.de/OAI-2.0</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:HUBerlin.de:3000819</identifier>
        […]
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
                   http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:title>Einfluß genetischer Variationen im Tumor Nekrose […]</dc:title>
          <dc:creator>Schüttlöffel, Antje</dc:creator>
          […]
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
Technical Introduction: Questions?
OAI – official site
http://www.openarchives.org/
protocol specification
http://www.openarchives.org/OAI/openarchivesprotocol.html
general mailing list
http://www.openarchives.org/mailman/listinfo/OAI-general/
implementers mailing list
http://www.openarchives.org/mailman/listinfo/OAI-implementers/
Part III: Implementation Issues
Data Provider and Service Provider
Uwe Müller
Humboldt University Berlin, Germany
[email protected]
Agenda
1. General Considerations
2. Data Provider
3. Service Provider
General: First Questions
Data Provider
Which data do I want to deliver?
Which service providers do I want to provide with data?
Service Provider
Which service do I want to provide?
From which data providers do I get the metadata?
In which way do the metadata have to be processed?
Data Provider & Service Provider
Which aspects do we have to agree upon?
General: Metadata Formats / Sets
 required: unqualified Dublin Core
 special subjects / communities: other metadata
specifications may be required
describe resources in a specialised way
definition of an XML schema (publicly available for
validation)
 define set hierarchy
sensible partitioning for selective harvesting
agreement between data providers and between data
and service providers
General: Organisational Structure
 aggregated data providers
if harvested by a service provider, “sub data providers”
should not be harvested by same SP (duplication ...)
 subject gateways
selective harvesting if corresponding sets have been
defined and implemented
Data Provider: Prerequisites
 metadata on resources (“items”)
should be stored in (SQL) database
possible in case of need: file system …
unique identifier for each item
 web server, accessible via the internet
e.g. Apache, IIS
 programming interface / API
e.g. Perl, PHP, Java-Servlet
web server extension
access to database (or filesystem)
not needed: session management
Data Provider: Prerequisites (2)
 archive identifier / base URL
 unique identifier for items
 metadata format (at least: unqualified Dublin
Core)
 datestamps for metadata (created / last modified)
 logical set hierarchy (may have)
agreement within (subject) communities
 flow control / implementation of resumption token
(optional, ‘larger’ archives should have that)
Data Provider: Architecture
[Figure: OAI data provider. An OAI request (HTTP request) reaches the web server (e.g. Apache, IIS); a programming extension (e.g. PHP, Perl, Java servlets) runs a script / programme, which sends an SQL request to the SQL database, receives the DB response, and returns the OAI response (XML instance)]
Tasks of the script / programme:
- parsing arguments
- creating error messages
- creating SQL statements
- creating XML output
Data Provider: General Structure
Argument Parser
validates OAI requests
Error Generator
creates XML responses with encoded error messages
Database Query / Local Metadata Extraction
retrieves metadata from repository
according to the required metadata format
XML Generator / Response Creation
creates XML responses with encoded metadata information
Flow Control
realises incomplete list sequences for ‘larger’ repositories
uses resumption token as mechanism
Data Provider: Flow Chart
[Flow chart: handling of a request in the data provider, from HTTP request to XML response]
Decisions shown:
verb: GetRecord, ListRecords, ListIdentifiers, ListSets, ListMetadataFormats, Identify; else error: badVerb
resumptionToken: empty, valid, unknown
metadataPrefix: oai_dc, empty, else
rows > 100: yes, no
Actions shown:
read parameters from local system
parse the other parameters
send SQL request to database
store parameters, store and deliver resumptionToken
deliver min (rows, 100) record headers
Errors shown:
badVerb, badArgument, badResumptionToken, cannotDisseminateFormat
Legend:
• verb, metadataPrefix, resumptionToken … OAI arguments
• rows … size of the result list
• 100 … here: maximal list size for responses
Data Provider: Resumption Token




should be implemented for “large” lists
initiated by data provider
store parameters (set, from, …) and number of already
delivered records
properties
expiration: expirationDate (optional)
completeListSize (optional)
already delivered records: cursor (optional)
recovery from network errors (possibility to re-issue most
recent resumption token)

problem: the database may change while a list sequence is still being delivered
two possible solutions
duplicate the data in a “request table”
store the date of the first request with the other parameters and
use it like an additional until argument
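The second solution (remembering the date of the first request and treating it like an extra until bound) could look roughly like this. The class `TokenState`, its field names and the helper `effective_until` are hypothetical; the sketch also assumes that all datestamps are UTC strings of the same granularity, so that plain string comparison orders them correctly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenState:
    """Parameters remembered for one resumption token (names are illustrative)."""
    metadata_prefix: str
    set_spec: Optional[str]
    from_date: Optional[str]
    until_date: Optional[str]
    first_request: str   # UTC datestamp of the first request in the sequence
    delivered: int       # number of records already delivered


def effective_until(state):
    """Upper datestamp bound for every query in the sequence.

    Records changed after the first request are excluded, so the list
    stays stable while the harvester pages through it.
    """
    if state.until_date is None:
        return state.first_request
    # ISO 8601 UTC strings of equal granularity compare chronologically.
    return min(state.until_date, state.first_request)
```

Records that change during the sequence are not lost: they carry a newer datestamp and are picked up by the next incremental harvest.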
Data Provider: Resumption Token (2)
Example
Service Provider (Harvester): “want to have all your records”
archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc
Data Provider (Repository): “have 267, but give you only 100”
100 records + resumptionToken “anyID1”
Service Provider: “want more of this”
archive.org/oai?verb=ListRecords&resumptionToken=anyID1
Data Provider: “have 267, give you another 100”
100 records + resumptionToken “anyID2”
Service Provider: “want more of this”
archive.org/oai?verb=ListRecords&resumptionToken=anyID2
Data Provider: “have 267, give you my last 67”
67 records + resumptionToken “”
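The data provider's side of this exchange (267 records delivered as 100 + 100 + 67, with an empty token at the end) can be sketched as below. The in-memory dictionary `_tokens`, the function name `list_records` and the use of random hex strings as tokens are assumptions for illustration; a real repository would keep this state in its database and let tokens expire.

```python
import uuid

MAX_LIST_SIZE = 100   # the maximal list size used in the example above
_tokens = {}          # token -> number of records already delivered


def list_records(records, token=None):
    """Return (chunk, next_token); next_token is "" once the list is complete."""
    if token is None:
        offset = 0
    else:
        # Unknown tokens raise KeyError here; a real provider would answer
        # with the badResumptionToken error instead.
        offset = _tokens[token]
    chunk = records[offset:offset + MAX_LIST_SIZE]
    delivered = offset + len(chunk)
    if delivered >= len(records):
        return chunk, ""
    next_token = uuid.uuid4().hex
    _tokens[next_token] = delivered
    return chunk, next_token
```

Tokens are deliberately not discarded after use, so a harvester can re-issue the most recent token after a network error and receive the same part of the list again.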
Data Provider: Resumption Token (3)
Example (2)
1. Service Provider: “want to have all your records”
archive.org/oai?verb=ListRecords&metadataPrefix=oai_dc
2. Data Provider: select dc-data from metadata-table (267 records)
“have 267, but give you only 100”
100 records + resumptionToken “anyID1”
stored with the token: anyID1 = { from=empty, until=empty, set=empty, mdP=oai_dc, date=2002-12-05T15:00:00Z, delivered=100 }
3. Database: insert, update, delete (now 268 records)
4. Service Provider: “want more of this”
archive.org/oai?verb=ListRecords&resumptionToken=anyID1
5. Data Provider: select dc-data from metadata-table (268 records)
“have 268, give you another 100”
100 records + resumptionToken “anyID2”
Data Provider: Data Representation
 use recommended data representation
dates
recommended: 2002-12-05
not recommended: 2002-xx-xx, 2002, 05.12.2002
language code
recommended: eng, ger, ...
not recommended: en, de, english, german
 multiple values: use a separate XML element for each value
author
recommended:
<dc:creator>Smith, Adam</dc:creator>
<dc:creator>Nash, John</dc:creator>
not recommended:
<dc:creator>Smith, Adam; Nash, John</dc:creator>
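A data provider whose local system stores dates as 05.12.2002 and authors as one semicolon-separated string could convert them before output roughly as follows. The helper names `iso_date` and `creator_elements` and the assumed local formats are illustrative only.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def iso_date(text, fmt="%d.%m.%Y"):
    """Convert a local date such as 05.12.2002 to the form 2002-12-05."""
    return datetime.strptime(text, fmt).date().isoformat()


def creator_elements(authors):
    """Build one dc:creator element per author instead of a joined string."""
    elements = []
    for name in authors.split(";"):
        name = name.strip()
        if name:
            el = ET.Element("{%s}creator" % DC_NS)
            el.text = name
            elements.append(el)
    return elements
```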
Data Provider: Compression







method to reduce traffic and enhance performance
optional for both sides: data and service providers
handled at the HTTP level
harvesters may include an Accept-Encoding header in
their requests to specify their preferences
harvesters that send no Accept-Encoding header always
receive uncompressed data
repositories must support the HTTP identity encoding
repositories should specify the encodings they support by
including compression elements in the Identify response
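On the harvester side, the negotiation might be handled like this. The function names `build_request` and `decode_body` are hypothetical; the sketch assumes the repository answers with a standard Content-Encoding header and supports only gzip, deflate or identity.

```python
import gzip
import urllib.request
import zlib


def build_request(url):
    """Create a request that tells the repository which encodings are acceptable."""
    return urllib.request.Request(
        url, headers={"Accept-Encoding": "gzip, deflate, identity"}
    )


def decode_body(body, content_encoding):
    """Undo the encoding named in the response's Content-Encoding header."""
    encoding = (content_encoding or "identity").lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        return zlib.decompress(body)
    if encoding == "identity":
        return body
    raise ValueError("unsupported Content-Encoding: " + encoding)
```

A response without a Content-Encoding header is treated as identity, which matches the rule that uncompressed delivery must always be possible.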
Data Provider: Test and Registration


create own OAI-PMH requests and send to OAI interface –
check results
use the Repository Explorer (Virginia Tech)
http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai/
provide arguments via HTML forms
responses are validated
‘browsing’ to other requests
automatic conformance tester

official registration site
http://www.openarchives.org/data/registerasprovider.html
provide base URL
extensive conformance test (incl. error conditions …)
information on incorrect behaviour
in case of conformance – added to the official list
regular checks
Agenda
1. General Considerations
2. Data Provider
3. Service Provider
Service Provider: Examples
 Repository Explorer:
http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai/
 search engines / subject gateways
Cross Archive Searching Service: http://arc.cs.odu.edu/
MyOAI: http://www.myoai.org/
DINI: http://edoc.hu-berlin.de/oaisearch/
Physnet: http://physnet.uni-oldenburg.de/oai/query.php
 internal communication
ProPrint: http://edoc.hu-berlin.de/proprint/
library consortia
Service Provider: Prerequisites
 internet connected server
 database system (relational or XML)
 programming environment
can issue HTTP requests to web servers
can issue database requests
XML parser
Service Provider: Structure (1)
Archive Management
selection of archives to be harvested
enter entries manually or
automatically add / remove archives using the
official registry
Request Component
creates HTTP requests and sends them to OAI
archives (data provider)
demands metadata using the allowed verbs of the
OAI-PMH
possibly selective harvesting (set parameter)
Service Provider: Structure (2)
Scheduler
schedules regular, timed harvesting of the
associated archives
simplest case: manual initiation of the jobs
else: e.g. cron job …
Flow Control
resumption token: the result list is partitioned into
incomplete sections; a new request retrieves
more results
HTTP status 503 (Service Unavailable): analyse
the response to extract the “Retry-After” period
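Extracting the waiting period from a 503 response could be sketched as follows. The function name `retry_delay` and the 60-second fallback are assumptions; HTTP allows the Retry-After header to hold either a number of seconds or an HTTP date, and both forms are handled.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay(retry_after, now=None, default=60.0):
    """Seconds to wait before repeating a request answered with HTTP 503.

    Retry-After holds either a number of seconds or an HTTP date.
    """
    if retry_after is None:
        return default
    value = retry_after.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Unparseable header: fall back to the default delay.
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

The harvester would sleep for the returned number of seconds and then repeat exactly the same request.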
Service Provider: Structure (3)
Update Mechanism
realises consolidation of metadata which have been
harvested earlier (merge old and new data)
easiest case: always delete all ‘old’ metadata of an archive
before harvesting it
reasonable: incremental update (from parameter) – insert
new metadata and overwrite changed / deleted metadata
(assignment using the unique identifiers)
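The incremental update described above (insert new, overwrite changed, drop deleted, all matched by unique identifier) reduces to a small merge step. The function name `apply_harvest` and the tuple shape of the harvested records are illustrative assumptions.

```python
def apply_harvest(store, harvested):
    """Merge newly harvested records into the local store, keyed by identifier.

    `store` maps OAI identifiers to metadata; `harvested` is an iterable of
    (identifier, deleted, metadata) tuples. Both shapes are illustrative.
    """
    for identifier, deleted, metadata in harvested:
        if deleted:
            # A header with status="deleted" removes the local copy.
            store.pop(identifier, None)
        else:
            # New records are inserted, changed records overwritten.
            store[identifier] = metadata
    return store
```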
XML Parser
analyses the responses received from the archives
validation: using the XML schema
transforms the metadata encoded in XML into the internal
data structure
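Transforming a response into an internal data structure might look like this with Python's standard XML library. The dictionary layout and the function name `parse_records` are assumptions; the sketch only checks well-formedness and does not validate against the XML schema, which would need an additional library.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_records(xml_text):
    """Turn a ListRecords response into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        header = rec.find("oai:header", NS)
        records.append({
            "identifier": header.findtext("oai:identifier", default="", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", default="", namespaces=NS),
            "deleted": header.get("status") == "deleted",
            # Multi-valued elements such as dc:creator become lists.
            "creators": [e.text for e in rec.iterfind(".//dc:creator", NS)],
            "titles": [e.text for e in rec.iterfind(".//dc:title", NS)],
        })
    return records
```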
Service Provider: Structure (4)
Normaliser
 transforms data into a homogeneous structure
(different metadata formats)
 harmonises representation (e.g. date, author,
language code)
 maps / translates different languages
Database
 mapping the XML structure of the metadata into a
relational database (multi values …)
 or: use an XML database
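One way to map repeatable elements into a relational database is a second table with one row per value. The table and column names below are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3


def create_schema(conn):
    """One row per record, plus one row per value of each repeatable element."""
    conn.executescript("""
        CREATE TABLE record (
            identifier TEXT PRIMARY KEY,
            datestamp  TEXT NOT NULL
        );
        CREATE TABLE record_value (
            identifier TEXT NOT NULL REFERENCES record(identifier),
            element    TEXT NOT NULL,   -- e.g. 'creator', 'title'
            position   INTEGER NOT NULL,
            value      TEXT NOT NULL
        );
    """)


def store_record(conn, identifier, datestamp, values):
    """Insert or replace a record; `values` is a list of (element, value) pairs."""
    conn.execute("DELETE FROM record_value WHERE identifier = ?", (identifier,))
    conn.execute("INSERT OR REPLACE INTO record VALUES (?, ?)", (identifier, datestamp))
    conn.executemany(
        "INSERT INTO record_value VALUES (?, ?, ?, ?)",
        [(identifier, el, i, v) for i, (el, v) in enumerate(values)],
    )
```

The `position` column preserves the order of values, for example the author order, which a plain set of rows would lose.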
Service Provider: Structure (5)
Duplication Checker
merges identical records from different data providers
possibility: unique identifier for the item (e.g. URN, …)
but: often not easily practicable and not risk / error free
Service Module
provides the actual service to the ‘public’
basis: harvested and stored records of the associated
archives
uses only local database for requests etc.
Service Provider: Architecture
[Architecture diagram: users and an administrator work with the OAI Service Provider, which consists of scheduler, harvester, flow control, XML parser, normaliser, duplication checker, update mechanism, database and service module; the harvester contacts several data providers.]
Service Provider: Resumption Token
 optional from the data provider’s point of view
 but: mandatory for service providers
 for complete lists: resume sequences of
incomplete lists
1. ‘recognise’ that response contains incomplete list
2. re-issue OAI request to data provider in order to get
next part of the list
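The two steps above amount to a simple loop in the harvester. In this sketch the function names `harvest` and `http_get` are hypothetical, and the `fetch` parameter exists only so that the loop can be exercised without a network connection.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def http_get(url):
    """Default transport: fetch a URL and return the response body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def harvest(base_url, metadata_prefix="oai_dc", fetch=http_get):
    """Yield every record of a ListRecords sequence, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urllib.parse.urlencode(params)))
        for record in root.iter(OAI_NS + "record"):
            yield record
        token = root.find(".//" + OAI_NS + "resumptionToken")
        # A missing or empty resumptionToken marks the last part of the list.
        if token is None or not (token.text or "").strip():
            return
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Error responses and HTTP 503 handling are left out here to keep the loop readable.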
Service Provider: Test and Registration
 harvest registered (OAI-compliant!) data
providers
 test behaviour of service provider
 official registration site
http://www.openarchives.org/service/registerasprovider.html
provide institutional information
web site, email address, ...
Data & Service Provider: Questions?
Tutorial
OAI and OAI-PMH for Beginners
An introduction to the Open Archives Initiative
and the Protocol for Metadata Harvesting
Part IV: Implementation issues - XML
schemas and support for multiple
record formats
Andy Powell
UKOLN, University of Bath
[email protected]
Agenda
1. basics
2. XML schema details
3. extending oai_dc for your application
4. using IMS metadata as new record format
Basics
 OAI-PMH uses XML Schemas to define record
formats
 you can exchange any data you like using OAI-PMH as long as you can encode it as XML and
define an XML schema for it!
 OAI-PMH mandates the ‘oai_dc’ XML schema
 OAI-PMH documentation also describes use of
XML schema to exchange
rfc1807: a schema for rfc1807 format metadata;
marc21: a recommended schema for MARC21
metadata, provided by the Library of Congress;
oai_marc: a schema for MARC format metadata
A closer look at oai_dc
 the simple DC schema used as mandatory record
format in OAI-PMH defines a container schema
 container schema is OAI-specific
 container schema is hosted on the OAI Web site
 imports a generic DCMES schema
 generic DCMES schema is hosted on the DCMI
Web site
 same model likely to be used for ‘qualified’ DC
schema – container schema hosted by OAI,
generic schema hosted by DCMI
An oai_dc record…
 an example oai_dc record (viewed via the
repository explorer)
 here’s the full GetRecord response
 three important things to notice…
 namespace for the oai_dc format
xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
 namespace for DCMES elements
xmlns:dc="http://purl.org/dc/elements/1.1/"
 container schema associated with the oai_dc
namespace
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
http://www.openarchives.org/OAI/2.0/oai_dc.xsd"
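The three declarations can be seen together when an oai_dc container is generated programmatically. This is a sketch using Python's standard XML library; the function name `oai_dc_record` and the input shape are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
for prefix, uri in (("oai_dc", OAI_DC), ("dc", DC), ("xsi", XSI)):
    ET.register_namespace(prefix, uri)


def oai_dc_record(fields):
    """Serialise (element name, value) pairs as an oai_dc metadata container."""
    root = ET.Element("{%s}dc" % OAI_DC)
    # Associates the container schema with the oai_dc namespace.
    root.set("{%s}schemaLocation" % XSI,
             OAI_DC + " http://www.openarchives.org/OAI/2.0/oai_dc.xsd")
    for name, value in fields:
        ET.SubElement(root, "{%s}%s" % (DC, name)).text = value
    return ET.tostring(root, encoding="unicode")
```

Calling `oai_dc_record([("title", "A title"), ("creator", "Smith, Adam")])` returns a single `oai_dc:dc` element that carries both namespace declarations and the schemaLocation attribute.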
The XML schemas
 The oai_dc container schema
http://www.openarchives.org/OAI/2.0/oai_dc.xsd
 imports DCMES schema from
http://dublincore.org/schemas/xmls/simpledc20020312.xsd
 defines a container element called ‘dc’
 lists the allowed elements within the ‘dc’ container
(from the DCMES namespace/schema above)
When oai_dc isn’t enough
 when the 15 DCMES elements are too limited –
e.g. adding extra metadata elements
 when you need greater precision in your metadata
records – e.g. adding ‘encoding schemes’ to
existing elements
 when you want to exchange other metadata
formats
IMS/IEEE LOM – eLearning metadata
ODRL – Open Digital Rights Language
Extending the oai_dc schema
 simple scenario…
 RDN currently uses oai_dc schema to exchange
records but wants to add one additional element
called
accessControl
 note: this is not a real scenario…
RDN really wants to use qualified DC records, but
qualified DC is too complicated for this tutorial!
we hope to write up the RDN work on exchanging qualified DC in
a future issue of Ariadne
Step 1 – metadata format name
 the new metadata format needs a name
 in this case, we’ve chosen
rdn_dc
 following OAI’s naming of ‘oai_dc’
 alternative possibilities
rdndc
rdn
etc.
Step 2 – create namespaces
 two namespaces are required…
 namespace for the rdn_dc format
http://www.rdn.ac.uk/oai/rdn_dc/
 namespace for the new metadata elements
(properties) that we are going to use in this format
http://purl.org/rdn/terms/
 note:
use of a PURL for the elements namespace follows DCMI
usage but is not mandatory
however, both these namespace URIs should be under
your control to ensure uniqueness and prevent re-use in
the future
URIs do not need to resolve to anything
Step 3 – local copy of DC schema
 make local copy of the DCMES schema
 in this case the copy is at
http://www.rdn.ac.uk/oai/rdn_dc/20021204/dc.xsd
 this step isn’t strictly necessary
 in fact – it is probably bad practice to do this
 but, currently some minor problems with the
DCMI-hosted copy of the schema
 …working with local copy is easier
Step 4 – schema for new terms
 create an XML schema for the new ‘rdnterms’
 in this case the schema is available at
http://www.rdn.ac.uk/oai/rdn_dc/20021204/rdnterms.xsd
 the schema defines the new element/property
accessControl
 and adds it to the dc:any group
 also creates a new container type
rdnterms:elementContainer
 note:
schema URI contains a date-stamp
this should make future enhancements to the schema
easier to implement
Step 5 – container schema
 create a container schema for the new record
format
 in this case the schema is available at
http://www.rdn.ac.uk/oai/rdn_dc/20021204/rdn_dc.xsd
 this simply imports the rdnterms schema
 then defines a container element called ‘rdndc’ of
type
rdnterms:elementContainer
 again, the schema URI contains a date-stamp
Step 6 – validate, validate, val…
 create some test records using your new schemas
http://www.rdn.ac.uk/oai/rdn_dc/20021204/test.xml
http://www.rdn.ac.uk/oai/rdn_dc/20021204/oai-test.xml
 use the XML schema validator at
http://www.w3.org/2001/03/webdata/xsv
Step 7 – ListMetadataFormats
 add information about the new format to your
repository’s response to the
‘ListMetadataFormats’ request…
…
<metadataFormat>
<metadataPrefix>rdn_dc</metadataPrefix>
<schema>http://www.rdn.ac.uk/oai/rdn_dc/20021113/rdn_dc.xsd</schema>
<metadataNamespace>http://www.rdn.ac.uk/oai/rdn_dc/</metadataNamespace>
</metadataFormat>
…
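If the repository keeps its supported formats in a table, the response fragment can be generated from it. The sketch below uses the values shown above plus the standard oai_dc entry; the names `FORMATS` and `list_metadata_formats` are hypothetical.

```python
import xml.etree.ElementTree as ET

# (metadataPrefix, schema, metadataNamespace) for each supported format.
FORMATS = [
    ("oai_dc",
     "http://www.openarchives.org/OAI/2.0/oai_dc.xsd",
     "http://www.openarchives.org/OAI/2.0/oai_dc/"),
    ("rdn_dc",
     "http://www.rdn.ac.uk/oai/rdn_dc/20021113/rdn_dc.xsd",
     "http://www.rdn.ac.uk/oai/rdn_dc/"),
]


def list_metadata_formats(formats=FORMATS):
    """Build the ListMetadataFormats element from the format table."""
    parent = ET.Element("ListMetadataFormats")
    for prefix, schema, namespace in formats:
        fmt = ET.SubElement(parent, "metadataFormat")
        ET.SubElement(fmt, "metadataPrefix").text = prefix
        ET.SubElement(fmt, "schema").text = schema
        ET.SubElement(fmt, "metadataNamespace").text = namespace
    return parent
```

Adding a further format then only requires one more entry in the table.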
Step 8 – other verbs
 modify your repository’s responses to the
‘ListIdentifiers’, ‘ListRecords’ and ‘GetRecord’
requests (‘ListSets’ takes no ‘metadataPrefix’ argument)
 accept ‘metadataPrefix’ set to new format name
‘rdn_dc’
 return records formatted according to the new
schema(s)
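Inside the repository, supporting a second format usually comes down to choosing a serialiser by metadataPrefix. In this sketch the record keys `title` and `access`, the function names and the exception class are all hypothetical; an unknown prefix leads to the OAI-PMH error cannotDisseminateFormat.

```python
class CannotDisseminateFormat(Exception):
    """Signals the OAI-PMH error code cannotDisseminateFormat."""


def oai_dc_fields(record):
    """Fields exposed in the mandatory oai_dc format."""
    return [("dc:title", record["title"])]


def rdn_dc_fields(record):
    """oai_dc fields plus the additional accessControl element."""
    return oai_dc_fields(record) + [("rdnterms:accessControl", record["access"])]


SERIALISERS = {"oai_dc": oai_dc_fields, "rdn_dc": rdn_dc_fields}


def fields_for(record, metadata_prefix):
    """Select the serialiser for the requested metadataPrefix."""
    try:
        serialiser = SERIALISERS[metadata_prefix]
    except KeyError:
        raise CannotDisseminateFormat(metadata_prefix) from None
    return serialiser(record)
```

Keeping oai_dc in the same table makes it harder to break the mandatory format while adding the new one.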
Step 9 – validate again





use the Repository Explorer to check that:
all requests work with new ‘metadataPrefix’
oai_dc format still works!
appropriate records are returned for each format
responses validate correctly
Summary
 decide on name for your new metadata format
and appropriate namespaces
 develop XML schemas for container and new
elements if appropriate
 create test records and validate
 modify your repository (source code and/or
configuration files) to support the new format
 validate and test repository
Other record formats
 can take similar approach with other metadata
record formats
IMS/IEEE LOM
ODRL
 in these cases, XML schemas and namespaces
have already been agreed
 deployment of these formats should be easier
because you don’t need to define your own
schemas…
BUT… the XML schema specs are currently undergoing continual
revision, so it is sometimes hard for applications like
IMS to keep up!
Adding support for IMS
 modify ‘ListMetadataFormats’ response to include
…
<metadataFormat>
<metadataPrefix>ims</metadataPrefix>
<schema>http://www.imsglobal.org/xsd/imsmd_v1p2p2.xsd</schema>
<metadataNamespace>
http://www.imsglobal.org/xsd/imsmd_v1p2
</metadataNamespace>
</metadataFormat>
…
 extend ‘ListIdentifiers’, ‘ListRecords’
and ‘GetRecord’ requests (‘ListSets’ takes no ‘metadataPrefix’ argument)
accept ‘metadataPrefix’ set to ‘ims’ and return records
formatted appropriately
Tutorial
OAI and OAI-PMH for Beginners
An introduction to the Open Archives Initiative
and the Protocol for Metadata Harvesting
Summary
 during today’s tutorial we hope that you have
 gained an overview of the history behind the OAI-PMH and an overview of its key features
 been given a deeper technical insight into how the
protocol works
 learned something about some of the main
implementation issues
 found some useful starting points and hints that
will help you as implementors
Questions
 now…
 feel free to tell us what you didn’t understand
 and ask general questions (of course!)
Uwe Müller
Humboldt University Berlin, Germany
[email protected]
Andy Powell
UKOLN, University of Bath
[email protected]