Name
 
  • OAI-PMH
Service Genre
 
Description
 

OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting) “defines a mechanism for harvesting XML-formatted metadata from repositories”. It mandates unqualified Dublin Core (DC), exposed under the metadata prefix oai_dc, as the common metadata format that every repository must support. OAI-PMH also “supports the notion of multiple metadata sets, allowing communities to expose metadata in formats that are specific to their applications and domains”. Other metadata formats might include domain-specific Dublin Core Application Profiles (DCAPs) or other XML formats, such as LOM or ODRL.
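
The oai_dc format places the 15 unqualified Dublin Core elements inside an oai_dc:dc container element. The following is a minimal Python sketch using only the standard library; the function name build_oai_dc is hypothetical, and the xsi:schemaLocation attribute that repositories normally include is omitted for brevity.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Namespace URIs of the oai_dc container and the Dublin Core elements.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Serialise with the conventional prefixes instead of ns0/ns1.
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# The 15 unqualified Dublin Core elements.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def build_oai_dc(fields: Dict[str, List[str]]) -> ET.Element:
    """Build an <oai_dc:dc> element from DC element names mapped to values."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not an unqualified Dublin Core element: {name}")
        for value in values:  # every DC element is optional and repeatable
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


record = build_oai_dc({"title": ["An example thesis"], "creator": ["Example Author"]})
print(ET.tostring(record, encoding="unicode"))
```

Rejecting any other element name keeps the output within unqualified DC. Richer descriptions belong in one of the additional metadata formats.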

Non-XML data, e.g. MARC records, could either be conveyed using an XML translation (e.g. MARCXML) or could conceivably be wrapped in a CDATA section within the XML record itself. The latter approach, although messy, could also allow content as well as metadata to be harvested using OAI-PMH, with the content carried in packages that follow standards such as METS, IMS Content Packaging and MPEG-21 DIDL.
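
The CDATA approach can be sketched in Python with the standard xml.dom.minidom module. The function name wrap_in_cdata and the wrapper element name marc are hypothetical. The sketch also shows one concrete reason the approach is messy: a CDATA section cannot contain the sequence "]]>", and XML 1.0 forbids most control characters even inside CDATA. Raw MARC 21 records use several of these characters as delimiters.

```python
from xml.dom import minidom

# C0 control characters that XML 1.0 forbids everywhere, even inside CDATA.
# MARC 21 uses 0x1D (record terminator), 0x1E (field terminator) and
# 0x1F (subfield delimiter), all of which are in this set. Other forbidden
# code points (such as U+FFFE) are not checked in this sketch.
_FORBIDDEN = {chr(c) for c in range(0x20) if c not in (0x09, 0x0A, 0x0D)}


def wrap_in_cdata(element_name: str, payload: str) -> str:
    """Return an XML element that holds `payload` verbatim in a CDATA section."""
    if "]]>" in payload:
        raise ValueError("payload contains ']]>', which would close the CDATA section")
    if any(ch in _FORBIDDEN for ch in payload):
        raise ValueError("payload contains control characters not allowed in XML 1.0")
    doc = minidom.Document()
    wrapper = doc.createElement(element_name)
    wrapper.appendChild(doc.createCDATASection(payload))
    doc.appendChild(wrapper)
    return wrapper.toxml()
```

A line-based text rendering of a record, including characters such as & and <, wraps cleanly. A raw MARC 21 record raises ValueError, so it would first have to be converted or encoded.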

The OAI-PMH specification defines 2 main actors:

  • harvester - a client application that issues OAI-PMH requests. A harvester is operated by a service provider as a means of collecting metadata from repositories.
  • repository - a network accessible server, managed by a data provider to expose metadata to harvesters.

To allow various repository configurations, the protocol distinguishes between three distinct entities related to the metadata made accessible by repositories:

  • resource - the object or “stuff” that metadata is “about”. The nature of a resource, whether it is physical or digital, or whether it is stored in the repository or is a constituent of another database, is outside the scope of the protocol.
  • item - a constituent of a repository from which metadata about a resource can be disseminated. An item is conceptually a container that stores or dynamically generates metadata about a single resource in multiple formats.
  • record - metadata in a specific metadata format. A record is returned as an XML-encoded byte stream in response to a protocol request to disseminate a specific metadata format from a constituent item.
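
The item/record distinction can be modelled with two small Python data classes. This is a minimal sketch with hypothetical class names, not a structure the specification prescribes. The resource itself is deliberately absent because the protocol leaves it out of scope.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Record:
    """Metadata about one resource in one specific metadata format."""
    identifier: str       # identifier of the item the record was disseminated from
    metadata_prefix: str  # e.g. "oai_dc"
    xml: str              # the XML-encoded metadata itself


@dataclass
class Item:
    """Container for metadata about a single resource, in several formats.

    Each metadata prefix maps to a function that returns stored XML or
    generates it on demand ("stores or dynamically generates").
    """
    identifier: str
    formats: Dict[str, Callable[[], str]]

    def disseminate(self, metadata_prefix: str) -> Record:
        """Return the record for one metadata format of this item."""
        if metadata_prefix not in self.formats:
            # OAI-PMH reports this case with the cannotDisseminateFormat error code.
            raise LookupError(f"cannotDisseminateFormat: {metadata_prefix}")
        return Record(self.identifier, metadata_prefix, self.formats[metadata_prefix]())
```

Storing a callable per format lets the same item either serve pre-built XML or generate it when requested, matching the definition of an item above.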
Requests & Behaviours
 

OAI-PMH requests must be submitted using either the HTTP GET or POST methods. POST has the advantage of imposing no limitations on the length of arguments. Repositories must support both the GET and POST methods.
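
The same request can be expressed either way. The following minimal Python sketch builds a ListRecords request as a GET (arguments in the query string) and as a POST (arguments in an application/x-www-form-urlencoded body) using the standard library. The base URL is hypothetical, and nothing is sent over the network. Arguments are passed as a dictionary because from is a reserved word in Python.

```python
from typing import Dict
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://repository.example.org/oai"  # hypothetical base URL


def build_get(base_url: str, verb: str, args: Dict[str, str]) -> Request:
    """HTTP GET: the verb and arguments go in the URL query string."""
    query = urlencode({"verb": verb, **args})
    return Request(f"{base_url}?{query}", method="GET")


def build_post(base_url: str, verb: str, args: Dict[str, str]) -> Request:
    """HTTP POST: the same key=value pairs go in a form-encoded body,
    so the length of the URL is no longer a constraint."""
    body = urlencode({"verb": verb, **args}).encode("ascii")
    return Request(
        base_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


harvest_args = {"metadataPrefix": "oai_dc", "from": "2007-01-01"}
get_request = build_get(BASE_URL, "ListRecords", harvest_args)
post_request = build_post(BASE_URL, "ListRecords", harvest_args)
# Either request could then be sent with urllib.request.urlopen().
```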

The protocol specifies 6 requests (or ‘verbs’) that OAI-PMH compliant repositories must be able to support:

  • Identify - used to retrieve information about a repository. The response has the following mandatory elements:
    • repositoryName - the name of the repository
    • baseURL - the base URL of the repository
    • protocolVersion - the version of OAI-PMH supported
    • adminEmail - the e-mail address of a repository administrator (one or more)
    • earliestDatestamp - the lower limit of all datestamps recording additions, changes or deletions in the repository
    • deletedRecord - how the repository records deletions (no, transient or persistent)
    • granularity - whether datestamps are given to the day (YYYY-MM-DD) or to the second (YYYY-MM-DDThh:mm:ssZ)
  • ListMetadataFormats - used to retrieve the metadata formats available from the repository as a whole or for an individual item (the data returned includes the metadata prefixes to be used in other requests). Possible arguments:
    • identifier (optional) - metadata formats available for an individual item
  • ListSets - used to retrieve information about the set structure of the repository. One possible argument:
    • resumptionToken (exclusive) - used as part of flow control.
  • ListIdentifiers - used to retrieve record headers only, rather than full records. Possible arguments:
    • metadataPrefix (required) - only return headers of records available in this metadata format
    • from (optional) - the earliest date stamp for records to be returned
    • until (optional) - the latest date stamp for records to be returned
    • set (optional) - only return headers of records which are members of the named set
    • resumptionToken (exclusive) - used as part of flow control.
  • ListRecords - used to harvest records from a repository. Possible arguments:
    • metadataPrefix (required) - the format in which the metadata should be returned
    • from (optional) - the earliest date stamp for records to be returned
    • until (optional) - the latest date stamp for records to be returned
    • set (optional) - only harvest records which are members of the named set
    • resumptionToken (exclusive) - used as part of flow control.
  • GetRecord - used to retrieve an individual metadata record from the repository. Two required arguments:
    • identifier - the unique identifier of the item in the repository
    • metadataPrefix - the format in which the metadata should be returned
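
The argument rules summarised above fit in a small lookup table. The following minimal Python sketch (VERB_ARGS and check_arguments are hypothetical names) returns the OAI-PMH error codes badVerb or badArgument for a malformed request. It does not check datestamp syntax or the other conditions that the specification also treats as errors.

```python
from typing import Dict, Optional, Set

# Argument rules per verb. An "exclusive" argument may only appear alone
# (apart from the verb itself).
VERB_ARGS: Dict[str, Dict[str, Set[str]]] = {
    "Identify": {"required": set(), "optional": set(), "exclusive": set()},
    "ListMetadataFormats": {"required": set(), "optional": {"identifier"}, "exclusive": set()},
    "ListSets": {"required": set(), "optional": set(), "exclusive": {"resumptionToken"}},
    "ListIdentifiers": {"required": {"metadataPrefix"},
                        "optional": {"from", "until", "set"},
                        "exclusive": {"resumptionToken"}},
    "ListRecords": {"required": {"metadataPrefix"},
                    "optional": {"from", "until", "set"},
                    "exclusive": {"resumptionToken"}},
    "GetRecord": {"required": {"identifier", "metadataPrefix"},
                  "optional": set(), "exclusive": set()},
}


def check_arguments(verb: str, args: Dict[str, str]) -> Optional[str]:
    """Return "badVerb" or "badArgument" for a malformed request, else None.

    `args` holds every request argument except verb.
    """
    rules = VERB_ARGS.get(verb)
    if rules is None:
        return "badVerb"
    given = set(args)
    if given & rules["exclusive"]:
        return None if len(given) == 1 else "badArgument"
    if not rules["required"] <= given:
        return "badArgument"  # a required argument is missing
    if given - rules["required"] - rules["optional"]:
        return "badArgument"  # an argument not defined for this verb
    return None
```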
Use & Interactions
 

As outlined above, OAI-PMH is mainly used for harvesting metadata records in XML format, although it can also be used to transport the resources themselves.

The protocol supports selective harvesting by allowing harvesters to limit the records returned using date stamps, sets (groups of records), or both. It also supports segmentation of large result sets (flow control) through resumption tokens.
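
Flow control means a harvester repeats ListRecords with the resumptionToken from each incomplete list until the repository returns an empty token (or none at all). The following minimal Python sketch of that loop uses xml.etree.ElementTree. The fetch callable is a hypothetical stand-in for the HTTP layer, injected so that the loop can run without a network, and selective may carry from, until and/or set.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator, Optional, Union

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def harvest(
    fetch: Callable[[Dict[str, str]], Union[str, bytes]],
    metadata_prefix: str = "oai_dc",
    selective: Optional[Dict[str, str]] = None,
) -> Iterator[ET.Element]:
    """Yield each <record> of a ListRecords harvest, following resumption tokens.

    `fetch` sends one request (a dict of OAI-PMH arguments) and returns the
    response XML.
    """
    args = {"verb": "ListRecords", "metadataPrefix": metadata_prefix, **(selective or {})}
    while True:
        root = ET.fromstring(fetch(args))
        list_records = root.find("oai:ListRecords", OAI_NS)
        if list_records is None:
            # An <error> response (e.g. noRecordsMatch) has no ListRecords
            # element; this sketch simply stops rather than reporting it.
            return
        yield from list_records.findall("oai:record", OAI_NS)
        token = list_records.find("oai:resumptionToken", OAI_NS)
        if token is None or not (token.text or "").strip():
            return  # no token, or an empty one, means the list is complete
        # Follow-up requests carry only the verb and the exclusive token.
        args = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Against a real repository, fetch could be something like `lambda args: urllib.request.urlopen(base_url + "?" + urllib.parse.urlencode(args)).read()`. Because the function is a generator, records can be processed one page at a time instead of holding the whole result set in memory.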

The Identify, ListMetadataFormats and ListSets verbs are mainly used to retrieve information about a repository, helping a harvester decide its harvesting strategy. The main vehicle for harvesting is the ListRecords verb. ListIdentifiers can be seen as a summary of ListRecords, returning only record headers, whilst GetRecord is used to return a single specified record from the repository.
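
As one example of using Identify to plan a harvest, a harvester can read the granularity element and format its from and until arguments so they are no finer than the repository supports. The following minimal Python sketch assumes timezone-aware datetime values; the function and constant names are illustrative.

```python
from datetime import datetime, timezone

# The two granularity values an OAI-PMH 2.0 repository can report in Identify.
DAY = "YYYY-MM-DD"
SECONDS = "YYYY-MM-DDThh:mm:ssZ"


def format_datestamp(moment: datetime, granularity: str) -> str:
    """Format a from/until argument at the repository's reported granularity."""
    # OAI-PMH datestamps are in UTC; a naive datetime would be treated as local time.
    utc = moment.astimezone(timezone.utc)
    if granularity == DAY:
        return utc.strftime("%Y-%m-%d")
    if granularity == SECONDS:
        return utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    raise ValueError(f"unknown granularity: {granularity}")
```

Converting to UTC before truncating matters. For example, 23:30 at UTC-02:00 falls on the following day in UTC.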

Interface Definition
 

All OAI-PMH interactions use HTTP as a transport mechanism.

Structure
 

 

Functionality
 

 

Implementation Guidance & Dependencies
 

See http://www.openarchives.org/OAI/openarchivesprotocol.html

Usage Scenarios
 

 

Applicable Standards
 

OAI-PMH v2.0 http://www.openarchives.org/OAI/openarchivesprotocol.html

Known Uses
 

 

Design Tradeoffs
 

 

Service Expression Dependencies
 

 

Related Service Expressions
 

 

Related Service Usage Models (SUMs)
 

 

Related CORE SUMs
 

 

Classification
 

Component Status

[ ] Placeholder

 

[ ] Approved 

[X] Unapproved

[ ] Withdrawn

[ ] Superseded

Domain

[X] Learning

[ ] IT Services

[X] Research

[X] Libraries

[ ] Administration

[ ] Common

 

Development Status

[ ] Proposed

[ ] Developmental  

[ ] Prototype

[X] Production

Deployment Scale

[ ] Isolated

[X] Widespread

 

 

Maturity

[ ] Immature

[X] Mature

 

 

Confidence

[ ] High

[X] Medium

[ ] Low

 

State Behaviour

[X] Stateful

[ ] Stateless

 

 

Transactional Behaviour

[X] Transactional / ACID

[ ] Transactional / non-ACID

[ ] Non-Transactional

 

Batch Behaviour

[ ] Individual

[X] Batch

 

 

Time Constraint Behaviour

[ ] Hard Real Time

[X] Soft Real Time

[ ] None

 

Service End Point

[ ] Provider

[X] Requestor

[ ] Transcoder

 

Auth’ed

[X] Auth’ed

[ ] Non-Auth’ed

 

 

Protocol Binding

[ ] Web Service

[ ] Other

[ ] SOAP

[ ] REST

[X] HTTP

Service Genre Coverage

[X] Full

[ ] Extended

[ ] Subset

[ ] Overlapping

Status
 

Pending

e-Framework Version
 

 

Date Submitted
 

2007-03-01

Date Updated
 

 

Author
 

Neil Smith

Organisation Affiliation
 


Last updated 30 January 2008

 

 
Unless otherwise noted material from the e-Framework website can be downloaded for your own use under a Creative Commons Attribution-ShareAlike 2.5 Australia License
 
Friday, December 05, 2008
Copyright e-Framework Partners 2006 - 2008
