OAI-PMH: a guide for harvesters
ORA supports and participates in the Open Archives Initiative (OAI). ORA is a registered OAI-PMH data provider and provides metadata for all public records. This metadata is updated as soon as each record is published or updated.
The endpoint implements OAI-PMH v2.0 and is available at the base URL
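Every OAI-PMH request is an HTTP GET (or POST) to the base URL, with the verb and its arguments sent as query parameters. The following is a minimal Python sketch of building request URLs. `BASE_URL` is a hypothetical placeholder for the ORA base URL, and `oai_request_url` is an illustrative helper, not part of any library.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: substitute the ORA OAI-PMH base URL.
BASE_URL = "https://example.org/oai"


def oai_request_url(base_url: str, verb: str, **args: str) -> str:
    """Build an OAI-PMH v2.0 request URL from a verb and its arguments."""
    # 'from' is a Python keyword, so callers pass it as from_ and it is renamed here.
    if "from_" in args:
        args["from"] = args.pop("from_")
    params = {"verb": verb, **args}
    return f"{base_url}?{urlencode(params)}"


print(oai_request_url(BASE_URL, "Identify"))
print(oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc", from_="2024-01-01"))
```

The helper only builds URLs; any HTTP client can then fetch them.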
Item = ORA record
Each record in ORA is modeled as an Item in the OAI-PMH interface. Only the most recent version of each record is exposed via this interface.
Metadata formats and downstream targets
Metadata for each item (record) is available in several formats. Not all formats are supported for all records. The available formats include:
You may request a list of all the metadata formats supported with the ListMetadataFormats verb.
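A ListMetadataFormats response lists each format in a `<metadataFormat>` element, and its `<metadataPrefix>` is the value to pass as the `metadataPrefix` argument of other requests. The protocol also accepts an optional `identifier` argument that restricts the list to the formats available for one item, which is useful here because not all formats are supported for all records. The sketch below parses the prefixes using Python's standard library. The sample response is illustrative and abridged, not captured from ORA.

```python
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH v2.0 response envelope.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def metadata_prefixes(xml_text: str) -> list[str]:
    """Return the metadataPrefix values listed in a ListMetadataFormats response."""
    root = ET.fromstring(xml_text)
    return [
        el.text.strip()
        for el in root.findall(
            "oai:ListMetadataFormats/oai:metadataFormat/oai:metadataPrefix", OAI_NS
        )
        if el.text
    ]


# Illustrative, abridged response (schema and metadataNamespace omitted).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2024-01-01T00:00:00Z</responseDate>
  <request verb="ListMetadataFormats">https://example.org/oai</request>
  <ListMetadataFormats>
    <metadataFormat><metadataPrefix>oai_dc</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>uketd_dc</metadataPrefix></metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""

print(metadata_prefixes(SAMPLE))
```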
OAI DC (oai_dc)
Simple Dublin Core (DC).
DataCite (datacite_dc) - datasets only
Customised DC format for the DataCite service.
Customised DC format for the Oxford University SOLO service.
Customised DC format for the BASE service.
Customised metadata format for the OpenAIRE Literature Repository Guidelines v4.0.
EThOS (uketd_dc) - theses only
Extended DC format for the EThOS service.
RIOXX Terms (rioxx_terms and rioxx_terms_cc0)
Metadata formats for the RIOXX V2 metadata application profile. These formats add deposit and record publication dates in support of UKRI and CORE recommendations; these fields use the RIOXX V3 Beta formats.
The rioxx_terms_cc0 metadata format is released under a CC-0 licence. It is identical to the rioxx_terms metadata format, with the exception of abstracts/summary descriptions, which are not included.
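Because not every format is available for every record, a GetRecord request for a format the item does not support returns an OAI-PMH `<error>` with code `cannotDisseminateFormat` instead of a record. A harvester can detect this by checking for error elements. The sketch below is a minimal example; the sample response is illustrative, not captured from ORA.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def oai_errors(xml_text: str) -> list[tuple[str, str]]:
    """Return (code, message) pairs for any <error> elements in an OAI-PMH response."""
    root = ET.fromstring(xml_text)
    return [
        (el.get("code", ""), (el.text or "").strip())
        for el in root.findall("oai:error", OAI_NS)
    ]


# Illustrative response to a GetRecord request for an unsupported format.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2024-01-01T00:00:00Z</responseDate>
  <request verb="GetRecord" metadataPrefix="datacite_dc"
           identifier="oai:ora.ox.ac.uk:uuid:000d2073-9081-4a5b-b238-021cc7178e49">https://example.org/oai</request>
  <error code="cannotDisseminateFormat">Illustrative message</error>
</OAI-PMH>"""

errors = oai_errors(SAMPLE)
if any(code == "cannotDisseminateFormat" for code, _ in errors):
    # Ask ListMetadataFormats (with this identifier) which formats the record has.
    print("Requested format is not available for this record.")
```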
Every OAI-PMH metadata record has an associated datestamp: the time the record was last modified on the ORA public website.
Because the current ORA public website dates from April 2018, the OAI-PMH datestamp values do not correspond with the original submission or publication times for older records, and may not for newer records because of administrative and bibliographic updates.
The earliest datestamp is given by the <earliestDatestamp> element of the Identify response.
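The Identify response also reports the datestamp `<granularity>` the repository supports, which determines whether `from` values may include a time of day. A minimal sketch of reading these fields follows; the sample response uses illustrative values, not ORA's actual Identify output.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def identify_info(xml_text: str) -> dict[str, str]:
    """Extract selected fields from an Identify response."""
    root = ET.fromstring(xml_text)
    ident = root.find("oai:Identify", OAI_NS)
    if ident is None:
        raise ValueError("not an Identify response")
    info = {}
    for name in ("repositoryName", "earliestDatestamp", "granularity", "deletedRecord"):
        value = ident.findtext(f"oai:{name}", default="", namespaces=OAI_NS).strip()
        if value:
            info[name] = value
    return info


# Illustrative, abridged Identify response (baseURL and adminEmail omitted).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2024-01-01T00:00:00Z</responseDate>
  <request verb="Identify">https://example.org/oai</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <protocolVersion>2.0</protocolVersion>
    <earliestDatestamp>2000-01-01T00:00:00Z</earliestDatestamp>
    <deletedRecord>no</deletedRecord>
    <granularity>YYYY-MM-DDThh:mm:ssZ</granularity>
  </Identify>
</OAI-PMH>"""

print(identify_info(SAMPLE)["earliestDatestamp"])
```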
The OAI-PMH interface does not support selective harvesting based on publication date. The datestamps are designed to support incremental harvesting of updates on an ongoing basis. It is not possible to selectively harvest only, say, records published in February 2017.
Apart from selective harvesting by type of work (see the description of sets below), the interface is designed to support copying and synchronization of a complete set of ORA metadata. To harvest metadata for all records, either make requests without a datestamp range (recommended), or make requests from the <earliestDatestamp> through to the present (but be aware that, because of bulk updates, some dates have very large numbers of updated records).
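A complete harvest without a datestamp range is a ListRecords loop that follows resumption tokens. In OAI-PMH v2.0, each follow-up request carries only the verb and the `resumptionToken`, and an absent or empty token marks the last page. The sketch below assumes `oai_dc` as the format and uses a pluggable `fetch` callable (a hypothetical parameter, defaulting to a plain `urllib` GET) so the loop can be tested without network access.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def http_fetch(url: str) -> str:
    """GET a URL and decode the body (OAI-PMH responses are UTF-8)."""
    with urlopen(url, timeout=120) as resp:
        return resp.read().decode("utf-8")


def harvest_records(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    fetch: Callable[[str], str] = http_fetch,
) -> Iterator[ET.Element]:
    """Yield every <record> from a complete ListRecords harvest (no datestamp
    range), following resumptionTokens until the list is exhausted."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        error = root.find("oai:error", OAI_NS)
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                return  # OAI-PMH reports an empty result as an error
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
        list_records = root.find("oai:ListRecords", OAI_NS)
        if list_records is None:
            raise RuntimeError("response has neither ListRecords nor error")
        yield from list_records.findall("oai:record", OAI_NS)
        token = list_records.findtext(
            "oai:resumptionToken", default="", namespaces=OAI_NS
        ).strip()
        if not token:
            return  # absent or empty resumptionToken marks the final page
        # Follow-up requests carry only the verb and the resumptionToken.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

Storing each record keyed by its header `<identifier>` keeps the local copy consistent if a record is harvested more than once.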
Once an initial harvest has been completed, the copy may be maintained by making incremental harvesting requests with the from date set to the date of last harvest (from is best taken from the last server response; don't set the
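Every OAI-PMH response carries a `<responseDate>` in UTC, so the next `from` value can be read from the server rather than from the harvester's clock. A minimal sketch follows, assuming the granularity string comes from the Identify response. Because the `from` bound is inclusive, records changed at exactly that moment may be harvested twice, which is harmless if records are stored by identifier.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def next_from_date(xml_text: str, granularity: str = "YYYY-MM-DDThh:mm:ssZ") -> str:
    """Derive the 'from' argument for the next incremental harvest from a server
    response's <responseDate>, rather than from the harvester's own clock."""
    root = ET.fromstring(xml_text)
    response_date = root.findtext("oai:responseDate", default="", namespaces=OAI_NS).strip()
    if not response_date:
        raise ValueError("response has no <responseDate>")
    # responseDate is a UTC timestamp to the second; trim it to a date
    # if the repository's Identify <granularity> is day-level.
    if granularity == "YYYY-MM-DD":
        return response_date[:10]
    return response_date


# Illustrative response envelope.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2024-05-07T10:15:00Z</responseDate>
</OAI-PMH>"""

print(next_from_date(SAMPLE))
```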
ORA records are grouped into sets for selective harvesting, based on their 'Type of work' within the ORA system, e.g. 'thesis', 'dataset', 'journal article'. You may request a list of all supported sets with the ListSets verb.
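A set is selected by passing its `setSpec` as the `set` argument of ListRecords or ListIdentifiers. The sketch below reads the `setSpec`/`setName` pairs from a ListSets response and builds a set-restricted request. The set values in the sample are hypothetical; use the `setSpec` strings that ORA's ListSets response actually returns.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_sets(xml_text: str) -> dict[str, str]:
    """Map each setSpec to its human-readable setName from a ListSets response."""
    root = ET.fromstring(xml_text)
    sets = {}
    for s in root.findall("oai:ListSets/oai:set", OAI_NS):
        spec = s.findtext("oai:setSpec", default="", namespaces=OAI_NS).strip()
        sets[spec] = s.findtext("oai:setName", default="", namespaces=OAI_NS).strip()
    return sets


def set_harvest_url(base_url: str, set_spec: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request restricted to one set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix, "set": set_spec}
    return f"{base_url}?{urlencode(params)}"


# Illustrative ListSets response with hypothetical setSpec values.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>thesis</setSpec><setName>Thesis</setName></set>
    <set><setSpec>dataset</setSpec><setName>Dataset</setName></set>
  </ListSets>
</OAI-PMH>"""

for spec in list_sets(SAMPLE):
    print(set_harvest_url("https://example.org/oai", spec))
```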
New records are made available immediately on publication.
Record deletion policy
The ORA OAI-PMH service does not maintain information about deleted records. When a record is deleted from the ORA system, it is removed from the OAI-PMH service immediately.
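Because deleted records simply disappear rather than being flagged, an incremental harvest cannot reveal deletions. A harvester that needs to mirror them could periodically run a complete ListIdentifiers harvest and drop local records that no longer appear. The sketch below parses one ListIdentifiers page and computes the difference. It assumes the harvest uses a format available for every record (e.g. `oai_dc`, which is an assumption), and the function names are hypothetical. Compare only against a complete, successful harvest; an interrupted one would look like mass deletion.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_identifiers_page(xml_text: str) -> tuple[list[str], Optional[str]]:
    """Return the identifiers on one ListIdentifiers page and its
    resumptionToken (None on the final page)."""
    root = ET.fromstring(xml_text)
    ids = [
        h.findtext("oai:identifier", default="", namespaces=OAI_NS).strip()
        for h in root.findall("oai:ListIdentifiers/oai:header", OAI_NS)
    ]
    token = root.findtext(
        "oai:ListIdentifiers/oai:resumptionToken", default="", namespaces=OAI_NS
    ).strip()
    return ids, token or None


def find_deletions(local_ids: Iterable[str], harvested_ids: Iterable[str]) -> set[str]:
    """Identifiers held locally that are absent from a complete harvest; since
    deletions are not reported, absence is the only available signal."""
    return set(local_ids) - set(harvested_ids)
```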
When required, ORA performs scheduled maintenance on Tuesday mornings from 07:00 to 09:00 (UK time). The OAI-PMH service may be unavailable for short periods during this window.
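Since short outages are possible during the maintenance window, a long-running harvester may prefer to retry failed requests rather than abort. The sketch below retries connection errors and HTTP 5xx responses with exponential backoff and honours a numeric `Retry-After` header if one is sent. The `fetch` callable, the delay values and the attempt count are illustrative assumptions, not ORA recommendations. OAI-PMH allows resumption tokens to expire, so a harvest paused for a long time may need to restart from the beginning.

```python
import time
from typing import Callable
from urllib.error import HTTPError, URLError


def fetch_with_retry(
    fetch: Callable[[str], str],
    url: str,
    attempts: int = 5,
    base_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(url), retrying transient failures with exponential backoff.
    A numeric Retry-After header on an HTTP error overrides the computed delay."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch(url)
        except (URLError, TimeoutError) as exc:
            # HTTPError is a subclass of URLError; client errors (4xx) are not retried.
            if isinstance(exc, HTTPError) and exc.code < 500:
                raise
            if attempt == attempts - 1:
                raise
            retry_after = None
            if isinstance(exc, HTTPError) and exc.headers:
                retry_after = exc.headers.get("Retry-After")
            if retry_after and retry_after.strip().isdigit():
                delay = float(retry_after.strip())
            else:
                delay = base_delay * 2 ** attempt
            sleep(delay)
    raise AssertionError("unreachable")
```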
Internal ORA identifiers (record identifiers) are in the form
ORA OAI-PMH identifiers are in the format oai:ora.ox.ac.uk:uuid:000d2073-9081-4a5b-b238-021cc7178e49. This is a change from the previous ORA OAI-PMH endpoint, where identifiers did not have the OAI scheme or repository identifier prefixes.
Harvesters which used the previous endpoint can map identifiers by prefixing them with
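The identifier mapping can be sketched as simple prefix handling. This sketch assumes, by inference from the example identifier rather than an explicit statement, that the combined prefix is `oai:ora.ox.ac.uk:` and that identifiers from the previous endpoint took the form `uuid:...`. The function names are hypothetical.

```python
# Assumption: prefix inferred from the example identifier oai:ora.ox.ac.uk:uuid:...
OAI_PREFIX = "oai:ora.ox.ac.uk:"


def to_oai_identifier(old_id: str) -> str:
    """Map a previous-endpoint identifier (e.g. 'uuid:...') to the current form;
    identifiers that already carry the prefix are returned unchanged."""
    return old_id if old_id.startswith(OAI_PREFIX) else OAI_PREFIX + old_id


def to_record_identifier(oai_id: str) -> str:
    """Strip the OAI scheme and repository prefix from a current identifier."""
    if not oai_id.startswith(OAI_PREFIX):
        raise ValueError(f"not an ORA OAI identifier: {oai_id!r}")
    return oai_id[len(OAI_PREFIX):]


print(to_oai_identifier("uuid:000d2073-9081-4a5b-b238-021cc7178e49"))
```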