I · 04

Anatomy of an OpenURL

An OpenURL is an ordinary HTTP URL whose query string carries a structured citation. This page takes one apart, piece by piece.

Every OpenURL has the same coarse shape: a base URL identifying a link resolver, followed by a query string carrying the citation in Key/Encoded-Value (KEV) form. The query string conforms to the rules of Z39.88-2004, but the URL itself is just an HTTP URL and travels over HTTP like any other.

The example

Below is a complete OpenURL describing the Einstein, Podolsky, and Rosen paper of 1935, including both a DOI identifier and a full set of descriptive Referent metadata. Line breaks are added for readability; in transmission the URL is one continuous string.

https://resolver.example.edu/openurl?
ctx_ver=Z39.88-2004
&rft_val_fmt=info:ofi/fmt:kev:mtx:journal
&rfr_id=info:sid/example.com:database
&rft_id=info:doi/10.1103/PhysRev.47.777
&rft.atitle=Can+Quantum-Mechanical+Description+of+Physical+Reality+Be+Considered+Complete%3F
&rft.jtitle=Physical+Review
&rft.aulast=Einstein
&rft.aufirst=A.
&rft.au=Podolsky%2C+B.
&rft.au=Rosen%2C+N.
&rft.date=1935
&rft.volume=47
&rft.issue=10
&rft.spage=777
&rft.epage=780
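
As a rough sketch of how a referring system might assemble this URL, the Python below feeds an ordered list of key/value pairs to the standard library's urllib.parse.urlencode. The build_openurl helper and the variable names are hypothetical, not part of any standard. Passing safe=":/" is an assumption made here so that colons and slashes stay literal, matching the example above.

```python
from urllib.parse import urlencode

# Hypothetical helper: joins a resolver base URL and KEV pairs into one OpenURL.
def build_openurl(base_url, pairs):
    # A list of tuples (rather than a dict) keeps the order and allows
    # repeated keys such as rft.au.
    # safe=":/" is an assumption: it leaves colons and slashes unencoded.
    return base_url + "?" + urlencode(pairs, safe=":/")

pairs = [
    ("ctx_ver", "Z39.88-2004"),
    ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
    ("rfr_id", "info:sid/example.com:database"),
    ("rft_id", "info:doi/10.1103/PhysRev.47.777"),
    ("rft.atitle", "Can Quantum-Mechanical Description of Physical Reality Be Considered Complete?"),
    ("rft.jtitle", "Physical Review"),
    ("rft.aulast", "Einstein"),
    ("rft.aufirst", "A."),
    ("rft.au", "Podolsky, B."),
    ("rft.au", "Rosen, N."),
    ("rft.date", "1935"),
    ("rft.volume", "47"),
    ("rft.issue", "10"),
    ("rft.spage", "777"),
    ("rft.epage", "780"),
]

url = build_openurl("https://resolver.example.edu/openurl", pairs)
print(url)
```

The output is the example above with the line breaks removed.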

Base URL

Everything to the left of the ? is the base URL: scheme, host, path. The base URL identifies the resolver that will process the request. A library's resolver typically has a fixed base URL distributed to content providers in advance (often through a knowledge-base service such as OCLC's WorldCat knowledge base or Ex Libris's Central Discovery Index).

The base URL is not specified by Z39.88-2004. It is whatever HTTP endpoint the institution publishes for its resolver.
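
To make the split concrete, here is a minimal sketch using Python's urllib.parse.urlsplit. The URL is a shortened form of the example above, and the variable names are illustrative only.

```python
from urllib.parse import urlsplit, urlunsplit

# Shortened form of the example OpenURL.
openurl = "https://resolver.example.edu/openurl?ctx_ver=Z39.88-2004&rft.date=1935"

parts = urlsplit(openurl)

# Scheme, host, and path with the query and fragment dropped: the base URL.
base_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

# Everything to the right of the "?": the KEV-encoded ContextObject.
query_string = parts.query

print(base_url)
print(query_string)
```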

Query-string anatomy

The query string carries the ContextObject in KEV form. Each parameter is a key=value pair, and successive pairs are separated by &. Below are the categories of parameter you will find in a typical OpenURL.

Administrative

ctx_ver=Z39.88-2004 declares the version of the framework in use. Its presence is the signal that the request uses OpenURL 1.0 syntax rather than the older 0.1 syntax. Other administrative parameters defined by the standard include ctx_id (an optional identifier for the ContextObject itself), ctx_tim (timestamp), and ctx_enc (character encoding, with info:ofi/enc:UTF-8 being the usual value).
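
A resolver that accepts both generations of OpenURL might branch on this parameter. The sketch below is one way to do so, assuming a hypothetical openurl_version helper. Returning "unknown" for an unrecognized ctx_ver value is a choice made here, not something the text above prescribes, and the 0.1-style query string is illustrative.

```python
from urllib.parse import parse_qs

# Hypothetical helper: classify a query string by the presence of ctx_ver.
def openurl_version(query_string):
    params = parse_qs(query_string)
    if params.get("ctx_ver") == ["Z39.88-2004"]:
        return "1.0"
    if "ctx_ver" in params:
        # Assumption: an unrecognized version value is reported, not guessed at.
        return "unknown"
    # No ctx_ver at all: treat the request as 0.1.
    return "0.1"

print(openurl_version("ctx_ver=Z39.88-2004&rft.date=1935"))
print(openurl_version("genre=article&volume=47&spage=777"))
```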

Format identifiers

rft_val_fmt=info:ofi/fmt:kev:mtx:journal names the metadata format used to describe the Referent. The registered KEV formats include journal, book, dissertation, and patent, among others. The info:ofi/fmt:kev:mtx: prefix marks them as KEV metadata formats registered under Z39.88-2004 (see The OpenURL Registry).
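
Because the format name is simply the tail of the identifier, a parser can recover it by stripping the prefix. The following is a small sketch; kev_format_name is a hypothetical helper, and returning None for anything without the KEV prefix is an assumption of this example.

```python
KEV_PREFIX = "info:ofi/fmt:kev:mtx:"

# Hypothetical helper: return the short format name, or None if the
# value does not carry the KEV metadata-format prefix.
def kev_format_name(rft_val_fmt):
    if rft_val_fmt.startswith(KEV_PREFIX):
        return rft_val_fmt[len(KEV_PREFIX):]
    return None

print(kev_format_name("info:ofi/fmt:kev:mtx:journal"))
```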

Referent descriptors

Everything beginning rft. is a descriptor of the Referent — the resource being requested. The keys after the dot are defined by the format named in rft_val_fmt. For the journal format, the keys cover article title, journal title, authors, volume, issue, pages, dates, and so on. See Journal Format (KEV).
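
One way to gather the Referent descriptors from a query string is sketched below with urllib.parse.parse_qsl. The referent_descriptors helper is hypothetical. Every value is stored in a list so that repeated keys are preserved, and rft_val_fmt and rft_id are skipped because they begin with rft_ rather than rft. (with a dot).

```python
from urllib.parse import parse_qsl

# Hypothetical helper: collect every "rft." key into a dict of lists.
def referent_descriptors(query_string):
    fields = {}
    for key, value in parse_qsl(query_string):
        if key.startswith("rft."):
            # Drop the "rft." prefix; keep repeated keys as multiple values.
            fields.setdefault(key[len("rft."):], []).append(value)
    return fields

query = (
    "rft_val_fmt=info:ofi/fmt:kev:mtx:journal"
    "&rft.jtitle=Physical+Review"
    "&rft.aulast=Einstein"
    "&rft.au=Podolsky%2C+B."
    "&rft.au=Rosen%2C+N."
    "&rft.spage=777"
)
descriptors = referent_descriptors(query)
print(descriptors)
```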

Identifiers

rft_id=info:doi/10.1103/PhysRev.47.777 carries an identifier for the Referent. Identifiers and descriptive metadata may both be present. The info: URI scheme (RFC 4452) is used to express identifiers from common namespaces — info:doi/, info:pmid/, info:arxiv/, and so on. See Identifiers.
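
An info: URI splits naturally into a namespace and an identifier at the first slash. The sketch below shows one way to do that. split_info_uri is a hypothetical helper, and returning None for anything that is not a well-formed info: URI is an assumption of this example.

```python
# Hypothetical helper: split "info:<namespace>/<identifier>" into its parts.
def split_info_uri(uri):
    if not uri.startswith("info:"):
        return None
    # Split at the first slash only; the identifier itself may contain slashes.
    namespace, slash, identifier = uri[len("info:"):].partition("/")
    if not slash or not namespace or not identifier:
        return None
    return namespace, identifier

print(split_info_uri("info:doi/10.1103/PhysRev.47.777"))
```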

Source identification

rfr_id=info:sid/example.com:database identifies the Referrer, the system that generated the OpenURL. The info:sid/ scheme is the conventional way to do this and is widely supported. The value following info:sid/ is conventionally the source's domain name, optionally followed by a colon and a sub-identifier such as a database name. See Referrer (rfr).
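
A resolver that logs or filters by source might separate the two parts of this value. Here is a minimal sketch, in which split_rfr_id is a hypothetical helper and a missing sub-identifier is reported as None by assumption.

```python
SID_PREFIX = "info:sid/"

# Hypothetical helper: split an rfr_id into (source, sub_identifier).
def split_rfr_id(rfr_id):
    if not rfr_id.startswith(SID_PREFIX):
        return None
    # The first colon after the prefix separates source from sub-identifier.
    source, _, sub_identifier = rfr_id[len(SID_PREFIX):].partition(":")
    return source, (sub_identifier or None)

print(split_rfr_id("info:sid/example.com:database"))
```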

Keys are not encoded; values are

Keys (the part to the left of each =) are ASCII identifiers and are not percent-encoded. Values are percent-encoded as required by RFC 3986 for HTTP query-string content. Notably:

  • Spaces may be encoded as + or %20. Both are accepted; + is conventional in query strings.
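  • Characters that would otherwise be read as delimiters, such as &, =, #, +, and %, must be percent-encoded when they occur inside a value. Colons and slashes are permitted in a query string under RFC 3986 and appear unencoded in the example above (for instance in info:doi/10.1103/PhysRev.47.777); encoding them as %3A and %2F is also valid.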
  • Multi-valued fields (for example rft.au, which can appear multiple times for multiple authors) are repeated, not joined.

Decoders should accept either form of space and should tolerate percent-encoding of characters that did not strictly need it, such as %3A for a colon.
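
The tolerance described above can be checked with Python's urllib.parse.parse_qsl, which is used here only as a convenient illustration of a forgiving decoder. The variable names are illustrative.

```python
from urllib.parse import parse_qsl

# Both encodings of a space decode to the same value.
plus_form = parse_qsl("rft.jtitle=Physical+Review")
percent_form = parse_qsl("rft.jtitle=Physical%20Review")

# Colons and slashes decode identically whether or not they were encoded.
literal_id = parse_qsl("rft_id=info:doi/10.1103/PhysRev.47.777")
encoded_id = parse_qsl("rft_id=info%3Adoi%2F10.1103%2FPhysRev.47.777")

# Repeated keys come back as separate pairs, in order.
authors = parse_qsl("rft.au=Podolsky%2C+B.&rft.au=Rosen%2C+N.")

print(plus_form == percent_form)
print(literal_id == encoded_id)
print(authors)
```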

Sources