APS Harvesting API

General

Content Negotiation

Endpoints that support content negotiation return the content type specified in the Accept request header. See Supported Formats for the complete list of formats, and the documentation for each endpoint for the formats it supports.

Example (request article in APS JSON format):

Accept: application/vnd.tesseract.article+json
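As a minimal sketch of content negotiation, the Python snippet below builds (but does not send) a request for a single article in a chosen format. The helper name build_article_request and the short keys in FORMATS are illustrative assumptions, not part of the API; the media types are the ones listed under Supported Formats.

```python
from urllib.request import Request

BASE_URL = "http://harvest.aps.org/v2/journals/articles"

# Media types from the Supported Formats section; the short keys are
# hypothetical names chosen for this example.
FORMATS = {
    "json": "application/vnd.tesseract.article+json",
    "bagit": "application/zip",
    "pdf": "application/pdf",
    "xml": "text/xml",
}


def build_article_request(doi, fmt="json"):
    """Return an unsent Request whose Accept header selects the format."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    return Request(f"{BASE_URL}/{doi}", headers={"Accept": FORMATS[fmt]})


req = build_article_request("10.1103/PhysRevX.5.021001", "pdf")
print(req.full_url)
print(req.get_header("Accept"))
```

Passing the returned object to urllib.request.urlopen would send the request; whether a given format is actually returned depends on what the caller is authorized to retrieve.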

Responses

Individual endpoints define the responses they provide. Clients should guard against general network and HTTP errors such as timeouts, unexpected server errors (50x responses), etc. Most supported error responses provide additional information about the error (see Error Responses).
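One way a client might guard against transient failures is a small retry wrapper with exponential backoff. The sketch below is an assumption about reasonable client behaviour, not something the API prescribes: the name with_retries, the attempt count, and the delays are all illustrative. It retries timeouts and 5xx responses and re-raises everything else immediately.

```python
import time


def with_retries(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    fetch is any zero-argument callable that raises OSError on failure.
    urllib's URLError and HTTPError, and socket timeouts, are all
    OSError subclasses.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError as exc:
            status = getattr(exc, "code", None)  # set on urllib's HTTPError
            retryable = status is None or status >= 500
            if not retryable or attempt == attempts - 1:
                raise  # client error, or out of attempts
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

A call such as with_retries(lambda: urllib.request.urlopen(req, timeout=30)) would wrap a single request; 4xx responses such as 401 or 404 are raised on the first attempt because retrying them is unlikely to help.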

Response Payloads

All JSON responses wrap the object(s) returned in an object containing the top-level key data. This is done to provide future-proofing of response objects. The actual data contained in the data key is specified by the type of response (see documentation for each endpoint below).

Example (list of objects returned):

{
  "data": [
  ]
}

Example (single object returned):

{
  "data": {
  }
}
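Because both shapes share the top-level data key, a client can normalize them with one helper. The function below is a minimal sketch; the name unwrap is hypothetical.

```python
import json


def unwrap(body):
    """Parse a JSON response body and return its items as a list.

    A list response is returned as-is; a single-object response is
    wrapped in a one-element list so callers can always iterate.
    """
    data = json.loads(body)["data"]
    return data if isinstance(data, list) else [data]


print(unwrap('{"data": [{"id": "a"}, {"id": "b"}]}'))
print(unwrap('{"data": {"id": "a"}}'))
```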

Authentication

There are three main authentication types supported for access to the API:

  • No authentication - Some articles (for example, open access articles) may be retrieved in the JSON format without authentication.
  • IP authentication - Access from specific IP address ranges, granted by APS for specific formats and sets of content.
  • CHORUS Agency Access via tokens - Access to content by funding agencies participating in CHORUS. Agencies provide their token via the CHOR-Agency-Auth-Token request header.

Pagination

Endpoints that return a list of objects will paginate the response if the number of items exceeds a requested or internal limit. When additional results are available, a Link header is returned following the convention described in draft-nottingham-http-link-header-06 and popularized by the GitHub API. The Link header values must be followed; do not construct your own URLs.

Several libraries are available for parsing these headers.

Example:

Link: <http://harvest.aps.org/v2/journals/articles?from=2015-01-01&page=2&per_page=100>; rel="next", <http://harvest.aps.org/v2/journals/articles?from=2015-01-01&page=56&per_page=100>; rel="last"
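For clients that prefer not to pull in a library, a header of this shape can be parsed with a short regular expression. The sketch below is deliberately simplified: it assumes each entry has exactly the form <url>; rel="name" shown in the example and does not handle the other parameters the Link header syntax allows. The name parse_link_header is illustrative.

```python
import re

# Matches one `<url>; rel="name"` entry of a Link header value.
_LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def parse_link_header(value):
    """Return a dict mapping each rel name to its URL."""
    return {rel: url for url, rel in _LINK_RE.findall(value)}


header = (
    '<http://harvest.aps.org/v2/journals/articles?from=2015-01-01&page=2&per_page=100>; rel="next", '
    '<http://harvest.aps.org/v2/journals/articles?from=2015-01-01&page=56&per_page=100>; rel="last"'
)
links = parse_link_header(header)
print(links["next"])
print(links["last"])
```

When the result has no "next" key, the client has reached the final page.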

Endpoints

GET /v2/journals/articles

Retrieve a list of articles given some criteria.

URL Parameters

Name      Type     Values               Description
from      Date     YYYY-MM-DD
until     Date     YYYY-MM-DD
date      String   modified, published  Which date the from and until restrictions apply to: "modified" (default) for the article's modification date, or "published" for its publication date
journals  String   see Journals         Comma-separated list of journal codes
set       String   see Sets             Set to restrict results to, for example "openaccess"
per_page  Integer  1 to 100             Number of results to return per response (maximum 100)
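The parameters above can be assembled into the URL of a first request as sketched below. The function name build_list_url and its keyword names are assumptions made for this example; the client-side checks simply mirror the documented constraints so that mistakes surface before a request is sent. Subsequent pages should come from the Link header rather than from this helper.

```python
from datetime import date
from urllib.parse import urlencode

BASE_URL = "http://harvest.aps.org/v2/journals/articles"


def build_list_url(from_date=None, until=None, date_type=None,
                   journals=None, set_code=None, per_page=None):
    """Build the URL for the first page of an article listing."""
    if from_date and until and from_date > until:
        raise ValueError("from must not be later than until")
    if date_type not in (None, "modified", "published"):
        raise ValueError("date must be 'modified' or 'published'")
    if per_page is not None and not 1 <= per_page <= 100:
        raise ValueError("per_page must be between 1 and 100")

    params = {}
    if from_date:
        params["from"] = from_date.isoformat()  # YYYY-MM-DD
    if until:
        params["until"] = until.isoformat()
    if date_type:
        params["date"] = date_type
    if journals:
        params["journals"] = ",".join(journals)  # comma-separated codes
    if set_code:
        params["set"] = set_code
    if per_page is not None:
        params["per_page"] = per_page
    # safe="," keeps the journal list readable instead of encoding it as %2C
    return f"{BASE_URL}?{urlencode(params, safe=',')}" if params else BASE_URL


print(build_list_url(from_date=date(2015, 1, 1), journals=["PRL", "PRX"], per_page=50))
```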

Request Headers

  • Accept - the content type to return; only application/vnd.tesseract.article+json (default) is supported for this operation
  • CHOR-Agency-Auth-Token - a CHOR, Inc. issued authentication token. If specified, the returned items will be restricted to those that match the funder(s) specified by the token.

Responses

  • 200 OK - list of Article Response objects
  • 400 Bad Request - invalid or conflicting parameters specified (e.g., from is later than until)
  • 401 Unauthorized - access to the requested resource(s) is not permitted. When using CHOR token access, this may be due to an expired token.

GET /v2/journals/articles/{id}

Retrieve a specific article by its ID (DOI). The article will be returned in the format requested.

URL Parameters

  • id - the ID of the article (DOI)

Request Headers

  • Accept - the content type to return (see Supported Formats)
  • CHOR-Agency-Auth-Token - a CHOR, Inc. issued authentication token. If specified, the returned items will be restricted to those that match the funder(s) specified by the token.

Responses

  • 200 OK - Article Response object
  • 401 Unauthorized - access to the requested resource(s) is not permitted. When using CHOR token access, this may be due to an expired token.
  • 404 Not Found - the requested article could not be found
  • 406 Not Acceptable - an invalid media type was requested

Response Headers

  • Content-SHA1 - returned only for BagIt responses, contains the hex encoded SHA1 checksum of the payload.

Supported Formats

Content Type                            Description
application/vnd.tesseract.article+json  APS article JSON (default)
application/zip                         BagIt
application/pdf                         PDF
text/xml                                XML

APS Article JSON

The JSON format provides the article metadata in an easy-to-consume format. Full documentation for the fields within the returned JSON object can be found in the Article Response section. A JSON Schema is also provided to validate responses and provide additional documentation.

BagIt

The BagIt format provides a mechanism for packaging deliverables consisting of multiple files. The harvest API provides one bag per article in zip format. When unpacked, the bag contains a directory named after the basename of the filename parameter in the Content-Disposition header. For example, a response may return the following header:

Content-Disposition: attachment; filename="articlebag-10-1103-PhysRevX-5-021001-complete.zip"

This corresponds to the following structure within the zip file:

articlebag-10-1103-PhysRevX-5-021001-complete/
  |-- manifest-md5.txt
  |-- manifest-sha1.txt
  |-- bagit.txt
  |-- bag-info.txt
  +-- data/
        +-- PhysRevX.5.021001/
              |-- figure_f1.eps.gz
              |-- figure_f2.eps.gz
              |-- figure_f3.eps.gz
              |-- figure_f4.eps.gz
              |-- figure_f5.eps.gz
              |-- figure_f6.eps.gz
              |-- fulltext.ocr
              |-- fulltext.xml
              |-- metadata.xml
              +-- online.pdf
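The mapping from the header to the directory name can be sketched with the standard library's header parser. This assumes, as the example above suggests, that "basename" means the filename with its .zip extension removed; the function name bag_directory_name is hypothetical.

```python
import posixpath
from email.message import Message


def bag_directory_name(content_disposition):
    """Derive the bag's top-level directory from a Content-Disposition value."""
    msg = Message()
    msg["Content-Disposition"] = content_disposition
    filename = msg.get_filename()  # handles quoting of the filename parameter
    if filename is None:
        raise ValueError("no filename parameter in Content-Disposition")
    # Keep only the final path component, then drop the extension.
    return posixpath.splitext(posixpath.basename(filename))[0]


header = 'attachment; filename="articlebag-10-1103-PhysRevX-5-021001-complete.zip"'
print(bag_directory_name(header))
```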

Actual names of the bags and contents will vary depending on what components the request is authorized to retrieve. Best practice is to verify that the SHA1 checksum of the retrieved bag matches the checksum provided in the Content-SHA1 response header. Individual files within the bag should be validated using the manifest-*.txt files contained in the bag.
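Both checks can be sketched in a few lines of Python. The example assumes the manifest follows the usual BagIt layout of one "checksum path" pair per line, with paths relative to the bag's top-level directory, and that the zip holds a single top-level directory as shown above. The name verify_bag is illustrative.

```python
import hashlib
import io
import zipfile


def verify_bag(bag_bytes, content_sha1):
    """Return a list of problems found; an empty list means the bag checks out."""
    problems = []
    # 1. Whole-bag checksum against the Content-SHA1 response header.
    if hashlib.sha1(bag_bytes).hexdigest() != content_sha1.strip().lower():
        problems.append("bag checksum does not match Content-SHA1")
    # 2. Per-file checksums against manifest-sha1.txt inside the bag.
    with zipfile.ZipFile(io.BytesIO(bag_bytes)) as bag:
        names = set(bag.namelist())
        root = bag.namelist()[0].split("/", 1)[0]  # single top-level directory
        manifest = bag.read(f"{root}/manifest-sha1.txt").decode("utf-8")
        for line in manifest.splitlines():
            if not line.strip():
                continue
            expected, path = line.split(None, 1)
            member = f"{root}/{path}"
            if member not in names:
                problems.append(f"missing file: {path}")
            elif hashlib.sha1(bag.read(member)).hexdigest() != expected.lower():
                problems.append(f"checksum mismatch: {path}")
    return problems
```

A caller would pass the raw response body and the value of the Content-SHA1 header, and treat any non-empty result as a failed download.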

Journals

Code         Name
PRL          Physical Review Letters
PRX          Physical Review X
RMP          Reviews of Modern Physics
PRA          Physical Review A
PRB          Physical Review B
PRC          Physical Review C
PRD          Physical Review D
PRE          Physical Review E
PRAB         Physical Review Accelerators and Beams
PRSTAB       Physical Review Special Topics - Accelerators and Beams
PRAPPLIED    Physical Review Applied
PRFLUIDS     Physical Review Fluids
PRMATERIALS  Physical Review Materials
PRPER        Physical Review Physics Education Research
PRSTPER      Physical Review Special Topics - Physics Education Research
PR           Physical Review
PRI          Physical Review (Series I)

Sets

Code        Description
openaccess  Open access

Response Objects

Article Response

Fields that may contain formatting, such as title and abstract, use HTML with embedded MathML suitable for display with MathJax. All strings are UTF-8.

Example

{
  "id": "10.1103/PhysRevX.5.021001",
  "type": "article",
  "abstract": {
    "value": "<p>The first law of thermodynamics imposes not just a constraint on the energy content of systems in extreme quantum regimes but also symmetry constraints related to the thermodynamic processing of quantum coherence. We show that this thermodynamic symmetry decomposes any quantum state into mode operators that quantify the coherence present in the state. We then establish general upper and lower bounds for the evolution of quantum coherence under arbitrary thermal operations, valid for any temperature. We identify primitive coherence manipulations and show that the transfer of coherence between energy levels manifests irreversibility not captured by free energy. Moreover, the recently developed thermomajorization relations on block-diagonal quantum states are observed to be special cases of this symmetry analysis.</p>",
    "format": "html"
  },
  "articleType": "article",
  "authors": [
    {
      "type": "Person",
      "name": "Matteo Lostaglio",
      "firstname": "Matteo",
      "surname": "Lostaglio",
      "affiliationIds": [
        "a1"
      ]
    },
    {
      "type": "Person",
      "name": "Kamil Korzekwa",
      "firstname": "Kamil",
      "surname": "Korzekwa",
      "affiliationIds": [
        "a1"
      ]
    },
    {
      "type": "Person",
      "name": "David Jennings",
      "firstname": "David",
      "surname": "Jennings",
      "affiliationIds": [
        "a1"
      ]
    },
    {
      "type": "Person",
      "name": "Terry Rudolph",
      "firstname": "Terry",
      "surname": "Rudolph",
      "affiliationIds": [
        "a1"
      ]
    }
  ],
  "affiliations": [
    {
      "name": "Department of Physics, Imperial College London, London SW7 2AZ, United Kingdom",
      "id": "a1"
    }
  ],
  "date": "2015-04-01",
  "fundings": [
    {
      "funderId": "http://dx.doi.org/10.13039/501100000266",
      "funderName": "Engineering and Physical Sciences Research Council",
      "awards": []
    },
    {
      "funderId": "http://dx.doi.org/10.13039/501100000921",
      "funderName": "European Cooperation in Science and Technology",
      "awards": [
        "MP1209"
      ]
    },
    {
      "funderId": "http://dx.doi.org/10.13039/501100000288",
      "funderName": "Royal Society",
      "awards": []
    },
    {
      "funderId": "http://dx.doi.org/10.13039/501100000275",
      "funderName": "Leverhulme Trust",
      "awards": []
    }
  ],
  "metadata_last_modified_at": "2015-04-01T12:39:49-0400",
  "last_modified_at": "2015-04-01T12:39:49-0400",
  "identifiers": {
    "doi": "10.1103/PhysRevX.5.021001"
  },
  "issue": {
    "number": "2"
  },
  "pageStart": "021001",
  "hasArticleId": true,
  "numPages": 11,
  "publisher": {
    "name": "APS"
  },
  "rights": {
    "rightsStatement": "Published by the American Physical Society",
    "copyrightYear": 2015,
    "copyrightHolders": [
      {
        "name": "authors"
      }
    ],
    "creativeCommons": true,
    "licenses": [
      {
        "url": "http://creativecommons.org/licenses/by/3.0/"
      }
    ]
  },
  "journal": {
    "id": "PRX",
    "abbreviatedName": "Phys. Rev. X",
    "name": "Physical Review X"
  },
  "classificationSchemes": {
    "subjectAreas": [
      {
        "id": "nano",
        "label": "Nanophysics"
      },
      {
        "id": "quantum",
        "label": "Quantum Physics"
      },
      {
        "id": "quantum-info",
        "label": "Quantum Information"
      }
    ]
  },
  "title": {
    "value": "Quantum Coherence, Time-Translation Symmetry, and Thermodynamics",
    "format": "html"
  },
  "volume": {
    "number": "5"
  }
}
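The two timestamp fields in the example above carry a numeric UTC offset without a colon. Assuming all timestamps follow that same pattern (the format is inferred from the example, not stated by the API), they can be parsed into timezone-aware values as sketched here; parse_timestamp is a hypothetical helper name.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse a timestamp such as "2015-04-01T12:39:49-0400" (offset-aware)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")


modified = parse_timestamp("2015-04-01T12:39:49-0400")
print(modified.isoformat())
print(modified.astimezone(timezone.utc).isoformat())
```

Converting to UTC before comparing or storing values avoids surprises if different articles report different offsets.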

Fields:

  • id (string) the identifier for the article. This is typically the DOI but should be treated as an opaque identifier used only within the context of this API. To obtain the DOI, see the identifiers field.
  • type (string) the type of object this is (always "article" in this API)
  • authors (array) the authors in the order they appear in the article
    • type (string) "Person" in the case of an individual person, "Organization" for collaborations
    • name (string) the full formatted name
    • firstname (string) the first (given) name(s)
    • surname (string) the surname
    • affiliationIds (array) an array containing the IDs of the affiliations the author is linked to
  • affiliations (array) the affiliations
    • id the internal document identifier for this affiliation (used to link authors with affiliations)
    • name the affiliation name
  • date (string) the date of publication in YYYY-MM-DD format
  • articleType (string)
  • identifiers (object)
    • doi the DOI for the article
  • journal (object) the journal the article was published in
    • id (string) APS specific identifier for this journal
    • abbreviatedName (string) the abbreviated name of the journal (used when citing and in other contexts)
    • name (string) the full (unabbreviated) name of the journal
  • hasArticleId (boolean) whether or not this article uses article IDs rather than page numbers (in which case the page number must be treated as an opaque identifier)
  • issue (object)
    • number the issue "number", should be treated as a string since issues may not necessarily be purely numeric (e.g., "1-2")
  • volume (object)
    • number the volume number
  • pageStart (string) the starting page number for the article. If hasArticleId is true this must be treated as an opaque identifier
  • pageEnd (string) the ending page number for the article
  • numPages (number) the number of pages as it appeared in print or online PDF
  • metadata_last_modified_at (timestamp) the latest time the metadata of this article was modified
  • last_modified_at (timestamp) the latest time any of the components of this article were modified
  • tocSection (object) table of contents section information
    • label (string) the label (title) of the table of contents section the article appeared in
  • title (object) the title of the article
    • format (string) the format the value of the title is in (currently only "html" is supported)
    • value (string) the title in the specified format
  • abstract (object) the abstract of the article
    • format (string) the format the value of the abstract is in (currently only "html" is supported)
    • value (string) the abstract in the specified format
  • classificationSchemes (object) the classification schemes that have been applied to this article keyed by their type
    • subjectAreas (array) the APS subject areas that apply to this article
      • id the identifier for this subject
      • label the human readable label for this subject
  • fundings (array) the funding that applies to this article
    • funderId (string) the FundRef ID of the funder
    • funderName (string) the name of the funder
    • awards (array) an array of award IDs (grants, etc.)
  • rights (object) the copyright and licensing rights for this article
    • rightsStatement
    • copyrightYear
    • copyrightHolders (array)
      • name (string)
    • creativeCommons (boolean) whether or not this article is licensed under a Creative Commons license (see licenses for the specific license)
    • licenses (array)
      • url (string)
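Authors reference affiliations indirectly through affiliationIds, so displaying them together requires a lookup. The sketch below resolves each author's affiliation names from a parsed Article Response; the function name authors_with_affiliations is illustrative, and IDs with no matching affiliation are skipped as a defensive assumption.

```python
def authors_with_affiliations(article):
    """Return (author name, [affiliation names]) pairs in article order."""
    names_by_id = {a["id"]: a["name"] for a in article.get("affiliations", [])}
    return [
        (
            author["name"],
            [names_by_id[i] for i in author.get("affiliationIds", []) if i in names_by_id],
        )
        for author in article.get("authors", [])
    ]


article = {
    "authors": [
        {"type": "Person", "name": "Matteo Lostaglio", "affiliationIds": ["a1"]},
        {"type": "Person", "name": "Kamil Korzekwa", "affiliationIds": ["a1"]},
    ],
    "affiliations": [
        {"id": "a1", "name": "Department of Physics, Imperial College London, London SW7 2AZ, United Kingdom"}
    ],
}
for name, affiliations in authors_with_affiliations(article):
    print(name, "-", "; ".join(affiliations))
```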

Error Responses

Most errors return a JSON response containing an errors array. Each error object carries a title with additional details.

Example:

{
  "errors": [
    {
      "title": "zip format not authorized"
    }
  ]
}
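A client can pull the details out of such a response as sketched below. Since only "most" errors use this shape, the hypothetical helper error_titles falls back to an empty list when the body is not JSON or lacks the expected keys.

```python
import json


def error_titles(body):
    """Return the title of each error in an error response body."""
    try:
        payload = json.loads(body)
    except ValueError:
        return []  # not JSON, e.g. an HTML error page from a proxy
    if not isinstance(payload, dict):
        return []
    return [
        err["title"]
        for err in payload.get("errors", [])
        if isinstance(err, dict) and "title" in err
    ]


print(error_titles('{"errors": [{"title": "zip format not authorized"}]}'))
```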

Examples - curl

The examples below use cURL to demonstrate some common API use cases. Response headers are shown along with abbreviated response payloads; full payloads have been omitted for brevity.

Retrieve all articles newer than January 1, 2015

curl -D - -H 'Accept: application/vnd.tesseract.article+json' 'http://harvest.aps.org/v2/journals/articles?from=2015-01-01'

Response:

HTTP/1.1 200 OK
Server: nginx/1.7.9
Date: Mon, 27 Apr 2015 12:57:17 GMT
Content-Type: application/vnd.tesseract.article+json
Content-Length: 403667
Connection: close
Status: 200 OK
Link: <http://harvest.aps.org/v2/journals/articles?from=2015-01-01&page=2&per_page=100>; rel="next", <http://harvest.aps.org/v2/journals/articles?from=2015-01-01&page=56&per_page=100>; rel="last"

{"data":[...]}
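The response above is only the first page. A client that wants every result would keep following the rel="next" link until it disappears, roughly as sketched here. To keep the example free of network code, the HTTP call is passed in as a function: fetch is a caller-supplied assumption that returns a (headers, payload) pair, with headers as a dict keyed by "Link" and payload as the decoded JSON body.

```python
import re

# Matches one `<url>; rel="name"` entry of a Link header value.
_LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def iter_articles(start_url, fetch):
    """Yield every article, following rel="next" links until none remain."""
    url = start_url
    while url:
        headers, payload = fetch(url)
        yield from payload["data"]
        links = {rel: u for u, rel in _LINK_RE.findall(headers.get("Link", ""))}
        url = links.get("next")  # None on the last page ends the loop


# Stand-in for real HTTP calls: two canned pages keyed by URL.
pages = {
    "page1": ({"Link": '<page2>; rel="next", <page2>; rel="last"'}, {"data": [{"id": "a"}, {"id": "b"}]}),
    "page2": ({}, {"data": [{"id": "c"}]}),
}
print([article["id"] for article in iter_articles("page1", pages.__getitem__)])
```

The URLs are taken verbatim from the Link header and never rebuilt by the client, in line with the pagination rules.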

Retrieve all articles using a CHORUS token

curl -D - -H 'CHOR-Agency-Auth-Token: 8f75458ff33070cad3bdee28868bad434f2c' http://harvest.aps.org/v2/journals/articles

Retrieve a single article - JSON format

curl -D - -H 'Accept: application/vnd.tesseract.article+json' http://harvest.aps.org/v2/journals/articles/10.1103/PhysRevX.5.021001

Response:

HTTP/1.1 200 OK
Server: nginx/1.7.9
Date: Mon, 27 Apr 2015 12:55:10 GMT
Content-Type: application/vnd.tesseract.article+json
Content-Length: 2842
Connection: close
Status: 200 OK

{"data":{...}}

Retrieve a single article - PDF format

curl -D - -H 'Accept: application/pdf' http://harvest.aps.org/v2/journals/articles/10.1103/PhysRevX.5.021001

Response:

HTTP/1.1 200 OK
Server: nginx/1.7.9
Date: Wed, 29 Apr 2015 20:15:57 GMT
Content-Type: application/pdf
Content-Length: 749834
Connection: close
Vary: Accept-Encoding
Status: 200 OK
ETag: "4d1c9cab10e2030a9ce751017081f73e34f35c0f"
Content-Disposition: inline; filename=PhysRevX.5.021001.pdf

...

Retrieve a single article - BagIt format

curl -D - -H 'Accept: application/zip' http://harvest.aps.org/v2/journals/articles/10.1103/PhysRevX.5.021001

Response:

HTTP/1.1 200 OK
Server: nginx/1.7.9
Date: Wed, 29 Apr 2015 20:43:35 GMT
Content-Type: application/zip
Content-Length: 1833565
Connection: close
Status: 200 OK
ETag: "85797c188ca0fbcb1bf73b55e4d1bd98d9c4c4bc"
Cache-Control: max-age=86400
Expires: Thu, 30 Apr 2015 20:43:35 GMT
Content-SHA1: 85797c188ca0fbcb1bf73b55e4d1bd98d9c4c4bc
Accept-Ranges: bytes
Content-Disposition: attachment; filename="articlebag-10-1103-PhysRevX-5-021001-complete.zip"

...

Retrieve all open access articles

Open access articles (typically Creative Commons CC-BY) may be retrieved without the need for any authentication. The set parameter must be provided and set to "openaccess".

curl 'http://harvest.aps.org/v2/journals/articles?set=openaccess'

The fulltext XML and PDF of the article may be retrieved by calling the endpoint for individual articles using the DOI retrieved in the list response and setting the appropriate Accept header:

Fulltext XML:

curl -H 'Accept: text/xml' http://harvest.aps.org/v2/journals/articles/10.1103/PhysRevSTAB.4.072801

PDF:

curl -H 'Accept: application/pdf' http://harvest.aps.org/v2/journals/articles/10.1103/PhysRevSTAB.4.072801