Harvesting

From swissbib

General description

Keeping swissbib in sync with the local library systems after the so-called initial load is of great importance. This could be done in different ways, but OAI-PMH is the method of choice, as it supports everything needed for continuous updates.

This page does not deal with institutional repositories and their implementations of OAI, as their function and usage differ from those of a local library system.


OAI-PMH for local library systems

The following sections cover the OAI-PMH functionality that must be readily available, as well as functionality that is of minor importance at the start. This page does not cover the commands and options that the standard requires for building and maintaining a repository, as these are described in great detail in the protocol section of the OAI website.


Mechanisms and Information necessary

To retrieve information on a regular basis, swissbib requires the following verb (ListRecords) and arguments (set, until, from, resumptionToken):

  • ListRecords
  • set
  • until
  • from
  • resumptionToken


In addition to these verbs and arguments, the following functionality is needed:

  • timestamps with a granularity down to hours
  • information on deleted records, as deletions happen often in library systems.
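
As an illustration of how the verb and arguments listed above combine into a single request, here is a minimal Python sketch. The base URL, the set name "bib" and the metadataPrefix value "marc21" are assumptions made for the example, not values defined by swissbib. Note that the protocol also requires a metadataPrefix argument on the first ListRecords request, and that resumptionToken is an exclusive argument.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix, set_spec=None,
                           from_=None, until=None, resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token is not None:
        # resumptionToken is exclusive: no other argument besides the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
        if from_:
            params["from"] = from_
        if until:
            params["until"] = until
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, set name and metadata prefix.
url = build_list_records_url(
    "https://example.org/oai", "marc21",
    set_spec="bib", from_="2024-01-01T00:00:00Z",
)
```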


Remarks on the different Verbs and Options

set

Generally it is planned to have at least two sets:

  1. the actual bibliographic data in MARCXML
  2. the availability information from Aleph systems as a second set (MARCXML)

=> If the OAI interface is to be made publicly available, the standard also requires the records to be offered in unqualified Dublin Core (metadata format oai_dc) as a fall-back.


from - until

  • The argument from is necessary for incremental updates
  • The argument until gives the flexibility to redo transfers for a given timespan.
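
The two uses of from and until described above can be sketched as follows. This is only an assumption of how a harvester might compute its time window; the function names are hypothetical, and the timestamps are formatted in UTC with seconds granularity.

```python
from datetime import datetime, timezone


def to_oai_timestamp(moment):
    """Format a timezone-aware datetime as an OAI-PMH UTC timestamp."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def harvest_window(start, end):
    """Return the from/until arguments covering the span start..end."""
    if start > end:
        raise ValueError("start must not be later than end")
    return {"from": to_oai_timestamp(start), "until": to_oai_timestamp(end)}


# Incremental update: everything changed since the last successful harvest.
last_harvest = datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc)
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
incremental = harvest_window(last_harvest, now)

# Redoing a transfer: the same call with an explicit past timespan.
redo = harvest_window(
    datetime(2023, 12, 24, 0, 0, tzinfo=timezone.utc),
    datetime(2023, 12, 25, 0, 0, tzinfo=timezone.utc),
)
```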


resumptionToken

As we generally use ListRecords, which delivers the metadata directly, it generally makes sense to split the response into packets in order to limit the network load per transfer.
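
On the harvester side, following the packets could look like the sketch below. It assumes a caller-supplied fetch function (hypothetical) that takes the request arguments and returns the response XML as a string, so the loop itself stays independent of the HTTP library used.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def iter_records(fetch, first_params):
    """Yield every <record> element, following resumptionTokens.

    fetch(params) is a hypothetical callable returning the response XML.
    """
    params = dict(first_params)
    while True:
        root = ET.fromstring(fetch(params))
        list_records = root.find(OAI_NS + "ListRecords")
        if list_records is None:
            return  # error response or empty result
        for record in list_records.findall(OAI_NS + "record"):
            yield record
        token = list_records.find(OAI_NS + "resumptionToken")
        # A missing or empty resumptionToken marks the last packet.
        if token is None or not (token.text or "").strip():
            return
        params = {"verb": "ListRecords",
                  "resumptionToken": token.text.strip()}
```

Follow-up requests carry only the verb and the resumptionToken, because the token is an exclusive argument in the protocol.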


Timestamps granularity

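As the availability information is planned to be updated more frequently than once per day, the granularity of the timestamps should go down at least to the level of hours. In OAI-PMH terms this corresponds to the seconds granularity (YYYY-MM-DDThh:mm:ssZ), as the protocol defines only day and seconds granularity. If the system delivers more detailed timestamps, this should of course not be changed.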
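
Whether a local system offers timestamps finer than one day can be read from its Identify response. The following is a minimal sketch of such a check, assuming the response XML is already available as a string; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def supports_seconds_granularity(identify_xml):
    """Return True if the Identify response announces seconds granularity."""
    root = ET.fromstring(identify_xml)
    granularity = root.find(OAI_NS + "Identify/" + OAI_NS + "granularity")
    if granularity is None:
        return False
    # The protocol defines two values: YYYY-MM-DD and YYYY-MM-DDThh:mm:ssZ.
    return (granularity.text or "").strip() == "YYYY-MM-DDThh:mm:ssZ"
```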


Information on deleted records

To keep swissbib in sync with the local sites, information about deleted records is essential. The local sites must therefore deliver information about deleted records at the level "persistent" (the deletedRecord value announced in the Identify response).
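
In a response, a deleted record is marked by the attribute status="deleted" on its header and carries no metadata. A minimal sketch of how a harvester might separate deletions from updates, assuming the response XML is available as a string (the function name is hypothetical):

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def split_deleted(list_records_xml):
    """Return (updated_ids, deleted_ids) from a ListRecords response."""
    root = ET.fromstring(list_records_xml)
    updated, deleted = [], []
    for header in root.iter(OAI_NS + "header"):
        identifier = header.findtext(OAI_NS + "identifier")
        if header.get("status") == "deleted":
            deleted.append(identifier)
        else:
            updated.append(identifier)
    return updated, deleted
```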


Mechanisms that are nice to have

These verbs are of minor importance but could be handy:

  • ListIdentifiers
  • GetRecord


If a local site has problems with its network connection, harvesting with these commands could be promising, but it will not be the solution of choice.
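
One way such a fallback could work is to fetch only the identifiers first and then request the records one at a time, so that each transfer stays small. The sketch below turns a ListIdentifiers response into GetRecord request URLs; the base URL and the metadataPrefix value "marc21" are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def get_record_urls(base_url, list_identifiers_xml, metadata_prefix):
    """Build one GetRecord URL per non-deleted identifier."""
    root = ET.fromstring(list_identifiers_xml)
    urls = []
    for header in root.iter(OAI_NS + "header"):
        if header.get("status") == "deleted":
            continue  # nothing to fetch for a deleted record
        query = urlencode({
            "verb": "GetRecord",
            "identifier": header.findtext(OAI_NS + "identifier"),
            "metadataPrefix": metadata_prefix,
        })
        urls.append(base_url + "?" + query)
    return urls
```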


Information on the local systems' OAI capabilities

This information is open to swissbib members only and covers the functionality of the systems as well as data formats and information about particularities.