Import

From Swissbib

General description

swissbib has two options for importing structured metadata, and both are in use:

  1. The basic (primary) import path goes through the data preparation module and is best suited for any library metadata that is not unique.
  2. The additional (secondary) import path goes directly into the search engine's index and is well suited for unique materials that do not need to be deduplicated (archival collections and the like).
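
The choice between the two paths can be pictured as a simple routing decision. The following is a minimal sketch, not the actual swissbib implementation: the `Source` class, the `unique_material` flag and the function name are assumptions made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ImportPath(Enum):
    PRIMARY = "data preparation module"
    SECONDARY = "search engine index"


@dataclass(frozen=True)
class Source:
    name: str
    # Hypothetical flag: True if the records never need deduplication.
    unique_material: bool


def choose_import_path(source: Source) -> ImportPath:
    """Send unique material straight to the index, everything else through data preparation."""
    return ImportPath.SECONDARY if source.unique_material else ImportPath.PRIMARY


sources = [
    Source("library network", unique_material=False),
    Source("archival collection", unique_material=True),
]
for source in sources:
    print(f"{source.name}: {choose_import_path(source).value}")
```

In this sketch the decision is made once per source rather than per record, which matches the description above of whole collections being assigned to one path.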

primary import path

During the import the data is analyzed and transformed to Pica+, the internal format of CBS, the swissbib data preparation engine. At this stage some of the data is excluded because it is not useful for swissbib.

The conversion to the internal format, as well as other steps, can be defined per export source in order to get the most out of the individual data delivered by the local library networks.

A lot of the conversion steps are shared between the different networks.
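
One way to picture per-source conversion with shared steps is a list of small functions per source, where common steps are reused. This is only a sketch under assumptions: the source names, the step functions and the dictionary-based record are hypothetical, and the sketch does not model Pica+ or CBS.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Step = Callable[[Record], Record]


def strip_whitespace(record: Record) -> Record:
    """Shared step: trim surrounding whitespace from every field value."""
    return {key: value.strip() for key, value in record.items()}


def tag_source(name: str) -> Step:
    """Build a step that records which source delivered the data."""
    def step(record: Record) -> Record:
        return {**record, "source": name}
    return step


def uppercase_language(record: Record) -> Record:
    """Source-specific step (hypothetical): normalise the language code."""
    if "language" in record:
        return {**record, "language": record["language"].upper()}
    return record


# Steps that every source uses.
SHARED_STEPS: List[Step] = [strip_whitespace]

# Each source gets the shared steps plus its own additions.
PIPELINES: Dict[str, List[Step]] = {
    "network_a": SHARED_STEPS + [tag_source("network_a")],
    "network_b": SHARED_STEPS + [uppercase_language, tag_source("network_b")],
}


def convert(source: str, record: Record) -> Record:
    """Run a record through the pipeline defined for its source."""
    for step in PIPELINES[source]:
        record = step(record)
    return record


print(convert("network_b", {"title": "  Example title ", "language": "ger"}))
```

Keeping the shared steps in one list means a fix to a common step reaches all networks at once, while a single network can still add its own handling.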

Criteria for excluding data

swissbib applies general criteria for the exclusion of data. They are mainly centered on two topics:

  1. data quality and richness (minimal requirements to be met)
  2. usefulness of the information provided

The actual criteria for the exclusion of records are defined on a separate page.
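
The two kinds of criteria can be sketched as a filter that returns a reason for each excluded record. The concrete rules below (the required fields and the excluded record type) are invented placeholders, not the real swissbib criteria, which are defined elsewhere.

```python
from typing import Dict, List, Optional

Record = Dict[str, str]

# Hypothetical minimal requirement: fields a record must carry to be kept.
REQUIRED_FIELDS = ("title", "identifier")
# Hypothetical usefulness rule: record types considered of no use.
EXCLUDED_TYPES = {"internal-note"}


def exclusion_reason(record: Record) -> Optional[str]:
    """Return why a record is excluded, or None if it should be imported."""
    missing = [name for name in REQUIRED_FIELDS if not record.get(name, "").strip()]
    if missing:
        return "missing required fields: " + ", ".join(missing)
    if record.get("type") in EXCLUDED_TYPES:
        return "record type not useful: " + record["type"]
    return None


def filter_records(records: List[Record]) -> List[Record]:
    """Keep only the records that no exclusion rule applies to."""
    return [record for record in records if exclusion_reason(record) is None]


records = [
    {"identifier": "1", "title": "A complete record", "type": "book"},
    {"identifier": "2", "title": "", "type": "book"},
    {"identifier": "3", "title": "Staff memo", "type": "internal-note"},
]
for record in records:
    print(record["identifier"], exclusion_reason(record))
```

Returning a reason instead of a plain yes/no makes it possible to report back to a data provider why a record was dropped.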

Data sources imported through primary import path

  • all library data (12 sources)
  • the Swiss posters collection (1 source)
  • the institutional repositories (currently 1 source)
  • the retroSEALS collection (1 source)

secondary import path

The secondary import path is used to index either full text or structured metadata that describes unique data.

full text

swissbib uses full-text indexing to enrich the library metadata with abstracts and indexes that have been scanned by the Swiss libraries. Unfortunately, not all institutions share their content openly.

structured metadata

For unique data it makes little sense to transform it and mix it with library data in the "data preparation module", because by its nature this data has no duplicates to merge. It is a lot easier to index it directly. To smooth the process, the data is converted into a MARC field structure that can be indexed with the same definitions as the library data.
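
As an illustration of what such a conversion involves, the sketch below builds a small MARCXML record from a flat archival record using Python's standard library. The input field names and the mapping in `FIELD_MAP` are assumptions for this example only; a complete MARC record would also carry a leader and further fields.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)

# Hypothetical mapping from archival field names to MARC (tag, subfield code).
FIELD_MAP = {
    "title": ("245", "a"),
    "summary": ("520", "a"),
}


def to_marcxml(archival_record: dict) -> ET.Element:
    """Build a MARCXML record element from a flat archival record."""
    record = ET.Element(f"{{{MARC_NS}}}record")
    # The record identifier goes into control field 001.
    control = ET.SubElement(record, f"{{{MARC_NS}}}controlfield", {"tag": "001"})
    control.text = archival_record["id"]
    for name, (tag, code) in FIELD_MAP.items():
        value = archival_record.get(name)
        if not value:
            continue  # skip fields the archival record does not provide
        datafield = ET.SubElement(
            record, f"{{{MARC_NS}}}datafield", {"tag": tag, "ind1": " ", "ind2": " "}
        )
        subfield = ET.SubElement(datafield, f"{{{MARC_NS}}}subfield", {"code": code})
        subfield.text = value
    return record


example = {"id": "example-0001", "title": "Letters to a publisher"}
print(ET.tostring(to_marcxml(example), encoding="unicode"))
```

Because the output uses the same tags and subfield codes as library records, an index configuration written for library data could in principle be applied to it unchanged.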

Other transformation scenarios are possible, but it is not clear what would be gained from them.

Currently this applies to the archival data of the Swiss National Library:

  • digitized content of the Swiss Literary Archives (SLA)
  • digitized content of the Federal Archives of Historic Monuments (FAHM)

Together with swissbib, the staff of the National Library developed XSL transformations from scope XML structures to standard MARCXML.