Matching and merging
Swissbib contains the bibliographic data of RERO, the IDS libraries, the Swiss National Library and the IDS partner libraries. Because of the organizational structure of the Swiss library system, the data is stored in eleven different databases. Since the collections of these libraries are similar, the number of duplicate records is high. The data is cataloged according to three sets of rules: AACR2, Recaro and KIDS. It is structured in two formats, MARC21 and IDSMARC. A fourth factor is multilingualism, which mainly affects authority records and subject headings.
Whereas the formats can be mapped onto each other fairly easily, the different cataloging rules and, to an even greater extent, the different cataloging practices pose problems.
To get a picture of the difficulties involved in eliminating duplicates from catalog data, the swissbib project commissioned Pierre Gavin and Jean-Bertrand Gonin in 2007 to study the feasibility of deduplicating bibliographic and name-authority data. Their findings are summarized on the deduplication study page.