4 Data Linking
The internal data are linked to various external data sources.
The main database table is el_matches. The idea is to just dump everything there in a first step.
additional_data is a JSON column for maximal flexibility.
CREATE TABLE `el_matches` (
  `created_at` timestamp NULL DEFAULT current_timestamp(),
  `updated_at` timestamp NULL DEFAULT NULL ON UPDATE current_timestamp(),
  `entity_id` varchar(55) NOT NULL,
  `entity_id_type` varchar(255) DEFAULT NULL,
  `entity_id_combination` varchar(255) DEFAULT NULL,
  `entity_id_combination_type` varchar(255) DEFAULT NULL,
  `external_id` varchar(255) DEFAULT NULL,
  `external_id_label` varchar(255) DEFAULT NULL,
  `external_id_type` varchar(55) NOT NULL,
  `external_id_desc` varchar(255) DEFAULT NULL,
  `source` varchar(255) DEFAULT NULL,
  `source_id` varchar(255) DEFAULT NULL,
  `property_type` varchar(255) DEFAULT NULL,
  `additional_data` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL CHECK (json_valid(`additional_data`))
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
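To illustrate the "dump everything first" idea, here is a minimal Python sketch that writes one match into a cut-down copy of the table. It is only an illustration under assumptions: an in-memory SQLite database stands in for the real database server, only a handful of the columns are kept, the helper name insert_match is hypothetical, and all example values are made up. Anything that has no column of its own is serialized as JSON into additional_data.

```python
import json
import sqlite3

# In-memory SQLite stands in for the real database server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE el_matches (
        entity_id        TEXT NOT NULL,
        external_id      TEXT,
        external_id_type TEXT NOT NULL,
        source           TEXT,
        source_id        TEXT,
        additional_data  TEXT
    )
    """
)


def insert_match(conn, entity_id, external_id, external_id_type,
                 source, source_id, additional_data=None):
    """Insert one match; extra fields are stored as JSON text in additional_data."""
    payload = json.dumps(additional_data) if additional_data is not None else None
    conn.execute(
        "INSERT INTO el_matches VALUES (?, ?, ?, ?, ?, ?)",
        (entity_id, external_id, external_id_type, source, source_id, payload),
    )


# Made-up example values, not real identifiers.
insert_match(conn, "author_1", "gnd-0000", "gnd", "lobid", "resource-0000",
             additional_data={"role": "author", "note": "unchecked"})

row = conn.execute(
    "SELECT external_id, additional_data FROM el_matches WHERE entity_id = ?",
    ("author_1",),
).fetchone()
stored = json.loads(row[1])  # the JSON column round-trips to a dict
```

Keeping the flexible part in a single JSON column means new kinds of metadata can be stored without changing the table, at the cost of having to parse it again when cleaning the data later.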
4.1 GND/lobid
I want to link the entities, such as a book author or publisher, to their identifier in the GND, the authority file of the German National Library.
The easiest way seems to be calling the lobid API.
It returns JSON with metadata, e.g. contributors or topics, that often, but not always, have a GND id.
I filter the JSON with jq, a JSON processor that has a wrapper in R: jqr.
4.1.1 Output
- DB table: lgbtiq_kg.el_matches
- beware: it’s not cleaned data, so it is not in the “clean” database lgbtiq_kg_clean
4.1.2 Examples for jq commands
- getting contributors:
.member[].contribution[]? | {type: .agent?.type[], label: .agent?.label, role:.role?.id, gnd_id: .agent?.gndIdentifier}
- getting topics:
.member[].subject[].componentList[]? | {gnd_id: select(.type?).gndIdentifier, type: select(.gndIdentifier?).type[], label: select(.type?).label}
- getting lobid resource id:
.member[] | {source_id: .id}
- more complex with optional fields:
.member[] | {id: ((select(.sameAs) | {sameAs}) // null), gnd_subject_category: .gndSubjectCategory, placeOfBusiness: .placeOfBusiness}
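For readers less used to jq, the contributor filter above can be approximated in plain Python. This is a sketch, not the project's code (which uses jqr in R): the function name contributors is hypothetical, the sample response is hand-written and all its values are made up, and unlike the jq filter it silently skips agents without a type instead of raising an error.

```python
def contributors(response):
    """Yield one dict per contributor and agent type, similar to the jq filter."""
    for member in response.get("member", []):
        # Records without a contribution list are skipped, like `[]?` in jq.
        for contribution in member.get("contribution") or []:
            agent = contribution.get("agent") or {}
            role = contribution.get("role") or {}
            # `.agent?.type[]` emits one object per entry in the type list.
            for agent_type in agent.get("type") or []:
                yield {
                    "type": agent_type,
                    "label": agent.get("label"),
                    "role": role.get("id"),
                    "gnd_id": agent.get("gndIdentifier"),
                }


# Hand-written sample shaped like a lobid response; all values are made up.
sample = {
    "member": [
        {
            "id": "resource-1",
            "contribution": [
                {"agent": {"type": ["Person"], "label": "Example Author",
                           "gndIdentifier": "gnd-0000"},
                 "role": {"id": "role-author"}},
                {"agent": {"type": ["CorporateBody"], "label": "Example Press"},
                 "role": {"id": "role-publisher"}},
            ],
        },
        {"id": "resource-2"},  # no contribution field at all
    ]
}

rows = list(contributors(sample))
# Contributors often, but not always, carry a GND id.
with_gnd = [r for r in rows if r["gnd_id"] is not None]
```

Missing keys simply become None here, which matches how jq returns null for absent fields.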
4.1.3 Basic workflow
- Most sophisticated script: data-linking/lobid/04-search-via-publisher.R
  - serves as the template from now on
- search by ISBN
  - the most precise way, because it’s already a persistent id
  - data-linking/lobid/01-search-via-isbn.R
  - first version of the script
- search for author name and title in the relevant objects
  - query like contribution.agent.label:AUTHOR+AND+title:TITLE&format=json
  - data-linking/lobid/02-search-via-author-title.R
- search just by name
  - data-linking/lobid/03-search-via-author.R
- search for publishers as corporate bodies: data-linking/lobid/04-search-via-publisher.R
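The author and title query from the list above can be assembled as follows. This is a Python sketch under assumptions, not a copy of the R scripts: the base URL, the q parameter name and the phrase quoting of the values are my assumptions about the lobid search endpoint, and the function name author_title_url is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed base URL of the lobid resources search endpoint.
LOBID_SEARCH = "https://lobid.org/resources/search"


def author_title_url(author, title):
    """Build a search URL that combines author label and title with AND."""
    # Quoting the values as phrases is an assumption about the query syntax.
    query = f'contribution.agent.label:"{author}" AND title:"{title}"'
    # urlencode turns spaces into "+", which gives the "+AND+" form.
    params = urlencode({"q": query, "format": "json"})
    return LOBID_SEARCH + "?" + params


# Placeholder values, not a real book.
url = author_title_url("Example Author", "Example Title")

# Decoding the URL again recovers the original query string.
decoded = parse_qs(urlsplit(url).query)
```

Building the query from a dict and letting urlencode do the escaping avoids hand-written "+" and "&" characters, which are easy to get wrong when names contain spaces or special characters.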
4.2 VIAF
Linking to VIAF is done with the viafr R package.
It suggests a VIAF entity when given a string.
- db table: el_viaf_books_authors