Heritage Connector Code Repositories

The Heritage Connector project is composed of multiple components and services split across multiple GitHub repos. Below we give an overview of the project’s main code repositories:

Pipeline
Deployment
Demos
NLP
Vectors
APIs
Datasets
Thor / Fuseki
Elastic Wikidata

Heritage Connector Pipeline

Contains the main bootstrap / pipeline used by the Heritage Connector to import data into a graph before performing NPL and Graph analysis on it. Individual repos provide much of the actual functionality and are listed separately below.

https://github.com/TheScienceMuseum/heritage-connector

A set of tools to:

load tabular collection data to a knowledge graph
find links between collection entities and Wikidata
perform NLP to create more links in the graph (also see hc-nlp)
explore and analyse a collection graph ways that aren’t possible in existing collections systems

Heritage Connector Deployment

An easy way to deploy all the Heritage Connector API’s and Endpoints, with the exception of the main pipeline, in one step.

https://github.com/TheScienceMuseum/heritage-connector-deployment

The following services are included and all configured through environment variables:

fuseki - RDF triplestore
thor - front end for performing SPARQL queries
thor-cors-proxy - CORS proxy to enable thor to connect to fuseki
heritage-connector-vectors - an nearest neighbours on knowledge graph embeddings
heritage-connector-apis - API endpoints to wrap some common SPARQL queries and the nearest neighbours API.

Heritage Connector Demos

This repo contains various demos and sketches of demos for Heritage Connector.

https://github.com/TheScienceMuseum/heritage-connector-demos

an interactive streamlit app showing NER and entity linking which uses static data for speed (not hosted at the moment)
a bookmarklet to view connections from an SMG collection, blog or journal page
a macro visualisation of the whole collection/knowledge graph
a visualisation of the combined SMG and V&A collections
maps of all the places in the knowledge graph

Heritage Connector NLP

NLP tools for heritage collections

https://github.com/TheScienceMuseum/heritage-connector-nlp

Heritage Connector Vectors

Generates graph and language embeddings for the Heritage Connector project.

https://github.com/TheScienceMuseum/heritage-connector-vectors

Heritage Connector APIs

Various API’s for the querying Heritage Connector project https://github.com/TheScienceMuseum/heritage-connector-apis

Heritage Connector Data

Various Public data sets for the Heritage Connector Project.

Further public datasets and outputs can be found on the HC / TANC page on Zenodo

Heritage Connector Thor / Fuseki

These repos provide a SPARQL server (Fuseki), a SPARQL client (Thor) and a Proxy server (to deal with CORS headers) as Docker containers. All three component can be installed in-one-go via this repo

Elastic Wikidata

Simple CLI tools to load a subset of Wikidata into Elasticsearch.

https://github.com/TheScienceMuseum/elastic-wikidata