Heritage Connector Code Repositories
The Heritage Connector project is composed of multiple components and services split across multiple GitHub repos. Below we give an overview of the project’s main code repositories:
Heritage Connector Pipeline
Contains the main bootstrap / pipeline used by the Heritage Connector to import data into a graph before performing NLP and graph analysis on it. The individual repos listed separately below provide much of the actual functionality.
https://github.com/TheScienceMuseum/heritage-connector
A set of tools to:
- load tabular collection data to a knowledge graph
- find links between collection entities and Wikidata
- perform NLP to create more links in the graph (also see hc-nlp)
- explore and analyse a collection graph in ways that aren’t possible in existing collections systems
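To illustrate the first of these steps, here is a minimal Python sketch of turning tabular collection data into graph triples. It is not the pipeline's actual code: the example.org namespaces, the helper names and the one-literal-per-cell mapping are all hypothetical, and column names are assumed to be safe to use inside IRIs.

```python
import csv
import io

# Hypothetical namespaces for illustration only; not the project's real URIs.
BASE = "http://example.org/collection/"
PROP = "http://example.org/property/"


def escape_literal(value: str) -> str:
    """Escape characters that N-Triples does not allow raw inside a string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_ntriples(row: dict, id_column: str) -> list[str]:
    """Map one table row to N-Triples: the ID cell becomes the subject URI and
    every other non-empty cell becomes a triple with a literal object.
    Column names are assumed to be IRI-safe (no spaces or special characters)."""
    subject = f"<{BASE}{row[id_column]}>"
    return [
        f'{subject} <{PROP}{column}> "{escape_literal(value)}" .'
        for column, value in row.items()
        if column != id_column and value
    ]


def table_to_ntriples(csv_text: str, id_column: str) -> list[str]:
    """Convert every row of a CSV string (with a header row) to N-Triples lines."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        triples.extend(row_to_ntriples(row, id_column))
    return triples
```

For example, a CSV with header `id,title` and a row `obj1,Steam engine` yields one triple linking `<http://example.org/collection/obj1>` to the literal "Steam engine". A fuller mapping would also emit URI-valued links between entities (for example, from an object to its maker), which is what turns the table into a graph.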
Heritage Connector Deployment
An easy way to deploy all the Heritage Connector APIs and endpoints, except the main pipeline, in one step.
https://github.com/TheScienceMuseum/heritage-connector-deployment
The following services are included and all configured through environment variables:
- fuseki - RDF triplestore
- thor - front end for performing SPARQL queries
- thor-cors-proxy - CORS proxy to enable thor to connect to fuseki
- heritage-connector-vectors - a nearest-neighbours API on knowledge graph embeddings
- heritage-connector-apis - API endpoints that wrap some common SPARQL queries and the nearest-neighbours API
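The nearest-neighbours idea behind heritage-connector-vectors can be sketched as a brute-force cosine-similarity search over entity embeddings. This is a hedged illustration, not that repo's implementation: the function names and the toy vectors are hypothetical, and a service covering a whole collection would more likely use an approximate nearest-neighbour index than a linear scan.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_neighbours(
    query_id: str, embeddings: dict[str, list[float]], k: int = 3
) -> list[tuple[str, float]]:
    """Return the k entities most similar to query_id, excluding itself,
    by scoring every other embedding (brute force)."""
    query = embeddings[query_id]
    scored = [
        (entity, cosine(query, vector))
        for entity, vector in embeddings.items()
        if entity != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

With toy embeddings `{"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}`, `nearest_neighbours("a", toy, k=2)` ranks `b` (cosine similarity about 0.994) ahead of `c` (0.0).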
Heritage Connector Demos
This repo contains various demos and sketches of demos for the Heritage Connector.
https://github.com/TheScienceMuseum/heritage-connector-demos
- an interactive Streamlit app showing NER and entity linking, which uses static data for speed (not hosted at the moment)
- a bookmarklet to view connections from an SMG collection, blog or journal page
- a macro visualisation of the whole collection/knowledge graph
- a visualisation of the combined SMG and V&A collections
- maps of all the places in the knowledge graph
Heritage Connector NLP
NLP tools for heritage collections
https://github.com/TheScienceMuseum/heritage-connector-nlp
Heritage Connector Vectors
Generates graph and language embeddings for the Heritage Connector project.
https://github.com/TheScienceMuseum/heritage-connector-vectors
Heritage Connector APIs
Various APIs for querying the Heritage Connector project.
https://github.com/TheScienceMuseum/heritage-connector-apis
Heritage Connector Data
Various public datasets for the Heritage Connector project.
Further public datasets and outputs can be found on the HC / TANC page on Zenodo.
Heritage Connector Thor / Fuseki
These repos provide a SPARQL server (Fuseki), a SPARQL client (Thor) and a proxy server (to deal with CORS headers) as Docker containers. All three components can be installed in one go via the Heritage Connector Deployment repo.
- https://github.com/TheScienceMuseum/fuseki-docker
- https://github.com/TheScienceMuseum/thor-docker
- https://github.com/TheScienceMuseum/thor-cors-proxy
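Once Fuseki is running, any HTTP client can query it using the standard SPARQL 1.1 Protocol, the same kind of request a front end such as Thor sends. The sketch below is a minimal example under stated assumptions: Fuseki listens on port 3030 by default, but the dataset name `hc` in the endpoint URL is hypothetical, as are the helper names.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: port 3030 is Fuseki's default, but the dataset
# name "hc" is an assumption; check the deployment's configuration.
ENDPOINT = "http://localhost:3030/hc/sparql"


def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol query-via-GET request asking for JSON results."""
    url = f"{endpoint}?{urllib.parse.urlencode({'query': query})}"
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_select(endpoint: str, query: str) -> list[dict[str, str]]:
    """Send a SELECT query and flatten each result binding to {variable: value}."""
    with urllib.request.urlopen(build_sparql_request(endpoint, query)) as response:
        results = json.load(response)
    return [
        {var: binding[var]["value"] for var in binding}
        for binding in results["results"]["bindings"]
    ]
```

Against a running server, `run_select(ENDPOINT, "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5")` returns up to five rows as dictionaries keyed by variable name (`s`, `p`, `o`).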
Elastic Wikidata
Simple CLI tools to load a subset of Wikidata into Elasticsearch.
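As a rough illustration of what loading Wikidata into Elasticsearch involves, the sketch below flattens entities in the Wikidata JSON format into small search documents and writes them as newline-delimited JSON for Elasticsearch's `_bulk` API. It is not Elastic Wikidata's actual code or document schema; the index name, the `label`/`description`/`aliases` fields and the helper names are assumptions.

```python
import json
from typing import Optional


def entity_to_doc(entity: dict, lang: str = "en") -> dict:
    """Flatten a Wikidata entity (JSON dump format) to a small search document."""
    def text(field: str) -> Optional[str]:
        # labels and descriptions map a language code to {"language": ..., "value": ...}
        entry = entity.get(field, {}).get(lang)
        return entry["value"] if entry else None

    return {
        "label": text("labels"),
        "description": text("descriptions"),
        # aliases map a language code to a list of {"language": ..., "value": ...}
        "aliases": [alias["value"] for alias in entity.get("aliases", {}).get(lang, [])],
    }


def to_bulk_ndjson(entities: list[dict], index: str) -> str:
    """Build an Elasticsearch _bulk body: an action line, then a source line, per entity."""
    lines = []
    for entity in entities:
        lines.append(json.dumps({"index": {"_index": index, "_id": entity["id"]}}))
        lines.append(json.dumps(entity_to_doc(entity)))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

The resulting string could then be POSTed to the cluster's `/_bulk` endpoint with the `application/x-ndjson` content type.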