The Heritage Connector project is composed of multiple components and services split across multiple GitHub repos. Below we give an overview of the project’s main code repositories:
Contains the main bootstrap pipeline used by the Heritage Connector to import data into a graph before performing NLP and graph analysis on it. Individual repos provide much of the actual functionality and are listed separately below.
A set of tools to:
- load tabular collection data to a knowledge graph
- find links between collection entities and Wikidata
- perform NLP to create more links in the graph (also see hc-nlp)
- explore and analyse a collection graph in ways that aren’t possible in existing collections systems
An easy way to deploy all the Heritage Connector APIs and endpoints, with the exception of the main pipeline, in one step.
The following services are included and all configured through environment variables:
- fuseki - RDF triplestore
- thor - front end for performing SPARQL queries
- thor-cors-proxy - CORS proxy to enable thor to connect to fuseki
- heritage-connector-vectors - a nearest neighbours API on knowledge graph embeddings
- heritage-connector-apis - API endpoints to wrap some common SPARQL queries and the nearest neighbours API.
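As a rough illustration of how a client might talk to the fuseki service once it is deployed, the sketch below builds a standard SPARQL 1.1 Protocol request using only the Python standard library. The endpoint URL is an assumption: port 3030 is Fuseki's default, but the dataset name `heritage-connector` is hypothetical and depends on how the deployment's environment variables are set. The helper names are also illustrative and are not taken from any Heritage Connector repo.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Fuseki on its default port 3030; the dataset name is hypothetical.
FUSEKI_ENDPOINT = "http://localhost:3030/heritage-connector/sparql"

# A generic query that lists a handful of triples from the graph.
EXAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_request(query, endpoint=FUSEKI_ENDPOINT):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_select(query, endpoint=FUSEKI_ENDPOINT):
    """Send a SELECT query and return the list of result bindings."""
    with urllib.request.urlopen(build_sparql_request(query, endpoint)) as response:
        return json.load(response)["results"]["bindings"]
```

With the services running, `run_select(EXAMPLE_QUERY)` would return one dictionary per result row, keyed by variable name (`s`, `p`, `o`).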
This repo contains various demos and sketches of demos for Heritage Connector.
- an interactive streamlit app showing NER and entity linking which uses static data for speed (not hosted at the moment)
- a bookmarklet to view connections from an SMG collection, blog or journal page
- a macro visualisation of the whole collection/knowledge graph
- a visualisation of the combined SMG and V&A collections
- maps of all the places in the knowledge graph
NLP tools for heritage collections
Generates graph and language embeddings for the Heritage Connector project.
Various APIs for querying the Heritage Connector project: https://github.com/TheScienceMuseum/heritage-connector-apis
Various public datasets for the Heritage Connector project.
Further public datasets and outputs can be found on the HC / TANC page on Zenodo.
These repos provide a SPARQL server (Fuseki), a SPARQL client (Thor) and a proxy server (to deal with CORS headers) as Docker containers. All three components can be installed in one go via this repo.
Simple CLI tools to load a subset of Wikidata into Elasticsearch.
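To make the idea of loading a subset concrete, here is a minimal sketch, not the tool's actual implementation, that filters entities in the Wikidata JSON format by their "instance of" (P31) values and serialises the survivors into the newline-delimited body expected by the Elasticsearch bulk API. The function names, the index name `wikidata` and the choice to keep only the English label are assumptions made for illustration.

```python
import json


def instance_of_ids(entity):
    """Collect the Q-IDs of an entity's 'instance of' (P31) claims."""
    ids = set()
    for claim in entity.get("claims", {}).get("P31", []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value", {})
        if isinstance(value, dict) and "id" in value:
            ids.add(value["id"])
    return ids


def to_bulk_ndjson(entities, wanted_types, index_name="wikidata"):
    """Build an Elasticsearch bulk body for entities matching wanted_types."""
    lines = []
    for entity in entities:
        if not instance_of_ids(entity) & set(wanted_types):
            continue  # skip entities outside the requested subset
        # Action line, followed by the document line (English label only).
        lines.append(json.dumps({"index": {"_index": index_name, "_id": entity["id"]}}))
        label = entity.get("labels", {}).get("en", {}).get("value")
        lines.append(json.dumps({"label": label}))
    # The bulk API requires every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)


# Hand-written sample in the shape of the Wikidata JSON format.
SAMPLE_ENTITIES = [
    {
        "id": "Q42",
        "labels": {"en": {"language": "en", "value": "Douglas Adams"}},
        "claims": {"P31": [{"mainsnak": {"datavalue": {"value": {"id": "Q5"}}}}]},
    },
    {
        "id": "Q64",
        "labels": {"en": {"language": "en", "value": "Berlin"}},
        "claims": {"P31": [{"mainsnak": {"datavalue": {"value": {"id": "Q515"}}}}]},
    },
]

# Keep only humans (Q5); the result could be POSTed to the _bulk endpoint.
payload = to_bulk_ndjson(SAMPLE_ENTITIES, wanted_types={"Q5"})
```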