Graph database prototype

After discussions across the areas, it has become clear that the current results section no longer fulfills our needs. As it stands, the results section can only properly capture workflows that contain a single system/material, a single method, and several properties. We are now transitioning to a much more complicated scenario with multiple systems, multiple methods, and very complex workflow graphs that can differ dramatically between entries.

To test a possible solution, we want to try out a graph database that could better capture all of the interesting properties in these more complicated workflows. This first step will only be a proof of concept (POC) that attempts to capture only the systems in a workflow. It consists of the following steps:

1. Run a simple (local) performance test of how Neo4j queries scale with different types of data:
   - How the query time scales with the number of entries in the database (e.g. 100 to 100,000 entries)
   - How the query time scales with the size of individual graphs in the database (e.g. 10 to 10,000 nodes and edges)
   - How the query time scales with the query complexity (e.g. 1 to 10 connections queried)
2. Agree on a very simple base class that represents systems; the most minimal definition will do for now. The agreement should be between areas A, B and C, and could also take a brief look at OPTIMADE and already existing ontologies.
3. Add Neo4j to our Docker infrastructure.
4. Add metainfo annotations for Neo4j. Certain quantities and sections can become nodes, and parent/child relationships can become edges.
5. Add Neo4j ingestion based on the annotations.
6. Add an entry query API endpoint for Neo4j.
7. Add a new search menu that builds meaningful queries based on user input.
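
As a starting point for the performance test step, the query-complexity dimension could be measured with something like the sketch below. It only builds Cypher queries with a growing number of chained connections and times them through a caller-supplied function. The `System` label, the `HAS_CHILD` relationship type, and all function names are assumptions made for illustration, not part of any agreed schema.

```python
import statistics
import time
from typing import Callable, Dict


def chain_query(hops: int, label: str = "System", rel: str = "HAS_CHILD") -> str:
    """Build a Cypher query that follows `hops` relationships in a row.

    The label and relationship type are placeholders (assumptions).
    """
    if hops < 1:
        raise ValueError("hops must be at least 1")
    pattern = f"(n0:{label})"
    for i in range(1, hops + 1):
        pattern += f"-[:{rel}]->(n{i}:{label})"
    return f"MATCH {pattern} RETURN count(*) AS matches"


def time_query(run: Callable[[str], object], query: str, repeats: int = 5) -> float:
    """Return the median wall-clock time in seconds over `repeats` runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run(query)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def benchmark(
    run: Callable[[str], object], max_hops: int = 10, repeats: int = 5
) -> Dict[int, float]:
    """Time queries with 1..max_hops chained connections."""
    return {
        hops: time_query(run, chain_query(hops), repeats)
        for hops in range(1, max_hops + 1)
    }
```

Here `run` is whatever executes a query against the test database, for example a small wrapper around a Neo4j driver session that runs the query and consumes the result. The same harness could be repeated against databases of different sizes to cover the other two scaling dimensions.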
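
The annotation and ingestion steps could work roughly as in the following sketch, which walks a nested, archive-like dictionary and turns annotated sections into nodes and parent/child relationships into edges. It does not use the real metainfo API: the plain `dict` input, the `node_labels` set standing in for annotations, the `Graph` container, and the `HAS_CHILD` edge type are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set, Tuple


@dataclass
class Graph:
    """Hypothetical in-memory result, ready to be written to a graph database."""
    nodes: List[Dict[str, Any]] = field(default_factory=list)
    edges: List[Tuple[int, str, int]] = field(default_factory=list)


def to_graph(
    section: Dict[str, Any],
    label: str,
    node_labels: Set[str],
    graph: Optional[Graph] = None,
    parent: Optional[int] = None,
) -> Graph:
    """Recursively convert sections named in `node_labels` into nodes.

    Sections that are not annotated are skipped, and their children are
    attached to the closest annotated ancestor instead.
    """
    graph = graph if graph is not None else Graph()
    node_id = parent
    if label in node_labels:
        node_id = len(graph.nodes)
        # Only scalar quantities are stored as node properties.
        props = {k: v for k, v in section.items() if not isinstance(v, (dict, list))}
        graph.nodes.append({"id": node_id, "label": label, "properties": props})
        if parent is not None:
            graph.edges.append((parent, "HAS_CHILD", node_id))
    for key, value in section.items():
        children = value if isinstance(value, list) else [value]
        for child in children:
            if isinstance(child, dict):
                to_graph(child, key, node_labels, graph, node_id)
    return graph


# Example: only "system" sections are annotated as nodes.
archive = {
    "name": "workflow",
    "system": [
        {"name": "bulk Si", "system": [{"name": "Si atom"}]},
        {"name": "H2O"},
    ],
}
graph = to_graph(archive, "workflow", {"system"})
```

In this example the top-level section is not annotated, so the graph contains three `system` nodes and a single edge from "bulk Si" to "Si atom". The actual ingestion would then translate `graph.nodes` and `graph.edges` into write queries.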

Edited Nov 24, 2022 by Lauri Himanen
