Graph database prototype
After discussions across the areas, it has become clear that the current results section no longer meets our needs. As it stands, the results section can only properly capture workflows that contain a single system/material, a single method, and several properties. We are now moving to a much more complicated scenario with multiple systems, multiple methods, and very complex workflow graphs that can differ dramatically between entries.
To test a possible solution, we want to prototype a graph database that could better capture all of the interesting properties in these more complicated workflows. This first step will only be a proof of concept (POC) that attempts to capture only the systems in a workflow. It consists of the following steps:
- Simple (local) performance test of how Neo4J queries scale with different types of data.
  - How the query time scales with respect to the number of entries in the database (e.g. range 100-100 000 entries)
  - How the query time scales with respect to the size of individual graphs in the database (e.g. range 10-10 000 nodes and edges)
  - How the query time scales with respect to the query complexity (e.g. range 1-10 connections queried)
- Agree on a very simple base class that represents systems; the most minimal definition will do for now. The agreement should be between areas A, B and C, possibly also looking a bit into OPTIMADE and already existing ontologies.
- Adding Neo4J into our Docker infrastructure.
- Metainfo annotations for Neo4J. Certain quantities and sections can become nodes, and parent/child relationships can become edges.
- Adding Neo4J ingestion based on the annotations.
- Adding an entry query API endpoint for Neo4J.
- Adding a new search menu that builds meaningful queries based on user input.
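
To make the query-complexity part of the performance test more concrete, here is a minimal sketch of a timing harness. It is only an illustration: the `System` label, the `HAS_CHILD` relationship type, and all function names are assumptions, not an agreed schema. The `run_query` argument is any callable that executes a Cypher string, so the harness itself does not depend on a running database.

```python
import time
from statistics import median


def chain_query(n_connections, label="System", relation="HAS_CHILD"):
    """Build a Cypher MATCH over a path with n_connections relationships.

    The label and relationship type are placeholders (assumptions).
    """
    if n_connections < 1:
        raise ValueError("n_connections must be at least 1")
    pattern = f"(n0:{label})"
    for i in range(1, n_connections + 1):
        pattern += f"-[:{relation}]->(n{i}:{label})"
    return f"MATCH {pattern} RETURN count(*) AS n_paths"


def time_query(run_query, query, repeats=5):
    """Return the median wall-clock time in seconds over several runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(query)
        timings.append(time.perf_counter() - start)
    return median(timings)


def benchmark(run_query, max_connections=10, repeats=5):
    """Map the number of connections queried to the median query time."""
    return {
        n: time_query(run_query, chain_query(n), repeats)
        for n in range(1, max_connections + 1)
    }
```

In a real test, `run_query` would wrap a call into the database, for example a function that runs the query in a Neo4J driver session and consumes the result. The same `benchmark` call could then be repeated against databases with different entry counts and graph sizes to cover the other two scaling questions.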
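
The idea that sections become nodes and parent/child relationships become edges can be sketched with plain Python data structures. This is a hypothetical illustration only: it uses nested dictionaries in place of real metainfo sections, and the `Graph` class, the `to_graph` function, and the `HAS_CHILD` relationship name are assumptions made for this example rather than an existing API.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Graph:
    nodes: list = field(default_factory=list)  # (node_id, label, properties)
    edges: list = field(default_factory=list)  # (parent_id, relation, child_id)


def to_graph(section, label="Entry"):
    """Flatten a nested dict into nodes and parent/child edges.

    Nested dicts (and non-empty lists of dicts) are treated as subsections
    and become nodes; all other values are stored as properties on the
    enclosing node.
    """
    graph = Graph()
    ids = count()

    def visit(data, node_label):
        node_id = next(ids)
        props = {}
        graph.nodes.append((node_id, node_label, props))
        for key, value in data.items():
            children = value if isinstance(value, list) else [value]
            if children and all(isinstance(c, dict) for c in children):
                for child in children:
                    child_id = visit(child, key)
                    graph.edges.append((node_id, "HAS_CHILD", child_id))
            else:
                props[key] = value
        return node_id

    visit(section, label)
    return graph


# Hypothetical entry with nested systems.
entry = {
    "entry_id": "abc",
    "system": [
        {"name": "bulk Si", "system": [{"name": "Si atom"}]},
        {"name": "H2O"},
    ],
}
graph = to_graph(entry)
```

An ingestion step could then iterate over `graph.nodes` and `graph.edges` and write them to the database. In the actual implementation, the annotations would decide which sections and quantities are turned into nodes, instead of the type-based rule used here.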