Possibility to use custom schemas in apps and search
This MR makes it possible to use scalar quantities from custom schemas (both Python and YAML) in queries, aggregations and app definitions. New documentation for Apps is also included.
Highlights
- The path in the data is no longer enough to target a quantity. E.g. `data.sample.id` may be defined by several different schemas. We need to add an identifier for the schema in the quantity name.
- Schemas are identified using the `qualified_name` of the root section (can be fetched with `section.qualified_name()`):
  - Python schemas: path of the class name, e.g. `nomadschemaexample.schema.MySchema`
  - YAML schemas: `entry_id:gTqaJYQ7IH20dl5PeX7ZPzkHudI8.MySchema`

  The other option would be to use the reference syntax (can be fetched with `definition_reference()`), but the references to YAML files are very hard to use, as they look like this: `../uploads/Yl6DTVCVS1GqYRwEhmexrA/raw/schema.archive.yaml#/definitions/section_definitions/1`.
- To target a quantity in a schema, use the full quantity identifier `<path>#<schema_name>`, e.g. `data.sample.id#nomadschemaexample.schema.MySchema`.
- The GUI will support simplified rendering of schema identifiers to remove clutter from the UI, but in the app config and in our backend the quantities need to be identified by this full name.
- Currently `#` is used as a separator between path and schema name. The choice is complicated by the fact that many separators are reserved for other purposes:
  - `.` denotes section hierarchy in paths, and is also used in schema identifiers.
  - `:` is used in the YAML schema name and also for query modifiers, e.g. `material.elements:all: ['Si', 'C']`
  - `/` is used in `inner_section_definitions` and in YAML schema paths.
  - `&` is reserved for URL query parameters.
  - `@` is reserved for indicating a hash digest for a definition, e.g. to distinguish between different versions of a schema.
  - Any operators commonly used in boolean logic (`+`, `-`, `&`, `|`) should not be used if we want to later add support for them in the search bar.
- For technical reasons, API calls targeting YAML quantities need to include the data type. The current syntax is `<path>#<schema_name>#<dtype>`. The GUI adds this data type fully transparently, but it is up to the user to include it in manual API calls (you will get a meaningful warning if you omit it).
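The identifier format described in the list above can be illustrated with a small sketch. The helper names `build_quantity_id` and `parse_quantity_id`, the `QuantityId` tuple and the `"str"` dtype value are hypothetical and not part of this MR; the sketch only assumes that neither the path nor the schema name contains `#`.

```python
from typing import NamedTuple, Optional

SEPARATOR = "#"  # separator between path, schema name and optional dtype


class QuantityId(NamedTuple):
    """Hypothetical container for the parts of a full quantity identifier."""
    path: str
    schema_name: str
    dtype: Optional[str] = None


def build_quantity_id(path: str, schema_name: str, dtype: Optional[str] = None) -> str:
    """Join the parts into <path>#<schema_name> or <path>#<schema_name>#<dtype>."""
    parts = [path, schema_name]
    if dtype is not None:
        parts.append(dtype)
    return SEPARATOR.join(parts)


def parse_quantity_id(identifier: str) -> QuantityId:
    """Split a full identifier back into its parts."""
    parts = identifier.split(SEPARATOR)
    if len(parts) not in (2, 3):
        raise ValueError(f"Expected 2 or 3 '#'-separated parts, got: {identifier!r}")
    return QuantityId(*parts)


# Python schema: no dtype needed.
python_id = build_quantity_id("data.sample.id", "nomadschemaexample.schema.MySchema")

# YAML schema: manual API calls also need the dtype ("str" is a placeholder value).
yaml_id = build_quantity_id(
    "data.sample.id", "entry_id:gTqaJYQ7IH20dl5PeX7ZPzkHudI8.MySchema", "str"
)
```

Splitting on `#` is only safe here because `.`, `:` and `/` can all appear inside the path or schema name, which is the same reason they were ruled out as separators.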
Example setups for testing
Python schema
- Get the code: `git checkout search`
- Add the test Python schema included in our source code to your `PYTHONPATH`: `export PYTHONPATH="${PYTHONPATH}:/<root folder>/nomad-FAIR/tests/data/plugins"`
- Copy and use this `nomad.yaml` file: nomad.yaml
- Boot up docker, appworker, GUI
- Log in, create a new upload, and upload this file: dataset.zip
- Go to "Explore/My Python Schema" to try out different things.
YAML schema
- Get the code: `git checkout search`
- Boot up docker, appworker, GUI
- Log in, create a new upload, and upload this file containing the schema: schema.archive.yaml. Note down the `upload_id` and `entry_id` for the schema.
- Modify this `nomad.yaml` file: nomad.yaml so that it uses the `entry_id` you got from the previous step.
- Use the `upload_id` in line 8 of this script: generator.py. Run the script and zip the produced dataset folder.
- Restart appworker with the new `nomad.yaml` file. Log in and upload the zipped dataset.
- Go to "Explore/My YAML Schema" to try out different things.
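If you prefer to script the zipping step, a minimal sketch using only the Python standard library could look like this. The function name `zip_dataset` and the folder name `dataset` are assumptions (use whatever folder generator.py actually produces), and the sketch assumes the files should sit at the root of the archive.

```python
import shutil
from pathlib import Path


def zip_dataset(folder) -> Path:
    """Zip the contents of `folder` into `<folder>.zip` next to it."""
    folder = Path(folder)
    if not folder.is_dir():
        raise NotADirectoryError(f"Not a directory: {folder}")
    # make_archive appends ".zip" to the base name and returns the archive path.
    archive = shutil.make_archive(str(folder), "zip", root_dir=folder)
    return Path(archive)


# Hypothetical usage, assuming the script wrote its output to ./dataset:
# zip_dataset("dataset")
```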
Known limitations compared to "native" quantities
- Search boxes cannot show suggestions for values (due to the hierarchy being flattened within ES, our suggestion mechanism cannot filter out the values to show).
- Only scalar quantities are available (it may be possible to work around this, but that has not been verified)
- Nested queries are not possible (due to the hierarchy being flattened in ES)
- By default, a single document can contain 10 000 nested documents. So for very large archives, this limit may be hit at some point when `search_quantities` is populated.
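To get a rough feeling for how close an archive is to the nested document limit, one could count its scalar values. The sketch below is only an estimate under a loud assumption: that each scalar quantity value contributes one nested document to `search_quantities`, which this description does not state explicitly. The function name `count_scalar_values` and the example archive are hypothetical.

```python
NESTED_LIMIT = 10_000  # default limit mentioned above


def count_scalar_values(node) -> int:
    """Count scalar leaves in a dict-based archive.

    Dicts are treated as sections, lists of dicts as repeated subsections.
    Lists of plain values are treated as array quantities and skipped,
    since only scalar quantities are searchable.
    """
    if isinstance(node, dict):
        return sum(count_scalar_values(value) for value in node.values())
    if isinstance(node, list):
        return sum(count_scalar_values(item) for item in node if isinstance(item, dict))
    return 1


# Hypothetical archive content, for illustration only.
archive = {
    "data": {
        "sample": {"id": "abc", "mass": 1.0},
        "steps": [{"temperature": 300}, {"temperature": 350}],
        "spectrum": [0.1, 0.2, 0.3],  # array quantity, not counted
    }
}

n_values = count_scalar_values(archive)
exceeds_limit = n_values > NESTED_LIMIT
```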