Syntactic sugar for yaml schemas
When you write Metainfo schemas in `.archive.yaml` files, you have to follow the Metainfo "language" very precisely. On top of that, we lack semantic checks and good error messages. As a result, these schemas are not easy to write. Additionally, our YAML "flavour" is very different from the one used by the Nexus tools, which does not help either.
Examples
We need to put some "syntactic sugar" around our YAML. Here is a before/after example.
Before:

    section_definitions:
      Values:
        description: Represents a named array for float data
        quantities:
          name:
            type: str
          description:
            type: str
          values:
            type: np.float64
            shape: ['*']
      MySection:
        base_sections: nomad.datamodel.EntryData
        quantities:
          time:
            type: int
            shape: ['*']
            unit: s
        sub_sections:
          values:
            section: Values
            repeats: true

After:

    Values:
      description: Represents a named array for float data
      name: str
      description: str
      values: np.float64[*]
    MySection(nomad.datamodel.EntryData):
      time: int[*] in s
      values: Values*

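To make the shorthand more concrete, here is a minimal Python sketch of how strings such as `np.float64[*]`, `int[*] in s` and `Values*` could be expanded back into the verbose form. Nothing here is existing NOMAD code: the function names (`parse_header`, `expand_member`, `expand_package`), the regular expressions and the choice to collect base sections in a list are all assumptions made for illustration.

```python
import re

# Hypothetical grammar for a quantity: "<type>[<shape>] in <unit>",
# where the shape and the unit are both optional.
QUANTITY_RE = re.compile(
    r"^(?P<type>[\w.]+)"              # e.g. np.float64
    r"(?:\[(?P<shape>[^\]]*)\])?"     # e.g. [*] or [3, 3]
    r"(?:\s+in\s+(?P<unit>\S.*))?$"   # e.g. "in s"
)
# Hypothetical grammar for a section key: "Name" or "Name(Base1, Base2)".
HEADER_RE = re.compile(r"^(?P<name>\w+)(?:\((?P<bases>[^)]*)\))?$")


def parse_header(header):
    """Split 'MySection(a.b.Base)' into ('MySection', ['a.b.Base'])."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"cannot parse section header {header!r}")
    bases = [b.strip() for b in (match["bases"] or "").split(",") if b.strip()]
    return match["name"], bases


def parse_dim(dim):
    """Keep '*' as a string, turn plain numbers into ints."""
    dim = dim.strip()
    return int(dim) if dim.isdigit() else dim


def expand_member(text, section_names):
    """Expand one shorthand value into ('quantities' | 'sub_sections', dict)."""
    text = text.strip()
    repeats = text.endswith("*")
    target = text[:-1] if repeats else text
    if target in section_names:
        return "sub_sections", {"section": target, "repeats": repeats}
    match = QUANTITY_RE.match(text)
    if match is None:
        raise ValueError(f"cannot parse shorthand {text!r}")
    quantity = {"type": match["type"]}
    if match["shape"] is not None:
        quantity["shape"] = [parse_dim(d) for d in match["shape"].split(",")]
    if match["unit"] is not None:
        quantity["unit"] = match["unit"].strip()
    return "quantities", quantity


def expand_package(sugar):
    """Expand a dict in the shorthand form into the verbose form."""
    headers = {header: parse_header(header) for header in sugar}
    section_names = {name for name, _ in headers.values()}
    sections = {}
    for header, body in sugar.items():
        name, bases = headers[header]
        section = {}
        if bases:
            section["base_sections"] = bases  # assumption: always a list
        for key, value in body.items():
            if key == "description":
                # Reserved here for the section itself, so a quantity with
                # this name cannot be expressed in the shorthand.
                section["description"] = value
                continue
            kind, definition = expand_member(value, section_names)
            section.setdefault(kind, {})[key] = definition
        sections[name] = section
    return {"section_definitions": sections}
```

Calling `expand_package` on the parsed shorthand (a plain dict, as any YAML loader would produce) yields nested `quantities` and `sub_sections` dicts. A sub-section is only recognised when its target is one of the section names defined in the same input; everything else is treated as a quantity type.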
However, there are a few problems:
- definition properties like `name` or `description` might collide with properties that the schema wants to define (like `Values.description` is colliding with `Section.description`).
- it might not be clear if we want to define a quantity or a sub_section
A more explicit form without these problems might be this:

    Values:
      m_def: Section
      m_description: Represents a named array for float data
      name:
        m_type: str
      values:
        m_type: np.float64
        m_shape: ['*']
    MySection(nomad.datamodel.EntryData):
      time:
        m_def: Quantity
        m_type: int
        m_shape: ['*']
        m_unit: s
      values:
        m_def: SubSection
        m_section: Values
        m_repeats: true

But we lose some convenience again. How much can be implied and how much has to be explicit?
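One possible answer, sketched in Python: let `m_def` be optional and infer it only when the other `m_` keys leave no doubt. The rule used here (`m_type` implies a quantity, `m_section` implies a sub-section, anything else must be stated explicitly) is an assumption for discussion, and `infer_m_def` and `normalize_section` are hypothetical helper names, not part of NOMAD.

```python
# Hypothetical mapping from definition kind to the key used in the verbose form.
GROUPS = {"Quantity": "quantities", "SubSection": "sub_sections"}


def infer_m_def(path, definition):
    """Return the definition kind, inferring it when m_def is omitted."""
    explicit = definition.get("m_def")
    if explicit is not None:
        return explicit
    has_type = "m_type" in definition
    has_section = "m_section" in definition
    if has_type and not has_section:
        return "Quantity"
    if has_section and not has_type:
        return "SubSection"
    raise ValueError(f"{path}: cannot infer m_def, please state it explicitly")


def normalize_section(name, body):
    """Turn one section in the m_-prefixed form into the verbose dict form."""
    section = {}
    for key, value in body.items():
        if key == "m_def":
            continue  # the section's own kind carries no further data here
        if key.startswith("m_"):
            section[key[2:]] = value  # e.g. m_description -> description
            continue
        path = f"{name}.{key}"
        kind = infer_m_def(path, value)
        if kind not in GROUPS:
            raise ValueError(f"{path}: unsupported m_def {kind!r}")
        properties = {
            k[2:]: v
            for k, v in value.items()
            if k.startswith("m_") and k != "m_def"
        }
        section.setdefault(GROUPS[kind], {})[key] = properties
    return section
```

With this rule, `name: {m_type: str}` needs no `m_def`, while a member that has neither `m_type` nor `m_section` is rejected with an error that names its path. Because every definition property carries the `m_` prefix, a quantity called `description` no longer collides with the section's own description.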
Implementation
- can we keep a line/col mapping for the objects parsed from YAML, so that errors can include the position?
- errors should include paths, e.g. "MySection.values.m_section: The referenced section Values does not exist."
- the output is dict data that can be put into `Package.m_from_dict`. The validation of the resulting `Package` might produce semantic errors, which ideally could also be reported back to a path.
- it's all about giving options: lots of aliases, users decide if they want to have it explicit or not
- documentation is important
- ideally this can be reused for nexus. Their schema files currently look like this: https://github.com/FAIRmat-Experimental/nexus_definitions/tree/3c4cbcbb90640336206b99b75e03735f2353b9c6/applications/nyaml
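
As a rough illustration of the path-prefixed errors mentioned in the list above, the following Python sketch checks sub-section references in the `m_`-prefixed form and reports each problem with its path. The function name `check_references` is hypothetical, and the input is assumed to be the plain dict a YAML loader returns. Line and column information is left out, since it depends on the YAML parser in use.

```python
def section_name(header):
    """Strip an optional '(Base)' suffix from a section key."""
    return header.split("(", 1)[0].strip()


def check_references(package):
    """Return one error string per sub-section that references an unknown section."""
    known = {section_name(header) for header in package}
    errors = []
    for header, body in package.items():
        section = section_name(header)
        for key, value in body.items():
            if not isinstance(value, dict):
                continue  # section-level properties such as m_description
            target = value.get("m_section")
            if target is not None and target not in known:
                errors.append(
                    f"{section}.{key}.m_section: "
                    f"The referenced section {target} does not exist."
                )
    return errors
```

Collecting all errors in a list, instead of raising on the first one, lets a user fix several mistakes in a single pass. Semantic errors produced later by validating the resulting package could be appended to the same list if they can be mapped back to a path.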