A convenient metainfo schema file format
The metainfo already has "json-compatible" serialization and deserialization functions (MSection.m_from_dict(), MSection.m_to_dict()). These can also be used for schemas, which technically are just "normal" metainfo data. In principle, it is easy to use yaml alongside json to deserialize (i.e. parse) a yaml-based schema. However, ...
- the structure is based on dicts and arrays. Packages have section_definitions, Sections have quantities, etc. But most of these array elements have a unique name that could be used as the key in a more convenient dict.
- singular aliases for plural terms would be nice (e.g. base_section: foo in addition to base_sections: ['foo']).
- some of the schema terms could use friendlier aliases, e.g. use sections instead of section_definitions.
- some types are serialized in a rather complicated way. Instead of needing to use {"type_kind": "python", "type_data": "str"}, {"type_kind": "numpy", "type_data": "float64"}, {"type_kind": "reference", "type_data": "#/section_definitions/0"}, we could guess the right "kind" from simple string values like str, np.float64, or #/section_definitions/0.
- references are based on the dict array structure, which is not ideal for the same reason as before. Something like #/section_definitions/0, referring to the first section definition in the array of section_definitions, could be replaced by something based on the section name, e.g. #/Process in the case that the first section of the package is named Process.
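The type and reference shorthands from the list above could be resolved roughly as follows. This is only a sketch on plain dicts, not the metainfo implementation: the helper names normalize_type and resolve_reference, the PYTHON_TYPES set, and the section_names parameter are assumptions made for illustration.

```python
# Assumption: the set of plain Python type names the shorthand should accept.
PYTHON_TYPES = {"str", "int", "float", "bool"}


def resolve_reference(ref, section_names):
    """Map a name-based reference like '#/Process' to the index-based form."""
    name = ref[len("#/"):]
    names = list(section_names)
    if name in names:
        return f"#/section_definitions/{names.index(name)}"
    # Anything else (e.g. an already index-based reference) is left untouched.
    return ref


def normalize_type(value, section_names=()):
    """Expand a shorthand type string into the verbose type_kind/type_data dict."""
    if isinstance(value, dict):
        return value  # already in the regular form
    if not isinstance(value, str):
        raise TypeError(f"type must be a string or dict, got {type(value).__name__}")
    if value in PYTHON_TYPES:
        return {"type_kind": "python", "type_data": value}
    if value.startswith("np."):
        return {"type_kind": "numpy", "type_data": value[len("np."):]}
    if value.startswith("#/"):
        return {
            "type_kind": "reference",
            "type_data": resolve_reference(value, section_names),
        }
    raise ValueError(f"cannot guess the type kind of {value!r}")
```

For example, normalize_type('np.float64') yields the numpy form, and normalize_type('#/Process', ['Process']) yields a reference to #/section_definitions/0.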
We are only considering the deserialize direction.
Example:
m_def: 'nomad.metainfo.metainfo.Package'
sections:
  Sample:
    base_section: 'nomad.datamodel.metainfo.Sample'
    quantities:
      sample_id:
        type: str
        description: |
          This is a description with *markup* using [markdown](https://markdown.org).
          It can have multiple lines, because yaml allows to easily do this.
        m_annotations:
          eln:
            component: StringEditComponent
  Process:
    quantities:
      samples:
        type: '#/Sample'
        shape: ['*']
    sub_sections:
      samples:
        section_def: '#/Sample'
        repeats: true
  SpecialProcess:
    base_section: '#/Process'
    quantities:
      values:
        type: np.float64
        shape: [3, 3]
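To make the relation between this convenience form and the regular dict/array form more concrete, here is a minimal sketch that expands a trimmed version of the example. It works on the plain dict a yaml parser would produce and does not call any metainfo API; the function names named_dict_to_list, expand_section, and expand_package are hypothetical.

```python
def named_dict_to_list(mapping):
    """Turn {'Sample': {...}} into [{'name': 'Sample', ...}], keeping the order."""
    return [{"name": name, **(body or {})} for name, body in mapping.items()]


def expand_section(section):
    """Resolve the singular alias and the name-keyed dicts of one section."""
    result = dict(section)
    if "base_section" in result:
        result["base_sections"] = [result.pop("base_section")]
    for key in ("quantities", "sub_sections"):
        if isinstance(result.get(key), dict):
            result[key] = named_dict_to_list(result[key])
    return result


def expand_package(package):
    """Rename 'sections' to 'section_definitions' and expand each section."""
    result = dict(package)
    if "sections" in result:
        result["section_definitions"] = result.pop("sections")
    sections = result.get("section_definitions")
    if isinstance(sections, dict):
        sections = named_dict_to_list(sections)
    result["section_definitions"] = [expand_section(s) for s in sections or []]
    return result


# Trimmed version of the example, as a parsed dict.
package = {
    "sections": {
        "Process": {"quantities": {"samples": {"type": "#/Sample", "shape": ["*"]}}},
        "SpecialProcess": {"base_section": "#/Process"},
    }
}
expanded = expand_package(package)
```

After this step, expanded uses section_definitions as an array of named sections, and SpecialProcess carries base_sections: ['#/Process']. Types and references are deliberately left untouched here.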
How to approach this?
- replicate the given example in Python
- serialize this Python example to yaml and json with m_to_dict()
- compare with the given yaml above to understand the points given in this issue
- MSection.m_from_dict is used to deserialize. Implement Package-, Section-, Quantity-, and SubSection-specific overrides for m_from_dict that resolve the convenience notation into the regular form before calling the super implementation
- Add good error handling. This format will be used by end users.
- Add extensive tests (also for the error handling)
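As an illustration of the error-handling and testing points, the following sketch validates one section in the convenience notation and reports where in the file the problem sits. The exception class SchemaFormatError, the function check_section, and the specific checks are assumptions for illustration, not part of the metainfo.

```python
class SchemaFormatError(ValueError):
    """Hypothetical error type that carries the location of the problem."""

    def __init__(self, path, message):
        self.path = path
        super().__init__(f"{'/'.join(path)}: {message}")


def check_section(name, section):
    """Check one section given in the convenience notation (plain dicts)."""
    path = ["sections", name]
    if not isinstance(section, dict):
        raise SchemaFormatError(path, "a section must be a mapping")
    if "base_section" in section and "base_sections" in section:
        raise SchemaFormatError(path, "use either base_section or base_sections, not both")
    quantities = section.get("quantities", {})
    if not isinstance(quantities, dict):
        raise SchemaFormatError(path + ["quantities"], "expected a mapping of name to quantity")
    for q_name, quantity in quantities.items():
        if not isinstance(quantity, dict) or "type" not in quantity:
            raise SchemaFormatError(path + ["quantities", q_name], "a quantity needs a type")


# Exercising the error path, as a test would: the quantity below lacks a type.
try:
    check_section("Sample", {"quantities": {"sample_id": {"description": "no type"}}})
except SchemaFormatError as error:
    message = str(error)
else:
    message = None
```

Carrying the path in the exception lets the message point an end user at the offending key rather than at an internal stack frame.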
Edited by Mohammad Nakhaee