I was having a look at the generated nexus.py. My observation is that you use a lot of additional definitions to provide properties about the definitions themselves: basically all the nxp_documentation, nxp_type, nxp_optional, nxp_unit, etc. These are properties of the definitions, not properties of the data. This is not really a schema. This is a schema that folds itself into the instances.
This is very fundamental and we should address this before we can continue with work on the GUI and other consolidation/refinement. @sanbrock: We should discuss this "in person" soon.
I rewrote and commented NXslit as an example. Please compare this to the one in git.

    class Base(Category):
        pass


    class NXslit(NXobject):
        # This is where the documentation goes. Documentation is a property of the class
        # NXslit, not a property of its instances (like nxp_documentation) would be.
        '''
        A simple slit.

        For more complex geometries, :ref:`NXaperture` should be used.
        '''

        m_def = Section(
            validate=False,
            links=['https://manual.nexusformat.org/classes/base_classes/NXslit.html#nxslit'],
            # There is a category concept in the metainfo that could be used
            categories=[Base],
            # This is an alternative place for the documentation. It is equivalent to putting
            # it in a Python class doc string.
            description='...')

        # I have no idea what a category is, it appears neither on NXslit nor on NXobject in
        # the nexus docs. But we have a category concept in the metainfo that could be used.
        # It works like putting tags or labels on definitions.
        # category = Quantity(type=str, default='base')

        # I am not sure about the nxp_ prefix. I would only use it where necessary.
        depends_on = Quantity(
            # This is where the type goes. If it is a primitive like NX_CHAR, NX_NUMBER, ...
            # there are python/metainfo equivalents. If it is complex, we need to define a
            # section class and then you can use a sub-section to add a property of this type.
            type=str,
            links=['https://manual.nexusformat.org/classes/base_classes/NXslit.html#nxslit-depends-on-field'],
            # Not yet supported by metainfo, but this is where it needs to go. It's a property
            # of the quantity and not its values.
            optional=True,
            # This is where a property documentation goes. It is a property of the quantity
            # not a property of its values (like nxp_documentation).
            description='''
            Points to the path of the last element in the geometry chain that places
            this object in space.
            When followed through that chain is supposed to end at an element depending
            on "." i.e. the origin of the coordinate system.

            If desired the location of the slit can also be described relative to
            an NXbeam, which will allow a simple description of a non-centred slit.
            ''')

        x_gap = Quantity(
            # Float could be a metainfo/python mapping for NX_NUMBER.
            type=np.float64,
            optional=True,
            description='Size of the gap opening in the first dimension of the local coordinate system.',
            links=['https://manual.nexusformat.org/classes/base_classes/NXslit.html#nxslit-x-gap-field'],
            # Parsers are supposed to convert all values into SI units. Therefore, the metainfo
            # equivalent of length would be meter.
            unit='m')

        y_gap = Quantity(
            type=np.float64,
            optional=True,
            description='Size of the gap opening in the second dimension of the local coordinate system.',
            links=['https://manual.nexusformat.org/classes/base_classes/NXslit.html#nxslit-y-gap-field'],
            unit='m')

        # I guess here the prefix might be necessary
        nxp_default = Quantity(
            type=str,
            optional=True,
            description='''
            .. index:: plotting

            Declares which child group contains a path leading
            to a :ref:`NXdata` group.

            It is recommended (as of NIAC2014) to use this attribute
            to help define the path to the default dataset to be plotted.
            See https://www.nexusformat.org/2014_How_to_find_default_data.html
            for a summary of the discussion.
            ''',
            links=['https://manual.nexusformat.org/classes/base_classes/NXslit.html#nxslit-default-attribute'])

The additional nxp_documentation, type, unit, links, ... make up 90% of the generated definitions. Besides being a performance factor, I am not sure that this is really what you intend to do, or at least what you should do. If you are only doing this to overcome metainfo limitations, this would also be the wrong approach. We should put the most common things into the metainfo (e.g. optional and stuff like this, make inheritance work better, etc.), find workarounds for special nexus things (e.g. TEMPLATE_properties), or simply omit very hard to translate and exceptional information.
Talking about limitations: now let's imagine a few exceptions and more complicated cases:
First a TEMPLATE_PROPERTY

    # We have a general class for template properties
    class TemplateProperty(MSection):
        name = Quantity(type=str)


    class NXsomething(NXobject):
        ''' NXsomething documentation '''

        class ExampleTemplateProperty(TemplateProperty):
            # This kind of overwriting does not work yet, but we should make it work
            name = Quantity(default='EXAMPLE')
            # Here we add the actual value definition for EXAMPLE
            value = Quantity(type=float64, shape=[3], unit='m')

        EXAMPLE = SubSection(
            sub_section=ExampleTemplateProperty,
            # Properties other than the value definition (type, unit, ..) should go here
            description='Documentation for Example',
            optional=False,
            links=['...'])

Second, we want to change a definition while sub-classing

    class NXsomething(NXobject):
        example = Quantity(type=str, optional=True, description='basedocumentation')
        inheritedAsIs = Quantity(type=str, optional=True, description='basedocumentation')


    class NXsomeContainer(NXobject):

        class MyNXsomething(NXsomething):
            # This does not yet fully work. But we should make it so that the new Quantity
            # properties (e.g. optional, description) overwrite the old ones and the rest
            # is inherited (e.g. type). Thus, you only have to change what needs to change.
            example = Quantity(optional=False, description='Additional docs')
            # If you don't need to change inheritedAsIs it will be simply inherited

        something = SubSection(sub_section=MyNXsomething)

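The intended override semantics (properties set on the derived quantity win, everything else is inherited) can be sketched in plain Python. This is only an illustration and not metainfo code: `QuantityDef` and `override` are hypothetical names, and "unset" is modelled as `None`.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass(frozen=True)
class QuantityDef:
    """Hypothetical stand-in for a quantity definition (not the metainfo class)."""
    type: Optional[Any] = None
    optional: Optional[bool] = None
    description: Optional[str] = None


def override(base: QuantityDef, derived: QuantityDef) -> QuantityDef:
    """Merge two definitions: properties set on `derived` win, unset ones fall back to `base`."""
    merged = {}
    for field in fields(QuantityDef):
        derived_value = getattr(derived, field.name)
        # None means "not given in the sub-class", so the base value is inherited.
        merged[field.name] = (
            derived_value if derived_value is not None else getattr(base, field.name))
    return QuantityDef(**merged)


base_example = QuantityDef(type=str, optional=True, description='base documentation')
derived_example = override(
    base_example, QuantityDef(optional=False, description='Additional docs'))
```

Here `derived_example` keeps `type=str` from the base definition while `optional` and `description` come from the sub-class, which is the behaviour asked for above.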
Third, we desperately need to move a definition property to instance level for some reason (should be avoided though!)

    class NXsomething(NXobject):
        ''' NXsomething documentation '''

        class QuantityWithUnit(MSection):
            # Here we add the actual value definition for example_quantity
            value = Quantity(type=float64, shape=[3])
            # And the unit that we for some reason need to change when instantiating NXsomething.
            unit = Quantity(type=Unit, default='m')

        example_quantity = SubSection(
            sub_section=QuantityWithUnit,
            # Most of the definition properties should go here
            description='Documentation for Example',
            optional=False,
            links=['...'])

Fourth, for some reason we want to allow instances to overwrite definition level properties like documentation

    # This is most likely something we want to enable for everything. Therefore, put it into NXobject.
    class NXobject(MSection):
        # This is kinda special and would apply to all elements. Maybe prefixing it,
        # e.g. nx_, might make it easier to distinguish from normal properties.
        nx_documentation = Quantity(
            type=str,
            optional=True,
            description='''
            All elements in nexus have a documentation in the schema. This quantity is only
            used to provide additional instance specific documentation.
            ''')


    class NXsomething(NXobject):
        '''
        This is the definition documentation. This would appear in the metainfo and
        archive browser.
        '''


    something = NXsomething(nx_documentation='''
        This is additional instance documentation set by the parser and will only
        appear in the archive browser.
        ''')

In hindsight, I feel, we should have worked more on handwritten conceptual examples first. Those would also have helped to identify and remove limitations in the metainfo.
@sanbrock I put your latest nomad-FAIR commit on the north-nexus branch. I will also change the open merge request about nexus to this branch and remove the old nexus related branch.
You should consider the north branch a protected branch that needs MRs; the north-nexus branch would be yours, and north-* branches might belong to others. Pushing to branches other than your own makes rebasing and stuff a little more dangerous.
Yes, the implemented structure is not how it should be, but rather a basis for discussing how things shall evolve. I agree that we should first clean the structure and do the optimisation step only afterwards.
Currently, Nexus Groups/Fields/Attributes are all mapped to SubSections (and their properties, like being optional, documentation, enumeration, etc., to Quantities), whereas normally one would expect Fields to already be Quantities. The problem with this is that a Field can have ANY kind of Attributes defined. Hence, a few default Attributes (e.g. default, i.e. what to visualise, and units, i.e. the actual unit of a data element such as 'm', not to be confused with the units property describing the quantity type such as NX_LENGTH) could become properties of a Field even if it is represented as a Quantity, but the possibility of additional Attributes still suggests coding Fields as SubSections.
On the other hand an Attribute should already be represented as a Quantity. For this, we shall check if and how all Attribute properties can be moved into Quantity. They are:
name,
type (e.g. NX_DATE_TIME),
optional,
deprecated,
doc,
dimensions+dim+index+value,
enumeration+item+value+doc.
Also Nexus primitive classes can be converted to built-in NOMAD metainfo classes:
ISO8601
ISO8601 date/time stamp. It is recommended to add an explicit time zone, otherwise the local time zone is assumed per ISO8601. The norm is that if there is no time zone, it is assumed local time, however, when a file moves from one country to another it is undefined. If the local time zone is written, the ambiguity is gone.
NX_BINARY
any representation of binary data - if text, line terminator is [CR][LF]
NX_CHAR
The preferred string representation is UTF-8. Both fixed-length strings and variable-length strings are valid. String arrays cannot be used where only a string is expected (title, start_time, end_time, NX_class attribute,…). Fields or attributes requiring the use of string arrays will be clearly marked as such (like the NXdata attribute auxiliary_signals). This is the default field type.
NX_DATE_TIME
Alias for the ISO8601 date/time stamp. It is recommended to add an explicit time zone, otherwise the local time zone is assumed per ISO8601.
NX_FLOAT
any representation of a floating point number
NX_INT
any representation of an integer number
NX_NUMBER
any valid NeXus number representation
NX_POSINT
any representation of a positive integer number (greater than zero)
NX_UINT
any representation of an unsigned integer number (includes zero)
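As a small illustration of the time zone remark for ISO8601/NX_DATE_TIME, the following sketch uses only the Python standard library. The helper name `parse_nx_date_time` is hypothetical, and normalising to UTC is an assumption about what a parser might want to do, not something NeXus or NOMAD prescribes.

```python
from datetime import datetime, timezone
from typing import Tuple


def parse_nx_date_time(value: str) -> Tuple[datetime, bool]:
    """Parse an ISO8601 string and report whether it carried an explicit time zone."""
    parsed = datetime.fromisoformat(value)
    has_time_zone = parsed.tzinfo is not None
    if has_time_zone:
        # With an explicit offset the instant is unambiguous, so it can be
        # normalised to UTC regardless of where the file ends up.
        parsed = parsed.astimezone(timezone.utc)
    return parsed, has_time_zone


aware, aware_has_tz = parse_nx_date_time('2022-06-02T10:30:00+02:00')
naive, naive_has_tz = parse_nx_date_time('2022-06-02T10:30:00')
```

The second value stays a naive `datetime`: without a time zone, its meaning depends on where the file was written, which is exactly the ambiguity described above.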
On branch 672-metainfo-nexus-improvements, I added the ability to add arbitrary 'more' properties to definitions. In the table I listed them in italics. You can basically add everything you want. It will be treated as a 'more' property if it has no existing meaning in the metainfo. For example:
The code generator needs to be adapted to generate the right parameters from the 'more' properties.
Definition property mapping

| nx | nomad |
|---|---|
| name | name; when writing metainfo with Python directly, you don't need to set this manually and the Python identifier will be used |
| type | type, but you have to map the type |
| optional | *optional*, will become part of more |
| deprecated | deprecated, but it's a string not a bool! |
| doc | description |
| dimensions+dim+index+value | shape; shapes are lists of Union[str,int] representing the size of dimensions. The values are not interpreted, but conventions are numbers for absolute sizes, str like '1..4', '2..*', or str that refer to quantities. |
| enumeration+item+value+doc | *values*, will become part of more. You can also generate Python enums and use those as type. |

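The shape conventions from the table can be made concrete with a small parser. This is only a sketch of how a code generator or validator might read such entries; `interpret_dimension` and its return format are hypothetical, since the metainfo itself does not interpret these values.

```python
import re
from typing import Optional, Tuple, Union

# Matches range conventions such as '1..4' or '2..*'.
_RANGE = re.compile(r'^(\d+)\.\.(\d+|\*)$')


def interpret_dimension(dim: Union[int, str]) -> Tuple[str, Union[int, str], Optional[int]]:
    """Classify one shape entry as a fixed size, a size range, or a reference to a quantity."""
    if isinstance(dim, int):
        # A plain number is an absolute size.
        return ('fixed', dim, dim)
    match = _RANGE.match(dim)
    if match:
        lower = int(match.group(1))
        # '*' stands for "no upper bound", modelled here as None.
        upper = None if match.group(2) == '*' else int(match.group(2))
        return ('range', lower, upper)
    # Any other string is taken to be the name of a quantity holding the size.
    return ('reference', dim, None)


interpreted = [interpret_dimension(dim) for dim in [3, '1..4', '2..*', 'n_atoms']]
```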
Primitive type mapping

| nx | nomad/python |
|---|---|
| NX_BINARY | metainfo.Bytes; this should not be used for any "BIG" data, because the metainfo will base64 encode this for JSON compatibility. |
| NX_BOOLEAN | bool |
| NX_CHAR | str |
| NX_DATE_TIME | metainfo.Datetime |
| NX_FLOAT | float or a more precise numpy type |
| NX_INT | int |
| NX_NUMBER | Union[int,float] or float or a more precise numpy type |
| NX_POSINT | int or a more precise numpy type |
| NX_UINT | int or a more precise numpy type |

You can use a more property like nx_type to additionally save the original NX_type.
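A code generator could hold the primitive type mapping in a simple lookup and keep the original NeXus type alongside it. The sketch below uses plain Python types as stand-ins (`bytes` for metainfo.Bytes, `datetime` for metainfo.Datetime) so that it runs without NOMAD; `NX_TO_PYTHON` and `map_nx_type` are hypothetical names.

```python
from datetime import datetime
from typing import Any, Dict

# Plain Python stand-ins for the mapping above. In real generated code,
# NX_BINARY and NX_DATE_TIME would use metainfo.Bytes and metainfo.Datetime.
NX_TO_PYTHON: Dict[str, type] = {
    'NX_BINARY': bytes,
    'NX_BOOLEAN': bool,
    'NX_CHAR': str,
    'NX_DATE_TIME': datetime,
    'NX_FLOAT': float,
    'NX_INT': int,
    'NX_NUMBER': float,
    'NX_POSINT': int,
    'NX_UINT': int,
}


def map_nx_type(nx_type: str) -> Dict[str, Any]:
    """Return keyword arguments for a quantity: the mapped type plus the original NX type."""
    # A KeyError for unknown names is intentional: unmapped types should be noticed.
    return {'type': NX_TO_PYTHON[nx_type], 'nx_type': nx_type}


posint_kwargs = map_nx_type('NX_POSINT')
```

Note that NX_POSINT and NX_UINT both collapse to `int`, so the extra `nx_type` entry is the only place where the "greater than zero" or "unsigned" constraint survives.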
Unit mapping
We would use the SI unit equivalent as a unit. Maybe Pint has a feature for getting the SI unit for a unit category. You can also use a more property like nx_unit to additionally save the original NX_unit.
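Until it is clear whether Pint can do this lookup, a hand-written table would be enough for the generator. The sketch below covers only a handful of NeXus unit categories; the dictionary, the function name `map_nx_unit`, and the choice to omit `unit` for unmapped categories are assumptions made for illustration.

```python
from typing import Dict

# Partial, hand-written mapping from NeXus unit categories to SI unit symbols.
NX_UNIT_TO_SI: Dict[str, str] = {
    'NX_LENGTH': 'm',
    'NX_TIME': 's',
    'NX_MASS': 'kg',
    'NX_TEMPERATURE': 'K',
    'NX_CURRENT': 'A',
    'NX_ENERGY': 'J',
    'NX_VOLTAGE': 'V',
    'NX_FREQUENCY': 'Hz',
    'NX_ANGLE': 'rad',
}


def map_nx_unit(nx_unit: str) -> Dict[str, str]:
    """Return keyword arguments for a quantity: the SI unit (if known) plus the original NX unit."""
    kwargs = {'nx_unit': nx_unit}
    si_unit = NX_UNIT_TO_SI.get(nx_unit)
    if si_unit is not None:
        kwargs['unit'] = si_unit
    # Categories without an entry (e.g. NX_ANY) get no 'unit' at all.
    return kwargs


length_kwargs = map_nx_unit('NX_LENGTH')
any_kwargs = map_nx_unit('NX_ANY')
```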

    class MySection(MSection):
        a_TEMPLATE = Quantity(type=str, template=True)
        a_OTHER_TEMPLATE = Quantity(type=str, template=True)


    # This could work
    myObject = MySection()
    myObject.a_TEMPLATE = ('a_name', 'my value')
    assert myObject.a_name == 'my value'
    assert myObject.a_TEMPLATE == ('a_name', 'my value')

    # Don't know if we will have a check for uniqueness
    myObject.a_TEMPLATE = ('a_name', 'value 1')
    myObject.a_OTHER_TEMPLATE = ('a_name', 'value 1')
    assert myObject.a_name == 'unclear what this should be now'

    # This will never work (and why I don't like this nexus feature)
    myObject.a_name = 'my_value'
    # Unclear if this is a_TEMPLATE or a_OTHER_TEMPLATE. In the GUI for example, we would not
    # be able to show the right definition for this data


    import json

    from nomad.metainfo import MSection
    from nomad.metainfo.metainfo import Package, Quantity, SectionProxy, SubSection

    m_package = Package(name='advanced_metainfo_example')


    class BaseSection(MSection):
        notes = Quantity(
            type=str,
            description='Additional notes about this section in Markdown.',
            links=['https://markdown.org'],
            # 'format' does not exist in metainfo schemas, but will be added as 'more'
            format='Markdown')

        # Section can have inner section definitions. Those will be mapped to 'inner_section_defs'
        # in the schema.
        class User(MSection):
            # The class doc string will be used as Section schema description
            ''' A section for basic user information. '''
            first_name = Quantity(type=str)
            last_name = Quantity(type=str)
            # 'optional' does not exist in metainfo schemas, but will be added as 'more'
            email = Quantity(type=str, format='email', optional=True)

        authors = SubSection(
            # You can use inner sections as usual.
            sub_section=User,
            description='The user that authored this section.',
            # Repeats controls the multiplicity in instances. Here authors is defined as
            # a list of User objects.
            repeats=True)


    # Section classes can be sub-classed, the base class section will be added as 'base_sections'
    # in the schema.
    class ApplicationSection(BaseSection):

        # Inner section definitions can be sub-classed as well.
        class User(BaseSection.User):
            user_id = Quantity(type=int)
            # 'email' is already defined in the base section. All schema properties of email
            # are inherited, so you do not have to repeat the type or other properties.
            email = Quantity(
                # 'deprecated' is an actual metainfo property.
                # Can be added to all definitions. It is a string that describes how to
                # deal with the deprecation properly.
                deprecated='Use user_id as a replacement')

        # ApplicationSection.User only defined a special version of BaseSection.User. The
        # inherited sub section 'authors' would still use BaseSection.User, we have to
        # overwrite the section schema that the sub section is using with the new definition.
        authors = SubSection(sub_section=User)

        data = SubSection(sub_section=SectionProxy('ApplicationData'), repeats=True)


    class ApplicationData(MSection):
        name = Quantity(type=str)
        value = Quantity(type=str)


    if __name__ == '__main__':
        print(
            'Schema --------------------\n',
            json.dumps(m_package.m_to_dict(with_meta=True), indent=2))

        archive = ApplicationSection(
            notes='Some example data about artificial movie life',
            authors=[ApplicationSection.User(
                user_id=1, first_name='Sandor', last_name='Brockhauser')])
        archive.data.append(ApplicationData(name='robot', value='THX-1138'))
        archive.data.append(ApplicationData(name='droid', value='C-3PO'))

        print(
            'Data ----------------------\n',
            json.dumps(archive.m_to_dict(with_meta=True), indent=2))

I just realised that type=bytes is not yet supported by the metainfo.
Also, there already is a deprecated property on all definitions. This is of type str and the idea is that this describes how to deal with the deprecation.
I added a type=metainfo.Bytes. This allows using Python bytes as values. This should not be used for any substantial binary data though. The metainfo has to be JSON compatible; we want to show the archive data in the GUI eventually. JSON does not support binary and the common approach is base64 encoding. Of course, the base64 text is about 4/3 the size of the actual binary data, i.e. roughly a third larger.
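The size cost of base64 can be checked with the standard library alone: every 3 bytes of input become 4 ASCII characters. The snippet is independent of the metainfo, and the payload is made-up example data.

```python
import base64
import json

# 768 bytes of example binary data (a multiple of 3, so no padding is needed).
payload = bytes(range(256)) * 3

# JSON has no binary type, so the bytes travel as base64 text.
encoded = base64.b64encode(payload).decode('ascii')
document = json.dumps({'data': encoded})

# Reading it back restores the original bytes.
decoded = base64.b64decode(json.loads(document)['data'])

size_ratio = len(encoded) / len(payload)
```

For this payload `size_ratio` is 4/3, which is tolerable for small values but adds up quickly for detector images or similar large arrays.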