# Copyright 2018 Markus Scheidgen
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
The NOMAD meta-info allows you to define physics data quantities. These definitions are
necessary for all computer representations of the respective data (e.g. in Python,
search engines, databases, and files).

This module provides various Python interfaces to

- define meta-info data
- create and manipulate data that follows these definitions
- (de-)serialize meta-info data in JSON (i.e. represent data in JSON formatted files)

Here is a simple example that demonstrates the definition of ``System``-related quantities:

.. code-block:: python

    class System(MSection):
        \"\"\"
        A system section includes all quantities that describe a single simulated
        system (a.k.a. geometry).
        \"\"\"

        n_atoms = Quantity(
            type=int, description='''
            Defines the number of atoms in the system.
            ''')

        atom_labels = Quantity(type=Enum(ase.data.chemical_symbols), shape=['n_atoms'])
        atom_positions = Quantity(type=float, shape=['n_atoms', 3], unit=Units.m)
        simulation_cell = Quantity(type=float, shape=[3, 3], unit=Units.m)
        pbc = Quantity(type=bool, shape=[3])

    class Run(MSection):
        systems = SubSection(sub_section=System, repeats=True)

Here, we define a `section` called ``System``. The section mechanism allows us to organize
related data into, well, sections. Sections form containment hierarchies, where
containment is a parent-child (whole-part) relationship. In this example, many ``Systems``
are part of one ``Run``. Each ``System`` can contain values for the defined quantities:
``n_atoms``, ``atom_labels``, ``atom_positions``, ``simulation_cell``, and ``pbc``.
Quantities state a type, a shape, and a physical unit, which together constrain the
possible quantity values.

Here is an example where we use the above definitions to create, read, and manipulate
data:

.. code-block:: python

    run = Run()
    system = run.m_create(System)
    system.n_atoms = 3
    system.atom_labels = ['H', 'H', 'O']

    print(system.atom_labels)
    print(run.m_to_json(indent=2))

The last statement will produce the following JSON:

.. code-block:: json

    {
        "m_def": "Run",
        "systems": [
            {
                "m_def": "System",
                "m_parent_index": 0,
                "n_atoms": 3,
                "atom_labels": [
                    "H",
                    "H",
                    "O"
                ]
            }
        ]
    }

This is the JSON representation, a serialized version of the Python representation in
the example above.
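
The nesting rules behind this JSON can be reproduced in a few lines of plain Python. The
following is only a minimal sketch, not this module's implementation: ``Node`` and
``to_dict`` are hypothetical stand-ins for a section instance and ``m_to_dict``, assuming
that sub-section lists are keyed by the sub-section name and that ``m_parent_index`` is
only written for sections that sit in a repeatable sub-section.

```python
import json
from typing import Any, Dict, List


class Node:
    # Hypothetical stand-in for a section instance: the name of its definition,
    # its index within a repeatable parent list (-1 if not repeated), its
    # quantity values, and its repeatable sub-sections keyed by sub-section name.
    def __init__(self, def_name: str, **quantities: Any):
        self.def_name = def_name
        self.parent_index = -1
        self.quantities = quantities
        self.sub_sections: Dict[str, List['Node']] = {}

    def add(self, name: str, child: 'Node') -> 'Node':
        # Append to the named sub-section list and record the child's index.
        children = self.sub_sections.setdefault(name, [])
        child.parent_index = len(children)
        children.append(child)
        return child


def to_dict(node: Node) -> Dict[str, Any]:
    # Key order: m_def, m_parent_index (only for repeated sub-sections),
    # sub-sections, then quantity values.
    result: Dict[str, Any] = {'m_def': node.def_name}
    if node.parent_index != -1:
        result['m_parent_index'] = node.parent_index
    for name, children in node.sub_sections.items():
        result[name] = [to_dict(child) for child in children]
    result.update(node.quantities)
    return result


run = Node('Run')
run.add('systems', Node('System', n_atoms=3, atom_labels=['H', 'H', 'O']))
print(json.dumps(to_dict(run), indent=2))
```

The printed JSON nests the ``System`` data in a ``systems`` list with ``m_parent_index``
0, while the top-level ``Run`` carries no index because it has no parent.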

Sections can be extended with new quantities outside the original section definition.
This provides the key mechanism to extend commonly defined parts with code-specific
quantities:

.. code-block:: Python

    class Method(nomad.metainfo.common.Method):
        x_vasp_incar_ALGO = Quantity(
            type=Enum(['Normal', 'VeryFast', ...]),
            links=['https://cms.mpi.univie.ac.at/wiki/index.php/ALGO'],
            description='''
            A convenient option to specify the electronic minimisation algorithm (as of VASP.4.5)
            and/or to select the type of GW calculations.
            ''')


All meta-info definitions and classes for meta-info data objects (i.e. section instances)
inherit from :class:`MSection`. This base class provides common functions and properties
for all meta-info data objects. Names of these common parts are prefixed with ``m_``
to distinguish them from user-defined quantities. This also constitutes the `reflection`
interface (in addition to Python's built-in ``getattr`` and ``setattr``) that allows you to
create and manipulate meta-info data without knowing the underlying definitions when the
program is written.
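
Because every section class derives from ``MSection``, whose metaclass ``MObjectMeta``
calls ``__init_cls__`` when a class is created, both defining and extending a section is
plain sub-classing. The standalone sketch below illustrates that idea with a simplified
metaclass; ``Quantity``, ``SectionMeta``, ``Method``, ``VaspMethod``, and ``basis_set`` are
hypothetical names. Unlike this module, which records base sections in ``m_def``, the
sketch simply copies inherited quantities into each class.

```python
class Quantity:
    # Hypothetical, minimal stand-in for a quantity definition.
    def __init__(self, type):
        self.type = type
        self.name = None


class SectionMeta(type):
    # Simplified metaclass: collects the Quantity attributes of a class body
    # at class-creation time and merges them with those of its bases.
    def __new__(mcs, cls_name, bases, dct):
        cls = super().__new__(mcs, cls_name, bases, dct)
        quantities = {}
        for base in bases:
            quantities.update(getattr(base, 'quantities', {}))
        for name, attr in dct.items():
            if isinstance(attr, Quantity):
                attr.name = name
                quantities[name] = attr
        cls.quantities = quantities
        return cls


class Method(metaclass=SectionMeta):
    basis_set = Quantity(type=str)


class VaspMethod(Method):
    # Extends the common definition with a code-specific quantity.
    x_vasp_incar_ALGO = Quantity(type=str)


print(sorted(Method.quantities))      # ['basis_set']
print(sorted(VaspMethod.quantities))  # ['basis_set', 'x_vasp_incar_ALGO']
```

The base class is left untouched by the extension; only the sub-class sees the
additional quantity.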

.. autoclass:: MSection

The following classes can be used to define and structure meta-info data:

- sections are defined by sub-classing :class:`MSection` and, optionally, by using
  :class:`Section` to populate the class attribute ``m_def``
- quantities are defined by assigning class attributes of a section with :class:`Quantity`
  instances
- references (from one section to another) can be defined with quantities that use
  section definitions as type
- dimensions can be defined by simply using quantity names in shapes
- categories (formerly `abstract type definitions`) can be given in quantity definitions
  to assign quantities to additional specialization-generalization hierarchies

See the reference of classes :class:`Section` and :class:`Quantity` for details.
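
To make the dimension rule concrete: a shape such as ``['n_atoms', 3]`` refers to the value
of the ``n_atoms`` quantity of the same section. The helper below is a hypothetical sketch
of how such a shape could be resolved into concrete sizes; it is not part of this module,
which (see ``Dimension``) only checks that each dimension is an ``int``, a quantity name, a
range such as ``'0..*'``, or a section definition.

```python
from typing import Dict, List, Union

Dim = Union[int, str]


def resolve_shape(shape: List[Dim], values: Dict[str, int]) -> List[int]:
    # Hypothetical helper: integer dimensions are kept as they are, dimensions
    # that name another quantity (e.g. 'n_atoms') are replaced by that
    # quantity's current value. Ranges like '0..*' are not resolved here.
    resolved = []
    for dim in shape:
        if isinstance(dim, int):
            resolved.append(dim)
        elif isinstance(dim, str) and dim.isidentifier():
            if dim not in values:
                raise KeyError('no value for dimension %s' % dim)
            resolved.append(values[dim])
        else:
            raise TypeError('%s cannot be resolved to a size' % str(dim))
    return resolved


print(resolve_shape(['n_atoms', 3], {'n_atoms': 3}))  # [3, 3]
```

For the ``atom_positions`` quantity of a ``System`` with three atoms, this yields the
expected ``3 x 3`` array shape.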

.. autoclass:: Section
.. autoclass:: Quantity
"""

# TODO validation

from typing import Type, TypeVar, Union, Tuple, Iterable, List, Any, Dict, cast
import sys
import inspect
import re
import json

import numpy as np

from pint.unit import _Unit
from pint import UnitRegistry

is_bootstrapping = True
MSectionBound = TypeVar('MSectionBound', bound='MSection')


# Reflection

class Enum(list):
    """ Allows you to define str types whose values are limited to a pre-set list. """
    pass


class DataType:
    """
    Allows you to define custom data types that can be used in the meta-info.

    The metainfo supports most types out of the box. These include Python's built-in
    primitive types (int, bool, str, float, ...), references to sections, and enums.
    However, on some occasions you need to add custom data types.
    """
    def type_check(self, section, value):
        """ Checks (and possibly normalizes) a value; the default accepts any value. """
        return value

    def to_json_serializable(self, section, value):
        """ Converts a value into a JSON-serializable form; the default returns it as-is. """
        return value

    def from_json_serializable(self, section, value):
        """ Converts a JSON-serializable form back into a value; the default returns it as-is. """
        return value


class Dimension(DataType):
    """ The data type of shape dimensions: fixed sizes (``int``), names of other quantities,
    ranges such as ``'0..*'``, section definitions, or section classes. """
    def type_check(self, section, value):
        if isinstance(value, int):
            return value

        if isinstance(value, str):
            if value.isidentifier():
                return value
            if re.match(r'(\d)\.\.(\d|\*)', value):
                return value

        if isinstance(value, Section):
            return value

        if isinstance(value, type) and hasattr(value, 'm_def'):
            return value

        raise TypeError('%s is not a valid dimension' % str(value))


class Reference(DataType):
    """ A datatype class that can be used to define reference types based on section definitions.

    A quantity can be used to define possible references between sections. Instantiate
    this class to create a reference type that specifies that a quantity with this type
    is actually a reference (or references, depending on shape) to a section of the
    given definition.
    """
    def __init__(self, section: 'Section'):
        self.section = section


class QuantityReference(Reference):
    """ Instances represent a special reference type to reference other Quantities.

    It will allow quantity names as values and resolve them to the actual quantity
    definition. Only works for quantities defined within the same section.
    """

    def __init__(self):
        super().__init__(Quantity.m_def)

    def normalize(self, section: 'MSection', value: Union[str, 'Quantity']):
        if isinstance(value, Quantity):
            if value.m_parent != section.m_def:
                raise TypeError('Must be a quantity of the same section.')
            return value

        quantity = section.m_def.all_quantities.get(value, None)
        if quantity is None:
            raise TypeError('Must be the name of a quantity in the same section.')

        return quantity

# TODO class Unit(DataType)
# TODO class Datetime(DataType)


class MObjectMeta(type):

    def __new__(self, cls_name, bases, dct):
        cls = super().__new__(self, cls_name, bases, dct)

        init = getattr(cls, '__init_cls__', None)
        if init is not None and not is_bootstrapping:
            init()
        return cls


Content = Tuple[MSectionBound, Union[List[MSectionBound], MSectionBound], str, MSectionBound]

SectionDef = Union[str, 'Section', 'SubSection', Type[MSectionBound]]
""" Type for section definition references.

This can be one of:

- the name of the section
- the section definition itself
- the definition of a sub section
- or the section definition Python class
"""


class MSection(metaclass=MObjectMeta):
    """Base class for all section instances on all meta-info levels.

    All metainfo objects instantiate classes that inherit from ``MSection``. Each
    section or quantity definition is an ``MSection``, each actual (meta-)data carrying
    section is an ``MSection``. This class constitutes the reflection interface of the
    meta-info, since it allows you to manipulate sections (and therefore all meta-info data)
    without having to know the specific sub-class.

    It also carries all the data for each section. All sub-classes only define specific
    sections in terms of possible sub-sections and quantities. The data is managed here.

    The reflection interface for reading and manipulating quantity values consists of
    Python's built-in ``getattr``, ``setattr``, and ``del``, as well as the member functions
    :func:`m_add` and :func:`m_add_values`.

    Sub-sections and parent sections can be read and manipulated with :data:`m_parent`,
    :func:`m_sub_section`, and :func:`m_create`.

    .. code-block:: python

        system = run.m_create(System)
        assert system.m_parent == run
        assert run.m_sub_section(System, system.m_parent_index) == system

    Attributes:
        m_def: The section definition that defines this section, its possible
            sub-sections and quantities.
        m_parent: The parent section instance that this section is a sub-section of.
        m_parent_index: For repeatable sections, parents keep a list of sub-sections for
            each section definition. This is the index of this section in the respective
            parent sub-section list.
        m_data: The dictionary that holds all data of this section. It keeps the quantity
            values and sub-sections. It should only be read directly (and never manipulated)
            if you know what you are doing. You should always use the reflection interface
            if possible.
    """

    m_def: 'Section' = None

    def __init__(self, m_def: 'Section' = None, m_parent: 'MSection' = None, **kwargs):
        self.m_def: 'Section' = m_def
        self.m_parent: 'MSection' = m_parent
        self.m_parent_index = -1

        cls = self.__class__
        if self.m_def is None:
            self.m_def = cls.m_def

        if cls.m_def is not None:
            assert self.m_def == cls.m_def, \
                'Section class and section definition must match'

        # keyword arguments prefixed with 'a_' are annotations, all others are data
        self.m_annotations: Dict[str, Any] = {}
        rest = {}
        for key, value in kwargs.items():
            if key.startswith('a_'):
                self.m_annotations[key[2:]] = value
            else:
                rest[key] = value

        if is_bootstrapping:
            self.m_data: Dict[str, Any] = {}
            for key, value in rest.items():
                self.m_data[key] = value

        else:
            # self.m_data = {}
            # self.m_update(**rest)
            self.m_data = {}
            for key, value in rest.items():
                self.m_data[key] = value

    @classmethod
    def __init_cls__(cls):
        # ensure that the m_def is defined
        m_def = cls.m_def
        if m_def is None:
            m_def = Section()
            setattr(cls, 'm_def', m_def)

        # transfer name and description to m_def
        m_def.name = cls.__name__
        if cls.__doc__ is not None:
            m_def.description = inspect.cleandoc(cls.__doc__).strip()
        m_def.section_cls = cls

        for name, attr in cls.__dict__.items():
            # transfer names and descriptions for properties
            if isinstance(attr, Property):
                attr.name = name
                if attr.description is not None:
                    attr.description = inspect.cleandoc(attr.description).strip()
                    attr.__doc__ = attr.description

                # manual manipulation of m_data due to bootstrapping
                if isinstance(attr, Quantity):
                    properties = m_def.m_data.setdefault('quantities', [])
                elif isinstance(attr, SubSection):
                    properties = m_def.m_data.setdefault('sub_sections', [])
                else:
                    raise NotImplementedError('Unknown property kind.')
                properties.append(attr)
                attr.m_parent = m_def
                attr.m_parent_index = len(properties) - 1

        # add base sections
        for base_cls in cls.__bases__:
            if base_cls != MSection:
                section = getattr(base_cls, 'm_def', None)
                if section is None:
                    raise TypeError(
                        'Section defining classes must have MSection or a descendant as '
                        'base classes.')

                # manual manipulation of m_data due to bootstrapping
                m_def.m_data.setdefault('base_sections', []).append(section)

        # add the section definition of this class to the module's package
        module_name = cls.__module__
        pkg = Package.from_module(module_name)
        pkg.m_add_sub_section(cls.m_def)

    def m_type_check(self, definition: 'Quantity', value: Any, check_item: bool = False):
        """ Checks and normalizes the given value according to the quantity type. """

        if value is None and not check_item and definition.default is None:
            # Allow the default None value even if it would violate the type
            return

        def check_value(value):
            if isinstance(definition.type, Enum):
                if value not in definition.type:
                    raise TypeError('Not one of the enum values.')

            elif isinstance(definition.type, type):
                if not isinstance(value, definition.type):
                    raise TypeError('Value has wrong type.')

            elif isinstance(definition.type, Section):
                if not isinstance(value, MSection) or value.m_def != definition.type:
                    raise TypeError('The value is not a section of the expected section definition.')

            elif isinstance(definition.type, DataType):
                value = definition.type.type_check(self, value)

            else:
                # TODO
                # raise Exception('Invalid quantity type: %s' % str(definition.type))
                pass

            return value

        shape = None
        try:
            shape = definition.shape
        except KeyError:
            pass

        if shape is None or len(shape) == 0 or check_item:
            value = check_value(value)

        else:
            if type(definition.type) == np.dtype:
                if len(shape) != len(value.shape):
                    raise TypeError('Wrong shape')
            else:
                if len(shape) == 1:
                    if not isinstance(value, list):
                        raise TypeError('Wrong shape')

                    value = [check_value(item) for item in value]

                else:
                    raise NotImplementedError('Checking types is not available for complex shapes.')

        # TODO check dimension

        return value

    def _resolve_sub_section(self, definition: SectionDef) -> 'SubSection':
        """ Resolves and checks the given section definition. """

        if isinstance(definition, type):
            section_cls = definition
            definition = getattr(section_cls, 'm_def', None)
            if definition is None:
                raise TypeError(
                    'The type/class %s is not defining a section, i.e. not derived from '
                    'MSection.' % str(section_cls))

        if isinstance(definition, Section):
            sub_section = self.m_def.all_sub_sections_by_section.get(definition, None)
            if sub_section is None:
                raise KeyError(
                    'The section %s is not a sub section of %s.' %
                    (definition.name, self.m_def.name))

        elif isinstance(definition, str):
            sub_section = self.m_def.all_sub_sections[definition]

        elif isinstance(definition, SubSection):
            sub_section = definition

        else:
            raise TypeError(
                '%s does not refer to a section definition. Either use the section '
                'definition, sub section definition, section class, or name.' %
                str(definition))

        if sub_section is None:
            raise KeyError(
                'The section %s is not a sub section of %s.' %
                (cast(Definition, definition).name, self.m_def.name))

        if sub_section.m_parent is not self.m_def:
            raise KeyError(
                'The section %s is not a sub section of %s.' %
                (cast(Definition, definition).name, self.m_def.name))

        return sub_section

    def m_sub_sections(self, definition: SectionDef) -> List[MSectionBound]:
        """Returns all sub sections for the given section definition.

        Args:
            definition: The definition of the section.

        Raises:
            KeyError: If the definition is not for a sub section
        """
        sub_section = self._resolve_sub_section(definition)
        return getattr(self, sub_section.name)

    def m_sub_section(self, definition: SectionDef, parent_index: int = -1) -> MSectionBound:
        """Returns the sub section for the given section definition and possible
           parent_index (for repeatable sections).

        Args:
            definition: The definition of the section.
            parent_index: The index of the desired section. This can be omitted for non
                repeatable sections. If it is omitted for a repeatable section, an exception
                is raised unless exactly one sub-section exists. An exception is also
                raised if the given index is out of range.
        Raises:
            KeyError: If the definition is not for a sub section
            IndexError: If the given index is wrong, or if an index is given for a non
                repeatable section
        """
        sub_section = self._resolve_sub_section(definition)

        m_data_value = getattr(self, sub_section.name)

        if m_data_value is None:
            if sub_section.repeats:
                m_data_value = []
            else:
                m_data_value = None

        if isinstance(m_data_value, list):
            m_data_values = m_data_value
            if parent_index == -1:
                if len(m_data_values) == 1:
                    return m_data_values[0]
                else:
                    raise IndexError()
            else:
                return m_data_values[parent_index]
        else:
            if parent_index != -1:
                raise IndexError('Not a repeatable sub section.')

            return m_data_value

    def m_add_sub_section(self, sub_section: MSectionBound) -> MSectionBound:
        """Adds the given section instance as a sub section to this section."""

        sub_section_def = self._resolve_sub_section(sub_section.m_def.section_cls)
        sub_section.m_parent = self
        if sub_section_def.repeats:
            values = getattr(self, sub_section_def.name)
            sub_section.m_parent_index = len(values)
            values.append(sub_section)

        else:
            self.m_data[sub_section_def.name] = sub_section
            sub_section.m_parent_index = -1

        return sub_section

    def m_create(self, definition: Type[MSectionBound], **kwargs) -> MSectionBound:
        """Creates a sub-section and adds it to this section.

        Args:
            definition: The section definition of the sub-section. It is either the
                definition itself, or the Python class representing the section definition.
            **kwargs: Are used to initialize the sub-section.

        Returns:
            The created sub-section

        Raises:
            KeyError: If the given section is not a sub-section of this section.
        """
        sub_section: 'SubSection' = self._resolve_sub_section(definition)

        section_cls = sub_section.sub_section.section_cls
        section_instance = section_cls(m_def=section_cls.m_def, m_parent=self, **kwargs)

        return cast(MSectionBound, self.m_add_sub_section(section_instance))

    def __resolve_quantity(self, definition: Union[str, 'Quantity']) -> 'Quantity':
        """Resolves and checks the given quantity definition. """
        if isinstance(definition, str):
            quantity = self.m_def.all_quantities[definition]

        else:
            if definition.m_parent != self.m_def:
                raise KeyError('Quantity is not a quantity of this section.')
            quantity = definition

        return quantity

    def m_add(self, definition: Union[str, 'Quantity'], value: Any):
        """Adds the given value to the given quantity."""

        quantity = self.__resolve_quantity(definition)

        value = self.m_type_check(quantity, value, check_item=True)

        m_data_values = self.m_data.setdefault(quantity.name, [])
        m_data_values.append(value)

    def m_add_values(self, definition: Union[str, 'Quantity'], values: Iterable[Any]):
        """Adds the given values to the given quantity."""

        quantity = self.__resolve_quantity(definition)

        values = [self.m_type_check(quantity, value, check_item=True) for value in values]

        m_data_values = self.m_data.setdefault(quantity.name, [])
        for value in values:
            m_data_values.append(value)

    def m_update(self, **kwargs):
        """ Updates all quantities and sub-sections with the given arguments. """
        for name, value in kwargs.items():
            prop = self.m_def.all_properties.get(name, None)
            if prop is None:
                raise KeyError('%s is not an attribute of this section' % name)

            if isinstance(prop, SubSection):
                if prop.repeats:
                    if isinstance(value, list):
                        for item in value:
                            self.m_add_sub_section(item)
                    else:
                        raise TypeError('Sub section %s repeats, but no list was given' % prop.name)
                else:
                    self.m_add_sub_section(value)

            else:
                setattr(self, name, value)

    def m_to_dict(self) -> Dict[str, Any]:
        """Returns the data of this section as a JSON-serializable dictionary. """

        def items() -> Iterable[Tuple[str, Any]]:
            yield 'm_def', self.m_def.name
            if self.m_parent_index != -1:
                yield 'm_parent_index', self.m_parent_index

            for name, sub_section in self.m_def.all_sub_sections.items():
                if name not in self.m_data:
                    continue

                if sub_section.repeats:
                    yield name, [item.m_to_dict() for item in self.m_data[name]]
                else:
                    yield name, self.m_data[name].m_to_dict()

            for name, quantity in self.m_def.all_quantities.items():
                if name in self.m_data:
                    value = getattr(self, name)
                    if hasattr(value, 'tolist'):
                        value = value.tolist()

                    # TODO
                    if isinstance(quantity.type, Section):
                        value = str(value)
                    # TODO
                    if isinstance(value, type):
                        value = str(value)
                    # TODO
                    if isinstance(value, np.dtype):
                        value = str(value)
                    # TODO
                    if isinstance(value, _Unit):
                        value = str(value)

                    yield name, value

        return {key: value for key, value in items()}

    @classmethod
    def m_from_dict(cls: Type[MSectionBound], dct: Dict[str, Any]) -> MSectionBound:
        """Creates a section instance from the given JSON-serializable dictionary.

        Note that some entries (``m_def``, ``m_parent_index``, and sub-sections) are
        popped from the given dictionary.
        """
        section_def = cls.m_def

        # remove m_def and m_parent_index, they set themselves automatically
        assert section_def.name == dct.pop('m_def', None)
        dct.pop('m_parent_index', -1)

        def items():
            for name, sub_section_def in section_def.all_sub_sections.items():
                if name in dct:
                    sub_section_value = dct.pop(name)
                    if sub_section_def.repeats:
                        yield name, [
                            sub_section_def.sub_section.section_cls.m_from_dict(sub_section_dct)
                            for sub_section_dct in sub_section_value]
                    else:
                        yield name, sub_section_def.sub_section.section_cls.m_from_dict(sub_section_value)

            for key, value in dct.items():
                yield key, value

        dct = {key: value for key, value in items()}
        section_instance = cast(MSectionBound, section_def.section_cls())
        section_instance.m_update(**dct)
        return section_instance

    def m_to_json(self, **kwargs):
        """Returns the data of this section as a JSON string.

        Keyword arguments (e.g. ``indent``) are passed to :func:`json.dumps`.
        """
        return json.dumps(self.m_to_dict(), **kwargs)

    def m_all_contents(self) -> Iterable[Content]:
        """Returns an iterable over all sub-sections and their sub-sections, recursively.
        Nested sub-sections are yielded before the sub-section that contains them. """
        for content in self.m_contents():
            for sub_content in content[0].m_all_contents():
                yield sub_content

            yield content

    def m_contents(self) -> Iterable[Content]:
        """Returns an iterable over all direct sub-sections. """
        for name, attr in self.m_data.items():
            if isinstance(attr, list):
                for value in attr:
                    if isinstance(value, MSection):
                        yield value, attr, name, self

            elif isinstance(attr, MSection):
                yield attr, attr, name, self

    def __repr__(self):
        m_section_name = self.m_def.name
        name = ''
        if 'name' in self.m_data:
            name = self.m_data['name']

        return '%s:%s' % (name, m_section_name)


class MCategory(metaclass=MObjectMeta):

    m_def: 'Category' = None

    @classmethod
    def __init_cls__(cls):
        # ensure that the m_def is defined
        m_def = cls.m_def
        if m_def is None:
            m_def = Category()
            setattr(cls, 'm_def', m_def)

        # transfer name and description to m_def
        m_def.name = cls.__name__
        if cls.__doc__ is not None:
            m_def.description = inspect.cleandoc(cls.__doc__).strip()

        # add the category definition to the module's package
        module_name = cls.__module__
        pkg = Package.from_module(module_name)
        pkg.m_add_sub_section(cls.m_def)
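

# Illustrative sketch (hypothetical classes, not the actual ``MObjectMeta``): one way a
# metaclass can call an ``__init_cls__`` hook once for every new class. This is the hook
# ``MCategory`` uses to derive its definition's name from ``cls.__name__`` and its
# description from the cleaned-up class docstring. Plain dicts stand in for definitions.
def _sketch_init_cls_hook():
    import inspect

    class SketchMeta(type):
        def __init__(cls, name, bases, namespace):
            super().__init__(name, bases, namespace)
            # run the class-level initialization hook, if the class defines one
            init_cls = getattr(cls, '__init_cls__', None)
            if init_cls is not None:
                init_cls()

    class SketchCategory(metaclass=SketchMeta):
        m_def = None

        @classmethod
        def __init_cls__(cls):
            description = None
            if cls.__doc__ is not None:
                description = inspect.cleandoc(cls.__doc__).strip()
            cls.m_def = dict(name=cls.__name__, description=description)

    class EnergyValue(SketchCategory):
        """
        Quantities that represent energy values.
        """

    return EnergyValue.m_def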


# M3, the definitions that are used to write definitions. These are the section definitions
# for sections Section and Quantity. They define themselves, i.e. the section definition
# of Section is itself a Section.
# Due to this circular nature (a chicken-and-egg problem), the classes for sections Section
# and Quantity only contain placeholders for their own section and quantity definitions.
# These placeholders are replaced once the necessary classes are defined. This process
# is referred to as 'bootstrapping'.

_definition_change_counter = 0


class cached_property:
    """ A property that caches its computed value per object.

    The cache is invalidated whenever a new definition is added. Once all definitions
    are loaded, the cache becomes stable and complex derived results become available
    instantaneously.
    """
    def __init__(self, f):
        self.__doc__ = getattr(f, "__doc__")
        self.f = f
        self.change = -1
        # cached values, keyed by the object they were computed for
        self.values: Dict[Any, Any] = {}

    def __get__(self, obj, cls):
        if obj is None:
            return self

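        global _definition_change_counter
        if self.change != _definition_change_counter:
            # definitions changed since the values were cached; invalidate the cache
            self.values = {}
            self.change = _definition_change_counter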

        value = self.values.get(obj, None)
        if value is None:
            value = self.f(obj)
            self.values[obj] = value

        return value
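

# Illustrative sketch (hypothetical names, not part of the meta-info API): the
# invalidation scheme of ``cached_property`` in isolation. A counter in ``state`` plays
# the role of ``_definition_change_counter``; cached values are reused only while the
# counter has not changed since the cache was last (re)filled.
def _sketch_counter_invalidated_cache():
    state = dict(counter=0, calls=0)

    class counter_cached_property:
        def __init__(self, f):
            self.f = f
            self.change = -1
            self.values = {}

        def __get__(self, obj, cls):
            if obj is None:
                return self
            if self.change != state['counter']:
                # something changed since the values were cached: start over
                self.values = {}
                self.change = state['counter']
            if obj not in self.values:
                self.values[obj] = self.f(obj)
            return self.values[obj]

    class SketchDefinition:
        @counter_cached_property
        def derived(self):
            state['calls'] += 1  # count how often the expensive function runs
            return state['calls']

    definition = SketchDefinition()
    first, second = definition.derived, definition.derived  # computed once, then cached
    state['counter'] += 1  # simulate that a new definition was added
    third = definition.derived  # recomputed after the invalidation
    return first, second, third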


class Definition(MSection):

    __all_definitions: Dict[Type[MSection], List[MSection]] = {}

    name: 'Quantity' = None
    description: 'Quantity' = None
    links: 'Quantity' = None
    categories: 'Quantity' = None

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        global _definition_change_counter
        _definition_change_counter += 1

        # mro() already starts with the class itself; register each definition once per class
        for cls in self.__class__.mro():
            definitions = Definition.__all_definitions.setdefault(cls, [])
            definitions.append(self)

    @classmethod
    def all_definitions(cls: Type[MSectionBound]) -> Iterable[MSectionBound]:
        """ Returns all definitions of this definition class. """
        return cast(Iterable[MSectionBound], Definition.__all_definitions.get(cls, []))

    @cached_property
    def all_categories(self):
        """ All categories of this definition and its categories. """
        all_categories = list(self.categories)
        for category in self.categories:  # pylint: disable=not-an-iterable
            for super_category in category.all_categories:
                all_categories.append(super_category)

        return all_categories


class Property(Definition):
    pass


class Quantity(Property):
    """Used to define quantities that store a certain piece of (meta-)data.

    Quantities are the basic building blocks of meta-info data. The Quantity class is
    used to define quantities within sections. A quantity definition
    is a (physics) quantity with a name, type, shape, and potentially a unit.

    In Python terms, quantities are descriptors. Descriptors define how to get, set, and
    delete values for an object attribute. Meta-info descriptors ensure that
    set values fit the quantity's type and shape.
    """

    type: 'Quantity' = None
    shape: 'Quantity' = None
    unit: 'Quantity' = None
    default: 'Quantity' = None
    synonym_for: 'Quantity' = None

    # TODO derived_from = Quantity(type=Quantity, shape=['0..*'])
    # TODO categories = Quantity(type=Category, shape=['0..*'])
    # TODO converter = Quantity(type=Converter), a class with set of functions for
    #      normalizing, (de-)serializing values.

    # Some quantities of Quantity cannot be read as normal quantities due to bootstrapping.
    # Those can be accessed internally through the following replacement properties that
    # read directly from m_data.
    __name = property(lambda self: self.m_data['name'])
    __synonym_for = property(lambda self: self.m_data.get('synonym_for', None))
    __default = property(lambda self: self.m_data.get('default', None))

    def __get__(self, obj, type=None):
        if obj is None:
            # class (def) attribute case
            return self

        # object (instance) attribute case
        if self.__synonym_for is not None:
            return getattr(obj, self.__synonym_for.name)

        try:
            return obj.m_data[self.__name]
        except KeyError:
            return self.__default

    def __set__(self, obj, value):
        if obj is None:
            # class (def) case
            raise KeyError('Cannot overwrite quantity definition. Only values can be set.')

        # object (instance) case
        if self.__synonym_for is not None:
            return setattr(obj, self.__synonym_for.name, value)

        if isinstance(self.type, np.dtype):
            # store values as numpy arrays of the defined dtype
            if not isinstance(value, np.ndarray) or value.dtype != self.type:
                value = np.array(value, dtype=self.type)

        elif isinstance(value, np.ndarray):
            # quantities without a numpy dtype store values as (nested) Python lists
            value = value.tolist()

        value = obj.m_type_check(self, value)
        obj.m_data[self.__name] = value

    def __delete__(self, obj):
        if obj is None:
            # class (def) case
            raise KeyError('Cannot delete quantity definition. Only values can be deleted.')

        # object (instance) case
        if self.__synonym_for is not None:
            return self.__synonym_for.__delete__(obj)

        del obj.m_data[self.__name]
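

# Illustrative sketch (hypothetical classes, not part of the meta-info API): a
# stripped-down data descriptor that, like ``Quantity``, keeps values in the owning
# object's ``m_data`` dict, falls back to a default when no value is stored, and forwards
# all access to another attribute when it is a synonym. Type checks and numpy conversion
# are omitted, and the attribute name is taken from ``__set_name__`` for brevity.
def _sketch_quantity_descriptor():
    class SketchQuantity:
        def __init__(self, default=None, synonym_for=None):
            self.default = default
            self.synonym_for = synonym_for
            self.name = None

        def __set_name__(self, owner, name):
            self.name = name

        def __get__(self, obj, type=None):
            if obj is None:
                return self  # class access returns the definition itself
            if self.synonym_for is not None:
                return getattr(obj, self.synonym_for)
            return obj.m_data.get(self.name, self.default)

        def __set__(self, obj, value):
            if self.synonym_for is not None:
                setattr(obj, self.synonym_for, value)
            else:
                obj.m_data[self.name] = value

        def __delete__(self, obj):
            if self.synonym_for is not None:
                delattr(obj, self.synonym_for)
            else:
                del obj.m_data[self.name]

    class SketchSystem:
        n_atoms = SketchQuantity(default=0)
        number_of_atoms = SketchQuantity(synonym_for='n_atoms')

        def __init__(self):
            self.m_data = {}

    system = SketchSystem()
    before = system.n_atoms  # nothing stored yet, so the default is returned
    system.number_of_atoms = 4  # written through the synonym into m_data['n_atoms']
    return before, system.n_atoms, dict(system.m_data)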


class SubSection(Property):
    """ Allows a section class to be assigned as a sub-section of another section class. """

    sub_section: 'Quantity' = None
    repeats: 'Quantity' = None

    def __get__(self, obj: MSection, type=None) -> Union[MSection, 'Section']:
        if obj is None:
            # the class attribute case
            return self

        else:
            # the object attribute case
            m_data_value = obj.m_data.get(self.name, None)
            if m_data_value is None:
                if self.repeats:
                    m_data_value = []
                    obj.m_data[self.name] = m_data_value

            return m_data_value

    def __set__(self, obj: MSection, value: Union[MSection, List[MSection]]):
        raise NotImplementedError('Sub sections cannot be set directly. Use m_create.')

    def __delete__(self, obj):
        raise NotImplementedError('Sub sections cannot be deleted directly.')


class Section(Definition):
    """Used to define sections that organize meta-info data into containment hierarchies.

    Section definitions determine what quantities and sub-sections can appear in a section
    instance.

    In Python terms, sections are classes. Sub-sections and quantities are attributes of
    respective instantiating objects. For each section class there is a corresponding
    :class:`Section` instance that describes this class as a section. This instance
    is referred to as 'section definition' in contrast to the Python class that we call
    'section class'.
    """

    section_cls: Type[MSection] = None
    """ The section class that corresponds to this section definition. """

    quantities: 'SubSection' = None
    sub_sections: 'SubSection' = None

    base_sections: 'Quantity' = None
    # TODO extends = Quantity(type=bool), denotes this section as a container for
    #      new quantities that belong to the base-class section definitions

    @cached_property
    def all_properties(self) -> Dict[str, Union['SubSection', Quantity]]:
        """ All attribute (sub section and quantity) definitions. """

        properties: Dict[str, Union[SubSection, Quantity]] = dict(**self.all_quantities)
        properties.update(**self.all_sub_sections)
        return properties

    @cached_property
    def all_quantities(self) -> Dict[str, Quantity]:
        """ All quantity definitions of this section definition, including inherited ones. """

        all_quantities: Dict[str, Quantity] = {}
        for section in self.base_sections + [self]:
            for quantity in section.m_data.get('quantities', []):
                all_quantities[quantity.name] = quantity

        return all_quantities

    @cached_property
    def all_sub_sections(self) -> Dict[str, 'SubSection']:
        """ All sub section definitions for this section definition by name. """

        return {
            sub_section.name: sub_section
            for sub_section in self.m_data.get('sub_sections', [])}

    @cached_property
    def all_sub_sections_by_section(self) -> Dict['Section', 'SubSection']:
        """ All sub section definitions for this section definition by their section definition. """
        return {
            sub_section.sub_section: sub_section
            for sub_section in self.m_data.get('sub_sections', [])}
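

# Illustrative sketch (hypothetical helper, not part of the meta-info API): the merge
# order used by ``Section.all_quantities``. Base sections are visited first and the
# section itself last, so a quantity defined again in the derived section replaces the
# inherited definition with the same name. Plain dicts stand in for quantity definitions.
def _sketch_merge_quantities(base_section_quantities, own_quantities):
    all_quantities = {}
    for quantities in base_section_quantities + [own_quantities]:
        for quantity in quantities:
            all_quantities[quantity['name']] = quantity
    return all_quantities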


class Package(Definition):

    section_definitions: 'SubSection'
    category_definitions: 'SubSection'

    @staticmethod
    def from_module(module_name: str):
        module = sys.modules[module_name]

        pkg: 'Package' = getattr(module, 'm_package', None)
        if pkg is None:
            pkg = Package()
            setattr(module, 'm_package', pkg)

        pkg.name = module_name
        if pkg.description is None and module.__doc__ is not None:
            pkg.description = inspect.cleandoc(module.__doc__).strip()

        return pkg


class Category(Definition):
    """Can be used to define categories for definitions.

    Each definition, including categories themselves, can belong to a set of categories.
    Categories therefore form a hierarchy of concepts that definitions can belong to, i.e.
    they form an `is a` relationship.

    In the old meta-info this was known as `abstract types`.
    """

    @cached_property
    def definitions(self) -> Iterable[Definition]:
        """ All definitions that are directly or indirectly in this category. """
        return list([
            definition for definition in Definition.all_definitions()
            if self in definition.all_categories])
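

# Illustrative sketch (hypothetical helper, not part of the meta-info API): the transitive
# "is a" closure computed by ``Definition.all_categories`` and the reverse lookup done by
# ``Category.definitions``. ``direct_categories`` maps each definition name to the names
# of its direct categories; like the original, this assumes the hierarchy has no cycles.
def _sketch_definitions_in_category(direct_categories, category):
    def all_categories(name):
        result = list(direct_categories.get(name, []))
        for parent in direct_categories.get(name, []):
            result.extend(all_categories(parent))
        return result

    return sorted(
        name for name in direct_categories if category in all_categories(name))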


Section.m_def = Section(name='Section')
Section.m_def.m_def = Section.m_def
Section.m_def.section_cls = Section
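

# Illustrative sketch (hypothetical class, not part of the meta-info API): the
# self-reference that the assignments above establish. The definition that describes the
# class ``Section`` is itself an instance of that class, and it is its own definition.
def _sketch_self_describing_definition():
    class SketchSection:
        m_def = None

        def __init__(self, name):
            self.name = name
            self.section_cls = None

    definition = SketchSection(name='Section')
    SketchSection.m_def = definition  # the class is described by one of its instances
    definition.m_def = definition  # that instance is described by itself
    definition.section_cls = SketchSection
    return definition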

Definition.m_def = Section(name='Definition')
Property.m_def = Section(name='Property')
Quantity.m_def = Section(name='Quantity')
SubSection.m_def = Section(name='SubSection')

Definition.name = Quantity(
    type=str, name='name', description='''
    The name of the definition. Quantity names must be unique within a section.
    ''')
Definition.description = Quantity(
    type=str, name='description', description='''
    An optional human-readable description.
    ''')
Definition.links = Quantity(
    type=str, shape=['0..*'], name='links', description='''
    A list of URLs to external resources that describe this definition.
    ''')
Definition.categories = Quantity(
    type=Category.m_def, shape=['0..*'], default=[], name='categories',
    description='''
    The categories that this definition belongs to. See :class:`Category`.
    ''')

Section.quantities = SubSection(
    sub_section=Quantity.m_def, repeats=True,
    description='''The quantities of this section.''')

Section.sub_sections = SubSection(
    sub_section=SubSection.m_def, repeats=True,
    description='''The sub sections of this section.''')
Section.base_sections = Quantity(
    type=Section, shape=['0..*'], default=[], name='base_sections',
    description='''
    Inherit all quantity and sub section definitions from the given sections.
    Will be derived from Python base classes.
    ''')


SubSection.repeats = Quantity(
    type=bool, name='repeats', default=False,
    description='''Whether this sub section can appear only once or multiple times.''')

SubSection.sub_section = Quantity(
    type=Section.m_def, name='sub_section', description='''
    The section definition for the sub section. Only section instances of this definition
    can be contained as sub sections.
    ''')

Quantity.m_def.section_cls = Quantity
Quantity.type = Quantity(
    type=Union[type, Enum, Section, np.dtype], name='type', description='''
    The type of the quantity.

    Can be one of the following:

    - ``None`` to support any value
    - a built-in primitive Python type, e.g. ``int``, ``str``
    - an instance of :class:`Enum`, e.g. ``Enum(['one', 'two', 'three'])``
    - an instance of :class:`Section`, i.e. a section definition, which defines a reference
    - a custom meta-info DataType
    - a numpy dtype

    If set to a dtype, this quantity stores values as numpy arrays of the given dtype.
    Otherwise, it stores values as (nested) Python lists. Values assigned to the quantity
    are converted to the respective representation.

    In the NOMAD CoE meta-info this was basically the ``dTypeStr``.
    ''')
Quantity.shape = Quantity(
    type=Dimension, shape=['0..*'], name='shape', description='''
    The shape of the quantity that defines its dimensionality.

    A shape is a list where each item defines a dimension. Each dimension can be:

    - an integer that defines the exact size of the dimension, e.g. ``[3]`` is the
      shape of a spatial vector
    - the name of an int-typed quantity in the same section
    - a range specification as a string built from a lower bound (an int) and an
      upper bound (an int or ``*`` denoting arbitrarily large), e.g. ``'0..*'``, ``'1..3'``
    ''')
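

# Illustrative sketch (hypothetical helper, not part of the meta-info API): checking a
# concrete value shape against a shape specification of the form described for
# ``Quantity.shape``. Each dimension is an int (exact size), the name of an int quantity
# (looked up in ``section_values``), or a ``'lower..upper'`` range where ``*`` is unbounded.
def _sketch_shape_matches(shape_spec, actual_shape, section_values):
    if len(shape_spec) != len(actual_shape):
        return False
    for dimension, size in zip(shape_spec, actual_shape):
        if isinstance(dimension, int):
            matches = size == dimension
        elif '..' in dimension:
            lower, upper = dimension.split('..')
            matches = size >= int(lower) and (upper == '*' or size <= int(upper))
        else:
            matches = size == section_values.get(dimension)
        if not matches:
            return False
    return True

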
Quantity.unit = Quantity(