metainfo.py 38.5 KB
# Copyright 2018 Markus Scheidgen
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
The NOMAD meta-info lets you define physics data quantities. These definitions are
necessary for all computer representations of the respective data (e.g. in Python,
search engines, databases, and files).

This module provides various Python interfaces for

- defining meta-info data
- creating and manipulating data that follows these definitions
- (de-)serializing meta-info data in JSON (i.e. representing data in JSON-formatted files)

Here is a simple example that demonstrates the definition of System-related quantities:

.. code-block:: python

    class System(MSection):
        \"\"\"
        A system section includes all quantities that describe a single simulated
        system (a.k.a. geometry).
        \"\"\"

        n_atoms = Quantity(
            type=int, description='''
            Defines the number of atoms in the system.
            ''')

        atom_labels = Quantity(type=Enum(ase.data.chemical_symbols), shape=['n_atoms'])
        atom_positions = Quantity(type=float, shape=['n_atoms', 3], unit=Units.m)
        simulation_cell = Quantity(type=float, shape=[3, 3], unit=Units.m)
        pbc = Quantity(type=bool, shape=[3])

    class Run(MSection):
        systems = SubSection(sub_section=System, repeats=True)

Here, we define a `section` called ``System``. The section mechanism lets you organize
related data into, well, sections. Sections form containment hierarchies, where
containment is a parent-child (whole-part) relationship. In this example, many ``System``
sections are part of one ``Run``. Each ``System`` can contain values for the defined quantities:
``n_atoms``, ``atom_labels``, ``atom_positions``, ``simulation_cell``, and ``pbc``.
Quantities let you state the type, shape, and physical unit to specify possible quantity
values.

Here is an example, where we use the above definitions to create, read, and manipulate
data that follows these definitions:

.. code-block:: python

    run = Run()
    system = run.m_create(System)
    system.n_atoms = 3
    system.atom_labels = ['H', 'H', 'O']

    print(system.atom_labels)
    print(run.m_to_json(indent=2))

This last statement will produce the following JSON:

.. code-block:: JSON

    {
        "m_def": "Run",
        "systems": [
            {
                "m_def": "System",
                "m_parent_index": 0,
                "n_atoms": 3,
                "atom_labels": [
                    "H",
                    "H",
                    "O"
                ]
            }
        ]
    }

This is the JSON representation, a serialized version of the Python representation in
the example above.
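
Because the serialized form is plain JSON, it can be inspected with the standard library
alone. The following is a minimal sketch that does not use the meta-info API: the
dictionary is typed in by hand to mirror a single serialized ``System`` from the output
above, and the variable names are made up for illustration.

```python
import json

# Hand-written stand-in for one serialized System (an assumption for this
# sketch; it is not produced by m_to_json).
serialized = json.dumps({
    'm_def': 'System',
    'm_parent_index': 0,
    'n_atoms': 3,
    'atom_labels': ['H', 'H', 'O'],
}, indent=2)

# Parsing the string gives back plain dicts and lists.
data = json.loads(serialized)
labels = data['atom_labels']
print(data['m_def'], data['n_atoms'], labels)
```

The round trip keeps the quantity values unchanged, which is what allows a serialized
section to be read back later.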

Sections can be extended with new quantities outside the original section definition.
This provides the key mechanism to extend commonly defined parts with (code) specific
quantities:

.. code-block:: Python

    class Method(nomad.metainfo.common.Method):
        x_vasp_incar_ALGO=Quantity(
            type=Enum(['Normal', 'VeryFast', ...]),
            links=['https://cms.mpi.univie.ac.at/wiki/index.php/ALGO'])
        \"\"\"
        A convenient option to specify the electronic minimisation algorithm (as of VASP.4.5)
        and/or to select the type of GW calculations.
        \"\"\"


All meta-info definitions and classes for meta-info data objects (i.e. section instances)
inherit from :class:`MSection`. This base class provides common functions and properties
for all meta-info data objects. Names of these common parts are prefixed with ``m_``
to distinguish them from user-defined quantities. This also constitutes the `reflection`
interface (in addition to Python's built-in ``getattr``, ``setattr``) that allows you to
create and manipulate meta-info data without prior knowledge of the underlying
definitions at programming time.
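
The idea of reflection can be illustrated with plain Python objects. The sketch below is
not the meta-info implementation: ``Box`` and ``copy_attributes`` are hypothetical names
invented for this example, and the only calls used are Python's built-in ``getattr`` and
``setattr``.

```python
class Box:
    # Hypothetical stand-in for a section instance.
    def __init__(self, **values):
        for name, value in values.items():
            setattr(self, name, value)


def copy_attributes(source, target, names):
    # Works on attribute names only; the classes involved are never inspected.
    for name in names:
        setattr(target, name, getattr(source, name))
    return target


original = Box(n_atoms=3, atom_labels=['H', 'H', 'O'])
copied = copy_attributes(original, Box(), ['n_atoms', 'atom_labels'])
print(copied.n_atoms, copied.atom_labels)
```

Code written this way keeps working when new attributes are introduced, because it never
hard-codes the names it operates on.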

.. autoclass:: MSection

The following classes can be used to define and structure meta-info data:

- sections are defined by sub-classing :class:`MSection` and using :class:`Section` to
  populate the class attribute `m_def`
- quantities are defined by assigning class attributes of a section with :class:`Quantity`
  instances
- references (from one section to another) can be defined with quantities that use
  section definitions as type
- dimensions can be defined by simply using quantity names in shapes
- categories (formerly `abstract type definitions`) can be given in quantity definitions
  to assign quantities to additional specialization-generalization hierarchies
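
To illustrate what using quantity names in shapes means, here is a minimal sketch that
resolves a shape against already known quantity values. ``resolve_shape`` is a
hypothetical helper written only for this example; it is not part of this module.

```python
def resolve_shape(shape, values):
    # Integer entries are fixed sizes; string entries name another quantity
    # whose current value provides the size.
    resolved = []
    for dimension in shape:
        if isinstance(dimension, int):
            resolved.append(dimension)
        else:
            resolved.append(values[dimension])
    return tuple(resolved)


known_values = {'n_atoms': 3}
positions_shape = resolve_shape(['n_atoms', 3], known_values)
cell_shape = resolve_shape([3, 3], known_values)
print(positions_shape, cell_shape)
```

With ``n_atoms`` set to 3, a shape of ``['n_atoms', 3]`` describes a 3 by 3 array, so the
expected size of ``atom_positions`` follows the value of ``n_atoms``.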

See the reference of classes :class:`Section` and :class:`Quantity` for details.

.. autoclass:: Section
.. autoclass:: Quantity
"""

# TODO validation

from typing import Type, TypeVar, Union, Tuple, Iterable, List, Any, Dict, cast
import sys
import inspect
import re
import json

import numpy as np
from pint.unit import _Unit
from pint import UnitRegistry

is_bootstrapping = True
MSectionBound = TypeVar('MSectionBound', bound='MSection')


# Reflection

class Enum(list):
    """ Allows defining str types with values limited to a preset list of possible values. """
    pass


class DataType:
    """
    Allows to define custom data types that can be used in the meta-info.

    The metainfo supports most types out of the box. These include the Python built-in
    primitive types (int, bool, str, float, ...), references to sections, and enums.
    However, on some occasions you need to add custom data types.
    """
    def check_type(self, value):
        pass

    def normalize(self, value):
        return value

    def to_json_serializable(self, value):
        return value

    def from_json_serializable(self, value):
        return value


class Dimension(DataType):
    def check_type(self, value):
        if isinstance(value, int):
            return

        if isinstance(value, str):
            if value.isidentifier():
                return
            if re.match(r'(\d)\.\.(\d|\*)', value):
                return

        if isinstance(value, Section):
            return

        if isinstance(value, type) and hasattr(value, 'm_def'):
            return

        raise TypeError('%s is not a valid dimension' % str(value))
    # TODO


# TODO class Unit(DataType)
# TODO class MetainfoType(DataType)
# TODO class Datetime(DataType)


class MObjectMeta(type):

    def __new__(self, cls_name, bases, dct):
        cls = super().__new__(self, cls_name, bases, dct)
        init = getattr(cls, '__init_cls__')
        if init is not None and not is_bootstrapping:
            init()
        return cls


Content = Tuple[MSectionBound, Union[List[MSectionBound], MSectionBound], str, MSectionBound]

SectionDef = Union[str, 'Section', 'SubSection', Type[MSectionBound]]
""" Type for section definition references.

This can be either:

- the name of the section
- the section definition itself
- the definition of a sub section
- or the section definition Python class
"""


class MSection(metaclass=MObjectMeta):
    """Base class for all section instances on all meta-info levels.

    All metainfo objects instantiate classes that inherit from ``MSection``. Each
    section or quantity definition is an ``MSection``, each actual (meta-)data carrying
    section is an ``MSection``. This class constitutes the reflection interface of the
    meta-info, since it allows manipulating sections (and therefore all meta-info data)
    without having to know the specific sub-class.

    It also carries all the data for each section. All sub-classes only define specific
    sections in terms of possible sub-sections and quantities. The data is managed here.

    The reflection interface for reading and manipulating quantity values consists of
    Python's built-in ``getattr``, ``setattr``, and ``del``, as well as member functions
    :func:`m_add`, and :func:`m_add_values`.

    Sub-sections and parent sections can be read and manipulated with :data:`m_parent`,
    :func:`m_sub_section`, :func:`m_create`.

    .. code-block:: python

        system = run.m_create(System)
        assert system.m_parent == run
        assert run.m_sub_section(System, system.m_parent_index) == system

    Attributes:
        m_def: The section definition that defines this section, its possible
            sub-sections and quantities.
        m_parent: The parent section instance that this section is a sub-section of.
        m_parent_index: For repeatable sections, parents keep a list of sub-sections for
            each section definition. This is the index of this section in the respective
            parent sub-section list.
        m_data: The dictionary that holds all data of this section. It keeps the quantity
            values and sub-sections. It should only be read directly (and never manipulated)
            if you know what you are doing. You should always use the reflection interface
            if possible.
    """

    m_def: 'Section' = None

    def __init__(self, m_def: 'Section' = None, m_parent: 'MSection' = None, _bs: bool = False, **kwargs):
        self.m_def: 'Section' = m_def
        self.m_parent: 'MSection' = m_parent
        self.m_parent_index = -1

        cls = self.__class__
        if self.m_def is None:
            self.m_def = cls.m_def

        if cls.m_def is not None:
            assert self.m_def == cls.m_def, \
                'Section class and section definition must match'

        self.m_annotations: Dict[str, Any] = {}
        self.m_data: Dict[str, Any] = {}
        for key, value in kwargs.items():
            if key.startswith('a_'):
                self.m_annotations[key[2:]] = value
            else:
                self.m_data[key] = value

        # TODO
        # self.m_data = {}
        # if _bs:
        #     self.m_data.update(**kwargs)
        # else:
        #     self.m_update(**kwargs)

    @classmethod
    def __init_cls__(cls):
        # ensure that the m_def is defined
        m_def = cls.m_def
        if m_def is None:
            m_def = Section()
            setattr(cls, 'm_def', m_def)

        # transfer name and description to m_def
        m_def.name = cls.__name__
        if cls.__doc__ is not None:
            m_def.description = inspect.cleandoc(cls.__doc__).strip()
        m_def.section_cls = cls

        for name, attr in cls.__dict__.items():
            # transfer names and descriptions for properties
            if isinstance(attr, Property):
                attr.name = name
                if attr.description is not None:
                    attr.description = inspect.cleandoc(attr.description).strip()
                    attr.__doc__ = attr.description

                # TODO manual manipulation of m_data due to bootstrapping
                if isinstance(attr, Quantity):
                    properties = m_def.m_data.setdefault('quantities', [])
                elif isinstance(attr, SubSection):
                    properties = m_def.m_data.setdefault('sub_sections', [])
                else:
                    raise NotImplementedError('Unknown property kind.')
                properties.append(attr)
                attr.m_parent = m_def
                attr.m_parent_index = len(properties) - 1

        # add section cls' section to the module's package
        module_name = cls.__module__
        pkg = Package.from_module(module_name)
        pkg.m_add_sub_section(cls.m_def)

    @staticmethod
    def m_type_check(definition: 'Quantity', value: Any, check_item: bool = False):
        """Checks if the value fits the given quantity in type and shape; raises
        TypeError if not."""

        if value is None and not check_item and definition.default is None:
            # Allow the default None value even if it would violate the type
            return

        def check_value(value):
            if isinstance(definition.type, Enum):
                if value not in definition.type:
                    raise TypeError('Not one of the enum values.')

            elif isinstance(definition.type, type):
                if not isinstance(value, definition.type):
                    raise TypeError('Value has wrong type.')

            elif isinstance(definition.type, Section):
                if not isinstance(value, MSection) or value.m_def != definition.type:
                    raise TypeError('The value is not a section or has the wrong section definition')

            else:
                # TODO
                # raise Exception('Invalid quantity type: %s' % str(definition.type))
                pass

        shape = None
        try:
            shape = definition.shape
        except KeyError:
            pass

        if shape is None or len(shape) == 0 or check_item:
            check_value(value)

        else:
            if type(definition.type) == np.dtype:
                if len(shape) != len(value.shape):
                    raise TypeError('Wrong shape')
            else:
                if len(shape) == 1:
                    if not isinstance(value, list):
                        raise TypeError('Wrong shape')

                    for item in value:
                        check_value(item)

                else:
                    # TODO
                    # raise Exception('Higher shapes not implemented')
                    pass

        # TODO check dimension

    def _resolve_sub_section(self, definition: SectionDef) -> 'SubSection':
        """ Resolves and checks the given section definition. """

        if isinstance(definition, type):
            # keep the class for the error message; definition is replaced below
            section_cls = definition
            definition = getattr(section_cls, 'm_def', None)
            if definition is None:
                raise TypeError(
                    'The type/class %s does not define a section, i.e. is not derived from '
                    'MSection.' % str(section_cls))

        if isinstance(definition, Section):
            sub_section = self.m_def.all_sub_sections_by_section.get(definition, None)
            if sub_section is None:
                raise KeyError(
                    'The section %s is not a sub section of %s.' %
                    (definition.name, self.m_def.name))

        elif isinstance(definition, str):
            sub_section = self.m_def.all_sub_sections[definition]

        elif isinstance(definition, SubSection):
            sub_section = definition

        else:
            raise TypeError(
                '%s does not refer to a section definition. Either use the section '
                'definition, sub section definition, section class, or name.' %
                str(definition))

        if sub_section is None:
            raise KeyError(
                'The section %s is not a sub section of %s.' %
                (cast(Definition, definition).name, self.m_def.name))

        if sub_section.m_parent is not self.m_def:
            raise KeyError(
                'The section %s is not a sub section of %s.' %
                (cast(Definition, definition).name, self.m_def.name))

        return sub_section

    def m_sub_sections(self, definition: SectionDef) -> List[MSectionBound]:
        """Returns all sub sections for the given section definition

        Args:
            definition: The definition of the section.

        Raises:
            KeyError: If the definition is not for a sub section
        """
        sub_section = self._resolve_sub_section(definition)
        return getattr(self, sub_section.name)

    def m_sub_section(self, definition: SectionDef, parent_index: int = -1) -> MSectionBound:
        """Returns the sub section for the given section definition and possible
           parent_index (for repeatable sections).

        Args:
            definition: The definition of the section.
            parent_index: The index of the desired section. This can be omitted for non
                repeatable sections. If omitted for repeatable sections, an exception
                will be raised if more than one sub-section exists. Likewise, if the given
                index is out of range.
        Raises:
            KeyError: If the definition is not for a sub section
            IndexError: If the given index is wrong, or if an index is given for a non
                repeatable section
        """
        sub_section = self._resolve_sub_section(definition)

        m_data_value = getattr(self, sub_section.name)

        if m_data_value is None:
            if sub_section.repeats:
                m_data_value = []
            else:
                m_data_value = None

        if isinstance(m_data_value, list):
            m_data_values = m_data_value
            if parent_index == -1:
                if len(m_data_values) == 1:
                    return m_data_values[0]
                else:
                    raise IndexError()
            else:
                return m_data_values[parent_index]
        else:
            if parent_index != -1:
                raise IndexError('Not a repeatable sub section.')

            return m_data_value

    def m_add_sub_section(self, sub_section: MSectionBound) -> MSectionBound:
        """Adds the given section instance as a sub section to this section."""

        sub_section_def = self._resolve_sub_section(sub_section.m_def.section_cls)
        sub_section.m_parent = self
        if sub_section_def.repeats:
            values = getattr(self, sub_section_def.name)
            sub_section.m_parent_index = len(values)
            values.append(sub_section)

        else:
            self.m_data[sub_section_def.name] = sub_section
            sub_section.m_parent_index = -1

        return sub_section

    # TODO this should work with the section constructor
    def m_create(self, definition: Type[MSectionBound], **kwargs) -> MSectionBound:
        """Creates a subsection and adds it to this section

        Args:
            definition: The section definition of the subsection. It is either the
                definition itself, or the python class representing the section definition.
            **kwargs: Are used to initialize the subsection.

        Returns:
            The created subsection

        Raises:
            KeyError: If the given section is not a subsection of this section.
        """
        sub_section: 'SubSection' = self._resolve_sub_section(definition)

        section_cls = sub_section.sub_section.section_cls
        section_instance = section_cls(m_def=section_cls.m_def, m_parent=self, **kwargs)

        return cast(MSectionBound, self.m_add_sub_section(section_instance))

    def __resolve_quantity(self, definition: Union[str, 'Quantity']) -> 'Quantity':
        """Resolves and checks the given quantity definition. """
        if isinstance(definition, str):
            quantity = self.m_def.all_quantities[definition]

        else:
            if definition.m_parent != self.m_def:
                raise KeyError('Quantity is not a quantity of this section.')
            quantity = definition

        return quantity

    def m_add(self, definition: Union[str, 'Quantity'], value: Any):
        """Adds the given value to the given quantity."""

        quantity = self.__resolve_quantity(definition)

        MSection.m_type_check(quantity, value, check_item=True)

        m_data_values = self.m_data.setdefault(quantity.name, [])
        m_data_values.append(value)

    def m_add_values(self, definition: Union[str, 'Quantity'], values: Iterable[Any]):
        """Adds the given values to the given quantity."""

        quantity = self.__resolve_quantity(definition)

        for value in values:
            MSection.m_type_check(quantity, value, check_item=True)

        m_data_values = self.m_data.setdefault(quantity.name, [])
        for value in values:
            m_data_values.append(value)

    def m_update(self, **kwargs):
        """ Updates all quantities and sub-sections with the given arguments. """
        for name, value in kwargs.items():
            prop = self.m_def.all_properties.get(name, None)
            if prop is None:
                raise KeyError('%s is not an attribute of this section' % name)

            if isinstance(prop, SubSection):
                if prop.repeats:
                    if isinstance(value, List):
                        for item in value:
                            self.m_add_sub_section(item)
                    else:
                        raise TypeError('Sub section %s repeats, but no list was given' % prop.name)
                else:
                    # a non-repeating sub section is given directly as the value
                    self.m_add_sub_section(value)

            else:
                setattr(self, name, value)

    def m_to_dict(self) -> Dict[str, Any]:
        """Returns the data of this section as a json serializable dictionary. """

        def items() -> Iterable[Tuple[str, Any]]:
            yield 'm_def', self.m_def.name
            if self.m_parent_index != -1:
                yield 'm_parent_index', self.m_parent_index

            for name, sub_section in self.m_def.all_sub_sections.items():
                if name not in self.m_data:
                    continue

                if sub_section.repeats:
                    yield name, [item.m_to_dict() for item in self.m_data[name]]
                else:
                    yield name, self.m_data[name].m_to_dict()

            for name, quantity in self.m_def.all_quantities.items():
                if name in self.m_data:
                    value = getattr(self, name)
                    if hasattr(value, 'tolist'):
                        value = value.tolist()

                    # TODO
                    if isinstance(quantity.type, Section):
                        value = str(value)
                    # TODO
                    if isinstance(value, type):
                        value = str(value)
                    # TODO
                    if isinstance(value, np.dtype):
                        value = str(value)
                    # TODO
                    if isinstance(value, _Unit):
                        value = str(value)

                    yield name, value

        return {key: value for key, value in items()}

    @classmethod
    def m_from_dict(cls: Type[MSectionBound], dct: Dict[str, Any]) -> MSectionBound:
        section_def = cls.m_def

        # remove m_def and m_parent_index, they set themselves automatically
        assert section_def.name == dct.pop('m_def', None)
        dct.pop('m_parent_index', -1)

        def items():
            for name, sub_section_def in section_def.all_sub_sections.items():
                if name in dct:
                    sub_section_value = dct.pop(name)
                    if sub_section_def.repeats:
                        yield name, [
                            sub_section_def.sub_section.section_cls.m_from_dict(sub_section_dct)
                            for sub_section_dct in sub_section_value]
                    else:
                        yield name, sub_section_def.sub_section.section_cls.m_from_dict(sub_section_value)

            for key, value in dct.items():
                yield key, value

        dct = {key: value for key, value in items()}
        section_instance = cast(MSectionBound, section_def.section_cls())
        section_instance.m_update(**dct)
        return section_instance

    def m_to_json(self, **kwargs):
        """Returns the data of this section as a json string. """
        return json.dumps(self.m_to_dict(), **kwargs)

    def m_all_contents(self) -> Iterable[Content]:
        """Returns an iterable over all direct and indirect sub-sections. """
        for content in self.m_contents():
            for sub_content in content[0].m_all_contents():
                yield sub_content

            yield content

    def m_contents(self) -> Iterable[Content]:
        """Returns an iterable over all direct sub-sections. """
        for name, attr in self.m_data.items():
            if isinstance(attr, list):
                for value in attr:
                    if isinstance(value, MSection):
                        yield value, attr, name, self

            elif isinstance(attr, MSection):
                # a single (non-repeating) sub section is both the item and its container
                yield attr, attr, name, self

    def __repr__(self):
        m_section_name = self.m_def.name
        name = ''
        if 'name' in self.m_data:
            name = self.m_data['name']

        return '%s:%s' % (name, m_section_name)


class MCategory(metaclass=MObjectMeta):

    m_def: 'Category' = None

    @classmethod
    def __init_cls__(cls):
        # ensure that the m_def is defined
        m_def = cls.m_def
        if m_def is None:
            m_def = Category()
            setattr(cls, 'm_def', m_def)

        # transfer name and description to m_def
        m_def.name = cls.__name__
        if cls.__doc__ is not None:
            m_def.description = inspect.cleandoc(cls.__doc__).strip()

        # add section cls' section to the module's package
        module_name = cls.__module__
        pkg = Package.from_module(module_name)
        pkg.m_add_sub_section(cls.m_def)


# M3, the definitions that are used to write definitions. These are the section definitions
# for sections Section and Quantity. They define themselves; i.e. the section definition
# for Section is the same section definition.
# Due to this circular nature (chicken-and-egg problem), the classes for sections Section and
# Quantity only contain placeholders for their own section and quantity definitions.
# These placeholders are replaced once the necessary classes are defined. This process
# is referred to as 'bootstrapping'.

_definition_change_counter = 0


class cached_property:
    """ A property that caches the property value.

    The cache will be invalidated whenever a new definition is added. Once all definitions
    are loaded, the cache becomes stable and complex derived results become available
    instantaneously.
    """
    def __init__(self, f):
        self.__doc__ = getattr(f, "__doc__")
        self.f = f
        self.change = -1
        self.values: Dict[type(self), Any] = {}

    def __get__(self, obj, cls):
        if obj is None:
            return self

        global _definition_change_counter
        if self.change != _definition_change_counter:
            self.values = {}

        value = self.values.get(obj, None)
        if value is None:
            value = self.f(obj)
            self.values[obj] = value

        return value


Markus Scheidgen's avatar
Markus Scheidgen committed
739
class Definition(MSection):
740

Markus Scheidgen's avatar
Markus Scheidgen committed
741
    __all_definitions: Dict[Type[MSection], List[MSection]] = {}
742

743
744
745
    name: 'Quantity' = None
    description: 'Quantity' = None
    links: 'Quantity' = None
746
    categories: 'Quantity' = None
747

748
749
750
751
752
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        global _definition_change_counter
        _definition_change_counter += 1

753
754
755
756
757
        for cls in self.__class__.mro() + [self.__class__]:
            definitions = Definition.__all_definitions.setdefault(cls, [])
            definitions.append(self)

    @classmethod
Markus Scheidgen's avatar
Markus Scheidgen committed
758
    def all_definitions(cls: Type[MSectionBound]) -> Iterable[MSectionBound]:
759
        """ Returns all definitions of this definition class. """
Markus Scheidgen's avatar
Markus Scheidgen committed
760
        return cast(Iterable[MSectionBound], Definition.__all_definitions.get(cls, []))

    @cached_property
    def all_categories(self):
        """ All direct and inherited categories of this definition. """
        all_categories = list(self.categories)
        for category in self.categories:  # pylint: disable=not-an-iterable
            for super_category in category.all_categories:
                all_categories.append(super_category)

        return all_categories


class Property(Definition):
    pass


class Quantity(Property):
    """Used to define quantities that store a certain piece of (meta-)data.

    Quantities are the basic building blocks of meta-info data. The Quantity class is
    used to define quantities within sections. A quantity definition
    is a (physics) quantity with name, type, shape, and potentially a unit.

    In Python terms, quantities are descriptors. Descriptors define how to get, set, and
    delete values for an object attribute. Meta-info descriptors ensure that
    type and shape fit the set values.
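
    The descriptor protocol mentioned above can be illustrated with a much simplified
    sketch. ``TypedAttribute`` and ``System`` are hypothetical stand-ins that only
    check the Python type and ignore shape and unit; they are not the meta-info
    implementation:

    ```python
    from typing import Any, Dict


    class TypedAttribute:
        # Hypothetical, simplified stand-in for a quantity definition.
        def __init__(self, name: str, type_: type, default: Any = None) -> None:
            self.name = name
            self.type_ = type_
            self.default = default

        def __get__(self, obj: Any, cls: Any = None) -> Any:
            if obj is None:
                return self  # accessed on the class: return the definition itself
            return obj.m_data.get(self.name, self.default)

        def __set__(self, obj: Any, value: Any) -> None:
            if not isinstance(value, self.type_):
                raise TypeError(f'{self.name} expects {self.type_.__name__}')
            obj.m_data[self.name] = value

        def __delete__(self, obj: Any) -> None:
            del obj.m_data[self.name]


    class System:
        n_atoms = TypedAttribute('n_atoms', int, default=0)

        def __init__(self) -> None:
            # All values live in a plain dict, keyed by attribute name.
            self.m_data: Dict[str, Any] = {}


    system = System()
    system.n_atoms = 3
    ```

    In this sketch, reading ``System.n_atoms`` on the class returns the definition,
    while reading it on an instance returns the stored value or the default.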
    """

    type: 'Quantity' = None
    shape: 'Quantity' = None
    unit: 'Quantity' = None
    default: 'Quantity' = None

    # TODO section = Quantity(type=Section), the section it belongs to
    # TODO synonym_for = Quantity(type=Quantity)
    # TODO derived_from = Quantity(type=Quantity, shape=['0..*'])
    # TODO categories = Quantity(type=Category, shape=['0..*'])
    # TODO converter = Quantity(type=Converter), a class with set of functions for
    #      normalizing, (de-)serializing values.

    # Some quantities of Quantity cannot be read as normal quantities due to bootstrapping.
    # Those can be accessed internally through the following replacement properties that
    # read directly from m_data.
    __name = property(lambda self: self.m_data['name'])
    __default = property(lambda self: self.m_data.get('default', None))

    def __get__(self, obj, type=None):
        if obj is None:
            # class (def) attribute case
            return self

        # object (instance) attribute case
        try:
            return obj.m_data[self.__name]
        except KeyError:
            return self.__default

    def __set__(self, obj, value):
        if obj is None:
            # class (def) case
            raise KeyError('Cannot overwrite quantity definition. Only values can be set.')

        # object (instance) case
        if isinstance(self.type, np.dtype):
            # numpy typed quantities are stored as arrays of the given dtype
            if not isinstance(value, np.ndarray) or value.dtype != self.type:
                value = np.array(value, dtype=self.type)

        elif isinstance(value, np.ndarray):
            # all other quantities are stored as (nested) Python lists
            value = value.tolist()

        MSection.m_type_check(self, value)
        obj.m_data[self.__name] = value

    def __delete__(self, obj):
        if obj is None:
            # class (def) case
            raise KeyError('Cannot delete quantity definition. Only values can be deleted.')

        # object (instance) case
        del obj.m_data[self.__name]


class SubSection(Property):
    """ Assigns a section class as a sub-section to another section class. """

    sub_section: 'Quantity' = None
    repeats: 'Quantity' = None

    def __get__(self, obj: MSection, type=None) -> Union[MSection, 'Section']:
        if obj is None:
            # the class attribute case
            return self

        else:
            # the object attribute case
            m_data_value = obj.m_data.get(self.name, None)
            if m_data_value is None:
                if self.repeats:
                    m_data_value = []
                    obj.m_data[self.name] = m_data_value

            return m_data_value

    def __set__(self, obj: MSection, value: Union[MSection, List[MSection]]):
        raise NotImplementedError('Sub sections cannot be set directly. Use m_create.')

    def __delete__(self, obj):
        raise NotImplementedError('Sub sections cannot be deleted directly.')


class Section(Definition):
    """Used to define sections that organize meta-info data into containment hierarchies.

    Section definitions determine what quantities and sub-sections can appear in a section
    instance.

    In Python terms, sections are classes. Sub-sections and quantities are attributes of
    respective instantiating objects. For each section class there is a corresponding
    :class:`Section` instance that describes this class as a section. This instance
    is referred to as 'section definition' in contrast to the Python class that we call
    'section class'.
    """

    section_cls: Type[MSection] = None
    """ The section class that corresponds to this section definition. """

    quantities: 'SubSection' = None
    sub_sections: 'SubSection' = None

    # TODO super = Quantity(type=Section, shape=['0..*']), inherit all quantity definitions
    #      from the given sections, derived from Python base classes
    # TODO extends = Quantity(type=bool), denotes this section as a container for
    #      new quantities that belong to the base-class section definitions

    @cached_property
    def all_properties(self) -> Dict[str, Union['SubSection', Quantity]]:
        """ All attribute (sub section and quantity) definitions. """

        properties: Dict[str, Union[SubSection, Quantity]] = dict(**self.all_quantities)
        properties.update(**self.all_sub_sections)
        return properties

    @cached_property
    def all_quantities(self) -> Dict[str, Quantity]:
        """ All quantity definitions of this section definition by name. """

        return {
            quantity.name: quantity
            for quantity in self.m_data.get('quantities', [])}

    @cached_property
    def all_sub_sections(self) -> Dict[str, 'SubSection']:
        """ All sub section definitions for this section definition by name. """

        return {
            sub_section.name: sub_section
            for sub_section in self.m_data.get('sub_sections', [])}

    @cached_property
    def all_sub_sections_by_section(self) -> Dict['Section', 'SubSection']:
        """ All sub section definitions for this section definition by their section definition. """
        return {
            sub_section.sub_section: sub_section
            for sub_section in self.m_data.get('sub_sections', [])}


class Package(Definition):

    section_definitions: 'SubSection'
    category_definitions: 'SubSection'

    @staticmethod
    def from_module(module_name: str):
        module = sys.modules[module_name]

        pkg: 'Package' = getattr(module, 'm_package', None)
        if pkg is None:
            pkg = Package()
            setattr(module, 'm_package', pkg)

        pkg.name = module_name
        if pkg.description is None and module.__doc__ is not None:
            pkg.description = inspect.cleandoc(module.__doc__).strip()

        return pkg


class Category(Definition):
    """Can be used to define categories for definitions.

    Each definition, including categories themselves, can belong to a set of categories.
    Categories therefore form a hierarchy of concepts that definitions can belong to, i.e.
    they form an `is a` relationship.

    In the old meta-info these were known as `abstract types`.
    """

    @cached_property
    def definitions(self) -> Iterable[Definition]:
        """ All definitions that are directly or indirectly in this category. """
        return [
            definition for definition in Definition.all_definitions()
            if self in definition.all_categories]


Section.m_def = Section(name='Section', _bs=True)
Section.m_def.m_def = Section.m_def
Section.m_def.section_cls = Section

Quantity.m_def = Section(name='Quantity', _bs=True)
SubSection.m_def = Section(name='SubSection', _bs=True)

Definition.name = Quantity(
    type=str, name='name', _bs=True, description='''
    The name of the definition. Must be unique within a section.
    ''')
Definition.description = Quantity(
    type=str, name='description', _bs=True, description='''
    An optional human readable description.
    ''')
Definition.links = Quantity(
    type=str, shape=['0..*'], name='links', _bs=True, description='''
    A list of URLs to external resources that describe this definition.
    ''')
Definition.categories = Quantity(
    type=Category.m_def, shape=['0..*'], default=[], name='categories', _bs=True,
    description='''
    The categories that this definition belongs to. See :class:`Category`.
    ''')

Section.quantities = SubSection(
    sub_section=Quantity.m_def, repeats=True,
    description='''The quantities of this section.''')

Section.sub_sections = SubSection(
    sub_section=SubSection.m_def, repeats=True,
    description='''The sub sections of this section.''')

SubSection.repeats = Quantity(
    type=bool, name='repeats', default=False, _bs=True,
    description='''Whether this sub section can appear only once or multiple times. ''')

SubSection.sub_section = Quantity(
    type=Section.m_def, name='sub_section', _bs=True, description='''
    The section definition for the sub section. Only section instances of this definition
    can be contained as sub sections.
    ''')

Quantity.m_def.section_cls = Quantity
Quantity.type = Quantity(
    type=Union[type, Enum, Section, np.dtype], name='type', _bs=True, description='''
    The type of the quantity.

    Can be one of the following:

    - ``None`` to support any value
    - a built-in primitive Python type, e.g. ``int``, ``str``
    - an instance of :class:`Enum`, e.g. ``Enum(['one', 'two', 'three'])``
    - an instance of Section, i.e. a section definition. This will define a reference
    - a custom meta-info DataType
    - a numpy dtype

    If set to a dtype, this quantity will use a numpy array to store values. It will use
    the given dtype. If not set, this quantity will use (nested) Python lists to store values.
    If values are set to the property, they will be converted to the respective
    representation.

    In the NOMAD CoE meta-info this was basically the ``dTypeStr``.
    ''')
Quantity.shape = Quantity(
    type=Dimension, shape=['0..*'], name='shape', _bs=True, description='''
    The shape of the quantity that defines its dimensionality.

    A shape is a list, where each item defines a dimension. Each dimension can be:

    - an integer that defines the exact size of the dimension, e.g. ``[3]`` is the
      shape of a spatial vector
    - the name of an int typed quantity in the same section
    - a range specification as string built from a lower bound (i.e. int number),
      and an upper bound (int or ``*`` denoting arbitrarily large), e.g. ``'0..*'``, ``'1..3'``
    ''')
Quantity.unit = Quantity(
    type=_Unit, _bs=True, description='''
    The optional physical unit for this quantity.

    Units are given in `pint` units. Pint is a Python package that defines units and
    their algebra. There is a default registry :data:`units` that you can use.
    Example units are: ``units.m``, ``units.m / units.s ** 2``.
    ''')
Quantity.default = Quantity(
    type=None, _bs=True, default=None, description='''
    The default value for this quantity.
    ''')

Package.m_def = Section(name='Package', _bs=True)

Category.m_def = Section(name='Category', _bs=True)

Package.section_definitions = SubSection(
    sub_section=Section.m_def, name='section_definitions', repeats=True,
    description=''' The sections defined in this package. ''')

Package.category_definitions = SubSection(
    sub_section=Category.m_def, name='category_definitions', repeats=True,
    description=''' The categories defined in this package. ''')

is_bootstrapping = False

Package.__init_cls__()
Category.__init_cls__()
Section.__init_cls__()
SubSection.__init_cls__()
Quantity.__init_cls__()

units = UnitRegistry()
""" The default pint unit registry that should be used to give units to quantity definitions. """