# Copyright 2018 Markus Scheidgen
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
The NOMAD meta-info allows you to define physics data quantities. These definitions are
necessary for all computer representations of the respective data (e.g. in Python,
search engines, databases, and files).

This module provides various Python interfaces for

- defining meta-info data
- creating and manipulating data that follows these definitions
- (de-)serializing meta-info data in JSON (i.e. representing data in JSON formatted files)

Here is a simple example that demonstrates the definition of System related quantities:

.. code-block:: python

    class Run(MSection):
        pass

    class System(MSection):
        \"\"\"
        A system section includes all quantities that describe a single simulated
        system (a.k.a. geometry).
        \"\"\"

        m_def = Section(repeats=True, parent=Run)

        n_atoms = Quantity(
            type=int, description='''
            Defines the number of atoms in the system.
            ''')

        atom_labels = Quantity(type=Enum(ase.data.chemical_symbols), shape=['n_atoms'])
        atom_positions = Quantity(type=float, shape=['n_atoms', 3], unit=Units.m)
        simulation_cell = Quantity(type=float, shape=[3, 3], unit=Units.m)
        pbc = Quantity(type=bool, shape=[3])

Here, we define a `section` called ``System``. The section mechanism allows you to organize
related data into, well, sections. Sections form containment hierarchies. Here,
containment is a parent-child (whole-part) relationship. In this example, many ``System``
sections are part of one ``Run``. Each ``System`` can contain values for the defined quantities:
``n_atoms``, ``atom_labels``, ``atom_positions``, ``simulation_cell``, and ``pbc``.
Quantities allow you to state type, shape, and physical unit to specify possible quantity
values.
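
To illustrate how a shape such as ``['n_atoms', 3]`` constrains the possible values,
here is a minimal, self-contained sketch. It is not the check performed by this module;
the helper ``check_shape`` and its ``context`` argument are hypothetical names that
exist only for this illustration, and named dimensions are assumed to be resolved from a
plain dictionary.

```python
from typing import Any, Dict, List, Union


def check_shape(value: Any, shape: List[Union[int, str]], context: Dict[str, int]) -> bool:
    # Hypothetical helper: returns True if the nested list ``value`` matches ``shape``.
    # Named dimensions (e.g. 'n_atoms') are looked up in ``context``.
    if len(shape) == 0:
        # A scalar is expected here, so nested lists are not allowed.
        return not isinstance(value, list)

    if not isinstance(value, list):
        return False

    dimension = shape[0]
    expected = context[dimension] if isinstance(dimension, str) else dimension
    if len(value) != expected:
        return False

    # Check the remaining dimensions on every item.
    return all(check_shape(item, shape[1:], context) for item in value)


positions = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
print(check_shape(positions, ['n_atoms', 3], {'n_atoms': 3}))  # prints True
print(check_shape(positions, ['n_atoms', 3], {'n_atoms': 2}))  # prints False
```

The sketch resolves ``'n_atoms'`` from a dictionary to stay self-contained; a real check
would presumably read the value of the ``n_atoms`` quantity from the same section.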

Here is an example where we use the above definitions to create, read, and manipulate
data that follows them:

.. code-block:: python

    run = Run()
    system = run.m_create(System)
    system.n_atoms = 3
    system.atom_labels = ['H', 'H', 'O']

    print(system.atom_labels)
    print(run.m_to_json(indent=2))

This last statement will produce the following JSON:

.. code-block:: JSON

    {
        "m_def": "Run",
        "System": [
            {
                "m_def": "System",
                "m_parent_index": 0,
                "n_atoms": 3,
                "atom_labels": [
                    "H",
                    "H",
                    "O"
                ]
            }
        ]
    }

This is the JSON representation, a serialized version of the Python representation in
the example above.
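
Because the serialized form is plain JSON, it can be inspected with the standard library
alone. The following sketch writes the dictionary from the example out by hand (it does
not call into this module) and shows that it survives a round trip through
``json.dumps`` and ``json.loads``. The variable names are chosen for this illustration.

```python
import json

# The dictionary form of the run from the example above, written out by hand.
run_dict = {
    'm_def': 'Run',
    'System': [
        {
            'm_def': 'System',
            'm_parent_index': 0,
            'n_atoms': 3,
            'atom_labels': ['H', 'H', 'O']
        }
    ]
}

serialized = json.dumps(run_dict, indent=2)
restored = json.loads(serialized)

# Serializing and parsing again yields an equal dictionary.
assert restored == run_dict
print(restored['System'][0]['atom_labels'])
```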

Sections can be extended with new quantities outside the original section definition.
This provides the key mechanism to extend commonly defined parts with (code) specific
quantities:

.. code-block:: Python

    class Method(nomad.metainfo.common.Method):
        x_vasp_incar_ALGO=Quantity(
            type=Enum(['Normal', 'VeryFast', ...]),
            links=['https://cms.mpi.univie.ac.at/wiki/index.php/ALGO'])
        \"\"\"
        A convenient option to specify the electronic minimisation algorithm (as of VASP.4.5)
        and/or to select the type of GW calculations.
        \"\"\"


All meta-info definitions and classes for meta-info data objects (i.e. section instances)
inherit from :class:`MSection`. This base-class provides common functions and attributes
for all meta-info data objects. Names of these common parts are prefixed with ``m_``
to distinguish them from user defined quantities. This also constitutes the `reflection`
interface (in addition to Python's built-in ``getattr``, ``setattr``) that allows you to
create and manipulate meta-info data without prior (program-time) knowledge of the underlying
definitions.
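
As a rough illustration of what working through reflection means, the sketch below sets
values by name using only ``hasattr``, ``setattr``, and ``getattr``. ``ExampleSection``
and ``set_values`` are hypothetical stand-ins written for this illustration; they are not
part of this module and do not perform any meta-info type checking.

```python
from typing import Any, Dict


class ExampleSection:
    # Hypothetical stand-in for a section class; not part of this module.
    n_atoms = None
    atom_labels = None


def set_values(section: Any, values: Dict[str, Any]) -> None:
    # Sets attributes by name, without knowing the concrete class of ``section``.
    for name, value in values.items():
        if not hasattr(section, name):
            raise KeyError('%s is not defined on %s' % (name, type(section).__name__))
        setattr(section, name, value)


section = ExampleSection()
set_values(section, {'n_atoms': 3, 'atom_labels': ['H', 'H', 'O']})
print(getattr(section, 'n_atoms'))  # prints 3
```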

.. autoclass:: MSection

The following classes can be used to define and structure meta-info data:

- sections are defined by sub-classing :class:`MSection` and using :class:`Section` to
  populate the class attribute `m_def`
- quantities are defined by assigning class attributes of a section with :class:`Quantity`
  instances
- references (from one section to another) can be defined with quantities that use
  section definitions as type
- dimensions can be defined by simply using quantity names in shapes
- categories (formerly `abstract type definitions`) can be given in quantity definitions
  to assign quantities to additional specialization-generalization hierarchies

See the reference of classes :class:`Section` and :class:`Quantity` for details.
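
Quantities are assigned as class attributes and check the values that are set on section
instances. In plain Python, this pattern is a descriptor. The following is a simplified
sketch of that idea, not the implementation in this module: ``TypedAttribute`` and
``ExampleSystem`` are hypothetical names, only the type is checked (no shape or unit),
and the attribute name is obtained through ``__set_name__`` rather than through the
metaclass mechanism used here.

```python
class TypedAttribute:
    # Hypothetical, simplified stand-in for a quantity definition.

    def __init__(self, type_):
        self.type_ = type_
        self.name = None

    def __set_name__(self, owner, name):
        # Called by Python when the owning class body is evaluated.
        self.name = name

    def __get__(self, obj, owner=None):
        if obj is None:
            # Accessed on the class: return the definition itself.
            return self
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        if not isinstance(value, self.type_):
            raise TypeError('Value has wrong type.')
        obj.__dict__[self.name] = value


class ExampleSystem:
    n_atoms = TypedAttribute(int)


system = ExampleSystem()
system.n_atoms = 3
print(system.n_atoms)  # prints 3

try:
    system.n_atoms = 'three'
except TypeError as e:
    print(e)  # prints Value has wrong type.
```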

.. autoclass:: Section
.. autoclass:: Quantity
"""

# TODO validation

from typing import Type, TypeVar, Union, Tuple, Iterable, List, Any, Dict, cast
import sys
import inspect
import re
import json

import numpy as np
from pint.unit import _Unit
from pint import UnitRegistry
import inflection

is_bootstrapping = True
MSectionBound = TypeVar('MSectionBound', bound='MSection')


# Reflection

class Enum(list):
    """ Allows you to define ``str`` types with values limited to a pre-set list of possible values. """
    pass


class DataType:
    """
    Allows you to define custom data types that can be used in the meta-info.

    The metainfo supports most types out of the box. These include the Python built-in
    primitive types (int, bool, str, float, ...), references to sections, and enums.
    However, on some occasions you need to add custom data types.
    """
    def check_type(self, value):
        pass

    def normalize(self, value):
        return value

    def to_json_serializable(self, value):
        return value

    def from_json_serializable(self, value):
        return value


class Dimension(DataType):
    def check_type(self, value):
        if isinstance(value, int):
            return

        if isinstance(value, str):
            if value.isidentifier():
                return
            if re.match(r'(\d)\.\.(\d|\*)', value):
                return

        if isinstance(value, Section):
            return

        if isinstance(value, type) and hasattr(value, 'm_def'):
            return

        raise TypeError('%s is not a valid dimension' % str(value))
    # TODO


# TODO class Unit(DataType)
# TODO class MetainfoType(DataType)
# TODO class Datetime(DataType)


class MObjectMeta(type):

    def __new__(self, cls_name, bases, dct):
        cls = super().__new__(self, cls_name, bases, dct)
        init = getattr(cls, '__init_cls__', None)
        if init is not None and not is_bootstrapping:
            init()
        return cls


Content = Tuple[MSectionBound, Union[List[MSectionBound], MSectionBound], str, MSectionBound]
SectionDef = Union[str, 'Section', Type[MSectionBound]]


class MSection(metaclass=MObjectMeta):
    """Base class for all section instances on all meta-info levels.

    All metainfo objects instantiate classes that inherit from ``MSection``. Each
    section or quantity definition is an ``MSection``, each actual (meta-)data carrying
    section is an ``MSection``. This class constitutes the reflection interface of the
    meta-info, since it allows to manipulate sections (and therefore all meta-info data)
    without having to know the specific sub-class.

    It also carries all the data for each section. All sub-classes only define specific
    sections in terms of possible sub-sections and quantities. The data is managed here.

    The reflection interface for reading and manipulating quantity values consists of
    Python's built-in ``getattr``, ``setattr``, and ``del``, as well as member functions
    :func:`m_add`, and :func:`m_add_values`.

    Sub-sections and parent sections can be read and manipulated with :data:`m_parent`,
    :func:`m_sub_section`, :func:`m_create`.

    .. code-block:: python

        system = run.m_create(System)
        assert system.m_parent == run
        assert run.m_sub_section(System, system.m_parent_index) == system

    Attributes:
        m_def: The section definition that defines this section, its possible
            sub-sections and quantities.
        m_parent: The parent section instance that this section is a sub-section of.
        m_parent_index: For repeatable sections, parents keep a list of sub-sections for
            each section definition. This is the index of this section in the respective
            parent sub-section list.
        m_data: The dictionary that holds all data of this section. It keeps the quantity
            values and sub-sections. It should only be read directly (and never manipulated)
            if you know what you are doing. You should always use the reflection interface
            if possible.
    """

    m_def: 'Section' = None

    def __init__(self, m_def: 'Section' = None, m_parent: 'MSection' = None, _bs: bool = False, **kwargs):
        self.m_def: 'Section' = m_def
        self.m_parent: 'MSection' = m_parent
        self.m_parent_index = -1

        cls = self.__class__
        if self.m_def is None:
            self.m_def = cls.m_def

        if cls.m_def is not None:
            assert self.m_def == cls.m_def, \
                'Section class and section definition must match'

        self.m_annotations: Dict[str, Any] = {}
        self.m_data: Dict[str, Any] = {}
        for key, value in kwargs.items():
            if key.startswith('a_'):
                self.m_annotations[key[2:]] = value
            else:
                self.m_data[key] = value

        # TODO
        # self.m_data = {}
        # if _bs:
        #     self.m_data.update(**kwargs)
        # else:
        #     self.m_update(**kwargs)

    @classmethod
    def __init_cls__(cls):
        # ensure that the m_def is defined
        m_def = cls.m_def
        if m_def is None:
            m_def = Section()
            setattr(cls, 'm_def', m_def)

        # transfer name and description to m_def
        m_def.name = cls.__name__
        if cls.__doc__ is not None:
            m_def.description = inspect.cleandoc(cls.__doc__).strip()
        m_def.section_cls = cls

        # add sub_section to parent section
        if m_def.parent is not None:
            sub_section_name = inflection.underscore(m_def.name)
            setattr(m_def.parent.section_cls, sub_section_name, sub_section(m_def))

        for name, attr in cls.__dict__.items():
            # transfer names and descriptions for quantities
            if isinstance(attr, Quantity):
                attr.name = name
                if attr.description is not None:
                    attr.description = inspect.cleandoc(attr.description).strip()
                    attr.__doc__ = attr.description
                # manual manipulation of m_data due to bootstrapping
                m_def.m_data.setdefault('Quantity', []).append(attr)

            # set names and parent on sub-sections
            elif isinstance(attr, sub_section):
                attr.section_def.parent = m_def
                if attr.section_def.name is None:
                    attr.section_def.name = inflection.camelize(name)

        # add section cls' section to the module's package
        module_name = cls.__module__
        pkg = Package.from_module(module_name)
        pkg.m_add_sub_section(cls.m_def)

    @staticmethod
    def m_type_check(definition: 'Quantity', value: Any, check_item: bool = False):
        """Checks if the value fits the given quantity in type and shape; raises
        TypeError if not."""

        if value is None and not check_item and definition.default is None:
            # Allow the default None value even if it would violate the type
            return

        def check_value(value):
            if isinstance(definition.type, Enum):
                if value not in definition.type:
                    raise TypeError('Not one of the enum values.')

            elif isinstance(definition.type, type):
                if not isinstance(value, definition.type):
                    raise TypeError('Value has wrong type.')

            elif isinstance(definition.type, Section):
                if not isinstance(value, MSection) or value.m_def != definition.type:
                    raise TypeError('The value is not a section or has the wrong section definition')

            else:
                # TODO
                # raise Exception('Invalid quantity type: %s' % str(definition.type))
                pass

        shape = None
        try:
            shape = definition.shape
        except KeyError:
            pass

        if shape is None or len(shape) == 0 or check_item:
            check_value(value)

        else:
            if type(definition.type) == np.dtype:
                if len(shape) != len(value.shape):
                    raise TypeError('Wrong shape')
            else:
                if len(shape) == 1:
                    if not isinstance(value, list):
                        raise TypeError('Wrong shape')

                    for item in value:
                        check_value(item)

                else:
                    # TODO
                    # raise Exception('Higher shapes not implemented')
                    pass

        # TODO check dimension

    def _resolve_section(self, definition: SectionDef) -> 'Section':
        """Resolves and checks the given section definition. """
        if isinstance(definition, str):
            section = self.m_def.sub_sections[definition]

        else:
            if isinstance(definition, type):
                section = getattr(definition, 'm_def')
            else:
                section = definition
            if section.name not in self.m_def.sub_sections:
                raise KeyError('Not a sub section.')

        return section

    def m_sub_sections(self, definition: SectionDef) -> List[MSectionBound]:
        """Returns all sub sections for the given section definition

        Args:
            definition: The definition of the section.

        Raises:
            KeyError: If the definition is not for a sub section
        """
        section_def = self._resolve_section(definition)

        m_data_value = self.m_data.get(section_def.name, None)

        if m_data_value is None:
            return []

        if section_def.repeats:
            return m_data_value
        else:
            return [m_data_value]

    def m_sub_section(self, definition: SectionDef, parent_index: int = -1) -> MSectionBound:
        """Returns the sub section for the given section definition and possible
           parent_index (for repeatable sections).

        Args:
            definition: The definition of the section.
            parent_index: The index of the desired section. This can be omitted for non
                repeatable sections. If omitted for repeatable sections, an exception
                will be raised if more than one sub-section exists. Likewise, if the given
                index is out of range.

        Raises:
            KeyError: If the definition is not for a sub section
            IndexError: If the given index is wrong, or if an index is given for a non
                repeatable section
        """
        section_def = self._resolve_section(definition)

        m_data_value = self.m_data.get(section_def.name, None)

        if m_data_value is None:
            if section_def.repeats:
                m_data_value = []
            else:
                m_data_value = None

        if isinstance(m_data_value, list):
            m_data_values = m_data_value
            if parent_index == -1:
                if len(m_data_values) == 1:
                    return m_data_values[0]
                else:
                    raise IndexError()
            else:
                return m_data_values[parent_index]
        else:
            if parent_index != -1:
                raise IndexError('Not a repeatable sub section.')

            return m_data_value

    def m_add_sub_section(self, sub_section: MSectionBound) -> MSectionBound:
        """Adds the given section instance as a sub section to this section."""

        section_def = sub_section.m_def

        if section_def.repeats:
            m_data_sections = self.m_data.setdefault(section_def.name, [])
            section_index = len(m_data_sections)
            m_data_sections.append(sub_section)
            sub_section.m_parent_index = section_index
        else:
            self.m_data[section_def.name] = sub_section

        return sub_section

    # TODO this should work with the section constructor
    def m_create(self, definition: Type[MSectionBound], **kwargs) -> MSectionBound:
        """Creates a subsection and adds it to this section

        Args:
            definition: The section definition of the subsection. It is either the
                definition itself, or the python class representing the section definition.
            **kwargs: Are used to initialize the subsection.

        Returns:
            The created subsection

        Raises:
            KeyError: If the given section is not a subsection of this section.
        """
        section_def: 'Section' = self._resolve_section(definition)

        section_cls = section_def.section_cls
        section_instance = section_cls(m_def=section_def, m_parent=self, **kwargs)

        return cast(MSectionBound, self.m_add_sub_section(section_instance))

    def __resolve_quantity(self, definition: Union[str, 'Quantity']) -> 'Quantity':
        """Resolves and checks the given quantity definition. """
        if isinstance(definition, str):
            quantity = self.m_def.quantities[definition]

        else:
            if definition.m_parent != self.m_def:
                raise KeyError('Quantity is not a quantity of this section.')
            quantity = definition

        return quantity

    def m_add(self, definition: Union[str, 'Quantity'], value: Any):
        """Adds the given value to the given quantity."""

        quantity = self.__resolve_quantity(definition)

        MSection.m_type_check(quantity, value, check_item=True)

        m_data_values = self.m_data.setdefault(quantity.name, [])
        m_data_values.append(value)

    def m_add_values(self, definition: Union[str, 'Quantity'], values: Iterable[Any]):
        """Adds the given values to the given quantity."""

        quantity = self.__resolve_quantity(definition)

        for value in values:
            MSection.m_type_check(quantity, value, check_item=True)

        m_data_values = self.m_data.setdefault(quantity.name, [])
        for value in values:
            m_data_values.append(value)

    def m_update(self, **kwargs):
        """ Updates all quantities and sub-sections with the given arguments. """
        for name, value in kwargs.items():
            attribute = self.m_def.attributes.get(name, None)
            if attribute is None:
                raise KeyError('%s is not an attribute of this section' % name)

            if isinstance(attribute, Section):
                if attribute.repeats:
                    if isinstance(value, List):
                        for item in value:
                            self.m_add_sub_section(item)
                    else:
                        raise TypeError('Sub section %s repeats, but no list was given' % attribute.name)
                else:
                    self.m_add_sub_section(value)

            else:
                setattr(self, name, value)

    def m_to_dict(self) -> Dict[str, Any]:
        """Returns the data of this section as a json serializable dictionary. """

        def items() -> Iterable[Tuple[str, Any]]:
            yield 'm_def', self.m_def.name
            if self.m_parent_index != -1:
                yield 'm_parent_index', self.m_parent_index

            for name, sub_section in self.m_def.sub_sections.items():
                if name not in self.m_data:
                    continue

                if sub_section.repeats:
                    yield name, [item.m_to_dict() for item in self.m_data[name]]
                else:
                    yield name, self.m_data[name].m_to_dict()

            for name, quantity in self.m_def.quantities.items():
                if name in self.m_data:
                    value = getattr(self, name)
                    if hasattr(value, 'tolist'):
                        value = value.tolist()

                    # TODO
                    if isinstance(quantity.type, Section):
                        value = str(value)
                    # TODO
                    if isinstance(value, type):
                        value = str(value)
                    # TODO
                    if isinstance(value, np.dtype):
                        value = str(value)
                    # TODO
                    if isinstance(value, _Unit):
                        value = str(value)

                    yield name, value

        return {key: value for key, value in items()}

    @classmethod
    def m_from_dict(cls: Type[MSectionBound], dct: Dict[str, Any]) -> MSectionBound:
        section_def = cls.m_def

        # remove m_def and m_parent_index, they set themselves automatically
        assert section_def.name == dct.pop('m_def', None)
        dct.pop('m_parent_index', -1)

        def items():
            for name, sub_section_def in section_def.sub_sections.items():
                if name in dct:
                    sub_section_value = dct.pop(name)
                    if sub_section_def.repeats:
                        yield name, [
                            sub_section_def.section_cls.m_from_dict(sub_section_dct)
                            for sub_section_dct in sub_section_value]
                    else:
                        yield name, sub_section_def.section_cls.m_from_dict(sub_section_value)

            for key, value in dct.items():
                yield key, value

        dct = {key: value for key, value in items()}
        section_instance = cast(MSectionBound, section_def.section_cls())
        section_instance.m_update(**dct)
        return section_instance

    def m_to_json(self, **kwargs):
        """Returns the data of this section as a json string. """
        return json.dumps(self.m_to_dict(), **kwargs)

    def m_all_contents(self) -> Iterable[Content]:
        """Returns an iterable over all sub-sections and their sub-sections. """
        for content in self.m_contents():
            for sub_content in content[0].m_all_contents():
                yield sub_content

            yield content

    def m_contents(self) -> Iterable[Content]:
        """Returns an iterable over all direct sub-sections. """
        for name, attr in self.m_data.items():
            if isinstance(attr, list):
                for value in attr:
                    if isinstance(value, MSection):
                        yield value, attr, name, self

            elif isinstance(attr, MSection):
                yield attr, attr, name, self

    def __repr__(self):
        m_section_name = self.m_def.name
        name = ''
        if 'name' in self.m_data:
            name = self.m_data['name']

        return '%s:%s' % (name, m_section_name)


class MCategory(metaclass=MObjectMeta):

    m_def: 'Category' = None

    @classmethod
    def __init_cls__(cls):
        # ensure that the m_def is defined
        m_def = cls.m_def
        if m_def is None:
            m_def = Category()
            setattr(cls, 'm_def', m_def)

        # transfer name and description to m_def
        m_def.name = cls.__name__
        if cls.__doc__ is not None:
            m_def.description = inspect.cleandoc(cls.__doc__).strip()

        # add section cls' section to the module's package
        module_name = cls.__module__
        pkg = Package.from_module(module_name)
        pkg.m_add_sub_section(cls.m_def)


# M3, the definitions that are used to write definitions. These are the section definitions
# for sections Section and Quantity. They define themselves; i.e. the section definition
# for Section is the same section definition.
# Due to this circular nature (chicken-and-egg problem), the classes for sections Section and
# Quantity only contain placeholders for their own section and quantity definitions.
# These placeholders are replaced once the necessary classes are defined. This process
# is referred to as 'bootstrapping'.

_definition_change_counter = 0


class cached_property:
    """ A property that allows to cache the property value.

    The cache will be invalidated whenever a new definition is added. Once all definitions
    are loaded, the cache becomes stable and complex derived results become available
    instantaneously.
    """
    def __init__(self, f):
        self.__doc__ = getattr(f, "__doc__")
        self.f = f
        self.change = -1
        self.values: Dict[type(self), Any] = {}

    def __get__(self, obj, cls):
        if obj is None:
            return self

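        global _definition_change_counter
        if self.change != _definition_change_counter:
            # Invalidate cached values and remember the counter they are valid for.
            self.values = {}
            self.change = _definition_change_counter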

        value = self.values.get(obj, None)
        if value is None:
            value = self.f(obj)
            self.values[obj] = value

        return value


class Definition(MSection):

    __all_definitions: Dict[Type[MSection], List[MSection]] = {}

    name: 'Quantity' = None
    description: 'Quantity' = None
    links: 'Quantity' = None
    categories: 'Quantity' = None

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        global _definition_change_counter
        _definition_change_counter += 1

        # mro() already includes the class itself, so it must not be added again
        for cls in self.__class__.mro():
            definitions = Definition.__all_definitions.setdefault(cls, [])
            definitions.append(self)

    @classmethod
    def all_definitions(cls: Type[MSectionBound]) -> Iterable[MSectionBound]:
        """ Returns all definitions of this definition class. """
        return cast(Iterable[MSectionBound], Definition.__all_definitions.get(cls, []))

    @cached_property
    def all_categories(self):
        """ All categories of this definition and its categories. """
        all_categories = list(self.categories)
        for category in self.categories:  # pylint: disable=not-an-iterable
            for super_category in category.all_categories:
                all_categories.append(super_category)

        return all_categories


class Quantity(Definition):
    """Used to define quantities that store a certain piece of (meta-)data.

    Quantities are the basic building block of meta-info data. The Quantity class is
    used to define quantities within sections. A quantity definition
    is a (physics) quantity with name, type, shape, and potentially a unit.

    In Python terms, quantities are descriptors. Descriptors define how to get, set, and
    delete values for an object attribute. Meta-info descriptors ensure that
    type and shape fit the set values.
    """

    type: 'Quantity' = None
    shape: 'Quantity' = None
    unit: 'Quantity' = None
    default: 'Quantity' = None

    # TODO section = Quantity(type=Section), the section it belongs to
    # TODO synonym_for = Quantity(type=Quantity)
    # TODO derived_from = Quantity(type=Quantity, shape=['0..*'])
    # TODO categories = Quantity(type=Category, shape=['0..*'])
    # TODO converter = Quantity(type=Converter), a class with set of functions for
    #      normalizing, (de-)serializing values.

    # Some quantities of Quantity cannot be read as normal quantities due to bootstrapping.
    # Those can be accessed internally through the following replacement properties that
    # read directly from m_data.
    __name = property(lambda self: self.m_data['name'])
    __default = property(lambda self: self.m_data.get('default', None))

    def __get__(self, obj, type=None):
        if obj is None:
            # class (def) attribute case
            return self

        # object (instance) attribute case
        try:
            return obj.m_data[self.__name]
        except KeyError:
            return self.__default

    def __set__(self, obj, value):
        if obj is None:
            # class (def) case
            raise KeyError('Cannot overwrite quantity definition. Only values can be set.')

        # object (instance) case
        # isinstance is used instead of an exact type comparison, because numpy dtype
        # instances can be instances of np.dtype subclasses.
        if isinstance(self.type, np.dtype):
            # numpy typed quantities always store an array of the defined dtype
            if not isinstance(value, np.ndarray) or value.dtype != self.type:
                value = np.array(value, dtype=self.type)

        elif isinstance(value, np.ndarray):
            # quantities without a dtype store (nested) Python lists
            value = value.tolist()

        MSection.m_type_check(self, value)
        obj.m_data[self.__name] = value

    def __delete__(self, obj):
        if obj is None:
            # class (def) case
            raise KeyError('Cannot delete quantity definition. Only values can be deleted.')

        # object (instance) case
        del obj.m_data[self.__name]


class Section(Definition):
    """Used to define section that organize meta-info data into containment hierarchies.

    Section definitions determine what quantities and sub-sections can appear in a section
    instance. A section instance itself can appear potentially many times in its parent
    section. See :data:`repeats` and :data:`parent`.

    In Python terms, sections are classes. Sub-sections and quantities are attributes of
    the respective instantiating objects. For each section class there is a corresponding
    :class:`Section` instance that describes this class as a section. This instance
    is referred to as 'section definition' in contrast to the Python class that we call
    'section class'.
    """

    section_cls: Type[MSection] = None
    """ The section class that corresponds to this section definition. """

    repeats: 'Quantity' = None
    parent: 'Quantity' = None

    # TODO super = Quantity(type=Section, shape=['0..*']), inherit all quantity definition
    #      from the given sections, derived from Python base classes
    # TODO extends = Quantity(type=bool), denotes this section as a container for
    #      new quantities that belong to the base-class section definitions

    def __init__(self, **kwargs):
        # The mechanism that produces default values, depends on parent. Without setting
        # the parent default manually, an endless recursion will occur.
        kwargs.setdefault('parent', None)

        super().__init__(**kwargs)

    @cached_property
    def attributes(self) -> Dict[str, Union['Section', Quantity]]:
        """ All attribute (sub section and quantity) definitions. """

        attributes: Dict[str, Union[Section, Quantity]] = dict(**self.quantities)
        attributes.update(**self.sub_sections)
        return attributes

    @cached_property
    def quantities(self) -> Dict[str, Quantity]:
        """ All quantity definitions in the given section definition. """

        return {
            quantity.name: quantity
            for quantity in self.m_data.get('Quantity', [])}

    @cached_property
    def sub_sections(self) -> Dict[str, 'Section']:
        """ All sub section definitions for this section definition. """

        return {
            sub_section.name: sub_section
            for sub_section in Section.all_definitions()
            if sub_section.parent == self}

    def add_quantity(self, quantity: Quantity):
        """
        Adds the given quantity to this section.

        Allows a quantity to be added to a section definition from outside the
        corresponding section class.

        .. code-block:: python

            class System(MSection):
                pass

            System.m_def.add_quantity(Quantity(name='n_atoms', type=int))

        This will add the quantity definition to this section definition,
        and add the respective Python descriptor as an attribute to this class.
        """
        quantities = self.m_data.setdefault('Quantity', [])
        quantities.append(quantity)

        setattr(self.section_cls, quantity.name, quantity)


class Package(Definition):

    @staticmethod
    def from_module(module_name: str):
        module = sys.modules[module_name]

        pkg: 'Package' = getattr(module, 'm_package', None)
        if pkg is None:
            pkg = Package()
            setattr(module, 'm_package', pkg)

        pkg.name = module_name
        if pkg.description is None and module.__doc__ is not None:
            pkg.description = inspect.cleandoc(module.__doc__).strip()

        return pkg


class sub_section:
    """ Allows to assign a section class as a sub-section to another section class. """

    def __init__(self, section: SectionDef, **kwargs):
        if isinstance(section, type):
            self.section_def = cast(MSection, section).m_def
        else:
            self.section_def = cast(Section, section)

    def __get__(self, obj: MSection, type=None) -> Union[MSection, Section]:
        if obj is None:
            # the class attribute case
            return self.section_def

        else:
            # the object attribute case
            m_data_value = obj.m_data.get(self.section_def.name, None)
            if m_data_value is None:
                if self.section_def.repeats:
                    m_data_value = []

            return m_data_value

    def __set__(self, obj: MSection, value: Union[MSection, List[MSection]]):
        raise NotImplementedError('Sub sections cannot be set directly. Use m_create.')

    def __delete__(self, obj):
        raise NotImplementedError('Sub sections cannot be deleted directly.')


class Category(Definition):
    """Can be used to define categories for definitions.

    Each definition, including categories themselves, can belong to a set of categories.
    Categories therefore form a hierarchy of concepts that definitions can belong to, i.e.
    they form an `is a` relationship.

    In the old meta-info this was known as `abstract types`.
    """

    @cached_property
    def definitions(self) -> Iterable[Definition]:
        """ All definitions that are directly or indirectly in this category. """
        return [
            definition for definition in Definition.all_definitions()
            if self in definition.all_categories]


Section.m_def = Section(repeats=True, name='Section', _bs=True)
Section.m_def.m_def = Section.m_def
Section.m_def.section_cls = Section

Quantity.m_def = Section(repeats=True, parent=Section.m_def, name='Quantity', _bs=True)

Definition.name = Quantity(
    type=str, name='name', _bs=True, description='''
    The name of the quantity. Must be unique within a section.
    ''')
Definition.description = Quantity(
    type=str, name='description', _bs=True, description='''
    An optional human readable description.
    ''')
Definition.links = Quantity(
    type=str, shape=['0..*'], name='links', _bs=True, description='''
    A list of URLs to external resources that describe this definition.
    ''')
Definition.categories = Quantity(
    type=Category.m_def, shape=['0..*'], default=[], name='categories', _bs=True,
    description='''
    The categories that this definition belongs to. See :class:`Category`.
    ''')

Section.repeats = Quantity(
    type=bool, name='repeats', default=False, _bs=True,
    description='''
    Whether instances of this section can occur repeatedly in the parent section.
    ''')
Section.parent = Quantity(
    type=Section.m_def, name='parent', _bs=True, description='''
    The section definition of the parent section. Only instances of that parent section
    can contain instances of this section.
    ''')

Quantity.m_def.section_cls = Quantity
Quantity.type = Quantity(
    type=Union[type, Enum, Section, np.dtype], name='type', _bs=True, description='''
    The type of the quantity.

    Can be one of the following:

    - ``None`` to support any value
    - a built-in primitive Python type, e.g. ``int``, ``str``
    - an instance of :class:`Enum`, e.g. ``Enum(['one', 'two', 'three'])``
    - an instance of Section, i.e. a section definition. This will define a reference
    - a custom meta-info DataType
    - a numpy dtype

    If set to a dtype, this quantity will use a numpy array to store values. It will use
    the given dtype. If not set, this quantity will use (nested) Python lists to store values.
    If values are set to the property, they will be converted to the respective
    representation.

    In the NOMAD CoE meta-info this was basically the ``dTypeStr``.
    ''')
Quantity.shape = Quantity(
    type=Dimension, shape=['0..*'], name='shape', _bs=True, description='''
    The shape of the quantity that defines its dimensionality.

    A shape is a list, where each item defines a dimension. Each dimension can be:

    - an integer that defines the exact size of the dimension, e.g. ``[3]`` is the
      shape of a spatial vector
    - the name of an int typed quantity in the same section
    - a range specification as a string built from a lower bound (i.e. an int number)
      and an upper bound (int or ``*`` denoting arbitrarily large), e.g. ``'0..*'``, ``'1..3'``
    ''')
Quantity.unit = Quantity(
    type=_Unit, _bs=True, description='''
    The optional physical unit for this quantity.

    Units are given in `pint` units. Pint is a Python package that defines units and
    their algebra. There is a default registry :data:`units` that you can use.
    Example units are: ``units.m``, ``units.m / units.s ** 2``.
    ''')
Quantity.default = Quantity(
    type=None, _bs=True, default=None, description='''
    The default value for this quantity.
    ''')

Package.m_def = Section(repeats=True, name='Package', _bs=True)
Package.m_def.parent = Package.m_def

Section.m_def.parent = Package.m_def

Category.m_def = Section(repeats=True, parent=Package.m_def)

is_bootstrapping = False

Package.__init_cls__()
Category.__init_cls__()
Section.__init_cls__()
Quantity.__init_cls__()

units = UnitRegistry()
""" The default pint unit registry that should be used to give units to quantity definitions. """