# Copyright 2018 Markus Scheidgen
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
The NOMAD meta-info allows you to define physics data quantities. These definitions are
necessary for all computer representations of the respective data (e.g. in Python,
search engines, databases, and files).

This module provides various Python interfaces for

- defining meta-info data
- creating and manipulating data that follows these definitions
- (de-)serializing meta-info data in JSON (i.e. representing data in JSON formatted files)

Here is a simple example that demonstrates the definition of System related quantities:

.. code-block:: python

    class Run(MObject):
        pass

    class System(MObject):
        \"\"\"
        A system section includes all quantities that describe a single simulated
        system (a.k.a. geometry).
        \"\"\"

        m_section = Section(repeats=True, parent=Run.m_section)

        n_atoms = Quantity(type=int)
        \"\"\" Defines the number of atoms in the system. \"\"\"

        atom_labels = Quantity(type=Enum(ase.data.chemical_symbols), shape=['n_atoms'])
        atom_positions = Quantity(type=float, shape=['n_atoms', 3], unit=units.m)
        simulation_cell = Quantity(type=float, shape=[3, 3], unit=units.m)
        pbc = Quantity(type=bool, shape=[3])

Here, we define a `section` called ``System``. The section mechanism allows you to organize
related data into, well, sections. Sections form containment hierarchies. Here,
containment is a parent-child (whole-part) relationship. In this example, many ``System``
sections are part of one ``Run``. Each ``System`` can contain values for the defined quantities:
``n_atoms``, ``atom_labels``, ``atom_positions``, ``simulation_cell``, and ``pbc``.
Quantities allow you to state type, shape, and physics unit to specify possible quantity
values.
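
How a shape such as ``['n_atoms', 3]`` constrains possible values can be sketched
independently of this module. The helper ``matches_shape`` below is hypothetical; it
assumes nested Python lists as values and a plain dict for looking up named dimensions.
It is an illustration of the idea, not the check implemented in this module:

```python
def matches_shape(value, shape, dimensions):
    # Hypothetical helper: check nested lists against a shape of ints and names.
    if not shape:
        # No dimensions left, so the value must be a scalar.
        return not isinstance(value, list)
    if not isinstance(value, list):
        return False
    size = shape[0]
    if isinstance(size, str):
        # A named dimension refers to the value of another quantity, e.g. n_atoms.
        size = dimensions[size]
    if len(value) != size:
        return False
    return all(matches_shape(item, shape[1:], dimensions) for item in value)


dimensions = {'n_atoms': 2}
positions = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.74]]
ok = matches_shape(positions, ['n_atoms', 3], dimensions)
too_short = matches_shape(positions[:1], ['n_atoms', 3], dimensions)
```

In this sketch ``ok`` is true, while ``too_short`` is false because only one position is
given for two atoms.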

Here is an example where we use the above definitions to create, read, and manipulate
data that follows these definitions:

.. code-block:: python

    run = Run()
    system = run.m_create(System)
    system.n_atoms = 3
    system.atom_labels = ['H', 'H', 'O']

    print(system.atom_labels)
    print(run.m_to_json(indent=2))

This last statement will produce the following JSON:

.. code-block:: JSON

    {
        "m_section": "Run",
        "System": [
            {
                "m_section": "System",
                "m_parent_index": 0,
                "n_atoms": 3,
                "atom_labels": [
                    "H",
                    "H",
                    "O"
                ]
            }
        ]
    }

This is the JSON representation, a serialized version of the Python representation in
the example above.
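
The step from nested section objects to such a JSON document can be pictured with a
small stand-alone sketch. It does not use the classes of this module; ``ToySection`` and
its ``create`` and ``to_dict`` methods are hypothetical names, and the layout only mirrors
the example output above:

```python
import json


class ToySection:
    # Hypothetical stand-in for a section instance: a name, plain values, and children.
    def __init__(self, name, parent_index=None, **values):
        self.name = name
        self.parent_index = parent_index
        self.values = values
        self.sub_sections = {}

    def create(self, name, **values):
        # Append a repeatable sub-section and remember its index in the parent list.
        siblings = self.sub_sections.setdefault(name, [])
        child = ToySection(name, parent_index=len(siblings), **values)
        siblings.append(child)
        return child

    def to_dict(self):
        # Recursively turn this section and its sub-sections into plain dicts and lists.
        result = {'m_section': self.name}
        if self.parent_index is not None:
            result['m_parent_index'] = self.parent_index
        result.update(self.values)
        for name, children in self.sub_sections.items():
            result[name] = [child.to_dict() for child in children]
        return result


run = ToySection('Run')
run.create('System', n_atoms=3, atom_labels=['H', 'H', 'O'])
serialized = json.dumps(run.to_dict(), indent=2)
```

Building a plain dictionary first keeps the JSON formatting itself in the hands of the
standard ``json`` module.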

Sections can be extended with new quantities outside the original section definition.
This provides the key mechanism to extend commonly defined parts with (code) specific
quantities:

.. code-block:: Python

    class Method(nomad.metainfo.common.Method):
        x_vasp_incar_ALGO = Quantity(
            type=Enum(['Normal', 'VeryFast', ...]),
            links=['https://cms.mpi.univie.ac.at/wiki/index.php/ALGO'])
        \"\"\"
        A convenient option to specify the electronic minimisation algorithm (as of VASP.4.5)
        and/or to select the type of GW calculations.
        \"\"\"


All meta-info definitions and classes for meta-info data objects (i.e. section instances)
inherit from :class:`MObject`. This base-class provides common functions and attributes
for all meta-info data objects. Names of these common parts are prefixed with ``m_``
to distinguish them from user defined quantities. This also constitutes the `reflection`
interface (in addition to Python's built-in ``getattr``, ``setattr``) that allows you to
create and manipulate meta-info data without prior knowledge of the underlying
definitions when writing the program.
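
The idea behind such a reflection interface can be illustrated with plain Python
objects. In this sketch, ``ToySystem`` and ``fill_from_dict`` are hypothetical names; the
only real API used is Python's built-in ``hasattr``, ``getattr``, and ``setattr``:

```python
class ToySystem:
    # Hypothetical data class; fill_from_dict does not know its attribute names.
    n_atoms = None
    atom_labels = None


def fill_from_dict(obj, values):
    # Set attributes by name, rejecting names that the class does not define.
    for name, value in values.items():
        if not hasattr(type(obj), name):
            raise KeyError('Unknown quantity: %s' % name)
        setattr(obj, name, value)
    return obj


system = fill_from_dict(ToySystem(), {'n_atoms': 3, 'atom_labels': ['H', 'H', 'O']})
read_back = {name: getattr(system, name) for name in ('n_atoms', 'atom_labels')}
```

Code written this way works for any class that follows the same convention, which is
what generic tools such as parsers or serializers need.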

.. autoclass:: MObject

The following classes can be used to define and structure meta-info data:

- sections are defined by sub-classing :class:`MObject` and using :class:`Section` to
  populate the class attribute `m_section`
- quantities are defined by assigning class attributes of a section with :class:`Quantity`
  instances
- references (from one section to another) can be defined with quantities that use
  section definitions as type
- dimensions can be defined by simply using quantity names in shapes
- categories (formerly `abstract type definitions`) can be given in quantity definitions
  to assign quantities to additional specialization-generalization hierarchies

See the reference of classes :class:`Section` and :class:`Quantity` for details.
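
Assigning a definition object to a class attribute works because of Python's descriptor
protocol. The following sketch shows the general mechanism with a hypothetical
``TypedValue`` descriptor that keeps values in a per-object ``m_data`` dictionary. It is a
simplified illustration under these assumptions, not the :class:`Quantity` implementation:

```python
class TypedValue:
    # Hypothetical descriptor: checks the type on assignment, stores data on the object.
    def __init__(self, type, default=None):
        self.type = type
        self.default = default
        self.name = None

    def __set_name__(self, owner, name):
        # Called by Python when the descriptor is assigned in a class body.
        self.name = name

    def __get__(self, obj, owner=None):
        if obj is None:
            # Accessed on the class: return the definition itself.
            return self
        return obj.m_data.get(self.name, self.default)

    def __set__(self, obj, value):
        if not isinstance(value, self.type):
            raise TypeError('Value has wrong type.')
        obj.m_data[self.name] = value

    def __delete__(self, obj):
        del obj.m_data[self.name]


class ToySystem:
    n_atoms = TypedValue(int, default=0)

    def __init__(self):
        self.m_data = {}


system = ToySystem()
system.n_atoms = 3
```

Reading ``system.n_atoms`` goes through ``__get__`` and returns the stored value, while
``ToySystem.n_atoms`` returns the descriptor, i.e. the definition.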

.. autoclass:: Section
.. autoclass:: Quantity
"""

from typing import Type, TypeVar, Union, Tuple, Iterable, List, Any, Dict, cast
import sys
import inspect
from pint.unit import _Unit
from pint import UnitRegistry

__module__ = sys.modules[__name__]
MObjectBound = TypeVar('MObjectBound', bound='MObject')


# Reflection

class Enum(list):
    pass


class MObjectMeta(type):

    def __new__(mcs, cls_name, bases, dct):
        cls = super().__new__(mcs, cls_name, bases, dct)
        # run the section initialization hook, if the class provides one
        init = getattr(cls, '__init_section_cls__', None)
        if init is not None:
            init()
        return cls


Content = Tuple[MObjectBound, Union[List[MObjectBound], MObjectBound], str, MObjectBound]
SectionDef = Union[str, 'Section', Type[MObjectBound]]


class MObject(metaclass=MObjectMeta):
    """Base class for all section objects on all meta-info levels.

    All metainfo objects instantiate classes that inherit from ``MObject``. Each
    section or quantity definition is an ``MObject``, each actual (meta-)data carrying
    section is an ``MObject``. This class constitutes the reflection interface of the
    meta-info, since it allows you to manipulate sections (and therefore all meta-info data)
    without having to know the specific sub-class.

    It also carries all the data for each section. All sub-classes only define specific
    sections in terms of possible sub-sections and quantities. The data is managed here.

    The reflection interface for reading and manipulating quantity values consists of
    Python's built-in ``getattr``, ``setattr``, and ``del``, as well as the member functions
    :func:`m_add` and :func:`m_add_values`.

    Sub-sections and parent sections can be read and manipulated with :data:`m_parent`,
    :func:`m_sub_section`, :func:`m_create`.

    .. code-block:: python

        system = run.m_create(System)
        assert system.m_parent == run
        assert run.m_sub_section(System, system.m_parent_index) == system

    Attributes:
        m_section: The section definition that defines this section, its possible
            sub-sections and quantities.
        m_parent: The parent section instance that this section is a sub-section of.
        m_parent_index: For repeatable sections, parents keep a list of sub-sections for
            each section definition. This is the index of this section in the respective
            parent sub-section list.
        m_data: The dictionary that holds all data of this section. It keeps the quantity
            values and sub-sections. It should only be read directly (and never manipulated)
            if you know what you are doing. You should always use the reflection interface
            if possible.
    """

    m_section: 'Section' = None

    def __init__(self, m_section: 'Section' = None, m_parent: 'MObject' = None, **kwargs):
        self.m_section: 'Section' = m_section
        self.m_parent: 'MObject' = m_parent
        self.m_parent_index = -1
        self.m_data = dict(**kwargs)

        cls = self.__class__
        if self.m_section is None:
            self.m_section = cls.m_section

        if cls.m_section is not None:
            assert self.m_section == cls.m_section, \
                'Section class and section definition must match'

    @classmethod
    def __init_section_cls__(cls):
        if not hasattr(__module__, 'Quantity') or not hasattr(__module__, 'Section'):
            # no initialization during bootstrapping, will be done manually
            return

        m_section = cls.m_section
        if m_section is None and cls != MObject:
            m_section = Section()
            setattr(cls, 'm_section', m_section)

        m_section.name = cls.__name__
        if cls.__doc__ is not None:
            m_section.description = inspect.cleandoc(cls.__doc__)
        m_section.section_cls = cls

        for name, attr in cls.__dict__.items():
            if isinstance(attr, Quantity):
                attr.name = name
                if attr.__doc__ is not None:
                    attr.description = inspect.cleandoc(attr.__doc__)
                # manual manipulation of m_data due to bootstrapping
                m_section.m_data.setdefault('Quantity', []).append(attr)

    @staticmethod
    def m_type_check(definition: 'Quantity', value: Any, check_item: bool = False):
        """Checks if the value fits the given quantity in type and shape; raises
        TypeError if not."""

        if value is None and not check_item and definition.default is None:
            # Allow the default None value even if it would violate the type
            return

        def check_value(value):
            if isinstance(definition.type, Enum):
                if value not in definition.type:
                    raise TypeError('Not one of the enum values.')

            elif isinstance(definition.type, type):
                if not isinstance(value, definition.type):
                    raise TypeError('Value has wrong type.')

            elif isinstance(definition.type, Section):
                if not isinstance(value, MObject) or value.m_section != definition.type:
                    raise TypeError('The value is not a section or has the wrong section definition')

            else:
                raise Exception('Invalid quantity type: %s' % str(definition.type))

        shape = None
        try:
            shape = definition.shape
        except KeyError:
            pass

        if shape is None or len(shape) == 0 or check_item:
            check_value(value)

        elif len(shape) == 1:
            if not isinstance(value, list):
                raise TypeError('Wrong shape')

            for item in value:
                check_value(item)

        else:
            # TODO
            raise Exception('Higher shapes not implemented')

        # TODO check dimension

    def __resolve_section(self, definition: SectionDef) -> 'Section':
        """Resolves and checks the given section definition. """
        if isinstance(definition, str):
            section = self.m_section.sub_sections[definition]

        else:
            if isinstance(definition, type):
                section = getattr(definition, 'm_section')
            else:
                section = definition
            if section.name not in self.m_section.sub_sections:
                raise KeyError('Not a sub section.')

        return section

    def m_sub_section(self, definition: SectionDef, parent_index: int = -1) -> MObjectBound:
        """Returns the sub section for the given section definition and possible
           parent_index (for repeatable sections).

        Args:
            definition: The definition of the section.
            parent_index: The index of the desired section. This can be omitted for non
                repeatable sections. If omitted for repeatable sections, an exception
                will be raised if more than one sub-section exists. Likewise, if the given
                index is out of range.
        Raises:
            KeyError: If the definition is not for a sub section
            IndexError: If the given index is wrong, or if an index is given for a non
                repeatable section
        """
        section_def = self.__resolve_section(definition)

        m_data_value = self.m_data[section_def.name]

        if isinstance(m_data_value, list):
            m_data_values = m_data_value
            if parent_index == -1:
                if len(m_data_values) == 1:
                    return m_data_values[0]
                else:
                    raise IndexError()
            else:
                return m_data_values[parent_index]
        else:
            if parent_index != -1:
                raise IndexError('Not a repeatable sub section.')
            else:
                return m_data_value

    def m_create(self, definition: SectionDef, **kwargs) -> MObjectBound:
        """Creates a subsection and adds it to this section.

        Args:
            definition: The section definition of the subsection. It is either the
                definition itself, or the Python class representing the section definition.
            **kwargs: Are used to initialize the subsection.

        Returns:
            The created subsection

        Raises:
            KeyError: If the given section is not a subsection of this section.
        """
        section_def: 'Section' = self.__resolve_section(definition)

        section_cls = section_def.section_cls
        section_instance = section_cls(m_section=section_def, m_parent=self, **kwargs)

        if section_def.repeats:
            m_data_sections = self.m_data.setdefault(section_def.name, [])
            section_index = len(m_data_sections)
            m_data_sections.append(section_instance)
            section_instance.m_parent_index = section_index
        else:
            self.m_data[section_def.name] = section_instance

        return cast(MObjectBound, section_instance)

    def __resolve_quantity(self, definition: Union[str, 'Quantity']) -> 'Quantity':
        """Resolves and checks the given quantity definition. """
        if isinstance(definition, str):
            quantity = self.m_section.quantities[definition]

        else:
            if definition.m_parent != self.m_section:
                raise KeyError('Quantity is not a quantity of this section.')
            quantity = definition

        return quantity

    def m_add(self, definition: Union[str, 'Quantity'], value: Any):
        """Adds the given value to the given quantity."""

        quantity = self.__resolve_quantity(definition)

        MObject.m_type_check(quantity, value, check_item=True)

        m_data_values = self.m_data.setdefault(quantity.name, [])
        m_data_values.append(value)

    def m_add_values(self, definition: Union[str, 'Quantity'], values: Iterable[Any]):
        """Adds the given values to the given quantity."""

        quantity = self.__resolve_quantity(definition)

        for value in values:
            MObject.m_type_check(quantity, value, check_item=True)

        m_data_values = self.m_data.setdefault(quantity.name, [])
        for value in values:
            m_data_values.append(value)

    def m_to_dict(self) -> Dict[str, Any]:
        """Returns the data of this section as a JSON serializable dictionary. """
        pass

    def m_to_json(self):
        """Returns the data of this section as a json string. """
        pass

    def m_all_contents(self) -> Iterable[Content]:
        """Returns an iterable over all sub and sub-sub sections. """
        for content in self.m_contents():
            for sub_content in content[0].m_all_contents():
                yield sub_content

            yield content

    def m_contents(self) -> Iterable[Content]:
        """Returns an iterable over all direct sub sections. """
        for name, attr in self.m_data.items():
            if isinstance(attr, list):
                for value in attr:
                    if isinstance(value, MObject):
                        yield value, attr, name, self

            elif isinstance(attr, MObject):
                # non repeatable sub section: the attribute itself is the section
                yield attr, attr, name, self

    def __repr__(self):
        m_section_name = self.m_section.name
        name = ''
        if 'name' in self.m_data:
            name = self.m_data['name']

        return '%s:%s' % (name, m_section_name)


# M3, the definitions that are used to write definitions. These are the section definitions
# for sections Section and Quantity. They define themselves; i.e. the section definition
# for Section is the same section definition.
# Due to this circular nature (chicken-and-egg problem), the classes for sections Section and
# Quantity only contain placeholders for their own section and quantity definitions.
# These placeholders are replaced once the necessary classes are defined. This process
# is referred to as 'bootstrapping'.

class Quantity(MObject):
    """Used to define quantities that store a certain piece of (meta-)data.

    Quantities are the basic building blocks of meta-info data. The Quantity class is
    used to define quantities within sections. A quantity definition
    is a (physics) quantity with name, type, shape, and potentially a unit.

    In Python terms, quantities are descriptors. Descriptors define how to get, set, and
    delete values for an object attribute. Meta-info descriptors ensure that
    type and shape fit the set values.
    """

    name: 'Quantity' = None
    """ The name of the quantity. Must be unique within a section. """

    description: 'Quantity' = None
    """ An optional human readable description. """

    links: 'Quantity' = None
    """ A list of URLs to external resources that describe this definition. """

    type: 'Quantity' = None
    """ The type of the quantity.

    Can be one of the following:

    - a built-in Python type, e.g. ``int``, ``str``, ``bool``
    - an instance of :class:`Enum`, e.g. ``Enum(['one', 'two', 'three'])``
    - an instance of :class:`Section`, i.e. a section definition. This will define a reference
    - the Python typing ``Any`` to denote an arbitrary type
    - a Python class, e.g. ``datetime``

    In the NOMAD CoE meta-info this was basically the ``dTypeStr``.
    """

    shape: 'Quantity' = None
    """ The shape of the quantity that defines its dimensionality.

    A shape is a list, where each item defines a dimension. Each dimension can be:

    - an integer that defines the exact size of the dimension, e.g. ``[3]`` is the
      shape of a spatial vector
    - the name of an int typed quantity in the same section
    - a range specification as a string built from a lower bound (i.e. an int number)
      and an upper bound (int or ``*`` denoting arbitrarily large), e.g. ``'0..*'``, ``'1..3'``
    """

    unit: 'Quantity' = None
    """ The optional physics unit for this quantity.

    Units are given in `pint` units. Pint is a Python package that defines units and
    their algebra. There is a default registry :data:`units` that you can use.
    Example units are: ``units.m``, ``units.m / units.s ** 2``.
    """

    default: 'Quantity' = None
    """ The default value for this quantity. """

    # Some quantities of Quantity cannot be read as normal quantities due to bootstrapping.
    # Those can be accessed internally through the following replacement properties that
    # read directly from m_data.
    __name = property(lambda self: self.m_data['name'])
    __default = property(lambda self: self.m_data.get('default', None))

    def __get__(self, obj, type=None):
        if obj is None:
            # class (def) attribute case
            return self

        # object (instance) attribute case
        try:
            return obj.m_data[self.__name]
        except KeyError:
            return self.__default

    def __set__(self, obj, value):
        if obj is None:
            # class (def) case
            raise KeyError('Cannot overwrite quantity definition. Only values can be set.')

        # object (instance) case
        MObject.m_type_check(self, value)
        obj.m_data[self.__name] = value

    def __delete__(self, obj):
        if obj is None:
            # class (def) case
            raise KeyError('Cannot delete quantity definition. Only values can be deleted.')

        # object (instance) case
        del obj.m_data[self.__name]


class Section(MObject):
    """Used to define sections that organize meta-info data into containment hierarchies.

    Section definitions determine what quantities and sub-sections can appear in a section
    instance. A section instance itself can appear potentially many times in its parent
    section. See :data:`repeats` and :data:`parent`.

    In Python terms, sections are classes. Sub-sections and quantities are attributes of
    the respective instantiating objects. For each section class there is a corresponding
    :class:`Section` instance that describes this class as a section. This instance
    is referred to as 'section definition' in contrast to the Python class that we call
    'section class'.
    """

    section_cls: Type[MObject] = None
    """ The section class that corresponds to this section definition. """

    name: 'Quantity' = None
    """ The name of the section. """

    description: 'Quantity' = None
    """ A human readable description of the section. """

    links: 'Quantity' = None
    """ A list of URLs to external resources that describe this definition. """

    repeats: 'Quantity' = None
    """ Whether instances of this section can occur repeatedly in the parent section. """

    parent: 'Quantity' = None
    """ The section definition for parents.

    Instances of this section can only occur in instances of the given parent.
    """

    __all_instances: List['Section'] = []

    def __init__(self, **kwargs):
        # The mechanism that produces default values, depends on parent. Without setting
        # the parent default manually, an endless recursion will occur.
        kwargs.setdefault('parent', None)

        super().__init__(**kwargs)
        Section.__all_instances.append(self)

    # TODO cache
    @property
    def attributes(self) -> Dict[str, Union['Section', Quantity]]:
        """ All attribute (sub section and quantity) definitions. """

        attributes: Dict[str, Union[Section, Quantity]] = dict(**self.quantities)
        attributes.update(**self.sub_sections)
        return attributes

    # TODO cache
    @property
    def quantities(self) -> Dict[str, Quantity]:
        """ All quantity definitions in the given section definition. """

        return {
            quantity.name: quantity
            for quantity in self.m_data.get('Quantity', [])}

    # TODO cache
    @property
    def sub_sections(self) -> Dict[str, 'Section']:
        """ All sub section definitions for this section definition. """

        return {
            sub_section.name: sub_section
            for sub_section in Section.__all_instances
            if sub_section.parent == self}

    def add_quantity(self, quantity: Quantity):
        """
        Adds the given quantity to this section.

        Allows you to add a quantity to a section definition outside the corresponding
        section class.

        .. code-block:: Python

            class System(MObject):
                pass

            System.m_section.add_quantity(Quantity(name='n_atoms', type=int))

        This will add the quantity definition to this section definition,
        and add the respective Python descriptor as an attribute to this class.
        """
        quantities = self.m_data.setdefault('Quantity', [])
        quantities.append(quantity)

        setattr(self.section_cls, quantity.name, quantity)


Section.m_section = Section(repeats=True, name='Section')
Section.m_section.m_section = Section.m_section
Section.m_section.section_cls = Section

Section.name = Quantity(type=str, name='name')
Section.description = Quantity(type=str, name='description')
Section.links = Quantity(type=str, shape=['0..*'], name='links')
Section.repeats = Quantity(type=bool, name='repeats', default=False)
Section.parent = Quantity(type=Section.m_section, name='parent')

Quantity.m_section = Section(repeats=True, parent=Section.m_section, name='Quantity')
Quantity.m_section.section_cls = Quantity
Quantity.name = Quantity(type=str, name='name')
Quantity.description = Quantity(type=str, name='description')
Quantity.links = Quantity(type=str, shape=['0..*'], name='links')
Quantity.type = Quantity(type=Union[type, Enum, Section], name='type')
Quantity.shape = Quantity(type=Union[str, int], shape=['0..*'], name='shape')
Quantity.unit = Quantity(type=_Unit, name='unit')
Quantity.default = Quantity(type=Any, name='default', default=None)


class Package(MObject):
    m_section = Section()
    name = Quantity(type=str)


Section.m_section.parent = Package.m_section

units = UnitRegistry()
""" The default pint unit registry that should be used to give units to quantity definitions. """