# Copyright 2018 Markus Scheidgen
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""
The NOMAD meta-info makes it possible to define physics data quantities. These
definitions are necessary for all computer representations of the respective data
(e.g. in Python, search engines, databases, and files).

This module provides Python interfaces to:

- define meta-info data
- create and manipulate data that follows these definitions
- (de-)serialize meta-info data as JSON (i.e. represent data in JSON-formatted files)

Here is a simple example that demonstrates the definition of System-related quantities:

.. code-block:: python

    class Run(MObject):
        pass

    class System(MObject):
        \"\"\"
        A system section includes all quantities that describe a single simulated
        system (a.k.a. geometry).
        \"\"\"

        m_section = Section(repeats=True, parent=Run.m_section)

        n_atoms = Quantity(type=int)
        \"\"\" Defines the number of atoms in the system. \"\"\"

        atom_labels = Quantity(type=Enum(ase.data.chemical_symbols), shape=['n_atoms'])
        atom_positions = Quantity(type=float, shape=['n_atoms', 3], unit=Units.m)
        simulation_cell = Quantity(type=float, shape=[3, 3], unit=Units.m)
        pbc = Quantity(type=bool, shape=[3])

Here, we define a `section` called ``System``. The section mechanism allows organizing
related data into, well, sections. Sections form containment hierarchies, where
containment is a parent-child (whole-part) relationship. In this example, many
``Systems`` are part of one ``Run``. Each ``System`` can contain values for the defined
quantities ``n_atoms``, ``atom_labels``, ``atom_positions``, ``simulation_cell``, and
``pbc``. Quantities state the type, shape, and physical unit of their possible values.
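
The bookkeeping behind this containment can be illustrated without this module. The
following minimal, self-contained sketch is not this module's implementation; the class
``Node`` and its method ``create`` are hypothetical stand-ins for :class:`MObject` and
:func:`MObject.m_create`. Like ``m_create`` for repeatable sections, it keeps one list of
children per section name and records each child's parent and its index in that list:

.. code-block:: python

    from typing import Dict, List, Optional


    class Node:
        # hypothetical, simplified stand-in for a section instance
        def __init__(self, name: str, parent: Optional['Node'] = None) -> None:
            self.name = name
            self.parent = parent
            self.parent_index = -1
            self.children: Dict[str, List['Node']] = {}

        def create(self, name: str) -> 'Node':
            # create a repeatable sub-section and register it with this parent
            child = Node(name, parent=self)
            siblings = self.children.setdefault(name, [])
            child.parent_index = len(siblings)
            siblings.append(child)
            return child


    run = Node('Run')
    first = run.create('System')
    second = run.create('System')
    print(second.parent.name, second.parent_index)  # prints: Run 1

Both ``System`` nodes share the same parent, and their indices reflect the order in which
they were created.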

Here is an example where we use the above definition to create, read, and manipulate
data that follows it:

.. code-block:: python

    run = Run()
    system = run.m_create(System)
    system.n_atoms = 3
    system.atom_labels = ['H', 'H', 'O']

    print(system.atom_labels)
    print(run.m_to_json(indent=2))

The last statement will produce the following JSON:

.. code-block:: json

    {
        "m_section": "Run",
        "System": [
            {
                "m_section": "System",
                "m_parent_index": 0,
                "n_atoms": 3,
                "atom_labels": [
                    "H",
                    "H",
                    "O"
                ]
            }
        ]
    }

This is the JSON representation, a serialized version of the Python representation in
the example above.
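
One way to produce such a representation from nested Python dictionaries is sketched
below with the standard ``json`` module. It is not this module's serialization code (in
this module, :func:`MObject.m_to_dict` and :func:`MObject.m_to_json` are still stubs);
the helper ``section_to_dict`` and its plain-dictionary input, where a list of
dictionaries stands for a repeatable sub-section, are assumptions made for illustration:

.. code-block:: python

    import json
    from typing import Any, Dict


    def section_to_dict(section_name: str, data: Dict[str, Any],
                        parent_index: int = -1) -> Dict[str, Any]:
        # hypothetical helper: convert one section (and its sub-sections) to a dictionary
        result: Dict[str, Any] = {'m_section': section_name}
        if parent_index != -1:
            result['m_parent_index'] = parent_index
        for name, value in data.items():
            if isinstance(value, list) and value and all(isinstance(item, dict) for item in value):
                # a non-empty list of dictionaries stands for a repeatable sub-section
                result[name] = [
                    section_to_dict(name, item, index) for index, item in enumerate(value)]
            else:
                result[name] = value
        return result


    run_data = {'System': [{'n_atoms': 3, 'atom_labels': ['H', 'H', 'O']}]}
    print(json.dumps(section_to_dict('Run', run_data), indent=2))

The keys come out in the same order as in the JSON above, because Python dictionaries
preserve insertion order and ``json.dumps`` does not sort keys by default.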

Sections can be extended with new quantities outside of the original section definition.
This is the key mechanism for extending commonly defined parts with (code-)specific
quantities:

.. code-block:: Python

    class Method(nomad.metainfo.common.Method):
        x_vasp_incar_ALGO = Quantity(
            type=Enum(['Normal', 'VeryFast', ...]),
            links=['https://cms.mpi.univie.ac.at/wiki/index.php/ALGO'])
        \"\"\"
        A convenient option to specify the electronic minimisation algorithm (as of VASP.4.5)
        and/or to select the type of GW calculations.
        \"\"\"
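
The effect of such an extension can be sketched with plain Python classes. In the
following hypothetical sketch, ``QuantityDef``, ``SectionDef``, ``SectionBase``,
``CommonMethod``, ``VaspMethod``, and ``basis_set`` are illustrative names, and
``__init_subclass__`` replaces the metaclass used by this module. Similar to how this
module reuses an inherited ``m_section`` when a section class is sub-classed, the
code-specific class adds its quantities to the common section definition instead of
creating a new one:

.. code-block:: python

    from typing import List, Optional


    class QuantityDef:
        # hypothetical stand-in for Quantity; only marks a class attribute as a quantity
        pass


    class SectionDef:
        # hypothetical stand-in for Section; only records quantity names
        def __init__(self) -> None:
            self.quantities: List[str] = []


    class SectionBase:
        m_section: Optional[SectionDef] = None

        def __init_subclass__(cls, **kwargs) -> None:
            super().__init_subclass__(**kwargs)
            if cls.m_section is None:
                # first class in the hierarchy: create a new section definition
                cls.m_section = SectionDef()
            # add this class' own quantities to the (possibly inherited) definition
            for name, value in vars(cls).items():
                if isinstance(value, QuantityDef):
                    cls.m_section.quantities.append(name)


    class CommonMethod(SectionBase):
        basis_set = QuantityDef()  # hypothetical common quantity


    class VaspMethod(CommonMethod):
        x_vasp_incar_ALGO = QuantityDef()  # code-specific extension


    print(CommonMethod.m_section.quantities)  # prints: ['basis_set', 'x_vasp_incar_ALGO']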


All meta-info definitions and classes for meta-info data objects (i.e. section instances)
inherit from :class:`MObject`. This base class provides common functions and attributes
for all meta-info data objects. The names of these common parts are prefixed with ``m_``
to distinguish them from user-defined quantities. They also constitute the `reflection`
interface (in addition to Python's built-in ``getattr`` and ``setattr``), which allows
creating and manipulating meta-info data without knowing the underlying definitions
when the program is written.
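
The following minimal sketch illustrates the idea of such a reflection interface with
Python's built-in ``getattr`` and ``setattr`` only. The function ``copy_quantities`` and
the class ``Atoms`` are hypothetical and not part of this module; the quantity names are
plain strings that could, for example, be read from a section definition at run time:

.. code-block:: python

    from typing import Any, Iterable


    def copy_quantities(source: Any, target: Any, names: Iterable[str]) -> None:
        # copy the named values without knowing the concrete classes in advance
        for name in names:
            setattr(target, name, getattr(source, name))


    class Atoms:
        # hypothetical data class used only for this illustration
        pass


    source, target = Atoms(), Atoms()
    source.n_atoms = 3
    source.atom_labels = ['H', 'H', 'O']
    copy_quantities(source, target, ['n_atoms', 'atom_labels'])
    print(target.n_atoms, target.atom_labels)  # prints: 3 ['H', 'H', 'O']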

.. autoclass:: MObject

The following classes can be used to define and structure meta-info data:

- sections are defined by sub-classing :class:`MObject` and using :class:`Section` to
  populate the class attribute ``m_section``
- quantities are defined by assigning :class:`Quantity` instances to class attributes
  of a section
- references (from one section to another) can be defined with quantities that use
  section definitions as type
- dimensions can be defined by simply using quantity names in shapes
- categories (formerly `abstract type definitions`) can be given in quantity definitions
  to assign quantities to additional specialization-generalization hierarchies
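
Defining quantities as class attributes and using quantity names as dimensions can be
illustrated with Python descriptors. The following sketch is not this module's
implementation: ``QuantitySketch`` and ``SystemSketch`` are hypothetical names, only
scalars and one-dimensional lists are supported, and, unlike this module (which does not
check dimensions yet), the sketch checks the length of a list against the quantity named
in its shape:

.. code-block:: python

    from typing import Any, Dict, List, Optional, Union


    class QuantitySketch:
        # hypothetical quantity definition: a data descriptor with type and shape checks
        def __init__(self, value_type: type,
                     shape: Optional[List[Union[str, int]]] = None) -> None:
            self.value_type = value_type
            self.shape = shape or []
            self.name = ''

        def __set_name__(self, owner: type, name: str) -> None:
            # called when the owning class is created: remember the attribute name
            self.name = name

        def __get__(self, obj: Any, objtype: Any = None) -> Any:
            if obj is None:
                return self  # class-level access returns the definition itself
            return obj.m_data[self.name]

        def __set__(self, obj: Any, value: Any) -> None:
            items = [value]
            if self.shape:
                if not isinstance(value, list):
                    raise ValueError('Wrong shape')
                dimension = self.shape[0]
                if isinstance(dimension, str):
                    # a quantity name used as dimension: look up that quantity's value
                    dimension = getattr(obj, dimension)
                if len(value) != dimension:
                    raise ValueError('Wrong length')
                items = value
            for item in items:
                if not isinstance(item, self.value_type):
                    raise ValueError('Value has wrong type.')
            obj.m_data[self.name] = value


    class SystemSketch:
        n_atoms = QuantitySketch(int)
        atom_labels = QuantitySketch(str, shape=['n_atoms'])

        def __init__(self) -> None:
            self.m_data: Dict[str, Any] = {}


    system = SystemSketch()
    system.n_atoms = 3
    system.atom_labels = ['H', 'H', 'O']
    print(system.m_data)  # prints: {'n_atoms': 3, 'atom_labels': ['H', 'H', 'O']}

Assigning ``['H']`` to ``system.atom_labels`` afterwards would raise a ``ValueError``,
because the list length no longer matches ``n_atoms``.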

See the reference of the classes :class:`Section` and :class:`Quantity` for details.

.. autoclass:: Section
.. autoclass:: Quantity
"""

from typing import Type, TypeVar, Union, Tuple, Iterable, List, Any, Dict, cast
import sys


__module__ = sys.modules[__name__]
MObjectBound = TypeVar('MObjectBound', bound='MObject')


# Reflection

class Enum(list):
    """A list of possible values; used as a quantity type, it restricts values to these items."""


class MObjectMeta(type):
    def __new__(mcs, cls_name, bases, dct):
        cls = super().__new__(mcs, cls_name, bases, dct)
        # initialize the section definition of the new class, if it supports this
        init = getattr(cls, '__init_section_cls__', None)
        if init is not None:
            init()
        return cls


Content = Tuple[MObjectBound, Union[List[MObjectBound], MObjectBound], str, MObjectBound]
SectionDef = Union[str, 'Section', Type[MObjectBound]]


class MObject(metaclass=MObjectMeta):
    """Base class for all section objects on all meta-info levels.

    All meta-info objects are instances of classes that inherit from ``MObject``. Each
    section or quantity definition is an ``MObject``, and so is each section that
    carries actual (meta-)data. This class constitutes the reflection interface of the
    meta-info, since it allows manipulating sections (and therefore all meta-info data)
    without knowing the specific sub-class.

    It also carries all the data for each section. All sub-classes only define specific
    sections in terms of possible sub-sections and quantities. The data is managed here.

    The reflection interface for reading and manipulating quantity values consists of
    Python's built-in ``getattr``, ``setattr``, and ``del``, as well as the member
    functions :func:`m_add` and :func:`m_add_values`.

    Sub-sections and parent sections can be read and manipulated with :data:`m_parent`,
    :func:`m_sub_section`, :func:`m_create`.

    .. code-block:: python

        system = run.m_create(System)
        assert system.m_parent == run
        assert run.m_sub_section(System, system.m_parent_index) == system

    Attributes:
        m_section: The section definition that defines this section, its possible
            sub-sections and quantities.
        m_parent: The parent section instance that this section is a sub-section of.
        m_parent_index: For repeatable sections, parents keep a list of sub-sections for
            each section definition. This is the index of this section in the respective
            parent sub-section list.
        m_data: The dictionary that holds all data of this section. It keeps the quantity
            values and sub-sections. It should only be read directly (and never
            manipulated) if you know what you are doing. Always use the reflection
            interface if possible.
    """

    def __init__(self, m_section: 'Section' = None, m_parent: 'MObject' = None, **kwargs):
        self.m_section: 'Section' = m_section
        self.m_parent: 'MObject' = m_parent
        self.m_parent_index = -1
        self.m_data = dict(**kwargs)

        if self.m_section is None:
            self.m_section = getattr(self.__class__, 'm_section', None)
        else:
            assert self.m_section == getattr(self.__class__, 'm_section', self.m_section), \
                'Section class and section definition must match'

    @classmethod
    def __init_section_cls__(cls):
        if not hasattr(__module__, 'Quantity') or not hasattr(__module__, 'Section'):
            # no initialization during bootstrapping, will be done manually
            return

        m_section = getattr(cls, 'm_section', None)
        if m_section is None:
            m_section = Section()
            setattr(cls, 'm_section', m_section)
        m_section.name = cls.__name__
        m_section.section_cls = cls

        for name, value in cls.__dict__.items():
            if isinstance(value, Quantity):
                value.name = name
                # the quantity belongs to this section definition (see __resolve_quantity)
                value.m_parent = m_section
                # manual manipulation of m_data due to bootstrapping
                m_section.m_data.setdefault('Quantity', []).append(value)

    @staticmethod
    def __type_check(definition: 'Quantity', value: Any, check_item: bool = False):
        """Checks if the value fits the given quantity in type and shape; raises
        ValueError if not."""

        def check_value(value):
            if isinstance(definition.type, Enum):
                if value not in definition.type:
                    raise ValueError('Not one of the enum values.')

            elif isinstance(definition.type, type):
                if not isinstance(value, definition.type):
                    raise ValueError('Value has wrong type.')

            elif isinstance(definition.type, Section):
                if not isinstance(value, MObject) or value.m_section != definition.type:
                    raise ValueError('The value is not a section or has the wrong section definition.')

            else:
                raise Exception('Invalid quantity type: %s' % str(definition.type))

        shape = None
        try:
            shape = definition.shape
        except KeyError:
            pass

        if shape is None or len(shape) == 0 or check_item:
            check_value(value)

        elif len(shape) == 1:
            if not isinstance(value, list):
                raise ValueError('Wrong shape')

            for item in value:
                check_value(item)

        else:
            # TODO
            raise Exception('Higher shapes not implemented')

        # TODO check dimension

    def __resolve_section(self, definition: SectionDef) -> 'Section':
        """Resolves and checks the given section definition. """
        if isinstance(definition, str):
            section = self.m_section.sub_sections[definition]

        else:
            if isinstance(definition, type):
                section = getattr(definition, 'm_section')
            else:
                section = definition
            if section.name not in self.m_section.sub_sections:
                raise KeyError('Not a sub section.')

        return section

    def m_sub_section(self, definition: SectionDef, parent_index: int = -1) -> MObjectBound:
        """Returns the sub-section for the given section definition and, for repeatable
        sections, the given parent_index.

        Args:
            definition: The definition of the section.
            parent_index: The index of the desired section. This can be omitted for non
                repeatable sections. If it is omitted for a repeatable section, an
                exception is raised unless exactly one sub-section exists. Likewise, an
                exception is raised if the given index is out of range.
        Raises:
            KeyError: If the definition is not for a sub-section, or if no such
                sub-section has been created yet.
            IndexError: If the given index is wrong, or if an index is given for a non
                repeatable section.
        """
        section_def = self.__resolve_section(definition)

        m_data_value = self.m_data[section_def.name]

        if isinstance(m_data_value, list):
            m_data_values = m_data_value
            if parent_index == -1:
                if len(m_data_values) == 1:
                    return m_data_values[0]
                else:
                    raise IndexError('A parent_index is required unless exactly one sub section exists.')
            else:
                return m_data_values[parent_index]
        else:
            if parent_index != -1:
                raise IndexError('Not a repeatable sub section.')
            else:
                return m_data_value

    def m_create(self, definition: SectionDef, **kwargs) -> MObjectBound:
        """Creates a sub-section and adds it to this section.

        Args:
            definition: The section definition of the sub-section. It is either its name,
                the definition itself, or the Python class representing the section
                definition.
            **kwargs: Are used to initialize the sub-section.

        Returns:
            The created sub-section.

        Raises:
            KeyError: If the given section is not a sub-section of this section.
        """
        section_def: 'Section' = self.__resolve_section(definition)

        section_cls = section_def.section_cls
        section_instance = section_cls(m_section=section_def, m_parent=self, **kwargs)

        if section_def.repeats:
            # repeatable sub-sections are kept in a list; remember the index in the parent
            m_data_sections = self.m_data.setdefault(section_def.name, [])
            section_index = len(m_data_sections)
            m_data_sections.append(section_instance)
            section_instance.m_parent_index = section_index
        else:
            self.m_data[section_def.name] = section_instance

        return cast(MObjectBound, section_instance)

    def __resolve_quantity(self, definition: Union[str, 'Quantity']) -> 'Quantity':
        """Resolves and checks the given quantity definition. """
        if isinstance(definition, str):
            quantity = self.m_section.quantities[definition]
        else:
            if definition.m_parent != self.m_section:
                raise KeyError('Quantity is not a quantity of this section.')
            quantity = definition

        return quantity

    def m_add(self, definition: Union[str, 'Quantity'], value: Any):
        """Adds the given value to the given quantity."""
        quantity = self.__resolve_quantity(definition)

        MObject.__type_check(quantity, value, check_item=True)

        m_data_values = self.m_data.setdefault(quantity.name, [])
        m_data_values.append(value)

    def m_add_values(self, definition: Union[str, 'Quantity'], values: Iterable[Any]):
        """Adds the given values to the given quantity."""

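        quantity = self.__resolve_quantity(definition)

        # materialize the values first; a one-shot iterator (e.g. a generator) would
        # otherwise be exhausted by the type checks before any value is stored
        values = list(values)
        for value in values:
            MObject.__type_check(quantity, value, check_item=True)

        m_data_values = self.m_data.setdefault(quantity.name, [])
        m_data_values.extend(values)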

    def m_to_dict(self) -> Dict[str, Any]:
        """Returns the data of this section as a JSON-serializable dictionary. """
        pass

    def m_to_json(self):
        """Returns the data of this section as a JSON string. """
        pass

    def m_all_contents(self) -> Iterable[Content]:
        """Returns an iterable over all sub-sections, including nested ones. """
        for content in self.m_contents():
            yield from content[0].m_all_contents()

            yield content

    def m_contents(self) -> Iterable[Content]:
        """Returns an iterable over all direct sub-sections. """
        for name, attr in self.m_data.items():
            if isinstance(attr, list):
                for value in attr:
                    if isinstance(value, MObject):
                        yield value, attr, name, self

            elif isinstance(attr, MObject):
                yield attr, attr, name, self

    def __repr__(self):
        m_section_name = self.m_section.name
        name = ''
        if 'name' in self.m_data:
            name = self.m_data['name']

        return '%s:%s' % (name, m_section_name)


# M3

class Quantity(MObject):
    m_section: 'Section' = None
    name: 'Quantity' = None
    type: 'Quantity' = None
    shape: 'Quantity' = None

    __name = property(lambda self: self.m_data['name'])

    default = property(lambda self: None)

    def __get__(self, obj, type=None):
        if obj is None:
            # accessed on the class: return the quantity definition itself
            return self
        return obj.m_data[self.__name]

    def __set__(self, obj, value):
        # check type and shape before storing the value (see MObject.__type_check)
        MObject._MObject__type_check(self, value)
        obj.m_data[self.__name] = value

    def __delete__(self, obj):
        del obj.m_data[self.__name]


class Section(MObject):
    m_section: 'Section' = None
    section_cls: Type[MObject] = None

    name: 'Quantity' = None
    repeats: 'Quantity' = None
    parent: 'Quantity' = None
    extends: 'Quantity' = None

    __all_instances: List['Section'] = []

    default = property(lambda self: [] if self.repeats else None)

    def __init__(self, **kwargs):
        # The mechanism that produces default values depends on the parent. Without
        # setting the parent default manually, an endless recursion would occur.
        kwargs.setdefault('parent', None)

        super().__init__(**kwargs)
        Section.__all_instances.append(self)

    # TODO cache
    @property
    def attributes(self) -> Dict[str, Union['Section', Quantity]]:
        """ All attribute (sub-section and quantity) definitions. """
        attributes: Dict[str, Union[Section, Quantity]] = dict(**self.quantities)
        attributes.update(**self.sub_sections)
        return attributes

    # TODO cache
    @property
    def quantities(self) -> Dict[str, Quantity]:
        """ All quantity definitions in this section definition. """
        return {
            quantity.name: quantity
            for quantity in self.m_data.get('Quantity', [])}

    # TODO cache
    @property
    def sub_sections(self) -> Dict[str, 'Section']:
        """ All sub-section definitions of this section definition. """
        return {
            sub_section.name: sub_section
            for sub_section in Section.__all_instances
            if sub_section.parent == self}


Section.m_section = Section(repeats=True, name='Section')
Section.m_section.m_section = Section.m_section
Section.m_section.section_cls = Section

Section.name = Quantity(type=str, name='name')
Section.repeats = Quantity(type=bool, name='repeats')
Section.parent = Quantity(type=Section.m_section, name='parent')
Section.extends = Quantity(type=Section.m_section, shape=['0..*'], name='extends')

Quantity.m_section = Section(repeats=True, parent=Section.m_section, name='Quantity')
Quantity.m_section.section_cls = Quantity
Quantity.name = Quantity(type=str, name='name')
Quantity.type = Quantity(type=Union[type, Enum, Section], name='type')
Quantity.shape = Quantity(type=Union[str, int], shape=['0..*'], name='shape')


class Package(MObject):
    m_section = Section()
    name = Quantity(type=str)


Section.m_section.parent = Package.m_section


class Definition(MObject):
    m_section = Section(extends=[Section.m_section, Quantity.m_section, Package.m_section])

    description = Quantity(type=str)


class Unit:
    pass


class Units:

    Angstrom = Unit()
    amu = Unit()