Commit 9db42cef authored by Markus Scheidgen

Added documentation about how to write a normalizer. #448

parent e2322615
Pipeline #86705 passed with stages
in 37 minutes and 15 seconds

How to write a normalizer
=========================

A normalizer can be any Python algorithm that takes the archive of an entry as input
and manipulates (usually expands) it. This way, a normalizer can add
additional sections and quantities based on the information already available in the archive.

All normalizers are executed after parsing. Normalizers are run for each entry (i.e. each
set of files that represents a code run). They are run in a particular order, so
you can make assumptions about the availability of data created by normalizers that run earlier.
A normalizer is run in any case, but it might choose not to do anything. A normalizer
can perform any operation on the archive, but in general it should not alter existing
information; it should only add more information.
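
The contract described above can be sketched without any NOMAD imports. The following is only an illustration under stated assumptions: a plain dictionary stands in for the archive, and ``add_formula``, ``add_formula_length`` and ``run_normalizers`` are hypothetical names that are not part of NOMAD.

```python
# Illustrative stand-in only: a plain dict plays the role of the archive,
# and all function names here are hypothetical (not NOMAD API).


def add_formula(archive):
    # Adds new information derived from data that is already available.
    labels = archive.get('atom_labels')
    if labels is None:
        return  # a normalizer always runs, but may choose to do nothing
    archive['formula'] = ''.join(sorted(labels))


def add_formula_length(archive):
    # Relies on 'formula', which an earlier normalizer is expected to add.
    if 'formula' in archive:
        archive['formula_length'] = len(archive['formula'])


def run_normalizers(archive, normalizers):
    # Normalizers are applied one after another, in list order.
    for normalizer in normalizers:
        normalizer(archive)
    return archive


archive = run_normalizers(
    {'atom_labels': ['O', 'H', 'H']},
    [add_formula, add_formula_length],
)
```

If the two functions were listed in the opposite order, ``add_formula_length`` would find no ``formula`` yet and would add nothing, which is why the execution order matters.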

Starting example
----------------

This is an example of a very simple normalizer that computes the unit cell volume from
a given lattice and adds it to the archive.

.. code-block:: python

    from nomad.normalizing import Normalizer
    from nomad.atomutils import get_volume


    class UnitCellVolumeNormalizer(Normalizer):
        def normalize(self):
            # Iterate over all systems of the last run in the archive.
            for system in self.archive.section_run[-1].section_system:
                # Pass the plain (unit-less) lattice vectors of this system.
                system.unit_cell_volume = get_volume(system.lattice_vectors.magnitude)

                self.logger.debug('computed unit cell volume', system_index=system.m_parent_index)

You simply inherit from ``Normalizer`` and implement the ``normalize`` method. The
``archive`` is available as a field. There is also a logger on the object that can be used.
Be aware that the processing will already report the run of the normalizer, log its execution
time, and log any exceptions that might have been thrown.
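
For readers unfamiliar with the underlying geometry, the volume of a cell spanned by three lattice vectors is the absolute value of their scalar triple product. The sketch below is a stand-in written for illustration; ``cell_volume`` is a hypothetical name, and NOMAD's own ``get_volume`` may be implemented differently.

```python
# Stand-in for illustration; NOMAD's own get_volume may be implemented differently.


def cell_volume(lattice_vectors):
    """Return the volume spanned by three lattice vectors a, b, c as |a . (b x c)|."""
    a, b, c = lattice_vectors
    # Cross product b x c.
    cross = (
        b[1] * c[2] - b[2] * c[1],
        b[2] * c[0] - b[0] * c[2],
        b[0] * c[1] - b[1] * c[0],
    )
    # Scalar triple product; abs() makes the result independent of handedness.
    return abs(sum(a_i * x_i for a_i, x_i in zip(a, cross)))


# A cubic cell with edge length 2 (in whatever length unit the input uses).
cubic = [(2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 2.0)]
volume = cell_volume(cubic)
```

The result carries the cube of the input length unit, so lattice vectors given in metres yield a volume in cubic metres.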
Of course, if you add new information to the archive, it also needs to be defined in the
metainfo. For example, you could extend the section system with a special system definition
that extends the existing section system definition:

.. code-block:: python

    import numpy as np
    from nomad.datamodel.metainfo.public import section_system as System
    from nomad.metainfo import Section, Quantity


    class UnitCellVolumeSystem(System):
        m_def = Section(extends_base_section=True)
        unit_cell_volume = Quantity(np.dtype(np.float64), unit='m^3')

Alternatively, you can simply alter the ``section_system`` class directly (``nomad/datamodel/metainfo/``).

System normalizer
-----------------

There is a special base class for normalizing systems that allows you to run the normalization
on all systems (or only on the resulting `representative` system):

.. code-block:: python

    from nomad.normalizing import SystemBasedNormalizer
    from nomad.atomutils import get_volume


    class UnitCellVolumeNormalizer(SystemBasedNormalizer):
        def _normalize_system(self, system, is_representative):
            # Called once per system; the lattice vectors come from that system.
            system.unit_cell_volume = get_volume(system.lattice_vectors.magnitude)

The parameter ``is_representative`` will be true for the `representative` system, i.e.
the final step in a geometry optimization or other workflow.
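
How such a base class might dispatch to ``_normalize_system`` can be sketched as follows. This is a simplified stand-in, not the real ``SystemBasedNormalizer``: the class name ``SystemLoopSketch``, the ``only_representatives`` switch, and the rule that the last system is the representative one are all assumptions made for the illustration.

```python
# Hypothetical stand-in, not the real SystemBasedNormalizer.


class SystemLoopSketch:
    def __init__(self, systems, only_representatives=False):
        self.systems = systems
        self.only_representatives = only_representatives
        self.visited = []

    def normalize(self):
        if not self.systems:
            return
        # Assumption for this sketch: the last system is the representative one.
        representative_index = len(self.systems) - 1
        for index, system in enumerate(self.systems):
            is_representative = index == representative_index
            if self.only_representatives and not is_representative:
                continue
            self._normalize_system(system, is_representative)

    def _normalize_system(self, system, is_representative):
        # A subclass would add quantities here; the sketch only records the call.
        self.visited.append((system, is_representative))


all_systems = SystemLoopSketch(['step0', 'step1', 'step2'])
all_systems.normalize()

only_final = SystemLoopSketch(['step0', 'step1', 'step2'], only_representatives=True)
only_final.normalize()
```

In the first case every step is visited and only the last call receives ``is_representative=True``; in the second case only the final step is visited.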

Adding a normalizer to the processing
-------------------------------------

For any new normalizer class to be recognized by the processing, the normalizer class
needs to be added to the list of normalizers in ``nomad/normalizing/``.
The order of the normalizers in this list will also determine the execution order of
the normalizers during processing.

.. code-block:: python

    normalizers: Iterable[Type[Normalizer]] = [
        # ... the normalizer classes, listed in execution order ...
    ]


Testing a normalizer
--------------------

To simply try out a normalizer, you can use the CLI and run the parse command:

.. code-block:: sh

    nomad --debug parse --show-archive <path-to-example-file>

But eventually you need to add a more formal test. Place your ``pytest``-tests in
``tests/normalizing/`` similar to the existing tests. Necessary
test data can be added to ``tests/data/normalizers``.
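
A formal test usually builds a small input, runs the normalizer, and checks both that the new quantity appears and that existing data is left untouched. The sketch below shows that shape with plain dictionaries; ``add_atom_count`` and ``test_add_atom_count`` are hypothetical names, and a real test would instead run an actual normalizer on a parsed archive, as the existing tests do.

```python
import copy


def add_atom_count(system):
    # Hypothetical normalizer under test: it only adds a derived value.
    system['n_atoms'] = len(system['atom_labels'])


def test_add_atom_count():
    system = {'atom_labels': ['Si', 'Si']}
    before = copy.deepcopy(system)

    add_atom_count(system)

    # The new quantity was added ...
    assert system['n_atoms'] == 2
    # ... and the existing information is unchanged.
    for key, value in before.items():
        assert system[key] == value
```

Because the function name starts with ``test_``, ``pytest`` would collect it automatically when the file is placed in a test directory.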