Commit a8513c50 authored by Markus Scheidgen

Backend refactor

parent 07effaf8
@@ -32,7 +32,7 @@ pip install nomad-lab
To **use the NOMAD parsers**, for example, install the `parsing` extra:
```
pip install nomad-lab[parsing]
-nomad parse --show-backend <your-file-to-parse>
+nomad parse --show-archive <your-file-to-parse>
```
### For NOMAD developers
...
@@ -42,7 +42,7 @@ To parse code input/output from the command line, you can use NOMAD's command line
interface (CLI) and print the processing results to stdout:
```
-nomad parse --show-backend <path-to-file>
+nomad parse --show-archive <path-to-file>
```
To parse a file in Python, you can program something like this:
...
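The actual snippet is collapsed in this view. As a minimal sketch, assuming the refactored helpers in `nomad.cli.parse` that appear later in this commit (the exact import path and arguments are assumptions):
```
from nomad.cli.parse import parse, normalize_all

# Match and run the right parser, then the whole normalizer chain.
entry_archive = parse('<path-to-file>')
normalize_all(entry_archive)

# The archive data as a plain dict, like `nomad parse --show-archive`.
print(entry_archive.m_to_dict())
```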
-Subproject commit afc55d917505d8a882611ca27ef91f0cc3ac11f6
+Subproject commit f41d95aa9bf238dbcf2258eba82c87ecdb491cd1
-Subproject commit 5b1305cb24a7ec2806a94b2b5192ab6ec9a7d0d2
+Subproject commit 41bc37d7d165f671de427ab25bda17d110e22e38
-Subproject commit 0a9bb17150428c5c86115091aed58a1ae502d96b
+Subproject commit bd9c04e281aa42010d3e57310f1106680a332763
@@ -4,11 +4,11 @@ Using the NOMAD parsers
To use the NOMAD parsers from the command line, you can use the ``parse`` command. The
parse command will automatically *match* the right parser to your code output file and
run the parser. There are two output formats: ``--show-metadata`` (a JSON representation
-of the repository metadata), ``--show-backend`` (a JSON representation of the archive data).
+of the repository metadata) and ``--show-archive`` (a JSON representation of the archive data).

.. code-block:: sh

-    nomad parse --show-backend <path-to-your-mainfile-code-output-file>
+    nomad parse --show-archive <path-to-your-mainfile-code-output-file>

You can also use the NOMAD parsers from within Python. This will give you the parse
results as metainfo objects so you can conveniently analyse them in Python. See :ref:`metainfo <metainfo-label>`
...
@@ -284,10 +284,9 @@ like all others: add __usrMyCodeLength to the group name.
## Backend
The backend is an object that stores parsed data according to its meta-info. The
-class :py:class:`nomad.parsing.AbstractParserBackend` provides the basic backend interface.
+class :py:class:`nomad.parsing.Backend` provides the basic backend interface.
It allows you to open and close sections and to add values, arrays, and values to arrays.
-In nomad@FAIRDI, we practically only use the :py:class:`nomad.parsing.LocalBackend`. In
-NOMAD-coe multiple backend implementations existed to facilitate the communication of
+In NOMAD-coe, multiple backend implementations existed to facilitate the communication of
Python parsers with the Scala infrastructure, including caching and streaming.
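For orientation, opening sections and adding values through this interface looks roughly as follows. This is a sketch only: the method names follow the legacy parser-backend convention and are assumptions, not the verbatim `nomad.parsing.Backend` signatures.
```
import numpy as np

# Sketch of the section/value interface; method names are assumed.
run_index = backend.openSection('section_run')
backend.addValue('program_name', 'VASP')

system_index = backend.openSection('section_system')
backend.addArrayValues('atom_positions', np.zeros((2, 3)))
backend.closeSection('section_system', system_index)

backend.closeSection('section_run', run_index)
```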
## Triggers
...
@@ -124,6 +124,9 @@ nomad dev searchQuantities > gui/src/searchQuantities.json
./gitinfo.sh
```
+In addition, you have to do some more steps to prepare your working copy to run all
+the tests. See below.
## Build and run the infrastructure with docker
### Docker and nomad
@@ -218,6 +221,33 @@ yarn start
```
## Run the tests
+### Additional settings and artifacts
+To run the tests, some additional settings and files are necessary that are not part
+of the code base.
+First, you need to create a `nomad.yaml` with the admin password for the user management
+system:
+```
+keycloak:
+  password: <the-password>
+```
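A quick way to check that this file is picked up, assuming the `nomad.config` layout used elsewhere in this commit, is something like:
```
from nomad import config

# Sanity check; assumes nomad.yaml is found and that config exposes
# the keycloak section as an attribute (an assumption, not verified here).
print('keycloak password set:', bool(config.keycloak.password))
```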
+Second, you need to provide the Springer materials database `springer.msg`. It can
+be copied from `/nomad/fairdi/db/data/springer.msg` on our servers and should
+be placed at `nomad/normalizing/data/springer.msg`.
+Third, you have to provide static files to serve the docs and the NOMAD distribution:
+```
+cd docs
+make html
+cd ..
+python setup.py compile
+python setup.py sdist
+cp dist/nomad-lab-*.tar.gz dist/nomad-lab.tar.gz
+```
+### Run the necessary infrastructure
You need to have the infrastructure partially running: elastic, rabbitmq.
The rest should be mocked or provided by the tests. Make sure that you do not run any
worker, as they will fight for tasks in the queue.
...
elastic:
  index_name: fairdi_nomad_ems
mongo:
  db_name: fairdi_nomad_ems
domain: EMS
@@ -23,7 +23,6 @@ import json
from nomad import config, utils
from nomad import files
-from nomad import datamodel
from nomad.cli import parse as cli_parse
from .client import client
@@ -131,31 +130,31 @@ class CalcProcReproduction:
            self.upload_files.raw_file_object(self.mainfile).os_path,
            parser_name=parser_name, logger=self.logger, **kwargs)

-    def normalize(self, normalizer: typing.Union[str, typing.Callable], parser_backend=None):
+    def normalize(self, normalizer: typing.Union[str, typing.Callable], entry_archive=None):
        '''
        Parse the downloaded calculation and run the given normalizer.
        '''
-        if parser_backend is None:
-            parser_backend = self.parse()
-        return cli_parse.normalize(parser_backend=parser_backend, normalizer=normalizer, logger=self.logger)
+        if entry_archive is None:
+            entry_archive = self.parse()
+        return cli_parse.normalize(entry_archive=entry_archive, normalizer=normalizer, logger=self.logger)

-    def normalize_all(self, parser_backend=None):
+    def normalize_all(self, entry_archive=None):
        '''
        Parse the downloaded calculation and run the whole normalizer chain.
        '''
-        return cli_parse.normalize_all(parser_backend=parser_backend, logger=self.logger)
+        return cli_parse.normalize_all(entry_archive=entry_archive, logger=self.logger)
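Read together, a hypothetical reproduction session with the refactored class would look like this; names are taken from the hunks above and below, and the plain `calc_id` constructor call is an assumption:
```
import json
import sys

# Hypothetical sketch; assumes CalcProcReproduction from this module.
with CalcProcReproduction('<calc-id>') as local:
    entry_archive = local.parse()
    local.normalize_all(entry_archive=entry_archive)
    json.dump(entry_archive.m_to_dict(), sys.stdout, indent=2)
```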
@client.command(help='Run processing locally.')
@click.argument('CALC_ID', nargs=1, required=True, type=str)
@click.option('--override', is_flag=True, help='Override existing local calculation data.')
-@click.option('--show-backend', is_flag=True, help='Print the backend data.')
+@click.option('--show-archive', is_flag=True, help='Print the archive data.')
@click.option('--show-metadata', is_flag=True, help='Print the extracted repo metadata.')
@click.option('--mainfile', default=None, type=str, help='Use this mainfile (in case the mainfile cannot be retrieved via API).')
@click.option('--skip-normalizers', is_flag=True, help='Do not normalize.')
@click.option('--not-strict', is_flag=True, help='Also match artificial parsers.')
-def local(calc_id, show_backend, show_metadata, skip_normalizers, not_strict, **kwargs):
+def local(calc_id, show_archive, show_metadata, skip_normalizers, not_strict, **kwargs):
    utils.get_logger(__name__).info('Using %s' % config.client.url)

    with CalcProcReproduction(calc_id, **kwargs) as local:
@@ -163,15 +162,15 @@ def local(calc_id, show_backend, show_metadata, skip_normalizers, not_strict, **
        print(
            'Data being saved to .volumes/fs/tmp/repro_'
            '%s if not already there' % local.upload_id)
-        backend = local.parse(strict=not not_strict)
+        entry_archive = local.parse(strict=not not_strict)

        if not skip_normalizers:
-            local.normalize_all(parser_backend=backend)
+            local.normalize_all(entry_archive=entry_archive)

-        if show_backend:
-            json.dump(backend.resource.m_to_dict(), sys.stdout, indent=2)
+        if show_archive:
+            json.dump(entry_archive.m_to_dict(), sys.stdout, indent=2)

        if show_metadata:
-            metadata = datamodel.EntryMetadata(domain='dft')  # TODO take domain from matched parser
-            metadata.apply_domain_metadata(backend)
+            metadata = entry_archive.section_metadata
+            metadata.apply_domain_metadata(entry_archive)
            json.dump(metadata.m_to_dict(), sys.stdout, indent=4)
@@ -41,18 +41,20 @@ def parse(
    if hasattr(parser, 'backend_factory'):
        setattr(parser, 'backend_factory', backend_factory)

-    parser_backend = parser.run(mainfile_path, logger=logger)
-
-    if not parser_backend.status[0] == 'ParseSuccess':
-        logger.error('parsing was not successful', status=parser_backend.status)
+    entry_archive = datamodel.EntryArchive()
+    metadata = entry_archive.m_create(datamodel.EntryMetadata)
+    metadata.domain = parser.domain
+    try:
+        parser.parse(mainfile_path, entry_archive, logger=logger)
+    except Exception as e:
+        logger.error('parsing was not successful', exc_info=e)

    logger.info('ran parser')
-    return parser_backend
+    return entry_archive
def normalize(
-        normalizer: typing.Union[str, typing.Callable], parser_backend=None,
-        logger=None):
+        normalizer: typing.Union[str, typing.Callable], entry_archive, logger=None):
    if logger is None:
        logger = utils.get_logger(__name__)
@@ -63,50 +65,46 @@ def normalize(
        if normalizer_instance.__class__.__name__ == normalizer)

    assert normalizer is not None, 'there is no normalizer %s' % str(normalizer)
-    normalizer_instance = typing.cast(typing.Callable, normalizer)(parser_backend.entry_archive)
+    normalizer_instance = typing.cast(typing.Callable, normalizer)(entry_archive)
    logger = logger.bind(normalizer=normalizer_instance.__class__.__name__)
    logger.info('identified normalizer')

    normalizer_instance.normalize(logger=logger)
    logger.info('ran normalizer')
-    return parser_backend
-def normalize_all(parser_backend=None, logger=None):
+def normalize_all(entry_archive, logger=None):
    '''
    Parse the downloaded calculation and run the whole normalizer chain.
    '''
    for normalizer in normalizing.normalizers:
-        if normalizer.domain == parser_backend.domain:
-            parser_backend = normalize(
-                normalizer, parser_backend=parser_backend, logger=logger)
-
-    return parser_backend
+        if normalizer.domain == entry_archive.section_metadata.domain:
+            normalize(normalizer, entry_archive, logger=logger)
@cli.command(help='Run parsing and normalizing locally.', name='parse')
@click.argument('MAINFILE', nargs=1, required=True, type=str)
-@click.option('--show-backend', is_flag=True, default=False, help='Print the backend data.')
+@click.option('--show-archive', is_flag=True, default=False, help='Print the archive data.')
@click.option('--show-metadata', is_flag=True, default=False, help='Print the extracted repo metadata.')
@click.option('--skip-normalizers', is_flag=True, default=False, help='Do not run the normalizer.')
@click.option('--not-strict', is_flag=True, help='Also match artificial parsers.')
@click.option('--parser', help='Skip matching and use the provided parser.')
@click.option('--annotate', is_flag=True, help='Sub-matcher based parsers will create a .annotate file.')
def _parse(
-        mainfile, show_backend, show_metadata, skip_normalizers, not_strict, parser,
+        mainfile, show_archive, show_metadata, skip_normalizers, not_strict, parser,
        annotate):
    nomadcore.simple_parser.annotate = annotate
    kwargs = dict(strict=not not_strict, parser_name=parser)

-    backend = parse(mainfile, **kwargs)
+    entry_archive = parse(mainfile, **kwargs)

    if not skip_normalizers:
-        normalize_all(backend)
+        normalize_all(entry_archive)

-    if show_backend:
-        json.dump(backend.resource.m_to_dict(), sys.stdout, indent=2)
+    if show_archive:
+        json.dump(entry_archive.m_to_dict(), sys.stdout, indent=2)

    if show_metadata:
-        metadata = datamodel.EntryMetadata(domain='dft')  # TODO take domain from matched parser
-        metadata.apply_domain_metadata(backend)
+        metadata = entry_archive.section_metadata
+        metadata.apply_domain_metadata(entry_archive)
        json.dump(metadata.m_to_dict(), sys.stdout, indent=4)
# Copyright 2018 Markus Scheidgen
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np

from nomad import config


def get_optional_backend_value(backend, key, section, unavailable_value=None, logger=None):
    # Section is section_system, section_symmetry, etc...
    val = None  # Initialize to None, so we can compare section values.
    # Loop over the sections with the name section in the backend.
    for section_index in backend.get_sections(section):
        if section == 'section_system':
            try:
                if not backend.get_value('is_representative', section_index):
                    continue
            except (KeyError, IndexError):
                continue
        try:
            new_val = backend.get_value(key, section_index)
        except (KeyError, IndexError):
            new_val = None

        # Compare values from iterations.
        if val is not None and new_val is not None:
            if val.__repr__() != new_val.__repr__() and logger:
                logger.warning(
                    'The values for %s differ between different %s: %s vs %s' %
                    (key, section, str(val), str(new_val)))

        val = new_val if new_val is not None else val

    if val is None:
        # Warn (if a logger is given) and fall back to the unavailable value.
        if logger:
            logger.warning(
                'The values for %s were not available in any %s' % (key, section))
        return unavailable_value if unavailable_value is not None else config.services.unavailable_value

    if isinstance(val, np.generic):
        # Convert numpy scalars to plain Python types.
        return val.item()

    return val
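A hypothetical call, assuming `backend` is a legacy-style backend exposing the `get_sections`/`get_value` interface used above:
```
# Hypothetical usage; 'chemical_composition' and 'section_system' are
# standard metainfo names, the backend instance itself is assumed.
formula = get_optional_backend_value(
    backend, 'chemical_composition', 'section_system',
    unavailable_value='unavailable')
```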
@@ -460,7 +460,7 @@ class EntryMetadata(metainfo.MSection):
        ''' Applies a user provided metadata dict to this calc. '''
        self.m_update(**metadata)

-    def apply_domain_metadata(self, backend):
+    def apply_domain_metadata(self, archive):
        """Used to apply metadata that is related to the domain.
        """
        assert self.domain is not None, 'all entries must have a domain'
@@ -473,7 +473,7 @@ class EntryMetadata(metainfo.MSection):
        if domain_section is None:
            domain_section = self.m_create(domain_section_def.section_cls)

-        domain_section.apply_domain_metadata(backend)
+        domain_section.apply_domain_metadata(archive)


class EntryArchive(metainfo.MSection):
...
@@ -289,18 +289,17 @@ class DFTMetadata(MSection):
            self.m_parent.with_embargo,
            user_id)

-    def apply_domain_metadata(self, backend):
+    def apply_domain_metadata(self, entry_archive):
        from nomad.normalizing.system import normalized_atom_labels

        entry = self.m_parent
        logger = utils.get_logger(__name__).bind(
            upload_id=entry.upload_id, calc_id=entry.calc_id, mainfile=entry.mainfile)

-        if backend is None:
+        if entry_archive is None:
            self.code_name = self.code_name_from_parser()
            return

-        entry_archive = backend.entry_archive
        section_run = entry_archive.section_run
        if not section_run:
            logger.warn('no section_run found')
@@ -321,7 +320,7 @@ class DFTMetadata(MSection):
            else:
                raise KeyError
        except KeyError as e:
-            logger.warn('backend after parsing without program_name', exc_info=e)
+            logger.warn('archive without program_name', exc_info=e)
            self.code_name = self.code_name_from_parser()

        try:
...
@@ -48,15 +48,15 @@ class EMSMetadata(MSection):
    quantities = Quantity(type=str, shape=['0..*'], default=[], a_search=Search())
    group_hash = Quantity(type=str, a_search=Search())

-    def apply_domain_metadata(self, backend):
+    def apply_domain_metadata(self, entry_archive):
        from nomad import utils

-        if backend is None:
+        if entry_archive is None:
            return

        entry = self.m_parent
-        root_section = backend.entry_archive.section_experiment
+        root_section = entry_archive.section_experiment

        entry.formula = root_section.section_sample[0].sample_chemical_formula
        atoms = root_section.section_sample[0].sample_atom_labels
        if hasattr(atoms, 'tolist'):
...
@@ -50,7 +50,7 @@ class BasisSet(ABC):
    @abstractmethod
    def to_dict(self) -> RestrictedDict:
-        """Used to extract basis set settings from the backend and return
+        """Used to extract basis set settings from the archive and return
        them as a RestrictedDict.
        """
        pass
...
@@ -106,7 +106,7 @@ class EncyclopediaNormalizer(Normalizer):
            except (AttributeError, KeyError):
                pass
            else:
-                # Try to find system type information from backend for the selected system.
+                # Try to find system type information from the archive for the selected system.
                try:
                    system = self.section_run.section_system[system_idx]
                    stype = system.system_type
@@ -278,7 +278,7 @@ class EncyclopediaNormalizer(Normalizer):
            representative_scc_idx=representative_scc_idx,
        )

-        # Put the encyclopedia section into backend
+        # Put the encyclopedia section into the archive
        self.fill(context)

        # Check that the necessary information is in place
...
@@ -33,8 +33,8 @@ class OptimadeNormalizer(SystemBasedNormalizer):
    This normalizer produces the section with all data necessary for the Optimade API.
    It assumes that the :class:`SystemNormalizer` was run before.
    '''
-    def __init__(self, backend):
-        super().__init__(backend, only_representatives=True)
+    def __init__(self, archive):
+        super().__init__(archive, only_representatives=True)

    def add_optimade_data(self, index) -> OptimadeEntry:
        '''
...
@@ -23,9 +23,6 @@ class WorkflowNormalizer(Normalizer):
    This normalizer produces the section with all data necessary for the Optimade API.
    It assumes that the :class:`SystemNormalizer` was run before.
    '''
-    def __init__(self, backend):
-        super().__init__(backend)
-
    def _get_relaxation_type(self):
        sec_system = self.section_run.section_system
        if not sec_system:
...
@@ -64,12 +64,10 @@ backends. In nomad@FAIRDI, we currently only use a single backend. The following
classes provide an interface definition for *backends* as an ABC and a concrete implementation
based on nomad@fairdi's metainfo: