Commit 53fa1e8c authored by Markus Scheidgen's avatar Markus Scheidgen

Merge branch 'v0.5.1' into 'master'

Merge for v0.5.1 release

See merge request !51
parents 55c0b9ff fbf95ffb
Pipeline #54040 canceled with stage
in 36 seconds
......@@ -19,9 +19,11 @@ stages:
KUBECONFIG: /etc/deploy/config
......@@ -126,10 +128,14 @@ release_version:
- docker login -u gitlab-ci-token -p $CI_BUILD_TOKEN
- docker pull $LATEST_IMAGE
- docker push $RELEASE_IMAGE
- docker push $STABLE_IMAGE
- tags
......@@ -79,7 +79,9 @@ your browser.
### v0.5.1
- integrated parsers Dmol3, qbox, molcas, fleur, and onetep
- API endpoint for query based raw file download
- improvements to admin cli: e.g. clean staging files, reprocess uploads based on codes
- improved error handling in the GUI
- lots of parser bugfixes
- lots of minor bugfixes
API Documentation
......@@ -8,6 +9,7 @@ Summary
API Details
......@@ -4,7 +4,6 @@ Reference
.. automodule:: nomad.config
......@@ -37,6 +36,7 @@ nomad.processing
.. automodule::
......@@ -6,6 +6,7 @@ import { kibanaBase, apiBase, debug } from '../config'
import { compose } from 'recompose'
import { withApi } from './api'
import { withDomain } from './domains'
import packageJson from '../../package.json'
class About extends React.Component {
static propTypes = {
......@@ -74,7 +75,8 @@ class About extends React.Component {
` : ''}
### About this version
- version: \`${info ? info.version : 'loading'}/${info ? info.release : 'loading'}\`
- version (API): \`${info ? info.version : 'loading'}/${info ? info.release : 'loading'}\`
- version (GUI): \`${packageJson.version}\`
- domain: \`${info ? info.domain : 'loading'}\`
- git: \`${info ? info.git.ref : 'loading'}; ${info ? info.git.version : 'loading'}\`
- last commit message: *${info ? info.git.log : 'loading'}*
......@@ -16,7 +16,7 @@
Command line interface (CLI) for nomad. Provides a group/sub-command structure, similar to git,
that offers various functionality to the command line user.
Use it from the command line with ``nomad --help`` or ``python -m nomad.cli --help``to learn
Use it from the command line with ``nomad --help`` or ``python -m nomad.cli --help`` to learn
......@@ -12,6 +12,26 @@
# See the License for the specific language governing permissions and
# limitations under the License.
This module describes all configurable parameters for the nomad python code. The
configuration is used for all executed python code including API, worker, CLI, and other
scripts. To use the configuration in your own scripts or new modules, simply import
this module.
All parameters are structured into objects for two reasons. First, to have
categories. Second, to allow runtime manipulation that is not affected
by python import logic. The categories are chosen along infrastructure components:
``mongo``, ``elastic``, etc.
This module also provides utilities to read the configuration from environment variables
and .yaml files. This is done automatically on import. The precedence is env over .yaml
over defaults.
.. autoclass:: nomad.config.NomadConfig
.. autofunction:: nomad.config.apply
.. autofunction:: nomad.config.load_config
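The described precedence (environment over ``.yaml`` over defaults) can be sketched as follows. This is an illustrative sketch, not the actual nomad code; the ``NOMAD_`` prefix and the key names are assumptions for demonstration.

```python
# Sketch of layered config precedence: env overrides .yaml, .yaml overrides defaults.
import os

DEFAULTS = {'mongo_host': 'localhost', 'mongo_port': 27017}

def load_config(yaml_values: dict, environ: dict = None) -> dict:
    environ = os.environ if environ is None else environ
    config = dict(DEFAULTS)
    config.update(yaml_values)  # .yaml values override defaults
    for key in config:          # env variables override .yaml values
        env_key = 'NOMAD_' + key.upper()
        if env_key in environ:
            config[key] = environ[env_key]
    return config

cfg = load_config({'mongo_port': 27018}, {'NOMAD_MONGO_HOST': 'db.example.com'})
```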
import logging
import os
import os.path
......@@ -27,7 +47,8 @@ warnings.filterwarnings("ignore", message="numpy.ufunc size changed")
class NomadConfig(dict):
A dict subclass that uses attributes as key/value pairs.
A class for configuration categories. It is a dict subclass that uses attributes as
key/value pairs.
def __init__(self, **kwargs):
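A dict subclass with attribute access, as the docstring describes, can be sketched in a few lines. The class name here is illustrative, not the real ``NomadConfig`` implementation.

```python
# Minimal sketch of a dict subclass that exposes its keys as attributes.
class AttrDict(dict):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self[name] = value

mongo = AttrDict(host='localhost', port=27017)
mongo.port = 27018  # runtime manipulation, visible to all readers of the dict
```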
......@@ -158,7 +179,8 @@ mail = NomadConfig(
normalize = NomadConfig(
......@@ -256,6 +278,13 @@ def apply(key, value) -> None:
def load_config(config_file: str = os.environ.get('NOMAD_CONFIG', 'nomad.yaml')) -> None:
Loads the configuration from the ``config_file`` and environment.
config_file: Overrides the config file; the default is the file given by the env
variable NOMAD_CONFIG, or ``nomad.yaml``.
# load yaml and override defaults
if os.path.exists(config_file):
with open(config_file, 'r') as stream:
......@@ -426,10 +426,15 @@ def send_mail(name: str, email: str, message: str, subject: str):
msg['Subject'] = subject
msg['From'] = 'The nomad team <%s>' % config.mail.from_address
msg['To'] = name
to_addrs = [email]
if config.mail.cc_address is not None:
msg['Cc'] = 'The nomad team <%s>' % config.mail.cc_address
server.send_message(msg, config.mail.from_address, email)
server.send_message(msg, from_addr=config.mail.from_address, to_addrs=to_addrs)
except Exception as e:
logger.error('Could send email', exc_info=e)
logger.error('Could not send email', exc_info=e)
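The change above matters because ``SMTP.send_message`` delivers only to the addresses passed in ``to_addrs`` when that argument is given; a Cc address set only in the header would not receive the mail. A condensed sketch (addresses, subject, and the returned ``to_addrs`` are illustrative, not the real signature):

```python
# Sketch: Cc recipients must also appear in to_addrs for SMTP delivery.
from email.message import EmailMessage

def send_mail(server, from_address: str, email: str, name: str, cc_address=None):
    msg = EmailMessage()
    msg['Subject'] = 'nomad'
    msg['From'] = 'The nomad team <%s>' % from_address
    msg['To'] = name
    to_addrs = [email]
    if cc_address is not None:
        msg['Cc'] = 'The nomad team <%s>' % cc_address
        to_addrs.append(cc_address)  # header alone is not enough
    server.send_message(msg, from_addr=from_address, to_addrs=to_addrs)
    return to_addrs  # returned here only so the sketch is easy to check
```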
......@@ -21,26 +21,39 @@ Assumption about parsers
For now, we make a few assumptions about parsers:
- they always work on the same *meta-info* version
- they have no conflicting python requirments
- they have no conflicting python requirements
- they can be loaded at the same time and can be used within the same python process
- they are uniquely identified by a GIT URL and are publicly accessible
- their version is uniquely identified by a GIT commit SHA
Each parser is defined via an instance of :class:`Parser`.
Each parser is defined via an instance of :class:`Parser`. The implementation :class:`LegacyParser` is used for most NOMAD-coe parsers.
.. autoclass:: nomad.parsing.Parser
There are sub-classes for parsers with special purposes.
.. autoclass:: nomad.parsing.Parser
.. autoclass:: nomad.parsing.MatchingParser
.. autoclass:: nomad.parsing.MissingParser
.. autoclass:: nomad.parsing.BrokenParser
.. autoclass:: nomad.parsing.TemplateParser
.. autoclass:: nomad.parsing.GenerateRandomParser
.. autoclass:: nomad.parsing.ChaosParser
.. autoclass:: nomad.parsing.EmptyParser
The implementation :class:`LegacyParser` is used for most NOMAD-coe parsers.
.. autoclass:: nomad.parsing.LegacyParser
The parser definitions are available via the following two variables.
.. autodata:: nomad.parsing.parsers
.. autodata:: nomad.parsing.parser_dict
Parsers are reused for multiple caclulations.
Parsers are reused for multiple calculations.
Parsers and calculation files are matched via regular expressions.
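The matching idea can be sketched as follows; the class and parameter names are assumptions for illustration, not the real ``nomad.parsing`` API.

```python
# Illustrative sketch of regex-based mainfile/parser matching.
import re

class MatchingParserSketch:
    def __init__(self, name: str, mainfile_contents_re: str):
        self.name = name
        self._re = re.compile(mainfile_contents_re)

    def is_mainfile(self, buffer: str) -> bool:
        # A real implementation would only inspect the first few kB of the file.
        return self._re.search(buffer) is not None

example = MatchingParserSketch('parsers/example', r'^\s*vasp\.\d+\.\d+\.\d+')
```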
......@@ -56,8 +69,8 @@ based on NOMAD-coe's *python-common* module.
.. autoclass:: nomad.parsing.LocalBackend
from typing import Callable, IO, Union
import magic
import gzip
......@@ -27,6 +27,7 @@ from mongoengine import Document, StringField, ListField, DateTimeField, Validat
from mongoengine.connection import MongoEngineConnectionError
from mongoengine.base.metaclasses import TopLevelDocumentMetaclass
from datetime import datetime
import functools
from nomad import config, utils, infrastructure
import nomad.patch # pylint: disable=unused-import
......@@ -338,6 +339,7 @@ def task(func):
SUCCESS state. Calling the first task will put it into RUNNING state. Tasks will
only be executed, if the process has not yet reached FAILURE state.
def wrapper(self, *args, **kwargs):
if self.tasks_status == FAILURE:
......@@ -362,7 +364,6 @@ def task(func):
self.get_logger().critical('task wrapper failed with exception', exc_info=e)
setattr(wrapper, '__task_name', func.__name__)
wrapper.__name__ = func.__name__
return wrapper
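The behavior described above (tasks only execute if FAILURE has not been reached) can be condensed into a small sketch. This is a simplified illustration; the real wrapper also logs and advances task state.

```python
# Condensed sketch of a @task wrapper that skips tasks after FAILURE.
import functools

PENDING, RUNNING, FAILURE = 'PENDING', 'RUNNING', 'FAILURE'

def task(func):
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if self.tasks_status == FAILURE:
            return  # tasks are skipped once the process has failed
        self.tasks_status = RUNNING
        try:
            func(self, *args, **kwargs)
        except Exception:
            self.tasks_status = FAILURE
    setattr(wrapper, '__task_name', func.__name__)
    return wrapper

class DemoProc:
    def __init__(self):
        self.tasks_status = PENDING

    @task
    def extracting(self):
        raise RuntimeError('broken upload')

    @task
    def parsing(self):
        self.parsed = True

proc = DemoProc()
proc.extracting()  # raises internally, moves the process to FAILURE
proc.parsing()     # skipped, because FAILURE was already reached
```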
......@@ -519,6 +520,7 @@ def process(func):
other :class:`Proc` instances. Each :class:`Proc` instance can only run one
process at a time.
def wrapper(self, *args, **kwargs):
assert len(args) == 0 and len(kwargs) == 0, 'process functions must not have arguments'
if self.process_running:
......@@ -547,7 +549,7 @@ def process(func):
task = getattr(func, '__task_name', None)
if task is not None:
setattr(wrapper, '__task_name', task)
wrapper.__name__ = func.__name__
setattr(wrapper, '__process_unwrapped', func)
return wrapper
......@@ -19,9 +19,9 @@ calculations, and files
.. autoclass:: Calc
.. autoclass:: Upload
from typing import cast, List, Any, ContextManager, Tuple, Generator, Dict
......@@ -260,6 +260,7 @@ class Calc(Proc):
def parsing(self):
""" The *task* that encapsulates all parsing related actions. """
context = dict(parser=self.parser, step=self.parser)
logger = self.get_logger(**context)
parser = parser_dict[self.parser]
......@@ -334,6 +335,7 @@ class Calc(Proc):
def normalizing(self):
""" The *task* that encapsulates all normalizing related actions. """
for normalizer in normalizers:
if normalizer.domain != config.domain:
......@@ -365,6 +367,7 @@ class Calc(Proc):
def archiving(self):
""" The *task* that encapsulates all archival related actions. """
logger = self.get_logger()
calc_with_metadata = datamodel.CalcWithMetadata(**self.metadata)
......@@ -411,10 +414,13 @@ class Upload(Proc):
name: optional user provided upload name
upload_path: the path where the uploaded file was stored
temporary: True if the uploaded file should be removed after extraction
metadata: optional user provided additional meta data
upload_id: the upload id generated by the database
upload_time: the timestamp when the system realised the upload
user_id: the id of the user that created this upload
published: Boolean that indicates the publish status
publish_time: Date when the upload was initially published
last_update: Date of the last (re-)publishing
joined: Boolean that indicates if the running processing has joined (:func:`check_join`)
id_field = 'upload_id'
......@@ -443,7 +449,13 @@ class Upload(Proc):
def metadata(self) -> dict:
# TODO user_metadata needs to be stored in the public bucket, since staging data might not be shared
Getter, setter for user metadata. Metadata is pickled to and from the public
bucket to allow sharing among all processes. Usually uploads do not have (much)
user defined metadata, but users provide all metadata per upload as part of
the publish process. This will change, when we introduce editing functionality
and metadata will be provided through different means.
upload_files = PublicUploadFiles(self.upload_id, is_authorized=lambda: True)
except KeyError:
......@@ -452,7 +464,6 @@ class Upload(Proc):
def metadata(self, data: dict) -> None:
# TODO user_metadata needs to be stored in the public bucket, since staging data might not be shared
upload_files = PublicUploadFiles(self.upload_id, is_authorized=lambda: True, create=True)
upload_files.user_metadata = data
......@@ -624,6 +635,9 @@ class Upload(Proc):
def re_process_upload(self):
A *process* that performs the re-processing of a earlier processed
Runs the distributed process of fully reparsing/renormalizing an existing and
already published upload. Will renew the archive part of the upload and update
mongo and elastic search entries.
......@@ -683,11 +697,13 @@ class Upload(Proc):
def process_upload(self):
""" A *process* that performs the initial upload processing. """
def uploading(self):
""" A no-op *task* as a stand-in for receiving upload data. """
......@@ -709,7 +725,7 @@ class Upload(Proc):
def extracting(self):
Task performed before the actual parsing/normalizing. Extracting and bagging
The *task* performed before the actual parsing/normalizing. Extracting and bagging
the uploaded files, computing all keys, create an *upload* entry in the NOMAD-coe
repository db, etc.
......@@ -797,7 +813,7 @@ class Upload(Proc):
def parse_all(self):
Identified mainfile/parser combinations among the upload's files, creates
The *task* that identifies mainfile/parser combinations among the upload's files, creates
respective :class:`Calc` instances, and triggers their processing.
logger = self.get_logger()
......@@ -819,6 +835,14 @@ class Upload(Proc):
def check_join(self):
Performs an evaluation of the join condition and triggers the :func:`cleanup`
task if necessary. The join condition allows running the ``cleanup`` after
all calculations have been processed. The upload processing stops after all
calculation processings have been triggered (:func:`parse_all` or
:func:`re_process_upload`). The cleanup task is then run within the last
calculation process (the one that triggered the join by calling this method).
total_calcs = self.total_calcs
processed_calcs = self.processed_calcs
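The join condition itself reduces to a simple predicate; this is a back-of-envelope sketch under the assumption that ``joined`` guards against triggering cleanup twice, simplified from the distributed original.

```python
# Sketch of the join condition: the last finishing calculation triggers cleanup once.
def should_join(total_calcs: int, processed_calcs: int, already_joined: bool) -> bool:
    """True exactly when the caller should trigger the cleanup task."""
    return processed_calcs >= total_calcs and not already_joined
```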
......@@ -893,6 +917,10 @@ class Upload(Proc):
def cleanup(self):
The *task* that "cleans" the processing, i.e. removes obsolete files and performs
pending archival operations. Depends on the type of processing.
if self.current_process == 're_process_upload':
......@@ -901,39 +929,66 @@ class Upload(Proc):
def get_calc(self, calc_id) -> Calc:
""" Returns the upload calc with the given id or ``None``. """
return Calc.objects(upload_id=self.upload_id, calc_id=calc_id).first()
def processed_calcs(self):
The number of calculations that have finished processing, successfully or not.
return Calc.objects(upload_id=self.upload_id, tasks_status__in=[SUCCESS, FAILURE]).count()
def total_calcs(self):
""" The number of all calculations. """
return Calc.objects(upload_id=self.upload_id).count()
def failed_calcs(self):
""" The number of calculations with failed processing. """
return Calc.objects(upload_id=self.upload_id, tasks_status=FAILURE).count()
def pending_calcs(self):
def pending_calcs(self) -> int:
""" The number of calculations with pending processing. """
return Calc.objects(upload_id=self.upload_id, tasks_status=PENDING).count()
def all_calcs(self, start, end, order_by=None):
Returns all calculations, paginated and ordered.
start: the start index of the requested page
end: the end index of the requested page
order_by: the property to order by
query = Calc.objects(upload_id=self.upload_id)[start:end]
return query.order_by(order_by) if order_by is not None else query
def outdated_calcs(self):
""" All successfully processed and outdated calculations. """
return Calc.objects(
upload_id=self.upload_id, tasks_status=SUCCESS,
def calcs(self):
""" All successfully processed calculations. """
return Calc.objects(upload_id=self.upload_id, tasks_status=SUCCESS)
def to_upload_with_metadata(self, user_metadata: dict = None) -> UploadWithMetadata:
This is the :py:mod:`nomad.datamodel` transformation method to transform
processing uploads into datamodel uploads. It will also implicitly transform
all calculations of this upload.
user_metadata: A dict of user metadata that is applied to the resulting
datamodel data and the respective calculations.
# prepare user metadata per upload and per calc
if user_metadata is not None:
calc_metadatas: Dict[str, Any] = dict()
......@@ -329,7 +329,7 @@ def scroll_search(
:func:`aggregate_search`, but pagination is replaced with scrolling, no ordering,
no property, and no metrics information is available.
he search is limited to parameters :param:`q` and :param:`search_parameters`,
The search is limited to parameters ``q`` and ``search_parameters``,
which work exactly as in :func:`entry_search`.
Scrolling is done by calling this function again and again with the same ``scroll_id``.
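A hypothetical usage sketch of that scrolling loop, assuming for illustration that the function returns a ``(scroll_id, results)`` pair:

```python
# Sketch: iterate over all results by repeatedly passing the returned scroll_id.
def scroll_all(scroll_search, q=None):
    scroll_id = None
    while True:
        scroll_id, results = scroll_search(q=q, scroll_id=scroll_id)
        if not results:
            break
        yield from results

# Stand-in for the real search function: serves three fixed "pages".
_pages = [[1, 2], [3], []]

def _fake_scroll_search(q=None, scroll_id=None):
    index = 0 if scroll_id is None else scroll_id
    return index + 1, _pages[index]

all_results = list(scroll_all(_fake_scroll_search))
```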
......@@ -397,13 +397,13 @@ def entry_search(
Performs a search and returns a paginated list of search results.
The search is determimed by the given elasticsearch_dsl query param:`q`,
param:`time_range` and additional :param:`search_parameters`.
The search is determined by the given elasticsearch_dsl query ``q``,
``time_range`` and additional ``search_parameters``.
The search_parameters have to match general or domain specific metadata quantities.
See module:`datamodel`.
The search results are paginated. Pagination is controlled by the pagination parameters
param:`page` and param:`per_page`. The results are ordered.
``page`` and ``per_page``. The results are ordered.
page: The page to return starting with page 1
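The 1-based ``page``/``per_page`` parameters map to a result window in the usual way; a quick sketch (the function name is illustrative):

```python
# Sketch: map 1-based page/per_page to a start/end slice (an ES from/size window).
def pagination_window(page: int, per_page: int):
    start = (page - 1) * per_page
    return start, start + per_page
```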
......@@ -518,9 +518,9 @@ def metrics_search(
datasets, and additional domain specific metrics (e.g. total energies, and unique geometries for DFT
calculations). The quantities that can be aggregated to metrics are defined
in module:`datamodel`. Aggregations and respective metrics are calculated for
aggregations given in param:`aggregations` and metrics in param:`aggregation_metrics`.
As a pseudo aggregation param:`total_metrics` are calculation over all search results.
The param:`aggregations` gives tuples of quantities and default aggregation sizes.
aggregations given in ``aggregations`` and metrics in ``aggregation_metrics``.
As a pseudo aggregation, ``total_metrics`` are calculated over all search results.
The ``aggregations`` parameter gives tuples of quantities and default aggregation sizes.
aggregations: A customized list of aggregations to perform. Keys are index fields,
images.nomad.tag: "stable"
images.frontend.tag: "stable"
nodePort: 30011
images.nomad.tag: "stable"
images.frontend.tag: "stable"
nodePort: 30012
images.nomad.tag: "latest"
images.frontend.tag: "latest"
nodePort: 30005
......@@ -20,3 +20,8 @@ The different overrides are:
The .env file contains some additional config and secrets. The development secrets do
not matter and are kept in git (`.env_development`); they are replaced by real secrets on
the production machine.
### Matomo (piwik)
This docker-compose can be used to run the user-data tracking server *Matomo* and its
database. This is currently not used by the official nomad production deployment.
......@@ -13,8 +13,3 @@ by using different URL-path and database names.
The chart does not run any databases and search engines. Those are supposed to run
separately (see also *nomad-full* for an alternative approach) and their hosts, etc.
can be configured via helm values.
### nomad-full
This chart is under development. It is an attempt to also run all required databases
and search engine in the same kubernetes cluster.
\ No newline at end of file