Commit 9c3983eb authored by Markus Scheidgen

Updated documentation.

parent 8ac6c422
......@@ -32,10 +32,6 @@ WORKDIR /install
# We also install the -dev dependencies, to use this image for test and qa
RUN pip install --upgrade pip
COPY requirements-dev.txt requirements-dev.txt
RUN pip install -r requirements-dev.txt
COPY requirements-dep.txt requirements-dep.txt
RUN pip install -r requirements-dep.txt
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
......
......@@ -12,4 +12,4 @@ API Details
-----------
.. autoflask:: nomad.api:app
:undoc-static:
\ No newline at end of file
:undoc-static:
......@@ -52,7 +52,9 @@ These *guidelines* are partially enforced by CI/CD. As part of CI all tests are
branches; further we run a *linter*, *pep8* checker, and *mypy* (static type checker). You can
run ``nomad qa`` to run all these tests and checks before committing.
Only the CI/CD on ``master`` will create new ``*:latest`` images and allow deployment.
The CI/CD will run on all refs that do not start with ``dev-``. The CI/CD will
not release or deploy anything automatically, but a release or deployment can be triggered manually after the
build and test stages have completed successfully.
Git/GitLab
......@@ -72,7 +74,7 @@ Terms and Identifiers
There is some terminology consistently used in this documentation and the source
code. Use this terminology for identifiers.
Do not use abbreviations. There are (few) exceptions: `proc` (processing); `exc`, `e` (exception);
Do not use abbreviations. There are (few) exceptions: ``proc`` (processing); ``exc``, ``e`` (exception);
``calc`` (calculation), ``repo`` (repository), ``utils`` (utilities), and ``aux`` (auxiliary).
Other exceptions are ``f`` for file-like streams and ``i`` for index running variables.
Btw., the latter is almost never necessary in python.
......@@ -88,21 +90,26 @@ Terms:
- repo entry: Some quantities of a calculation that are used to represent that calculation in the repository.
- archive data: The normalized data of one calculation in nomad's meta-info-based format.
Ids and Hashes
--------------
Ids
---
Throughout nomad, we use different ids and hashes to refer to entities. If something
Throughout nomad, we use different ids. If something
is called *id*, it is usually a random uuid and has no semantic connection to the entity
it identifies. If something is called a *hash* then it is a hash built from the
entity it identifies. This means either the whole thing or just some properties of
said entities.
The most common hashes are the *upload_hash* and *calc_hash*. The upload hash is
a hash over an uploaded file, as each upload usually refers to an individual user upload
(usually a .zip file). The calc_hash is a hash over the mainfile path within an upload.
The combination of upload_hash and calc_hash is used to identify calculations. They
allow us to id calculations independently of any random ids that are created during
processing. To create hashes we use :py:func:`nomad.utils.hash`.
- The most common hash is the ``calc_hash``, which is based on mainfile and auxfile contents.
- The ``upload_id`` is a UUID assigned at upload time and never changed afterwards.
- The ``mainfile`` is a path within an upload that points to a main code output file.
Since the upload directory structure does not change, this uniquely identifies a calc within the upload.
- The ``calc_id`` (internal calculation id) is a hash over the ``mainfile`` and respective
``upload_id``. Therefore, each ``calc_id`` identifies a calc on its own.
- We often use pairs of ``upload_id/calc_id``, which in many contexts allow us to resolve a
calc-related file on the filesystem without having to ask a database about it.
- The ``pid`` (or ``coe_calc_id``) is a sequential integer id.
- A calculation ``handle`` or ``handle_id`` is created based on the ``pid``.
To create hashes we use :py:func:`nomad.utils.hash`.
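The relationship between ``upload_id``, ``mainfile``, and ``calc_id`` described above can be sketched as follows. This is only an illustration: ``make_hash`` is a hypothetical stand-in for :py:func:`nomad.utils.hash`, whose actual algorithm, encoding, and output length are not stated here, and the example mainfile path is invented.

```python
import base64
import hashlib
import uuid


def make_hash(*parts: str) -> str:
    """Hypothetical stand-in for nomad.utils.hash; the real function may differ."""
    digest = hashlib.sha512()
    for part in parts:
        digest.update(part.encode('utf-8'))
        # A separator keeps ('ab', 'c') and ('a', 'bc') from hashing identically.
        digest.update(b'\0')
    # URL-safe base64, truncated to 28 characters (an arbitrary choice for this sketch).
    return base64.urlsafe_b64encode(digest.digest()).decode('ascii')[:28]


# The upload_id is a random UUID with no semantic connection to the upload.
upload_id = str(uuid.uuid4())
# The mainfile is a path within the upload (invented example path).
mainfile = 'examples/vasp/vasprun.xml'
# The calc_id is derived from both, so it identifies a calc on its own.
calc_id = make_hash(mainfile, upload_id)

print(f'{upload_id}/{calc_id}')
```

Because the ``calc_id`` is computed from stable inputs, the same ``mainfile`` in the same upload always yields the same id, while the ``upload_id`` itself stays random.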
NOMAD-coe Dependencies
......@@ -129,9 +136,11 @@ We use the branch ``nomad-fair`` on all dependencies for nomad-FAIRDI specific c
Parsers
^^^^^^^
There are several steps to take to make a NOMAD-coe parser fit for nomad-FAIRDI:
There are several steps to take to wrap a NOMAD-coe parser into a nomad@FAIRDI parser:
- Implement ``nomadcore.baseclasses.ParserInterface``. Make sure that the meta-info is
- Implement ``nomadcore.baseclasses.ParserInterface`` or a class with a similar constructor
and ``parse`` method interface.
- Make sure that the meta-info is
only loaded for each parser instance, not for each parser run.
- Have a root package that bears the parser name, e.g. ``vaspparser``
- The important classes (e.g. the parser interface implementation) in the root module
......@@ -140,28 +149,16 @@ There are several steps to take, to make a NOMOAD-coe parser fit for nomad-FAIRD
- Have a test module. Don't go overboard with the test data.
- Make it a pypi-style package, i.e. create ``setup.py`` script.
- The package name should be the parser name, e.g. ``vaspparser``.
- The parser should only use the provided logger
(:py:func:`nomadcore.baseclasses.ParserInterface::setup_logger`).
This is important for two reasons. First, our logging uses structured logging and
all entries are tagged with data about parser, upload_ids, mainfiles, etc. This is important
to make errors easily reproducible. Second, we store all logs of a parser run to
make them available to end users.
- Keep logging sensible (see logging below). Do not log everything. Do not log massive
amounts of data. Keep in mind what are errors (as in the parser cannot perform its job)
and what not (the input is faulty).
- Leave the parser logging as it is. We will catch it with a handler installed on the root logger.
This handler will redirect all legacy log events and put them through the nomad@FAIRDI
treatment described below.
- Remove all scala code.
Normalizers
^^^^^^^^^^^
There are several steps to take to make a NOMAD-coe normalizer fit for nomad-FAIRDI:
- If written in scala, re-write it in python.
- The normalizer should read from the provided backend. In NOMAD-coe normalizers read
data from provided serialized dictionaries. Don't do that; we do not want to use such
a normalizer specific interface.
- Do package, module, and logging related changes as you would for a parser.
We are rewriting all NOMAD-coe normalizers, see :py:mod:`nomad.normalizing`.
Logging
......@@ -177,9 +174,9 @@ There are three important prerequisites to understand about nomad-FAIRDI's loggi
end all entries are stored as JSON dictionaries with ``@timestamp``, ``level``,
``logger_name``, ``event`` plus custom context data. Keep events very short, most
information goes into the context.
- We use logging to inform us about the state of nomad-FAIRDI, not about user
- We use logging to inform about the state of nomad-FAIRDI, not about user
behavior, input, data. Do not confuse this when determining the log-level for an event.
A user providing an invalid upload file, for example, should never be an error.
For example, a user providing an invalid upload file should never be logged as an error.
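The entry layout described above (``@timestamp``, ``level``, ``logger_name``, ``event``, plus context data) can be sketched with Python's standard ``logging`` module. This is a minimal illustration, not nomad-FAIRDI's actual logging setup: ``JsonFormatter``, the ``context`` record attribute, the logger name, and the example values are all assumptions made for this sketch.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter that renders each log record as a JSON dictionary."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            '@timestamp': datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            'level': record.levelname,
            'logger_name': record.name,
            # Keep the event short; details go into the context.
            'event': record.getMessage(),
        }
        # Custom context data is passed via `extra={'context': {...}}` (assumed convention).
        entry.update(getattr(record, 'context', {}))
        return json.dumps(entry)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger('nomad.example')
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# An invalid upload is user input, not a nomad failure, so it is not logged as an error.
logger.warning(
    'invalid upload file',
    extra={'context': {'upload_id': 'example-upload-id', 'mainfile': 'example/mainfile'}})
```

The event string stays short and constant, which makes entries easy to search and aggregate, while the variable details travel as separate context fields.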
Please follow the following rules when logging:
......
......@@ -49,11 +49,9 @@ virtualenv -p `which python3` .pyenv
source .pyenv/bin/activate
```
We use *pip* to manage dependencies. There are multiple *requirements* files.
One of them, called *requirements-dev* contains all tools necessary to develop and build
nomad.
We use *pip* to manage required python packages.
```
pip install -r requirements-dev.txt
pip install -r requirements.txt
```
### Install NOMAD-coe dependencies.
......@@ -63,14 +61,7 @@ Those dependencies are managed and configured via python in
`nomad/dependencies.py`. This gives us more flexibility in interacting with
different parser, normalizer versions from within the running nomad infrastructure.
We compiled the *requirements-dep* file with python modules that are commonly
used in NOMAD-coe. It is optional, but you should install them first, as they sort
out some issues with installing dependencies in the right order later.
```
pip install -r requirements-dep.txt
```
To actually run the dependencies script:
To run the dependencies script and install all dependencies into your environment:
```
python nomad/dependencies.py --dev
```
......@@ -82,7 +73,6 @@ dependency code without having to reinstall after.
### Install nomad
Finally, you can add nomad to the environment itself.
```
pip install -r requirements.txt
pip install -e .
```
......@@ -272,8 +262,20 @@ You need to have the infrastructure partially running: elastic, rabbitmq.
The rest should be mocked or provided by the tests. Make sure that you do not run any
workers, as they will fight for tasks in the queue.
```
cd instrastructure
docker-compose up -d elastic rabbitmq
cd ..
pytest -sv tests
cd ops/docker-compose
docker-compose up -d elastic rabbitmq postgres
cd ../..
pytest -svx tests
```
We use pylint, pycodestyle, and mypy to ensure code quality. To run those:
```
nomad qa --skip-test
```
To run all tests and code qa:
```
nomad qa
```
This mimics the tests and checks that the GitLab CI/CD will perform.
......@@ -24,8 +24,8 @@ WORKDIR /install
# We also install the -dev dependencies, to use this image for test and qa
RUN pip install --upgrade pip
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
COPY rawapi.requirements.txt rawapi.requirements.txt
RUN pip install -r rawapi.requirements.txt
# do that after the dependencies to use docker's layer caching
COPY . /install
......
setuptools
pandas
pyyaml
h5py
hjson
future
enum34
scipy
ase==3.15.0
Pint==0.7.2
mdtraj==1.9.1
panedr==0.2
mdanalysis==0.16.2
parmed==3.0.0
\ No newline at end of file
watchdog
gitpython
mypy
pylint
pylint_plugin_utils
astroid==2.0.4 # bug in pylint_mongoengine with latest version
pylint_mongoengine
pycodestyle
pytest==3.10.0 # celery fixtures not compatible with 4.x
pytest-timeout
pytest-cov
rope
mongomock
numpy
cython>=0.19
\ No newline at end of file