Commit d548c92e authored by Markus Scheidgen

updated developer documentation

parent 495c0d33
Pipeline #88466 canceled with stages in 10 minutes and 22 seconds
......@@ -38,7 +38,7 @@ Let's say you want to see the repository metadata (i.e. the information that you
our gui) for entries that fit search criteria, like compounds that contain the atoms *Si* and *O*:
```sh
curl -X GET "http://nomad-lab.eu/prod/rae/api/repo/?atoms=Si&atoms=O"
```
......@@ -46,7 +46,7 @@ Here we used curl to send an HTTP GET request to return the resource located by
In practice you can omit the `-X GET` (which is the default) and you might want to format
the output:
```sh
curl "http://nomad-lab.eu/prod/rae/api/repo/?atoms=Si&atoms=O" | python -m json.tool
```
......@@ -68,21 +68,21 @@ Similar functionality is offered to download archive or raw data. Let's say you
identified an entry (given via an `upload_id`/`calc_id`, see the query output), and
you want to download it:
```sh
curl "http://nomad-lab.eu/prod/rae/api/raw/calc/JvdvikbhQp673R4ucwQgiA/k-ckeQ73sflE6GDA80L132VCWp1z/*" -o download.zip
```
With `*` you request all the files under an entry or path.
If you need a specific file (that you already know) of that calculation:
```sh
curl "http://nomad-lab.eu/prod/rae/api/raw/calc/JvdvikbhQp673R4ucwQgiA/k-ckeQ73sflE6GDA80L132VCWp1z/INFO.OUT"
```
You can also download a specific file from the upload (given an `upload_id`), if you know
the path of that file:
```sh
curl "http://nomad-lab.eu/prod/rae/api/raw/JvdvikbhQp673R4ucwQgiA/exciting_basis_set_error_study/monomers_expanded_k8_rgkmax_080_PBE/72_Hf/INFO.OUT"
```
......@@ -90,27 +90,27 @@ If you have a query
that is more selective, you can also download all results. Here: all compounds that consist
only of Si and O, from bulk material simulations of cubic systems (currently ~100 entries):
```sh
curl "http://nomad-lab.eu/prod/rae/api/raw/query?only_atoms=Si&only_atoms=O&system=bulk&crystal_system=cubic" -o download.zip
```
Here are a few more examples for downloading raw data based on a DOI or dataset.
You will have to encode non-URL-safe characters in dataset names (e.g. with a service like [www.urlencoder.org](https://www.urlencoder.org/)):
```sh
curl "http://nomad-lab.eu/prod/rae/api/raw/query?datasets.doi=10.17172/NOMAD/2020.03.18-1" -o download.zip
curl "http://nomad-lab.eu/prod/rae/api/raw/query?dataset=Full%20ahnarmonic%20stAViC%20approach%3A%20Silicon%20and%20SrTiO3" -o download.zip
```
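If you prefer not to rely on a web service, you can also produce the encoding locally, e.g. with Python's standard library. This is just a sketch; the quoted string is the decoded dataset name from the example above:
```sh
# URL-encode the dataset name (spaces become %20, ':' becomes %3A)
python3 -c "import urllib.parse; print(urllib.parse.quote('Full ahnarmonic stAViC approach: Silicon and SrTiO3'))"
```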
In a similar way you can see the archive of an entry:
```sh
curl "http://nomad-lab.eu/prod/rae/api/archive/f0KQE2aiSz2KRE47QtoZtw/6xe9fZ9xoxBYZOq5lTt8JMgPa3gX" | python -m json.tool
```
Or query and display the first page of 10 archives:
```sh
curl "http://nomad-lab.eu/prod/rae/api/archive/query?only_atoms=Si&only_atoms=O" | python -m json.tool
```
......@@ -164,7 +164,7 @@ Optionally, if you need to access your private data, the package *python-keycloa
required to conveniently acquire the necessary tokens to authenticate yourself towards
NOMAD.
```sh
pip install bravado
pip install python-keycloak
```
......@@ -386,20 +386,20 @@ The shell tool *curl* can be used to call most API endpoints. Most endpoints for
or downloading data are only **GET** operations controlled by URL parameters. For example,
downloading data:
```sh
curl http://nomad-lab.eu/prod/rae/api/raw/query?upload_id=<your_upload_id> -o download.zip
```
It is a little trickier if you need to authenticate yourself, e.g. to download
not yet published or embargoed data. All endpoints support, and most require, the use of
an access token. To acquire an access token from our user management system with curl:
```sh
curl --data 'grant_type=password&client_id=nomad_public&username=<your_username>&password=<your password>' \
https://nomad-lab.eu/fairdi/keycloak/auth/realms/fairdi_nomad_prod/protocol/openid-connect/token
```
You can use the access token with:
```sh
curl -H 'Authorization: Bearer <your_access_token>' \
http://nomad-lab.eu/prod/rae/api/raw/query?upload_id=<your_upload_id> -o download.zip
```
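For convenience, you can capture the token in a shell variable and reuse it. The following is only a sketch: it assumes `jq` is installed and extracts the standard OpenID Connect `access_token` field from the JSON response:
```sh
# acquire a token and store it in a variable (requires jq)
TOKEN=$(curl -s --data 'grant_type=password&client_id=nomad_public&username=<your_username>&password=<your password>' \
    https://nomad-lab.eu/fairdi/keycloak/auth/realms/fairdi_nomad_prod/protocol/openid-connect/token | jq -r '.access_token')
# use the token for an authenticated download
curl -H "Authorization: Bearer $TOKEN" \
    "http://nomad-lab.eu/prod/rae/api/raw/query?upload_id=<your_upload_id>" -o download.zip
```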
......
......@@ -11,7 +11,7 @@ are up and running and both have access to the underlying file storage, part of
which is mounted inside each container under :code:`.volumes/fs`.
With both the source and target deployment running, you can use the
:ref:`cli_ref:mirror` command to transfer the data from source to target. The
mirror will copy everything, i.e. the raw data, archive data, and associated
metadata in the database.
......
.. _install-client:
Install the NOMAD client library
================================
......
# Developing NOMAD
## Introduction
The nomad infrastructure consists of a series of nomad and 3rd party services:
- nomad worker (python): task worker that will do the processing
- nomad app (python): the nomad app and its REST APIs
- nomad gui: a small server serving the web-based react gui
- proxy: an nginx server that reverse proxies all services under one port
- elastic search: nomad's search and analytics engine
- mongodb: used to store processing state
- rabbitmq: a task queue used to distribute work in a cluster
All 3rd party services should be run via *docker-compose* (see below). The
nomad python services can be run with python to develop them.
The gui can be run with a development server via yarn.
Below you will find information on how to install all python dependencies and code
manually, how to use *docker*/*docker-compose*, and how to run the 3rd-party services with *docker-compose*.
Keep in mind the *docker-compose* configures all services in a way that mirrors
the configuration of the python code in `nomad/config.py` and the gui config in
`gui/.env.development`.
To learn about how to run everything in docker, e.g. to operate a NOMAD OASIS in
production, go [here](/app/docs/ops.html).
## Getting started
### Clone the sources
If not already done, you should clone nomad. To clone the main NOMAD repository:
```sh
git clone git@gitlab.mpcdf.mpg.de:nomad-lab/nomad-FAIR.git nomad
cd nomad
```
### Prepare your Python environment
You work in a Python virtual environment.
#### pyenv
The nomad code currently targets python 3.7. If your host machine has an older version installed,
......@@ -67,43 +29,50 @@ virtualenv -p `which python3` .pyenv
source .pyenv/bin/activate
```
#### conda
If you are a conda user, there is an equivalent, but you have to install pip and the
right python version while creating the environment.
```sh
conda create --name nomad_env pip python=3.7
conda activate nomad_env
```
To install libmagic for conda, you can use (other channels might also work):
```sh
conda install -c conda-forge --name nomad_env libmagic
```
#### pip
Make sure you have the most recent version of pip:
```sh
pip install --upgrade pip
```
The next steps can be done using the `setup.sh` script. If you prefer to understand all
the steps and run them manually, read on:
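If you go with the script, a minimal invocation from the repository root could look like this (a sketch; run it with `sh setup.sh` if the script is not executable):
```sh
# run the scripted setup instead of performing the following steps manually
./setup.sh
```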
#### Missing system libraries (e.g. on MacOS)
Even though the NOMAD infrastructure is written in python, there is a C library
required by one of our python dependencies. Libmagic is missing on some systems.
Libmagic is used to determine the MIME type of files. It should be installed on most
unix/linux systems. It can be installed on MacOS with homebrew:
```sh
brew install libmagic
```
### Install sub-modules.
Nomad is based on python modules from the NOMAD-coe project.
This includes parsers, python-common and the meta-info. These modules are maintained as
their own GitLab/git repositories. To clone and initialize them, run:
```sh
git submodule update --init
```
All requirements for these submodules need to be installed, and the submodules themselves
need to be installed as python modules. Run the `dependencies.sh` script, which will install
everything into your virtual environment:
```sh
./dependencies.sh -e
```
......@@ -112,19 +81,19 @@ to change the downloaded dependency code without having to reinstall after.
### Install nomad
Finally, you can add nomad to the environment itself (including all extras)
```sh
pip install -e .[all]
```
If pip tries to use and compile sources and this creates errors, it can be told to prefer the binary versions:
```sh
pip install -e .[all] --prefer-binary
```
### Generate GUI artifacts
The NOMAD GUI requires static artifacts that are generated from the NOMAD Python codes.
```sh
nomad dev metainfo > gui/src/metainfo.json
nomad dev searchQuantities > gui/src/searchQuantities.json
nomad dev units > gui/src/units.js
......@@ -136,92 +105,53 @@ the tests. See below.
## Running the infrastructure
### Docker and nomad
Nomad depends on a set of databases, search engines, and other services. Those
must run to make use of nomad. We use *docker* and *docker-compose* to create a
unified environment that is easy to build and to run.
You can use *docker* to run all necessary 3rd-party components and run all nomad
services manually from your python environment. You can also run nomad in docker,
but using Python is often preferred during development, since it allows you to change
things, debug, and re-run quickly. The latter brings you
closer to the environment that will be used to run nomad in production. For
development we recommend skipping the next step.
### Docker images for nomad
Nomad comprises currently two services,
the *worker* (does the actual processing), and the *app*. Those services can be
run from one image that have the nomad python code and all dependencies installed. This
is covered by the `Dockerfile` in the root directory
of the nomad sources. The gui is served also served from the *app* which entails the react-js frontend code.
Before building the image, make sure to execute
```sh
./gitinfo.sh
```
This allows the app to present some information about the current git revision without
having to copy the git itself to the docker build context.
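A typical build could then look like this (a sketch; the image tag `nomad` is only an example, not a required name):
```sh
# generate the git revision info, then build the combined app/worker image from the root Dockerfile
./gitinfo.sh
docker build -t nomad .
```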
To run NOMAD, some 3rd-party services are needed:
- elastic search: nomad's search and analytics engine
- mongodb: used to store processing state
- rabbitmq: a task queue used to distribute work in a cluster
### Run necessary 3rd-party services with docker-compose
You can run all services with:
```sh
cd ops/docker-compose/infrastructure
docker-compose up -d mongo elastic rabbitmq
```
To shut down everything, just `ctrl-c` the running output. If you started everything
in *daemon* mode (`-d`) use:
```sh
docker-compose down
```
The *docker-compose* can be overridden with additional settings. See the documentation section on
operating NOMAD for more details. The override `docker-compose.override.yml` will
expose all database ports to the host machine and should be used in development. To use
it, run docker-compose with `-f docker-compose.yml -f docker-compose.override.yml`.
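For example, this is the full command with both compose files for the services used above:
```sh
cd ops/docker-compose/infrastructure
docker-compose -f docker-compose.yml -f docker-compose.override.yml up -d mongo elastic rabbitmq
```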
### ELK (elastic stack)
If you run the ELK stack (and enable logstash in nomad/config.py),
you can reach Kibana at [localhost:5601](http://localhost:5601).
The index prefix for logs is `logstash-`. The ELK stack is only available with the
`docker-compose.dev-elk.yml` override.
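For example (a sketch; it assumes the ELK override is simply layered on top of the base compose file, and the exact service names may differ):
```sh
cd ops/docker-compose/infrastructure
docker-compose -f docker-compose.yml -f docker-compose.dev-elk.yml up -d
```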
### mongodb and elastic search
Usually these services are only used by NOMAD, but sometimes you also
need to check something or do some manual steps. You can access mongodb and elastic search
via your preferred tools. Just make sure to use the right ports.
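For example (a sketch assuming the development override exposes the default ports, 9200 for elastic search and 27017 for mongodb):
```sh
# list elastic search indices
curl "http://localhost:9200/_cat/indices?v"
# open a mongodb shell
mongo --port 27017
```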
## Running NOMAD
### API and worker
NOMAD consists of the NOMAD app/API, a worker, and the GUI. You can run the app and the worker with
the NOMAD cli:
```sh
nomad admin run app
nomad admin run worker
nomad admin run appworker
```
The app will run at port 8000 by default.
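Once the app is up, a quick reachability check can be done with curl; this is just a sketch assuming the default port mentioned above:
```sh
# check that the locally running app responds
curl -I http://localhost:8000
```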
To run the worker directly with celery, do (from the root)
```sh
celery -A nomad.processing worker -l info
```
### GUI
When you run the gui on its own (e.g. with the react dev server below), you have to have
the app running manually as well.
```sh
cd gui
yarn
yarn start
......@@ -229,13 +159,12 @@ yarn start
## Running tests
### additional settings and artifacts
To run the tests some additional settings and files are necessary that are not part
of the code base.
First you need to create a `nomad.yaml` with the admin password for the user management
system:
```yaml
keycloak:
  password: <the-password>
```
......@@ -245,7 +174,7 @@ be copied from `/nomad/fairdi/db/data/springer.msg` on our servers and should
be placed at `nomad/normalizing/data/springer.msg`.
Thirdly, you have to provide static files to serve the docs and NOMAD distribution:
```sh
cd docs
make html
cd ..
......@@ -254,24 +183,23 @@ python setup.py sdist
cp dist/nomad-lab-*.tar.gz dist/nomad-lab.tar.gz
```
### run the necessary infrastructure
You need to have the infrastructure partially running: elastic, rabbitmq.
The rest should be mocked or provided by the tests. Make sure that you do not run any
worker, as they will fight for tasks in the queue.
```sh
cd ops/docker-compose/infrastructure
docker-compose up -d elastic rabbitmq
cd ../../..
pytest -svx tests
```
We use pylint, pycodestyle, and mypy to ensure code quality. To run those:
```sh
nomad dev qa --skip-test
```
To run all tests and code qa:
```sh
nomad dev qa
```
......@@ -608,7 +536,7 @@ The lifecycle of a *feature* branch should look like this:
While working on a feature, there are certain practices that will help us to create
a clean history with coherent commits, where each commit stands on its own.
```sh
git commit --amend
```
......@@ -619,7 +547,7 @@ you are basically adding changes to the last commit, i.e. editing the last commi
you push, you need to force it `git push origin feature-branch --force-with-lease`. So be careful, and
only use this on your own branches.
```sh
git rebase <version-branch>
```
......@@ -633,7 +561,7 @@ more consistent history. You can also rebase before create a merge request, bas
allowing for no-op merges. Ideally, the only real merges that we ever have are between
version branches.
```sh
git merge --squash <other-branch>
```
......@@ -659,7 +587,7 @@ you have to make sure that the modules are updated to not accidentally commit ol
submodule commits again. Usually you do the following to check if you really have a
clean working directory.
```sh
git checkout something-with-changes
git submodule update
git status
......
GUI React Components
====================
This is the API reference for NOMAD's GUI React components.
.. contents:: Table of Contents
.. reactdocgen:: react-docgen.out
......@@ -67,6 +67,7 @@ Python modules:
from nomad.datamodel.metainfo.public import section_run
my_run = section_run()
Many more examples about how to read the NOMAD Metainfo programmatically can be found
`here <https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/tree/master/examples/access_metainfo.py>`_.
......
.. mdinclude:: ../../ops/containers/README.md
......@@ -6,4 +6,3 @@ Operating NOMAD
depl_docker.rst
depl_helm.rst
depl_images.rst
......@@ -8,12 +8,12 @@ Archive format. This is documentation on how to develop such a parser.
Let's assume we need to write a new parser from scratch.
First we need to install the *nomad-lab* Python package to get the necessary libraries:
```sh
pip install nomad-lab
```
We prepared an example parser project that you can work with.
```sh
git clone ... --branch hello-word
```
......@@ -21,7 +21,7 @@ Alternatively, you can fork the example project on GitHub to create your own par
your fork accordingly.
The project structure should be:
```none
example/exampleparser/__init__.py
example/exampleparser/__main__.py
example/exampleparser/metainfo.py
......@@ -33,7 +33,7 @@ example/setup.py
Next you should install your new parser with pip. The `-e` parameter installs the parser
in *development mode*. This means you can change the sources without the need to re-install.
```sh
cd example
pip install -e .
```
......@@ -61,13 +61,13 @@ populate the archive with a *root section* `Run` and set the program name to `EX
You can run the parser with the included `__main__.py`. It takes a file as argument and
you can run it like this:
```sh
python -m exampleparser test/data/example.out
```
The output should show the log entry and the minimal archive with one `section_run` and
the respective `program_name`.
```json
INFO root 2020-12-02T11:00:52 Hello World
- nomad.release: devel
- nomad.service: unknown nomad service
......@@ -86,12 +86,12 @@ Let's do some actual parsing. Here we demonstrate how to parse ASCII files with
structure information in them, as is typically produced by materials science codes.
On the `master` branch of the example project, we have a more 'realistic' example:
```sh
git checkout master
```
This example imagines a potential code output that looks like this (`tests/data/example.out`):
```none
2020/05/15
*** super_code v2 ***
......@@ -161,7 +161,7 @@ with a list of quantities to parse. To access a parsed quantity, one can use the
method.
We can apply these parser definitions like this:
```python
mainfile_parser.mainfile = mainfile
mainfile_parser.parse()
```
......@@ -195,7 +195,7 @@ for calculation in mainfile_parser.get('calculation'):
```
You can still run the parser on the given example file:
```sh
python -m exampleparser test/data/example.out
```
......@@ -232,7 +232,7 @@ To improve the parser quality and ease the further development, you should get i
habit of testing the parser.
We use the Python unit test framework *pytest*:
```sh
pip install pytest
```
......@@ -251,7 +251,7 @@ def test_example():
```
You can run all tests in the `tests` directory like this:
```sh
pytest -svx tests
```
......@@ -302,30 +302,30 @@ added to the infrastructure parser tests (`tests/parsing/test_parsing.py`).
Once the parser is added, it also becomes available through the command line interface and
normalizers are applied as well:
```sh
nomad parser test/data/example.out
```
## Developing an existing parser
To develop an existing parser, you should install all parsers:
```sh
pip install nomad-lab[parsing]
```
Clone the parser project on top:
```sh
git clone <parser-project-url>
cd <parser-dir>
```
Either remove the installed parser and pip install the cloned version:
```sh
rm -rf <path-to-your-python-env>/lib/python3.7/site-packages/<parser-module-name>
pip install -e .
```
Or use `PYTHONPATH` so that the cloned code takes precedence over the installed code:
```sh
PYTHONPATH=. nomad parser <path-to-example-file>
```
......@@ -437,7 +437,8 @@ The simplest kind of matcher looks like this
```
This matcher uses a single regular expression ([regular expressions documentation](https://docs.python.org/2/library/re.html))
to match a line. Here is an online tool to quickly [verify regular expressions](https://regex101.com/#python).
Note the following things:
......@@ -622,14 +623,13 @@ object (that might have cached values). This is useful to perform transformation
the data parsed before emitting it.
The simplest way to achieve this is to define methods named `onClose_` followed by the section name in the object that you pass as superContext.
For example:
```python
def onClose_section_scf_iteration(self, backend, gIndex, section):
    logging.getLogger("nomadcore.parsing").info("YYYY bla gIndex %d %s", gIndex, section.simpleValues)
```
This example defines a trigger called every time an scf iteration section is closed.
### Logging
You can use the standard python logging module in parsers. Be aware that all logging
......
......@@ -17,8 +17,6 @@
#