Commit 8ac6c422 authored by Markus Scheidgen's avatar Markus Scheidgen

Updated documentation.

parent a113893e
@@ -2,6 +2,11 @@ body {
font-family: "Roboto", "Lato", "proxima-nova", "Helvetica Neue", Arial, sans-serif;
}
h1, h2, .rst-content .toctree-wrapper p.caption, h3, h4, h5, h6, legend {
font-family: inherit !important
}
.wy-nav-top {
display: none
}
@@ -7,6 +7,7 @@ Summary
.. qrefflask:: nomad.api:app
    :undoc-static:
API Details
-----------
docs/components.png (image changed: 89.8 KB → 107 KB)
@@ -20,7 +20,7 @@ sys.path.insert(0, os.path.abspath('..'))
# -- Project information -----------------------------------------------------
project = 'nomad-FAIR'
project = 'nomad-FAIRDI'
copyright = '2018, FAIRDI e.V.'
author = 'FAIRDI e.V.'
@@ -138,7 +138,7 @@ latex_elements = {
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'nomad-FAIR.tex', 'nomad-FAIR Documentation',
(master_doc, 'nomad-FAIRDI.tex', 'nomad-FAIRDI Documentation',
'the NOMAD developers', 'manual'),
]
@@ -148,7 +148,7 @@ latex_documents = [
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
(master_doc, 'nomad', 'nomad-FAIR Documentation',
(master_doc, 'nomad', 'nomad-FAIRDI Documentation',
[author], 1)
]
@@ -159,13 +159,20 @@ man_pages = [
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(master_doc, 'nomad-FAIR', 'nomad-FAIR Documentation',
author, 'nomad-FAIR', 'One line description of project.',
(master_doc, 'nomad-FAIRDI', 'nomad-FAIRDI Documentation',
author, 'nomad-FAIRDI', 'One line description of project.',
'Miscellaneous'),
]
# -- Extension configuration -------------------------------------------------
autodoc_member_order = 'bysource'
autodoc_default_options = {
'member-order': 'bysource',
'special-members': '__init__',
'undoc-members': None,
'exclude-members': '__weakref__'
}
# -- Options for todo extension ----------------------------------------------
docs/data.png (image changed: 93.3 KB → 48.8 KB)
@@ -108,7 +108,7 @@ processing. To create hashes we use :py:func:`nomad.utils.hash`.
NOMAD-coe Dependencies
----------------------
We currently clone and install NOMAD-coe dependencies *"outside"* the nomad-FAIR project
We currently clone and install NOMAD-coe dependencies *"outside"* the nomad-FAIRDI project
(see :py:mod:`nomad.dependencies`). The installed projects become part of the python
environment and all dependencies are used like regular PyPI packages and python modules.
@@ -116,20 +116,20 @@ This allows us to target (e.g. install) individual commits. In theory, these might
change during runtime, allowing parsers or normalizers to be updated on a running nomad.
More importantly, we can address commit hashes to identify exact parser/normalizer versions.
On the downside, common functions for all dependencies (e.g. the python-common package,
or nomad_meta_info) cannot be part of the nomad-FAIR project. In general, it is hard
to simultaneously develop nomad-FAIR and NOMAD-coe dependencies.
or nomad_meta_info) cannot be part of the nomad-FAIRDI project. In general, it is hard
to simultaneously develop nomad-FAIRDI and NOMAD-coe dependencies.
Another approach is to integrate the NOMAD-coe sources with nomad-FAIR. The lacking
Another approach is to integrate the NOMAD-coe sources with nomad-FAIRDI. The lacking
availability of individual commit hashes could be replaced with hashes of source-code
files.
We use the branch ``nomad-fair`` on all dependencies for nomad-FAIR specific changes.
We use the branch ``nomad-fair`` on all dependencies for nomad-FAIRDI specific changes.
Parsers
^^^^^^^
There are several steps to take to make a NOMAD-coe parser fit for nomad-FAIR:
There are several steps to take to make a NOMAD-coe parser fit for nomad-FAIRDI:
- Implement ``nomadcore.baseclasses.ParserInterface``. Make sure that the meta-info is
only loaded for each parse instance, not for each parser run.
@@ -155,7 +155,7 @@ There are several steps to take to make a NOMAD-coe parser fit for nomad-FAIR:
Normalizers
^^^^^^^^^^^
There are several steps to take to make a NOMAD-coe normalizer fit for nomad-FAIR:
There are several steps to take to make a NOMAD-coe normalizer fit for nomad-FAIRDI:
- If written in scala, re-write it in python.
- The normalizer should read from the provided backend. In NOMAD-coe normalizers read
@@ -167,7 +167,7 @@ There are several steps to take to make a NOMAD-coe normalizer fit for nomad-F
Logging
-------
There are three important prerequisites to understand about nomad-FAIR's logging:
There are three important prerequisites to understand about nomad-FAIRDI's logging:
- All log entries are recorded in a central Elasticsearch database. To make this database
useful, log entries must be sensible in size, frequency, meaning, level, and logger name.
@@ -177,7 +177,7 @@ There are three important prerequisites to understand about nomad-FAIR's logging
end all entries are stored as JSON dictionaries with ``@timestamp``, ``level``,
``logger_name``, ``event`` plus custom context data. Keep events very short, most
information goes into the context.
- We use logging to inform us about the state of nomad-FAIR, not about user
- We use logging to inform us about the state of nomad-FAIRDI, not about user
behavior, input, data. Do not confuse this when determining the log-level for an event.
A user providing an invalid upload file, for example, should never be an error.
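As a hedged illustration of this style (logger name, event string, and context keys
are made-up examples, not the actual nomad conventions), a structlog entry could be
produced like this:

.. code-block:: python

    import structlog

    logger = structlog.get_logger(__name__)

    # bind context once; it is attached to every subsequent event
    log = logger.bind(upload_id='some-upload-uuid')

    # keep the event short, put the details into the context
    log.info('upload received', user_id=42, size=1024)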
nomad@FAIR
==========
nomad@FAIRDI
============
This project is a prototype for the continuation of the original NOMAD-coe software
and infrastructure with a simplified architecture and consolidated code base.
# Introduction
**NOvel Materials Discovery (NOMAD)** comprises storage, processing, management, discovery, and
analytics of computational material science data from over 40 community *codes*.
The original NOMAD software, developed by the
[NOMAD-coe](http://nomad-coe.eu) project, is used to host over 50 million total energy
calculations in a single central infrastructure instance that offers a variety
of services (*repository*, *archive*, *encyclopedia*, *analytics*, *visualization*).
This is the documentation of **nomad@FAIRDI**, the Open-Source continuation of the
original NOMAD-coe software. It reconciles the original code base,
integrates its services, allows 3rd parties to run individual and federated instances of
the nomad infrastructure, provides nomad to other material science domains, and applies
the FAIRDI principles as promoted by the [FAIRDI Data Infrastructure e.V.](http://fairdi.eu).
A central and publicly available instance of the nomad software is run at the
[MPCDF](https://www.mpcdf.mpg.de/) in Garching, Germany.
The nomad software runs as SaaS on a server and is used via a web-based GUI and ReSTful
API. Originally developed and hosted as individual services, **nomad@FAIRDI**
provides all services behind one GUI and API as a single coherent, integrated, and
modular software project.
This documentation is only about the nomad *software*; it covers the architecture,
how to contribute, the code reference, and the engineering and operation of nomad. It is not a
nomad user manual.
## Architecture
The following depicts the *nomad@FAIRDI* architecture with respect to software components
in terms of python modules, gui components, and 3rd party services (e.g. databases,
search engines, etc.). It comprises a revised version of the repository and archive.
.. figure:: components.png
   :alt: nomad components
Besides various scientific computing, machine learning, and computational material
science libraries (e.g. numpy, scikit-learn, tensorflow, ase, spglib, matid, and many more),
Nomad uses a set of freely available or Open Source technologies that already solve most
of its processing, storage, availability, and scaling goals. The following is a
non-comprehensive overview of the used languages, libraries, frameworks, and services.
### Python 3
The *backend* of nomad is written in Python. This includes all parsers, normalizers,
and other data processing. We only use Python 3 and there is no compatibility with
Python 2. Code is formatted close to [pep8](https://www.python.org/dev/peps/pep-0008/),
critical parts use [pep484](https://www.python.org/dev/peps/pep-0484/) type-hints.
[Pycodestyle](https://pypi.org/project/pycodestyle/),
[pylint](https://www.pylint.org/), and
[mypy](http://mypy-lang.org/) (static type checker) are used to ensure quality.
Tests are written with [pytest](https://docs.pytest.org/en/latest/contents.html).
Logging is done with [structlog](https://www.structlog.org/en/stable/) and *logstash* (see
Elasticstack below). Documentation is driven by [Sphinx](http://www.sphinx-doc.org/en/master/).
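As a minimal sketch of these conventions (the function and test are illustrative,
not actual nomad code), a pep484-annotated function and its pytest test could look
like this:

```python
# illustrative only, not actual nomad code
from typing import List


def mean(values: List[float]) -> float:
    """ Returns the arithmetic mean of the given values. """
    return sum(values) / len(values)


def test_mean():
    assert mean([1.0, 2.0, 3.0]) == 2.0
```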
### celery
[Celery](http://celeryproject.org) (+ [rabbitmq](https://www.rabbitmq.com/))
is a popular combination for realizing long running tasks in internet applications.
We use it to drive the processing of uploaded files.
It allows us to transparently distribute processing load while keeping processing state
available to inform the user.
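A minimal sketch of this pattern (task name, body, and broker URL are hypothetical,
not nomad's actual processing tasks):

```python
# hypothetical sketch, not nomad's actual task definitions
from celery import Celery

app = Celery('tasks', broker='pyamqp://guest@localhost//')


@app.task
def process_upload(upload_id: str) -> str:
    # the long-running work happens here, on whatever worker picks up the task
    return 'processed %s' % upload_id
```

A client enqueues work with `process_upload.delay(some_id)` and celery distributes
it to the next free worker.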
### elastic search
[Elasticsearch](https://www.elastic.co/webinars/getting-started-elasticsearch)
is used to store repository data (not the raw files).
Elasticsearch allows for flexible, scalable search and analytics.
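For illustration (index, fields, and query are made up, not nomad's actual mapping),
a search with the official Python client could look like this:

```python
# made-up index and fields, not nomad's actual Elasticsearch mapping
from elasticsearch import Elasticsearch

es = Elasticsearch(['localhost:9200'])
results = es.search(index='calcs', body={
    'query': {'match': {'system_type': 'bulk'}},
    'aggs': {'per_code': {'terms': {'field': 'code_name'}}}
})
print(results['hits']['total'])
```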
### mongodb
[Mongodb](https://docs.mongodb.com/) is used to store and track the state of the
processing of uploaded files and therein contained calculations. We use
[mongoengine](http://docs.mongoengine.org/) to program with mongodb.
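As a hedged sketch (the document class and fields are hypothetical, not nomad's
actual processing documents), mongoengine maps such state to Python classes:

```python
# hypothetical document, not nomad's actual processing model
from mongoengine import Document, IntField, StringField, connect

connect('nomad')  # assumes a local mongodb


class Upload(Document):
    upload_id = StringField(primary_key=True)
    status = StringField(default='uploading')
    total_calcs = IntField(default=0)


Upload(upload_id='some-uuid', status='processing').save()
print(Upload.objects(status='processing').count())
```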
### PostgreSQL
A relational database is used to store all user provided metadata: users, datasets
(curated sets of uploaded data), references, comments, DOIs, coauthors, etc.
Furthermore, it is still used to store some of the calculation metadata derived
via parsing. *This will most likely move out of Postgres in the future.* We
use [SQLAlchemy](https://docs.sqlalchemy.org/en/latest/) as an ORM framework.
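A minimal sketch of the ORM pattern (table and columns are illustrative, not the
actual repository schema):

```python
# illustrative model, not the actual repository schema
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class User(Base):
    __tablename__ = 'users'

    user_id = Column(Integer, primary_key=True)
    email = Column(String, unique=True)
```

Queries then go through a session, e.g.
`session.query(User).filter_by(email=some_email).first()`.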
### flask, et al.
The ReSTful API is built with the [flask](http://flask.pocoo.org/docs/1.0/)
framework and its [ReST+](https://flask-restplus.readthedocs.io/en/stable/) extension. This
allows us to automatically derive a [swagger](https://swagger.io/) description of the nomad API,
which in turn allows us to generate programming language specific client libraries, e.g. we
use [bravado](https://github.com/Yelp/bravado) for Python and
[swagger-js](https://github.com/swagger-api/swagger-js) for Javascript.
Furthermore, you can browse and use the API via [swagger-ui](https://swagger.io/tools/swagger-ui/).
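A hedged sketch of this pattern (the endpoint and payload are made up, not the
actual nomad API):

```python
# made-up endpoint for illustration, not the actual nomad API
from flask import Flask
from flask_restplus import Api, Resource

app = Flask(__name__)
api = Api(app, title='example API')


@api.route('/uploads/<string:upload_id>')
class UploadResource(Resource):
    def get(self, upload_id):
        # flask-restplus derives the swagger description from such resources
        return {'upload_id': upload_id, 'status': 'processing'}
```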
### Elasticstack
The [elastic stack](https://www.elastic.co/guide/index.html)
(previously *ELK* stack) is a central logging, metrics, and monitoring
solution that collects data within the cluster and provides a flexible analytics frontend
for said data.
### Javascript, React, Material-UI
The frontend (GUI) of **nomad@FAIRDI** is built on top of the
[React](https://reactjs.org/docs/getting-started.html) component framework.
This allows us to build the GUI as a set of re-usable components to
achieve coherent representations for all aspects of nomad, while keeping development
efforts manageable. React uses [JSX](https://reactjs.org/docs/introducing-jsx.html)
(an ES6 extension) that allows mixing HTML with Javascript code.
The component library [Material-UI](https://material-ui.com/)
(based on Google's popular material design framework) provides a consistent look-and-feel.
### docker
To run a **nomad@FAIRDI** instance, many services have to be orchestrated:
the nomad api, nomad worker, mongodb, Elasticsearch, PostgreSQL, RabbitMQ,
Elasticstack (logging), the nomad GUI, and a reverse proxy to keep everything together.
Further services might be needed (e.g. JupyterHub) when nomad grows.
The container platform [Docker](https://docs.docker.com/) allows us to provide all services
as pre-built images that can be run flexibly on all types of platforms, networks,
and storage solutions. [Docker-compose](https://docs.docker.com/compose/) allows us to
provide configuration to run the whole nomad stack on a single server node.
### kubernetes + helm
To run and scale nomad on a cluster, you can use [kubernetes](https://kubernetes.io/docs/home/)
to orchestrate the necessary containers. We provide a [helm](https://docs.helm.sh/)
chart with all necessary service and deployment descriptors that allows you to set up and
update nomad with a few commands.
### GitLab
Nomad as a software project is managed via [GitLab](https://docs.gitlab.com/).
The **nomad@FAIRDI** project is hosted [here](https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR).
GitLab is used to manage versions, different branches of development, tasks and issues,
as a [registry for Docker images](https://docs.gitlab.com/ee/user/project/container_registry.html),
and [CI/CD platform](https://docs.gitlab.com/ee/ci/).
## Data model
.. figure:: data.png
   :alt: nomad's data model
The entities that comprise the nomad data model are *users*, *datasets*, *uploads*,
*calculations* (calc), and *materials*. *Users* upload multiple related *calculations*
in one *upload*. *Users* can curate *calculations* into *datasets*. *Calculations*
belong to one *material* based on the simulated system.
### Users
- The user `email` is used as a primary key to uniquely identify users
(even among different nomad installations)
### Uploads
- An upload contains related calculations in the form of raw code input and output files
- Uploaders are encouraged to upload all relevant files
- The directory structure of an upload might be used to relate calculations to each other
- Uploads have a unique randomly chosen `upload_id` (UUID)
- The `uploader` is the user that provided the upload. There is always one immutable `uploader`
- Currently, uploads can be provided as `.zip` or `.tar.gz` files.
### Calculations
- A calculation has a unique `calc_id` that is based on the upload's id and the `mainfile`
  (see the sketch after this list)
- The `mainfile` is an upload-relative path to the main output file.
- Each calculation, when published, gets a unique `pid`. Pids are ascending integers. For
each `pid` a shorter `handle` is created. Handles can be registered with a handle system,
e.g. the central nomad installation at MPCDF is registered at an MPCDF/GWDG handle system.
- The `calc_hash` is computed from the main and other parsed raw files.
- Calculation data comprises *user metadata* (comments, references, datasets, coauthors),
*calculation metadata* (code, version, system and symmetry, used DFT method, etc.),
the *archive data* (a hierarchy of all parsed quantities), and the uploaded *raw files*.
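The id and hash scheme described above can be sketched as follows; this is an
illustrative reconstruction, and the exact inputs and encoding used by
:py:func:`nomad.utils.hash` may differ:

```python
# illustrative reconstruction of the id/hash scheme, not the actual implementation
import base64
import hashlib


def make_hash(*args: str) -> str:
    md5 = hashlib.md5()
    for arg in args:
        md5.update(arg.encode('utf-8'))
    # short, url-safe representation of the digest
    return base64.urlsafe_b64encode(md5.digest()).decode('utf-8').rstrip('=')


upload_id = 'some-upload-uuid'          # randomly chosen per upload
mainfile = 'some/dir/vasp_output.xml'   # upload-relative path

calc_id = make_hash(upload_id, mainfile)
```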
### Datasets
- Datasets are user curated sets of calculations.
- Users can assign names and nomad can register a DOI for a dataset.
- A calculation can be put into multiple datasets.
### Materials
- Materials aggregate calculations based on common system properties
(e.g. system type, atoms, lattice, space group, etc.).
### Implementation
The different entities often have multiple implementations for different storage systems.
For example, aspects of calculations are stored in files (raw files, calc metadata, archive data),
Postgres (user metadata), Elasticsearch (metadata), and mongodb (processing state).
Transformations between these different implementations exist. See
:py:mod:`nomad.datamodel` for further information.
## Processing
.. figure:: proc.png
   :alt: nomad's processing workflow
See :py:mod:`nomad.processing` for further information.
Introduction
============
**NOvel Materials Discovery (NOMAD)** comprises storage, processing, management, discovery, and
analytics of computational material science data from over 40 community *codes*.
The original NOMAD software, developed by the
[NOMAD-coe](http://nomad-coe.eu) project, is used to host over 50 million total energy
calculations in a single central infrastructure instance that offers a variety
of services (repository, archive, encyclopedia, analytics, visualization).
This is the documentation of **nomad@FAIR**, the Open-Source continuation of the
original NOMAD-coe software. It reconciles the original code base,
integrates its services,
allows 3rd parties to run individual and federated instances of the nomad infrastructure,
provides nomad to other material science domains, and applies the FAIR principles
as promoted by the [FAIR Data Infrastructure e.V.](http://fairdi.eu).
There are different use-modes for the nomad software, but the most common use is
to run the nomad infrastructure on a cloud and provide clients access to
web-based GUIs and REST APIs. This nomad infrastructure logically comprises the
*nomad repository* for uploading, searching, and downloading raw calculation input and output
from all relevant computational material science codes. A second part of nomad
is the *archive*. It provides all uploaded data in a common data format
called *meta-info* and includes common and code specific
schemas for structured data. Further services are available from
[nomad-coe.eu](http://nomad-coe.eu), e.g. the *nomad encyclopedia*, *analytics toolkit*,
and *advanced graphics*.
Architecture
------------
The following depicts the *nomad@FAIR* architecture with respect to software components
in terms of python modules, gui components, and 3rd party services (e.g. databases,
search engines, etc.). It comprises a revised version of the repository and archive.
.. figure:: components.png
   :alt: nomad components

   The main modules of nomad
Nomad uses a series of 3rd party technologies that already solve most of nomad's
processing, storage, availability, and scaling goals:
celery
^^^^^^
http://celeryproject.org (incl. rabbitmq) is a popular combination for realizing
long running tasks in internet applications. We use it to drive the processing of uploaded files.
It allows us to transparently distribute processing load while keeping processing state
available to inform the user.
elastic search
^^^^^^^^^^^^^^
Elastic search is used to store repository data (not the raw files).
Elastic search allows for flexible scalable search and analytics.
mongodb
^^^^^^^
Mongo is used to store and track the state of the processing of uploaded files and therein
contained calculations.
elastic stack
^^^^^^^^^^^^^
The *elastic stack* (previously *ELK* stack) is a central logging, metrics, and monitoring
solution that collects data within the cluster and provides a flexible analytics frontend
for said data.
Data model
----------
.. figure:: data.png
   :alt: nomad's data model

   The main data classes in nomad
See :py:mod:`nomad.processing`, :py:mod:`nomad.users`, and :py:mod:`nomad.repo`
for further information.
Processing
----------
.. figure:: proc.png
   :alt: nomad's processing workflow

   The workflow of nomad's processing tasks
See :py:mod:`nomad.processing` for further information.
docs/proc.png (image changed: 60.6 KB → 80.5 KB)
@@ -6,34 +6,55 @@ nomad.config
.. automodule:: nomad.config
    :members:
nomad.infrastructure
--------------------
.. automodule:: nomad.infrastructure
    :members:
nomad.dependencies
------------------
.. automodule:: nomad.dependencies
    :members:
nomad.datamodel
---------------
.. automodule:: nomad.datamodel
    :members:
nomad.files
-----------
.. automodule:: nomad.files
    :members:
nomad.parsing
-------------
.. automodule:: nomad.parsing
nomad.normalizing
-----------------
.. automodule:: nomad.normalizing
    :members:
nomad.processing
----------------
.. automodule:: nomad.processing
nomad.repo
----------
.. automodule:: nomad.repo
nomad.search
------------
.. automodule:: nomad.search
nomad.user
----------
nomad.coe_repo
--------------
.. automodule:: nomad.coe_repo
nomad.api
---------
.. automodule:: nomad.api
nomad.client
------------
.. automodule:: nomad.client
nomad.utils
-----------
.. automodule:: nomad.utils
\ No newline at end of file
@@ -17,8 +17,6 @@ Endpoints can use *flask_httpauth* based authentication either with basic HTTP
authentication or access tokens. Currently the authentication is validated against
users and sessions in the NOMAD-coe repository postgres db.
.. autodata:: base_path
There are two authentication "schemes" to authenticate users. First we use
HTTP Basic Authentication (username, password), which also works with username=token,
password=''. Second, there is a custom HTTP header 'X-Token' that can be used to
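As an illustration (URL and token values are placeholders, not actual nomad
endpoints), a client could use both schemes like this:

.. code-block:: python

    # placeholders only; consult the API documentation for real endpoints
    import requests

    # scheme 1: HTTP Basic Authentication with username=token, password=''
    requests.get('http://localhost/api/uploads', auth=('my-token', ''))

    # scheme 2: the custom 'X-Token' header
    requests.get('http://localhost/api/uploads', headers={'X-Token': 'my-token'})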
@@ -15,22 +15,23 @@
"""
Interface to the NOMAD-coe repository postgres database. This implementation is based on
SQLAlchemy. There are model classes that represent entries in the *users* and *session*
tables.
tables. All DB entities are implemented as classes, but most are hidden and data
can be accessed via the various relations with :class:`User`, :class:`Calc`, :class:`Upload`.
This module allows authenticating users based on passwords or session tokens and
provides access to user data such as names and user_id.
To load an entity from the database use :data:`nomad.infrastructure.repository_db`
(the SQLAlchemy session), e.g.:
.. autoclass:: User
    :members:
    :undoc-members:
.. code-block:: python

    repository_db.query(coe_repo.Calc).filter_by(upload_id=some_id)
.. autoclass:: Session
.. autoclass:: User
    :members:
    :undoc-members:
.. autofunction:: ensure_test_user
This module also provides functionality to add parsed calculation data to the db:
.. autodata:: admin_user
.. autoexception:: LoginException
.. autoclass:: UploadMetaData
    :members:
@@ -40,6 +41,9 @@ This module also provides functionality to add parsed calculation data to the db
.. autoclass:: Calc
    :members:
    :undoc-members:
.. autoclass:: DataSet
    :members:
    :undoc-members:
"""
from .user import User, ensure_test_user, admin_user, LoginException
@@ -62,7 +62,7 @@ class UploadMetaData:
Utility class that provides per-upload meta data and overriding per-calculation
meta data. For a given *mainfile*, data is first read from the `calculations` key
(a list of calculation dicts with a matching `mainfile` key), before it is read
from :param:`metadata_dict` it self.
from `metadata_dict` itself.
The class is used to deal with user provided meta-data.
@@ -30,6 +30,8 @@ class Session(Base): # type: ignore
class LoginException(Exception):
    """ Exception that is raised if the user could not be logged in despite present
    credentials. """
    pass
@@ -166,6 +168,10 @@ def ensure_test_user(email):
def admin_user():
    """
    Returns the admin user, a special user with `user_id==0`.
    Its password is part of :mod:`nomad.config`.
    """
    repo_db = infrastructure.repository_db
    admin = repo_db.query(User).filter_by(user_id=1).first()
    assert admin, 'Admin user does not exist.'
@@ -15,10 +15,37 @@
"""
This module contains classes that allow to represent the core
nomad data entities :class:`Upload` and :class:`Calc` on a high level of abstraction
independent from their representation in the different modules :py:mod:`nomad.repo`,
:py:mod:`nomad.processing`, :py:mod:`nomad.coe_repo`, :py:mod:`nomad.files`.
independent from their representation in the different modules
:py:mod:`nomad.processing`, :py:mod:`nomad.coe_repo`, :py:mod:`nomad.files`,
:py:mod:`nomad.search`.
It is not about representing every detail, but those parts that are directly involved in
api, processing, migration, mirroring, or other 'infrastructure' operations.
Transformations between different implementations of the same entity can be built
and used. To reduce the number of necessary transformations, the classes
:class:`UploadWithMetadata` and :class:`CalcWithMetadata` can act as intermediate
representations. Therefore, implement only transformations from and to these
classes.
To implement a transformation, provide a transformation method in the source
entity class and register it:
.. code-block:: python

    def to_my_target_entity(self):
        target = MyTargetEntity()
        target.property_x = self.property_x  # your transformation code
        return target

    MyTargetEntity.register_mapping(MySourceEntity.to_my_target_entity)
To apply a transformation, use:
.. code-block:: python

    my_target_entity_instance = my_source_entity_instance.to(MyTargetEntity)
"""
from typing import Type, TypeVar, Union, Iterable, cast, Callable, Dict
@@ -30,14 +57,42 @@ T = TypeVar('T')
class Entity():
    """
    A common base class for all nomad entities. It provides the functions necessary
    to apply transformations.
    """

    mappings: Dict[Type['Entity'], Callable[['Entity'], 'Entity']] = dict()

    @classmethod
    def load_from(cls: Type[T], obj) -> T:
        raise NotImplementedError