Commit 5cfe215a authored by Markus Scheidgen

Merge branch 'docs' into 'master'

Docs

See merge request !11
parents f53ffb23 614e3190
.build/
.static/
\ No newline at end of file
body {
font-family: "Roboto", "Lato", "proxima-nova", "Helvetica Neue", Arial, sans-serif;
}
.wy-nav-top {
display: none
}
a {
color: #607D8B !important
}
a:visited {
color: #607D8B !important
}
\ No newline at end of file
@@ -87,7 +87,7 @@ pygments_style = 'sphinx'
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
-html_theme = 'alabaster'
+html_theme = 'sphinx_rtd_theme'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
@@ -178,6 +178,7 @@ todo_include_todos = True
# Enable sphinx specific markdown features
def setup(app):
app.add_stylesheet('css/custom.css')
app.add_config_value('recommonmark_config', {
'enable_auto_doc_ref': True,
'enable_eval_rst': True
......
# Contributing
There are some *rules*, or better, strong *guidelines*:
- Use an IDE (e.g. [vscode](https://code.visualstudio.com/)) or otherwise automatically
enforce code [formatting and linting](https://code.visualstudio.com/docs/python/linting).
- There is a style guide to python. Write [pep-8](https://www.python.org/dev/peps/pep-0008/)
compliant python code. An exception is the line cap at 79; it can be broken, but keep lines at around 90 characters.
- Test the public API of each submodule (i.e. python file).
- Be [pythonic](https://docs.python-guide.org/writing/style/) and watch
[this](https://www.youtube.com/watch?v=wf-BqAjZb8M).
- Document any *public* API of each submodule (e.g. python file). *Public* means API that
is exposed to other submodules (i.e. other python files).
- Use google [docstrings](http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html).
- Add your docstrings to the sphinx documentation in `docs`. Use .md, follow the example.
Markdown in sphinx is supported via [recommonmark](https://recommonmark.readthedocs.io/en/latest/index.html#autostructify)
and [AutoStructify](http://recommonmark.readthedocs.io/en/latest/auto_structify.html)
- The project structure follows [this](https://docs.python-guide.org/writing/structure/)
guide. Keep it!
Development guidelines
======================
Design principles
-----------------
- simple first, complicated only when necessary
- adopting generic established 3rd party solutions before implementing specific solutions
- only uni directional dependencies between components/modules, no circles
- only one language: Python (except, GUI of course)
Source code & Git repository
----------------------------
Code Rules
^^^^^^^^^^
There are some *rules*, or better, strong *guidelines*, for writing code. The following
applies to all python code (and, where applicable, also to JS and other code):
- Use an IDE (e.g. `vscode <https://code.visualstudio.com/>`_) or otherwise automatically
enforce code `formatting and linting <https://code.visualstudio.com/docs/python/linting>`_.
Use ``nomad qa`` before committing. This will run all tests, static type checks, linting, etc.
- There is a style guide to python. Write `pep-8 <https://www.python.org/dev/peps/pep-0008/>`_
compliant python code. An exception is the line cap at 79; it can be broken, but keep lines at around 90 characters.
- Test the public API of each sub-module (i.e. python file).
- Be `pythonic <https://docs.python-guide.org/writing/style/>`_ and watch
`this <https://www.youtube.com/watch?v=wf-BqAjZb8M>`_.
- Document any *public* API of each sub-module (e.g. python file). *Public* means API that
is exposed to other sub-modules (i.e. other python files).
- Use google `docstrings <http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html>`_ (see the sketch after this list).
- Add your doc-strings to the sphinx documentation in ``docs``. Use .md, follow the example.
Markdown in sphinx is supported via `recommonmark
<https://recommonmark.readthedocs.io/en/latest/index.html#autostructify>`_
and `AutoStructify <http://recommonmark.readthedocs.io/en/latest/auto_structify.html>`_
- The project structure follows `this guide <https://docs.python-guide.org/writing/structure/>`_.
Keep it!
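For reference, a google-style docstring might look like this (a generic example, not
taken from the actual code base):

.. code-block:: python

    def add(first: int, second: int) -> int:
        """Adds two integers.

        Args:
            first: The first summand.
            second: The second summand.

        Returns:
            The sum of both arguments.
        """
        return first + second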
CI/CD
^^^^^
These *guidelines* are partially enforced by CI/CD. As part of CI, all tests are run on all
branches; further, we run a *linter*, a *pep8* checker, and *mypy* (static type checker). You can
run ``nomad qa`` to execute all these tests and checks before committing.
Only the CI/CD on ``master`` will create new ``*:latest`` images and allow deployment.
Git/GitLab
^^^^^^^^^^
The ``master`` branch of our repository is *protected*. You must not (even if you have
the rights) commit to it directly. You develop on *feature* branches and commit changes
to the master branch via *merge requests*. After merging, feature branches should be removed.
We tag releases with ``vX.X.X`` according to regular semantic versioning practices.
There might be branches for older versions to facilitate hot-fixes.
Terms and Identifiers
---------------------
There is some terminology consistently used in this documentation and the source
code. Use this terminology for identifiers.
Do not use abbreviations. There are (few) exceptions: ``proc`` (processing); ``exc``, ``e`` (exception);
``calc`` (calculation), ``repo`` (repository), ``utils`` (utilities), and ``aux`` (auxiliary).
Other exceptions are ``f`` for file-like streams and ``i`` for index running variables.
Incidentally, the latter is rarely necessary in python.
Terms:
- upload: A logical unit that comprises one (.zip) file uploaded by a user.
- calculation: A computation in the sense that it was created by an individual run of a CMS code.
- raw file: User uploaded files (e.g. part of the uploaded .zip), usually code input or output.
- upload file/uploaded file: The actual (.zip) file a user uploaded.
- mainfile: The main output file of a CMS code run.
- aux file: Additional files the user uploaded within an upload.
- repo entry: Some quantities of a calculation that are used to represent that calculation in the repository.
- archive data: The normalized data of one calculation in nomad's meta-info-based format.
Ids and Hashes
--------------
Throughout nomad, we use different ids and hashes to refer to entities. If something
is called *id*, it is usually a random uuid and has no semantic connection to the entity
it identifies. If something is called a *hash*, then it is a hash built from the
entity it identifies, either from the entity as a whole or just from some of its
properties.
The most common hashes are the *upload_hash* and *calc_hash*. The upload hash is
a hash over an uploaded file, as each upload usually refers to an individual user upload
(usually a .zip file). The calc_hash is a hash over the mainfile path within an upload.
The combination of upload_hash and calc_hash is used to identify calculations. They
allow us to identify calculations independently of any random ids that are created during
processing. To create hashes we use :py:func:`nomad.utils.hash`.
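A minimal sketch of how this fits together (the exact signature of
``nomad.utils.hash`` and all variable names here are assumptions):

.. code-block:: python

    from nomad import utils

    # illustrative inputs; assumption: utils.hash accepts str/bytes input
    uploaded_zip_bytes = open('upload.zip', 'rb').read()
    mainfile_path = 'some/upload/relative/OUTCAR'

    upload_hash = utils.hash(uploaded_zip_bytes)  # hash over the uploaded file
    calc_hash = utils.hash(mainfile_path)         # hash over the mainfile path

    # the combination identifies a calculation independently of random ids
    archive_id = '%s/%s' % (upload_hash, calc_hash)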
NOMAD-coe Dependencies
----------------------
We currently clone and install NOMAD-coe dependencies *"outside"* the nomad-FAIR project
(see :py:mod:`nomad.dependencies`). The installed projects become part of the python
environment and all dependencies are used like regular pypi packages and python modules.
This allows us to target (e.g. install) individual commits. In theory, these might
change during runtime, allowing us to update parsers or normalizers on a running nomad.
More importantly, we can address commit hashes to identify exact parser/normalizer versions.
On the downside, common functions for all dependencies (e.g. the python-common package,
or nomad_meta_info) cannot be part of the nomad-FAIR project. In general, it is hard
to simultaneously develop nomad-FAIR and NOMAD-coe dependencies.
Another approach is to integrate the NOMAD-coe sources with nomad-FAIR. The lacking
availability of individual commit hashes could be replaced with hashes of source-code
files.
We use the branch ``nomad-fair`` on all dependencies for nomad-FAIR specific changes.
Parsers
^^^^^^^
There are several steps to take to make a NOMAD-coe parser fit for nomad-FAIR (a rough sketch follows the list):
- Implement ``nomadcore.baseclasses.ParserInterface``. Make sure that the meta-info is
only loaded once per parser instance, not once per parser run.
- Have a root package that bears the parser name, e.g. ``vaspparser``
- Put the important classes (e.g. the parser interface implementation) in the root module
(e.g. ``vaspparser/__init__.py``).
- Only use sub-modules where necessary. Try to avoid sub-directories.
- Have a test module. Don't go overboard with the test data.
- Make it a pypi-style package, i.e. create ``setup.py`` script.
- The package name should be the parser name, e.g. ``vaspparser``.
- The parser should only use the provided logger
(:py:func:`nomadcore.baseclasses.ParserInterface.setup_logger`).
This is important for two reasons. First, our logging uses structured logging and
all entries are tagged with data about the parser, upload_ids, mainfiles, etc. This is important
to make errors easily reproducible. Second, we store all logs of a parser run to
make them available to end users.
- Keep logging sensible (see logging below). Do not log everything. Do not log massive
amounts of data. Keep in mind what counts as an error (the parser cannot perform its job)
and what does not (e.g. the input is faulty).
- Remove all scala code.
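A rough, hypothetical sketch of such a parser package (class and file names are
illustrative; the concrete ``ParserInterface`` hooks are assumptions and therefore
not spelled out):

.. code-block:: python

    # vaspparser/__init__.py -- the root module bears the parser name and
    # contains the important classes, i.e. the parser interface implementation
    from nomadcore.baseclasses import ParserInterface


    class VASPParser(ParserInterface):
        """Parser for VASP mainfiles.

        Only uses the logger provided via ``setup_logger``; the meta-info is
        loaded once per parser instance, not once per run.
        """
        # ... implement the ParserInterface hooks here ...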
Normalizers
^^^^^^^^^^^
There are several steps to take to make a NOMAD-coe normalizer fit for nomad-FAIR (a sketch follows the list):
- If written in scala, re-write it in python.
- The normalizer should read from the provided backend. In NOMAD-coe, normalizers read
data from provided serialized dictionaries. Don't do that; we do not want to use such
a normalizer-specific interface.
- Do package, module, and logging related changes as you would for a parser.
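Again as a hypothetical sketch (the backend and method names are assumptions, not
the actual interface):

.. code-block:: python

    class SymmetryNormalizer:
        """Normalizes symmetry information of a calculation."""

        def __init__(self, backend):
            # read parsed data from the provided backend, not from
            # normalizer-specific serialized dictionaries
            self.backend = backend

        def normalize(self, logger):
            # read quantities from self.backend, compute derived data,
            # and write the results back to the backend
            ...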
Logging
-------
There are three important prerequisites to understand about nomad-FAIR's logging:
- All log entries are recorded in a central elastic search database. To make this database
useful, log entries must be sensible in size, frequency, meaning, level, and logger name.
Therefore, we need to follow some rules when it comes to logging.
- We use a *structured* logging approach. Instead of encoding all kinds of information
in log messages, we use key-value pairs that provide context to a log *event*. In the
end all entries are stored as JSON dictionaries with ``@timestamp``, ``level``,
``logger_name``, ``event`` plus custom context data. Keep events very short, most
information goes into the context.
- We use logging to inform us about the state of nomad-FAIR, not about user
behavior, input, data. Do not confuse this when determining the log-level for an event.
A user providing an invalid upload file, for example, should never be an error.
Please follow these rules when logging (a short sketch follows the list):
- Only use :py:func:`nomad.utils.get_logger` to acquire a logger. Never use the built-in
logging directly. These loggers work like the system loggers, but allow you to
pass keyword arguments with additional context data. See also the
`structlog docs <https://structlog.readthedocs.io/en/stable/>`_.
- In many contexts, a logger is already provided (e.g. api, processing, parser, normalizer).
This provided logger already has context information bound to it. So it is important to
use those instead of acquiring your own loggers. Have a look for methods called
``get_logger`` or attributes called ``logger``.
- Keep events (what usually is called *message*) very short. Examples are: *file uploaded*,
*extraction failed*, etc.
- Structure the keys for context information. When you analyse logs in ELK, you will
see that the set of all keys over all log entries can be quite large. Structure your
keys to make navigation easier. Use keys like ``nomad.proc.parser_version`` instead of
``parser_version``. Use module names as prefixes.
- Don't log everything. Try to anticipate, how you would use the logs in case of bugs,
error scenarios, etc.
- Don't log sensitive data.
- Think before logging data (especially dicts, lists, numpy arrays, etc.).
- Logs should not be abused as a *printf*-style debugging tool.
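A short sketch of these rules in practice (the context keys and variable values
are illustrative; the keyword-argument style is the structlog one described above):

.. code-block:: python

    from nomad import utils

    logger = utils.get_logger(__name__)
    upload_id, mainfile = 'some-upload-id', 'some/path/OUTCAR'  # illustrative

    # short event, context as structured key-value pairs with module prefixes;
    # a faulty user upload is a warning at most, never an error
    logger.warning(
        'extraction failed',
        **{'nomad.proc.upload_id': upload_id, 'nomad.proc.mainfile': mainfile})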
@@ -9,6 +9,6 @@ and infrastructure with a simplyfied architecture and consolidated code base.
introduction
setup
dev_guidelines
-api
+reference
contributing
@@ -2,24 +2,24 @@ Introduction
============
**NOvel Materials Discovery (NOMAD)** comprises storage, processing, management, discovery, and
-analytics of computational material science data from over 40 comminity *codes*.
+analytics of computational material science data from over 40 community *codes*.
The original NOMAD software, developed by the
[NOMAD-coe](http://nomad-coe.eu) project, is used to host over 50 million total energy
-calculations in a single central infrastructure instances that offers a variaty
+calculations in a single central infrastructure instance that offers a variety
of services (repository, archive, encyclopedia, analytics, visualization).
This is the documentation of **nomad@FAIR**, the Open-Source continuation of the
-original NOMAD-coe software that reconsiles the original code base,
+original NOMAD-coe software that reconciles the original code base,
integrates its services,
allows 3rd parties to run individual and federated instances of the nomad infrastructure,
provides nomad to other material science domains, and applies the FAIR principles
-as prolifirated by the (FAIR Data Infrastructure e.V.)[http://fairdi.eu].
+as proliferated by the [FAIR Data Infrastructure e.V.](http://fairdi.eu).
There are different use-modes for the nomad software, but the most common use is
to run the nomad infrastructure on a cloud and provide clients access to
web-based GUIs and REST APIs. This nomad infrastructure logically comprises the
*nomad repository* for uploading, searching, and downloading raw calculation input and output
-from all relevant computionational material science codes. A second part of nomad
+from all relevant computational material science codes. A second part of nomad
is the *archive*. It provides all uploaded data in a common data format
called *meta-info* and includes common and code specific
schemas for structured data. Further services are available from
@@ -29,9 +29,9 @@ and *advanced graphics*.
Architecture
------------
-The following depicts the *nomad@FAIR* architecture with respect to software compenents
+The following depicts the *nomad@FAIR* architecture with respect to software components
in terms of python modules, gui components, and 3rd party services (e.g. databases,
-search enginines, etc.). It comprises a revised version of the repository and archive.
+search engines, etc.). It comprises a revised version of the repository and archive.
.. figure:: components.png
:alt: nomad components
@@ -45,7 +45,8 @@ celery
^^^^^^
http://celeryproject.org (incl. rabbitmq) is a popular combination for realizing
long running tasks in internet applications. We use it to drive the processing of uploaded files.
-It allows us to transparently distribute processing load.
+It allows us to transparently distribute processing load while keeping processing state
+available to inform the user.
elastic search
^^^^^^^^^^^^^^
@@ -54,8 +55,8 @@ Elastic search allows for flexible scalable search and analytics.
mongodb
^^^^^^^
-Mongo is used to store and track the state of the processing of uploaded files and therein c
-ontained calculations.
+Mongo is used to store and track the state of the processing of uploaded files and therein
+contained calculations.
elastic stack
^^^^^^^^^^^^^
@@ -83,44 +84,3 @@ Processing
The workflow of nomad's processing tasks
See :py:mod:`nomad.processing` for further information.
-Design principles
------------------
-- simple first, complicated only when necessary
-- adopting generic established 3rd party solutions before implementing specific solutions
-- only uni directional dependencies between components/modules, no circles
-- only one language: Python (except, GUI of course)
-General concepts
-----------------
-terms
-^^^^^
-There are is some terminology consistently used in this documentastion and the source
-code:
-- upload: A logical unit that comprises one (.zip) file uploaded by a user.
-- calculation: A computation in the sense that is was created by an individual run of a CMS code.
-- raw file: User uploaded files (e.g. part of the uploaded .zip), usually code input or output.
-- upload file/uploaded file: The actual (.zip) file a user uploaded
-- mainfile: The mainfile output file of a CMS code run.
-- aux file: Additional files the user uploaded within an upload.
-- repo entry: Some quantities of a calculation that are used to represent that calculation in the repository.
-- archive data: The normalized data of one calculation in nomad's meta-info-based format.
-ids and hashes
-^^^^^^^^^^^^^^
-Throughout nomad, we use different ids and hashes to refer to entities. If something
-is called *id*, it is usually a random uuid and has no semantic conection to the entity
-it identifies. If something is calles a *hash* than it is a hash build based on the
-entitiy it identifies. This means either the whole thing or just some properties of
-said entities.
-The most common hashes are the *upload_hash* and *calc_hash*. The upload hash is
-a hash over an uploaded file, as each upload usually refers to an indiviudal user upload
-(usually a .zip file). The calc_hash is a hash over the mainfile path within an upload.
-The combination of upload_hash and calc_hash is used to identify calculations. They
-allow us to id calculations independently of any random ids that are created during
-processing. To create hashes we use :func:`nomad.utils.hash`.
\ No newline at end of file
@@ -36,3 +36,8 @@
```eval_rst
.. automodule:: nomad.user
```
+## nomad.utils
+```eval_rst
+.. automodule:: nomad.utils
+```
\ No newline at end of file
.root {
font-family: "Roboto", "Helvetica", "Arial", sans-serif;
color: rgba(0, 0, 0, 0.87);
font-size: 16px;
line-height: 1.6;
width: 1000px;
}
.root code {
font-family: "Roboto Mono", monospace;
}
.root .documentwrapper {
float: left;
width: 700px;
}
.root .sphinxsidebar {
padding: 0 20px;
margin-left: 730px;
}
.root .footer { display: none; }
.root #indices-and-tables { display: none; }
.root .relations { display: none; }
.root p {
-webkit-margin-before: 1em;
-webkit-margin-after: 1em;
-webkit-margin-start: 0px;
-webkit-margin-end: 0px;
}
.root h1 > a { display: none; }
.root h2 > a { display: none; }
.root h3 > a { display: none; }
.root h4 > a { display: none; }
.root a {
color: #607D8B;
}
.root h1 {
color: rgba(0, 0, 0, 0.54);
margin: 32px 0 24px;
font-size: 2.8125rem;
font-weight: 400;
font-family: "Roboto", "Helvetica", "Arial", sans-serif;
line-height: 1.13333em;
margin-left: -.02em;
}
.root h2 {
color: rgba(0, 0, 0, 0.54);
margin: 32px 0 24px;
font-size: 2.125rem;
font-weight: 400;
line-height: 1.20588em;
}
.root h3 {
color: rgba(0, 0, 0, 0.87);
margin: 24px 0 18px;
font-size: 1.5rem;
font-weight: 400;
font-family: "Roboto", "Helvetica", "Arial", sans-serif;
line-height: 1.35417em;
}
.root h4 {
color: rgba(0, 0, 0, 0.87);
margin: 18px 0 12px;
font-size: 1rem;
font-weight: 400;
font-family: "Roboto", "Helvetica", "Arial", sans-serif;
line-height: 1.5em;
}
.root pre {
margin: 24px 0;
padding: 12px 18px;
overflow: auto;
border-radius: 4px;
background-color: #fff;
}
.root ul {
margin: 0;
padding: 0;
list-style: none;
padding-bottom: 12px;
}
.root li:first-child {
padding-top: 12px;
}
.root li {
padding-bottom: 8px;
width: 100%;
text-align: left;
align-items: center;
padding-left: 12px;
text-decoration: none;
}
.root img {
width: 700px;
}
.root .caption {
text-align: center;
color: rgba(0, 0, 0, 0.54);
font-size: 14px;
font-weight: 400;
font-family: "Roboto", "Helvetica", "Arial", sans-serif;
line-height: 1.375;
}
.root div.admonition {
margin-top: 10px;
margin-bottom: 10px;
padding: 7px;
}
.root div.admonition dt {
font-weight: bold;
}
.root div.admonition dl {
margin-bottom: 0;
}
.root p.admonition-title {
margin: 0px 10px 5px 0px;
font-weight: bold;
}
.root div.body p.centered {
text-align: center;
margin-top: 25px;
}
.root dd p {
margin-top: 0;
}
.root .field-body strong {
font-family: "Roboto Mono", monospace;
}
.root .descname {
font-weight: 500;
}
import React, { Component } from 'react'
import PropTypes from 'prop-types'
import HtmlToReact from 'html-to-react'
import { withRouter } from 'react-router-dom'
import { HashLink as Link } from 'react-router-hash-link'
import './Documentation.css'
import Url from 'url-parse'
import { apiBase, appBase } from '../config'
import { withStyles } from '@material-ui/core';
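// strips the optional app base path and '/docs/' prefix from internal
// documentation link paths so they can be remapped onto router routes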
const docBaseRegExp = new RegExp(`^(${appBase.replace('/', '\\/')})?(\\/docs/)?`)
const processNodeDefinitions = new HtmlToReact.ProcessNodeDefinitions(React)
const processingInstructions = location => {
return [
{
// We have to remove sphinx header links. Not all of them are caught with css.
shouldProcessNode: node => node.name === 'a' && node.children && node.children.length > 0 && node.children[0].data === '',
processNode: (node, children) => {
return ''
}
},
{
// We have to replace the sphinx links with router Links;
// the hrefs have to be processed to be compatible with router, i.e. they have
// to start with /documentation/.
shouldProcessNode: node => node.type === 'tag' && node.name === 'a' && node.attribs['href'] && !node.attribs['href'].startsWith('http'),
processNode: function DocLink(node, children) {
const linkUrl = Url(node.attribs['href'])
let pathname = linkUrl.pathname.replace(docBaseRegExp, '').replace(/^\//, '')
if (pathname === '') {
pathname = location.pathname
} else {
pathname = `/docs/${pathname}`
}
return (
<Link smooth to={pathname + (linkUrl.hash || '#')}>{children}</Link>
)
}
},
{
// We have to redirect img src attributes to the static sphinx build dir.
shouldProcessNode: node => node.type === 'tag' && node.name === 'img' && node.attribs['src'] && !node.attribs['src'].startsWith('http'),
processNode: (node, children) => {
node.attribs['src'] = `${apiBase}/docs/${node.attribs['src']}`
return processNodeDefinitions.processDefaultNode(node)
}
},
{
shouldProcessNode: node => true,
processNode: processNodeDefinitions.processDefaultNode
}
]
}
const isValidNode = () => true
const htmlToReactParser = new HtmlToReact.Parser()
const domParser = new DOMParser() // eslint-disable-line no-undef
class Documentation extends Component {
static propTypes = {
    location: PropTypes.shape({
      pathname: PropTypes.string.isRequired
    }).isRequired,
    classes: PropTypes.object.isRequired
  }
state = {
react: ''
}
onRouteChanged() {
const fetchAndUpdate = path => {
if (path === '' || path.startsWith('#')) {
path = '/index.html' + path
}
fetch(`${apiBase}/docs${path}`)