Commit 546ae436 authored by Markus Scheidgen's avatar Markus Scheidgen

Review of the documentation. #716

parent cfacafb6
......@@ -24,6 +24,7 @@ target/
.idea/
.vscode/launch.json
.vscode/settings.local.json
.vscode/extensions.json
nomad.yaml
gunicorn.log.conf
gunicorn.conf
......@@ -36,6 +37,10 @@ gui/src/searchQuantities.json
gui/src/toolkitMetadata.json
gui/src/unitsData.js
gui/src/parserMetadata.js
gui/.editorconfig
gui/.pnp.cjs
gui/.yarn/
gui/.yarnrc.yml
examples/workdir/
gunicorn.log.conf
nomad/gitinfo.py
......
......@@ -71,8 +71,7 @@ This will give you something like this:
The `entry_id` is a unique identifier for, well, entries. You can use it to access
other entry data. For example, you want to access the entry's archive. More
precisely, you want to gather the formula and energies from the main workflow result.
The following requests the archive based on the `entry_id` and only requires
the some archive sections.
The following requests the archive based on the `entry_id` and only requires some archive sections.
```py
first_entry_id = response_json['data'][0]['entry_id']
......@@ -174,9 +173,9 @@ The result will look like this:
```
You can work with the results in the given JSON (or respective Python dict/list) data already.
If you have [NOMAD's Python library](pythonlib) installed ,
If you have [NOMAD's Python library](pythonlib.md) installed,
you can take the archive data and use the Python interface.
The [Python interface](metainfo) will help with code-completion (e.g. in notebook environments),
The [Python interface](metainfo.md) will help with code-completion (e.g. in notebook environments),
resolve archive references (e.g. from workflow to calculation to system), and allow unit conversion:
```py
from nomad.datamodel import EntryArchive
......@@ -200,12 +199,12 @@ programming interfaces (APIs). More specifically [RESTful HTTP APIs](https://en.
to use NOMAD as a set of resources (think data) that can be uploaded, accessed, downloaded,
searched for, etc. via [HTTP requests](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol).
You can get an overview on all NOMAD APIs on the [API page](https://nomad-lab.eu/prod/v1/gui/analyze/apis).
You can get an overview on all NOMAD APIs on the [API page]({{ nomad_url() }}/v1/gui/analyze/apis).
We will focus here on NOMAD's main API (v1). In fact, this API is also used by
the web interface and should provide everything you need.
There are different tools and libraries to use the NOMAD API that come with different
trade-offs between expressiveness, learning curve, and convinience.
trade-offs between expressiveness, learning curve, and convenience.
#### You can use your browser
......@@ -228,11 +227,11 @@ See [the initial example](#getting-started).
#### Use our *dashboard*
The NOMAD API has an [OpenAPI dashboard](../api/v1). This is an interactive documentation of all
The NOMAD API has an [OpenAPI dashboard]({{ nomad_url() }}/v1/api/v1). This is an interactive documentation of all
API functions that allows you to try these functions in the browser.
#### Use NOMAD's Python package
Install the [NOMAD Python client library](pythonlib) and use it's `ArchiveQuery`
Install the [NOMAD Python client library](pythonlib.md) and use its `ArchiveQuery`
functionality for more convenient, query-based access to archive data.
## Different kinds of data
......@@ -244,7 +243,7 @@ the API:
- Raw files, the files as they were uploaded to NOMAD.
- Archive data, all of the extracted data for an entry.
There are also different entities (see also [Datamodel](index#datamodel)) with different functions in the API:
There are also different entities (see also [Datamodel](index.md#datamodel-uploads-entries-files-datasets)) with different functions in the API:
- Entries
- Uploads
......@@ -284,9 +283,9 @@ available if you are [logged in](#authentication).
### Pagination
Typically when you issue a query, not all results can be returned. Instead an API will
typically only return one *page*. This behavior is controlled through pagination parameters,
like `page_site`, `page`, `page_offset`, `page_after_value`.
When you issue a query, usually not all results can be returned. Instead, an API returns
only one *page*. This behavior is controlled through pagination parameters,
like `page_size`, `page`, `page_offset`, or `page_after_value`.
Let's consider a search for entries as an example.
```py
......@@ -305,7 +304,7 @@ response = requests.post(
)
```
This will only produce a response with max 10 entries in it. The response will contain a
This will only result in a response with a maximum of 10 entries. The response will contain a
`pagination` object like this:
```json
{
......@@ -342,10 +341,10 @@ You will get the next 10 results.
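To request the next page, the `next_page_after_value` from the response's `pagination` object goes back into the next request body as `page_after_value`. The following sketch shows that single step with a mocked response; the field names follow the pagination object shown above, while the concrete values are illustrative:

```python
def next_request_body(previous_body, pagination):
    """Build the request body for the next page (None when no page follows)."""
    after = pagination.get('next_page_after_value')
    if after is None:
        return None  # last page reached
    body = dict(previous_body)
    body['pagination'] = dict(previous_body.get('pagination', {}),
                              page_after_value=after)
    return body


body = {
    'query': {'results.material.elements': ['Ti', 'O']},
    'pagination': {'page_size': 10},
}
# a mocked pagination object, as it would appear in a response:
pagination = {'page_size': 10, 'total': 42, 'next_page_after_value': 'abc123'}
body = next_request_body(body, pagination)
# body['pagination'] is now {'page_size': 10, 'page_after_value': 'abc123'}
```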
### Authentication
Most of the API operations do not require any authorization and can be freely used
without a user or credentials. However, to upload data, edit data, or view your own and potentially unpublished data, the API needs to authenticate you.
without a user or credentials. However, to upload, edit, or view your own and potentially unpublished data, the API needs to authenticate you.
The NOMAD API uses OAuth and tokens to authenticate users. We provide simple operations
that allow you to acquire an *access token* via username and password based:
that allow you to acquire an *access token* via username and password:
```py
import requests
......@@ -380,13 +379,13 @@ operations.
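The shape of this token flow can be sketched as follows. The endpoint path and parameter names are assumptions based on the description above, so check the OpenAPI dashboard for the exact operation:

```python
base_url = 'https://nomad-lab.eu/prod/v1/api/v1'

# Token request (endpoint path and parameter names are assumptions; see
# the OpenAPI dashboard for the exact operation):
token_request = {
    'url': f'{base_url}/auth/token',
    'params': {'username': 'myname', 'password': 'mypassword'},
}
# e.g.: token = requests.get(**token_request).json()['access_token']


def auth_headers(token):
    # Subsequent authenticated requests carry the token as a bearer header.
    return {'Authorization': f'Bearer {token}'}


headers = auth_headers('<access token>')
```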
## Search for entries
See [getting started](#getting-started) for a typical search example. Combine the [different
concepts](#common concepts) above to create the queries that you need.
concepts](#common-concepts) above to create the queries that you need.
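Putting these pieces together, a typical request body combines a query with pagination and a set of required fields. The exact quantity names and the `required` shape below are assumptions modeled on the examples elsewhere in this documentation:

```python
import json

# A combined request body: query + pagination + required fields.
# Quantity names and the 'required' shape are illustrative assumptions.
request_body = {
    'query': {'results.material.elements': {'all': ['Ti', 'O']}},
    'pagination': {'page_size': 10},
    'required': {'include': ['entry_id', 'upload_id']},
}
# e.g.: requests.post(f'{base_url}/entries/query', json=request_body)
print(json.dumps(request_body, indent=2))
```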
Searching for entries is typically just an initial step. Once you know what entries exist
you'll probably want to do one of the following things.
## Download raw files
You can use [queries](#queries) to download raw files. But typically you don't want to
You can use [queries](#queries) to download raw files, but typically you don't want to
download file-by-file or entry-by-entry. Therefore, we allow downloading a large set of
files in one big zip-file. Here, you might want to use a program like *curl* to download
directly from the shell:
......@@ -396,11 +395,11 @@ curl "{{ nomad_url() }}/v1/entries/raw?results.material.elements=Ti&results.mate
```
## Access archives
Above under [getting started](#getting started), you'll already learned how to access
archive data. A speciality of archive API functions is that you can define what is `required`
Above under [getting started](#getting-started), you've already learned how to access
archive data. A special feature of the archive API functions is that you can define what is `required`
from the archives.
```
```py
response = requests.post(
f'{base_url}/entries/archive/query',
json={
......
......@@ -3,17 +3,17 @@
## Introduction
NOMAD stores all processed data in a *well defined*, *structured*, and *machine readable*
format. Well defined means that each element is backed by a formal definition that provides
a name, description, location, shape, type, and potential unit for that data. It has a
hierarchical structure that logically organizes data in sections and sub-sections and allows
to include cross-references between pieces of data. Formal definitions and corresponding
data structures allow to machine process NOMAD data.
format. Well defined means that each element is supported by a formal definition that provides
a name, description, location, shape, type, and possible unit for that data. It has a
hierarchical structure that logically organizes data in sections and subsections and allows
cross-references between pieces of data. Formal definitions and corresponding
data structures enable the machine processing of NOMAD data.
![archive example](assets/archive-example.png)
#### The Metainfo is the schema for Archive data.
The Archive stores descriptive and structured information about materials-science
data. Each entry in NOMAD is associated with one Archive that contains all the processed
information of that entry. What information can possible exist in an archive, how this
information of that entry. What information can possibly exist in an archive, how this
information is structured, and how this information is to be interpreted is governed
by the Metainfo.
......@@ -23,7 +23,7 @@ provide data types with names, descriptions, categories, and further information
applies to all incarnations of a certain data type.
Consider a simulation `Run`. Each
simulation run in NOMAD is characterized by a *section*, is called *run*, can contain
simulation run in NOMAD is characterized by a *section* that is called *run*. It can contain
*calculation* results, simulated *systems*, applied *methods*, the used *program*, etc.
What constitutes a simulation run is *defined* in the metainfo with a *section definition*.
All other elements in the Archive (e.g. *calculation*, *system*, ...) have similar definitions.
......@@ -34,8 +34,8 @@ has to provide certain information: *name*, *description*, *shape*, *units*, *ty
#### Types of definitions
- *Sections* are the building blocks for hierarchical data. A section can contain other
sections (via *sub-sections*) and data (via *quantities*).
- *Sub-sections* define a containment relationship between sections.
sections (via *subsections*) and data (via *quantities*).
- *Subsections* define a containment relationship between sections.
- *Quantities* define a piece of data in a section.
- *References* are special quantities that allow defining references from a section to
another section or quantity.
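The containment relationships among these definition types can be pictured with plain Python classes. This is only an illustration of the structure, not NOMAD's actual metainfo classes; in the real Metainfo, each quantity additionally carries a name, description, shape, type, and possibly a unit:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class System:                       # a section
    n_atoms: int = 0                # a scalar quantity
    labels: List[str] = field(default_factory=list)  # a quantity with shape


@dataclass
class Run:                          # a section
    program_name: str = ''          # a quantity
    systems: List[System] = field(default_factory=list)  # a subsection


run = Run(program_name='VASP', systems=[System(n_atoms=2, labels=['Ti', 'O'])])
```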
......@@ -46,8 +46,8 @@ has to provide certain information: *name*, *description*, *shape*, *units*, *ty
NOMAD Metainfo is kept independent of the actual storage format and is not bound to any
specific storage method. In our practical implementation, we use a binary form of JSON,
called [msgpack](https://msgpack.org/) on our servers and provide Archive data as JSON via
our API. For NOMAD end-users the internal storage format is of little relevance, because
the archive data is solely served by NOMAD's API. On top of the JSON API data, the
our API. For NOMAD end-users the internal storage format is of little relevance, since the
archive data is provided exclusively by NOMAD's API. On top of the JSON API data, the
[NOMAD Python package](pythonlib.md) provides a more convenient interface for Python users.
......@@ -93,27 +93,28 @@ API will give you JSON data like this:
```
This will show you the Archive as a hierarchy of JSON objects (each object is a section),
where each key is a property (e.g. a quantity or sub-section). Of course you can use
where each key is a property (e.g. a quantity or subsection). Of course you can use
this data in this JSON form. You can expect that the same keys (each item has a formal
definition) always provides the same type of data. But, not all keys are present in
each archive, not all lists might have the same amount of objects. This depends on the
data. For example, some *runs* contain many systems (e.g. geometry optimization), others
definition) always provides the same type of data. However, not all keys are present in
every archive, and not all lists might have the same number of objects. This depends on the
data. For example, some *runs* contain many systems (e.g. geometry optimizations), others
don't; typically *bulk* systems will have *symmetry* data, non-bulk systems might not.
To learn what each key means, you need to look-up its definition in the Metainfo.
To learn what each key means, you need to look up its definition in the Metainfo.
{{ metainfo_data() }}
## Archive Python interface
In Python, you JSON data is typically represented as nested combinations of dictionaries
In Python, JSON data is typically represented as nested combinations of dictionaries
and lists. Of course, you could work with this right away. To make it easier for Python
programmers the [NOMAD Python package](pythonlib.md), will allow you to use this
JSON data with a more high level interface that give the following advantages:
programmers, the [NOMAD Python package](pythonlib.md) allows you to use this
JSON data with a higher-level interface, which provides the following advantages:
- code completion in dynamic coding environments like Jupyter notebooks
- a cleaner syntax that uses attributes instead of dictionary access
- all higher dimensional numerical data is represented as numpy arrays
- allows to navigate references
- allows to navigate through references
- numerical data has a Pint unit attached to it
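The difference that attribute access makes can be pictured with a toy wrapper over archive JSON. This is only an illustration of the idea; the real interface comes from the NOMAD Python package and additionally handles units, numpy arrays, and reference resolution:

```python
class Section:
    """Toy attribute wrapper around archive JSON (illustration only)."""

    def __init__(self, data):
        self._data = data

    def __getattr__(self, name):
        # only called for names that are not real attributes, i.e. JSON keys
        value = self._data[name]
        if isinstance(value, dict):
            return Section(value)  # nested sections stay navigable
        if isinstance(value, list):
            return [Section(v) if isinstance(v, dict) else v for v in value]
        return value


archive_json = {'run': [{'program': {'name': 'VASP'}}]}
archive = Section(archive_json)

# attribute access instead of archive_json['run'][0]['program']['name']
assert archive.run[0].program.name == 'VASP'
```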
For each section the Python package contains a Python class that corresponds to its
......
......@@ -20,7 +20,7 @@ cd nomad
```
There are several branches in the repository. The master branch contains the latest released version, but there are also
develop branches for each version called vX.X.X. Checkout the branch you want to work on it
develop branches for each version called vX.X.X. Check out the branch you want to work on.
```
git checkout vX.X.X
```
......@@ -34,20 +34,20 @@ This branch can be pushed to the repo, and then later may be merged to the relev
### Set up a Python environment
You work in a Python virtual environment.
You should work in a Python virtual environment.
#### pyenv
The nomad code currently targets python 3.7. If you host machine has an older version installed,
The nomad code currently targets python 3.7. If your host machine has an older version installed,
you can use [pyenv](https://github.com/pyenv/pyenv) to use python 3.7 in parallel to your
system's python. Nevertheless, we have had good experience with 3.8 and 3.9 as well,
and everything might also work with newer versions.
#### virtualenv
We strongly recommend to use *virtualenv* to create a virtual environment. It will allow you
We strongly recommend using *virtualenv* to create a virtual environment. It allows you
to keep nomad and its dependencies separate from your system's python installation.
Make sure to base the virtual environment on Python 3.
Make sure that the virtual environment is based on Python 3.
To install *virtualenv*, create an environment and activate the environment use:
To install *virtualenv*, create an environment, and activate it, use:
```
pip install virtualenv
virtualenv -p `which python3` .pyenv
......@@ -70,7 +70,7 @@ To install libmagick for conda, you can use (other channels might also work):
conda install -c conda-forge --name nomad_env libmagic
```
Using the following command one can install all the dependencies, and the sub-modules from the NOMAD-coe project
The following command can be used to install all dependencies and the submodules of the NOMAD-coe project.
```
bash setup.sh
```
......@@ -104,16 +104,16 @@ their own GitLab/git repositories. To clone and initialize them, run:
git submodule update --init
```
All requirements for these submodules need to be installed and they need to be installed
themselves as python modules. Run the `dependencies.sh` script that will install
All requirements for these submodules need to be installed and they themselves need to be installed
as python modules. Run the `dependencies.sh` script that will install
everything into your virtual environment:
```sh
./dependencies.sh -e
```
If one of the Python packages that are installed during this process, fails because it
If one of the Python packages that are installed during this process fails because it
cannot be compiled on your platform, you can try `pip install --prefer-binary <packagename>`
to install set package manually.
to install said package manually.
The `-e` option will install the NOMAD-coe dependencies with symbolic links allowing you
to change the downloaded dependency code without having to reinstall afterwards.
......@@ -148,17 +148,18 @@ Or simply run
The generated files are not stored in Git. If you pull a different commit, the GUI code
might not match the expected data in outdated files. If there are changes to units, metainfo, new parsers, or new toolkits, it might be necessary to regenerate these gui artifacts.
In additional, you have to do some more steps to prepare your working copy to run all
In addition, you have to do some more steps to prepare your working copy to run all
the tests. See below.
## Run the infrastructure
### Install docker
One needs to install [docker](https://docs.docker.com/get-docker/) and [docker-compose](https://docs.docker.com/compose/install/).
You need to install [docker](https://docs.docker.com/get-docker/) and [docker-compose](https://docs.docker.com/compose/install/).
### Run required 3rd party services
To run NOMAD, some 3rd party services are needed:
- elastic search: nomad's search and analytics engine
- mongodb: used to store processing state
- rabbitmq: a task queue used to distribute work in a cluster
......@@ -189,8 +190,8 @@ curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_all/_setti
Note that the ElasticSearch service has a known problem in quickly hitting the
virtual memory limits of your OS. If you experience issues with the
ElasticSearch container not running correctly or crashing, try [increasing the
virtual memory limits as shown here](https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html).
ElasticSearch container not running correctly or crashing, try increasing the
virtual memory limits as shown [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html).
To shut down everything, just `ctrl-c` the running output. If you started everything
in *daemon* mode (`-d`), use:
......@@ -198,7 +199,7 @@ in *deamon* mode (`-d`) use:
docker-compose down
```
Usually these services only used by NOMAD, but sometimes you also
Usually these services are used only by NOMAD, but sometimes you also
need to check something or do some manual steps. You can access mongodb and elastic search
via your preferred tools. Just make sure to use the right ports.
......@@ -213,8 +214,8 @@ keycloak:
realm_name: fairdi_nomad_test
```
NOMAD consist of the NOMAD app/api, a worker, and the GUI. You can run app and worker with
the NOMAD cli. These commands will run the services and show their logout put. You should open
NOMAD consists of the NOMAD app/api, a worker, and the GUI. You can run the app and the worker with
the NOMAD cli. These commands will run the services and display their log output. You should open
them in separate shells as they run continuously. They will not watch code changes and
you have to restart manually.
......@@ -226,7 +227,7 @@ nomad admin run app
nomad admin run worker
```
Or both together in once process:
Or both together in one process:
```
nomad admin run appworker
```
......@@ -248,8 +249,8 @@ nomad dev toolkit-metadata > gui/src/toolkitMetadata.json
nomad dev units > gui/src/unitsData.js
```
When you run the gui on its own (e.g. with react dev server below), you have to have
the app manually also. The gui and its dependencies run on [node](https://nodejs.org) and
If you run the gui on its own (e.g. with react dev server below), you also have to run
the app manually. The gui and its dependencies run on [node](https://nodejs.org) and
the [yarn](https://yarnpkg.com/) dependency manager. Read their documentation on how to
install them for your platform.
```sh
......@@ -265,9 +266,7 @@ of the code base.
You have to provide static files to serve the docs and NOMAD distribution:
```sh
cd docs
make html
cd ..
mkdocs build && mv site docs/build
python setup.py compile
python setup.py sdist
cp dist/nomad-lab-*.tar.gz dist/nomad-lab.tar.gz
......@@ -298,13 +297,13 @@ This mimics the tests and checks that the GitLab CI/CD will perform.
## Setup your IDE
The documentation section on development guidelines details (see below) how the code is organized,
The documentation section for development guidelines (see below) details how the code is organized,
tested, formatted, and documented. To help you meet these guidelines, we recommend
using a proper IDE for development and ditching any VIM/Emacs (mal-)practices.
We strongly recommend that all developers use *visual studio code*, or *vscode* for short,
(this is a copletely different producs than *visual studio*). It is available for free
for all major platforms (here)[https://code.visualstudio.com/download].
(this is a completely different product from *visual studio*). It is available for free
for all major platforms [here](https://code.visualstudio.com/download).
You should launch and run vscode directly from the project's root directory. The source
code already contains settings for vscode in the `.vscode` directory. The settings
......@@ -315,7 +314,7 @@ pipelines. If you want to augment this with your own settings, you can have a
The settings also include a few launch configurations for vscode's debugger. You can create
your own launch configs in `.vscode/launch.json` (also in .gitignore).
The settings exprect that you have installed a python environment at `.pyenv` as
The settings expect that you have installed a python environment at `.pyenv` as
described in this tutorial (see above).
## Code guidelines
......@@ -348,8 +347,7 @@ applies to all python code (and where applicable, also to JS and other code):
- Use google [docstrings](http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html).
- Add your doc-strings to the sphinx documentation in `docs`. Use .md, follow the example.
Markdown in sphinx is supported via [recommonmark]
(https://recommonmark.readthedocs.io/en/latest/index.html#autostructify)
Markdown in sphinx is supported via [recommonmark](https://recommonmark.readthedocs.io/en/latest/index.html#autostructify)
and [AutoStructify](http://recommonmark.readthedocs.io/en/latest/auto_structify.html)
- The project structure is according to [this guide](https://docs.python-guide.org/writing/structure/).
......@@ -370,7 +368,7 @@ build and test stage completed successfully.
### Names and identifiers
There are is some terminology consistently used in this documentation and the source
There is a certain terminology consistently used in this documentation and the source
code. Use this terminology for identifiers.
Do not use abbreviations. There are (few) exceptions: `proc` (processing); `exc`, `e` (exception);
......@@ -397,7 +395,7 @@ Throughout nomad, we use different ids. If something
is called *id*, it is usually a random uuid and has no semantic connection to the entity
it identifies. If something is called a *hash* then it is a hash generated based on the
entity it identifies. This means either the whole thing or just some properties of
said entities.
these entities.
- The most common hash is the `entry_hash` based on mainfile and auxfile contents.
- The `upload_id` is a UUID assigned to the upload on creation. It never changes.
......@@ -406,8 +404,8 @@ said entities.
- The `entry_id` (previously called `calc_id`) uniquely identifies an entry. It is a hash
over the `mainfile` and respective `upload_id`. **NOTE:** For backward compatibility,
`calc_id` is also still supported in the api, but using it is strongly discouraged.
- We often use pairs of `upload_id/entry_id`, which in many contexts allow to resolve an entry
related file on the filesystem without having to ask a database about it.
- We often use pairs of `upload_id/entry_id`, which in many contexts allow to resolve an entry-related
file on the filesystem without having to ask a database about it.
- The `pid` (or `coe_calc_id`) is a legacy sequential integer id, previously used to identify
entries. We still store the `pid` on these older entries for historical purposes.
- Calculation `handle` or `handle_id` are created based on those `pid`.
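The distinction between a random *id* and a content-derived *hash* can be sketched as follows. The hashing scheme here is illustrative only and not NOMAD's exact algorithm:

```python
import hashlib
import uuid

# A plain id is random and has no semantic connection to its entity:
upload_id = str(uuid.uuid4())


def entry_id_for(upload_id, mainfile):
    # A hash-style id is derived from properties of the entity, here the
    # upload_id and mainfile path (the real scheme may differ).
    digest = hashlib.sha1(f'{upload_id}/{mainfile}'.encode()).hexdigest()
    return digest[:28]


entry_id = entry_id_for(upload_id, 'vasp/OUTCAR')
# the same inputs always yield the same entry_id
```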
......@@ -427,8 +425,8 @@ There are three important prerequisites to understand about nomad-FAIRDI's loggi
`logger_name`, `event` plus custom context data. Keep events very short; most
information goes into the context.
- We use logging to inform about the state of nomad-FAIRDI, not about user
behavior, input, data. Do not confuse this when determining the log-level for an event.
For example, a user providing an invalid upload file, for example, should never be an error.
behavior, input, or data. Do not confuse this when determining the log-level for an event.
For example, a user providing an invalid upload file should never be an error.
Please follow the following rules when logging:
......@@ -535,8 +533,8 @@ The lifecycle of a *feature* branch should look like this:
We currently use git submodules to manage NOMAD internal dependencies (e.g. parsers).
All dependencies are python packages and installed via pip to your python environment.
This allows us to target (e.g. install) individual commits. More importantly, we can address c
ommit hashes to identify exact parser/normalizer versions. On the downside, common functions
This allows us to target (e.g. install) individual commits. More importantly, we can address commit
hashes to identify exact parser/normalizer versions. On the downside, common functions
for all dependencies (e.g. the python-common package, or nomad_meta_info) cannot be part
of the nomad-FAIRDI project. In general, it is hard to simultaneously develop nomad-FAIRDI
and NOMAD-coe dependencies.
......@@ -574,8 +572,8 @@ In these cases, use rebase and not merge. Rebase puts your branch commits in fro
merged commits instead of creating a new commit with two ancestors. It basically moves the
point where you initially branched away from the version branch to the current position in
the version branch. This will avoid merges, merge commits, and generally leave us with a
more consistent history. You can also rebase before create a merge request, basically
allowing for no-op merges. Ideally the only real merges that we ever have, are between
more consistent history. You can also rebase before creating a merge request, which basically
allows no-op merges. Ideally the only real merges that we ever have, are between
version branches.
```sh
......@@ -585,8 +583,8 @@ version branches.
When you need multiple branches to implement a feature and merge between them, try to
use *squash*. Squashing basically puts all commits of the merged branch into a single commit.
It allows you to have many commits and then squash them into one. This is useful
if these commits where just made for synchronization between workstations or due to
unexpected errors in CI/CD, you needed a save point, etc. Again the goal is to have
if these commits were made just to synchronize between workstations, due to
unexpected errors in CI/CD, because you needed a save point, etc. Again the goal is to have
coherent commits, where each commit makes sense on its own.
Often a feature is also represented by an *issue* on GitLab. Please mention the respective
......@@ -598,7 +596,7 @@ Remember that tags and branches are both Git references and you can accidentally
The main NOMAD GitLab-project (`nomad-fair`) uses Git-submodules to maintain its
parsers and other dependencies. All these submodules are placed in the `/dependencies`
directory. There are helper scripts to install (`./dependencies.sh` and
directory. There are helper scripts to install (`./dependencies.sh`) and
commit changes to all submodules (`./dependencies-git.sh`). After merging or checking out,
you have to make sure that the modules are updated to not accidentally commit old
submodule commits again. Usually you do the following to check if you really have a
......
......@@ -3,13 +3,13 @@
The **NOvel Materials Discovery (NOMAD)** is a data management platform for materials
science data. Here, NOMAD is a [web-application and database](https://nomad-lab.eu/prod/v1/gui/search)
that allows you to centrally publish data. But you can also use the NOMAD software to build your
own local [NOMAD Oasis](../oasis/).
own local [NOMAD Oasis](oasis.md).
![NOMAD](assets/nomad-hero-shot.png){ align=right width=400 }
*More than 12 million simulations from over 400 authors world-wide*
- Free publication and sharing data of data
- Free publication and sharing of data
- Manage research data through its whole life-cycle
- Extracts <b>rich metadata</b> from data automatically
- All data in <b>raw</b> and <b>machine processable</b> form
......@@ -53,8 +53,8 @@ and their properties, as well as all analytics.
The *archive* is a hierarchical data format with a strict schema.
All the information is organized into logical nested *sections*.
Each *sections* comprised a set of *quantities* on a common subject.
All *sections* and *quantities* are backed by a formal schema that defines names, descriptions, types, shapes, and units.
Each *section* comprises a set of *quantities* on a common subject.
All *sections* and *quantities* are supported by a formal schema that defines names, descriptions, types, shapes, and units.
We sometimes call this data *archive* and the schema *metainfo*.
### Datamodel: *uploads*, *entries*, *files*, *datasets*
......@@ -72,15 +72,15 @@ Once an upload is published, it becomes immutable.
An *upload* can contain an arbitrary directory structure of *raw* files.
For each recognized *mainfile*, NOMAD creates an entry.
An *upload* therefore contains a list of *entries*.
Therefore, an *upload* contains a list of *entries*.
Each *entry* is associated with its *mainfile*, an *archive*, and all other *auxiliary* files in the same directory.
*Entries* are automatically aggregated into *materials* based on the extracted materials metadata.
*Entries* (of many uploads) can be manually curated into *datasets*. You can get a DOI for *datasets*.
*Entries* (of many uploads) can be manually curated into *datasets*, for which you can also get a DOI.
### Using NOMAD software locally (the Oasis)
The software that runs NOMAD is Open-Source and can be used independently of the NOMAD
*central installation* that is run at [http://nomad-lab.eu](http://nomad-lab.eu).
*central installation* at [http://nomad-lab.eu](http://nomad-lab.eu).
We call any NOMAD installation that is not the *central* one a NOMAD Oasis.
<figure markdown>
......@@ -93,11 +93,10 @@ uses and hybrids are imaginable:
- Academia: Use the Oasis for local management of unpublished research data
- Mirror: Use the Oasis as a mirror that hosts a copy of all published NOMAD data
- Industry: Use the Oasis to manage private data and use published data fully internally with
to comply with strict privacy policies
- Industry: Use the Oasis to manage private data and make full internal use of published data, in compliance with strict privacy policies
- FAIRmat: Use Oasis to form a network of repositories to build a federated data infrastructure
for materials science.
This what we do in the [FAIRmat project](https://www.fair-di.eu/fairmat/consortium).
This is what we do in the [FAIRmat project](https://www.fair-di.eu/fairmat/consortium).
## Architecture
......@@ -119,6 +118,7 @@ NOMAD. The worker runs all the processing (parsing, normalization). Their separation allows
to scale the system for various use-cases.
Other services are:
- rabbitmq: a task queue that we use to distribute tasks for the *worker* containers
- mongodb: a no-sql database used to maintain processing state and user-metadata
- elasticsearch: a no-sql database and search engine that drives our search
......@@ -127,11 +127,11 @@ Other services are:
- keycloak: our SSO user management system (can be used by all Oasises)
- a content management system to provide other web-page content (not part of the Oasis)
All NOMAD software is bundled a single NOMAD docker image and a Python
([nomad-lab on pypi](https://pypi.org/project/nomad-lab/)) package. The NOMAD docker
All NOMAD software is bundled in a single NOMAD docker image and a Python package
([nomad-lab on pypi](https://pypi.org/project/nomad-lab/)). The NOMAD docker
image can be downloaded from our public registry.
NOMAD software is organized in multiple git repositories. We use continuous integration
to continuously provide the latest version of the docker image and Python package.
### NOMAD uses a modern and rich stack of frameworks, systems, and libraries
[Elasticsearch](https://www.elastic.co/webinars/getting-started-elasticsearch)
is used to store repository data (not the raw files).
Elasticsearch enables flexible, scalable search and analytics.
#### mongodb
Mongodb is used to maintain the state of the processing of uploaded files and the generated entries.
#### Keycloak
[Keycloak](https://www.keycloak.org/) is used for user management. It manages users and
provides functions for registration, password recovery, editing user accounts, and single
sign-on to fairdi@nomad and other related services.
#### FastAPI
Furthermore, you can browse and use the API via its OpenAPI dashboard.
#### Elasticstack
The [elastic stack](https://www.elastic.co/guide/index.html)
(previously the *ELK* stack) is a centralized logging, metrics, and monitoring
solution that collects data within the cluster and provides a flexible analytics front end
for that data.
#### Javascript, React, Material-UI
The frontend (GUI) of **nomad@FAIRDI** is built on the
[React](https://reactjs.org/docs/getting-started.html) component framework.
This allows us to build the GUI as a set of re-usable components to
achieve a coherent representation for all aspects of nomad, while keeping development
provide configuration to run the whole nomad stack on a single server node.
To run and scale nomad on a cluster, you can use [kubernetes](https://kubernetes.io/docs/home/)
to orchestrate the necessary containers. We provide a [helm](https://docs.helm.sh/)
chart with all necessary service and deployment descriptors that allow you to set up and
update nomad with only a few commands.
#### GitLab
Nomad as a software project is managed via [GitLab](https://docs.gitlab.com/).
The **nomad@FAIRDI** project is hosted [here](https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR).
GitLab is used to manage versions, different branches of development, tasks and issues,
as a [registry for Docker images](https://docs.gitlab.com/ee/user/packages/container_registry/index.html),
and as a [CI/CD platform](https://docs.gitlab.com/ee/ci/).
```python
from nomad.metainfo import MSection, Quantity, SubSection, Units


class System(MSection):
    '''
    A system section includes all quantities that describe a single simulated
    system (a.k.a. geometry).
    '''
```
allow you to organize related data into, well, *sections*. Each section can have two
properties: *quantities* and *sub-sections*. Sections and their properties are defined with
Python classes and their attributes.
Each *quantity* defines a piece of data. Basic quantity attributes are `type`, `shape`,
`unit`, and `description`.
*Sub-sections* allow sections to be placed within each other, forming containment
hierarchies of sections and the respective data within them. Basic sub-section attributes are
`sub_section` (i.e. a reference to the section definition of the sub-section) and `repeats`
(determines whether a sub-section can be included once or multiple times).
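The containment idea can also be sketched in plain Python, as a stand-in for the metainfo classes (this is *not* the metainfo implementation — the class and attribute names here are illustrative): a repeating sub-section behaves like a list of child sections held by a parent section.

```python
# Plain-Python analogy for sections, quantities, and repeating sub-sections:
# System instances live in a repeating sub-section of Run, forming a
# containment hierarchy, roughly SubSection(sub_section=System, repeats=True).
from dataclasses import dataclass, field
from typing import List

@dataclass
class System:
    n_atoms: int = 0  # a "quantity" with type int and scalar shape

@dataclass
class Run:
    # a repeating sub-section: many System children per Run
    systems: List[System] = field(default_factory=list)

run = Run()
run.systems.append(System(n_atoms=3))
run.systems.append(System(n_atoms=5))
print(len(run.systems), run.systems[0].n_atoms)  # 2 3
```

With `repeats=False` the analogy would be a single optional attribute instead of a list.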
The above simply defines a schema. To use the schema and create actual data, we have to
instantiate the above classes:
```python
system = System()
system.n_atoms = 3
print(system.n_atoms)
```