Commit 399f5f43 authored by Markus Scheidgen's avatar Markus Scheidgen

Merge branch 'v0.8.0' into metainfo

parents 1a1f04e1 8d55b346
......@@ -90,7 +90,6 @@ tests:
NOMAD_SPRINGER_DB_PATH: /nomad/fairdi/db/data/springer.db
......@@ -29,6 +29,10 @@ contributing, and API reference.
Omitted versions are plain bugfix releases with only minor changes and fixes.
### v0.7.9
- Everything to run a simple NOMAD OASIS based on the central user-management
- minor bugfixes
### v0.7.7
- Shows dataset contents with embargo data, but hides the entry details (raw-files, archive)
- minor bugfixes
Subproject commit 487cceae152be217e0689d6217f420480c2a9d39
Subproject commit d918460c31728058834432b736062d44e1e1c074
Subproject commit cd354f066cb8b85904a2725bb93abf7c443b3fdf
Subproject commit d30ef0bd9275206380866c89946a0c129e7d8df9
Subproject commit 92005ec9ff4b8e13bd86373d14bd5fafe2b52cd1
Subproject commit 75a5cd92dbd6299067e0fca0b9949f8b4410ec91
Subproject commit 5f97f32086c281ebda5ab6084ae2c7eba16b516f
Subproject commit e113cbf21f23054394ad6099ad4836cbd9e21790
Subproject commit b932711d741c2457a80bf2447c180ce49c23e6c9
Subproject commit bd5b5c6f947ec9b7172ef7970a92825c737e1e60
Subproject commit 5333328258b51b82882df9bfb6505f94d8d4d6af
Subproject commit d60013e1597493972237210a36549bfcf0a2706f
Subproject commit 3811ced85fb7d68ca579d5ca8d93e800f48c53a5
Subproject commit f2b7f39ca62438d25a21cdbaf267269fbc4f62ac
Subproject commit d9c9b3c14ecab80e58adab70917267e5e7fbe3f2
Subproject commit 5f07d80f9d1838b3f6b95e39266221002061e0d1
# Archive API tutorial

This contains tutorials for using the new archive query functionality.
It uses the new metainfo definitions for the archive data. In addition, the archive data
can now be filtered through the new API. The archives are now also stored in a new binary
format, msgpack, which in principle makes querying faster.
## Archive API

First, we look at how to use the new archive query API. Here we use the python
requests library:

```python
import requests

data = {
    'atoms': 'Fe', 'scroll': True, 'per_page': 10,
    'results': [{"section_run": {"section_single_configuration_calculation[-1]": {"energy_total": None}}}]}
# the endpoint URL was omitted in the original; fill in the archive query API URL
response ='', json=data)
data = response.json()
results = data.get('results')
```
To query the archive, we use the post method, where we provide the usual query parameters
in a dictionary. In addition, we provide a schema for the archive data à la GraphQL, i.e.
a hierarchical dictionary with null values for each of the properties we would like to query.
In the example, we would like to return only the total energy for the last image. It is
important to point out that this schema uses the key 'results' and is a list, since
it will be filled with a list of archive data matching this schema.
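To illustrate the idea behind such a schema, here is a small, self-contained sketch (not NOMAD code) that fills the `None` leaves of a schema dict with the matching values from a full archive dict; the real API additionally supports index notation such as `[-1]`, which this sketch omits:

```python
def fill_schema(schema, data):
    """Recursively replace None leaves in the schema with matching values from data."""
    result = {}
    for key, sub_schema in schema.items():
        if key not in data:
            continue  # the requested property is absent in this entry
        if sub_schema is None:
            result[key] = data[key]  # a null leaf selects the whole value
        else:
            result[key] = fill_schema(sub_schema, data[key])
    return result

archive = {'section_run': {'energy_total': -1.5, 'forces': [0.1, 0.2]}}
schema = {'section_run': {'energy_total': None}}
print(fill_schema(schema, archive))  # {'section_run': {'energy_total': -1.5}}
```

Properties not named in the schema (here `forces`) are simply never visited, which is what makes this style of filtered querying cheap on the server side.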
## Archive and the new metainfo

A wrapper for the archive query API is implemented in ArchiveQuery.

```python
from nomad.archive_library.filedb import ArchiveQuery

q = ArchiveQuery(
    atoms='Fe', scroll=True, per_page=10, archive_data={
        "section_run": {"section_single_configuration_calculation[-1]": {"energy_total": None}}})
metainfo = q.query()
for calc in metainfo:
    # process each calculation's archive data here
    print(calc)
```
Similarly, we provide the query parameters and also the schema, which in this case is
called 'archive_data'. When we invoke query, recursive API requests are made until all
the data matching our parameters are downloaded. The results are then expressed in the
new metainfo scheme, which offers auto-completion, among other features.
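The recursive download described above amounts to a pagination loop. A minimal sketch of the idea, using a stand-in `fetch_page` function rather than the actual ArchiveQuery internals:

```python
def fetch_all(fetch_page, per_page=10):
    """Collect results page by page until a page comes back short or empty."""
    results, page = [], 1
    while True:
        batch = fetch_page(page=page, per_page=per_page)
        results.extend(batch)
        if len(batch) < per_page:
            break  # last page reached
        page += 1
    return results

# stand-in for an API call: 25 fake entries served in pages
entries = [{'calc_id': i} for i in range(25)]
def fake_page(page, per_page):
    return entries[(page - 1) * per_page: page * per_page]

print(len(fetch_all(fake_page)))  # 25
```

With scrolling enabled, the loop would pass the `scroll_id` from each response to the next request instead of a page number, but the overall shape is the same.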
## Msgpack container

The archive data are now stored in a binary format called msgpack. To create a msgpack
database from the archive data and query it, one uses ArchiveFileDB.

```python
from nomad.archive_library.filedb import ArchiveFileDB

db = ArchiveFileDB('archive.msg', mode='w', entry_toc_depth=2)
db.add_data({'calc1': {'secA': {'subsecA': {'propA': 1.0}}, 'secB': {'propB': 'X'}}})
db.add_data({'calc2': {'secA': {'subsecA': {'propA': 2.0}}, 'secB': {'propB': 'Y'}}})

db = ArchiveFileDB('archive.msg')
db.query({'calc1': {'secA': None}})
```

In the example, we first create a database in 'archive.msg'; the data added to it are
fragmented down to subsections. We then reload it for reading and query all entries
under 'secA' of 'calc1'.
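The fragmentation mentioned above can be pictured as flattening each entry down to a fixed depth, so that individual sub-sections can be located and read independently. A rough, standalone sketch of this idea (a hypothetical helper, not the ArchiveFileDB internals):

```python
def fragment(data, depth, prefix=()):
    """Split a nested dict into (path, fragment) pairs at the given depth."""
    if depth == 0 or not isinstance(data, dict):
        return {prefix: data}  # store this subtree as one fragment
    fragments = {}
    for key, value in data.items():
        fragments.update(fragment(value, depth - 1, prefix + (key,)))
    return fragments

entry = {'calc1': {'secA': {'subsecA': {'propA': 1.0}}, 'secB': {'propB': 'X'}}}
for path, frag in fragment(entry, depth=2).items():
    print('/'.join(path), '->', frag)
# calc1/secA -> {'subsecA': {'propA': 1.0}}
# calc1/secB -> {'propB': 'X'}
```

Each fragment would then be serialized with msgpack and indexed by its path in a table of contents, which is what makes partial reads (and hence faster querying) possible without loading the whole archive.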
......@@ -14,5 +14,6 @@ and infrastructure with a simplified architecture and consolidated code base.
Operating nomad
Operating NOMAD
.. mdinclude:: ../ops/
.. mdinclude:: ../ops/docker-compose/nomad/
.. mdinclude:: ../ops/helm/nomad/
.. mdinclude:: ../ops/containers/
.. mdinclude:: ../ops/docker-compose/nomad-oasis/
......@@ -11,18 +11,19 @@ The nomad infrastructure consists of a series of nomad and 3rd party services:
- rabbitmq: a task queue used to distribute work in a cluster
All 3rd party services should be run via *docker-compose* (see below). The
nomad python services can also be run via *docker-compose* or manually started with python.
The gui can be run manually with a development server via yarn, or with
nomad python services can be run with python to develop them.
The gui can be run with a development server via yarn.
Below you will find information on how to install all python dependencies and code
manually. How to use *docker*/*docker-compose*. How to run services with *docker-compose*
or manually.
manually. How to use *docker*/*docker-compose*. How to run 3rd-party services with *docker-compose*.
Keep in mind the *docker-compose* configures all services in a way that mirrors
the configuration of the python code in `nomad/` and the gui config in
To learn about how to run everything in docker, e.g. to operate a NOMAD OASIS in
production, go [here](/app/docs/ops.html).
## Install python code and dependencies
### Cloning and development tools
......@@ -158,35 +159,12 @@ having to copy the git itself to the docker build context.
The images are built via *docker-compose* and don't have to be created manually.
### Build with docker-compose
We have multiple *docker-compose* files that must be used together.
- `docker-compose.yml` contains the base definitions for all services
- `docker-compose.override.yml` configures services for development (notably builds images for nomad services)
- `` will also provide the ELK service
- `` configures services for production (notable uses a pre-build image for nomad services that was build during CI/CD)
It is sufficient to use the implicit `docker-compose.yml` only (like in the command below).
The `override` will be used automatically.
Now we can build the *docker-compose* images that contain all external services (rabbitmq,
mongo, elastic, elk) and nomad services (worker, app, gui).
docker-compose build
Docker-compose tries to cache individual build steps. Sometimes this causes
trouble and not everything necessary is rebuilt when you change something. In
these cases use:
docker-compose build --no-cache
### Run everything with docker-compose
### Run necessary 3rd-party services with docker-compose
You can run all containers with:
docker-compose up
cd ops/docker-compose/nomad
docker-compose -f docker-compose.yml -f docker-compose.override.yml up -d mongo elastic rabbitmq
To shut down everything, just `ctrl-c` the running output. If you started everything
......@@ -195,25 +173,6 @@ in *deamon* mode (`-d`) use:
docker-compose down
### Run containers selectively
The following services/containers are managed via our docker-compose:
- rabbitmq, mongo, elastic, (elk, only for production)
- worker, app
- gui
- proxy
The *proxy* container runs *nginx* based reverse proxies that put all services under
a single port and different paths.
You can also run services selectively, e.g.
docker-compose up -d rabbitmq mongo elastic
docker-compose up worker
docker-compose up app gui proxy
## Accessing 3rd-party services
Usually these services are only used by the nomad containers, but sometimes you also
need to check something or do some manual steps.
......@@ -234,12 +193,7 @@ The index prefix for logs is `logstash-`. The ELK is only available with the
You can access mongodb and elastic search via your preferred tools. Just make sure
to use the right ports (see above).
## Run nomad services manually
You can run the worker, app, and gui as part of the docker infrastructure, as
seen above. But of course, there are always reasons to run them manually during
development, e.g. in a debugger or profiler.
## Run nomad services
### API and worker
......@@ -253,11 +207,6 @@ To run it directly with celery, do (from the root)
celery -A nomad.processing worker -l info
Run the app via docker, or (from the root):
nomad admin run app
You can also run worker and app together:
nomad admin run appworker
"name": "nomad-fair-gui",
"version": "0.7.8",
"version": "0.7.10",
"commit": "nomad-gui-commit-placeholder",
"private": true,
"dependencies": {
......@@ -730,7 +730,8 @@ class EditUserMetadataDialogUnstyled extends React.Component {
user: PropTypes.object,
onEditComplete: PropTypes.func,
disabled: PropTypes.bool,
title: PropTypes.string
title: PropTypes.string,
info: PropTypes.object
static styles = theme => ({
......@@ -1055,7 +1056,7 @@ class EditUserMetadataDialogUnstyled extends React.Component {
renderDialogActions(submitting, submitEnabled) {
const {classes} = this.props
const {classes, info} = this.props
if (submitting) {
return <DialogActions>
......@@ -1070,7 +1071,7 @@ class EditUserMetadataDialogUnstyled extends React.Component {
} else {
return <DialogActions>
<InviteUserDialog />
{info && !info.oasis && <InviteUserDialog />}
<span style={{flexGrow: 1}} />
<Button onClick={this.handleClose} disabled={submitting}>
......@@ -154,7 +154,7 @@ class RawFiles extends React.Component {
if (fileContents.contents.length < (page + 1) * 16 * 1024) {
api.getRawFile(uploadId, shownFile, {offset: page * 16 * 1024, length: 16 * 1024})
api.getRawFile(uploadId, calcId, shownFile.split('/').reverse()[0], {offset: page * 16 * 1024, length: 16 * 1024})
.then(contents => {
const {fileContents} = this.state
// The back-button navigation might cause a scroll event, might cause to loadmore,
......@@ -200,6 +200,20 @@ class RawFiles extends React.Component {
let downloadUrl
if (selectedFiles.length === 1) {
// download the individual file
downloadUrl = `raw/${uploadId}/${selectedFiles[0]}`
} else if (selectedFiles.length === availableFiles.length) {
// use an endpoint that downloads all files of the calc
downloadUrl = `raw/calc/${uploadId}/${calcId}/*?strip=true`
} else if (selectedFiles.length > 0) {
// use a prefix to shorten the url
const prefix = selectedFiles[0].substring(0, selectedFiles[0].lastIndexOf("/"))
const files = => path.substring(path.lastIndexOf("/") + 1)).join(',')
downloadUrl = `raw/${uploadId}?files=${encodeURIComponent(files)}&prefix=${prefix}&strip=true`
return (
<div className={classes.root}>
<FormGroup row>
......@@ -225,7 +239,7 @@ class RawFiles extends React.Component {
<Download component={IconButton} disabled={selectedFiles.length === 0}
tooltip="download selected files"
url={(selectedFiles.length === 1) ? `raw/${uploadId}/${selectedFiles[0]}` : `raw/${uploadId}?files=${encodeURIComponent(selectedFiles.join(','))}&strip=true`}
fileName={selectedFiles.length === 1 ? this.label(selectedFiles[0]) : `${calcId}.zip`}
<DownloadIcon />
......@@ -216,6 +216,10 @@ class DatasetListUnstyled extends React.Component {
label: 'Dataset name',
render: (dataset) =>
created: {
label: 'Created',
render: (dataset) => dataset.created && new Date(dataset.created).toLocaleString()
DOI: {
label: 'Dataset DOI',
render: (dataset) => dataset.doi && <DOI doi={dataset.doi} />
......@@ -282,8 +286,7 @@ class DatasetListUnstyled extends React.Component {
id={row =>}
// selectedColumns={defaultSelectedColumns}
// entryDetails={this.renderEntryDetails.bind(this)}
selectedColumns={['name', 'DOI', 'entries', 'authors']}
......@@ -20,7 +20,7 @@ The archive API of the nomad@FAIRDI APIs. This API is about serving processed
from typing import Dict, Any
from io import BytesIO
import os.path
from flask import send_file, request
from flask import send_file, request, g
from flask_restplus import abort, Resource, fields
import json
import importlib
......@@ -31,12 +31,12 @@ import nomad_meta_info
from nomad.files import UploadFiles, Restricted
from nomad import search, config
from import common
from nomad.archive import query_archive
from .auth import authenticate, create_authorization_predicate
from .api import api
from .common import calc_route, streamed_zipfile, search_model, add_pagination_parameters,\
add_scroll_parameters, add_search_parameters, apply_search_parameters,\
query_api_python, query_api_curl
from .common import calc_route, streamed_zipfile, search_model, add_search_parameters, apply_search_parameters, query_model
ns = api.namespace(
......@@ -212,77 +212,70 @@ class ArchiveDownloadResource(Resource):
generator(), zipfile_name='', compress=compress)
_archive_query_parser = api.parser()
_archive_query_model_fields = {
'results': fields.List(fields.Raw, description=(
'A list of search results. Each result is a dict with quantities names as key and '
'values as values')),
'python': fields.String(description=(
'A string of python code snippet which can be executed to reproduce the api result.')),
'curl': fields.String(description=(
'A string of curl command which can be executed to reproduce the api result.')),
_archive_query_model = api.inherit('ArchiveCalculations', search_model, _archive_query_model_fields)
_archive_query_model = api.inherit('ArchiveSearch', search_model, {
'query': fields.Nested(query_model, description='The query used to find the requested entries.'),
'query_schema': fields.Raw(description='The query schema that defines what archive data to retrieve.')
class ArchiveQueryResource(Resource):
@api.response(400, 'Invalid requests, e.g. wrong owner type or bad search parameters')
@api.response(401, 'Not authorized to access the data.')
@api.response(404, 'The upload or calculation does not exist')
@api.response(200, 'Archive data sent')
@api.expect(_archive_query_parser, validate=True)
@api.marshal_with(_archive_query_model, skip_none=True, code=200, description='Search results sent')
def get(self):
def post(self):
Get archive data in json format from all query results.
Post a query schema and return it filled with archive data.
See ``/repo`` endpoint for documentation on the search
The actual data are in archive_data, and supplementary python code (curl command) to
The actual data are in results, and supplementary python code (curl command) to
reproduce the search is in python (curl).
args = {
key: value for key, value in _archive_query_parser.parse_args().items()
if value is not None}
scroll = args.get('scroll', False)
scroll_id = args.get('scroll_id', None)
page = args.get('page', 1)
per_page = args.get('per_page', 10 if not scroll else 1000)
order = args.get('order', -1)
order_by = 'upload_id'
data_in = request.get_json()
scroll = data_in.get('scroll', None)
if scroll:
scroll_id = scroll.get('scroll_id')
scroll = True
pagination = data_in.get('pagination', {})
page = pagination.get('page', 1)
per_page = pagination.get('per_page', 10 if not scroll else 1000)
query = data_in.get('query', {})
query_schema = data_in.get('query_schema', '*')
except Exception:
abort(400, message='bad parameter types')
assert page >= 1
assert per_page > 0
except AssertionError:
abort(400, message='invalid pagination')
if order not in [-1, 1]:
if not (page >= 1 and per_page > 0):
abort(400, message='invalid pagination')
search_request = search.SearchRequest()
apply_search_parameters(search_request, args)
search_request.include('calc_id', 'upload_id', 'mainfile')
if g.user is not None:
search_request.owner('all', user_id=g.user.user_id)
apply_search_parameters(search_request, query)
search_request.include('calc_id', 'upload_id', 'with_embargo')
if scroll:
results = search_request.execute_scrolled(scroll_id=scroll_id, size=per_page)
results = search_request.execute_scrolled(
scroll_id=scroll_id, size=per_page, order_by='upload_id')
results['scroll']['scroll'] = True
results = search_request.execute_paginated(
per_page=per_page, page=page, order=order, order_by=order_by)
per_page=per_page, page=page, order_by='upload_id')
except search.ScrollIdNotFound:
abort(400, 'The given scroll_id does not exist.')
......@@ -291,41 +284,33 @@ class ArchiveQueryResource(Resource):
abort(400, str(e))
# build python code and curl snippet
results['python'] = query_api_python('archive', 'query', query_string=request.args)
results['curl'] = query_api_curl('archive', 'query', query_string=request.args)
data = []
calcs = results['results']
upload_files = None
for entry in calcs:
upload_id = entry['upload_id']
calc_id = entry['calc_id']
if upload_files is None or upload_files.upload_id != upload_id:
if upload_files is not None:
upload_files = UploadFiles.get(upload_id)
if upload_files is None:
raise KeyError
upload_files._is_authorized = create_authorization_predicate(upload_id, entry['calc_id'])
fo = upload_files.archive_file(calc_id, 'rb')
if upload_files is not None:
archive_files = None
current_upload_id = None
for entry in calcs:
upload_id = entry['upload_id']
calc_id = entry['calc_id']
if archive_files is None or current_upload_id != upload_id:
upload_files = UploadFiles.get(upload_id, create_authorization_predicate(upload_id))
if upload_files is None:
return []
archive_files = upload_files.archive_file_msgs()
current_upload_id = upload_id
if entry['with_embargo']:
archive_file = archive_files[1]
archive_file = archive_files[0]
except Restricted:
abort(401, message='Not authorized to access %s/%s.' % (upload_id, calc_id))
if archive_file is None:
except KeyError:
abort(404, message='Calculation %s/%s does not exist.' % (upload_id, calc_id))
data.append(query_archive(archive_file, {calc_id: query_schema}))
# assign archive data to results
results['results'] = data
return results, 200
......@@ -249,6 +249,9 @@ class UsersResource(Resource):
@api.expect(user_model, validate=True)
def put(self):
""" Invite a new user. """
if config.keycloak.oasis:
abort(400, 'User invite does not work in a NOMAD OASIS')
json_data = request.get_json()
user = datamodel.User.m_from_dict(json_data)