Commit e0e8eff1 authored by Markus Scheidgen

Removed old v0 API. #591

parent 515307d3
Pipeline #110648 passed with stages in 23 minutes and 52 seconds
......@@ -10,9 +10,7 @@ trade-offs between expressiveness, learning curve, and convenience:
- use an HTTP program like *curl* or *wget* to directly use NOMAD from within a shell
- use a generic Python HTTP library like [requests](https://requests.readthedocs.io/en/master/)
- use more specific Python libraries like [bravado](https://github.com/Yelp/bravado) that turn HTTP requests into NOMAD
specific function calls based on an [OpenAPI spec](https://swagger.io/specification/) that NOMAD offers and that describes our API
- directly in the browser via our generated [swagger dashboard](../api/)
- directly in the browser via our generated [OpenAPI dashboard](../api/v1)
- use the NOMAD Python client library, which offers custom and more powerful
implementations for certain tasks (currently only for accessing the NOMAD Archive)
......@@ -128,290 +126,6 @@ data = response.json()
print(json.dumps(data, indent=2))
```
## Using bravado and our OpenAPI spec
The Python library *bravado* is also an HTTP client, but instead of generic *GET URL*
style functions, it takes a formal specification of the NOMAD API and provides NOMAD
specific functions for you.
```python
from bravado.client import SwaggerClient
nomad_url = 'http://nomad-lab.eu/prod/rae/api'
# create the bravado client
client = SwaggerClient.from_url('%s/swagger.json' % nomad_url)
# perform the search request to print number of public entries
data = client.repo.search(atoms=['Si', 'O']).response().result
# print the total amount of search results
print(data.pagination.total)
# print the data of the first result
print(data.results[0])
```
Read on and learn how to install bravado and perform various tasks, like:
- upload some data
- publish the data
- find it
- download it again
### Python packages
We do not assume many specific Python packages. Only the *bravado* package (available
via PyPI) is required. It allows us to use the NOMAD REST API in a more friendly and
Pythonic way. You can simply install it the usual way.
Optionally, if you need to access your private data, the package *python-keycloak* is
required to conveniently acquire the necessary tokens to authenticate yourself with
NOMAD.
```sh
pip install bravado
pip install python-keycloak
```
For the following code snippets, we need the following imports:
```python
from bravado.requests_client import RequestsClient
from bravado.client import SwaggerClient
from bravado.exception import HTTPNotFound
from urllib.parse import urlparse
import time
import os.path
import sys
```
And optionally:
```python
from bravado.requests_client import RequestsClient, Authenticator
from keycloak import KeycloakOpenID
```
### An example file
Let's assume you have an example upload file ready. It's a `.zip` (`.tgz` would also work)
with some *VASP* data from a single run at `/example/AcAg/vasprun.xml`, `/example/AcAg/OUTCAR`, ...
Let's keep the filename in a variable:
```python
upload_file = 'example.zip'
```
### Nomad
We need to know which NOMAD installation to use and its respective API URL. To upload
data you also need an account (email, password). The toy account used here should be
available on most NOMAD installations:
```python
nomad_url = 'https://nomad-lab.eu/prod/rae/api'
user = 'leonard.hofstadter@nomad-fairdi.tests.de'
password = 'password'
```
### Using bravado
Bravado reads a REST API's definition from a `swagger.json`, as provided by
many APIs, including NOMAD's of course.
```python
host = urlparse(nomad_url).netloc.split(':')[0]
http_client = RequestsClient()
client = SwaggerClient.from_url('%s/swagger.json' % nomad_url, http_client=http_client)
```
Bravado also supports authentication, if required. The following would be a typical setup:
```python
class KeycloakAuthenticator(Authenticator):
    """ A bravado authenticator for NOMAD's keycloak-based user management. """
    def __init__(self, user, password):
        super().__init__(host=urlparse(nomad_url).netloc.split(':')[0])
        self.user = user
        self.password = password
        self.token = None
        self.__oidc = KeycloakOpenID(
            server_url='https://nomad-lab.eu/fairdi/keycloak/auth/',
            realm_name='fairdi_nomad_prod',
            client_id='nomad_public')

    def apply(self, request):
        if self.token is None:
            self.token = self.__oidc.token(username=self.user, password=self.password)
            self.token['time'] = time.time()
        elif self.token['expires_in'] < int(time.time()) - self.token['time'] + 10:
            try:
                self.token = self.__oidc.refresh_token(self.token['refresh_token'])
                self.token['time'] = time.time()
            except Exception:
                self.token = self.__oidc.token(username=self.user, password=self.password)
                self.token['time'] = time.time()
        request.headers.setdefault('Authorization', 'Bearer %s' % self.token['access_token'])
        return request
http_client = RequestsClient()
http_client.authenticator = KeycloakAuthenticator(user=user, password=password)
client = SwaggerClient.from_url('%s/swagger.json' % nomad_url, http_client=http_client)
```
### Uploading data
Now, we can look at actually using the NOMAD API. The API is divided into several
modules: *uploads*, *repo*, *archive*, *raw*, etc. Each provides functionality for
a certain aspect of NOMAD.
The *uploads* endpoints can be used to, you guessed it, upload your data. But they
also allow you to track the progress of the upload processing; inspect, delete, and publish uploads;
and get details about the uploaded data, e.g. which code input/output files were found.
#### Uploading a file
It's simple, since bravado supports uploading files:
```python
with open(upload_file, 'rb') as f:
    upload = client.uploads.upload(file=f).response().result
```
If you already have your file on the NOMAD servers, e.g. under `/nomad/my_files/example.zip`,
you can skip the actual upload and say:
```python
upload = client.uploads.upload(local_path='/nomad/my_files/example.zip').response().result
```
#### Supervising the processing
When files are added to an upload, NOMAD will initiate a *process* to extract the
files, identify code data, and parse and normalize the data.
You can continuously poll the API to get an update on the processing and check whether
it has completed.
```python
while upload.process_running:
    upload = client.uploads.get_upload(upload_id=upload.upload_id).response().result
    time.sleep(5)
    print('processed: %d, failures: %d' % (upload.processed_calcs, upload.failed_calcs))
```
Once the process completes, you can check if your upload was a success. If it
was not successful, you can also delete the upload again:
```python
if upload.process_status != 'SUCCESS':
    print('something went wrong')
    print('errors: %s' % str(upload.errors))
    # delete the unsuccessful upload
    client.uploads.delete_upload(upload_id=upload.upload_id).response().result
    sys.exit(1)
```
Of course, you can also visit the NOMAD GUI
([https://nomad-lab.eu/prod/rae/gui/uploads](https://nomad-lab.eu/prod/rae/gui/uploads))
to inspect your uploads. (You might need to reload the page, if you already had it open.)
#### Publishing your upload
The uploaded data is only visible to you. We call this *staging*. After the processing
was successful and you are satisfied with the results, you have to publish the upload.
This also allows you to add additional metadata to your upload (e.g. comments, references, coauthors).
Here you also determine whether you want an *embargo* on your data.
Once the data is published, you cannot delete it anymore. You can skip this step, but
the rest of the tutorial will then only work for you, because the data is only visible to you.
To initiate the publish operation and provide further data:
```python
client.uploads.exec_upload_operation(upload_id=upload.upload_id, payload={
    'operation': 'publish',
    'metadata': {
        'comment': 'Data from a cool external project',
        'references': ['http://external.project.eu'],
        # 'coauthors': ['sheldon.cooper@ucla.edu'],  # this does not yet work with emails
        # 'external_id': 'external_id'  # this also does not work yet, but we could implement something like this
    }
})
```
Publishing might also take a while. You can monitor it analogously to the upload processing:
```python
while upload.process_running:
    try:
        upload = client.uploads.get_upload(upload_id=upload.upload_id).response().result
        time.sleep(1)
    except HTTPNotFound:
        # upload gets deleted from the upload staging area once published
        break
```
This time we needed some exception handling, since the upload will be removed from the
staging area, and you will get a 404 on the `uploads` endpoint.
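To double-check that the entries are now available outside of staging, you can query the *repo* API for your upload. This is only a sketch; it assumes that `upload_id` is accepted as a search parameter like the other quantities shown in the next section:
```python
# after publishing, the entries should be findable via the repo search;
# using upload_id as a search parameter is an assumption here
published = client.repo.search(upload_id=upload.upload_id).response().result
print('published entries: %d' % published.pagination.total)
```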
### Searching for data
The *repo* part of the API contains a *search* endpoint that supports many different
quantities to search for. These include `formula` (e.g. *AcAg*), `system` (e.g. *bulk/2D/atom*), `spacegroup`, `authors`, `code` (e.g. *VASP*), etc.
In the following example, we search for the specific path segment `AcAg`.
```python
result = client.repo.search(paths='AcAg').response().result

if result.pagination.total == 0:
    print('not found')
elif result.pagination.total > 1:
    print('my ids are not specific enough, bummer ... or did I upload stuff multiple times?')

calc = result.results[0]
print(calc)
```
The result of a search always contains the key `pagination` with pagination data (`total`, `page`, `per_page`) and `results` with an array of the search results. The contents of the results depend on
the type of search and there is no formal swagger model for them, therefore you get plain
dictionaries.
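Search results are paginated, so larger result sets have to be fetched page by page. Here is a minimal sketch; it assumes the search endpoint accepts `page` and `per_page` request parameters that mirror the pagination data in the response:
```python
# iterate over all result pages; `page`/`per_page` as request parameters are an
# assumption mirroring the pagination fields returned in the response
page, per_page = 1, 10
while True:
    result = client.repo.search(paths='AcAg', page=page, per_page=per_page).response().result
    for calc in result.results:
        print(calc['mainfile'])
    if page * per_page >= result.pagination.total:
        break
    page += 1
```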
### Downloading data
The *raw* API allows you to download data. You can do that either via bravado:
```python
client.raw.get(upload_id=calc['upload_id'], path=calc['mainfile']).response()
```
In case of published data, you can also create plain URLs and use a tool like *curl*:
```python
print('%s/raw/%s/%s' % (nomad_url, calc['upload_id'], calc['mainfile']))
print('%s/raw/%s/%s/*' % (nomad_url, calc['upload_id'], os.path.dirname(calc['mainfile'])))
```
There are different options to download individual files, or zips with multiple files.
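For example, the mainfile found above can also be fetched with plain *requests*, assuming the entry is published so no authentication is needed (a minimal sketch):
```python
import requests

# download the raw mainfile of the entry found above and store it locally
url = '%s/raw/%s/%s' % (nomad_url, calc['upload_id'], calc['mainfile'])
response = requests.get(url)
with open(os.path.basename(calc['mainfile']), 'wb') as f:
    f.write(response.content)
```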
### Using *curl* to access the API
The shell tool *curl* can be used to call most API endpoints. Most endpoints for searching
or downloading data are only **GET** operations controlled by URL parameters. For example, to download data:
```sh
curl http://nomad-lab.eu/prod/rae/api/raw/query?upload_id=<your_upload_id> -o download.zip
```
It is a little bit trickier if you need to authenticate yourself, e.g. to download
not yet published or embargoed data. All endpoints support, and most require, the use of
an access token. To acquire an access token from our user management system with curl:
```sh
curl --data 'grant_type=password&client_id=nomad_public&username=<your_username>&password=<your password>' \
https://nomad-lab.eu/fairdi/keycloak/auth/realms/fairdi_nomad_prod/protocol/openid-connect/token
```
You can use the access token with:
```sh
curl -H 'Authorization: Bearer <your_access_token>' \
http://nomad-lab.eu/prod/rae/api/raw/query?upload_id=<your_upload_id> -o download.zip
```
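For comparison, the same flow can be expressed in Python with *requests*. This is only a sketch; the token endpoint, form fields, and download URL simply mirror the curl calls above:
```python
import requests

# acquire an access token via the password grant (same endpoint as the curl call above)
token_response = requests.post(
    'https://nomad-lab.eu/fairdi/keycloak/auth/realms/fairdi_nomad_prod/protocol/openid-connect/token',
    data={
        'grant_type': 'password', 'client_id': 'nomad_public',
        'username': '<your_username>', 'password': '<your password>'})
access_token = token_response.json()['access_token']

# use the token to download not yet published or embargoed data
download = requests.get(
    'http://nomad-lab.eu/prod/rae/api/raw/query',
    params={'upload_id': '<your_upload_id>'},
    headers={'Authorization': 'Bearer %s' % access_token})
with open('download.zip', 'wb') as f:
    f.write(download.content)
```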
### Conclusions
This was just a small glimpse into the NOMAD API. You should check out our
[swagger-ui](https://nomad-lab.eu/prod/rae/api/)
for more details on all the API endpoints and their parameters. You can explore the
API via the swagger-ui and even try it in your browser.
## NOMAD's Python client library
This library is developed as part of NOMAD. It is supposed to provide more powerful
......
API Reference
====================
This is just a brief summary of all API endpoints of the NOMAD API. For more complete documentation,
consult our *swagger* dashboards:
- NOMAD API: `swagger dashboard <https://nomad-lab.eu/prod/rae/api/>`_
- NOMAD's Optimade API: `swagger dashboard <https://nomad-lab.eu/prod/rae/optimade/>`_
Summary
-------
.. qrefflask:: nomad.app.flask:app
:undoc-static:
API(s) Details
--------------
.. autoflask:: nomad.app.flask:app
:undoc-static:
......@@ -51,8 +51,6 @@ extensions = [
'sphinx.ext.extlinks',
'sphinx_click.ext',
'sphinxcontrib.httpdomain',
'sphinxcontrib.autohttp.flask',
'sphinxcontrib.autohttp.flaskqref',
'celery.contrib.sphinx',
'm2r'
]
......
......@@ -304,22 +304,6 @@ Here are some example launch configs for VSCode:
"url": "http://localhost:3000",
"webRoot": "${workspaceFolder}/gui"
},
{
"name": "Python: API Flask (0.11.x or later)",
"type": "python",
"request": "launch",
"module": "flask",
"env": {
"FLASK_APP": "nomad/app/__init__.py"
},
"args": [
"run",
"--port",
"8000",
"--no-debugger",
"--no-reload"
]
},
{
"name": "Python: some test",
"type": "python",
......
......@@ -18,7 +18,6 @@ and infrastructure with a simplified architecture and consolidated code base.
normalizer.rst
oasis.rst
ops/ops.rst
api_reference.rst
python_reference.rst
.. # Introduction
......@@ -34,7 +33,6 @@ and infrastructure with a simplyfied architecture and consolidated code base.
.. # The different APIs
.. # curl
.. # requests
.. # bravado
.. # NOMAD's Python library
.. # Getting started
.. # Command line interface (CLI)
......@@ -64,5 +62,4 @@ and infrastructure with a simplyfied architecture and consolidated code base.
.. # How to write a normalizer
.. # Operating a NOMAD OASIS
.. # Operating NOMAD (with k8s)
.. # API Reference
.. # Python Reference
......@@ -88,15 +88,12 @@ provide functions for registering, password forget, editing user accounts, and s
sign on of fairdi@nomad and other related services.
### flask, et al.
The RESTful API is built with the [flask](http://flask.pocoo.org/docs/1.0/)
framework and its [ReST+](https://flask-restplus.readthedocs.io/en/stable/) extension. This
allows us to automatically derive a [swagger](https://swagger.io/) description of the NOMAD API,
which in turn allows us to generate programming-language-specific client libraries, e.g. we
use [bravado](https://github.com/Yelp/bravado) for Python and
[swagger-js](https://github.com/swagger-api/swagger-js) for JavaScript.
Furthermore, you can browse and use the API via [swagger-ui](https://swagger.io/tools/swagger-ui/).
### FastAPI
The RESTful API is built with the [FastAPI](https://fastapi.tiangolo.com/)
framework. This allows us to automatically derive an [OpenAPI](https://swagger.io/specification/) description
of the NOMAD API.
Furthermore, you can browse and use the API via the [OpenAPI dashboard](https://swagger.io/tools/swagger-ui/).
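As a small illustration of what the generated description enables, the spec itself can be retrieved programmatically. This is only a sketch; the `/api/v1/openapi.json` path is an assumption based on FastAPI's defaults and NOMAD's v1 API prefix:
```python
import requests

# fetch the generated OpenAPI description; the exact path is an assumption
# based on FastAPI's default /openapi.json route under the v1 API prefix
spec = requests.get('https://nomad-lab.eu/prod/rae/api/v1/openapi.json').json()
print(spec['info']['title'], len(spec['paths']), 'endpoints')
```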
### Elasticstack
......
"""
This is a brief example on how to use the public nomad@FAIRDI API.
"""
'''
This is a brief example on how to use the public API.
'''
import requests
from bravado.client import SwaggerClient
from nomad import config
nomad_url = 'http://nomad-lab.eu/prod/rae/api'
# create the bravado client
client = SwaggerClient.from_url('%s/swagger.json' % nomad_url)
nomad_url = config.client.url
# perform the search request to print number of public entries
data = client.repo.search(atoms=['Si', 'O']).response().result
response = requests.post(
    f'{nomad_url}/v1/entries',
    json={
        'query': {
            'results.material.elements:any': ['Si', 'O']
        }
    })
response_data = response.json()
# print the total amount of search results
print(data.pagination.total)
print(response_data['pagination']['total'])
# print the data of the first result
print(data.results[0])
print(response_data['data'][0])
"""
This is a brief example on how to authenticate with the public nomad@FAIRDI API.
"""
'''
This is a brief example on how to use requests with authentication to talk to the NOMAD API.
'''
from bravado.requests_client import RequestsClient, Authenticator
from bravado.client import SwaggerClient
from urllib.parse import urlparse
from keycloak import KeycloakOpenID
from time import time
import requests
nomad_url = 'http://nomad-lab.eu/prod/rae/api'
from nomad import config
from nomad.client import Auth
nomad_url = config.client.url
user = 'yourusername'
password = 'yourpassword'
# an authenticator for NOMAD's keycloak user management
class KeycloakAuthenticator(Authenticator):
    def __init__(self, user, password):
        super().__init__(host=urlparse(nomad_url).netloc.split(':')[0])
        self.user = user
        self.password = password
        self.token = None
        self.__oidc = KeycloakOpenID(
            server_url='https://nomad-lab.eu/fairdi/keycloak/auth/',
            realm_name='fairdi_nomad_prod',
            client_id='nomad_public')

    def apply(self, request):
        if self.token is None:
            self.token = self.__oidc.token(username=self.user, password=self.password)
            self.token['time'] = time()
        elif self.token['expires_in'] < int(time()) - self.token['time'] + 10:
            try:
                self.token = self.__oidc.refresh_token(self.token['refresh_token'])
                self.token['time'] = time()
            except Exception:
                self.token = self.__oidc.token(username=self.user, password=self.password)
                self.token['time'] = time()
        request.headers.setdefault('Authorization', 'Bearer %s' % self.token['access_token'])
        return request
# create the bravado client
http_client = RequestsClient()
http_client.authenticator = KeycloakAuthenticator(user=user, password=password)
client = SwaggerClient.from_url('%s/swagger.json' % nomad_url, http_client=http_client)
# create an auth object
auth = Auth(user=user, password=password)
# simple search request to print number of user entries
print(client.repo.search(owner='user').response().result.pagination.total)
response = requests.get(f'{nomad_url}/v1/entries', params=dict(owner='user'), auth=auth)
print(response.json()['data'])
from nomad import datamodel
print(datamodel.EntryMetadata(domain='DFT', calc_id='test').__class__.__name__)
print(datamodel.EntryMetadata(calc_id='test').__class__.__name__)
print(datamodel.EntryMetadata(domain='EMS', calc_id='test').__class__.__name__)
"""
This example shows how to read files from many sources (here .tar.gz files),
chunk the data into evenly sized uploads and upload/process them in parallel. The assumption
is that each source file is much smaller than the targeted upload size.
"""
from typing import Iterator, Iterable, Union, Tuple, Dict, Any
from bravado.requests_client import RequestsClient
from bravado.client import SwaggerClient
from urllib.parse import urlparse, urlencode
import requests
import re
import time
import os
import os.path
import tarfile
import io
import zipfile
import zipstream
import uuid
# config
nomad_url = 'http://labdev-nomad.esc.rzg.mpg.de/fairdi/nomad/mp/api'
user = 'leonard.hofstadter@nomad-fairdi.tests.de'
password = 'password'
approx_upload_size = 32 * 1024 * 1024 * 1024 # you can make it really small for testing
max_parallel_uploads = 9
direct_stream = False
# create the bravado client
host = urlparse(nomad_url).netloc.split(':')[0]
http_client = RequestsClient()
http_client.set_basic_auth(host, user, password)
client = SwaggerClient.from_url('%s/swagger.json' % nomad_url, http_client=http_client)
def source_generator() -> Iterable[Tuple[str, Union[str, None], Union[str, None]]]:
    """
    Yields all data sources. Yields tuples (path to .tgz, prefix, external_id). Prefix denotes
    a subdirectory to put the contents in. Use None for no prefix. The external_id is
    used when "publishing" the data to populate the external_id field.
    """
    yield os.path.join(os.path.dirname(__file__), 'example-1.tar.gz'), 'example_1', 'external_1'
    yield os.path.join(os.path.dirname(__file__), 'example-2.tar.gz'), 'example_2', 'external_2'
    yield os.path.join(os.path.dirname(__file__), 'example-3.tar.gz'), 'example_3', 'external_3'


def upload_next_data(sources: Iterator[Tuple[str, str, str]], upload_name='next upload'):
    """
    Reads data from the given sources iterator. Creates and uploads a .zip-stream of
    approx. size. Returns the upload and corresponding metadata, or raises StopIteration
    if the sources iterator was empty. Should be used repeatedly on the same iterator
    until it is empty.
    """
    # potentially raises StopIteration before being streamed
    first_source = next(sources)

    calc_metadata = []

    def iterator():
        """
        Yields dicts with keys arcname, iterable, as required for the zipstream
        library. Will read from generator until the zip-stream has the desired size.
        """
        size = 0
        first = True
        while(True):
            if first:
                source_file, prefix, external_id = first_source
                first = False
            else:
                try:
                    source_file, prefix, external_id = next(sources)
                except StopIteration:
                    break

            source_tar = tarfile.open(source_file)
            source = source_tar.fileobj
            bufsize = source_tar.copybufsize
            for source_member in source_tar.getmembers():
                if not source_member.isfile():
                    continue

                target = io.BytesIO()
                source.seek(source_member.offset_data)
                tarfile.copyfileobj(  # type: ignore
                    source, target, source_member.size, tarfile.ReadError, bufsize)

                size += source_member.size
                target.seek(0)

                def iter_content():
                    while True:
                        data = target.read(io.DEFAULT_BUFFER_SIZE)