Commit 0a94e6b0 authored by Markus Scheidgen

Added API tutorial.

parent c835d98c
# API Tutorial
This tutorial assumes that you want to
- upload some data
- publish the data
- find it
- download it again
## Prerequisites
### Python
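The tutorial was tested with Python 3. It might also work with Python 2, although the `urllib.parse` import used below exists only in Python 3 (in Python 2 the function lives in the `urlparse` module).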
### Python packages
We do not assume many specific Python packages. Only the *bravado* package (available
via PyPI) is required. It allows us to use the nomad ReST API in a more friendly and
pythonic way. You can install it the usual way:
```
pip install bravado
```
For the following code snippets, we need the following imports:
```python
from bravado.requests_client import RequestsClient
from bravado.client import SwaggerClient
from bravado.exception import HTTPNotFound
from urllib.parse import urlparse
import time
import os.path
import sys
```
### An example file
Let's assume you have an example upload file ready. It's a `.zip` (`.tgz` would also work)
with some *VASP* data from a single run at `/example/AcAg/vasprun.xml`, `/example/AcAg/OUTCAR`, ...
Let's keep the filename in a variable:
```python
upload_file = 'example.zip'
```
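If you do not have such an archive yet, the following is a minimal sketch of how one could be built with the standard library. The helper name `make_upload_zip` is hypothetical, and it assumes a local directory `example/` that contains the `AcAg/` run directory from above:

```python
import os
import zipfile


def make_upload_zip(source_dir, zip_path):
    """Pack every file below source_dir into zip_path, keeping relative paths."""
    parent = os.path.dirname(os.path.abspath(source_dir))
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # store paths relative to the parent, e.g. example/AcAg/OUTCAR
                archive.write(full_path, os.path.relpath(full_path, parent))
    return zip_path


# usage (assumes ./example exists):
# upload_file = make_upload_zip('example', 'example.zip')
```

Keep the archive outside of the directory you are packing, otherwise the half-written archive would be picked up as one of its own members.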
### Nomad
We need to know the nomad installation to use and its respective API URL. To upload
data you also need an account (email, password):
```python
nomad_url = 'http://enc-staging-nomad.esc.rzg.mpg.de/fairdi/nomad/v0.3.0/api'
user = 'leonard.hofstadter@nomad-fairdi.tests.de'
password = 'password'
```
### Using bravado
Bravado reads a ReST API's definition from a `swagger.json`, as it is provided by
many APIs, including nomad's of course. Bravado also allows you to use authentication,
which makes it even easier. The following would be a typical setup:
```python
host = urlparse(nomad_url).netloc.split(':')[0]
http_client = RequestsClient()
http_client.set_basic_auth(host, user, password)
client = SwaggerClient.from_url('%s/swagger.json' % nomad_url, http_client=http_client)
```
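To see what the `host` expression in the setup evaluates to, here is a small standalone sketch. The helper name `auth_host` and the `localhost:8000` URL are made up for illustration; only the staging URL comes from this tutorial:

```python
from urllib.parse import urlparse


def auth_host(api_url):
    """Return the host part of api_url, without scheme, path, or port."""
    return urlparse(api_url).netloc.split(':')[0]


print(auth_host('http://enc-staging-nomad.esc.rzg.mpg.de/fairdi/nomad/v0.3.0/api'))
# prints: enc-staging-nomad.esc.rzg.mpg.de
print(auth_host('http://localhost:8000/api'))
# prints: localhost
```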
## Uploading data
Now we can look at actually using the nomad API. The API is divided into several
modules: *uploads*, *repo*, *archive*, *raw*, etc. Each provides functionality for
a certain aspect of nomad.
The *uploads* endpoints can be used to, you guessed it, upload your data. But they
also allow you to get progress on the upload processing; inspect, delete, and publish uploads;
and get details about the uploaded data, which code input/output files were found, etc.
### Uploading a file
It's simple, since bravado supports uploading files:
```python
with open(upload_file, 'rb') as f:
    upload = client.uploads.upload(file=f).response().result
```
If you already have your file on the nomad servers, e.g. under `/nomad/my_files/example.zip`,
you can skip the actual upload and say:
```python
upload = client.uploads.upload(local_path='/nomad/my_files/example.zip').response().result
```
### Supervising the processing
Once uploaded, nomad will extract the file, identify code data, and parse and normalize the
data. We call this *processing*, and *processing* consists of *tasks* (uploading, extracting, parsing).
You can continuously poll the API to get an update on the processing and check if all
tasks have completed.
```python
while upload.tasks_running:
    upload = client.uploads.get_upload(upload_id=upload.upload_id).response().result
    time.sleep(5)
    print('processed: %d, failures: %d' % (upload.processed_calcs, upload.failed_calcs))
```
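The loop above runs for as long as the server reports running tasks. If you would rather give up after some time, a generic polling helper with a timeout could look like the following sketch. The name `wait_while` and its parameters are hypothetical and not part of nomad or bravado:

```python
import time


def wait_while(fetch, is_running, interval=5.0, timeout=600.0):
    """Call fetch() until is_running(state) is false; raise after timeout seconds."""
    deadline = time.monotonic() + timeout
    state = fetch()
    while is_running(state):
        if time.monotonic() >= deadline:
            raise TimeoutError('still running after %.0f seconds' % timeout)
        time.sleep(interval)
        state = fetch()
    return state


# usage with the client from this tutorial:
# upload_id = upload.upload_id
# upload = wait_while(
#     lambda: client.uploads.get_upload(upload_id=upload_id).response().result,
#     lambda u: u.tasks_running)
```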
Once there are no more tasks running, you can check if your upload was a success. If it
was not successful, you can also delete the upload again:
```python
if upload.tasks_status != 'SUCCESS':
    print('something went wrong')
    print('errors: %s' % str(upload.errors))
    # delete the unsuccessful upload
    client.uploads.delete_upload(upload_id=upload.upload_id).response().result
    sys.exit(1)
```
Of course, you can also visit the nomad GUI
([http://enc-staging-nomad.esc.rzg.mpg.de/fairdi/nomad/v0.3.0/upload](http://enc-staging-nomad.esc.rzg.mpg.de/fairdi/nomad/v0.3.0/upload))
to inspect your uploads. (You might need to reload the page if you already had it open.)
### Publishing your upload
The uploaded data is only visible to you. We call this *staging*. After the processing
was successful and you are satisfied with our processing, you have to publish the upload.
This also allows you to add additional metadata to your upload (e.g. comments, references, coauthors, etc.).
Here you also determine whether you want an *embargo* on your data.
Once the data has been published, you cannot delete it anymore. You can skip this step, but
then the rest of the tutorial will only work for you, because the data is only visible to you.
To initiate the publish and provide further data:
```python
client.uploads.exec_upload_operation(upload_id=upload.upload_id, payload={
    'operation': 'publish',
    'metadata': {
        'comment': 'Data from a cool external project',
        'references': ['http://external.project.eu'],
        # 'coauthors': ['sheldon.cooper@ucla.edu'], this does not yet work with emails
        # 'external_id': 'external_id' this does also not work, but we could implement something like this
    }
})
```
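If you publish several uploads, it can help to build the payload in one place. The following is only a sketch: the helper `publish_payload` is hypothetical, it covers just the `comment` and `references` keys shown above, and whether the API accepts an empty `metadata` object is an assumption that has not been checked here:

```python
def publish_payload(comment=None, references=None):
    """Build the payload for the 'publish' operation with optional metadata."""
    metadata = {}
    if comment:
        metadata['comment'] = comment
    if references:
        metadata['references'] = list(references)
    return {'operation': 'publish', 'metadata': metadata}


payload = publish_payload(
    comment='Data from a cool external project',
    references=['http://external.project.eu'])
# client.uploads.exec_upload_operation(upload_id=upload.upload_id, payload=payload)
```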
Publishing might also take a while. You can monitor it analogously to the upload processing:
```python
while upload.process_running:
    try:
        upload = client.uploads.get_upload(upload_id=upload.upload_id).response().result
        time.sleep(1)
    except HTTPNotFound:
        # upload gets deleted from the upload staging area once published
        break
```
This time we needed some exception handling, since the upload will be removed from the
staging area, and you will get a 404 on the `uploads` endpoint.
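The same pattern, treating "not found" as a signal that the work is finished, can be written as a small reusable helper. This is a sketch with hypothetical names; the exception class is passed in, so with bravado you would hand over `HTTPNotFound`:

```python
import time


def wait_for_publish(fetch, is_running, not_found, interval=1.0):
    """Poll fetch() until the process stops or the resource disappears.

    Returns the last state, or None if fetch() raised not_found.
    """
    while True:
        try:
            state = fetch()
        except not_found:
            # the resource is gone, which this tutorial treats as "published"
            return None
        if not is_running(state):
            return state
        time.sleep(interval)


# usage with the client from this tutorial:
# upload_id = upload.upload_id
# wait_for_publish(
#     lambda: client.uploads.get_upload(upload_id=upload_id).response().result,
#     lambda u: u.process_running,
#     HTTPNotFound)
```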
### Searching for data
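A minimal sketch of the search step, based on the example script changed in the same commit. It assumes the result object exposes `pagination.total` and `results` as that script uses them; the helper name `pick_single_calc` is hypothetical:

```python
def pick_single_calc(result):
    """Return the first entry of a search result, failing if nothing matched."""
    total = result.pagination.total
    if total == 0:
        raise LookupError('no calculation matched the search')
    if total > 1:
        print('warning: %d calculations matched, using the first one' % total)
    # `results` holds the entries of the current page
    return result.results[0]


# usage with the client from this tutorial:
# result = client.repo.search(paths='external_id').response().result
# calc = pick_single_calc(result)
```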
### Downloading data
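A minimal sketch of how the download URLs could be assembled, following the URL patterns in the example script changed in the same commit. The helper names and the sample values in the usage lines are made up for illustration:

```python
import os.path


def raw_file_url(api_url, upload_id, mainfile):
    """URL of a single raw file, e.g. the mainfile of a calculation."""
    return '%s/raw/%s/%s' % (api_url, upload_id, mainfile)


def raw_dir_url(api_url, upload_id, mainfile):
    """URL for all files that sit in the same directory as the mainfile."""
    return '%s/raw/%s/%s/*' % (api_url, upload_id, os.path.dirname(mainfile))


print(raw_file_url('http://host/api', 'some_upload_id', 'example/AcAg/vasprun.xml'))
# prints: http://host/api/raw/some_upload_id/example/AcAg/vasprun.xml
print(raw_dir_url('http://host/api', 'some_upload_id', 'example/AcAg/vasprun.xml'))
# prints: http://host/api/raw/some_upload_id/example/AcAg/*
```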
@@ -3,9 +3,6 @@ This is a brief example demonstrating the public nomad@FAIRDI API for doing oper
 that might be necessary to integrate external project data.
 """
-# This does not assume many specific python packages. Only the bravado
-# library that allows to use swagger-based ReST APIs is required.
-# It can be install via `pip install bravado`
 from bravado.requests_client import RequestsClient
 from bravado.client import SwaggerClient
 from bravado.exception import HTTPNotFound
@@ -14,13 +11,11 @@ import time
 import os.path
 import sys
-nomad_url = 'http://enc-staging-nomad.esc.rzg.mpg/fairdi/nomad/v0.3.0/api'
+nomad_url = 'http://enc-staging-nomad.esc.rzg.mpg.de/fairdi/nomad/v0.3.0/api'
 user = 'leonard.hofstadter@nomad-fairdi.tests.de'
 password = 'password'
-# lets assume we have a test file from our external project
-# with (among others) `/external_id/BrSiTi/vasp.xml.gz`
-upload_file = 'externa_project_example.tgz'
+upload_file = 'external_project_example.zip'
 # create the bravado client
 host = urlparse(nomad_url).netloc.split(':')[0]
@@ -29,10 +24,9 @@ http_client.set_basic_auth(host, user, password)
 client = SwaggerClient.from_url('%s/swagger.json' % nomad_url, http_client=http_client)
 # upload data
-# create the upload by uploaded a .zip file
-upload = client.uploads.upload(file=upload_file).response().result
-# constantly polling the upload to get updates on the processing
-while upload.processing_running:
+with open(upload_file, 'rb') as f:
+    upload = client.uploads.upload(file=f).response().result
+while upload.tasks_running:
     upload = client.uploads.get_upload(upload_id=upload.upload_id).response().result
     time.sleep(5)
     print('processed: %d, failures: %d' % (upload.processed_calcs, upload.failed_calcs))
@@ -42,26 +36,20 @@ if upload.tasks_status != 'SUCCESS':
     print('something went wrong')
     print('errors: %s' % str(upload.errors))
     # delete the unsuccessful upload
-    client.uploads.delete_upload(upload_id=upload.upload_id)
+    client.uploads.delete_upload(upload_id=upload.upload_id).response().result
     sys.exit(1)
 # publish data
-# In the upload staging area the data is only visible to you. It has to be published
-# to get into the public nomad.
-# Therefore, the later search and download steps will work without publishing, but only if the
-# client is authenticated with your user account. You should do that for testing stuff out,
-# because there is no user-based deleting of published data.
-# The publish step also allows you to provide additional metadata, see below.
 client.uploads.exec_upload_operation(upload_id=upload.upload_id, payload={
     'operation': 'publish',
     'metadata': {
-        'comment': 'Data from materials project',
-        'references': ['http://materials-project.gov'],
-        # 'coauthors': ['person@lbl.gov', '...'], this does not yet work with emails
-        # 'external_id': 'a/mp/id' this does also not work, but we could implement something like this
+        'comment': 'Data from a cool external project',
+        'references': ['http://external.project.eu'],
+        # 'coauthors': ['sheldon.cooper@ucla.edu'], this does not yet work with emails
+        # 'external_id': 'external_id' this does also not work, but we could implement something like this
     }
 })
-while upload.processing_running:
+while upload.process_running:
     try:
         upload = client.uploads.get_upload(upload_id=upload.upload_id).response().result
         time.sleep(1)
@@ -70,28 +58,20 @@ while upload.processing_running:
         break
 # search for data
-# The `paths` searchkey is a text search (works like google), where tokens are separated
-# by '/'. You can basically search for (multiple) parts within a file path. Note
-# that the following will also match `/prefix/tag1/something/else/tag2/vasp.xml`. It will
-# not match `/tag2/tag3/vaps.xml` and also not `/tag1/tags2.xml`.
-result = client.repo.get_calcs(paths='tag1/tag2').response().result
-# The results are paginated. The pagination key holds an object with total, page, per_page
-# kind of information
+result = client.repo.search(paths='external_id').response().result
 if result.pagination.total == 0:
     print('not found')
     sys.exit(1)
 elif result.pagination.total > 1:
-    print('my ids are not specific enough, bummer ...')
-    sys.exit(1)
-else:
-    # The results key holds an array with the current page data
-    calc = result.results[0]
+    print('my ids are not specific enough, bummer ... or did I uploaded stuff multiple times?')
+# The results key holds an array with the current page data
+calc = result.results[0]
 # download data
 # via api
-client.raw.get(upload_id=calc.upload_id, path=calc.mainfile).response()
+client.raw.get(upload_id=calc['upload_id'], path=calc['mainfile']).response()
 # via download
 # just the 'mainfile'
-url = '%s/raw/%s/%s' % (nomad_url, calc.upload_id, calc.mainfile)
+url = '%s/raw/%s/%s' % (nomad_url, calc['upload_id'], calc['mainfile'])
 # all files
-url = '%s/raw/%s/%s/*' % (nomad_url, calc.upload_id, os.path.dirname(calc.mainfile))
+url = '%s/raw/%s/%s/*' % (nomad_url, calc['upload_id'], os.path.dirname(calc['mainfile']))