Commit 90002052 authored by Markus Scheidgen's avatar Markus Scheidgen
Browse files

Added more information about uploading data to the docs.

parent 32765da2
...@@ -8,7 +8,7 @@ and infrastructure with a simplyfied architecture and consolidated code base. ...@@ -8,7 +8,7 @@ and infrastructure with a simplyfied architecture and consolidated code base.
:maxdepth: 1 :maxdepth: 1
introduction.md introduction.md
upload.rst upload.md
api.md api.md
client/client.rst client/client.rst
metainfo.rst metainfo.rst
......
================== # How to upload data
How to upload data
==================
To contribute your data to the repository, please, login to our `upload page <../gui/uploads>`_ To contribute your data to the repository, please, login to our [upload page](https://nomad-lab.eu/prod/rae/gui/uploads)
(you need to register first, if you do not have a NOMAD account yet). (you need to register first, if you do not have a NOMAD account yet).
*A note for returning NOMAD users!* We revised the upload process with browser based upload *A note for returning NOMAD users!* We revised the upload process with browser based upload
alongside new shell commands. The new Upload page allows you to monitor upload processing alongside new shell commands. The new Upload page allows you to monitor upload processing
and verify processing results before publishing your data to the Repository. and verify processing results before publishing your data to the Repository.
The `upload page <../gui/uploads>`_ acts as a staging area for your data. It allows you to The [upload page](https://nomad-lab.eu/prod/rae/gui/uploads) acts as a staging area for your data. It allows you to
upload data, to supervise the processing of your data, and to examine all metadata that upload data, to supervise the processing of your data, and to examine all metadata that
NOMAD extracts from your uploads. The data on the upload page will be private and can be NOMAD extracts from your uploads. The data on the upload page will be private and can be
deleted again. If you are satisfied with our processing, you can publish the data. deleted again. If you are satisfied with our processing, you can publish the data.
...@@ -22,14 +20,6 @@ You should upload many files at the same time by creating .zip or .tar files of ...@@ -22,14 +20,6 @@ You should upload many files at the same time by creating .zip or .tar files of
Ideally, input and output files are accompanied by relevant auxiliary files. NOMAD will Ideally, input and output files are accompanied by relevant auxiliary files. NOMAD will
consider everything within a single directory as related. consider everything within a single directory as related.
**A note for VASP users** on the handling of **POTCAR** files: NOMAD takes care of it; you don't
need to worry about it. We understand that according to your VASP license, POTCAR files are
not supposed to be visible to the public. Thus, in agreement with Georg Kresse, NOMAD will
extract the most important information of POTCAR files and store it in the files named
``POTCAR.stripped``. These files can be assessed and downloaded by anyone, while the original
POTCAR files are only available to the uploader and assigned co-authors.
This is done automatically; you don't need to do anything.
Once published, data cannot be erased. Linking a corrected version to a corresponding older Once published, data cannot be erased. Linking a corrected version to a corresponding older
one ("erratum") will be possible soon. Files from an improved calculation, even for the one ("erratum") will be possible soon. Files from an improved calculation, even for the
same material, will be handled as a new entry. same material, will be handled as a new entry.
...@@ -50,4 +40,127 @@ By uploading you confirm authorship of the uploaded calculations. Co-authors mus ...@@ -50,4 +40,127 @@ By uploading you confirm authorship of the uploaded calculations. Co-authors mus
after the upload process. This procedure is very much analogous to the submission of a after the upload process. This procedure is very much analogous to the submission of a
publication to a scientific journal. publication to a scientific journal.
Upload of data is free of charge. Upload of data is free of charge.
\ No newline at end of file
## Limits
The following limitations apply to uploading:
- One upload cannot exceed 32 GB in size
- Only 10 non published uploads are allowed per user
## On the supported codes
NOMAD is interpreting your files. It will check each file and recognize if it is the
main output file of one of the supported codes. NOMAD will create a entry for this *mainfile*
that represents the respective data of this code run, experiment, etc. NOMAD only
shows that for such recognized entries. If you uploads do not contain any files that
NOMAD recognizes, you upload will be shown as empty and no data can be published.
However, all files that are associated to a recognized *mainfile* by residing in the
same directory, will be presented as *auxiliary* files along side the entry represented
by the *mainfile*.
### A note for VASP users
On the handling of **POTCAR** files: NOMAD takes care of it; you don't
need to worry about it. We understand that according to your VASP license, POTCAR files are
not supposed to be visible to the public. Thus, in agreement with Georg Kresse, NOMAD will
extract the most important information of POTCAR files and store it in the files named
`POTCAR.stripped`. These files can be assessed and downloaded by anyone, while the original
POTCAR files are only available to the uploader and assigned co-authors.
This is done automatically; you don't need to do anything.
## Preparing an upload file
You can upload .zip and .tar.gz files to NOMAD. The directory structure within can
be arbitrary. Keep in mind that files in a single directory are all associated (see above).
Ideally you only keep the files of a single (or closely related) code runs, experiments, etc.
in one directory.
You should not place files in additional archives within the upload file. NOMAD will not
extract any zips in zips and similar entrapments.
## Uploading large amounts of data
This problem is many fold. In the remainder the following topics are discussed.
- NOMAD restrictions about upload size and number of unpublished simultaneous uploads
- Managing metadata (comments, references, co-authors, datasets) for a large number of entries
- Safely transferring the data to NOMAD
### General strategy
Before you attempt to upload large amounts of data, do some experiments with a representative
and small subset of your data. Use this to simulate a larger upload,
checking and editing it the normal way. You do not have to publish this test upload;
simply delete it before publish, once you are satisfied with the results.
Ask for assistance. [Contact us](https://nomad-lab.eu/about/contact) in advance. This will
allow us to react to your specific situation and eventually prepare additional measures.
Keep enough time before you need your data to be published. Adding multiple hundreds of
GBs to NOMAD isn't a trivial feat and will take some time and effort from all sides.
### Upload restrictions
The upload restrictions are necessary to keep NOMAD data in manageable chunks and we cannot
simply grant exceptions to these rules.
This means you have to split your data into 32 GB uploads. Uploading these files, observing
the processing, and publishing the data can be automatized through NOMAD APIs.
When splitting your data, it is important to not split sub-directories if they represent
all files of a single entry. NOMAD can only bundle those related files to an entry if
they are part of the same upload (and directory). Therefore, there is no single recipe to
follow and a script to split your data will depend on how your data is organized.
### Avoid additional operations on your data
Changing the metadata of a large amounts of entries can be expensive and will also mean
more work with our APIs. A simpler solution is to add the metadata directly to your uploads.
This way NOMAD can pick it up automatically, no further actions required.
Each NOMAD upload can contain a `nomad.json` file at the root. This file can contain
metadata that you want to apply to all your entries. Here is an example:
```
{
"comment": "Data from a cool research project",
"references": ['http://archivex.org/mypaper'],
"co_authors": [
'<co-author-ids>',
'<co-author-ids>'
]
"datasets": [
'<dataset-id>'
],
"entries": {
"path/to/calcs/vasp.xml": {
"commit": "An entry specific comment."
}
}
}
```
Another measure is to directly publish your data upon upload. After performing some
smaller test upload, you should consider to skip our staging and publish the upload
right away. This can save you some time and additional API calls. The upload endpoint
has a parameter `publish_directly`. You can modify the upload command
that you get from the upload page like this:
```
curl "http://nomad-lab.eu/prod/rae/api/uploads/?token=<your-token>&publish_directly=true" -T <local_file>
```
### Save transfer of files
HTTP makes it easy for you to upload files via browser and curl, but it is not an
ideal protocol for the stable transfer of large and many files. Alternatively, we can organize
a separate manual file transfer to our servers. We will put your prepared upload
files (.zip or .tag.gz) on a predefined path on the NOMAD servers. NOMAD allows to *"upload"*
files directly from its servers via an additional `local_path` parameter:
```
curl -X PUT "http://nomad-lab.eu/prod/rae/api/uploads/?token=<your-token>&local_path=<path-to-upload-file>"
```
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment