diff --git a/README.parsers.md b/README.parsers.md
index 32acc2b79c3ecdecc91b870d5632b1f52a14e3ee..eef8f7223f7e5747695d5825f1898094c537d9d2 100644
--- a/README.parsers.md
+++ b/README.parsers.md
@@ -7,27 +7,26 @@ output files and provide all information in NOMAD's unified Metainfo based Archi
 
 ## Preparing code input and output file for uploading to NOMAD
 
-NOMAD accepts `.zip` and `.tar.gz` archives as uploads. Each upload can contain arbitrary
-files and directories. NOMAD will automatically try to choose the right parser for you files.
-For each parser (i.e. for each supported code) there is one type of file that the respective
-parser can recognize. We call these files `mainfiles` as they typically are the main
-output file a code. For each `mainfile` that NOMAD discovers it will create an entry
-in the database that users can search, view, and download. NOMAD will associate all files
-in the same directory as files that also belong to that entry. Parsers
+An *upload* is basically a directory structure with files. If you have all the files locally
+you can just upload everything as a `.zip` or `.tar.gz` file in a single step. While the upload is
+in the *staging area* (i.e. before it is published) you can also easily add or remove files in the
+directory tree via the web interface. NOMAD will automatically try to choose the right parser
+for you files.
+
+For each parser there is one type of file that the respective parser can recognize. We call
+these files *mainfiles*. For each mainfile that NOMAD discovers it will create an *entry*
+in the database, which users can search, view, and download. NOMAD will consider all files
+in the same directory as *auxiliary files* that also are associated with that entry. Parsers
 might also read information from these auxillary files. This way you can add more files
-to an entry, even if the respective parser/code might not directly support it.
+to an entry, even if the respective parser/code might not use them. However, we strongly
+recommend to not have multiple mainfiles in the same directory. For CMS calculations, we
+recommend having a separate directory for each code run.
 
-For $codeLabel$ please provide at least the files from this table if applicable to your
-calculations (remember that you can provide more files if you want):
+For $codeLabel$ please provide at least the files from this table, if applicable
+(remember that you always can provide additional files if you want):
 
 $tableOfFiles$
 
-To create an upload with all calculations in a directory structure:
-
-```
-zip -r <upload-file>.zip <directory>/*
-```
-
 Go to the [NOMAD upload page](https://nomad-lab.eu/prod/rae/gui/uploads) to upload files
 or find instructions about how to upload files from the command line.
 
diff --git a/docs/developers.md b/docs/developers.md
index 58a5ea9c22e9788529aab6b5825e71edf53cb2fd..b396a0d8cb2643ab5147d79d1432e6fba97a09f1 100644
--- a/docs/developers.md
+++ b/docs/developers.md
@@ -380,30 +380,36 @@ Btw., the latter is almost never necessary in python.
 
 Terms:
 
-- upload: A logical unit that comprises one (.zip) file uploaded by a user.
-- calculation: A computation in the sense that is was created by an individual run of a CMS code.
-- raw file: User uploaded files (e.g. part of the uploaded .zip), usually code input or output.
-- upload file/uploaded file: The actual (.zip) file a user uploaded
-- mainfile: The mainfile output file of a CMS code run.
+- upload: A logical unit that comprises a collection of files uploaded by a user, organized
+  in a directory structure.
+- entry: An archive item, created by parsing a *mainfile*. Each entry belongs to an upload and
+  is associated with various metadata (an upload may have many entries).
+- calculation: denotes the results of a theoretical computation, created by CMS code.
+  Note that entries do not have to be based on calculations; they can also be based on
+  experimental results.
+- raw file: A user uploaded file, located somewhere in the upload's directory structure.
+- mainfile: A raw file identified as parseable, defining an entry of the upload in question.
 - aux file: Additional files the user uploaded within an upload.
-- repo entry: Some quantities of a calculation that are used to represent that calculation in the repository.
-- archive data: The normalized data of one calculation in nomad's meta-info-based format.
+- entry metadata: Some quantities of an entry that are searchable in NOMAD.
+- archive data: The normalized data of an entry in nomad's meta-info-based format.
 
 Throughout nomad, we use different ids. If something
 is called *id*, it is usually a random uuid and has no semantic connection to the entity
-it identifies. If something is called a *hash* than it is a hash build based on the
+it identifies. If something is called a *hash* then it is a hash generated based on the
 entity it identifies. This means either the whole thing or just some properties of
 said entities.
 
 - The most common hashes is the `calc_hash` based on mainfile and auxfile contents.
-- The `upload_id` is a UUID assigned at upload time and never changed afterwards.
-- The `mainfile` is a path within an upload that points to a main code output file.
-  Since, the upload directory structure does not change, this uniquely ids a calc within the upload.
-- The `calc_id` (internal calculation id) is a hash over the `mainfile` and respective
-  `upload_id`. Therefore, each `calc_id` ids a calc on its own.
-- We often use pairs of `upload_id/calc_id`, which in many context allow to resolve a calc
+- The `upload_id` is a UUID assigned to the upload on creation. It never changes.
+- The `mainfile` is a path within an upload that points to a file identified as parseable.
+  This also uniquely identifies an entry within the upload.
+- The `entry_id` (`calc_id`) uniquely identifies an entry. It is a hash over the `mainfile`
+  and respective `upload_id`. **Note**: we want to switch to use `entry_id` and "entry-terminology"
+  instead of `calc_id` and "calc-terminology". Thus **always use the former if possible**.
+- We often use pairs of `upload_id/entry_id`, which in many contexts allow to resolve an entry
   related file on the filesystem without having to ask a database about it.
-- The `pid` or (`coe_calc_id`) is an sequential interger id.
+- The `pid` or (`coe_calc_id`) is a legacy sequential interger id, previously used to identify
+  entries. We still store the `pid` on these older entries for historical purposes.
 - Calculation `handle` or `handle_id` are created based on those `pid`.
 
 To create hashes we use :py:func:`nomad.utils.hash`.
diff --git a/docs/index.md b/docs/index.md
index 99980e6b705df379f7d939181b6e62f7ee7c0f8b..5c0cb6764cadd805fb92e254605e19e4719b30de 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -179,7 +179,7 @@ Elasticsearch allows for flexible scalable search and analytics.
 
 #### mongodb
 [Mongodb](https://docs.mongodb.com/) is used to store and track the state of the
-processing of uploaded files and therein contained calculations. We use
+processing of uploaded files and the generated entries. We use
 [mongoengine](http://docs.mongoengine.org/) to program with mongodb.
 
 
diff --git a/docs/web.md b/docs/web.md
index f9419b6ac5695ae54b5f23851c72407b9e3bc5b1..670ccf5cb0c8ebedc32700aecfc49075521a6fe6 100644
--- a/docs/web.md
+++ b/docs/web.md
@@ -68,7 +68,7 @@ User metadata can also be provided in a file that is uploaded. This can be a `.j
         "<dataset-name>"
     ],
     "entries": {
-        "path/to/calcs/vasp.xml": {
+        "path/to/entry_dir/vasp.xml": {
             "commit": "An entry specific comment."
         }
     }
diff --git a/gui/src/components/About.js b/gui/src/components/About.js
index 3e75ff207beeda0cbfb02341518029fc9e1ae14f..a42bfea1ea8d6a8fbe5bbb3ac1381dec3684b83f 100644
--- a/gui/src/components/About.js
+++ b/gui/src/components/About.js
@@ -36,36 +36,31 @@ function CodeInfo({code, ...props}) {
 
   let introduction = `
     For [${metadata.codeLabel || code}](${metadata.codeUrl}) please provide
-    all files that were used as input, were output by the code, or were produced you.
+    all input and output files and any other relevant files you may have produced.
   `
   if (metadata.tableOfFiles && metadata.tableOfFiles !== '') {
     introduction = `
       For [${metadata.codeLabel || code}](${metadata.codeUrl}) please provide at least
-      the files from the following table (if applicable). Ideally, you upload
-      all files that were used as input, were output by the code, or were produced you.
-    `
+      the files from the following table (if applicable). We recommend to upload
+      all input and output files and any other relevant files you may have produced.
+      `
   }
 
   return <Dialog open={true} {...props}>
     <DialogTitle>{metadata.codeLabel || code}</DialogTitle>
     <DialogContent>
       <Markdown>{`
-        ${introduction} NOMAD will present all files in the same directory for each
-        recognized calculation. This works best, if you put all files that belong to
-        individual code runs into individual directories or only combine files from a few
-        runs in the same directory.
+        ${introduction} All files located in the same directory as a *mainfile* (i.e. a parseable
+        file which defines an entry) are considered to be associated with the entry.
+        You should therefore put all files related to the same entry in the same directory.
+        However, try to avoid putting multiple *mainfiles* in the same directory, to avoid
+        confusion. For CMS calculations, we recommend a separate directory for each code run.
 
         ${metadata.tableOfFiles}
 
         ${(metadata.parserSpecific && metadata.parserSpecific !== '' && `Please note specifically for ${metadata.codeLabel || code}: ${metadata.parserSpecific}`) || ''}
 
-        To create an upload with all calculations in a directory structure:
-
-        \`\`\`
-        zip -r <upload-file>.zip <directory>/*
-        \`\`\`
-
         You can find further information on [the project page for NOMAD's ${metadata.codeLabel || code} parser](${metadata.parserGitUrl}).
       `}</Markdown>
     </DialogContent>
@@ -278,7 +273,7 @@ export default function About() {
       </Grid>
       <InfoCard xs={4} title="Uploading is simple" bottom>
         <p>
-          You provide your own data <i>as is</i>. Just zip your code input and out files as they are,
+          You provide your own data <i>as is</i>. Just zip your files as they are,
           including nested directory structures and potential auxiliary files, and upload up to 32GB
           in a single .zip or .tar(.gz) file. NOMAD will automatically discover and
           process the relevant files.
@@ -289,9 +284,9 @@ export default function About() {
           selected users.
         </p>
         <p>
-          Add additional metadata like <b>comments</b>, <b>references</b> to websites or papers, and your
-          <b>co-authors</b>. Curate your uploaded code runs into larger <b>datasets</b> and cite your data with a <b>DOI</b>
-          that we provide on request.
+          Add additional metadata like <b>comments</b>, <b>references</b> to websites or papers, and
+          your <b>co-authors</b>. Organize the uploaded entries into <b>datasets</b> and
+          cite your data with a <b>DOI</b> that we provide on request.
         </p>
         <p>
           You can provide via GUI or shell command <Link component={RouterLink} to={'/uploads'}>here</Link>.
@@ -302,11 +297,11 @@ export default function About() {
         <p>
           Uploaded data is automatically processed and made available
           in the uploaded <b>raw files</b> or in its processed and unified <b>Archive</b> form.
-          NOMAD parsers convert raw code input and output files into NOMAD's common data format.
+          NOMAD parsers convert raw files into NOMAD's common data format.
           You can inspect the Archive form and extracted metadata before
           publishing your data.
         </p>
-        <p>NOMAD supports most community codes: <CodeList/></p>
+        <p>NOMAD supports most community codes and file formats: <CodeList/></p>
         <p>
           To use NOMAD's parsers and normalizers outside of NOMAD.
           Read <Link href="">here</Link> on how to install
diff --git a/nomad/metainfo/README.md b/nomad/metainfo/README.md
index 7bf3228814b3dd2bf3dd231b2cf3cf4ea6b59fd5..0cd592bcb5b608a10393645ac466a255740eb63f 100644
--- a/nomad/metainfo/README.md
+++ b/nomad/metainfo/README.md
@@ -473,7 +473,7 @@ nomad archive:
 nomad.archive(upload_id='hh41jh4l1e91821').run.system.atom_labels.values
 ```
 
-This can also be used to extend queries for calculations with queries for certain
+This can also be used to extend queries for entries with queries for certain
 data points:
 ```python
 nomad.archive(user='me@email.org', upload_name='last_upload').run.system(