Commit a30ae5bb authored by Amir Golparvar's avatar Amir Golparvar

Resolve "Revising Tabular parser features"

Tabular parser is updated with several new features and bug fixes: new entries can be created directly from the schema by providing mapping_options under the tabular_parser annotation; each mapping_options section now creates a new archive file; the entry name can be further customized; comprehensive documentation was added under How-to and References; the old TableDataParser is deprecated.

Changelog: Added
parent 691119ae
1 merge request: !1436 Resolve "Some Tabular parser features to revise or add"
Showing 588 additions and 266 deletions
@@ -2,7 +2,7 @@

This guide is about using NOMAD's REST APIs directly, e.g. via Python's *requests* library.
To access the processed data with our client library `nomad-lab` follow
[How to access the processed data](archive_query.md). You can watch our
[video tutorial on the API](../tutorial/access_api.md#access-data-via-api).

## Different options to use the API

...
@@ -24,12 +24,19 @@ NOMAD is useful for scientists that work with data, for research groups that nee

### Tutorial

A series of tutorials will guide you through the main functionality of NOMAD.
It covers the whole publish, explore, analyze cycle:

- [Upload and publish your own data](tutorial/upload_publish.md)
- [Use the search interface to identify interesting data](tutorial/explore.md)
- [Use the API to search and access processed data for analysis](tutorial/access_api.md)
- [Find and use the automations of the built-in schemas available in NOMAD](tutorial/builtin.md)
- [Create and use custom schemas in NOMAD](tutorial/custom.md)
- [Customization at its best: user-defined schema and automation](tutorial/plugins.md)
- [Third-party ELN integration](tutorial/third_party.md)
- [Example data and exercises](https://www.fairmat-nfdi.eu/events/fairmat-tutorial-1/tutorial-1-materials)
- [More videos and tutorials on YouTube](https://youtube.com/playlist?list=PLrRaxjvn6FDW-_DzZ4OShfMPcTtnFoynT)

</div>
<div markdown="block">

...
@@ -62,28 +62,6 @@ specific types of data.

</figcaption>
</figure>

### Base sections

Base section is a very loose category. In principle, every section definition can be
@@ -148,7 +126,7 @@ and browse based on sub-sections, or explore the Metainfo through packages.

To see all user provided uploaded schemas, you can use a [search for the sub-section `definition`](https://nomad-lab.eu/prod/v1/gui/search/entries?quantities=definitions).
The sub-section `definition` is a top-level `EntryArchive` sub-section. See also our
[how-to on writing and uploading schemas](../schemas/basics.md#uploading-schemas).

### Contributing to the Metainfo
@@ -167,7 +145,7 @@ schemas, you most likely also upload data in archive files (or use ELNs to edit

Here you can also provide schemas and data in the same file. In many cases
specific schemas will be small and only re-combine existing base sections.
See also our
[how-to on writing schemas](../schemas/basics.md).

## Data
@@ -180,7 +158,45 @@ The Metainfo has many serialized forms. You can write `.archive.json` or `.archive.yaml`

files yourself. NOMAD internally stores all processed data in [message pack](https://msgpack.org/). Some
of the data is stored in mongodb or elasticsearch. When you request processed data via
API, you receive it in JSON. When you use the [ArchiveQuery](../apis/archive_query.md), all data is represented
as Python objects (see also [here](../plugins/schemas.md#starting-example)).
No matter what the representation is, you can rely on the structure, names, types, shapes, and units
defined in the schema to interpret the data.
## Archive files: a shared entry structure
Broadening the discussion on the *entry* files that one can find in NOMAD: both [schemas](#schema) and [processed data](#data) are serialized as the same kind of *archive file*, either `.archive.json` or `.archive.yaml`.
The NOMAD archive file is composed of several sections.
NOMAD archive file: `EntryArchive`

* definitions: `Definitions`
* metadata: `EntryMetadata`
* data: `EntryData`
* run: `Run`
* nexus: `Nexus`
* workflow: `Workflow`
* results: `Results`
They all instantiate the same root section `EntryArchive`. They all share the common sections `metadata:EntryMetadata`
and `results:Results`. They also all contain a *data* section, but the section definition used
varies depending on the type of data of the specific entry. There is the
literal `data:EntryData` sub-section. Here `EntryData` is abstract, and specific entries
will use concrete definitions that inherit from `EntryData`. There are also specific *data*
sections, like `run` for simulation data and `nexus` for nexus data.
!!! note
    As shown in [Uploading schemas](../schemas/basics.md#uploading-schemas), one can, in principle, create an archive file with both `definitions` and one of the *data* sections filled. This is not always desirable, because it couples a schema to one particular instance of that schema. They should be kept separate, so that it is still possible to generate new data files from the same schema file.
!!! attention
    The `results` section, originally designed only for computational data, will soon be revised
    and replaced by a different section. However, the necessity and function of a section
    like this remains.
<figure markdown>
![schema language](super_structure.png)
<figcaption>
All entries instantiate the same root section and share the same structure.
</figcaption>
</figure>
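To make this structure concrete, a minimal `.archive.yaml` combining a schema and an instance might look as follows. This is a sketch with hypothetical names (`MySample`, `sample_id`); the `m_def` reference syntax is an assumption and may differ:

```yaml
# Hypothetical minimal archive file: schema (definitions) plus data in one file.
definitions:
  name: 'An example schema'
  sections:
    MySample:
      base_sections:
        - nomad.datamodel.data.EntryData   # concrete data sections inherit from EntryData
      quantities:
        sample_id:
          type: str
data:
  m_def: '#/definitions/section_definitions/0'   # assumed reference to the schema above
  sample_id: 'sample-001'
```

As noted above, keeping `definitions` and `data` in separate files is usually preferable.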
@@ -14,12 +14,105 @@ definitions:

Many annotations control the representation of data in the GUI. This can be for plots or data entry/editing capabilities.

{{ pydantic_model('nomad.datamodel.metainfo.annotations.ELNAnnotation', heading='## ELN annotations') }}

{{ pydantic_model('nomad.datamodel.metainfo.annotations.BrowserAnnotation', heading='## browser') }}
### `label_quantity`
This annotation goes in the section that we want to be filled with tabular data, not in the single quantities.
It is used to give a name to the instances that might be created by the parser. If it is not provided, the name of the section itself is used.
This is often useful because, for example, one might want to create a bundle of instances of, say, a "Substrate" class, with each instance filename not being "Substrate_1", "Substrate_2", etc., but named after a quantity contained in the class, such as the specific ID of that sample.
```yaml
MySection:
  more:
    label_quantity: my_quantity
  quantities:
    my_quantity:
      type: np.float64
      shape: ['*']
      description: "my quantity to be filled from the tabular data file"
      unit: K
      m_annotations:
        tabular:
          name: "Sheet1/my header"
        plot:
          x: timestamp
          y: ./my_quantity
```
!!! important
    The quantity designated as `label_quantity` should not be an array but an integer, float, or string, so that it can be set as the name of a file. If an array quantity is chosen, the parser falls back to using the section name.
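For instance, a scalar string quantity is a safe choice for `label_quantity`. The names below (`sample_id`, the sheet/column path) are hypothetical:

```yaml
MySection:
  more:
    label_quantity: sample_id   # scalar, so it can be used as the instance/file name
  quantities:
    sample_id:
      type: str
      description: "unique ID of the sample, used to name each instance"
      m_annotations:
        tabular:
          name: "Sheet1/sample id"
```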
## Tabular data

{{ pydantic_model('nomad.datamodel.metainfo.annotations.TabularAnnotation', heading='### `tabular`') }}

Each quantity to be filled with data from tabular data files should be annotated as in the following example.
A practical example is provided in the [How To](../schemas/tabular.md#preparing-the-tabular-data-file) section.
```yaml
my_quantity:
  type: np.float64
  shape: ['*']
  description: "my quantity to be filled from the tabular data file"
  unit: K
  m_annotations:
    tabular:
      name: "Sheet1/my header"
    plot:
      x: timestamp
      y: ./my_quantity
```
### `tabular_parser`
One special quantity is dedicated to hosting the tabular data file. In the following examples it is called `data_file`; it carries the `tabular_parser` annotation, as shown below.
{{ pydantic_model('nomad.datamodel.metainfo.annotations.TabularParserAnnotation', heading = '') }}
### Available Combinations
|Tutorial ref.|`file_mode`|`mapping_mode`|`sections`|How to ref.|
|---|---|---|---|---|
|1|`current_entry`|`column`|`root`|[HowTo](../schemas/tabular.md#1-column-mode-current-entry-parse-to-root)|
|2|`current_entry`|`column`|my path|[HowTo](../schemas/tabular.md#2-column-mode-current-entry-parse-to-my-path)|
|<span style="color:red">np1</span>|`current_entry`|`row`|`root`|<span style="color:red">Not possible</span>|
|3|`current_entry`|`row`|my path|[HowTo](../schemas/tabular.md#3-row-mode-current-entry-parse-to-my-path)|
|<span style="color:red">np2</span>|`single_new_entry`|`column`|`root`|<span style="color:red">Not possible</span>|
|4|`single_new_entry`|`column`|my path|[HowTo](../schemas/tabular.md#4-column-mode-single-new-entry-parse-to-my-path)|
|<span style="color:red">np3</span>|`single_new_entry`|`row`|`root`|<span style="color:red">Not possible</span>|
|5|`single_new_entry`|`row`|my path|[HowTo](../schemas/tabular.md#5-row-mode-single-new-entry-parse-to-my-path)|
|<span style="color:red">np4</span>|`multiple_new_entries`|`column`|`root`|<span style="color:red">Not possible</span>|
|<span style="color:red">np5</span>|`multiple_new_entries`|`column`|my path|<span style="color:red">Not possible</span>|
|6|`multiple_new_entries`|`row`|`root`|[HowTo](../schemas/tabular.md#6-row-mode-multiple-new-entries-parse-to-root)|
|7|`multiple_new_entries`|`row`|my path|[HowTo](../schemas/tabular.md#7-row-mode-multiple-new-entries-parse-to-my-path)|
```yaml
data_file:
  type: str
  description: "the tabular data file containing data"
  m_annotations:
    tabular_parser:
      parsing_options:
        comment: '#'
      mapping_options:
        - mapping_mode: column
          file_mode: single_new_entry
          sections:
            - my_section/my_quantity
```
<!-- The available options are:
|**name**|**type**|**description**|
|---|---|---|
|`parsing_options`|group of options|some pandas `Dataframe` options.|
|`mapping_options`|list of groups of options|they allow to choose among all the possible modes of parsing data from the spreadsheet file to the NOMAD archive file. Each group of options can be repeated in a list. | -->
{{ pydantic_model('nomad.datamodel.metainfo.annotations.PlotAnnotation', heading='## Plot') }}
{{ pydantic_model('nomad.datamodel.metainfo.annotations.BrowserAnnotation', heading='## browser') }}
docs/schemas/2col.png (26.2 KiB)

docs/schemas/2col_notes.png (39.4 KiB)
# Write NOMAD Schemas in YAML

This guide explains how to write and upload NOMAD schemas in our `.archive.yaml` format. For more information on how an archive file is composed, visit the [learn section on schemas](../learn/data.md).

## Example data

...
docs/schemas/columns.png (38.8 KiB)

docs/schemas/rows.png (33.1 KiB)

docs/schemas/rows_subsection.png (47.5 KiB)
Refer to the [Reference guide](../reference/annotations.md) for the full list of annotations connected to this parser, and to the [Tabular parser tutorial](../tutorial/custom.md#the-built-in-tabular-parser) for a detailed description of each of them.

## Preparing the tabular data file

NOMAD and `Excel` support multiple-sheet data manipulation and imports. Each quantity in the schema is annotated with a source path composed of sheet name and column header. The path to be used with the tabular data displayed below would be `Sheet1/My header 1`, and it would be placed in the `tabular` annotation, see the [Schema annotations](../tutorial/custom.md#to-be-an-entry-or-not-to-be-an-entry) section.

<p align="center" width="100%">
<img width="30%" src="2col.png">
</p>

If there is only one sheet in the Excel file, or when using a `.csv` file (a single-sheet format), the sheet name is not required in the path.

The data sheets can be stored in one or more files, depending on the user's needs. Each sheet can independently be organized in one of the following ways:

1) Columns:<br />
each column contains an array of cells that we want to parse into one quantity. Example: time and temperature arrays to be plotted as x and y.

<p align="center" width="100%">
<img width="30%" src="columns.png">
</p>

2) Rows:<br />
each row contains a set of cells that we want to parse into a section, i.e. a set of quantities. Example: an inventory tabular data file (for substrates, precursors, or more) where each column represents a property and each row corresponds to one unit stored in the inventory.

<p align="center" width="100%">
<img width="30%" src="rows.png">
</p>

3) Rows with repeated columns:<br />
in addition to mode 2), whenever the parser detects multiple columns (or multiple sets of columns) with the same headers, these are taken as multiple instances of a subsection. More explanations are given when showing the schema for such a structure. Example: a crystal growth process where each row is a step of the growth, and the repeated columns describe the "precursor materials", which can be more than one during such processes and are described by the same "precursor material" section.

<p align="center" width="100%">
<img width="45%" src="rows_subsection.png">
</p>

Furthermore, to insert comments before the data, we can use a special character to mark one or more rows as comment rows. The special character is annotated within the schema in the [parsing options](#parsing-options) section:

<p align="center" width="100%">
<img width="30%" src="2col_notes.png">
</p>
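As an illustration, a hypothetical `.csv` data file using `#` as the comment character (matching the `comment: '#'` option shown in the annotations reference) could look like this; the headers and values are made up:

```csv
# inventory exported on 2023-05-01, rows starting with '#' are skipped by the parser
My header 1,My header 2
1.0,2.0
1.1,2.1
```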
## Inheriting the TableData base section
`TableData` can be inherited by adding the following lines in the yaml schema file:<br />
```yaml
MySection:
  base_sections:
    - nomad.datamodel.data.EntryData
    - nomad.parsing.tabular.TableData
```
`EntryData` is usually also necessary as we will create entries from the section we are defining.<br />
`TableData` provides a customizable checkbox quantity, called `fill_archive_from_datafile`, to turn the tabular parser `on` or `off`.<br />
To avoid the parser running every time a change is made to the archive data, it is sufficient to uncheck the checkbox. It is customizable in the sense that, if you do not wish to see this checkbox at all, you can configure the `hide` parameter of the section's `m_annotations` to hide the checkbox. This in turn sets the parser to run every time you save your archive. To hide it, add the following lines:
```yaml
MySection:
  base_sections:
    - nomad.datamodel.data.EntryData
    - nomad.parsing.tabular.TableData
  m_annotations:
    eln:
      hide: ['fill_archive_from_datafile']
```
Be cautious though! Turning on the tabular parser (or checking the box) when saving your data will cause the parser to overwrite your manually entered data!
## Importing data in NOMAD
After writing a schema file and creating a new upload in NOMAD (or using an existing upload), it is possible to upload the schema file. After creating a new Entry from one section of the schema, the tabular data file must be dropped in the quantity designated by the `FileEditQuantity` annotation. After clicking save, the parsing starts. In the Overview page of the NOMAD upload, new Entries are created and appended to the Processed data section. In the Entry page, clicking on the DATA tab (on top of the screen) and then on the Entry lane, the data appears under the `data` subsection.
## Hands-on examples of all tabular parser modes
In this section, eight examples are presented, containing all the features available in the tabular parser. Refer to the [Tutorial](../tutorial/custom.md#to-be-an-entry-or-not-to-be-an-entry) for more comments on the implications of the structures generated by the following yaml files.
### 1. Column mode, current Entry, parse to root

<p align="center" width="100%">
<img width="100%" src="../tutorial/tabular-1.png">
</p>

The first case gives rise to the simplest data archive file. Here the tabular data file is parsed by columns, directly within the Entry where `TableData` is inherited, filling the quantities at the root level of the schema (see the dedicated how-to to learn [how to inherit the tabular parser in your schema](../schemas/tabular.md#inheriting-the-tabledata-base-section)).

!!! important
    - The `data_file` quantity, i.e. the tabular data file name, is located in the same Entry as the parsed quantities.
    - Double check that `mapping_options > sections` contains the right path. It should point to the (sub)section whose quantities are decorated with the `tabular` annotation, i.e. the one to be filled with tabular data (`root` in this case).
    - Quantities parsed in `column` mode must have the `shape: ['*']` attribute, which means they are arrays and not scalars.

```yaml
--8<-- "examples/data/docs/tabular-parser_1_column_current-entry_to-root.archive.yaml"
```
### 2. Column mode, current Entry, parse to my path

<p align="center" width="100%">
<img width="100%" src="../tutorial/tabular-2.png">
</p>

The parsing mode presented here only differs from the previous one in the `sections` annotation. In this case the section that we want to fill with tabular data can be nested arbitrarily deep in the schema, and the `sections` annotation must be filled with a forward-slash path to the desired section, e.g. `my_sub_section/my_sub_sub_section`.

!!! important
    - The `data_file` quantity, i.e. the tabular data file name, is located in the same Entry as the parsed quantities.
    - Double check that `mapping_options > sections` contains the right path. It should point to the (sub)section whose quantities are decorated with the `tabular` annotation, i.e. the one to be filled with tabular data.
    - The section to be parsed can be arbitrarily nested, given that the path provided in `sections` reaches it (e.g. `my_sub_sec/my_sub_sub_sec`).
    - Quantities parsed in `column` mode must have the `shape: ['*']` attribute, which means they are arrays and not scalars.

```yaml
--8<-- "examples/data/docs/tabular-parser_2_column_current-entry_to-path.archive.yaml"
```
Here are all parameters for the two annotations `Tabular Parser` and `Tabular`. ### 3. Row mode, current Entry, parse to my path
{{ pydantic_model('nomad.datamodel.metainfo.annotations.TabularParserAnnotation', heading='### Tabular Parser') }} <p align="center" width="100%">
{{ pydantic_model('nomad.datamodel.metainfo.annotations.TabularAnnotation', heading='### Tabular') }} <img width="100%" src="../tutorial/tabular-3.png">
</p>
This is the first example of parsing in row mode: every row of the Excel file will be placed in one instance of the section defined in `sections`. This section must be decorated with the `repeats: true` annotation, which allows multiple instances to be generated and appended to a list with sequential numbers. Instead of sequential numbers, the list can show specific names if the `label_quantity` annotation is added to the repeated section; this annotation is included in the how-to example. The section is written separately in the schema and does not need to inherit from `EntryData`, because the instances will be grafted directly into the current Entry. As explained [below](#91-row-mode-current-entry-parse-to-root), it is not possible for `row` and `current_entry` to parse directly into the root, because we need to create multiple instances of the selected subsection and organize them in a list.
!!! important
    - The `data_file` quantity, i.e. the tabular data file name, is located in the same Entry as the parsed quantities.
    - Double-check that `mapping_options > sections` contains the right path. It should point to the (sub)section whose quantities are decorated with the `tabular` annotation, i.e. the one to be filled with tabular data.
    - The section to be parsed can be arbitrarily nested, as long as the path provided in `sections` reaches it (e.g. `my_sub_sec/my_sub_sub_sec`).
    - Quantities parsed in `row` mode are scalars.
    - Make use of `repeats: true` in the subsection within the parent section `MySection`.
    - The `label_quantity` annotation uses a quantity as the name of the repeated section. If it is not provided, a sequential number is used for each instance.
```yaml
--8<-- "examples/data/docs/tabular-parser_3_row_current-entry_to-path.archive.yaml"
```
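Conceptually, row mode turns each data row into one instance of the repeated section, collected in a list. A plain-Python sketch of this mapping (the column names are hypothetical, and this is not the actual parser implementation):

```python
import csv
import io

# A stand-in for a tabular file with one header row and two data rows
# (hypothetical column names).
raw = "My header 1,My header 2\na,1\nb,2\n"

# Row mode: one section instance (here: a dict) per data row, collected
# in a list — the repeated subsection of the parent section.
instances = [dict(row) for row in csv.DictReader(io.StringIO(raw))]

# With label_quantity pointing at e.g. "My header 1", the instances get
# readable names instead of sequential numbers.
labels = [inst["My header 1"] for inst in instances]
```

Without `label_quantity`, the list positions (0, 1, …) play the role of `labels`.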
### 4. Column mode, single new Entry, parse to my path

<p align="center" width="100%">
<img width="100%" src="../tutorial/tabular-4.png">
</p>
One more step of complexity is added here: the parsing is not performed in the current Entry; instead, a new Entry is automatically generated and filled.
This structure foresees a parent Entry where we collect one or more tabular data files, and possibly other info, while a specific entity of our data structure is separated into another searchable Entry in NOMAD, e.g. a substrate Entry or a measurement Entry collected inside a parent experiment Entry. We need the `SubSect` class to inherit from `EntryData` because these will be standalone archive files in NOMAD. Parent and child Entries are connected by means of the `ReferenceEditQuantity` annotation in the parent Entry schema. This annotation is attached to a quantity that becomes a hook to the other Entries; it is a powerful tool that lists, in the overview of each Entry, all the other referenced ones, making the chains of references visible at a glance.
!!! important
    - The `data_file` quantity, i.e. the tabular data file name, is located in the parent Entry; the data is parsed into the child Entry.
    - Double-check that `mapping_options > sections` contains the right path. It should point to the (sub)section whose quantities are decorated with the `tabular` annotation, i.e. the one to be filled with tabular data.
    - The section to be parsed can be arbitrarily nested, as long as the path provided in `sections` reaches it (e.g. `my_sub_sec/my_sub_sub_sec`).
    - Quantities parsed in `column` mode must have the `shape: ['*']` attribute, meaning they are arrays and not scalars.
    - Also inherit the subsection from `EntryData`, as it must be a NOMAD Entry archive file.
```yaml
--8<-- "examples/data/docs/tabular-parser_4_column_single-new-entry_to-path.archive.yaml"
```
### 5. Row mode, single new Entry, parse to my path

<p align="center" width="100%">
<img width="100%" src="../tutorial/tabular-5.png">
</p>
This example is analogous to the previous one, but the newly created Entry now contains a repeated subsection with a list of instances made from each line of the tabular data file, as shown in the [Row mode, current Entry, parse to my path](#3-row-mode-current-entry-parse-to-my-path) case.

!!! important
    - The `data_file` quantity, i.e. the tabular data file name, is located in the parent Entry; the data is parsed into the child Entry.
    - Double-check that `mapping_options > sections` contains the right path. It should point to the (sub)section whose quantities are decorated with the `tabular` annotation, i.e. the one to be filled with tabular data.
    - The section to be parsed can be arbitrarily nested, as long as the path provided in `sections` reaches it (e.g. `my_sub_sec/my_sub_sub_sec`).
    - Quantities parsed in `row` mode are scalars.
    - Also inherit the subsection from `EntryData`, as it must be a NOMAD Entry archive file.
    - Make use of `repeats: true` in the subsection within the parent section `MySection`.
    - The `label_quantity` annotation uses a quantity as the name of the repeated section. If it is not provided, a sequential number is used for each instance.
```yaml
--8<-- "examples/data/docs/tabular-parser_5_row_single-new-entry_to-path.archive.yaml"
```
### 6. Row mode, multiple new entries, parse to root

<p align="center" width="100%">
<img width="100%" src="../tutorial/tabular-6.png">
</p>
The last feature available for the tabular parser is now introduced: `multiple_new_entries`. It is only meaningful in `row` mode, where each row of the tabular data file is placed in a new Entry that is an instance of a class defined in the schema. It would not make sense for columns, which usually need to be parsed together into one class of the schema; for example, the "timestamp" and "temperature" columns of a spreadsheet belong in the same class, as they describe the same part of the experiment.
A further comment is needed on combining this feature with `root`. As mentioned before, using `root` grafts data directly into the present Entry. In this case, it means that many Entries will be generated based on the only class available in the schema. These Entries are not bundled together by a parent Entry but simply live in the NOMAD Upload as a loose list. They may be referenced manually by the user with `ReferenceEditQuantity` in other archive files. Bundling them together in one overarching Entry already at the parsing stage requires the structure introduced in the next example.
!!! important
    - The `data_file` quantity, i.e. the tabular data file name, is located in the parent Entry; the data is parsed into the children Entries.
    - Double-check that `mapping_options > sections` contains the right path. It should point to the (sub)section whose quantities are decorated with the `tabular` annotation, i.e. the one to be filled with tabular data.
    - Quantities parsed in `row` mode are scalars.
    - Also inherit the subsection from `EntryData`, as it must be a NOMAD Entry archive file.
    - Make use of `repeats: true` in the subsection within the parent section `MySection`.
    - The `label_quantity` annotation uses a quantity as the name of the repeated section. If it is not provided, a sequential number is used for each instance.
```yaml
--8<-- "examples/data/docs/tabular-parser_6_row_multiple-new-entries_to-root.archive.yaml"
```
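In plain-Python terms, `multiple_new_entries` with row mode produces one standalone archive per data row; a minimal sketch (hypothetical file names and column headers, not the parser's actual output naming):

```python
import csv
import io

# A stand-in for the tabular data file (hypothetical headers).
raw = "My header 1,My header 2\nsample_A,x\nsample_B,y\n"

# multiple_new_entries in row mode: every row becomes its own archive
# (a standalone Entry), rather than a subsection of the current one.
entries = {
    f"entry_{row['My header 1']}.archive.json": {"data": dict(row)}
    for row in csv.DictReader(io.StringIO(raw))
}

# With `root`, no parent Entry references these files; they sit side by
# side in the upload as a loose list.
```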
### 7. Row mode, multiple new entries, parse to my path

<p align="center" width="100%">
<img width="100%" src="../tutorial/tabular-7.png">
</p>
As anticipated in the previous example, `row` mode in connection with `multiple_new_entries` produces many instances of a specific class, each of them being a new Entry. In the present case, each instance is also automatically placed in a `ReferenceEditQuantity` quantity lying in a subsection defined within the parent Entry, coloured in plum in the following example image.
!!! important
    - The `data_file` quantity, i.e. the tabular data file name, is located in the same Entry; the data is parsed into the children Entries.
    - Double-check that `mapping_options > sections` contains the right path. It should point to the (sub)section whose quantities are decorated with the `tabular` annotation, i.e. the one to be filled with tabular data.
    - The section to be parsed can be arbitrarily nested, as long as the path provided in `sections` reaches it (e.g. `my_sub_sec/my_sub_sub_sec`).
    - Quantities parsed in `row` mode are scalars.
    - Also inherit the subsection from `EntryData`, as it must be a standalone NOMAD archive file.
    - Make use of `repeats: true` in the subsection within the parent section `MySection`.
    - The `label_quantity` annotation uses a quantity as the name of the repeated section. If it is not provided, a sequential number is used for each instance.
```yaml
--8<-- "examples/data/docs/tabular-parser_7_row_multiple-new-entries_to-path.archive.yaml"
```
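The difference from the previous case can be sketched in plain Python: the children are still one archive per row, but the parent now keeps a hook to each of them. The reference strings below only illustrate the idea and are not guaranteed to match NOMAD's exact reference syntax:

```python
import csv
import io

raw = "My header 1,My header 2\nsample_A,x\nsample_B,y\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# One new Entry per row, as in the root case ...
children = {f"{r['My header 1']}.archive.json": {"data": dict(r)} for r in rows}

# ... but now the parent Entry keeps a ReferenceEditQuantity-like hook
# to each child: a repeated subsection holding one reference per Entry.
parent = {
    "data": {
        "data_file": "test.xlsx",
        "instances": [
            {"reference": f"../upload/raw/{name}#/data"} for name in children
        ],
    }
}
```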
### 8. The Sub-Subsection nesting schema
<p align="center" width="100%">
<img width="100%" src="../tutorial/tabular-8.png">
</p>
If the tabular data file contains multiple columns with the exact same name, there is a way to parse them using `row` mode. As explained in previous examples, this mode creates an instance of a subsection of the schema for each row of the file. Whenever columns with the same name are found, they are interpreted as multiple instances of a sub-subsection nested inside the subsection. To build a schema with such a feature, it is enough to have two nested classes, each of them bearing a `repeats: true` annotation. This structure can be applied to each and every one of the `row` mode cases above.
!!! important
    - Make use of `repeats: true` in the subsection within the parent section `MySection`, and also in the sub-subsection within `MySubSect`.
    - The `label_quantity` annotation uses a quantity as the name of the repeated section. If it is not provided, a sequential number is used for each instance.
```yaml
--8<-- "examples/data/docs/tabular-parser_8_row_current-entry_to-path_subsubsection.archive.yaml"
```
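The grouping of same-named columns into nested instances can be sketched with the stdlib `csv` module (the `row_quantity_*` headers are the hypothetical ones from the example; `csv.DictReader` is avoided because it collapses duplicate headers):

```python
import csv
import io

# Two columns share the name "row_quantity_2" (hypothetical headers).
raw = (
    "row_quantity_3,row_quantity_2,row_quantity_2\n"
    "r1,a,b\n"
    "r2,c,d\n"
)

reader = csv.reader(io.StringIO(raw))
header = next(reader)

rows = []
for cells in reader:
    # One subsection instance per row ...
    instance = {"row_quantity_3": None, "nested": []}
    for name, value in zip(header, cells):
        if name == "row_quantity_2":
            # ... and one nested sub-subsection instance per duplicated column.
            instance["nested"].append({"row_quantity_2": value})
        else:
            instance[name] = value
    rows.append(instance)
```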
### 9. Not possible implementations

Some combinations of `mapping_options`, namely `file_mode`, `mapping_mode`, and `sections`, give rise to instructions that cannot be interpreted, or to data structures that are not useful. For the sake of completeness, a brief explanation of the five impossible cases is provided.

#### 9.1 Row mode, current Entry, parse to root
`row` mode always requires a section instance to be populated with one row of cells from the tabular data file. Multiple instances are hence generated from the rows available in the file. The instances are organized in a list, and the list must necessarily be hosted as a subsection of some parent section. That is why, within the parent section, a path different from `root` must be provided in `sections`.
#### 9.2 Column mode, single new Entry, parse to root
This would create a redundant Entry with the very same structure as the one where the `data_file` quantity is placed; furthermore, the structure would miss a reference between the two Entries. A better result is achieved using a path in `sections`, which creates a new Entry and references it in the parent one.

#### 9.3 Row mode, single new Entry, parse to root
As explained in the first of the impossible cases, parsing in row mode creates multiple instances that cannot remain standalone floating objects. They must be organized as a list in a subsection of the parent Entry.
#### 9.4 Column mode, multiple new entries, parse to root

This case would create a useless set of Entries containing one array quantity each. Usually, when parsing in column mode, we want to parse all the columns together into the same section.
#### 9.5 Column mode, multiple new entries, parse to my path

This case would create a useless set of Entries containing one array quantity each. Usually, when parsing in column mode, we want to parse all the columns together into the same section.
This video tutorial explains the basics of the API and shows how to make simple requests
against the NOMAD API.
!!! note
    The NOMAD seen in the tutorials is an older version with a different color theme,
    but all the demonstrated functionality is still available on the current version.
    You'll find the NOMAD test installation mentioned in the first video
    [here](https://nomad-lab.eu/prod/v1/test/gui/search/entries).
<div class="youtube">
<iframe src="https://www.youtube.com/embed/G1frBCrxC0g" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</div>
!!! attention
    This part of the documentation is still work in progress.
## What is a custom schema
!!! attention
    This part of the documentation is still work in progress.
An example of custom schema written in YAML language.
```yaml
definitions:
name: 'My test ELN'
sections:
MySection:
base_sections:
- nomad.datamodel.data.EntryData
m_annotations:
eln:
quantities:
my_array_quantity_1:
type: str
shape: ['*']
my_array_quantity_2:
type: str
shape: ['*']
```
## The base sections
!!! attention
    This part of the documentation is still work in progress.
## Use of YAML files
!!! attention
    This part of the documentation is still work in progress.
## The built-in tabular parser
NOMAD provides a standard parser to import your data from a spreadsheet file (`Excel` file with .xlsx extension) or from a CSV file (a Comma-Separated Values file with .csv extension). There are several ways to parse a tabular data file into a structured [data file](../learn/data.md#data), depending on which structure we want to give our data. Therefore, the tabular parser can be set very flexibly, directly from the [schema file](../learn/data.md#schema) through [annotations](../schemas/elns.md#annotations).
In this tutorial we will focus on the most common modes of the tabular parser. A complete description of all modes is given in the [Reference](../reference/annotations.md#tabular_parser) section. You can also follow the dedicated [How To](../schemas/tabular.md) with practical examples of the NOMAD tabular parser; in each section you will find a commented sample schema with a step-by-step guide on how to set it up to obtain the desired final structure of your parsed data.
We will make use of the tabular parser in a custom yaml schema. To obtain some structured data in NOMAD with this parser:<br />
1) the schema files should follow the NOMAD [archive files](../learn/data.md#archive-files-a-shared-entry-structure) naming convention (i.e. `.archive.json` or `.archive.yaml` extension)<br />
2) a data file must be instantiated from the schema file<br />
[comment]: <> (--> a link to the part upload etc should be inserted)
3) a tabular data file must be dragged in the annotated [quantity](../schemas/basics.md#quantities) in order for NOMAD to parse it (the quantity is called `data_file` in the following examples)
### To be an Entry or not to be an Entry
To use this parser, three kinds of annotation must be included in the schema: `tabular`, `tabular_parser`, `label_quantity`. Refer to the dedicated [Reference](../reference/annotations.md#tabular-data) section for the full list of options.
!!! important
    The ranges of the three `mapping_options`, namely `file_mode`, `mapping_mode`, and `sections`, give rise to twelve different combinations (see the table in [Reference](../reference/annotations.md#available-combinations)). It is worth analyzing each of them to understand which is the best choice from case to case.
    Some of them give rise to "not possible" data structures but are still listed for completeness, together with a brief explanation of why they cannot be implemented.
    The main take-home message is that a tabular data file can be parsed into one or more Entries in NOMAD, giving rise to diverse and arbitrarily complex structures.
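The count of twelve follows from the Cartesian product of the three option ranges; a quick sketch (with the `sections` axis reduced to `root` vs. an explicit subsection path):

```python
from itertools import product

# The three ranges of mapping_options.
file_modes = ["current_entry", "single_new_entry", "multiple_new_entries"]
mapping_modes = ["column", "row"]
section_targets = ["root", "my_path"]  # '#root' vs. a custom subsection path

# 3 x 2 x 2 = 12 combinations, a few of which are "not possible".
combinations = list(product(file_modes, mapping_modes, section_targets))
```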
In the following sections, two examples will be illustrated. A [tabular data file](../schemas/tabular.md#preparing-the-tabular-data-file) is parsed into one or more [data archive files](../learn/data.md#data) whose structure is based on a [schema archive file](../learn/data.md#schema). NOMAD archive files are denoted as Entries.
!!! note
    From the NOMAD point of view, a schema file and a data file are the same kind of file where different sections have been filled (see the [archive files description](../learn/data.md#archive-files-a-shared-entry-structure)). Specifically, a schema file has its `definitions` section filled, while a data file has its `data` section filled. See [How to write a schema](../schemas/basics.md#uploading-schemas) for a more complete description of an archive file.
### Example 1
We want to instantiate an object of the schema already shown in the first [Tutorial section](#what-is-a-custom-schema) and populate it with the data contained in the following excel file.
<p align="center" width="100%">
<img width="30%" src="../schemas/2col.png">
</p>
The two columns in the file will be stored in a NOMAD Entry archive within two array quantities, as shown in the image below. In the case where the section to be filled is not in the root level of our schema but nested inside, it is useful to check the dedicated [How-to](../schemas/tabular.md#2-column-mode-current-entry-parse-to-my-path).
<p align="center" width="100%">
<img width="100%" src="../tutorial/tabular-1.png">
</p>
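What column mode does here can be sketched in plain Python with the stdlib `csv` module; the header-to-quantity mapping mirrors the `tabular` annotation's `name` option (this is a conceptual sketch, not the parser's implementation):

```python
import csv
import io

# Stand-in for the two-column spreadsheet shown above.
raw = "My header 1,My header 2\na,x\nb,y\nc,z\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# The tabular annotation's `name` maps a file header to a schema quantity.
header_to_quantity = {
    "My header 1": "my_array_quantity_1",
    "My header 2": "my_array_quantity_2",
}

# Column mode: each mapped column becomes one array quantity (shape: ['*']).
data = {
    quantity: [row[header] for row in rows]
    for header, quantity in header_to_quantity.items()
}
```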
The schema will be decorated by the annotations mentioned at the beginning of this section and will look like this:
```yaml
definitions:
name: 'My test ELN'
sections:
MySection:
base_sections:
- nomad.datamodel.data.EntryData
- nomad.parsing.tabular.TableData
m_annotations:
eln:
quantities:
data_file:
type: str
default: test.xlsx
m_annotations:
tabular_parser:
parsing_options:
comment: '#'
mapping_options:
- mapping_mode: column
file_mode: current_entry
sections:
- '#root'
browser:
adaptor: RawFileAdaptor
eln:
component: FileEditQuantity
my_array_quantity_1:
type: str
shape: ['*']
m_annotations:
tabular:
name: "My header 1"
my_array_quantity_2:
type: str
shape: ['*']
m_annotations:
tabular:
name: "My header 2"
```
Here the tabular data file is parsed by columns, directly within the Entry where `TableData` is inherited, filling the quantities at the root level of the schema (see the dedicated how-to to learn [how to inherit the tabular parser in your schema](../schemas/tabular.md#inheriting-the-tabledata-base-section)).
!!! note
    In yaml files a dash character indicates a list element. `mapping_options` is a list because it is possible to parse multiple tabular sheets from the same schema with different parsing options. `sections` in turn is a list because multiple sections of the schema can be parsed with the same parsing options.
### Example 2
<p align="center" width="100%">
<img width="100%" src="../tutorial/tabular-6.png">
</p>
In this example, each row of the tabular data file will be placed in a new Entry that is an instance of a class defined in the schema. This would make sense for, say, an inventory spreadsheet where each row can be a separate entity such as a sample, a substrate, etc.
In this case, many Entries will be generated based on the only class available in the schema. These Entries are not bundled together by a parent Entry but simply live in our NOMAD Upload as a loose list; to bundle them together, check the dedicated [How-to](../schemas/tabular.md#7-row-mode-multiple-new-entries-parse-to-my-path). They might still be referenced manually inside an overarching Entry, such as an experiment Entry, from the ELN with `ReferenceEditQuantity`.
```yaml
definitions:
name: 'My test ELN'
sections:
MySection:
base_sections:
- nomad.datamodel.data.EntryData
- nomad.parsing.tabular.TableData
m_annotations:
eln:
more:
label_quantity: my_quantity_1
quantities:
data_file:
type: str
default: test.xlsx
m_annotations:
tabular_parser:
parsing_options:
comment: '#'
mapping_options:
- mapping_mode: row
file_mode: multiple_new_entries
sections:
- '#root'
browser:
adaptor: RawFileAdaptor
eln:
component: FileEditQuantity
my_quantity_1:
type: str
m_annotations:
tabular:
name: "My header 1"
my_quantity_2:
type: str
m_annotations:
tabular:
name: "My header 2"
```
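With this schema, each generated Entry takes its name from `my_quantity_1` via `label_quantity`. A plain-Python sketch of the row-to-Entry mapping (illustrative only; the actual file naming is up to the parser):

```python
import csv
import io

# Stand-in for the tabular data file used in the schema above.
raw = "My header 1,My header 2\nsample_A,x\nsample_B,y\n"

# Each row becomes a new Entry; label_quantity (my_quantity_1, mapped from
# "My header 1") supplies a readable name for each generated instance.
entries = []
for row in csv.DictReader(io.StringIO(raw)):
    entries.append(
        {
            "name": row["My header 1"],  # from label_quantity: my_quantity_1
            "data": {
                "my_quantity_1": row["My header 1"],
                "my_quantity_2": row["My header 2"],
            },
        }
    )
```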
This tutorial shows how to use NOMAD's search interface and structured data browsing to explore available data.
!!! note
    The NOMAD seen in the tutorials is an older version with a different color theme,
    but all the demonstrated functionality is still available on the current version.
    You'll find the NOMAD test installation mentioned in the first video
    [here](https://nomad-lab.eu/prod/v1/test/gui/search/entries).
<div class="youtube">
<iframe src="https://www.youtube.com/embed/38S2U-TIvxE" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</div>
!!! attention
    This part of the documentation is still work in progress.
## Custom normalizers
For custom schemas, you might want to add custom normalizers. All files are parsed
and normalized when they are uploaded or changed. The NOMAD metainfo Python interface
allows you to add functions that are called when your data is normalized.
Here is an example:
```python
--8<-- "examples/archive/custom_schema.py"
```
To add a `normalize` function, your section has to inherit from `ArchiveSection`, which
provides the base for this functionality. Now you can override the `normalize` function
and add your own behavior. Make sure to call the `super` implementation properly to
support schemas with multiple inheritance.
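Why calling `super` matters can be illustrated with plain Python classes (these are stand-ins for `ArchiveSection` subclasses, not the actual NOMAD base classes): with multiple inheritance, Python's method resolution order only reaches every base class if each override cooperates.

```python
# Plain-Python stand-ins illustrating the cooperative-super pattern.
class Base:
    def normalize(self, archive, logger):
        pass


class WithFormula(Base):
    def normalize(self, archive, logger):
        super().normalize(archive, logger)  # keep the MRO chain going
        archive.setdefault("steps", []).append("formula")


class WithId(Base):
    def normalize(self, archive, logger):
        super().normalize(archive, logger)
        archive.setdefault("steps", []).append("id")


class Sample(WithFormula, WithId):
    def normalize(self, archive, logger):
        super().normalize(archive, logger)
        archive.setdefault("steps", []).append("sample")


archive = {}
Sample().normalize(archive, logger=None)
# Every normalize in the MRO (WithId, WithFormula, Sample) ran exactly once.
```

Dropping any of the `super().normalize(...)` calls silently skips part of the chain, which is exactly the bug the note above warns about.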
If we parse an archive like this:
```yaml
--8<-- "examples/archive/custom_data.archive.yaml"
```
we will get a final normalized archive that contains our data like this:
```json
{
"data": {
"m_def": "examples.archive.custom_schema.SampleDatabase",
"samples": [
{
"added_date": "2022-06-18T00:00:00+00:00",
"formula": "NaCl",
"sample_id": "2022-06-18 00:00:00+00:00--NaCl"
}
]
}
}
```