
Compare revisions

Showing changes with 37 additions and 147 deletions
......@@ -69,7 +69,7 @@ There are three main ways to include data in an example upload:
 myexampleupload = ExampleUploadEntryPoint(
     name = 'MyExampleUpload',
     description = 'My custom example upload.',
-    url='http://my_large_file_address.zip
+    url='http://my_large_file_address.zip'
 )
```
......
%% Cell type:markdown id:105d7cdf tags:
<div style="
background-color: #f7f7f7;
background-image: url('data:image/svg+xml;base64,PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiIHN0YW5kYWxvbmU9Im5vIj8+CjxzdmcKICAgd2lkdGg9IjcyIgogICBoZWlnaHQ9IjczIgogICB2aWV3Qm94PSIwIDAgNzIgNzMiCiAgIGZpbGw9Im5vbmUiCiAgIHZlcnNpb249IjEuMSIKICAgaWQ9InN2ZzEzMTkiCiAgIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyIKICAgeG1sbnM6c3ZnPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyI+CiAgPGRlZnMKICAgICBpZD0iZGVmczEzMjMiIC8+CiAgPHBhdGgKICAgICBkPSJNIC0wLjQ5OTk4NSwxNDUgQyAzOS41MzMsMTQ1IDcyLDExMi41MzIgNzIsNzIuNSA3MiwzMi40Njc4IDM5LjUzMywwIC0wLjQ5OTk4NSwwIC00MC41MzI5LDAgLTczLDMyLjQ2NzggLTczLDcyLjUgYyAwLDQwLjAzMiAzMi40NjcxLDcyLjUgNzIuNTAwMDE1LDcyLjUgeiIKICAgICBmaWxsPSIjMDA4YTY3IgogICAgIGZpbGwtb3BhY2l0eT0iMC4yNSIKICAgICBpZD0icGF0aDEzMTciIC8+Cjwvc3ZnPgo='), url('data:image/svg+xml;base64,PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiIHN0YW5kYWxvbmU9Im5vIj8+CjxzdmcKICAgd2lkdGg9IjIxNyIKICAgaGVpZ2h0PSIyMjMiCiAgIHZpZXdCb3g9IjAgMCAyMTcgMjIzIgogICBmaWxsPSJub25lIgogICB2ZXJzaW9uPSIxLjEiCiAgIGlkPSJzdmcxMTA3IgogICB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciCiAgIHhtbG5zOnN2Zz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPgogIDxkZWZzCiAgICAgaWQ9ImRlZnMxMTExIiAvPgogIDxwYXRoCiAgICAgZD0ibSAyMi4wNDIsNDUuMDEwOSBjIDIxLjM2MjUsMjEuMjc1NyA1NS45NzYsMjEuMjc1NyA3Ny41MTkyLDAgQyAxMTkuNTU4LDI1LjA4IDE1MS41MDIsMjMuNzM1MiAxNzIuODY0LDQxLjM3OCBjIDEuMzQ1LDEuNTI1NCAyLjY5LDMuMjUxNiA0LjIzNiw0Ljc5NzEgMjEuMzYzLDIxLjI3NTYgMjEuMzYzLDU1Ljc5ODkgMCw3Ny4yNTQ5IC0yMS4zNjIsMjEuMjc2IC0yMS4zNjIsNTUuNzk4IDAsNzcuMjU1IDIxLjM2MywyMS40NTYgNTUuOTc2LDIxLjI3NSA3Ny41MiwwIDIxLjU0MywtMjEuMjc2IDIxLjM2MiwtNTUuNzk5IDAsLTc3LjI1NSAtMjEuMzYzLC0yMS4yNzYgLTIxLjM2MywtNTUuNzk4NiAwLC03Ny4yNTQ5IDEyLjY4OSwtMTIuNjQ1IDE3Ljg4OSwtMzAuMTA3MSAxNS4zOTksLTQ2LjU4NTc2IC0xLjU0NiwtMTEuNTAwOTQgLTYuNzI2LC0yMi44MjExNCAtMTUuNTgsLTMxLjYzMjU0IC0yMS4zNjMsLTIxLjI3NTYgLTU1Ljk3NiwtMjEuMjc1NiAtNzcuNTE5LDAgLTIxLjM2MywyMS4yNzU3IC01NS45NzYsMjEuMjc1NyAtNzcuNTE5NCwwIC0yMS4zNjI1LC0yMS4yNzU2IC01NS45NzYxLC0yMS4yNzU2IC03Ny41MTkyLDAgQyAwLjY3OTU2NSwtMTAuNzg3NiAwLjY3OTU5NiwyMy43MzUyIDIyLjA0Miw0NS4wMTA5
IFoiCiAgICAgZmlsbD0iIzJhNGNkZiIKICAgICBzdHJva2U9IiMyYTRjZGYiCiAgICAgc3Ryb2tlLXdpZHRoPSIxMiIKICAgICBzdHJva2UtbWl0ZXJsaW1pdD0iMTAiCiAgICAgaWQ9InBhdGgxMTA1IiAvPgogIDxwYXRoCiAgICAgZD0ibSA1MS45OTUyMTIsMjIyLjczMDEzIGMgMjguMzU5MSwwIDUxLjM1ODM5OCwtMjIuOTk5OSA1MS4zNTgzOTgsLTUxLjM1ODQgMCwtMjguMzU4NiAtMjIuOTk5Mjk4LC01MS4zNTg1OSAtNTEuMzU4Mzk4LC01MS4zNTg1OSAtMjguMzU5MSwwIC01MS4zNTg2MDIsMjIuOTk5OTkgLTUxLjM1ODYwMiw1MS4zNTg1OSAwLDI4LjM1ODUgMjIuOTk5NTAyLDUxLjM1ODQgNTEuMzU4NjAyLDUxLjM1ODQgeiIKICAgICBmaWxsPSIjMTkyZTg2IgogICAgIGZpbGwtb3BhY2l0eT0iMC4zNSIKICAgICBpZD0icGF0aDE5MzciIC8+Cjwvc3ZnPgo=') ;
background-position: left bottom, right top;
background-repeat: no-repeat, no-repeat;
background-size: auto 60px, auto 160px;
border-radius: 5px;
box-shadow: 0px 3px 1px -2px rgba(0, 0, 0, 0.2), 0px 2px 2px 0px rgba(0, 0, 0, 0.14), 0px 1px 5px 0px rgba(0,0,0,.12);">
<h1 style="
color: #2a4cdf;
font-style: normal;
font-size: 2.25rem;
line-height: 1.4em;
font-weight: 600;
padding: 30px 200px 0px 30px;">
NOMAD as a Data Management Framework Tutorial</h1>
<p style="
line-height: 1.4em;
padding: 30px 200px 0px 30px;">
This tutorial notebook demonstrates how to use NOMAD
for managing custom data and file types. Based on a simple <i>Countries of the World</i>
dataset, it shows how to model the data in a schema, do parsing and normalization,
process data, access existing data with NOMAD's API for analysis, and how to
add visualization to your data entries.
</p>
<p style="font-size: 1.25em; font-style: italic; padding: 5px 200px 30px 30px;">
Markus Scheidgen, José A. Márquez</p>
</div>
%% Cell type:code id:99504af2-9c6c-4747-8298-bcbf0be9f685 tags:
``` python
# This is necessary in some development environments. You can ignore this!
from nomad.config import client
client.url = client.url.replace('://localhost', '://host.docker.internal')
# A utility to show structured data in cell outputs.
from IPython.display import JSON
```
%% Cell type:markdown id:1f0492bf-01cc-4bee-819d-b1153ad1c61d tags:
# Content
- [Data](#Data)
- [Schema](#Schema)
- [Parsing](#Parsing)
- [Normalizing](#Normalizing)
- [Plugins](#Plugins)
- [Processing](#Processing)
- [Analysis](#Analysis)
- [Visualization](#Visualization)
%% Cell type:markdown id:2605130f-05b5-477f-8551-732561a41c40 tags:
## How to run this notebook
Ideally, you are here because you created the example upload *NOMAD as a Data Management Framework Tutorial* and started the `Tutorial.ipynb` notebook. *No other preparation is required.*
Alternatively, you can download the [necessary files from GitLab](https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/tree/develop/examples/data/cow_tutorial).
From the downloaded directory, you can run `Tutorial.ipynb` using our JupyterLab Docker image:
```
docker run --rm -p 8888:8888 -v `pwd`:/home/jovyan/work \
gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-remote-tools-hub/jupyterlab:latest
```
If you want to run this notebook with your own Jupyter installation or any other solution,
you need to install the `nomad-lab` PyPI package, and your OS needs to have libmagic installed.
```
sudo apt-get install --yes --quiet --no-install-recommends libmagic-dev
pip install nomad-lab
```
%% Cell type:markdown id:4009e853-67cc-4d9d-a173-e13186d3130e tags:
<div class="alert alert-block alert-warning">
The cells in this notebook are not independent and have to be run in order.
</div>
%% Cell type:markdown id:d65a2c82-e930-4964-b470-58070775c533 tags:
<div style="height: 4rem;">&nbsp;</div>
## Data
Here is some example data in a proprietary text file format: [Germany.data.txt](raw_data/Germany.data.txt). The data combines two public Kaggle datasets ([1](https://www.kaggle.com/datasets/fernandol/countries-of-the-world), [2](https://www.kaggle.com/datasets/kaggle/world-development-indicators)); the original data files and a notebook to create the `.data.txt` files can be [downloaded here](https://datashare.mpcdf.mpg.de/s/CKgf3TZ7TtxB2P1). Now imagine we have such a file for each country in the world.
%% Cell type:raw id:e6db14f4-d029-4b14-a890-ca3cce9d5143 tags:
#Country=Germany
#Region=WESTERN EUROPE
#Population=82422299
#Area (sq. mi.)=357021
#Pop. Density (per sq. mi.)=230,9
#Coastline (coast/area ratio)=0,67
#Net migration=2,18
#Infant mortality (per 1000 births)=4,16
#GDP ($ per capita)=27600
#Literacy (%)=99,0
#Phones (per 1000)=667,9
#Arable (%)=33,85
#Crops (%)=0,59
#Other (%)=65,56
#Climate=3
#Birthrate=8,25
#Deathrate=10,62
#Agriculture=0,009
#Industry=0,296
#Service=0,695
indicator year value
Adolescent fertility rate (births per 1,000 women ages 15-19) 1960 37.9798
Age dependency ratio (% of working-age population) 1960 49.140379601754596
Age dependency ratio, old (% of working-age population) 1960 17.167974839431402
Age dependency ratio, young (% of working-age population) 1960 31.972404762323198
Alternative and nuclear energy (% of total energy use) 1960
...
%% Cell type:markdown id:7f7c1d94-8989-4d95-8bb7-d4903c83e4c7 tags:
<div style="height: 4rem;">&nbsp;</div>
## Schema
You might also read [How to write a schema package](https://nomad-lab.eu/prod/v1/staging/docs/howto/plugins/schema_packages.html) or [Structured data](https://nomad-lab.eu/prod/v1/staging/docs/explanation/data.html) from the NOMAD documentation.
With a first impression of the data, we can start to design a *schema*. A schema describes a data structure that can be used to instantiate NOMAD entries. The basic building blocks for schemas are:
- *sections*: Containers for data. Instantiated from the `MSection` class.
- *quantities*: Concrete data values. Instantiated from the `Quantity` class.
- *sub-sections*: Used to nest sections within each other. Instantiated from the `SubSection` class.
- *schemas*: Top-level sections from which NOMAD entries are created. Instantiated from the `Schema` class.
Here is an example of a schema definition as a Python class:
%% Cell type:code id:ea2aedb5-d998-4b67-8d1d-21a7ba6880af tags:
``` python
from nomad.metainfo import Quantity
from nomad.datamodel import Schema
import numpy as np


class Country(Schema):
    name = Quantity(type=str)
    population = Quantity(type=np.int32)
    area = Quantity(type=np.float64, unit='km^2')
```
%% Cell type:markdown id:aab7f210-ed64-4693-ba8b-f9054913547a tags:
Now we can instantiate this schema with some data. In `Germany.data.txt`, we find something like this:
```
#Country=Germany
#Region=WESTERN EUROPE
#Population=82422299
#Area (sq. mi.)=357021
#Pop. Density (per sq. mi.)=230,9
#Coastline (coast/area ratio)=0,67
```
Let's use some of this information to populate the data. Schemas and other sections can be instantiated like normal Python classes. Quantities are passed as constructor keyword arguments or by assigning values to fields.
%% Cell type:code id:d671ae89-314d-4128-a499-8b70b001e0c6 tags:
``` python
from nomad.units import ureg
example = Country(
    name='Germany',
    population=82422299
)
# Quantities can also be set by attribute assignment, here with an explicit unit.
example.area = 357021 * ureg('mi^2')
example.m_to_dict()
```
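The cell above assigns a value given in square miles to a quantity declared with the unit `km^2`. As a rough illustration of the arithmetic such a unit conversion involves (a sketch in plain Python, not a description of how `ureg` or NOMAD implement it), the helper below converts the same number by hand. The names `MILE_IN_KM` and `square_miles_to_square_km` are hypothetical.

```python
# Plain-arithmetic sketch of a square-mile to square-kilometre conversion.
# MILE_IN_KM is the international mile; both names are hypothetical helpers.
MILE_IN_KM = 1.609344


def square_miles_to_square_km(area_sq_mi: float) -> float:
    """Convert an area from mi^2 to km^2."""
    return area_sq_mi * MILE_IN_KM ** 2


area_km2 = square_miles_to_square_km(357021)
print(f'{area_km2:.0f} km^2')
```

In the notebook itself you would keep using `ureg`, which tracks the unit together with the value.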
%% Cell type:markdown id:1d55dcfb-9d04-4b0e-876b-20fc6b315916 tags:
Above, we included only the technical information that is strictly needed. To make the definitions more useful to human users, you can also add documentation in natural language with `description` or link related resources with `links`.
%% Cell type:code id:70d49672-ebff-454d-afd9-adcf4552a360 tags:
``` python
from nomad.metainfo import Section


class Country(Schema):
    ''' This section represents a country of the world. '''

    m_def = Section(links=[
        'https://www.kaggle.com/datasets/fernandol/countries-of-the-world'])
    name = Quantity(
        type=str,
        description='The country\'s name.')
    population = Quantity(
        type=np.int32,
        description='The country\'s population.')
    area = Quantity(
        type=np.float64, unit='km^2',
        description='The area of the country.')
```
%% Cell type:markdown id:529a5613-0b27-4970-90be-9b290d4a46de tags:
***
Above, we only used scalar quantities (single values). How can we model a time series? Let's use this to also demonstrate *sub-sections*. Let's say we want to add multiple time series as sub-sections to our more general `Country` schema.
We need to define a class for a new *section*, `Timeseries`, and then reference it from `SubSection`s in our `Country` class:
%% Cell type:code id:391e2b09-d081-460e-9eb6-4177d907b15d tags:
``` python
from nomad.metainfo import MSection


class Timeseries(MSection):
    # np.int was removed in NumPy 1.24; use a sized integer type instead.
    year = Quantity(type=np.int32, shape=['*'])
    value = Quantity(type=np.float64, shape=['*'])
```
%% Cell type:code id:ab723ddf-3429-43f5-a74d-d386805f9233 tags:
``` python
from nomad.metainfo import SubSection


class Country(Schema):
    name = Quantity(type=str)
    population = Quantity(type=np.int32)
    area = Quantity(type=np.float64, unit='km^2')
    gdp = SubSection(
        section=Timeseries,
        description='GDP per capita (constant 2005 US$)'
    )
    birth_rate = SubSection(
        section=Timeseries,
        description='per 1,000 people per year'
    )
```
%% Cell type:markdown id:ce40d69a-1679-4b25-99c9-d736e4e00eb0 tags:
***
In order to use such custom schemas in NOMAD, we have to bundle all the definitions in a *schema package*. Such schema packages can be distributed as NOMAD plugins or uploaded as `.archive.json` or `.archive.yaml` files. For our use case, we will convert our Python definitions into the corresponding YAML version and save it directly into the current upload:
%% Cell type:code id:e3b25167-78d4-4b00-91aa-c66e26319255 tags:
``` python
from nomad.metainfo import SchemaPackage
from nomad.datamodel import EntryArchive
import yaml


def create_schema_package():
    return EntryArchive(
        definitions=SchemaPackage(
            name='Countries of the World',
            sections=[
                Country.m_def, Timeseries.m_def
            ]
        )
    )


def save_schema_package_to_yaml():
    with open('schema_package.archive.yaml', 'wt') as f:
        f.write(yaml.dump(create_schema_package().m_to_dict(with_out_meta=True), indent=2))


save_schema_package_to_yaml()
print(yaml.dump(create_schema_package().m_to_dict(with_out_meta=True), indent=2))
```
%% Cell type:markdown id:080c89eb-31da-48db-b046-a107f3d835d0 tags:
The full schema in a Python file (as you would have it in a NOMAD plugin) looks like this: [country.py](nomad-countries/src/nomad_countries/schema_packages/country.py). In a `.yaml` file (as you would upload it to NOMAD), it looks like this: [schema_package.archive.yaml](schema_package.archive.yaml).
%% Cell type:markdown id:ab6905e0-1808-4d08-a5a3-abd6eb8a3df5 tags:
<div style="height: 4rem;">&nbsp;</div>
## Normalizing
Why then not just always write the schema in `yaml`?
In Python, we can add `normalize` functions to our schema. These allow us to add additional processing steps, for example, to augment our data.
%% Cell type:code id:9606533a-2ab7-4bed-ad0a-0015fe6cc6ab tags:
``` python
class Country(Schema):
    name = Quantity(type=str)
    population = Quantity(type=np.int32)
    area = Quantity(type=np.float64, unit='km^2')
    population_density = Quantity(type=np.float64, unit='1/km^2')
    gdp = SubSection(
        section=Timeseries,
        description='GDP per capita (constant 2005 US$)'
    )
    birth_rate = SubSection(
        section=Timeseries,
        description='per 1,000 people per year'
    )

    def normalize(self, archive, logger):
        self.population_density = self.population / self.area


save_schema_package_to_yaml()
```
%% Cell type:code id:988ad4f3-deac-4da1-b44a-bb2fd52a8785 tags:
``` python
example = Country(
    name='Germany',
    population=82422299,
    area=(357021 * ureg('mi^2'))
)
example.normalize(None, None)
example.m_to_dict()
```
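The idea behind `normalize` can also be illustrated without NOMAD. The following is a minimal plain-Python sketch, not the NOMAD mechanism itself: `PlainCountry` and `area_km2` are hypothetical names, and the raw numbers from the file header are used without any unit handling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlainCountry:
    """Hypothetical stand-in for the Country schema, without NOMAD."""
    name: str
    population: int
    area_km2: float
    population_density: Optional[float] = None  # people per unit area

    def normalize(self) -> None:
        # Derive a new value from quantities that are already set.
        self.population_density = self.population / self.area_km2


plain_example = PlainCountry(name='Germany', population=82422299, area_km2=357021.0)
plain_example.normalize()
print(round(plain_example.population_density, 1))
```

With these raw numbers, the derived density rounds to the `Pop. Density` value of 230,9 given in the file header.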
%% Cell type:markdown id:f21e73fe-054e-4326-9526-97b648f793a1 tags:
This is an extremely simple example, but `normalize` functions can be incredibly powerful, as they allow you to incorporate custom Python code into NOMAD's data processing. This can be used, for example, to fit your data on the fly, to add derived quantities, or even to retrieve data from external APIs.
%% Cell type:markdown id:c9c35522-cc4f-46ee-b2a5-8eb5039e57ca tags:
<div style="height: 4rem;">&nbsp;</div>
## Parsing
You might also read [From file to data](https://nomad-lab.eu/prod/v1/staging/docs/explanation/basics.html)
or [How to write a parser](https://nomad-lab.eu/prod/v1/staging/docs/howto/plugins/parsers.html) from the NOMAD
documentation.
We don't want to create schema instances by hand every time. Instead, we want to automate this and write a parser that populates the schema with data from a file as soon as a NOMAD installation detects it (for example, after drag and drop or an upload via the NOMAD API).
A parser *reads* the contents of a file and *writes* the data in the NOMAD format, based on a schema, into an *archive*. The signature of a `parse` function, e.g. within a NOMAD plugin, is this:
%% Cell type:code id:a655a29e-9bde-4196-ab6f-d77ac816e04e tags:
``` python
def parse(mainfile, archive, logger):
    # fill the archive: EntryArchive and return
    pass
```
%% Cell type:markdown id:844bb9fd-adae-446d-9823-b87f61f1fdfb tags:
We can extract the parsing part that deals with the file format into a `read` function. This allows us to have different `read` functions for slightly different file formats, while reusing the part that populates the schema.
%% Cell type:code id:48a2abf0-f0a2-4a0f-bb03-50c162a9387b tags:
``` python
import re


def read(mainfile):
    data = {}
    with open(mainfile, 'rt') as f:
        while True:
            line = f.readline()
            match = re.match(r'#([^=]+)=(.+)', line)
            if not match:
                break
            key, str_value = match.group(1), match.group(2)
            try:
                value = float(str_value.replace(',', '.'))
            except Exception:
                value = str_value
            data[key] = value
    return data


read('raw_data/Germany.data.txt')
```
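To try the header logic without the data file, the same approach can be applied to an in-memory text stream. This is a sketch under assumptions: `read_header` and `HEADER_LINE` are hypothetical names, the sample text is a shortened excerpt of the header shown earlier, and the tab separator in the last sample line is a guess about the table part of the format.

```python
import io
import re

HEADER_LINE = re.compile(r'#([^=]+)=(.+)')


def read_header(stream):
    """Collect '#key=value' lines until the first line that does not match."""
    data = {}
    for line in stream:
        match = HEADER_LINE.match(line)
        if not match:
            break
        key, str_value = match.group(1), match.group(2)
        try:
            # The file uses a decimal comma, e.g. '230,9'.
            value = float(str_value.replace(',', '.'))
        except ValueError:
            value = str_value
        data[key] = value
    return data


sample = io.StringIO(
    '#Country=Germany\n'
    '#Population=82422299\n'
    '#Pop. Density (per sq. mi.)=230,9\n'
    'indicator\tyear\tvalue\n'  # assumed separator; first non-header line
)
header = read_header(sample)
print(header)
```

Note that numeric values come back as floats, including the population, while anything that cannot be converted stays a string.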
%% Cell type:markdown id:81ee46fc-0792-4850-a3a4-1d757dd96fbb tags:
In the actual `parse` function call, we use the `read` function and populate the given `archive` with the data.
%% Cell type:code id:782ddb83-b973-4c2d-8af4-a984f278bcd5 tags:
``` python
def parse(mainfile, archive, logger):
    data = read(mainfile)
    archive.data = Country(
        name=data['Country'],
        population=data['Population'],
        area=data['Area (sq. mi.)']
    )
```
%% Cell type:markdown id:81e1781e-0703-41c4-9d6a-e6f2fcd2bbdd tags:
To call the `parse` function, we create an empty `EntryArchive` for the `archive` argument. The real NOMAD processing will also provide a `logger` that can be used to report parsing problems.
%% Cell type:code id:be1c04ac-0675-43e8-9063-ab768fb27a33 tags:
``` python
from nomad.datamodel import EntryMetadata
archive = EntryArchive(metadata=EntryMetadata())
parse('raw_data/Germany.data.txt', archive, None)
JSON(archive.m_to_dict())
```
%% Cell type:markdown id:843ba266-bcac-4f99-afec-d3670d1ad105 tags:
After parsing, we can *normalize* the archive to call our `normalize` functions. The NOMAD Python package provides a utility called `normalize_all` that runs the *normalization*.
%% Cell type:code id:a66f0c4b-2839-46b7-8b3d-1429dade1d75 tags:
``` python
from nomad.client import normalize_all
normalize_all(archive)
JSON(archive.m_to_dict())
```
%% Cell type:markdown id:e268f999-35f4-4308-947b-d35fd4f08d94 tags:
<div style="height: 4rem;">&nbsp;</div>
## Plugins
Above, we showed how to write a simple `parse` function to learn about what a parser does. To add such a custom parser to a NOMAD installation, we will need to develop a *plugin*. You can read [how to get started with plugins](https://nomad-lab.eu/prod/v1/staging/docs/howto/plugins/plugins.html) to learn more about plugin development, but we have included an example of a plugin in the `nomad-countries` folder in this upload. In this more complete example there is additional functionality to parse the csv part and populate the `gdp` and `birth_rate` sub sections. Feel free to explore the code inside the `nomad-countries` folder to learn more.
We can actually install this complete plugin into this Jupyter environment by installing it with `pip`:
%% Cell type:code id:74127b65 tags:
``` python
%pip install -e ./nomad-countries
```
%% Cell type:markdown id:20b82871 tags:
<div class="alert alert-block alert-warning">
<b>Attention:</b> Before the next cells work, you have to restart the Python kernel. You can do this via "Kernel->Restart Kernel" or by using the "00" hotkey.
</div>
%% Cell type:markdown id:f75e1bc1-a8bf-46cc-b6bb-7805d756e84e tags:
Once the package is installed and you have restarted the kernel, we can use the [parsing programming interface described in the documentation](https://nomad-lab.eu/prod/v1/staging/docs/howto/programmatic/local_parsers.html#from-a-python-program) to run the parser on files:
%% Cell type:code id:0a3a43c4-ce4c-4bb2-96f3-e5fccd05afce tags:
``` python
from nomad.client import parse, normalize_all
from IPython.display import JSON
archive = parse('raw_data/Germany.data.txt')[0]
normalize_all(archive)
JSON(archive.m_to_dict())
```
%% Cell type:markdown id:fc212f77-02e2-4bb4-a5a7-936d91fca28f tags:
Or we use the `nomad` shell command, i.e. the command line interface (CLI):
%% Cell type:code id:16778884-eba9-4f9c-b33c-863d79d1b0e6 tags:
``` python
!PYTHONPATH=. nomad parse raw_data/Germany.data.txt --show-archive
```
%% Cell type:markdown id:d79aaff3-b850-454a-94f7-ea972169bbc2 tags:
<div style="height: 4rem;">&nbsp;</div>
## Processing
You might also read [Processing](https://nomad-lab.eu/prod/v1/staging/docs/explanation/processing.html)
from the NOMAD documentation.
With a NOMAD installation that uses the *Countries of the World* plugin, we would simply upload the `*.data.txt` files, and NOMAD would process them for us by *matching* the files to our parser, *parsing* them, and *normalizing* the results. Finally, NOMAD would persist the results.
Without our own NOMAD, we can still emulate the process and create `*.archive.json` files. We apply the code from before to a few of the countries.
There is one technicality we have to change to prepare the `.archive.json` files. NOMAD needs to know what schema we are using. Because we are using a Python schema, the exported `.json` will contain a reference to a Python class (`data.m_def=nomad_countries.schema_packages.country.Country`), and we have to change it to a reference to the `.yaml` schema that we created earlier:
%% Cell type:code id:17bb62c0-03d7-4511-8262-5a4e11c72d52 tags:
``` python
import json

for country in ('Germany', 'Poland', 'France'):
    # Process the country file
    archive = parse(f'raw_data/{country}.data.txt')[0]
    normalize_all(archive)
    json_data = archive.m_to_dict()
    # Here we replace the schema reference
    json_data['data']['m_def'] = \
        '../upload/raw/schema_package.archive.yaml#/definitions/section_definitions/Country'
    # Save the country as a .archive.json
    with open(f'nomad_data/{country}.archive.json', 'wt') as f:
        f.write(json.dumps(json_data, indent=2))
```
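The reference swap in the loop above is an ordinary dictionary update. As a small stand-alone sketch, the hypothetical helper `point_to_yaml_schema` below applies it to a heavily trimmed, made-up archive dictionary, so the step can be tried without running the parser.

```python
import json


def point_to_yaml_schema(archive_dict, schema_ref):
    """Return a copy of archive_dict whose data.m_def points at schema_ref."""
    result = json.loads(json.dumps(archive_dict))  # cheap deep copy of JSON data
    result['data']['m_def'] = schema_ref
    return result


# Hypothetical, heavily trimmed archive dictionary for illustration only.
python_archive = {
    'data': {
        'm_def': 'nomad_countries.schema_packages.country.Country',
        'name': 'Germany',
    }
}
yaml_ref = '../upload/raw/schema_package.archive.yaml#/definitions/section_definitions/Country'
yaml_archive = point_to_yaml_schema(python_archive, yaml_ref)
print(json.dumps(yaml_archive, indent=2))
```

Working on a copy keeps the original dictionary untouched, which can help when comparing the two variants.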
%% Cell type:markdown id:d8f9f0be-622a-4898-a2f8-0dfd0abfbb02 tags:
<div style="height: 4rem;">&nbsp;</div>
## Analysis
You might also read [How to use the API](https://nomad-lab.eu/prod/v1/staging/docs/howto/programmatic/api.html) or [How to access processed data](https://nomad-lab.eu/prod/v1/staging/docs/howto/programmatic/archive_query.html)
from the NOMAD documentation.
<div class="alert alert-block alert-warning">
<b>Attention:</b> Before the next cells work, you have to go back to NOMAD. On the upload page, press the reprocess button on the top-right.
</div>
%% Cell type:markdown id:59c22d5d-f124-4d46-ac31-bf502485c3af tags:
Analysis means you need to access the processed data from NOMAD. There are two principal ways to do this: you can use a generic HTTP library like `requests` to call our RESTful API directly, or you can use our client library from the `nomad-lab` Python package.
%% Cell type:markdown id:60a7eb87-807f-4e0f-a47b-241ffb56ea2c tags:
Below, we are using `requests` to perform a query and retrieve the id for our "uploaded" `.yaml` schema. You can learn more about our API endpoints on the [API dashboard](https://nomad-lab.eu/prod/v1/api/v1/extensions/docs). With `requests` you get the raw API responses as JSON.
%% Cell type:code id:3e6e4113-76c3-47a6-9569-e921c405854c tags:
``` python
import requests
from nomad.config import client
from nomad.client import Auth

response = requests.post(f'{client.url}/v1/entries/query', auth=Auth(), json={
    'owner': 'user',
    'query': {
        'mainfile': 'schema_package.archive.yaml',
        'upload_name': 'NOMAD as a Data Management Framework Tutorial'
    }
})
schema_entry_id = response.json()['data'][0]['entry_id']
```
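Indexing `['data'][0]` fails with an `IndexError` if the query matches nothing, for example when the upload has not been reprocessed yet. A slightly more defensive variant could look like the sketch below; `first_entry_id` is a hypothetical helper, and the response bodies are made-up dictionaries shaped like the one indexed in the cell above.

```python
def first_entry_id(response_json):
    """Return the entry_id of the first hit, or raise if the query matched nothing."""
    hits = response_json.get('data', [])
    if not hits:
        raise LookupError('The query returned no entries; has the upload been processed?')
    return hits[0]['entry_id']


# Hypothetical response body with a single hit.
print(first_entry_id({'data': [{'entry_id': 'abc123'}]}))
```

In the notebook, this would be called as `first_entry_id(response.json())`.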
%% Cell type:markdown id:44513346-06ed-436e-a7e2-35c204a41d99 tags:
Below, we use the `ArchiveQuery` utility. It allows you to `query` for entries and access the `required` parts of the processed data at the same time. With `ArchiveQuery` you retrieve Python objects that instantiate the respective schema.
%% Cell type:code id:74919e82-44d5-4f5c-90f0-618dad091942 tags:
``` python
from nomad.client import ArchiveQuery

archive_query = ArchiveQuery(
    query={
        f'data.population#entry_id:{schema_entry_id}.Country#int:gt': 50e6,
        'upload_name': 'NOMAD as a Data Management Framework Tutorial'
    },
    required={
        'data': '*'
    }
)
countries = [entry.data for entry in archive_query.download()]
```
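The query key in the cell above packs several pieces into one string: the quantity path, the entry id of the uploaded schema, the section name, a type, and a comparison operator. The sketch below only reproduces that string layout as it appears in the cell; `schema_quantity_key` is a hypothetical helper, and the NOMAD documentation remains the reference for what each segment means.

```python
def schema_quantity_key(quantity_path, schema_entry_id, section, dtype, operator=None):
    """Assemble a key shaped like the one used in the ArchiveQuery cell above."""
    key = f'{quantity_path}#entry_id:{schema_entry_id}.{section}#{dtype}'
    if operator:
        key += f':{operator}'
    return key


# 'abc123' is a placeholder for the schema_entry_id retrieved earlier.
population_key = schema_quantity_key('data.population', 'abc123', 'Country', 'int', 'gt')
print(population_key)
```

Building the key in one place makes it easier to query other quantities of the same schema, for example `data.area`, without retyping the whole string.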
%% Cell type:markdown id:63c97c8f-05eb-493d-919f-5965f78bd0a9 tags:
With the data available, you can perform your analysis on top of the data. For example we can plot the data with `plotly`.
%% Cell type:code id:d50b35c9-73aa-4dd8-a413-215adb70cb65 tags:
``` python
import plotly.express as px
import pandas as pd

px.line(
    pd.concat([
        pd.DataFrame(dict(
            year=country.gdp.year,
            GDP=country.gdp.value,
            name=country.name
        ))
        for country in countries
    ]),
    x='year', y=['GDP'], color='name'
).show()
```
%% Cell type:code id:58379d84-9a8a-4e36-b607-b0f20fbcee25 tags:
``` python
px.line(
    pd.concat([
        pd.DataFrame(dict(
            year=country.birth_rate.year,
            birth_rate=country.birth_rate.value,
            name=country.name
        ))
        for country in countries
    ]),
    x='year', y='birth_rate', color='name'
).show()
```
%% Cell type:markdown id:24cec166-c261-4bb0-bcd0-554cdfd09129 tags:
<div style="height: 4rem;">&nbsp;</div>
## Adding Visualization to your NOMAD entries
You might also read the reference on [Plot Annotations](https://nomad-lab.eu/prod/v1/staging/docs/reference/annotations.html#plot)
from the NOMAD documentation.
%% Cell type:markdown id:e61f9efa-40fd-4bc6-bd18-eeec346f0100 tags:
We can also put visualizations into the schema, allowing the NOMAD UI to show them. We can either add a schema *annotation* that informs the UI how to do the visualization, or we can add a Plotly figure to our data and let the UI simply show it.
%% Cell type:markdown id:a175386c-684f-4a35-b2ca-475976a47e3c tags:
### Plot annotation
%% Cell type:markdown id:0428059f-2ece-4ba9-81b5-220109d4c26b tags:
Schema annotations can be added to Python schemas as well as `.yaml` schemas. They do not require any Python code to run, so they also work for uploaded schemas without a plugin. Plot annotations require that your section inherits from `PlotSection`. This is how the annotation looks in a `.yaml` schema:
```yaml
definitions:
  name: Countries of the World
  section_definitions:
  - base_sections:
    - nomad.datamodel.metainfo.plot.PlotSection
    - nomad.datamodel.data.EntryData
    m_annotations:
      plotly_graph_object:
      - data:
          x: '#birth_rate/year'
          y: '#birth_rate/value'
        layout:
          yaxis:
            title: birth rate (per 1,000 people)
    name: Country
    quantities:
    ...
```
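The `x` and `y` values in the annotation are paths into the entry's data: `#birth_rate/year` refers to the `year` quantity inside the `birth_rate` sub-section. As a rough mental model only (this is not NOMAD's implementation), such a path can be followed through nested dictionaries as sketched below. The function `resolve` and the sample values are hypothetical.

```python
def resolve(reference, data):
    """Follow a '#a/b' style reference through nested dictionaries."""
    node = data
    for part in reference.lstrip('#').strip('/').split('/'):
        node = node[part]
    return node


# Made-up values, shaped like a Country entry with a birth_rate sub-section.
entry_data = {'birth_rate': {'year': [1960, 1961], 'value': [17.3, 18.0]}}
print(resolve('#birth_rate/year', entry_data))
print(resolve('#birth_rate/value', entry_data))
```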
You can also add annotations in Python schemas:
%% Cell type:code id:5bee98dd-240b-4b69-9e97-5994d294a26e tags:
``` python
import numpy as np
from nomad.metainfo import Section, Quantity, SubSection, MSection
from nomad.datamodel.data import Schema
from nomad.datamodel.metainfo.plot import PlotSection


class Timeseries(MSection):
    # np.int was removed in NumPy 1.24; use a sized integer type instead.
    year = Quantity(type=np.int32, shape=['*'])
    value = Quantity(type=np.float64, shape=['*'])


class Country(PlotSection, Schema):
    m_def = Section(a_plotly_graph_object={
        'data': {
            'x': '#birth_rate/year',
            'y': '#birth_rate/value'
        },
        'layout': {
            'yaxis': {
                'title': 'birth rate (per 1,000 people)'
            }
        }
    })
    name = Quantity(type=str)
    population = Quantity(type=np.int32)
    area = Quantity(type=np.float64, unit='km^2')
    population_density = Quantity(type=np.float64, unit='1/km^2')
    gdp = SubSection(
        section=Timeseries,
        description='GDP per capita (constant 2005 US$)'
    )
    birth_rate = SubSection(
        section=Timeseries,
        description='per 1,000 people per year'
    )

    def normalize(self, archive, logger):
        self.population_density = self.population / self.area
```
%% Cell type:markdown id:202f67c4-a844-425a-909e-71f774ab541f tags:
The annotation is based on Plotly [graph objects](https://plotly.com/python/graph-objects/). The `layout` is passed directly to Plotly, while `data` lets you reference quantities in the entry's data. Let's save this new version:
%% Cell type:code id:cbc4d7a4 tags:
``` python
from nomad.metainfo import SchemaPackage
from nomad.datamodel import EntryArchive
import yaml


def create_schema_package():
    return EntryArchive(
        definitions=SchemaPackage(
            name='Countries of the World',
            sections=[Country.m_def, Timeseries.m_def]
        )
    )


def save_schema_package_to_yaml():
    with open('schema_package.archive.yaml', 'wt') as f:
        f.write(yaml.dump(create_schema_package().m_to_dict(with_out_meta=True), indent=2))


save_schema_package_to_yaml()
```
%% Cell type:markdown id:bc664c3a-8397-4b40-85af-a4e653c27d55 tags:
You can go back to NOMAD and reprocess the upload. Look at one of the Country entries to see the plot.
%% Cell type:markdown id:1a9253eb-185b-4c25-b5a8-09c4f7c3aad1 tags:
### Creating custom plots programmatically
%% Cell type:markdown id:efb8f602-e746-40ef-8d07-41d09f7aa8dd tags:
You can also create Plotly plots during processing as part of a `normalize` function. This gives you the full functionality of Plotly, and you simply store the result via Plotly's `to_plotly_json` function. We use the base class `PlotSection`, which provides a `figures` property to store figures.
%% Cell type:code id:b05e9b93-ab4c-4d4b-bc34-bbf9f79d88e4 tags:
``` python
from nomad.datamodel.metainfo.plot import PlotSection, PlotlyFigure


class Country(PlotSection, Schema):
    name = Quantity(type=str)
    population = Quantity(type=np.int32)
    area = Quantity(type=np.float64, unit='km^2')
    population_density = Quantity(type=np.float64, unit='1/km^2')
    gdp = SubSection(
        section=Timeseries,
        description='GDP per capita (constant 2005 US$)'
    )
    birth_rate = SubSection(
        section=Timeseries,
        description='per 1,000 people per year'
    )

    def normalize(self, archive, logger):
        super(Country, self).normalize(archive, logger)
        self.population_density = self.population / self.area
        self.figures.append(PlotlyFigure(
            figure=px.line(
                pd.DataFrame(dict(year=self.gdp.year, GDP=self.gdp.value)),
                x='year', y=['GDP']
            ).to_plotly_json()
        ))


save_schema_package_to_yaml()
```
%% Cell type:markdown id:ed674b51-6f86-4c1c-affe-137805f7f78c tags:
You can go back to NOMAD and reprocess the upload. Look at one of the Country entries to see the plot.
%% Cell type:markdown id:826f807e-cf57-4ad8-98ae-d2f2435e6ec0 tags:
<div style="height: 4rem;">&nbsp;</div>
You reached the end of this notebook. Here are some useful links:
- [nomad-lab.eu](https://nomad-lab.eu)
- [NOMAD Documentation](https://nomad-lab.eu/docs)
- [Our user forums](https://matsci.org/c/nomad/32)
......
......@@ -70,6 +70,6 @@ class CountryParser(MatchingParser):
             return None
         return Timeseries(
-            year=[year for year, _ in data[indicator]],
-            value=[value for _, value in data[indicator]],
+            year=[int(year) for year, _ in data[indicator]],
+            value=[float(value) for _, value in data[indicator]],
         )
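The change above casts each year and value explicitly before building the `Timeseries`. A likely reason is that values read from a text file arrive as strings. The sketch below shows the same casting in isolation; `to_timeseries_columns` is a hypothetical helper, and the second sample row is made up.

```python
def to_timeseries_columns(rows):
    """Split (year, value) string pairs into typed year and value lists."""
    years = [int(year) for year, _ in rows]
    values = [float(value) for _, value in rows]
    return years, values


# First row taken from the Germany example data, second row is hypothetical.
rows = [('1960', '37.9798'), ('1961', '38.1')]
years, values = to_timeseries_columns(rows)
print(years, values)
```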
......@@ -9,7 +9,7 @@ m_package = Package(name='Countries of the World')
 class Timeseries(MSection):
-    year = Quantity(type=np.int, shape=['*'])
+    year = Quantity(type=np.int32, shape=['*'])
     value = Quantity(type=np.float64, shape=['*'])
......
......@@ -29,7 +29,7 @@
"ellipsometry_experiment_type": "NIR-Vis-UV spectroscopic ellipsometry",
"plot_name": "Psi and Delta",
"start_time": "2022-01-27T03:35:00+00:00",
-"User": {
+"user": {
"name": "Name Surname",
"affiliation": "Humboldt-Universität zu Berlin",
"address": "Zum Großen Windkanal 2, 12489 Berlin, Germany",
......@@ -40,7 +40,7 @@
"identifier": "exp-ID",
"is_persistent": "false"
},
-"Instrument": {
+"instrument": {
"software_RC2": "CompleteEASE",
"software_RC2/@version": "6.37",
"software_RC2/@url": "https://www.jawoollam.com/ellipsometry-software/completeease",
......@@ -49,7 +49,7 @@
"light_source": {
"source_type": "arc lamp"
},
-"Detector": {
+"detector": {
"detector_type": "CCD",
"detector_channel_type": "multichannel",
"count_time": 1
......@@ -79,7 +79,7 @@
"model": "RC2 (Vers. 0.0.1)"
}
},
-"Sample": {
+"sample": {
"atom_types": "Si, O",
"chemical_formula": "SiO2",
"layer_structure": "2nm SiO2 on Si",
......
......@@ -101,7 +101,7 @@ definitions:
eln:
component: DateTimeEditQuantity
sub_sections:
-User:
+user:
section:
m_annotations:
eln:
......@@ -143,7 +143,7 @@ definitions:
m_annotations:
eln:
component: StringEditQuantity
-Instrument:
+instrument:
section:
m_annotations:
eln:
......@@ -193,7 +193,7 @@ definitions:
m_annotations:
eln:
component: StringEditQuantity
-Detector:
+detector:
section:
quantities:
detector_type:
......@@ -296,7 +296,7 @@ definitions:
m_annotations:
eln:
component: StringEditQuantity
-Sample:
+sample:
section:
m_annotations:
eln:
......
......@@ -40,7 +40,8 @@ sample:
environment:
sample_medium: air
history:
-notes: Commercially purchased sample
+notes:
+  description: Commercially purchased sample
layer_structure: 2nm SiO2 on Si
physical_form: multi layer
sample_name: 2nm SiO2 on Si
......
......@@ -3,4 +3,10 @@ This upload demonstrates the use of tabular data. In this example we use an *xls
This schema is meant as a starting point. You can download the schema file and
extend the schema for your own tables.
In order to see the parsed data, create an entry by clicking on the **_create from schema_** button,
pick a name for your entry, and select **_Custom schema_** from the options. Then click on the
search icon and, in the dialog that opens, click on _**Periodic Table**_ and select **_Element_** from the
dropdown menu. Clicking on `Create` triggers the parser, and you should then see all elements
parsed into individual entries.
Consult our [documentation on the NOMAD Archive and Metainfo](https://nomad-lab.eu/prod/v1/staging/docs/) to learn more about schemas.
......@@ -5,7 +5,7 @@ definitions:
sections:
Element:
more:
label_quantity: '#/data/name'
label_quantity: name
base_sections:
# We use ElnBaseSection here. This provides a few quantities (name, description, tags)
# that are added to the search index. If we map table columns to these quantities,
......
Basic examples:
theory:
path: examples/data/uploads/theory.zip
title: Electronic structure code input and output files
description: |
This upload demonstrates the basic use of NOMAD's *parsers*. For many *electronic
structure codes* (VASP, etc.), NOMAD provides parsers. You simply upload
the *input and output files* of your simulations and NOMAD's parsers extract
all necessary metadata to produce a **FAIR** dataset.
eln:
path: examples/data/uploads/eln.zip
title: Electronic Lab Notebook
description: |
This example contains a custom NOMAD *schema* to create an **Electronic
Lab Notebook (ELN)** and a few example *data* entries that use this schema.
The schema demonstrates the basic concepts behind a NOMAD ELN and can be a good
**starting point** to create your own schemas that model **FAIR data** acquired in your lab.
tables:
path: examples/data/uploads/tabular.zip
title: Tabular Data
description: |
This upload demonstrates the use of tabular data. In this example we use an *xlsx*
file in combination with a custom schema. The schema describes what the columns
in the Excel file mean and NOMAD can parse everything accordingly to
produce a **FAIR** dataset.
Tutorials:
rdm_tutorial:
path: examples/data/uploads/rdm_tutorial.zip
title: Tailored Research Data Management (RDM) with NOMAD
description: |
This notebook will teach you how you can build tailored research data
management (RDM) solutions using NOMAD. It uses existing thermally
activated delayed fluorescent (TADF) molecule data to teach you how we
can use NOMAD to turn research data into datasets that are FAIR:
Findable, Accessible, Interoperable and Reusable. The end-result can be
distributed as a NOMAD plugin: a self-contained Python package that can be
installed on a NOMAD Oasis.
cow_tutorial:
path: examples/data/uploads/cow_tutorial.zip
title: NOMAD as a Data Management Framework Tutorial
description: |
This upload provides a notebook as a tutorial that demonstrates how to use NOMAD
for managing custom data and file types. Based on a simple *Countries of the World*
dataset, it shows how to model the data in a schema, do parsing and normalization,
process data, access existing data with NOMAD's API for analysis, and how to
add visualization to your data.
FAIRmat examples:
ellips:
path: examples/data/uploads/ellips.zip
title: Ellipsometry
description: |
This example presents the capabilities of the NOMAD platform to store and standardize ellipsometry data.
It shows the generation of a NeXus file according to the [NXellipsometry](https://manual.nexusformat.org/classes/contributed_definitions/NXellipsometry.html#nxellipsometry)
application definition and a successive analysis of a SiO2 on Si Psi/Delta measurement.
mpes:
path: examples/data/uploads/mpes.zip
title: Mpes
description: |
This example presents the capabilities of the NOMAD platform to store and standardize multidimensional photoemission spectroscopy (MPES) experimental data. It contains three major examples:
- Taking a pre-binned file, here stored in an h5 file, and converting it into the standardized MPES NeXus format.
There exists a [NeXus application definition for MPES](https://manual.nexusformat.org/classes/contributed_definitions/NXmpes.html#nxmpes) which details the internal structure of such a file.
- Binning of raw data (see [here](https://www.nature.com/articles/s41597-020-00769-8) for additional resources) into an h5 file and subsequently generating a NeXus file from it.
- An analysis example using data in the NeXus format and employing the [pyARPES](https://github.com/chstan/arpes) analysis tool to reproduce the main findings of [this paper](https://arxiv.org/pdf/2107.07158.pdf).
xps:
path: examples/data/uploads/xps.zip
title: XPS
description: |
This example presents the capabilities of the NOMAD platform to store and standardize XPS data.
It shows the generation of a NeXus file according to the
[NXmpes](https://manual.nexusformat.org/classes/contributed_definitions/NXmpes.html#nxmpes)
application definition and a successive analysis of an example data set.
sts:
path: examples/data/uploads/sts.zip
title: STS
description: |
This example is for two types of experiments: Scanning Tunneling Microscopy (STM) and Scanning Tunneling Spectroscopy (STS) from Scanning Probe Microscopy.
It can transform the data from files generated by the Nanonis software into the NeXus application definition NXsts.
The example contains data files from the two specific Nanonis software versions generic 5e and generic 4.5.
stm:
path: examples/data/uploads/stm.zip
title: STM
description: |
This example is for two types of experiments: Scanning Tunneling Microscopy (STM) and Scanning Tunneling Spectroscopy (STS) from Scanning Probe Microscopy.
It can transform the data from files generated by the Nanonis software into the NeXus application definition NXsts.
The example contains data files from the two specific Nanonis software versions generic 5e and generic 4.5.
apm:
path: examples/data/uploads/apm.zip
title: Atom Probe Microscopy
description: |
This example presents the capabilities of the NOMAD platform to store and standardize atom probe data.
It shows the generation of a NeXus file according to the
[NXapm](https://fairmat-nfdi.github.io/nexus_definitions/classes/contributed_definitions/NXapm.html#nxapm)
application definition and a successive analysis of an example data set.
The example contains a small atom probe dataset from an experiment with a LEAP instrument to get you started
and keep the size of your NOMAD installation small. Once started, we recommend changing the respective
input file in the NOMAD Oasis ELN to run the example with your own datasets.
em:
path: examples/data/uploads/em.zip
title: Electron Microscopy
description: |
This example presents the capabilities of the NOMAD platform to store and standardize electron microscopy data.
It shows the generation of a NeXus file according to the
[NXem](https://fairmat-nfdi.github.io/nexus_definitions/classes/contributed_definitions/NXem.html#nxem)
application definition.
The example contains a small set of electron microscopy datasets to get started and keep the size of your
NOMAD installation small. Once started, we recommend changing the respective input file in the NOMAD Oasis
ELN to run the example with your own datasets.
iv_temp:
path: examples/data/uploads/iv_temp.zip
title: Sensor Scan - IV Temperature Curve
description: |
This example shows users how to take data from a Python framework and map it to a NeXus application definition for IV Temperature measurements, [NXiv_temp](https://fairmat-experimental.github.io/nexus-fairmat-proposal/1c3806dba40111f36a16d0205cc39a5b7d52ca2e/classes/contributed_definitions/NXiv_temp.html#nxiv-temp).
We use the NeXus ELN features of NOMAD to generate a NeXus file.
Subproject commit 0d8b9116cb90c8d70b831ec93416d3302595f7c0
Subproject commit 3b246145b21484b72c67c6d4fb12207c1f869b0d
Subproject commit 48752a78360e1369d7754fcfccae068bf932d428
Subproject commit 06ee0e92751ec253ea6ac4b3da755e1f4c29446c
Subproject commit aa0b0f530d4955b21d6081f257d5ffbc978b3852
{
"name": "nomad-fair-gui",
"private": true,
"workspaces": [
"materia",
"crystcif-parse"
],
"dependencies": {
"@date-io/date-fns": "^1.3.13",
"@fontsource/material-icons": "^4.2.1",
"@fontsource/titillium-web": "^4.2.2",
"@lauri-codes/materia": "^1.0.1",
"crystcif-parse": "0.2.9",
"@h5web/app": "8.0.0",
"@h5web/lib": "8.0.0",
"@material-ui/core": "^4.12.4",
......
......@@ -109,12 +109,12 @@ export const DistributionInfo = React.memo(({data}) => {
<li>version: {data.version}</li>
{data?.plugin_packages?.length
? <li>{"plugin packages: "}
{data.plugin_packages.map(pluginPackage => <>
{data.plugin_packages.map((pluginPackage, i) => <>
<Link key={pluginPackage.name} href="#" onClick={() => {
setSelected(pluginPackage)
setTitle('Plugin package')
}}>{pluginPackage.name}</Link>
{", "}
{i !== data.plugin_packages.length - 1 ? ", " : null}
</>)}
</li>
: null
......@@ -125,12 +125,14 @@ export const DistributionInfo = React.memo(({data}) => {
const entryPoints = categories[category]
return <li key={category}>
{`${pluralize(category, 2)}: `}
{entryPoints.map(entryPoint => <>
{entryPoints
.sort((a, b) => (a.id > b.id) ? 1 : ((b.id > a.id) ? -1 : 0))
.map((entryPoint, i) => <>
<Link key={entryPoint.id} href="#" onClick={() => {
setSelected(entryPoint)
setTitle('Plugin entry point')
}}>{entryPoint.id}</Link>
{", "}
{i !== entryPoints.length - 1 ? ", " : null}
</>)}
</li>
})}
......
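Review note on the two `DistributionInfo` hunks above: they replace the unconditional trailing `", "` with a separator that is emitted only between items, and they sort the entry points by `id`. The following is a minimal plain-JavaScript sketch of both ideas outside React. `sortById` and `joinWithSeparator` are hypothetical helper names used only for illustration and are not part of the NOMAD GUI. Since `Array.prototype.sort` sorts in place, the sketch copies the array first; whether the component needs such a copy depends on where `entryPoints` comes from, which this diff does not show.

```javascript
// Hypothetical helpers illustrating the pattern; not taken from the NOMAD code base.

// Return a new array sorted by `id`, leaving the input array untouched.
function sortById(items) {
  return [...items].sort((a, b) => (a.id > b.id ? 1 : b.id > a.id ? -1 : 0))
}

// Interleave a separator between items, with none after the last one.
function joinWithSeparator(items, separator) {
  const result = []
  items.forEach((item, i) => {
    result.push(item)
    if (i !== items.length - 1) result.push(separator)
  })
  return result
}

// Made-up example data.
const entryPoints = [{id: 'parsers/b'}, {id: 'parsers/a'}]
const sorted = sortById(entryPoints)
const rendered = joinWithSeparator(sorted.map(e => e.id), ', ').join('')
console.log(rendered)
```

In the component itself the separator check stays inline in the `map` callback, as the diff shows; the helper form only makes the "no separator after the last item" rule easier to test in isolation.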
......@@ -45,11 +45,13 @@ describe('Test numberEditQuantity', () => {
await waitFor(() => expect(numberFieldValueInput.value).toEqual('10'))
await changeValue(numberFieldValueInput, '5')
await changeValue(numberFieldValueInput, '0')
await changeValue(numberFieldValueInput, '')
await waitFor(() => expect(handleChange.mock.calls).toHaveLength(2))
await waitFor(() => expect(handleChange.mock.calls).toHaveLength(3))
await waitFor(() => expect(handleChange.mock.calls[0][0]).toBe(5))
await waitFor(() => expect(handleChange.mock.calls[1][0]).toBe(undefined))
await waitFor(() => expect(handleChange.mock.calls[1][0]).toBe(0))
await waitFor(() => expect(handleChange.mock.calls[2][0]).toBe(undefined))
})
test.each([
......
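Review note on the `numberEditQuantity` test hunk above: the updated expectations say that the input `'0'` must reach `handleChange` as the number `0`, and that only an empty string maps to `undefined`. The sketch below shows a parser with that behaviour. `parseNumberInput` is a hypothetical function written for illustration; the actual `NumberEditQuantity` implementation is not part of this diff. The point of the sketch is that the empty check is done on the text, not on the truthiness of the parsed number, so `0` is not conflated with "no value".

```javascript
// Hypothetical parser; illustrates the behaviour the updated test expects.
function parseNumberInput(text) {
  const trimmed = text.trim()
  // Only an empty field means "no value".
  if (trimmed === '') return undefined
  const value = Number(trimmed)
  // Treat unparsable input as "no value" as well (an assumption of this sketch).
  return Number.isNaN(value) ? undefined : value
}

// Same sequence of inputs as in the test: '5', then '0', then ''.
const calls = ['5', '0', ''].map(text => parseNumberInput(text))
console.log(calls)
```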
......@@ -34,7 +34,7 @@ const quantityDef = {
const testSearchDialogCancelButton = async () => {
const dialog = screen.getByTestId('search-dialog')
await waitFor(() => expect(screen.queryByText('visibility=visible')).toBeInTheDocument())
await waitFor(() => expect(screen.queryByText('visible')).toBeInTheDocument())
// cancel the search
await userEvent.click(within(dialog).getByRole('button', {name: /cancel/i}))
......@@ -43,7 +43,7 @@ const testSearchDialogCancelButton = async () => {
const testSearchDialogOkButton = async () => {
const dialog = screen.getByTestId('search-dialog')
await waitFor(() => expect(screen.queryByText('visibility=visible')).toBeInTheDocument())
await waitFor(() => expect(screen.queryByText('visible')).toBeInTheDocument())
// accept the search
await userEvent.click(within(dialog).getByTestId('search-dialog-ok'))
......