Compare revisions
Commits on Source (101)
Showing with 954 additions and 79 deletions
......@@ -83,7 +83,6 @@ share/python-wheels/
.installed.cfg
*.egg
MANIFEST
nexus.obj
# PyInstaller
# Usually these files are written by a python script from a template
......@@ -450,3 +449,7 @@ docker-compose.local.yml
# nexus objects
nexus.obj
celerybeat-schedule.dir
celerybeat-schedule.dat
celerybeat-schedule.bak
......@@ -63,5 +63,8 @@
path = dependencies/parsers/simulation
url = https://github.com/nomad-coe/simulation-parsers.git
[submodule "dependencies/parsers/example"]
path = dependencies/parsers/example
url = https://github.com/nomad-coe/nomad-parser-example.git
path = dependencies/parsers/example
url = https://github.com/nomad-coe/nomad-parser-example.git
[submodule "dependencies/nomad-aitoolkit"]
path = dependencies/nomad-aitoolkit
url = https://github.com/FAIRmat-NFDI/nomad-aitoolkit.git
......@@ -126,9 +126,6 @@ ARG SETUPTOOLS_SCM_PRETEND_VERSION='0.0'
# Build documentation
# This is a temporary workaround because atomisticparsers installs an older version
# of nomad-lab via pip install git+...still containing pynxtools as a submodule
RUN pip uninstall -y pynxtools
RUN pip install ".[parsing,infrastructure,dev]"
RUN ./scripts/generate_docs_artifacts.sh \
......@@ -174,14 +171,6 @@ RUN pip install --progress-bar off --prefer-binary -r requirements.txt
# install
COPY --from=dev_python /app/dist/nomad-lab-*.tar.gz .
RUN pip install nomad-lab-*.tar.gz
# This is a temporary workaround because atomisticparsers installs an older version
# of nomad-lab via pip install git+...still containing pynxtools as a submodule.
RUN pip uninstall -y pynxtools
RUN pip install pynxtools[convert]
# This is a temporary workaround because pynxtools installs an incompatible
# version of h5grove
RUN pip uninstall -y h5grove
RUN pip install h5grove[fastapi]==1.3.0
# Reduce the size of the packages
RUN find /usr/local/lib/python3.9/ -type d -name 'tests' ! -path '*/networkx/*' -exec rm -r '{}' + \
......@@ -213,6 +202,7 @@ WORKDIR /app
# transfer installed packages from the build stage
COPY --chown=nomad:1000 scripts/run.sh .
COPY --chown=nomad:1000 scripts/run-worker.sh .
COPY --chown=nomad:1000 nomad/jupyterhub_config.py ./nomad/jupyterhub_config.py
COPY --chown=nomad:1000 --from=dev_python /app/examples/data/uploads /app/examples/data/uploads
......
......@@ -52,8 +52,15 @@ Omitted versions are plain bugfix releases with only minor changes and fixes. Th
file [`CHANGELOG.md`](https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/blob/develop/CHANGELOG.md)
contains much more detailed information about changes and fixes in the released versions.
### v1.2.2
- Added gunicorn multi-process manager to serve the app.
### v1.3.0
- More concise plugins and more plugin types: normalizers and apps.
- Refactored the labels in ELNs and the archive browser to be more consistent.
- Optional simplified log-transfer.
- An "all" group that allows to share unpublished data with everyone.
- HDF5 references.
- Dynamic search quantities in search Apps.
- Added gunicorn multi-process manager to serve NOMAD Oasis.
- Added the graph API.
### v1.2.1
- CLI utility to export archive data
......
Subproject commit bced0fceeb691e3ecef1f4b62c198edf8d137484
Subproject commit 4741354878819a8f108fe03679480965127417f6
Subproject commit 7e2734de71cef61f6511955245606abdce6b78ad
Subproject commit 72aa44b44b453255b7d16c480d4d2f5880f0e8f5
Subproject commit b3c3d7414918d69ddad9e1398133f6c60de541d8
......@@ -35,20 +35,20 @@ Each of the _mainfiles_ represent an electronic-structure calculation (either [D
graph LR;
A2((Inputs)) --> B2[DFT];
A1((Inputs)) --> B1[DFT];
subgraph pressure P<sub>2</sub>
subgraph pressure P2
B2[DFT] --> C2[TB];
C2[TB] --> D21[DMFT at T<sub>1</sub>];
C2[TB] --> D22[DMFT at T<sub>2</sub>];
C2[TB] --> D21[DMFT at T1];
C2[TB] --> D22[DMFT at T2];
end
D21[DMFT at T<sub>1</sub>] --> E21([Output calculation P<sub>2</sub>, T<sub>1</sub>])
D22[DMFT at T<sub>2</sub>] --> E22([Output calculation P<sub>2</sub>, T<sub>2</sub>])
subgraph pressure P<sub>1</sub>
D21[DMFT at T1] --> E21([Output calculation P2, T1])
D22[DMFT at T2] --> E22([Output calculation P2, T2])
subgraph pressure P1
B1[DFT] --> C1[TB];
C1[TB] --> D11[DMFT at T<sub>1</sub>];
C1[TB] --> D12[DMFT at T<sub>2</sub>];
C1[TB] --> D11[DMFT at T1];
C1[TB] --> D12[DMFT at T2];
end
D11[DMFT at T<sub>1</sub>] --> E11([Output calculation P<sub>1</sub>, T<sub>1</sub>])
D12[DMFT at T<sub>2</sub>] --> E12([Output calculation P<sub>1</sub>, T<sub>2</sub>])
D11[DMFT at T1] --> E11([Output calculation P1, T1])
D12[DMFT at T2] --> E12([Output calculation P1, T2])
```
Here, "Input" refers to the all _input_ information given to perform the calculation (e.g., atom positions, model parameters, experimental initial conditions, etc.). "DFT", "TB" and "DMFT" refer to individual _tasks_ of the workflow, which each correspond to a _SinglePoint_ entry in NOMAD. "Output calculation" refers to the _output_ data of each of the final DMFT tasks.
......
# Domain-specific examples for X-ray photoelectron spectroscopy
!!! warning "Attention"
We are currently working to update this content.
## Contextualization for the technique and the scientific domain
A variety of file formats are used in the research field of X-ray photoelectron spectroscopy and related techniques. The pynxtools-xps plugin of the pynxtools parsing library solves the challenge of how these formats can be parsed and normalized into a common representation that increases interoperability and adds semantic expressiveness.
- [pynxtools-xps](https://fairmat-nfdi.github.io/pynxtools-xps/)
pynxtools-xps, which is a plugin for [pynxtools](https://github.com/FAIRmat-NFDI/pynxtools), provides a tool for reading data from various proprietary and open data formats from technology partners and the wider XPS community and standardizing it such that it is compliant with the [NeXus](https://www.nexusformat.org/) application definition [`NXmpes`](https://fairmat-nfdi.github.io/nexus_definitions/classes/contributed_definitions/NXmpes.html).
## Supported file formats
A list of the supported file formats can be found in the [pynxtools-xps](https://fairmat-nfdi.github.io/pynxtools-xps/) documentation.
\ No newline at end of file
# How to release a new NOMAD version
## What is a release
NOMAD is a public service, a Git repository, a Python package, and a docker image.
What exactly is a NOMAD release? It is all of the following:
- a version tag on the main NOMAD [git project](https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR),
e.g. [`v1.3.0`](https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/tags/v1.3.0)
- a gitlab release based on a tag with potential release notes
- a version of the `nomad-lab` Python package released to pypi.org, e.g. `nomad-lab==1.3.0`.
- a docker image tag, e.g. `gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair:v1.3.0`
- the docker image tag `stable` points to the image with the latest release tag
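As a concrete illustration (using the example version above), a user or Oasis admin would consume such a release like this; which image tag to pull depends on what has been published:

```sh
# the released Python package from pypi.org
pip install nomad-lab==1.3.0

# the matching docker image, or whatever `stable` currently points to
docker pull gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair:v1.3.0
docker pull gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair:stable
```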
## Steps towards a new release
- Silently create a new version tag in the `v1.3.0` format.
- Deploy the build from this tag to the public NOMAD deployments.
Which deployments are updated might depend on the current needs, but usually
the production and test deployments should be updated.
- Release the Python package to the local gitlab registry. (This will update the
NORTH Jupyter image in the next nightly build and will most likely affect plugins.)
- Bump the `latest` docker image tag.
- For minor and major releases, encourage (Oasis) users to test the public services and the latest docker image for a short trial phase (e.g. 3 days). For patch releases this step should be
skipped.
- Create a gitlab release from the tag with potential release notes. Those notes
should also be added to the README.md. It is ok if the updated README.md is not part of the
release itself.
- Bump the `stable` docker image tag.
- Publish the Python package to [pypi.org](https://pypi.org/)
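The tagging and image bumps in the steps above boil down to a few standard git and docker commands. This is only a sketch; any CI-driven automation and the exact image names should be checked against the project setup:

```sh
# tag the release commit and push the tag
git tag -a v1.3.0 -m "NOMAD v1.3.0"
git push origin v1.3.0

# after the trial phase, re-point the stable tag to the released image
docker pull gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair:v1.3.0
docker tag gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair:v1.3.0 \
  gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair:stable
docker push gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair:stable
```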
## How to deal with hotfixes
This depends on the current `develop` branch and requires a judgement call. There are
two opposing scenarios:
1. The `develop` branch only contains minor fixes or fixes/features that are not likely to affect
the released functionality. In this case, a new release with an increased patch version
is the right call.
2. The `develop` branch adds major refactorings and commits that likely affect the
released functionality. In this case, a `v1.3.0-hotfix` branch should be created.
After adding commits with the hotfix, the release process can be applied to the
hotfix branch in order to create a `v1.3.1` release that only contains the hotfixes and
not the changes on develop. After the `v1.3.1` release, the `v1.3.0-hotfix` branch is merged
back into develop. Hotfix branches should not live longer than a week.
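As a rough sketch of the second scenario (branch and version names are only examples):

```sh
# branch off the released tag, not develop
git checkout -b v1.3.0-hotfix v1.3.0

# ...commit the fixes, then tag and release v1.3.1 from this branch...
git tag -a v1.3.1 -m "NOMAD v1.3.1 (hotfix)"
git push origin v1.3.0-hotfix v1.3.1

# afterwards, merge the hotfix branch back into develop
git checkout develop
git merge v1.3.0-hotfix
```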
## Major, minor, patch versions
- **patch**: No significant refactorings. Only new/updated features behind disabled feature switches.
Bugfixes. Might mark features as deprecated.
- **minor**: Might enable new features by default. Can contain major refactorings (especially if they affect plugin developers, data stewards, etc.). Might finally deprecate features.
Should "basically" be backwards compatible.
- **major**: Breaking changes and will require data migration.
What is a *breaking change* and what does "basically" backwards compatible mean?
We develop experimental functionality and often need multiple iterations
to get a feature right. This also means that we technically introduce breaking changes
far more often than we can issue major releases. It is again a judgement call to decide on
major vs minor. The following things would generally not be considered *breaking* and would be considered *backwards compatible*:
- the breaking change is for a feature that is not enabled by default
- data migration is necessary for new functionality, but optional for existing functionality
- it is unlikely that plugins not developed by FAIRmat are affected
- it is unlikely that data beyond the central NOMAD deployments need to be migrated
## Release schedule
Patch releases should happen frequently and at least once every other month. Minor
releases should also be done semi-regularly. Important new features or at least
bi-annual FAIRmat events should trigger a minor release. Major releases require
more involved planning, data migration, and respective instructions and assistance to
NOMAD (Oasis) users. They are also political. Therefore, they do not have a regular
schedule.
With a single `develop` branch Git strategy, there might be necessary exceptions to
regular patch releases. In general, new features should be protected by feature switches
and should not be an issue. However, major refactorings that might affect multiple components are hard to hide behind a feature switch. In such cases, the release schedule might be
put on hold for another month or two.
\ No newline at end of file
......@@ -2,10 +2,10 @@
<!-- # Operating an OASIS -->
Originally, NOMAD Central Repository is a service run at Max-Planck's computing facility in Garching, Germany.
Originally, the NOMAD Central Repository is a service that runs at Max-Planck's computing facility in Garching, Germany.
However, the NOMAD software is Open-Source, and everybody can run it. Any service that
uses NOMAD software independently is called a *NOMAD OASIS*. A *NOMAD OASIS* does not
need to be fully isolated. For example, you can publish uploads from your OASIS to the
uses NOMAD software independently is called a *NOMAD Oasis*. A *NOMAD Oasis* does not
need to be fully isolated. For example, you can publish uploads from your NOMAD Oasis to the
central NOMAD installation.
!!! note
......@@ -76,12 +76,13 @@ add up quickly, especially if many CPU cores are available for processing entrie
parallel. We recommend at least 2GB per core and a minimum of 8GB. You also need to consider
RAM and CPU for running tools like jupyter, if you opt to use NOMAD NORTH.
### Sharing data through the logtransfer service and data privacy notice
### Sharing data through log transfer and data privacy notice
NOMAD includes a `logtransfer` service. When enabled this service automatically collects
and transfers non-personalized log-data to us. Currently, this service is experimental
NOMAD includes a *log transfer* function. When enabled, it automatically collects
and transfers non-personalized logging data to us. Currently, this functionality is experimental
and requires opt-in. However, in upcoming versions of NOMAD Oasis, we might change to opt-out.
See the instructions in the configuration below on how to enable/disable `logtransfer`.
To enable this functionality, add `logtransfer.enabled: true` to your `nomad.yaml`.
The service collects log-data and aggregated statistics, such as the number of users or the
number of uploaded datasets. In any case this data does not personally identify any users or
......@@ -93,7 +94,7 @@ The data is solely used by the NOMAD developers and FAIRmat, including but not l
* Improving our NOMAD software based on usage patterns.
* Generating aggregated and anonymized reports.
We do not share any data collected through the `logtransfer` service with any third parties.
We do not share any collected data with any third parties.
We may update this data privacy notice from time to time to reflect changes in our data practices.
We encourage you to review this notice periodically for any updates.
......@@ -168,9 +169,6 @@ Changes necessary:
- The group in the value of the hub's user parameter needs to match the docker group
on the host. This should ensure that the user which runs the hub has the rights to access the host's docker.
- On Windows or MacOS computers you have to run the `app` and `worker` container without `user: '1000:1000'` and the `north` container with `user: root`.
- To opt-in the `logtransfer` service
([data notice above](#sharing-data-through-the-logtransfer-service-and-data-privacy-notice)), start `docker compose`
with the flag `--profile with_logtransfer`. See also below for further necessary adaptations in the `nomad.yaml` file.
A few things to notice:
......@@ -201,7 +199,7 @@ You should change the following:
users back to this host. Make sure this is the hostname your users can use.
- Replace `deployment`, `deployment_url`, and `maintainer_email` with representative values.
The `deployment_url` should be the url to the deployment's api (should end with `/api`).
- To enable the `logtransfer` service activate logging in `logstash` format by setting `enable: true`.
- To enable the *log transfer* set `logtransfer.enable: true` ([data privacy notice above](#sharing-data-through-the-logtransfer-service-and-data-privacy-notice)).
- You can change `api_base_path` to run NOMAD under a different path prefix.
- You should generate your own `north.jupyterhub_crypt_key`. You can generate one
with `openssl rand -hex 32`.
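Put together, the relevant fragment of a `nomad.yaml` could look roughly like the sketch below. The hostname and names are placeholders, and the exact nesting of some keys is an assumption; check the configuration reference of your NOMAD version:

```yaml
services:
  api_host: 'nomad.example.com'   # the hostname your users can use
  api_base_path: '/nomad-oasis'   # path prefix NOMAD runs under
meta:
  deployment: 'my-oasis'
  deployment_url: 'https://nomad.example.com/nomad-oasis/api'
  maintainer_email: 'admin@example.com'
logtransfer:
  enable: true                    # opt-in to log transfer (see data privacy notice above)
north:
  jupyterhub_crypt_key: '<output of: openssl rand -hex 32>'
```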
......@@ -485,12 +483,6 @@ docker run --rm -v `pwd`/nginx.conf:/etc/nginx/conf.d/default.conf -p 80:80 ngin
### Running NOMAD
Before you start, we need to transfer your `nomad.yaml` config values to the GUI's
javascript. You need to repeat this, if you change your `nomad.yaml`. You can do this by running:
```
nomad admin ops gui-config
```
To run NOMAD, you must run two services. One is the NOMAD app, it serves the API and GUI:
```sh
--8<-- "run.sh"
......@@ -498,7 +490,7 @@ To run NOMAD, you must run two services. One is the NOMAD app, it serves the API
the second is the NOMAD worker, which runs the NOMAD processing.
```
celery -A nomad.processing worker -l info -Q celery
--8<-- "run_worker.sh"
```
This should give you a working OASIS at `http://<your-host>/<your-path-prefix>`.
......
......@@ -52,7 +52,7 @@ The definition fo the actual app is given as an instance of the `App` class spec
```python
from nomad.config.models.plugins import AppEntryPoint
from nomad.config.models.ui import App, Column, Columns, FilterMenu, FilterMenus
from nomad.config.models.ui import App, Column, Columns, FilterMenu, FilterMenus, Filters
myapp = AppEntryPoint(
......@@ -69,22 +69,34 @@ myapp = AppEntryPoint(
description='An app customized for me.',
# Longer description that can also use markdown
readme='Here is a much longer description of this app.',
# Controls the available search filters. If you want to filter by
# quantities in a schema package, you need to load the schema package
# explicitly here. Note that you can use a glob syntax to load the
# entire package, or just a single schema from a package.
filters=Filters(
include=['*#nomad_example.schema_packages.mypackage.MySchema'],
),
# Controls which columns are shown in the results table
columns=Columns(
selected=['entry_id'],
selected=[
'entry_id',
'data.mysection.myquantity#nomad_example.schema_packages.mypackage.MySchema',
],
options={
'entry_id': Column(),
'upload_create_time': Column(),
'data.mysection.myquantity#nomad_example.schema_packages.mypackage.MySchema': Column(),
}
),
# Dictionary of search filters that are always enabled for queries made
# within this app. This is especially important to narrow down the
# results to the wanted subset. Any available search filter can be
# targeted here.
# targeted here. This example makes sure that only entries that use
# MySchema are included.
filters_locked={
'upload_create_time': {
'gte': 0
}
"section_defs.definition_qualified_name:all": [
"nomad_example.schema_packages.mypackage.MySchema"
]
},
# Controls the filter menus shown on the left
filter_menus=FilterMenus(
......@@ -101,7 +113,7 @@ myapp = AppEntryPoint(
'autorange': True,
'nbins': 30,
'scale': 'linear',
'quantity': 'results.material.n_elements',
'quantity': 'data.mysection.myquantity#nomad_example.schema_packages.mypackage.MySchema',
'layout': {
'lg': {
'minH': 3,
......@@ -118,10 +130,31 @@ myapp = AppEntryPoint(
)
)
```
!!! tip
If you want to load an app definition from a YAML file, this can be easily done with the pydantic `parse_obj` function:
```python
import yaml
from nomad.config.models.plugins import AppEntryPoint
from nomad.config.models.ui import App
yaml_data = """
label: My App
path: myapp
category: Theory
"""
myapp = AppEntryPoint(
    name='MyApp',
    description='App defined using the new plugin mechanism.',
    app=App.parse_obj(
        yaml.safe_load(yaml_data)
    ),
)
```
### Loading custom quantity definitions into an app
By default, none of the quantities from custom schemas are available in an app, and they need to be explicitly added. Each app may define additional **filters** that should be enabled in it. Filters have a special meaning in the app context: filters are pieces of (meta)info than can be queried in the search interface of the app, but also targeted in the rest of the app configuration as explained below in.
By default, none of the quantities from custom schemas are available in an app, and they need to be explicitly added. Each app may define additional **filters** that should be enabled in it. Filters have a special meaning in the app context: filters are pieces of (meta)info that can be queried in the search interface of the app, but also targeted in the rest of the app configuration as explained below.
!!! note
......@@ -135,8 +168,8 @@ schema is defined in:
- Python schemas are identified by the python path for the class that inherits
from `Schema`. For example, if you have a python package called `nomad_example`,
which has a subpackage called `schema_packages`, containing a module called `myschema.py`, which contains the class `MySchema`, then
the schema name will be `nomad_example.schema_packages.myschema.MySchema`.
which has a subpackage called `schema_packages`, containing a module called `mypackage.py`, which contains the class `MySchema`, then
the schema name will be `nomad_example.schema_packages.mypackage.MySchema`.
- YAML schemas are identified by the entry id of the schema file together with
the name of the section defined in the YAML schema. For example
if you have uploaded a schema YAML file containing a section definition called
......@@ -147,41 +180,42 @@ The quantities from schemas may be included or excluded as filter by using the
[`filters`](#filters) field in the app config. This option supports a
wildcard/glob syntax for including/excluding certain filters. For example, to
include all filters from the Python schema defined in the class
`nomad_example.schema_packages.myschema.MySchema`, you could use:
`nomad_example.schema_packages.mypackage.MySchema`, you could use:
```yaml
myapp:
filters:
include:
- '*#nomad_example.schema_packages.myschema.MySchema'
```python
filters=Filters(
include=['*#nomad_example.schema_packages.mypackage.MySchema']
)
```
The same thing for a YAML schema could be achieved with:
```yaml
myapp:
filters:
include:
- '*#entry_id:<entry_id>.MySchema'
```python
filters=Filters(
include=['*#entry_id:<entry_id>.MySchema']
)
```
Once quantities from a schema are included in an app as filters, they can be targeted in the rest of the app. The app configuration often refers to specific filters to configure parts of the user interface. For example, one could configure the results table to show a new column using one of the schema quantities with:
```yaml
myapp:
columns:
include:
- 'data.mysection.myquantity#nomad_example.schema_packages.myschema.MySchema'
- 'entry_id'
options:
data.mysection.myquantity#myschema.schema.MySchema:
...
```python
columns=Columns(
    selected=[
        'entry_id',
        'data.mysection.myquantity#nomad_example.schema_packages.mypackage.MySchema',
    ],
    options={
        'entry_id': Column(),
        'upload_create_time': Column(),
        'data.mysection.myquantity#nomad_example.schema_packages.mypackage.MySchema': Column(),
    }
)
```
The syntax for targeting quantities depends on the resource:
- For python schemas, you need to provide the path and the python schema name separated
by a hashtag (#), for example `data.mysection.myquantity#nomad_example.schema_packages.myschema.MySchema`.
by a hashtag (#), for example `data.mysection.myquantity#nomad_example.schema_packages.mypackage.MySchema`.
- For YAML schemas, you need to provide the path and the YAML schema name separated
by a hashtag (#), for example `data.mysection.myquantity#entry_id:<entry_id>.MySchema`.
- Quantities that are common for all NOMAD entries can be targeted by using only
......
......@@ -177,12 +177,22 @@ we will get a final normalized archive that contains our data like this:
}
```
## Migration guide
By default, schema packages are identified by the fully qualified path to the Python module that contains the definitions. An example of a fully qualified path could be `nomad_example.schema_packages.mypackage`, where the first part is the Python package name, the second part is a subpackage, and the last part is a Python module containing the definitions. This is the easiest way to prevent conflicts between different schema packages: Python package names are unique (prevents clashes between packages) and paths inside a package must point to a single Python module (prevents clashes within a package). This does, however, mean that *if you move your schema definition in the plugin source code, any references to the old definition will break*. This becomes problematic in installations that have a lot of old data processed with the old definition location, as those entries will still refer to the old location and will not work correctly.
As it might not be possible, or even wise, to prevent changes in the source code layout, and reprocessing all old entries might be impractical, we provide an alias mechanism to help with migration tasks. Imagine your schema package was contained in `nomad_example.schema_packages.mypackage`, and in a newer version of your plugin you want to move it to `nomad_example.schema_packages.mynewpackage`. The way to do this without completely breaking the old entries is to add an alias in the schema package definition:
```python
m_package = SchemaPackage(aliases=['nomad_example.schema_packages.mypackage'])
```
Note that this will only help in scenarios where you have moved the definitions and not removed or modified any of them.
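For orientation, a minimal sketch of how the renamed module could look; the module name `mynewpackage.py`, the imports, and the small example section are assumptions based on the usual layout of NOMAD schema packages:

```python
# src/nomad_example/schema_packages/mynewpackage.py (moved from mypackage.py)
from nomad.datamodel.data import Schema
from nomad.metainfo import Quantity, SchemaPackage

# The alias keeps references stored in already-processed entries working.
m_package = SchemaPackage(aliases=['nomad_example.schema_packages.mypackage'])


class MySchema(Schema):
    my_quantity = Quantity(type=str)


m_package.__init_metainfo__()
```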
## Definitions
The following describes in detail the schema language for the NOMAD Metainfo and how it is expressed in Python.
### Common attributes of Metainfo Definitions
In the example, you have already seen the basic Python interface to the Metainfo. *Sections* are
......
......@@ -27,6 +27,7 @@ A series of tutorials will guide you through the main functionality of NOMAD.
- [Use the search interface to identify interesting data](tutorial/explore.md)
- [Use the API to search and access processed data for analysis](tutorial/access_api.md)
- [Create and use custom schemas in NOMAD](tutorial/custom.md)
- [Developing a NOMAD plugin](tutorial/develop_plugin.md)
- [Example data and exercises](https://www.fairmat-nfdi.eu/events/fairmat-tutorial-1/tutorial-1-materials){:target="_blank"}
- [More videos and tutorials on YouTube](https://youtube.com/playlist?list=PLrRaxjvn6FDW-_DzZ4OShfMPcTtnFoynT){:target="_blank"}
......
# Developing a NOMAD Plugin
In this tutorial you will learn how to create and develop a NOMAD plugin. As an example we
will create a plugin to log data for a simple sintering process.
## Prerequisites
- A GitHub account. This can be created for free on [github.com](https://github.com/signup?ref_cta=Sign+up&ref_loc=header+logged+out&ref_page=%2F&source=header-home).
- Basic understanding of Python.
- Basic understanding of NOMAD metainfo, see for example [tutorial 8](https://www.fairmat-nfdi.eu/events/fairmat-tutorial-8/tutorial-8-materials).
!!! note
Several software development concepts are being used during this tutorial.
Here is a list with some further information on each of them:
* [what is Git](https://learn.microsoft.com/en-us/devops/develop/git/what-is-git)
* [what is VSCode, i.e., an Integrated Development Environment (IDE)](https://aws.amazon.com/what-is/ide/)
* [what is Pip](https://realpython.com/lessons/what-is-pip-overview/)
* [what is a Python virtual environment](https://realpython.com/python-virtual-environments-a-primer/#why-do-you-need-virtual-environments)
* [creating a Python package](https://packaging.python.org/en/latest/tutorials/packaging-projects/)
* [uploading a package to PyPI](https://www.freecodecamp.org/news/how-to-create-and-upload-your-first-python-package-to-pypi/)
* [what is cruft](https://cruft.github.io/cruft/)
## Create a Git(Hub) repository
Firstly, we recommend using Git to version control your NOMAD plugin.
There is a GitHub template repository that can be used for this at [github.com/FAIRmat-NFDI/nomad-plugin-template](https://github.com/FAIRmat-NFDI/nomad-plugin-template).
To use the template you should choose the "Create a new repository" option after pressing
the green "Use this template" button in the upper right corner.
Please note that you have to be logged in to GitHub to see this option.
![Use template](./images/use_template_dark.png#gh-dark-mode-only)
![Use template](./images/use_template_light.png#gh-light-mode-only)
Enter a name (I will use "nomad-sintering" for mine) for your repository and click
"Create Repository".
## Generate the plugin structure
Next, we will use a cookiecutter template to create the basic structure of our NOMAD
plugin.
There are now two options for how to proceed.
1. You can use the GitHub codespaces environment to develop your plugin, or
2. If you have access to a Linux computer you can also run the same steps locally.
### 1. Using GitHub codespaces
To use a GitHub codespace for the plugin development you should choose the "Create
codespace on main" option after pressing the green "<> Code" button in the upper right
corner.
![Use codepace](./images/codespace_dark.png#gh-dark-mode-only)
![Use codespace](./images/codespace_light.png#gh-light-mode-only)
### 2. Developing locally
If you have a Linux machine and prefer to develop locally you should **instead** click the
"Local" tab after pressing the green "<> Code" button, copy the path, and clone your
repository by running:
```sh
git clone PATH/COPIED/FROM/REPOSITORY
```
and move inside the top directory
```
cd REPOSITORY_NAME
```
You will also need to install [cruft](https://pypi.org/project/cruft/), preferably using
`pipx`:
```sh
# pipx is strongly recommended.
pipx install cruft
# If pipx is not an option,
# you can install cruft in your Python user directory.
python -m pip install --user cruft
```
### Run cruft
The next step is to run cruft to use our cookiecutter template:
```sh
cruft create https://github.com/FAIRmat-NFDI/cookiecutter-nomad-plugin
```
Cookiecutter prompts you for information regarding your plugin and I will enter the
following for my example:
```no-highlight
[1/12] full_name (John Doe): Hampus Näsström
[2/12] email (john.doe@physik.hu-berlin.de): hampus.naesstroem@physik.hu-berlin.de
[3/12] github_username (foo): hampusnasstrom
[4/12] plugin_name (foobar): sintering
[5/12] module_name (sintering):
[6/12] short_description (Nomad example template): A schema package plugin for sintering.
[7/12] version (0.1.0):
[8/12] Select license
1 - MIT
2 - BSD-3
3 - GNU GPL v3.0+
4 - Apache Software License 2.0
Choose from [1/2/3/4] (1):
[9/12] include_schema_package [y/n] (y): y
[10/12] include_normalizer [y/n] (y): n
[11/12] include_parser [y/n] (y): n
[12/12] include_app [y/n] (y): n
```
There you go - you just created a minimal NOMAD plugin:
!!! note
In the above prompt, we pressed `y` for schema_package, which creates a python package
with a plugin entry point for a schema package.
```no-highlight
nomad-sintering/
├── LICENSE
├── MANIFEST.in
├── README.md
├── docs
│   └── ...
├── mkdocs.yml
├── move_template_files.sh
├── pyproject.toml
├── src
│   └── nomad_sintering
│       ├── __init__.py
│       └── schema_packages
│           ├── __init__.py
│           └── mypackage.py
└── tests
    ├── conftest.py
    ├── data
    │   └── test.archive.yaml
    └── schema_packages
        └── test_schema.py
```
!!! note
The project `nomad-sintering` is created in a new directory. We have included a helper script to move all the files to the parent level of the repository.
```sh
sh CHANGE_TO_PLUGIN_NAME/move_template_files.sh
```
!!! warning "Attention"
The `CHANGE_TO_PLUGIN_NAME` should be substituted by the name of the plugin you've created. In the above case it'll be `sh nomad-sintering/move_template_files.sh`.
Finally, we should add the files we created to git and commit the changes we have made:
```sh
git add -A
git commit -m "Generated plugin from cookiecutter template"
git push
```
### Enable Cruft updates
In order to receive updates from our cookiecutter template, we have included a GitHub
action that automatically checks for updates once a week (or when triggered manually).
In order for this action to run we need to give the action permission to write and create
pull requests. To do this we should go back to the plugin repo and head to the settings
tab and navigate to the Actions/General options on the left:
![Use template](./images/github_settings_dark.png#gh-dark-mode-only)
![Use template](./images/github_settings_light.png#gh-light-mode-only)
At the very bottom of this place you should mark the "Read and write permissions"
and the "Allow GitHub Actions to create and approve pull requests" options and click save.
![Use template](./images/workflow_permissions_dark.png#gh-dark-mode-only)
![Use template](./images/workflow_permissions_light.png#gh-light-mode-only)
## Setting up the python environment
### Creating a virtual environment
Before we can start developing, we recommend creating a virtual environment using Python 3.9:
```sh
python3.9 -m venv .pyenv
source .pyenv/bin/activate
```
### Installing the plugin
Next, we should install our plugin package in editable mode, using the NOMAD package
index:
```sh
pip install --upgrade pip
pip install -e '.[dev]' --index-url https://gitlab.mpcdf.mpg.de/api/v4/projects/2187/packages/pypi/simple
```
!!! note
Until we have an official PyPI NOMAD release with the latest NOMAD version, make sure to include NOMAD's internal package registry (e.g. via `--index-url`). The latest PyPI package available today is version 1.2.2 and it is missing some updates needed for this tutorial.
In the future, when a newer release of `nomad-lab` (> 1.2.2) is available, you can omit the `--index-url`.
## Importing a yaml schema
### The schema
We will now convert the yaml schema package from part 2 where we described a sintering
step:
```yaml
definitions:
  name: 'Tutorial 13 sintering schema'
  sections:
    TemperatureRamp:
      m_annotations:
        eln:
          properties:
            order:
              - "name"
              - "start_time"
              - "initial_temperature"
              - "final_temperature"
              - "duration"
              - "comment"
      base_sections:
        - nomad.datamodel.metainfo.basesections.ProcessStep
      quantities:
        initial_temperature:
          type: np.float64
          unit: celsius
          description: "initial temperature set for ramp"
          m_annotations:
            eln:
              component: NumberEditQuantity
              defaultDisplayUnit: celsius
        final_temperature:
          type: np.float64
          unit: celsius
          description: "final temperature set for ramp"
          m_annotations:
            eln:
              component: NumberEditQuantity
              defaultDisplayUnit: celsius
    Sintering:
      base_sections:
        - nomad.datamodel.metainfo.basesections.Process
        - nomad.datamodel.data.EntryData
      sub_sections:
        steps:
          repeats: True
          section: '#/TemperatureRamp'
```
We can grab this file from the tutorial repository using curl
```sh
curl -L -o sintering.archive.yaml "https://raw.githubusercontent.com/FAIRmat-NFDI/AreaA-Examples/main/tutorial13/part3/files/sintering.archive.yaml"
```
### `metainfo-yaml2py`
We will now use an external package `metainfo-yaml2py` to convert the yaml schema package
into python class definitions.
First we install the package with `pip`:
```sh
pip install metainfoyaml2py
```
Then we can run the `metainfo-yaml2py` command on the `sintering.archive.yaml` file with
the `-n` flag to add `normalize()` functions (explained later)
and the `-o` flag to specify our `schema_packages` directory as the output
directory:
```sh
metainfo-yaml2py sintering.archive.yaml -o src/nomad_sintering/schema_packages -n
```
### Updating `__init__.py` and `pyproject.toml`
The metadata of our package is defined in the `__init__.py` file and here we now need to
add the sintering package that we just created.
If we take a look in that file we can see an example created by the cookiecutter template.
We can go ahead and copy the `MySchemaPackageEntryPoint` class and the `mypackage`
instance and paste them below.
We then need to change:
1. the name of the class,
2. the import in the load function to import our sintering schema package,
3. the name of the instance and the class it uses,
4. ideally we should also update the description and the name.
The changes could look something like this:
```py
class SinteringEntryPoint(SchemaPackageEntryPoint):
    def load(self):
        from nomad_sintering.schema_packages.sintering import m_package

        return m_package


sintering = SinteringEntryPoint(
    name='Sintering',
    description='Schema package for describing a sintering process.',
)
```
Finally, we also need to add our new entry point to the `pyproject.toml`.
At the bottom of the toml you will see how this was done for the example and we just need
to replicate that with whatever we called our instance:
```toml
sintering = "nomad_sintering.schema_packages:sintering"
```
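For context, this line sits in the plugin entry-points table of `pyproject.toml`. The table header shown below reflects the layout the template usually generates and should be checked against your own file:

```toml
[project.entry-points.'nomad.plugin']
sintering = "nomad_sintering.schema_packages:sintering"
```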
Before we continue, we should commit our changes to git:
```sh
git add -A
git commit -m "Added sintering classes from yaml schema"
git push
```
### Ruff autoformatting
If we check the actions tab of the GitHub repository we might see that the last commit
caused an error in the Ruff format checking. We can either disable this workflow (not
recommended) or we can check and format our code with Ruff.
To check what Ruff thinks about our code we run:
```sh
ruff check .
```
To fix any issues we can run:
```sh
ruff check . --fix
```
And commit the changes:
```sh
git add -A
git commit -m "Ruff linting"
git push
```
## Adding a normalize function
Next we will add some functionality to our use case through a so-called "normalize"
function. This allows us to add functionality to our schemas via Python code.
### The use case
For this tutorial we will assume that we have a recipe file for our hot plate that we will
parse:
```csv
step name,duration [min],initial temperature [C],final temperature [C]
heating, 30, 25, 300
hold, 60, 300, 300
cooling, 30, 300, 25
```
We can grab this file from the tutorial repository and place it in the tests/data
directory using curl
```sh
curl -L -o tests/data/sintering_example.csv "https://raw.githubusercontent.com/FAIRmat-NFDI/AreaA-Examples/main/tutorial13/part3/files/sintering_example.csv"
```
### Adding the code
The first thing we need to add is a new `Quantity` in our `Sintering` class to hold the
recipe file:
```py
data_file = Quantity(
    type=str,
    description='The recipe file for the sintering process.',
    a_eln={
        "component": "FileEditQuantity",
    },
)
```
Here we have used the `a_eln` component annotation to add a `FileEditQuantity`. You will
see in part 4 how this looks in the GUI.
Secondly we need to update the normalize method to read the data file and update the
corresponding data.
First we will check if the `self.data_file` is present and, if so, use the
`archive.m_context.raw_file()` method to open the file and read it with the pandas
function `read_csv()`:
```py
if self.data_file:
    with archive.m_context.raw_file(self.data_file) as file:
        df = pd.read_csv(file)
```
We will then create a list to hold the steps, iterate over our data frame, create an
instance of a `TemperatureRamp`, and fill them.
```py
steps = []
for i, row in df.iterrows():
    step = TemperatureRamp()
    step.name = row['step name']
    step.duration = ureg.Quantity(float(row['duration [min]']), 'min')
    step.initial_temperature = ureg.Quantity(row['initial temperature [C]'], 'celsius')
    step.final_temperature = ureg.Quantity(row['final temperature [C]'], 'celsius')
    steps.append(step)
```
Here we have used the NOMAD unit registry to handle all the units.
Finally, we will assign our new list of steps to `self.steps`.
```py
self.steps = steps
```
We also need to add the import of pandas and the NOMAD unit registry to the top of our
`sintering.py` file:
```py
from nomad.units import ureg
import pandas as pd
```
Here are all the changes combined:
```py
from nomad.units import ureg
import pandas as pd


class Sintering(Process, EntryData, ArchiveSection):
    '''
    Class autogenerated from yaml schema.
    '''
    m_def = Section()
    steps = SubSection(
        section_def=TemperatureRamp,
        repeats=True,
    )
    data_file = Quantity(
        type=str,
        description='The recipe file for the sintering process.',
        a_eln={
            "component": "FileEditQuantity",
        },
    )

    def normalize(self, archive, logger: BoundLogger) -> None:
        '''
        The normalizer for the `Sintering` class.

        Args:
            archive (EntryArchive): The archive containing the section that is being
                normalized.
            logger (BoundLogger): A structlog logger.
        '''
        super(Sintering, self).normalize(archive, logger)

        if self.data_file:
            with archive.m_context.raw_file(self.data_file) as file:
                df = pd.read_csv(file)
            steps = []
            for i, row in df.iterrows():
                step = TemperatureRamp()
                step.name = row['step name']
                step.duration = ureg.Quantity(float(row['duration [min]']), 'min')
                step.initial_temperature = ureg.Quantity(row['initial temperature [C]'], 'celsius')
                step.final_temperature = ureg.Quantity(row['final temperature [C]'], 'celsius')
                steps.append(step)
            self.steps = steps
```
## Running the normalize function
We will now run the NOMAD processing on a test file to see the normalize function in
action.
### Create an archive.json file
The first step is to create the test file.
We should add a file ending in `.archive.yaml` or `.archive.json` that contains
a `data` section with an `m_def` key whose value is our sintering section.
Finally, we should also add the `data_file` key with the value being our `.csv` file from
before.
```yaml
data:
  m_def: nomad_sintering.schema_packages.sintering.Sintering
  data_file: sintering_example.csv
```
We can once again grab this file from the tutorial repository and place it in the
tests/data directory using curl
```sh
curl -L -o tests/data/test_sintering.archive.yaml "https://raw.githubusercontent.com/FAIRmat-NFDI/AreaA-Examples/main/tutorial13/part3/files/test_sintering.archive.yaml"
```
!!! warning "Attention"
You might need to modify the package name for the `m_def` if you called your python
module something other than `nomad_sintering`
### Run the NOMAD CLI
To run the processing we use the NOMAD CLI command `parse` with the flag `--show-archive`
and save the output in a JSON file:
```sh
nomad parse tests/data/test_sintering.archive.yaml --show-archive > normalized.archive.json
```
However, when we run this we will get an error from NOMAD!
```bash
could not normalize section (normalizer=MetainfoNormalizer, section=Sintering, exc_info=Cannot convert from 'milliinch' ([length]) to 'second' ([time]))
```
What is happening here is that it has treated our `'min'` unit for duration as `'milliinch'`
and not the intended minutes. To fix this we can directly edit the normalize function
of the `Sintering` class in the `sintering.py` file by replacing `'min'` with `'minutes'`.
```py
def normalize(self, archive: 'EntryArchive', logger: 'BoundLogger') -> None:
    """
    The normalizer for the `Sintering` class.

    Args:
        archive (EntryArchive): The archive containing the section that is being
            normalized.
        logger (BoundLogger): A structlog logger.
    """
    super().normalize(archive, logger)

    if self.data_file:
        with archive.m_context.raw_file(self.data_file) as file:
            df = pd.read_csv(file)
        steps = []
        for i, row in df.iterrows():
            step = TemperatureRamp()
            step.name = row['step name']
            # Changed 'min' to 'minutes' here:
            step.duration = ureg.Quantity(float(row['duration [min]']), 'minutes')
```
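As a quick sanity check, you can reproduce the unit ambiguity interactively. This is just a sketch, assuming the NOMAD unit registry resolves the abbreviations as reported in the error above:

```py
from nomad.units import ureg

print(ureg.Quantity(30.0, 'min').dimensionality)      # [length] -> 'min' means milli-inch here
print(ureg.Quantity(30.0, 'minutes').dimensionality)  # [time]   -> the intended duration
```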
Since we installed our package in editable mode, the changes take effect as soon as we
save, and rerunning the `nomad parse` command above should now work.
To view the output you can open and inspect the `normalized.archive.json` file. The
beginning of that file should look something like:
```json
{
  "data": {
    "m_def": "nomad_sintering.schema_packages.sintering.Sintering",
    "name": "test sintering",
    "datetime": "2024-06-04T16:52:23.998519+00:00",
    "data_file": "sintering_example.csv",
    "steps": [
      {
        "name": "heating",
        "duration": 1800.0,
        "initial_temperature": 25.0,
        "final_temperature": 300.0
      },
      {
        "name": "hold",
        "duration": 3600.0,
        "initial_temperature": 300.0,
        "final_temperature": 300.0
      },
      {
        "name": "cooling",
        "duration": 1800.0,
        "initial_temperature": 300.0,
        "final_temperature": 25.0
      }
    ]
  },
  ...
```
### Next steps
The next step is to include your new schema in a custom NOMAD Oasis. For more information
on how to set up a NOMAD Oasis you can have a look at
[How-to guides/NOMAD Oasis/Install an Oasis](../howto/oasis/install.md).
Before we move on, we should make sure that we have committed our changes to git:
```sh
git add -A
git commit -m "Added a normalize function to the Sintering schema"
git push
```
docs/tutorial/images/codespace_dark.png (new image, 147 KiB)
docs/tutorial/images/codespace_light.png (new image, 149 KiB)
docs/tutorial/images/github_settings_dark.png (new image, 166 KiB)
docs/tutorial/images/github_settings_light.png (new image, 165 KiB)