Compare revisions
Commits on Source (101)
Showing with 954 additions and 79 deletions
......@@ -83,7 +83,6 @@ share/python-wheels/
.installed.cfg
*.egg
MANIFEST
nexus.obj
# PyInstaller
# Usually these files are written by a python script from a template
......@@ -450,3 +449,7 @@ docker-compose.local.yml
# nexus objects
nexus.obj
celerybeat-schedule.dir
celerybeat-schedule.dat
celerybeat-schedule.bak
......@@ -63,5 +63,8 @@
path = dependencies/parsers/simulation
url = https://github.com/nomad-coe/simulation-parsers.git
[submodule "dependencies/parsers/example"]
path = dependencies/parsers/example
url = https://github.com/nomad-coe/nomad-parser-example.git
path = dependencies/parsers/example
url = https://github.com/nomad-coe/nomad-parser-example.git
[submodule "dependencies/nomad-aitoolkit"]
path = dependencies/nomad-aitoolkit
url = https://github.com/FAIRmat-NFDI/nomad-aitoolkit.git
......@@ -126,9 +126,6 @@ ARG SETUPTOOLS_SCM_PRETEND_VERSION='0.0'
# Build documentation
# This is a temporary workaround because atomisticparsers installs an older version
# of nomad-lab via pip install git+...still containing pynxtools as a submodule
RUN pip uninstall -y pynxtools
RUN pip install ".[parsing,infrastructure,dev]"
RUN ./scripts/generate_docs_artifacts.sh \
......@@ -174,14 +171,6 @@ RUN pip install --progress-bar off --prefer-binary -r requirements.txt
# install
COPY --from=dev_python /app/dist/nomad-lab-*.tar.gz .
RUN pip install nomad-lab-*.tar.gz
# This is a temporary workaround because atomisticparsers installs an older version
# of nomad-lab via pip install git+...still containing pynxtools as a submodule.
RUN pip uninstall -y pynxtools
RUN pip install pynxtools[convert]
# This is a temporary workaround because pynxtools installs an incompatible
# version of h5grove
RUN pip uninstall -y h5grove
RUN pip install h5grove[fastapi]==1.3.0
# Reduce the size of the packages
RUN find /usr/local/lib/python3.9/ -type d -name 'tests' ! -path '*/networkx/*' -exec rm -r '{}' + \
......@@ -213,6 +202,7 @@ WORKDIR /app
# transfer installed packages from the build stage
COPY --chown=nomad:1000 scripts/run.sh .
COPY --chown=nomad:1000 scripts/run-worker.sh .
COPY --chown=nomad:1000 nomad/jupyterhub_config.py ./nomad/jupyterhub_config.py
COPY --chown=nomad:1000 --from=dev_python /app/examples/data/uploads /app/examples/data/uploads
......
......@@ -52,8 +52,15 @@ Omitted versions are plain bugfix releases with only minor changes and fixes. Th
file [`CHANGELOG.md`](https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/blob/develop/CHANGELOG.md)
contains much more detailed information about changes and fixes in the released versions.
### v1.2.2
- Added gunicorn multi-process manager to serve the app.
### v1.3.0
- More concise plugins and more plugin types: normalizers and apps.
- Refactored the labels in ELNs and the archive browser to be more consistent.
- Optional simplified log-transfer.
- An "all" group that allows to share unpublished data with everyone.
- HDF5 references.
- Dynamic search quantities in search Apps.
- Added gunicorn multi-process manager to serve NOMAD Oasis.
- Added the graph API.
### v1.2.1
- CLI utility to export archive data
......
Subproject commit bced0fceeb691e3ecef1f4b62c198edf8d137484
Subproject commit 4741354878819a8f108fe03679480965127417f6
Subproject commit 7e2734de71cef61f6511955245606abdce6b78ad
Subproject commit 72aa44b44b453255b7d16c480d4d2f5880f0e8f5
Subproject commit b3c3d7414918d69ddad9e1398133f6c60de541d8
......@@ -35,20 +35,20 @@ Each of the _mainfiles_ represent an electronic-structure calculation (either [D
graph LR;
A2((Inputs)) --> B2[DFT];
A1((Inputs)) --> B1[DFT];
subgraph pressure P<sub>2</sub>
subgraph pressure P2
B2[DFT] --> C2[TB];
C2[TB] --> D21[DMFT at T<sub>1</sub>];
C2[TB] --> D22[DMFT at T<sub>2</sub>];
C2[TB] --> D21[DMFT at T1];
C2[TB] --> D22[DMFT at T2];
end
D21[DMFT at T<sub>1</sub>] --> E21([Output calculation P<sub>2</sub>, T<sub>1</sub>])
D22[DMFT at T<sub>2</sub>] --> E22([Output calculation P<sub>2</sub>, T<sub>2</sub>])
subgraph pressure P<sub>1</sub>
D21[DMFT at T1] --> E21([Output calculation P2, T1])
D22[DMFT at T2] --> E22([Output calculation P2, T2])
subgraph pressure P1
B1[DFT] --> C1[TB];
C1[TB] --> D11[DMFT at T<sub>1</sub>];
C1[TB] --> D12[DMFT at T<sub>2</sub>];
C1[TB] --> D11[DMFT at T1];
C1[TB] --> D12[DMFT at T2];
end
D11[DMFT at T<sub>1</sub>] --> E11([Output calculation P<sub>1</sub>, T<sub>1</sub>])
D12[DMFT at T<sub>2</sub>] --> E12([Output calculation P<sub>1</sub>, T<sub>2</sub>])
D11[DMFT at T1] --> E11([Output calculation P1, T1])
D12[DMFT at T2] --> E12([Output calculation P1, T2])
```
Here, "Input" refers to the all _input_ information given to perform the calculation (e.g., atom positions, model parameters, experimental initial conditions, etc.). "DFT", "TB" and "DMFT" refer to individual _tasks_ of the workflow, which each correspond to a _SinglePoint_ entry in NOMAD. "Output calculation" refers to the _output_ data of each of the final DMFT tasks.
......
# Domain-specific examples for X-ray photoelectron spectroscopy
!!! warning "Attention"
We are currently working to update this content.
## Contextualization for the technique and the scientific domain
A variety of file formats are used in the research field of X-ray photoelectron spectroscopy and related techniques. The pynxtools-xps plugin of the pynxtools parsing library solves the challenge of how these formats can be parsed and normalized into a common representation that increases interoperability and adds semantic expressiveness.
- [pynxtools-xps](https://fairmat-nfdi.github.io/pynxtools-xps/)
pynxtools-xps, which is a plugin for [pynxtools](https://github.com/FAIRmat-NFDI/pynxtools), provides a tool for reading data from various proprietary and open data formats from technology partners and the wider XPS community and standardizing it such that it is compliant with the [NeXus](https://www.nexusformat.org/) application definition [`NXmpes`](https://fairmat-nfdi.github.io/nexus_definitions/classes/contributed_definitions/NXmpes.html).
## Supported file formats
A list of the supported file formats can be found in the [pynxtools-xps](https://fairmat-nfdi.github.io/pynxtools-xps/) documentation.
\ No newline at end of file
# How to release a new NOMAD version
## What is a release
NOMAD is a public service, a Git repository, a Python package, and a docker image.
What exactly is a NOMAD release? It is all of the following:
- a version tag on the main NOMAD [git project](https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR),
e.g. [`v1.3.0`](https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/-/tags/v1.3.0)
- a gitlab release based on a tag with potential release notes
- a version of the `nomad-lab` Python package released to pypi.org, e.g. `nomad-lab==1.3.0`.
- a docker image tag, e.g. `gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair:v1.3.0`
- the docker image tag `stable` points to the image with the latest release tag
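As a concrete illustration (using the example version above), a user or Oasis admin would consume such a release like this; which image tag to pull depends on what has been published:

```sh
# the released Python package from pypi.org
pip install nomad-lab==1.3.0

# the matching docker image, or whatever `stable` currently points to
docker pull gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair:v1.3.0
docker pull gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair:stable
```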
## Steps towards a new release
- Silently create a new version tag in the `v1.3.0` format.
- Deploy the build from this tag to the public NOMAD deployments.
Which deployments are updated might depend on the current needs, but usually
the production and test deployments should be updated.
- Release the Python package to the local gitlab registry. (This will update the
NORTH Jupyter image in the next nightly build and will most likely affect plugins.)
- Bump the `latest` docker image tag.
- For minor and major releases, encourage (Oasis) users to test the public services and the latest docker image for a short trial phase (e.g. 3 days). For patch releases this step should be
skipped.
- Create a gitlab release from the tag with potential release notes. Those notes
should also be added to the README.md. It is ok if the updated README.md is not part of the
release itself.
- Bump the `stable` docker image tag.
- Publish the Python package to [pypi.org](https://pypi.org/)
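The tagging and image bumps in the steps above boil down to a few standard git and docker commands. This is only a sketch; any CI-driven automation and the exact image names should be checked against the project setup:

```sh
# tag the release commit and push the tag
git tag -a v1.3.0 -m "NOMAD v1.3.0"
git push origin v1.3.0

# after the trial phase, re-point the stable tag to the released image
docker pull gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair:v1.3.0
docker tag gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair:v1.3.0 \
  gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair:stable
docker push gitlab-registry.mpcdf.mpg.de/nomad-lab/nomad-fair:stable
```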
## How to deal with hotfixes
This depends on the current `develop` branch and requires a judgement call. There are
two opposing scenarios:
1. The `develop` branch only contains minor fixes or fixes/features that are not likely to affect
the released functionality. In this case, a new release with an increased patch version
is the right call.
2. The `develop` branch adds major refactorings and commits that likely affect the
released functionality. In this case, a `v1.3.0-hotfix` branch should be created.
After adding commits with the hotfix, the release process can be applied to the
hotfix branch in order to create a `v1.3.1` release that only contains the hotfixes and
not the changes on develop. After the `v1.3.1` release, the `v1.3.0-hotfix` branch is merged
back into develop. Hotfix branches should not live longer than a week.
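As a rough sketch of the second scenario (branch and version names are only examples):

```sh
# branch off the released tag, not develop
git checkout -b v1.3.0-hotfix v1.3.0

# ...commit the fixes, then tag and release v1.3.1 from this branch...
git tag -a v1.3.1 -m "NOMAD v1.3.1 (hotfix)"
git push origin v1.3.0-hotfix v1.3.1

# afterwards, merge the hotfix branch back into develop
git checkout develop
git merge v1.3.0-hotfix
```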
## Major, minor, patch versions
- **patch**: No significant refactorings. Only new/updated features behind disabled feature switches.
Bugfixes. Might mark features as deprecated.
- **minor**: Might enable new features by default. Can contain major refactorings (especially if they affect plugin developers, data stewards, etc.). Might finally deprecate features.
Should "basically" be backwards compatible.
- **major**: Breaking changes and will require data migration.
What is a *breaking change* and what does "basically" backwards compatible mean?
We develop experimental functionality and often need multiple iterations
to get a feature right. This also means that we technically introduce breaking changes
far more often than we can issue major releases. It is again a judgement call to decide on
major vs minor. The following things would generally not be considered *breaking* and would be considered *backwards compatible*:
- the breaking change is for a feature that is not enabled by default
- data migration is necessary for new functionality, but optional for existing functionality
- it is unlikely that plugins not developed by FAIRmat are affected
- it is unlikely that data beyond the central NOMAD deployments need to be migrated
## Release schedule
Patch releases should happen frequently and at least once every other month. Minor
releases should also be done semi-regularly. Important new features or at least
bi-annual FAIRmat events should trigger a minor release. Major releases require
more involved planning, data migration, and respective instructions and assistance to
NOMAD (Oasis) users. They are also political. Therefore, they do not have a regular
schedule.
With a single `develop` branch Git strategy, there might be necessary exceptions to
regular patch releases. In general, new features should be protected by feature switches
and should not be an issue. However, major refactorings that might affect multiple components are hard to hide behind a feature switch. In such cases, the release schedule might be
put on hold for another month or two.
\ No newline at end of file
......@@ -2,10 +2,10 @@
<!-- # Operating an OASIS -->
Originally, NOMAD Central Repository is a service run at Max-Planck's computing facility in Garching, Germany.
Originally, the NOMAD Central Repository is a service that runs at Max-Planck's computing facility in Garching, Germany.
However, the NOMAD software is Open-Source, and everybody can run it. Any service that
uses NOMAD software independently is called a *NOMAD OASIS*. A *NOMAD OASIS* does not
need to be fully isolated. For example, you can publish uploads from your OASIS to the
uses NOMAD software independently is called a *NOMAD Oasis*. A *NOMAD Oasis* does not
need to be fully isolated. For example, you can publish uploads from your NOMAD Oasis to the
central NOMAD installation.
!!! note
......@@ -76,12 +76,13 @@ add up quickly, especially if many CPU cores are available for processing entrie
parallel. We recommend at least 2GB per core and a minimum of 8GB. You also need to consider
RAM and CPU for running tools like jupyter, if you opt to use NOMAD NORTH.
### Sharing data through the logtransfer service and data privacy notice
### Sharing data through log transfer and data privacy notice
NOMAD includes a `logtransfer` service. When enabled this service automatically collects
and transfers non-personalized log-data to us. Currently, this service is experimental
NOMAD includes a *log transfer* function. When enabled, it automatically collects
and transfers non-personalized logging data to us. Currently, this functionality is experimental
and requires opt-in. However, in upcoming versions of NOMAD Oasis, we might change to opt-out.
See the instructions in the configuration below on how to enable/disable `logtransfer`.
To enable this functionality, add `logtransfer.enabled: true` to your `nomad.yaml`.
The service collects log-data and aggregated statistics, such as the number of users or the
number of uploaded datasets. In any case this data does not personally identify any users or
......@@ -93,7 +94,7 @@ The data is solely used by the NOMAD developers and FAIRmat, including but not l
* Improving our NOMAD software based on usage patterns.
* Generating aggregated and anonymized reports.
We do not share any data collected through the `logtransfer` service with any third parties.
We do not share any collected data with any third parties.
We may update this data privacy notice from time to time to reflect changes in our data practices.
We encourage you to review this notice periodically for any updates.
......@@ -168,9 +169,6 @@ Changes necessary:
- The group in the value of the hub's user parameter needs to match the docker group
on the host. This should ensure that the user which runs the hub has the rights to access the host's docker.
- On Windows or MacOS computers you have to run the `app` and `worker` container without `user: '1000:1000'` and the `north` container with `user: root`.
- To opt-in the `logtransfer` service
([data notice above](#sharing-data-through-the-logtransfer-service-and-data-privacy-notice)), start `docker compose`
with the flag `--profile with_logtransfer`. See also below for further necessary adaptations in the `nomad.yaml` file.
A few things to notice:
......@@ -201,7 +199,7 @@ You should change the following:
users back to this host. Make sure this is the hostname your users can use.
- Replace `deployment`, `deployment_url`, and `maintainer_email` with representative values.
The `deployment_url` should be the url to the deployment's api (should end with `/api`).
- To enable the `logtransfer` service activate logging in `logstash` format by setting `enable: true`.
- To enable the *log transfer* set `logtransfer.enable: true` ([data privacy notice above](#sharing-data-through-the-logtransfer-service-and-data-privacy-notice)).
- You can change `api_base_path` to run NOMAD under a different path prefix.
- You should generate your own `north.jupyterhub_crypt_key`. You can generate one
with `openssl rand -hex 32`.
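Put together, the relevant fragment of a `nomad.yaml` could look roughly like the sketch below. The hostname and names are placeholders, and the exact nesting of some keys is an assumption; check the configuration reference of your NOMAD version:

```yaml
services:
  api_host: 'nomad.example.com'   # the hostname your users can use
  api_base_path: '/nomad-oasis'   # path prefix NOMAD runs under
meta:
  deployment: 'my-oasis'
  deployment_url: 'https://nomad.example.com/nomad-oasis/api'
  maintainer_email: 'admin@example.com'
logtransfer:
  enable: true                    # opt-in to log transfer (see data privacy notice above)
north:
  jupyterhub_crypt_key: '<output of: openssl rand -hex 32>'
```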
......@@ -485,12 +483,6 @@ docker run --rm -v `pwd`/nginx.conf:/etc/nginx/conf.d/default.conf -p 80:80 ngin
### Running NOMAD
Before you start, we need to transfer your `nomad.yaml` config values to the GUI's
javascript. You need to repeat this, if you change your `nomad.yaml`. You can do this by running:
```
nomad admin ops gui-config
```
To run NOMAD, you must run two services. One is the NOMAD app, it serves the API and GUI:
```sh
--8<-- "run.sh"
......@@ -498,7 +490,7 @@ To run NOMAD, you must run two services. One is the NOMAD app, it serves the API
the second is the NOMAD worker, which runs the NOMAD processing.
```
celery -A nomad.processing worker -l info -Q celery
--8<-- "run_worker.sh"
```
This should give you a working OASIS at `http://<your-host>/<your-path-prefix>`.
......
......@@ -52,7 +52,7 @@ The definition fo the actual app is given as an instance of the `App` class spec
```python
from nomad.config.models.plugins import AppEntryPoint
from nomad.config.models.ui import App, Column, Columns, FilterMenu, FilterMenus
from nomad.config.models.ui import App, Column, Columns, FilterMenu, FilterMenus, Filters
myapp = AppEntryPoint(
......@@ -69,22 +69,34 @@ myapp = AppEntryPoint(
description='An app customized for me.',
# Longer description that can also use markdown
readme='Here is a much longer description of this app.',
# Controls the available search filters. If you want to filter by
# quantities in a schema package, you need to load the schema package
# explicitly here. Note that you can use a glob syntax to load the
# entire package, or just a single schema from a package.
filters=Filters(
include=['*#nomad_example.schema_packages.mypackage.MySchema'],
),
# Controls which columns are shown in the results table
columns=Columns(
selected=['entry_id'],
selected=[
'entry_id',
'data.mysection.myquantity#nomad_example.schema_packages.mypackage.MySchema',
],
options={
'entry_id': Column(),
'upload_create_time': Column(),
'data.mysection.myquantity#nomad_example.schema_packages.mypackage.MySchema': Column(),
}
),
# Dictionary of search filters that are always enabled for queries made
# within this app. This is especially important to narrow down the
# results to the wanted subset. Any available search filter can be
# targeted here.
# targeted here. This example makes sure that only entries that use
# MySchema are included.
filters_locked={
'upload_create_time': {
'gte': 0
}
"section_defs.definition_qualified_name:all": [
"nomad_example.schema_packages.mypackage.MySchema"
]
},
# Controls the filter menus shown on the left
filter_menus=FilterMenus(
......@@ -101,7 +113,7 @@ myapp = AppEntryPoint(
'autorange': True,
'nbins': 30,
'scale': 'linear',
'quantity': 'results.material.n_elements',
'quantity': 'data.mysection.myquantity#nomad_example.schema_packages.mypackage.MySchema',
'layout': {
'lg': {
'minH': 3,
......@@ -118,10 +130,31 @@ myapp = AppEntryPoint(
)
)
```
!!! tip
If you want to load an app definition from a YAML file, this can be easily done with the pydantic `parse_obj` function:
```python
import yaml
from nomad.config.models.plugins import AppEntryPoint
from nomad.config.models.ui import App
yaml_data = """
label: My App
path: myapp
category: Theory
"""
myapp = AppEntryPoint(
    name='MyApp',
    description='App defined using the new plugin mechanism.',
    app=App.parse_obj(
        yaml.safe_load(yaml_data)
    ),
)
```
### Loading custom quantity definitions into an app
By default, none of the quantities from custom schemas are available in an app, and they need to be explicitly added. Each app may define additional **filters** that should be enabled in it. Filters have a special meaning in the app context: filters are pieces of (meta)info than can be queried in the search interface of the app, but also targeted in the rest of the app configuration as explained below in.
By default, none of the quantities from custom schemas are available in an app, and they need to be explicitly added. Each app may define additional **filters** that should be enabled in it. Filters have a special meaning in the app context: filters are pieces of (meta)info that can be queried in the search interface of the app, but also targeted in the rest of the app configuration as explained below.
!!! note
......@@ -135,8 +168,8 @@ schema is defined in:
- Python schemas are identified by the python path for the class that inherits
from `Schema`. For example, if you have a python package called `nomad_example`,
which has a subpackage called `schema_packages`, containing a module called `myschema.py`, which contains the class `MySchema`, then
the schema name will be `nomad_example.schema_packages.myschema.MySchema`.
which has a subpackage called `schema_packages`, containing a module called `mypackage.py`, which contains the class `MySchema`, then
the schema name will be `nomad_example.schema_packages.mypackage.MySchema`.
- YAML schemas are identified by the entry id of the schema file together with
the name of the section defined in the YAML schema. For example
if you have uploaded a schema YAML file containing a section definition called
......@@ -147,41 +180,42 @@ The quantities from schemas may be included or excluded as filter by using the
[`filters`](#filters) field in the app config. This option supports a
wildcard/glob syntax for including/excluding certain filters. For example, to
include all filters from the Python schema defined in the class
`nomad_example.schema_packages.myschema.MySchema`, you could use:
`nomad_example.schema_packages.mypackage.MySchema`, you could use:
```yaml
myapp:
filters:
include:
- '*#nomad_example.schema_packages.myschema.MySchema'
```python
filters=Filters(
include=['*#nomad_example.schema_packages.mypackage.MySchema']
)
```
The same thing for a YAML schema could be achieved with:
```yaml
myapp:
filters:
include:
- '*#entry_id:<entry_id>.MySchema'
```python
filters=Filters(
include=['*#entry_id:<entry_id>.MySchema']
)
```
Once quantities from a schema are included in an app as filters, they can be targeted in the rest of the app. The app configuration often refers to specific filters to configure parts of the user interface. For example, one could configure the results table to show a new column using one of the schema quantities with:
```yaml
myapp:
columns:
include:
- 'data.mysection.myquantity#nomad_example.schema_packages.myschema.MySchema'
- 'entry_id'
options:
data.mysection.myquantity#myschema.schema.MySchema:
...
```python
columns=Columns(
    selected=[
        'entry_id',
        'data.mysection.myquantity#nomad_example.schema_packages.mypackage.MySchema',
    ],
    options={
        'entry_id': Column(),
        'upload_create_time': Column(),
        'data.mysection.myquantity#nomad_example.schema_packages.mypackage.MySchema': Column(),
    }
)
```
The syntax for targeting quantities depends on the resource:
- For python schemas, you need to provide the path and the python schema name separated
by a hashtag (#), for example `data.mysection.myquantity#nomad_example.schema_packages.myschema.MySchema`.
by a hashtag (#), for example `data.mysection.myquantity#nomad_example.schema_packages.mypackage.MySchema`.
- For YAML schemas, you need to provide the path and the YAML schema name separated
by a hashtag (#), for example `data.mysection.myquantity#entry_id:<entry_id>.MySchema`.
- Quantities that are common for all NOMAD entries can be targeted by using only
......
......@@ -177,12 +177,22 @@ we will get a final normalized archive that contains our data like this:
}
```
## Migration guide
By default, schema packages are identified by the fully qualified path to the Python module that contains the definitions. An example of a fully qualified path could be `nomad_example.schema_packages.mypackage`, where the first part is the Python package name, the second part is a subpackage, and the last part is a Python module containing the definitions. This is the easiest way to prevent conflicts between different schema packages: Python package names are unique (prevents clashes between packages) and paths inside a package must point to a single Python module (prevents clashes within a package). This does, however, mean that *if you move your schema definition in the plugin source code, any references to the old definition will break*. This becomes problematic in installations that have a lot of old data processed with the old definition location, as those entries will still refer to the old location and will not work correctly.
As it might not be possible, or even wise, to prevent changes in the source code layout, and reprocessing all old entries might be impractical, we provide an alias mechanism to help with migration tasks. Imagine your schema package was contained in `nomad_example.schema_packages.mypackage`, and in a newer version of your plugin you want to move it to `nomad_example.schema_packages.mynewpackage`. The way to do this without completely breaking the old entries is to add an alias in the schema package definition:
```python
m_package = SchemaPackage(aliases=['nomad_example.schema_packages.mypackage'])
```
Note that this will only help in scenarios where you have moved the definitions and not removed or modified any of them.
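For orientation, a minimal sketch of how the renamed module could look; the module name `mynewpackage.py`, the imports, and the small example section are assumptions based on the usual layout of NOMAD schema packages:

```python
# src/nomad_example/schema_packages/mynewpackage.py (moved from mypackage.py)
from nomad.datamodel.data import Schema
from nomad.metainfo import Quantity, SchemaPackage

# The alias keeps references stored in already-processed entries working.
m_package = SchemaPackage(aliases=['nomad_example.schema_packages.mypackage'])


class MySchema(Schema):
    my_quantity = Quantity(type=str)


m_package.__init_metainfo__()
```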
## Definitions
The following describes in detail the schema language for the NOMAD Metainfo and how it is expressed in Python.
### Common attributes of Metainfo Definitions
In the example, you have already seen the basic Python interface to the Metainfo. *Sections* are
......
......@@ -27,6 +27,7 @@ A series of tutorials will guide you through the main functionality of NOMAD.
- [Use the search interface to identify interesting data](tutorial/explore.md)
- [Use the API to search and access processed data for analysis](tutorial/access_api.md)
- [Create and use custom schemas in NOMAD](tutorial/custom.md)
- [Developing a NOMAD plugin](tutorial/develop_plugin.md)
- [Example data and exercises](https://www.fairmat-nfdi.eu/events/fairmat-tutorial-1/tutorial-1-materials){:target="_blank"}
- [More videos and tutorials on YouTube](https://youtube.com/playlist?list=PLrRaxjvn6FDW-_DzZ4OShfMPcTtnFoynT){:target="_blank"}
......
# Developing a NOMAD Plugin
In this tutorial you will learn how to create and develop a NOMAD plugin. As an example we
will create a plugin to log data for a simple sintering process.
## Prerequisites
- A GitHub account. This can be created for free on [github.com](https://github.com/signup?ref_cta=Sign+up&ref_loc=header+logged+out&ref_page=%2F&source=header-home).
- Basic understanding of Python.
- Basic understanding of NOMAD metainfo, see for example [tutorial 8](https://www.fairmat-nfdi.eu/events/fairmat-tutorial-8/tutorial-8-materials).
!!! note
Several software development concepts are being used during this tutorial.
Here is a list with some further information on each of them:
* [what is Git](https://learn.microsoft.com/en-us/devops/develop/git/what-is-git)
* [what is VSCode, i.e., an Integrated Development Environment (IDE)](https://aws.amazon.com/what-is/ide/)
* [what is Pip](https://realpython.com/lessons/what-is-pip-overview/)
* [what is a Python virtual environment](https://realpython.com/python-virtual-environments-a-primer/#why-do-you-need-virtual-environments)
* [creating a Python package](https://packaging.python.org/en/latest/tutorials/packaging-projects/)
* [uploading a package to PyPI](https://www.freecodecamp.org/news/how-to-create-and-upload-your-first-python-package-to-pypi/)
* [what is cruft](https://cruft.github.io/cruft/)
## Create a Git(Hub) repository
Firstly, we recommend using Git to version control your NOMAD plugin.
There is a GitHub template repository that can be used for this at [github.com/FAIRmat-NFDI/nomad-plugin-template](https://github.com/FAIRmat-NFDI/nomad-plugin-template).
To use the template you should choose the "Create a new repository" option after pressing
the green "Use this template" button in the upper right corner.
Please note that you have to be logged in to GitHub to see this option.
![Use template](./images/use_template_dark.png#gh-dark-mode-only)
![Use template](./images/use_template_light.png#gh-light-mode-only)
Enter a name (I will use "nomad-sintering" for mine) for your repository and click
"Create Repository".
## Generate the plugin structure
Next, we will use a cookiecutter template to create the basic structure of our NOMAD
plugin.
There are now two options for how to proceed.
1. You can use the GitHub codespaces environment to develop your plugin, or
2. If you have access to a Linux computer you can also run the same steps locally.
### 1. Using GitHub codespaces
To use a GitHub codespace for the plugin development you should choose the "Create
codespace on main" option after pressing the green "<> Code" button in the upper right
corner.
![Use codepace](./images/codespace_dark.png#gh-dark-mode-only)
![Use codespace](./images/codespace_light.png#gh-light-mode-only)
### 2. Developing locally
If you have a Linux machine and prefer to develop locally you should **instead** click the
"Local" tab after pressing the green "<> Code" button, copy the path, and clone your
repository by running:
```sh
git clone PATH/COPIED/FROM/REPOSITORY
```
and move inside the top directory
```
cd REPOSITORY_NAME
```
You will also need to install [cruft](https://pypi.org/project/cruft/), preferably using
`pipx`:
```sh
# pipx is strongly recommended.
pipx install cruft
# If pipx is not an option,
# you can install cruft in your Python user directory.
python -m pip install --user cruft
```
### Run cruft
The next step is to run cruft to use our cookiecutter template:
```sh
cruft create https://github.com/FAIRmat-NFDI/cookiecutter-nomad-plugin
```
Cookiecutter prompts you for information regarding your plugin and I will enter the
following for my example:
```no-highlight
[1/12] full_name (John Doe): Hampus Näsström
[2/12] email (john.doe@physik.hu-berlin.de): hampus.naesstroem@physik.hu-berlin.de
[3/12] github_username (foo): hampusnasstrom
[4/12] plugin_name (foobar): sintering
[5/12] module_name (sintering):
[6/12] short_description (Nomad example template): A schema package plugin for sintering.
[7/12] version (0.1.0):
[8/12] Select license
1 - MIT
2 - BSD-3
3 - GNU GPL v3.0+
4 - Apache Software License 2.0
Choose from [1/2/3/4] (1):
[9/12] include_schema_package [y/n] (y): y
[10/12] include_normalizer [y/n] (y): n
[11/12] include_parser [y/n] (y): n
[12/12] include_app [y/n] (y): n
```
There you go - you just created a minimal NOMAD plugin:
!!! note
In the above prompt, we pressed `y` for schema_package, which creates a python package
with a plugin entry point for a schema package.
```no-highlight
nomad-sintering/
├── LICENSE
├── MANIFEST.in
├── README.md
├── docs
│   └── ...
├── mkdocs.yml
├── move_template_files.sh
├── pyproject.toml
├── src
│   └── nomad_sintering
│       ├── __init__.py
│       └── schema_packages
│           ├── __init__.py
│           └── mypackage.py
└── tests
    ├── conftest.py
    ├── data
    │   └── test.archive.yaml
    └── schema_packages
        └── test_schema.py
```
!!! note
The project `nomad-sintering` is created in a new directory. We have included a helper script to move all the files to the parent level of the repository.
```sh
sh CHANGE_TO_PLUGIN_NAME/move_template_files.sh
```
!!! warning "Attention"
The `CHANGE_TO_PLUGIN_NAME` should be substituted by the name of the plugin you've created. In the above case it'll be `sh nomad-sintering/move_template_files.sh`.
Finally, we should add the files we created to git and commit the changes we have made:
```sh
git add -A
git commit -m "Generated plugin from cookiecutter template"
git push
```
### Enable Cruft updates
In order to receive updates from our cookiecutter template, we have included a GitHub
action that automatically checks for updates once a week (or when triggered manually).
In order for this action to run we need to give the action permission to write and create
pull requests. To do this we should go back to the plugin repo and head to the settings
tab and navigate to the Actions/General options on the left:
![Use template](./images/github_settings_dark.png#gh-dark-mode-only)
![Use template](./images/github_settings_light.png#gh-light-mode-only)
At the very bottom of this place you should mark the "Read and write permissions"
and the "Allow GitHub Actions to create and approve pull requests" options and click save.
![Use template](./images/workflow_permissions_dark.png#gh-dark-mode-only)
![Use template](./images/workflow_permissions_light.png#gh-light-mode-only)
## Setting up the python environment
### Creating a virtual environment
Before we can start developing, we recommend creating a virtual environment using Python 3.9:
```sh
python3.9 -m venv .pyenv
source .pyenv/bin/activate
```
### Installing the plugin
Next, we should install our plugin package in editable mode, using the NOMAD package
index:
```sh
pip install --upgrade pip
pip install -e '.[dev]' --index-url https://gitlab.mpcdf.mpg.de/api/v4/projects/2187/packages/pypi/simple
```
!!! note
Until we have an official PyPI NOMAD release with the latest NOMAD version, make sure to include NOMAD's internal package registry (e.g. via `--index-url`). The latest PyPI package available today is version 1.2.2 and it is missing some updates needed for this tutorial.
In the future, when a newer release of `nomad-lab` (> 1.2.2) is available, you can omit the `--index-url`.
## Importing a yaml schema
### The schema
We will now convert the yaml schema package from part 2 where we described a sintering
step:
```yaml
definitions:
  name: 'Tutorial 13 sintering schema'
  sections:
    TemperatureRamp:
      m_annotations:
        eln:
          properties:
            order:
              - "name"
              - "start_time"
              - "initial_temperature"
              - "final_temperature"
              - "duration"
              - "comment"
      base_sections:
        - nomad.datamodel.metainfo.basesections.ProcessStep
      quantities:
        initial_temperature:
          type: np.float64
          unit: celsius
          description: "initial temperature set for ramp"
          m_annotations:
            eln:
              component: NumberEditQuantity
              defaultDisplayUnit: celsius
        final_temperature:
          type: np.float64
          unit: celsius
          description: "final temperature set for ramp"
          m_annotations:
            eln:
              component: NumberEditQuantity
              defaultDisplayUnit: celsius
    Sintering:
      base_sections:
        - nomad.datamodel.metainfo.basesections.Process
        - nomad.datamodel.data.EntryData
      sub_sections:
        steps:
          repeats: True
          section: '#/TemperatureRamp'
```
We can grab this file from the tutorial repository using curl
```sh
curl -L -o sintering.archive.yaml "https://raw.githubusercontent.com/FAIRmat-NFDI/AreaA-Examples/main/tutorial13/part3/files/sintering.archive.yaml"
```
### `metainfo-yaml2py`
We will now use an external package `metainfo-yaml2py` to convert the yaml schema package
into python class definitions.
First we install the package with `pip`:
```sh
pip install metainfoyaml2py
```
Then we can run the `metainfo-yaml2py` command on the `sintering.archive.yaml` file with
the `-n` flag to add `normalize()` functions (explained later)
and the `-o` flag to specify our `schema_packages` directory as the output
directory:
```sh
metainfo-yaml2py sintering.archive.yaml -o src/nomad_sintering/schema_packages -n
```
### Updating `__init__.py` and `pyproject.toml`
The metadata of our package is defined in the `__init__.py` file and here we now need to
add the sintering package that we just created.
If we take a look in that file we can see an example created by the cookiecutter template.
We can go ahead and copy the `MySchemaPackageEntryPoint` class and the `mypackage`
instance and paste them below.
We then need to change:
1. the name of the class,
2. the import in the load function to import our sintering schema package,
3. the name of the instance and the class it uses,
4. ideally we should also update the description and the name.
The changes could look something like this:
```py
class SinteringEntryPoint(SchemaPackageEntryPoint):
    def load(self):
        from nomad_sintering.schema_packages.sintering import m_package

        return m_package


sintering = SinteringEntryPoint(
    name='Sintering',
    description='Schema package for describing a sintering process.',
)
```
Finally, we also need to add our new entry point to the `pyproject.toml`.
At the bottom of the toml you will see how this was done for the example and we just need
to replicate that with whatever we called our instance:
```toml
sintering = "nomad_sintering.schema_packages:sintering"
```
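For context, this line sits in the plugin entry-points table of `pyproject.toml`. The table header shown below reflects the layout the template usually generates and should be checked against your own file:

```toml
[project.entry-points.'nomad.plugin']
sintering = "nomad_sintering.schema_packages:sintering"
```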
Before we continue, we should commit our changes to git:
```sh
git add -A
git commit -m "Added sintering classes from yaml schema"
git push
```
### Ruff autoformatting
If we check the actions tab of the GitHub repository we might see that the last commit
caused an error in the Ruff format checking. We can either disable this workflow (not
recommended) or we can check and format our code with Ruff.
To check what Ruff thinks about our code we run:
```sh
ruff check .
```
To fix any issues we can run:
```sh
ruff check . --fix
```
And commit the changes:
```sh
git add -A
git commit -m "Ruff linting"
git push
```
## Adding a normalize function
Next we will add some functionality to our use case through a so-called "normalize"
function. This allows us to add functionality to our schemas via Python code.
### The use case
For this tutorial we will assume that we have a recipe file for our hot plate that we will
parse:
```csv
step name,duration [min],initial temperature [C],final temperature [C]
heating, 30, 25, 300
hold, 60, 300, 300
cooling, 30, 300, 25
```
We can grab this file from the tutorial repository and place it in the tests/data
directory using curl
```sh
curl -L -o tests/data/sintering_example.csv "https://raw.githubusercontent.com/FAIRmat-NFDI/AreaA-Examples/main/tutorial13/part3/files/sintering_example.csv"
```
### Adding the code
The first thing we need to add is a new `Quantity` in our `Sintering` class to hold the
recipe file:
```py
data_file = Quantity(
    type=str,
    description='The recipe file for the sintering process.',
    a_eln={
        "component": "FileEditQuantity",
    },
)
```
Here we have used the `a_eln` component annotation to add a `FileEditQuantity`. You will
see in part 4 how this looks in the GUI.
Secondly we need to update the normalize method to read the data file and update the
corresponding data.
First we will check if the `self.data_file` is present and, if so, use the
`archive.m_context.raw_file()` method to open the file and read it with the pandas
function `read_csv()`:
```py
if self.data_file:
    with archive.m_context.raw_file(self.data_file) as file:
        df = pd.read_csv(file)
```
We will then create a list to hold the steps, iterate over our data frame, create an
instance of a `TemperatureRamp`, and fill them.
```py
steps = []
for i, row in df.iterrows():
    step = TemperatureRamp()
    step.name = row['step name']
    step.duration = ureg.Quantity(float(row['duration [min]']), 'min')
    step.initial_temperature = ureg.Quantity(row['initial temperature [C]'], 'celsius')
    step.final_temperature = ureg.Quantity(row['final temperature [C]'], 'celsius')
    steps.append(step)
```
Here we have used the NOMAD unit registry to handle all the units.
Finally, we will assign our new list of steps to `self.steps`.
```py
self.steps = steps
```
We also need to add the import of pandas and the NOMAD unit registry to the top of our
`sintering.py` file:
```py
from nomad.units import ureg
import pandas as pd
```
Here are all the changes combined:
```py
from nomad.units import ureg
import pandas as pd


class Sintering(Process, EntryData, ArchiveSection):
    '''
    Class autogenerated from yaml schema.
    '''
    m_def = Section()
    steps = SubSection(
        section_def=TemperatureRamp,
        repeats=True,
    )
    data_file = Quantity(
        type=str,
        description='The recipe file for the sintering process.',
        a_eln={
            "component": "FileEditQuantity",
        },
    )

    def normalize(self, archive, logger: BoundLogger) -> None:
        '''
        The normalizer for the `Sintering` class.

        Args:
            archive (EntryArchive): The archive containing the section that is being
                normalized.
            logger (BoundLogger): A structlog logger.
        '''
        super(Sintering, self).normalize(archive, logger)

        if self.data_file:
            with archive.m_context.raw_file(self.data_file) as file:
                df = pd.read_csv(file)
            steps = []
            for i, row in df.iterrows():
                step = TemperatureRamp()
                step.name = row['step name']
                step.duration = ureg.Quantity(float(row['duration [min]']), 'min')
                step.initial_temperature = ureg.Quantity(row['initial temperature [C]'], 'celsius')
                step.final_temperature = ureg.Quantity(row['final temperature [C]'], 'celsius')
                steps.append(step)
            self.steps = steps
```
## Running the normalize function
We will now run the NOMAD processing on a test file to see the normalize function in
action.
### Create an archive.json file
The first step is to create the test file.
We should add a file ending in `.archive.yaml` or `.archive.json` that contains
a `data` section with an `m_def` key whose value is our sintering section.
Finally, we should also add the `data_file` key with the value being our `.csv` file from
before.
```yaml
data:
  m_def: nomad_sintering.schema_packages.sintering.Sintering
  data_file: sintering_example.csv
```
We can once again grab this file from the tutorial repository and place it in the
tests/data directory using curl
```sh
curl -L -o tests/data/test_sintering.archive.yaml "https://raw.githubusercontent.com/FAIRmat-NFDI/AreaA-Examples/main/tutorial13/part3/files/test_sintering.archive.yaml"
```
!!! warning "Attention"
You might need to modify the package name for the `m_def` if you called your python
module something other than `nomad_sintering`
### Run the NOMAD CLI
To run the processing we use the NOMAD CLI command `parse` with the flag `--show-archive`
and save the output in a JSON file:
```sh
nomad parse tests/data/test_sintering.archive.yaml --show-archive > normalized.archive.json
```
However, when we run this we will get an error from NOMAD!
```bash
could not normalize section (normalizer=MetainfoNormalizer, section=Sintering, exc_info=Cannot convert from 'milliinch' ([length]) to 'second' ([time]))
```
What is happening here is that it has treated our `'min'` unit for duration as `'milliinch'`
and not the intended minutes. To fix this we can directly edit the normalize function
of the `Sintering` class in the `sintering.py` file by replacing `'min'` with `'minutes'`.
```py
def normalize(self, archive: 'EntryArchive', logger: 'BoundLogger') -> None:
    """
    The normalizer for the `Sintering` class.

    Args:
        archive (EntryArchive): The archive containing the section that is being
            normalized.
        logger (BoundLogger): A structlog logger.
    """
    super().normalize(archive, logger)

    if self.data_file:
        with archive.m_context.raw_file(self.data_file) as file:
            df = pd.read_csv(file)
        steps = []
        for i, row in df.iterrows():
            step = TemperatureRamp()
            step.name = row['step name']
            # Changed 'min' to 'minutes' here:
            step.duration = ureg.Quantity(float(row['duration [min]']), 'minutes')
```
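As a quick sanity check, you can reproduce the unit ambiguity interactively. This is just a sketch, assuming the NOMAD unit registry resolves the abbreviations as reported in the error above:

```py
from nomad.units import ureg

print(ureg.Quantity(30.0, 'min').dimensionality)      # [length] -> 'min' means milli-inch here
print(ureg.Quantity(30.0, 'minutes').dimensionality)  # [time]   -> the intended duration
```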
Since we installed our package in editable mode, the changes take effect as soon as we
save, and rerunning the `nomad parse` command above should now work.
To view the output you can open and inspect the `normalized.archive.json` file. The
beginning of that file should look something like:
```json
{
  "data": {
    "m_def": "nomad_sintering.schema_packages.sintering.Sintering",
    "name": "test sintering",
    "datetime": "2024-06-04T16:52:23.998519+00:00",
    "data_file": "sintering_example.csv",
    "steps": [
      {
        "name": "heating",
        "duration": 1800.0,
        "initial_temperature": 25.0,
        "final_temperature": 300.0
      },
      {
        "name": "hold",
        "duration": 3600.0,
        "initial_temperature": 300.0,
        "final_temperature": 300.0
      },
      {
        "name": "cooling",
        "duration": 1800.0,
        "initial_temperature": 300.0,
        "final_temperature": 25.0
      }
    ]
  },
  ...
```
### Next steps
The next step is to include your new schema in a custom NOMAD Oasis. For more information
on how to set up a NOMAD Oasis you can have a look at
[How-to guides/NOMAD Oasis/Install an Oasis](../howto/oasis/install.md).
Before we move on, we should make sure that we have committed our changes to git:
```sh
git add -A
git commit -m "Added a normalize function to the Sintering schema"
git push
```
docs/tutorial/images/codespace_dark.png (new image, 147 KiB)
docs/tutorial/images/codespace_light.png (new image, 149 KiB)
docs/tutorial/images/github_settings_dark.png (new image, 166 KiB)
docs/tutorial/images/github_settings_light.png (new image, 165 KiB)