diff --git a/docs/apis/api.md b/docs/apis/api.md index d6b7f796c55b7a0fc64d36b887d05351a012b9b1..886040b7f15897a0b5d7a4abed85213e19e6c292 100644 --- a/docs/apis/api.md +++ b/docs/apis/api.md @@ -1,7 +1,7 @@ This guide is about using NOMAD's REST APIs directly, e.g. via Python's *request*. To access the processed data with our client library `nomad-lab` follow -[How to access the processed data](archive_query.md). You watch our +[How-to access the processed data](archive_query.md). You can watch our [video tutorial on the API](../tutorial.md#access-data-via-api). ## Different options to use the API @@ -46,7 +46,7 @@ API functions that allows you to try these functions in the browser. Install the [NOMAD Python client library](pythonlib.md) and use it's `ArchiveQuery` functionality for a more convenient query based access of archive data following the -[How to access the processed data](archive_query.md) guide. +[How-to access the processed data](archive_query.md) guide. ## Using request @@ -250,7 +250,7 @@ the API: - Raw files, the files as they were uploaded to NOMAD. - Archive data, all of the extracted data for an entry. -There are also different entities (see also [Datamodel](../learn/how_nomad_works.md#datamodel-uploads-entries-files-datasets)) with different functions in the API: +There are also different entities (see also [Datamodel](../learn/basics.md)) with different functions in the API: - Entries - Uploads diff --git a/docs/apis/archive_query.md b/docs/apis/archive_query.md index 95e6f7bec60537523dc720fa2eba65517e44fd85..9d504608b1f54a510010e2d3b3c0f885839d8c97 100644 --- a/docs/apis/archive_query.md +++ b/docs/apis/archive_query.md @@ -5,7 +5,7 @@ based on the schema rather than plain JSON. See also this guide on using to work with processed data. As a requirement, you have to install the `nomad-lab` Python package. Follow the -[How to install nomad-lab](pythonlib.md) guide. +[How-to install nomad-lab](pythonlib.md) guide.
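To give a flavor of what follows, an `ArchiveQuery` request is essentially built from two dictionaries: a *query* that selects entries and a *required* specification that limits which parts of each archive are downloaded. The sketch below only constructs and serializes these two inputs; the filter quantity shown is one example of an indexed search quantity, and the exact keyword arguments accepted by `ArchiveQuery` may differ between `nomad-lab` versions:

```python
import json

# `query` restricts which entries are matched; `required` restricts
# which parts of each matched archive are actually downloaded.
query = {
    "results.method.simulation.program_name": "VASP",
}
required = {
    "run": {
        "system[-1]": {
            "atoms": "*",
        }
    }
}

# Serialize the request body roughly as it would travel to the API.
payload = json.dumps({"query": query, "required": required}, indent=2)
print(payload)
```

Restricting `required` to the sections you actually need can reduce download sizes dramatically, since archives of large simulations can be very big.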
## Getting started diff --git a/docs/apis/local_parsers.md b/docs/apis/local_parsers.md index b42e85232a9d330a755574bd0881a0a6c3511cbb..688e11aa0fb4f263de3b3c123c814fd7ab73f3c2 100644 --- a/docs/apis/local_parsers.md +++ b/docs/apis/local_parsers.md @@ -1,4 +1,4 @@ -# How to run a parser +# How-to run a parser You can find a [list of all parsers](../reference/parsers.md) and supported files in the reference. diff --git a/docs/data/eln.md b/docs/data/eln.md index 74dcb39b5764da3d2b0ad5629ed923fb325eeefc..888755dddaa6be176811c421666362b66c0e9348 100644 --- a/docs/data/eln.md +++ b/docs/data/eln.md @@ -13,7 +13,7 @@ button. This will bring you to the upload page. Click the `CREATE ENTRY` button. This will bring-up a dialog to choose an ELN schema. All ELNs (as any entry in NOMAD) needs to follow a schema. You can choose from uploaded -custom schemas or NOMAD build-in schemas. You can choose the `Basic ELN` to create a +custom schemas or NOMAD built-in schemas. You can choose the `Basic ELN` to create a simple ELN entry. The name of your ELN entry, will be the filename for your ELN without the `.archive.json` @@ -33,5 +33,5 @@ click the `ADD EXAMPLE UPLOADS` button. The `Electronic Lab Notebook` example, w contain a schema and entries that instantiate different parts of the schema. The *ELN example sample (`sample.archive.json`) demonstrates what you can do. -Follow the [How to write a schema](../schemas/basics.md) and [How to define ELN](../schemas/elns.md) +Follow the [How-to write a schema](../schemas/basics.md) and [How-to define ELN](../schemas/elns.md) guides to create you own customized of ELNs. 
diff --git a/docs/develop/guides.md b/docs/develop/guides.md index 458ea6b14a7e08eb8f71c9e6fb9dd2d87fb6d88d..0277264849165da5671fd6ab65c09013c76a7d79 100644 --- a/docs/develop/guides.md +++ b/docs/develop/guides.md @@ -170,7 +170,7 @@ Please follow the following rules when logging: - If a logger is not already provided, only use :py:func:`nomad.utils.get_logger` to acquire a new logger. Never use the - build-in logging directly. These logger work like the system loggers, but + built-in logging directly. These loggers work like the system loggers, but allow you to pass keyword arguments with additional context data. See also the [structlog docs](https://structlog.readthedocs.io/en/stable/). - In many context, a logger is already provided (e.g. api, processing, parser, normalizer). diff --git a/docs/develop/normalizers.md b/docs/develop/normalizers.md index 82ab15e7b846addeda84f1c75834c3572fbb818a..d3d0ab1595aab2a4e4e923d3e31b15d47202b1dd 100644 --- a/docs/develop/normalizers.md +++ b/docs/develop/normalizers.md @@ -1,4 +1,4 @@ -# How to write a normalizer +# How-to write a normalizer ## Introduction diff --git a/docs/index.md b/docs/index.md index 478341f3e51840aea7a83dd749b03cabac6c7c3b..803aedc5c6a14b45de8d08649d9871d772170b83 100644 --- a/docs/index.md +++ b/docs/index.md @@ -34,14 +34,14 @@ It covers th whole publish, explore, analyze cycle: </div> <div markdown="block"> -### How to guides +### How-to guides The documentation provides step-by-step instructions for a wide range of tasks.
For example: -- [How to upload and publish data](data/upload.md) -- [How to write a custom ELN](schemas/elns.md) -- [How to run a parser locally](apis/local_parsers.md) -- [How to install NOMAD Oasis](oasis/install.md) +- [How-to upload and publish data](data/upload.md) +- [How-to write a custom ELN](schemas/elns.md) +- [How-to run a parser locally](apis/local_parsers.md) +- [How-to install NOMAD Oasis](oasis/install.md) </div> diff --git a/docs/learn/architecture.png b/docs/learn/architecture.png index be1b1a0035c9da6468d430d396e9cb77f03d5ff9..2a5a37aab444305b87cd21e87f1d0773cb3fb1c7 100644 Binary files a/docs/learn/architecture.png and b/docs/learn/architecture.png differ diff --git a/docs/learn/archive-example.png b/docs/learn/archive-example.png deleted file mode 100644 index 8dcd7bb38494b3fcb8e1a14f493b26a8f3771b15..0000000000000000000000000000000000000000 Binary files a/docs/learn/archive-example.png and /dev/null differ diff --git a/docs/learn/basics.md b/docs/learn/basics.md new file mode 100644 index 0000000000000000000000000000000000000000..0ec2f0cd17dce1697d2f7b7ca135cec86871e09c --- /dev/null +++ b/docs/learn/basics.md @@ -0,0 +1,131 @@ + +NOMAD is based on a *bottom-up* approach to data management. Instead of only supporting data in a specific +predefined format, we process files to extract data from an extendable variety of data formats. + +Converting heterogeneous files into homogeneous, machine-actionable processed data is the +basis for making data FAIR. It allows us to build search interfaces, APIs, visualization, and +analysis tools independent of specific file formats. + +<figure markdown> +  + <figcaption>NOMAD's datamodel and processing</figcaption> +</figure> + +## Uploads + +Users create **uploads** to organize files. Think of an upload like a project: +many files can be put into a single upload and an upload can be structured with directories. +You can collaborate on uploads, share uploads, and publish uploads.
The files in an +upload are called **raw files**. +Raw files are managed by users and they are never changed by NOMAD. + +!!! note + As a *rule*, **raw files** are not changed during processing (or otherwise). However, + to achieve certain functionality, a parser, normalizer, or schema developer might decide to bend + this rule. Use cases include generating more mainfiles (and entries), updating + related mainfiles to automate ELNs, or + generating additional files to convert a mainfile into a standardized format like NeXus or CIF. + + +## Entries + +All uploaded **raw files** are analyzed to find files with a recognized format. Each file +that follows a recognized format is a **mainfile**. For each mainfile, NOMAD will create +a database **entry**. The entry is permanently tied to its mainfile. The entry id, for example, +is a hash over the upload id and the mainfile path (and an optional key) within the upload. +This **matching** process is automatic, and users cannot create entries +manually. + +!!! note + We say that raw files are not changed by NOMAD and that users cannot create entries, + but what about ELNs and the *create entry* button in the UI? + + In this case, + NOMAD simply creates an editable **mainfile** that indirectly creates an entry. + The user can use NOMAD as an editor to change the file, but the content is + determined by the user, in contrast to the processed data, which NOMAD + creates from raw files. + + +## Processing + +The processing of entries is also automatic. Initially, and on each mainfile change, +the entry corresponding to the mainfile will be processed. Processing consists of +**parsing**, **normalizing**, and storing the created data. + + +### Parsing + +Parsers are small programs that transform data from a recognized *mainfile* into a +structured, machine-processable tree of data that we call the *archive* or [**processed data**](data.md) +of the entry. Only one parser is used for each entry.
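The mainfile-to-archive transformation can be pictured with a toy example. The "parser" below reads a made-up `key=value` mainfile format and produces a dict-shaped tree; real NOMAD parsers populate schema-backed sections instead of plain dicts, and the file format here is purely illustrative:

```python
import json

def toy_parse(mainfile_content: str) -> dict:
    # A toy "parser": turn key=value lines into a structured,
    # archive-like tree. Real parsers handle much richer syntax
    # and populate Metainfo sections.
    archive = {"run": {}}
    for line in mainfile_content.splitlines():
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        archive["run"][key] = value
    return archive

mainfile = """
program = VASP
energy = -13.6
"""
archive = toy_parse(mainfile)
print(json.dumps(archive))
```

The important point is the direction of the data flow: the mainfile stays untouched, and all extracted information ends up in the structured tree.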
The parser to use is determined +during matching and depends on the file format. Parsers can be added to NOMAD as +[plugins](../plugins/parsers.md); this is a list of [all built-in parsers](../reference/parsers.md). + + +!!! note + A special case is the parsing of NOMAD archive files. Usually a parser converts a file + from a source format into NOMAD's *archive* format for processed data. But users can + also create files following this format themselves. They can be uploaded either as `.json` or `.yaml` files + by using the `.archive.json` or `.archive.yaml` extension. In these cases, we also consider + these files mainfiles, and they are processed as well. Here the parsing + is a simple syntax check that basically just copies the data, but normalization might + still modify and augment the data substantially. One use case for these *archive* files + is ELNs. Here the NOMAD UI acts as an editor for the respective `.json` file, but on each save, the + corresponding file goes through all the regular processing steps. This allows + ELN schema developers to add all kinds of functionality such as updating referenced + entries, parsing linked files, or creating new entries for automation. + +### Normalizing + +While parsing converts a mainfile into processed data, normalizing operates only on the +processed data. Learn more about why to normalize in the documentation on [structured data](./data.md). +There are two principal ways to implement normalization in NOMAD: +**normalizers** and **normalize** functions. + +[Normalizers](../develop/normalizers.md) are small programs that take processed data as input. +There is a list of normalizers registered in the [NOMAD configuration](../reference/config.md#normalize). +In the future, normalizers might be +added as plugins as well. They run in the configured order. Every normalizer runs +on all entries, and each normalizer may decide to act or not, depending on what +it sees in the processed data.
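This run-everything-in-order, act-only-if-relevant behavior can be sketched in a few lines. The archive is modeled as a plain dict and the normalizer names are hypothetical; the real normalizers are classes in the NOMAD code base and work on schema-backed data:

```python
# A minimal, self-contained model of the normalizer pipeline.

def system_normalizer(archive: dict) -> None:
    # Only acts if the entry actually contains a simulated system.
    system = archive.get("run", {}).get("system")
    if system is None:
        return
    system["normalized"] = True

def results_normalizer(archive: dict) -> None:
    # Copies specific data into a shared, searchable results section.
    program = archive.get("run", {}).get("program_name")
    if program:
        archive.setdefault("results", {})["program_name"] = program

# Normalizers run in the configured order, on every entry.
NORMALIZERS = [system_normalizer, results_normalizer]

archive = {"run": {"program_name": "VASP", "system": {}}}
for normalize in NORMALIZERS:
    normalize(archive)

print(archive)
```

An entry without a `system` section would simply pass through the first normalizer unchanged, which is exactly the "decide to do something or not" behavior described above.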
+ +Normalize functions are special functions implemented as part of section definitions +in [Python schemas](../plugins/schemas.md#writing-schemas-in-python-compared-to-yaml-schemas). +There is a special normalizer that will go through all processed data and execute these +functions if they are defined. Normalize functions get the respective section instance as +input. This allows [schema plugin](../plugins/schemas.md) developers to add normalizing to their sections. +Read about our [structured data](./data.md) to learn more about the different sections. + +### Storing and indexing + +As a last technical step, the processed data is stored and some information is passed +into the search index. The store for processed data is internal to NOMAD; processed +data can only be accessed via the [archive API](../apis/api.md#access-processed-data-archives) +or the [ArchiveQuery](../apis/archive_query.md) Python library functionality. +What information is stored in the search index is determined +by the *metadata* and *results* sections and cannot be changed by users or plugins. +However, all scalar values in the processed data are also indexed as key-value pairs. + +!!! attention + This part of the documentation should be more substantiated. There will be a learn section + about the search soon. + +## Files + +We already said that all uploaded files are **raw files**. Recognized files that have an +entry are called **mainfiles**. Only the mainfile of the entry is +passed to the parser during processing. However, a parser can call other tools or read other files. +Therefore, we consider all files in the same directory as the mainfile **auxiliary files**, +even though there is not necessarily a formal relationship with the entry. Whether +formal relationships with auxiliary files are established, e.g. via a reference to the file within +the processed data, is up to the parser. + +## Datasets + +Users can build collections of entries to form **datasets**.
You can imagine datasets +like tags or albums in other systems. Each entry can be contained in many datasets and +a dataset can hold many entries. Datasets can also overlap. Datasets are only +indirectly related to files. The main purpose of **datasets** in NOMAD is to have citable +collections of data. Users can get a DOI for their datasets. Datasets have no influence +on the processing of data. diff --git a/docs/learn/data-flow.png b/docs/learn/data-flow.png deleted file mode 100644 index b39effff4bd4fbd1efc884191d6a1a994e17c7de..0000000000000000000000000000000000000000 Binary files a/docs/learn/data-flow.png and /dev/null differ diff --git a/docs/learn/data.md b/docs/learn/data.md new file mode 100644 index 0000000000000000000000000000000000000000..a43addad44ee809bacd45829ff2d95462e6dbbb0 --- /dev/null +++ b/docs/learn/data.md @@ -0,0 +1,186 @@ +# Structured data and the NOMAD Metainfo + +NOMAD structures data into **sections**, where each section can contain data and more sections. +This allows you to browse complex data like you would browse files and directories on your computer. +Each section follows a **definition**, and all the contained data and sub-sections have a +specific name, description, possible type, shape, and unit. This means that all data follows a **schema**. +This not only helps human exploration, but also makes the data machine-interpretable, +increases consistency and interoperability, and enables search, APIs, visualization, and +analysis. + +<figure markdown> +  + <figcaption>Browsing structured data in the NOMAD UI (<a href="https://nomad-lab.eu/prod/v1/gui/search/entries/entry/id/zQJMKax7xk384h_rx7VW_-6bRIgi/data/run/0/system/0/atoms/positions">link</a>)</figcaption> +</figure> + + +## Schema language + +Structured data is based on schemas written in a **schema language**. Our +schema language is called the **NOMAD Metainfo** language.
It +provides the means to define sections, organize definitions into **packages**, and define +section properties (**sub-sections** and **quantities**). + +<figure markdown> +  + <figcaption>The NOMAD Metainfo schema language for structured data definitions</figcaption> +</figure> + +Packages contain section definitions, and section definitions contain definitions for +sub-sections and quantities. Sections can inherit the properties of other sections. While +sub-sections allow you to define containment hierarchies for sections, quantities can +use section definitions (or other quantity definitions) as a type to define references. + +If you are familiar with other schema languages and means to define structured data +(JSON schema, XML schema, pydantic, database schemas, ORM, etc.), you might recognize +these concepts under different names. Sections are similar to *classes*, *concepts*, *entities*, or *tables*. +Quantities are related to *properties*, *attributes*, *slots*, or *columns*. +Sub-sections might be called *containment* or *composition*. Sub-sections and quantities +with a section type also define *relationships*, *links*, or *references*. + +Our guide on [how-to write a schema](../schemas/basics.md) explains these concepts with an example. + +## Schema + +NOMAD represents many different types of data. Therefore, we cannot speak of just *the one* +schema. The entirety of NOMAD schemas is called the **NOMAD Metainfo**. +Definitions used in the NOMAD Metainfo fall into three different categories. First, +we have sections that define a **shared entry structure**. Those are independent of the +type of data (and processed file type). They make it possible to find all generic parts without +any deeper understanding of the specific data. Second, we have definitions of +**re-usable base sections** for shared common concepts and their properties. +Specific schemas can use and extend these base sections.
Base sections define a fixed +interface or contract that can be used to build tools (e.g. search, visualizations, analysis) +around them. Lastly, there are **specific schemas**. Those re-use base sections and +complement the shared entry structure. They define specific data structures to represent +specific types of data. + +<figure markdown> +  + <figcaption> + The three different categories of NOMAD schema definitions + </figcaption> +</figure> + +### Shared entry structure + +The processed data (archive) of each entry shares the same structure. All entries instantiate +the same root section `EntryArchive`. They all share common sections `metadata:EntryMetadata` +and `results:Results`. They also all contain a *data* section, but the section definition used +varies depending on the type of data of the specific entry. There is the +literal `data:EntryData` sub-section. Here `EntryData` is abstract, and specific entries +will use concrete definitions that inherit from `EntryData`. There are also specific *data* +sections, like `run` for simulation data and `nexus` for NeXus data. + +!!! attention + The results, originally only designed for computational data, will soon be revised + and replaced by a different section. However, the necessity and function of a section + like this remains. + +<figure markdown> +  + <figcaption> + All entries instantiate the same root section and share the same structure. + </figcaption> +</figure> + +### Base sections + +*Base section* is a very loose category. In principle, every section definition can be +inherited from or can be re-used in different contexts. There are some dedicated (or even abstract) +base section definitions (mostly defined in the `nomad.datamodel.metainfo` package and sub-packages), +but schema authors should not strictly limit themselves to these definitions. +The goal is to re-use as much as possible and to not re-invent the same sections over +and over again.
Tools built around certain base sections provide an incentive to +use them. + +!!! attention + There is no detailed how-to or reference documentation on the existing base sections + and how to use them yet. + +One example of a re-usable base section is the [workflow package](../schemas/workflows.md). +It allows workflows to be defined in a common way. Workflows can be placed in +the shared entry structure, and the UI provides a card with workflow visualization and +navigation for all entries that have a workflow inside. + +!!! attention + Currently there are two versions of the workflow schema. They are stored in two + top-level `EntryArchive` sub-sections (`workflow` and `workflow2`). This + will change soon to something that supports multiple workflows used in + specific schemas and results. + +### Specific schemas + +Specific schemas allow users and plugin developers to describe their data in all detail. +However, users (and machines) not familiar with the specifics will struggle to interpret +these kinds of data. Therefore, it is important to also translate (at least some of) the data +into a more generic and standardized form. + +<figure markdown> +  + <figcaption> + From specific data to more general interoperable data. + </figcaption> +</figure> + +The **results** section provides a shared structure designed around base section definitions. +This allows you to put (at least some of) your data where it is easy to find, and in a +form that is easy to interpret. Your non-interoperable, but highly +detailed data needs to be transformed into an interoperable (but potentially limited) form. + +Typically, a parser will be responsible for populating the specific schema, and the +interoperable schema parts (e.g. section results) are populated during normalization. +This allows certain aspects of the conversion to be separated and potentially enables re-use
The necessary effort for normalization depends on how much +the specific schema deviates from base-sections. There are three levels: + +- the parser (or uploaded archive file) populates section results directly +- the specific schema re-uses the base sections used for the results and normalization +can be automated +- the specific schema represents the same information differently and a translating +normalization algorithm needs to be implemented. + +### Exploring the schema + +All built-in definitions that come with NOMAD or one of the installed plugins can +be explored with the [Metainfo browser](https://nomad-lab.eu/prod/v1/gui/analyze/metainfo/nomad.datamodel.datamodel.EntryArchive). You can start with the root section `EntryArchive` +and browse based on sub-sections, or explore the Metainfo through packages. + +To see all user provided uploaded schemas, you can use a [search for the sub-section `definition`](https://nomad-lab.eu/prod/v1/gui/search/entries?quantities=definitions). +The sub-section `definition` is a top-level `EntryArchive` sub-section. See also our +[how-to on writing and uploading schemas](http://127.0.0.1:8001/schemas/basics.html#uploading-schemas). + +### Contributing to the Metainfo + +The shared entry structure (including section results) is part of the NOMAD source-code. +It interacts with core functionality and needs to be highly controlled. +Contributions here are only possible through merge requests. + +Base sections can be contributed via plugins. Here they can be explored in the Metainfo +browser, your plugin can provide more tools, and you can make use of normalize functions. +See also our [how-to on writing schema plugins](../plugins/schemas.md). You could +also provide base sections via uploaded schemas, but those are harder to explore and +distribute to other NOMAD installations. + +Specific schemas can be provided via plugins or as uploaded schemas. 
When you upload +schemas, you most likely also upload data in archive files (or use ELNs to edit such files). +Here you can also provide schemas and data in the same file. In many cases +specific schemas will be small and only re-combine existing base sections. +See also our +[how-to on writing schemas](../schemas/basics.md). + +## Data + +All processed data in NOMAD instantiates Metainfo schema definitions and the *archive* of +each entry is always an instance of `EntryArchive`. This provides an abstract structure +for all data. However, it is independent of the actual representation of data in computer memory +or how it might be stored in a file or database. + +The Metainfo has many serialized forms. You can write `.archive.json` or `.archive.yaml` +files yourself. NOMAD internally stores all processed data in [MessagePack](https://msgpack.org/). Some +of the data is stored in MongoDB or Elasticsearch. When you request processed data via +API, you receive it in JSON. When you use the [ArchiveQuery](../apis/archive_query.md), all data is represented +as Python objects (see also [here](../plugins/schemas.md#starting-example)). + +No matter what the representation is, you can rely on the structure, names, types, shapes, and units +defined in the schema to interpret the data.
diff --git a/docs/learn/data.png b/docs/learn/data.png new file mode 100644 index 0000000000000000000000000000000000000000..81a2669f0aa423dd7a61f32ed1aba286873e2419 Binary files /dev/null and b/docs/learn/data.png differ diff --git a/docs/learn/datamodel.png b/docs/learn/datamodel.png index fb0f0afea3ba9133eaa671be1d5af5402b98b352..7bb9cfc00c0abccce5f636007514366b89684248 100644 Binary files a/docs/learn/datamodel.png and b/docs/learn/datamodel.png differ diff --git a/docs/learn/how-does-nomad-work.png b/docs/learn/how-does-nomad-work.png deleted file mode 100644 index 1c5eefadca24eeb68b98cbe39c51ccc79fc23251..0000000000000000000000000000000000000000 Binary files a/docs/learn/how-does-nomad-work.png and /dev/null differ diff --git a/docs/learn/how_nomad_works.md b/docs/learn/how_nomad_works.md deleted file mode 100644 index 265509f06b1a0ab2ba8743b500a2d26dc2792840..0000000000000000000000000000000000000000 --- a/docs/learn/how_nomad_works.md +++ /dev/null @@ -1,41 +0,0 @@ -# How does NOMAD work? - -## Managing data based on automatically extract rich metadata - - -NOMAD is based on a *bottom-up* approach. Instead of only managing data of a specific -predefined format, we use parsers and processing to support an extendable variety of -data formats. Uploaded *raw* files are analysed and files with a recognized format are parsed. -Parsers are small programs that transform data from the recognized *mainfiles* into a common machine -processable version that we call *archive*. The information in the common archive representation -drives everything else. It is the based for our search interface, the representation of materials -and their properties, as well as all analytics. - -## A common hierarchical machine processable format for all data - - -The *archive* is a hierarchical data format with a strict schema. -All the information is organized into logical nested *sections*. -Each *section* comprised a set of *quantities* on a common subject. 
-All *sections* and *quantities* are supported by a formal schema that defines names, descriptions, types, shapes, and units. -We sometimes call this data *archive* and the schema *metainfo*. - -## Datamodel: *uploads*, *entries*, *files*, *datasets* - -Uploaded *raw* files are managed in *uploads*. -Users can create *uploads* and use them like projects. -You can share them with other users, incrementally add and modify data in them, publish (incl. embargo) them, or transfer them between NOMAD installations. -As long as an *upload* is not published, you can continue to provide files, delete the upload again, or test how NOMAD is processing your files. -Once an upload is published, it becomes immutable. - -<figure markdown> - { width=600 } - <figcaption>NOMAD's main entities</figcaption> -</figure> - -An *upload* can contain an arbitrary directory structure of *raw* files. -For each recognized *mainfile*, NOMAD creates an entry. -Therefore, an *upload* contains a list of *entries*. -Each *entry* is associated with its *mainfile*, an *archive*, and all other *auxiliary* files in the same directory. -*Entries* are automatically aggregated into *materials* based on the extract materials metadata. -*Entries* (of many uploads) can be manually curated into *datasets*for which you can also get a DOI. 
\ No newline at end of file diff --git a/docs/learn/schema.png b/docs/learn/schema.png new file mode 100644 index 0000000000000000000000000000000000000000..d256dc9acc35eb76c1f674e9d24c990bf22fb060 Binary files /dev/null and b/docs/learn/schema.png differ diff --git a/docs/learn/schema_language.png b/docs/learn/schema_language.png new file mode 100644 index 0000000000000000000000000000000000000000..bd328a9dda0b0748d2f028b98cb18873eb1c3454 Binary files /dev/null and b/docs/learn/schema_language.png differ diff --git a/docs/learn/schemas.md b/docs/learn/schemas.md deleted file mode 100644 index 3e335e7df6b5415ddc770625eed9a702f17f2e15..0000000000000000000000000000000000000000 --- a/docs/learn/schemas.md +++ /dev/null @@ -1,56 +0,0 @@ -# Schemas and Structured Data in NOMAD - -NOMAD stores all processed data in a *well defined*, *structured*, and *machine readable* -format. Well defined means that each element is supported by a formal definition that provides -a name, description, location, shape, type, and possible unit for that data. It has a -hierarchical structure that logically organizes data in sections and subsections and allows -cross-references between pieces of data. Formal definitions and corresponding -data structures enable the machine processing of NOMAD data. - - - -## The Metainfo is the schema for Archive data. -The Archive stores descriptive and structured information about materials-science -data. Each entry in NOMAD is associated with one Archive that contains all the processed -information of that entry. What information can possibly exist in an archive, how this -information is structured, and how this information is to be interpreted is governed -by the Metainfo. - -## On schemas and definitions -Each piece of Archive data has a formal definition in the Metainfo. These definitions -provide data types with names, descriptions, categories, and further information that -applies to all incarnations of a certain data type. 
- -Consider a simulation `Run`. Each -simulation run in NOMAD is characterized by a *section*, that is called *run*. It can contain -*calculation* results, simulated *systems*, applied *methods*, the used *program*, etc. -What constitutes a simulation run is *defined* in the metainfo with a *section definition*. -All other elements in the Archive (e.g. *calculation*, *system*, ...) have similar definitions. - -Definitions follow a formal model. Depending on the definition type, each definition -has to provide certain information: *name*, *description*, *shape*, *units*, *type*, etc. - -## Types of definitions - -- *Sections* are the building block for hierarchical data. A section can contain other - sections (via *subsections*) and data (via *quantities*). -- *Subsections* define a containment relationship between sections. -- *Quantities* define a piece of data in a section. -- *References* are special quantities that allow to define references from a section to - another section or quantity. -- *Categories* allow to categorize definitions. -- *Packages* are used to organize definitions. - -## Interfaces -The Archive format and Metainfo schema is abstract and not not bound to any -specific storage format. Archive and Metainfo can be represented in various ways. -For example, NOMAD internally stores archives in a binary format, but serves them via -API in json. Users can upload archive files (as `.archive.json` or `.archive.yaml`) files. -Metainfo schema can be programmed with Python classes, but can also be uploaded as -archive files (the Metainfo itself is just a specific Archive schema). The following -chart provides a sense of various ways that data can be entered into NOMAD: - - - -There are various interface to provide or retrieve Archive data and Metainfo schemas. -The following documentation sections will explain a few of them. 
diff --git a/docs/learn/screenshot.png b/docs/learn/screenshot.png
new file mode 100644
index 0000000000000000000000000000000000000000..562c8123634bd3c4ffa53a2c4ccbaa472622dc09
Binary files /dev/null and b/docs/learn/screenshot.png differ
diff --git a/docs/learn/stack.png b/docs/learn/stack.png
index 41db1e7c86e4fdd7129b4567f1ca516ca1639a7a..16925c9e99c8f1536383ce2a0b97c65b9d396889 100644
Binary files a/docs/learn/stack.png and b/docs/learn/stack.png differ
diff --git a/docs/learn/super_structure.png b/docs/learn/super_structure.png
new file mode 100644
index 0000000000000000000000000000000000000000..9c1fb998634477616cb80503499ab3c08545f0fc
Binary files /dev/null and b/docs/learn/super_structure.png differ
diff --git a/docs/oasis/customize.md b/docs/oasis/customize.md
index 30609f00655a8b0be2fc3f293e39ada799d1393b..3a005529da537e8211b97436ef13f36938527a9c 100644
--- a/docs/oasis/customize.md
+++ b/docs/oasis/customize.md
@@ -5,7 +5,7 @@
 This is an incomplete list of potential customizations. Please read the respective
 guides to learn more.
 
-- Installation specific changes (domain, path-prefix): [How to install an Oasis](install.md)
+- Installation specific changes (domain, path-prefix): [How-to install an Oasis](install.md)
 - [Restricting user access](admin.md#restricting-access-to-your-oasis)
 - Write .yaml based [schemas](../schemas/basics.md) and [ELNs](../schemas/elns.md)
 - Learn how to use the [tabular parser](../schemas/tabular.md) to manage data from .xls or .csv
diff --git a/docs/plugins/parsers.md b/docs/plugins/parsers.md
index c060e9ace6f3875c8a089812958d4aed50ca654a..b08a6c12e762d3c96507b6768c06621b7b8a72c8 100644
--- a/docs/plugins/parsers.md
+++ b/docs/plugins/parsers.md
@@ -2,6 +2,6 @@ NOMAD uses parsers to convert raw code input and output files into NOMAD's commo
 
 ## Getting started
 
-Fork and clone the [schema example project](https://github.com/nomad-coe/nomad-parser-plugin-example) as described in [before](plugins.md). Follow the original [How to write a parser](../develop/parsers.md) documentation.
+Fork and clone the [parser example project](https://github.com/nomad-coe/nomad-parser-plugin-example) as described [before](plugins.md). Follow the original [How-to write a parser](../develop/parsers.md) documentation.
 
 {{pydantic_model('nomad.config.plugins.Parser', heading='### Parser plugin metadata', hide=['code_name','code_category','code_homepage','metadata'])}}
\ No newline at end of file
diff --git a/docs/plugins/plugins.md b/docs/plugins/plugins.md
index 69c95319df534704c3055c02c0f7f3221f7e00d8..d6453d14df2ec91a6d955d87bb5ee742f294f5ee 100644
--- a/docs/plugins/plugins.md
+++ b/docs/plugins/plugins.md
@@ -98,20 +98,102 @@ Now follow the instructions for one of our examples and try for yourself:
 
 - [schema plugin](https://github.com/nomad-coe/nomad-schema-plugin-example)
 - [parser plugin](https://github.com/nomad-coe/nomad-parser-plugin-example)
+
+## Publish a plugin
+
+!!! attention
+    The processes around publishing plugins and using plugins of others are still
+    being worked on. The "best" practices mentioned here are preliminary.
+
+### Create a (GitHub) project
+
+If you forked from our examples, you already have a GitHub project. Otherwise, you
+should create one. This allows others to get your plugin sources or initiate communication
+via issues or pull requests.
+
+These are good names for plugin projects, depending on whether you maintain one or more
+plugins in a project (a project can contain multiple modules with multiple
+`nomad-plugin.yaml` files and therefore multiple plugins):
+
+- nomad-<yourname\>-plugin
+- nomad-<yourname\>-plugins
+
+!!! note
+    If you develop a plugin in the context of **FAIRmat** or the **NOMAD CoE**, put your
+    plugin projects in the respective GitHub organization for [FAIRmat](https://github.com/fairmat-nfdi)
+    and the [NOMAD CoE](https://github.com/nomad-coe). Here, the naming convention above is binding.
+
+Your plugin projects should follow the layout of our example projects.
+
+### Different forms of plugin distribution
+
+- **source code**: Mounting plugin code into a NOMAD (Oasis) installation. This is described above and only
+the plugin source code is needed.
+- **built-in**: Plugins that are directly maintained by NOMAD and distributed as part of
+the NOMAD docker images. The Python code for those plugins is already installed; you only need
+to configure NOMAD to use the plugins (or not).
+- **PyPI/pip package**: Plugin projects can be published as PyPI/pip packages. Those
+packages can then be installed either during NOMAD start-up (not implemented yet) or
+when building a customized docker image (see [below](#pypipip-package)).
+
+Independent of the form of distribution, you'll still need to add the plugin to
+your configuration as explained above.
+
+### PyPI/pip package
+
+Learn from the PyPI documentation how to [create a package for PyPI](https://packaging.python.org/en/latest/tutorials/packaging-projects/).
+We recommend using the `pyproject.toml`-based approach. Here is an example `pyproject.toml` file:
+
+```toml
+--8<-- "examples/plugins/schema/pyproject.toml"
+```
+
+The package can be built like this:
+
+```
+pip install build
+python -m build --sdist
+```
+
+Learn from the PyPI documentation how to [publish a package to PyPI](https://packaging.python.org/en/latest/tutorials/packaging-projects/#uploading-the-distribution-archives).
+If you have access to the MPCDF GitLab and NOMAD's presence there, you can also
+use the `nomad-FAIR` registry:
+
+```
+pip install twine
+twine upload \
+    -u <username> -p <password> \
+    --repository-url https://gitlab.mpcdf.mpg.de/api/v4/projects/2187/packages/pypi \
+    dist/nomad-example-schema-plugin-*.tar.gz
+```
+
+### Register your plugin
+
+!!! attention
+    This is work in progress. We plan to provide a plugin registry that allows you to
+    publish your plugin's *metadata*. This will then be used to simplify plugin management
+    within a NOMAD installation.
+
+    The built-in plugins can already be found in the [documentation reference](../reference/plugins.md).
+
 ## Add a plugin to your NOMAD
 
-To add a plugin, you need to add the *plugin metadata* to `nomad.yaml` (see above) and you need
-to add the *plugin code* to the `PYTHONPATH` of your NOMAD. The `nomad.yaml` needs to be
-edited manually in the usual way. There are several ways to
-add *plugin code* to a NOMAD installation.
+Adding a plugin depends on the form of plugin distribution and how you run NOMAD.
+Eventually, you need to add the *plugin metadata* to `nomad.yaml` (see above) and you need
+to add the *plugin code* to the `PYTHONPATH`. The `nomad.yaml` needs to be
+edited manually in the usual way. There are several ways to add *plugin code*.
+
+### Built-in plugins
 
-### Development setup of NOMAD
+Those are already part of the NOMAD sources or NOMAD docker images. You only need
+to configure them in your `nomad.yaml`.
 
-Simply add the plugin directory to the `PYTHONPATH` environment variable. When you start
-the application (e.g. `nomad admin run appworker`), Python will find your code when NOMAD
+### Add to Python path
+
+When you run NOMAD as a developer, simply add the plugin directory to the `PYTHONPATH` environment variable.
+When you start the application (e.g. `nomad admin run appworker`), Python will find your code when NOMAD
 imports the `python_package` given in the `plugins.options` of your `nomad.yaml`.
 
-### NOMAD Oasis
+### Mount into a NOMAD Oasis
 
 The NOMAD docker image adds the folder `/app/plugins` to the `PYTHONPATH`. You simply have
 to add the *plugin metadata* to your Oasis' `nomad.yaml` and mount your code into the `/app/plugins`
@@ -157,14 +239,33 @@ curl localhost/nomad-oasis/alive
 
 Read the [Oasis install guide](../oasis/install.md) for more details.
 
-### Other means
-
-- via python packages (coming soon...)
-- via github projects (coming soon...)
+If the plugin is published on PyPI, you can simply install it with pip. If the
+plugin was published to our MPCDF GitLab registry, you have to use the `--index-url`
+parameter:
 
-## Publish a plugin
+```
+pip install nomad-example-schema-plugin --index-url https://gitlab.mpcdf.mpg.de/api/v4/projects/2187/packages/pypi/simple
+```
+
+Installing via pip works for NOMAD developers, but how do you pip install into an Oasis?
+The package could either be installed when NOMAD is started or via
+a customized docker image.
 
-coming soon...
+!!! attention
+    We still need to implement automatic installation of configured plugins
+    during NOMAD start-up, if they are not already installed.
 
-We plan to provide a plugin registry that allows you to publish your plugin's *metadata*.
-This can then be used to simplify plugin management within a NOMAD installation.
\ No newline at end of file
+You can build a custom NOMAD docker image that has your packages already installed.
+Here is an example `Dockerfile`:
+
+```Dockerfile
+--8<-- "examples/plugins/schema/Dockerfile"
+```
+
+The image can be built like this:
+
+```
+docker build -t nomad-with-plugins .
+```
diff --git a/docs/plugins/schemas.md b/docs/plugins/schemas.md
index c73dfc9321a47cf79e97f670f53715d6a7a94899..96069ce3538ddf121fc0ca39db484eec96330711 100644
--- a/docs/plugins/schemas.md
+++ b/docs/plugins/schemas.md
@@ -386,7 +386,7 @@ print(json.dumps(calc.m_to_dict(), indent=2))
 
 ### Access structured data via the NOMAD Python package
 
 The NOMAD Python package provides utilities to [query large amounts of
-archive data](/archive_query.md). This uses the build-in Python schema classes as
+archive data](/archive_query.md). This uses the built-in Python schema classes as
 an interface to the data.
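The hunk above points at querying archive data programmatically. As a sketch of what such a query looks like on the wire: the payload shape (`query`, `required`, `pagination`) and the endpoint URL follow the public NOMAD v1 API, but treat the exact field names and the search quantity used here as assumptions to verify against the API reference:

```python
import json

# An archive query: filter entries by a search quantity and request only
# the archive sections we need (payload fields assumed from the v1 API docs).
payload = {
    'query': {'results.method.simulation.program_name': 'VASP'},
    'pagination': {'page_size': 2},
    'required': {'run': {'system': '*'}},
}
body = json.dumps(payload)

# Sending it would look like this (requires the `requests` package and
# network access, hence commented out):
# import requests
# response = requests.post(
#     'https://nomad-lab.eu/prod/v1/api/v1/entries/archive/query', data=body)
# for entry in response.json()['data']:
#     print(entry['entry_id'])
```

The `nomad-lab` package's `ArchiveQuery` wraps exactly this kind of request and hands back schema objects instead of raw JSON.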
 ## Custom normalizers
diff --git a/docs/reference/parsers.md b/docs/reference/parsers.md
index 53904ab601882ce760bb8081ae9edb4948e76b40..e9a12fd84cd154bb79760954d13b81522f3edcb2 100644
--- a/docs/reference/parsers.md
+++ b/docs/reference/parsers.md
@@ -4,8 +4,8 @@
 
 You might also want to read:
 
- - [How to run parsers locally](../apis/local_parsers.md)
- - [How to develop a parser plugin](../plugins/parsers.md)
+ - [How-to run parsers locally](../apis/local_parsers.md)
+ - [How-to develop a parser plugin](../plugins/parsers.md)
 
 This is a list of all available parsers and supported file formats:
 
diff --git a/docs/reference/plugins.md b/docs/reference/plugins.md
new file mode 100644
index 0000000000000000000000000000000000000000..9bdb7687bfdbff42e0b9f7891f42e8e010845a00
--- /dev/null
+++ b/docs/reference/plugins.md
@@ -0,0 +1,9 @@
+# Built-in plugins
+
+!!! note
+
+    You might also want to read [the plugin how-tos](../plugins/plugins.md)
+
+This is a list of all built-in plugins:
+
+{{ plugin_list() }}
diff --git a/docs/schemas/basics.md b/docs/schemas/basics.md
index e9e3a907987a32d23fc0340ddc2c802550a4f8a4..965d74d6968b00c33b626276bc11fc8037617c2f 100644
--- a/docs/schemas/basics.md
+++ b/docs/schemas/basics.md
@@ -1,6 +1,6 @@
 # Write NOMAD Schemas in YAML
 
-This guide explains how to write and upload NOMAD schemas in our `.archive.yaml` format. For more information visit the [learn section on schemas](../learn/schemas.md).
+This guide explains how to write and upload NOMAD schemas in our `.archive.yaml` format. For more information visit the [learn section on schemas](../learn/data.md).
 
 ## Example data
 
@@ -102,7 +102,7 @@ NOMAD manages units and data with units via the [Pint](https://pint.readthedocs.
 be simple units (or their aliases) or complex expressions. Here are a few
 examples: `m`, `meter`, `mm`, `millimeter`, `m/s`, `m/s**2`.
 
-While you can use all kinds of units in your uploaded schemas, the build-in NOMAD schema (Metainfo) uses only SI units.
+While you can use all kinds of units in your uploaded schemas, the built-in NOMAD schema (Metainfo) uses only SI units.
 
 ## Sub-sections
 
@@ -287,7 +287,7 @@ The fact that a sub-section or reference target can have different "forms" (i.e.
 
 ### Pre-defined sections
 
-NOMAD provides a series of build-in *section definitions*. For example, there is `EntryArchive`, a definition for the top-level object in all NOMAD archives (e.g. `.archive.yaml` files). Here is a simplified except of the *main* NOMAD schema `nomad.datamodel`:
+NOMAD provides a series of built-in *section definitions*. For example, there is `EntryArchive`, a definition for the top-level object in all NOMAD archives (e.g. `.archive.yaml` files). Here is a simplified excerpt of the *main* NOMAD schema `nomad.datamodel`:
 
 ```yaml
 EntryArchive:
@@ -318,7 +318,7 @@ example:
 --8<-- "examples/docs/inheritance/hello.archive.yaml"
 ```
 
-Here are a few other build-in section definitions and packages of definitions:
+Here are a few other built-in section definitions and packages of definitions:
 
 |Section definition or package|Purpose|
 |---|---|
diff --git a/docs/schemas/tabular.md b/docs/schemas/tabular.md
index 2944dc79c7b58d0624c438554fea0d320b6e1d9d..89547db259231e9c59740dbf615fb2d07cc7adb1 100644
--- a/docs/schemas/tabular.md
+++ b/docs/schemas/tabular.md
@@ -216,7 +216,7 @@ and supports exporting experimental data in ELN file format. ELNFileFormat is a
 that contains <b>metadata</b> of your elabFTW project along with all other associated data of
 your experiments.
 
-<b>How to import elabFTW data into NOMAD:</b>
+<b>How-to import elabFTW data into NOMAD:</b>
 
 Go to your elabFTW experiment and export your project as `ELN Archive`. Save the file to your
 filesystem under your preferred name and location (keep the `.eln` extension intact).
@@ -255,7 +255,7 @@ To do so, the necessary information are listed in the table below:
 
 The password (user credential) to authenticate and login the user.
 <b>Important Note</b>: this information <b>is discarded</b> once the authentication process is finished.
 
-<b>How to import Labfolder data into NOMAD:</b>
+<b>How-to import Labfolder data into NOMAD:</b>
 
 To get your data transferred to NOMAD, first go to NOMAD's upload page and create a new upload.
 Then click on `CREATE ENTRY` button. Select a name for your entry and pick `Labfolder Project Import` from
diff --git a/docs/schemas/workflow-schema.png b/docs/schemas/workflow-schema.png
index c9865704a36814f78bc37ad42b6f124840c3a683..ab7153ef14eac7e70be16bdfdd462e13434d4605 100644
Binary files a/docs/schemas/workflow-schema.png and b/docs/schemas/workflow-schema.png differ
diff --git a/docs/schemas/workflows.md b/docs/schemas/workflows.md
index 0f64fc1fec47e1d6f768d3142834cdea38d69d3b..bd6c42b88a61c6f3a8c8a86a88e961cabbe41ab8 100644
--- a/docs/schemas/workflows.md
+++ b/docs/schemas/workflows.md
@@ -2,7 +2,7 @@
 title: Workflows
 ---
 
-## The build-in abstract workflow schema
+## The built-in abstract workflow schema
 
 Workflows are an important aspect of data as they explain how the data came to be.
 Let's first clarify that *workflow* refers to a workflow that already happened and that has
diff --git a/examples/plugins/schema b/examples/plugins/schema
index c6ba2ce8b842df5e29c6a3a8e1ce080c0c260163..d484c6e55cc0458d982092b4fa23084201997e71 160000
--- a/examples/plugins/schema
+++ b/examples/plugins/schema
@@ -1 +1 @@
-Subproject commit c6ba2ce8b842df5e29c6a3a8e1ce080c0c260163
+Subproject commit d484c6e55cc0458d982092b4fa23084201997e71
diff --git a/mkdocs.yml b/mkdocs.yml
index 5ef8924a1638a995663510373e3215c10e1f54ac..286d166916b299374bf6a2b95ca32e0d098bcca6 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -5,48 +5,49 @@ site_author: The NOMAD Authors
 nav:
   - Home: index.md
   - Tutorial: tutorial.md
-  - How to guides:
+  - How-to guides:
     - Data Management:
-      - How to upload/publish data: data/upload.md
-      - How to use ELNs: data/eln.md
-      - How to explore data: data/explore.md
-      - How to use NORTH: data/north.md
+      - How-to upload/publish data: data/upload.md
+      - How-to use ELNs: data/eln.md
+      - How-to explore data: data/explore.md
+      - How-to use NORTH: data/north.md
     - Schemas:
-      - How to write a schema: schemas/basics.md
-      - How to define ELNs: schemas/elns.md
-      - How to define tabular data: schemas/tabular.md
-      - How to define workflows: schemas/workflows.md
-      - How to reference hdf5: schemas/hdf5.md
+      - How-to write a schema: schemas/basics.md
+      - How-to define ELNs: schemas/elns.md
+      - How-to define tabular data: schemas/tabular.md
+      - How-to define workflows: schemas/workflows.md
+      - How-to reference hdf5: schemas/hdf5.md
     - Programming interfaces:
-      - How to use the API: apis/api.md
-      - How to install nomad-lab: apis/pythonlib.md
-      - How to access processed data: apis/archive_query.md
-      - How to run a parser: apis/local_parsers.md
+      - How-to use the API: apis/api.md
+      - How-to install nomad-lab: apis/pythonlib.md
+      - How-to access processed data: apis/archive_query.md
+      - How-to run a parser: apis/local_parsers.md
     - Plugins:
-      - How to develop and use plugins: plugins/plugins.md
-      - How to write schema plugins: plugins/schemas.md
-      - How to write parser plugins: plugins/parsers.md
+      - How-to develop, publish, and install plugins: plugins/plugins.md
+      - How-to write schema plugins: plugins/schemas.md
+      - How-to write parser plugins: plugins/parsers.md
     - Development:
-      - How to get started: develop/setup.md
+      - How-to get started: develop/setup.md
       - Code guidelines: develop/guides.md
-      - How to contribute: develop/gitlab.md
-      - How to extend the search: develop/search.md
-      - How to write a parser: develop/parsers.md
-      - How to write a normalizer: develop/normalizers.md
+      - How-to contribute: develop/gitlab.md
+      - How-to extend the search: develop/search.md
+      - How-to write a parser: develop/parsers.md
+      - How-to write a normalizer: develop/normalizers.md
     - Oasis:
-      - How to install an Oasis: oasis/install.md
-      - How to customize an Oasis: oasis/customize.md
-      - How to migrate Oasis versions: oasis/migrate.md
+      - How-to install an Oasis: oasis/install.md
+      - How-to customize an Oasis: oasis/customize.md
+      - How-to migrate Oasis versions: oasis/migrate.md
       - Administrative tasks: oasis/admin.md
   - Learn:
-    - Files, processing, data: learn/how_nomad_works.md
-    - Schemas and structured data: learn/schemas.md
+    - From files to data: learn/basics.md
+    - Structured data: learn/data.md
     - Architecture: learn/architecture.md
     - Why you need an Oasis: learn/oasis.md
   - Reference:
     - reference/config.md
     - reference/annotations.md
     - reference/cli.md
+    - reference/plugins.md
     - reference/parsers.md
     - reference/glossary.md
 theme:
diff --git a/nomad/mkdocs.py b/nomad/mkdocs.py
index af837ad22c2d63196cb0c88959303ce37d3059d7..780cb7679c91ae6cd30ed3da35e388717d213c37 100644
--- a/nomad/mkdocs.py
+++ b/nomad/mkdocs.py
@@ -33,7 +33,7 @@ from markdown.extensions.toc import slugify
 from nomad.utils import strip
 from nomad import config
-from nomad.config.plugins import Parser
+from nomad.config.plugins import Parser, Plugin
 from nomad.app.v1.models import (
     query_documentation,
     owner_documentation)
@@ -419,3 +419,29 @@ def define_env(env):
         render_category(name, category)
         for name, category in categories.items()
     ])
+
+    @env.macro
+    def plugin_list():  # pylint: disable=unused-variable
+        plugins = list(config.plugins.options.values())
+
+        def render_plugin(plugin: Plugin) -> str:
+            result = plugin.name
+            docs_or_code_url = plugin.plugin_documentation_url or plugin.plugin_source_code_url
+            if docs_or_code_url:
+                result = f'[{plugin.name}]({docs_or_code_url})'
+            if plugin.description:
+                result = f'{result} ({plugin.description})'
+
+            return result
+
+        categories = {}
+        for plugin in plugins:
+            category = plugin.python_package.split('.')[0]
+            categories.setdefault(category, []).append(plugin)
+
+        return '\n\n'.join([
+            f'**{category}**: {", ".join([render_plugin(plugin) for plugin in plugins])}'
+            for category, plugins in categories.items()
+        ])
diff --git a/pyproject.toml b/pyproject.toml
index 5c5d5e978490b9baf40ad7c32c275f83599c2fbe..27aff1d5d48890459331bfdf70c12c396cca8e8b 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -10,7 +10,7 @@ authors = [
     { name = "NOMAD Laboratory", email = 'markus.scheidgen@physik.hu-berlin.de' },
 ]
 dynamic = ["version"]
-license = { file = "LICENSE" }
+license = { text = "Apache-2.0" }
 requires-python = ">=3.9"
 dependencies = [
     'numpy~=1.22.4',
diff --git a/tests/datamodel/metainfo/eln/test_system.py b/tests/datamodel/metainfo/eln/test_system.py
index ac2f7e70d4fb4de6a0def25ac4ec9cede7173d62..4638c62145ed3a2706d33e1a71c6db6f10a7095e 100644
--- a/tests/datamodel/metainfo/eln/test_system.py
+++ b/tests/datamodel/metainfo/eln/test_system.py
@@ -16,10 +16,13 @@
 # limitations under the License.
 #
 
+import pytest
+
 from tests.normalizing.conftest import run_processing, run_normalize
 from nomad.datamodel.data import User
 
 
+@pytest.mark.skip()
 def test_substance(raw_files, no_warn):
     directory = 'tests/data/datamodel/metainfo/eln'
     mainfile = 'test_substance.archive.yaml'
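The `plugin_list` macro added to `nomad/mkdocs.py` above groups plugins by their top-level Python package and renders each as a Markdown link with an optional description. The same logic, sketched standalone with a hypothetical stand-in for the real `Plugin` config class (the real class carries more fields, e.g. `plugin_source_code_url`):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Plugin:
    # Hypothetical, simplified stand-in for nomad.config.plugins.Plugin.
    name: str
    python_package: str
    plugin_documentation_url: Optional[str] = None
    description: Optional[str] = None


def render_plugin(plugin: Plugin) -> str:
    # Link the name if a documentation URL is known, then append the description.
    result = plugin.name
    if plugin.plugin_documentation_url:
        result = f'[{plugin.name}]({plugin.plugin_documentation_url})'
    if plugin.description:
        result = f'{result} ({plugin.description})'
    return result


def plugin_list(plugins) -> str:
    # Group plugins by their top-level package name; render one Markdown
    # line per group, separated by blank lines.
    categories = {}
    for plugin in plugins:
        categories.setdefault(plugin.python_package.split('.')[0], []).append(plugin)
    return '\n\n'.join(
        f'**{category}**: {", ".join(render_plugin(p) for p in group)}'
        for category, group in categories.items())
```

In the real macro the inner join iterates the `plugins` variable rebound by `for category, plugins in categories.items()`, i.e. each category's own plugin list; the `group` name here just makes that rebinding explicit.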