|
|
NomadMetaInfo are the main means used to describe the information stored in NomadLab.
|
|
|
|
|
|
# The thinking behind metadata #
|
|
|
|
|
|
The *NOMAD Archive* contains the result of parsing and normalising the data contained in the *Repository*. Ideally, the *Archive* contains all the (parsable) data of the *Repository*, but in an easy-to-access format, so that the *Database* for the *Encyclopedia* can be built starting from the data structure of the *Archive*. The first step is to define a metadata structure for archiving the data unambiguously.
|
|
|
|
|
|
When classifying data, identifying the type of a value, and describing it, is of crucial importance. Metadata is information about the data, so its meaning depends on what one considers as data. Here, by data we mean the properties, values, etc. contained in the files stored in the *NOMAD Repository*. Metadata is thus the information used to identify, describe, and classify these values. For example, the name of the program used to perform a given calculation (e.g., “VASP”) is data, while the related metadata is the string “program used to perform the calculation”. If one thinks of storing data as ‘key’-‘value’ pairs (as in a dictionary), the ‘key’ is the metadata. To avoid conflicts (doubly defined metadata with different meanings), all names would need to be registered centrally. Since one would typically like to avoid very long strings as metadata names, the “good short” names would quickly run out.
|
|
|
|
|
|
These problems can be solved by introducing hierarchically structured metadata. Adding a longer description to the short name makes the meaning clearer. Note that the name of a metadata is the label used to refer to it directly when writing a parser, so long names would be inconvenient in the code. The description, on the other hand, can be looked up via the metadata name when needed. The metadata themselves can be described by introducing multiple inheritance.
|
|
|
|
|
|
In this context, inheritance among metadata means that one metadata, called the child, has the same features as another metadata, called the parent. The child may have more features than the parent, but it inherits all of the parent’s features. *Multiple* refers to the fact that one child may have several parents. Parents can in turn have their own (super)parents, and children can have children of their own. This creates a hierarchical structure.
|
|
|
|
|
|
In our case, for instance, the metadata *energy\_total\_scf\_converged* (containing the final, converged, total energy, i.e., electronic plus ionic) inherits from both *energy\_total\_potential* and *section\_single\_configuration\_calculation*. The former is an abstract-type metadata (see below for details on metadata types) that contains final energy quantities, i.e., those that are SCF converged or the result of a perturbative method. The latter is a section-type metadata that groups all metadata related to a calculation performed on a given configuration with a given method (see below for more details on sections).
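
Because a child can have several parents, the inheritance hierarchy is a graph rather than a tree. The following is a minimal sketch, assuming a hypothetical in-memory registry that maps each name to its direct parents (its `superNames`). It collects every direct and indirect parent of a metadata. The parent listed for *section\_single\_configuration\_calculation* is an assumption made for illustration.

```python
from typing import Dict, List, Set

# Hypothetical, trimmed-down registry mapping each metadata name to its
# direct parents ("superNames"). The parent of
# section_single_configuration_calculation is an assumption for illustration;
# other parents are omitted for brevity.
SUPER_NAMES: Dict[str, List[str]] = {
    "energy_total_scf_converged": [
        "energy_total_potential",
        "section_single_configuration_calculation",
    ],
    "energy_total_potential": [],
    "section_single_configuration_calculation": ["section_run"],
    "section_run": [],
}


def ancestors(name: str, super_names: Dict[str, List[str]]) -> Set[str]:
    """Return all direct and indirect parents of `name`.

    With multiple inheritance the hierarchy is a directed acyclic graph,
    so a parent reachable along several paths is collected only once.
    """
    found: Set[str] = set()
    pending = list(super_names.get(name, []))
    while pending:
        parent = pending.pop()
        if parent not in found:
            found.add(parent)
            pending.extend(super_names.get(parent, []))
    return found


print(sorted(ancestors("energy_total_scf_converged", SUPER_NAMES)))
# ['energy_total_potential', 'section_run', 'section_single_configuration_calculation']
```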
|
|
|
|
|
|
To allow the reuse of short and descriptive names, a string called the unique identifier (gid) is assigned to each metadata. The gid depends not just on the name, but also on the description and on the identifiers of all its dependencies. An identifier clash is therefore very unlikely, as it would require two metadata with the same name, description, and parents to mean two different things.
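
To make the dependency on name, description, and parents concrete, here is a toy identifier built as a SHA-256 checksum over a canonical json serialisation of the definition, with the parents entering through their own gids. This is an assumption for illustration only. It is not the algorithm NomadMetaInfo actually uses to compute gids.

```python
import hashlib
import json
from typing import Dict, List


def toy_gid(definition: Dict, parent_gids: List[str]) -> str:
    """Toy unique identifier for a metadata definition.

    Assumption: this is NOT the real NomadMetaInfo gid algorithm. It only
    reproduces the property described above: the identifier is a checksum
    over the whole definition and over the gids of all its parents.
    """
    payload = dict(definition)
    payload.pop("superNames", None)  # parents enter through their gids instead
    payload["superGids"] = sorted(parent_gids)
    # Canonical serialisation, so key order does not change the checksum.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:28]


parent = toy_gid({"name": "energy_total_potential", "description": "final energies"}, [])
a = toy_gid({"name": "energy_total_scf_converged", "description": "A total (final, converged) energy"}, [parent])
b = toy_gid({"name": "energy_total_scf_converged", "description": "Some other energy"}, [parent])
print(a != b)  # True: same name, different description, hence different gid
```

Changing a parent's definition changes the parent's gid, which in turn changes the gid of every child. That is why two independently created metadata with the same short name do not collide.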
|
|
|
|
|
|
# Conceptual model for calculations #
|
|
|
|
|
|
Most data values do not make sense in isolation from their context, as they are connected to each other. For example, an *energy\_total\_scf\_converged* value is not independent of the system (atomic configuration, etc.) it refers to. Thus, we have to define which values are grouped together. This is done using metadata of type *section*.
|
|
|
The value associated with a section metadata is a list of groups of connected keys and values.
|
|
|
For example, a typical calculation has the following sections:
|
|
|
* *section\_run*: represents a single "run" of the program,
|
|
|
* *section\_method*: contains the information defining the theory level and the convergence parameters,
|
|
|
* *section\_system\_description*: each instance of this section corresponds to a different, specific system configuration,
|
|
|
* *section\_single\_configuration\_calculation*: contains the results for a system as defined in a single *section\_method* and a single *section\_system\_description*.
|
|
|
* *section\_scf\_iteration*: each entry is a single self-consistency iteration.
|
|
|
A metadata *x\_index* (of type integer) and a metadata *x\_identifier* (of type string) are implicitly defined for each section *x*. Sections can be nested, meaning that each outer section can contain one or more inner sections.
|
|
|
By using sections, less information needs to be encoded in each individual metadata. For example, an *energy\_total* value can be identified simply as *energy\_total* within a *section\_single\_configuration\_calculation*, while the actual xc functional and computational parameters are found in its associated *section\_method*.
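
The following sketch shows how grouping recovers the context of a value. It assumes a hypothetical parsed *section\_run* stored as nested Python dictionaries, where each section is a list of instances carrying its implicit *x\_index*. The `method_ref`/`system_ref` keys and all values are made up for illustration and are not NomadMetaInfo names.

```python
from typing import Any, Dict

# Hypothetical parsed output of one section_run. Each section is a list of
# instances; "<section>_index" is the implicit x_index. The "method_ref" and
# "system_ref" keys and all values are made up for illustration.
run: Dict[str, Any] = {
    "section_method": [
        {"section_method_index": 0, "xc_functional": "LDA"},
        {"section_method_index": 1, "xc_functional": "PBE"},
    ],
    "section_system_description": [
        {"section_system_description_index": 0, "n_atoms": 2},
    ],
    "section_single_configuration_calculation": [
        {
            "section_single_configuration_calculation_index": 0,
            "method_ref": 1,  # index into section_method
            "system_ref": 0,  # index into section_system_description
            "energy_total": -1.5e-17,  # J
        },
    ],
}


def context_of(run_data: Dict[str, Any], calc_index: int) -> Dict[str, Any]:
    """Return a calculation's energy together with the method and system it refers to."""
    calc = run_data["section_single_configuration_calculation"][calc_index]
    return {
        "energy_total": calc["energy_total"],
        "method": run_data["section_method"][calc["method_ref"]],
        "system": run_data["section_system_description"][calc["system_ref"]],
    }


print(context_of(run, 0)["method"]["xc_functional"])  # PBE
```

The energy itself stays a plain *energy\_total*. Which functional produced it is read from the linked *section\_method* rather than being encoded in the metadata name.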
|
|
|
|
|
|
# Practical implementation #
|
|
|
|
|
|
NomadMetaInfo, i.e., the practical implementation of the NOMAD metadata, uses a json dictionary to describe each metadata:
|
|
|
|
|
|
```json
{
  "name": "energy_total_scf_converged",
  "description": "A total (final, converged) energy calculated with XC_method_scf",
  "kindStr": "type_document_content",
  "dtypeStr": "f",
  "repeats": false,
  "shape": [],
  "superNames": [
    "energy_total_potential",
    "section_single_configuration_calculation"
  ],
  "units": "J"
}
```
|
|
|
|
|
|
and stores a list of such dictionaries under the metadata key of a top-level dictionary, in files ending with ".nomadmetainfo.json".
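
A minimal reader for such files might look like the following sketch. It assumes that the list of definitions sits under a top-level key called `metaInfos`. That key name is an assumption, so check it against the actual files. The definitions are indexed by name, and the reader rejects a file that defines the same name twice.

```python
import json
from pathlib import Path
from typing import Dict

# Assumption: name of the top-level key holding the list of definitions;
# check it against the actual .nomadmetainfo.json files.
DEFINITIONS_KEY = "metaInfos"


def parse_definitions(text: str, key: str = DEFINITIONS_KEY) -> Dict[str, dict]:
    """Parse the content of a .nomadmetainfo.json file and index it by metadata name."""
    document = json.loads(text)
    by_name: Dict[str, dict] = {}
    for definition in document[key]:
        name = definition["name"]
        if name in by_name:
            raise ValueError(f"duplicate definition for {name!r}")
        by_name[name] = definition
    return by_name


def load_definitions(path: Path, key: str = DEFINITIONS_KEY) -> Dict[str, dict]:
    """Read a metadata file; its name must end with .nomadmetainfo.json."""
    if not path.name.endswith(".nomadmetainfo.json"):
        raise ValueError(f"not a metadata file: {path.name}")
    return parse_definitions(path.read_text(encoding="utf-8"), key)
```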
|
|
|
|
|
|
There is a git repository, nomad-meta-info, that contains the current version of the metadata, along with several tools that help with handling NOMAD metadata, verifying its correctness, managing versions, and so on.
|
|
|
|
|
The goal of the NOMAD metadata is to describe not just the data of a calculation and the various properties calculated in a run, but also derived quantities that are not necessarily parsed, such as basis-set-superposition-error (BSSE) corrected energies. As said, it gives a unique way to identify a given property and allows one to easily treat similar properties in the same way.
|
|
|
|
|
|
The metadata type is declared in *kindStr* and can be:
|
|
|
* *type\_document\_content*: has an associated value, but cannot be further inherited from (this is the default type),
|
|
|
* *type\_section*: describes a section that groups related quantities,
|
|
|
* *type\_abstract\_document\_content*: used only to classify other metadata.
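
The rules for the three kinds can be checked mechanically. The following is a minimal sketch, assuming definitions are already indexed by name as plain dictionaries. It applies the default kind when *kindStr* is missing, rejects kinds other than the three listed above, and flags any metadata that inherits from *type\_document\_content*, which cannot be inherited from. It is not one of the repository's own verification scripts.

```python
from typing import Dict, List

# Only the three kinds listed above are accepted in this sketch.
KINDS = {"type_document_content", "type_section", "type_abstract_document_content"}
DEFAULT_KIND = "type_document_content"


def kind_of(definition: dict) -> str:
    """Return the declared kind, falling back to the default when kindStr is absent."""
    kind = definition.get("kindStr", DEFAULT_KIND)
    if kind not in KINDS:
        raise ValueError(f"{definition.get('name')}: unknown kindStr {kind!r}")
    return kind


def inheritance_errors(definitions: Dict[str, dict]) -> List[str]:
    """List children whose parent is type_document_content, which cannot be inherited from."""
    errors = []
    for name, definition in sorted(definitions.items()):
        for parent in definition.get("superNames", []):
            if parent in definitions and kind_of(definitions[parent]) == DEFAULT_KIND:
                errors.append(f"{name} inherits from document content {parent}")
    return errors
```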
|
|
|
|
|
|
# Web Interface #
|
|
|
|
|
|
A web REST interface offers HTML pages both for a [complete version](https://nomad-dev.rz-berlin.mpg.de/nmi/v/last/info.html) and for single values:
|
|
|
|
|
|
`https://nomad-dev.rz-berlin.mpg.de/nmi/v/common/n/<metadata name>/info.html`,
|
|
|
|
|
|
e.g., [energy\_total\_scf\_converged](https://nomad-dev.rz-berlin.mpg.de/nmi/v/common/n/energy\_total\_scf\_converged/info.html),
|
|
|
|
|
|
as well as json values of a [complete version](https://nomad-dev.rz-berlin.mpg.de/nmi/v/common/info.json) and single values:
|
|
|
|
|
|
`https://nomad-dev.rz-berlin.mpg.de/nmi/v/common/n/<metadata name>/info.json`
|
|
|
|
|
|
e.g., [energy\_total\_scf\_converged](https://nomad-dev.rz-berlin.mpg.de/nmi/v/common/n/energy\_total\_scf\_converged/info.json)
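
The per-metadata URLs follow a simple pattern, so a client can build them from a name. The sketch below uses only the Python standard library. It takes the base URL from the links above. The helper names are hypothetical, and the service itself may no longer be reachable, so only the URL construction is guaranteed to work offline.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base URL taken from the links above; the service may no longer be online.
BASE_URL = "https://nomad-dev.rz-berlin.mpg.de/nmi/v/common"


def metadata_url(name: str, fmt: str = "json") -> str:
    """URL of the html info page or the json value for one metadata name."""
    if fmt not in ("json", "html"):
        raise ValueError("fmt must be 'json' or 'html'")
    return f"{BASE_URL}/n/{quote(name, safe='')}/info.{fmt}"


def fetch_metadata(name: str, timeout: float = 10.0) -> dict:
    """Download and decode the json description of one metadata (needs network access)."""
    with urlopen(metadata_url(name, "json"), timeout=timeout) as response:
        return json.load(response)


print(metadata_url("energy_total_scf_converged"))
# https://nomad-dev.rz-berlin.mpg.de/nmi/v/common/n/energy_total_scf_converged/info.json
```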
|
|
|
|
|
|
# Extensibility #
|
|
|
|
|
|
NomadMetaInfo are defined in such a way that everybody (in principle; in practice, at the moment, only the NOMAD Database developers) can extend them and introduce new types without consulting a central authority, and clashes are basically impossible. This is because, although the same name can be used in different ways, internally NomadMetaInfo are always identified by their gid, a checksum that depends on the whole definition of the metadata and on the gids of all its dependencies. Thus, if someone defines a different "energy\_total\_scf\_converged" metadata (differing in the description string, the inheritance, or some other property), it will have a different gid.
|
|
|
A single document or piece of information has to use a unique definition for each name, but different documents might use different ones without problems. This is very useful for new or experimental properties, that can be stored and used before being standardized.
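
The rule that a name must be unambiguous within one document, but not across documents, can be sketched as follows. The sketch assumes a hypothetical store of definitions keyed by gid, with placeholder gids. Each document resolves its names to exactly one gid. Two documents may bind the same name to different gids, but one document may not use both.

```python
from typing import Dict, Iterable

# Hypothetical store of definitions keyed by gid; the gids are placeholders.
STORE: Dict[str, dict] = {
    "gid-standard": {"name": "energy_total_scf_converged", "description": "standard definition"},
    "gid-experimental": {"name": "energy_total_scf_converged", "description": "experimental variant"},
}


def document_index(gids: Iterable[str], store: Dict[str, dict] = STORE) -> Dict[str, str]:
    """Map each name used in one document to the single gid that defines it."""
    index: Dict[str, str] = {}
    for gid in gids:
        name = store[gid]["name"]
        if index.get(name, gid) != gid:
            raise ValueError(f"{name!r} has two definitions in the same document")
        index[name] = gid
    return index


doc_a = document_index(["gid-standard"])
doc_b = document_index(["gid-experimental"])
print(doc_a["energy_total_scf_converged"] != doc_b["energy_total_scf_converged"])  # True
```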
|
|
|
|
|
|
# Standard NomadMetaInfo #
|
|
|
Still, one should strive to register the type used and use a "standard" version of the metadata, so that one can search across all documents for, e.g., “energy\_total\_scf\_converged” using a unique key that has a clear meaning.
|
|
|
The goal of the repository at https://github.com/nomad-dev/nomad-meta-info is exactly to define this "standard" version of the NomadMetaInfo. They are stored in the "standard_meta_info" directory.
|
|
|
The repository also contains scripts in scripts/nomadscripts that help check the definitions for errors and generate overrides.
|
|
|
|
|
|
# Concrete MetaInfo #
|
|
|
|
|
|
There are linked visualizations of the [last version](https://nomad-dev.rz-berlin.mpg.de/nmi/v/common/info.html), a [more interactive version](https://nomad-dev.rz-berlin.mpg.de/ui/index.html), and a (large!) [svg plot](commonMetaInfo.svg) of the inheritance of the common metadata.
|
|
|
|
|
|
Specific information on some concrete metadata is given in the following pages:
|
|
|
|
|
|
* [basis-set-atom-centered-short-name](metainfo/basis-set-atom-centered-short-name)
|
|
|
* [basis-set-atom-centered-unique-name](metainfo/basis-set-atom-centered-unique-name)
|
|
|
* [basis-set-cell-associated-kind](metainfo/basis-set-cell-associated-kind)
|
|
|
* [basis-set-cell-associated-name](metainfo/basis-set-cell-associated-name)
|
|
|
* [basis-set-kind](metainfo/basis-set-kind)
|
|
|
* [basis-set-name](metainfo/basis-set-name)
|
|
|
* [eigenvalues-kind](metainfo/eigenvalues-kind)
|
|
|
* [energy-comparable](metainfo/energy-comparable)
|
|
|
* [relativity-treatment](metainfo/relativity-treatment)
|
|
|
* [self-interaction-correction](metainfo/self-interaction-correction)
|
|
|
* [van-der-Waals-treatment](metainfo/van-der-Waals-treatment)
|
|
|
* [XC-functional](metainfo/XC-functional)
|
|
|
* [method-current](metainfo/method-current)
|
|
|
* [energy-current](metainfo/energy-current)
|
|
|
|
|
|
# Overrides #
|
|
|
|
|
|
The standard version of the metadata can change. How to map old versions to new ones (when the new metadata is equivalent to the old one for all purposes) can be specified in override files.
|
|
|
|
|
|
These files describe the new version of a NomadMetaInfo by listing its old gid and its new gid, and can thus introduce versioning for NomadMetaInfo.
|
|
|
The name or other keys can be given, but are only informative and can be omitted.
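
Since each override only pairs an old gid with a new one, successive overrides form chains (v1 to v2 to v3). The following is a minimal sketch of resolving such chains. It assumes override entries loaded as dictionaries with hypothetical `oldGid`/`newGid` keys, which are not necessarily the keys of the real file format. The optional `name` key is ignored, because it is only informative.

```python
from typing import Dict, Iterable

# Hypothetical override entries: "oldGid"/"newGid" are assumed key names,
# "name" is purely informative and may be missing. The gids are placeholders.
OVERRIDES = [
    {"name": "energy_total_scf_converged", "oldGid": "gid-v1", "newGid": "gid-v2"},
    {"oldGid": "gid-v2", "newGid": "gid-v3"},
]


def override_map(entries: Iterable[dict]) -> Dict[str, str]:
    """Collect old gid -> new gid, rejecting contradictory entries."""
    mapping: Dict[str, str] = {}
    for entry in entries:
        old, new = entry["oldGid"], entry["newGid"]
        if mapping.get(old, new) != new:
            raise ValueError(f"conflicting overrides for {old}")
        mapping[old] = new
    return mapping


def current_gid(gid: str, mapping: Dict[str, str]) -> str:
    """Follow successive overrides to the newest equivalent gid."""
    seen = {gid}
    while gid in mapping:
        gid = mapping[gid]
        if gid in seen:
            raise ValueError(f"override cycle through {gid}")
        seen.add(gid)
    return gid


print(current_gid("gid-v1", override_map(OVERRIDES)))  # gid-v3
```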
|
|
|
|
|
|
## File Naming convention ##
|
|
|
|
|
|
Normally, overrides are given between two tagged versions, or between the last checked-in version and the current state. Therefore, override files are by default named
|
|
|
|
|
|
`<oldVersion>_<newVersion>_YYYY-MM-DD.nomadmetainfo_overrides.json`
|
|
|
|
|
|
where *oldVersion* can be the first 10 characters of the git sha, a tag name, or even empty, and the same holds for *newVersion*. *YYYY-MM-DD* is the current date; if required, a suffix "_n" with a suitable number *n* that does not clash with existing files can be used.
|
|
|
|
|
|
The extension .nomadmetainfo_overrides.json is mandatory.
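
The naming convention can be sketched as a small helper. It assumes that the optional "_n" counter is placed right after the date, before the mandatory extension, and that it starts at 1. Neither detail is fixed by the text above. The function name and the example sha are hypothetical.

```python
import datetime
from typing import Iterable

EXTENSION = ".nomadmetainfo_overrides.json"  # mandatory extension


def override_file_name(
    old_version: str,
    new_version: str,
    date: datetime.date,
    existing: Iterable[str] = (),
) -> str:
    """Build <oldVersion>_<newVersion>_YYYY-MM-DD[_n].nomadmetainfo_overrides.json.

    Versions may be a tag name, the first 10 characters of a git sha, or empty.
    Assumption: the optional _n counter goes right after the date and starts at 1.
    """
    stem = f"{old_version}_{new_version}_{date.isoformat()}"
    taken = set(existing)
    candidate = stem + EXTENSION
    n = 1
    while candidate in taken:
        candidate = f"{stem}_{n}{EXTENSION}"
        n += 1
    return candidate


sha = "3f9c2a71b0e84d5c"  # made-up git sha
print(override_file_name("v1.0", sha[:10], datetime.date(2016, 3, 1)))
# v1.0_3f9c2a71b0_2016-03-01.nomadmetainfo_overrides.json
```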
|
|
|
|
|
|
## Automatic Generation ##
|
|
|
|
|
|
Normally, you can generate these files automatically with scripts/nomadscripts/calculate\_meta\_info\_overrides.py.
|
|
|
The script works when a KindInfo keeps the same name but has a different gid.
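
The matching rule can be illustrated with a short sketch. It is not the code of calculate\_meta\_info\_overrides.py. It assumes that each version is available as a mapping from metadata name to gid, and it pairs the names present in both versions whose gid changed. The `oldGid`/`newGid` keys are hypothetical.

```python
from typing import Dict, List


def overrides_by_name(old: Dict[str, str], new: Dict[str, str]) -> List[Dict[str, str]]:
    """Pair definitions that keep their name but change gid between two versions.

    `old` and `new` map metadata name -> gid. Renamed metadata share no name,
    so they are not detected and need a hand-written override.
    """
    entries = []
    for name in sorted(old.keys() & new.keys()):
        if old[name] != new[name]:
            entries.append({"name": name, "oldGid": old[name], "newGid": new[name]})
    return entries


old_version = {"energy_total_scf_converged": "gid-a", "section_run": "gid-r"}
new_version = {"energy_total_scf_converged": "gid-b", "section_run": "gid-r"}
print(overrides_by_name(old_version, new_version))
# [{'name': 'energy_total_scf_converged', 'oldGid': 'gid-a', 'newGid': 'gid-b'}]
```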
|
|
|
|
|
|
## Complex Cases ##
|
|
|
|
|
|
In cases in which you have renamed a NomadMetaInfo, or there is a NomadMetaInfo outside the standard that you want to replace with the standard one, you have to create (or complete) the override file by hand. In these cases, the --verbose flag can be useful.
|
|
|
|
|
|
It is also possible to use `scripts/nomadscripts/normalize_local_kinds.py --add-gid` to update each KindInfo with its gid, which you can then use to create manual override files.
|
|
|
|
|
|
Please do not check the generated .nomadmetainfo.json files containing gids into the repository.