Commit 1874328c authored by Markus Scheidgen

Added search docs; added search keys to metainfo browser. Align docs style with app. #699, #592

parent 39b2c947
Pipeline #118409 passed with stages
in 33 minutes and 42 seconds
document.getElementsByClassName("md-header__button")[0].title = "NOMAD"
\ No newline at end of file
......@@ -254,3 +254,58 @@ m_package = Package()
m_package.__init_metainfo__()
```
## Adding definitions to the existing metainfo schema
Now you know how to define new sections and quantities, but how should your additions
be integrated in the existing schema and what conventions need to be followed?
### Metainfo schema super structure
The `EntryArchive` section definition sets the root of the archive for each entry in
NOMAD. It therefore defines the top-level sections:
- `metadata`, all "administrative" metadata (ids, permissions, publish state, uploads, user metadata, etc.)
- `results`, a summary with copies of and references to data from method-specific sections. This also
provides the [searchable metadata](search.md).
- `workflows`, all workflow metadata
- Method-specific sub-sections, e.g. `run`. This is where all parsers are supposed to
add the parsed data.
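As a rough illustration of this top-level layout, the sketch below models an archive as a plain nested dictionary. Only the top-level keys come from the list above; the nested keys and values (`entry_id`, `program_name`, and so on) are made-up placeholders and not the actual NOMAD schema.

```python
# Hypothetical, simplified picture of an entry archive as plain Python data.
# Only the top-level keys are taken from the documentation; everything nested
# below them is an illustrative placeholder.
archive = {
    'metadata': {'entry_id': 'placeholder-id', 'published': False},
    'results': {'material': {'elements': ['H', 'O']}},
    'workflows': [],
    'run': [{'program_name': 'mycode'}],  # method-specific, filled by a parser
}


def top_level_sections(archive: dict) -> list:
    '''Returns the sorted names of the top-level sections present in the archive.'''
    return sorted(archive.keys())


print(top_level_sections(archive))
```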
The main NOMAD Python project includes Metainfo definitions in the following modules:
- `nomad.metainfo` Defines the Metainfo itself. This includes a schema that describes
the Metainfo itself, e.g. there is a section `Section`.
- `nomad.datamodel` Mostly defines the section `metadata` that contains all "administrative"
metadata. It also contains the root section `EntryArchive`.
- `nomad.datamodel.metainfo` Defines all the central, method-specific (but not parser-specific) definitions.
For example, the section `run` with all the simulation (computational materials science)
definitions that are shared among the respective parsers.
### Extending existing sections
Parsers can provide their own definitions. By convention, these are placed in a
`metainfo` sub-module of the parser Python module. The definitions here can add properties
to existing sections (e.g. from `nomad.datamodel.metainfo`). This is done with the
`extends_base_section` [Section property](#sections). By convention, use an `x_mycode_`
prefix for the added properties. Here is an example:
```py
from nomad.metainfo import Section, Quantity, MEnum
from nomad.datamodel.metainfo.simulation import Method


class MyCodeRun(Method):
    # Add the properties below to the existing `Method` section
    # instead of defining a new section.
    m_def = Section(extends_base_section=True)
    x_mycode_execution_mode = Quantity(
        type=MEnum('hpc', 'parallel', 'single'), description='...')
```
### Metainfo schema conventions
- Use lower snake case for section properties; use upper camel case for section definitions.
- Use a `_ref` suffix for references.
- Use sub-sections rather than inheritance to add specific quantities to a general section.
E.g. section `workflow` contains a section `geometry_optimization` for all geometry optimization specific
workflow quantities.
- Prefix parser-specific and custom definitions with `x_name_`, where `name` is the
short handle of a code name or other special method prefix.
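The naming conventions above can be checked mechanically. The following is a minimal sketch, not part of NOMAD: the function names `check_property_name` and `check_section_name` and the exact regular expressions are assumptions made for illustration.

```python
import re
from typing import List, Optional

# Regular expressions for the naming conventions listed above (illustrative).
LOWER_SNAKE = re.compile(r'^[a-z][a-z0-9]*(_[a-z0-9]+)*$')
UPPER_CAMEL = re.compile(r'^[A-Z][A-Za-z0-9]*$')


def check_property_name(
        name: str, is_reference: bool = False,
        code_prefix: Optional[str] = None) -> List[str]:
    '''Returns a list of convention violations for a section property name.'''
    problems = []
    if not LOWER_SNAKE.match(name):
        problems.append('property names should be lower snake case')
    if is_reference and not name.endswith('_ref'):
        problems.append('references should use a _ref suffix')
    if code_prefix is not None and not name.startswith(f'x_{code_prefix}_'):
        problems.append(f'parser-specific names should start with x_{code_prefix}_')
    return problems


def check_section_name(name: str) -> List[str]:
    '''Returns a list of convention violations for a section definition name.'''
    if UPPER_CAMEL.match(name):
        return []
    return ['section definitions should be upper camel case']
```

For example, `check_property_name('x_mycode_execution_mode', code_prefix='mycode')` reports no problems, while `check_property_name('executionMode')` flags the camel case name.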
# Extending the search
## The search indices
NOMAD uses Elasticsearch as the underlying search engine. The respective indices
are automatically populated during processing and other NOMAD operations. The indices
are built from some of the archive information of each entry. These are mostly the
sections `metadata` (ids, user metadata, other "administrative" and "internal" metadata)
and `results` (a summary of all extracted (meta-)data). But these sections are not
indexed verbatim. What exactly is indexed, and how, is determined by the metainfo
and the `elasticsearch` metainfo extension.
### The elasticsearch metainfo extension
Here is the definition of `results.material.elements` as an example:
```py
class Material(MSection):
    ...
    elements = Quantity(
        type=MEnum(chemical_symbols),
        shape=["0..*"],
        default=[],
        description='Names of the different elements present in the structure.',
        a_elasticsearch=[
            Elasticsearch(material_type, many_all=True),
            Elasticsearch(suggestion="simple")
        ]
    )
```
Extensions are denoted with the `a_` prefix as in `a_elasticsearch`.
While extensions can have all kinds of values, the elasticsearch extension is rather
complex and uses the `Elasticsearch` class.
There can be multiple values. Each `Elasticsearch` instance configures a different part
of the index. This means that the same quantity can be indexed multiple times. A typical
example is when you need both a text-based and a keyword-based search for the same data. Here
is a version of the `metadata.mainfile` definition as another example:
```py
mainfile = metainfo.Quantity(
    type=str, categories=[MongoEntryMetadata, MongoSystemMetadata],
    description='The path to the mainfile from the root directory of the uploaded files',
    a_elasticsearch=[
        Elasticsearch(_es_field='keyword'),
        Elasticsearch(
            mapping=dict(type='text', analyzer=path_analyzer.to_dict()),
            field='path', _es_field='')
    ]
)
```
### The different indices
The first (optional) argument for `Elasticsearch` determines where the data is indexed.
There are three principal places:
- the entry index (default, `entry_type`)
- the materials index (`material_type`)
- the entries within the materials index (`material_entry_type`)
#### Entry index
This is the default and is used even if another (additional) value is given. All data
is put into the entry index.
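The rule that the entry index is always used can be sketched as follows. This is a simplified stand-in using plain strings for the three document types; the function `target_indices` is hypothetical and not part of the NOMAD code base.

```python
# Plain-string stand-ins for the three document types named above.
ENTRY = 'entry_type'
MATERIAL = 'material_type'
MATERIAL_ENTRY = 'material_entry_type'


def target_indices(doc_type: str = ENTRY) -> set:
    '''Returns the document types a quantity is indexed in.

    The entry index is always included, even if another value is given.
    '''
    if doc_type not in (ENTRY, MATERIAL, MATERIAL_ENTRY):
        raise ValueError(f'unknown document type: {doc_type}')
    return {ENTRY, doc_type}


print(target_indices())          # only the entry index
print(target_indices(MATERIAL))  # entry index plus materials index
```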
#### Materials index
This is a separate index from the entry index and contains aggregated material information.
Each document in this index represents a material. We use a hash over some material
properties (elements, system type, symmetry) to define what a material is and which entries
belong to which material.
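The idea of identifying a material by a hash over a few properties can be illustrated with a small sketch. This is not NOMAD's actual hash: the choice of SHA-256, the JSON serialization, and the function name `material_id` are assumptions made for illustration only.

```python
import hashlib
import json


def material_id(elements, system_type, symmetry):
    '''Derives a deterministic id from a few material properties (illustrative).'''
    # Sort the elements so that their order in the input does not change the hash.
    key = json.dumps(
        {'elements': sorted(elements), 'system_type': system_type, 'symmetry': symmetry},
        sort_keys=True)
    return hashlib.sha256(key.encode('utf-8')).hexdigest()


# Two entries with the same properties map to the same material id,
# so they would be aggregated into the same material document.
same = material_id(['O', 'H'], 'bulk', 'P1') == material_id(['H', 'O'], 'bulk', 'P1')
print(same)
```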
Some parts of the materials documents contain the material information that is always
the same across all entries of this material. Examples are elements, formulas, symmetry.
#### Material entries
The materials index also contains entry-specific information that allows you to filter
materials by the existence of entries that match certain criteria. Examples are
publish status, user metadata, used method, or property data.
### Adding quantities
In principle, all quantities could be added to the index. By convention and for simplicity, however,
only quantities defined in the sections `metadata` and `results` should be added. This
means that if you want to add custom quantities from your parser, for example, you will
also need to adapt the results normalizer to copy or reference the parsed data.
## The search API
The search API does not have to change. It automatically supports all quantities with
the elasticsearch extension. The keys that you can use in the API are the metainfo
paths of the respective quantities, e.g. `results.material.elements` or `mainfile` (note
that the `metadata.` prefix is always omitted). If there are multiple elasticsearch
annotations for the same quantity, all but one of them define a `field` parameter, which
is added to the quantity path, e.g. `mainfile.path`.
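The key rules described here can be summarized in a few lines. This is a sketch of the naming rule only, with a hypothetical helper `search_key`; it does not call the NOMAD API or reproduce how NOMAD computes the keys internally.

```python
from typing import Optional


def search_key(metainfo_path: str, field: Optional[str] = None) -> str:
    '''Builds the search key for a quantity path and an optional annotation field.'''
    prefix = 'metadata.'
    # The `metadata.` prefix is always omitted from search keys.
    if metainfo_path.startswith(prefix):
        metainfo_path = metainfo_path[len(prefix):]
    # Annotations that define a `field` parameter append it to the quantity path.
    return f'{metainfo_path}.{field}' if field else metainfo_path


print(search_key('results.material.elements'))
print(search_key('metadata.mainfile'))
print(search_key('metadata.mainfile', field='path'))
```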
## The search web interface
Coming soon ...
\ No newline at end of file
.md-header__button.md-logo :where(img,svg) {
  width: 4.2rem;
  height: 2rem;
}
.md-header, .md-header__inner {
  background-color: #fff;
  color: #008DC3;
  font-weight: 400;
}
.md-search__form:hover {
  background-color: rgba(0,0,0,.13);
}
\ No newline at end of file
......@@ -446,7 +446,13 @@ SubSectionDef.propTypes = ({
})
function DefinitionProperties({def, children}) {
if (!(children || def.aliases?.length || def.deprecated || Object.keys(def.more).length)) {
const searchAnnotations = def.m_annotations && Object.keys(def.m_annotations)
.filter(key => key === 'elasticsearch')
.map(key => def.m_annotations[key].filter(
value => !(value.endsWith('.suggestion') || value.endsWith('__suggestion')))
)
if (!(children || def.aliases?.length || def.deprecated || Object.keys(def.more).length || searchAnnotations)) {
return ''
}
......@@ -457,6 +463,8 @@ function DefinitionProperties({def, children}) {
{Object.keys(def.more).map((moreKey, i) => (
<Typography key={i}><b>{moreKey}</b>:&nbsp;{String(def.more[moreKey])}</Typography>
))}
{searchAnnotations && <Typography><b>search&nbsp;keys</b>:&nbsp;{
searchAnnotations.join(', ')}</Typography>}
</Compartment>
}
DefinitionProperties.propTypes = ({
......
......@@ -12,6 +12,7 @@ nav:
- Extending and Developing NOMAD:
- developers.md
- metainfo.md
- search.md
- parser.md
- normalizers.md
- Operating NOMAD (Oasis): oasis.md
......@@ -22,8 +23,10 @@ theme:
accent: teal
font:
text: 'Titillium Web'
logo: null
favicon: assets/favicon-hres.png
logo: assets/nomad-logo.png
favicon: assets/favicon.png
features:
- navigation.instant
# repo_url: https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/
markdown_extensions:
- attr_list
......@@ -40,8 +43,13 @@ markdown_extensions:
toc_depth: 3
extra:
generator: false
homepage: https://nomad-lab.eu/prod/v1/gui/about
use_directory_urls: false
plugins:
- search
- macros:
module_name: nomad/mkdocs
\ No newline at end of file
module_name: nomad/mkdocs
extra_css:
- stylesheets/extra.css
extra_javascript:
- javascript.js
\ No newline at end of file
......@@ -71,12 +71,17 @@ def metainfo():
def metainfo_undecorated():
    from nomad.metainfo import Package, Environment
    from nomad.datamodel import EntryArchive
    # TODO similar to before, due to lazy loading, we need to explicitly access parsers
    # to actually import all parsers and indirectly all metainfo packages
    from nomad.parsing import parsers
    parsers.parsers
    # Create the ES mapping to populate ES annotations with search keys.
    from nomad.search import entry_type
    entry_type.create_mapping(EntryArchive.m_def)
    # TODO we call __init_metainfo__() for all packages where this has been forgotten
    # by the package author. Ideally this would not be necessary and we fix the
    # actual package definitions.
......
......@@ -405,6 +405,9 @@ class DocumentType():
assert name not in self.metrics, 'Metric names must be unique: %s' % name
self.metrics[name] = (metric, search_quantity)
if self == entry_type:
annotation.search_quantity = search_quantity
def __repr__(self):
return self.name
......@@ -596,6 +599,7 @@ class Elasticsearch(DefinitionAnnotation):
Attributes:
name:
The name of the quantity (plus additional field if set).
search_quantity: The entry type SearchQuantity associated with this annotation.
'''
def __init__(
self,
......@@ -655,6 +659,8 @@ class Elasticsearch(DefinitionAnnotation):
self.nested = nested
self.suggestion = suggestion
self.search_quantity = None
@property
def values(self):
return self._values
......@@ -749,6 +755,12 @@ class Elasticsearch(DefinitionAnnotation):
return f'Elasticsearch({self.definition})'
def m_to_dict(self):
if self.search_quantity:
return self.search_quantity.qualified_name
else:
return self.name
class SearchQuantity():
'''
......
......@@ -1541,6 +1541,12 @@ class MSection(metaclass=MObjectMeta): # TODO find a way to make this a subclas
else:
raise NotImplementedError('Higher shapes (%s) not supported: %s' % (quantity.shape, quantity))
def serialize_annotation(annotation):
if isinstance(annotation, Annotation):
return annotation.m_to_dict()
else:
return str(annotation)
def items() -> Iterable[Tuple[str, Any]]:
# metadata
if with_meta:
......@@ -1550,6 +1556,16 @@ class MSection(metaclass=MObjectMeta): # TODO find a way to make this a subclas
if self.m_parent_sub_section is not None:
yield 'm_parent_sub_section', self.m_parent_sub_section.name
annotations = {}
for annotation_name, annotation in self.m_annotations.items():
if isinstance(annotation, list):
annotation_value = [serialize_annotation(item) for item in annotation]
else:
annotation_value = [serialize_annotation(annotation)]
annotations[annotation_name] = annotation_value
if len(annotations) > 0:
yield 'm_annotations', annotations
# quantities
sec_path = self.m_path()
for name, quantity in self.m_def.all_quantities.items():
......@@ -3087,7 +3103,13 @@ class Category(Definition):
class Annotation:
''' Base class for annotations. '''
pass
def m_to_dict(self):
'''
Returns a JSON serializable representation that is used for exporting the
annotation to JSON.
'''
return str(self.__class__.__name__)
class DefinitionAnnotation(Annotation):
......