nomad-lab / nomad-FAIR

Commit 1874328c authored Dec 22, 2021 by Markus Scheidgen

Added search docs; added search keys to metainfo browser. Align docs style with app. #699, #592

Parent: 39b2c947
Pipeline #118409 passed with stages in 33 minutes and 42 seconds

Changes: 10
docs/assets/favicon.png (0 → 100644, 3.29 KB)
docs/javascript.js (0 → 100644)

document.getElementsByClassName("md-header__button")[0].title = "NOMAD"
\ No newline at end of file
docs/metainfo.md
...
...
@@ -254,3 +254,58 @@ m_package = Package()
m_package.__init_metainfo__()
```

## Adding definitions to the existing metainfo schema

Now you know how to define new sections and quantities, but how should your additions be integrated into the existing schema, and what conventions need to be followed?

### Metainfo schema super structure

The `EntryArchive` section definition sets the root of the archive for each entry in NOMAD. It therefore defines the top-level sections:

- `metadata`, all "administrative" metadata (ids, permissions, publish state, uploads, user metadata, etc.)
- `results`, a summary with copies of and references to data from method-specific sections. This also presents the [searchable metadata](search.md).
- `workflows`, all workflow metadata
- Method-specific sub-sections, e.g. `run`. This is where all parsers are supposed to add the parsed data.

The main NOMAD Python project includes Metainfo definitions in the following modules:

- `nomad.metainfo`: Defines the Metainfo itself. This includes a self-referencing schema of the Metainfo, e.g. there is a section `Section`, etc.
- `nomad.datamodel`: Mostly defines the section `metadata` that contains all "administrative" metadata. It also contains the root section `EntryArchive`.
- `nomad.datamodel.metainfo`: Defines all the central, method-specific (but not parser-specific) definitions, for example the section `run` with all the simulation (computational material science) definitions that are shared among the respective parsers.

### Extending existing sections

Parsers can provide their own definitions. By convention, these are placed into a `metainfo` sub-module of the parser Python module. The definitions there can add properties to existing sections (e.g. from `nomad.datamodel.metainfo`). By convention, use an `x_mycode_` prefix for such properties. This is done with the `extends_base_section` [Section property](#sections). Here is an example:
```py
from nomad.metainfo import Section, Quantity, MEnum
from nomad.datamodel.metainfo.simulation import Method


class MyCodeRun(Method):
    # extends_base_section=True adds the quantities below to the existing
    # Method section instead of creating a new section definition
    m_def = Section(extends_base_section=True)

    x_mycode_execution_mode = Quantity(
        type=MEnum('hpc', 'parallel', 'single'), description='...')
```

### Metainfo schema conventions

- Use lower snake case for section properties; use upper camel case for section definitions.
- Use a `_ref` suffix for references.
- Use sub-sections rather than inheritance to add specific quantities to a general section. E.g. the section `workflow` contains a section `geometry_optimization` for all geometry-optimization-specific workflow quantities.
- Prefix parser-specific and custom definitions with `x_name_`, where `name` is the short handle of a code name or other special method prefix.
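
The naming conventions above can be checked mechanically. The following is a minimal sketch that assumes a hypothetical helper, not part of NOMAD. It only covers the casing rules, the `x_name_` prefix, and the `_ref` suffix. The sub-section-versus-inheritance guideline is a design decision and cannot be checked from a name.

```py
import re
from typing import List, Optional

# Lower snake case, e.g. "geometry_optimization" or "x_mycode_execution_mode".
SNAKE_CASE = re.compile(r'[a-z][a-z0-9]*(_[a-z0-9]+)*')
# Upper camel case, e.g. "EntryArchive" or "MyCodeRun".
UPPER_CAMEL_CASE = re.compile(r'[A-Z][a-zA-Z0-9]*')


def check_property_name(name: str, code: Optional[str] = None,
                        is_reference: bool = False) -> List[str]:
    '''Hypothetical helper (not part of NOMAD) that lists convention
    violations for a section property name. If code is given, the name
    must carry the x_<code>_ prefix of a parser-specific definition.'''
    problems = []
    if not SNAKE_CASE.fullmatch(name):
        problems.append('use lower snake case for section properties')
    if code is not None and not name.startswith(f'x_{code}_'):
        problems.append(f'prefix custom definitions with x_{code}_')
    if is_reference and not name.endswith('_ref'):
        problems.append('use a _ref suffix for references')
    return problems


def check_section_name(name: str) -> List[str]:
    '''Lists convention violations for a section definition name.'''
    if UPPER_CAMEL_CASE.fullmatch(name):
        return []
    return ['use upper camel case for section definitions']


print(check_property_name('x_mycode_execution_mode', code='mycode'))  # []
print(check_property_name('executionMode', code='mycode'))
print(check_section_name('MyCodeRun'))  # []
```

The second call reports two problems: the name is not snake case, and it lacks the `x_mycode_` prefix.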
docs/search.md (0 → 100644)

# Extending the search

## The search indices

NOMAD uses elasticsearch as the underlying search engine. The respective indices are automatically populated during processing and other NOMAD operations. The indices are built from some of the archive information of each entry, mostly the sections `metadata` (ids, user metadata, other "administrative" and "internal" metadata) and `results` (a summary of all extracted (meta-)data). However, these sections are not indexed verbatim. What exactly is indexed, and how, is determined by the metainfo and the `elasticsearch` metainfo extension.

### The elasticsearch metainfo extension

Here is the definition of `results.material.elements` as an example:
```py
class Material(MSection):
    ...
    elements = Quantity(
        type=MEnum(chemical_symbols),
        shape=["0..*"],
        default=[],
        description='Names of the different elements present in the structure.',
        a_elasticsearch=[
            Elasticsearch(material_type, many_all=True),
            Elasticsearch(suggestion="simple")
        ]
    )
```

Extensions are denoted with the `a_` prefix, as in `a_elasticsearch`. While extensions can have all kinds of values, the elasticsearch extension is rather complex and uses the `Elasticsearch` class.

There can be multiple values. Each `Elasticsearch` instance configures a different part of the index. This means that the same quantity can be indexed multiple times. A typical example is when you need both a text-based and a keyword-based search for the same data. Here is a version of the `metadata.mainfile` definition as another example:
```py
mainfile = metainfo.Quantity(
    type=str,
    categories=[MongoEntryMetadata, MongoSystemMetadata],
    description='The path to the mainfile from the root directory of the uploaded files',
    a_elasticsearch=[
        Elasticsearch(_es_field='keyword'),
        Elasticsearch(
            mapping=dict(type='text', analyzer=path_analyzer.to_dict()),
            field='path', _es_field='')
    ]
)
```
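
Elasticsearch can index a single source value in several ways through multi-fields. A multi-field is a primary mapping plus named sub-fields under `fields`, queried as `<name>.<sub-field>`. The sketch below assumes that the two `mainfile` annotations map onto such a multi-field. It is a plain-Python illustration with a hypothetical helper, and whether NOMAD's extension emits exactly this mapping is an assumption. The string `'path_analyzer'` stands in for whatever `path_analyzer.to_dict()` produces.

```py
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical annotation model: (field, mapping). field=None marks the
# primary mapping; any other value becomes an Elasticsearch sub-field.
FieldAnnotation = Tuple[Optional[str], Dict[str, Any]]


def build_property_mapping(annotations: List[FieldAnnotation]) -> Dict[str, Any]:
    '''Combines several annotations of one quantity into a single
    Elasticsearch property mapping that uses multi-fields.'''
    primary: Optional[Dict[str, Any]] = None
    sub_fields: Dict[str, Dict[str, Any]] = {}
    for field, mapping in annotations:
        if field is None:
            if primary is not None:
                raise ValueError('all but one annotation must define a field')
            primary = dict(mapping)
        else:
            sub_fields[field] = dict(mapping)
    if primary is None:
        raise ValueError('one annotation must define the primary mapping')
    if sub_fields:
        # Sub-fields are addressed as "<quantity>.<field>", e.g. "mainfile.path".
        primary['fields'] = sub_fields
    return primary


mainfile_mapping = build_property_mapping([
    (None, {'type': 'keyword'}),
    ('path', {'type': 'text', 'analyzer': 'path_analyzer'}),
])
print(mainfile_mapping)
```

Under this layout, exact matches use the keyword field `mainfile`, while tokenized path searches use `mainfile.path`.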

### The different indices

The first (optional) argument for `Elasticsearch` determines where the data is indexed. There are three principal places:

- the entry index (default, `entry_type`)
- the materials index (`material_type`)
- the entries within the materials index (`material_entry_type`)

#### Entry index
This is the default and is used even if another (additional) value is given. All data
is put into the entry index.
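
A few lines are enough to express the rule that the entry index always receives the data. This is a minimal sketch with hypothetical names. The strings only mirror the `entry_type`, `material_type`, and `material_entry_type` document types listed above and are not NOMAD objects.

```py
from typing import Optional, Set

# Hypothetical string stand-ins for the three document types.
ENTRY_TYPE = 'entry_type'
MATERIAL_TYPE = 'material_type'
MATERIAL_ENTRY_TYPE = 'material_entry_type'


def target_document_types(doc_type: Optional[str] = None) -> Set[str]:
    '''Returns the document types that receive a quantity whose annotation
    passes doc_type as its first argument (None means no argument).'''
    targets = {ENTRY_TYPE}  # the entry index always receives the data
    if doc_type is None or doc_type == ENTRY_TYPE:
        return targets
    if doc_type not in (MATERIAL_TYPE, MATERIAL_ENTRY_TYPE):
        raise ValueError(f'unknown document type: {doc_type}')
    targets.add(doc_type)
    return targets


print(sorted(target_document_types(MATERIAL_TYPE)))  # ['entry_type', 'material_type']
```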
#### Materials index
This is a separate index from the entry index and contains aggregated material information.
Each document in this index represents a material. We use a hash over some material
properties (elements, system type, symmetry) to define what a material is and what entries
belong to what material.
Some parts of the materials documents contain the material information that is always
the same across all entries of this material. Examples are elements, formulas, symmetry.
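
A hash over selected properties groups entries correctly only if those properties are put into a canonical form first, for example by sorting the elements. The following minimal sketch illustrates that idea. The function name, the use of SHA-256 over JSON, and the symmetry label are assumptions and do not reproduce NOMAD's actual material-id algorithm.

```py
import hashlib
import json
from typing import Iterable, Optional


def material_id(elements: Iterable[str], system_type: str,
                symmetry: Optional[str] = None) -> str:
    '''Hypothetical material identifier: a SHA-256 hash over a canonical,
    order-independent form of the defining properties.'''
    canonical = json.dumps(
        {'elements': sorted(set(elements)),
         'system_type': system_type,
         'symmetry': symmetry},
        sort_keys=True)
    return hashlib.sha256(canonical.encode('utf-8')).hexdigest()


# Entries that list the same elements in a different order map to the same material.
a = material_id(['Na', 'Cl'], 'bulk', 'Fm-3m')
b = material_id(['Cl', 'Na'], 'bulk', 'Fm-3m')
# A different system type yields a different material.
c = material_id(['Na', 'Cl'], 'surface', 'Fm-3m')
print(a == b, a == c)  # True False
```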
#### Material entries
The materials index also contains entry-specific information that allows filtering materials by the existence of entries with certain criteria. Examples are publish status, user metadata, used method, or property data.
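
When materials are filtered by the existence of matching entries, all criteria have to hold for the same entry. Criteria met by different entries of the same material do not count. The in-memory sketch below illustrates this semantics. The document layout (an `entries` list per material) and the field names `published` and `method` are hypothetical. In Elasticsearch, per-entry conditions like these are usually expressed with nested fields and `nested` queries, but the sketch does not assume how NOMAD implements them.

```py
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]


def filter_materials(materials: List[Document],
                     entry_matches: Callable[[Document], bool]) -> List[Document]:
    '''Keeps materials with at least one entry that satisfies entry_matches.
    All criteria are evaluated against the same entry.'''
    return [m for m in materials
            if any(entry_matches(entry) for entry in m.get('entries', []))]


materials = [
    {'material_id': 'm1', 'entries': [{'published': True, 'method': 'DFT'}]},
    {'material_id': 'm2', 'entries': [
        {'published': True, 'method': 'GW'},
        {'published': False, 'method': 'DFT'},
    ]},
]

published_dft = filter_materials(
    materials, lambda e: e['published'] and e['method'] == 'DFT')
print([m['material_id'] for m in published_dft])  # ['m1']
```

`m2` is excluded. It has a published entry and a DFT entry, but no single entry is both.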

### Adding quantities

In principle, all quantities could be added to the index. But for convention and simplicity, only quantities defined in the sections `metadata` and `results` should be added. This means that if you want to add custom quantities from your parser, for example, you will also need to adapt the results normalizer to copy or reference the parsed data.

## The search API

The search API does not have to change. It automatically supports all quantities with the elasticsearch extension. The keys that you can use in the API are the metainfo paths of the respective quantities, e.g. `results.material.elements` or `mainfile` (note that the `metadata.` prefix is always omitted). If there are multiple elasticsearch annotations for the same quantity, all but one of them define a `field` parameter, which is appended to the quantity path, e.g. `mainfile.path`.
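
The two naming rules for API keys described above fit in a small function. This is a minimal sketch with a hypothetical helper name. It reproduces only those two rules (dropping the `metadata.` prefix and appending the annotation's `field`) and is not NOMAD's actual key computation.

```py
from typing import Optional

METADATA_PREFIX = 'metadata.'


def search_key(quantity_path: str, field: Optional[str] = None) -> str:
    '''Hypothetical helper: derives an API search key from a metainfo path.
    The "metadata." prefix is omitted; an optional annotation field is
    appended to the path.'''
    key = quantity_path
    if key.startswith(METADATA_PREFIX):
        key = key[len(METADATA_PREFIX):]
    return f'{key}.{field}' if field else key


print(search_key('results.material.elements'))  # results.material.elements
print(search_key('metadata.mainfile'))           # mainfile
print(search_key('metadata.mainfile', 'path'))   # mainfile.path
```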
## The search web interface

Coming soon ...
\ No newline at end of file
docs/stylesheets/extra.css (0 → 100644)

.md-header__button.md-logo :where(img, svg) {
    width: 4.2rem;
    height: 2rem;
}

.md-header, .md-header__inner {
    background-color: #fff;
    color: #008DC3;
    font-weight: 400;
}

.md-search__form:hover {
    background-color: rgba(0, 0, 0, .13);
}
\ No newline at end of file
gui/src/components/archive/MetainfoBrowser.js
...
...
@@ -446,7 +446,13 @@ SubSectionDef.propTypes = ({
})

function DefinitionProperties({def, children}) {
  if (!(children || def.aliases?.length || def.deprecated || Object.keys(def.more).length)) {
  const searchAnnotations = def.m_annotations && Object.keys(def.m_annotations)
    .filter(key => key === 'elasticsearch')
    .map(key => def.m_annotations[key].filter(value => !(value.endsWith('.suggestion') || value.endsWith('__suggestion')))
    )
  if (!(children || def.aliases?.length || def.deprecated || Object.keys(def.more).length || searchAnnotations)) {
    return ''
  }
...
...
@@ -457,6 +463,8 @@ function DefinitionProperties({def, children}) {
    {Object.keys(def.more).map((moreKey, i) => (
      <Typography key={i}><b>{moreKey}</b>: {String(def.more[moreKey])}</Typography>
    ))}
    {searchAnnotations && <Typography><b>search&nbsp;keys</b>: {searchAnnotations.join(', ')}</Typography>}
  </Compartment>
}

DefinitionProperties.propTypes = ({
...
...
mkdocs.yml
...
...
@@ -12,6 +12,7 @@ nav:
  - Extending and Developing NOMAD:
    - developers.md
    - metainfo.md
    - search.md
    - parser.md
    - normalizers.md
  - Operating NOMAD (Oasis): oasis.md
...
...
@@ -22,8 +23,10 @@ theme:
    accent: teal
  font:
    text: 'Titillium Web'
  logo: null
  favicon: assets/favicon-hres.png
  logo: assets/nomad-logo.png
  favicon: assets/favicon.png
  features:
    - navigation.instant
# repo_url: https://gitlab.mpcdf.mpg.de/nomad-lab/nomad-FAIR/
markdown_extensions:
  - attr_list
...
...
@@ -40,8 +43,13 @@ markdown_extensions:
      toc_depth: 3
extra:
  generator: false
  homepage: https://nomad-lab.eu/prod/v1/gui/about
use_directory_urls: false
plugins:
  - search
  - macros:
      module_name: nomad/mkdocs
\ No newline at end of file
      module_name: nomad/mkdocs
extra_css:
  - stylesheets/extra.css
extra_javascript:
  - javascript.js
\ No newline at end of file
nomad/cli/dev.py
...
...
@@ -71,12 +71,17 @@ def metainfo():
def metainfo_undecorated():
    from nomad.metainfo import Package, Environment
    from nomad.datamodel import EntryArchive

    # TODO similar to before, due to lazyloading, we need to explicily access parsers
    # to actually import all parsers and indirectly all metainfo packages
    from nomad.parsing import parsers
    parsers.parsers

    # Create the ES mapping to populate ES annoations with search keys.
    from nomad.search import entry_type
    entry_type.create_mapping(EntryArchive.m_def)

    # TODO we call __init_metainfo__() for all packages where this has been forgotten
    # by the package author. Ideally this would not be necessary and we fix the
    # actual package definitions.
...
...
nomad/metainfo/elasticsearch_extension.py
...
...
@@ -405,6 +405,9 @@ class DocumentType():
        assert name not in self.metrics, 'Metric names must be unique: %s' % name
        self.metrics[name] = (metric, search_quantity)

        if self == entry_type:
            annotation.search_quantity = search_quantity

    def __repr__(self):
        return self.name
...
...
@@ -596,6 +599,7 @@ class Elasticsearch(DefinitionAnnotation):
    Attributes:
        name:
            The name of the quantity (plus additional field if set).
        search_quantity: The entry type SearchQuantity associated with this annoation.
    '''
    def __init__(
            self,
...
...
@@ -655,6 +659,8 @@ class Elasticsearch(DefinitionAnnotation):
        self.nested = nested
        self.suggestion = suggestion

        self.search_quantity = None

    @property
    def values(self):
        return self._values
...
...
@@ -749,6 +755,12 @@ class Elasticsearch(DefinitionAnnotation):
        return f'Elasticsearch({self.definition})'

    def m_to_dict(self):
        if self.search_quantity:
            return self.search_quantity.qualified_name
        else:
            return self.name


class SearchQuantity():
    '''
...
...
nomad/metainfo/metainfo.py
...
...
@@ -1541,6 +1541,12 @@ class MSection(metaclass=MObjectMeta): # TODO find a way to make this a subclas
            else:
                raise NotImplementedError('Higher shapes (%s) not supported: %s' % (quantity.shape, quantity))

        def serialize_annotation(annotation):
            if isinstance(annotation, Annotation):
                return annotation.m_to_dict()
            else:
                return str(annotation)

        def items() -> Iterable[Tuple[str, Any]]:
            # metadata
            if with_meta:
...
...
@@ -1550,6 +1556,16 @@ class MSection(metaclass=MObjectMeta): # TODO find a way to make this a subclas
                if self.m_parent_sub_section is not None:
                    yield 'm_parent_sub_section', self.m_parent_sub_section.name

                annotations = {}
                for annotation_name, annotation in self.m_annotations.items():
                    if isinstance(annotation, list):
                        annotation_value = [serialize_annotation(item) for item in annotation]
                    else:
                        annotation_value = [serialize_annotation(annotation)]
                    annotations[annotation_name] = annotation_value
                if len(annotations) > 0:
                    yield 'm_annotations', annotations

            # quantities
            sec_path = self.m_path()
            for name, quantity in self.m_def.all_quantities.items():
...
...
@@ -3087,7 +3103,13 @@ class Category(Definition):
class Annotation:
    ''' Base class for annotations. '''
    pass

    def m_to_dict(self):
        '''
        Returns a JSON serializable representation that is used for exporting the
        annotation to JSON.
        '''
        return str(self.__class__.__name__)


class DefinitionAnnotation(Annotation):
...
...