# How to write data to archive with MappingParser
`MappingParser` is a generic parser class implemented in
`nomad.parsing.file_parser/mapping_parser.py` that handles the conversion between a
data object and a Python dictionary. We refer to an instance of this class as a
'mapping parser' throughout this section. In the following, the abstract properties and
methods of the mapping parser are explained, the currently available implementations of
the mapping parser are described, and the `Mapper` object, which is required to convert a
mapping parser into another mapping parser, is explained as well.
## MappingParser
The mapping parser has several abstract properties and methods; the most important
ones are listed in the following:

- `filepath`: path to the input file to be parsed
- `data_object`: object resulting from loading the file in memory with `load_file`
- `data`: dictionary representation of `data_object`
- `mapper`: instance of `Mapper` required by `convert`
- `load_file`: method to load the file given by `filepath`
- `to_dict`: method to convert `data_object` into `data`
- `from_dict`: method to convert `data` into `data_object`
- `convert`: method to convert to another mapping parser
`data_object` can be, for example, an `XML` element tree or a `metainfo` section,
depending on the inheriting class. In order to convert a mapping parser to another parser,
the target parser must provide a [`Mapper`](#mapper) object. We refer to this simply as a
mapper throughout.
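To make the interface concrete, the following is a minimal sketch of a hypothetical
subclass. Only the attribute and method names come from the list above; the exact
signatures, whether `load_file` returns or assigns the loaded object, and the JSON
format itself are assumptions for illustration, not part of the actual API.

```python
# Hypothetical sketch only: method names follow the abstract interface above,
# but the signatures and the JSON example are illustrative assumptions.
import json

from nomad.parsing.file_parser.mapping_parser import MappingParser


class JSONMappingParser(MappingParser):  # hypothetical subclass
    def load_file(self):
        # load the file given by `filepath` into memory
        # (whether the result is returned or assigned to `data_object` is an assumption)
        with open(self.filepath) as f:
            return json.load(f)

    def to_dict(self) -> dict:
        # a JSON document is already a plain dictionary
        return self.data_object

    def from_dict(self, data: dict):
        # reverse direction: rebuild the data object from a dictionary
        self.data_object = data
```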
In the following, we describe the currently implemented mapping parsers.
### XMLParser
This is the mapping parser for XML files. It uses [`lxml`](https://lxml.de/) to
load the file as an element tree. The dictionary is generated by iteratively parsing the
elements of the tree in `to_dict`. The values parsed from the element `text` are automatically
converted to a corresponding data type. If attributes are present, the value is wrapped in
a dictionary with a key given by `value_key` (`__value` by default), while the attribute keys
are prefixed by `attribute_prefix` (`@` by default). The following XML:
```xml
<a>
    <b name='item1'>name</b>
    <b name='item2'>name2</b>
</a>
```
will be converted to:
```python
data = {
    'a': {
        'b': [
            {'@name': 'item1', '__value': 'name'},
            {'@name': 'item2', '__value': 'name2'},
        ]
    }
}
```
The conversion can be reversed using the `from_dict` method.
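For orientation, reading such a file with the parser could look roughly like the
following sketch. The class name, module path, and the `filepath` and `data` attributes
come from this document; the file name is made up, and whether the file is loaded lazily
on accessing `data` or requires an explicit `load_file()` call is an assumption.

```python
from nomad.parsing.file_parser.mapping_parser import XMLParser

xml_parser = XMLParser()
xml_parser.filepath = 'example.xml'  # hypothetical input file

# `data_object` holds the lxml element tree, `data` its dictionary representation
# (assuming the file is loaded on first access; otherwise call load_file() first)
print(xml_parser.data)
# {'a': {'b': [{'@name': 'item1', '__value': 'name'}, {'@name': 'item2', '__value': 'name2'}]}}
```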
### HDF5Parser
This is the mapping parser for HDF5 files. It uses
[
`h5py`
](
https://www.h5py.org/
)
to load
the file as an HDF5 group. Similar to
[
XMLParser
](
#xmlparser
)
, the HDF5 datasets are
iteratively parsed from the underlying groups and if attributes are present these are
also parsed. The
`from_dict`
method is also implemented to convert a dictionary into an
HDF5 group.
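As an illustration, a group `g` containing a dataset `v` with a `unit` attribute (both
names hypothetical) would map to a dictionary of roughly the following shape, using the
same `attribute_prefix` and `value_key` conventions as the XMLParser:

```python
# Hypothetical HDF5 layout: /g/v is a dataset carrying a 'unit' attribute.
# Attributes are prefixed with '@' and the dataset value is stored under '__value'.
data = {
    'g': {
        'v': {
            '@unit': 'm',
            '__value': [1.0, 2.0, 3.0],
        }
    }
}
```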
### MetainfoParser
This is the mapping parser for NOMAD archive files or metainfo sections.
It accepts a schema root node annotated with `MappingAnnotation` as `data_object`.
`create_mapper` generates the actual mapper matching the `annotation_key`.
If a `filepath` is specified, it instead falls back on the [`ArchiveParser`](--ref--).
<!-- TODO: add reference -->

The annotation should always point to a parsed value via a `path` (JMESPath format).
It may optionally specify a multi-argument `operator` for data mangling.
<!-- most operators are binary, would change the name -->
In this case, specify a tuple consisting of:

- the operator name, defined within the same scope
- a list of paths to the corresponding values for the operator arguments

<!-- @Alvin: can you verify? -->

Similar to `MSection`, it can be converted to (`to_dict`) or from (`from_dict`) a Python `dict`.
Other attributes are currently accessible.
```python
import numpy as np

from nomad.datamodel.data import ArchiveSection
from nomad.datamodel.metainfo.annotations import Mapper as MappingAnnotation
from nomad.metainfo import Quantity, SubSection
from nomad.parsing.file_parser.mapping_parser import MetainfoParser


class BSection(ArchiveSection):
    v = Quantity(type=np.float64, shape=[2, 2])
    # map `v` from the XML path `.v` and, for HDF5, from the `get_v` operator
    # applied to the value at `.v[0].d`
    v.m_annotations['mapping'] = dict(
        xml=MappingAnnotation(mapper='.v'),
        hdf5=MappingAnnotation(mapper=('get_v', ['.v[0].d'])),
    )

    v2 = Quantity(type=str)
    v2.m_annotations['mapping'] = dict(
        xml=MappingAnnotation(mapper='.c[0].d[1]'),
        hdf5=MappingAnnotation(mapper='g.v[-2]'),
    )


class ExampleSection(ArchiveSection):
    b = SubSection(sub_section=BSection, repeats=True)
    b.m_annotations['mapping'] = dict(
        xml=MappingAnnotation(mapper='a.b1'),
        hdf5=MappingAnnotation(mapper='.g1'),
    )


# annotation on the root section definition
ExampleSection.m_def.m_annotations['mapping'] = dict(
    xml=MappingAnnotation(mapper='a'),
    hdf5=MappingAnnotation(mapper='g'),
)

parser = MetainfoParser()
parser.data_object = ExampleSection(b=[BSection()])
parser.annotation_key = 'xml'
parser.mapper
# Mapper(source=Path(path='a'....
```
### Converting mapping parsers
The following sample Python code illustrates the mapping of the contents of an
HDF5 file to an archive. First, we create a `MetainfoParser` object for the archive. The
annotation key is set to `hdf5`, which will generate a [mapper](#mapper) from the `hdf5`
annotations defined in the definitions. Essentially, only metainfo sections and quantities
with the `hdf5` annotation will be mapped. The mapper will contain paths for the source
(HDF5) and the target (archive). The archive is then set as the archive parser's
`data_object`. Here, the archive already contains some data, which will be merged with the
data to be parsed. Next, a parser for HDF5 data is created. We use a custom subclass of
`HDF5Parser` which implements the `get_v` method referenced in the annotation of
`BSection.v`. In this example, we do not read the data from an HDF5 file but instead
generate it from a dictionary using the `from_dict` method. By invoking the `convert`
method, the archive parser's data object is populated with the corresponding HDF5 data.
```python
from nomad.parsing.file_parser.mapping_parser import HDF5Parser

# ExampleSection and BSection are the annotated sections defined above


class ExampleHDF5Parser(HDF5Parser):
    @staticmethod
    def get_v(value):
        return np.array(value)[1:, :2]


archive_parser = MetainfoParser()
archive_parser.annotation_key = 'hdf5'
archive_parser.data_object = ExampleSection(b=[BSection(v=np.eye(2))])

hdf5_parser = ExampleHDF5Parser()
d = dict(
    g=dict(
        g1=dict(v=[dict(d=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))]),
        v=['x', 'y', 'z'],
        g=dict(
            c1=dict(
                i=[4, 6],
                f=[
                    {'@index': 0, '__value': 1},
                    {'@index': 2, '__value': 2},
                    {'@index': 1, '__value': 1},
                ],
                d=[dict(e=[3, 0, 4, 8, 1, 6]), dict(e=[1, 7, 8, 3, 9, 1])],
            ),
            c=dict(
                v=[
                    dict(d=np.eye(3), e=np.zeros(3)),
                    dict(d=np.ones((3, 3))),
                ]
            ),
        ),
    )
)
hdf5_parser.from_dict(d)

hdf5_parser.convert(archive_parser)
# >>> archive_parser.data_object
# ExampleSection(b, b2)
# >>> archive_parser.data_object.b[1].v
# array([[4., 5.],
#        [7., 8.]])
```
## Mapper
A mapper is necessary in order to convert a mapping parser to a target mapping parser
by mapping data from the source to the target. There are three kinds of mapper: `Map`,
`Evaluate` and `Mapper`, each inheriting from `BaseMapper`. A mapper has the attributes
`source` and `target`, which define the paths to the source and target data, respectively.
`Map` is intended for mapping data directly from source to target; the path to the data is
given by the attribute `path`. `Evaluate` will execute a function defined by `function_name`
with the arguments given by the mapped values of the paths in `function_args`. Lastly,
`Mapper` allows the nesting of mappers by providing a list of mappers to its attribute
`mapper`. All the paths are instances of `Path`, with the string value of the path to the
data given by the attribute `path`. The value of the path should follow the
[JMESPath specification](https://jmespath.org/specification.html) but may be prefixed by
`.`, which indicates that the path is relative to the parent. This tells the mapper from
which source to get the data.
```python
Mapper(
    source=Path(path='a.b2'),
    target=Path(path='b2'),
    mapper=[
        Mapper(
            source=Path(path='.c', parent=Path(path='a.b2')),
            target=Path(path='.c', parent=Path(path='b2')),
            mapper=[
                Map(
                    target=Path(path='.i', parent=Path(path='.c', parent=Path(path='b2'))),
                    path=Path(path='.d', parent=Path(path='.c', parent=Path(path='a.b2'))),
                ),
                Evaluate(
                    target=Path(path='.g', parent=Path(path='.c', parent=Path(path='b2'))),
                    function_name='slice',
                    function_args=[Path(path='a.b2.c.f.g.i')],
                ),
            ],
        )
    ],
)
```