xpath-style functionality for metainfo and archive queries
Archive/metainfo data can be quite complex and hard to navigate. Therefore, it would be nice to have an XPath-style mechanism to access metainfo data in deeply nested sections.
Some example features:
- exist queries:
  section_single_configuration_calculation[@section_dos]
- value queries:
  section_system[@is_representative==True]
- first match queries:
  section_run[]/section_single_configuration_calculation[@section_dos]
  goes through all section_run and only uses the first one that has a section_single_configuration_calculation with a section_dos
- min/max queries:
  section_single_configuration_calculation[min(energy_total)]
- wildcard parent sections:
  */atom_labels
  gives the first value for the quantity/section
- wildcard in a certain section:
  section_run[1]/*/atom_labels
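A minimal sketch of how the segment syntax from these examples could be parsed and applied to candidate sections. The grammar (plain names, *, [], [<int>], [@q], [@q==<literal>], [min(q)], [max(q)]) is inferred from the examples alone; Segment, parse_segment and select are hypothetical names, and candidate sections are plain dicts rather than MSection objects.

```python
import ast
import re
from dataclasses import dataclass
from typing import Any, List, Optional

# Hypothetical grammar inferred from the examples in this issue:
#   name | * | name[] | name[<int>] | name[@q] | name[@q==<literal>] | name[min(q)] | name[max(q)]
_SEGMENT_RE = re.compile(r'^(?P<name>\*|\w+)(?:\[(?P<selector>[^\]]*)\])?$')
_VALUE_RE = re.compile(r'^@(?P<quantity>\w+)==(?P<value>.+)$')
_EXISTS_RE = re.compile(r'^@(?P<quantity>\w+)$')
_AGG_RE = re.compile(r'^(?P<func>min|max)\((?P<quantity>\w+)\)$')


@dataclass
class Segment:
    name: str                       # section/quantity name, or '*' for any
    kind: str = 'all'               # all | first | index | exists | value | min | max
    index: Optional[int] = None     # assumption: 0-based (XPath itself is 1-based)
    quantity: Optional[str] = None
    value: Any = None


def _literal(text: str) -> Any:
    # 'True' -> True, '3' -> 3, "'H'" -> 'H'; anything else stays a plain string
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        return text


def parse_segment(segment: str) -> Segment:
    match = _SEGMENT_RE.match(segment.strip())
    if match is None:
        raise ValueError(f'cannot parse segment {segment!r}')
    name, selector = match.group('name'), match.group('selector')
    if selector is None:
        return Segment(name)
    if selector == '':
        return Segment(name, kind='first')
    if selector.isdigit():
        return Segment(name, kind='index', index=int(selector))
    if m := _VALUE_RE.match(selector):
        return Segment(name, kind='value', quantity=m['quantity'], value=_literal(m['value']))
    if m := _EXISTS_RE.match(selector):
        return Segment(name, kind='exists', quantity=m['quantity'])
    if m := _AGG_RE.match(selector):
        return Segment(name, kind=m['func'], quantity=m['quantity'])
    raise ValueError(f'unsupported selector {selector!r} in {segment!r}')


def select(segment: Segment, candidates: List[dict]) -> List[dict]:
    """Filter candidate sections (plain dicts here) according to the segment's selector."""
    if segment.kind in ('all', 'first'):
        # 'first' only preserves order; picking the first candidate for which the
        # rest of the path matches is left to the path evaluator
        return list(candidates)
    if segment.kind == 'index':
        return candidates[segment.index:segment.index + 1]
    with_quantity = [c for c in candidates if c.get(segment.quantity) is not None]
    if segment.kind == 'exists':
        return with_quantity
    if segment.kind == 'value':
        return [c for c in with_quantity if c[segment.quantity] == segment.value]
    if not with_quantity:
        return []
    pick = min if segment.kind == 'min' else max
    return [pick(with_quantity, key=lambda c: c[segment.quantity])]
```

Keeping "first match" out of a single segment is deliberate: whether a section_run qualifies depends on the segments that follow it, so it needs backtracking at the path level.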
As a first step, this could be attached to nomad.metainfo.metainfo.MSection. For example, this should return the section_dos, or None, of an archive entry:
entry.m_xpath('section_run[]/section_single_configuration_calculation[@section_dos]/section_dos')
The implementation should work on the segment level, and m_xpath should just chain calls to m_xpath_segment, e.g.:
run.m_xpath_segment('section_single_configuration_calculation[@section_dos]')
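One way m_xpath could remain a chain of m_xpath_segment calls while still supporting first match queries is to let each segment return all of its candidates and have m_xpath search them depth-first, returning the first complete match or None. The sketch assumes a dict-backed ToySection class as a hypothetical stand-in for MSection; it supports only plain names, *, [], [<int>] (0-based) and [@q], treats [] like no selector, and lets * match exactly one level.

```python
import re
from typing import Any, Iterator, List, Optional

# Hypothetical: name or * optionally followed by [], [<int>] or [@quantity]
_SEGMENT_RE = re.compile(r'^(\*|\w+)(?:\[([^\]]*)\])?$')


class ToySection:
    """Hypothetical dict-backed stand-in for nomad.metainfo.metainfo.MSection.

    Sub-sections are dicts (single) or lists of dicts (repeating); anything else is a quantity value.
    """

    def __init__(self, data: dict):
        self.data = data

    def _children(self, name: str) -> List[Any]:
        # '*' matches every sub-section and quantity of this section (one level only)
        keys = list(self.data) if name == '*' else [name]
        result: List[Any] = []
        for key in keys:
            value = self.data.get(key)
            if value is None:
                continue
            if isinstance(value, dict):
                result.append(ToySection(value))
            elif isinstance(value, list) and all(isinstance(v, dict) for v in value):
                result.extend(ToySection(v) for v in value)
            else:
                result.append(value)
        return result

    def m_xpath_segment(self, segment: str) -> List[Any]:
        """Return all candidates matching one segment, in document order."""
        match = _SEGMENT_RE.match(segment)
        if match is None:
            raise ValueError(f'cannot parse segment {segment!r}')
        name, selector = match.groups()
        candidates = self._children(name)
        if not selector:  # no selector or '[]': keep all, m_xpath picks the first full match
            return candidates
        if selector.isdigit():
            index = int(selector)
            return candidates[index:index + 1]
        if selector.startswith('@'):
            quantity = selector[1:]
            return [c for c in candidates if isinstance(c, ToySection) and c._children(quantity)]
        raise ValueError(f'unsupported selector {selector!r}')

    def m_xpath(self, path: str) -> Optional[Any]:
        """Return the first complete match of path, or None."""
        return next(self._resolve(path.split('/')), None)

    def _resolve(self, segments: List[str]) -> Iterator[Any]:
        # depth-first search: backtrack to the next candidate if the rest of the path fails
        head, rest = segments[0], segments[1:]
        for candidate in self.m_xpath_segment(head):
            if not rest:
                yield candidate
            elif isinstance(candidate, ToySection):
                yield from candidate._resolve(rest)
```

A strictly linear chain, where one segment's single result feeds the next call, cannot express "the first section_run that has a section_single_configuration_calculation with a section_dos", because the choice of run depends on later segments. Returning candidates lazily keeps the chaining idea while allowing that backtracking.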
The segment parsing and semantics should go into their own module. Later, this should also be used in the backend to evaluate the required (schema) part of archive queries in nomad.archive.query_archive:
{
    'section_run[]': {
        'section_system[@section_symmetry]': '*',
        'section_single_configuration_calculation[@section_dos]': '*'
    }
}
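A sketch of how the same segments could prune an archive according to such a required specification. It assumes archives are nested dicts/lists, '*' means "include everything below", and name[] keeps every item in a required spec; these semantics are assumptions, and apply_required is a hypothetical helper, not the actual nomad.archive.query_archive code.

```python
import re
from typing import Any, Dict, List, Optional, Union

_SEGMENT_RE = re.compile(r'^(\w+)(?:\[([^\]]*)\])?$')

Required = Union[str, Dict[str, Any]]


def _filter(items: List[dict], selector: Optional[str]) -> List[dict]:
    if not selector:
        return items  # assumption: in a required spec, 'name[]' keeps all items
    if selector.isdigit():
        return items[int(selector):int(selector) + 1]
    if selector.startswith('@'):
        return [item for item in items if item.get(selector[1:]) is not None]
    raise ValueError(f'unsupported selector {selector!r}')


def apply_required(section: dict, required: Required) -> dict:
    """Prune an archive section (nested dicts/lists) down to what `required` asks for."""
    if required == '*':
        return section
    result: Dict[str, Any] = {}
    for key, sub_required in required.items():
        match = _SEGMENT_RE.match(key)
        if match is None:
            raise ValueError(f'cannot parse key {key!r}')
        name, selector = match.groups()
        value = section.get(name)
        if value is None:
            continue
        if isinstance(value, dict):  # single sub-section
            kept = [apply_required(v, sub_required) for v in _filter([value], selector)]
            if kept:
                result[name] = kept[0]
        elif isinstance(value, list) and all(isinstance(v, dict) for v in value):  # repeating
            result[name] = [apply_required(v, sub_required) for v in _filter(value, selector)]
        else:
            result[name] = value  # plain quantity
    return result
```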
Maybe a library like https://github.com/akesterson/dpath-python can be used directly; there are probably other implementations out there as well. Otherwise, XML/XPath could serve as inspiration.