Parsers re-compile "submatchers" all the time
Many of the legacy NOMAD CoE parsers use SimpleMatchers (SM). In order to use a parse tree of SMs, the tree has to be "compiled". This takes quite a while and should only be done once for each parse tree. Unfortunately, most parsers do not allow that: the SM tree is built and compiled for each parser run.
I managed to add a cache to the compile function, so that each SM tree is compiled only once. However, while some parsers create the SM tree only once, others don't. In principle this should be avoidable, but the code structure does not allow it.
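To illustrate the caching idea, here is a minimal sketch. It does not use the real nomadcore API: `Matcher`, `CompileCache` and the structural cache key are hypothetical stand-ins, and the cache actually added to the compile function may be keyed differently. The sketch assumes that a tree can be identified by its patterns and shape, so that even a freshly rebuilt but identical tree hits the cache.

```python
import re


class Matcher:
    """Hypothetical stand-in for an SM node (not the nomadcore SimpleMatcher)."""

    def __init__(self, pattern, children=()):
        self.pattern = pattern
        self.children = tuple(children)

    def key(self):
        # Structural, hashable key: two trees built from the same patterns
        # in the same shape produce the same key.
        return (self.pattern, tuple(child.key() for child in self.children))


class CompileCache:
    """Compiles each distinct matcher tree once and reuses the result."""

    def __init__(self):
        self._compiled = {}
        self.misses = 0  # number of real compilations, for inspection

    def _compile(self, key):
        # Stand-in for the expensive compile step: compile every regex in the tree.
        pattern, children = key
        return (re.compile(pattern), tuple(self._compile(c) for c in children))

    def compile(self, root):
        key = root.key()
        if key not in self._compiled:
            self.misses += 1
            self._compiled[key] = self._compile(key)
        return self._compiled[key]
```

With such a cache, a parser that rebuilds the same tree on every run still pays for the key computation, but not for the compilation itself.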
Examples of parsers that do not create the SM tree only once are:
- quantum espresso
- crystal
- cp2k
- cpmd
Besides this, the parsers are not optimized for reuse at all. While the legacy/nomadcore modules suggest reusability in some places (e.g. parser vs. context, interface vs. parsers), it is not thought through, and a lot of initialisation is repeated again and again.
Tasks:
- replace simple_parser.mainFunction and baseclasses.ParserInterface with a unified interface that really promotes parser reuse
- rewrite parsers, one by one, to use this interface
- clean up the parser code in the process: pep8, dead code, unnecessary imports
- test
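
As a starting point for the first task, here is a rough sketch of what a reuse-friendly interface could look like. All names (`ReusableParser`, `build`, `run`, `parse`, `TotalEnergyParser`) are hypothetical and not part of nomadcore; the only point is that expensive setup happens once per parser instance, while `parse` can be called for any number of files.

```python
import re
from abc import ABC, abstractmethod


class ReusableParser(ABC):
    """Hypothetical base class: one-time setup, many parse calls."""

    def __init__(self):
        self._compiled = None
        self.setup_count = 0  # how often the expensive setup actually ran

    @abstractmethod
    def build(self):
        """Expensive one-time setup, e.g. building and compiling the SM tree."""

    @abstractmethod
    def run(self, compiled, lines):
        """Per-run work that uses the prepared state and keeps no state itself."""

    def parse(self, lines):
        # Set up lazily on the first call, then reuse for every later run.
        if self._compiled is None:
            self._compiled = self.build()
            self.setup_count += 1
        return self.run(self._compiled, lines)


class TotalEnergyParser(ReusableParser):
    """Toy parser that extracts 'total energy = <number>' values."""

    def build(self):
        return re.compile(r"total energy\s*=\s*(-?\d+(?:\.\d+)?)")

    def run(self, compiled, lines):
        values = []
        for line in lines:
            match = compiled.search(line)
            if match:
                values.append(float(match.group(1)))
        return values
```

A caller would create one parser instance per code and call `parse` for each file, so the setup cost is paid once per instance rather than once per run.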