Improvements to text_parser
The usual stuff, mandatory:
- documentation (including a simple example)
- proper tests (you can turn test_parsing.py into a directory parsing with the old test_parsing.py and a new test_text_parser.py)
- add type annotations to all public methods
- catch all matches to build a map of what has been extracted from the file and provide parser "telemetry"/statistics. In the old framework, we had an ANSI-colored version of the parsed file. This was super helpful for debugging. It is also something we could show in the GUI, so you can see what was covered.
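
To make the match map and telemetry idea more concrete, here is a rough sketch using only the standard library. All names in it (MatchSpan, collect_matches, covered_mask, coverage, colorize) are made up for illustration and are not part of text_parser; it assumes quantities can be reduced to a name plus a regular expression.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

GREEN, RESET = "\033[32m", "\033[0m"


@dataclass
class MatchSpan:
    """One match of one named pattern (hypothetical helper type)."""
    name: str
    start: int
    end: int


def collect_matches(text: str, patterns: Dict[str, str]) -> List[MatchSpan]:
    """Record the character span of every match of every named pattern."""
    spans = []
    for name, pattern in patterns.items():
        for m in re.finditer(pattern, text, flags=re.MULTILINE):
            spans.append(MatchSpan(name, m.start(), m.end()))
    return sorted(spans, key=lambda s: s.start)


def covered_mask(text: str, spans: List[MatchSpan]) -> List[bool]:
    """Per-character flag: True if at least one match covers the character."""
    mask = [False] * len(text)
    for span in spans:
        for i in range(span.start, span.end):
            mask[i] = True
    return mask


def coverage(mask: List[bool]) -> float:
    """Fraction of the file covered by matches (simple telemetry number)."""
    return sum(mask) / len(mask) if mask else 0.0


def colorize(text: str, mask: List[bool]) -> str:
    """Wrap covered stretches in ANSI green so uncovered text stands out."""
    out, active = [], False
    for char, hit in zip(text, mask):
        if hit and not active:
            out.append(GREEN)
        elif not hit and active:
            out.append(RESET)
        active = hit
        out.append(char)
    if active:
        out.append(RESET)
    return "".join(out)


sample = "energy = 1.5\nnoise\nenergy = 2.0\n"
spans = collect_matches(sample, {"energy": r"energy = ([\d.]+)"})
mask = covered_mask(sample, spans)
print(f"{len(spans)} matches, {coverage(mask):.0%} of the file covered")
print(colorize(sample, mask))
```

The same mask could feed a GUI view instead of ANSI codes; per-quantity counts fall out of grouping the spans by name.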
More specific suggestions:
- it should be possible to initialise Quantity from a metainfo quantity and get name, type, shape, etc. from there
- there should be automatic conversion from the unit in the text file to the unit attached to the metainfo Quantity via pint; the unit conversion method should not be called to_si, but to_metainfo_unit or something similar.
- since Quantity is a class, should str_operation be provided via inheritance? Or at least there should be the option to override it in a subclass.
- I do not like the idea of storing the values in Quantity. That introduces a very tight coupling between the two classes and makes it harder to clear out values between parser runs. And what if someone gets the idea to use the same quantity in different parsers? The values should be stored in a separate mapping in UnstructuredTextFileParser. Quantity should only provide the pattern and convert functions.
- If you do this dynamic parsing thing, where you only parse if someone asks for a quantity, why not go all the way and only parse for that quantity?
- You could also add __getattr__/__getattribute__ to allow more convenient parser.quantity style access.
- You could have an additional function that accepts a section. This function could automatically look up all quantities in that section's definition, try to parse them, and assign the values accordingly.
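
A minimal sketch of how several of these suggestions could fit together: values kept in a mapping on the parser, parsing only the quantity that was asked for, attribute-style access, and an overridable str_operation. This is not the current text_parser API. SketchQuantity, SketchParser, FloatListQuantity and the reset method are hypothetical names, and the constructor signatures are assumptions chosen for brevity.

```python
import re
from typing import Any, Callable, Dict, Iterable, List


class SketchQuantity:
    """Describes what to look for; deliberately holds no parsed values."""

    def __init__(self, name: str, pattern: str,
                 convert: Callable[[str], Any] = str) -> None:
        self.name = name
        self.pattern = re.compile(pattern, re.MULTILINE)
        self.convert = convert

    def str_operation(self, raw: str) -> Any:
        """Default conversion of the captured text; subclasses may override."""
        return self.convert(raw.strip())


class FloatListQuantity(SketchQuantity):
    """Example override: turn whitespace-separated numbers into floats."""

    def str_operation(self, raw: str) -> List[float]:
        return [float(value) for value in raw.split()]


class SketchParser:
    """Owns the text and the results; quantities stay stateless."""

    def __init__(self, text: str, quantities: Iterable[SketchQuantity]) -> None:
        self._text = text
        self._quantities: Dict[str, SketchQuantity] = {q.name: q for q in quantities}
        self._results: Dict[str, List[Any]] = {}

    def parse(self, name: str) -> List[Any]:
        """Parse only the requested quantity, and only on first access."""
        if name not in self._results:
            quantity = self._quantities[name]
            self._results[name] = [
                quantity.str_operation(m.group(1))
                for m in quantity.pattern.finditer(self._text)
            ]
        return self._results[name]

    def reset(self, text: str) -> None:
        """Clearing values between runs is one dict.clear() call."""
        self._text = text
        self._results.clear()

    def __getattr__(self, name: str) -> List[Any]:
        # Only called when normal attribute lookup fails. Going through
        # __dict__ avoids infinite recursion before __init__ has run.
        if name in self.__dict__.get("_quantities", {}):
            return self.parse(name)
        raise AttributeError(name)


energy = SketchQuantity("energy", r"energy = ([\d.]+)", float)
forces = FloatListQuantity("forces", r"forces = (.+)$")
parser = SketchParser("energy = 1.5\nforces = 0.1 0.2 0.3\nenergy = 2.0\n",
                      [energy, forces])
print(parser.energy)  # forces has not been parsed at this point
```

Because the quantity objects carry no state, the same energy instance can be handed to a second parser without the two interfering.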