Dynamic Quantities and Updating without Reprocessing
At the moment, any quantity that we want to appear in an ES query has to be stored in the archive and indexed. However, some quantities are derived from an already stored archive quantity through a many-to-one mapping. In many of these cases, I can see someone wanting to update these quantities without reprocessing. Real-world scenarios include:
- URLs to a git project or code: the project migrated
- sums over an array: users ask for search statistics on an array that was not previously considered
- metadata from XC functionals / Force Fields: the scientific knowledge here gets updated
- searching groups of elements, e.g. transition metals, rare-earth, actinides, lanthanides, halogens, etc.
Because the relationship is many-to-one, we can derive these quantities from other stored quantities. Could we devise an additional layer, so that:
- queries for some quantities are transformed into queries for the mapped quantities, returning the corresponding entries? Obviously, the mapped quantities have to be indexed.
- an end user cannot distinguish between a stored or dynamically derived quantity?
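To make the idea concrete, here is a minimal sketch of such a layer. It assumes a hypothetical derived field `element_group` mapped onto a hypothetical indexed field `elements`; neither name is taken from the actual schema. The only ES feature it relies on is the standard `term` / `terms` query shape.

```python
from collections import defaultdict

# Many-to-one map: stored value -> derived value.
# Field names and the registry layout are assumptions for illustration.
ELEMENT_GROUP = {
    "F": "halogens",
    "Cl": "halogens",
    "Br": "halogens",
    "I": "halogens",
    "At": "halogens",
}


def invert(mapping):
    """Turn stored -> derived into derived -> [stored, ...]."""
    inverse = defaultdict(list)
    for stored, derived in mapping.items():
        inverse[derived].append(stored)
    return dict(inverse)


# Registry: derived field -> (indexed field it maps onto, inverse map).
DERIVED_FIELDS = {
    "element_group": ("elements", invert(ELEMENT_GROUP)),
}


def rewrite_term(field, value):
    """Rewrite a single-field query; stored fields pass through unchanged."""
    if field not in DERIVED_FIELDS:
        return {"term": {field: value}}
    stored_field, inverse = DERIVED_FIELDS[field]
    # Match any entry whose stored value maps to the requested derived value.
    return {"terms": {stored_field: sorted(inverse.get(value, []))}}


query = rewrite_term("element_group", "halogens")
```

Updating the map (say, a corrected group assignment) only changes `ELEMENT_GROUP`; the index and the archive stay untouched, which is the point of the proposal.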
For example: somebody looks for a field describing the dataset used to train an XC functional. We, internally, have a map to xc_functional_name. They get all the entries with the matching xc_functional_name. When they visit a specific entry, they see the dataset under DATA, even though it is not stored in the archive.
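The display side of this example could look roughly like the sketch below. The functional and dataset names are placeholders, `xc_training_dataset` is a made-up field name, and the entry is modelled as a plain dict rather than a real archive object.

```python
# Placeholder map: xc_functional_name -> training dataset (many-to-one).
# All keys and values here are hypothetical.
XC_TRAINING_DATASET = {
    "functional_A": "dataset_X",
    "functional_B": "dataset_X",
    "functional_C": "dataset_Y",
}


def with_derived(entry):
    """Return a copy of the entry with derived quantities filled in."""
    view = dict(entry)  # shallow copy; the stored entry is left untouched
    name = entry.get("xc_functional_name")
    if name in XC_TRAINING_DATASET:
        view["xc_training_dataset"] = XC_TRAINING_DATASET[name]
    return view


stored = {"entry_id": "example", "xc_functional_name": "functional_A"}
shown = with_derived(stored)
```

Because the derived value is computed at read time, changing the map changes what every affected entry shows without any reprocessing.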