Non-local factors in derived properties

Some derived properties take large data (e.g. a matrix) and normalize it with small factors (e.g. for scaling or shifting). While the data to be normalized is usually part of the same section, the factors might come from different sections. To evaluate such a derived property, not just the local data but also the distant sections containing the factors need to be available (e.g. loaded from the archive). This breaks the otherwise local nature of quantities, i.e. you cannot necessarily access the quantity from its section alone.
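To make the problem concrete, here is a minimal sketch in plain Python. It does not use the real metainfo classes; `System`, `ThermoProperties`, the `mass` factor and the division by it are all assumptions made up for illustration. It only shows that the derived value fails as soon as the referenced section is not loaded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class System:
    """Hypothetical stand-in for a distant section holding a small factor."""
    mass: float


@dataclass
class ThermoProperties:
    """Hypothetical stand-in for the section holding the large data."""
    heat_capacity: List[float]
    # Reference to a section elsewhere in the archive. It stays None when
    # only this section has been loaded.
    system: Optional[System] = None

    @property
    def specific_heat_capacity(self) -> List[float]:
        # The derived value needs a factor from the referenced section,
        # so the local data alone is not enough to compute it.
        if self.system is None:
            raise LookupError("referenced system section is not loaded")
        return [c / self.system.mass for c in self.heat_capacity]


# Works when the referenced section is available ...
full = ThermoProperties([2.0, 4.0], System(mass=2.0))
values = full.specific_heat_capacity

# ... but not when only the local section was loaded.
partial = ThermoProperties([2.0, 4.0])
```

Accessing `partial.specific_heat_capacity` raises `LookupError`, which is the situation described above when only part of the archive is available.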

There are several conceivable solutions:

- the quantity stores the factors as a tuple
  - the factors are assigned to the quantity (e.g. by parser/normalizer) and passed to the derive function as parameters
  - the factors are defined in the quantity definitions and automatically stored when the archive is written
- we shoehorn this into quantity references. The serialized reference would be something like /run/scc/20/dos/values#normalized?cell_volume=0.19282e-12.
  - the parameters have to be provided when the reference is set
  - the parameters are somehow automatically set on serialization
  - the actual computation is added to the values quantity definition
  - the actual computation is added to the reference quantity definition
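For the reference-based variant, the serialized form above could be split into its parts roughly as follows. This is only a sketch: `parse_reference` is a hypothetical helper, not an existing metainfo function, and it assumes that all parameters are plain floats.

```python
from typing import Dict, Tuple
from urllib.parse import parse_qsl


def parse_reference(ref: str) -> Tuple[str, str, Dict[str, float]]:
    """Split '<path>#<derived name>?<key>=<value>&...' into its parts.

    Hypothetical helper; assumes every parameter value is a float.
    """
    # Everything before '#' is the path to the stored quantity.
    path, _, fragment = ref.partition("#")
    # The fragment carries the derived name and the optional parameters.
    name, _, query = fragment.partition("?")
    params = {key: float(value) for key, value in parse_qsl(query)}
    return path, name, params


path, name, params = parse_reference(
    "/run/scc/20/dos/values#normalized?cell_volume=0.19282e-12"
)
```

With this, the factors travel with the reference itself, so whoever resolves it only needs the section at `path` and not the section the factor originally came from.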

The original post

I'm facing a small dilemma with the derived metainfo quantities.

The metainfo specific_heat_capacity is simply a re-weighted version of thermodynamical_property_heat_capacity_C_v. Due to this simple relation, the metainfo is implemented as a derived quantity: it is calculated upon access. These derived properties are not stored in the archive and are not available until a proper MSection is de-serialized from the underlying data.

In the API I want to access this information. Because of its derived nature, I tried to de-serialize only the parent section section_thermodynamical_properties. This can be done easily. But when I access the derived property, the calculation needs to resolve several references to obtain the normalization constant (in this case it needs to resolve the section_system that was used for calculating this property). To resolve these references I basically have to de-serialize the entire section_run; only after that can all the references be resolved properly.

Question: Although this mechanism works in practice, it means that accessing any derived property may require the whole Archive to be de-serialized into an MSection object in RAM to resolve the needed properties. This may become problematic, and I'm wondering if there are better alternatives. This issue will affect all properties where we have both the normalized and the non-normalized version of the same data available (DOS, band structures, etc.). Would it make sense to add a new type of quantity (e.g. QuantityNormalized) that explicitly stores the normalization constants in the archive? This would be an intermediate solution between storing the full normalized data and using the derived properties.
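As a rough illustration of what such a quantity could look like, here is a sketch in plain Python. The class below only borrows the name QuantityNormalized from the proposal; its fields, the `scale` and `shift` factor names, and the JSON layout are assumptions and not part of the existing metainfo.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class QuantityNormalized:
    """Hypothetical container: raw values plus the small factors."""
    values: List[float]
    factors: Dict[str, float] = field(default_factory=dict)

    def normalized(self) -> List[float]:
        # Computed on access from locally stored data only.
        scale = self.factors.get("scale", 1.0)
        shift = self.factors.get("shift", 0.0)
        return [(v - shift) * scale for v in self.values]

    def to_json(self) -> str:
        # The factors are written next to the raw values, so the
        # normalized data itself never has to be stored.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "QuantityNormalized":
        return cls(**json.loads(text))


quantity = QuantityNormalized([1.0, 3.0], {"scale": 0.5, "shift": 1.0})
restored = QuantityNormalized.from_json(quantity.to_json())
result = restored.normalized()
```

The round trip through JSON stands in for writing and reading the archive: after loading, the normalized values can be computed without resolving any reference to another section.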

Edited Jan 28, 2021 by Markus Scheidgen