Metainfo data type improvements
Our metainfo has some inconsistencies in how the different data types are handled. Here are some observations which could be broken down into separate issues if we decide to act on them:
- Units are ignored for primitive python numeric types (#1006 (closed)).
- Units cannot be assigned to non-numpy 1D arrays (`pint` does not allow assigning units to regular python 1D lists).
- Units can be assigned to non-numeric quantities (maybe a `@constraint` would help here).
- Possible precision loss in downcasting is not handled consistently. For primitive types, one cannot assign a value whose type does not match the target data type. This ensures that no precision is lost, but is too strict (one cannot assign an int to a float). For numpy data types, there is no check for precision loss, and values can be cast freely. E.g. storing a float array in an int array does not raise any errors.
- The differences between some of the data types are quite vague. We do support both 32 and 64 bit `int` and `float` numbers, but is this difference realized when we are saving the archive to disk? How does the fact that Javascript only deals with one 64 bit numeric type (`Number`) affect things when reading/saving data in an ELN?
- Arrays with more than one dimension can only be created for quantities that have a numpy data type. This is causing some confusion, especially for people who do not know what numpy is.
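
To make the precision-loss point concrete, here is a minimal sketch of numpy's own casting behaviour. It shows plain numpy only and is not taken from the metainfo code, so how the metainfo actually converts values is an assumption here. It also shows that numpy already provides the building blocks for a stricter check (`np.can_cast` and the `casting` argument of `astype`).

```python
import numpy as np

values = np.array([1.7, -2.3, 3.9], dtype=np.float64)

# With the default casting rule ("unsafe"), the fractional part is
# dropped silently and no error is raised.
as_int = values.astype(np.int64)

# np.can_cast reports whether a dtype conversion is lossless under a rule.
float_to_int_is_safe = np.can_cast(np.float64, np.int64, casting="safe")
int_to_float_is_safe = np.can_cast(np.int32, np.float64, casting="safe")

# Asking for "safe" casting makes numpy raise instead of truncating.
try:
    values.astype(np.int64, casting="safe")
    raised = False
except TypeError:
    raised = True
```

Note that the dtype-level `"safe"` rule accepts int32 to float64, which is exactly the kind of assignment the primitive types currently reject.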
My gut feeling is that many of these issues would be resolved if we used numpy internally when storing all numeric values, no matter what shape (0D, 1D, ND). This way our metainfo would have one "normalized" form for all numeric data, but the user could use any compatible data type in assignments (we would check the compatibility automatically). This would also mean that upon accessing numeric data, the user would always get a numpy object back. Whether this is confusing or not is hard to say. This change would also mean that we could simplify the supported numeric data types. How far we simplify them is an open question. One extreme is to only support two numeric types: `int` (64 bit integer) and `float` (64 bit float). Another option would be to keep support for different variants (`int32`, `int64`, `float32`, `float64`, `uint32`, `uint64` etc.) but maybe use simple strings instead of the actual numpy data type objects, as the numpy objects may be confusing for ELN users.
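
As a rough illustration of the idea, the normalization could look something like the sketch below. The names `DTYPES` and `normalize_numeric` are hypothetical and not part of the current metainfo. The sketch assumes the "two types only" extreme and checks compatibility per value, via a round trip, rather than per dtype.

```python
import numpy as np

# Hypothetical mapping from simple string names to numpy dtypes.
DTYPES = {
    "int": np.int64,
    "float": np.float64,
}


def normalize_numeric(value, type_name):
    """Return value as a numpy array of the named type (0D, 1D or ND).

    Raises TypeError for non-numeric input and ValueError if the
    conversion would change any of the values.
    """
    target = np.dtype(DTYPES[type_name])
    source = np.asarray(value)
    # Accept only signed/unsigned integers and floats as input.
    if source.dtype.kind not in "iuf":
        raise TypeError(f"non-numeric value of dtype {source.dtype}")
    converted = source.astype(target)
    # Round-trip check: converting back to the input dtype must
    # reproduce the input exactly, otherwise precision was lost.
    if not np.array_equal(converted.astype(source.dtype), source):
        raise ValueError(
            f"cannot store {value!r} as {type_name} without precision loss"
        )
    return converted
```

With this sketch, `normalize_numeric(1, "float")` is accepted and returns a 0D `float64` array, while `normalize_numeric([1.5], "int")` raises a `ValueError`. A value-based check is more permissive than a dtype-based one (`2.0` can be stored as an int), and which of the two we want is part of the open question.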
All of this is open for discussion and any comments are welcome.