Improving the handling of numeric datatypes in the metainfo

Our metainfo is quite inconsistent in how it handles numeric datatypes. Here is a list of different problems that should be fixed:

  • The type field does not accept 'bare' numpy types like np.int32 or np.float64; instead they have to be wrapped in the dtype class (e.g. np.dtype(np.float64)). This is due to the type check isinstance(type, np.dtype), which fails unless the type is created with the np.dtype constructor. Instead we should accept both forms and extract np.dtype.type if a np.dtype is given: this contains the final 'bare' numpy type that should be validated and stored.
  • The metainfo accepts all numpy dtypes, including several very exotic ones (e.g. np.complex128) which we do not really support. Instead of simply checking that type is an instance of np.dtype, we should have an explicit list of supported numpy dtypes: np.int32, np.int64, np.uint32, np.uint64, np.float32, np.float64. The list can of course be extended in the future.
  • PrimitiveQuantity is not handling unit information in the __set__ and __get__ methods. This means that unit information is completely ignored when assigning or retrieving values. If there is a very good reason why units should not be allowed with primitive types, then we should add a @constraint for this.
  • There is no constraint checking that units are only assigned to numeric types. We need to add a @constraint that throws a MetainfoError upon package init.
  • On the schema level it is possible to define a 1D list with a unit and a python numeric type (int, float). However, Pint does not allow attaching unit information to bare python lists: it always uses numpy arrays. There should be a @constraint that says that any non-scalar quantity with a unit needs to use numpy dtypes.
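
A minimal sketch of how the type normalization and the unit checks from the list above could look. The names SUPPORTED_NUMPY_TYPES, normalize_numpy_type and check_unit_allowed are hypothetical, and the MetainfoError class here is only a local stand-in for the real one; in the metainfo the checks would presumably live in @constraint methods rather than free functions.

```python
import numpy as np


class MetainfoError(Exception):
    """Local stand-in for the real metainfo error class (assumption)."""


# Explicit whitelist, taken from the list proposed in this issue.
SUPPORTED_NUMPY_TYPES = {
    np.int32, np.int64, np.uint32, np.uint64, np.float32, np.float64,
}


def normalize_numpy_type(type_):
    """Return the bare numpy scalar type for a supported numpy type."""
    # A np.dtype instance wraps a scalar type: unwrap it so that
    # np.dtype(np.float64) and np.float64 end up stored identically.
    if isinstance(type_, np.dtype):
        type_ = type_.type
    if type_ not in SUPPORTED_NUMPY_TYPES:
        raise MetainfoError(f"unsupported numpy type: {type_!r}")
    return type_


def check_unit_allowed(type_, shape, unit):
    """Raise if a unit is attached to a type that cannot carry it."""
    if unit is None:
        return
    is_numpy = type_ in SUPPORTED_NUMPY_TYPES
    is_python_numeric = type_ in (int, float)
    if not (is_numpy or is_python_numeric):
        raise MetainfoError("units can only be assigned to numeric types")
    # Pint stores non-scalar magnitudes as numpy arrays, so a quantity
    # with a shape and a unit needs a numpy dtype.
    if shape and not is_numpy:
        raise MetainfoError(
            "non-scalar quantities with a unit must use a numpy dtype"
        )
```

With this, normalize_numpy_type(np.dtype(np.float64)) and normalize_numpy_type(np.float64) both return np.float64, while np.complex128 is rejected.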

There are also still problems when assigning data to a field with a different data type: for some types we raise an error, for numpy types we simply do a conversion, etc. Conversions where no data is lost (e.g. from int to float) should probably be implicit, but conversions where data may be lost (e.g. from float to int, or from signed to unsigned) should probably raise an error.
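
One possible way to express this rule is numpy's own "safe" casting check. The sketch below is only an illustration; convert_checked is a hypothetical helper, not existing metainfo code.

```python
import numpy as np


def convert_checked(value, target_type):
    """Convert value to target_type, refusing casts numpy deems unsafe."""
    array = np.asarray(value)
    # casting="safe" allows e.g. int32 -> float64, but rejects
    # float -> int and signed -> unsigned conversions.
    if not np.can_cast(array.dtype, target_type, casting="safe"):
        raise TypeError(
            f"cannot safely convert {array.dtype} to {np.dtype(target_type)}"
        )
    return array.astype(target_type)
```

Two caveats with this approach: numpy treats int64 to float64 as "safe" even though very large integers lose precision, and plain Python ints are typically inferred as int64, so assigning them to an np.int32 field would be rejected unless values are range-checked separately.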

Edited Aug 05, 2022 by Lauri Himanen