Improving the handling of numeric datatypes in the metainfo
Our metainfo is quite inconsistent in how it handles numeric datatypes. Here is a list of different problems that should be fixed:
- The `type` field does not accept 'bare' numpy types like `np.int32` or `np.float64`; instead they have to be constructed using the dtype class (`np.dtype(np.float64)`). This is due to the type check `isinstance(type, np.dtype)`, which fails unless the type is created with the `np.dtype` constructor. Instead we should extract `np.dtype.type` if an `np.dtype` is given: this contains the final 'bare' numpy type that should be validated and stored.
- The metainfo accepts all numpy dtypes, including several very exotic ones (e.g. `np.complex128`) which we do not really support. Instead of simply checking that the type is an instance of `np.dtype`, we should have an explicit list of supported numpy dtypes: `np.int32`, `np.int64`, `np.uint32`, `np.uint64`, `np.float32`, `np.float64`. This list can of course be extended in the future.
- `PrimitiveQuantity` does not handle unit information in the `__set__` and `__get__` methods. This means that unit information is completely ignored when assigning or retrieving values. If there is a very good reason why units should not be allowed with primitive types, then we should add a `@constraint` for this.
- There is no constraint that checks whether units are assigned to a numeric type. We need to add a `@constraint` that throws a `MetainfoError` upon package init.
- On the schema level it is possible to define a 1D list with a unit and a python numeric type (`int`, `float`). However, Pint does not allow attaching unit information to bare python lists: it always uses numpy arrays. There should be a `@constraint` that says that any non-scalar quantity with a unit needs to use numpy dtypes.
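A rough sketch of how the type normalization, the whitelist of supported dtypes, and the non-scalar unit check could look. All names here (`SUPPORTED_NUMPY_TYPES`, `normalize_numpy_type`, `check_unit_constraint`) are made up for illustration and are not the current metainfo API. The `MetainfoError` class is a local stand-in so the snippet runs on its own, and it assumes a scalar quantity is expressed as an empty shape list.

```python
import numpy as np


class MetainfoError(Exception):
    """Local stand-in for the real metainfo error class (assumption)."""


# Whitelist taken from the list of dtypes proposed in this issue.
SUPPORTED_NUMPY_TYPES = (
    np.int32, np.int64, np.uint32, np.uint64, np.float32, np.float64,
)


def normalize_numpy_type(type_):
    """Return the bare numpy scalar type for `type_`, or raise TypeError."""
    # An np.dtype instance carries the bare scalar type in `.type`.
    if isinstance(type_, np.dtype):
        type_ = type_.type
    if type_ not in SUPPORTED_NUMPY_TYPES:
        raise TypeError(f"Unsupported numpy type: {type_!r}")
    return type_


def check_unit_constraint(type_, shape, unit):
    """Reject non-scalar quantities that combine a unit with int/float.

    Assumption: `shape` is a list and an empty list means scalar.
    """
    is_scalar = len(shape) == 0
    if unit is not None and not is_scalar and type_ in (int, float):
        raise MetainfoError(
            "Non-scalar quantities with a unit must use a numpy dtype."
        )
```

With this, `normalize_numpy_type(np.dtype(np.float64))` and `normalize_numpy_type(np.float64)` both yield `np.float64`, while `np.complex128` is rejected. The real check would presumably live in a `@constraint` on the quantity definition and not in a free function.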
There are still problems, especially when assigning data to a field with a different data type. For some types we raise an error; for numpy types we simply do a conversion, and so on. Conversions where no data is lost (e.g. from `int` to `float`) should probably be implicit, but for conversions where there may be some data loss (e.g. from float to int, or from signed to unsigned) we should probably raise an error.
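One possible way to express this rule is numpy's own `np.can_cast` with `casting="safe"`. The sketch below is only an illustration: the helper name `convert_value` is hypothetical, and using numpy's "safe" rule as the definition of a lossless conversion is an assumption, not a decision made in this issue.

```python
import numpy as np


def convert_value(value, target_type):
    """Convert `value` to `target_type` only if numpy deems the cast safe."""
    source = np.asarray(value)
    # "safe" allows only casts that numpy considers value-preserving,
    # e.g. int32 -> float64, but not float64 -> int32 or int32 -> uint32.
    if not np.can_cast(source.dtype, target_type, casting="safe"):
        raise TypeError(
            f"Cannot convert {source.dtype} to {np.dtype(target_type)} "
            "without possible data loss."
        )
    return source.astype(target_type)
```

Note that numpy treats `int64` to `float64` as a safe cast even though very large integers can lose precision, so a stricter rule may be needed if that matters for us.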