Improving the handling of numeric datatypes in the metainfo
Our metainfo is quite inconsistent in how it handles numeric datatypes. Here is a list of different problems that should be fixed:
- The `type` field does not accept 'bare' numpy types like `np.int32` or `np.float64`; instead they have to be constructed using the dtype class (`np.dtype(np.float64)`). This is due to the type check `isinstance(type, np.dtype)`, which fails unless the type is created with the `np.dtype` function. Instead we should extract `np.dtype.type` if an `np.dtype` is given: this contains the final 'bare' numpy type that should be validated and stored.
- The metainfo accepts all numpy dtypes, which includes several very exotic ones (e.g. `np.complex128`) which we do not really support. Instead of simply checking that `type` is an instance of `np.dtype`, we should have an explicit list of supported numpy dtypes: `np.int32`, `np.int64`, `np.uint32`, `np.uint64`, `np.float32`, `np.float64`. This list can of course be extended in the future.
- `PrimitiveQuantity` does not handle unit information in the `__set__` and `__get__` methods. This means that unit information is completely ignored when assigning or retrieving values. If there is a very good reason why units should not be allowed with primitive types, then we should add a `@constraint` for this.
- There is no constraint for checking if units are assigned to a numeric type or not. We need to add a `@constraint` that throws a `MetainfoError` upon package init.
- On the schema level it is possible to define a 1D list with a unit and a python numeric type (`int`, `float`). However, Pint does not allow attaching unit information to bare python lists: it always uses numpy arrays. There should be a `@constraint` that says that any non-scalar quantity with a unit needs to use numpy dtypes.
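To make the first two points and the last one more concrete, here is a rough sketch of how the type normalization and the unit check could look. The names `normalize_numeric_type`, `check_unit_constraint` and `SUPPORTED_NUMPY_TYPES` are made up for illustration, and the `MetainfoError` class below is only a stand-in for the real one, so this is not how the metainfo currently does it.

```python
import numpy as np

# Supported 'bare' numpy types, copied from the list proposed in this issue.
SUPPORTED_NUMPY_TYPES = (
    np.int32, np.int64, np.uint32, np.uint64, np.float32, np.float64)

# Python numeric types that are passed through unchanged (assumption).
PYTHON_NUMERIC_TYPES = (int, float)


class MetainfoError(Exception):
    """Stand-in for the metainfo's own error class (hypothetical here)."""


def normalize_numeric_type(type_):
    """Return the bare type for a python type, numpy type or np.dtype."""
    if isinstance(type_, np.dtype):
        # np.dtype(np.float64).type is the bare type np.float64.
        type_ = type_.type
    if type_ in PYTHON_NUMERIC_TYPES or type_ in SUPPORTED_NUMPY_TYPES:
        return type_
    raise MetainfoError(f'Unsupported numeric type: {type_!r}')


def check_unit_constraint(type_, shape, unit):
    """Raise if a non-scalar quantity with a unit uses a python type.

    `shape` is assumed to be a list of dimensions, empty for scalars.
    """
    if unit is not None and len(shape) > 0 and type_ in PYTHON_NUMERIC_TYPES:
        raise MetainfoError(
            'Non-scalar quantities with a unit need to use a numpy dtype.')
```

With this, `normalize_numeric_type(np.float64)` and `normalize_numeric_type(np.dtype(np.float64))` both give back `np.float64`, while `np.complex128` is rejected.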
There are still problems, especially when assigning data to a field with a different data type. For some types we raise an error, while for numpy types we simply do a conversion. Conversions where no data is lost (e.g. from `int` to `float`) should probably be implicit, but for conversions where there may be some data loss (e.g. from float to int, or from signed to unsigned) we should probably throw an error.
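One possible way to tell the two cases apart is numpy's own 'safe' casting rule. The sketch below assumes that rule is acceptable for our purposes; `convert_if_lossless` is a hypothetical helper name, not something that exists in the metainfo.

```python
import numpy as np


def convert_if_lossless(value, target_type):
    """Convert value to target_type only if numpy calls the cast 'safe'."""
    source = np.asarray(value).dtype
    target = np.dtype(target_type)
    if not np.can_cast(source, target, casting='safe'):
        raise TypeError(
            f'Cannot convert {source} to {target} without possible data loss.')
    # Note: scalars come back as 0-dimensional numpy arrays.
    return np.asarray(value, dtype=target)
```

Here an `np.int32` value assigned to an `np.float64` field is converted silently, whereas float to int and signed to unsigned raise a `TypeError`. One caveat: numpy also treats `np.int64` to `np.float64` as 'safe', even though very large integers cannot be represented exactly, so we might want a stricter rule for that case.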
Edited by Lauri Himanen