Fix bug in GPU version
The association of CUDA memory with Fortran arrays was sometimes done incorrectly: the number of array elements times sizeof(datatype) was used as the array size, instead of the number of array elements. This was a mix-up between the C allocation, which takes a size in bytes, and the Fortran reshape, which takes a number of elements. This could lead to memory corruption.
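
For illustration only, a minimal C sketch of the two quantities that were mixed up. The helper names alloc_size_bytes and fortran_extent are hypothetical and do not appear in the code base; they only show that the byte count belongs to the C-side allocation, while the element count is what the Fortran array shape should presumably receive.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: size in bytes to pass to a C-side allocator
 * (for example malloc or cudaMalloc), which expects bytes. */
static size_t alloc_size_bytes(size_t n_elements, size_t elem_size) {
    return n_elements * elem_size;
}

/* Hypothetical helper: extent to use for the Fortran array shape.
 * This is the number of elements, NOT the byte count. Using the byte
 * count here would describe an array elem_size times larger than the
 * memory actually allocated, so accesses could run past the buffer. */
static size_t fortran_extent(size_t size_bytes, size_t elem_size) {
    assert(elem_size > 0 && size_bytes % elem_size == 0);
    return size_bytes / elem_size;
}
```

With 1000 elements of type double, the allocation needs 1000 * sizeof(double) bytes, but the Fortran-side shape should be 1000.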