Remove the internal memcpy in hermitian_multiply when called with device pointers
In hermitian_multiply, when the function is called with device pointers (data already on the GPU), the data is copied internally from one memory allocation to another. This copy should be removed, and multiply should work directly on the memory provided by the caller (possibly via a transfer statement).
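To make the request concrete, here is a minimal host-only sketch in C of the two behaviours. It is not the library's code: `multiply_staged`, `multiply_direct`, `multiply_at_b` and `staging_copies` are hypothetical names, plain `malloc`/`memcpy` stand in for the device allocation and device-to-device copy, and a real transposed product on column-major n x n matrices stands in for the Hermitian multiply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Counts internal buffer-to-buffer copies (illustration only). */
size_t staging_copies = 0;

/* C = A^T * B for column-major n x n real matrices.
   Hypothetical stand-in for the actual multiply kernel. */
void multiply_at_b(const double *a, const double *b, double *c, size_t n)
{
    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i < n; i++) {
            double s = 0.0;
            for (size_t k = 0; k < n; k++) {
                s += a[k + i * n] * b[k + j * n]; /* A(k,i) * B(k,j) */
            }
            c[i + j * n] = s;
        }
    }
}

/* Behaviour described in this issue: the inputs are first copied
   into internal allocations, then the multiply runs on the copies. */
void multiply_staged(const double *a, const double *b, double *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    size_t bytes = n * n * sizeof(double);
    double *a_tmp = malloc(bytes);
    double *b_tmp = malloc(bytes);
    assert(a_tmp != NULL && b_tmp != NULL);

    memcpy(a_tmp, a, bytes); /* stand-in for a device-to-device copy */
    memcpy(b_tmp, b, bytes);
    staging_copies += 2;

    multiply_at_b(a_tmp, b_tmp, c, n);

    free(a_tmp);
    free(b_tmp);
}

/* Requested behaviour: use the caller's memory as-is, no copy. */
void multiply_direct(const double *a, const double *b, double *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    multiply_at_b(a, b, c, n);
}
```

Both variants produce the same result. The direct one skips the two temporary allocations and the two copies, which is the saving this issue asks for when the inputs are already on the device.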