Enable GPU version with OpenMP in ELPA 2stage
The GPU logic has been implemented in the OpenMP code paths in ELPA2. Currently, this means that _internal_ to ELPA2 the number of OpenMP threads is set to one (independent of how many threads the calling application uses), and the original value is restored when ELPA finishes. Although this is not what happens in the general case, it is _not_ a limitation: in the GPU case no work is done on the CPU, so threading would not help.
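The save/set/restore pattern described above can be sketched in C as follows. This is only an illustration under stated assumptions, not ELPA's actual code: `gpu_solver_stub` and `solve_with_single_thread` are hypothetical names, and the fallback stubs exist only so the sketch also compiles when OpenMP is not enabled. Only the standard OpenMP runtime calls `omp_get_max_threads` and `omp_set_num_threads` are used.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback stubs (assumption): emulate the two runtime calls
 * so the sketch builds without OpenMP support. */
static int stub_threads = 1;
static int omp_get_max_threads(void) { return stub_threads; }
static void omp_set_num_threads(int n) { stub_threads = n; }
#endif

/* Hypothetical stand-in for the GPU code path: it only records
 * how many OpenMP threads it would be allowed to use. */
static int threads_seen_by_solver = 0;
static void gpu_solver_stub(void)
{
    threads_seen_by_solver = omp_get_max_threads();
}

/* Hypothetical wrapper: force one thread internally, then restore
 * the caller's setting. Returns the thread count the solver saw. */
int solve_with_single_thread(void)
{
    int saved = omp_get_max_threads();  /* remember the caller's value */

    omp_set_num_threads(1);             /* GPU path: no CPU threading */
    assert(omp_get_max_threads() == 1);

    gpu_solver_stub();

    omp_set_num_threads(saved);         /* restore before returning */
    return threads_seen_by_solver;
}
```

A caller that has requested, say, four threads would see the solver run with one thread and would find its own setting unchanged afterwards.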