Modify elpa_transpose_vectors_ NCCL so that it can be used in Cholesky
The Cholesky decomposition step could be sped up further on GPUs if the elpa_transpose_vectors_ NCCL routine from the ELPA 1stage full_to_tridi step could also be used in it.