- 11 Nov, 2019 1 commit
Andreas Marek authored
- 08 Nov, 2019 1 commit
Andreas Marek authored
- 05 Nov, 2019 1 commit
Andreas Marek authored
- 04 Nov, 2019 2 commits
Andreas Marek authored
Skew See merge request !26
Andreas Marek authored
- 31 Oct, 2019 1 commit
- 30 Oct, 2019 3 commits
Pavel Kus authored
Andreas Marek authored
GitLab CI: Test for 64bit BLAS and 32bit MPI See merge request !24
- 29 Oct, 2019 1 commit
Andreas Marek authored
- 28 Oct, 2019 2 commits
Andreas Marek authored
Pavel Kus authored
This commit addresses several issues. It essentially forbids the use of the GPU kernel, which has become obsolete and caused problems. However, it neither removes the related code completely nor prevents the user from explicitly selecting the GPU kernel; if the user does select it, a warning is issued and the GENERIC kernel is used instead. In more detail:
* Commenting out operations in the GPU kernel that do not compile with CUDA 10.1. This makes the kernel definitely unusable (but that was already true before).
* Removing the gpu_tridiag_band option, since the tridiag->banded routine is actually not ported to the GPU at all. This step will thus always be run on the CPU.
* Removing the gpu_trans_ev_tridi_to_band option, since the GPU version of this step cannot run without the GPU kernel and is not usable. This step will thus also be performed on the CPU.
* Modifying REAL_GPU_KERNEL_ONLY_WHEN_GPU_IS_ACTIVE and COMPLEX_GPU_KERNEL_ONLY_WHEN_GPU_IS_ACTIVE such that the GPU kernel is not considered during autotuning.
* TODO: the GPU kernel can still be enforced by the user. In this case, a warning is issued during the calculation and the kernel is switched to the GENERIC one. This should be improved; there should be no possibility to choose the GPU kernel in the first place.
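The selection behaviour described above (GPU kernel excluded from autotuning, an explicit GPU request downgraded to GENERIC with a warning) can be illustrated with a minimal Python sketch. The `Kernel` enum, `autotune_candidates` and `resolve_kernel` are hypothetical names for illustration only, not ELPA's API; the real logic lives in ELPA's Fortran/C sources and the macros named above.

```python
import warnings
from enum import Enum


class Kernel(Enum):
    # Hypothetical identifiers for illustration; not ELPA's real kernel constants.
    GENERIC = "generic"
    AVX2 = "avx2"
    GPU = "gpu"


def autotune_candidates(available):
    """Kernels considered during autotuning: the GPU kernel is always excluded."""
    return [k for k in available if k is not Kernel.GPU]


def resolve_kernel(requested):
    """Keep the user's explicit choice, except that a GPU request is
    replaced by GENERIC and a warning is emitted."""
    if requested is Kernel.GPU:
        warnings.warn("GPU kernel is not supported, falling back to GENERIC")
        return Kernel.GENERIC
    return requested


if __name__ == "__main__":
    print(autotune_candidates(list(Kernel)))
    print(resolve_kernel(Kernel.GPU))
```

Resolving at selection time, as the TODO suggests, would let the GPU option be rejected up front instead of during the calculation.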
- 26 Oct, 2019 2 commits
Andreas Marek authored
Long int scalapack See merge request !23
Andreas Marek authored
- 25 Oct, 2019 1 commit
Andreas Marek authored
- 24 Oct, 2019 7 commits
Andreas Marek authored
Andreas Marek authored
Andreas Marek authored
Andreas Marek authored
Andreas Marek authored
Carolin Penke authored
Andreas Marek authored
- 23 Oct, 2019 3 commits
Pavel Kus authored
with cudaDeviceSynchronize
Andreas Marek authored
- 22 Oct, 2019 3 commits
Pavel Kus authored
a_dev was never freed on the GPU. However, this might not be enough: what if bandred runs on the GPU and band_to_tridi on the CPU? Then a_dev is not allocated. This has to be rethought in general.
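One way to make the cleanup independent of which step ran where is to free a device buffer only if it was actually allocated. The Python sketch below shows that pattern under stated assumptions: `DeviceBuffers` and `run_steps` are hypothetical, and a `bytearray` stands in for device memory, where real GPU code would use cudaMalloc/cudaFree.

```python
class DeviceBuffers:
    """Hypothetical bookkeeping for device arrays such as a_dev.

    A real GPU code would call cudaMalloc/cudaFree; here a bytearray stands in
    for device memory so the control flow can be run anywhere."""

    def __init__(self):
        self._buffers = {}

    def allocate(self, name, nbytes):
        if name in self._buffers:
            raise RuntimeError(f"{name} is already allocated")
        self._buffers[name] = bytearray(nbytes)

    def is_allocated(self, name):
        return name in self._buffers

    def free_if_allocated(self, name):
        """Release the buffer only if it exists; report whether anything was freed."""
        return self._buffers.pop(name, None) is not None


def run_steps(step_uses_gpu):
    """Allocate a_dev only when the (hypothetical) step runs on the GPU,
    then clean up safely whichever branch was taken."""
    dev = DeviceBuffers()
    if step_uses_gpu:
        dev.allocate("a_dev", 1024)
    # ... the actual computation would happen here ...
    return dev.free_if_allocated("a_dev")


if __name__ == "__main__":
    print(run_steps(True), run_steps(False))
```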
Sebastian Ohlmann authored
When profiling the GPU version, NVTX can be used to highlight the corresponding regions of the code in the timeline of the profiling tool (nvvp or Nsight Systems). This is very useful for correlating what happens on the GPU with the part of the code being executed.
Pavel Kus authored
- 21 Oct, 2019 1 commit
Andreas Marek authored
- 19 Oct, 2019 2 commits
Andreas Marek authored
- 17 Oct, 2019 2 commits
Andreas Marek authored
ELPA can now be linked against a 64bit integer version of MPI and ScaLAPACK. This is an experimental feature. The following points are still to be done:
  - it does not work with the real QR decomposition
  - the generalized routines return wrong results
  - the C tests and the C implementation of the Cannon algorithm do not work (there are no 64bit header files for MPI, *at least* with Intel MPI)
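When libraries with different integer widths are combined (for example the 64bit BLAS / 32bit MPI configuration tested in CI), a 64bit value handed to a 32bit interface must be narrowed. The Python sketch below contrasts a checked narrowing with the silent two's-complement wrap-around of an unchecked conversion; `to_int32` and `unchecked_int32` are hypothetical helpers, not part of ELPA, and `ctypes.c_int32` is used only to reproduce what a C cast would do.

```python
import ctypes

INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def to_int32(value):
    """Checked narrowing: pass a value from 64-bit integer code to a 32-bit
    interface only if it fits, instead of letting it wrap around."""
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError(f"{value} does not fit into a 32-bit integer")
    return value


def unchecked_int32(value):
    """What an unchecked conversion to a C int32 produces (two's-complement wrap)."""
    return ctypes.c_int32(value).value


if __name__ == "__main__":
    big = 3_000_000_000
    print(unchecked_int32(big))
    try:
        to_int32(big)
    except OverflowError as err:
        print("refused:", err)
```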
Andreas Marek authored
- 14 Oct, 2019 1 commit
Andreas Marek authored
ELPA can now link against a 64bit integer version of BLAS/LAPACK. Currently this only works if ELPA is compiled with MPI=OFF! The 64bit support is not available in the legacy interface.
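The commit does not state its motivation, but one common reason for linking against 64bit integer (ILP64) BLAS/LAPACK builds is that a signed 32bit integer caps the number of addressable matrix elements at 2^31 - 1. The short Python sketch below (helper names are hypothetical) computes that limit: 46340 x 46340 = 2,147,395,600 elements still fit, while 46341 x 46341 = 2,147,488,281 do not.

```python
import math

INT32_MAX = 2**31 - 1


def fits_int32(n_rows, n_cols):
    """True if the element count of an n_rows x n_cols matrix is representable
    in a signed 32-bit integer."""
    return n_rows * n_cols <= INT32_MAX


def max_square_order_int32():
    """Largest n for which an n x n matrix still has at most INT32_MAX elements."""
    return math.isqrt(INT32_MAX)


if __name__ == "__main__":
    n = max_square_order_int32()
    print(n, fits_int32(n, n), fits_int32(n + 1, n + 1))
```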
- 11 Oct, 2019 5 commits
Andreas Marek authored
Andreas Marek authored
Separate the variable type definition of the library and the test programs See merge request !21
Andreas Marek authored
Andreas Marek authored
Andreas Marek authored
Auto detect See merge request !20
- 10 Oct, 2019 1 commit
Andreas Marek authored
On heterogeneous clusters, i.e. clusters of nodes with different CPUs, the _experimental_ feature (--enable-heterogenous-cluster-support) can be used: it compares the (Intel) cpuid sets of all CPUs used by ELPA MPI processes and finds the SIMD instruction set that is supported by all of them. The ELPA 2stage back-transformation kernel (a.k.a. "kernel") is then set accordingly on all MPI processes. This feature can override a kernel setting previously made by the user! At the moment it only works for Intel CPUs, i.e. clusters consisting of nodes with Intel CPUs and, e.g., AMD CPUs are _NOT_ supported. Since this is an experimental feature, it might be dropped again in the future if it turns out not to be useful for users.
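The "common SIMD set" step can be sketched as a set intersection followed by picking the most capable kernel that every process supports. In an MPI code the intersection could be computed with MPI_Allreduce and MPI_BAND over feature bitmasks; the Python sketch below replaces that with a plain list of per-process sets. The preference order and kernel names are hypothetical, not ELPA's actual kernel table.

```python
from functools import reduce

# Hypothetical preference order (most to least capable SIMD set) and
# hypothetical kernel names; this is not ELPA's actual kernel table.
PREFERENCE = [
    ("avx512", "avx512_kernel"),
    ("avx2", "avx2_kernel"),
    ("avx", "avx_kernel"),
    ("sse", "sse_kernel"),
]


def common_instruction_sets(per_process_sets):
    """Intersect the SIMD feature sets reported by all processes.
    Expects at least one process."""
    return reduce(set.intersection, (set(s) for s in per_process_sets))


def select_kernel(per_process_sets):
    """Pick the most capable kernel that every process supports."""
    common = common_instruction_sets(per_process_sets)
    for isa, kernel in PREFERENCE:
        if isa in common:
            return kernel
    return "generic"


if __name__ == "__main__":
    ranks = [{"sse", "avx", "avx2", "avx512"}, {"sse", "avx", "avx2"}]
    print(select_kernel(ranks))
```

Because every process computes the same intersection, all of them end up with the same kernel, which is why such a step can override a per-process user setting.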