elpa issues (https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues), feed updated 2022-02-09T13:50:23Z

Issue #91: 1 MPI rank per GPU, rest with OpenMP threads (Andreas Marek, updated 2022-02-09)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/91

Issue #106: Add and RECOMMEND setup_gpu() to documentation (Andreas Marek, updated 2023-10-25)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/106
Assignee: Petr Karpov

Issue #59: add scalapack test to gitlab CI (Pavel Kus, updated 2018-01-07)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/59
Assignee: Pavel Kus
The scalapack tests are built only when the --enable-scalapack-tests option is passed to configure. We should test this in the gitlab CI as well, but MKL 11.3 is strangely failing on buildtest. The problem seems to disappear when switching to MKL 2017 (even though it works on Hydra with both 11.3 and 2017). So we should re-enable this test when we switch to MKL 2017 on buildtest.

Issue #65: API change in elpa_deallocate() (Ask Hjorth Larsen, updated 2019-02-18)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/65
Hi! Please excuse me if this is not the right place to post this, or if I have missed info in the docs.
`elpa_deallocate()` recently got another argument, namely the error code:
https://gitlab.mpcdf.mpg.de/elpa/elpa/commit/69b68de30e21d2d959baa426b968e39603ebd758
This will require existing interfaces to be updated as reported here:
https://gitlab.com/gpaw/gpaw/issues/197
Is there a recommended way to write interfaces that are compatible with both this *and* the older version? For example by accessing the version number in the preprocessor?

Issue #1: Assumed-size arrays (Andreas Marek, updated 2023-11-06)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/1
Some subroutines/functions of the Fortran code still use (deprecated) assumed-size arrays. This was introduced for simplicity and performance (avoiding unnecessary copying of arrays) but makes debugging hard (or even impossible). This should be changed.

Issue #33: AVX512 complex kernels do not work (Andreas Marek, updated 2017-05-21)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/33

Issue #112: Bad performance of ELPA_2STAGE_REAL_NVIDIA_SM80_GPU kernel (Petr Karpov, updated 2024-02-22)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/112
For ELPA 2023.11.001, the ELPA_2STAGE_REAL_NVIDIA_SM80_GPU kernel gives much worse performance than ELPA_2STAGE_REAL_NVIDIA_GPU.
Reproducer:
[config.log](/uploads/034f72c0f3c6f4e9c0974ed58ae23668/config.log) with modules:
```
module load autoconf/2.71 cuda/11.4 gcc/11 openmpi/4 mkl/2022.1 nccl/2.11.4
```
The problem comes from tridi_to_band. Here are the timings for 4 GPUs (1 Raven node) and N=10k, 30k, and 40k matrices:
| N | ELPA_2STAGE_REAL_NVIDIA_GPU | ELPA_2STAGE_REAL_NVIDIA_SM80_GPU |
|-----|----------------------------|----------------------------------|
| 10k | 2.643782 s | 10.700442 s |
| 30k | 45.905885 s | 243.225826 s |
| 40k | 101.433715 s | 565.902228 s |
Here are the run logs for 40k matrix:
[slurm-9329866_40k_ELPA_2STAGE_REAL_NVIDIA_GPU.out](/uploads/ab2008646022d10507396235341de56b/slurm-9329866_40k_ELPA_2STAGE_REAL_NVIDIA_GPU.out)
[slurm-9329868_40k_ELPA_2STAGE_REAL_NVIDIA_SM80_GPU.out](/uploads/c64789bd736906b7193e057ec71ad448/slurm-9329868_40k_ELPA_2STAGE_REAL_NVIDIA_SM80_GPU.out)

Issue #67: cannot build ELPA on talos with cuda 10.1 (Pavel Kus, updated 2019-11-20)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/67
Assignee: Pavel Kus
Works with cuda 10.0.

Issue #46: Check complex stripe_width factor of 2 (Andreas Marek, updated 2017-05-21)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/46

Issue #82: Check MPI calls within OpenMP parallelized regions (Andreas Marek, updated 2021-09-03)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/82
Currently we require the MPI library to provide the threading level "MPI_THREAD_SERIALIZED" or "MPI_THREAD_MULTIPLE". This is done for safety and might not be necessary for all cases of calling ELPA.
Todo:
- make a list of all MPI calls (also from subroutines) which are called from within OpenMP parallel regions
- for each call, check whether it can be guaranteed which thread (master or any) initiates the communication and which thread (master, the one that initiated the call, or any) can complete it
- adapt the required threading level accordingly

Assignee: Soheil Soltani

Issue #87: Check status of GPU port of generalized routines (Andreas Marek, updated 2021-08-25)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/87

Issue #88: Check status of mpi-redistribution routine (Andreas Marek, updated 2021-08-25)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/88

Issue #47: check trans_ev_band_to_full row_group (Andreas Marek, updated 2017-09-05)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/47

Issue #48: cleanup of size_of_PRECISION_real/complex (Andreas Marek, updated 2017-05-21)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/48

Issue #49: cleanup of THIS_REAL/COMPLEX_KERNEL (Andreas Marek, updated 2017-05-21)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/49

Issue #41: Cleanup of trans_ev_tridi_to_band (Andreas Marek, updated 2017-05-21)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/41
The real/complex cases have been unified in one file. Some cleanup is still necessary.

Issue #38: C macros for ELPA1 auxiliary (Andreas Marek, updated 2023-06-30)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/38
Assignee: Pavel Kus

Issue #37: C macros in ELPA2 complex case missing (Andreas Marek, updated 2017-05-21)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/37
Assignee: Pavel Kus

Issue #42: Code divergence of real/complex CPU part in trans_ev_band_to_full (Andreas Marek, updated 2017-05-21)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/42
The real/complex CPU parts of trans_ev_band_to_full have diverged:
The real code path uses blocking, the complex part does not!
To do:
- make blocking in the real part an OPTION, i.e. falling back to the same code path as the complex version should be possible
- implement blocking for the complex part as well, likewise as an OPTION
In this way, we retain better tuning options for different architectures.

Issue #13: Complex generic kernel produces wrong results (Andreas Marek, updated 2018-02-05)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/13
In some cases the error residual is wrong.