# ELPA issues
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues · Last updated: 2024-02-22

## pkg-config file does not propagate dependencies correctly
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/111 · Author: Henri Menke · Updated: 2024-02-22

I have ELPA built with [Spack](https://spack.io/) and a Spack-provided CUDA. The ELPA shared libraries are correctly linked against that version of CUDA (in this case installed in `/home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/cuda-11.4.4-32q6h4aw7nfc6bm24hhlex3xbjho2zbs/`).
```console
$ ldd /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/elpa-2021.11.001-7z75o35wbzejlarg2f3rdh2ba4ct6vk3/lib/libelpa_openmp.so
linux-vdso.so.1 (0x00007fff51b4c000)
libcudart.so.11.0 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/cuda-11.4.4-32q6h4aw7nfc6bm24hhlex3xbjho2zbs/lib64/libcudart.so.11.0 (0x00007fe3ad200000)
libcublas.so.11 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/cuda-11.4.4-32q6h4aw7nfc6bm24hhlex3xbjho2zbs/lib64/libcublas.so.11 (0x00007fe3a3e00000)
libscalapack.so => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/netlib-scalapack-2.1.0-n455vez6w6zrmmsaa6blizkd2dhpjp4g/lib/libscalapack.so (0x00007fe3a3866000)
libopenblas.so.0 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/openblas-0.3.20-urulap4bpumsqt4witgc7zjrwygpvfd7/lib/libopenblas.so.0 (0x00007fe3a2b2a000)
libmpi_usempif08.so.40 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/openmpi-4.1.4-fzlupbc53f4c7rv56ksg2ijp7cmjk3p2/lib/libmpi_usempif08.so.40 (0x00007fe3ad5e9000)
libmpi_usempi_ignore_tkr.so.40 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/openmpi-4.1.4-fzlupbc53f4c7rv56ksg2ijp7cmjk3p2/lib/libmpi_usempi_ignore_tkr.so.40 (0x00007fe3ad5d8000)
libmpi_mpifh.so.40 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/openmpi-4.1.4-fzlupbc53f4c7rv56ksg2ijp7cmjk3p2/lib/libmpi_mpifh.so.40 (0x00007fe3ad568000)
libmpi.so.40 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/openmpi-4.1.4-fzlupbc53f4c7rv56ksg2ijp7cmjk3p2/lib/libmpi.so.40 (0x00007fe3a27fa000)
libgfortran.so.5 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-x86_64_v2/gcc-10.4.0/gcc-11.3.0-vbm62s5zafmejv43sy2jxrk7kjsdvy5c/lib64/libgfortran.so.5 (0x00007fe3a254f000)
libm.so.6 => /lib64/libm.so.6 (0x00007fe3a2403000)
libgomp.so.1 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-x86_64_v2/gcc-10.4.0/gcc-11.3.0-vbm62s5zafmejv43sy2jxrk7kjsdvy5c/lib64/libgomp.so.1 (0x00007fe3ad4f0000)
libquadmath.so.0 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-x86_64_v2/gcc-10.4.0/gcc-11.3.0-vbm62s5zafmejv43sy2jxrk7kjsdvy5c/lib64/libquadmath.so.0 (0x00007fe3ad4a8000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe3ad1dc000)
libc.so.6 => /lib64/libc.so.6 (0x00007fe3a220c000)
libgcc_s.so.1 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-x86_64_v2/gcc-10.4.0/gcc-11.3.0-vbm62s5zafmejv43sy2jxrk7kjsdvy5c/lib64/libgcc_s.so.1 (0x00007fe3ad1c3000)
/lib64/ld-linux-x86-64.so.2 (0x00007fe3ad95f000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fe3ad1be000)
librt.so.1 => /lib64/librt.so.1 (0x00007fe3ad1b4000)
libcublasLt.so.11 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/cuda-11.4.4-32q6h4aw7nfc6bm24hhlex3xbjho2zbs/lib64/libcublasLt.so.11 (0x00007fe38e800000)
libopen-rte.so.40 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/openmpi-4.1.4-fzlupbc53f4c7rv56ksg2ijp7cmjk3p2/lib/libopen-rte.so.40 (0x00007fe38e6d6000)
libopen-pal.so.40 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/openmpi-4.1.4-fzlupbc53f4c7rv56ksg2ijp7cmjk3p2/lib/libopen-pal.so.40 (0x00007fe38e5bd000)
libucp.so.0 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/ucx-1.13.1-blorefcxmyrtaw6bohmqni2jfopg7qgz/lib/libucp.so.0 (0x00007fe38e501000)
libuct.so.0 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/ucx-1.13.1-blorefcxmyrtaw6bohmqni2jfopg7qgz/lib/libuct.so.0 (0x00007fe38e4c7000)
libucm.so.0 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/ucx-1.13.1-blorefcxmyrtaw6bohmqni2jfopg7qgz/lib/libucm.so.0 (0x00007fe38e4aa000)
libucs.so.0 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/ucx-1.13.1-blorefcxmyrtaw6bohmqni2jfopg7qgz/lib/libucs.so.0 (0x00007fe38e44a000)
libpmix.so.2 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/pmix-4.1.2-5rfcuayjujkz34vzv6xrxb62oim2nq6l/lib/libpmix.so.2 (0x00007fe38e253000)
libutil.so.1 => /lib64/libutil.so.1 (0x00007fe3a2206000)
libz.so.1 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/zlib-1.2.13-strlfw5tgsnsidsylwj62qn2d2bjcju2/lib/libz.so.1 (0x00007fe38e23b000)
libhwloc.so.15 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/hwloc-2.8.0-dqe5wh7xoko5nk4d4fqagb2ojlozz4yu/lib/libhwloc.so.15 (0x00007fe38e1dc000)
libevent_core-2.1.so.7 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/libevent-2.1.12-ayltwo3wtjbzrwes3xuceqxin4i3awea/lib/libevent_core-2.1.so.7 (0x00007fe38e1a6000)
libevent_pthreads-2.1.so.7 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/libevent-2.1.12-ayltwo3wtjbzrwes3xuceqxin4i3awea/lib/libevent_pthreads-2.1.so.7 (0x00007fe3a2202000)
libnuma.so.1 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/numactl-2.0.14-hurh7orzyk3j53trmysdw2mhbhudfbmc/lib/libnuma.so.1 (0x00007fe38e199000)
libpciaccess.so.0 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/libpciaccess-0.16-vkvruq6waqd5ykaausuasxstprppewvz/lib/libpciaccess.so.0 (0x00007fe38e18d000)
libxml2.so.2 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/libxml2-2.10.1-67i7fqm3rcif6pkdga6xesjd64gqgbyx/lib/libxml2.so.2 (0x00007fe38e028000)
libatomic.so.1 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-x86_64_v2/gcc-10.4.0/gcc-11.3.0-vbm62s5zafmejv43sy2jxrk7kjsdvy5c/lib64/libatomic.so.1 (0x00007fe38e01e000)
liblzma.so.5 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/xz-5.2.7-zynnvn6zxkjith7y7fk26w6m7ynndjvr/lib/liblzma.so.5 (0x00007fe38dff6000)
libiconv.so.2 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/libiconv-1.16-fh2cpeogok6yjhafcn4ccpbbc7kcad3q/lib/libiconv.so.2 (0x00007fe38def9000)
```
However, the pkg-config file does not reflect this custom library path and only contains `-lcudart -lcublas`. Any downstream consumer will now fail to link against ELPA because the transitive dependency on CUDA cannot be resolved.
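For comparison, a `Libs:` line that would let a downstream consumer resolve the CUDA libraries needs an explicit `-L` entry for every non-default libdir. A hypothetical corrected fragment (the paths below are placeholders, not the actual Spack prefixes):

```
Libs: -L${libdir} -lelpa_openmp -L/path/to/cuda/lib64 -lcudart -lcublas -L/path/to/scalapack/lib -lscalapack -L/path/to/openblas/lib -lopenblas -fopenmp
```

Alternatively, purely internal dependencies could be moved to `Libs.private` (which `pkg-config --static` exposes), but the `-L` search paths would still be required there.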
```console
$ cat /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/elpa-2021.11.001-7z75o35wbzejlarg2f3rdh2ba4ct6vk3/lib/pkgconfig/elpa_openmp.pc
prefix=/home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/elpa-2021.11.001-7z75o35wbzejlarg2f3rdh2ba4ct6vk3
exec_prefix=${prefix}
libdir=${exec_prefix}/lib
includedir=${prefix}/include
Name: elpa_openmp
Description: ELPA is a Fortran-based high-performance computational library for the (massively) parallel solution of symmetric or Hermitian, standard or generalized eigenvalue problems.
Version: 2021.11.001
URL:
Libs: -L${libdir} -lelpa_openmp -lcudart -lcublas -lscalapack -lopenblas -lopenblas /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/netlib-scalapack-2.1.0-n455vez6w6zrmmsaa6blizkd2dhpjp4g/lib/libscalapack.so -fopenmp
Cflags: -I${includedir}/elpa_openmp-2021.11.001 -fopenmp
fcflags= -I${includedir}/elpa_openmp-2021.11.001/modules -fopenmp
```
Assignee: Petr Karpov · Due: 2024-03-24

## Bad performance of ELPA_2STAGE_REAL_NVIDIA_SM80_GPU kernel
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/112 · Author: Petr Karpov · Updated: 2024-02-22

For ELPA 2023.11.001, the ELPA_2STAGE_REAL_NVIDIA_SM80_GPU kernel gives much worse performance than ELPA_2STAGE_REAL_NVIDIA_GPU.
Reproducer:
[config.log](/uploads/034f72c0f3c6f4e9c0974ed58ae23668/config.log) with modules:
```
module load autoconf/2.71 cuda/11.4 gcc/11 openmpi/4 mkl/2022.1 nccl/2.11.4
```
The problem comes from tridi_to_band. Here are the timings for 4 GPUs (1 Raven node) and N=10k, 30k, and 40k matrices:
| N | ELPA_2STAGE_REAL_NVIDIA_GPU | ELPA_2STAGE_REAL_NVIDIA_SM80_GPU |
|-----|--------------|--------------|
| 10k | 2.643782 s | 10.700442 s |
| 30k | 45.905885 s | 243.225826 s |
| 40k | 101.433715 s | 565.902228 s |
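For scale, the SM80 kernel's slowdown grows with the matrix size; the ratios can be recomputed directly from the tridi_to_band timings reported above:

```shell
# Slowdown of the SM80 kernel relative to the generic NVIDIA kernel,
# computed from the tridi_to_band timings reported in this issue.
for pair in "10k 2.643782 10.700442" "30k 45.905885 243.225826" "40k 101.433715 565.902228"; do
  set -- $pair
  awk -v n="$1" -v a="$2" -v b="$3" 'BEGIN { printf "%s: %.1fx slower\n", n, b/a }'
done
# → 10k: 4.0x slower
# → 30k: 5.3x slower
# → 40k: 5.6x slower
```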
Here are the run logs for the 40k matrix:
[slurm-9329866_40k_ELPA_2STAGE_REAL_NVIDIA_GPU.out](/uploads/ab2008646022d10507396235341de56b/slurm-9329866_40k_ELPA_2STAGE_REAL_NVIDIA_GPU.out)
[slurm-9329868_40k_ELPA_2STAGE_REAL_NVIDIA_SM80_GPU.out](/uploads/c64789bd736906b7193e057ec71ad448/slurm-9329868_40k_ELPA_2STAGE_REAL_NVIDIA_SM80_GPU.out)

## Modify elpa_transpose_vectors_ NCCL such that it can be used in cholesky
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/109 · Author: Andreas Marek · Updated: 2024-03-06 · Assignee: Petr Karpov

The Cholesky decomposition step could be further sped up on GPUs if one could use the NCCL elpa_transpose_vectors_ from ELPA 1stage full_to_tridi there.

## UCX warnings for GPU complex_double tests with OpenMPI
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/108 · Author: Petr Karpov · Updated: 2023-12-15

Reproducer:
```
module purge
module load cuda/11.4 gcc/11 openmpi/4 mkl/2022.1 nccl/2.11.4
export OMPI_MCA_coll=^hcoll
../configure --prefix=$HOME/soft/elpa_mpi_00 --enable-option-checking=fatal CC=mpicc FC=mpif90 CXX=mpicxx CFLAGS="-O3 -g -march=skylake-avx512 -I$MKL_HOME/include/intel64/lp64 -I$CUDA_HOME/include" CXXFLAGS="-std=c++17 -O3 -march=skylake-avx512 -I$MKL_HOME/include/intel64/lp64 -I$CUDA_HOME/include" FCFLAGS="-O3 -g -march=skylake-avx512 -I$MKL_HOME/include/intel64/lp64 -I$CUDA_HOME/include" LDFLAGS="-L$MKL_HOME/lib/intel64 -lmkl_scalapack_lp64 -lmkl_gf_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_openmpi_lp64 -lpthread -Wl,-rpath,$MKL_HOME/lib/intel64" --with-mpi=yes --enable-assumed-size --enable-band-to-full-blocking --enable-nvidia-gpu --with-NVIDIA-GPU-compute-capability=sm_70 -with-cuda-path=$CUDA_HOME --enable-avx512 --enable-cpp-tests=no --enable-single-precision --enable-nvtx
```
The warnings like
```
[1702644215.666499] [ravg1002:132812:0] mpool.c:55 UCX WARN object 0xcf82c0 {{cpml|cb|snd_tag|rk_use} send length 41943040 ucp_proto_progress_tag_rndv_rts() comp:mca_pml_ucx_send_nbx_completion()host me was not returned to mpool ucp_requests
```
appear for complex_double tests, e.g. `validate_complex_double_eigenvectors_1stage_gpu_random`, but not for real_double ones.
Assignee: Petr Karpov

## Non-GPU versions of HIP tests fail without GPUs
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/107 · Author: Petr Karpov · Updated: 2023-11-14

Example of a configure line on Raven:
```
module purge
module load cuda/11.4 gcc/11 openmpi/4 mkl/2022.1
export PATH=/u/pekarp/soft/HIP/rocm-5.1.x/bin:${PATH}
export LD_LIBRARY_PATH=/u/pekarp/soft/HIP/rocm-5.1.x/lib:$LD_LIBRARY_PATH
export CUDA_PATH=$CUDA_HOME # needed for HIP internally!
../configure --prefix=$HOME/soft/elpa_mpi_00 --enable-option-checking=fatal CC=mpicc FC=mpif90 CPP="gcc -E" CXX="hipcc" \
CFLAGS="-O0 -g -I$MKL_HOME/include/intel64/lp64 -I$CUDA_HOME/include -march=skylake-avx512" \
CXXFLAGS="-std=c++17 -O0 -g -I$MKL_HOME/include/intel64/lp64 -I$CUDA_HOME/include -I$HOME/soft/HIP/rocm-5.1.x/include -I$HOME/soft/HIP/rocm-5.1.x/hipblas/include " \
FCFLAGS="-O0 -g -I$MKL_HOME/include/intel64/lp64 -I$CUDA_HOME/include -march=skylake-avx512" \
SCALAPACK_FCFLAGS="-I$MKL_HOME/include/intel64/lp64 -I$CUDA_HOME/include" \
SCALAPACK_LDFLAGS="-L$MKL_HOME/lib/intel64 -lmkl_scalapack_lp64 -lmkl_gf_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_openmpi_lp64 -lpthread -Wl,-rpath,$MKL_HOME/lib/intel64" \
LIBS="-L$HOME/soft/HIP/rocm-5.1.x/lib -lhipblas -L/mpcdf/soft/SLE_15/packages/x86_64/cuda/11.4.2/lib64/ -lcublas -L$HOME/soft/HIP/rocm-5.1.x/lib -lhipblas -L/mpcdf/soft/SLE_15/packages/x86_64/cuda/11.4.2/lib64/ -lcudart" \
--with-mpi=yes --enable-cpp-tests=no --enable-single-precision --enable-nvtx \
--enable-amd-gpu --enable-marshalling-hipblas-library --enable-mpi-launcher=srun --disable-detect-mpi-launcher --disable-cpp-tests
```
Example of failing test:
```
validate_real_double_eigenvectors_1stage_random
```

## Add and RECOMMEND setup_gpu() to documentation
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/106 · Author: Andreas Marek · Updated: 2023-10-25 · Assignee: Petr Karpov

## SYCL kernels for multiply missing
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/105 · Author: Andreas Marek · Updated: 2023-10-25 · Assignee: Petr Karpov

## Get rid of internal memcpy in hermitian_multiply in case of device pointers
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/104 · Author: Andreas Marek · Updated: 2023-10-27 · Assignee: Petr Karpov

In hermitian_multiply, when the function is called with device pointers (data already on the GPU), the data is internally copied from one memory allocation to another. This copy should be removed; multiply should work directly on the provided memory space (maybe via a transfer statement).

## Intermittent bug in CI test validate_multiple_objs_...
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/103 · Author: Petr Karpov · Updated: 2023-10-20

Here is an example of the bug:
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/jobs/2244241
Most likely it is caused by a race condition when writing the file with the state of the ELPA object when the tests are too quick.

## No tests with random matrix and nvhpcskd
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/96 · Author: Andreas Marek · Updated: 2022-04-05

The PGI compiler creates errors for the random seed, see
https://github.com/flang-compiler/flang/issues/691

## Setting of GPU kernel depends on order of set calls
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/94 · Author: Andreas Marek · Updated: 2022-02-03

When setting
first set("solver", 2stage) and then
set("kernel", GPU_KERNEL),
the CPU kernel is used (the default kernel seems to be set).
In the other order it works correctly.

## External sanity checker for ELPA settings / input
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/92 · Author: Andreas Marek · Updated: 2021-08-25

## 1 MPI rank per GPU, rest with OpenMP threads
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/91 · Author: Andreas Marek · Updated: 2022-02-09

## Interfaces for device-memory arrays
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/90 · Author: Andreas Marek · Updated: 2021-09-02

## GPU version: limit restriction to power-of-2 block sizes
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/89 · Author: Andreas Marek · Updated: 2021-08-25

## Check status of MPI-redistribution routine
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/88 · Author: Andreas Marek · Updated: 2021-08-25

## Check status of GPU port of generalized routines
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/87 · Author: Andreas Marek · Updated: 2021-08-25

## GPU improvements in tridi_to_band for non-MPI
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/85 · Author: Andreas Marek · Updated: 2021-07-09

A lot of memory transfers to/from the device could be avoided, similar to the CUDA_AWARE_MPI case.

## Toeplitz test cases hang for really small matrices (na=4)
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/81 · Author: Andreas Marek · Updated: 2021-05-06

If you use 4 MPI tasks for a setup of na=4 nev=4 nblk=1, the test cases for Toeplitz matrices hang.
The test cases for other matrix setups do work, however.
It seems that the code hangs in the "solve" step.

## Print in the test programs the number of GPUs used
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/79 · Author: Andreas Marek · Updated: 2021-04-15

If ELPA is built with GPU support, print at the startup of the test programs the number of GPUs used.