elpa issues: https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/112
Bad performance of ELPA_2STAGE_REAL_NVIDIA_SM80_GPU kernel
Petr Karpov, 2024-02-22T16:24:44Z

For ELPA 2023.11.001, the ELPA_2STAGE_REAL_NVIDIA_SM80_GPU kernel gives much worse performance than ELPA_2STAGE_REAL_NVIDIA_GPU.

Reproducer:
[config.log](/uploads/034f72c0f3c6f4e9c0974ed58ae23668/config.log) with modules:
```
module load autoconf/2.71 cuda/11.4 gcc/11 openmpi/4 mkl/2022.1 nccl/2.11.4
```
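For context, the kernel under test is selected at run time through the ELPA set interface. Below is a minimal sketch (C interface; handle creation, matrix descriptors and elpa_setup() are omitted, and the option key names are assumed to match the ELPA 2023.11 headers) of how the two kernels are switched between runs:

```c
#include <elpa/elpa.h>

/* Sketch: pick the 2-stage solver and one of the two real NVIDIA GPU kernels.
   Assumes 'handle' was already created with elpa_allocate() and configured
   with the usual matrix/communicator settings plus elpa_setup(). */
static void select_kernel(elpa_t handle, int use_sm80)
{
  int error;

  elpa_set(handle, "nvidia-gpu", 1, &error);              /* enable the NVIDIA GPU path */
  elpa_set(handle, "solver", ELPA_SOLVER_2STAGE, &error); /* 2-stage solver */
  elpa_set(handle, "real_kernel",
           use_sm80 ? ELPA_2STAGE_REAL_NVIDIA_SM80_GPU    /* Ampere-specific kernel */
                    : ELPA_2STAGE_REAL_NVIDIA_GPU,        /* generic NVIDIA kernel */
           &error);
}
```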
The problem comes from tridi_to_band. Here are the tridi_to_band timings for 4 GPUs (1 Raven node) and matrix sizes N = 10k, 30k, and 40k:

| N   | ELPA_2STAGE_REAL_NVIDIA_GPU | ELPA_2STAGE_REAL_NVIDIA_SM80_GPU |
|-----|-----------------------------|----------------------------------|
| 10k | 2.643782 s                  | 10.700442 s                      |
| 30k | 45.905885 s                 | 243.225826 s                     |
| 40k | 101.433715 s                | 565.902228 s                     |
Here are the run logs for the 40k matrix:
[slurm-9329866_40k_ELPA_2STAGE_REAL_NVIDIA_GPU.out](/uploads/ab2008646022d10507396235341de56b/slurm-9329866_40k_ELPA_2STAGE_REAL_NVIDIA_GPU.out)
[slurm-9329868_40k_ELPA_2STAGE_REAL_NVIDIA_SM80_GPU.out](/uploads/c64789bd736906b7193e057ec71ad448/slurm-9329868_40k_ELPA_2STAGE_REAL_NVIDIA_SM80_GPU.out)

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/111
pkg-config file does not propagate dependencies correctly
Henri Menke, 2024-02-22T16:08:32Z

I have ELPA built with [Spack](https://spack.io/) and a Spack-provided CUDA. The ELPA shared libraries are correctly linked against that version of CUDA (in this case installed in `/home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/cuda-11.4.4-32q6h4aw7nfc6bm24hhlex3xbjho2zbs/`).
```console
$ ldd /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/elpa-2021.11.001-7z75o35wbzejlarg2f3rdh2ba4ct6vk3/lib/libelpa_openmp.so
linux-vdso.so.1 (0x00007fff51b4c000)
libcudart.so.11.0 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/cuda-11.4.4-32q6h4aw7nfc6bm24hhlex3xbjho2zbs/lib64/libcudart.so.11.0 (0x00007fe3ad200000)
libcublas.so.11 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/cuda-11.4.4-32q6h4aw7nfc6bm24hhlex3xbjho2zbs/lib64/libcublas.so.11 (0x00007fe3a3e00000)
libscalapack.so => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/netlib-scalapack-2.1.0-n455vez6w6zrmmsaa6blizkd2dhpjp4g/lib/libscalapack.so (0x00007fe3a3866000)
libopenblas.so.0 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/openblas-0.3.20-urulap4bpumsqt4witgc7zjrwygpvfd7/lib/libopenblas.so.0 (0x00007fe3a2b2a000)
libmpi_usempif08.so.40 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/openmpi-4.1.4-fzlupbc53f4c7rv56ksg2ijp7cmjk3p2/lib/libmpi_usempif08.so.40 (0x00007fe3ad5e9000)
libmpi_usempi_ignore_tkr.so.40 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/openmpi-4.1.4-fzlupbc53f4c7rv56ksg2ijp7cmjk3p2/lib/libmpi_usempi_ignore_tkr.so.40 (0x00007fe3ad5d8000)
libmpi_mpifh.so.40 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/openmpi-4.1.4-fzlupbc53f4c7rv56ksg2ijp7cmjk3p2/lib/libmpi_mpifh.so.40 (0x00007fe3ad568000)
libmpi.so.40 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/openmpi-4.1.4-fzlupbc53f4c7rv56ksg2ijp7cmjk3p2/lib/libmpi.so.40 (0x00007fe3a27fa000)
libgfortran.so.5 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-x86_64_v2/gcc-10.4.0/gcc-11.3.0-vbm62s5zafmejv43sy2jxrk7kjsdvy5c/lib64/libgfortran.so.5 (0x00007fe3a254f000)
libm.so.6 => /lib64/libm.so.6 (0x00007fe3a2403000)
libgomp.so.1 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-x86_64_v2/gcc-10.4.0/gcc-11.3.0-vbm62s5zafmejv43sy2jxrk7kjsdvy5c/lib64/libgomp.so.1 (0x00007fe3ad4f0000)
libquadmath.so.0 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-x86_64_v2/gcc-10.4.0/gcc-11.3.0-vbm62s5zafmejv43sy2jxrk7kjsdvy5c/lib64/libquadmath.so.0 (0x00007fe3ad4a8000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe3ad1dc000)
libc.so.6 => /lib64/libc.so.6 (0x00007fe3a220c000)
libgcc_s.so.1 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-x86_64_v2/gcc-10.4.0/gcc-11.3.0-vbm62s5zafmejv43sy2jxrk7kjsdvy5c/lib64/libgcc_s.so.1 (0x00007fe3ad1c3000)
/lib64/ld-linux-x86-64.so.2 (0x00007fe3ad95f000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fe3ad1be000)
librt.so.1 => /lib64/librt.so.1 (0x00007fe3ad1b4000)
libcublasLt.so.11 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/cuda-11.4.4-32q6h4aw7nfc6bm24hhlex3xbjho2zbs/lib64/libcublasLt.so.11 (0x00007fe38e800000)
libopen-rte.so.40 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/openmpi-4.1.4-fzlupbc53f4c7rv56ksg2ijp7cmjk3p2/lib/libopen-rte.so.40 (0x00007fe38e6d6000)
libopen-pal.so.40 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/openmpi-4.1.4-fzlupbc53f4c7rv56ksg2ijp7cmjk3p2/lib/libopen-pal.so.40 (0x00007fe38e5bd000)
libucp.so.0 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/ucx-1.13.1-blorefcxmyrtaw6bohmqni2jfopg7qgz/lib/libucp.so.0 (0x00007fe38e501000)
libuct.so.0 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/ucx-1.13.1-blorefcxmyrtaw6bohmqni2jfopg7qgz/lib/libuct.so.0 (0x00007fe38e4c7000)
libucm.so.0 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/ucx-1.13.1-blorefcxmyrtaw6bohmqni2jfopg7qgz/lib/libucm.so.0 (0x00007fe38e4aa000)
libucs.so.0 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/ucx-1.13.1-blorefcxmyrtaw6bohmqni2jfopg7qgz/lib/libucs.so.0 (0x00007fe38e44a000)
libpmix.so.2 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/pmix-4.1.2-5rfcuayjujkz34vzv6xrxb62oim2nq6l/lib/libpmix.so.2 (0x00007fe38e253000)
libutil.so.1 => /lib64/libutil.so.1 (0x00007fe3a2206000)
libz.so.1 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/zlib-1.2.13-strlfw5tgsnsidsylwj62qn2d2bjcju2/lib/libz.so.1 (0x00007fe38e23b000)
libhwloc.so.15 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/hwloc-2.8.0-dqe5wh7xoko5nk4d4fqagb2ojlozz4yu/lib/libhwloc.so.15 (0x00007fe38e1dc000)
libevent_core-2.1.so.7 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/libevent-2.1.12-ayltwo3wtjbzrwes3xuceqxin4i3awea/lib/libevent_core-2.1.so.7 (0x00007fe38e1a6000)
libevent_pthreads-2.1.so.7 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/libevent-2.1.12-ayltwo3wtjbzrwes3xuceqxin4i3awea/lib/libevent_pthreads-2.1.so.7 (0x00007fe3a2202000)
libnuma.so.1 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/numactl-2.0.14-hurh7orzyk3j53trmysdw2mhbhudfbmc/lib/libnuma.so.1 (0x00007fe38e199000)
libpciaccess.so.0 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/libpciaccess-0.16-vkvruq6waqd5ykaausuasxstprppewvz/lib/libpciaccess.so.0 (0x00007fe38e18d000)
libxml2.so.2 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/libxml2-2.10.1-67i7fqm3rcif6pkdga6xesjd64gqgbyx/lib/libxml2.so.2 (0x00007fe38e028000)
libatomic.so.1 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-x86_64_v2/gcc-10.4.0/gcc-11.3.0-vbm62s5zafmejv43sy2jxrk7kjsdvy5c/lib64/libatomic.so.1 (0x00007fe38e01e000)
liblzma.so.5 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/xz-5.2.7-zynnvn6zxkjith7y7fk26w6m7ynndjvr/lib/liblzma.so.5 (0x00007fe38dff6000)
libiconv.so.2 => /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/libiconv-1.16-fh2cpeogok6yjhafcn4ccpbbc7kcad3q/lib/libiconv.so.2 (0x00007fe38def9000)
```
However, the pkg-config file does not reflect this custom library path and only contains `-lcudart -lcublas`. Any downstream consumer will now fail to link against ELPA because the transitive dependency on CUDA cannot be resolved.
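To illustrate the failure mode, here is a hypothetical minimal consumer (not part of the original report) that links purely via pkg-config; because the `Libs:` line of the `.pc` file shown below carries `-lcudart -lcublas` without a matching `-L` path for the Spack CUDA prefix, the link step cannot resolve those libraries:

```c
/* test_elpa_link.c (hypothetical): smallest possible pkg-config consumer.
 *
 * Build sketch:
 *   mpicc test_elpa_link.c $(pkg-config --cflags --libs elpa_openmp)
 *
 * With the elpa_openmp.pc shown below, the link is expected to fail with
 * errors such as "cannot find -lcudart", because pkg-config emits the CUDA
 * libraries without any -L pointing at the non-standard Spack prefix. */
#include <elpa/elpa.h>

int main(void)
{
  /* Referencing any ELPA symbol is enough to force linking against libelpa. */
  return (elpa_init(20171201) == ELPA_OK) ? 0 : 1;
}
```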
```console
$ cat /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/elpa-2021.11.001-7z75o35wbzejlarg2f3rdh2ba4ct6vk3/lib/pkgconfig/elpa_openmp.pc
prefix=/home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/elpa-2021.11.001-7z75o35wbzejlarg2f3rdh2ba4ct6vk3
exec_prefix=${prefix}
libdir=${exec_prefix}/lib
includedir=${prefix}/include
Name: elpa_openmp
Description: ELPA is a Fortran-based high-performance computational library for the (massively) parallel solution of symmetric or Hermitian, standard or generalized eigenvalue problems.
Version: 2021.11.001
URL:
Libs: -L${libdir} -lelpa_openmp -lcudart -lcublas -lscalapack -lopenblas -lopenblas /home/menke/Code/octopus/mpsd-software/23b/cascadelake/spack/opt/spack/linux-opensuse_leap15-cascadelake/gcc-11.3.0/netlib-scalapack-2.1.0-n455vez6w6zrmmsaa6blizkd2dhpjp4g/lib/libscalapack.so -fopenmp
Cflags: -I${includedir}/elpa_openmp-2021.11.001 -fopenmp
fcflags= -I${includedir}/elpa_openmp-2021.11.001/modules -fopenmp
```
Assignee: Petr Karpov (2024-03-24)

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/109
Modify elpa_transpose_vectors_ NCCL such that it can be used in cholesky
Andreas Marek, 2024-03-06T06:09:54Z

The Cholesky decomposition step could be further sped up on GPUs if one could use the NCCL version of elpa_transpose_vectors_ from ELPA 1-stage full_to_tridi there.

Assignee: Petr Karpov

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/108
UCX warnings for GPU complex_double tests with OpenMPI
Petr Karpov, 2023-12-15T13:01:02Z

Reproducer:
```
module purge
module load cuda/11.4 gcc/11 openmpi/4 mkl/2022.1 nccl/2.11.4
export OMPI_MCA_coll=^hcoll
../configure --prefix=$HOME/soft/elpa_mpi_00 --enable-option-checking=fatal CC=mpicc FC=mpif90 CXX=mpicxx CFLAGS="-O3 -g -march=skylake-avx512 -I$MKL_HOME/include/intel64/lp64 -I$CUDA_HOME/include" CXXFLAGS="-std=c++17 -O3 -march=skylake-avx512 -I$MKL_HOME/include/intel64/lp64 -I$CUDA_HOME/include" FCFLAGS="-O3 -g -march=skylake-avx512 -I$MKL_HOME/include/intel64/lp64 -I$CUDA_HOME/include" LDFLAGS="-L$MKL_HOME/lib/intel64 -lmkl_scalapack_lp64 -lmkl_gf_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_openmpi_lp64 -lpthread -Wl,-rpath,$MKL_HOME/lib/intel64" --with-mpi=yes --enable-assumed-size --enable-band-to-full-blocking --enable-nvidia-gpu --with-NVIDIA-GPU-compute-capability=sm_70 -with-cuda-path=$CUDA_HOME --enable-avx512 --enable-cpp-tests=no --enable-single-precision --enable-nvtx
```
The warnings like
```
[1702644215.666499] [ravg1002:132812:0] mpool.c:55 UCX WARN object 0xcf82c0 {{cpml|cb|snd_tag|rk_use} send length 41943040 ucp_proto_progress_tag_rndv_rts() comp:mca_pml_ucx_send_nbx_completion()host me was not returned to mpool ucp_requests
```
appear for complex_double tests, e.g. `validate_complex_double_eigenvectors_1stage_gpu_random`, but not for real_double tests.

Assignee: Petr Karpov

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/107
Non-GPU version of HIP tests is failing without GPUs
Petr Karpov, 2023-11-14T09:35:43Z

Example of the configure line on Raven:
```
module purge
module load cuda/11.4 gcc/11 openmpi/4 mkl/2022.1
export PATH=/u/pekarp/soft/HIP/rocm-5.1.x/bin:${PATH}
export LD_LIBRARY_PATH=/u/pekarp/soft/HIP/rocm-5.1.x/lib:$LD_LIBRARY_PATH
export CUDA_PATH=$CUDA_HOME # needed for HIP internally!
../configure --prefix=$HOME/soft/elpa_mpi_00 --enable-option-checking=fatal CC=mpicc FC=mpif90 CPP="gcc -E" CXX="hipcc" \
CFLAGS="-O0 -g -I$MKL_HOME/include/intel64/lp64 -I$CUDA_HOME/include -march=skylake-avx512" \
CXXFLAGS="-std=c++17 -O0 -g -I$MKL_HOME/include/intel64/lp64 -I$CUDA_HOME/include -I$HOME/soft/HIP/rocm-5.1.x/include -I$HOME/soft/HIP/rocm-5.1.x/hipblas/include " \
FCFLAGS="-O0 -g -I$MKL_HOME/include/intel64/lp64 -I$CUDA_HOME/include -march=skylake-avx512" \
SCALAPACK_FCFLAGS="-I$MKL_HOME/include/intel64/lp64 -I$CUDA_HOME/include" \
SCALAPACK_LDFLAGS="-L$MKL_HOME/lib/intel64 -lmkl_scalapack_lp64 -lmkl_gf_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_openmpi_lp64 -lpthread -Wl,-rpath,$MKL_HOME/lib/intel64" \
LIBS="-L$HOME/soft/HIP/rocm-5.1.x/lib -lhipblas -L/mpcdf/soft/SLE_15/packages/x86_64/cuda/11.4.2/lib64/ -lcublas -L$HOME/soft/HIP/rocm-5.1.x/lib -lhipblas -L/mpcdf/soft/SLE_15/packages/x86_64/cuda/11.4.2/lib64/ -lcudart" \
--with-mpi=yes --enable-cpp-tests=no --enable-single-precision --enable-nvtx \
--enable-amd-gpu --enable-marshalling-hipblas-library --enable-mpi-launcher=srun --disable-detect-mpi-launcher --disable-cpp-tests
```
Example of failing test:
```
validate_real_double_eigenvectors_1stage_random
```

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/106
Add and RECOMMEND setup_gpu() to documentation
Andreas Marek, 2023-10-25T06:04:33Z
Assignee: Petr Karpov

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/105
SYCL kernels for multiply missing
Andreas Marek, 2023-10-25T06:03:39Z
Assignee: Petr Karpov

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/104
Get rid of internal memcpy in hermitian_multiply in case of device pointers
Andreas Marek, 2023-10-27T06:16:41Z

In hermitian_multiply, when the function is called with device pointers (the data is already on the GPU), the data is
internally copied from one memory allocation to another. This should be removed, and multiply should
just work on the provided memory space (maybe via a transfer statement).
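For orientation, a rough sketch of the call in question (C interface; the argument list is assumed to follow the documented elpa_hermitian_multiply generic, and the device buffers a_dev/b_dev/c_dev are hypothetical names for pointers obtained from cudaMalloc):

```c
#include <elpa/elpa.h>

/* Sketch only: hermitian_multiply with buffers that already live in GPU memory.
   The issue is that ELPA internally copies such device buffers into a second
   device allocation before operating on them; the request is to work on
   a_dev/b_dev/c_dev directly. Handle setup (na, nblk, communicators,
   elpa_setup(), ...) is omitted here. */
static void multiply_on_device(elpa_t handle,
                               double *a_dev, double *b_dev, double *c_dev,
                               int ncb, int nrows_b, int ncols_b,
                               int nrows_c, int ncols_c)
{
  int error;

  elpa_set(handle, "nvidia-gpu", 1, &error);  /* GPU path enabled */

  /* 'F' = full (non-triangular) A and C; device pointers passed directly. */
  elpa_hermitian_multiply(handle, 'F', 'F', ncb,
                          a_dev, b_dev, nrows_b, ncols_b,
                          c_dev, nrows_c, ncols_c, &error);
}
```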
Assignee: Petr Karpov

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/103
Intermittent bug in CI test validate_multiple_objs_...
Petr Karpov, 2023-10-20T10:58:09Z

Here is an example of the bug:
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/jobs/2244241
Most likely it is caused by a race condition in writing the file with the state of the ELPA object when the tests run too quickly.

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/102
Undefined reference to elpa_skew functions
Petr Karpov, 2023-12-15T21:47:03Z

There is a problem with undefined references to the elpa_skew functions when skew-symmetric support is disabled (`--disable-skew-symmetric-support`).
Here is the reproducer for raven:
module load anaconda/3/2021.11 intel/21.6.0 impi/2021.6 mkl/2022.1 gcc/11 cuda/11.4
../configure CC=mpicc FC=mpiifort CXX=mpiicpc CFLAGS="-O3 -march=skylake-avx512" FCFLAGS="-O3 -xCORE-AVX512" SCALAPACK_FCFLAGS="-I/mpcdf/soft/SLE_15/packages/x86_64/intel_oneapi/2021.3/mkl/latest/include/intel64/lp64" SCALAPACK_LDFLAGS="-L/mpcdf/soft/SLE_15/packages/x86_64/intel_oneapi/2021.3/mkl/latest/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -Wl,-rpath,/mpcdf/soft/SLE_15/packages/x86_64/intel_oneapi/2021.3/mkl/latest/lib/intel64" --disable-openmp --disable-64bit-integer-math-support --disable-64bit-integer-mpi-support --enable-mpi-module --enable-detect-mpi-launcher --enable-generic --disable-sparc64 --disable-neon-arch64 --disable-vsx --enable-sse --enable-sse-assembly --enable-avx --enable-avx2 --enable-avx512 --disable-sve128 --disable-sve256 --disable-sve512 --disable-bgp --disable-bgp --enable-assumed-size --disable-ifx-compiler --enable-Fortran2008-features --enable-option-checking=fatal --disable-heterogenous-cluster-support --enable-timings --enable-band-to-full-blocking --without-threading-support-check-during-build --disable-runtime-threading-support-checks --disable-allow-thread-limiting --disable-gpu --enable-nvidia-gpu --disable-amd-gpu --disable-intel-gpu-sycl --disable-nvidia-sm80-gpu --disable-NVIDIA-gpu-memory-debug --disable-cuda-aware-mpi --disable-gpu-streams --disable-nvtx --disable-c-tests --disable-cpp-tests --disable-skew-symmetric-support --with-mpi=yes --disable-redirect --enable-single-precision --disable-autotuning --disable-scalapack-tests --disable-autotune-redistribute-matrix --with-papi=no --with-likwid=no --disable-store-build-config --disable-python --disable-python-tests --with-cuda-path="/mpcdf/soft/SLE_15/packages/x86_64/cuda/11.4.2" --with-NVIDIA-GPU-compute-capability=sm_80 --with-cusolver
make -j 18
Here is the error message we get:
ld: ./.libs/libelpa.so: undefined reference to `elpa_skew_eigenvectors_a_h_a_f'
ld: ./.libs/libelpa.so: undefined reference to `elpa_skew_eigenvalues_d_ptr_f'
ld: ./.libs/libelpa.so: undefined reference to `elpa_skew_eigenvalues_a_h_a_d'
ld: ./.libs/libelpa.so: undefined reference to `elpa_skew_eigenvectors_d_ptr_f'
ld: ./.libs/libelpa.so: undefined reference to `elpa_skew_eigenvectors_d_ptr_d'
ld: ./.libs/libelpa.so: undefined reference to `elpa_skew_eigenvectors_a_h_a_d'
ld: ./.libs/libelpa.so: undefined reference to `elpa_skew_eigenvalues_a_h_a_f'
ld: ./.libs/libelpa.so: undefined reference to `elpa_skew_eigenvalues_d_ptr_d'
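The missing symbols are apparently the host-array (a_h_a) and device-pointer (d_ptr) variants, in double and single precision (d/f), that back the public skew-symmetric solver entry points. For orientation only, a sketch of how those entry points are reached from user code, assuming the C generic elpa_skew_eigenvectors is available analogously to elpa_eigenvectors:

```c
#include <elpa/elpa.h>

/* Sketch (assumption-laden): solve a real skew-symmetric eigenproblem.
   'a' is the local part of the distributed skew-symmetric matrix, 'ev' the
   eigenvalue output and 'q' the eigenvector output; see the ELPA docs for the
   required dimensions. Handle setup and elpa_setup() are omitted. These are
   the calls that ultimately resolve to the elpa_skew_* symbols reported as
   undefined when --disable-skew-symmetric-support is used. */
static void skew_solve(elpa_t handle, double *a, double *ev, double *q)
{
  int error;
  elpa_skew_eigenvectors(handle, a, ev, q, &error);
}
```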
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/99
ELPA 2022 release crashes for a test case in FHI-aims
Sebastian Kokott, 2023-02-08T11:29:48Z

I compiled FHI-aims with external ELPA on the MPCDF Raven cluster using the current release version:
```
/mpcdf/soft/SLE_15/packages/skylake/elpa/intel_21.6.0-2021.6.0-impi_2021.6-2021.6.0/2022.05.001-standard/lib/libelpa.so
```
First, I did some tests for smaller systems, and everything worked fine and was reproducible compared to the default version used in FHI-aims (2020).
Then, I checked for a large-scale system using 64 nodes. Here, the ELPA calls in the first and second cycles worked but crashed during the third cycle.
ELPA stopped after:
```
Updating Kohn-Sham eigenvalues and eigenvectors using ELSI and the ELPA eigensolver.
Starting ELPA eigensolver
Finished transformation to standard eigenproblem
| Time : 33.756 s
```
I'm attaching the runs with elpa2020 (success) and elpa2022 (fail).
Any idea what the source of the crash might be? Many thanks in advance!
[64_elpa_2022.tgz](/uploads/3ad66120371fbfff1e887b2c193de3d8/64_elpa_2022.tgz)
[64_elpa_2020.tgz](/uploads/73350b9a39d8da794888165a720d071f/64_elpa_2020.tgz)

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/96
No tests with random matrix and nvhpc sdk
Andreas Marek, 2022-04-05T13:47:08Z

The PGI compiler creates errors for the random seed; see
https://github.com/flang-compiler/flang/issues/691

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/94
Setting of GPU kernel depends on order of set calls
Andreas Marek, 2022-02-03T17:05:00Z

When setting
first set("solver",2stage) and then
set("kernel",GPU_KERNEL)
it uses the CPU kernel (the default kernel seems to be set)
In the other order it works correctly.
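A minimal sketch of the two orderings described above (C interface; handle setup and error checking omitted; "real_kernel" and ELPA_2STAGE_REAL_NVIDIA_GPU are used here as concrete stand-ins for the generic "kernel"/GPU_KERNEL of the report):

```c
#include <elpa/elpa.h>

/* Sketch of the two set-call orders from the report ('handle' setup omitted). */
static void set_order_demo(elpa_t handle)
{
  int error;

  /* Order 1: solver first, then kernel.
     Reported behaviour: the default (CPU) kernel ends up being used. */
  elpa_set(handle, "solver", ELPA_SOLVER_2STAGE, &error);
  elpa_set(handle, "real_kernel", ELPA_2STAGE_REAL_NVIDIA_GPU, &error);

  /* Order 2: kernel first, then solver.
     Reported behaviour: the requested GPU kernel is used, as expected. */
  elpa_set(handle, "real_kernel", ELPA_2STAGE_REAL_NVIDIA_GPU, &error);
  elpa_set(handle, "solver", ELPA_SOLVER_2STAGE, &error);
}
```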
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/93
Internal Compiler Error with Intel compiler on branch cusolver_device_ptr
Andreas Marek, 2022-05-11T07:37:24Z

In a new branch, after adding a few new API routines, the Intel compiler produces an ICE.
branch:
git checkout cusolver_device_ptr
Software used
Currently Loaded Modulefiles:
1) autoconf/2.69 2) automake/1.15 3) libtool/2.4.6 4) intel/21.3.0 5) impi/2021.3 6) mkl/2021.3
Configure line:
../configure CC=mpiicc CFLAGS="-O3 -march=skylake-avx512 -g" FC=mpiifort FCFLAGS="-O3 -g" SCALAPACK_FCFLAGS="-I/mpcdf/soft/SLE_15/packages/x86_64/intel_oneapi/2021.3/mkl/latest/include/intel64/lp64" SCALAPACK_LDFLAGS="-L/mpcdf/soft/SLE_15/packages/x86_64/intel_oneapi/2021.3/mkl/latest/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -Wl,-rpath,/mpcdf/soft/SLE_15/packages/x86_64/intel_oneapi/2021.3/mkl/latest/lib/intel64" --enable-avx512

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/92
external sanity checker for ELPA settings / input
Andreas Marek, 2021-08-25T12:07:43Z

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/91
1 MPI rank per GPU, rest with OpenMP threads
Andreas Marek, 2022-02-09T13:50:23Z

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/90
Interfaces for device-memory arrays
Andreas Marek, 2021-09-02T16:36:58Z

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/89
GPU version: limit restriction to power of 2 block sizes
Andreas Marek, 2021-08-25T12:00:31Z

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/88
Check status of mpi-redistribution routine
Andreas Marek, 2021-08-25T11:58:43Z

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/issues/87
Check status of GPU port of generalized routines
Andreas Marek, 2021-08-25T12:16:48Z