Crash of ELPA AVX kernels
As received by email:
I am experiencing stability issues with the ELPA AVX kernels. I have tested the latest stable tarball release (2015-11) as well as the current git master.
I have used the following hardware/software combinations, both of which support AVX2 instructions:
Intel i5-5200U (Haswell), gcc & gfortran 5.3, openmpi 1.10.2, netlib BLAS/LAPACK 3.6.0-4, netlib ScaLAPACK 2.0.2.-4
Cray XC40 with Intel Xeon E5-2690v3 (Haswell), gcc & gfortran 5.2 (through the Cray compiler wrappers), cray-mpich 7.3.1, cray-libsci 13.3.0
I configured ELPA with ./configure --prefix=/home/nico/lib --with-avx-optimization FCFLAGS="-O3 -march=haswell -mavx2 -mfma" CFLAGS="-O3 -march=haswell -mavx2 -mfma" CXXFLAGS="-O3 -march=haswell -mavx2 -mfma". The config.log for the first machine is attached.
I have tested the available ELPA kernels with the included test file test_real2_choose_kernel_with_api.F90, changing the kernel (REAL_ELPA_KERNEL_*) directly in the call to solve_evp_real_2stage and recompiling each time. I then run the program repeatedly, e.g. "mpiexec -n 1 ./elpa2_test_real_choose_kernel_with_api 64 32 16", with the same test matrix on every run. The generic, generic_simple and SSE kernels always complete without issue. With every AVX kernel, however, the program crashes in roughly 75% of the runs, during the tridiagonal-to-band back-transformation (tridi->band).
I have attached example output from a crash with the AVX_BLOCK_2 kernel. As far as I have been able to debug it (using strategically placed prints), the crash occurs on line 191 of elpa2_kernels_real_avx-avx2_2hv.c: __m256d x1 = _mm256_load_pd(&q[ldq]);
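For context: _mm256_load_pd is the aligned AVX load, so the address it receives must be 32-byte aligned. A misaligned address raises a general-protection fault, which Linux reports as a segmentation fault. One hypothesis that would fit an intermittent crash on a fixed input (an assumption, not something established above) is that q, or &q[ldq], is not always 32-byte aligned, for example because the alignment of the underlying allocation differs between runs. Even when q itself is aligned, &q[ldq] is aligned only if ldq is a multiple of 4 doubles. The sketch below shows how this could be checked. All of its helper names (is_avx_aligned, checked_avx_ptr, q_loads_are_aligned, alloc_q_aligned) are hypothetical and not part of ELPA.

```c
#define _POSIX_C_SOURCE 200112L /* expose posix_memalign */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Alignment (in bytes) that _mm256_load_pd requires of its address;
   the unaligned variant _mm256_loadu_pd has no such requirement. */
#define AVX_ALIGN 32

/* Hypothetical helper: nonzero if p lies on a 32-byte boundary. */
static int is_avx_aligned(const void *p)
{
    return ((uintptr_t)p % AVX_ALIGN) == 0;
}

/* Hypothetical debug guard: fail with an assertion message (unless
   NDEBUG is defined) instead of faulting inside the aligned load. */
static const double *checked_avx_ptr(const double *p)
{
    assert(is_avx_aligned(p) &&
           "address passed to _mm256_load_pd is not 32-byte aligned");
    return p;
}

/* Hypothetical helper: nonzero if both q and &q[ldq] are valid for
   _mm256_load_pd. With q aligned, &q[ldq] is aligned only when
   ldq * sizeof(double) is a multiple of 32, i.e. ldq % 4 == 0. */
static int q_loads_are_aligned(const double *q, int ldq)
{
    return is_avx_aligned(q) && is_avx_aligned(q + ldq);
}

/* Hypothetical allocator: n doubles starting on a 32-byte boundary.
   Returns NULL on failure; release the memory with free(). */
static double *alloc_q_aligned(size_t n)
{
    void *p = NULL;
    if (posix_memalign(&p, AVX_ALIGN, n * sizeof(double)) != 0)
        return NULL;
    return (double *)p;
}
```

In a debug build, the load at line 191 could temporarily be written as _mm256_load_pd(checked_avx_ptr(&q[ldq])), so that a misaligned address produces an assertion message instead of a bare segfault. Temporarily switching to _mm256_loadu_pd, which accepts any alignment, would be another way to test the hypothesis.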