  • #11

Created Apr 06, 2016 by Andreas Marek (@amarek, Owner)

Crash of ELPA AVX kernels

As received by email:

I am experiencing stability issues with the ELPA AVX kernels. I have tested the latest stable tar-ball release (2015-11) as well as the latest git master version.

I have used the following hardware/software combinations, both of which support AVX2 instructions:

Intel i5-5200U (Haswell)/gcc&gfortran 5.3/openmpi 1.10.2/netlib blas/lapack 3.6.0-4 / netlib scalapack 2.0.2.-4

Cray XC40 Intel Xeon E5-2690v3 (Haswell)/gcc&gfortran 5.2 (with cray wrappers)/cray-mpich 7.3.1/cray-libsci 13.3.0

I have configured ELPA with ./configure --prefix=/home/nico/lib --with-avx-optimization FCFLAGS="-O3 -march=haswell -mavx2 -mfma" CFLAGS="-O3 -march=haswell -mavx2 -mfma" CXXFLAGS="-O3 -march=haswell -mavx2 -mfma". A config.log is attached for the first machine.

I have tested the different available ELPA kernels using the included test file test_real2_choose_kernel_with_api.F90, changing the kernel directly in the call to solve_evp_real_2stage (REAL_ELPA_KERNEL_*) and recompiling. When I run the test program with e.g. "mpiexec -n 1 ./elpa2_test_real_choose_kernel_with_api 64 32 16" (the test matrix is the same on each call), everything runs without issue with the generic, generic_simple and SSE kernels. With all the AVX kernels, however, the program crashes roughly 75% of the time during the backtransformation tridi->band.

I have attached an example output of a crash with the AVX_BLOCK_2 kernel. As far as I have been able to debug (using strategically placed prints), the crash occurs on line 191 of elpa2_kernels_real_avx-avx2_2hv.c (__m256d x1 = _mm256_load_pd(&q[ldq]);)
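The report does not establish why that load fails. One thing that may be worth ruling out is alignment: _mm256_load_pd is the aligned AVX load and requires its address to lie on a 32-byte boundary (_mm256_loadu_pd is the unaligned variant). The sketch below is not ELPA code; the helper names is_aligned_32 and second_column_aligned are hypothetical, and it only illustrates how one could check whether &q[ldq] meets that requirement for a given base pointer q and leading dimension ldq.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p lies on a 32-byte boundary, which is what the aligned
   AVX load _mm256_load_pd requires of its address; returns 0 otherwise. */
static int is_aligned_32(const void *p)
{
    return ((uintptr_t)p % 32u) == 0;
}

/* Hypothetical helper (not part of ELPA): reports whether &q[ldq], the
   address loaded on the line named in the report, is 32-byte aligned.
   ldq is the leading dimension counted in doubles. */
static int second_column_aligned(const double *q, size_t ldq)
{
    assert(q != NULL);
    return is_aligned_32(&q[ldq]);
}
```

Assuming 8-byte doubles, a 32-byte-aligned q gives an aligned &q[ldq] only when ldq is a multiple of 4. Printing the result of such a check just before the failing line would show whether the crashing runs coincide with a misaligned address.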
