    This commit improves ELPA's performance on Intel(R) Xeon(R) E5v2 and E5v3 series CPUs by: · fe63372d
    Alexander Heinecke authored
    - enabling the fusing of iterations of stage 5 in ELPA2 for every configuration
    - changing the reduction bandwidth in ELPA2 to be at least 64
    - partial OpenMP parallelization of the QR factorization in bandred_real
    - OpenMP parallelization of SYMM
    - OpenMP parallelization of SYR2K in bandred_real
    - OpenMP parallelization for elpa1_reduce_add_vectors and elpa1_transpose_vectors
    - AVX2 support in the backtransformation elpa2_kernels (FMA3 instructions, introduced with the Haswell microarchitecture)
elpa1.F90 138 KB