This commit improves ELPA's performance on Intel(R) Xeon(R) E5v2 and E5v3 series CPUs by:
- enabling fusing of stage 5 iterations in ELPA2 for every configuration
- changing the reduction bandwidth in ELPA2 to be at least 64
- partial OpenMP parallelization of the QR factorization in bandred_real
- OpenMP parallelization of SYMM
- OpenMP parallelization of SYR2K in bandred_real
- OpenMP parallelization for elpa1_reduce_add_vectors and elpa1_transpose_vectors
- AVX2 support in backtransformation elpa2_kernels (FMA3 instructions introduced with Haswell microarchitecture)
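
As a rough illustration of what an OpenMP-parallel SYR2K-style update can look like, here is a minimal C sketch. It is not the code from this commit: the function name `syr2k_lower_omp`, the row-major layout, and the plain triple loop are all assumptions made for the example, and the actual change in bandred_real may instead split the work into blocks handed to BLAS calls. The sketch only shows why the update parallelizes cleanly: each row of the lower triangle of C is written by exactly one thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT ELPA's routine.
 * Lower-triangle symmetric rank-2k update: C := C + A*B^T + B*A^T
 * A and B are n x k, C is n x n, all stored row-major.
 * Only entries C[i][j] with j <= i are updated. */
void syr2k_lower_omp(int n, int k, const double *a, const double *b, double *c)
{
    assert(n >= 0 && k >= 0);

    /* Row i is owned by a single thread, so no synchronization is needed.
     * Rows get longer as i grows, so a dynamic schedule is used to
     * balance the triangular workload. */
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j <= i; ++j) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p) {
                sum += a[(size_t)i * k + p] * b[(size_t)j * k + p]
                     + b[(size_t)i * k + p] * a[(size_t)j * k + p];
            }
            c[(size_t)i * n + j] += sum;
        }
    }
}
```

Without an OpenMP-enabled compiler flag the pragma is ignored and the loop runs serially with the same result.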