- 31 May, 2016 1 commit
-
-
Andreas Marek authored
a preprocessor flag was missing
-
- 23 May, 2016 2 commits
-
-
Andreas Marek authored
-
Andreas Marek authored
-
- 19 May, 2016 1 commit
-
-
Andreas Marek authored
-
- 03 May, 2016 1 commit
-
-
Andreas Marek authored
-
- 02 May, 2016 1 commit
-
-
Andreas Marek authored
-
- 29 Apr, 2016 1 commit
-
-
Andreas Marek authored
-
- 25 Apr, 2016 2 commits
-
-
Andreas Marek authored
For single precision calculations, stripe_width must be rounded to a different multiple than for double precision: 32-byte alignment is required, and sizeof(float) and sizeof(double) differ by a factor of two, so the multiple differs by a factor of 2 as well. This commit closes issue #18.
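The alignment arithmetic can be sketched as follows. This is only an illustration, not the ELPA code: the helper name `round_up_stripe_width` and the macro `ALIGNMENT_BYTES` are hypothetical, and the sketch assumes that "a multiple" means a multiple of 32 / sizeof(type) elements.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed alignment requirement in bytes (the 32 bytes named in the commit). */
#define ALIGNMENT_BYTES 32

/* Hypothetical helper: round stripe_width up to the next multiple of
 * ALIGNMENT_BYTES / elem_size elements, so that each stripe spans a whole
 * number of 32-byte blocks. The multiple is 4 for double and 8 for float. */
static size_t round_up_stripe_width(size_t stripe_width, size_t elem_size)
{
    assert(elem_size > 0 && ALIGNMENT_BYTES % elem_size == 0);
    size_t multiple = ALIGNMENT_BYTES / elem_size;
    return (stripe_width + multiple - 1) / multiple * multiple;
}
```

With 8-byte doubles and 4-byte floats, a requested width of 50 would be rounded to 52 elements in double precision and to 56 elements in single precision.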
-
Andreas Marek authored
The sub-kernels _8_ and _4_ were wrong. This also solves problems with the single precision SSE Block 6 kernel, since it also uses the Block 4 kernel.
-
- 24 Apr, 2016 1 commit
-
-
Andreas Marek authored
The correct type is "float complex" for single precision, not "complex". Double precision should be "double complex". This closes issue #17.
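A minimal C99 sketch of the distinction, using only the standard <complex.h> types. The function names `scale_single` and `scale_double` are hypothetical and do not come from the ELPA sources.

```c
#include <assert.h>
#include <complex.h>

/* A complex type has the same layout as an array of two elements of its
 * real type, so the single precision type is half the size of the double
 * precision one. */
static_assert(sizeof(float complex) == 2 * sizeof(float),
              "float complex holds two floats");
static_assert(sizeof(double complex) == 2 * sizeof(double),
              "double complex holds two doubles");

/* Hypothetical single precision routine: spelled out as "float complex". */
static float complex scale_single(float complex z, float s)
{
    return z * s;
}

/* Hypothetical double precision routine: spelled out as "double complex". */
static double complex scale_double(double complex z, double s)
{
    return z * s;
}
```

Writing the real type explicitly in both variants avoids relying on how a compiler interprets a bare "complex".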
-
- 20 Apr, 2016 1 commit
-
-
Andreas Marek authored
It turned out that if a CPU supports SSE, the already existing test for SSE assembly instructions always passes. However, the compilation of gcc SSE intrinsic instructions might nevertheless fail if gcc is not called with one of the options "-msse3", "-msse4", "-msse4.1", "-msse4.2", "-mavx", or "-mavx2"! Obviously gcc still does not consider SSE a standard on x86_64 Intel CPUs. An additional configure test has been introduced, which tests for gcc SSE intrinsic instructions. If this test fails, the corresponding kernels are switched off.
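The idea of switching a kernel off when the intrinsics are unavailable can be illustrated at compile time. This sketch is not the configure test itself: it assumes the `__SSE3__` macro that gcc predefines when SSE3 code generation is enabled (for example with "-msse3"), and the names `HAVE_SSE_INTRINSICS` and `addsub2` are hypothetical.

```c
/* gcc predefines __SSE3__ only when SSE3 is enabled for the compilation. */
#ifdef __SSE3__
#include <pmmintrin.h>
#define HAVE_SSE_INTRINSICS 1
#else
#define HAVE_SSE_INTRINSICS 0
#endif

/* Hypothetical two-element kernel: out[0] = a[0] - b[0], out[1] = a[1] + b[1].
 * The intrinsic path is compiled only when it is available; otherwise the
 * plain C path is used and produces the same result. */
static void addsub2(const double a[2], const double b[2], double out[2])
{
#if HAVE_SSE_INTRINSICS
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);
    /* _mm_addsub_pd subtracts in the lower lane and adds in the upper lane. */
    _mm_storeu_pd(out, _mm_addsub_pd(va, vb));
#else
    out[0] = a[0] - b[0];
    out[1] = a[1] + b[1];
#endif
}
```

A configure-time check goes one step further: it tries to compile a small program like the intrinsic branch with the user's compiler flags and records whether that succeeded.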
-
- 19 Apr, 2016 3 commits
-
-
Andreas Marek authored
The C++ kernels can be written as C kernels, which simplifies the build procedure
-
Andreas Marek authored
In order to increase type safety, all ELPA2 kernels now provide an interface. The interfaces for the C/C++ kernels are automatically generated during the configure step.
-
Andreas Marek authored
-
- 18 Apr, 2016 3 commits
-
-
Andreas Marek authored
-
Andreas Marek authored
-
Andreas Marek authored
-
- 15 Apr, 2016 1 commit
-
-
Andreas Marek authored
-
- 14 Apr, 2016 2 commits
-
-
Andreas Marek authored
-
Andreas Marek authored
-
- 13 Apr, 2016 4 commits
-
-
Andreas Marek authored
-
Andreas Marek authored
-
Andreas Marek authored
-
Andreas Marek authored
Quite likely the FMA4 (AMD) implementation has never been tested. There is a fishy intrinsic call, which is most likely a typo. Abort with #error at compile time.
-
- 12 Apr, 2016 1 commit
-
-
Andreas Marek authored
-
- 08 Apr, 2016 2 commits
-
-
Andreas Marek authored
-
Andreas Marek authored
-
- 05 Apr, 2016 1 commit
-
-
Andreas Marek authored
The SSE kernels with blocking of 2, 4, 6 (real case) and 1, 2 (complex case) are now available by default. Thus the following changes have been made:
- introduce new macros in configure.ac and Makefile.am
- rename the AVX kernels to AVX_AVX2 (they also support AVX2)
- introduce new files with the SSE kernels
- introduce new kernel parameters!
- make the SSE kernels callable
The results are identical to those of the previous kernels.
-
- 04 Apr, 2016 2 commits
-
-
Andreas Marek authored
- The SSE part will be available in different files.
- Specify whether AVX or AVX2 was used to build.
-
Andreas Marek authored
-
- 01 Apr, 2016 1 commit
-
-
Andreas Marek authored
The single precision version of the SSE assembly kernel is about 1.8 times faster than the double precision version
-
- 18 Mar, 2016 1 commit
-
-
Andreas Marek authored
If the configure option "--enable-single-precision" is specified, ELPA will also be built for single precision usage. The double precision and single precision versions will be available at the same time, with names such as "solve_evp_real_1stage_double" or "solve_evp_real_1stage_single", and so on. This change implied some major refactoring of the ELPA code:
1.) functions/procedures had to be renamed with the suffix "_double"
2.) if necessary, the same functions have to be available with the suffix "_single"
3.) variable kind definitions have to be consistent with the intended use
To avoid unnecessary code duplication this is done (most of the time) with preprocessor string substitution. The documentation has been updated.
NOT SUPPORTED at the moment are:
- single precision usage of ELPA2 with kernels other than "generic" and "generic_simple"
- single precision usage of GPU
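The suffix scheme can be illustrated with an analogous C sketch that generates both variants from one definition by preprocessor token pasting. This is not the ELPA implementation; the macro names and the routine `vector_sum` are hypothetical.

```c
/* Two-level concatenation so that macro arguments are expanded first. */
#define CONCAT_(a, b) a##b
#define CONCAT(a, b) CONCAT_(a, b)

/* One body, instantiated once per precision. REAL is the element type,
 * SUFFIX becomes the "_double" or "_single" part of the function name. */
#define DEFINE_VECTOR_SUM(REAL, SUFFIX)                              \
    static REAL CONCAT(vector_sum_, SUFFIX)(const REAL *x, int n)    \
    {                                                                \
        REAL total = 0;                                              \
        for (int i = 0; i < n; i++) {                                \
            total += x[i];                                           \
        }                                                            \
        return total;                                                \
    }

DEFINE_VECTOR_SUM(double, double) /* defines vector_sum_double */
DEFINE_VECTOR_SUM(float, single)  /* defines vector_sum_single */
```

Both functions exist side by side in the same build, which mirrors how the "_double" and "_single" routines are available at the same time.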
-
- 24 Feb, 2016 2 commits
-
-
Andreas Marek authored
-
Andreas Marek authored
The configure flag "--enable-shared-memory-only" triggers a build of ELPA without MPI support:
- all MPI calls are skipped (or overloaded)
- all calls to scalapack functions are replaced by the corresponding lapack calls
- all calls to blacs are skipped
Using ELPA without MPI gives the same results as using ELPA with 1 MPI task! This version is not yet optimized for performance; here and there some unnecessary copies are made. This version is intended for users who do not have MPI in their application but would still like to use ELPA on one compute node.
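A minimal sketch of how MPI calls can be skipped or replaced behind a build-time switch. It is an illustration only: the macro `WITH_MPI` and the wrapper names are assumptions, not taken from the ELPA sources. Without MPI, the wrappers behave like a run with a single MPI task.

```c
#include <string.h>

/* WITH_MPI is a hypothetical build-time macro; when it is not defined,
 * no MPI header or library is needed. */
#ifdef WITH_MPI
#include <mpi.h>
#endif

/* Hypothetical wrapper: report rank and number of tasks. */
static void get_rank_and_size(int *rank, int *size)
{
#ifdef WITH_MPI
    MPI_Comm_rank(MPI_COMM_WORLD, rank);
    MPI_Comm_size(MPI_COMM_WORLD, size);
#else
    *rank = 0; /* the only "task" */
    *size = 1;
#endif
}

/* Hypothetical wrapper: element-wise sum over all tasks. Without MPI the
 * reduction degenerates to a plain copy of the local data. */
static void sum_over_tasks(const double *local, double *global, int n)
{
#ifdef WITH_MPI
    MPI_Allreduce(local, global, n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
#else
    memcpy(global, local, (size_t)n * sizeof(double));
#endif
}
```

The copy in the non-MPI branch is an example of the kind of extra data movement that such a build can contain before it is tuned for performance.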
-
- 04 Feb, 2016 1 commit
-
-
Andreas Marek authored
-
- 02 Feb, 2016 5 commits
-
-
Andreas Marek authored
-
Andreas Marek authored
This commit is performance critical and has to be timed carefully. Thus one can switch back to the old implementation. The new one, however, is safer and easier to debug.
-
Andreas Marek authored
-
Andreas Marek authored
This commit might be performance critical; it has to be timed carefully. Thus one can switch back to the old implementation. The new one, however, is safer and easier to debug.
-
Andreas Marek authored
-