 01 Apr, 2016 1 commit


Andreas Marek authored
The single precision version of the SSE assembly kernel is about 1.8 times faster than the double precision version

 29 Mar, 2016 1 commit


Andreas Marek authored
At the moment only the generic kernels are available for single precision. The SSE and AVX kernels still have to be ported.

 18 Mar, 2016 2 commits


Andreas Marek authored

Andreas Marek authored
library

If the configure option "--enable-single-precision" is specified, ELPA will also be built for single precision usage. The double precision and single precision versions will be available at the same time, with names such as "solve_evp_real_1stage_double" or "solve_evp_real_1stage_single", and so on.

This change implied some major refactoring of the ELPA code:
1.) functions/procedures had to be renamed with the suffix "_double"
2.) if necessary, the same functions have to be available with the suffix "_single"
3.) variable kind definitions have to be consistent with the intended use

To avoid unnecessary code duplication, this is done (most of the time) with preprocessor string substitution. The documentation has been updated.

NOT SUPPORTED at the moment are:
- single precision usage of ELPA2 with kernels other than "generic" and "generic_simple"
- single precision usage of GPU
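The naming scheme described above can be illustrated with a small sketch. ELPA does this with preprocessor string substitution in its own sources; the Python below is only an analogue of that idea using `string.Template`. The names `NAME_TEMPLATE`, `PRECISION_KINDS`, and `expand_names`, as well as the kind labels, are hypothetical and not taken from ELPA.

```python
from string import Template

# One template per routine name; ${suffix} stands in for the preprocessor
# macro that would expand to "double" or "single" at build time.
NAME_TEMPLATE = Template("${base}_${suffix}")

# Hypothetical mapping from precision suffix to a kind label (illustrative only).
PRECISION_KINDS = {"double": "8-byte real", "single": "4-byte real"}


def expand_names(base, suffixes=("double", "single")):
    """Return the precision-specific names generated from one base name."""
    return [NAME_TEMPLATE.substitute(base=base, suffix=s) for s in suffixes]


names = expand_names("solve_evp_real_1stage")
for name in names:
    # The suffix after the last underscore selects the matching kind label.
    suffix = name.rsplit("_", 1)[1]
    print(f"{name}: {PRECISION_KINDS[suffix]}")
```

The point of the sketch is that the routine body is written once and only the suffix and the kind differ between the two variants.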

 04 Mar, 2016 2 commits


Andreas Marek authored

Andreas Marek authored
files

 29 Feb, 2016 1 commit


Andreas Marek authored

 26 Feb, 2016 7 commits


Angerer, Christoph (cangerer) authored

Andreas Marek authored

Andreas Marek authored

Andreas Marek authored

Andreas Marek authored

Angerer, Christoph (cangerer) authored

Angerer, Christoph (cangerer) authored
moved declaration of istat and errorMessage outside #ifdef WITH_OPENMP because otherwise it doesn't compile

 24 Feb, 2016 4 commits


Andreas Marek authored

Andreas Marek authored

Andreas Marek authored
The test programs include the same template now, the printed messages are thus unified

Andreas Marek authored
The configure flag "--enable-shared-memory-only" triggers a build of ELPA without MPI support:
- all MPI calls are skipped (or overloaded)
- all calls to scalapack functions are replaced by the corresponding lapack calls
- all calls to blacs are skipped

Using ELPA without MPI gives the same results as using ELPA with 1 MPI task!

This version is not yet optimized for performance; here and there some unnecessary copies are done. This version is intended for users who do not have MPI in their application but still would like to use ELPA on one compute node.
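Why a build without MPI can match a run with 1 MPI task may be easier to see in a minimal sketch. It assumes that the collective operations are replaced by single-task stand-ins; `SerialComm`, `allreduce_sum`, `bcast`, and `global_norm_squared` are hypothetical names for illustration and are not ELPA or MPI API.

```python
class SerialComm:
    """Hypothetical stand-in for an MPI communicator with exactly one task."""

    size = 1
    rank = 0

    def allreduce_sum(self, local_value):
        # With a single task, the global sum is just the local value.
        return local_value

    def bcast(self, value, root=0):
        # Broadcasting to oneself returns the value unchanged.
        return value


def global_norm_squared(local_vector, comm):
    """Sum of squares over all tasks; needs only an allreduce_sum method."""
    local = sum(x * x for x in local_vector)
    return comm.allreduce_sum(local)


result = global_norm_squared([3.0, 4.0], SerialComm())
print(result)
```

Because every collective reduces to the identity on one task, the calling code stays unchanged and produces the value a 1-task run would give.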

 18 Feb, 2016 2 commits


Andreas Marek authored

Andreas Marek authored

 17 Feb, 2016 4 commits


Andreas Marek authored

Andreas Marek authored

Andreas Marek authored

Andreas Marek authored
ELPA2 can now be built (as ELPA1) for single precision calculations. The ELPA2 kernels which are implemented in assembler, C, or C++ have NOT yet been ported. Thus at the moment only the GENERIC and GENERIC_SIMPLE kernels support single precision calculations.

 15 Feb, 2016 1 commit


Andreas Marek authored
This version is not tested yet

 12 Feb, 2016 1 commit


Andreas Marek authored
ELPA2 can now be built (as ELPA1) for single precision calculations. The ELPA2 kernels which are implemented in assembler, C, or C++ have NOT yet been ported. Thus at the moment only the GENERIC and GENERIC_SIMPLE kernels support single precision calculations.

 11 Feb, 2016 1 commit


Andreas Marek authored
With the configure option "--enable-single-precision", ELPA1 is built with single precision (halfwords) only.

The best precision in single precision (float or complex) is 2^-23 ~ 1.2e-7. The accuracy of the error residual of ELPA1 in single-precision mode is of the order 1e-4 to 1e-5. The orthogonality of the EVs is fulfilled up to about 1e-6. Thus the precision of ELPA1 in single-precision mode is roughly 100 - 1000 times less than the best achievable precision. This is consistent with the double-precision mode, where a factor of 100 - 1000 less precision than the theoretical best one is also found.

The float EVs are identical to the double EVs to at least 1e-2; the precision of the EVs is thus about 1e-2/1e-7 = 1e5 times lower than the best theoretical precision. If the same holds for the double precision calculations, this implies that the double precision results can also only be trusted on the level 1e-11 (5 orders of magnitude larger than the best theoretical precision).

The best speedup compared to the double precision calculation is a factor of two. This is by far not achieved yet, since the single precision version is not at all optimized at the moment.
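The arithmetic in this message can be checked with a few lines of Python. This is only a sketch of the estimate, assuming IEEE 754 single and double precision; the variable names are illustrative, and the 1e-2 agreement figure is taken from the message rather than measured here.

```python
import sys

eps_single = 2.0 ** -23  # spacing of single-precision numbers near 1.0
eps_double = 2.0 ** -52  # the same quantity for double precision

# Python floats are IEEE doubles, so this matches the interpreter's epsilon.
assert eps_double == sys.float_info.epsilon

ev_agreement = 1e-2  # float vs. double EV agreement quoted in the message

# How far the observed agreement is from the best single precision.
loss_factor = ev_agreement / eps_single

# The same loss factor applied to the best double precision.
projected_double_level = eps_double * loss_factor

print(f"eps_single             = {eps_single:.2e}")
print(f"loss factor            = {loss_factor:.1e}")
print(f"projected double level = {projected_double_level:.1e}")
```

The loss factor comes out just under 1e5 and the projected double-precision level is a little above 1e-11, in line with the orders of magnitude stated above.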

 05 Feb, 2016 1 commit


Andreas Marek authored
As always the debug messages appear if the environment variable is set

 04 Feb, 2016 2 commits


Andreas Marek authored

Andreas Marek authored

 03 Feb, 2016 4 commits


Andreas Marek authored

Andreas Marek authored

Andreas Marek authored

Andreas Marek authored

 02 Feb, 2016 6 commits


Andreas Marek authored

Andreas Marek authored

Andreas Marek authored

Andreas Marek authored

Andreas Marek authored

Andreas Marek authored
This commit is performance critical and has to be timed carefully. Thus one can switch back to the old implementation. The new one, however, is safer and easier to debug.
