- 18 Mar, 2016 1 commit
-
-
Andreas Marek authored
library If the configure option "--enable-single-precision" is specified, ELPA will also be built for single-precision usage. The double-precision and single-precision versions will be available at the same time, with names such as "solve_evp_real_1stage_double" or "solve_evp_real_1stage_single" and so on. This change implied some major refactoring of the ELPA code: 1.) functions/procedures had to be renamed with the suffix "_double" 2.) If necessary, the same functions have to be available with the suffix "_single" 3.) Variable kind definitions have to be consistent with the intended use. To avoid unnecessary code duplication this is done (most of the time) with preprocessor string substitution. The documentation has been updated. NOT SUPPORTED at the moment are: - single-precision usage of ELPA2 with kernels other than "generic" and "generic_simple" - single-precision usage of GPU
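The renaming scheme in this commit message can be pictured as plain text substitution over one shared source. The following is a minimal Python sketch of that idea only: ELPA itself relies on the preprocessor, and the placeholder names `PRECISION_SUFFIX` and `REAL_KIND`, the template text, and the kind names `rk8` and `rk4` are assumptions made for illustration.

```python
# Hypothetical template: one source text shared by both precisions.
TEMPLATE = (
    "subroutine solve_evp_real_1stage_PRECISION_SUFFIX(a)\n"
    "  real(kind=REAL_KIND) :: a(:,:)\n"
    "end subroutine\n"
)

# Assumed mapping from precision to substitution values (illustrative names).
SUBSTITUTIONS = {
    "double": {"PRECISION_SUFFIX": "double", "REAL_KIND": "rk8"},
    "single": {"PRECISION_SUFFIX": "single", "REAL_KIND": "rk4"},
}


def instantiate(template, precision):
    """Return the template with every placeholder replaced for one precision."""
    text = template
    for placeholder, value in SUBSTITUTIONS[precision].items():
        text = text.replace(placeholder, value)
    return text


# Generate both variants from the same template.
for precision in ("double", "single"):
    print(instantiate(TEMPLATE, precision))
```

Keeping a single template means a fix made once appears in both the "_double" and the "_single" variant, which is the duplication the commit message wants to avoid.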
-
- 24 Feb, 2016 2 commits
-
-
Andreas Marek authored
-
Andreas Marek authored
The configure flag "--enable-shared-memory-only" triggers a build of ELPA without MPI support: - all MPI calls are skipped (or overloaded) - all calls to scalapack functions are replaced by the corresponding lapack calls - all calls to blacs are skipped Using ELPA without MPI gives the same results as using ELPA with 1 MPI task! This version is not yet optimized for performance; here and there some unnecessary copies are made. This version is intended for users who do not have MPI in their application but would still like to use ELPA on one compute node
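One way to see why the MPI-free build matches a run with one MPI task: with a single task, every collective operation reduces to the identity. A minimal Python sketch, assuming a hypothetical `SerialComm` stand-in class and a hypothetical helper `global_norm_squared` (neither is part of ELPA or of any MPI binding):

```python
class SerialComm:
    """Hypothetical stand-in for a communicator with exactly one task."""

    def rank(self):
        return 0  # the only task

    def size(self):
        return 1

    def allreduce_sum(self, local_value):
        # Summing over one task returns the local contribution unchanged.
        return local_value

    def bcast(self, value, root=0):
        # Broadcasting to oneself is a no-op.
        return value


def global_norm_squared(local_values, comm):
    """Sum of squares over all tasks; each task holds part of the vector."""
    local_sum = sum(v * v for v in local_values)
    return comm.allreduce_sum(local_sum)


print(global_norm_squared([3.0, 4.0], SerialComm()))
```

Code written against such an interface does not need to know whether a real communicator or the serial stand-in sits behind it, which is the sense in which calls can be "skipped (or overloaded)".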
-
- 11 Feb, 2016 1 commit
-
-
Andreas Marek authored
With the configure option "--enable-single-precision", ELPA1 is built with single precision (half-words) only. The best precision in single precision (float or complex) is 2^-23 ~ 1.2e-7. The accuracy of the error residual of ELPA1 in single-precision mode is of the order 1e-4 to 1e-5. The orthogonality of the EVs is fulfilled up to about 1e-6. Thus the precision of ELPA1 in single-precision mode is roughly 100 - 1000 times lower than the best achievable precision. This is consistent with the double-precision mode, where a factor of 100 - 1000 less precision than the theoretical best is also found. The float EVs are identical to the double EVs to at least 1e-2; the precision of the EVs is thus about 1e-2/1e-7 = 1e5 times lower than the best theoretical precision. If the same holds for the double-precision calculations, this implies that the double-precision results can also only be trusted at the level of 1e-11 (5 orders of magnitude larger than the best theoretical precision). The best possible speed-up compared to the double-precision calculation is a factor of two. This is by far not achieved yet, since the single-precision version is not at all optimized at the moment
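The arithmetic behind these estimates can be checked directly. The sketch below uses only the figures quoted in this commit message (residuals of 1e-4 to 1e-5, EV agreement of 1e-2) together with the standard machine epsilons 2^-23 and 2^-52; the variable names are chosen for illustration.

```python
# Machine epsilon: spacing between 1.0 and the next representable number.
eps_single = 2.0 ** -23   # about 1.2e-7
eps_double = 2.0 ** -52   # about 2.2e-16

# Residual accuracy quoted for ELPA1 in single precision.
residual_best, residual_worst = 1e-5, 1e-4

# How many times larger than eps_single the residuals are.
factor_best = residual_best / eps_single
factor_worst = residual_worst / eps_single

# Agreement between float and double EVs quoted in the text.
ev_agreement = 1e-2
ev_loss_factor = ev_agreement / eps_single   # of the order 1e5

# Applying the same order-of-magnitude loss (1e5) to double precision.
double_trust_level = eps_double * 1e5        # of the order 1e-11

print(f"eps_single         = {eps_single:.3e}")
print(f"residual factors   = {factor_best:.0f} to {factor_worst:.0f}")
print(f"EV loss factor     = {ev_loss_factor:.1e}")
print(f"double trust level = {double_trust_level:.1e}")
```

The residual factors come out between roughly 1e2 and 1e3, and the EV loss factor is the ratio of the observed agreement to the machine epsilon, not the other way round.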
-
- 02 Feb, 2016 1 commit
-
-
Andreas Marek authored
The generic real kernel is now contained in a module, which allows strict interface checking! It also does not use assumed-size arrays anymore. Both points make it easier to debug and find errors. However, this might be performance critical! It is possible to switch back to the old implementation if that turns out to be beneficial w.r.t. performance. Timings with gfortran 4.9 on Intel Haswell showed that the new implementation is about 30 percent faster than the previous one
-
- 22 Dec, 2015 1 commit
-
-
Andreas Marek authored
-
- 16 Dec, 2015 1 commit
-
-
Andreas Marek authored
This commit does not change the interfaces defined in ELPA_2015.11.001! All functionality is available via the interface names and definitions as in ELPA_2015.11.001. But some new interfaces have been added in order to unify the references from C and Fortran codes: - The procedures to create the ELPA (row/column) communicators are now available from C _and_ Fortran with the name "get_elpa_communicators". The old Fortran name "get_elpa_row_col_comms" and the old C name "elpa_get_communicators" are from now on deprecated but still available - The 1-stage solver routines are available from C _and_ Fortran via the names "solve_evp_real_1stage" and "solve_evp_complex_1stage". The old Fortran names "solve_evp_real" and "solve_evp_complex" are from now on deprecated but still functional. All documentation (man pages, doxygen, and example test programs) has been changed accordingly. This commit implies a change in the API versioning number, but no changes to codes calling ELPA (if they have already been updated to the API of ELPA_2015.11.001)
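The "deprecated but still available" pattern amounts to keeping the old names as thin forwards to the new one. A minimal Python sketch of that pattern, using the routine names from this commit message; the placeholder body, the argument handling, and the emitted warning are assumptions, since the message does not say whether ELPA warns on use of the old names:

```python
import warnings


def get_elpa_communicators(*args):
    """Placeholder for the unified routine; returns its arguments unchanged."""
    return args


def _deprecated_alias(new_func, old_name):
    """Build a wrapper that warns and then forwards to the new routine."""
    def wrapper(*args):
        warnings.warn(
            f"{old_name} is deprecated; use {new_func.__name__}",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_func(*args)
    wrapper.__name__ = old_name
    return wrapper


# Old names from the text, kept functional as thin forwards.
get_elpa_row_col_comms = _deprecated_alias(
    get_elpa_communicators, "get_elpa_row_col_comms"
)
elpa_get_communicators = _deprecated_alias(
    get_elpa_communicators, "elpa_get_communicators"
)
```

Callers using either old name get the same result as callers of the new name, so existing code keeps working while new code can settle on a single spelling.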
-
- 11 Dec, 2015 1 commit
-
-
Andreas Marek authored
- the contact email is now: elpa-library@mpcdf.mpg.de - the official website is now hosted at http://elpa.mpcdf.mpg.de
-
- 10 Dec, 2015 1 commit
-
-
Andreas Marek authored
The user functions of ELPA are now documented with doxygen tags. At the moment the interface of ELPA 2015.11.001 is described. The documentation still has to be added step by step for all functions and test programs.
-
- 09 Dec, 2015 1 commit
-
-
Andreas Marek authored
These variables do not have to be global; they can be passed along internally in ELPA. Removing them makes debugging easier and the public interface leaner
-
- 26 Nov, 2015 1 commit
-
-
Andreas Marek authored
The API versioning number was not updated correctly at the release. This led to a wrong soname. This is fixed now
-
- 16 Nov, 2015 1 commit
-
-
Andreas Marek authored
Due to the efforts of Intel, ELPA now features built-in support for AVX2 and FMA on the latest Intel processors
-
- 05 Nov, 2015 1 commit
-
-
Andreas Marek authored
-
- 04 Nov, 2015 1 commit
-
-
Andreas Marek authored
-
- 03 Nov, 2015 1 commit
-
-
Andreas Marek authored
The examples of how to invoke ELPA from a C program have been updated. There are now examples for ELPA1 and ELPA2, for both the real and the complex case. The test cases still have less functionality than their Fortran counterparts; they are just meant as a "proof-of-concept".
-
- 24 Aug, 2015 1 commit
-
-
Andreas Marek authored
Inge Gutheil from FZ Juelich pointed out that the configure test for BGQ failed due to typos. These are corrected now
-
- 16 Jun, 2015 1 commit
-
-
Andreas Marek authored
-
- 26 May, 2015 1 commit
-
-
Andreas Marek authored
-
- 21 May, 2015 1 commit
-
-
Andreas Marek authored
- The compiler options for nvcc are changed - The include paths are updated
-
- 19 May, 2015 1 commit
-
-
Andreas Marek authored
A "dangling" fi has been removed
-
- 29 Apr, 2015 5 commits
-
-
Andreas Marek authored
Remove variables which are not needed (anymore)
-
Andreas Marek authored
The macros which define the functionality to test for - a specific real/complex kernel (not all available kernels) are now defined in files in the m4 directory
-
Andreas Marek authored
Remove variables which are not needed (anymore)
-
Andreas Marek authored
The macros which define the functionality to test for - GPU support only (no CPU based kernels) - a specific real/complex kernel (not all available kernels) are now defined in files in the m4 directory
-
Andreas Marek authored
Configure now treats the GPU kernels like any other kernel, i.e. if GPU support is enabled (and it is possible to build it) then it will be built in ADDITION to all other possible kernels for the desired hardware. Also, it is possible to configure the build process for the GPU version ONLY (as it was already possible to trigger the build for only ONE specific real/complex kernel). Note: The sources at the moment CANNOT handle this, i.e. if GPU support is configured, the GPU-only code path is compiled. This will be changed in the near future.
-
- 28 Apr, 2015 1 commit
-
-
Andreas Marek authored
-
- 27 Apr, 2015 1 commit
-
-
Lorenz Huedepohl authored
There was an inconsistency when the OpenMP flag was different for the Fortran and C compiler (e.g. -openmp for ifort and -fopenmp for gcc). This led to strange errors when linking the example program with the C main() routine when using Intel Fortran, Intel MPI, and GCC together. A typical error message was: /usr/bin/ld: MPIR_Thread: TLS definition in [...]/intel64/lib/libmpi_dbg_mt.so section .tbss mismatches non-TLS definition in [...]/intel64/lib/libmpi_dbg.so section .bss [...]/intel64/lib/libmpi_dbg_mt.so: could not read symbols: Bad value The reason seems to be that the various MPI wrapper shell scripts (mpicc, mpiifort) need the correct OpenMP option to select the thread-safe Intel MPI debug library. Previously, OPENMP_FCFLAGS was always appended to LDFLAGS, which did not trigger this when linking a C main program with mpicc.
-
- 24 Mar, 2015 1 commit
-
-
Andreas Marek authored
-
- 23 Mar, 2015 2 commits
-
-
Lorenz Huedepohl authored
Just adding -mavx works on many systems that have compilers that can produce AVX code but do not necessarily have processors with AVX support.
-
Lorenz Huedepohl authored
-
- 19 Mar, 2015 1 commit
-
-
Lorenz Huedepohl authored
The flag -mavx was not removed from C/CXXFLAGS again if AVX is unusable
-
- 18 Mar, 2015 1 commit
-
-
Andreas Marek authored
- provide C interface for ELPA Library - correct an error in the test case for QR-decomposition
-
- 11 Mar, 2015 2 commits
-
-
Lorenz Huedepohl authored
Some compilers detected the static out-of-bounds condition present in the test code and refused to compile it.
-
Andreas Marek authored
C interfaces are now available and defined in the header elpa.h
-
- 11 Feb, 2015 1 commit
-
-
Andreas Marek authored
Error in configure test program fixed
-
- 02 Feb, 2015 1 commit
-
-
Andreas Marek authored
As is obvious from the previous commits, this release of ELPA introduces an (optional) QR-decomposition for real-valued matrices. This option can be used at run-time by either setting an environment variable, or by calling the ELPA-2 solver for real matrices with an additional flag. Thus the ABI changed w.r.t. previous versions. Furthermore, the build process of ELPA has been made more consistent. All optimization flags (especially O1, O2 etc.) have to be set at build time by the user via the CFLAGS, FCFLAGS, and CXXFLAGS variables. The configure script no longer sets the "O-Flags" automatically.
-
- 30 Jan, 2015 1 commit
-
-
Lorenz Huedepohl authored
Some users were "clever" enough to supply a library in LDFLAGS/LIBS that contained omp_get_num_threads, thereby tricking configure into thinking that we do not need any flags to enable OpenMP. Now the Fortran test only works if "use omp_lib" and "!$" OpenMP conditional compilation work. Also, if no valid OpenMP flag could be detected, configure silently continued. I changed this to an explicit error.
-
- 29 Jan, 2015 1 commit
-
-
Andreas Marek authored
The QR decomposition is now available as a runtime choice. Some testing still has to be done
-
- 28 Jan, 2015 1 commit
-
-
Andreas Marek authored
-
- 27 Jan, 2015 1 commit
-
-
Lorenz Huedepohl authored
-