- 20 Apr, 2016 1 commit
-
-
Andreas Marek authored
It turned out that if a CPU supports SSE, the existing test for SSE assembly instructions always passes. However, compiling gcc SSE intrinsic instructions might nevertheless fail if gcc is not called with one of the options "-msse3", "-msse4", "-msse4.1", "-msse4.2", "-mavx", or "-mavx2"! Apparently gcc still does not treat SSE as standard on x86_64 Intel CPUs. An additional configure test has been introduced, which tests for gcc SSE intrinsic instructions. If this test fails, the corresponding kernels are switched off.
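As a rough illustration of the distinction above, the following C sketch uses an SSE3 intrinsic only when the compiler has SSE3 enabled (gcc defines `__SSE3__` when called with "-msse3" or a stronger option). This is not the configure test that was added to ELPA; the function names `sse3_intrinsics_enabled` and `sum_pair` are hypothetical and chosen for this example only.

```c
#ifdef __SSE3__
#include <pmmintrin.h> /* SSE3 intrinsics, e.g. _mm_hadd_pd */
#endif

/* Hypothetical helper: returns 1 if this translation unit was compiled
 * with SSE3 intrinsics enabled, 0 otherwise. */
int sse3_intrinsics_enabled(void)
{
#ifdef __SSE3__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: adds two doubles. With SSE3 enabled it uses a
 * horizontal add; otherwise it falls back to plain scalar addition. */
double sum_pair(double a, double b)
{
#ifdef __SSE3__
    __m128d v = _mm_set_pd(b, a);  /* low lane = a, high lane = b */
    __m128d s = _mm_hadd_pd(v, v); /* both lanes now hold a + b */
    return _mm_cvtsd_f64(s);       /* extract the low lane */
#else
    return a + b;
#endif
}
```

Without the `#ifdef` guard, the call to `_mm_hadd_pd` would be expected to fail to compile when gcc is called without "-msse3" or a stronger option, even on a CPU that supports SSE3. That is the situation a compile-only configure test can detect.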
-
- 19 Apr, 2016 2 commits
-
-
Andreas Marek authored
The C++ kernels can be written as C kernels, which simplifies the build procedure
-
Andreas Marek authored
In order to increase type safety, all ELPA2 kernels now provide an interface. The interfaces for the C/C++ kernels are generated automatically during the configure step
-
- 08 Apr, 2016 1 commit
-
-
Lorenz Hüdepohl authored
-
- 06 Apr, 2016 1 commit
-
-
Andreas Marek authored
-
- 05 Apr, 2016 1 commit
-
-
Andreas Marek authored
The SSE kernels with blocking of 2, 4, 6 (real case) and 1, 2 (complex case) are now available by default. Thus the following changes have been made: - introduce new macros in configure.ac and Makefile.am - rename the AVX kernels to AVX_AVX2 (they also support AVX2) - introduce new files with the SSE kernels - introduce new kernel parameters! - make the SSE kernels callable. The results are identical to those of the previous kernels
-
- 04 Apr, 2016 1 commit
-
-
Andreas Marek authored
- The SSE part will be available in different files. - Specify whether AVX or AVX2 was used to build
-
- 24 Feb, 2016 2 commits
-
-
Andreas Marek authored
-
Andreas Marek authored
The configure flag "--enable-shared-memory-only" triggers a build of ELPA without MPI support: - all MPI calls are skipped (or overloaded) - all calls to scalapack functions are replaced by the corresponding lapack calls - all calls to blacs are skipped. Using ELPA without MPI gives the same results as using ELPA with 1 MPI task! This version is not yet optimized for performance; here and there some unnecessary copies are made. This version is intended for users who do not have MPI in their application but would still like to use ELPA on one compute node
-
- 02 Feb, 2016 1 commit
-
-
Andreas Marek authored
The generic real kernel is now contained in a module, which allows strict interface checking! It also no longer uses assumed-size arrays. Both points make it easier to debug and find errors. However, this might be performance critical! It is possible to switch back to the old implementation if that turns out to be beneficial w.r.t. performance. Timings with gfortran 4.9 on Intel Haswell showed that the new implementation is about 30 percent faster than the previous one
-
- 16 Dec, 2015 1 commit
-
-
Andreas Marek authored
This commit does not change the interfaces defined in ELPA_2015.11.001! All functionality is available via the interface names and definitions as in ELPA_2015.11.001. But some new interfaces have been added in order to unify the references from C and Fortran codes: - The procedures to create the ELPA (row/column) communicators are now available from C _and_ Fortran with the name "get_elpa_communicators". The old Fortran name "get_elpa_row_col_comms" and the old C name "elpa_get_communicators" are from now on deprecated but still available - The 1-stage solver routines are available from C _and_ Fortran via the names "solve_evp_real_1stage" and "solve_evp_complex_1stage". The old Fortran names "solve_evp_real" and "solve_evp_complex" are from now on deprecated but still functional. All documentation (man pages, doxygen, and example test programs) has been changed accordingly. This commit implies a change in the API versioning number, but no changes to codes calling ELPA (if they have already been updated to the API of ELPA_2015.11.001)
-
- 11 Dec, 2015 1 commit
-
-
Andreas Marek authored
- the contact email is now: elpa-library@mpcdf.mpg.de - the official website is now hosted at http://elpa.mpcdf.mpg.de
-
- 10 Dec, 2015 1 commit
-
-
Andreas Marek authored
The user functions of ELPA are now documented with doxygen tags. At the moment the interface of ELPA 2015.11.001 is described. The documentation still has to be added step by step for all functions and test programs.
-
- 09 Dec, 2015 1 commit
-
-
Andreas Marek authored
These variables do not have to be global; they can be passed along internally in ELPA. Removing them makes debugging easier and the public interface leaner
-
- 26 Nov, 2015 1 commit
-
-
Andreas Marek authored
The API versioning number was not updated correctly at the release. This led to a wrong soname. This is fixed now
-
- 16 Nov, 2015 1 commit
-
-
Andreas Marek authored
Due to the efforts of Intel, ELPA now features built-in support for AVX2 and FMA on the latest Intel processors
-
- 05 Nov, 2015 1 commit
-
-
Andreas Marek authored
-
- 04 Nov, 2015 1 commit
-
-
Andreas Marek authored
-
- 03 Nov, 2015 1 commit
-
-
Andreas Marek authored
The examples of how to invoke ELPA from a C program have been updated. There are now examples for ELPA1 and ELPA2, both real and complex case. The test cases still have less functionality than their Fortran counterparts; they are just meant as a "proof-of-concept".
-
- 24 Aug, 2015 1 commit
-
-
Andreas Marek authored
Inge Gutheil from FZ Juelich pointed out that the configure test for BGQ failed due to typos. These are corrected now
-
- 26 May, 2015 1 commit
-
-
Andreas Marek authored
-
- 19 May, 2015 1 commit
-
-
Andreas Marek authored
A "dangling" fi has been removed
-
- 29 Apr, 2015 2 commits
-
-
Andreas Marek authored
Remove variables which are not needed (anymore)
-
Andreas Marek authored
The macros which define the functionality to test for a specific real/complex kernel (not all available kernels) are now defined in files in the m4 directory
-
- 28 Apr, 2015 1 commit
-
-
Andreas Marek authored
-
- 27 Apr, 2015 1 commit
-
-
Lorenz Huedepohl authored
There was an inconsistency when the OpenMP flag differed between the Fortran and C compilers (e.g. -openmp for ifort and -fopenmp for gcc). This led to strange errors when linking the example program with the C main() routine when using Intel Fortran, Intel MPI, and GCC together. A typical error message was: /usr/bin/ld: MPIR_Thread: TLS definition in [...]/intel64/lib/libmpi_dbg_mt.so section .tbss mismatches non-TLS definition in [...]/intel64/lib/libmpi_dbg.so section .bss [...]/intel64/lib/libmpi_dbg_mt.so: could not read symbols: Bad value The reason seems to be that the various MPI wrapper shell scripts (mpicc, mpiifort) need the correct OpenMP option to select the thread-safe Intel MPI debug library. Previously, OPENMP_FCFLAGS was always appended to LDFLAGS, which did not trigger this when linking a C main program with mpicc.
-
- 23 Mar, 2015 2 commits
-
-
Lorenz Huedepohl authored
Just adding -mavx works on many systems which have compilers that can produce AVX code but do not necessarily have processors with AVX support.
-
Lorenz Huedepohl authored
-
- 19 Mar, 2015 1 commit
-
-
Lorenz Huedepohl authored
The flag -mavx was not removed from C/CXXFLAGS again if AVX is unusable
-
- 18 Mar, 2015 1 commit
-
-
- provide C interface for ELPA Library - correct an error in the test case for QR-decomposition
-
- 11 Mar, 2015 2 commits
-
-
Lorenz Huedepohl authored
Some compilers detected the static out-of-bounds condition present in the test code and refused to compile it.
-
Andreas Marek authored
C interfaces are now available and defined in the header elpa.h
-
- 11 Feb, 2015 1 commit
-
-
Andreas Marek authored
Error in configure test program fixed
-
- 02 Feb, 2015 1 commit
-
-
Andreas Marek authored
As is obvious from the previous commits, this release of ELPA introduces an (optional) QR-decomposition for real-valued matrices. This option can be used at run-time either by setting an environment variable or by calling the ELPA-2 solver for real matrices with an additional flag. Thus the ABI changed w.r.t. previous versions. Furthermore, the build process of ELPA has been made more consistent. All optimization flags (especially O1, O2 etc.) have to be set at build time by the user via the CFLAGS, FCFLAGS, and CXXFLAGS variables. The configure script no longer sets the "O-Flags" automatically.
-
- 30 Jan, 2015 1 commit
-
-
Lorenz Huedepohl authored
Some users were "clever" enough to supply a library in LDFLAGS/LIBS that contained omp_get_num_threads, thereby tricking configure into thinking that we do not need any flags to enable OpenMP. Now the Fortran test only works if "use omp_lib" and "!$" OpenMP conditional compilation work. Also, if no valid OpenMP flag could be detected, configure silently continued. I changed this to an explicit error.
-
- 29 Jan, 2015 1 commit
-
-
Andreas Marek authored
The QR decomposition is now available as a runtime choice. Some testing still has to be done
-
- 28 Jan, 2015 1 commit
-
-
Andreas Marek authored
-
- 27 Jan, 2015 1 commit
-
-
Lorenz Huedepohl authored
-
- 25 Aug, 2014 2 commits
-
-
Andreas Marek authored
At build time it can be specified that the ELPA test programs give more detailed timing information, which is useful for performance measurements
-
Andreas Marek authored
If specified in the configure step, the test programs redirect the stdout and stderr output of each MPI task to a separate file, which will be stored in a subdirectory "mpi_stdout". This will only be done if the environment variable "REDIRECT_ELPA_TEST_OUTPUT" is set to "true"
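The run-time side of this behaviour can be sketched in C as follows. This is an illustrative sketch and not the code of the ELPA test programs. The environment variable name and the "mpi_stdout" directory come from the message above, while the function names and the file naming scheme ("stdout_rank<N>.txt") are assumptions made for this example. The task number is passed in as a plain integer so that the sketch does not depend on MPI, and the "mpi_stdout" directory is assumed to exist already.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if REDIRECT_ELPA_TEST_OUTPUT is set to "true", else 0. */
int redirect_requested(void)
{
    const char *value = getenv("REDIRECT_ELPA_TEST_OUTPUT");
    return value != NULL && strcmp(value, "true") == 0;
}

/* Builds "mpi_stdout/<stream>_rank<rank>.txt" in buf.
 * The naming scheme is hypothetical. Returns 1 on success and
 * 0 if the buffer is too small for the full path. */
int build_output_path(char *buf, size_t len, const char *stream, int rank)
{
    int n = snprintf(buf, len, "mpi_stdout/%s_rank%d.txt", stream, rank);
    return n > 0 && (size_t)n < len;
}

/* Redirects stdout and stderr of one task into per-task files.
 * Returns 1 if redirected, 0 if redirection was not requested,
 * and -1 on error (e.g. the "mpi_stdout" directory is missing). */
int redirect_output(int rank)
{
    char path[256];

    if (!redirect_requested())
        return 0;

    if (!build_output_path(path, sizeof path, "stdout", rank) ||
        freopen(path, "w", stdout) == NULL)
        return -1;

    if (!build_output_path(path, sizeof path, "stderr", rank) ||
        freopen(path, "w", stderr) == NULL)
        return -1;

    return 1;
}
```

In an MPI program, `redirect_output` would presumably be called once per task with the task's rank, after the rank is known and before any output is written.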
-