1. 22 Apr, 2016 1 commit
  2. 20 Apr, 2016 1 commit
    • Additional configure check for gcc SSE intrinsics · 896388e9
      Andreas Marek authored
      It turned out that if a CPU supports SSE, the already existing
      test for SSE assembly instructions always passes.
      However, the compilation of gcc SSE intrinsic instructions might
      nevertheless fail if gcc is not called with one of the options
      "-msse3", "-msse4", "-msse4.1", "-msse4.2", "-mavx", or "-mavx2"!
      
      Obviously gcc still does not enable these SSE extensions by default
      on x86_64 Intel CPUs.
      
      An additional configure test has been introduced, which tests for
      gcc SSE intrinsic instructions. If this test fails, the corresponding
      kernels are switched off.
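The idea behind such a check can be sketched as a try-compile step. The actual test lives in ELPA's configure.ac and is not reproduced here; the Python helper below is only an illustration. The name `intrinsics_compile`, the probe source, and the choice of `_mm_addsub_pd` (an SSE3 intrinsic from `pmmintrin.h`) are assumptions made for this example.

```python
import os
import shutil
import subprocess
import tempfile

# Hypothetical probe program: it only needs to compile, not to be useful.
SSE3_PROBE = r"""
#include <pmmintrin.h>
int main(void) {
    __m128d a = _mm_set_pd(1.0, 2.0);
    __m128d b = _mm_set_pd(3.0, 4.0);
    __m128d c = _mm_addsub_pd(a, b);   /* SSE3 intrinsic */
    double out[2];
    _mm_storeu_pd(out, c);
    return out[0] == -2.0 ? 0 : 1;
}
"""


def intrinsics_compile(flags, compiler="gcc"):
    """Return True/False if the probe compiles with `flags`, or None if
    the compiler cannot be found on this machine."""
    cc = shutil.which(compiler)
    if cc is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "conftest.c")
        obj = os.path.join(tmp, "conftest.o")
        with open(src, "w") as handle:
            handle.write(SSE3_PROBE)
        # Compile only (-c); a non-zero exit status means the check failed.
        result = subprocess.run(
            [cc, *flags, "-c", src, "-o", obj],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0
```

Comparing `intrinsics_compile([])` with `intrinsics_compile(["-msse3"])` on the same machine shows whether the flag is what makes the difference; a build system following the commit's approach would switch the SSE intrinsic kernels off when the probe fails.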
  3. 19 Apr, 2016 2 commits
  4. 08 Apr, 2016 1 commit
  5. 06 Apr, 2016 1 commit
  6. 05 Apr, 2016 1 commit
    • Introduction of new SSE kernels with different blocking · 69792b15
      Andreas Marek authored
      The SSE kernels with blocking of 2, 4, 6 (real case) and 1, 2 (complex
      case) are now available by default.
      
      Thus the following changes have been made:
      - introduce new macros in configure.ac and Makefile.am
      - rename the AVX kernels to AVX_AVX2 (they also support AVX2)
      - introduce new files with SSE kernels
      - introduce new kernel parameters!
      - make the SSE kernels callable
      
      The results are identical to those of the previous kernels.
  7. 04 Apr, 2016 1 commit
  8. 01 Apr, 2016 1 commit
  9. 18 Mar, 2016 1 commit
    • Allow ELPA to be built with single and double precision symbols in one library · 647aa5a8
      Andreas Marek authored
      
      If the configure option "--enable-single-precision" is specified,
      ELPA will also be built for single precision usage. The double precision
      and single precision versions will be available at the same time with names
      "solve_evp_real_1stage_double" or "solve_evp_real_1stage_single" and
      so on...
      
      This change implied some major refactoring of the ELPA code:
      1.) functions/procedures had to be renamed with suffix "_double"
      
      2.) If necessary, the same functions have to be available with suffix
      "_single"
      
      3.) Variable kind definitions have to be consistent with the
      intended use
      
      To avoid unnecessary code duplication this is done (most of the time)
      with preprocessor string substitution.
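The substitution idea can be illustrated outside the real build. ELPA does this with the preprocessor on its Fortran sources; the Python sketch below merely mimics the principle with `string.Template`. The placeholder names `PRECISION_SUFFIX` and `REAL_KIND`, the kinds `c_double` and `c_float`, and the abbreviated argument list are assumptions for illustration, not ELPA's actual macros or interface.

```python
from string import Template

# One source template, written once. The placeholders stand in for
# preprocessor macros; their names are invented for this sketch.
KERNEL_TEMPLATE = Template(
    "subroutine solve_evp_real_1stage_${PRECISION_SUFFIX}(a, ev)\n"
    "  real(kind=${REAL_KIND}) :: a(:,:), ev(:)\n"
    "end subroutine solve_evp_real_1stage_${PRECISION_SUFFIX}\n"
)

# One substitution table per precision (hypothetical kind names).
PRECISIONS = {
    "double": {"PRECISION_SUFFIX": "double", "REAL_KIND": "c_double"},
    "single": {"PRECISION_SUFFIX": "single", "REAL_KIND": "c_float"},
}


def expand(precision):
    """Return the template text with the suffix and kind filled in."""
    return KERNEL_TEMPLATE.substitute(PRECISIONS[precision])


if __name__ == "__main__":
    for name in PRECISIONS:
        print(expand(name))
```

Both variants come from the same text, so a fix to the template reaches the "_double" and "_single" routines at once, which is the duplication the commit message wants to avoid.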
      
      The documentation has been updated.
      
      NOT SUPPORTED at the moment are:
      
      - single precision usage of ELPA2 with kernels other than "generic"
        and "generic_simple"
      
      - single precision usage of GPU
  10. 24 Feb, 2016 2 commits
    • Add migration notice · 31a03aa2
      Andreas Marek authored
    • Optional build of ELPA without MPI · 49f119aa
      Andreas Marek authored
      The configure flag "--enable-shared-memory-only" triggers a build
      of ELPA without MPI support:
      
      - all MPI calls are skipped (or overloaded)
      - all calls to scalapack functions are replaced by the corresponding
        lapack calls
      - all calls to blacs are skipped
      
      Using ELPA without MPI gives the same results as using ELPA with 1 MPI
      task!
      
      This version is not yet optimized for performance; here and there some
      unnecessary copies are made.
      
      This version is intended for users who do not have MPI in their
      application but still would like to use ELPA on one compute node.
  11. 11 Feb, 2016 1 commit
    • Enable single-precision calculations for ELPA1 · de6a4fde
      Andreas Marek authored
      With the configure option "--enable-single-precision" ELPA1 is built
      with single-precision (half-words) only.
      
      The best precision in single-precision (float or complex) is
      2^-23 ~ 1.2e-7. The accuracy of the error residual of ELPA1 in
      single-precision mode is of the order 1e-4 to 1e-5. The orthogonality of
      the EVs is fulfilled up to about 1e-6.
      
      Thus the precision of ELPA1 in single-precision mode is roughly 100 -
      1000 times less than the best achievable precision. This is consistent
      with the double-precision mode, where also a factor of 100 - 1000 less
      precision than the theoretical best one is found.
      
      The float EVs are identical to the double EVs to at least 1e-2; the
      precision of the EVs is thus about 1e-2/1e-7 = 1e5 times lower than the
      best theoretical precision. If the same holds for the double precision
      calculations, this implies that the double precision results can also
      only be trusted on the level 1e-11 (5 orders of magnitude larger
      than the best theoretical precision).
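The arithmetic behind these estimates can be rechecked with a few lines of Python. This is only a sanity check of the numbers quoted in the commit message; the variable and function names are chosen for this sketch, and the double-precision figure is the same extrapolation the message makes, not a measurement.

```python
import sys

EPS_SINGLE = 2.0 ** -23               # IEEE 754 single precision epsilon
EPS_DOUBLE = sys.float_info.epsilon   # IEEE 754 double precision, 2**-52


def loss_factor(observed, best):
    """How many times worse `observed` is than the best achievable value."""
    return observed / best


# Residual of 1e-5 to 1e-4 compared with the single-precision epsilon:
# both factors fall in the "roughly 100 - 1000" range quoted above.
residual_low = loss_factor(1e-5, EPS_SINGLE)    # about 84
residual_high = loss_factor(1e-4, EPS_SINGLE)   # about 839

# Eigenvector agreement of 1e-2 compared with a best precision of ~1e-7.
ev_loss = loss_factor(1e-2, 1e-7)               # about 1e5

# Applying the same loss to double precision gives the trust level
# of the double-precision eigenvectors, of the order 1e-11.
double_trust = EPS_DOUBLE * ev_loss

if __name__ == "__main__":
    print(f"single eps      = {EPS_SINGLE:.3e}")
    print(f"residual factor = {residual_low:.0f} .. {residual_high:.0f}")
    print(f"EV loss factor  = {ev_loss:.1e}")
    print(f"double trust    = {double_trust:.1e}")
```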
      
      The best possible speed-up compared to the double precision calculation is
      a factor of two. This is far from achieved yet, since the single
      precision version is not at all optimized at the moment.
  12. 02 Feb, 2016 1 commit
    • Remove assumed size from generic real kernel · cb4c4ae7
      Andreas Marek authored
      The generic real kernel is now contained in a module; this allows
      strict interface checking! It also does not use assumed size arrays
      anymore. Both points make it easier to debug and find errors.
      
      However, this might be performance critical! It is possible to
      switch back to the old implementation if that turns out to
      be beneficial w.r.t. performance. Timings with gfortran 4.9 on Intel
      Haswell showed that the new implementation is about 30 percent faster
      than the previous one.
  13. 22 Dec, 2015 1 commit
  14. 16 Dec, 2015 1 commit
    • Add interface to unify C and Fortran names · bb046d1c
      Andreas Marek authored
      This commit does not change the interfaces defined in ELPA_2015.11.001 !
      All functionality is available via the interface names and definitions
      as in ELPA_2015.11.001
      
      But some new interfaces have been added, in order to unify the
      references from C and Fortran codes:
      
      - The procedures to create the ELPA (row/column) communicators are now
        available from C _and_ Fortran with the name "get_elpa_communicators".
        The old Fortran name "get_elpa_row_col_comms" and the old C name
        "elpa_get_communicators" are from now on deprecated but still available
      
      - The 1-stage solver routines are available from C _and_ Fortran via
        the names "solve_evp_real_1stage" and "solve_evp_complex_1stage".
        The old Fortran names "solve_evp_real" and "solve_evp_complex" are
        from now on deprecated but still functional.
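For anyone updating calling code, the renames listed above can be collected in a small lookup table. The Python helper below is a hypothetical migration aid, not part of ELPA; only the old and new names themselves are taken from the commit message, while `DEPRECATED_NAMES` and `unified_name` are invented for this sketch.

```python
# (language, deprecated name) -> unified name, as listed in the commit message.
DEPRECATED_NAMES = {
    ("fortran", "get_elpa_row_col_comms"): "get_elpa_communicators",
    ("c", "elpa_get_communicators"): "get_elpa_communicators",
    ("fortran", "solve_evp_real"): "solve_evp_real_1stage",
    ("fortran", "solve_evp_complex"): "solve_evp_complex_1stage",
}


def unified_name(language, name):
    """Return the unified name for a deprecated one; names that are not
    in the table are returned unchanged."""
    return DEPRECATED_NAMES.get((language.lower(), name), name)


if __name__ == "__main__":
    for (language, old), new in DEPRECATED_NAMES.items():
        print(f"{language:8s} {old:26s} -> {new}")
```

Since the deprecated names remain available, such a table is only a convenience for moving code to the unified names at its own pace.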
      
      All documentation (man pages, doxygen, and example test programs) has
      been changed accordingly.
      
      This commit implies a change in the API versioning number, but no
      changes to codes calling ELPA (if they have been already updated to the
      API of ELPA_2015.11.001)
  15. 11 Dec, 2015 1 commit
  16. 10 Dec, 2015 1 commit
    • Create doxygen documentation for ELPA · 927f988a
      Andreas Marek authored
      The user functions of ELPA are now documented with doxygen tags.
      At the moment the interface of ELPA 2015.11.001 is described.
      
      The documentation still has to be implemented step by step for all
      functions and test programs.
  17. 09 Dec, 2015 1 commit
  18. 26 Nov, 2015 1 commit
    • ELPA_2015.11 release fix · 318ba8e2
      Andreas Marek authored
      The API versioning number was not updated correctly at the release.
      This led to a wrong soname.
      
      This is fixed now.
  19. 16 Nov, 2015 1 commit
    • ELPA_2015.11.001 release · 16ad394d
      Andreas Marek authored
      Due to the efforts of Intel, ELPA now features built-in
      support of AVX2 and FMA for the latest Intel processors.
  20. 05 Nov, 2015 1 commit
  21. 04 Nov, 2015 1 commit
  22. 03 Nov, 2015 1 commit
    • Update of C test cases · 505004e7
      Andreas Marek authored
      The examples of how to invoke ELPA from a C program have been updated.
      There are now examples for ELPA1 and ELPA2, both real and complex case.
      The test cases still have less functionality than their Fortran
      counterparts; they are just meant as a "proof-of-concept".
  23. 24 Aug, 2015 1 commit
  24. 16 Jun, 2015 1 commit
  25. 26 May, 2015 1 commit
  26. 21 May, 2015 1 commit
  27. 19 May, 2015 1 commit
  28. 29 Apr, 2015 5 commits
    • Cleanup of configure.ac · 701a7cff
      Andreas Marek authored
      Remove variables which are not needed (anymore)
    • configure.ac: move ELPA specific macros into ./m4 · 93c19c5e
      Andreas Marek authored
      The macros which define the functionality to test
      for
       - a specific real/complex kernel (not all available kernels)
      
      are now defined in files in the m4 directory
    • Cleanup of configure.ac · 18c83c76
      Andreas Marek authored
      Remove variables which are not needed (anymore)
    • configure.ac: move ELPA specific macros into ./m4 · c788ec6b
      Andreas Marek authored
      The macros which define the functionality to test
      for
       - GPU support only (no CPU based kernels)
       - a specific real/complex kernel (not all available kernels)
      
      are now defined in files in the m4 directory
    • configure.ac: treat GPU kernel as other kernels · 0a27d7c8
      Andreas Marek authored
      Configure now treats the GPU kernels as any other kernel, i.e.
      if GPU support is enabled (and it is possible to build it) then
      it will be built in ADDITION to all other possible kernels for
      the desired hardware.
      
      Also, it is possible to configure the build process for
      the GPU version ONLY (as it was already possible to trigger the
      build for only ONE specific real/complex kernel).
      
      Note: The sources at the moment CANNOT handle this, i.e. if
      GPU support is configured, the GPU only code path is compiled.
      This will be changed in the near future.
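The selection rules described for configure can be written down as a small decision function. This Python sketch only models the logic stated in the commit message; it is not the m4 code, and the function name, its parameters, and the kernel labels in the usage lines are assumptions for illustration. As the note above says, the sources at that point still compiled the GPU-only path whenever GPU support was configured.

```python
def kernels_to_build(cpu_kernels, gpu_enabled=False, gpu_only=False,
                     only_kernel=None):
    """Return the list of kernels configure would select.

    cpu_kernels: CPU kernels that can be built on the target hardware.
    gpu_enabled: GPU support was requested and can be built.
    gpu_only:    build the GPU version ONLY.
    only_kernel: build only ONE specific kernel from cpu_kernels.
    """
    if gpu_only:
        return ["gpu"]
    if only_kernel is not None:
        if only_kernel not in cpu_kernels:
            raise ValueError(f"kernel {only_kernel!r} cannot be built")
        return [only_kernel]
    selected = list(cpu_kernels)
    if gpu_enabled:
        # GPU is treated like any other kernel: added to the CPU ones.
        selected.append("gpu")
    return selected


if __name__ == "__main__":
    cpu = ["generic", "generic_simple", "sse"]   # hypothetical labels
    print(kernels_to_build(cpu, gpu_enabled=True))
    print(kernels_to_build(cpu, gpu_only=True))
    print(kernels_to_build(cpu, only_kernel="generic"))
```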
  29. 28 Apr, 2015 1 commit
  30. 27 Apr, 2015 1 commit
    • Handle different OpenMP flags for Fortran and C · ba9a188f
      Lorenz Huedepohl authored
      There was an inconsistency when the OpenMP flag was different for the
      Fortran and C compiler (e.g. -openmp for ifort and -fopenmp for gcc).
      
      This led to strange errors when linking the example program with the C
      main() routine when using Intel Fortran, Intel MPI, and GCC together. A
      typical error message was
      
        /usr/bin/ld: MPIR_Thread: TLS definition in [...]/intel64/lib/libmpi_dbg_mt.so section .tbss mismatches non-TLS definition in [...]/intel64/lib/libmpi_dbg.so section .bss
        [...]/intel64/lib/libmpi_dbg_mt.so: could not read symbols: Bad value
      
      The reason seems to be that the various MPI wrapper shell scripts
      (mpicc, mpiifort) need the correct OpenMP option to select the
      thread-safe Intel MPI debug library. Previously, OPENMP_FCFLAGS was
      always appended to LDFLAGS, which did not trigger this when linking a C
      main program with mpicc.
  31. 24 Mar, 2015 1 commit
  32. 23 Mar, 2015 2 commits
  33. 19 Mar, 2015 1 commit