1. 21 Jan, 2021 1 commit
  2. 14 Dec, 2020 1 commit
  3. 29 Sep, 2020 1 commit
  4. 25 Sep, 2020 1 commit
  5. 11 Aug, 2020 1 commit
  6. 10 Aug, 2020 1 commit
  7. 31 Jul, 2020 2 commits
  8. 10 Jun, 2020 1 commit
  9. 05 Jun, 2020 1 commit
  10. 08 Apr, 2020 1 commit
  11. 03 Apr, 2020 1 commit
  12. 28 Mar, 2020 2 commits
  13. 24 Mar, 2020 1 commit
  14. 18 Mar, 2020 1 commit
  15. 05 Mar, 2020 2 commits
  16. 02 Mar, 2020 1 commit
  17. 28 Feb, 2020 1 commit
  18. 04 Dec, 2019 1 commit
  19. 20 Nov, 2019 2 commits
    • GPU memory optimization in ELPA2 · af7bb4a0
      Wenzhe Yu authored
      * Removed redundant malloc, memset and memcpy
      * Use pinned host memory
      * Implemented blocking for GPU code path in step5
      * Removed unused code
    • Rewrite compute_hh_trafo CUDA kernels · 6cd5a4f1
      Wenzhe Yu authored
      * Switch to a simple non-WY algorithm
      * Unify real and complex cases
      * Update reduction kernel
      * Use __shfl_xor_sync for warp reduce (CUDA 9+)
      * Support 2^n block size, n = 1,2,...,10
      * Use templates when possible
      * Clean up unused CUDA functions
      * Increase default stripe width when using GPU
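      The warp reduce mentioned in this commit can be illustrated with a host-side emulation. This is a minimal sketch, not the code from the commit: `butterfly_sum` and `kWarpSize` are hypothetical names, and array slots stand in for GPU lanes. On a device, the exchange in the inner loop would be done with `__shfl_xor_sync`. The power-of-two size is what makes the XOR pairing cover every lane.

      ```cpp
      #include <array>
      #include <cstddef>

      // Host-side emulation of a warp-level butterfly (XOR) reduction.
      // Assumption: 32 lanes per warp, as on current NVIDIA GPUs.
      constexpr std::size_t kWarpSize = 32;

      // Hypothetical helper: after the call, every lane holds the sum of all lanes.
      inline void butterfly_sum(std::array<double, kWarpSize>& lanes) {
          for (std::size_t offset = kWarpSize / 2; offset > 0; offset /= 2) {
              // Snapshot the values before the exchange, since all lanes of a
              // real warp read their partner's value simultaneously.
              const std::array<double, kWarpSize> before = lanes;
              for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
                  // lane ^ offset is the partner lane, as in __shfl_xor_sync.
                  lanes[lane] += before[lane ^ offset];
              }
          }
      }
      ```

      After log2(32) = 5 exchange steps every lane holds the full sum, so no separate broadcast step is needed.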
  20. 28 Oct, 2019 1 commit
    • partially addressing issues with the GPU kernel · ec5b3bec
      Pavel Kus authored
      This commit addresses several issues. It essentially forbids the use of
      the GPU kernel, which became obsolete and caused problems. It does not
      completely remove the related code, nor does it prevent the user from
      explicitly selecting the GPU kernel. However, if the user does select
      it, a warning is issued and the GENERIC kernel is used instead. In
      more detail:
      * Commenting out operations in the GPU kernel which do not compile with
        CUDA 10.1. This makes the kernel definitely not usable (but that was
        true even before)
      * removing the gpu_tridiag_band option, since the tridiag->banded routine
        is actually not ported to GPU at all. This step will thus always be
        run on the CPU
      * removing the gpu_trans_ev_tridi_to_band option, since the GPU version
        of this step cannot run without the GPU kernel and it is not usable.
        This step will thus also be performed on the CPU
      * modifying REAL_GPU_KERNEL_ONLY_WHEN_GPU_IS_ACTIVE and
        COMPLEX_GPU_KERNEL_ONLY_WHEN_GPU_IS_ACTIVE such that the GPU kernel is
        not considered during the autotuning
      
      * TODO: the GPU kernel can still be enforced by the user. In this
        case, a warning is issued during the calculation and the kernel is
        switched to the GENERIC one. This should be improved so that it is
        not possible to choose the GPU kernel in the first place.
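      The fallback behaviour described in this commit could look roughly like the following. This is a hedged sketch only: the `Kernel` enum and `resolve_kernel` are hypothetical names for illustration and do not reflect ELPA's actual kernel list or API.

      ```cpp
      #include <iostream>

      // Hypothetical kernel identifiers; the real list of kernels is much longer.
      enum class Kernel { Generic, Gpu };

      // Returns the kernel that would actually run for a given user request.
      // An explicit GPU request is accepted but replaced by GENERIC with a warning.
      inline Kernel resolve_kernel(Kernel requested, std::ostream& log = std::cerr) {
          if (requested == Kernel::Gpu) {
              log << "warning: GPU kernel is not usable, using GENERIC instead\n";
              return Kernel::Generic;
          }
          return requested;
      }
      ```

      Rejecting the GPU value when the option is set, rather than at run time, would be one way to address the TODO above.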
  21. 23 Oct, 2019 1 commit
  22. 22 Oct, 2019 1 commit
    • fixing a_dev not being freed · 57821b44
      Pavel Kus authored
      a_dev was never freed on the GPU.
      However, this might not be enough. What if bandred runs on the GPU and
      band_to_tridi on the CPU? a_dev is then not allocated. This has to be
      rethought in general.
  23. 19 Oct, 2019 1 commit
  24. 17 Oct, 2019 1 commit
    • Experimental feature: 64bit integer support for MPI · 043ddf39
      Andreas Marek authored
      ELPA can now be linked against a 64bit integer version of MPI and
      ScaLAPACK. This is an experimental feature.
      
      The following points are still to be done
      - does not work with real QR-decomposition
      - generalized routines return wrong results
      - the C tests and the C Cannon algorithm implementation do not
        work (no 64bit header files for MPI *at least* with Intel MPI)
  25. 10 Oct, 2019 1 commit
    • HETEROGENOUS_CLUSTER support · dd47b584
      Andreas Marek authored
      On a heterogeneous cluster of nodes with different CPUs, the
      _experimental_ feature (--enable-heterogenous-cluster-support) can be used:
      
      It compares the (Intel) cpuid set of all CPUs which are used by ELPA MPI
      processes and finds the SIMD instruction set that is supported by all
      CPUs in use. The ELPA 2stage back-transformation kernel (a.k.a "kernel")
      will be set accordingly on all MPI processes.
      
      This feature can override the kernel setting made previously by
      the user!
      
      At the moment it only works for Intel CPUs, i.e. clusters that mix
      nodes with Intel CPUs and e.g. AMD CPUs are currently _NOT_
      supported.
      
      Since this is an experimental feature, it might be dropped again in the
      future, if it turns out not to be useful for the users
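      The selection of a common instruction set can be sketched as an intersection of per-process feature masks. The code below is an illustration under assumptions: the `SimdBit` values, `common_features` and `best_common_simd` are hypothetical and not taken from ELPA. In an MPI program the intersection would typically be a bitwise-AND all-reduce (`MPI_Allreduce` with `MPI_BAND`) rather than a loop over a vector.

      ```cpp
      #include <cstdint>
      #include <vector>

      // Hypothetical feature bits, ordered from oldest to newest instruction set.
      enum SimdBit : std::uint32_t {
          kSse    = 1u << 0,
          kAvx    = 1u << 1,
          kAvx2   = 1u << 2,
          kAvx512 = 1u << 3,
      };

      // Intersect the feature masks reported by all processes.
      // An empty input yields 0, i.e. no common SIMD support.
      inline std::uint32_t common_features(const std::vector<std::uint32_t>& per_process) {
          if (per_process.empty()) return 0u;
          std::uint32_t common = ~0u;
          for (std::uint32_t mask : per_process) common &= mask;
          return common;
      }

      // Pick the newest instruction set in the common mask; 0 means "generic only".
      inline std::uint32_t best_common_simd(std::uint32_t common) {
          for (std::uint32_t bit : {kAvx512, kAvx2, kAvx, kSse}) {
              if (common & bit) return bit;
          }
          return 0u;
      }
      ```

      With one AVX2-only node among AVX-512 nodes, the common mask loses the AVX-512 bit and every process would select the AVX2 kernel.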
  26. 26 Sep, 2019 1 commit
  27. 04 Jul, 2019 1 commit
  28. 16 May, 2019 1 commit
    • adding instrumentation for likwid · 9bbe883f
      Pavel Kus authored
      Also introducing the --with-likwid parameter to the configure script.

      It can be used to compile with likwid and to let likwid measure the
      performance of individual compute steps.
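      Instrumenting a compute step with the LIKWID Marker API usually follows the pattern below. This is a generic sketch, not the instrumentation added in this commit: `sum_of_squares` and `run_instrumented` are hypothetical stand-ins for a solver step, while the `LIKWID_MARKER_*` macros are the ones provided by LIKWID when compiling with `-DLIKWID_PERFMON`.

      ```cpp
      #include <vector>

      #ifdef LIKWID_PERFMON
      #include <likwid.h>  // newer LIKWID releases also ship <likwid-marker.h>
      #else
      // No-op fallbacks so the code still builds without LIKWID.
      #define LIKWID_MARKER_INIT
      #define LIKWID_MARKER_START(regionTag)
      #define LIKWID_MARKER_STOP(regionTag)
      #define LIKWID_MARKER_CLOSE
      #endif

      // Hypothetical compute step wrapped in a named marker region.
      inline double sum_of_squares(const std::vector<double>& v) {
          LIKWID_MARKER_START("sum_of_squares");
          double acc = 0.0;
          for (double x : v) acc += x * x;
          LIKWID_MARKER_STOP("sum_of_squares");
          return acc;
      }

      // The marker API is initialised once before and closed once after all regions.
      inline double run_instrumented(const std::vector<double>& v) {
          LIKWID_MARKER_INIT;
          const double result = sum_of_squares(v);
          LIKWID_MARKER_CLOSE;
          return result;
      }
      ```

      Running such a binary under `likwid-perfctr -m` then reports counters per named region.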
  29. 18 Apr, 2019 1 commit
  30. 16 Jan, 2019 1 commit
  31. 11 Jan, 2019 1 commit
  32. 08 Nov, 2018 1 commit
  33. 07 Nov, 2018 2 commits
  34. 15 Oct, 2018 1 commit
    • doing GPU initialization for the first time only · d900a3e1
      Pavel Kus authored
      The GPU initialization is actually quite costly, e.g. on Minsky it
      takes roughly 0.7s. That hurts performance for small matrices.
      Thus a check has been added, and the GPU is now initialized only the
      first time.
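      A check of this kind can be sketched with a simple guard flag. The names below (`expensive_gpu_init`, `ensure_gpu_initialized`, `init_call_count`) are hypothetical, and the counter only stands in for the real device setup. The plain flag is an assumption made for brevity and is not thread-safe.

      ```cpp
      // Counts how often the expensive setup ran (for illustration only).
      inline int& init_call_count() {
          static int count = 0;
          return count;
      }

      // Hypothetical stand-in for the costly device initialization.
      inline void expensive_gpu_init() {
          ++init_call_count();  // a real implementation would set up the device here
      }

      // Runs the expensive setup on the first call only; later calls return at once.
      inline void ensure_gpu_initialized() {
          static bool initialized = false;  // assumption: called from a single thread
          if (!initialized) {
              expensive_gpu_init();
              initialized = true;
          }
      }
      ```

      Calling `ensure_gpu_initialized()` before every solve then pays the setup cost once, which matters most when many small matrices are solved in sequence.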
  35. 04 Jun, 2018 1 commit