1. 04 Dec, 2019 1 commit
  2. 20 Nov, 2019 4 commits
    • Use cuBLAS in multiply_a_b · 85782a1f
      Wenzhe Yu authored
    • GPU memory optimization in ELPA2 · af7bb4a0
      Wenzhe Yu authored
      * Removed redundant malloc, memset and memcpy
      * Use pinned host memory
      * Implemented blocking for GPU code path in step5
      * Removed unused code
    • Extend CUDA wrapper · 6e5c03a6
      Wenzhe Yu authored
      * cudaMallocHost
      * cudaFreeHost
      * cudaHostRegister
      * cudaHostUnregister
    • Rewrite compute_hh_trafo CUDA kernels · 6cd5a4f1
      Wenzhe Yu authored
      * Switch to a simple non-WY algorithm
      * Unify real and complex cases
      * Update reduction kernel
      * Use __shfl_xor_sync for warp reduce (CUDA 9+)
      * Support 2^n block size, n = 1,2,...,10
      * Use templates when possible
      * Clean up unused CUDA functions
      * Increase default stripe width when using GPU
  3. 14 Oct, 2019 6 commits
  4. 09 Oct, 2019 8 commits
  5. 08 Oct, 2019 4 commits
  6. 07 Oct, 2019 2 commits
  7. 01 Oct, 2019 1 commit
  8. 30 Sep, 2019 1 commit
  9. 26 Sep, 2019 4 commits
  10. 24 Sep, 2019 1 commit
  11. 23 Sep, 2019 2 commits
  12. 20 Sep, 2019 1 commit
  13. 12 Sep, 2019 1 commit
  14. 11 Sep, 2019 1 commit
  15. 09 Sep, 2019 1 commit
  16. 06 Sep, 2019 1 commit
  17. 05 Sep, 2019 1 commit