- 03 Mar, 2021 1 commit
-
-
Andreas Marek authored
-
- 02 Mar, 2021 1 commit
-
-
Andreas Marek authored
-
- 27 Feb, 2021 2 commits
-
-
Andreas Marek authored
-
Andreas Marek authored
-
- 26 Feb, 2021 4 commits
-
-
Andreas Marek authored
-
Andreas Marek authored
-
Andreas Marek authored
  - Rename keyword "gpu" -> "nvidia-gpu"
  - Add keyword "amd-gpu"
-
Andreas Marek authored
-
- 25 Feb, 2021 1 commit
-
-
Andreas Marek authored
-
- 24 Feb, 2021 3 commits
-
-
Andreas Marek authored
-
Andreas Marek authored
-
Andreas Marek authored
The GPU logic has been implemented in the OpenMP code paths of ELPA2. Currently, this means that, _internal_ to ELPA2, the number of OpenMP threads is set to one (independent of how many threads the calling application uses), and the original value is restored when ELPA returns. Although this does not cover the general case, it is _not_ a limitation in practice: in the GPU case no work is done on the CPU, so threading would not help.
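The save-and-restore behaviour described in this message can be sketched as a small scope guard. This is only an illustration in C++, not ELPA's actual (Fortran) implementation: `ScopedSingleThread` and `threads_seen_inside_solver` are hypothetical names, and the stub functions in the `#else` branch exist only so the sketch also builds without OpenMP.

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
// Assumption: fallback stubs mimicking the two OpenMP calls used below,
// so the sketch compiles when OpenMP is not enabled.
static int g_stub_threads = 4;  // pretend the caller configured four threads
static int omp_get_max_threads() { return g_stub_threads; }
static void omp_set_num_threads(int n) { g_stub_threads = n; }
#endif

// Hypothetical helper: forces one OpenMP thread for the lifetime of the
// object and restores the caller's setting when it goes out of scope.
class ScopedSingleThread {
public:
    ScopedSingleThread() : saved_(omp_get_max_threads()) {
        omp_set_num_threads(1);
    }
    ~ScopedSingleThread() { omp_set_num_threads(saved_); }

    ScopedSingleThread(const ScopedSingleThread&) = delete;
    ScopedSingleThread& operator=(const ScopedSingleThread&) = delete;

private:
    int saved_;  // thread count requested by the calling application
};

// Hypothetical stand-in for a GPU-path solver call: reports the thread
// count that code inside the guarded region would observe.
static int threads_seen_inside_solver() {
    ScopedSingleThread guard;
    return omp_get_max_threads();
}
```

Using a guard object rather than paired set/restore calls means the caller's thread count is restored on every exit path, including early returns.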
-
- 15 Feb, 2021 1 commit
-
-
Andreas Marek authored
-
- 14 Dec, 2020 1 commit
-
-
Andreas Marek authored
-
- 18 Nov, 2020 1 commit
-
-
Andreas Marek authored
-
- 17 Nov, 2020 1 commit
-
-
Andreas Marek authored
-
- 25 Sep, 2020 1 commit
-
-
Andreas Marek authored
-
- 18 Sep, 2020 1 commit
-
-
Wenzhe Yu authored
-
- 05 Jun, 2020 1 commit
-
-
Wenzhe Yu authored
-
- 02 Jun, 2020 1 commit
-
-
Andreas Marek authored
-
- 20 Nov, 2019 2 commits
-
-
Wenzhe Yu authored
* cudaMallocHost
* cudaFreeHost
* cudaHostRegister
* cudaHostUnregister
-
Wenzhe Yu authored
* Switch to a simple non-WY algorithm
* Unify real and complex cases
* Update reduction kernel
* Use __shfl_xor_sync for warp reduce (CUDA 9+)
* Support 2^n block size, n = 1,2,...,10
* Use templates when possible
* Clean up unused CUDA functions
* Increase default stripe width when using GPU
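The warp reduce mentioned in this commit follows a butterfly pattern: at each step a lane adds the value held by the lane whose index differs in one bit, so after log2(32) = 5 steps every lane holds the full sum. The sketch below emulates that pattern on the host in plain C++; it is not the ELPA kernel. On the device, the inner update would typically be written as `v += __shfl_xor_sync(0xffffffff, v, offset)`. The names `kWarpSize` and `butterfly_sum` are made up for this illustration.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kWarpSize = 32;  // lanes per warp on current NVIDIA GPUs

// Host-side emulation of an XOR-shuffle (butterfly) sum over one warp.
// Each array element plays the role of one lane's register value.
inline std::array<double, kWarpSize> butterfly_sum(
        std::array<double, kWarpSize> lanes) {
    for (std::size_t offset = kWarpSize / 2; offset > 0; offset /= 2) {
        // Copy first: on the GPU all lanes exchange values simultaneously,
        // so every read must see the values from before this step.
        const std::array<double, kWarpSize> snapshot = lanes;
        for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
            lanes[lane] = snapshot[lane] + snapshot[lane ^ offset];
        }
    }
    return lanes;  // every lane now holds the sum of all 32 inputs
}
```

Unlike a shuffle-down reduction, where only lane 0 ends up with the total, the XOR variant leaves the result in every lane, so no extra broadcast step is needed.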
-
- 30 Oct, 2019 1 commit
-
-
Pavel Kus authored
-
- 23 Oct, 2019 1 commit
-
-
Pavel Kus authored
with cudaDeviceSynchronize
-
- 22 Oct, 2019 2 commits
-
-
Sebastian Ohlmann authored
When profiling the GPU version, NVTX can be used to highlight the corresponding regions of the code in the timeline of the profiling tool (nvvp or Nsight Systems). This is very useful for correlating what happens on the GPU with the part of the code that is currently executing.
-
Pavel Kus authored
-
- 14 Oct, 2019 1 commit
-
-
Wenzhe Yu authored
-
- 15 Feb, 2019 1 commit
-
-
Andreas Marek authored
-
- 15 Oct, 2018 1 commit
-
-
Pavel Kus authored
The GPU initialization is actually quite costly, e.g. on Minsky it takes roughly 0.7s. That hurts performance for small matrices. Thus a check has been added, and the GPU should now be initialized only once, on first use.
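The check described in this message amounts to guarding the expensive setup with an "already initialized" flag. A minimal C++ sketch follows, assuming hypothetical names (`GpuState`, `gpu_state`, `ensure_gpu_initialized`); the real device setup is replaced by a counter so the effect is observable, and the simple flag is not thread-safe.

```cpp
// Hypothetical bookkeeping for the one-time GPU setup.
struct GpuState {
    bool initialized = false;  // set after the first successful setup
    int init_calls = 0;        // how often the expensive setup actually ran
};

// Single process-wide state object, created on first use.
inline GpuState& gpu_state() {
    static GpuState state;
    return state;
}

// Runs the costly setup only on the first call; later calls return at once.
inline void ensure_gpu_initialized() {
    GpuState& state = gpu_state();
    if (state.initialized) {
        return;  // skip the expensive path (roughly 0.7s in the case above)
    }
    // Assumption: real code would select the device and create library
    // handles here; this sketch only counts the call.
    ++state.init_calls;
    state.initialized = true;
}
```

Calling `ensure_gpu_initialized()` at the start of every solve then pays the setup cost only once per process, which matters most when many small matrices are solved in sequence.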
-
- 22 Feb, 2018 1 commit
-
-
Andreas Marek authored
-
- 11 Sep, 2017 1 commit
-
-
Andreas Marek authored
-
- 29 Aug, 2017 1 commit
-
-
Andreas Marek authored
-
- 24 Aug, 2017 1 commit
-
-
Andreas Marek authored
-
- 03 Aug, 2017 1 commit
-
-
Andreas Marek authored
-
- 27 Apr, 2017 1 commit
-
-
Pavel Kus authored
on Minsky
-
- 18 Apr, 2017 1 commit
-
-
Pavel Kus authored
-
- 12 Apr, 2017 1 commit
-
-
Pavel Kus authored
-
- 06 Apr, 2017 3 commits
-
-
Andreas Marek authored
-
Andreas Marek authored
-
Andreas Marek authored
-