elpa merge requests
https://gitlab.mpcdf.mpg.de/elpa/elpa/-/merge_requests
Updated: 2024-03-28T10:01:40Z

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/merge_requests/174
Fix merge of hipsolver, NCCL/RCCL bugs
Updated: 2024-03-28T10:01:40Z
Author: Petr Karpov
- Fix hipsolver merge problem
- Fix RCCL bug, correctness tested on LUMI
- Fix NCCL bugs: the NCCL codepath was deactivated in elpa1/tridiag_template.F90, invert_trm_template.F90, and multiply_a_b/elpa_multiply_a_b_template.F90
- Change bool->int in ELPA1 tridiagonalization C-backend and Fortran interfaces
Assignee: Andreas Marek

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/merge_requests/173
Man fix for elpa_autotune_deallocate
Updated: 2024-03-20T20:31:38Z
Author: Petr Karpov
Fix C-interface for 'elpa_autotune_deallocate' man page
Assignee: Andreas Marek

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/merge_requests/172
Fix print settings
Updated: 2024-03-20T07:35:42Z
Author: Petr Karpov
- Fix elpa_print_settings for CFLAGS=-D_FORTIFY_SOURCE=2 (like in OBS GNU installation)
- Move setup_gpu() after setting runtime options in test.c
- Add DeviceSynchronize() after kernel call in [cuda|hip]_check_device_info_FromC. This fixes a potential failure to catch an error in gpusolver?potrf when info_dev != 0
Assignee: Andreas Marek

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/merge_requests/171
Add man page for elpa_setup_gpu
Updated: 2024-03-14T08:42:40Z
Author: Petr Karpov
Assignee: Andreas Marek

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/merge_requests/170
Draft: Master pre stage
Updated: 2024-03-09T07:24:23Z
Author: Andreas Marek

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/merge_requests/169
Master pre stage
Updated: 2024-03-09T07:23:09Z
Author: Andreas Marek

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/merge_requests/168
Rccl support
Updated: 2024-03-02T12:07:40Z
Author: Andreas Marek
Assignee: Andreas Marek

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/merge_requests/167
Convert recommendation into possibility
Updated: 2024-03-04T10:17:37Z
Author: Tobias Melson
Assignee: Andreas Marek

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/merge_requests/166
GPU Cholesky optimization, solves #109
Updated: 2024-03-06T06:09:55Z
Author: Petr Karpov
- Added elpa_gpu_ccl_transpose_vectors in Cholesky-GPU
- Extract memcpy of info outside of cublas?potrf
- Move nccl_group_start out of the loops
- Delete unused vendor_agnostic_layer_template.F90
- Add new cusolverDnXpotrf interface (cusolverDn?potrf is deprecated)
Assignee: Andreas Marek

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/merge_requests/165
Master pre stage
Updated: 2024-02-28T07:07:26Z
Author: Andreas Marek

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/merge_requests/164
Add GPU device information to gpu_object
Updated: 2024-02-23T07:11:26Z
Author: Andreas Marek
- At startup, some GPU device parameters are queried and stored
- For example, the SM (streaming multiprocessor) count is passed to (some) kernels

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/merge_requests/163
Gpu cholesky
Updated: 2024-02-15T06:18:07Z
Author: Andreas Marek
Assignee: Andreas Marek

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/merge_requests/162
Master pre stage
Updated: 2024-02-07T11:40:13Z
Author: Andreas Marek
Assignee: Andreas Marek

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/merge_requests/161
Fix cublas caching for cublasGemv, cublasGemm
Updated: 2024-02-06T07:21:13Z
Author: Petr Karpov
Fix the problem with cublas caching for cublasGemv, cublasGemm.
The problem was introduced with cublas 11.11.3.6 (https://docs.nvidia.com/cuda/archive/11.8.0/cuda-toolkit-release-notes/index.html):
- Introduced cuBlasLt heuristics cache that stores the mapping of matmul problems to kernels previously selected by heuristics. That helps reduce the host-side overhead for repeating matmul problems. Refer to https://docs.nvidia.com/cuda/cublas/index.html#cublasLt-heuristics-cache.
The problem with caching was resolved by NVIDIA with cublas 12.3.4.1 (https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cublas-release-12-3-update-1).
For the intermediate cublas versions, caching has to be switched off by hand using cublasLtHeuristicsCacheSetCapacity(0).
Assignee: Andreas Marek

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/merge_requests/160
Mpi setup object
Updated: 2024-01-25T11:08:56Z
Author: Andreas Marek
Assignee: Andreas Marek

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/merge_requests/159
Async in trans ev
Updated: 2023-12-21T08:15:53Z
Author: Andreas Marek
Assignee: Andreas Marek

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/merge_requests/158
Master pre stage
Updated: 2023-12-17T09:17:22Z
Author: Andreas Marek
Assignee: Andreas Marek

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/merge_requests/157
Fix memory access violation bug in double_complex cuda_store_u_v_in_uv_vu kernel
Updated: 2023-12-11T15:14:30Z
Author: Petr Karpov
Assignee: Andreas Marek

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/merge_requests/156
Master pre stage
Updated: 2023-12-07T13:06:07Z
Author: Andreas Marek
Assignee: Andreas Marek

https://gitlab.mpcdf.mpg.de/elpa/elpa/-/merge_requests/155
elpa1 gpu optimization
Updated: 2023-12-05T08:44:18Z
Author: Petr Karpov
Fixes for the ELPA1 GPU optimizations concerning the synchronization in dot-product-like kernels. The vulnerability was exposed by SYCL-on-CPU tests.
Assignee: Andreas Marek
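
Note on merge request 161: the description names two cublas versions, 11.11.3.6 (caching problem introduced) and 12.3.4.1 (problem resolved), and says the cache must be switched off by hand in between. The following is a minimal sketch of that version-range check only, not ELPA's actual implementation. The names lib_version, version_cmp and needs_cache_workaround are hypothetical, and treating the affected range as "11.11.3.6 inclusive up to, but not including, 12.3.4.1" is an interpretation of the description.

```c
/* Sketch only: lib_version, version_cmp and needs_cache_workaround are
   hypothetical names for illustration; they are not part of cuBLAS or ELPA. */

/* Four-part version number, e.g. 11.11.3.6 */
typedef struct {
    int major, minor, patch, build;
} lib_version;

/* Lexicographic comparison: returns -1, 0 or 1 for a < b, a == b, a > b. */
int version_cmp(lib_version a, lib_version b)
{
    if (a.major != b.major) return (a.major < b.major) ? -1 : 1;
    if (a.minor != b.minor) return (a.minor < b.minor) ? -1 : 1;
    if (a.patch != b.patch) return (a.patch < b.patch) ? -1 : 1;
    if (a.build != b.build) return (a.build < b.build) ? -1 : 1;
    return 0;
}

/* Returns 1 if v lies in [11.11.3.6, 12.3.4.1), i.e. the versions for which
   the merge request says caching has to be switched off by hand; else 0.
   The interval bounds are taken from the merge request text; the
   inclusive/exclusive choice is an assumption. */
int needs_cache_workaround(lib_version v)
{
    const lib_version first_affected = {11, 11, 3, 6};
    const lib_version first_fixed    = {12, 3, 4, 1};
    return version_cmp(v, first_affected) >= 0 &&
           version_cmp(v, first_fixed) < 0;
}
```

A caller would fill lib_version from whatever version query its build provides and, when needs_cache_workaround returns 1, call cublasLtHeuristicsCacheSetCapacity(0) as described in the merge request.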