Fix cublas caching for cublasGemv, cublasGemm (!161)

Fix the problem with cublas caching for cublasGemv, cublasGemm.

cuBLAS introduced a cuBLASLt heuristics cache that stores the mapping from matmul problems to the kernels previously selected by the heuristics. This reduces the host-side overhead for repeated matmul problems. See https://docs.nvidia.com/cuda/cublas/index.html#cublasLt-heuristics-cache.

For the intermediate cuBLAS version we have to switch caching off by hand by calling cublasLtHeuristicsCacheSetCapacity(0).
