Fix cublas caching for cublasGemv, cublasGemm
Fix the problem with the cuBLASLt heuristics cache that affects cublasGemv and cublasGemm.
The cache was introduced in cublas 11.11.3.6 (https://docs.nvidia.com/cuda/archive/11.8.0/cuda-toolkit-release-notes/index.html):
- Introduced cuBlasLt heuristics cache that stores the mapping of matmul problems to kernels previously selected by heuristics. That helps reduce the host-side overhead for repeating matmul problems. Refer to https://docs.nvidia.com/cuda/cublas/index.html#cublasLt-heuristics-cache.
NVIDIA resolved the caching problem in cublas 12.3.4.1 (https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cublas-release-12-3-update-1).
For the intermediate cublas versions we have to switch off caching by hand using cublasLtHeuristicsCacheSetCapacity(0).
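
A minimal sketch of how such a version-guarded workaround could look; it is not necessarily the code in this change. It assumes cublasLtGetVersion() returns the version encoded as major * 10000 + minor * 100 + patch, so the fourth (build) component of 11.11.3.6 and 12.3.4.1 is not visible to the check. The helper names and the EXAMPLE_WITH_CUBLASLT macro are hypothetical and exist only for illustration.

```cpp
#include <cstddef>

// Assumption: cuBLAS encodes its version as major * 10000 + minor * 100 + patch,
// and cublasLtGetVersion() returns that value.
constexpr std::size_t encode_cublas_version(int major, int minor, int patch) {
    return static_cast<std::size_t>(major) * 10000 +
           static_cast<std::size_t>(minor) * 100 +
           static_cast<std::size_t>(patch);
}

// Hypothetical helper: true for versions that ship the heuristics cache
// (11.11.3 and later) but not yet the fix (before 12.3.4).
constexpr bool needs_heuristics_cache_workaround(std::size_t version) {
    return version >= encode_cublas_version(11, 11, 3) &&
           version < encode_cublas_version(12, 3, 4);
}

#ifdef EXAMPLE_WITH_CUBLASLT  // hypothetical macro; define it when building against cuBLASLt
#include <cublasLt.h>

// Disables the heuristics cache on affected versions only.
// Returns true if no workaround was needed or the cache was disabled successfully.
inline bool disable_heuristics_cache_if_needed() {
    if (!needs_heuristics_cache_workaround(cublasLtGetVersion())) {
        return true;
    }
    // A capacity of 0 turns the cache off.
    return cublasLtHeuristicsCacheSetCapacity(0) == CUBLAS_STATUS_SUCCESS;
}
#endif
```

Checking the runtime version with cublasLtGetVersion() rather than the compile-time CUBLAS_VERSION macro keeps the guard correct when the library is loaded dynamically and differs from the headers used at build time.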