Allow using NVIDIA CUB in the real GPU kernel (might give ~10% speedup)
