Check performance of scan over angles.
!30 (merged) replaced the vectorized mapping over angles in the compiled kernel with a scan (i.e. a serial loop) to reduce memory usage. However, a serial loop may under-utilize the GPU when tiles are too small. This needs to be benchmarked across tile sizes and, if necessary, improved.
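A possible starting point for the check is a micro-benchmark that times both variants for several tile sizes. The sketch below assumes the kernel is written in JAX, with `jax.vmap` standing in for the vectorized mapping and `jax.lax.scan` for the serial loop. `project_one`, `project_vmap`, `project_scan`, `best_time` and `compare` are hypothetical names, and the per-angle work is a placeholder rather than the actual kernel from !30.

```python
import time

import jax
import jax.numpy as jnp


def project_one(tile, angle):
    # Hypothetical placeholder for the real per-angle work on one square tile.
    c, s = jnp.cos(angle), jnp.sin(angle)
    return c * jnp.sum(tile, axis=0) + s * jnp.sum(tile, axis=1)


@jax.jit
def project_vmap(tile, angles):
    # Vectorized mapping: all angles are processed in one batched computation.
    return jax.vmap(project_one, in_axes=(None, 0))(tile, angles)


@jax.jit
def project_scan(tile, angles):
    # Serial loop: one angle per step, so less intermediate memory is live.
    def step(carry, angle):
        return carry, project_one(tile, angle)

    _, out = jax.lax.scan(step, None, angles)
    return out


def best_time(fn, *args, repeats=5):
    # Warm-up call triggers compilation so it is excluded from the timing.
    fn(*args).block_until_ready()
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        # block_until_ready waits for the asynchronously dispatched result.
        fn(*args).block_until_ready()
        times.append(time.perf_counter() - t0)
    return min(times)


def compare(tile_size, n_angles=64):
    # Returns (vmap_seconds, scan_seconds) for one square tile size.
    tile = jnp.ones((tile_size, tile_size), dtype=jnp.float32)
    angles = jnp.linspace(0.0, jnp.pi, n_angles, endpoint=False)
    return (
        best_time(project_vmap, tile, angles),
        best_time(project_scan, tile, angles),
    )
```

Calling `compare(n)` for a range of tile sizes (for example 16, 64, 256, 1024) on the target GPU would show whether, and below which size, the scan variant falls behind the vectorized one. The numbers only become meaningful once `project_one` is replaced by the real per-angle computation.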