Alexander Heinecke authored
The current fix does as much blocking as possible, which should be beneficial from both a compute and a communication point of view. Additionally, a second possible fix was added that simply calls the blocked version if the local matrix is sufficiently large. At scale, this might create more, smaller messages.
b6ce5fd4