WIP: Made _sumup of KL faster by summing up in parallel when possible (!511) · Merge requests · ift / NIFTy

Reimar H Leike requested to merge faster_MPIKL into NIFTy_6 May 27, 2020

Marked as work in progress because

has to be tested whether it is really faster
Speed could in theory be even more increased if one uses the numpy versions of communication, i.e. Send and Recv

The basic idea of this change is to sum up the KL in a tree-type structure. This goes as follows. Suppose we have this list:

1  2  3  4  5  6

Then we combine the numbers pairwise:

1  2  3  4  5  6
| /   |  /  |  /
3     7     11

We then iterate this procedure:

3  7  11
| /   | 
10    11

Here the 11 was at the end of the list, and was thus not summed. Finally we get

10 11
|  /  
21

This procedure converges in O(ln(N)) steps, if N is the length of the list. Suppose the objects of the list are distributed over different MPI tasks, then in every step we need at most N/2 communications. However, because the range of numbers is shared consecutively, every task needs to communicate at most 2 times per iteration, so most communications are done in parallel. The method is organized such that the result of a sum is always computed by the higher ranked tasks. i.e. if the list is distributed over 4 tasks as follows:

1  1  2  2  3  4

Then the ranks that hold the sums evolve as follows:

1  1  2  3  3  4
| /   |  /  |  /
1     3     4

So far two communication were needed, with task 3 needing to communicate twice. It continues as

with two more necessary communications. In the end task 4 broadcasts the result.

Edited May 27, 2020 by Reimar H Leike

WIP: Made _sumup of KL faster by summing up in parallel when possible

Merge request reports