Implement MPI-parallel cumsum
Given: every task has an array `a(:,freq)`, where the last axis is distributed over tasks.
Wanted: `suma`, where `suma[:,i] = np.sum(a[:,:i+1], axis=-1)`, i.e. the inclusive cumulative sum over the full (undistributed) last axis.
Possible approach:
- every task computes `suma_loc = np.cumsum(a, axis=-1)`
- the "last" entries are gathered on all tasks: `tmp = np.array(comm.allgather(suma_loc[:,-1]))`
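- compute the local offset to add to `suma_loc`: `tmp = np.sum(tmp[:rank], axis=0)` (the gathered array has one row per task, so the sum runs over the rows of all lower ranks and is zero on rank 0)
- add to `suma_loc`: `suma = suma_loc + tmp[:, np.newaxis]`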
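
The steps above could be put together roughly as follows. This is only a minimal sketch, not a finished implementation. The function name `parallel_cumsum` and the variable names `totals` and `offset` are made up for illustration. It assumes that `comm` is an mpi4py-style communicator (only `Get_rank()` and `allgather()` are used), that the blocks are ordered by rank along the last axis, and that every task holds at least one entry along that axis.

```python
import numpy as np


def parallel_cumsum(a, comm):
    """Cumulative sum along the last axis of an array distributed over tasks.

    `a` is the local block of the distributed array; `comm` is assumed to be
    an mpi4py-style communicator. Blocks are assumed to be ordered by rank
    along the last axis, and no local block may be empty along that axis.
    """
    rank = comm.Get_rank()
    # Local cumulative sum over this task's slice of the last axis.
    suma_loc = np.cumsum(a, axis=-1)
    # Gather every task's block total; stacking gives shape (ntasks, ...).
    totals = np.array(comm.allgather(suma_loc[..., -1]))
    # Offset = sum of the totals of all lower-ranked tasks (zeros on rank 0).
    offset = np.sum(totals[:rank], axis=0)
    # Add a trailing axis so the offset broadcasts along the distributed axis.
    return suma_loc + offset[..., np.newaxis]
```

With mpi4py, each task would call something like `suma = parallel_cumsum(a, MPI.COMM_WORLD)` on its local block. The lowercase `allgather` sends pickled objects, which is convenient but may be slow for large leading dimensions. A buffer-based collective might be preferable there.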