Correlated field model: adjoint of the linearization is very slow if total_N != 0

As noticed by @parras, @pfrank, and me, the adjoint of the linearization of the correlated field model is very slow if total_N != 0.

Here is a demo:

import nifty8 as ift
import numpy as np

sp1 = ift.RGSpace((4000, 4000))
cfmaker = ift.CorrelatedFieldMaker('')
cfmaker.add_fluctuations(sp1, (0.1, 1e-2), (2, .2), (.01, .5), (-4, 2.),
                         'amp1')
cfmaker.set_amplitude_total_offset(0., (1e-2, 1e-6))
cf0 = cfmaker.finalize(0)

n_tot = 1
cfmaker = ift.CorrelatedFieldMaker('', total_N=n_tot)
cfmaker.add_fluctuations(sp1, (0.1, 1e-2), (2, .2), (.01, .5), (-4, 2.),
                         'amp1', dofdex=np.arange(n_tot))
cfmaker.set_amplitude_total_offset(0., (1e-2, 1e-6), dofdex=np.arange(n_tot))
cf1 = cfmaker.finalize(0)

print("benchmark for total_N = 0")
ift.exec_time(cf0)
print("benchmark for total_N = 1")
ift.exec_time(cf1)

I had a quick look at the issue. The problem seems to be the _Distributor operator in the correlated field model file: https://gitlab.mpcdf.mpg.de/ift/nifty/-/blob/NIFTy_8/src/library/correlated_fields.py#L214

If total_N != 0, this _Distributor is applied multiple times. The application that is very slow in the adjoint direction is at https://gitlab.mpcdf.mpg.de/ift/nifty/-/blob/NIFTy_8/src/library/correlated_fields.py#L362 and the line below it.

Note that this _Distributor operator is only used if total_N != 0, which is why the simple case with total_N = 0 is reasonably fast.
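To make the suspected bottleneck concrete, here is a minimal NumPy-only sketch of what a distributor-type operator does. It assumes, without having checked this against the NIFTy source, that the forward direction is a gather along the first axis and the adjoint is a scatter-add. All function names are hypothetical and not part of NIFTy. If the adjoint relies on something like np.add.at, which has historically been much slower than vectorized reductions on large arrays, that might explain the slowdown; the second variant below is one possible alternative when the number of distinct indices is small.

```python
import numpy as np


def distribute(x, dofdex):
    """Forward direction (assumed): gather slices of x along axis 0."""
    return x[dofdex]


def distribute_adjoint_add_at(y, dofdex, n_src):
    """Adjoint as an unbuffered scatter-add using np.add.at."""
    res = np.zeros((n_src,) + y.shape[1:], dtype=y.dtype)
    np.add.at(res, dofdex, y)
    return res


def distribute_adjoint_loop(y, dofdex, n_src):
    """Adjoint as one vectorized sum per source index.

    Only sensible if n_src is small compared to the size of each slice.
    """
    res = np.zeros((n_src,) + y.shape[1:], dtype=y.dtype)
    for i in range(n_src):
        # Sum all slices of y that were gathered from source index i.
        res[i] = y[dofdex == i].sum(axis=0)
    return res


# Small consistency check with made-up sizes.
rng = np.random.default_rng(0)
dofdex = np.array([0, 1, 0, 2])
x = rng.standard_normal((3, 5))
y = rng.standard_normal((4, 5))

adj_a = distribute_adjoint_add_at(y, dofdex, 3)
adj_b = distribute_adjoint_loop(y, dofdex, 3)

# Both adjoints agree, and <D x, y> equals <x, D^T y>.
print(np.allclose(adj_a, adj_b))
print(np.isclose(np.vdot(distribute(x, dofdex), y), np.vdot(x, adj_a)))
```

Timing both adjoint variants on an array of shape (1, 4000, 4000) with dofdex = np.arange(1) would show whether the scatter-add itself is the expensive part on a given NumPy version.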
