Added the option to also pad at the start of field arrays. Previously, the `FieldZeroPadder` was able to pad either in the center of the field arrays or at the end.
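To make the three padding positions concrete, here is a minimal 1-D sketch in plain Python. `zero_pad_1d` and its `mode` argument are hypothetical names for illustration and are not the `FieldZeroPadder` interface. The "center" mode here simply splits the list in the middle, which is an assumption; the operator may place the split differently.

```python
def zero_pad_1d(values, new_length, mode="end"):
    """Zero-pad a list to new_length (hypothetical helper, not NIFTy API)."""
    values = list(values)
    n_extra = new_length - len(values)
    if n_extra < 0:
        raise ValueError("new_length must not be smaller than the input")
    zeros = [0] * n_extra
    if mode == "start":
        return zeros + values
    if mode == "end":
        return values + zeros
    if mode == "center":
        # Assumption: split in the middle; the first half keeps the extra
        # element when the input length is odd.
        half = (len(values) + 1) // 2
        return values[:half] + zeros + values[half:]
    raise ValueError(f"unknown mode: {mode}")


data = [1, 2, 3, 4]
padded_start = zero_pad_1d(data, 6, "start")
padded_center = zero_pad_1d(data, 6, "center")
padded_end = zero_pad_1d(data, 6, "end")
```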
WIP: More constant support
Fixes #304
`NormalTransform` and `LognormalTransform`

I found the functions around `_LognormalMomentMatching` and `_normal` from `correlated_fields.py` quite handy, so I moved them to a separate file and exposed them in the NIFTy namespace:
- `ift.NormalTransform` for the OpChain that transforms standard normally distributed values to normally distributed values
- `ift.LognormalTransform` for the OpChain that transforms standard normally distributed values to log-normally distributed values with given mean and std
- `ift.utilities.lognormal_moments` to calculate parameters for gauss(x, m, sig) so that exp(gauss(x, m, sig)) has the given mean and std. Used in `ift.LognormalTransform`, but also useful for calculating prior values for lognormal models.
- `ift.utilities.value_reshaper`: helper to make arrays of shape `(N,)` from scalars and arrays of length one.
Also added tests for the transforms.
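The moment matching behind `lognormal_moments` can be sketched with the standard closed-form relations for a log-normal distribution. The function name `lognormal_moments_sketch` and its scalar-only signature are assumptions for illustration, not the NIFTy implementation. Here `m` and `sig` are the parameters of gauss(x, m, sig) from the description above.

```python
import math


def lognormal_moments_sketch(mean, std):
    """Return (m, sig) so that exp(m + sig * xi), xi ~ N(0, 1),
    has the requested mean and standard deviation."""
    if mean <= 0 or std <= 0:
        raise ValueError("mean and std must be positive")
    # Var[exp(g)] / E[exp(g)]^2 = exp(sig^2) - 1
    sig = math.sqrt(math.log1p((std / mean) ** 2))
    # E[exp(g)] = exp(m + sig^2 / 2)
    m = math.log(mean) - 0.5 * sig**2
    return m, sig


m, sig = lognormal_moments_sketch(mean=2.0, std=0.5)

# Plug the parameters back into the closed-form log-normal moments.
implied_mean = math.exp(m + 0.5 * sig**2)
implied_std = implied_mean * math.sqrt(math.expm1(sig**2))
```

The round trip through `implied_mean` and `implied_std` is a quick way to check prior values chosen for a log-normal model.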
This supersedes !511, and @reimar is OK with it :)

Before merging I'll run a few tests with EHT and varying number of MPI tasks.
https://gitlab.mpcdf.mpg.de/ift/nifty/-/merge_requests/511
WIP: Made _sumup of KL faster by summing up in parallel when possible
2020-05-27T15:38:26Z
Reimar H Leike
Marked as work in progress because
1. It has to be tested whether it is really faster.
2. The speed could in theory be increased further by using the NumPy versions of the communication routines, i.e. `Send` and `Recv`.
The basic idea of this change is to sum up the KL in a tree-type structure. This goes as follows.
Suppose we have this list:
```
1 2 3 4 5 6
```
Then we combine the numbers pairwise:
```
1 2 3 4 5 6
| / | / | /
3 7 11
```
We then iterate this procedure:
```
3 7 11
| / |
10 11
```
Here the 11 was at the end of the list, and was thus not summed. Finally we get
```
10 11
| /
21
```
This procedure finishes after O(ln(N)) steps, where N is the length of the list.
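The pairwise procedure above can be sketched in a few lines of serial Python. `tree_sum_levels` is a hypothetical name and the code ignores MPI entirely; it only reproduces the levels shown in the diagrams.

```python
def tree_sum_levels(values):
    """Return every level of the pairwise reduction, from the input
    list down to the single total."""
    if not values:
        raise ValueError("need at least one value")
    current = list(values)
    levels = [current]
    while len(current) > 1:
        # Combine neighbours (0, 1), (2, 3), ...
        nxt = [current[i] + current[i + 1]
               for i in range(0, len(current) - 1, 2)]
        # An unpaired last element is carried over unchanged.
        if len(current) % 2 == 1:
            nxt.append(current[-1])
        current = nxt
        levels.append(current)
    return levels


levels = tree_sum_levels([1, 2, 3, 4, 5, 6])
for level in levels:
    print(level)
```

For six inputs this takes three reduction steps, since each step roughly halves the list.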
Suppose the objects of the list are distributed over different MPI tasks. Then every step needs at most N/2 communications. However, because consecutive ranges of the list are assigned to the tasks, every task needs to communicate at most twice per iteration, so most communications happen in parallel. The method is organized such that the result of a sum is always computed by the higher-ranked task. For example, if the list is distributed over 4 tasks as follows:
```
1 1 2 3 3 4
```
Then the ranks that hold the sums evolve as follows:
```
1 1 2 3 3 4
| / | / | /
1 3 4
```
So far two communications were needed, with task 3 needing to communicate twice (one receive and one send). It continues as
```
1 3 4
| / |
3 4
| /
| /
4
```
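The ownership rule can be checked with a small serial simulation, starting from the rank layout `1 1 2 3 3 4` used in the diagrams. `simulate_owners` is a hypothetical helper that only tracks which rank holds each partial sum and counts one message whenever the two partners live on different ranks. It does not use MPI.

```python
def simulate_owners(owners):
    """Track which rank holds each partial sum and count the
    messages needed in every reduction step."""
    current = list(owners)
    levels = [current]
    messages_per_step = []
    while len(current) > 1:
        nxt, messages = [], 0
        for i in range(0, len(current) - 1, 2):
            left, right = current[i], current[i + 1]
            if left != right:
                messages += 1  # the lower rank sends its partial sum
            nxt.append(max(left, right))  # the higher rank keeps the result
        if len(current) % 2 == 1:
            nxt.append(current[-1])  # unpaired entry stays where it is
        current = nxt
        levels.append(current)
        messages_per_step.append(messages)
    return levels, messages_per_step


owner_levels, messages_per_step = simulate_owners([1, 1, 2, 3, 3, 4])
```

In this layout the first step needs two messages and each later step needs one, so four messages in total.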
This patch tries to carry out summations over samples in a completely deterministic way, independent of the number of MPI tasks involved. My first tests with EHT indicate that this works, i.e. we get bit-identical results for the same initial conditions when varying the number of tasks.
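Why the task count can matter at all: floating-point addition is not associative, so a sum whose grouping follows the task layout can change with the number of tasks. The sketch below is an illustration under that assumption and is not the code of this patch. `chunked_sum` and `tree_sum` are hypothetical helpers, and the explicit loop avoids the built-in `sum`, which may use compensated summation for floats in recent Python versions.

```python
def sequential_sum(values):
    """Add left to right with plain float additions."""
    total = values[0]
    for value in values[1:]:
        total = total + value
    return total


def chunked_sum(values, n_tasks):
    """Each simulated task sums its consecutive chunk, then the
    partial sums are added; the grouping depends on n_tasks."""
    chunk = -(-len(values) // n_tasks)  # ceiling division
    partials = [sequential_sum(values[i:i + chunk])
                for i in range(0, len(values), chunk)]
    return sequential_sum(partials)


def tree_sum(values):
    """Pairwise reduction whose grouping depends only on list positions."""
    current = list(values)
    while len(current) > 1:
        nxt = [current[i] + current[i + 1]
               for i in range(0, len(current) - 1, 2)]
        if len(current) % 2 == 1:
            nxt.append(current[-1])
        current = nxt
    return current[0]


samples = [1e16, 1.0, -1e16, 1.0]
chunked = {n: chunked_sum(samples, n) for n in (1, 2, 4)}
fixed = tree_sum(samples)
```

Here `chunked[1]` and `chunked[2]` differ, while `tree_sum` performs the same additions regardless of how the list would be distributed. Note that a fixed order gives reproducibility, not extra accuracy.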

@reimar I'm not absolutely sure that we have a test for `_metric_sample`. How hard would it be to add one?