`Information Field Theory <http://www.mpa-garching.mpg.de/ift/>`_ [1]_ (IFT) is information theory, the logic of reasoning under uncertainty, applied to fields.

A field can be any quantity defined over some space, e.g. the air temperature over Europe, the magnetic field strength in the Milky Way, or the matter density in the Universe.

...

...

@@ -33,78 +35,7 @@ NIFTy comes with reimplemented MAP and VI estimators.

.. [5] T.A. Enßlin (2019), "Information theory for fields", accepted by Annalen der Physik; `[DOI] <https://doi.org/10.1002/andp.201800127>`_, `[arXiv:1804.03350] <http://arxiv.org/abs/1804.03350>`_

Discretized continuum
---------------------

The representation of fields that are mathematically defined on a continuous space in a finite computer environment is a common necessity.

The goal is to preserve the continuum limit in the calculus, so that the discretization is independent of the resolution.

Any partition of the continuous position space :math:`\Omega` (with volume :math:`V`) into a set of :math:`Q` disjoint, proper subsets :math:`\Omega_q` (with volumes :math:`V_q`) defines a pixelization,

.. math::

    \Omega &= \dot{\bigcup_q} \; \Omega_q \qquad \mathrm{with} \qquad q \in \{1,\dots,Q\} , \\
    V &= \int_\Omega \mathrm{d}x = \sum_{q=1}^Q \int_{\Omega_q} \mathrm{d}x = \sum_{q=1}^Q V_q .

Here the number :math:`Q` characterizes the resolution of the pixelization and the continuum limit is described by :math:`Q \rightarrow \infty` and :math:`V_q \rightarrow 0` for all :math:`q \in \{1,\dots,Q\}` simultaneously.

Moreover, the above equation defines a discretization of continuous integrals, :math:`\int_\Omega \mathrm{d}x \mapsto \sum_q V_q`.

Any valid discretization scheme for a field :math:`{s}` can be described by a mapping,

.. math::

    s(x \in \Omega_q) \mapsto s_q = \int_{\Omega_q} \mathrm{d}x \; w_q(x) \; s(x) ,

if the weighting function :math:`w_q(x)` is chosen appropriately.

In order for the discretized version of the field to converge to the actual field in the continuum limit, the weighting functions need to be normalized in each subset; i.e., :math:`\forall q: \int_{\Omega_q} \mathrm{d}x \; w_q(x) = 1`.

Choosing such a weighting function that is constant with respect to :math:`x` yields

.. math::

    s_q = \frac{\int_{\Omega_q} \mathrm{d}x \; s(x)}{\int_{\Omega_q} \mathrm{d}x} = \left< s(x) \right>_{\Omega_q} ,

which corresponds to a discretization of the field by spatial averaging.

Another common and equally valid choice is :math:`w_q(x) = \delta(x-x_q)`, which distinguishes some position :math:`x_q \in \Omega_q`, and evaluates the continuous field at this position,

.. math::

    s_q = \int_{\Omega_q} \mathrm{d}x \; \delta(x-x_q) \; s(x) = s(x_q) .

In practice, one often makes use of the spatially averaged pixel position, :math:`x_q = \left< x \right>_{\Omega_q}`.

If the resolution is high enough to resolve all features of the signal field :math:`{s}`, both of these discretization schemes approximate each other, :math:`\left< s(x) \right>_{\Omega_q} \approx s(\left< x \right>_{\Omega_q})`, since they approximate the continuum limit by construction.

(The approximation of :math:`\left< s(x) \right>_{\Omega_q} \approx s(x_q \in \Omega_q)` marks a resolution threshold beyond which further refinement of the discretization reveals no new features; i.e., no new information content of the field :math:`{s}`.)
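
The agreement of the two discretization schemes at high resolution can be checked numerically. The following is a minimal sketch in plain Python, not NIFTy code: the test signal ``s``, the unit interval as domain, and all helper names are assumptions made for illustration.

```python
import math

def s(x):
    # Hypothetical smooth test signal on the interval [0, 1).
    return math.sin(2.0 * math.pi * x)

def pixel_average(q, Q, n_sub=200):
    # Approximate <s(x)>_{Omega_q} with a fine midpoint rule inside pixel q.
    width = 1.0 / Q
    left = q * width
    total = sum(s(left + (k + 0.5) * width / n_sub) for k in range(n_sub))
    return total / n_sub

def pixel_centre_value(q, Q):
    # Evaluate s at the pixel centre x_q (delta-function weighting).
    return s((q + 0.5) / Q)

def max_mismatch(Q):
    # Largest difference between the two schemes over all Q pixels.
    return max(abs(pixel_average(q, Q) - pixel_centre_value(q, Q))
               for q in range(Q))

mismatch_coarse = max_mismatch(4)
mismatch_fine = max_mismatch(64)
```

With only 4 pixels the sine wave is poorly resolved and the two schemes differ visibly; with 64 pixels the mismatch is far smaller, which is the behaviour the text describes.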

All operations involving position integrals can be normalized in accordance with the above definitions.

For example, the scalar product between two fields :math:`{s}` and :math:`{u}` is defined as

.. math::

    {s}^\dagger {u} \equiv \int_\Omega \mathrm{d}x \; s^*(x) \; u(x) \approx \sum_{q=1}^Q V_q \; s_q^* \; u_q ,

where :math:`\dagger` denotes adjunction and :math:`*` complex conjugation.

Since the above approximation becomes an equality in the continuum limit, the scalar product is independent of the pixelization scheme and resolution, if the latter is sufficiently high.
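
As a rough illustration of this resolution independence, the sketch below assumes the discretized scalar product is the volume-weighted sum over pixels, :math:`\sum_q V_q \, s_q^* \, u_q`, and evaluates it for two hypothetical real fields on the unit interval at three resolutions. It is plain Python, not the NIFTy implementation.

```python
import math

def discretize(f, Q):
    # Pixel-centre values of f on Q equal pixels covering [0, 1).
    return [f((q + 0.5) / Q) for q in range(Q)]

def scalar_product(s_pix, u_pix):
    # Volume-weighted sum with V_q = 1/Q (real fields, so no conjugation).
    V_q = 1.0 / len(s_pix)
    return sum(V_q * a * b for a, b in zip(s_pix, u_pix))

def s(x):
    return math.sin(2.0 * math.pi * x)

def u(x):
    return math.sin(2.0 * math.pi * x) + 0.5

# The continuum value of the integral of s(x) * u(x) over [0, 1) is 0.5.
products = {Q: scalar_product(discretize(s, Q), discretize(u, Q))
            for Q in (16, 64, 256)}
```

All three entries of ``products`` agree with the continuum value, because the features of both fields are resolved at every resolution used here.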

The above line of argumentation analogously applies to the discretization of operators.

For a linear operator :math:`{A}` acting on some field :math:`{s}` as :math:`{A} {s} = \int_\Omega \mathrm{d}y \; A(x,y) \; s(y)`, a matrix representation discretized with constant weighting functions is given by

.. math::

    A(x \in \Omega_p, y \in \Omega_q) \mapsto A_{pq} = \frac{\iint_{\Omega_p \Omega_q} \mathrm{d}x \, \mathrm{d}y \; A(x,y)}{\iint_{\Omega_p \Omega_q} \mathrm{d}x \, \mathrm{d}y} = \left< \left< A(x,y) \right>_{\Omega_p} \right>_{\Omega_q} .

The proper discretization of spaces, fields, and operators, as well as the normalization of position integrals, is essential for the conservation of the continuum limit.

Their consistent implementation in NIFTy allows a pixelization independent coding of algorithms.

Free Theory & Implicit Operators
--------------------------------

...

...

@@ -205,7 +136,7 @@ NIFTy takes advantage of this formulation in several ways:

3) The response can be non-linear, e.g. :math:`{R'(s)=R \exp(A\,\xi)}`, see `demos/getting_started_2.py`.

4) The amplitude operator may dependent on further parameters, e.g. :math:`A=A(\tau)= F\, \widehat{e^\tau}` represents an amplitude operator with a positive definite, unknown spectrum defined in the Fourier domain.

4) The amplitude operator may depend on further parameters, e.g. :math:`A=A(\tau)= F\, \widehat{e^\tau}` represents an amplitude operator with a positive definite, unknown spectrum defined in the Fourier domain.

The amplitude field :math:`{\tau}` would get its own amplitude operator, with a cepstrum (spectrum of a log spectrum) defined in quefrency space (harmonic space of a logarithmically binned harmonic space) to regularize its degrees of freedom by imposing some (user-defined degree of) spectral smoothness.

5) NIFTy calculates the gradient of the information Hamiltonian and the Fisher information metric with respect to all unknown parameters, here :math:`{\xi}` and :math:`{\tau}`, by automatic differentiation.

...

...

@@ -296,7 +227,7 @@ Thus, only the gradient of the KL is needed with respect to this, which can be e

We stochastically estimate the KL-divergence and gradients with a set of samples drawn from the approximate posterior distribution.

The particular structure of the covariance allows us to draw independent samples by solving a certain system of equations.

This KL-divergence for MGVI is implemented in the class MetricGaussianKL within NIFTy5.

This KL-divergence for MGVI is implemented in the class :class:`~minimization.metric_gaussian_kl.MetricGaussianKL` within NIFTy5.

The demo `getting_started_3.py` for example not only infers a field this way, but also the power spectrum of the process that has generated the field.

.. note:: Some of this discussion is rather technical and may be skipped in a first read-through.

Setup
.....

IFT employs stochastic processes to model distributions over function spaces, in particular Gaussian processes :math:`s \sim \mathcal{G}(s,k)` where :math:`k` denotes the covariance function.

The domain of the fields, and hence :math:`k`, is given by a Riemannian manifold :math:`(\mathcal{M},g)`, where :math:`g` denotes a Riemannian metric.

Fields are defined to be scalar functions on the manifold, living in the function space :math:`\mathcal{F}(\mathcal{M})`.

Unless we find ourselves in the lucky situation that we can solve for the posterior statistics of interest analytically, we need to apply numerical methods.

This is where NIFTy comes into play.

.. figure:: images/inference.png
    :width: 80%
    :align: center

    Figure 1: Sketch of the various spaces and maps involved in the inference process.

A typical setup for inference of such signals using NIFTy is shown in figure 1.

We start with a continuous signal :math:`s \in \mathcal{S}`, defined in some function space :math:`\mathcal{S} := \mathcal{F}(\mathcal{M})` over a manifold :math:`(\mathcal{M},g)` with metric :math:`g`.

This is measured by some instrument, e.g. a telescope.

The measurement produces data in an unstructured data space :math:`\mathcal{D}` via a known response function :math:`R : \mathcal{S} \rightarrow \mathcal{D}` and involves noise :math:`\mathcal{D} \ni n \sim \mathcal{N}(n | 0, N)` with known covariance matrix :math:`N`.

In the case of additive noise, the result of the measurement is given by

.. math::

    d = R(s) + n \, .
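
To make the measurement equation concrete, here is a small simulation sketch in plain Python. The signal ``s``, the point-sampling response ``R``, the three measurement positions, and the diagonal noise covariance are all hypothetical choices for illustration and are not taken from NIFTy.

```python
import random

def s(x):
    # Hypothetical continuous signal on [0, 1].
    return x * (1.0 - x)

def R(signal):
    # Hypothetical linear response: point measurements at three positions.
    return [signal(x) for x in (0.2, 0.5, 0.8)]

# Standard deviations, i.e. square roots of the diagonal of N
# (the noise covariance is assumed to be diagonal here).
noise_std = [0.01, 0.01, 0.02]

rng = random.Random(42)  # fixed seed for reproducibility
n = [rng.gauss(0.0, sd) for sd in noise_std]

# Additive noise model: d = R(s) + n
d = [r + e for r, e in zip(R(s), n)]
```

The data vector ``d`` lives in an unstructured three-dimensional data space, while the signal remains a function on the continuous domain.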

Discretisation and index notation
.................................

To compute anything numerically, we first need to represent the problem in finite dimensions.

As for stochastic processes, several discretisations of :math:`\mathcal{S}` like collocation methods, expansion into orthogonal polynomials, etc. can be used (see [6]_, [7]_ for an overview and further information about their reliability).

In particular, NIFTy uses the midpoint method as reviewed in section 2.1 in [6]_ and Fourier expansion.

Without going into the details, discretisation methods basically introduce a finite set of basis functions :math:`\{\phi_i\}_{i\in \mathcal{I}}`, where :math:`\mathcal{I}` denotes a generic index set with :math:`|\mathcal{I}| = N` being the chosen discretisation dimension.

Any Riemannian manifold :math:`(\mathcal{M},g)` is equipped with a canonical scalar product given by

.. math::

    \left< a , b \right>_{\mathcal{M}} = \int_{\mathcal{M}} \mathrm{d}x \, \sqrt{|g|} \, a(x) \, b(x) \, .

Projection to the finite basis :math:`\{\phi_i\}_{i\in \mathcal{I}}` is then given by

.. math::

    f^i = v^{ij} \, \left< f , \phi_j \right>_{\mathcal{M}} \, ,

where the Einstein summation convention is assumed and we defined the volume metric

.. math::

    v_{ij} = \left< \phi_i , \phi_j \right>_{\mathcal{M}} \, ,

along with its inverse, :math:`v^{ij}`, satisfying :math:`v^{ij}v_{jk} = \delta^i_k`.

The basis :math:`\{\phi_i\}_{i\in \mathcal{I}}` needs to be chosen such that the volume metric is invertible; otherwise its inverse :math:`v^{ij}`, and hence the projection, is not defined.

Volume factors are encoded into the :math:`v_{ij}`.

For specific choices of the basis :math:`\{\phi_i\}_{i\in \mathcal{I}}`, e.g. indicator functions in the case of a pixelation, the entries of :math:`v_{ij}` are indeed just the volumes of the elements.

Lowering and raising indices works with :math:`v_{ij}` and :math:`v^{ij}` just as usual.
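
For an indicator-function basis these statements can be verified directly. The sketch below is plain Python under stated assumptions: a flat metric (:math:`\sqrt{|g|} = 1`) on the unit interval, hypothetical unequal pixel boundaries ``edges``, and a hypothetical test function ``f``. It computes the diagonal of :math:`v_{ij}`, the lower-index components :math:`f_i = \left< f , \phi_i \right>_{\mathcal{M}}`, and the upper-index components :math:`f^i = v^{ij} f_j`.

```python
edges = [0.0, 0.1, 0.4, 1.0]  # hypothetical pixel boundaries on [0, 1]

def f(x):
    # Test function; its integral over [a, b] is b**3 - a**3.
    return 3.0 * x * x

def inner_with_indicator(func, a, b, n_sub=2000):
    # <func, phi_i> for the indicator function phi_i of [a, b),
    # evaluated with a midpoint rule and a flat metric.
    h = (b - a) / n_sub
    return sum(func(a + (k + 0.5) * h) for k in range(n_sub)) * h

n_pix = len(edges) - 1

# Indicator functions of disjoint pixels are orthogonal, so v_ij is diagonal
# and its entries are the pixel volumes.
v_diag = [inner_with_indicator(lambda x: 1.0, edges[i], edges[i + 1])
          for i in range(n_pix)]

# Lower-index components: pixel integrals of f.
f_lower = [inner_with_indicator(f, edges[i], edges[i + 1])
           for i in range(n_pix)]

# Upper-index components: raising with v^{ij} divides by the pixel volume,
# which yields pixel averages of f.
f_upper = [f_lower[i] / v_diag[i] for i in range(n_pix)]
```

Here ``v_diag`` reproduces the pixel widths 0.1, 0.3 and 0.6, so raising an index turns pixel integrals into pixel averages and lowering does the reverse.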

After projection, any function :math:`f \in \mathcal{S}` is represented by its approximation :math:`\hat{f} \in \hat{\mathcal{S}} \simeq \mathbb{R}^N`, where

.. math::

    \hat{f} = f^i\,\phi_i \, ,

which defines an embedding :math:`\hat{\mathcal{S}} \hookrightarrow \mathcal{S} = \mathcal{F}(\mathcal{M})`.

**Changes of bases** are performed by reapproximating the :math:`\{\phi_i\}_{i\in \mathcal{I}}` in terms of another basis :math:`\{\phi'_i\}_{i\in \mathcal{I'}}` :

.. math::

    \phi_i \approx v'^{jk} \, \left< \phi_i , \phi'_k \right>_{\mathcal{M}} \, \phi'_j \, .

If :math:`A` is a (linear) integral operator defined by a kernel :math:`\tilde{A}: \mathcal{M} \times \cdots \times \mathcal{M} \rightarrow \mathbb{R}`, its components with respect to the basis :math:`\{\phi_i\}_{i\in \mathcal{I}}` are given by

.. [6] Bruno Sudret and Armen Der Kiureghian (2000), "Stochastic Finite Element Methods and Reliability: A State-of-the-Art Report"

.. [7] Dongbin Xiu (2010), "Numerical methods for stochastic computations", Princeton University Press.

Resolution and self-consistency
...............................

Looking at figure 1, we see that there are two response operators:

On the one hand, there is the actual response :math:`R: \mathcal{S} \rightarrow \mathcal{D}` of the instrument used for measurement, mapping the actual signal to data.

On the other hand, there is a discretised response :math:`\hat{R}: \hat{\mathcal{S}} \rightarrow \mathcal{D}`, mapping from the discretised space to data.

The discretisation, denoted :math:`D: \mathcal{S} \rightarrow \hat{\mathcal{S}}`, and the discretised response need to satisfy a self-consistency equation, given by

.. math::

    R = \hat{R} \circ D \, .

An immediate corollary is that different discretisations :math:`D, D'` with resulting discretised responses :math:`\hat{R}, \hat{R}'` need to satisfy

.. math::

    \hat{R} \circ D = \hat{R}' \circ D' \, .
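
The self-consistency condition can be illustrated with a toy setup in plain Python. Everything here is an assumption made for the sketch: the signal :math:`s(x) = e^{-x}` on the unit interval, an instrument that integrates the signal over each half of the domain, pixel averaging as the discretisation ``D``, and a volume-weighted sum as the discretised response ``R_hat``.

```python
import math

def integral(a, b):
    # Exact integral of the hypothetical signal s(x) = exp(-x) over [a, b].
    return math.exp(-a) - math.exp(-b)

def R():
    # Continuous response: the instrument integrates s over each half of [0, 1].
    return [integral(0.0, 0.5), integral(0.5, 1.0)]

def D(Q):
    # Discretisation by pixel averaging on Q equal pixels (Q must be even).
    V = 1.0 / Q
    return [integral(q * V, (q + 1) * V) / V for q in range(Q)]

def R_hat(s_pix):
    # Discretised response: volume-weighted sums over the pixels in each half.
    Q = len(s_pix)
    V = 1.0 / Q
    half = Q // 2
    return [V * sum(s_pix[:half]), V * sum(s_pix[half:])]

d_continuous = R()
d_coarse = R_hat(D(4))
d_fine = R_hat(D(32))
```

In this toy case the pixel boundaries align with the instrument's integration regions, so ``d_coarse`` and ``d_fine`` both reproduce ``d_continuous`` up to floating-point rounding. For a general response the equality would only hold approximately at finite resolution.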

NIFTy is implemented such that in order to change resolution, only the line of code defining the space needs to be altered.

It automatically takes care of dependent structures like volume factors, discretised operators and responses.

A visualisation of this can be seen in figures 2 and 3, which display the MAP inference of a signal at various resolutions.

.. figure:: images/42vs6.png
    :scale: 40%
    :align: center

    Figure 2: MAP inference for different resolutions of the function space.

.. figure:: images/42vs9.png
    :scale: 40%
    :align: center

    Figure 3: MAP inference converging at high resolution.

Implementation in NIFTy
-----------------------

.. currentmodule:: nifty5

Most NIFTy codes will contain the description of a measurement process, or more generally, a log-likelihood.

This log-likelihood is necessarily a map from the quantity of interest (a field) to a real number.

The log-likelihood has to be unitless because it is a log-probability, and it should not scale with resolution.

Often, likelihoods contain integrals over the quantity of interest :math:`s`, which have to be discretized, e.g. by a sum

.. math::

    \int_\Omega \text{d}x\, s(x) \approx \sum_i s_i \int_{\Omega_i} \text{d}x\, 1 \, .

Here the domain of the integral :math:`\Omega = \dot{\bigcup_i} \; \Omega_i` is the disjoint union over smaller :math:`\Omega_i`, e.g. the pixels of the space, and :math:`s_i` is the discretized field value on the :math:`i`-th pixel.

This introduces the weighting :math:`V_i=\int_{\Omega_i}\text{d}x\, 1`, also called the volume factor, a property of the space.

NIFTy aids you in constructing your own likelihood by providing methods like :func:`~field.Field.weight`, which weights each pixel of a field with its corresponding volume.

An integral over a :class:`~field.Field` :code:`s` can be performed by calling :code:`s.weight(1).sum()`, which is equivalent to :code:`s.integrate()`.
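
The effect of weighting before summing can be mimicked in a few lines of plain Python. This is only an analogue for illustration: the functions ``weight``, ``integrate`` and ``sample`` below are hypothetical helpers operating on lists and are not the NIFTy methods of the same names.

```python
def weight(values, volume, power=1):
    # Multiply every pixel value by volume**power.
    return [v * volume ** power for v in values]

def integrate(values, volume):
    # Weight once with the pixel volume, then sum.
    return sum(weight(values, volume, 1))

def sample(Q):
    # Pixel-centre values of s(x) = x on Q equal pixels of [0, 2).
    # The exact integral of s over this domain is 2.
    V = 2.0 / Q
    return [(q + 0.5) * V for q in range(Q)], V

vals_a, V_a = sample(10)
vals_b, V_b = sample(1000)

integral_a = integrate(vals_a, V_a)
integral_b = integrate(vals_b, V_b)
plain_sum_a = sum(vals_a)
plain_sum_b = sum(vals_b)
```

The weighted sums ``integral_a`` and ``integral_b`` agree with the continuum integral at both resolutions, whereas the plain sums grow in proportion to the number of pixels.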

Volume factors are also applied automatically in the following places:

- :class:`~operators.harmonic_operators.FFTOperator` as well as all other harmonic operators. Here the zero mode of the transformed field is the integral over the original field, thus the whole field is weighted once.

- Some response operators, such as the :class:`~library.los_response.LOSResponse`. In this operator a line integral is discretized, so a 1-dimensional volume factor is applied.

- In :class:`~library.correlated_fields.CorrelatedField` as well as :class:`~library.correlated_fields.MfCorrelatedField`, the field is multiplied by the square root of the total volume in configuration space. This ensures that the same field reconstructed over a larger domain has the same variance in position space in the limit of infinite resolution. It also ensures that power spectra in NIFTy behave according to the definition of a power spectrum, namely the power of a k-mode is the expectation of the k-mode square, divided by the volume of the space.

Note that in contrast to some older versions of NIFTy, the dot product of fields does not apply a volume factor:

.. math::

    s^\dagger t = \sum_i s_i^* t_i .

If this dot product is supposed to be invariant under changes in resolution, then either :math:`s_i` or :math:`t_i` has to decrease as the number of pixels increases, or more specifically, one of the two fields has to be an extensive quantity while the other has to be intensive.

One can make this more explicit by denoting intensive quantities with upper index and extensive quantities with lower index

.. math::

    s^\dagger t = (s^*)^i t_i ,

where we used the Einstein summation convention.

This notation connects to the theoretical discussion before.

One of the fields has to have the volume metric already incorporated to ensure that the continuum limit works.
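
A short numerical sketch may help to see why one field must be extensive. It is plain Python rather than NIFTy code, and the functions ``f`` and ``g``, the unit interval, and the helper names are assumptions for illustration. Intensive fields hold pixel-centre values; extensive fields carry the pixel volume :math:`1/Q` already.

```python
def intensive(func, Q):
    # Upper-index field: pixel-centre values, whose size does not change with Q.
    return [func((q + 0.5) / Q) for q in range(Q)]

def extensive(func, Q):
    # Lower-index field: values with the pixel volume 1/Q already included.
    return [func((q + 0.5) / Q) / Q for q in range(Q)]

def dot(a, b):
    # Plain dot product, no volume factor applied.
    return sum(x * y for x, y in zip(a, b))

def f(x):
    return 1.0 + x

def g(x):
    return 2.0 - x

resolutions = (10, 100, 1000)

# One intensive and one extensive field: approximates the integral of f * g,
# whose continuum value on [0, 1] is 13/6.
mixed = [dot(intensive(f, Q), extensive(g, Q)) for Q in resolutions]

# Two intensive fields: the result grows with the number of pixels.
both_intensive = [dot(intensive(f, Q), intensive(g, Q)) for Q in resolutions]
```

The entries of ``mixed`` stay close to the continuum value at every resolution, while those of ``both_intensive`` scale with the pixel count, which is exactly the mismatch the upper-lower index convention is meant to prevent.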

When building statistical models, all indices will end up matching this upper-lower convention automatically, e.g. for a Gaussian log-likelihood :math:`L` we have