Commit 1beb7771 authored by Martin Reinecke

cleanup

parent bb413b40
@@ -19,7 +19,7 @@ There is a full toolbox of methods that can be used, like the classical approxim
.. [3] T.A. Enßlin (2014), "Astrophysical data analysis with information field theory", AIP Conference Proceedings, Volume 1636, Issue 1, p.49; `arXiv:1405.7701 <http://arxiv.org/abs/1405.7701>`_
.. [4] Wikipedia contributors (2018), `"Information field theory" <https://en.wikipedia.org/w/index.php?title=Information_field_theory&oldid=876731720>`_, Wikipedia, The Free Encyclopedia.
.. [5] T.A. Enßlin (2019), "Information theory for fields", accepted by Annalen der Physik; `arXiv:1804.03350 <http://arxiv.org/abs/1804.03350>`_
@@ -85,7 +85,7 @@ The above line of argumentation analogously applies to the discretization of ope
The proper discretization of spaces, fields, and operators, as well as the normalization of position integrals, is essential for preserving the continuum limit. Their consistent implementation in NIFTY allows pixelization-independent coding of algorithms.
Free Theory & Implicit Operators
--------------------------------
A free IFT appears when the signal field :math:`{s}` and the noise :math:`{n}` of the data :math:`{d}` are independent, zero-centered Gaussian processes of known covariances :math:`{S}` and :math:`{N}`, respectively,
@@ -94,7 +94,7 @@ A free IFT appears when the signal field :math:`{s}` and the noise :math:`{n}` o
\mathcal{P}(s,n) = \mathcal{G}(s,S)\,\mathcal{G}(n,N),
and the measurement equation is linear in both, signal and noise,
and the measurement equation is linear in both signal and noise,
.. math::
@@ -109,15 +109,15 @@
\mathcal{H}(d,s)= -\log \mathcal{P}(d,s)= \frac{1}{2} s^\dagger S^{-1} s + \frac{1}{2} (d-R\,s)^\dagger N^{-1} (d-R\,s) + \mathrm{const}
is only of quadratic order in :math:`{s}`, which leads to a linear relation between the data and the posterior mean field.
In this case, the posterior is
.. math::
\mathcal{P}(s|d) = \mathcal{G}(s-m,D)
with
.. math::
@@ -129,7 +129,7 @@ the posterior mean field,
D = \left( S^{-1} + R^\dagger N^{-1} R\right)^{-1}
the posterior covariance operator, and
.. math::
@@ -137,7 +137,7 @@ the posterior covariance operator, and
the information source. The operation in :math:`{m = D\,R^\dagger N^{-1} d}` is also called the generalized Wiener filter.
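The linear dependence of the posterior mean on the data follows from completing the square in the quadratic Hamiltonian:

.. math::

    \frac{1}{2} s^\dagger S^{-1} s + \frac{1}{2} (d-R\,s)^\dagger N^{-1} (d-R\,s)
    = \frac{1}{2} (s-m)^\dagger D^{-1} (s-m) + \mathrm{const},
    \qquad m = D\,j, \quad j = R^\dagger N^{-1} d.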
NIFTy permits the involved operators :math:`{R}`, :math:`{R^\dagger}`, :math:`{S}`, and :math:`{N}` to be defined implicitly, as routines that can be applied to vectors but do not require explicit storage of the operators' matrix elements.
Some of these operators are diagonal in the harmonic (Fourier) basis and therefore only require the specification of a (power) spectrum, :math:`{S= F\,\widehat{P_s} F^\dagger}`. Here :math:`{F = \mathrm{HarmonicTransformOperator}}`, :math:`{\widehat{P_s} = \mathrm{DiagonalOperator}(P_s)}`, and :math:`{P_s(k)}` is the power spectrum of the process that generated :math:`{s}`, as a function of the (absolute value of the) harmonic (Fourier) space coordinate :math:`{k}`. For those, NIFTy can also easily provide the inverse operators, e.g. :math:`{S^{-1}= F\,\widehat{\frac{1}{P_s}} F^\dagger}` in case :math:`{F}` is unitary, :math:`{F^\dagger=F^{-1}}`.
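As an illustration, such an implicit :math:`{S}` might be set up roughly as follows; this is a minimal sketch in the style of the NIFTy5 Python API, and the operator names, the example power spectrum, and the composition syntax are assumptions that may differ between NIFTy versions:

.. code-block:: python

    import nifty5 as ift

    # the signal lives on a regular grid; its harmonic partner carries the power spectrum
    position_space = ift.RGSpace(128)
    harmonic_space = position_space.get_default_codomain()

    # F: harmonic transform mapping harmonic_space back to position_space
    F = ift.HarmonicTransformOperator(harmonic_space, target=position_space)

    # P_s(k): purely illustrative power spectrum
    def power_spectrum(k):
        return 1. / (1. + k**2)

    # \widehat{P_s}: diagonal operator carrying the power spectrum
    P_hat = ift.create_power_operator(harmonic_space, power_spectrum)

    # S = F \widehat{P_s} F^\dagger, applied implicitly, never stored as a matrix
    # (composition written with '@'; older versions may use '*' or nested calls)
    S = F @ P_hat @ F.adjoint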
@@ -170,7 +170,7 @@ The joint information Hamiltonian for the whitened signal field :math:`{\xi}` re
\mathcal{H}(d,\xi)= -\log \mathcal{P}(d,\xi)= \frac{1}{2} \xi^\dagger \xi + \frac{1}{2} (d-R\,A\,\xi)^\dagger N^{-1} (d-R\,A\,\xi) + \mathrm{const}.
NIFTy takes advantage of this formulation in several ways:
1) All prior degrees of freedom have unit covariance, which improves the condition number of the operators that need to be inverted.
2) The amplitude operator can be regarded as part of the response, :math:`{R'=R\,A}`. In general, more sophisticated responses can be constructed from the composition of simpler operators, as sketched below.
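For instance, point 2) might look as follows, reusing ``F``, ``position_space``, ``harmonic_space`` and ``power_spectrum`` from the sketch above (the amplitude construction and the trivial response are illustrative assumptions, not fixed library choices):

.. code-block:: python

    # amplitude operator A: multiplies the white field xi by sqrt(P_s) in harmonic
    # space and transforms to position space, so that A A^\dagger = S
    A = F @ ift.create_power_operator(harmonic_space,
                                      lambda k: power_spectrum(k) ** 0.5)

    # a trivial response that simply discards the geometric information
    R = ift.GeometryRemover(position_space)

    # the combined response acting directly on the whitened field xi
    R_prime = R @ A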
@@ -35,6 +35,7 @@ class EnergyOperator(Operator):
It is intended as an objective function for field inference.
Typical usage in IFT:
- as an information Hamiltonian (i.e. a negative log probability)
- or as a Gibbs free energy (i.e. an averaged Hamiltonian),
aka Kullback-Leibler divergence.
@@ -47,8 +48,8 @@ class SquaredNormOperator(EnergyOperator):
Usage
-----
E = SquaredNormOperator() represents a field energy E that is the L2 norm
of a field f:
``E = SquaredNormOperator()`` represents a field energy E that is the
L2 norm of a field f:
:math:`E(f) = f^\dagger f`
"""
@@ -74,7 +75,7 @@ class QuadraticFormOperator(EnergyOperator):
Notes
-----
`E = QuadraticFormOperator(op)` represents a field energy that is a
``E = QuadraticFormOperator(op)`` represents a field energy that is a
quadratic form in a field f with kernel op:
:math:`E(f) = 0.5 f^\dagger op f`
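A hypothetical usage sketch for these two energies follows; the constructor argument of ``SquaredNormOperator`` (the field's domain) and the NIFTy5-style helpers used to build ``f`` and ``op`` are assumptions, so the exact signatures should be checked against the code:

.. code-block:: python

    import nifty5 as ift

    space = ift.RGSpace(64)
    f = ift.Field.from_random('normal', space)   # NIFTy5-style field construction

    # E(f) = f^dagger f  (constructor argument assumed to be the domain of f)
    E_norm = ift.SquaredNormOperator(space)
    print(E_norm(f))

    # E(f) = 0.5 f^dagger op f, with a simple (assumed) positive kernel op
    op = ift.ScalingOperator(2., space)          # argument order as in NIFTy5
    E_quad = ift.QuadraticFormOperator(op)
    print(E_quad(f))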
@@ -111,8 +112,10 @@ class GaussianEnergy(EnergyOperator):
Notes
-----
- At least one of the arguments has to be provided.
- `E = GaussianEnergy(mean=m, covariance=D)` represents (up to constants)
- ``E = GaussianEnergy(mean=m, covariance=D)`` represents (up to constants)
:math:`E(f) = - \log G(f-m, D) = 0.5 (f-m)^\dagger D^{-1} (f-m)`,
an information energy for a Gaussian distribution with mean m and covariance D.
"""
@@ -163,9 +166,9 @@ class PoissonianEnergy(EnergyOperator):
Notes
-----
E = GaussianEnergy(d) represents (up to an f-independent term log(d!))
``E = PoissonianEnergy(d)`` represents (up to an f-independent term log(d!))
:math:`E(f) = -\log Poisson(d|f) = \sum f - d^\dagger \log(f)`,
:math:`E(f) = -\log \\text{Poisson}(d|f) = \sum f - d^\dagger \log(f)`,
where f is a Field in data space with the expectation values for
the counts.
@@ -216,9 +219,9 @@ class BernoulliEnergy(EnergyOperator):
Notes
-----
E = BernoulliEnergy(d) represents
``E = BernoulliEnergy(d)`` represents
:math:`E(f) = -\log \mbox{Bernoulli}(d|f) = -d^\dagger \log f - (1-d)^\dagger \log(1-f)`,
:math:`E(f) = -\log \\text{Bernoulli}(d|f) = -d^\dagger \log f - (1-d)^\dagger \log(1-f)`,
where f is a field in data space (d.domain) with the expected frequencies of
events.
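Short usage sketches for these two data-space energies; the data and expectation fields below are made up for illustration, and the exact dtype requirements (e.g. integer counts) should be checked against the code:

.. code-block:: python

    import nifty5 as ift

    space = ift.RGSpace(16)

    # Poisson: d holds observed counts, f the expected counts
    d_counts = ift.full(space, 3)
    E_poisson = ift.PoissonianEnergy(d_counts)
    print(E_poisson(ift.full(space, 2.5)))

    # Bernoulli: d holds 0/1 events, f the event probabilities in (0, 1)
    d_events = ift.full(space, 1)
    E_bernoulli = ift.BernoulliEnergy(d_events)
    print(E_bernoulli(ift.full(space, 0.7)))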
@@ -253,7 +256,7 @@ class Hamiltonian(EnergyOperator):
Notes
-----
H = Hamiltonian(E_lh) represents
``H = Hamiltonian(E_lh)`` represents
:math:`H(f) = 0.5 f^\dagger f + E_{lh}(f)`
@@ -270,7 +273,7 @@ class Hamiltonian(EnergyOperator):
For more details see:
"Encoding prior knowledge in the structure of the likelihood"
Jakob Knollmüller, Torsten A. Ensslin, submitted, arXiv:1812.04403
https://arxiv.org/abs/1812.04403
:link:`https://arxiv.org/abs/1812.04403`
"""
def __init__(self, lh, ic_samp=None):
self._lh = lh
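A minimal usage sketch (the data field and the unit-covariance Gaussian likelihood are illustrative assumptions):

.. code-block:: python

    import nifty5 as ift

    space = ift.RGSpace(64)
    d = ift.Field.from_random('normal', space)   # NIFTy5-style field construction

    # likelihood energy: a unit-covariance Gaussian centered on the data
    lh = ift.GaussianEnergy(mean=d)

    # H(f) = 0.5 f^dagger f + E_lh(f)
    H = ift.Hamiltonian(lh)
    print(H(ift.full(space, 0.)))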
@@ -302,20 +305,22 @@ class SampledKullbachLeiblerDivergence(EnergyOperator):
approximately the relevant part of a KL to be used in Variational Bayes
inference if the samples are drawn from the approximating Gaussian.
Let Q(f) = G(f-m,D) Gaussian used to approximate
P(f|d), the correct posterior with information Hamiltonian
H(d,f) = - log P(d,f) = - log P(f|d) + const.
Let :math:`Q(f) = G(f-m,D)` be the Gaussian used to approximate
:math:`P(f|d)`, the correct posterior with information Hamiltonian
:math:`H(d,f) = -\log P(d,f) = -\log P(f|d) + \\text{const.}`
The KL divergence between those should then be optimized for m. It is
:math:`KL(Q,P) = \int Df Q(f) \log Q(f)/P(f)\\
= \left< \log Q(f) \\right>_Q(f) - < \log P(f) >_Q(f) = const + < H(f) >_G(f-m,D)`
:math:`KL(Q,P) = \int Df Q(f) \log Q(f)/P(f)\\\\
= \left< \log Q(f) \\right>_Q(f) - \left< \log P(f) \\right>_Q(f)\\\\
= \\text{const} + \left< H(f) \\right>_G(f-m,D)`
in essence the information Hamiltonian averaged over a Gaussian distribution
centered on the mean m.
in essence the information Hamiltonian averaged over a Gaussian
distribution centered on the mean m.
SampledKullbachLeiblerDivergence(H) approximates < H(f) >_G(f-m,D) if the
residuals f-m are drawn from covariance D.
SampledKullbachLeiblerDivergence(H) approximates
:math:`\left< H(f) \\right>_{G(f-m,D)}` if the residuals
:math:`f-m` are drawn from covariance :math:`D`.
Parameters
----------
@@ -327,17 +332,17 @@ class SampledKullbachLeiblerDivergence(EnergyOperator):
Notes
-----
KL = SampledKullbachLeiblerDivergence(H, samples) represents
``KL = SampledKullbachLeiblerDivergence(H, samples)`` represents
:math:`KL(m) = \sum_i H(m+v_i) / N`,
:math:`\\text{KL}(m) = \sum_i H(m+v_i) / N`,
where v_i are the residual samples, N is their number, and m is the mean field
around which the samples are drawn.
where :math:`v_i` are the residual samples, :math:`N` is their number,
and :math:`m` is the mean field around which the samples are drawn.
Having symmetrized residual samples, with both, v_i and -v_i being present,
ensures that the distribution mean is exactly represented. This reduces sampling
noise and helps the numerics of the KL minimization process in the variational
Bayes inference.
Having symmetrized residual samples, with both v_i and -v_i being present,
ensures that the distribution mean is exactly represented. This reduces
sampling noise and helps the numerics of the KL minimization process in the
variational Bayes inference.
"""
def __init__(self, h, res_samples):
self._h = h
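A minimal usage sketch; the Hamiltonian is built as in the sketch above, and the residual samples are drawn from a white Gaussian purely for illustration (in a real inference they would be drawn from the approximating covariance :math:`D`):

.. code-block:: python

    import nifty5 as ift

    space = ift.RGSpace(64)
    d = ift.Field.from_random('normal', space)
    H = ift.Hamiltonian(ift.GaussianEnergy(mean=d))

    # residual samples v_i, symmetrized so that each v is paired with -v
    v = ift.Field.from_random('normal', space)
    residual_samples = [v, -v]

    # KL(m) = sum_i H(m + v_i) / N, itself an energy in the mean field m
    KL = ift.SampledKullbachLeiblerDivergence(H, residual_samples)
    print(KL(ift.full(space, 0.)))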