Commit 31c7db31 authored by Philipp Arras's avatar Philipp Arras
Browse files

Merge branch 'NIFTy_5' into cosm

parents b1896312 252a5232
......@@ -74,7 +74,7 @@ if __name__ == '__main__':
ic_sampling = ift.GradientNormController(iteration_limit=100)
# Minimize the Hamiltonian
H = ift.Hamiltonian(likelihood, ic_sampling)
H = ift.StandardHamiltonian(likelihood, ic_sampling)
H = ift.EnergyAdapter(position, H, want_metric=True)
# minimizer = ift.L_BFGS(ic_newton)
H, convergence = minimizer(H)
......@@ -99,7 +99,7 @@ if __name__ == '__main__':
minimizer = ift.NewtonCG(ic_newton)
# Compute MAP solution by minimizing the information Hamiltonian
H = ift.Hamiltonian(likelihood)
H = ift.StandardHamiltonian(likelihood)
initial_position = ift.from_random('normal', domain)
H = ift.EnergyAdapter(initial_position, H, want_metric=True)
H, convergence = minimizer(H)
......@@ -100,10 +100,10 @@ if __name__ == '__main__':
# Set up likelihood and information Hamiltonian
likelihood = ift.GaussianEnergy(mean=data, covariance=N)(signal_response)
H = ift.Hamiltonian(likelihood, ic_sampling)
H = ift.StandardHamiltonian(likelihood, ic_sampling)
initial_position = ift.MultiField.full(H.domain, 0.)
position = initial_position
initial_mean = ift.MultiField.full(H.domain, 0.)
mean = initial_mean
plot = ift.Plot()
plot.add(signal(mock_position), title='Ground Truth')
......@@ -117,9 +117,9 @@ if __name__ == '__main__':
# Draw new samples to approximate the KL five times
for i in range(5):
# Draw new samples and minimize KL
KL = ift.KL_Energy(position, H, N_samples)
KL = ift.MetricGaussianKL(mean, H, N_samples)
KL, convergence = minimizer(KL)
position = KL.position
mean = KL.position
# Plot current reconstruction
plot = ift.Plot()
......@@ -128,7 +128,7 @@ if __name__ == '__main__':
plot.output(ny=1, ysize=6, xsize=16, name="loop-{:02}.png".format(i))
# Draw posterior samples
KL = ift.KL_Energy(position, H, N_samples)
KL = ift.MetricGaussianKL(mean, H, N_samples)
sc = ift.StatCalculator()
for sample in KL.samples:
sc.add(signal(sample + KL.position))
......@@ -103,7 +103,7 @@ N = ift.DiagonalOperator(ift.from_global_data(d_space, var))
IC = ift.DeltaEnergyController(tol_rel_deltaE=1e-12, iteration_limit=200)
likelihood = ift.GaussianEnergy(d, N)(R)
Ham = ift.Hamiltonian(likelihood, IC)
Ham = ift.StandardHamiltonian(likelihood, IC)
H = ift.EnergyAdapter(params, Ham, want_metric=True)
# Minimize
......@@ -9,7 +9,7 @@ Theoretical Background
IFT is fully Bayesian. How else could infinitely many field degrees of freedom be constrained by finite data?
There is a full toolbox of methods that can be used, like the classical approximation (= Maximum a posteriori = MAP), effective action (= Variational Bayes = VI), Feynman diagrams, renormalization, and more. IFT reproduces many known well working algorithms. This should be reassuring. Also, there were certainly previous works in a similar spirit. Anyhow, in many cases IFT provides novel rigorous ways to extract information from data. NIFTy comes with reimplemented MAP and VI estimators. It also provides a Hamiltonian Monte Carlo sampler for Fields (HMCF). (*FIXME* does it?)
There is a full toolbox of methods that can be used, like the classical approximation (= Maximum a posteriori = MAP), effective action (= Variational Bayes = VI), Feynman diagrams, renormalitation, and more. IFT reproduces many known well working algorithms. This should be reassuring. And, there were certainly previous works in a similar spirit. Anyhow, in many cases IFT provides novel rigorous ways to extract information from data. NIFTy comes with reimplemented MAP and VI estimators.
.. tip:: *In-a-nutshell introductions to information field theory* can be found in [2]_, [3]_, [4]_, and [5]_, with the latter probably being the most didactical.
......@@ -184,14 +184,14 @@ The reconstruction of a non-Gaussian signal with unknown covariance from a non-t
| **Output of tomography demo** |
| .. image:: images/getting_started_3_setup.png |
| |
| :width: 50 % |
| Non-Gaussian signal field, |
| data backprojected into the image domain, power |
| spectrum of underlying Gausssian process. |
| .. image:: images/getting_started_3_results.png |
| |
| :width: 50 % |
| Posterior mean field signal |
| reconstruction, its uncertainty, and the power |
......@@ -199,3 +199,110 @@ The reconstruction of a non-Gaussian signal with unknown covariance from a non-t
| samples in comparison to the correct one (thick |
| orange line). |
Maximim a Posteriori
One popular field estimation method is Maximim a Posteriori (MAP).
It only requires to minimize the information Hamiltonian, e.g by a gradient descent method that stops when
.. math::
\frac{\partial \mathcal{H}(d,\xi)}{\partial \xi} = 0.
NIFTy5 is able to calculate the necessary gradient from a generative model of the signal and the data and to minimize the Hamiltonian.
However, MAP provides often unsatisfactory result in case a deep hirachical Bayesian networks describes the singal and data generation.
The reason for this is that MAP ignores the volume factors in parameter space, which are not to be neglected in deciding whether a solution is reasonable or not.
In the high dimensional setting of field inference these volume factors can differ by large ratios.
A MAP estimate, which is only representative for a tiny fraction of the parameter space, might be a poorer choice (with respect to an error norm) compared to a slightly worse location with slightly lower posterior probability, which, however, is associated with a much larger volume (of nearby locations with similar probability).
This causes MAP signal estimates to be more prone to overfitting the noise as well as to perception thresholds than methods that take volume effects into account.
Variational Inference
One method that takes volume effects into account is Variational Inference (VI).
In VI, the posterior :math:`\mathcal{P}(\xi|d)` is approximated by a simpler distribution, often a Gaussian :math:`\mathcal{Q}(\xi)=\mathcal{G}(\xi-m,D)`.
The parameters of :math:`\mathcal{Q}`, the mean :math:`m` and its uncertainty dispersion :math:`D` are obtained by minimization of an appropriate information distance measure between :math:`\mathcal{Q}` and :math:`\mathcal{P}`.
As a compromise between being optimal and being computational affordable the (reverse) Kullbach Leiberler (KL) divergence is used in VI:
.. math::
\mathrm{KL}(m,D|d)= \mathcal{D}_\mathrm{KL}(\mathcal{Q}||\mathcal{P})=
\int \mathcal{D}\xi \,\mathcal{Q}(\xi) \log \left( \frac{\mathcal{Q}(\xi)}{\mathcal{P}(\xi)} \right)
Minimizing this with respect to all entries of the covariance :math:`D` is unfeasible for fields.
Therefore, Metric Gaussian Variational Inference (MGVI) makes the Ansatz to approximate the precision matrix :math:`M=D^{-1}` by the Bayesian Fisher information metric,
.. math::
M \approx \left\langle \frac{\partial \mathcal{H}(d,\xi)}{\partial \xi} \, \frac{\partial \mathcal{H}(d,\xi)}{\partial \xi}^\dagger \right\rangle_{(d,\xi)},
where in the MGVI practice the average is performed over :math:`\mathcal{P}(d,\xi)\approx \mathcal{P}(d|\xi)\,\mathcal{Q}(\xi)` by evaluating the expression at :math:`\xi` samples drawn from the Gaussian :math:`\mathcal{Q}(\xi)` and corrsponding data samples dran from their generative process :math:`\mathcal{P}(d|\xi)`.
With this approximation, the KL becomes effectively a function of the mean :math:`m`, as :math:`D= D(m) \approx M^{-1}`. Thus, only the gradient of the KL is needed with respect to this, which can be expressed as
.. math::
\frac{\partial \mathrm{KL}(m|d)}{\partial m} = \left\langle \frac{\partial \mathcal{H}(d,\xi)}{\partial \xi} \right\rangle_{\mathcal{G}(\xi-m,D)}.
The advantage of this Ansatz is that the averages can be represented by sample averages, and all the gradients are represented by operators that NIFTy5 can calculate and that do not need the storage of full matrices. Therefore, NIFTy5 is able to draw samples according to a Gaussian with a covariance given by the inverse information metric, and to minimize the KL correspondingly.
Setting up a KL for MGVI is done via objects of the class MetricGaussianKL.
It should be noted that MetricGaussianKL does not estimate the full KL, as within the MGVI approximation only the KL is optimized with respect to the posterior mean and any other additive part of the KL is dropped. It turns out that a metric Gaussian average of the original information Hamiltonian contains all necessary dependencies:
.. math::
\mathrm{KL}(m|d) = \left\langle - \mathcal{H}_\mathcal{Q}(\xi|m) + \mathcal{H}(\xi|d)
.. math::
\mathcal{H}_\mathcal{Q}(\xi|m) = - \log \mathcal{Q}(\xi) =
- \log \mathcal{G}(\xi-m,D)
is the information Hamiltonian of the approximating Gaussian and
.. math::
\mathcal{H}(\xi|d) = - \log \mathcal{P}(\xi|d) =
- \log \left( \frac{\mathcal{P}(d,\xi)}{\mathcal{P}(d)} \right) =
\mathcal{H}(d,\xi) - \mathcal{H}(d)
the posterior information Hamiltonian.
Since neither
.. math::
\left\langle \mathcal{H}_\mathcal{Q}(\xi|m) \right\rangle_{\mathcal{Q}(\xi)} =
\frac{1}{2} \log \left| 2\pi e D \right|
.. math::
\left\langle \mathcal{H}(d) \right\rangle_{\mathcal{Q}(\xi)} =
depend directly on :math:`m`, they are dropped from the KL to be minimized. What remains is
.. math::
\mathrm{KL}(m|d) \;\widehat{=}\;
\left\langle \mathcal{H}(\xi,d) \right\rangle_{\mathcal{Q}(\xi)},
where :math:`\widehat{=}` expresses equality up to irrelvant (here not :math:`m`-dependent) terms.
The fact that the KL depends indirectly also through :math:`D=D(m)` in a second way on :math:`m` is ignored in the MGVI approach. This can often be justified by uncertainties usually being mostly determined by the :math:`m`-independent measurment setup and the variation of the uncertainty with the posterior mean is expected to be subdominant and of moderate importance for the KL.
The demo for example infers this way not only a field, but also the power spectrum of the process that has generated the field.
The cross-correlation of field and power spectum is taken care of thereby.
Posterior samples can be obtained to study this cross-correlation.
It should be noted that MGVI as any VI method typically underestimates uncertainties due to the fact that :math:`\mathcal{D}_\mathrm{KL}(\mathcal{Q}||\mathcal{P})`, the reverse KL, is used, whereas :math:`\mathcal{D}_\mathrm{KL}(\mathcal{P}||\mathcal{Q})` would be optimal to approximate :math:`\mathcal{P}` by :math:`\mathcal{Q}` from an information theoretical perspective.
This, however, would require that one is able to integrate the posterior, in wich case one could calculate the desired posterior mean and its uncertainty covariance directly and therefore would not have any need to perform VI.
......@@ -49,7 +49,7 @@ from .operators.simple_linear_operators import (
from .operators.value_inserter import ValueInserter
from .operators.energy_operators import (
EnergyOperator, GaussianEnergy, PoissonianEnergy, InverseGammaLikelihood,
BernoulliEnergy, Hamiltonian, AveragedEnergy)
BernoulliEnergy, StandardHamiltonian, AveragedEnergy)
from .probing import probe_with_posterior_samples, probe_diagonal, \
......@@ -68,7 +68,7 @@ from .minimization.scipy_minimizer import (ScipyMinimizer, L_BFGS_B, ScipyCG)
from import Energy
from .minimization.quadratic_energy import QuadraticEnergy
from .minimization.energy_adapter import EnergyAdapter
from .minimization.kl_energy import KL_Energy
from .minimization.metric_gaussian_kl import MetricGaussianKL
from .sugar import *
from .plot import Plot
......@@ -18,7 +18,7 @@
from ..minimization.energy_adapter import EnergyAdapter
from ..multi_field import MultiField
from ..operators.distributors import PowerDistributor
from ..operators.energy_operators import Hamiltonian, InverseGammaLikelihood
from ..operators.energy_operators import StandardHamiltonian, InverseGammaLikelihood
from ..operators.scaling_operator import ScalingOperator
from ..operators.simple_linear_operators import ducktape
......@@ -53,8 +53,8 @@ def make_adjust_variances(a,
Hamiltonian that can be used for further minimization.
A Hamiltonian that can be used for further minimization.
d = a*xi
......@@ -72,7 +72,7 @@ def make_adjust_variances(a,
if scaling is not None:
x = ScalingOperator(scaling,
return Hamiltonian(InverseGammaLikelihood(d_eval)(x), ic_samp=ic_samp)
return StandardHamiltonian(InverseGammaLikelihood(d_eval)(x), ic_samp=ic_samp)
def do_adjust_variances(position,
......@@ -20,31 +20,70 @@ from ..linearization import Linearization
from .. import utilities
class KL_Energy(Energy):
def __init__(self, position, h, nsamp, constants=[],
constants_samples=None, gen_mirrored_samples=False,
class MetricGaussianKL(Energy):
"""Provides the sampled Kullback-Leibler divergence between a distribution
and a Metric Gaussian.
A Metric Gaussian is used to approximate some other distribution.
It is a Gaussian distribution that uses the Fisher Information Metric
of the other distribution at the location of its mean to approximate the
variance. In order to infer the mean, the a stochastic estimate of the
Kullback-Leibler divergence is minimized. This estimate is obtained by
drawing samples from the Metric Gaussian at the current mean.
During minimization these samples are kept constant, updating only the
mean. Due to the typically nonlinear structure of the true distribution
these samples have to be updated by re-initializing this class at some
point. Here standard parametrization of the true distribution is assumed.
mean : Field
The current mean of the Gaussian.
hamiltonian : StandardHamiltonian
The StandardHamiltonian of the approximated probability distribution.
n_samples : integer
The number of samples used to stochastically estimate the KL.
constants : list
A list of parameter keys that are kept constant during optimization.
point_estimates : list
A list of parameter keys for which no samples are drawn, but that are
optimized for, corresponding to point estimates of these.
mirror_samples : boolean
Whether the negative of the drawn samples are also used,
as they are equaly legitimate samples. If true, the number of used
samples doubles. Mirroring samples stabilizes the KL estimate as
extreme sample variation is counterbalanced. (default : False)
For further details see: Metric Gaussian Variational Inference
(in preparation)
def __init__(self, mean, hamiltonian, n_sampels, constants=[],
point_estimates=None, mirror_samples=False,
super(KL_Energy, self).__init__(position)
if h.domain is not position.domain:
super(MetricGaussianKL, self).__init__(mean)
if hamiltonian.domain is not mean.domain:
raise TypeError
self._h = h
self._hamiltonian = hamiltonian
self._constants = constants
if constants_samples is None:
constants_samples = constants
self._constants_samples = constants_samples
if point_estimates is None:
point_estimates = constants
self._constants_samples = point_estimates
if _samples is None:
met = h(Linearization.make_partial_var(
position, constants_samples, True)).metric
met = hamiltonian(Linearization.make_partial_var(
mean, point_estimates, True)).metric
_samples = tuple(met.draw_sample(from_inverse=True)
for _ in range(nsamp))
if gen_mirrored_samples:
for _ in range(n_sampels))
if mirror_samples:
_samples += tuple(-s for s in _samples)
self._samples = _samples
self._lin = Linearization.make_partial_var(position, constants)
self._lin = Linearization.make_partial_var(mean, constants)
v, g = None, None
for s in self._samples:
tmp = self._h(self._lin+s)
tmp = self._hamiltonian(self._lin+s)
if v is None:
v = tmp.val.local_data[()]
g = tmp.gradient
......@@ -56,9 +95,9 @@ class KL_Energy(Energy):
self._metric = None
def at(self, position):
return KL_Energy(position, self._h, 0,
self._constants, self._constants_samples,
return MetricGaussianKL(position, self._hamiltonian, 0,
self._constants, self._constants_samples,
def value(self):
......@@ -71,7 +110,8 @@ class KL_Energy(Energy):
def _get_metric(self):
if self._metric is None:
lin = self._lin.with_want_metric()
mymap = map(lambda v: self._h(lin+v).metric, self._samples)
mymap = map(lambda v: self._hamiltonian(lin+v).metric,
self._metric = utilities.my_sum(mymap)
self._metric = self._metric.scale(1./len(self._samples))
......@@ -27,7 +27,7 @@ class BlockDiagonalOperator(EndomorphicOperator):
domain : MultiDomain
Domain and target of the operator.
operators : dict
Dictionary with subdomain names as keys and :class:`LinearOperator`s
Dictionary with subdomain names as keys and :class:`LinearOperator` s
as items.
def __init__(self, domain, operators):
......@@ -259,7 +259,7 @@ class BernoulliEnergy(EnergyOperator):
return v.add_metric(met)
class Hamiltonian(EnergyOperator):
class StandardHamiltonian(EnergyOperator):
"""Computes an information Hamiltonian in its standard form, i.e. with the
prior being a Gaussian with unit covariance.
......@@ -314,52 +314,24 @@ class Hamiltonian(EnergyOperator):
def __repr__(self):
subs = 'Likelihood:\n{}'.format(utilities.indent(self._lh.__repr__()))
subs += '\nPrior: Quadratic{}'.format(self._lh.domain.keys())
return 'Hamiltonian:\n' + utilities.indent(subs)
return 'StandardHamiltonian:\n' + utilities.indent(subs)
class AveragedEnergy(EnergyOperator):
"""Computes Kullback-Leibler (KL) divergence or Gibbs free energies.
A sample-averaged energy, e.g. an Hamiltonian, approximates the relevant
part of a KL to be used in Variational Bayes inference if the samples are
drawn from the approximating Gaussian:
.. math ::
\\text{KL}(m) = \\frac1{\\#\\{v_i\\}} \\sum_{v_i} H(m+v_i),
where :math:`v_i` are the residual samples and :math:`m` is the mean field
around which the samples are drawn.
"""Averages an energy over samples
h: Hamiltonian
The energy to be averaged.
res_samples : iterable of Fields
Set of residual sample points to be added to mean field for approximate
estimation of the KL.
Set of residual sample points to be added to mean field for
approximate estimation of the KL.
Having symmetrized residual samples, with both v_i and -v_i being present
ensures that the distribution mean is exactly represented. This reduces
sampling noise and helps the numerics of the KL minimization process in the
variational Bayes inference.
See also
Let :math:`Q(f) = G(f-m,D)` be the Gaussian distribution
which is used to approximate the accurate posterior :math:`P(f|d)` with
information Hamiltonian
:math:`H(d,f) = -\\log P(d,f) = -\\log P(f|d) + \\text{const}`. In
Variational Bayes one needs to optimize the KL divergence between those
two distributions for m. It is:
:math:`KL(Q,P) = \\int Df Q(f) \\log Q(f)/P(f)\\\\
= \\left< \\log Q(f) \\right>_Q(f) - \\left< \\log P(f) \\right>_Q(f)\\\\
= \\text{const} + \\left< H(f) \\right>_G(f-m,D)`
in essence the information Hamiltonian averaged over a Gaussian
distribution centered on the mean m.
Having symmetrized residual samples, with both v_i and -v_i being
present ensures that the distribution mean is exactly represented.
:class:`AveragedEnergy(h)` approximates
:math:`\\left< H(f) \\right>_{G(f-m,D)}` if the residuals
......@@ -69,7 +69,7 @@ def test_hamiltonian_and_KL(field):
field = field.exp()
space = field.domain
lh = ift.GaussianEnergy(domain=space)
hamiltonian = ift.Hamiltonian(lh)
hamiltonian = ift.StandardHamiltonian(lh)
ift.extra.check_value_gradient_consistency(hamiltonian, field)
S = ift.ScalingOperator(1., space)
samps = [S.draw_sample() for i in range(3)]
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment