Commit 8075b373 authored by Philipp Arras's avatar Philipp Arras

Merge branch 'more_pypi_preparations' into 'NIFTy_7'

Remove gitversion interface, docs improvements and workaround pip

See merge request !657
parents ec726ff5 823f4607
@@ -93,6 +93,13 @@ likelihood becomes the identity matrix. This is needed for the `GeoMetricKL`
algorithm.
Remove gitversion interface
---------------------------
Since proper NIFTy releases are now provided on PyPI, the gitversion
interface is no longer supported.
Changes since NIFTy 5
=====================
...
Approximate Inference
=====================
In Variational Inference (VI), the posterior :math:`\mathcal{P}(\xi|d)` is approximated by a simpler, parametrized distribution, often a Gaussian :math:`\mathcal{Q}(\xi)=\mathcal{G}(\xi-m,D)`.
The parameters of :math:`\mathcal{Q}`, the mean :math:`m` and its covariance :math:`D` are obtained by minimization of an appropriate information distance measure between :math:`\mathcal{Q}` and :math:`\mathcal{P}`.
As a compromise between being optimal and being computationally affordable, the variational Kullback-Leibler (KL) divergence is used:
.. math::
\mathrm{KL}(m,D|d)= \mathcal{D}_\mathrm{KL}(\mathcal{Q}||\mathcal{P})=
\int \mathcal{D}\xi \,\mathcal{Q}(\xi) \log \left( \frac{\mathcal{Q}(\xi)}{\mathcal{P}(\xi)} \right)
NIFTy features two main alternatives for variational inference: Metric Gaussian Variational Inference (MGVI) and geometric Variational Inference (geoVI).
A visual comparison of the MGVI and geoVI algorithms can be found in `variational_inference_visualized.py <https://gitlab.mpcdf.mpg.de/ift/nifty/-/blob/NIFTy_7/demos/variational_inference_visualized.py>`_.
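To make the stochastic nature of this objective concrete, the following self-contained sketch estimates the KL divergence for a one-dimensional toy model by Monte Carlo sampling from :math:`\mathcal{Q}`. It is illustrative only; the toy log-posterior ``log_p`` and the parameters ``m`` and ``d_var`` are made up and not part of NIFTy.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)

    def log_p(xi):
        """Hypothetical unnormalized 1-d log-posterior log P(xi|d)."""
        return -0.5*xi**2 - 0.1*xi**4

    # Gaussian approximation Q(xi) = G(xi - m, d_var).
    m, d_var = 0.3, 0.5

    def log_q(xi):
        return -0.5*(xi - m)**2/d_var - 0.5*np.log(2*np.pi*d_var)

    # KL(Q||P) is estimated as the sample mean of log Q - log P over
    # draws xi ~ Q; the unknown evidence of P only shifts it by a constant.
    xi = m + np.sqrt(d_var)*rng.standard_normal(100_000)
    print("KL estimate (up to a constant):", np.mean(log_q(xi) - log_p(xi)))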
Metric Gaussian Variational Inference (MGVI)
--------------------------------------------
Minimizing the KL divergence with respect to all entries of the covariance :math:`D` is infeasible for fields.
Therefore, Metric Gaussian Variational Inference (MGVI, [1]_) approximates the posterior precision matrix :math:`D^{-1}` at the location of the current mean :math:`m` by the Bayesian Fisher information metric,
.. math::
M \approx \left\langle \frac{\partial \mathcal{H}(d,\xi)}{\partial \xi} \, \frac{\partial \mathcal{H}(d,\xi)}{\partial \xi}^\dagger \right\rangle_{(d,\xi)}.
In practice the average is performed over :math:`\mathcal{P}(d,\xi)\approx \mathcal{P}(d|\xi)\,\delta(\xi-m)` by evaluating the expression at the current mean :math:`m`.
This results in a Fisher information metric of the likelihood evaluated at the mean plus the prior information metric.
Consequently, only the mean of the approximate distribution has to be inferred.
The only term within the KL-divergence that explicitly depends on the mean is the Hamiltonian of the true problem averaged over the approximation:
.. math::
\mathrm{KL}(m|d) \;\widehat{=}\;
\left\langle \mathcal{H}(\xi,d) \right\rangle_{\mathcal{Q}(\xi)},
where :math:`\widehat{=}` expresses equality up to irrelevant (here not :math:`m`-dependent) terms.
Thus, only the gradient of the KL with respect to the mean :math:`m` is needed, which can be expressed as
.. math::
\frac{\partial \mathrm{KL}(m|d)}{\partial m} = \left\langle \frac{\partial \mathcal{H}(d,\xi)}{\partial \xi} \right\rangle_{\mathcal{G}(\xi-m,D)}.
We estimate the KL-divergence and its gradient stochastically with a set of samples drawn from the approximate posterior distribution.
The particular structure of the covariance allows us to draw independent samples by solving a certain system of equations, as illustrated in the sketch at the end of this subsection.
This KL-divergence for MGVI is implemented by
:func:`~nifty7.minimization.kl_energies.MetricGaussianKL` within NIFTy7.
Note that MGVI typically provides only a lower bound on the variance.
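As an illustration of the sampling step referred to above, here is a numpy sketch for a toy linear model; the response ``R``, the noise precision ``N_inv``, and the dimensions are invented for this example, and NIFTy itself applies the metric only implicitly as an operator and inverts it with conjugate gradient. For a Gaussian likelihood the metric has the structure :math:`M = R^\dagger N^{-1} R + \mathbb{1}`, a sample :math:`y \sim \mathcal{G}(0, M)` can be drawn directly from this structure, and solving :math:`M x = y` yields :math:`x \sim \mathcal{G}(0, D)` with :math:`D = M^{-1}`.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(42)
    n_data, n_pix = 30, 50

    # Toy linear measurement: d = R xi + n, with diagonal noise covariance N.
    R = rng.standard_normal((n_data, n_pix))
    N_inv = np.diag(1.0/rng.uniform(0.1, 1.0, n_data))

    # Metric = likelihood Fisher metric at the mean plus the prior metric.
    M = R.T @ N_inv @ R + np.eye(n_pix)

    # y = R^T N^(-1/2) n1 + n2 with standard-normal n1, n2 has covariance M.
    y = R.T @ np.sqrt(N_inv) @ rng.standard_normal(n_data) \
        + rng.standard_normal(n_pix)

    # Solving M x = y turns the G(0, M) sample into a G(0, D) sample.
    x = np.linalg.solve(M, y)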
Geometric Variational Inference (geoVI)
---------------------------------------
For non-linear posterior distributions :math:`\mathcal{P}(\xi|d)`, an approximation with a Gaussian :math:`\mathcal{Q}(\xi)` in the coordinates :math:`\xi` is sub-optimal, as higher-order interactions are ignored.
A better approximation can be achieved by constructing a coordinate system :math:`y = g\left(\xi\right)` in which the posterior is close to a Gaussian, and performing VI with a Gaussian :math:`\mathcal{Q}(y)` in these coordinates.
This approach is called geometric Variational Inference (geoVI).
It is discussed in detail in [2]_.
One useful coordinate system is obtained if the metric :math:`M` of the posterior can be expressed as the pullback of the Euclidean metric by :math:`g`:
.. math::
M = \left(\frac{\partial g}{\partial \xi}\right)^T \frac{\partial g}{\partial \xi} \ .
In general, such a transformation exists only locally, i.e. in a neighbourhood of some expansion point :math:`\bar{\xi}`, denoted as :math:`g_{\bar{\xi}}\left(\xi\right)`.
Given :math:`g_{\bar{\xi}}`, the geoVI scheme approximates the posterior with a zero-mean, unit-covariance Gaussian :math:`\mathcal{Q}(y) = \mathcal{G}(y, 1)` in the transformed coordinates.
It can be expressed in :math:`\xi` coordinates via the pushforward by the inverse transformation :math:`\xi = g_{\bar{\xi}}^{-1}(y)`:
.. math::
\mathcal{Q}_{\bar{\xi}}(\xi) = \left(g_{\bar{\xi}}^{-1} * \mathcal{Q}\right)(\xi) = \int \delta\left(\xi - g_{\bar{\xi}}^{-1}(y)\right) \ \mathcal{G}(y, 1) \ \mathcal{D}y \ ,
where :math:`\delta` denotes the Dirac delta distribution.
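A small numerical illustration of this pushforward follows; the transformation ``g`` below is a hypothetical one-dimensional stand-in, not something NIFTy constructs. Standard-normal samples of :math:`y` are mapped through :math:`g^{-1}`, evaluated here by root finding, to obtain samples of :math:`\mathcal{Q}_{\bar{\xi}}`.

.. code-block:: python

    import numpy as np
    from scipy.optimize import brentq

    rng = np.random.default_rng(1)

    def g(xi):
        """Hypothetical monotone 1-d coordinate transformation."""
        return xi + 0.5*xi**3

    # Draw y ~ G(0, 1) and push the samples through the inverse of g.
    y = rng.standard_normal(10_000)
    xi = np.array([brentq(lambda t: g(t) - yi, -10., 10.) for yi in y])
    # xi now follows the pushforward of G(y, 1) by g^{-1}.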
geoVI determines the optimal expansion point :math:`\bar{\xi}` such that :math:`\mathcal{Q}_{\bar{\xi}}` matches the posterior as well as possible.
Analogous to the MGVI algorithm, :math:`\bar{\xi}` is obtained by minimization of the KL-divergence between :math:`\mathcal{P}` and :math:`\mathcal{Q}_{\bar{\xi}}` w.r.t. :math:`\bar{\xi}`.
Furthermore, the KL is represented as a stochastic estimate using a set of samples drawn from :math:`\mathcal{Q}_{\bar{\xi}}`, which is implemented in NIFTy7 via :func:`~nifty7.minimization.kl_energies.GeoMetricKL`.
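A minimal end-to-end sketch of how such a geoVI iteration might look in NIFTy7 follows. This is a hypothetical toy setup: the model, the stand-in data, the controller settings, and the iteration counts are all invented, and only the ``GeoMetricKL`` parameter names follow the signature visible in the ``kl_energies.py`` hunk further down.

.. code-block:: python

    import numpy as np
    import nifty7 as ift

    # Hypothetical toy problem: pointwise-exponentiated field, unit-noise data.
    dom = ift.RGSpace(64)
    signal = ift.ScalingOperator(dom, 1.0).ptw("exp")
    d = ift.from_random(dom)                       # stand-in "data"
    likelihood = ift.GaussianEnergy(mean=d, sampling_dtype=np.float64) @ signal
    ham = ift.StandardHamiltonian(
        likelihood, ic_samp=ift.AbsDeltaEnergyController(0.5, iteration_limit=100))

    minimizer = ift.NewtonCG(ift.AbsDeltaEnergyController(0.5, iteration_limit=20))
    position = ift.from_random(signal.domain)

    for _ in range(5):  # a handful of geoVI iterations
        kl = ift.GeoMetricKL(position, ham, n_samples=10,
                             minimizer_samp=minimizer, mirror_samples=True)
        kl, _ = minimizer(kl)      # minimize the stochastic KL estimate
        position = kl.position     # updated expansion point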
Publications
------------
If you use MGVI or geoVI, the authors of the respective papers [1]_ [2]_ would greatly appreciate a citation.
.. [1] J. Knollmüller, T.A. Enßlin, "Metric Gaussian Variational Inference"; `[arXiv:1901.11033] <https://arxiv.org/abs/1901.11033>`_
.. [2] P. Frank, R. Leike, and T.A. Enßlin (2021), "Geometric Variational Inference"; `[arXiv:2105.10470] <https://arxiv.org/abs/2105.10470>`_ `[doi] <https://doi.org/10.3390/e23070853>`_
NIFTy-related publications
==========================
.. parsed-literal::
@article{asclnifty5,
title={NIFTy5: Numerical Information Field Theory v5},
author={Arras, Philipp and Baltac, Mihai and Ensslin, Torsten A and Frank, Philipp and Hutschenreuter, Sebastian and Knollmueller, Jakob and Leike, Reimar and Newrzella, Max-Niklas and Platz, Lukas and Reinecke, Martin and others},
@@ -9,6 +9,7 @@ NIFTy-related publications
year={2019}
}
.. parsed-literal::
@software{nifty,
author = {{Martin Reinecke, Theo Steininger, Marco Selig}},
title = {NIFTy -- Numerical Information Field TheorY},
@@ -17,7 +18,8 @@ NIFTy-related publications
date = {2018-04-05},
}
.. parsed-literal::
@article{nifty1,
author = {{Selig}, M. and {Bell}, M.~R. and {Junklewitz}, H. and {Oppermann}, N. and {Reinecke}, M. and {Greiner}, M. and {Pachajoa}, C. and {En{\ss}lin}, T.~A.},
title = "{NIFTY - Numerical Information Field Theory. A versatile PYTHON library for signal inference}",
journal = {\aap},
@@ -35,7 +37,8 @@ NIFTy-related publications
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}
.. parsed-literal::
@article{nifty3,
author = {{Steininger}, T. and {Dixit}, J. and {Frank}, P. and {Greiner}, M. and {Hutschenreuter}, S. and {Knollm{\"u}ller}, J. and {Leike}, R. and {Porqueres}, N. and {Pumpe}, D. and {Reinecke}, M. and {{\v S}raml}, M. and {Varady}, C. and {En{\ss}lin}, T.},
title = "{NIFTy 3 - Numerical Information Field Theory - A Python framework for multicomponent signal inference on HPC clusters}",
journal = {ArXiv e-prints},
@@ -48,3 +51,18 @@ NIFTy-related publications
adsurl = {http://cdsads.u-strasbg.fr/abs/2017arXiv170801073S},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}
.. parsed-literal::
@article{geovi,
author = {Frank, Philipp and Leike, Reimar and Enßlin, Torsten A.},
title = {Geometric Variational Inference},
journal = {Entropy},
volume = {23},
year = {2021},
number = {7},
article-number = {853},
url = {https://www.mdpi.com/1099-4300/23/7/853},
issn = {1099-4300},
doi = {10.3390/e23070853}
}
@@ -181,81 +181,3 @@ In the high dimensional setting of field inference these volume factors can diff
A MAP estimate, which is only representative for a tiny fraction of the parameter space, might be a poorer choice (with respect to an error norm) compared to a slightly worse location with slightly lower posterior probability, which, however, is associated with a much larger volume (of nearby locations with similar probability).
This causes MAP signal estimates to be more prone to overfitting the noise as well as to perception thresholds than methods that take volume effects into account.
Metric Gaussian Variational Inference
-------------------------------------
One method that takes volume effects into account is Variational Inference (VI).
In VI, the posterior :math:`\mathcal{P}(\xi|d)` is approximated by a simpler, parametrized distribution, often a Gaussian :math:`\mathcal{Q}(\xi)=\mathcal{G}(\xi-m,D)`.
The parameters of :math:`\mathcal{Q}`, the mean :math:`m` and its covariance :math:`D` are obtained by minimization of an appropriate information distance measure between :math:`\mathcal{Q}` and :math:`\mathcal{P}`.
As a compromise between being optimal and being computationally affordable, the variational Kullback-Leibler (KL) divergence is used:
.. math::
\mathrm{KL}(m,D|d)= \mathcal{D}_\mathrm{KL}(\mathcal{Q}||\mathcal{P})=
\int \mathcal{D}\xi \,\mathcal{Q}(\xi) \log \left( \frac{\mathcal{Q}(\xi)}{\mathcal{P}(\xi)} \right)
Minimizing this with respect to all entries of the covariance :math:`D` is infeasible for fields.
Therefore, Metric Gaussian Variational Inference (MGVI) approximates the posterior precision matrix :math:`D^{-1}` at the location of the current mean :math:`m` by the Bayesian Fisher information metric,
.. math::
M \approx \left\langle \frac{\partial \mathcal{H}(d,\xi)}{\partial \xi} \, \frac{\partial \mathcal{H}(d,\xi)}{\partial \xi}^\dagger \right\rangle_{(d,\xi)}.
In practice the average is performed over :math:`\mathcal{P}(d,\xi)\approx \mathcal{P}(d|\xi)\,\delta(\xi-m)` by evaluating the expression at the current mean :math:`m`.
This results in a Fisher information metric of the likelihood evaluated at the mean plus the prior information metric.
Consequently, only the mean of the approximate distribution has to be inferred.
The only term within the KL-divergence that explicitly depends on the mean is the Hamiltonian of the true problem averaged over the approximation:
.. math::
\mathrm{KL}(m|d) \;\widehat{=}\;
\left\langle \mathcal{H}(\xi,d) \right\rangle_{\mathcal{Q}(\xi)},
where :math:`\widehat{=}` expresses equality up to irrelevant (here not :math:`m`-dependent) terms.
Thus, only the gradient of the KL with respect to the mean :math:`m` is needed, which can be expressed as
.. math::
\frac{\partial \mathrm{KL}(m|d)}{\partial m} = \left\langle \frac{\partial \mathcal{H}(d,\xi)}{\partial \xi} \right\rangle_{\mathcal{G}(\xi-m,D)}.
We estimate the KL-divergence and its gradient stochastically with a set of samples drawn from the approximate posterior distribution.
The particular structure of the covariance allows us to draw independent samples by solving a certain system of equations.
This KL-divergence for MGVI is implemented by
:func:`~nifty7.minimization.kl_energies.MetricGaussianKL` within NIFTy7.
Note that MGVI typically provides only a lower bound on the variance.
Geometric Variational Inference
-------------------------------
For non-linear posterior distributions :math:`\mathcal{P}(\xi|d)`, an approximation with a Gaussian :math:`\mathcal{Q}(\xi)` in the coordinates :math:`\xi` is sub-optimal, as higher-order interactions are ignored.
A better approximation can be achieved by constructing a coordinate system :math:`y = g\left(\xi\right)` in which the posterior is close to a Gaussian, and performing VI with a Gaussian :math:`\mathcal{Q}(y)` in these coordinates.
This approach is called geometric Variational Inference (geoVI).
It is discussed in detail in [6]_.
One useful coordinate system is obtained if the metric :math:`M` of the posterior can be expressed as the pullback of the Euclidean metric by :math:`g`:
.. math::
M = \left(\frac{\partial g}{\partial \xi}\right)^T \frac{\partial g}{\partial \xi} \ .
In general, such a transformation exists only locally, i.e. in a neighbourhood of some expansion point :math:`\bar{\xi}`, denoted as :math:`g_{\bar{\xi}}\left(\xi\right)`.
Using :math:`g_{\bar{\xi}}`, the GeoVI scheme uses a zero mean, unit Gaussian :math:`\mathcal{Q}(y) = \mathcal{G}(y, 1)` approximation.
It can be expressed in :math:`\xi` coordinates via the pushforward by the inverse transformation :math:`\xi = g_{\bar{\xi}}^{-1}(y)`:
.. math::
\mathcal{Q}_{\bar{\xi}}(\xi) = \left(g_{\bar{\xi}}^{-1} * \mathcal{Q}\right)(\xi) = \int \delta\left(\xi - g_{\bar{\xi}}^{-1}(y)\right) \ \mathcal{G}(y, 1) \ \mathcal{D}y \ ,
where :math:`\delta` denotes the Dirac delta distribution.
geoVI determines the optimal expansion point :math:`\bar{\xi}` such that :math:`\mathcal{Q}_{\bar{\xi}}` matches the posterior as well as possible.
Analogous to the MGVI algorithm, :math:`\bar{\xi}` is obtained by minimization of the KL-divergence between :math:`\mathcal{P}` and :math:`\mathcal{Q}_{\bar{\xi}}` w.r.t. :math:`\bar{\xi}`.
Furthermore, the KL is represented as a stochastic estimate using a set of samples drawn from :math:`\mathcal{Q}_{\bar{\xi}}`, which is implemented in NIFTy7 via :func:`~nifty7.minimization.kl_energies.GeoMetricKL`.
A visual comparison of the MGVI and geoVI algorithms can be found in `variational_inference_visualized.py <https://gitlab.mpcdf.mpg.de/ift/nifty/-/blob/NIFTy_7/demos/variational_inference_visualized.py>`_.
.. [6] P. Frank, R. Leike, and T.A. Enßlin (2021), "Geometric Variational Inference"; `[arXiv:2105.10470] <https://arxiv.org/abs/2105.10470>`_
@@ -12,6 +12,7 @@ are found in the `API reference <../mod/nifty7.html>`_.
whatisnifty
installation
ift
approximate_inference
volume
code
citations
@@ -15,24 +15,15 @@
#
# NIFTy is being developed at the Max-Planck-Institut fuer Astrophysik.
from setuptools import find_packages, setup
import os
import site
import sys
from setuptools import find_packages, setup
def write_version():
    import subprocess
    try:
        p = subprocess.Popen(["git", "describe", "--dirty", "--tags", "--always"],
                             stdout=subprocess.PIPE)
        res = p.communicate()[0].strip().decode('utf-8')
    except FileNotFoundError:
        print("Could not determine version string from git history")
        res = "unknown"
    with open(os.path.join("nifty7", "git_version.py"), "w") as f:
        f.write('gitversion = "{}"\n'.format(res))

write_version()

# Workaround until https://github.com/pypa/pip/issues/7953 is fixed
site.ENABLE_USER_SITE = "--user" in sys.argv[1:]
exec(open('nifty7/version.py').read())
with open("README.md") as f:
...
@@ -496,6 +496,7 @@ def GeoMetricKL(mean, hamiltonian, n_samples, minimizer_samp, mirror_samples,
--------
`Geometric Variational Inference`, Philipp Frank, Reimar Leike,
Torsten A. Enßlin, `<https://arxiv.org/abs/2105.10470>`_
`<https://doi.org/10.3390/e23070853>`_
"""
if not isinstance(hamiltonian, StandardHamiltonian):
    raise TypeError
...
# Store the version here so:
# 1) we don't load dependencies by storing it in __init__.py
# 2) we can import it in setup.py for the same reason
# 3) we can import it into your module
__version__ = '7.0'
def gitversion():
    try:
        from .git_version import gitversion
    except ImportError:
        return "unknown"
    return gitversion