Skip to content
GitLab
Projects
Groups
Snippets
/
Help
Help
Support
Community forum
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
Menu
Open sidebar
Philipp Arras
whatisalikelihood
Commits
6adf12f8
Commit
6adf12f8
authored
Aug 24, 2018
by
Philipp Arras
Browse files
More text
parent
4cf08f90
Pipeline
#35435
failed with stage
in 19 seconds
Changes
1
Pipelines
1
Hide whitespace changes
Inline
Sidebyside
main.tex
View file @
6adf12f8
...
...
@@ 3,6 +3,10 @@
\usepackage
{
csquotes
}
\usepackage
{
hyperref
}
\usepackage
{
fancyvrb
}
\usepackage
[onehalfspacing]
{
setspace
}
\usepackage
{
libertine
}
% Ein bisschen enger. Formeln normal.
\setlength
{
\parindent
}{
0mm
}
\usepackage
[left=3cm,right=2.25cm,top=3cm,bottom=3cm]
{
geometry
}
\usepackage
{
listings
}
\usepackage
{
color
}
...
...
@@ 49,10 +53,54 @@
\begin{document}
\VerbatimFootnotes
\section*
{
The need for a guide
}
A Bayesian reconstruction algorithm may be viewed as having three basic building
blocks. First, there is the
\emph
{
likelihood
}
$
\mathcal
P
(
ds
)
$
which describes the
measurement process. It incorporates all information one has about the
experiment. It contains the data, the statistics of the data including its
uncertainty and a description of the measurement device itself. Second, the
\emph
{
prior
}
$
\mathcal
P
(
s
)
$
describes the knowledge the scientist has
\emph
{
before
}
executing the experiment. In order to set a proper prior one needs
to make the knowledge one has about the physical process which was observed by
the measurement device. Finally, one needs an algorithmic and computational
framework which is able to actually do the Bayesian inference given the above
information.
It becomes clear that these three parts are relatively separate from each other.
If you have some data and know your measurement process and measurement device,
you've got all information in order to implement the likelihood. In contrast,
the IFT group led by Torsten En
{
\ss
}
lin has knowledge and GPLlicensed software
which can perform the inference. The interesting part is the second one, i.e.
thinking about the priors. Here one needs to have an understanding both of the
underlying physics which one has observed and probability theory and Bayesian
statistics. This is moment where an interesting interaction between observers,
i.e. you, and information field theorists, i.e. Torsten and his group, can
happen.
Since Torsten's group cannot help you a lot in implementing the likelihood, we
decided to put together this guide in order to explain to you how to do it on
your own in a fashion which is compatible with our inference machinery called
NIFTy.
\section*
{
Disclaimer
}
\begin{itemize}
\item
IFT is good for reconstructing fields, i.e. physical quantities (like
temperature, flux, etc.) which are defined on a continuous space (like time,
space, frequency/energy of electromagnetic radiation). If you've got data,
which tells you something about such a field, keep on reading. If not, you may
use IFT and NIFTy as well but perhaps there are other methods available to you
which fit your needs better.
\item
The explanations below shall be pretty general in order to be applicable
to a variety of problems. However, there is some probability that your
\texttt
{
R
}
(will be defined below) is just a linear function. In this case
everything becomes very simple, since the derivative of a linear function is
just the function itself. Then only
\texttt
{
R(s)
}
and
\texttt
{
R
\_
adjoint(d)
}
need to be implemented. You do not need to understand what I mean already at
this point. Just feel lucky and relax. You may relax anyways.
\item
Read the full guide before you start to implement something.
\end{itemize}
\section*
{
Mathematical description of what NIFTy needs
}
For first order minimization:
\begin{itemize}
\item
$
\mathcal
H
(
ds
)
=

\log
\mathcal
P
(
ds
)
+
\text
{
constant in
}
s
$
\item
$
\mathcal
H'
=
\frac
{
\partial
}{
\partial
s
}
\mathcal
H
(
ds
)
$
...
...
@@ 81,23 +129,23 @@ Since NIFTy is implemented in python and is based on numpy let us be as specific
as possible and talk about numpy arrays. In the end,
$
s
$
and
$
d
$
will be a numpy
array defined in a python script.
The array
\
verb

d

is created by a script which reads in the actual data which
The array
\
texttt
{
d
}
is created by a script which reads in the actual data which
drops out of an instrument. This array will remain constant throughout the IFT
algorithm since the data is given and never will be changed.
The array
\
verb

s

is obviously not constant; only its shape won't change in the
The array
\
texttt
{
s
}
is obviously not constant; only its shape won't change in the
course of the algorithm. NIFTy will store here the reconstruction of the
physical field which the user wants to infer.
\
verb

s

would be a
physical field which the user wants to infer.
\
texttt
{
s
}
would be a
onedimensional array for a timeseries, twodimensional for a multifrequency
timeseries or for a singlefrequency image of the sky, threedimensional for a
multifrequency image of sky, etc. To cut a long story short: be aware of the
shape of
\
verb

s

and
\verb

d

!
shape of
\
texttt
{
s
}
and
\texttt
{
d
}
!
Now, the response appears. When we talk about
\enquote
{
the response
}
we mean a
function
\
verb

R(s)

which takes an array of shape
\
verb

s.shape

and returns an
array of shape
\
verb

d.shape

. This function shall be made such that it
simulates the measurement device. Assume one knows the signal
\
verb

s

, what
would be the data
\
verb

d

in a noiseless measurement? Make sure that the
function
\
texttt
{
R(s)
}
which takes an array of shape
\
texttt
{
s.shape
}
and returns an
array of shape
\
texttt
{
d.shape
}
. This function shall be made such that it
simulates the measurement device. Assume one knows the signal
\
texttt
{
s
}
, what
would be the data
\
texttt
{
d
}
in a noiseless measurement? Make sure that the
following code runs:
\begin{lstlisting}
...
...
@@ 119,32 +167,32 @@ function:
\begin{align*}
R:
\mathbb
R
^{
s.shape
}
\to
\mathbb
R
^{
d.shape
}
,
\end{align*}
its gradient at the position
\
verb

s=position

is a linear map of the following shape:
its gradient at the position
\
texttt
{
s=position
}
is a linear map of the following shape:
\footnote
{
There are various ways to think about derivatives and gradients of
multidimensional functions. A different view on gradients would be that at a
given point
$
s
=
s
_
0
$
the gradient is a matrix with
\
verb

s.size

columns and
\
verb

d.size

rows. Obviously, it is not feasible to store such a matrix on a
given point
$
s
=
s
_
0
$
the gradient is a matrix with
\
texttt
{
s.size
}
columns and
\
texttt
{
d.size
}
rows. Obviously, it is not feasible to store such a matrix on a
computer due to its size. There we think of this matrix in terms of a linear
map which maps an array of shape
\
verb

s.shape

to an array of shape
\
verb

d.shape

. This linear map shall be implemented on the computer in terms
of a function. Think of this map as a linear approximation to
\
verb

R

based
at
\
verb

s
_0

.
}
map which maps an array of shape
\
texttt
{
s.shape
}
to an array of shape
\
texttt
{
d.shape
}
. This linear map shall be implemented on the computer in terms
of a function. Think of this map as a linear approximation to
\
texttt
{
R
}
based
at
\
texttt
{
s
\
_
0
}
.
}
\begin{align*}
\left
.
\frac
{
dR
}{
ds
}
\right

_{
s=position
}
= R':
\mathbb
R
^{
s.shape
}
\to
\mathbb
R
^{
d.shape
}
\end{align*}
What needs to be implemented is a function
\
verb

R
_prime(position, s0)

which
takes the arguments
\
verb

position

(which is an array of shape
\
verb

s.shape

What needs to be implemented is a function
\
texttt
{
R
\
_
prime(position, s0)
}
which
takes the arguments
\
texttt
{
position
}
(which is an array of shape
\
texttt
{
s.shape
}
and determines the position at which we want to calculate the derivative) and
the array
\
verb

s0

which shall be the derivative taken of.
\
verb

R
_prime

is nonlinear in
\
verb

position

in general and linear in
\
verb

s0

. The output of
\
verb

R
_prime

is of shape
\
verb

d.shape

.
the array
\
texttt
{
s0
}
which shall be the derivative taken of.
\
texttt
{
R
\
_
prime
}
is nonlinear in
\
texttt
{
position
}
in general and linear in
\
texttt
{
s0
}
. The output of
\
texttt
{
R
\
_
prime
}
is of shape
\
texttt
{
d.shape
}
.
Finally, we need to implement the function
\
verb
R
_
prime
_
adjoint(postition,
d0).
It implements the adjoint action of the derivative:
$
R'
^
\dagger
$
. It takes
a
\
verb

position

of shape
\
verb

s.shape

as well. After the position is set
Finally, we need to implement the function
\
texttt
{
R
\
_
prime
\
_
adjoint(postition,
d0)
}
.
It implements the adjoint action of the derivative:
$
R'
^
\dagger
$
. It takes
a
\
texttt
{
position
}
of shape
\
texttt
{
s.shape
}
as well. After the position is set
we've got a linear function again which shall be the adjoint to
\
verb

R
_prime(position, _)

. Thus,
\
verb

d0

has the same shape as the data
\
verb

d

and the output of
\
verb

R
_prime

has the shape
\
verb

s.shape

.
\
texttt
{
R
\
_
prime(position,
\
_
)
}
. Thus,
\
texttt
{
d0
}
has the same shape as the data
\
texttt
{
d
}
and the output of
\
texttt
{
R
\
_
prime
}
has the shape
\
texttt
{
s.shape
}
.
In order to test your implementation make sure that the following code runs
through:
...
...
@@ 168,16 +216,63 @@ np.testing.assert_allclose(finite_diff, derivative)
\end{lstlisting}
\section*
{
The same in NIFTy language
}
You might have noticed that you needed to use the size of a pixel in signal
space in order to implement your response. This pixel size will change as soon
as we change the resolution of our reconstruction
\texttt
{
s
}
. And this is where
NIFTy comes in. NIFTy has the notion of
\enquote
{
spaces
}
. In a nutshell, a NIFTy
space comes with the following properties:
\begin{itemize}
\item
Topology of the space: NIFTy supports regular grid spaces of arbitrary
dimensions, two 2sphere pixelations and unstructured spaces (which do not
have a specific topology and do not have pixel sizes).
\item
Size of a pixel. This is called
\enquote
{
distances
}
in NIFTy.
\end{itemize}
What needs to be done now, is to make the three functions which were implemented
above (
\texttt
{
R(s)
}
,
\texttt
{
R
\_
prime(position, s)
}
and
\texttt
{
R
\_
prime
\_
adjoint(position, d)
}
) aware of the space on which
\texttt
{
position
}
,
\texttt
{
s
}
(these are the same) and
\texttt
{
d
}
are defined.
To this end, define your own class
\texttt
{
Response
}
which inherits from the
NIFTy class
\texttt
{
LinearOperator
}
. I recommend to take a simple linear
operator like the
\texttt
{
FieldZeroPadder
}
, copy it and adopt it to your needs.
The method
\texttt
{
apply()
}
takes an instance of
\texttt
{
Field
}
(which is
esentially a numpy array accompanied by a domain) and returns one as well.
Some tipps exclusively for you:
\begin{itemize}
\item
Please have a look at the method
\texttt
{
weight()
}
of
\texttt
{
Field
}
. With
it, you can easily multiply the field values with or divide by the volume of
each pixel.
\item
The methods
\texttt
{
from
\_
global
\_
data()
}
and
\texttt
{
to
\_
global
\_
data()
}
convert a Field to a numpy array and vice versa.
\item
Be aware of the fact that NIFTy fields are immuatable. As soon as you
pass a numpy array into a NIFTy field with
\texttt
{
from
\_
global
\_
data()
}
the
\enquote
{
lock
}
flag will be set and the numpy array is immuatable
afterwards.
\texttt
{
to
\_
global
\_
data()
}
returns an object reference to this
numpy array which is locked and cannot be modified. If you want to modify
it, you may want to copy it with
\texttt
{
to
\_
global
\_
data().copy()
}
.
\end{itemize}
The point is: You need to fiddle around until the following test passes:
\begin{lstlisting}
import nifty5 as ift
# op is the linear operator you have defined
ift.extra.consistency
_
check(op)
\end{lstlisting}
This test does the same as the test for the adjointness above.
% TODO Include models
\section*
{
Example
$
\gamma
$
ray imaging
}
\section*
{
Example
:
$
\gamma
$
ray imaging
}
The information a
$
\gamma
$
ray astronomer would provide to the algorithm (in the
simplest case):
\begin{itemize}
\item
Data has Poissonian statistics.
\item
Two functions:
\
verb

R(s)

and
\
verb

R
_adjoint(d)

where
$
R
$
applies a convolution with
a Gaussian kernel of given size to
\
verb

s

and picks out the positions to which
there exists a data point in
\
verb

d

.
\verb

R
_adjoint(d)

implements
$
R
^
\dagger
$
.
\item
Two functions:
\
texttt
{
R(s)
}
and
\
texttt
{
R
\
_
adjoint(d)
}
where
$
R
$
applies a convolution with
a Gaussian kernel of given size to
\
texttt
{
s
}
and picks out the positions to which
there exists a data point in
\
texttt
{
d
}
.
\texttt
{
R
\
_
adjoint(d)
}
implements
$
R
^
\dagger
$
.
\end{itemize}
Why is this already sufficient?
\begin{itemize}
...
...
Write
Preview
Supports
Markdown
0%
Try again
or
attach a new file
.
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment