diff --git a/main.tex b/main.tex
index 4a6cdf34cfbe1cdd149d11c59f9b759621ff9a2b..cb45ea03f15bf6d7c0a360199969adbe4f74cc0f 100644
--- a/main.tex
+++ b/main.tex
@@ -55,59 +55,45 @@
 \VerbatimFootnotes
 \section*{Nomenclature}
-$d$ is the data vector and $s$ is the physical field about which one wants to
-learn something given the data $d$ in a Bayesian fashion.
+$d$ is the data vector and $s$ is the physical field about which one wants to learn something given the data $d$ in a Bayesian fashion.
 \section*{Why should I read this guide?}
-Bayesian reconstruction algorithms may be viewed in terms of three basic
-building blocks.
+Bayesian reconstruction algorithms may be viewed in terms of three basic building blocks.
 \begin{enumerate}
 \item The \emph{likelihood} $\mathcal P(d|s)$ describes the measurement process.
- It incorporates all available information about the experiment. It contains
- the data, the statistics of the data including the error bars and a
- description of the measurement device itself.
-\item The \emph{prior} $\mathcal P(s)$ describes the knowledge the scientist has
- \emph{before} executing the experiment. In order to define a prior one needs to
- make one's knowledge about the physical process which was observed by the
- measurement device explicit.
+ It incorporates all available information about the experiment.
+ It contains the data, the statistics of the data including the error bars, and a description of the measurement device itself.
+\item The \emph{prior} $\mathcal P(s)$ describes the knowledge the scientist has \emph{before} executing the experiment.
+ In order to define a prior, one needs to make explicit one's knowledge about the physical process observed by the measurement device.
 \item Finally, one needs an algorithmic and computational framework which is able to actually do the Bayesian inference, i.e.\ compute the \emph{posterior} $\mathcal P(s|d)$, given the above information.
 \end{enumerate}
-It becomes clear that these three parts are separate from each other. To
-implement the likelihood one needs the data and know the measurement process and
-measurement device. Then, one's got all information in order to implement the
-likelihood. This is your part!
+It becomes clear that these three parts are separate from each other.
+To implement the likelihood, one needs the data and knowledge of the measurement process and the measurement device.
+Then one has all the information needed to implement the likelihood.
+This is your part!
-The IFT group led by Torsten En{\ss}lin has knowledge and GPL-licensed software
-to address the third part.
+The IFT group led by Torsten En{\ss}lin has the knowledge and GPL-licensed software to address the third part.
-The interesting part is the second one, i.e. thinking about the priors. Here one
-needs to have an understanding both of the underlying physics and Bayesian probability
-theory. This is where an interesting interaction between
-observers, i.e. you, and information field theorists, i.e. Torsten and his
-group, will happen.
+The interesting part is the second one, i.e.\ thinking about the priors.
+Here one needs an understanding of both the underlying physics and Bayesian probability theory.
+This is where an interesting interaction between observers, i.e.\ you, and information field theorists, i.e.\ Torsten and his group, will happen.
-Since Torsten's group cannot help you a lot in implementing the likelihood, we
-have put together this guide in order to explain to you how to do it on your own
-in a fashion which is compatible with our inference machinery called NIFTy.
+Since Torsten's group cannot help you much with implementing the likelihood, we have put together this guide to explain how to do it on your own in a fashion which is compatible with our inference machinery, called NIFTy.
 \section*{Disclaimer}
 \begin{itemize}
-\item IFT is good for reconstructing fields, i.e. physical quantities (like
- temperature, flux, etc.) which are defined on a continuous space (like time,
- space, frequency/energy of electromagnetic radiation). If you've got data,
- which tells you something about such a field, keep on reading. If not, you may
- use IFT and NIFTy as well, but there may be other methods available which
- fit your needs better.
-\item The explanations below shall be pretty general in order to be applicable
- to a variety of problems. Therefore, the text might be difficult to understand
- in its full generality. However, it is not unlikely that your \lstinline{R}
- (which will be defined below) is just a linear function. In this case
- everything becomes very simple, since the derivative of a linear function is
- just the function itself. Then only \lstinline{R(s)} and \lstinline{R_adjoint(d)}
- need to be implemented. You do not need to understand what I'm saying already
- at this point. Just feel lucky and relax. You may relax anyways.
+\item IFT is good for reconstructing fields, i.e.\ physical quantities (like temperature, flux, etc.) which are defined on a continuous space (like time, space, or the frequency/energy of electromagnetic radiation).
+ If you've got data which tell you something about such a field, keep on reading.
+ If not, you may use IFT and NIFTy as well, but there may be other methods available which fit your needs better.
+\item The explanations below are kept fairly general in order to be applicable to a variety of problems.
+ Therefore, the text might be difficult to understand in its full generality.
+ However, it is not unlikely that your \lstinline{R} (which will be defined below) is just a linear function.
+ In this case everything becomes very simple, since the derivative of a linear function is the function itself.
+ Then only \lstinline{R(s)} and \lstinline{R_adjoint(d)} need to be implemented.
+ You do not need to understand what I'm saying at this point yet.
+ Just feel lucky and relax.
+ You may relax anyway.
 \item Read the full guide before you start implementing something.
 \end{itemize}
@@ -117,17 +103,18 @@ The abstract definition of what you'll need to implement is very short:
 \item The log-likelihood $\mathcal H(d|s) := -\log \mathcal P(d|s)$ as a function of $s$.
 \item Its derivative $\mathcal H' = \frac{\partial}{\partial s} \mathcal H(d|s)$.
 \end{itemize}
-That's it. The rest of this paper explains what these formulae mean and how to
-compute them. From now one, our discussion will become increasingly specific.
+That's it.
+The rest of this guide explains what these formulae mean and how to compute them.
+From now on, our discussion will become increasingly specific.
 \section*{Becoming more specific}
 If the likelihood is Gaussian (with noise covariance $N$ and response $R$)
 \begin{align*}
-\mathcal H(d|s) \propto (d-R(s))^\dagger N^{-1}(d-R(s))
+ \mathcal H(d|s) \propto (d-R(s))^\dagger N^{-1}(d-R(s))
 \end{align*}
 or Poissonian
 \begin{align*}
-\mathcal H(d|s) \propto -\log (R(s))^\dagger d+\sum_i R(s)_i,
+ \mathcal H(d|s) \propto -\log (R(s))^\dagger d+\sum_i R(s)_i,
 \end{align*}
 NIFTy needs:
 \begin{itemize}
@@ -138,31 +125,25 @@ NIFTy needs:
 \section*{Even more specific}
-Since NIFTy is implemented in Python and is based on numpy let us be as specific
-as possible and talk about numpy arrays. In the end, $s$ and $d$ will be numpy
-arrays.
-
-The array \lstinline{d} is created by a script which reads in the actual data which
-drops out of an instrument. This array will remain constant throughout the IFT
-algorithm since the data is given and never will be changed.
-
-The array \lstinline{s} is obviously not constant; only its shape won't change in
-the course of the algorithm. In \lstinline{s}, NIFTy will store the reconstruction
-of the physical field which the user wants to infer. \lstinline{s} would be a
-one-dimensional array for a time series, two-dimensional for a multi-frequency
-time series or for a single-frequency image of the sky, three-dimensional for a
-multi-frequency image of the sky, etc. To cut a long story short: be aware of the
-shape of \lstinline{s} and \lstinline{d}!
+Since NIFTy is implemented in Python and is based on numpy, let us be as specific as possible and talk about numpy arrays.
+In the end, $s$ and $d$ will be numpy arrays.
+
+The array \lstinline{d} is created by a script which reads in the actual data that comes out of an instrument.
+This array will remain constant throughout the IFT algorithm, since the data is given and will never be changed.
+
+The array \lstinline{s} is obviously not constant; only its shape won't change in the course of the algorithm.
+In \lstinline{s}, NIFTy will store the reconstruction of the physical field which the user wants to infer.
+\lstinline{s} would be a one-dimensional array for a time series, two-dimensional for a multi-frequency time series or for a single-frequency image of the sky, three-dimensional for a multi-frequency image of the sky, etc.
+To cut a long story short: be aware of the shapes of \lstinline{s} and \lstinline{d}!
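To make the shape bookkeeping concrete, here is a tiny numpy sketch; all sizes are invented purely for illustration:

```python
import numpy as np

# All sizes below are invented purely for illustration.
n_time, n_freq = 100, 16
s_1d = np.zeros(n_time)            # time series
s_2d = np.zeros((n_freq, n_time))  # multi-frequency time series
s_3d = np.zeros((n_freq, 64, 64))  # multi-frequency image of the sky
d = np.zeros(37)                   # the data vector has its own, fixed shape

print(s_1d.shape, s_2d.shape, s_3d.shape, d.shape)
```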
 \subsection*{The response}
-When we talk about \enquote{the response}, we mean a function \lstinline{R(s)} which
-takes an array of shape \lstinline{s.shape} and returns an array of shape
-\lstinline{d.shape}. This function shall be made such that it simulates the
-measurement device. Assume one knows the signal \lstinline{s}, what would be the
-data \lstinline{d} in a noiseless measurement? Make sure that you understand what a
-response is to 100\%. If you have questions about it, do not hesitate to ask!
-This is one of the crucial parts of the whole process. The following code shall
-be running.
+When we talk about \enquote{the response}, we mean a function \lstinline{R(s)} which takes an array of shape \lstinline{s.shape} and returns an array of shape \lstinline{d.shape}.
+This function shall be made such that it simulates the measurement device: assuming one knows the signal \lstinline{s}, what would the data \lstinline{d} be in a noiseless measurement?
+Make sure that you understand 100\% what a response is.
+If you have questions about it, do not hesitate to ask!
+This is one of the crucial parts of the whole process.
+The following code shall run.
 \begin{lstlisting}
 import numpy as np
@@ -179,40 +160,31 @@ else:
 \end{lstlisting}
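Since the listing above is abbreviated here, the following is a hedged sketch of what such a response function could look like; the boxcar \enquote{beam} and the sample positions are invented stand-ins for a real instrument model:

```python
import numpy as np

# Hypothetical instrument: smooth the signal with a boxcar "beam", then
# read it off at the pixels where measurements were actually taken.
# Kernel width and sample positions are invented for this sketch.
sample_positions = np.array([3, 17, 42, 80])

def R(s):
    smoothed = np.convolve(s, np.ones(5) / 5.0, mode="same")
    return smoothed[sample_positions]

s = np.random.default_rng(0).standard_normal(100)
d = R(s)
assert s.shape == (100,) and d.shape == (4,)
```

Note that \lstinline{R} maps an array of shape \lstinline{s.shape} to one of shape \lstinline{d.shape}, exactly as required above.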
 \subsection*{Derivative of response}
-Next, $R'$ and $R'^\dagger$ need to be implemented. Since $R$ is a
-function:
+Next, $R'$ and $R'^\dagger$ need to be implemented.
+Since $R$ is a function
 \begin{align*}
 R: \mathbb R^{s.shape} \to \mathbb R^{d.shape},
 \end{align*}
-its gradient at the position \lstinline{s=position} is a linear map of the
-following shape: \footnote{There are various ways to think about derivatives and
- gradients of multi-dimensional functions. A different view on gradients would
- be that at a given point $s=\text{position}$ the gradient is a matrix with
- \lstinline{s.size} columns and \lstinline{d.size} rows. Obviously, it is not
- feasible to store such a matrix on a computer due to its size. Therefore we think
- of this matrix in terms of a linear map which maps an array of shape
- \lstinline{s.shape} to an array of shape \lstinline{d.shape}. This linear map shall
- be implemented on the computer in terms of a function. Think of this map as a
- linear approximation to \lstinline{R} based at \lstinline{position}.}
+its gradient at the position \lstinline{s=position} is a linear map of the following shape:\footnote{There are various ways to think about derivatives and gradients of multi-dimensional functions.
+ A different view on gradients would be that at a given point $s=\text{position}$ the gradient is a matrix with \lstinline{s.size} columns and \lstinline{d.size} rows.
+ Obviously, it is not feasible to store such a matrix on a computer due to its size.
+ Therefore, we think of this matrix in terms of a linear map which maps an array of shape \lstinline{s.shape} to an array of shape \lstinline{d.shape}.
+ This linear map shall be implemented on the computer as a function.
+ Think of this map as a linear approximation to \lstinline{R} based at \lstinline{position}.}
 \begin{align*}
 \left.\frac{dR}{ds}\right|_{s=\text{position}} = R': \mathbb R^{s.shape} \to \mathbb R^{d.shape}
 \end{align*}
-What needs to be implemented is a function \lstinline{R_prime(position, s0)} which
-takes the arguments \lstinline{position} (which is an array of shape \lstinline{s.shape}
-and determines the position at which we want to calculate the derivative) and
-the array \lstinline{s0} of which the derivative shall be taken.
-\lstinline{R_prime} is nonlinear in \lstinline{position} in general and linear in
-\lstinline{s0}. The output of \lstinline{R_prime} is of shape \lstinline{d.shape}.
+What needs to be implemented is a function \lstinline{R_prime(position, s0)} which takes the arguments \lstinline{position} (an array of shape \lstinline{s.shape} which determines the position at which we want to calculate the derivative) and the array \lstinline{s0} to which the derivative is applied.
+\lstinline{R_prime} is in general nonlinear in \lstinline{position} and linear in \lstinline{s0}.
+The output of \lstinline{R_prime} is of shape \lstinline{d.shape}.
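Before wiring this into NIFTy, it may help to see the idea in plain numpy. The following sketch uses an invented nonlinear response $R(s) = A\exp(s)$ (the matrix \lstinline{A} and all shapes are made up for illustration) and checks \lstinline{R_prime} against a finite difference:

```python
import numpy as np

# Hypothetical nonlinear response R(s) = A @ exp(s); A is an invented
# matrix standing in for a real instrument model.
rng = np.random.default_rng(1)
A = rng.standard_normal((7, 20))   # s.shape == (20,), d.shape == (7,)

def R(s):
    return A @ np.exp(s)

def R_prime(position, s0):
    # Chain rule: the derivative of A @ exp(s) at `position`, applied to
    # s0, is A @ (exp(position) * s0) -- linear in s0, not in position.
    return A @ (np.exp(position) * s0)

# Check the implementation against a finite difference.
position, s0 = rng.standard_normal(20), rng.standard_normal(20)
eps = 1e-6
finite_diff = (R(position + eps * s0) - R(position)) / eps
np.testing.assert_allclose(finite_diff, R_prime(position, s0),
                           rtol=1e-4, atol=1e-3)
```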
 Finally, we need to implement the function \lstinline{R_prime_adjoint(position, d0)}.
-It implements the adjoint action of the derivative: $R'^\dagger$. It takes
-a \lstinline{position} of shape \lstinline{s.shape} as well. After the position is set
-we've got a linear function again which shall be the adjoint to
-\lstinline{R_prime(position, _)}. Thus, \lstinline{d0} has the same shape as the data
-\lstinline{d} and the output of \lstinline{R_prime} has the shape \lstinline{s.shape}.
-
-In order to test your implementation make sure that the following code runs
-through:
+It implements the adjoint action of the derivative, $R'^\dagger$.
+It takes a \lstinline{position} of shape \lstinline{s.shape} as well.
+After the position is set, we've got a linear function again, which shall be the adjoint of \lstinline{R_prime(position, _)}.
+Thus, \lstinline{d0} has the same shape as the data \lstinline{d}, and the output of \lstinline{R_prime_adjoint} has the shape \lstinline{s.shape}.
+
+In order to test your implementation, make sure that the following code runs through:
 \begin{lstlisting}
 from numpy.random import default_rng
 rng = default_rng()
@@ -236,50 +208,38 @@ np.testing.assert_allclose(finite_diff, derivative)
 \end{lstlisting}
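The listing above is abbreviated here. As a self-contained illustration of the adjointness condition $\langle d_0, R's_0\rangle = \langle R'^\dagger d_0, s_0\rangle$, here is a toy version with an invented linear response (so $R'$ does not depend on \lstinline{position}):

```python
import numpy as np

# Adjointness check for a toy *linear* response R(s) = A @ s, for which
# R' = R and the adjoint applies A.T.  A is invented for illustration.
rng = np.random.default_rng(2)
A = rng.standard_normal((7, 20))

def R_prime(position, s0):
    return A @ s0            # independent of position, since R is linear

def R_prime_adjoint(position, d0):
    return A.T @ d0

position = rng.standard_normal(20)
s0 = rng.standard_normal(20)
d0 = rng.standard_normal(7)

lhs = d0 @ R_prime(position, s0)            # <d0, R' s0>
rhs = R_prime_adjoint(position, d0) @ s0    # <R'^dagger d0, s0>
np.testing.assert_allclose(lhs, rhs)
```

If this scalar identity holds for arbitrary \lstinline{s0} and \lstinline{d0}, the adjoint is implemented consistently.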
 \section*{The same in NIFTy language}
-You might have noticed that you needed to use the size of a pixel in signal
-space in order to implement your response. This pixel size will change as soon
-as we change the resolution of our reconstruction \lstinline{s}. And this is where
-NIFTy comes in. NIFTy has the notion of \enquote{spaces}. In a nutshell, a NIFTy
-space comes with the following properties:
+You might have noticed that you needed to use the size of a pixel in signal space in order to implement your response.
+This pixel size will change as soon as we change the resolution of our reconstruction \lstinline{s}.
+And this is where NIFTy comes in: NIFTy has the notion of \enquote{spaces}.
+In a nutshell, a NIFTy space comes with the following properties:
 \begin{itemize}
 \item Topology of the space supported by NIFTy:
-\begin{itemize}
-\item regular grid spaces of arbitrary dimensions (RGSpace)
-\item two 2-sphere pixelations, namely HEALPix (HPSpace) and Gauss-Legendre (GLSpace)
-\item unstructured spaces (UnstructuredDomain) (which do not
- have a specific topology and do not have pixel sizes).
-\end{itemize}
-
+ \begin{itemize}
+ \item regular grid spaces of arbitrary dimensions (RGSpace),
+ \item two 2-sphere pixelations, namely HEALPix (HPSpace) and Gauss-Legendre (GLSpace),
+ \item unstructured spaces (UnstructuredDomain), which do not have a specific topology and do not have pixel sizes.
+ \end{itemize}
+
 \item Size of a pixel. This is called \enquote{distances} in NIFTy.
 \end{itemize}
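To see why the pixel size (\enquote{distances}) matters, consider a response that integrates the field. The sketch below is plain numpy, not NIFTy API, and the instrument is invented: only if the pixel volume enters does the result stay stable under a change of resolution.

```python
import numpy as np

# A response that integrates the signal over its domain must multiply by
# the pixel volume ("distances"), otherwise its output changes whenever
# the resolution of s changes.
def R(s, distances):
    return np.array([np.sum(s) * distances])   # integral of the field

total_time = 1.0                 # length of the signal domain (invented)
for npix in (100, 1000):
    s = np.ones(npix)            # the constant field s(t) = 1
    # The integral of 1 over [0, 1] is 1, at every resolution.
    assert np.isclose(R(s, total_time / npix)[0], 1.0)
```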
-What needs to be done now, is to make the three functions which were implemented
-above (\lstinline{R(s)}, \lstinline{R_prime(position, s)} and
-\lstinline{R_prime_adjoint(position, d)}) aware of the space on which
-\lstinline{position}, \lstinline{s} (these are the same) and \lstinline{d} are defined.
+What needs to be done now is to make the three functions implemented above (\lstinline{R(s)}, \lstinline{R_prime(position, s)} and \lstinline{R_prime_adjoint(position, d)}) aware of the spaces on which \lstinline{position}, \lstinline{s} (these two are the same) and \lstinline{d} are defined.
 \subsection*{Derivative of response}
-To this end, define your own class \lstinline{DerivativeResponse} which inherits
-from the NIFTy class \lstinline{LinearOperator}. I recommend to take a simple
-linear operator like the \lstinline{FieldZeroPadder}, copy it and adopt it to your
-needs. The method \lstinline{apply()} takes an instance of \lstinline{Field} (which is
-esentially a numpy array accompanied by a domain) and returns one as well.
+To this end, define your own class \lstinline{DerivativeResponse} which inherits from the NIFTy class \lstinline{LinearOperator}.
+I recommend taking a simple linear operator like the \lstinline{FieldZeroPadder}, copying it, and adapting it to your needs.
+The method \lstinline{apply()} takes an instance of \lstinline{Field} (which is essentially a numpy array accompanied by a domain) and returns one as well.
 Some tips exclusively for you:
 \begin{itemize}
-\item Please have a look at the method \lstinline{weight()} of \lstinline{Field}. With
- it, you can easily multiply the field values with or divide by the volume of
- each pixel.
-\item The methods \lstinline{from_raw()} and \lstinline{val()}
- convert a Field to a numpy array and vice versa.
-\item Be aware of the fact that NIFTy fields are immutable. As soon as you
- pass a numpy array into a NIFTy field with \lstinline{from_raw()}, the
- \enquote{lock} flag will be set and the numpy array is immutable
- afterwards. \lstinline{val()} returns an object reference to this
- numpy array which is locked and cannot be modified. If you want to modify
- it, you may want to obtain an unlocked copy via
- \lstinline{val_rw()}.
+\item Please have a look at the method \lstinline{weight()} of \lstinline{Field}.
+ With it, you can easily multiply the field values by, or divide them by, the volume of each pixel.
+\item The methods \lstinline{from_raw()} and \lstinline{val()} convert a numpy array to a Field and vice versa.
+\item Be aware of the fact that NIFTy fields are immutable.
+ As soon as you pass a numpy array into a NIFTy field with \lstinline{from_raw()}, the \enquote{lock} flag will be set and the numpy array is immutable afterwards.
+ \lstinline{val()} returns an object reference to this numpy array, which is locked and cannot be modified.
+ If you want to modify it, you may obtain an unlocked copy via \lstinline{val_rw()}.
 \end{itemize}
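To illustrate the locking behaviour without NIFTy: as far as I can tell, the \enquote{lock} amounts to numpy's own write-protection flag, so the following plain-numpy sketch shows what to expect.

```python
import numpy as np

# NIFTy-style locking, illustrated with numpy's own write-protection flag.
arr = np.arange(5.0)
arr.flags.writeable = False      # what "locking" an array amounts to
try:
    arr[0] = 42.0                # modifying a locked array raises
except ValueError:
    pass

copy = arr.copy()                # analogue of val_rw(): an unlocked copy
copy[0] = 42.0                   # this is fine; arr itself is untouched
```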
 The point is: you need to fiddle around until the following test passes:
@@ -291,31 +251,24 @@ ift.extra.consistency_check(op)
 \end{lstlisting}
 This test does the same as the test for the adjointness above.
 \subsection*{Response}
-So far, we have wrapped the derivative of the response in a
-\lstinline{LinearOperator}. The other thing to be done is to make the function
-\lstinline{R(s)} field-aware. Rewrite it such that it takes a field in signal space
-as input and returns a field in data space. Make sure that all methods you have
-implemented can deal with arbitrary sizes of the signal space. All pixel volumes
-should be taken care of by you.
+So far, we have wrapped the derivative of the response in a \lstinline{LinearOperator}.
+The other thing to be done is to make the function \lstinline{R(s)} field-aware.
+Rewrite it such that it takes a field in signal space as input and returns a field in data space.
+Make sure that all methods you have implemented can deal with arbitrary sizes of the signal space.
+All pixel volumes should be taken care of by you.
 \section*{Example: $\gamma$-ray imaging}
-The information a $\gamma$-ray astronomer would provide to the algorithm (in the
-simplest case):
+The information a $\gamma$-ray astronomer would provide to the algorithm (in the simplest case):
 \begin{itemize}
 \item Data has Poissonian statistics.
-\item Two functions: \lstinline{R(s)} and \lstinline{R_adjoint(d)} where $R$ applies a convolution with
- a Gaussian kernel of given size to \lstinline{s} and picks out the positions to which
- there exists a data point in \lstinline{d}. \lstinline{R_adjoint(d)} implements $R^\dagger$.
+\item Two functions: \lstinline{R(s)} and \lstinline{R_adjoint(d)}, where $R$ applies a convolution with a Gaussian kernel of given size to \lstinline{s} and picks out the positions at which there exists a data point in \lstinline{d}.
+ \lstinline{R_adjoint(d)} implements $R^\dagger$.
 \end{itemize}
 Why is this already sufficient?
 \begin{itemize}
-\item The Hamiltonian $\mathcal H$ is given by: $\mathcal H (d|s) = -\log
- (R(s))^\dagger d+\sum_i R(s)_i$. Implementing $R$ and stating that the data is
- Poissonian determines this form.
- \item Since $R$ is a composition of a convolution and a sampling, both of which
- are linear operations, $R$ itself is a linear operator.\footnote{I.e. $R(\alpha
- s_1+s_2) = \alpha R(s_1) + R(s_2)$.} Thus, $R' = R$ and $R'^\dagger =
- R^\dagger$. All in all, we need an implementation for $R$ and $R^\dagger$.
+\item The Hamiltonian $\mathcal H$ is given by $\mathcal H(d|s) = -\log (R(s))^\dagger d+\sum_i R(s)_i$.
+ Implementing $R$ and stating that the data is Poissonian determines this form.
+\item Since $R$ is a composition of a convolution and a sampling, both of which are linear operations, $R$ itself is a linear operator.\footnote{I.e.\ $R(\alpha s_1+s_2) = \alpha R(s_1) + R(s_2)$.} Thus, $R' = R$ and $R'^\dagger = R^\dagger$.
+ All in all, we need an implementation of $R$ and $R^\dagger$.
 \end{itemize}
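A toy numpy version of this response (signal size, kernel width, and data positions are invented; a real implementation would use the proper instrument kernel), together with the adjointness check:

```python
import numpy as np

# Sketch of the gamma-ray response: convolve with a Gaussian kernel, then
# pick the pixels where data points exist.  Because the kernel is symmetric,
# the adjoint is "embed the data at its positions, then convolve again".
positions = np.array([5, 30, 77])          # invented data-point positions
x = np.arange(-10, 11)
kernel = np.exp(-0.5 * (x / 3.0) ** 2)     # invented beam width
kernel /= kernel.sum()

def R(s):
    return np.convolve(s, kernel, mode="same")[positions]

def R_adjoint(d):
    full = np.zeros(100)
    full[positions] = d                    # adjoint of the sampling
    return np.convolve(full, kernel, mode="same")  # adjoint of the blur

rng = np.random.default_rng(3)
s, d = rng.standard_normal(100), rng.standard_normal(3)
np.testing.assert_allclose(d @ R(s), R_adjoint(d) @ s)
```

Since this $R$ is linear, these two functions are all the likelihood implementation requires.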
 \end{document}