Commit 01a19a3d authored by Pilar Cossio-Tejada

Polishing & including priors without FFTALGO

parent 470535fe
......@@ -2,7 +2,7 @@ cmake_minimum_required(VERSION 2.6)
project(BioEM)
###Set up options
option (INCLUDE_CUDA "Build BioEM with CUDA support" ON)
option (INCLUDE_CUDA "Build BioEM with CUDA support" OFF)
option (INCLUDE_OPENMP "Build BioEM with OpenMP support" ON)
option (INCLUDE_MPI "Build BioEM with MPI support" ON)
option (PRINT_CMAKE_VARIABLES "List all CMAKE Variables" OFF)
......@@ -23,7 +23,7 @@ else()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${BIOEM_GCC_FLAGS}")
endif()
set (BIOEM_SOURCE_FILES "bioem.cpp" "main.cpp" "map.cpp" "model.cpp" "param.cpp" "cmodules/timer.cpp")
set (BIOEM_SOURCE_FILES "bioem.cpp" "main.cpp" "map.cpp" "model.cpp" "param.cpp" "timer.cpp")
###Find Required Packages
find_package(PkgConfig)
......
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
< BioEM software for Bayesian inference of Electron Microscopy images>
Copyright (C) 2016 Pilar Cossio, David Rohr and Gerhard Hummer
Copyright (C) 2016 Pilar Cossio, David Rohr, Fabio Baruffa, Markus Rampp,
Volker Lindenstruth and Gerhard Hummer.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
......
%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
% < BioEM software Manual for Bayesian inference of Electron Microscopy images>
% Copyright (C) 2016 Pilar Cossio, David Rohr, Fabio Baruffa, Markus Rampp,
% Volker Lindenstruth and Gerhard Hummer.
% Max Planck Institute of Biophysics, Frankfurt, Germany.
%
% See license statement for terms of distribution.
%
% ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
\documentclass[a4paper,10pt]{report}
\usepackage[british,activeacute]{babel}
\usepackage[latin1]{inputenc}
......@@ -23,12 +33,14 @@
\usepackage{xcolor}
\usepackage{listings}
\usepackage{bm}
\usepackage{geometry}
\usepackage[a4paper, total={7in, 10in}]{geometry}
\usepackage{fancyhdr}
\usepackage{mathptmx}
%\setlength{\headheight}{15.2pt}
%\pagestyle{fancy}
\setlength\parindent{0pt}
\newcommand{\HRule}{\rule{\linewidth}{0.5mm}}
%\usepackage[utf8]{inputenc}
......@@ -73,7 +85,7 @@ For any comments or questions please contact: {\it pilar.cossio@biophys.mpg.de}.
\section*{Copyright}
$<${\bf BioEM} software for Bayesian inference of Electron Microscopy images$>$
Copyright (C) 2016 Pilar Cossio, David Rohr, Volker Linderstruth and Gerhard Hummer.
Copyright (C) 2016 Pilar Cossio, David Rohr, Fabio Baruffa, Markus Rampp, Volker Lindenstruth and Gerhard Hummer.
The BioEM program is a free software, under the terms of the GNU General Public License as published by
the Free Software Foundation, version 3 of the License. This program is distributed in the hope that it will be useful,
......@@ -83,7 +95,8 @@ GNU General Public License for more details.
\section*{Citation}
To publish results from the usage of the BioEM software please cite refs. \cite{CossioHummerJSB_2013,BioEM_server}.
Please cite refs.~\cite{CossioHummerJSB_2013,BioEM_server}.
%For scientific results from usage of the BioEM program please cite refs. \cite{CossioHummerJSB_2013,BioEM_server}.
\tableofcontents
......@@ -98,21 +111,23 @@ To publish results from the usage of the BioEM software please cite refs. \cite{
\section{Introduction}
Most biological systems are dynamic: they change conformation with time and inter-convert between several functional metastable states.
These flexible biomolecules can be characterized using electron microscopy (EM), a technique that produces frozen images of the sample in a near-native environment.
Each EM particle-image contains information of the configuration of the frozen biomolecule, and, in principle, each particle could be in a different conformational state.
Thus, the flexibility of the single-molecules can be probed by analyzing each EM image individually.
However, this is very challenging because the signal-to-noise level is very low. Effective EM methods cluster and average the particle-images to increase the signal, and
require most particles to be in the same conformation. This has so far limited EM to study just a small subset of non-flexible biomolecules that acquire a unique stable state.
Here, we present a computing tool to harness the single-molecule character of EM for studying dynamic biomolecules.
With our method, we can categorize and classify models of flexible systems from individual EM images.
These flexible biomolecules can be characterized using electron microscopy (EM),
a technique that produces frozen images of the sample in a near-native environment.
Each individual image contains information of the instantaneous configuration of the biomolecule, and, in principle,
each particle can be in a different conformational state.
However, analyzing the images individually is challenging because the signal-to-noise level is very low.
This has so far limited EM to study a small subset of static biomolecules, because,
for it to be effective, it requires most particles to be in the same conformation.
Here, we present a computing tool to harness the single-molecule character of EM for studying dynamic biomolecules.
With our method, we can categorize and classify models of flexible biomolecules from individual EM images.
Bayesian inference of electron microscopy images, BioEM \cite{CossioHummerJSB_2013,BioEM_server}, allows us to
compute the posterior probability of a set of models given experimental data.
compute the posterior probability of a model given experimental data.
The BioEM posterior is calculated by solving a multidimensional integral over many nuisance parameters that account for
the experimental factors in the image formation, such as molecular orientation and interference effects.
The BioEM software computes this integral via numerical grid sampling over a portable CPU/GPU computing platform.
By comparing the BioEM posterior probabilities it is possible to discriminate and rank structural models, allowing to characterize
the dynamics of many biomolecular systems.
the variability and dynamics of the biological system.
In this chapter, we briefly describe the mathematical background of the BioEM method.
Then, we present the necessary tools and procedures to install the BioEM software. We describe the prerequisite programs
......@@ -124,7 +139,8 @@ we describe the steps to install BioEM using the CMake program.
The BioEM method calculates the posterior probability of a model, $m$, given a set of experimental images, $\omega \in \Omega$.
Its key idea is to create a calculated image, from the original model, as similar as possible to the experimental image.
The calculated image is modelled with nuisance parameters, $\boldsymbol \theta$, that describe the moleculal orientation, interference effects with the Point Spread Function (PSF), uncertainties
The calculated image is generated using nuisance parameters, $\boldsymbol \theta$,
that describe the molecule orientation, interference effects with the Point Spread Function (PSF), uncertainties
in the particle center, intensity normalization, offset and noise.
Figure \ref{fig:likeliCons} exemplifies how a calculated image from a model, with a given set of nuisance parameters, is created.
Technically, the model is first rotated to a given orientation, then projected along the $z$-axis,
......@@ -132,7 +148,7 @@ then it is convoluted with a PSF to cope with imaging artifacts,
next it is shifted by a certain number of pixels to account for the uncertainties in the particle center.
Normalization, and offset in the intensity, as well as noise, are taken implicitly into account.
The calculated image is compared to an experimental particle-image, $\omega$, through a likelihood function, $L(\omega|m,\boldsymbol\theta)$.
Eq. 7 of ref. \cite{CossioHummerJSB_2013} shows its analytical formulation.
Eq. 7 of ref.~\cite{CossioHummerJSB_2013} shows its analytical formulation.
\begin{figure}[h]
\begin{centering}
......@@ -144,7 +160,7 @@ Eq. 7 of ref. \cite{CossioHummerJSB_2013} shows its analytical formulation.
\label{fig:likeliCons}
\end{figure}
The Bayesian posterior probability of a model, given an experimental image, is a weighted integral over the product of prior probabilities and likelihood, over all nuisance parameters,
The posterior probability of a model, given an experimental image, is a weighted integral over the product of prior probabilities and likelihood, over all nuisance parameters,
\begin{equation}
P_{m\omega} \propto \int
L(\omega|m,\boldsymbol\theta)p_M(m)p(\boldsymbol\theta)
......@@ -153,51 +169,53 @@ The Bayesian posterior probability of a model, given an experimental image, is a
\end{equation}
where $p_M(m)$, $p(\boldsymbol\theta)$ are the prior probabilities of model and parameters, respectively.
The BioEM software is used to perform the integrals in Eq. \ref{eq:Pmom} over orientation, PSF parameters, and center displacement using numerical grid sampling.
The remaining integrals over the intensity normalization, offset, and noise are performed analytically following ref. \cite{CossioHummerJSB_2013}.
The remaining integrals over the intensity normalization, offset, and noise are performed analytically following ref.~\cite{CossioHummerJSB_2013}.
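The numerical grid sampling mentioned above can be illustrated in one dimension. The following is a minimal sketch, not the BioEM implementation: the midpoint rule, the function names, and the toy Gaussian likelihood with a flat prior are all assumptions chosen for illustration. It works with logarithms so that very small likelihood values do not underflow.

```python
import math

def grid_log_integral(log_integrand, lower, upper, n_points):
    """Log of a midpoint-rule estimate of the integral of exp(log_integrand)."""
    step = (upper - lower) / n_points
    logs = [log_integrand(lower + (i + 0.5) * step) for i in range(n_points)]
    peak = max(logs)  # factor out the largest term for numerical stability
    total = sum(math.exp(value - peak) for value in logs)
    return peak + math.log(total * step)

def toy_log_integrand(theta):
    """Hypothetical log(likelihood * prior) for one nuisance parameter."""
    # Assumed: standard Gaussian likelihood in theta (illustration only).
    log_likelihood = -0.5 * theta * theta - 0.5 * math.log(2.0 * math.pi)
    # Assumed: flat prior on the integration range [-5, 5].
    log_prior = -math.log(10.0)
    return log_likelihood + log_prior

log_p = grid_log_integral(toy_log_integrand, -5.0, 5.0, 200)
print(log_p)
```

In this toy case the result should be close to $\ln(0.1)$, because the Gaussian integrates to almost one over the range and the flat prior contributes a factor of $1/10$.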
The posterior probability of a single model given a set of images, $\omega \in \Omega$, becomes
\begin{equation}
P(m|\Omega) \propto \prod_{\omega=1}^{\Omega}P_{m\omega}~.
\label{eq:pb2}
\end{equation}
The main result of the BioEM software is the computation of Eq. \ref{eq:pb2}. This can be used for model comparison and discrimination ({\it e.g.} to rank the best model) or
to calculate the posterior probability of a full set of models, $m \in M$, following Eq. 2 of ref. \cite{CossioHummerJSB_2013}.
The main result of the BioEM software is the computation of Eq. \ref{eq:pb2}.
This can be used for model comparison and discrimination ({\it e.g.} to rank the best model) or
to calculate the posterior probability of a full set of models, $m \in M$, following Eq. 2 of ref.~\cite{CossioHummerJSB_2013}.
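Because Eq. \ref{eq:pb2} is a product over images, it is convenient in practice to sum logarithms. The sketch below assumes that the per-image log-posteriors are already available as plain lists of numbers; the model names and values are hypothetical and are not BioEM output.

```python
import math

def log_posterior_over_images(log_p_per_image):
    """Sum per-image log-posteriors, i.e. the logarithm of the product over images."""
    return math.fsum(log_p_per_image)

def rank_models(log_p_by_model):
    """Return model names ordered from most to least probable."""
    totals = {name: log_posterior_over_images(values)
              for name, values in log_p_by_model.items()}
    return sorted(totals, key=totals.get, reverse=True)

# Hypothetical log-posteriors for two models and three images.
example = {
    "model_A": [-10.0, -12.5, -9.0],
    "model_B": [-11.0, -12.0, -10.5],
}
ranking = rank_models(example)
print(ranking)
```

Here the totals are $-31.5$ and $-33.5$, so the hypothetical model\_A ranks first.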
In this manual, it is assumed that the user has sufficient comprehension of the BioEM theory. Therefore, it is encouraged to read refs. \cite{CossioHummerJSB_2013,BioEM_server} thoroughly.
In this manual, it is assumed that the user has sufficient comprehension
of the BioEM theory. Therefore, we encourage the user to read refs.~\cite{CossioHummerJSB_2013,BioEM_server} thoroughly.
\section{Installation}
\subsection{Prerequisite programs and libraries}
Before installation, there are several programs and libraries that should be preinstalled on the compute node. In the following,
we give a brief explanation of the mandatory, and optional prerequisite programs.
we give a brief explanation of the mandatory, and optional prerequisite programs. {\bf PC: The following needs checking::}
\subsubsection{Mandatory preinstalled libraries and programs}
\subsubsection{Mandatory preinstalled libraries/programs}
\begin{itemize}
\item {\it FFTW library (minimal version {\bf 3.3.4}):} is a subroutine library for computing the discrete Fourier transform.
\item {\it FFTW library (minimal version {\bf 3.3.3}):} is a subroutine library for computing the discrete Fourier transform.
It is specifically used in BioEM, to calculate the convolution of the ideal image with the PSF, and
the cross-correlation of the calculated image to the experimental image. FFTW can be downloaded from the webpage www.fftw.org.
\item {\it BOOST library (minimal version {\bf 1.57}):} provides support for tasks and structures such as linear algebra, pseudorandom number generation, multithreading,
\item {\it BOOST library (minimal version {\bf 1.57}):} provides support for tasks and structures
such as linear algebra, pseudorandom number generation, multithreading,
image processing, and unit testing. In particular, this library is used to access and organize input-data in the BioEM code.
BOOST can be downloaded from www.boost.org.
\item {\it OpenMP:} is a programming interface that supports shared memory parallel programming.
It is normally included in the standard GNU or Intel c++ compliers, so no downloading should be necessary. For more information see http://openmp.org/.
It is normally included in the standard GNU or Intel C++ compilers, so no downloading should be necessary. For more information see http://openmp.org/.
\end{itemize}
\subsubsection{Optional preinstalled programs}
The optional but {\it encouraged} to use programs for an easy compilation, and optimal performance, are described bellow:
The optional, but {\it encouraged}, programs for easy compilation and optimal performance are described below:
\begin{itemize}
\item {\it CMake (minimal version {\bf 2.6}):} is a cross-platform software for managing the build process of software using a compiler-independent method ({\it i.e.} creating a Makefile).
CMake can be downloaded from www.cmake.org.
\item {\it CUDA (minimal version {\bf 6.5}):} is a parallel computing platform implemented by the graphics processing units (GPUs) that NVIDIA produce.
\item {\it CUDA (minimal version {\bf 5.5}):} is a parallel computing platform implemented by the graphics processing units (GPUs) that NVIDIA produce.
Thus, NVIDIA graphics cards are necessary for running BioEM with the CUDA implementation. For more information see
www.nvidia.com.
......@@ -212,19 +230,19 @@ it will be possible to install BioEM.
\vspace{0.5cm}
{\it Note:} It is recommended that the same complier is used for the preinstalled
libraries and for BioEM.
{\it Note:} It is recommended that the same compiler that is used to compile the
libraries is also used to compile BioEM.
\subsection{Download}
\label{download}
A compressed directory of the BioEM software can be downloaded from [{\it mpi biophys}].
A compressed directory of the BioEM software can be downloaded from [{\bf PC: mpi biophys or RZG?}].
After downloading the {\it tar.gz} file, uncompress it by executing
\vspace{0.5cm}
\fbox{%
\parbox{12cm}{
{\footnotesize \texttt{tar -zxvf BioEM.tar.gz}}}}
{ \texttt{tar -zxvf BioEM.tar.gz}}}}
\vspace{0.5cm}
......@@ -232,12 +250,12 @@ In the uncompressed {\bf BioEM} directory, there are:
\begin{itemize}
\item[--]the source code {\it .cpp} and {\it .cu} files.
\item[--]the {\bf include} directory with corresponding header files.
\item[--]the copyright license, and {\it README.md} files.
\item[--]the {\it CMakeLists.txt} file that is necessary for installation with CMake (see section bellow).
\item[--]the {\bf Tutorial\_BioEM} directory that includes the example files used in the tutorial (see chapter \ref{tutorial}). Inside this directory,
there is also the {\bf MODEL\_COMPARISON} directory.
\item[--]the copyright license, and {\it README.md} file.
\item[--]the {\it CMakeLists.txt} file that is necessary for installation with CMake (see below).
\item[--]the {\bf Tutorial\_BioEM} directory that includes the example files used in the tutorial (chapter \ref{tutorial}). Inside this directory,
there is also a directory called {\bf MODEL\_COMPARISON}.
\item[--]the {\bf Quaternions} directory that includes files with lists of quaternions that sample uniformly
the rotational group {\it SO3} (see section \ref{intor}).
the rotational group {\it SO3} (section \ref{intor}).
\end{itemize}
......@@ -248,9 +266,23 @@ CMake contains all the instructions to generate automatically a {\it Makefile}
according to the specific architecture of the computing node, and the desired features of parallelization.
CMake uses the {\it CMakeLists.txt} file. This file is provided in the uncompressed {\bf BioEM} directory.
The {\it CMakeLists.txt} has several modifiable options, that
should be enabled/disabled ({\bf ON}/{\bf OFF}, respectively) according to the desired functionalities of parallelization.
should be enabled/disabled ({\bf ON}/{\bf OFF}, respectively) according to the desired functionalities.
The keywords for the modifiable options are shown in Table \ref{tableCMake}.
These options can be enabled or disabled by executing cmake with:
\\
\fbox{%
\parbox{12cm}{
\texttt{-D<optionname>=ON/OFF}}}
\\
\noindent For example, to turn on the compilation with CUDA run
\\
\fbox{%
\parbox{12cm}{
{\texttt{cmake -DINCLUDE\_CUDA=ON CMakeLists.txt}}}}\\
It is also possible to modify these options directly in the CMakeLists.txt file. At the beginning of this file, the
keywords and ON/OFF options are presented.
\begin{table}[h]
\begin{center}
......@@ -273,28 +305,13 @@ standard host compiler is incompatible with CUDA)\\
\hline
\end{tabular}
\end{center}
\caption{CMake adjustable options.}
\caption{CMake keyword options.}
\label{tableCMake}
\end{table}
These options can be enabled or disabled by executing cmake with:
\fbox{%
\parbox{10cm}{
\texttt{-D<optionname>=ON/OFF}}}
\\
For example, to turn on the compilation with CUDA run
\fbox{%
\parbox{10cm}{
\footnotesize{\texttt{cmake -DINCLUDE\_CUDA=ON ../CMakeLists.txt}}}}\\
It is also possible to modify these options directly in the {\it CMakeLists.txt} file. The adjustable
keywords and \texttt{ON/OFF} options are at the beginning of this file.
{\it Note:} For certain architectures, a {\it FindFFTW.cmake} may be required to find the FFTW libraries. This file is included in the {\bf BioEM} directory.
For more information on specific CMake features see www.cmake.org.
{\it Note:} For certain architectures, a {\it FindFFTW.cmake} file may be required to find the FFTW libraries. This file is included in the {\bf BioEM} directory.
%For more information on specific CMake features see www.cmake.org.
\subsubsection{Steps for basic installation}
......@@ -305,14 +322,14 @@ For more information on specific CMake features see www.cmake.org.
\item[--] Create a build directory in the main {\bf BioEM} directory, and access it by
\fbox{%
\parbox{10cm}{
{\footnotesize \texttt{mkdir build \&\& cd build}}}}
\parbox{12cm}{
{\texttt{mkdir build \&\& cd build}}}}
\item[--] Run CMake with the desired options and the {\it CMakeLists.txt} file
\item[--] Run CMake with the desired options and the {\it CMakeLists.txt} file {\bf PC: Not working for me in the Bio cluster where the build is in-source??}
\fbox{%
\parbox{10cm}{
{\footnotesize \texttt{%cd build \\
\parbox{12cm}{
{\texttt{
cmake -D<optionname1>=ON -D<optionname2>=OFF ../CMakeLists.txt}}}}
\item[--] If this process is successful, a {\it Makefile} and {\bf CMakeFiles} directory should be generated.
If this is not the case, enable the variable
......@@ -321,8 +338,8 @@ CMake with verbosity to debug.
\item[--] After generating the {\it Makefile}, execute it% in the build directory
\fbox{%
\parbox{10cm}{
{\footnotesize \texttt{make}}}}
\parbox{12cm}{
{ \texttt{make}}}}
\item[--] If this process is successful a \texttt{bioEM} executable should be generated.
\end{itemize}
......@@ -332,7 +349,7 @@ For a simple test, run the BioEM executable
\fbox{%
\parbox{12cm}{
{\footnotesize \texttt{./bioEM}}}}
{\texttt{./bioEM}}}}
\vspace{0.5cm}
......@@ -348,7 +365,7 @@ If the code runs successfully, the output on the terminal screen should be as th
\fbox{%
\parbox{12cm}{
{\footnotesize \texttt{
{\texttt{
Command line inputs:\\
--Modelfile arg (Mandatory) Name of model file\\
--Particlesfile arg if BioEM (Mandatory) Name of particle-image file\\
......@@ -361,7 +378,7 @@ Command line inputs:\\
--DumpMaps (Optional) Dump maps after they were read from particle-image file\\
--LoadMapDump (Optional) Read Maps from dump option\\
--OutputFile arg (Optional) For changing the outputfile name\\
--help (Optional) Produce help message\\
--help (Optional) Produce help message
}}}
}
\end{tabular}
......@@ -376,10 +393,10 @@ Command line inputs:\\
\chapter{BioEM Input}
In this chapter, we describe the BioEM input commands and keywords, as well as the file formats.
BioEM has two main sources of input: from the commandline and from the input-parameter file. In the first section, we describe the commandline input items from Table \ref{tableBioEM}.
In the second section, we describe the keywords that should be specified in the input-parameter file. Lastly, we describe the specific formats of the model, particle-image, and
parameter files that can be read with the BioEM software.
In this chapter, we describe the BioEM input commands and keywords. BioEM has two
main sources of input: from the commandline and from the input-parameter file. In the first section, we describe each commandline item from Table \ref{tableBioEM}.
In the second section, we describe the keywords that should be specified in the input-parameter file. Lastly, we describe the specific formats of the model, particle-image,
and input-parameter files that are used in the BioEM software.
\section{Commandline Input}
......@@ -389,12 +406,12 @@ The names of these files are passed to the \texttt{bioEM} executable via the com
We now give a detailed description of the commandline input items shown in Table \ref{tableBioEM}.
\subsection{Model file}
\label{modfile}
The structural model is represented as spheres in 3-dimensional space. The position of the center of the sphere should be specified in the model file, as well
as its corresponding radius and number of electrons. These spheres can represent atoms, coarse-grained residues or multi-scale blobs.
The radius size approximately determines the resolution of the model. Spheres with radius less than the pixel size are projected on to a single pixel.
The name of the file containing the model has to be provided in the commandline when \texttt{bioEM} is executed:
\noindent The name of the file containing the model has to be provided in the commandline when \texttt{bioEM} is executed:
\vspace{0.5cm}
\fbox{%
......@@ -403,7 +420,7 @@ The name of the file containing the model has to be provided in the commandline
\vspace{0.5cm}
\noindent where \texttt{arg} is the model file name. The possible formats for the model ({\it pdb} or text) are described in section \ref{modformat}.
\noindent where \texttt{arg} is the model filename. The possible formats for the model ({\it pdb} or text) are described in section \ref{modformat}.
\subsection{Particle-image file}
\label{partimag}
......@@ -443,8 +460,8 @@ The first time the particle-image file is read, include in the commandline the k
{\footnotesize \texttt{--LoadMapDump }}}}
\vspace{0.5cm}
\noindent Note that the {\it maps.dump} file should be in the same directory where the code is executed. The latter option does not require
the \texttt{ --Particlesfile} command. See chapter \ref{tutorial} for examples.
\noindent Note that the {\it maps.dump} file should be in the same directory where the code is executed. Using this last option, it is not necessary
to include \texttt{ --Particlesfile} in the commandline. See chapter \ref{tutorial} for examples.
\subsection{Input-parameter file}
......@@ -458,7 +475,7 @@ change the output, but has a large influence on the compute performance. The two
the second set belongs to the compute node where the problem is processed. For a detailed description of the performance variables see chapter \ref{perfparm}.
The physical parameters are passed via an input-parameter file that contains specific keywords for the physical constraints, and integration limits of the algorithm.
The name of the input-parameter file is passed to the \texttt{bioEM} executable using the commandline:
The name of the input-parameter file is passed via the commandline:
\vspace{0.5cm}
\fbox{%
......@@ -466,7 +483,7 @@ The name of the input-parameter file is passed to the \texttt{bioEM} executable
{\footnotesize \texttt{ --Inputfile arg}}}}
\vspace{0.5cm}
\noindent where \texttt{arg} is the file name.
\noindent where \texttt{arg} is the filename.
In section \ref{InputParam}, we describe in detail the keywords used in the input-parameter file.
......@@ -487,7 +504,8 @@ For this feature use the following commandline keyword
{\footnotesize \texttt{--ReadOrientation arg}}}}
\vspace{0.5cm}
\noindent where \texttt{arg} is the name of the file containing the list of orientations. The format for the orientations (Euler angles or quaternions) is described in section \ref{orform}.
\noindent where \texttt{arg} is the name of the file containing the list of orientations.
The format for the orientations (Euler angles or quaternions) is described in section \ref{orform}.
\subsection{BioEM output}
\label{biout}
......@@ -502,14 +520,16 @@ To change the name of the output file use the following commandline keyword
\noindent where \texttt{arg} is the desired name of the output file.
This file contains the logarithm of the posterior probability of
the model to each individual experimental image (see section \ref{anaout} for additonal output file descriptions).
the model to each individual experimental image and the parameter set that gives
a maximum of the posterior (see section \ref{anaout} for its format).
\subsection{Print a 2D image from a model}
\label{printmap}
This option is {\it completely independent} of the BioEM posterior probability calculation.
Its purpose is to print a synthetic image from a model given a specific set of parameters.
This feature can be useful, for example, to compare an experimental image to the calculated image of a model given a set of parameters ({\it e.g.} Figure 1 of ref. \cite{CossioHummerJSB_2013}), or to create a synthetic
This feature can be useful for post-analysis, for example to compare the best calculated map
to an experimental image (as in Figure 1 of ref.~\cite{CossioHummerJSB_2013}), or to create a synthetic
image set with artificial noise.
The commandline keyword is
......@@ -521,7 +541,7 @@ The commandline keyword is
\vspace{0.5cm}
\noindent where \texttt{arg} is the name of the inputfile that contains the parameter specifications to create the image. These keywords are different
\noindent where \texttt{arg} is the name of the inputfile that contains the parameter specifications to create an image. The keywords for this input file are different
from those used in the standard BioEM posterior calculations. For a detailed description of these keywords see section \ref{parambestmap}.
Here, an output file, called \texttt{BESTMAP}, contains the corresponding calculated image.
See chapter \ref{tutorial} for examples.
......@@ -531,10 +551,10 @@ See chapter \ref{tutorial} for examples.
\section{Input of Physical Parameters}
In the last section, we described the main commandline keywords used in the BioEM code.
Now, we focus on the input of the physical parameters that are passed via the input-parameter file (see also section \ref{Inputfile}).
These variables describe the physical conditions and constraints of the BioEM algorithm, such as the pixel size, the integration ranges
and grid points. They are specified using keywords in the input-parameter file.
Up to now, we have seen several commandline inputs that can be used in BioEM. We now focus on the
input of the physical parameters that are necessary for the BioEM computation and are read from {\it inside} the input-parameter file.
These parameters describe the physical constraints of the algorithm, such as the integration ranges
and grid points, and are passed using specific keywords in this file (see also section \ref{Inputfile}).
\label{InputParam}
......@@ -554,7 +574,7 @@ We assume a square particle-image. Here, \texttt{(int)} is the number of pixels
\end{itemize}
In a standard BioEM calculation, the integration over the model orientations,
In the BioEM calculation, the integration over the model orientations,
PSF parameters, and center displacement are performed numerically.
To do so, one needs to define the integration ranges,
and grid spacing for each parameter.
......@@ -563,29 +583,26 @@ thus should be specified by the user.
\subsection{Integration of orientations}
\label{intor}
In BioEM, there are two ways to describe the orientation of the model
There are two ways to describe the orientation of the model
in 3D space: with the Euler angles or with quaternions.
\begin{itemize}
\item {\it Euler Angles}. The Euler angles are $\alpha,\beta,\gamma$, and represent a sequence of three elemental rotations, {\it i.e.} rotations about the axes of a coordinate system.
The first rotation is around the $z$-axis by an angle $\alpha$, the second rotation is around the $x$-axis by an angle $\beta$, and a last rotation is again around the $z$-axis by an angle $\gamma$.
\item {\it Euler Angles}. The Euler angles are $\alpha,\beta,\gamma$, and represent a sequence of three elemental rotations about the axes of a coordinate system.
We use the reference rotations $Z_1 X_2 Z_3$, such that the first rotation is around the $z$-axis by an angle $\alpha$, the second rotation is around the $x$-axis by an angle $\beta$, and a last rotation is again around the $z$-axis by an angle $\gamma$.
\item {\it Quaternions}. %The orientation of a rigid body can also be described with quaternions.
\item {\it Quaternions}. The orientation of a rigid body can also be described with quaternions.
A set of quaternions is a four-dimensional vector over the real numbers ($q_1$, $q_2$, $q_3$, $q_4$) each within $[-1,1]$ such that
$1=q_1^2+q_2^2+q_3^2+q_4^2$. Euler's theorem states that the orientation of a system about a fixed point can be expressed in terms of a single rotation by a given angle about a fixed axis. The quaternions are used to express this axis and the rotation about it.
In this case, the keyword \\
\texttt{USE\_QUATERNIONS} \\
is required in the input-parameter file.
$1=q_1^2+q_2^2+q_3^2+q_4^2$.
\end{itemize}
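The two descriptions above can be sketched numerically. This is an illustration under stated assumptions, not BioEM code: the composition order $R = R_z(\alpha) R_x(\beta) R_z(\gamma)$ is one common reading of the $Z_1 X_2 Z_3$ convention and may differ from the order used internally, and all function names are hypothetical.

```python
import math

def rot_z(angle):
    """Rotation matrix about the z-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(angle):
    """Rotation matrix about the x-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def euler_zxz(alpha, beta, gamma):
    """Assumed composition: R = Rz(alpha) * Rx(beta) * Rz(gamma)."""
    return matmul(rot_z(alpha), matmul(rot_x(beta), rot_z(gamma)))

def is_unit_quaternion(q, tol=1e-9):
    """Check the constraint q1^2 + q2^2 + q3^2 + q4^2 = 1."""
    return abs(sum(x * x for x in q) - 1.0) < tol

rotation = euler_zxz(math.pi / 2.0, 0.0, 0.0)
print(rotation)
print(is_unit_quaternion((0.5, 0.5, 0.5, 0.5)))
```

A check such as `is_unit_quaternion` can be useful before passing a hand-made list of quaternions to the program, since each entry must satisfy the unit-norm constraint.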
There are also several ways to sample the space of Euler angles or quaternions.
Now, we {\it importantly remark} that not all possibilities sample uniformly the group of rotations in 3D space ({\it SO3}), which is crucial to perform a fast and accurate
There are several ways to sample the space of Euler angles or quaternions.
We {\it importantly remark} that not all possibilities sample uniformly the group of rotations in 3D space ({\it SO3}), which is crucial to perform a fast and accurate
integration of uniformly distributed model orientations.
\subsubsection{Uniform sampling of SO3}
To uniformly sample {\it SO3}, we recommend using a list of quaternions generated with the successive orthonormal images method from ref. \cite{Yershova2010}.
To uniformly sample {\it SO3}, we recommend using a list of quaternions generated with the successive orthonormal images method from ref.~\cite{Yershova2010}.
In the directory {\bf Quaternions}, we provide lists of quaternions that have been generated
using this method. Here, it is necessary to follow section \ref{ortfile} because a list of quaternions is read from a separate file.
using this method. Here, it is necessary to follow section \ref{ortfile} because a list of quaternions is read from a separate file. To use quaternions
the keyword \texttt{USE\_QUATERNIONS} in the input-parameter file is also required.
\subsubsection{Non-uniform sampling}
......@@ -607,7 +624,7 @@ The keywords in the parameter file are
\texttt{GRIDPOINTS\_ALPHA$\sim$ 2*GRIDPOINTS\_BETA}.
\item {\it Grid-sampling of quaternions:}
With BioEM it is also possible to generate a n\"aive grid in quaternion space. One should provide the
With BioEM it is also possible to generate a grid in quaternion space. One should provide the
keywords
\indent \texttt{USE\_QUATERNIONS}\\
......@@ -615,7 +632,7 @@ keywords
where \texttt{(int)} is the grid spacing in each dimension $[-1,1]$.
\item {\it Non-uniform sampling with orientations read from a file:} We note that with the option of reading the orientations from a file (section \ref{ortfile}) the user has
\item {\it Non-uniform sampling of orientations from a file:} We note that with the option of reading the orientations from a file (section \ref{ortfile}) the user has
great freedom to sample, also non-uniformly, the orientational space.
......@@ -623,17 +640,18 @@ great freedom to sample, also non-uniformly, the orientational space.
\subsection{Integration of the CTF or PSF parameters}
\subsection{Integration of the PSF parameters}
\subsubsection{Parameters in Fourier space using the CTF and Envelope:}
%\subsubsection{Parameters in Fourier space using the CTF and Envelope:}
To take into account the interference effects in the
experiment, we multiply the ideal image from a model with the contrast transfer function (CTF) and envelope function in Fourier space (see ref. \cite{BioEM_server}).
experiment, we convolve the ideal image from the model with the PSF.
In practice, we use the Fourier-space equivalent, which is a multiplication by the contrast transfer function (CTF) and the envelope function.
An approximate expression for the CTF is
\begin{equation}
\mathrm{CTF}(s)=-A\cos(as^2/2)-\sqrt{1-A^2}\sin(as^2/2),
\end{equation}
where $s$ is the radial spatial frequency, and $a=2\pi \lambda \Delta f$ with $\lambda$ the
electron wavelength, and $\Delta f$ the defocus. Parameter $A \in [0,1]$ establishes the contributions of the cosine and sine components.
where $s$ is the radial spatial frequency, and $a=2\pi \lambda \Delta f$, where $\lambda$ is the
electron wavelength and $\Delta f$ is the defocus. Parameter $A \in [0,1]$ establishes the contributions of the cosine and sine components.
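
The approximate CTF expression above can be evaluated with a few lines of Python. This is only a minimal sketch and not the BioEM implementation: the function name \texttt{ctf}, its argument names, and the assumption that $s$ is given in \AA$^{-1}$ (so that the defocus must be converted from $\mu$m to \AA) are choices made for this illustration.

```python
import math

ANGSTROM_PER_MICRON = 1.0e4  # unit conversion: 1 micrometre = 10^4 Angstrom


def ctf(s, defocus_um, amplitude, wavelength_A=0.019688):
    """Approximate CTF at radial spatial frequency s.

    Assumptions of this sketch (not taken from the BioEM source code):
    s is in 1/Angstrom, defocus_um is in micrometres, wavelength_A is
    in Angstrom, and amplitude is the dimensionless parameter A in [0, 1].
    """
    # a = 2*pi*lambda*defocus, with the defocus converted to Angstrom
    a = 2.0 * math.pi * wavelength_A * defocus_um * ANGSTROM_PER_MICRON
    phase = a * s * s / 2.0
    # CTF(s) = -A cos(a s^2 / 2) - sqrt(1 - A^2) sin(a s^2 / 2)
    return (-amplitude * math.cos(phase)
            - math.sqrt(1.0 - amplitude ** 2) * math.sin(phase))
```

At $s=0$ the phase vanishes and the expression reduces to $-A$, which is a convenient sanity check for any implementation.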
The envelope function is
\begin{equation}
......@@ -641,9 +659,8 @@ The envelope function is
\end{equation}
where parameter $b$ controls the Gaussian width and modulates the CTF.
For the BioEM computation, one must integrate numerically Eq. \ref{eq:Pmom} over the three parameters $a$ (or equivalently $\Delta f$), $b$ and $A$.
To do so, one should include in the input-parameter file the keyword, integration limits, and number of grid points for each parameter:
To calculate the BioEM posterior probability, we integrate numerically over the three parameters $\Delta f$, $b$ and $A$.
To do so, one should include in the input-parameter file the keyword for each parameter, followed by its integration limits and number of grid points:
\vspace{0.3cm}
\textit{Parameter -- (start) -- (end) -- (gridpoints)}\\
......@@ -651,8 +668,7 @@ To do so, one should include in the input-parameter file the keyword, integratio
\indent \texttt{CTF\_B\_ENV (float) (float) (int)}\\
\indent \texttt{CTF\_AMPLITUDE (float) (float) (int)}\\
\noindent The defocus, $\Delta f$, should be in units of $\mu$m, and the $b$ envelope parameter in \AA$^2$. The amplitude parameter $A$ is dimensionless and lies within $[0,1]$.
\noindent The defocus, $\Delta f$, should be in units of $\mu$m, and $b$ in \AA$^2$. The amplitude parameter $A$ is dimensionless and lies within $[0,1]$.
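
To illustrate how a \textit{(start) (end) (gridpoints)} triplet could translate into integration points, the following sketch builds a grid for each parameter. It assumes linearly spaced points that include both limits, which this manual does not state explicitly; the helper \texttt{integration\_grid} and all numerical ranges are hypothetical.

```python
def integration_grid(start, end, gridpoints):
    """Return `gridpoints` values from start to end, both limits included.

    Linear spacing with inclusive end points is an assumption of this
    sketch; the spacing actually used by BioEM may differ.
    """
    if gridpoints < 1:
        raise ValueError("gridpoints must be at least 1")
    if gridpoints == 1:
        return [start]
    step = (end - start) / (gridpoints - 1)
    return [start + i * step for i in range(gridpoints)]


# Hypothetical ranges, chosen only for illustration
b_values = integration_grid(100.0, 300.0, 3)      # envelope b in Angstrom^2
amplitude_values = integration_grid(0.0, 1.0, 5)  # dimensionless A in [0, 1]

# Number of (b, A) combinations visited by a nested integration loop
n_combinations = len(b_values) * len(amplitude_values)
```

Because the cost of the integration grows with the product of the grid sizes, it may be worth starting with coarse grids and refining only the parameters that matter for a given data set.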
The default value of the electron wavelength is 0.019688$\AA$, which corresponds to a 300~kV microscope. To change this value, use
the keyword\\
......@@ -661,50 +677,36 @@ the keyword\\
\noindent where \texttt{(float)} should be in $\AA$.
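
For microscopes operated at a different acceleration voltage, the wavelength can be estimated from the standard relativistic de Broglie relation, $\lambda = h/\sqrt{2 m_e e V\,(1+eV/2m_ec^2)}$. The sketch below is not part of BioEM; the function name \texttt{electron\_wavelength\_angstrom} is an assumption made for this example, and the physical constants are the exact or CODATA 2018 SI values.

```python
import math

PLANCK = 6.62607015e-34              # Planck constant, J s
ELECTRON_MASS = 9.1093837015e-31     # electron rest mass, kg
ELEMENTARY_CHARGE = 1.602176634e-19  # elementary charge, C
SPEED_OF_LIGHT = 299792458.0         # speed of light, m/s


def electron_wavelength_angstrom(voltage_volts):
    """Relativistic electron wavelength in Angstrom for a given voltage."""
    energy = ELEMENTARY_CHARGE * voltage_volts          # kinetic energy, J
    rest_energy = ELECTRON_MASS * SPEED_OF_LIGHT ** 2   # m_e c^2, J
    # Relativistic momentum: p = sqrt(2 m E (1 + E / (2 m c^2)))
    momentum = math.sqrt(2.0 * ELECTRON_MASS * energy
                         * (1.0 + energy / (2.0 * rest_energy)))
    return PLANCK / momentum * 1.0e10                   # metres to Angstrom


wavelength_300kV = electron_wavelength_angstrom(300.0e3)
```

For 300~kV this should agree with the default value quoted above to within rounding.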
In Figure \ref{CTFPSF}, we show examples of $|\mathrm{CTF}(s)\mathrm{Env}(s)|^2$ for different $b$ envelope parameters (top) and different defocus (bottom).
%\subsubsection{Parameters in real space using the PSF:}
%The point spread function (PSF) is the real-space equivalent of the contrast transfer and envelope functions,
%its analytic expression is
%\begin{equation}
% \mathrm{PSF}(r)\propto e^{-\chi r^2/2} \big(-A_R\cos(\vartheta r^2/2)-\sqrt{1-A_R^2}\sin(\vartheta r^2/2)\big).
%\end{equation}
%Similarly as in Fourier space, the PSF has three real-space variables: amplitude $A_R$, envelope $\chi$ and phase $\vartheta$.
%The amplitude $A_R$ defines the contribution of the PSF real-space sine or cosine parts and is within $[0,1]$.
%The envelope parameter $\chi$ and phase $\vartheta$ should be given in units of $\AA^{-2}$.
%The analytic relation between the parameters in Fourier and real space is shown in the supporting information of ref. \cite{BioEM_server}.
%The ranges of these parameters will depend on the specific imaging conditions of each experiment
%(defocus, astigmatism etc). In Figure \ref{CTFPSF}, we show examples of point spread functions and their corresponding CTF and envelope.
\begin{figure}[ht!]
\begin{centering}
\includegraphics[width=\textwidth]{CTF_PSF_Manual.eps}
\par\end{centering}
\caption{ {\it Contrast transfer function and equivalent point spread function.}
Examples of $|\mathrm{CTF}(s)\mathrm{Env}(s)|^2$ for different $b$ envelope values at fixed defocus $\Delta f=1\,\mu$m (top); and different defocus values with fixed envelope $b=200$~\AA$^2$ (bottom).
The equivalent PSF for each CTF is shown on the right panel. The amplitude $A=0.2$ and electron wavelength $\lambda=0.019688\AA$
are fixed.}
\label{CTFPSF}
\end{figure}
%To use the PSF instead of the CTF analysis (which is default), one should include the keyword:
%\indent \texttt{USE\_PSF}
\subsubsection{Parameters in real space using the PSF:}
%The information of the integration limits and grid points should be included in the input file as:
The point spread function (PSF) is the real-space equivalent of the contrast transfer and envelope functions.
Its analytic expression is
\begin{equation}
\mathrm{PSF}(r)\propto e^{-\chi r^2/2} \big(-A_R\cos(\vartheta r^2/2)-\sqrt{1-A_R^2}\sin(\vartheta r^2/2)\big).
\end{equation}
%Here, an ideal calculated image is convoluted with the PSF to mimic the interference effects in the imaging experiment.
Similarly as in Fourier space, the PSF has three real-space variables: amplitude $A_R$, envelope $\chi$ and phase $\vartheta$.
The amplitude $A_R$ defines the contribution of the PSF real-space sine or cosine parts and is within $[0,1]$.