Commit da09ec2b authored by Pilar Cossio

Guide

parent e73c526d
......@@ -2,20 +2,20 @@
< BioEM software for Bayesian inference of Electron Microscopy images>
Copyright (C) 2014 Pilar Cossio, David Rohr and Gerhard Hummer.
Max Planck Institute of Biophysics, Frankfurt, Germany.
See license statement for terms of distribution.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
*************************************************************
BioEM: Bayesian inference of Electron Microscopy
*************************************************************
**********************************************************************************
BioEM: Bayesian inference of Electron Microscopy
**********************************************************************************
PRE-ALPHA VERSION: November, 2014
**************************************************************
**********************************************************************************
Requisites:
**** FFTW libraries: http://www.fftw.org/
......@@ -26,191 +26,266 @@ Requisites:
Optional:
**** CMake: http://www.cmake.org/
for compilation with CMakeLists.txt file.
**** Cuda: Parallel Code for GPUs.
**** MPI: http://en.wikipedia.org/wiki/Message_Passing_Interface
***************************************************************
*********************************************************************************
INSTALLATION:
Download the BioEM code from .... and untar the file.
The easiest installation uses CMake (http://www.cmake.org/),
which automatically generates a Makefile according to the
corresponding CPU/GPU architecture and desired features. CMake
uses the CMakeLists.txt file provided in the main directory; for certain
architectures, additional files such as FindFFTW.cmake may also be needed.
First, make sure that the libraries
**** FFTW libraries: http://www.fftw.org/
**** BOOST libraries: http://www.boost.org/
are installed.
--- CMakeLists.txt:
This file has all the instructions for generating the Makefile for compilation.
At the beginning of the CMakeLists.txt certain features are provided
that should be enabled/disabled (ON/OFF) according to the desired functionalities:
option (INCLUDE_CUDA "Build BioEM with CUDA support" OFF)
option (INCLUDE_OPENMP "Build BioEM with OpenMP support" ON)
option (INCLUDE_MPI "Build BioEM with MPI support" ON)
option (PRINT_CMAKE_VARIABLES "List all CMAKE Variables" OFF)
option (CUDA_FORCE_GCC "Force GCC as host compiler for CUDA part (If standard host compiler is incompatible with CUDA)" OFF)
The extra features require their corresponding software to be installed (such as CUDA and MPI).
STEPS FOR Basic INSTALLATION:
- Generate a build directory in the main BioEM directory
mkdir build
- go into the build directory
cd build
- run cmake with the CMakeLists.txt file in the main directory
cmake ../CMakeLists.txt
If this process is successful a Makefile and CMakeFiles directory
should be generated.
- Run the Makefile
make
If this process is successful a bioEM executable should be generated.
- Simple test:
./bioEM
**Output
++++++++++++ FROM COMMAND LINE +++++++++++
Command line inputs:
--Modelfile arg (Mandatory) Name of model file
--Particlesfile arg if BioEM (Mandatory) Name of particles file
--Inputfile arg if BioEM (Mandatory) Name of input parameter file
--PrintBestCalMap arg (Optional) Only print best calculated map (file nec.).
NO BioEM (!)
--ReadEulerAngles arg (Optional) Read Euler angle list instead of uniform
grid (file nec.)
--ReadPDB (Optional) If reading model file in PDB format
--ReadMRC (Optional) If reading particle file in MRC format
--ReadMultipleMRC (Optional) If reading Multiple MRCs
--DumpMaps (Optional) Dump maps after they were red from maps file
--LoadMapDump (Optional) Read Maps from dump instead of maps file
--help (Optional) Produce help message
Note:
- To change the compiler, one can export the CC and CXX variables before running cmake.
Example for the Intel compilers:
export CXX=icpc
export CC=icc
- If using CUDA with the Intel compilers, one needs to turn ON the CUDA_FORCE_GCC variable in the CMakeLists.txt
file.
*****************************************************************************************
DESCRIPTION:
*** The main objective of the BioEM code is to compare one Model to multiple experimental
EM images, obtaining a posterior probability using Bayesian analysis with the
mathematical details explained in Ref. [1].
The main objective of the BioEM code is to compare one Model to multiple experimental
EM images, obtaining a posterior probability using Bayesian analysis with the
mathematical details explained in Ref. [1].
BioEM has two main variable/input sets:
*** Command line input & help is found by just running the
compiled executable ./bioEM
*) PERFORMANCE: Variables that enhance or modify the code performance
according to the necessities of the user. A detailed description
of these variables is presented in the PERFORMANCE_VARIABLES file, and
they are passed to BioEM via environment variables.
++++++++++++ FROM COMMAND LINE +++++++++++
*) PHYSICAL: These parameters describe the physical problem (e.g. the number of pixels), and
are specified in files passed to BioEM via the --Inputfile command line option.
In the following, we will only describe these Physical Variables.
--Modelfile arg (Mandatory) Name of model file
--Particlesfile arg if BioEM (Mandatory) Name of paricles file
--Inputfile arg if BioEM (Mandatory) Name of input parameter file
--PrintBestCalMap arg (Optional) Only print best calculated map (file nec.).
NO BioEM (!)
--ReadEulerAngles arg (Optional) Read Euler angle list instead of uniform
grid (file nec.)
--ReadPDB (Optional) If reading model file in PDB format
--ReadMRC (Optional) If reading particle file in MRC format
--ReadMultipleMRC (Optional) If reading Multiple MRCs
--DumpMaps (Optional) Dump maps after they were red from maps file
--LoadMapDump (Optional) Read Maps from dump instead of maps file
--help (Optional) Produce help message
BioEM has four main input readthroughs:
**** INPUT:
BioEM has four main inputs:
1) Command line, where the filenames of the Model, Parameters ranges and Particles
should be provided (and some extra features as seen before).
2) The Model file should contain the coordinates of the model either in PDB or
2) The Model file that contains the coordinates of the model either in PDB or
txt format (see below).
3) The parameter file should contain all the parameter ranges, and additional
3) The parameter file that contains all the parameter ranges, and additional
features can be included (see below).
4) The particle file should contain the EM images, it can be in text format
4) The particle file that contains the EM images; it can be in text format
or in MRC (this should be specified in the command line) (see below).
*** TUTORIAL DIRECTORY:
In the following, we will provide a detailed explanation of each set:
A directory with example EM particles, c-alpha PDB & simple Model, and
the corresponding launch scripts are provided.
-- Standard input file parameters are provided and recommended.
* COMMAND LINE necessary input and help is found by just running the compiled executable
./bioEM
++++++++++++ FROM COMMAND LINE +++++++++++
Command line inputs:
--Modelfile arg (Mandatory) Name of model file
--Particlesfile arg if BioEM (Mandatory) Name of particles file
--Inputfile arg if BioEM (Mandatory) Name of input parameter file
--PrintBestCalMap arg (Optional) Only print best calculated map (file nec.).
NO BioEM (!)
--ReadEulerAngles arg (Optional) Read Euler angle list instead of uniform
grid (file nec.)
--ReadPDB (Optional) If reading model file in PDB format
--ReadMRC (Optional) If reading particle file in MRC format
--ReadMultipleMRC (Optional) If reading Multiple MRCs
--DumpMaps (Optional) Dump maps after they were red from maps file
--LoadMapDump (Optional) Read Maps from dump instead of maps file
--help (Optional) Produce help message
** EXPERIMENTAL IMAGE FORMAT:
Two options are allowed for the map-particle files:
A) Simple *.txt or *.dat with data formatted as
printf"%8d%8d%16.8f\n" where the first two columns are
the pixel indexes and the third column is the intensity.
Multiple particles are read in the same file with the
separator "PARTICLE" & Number.
Pixel indexes should start at 0 and all pixels should be
in the file.
-- For this case it is recommended all particles
to be normalized to zero average and unit standard deviation.
Example in::
B) Standard MRC particle file. If reading multiple MRCs
provide in command line
--Particlesfile FILE --ReadMRC --ReadMultipleMRC
where FILE contains the names of each mrc file to be read.
If only one MRC on command line
--Particlesfile FILEMRC --ReadMRC
where FILEMRC is the name of the single mrc file.
By default, when reading MRC files, particles are normalized
to zero average and unit standard deviation.
Each MRC file can contain multiple particles.
Example in::
Note:: .mrc extension is not mandatory to read mrc but a warning is
printed out.
Useful Key Words for processing multiple models
--DumpMaps
writes the maps to the file maps.dump in XX format so that they are faster to re-read.
To read use
--LoadMapDump
--- TUTORIAL DIRECTORY:
In the Tutorial directory, the file named LAUNCH_BASIC, contains all the basic
examples for the command line input to BioEM.
** MODEL FORMAT:
* MODEL INPUT FORMAT:
A) Standard PDB file: Only the CA atoms and their corresponding
residues are read, with proper density.
Key word in command line is needed::
--ReadPDB
It is also recommended to include the key word "PROJECT_RADIUS"
in the parameter file, so that the CA atoms are modeled as spheres
with the proper number of electrons and van der Waals radii
corresponding to each amino acid. If this key word is not given, elements
will be considered as points.
Note:: the .pdb extension is not mandatory for reading a PDB file, but a warning is
printed out.
B) *.txt or *.dat file: With format printf"%f %f %f %f %f\n",
where the first three columns are the coordinates of atoms or
voxels, the fourth column is the radius (\AA) and the
last column is the corresponding density::
---- x y z radius density -------
(Useful for all-atom representations or 3D EM density maps).
The key word "PROJECT_RADIUS" is needed to consider
the elements in the coordinate file as spheres and project their radius.
If this key word is not given, elements will be considered as points.
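As an illustration of option B), the short Python sketch below writes and re-reads
model lines in the x y z radius density layout. It is only a sketch: the helper
names and the two example spheres (coordinates, radii and densities) are
hypothetical and are not part of BioEM.

```python
def format_model_line(x, y, z, radius, density):
    """Format one atom or voxel with the layout "%f %f %f %f %f\\n"."""
    return "%f %f %f %f %f\n" % (x, y, z, radius, density)


def parse_model_line(line):
    """Read one model line back into five floating point numbers."""
    x, y, z, radius, density = (float(token) for token in line.split())
    return x, y, z, radius, density


# Hypothetical example: two spheres given as (x, y, z, radius, density).
spheres = [
    (0.0, 0.0, 0.0, 1.5, 6.0),
    (3.8, 0.0, 0.0, 1.5, 6.0),
]

model_text = "".join(format_model_line(*sphere) for sphere in spheres)
parsed = [parse_model_line(line) for line in model_text.splitlines()]
```

Writing model_text to a file and passing it with --Modelfile (without --ReadPDB)
would correspond to this option.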
** PARAMETER FILE FORMAT:
* PARAMETER INPUT:
In the Tutorial directory, the file PARAMETER_EXPLANATION, contains a detailed
explanation of the parameters needed for the BioEM calculation.
* EXPERIMENTAL IMAGE FORMAT:
Two options are allowed for the map-particle files:
A) Simple *.txt or *.dat with data formatted as
printf"%8d%8d%16.8f\n" where the first two columns are
the pixel indexes and the third column is the intensity.
Multiple particles are read in the same file with the
separator "PARTICLE" & Number.
Pixel indexes should start at 0 and all pixels should be
in the file.
-- For this case it is recommended all particles
to be normalized to zero average and unit standard deviation.
B) Standard MRC particle file. If reading multiple MRCs
provide in command line
--Particlesfile FILE --ReadMRC --ReadMultipleMRC
where FILE contains the names of each mrc file to be read.
If only one MRC on command line
--Particlesfile FILEMRC --ReadMRC
where FILEMRC is the name of the single mrc file.
By default, when reading MRC files, particles are normalized
to zero average and unit standard deviation.
Each MRC file can contain multiple particles.
Note:: .mrc extension is not mandatory to read mrc but a warning is
printed out.
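For option A), a minimal Python sketch of how a particle could be normalized and
written in the text format is given here. The function names and the 2x2 example
image are hypothetical. Two assumptions are made: the population standard
deviation is used for the normalization, and the separator is written as the
word PARTICLE followed by the particle number on its own line. The example
particles in the Tutorial directory remain the reference for the exact layout
and for the order of the two pixel indexes.

```python
import io
import math


def normalize(pixels):
    """Rescale a 2D image to zero average and unit standard deviation."""
    values = [value for row in pixels for value in row]
    mean = sum(values) / len(values)
    # Population standard deviation (an assumption, see the text above).
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [[(value - mean) / std for value in row] for row in pixels]


def write_particle(stream, number, pixels):
    """Write a separator line followed by one line per pixel."""
    # Assumed separator layout: the word PARTICLE and the particle number.
    stream.write("PARTICLE %d\n" % number)
    for i, row in enumerate(pixels):
        for j, value in enumerate(row):
            # Pixel indexes start at 0 and every pixel is written.
            stream.write("%8d%8d%16.8f\n" % (i, j, value))


image = [[1.0, 2.0], [3.0, 4.0]]  # hypothetical 2x2 particle
normalized = normalize(image)
buffer = io.StringIO()
write_particle(buffer, 1, normalized)
particle_text = buffer.getvalue()
```

A constant image has zero standard deviation and would need special handling
before calling normalize.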
Useful Key Words for processing multiple models
--DumpMaps
writes the maps to the file maps.dump in XX format so that they are faster to re-read.
To read use
--LoadMapDump
(see also LAUNCH_BASIC tutorial)
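When several MRC files are read with --ReadMRC --ReadMultipleMRC, the file given
to --Particlesfile lists the MRC file names. The Python sketch below builds such a
list from a directory. It assumes one file name per line, which is not stated
explicitly above, and the function name write_mrc_list and the placeholder files
are hypothetical.

```python
import os
import tempfile


def write_mrc_list(directory, list_path):
    """Write the names of the .mrc files found in directory to list_path."""
    names = sorted(
        name for name in os.listdir(directory) if name.lower().endswith(".mrc")
    )
    with open(list_path, "w") as handle:
        for name in names:
            # Assumption: one file name (with its path) per line.
            handle.write(os.path.join(directory, name) + "\n")
    return names


# Demonstration on a temporary directory with empty placeholder files.
with tempfile.TemporaryDirectory() as workdir:
    for name in ("b.mrc", "a.mrc", "notes.txt"):
        open(os.path.join(workdir, name), "w").close()
    list_file = os.path.join(workdir, "FILE")
    listed = write_mrc_list(workdir, list_file)
    with open(list_file) as handle:
        list_lines = handle.read().splitlines()
```

The resulting list file would then take the place of FILE in the command line
shown for multiple MRCs.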
Additional:
Print CTF maximizing parameters
** STANDARD CALCULATION:
**** STANDARD BioEM CALCULATION:
In a standard BioEM calculation the goal is to obtain the posterior
probability from a Model given a set of images. In this case
a Model file, Parameter file and Particle file should be provided.
Example in::
** Optional Calculations
**** Optional Calculations
Several additional options are available in this program:
A) Euler Angle Probabilities: This option prints out the
posterior probabilities of the model as a function of the Euler Angles.
In this case no integration over the angles is performed, and one
can view more directly the probability distribution as a function of the angles.
Input needed:
Example in:
B) Cross Correlation Calculation: This option prints the best* cross correlation
of the model as a function of the pixels in the micrograph (*see Manual
for mathematical formulation of how the "best" cross correlation is obtained).
This can be useful in the preliminary steps of particle picking (identification).
Input needed::
Example in:
C) Print map from Model: This option is completely independent of the BioEM calculation.
It can be useful to construct synthetic images from a model, given a fixed set of parameters.
Noise can also be included in the artificial image.
Input needed::
Example in:
(see PARAMETER_EXPLANATION and LAUNCH_BASIC for examples)
*** OUTPUT:
-- Main output file: "Output_Probabilities"
with
RefMap #(number Particle Map) Probability #(log(P))
RefMap #(number Particle Map) Maximizing Param: #(Euler Angles) #(PSF parameters) #(center displacement)
**Important: It is recommended to compare log(P) with respect to other Models or to Noise as in [1].
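A rough Python sketch of such a comparison between two Models is given here. It
assumes that each probability line of "Output_Probabilities" consists of the
tokens RefMap, the particle number, Probability and log(P), in that order; the
actual file may differ in detail. The function name and all numbers are
hypothetical, and summing the per-particle differences is only one simple way to
summarize the comparison (see Ref. [1] for the proper treatment).

```python
def read_log_probabilities(lines):
    """Collect log(P) for each particle from lines of an output file."""
    log_p = {}
    for line in lines:
        tokens = line.split()
        # Assumed layout: RefMap <particle number> Probability <log(P)>
        if len(tokens) >= 4 and tokens[0] == "RefMap" and tokens[2] == "Probability":
            log_p[int(tokens[1])] = float(tokens[3])
    return log_p


# Hypothetical output lines for two different Models and the same particles.
model_a_lines = [
    "RefMap 0 Probability -1050.5",
    "RefMap 0 Maximizing Param: 0.1 0.2 0.3",
    "RefMap 1 Probability -980.25",
]
model_b_lines = [
    "RefMap 0 Probability -1060.5",
    "RefMap 1 Probability -985.25",
]

log_a = read_log_probabilities(model_a_lines)
log_b = read_log_probabilities(model_b_lines)

# Positive values favor Model A for that particle.
difference = {number: log_a[number] - log_b[number] for number in sorted(log_a)}
total_difference = sum(difference.values())
```

In practice the lines would be read from the "Output_Probabilities" file of each
run instead of the in-line lists used here.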
** Optional OUTPUTS:
-- Write the probabilities for each triplet of Euler Angles (key word: WRITE_PROB_ANGLES in InputFile).
-- Write the cross-correlations of a full micrograph
Key word in
-- (Excluding BioEM calculation) Print a map given a set of parameters
Key word in command line:: --PrintBestCalMap
(see PARAMETER_EXPLANATION and LAUNCH_BASIC for examples)