Commit 7b6dd3b4 authored by Pilar Cossio

Tutorial Basics Vr. 1

parent de14bf2a
************************* HEADER:: NOTATION *******************************************
RefMap: MapNumber ; alpha - beta - gamma - log Probability
************************* HEADER:: NOTATION *******************************************
0 0.836 1.369 2.64 -68323.9
0 -1.375 0.4 3.018 -68390.4
1 0.836 1.369 2.64 -68323.9
1 -1.375 0.4 3.018 -68390.4
2 0.836 1.369 2.64 -68555.9
2 -1.375 0.4 3.018 -68565.2
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
< BioEM software for Bayesian inference of Electron Microscopy images>
Copyright (C) 2014 Pilar Cossio, David Rohr and Gerhard Hummer.
Max Planck Institute of Biophysics, Frankfurt, Germany.
See license statement for terms of distribution.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
*************************************************************
BioEM: Bayesian inference of Electron Microscopy
*************************************************************
PRE-ALPHA VERSION: November, 2014
**************************************************************
Requisites:
**** FFTW libraries: http://www.fftw.org/
**** BOOST libraries: http://www.boost.org/
**** OpenMP: http://openmp.org/
Optional:
**** CMake: http://www.cmake.org/
for compilation with the CMakeLists.txt file.
**** Cuda: Parallel Code for GPUs.
***************************************************************
DESCRIPTION:
*** The main objective of the BioEM code is to compare one Model to multiple experimental
EM images, obtaining a posterior probability using Bayesian analysis with the
mathematical details explained in Ref. [1].
*** Command line input & help is found by just running the
compiled executable ./bioEM
++++++++++++ FROM COMMAND LINE +++++++++++
--Modelfile arg (Mandatory) Name of model file
--Particlesfile arg if BioEM (Mandatory) Name of particles file
--Inputfile arg if BioEM (Mandatory) Name of input parameter file
--PrintBestCalMap arg (Optional) Only print best calculated map (file nec.).
NO BioEM (!)
--ReadEulerAngles arg (Optional) Read Euler angle list instead of uniform
grid (file nec.)
--ReadPDB (Optional) If reading model file in PDB format
--ReadMRC (Optional) If reading particle file in MRC format
--ReadMultipleMRC (Optional) If reading Multiple MRCs
--DumpMaps (Optional) Dump maps after they were read from maps file
--LoadMapDump (Optional) Read Maps from dump instead of maps file
--help (Optional) Produce help message
BioEM reads its input from four main sources:
1) Command line, where the file names of the Model, Parameter ranges and Particles
should be provided (together with the extra options listed above).
2) The Model file, which should contain the coordinates of the model in either PDB or
txt format (see below).
3) The Parameter file, which should contain all the parameter ranges; additional
features can also be included (see below).
4) The Particle file, which should contain the EM images. It can be in text format
or in MRC format (this should be specified in the command line) (see below).
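As an illustration of how the command-line options above combine, here is a minimal Python sketch that assembles the argument list for a run. The helper name build_bioem_command and its keyword arguments are hypothetical; only the flags themselves come from the help text above.

```python
def build_bioem_command(executable, inputfile, modelfile, particlesfile,
                        read_pdb=False, read_mrc=False, multiple_mrc=False):
    """Return the argument list for a standard BioEM run (hypothetical helper)."""
    # The tutorial states that --ReadMultipleMRC must be given together with --ReadMRC.
    if multiple_mrc and not read_mrc:
        raise ValueError("--ReadMultipleMRC requires --ReadMRC")
    cmd = [executable, "--Inputfile", inputfile, "--Modelfile", modelfile]
    if read_pdb:
        cmd.append("--ReadPDB")
    cmd += ["--Particlesfile", particlesfile]
    if read_mrc:
        cmd.append("--ReadMRC")
    if multiple_mrc:
        cmd.append("--ReadMultipleMRC")
    return cmd


command = build_bioem_command("bioEM", "Param_Input_Short", "Model.pdb", "ListMRC",
                              read_pdb=True, read_mrc=True, multiple_mrc=True)
print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) if the executable is on the path.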
*** TUTORIAL DIRECTORY:
A directory with example EM particles, c-alpha PDB & simple Model, and
the corresponding launch scripts are provided.
-- Standard input file parameters are provided and recommended.
** EXPERIMENTAL IMAGE FORMAT:
Two options are allowed for the map-particle files:
A) Simple *.txt or *.dat file with data formatted as
printf "%8d%8d%16.8f\n", where the first two columns are
the pixel indexes and the third column is the intensity.
Multiple particles are read from the same file with the
separator "PARTICLE" & Number.
Pixel indexes should start at 0 and all pixels should be
in the file.
-- For this case it is recommended that all particles
be normalized to zero average and unit standard deviation.
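A minimal Python sketch of how one particle could be normalized and written in this text format. The exact layout of the separator line ("PARTICLE" followed by the number), the order of the two pixel indexes, and the use of the population standard deviation are assumptions made for illustration; check them against the example files in the tutorial directory.

```python
import statistics


def normalize(values):
    """Shift and scale intensities to zero average and unit standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumption)
    if std == 0.0:
        raise ValueError("a constant image cannot be normalized")
    return [(v - mean) / std for v in values]


def particle_lines(image, number):
    """Format one image (a list of equally long rows) as text lines."""
    width = len(image[0])
    flat = normalize([v for row in image for v in row])
    lines = ["PARTICLE %d" % number]  # assumed layout of the separator line
    for k, value in enumerate(flat):
        first, second = divmod(k, width)  # index order is an assumption
        lines.append("%8d%8d%16.8f" % (first, second, value))
    return lines


example = particle_lines([[1.0, 2.0], [3.0, 4.0]], 0)
print("\n".join(example))
```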
Example in::
B) Standard MRC particle file. If reading multiple MRCs
provide in command line
--Particlesfile FILE --ReadMRC --ReadMultipleMRC
where FILE contains the names of each mrc file to be read.
If reading only one MRC, provide in command line
--Particlesfile FILEMRC --ReadMRC
where FILEMRC is the name of the single mrc file.
By default, particles read from MRC files are normalized
to zero average and unit standard deviation.
Each MRC file can contain multiple particles.
Example in::
Note:: .mrc extension is not mandatory to read mrc but a warning is
printed out.
Useful key words for processing multiple models:
--DumpMaps
Writes the maps to the file maps.dump in XX format so that it is faster to re-read them.
To read them back use
--LoadMapDump
** MODEL FORMAT:
A) Standard PDB file: Reading only CA atoms and corresponding
residues with proper density.
Key word in command line is needed::
--ReadPDB
Also, it is recommended to have in the parameter file the key word
"PROJECT_RADIUS", which models the CA atoms as spheres
with the proper number of electrons and van der Waals radii
corresponding to each amino acid. If this key word is not mentioned, elements
will be considered as points.
Note:: .pdb extension is not mandatory to read pdb but a warning sign is
printed out.
B) *.txt or *.dat file: With format printf "%f %f %f %f %f\n",
the first three columns are the coordinates of atoms or
voxels, the fourth column is the radius (Å) and the
last column is the corresponding density::
---- x y z radius density -------
(Useful for all atom representation or 3D EM density maps).
The key word "PROJECT_RADIUS" is needed to consider
the elements in the coordinate file as spheres and project their radius.
If this key word is not mentioned elements will be considered as points.
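A minimal Python sketch of writing and reading one line of the text model format described in B). The helper names model_line and parse_model_line are hypothetical; the five-column layout (x y z radius density) is taken from the description above.

```python
def model_line(x, y, z, radius, density):
    """Format one atom or voxel as 'x y z radius density'."""
    return "%f %f %f %f %f" % (x, y, z, radius, density)


def parse_model_line(line):
    """Read one line back into a tuple of five floats."""
    fields = [float(token) for token in line.split()]
    if len(fields) != 5:
        raise ValueError("expected 5 columns: x y z radius density")
    return tuple(fields)


line = model_line(1.0, 2.0, 3.0, 1.5, 6.0)
print(line)
print(parse_model_line(line))
```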
** PARAMETER FILE FORMAT:
Additional:
Print CTF maximizing parameters
** STANDARD CALCULATION:
In a standard BioEM calculation the goal is to obtain the posterior
probability from a Model given a set of images. In this case
a Model file, Parameter file and Particle file should be provided.
Example in::
** Optional Calculations
Several additional options are available in this program:
A) Euler Angle Probabilities: This option prints out the
posterior probabilities of the model as a function of the Euler Angles.
In this case no integration over the angles is performed, and one
can view more directly the probability distribution as a function of the angles.
Input needed:
Example in:
B) Cross Correlation Calculation: This option prints the best* cross correlation
of the model as a function of the pixels in the micrograph (*see Manual
for the mathematical formulation of how the "best" cross correlation is obtained).
This can be useful in the preliminary steps of particle picking (identification).
Input needed::
Example in:
C) Print map from Model: This option is completely independent of the BioEM calculation.
It can be useful to construct synthetic images from a model, given a fixed set of parameters.
Noise can also be included in the artificial image.
Input needed::
Example in:
*** OUTPUT:
-- Main output file: "Output_Probabilities"
with
RefMap #(number Particle Map) Probability #(log(P))
RefMap #(number Particle Map) Maximizing Param: #(Euler Angles) #(PSF parameters) #(center displacement)
**Important: It is recommended to compare log(P) with respect to other Models or to Noise as in [1].
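A minimal Python sketch of reading log(P) values and comparing two models. It assumes lines of the form "RefMap: 0 LogProb: -67985.2 Constant: 702.336", as in the example output files of this tutorial, and it assumes that the images are treated as independent so that log probabilities add across images. The second model's values below are made up purely to show the comparison.

```python
import re

# Matches only the LogProb lines; "Maximizing Param" lines are ignored.
LOGPROB_LINE = re.compile(r"^RefMap:\s*(\d+)\s+LogProb:\s*(\S+)\s+Constant:\s*(\S+)")


def read_log_probs(lines):
    """Return {map number: log(P)} from the lines of an output file."""
    result = {}
    for line in lines:
        match = LOGPROB_LINE.match(line.strip())
        if match:
            result[int(match.group(1))] = float(match.group(2))
    return result


def total_log_prob(log_probs):
    """Sum log(P) over all images (independence assumption)."""
    return sum(log_probs.values())


model_a = read_log_probs([
    "RefMap: 0 LogProb: -67985.2 Constant: 702.336",
    "RefMap: 0 Maximizing Param: -67985.2 -1.5708 [rad] 1.15928 [rad]",
    "RefMap: 1 LogProb: -67985.2 Constant: 702.336",
    "RefMap: 2 LogProb: -68341.8 Constant: 345.699",
])
# Hypothetical second model, values invented for illustration only.
model_b = {0: -68000.0, 1: -68000.0, 2: -68300.0}

difference = total_log_prob(model_a) - total_log_prob(model_b)
print("log(P_A) - log(P_B) =", difference)
```

A positive difference favors the first model and a negative one favors the second.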
** Optional OUTPUTS:
-- Write the probabilities for each triplet of Euler Angles (key word: WRITE_PROB_ANGLES in InputFile).
Key word in
-- Write the cross-correlations of a full micrograph
Key word in
-- (Excluding BioEM calculation) Print a map given a set of parameters
Key word in command line:: --PrintBestCalMap
[1] Cossio, P and Hummer, G. J Struct Biol. 2013 Dec;184(3):427-37. doi: 10.1016/j.jsb.2013.10.006.
0.836000 1.369000 2.640000
-1.375000 0.400000 3.018000
########################################################################
######------------------BASIC FEATURES----------------------------######
########################################################################
############### Text Model - Text Image #########################
FILES::
Model file: Model_Text
Parameter file: Param_Input_Short
Image file: Text_Image_Form
Command line for calculating the probability of a model (text format) with parameter input and images in text format:
~/BioEM/build/bioEM --Inputfile Param_Input_Short --Modelfile Model_Text --Particlesfile Text_Image_Form
Example Outputfile: Output_Probabilities_Text_Image_Form
NOTE: Check coordinates in printed COORDREAD file to verify that the Model is correct.
############### PDB Model - Text Image #########################
New Command:: --ReadPDB
FILES::
Model file: Model.pdb
Parameter file: Param_Input_Short
Image file: Text_Image_Form
~/BioEM/build/bioEM --Inputfile Param_Input_Short --Modelfile Model.pdb --ReadPDB --Particlesfile Text_Image_Form
############### PDB Model - One MRC Image #########################
New Command:: --ReadMRC
FILES::
Model file: Model.pdb
Parameter file: Param_Input_Short
Image file: OneImage.mrc
~/BioEM/build/bioEM --Inputfile Param_Input_Short --Modelfile Model.pdb --ReadPDB --Particlesfile OneImage.mrc --ReadMRC
############### PDB Model - Multiple MRCs #########################
New Command:: --ReadMultipleMRC
FILES::
Model file: Model.pdb
Parameter file: Param_Input_Short
FILE with MRC NAMES: ListMRC
~/BioEM/build/bioEM --Inputfile Param_Input_Short --Modelfile Model.pdb --ReadPDB --Particlesfile ListMRC --ReadMRC --ReadMultipleMRC
Example Outputfile::Output_Probabilities_MultipleMRC
NOTE: Both commands --ReadMRC --ReadMultipleMRC are required.
########################################################################
######------------EXTRA FEATURES with BioEM-----------------------######
########################################################################
############### PDB Model - Multiple MRCs --DumpMaps #########################
New Command:: --DumpMaps
This extra feature writes out the particle images in XX format so that they are easier to read in a
later BioEM run.
FILES::
Model file: Model.pdb
Parameter file: Param_Input_Short
FILE with MRC NAMES: ListMRC
~/BioEM/build/bioEM --Inputfile Param_Input_Short --Modelfile Model.pdb --ReadPDB --Particlesfile ListMRC --ReadMRC --ReadMultipleMRC --DumpMaps
Additional outputfile:: maps.dump (that will be useful in an extra trial)
############### PDB Model --LoadMapDump #######################
New Command:: --LoadMapDump
This extra feature reads in the particle images in XX format from the file maps.dump
that was written out previously with --DumpMaps. Here no Particle file name is needed
(just maps.dump).
FILES::
Model file: Model.pdb
Parameter file: Param_Input_Short
Dumped Mapfile: maps.dump
~/BioEM/build/bioEM --Inputfile Param_Input_Short --Modelfile Model.pdb --ReadPDB --LoadMapDump
######################## PDB Model + Read Euler angles from File #############
With this feature the Euler angles are not sampled uniformly on the sphere but
are read from a file. The format of the file should be alpha (12.6f) beta (12.6f) gamma (12.6f).
New Command:: --ReadEulerAngles arg
ADDITIONALLY, the input parameter file should contain
NOT_UNIFORM_TOTAL_ANGS XX
with XX=total number of Euler angle triplets.
FILES::
Model file: Model.pdb
Parameter file: Param_Input_ReadEulerAng
Image file: Text_Image_Form
EulerAngle File: Euler_Angle_List
~/BioEM/build/bioEM --Inputfile Param_Input_ReadEulerAng --Modelfile Model.pdb --ReadPDB --Particlesfile Text_Image_Form --ReadEulerAngles Euler_Angle_List
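A minimal Python sketch of producing the lines of an Euler angle list and the matching NOT_UNIFORM_TOTAL_ANGS entry. It assumes that the three 12.6f fields are written back to back on one line; the helper name euler_angle_lines is hypothetical. The two triplets are the ones used in this tutorial's Euler angle list.

```python
def euler_angle_lines(triplets):
    """Format (alpha, beta, gamma) triplets, in radians, one triplet per line."""
    return ["%12.6f%12.6f%12.6f" % (alpha, beta, gamma)
            for alpha, beta, gamma in triplets]


triplets = [(0.836, 1.369, 2.640), (-1.375, 0.400, 3.018)]
angle_lines = euler_angle_lines(triplets)

# The parameter file needs the total number of triplets in the list.
parameter_entry = "NOT_UNIFORM_TOTAL_ANGS %d" % len(triplets)

print("\n".join(angle_lines))
print(parameter_entry)
```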
########################################################################
#######----------------EXTRA FEATURE No BIOEM-------------------########
########################################################################
With this code it is also possible to simply print out a synthetic image
from a model given a set of parameters. This is useful for post-analysis
characterization, for example when comparing the best calculated map
to the experimental image (as in Fig. 1 of Ref. [1]), or for creating synthetic
image sets with noise.
New Command:: --PrintBestCalMap arg
FILES::
Model file: Model.pdb
Print Map Parameter file: Param_Print_MAP
IMPORTANT:: Print Map Input parameters are different from those of standard BioEM parameter input
(see Manual).
~/BioEM/build/bioEM --Modelfile Model.pdb --ReadPDB --PrintBestCalMap Param_Print_MAP
New outputfile:: BESTMAP with format
MAP ----- Pixel X ---- Pixel Y --- Intensity
After each Y-column there is an extra line for easy plotting with pm3d map in Gnuplot
(see http://gnuplot.info/).
NOTE:: BioEM Output probability files are NOT generated.
#################################################################
##### In this example, one reads only ONE MRC file from the --Particlesfile
#### Command --ReadMRC is necessary
module load mkl
module load intel
module load boost
module load impi
export FFTALGO=1;
export GPU=0;
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${BOOST_HOME}/lib
export BIOEM_DEBUG_OUTPUT=0
~/BioEM/build/bioEM --Inputfile Param_Input_Stre --Modelfile Model.pdb --ReadPDB --Particlesfile OneImage.mrc --ReadMRC
##### In this example, one reads multiple MRC file names from the --Particlesfile
#### Commands --ReadMRC --ReadMultipleMRC are necessary
export BIOEM_DEBUG_OUTPUT=0
~/BioEM/build/bioEM --Inputfile Param_Input_Stre --Modelfile Model.pdb --ReadPDB --Particlesfile ListMRC --ReadMRC --ReadMultipleMRC
OneImage.mrc
TwoImages.mrc
************************* HEADER:: NOTATION *******************************************
RefMap: MapNumber ; LogProb: natural logarithm of posterior Probability ; Constant: Numerical Const. for adding Probabilities
RefMap: MapNumber ; Maximizing Param: MaxLogProb - alpha - beta - gamma - PSF amp - PSF phase - PSF envelope - center x - center y - normalization - offsett
************************* HEADER:: NOTATION *******************************************
RefMap: 0 LogProb: -67985.2 Constant: 702.336
RefMap: 0 Maximizing Param: -67985.2 -1.5708 [rad] 1.15928 [rad] -0.942478 [rad] 1 [ ] 0.008 [1./A²] 0.0006 [1./A²] -2 [pix] -6 [pix] -0.00888036 [ ] 0.0286706 [ ]
RefMap: 1 LogProb: -67985.2 Constant: 702.336
RefMap: 1 Maximizing Param: -67985.2 -1.5708 [rad] 1.15928 [rad] -0.942478 [rad] 1 [ ] 0.008 [1./A²] 0.0006 [1./A²] -2 [pix] -6 [pix] -0.00888036 [ ] 0.0286706 [ ]
RefMap: 2 LogProb: -68341.8 Constant: 345.699
RefMap: 2 Maximizing Param: -68341.9 0.314159 [rad] 0.643501 [rad] 0.314159 [rad] 1 [ ] 0.01 [1./A²] 0.0008 [1./A²] 11 [pix] 1 [pix] -0.00799752 [ ] 0.02588 [ ]
************************* HEADER:: NOTATION *******************************************
RefMap: MapNumber ; LogProb: natural logarithm of posterior Probability ; Constant: Numerical Const. for adding Probabilities
RefMap: MapNumber ; Maximizing Param: MaxLogProb - alpha - beta - gamma - PSF amp - PSF phase - PSF envelope - center x - center y - normalization - offsett
************************* HEADER:: NOTATION *******************************************
RefMap: 0 LogProb: -67985.3 Constant: 702.244
RefMap: 0 Maximizing Param: -67985.3 -1.5708 [rad] 1.15928 [rad] -0.942478 [rad] 1 [ ] 0.008 [1./A²] 0.0006 [1./A²] -2 [pix] -6 [pix] -0.00888036 [ ] 0.0286706 [ ]
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
< BioEM software for Bayesian inference of Electron Microscopy images>
Copyright (C) 2014 Pilar Cossio, David Rohr and Gerhard Hummer.
Max Planck Institute of Biophysics, Frankfurt, Germany.
See license statement for terms of distribution.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
********************************************************************************
BioEM: Bayesian inference of Electron Microscopy
********************************************************************************
PRE-ALPHA VERSION: November, 2014
*********************************************************************************
*************************BIOEM PARAMETERS****************************************
############# MANDATORY parameters:
PIXEL_SIZE (float)
Pixel size in [A] of the experimental micrograph.
NUMBER_PIXELS (int)
Assuming a square particle, this is the number of pixels in each
dimension, e.g. for a particle of 220 x 220 pixels, the number should
be 220.
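A minimal Python sketch of reading a parameter file and checking the two mandatory key words. It assumes one key word per line, optionally followed by a value, and that "#" starts a comment (as the "#FLIPPED" line in the example cross-correlation input suggests). The helper names are hypothetical.

```python
MANDATORY_KEYS = ("PIXEL_SIZE", "NUMBER_PIXELS")


def parse_parameters(lines):
    """Return {key word: value string, or True for key words without a value}."""
    params = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments (assumed syntax)
        if not line:
            continue
        tokens = line.split()
        params[tokens[0]] = tokens[1] if len(tokens) > 1 else True
    return params


def missing_mandatory(params):
    """List the mandatory key words that are absent."""
    return [key for key in MANDATORY_KEYS if key not in params]


params = parse_parameters([
    "NUMBER_PIXELS 220",
    "PIXEL_SIZE 1.77",
    "PROJECT_RADIUS",
    "#FLIPPED Important!: Only if the micrograph intensities are inverted",
])
print(params)
print("missing:", missing_mandatory(params))
```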
############# INTEGRATION RANGES:
In a standard BioEM calculation, one needs to define the parameter integration ranges
and grid spacing.
** ORIENTATION::
- In BioEM the Euler angles are used to define a given orientation of the
molecule in 3D. There are two ways to perform the integration
over the orientations:
A) [Default]
Sample the full space with uniform distribution of alpha [-pi,pi], cos(beta) [-1,1]
and gamma [-pi,pi]. For this case one needs only to provide the number of grid points
in alpha and beta.
Key words in Parameter file:
GRIDPOINTS_ALPHA (int)
GRIDPOINTS_BETA (int)
Note: To sample uniformly it is recommended that GRIDPOINTS_ALPHA~2*GRIDPOINTS_BETA.
B) Read Euler angles from File. With this feature the Euler angles are not
sampled uniformly on the sphere but are read from a file. The format of the file should be
alpha (12.6f) beta (12.6f) gamma (12.6f).
Key words in Commandline:
--ReadEulerAngles arg
arg=FILE NAME (with Euler angle triplets).
Key words in Parameter file:
NOT_UNIFORM_TOTAL_ANGS (int)
where (int) denotes the total number of Euler angle triplets (see XX for an example).
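A minimal Python sketch of the uniform sampling in option A): alpha uniform in [-pi,pi] and cos(beta) uniform in [-1,1]. Placing the grid points at cell midpoints, and reusing the alpha grid for gamma, are assumptions. They reproduce angle values that appear in the tutorial's example output (alpha = -1.5708, beta = 1.15928 for 10 and 5 grid points), but the actual grid used by the code is not confirmed here.

```python
import math


def uniform_orientation_grid(n_alpha, n_beta):
    """Return (alphas, betas) in radians, sampled at cell midpoints (assumption)."""
    alphas = [-math.pi + (i + 0.5) * 2.0 * math.pi / n_alpha for i in range(n_alpha)]
    # Sampling cos(beta) uniformly gives a uniform distribution on the sphere.
    betas = [math.acos(-1.0 + (j + 0.5) * 2.0 / n_beta) for j in range(n_beta)]
    return alphas, betas


alphas, betas = uniform_orientation_grid(10, 5)
gammas = alphas  # assumption: gamma uses the same grid as alpha
total_orientations = len(alphas) * len(betas) * len(gammas)
print(total_orientations, "orientations")
```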
**POINT SPREAD FUNCTION::
The point spread function is defined in the Supplementary Information of Ref. [1].
For this integration we have 3 variables: amplitude, envelope and phase.
**Note:: This calculation is done in Real Space (not in Fourier space as for the CTF).
The amplitude defines the contribution of the sine or cosine parts; it is within [0,1].
The envelope is the real space equivalent of the b-factor; it should be given in
units of 1/Å^2, the standard range is xx
The phase is also in real space; it should be given in units of 1/Å^2, the standard range is xx
The number of grid points (GRIDPOINTS_ ) within each integration range, the starting point of the
integration (START_ ) and the grid spacing (GRIDSPACE_ ) are selected by the user.
Key words in Parameter file:
ENVELOPE:
GRIDPOINTS_ENVELOPE (int)
START_ENVELOPE (float)
GRIDSPACE_ENVELOPE (float)
PHASE:
GRIDPOINTS_PSF_PHASE (int)
START_PSF_PHASE (float)
GRIDSPACE_PSF_PHASE (float)
AMPLITUDE:
GRIDPOINTS_PSF_AMP (int)
START_PSF_AMP (float)
GRIDSPACE_PSF_AMP (float)
An additional feature is to print out the corresponding CTF parameters that maximize the posterior:
Key words in Parameter file:
WRITE_CTF_PARAM
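A minimal Python sketch of the integration grid implied by the GRIDPOINTS_, START_ and GRIDSPACE_ key words. It assumes that grid point i lies at START + i * GRIDSPACE; the helper name parameter_grid is hypothetical. The numbers are taken from the example parameter files in this tutorial.

```python
def parameter_grid(start, spacing, points):
    """Return the grid values START + i * GRIDSPACE for i = 0 .. GRIDPOINTS - 1."""
    return [start + i * spacing for i in range(points)]


envelope = parameter_grid(0.0002, 0.0002, 4)  # START_ENVELOPE, GRIDSPACE_ENVELOPE, GRIDPOINTS_ENVELOPE
phase = parameter_grid(0.004, 0.002, 4)       # START_PSF_PHASE, GRIDSPACE_PSF_PHASE, GRIDPOINTS_PSF_PHASE
amplitude = parameter_grid(1.0, 0.0, 1)       # START_PSF_AMP, GRIDSPACE_PSF_AMP, GRIDPOINTS_PSF_AMP

# Number of PSF parameter combinations visited in the integration.
psf_combinations = len(envelope) * len(phase) * len(amplitude)
print(envelope, phase, amplitude, psf_combinations)
```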
** CENTER DISPLACEMENT
The integration of the particle translation is done equidistantly from the center, with
a given grid space in pixels equally for both dimensions.
Key words in Parameter file:
MAX_D_CENTER (int)
PIXEL_GRID_CENTER (int)
Example: if MAX_D_CENTER=10 and PIXEL_GRID_CENTER=2, the integration will be done
between [-10,10] in pixel x and [-10,10] in pixel y with sampling every 2 pixels.
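The example above can be enumerated with a short Python sketch. It assumes that both end points of the range are included; the helper name center_displacements is hypothetical.

```python
def center_displacements(max_d_center, pixel_grid_center):
    """Return all (dx, dy) displacements in pixels, end points included (assumption)."""
    offsets = list(range(-max_d_center, max_d_center + 1, pixel_grid_center))
    return [(dx, dy) for dx in offsets for dy in offsets]


displacements = center_displacements(10, 2)  # MAX_D_CENTER=10, PIXEL_GRID_CENTER=2
print(len(displacements), "displacements, from", displacements[0], "to", displacements[-1])
```

With these values there are 11 offsets per dimension, so 121 displacements in total.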
** NORMALIZATION AND OFFSET:
The integration of the normalization and offset is carried out analytically.
############# Extra FEATURES
In addition to calculating the BioEM probability, one can extract other features
from the model-experiment comparison:
** Posterior probability as a function of Euler angles::
One can write out the posterior probability as a function of each Euler angle
triplet (i.e. integration is performed over PSF, center, normalization and offset
but not over the orientations).
Key words in Parameter file:
WRITE_PROB_ANGLES
There is an additional outputfile called ANG_PROB where the results are written.
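A minimal Python sketch of finding, for each particle map, the Euler angle triplet with the highest log probability. It assumes data lines with the column layout given in the header of the example angle-probability output (MapNumber, alpha, beta, gamma, log Probability) and skips every other line. The helper name best_angles_per_map is hypothetical.

```python
def best_angles_per_map(lines):
    """Return {map number: (alpha, beta, gamma, log_prob)} with the largest log_prob."""
    best = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # header or malformed line
        try:
            map_number = int(fields[0])
            alpha, beta, gamma, log_prob = (float(f) for f in fields[1:])
        except ValueError:
            continue  # non-numeric line
        if map_number not in best or log_prob > best[map_number][3]:
            best[map_number] = (alpha, beta, gamma, log_prob)
    return best


best = best_angles_per_map([
    "RefMap: MapNumber ; alpha - beta - gamma - log Probability",
    "0 0.836 1.369 2.64 -68323.9",
    "0 -1.375 0.4 3.018 -68390.4",
    "2 0.836 1.369 2.64 -68555.9",
    "2 -1.375 0.4 3.018 -68565.2",
])
print(best)
```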
** Cross Correlation calculation:
It is also possible to write out the best cross-correlation as a function
of the pixels. Here the standard integration over the specified
parameters is performed and additionally the cross correlation is stored as
sum(exp(-CC(i,j)))
where CC(i,j) is the cross correlation of the model at pixel i,j and the
sum is over all parameters apart from the center displacement.
To optimize speed a grid is used (i.e. instead of storing it at every pixel,
it is stored every X pixels).
This feature is better suited for particle identification and can be used with
entire micrographs, so it is recommended to have few input maps.
Mandatory Key words in Parameter file:
WRITE_CROSSCOR
CROSSCOR_GRID_SPACE (int)
Extra key words in Parameter file:
FLIPPED
Important!: Use if the micrograph intensities are inverted (particles are white instead of black).
CROSSCOR_NOTBAYESIAN
Use if only the maximum of CC(i,j) should be stored instead of the sum over the parameters
(as above).
There is an additional outputfile called CROSS_CORRELATION where the cross-correlations are written.
See Param_Input_CrossCorrelation as an example of the input file.
NUMBER_PIXELS 220
PIXEL_SIZE 1.77
GRIDPOINTS_ALPHA 10
GRIDPOINTS_BETA 5
GRIDPOINTS_ENVELOPE 4
START_ENVELOPE 0.0002
GRIDSPACE_ENVELOPE 0.0002
GRIDPOINTS_PSF_PHASE 4
START_PSF_PHASE 0.004
GRIDSPACE_PSF_PHASE 0.002
GRIDPOINTS_PSF_AMP 1
START_PSF_AMP 1.
GRIDSPACE_PSF_AMP 0
MAX_D_CENTER 25
PIXEL_GRID_CENTER 2
PROJECT_RADIUS
WRITE_CROSSCOR
CROSSCOR_GRID_SPACE 2
#FLIPPED Important!: Only If the micrograph intensities are inverted (particles are white instead of black).
CROSSCOR_NOTBAYESIAN
NUMBER_PIXELS 220
PIXEL_SIZE 1.77
GRIDPOINTS_ENVELOPE 4
START_ENVELOPE 0.0002
GRIDSPACE_ENVELOPE 0.0002
GRIDPOINTS_PSF_PHASE 4
START_PSF_PHASE 0.004
GRIDSPACE_PSF_PHASE 0.002
GRIDPOINTS_PSF_AMP 1
START_PSF_AMP 1.
GRIDSPACE_PSF_AMP 0
MAX_D_CENTER 25
PIXEL_GRID_CENTER 2
PROJECT_RADIUS
NOT_UNIFORM_TOTAL_ANGS 2
WRITE_PROB_ANGLES
NUMBER_PIXELS 220
PIXEL_SIZE 1.77
GRIDPOINTS_ALPHA 10
GRIDPOINTS_BETA 5
GRIDPOINTS_ENVELOPE 4
START_ENVELOPE 0.0002
GRIDSPACE_ENVELOPE 0.0002
GRIDPOINTS_PSF_PHASE 4
START_PSF_PHASE 0.004
GRIDSPACE_PSF_PHASE 0.002
GRIDPOINTS_PSF_AMP 1
START_PSF_AMP 1.
GRIDSPACE_PSF_AMP 0
MAX_D_CENTER 25
PIXEL_GRID_CENTER 2