Pilar Cossio · d8591825
--- a/Performance-Variables.md
+++ b/Performance-Variables.md
@@ -5,6 +5,8 @@ We provide a list of the performance variables that enhance or modify the code

 - **GPU**=[x] (Default: 0) Set to 1 to enable GPU usage, set to 0 to use only the CPU. The GPU will be used to calculate the probabilities for all particle-images. The preparation of projections and PSF convolutions will be processed by the CPU. This is arranged in a pipeline to ensure continuous GPU utilization.

+- **OMP_NUM_THREADS**=[x] (Default: Number of CPU cores) This is the standard OpenMP environment variable to define the number of OpenMP threads. It can be used for profiling purposes to analyze the scaling. It can be set to x=1 to use MPI exclusively or to other values for a mixed MPI / OpenMP configuration.
+
 ## GPU Variables 
 - **GPUDEVICE**=[x] (Default: fastest) Only relevant if GPU=1
    
@@ -18,10 +20,29 @@ We provide a list of the performance variables that enhance or modify the code

 - **GPUASYNC**=[x] (Default: 1) Only relevant if GPU=1. This uses a pipeline to overlap the processing on the GPU, the preparation of projections and convolutions on the CPU, and the DMA transfer. There is no reason to disable this except for debugging purposes.

+- **GPUDUALSTREAM**=[x] (Default: 1) Only relevant if GPU=1. If this is set to 1, the GPU will use two streams in parallel. This can help to improve the GPU utilization. Benchmarks have shown only a very small positive effect from this setting.
+
 - **GPUALGO**=[x] (Default: 2) This option is only relevant if GPU=1 and FFTALGO=0. Hence, it is commonly not used, since FFTALGO defaults to 1. For the non-Fourier-algorithm there are three GPUALGO implementations:

    – x=2: This will parallelize over the particle-images, and over the center displacements. The approach requires less memory bandwidth than GPUALGO=0 or GPUALGO=1. However, it has several constraints on the problem configuration: i) the number of center displacements per dimension must be a power of 2, and ii) must be a factor of the number of CUDA threads per block.
 
    – x=1: This will parallelize over the particle-images, and then loop over the center displacements on the GPU. It is usually slower than GPUALGO=2 but there are no constraints on the problem configuration. The particle-images are not processed all at once but in chunks.

-    – x=0: As GPUALGO=1, all particle images are processed at once. It is always slower than GPUALGO=1 and should not be used anymore.
\ No newline at end of file
+    – x=0: As GPUALGO=1, but all particle-images are processed at once. It is always slower than GPUALGO=1 and should no longer be used.
+
+## Additional variables
+
+- **BIOEM_PROJECTIONS_AT_ONCE**=[x] (Default: 1) This defines the number of projections prepared at once. OpenMP is used to prepare these projections in parallel. Benchmarks have shown that the effect is negligible. It is mostly relevant if OpenMP is used, no GPU is used, and/or the number of reference particle-images is very small.
+
+## Debug variables
+- **BIOEM_DEBUG_BREAK**=[x] (Default: deactivated) This is a debugging option. It will reduce the number of projections and PSF convolutions to a maximum of x each. It can be used for profiling to analyze scaling, and for fast sanity tests.
+
+- **BIOEM_DEBUG_NMAPS**=[x] (Default: deactivated) As BIOEM_DEBUG_BREAK, with the difference that this limits the number of reference particle-images to a maximum of x.
+
+- **BIOEM_DEBUG_OUTPUT**=[x] (Default: 2) Change the verbosity of the output. Higher means more output, lower means less output.
+    
+     – 0: Stands for no debug output.
+
+     – 1: Limited timing output.
+
+     – 2: Standard timing output showing durations of projection, convolution, and cross-correlation comparison. Values above 1 add successively more extensive output.
\ No newline at end of file
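
The variables documented in this change are plain environment variables, so a launcher script could assemble them before starting the program. The following is a minimal sketch under stated assumptions, not part of the patch: the helper names `performance_env` and `gpualgo2_ok` are hypothetical, and the second function only encodes the two GPUALGO=2 constraints as they are worded in the text (power of 2, and a factor of the number of CUDA threads per block).

```python
import os


def performance_env(gpu=0, omp_threads=None, debug_output=2, base=None):
    """Return a copy of an environment with the documented variables set.

    Hypothetical helper: only the variable names come from the documentation.
    """
    if gpu not in (0, 1):
        raise ValueError("GPU must be 0 (CPU only) or 1 (use the GPU)")
    env = dict(os.environ if base is None else base)
    env["GPU"] = str(gpu)
    if omp_threads is not None:
        if omp_threads < 1:
            raise ValueError("OMP_NUM_THREADS must be at least 1")
        # 1 means MPI only; larger values give a mixed MPI / OpenMP setup.
        env["OMP_NUM_THREADS"] = str(omp_threads)
    env["BIOEM_DEBUG_OUTPUT"] = str(debug_output)
    return env


def gpualgo2_ok(displacements_per_dim, threads_per_block):
    """Check the two GPUALGO=2 constraints as stated in the documentation."""
    n = displacements_per_dim
    is_power_of_two = n > 0 and (n & (n - 1)) == 0
    return is_power_of_two and threads_per_block % n == 0
```

The returned dictionary can be passed as the `env` argument of `subprocess.run`; the actual command line is left out here because it is not part of this page.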