We provide a list of the performance variables that enhance or modify the code’s computational performance without changing the numerical results (see chapter 3 of the user manual). They are set as environment variables, e.g. in bash with the export command.
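The sketch below shows how these variables are set in bash. It uses only standard shell features. The BioEM command line is not covered in this section, so the comment about starting BioEM is a placeholder.

```bash
# Export the variables in the shell that launches BioEM; the BioEM process
# (a child of this shell) inherits them.
export FFTALGO=1          # Fourier algorithm (default)
export GPU=0              # CPU only (default)
export OMP_NUM_THREADS=8  # example value

# A child process sees the exported values:
sh -c 'echo "GPU=$GPU OMP_NUM_THREADS=$OMP_NUM_THREADS"'   # prints: GPU=0 OMP_NUM_THREADS=8

# A variable can also be set for a single command only (this prints GPU=1);
# the value exported above is unchanged afterwards.
GPU=1 sh -c 'echo "GPU=$GPU"'

# Start BioEM from this shell as usual (command line omitted here).
```

Exporting once per shell session is convenient for a series of runs. The one-command form is handy for comparing a single setting without changing the session.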
FFTALGO=[x] (Default: 1) Set to 1 to enable the Fourier-algorithm or to 0 to use the non-Fourier-algorithm.
GPU=[x] (Default: 0) Set to 1 to enable GPU usage, or to 0 to use only the CPU. The GPU is used to calculate the probabilities for all particle-images, while the preparation of projections and PSF convolutions is performed on the CPU. These steps are arranged in a pipeline to keep the GPU continuously busy.
OMP_NUM_THREADS=[x] (Default: Number of CPU cores) This is the standard OpenMP environment variable to define the number of OpenMP threads. It can be used for profiling purposes to analyze the scaling. It can be set to x=1 to use MPI exclusively or to other values for a mixed MPI / OpenMP configuration.
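In a mixed MPI/OpenMP configuration, a common choice is to divide the cores of a node evenly among the MPI ranks placed on it. The sketch below does that arithmetic. The helper `threads_per_rank`, the core and rank counts, and the `./bioEM` executable name in the comment are placeholders, not part of BioEM.

```bash
# Hypothetical helper: OpenMP threads per MPI rank when the cores of one
# node are shared evenly by the ranks placed on that node.
threads_per_rank() {  # $1 = cores per node, $2 = MPI ranks per node
  echo $(( $1 / $2 ))
}

# Mixed MPI/OpenMP: 16-core nodes, 4 ranks per node -> 4 threads per rank.
export OMP_NUM_THREADS=$(threads_per_rank 16 4)
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"   # prints: OMP_NUM_THREADS=4

# Pure MPI would be one rank per core:
#   export OMP_NUM_THREADS=1
# Launch example (placeholder executable and arguments):
#   mpirun -np 8 ./bioEM <arguments>
```

Depending on the MPI implementation, environment variables may have to be forwarded to remote nodes explicitly (for example, Open MPI's `mpirun -x OMP_NUM_THREADS`).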
GPUDEVICE=[x] (Default: fastest) Only relevant if GPU=1.
– If this is not set, BioEM will autodetect the fastest GPU. Only possible if MPI is not used.
– If x >= 0, BioEM will use GPU number x. Only possible if MPI is not used.
– If x = -1 (for MPI runs): if BioEM runs with several MPI processes and each node has G GPUs, the MPI process with rank N uses GPU number (N % G). This allows several MPI processes to be placed on one node, each using a different GPU. In a multi-node configuration, consecutive MPI ranks must be placed on the same node: for example, four processes on two nodes (node0 and node1) must be placed as (node0, node0, node1, node1) and not as (node0, node1, node0, node1), because in the latter case only one GPU per node is used (shared by two MPI processes).
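The GPUDEVICE=-1 rule can be illustrated with a short shell sketch. It only reproduces the documented (rank % G) arithmetic for four ranks on two nodes with two GPUs each. BioEM performs the actual assignment itself, and the node, rank and GPU counts are example values.

```bash
G=2                          # GPUs per node (example value)
NODES=2
RANKS=4
PER_NODE=$(( RANKS / NODES ))

gpu_for_rank() {  # $1 = MPI rank, $2 = GPUs per node
  echo $(( $1 % $2 ))
}

echo "Consecutive placement (node0, node0, node1, node1):"
r=0
while [ "$r" -lt "$RANKS" ]; do
  echo "  rank $r -> node$(( r / PER_NODE )), GPU $(gpu_for_rank "$r" "$G")"
  r=$(( r + 1 ))
done

echo "Round-robin placement (node0, node1, node0, node1):"
r=0
while [ "$r" -lt "$RANKS" ]; do
  echo "  rank $r -> node$(( r % NODES )), GPU $(gpu_for_rank "$r" "$G")"
  r=$(( r + 1 ))
done
```

With consecutive placement each node uses GPUs 0 and 1. With round-robin placement node0 only ever uses GPU 0 and node1 only GPU 1. How ranks are placed depends on the launcher; for example, Slurm's `srun --distribution=block` places consecutive ranks on the same node.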
GPUWORKLOAD=[x] (Default: 100) Only relevant if GPU=1. This defines the percentage of the workload processed by the GPU; more precisely, the percentage of particle-images processed by the GPU. The remaining particle-images are processed by the CPU. Preparation of projections and convolutions is performed by the CPU in any case.
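The split defined by GPUWORKLOAD can be illustrated with shell arithmetic. The helper `split_workload` is hypothetical. The rounding (integer division, i.e. the GPU share is rounded down) is an assumption, because this section does not specify how BioEM rounds.

```bash
# Hypothetical helper: number of particle-images on the GPU and on the CPU.
split_workload() {  # $1 = number of particle-images, $2 = GPUWORKLOAD (percent)
  gpu_part=$(( $1 * $2 / 100 ))   # integer division rounds down (assumption)
  cpu_part=$(( $1 - gpu_part ))   # the CPU processes the remainder
  echo "$gpu_part $cpu_part"
}

export GPU=1
export GPUWORKLOAD=80
split_workload 1000 "$GPUWORKLOAD"   # prints: 800 200
```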
GPUASYNC=[x] (Default: 1) Only relevant if GPU=1. This uses a pipeline to overlap the processing on the GPU, the preparation of projections and convolutions on the CPU, and the DMA transfer. There is no reason to disable this except for debugging purposes.
GPUDUALSTREAM=[x] (Default: 1) Only relevant if GPU=1. If set to 1, the GPU uses two streams in parallel, which can improve GPU utilization. Benchmarks have shown only a very small positive effect from this setting.
GPUALGO=[x] (Default: 2) This option is only relevant if GPU=1 and FFTALGO=0; since FFTALGO defaults to 1, it is rarely used. For the non-Fourier-algorithm there are three GPUALGO implementations:
– x=2: Parallelizes over both the particle-images and the center displacements. This approach requires less memory bandwidth than GPUALGO=0 or GPUALGO=1. However, it constrains the problem configuration: the number of center displacements per dimension i) must be a power of 2, and ii) must divide the number of CUDA threads per block.
– x=1: This will parallelize over the particle-images, and then loop over the center displacements on the GPU. It is usually slower than GPUALGO=2 but there are no constraints on the problem configuration. The particle-images are not processed all at once but in chunks.
– x=0: Like GPUALGO=1, except that all particle-images are processed at once. It is always slower than GPUALGO=1 and should no longer be used.
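Whether a configuration satisfies the two GPUALGO=2 constraints can be checked with a small shell function. This is a hypothetical pre-flight check, not part of BioEM. `THREADS_PER_BLOCK=256` is an assumed value; substitute the number of CUDA threads per block of your build.

```bash
THREADS_PER_BLOCK=256   # assumed value, check your BioEM build

# Hypothetical check: succeeds if GPUALGO=2 can be used.
gpualgo2_ok() {  # $1 = number of center displacements per dimension
  n=$1
  [ "$n" -gt 0 ] || return 1
  [ $(( n & (n - 1) )) -eq 0 ] || return 1            # i) power of 2
  [ $(( THREADS_PER_BLOCK % n )) -eq 0 ] || return 1  # ii) divides threads per block
}

for n in 8 12 512; do
  if gpualgo2_ok "$n"; then
    echo "$n: GPUALGO=2 possible"
  else
    echo "$n: constraints not met, use GPUALGO=1"
  fi
done
```

Here 8 passes both tests, 12 is not a power of 2, and 512 is a power of 2 but does not divide 256.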
BIOEM_PROJECTIONS_AT_ONCE=[x] (Default: 1) This defines the number of projections prepared at once; OpenMP is used to prepare these projections in parallel. Benchmarks have shown that the effect is negligible. It is mostly relevant if OpenMP is used, if no GPU is used, and/or if the number of reference particle-images is very small.
BIOEM_DEBUG_BREAK=[x] (Default: deactivated) This is a debugging option. It limits both the number of projections and the number of PSF convolutions to at most x. It can be used for profiling to analyze scaling, and for fast sanity tests.
BIOEM_DEBUG_NMAPS=[x] (Default: deactivated) Like BIOEM_DEBUG_BREAK, except that this limits the number of reference particle-images to at most x.
BIOEM_DEBUG_OUTPUT=[x] (Default: 2) Change the verbosity of the output. Higher means more output, lower means less output.
– 0: Stands for no debug output.
– 1: Limited timing output.
– 2: Standard timing output showing durations of projection, convolution, and cross-correlation comparison. Values above 1 add successively more extensive output.
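A quick sanity-test setup using the debugging options might look as follows. The limits are example values. It assumes, as the listed defaults suggest, that unsetting a variable restores its default; the BioEM command line is omitted.

```bash
# Quick sanity test (example values):
export BIOEM_DEBUG_BREAK=2   # at most 2 projections and at most 2 PSF convolutions
export BIOEM_DEBUG_NMAPS=10  # at most 10 reference particle-images
export BIOEM_DEBUG_OUTPUT=1  # limited timing output
# ... start BioEM from this shell as usual (command line omitted) ...

# Afterwards, unset the variables so that later runs in this shell are
# complete and use the default verbosity again.
unset BIOEM_DEBUG_BREAK BIOEM_DEBUG_NMAPS BIOEM_DEBUG_OUTPUT
```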
We recommend leaving the following settings at their defaults: FFTALGO (Default 1), GPUALGO (Default 2), GPUASYNC (Default 1), GPUDUALSTREAM (Default 1).
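To make sure that none of these settings is accidentally overridden, for example by a job script, a small check can be run before starting BioEM. The function `check_defaults` is hypothetical and only compares the current values against the defaults listed above.

```bash
# Hypothetical helper (not part of BioEM): report any of the four variables
# that is set to something other than its documented default.
check_defaults() {
  status=0
  for pair in FFTALGO=1 GPUALGO=2 GPUASYNC=1 GPUDUALSTREAM=1; do
    name=${pair%%=*}        # variable name, e.g. FFTALGO
    default=${pair#*=}      # documented default, e.g. 1
    eval "value=\${$name-}" # current value, empty if unset
    if [ -n "$value" ] && [ "$value" != "$default" ]; then
      echo "warning: $name=$value (recommended default: $default)" >&2
      status=1
    fi
  done
  return "$status"
}

# Unsetting the variables is the simplest way to get the defaults.
unset FFTALGO GPUALGO GPUASYNC GPUDUALSTREAM
check_defaults && echo "all recommended settings at their defaults"
```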