... | ... | @@ -6,6 +6,9 @@ We provide a list of the performance variables that enhance or modify the code |
|
|
- **GPU**=[x] (Default: 0) Set to 1 to enable GPU usage, set to 0 to use only the CPU. The GPU will be used to calculate the probabilities for all particle-images. The preparation of projections and PSF convolutions will be processed by the CPU. This is arranged in a pipeline to ensure continuous GPU utilization.
|
|
|
|
|
|
- **GPUALGO**=[x] (Default: 2) This option is only relevant if GPU=1 and FFTALGO=0. Hence, it is commonly not used, since FFTALGO defaults to 1. For the non-Fourier-algorithm there are three GPUALGO implementations:
|
|
|
– x=2: This will parallelize over the particle-images, and over the center displacements. The approach requires less memory bandwidth than GPUALGO=0 or GPUALGO=1. However, it has several constraints on the problem configuration: i) the number of center displacements per dimension must be a power of 2, and ii) must be a factor of the number of CUDA threads per block.
|
|
|
– x=1: This will parallelize over the particle-images, and then loop over the center displacements on the GPU. It is usually slower than GPUALGO=2 but there are no constraints on the problem configuration. The particle-images are not processed all at once but in chunks.
|
|
|
|
|
|
– x=2: This will parallelize over the particle-images, and over the center displacements. The approach requires less memory bandwidth than GPUALGO=0 or GPUALGO=1. However, it has several constraints on the problem configuration: i) the number of center displacements per dimension must be a power of 2, and ii) must be a factor of the number of CUDA threads per block.
|
|
|
|
|
|
– x=1: This will parallelize over the particle-images, and then loop over the center displacements on the GPU. It is usually slower than GPUALGO=2 but there are no constraints on the problem configuration. The particle-images are not processed all at once but in chunks.
|
|
|
|
|
|
– x=0: As GPUALGO=1, all particle images are processed at once. It is always slower than GPUALGO=1 and should not be used anymore. |
|
|
\ No newline at end of file |