- 31 Jul, 2017 1 commit
-
-
Pilar Cossio authored
profiling: improving NVTX profiling of CPU+GPU execution. See merge request !5
-
- 25 Jul, 2017 1 commit
-
-
Luka Stanisic authored
nvtx+summary: improving NVTX profiling by also tracing initialization, which fixes a potentially bogus first measurement of Projection in the summary
-
- 19 Jul, 2017 4 commits
-
-
Luka Stanisic authored
-
Pilar Cossio authored
bugfix (from valgrind): even when CC is not used, it should be initialized to a… See merge request !4
-
Luka Stanisic authored
To enable NVTX, use the undisclosed CMake option -DUSE_NVTX=ON (it is OFF by default)
-
Luka Stanisic authored
bugfix (from valgrind): even when CC is not used, it should be initialized to a value (e.g. 0?), as it is printed later. New code might also read this CC value by mistake, so it is better to keep it initialized
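The fix described in this commit message can be illustrated with a minimal C++ sketch. The struct `MapResult`, its fields, and `printResult` are hypothetical names chosen for illustration and are not taken from the BioEM sources; only the idea of giving CC a default value even when it is unused comes from the message itself.

```cpp
#include <cstdio>

// Hypothetical result record, for illustration only (not the BioEM type).
struct MapResult {
    double logProbability = 0.0;
    // CC gets a default value even when it is never computed, so that
    // printing it later never reads uninitialized memory (the kind of
    // read that valgrind reports).
    double CC = 0.0;
};

// Prints the record; this is safe even if CC was never assigned.
inline void printResult(const MapResult& r) {
    std::printf("logP=%g CC=%g\n", r.logProbability, r.CC);
}
```

With default member initializers, every newly constructed record starts from a defined state, so code that reads CC by mistake sees 0 rather than garbage.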
-
- 11 Jul, 2017 3 commits
-
-
Pilar Cossio authored
doc and copyright update See merge request !3
-
Luka Stanisic authored
-
Luka Stanisic authored
-
- 06 Jul, 2017 1 commit
-
-
Pilar Cossio authored
minor fixes See merge request !2
-
- 03 Jul, 2017 11 commits
-
-
Luka Stanisic authored
-
Luka Stanisic authored
-
Luka Stanisic authored
-
Luka Stanisic authored
-
Luka Stanisic authored
-
Luka Stanisic authored
-
Luka Stanisic authored
-
Luka Stanisic authored
-
Luka Stanisic authored
-
Luka Stanisic authored
-
Luka Stanisic authored
-
- 30 Jun, 2017 6 commits
-
-
Luka Stanisic authored
-
Luka Stanisic authored
-
Luka Stanisic authored
-
Luka Stanisic authored
-
Luka Stanisic authored
-
Luka Stanisic authored
-
- 23 Jun, 2017 3 commits
-
-
Luka Stanisic authored
-
Luka Stanisic authored
-
Luka Stanisic authored
-
- 22 Jun, 2017 1 commit
-
-
Luka Stanisic authored
-
- 21 Jun, 2017 1 commit
-
-
Luka Stanisic authored
-
- 20 Jun, 2017 6 commits
-
-
Luka Stanisic authored
REVERT: temporarily disabled fastest CUDA detection, since it strangely causes an error on the dvl machine (reverted from commit 1459580e)
-
Luka Stanisic authored
-
Luka Stanisic authored
adding older preliminary data for the initial Autotuning algorithm. Results were obtained on the dvl01 machine
-
Luka Stanisic authored
-
Luka Stanisic authored
-
Luka Stanisic authored
-
- 19 Jun, 2017 2 commits
-
-
Luka Stanisic authored
-
Luka Stanisic authored
WATCH OUT! This is additional CUDA timing profiling, activated by setting BIOEM_DEBUG_OUTPUT=4. This profiling is quite intrusive, as it adds extra synchronizations between the GPUs and OMP, which now work sequentially on the map comparison. If BIOEM_DEBUG_OUTPUT<4, this code is skipped and performance is back to normal
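The gating described in this commit message might look roughly like the following C++ sketch. It assumes BIOEM_DEBUG_OUTPUT is read from the environment, and the helpers `debugLevel` and `runMaybeTimed` are hypothetical names, not BioEM functions. The `sync` callback stands in for whatever device or thread synchronization the real code performs.

```cpp
#include <chrono>
#include <cstdlib>

// Hypothetical helper: read the debug level from the environment.
// Returns 0 when the variable is not set.
inline int debugLevel(const char* name = "BIOEM_DEBUG_OUTPUT") {
    const char* value = std::getenv(name);
    return value ? std::atoi(value) : 0;
}

// Runs work(). Only when level >= 4 does it synchronize before and after
// the work and measure the elapsed time; otherwise the timing path is
// skipped entirely and no extra synchronization is added.
// Returns the elapsed milliseconds, or -1.0 when timing was skipped.
template <typename Work, typename Sync>
double runMaybeTimed(int level, Work work, Sync sync) {
    if (level < 4) {
        work();
        return -1.0;
    }
    sync();  // make sure earlier asynchronous work is not counted
    const auto start = std::chrono::steady_clock::now();
    work();
    sync();  // wait for the timed work to finish
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

The two synchronization points are what make this kind of timing intrusive: they serialize work that would otherwise overlap, which is why the sketch avoids them below level 4.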
-