- [ ] Markus thinks that enabling autotuning with GPUWORKLOAD=-1, or without setting GPUWORKLOAD at all, might be the best idea
+ Pilar agrees, and this should be activated by default (GPUWORKLOAD='')
+ The final code should look like this before merging (keeping only one autotuning algorithm)
- [X] Discuss at the next meeting the possibility of recalibrating the workload during execution. For now there seems to be no need for it, but it can be added quite easily and cheaply in terms of performance overhead and development effort
+ Pilar believes that this can be useful, check with Markus
+ Propose an environment variable for this, as well as one for the number of comparisons needed to reach stable performance (STABILIZER)
+ Pilar agreed; do it that way
- [X] Check if there is another approach for Autotuning, described in Numerical Recipes book
+ Yes, with bisection; performance results will show whether this is better than the other approaches
- [X] Report error regarding documentation of the computation of GPUWORKLOAD
- [ ] Need to do a nice cleanup before merging into the main project
+ Check if it is OK to add workload information to the "TimeComparison" line
- [ ] Add nice printf for writing the Optimal Workload
+ Check if it is OK to add such info
- [ ] Add more profound CUDA profiling, possibly using specialized CUDA tools. We will certainly need it in the future when doing further development on BioEM
+ Already added debugging information
- [ ] Ensure that pinning is done correctly (in the Intel case there shouldn't be any problem)
- [ ] Strange error on the phys machine where no error should occur (just after the CUDA computation). It happens in bioem_cuda.cu:300, although the error code is 0 (which normally means cudaSuccess, i.e. no error). Temporarily disabled the check to make it work, but this needs to be investigated further
+ The code was 0 because cudaGetLastError was resetting the error code. Hence, using cudaPeekAtLastError() might be better
+ Actually cudaPeekAtLastError() shows that the error was CUDA_ERROR_INVALID_DEVICE
+ The error occurs in the first CUDA computing call inside the for loop, in /multComplexMap/
- [X] Still, it seems that initializing device 1 then 0 causes a problem (initializing 0 then 1 seems to be fine). Need to inspect this problem in more detail
+ Markus thinks that this may be related to the way CUDA is configured on the /dvl/ machine. They enabled the special MPS mode, which for some unknown reason is causing trouble for the BioEM code.
+ We will need to inspect this more and check this hypothesis
- No need for a large execution, as the problem is quite stable. So decrease the number of MPI nodes and the size of the problem.
- Actually, after the update draco seems to be in /powersave/ mode, so performance is quite stable. However, the workload value could now be quite different, as OMP is not as performant as it used to be
- [ ] Discuss with Christian about draco governor modes
- [X] Explain to Pilar in BioEM issue
*** TODO EXCLUSIVE_MODE on dvl GPUs issue
:LOGBOOK:
...
...
@@ -2162,3 +2170,10 @@ make: *** [all] Error 2
+ A bus error means trying to access memory that can't possibly be there: an address that's meaningless to the system, or the wrong kind of address for that operation.
- The problem was probably coming from CUDA, as the contexts were not properly started in DEFAULT MODE
- New fixes to the code resolved this issue
** 2017-06-30
*** TODO See about new classes with Markus [1/2]
:LOGBOOK:
- State "TODO" from [2017-06-30 Fri 18:17]
:END:
- [X] Is he OK with such an approach? Yes, both Markus and Pilar agree with this approach