+ Measure the per-iteration timings on the GPUs and CPUs, and derive the optimal balance from those measurements
+ Possibly rebalance every X iterations (or projections/orientations) if the balance has changed
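The idea above (split work proportionally to each device's measured throughput) can be sketched as follows. This is an illustrative sketch only; `optimal_split` and its arguments are hypothetical names, not part of the actual code base.

#+begin_src python
def optimal_split(n_items, times_per_item):
    """Divide n_items across devices proportionally to throughput.

    times_per_item: measured seconds per item for each device
    (e.g. one entry per GPU/CPU). Faster devices get larger shares.
    """
    throughputs = [1.0 / t for t in times_per_item]
    total = sum(throughputs)
    shares = [int(round(n_items * tp / total)) for tp in throughputs]
    # Fix rounding so the shares sum exactly to n_items
    shares[0] += n_items - sum(shares)
    return shares
#+end_src

For example, with 100 orientations and measured costs of 1.0 s/item on the GPU and 4.0 s/item on the CPU, the GPU gets 80 items and the CPU 20.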
** 2017-06-16
*** TODO Developing autotuning on dvl machine [2/4]
:LOGBOOK:
- State "TODO" from [2017-06-16 Fri 16:58]
:END:
- Added nicer error detection and print for the Driver CUDA errors
- The problem on the dvl device was related to the part that tests all CUDA devices in search of the fastest one. Commenting out this part made both MPI and non-MPI executions possible.
- [ ] It still seems that initializing device 1 before device 0 causes a problem (initializing device 0 then 1 appears to be fine). This needs to be inspected in more detail
- [X] When changing the code, constantly check that everything is still numerically correct. Possibly rely on the subtract_LogP.sh script available in Tutorial_Bio/MODEL_COMPARISON
+ When comparing only the first 20 models, there is a significant difference between the obtained Output_Probabilities and the ones Markus sent together with the inputs. Hence a separate reference file, Output_Probabilities_20_ref, was created
- [X] When changing the workload during execution, we never got the best performance for that workload (compared to the same workload tested without autotuning). This was solved by calling deviceFinishRun() followed by deviceStartRun(). These functions introduce overhead, but they seem to be necessary
- [ ] Create a script for typical runs
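The rebalancing flow worked out above (periodically re-measure timings and, only when the balance actually changes, pay the deviceFinishRun()/deviceStartRun() restart overhead) can be sketched as below. All names besides deviceFinishRun/deviceStartRun are hypothetical, and those two are passed in as stand-in callables rather than the real implementations.

#+begin_src python
REBALANCE_PERIOD = 10  # "every X iterations"; the value is illustrative

def run(iterations, measure, rebalance, device_finish_run, device_start_run):
    """Drive the main loop, rebalancing every REBALANCE_PERIOD iterations.

    measure()          -> current per-device timings
    rebalance(timings) -> new workload balance derived from the timings
    The device restart (finish + start) is performed only when the
    balance actually changes, since it introduces overhead.
    """
    balance = None
    for i in range(iterations):
        if i % REBALANCE_PERIOD == 0:
            new_balance = rebalance(measure())
            if new_balance != balance:
                if balance is not None:
                    device_finish_run()  # pay the restart overhead only
                    device_start_run()   # when the balance has moved
                balance = new_balance
        # ... process iteration i with the current balance ...
    return balance
#+end_src

With stable timings the balance never changes after the first setting, so no restarts occur; the restart pair fires only when rebalance() returns a different split.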
*** Simple analysis of the results
#+begin_src R :results output graphics :file (org-babel-temp-file "figure" ".png") :exports both :width 600 :height 400 :session org-R