Commit 50bf1803 authored by Luka Stanisic

updating labbook

parent 62b893fe
@@ -445,7 +445,7 @@ mpiexec -n 8 $BUILD_DIR/bioEM --Inputfile $INPUT_DIR/INPUT_FRH_Sep2016 --Modelfi
- Later just do /llsubmit batch_script/
*** [working] new recipes for draco
**** Installation with Intel modules
#+BEGIN_SRC
module purge
module load git/2.13
@@ -471,7 +471,9 @@ cmake -DMPI_C_COMPILER=mpiicc -DMPI_CXX_COMPILER=mpiicpc -DCMAKE_CXX_COMPILER=ic
make -j5 VERBOSE=1
#+END_SRC
**** Installation with gcc modules
CUDA_HOST_COMPILER needs to be set manually to gcc/5.4 beforehand, as CUDA was compiled with that gcc version
#+BEGIN_SRC
#set (CUDA_HOST_COMPILER /mpcdf/soft/SLES122/common/gcc/5.4.0/bin/gcc)
# # Needs to be set instead of
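# A hypothetical equivalent on the cmake command line (standard
# CUDA_HOST_COMPILER cache variable; gcc path taken from the line above):
#   cmake -DCUDA_HOST_COMPILER=/mpcdf/soft/SLES122/common/gcc/5.4.0/bin/gcc ..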
@@ -597,6 +599,7 @@ srun -n 4 $BUILD_DIR/bioEM --Inputfile $TUTORIAL_DIR/Param_Input --Modelfile $TU
# mail notifications go to this address
#$ -M luka.stanisic@rzg.mpg.de
# Intel MPI (hydra) parallel environment with 96 slots
#$ -pe impi_hydra 96
# hard runtime limit of one hour
#$ -l h_rt=01:00:00
# GPU project and GPU resource request (site-specific resources)
#$ -P gpu
#$ -l use_gpus=1
# reservation, currently disabled
##$ -R yes
@@ -1693,8 +1696,9 @@ ggplot(df, aes(x=Workload, y=Time, fill=Workload)) + geom_bar(stat="identity") +
#+RESULTS:
[[file:analysis/results1_analysis.pdf]]
** 2017-06-19
*** TODO Autotuning [7/8]:
:LOGBOOK:
- State "TODO" from "TODO" [2017-07-03 Mon 08:51]
- State "TODO" from "TODO" [2017-06-30 Fri 10:49]
- State "TODO" from "TODO" [2017-06-30 Fri 10:49]
- State "TODO" from "TODO" [2017-06-23 Fri 15:46]
@@ -1721,9 +1725,10 @@ ggplot(df, aes(x=Workload, y=Time, fill=Workload)) + geom_bar(stat="identity") +
- [ ] Markus thinks that enabling autotuning with GPUWORKLOAD=-1, or without setting GPUWORKLOAD at all, might be the best idea
+ Pilar agrees, and this should be activated by default (GPUWORKLOAD='')
+ The final code should look like this before merging (keeping only one autotuning algorithm)
- [X] Discuss at the following meeting the possibility of recalibrating the workload during execution. For now there seems to be no need for this, but it can be added quite easily and cheaply in terms of performance overhead and code development
+ Pilar believes that this can be useful, check with Markus
+ Propose an environment variable for this, as well as one for the number of comparisons needed to reach stable performance (STABILIZER)
+ Pilar agreed, so do it that way
- [X] Check if there is another approach for Autotuning, described in the Numerical Recipes book
+ Yes, with bisection; performance results will show whether this is better than the other approaches (see the sketch after this list)
- [X] Report error regarding documentation of the computation of GPUWORKLOAD
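A hedged sketch of how the GPUWORKLOAD default and the bisection autotuning could fit together. The cost model and all names below are toy stand-ins, not the actual BioEM implementation:
#+BEGIN_SRC
#include <stdio.h>
#include <stdlib.h>

/* Toy cost model (NOT a real measurement): the GPU is ~3x faster than
 * the CPU pool, so the balance point is at 75%. */
static double timeImbalance(int workload)
{
  double gpuTime = workload / 3.0;
  double cpuTime = 100.0 - workload;
  return gpuTime - cpuTime;
}

/* Bisection over the workload percentage: the optimum is where GPU and
 * CPU finish simultaneously, i.e. where the imbalance changes sign. */
static int autotuneWorkload(void)
{
  int lo = 0, hi = 100;
  while (hi - lo > 1) {
    int mid = (lo + hi) / 2;
    if (timeImbalance(mid) > 0.0)
      hi = mid; /* GPU is the bottleneck: shrink its share */
    else
      lo = mid; /* CPU is the bottleneck: grow the GPU share */
  }
  return lo;
}

int main(void)
{
  /* Autotune when GPUWORKLOAD is unset, empty, or -1; honour it otherwise. */
  const char *env = getenv("GPUWORKLOAD");
  int workload = (env && *env) ? atoi(env) : -1;
  if (workload == -1)
    workload = autotuneWorkload();
  printf("GPU workload: %d%%\n", workload);
  return 0;
}
#+END_SRC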
@@ -1747,7 +1752,9 @@ ggplot(df, aes(x=Workload, y=Time, fill=Workload)) + geom_bar(stat="identity") +
- [ ] Need to do a nice cleanup before merging into the main project
+ Check if it is OK to add workload information to the "TimeComparison" line
- [ ] Add a nice printf for reporting the optimal workload
+ Check if it is OK to add such info
- [ ] Add more profound CUDA profiling, possibly using specialized CUDA tools. We will certainly need it in the future when doing more development on BioEM
+ Already added debugging information
- [ ] Ensure that pinning is done correctly (in the Intel case there shouldn't be any problem); a quick standalone check is sketched below
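A tiny standalone affinity check that could help verify the pinning (plain Linux API, not BioEM code):
#+BEGIN_SRC
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Print which cores this process is allowed to run on. */
int main(void)
{
  cpu_set_t mask;
  CPU_ZERO(&mask);
  if (sched_getaffinity(0, sizeof mask, &mask) != 0) {
    perror("sched_getaffinity");
    return 1;
  }
  printf("allowed cores:");
  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
    if (CPU_ISSET(cpu, &mask))
      printf(" %d", cpu);
  printf("\n");
  return 0;
}
#+END_SRC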
*** DONE Simple analysis of the result2
@@ -1807,6 +1814,7 @@ ggplot(df, aes(x=Algorithm, y=Time, fill=Workload)) + geom_bar(stat="identity")
- [ ] Strange error on the phys machine where no error is supposed to occur (just after the CUDA computation). It happens in bioem_cuda.cu:300, although the error code is 0 (which normally means cudaSuccess, hence no error). Temporarily disabled the check to make it work, but this needs to be investigated further
+ The code was 0 because cudaGetLastError was resetting the error state; hence, using cudaPeekAtLastError() might be better (see the snippet after this list)
+ Actually, cudaPeekAtLastError() shows that the error was CUDA_ERROR_INVALID_DEVICE
+ The error occurs in the first CUDA computing call inside the for loop, in the /multComplexMap/ call
- [X] Still, it seems that initializing device 1 and then 0 causes a problem (initializing 0 and then 1 seems to be fine). Need to inspect this problem in more detail
+ Markus thinks that this may be related to the way CUDA is configured on the /dvl/ machine. They enabled the special MPS mode, which for some unknown reason is causing trouble for the BioEM code.
+ We will need to inspect this more and check this hypothesis
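A minimal standalone illustration of the difference between the two error calls (toy program, not the BioEM code; the error is provoked with an invalid device id):
#+BEGIN_SRC
#include <stdio.h>
#include <cuda_runtime.h>

/* cudaGetLastError() returns the sticky error state AND resets it to
 * cudaSuccess; cudaPeekAtLastError() only reads it. */
int main(void)
{
  cudaSetDevice(9999); /* provokes cudaErrorInvalidDevice */
  printf("peek:         %s\n", cudaGetErrorString(cudaPeekAtLastError()));
  printf("peek again:   %s\n", cudaGetErrorString(cudaPeekAtLastError()));
  printf("get (resets): %s\n", cudaGetErrorString(cudaGetLastError()));
  printf("get again:    %s\n", cudaGetErrorString(cudaGetLastError()));
  return 0;
}
#+END_SRC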
@@ -1970,7 +1978,7 @@ ggplot(df, aes(x=Algorithm, y=Time, fill=Workload)) + geom_bar(stat="identity")
- No need for a large execution, as the problem is quite stable. So decrease the number of MPI nodes and the size of the problem.
- Actually, after the update draco seems to be in /powersave/ mode, so performance is quite stable. However, the workload value could now be quite different, as OMP is not as performant as it used to be
- [ ] Discuss with Christian about draco governor modes (a quick way to check the current governor is sketched below)
- [X] Explain to Pilar in the BioEM issue
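A minimal way to read the current cpufreq governor (standard Linux sysfs path; whether the entry exists depends on the machine):
#+BEGIN_SRC
#include <stdio.h>

int main(void)
{
  char governor[64] = "unavailable";
  FILE *f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor", "r");
  if (f) {
    if (fscanf(f, "%63s", governor) != 1)
      snprintf(governor, sizeof governor, "unreadable");
    fclose(f);
  }
  printf("cpu0 governor: %s\n", governor);
  return 0;
}
#+END_SRC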
*** TODO EXCLUSIVE_MODE on dvl GPUs issue
:LOGBOOK:
@@ -2162,3 +2170,10 @@ make: *** [all] Error 2
+ A bus error means trying to access memory that can't possibly be there: you've used an address that's meaningless to the system, or the wrong kind of address for that operation
- The problem was probably coming from CUDA, as the contexts were not properly created in DEFAULT mode (see the compute-mode query sketched below)
- New fixes to the code resolved this issue
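To check the MPS / compute-mode hypothesis, the mode each device is actually in can be queried directly. A minimal sketch using the standard CUDA runtime attribute API (not BioEM code):
#+BEGIN_SRC
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
  int n = 0;
  cudaGetDeviceCount(&n);
  for (int dev = 0; dev < n; ++dev) {
    int mode = 0;
    /* 0 = DEFAULT, 1 = EXCLUSIVE, 2 = PROHIBITED, 3 = EXCLUSIVE_PROCESS */
    cudaDeviceGetAttribute(&mode, cudaDevAttrComputeMode, dev);
    printf("device %d: compute mode %d\n", dev, mode);
  }
  return 0;
}
#+END_SRC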
** 2017-06-30
*** TODO See about new classes with Markus [1/2]
:LOGBOOK:
- State "TODO" from [2017-06-30 Fri 18:17]
:END:
- [X] Is he OK with such an approach? Yes, both Markus and Pilar agree with it
- [ ] What about the copyright (and my name)?