Merge branch 'BioEM-1.0' into 'master'

profiling: improve NVTX profiling of CPU+GPU execution

See merge request !5