The code should now work for clusters with an arbitrary number of CPUs per node.
assigned to @sebak
merged
mentioned in commit 3fabdf72