Faster kernels for assembling mass matrix
Solves the following issue(s):
Partially solves #253 (closed)
Core changes:
Improve kernel_3d_mat by using smaller arrays to store a part, or a single dimension, of a multidimensional array.
The kernel's execution time dropped by roughly a third, from 0.2535 s to 0.1672 s per execution in my variational tests (a speedup of about 1.5×).
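The actual code of kernel_3d_mat is not shown here, so the following is only a hypothetical sketch of the general idea, written in Python. It assumes a layout `vals[e, i, q]` (basis function `i` evaluated at quadrature point `q` of element `e`) and per-element quadrature weights `weights[e, q]`; the function names and layout are illustrative, not the project's. The naive version indexes the full 3D array in the innermost loop. The "local" version first copies the slice for one element into a smaller 2D array and keeps one weighted row in a small 1D buffer, so the inner loop only touches short, contiguous arrays.

```python
import numpy as np

def assemble_naive(vals, weights):
    """Per-element mass matrices mats[e, i, j] = sum_q vals[e,i,q] * vals[e,j,q] * weights[e,q].
    Hypothetical layout: vals has shape (n_elements, n_basis, n_quad)."""
    ne, nb, nq = vals.shape
    mats = np.zeros((ne, nb, nb))
    for e in range(ne):
        for i in range(nb):
            for j in range(nb):
                s = 0.0
                for q in range(nq):
                    # Three-index lookups into the full 3D array on every iteration.
                    s += vals[e, i, q] * vals[e, j, q] * weights[e, q]
                mats[e, i, j] = s
    return mats

def assemble_local(vals, weights):
    """Same result, but works on smaller local arrays holding part of the 3D data."""
    ne, nb, nq = vals.shape
    mats = np.zeros((ne, nb, nb))
    wrow = np.empty(nq)  # small 1D buffer: one basis row already multiplied by the weights
    for e in range(ne):
        v_e = vals[e].copy()     # contiguous 2D slice for this element only
        w_e = weights[e].copy()  # 1D slice of this element's weights
        for i in range(nb):
            for q in range(nq):
                wrow[q] = v_e[i, q] * w_e[q]
            for j in range(nb):
                v_j = v_e[j]  # 1D row of the local slice
                s = 0.0
                for q in range(nq):
                    s += wrow[q] * v_j[q]
                mats[e, i, j] = s
    return mats
```

Both functions give the same matrices up to floating-point rounding. In interpreted Python the speed difference is negligible. The benefit of this kind of restructuring appears in compiled kernels, where short, contiguous local arrays are more likely to stay in cache and the weight multiplication is hoisted out of the `j` loop.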
Model-specific changes:
Fix a small mistake in the variational propagator: the error was computed after the update, so it was always close to zero instead of reflecting the real error.
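The propagator's code is not shown, so this is a hypothetical Python sketch of one common form of this bug. It assumes the error is measured as the norm of the change between successive iterates; `step_buggy`, `step_fixed` and the toy `update` function are illustrative names, not the project's. Once the state has been overwritten with the updated value, comparing "new" against "current" compares a value with itself.

```python
import numpy as np

def step_buggy(update, x):
    x_new = update(x)
    x = x_new                        # state overwritten first...
    err = np.linalg.norm(x_new - x)  # ...so this compares x_new with itself: always 0
    return x, err

def step_fixed(update, x):
    x_new = update(x)
    err = np.linalg.norm(x_new - x)  # error measured against the previous iterate
    x = x_new                        # state updated only after the error is known
    return x, err

# Toy update with fixed point 2.0 (hypothetical stand-in for the propagator step).
update = lambda x: 0.5 * x + 1.0
x0 = np.zeros(3)
_, err_buggy = step_buggy(update, x0)
_, err_fixed = step_fixed(update, x0)
```

Starting from zeros, the fixed step reports an error of sqrt(3) (every component moves from 0 to 1), while the buggy step reports 0 even though the iterate changed.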
Documentation changes:
None