Implement more "apply" variants for mav/fmav classes
mav and fmav should have full support for unary, binary and ternary element-wise operations, with variants that
- leave all arguments unchanged and produce no output array (typically reductions)
- leave arguments unchanged and create a new array, e.g. for +, -, * (perhaps taking an extra template argument to specify the element type of the new array)
- change the leading argument (*=, +=, etc.).
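A rough sketch of how the three variants might look for the binary case. This is not the existing mav/fmav interface: `View`, `for_each_index`, `apply_const`, `apply_new` and `apply_inplace` are hypothetical names used only for illustration, and the traversal is deliberately naive (it recomputes every offset from the full index).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for fmav: a non-owning strided view (strides in elements).
template<typename T> struct View
  {
  T *data;
  std::vector<std::size_t> shape;
  std::vector<std::ptrdiff_t> stride;
  // Element access via a full multi-index.
  T &at(const std::vector<std::size_t> &idx) const
    {
    std::ptrdiff_t ofs = 0;
    for (std::size_t d=0; d<idx.size(); ++d)
      ofs += std::ptrdiff_t(idx[d])*stride[d];
    return data[ofs];
    }
  };

// Calls func(idx) for every multi-index of `shape`, last axis varying fastest.
template<typename Func> void for_each_index(const std::vector<std::size_t> &shape, Func func)
  {
  std::size_t total = 1;
  for (auto n: shape) total *= n;
  std::vector<std::size_t> idx(shape.size(), 0);
  for (std::size_t cnt=0; cnt<total; ++cnt)
    {
    func(idx);
    for (std::size_t d=shape.size(); d-- > 0;)  // odometer-style increment
      {
      if (++idx[d]<shape[d]) break;
      idx[d] = 0;
      }
    }
  }

// Variant 1: arguments unchanged, no output array (func accumulates via captures).
template<typename T1, typename T2, typename Func>
void apply_const(const View<const T1> &a, const View<const T2> &b, Func func)
  {
  assert(a.shape==b.shape);
  for_each_index(a.shape, [&](const std::vector<std::size_t> &idx)
    { func(a.at(idx), b.at(idx)); });
  }

// Variant 2: arguments unchanged, result returned as a new C-order buffer
// whose element type Tout is given as an extra template argument.
template<typename Tout, typename T1, typename T2, typename Func>
std::vector<Tout> apply_new(const View<const T1> &a, const View<const T2> &b, Func func)
  {
  assert(a.shape==b.shape);
  std::vector<Tout> res;
  for_each_index(a.shape, [&](const std::vector<std::size_t> &idx)
    { res.push_back(func(a.at(idx), b.at(idx))); });
  return res;
  }

// Variant 3: the leading argument is modified in place (+=, *=, ...).
template<typename T1, typename T2, typename Func>
void apply_inplace(const View<T1> &a, const View<const T2> &b, Func func)
  {
  assert(a.shape==b.shape);
  for_each_index(a.shape, [&](const std::vector<std::size_t> &idx)
    { func(a.at(idx), b.at(idx)); });
  }
```

With this layout a reduction such as a dot product would pass a lambda that captures its accumulator by reference to `apply_const`, while `a += b` would be `apply_inplace(a, b, [](double &u, const double &v) { u += v; })`.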
Special shortcuts could be used if
- all arrays are contiguous with equal strides (just use a single loop over all elements)
- all arrays have equal strides (only calculate the offset once)
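The two shortcuts could be selected roughly as follows. This is only a sketch under some assumptions: the arrays are passed as raw pointers plus shape and stride vectors rather than as mav/fmav objects, contiguity is checked for C order only (a real implementation might accept any dense layout), and the `Path` return value exists purely to make the chosen branch visible. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using shape_t = std::vector<std::size_t>;
using stride_t = std::vector<std::ptrdiff_t>;

// Conservative check for a dense C-order (row-major) layout.
inline bool is_c_contiguous(const shape_t &shape, const stride_t &stride)
  {
  std::ptrdiff_t expected = 1;
  for (std::size_t d=shape.size(); d-- > 0;)
    {
    if (stride[d]!=expected) return false;
    expected *= std::ptrdiff_t(shape[d]);
    }
  return true;
  }

// Offset (in elements) of multi-index idx under the given strides.
inline std::ptrdiff_t offset(const shape_t &idx, const stride_t &stride)
  {
  std::ptrdiff_t ofs = 0;
  for (std::size_t d=0; d<idx.size(); ++d)
    ofs += std::ptrdiff_t(idx[d])*stride[d];
  return ofs;
  }

enum class Path { flat, shared_offset, general };

// In-place binary apply that picks the cheapest applicable traversal.
template<typename T, typename Func>
Path apply_inplace(T *a, const stride_t &sa, const T *b, const stride_t &sb,
  const shape_t &shape, Func func)
  {
  std::size_t total = 1;
  for (auto n: shape) total *= n;
  const bool shared = (sa==sb);
  // Shortcut 1: equal strides and contiguous, so one flat loop suffices.
  if (shared && is_c_contiguous(shape, sa))
    {
    for (std::size_t i=0; i<total; ++i) func(a[i], b[i]);
    return Path::flat;
    }
  shape_t idx(shape.size(), 0);
  for (std::size_t cnt=0; cnt<total; ++cnt)
    {
    const std::ptrdiff_t oa = offset(idx, sa);
    // Shortcut 2: equal strides, so the offset is computed only once.
    const std::ptrdiff_t ob = shared ? oa : offset(idx, sb);
    func(a[oa], b[ob]);
    for (std::size_t d=shape.size(); d-- > 0;)  // odometer-style increment
      {
      if (++idx[d]<shape[d]) break;
      idx[d] = 0;
      }
    }
  return shared ? Path::shared_offset : Path::general;
  }
```

The layout checks run once per call, so their cost is independent of the number of elements.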
If possible, the implementation should process the smallest strides in the innermost loop.
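One possible way to get the smallest strides into the innermost loop is to permute the axes before looping. The sketch below is an assumption about how this could be done, not a description of existing code: `loop_order` and `permuted` are hypothetical helpers, and the ordering is derived from the strides of a single array (with several arrays, some combined criterion would be needed).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <numeric>
#include <vector>

using shape_t = std::vector<std::size_t>;
using stride_t = std::vector<std::ptrdiff_t>;

// Returns the axes sorted by decreasing |stride|, so that the last entry
// (the innermost loop) is the axis with the smallest stride.
inline std::vector<std::size_t> loop_order(const stride_t &stride)
  {
  std::vector<std::size_t> perm(stride.size());
  std::iota(perm.begin(), perm.end(), std::size_t(0));
  std::stable_sort(perm.begin(), perm.end(),
    [&stride](std::size_t i, std::size_t j)
      { return std::abs(stride[i]) > std::abs(stride[j]); });
  return perm;
  }

// Reorders a shape or stride vector according to perm.
template<typename V> V permuted(const V &v, const std::vector<std::size_t> &perm)
  {
  V res(v.size());
  for (std::size_t k=0; k<perm.size(); ++k) res[k] = v[perm[k]];
  return res;
  }
```

The same permutation has to be applied to the shape and to the strides of every participating array. Since the operations are element-wise, the order in which elements are visited does not affect the result.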