Implement more "apply" variants for mav/fmav classes
fmav should have full support for unary, binary and ternary element-wise operations, with variants that:
- leave all arguments unchanged and produce no output array (typically reductions)
- leave all arguments unchanged and create a new array (+, -, * etc.), perhaps taking an extra template argument to specify the element type of the new array
- change the leading argument (*=, +=, etc.).
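
To make the three variants concrete, here is a rough sketch for the binary case. It does not use the real mav/fmav interface: `Arr`, `for_each_offset`, `apply_reduce`, `apply_new` and `apply_inplace` are all hypothetical names chosen for illustration, and strides are assumed to be counted in elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for fmav: owning buffer, shape, and strides in elements.
template<typename T> struct Arr {
  std::vector<std::size_t> shape;
  std::vector<std::ptrdiff_t> stride;
  std::vector<T> data;

  // Allocates a C-contiguous (row-major) array of the given shape.
  explicit Arr(const std::vector<std::size_t> &shp)
    : shape(shp), stride(shp.size()) {
    std::size_t n = 1;
    for (std::size_t i = shp.size(); i-- > 0;) {
      stride[i] = std::ptrdiff_t(n);
      n *= shp[i];
    }
    data.resize(n);
  }
};

// Visit every multi-index in row-major order and pass one offset per array.
template<typename Func>
void for_each_offset(const std::vector<std::size_t> &shape,
                     const std::vector<std::ptrdiff_t> &sa,
                     const std::vector<std::ptrdiff_t> &sb,
                     std::size_t dim, std::ptrdiff_t oa, std::ptrdiff_t ob,
                     Func &&f) {
  if (dim == shape.size()) { f(oa, ob); return; }
  for (std::size_t i = 0; i < shape[dim]; ++i)
    for_each_offset(shape, sa, sb, dim + 1,
                    oa + std::ptrdiff_t(i) * sa[dim],
                    ob + std::ptrdiff_t(i) * sb[dim], f);
}

// Variant 1: arguments unchanged, no output array (the functor accumulates).
template<typename T, typename Func>
void apply_reduce(const Arr<T> &a, const Arr<T> &b, Func &&f) {
  assert(a.shape == b.shape);
  for_each_offset(a.shape, a.stride, b.stride, 0, 0, 0,
    [&](std::ptrdiff_t oa, std::ptrdiff_t ob) {
      f(a.data[std::size_t(oa)], b.data[std::size_t(ob)]);
    });
}

// Variant 2: arguments unchanged, result goes into a new array whose
// element type Tout is given as an extra template argument.
template<typename Tout, typename T, typename Func>
Arr<Tout> apply_new(const Arr<T> &a, const Arr<T> &b, Func &&f) {
  assert(a.shape == b.shape);
  Arr<Tout> out(a.shape);
  std::size_t k = 0;  // output is row-major, so a running index is enough
  for_each_offset(a.shape, a.stride, b.stride, 0, 0, 0,
    [&](std::ptrdiff_t oa, std::ptrdiff_t ob) {
      out.data[k++] = f(a.data[std::size_t(oa)], b.data[std::size_t(ob)]);
    });
  return out;
}

// Variant 3: the leading argument is modified in place (+=, *=, ...).
template<typename T, typename Func>
void apply_inplace(Arr<T> &a, const Arr<T> &b, Func &&f) {
  assert(a.shape == b.shape);
  for_each_offset(a.shape, a.stride, b.stride, 0, 0, 0,
    [&](std::ptrdiff_t oa, std::ptrdiff_t ob) {
      f(a.data[std::size_t(oa)], b.data[std::size_t(ob)]);
    });
}
```

With something like this, a dot product would be `apply_reduce(a, b, [&](int x, int y){ sum += x*y; })`, an addition with promotion `apply_new<double>(a, b, ...)`, and `+=` would be `apply_inplace(a, b, [](int &x, int y){ x += y; })`. Unary and ternary versions would follow the same pattern with one or three offsets.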
Special shortcuts could be used if:
- all arrays are contiguous with equal strides (just use a single loop over all elements)
- all arrays have equal strides (only calculate the offset once)
If possible, the implementation should process the smallest strides in the innermost loop.
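
A possible shape for the shortcuts and the loop ordering, again only as a sketch on raw pointers rather than on the actual fmav class. `loop_order`, `is_contiguous`, `walk` and `apply_strided` are hypothetical names, strides are assumed to be in elements, and taking the loop order from the leading argument in the general case is just one possible choice.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <numeric>
#include <vector>

using Shape = std::vector<std::size_t>;
using Stride = std::vector<std::ptrdiff_t>;

// Dimension order with the largest |stride| first, so that the innermost
// loop (last entry) runs over the smallest stride.
std::vector<std::size_t> loop_order(const Stride &str) {
  std::vector<std::size_t> idx(str.size());
  std::iota(idx.begin(), idx.end(), std::size_t(0));
  std::stable_sort(idx.begin(), idx.end(),
    [&](std::size_t x, std::size_t y) { return std::abs(str[x]) > std::abs(str[y]); });
  return idx;
}

// Conservative check: true if the strides describe a dense block in some
// axis order (may report false for unusual strides on length-1 axes).
bool is_contiguous(const Shape &shape, const Stride &str,
                   const std::vector<std::size_t> &order) {
  std::ptrdiff_t expected = 1;
  for (std::size_t l = order.size(); l-- > 0;) {
    const std::size_t dim = order[l];
    if (str[dim] != expected) return false;
    expected *= std::ptrdiff_t(shape[dim]);
  }
  return true;
}

// Nested loops in the given order, tracking N offsets incrementally.
template<std::size_t N, typename Func>
void walk(const Shape &shape, const std::array<const Stride *, N> &str,
          const std::vector<std::size_t> &order, std::size_t level,
          std::array<std::ptrdiff_t, N> ofs, Func &f) {
  if (level == order.size()) { f(ofs); return; }
  const std::size_t dim = order[level];
  for (std::size_t i = 0; i < shape[dim]; ++i) {
    walk<N>(shape, str, order, level + 1, ofs, f);
    for (std::size_t k = 0; k < N; ++k) ofs[k] += (*str[k])[dim];
  }
}

// In-place binary apply: f(a_elem, b_elem) for every element.
template<typename T, typename Func>
void apply_strided(T *a, const Stride &sa, const T *b, const Stride &sb,
                   const Shape &shape, Func f) {
  assert(sa.size() == shape.size() && sb.size() == shape.size());
  const auto order = loop_order(sa);  // ordering taken from the leading array
  if (sa == sb) {
    if (is_contiguous(shape, sa, order)) {
      // Shortcut 1: contiguous with equal strides, one flat loop.
      std::size_t n = 1;
      for (auto s : shape) n *= s;
      for (std::size_t i = 0; i < n; ++i) f(a[i], b[i]);
      return;
    }
    // Shortcut 2: equal strides, a single shared offset.
    auto g = [&](const std::array<std::ptrdiff_t, 1> &o) { f(a[o[0]], b[o[0]]); };
    walk<1>(shape, std::array<const Stride *, 1>{{&sa}}, order, 0,
            std::array<std::ptrdiff_t, 1>{{0}}, g);
    return;
  }
  // General case: one offset per array.
  auto g = [&](const std::array<std::ptrdiff_t, 2> &o) { f(a[o[0]], b[o[1]]); };
  walk<2>(shape, std::array<const Stride *, 2>{{&sa, &sb}}, order, 0,
          std::array<std::ptrdiff_t, 2>{{0, 0}}, g);
}
```

The flat loop in shortcut 1 visits elements in memory order rather than index order, which is fine for element-wise operations but would matter for any functor that depends on visiting order.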