Suspiciously good performance ...
When running bench.py, it seems that (at least on my hardware) pypocketfft very often performs better than FFTW with "FFTW_MEASURE" planning for multi-D transforms.
I verified that my libfftw.so was compiled with AVX support, so I'm really confused about what's going on.
@g-peterbell any ideas? The code cannot be that good :P
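When a result looks too good, two things are worth ruling out. The first is that the timings measure different work, for example FFTW_MEASURE planning time ending up inside the timed region. The second is that the two transforms don't actually produce the same output. Below is a minimal, stdlib-only sketch of a harness that keeps planning out of the timed loop and checks agreement. `time_execution`, `max_abs_diff` and the `naive_dft` stand-in are hypothetical helpers for illustration. They are not part of bench.py, pypocketfft or FFTW.

```python
import cmath
import time
from typing import Callable, Sequence


def time_execution(make_transform: Callable[[], Callable[[], object]],
                   repeats: int = 5, inner: int = 10) -> float:
    """Return the best per-call execution time in seconds.

    make_transform() does any planning (e.g. building an FFTW_MEASURE plan)
    and is deliberately NOT timed; only calls to the returned function are.
    """
    run = make_transform()  # planning step, excluded from timing
    run()                   # warm-up call (caches, lazy initialisation)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        for _ in range(inner):
            run()
        best = min(best, (time.perf_counter() - t0) / inner)
    return best


def max_abs_diff(a: Sequence[complex], b: Sequence[complex]) -> float:
    """Largest elementwise |a - b|; a fast result only counts if this is tiny."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def naive_dft(x: Sequence[complex]) -> list:
    """Hypothetical O(n^2) reference DFT so the sketch runs without extra packages."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]


if __name__ == "__main__":
    data = [complex(i % 7, i % 3) for i in range(64)]
    t = time_execution(lambda: (lambda: naive_dft(data)), repeats=3, inner=2)
    print(f"naive_dft: {t * 1e3:.3f} ms per call")
```

To compare the real libraries, `make_transform` would build the FFTW plan and return it as the callable. With pyFFTW, assuming that is the wrapper in use, that could be `pyfftw.builders.fftn(a, planner_effort='FFTW_MEASURE')`. The pypocketfft side would just return a closure over its transform call, and `max_abs_diff` on the flattened outputs confirms both sides computed the same thing.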