Suspiciously good performance ...

When running bench.py, it seems that (at least on my hardware) pypocketfft very often outperforms FFTW with "FFTW_MEASURE" planning on multi-D transforms. I verified that my libfftw.so was compiled with AVX support, so I'm really confused about what's going on.
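Before suspecting the transform code itself, it may be worth ruling out the measurement. The following is a minimal sketch, and it assumes nothing about how bench.py is actually structured. It times only the execution step, after any planning (such as FFTW_MEASURE) and one warm-up call, and reports the best of several repeats. `best_time` is a hypothetical helper and is not part of either library.

```python
import time
from typing import Callable


def best_time(fn: Callable[[], object], repeat: int = 5, number: int = 10) -> float:
    """Return the best average per-call time (seconds) over `repeat` runs of `number` calls.

    Hypothetical helper: any setup (e.g. creating an FFTW plan) is assumed to
    happen *before* this is called, so only execution is measured.
    """
    if repeat < 1 or number < 1:
        raise ValueError("repeat and number must be >= 1")
    fn()  # one warm-up call, so first-touch/caching effects are not timed
    best = float("inf")
    for _ in range(repeat):
        t0 = time.perf_counter()
        for _ in range(number):
            fn()
        # average time per call in this repeat; keep the fastest repeat
        best = min(best, (time.perf_counter() - t0) / number)
    return best


if __name__ == "__main__":
    # Stand-in workload; in a real comparison this would be a zero-argument
    # callable that only executes an already-created transform/plan.
    print(best_time(lambda: sum(range(10_000))))
```

For a fair comparison, both callables would use the same array shape, dtype, and thread count, and neither would include plan creation inside the timed region.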

@g-peterbell any ideas? The code cannot be that good :P