Export good_size using raw C-API
pybind11 adds noticeable overhead to each function call: ~200ns, vs. just ~30ns for a plain C-API function. That doesn't matter much for the FFTs, since they generally take much longer than that, but for
good_size the call overhead can be the most expensive part of the function call.
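
To illustrate why a fixed per-call cost can dominate here, the sketch below uses a hypothetical pure-Python stand-in, `good_size_sketch`, which searches for the next integer whose only prime factors are 2, 3 and 5. It is an assumption made for illustration, not the exported implementation, and the helper `per_call_ns` only gives rough, machine-dependent timings. It does not measure pybind11 itself.

```python
import timeit


def good_size_sketch(n):
    """Hypothetical stand-in: smallest integer >= n with only the prime factors 2, 3, 5."""
    m = max(n, 1)
    while True:
        r = m
        # Strip out every factor of 2, 3 and 5; if nothing is left, m qualifies.
        for p in (2, 3, 5):
            while r % p == 0:
                r //= p
        if r == 1:
            return m
        m += 1


def per_call_ns(func, arg, number=200_000):
    """Rough average time per call in nanoseconds (includes interpreter overhead)."""
    total = timeit.timeit("f(x)", globals={"f": func, "x": arg}, number=number)
    return total / number * 1e9


if __name__ == "__main__":
    # A trivial builtin approximates the cost of a bare C-level call from Python.
    print("abs(-1):               %.0f ns" % per_call_ns(abs, -1))
    # The size search does very little work, so any fixed wrapper cost is significant.
    print("good_size_sketch(1000): %.0f ns" % per_call_ns(good_size_sketch, 1000))
```

Because the search itself involves only a handful of integer operations for typical inputs, a wrapper cost in the hundreds of nanoseconds is comparable to or larger than the useful work.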
Note: this is already in SciPy (scipy#10809)