pypocketfft issueshttps://gitlab.mpcdf.mpg.de/mtr/pypocketfft/-/issues2020-09-04T16:41:05Zhttps://gitlab.mpcdf.mpg.de/mtr/pypocketfft/-/issues/14More interesting OSX issues2020-09-04T16:41:05ZMartin ReineckeMore interesting OSX issuesI have a CI setup for tests on OSX for the `ducc` package, which contains an almost verbatim copy of `pypocketfft`'s threading functionality. One thing I learned was that in certain circumstances the C++ compiler defines some functionali...I have a CI setup for tests on OSX for the `ducc` package, which contains an almost verbatim copy of `pypocketfft`'s threading functionality. One thing I learned was that in certain circumstances the C++ compiler defines some functionality macros, despite the functionality not being available (see https://travis-ci.com/github/litebird/ducc/jobs/380712386).
Also it appears possible to create deadlocks with the new threading implementation on OSX (see the other recent failed builds at https://travis-ci.com/github/litebird/ducc/builds). It's quite possible that I broke something when I adjusted the threading code for `ducc`, but on the other hand the changes were quite minimal. @g-peterbell, if you have an idea what could be going wrong and/or think that the `pypocketfft` might also be vulnerable to this, please chime in!https://gitlab.mpcdf.mpg.de/mtr/pypocketfft/-/issues/13Make thread pool size more flexible?2020-08-27T12:06:07ZMartin ReineckeMake thread pool size more flexible?When running parallel FFTs with, say, four threads on a system with 12 CPUs and 24 virtual cores, I'm not observing a set of 4 virtual cores at 100% with `htop` (as would be the case with an OpenMP code), but rather a homogeneous, small ...When running parallel FFTs with, say, four threads on a system with 12 CPUs and 24 virtual cores, I'm not observing a set of 4 virtual cores at 100% with `htop` (as would be the case with an OpenMP code), but rather a homogeneous, small load on all 24 virtual cores. If I'm interpreting this correctly, this happens because we allocate a thread pool with as many threads as there are virtual cores, and assigning tasks to them in a round-robin fashion. So the load is jumping around very quickly.
This doesn't seem optimal: it invalidates a lot of caches, and it probably confuses the thread scheduler.
Would it be possible to resize the pool on demand, roughly like this:
```
inline thread_pool &get_pool2(size_t nthreads=0)
{
static std::unique_ptr<thread_pool> pool(std::make_unique<thread_pool>(1));
if ((!pool) || ((nthreads!=0) && (nthreads!=pool->size()))) // resize
{
pool = std::make_unique<thread_pool>(nthreads);
}
#if __has_include(<pthread.h>)
static std::once_flag f;
call_once(f,
[]{
pthread_atfork(
+[]{ get_pool2().shutdown(); }, // prepare
+[]{ get_pool2().restart(); }, // parent
+[]{ get_pool2().restart(); } // child
);
});
#endif
return *pool;
}
```
@g-peterbell do you think this could work, or am I missing some subtle multithreading issue? First tests look OK, but with concurrency I'd like to hear a second opinion :)https://gitlab.mpcdf.mpg.de/mtr/pypocketfft/-/issues/12multiply defined symbols2020-01-08T08:53:30ZMartin Reineckemultiply defined symbolsCurrently, the header contains symbols like `thread_id`, which need to be defined exactly once in a linked executable. If the header is included in more than one translation unit, these symbols become multiply defined, and the executable...Currently, the header contains symbols like `thread_id`, which need to be defined exactly once in a linked executable. If the header is included in more than one translation unit, these symbols become multiply defined, and the executable won't link.
I have a fix for this, but since it is part of a larger rework, a merge request will take a few more days.
@g-peterbellhttps://gitlab.mpcdf.mpg.de/mtr/pypocketfft/-/issues/11Suspiciously good performance ...2019-12-04T13:58:52ZMartin ReineckeSuspiciously good performance ...When running `bench.py`, it seems that (at least on my hardware) pypocketfft very often performs better than FFTW with "FFTW_MEASURE" planning for multi-D transforms.
I verified that my libfftw.so was compiled with AVX support, so I'm re...When running `bench.py`, it seems that (at least on my hardware) pypocketfft very often performs better than FFTW with "FFTW_MEASURE" planning for multi-D transforms.
I verified that my libfftw.so was compiled with AVX support, so I'm really confused what's going on.
@g-peterbell any ideas? The code cannot be that good :Phttps://gitlab.mpcdf.mpg.de/mtr/pypocketfft/-/issues/9Segmentation fault with multithreading on Macs?2019-08-14T13:27:27ZMartin ReineckeSegmentation fault with multithreading on Macs?A colleague just ran `stress.py` with the current master branch on a Mac and got intermittent segmentation faults.
If we switch to the `tweak_dcst4` branch (which has a macro to disable multithreading), and switch all multithreading off...A colleague just ran `stress.py` with the current master branch on a Mac and got intermittent segmentation faults.
If we switch to the `tweak_dcst4` branch (which has a macro to disable multithreading), and switch all multithreading off, things work again.
Any idea what could be going wrong, @peterbell10?
(I cannot provoke segfaults on Linux, no matter what I try...)https://gitlab.mpcdf.mpg.de/mtr/pypocketfft/-/issues/5Update author and copyright information2019-07-20T11:21:29ZMartin ReineckeUpdate author and copyright informationI'm sorry @peterbell10, I completely forgot to update copyright and author info after you started contributing.
Do you hold the copyright for your contributions yourself, or does it belong to some organization? (Mine goes to Max-Planck-...I'm sorry @peterbell10, I completely forgot to update copyright and author info after you started contributing.
Do you hold the copyright for your contributions yourself, or does it belong to some organization? (Mine goes to Max-Planck-Society automatically for everything work-related.)https://gitlab.mpcdf.mpg.de/mtr/pypocketfft/-/issues/4sine and cosine transforms?2019-07-20T11:18:37ZMartin Reineckesine and cosine transforms?I'm thinking about adding sine and cosine transforms when I find some time.
The main reason is that `scipy.fft` offers these transforms but will fall back to `scipy.fftpack` if they are called. I'm not worried about performance but rathe...I'm thinking about adding sine and cosine transforms when I find some time.
The main reason is that `scipy.fft` offers these transforms but will fall back to `scipy.fftpack` if they are called. I'm not worried about performance but rather about accuracy, which is notoriously bad for `fftpack`'s real transforms (if the transform sizes are near-prime).
Also, at the moment, existing programs using `scipy` that do not switch explicitly to `scipy.fft` won't benefit from the new library and still use `fftpack`, correct? If `pypocketfft` contains `fftpack`'s full functionality, then maybe it could replace `fftpack` entirely (behind `scipy.fftpack`'s original interface, of course).
@peterbell10, do you think this is a useful plan? Or will it be too tedious to put `pypocketfft` behind the `fftpack` interface?https://gitlab.mpcdf.mpg.de/mtr/pypocketfft/-/issues/3Python interface cleanup?2019-07-17T09:51:15ZMartin ReineckePython interface cleanup?I think it might be convenient to:
- merge `fftn` and `ifftn` into one function (maybe `c2c` or `complex_FFT`?)
- provide `r2c` and `c2r` functions instead of `rfftn` and `irfftn`. They would have a direction flag and could therefore als...I think it might be convenient to:
- merge `fftn` and `ifftn` into one function (maybe `c2c` or `complex_FFT`?)
- provide `r2c` and `c2r` functions instead of `rfftn` and `irfftn`. They would have a direction flag and could therefore also do the job of `numpy`'s `hfft` and `ihfft`.
- maybe merge `rfft_scipy` and `irfft_scipy` in an analogous fashion (and replace `scipy` by `rfftpack`, which expresses more clearly what the function does).
This would make the interface code smaller, and I think it would present the underlying symmetries of the transforms much better.
@peterbell10, what do you think?https://gitlab.mpcdf.mpg.de/mtr/pypocketfft/-/issues/2sym_rfftn for ifftn2019-06-19T08:02:40ZPeter Bellsym_rfftn for ifftnThis is an ommission from the changes I made in my scipy PR: `ifftn` for real input can be optimised in exactly the same way as with `fftn`. You just need to take the complex conjugate of the 1D r2c transform before continuing to the rem...This is an ommission from the changes I made in my scipy PR: `ifftn` for real input can be optimised in exactly the same way as with `fftn`. You just need to take the complex conjugate of the 1D r2c transform before continuing to the remainging axes.
I suspect it's not a very common use case but it's not difficult to add. Alternatively, this could go into pocketfft itself as a `forwards` argument to `r2c`.https://gitlab.mpcdf.mpg.de/mtr/pypocketfft/-/issues/1Testing2019-05-27T16:48:28ZDaniel LenzTesting