Custom thread pool
As promised, a thread pool in <200 lines of C++. This doesn't handle fork
yet, and I should emphasise that it wasn't originally written for performance-sensitive code.
Some preliminary testing shows that there is roughly a 10-15 us penalty for nthreads > 1.
I'm sure this can be brought down, e.g. by
- reducing work queue contention
- making work items less generic
- using more atomics and fewer locks
Question: should I put the OpenMP code back and just use this as a fallback?
Activity
Thanks a lot, that looks great! I don't have time to look at it in detail at the moment, but I think the 10-15 microseconds are perfectly fine; OpenMP overhead is not negligible either (can't find benchmarks at the moment, but I'm sure it's not terribly much better), and getting parallelism with purely C++11 code is a substantial benefit.
Question: should I put the OpenMP code back and just use this as a fallback?
I don't think that's necessary.
I tested with a fairly small transform:
c2c(np.random.randn(10,100), nthreads=2)
The OpenMP version took about the same time as nthreads=1, whereas this version was ~15 us slower. For large enough transforms the difference is of course negligible, so tweaking thread_count may be enough.
Edited by Peter Bell
I've adapted thread_count a bit to account for the higher cost of submitting jobs to the pool. It also now factors vlen into the calculation and allows a sliding scale of 1 to n threads instead of just 1 or n. It's just a rough heuristic, but it seems alright in my tests. It may be very hardware dependent, though.
mentioned in commit 98bcb4c3
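For readers following along, a sliding-scale heuristic of the kind described for thread_count might look roughly like the sketch below. This is an illustration only, not the code from the commit: the function name thread_count_sketch, its parameters, the axis-length threshold of 1000 and the divisor of 4 are all assumptions chosen to show the idea.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical heuristic; names and constants are illustrative assumptions.
// total_size: number of elements in the whole array
// axis_len:   length of the axis being transformed
// vlen:       number of 1-D transforms processed together per SIMD batch
inline std::size_t thread_count_sketch(std::size_t nthreads,
                                       std::size_t total_size,
                                       std::size_t axis_len,
                                       std::size_t vlen) {
  if (nthreads <= 1 || axis_len == 0 || vlen == 0) return 1;
  // Number of independent SIMD batches that could run in parallel.
  std::size_t parallel = total_size / (axis_len * vlen);
  // Assumed: short transforms need several batches per thread before the
  // cost of submitting a job to the pool is amortised.
  if (axis_len < 1000) parallel /= 4;
  // Sliding scale: anywhere from 1 up to nthreads.
  return std::max<std::size_t>(1, std::min(parallel, nthreads));
}
```

With these assumed constants, a 10x100 input with vlen = 4 and nthreads = 8 stays on one thread, an 80x100 input gets 5 threads, and a 10000x100 input gets all 8. The right thresholds would have to be measured on the target hardware.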