Custom thread pool

Merged Peter Bell requested to merge thread_pool into master

As promised, a thread pool in <200 lines of C++. This doesn't handle fork yet, and I should emphasise that this wasn't originally written for performance-sensitive code.

Some preliminary testing shows a penalty of roughly 10-15 µs for nthreads > 1. I'm sure this can be brought down, e.g. by

  • reducing work queue contention
  • making work items less generic
  • using more atomics and fewer locks

Question: should I put the OpenMP code back and just use this as a fallback?
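For readers unfamiliar with the pattern, here is a minimal sketch of a C++11 thread pool built from a mutex, a condition variable and a queue of generic work items. It is not the code from this merge request: the class name `simple_thread_pool` and all member names are hypothetical. It does illustrate where the overhead listed above tends to come from: a single locked queue (contention), `std::function` work items (generic), and locks instead of atomics.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool; not the implementation in this merge request.
class simple_thread_pool
{
  std::mutex mut_;                              // guards work_ and shutdown_
  std::condition_variable work_ready_;
  std::queue<std::function<void()>> work_;      // generic work items
  std::vector<std::thread> workers_;
  bool shutdown_ = false;

  void worker_main()
  {
    for (;;)
    {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lock(mut_);
        work_ready_.wait(lock, [this] { return shutdown_ || !work_.empty(); });
        if (work_.empty()) return;  // queue drained and shutdown requested
        job = std::move(work_.front());
        work_.pop();
      }
      job();  // run the job without holding the lock
    }
  }

public:
  explicit simple_thread_pool(std::size_t nthreads)
  {
    assert(nthreads > 0);
    for (std::size_t i = 0; i < nthreads; ++i)
      workers_.emplace_back([this] { worker_main(); });
  }

  // Finishes all queued jobs, then joins the workers.
  ~simple_thread_pool()
  {
    {
      std::lock_guard<std::mutex> lock(mut_);
      shutdown_ = true;
    }
    work_ready_.notify_all();
    for (auto &t : workers_) t.join();
  }

  void submit(std::function<void()> job)
  {
    {
      std::lock_guard<std::mutex> lock(mut_);
      work_.push(std::move(job));
    }
    work_ready_.notify_one();
  }
};
```

Every `submit` takes the same mutex that the workers take to dequeue, so many small jobs serialise on that lock. A per-call cost in the microsecond range is plausible for a design like this.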

Edited by Peter Bell

  • Thanks a lot, that looks great! I don't have time to look at it in detail at the moment, but I think the 10-15 microseconds are perfectly fine; OpenMP overhead is not negligible either (can't find benchmarks at the moment, but I'm sure it's not terribly much better), and getting parallelism with purely C++11 code is a substantial benefit.

    Question: should I put the OpenMP code back and just use this as a fallback?

    I don't think that's necessary.

  • Peter Bell added 1 commit

    • 5b58dedc - Handle fork correctly in thread pool

  • Author Developer

    I tested with a fairly small transform

    c2c(np.random.randn(10,100), nthreads=2)

    The OpenMP version took about the same time as nthreads=1, whereas this version was ~15 µs slower.

    For large enough transforms the difference is of course negligible so tweaking thread_count may be enough.

    Edited by Peter Bell
  • Peter Bell added 1 commit

    • 0b65b8fb - Handle fork correctly in thread pool

  • Stupid question from a multithreading newbie: if POCKETFFT_PTHREADS is not defined, what is the resulting disadvantage?

  • Author Developer

    It's all to do with fork. When fork is called, only the calling thread is recreated in the child process so our thread_pool would lose all of its workers. This would mean any subsequent FFTs with nthreads>1 would deadlock.
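To make the fork problem concrete, here is a hedged sketch of one way to handle it with `pthread_atfork`: join all workers just before the fork, then start fresh workers afterwards in both the parent and the child. The names `restartable_pool` and `get_pool` and the pool size of 2 are assumptions for illustration. The sketch also assumes no other thread is submitting work while `fork` runs.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>
#include <pthread.h>  // POSIX only: declares pthread_atfork

// Hypothetical pool with explicit shutdown/restart; illustration only.
class restartable_pool
{
  std::mutex mut_;
  std::condition_variable work_ready_;
  std::queue<std::function<void()>> work_;
  std::vector<std::thread> workers_;
  std::size_t nthreads_;
  bool shutdown_ = false;

  void worker_main()
  {
    for (;;)
    {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lock(mut_);
        work_ready_.wait(lock, [this] { return shutdown_ || !work_.empty(); });
        if (work_.empty()) return;  // queue drained and shutdown requested
        job = std::move(work_.front());
        work_.pop();
      }
      job();
    }
  }

public:
  explicit restartable_pool(std::size_t nthreads) : nthreads_(nthreads)
  {
    assert(nthreads > 0);
    restart();
  }
  ~restartable_pool() { shutdown(); }

  // Finish queued work and join every worker, so that no pool thread
  // exists (or holds mut_) at the moment fork() runs.
  void shutdown()
  {
    {
      std::lock_guard<std::mutex> lock(mut_);
      shutdown_ = true;
    }
    work_ready_.notify_all();
    for (auto &t : workers_) t.join();
    workers_.clear();
  }

  // Spawn a fresh set of workers. Call only when no workers are running.
  void restart()
  {
    {
      std::lock_guard<std::mutex> lock(mut_);
      shutdown_ = false;
    }
    for (std::size_t i = 0; i < nthreads_; ++i)
      workers_.emplace_back([this] { worker_main(); });
  }

  void submit(std::function<void()> job)
  {
    {
      std::lock_guard<std::mutex> lock(mut_);
      work_.push(std::move(job));
    }
    work_ready_.notify_one();
  }

  std::size_t worker_count() const { return workers_.size(); }
};

// Process-wide pool; the fork handlers are registered exactly once.
inline restartable_pool &get_pool()
{
  static restartable_pool pool(2);  // pool size chosen arbitrarily
  static std::once_flag registered;
  std::call_once(registered, [] {
    pthread_atfork(
      +[] { get_pool().shutdown(); },  // prepare: in the parent, before fork
      +[] { get_pool().restart(); },   // parent: in the parent, after fork
      +[] { get_pool().restart(); });  // child: in the child, after fork
  });
  return pool;
}
```

Without the child handler, the child process would inherit the pool object but none of its worker threads, so a job submitted there would wait forever. That is the deadlock described above.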

  • So we cannot actually get multithreading going with C++11 features alone? That would be a pity.

  • Author Developer

    Well, everything except the pthread_atfork handler is plain C++11.

  • Yes, but if libpthread is not available, the whole thing does not work.

    Still, I assume that libpthread is available almost everywhere, and if it's easier to work with in a Python environment than OpenMP, we might as well go for it!

  • Author Developer

    I should probably also add that fork is a POSIX feature, so if pthreads isn't available then fork shouldn't be an issue anyway. E.g. this works fine on Windows without pthreads.

  • Peter Bell added 1 commit

    • 891b5454 - Better parallelisability heuristic

  • Peter Bell unmarked as a Work In Progress

  • Peter Bell changed the description

  • Author Developer

    I've adapted thread_count a bit to account for the higher cost of submitting jobs to the pool. It also now factors vlen into the calculation and allows a sliding scale of 1 to n threads instead of just 1 or n. It's just a rough heuristic but seems all right in my tests; it may be very hardware-dependent though.
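A heuristic of this shape could look roughly like the sketch below. This is an illustration under stated assumptions, not necessarily the function in the commit: the name `estimate_thread_count`, the cut-off of 1000, the divisor of 4, and the convention that `nthreads == 0` means "use all hardware threads" are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical heuristic. The cut-off (1000) and the divisor (4) are
// illustrative assumptions, not measured values.
inline std::size_t estimate_thread_count(std::size_t nthreads,
  const std::vector<std::size_t> &shape, std::size_t axis, std::size_t vlen)
{
  assert(axis < shape.size() && vlen > 0);
  if (nthreads == 1) return 1;

  std::size_t size = 1;
  for (auto s : shape) size *= s;
  if (size == 0) return 1;  // empty array: nothing to parallelise

  // Independent work items: 1-D transforms along `axis`, processed
  // vlen at a time when vectorised.
  std::size_t parallel = size / (shape[axis] * vlen);

  // Short transforms: the cost of submitting jobs to the pool weighs
  // more heavily, so request fewer threads.
  if (shape[axis] < 1000) parallel /= 4;

  // Assumption: nthreads == 0 means "use all hardware threads".
  std::size_t max_threads = (nthreads == 0)
    ? std::max<std::size_t>(1, std::thread::hardware_concurrency())
    : nthreads;

  // Sliding scale: anywhere from 1 up to max_threads.
  return std::max<std::size_t>(1, std::min(parallel, max_threads));
}
```

The result can fall anywhere between 1 and the requested maximum, rather than being all-or-nothing, and a larger `vlen` reduces the count because each job already covers several transforms.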

  • I should probably also add that fork is a POSIX feature, so if pthreads isn't available then fork shouldn't be an issue anyway. E.g. this works fine on Windows without pthreads.

    OK, I think I finally got it :)

  • Martin Reinecke mentioned in commit 98bcb4c3
