Custom thread pool

As promised, a thread pool in under 200 lines of C++. It doesn't handle fork yet, and I should emphasise that it wasn't originally written for performance-sensitive code.

Some preliminary testing shows a penalty of roughly 10-15 µs when nthreads > 1. I'm sure this can be brought down, e.g. by

  • reducing work queue contention
  • making work items less generic
  • using more atomics and fewer locks

Question: should I put the OpenMP code back and just use this as a fallback?

