As promised, a thread pool in <200 lines of C++. It doesn't handle fork() yet, and I should emphasise that it wasn't originally written for performance-sensitive code.
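For anyone who hasn't seen this kind of pool, here is a minimal sketch of the general shape. It assumes a single mutex-protected FIFO of `std::function<void()>` tasks and is not the actual code from this thread. `ThreadPool`, `submit` and `wait_idle` are hypothetical names, and tasks are assumed not to throw.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool: one mutex-protected FIFO of type-erased tasks.
// Assumes tasks do not throw (an exception would escape the worker thread).
class ThreadPool {
public:
    explicit ThreadPool(std::size_t nthreads) {
        for (std::size_t i = 0; i < nthreads; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }

    // Signal shutdown; workers drain the remaining queue, then exit.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto &t : workers_) t.join();
    }

    ThreadPool(const ThreadPool &) = delete;
    ThreadPool &operator=(const ThreadPool &) = delete;

    // Enqueue a task and wake one waiting worker.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            queue_.push(std::move(task));
            ++pending_;
        }
        cv_.notify_one();
    }

    // Block until every submitted task has finished running.
    void wait_idle() {
        std::unique_lock<std::mutex> lock(mu_);
        idle_cv_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stopping and nothing left to do
                task = std::move(queue_.front());
                queue_.pop();
            }
            task();  // run outside the lock
            {
                std::lock_guard<std::mutex> lock(mu_);
                if (--pending_ == 0) idle_cv_.notify_all();
            }
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;       // "work available" or "stop"
    std::condition_variable idle_cv_;  // "pending_ reached zero"
    std::queue<std::function<void()>> queue_;
    std::size_t pending_ = 0;          // queued + currently running tasks
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

Every task in this shape pays for one lock round-trip on submit, one on pop and one on completion, plus a `std::function` allocation for larger captures, which is where per-dispatch overhead of the kind measured below tends to come from.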
Some preliminary testing shows a penalty of roughly 10-15 µs when nthreads > 1. I'm sure this can be brought down, e.g. by:
- reducing work queue contention
- making work items less generic
- using more atomics and fewer locks
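As a rough illustration of the second and third ideas combined, here is a sketch that assumes the work is a plain loop over [0, n). Threads claim the next chunk with a single atomic `fetch_add` instead of popping type-erased items from a locked queue. `parallel_for_atomic` is a hypothetical name, and for brevity it spawns threads on every call; in a pool, the same claim loop would run on the persistent workers.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical: each participating thread claims the next chunk of [0, n)
// with one atomic fetch_add; no lock is taken on the hot path.
// Assumes chunk > 0 and nthreads >= 1. Relaxed ordering is enough because
// chunks are disjoint and join() synchronises the results with the caller.
template <typename F>
void parallel_for_atomic(std::size_t n, std::size_t chunk, unsigned nthreads, F f) {
    std::atomic<std::size_t> next{0};
    auto work = [&] {
        for (;;) {
            std::size_t begin = next.fetch_add(chunk, std::memory_order_relaxed);
            if (begin >= n) return;
            std::size_t end = std::min(begin + chunk, n);
            for (std::size_t i = begin; i < end; ++i) f(i);
        }
    };
    std::vector<std::thread> helpers;
    for (unsigned t = 1; t < nthreads; ++t) helpers.emplace_back(work);
    work();  // the calling thread participates too
    for (auto &h : helpers) h.join();
}
```

Because the work item is just "index range plus a statically known callable", there is no per-item allocation or type erasure, and contention is limited to one cache line holding `next`. The chunk size trades that contention against load balance.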
Question: should I put the OpenMP code back and just use this as a fallback?
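If OpenMP does come back, one possible arrangement is a compile-time switch on the standard `_OPENMP` macro, which compilers define when OpenMP is enabled (e.g. with `-fopenmp`). This sketch assumes a hypothetical `parallel_for_dispatch` entry point; its `#else` branch uses plain `std::thread`s with static partitioning as a stand-in for the custom pool.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical dispatch: OpenMP when the compiler enables it, otherwise a
// plain-thread fallback standing in for the custom pool.
template <typename F>
void parallel_for_dispatch(std::size_t n, unsigned nthreads, F f) {
    if (nthreads == 0) nthreads = 1;  // num_threads() requires a positive value
#ifdef _OPENMP
    #pragma omp parallel for num_threads(nthreads)
    for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(n); ++i)
        f(static_cast<std::size_t>(i));
#else
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < nthreads; ++t) {
        // Static block partition: thread t handles [n*t/T, n*(t+1)/T).
        threads.emplace_back([=] {
            std::size_t begin = n * t / nthreads;
            std::size_t end = n * (t + 1) / nthreads;
            for (std::size_t i = begin; i < end; ++i) f(i);
        });
    }
    for (auto &th : threads) th.join();
#endif
}
```

One trade-off of this layout is that the fallback path only runs in builds without OpenMP, so it needs its own build configuration in CI to stay tested.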