Closes #13
Instead of submitting work to a shared queue, the pool now first loops through the list of threads and, if one is free, assigns the work directly to that thread. Otherwise the work goes into a fallback queue (this shouldn't happen under normal usage).
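For reviewers who want the idea in isolation, here is a minimal Python sketch of the dispatch strategy described above. It is not the code in this change: `DirectDispatchPool`, its per-thread inboxes, and the single pool-wide lock are all hypothetical names and simplifications chosen for illustration.

```python
import queue
import threading
from collections import deque
from concurrent.futures import Future


class DirectDispatchPool:
    """Illustrative pool: hand work to the first idle thread, else overflow."""

    def __init__(self, num_threads):
        self._lock = threading.Lock()       # guards _busy and _overflow
        self._overflow = deque()            # fallback queue, used only when all threads are busy
        self._busy = [False] * num_threads
        self._inboxes = [queue.SimpleQueue() for _ in range(num_threads)]
        self._threads = [
            threading.Thread(target=self._run, args=(i,), daemon=True)
            for i in range(num_threads)
        ]
        for t in self._threads:
            t.start()

    def submit(self, fn, *args):
        fut = Future()
        item = (fut, fn, args)
        with self._lock:
            # Scan threads in a fixed order so work lands on the lowest-numbered idle one.
            for i, busy in enumerate(self._busy):
                if not busy:
                    self._busy[i] = True
                    self._inboxes[i].put(item)
                    return fut
            # Every thread is busy: park the work in the fallback queue.
            self._overflow.append(item)
        return fut

    def _run(self, i):
        while True:
            item = self._inboxes[i].get()
            if item is None:                # shutdown sentinel
                return
            while item is not None:
                fut, fn, args = item
                try:
                    fut.set_result(fn(*args))
                except Exception as exc:
                    fut.set_exception(exc)
                with self._lock:
                    # Drain the fallback queue before going idle, so parked work is never lost.
                    if self._overflow:
                        item = self._overflow.popleft()
                    else:
                        self._busy[i] = False
                        item = None

    def shutdown(self):
        # Assumes no further submit() calls; already queued work still completes.
        for inbox in self._inboxes:
            inbox.put(None)
        for t in self._threads:
            t.join()


if __name__ == "__main__":
    pool = DirectDispatchPool(2)
    futures = [pool.submit(pow, 2, n) for n in range(8)]
    print([f.result() for f in futures])
    pool.shutdown()
```

Because `submit` always scans from the first thread, a caller that issues one job at a time tends to reuse the same few threads rather than waking whichever one happens to win a race on a shared queue. The busy flag is flipped and the fallback queue is checked under the same lock, which is what keeps a job from being parked just as the last thread goes idle.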
Calling FFTs in a loop, I could see the work concentrated on a fixed subset of my cores instead of being spread across all of them. I also measured a small performance improvement when running with 2 threads on a 4-core, 8-thread CPU.