Use numpy.add.at
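
The commit title does not say what code was replaced, so the following is only a minimal sketch of what `numpy.add.at` does, under the assumption that the change concerns accumulating values at repeated indices. The arrays `idx` and `vals` are hypothetical and not taken from the repository.

```python
import numpy as np

# Hypothetical data: target positions (with repeats) and values to accumulate.
idx = np.array([0, 1, 1, 3, 1])
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Buffered fancy-index update: a repeated index is written only once,
# so contributions to position 1 are not summed.
buffered = np.zeros(4)
buffered[idx] += vals

# Unbuffered in-place accumulation: every occurrence of an index contributes.
accumulated = np.zeros(4)
np.add.at(accumulated, idx, vals)

# For 1-D non-negative integer indices, bincount with weights gives the same sums.
via_bincount = np.bincount(idx, weights=vals, minlength=4)
```

Here `accumulated` sums all three values aimed at position 1, whereas the buffered `+=` form does not. Whether `np.add.at` or `np.bincount` is faster depends on the NumPy version and the array shapes, so it is worth timing both on real data.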

16 jobs for fix_cf_performace in 19 minutes and 8 seconds (queued for 5 seconds)