Getting rid of standard parallelization?

The default way of parallelizing large tasks with MPI is currently to distribute fields over MPI tasks along their first axis. While it appears to work (our tests run fine with MPI), it has not been used during the last few years, and other parallelization approaches seem more promising.

If there is general agreement, I propose to remove this parallelization from the code, which would make NIFTy much smaller and easier to use and maintain.