and why it is not possible to run with a much larger matrix on machines with very large memory (e.g., a node equipped with Optane memory)