Automatic creation and resolution of HDF5 references for large arrays
Started as a discussion in NOMAD-Discord#developers:
Joe:
Is there a built-in function to resolve an hdf5 reference in nomad-lab? Perhaps you know, @Alvin Noe Ladines?
Alvin:
I can implement an external function, but I really think that HDFReference should be implemented at the msection level. My opinion is that any numpy array should be written to hdf5 when its size exceeds some defined value. But we should discuss this in detail in an issue. Should @Joseph Rudzinski or I create it?
Joe:
Yes, this would be ideal. I don't exactly understand how this would work in terms of defining the quantities in the metainfo. Is there a way that we can get away from explicitly defining an hdf5 ref and link the reference when accessed? I guess you have an approach in mind?
I think it would probably be best if you implemented this automated type feature, since there are aspects that I am not that familiar with, but we could meet first to discuss if you want.
My proposal would be to write the units as a str attribute (UnitRegistry syntax) to the dataset in the hdf5 file and then automatically check and apply them when the reference is resolved. We could use/adapt the functions that I wrote in the H5MD parser.
Let's discuss @ladinesa