I have been doing some quick tests of the pyiron integration in the develop deployment. Some things I have observed:
Pyiron comes with a set of notebooks that get placed one level above the uploads folder. There, the user is missing some rights, and the notebooks throw errors when instantiating the pyiron Project class. When the notebooks are inside a "NOMAD upload" folder, they run normally.
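For reference, this is the kind of call that fails outside an upload folder; a minimal sketch assuming a standard pyiron installation (the project name is just an example):

```python
from pyiron import Project

# Instantiating Project creates the project directory on disk; this is where
# the permission error shows up when the notebook sits outside an upload folder.
pr = Project("test_project")
print(pr.path)
```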
I have also noticed that the "raw" files of the calculations get stored in subfolders as tar.bz2 archives, which NOMAD cannot process after reprocessing. I tried to upload the lammps files manually but got some parsing errors. @jrudz, maybe you or one of your guys can have a quick run of the notebooks to pick up on any issues? Any other comments @mscheidg?
I have tested the example notebooks, and indeed some of them fail. They produce two distinct sets of files: 1) an h5 file containing all the calculations and 2) all the raw inputs and outputs in tar.bz2 archives.
I managed to run the lammps, sphinx and two more simulations without a problem.
The h5 file contains all the info that already exists in the raw files in the tar.bz2 archives. One problem is that the h5 file does not state explicitly which code/package was used to generate it, which makes it a bit hard to parse the data into a NOMAD schema. Right now I am working on a general h5 parser that can pick up the h5 file and parse the different sections into the schema. I will have a chat with Area C to get a better view of the h5 file content once the DPG is over.
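To illustrate the direction of the general h5 parser, here is a minimal sketch (file name hypothetical, h5py assumed) that just walks the group tree; the actual parser would map these paths onto NOMAD schema sections instead of printing them:

```python
import h5py

def show(name, obj):
    # Print every group/dataset path stored by pyiron; a real parser would
    # match these paths against the NOMAD schema instead of printing them.
    kind = "group" if isinstance(obj, h5py.Group) else "dataset"
    print(f"{kind}: /{name}")

with h5py.File("pyiron_project.h5", "r") as f:
    f.visititems(show)
```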
I haven't tried to parse the lammps data directly, as I get an error locally when trying to connect north to my local volumes.
One of the original ideas was that the raw files get parsed automatically with the current parsers in NOMAD upon reprocessing. Then we would enhance the pyiron data with additional workflow entries and so on. What I meant is that parsing the original raw files does not work out of the box, as it does in any other NOMAD upload. Let me know if you want to chat to clarify this.
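One possible workaround, as a sketch only (the upload path is hypothetical): unpack the tar.bz2 archives before reprocessing, so the raw files sit where the standard NOMAD parsers can match them:

```python
import tarfile
from pathlib import Path

upload_dir = Path("my_upload")  # hypothetical upload location
for archive in upload_dir.rglob("*.tar.bz2"):
    # Extract next to the archive, e.g. "job.tar.bz2" -> "job/"
    target = archive.with_suffix("").with_suffix("")
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:bz2") as tar:
        tar.extractall(path=target)
```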
I have talked to Markus, and the problem with some of the failed notebooks is simply wrong user permissions set in the pyiron image; we might need our own customized pyiron image.
Also, our parser indeed fails to pick up on the lammps calculation from pyiron (ping @ladinesa): log.lammps
Lammps is extremely difficult to parse completely and accurately. I will be trying to improve some of the most common issues in the coming months. For now, yes, you could take the last one, but please add a warning then that says the parsing may be incomplete.
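Just to make sure we mean the same thing, a minimal sketch (not the actual parser code) of taking the last run block from log.lammps and emitting the warning; it assumes the thermo table header line starts with "Step":

```python
import logging
import re

logger = logging.getLogger(__name__)

def last_run_block(logfile="log.lammps"):
    # Split the log at each thermo table header ("Step ...") and keep only
    # the last block; warn that earlier runs are being skipped.
    with open(logfile) as f:
        text = f.read()
    blocks = re.split(r"(?m)^(?=Step\b)", text)
    if len(blocks) > 2:
        logger.warning(
            "%s contains several runs; only the last one is parsed, results may be incomplete.",
            logfile,
        )
    return blocks[-1]
```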
@amgo Could we meet early next week so that you can catch me up on what you have done so far and we can discuss any issues and the plan for the future? I am currently free in the afternoon. 2pm?