
New Atomic Descriptors

James Darby requested to merge jpd47/nomad-FAIR:descriptors into develop

Hi,

I've updated the atomic descriptors that are computed. The old approach stored 4 different variations of the SOAP power spectrum: 2 descriptor vectors for each atom in the system and 2 structure-wise-averaged descriptor vectors. The new approach stores only 2 structure-wise-averaged descriptor vectors. The first is an element-agnostic, radially scaled variant of the SOAP power spectrum and the second is the descriptor extracted from the recent MACE-MP-0 foundation model.
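
For readers unfamiliar with the term, here is a minimal sketch of what "structure-wise averaging" could look like, assuming it means a plain mean of the per-atom vectors over all atoms. The MR does not say which averaging scheme is actually used, and `structure_average` and the toy numbers are hypothetical.

```python
from typing import List, Sequence


def structure_average(per_atom: Sequence[Sequence[float]]) -> List[float]:
    """Average per-atom descriptor vectors into one structure-level vector.

    Assumption: a plain arithmetic mean over atoms. The real pipeline may
    average differently (e.g. before forming the power spectrum).
    """
    if not per_atom:
        raise ValueError("need at least one atom")
    n_atoms = len(per_atom)
    length = len(per_atom[0])
    if any(len(v) != length for v in per_atom):
        raise ValueError("all per-atom vectors must have the same length")
    # Component-wise mean: one output value per descriptor component.
    return [sum(v[i] for v in per_atom) / n_atoms for i in range(length)]


# Hypothetical 3-atom system with 4-component descriptors.
atoms = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 0.0, 2.0, 0.0],
    [2.0, 3.0, 2.0, 2.0],
]
averaged = structure_average(atoms)
```

The output length is independent of the number of atoms, which is what makes the per-system storage cost fixed.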

Questions/Discussion points:

  1. Total storage requirement is now greatly reduced to 296 floats per system. With 13 million structures currently in NOMAD, I make this about 16 GB of data (at 4 bytes per float). Is this acceptable?
  2. Adding the MACE descriptors has introduced new dependencies via mace-torch. mace-torch is pip installable so I'm hoping this won't cause any issues. I haven't updated requirement.txt etc. Should I do this?
  3. The MACE descriptors are computed using the MACE-MP-0 foundation model. The model file (42 MB) gets automatically downloaded to ~/.cache/mace/ the first time the model is called. I think we need to be careful about how this will work with multiple different "workers" running in parallel. Will they try to write to the same file?
  4. Loading the MACE model takes much longer than computing the MACE descriptor. Currently the model is loaded once per system. I don't see a great way of avoiding this, but it is worth being aware of.
  5. The current schema for SOAP descriptors (shown below) includes entries for the parameters that were used to generate the descriptors, e.g. r_cut, n_max etc. These will be the same for all structures in the database, so this seems inefficient? My proposal is to put this information into the quantity definition of the soap_descriptor itself (see image) so the same data isn't duplicated for each structure. Thoughts?
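
On point 1, the storage figure can be checked with a quick back-of-the-envelope calculation. The float count and structure count come from the description above; the bytes-per-float values are assumptions, since the storage dtype isn't stated.

```python
FLOATS_PER_SYSTEM = 296      # from the MR description
N_SYSTEMS = 13_000_000       # from the MR description


def storage_gb(floats_per_system: int, n_systems: int, bytes_per_float: int) -> float:
    """Raw descriptor storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return floats_per_system * n_systems * bytes_per_float / 1e9


# Assumption: 4-byte (float32) vs 8-byte (float64) storage.
estimate_f32 = storage_gb(FLOATS_PER_SYSTEM, N_SYSTEMS, 4)
estimate_f64 = storage_gb(FLOATS_PER_SYSTEM, N_SYSTEMS, 8)
print(f"float32: {estimate_f32:.1f} GB, float64: {estimate_f64:.1f} GB")
```

The roughly 16 GB figure matches 4-byte floats. Storing 8-byte floats would double it, and any per-entry metadata or index overhead is not included.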
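
On point 3, one generic way to make a shared cache file safe against parallel writers is to write to a temporary file in the same directory and then rename it atomically into place. This is only a sketch of that pattern using the standard library: `ensure_cached` and the `fetch` callable are hypothetical, and I haven't checked how mace-torch itself handles the download.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def ensure_cached(target: Path, fetch: Callable[[], bytes]) -> Path:
    """Populate `target` at most once per worker without exposing partial files.

    `fetch` is a hypothetical stand-in for whatever downloads the model bytes.
    """
    if target.exists():
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the same directory so the final rename stays
    # on one filesystem, which is what makes os.replace atomic.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(fetch())
        # Readers see either no file or the complete file, never a partial one.
        os.replace(tmp_name, target)
    finally:
        # Only true if the write or rename failed; clean up the leftover.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
    return target
```

Two workers racing here may both download the file, but the last rename wins and nobody reads a half-written model. Pre-populating the cache before the workers start would avoid the race altogether.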
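
On point 4, if several systems are processed within the same Python process, a memoised loader would pay the load cost only once per process. Below is a minimal sketch with a stand-in loader: `_expensive_load`, `get_model` and `describe` are all hypothetical names, and whether this helps depends on how NOMAD workers are scheduled, which I'm not sure about.

```python
import functools

LOAD_COUNT = 0  # only here to show how often the slow path runs


def _expensive_load(name: str) -> dict:
    """Stand-in for the slow model load (not the real MACE loader)."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {"name": name}


@functools.lru_cache(maxsize=None)
def get_model(name: str) -> dict:
    """Return a cached model; loads on first call per process only."""
    return _expensive_load(name)


def describe(system: list, model_name: str = "hypothetical-model") -> tuple:
    """Toy 'descriptor': reuses the cached model for every system."""
    model = get_model(model_name)
    return (model["name"], len(system))


# Three hypothetical systems processed in one process.
results = [describe(s) for s in (["H", "H"], ["O", "H", "H"], ["C"])]
```

The cache is per process, so each parallel worker would still load the model once, but not once per system.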
