Capturing simple molecule + surface systems in the topology
After issue #855 (closed), we are able to identify surfaces within the data with fairly good accuracy. A natural extension is to also capture surface + molecule combinations, i.e. adsorption systems (there are probably more than 10k of these in NOMAD). To do this, we need to:
- Modify the topology function in nomad/normalizing/material.py:
  - After the clustering by MatID, run a network analysis on the remaining outlier atoms. If the outliers are all connected to each other and are all organic elements (C, H, O, N, S, F?), classify them as a molecule.
- Add unit tests:
  - Good examples can be found by searching for structural_type=surface together with some organic chemical element. Five or six examples should be a good start. Including a system that is "split" by the periodic boundary would help make sure that our PBC handling is correct.
  - Test that both the surface and the molecule are correctly identified (structural_type=surface/molecule). The molecule indices must be exactly correct.
  - Test that in systems with several unconnected outliers, only the surface is detected. In such systems there is a higher chance that what we think is a molecule is actually something else. We can revisit this once we have some experience with handling single molecules.
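A minimal, stdlib-only sketch of the outlier check from the first bullet follows. It assumes the MatID clustering has already produced the indices of the atoms it could not assign to the surface. The helper names classify_outliers and min_image_distance, the covalent-radius table and the 1.2 bond tolerance are all hypothetical illustration choices, not part of the NOMAD code base. Periodic images are handled by checking only the 27 nearest images, which is enough as long as bond cutoffs are shorter than the cell vectors.

```python
from collections import deque
from itertools import product
from math import dist

# Approximate covalent radii in Å (Cordero et al., 2008) for the candidate
# organic elements. F is included tentatively, as in the issue text.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05, "F": 0.57}
ORGANIC_ELEMENTS = set(COVALENT_RADII)


def min_image_distance(p, q, cell, pbc):
    """Shortest p-q distance over the neighbouring periodic images of q."""
    # Only shift along periodic directions (e.g. not along a slab's normal).
    shifts = [range(-1, 2) if periodic else range(0, 1) for periodic in pbc]
    best = float("inf")
    for i, j, k in product(*shifts):
        t = [i * cell[0][d] + j * cell[1][d] + k * cell[2][d] for d in range(3)]
        best = min(best, dist(p, [q[d] + t[d] for d in range(3)]))
    return best


def classify_outliers(symbols, positions, cell, pbc, outliers, tolerance=1.2):
    """Return the sorted outlier indices if they form one connected organic
    molecule, otherwise None (i.e. report only the surface).

    Hypothetical helper: `outliers` is assumed to be the atom indices left
    over after MatID has assigned the surface atoms.
    """
    outliers = list(outliers)
    if not outliers or any(symbols[i] not in ORGANIC_ELEMENTS for i in outliers):
        return None
    # Bond graph: two atoms are bonded if closer than the scaled sum of their
    # covalent radii, taking periodic images into account.
    neighbours = {i: [] for i in outliers}
    for a, i in enumerate(outliers):
        for j in outliers[a + 1:]:
            cutoff = tolerance * (COVALENT_RADII[symbols[i]] + COVALENT_RADII[symbols[j]])
            if min_image_distance(positions[i], positions[j], cell, pbc) < cutoff:
                neighbours[i].append(j)
                neighbours[j].append(i)
    # Breadth-first search: a single molecule must reach every outlier.
    seen = {outliers[0]}
    queue = deque([outliers[0]])
    while queue:
        for j in neighbours[queue.popleft()]:
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return sorted(outliers) if len(seen) == len(outliers) else None


# Usage: a CO molecule split across the periodic x boundary above a Cu slab.
symbols = ["Cu", "Cu", "C", "O"]
positions = [(0, 0, 10), (5, 5, 10), (0.5, 5, 12), (9.4, 5, 12)]
cell = [[10, 0, 0], [0, 10, 0], [0, 0, 30]]
print(classify_outliers(symbols, positions, cell, (True, True, False), [2, 3]))  # [2, 3]
```

Returning None for several disconnected fragments, or for any non-organic outlier, mirrors the conservative rule in the last test bullet: in those cases only the surface is reported.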