
Descriptor Embedding and Clustering for Atomistic-environment Framework (DECAF)

https://gitlab.mpcdf.mpg.de/klai/decaf.git

Tutorials

For tutorials with examples, please visit the GitLab Pages site at https://klai.pages.mpcdf.de/decaf/

Description

This is a Python package that provides a workflow for clustering the local atomic environments in a dataset of structures.
For details, please refer to the methodology paper "A Fuzzy Classification Framework to Identify Equivalent Atoms in Complex Materials and Molecules"[1].
It mainly provides the following functions:

  1. Computing the SOAP descriptor of an input atomic structure given as an ASE Atoms object.
  2. Applying classical multidimensional scaling (MDS) on a dataset of SOAP.
  3. Differentiating the atomic environments of the embedded dataset using mean shift clustering (MSC).
  4. Embedding and classifying environments outside of the MDS-MSC dataset.

Optional functions are also provided:

  1. Applying kernel principal component analysis (kPCA), principal component analysis (PCA), or Sketch-Map[2] for embedding.
  2. Applying HDBSCAN[3] for clustering.
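For orientation, the optional linear and kernel PCA embeddings can be tried with scikit-learn, one of the listed dependencies. This is a minimal sketch, not DECAF's own wrapper: random vectors stand in for SOAP descriptors, and the RBF kernel with `gamma=0.05` is an arbitrary assumption that would need tuning per dataset. Sketch-Map[2] and HDBSCAN[3] are not shown.

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(200, 30))  # stand-in for SOAP vectors (assumption)

# Linear PCA onto the two leading principal components
pca_coords = PCA(n_components=2).fit_transform(descriptors)

# Kernel PCA; the RBF kernel and gamma value are illustrative choices only
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.05)
kpca_coords = kpca.fit_transform(descriptors)

# New descriptors can be projected with the already-fitted model
new_coords = kpca.transform(rng.normal(size=(3, 30)))
print(pca_coords.shape, kpca_coords.shape, new_coords.shape)
# (200, 2) (200, 2) (3, 2)
```

Unlike classical MDS, scikit-learn's `KernelPCA` has a built-in `transform` for out-of-sample points, which is why no extra formula is needed here.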

References linking the journal article[1] and the code

Below we list the locations in the code that implement the corresponding methods in the article[1].
For details about how to use each function, please refer to decaf/examples/sample_code.ipynb or comments in decaf/src/decaf.py.

Methodology involved in the main text[1]:

double-SOAP (Sec.2A[1]):
decaf/src/decaf.py : function get_SOAP

classical MDS (Sec.2B[1]):
decaf/src/decaf.py : function get_cMDS

Embedding any SOAP vector with the obtained MDS model:
decaf/src/decaf.py : function embed_cMDS
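The classical MDS step (Sec.2B[1]) and the embedding of new SOAP vectors with a fitted model can be sketched generically with NumPy. This is a textbook Torgerson MDS combined with the standard add-a-point (out-of-sample) formula, written for illustration; it is not the code of `get_cMDS` or `embed_cMDS`, whose exact procedure is not reproduced here. The class name `ClassicalMDS` is hypothetical, and random vectors stand in for SOAP descriptors.

```python
import numpy as np


class ClassicalMDS:
    """Hypothetical helper: classical (Torgerson) MDS plus add-a-point embedding.

    Illustrative sketch only; not DECAF's get_cMDS / embed_cMDS.
    """

    def __init__(self, n_components: int = 2):
        self.n_components = n_components

    @staticmethod
    def _sq_dists(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        # Squared Euclidean distances between the rows of a and the rows of b
        d2 = (a**2).sum(axis=1)[:, None] + (b**2).sum(axis=1)[None, :] - 2.0 * a @ b.T
        return np.maximum(d2, 0.0)  # clip tiny negative round-off

    def fit(self, x: np.ndarray) -> "ClassicalMDS":
        self.x_train_ = x
        d2 = self._sq_dists(x, x)
        n = d2.shape[0]
        j = np.eye(n) - 1.0 / n  # centring matrix J = I - 11^T / n
        b = -0.5 * j @ d2 @ j  # double-centred Gram matrix B
        vals, vecs = np.linalg.eigh(b)  # eigenvalues in ascending order
        top = np.argsort(vals)[::-1][: self.n_components]
        # Assumes the kept eigenvalues are strictly positive
        self.eigvals_, self.eigvecs_ = vals[top], vecs[:, top]
        self.row_means_ = d2.mean(axis=1)  # a_i = mean_j D2[i, j]
        self.grand_mean_ = d2.mean()  # g = mean over all D2 entries
        self.embedding_ = self.eigvecs_ * np.sqrt(self.eigvals_)
        return self

    def transform(self, x_new: np.ndarray) -> np.ndarray:
        # Squared distances from each new point to every training point
        d2 = self._sq_dists(x_new, self.x_train_)
        # Inner products with the centred training set:
        # b_i = -1/2 (d_i - mean(d) - a_i + g)
        b = -0.5 * (
            d2 - d2.mean(axis=1, keepdims=True) - self.row_means_[None, :] + self.grand_mean_
        )
        # Coordinates y = Lambda^{-1/2} V^T b, one row per new point
        return (b @ self.eigvecs_) / np.sqrt(self.eigvals_)


rng = np.random.default_rng(0)
train = rng.normal(size=(100, 20))  # stand-in for SOAP vectors
new = rng.normal(size=(5, 20))
mds = ClassicalMDS(n_components=2).fit(train)
print(mds.embedding_.shape, mds.transform(new).shape)  # (100, 2) (5, 2)
```

Applying `transform` to the training vectors themselves returns `embedding_` up to round-off, which is a convenient sanity check for any out-of-sample implementation.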

MSC (Sec.2C[1]):
decaf/src/decaf.py : function get_MeanShift
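Mean shift clustering of embedded coordinates, plus assignment of new points to the resulting clusters, can be sketched with scikit-learn's `MeanShift`. This is an illustration under assumptions, not the body of `get_MeanShift`: the data are synthetic 2-D points standing in for MDS coordinates, and `bandwidth=1.5` is picked by hand for them. `MeanShift.predict` assigns each new point to its nearest cluster centre, which is one simple way to classify out-of-sample environments; whether DECAF classifies them this way is not confirmed here.

```python
import numpy as np
from sklearn.cluster import MeanShift

rng = np.random.default_rng(0)

# Synthetic 2-D "embedded" coordinates: three well-separated groups
centres = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
coords = np.vstack([c + 0.3 * rng.normal(size=(40, 2)) for c in centres])

# Flat-kernel mean shift; the bandwidth is hand-picked for this toy data
msc = MeanShift(bandwidth=1.5).fit(coords)
labels = msc.labels_  # cluster index of each training point
print(len(msc.cluster_centers_), "clusters found")

# Assign new embedded points to the nearest cluster centre
new_points = np.array([[0.1, -0.2], [4.8, 0.3]])
new_labels = msc.predict(new_points)
print(new_labels)
```

The bandwidth controls how many environment classes survive; `sklearn.cluster.estimate_bandwidth` offers a data-driven starting value when no hand-picked scale is available.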

Demonstrations in the main text[1]:

PAH examples (Sec.3A[1]):
decaf/examples/sample_code.ipynb : block PAH Example

Pd Surfaces examples (Sec.3B[1]):
decaf/examples/sample_code.ipynb : block Pd Surfaces Demonstration

Out-of-sample classification of Pd nanoparticle (Sec.3C[1]):
decaf/examples/sample_code.ipynb : block Classification Demonstration

Demonstrations in the supplementary information (SI)[1]:

kPCA (Sec.S1A[1]):
decaf/examples/sample_code.ipynb : block kPCA Embedding

SketchMap (Sec.S1B[1]):
decaf/examples/sample_code.ipynb : block Sketch Map Embedding

HDBSCAN (Sec.S2A[1]):
decaf/examples/sample_code.ipynb : block HDBSCAN Clustering

The demonstrations in Sec.S3-4 can be reproduced by changing the (hyper)parameters as described in the SI, using the functions in:
decaf/examples/sample_code.ipynb : block Pd Surfaces Demonstration

Demonstration in Sec.S5: the MD settings and analysis are described in the main text and can be reproduced from there, so they are omitted from the examples here.

Installation

You can install the package by running the following command in the repository root (the folder containing pyproject.toml):

pip install . --user

Then import the package in Python:

import decaf

Dependencies:
NumPy, ASE, DScribe, scikit-learn, SciPy
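After installation, one quick way to confirm that the dependencies are importable is to look them up with the standard library. The helper name `check_dependencies` is hypothetical, and the mapping from package names to import names (for example, scikit-learn is imported as `sklearn`) is assumed from each project's usual import name.

```python
import importlib.util

# Package name -> top-level import name (assumed usual import names)
DEPENDENCIES = {
    "NumPy": "numpy",
    "ASE": "ase",
    "DScribe": "dscribe",
    "scikit-learn": "sklearn",
    "SciPy": "scipy",
}


def check_dependencies() -> dict:
    """Return {package name: True if its module can be found}."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in DEPENDENCIES.items()
    }


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name:12s} {'found' if ok else 'MISSING'}")
```

`find_spec` only locates the module without importing it, so the check is cheap and does not trigger heavy package initialisation.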

Repository Structure:

decaf
├── examples                            # Folder containing examples of applying DECAF
│   ├── Compiled_SketchMap              #     Folder containing compiled SketchMap if needed
│   ├── sample_code.ipynb               #     Sample code of DECAF applied on the demonstration cases
│   └── Structures                      #     Folder containing atomic structures for the demonstration cases
│       └── **.con
├── pyproject.toml                      # Setup code for installing DECAF
├── README.md                           # The readme you are reading now.
└── src                                 # Folder containing source code of DECAF
    └── decaf.py                        #     Source code of DECAF

Reference

  1. K. C. Lai, S. Matera, C. Scheurer, and K. Reuter, “A Fuzzy Classification Framework to Identify Equivalent Atoms in Complex Materials and Molecules,” J. Chem. Phys. 159(2) (2023). DOI: 10.1063/5.0160369.
  2. M. Ceriotti, G. A. Tribello, and M. Parrinello, “Simplifying the representation of complex free-energy landscapes using sketch-map,” Proc. Natl. Acad. Sci. U.S.A. 108, 13023–13028 (2011).
  3. L. McInnes, J. Healy, and S. Astels, “hdbscan: Hierarchical density based clustering,” J. Open Source Softw. 2, 205 (2017).

Authors and Affiliation

Authors:
King Chun Lai, Sebastian Matera, Christoph Scheurer, Karsten Reuter

Affiliation:
Fritz-Haber-Institut der Max-Planck-Gesellschaft, Faradayweg 4-6, 14195 Berlin, Germany

Support

King Chun Lai : lai@fhi-berlin.mpg.de

License

Descriptor Embedding and Clustering for Atomistic-environment Framework by King Chun Lai, Sebastian Matera, Christoph Scheurer, Karsten Reuter is licensed under CC BY 4.0.