
IPML


Author: Tristan Bereau (Max Planck Institute for Polymer Research, Mainz, Germany)

http://www2.mpip-mainz.mpg.de/~bereau/

Created: 2017


Publication

For a detailed account of the implementation, see:

Tristan Bereau, Robert A. DiStasio Jr., Alexandre Tkatchenko, and O. Anatole von Lilienfeld, Non-covalent interactions across organic and biological subsets of chemical space: Physics-based potentials parametrized from machine learning, The Journal of Chemical Physics 148, 241706 (2018); see also link

Installation

Requirements

ipml is a Python script that requires the following dependencies:

  • numpy
  • scipy
  • numba
  • qml

I recommend using conda (for Python 2.7) to install all dependencies; see https://conda.io/docs/user-guide/install/index.html for how to install conda.

The qml package is available at https://github.com/qmlcode/qml; its installation documentation is at http://www.qmlcode.org/installation.html.

Make sure you have git lfs installed; see the documentation at https://git-lfs.github.com.
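
To confirm that the Python dependencies listed above are importable in the active environment, a quick check along these lines may help. This is only a sketch and not part of ipml; the helper name missing_modules is my own, and the check says nothing about package versions.

```python
import importlib

# Dependencies listed above; "qml" is the package from github.com/qmlcode/qml.
REQUIRED = ["numpy", "scipy", "numba", "qml"]


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```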

Installation

Clone the repository

git clone https://gitlab.mpcdf.mpg.de/trisb/ipml.git

This will download everything except the machine learning models (*.pkl files). Because of their large size, they are instead stored with git lfs.

Pull the large files from the repository:

git lfs checkout

If successful, each of the files mtp_bset_slatm/corr_?.pkl will be larger than 100 MB.
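
If the large files were not fetched, each *.pkl file is only a small Git LFS pointer: a text file whose first line is "version https://git-lfs.github.com/spec/v1". The sketch below reports, for each matching file, its size and whether it is still a pointer. It assumes you run it from the repository root; the function names is_lfs_pointer and report are my own and not part of ipml.

```python
import glob
import os

# First line of every Git LFS pointer file.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """True if `path` is still a small LFS pointer rather than real data."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_HEADER)) == LFS_HEADER


def report(pattern="mtp_bset_slatm/corr_?.pkl"):
    """Print size and pointer status for each file matching `pattern`."""
    for path in sorted(glob.glob(pattern)):
        size_mb = os.path.getsize(path) / 1e6
        status = "pointer only" if is_lfs_pointer(path) else "downloaded"
        print("{}: {:.1f} MB ({})".format(path, size_mb, status))


if __name__ == "__main__":
    report()
```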

Usage

See the example file examples/1161_waterdimer10.py, which computes the intermolecular interaction of the water dimer from the S22 dataset.

Databases

Databases for the machine learning models can be found in the directory databases. They must be untarred before use, e.g.,

cd databases
tar -xzf multipole_database.tgz
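
If you prefer to unpack every archive in one go, the same step can be sketched in Python with the standard tarfile module. This is an illustration rather than part of ipml: the function name extract_archives is my own, and it assumes all archives in the directory end in .tgz and come from this repository (tarfile should only be used on archives you trust).

```python
import glob
import os
import tarfile


def extract_archives(directory="databases"):
    """Extract every .tgz archive in `directory` into that same directory."""
    extracted = []
    for archive in sorted(glob.glob(os.path.join(directory, "*.tgz"))):
        # "r:gz" opens a gzip-compressed tar archive for reading.
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(path=directory)
        extracted.append(archive)
    return extracted


if __name__ == "__main__":
    for name in extract_archives():
        print("Extracted " + name)
```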

Machine learning models

All ML models can be found as *.pkl files in ml_models/, except for multipole coefficients (see below).

Training machine learning models of multipoles

Coulomb matrix representation

As an example of training a machine learning model for multipoles using the Coulomb matrix representation:

cd mtp_bset
python train_mtp_ml_bset.py --mols 300 --frac 0.8 --ele O --plot

The --mols argument instructs the script to consider 300 molecules overall. The data is split between training and test sets according to the --frac fraction (80% for training in this case). The --ele argument restricts learning to a single chemical element (oxygen here), and the last argument, --plot, uses matplotlib to display a correlation plot for the test set.
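
To make the roles of these flags concrete, here is a minimal sketch of how such options could be parsed and how a --frac split might work. It is not the code of train_mtp_ml_bset.py: the parser, the defaults, the shuffling, and the function names are my own assumptions, chosen only to mirror the flags in the command above.

```python
import argparse
import random


def parse_args(argv=None):
    """Parse flags named after those in the example command (hypothetical parser)."""
    parser = argparse.ArgumentParser(description="Illustrative flag parser")
    parser.add_argument("--mols", type=int, default=300,
                        help="total number of molecules to consider")
    parser.add_argument("--frac", type=float, default=0.8,
                        help="fraction of molecules used for training")
    parser.add_argument("--ele", type=str, default="O",
                        help="chemical element to learn")
    parser.add_argument("--plot", action="store_true",
                        help="show a correlation plot for the test set")
    return parser.parse_args(argv)


def split_train_test(items, frac, seed=0):
    """Shuffle `items` (assumed behavior) and split into training and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(frac * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    args = parse_args(["--mols", "300", "--frac", "0.8", "--ele", "O"])
    train, test = split_train_test(range(args.mols), args.frac)
    print("{} training, {} test".format(len(train), len(test)))
```

With 300 molecules and a fraction of 0.8, this split yields 240 training and 60 test molecules.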

aSLATM representation

Head to the aSLATM representation directory, precompute the representations, and train the model:

cd mtp_bset_slatm
python build_mbtypes.py
python train_mtp_ml_bset.py --mols 300 --frac 0.8 --ele O --plot

The first Python command precomputes the aSLATM representations and stores them in the file mbtypes.pkl, which the second Python command then reads. Training with aSLATM is noticeably slower because the representation is significantly larger. This also results in much larger .pkl training files.
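
The precompute-then-reuse pattern described here can be sketched with the standard pickle module. This is a generic illustration, not the code of build_mbtypes.py: the helper load_or_build and the build callback are hypothetical names of my own.

```python
import os
import pickle


def load_or_build(cache_path, build):
    """Return the cached object at `cache_path`, building and saving it if absent."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as handle:
            return pickle.load(handle)
    # Expensive step: only runs when no cache file exists yet.
    result = build()
    with open(cache_path, "wb") as handle:
        pickle.dump(result, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return result
```

A call such as load_or_build("mbtypes.pkl", compute_mbtypes), where compute_mbtypes stands for whatever function produces the representations, would compute them once and read them from disk on later runs.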