IPML
Author: Tristan Bereau (Max Planck Institute for Polymer Research, Mainz, Germany)
http://www2.mpip-mainz.mpg.de/~bereau/
Created: 2017
Publication
For a detailed account of the implementation, see:
Tristan Bereau, Robert A. DiStasio Jr., Alexandre Tkatchenko, and O. Anatole von Lilienfeld, Non-covalent interactions across organic and biological subsets of chemical space: Physics-based potentials parametrized from machine learning, The Journal of Chemical Physics 148, 241706 (2018)
Installation
Requirements
ipml is a Python script that requires the following dependencies:
numpy
scipy
numba
qml
I recommend using conda (for Python 2.7) to install all dependencies: https://conda.io/docs/user-guide/install/index.html
qml package: https://github.com/qmlcode/qml
qml installation documentation: http://www.qmlcode.org/installation.html
Make sure you have git lfs installed. See the documentation: https://git-lfs.github.com
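Before running anything, it can help to confirm that the Python dependencies listed above are importable in the active environment. The following is a minimal sketch, not part of ipml; the helper name missing_modules is hypothetical, and it assumes each package is imported under the same name as it is listed.

```python
from __future__ import print_function
import importlib

# Module names taken from the dependency list above.
REQUIRED = ["numpy", "scipy", "numba", "qml"]


def missing_modules(names):
    """Return the names that fail to import in the current environment."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except Exception:
            # Any failure on import (not installed, broken build, ...)
            # means the package is not usable as-is.
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```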
Installation
Clone the repository:
git clone https://gitlab.mpcdf.mpg.de/trisb/ipml.git
This will download everything but the machine learning models (*.pkl files). Because of their large size, they are instead accessible using git lfs.
Pull the large files from the repository:
git lfs checkout
If successful, the files mtp_bset_slatm/corr_?.pkl will each be larger than 100 MB.
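One way to check the file sizes is sketched below. It is not part of ipml; the helper names is_lfs_pointer and report are hypothetical. It relies on the fact that a file not yet fetched by git lfs is a small text stub whose first line starts with "version https://git-lfs.github.com/spec/v1". Run it from the repository root.

```python
from __future__ import print_function
import glob
import os

# First bytes of a git lfs pointer stub (a file whose content was not fetched).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` is still a git lfs pointer stub."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def report(pattern):
    """Return (path, size in MB, is_pointer) for each file matching `pattern`."""
    rows = []
    for path in sorted(glob.glob(pattern)):
        size_mb = os.path.getsize(path) / 1e6
        rows.append((path, size_mb, is_lfs_pointer(path)))
    return rows


if __name__ == "__main__":
    for path, size_mb, pointer in report("mtp_bset_slatm/corr_?.pkl"):
        status = "pointer only, model not downloaded" if pointer else "ok"
        print("%-40s %8.1f MB  %s" % (path, size_mb, status))
```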
Usage
See the example file examples/1161_waterdimer10.py. It computes the intermolecular interaction of the water dimer of the S22 dataset.
Databases
Databases for the machine learning models can be found in the directory databases. They must be untarred before use, e.g.,
cd databases
tar -xzf multipole_database.tgz
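The same step can be done from Python for all archives at once. This is a minimal sketch, not part of ipml; the helper name extract_archives is hypothetical, and it assumes every database archive in the directory uses the .tgz extension. As with tar itself, only extract archives you trust.

```python
from __future__ import print_function
import glob
import os
import tarfile


def extract_archives(directory):
    """Extract every *.tgz archive in `directory` in place.

    Returns the list of archives that were extracted.
    """
    archives = sorted(glob.glob(os.path.join(directory, "*.tgz")))
    for archive in archives:
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(path=directory)
    return archives


if __name__ == "__main__":
    # Run from the repository root; does nothing if no archive is found.
    for name in extract_archives("databases"):
        print("extracted " + name)
```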
Machine learning models
All ML models can be found as *.pkl files in ml_models/, except for multipole coefficients (see below).
Training a machine learning model of multipoles
Coulomb matrix representation
For example, to train a machine learning model for multipoles using the Coulomb matrix representation:
cd mtp_bset
python train_mtp_ml_bset.py --mols 300 --frac 0.8 --ele O --plot
The --mols argument instructs the script to consider 300 molecules overall. The data is split between training and test sets at the --frac fraction (80% in this case). The --ele argument restricts learning to a single chemical element (oxygen here), and the last argument, --plot, uses matplotlib to display a correlation plot for the test set.
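To make the roles of --mols and --frac concrete, here is a minimal sketch of a fractional train/test split. It is an illustration only, not the code in train_mtp_ml_bset.py: the function name split_indices is hypothetical, and it assumes that the fraction refers to the training share and that molecules are shuffled before splitting.

```python
from __future__ import print_function
import numpy as np


def split_indices(n_mols, frac, seed=0):
    """Shuffle the indices 0..n_mols-1 and split them into train and test sets.

    `frac` is taken to be the training fraction (an assumption).
    """
    rng = np.random.RandomState(seed)
    order = rng.permutation(n_mols)
    n_train = int(round(frac * n_mols))
    return order[:n_train], order[n_train:]


# Values matching the command line above: --mols 300 --frac 0.8
train, test = split_indices(300, 0.8)
print(len(train), len(test))  # 240 training molecules, 60 test molecules
```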
aSLATM representation
Head to the aSLATM representation directory:
cd mtp_bset_slatm
python build_mbtypes.py
python train_mtp_ml_bset.py --mols 300 --frac 0.8 --ele O --plot
The first python command precomputes the aSLATM representations and stores them in the mbtypes.pkl file, which the second python command then uses. Training with aSLATM is noticeably slower because the representation is significantly larger. This also results in much larger pkl training files.