Skip to content
Snippets Groups Projects
Commit fe3dfbec authored by Luigi Sbailo's avatar Luigi Sbailo
Browse files

Minor

parent e917a10b
Branches
No related tags found
No related merge requests found
%% Cell type:code id: tags:
``` python
%%HTML
<script>
window.findCellIndicesByTag = function findCellIndicesByTag(tagName) {
return (Jupyter.notebook.get_cells()
.filter(
({metadata: {tags}}) => tags && tags.includes(tagName)
)
.map((cell) => Jupyter.notebook.find_cell_index(cell))
);
};
window.runCells = function runCells(tagName) {
var c = window.findCellIndicesByTag(tagName);
Jupyter.notebook.execute_cells(c);
};
</script>
```
%% Cell type:markdown id: tags:
<div id="teaser" style=' background-position: right center; background-size: 00px; background-repeat: no-repeat;
padding-top: 20px;
padding-right: 10px;
padding-bottom: 170px;
padding-left: 10px;
border-bottom: 14px double #333;
border-top: 14px double #333;' >
<div style="text-align:center">
<b><font size="6.4">Predicting energy differences between crystal structures: (Meta-)stability of octet-binary compounds</font></b>
</div>
<p>
created by: Mohammad-Yasin Arif, Luigi Sbailò, Thomas A. R. Purcell, Luca M. Ghiringhelli, and Matthias Scheffler
based on the original version designed by: Angelo Ziletti, Emre Ahmetcik, Runhai Ouyang, Luca M. Ghiringhelli, and Matthias Scheffler<br><br>
Fritz Haber Institute of the Max Planck Society, Faradayweg 4-6, D-14195 Berlin, Germany <br>
<span class="nomad--last-updated" data-version="v1.0.0">[Last updated: Sep 5, 2020]</span>
<div>
<img style="float: left;" src="assets/descriptor_role/Logo_MPG.png" width="200">
<img style="float: right;" src="assets/descriptor_role/Logo_NOMAD.png" width="250">
</div>
</div>
%% Cell type:code id: tags:
``` python
%%HTML
<script>
code_show=true;
function code_toggle() {
if (code_show)
{
$('div.input').hide();
}
else
{
$('div.input').show();
}
code_show = !code_show
}
$( document ).ready(code_toggle);
window.runCells("startup");
</script>
The raw code for this notebook is by default hidden for easier reading.
To toggle on/off the raw code, click <a href="javascript:code_toggle()">here</a>.
```
%% Output
%% Cell type:markdown id: tags:
### Introduction
This tutorial shows how to find descriptive parameters (short formulas) that predict the crystal structure (here, rocksalt (RS) or zincblende (ZB)), using the example of octet binary compounds. It is based on the algorithm sure independence screening and sparsifying operator (SISSO), that enables to search for optimal descriptor by scanning huge feature spaces.
<div style="padding: 1ex; margin-top: 1ex; margin-bottom: 1ex; border-style: dotted; border-width: 1pt; border-color: blue; border-radius: 3px;">R. Ouyang, S. Curtarolo, E. Ahmetcik, M. Scheffler, L. M. Ghiringhelli: <span style="font-style: italic;">SISSO: a compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates</span>, Phys. Rev. Materials 2, 083802 (2018) <a href="https://journals.aps.org/prmaterials/abstract/10.1103/PhysRevMaterials.2.083802" target="_blank">[PDF]</a>.</div>
With the default settings, the method reproduces the same results from:
<div style="padding: 1ex; margin-top: 1ex; margin-bottom: 1ex; border-style: dotted; border-width: 1pt; border-color: blue; border-radius: 3px;">L. M. Ghiringhelli, J. Vybiral, S. V. Levchenko, C. Draxl, M. Scheffler: <span style="font-style: italic;">Big Data of Materials Science: Critical Role of the Descriptor</span>, Phys. Rev. Lett. 114, 105503 (2015) <a href="http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.114.105503">[PDF]</a>,</div>
%% Cell type:markdown id: tags:
<details>
<summary>
<div style="padding: 1ex; margin-top: 1ex; margin-bottom: 1ex; border-style: dotted; border-width: 1pt; border-color: blue; border-radius: 3px;"><b>Explanation of the method (click to expand)</b></div></summary>
We present a tool for predicting the crystal structure of octet binary compounds, by using a set of descriptive parameters (a descriptor) based on free-atom data of the atomic species constituting the binary material. We apply a newly developed method: sure independence screening and sparsifying operator (SISSO), that allows to find an optimal descriptor in a huge feature space containing billions of features. In this tutorial an $\ell_0$-optimization is used as the sparsifying operator.
The method is described in:
<div style="padding: 1ex; margin-top: 1ex; margin-bottom: 1ex; border-style: dotted; border-width: 1pt; border-color: blue; border-radius: 3px;">
R. Ouyang, S. Curtarolo, E. Ahmetcik, M. Scheffler, L. M. Ghiringhelli: <span style="font-style: italic;">SISSO: a compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates</span>, Phys. Rev. Materials 2, 083802 (2018) <a href="https://journals.aps.org/prmaterials/abstract/10.1103/PhysRevMaterials.2.083802" target="_blank">[PDF]</a>. <br> </div>
SISSO($\ell_0$) works iteratively. In the first iteration, a number k of features is collected that have the largest correlation (scalar product) with the property vector. The feature with the largest correlation is simply the 1D descriptor. Next, a residual is constructed as the error made at the first iteration. A new set of k features is now selected as those having the largest correlation with the residual. The 2D descriptor is the pair of features that yield the smallest fitting error upon least square regression, among all possible pairs contained in the union of the sets selected in this and the first iteration. In each next iteration a new residual is constructed as the error made in the previous iteration, then a new set of k features is extracted as those that have largest correlation with each new residual. The nD descriptor is the n-tuple of features that yield the smallest fitting error upon least square regression, among all possible n-tuples contained in the union of the sets obtained in each new iteration and all the previous iterations. If k=1 the method collapses to the so-called orthogonal matching pursuit.
The prediction of the ground-state structure for binary compounds from a simple descriptor has a notable history in materials science [1-7], where descriptors were designed by chemically/physically-inspired intuition. The tool presented here allows for the machine-learning-aided automatic discovery of a descriptor and a model for the prediction of the difference in energy between a selected pair of structures for 82 octet binary materials.
By running the tutorial with the default setting, the (RS vs. ZB) results of the <a href="http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.114.10550" target="_blank">PRL 2015</a> identified by the LASSO+$\ell_0$ method can be recovered. SISSO and LASSO+$\ell_0$ do not always yield the same results (see <a href="http://analytics-toolkit.nomad-coe.eu/tutorial-SIS">compressed-sensing tutorial</a>) but in this case the default model parameters were tuned to obtain the same results.
Additionally, in <a href="http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.114.10550" target="_blank">PRL-2015</a>, a slightly different criterion for the construction of the feature set was adopted, compared to <a href="https://journals.aps.org/prmaterials/abstract/10.1103/PhysRevMaterials.2.083802" target="_blank">PRM-2018</a>. For the sake of reproducing exactly the results of <a href="http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.114.10550" target="_blank">PRL-2015</a>, the default settings in the input widget include "PRL2015" as choice for "SISSO rung".
References:
<ol>
<li>J. A. van Vechten, Phys. Rev. 182, 891 (1969).</li>
<li>J. C. Phillips, Rev. Mod. Phys. 42, 317 (1970).</li>
<li>J. John and A. N. Bloch, Phys. Rev. Lett. 33, 1095 (1974).</li>
<li>J. R. Chelikowsky and J. C. Phillips, Phys. Rev. B 17, 2453 (1978).</li>
<li>A. Zunger, Phys. Rev. B 22, 5839 (1980).</li>
<li>D. G. Pettifor, Solid State Commun. 51, 31 (1984).</li>
<li>Y. Saad, D. Gao, T. Ngo, S. Bobbitt, J. R. Chelikowsky, and W. Andreoni, Phys. Rev. B 85, 104104 (2012).</li>
</ol>
In this example, you can run a compressed-sensing based algorithm for finding the optimal descriptor and model that predicts the difference in energy between crystal structures (here, zincblende vs. rocksalt).
The descriptor is selected out of a large number of candidates constructed as functions of basic input features, the primary features.
</details>
%% Cell type:markdown id: tags:
The idea demonstrated in this tutorial is to start from simple physical quantities ("primary features", here properties of the constituent free atoms such as orbital radii), to generate millions (or billions) of candidate formulas by applying arithmetic operations combining primary features. These candidate formulas constitute the so-called "feature space". Then, SISSO is used to select only a few of these formulas that explain the data.
By clicking directly on "Run" below, you can reproduce results from the above publication, or you can modify the settings to produce your own results. To the purpose, in the panel below, you can select primary features and allowed operations by clicking the check-boxes. You can also select the SISSO rung (i.e., the number of iterations in the construction of the feature space), the number of features that are selected at each iteration of the SIS step, and the max number of dimensions of the model. Then, press "Run". After the results are shown for all models from one dimensional to the max chosen dimension, you can press "Plot interactive map" to reveal a map of the RS vs ZB relative stability, for the highest dimensional model. If the highest dimension model is 2D, the separation line between the two phases (i.e., the locus where the predicted $\Delta$E is zero) is shown. For higher dimensional models, the 3rd and 4th dimensions can be controlled by the size of the Marker or the Color. Drop-down menus allow to assign axes, markers, and colors, to the descriptor components of choice
By running the tutorial with the default setting, the (RS vs. ZB) results of the <a href="http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.114.10550" target="_blank">PRL 2015</a> identified by the LASSO+$\ell_0$ method can be recovered. SISSO and LASSO+$\ell_0$ do not always yield the same results (see <a href="http://analytics-toolkit.nomad-coe.eu/tutorial-SIS">compressed-sensing tutorial</a>) but in this case the default model parameters were tuned to obtain the same results.
Additionally, in <a href="http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.114.10550" target="_blank">PRL-2015</a>, a slightly different criterion for the construction of the feature set was adopted, compared to <a href="https://journals.aps.org/prmaterials/abstract/10.1103/PhysRevMaterials.2.083802" target="_blank">PRM-2018</a>. For the sake of reproducing exactly the results of <a href="http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.114.10550" target="_blank">PRL-2015</a>, the default settings in the input widget include "PRL2015" as choice for "SISSO rung". In order to unlock the feature and operator selection, first select a rung different from "PRL2015".
%% Cell type:code id: tags:startup
``` python
import os
import pandas as pd
import numpy as np
from itertools import combinations
from time import time
import matplotlib.pyplot as plt
import scipy.stats as ss
import warnings
from collections import Counter
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from IPython.display import HTML, clear_output
import ipywidgets as widgets
import plotly.graph_objects as go
from ipywidgets import widgets, interactive
from jupyter_jsmol import JsmolView
import nglview
from ase.io import read
from descriptor_role.utils import generate_structures
from descriptor_role.visualizer import Visualizer
from cpp_sisso import generate_fs, SISSORegressor, generate_phi_0_from_csv, FeatureSpace
# set display options for the notebook
%matplotlib inline
warnings.filterwarnings('ignore')
```
%% Cell type:code id: tags:startup
``` python
# load data
RS_structures = read("data/descriptor_role/RS_structures.xyz", index=':')
ZB_structures = read("data/descriptor_role/ZB_structures.xyz", index=':')
def generate_table(RS_structures, ZB_structures):
for RS, ZB in zip(RS_structures, ZB_structures):
energy_diff = RS.info['energy'] - ZB.info['energy']
min_struc_type = 'RS' if energy_diff < 0 else 'ZB'
struc_obj_min = RS if energy_diff < 0 else ZB
yield [RS.info['energy'], ZB.info['energy'],
energy_diff, min_struc_type,
RS.info['Z'], ZB.info['Z'],
RS.info['period'], ZB.info['period'],
RS.info['IP'], ZB.info['IP'],
RS.info['EA'], ZB.info['EA'],
RS.info['E_HOMO'], ZB.info['E_HOMO'],
RS.info['E_LUMO'], ZB.info['E_LUMO'],
RS.info['r_s'], ZB.info['r_s'],
RS.info['r_p'], ZB.info['r_p'],
RS.info['r_d'], ZB.info['r_d'],
abs(RS.info['r_p']+RS.info['r_s']-ZB.info['r_p']-ZB.info['r_s']),
abs(RS.info['r_p']-RS.info['r_s'])+abs(ZB.info['r_p']-ZB.info['r_s']),
RS, ZB, struc_obj_min]
df = pd.DataFrame(
generate_table(RS_structures, ZB_structures),
columns=['energy_RS', 'energy_ZB',
'energy_diff', 'min_struc_type',
'Z_A (nuc_charge)', 'Z_B (nuc_charge)',
'period_A (unitless)', 'period_B (unitless)',
'IP_A (eV_IP)', 'IP_B (eV_IP)',
'EA_A (eV_IP)', 'EA_B (eV_IP)',
'E_HOMO_A (eV)', 'E_HOMO_B (eV)',
'E_LUMO_A (eV)', 'E_LUMO_B (eV)',
'r_s_A', 'r_s_B',
'r_p_A', 'r_p_B',
'r_d_A', 'r_d_B',
'r_sigma', 'r_pi',
'struc_obj_RS', 'struc_obj_ZB', 'struc_obj_min'],
index=list(RS.get_chemical_formula() for RS in RS_structures)
)
# print data without structure objects
df_reduced = df.drop(['energy_RS', 'energy_ZB', 'min_struc_type', 'struc_obj_RS', 'struc_obj_ZB', 'struc_obj_min'], axis=1)
#df_reduced
```
%% Cell type:code id: tags:startup
``` python
def plot_2d_solution(b):
with out2:
clear_output()
generate_structures (RS_structures,ZB_structures)
visualizer=Visualizer(df, sisso, feat_space)
visualizer.show()
```
%% Cell type:code id: tags:startup
``` python
def get_feat_space_and_sr(
df,
ops=["add", "abs_diff", "div", "sq", "exp"],
cols="all",
max_phi=2,
n_sis_select=50,
remove_double_divison=True,
max_dim=3,
n_residual=1,
default=True,
):
phi_0, prop_unit, prop, prop_test, task_sizes_train, task_sizes_test, leave_out_inds = generate_phi_0_from_csv(
df, "energy_diff", cols=cols, task_key=None, leave_out_frac=0.0, leave_out_inds=None
)
if default:
feat_space = FeatureSpace(
"./data/descriptor_role/phi.txt",
phi_0,
task_sizes_train,
n_sis_select,
1.0
)
else:
feat_space = generate_fs(
phi_0,
prop,
task_sizes_train,
ops,
max_phi,
n_sis_select
)
sisso = SISSORegressor(
feat_space,
prop_unit,
prop,
prop_test,
task_sizes_train,
task_sizes_test,
leave_out_inds,
max_dim,
1,
1
)
return feat_space, sisso
```
%% Cell type:code id: tags:startup
``` python
def prl_select(change):
if change['new'] == 'PRL2015':
default_selection('')
else:
for widget in op_list+feat_list:
widget.disabled = False
def default_selection(b):
default_operations = ['add','abs_diff','exp','sq','div']
default_features = ['IP','EA','r_s','r_p','r_d']
for op, widget in zip(possible_operations, op_list):
widget.value = op in default_operations
widget.disabled = True
for feat, widget in zip(possible_features, feat_list):
widget.value = feat in default_features
widget.disabled = True
tier_selection.value = 'PRL2015'
feat_per_iter_selection.value = 50
dimension_selection.value = 2
def find_descriptors(b):
with out2:
clear_output()
with out1:
clear_output()
print('Calculating...', flush=True)
selected_features = []
allowed_operations = []
for op, widget in zip(possible_operations, op_list):
if widget.value:
allowed_operations.append(op)
for feat, widget in zip(possible_features, feat_list):
if widget.value:
if feat == 'r_sigma' or feat == 'r_pi':
selected_features.append(feat)
else:
selected_features.extend([feat+nuc for nuc in ['_A', '_B']])
if tier_selection.value == 'PRL2015':
selected_features = "all"
tier = 2
default = True
else:
tier = tier_selection.value
default = False
global feat_space
global sisso
feat_space, sisso = get_feat_space_and_sr(
df = df_reduced,
ops = allowed_operations,
cols = selected_features,
max_phi = tier,
n_sis_select = feat_per_iter_selection.value,
remove_double_divison=True,
max_dim = dimension_selection.value,
n_residual = 1,
default = default)
clear_output()
if (dimension_selection.value>1):
plot_button.disabled=False
else:
plot_button.disabled=True
print("Number of features generated: " + str(feat_space.n_feat))
try:
sisso.fit()
for i in range(dimension_selection.value):
print(str(i+1)+'D model')
print("RMSE: {:.4} | Descriptor: {}".format(sisso.models[i][0].rmse, sisso.models[i][0]))
string = "c0:{:.4}".format(sisso.models[i][0].coefs[0][-1])
for j in range(i+1):
string = string + str(" | a"+str(j)+":{:.4}".format(sisso.models[i][0].coefs[0][j]))
print(string + '\n')
except RuntimeError:
print("\nThe number of selected features per SIS iteration is bigger than the number of features available. Please reduce the number of selected features per SIS iteration (number of features generated / max number of dimensions) or increase the number of selected features and operations.")
feat_space, sisso = get_feat_space_and_sr(
df = df_reduced,
ops = allowed_operations,
cols = selected_features,
max_phi = tier,
n_sis_select = feat_per_iter_selection.value,
remove_double_divison=True,
max_dim = dimension_selection.value,
n_residual = 1,
default = default)
clear_output()
if (dimension_selection.value>1):
plot_button.disabled=False
else:
plot_button.disabled=True
print("Number of features generated: " + str(feat_space.n_feat))
try:
sisso.fit()
for i in range(dimension_selection.value):
print(str(i+1)+'D model')
print("RMSE: {:.4} | Descriptor: {}".format(sisso.models[i][0].rmse, sisso.models[i][0]))
string = "c0:{:.4}".format(sisso.models[i][0].coefs[0][-1])
for j in range(i+1):
string = string + str(" | a"+str(j)+":{:.4}".format(sisso.models[i][0].coefs[0][j]))
print(string + '\n')
except RuntimeError:
print("\nThe number of selected features per SIS iteration is bigger than the number of features available. Please reduce the number of selected features per SIS iteration (number of features generated / max number of dimensions) or increase the number of selected features and operations.")
except:
print('The present selection does not lead to the creation of any derived features in the highest selected rung, please select at least one binary or power operator, or reduce the maximum rung')
```
%% Cell type:code id: tags:startup
``` python
cb_layout = widgets.Layout(width = '15px')
thin_layout = widgets.Layout(width = '100px')
mid_layout = widgets.Layout(width = '200px')
wide_layout = widgets.Layout(width = '300px')
#possible_operations = ['add', 'sub', 'abs_diff', 'mult', 'div', 'exp', 'neg_exp', 'inv', 'sq', 'cb',
# 'six_pow', 'sqrt', 'cbrt', 'log', 'abs', 'sin', 'cos']
possible_operations = ['add', 'sub', 'abs_diff', 'mult', 'div', 'exp', 'neg_exp', 'inv', 'sq', 'cb',
'sqrt', 'cbrt', 'log', 'abs']
possible_features = ['Z','IP','EA','E_HOMO','E_LUMO','r_s','r_p','r_d', 'r_sigma', 'r_pi']
tooltips = {
"Z" : "Atomic number",
"IP" : "Atomic ionization potential",
"EA" : "Atomic electron affinity",
"E_HOMO" : "Energy of highest occupied atomic orbital",
"E_LUMO" : "Energy of lowest unoccupied molecular orbital",
"r_s" : "Radius at which the radial probability density of the valence s orbital is maximum",
"r_p" : "Radius at which the radial probability density of the valence p orbital is maximum",
"r_d" : "Radius at which the radial probability density of the valence d orbital is maximum",
"r_sigma" : "John-Bloch's indicator 1: |rp(A) + rs(A) - rp(B) - rs(B)| [Phys. Rev. Lett. 33. 1095 (1974)",
"r_pi": "John-Bloch's indicator2: |rp(A) - rs(A)| +| rp(B) - rs(B)| [Phys. Rev. Lett. 33. 1095 (1974)]"
}
labels = {
'add' : '$x + y$', 'sub' : '$x - y$', 'abs_diff' : '$|x - y|$', 'mult' : '$x \cdot y$', 'div' : '$x / y$',
'exp' : '$\exp(x)$', 'neg_exp' : '$\exp(-x)$', 'inv' : '$1/x$', 'sq' : '$x^2$', 'cb' : '$x^3$',
'six_pow' : '$x^6$', 'sqrt' : '$\sqrt{x}$', 'cbrt' : '$\sqrt[3]{x}$', 'log' : '$\log(x)$',
'abs' : '$|x|$', 'sin' : '$\sin(x)$', 'cos' : '$\cos(x)$', 'Z' : '$Z$', 'IP' : '$IP$', 'EA' : '$EA$',
'E_HOMO' : '$E_{HOMO}$', 'E_LUMO' : '$E_{LUMO}$', 'r_s' : '$r_s$', 'r_p' : '$r_p$', 'r_d' : '$r_d$',
'r_sigma' : '$r_\sigma$', 'r_pi' : '$r_\pi$'
}
op_list = []
op_labels = []
feat_list = []
feat_labels = []
for operation in possible_operations:
op_list.append(widgets.Checkbox(description='', value=True, indent=False, layout=cb_layout))
op_labels.append(widgets.Label(value=labels[operation]))
for feature in possible_features:
feat_list.append(widgets.Checkbox(description=tooltips[feature], value=True, indent=False, layout=cb_layout))
feat_labels.append(widgets.Label(value=labels[feature]))
op_box = widgets.VBox([widgets.Label()]+op_list)
op_label_box = widgets.VBox([widgets.Label(value='Operations:', layout=thin_layout)]+op_labels)
feat_box = widgets.VBox([widgets.Label()]+feat_list)
feat_label_box = widgets.VBox([widgets.Label(value='Features:', layout=thin_layout)]+feat_labels)
tier_selection = widgets.Dropdown(options=['PRL2015', 1,2,3], layout=thin_layout)
feat_per_iter_selection = widgets.BoundedIntText(value=26, min=1, max=100, step=1, layout=thin_layout)
dimension_selection = widgets.BoundedIntText(value = 3, min=1, max=4, step=1, layout = thin_layout)
settings_box = widgets.VBox([
widgets.Label(value='Settings:', layout=wide_layout),
widgets.Label(value='SISSO rung:', layout=wide_layout),
tier_selection,
widgets.Label(value='Number of selected features per SIS iteration:', layout=wide_layout),
feat_per_iter_selection,
widgets.Label(value='Maximum number of dimensions:', layout=wide_layout),
dimension_selection])
default_button = widgets.Button(description = 'Default selection', layout=mid_layout)
descriptor_button = widgets.Button(description = 'Run', layout=mid_layout)
plot_button = widgets.Button(description = 'Plot interactive map', disabled=True, layout=mid_layout)
default_button.on_click(default_selection)
descriptor_button.on_click(find_descriptors)
plot_button.on_click(plot_2d_solution)
button_box = widgets.VBox([default_button, descriptor_button, plot_button])
out1 = widgets.Output()
out2 = widgets.Output()
gui_box = widgets.HBox([op_box, op_label_box, feat_box, feat_label_box, settings_box, button_box])
out_box = widgets.VBox([gui_box, out1, out2])
tier_selection.observe(prl_select, names='value')
default_selection('')
display(out_box)
```
%% Output
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please register or to comment