Commit 86e45d64 authored by Qaem Hassanzada's avatar Qaem Hassanzada
Browse files

fix the Platt probability mistake for the chemical formula

parent 286be127
...@@ -1074,11 +1074,11 @@ ...@@ -1074,11 +1074,11 @@
" nA = ABX3.nA\n", " nA = ABX3.nA\n",
" rA_rB_ratio = ABX3.rA/ABX3.rB\n", " rA_rB_ratio = ABX3.rA/ABX3.rB\n",
" rX_rB_ratio = ABX3.rX/ABX3.rB\n", " rX_rB_ratio = ABX3.rX/ABX3.rB\n",
" p_tau=[eval(tolerance_factor_dict[\"tau\"][0])]*len(p_t_sisso_train)\n",
" p_tau=clf2.predict_proba(np.array(p_tau).reshape(-1,1))[:,1]\n",
" if (str(ABX3.chosen_ox_states)==\"nan\"):\n", " if (str(ABX3.chosen_ox_states)==\"nan\"):\n",
" print(\"Unfortunately, no charge-balanced combinations for the cations oxidation states were found, therefore tau can not be evaluated for the given formula.\")\n", " print(\"Unfortunately, no charge-balanced combinations for the cations oxidation states were found, therefore tau can not be evaluated for the given formula.\")\n",
" else:\n", " else:\n",
" p_tau=[eval(tolerance_factor_dict[\"tau\"][0])]*len(p_t_sisso_train)\n",
" p_tau=clf2.predict_proba(np.array(p_tau).reshape(-1,1))[:,1]\n",
" print(\"The entered chemical formula has: \\n\\n Larger cation (A): %s \\n Smaller cation (B): %s \\n Anion (X): %s\" % (ABX3.pred_A,ABX3.pred_B,ABX3.X))\n", " print(\"The entered chemical formula has: \\n\\n Larger cation (A): %s \\n Smaller cation (B): %s \\n Anion (X): %s\" % (ABX3.pred_A,ABX3.pred_B,ABX3.X))\n",
" tau_value=eval(tolerance_factor_dict[\"tau\"][0])\n", " tau_value=eval(tolerance_factor_dict[\"tau\"][0])\n",
" a=[ABX3.pred_A,ABX3.pred_B,ABX3.X,\"3\"]\n", " a=[ABX3.pred_A,ABX3.pred_B,ABX3.X,\"3\"]\n",
......
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
<img src="assets/perovskites_tolerance_factor/header.jpg" width="900"> <img src="assets/perovskites_tolerance_factor/header.jpg" width="900">
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
<img style="float: left;" src="assets/perovskites_tolerance_factor/logo_MPG.png" width=150> <img style="float: left;" src="assets/perovskites_tolerance_factor/logo_MPG.png" width=150>
<img style="float: left; margin-top: -10px" src="assets/perovskites_tolerance_factor/logo_NOMAD.png" width=250> <img style="float: left; margin-top: -10px" src="assets/perovskites_tolerance_factor/logo_NOMAD.png" width=250>
<img style="float: left; margin-top: -5px" src="assets/perovskites_tolerance_factor/logo_HU.png" width=130> <img style="float: left; margin-top: -5px" src="assets/perovskites_tolerance_factor/logo_HU.png" width=130>
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
This tutorial shows how a tolerance factor for predicting perovskite stability can be learned from data with the sure-independece-screening-and-sparsifying-operator (SISSO) descriptor-identification approach. This tutorial shows how a tolerance factor for predicting perovskite stability can be learned from data with the sure-independece-screening-and-sparsifying-operator (SISSO) descriptor-identification approach.
The SISSO method is described in detail in: The SISSO method is described in detail in:
<div style="padding: 1ex; margin-top: 1ex; margin-bottom: 1ex; border-style: dotted; border-width: 1pt; border-color: blue; border-radius: 3px;"> <div style="padding: 1ex; margin-top: 1ex; margin-bottom: 1ex; border-style: dotted; border-width: 1pt; border-color: blue; border-radius: 3px;">
R. Ouyang, S. Curtarolo, E. Ahmetcik, M. Scheffler, L. M. Ghiringhelli: <span style="font-style: italic;">SISSO: a compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates</span>, Phys. Rev. Materials 2, 083802 (2018) <a href="https://journals.aps.org/prmaterials/abstract/10.1103/PhysRevMaterials.2.083802" target="_blank">[PDF]</a> . R. Ouyang, S. Curtarolo, E. Ahmetcik, M. Scheffler, L. M. Ghiringhelli: <span style="font-style: italic;">SISSO: a compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates</span>, Phys. Rev. Materials 2, 083802 (2018) <a href="https://journals.aps.org/prmaterials/abstract/10.1103/PhysRevMaterials.2.083802" target="_blank">[PDF]</a> .
</div> </div>
This tutorial is based on the following publication: This tutorial is based on the following publication:
<div style="padding: 1ex; margin-top: 1ex; margin-bottom: 1ex; border-style: dotted; border-width: 1pt; border-color: blue; border-radius: 3px;"> <div style="padding: 1ex; margin-top: 1ex; margin-bottom: 1ex; border-style: dotted; border-width: 1pt; border-color: blue; border-radius: 3px;">
C. Bartel, C. Sutton, B. R. Goldsmith, R. Ouyang, C. B. Musgrave, Luca M. Ghiringhelli, M. Scheffler: <span style="font-style: italic;">New tolerance factor to predict the stability of perovskite oxides and halides</span>, Sci. Adv. 5, eaav0693 (2019) <a href="https://advances.sciencemag.org/content/advances/5/2/eaav0693.full.pdf" target="_blank">[PDF]</a> . C. Bartel, C. Sutton, B. R. Goldsmith, R. Ouyang, C. B. Musgrave, Luca M. Ghiringhelli, M. Scheffler: <span style="font-style: italic;">New tolerance factor to predict the stability of perovskite oxides and halides</span>, Sci. Adv. 5, eaav0693 (2019) <a href="https://advances.sciencemag.org/content/advances/5/2/eaav0693.full.pdf" target="_blank">[PDF]</a> .
</div> </div>
Some scripts and data tables for this notebook have been adapted from the content of: Some scripts and data tables for this notebook have been adapted from the content of:
<div style="padding: 1ex; margin-top: 1ex; margin-bottom: 1ex; border-style: dotted; border-width: 1pt; border-color: blue; border-radius: 3px;"> <div style="padding: 1ex; margin-top: 1ex; margin-bottom: 1ex; border-style: dotted; border-width: 1pt; border-color: blue; border-radius: 3px;">
<a href="https://github.com/CJBartel/perovskite-stability" target="_blank">https://github.com/CJBartel/perovskite-stability</a> <a href="https://github.com/CJBartel/perovskite-stability" target="_blank">https://github.com/CJBartel/perovskite-stability</a>
</div> </div>
# Perovskites and the Goldschmidt tolerance factor # Perovskites and the Goldschmidt tolerance factor
Perovskites constitute a class of materials having the basic formula $ABX_3$ and displaying a common structure in which a smaller metal cation $B$ (e.g. a transition metal) resides in corner-sharing octahedra of $X$ anions (e.g. $O^{2-}$, $Cl^-$, $Br^-$) and a larger $A$ metal cation (e.g. alkali, alkaline earth or lanthanide) has a 12-fold coordination with the $X$ anions. This class of compounds has a remarkable variety of electronic, magnetic, optical, mechanical, and transport properties. Such variety is derived from the possibility of tuning the materials propertites by the composition. About 90% of the metallic chemical elements of the periodic table can be stabilized in a perovskite structure. Therefore, perovskites are versatile materials suitable for a number of applications including photovoltaics, thermoelectrics and catalysis. Perovskites constitute a class of materials having the basic formula $ABX_3$ and displaying a common structure in which a smaller metal cation $B$ (e.g. a transition metal) resides in corner-sharing octahedra of $X$ anions (e.g. $O^{2-}$, $Cl^-$, $Br^-$) and a larger $A$ metal cation (e.g. alkali, alkaline earth or lanthanide) has a 12-fold coordination with the $X$ anions. This class of compounds has a remarkable variety of electronic, magnetic, optical, mechanical, and transport properties. Such variety is derived from the possibility of tuning the materials propertites by the composition. About 90% of the metallic chemical elements of the periodic table can be stabilized in a perovskite structure. Therefore, perovskites are versatile materials suitable for a number of applications including photovoltaics, thermoelectrics and catalysis.
The first step to design new perovskites is to assess their stability. For this purpose, the Goldschmidt tolerance factor, $t$, has been extensively used to predict the stability of a material in the perovskite structure based on the (Shannon) ionic radii,$r_i$, of each ion $(A,B,X)$ in the chemical formula $ABX_3$: The first step to design new perovskites is to assess their stability. For this purpose, the Goldschmidt tolerance factor, $t$, has been extensively used to predict the stability of a material in the perovskite structure based on the (Shannon) ionic radii,$r_i$, of each ion $(A,B,X)$ in the chemical formula $ABX_3$:
$$ t=\frac{r_A+r_X}{\sqrt2(r_B+r_X)} $$ $$ t=\frac{r_A+r_X}{\sqrt2(r_B+r_X)} $$
$t$ measures how much the $A$-site cation fits into the corner-sharing octahedral network in a cubic crystal structure. It indicates the compatibilty of a given set of ions with the ideal, cubic perovskite structure ($t\approx1$). Distortions from the cubic structure arise from size mismatch between cations and anions, which results in perovskite structures other than cubic (e.g. orthorhombic, rhombohedral). However, when these distortions are too large, the perovskite structure may be unstable and non-perovskites structures are rather formed. $t$ measures how much the $A$-site cation fits into the corner-sharing octahedral network in a cubic crystal structure. It indicates the compatibilty of a given set of ions with the ideal, cubic perovskite structure ($t\approx1$). Distortions from the cubic structure arise from size mismatch between cations and anions, which results in perovskite structures other than cubic (e.g. orthorhombic, rhombohedral). However, when these distortions are too large, the perovskite structure may be unstable and non-perovskites structures are rather formed.
The accuracy of the Goldschmidt factor is, however, often insufficient to screen for new potential materials and several modification have been proposed to overcome this issue. In this tutorial, we show how data can be used to derive tolerance factors for perovskite stability. The accuracy of the Goldschmidt factor is, however, often insufficient to screen for new potential materials and several modification have been proposed to overcome this issue. In this tutorial, we show how data can be used to derive tolerance factors for perovskite stability.
# The SISSO method for descriptor identifcation # The SISSO method for descriptor identifcation
A crucial step in data-driven materials science is the identification of descriptors, functions of physical parameters that characterize the phenomena governing a certain property. More specifically, a descriptor $\mathbf{d}$ (in general, an array) is mapped into a target property $P$ of interest by means of a mathematical relationship: $P=P(\mathbf{d})$. Descriptors numerically encode the different materials and, crucially, should be obtained (measured or calculated) more easily than the target property, so that they can be evaluated for large sets of still unknown materials to search for new ones. A crucial step in data-driven materials science is the identification of descriptors, functions of physical parameters that characterize the phenomena governing a certain property. More specifically, a descriptor $\mathbf{d}$ (in general, an array) is mapped into a target property $P$ of interest by means of a mathematical relationship: $P=P(\mathbf{d})$. Descriptors numerically encode the different materials and, crucially, should be obtained (measured or calculated) more easily than the target property, so that they can be evaluated for large sets of still unknown materials to search for new ones.
The-sure-independence-screening-and-sparsifying-operator (SISSO) method combines a symbolic-regression-based feature construction with compressed sensing for the (data-driven) identification of the best low-dimensional descriptors. Within SISSO's feature construction, an initial set of input features (the primary features) offered by the user are systematically combined via the application of mathematical operators (e.g. addition, multiplication, exponential, square root), generating a large space of candidate features. The candidate features are then ranked according to their fit to the target property (number of materials in the overlap of convex-hull regions, for the case of classification problems) and the top-ranked features are further used for descriptor selection. The-sure-independence-screening-and-sparsifying-operator (SISSO) method combines a symbolic-regression-based feature construction with compressed sensing for the (data-driven) identification of the best low-dimensional descriptors. Within SISSO's feature construction, an initial set of input features (the primary features) offered by the user are systematically combined via the application of mathematical operators (e.g. addition, multiplication, exponential, square root), generating a large space of candidate features. The candidate features are then ranked according to their fit to the target property (number of materials in the overlap of convex-hull regions, for the case of classification problems) and the top-ranked features are further used for descriptor selection.
For futher details on compressed-sensing methods (including SISSO) for descriptor identification, a <a href="https://analytics-toolkit.nomad-coe.eu/hub/user-redirect/notebooks/tutorials/compressed_sensing.ipynb" target="_blank">dedicated notebook</a> is available in the NOMAD toolkit. For futher details on compressed-sensing methods (including SISSO) for descriptor identification, a <a href="https://analytics-toolkit.nomad-coe.eu/hub/user-redirect/notebooks/tutorials/compressed_sensing.ipynb" target="_blank">dedicated notebook</a> is available in the NOMAD toolkit.
For navigation convenience, the following sections are collapsed. You can uncollapse one by one and run the cells as usual or reach to the section you are interested and before running the first cell in the selected section, click "Run All Above" in the "Cell" menu. For navigation convenience, the following sections are collapsed. You can uncollapse one by one and run the cells as usual or reach to the section you are interested and before running the first cell in the selected section, click "Run All Above" in the "Cell" menu.
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
# Import required modules # Import required modules
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
import numpy as np import numpy as np
import matplotlib.pyplot as plt import matplotlib.pyplot as plt
from matplotlib import ticker from matplotlib import ticker
import pandas as pd import pandas as pd
import seaborn as sns import seaborn as sns
import warnings import warnings
import itertools import itertools
import os import os
from math import log from math import log
import re import re
import numexpr as ne import numexpr as ne
import ipywidgets as widgets import ipywidgets as widgets
from IPython.display import Markdown, display, clear_output from IPython.display import Markdown, display, clear_output
from sissopp import Inputs, Unit, FeatureNode from sissopp import Inputs, Unit, FeatureNode
from perovskites_tolerance_factor.PredictPerovskites import PredictABX3, PredictAABBXX6 from perovskites_tolerance_factor.PredictPerovskites import PredictABX3, PredictAABBXX6
from sklearn import tree from sklearn import tree
from sklearn.model_selection import train_test_split from sklearn.model_selection import train_test_split
from sklearn import metrics from sklearn import metrics
from sklearn.calibration import CalibratedClassifierCV from sklearn.calibration import CalibratedClassifierCV
warnings.filterwarnings('ignore') warnings.filterwarnings('ignore')
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
# Get the data # Get the data
The data consists of a list of 576 $ABX_3$ solids experimentally-characterized at ambient conditions, classified as stable or unstable at the perovskite structure, together with the following features: The data consists of a list of 576 $ABX_3$ solids experimentally-characterized at ambient conditions, classified as stable or unstable at the perovskite structure, together with the following features:
<div > <div >
<ul> <ul>
<li>$exp$_$label$ : a Boolean variable, for which the value 1.0 (0.0) means that the materials is stable (unstable) as perovskite</li> <li>$exp$_$label$ : a Boolean variable, for which the value 1.0 (0.0) means that the materials is stable (unstable) as perovskite</li>
<li>$r_A, r_B, r_X$: <a href="https://mipt.ru/upload/medialibrary/df1/revised-effective-ionic-radii-and-systematic-studies-of-interatomic-distances-in-halides-and-chalcogenides_shannon.pdf" target="_blank">Shannon ionic radii</a> of each ion, with $r_A>r_B$ </li> <li>$r_A, r_B, r_X$: <a href="https://mipt.ru/upload/medialibrary/df1/revised-effective-ionic-radii-and-systematic-studies-of-interatomic-distances-in-halides-and-chalcogenides_shannon.pdf" target="_blank">Shannon ionic radii</a> of each ion, with $r_A>r_B$ </li>
<li>$n_A, n_B, n_X$: oxidation satates of each ion </li> <li>$n_A, n_B, n_X$: oxidation satates of each ion </li>
<li>$\frac{r_A}{r_B}, \frac{r_A}{r_X}, \frac{r_B}{r_x}$: ionic radii ratios. Note: this features can be obtained from $r_A, r_B, r_X$ at the first rung of SISSO feature constructions. However, since in Goldschmidt's $t$ factor, ratios of radii appear prominently, they are offered directly as candidate features, aiming at simpler models. </li> <li>$\frac{r_A}{r_B}, \frac{r_A}{r_X}, \frac{r_B}{r_x}$: ionic radii ratios. Note: this features can be obtained from $r_A, r_B, r_X$ at the first rung of SISSO feature constructions. However, since in Goldschmidt's $t$ factor, ratios of radii appear prominently, they are offered directly as candidate features, aiming at simpler models. </li>
<li>$Z_A, Z_B, Z_X$: nuclear charges </li> <li>$Z_A, Z_B, Z_X$: nuclear charges </li>
<li>$r_{s,A}, r_{s,B}, r_{s,X}$: <i>calculated</i> radius where the radial distribution of the $s$ orbital has its maximum </li> <li>$r_{s,A}, r_{s,B}, r_{s,X}$: <i>calculated</i> radius where the radial distribution of the $s$ orbital has its maximum </li>
<li>$r_{p,A}, r_{p,B}, r_{p,X}$: <i>calculated</i> radius where the radial distribution of the $p$ orbital has its maximum</li> <li>$r_{p,A}, r_{p,B}, r_{p,X}$: <i>calculated</i> radius where the radial distribution of the $p$ orbital has its maximum</li>
<li>$HOMO_A, HOMO_B, HOMO_X$: <i>calculated</i> energy of the highest occupied atomic orbital</li> <li>$HOMO_A, HOMO_B, HOMO_X$: <i>calculated</i> energy of the highest occupied atomic orbital</li>
<li>$LUMO_A, LUMO_B, LUMO_X$: <i>calculated</i> energy of the lowest unoccupied atomic orbital</li> <li>$LUMO_A, LUMO_B, LUMO_X$: <i>calculated</i> energy of the lowest unoccupied atomic orbital</li>
<li>$EA_A, EA_B, EA_X$: <i>calculated</i> electron affinity</li> <li>$EA_A, EA_B, EA_X$: <i>calculated</i> electron affinity</li>
<li>$IP_A, IP_B, IP_X$: <i>calculated</i> ionization potential</li> <li>$IP_A, IP_B, IP_X$: <i>calculated</i> ionization potential</li>
</div> </div>
The <i>calculated</i> features were obtained with DFT-PBE using the FHI-aims all-electron full-potential code and correspond to properties of isolated atoms. The <i>calculated</i> features were obtained with DFT-PBE using the FHI-aims all-electron full-potential code and correspond to properties of isolated atoms.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
#load data #load data
df = pd.read_csv("data/perovskites_tolerance_factor/data_perovskite.csv", index_col=0) df = pd.read_csv("data/perovskites_tolerance_factor/data_perovskite.csv", index_col=0)
#add rX_rB_ratio feature to the dataframe #add rX_rB_ratio feature to the dataframe
rX_rB_ratio="rB_rX_ratio**-1" rX_rB_ratio="rB_rX_ratio**-1"
df.eval('rX_rB_ratio =' + rX_rB_ratio, inplace=True) df.eval('rX_rB_ratio =' + rX_rB_ratio, inplace=True)
df df
``` ```
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
#count the number of material in each class in the whole dataset #count the number of material in each class in the whole dataset
print('In the whole dataset, %s compositions are unstable and %s are stable in the perovskite structure.' % (df['exp_label'].value_counts().values[0], df['exp_label'].value_counts().values[1])) print('In the whole dataset, %s compositions are unstable and %s are stable in the perovskite structure.' % (df['exp_label'].value_counts().values[0], df['exp_label'].value_counts().values[1]))
``` ```
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
# make lists of random numbers for train and test sets # make lists of random numbers for train and test sets
inds = np.arange(576) inds = np.arange(576)
np.random.shuffle(inds) np.random.shuffle(inds)
task_sizes_train = [460] task_sizes_train = [460]
task_sizes_test = [116] task_sizes_test = [116]
test_inds = [int(ii) for ii in np.sort(inds[:task_sizes_test[0]])] test_inds = [int(ii) for ii in np.sort(inds[:task_sizes_test[0]])]
train_inds = [int(ii) for ii in np.sort(inds[task_sizes_test[0]:])] train_inds = [int(ii) for ii in np.sort(inds[task_sizes_test[0]:])]
# test data frame # test data frame
index_id_test=df.index[test_inds] index_id_test=df.index[test_inds]
df_units_columns= df.columns.values[:] df_units_columns= df.columns.values[:]
test_vals = df.loc[index_id_test, :].values test_vals = df.loc[index_id_test, :].values
test_df= pd.DataFrame(index=index_id_test,data=np.array(test_vals),columns=df_units_columns) test_df= pd.DataFrame(index=index_id_test,data=np.array(test_vals),columns=df_units_columns)
display(test_df) display(test_df)
# train data frame # train data frame
index_id_train=df.index[train_inds] index_id_train=df.index[train_inds]
train_vals = df.loc[index_id_train,:].values train_vals = df.loc[index_id_train,:].values
train_df= pd.DataFrame(index=index_id_train,data=np.array(train_vals),columns=df_units_columns) train_df= pd.DataFrame(index=index_id_train,data=np.array(train_vals),columns=df_units_columns)
display(train_df) display(train_df)
``` ```
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
#count the number of material in each class in the training/test sets #count the number of material in each class in the training/test sets
print('In the training set, %s compositions are unstable and %s are stable.' % (train_df['exp_label'].value_counts().values[0], train_df['exp_label'].value_counts().values[1])) print('In the training set, %s compositions are unstable and %s are stable.' % (train_df['exp_label'].value_counts().values[0], train_df['exp_label'].value_counts().values[1]))
print('In the test set, %s compositions are unstable and %s are stable.' % (test_df['exp_label'].value_counts().values[0], test_df['exp_label'].value_counts().values[1])) print('In the test set, %s compositions are unstable and %s are stable.' % (test_df['exp_label'].value_counts().values[0], test_df['exp_label'].value_counts().values[1]))
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
# Generate the candidate features space from the primary features and operators # Generate the candidate features space from the primary features and operators
The two sets of elements needed to create the feature space with SISSO are the features to be used (i.e. the primary features) and the set of mathematical operators to be applied. Another input from the user is the number of times the operators are applied, the so-called rung (max_phi). The two sets of elements needed to create the feature space with SISSO are the features to be used (i.e. the primary features) and the set of mathematical operators to be applied. Another input from the user is the number of times the operators are applied, the so-called rung (max_phi).
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
#add the primary features units to data set #add the primary features units to data set
df_units=df.rename(columns={'rA':'rA (AA)', 'rB':'rB (AA)', 'rX':'rX (AA)', df_units=df.rename(columns={'rA':'rA (AA)', 'rB':'rB (AA)', 'rX':'rX (AA)',
'nA':'nA (Unitless)', 'nB':'nB (Unitless)', 'nX':'nX (Unitless)', 'nA':'nA (Unitless)', 'nB':'nB (Unitless)', 'nX':'nX (Unitless)',
'rA_rB_ratio':'rA_rB_ratio (Unitless)', 'rA_rX_ratio':'rA_rX_ratio (Unitless)', 'rB_rX_ratio':'rB_rX_ratio (Unitless)', 'rA_rB_ratio':'rA_rB_ratio (Unitless)', 'rA_rX_ratio':'rA_rX_ratio (Unitless)', 'rB_rX_ratio':'rB_rX_ratio (Unitless)',
'rS_A':'rS_A (AA)', 'rP_A':'rP_A (AA)', 'rS_A':'rS_A (AA)', 'rP_A':'rP_A (AA)',
'Z_A':'Z_A (elem_charge)', 'Z_A':'Z_A (elem_charge)',
'HOMO_A':'HOMO_A (eV)', 'LUMO_A':'LUMO_A (eV)', 'HOMO_A':'HOMO_A (eV)', 'LUMO_A':'LUMO_A (eV)',
'EA_A':'EA_A (eV)', 'IP_A':'IP_A (eV)', 'EA_A':'EA_A (eV)', 'IP_A':'IP_A (eV)',
'rS_B':'rS_B (AA)', 'rP_B':'rP_B (AA)', 'rS_B':'rS_B (AA)', 'rP_B':'rP_B (AA)',
'Z_B':'Z_B (elem_charge)', 'Z_B':'Z_B (elem_charge)',
'HOMO_B':'HOMO_B (eV)', 'LUMO_B':'LUMO_B (eV)', 'HOMO_B':'HOMO_B (eV)', 'LUMO_B':'LUMO_B (eV)',
'EA_B':'EA_B (eV)', 'IP_B':'IP_B (eV)', 'EA_B':'EA_B (eV)', 'IP_B':'IP_B (eV)',
'rS_X':'rS_X (AA)', 'rP_X':'rP_X (AA)', 'rS_X':'rS_X (AA)', 'rP_X':'rP_X (AA)',
'Z_X':'Z_X (elem_charge)', 'Z_X':'Z_X (elem_charge)',
'HOMO_X':'HOMO_X (eV)','LUMO_X':'LUMO_X (eV)', 'HOMO_X':'HOMO_X (eV)','LUMO_X':'LUMO_X (eV)',
'EA_X':'EA_X (eV)', 'IP_X':'IP_X (eV)','rX_rB_ratio':'rX_rB_ratio (Unitless)', 'EA_X':'EA_X (eV)', 'IP_X':'IP_X (eV)','rX_rB_ratio':'rX_rB_ratio (Unitless)',
}) })
df_units df_units
``` ```
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
#define list of primary features - user has to choose #define list of primary features - user has to choose
cols = [ cols = [
# 'rA (AA)', # 'rA (AA)',
# 'rB (AA)', # 'rB (AA)',
# 'rX (AA)', # 'rX (AA)',
'nA (Unitless)', 'nA (Unitless)',
# 'nA2 (Unitless)', # 'nA2 (Unitless)',
# 'nB (Unitless)', # 'nB (Unitless)',
# 'nX(Unitless)', # 'nX(Unitless)',
'rA_rB_ratio (Unitless)', 'rA_rB_ratio (Unitless)',
# 'rA_rX_ratio (Unitless)', # 'rA_rX_ratio (Unitless)',
# 'rB_rX_ratio (Unitless)', # 'rB_rX_ratio (Unitless)',
'rX_rB_ratio (Unitless)', 'rX_rB_ratio (Unitless)',
# 'log_rA_rB_ratio (Unitless)', # 'log_rA_rB_ratio (Unitless)',
# 'rS_A (AA)', # 'rS_A (AA)',
# 'rP_A (AA)', # 'rP_A (AA)',
# 'Z_A (elem_charge)', # 'Z_A (elem_charge)',
# 'HOMO_A (eV)', # 'HOMO_A (eV)',
# 'LUMO_A (eV)', # 'LUMO_A (eV)',
# 'EA_A (eV)', # 'EA_A (eV)',
# 'IP_A (eV)', # 'IP_A (eV)',
# 'rS_B (AA)', # 'rS_B (AA)',
# 'rP_B (AA)', # 'rP_B (AA)',
# 'Z_B (elem_charge)', # 'Z_B (elem_charge)',
# 'HOMO_B (eV)', # 'HOMO_B (eV)',
# 'LUMO_B (eV)', # 'LUMO_B (eV)',
# 'EA_B (eV)', # 'EA_B (eV)',
# 'IP_B (eV)', # 'IP_B (eV)',
# 'rS_X (AA)', # 'rS_X (AA)',
# 'rP_X (AA)', # 'rP_X (AA)',
# 'Z_X (elem_charge)', # 'Z_X (elem_charge)',
# 'HOMO_X (eV)', # 'HOMO_X (eV)',
# 'LUMO_X (eV)', # 'LUMO_X (eV)',
# 'EA_X (eV)', # 'EA_X (eV)',
# 'IP_X (eV)' # 'IP_X (eV)'
] ]
``` ```
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
#define list of operators - user has to choose #define list of operators - user has to choose
ops = [ ops = [
# "add", # "add",
# "two_power", # "two_power",
"sub", "sub",
# "abs_diff", # "abs_diff",
"mult", "mult",
"div", "div",
# "exp", # "exp",
# "neg_exp", # "neg_exp",
# "inv", # "inv",
"sq", "sq",
# "cb", # "cb",
# "sixth_power", # "sixth_power",
# "sqrt", # "sqrt",
# "cbrt", # "cbrt",
"log", "log",
# "abs", # "abs",
# "sin", # "sin",
# "cos", # "cos",
] ]
``` ```
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
# making input files for SISSO++ # making input files for SISSO++
# note: together with the pre-uncommented features and operators above, these settings ensure to find tau as # note: together with the pre-uncommented features and operators above, these settings ensure to find tau as
# in the paper. All settings can be modified, but note that adding features and/or operators, # in the paper. All settings can be modified, but note that adding features and/or operators,
# increasing n_sis_select, max_rung, and n_dim increases the computational time, typically exponentially! # increasing n_sis_select, max_rung, and n_dim increases the computational time, typically exponentially!
inputs = Inputs() inputs = Inputs()
inputs.allowed_ops = ops inputs.allowed_ops = ops
inputs.n_sis_select = 2000 inputs.n_sis_select = 2000
inputs.max_rung = 3 inputs.max_rung = 3
inputs.n_dim = 3 inputs.n_dim = 3
inputs.calc_type = "classification" inputs.calc_type = "classification"
inputs.n_residual = 200 inputs.n_residual = 200
inputs.n_model_store = 1 inputs.n_model_store = 1
inputs.min_abs_feat_val = 0 inputs.min_abs_feat_val = 0
inputs.max_abs_feat_val = 1e5 inputs.max_abs_feat_val = 1e5
inputs.task_names = ["all_mats"] inputs.task_names = ["all_mats"]
inputs.task_sizes_train = [460] inputs.task_sizes_train = [460]
inputs.task_sizes_test = [116] inputs.task_sizes_test = [116]
inputs.leave_out_inds = list(test_inds) inputs.leave_out_inds = list(test_inds)
#list_object[index] = new_value #list_object[index] = new_value
# exp_label values according to the previous list of random number for train and test sets # exp_label values according to the previous list of random number for train and test sets
inputs.prop_train = df_units.loc[df_units.index[train_inds], "exp_label"].values inputs.prop_train = df_units.loc[df_units.index[train_inds], "exp_label"].values
inputs.prop_test = df_units.loc[df_units.index[test_inds], "exp_label"].values inputs.prop_test = df_units.loc[df_units.index[test_inds], "exp_label"].values
inputs.prop_label = "exp_label" inputs.prop_label = "exp_label"
inputs.prop_unit = Unit("eV") inputs.prop_unit = Unit("eV")
inputs.global_param_optfalse = True inputs.global_param_optfalse = True
inputs.fix_intercept= True inputs.fix_intercept= True
# name of materials according to the previous list of random number for train and test sets # name of materials according to the previous list of random number for train and test sets
inputs.sample_ids_train = df_units.index[train_inds].tolist() inputs.sample_ids_train = df_units.index[train_inds].tolist()
inputs.sample_ids_test = df_units.index[test_inds].tolist() inputs.sample_ids_test = df_units.index[test_inds].tolist()
#primary features - deletes the unit part of the features and then uses node to make the input #primary features - deletes the unit part of the features and then uses node to make the input
phi_0 = [] phi_0 = []
for cc, col in enumerate(cols): for cc, col in enumerate(cols):
expr = col.split("(")[0].strip() expr = col.split("(")[0].strip()
if len(col.split("(")) == 2: if len(col.split("(")) == 2:
unit = Unit(col.split("(")[1].split(")")[0].strip()) unit = Unit(col.split("(")[1].split(")")[0].strip())
else: else:
unit = Unit() unit = Unit()
phi_0.append(FeatureNode(cc, expr, df_units.loc[df_units.index[train_inds], col].tolist(), df_units.loc[df_units.index[test_inds], col].tolist(), unit)) phi_0.append(FeatureNode(cc, expr, df_units.loc[df_units.index[train_inds], col].tolist(), df_units.loc[df_units.index[test_inds], col].tolist(), unit))
inputs.phi_0 = phi_0 inputs.phi_0 = phi_0
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
# Select the best candidate features and classification thresholds # Select the best candidate features and classification thresholds
Next, the generated candidate features are selected in two steps. In a first step (SIS), they are ranked according to the number of materials $N$ that fall in overlapping regions of stable and unstable domains and only the top-ranked features are kept. The domain is defined as the range between the maximum and minimum values of the feature for each of the classes (stable and unstable). The best candidate features are those that present lower $N$. The lenght of the overlap domain, $S$, is used to rank features presenting the same $N$. $N$ and $S$ correspond to equations 2 and 3, respectively, in Phys. Rev. Materials 2, 083802 (2018). Next, the generated candidate features are selected in two steps. In a first step (SIS), they are ranked according to the number of materials $N$ that fall in overlapping regions of stable and unstable domains and only the top-ranked features are kept. The domain is defined as the range between the maximum and minimum values of the feature for each of the classes (stable and unstable). The best candidate features are those that present lower $N$. The lenght of the overlap domain, $S$, is used to rank features presenting the same $N$. $N$ and $S$ correspond to equations 2 and 3, respectively, in Phys. Rev. Materials 2, 083802 (2018).
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
#Perform SISSO++ to create a feature space and select the features #Perform SISSO++ to create a feature space and select the features
from sissopp import FeatureSpace, SISSOClassifier from sissopp import FeatureSpace, SISSOClassifier
feature_space = FeatureSpace(inputs) feature_space = FeatureSpace(inputs)
feature_space.sis(df_units.loc[df_units.index[train_inds], "exp_label"].values) feature_space.sis(df_units.loc[df_units.index[train_inds], "exp_label"].values)
``` ```
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
features=[feat.expr for feat in feature_space.phi_selected] # features features=[feat.expr for feat in feature_space.phi_selected] # features
vals = [feat.value for feat in feature_space.phi_selected] # the values of the features vals = [feat.value for feat in feature_space.phi_selected] # the values of the features
vals=np.transpose(vals) # transpose of the matrix to make it compatible with the indices vals=np.transpose(vals) # transpose of the matrix to make it compatible with the indices
index_id=df_units.index[train_inds] # indices of materials index_id=df_units.index[train_inds] # indices of materials
feature_df=pd.DataFrame(index=index_id,data=np.array(vals),columns=features) # make a data frame of the features and values of the features for different materials feature_df=pd.DataFrame(index=index_id,data=np.array(vals),columns=features) # make a data frame of the features and values of the features for different materials
feature_df feature_df
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
In a second step (SO), classification trees are used to choose the best candidate feature among those selected by the overlaps (above). For each of the selected candidate features, a classification tree is trained, providing a threshold for the classification and its accuracy. The selected candidate features get ranked based on their accuracy. In a second step (SO), classification trees are used to choose the best candidate feature among those selected by the overlaps (above). For each of the selected candidate features, a classification tree is trained, providing a threshold for the classification and its accuracy. The selected candidate features get ranked based on their accuracy.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
#train classification trees for the selected descriptors #train classification trees for the selected descriptors
#depth of the classification tree - user has to choose #depth of the classification tree - user has to choose
def rank_tree(labels, feature_space, depth): #rank features according to the classification-tree accuracy def rank_tree(labels, feature_space, depth): #rank features according to the classification-tree accuracy
score = [] score = []
for i in list(range(0,feature_space.shape[1])): # 'i' is a column and 'for' is from the first column to the last one for i in list(range(0,feature_space.shape[1])): # 'i' is a column and 'for' is from the first column to the last one
x=np.array(feature_space)[:,i] # take the first column values x=np.array(feature_space)[:,i] # take the first column values
clf = tree.DecisionTreeClassifier(max_depth=depth) clf = tree.DecisionTreeClassifier(max_depth=depth)
clf = clf.fit(x.reshape(-1,1), labels) # Build a decision-tree classifier from the training set (X, y). X is the values of features (for each for iteration on column) and Y is the target value, here exp_label clf = clf.fit(x.reshape(-1,1), labels) # Build a decision-tree classifier from the training set (X, y). X is the values of features (for each for iteration on column) and Y is the target value, here exp_label
score.append([feature_space.columns.values[i],clf.score(x.reshape(-1,1),labels)]) # make a list of the feature and the mean accuracy of the all values of that feaure (for different materials) score.append([feature_space.columns.values[i],clf.score(x.reshape(-1,1),labels)]) # make a list of the feature and the mean accuracy of the all values of that feaure (for different materials)
score_sorted=sorted(score,reverse=True,key=lambda x: x[1]) # sort the features based on the accuracy score_sorted=sorted(score,reverse=True,key=lambda x: x[1]) # sort the features based on the accuracy
return score_sorted return score_sorted
labels_train=train_df["exp_label"].to_numpy() labels_train=train_df["exp_label"].to_numpy()
rank_list=rank_tree(labels_train, feature_df, 1) rank_list=rank_tree(labels_train, feature_df, 1)
tree_pd =pd.DataFrame(rank_list, columns=['feature','tree accuracy']) # make a new data frame of the feature and the accuracies tree_pd =pd.DataFrame(rank_list, columns=['feature','tree accuracy']) # make a new data frame of the feature and the accuracies
tree_pd.style.set_properties(subset=['feature'], **{'width': '300px'}) tree_pd.style.set_properties(subset=['feature'], **{'width': '300px'})
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
The top-ranked candidate feature corresponds to the the SISSO-derived tolerance factor. We call this descriptor $t_{sisso}$. The top-ranked candidate feature corresponds to the the SISSO-derived tolerance factor. We call this descriptor $t_{sisso}$.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
# the first ranked feature is the t_sisso # the first ranked feature is the t_sisso
t_sisso_expression=str(rank_list[0][0]) t_sisso_expression=str(rank_list[0][0])
t_sisso_expression = t_sisso_expression.replace("ln","log") t_sisso_expression = t_sisso_expression.replace("ln","log")
t_sisso_expression = t_sisso_expression.replace("^","**") t_sisso_expression = t_sisso_expression.replace("^","**")
print('Identified expression for t_sisso: %s' % t_sisso_expression) print('Identified expression for t_sisso: %s' % t_sisso_expression)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
$t_{sisso}$ can be now evaluated for all the materials, including those in the test set. For comparison, we also evaluate the Goldschmidt factor ($t$) and the $\tau$ descriptor found in Sci. Adv. 5, eaav0693 (2019). $t_{sisso}$ can be now evaluated for all the materials, including those in the test set. For comparison, we also evaluate the Goldschmidt factor ($t$) and the $\tau$ descriptor found in Sci. Adv. 5, eaav0693 (2019).
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
# evaluate t_sisso, t and tau for all materials in train and test sets and add them as extra columns to the train and test data frame # evaluate t_sisso, t and tau for all materials in train and test sets and add them as extra columns to the train and test data frame
#make a dictionary for t_sisso,t, tau #make a dictionary for t_sisso,t, tau
tolerance_factor_dict = { tolerance_factor_dict = {
"t_sisso": [t_sisso_expression], "t_sisso": [t_sisso_expression],
"t": ["(rA+rX)/(1.41421*(rB+rX))"], "t": ["(rA+rX)/(1.41421*(rB+rX))"],
"tau": ["rX/rB-nA*(nA-rA_rB_ratio/log(rA_rB_ratio))"] "tau": ["rX/rB-nA*(nA-rA_rB_ratio/log(rA_rB_ratio))"]
} }
#Add tau threshold #Add tau threshold
tolerance_factor_dict["tau"].append(4.18) tolerance_factor_dict["tau"].append(4.18)
train_df.eval('t_sisso = ' + tolerance_factor_dict['t_sisso'][0], inplace=True) train_df.eval('t_sisso = ' + tolerance_factor_dict['t_sisso'][0], inplace=True)
train_df.eval('t = '+ tolerance_factor_dict['t'][0],inplace=True) train_df.eval('t = '+ tolerance_factor_dict['t'][0],inplace=True)
train_df.eval('tau = '+ tolerance_factor_dict['tau'][0], inplace=True) train_df.eval('tau = '+ tolerance_factor_dict['tau'][0], inplace=True)
test_df.eval('t_sisso = ' + tolerance_factor_dict['t_sisso'][0], inplace=True) test_df.eval('t_sisso = ' + tolerance_factor_dict['t_sisso'][0], inplace=True)
test_df.eval('t = '+ tolerance_factor_dict['t'][0], inplace=True) test_df.eval('t = '+ tolerance_factor_dict['t'][0], inplace=True)
test_df.eval('tau = '+ tolerance_factor_dict['tau'][0], inplace=True) test_df.eval('tau = '+ tolerance_factor_dict['tau'][0], inplace=True)
train_df train_df
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
# Evalute the descriptor performance # Evalute the descriptor performance
The accuracy of the classification tree for $t_{sisso}$ can be now evaluated for train and test sets. For the classification tree, a maximum depth of one is used here. The accuracy of the classification tree for $t_{sisso}$ can be now evaluated for train and test sets. For the classification tree, a maximum depth of one is used here.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
#train classification tree with t_sisso #train classification tree with t_sisso
x_train_t_sisso=train_df['t_sisso'].to_numpy() x_train_t_sisso=train_df['t_sisso'].to_numpy()
x_test_t_sisso=test_df['t_sisso'].to_numpy() x_test_t_sisso=test_df['t_sisso'].to_numpy()
labels_test=test_df['exp_label'].to_numpy() labels_test=test_df['exp_label'].to_numpy()
clf = tree.DecisionTreeClassifier(max_depth=1) clf = tree.DecisionTreeClassifier(max_depth=1)
clf = clf.fit(x_train_t_sisso.reshape(-1,1),labels_train) clf = clf.fit(x_train_t_sisso.reshape(-1,1),labels_train)
labels_pred_t_sisso=clf.predict(x_test_t_sisso.reshape(-1,1)) labels_pred_t_sisso=clf.predict(x_test_t_sisso.reshape(-1,1))
tree.plot_tree(clf) tree.plot_tree(clf)
print('Classification tree accuracy (for t_sisso) on the train set: %f.' % clf.score(x_train_t_sisso.reshape(-1,1),labels_train)) print('Classification tree accuracy (for t_sisso) on the train set: %f.' % clf.score(x_train_t_sisso.reshape(-1,1),labels_train))
print('Classification tree accuracy (for t_sisso) on the test set: %f.' % metrics.accuracy_score(labels_test, labels_pred_t_sisso)) print('Classification tree accuracy (for t_sisso) on the test set: %f.' % metrics.accuracy_score(labels_test, labels_pred_t_sisso))
``` ```
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
threshold_t_sisso=clf.tree_.threshold[0] threshold_t_sisso=clf.tree_.threshold[0]
#Add threshold to the dictionary #Add threshold to the dictionary
tolerance_factor_dict["t_sisso"].append(threshold_t_sisso) tolerance_factor_dict["t_sisso"].append(threshold_t_sisso)
print('t_sisso < %f indicates stable perovskites.' % tolerance_factor_dict["t_sisso"][1]) print('t_sisso < %f indicates stable perovskites.' % tolerance_factor_dict["t_sisso"][1])
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
The accuracy of the classification tree for $t$ can be also evaluated. In order to mimic the original calibration of $t$ performed by Goldschmidt, we adopt here a maximum depth of two for the classification tree, corresponding to a classification based on two thresholds. The accuracy of the classification tree for $t$ can be also evaluated. In order to mimic the original calibration of $t$ performed by Goldschmidt, we adopt here a maximum depth of two for the classification tree, corresponding to a classification based on two thresholds.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
#train classification tree with t #train classification tree with t
x_train_t=train_df['t'].to_numpy() x_train_t=train_df['t'].to_numpy()
x_test_t=test_df['t'].to_numpy() x_test_t=test_df['t'].to_numpy()
clf1 = tree.DecisionTreeClassifier(max_depth=2) clf1 = tree.DecisionTreeClassifier(max_depth=2)
clf1 = clf1.fit(x_train_t.reshape(-1,1),labels_train) clf1 = clf1.fit(x_train_t.reshape(-1,1),labels_train)
labels_pred_t=clf1.predict(x_test_t.reshape(-1,1)) labels_pred_t=clf1.predict(x_test_t.reshape(-1,1))
tree.plot_tree(clf1) tree.plot_tree(clf1)
print('Classification tree accuracy (for t) on the train set: %f.' % clf1.score(x_train_t.reshape(-1,1),labels_train)) print('Classification tree accuracy (for t) on the train set: %f.' % clf1.score(x_train_t.reshape(-1,1),labels_train))
print('Classification tree accuracy (for t) on the test set: %f.' % metrics.accuracy_score(labels_test, labels_pred_t)) print('Classification tree accuracy (for t) on the test set: %f.' % metrics.accuracy_score(labels_test, labels_pred_t))
``` ```
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
threshold_t=[clf1.tree_.threshold[0],clf1.tree_.threshold[4]] threshold_t=[clf1.tree_.threshold[0],clf1.tree_.threshold[4]]
# add t threshold to dictionary # add t threshold to dictionary
tolerance_factor_dict["t"].append(threshold_t) tolerance_factor_dict["t"].append(threshold_t)
print('%f < t < %f indicates stable perovskites.' % (tolerance_factor_dict["t"][1][0],tolerance_factor_dict["t"][1][1]) ) print('%f < t < %f indicates stable perovskites.' % (tolerance_factor_dict["t"][1][0],tolerance_factor_dict["t"][1][1]) )
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Platt-scaled classification probabilities $P(t_{sisso})$ are also computed based on the $t_{sisso}$ values and the labels. Such probabilities indicate the likelihood that a material is stable in the perovskite structure (as opposed to the stable vs. non-stable classification provided by the threshold alone). Platt-scaled classification probabilities $P(t_{sisso})$ are also computed based on the $t_{sisso}$ values and the labels. Such probabilities indicate the likelihood that a material is stable in the perovskite structure (as opposed to the stable vs. non-stable classification provided by the threshold alone).
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
#evaluation of P(t_sisso) #evaluation of P(t_sisso)
labels_platt=clf.predict(x_train_t_sisso.reshape(-1,1)) labels_platt=clf.predict(x_train_t_sisso.reshape(-1,1))
clf2 = CalibratedClassifierCV(cv=3) clf2 = CalibratedClassifierCV(cv=3)
clf2 = clf2.fit(x_train_t_sisso.reshape(-1,1), labels_platt) clf2 = clf2.fit(x_train_t_sisso.reshape(-1,1), labels_platt)
p_t_sisso_train=clf2.predict_proba(x_train_t_sisso.reshape(-1,1))[:,1] p_t_sisso_train=clf2.predict_proba(x_train_t_sisso.reshape(-1,1))[:,1]
p_t_sisso_test=clf2.predict_proba(x_test_t_sisso.reshape(-1,1))[:,1] p_t_sisso_test=clf2.predict_proba(x_test_t_sisso.reshape(-1,1))[:,1]
train_df['p_t_sisso'] = p_t_sisso_train # add p_t_sisso to the train and test data frame train_df['p_t_sisso'] = p_t_sisso_train # add p_t_sisso to the train and test data frame
test_df['p_t_sisso'] = p_t_sisso_test test_df['p_t_sisso'] = p_t_sisso_test
``` ```
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
#plotting P(t_sisso) vs t_sisso #plotting P(t_sisso) vs t_sisso
plt.figure(figsize=(8,8)) plt.figure(figsize=(8,8))
concat=pd.concat([train_df.assign(dataset='train'), test_df.assign(dataset='test')]) concat=pd.concat([train_df.assign(dataset='train'), test_df.assign(dataset='test')])
plot1=sns.scatterplot(x='t_sisso', y='p_t_sisso', data=concat,hue='exp_label', style='dataset', plot1=sns.scatterplot(x='t_sisso', y='p_t_sisso', data=concat,hue='exp_label', style='dataset',
palette=['red','blue'], markers=['s','o'], s=80) palette=['red','blue'], markers=['s','o'], s=80)
plot1.set_xlabel("$t_{sisso}$", fontsize=20) plot1.set_xlabel("$t_{sisso}$", fontsize=20)
plot1.set_ylabel("$P(t_{sisso})$", fontsize=20) plot1.set_ylabel("$P(t_{sisso})$", fontsize=20)
plot1.tick_params(labelsize=20) plot1.tick_params(labelsize=20)
plt.xlim(threshold_t_sisso-0.5, threshold_t_sisso+0.5) plt.xlim(threshold_t_sisso-0.5, threshold_t_sisso+0.5)
plt.axvline(tolerance_factor_dict["t_sisso"][1]) plt.axvline(tolerance_factor_dict["t_sisso"][1])
plt.axhline(0.5,linestyle='--') plt.axhline(0.5,linestyle='--')
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Finally, the performance of $t_{sisso}$ and $t$ can be compared by plotting $t_{sisso}$ vs. $t$ and $P(t_{sisso})$ vs. $t$. In thes plot, each datapoint is colored according to the (true) experimental label (e.g. whether a materials is a perovskite (blue) or not (red)) for both train and test sets. Finally, the performance of $t_{sisso}$ and $t$ can be compared by plotting $t_{sisso}$ vs. $t$ and $P(t_{sisso})$ vs. $t$. In thes plot, each datapoint is colored according to the (true) experimental label (e.g. whether a materials is a perovskite (blue) or not (red)) for both train and test sets.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
plt.figure(figsize=(8,8)) plt.figure(figsize=(8,8))
plot2=sns.scatterplot(x='t', y='t_sisso', data=concat, hue='exp_label', style='dataset', plot2=sns.scatterplot(x='t', y='t_sisso', data=concat, hue='exp_label', style='dataset',
palette=['red','blue'], markers=['s','o'], s=80) palette=['red','blue'], markers=['s','o'], s=80)
plot2.set_xlabel("$t$", fontsize=20) plot2.set_xlabel("$t$", fontsize=20)
plot2.set_ylabel("$t_{sisso}$", fontsize=20) plot2.set_ylabel("$t_{sisso}$", fontsize=20)
plot2.tick_params(labelsize=20) plot2.tick_params(labelsize=20)
plt.ylim(threshold_t_sisso-4, threshold_t_sisso+4) plt.ylim(threshold_t_sisso-4, threshold_t_sisso+4)
plt.axvline(tolerance_factor_dict["t"][1][0]) plt.axvline(tolerance_factor_dict["t"][1][0])
plt.axvline(tolerance_factor_dict["t"][1][1]) plt.axvline(tolerance_factor_dict["t"][1][1])
plt.axhline(threshold_t_sisso) plt.axhline(threshold_t_sisso)
``` ```
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
plt.figure(figsize=(8,8)) plt.figure(figsize=(8,8))
plot3=sns.scatterplot(x='t', y='p_t_sisso', data=concat, hue='exp_label', style='dataset', plot3=sns.scatterplot(x='t', y='p_t_sisso', data=concat, hue='exp_label', style='dataset',
palette=['red','blue'], markers=['s','o'], s=80) palette=['red','blue'], markers=['s','o'], s=80)
plot3.set_xlabel("$t$", fontsize=20) plot3.set_xlabel("$t$", fontsize=20)
plot3.set_ylabel("$P(t_{sisso})$", fontsize=20) plot3.set_ylabel("$P(t_{sisso})$", fontsize=20)
plot3.tick_params(labelsize=20) plot3.tick_params(labelsize=20)
plt.axvline(tolerance_factor_dict["t"][1][0]) plt.axvline(tolerance_factor_dict["t"][1][0])
plt.axvline(tolerance_factor_dict["t"][1][1]) plt.axvline(tolerance_factor_dict["t"][1][1])
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
# Predict stability of unseen materials (exploitation) using $t_{sisso}$ # Predict stability of unseen materials (exploitation) using $t_{sisso}$
With $t_{sisso}$ in hand, one can explore large composition spaces in search for (new) compositions which are likely to form perovskites. We start by creating a list of candidate materials to be tested. With $t_{sisso}$ in hand, one can explore large composition spaces in search for (new) compositions which are likely to form perovskites. We start by creating a list of candidate materials to be tested.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
potential_A=['Li','Na','K','Rb','Cs','Be','Mg','Ca','Sr','Ba','Sc','Y','La','Ce','Pr','Nd','Pm','Sm'] potential_A=['Li','Na','K','Rb','Cs','Be','Mg','Ca','Sr','Ba','Sc','Y','La','Ce','Pr','Nd','Pm','Sm']
potential_B=['Al','Ti', 'V', 'Cr','Mn','Fe','Co','Ni','Cu', 'Zn','Ga','Ge','Zr','Nb','Mo','Tc','Ru','Rh','Pd','Ag','Cd','Sn','Sb','Ta','W','Pt','Pb','Bi'] potential_B=['Al','Ti', 'V', 'Cr','Mn','Fe','Co','Ni','Cu', 'Zn','Ga','Ge','Zr','Nb','Mo','Tc','Ru','Rh','Pd','Ag','Cd','Sn','Sb','Ta','W','Pt','Pb','Bi']
potential_X=['O3'] potential_X=['O3']
candidates=list(map("".join, itertools.product(potential_A, potential_B, potential_X))) # combine the elements candidates=list(map("".join, itertools.product(potential_A, potential_B, potential_X))) # combine the elements
print('A list of %i candidate compositions was created.' %len(candidates)) print('A list of %i candidate compositions was created.' %len(candidates))
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
For each of the compositions, one has to predict which cations are at the $A$ and $B$-sites of the perovskite structure (based on their relative sizes) and assign oxidation states for each cation (based on their periodic table group and common oxidation states). Oxidation states are not only needed for the evaluation of the tolerance factors, but also for the determination of the Shannon ionic radii, which are themselves oxidation-state dependant. Furthermore, based on oxidations states, one can assess if a given composition is charge-balanced or not. In order to perform these tasks, we use PredictABX3.py, provided as SI of Sci. Adv. 5, eaav0693 (2019). From the list of given compositions, we determine A and B and exclude the compounds that cannot be charge-balanced. For each of the compositions, one has to predict which cations are at the $A$ and $B$-sites of the perovskite structure (based on their relative sizes) and assign oxidation states for each cation (based on their periodic table group and common oxidation states). Oxidation states are not only needed for the evaluation of the tolerance factors, but also for the determination of the Shannon ionic radii, which are themselves oxidation-state dependant. Furthermore, based on oxidations states, one can assess if a given composition is charge-balanced or not. In order to perform these tasks, we use PredictABX3.py, provided as SI of Sci. Adv. 5, eaav0693 (2019). From the list of given compositions, we determine A and B and exclude the compounds that cannot be charge-balanced.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
for i in candidates: for i in candidates:
abx=PredictABX3(i) # PredictABX3 library predict that the given candidate is perovskite or not abx=PredictABX3(i) # PredictABX3 library predict that the given candidate is perovskite or not
if isinstance(abx.pred_A,float): if isinstance(abx.pred_A,float):
candidates.remove(i) candidates.remove(i)
# else: # else:
# print('For the compound %s, A = %s and B = %s' % (i, abx.pred_A, abx.pred_B)) # print('For the compound %s, A = %s and B = %s' % (i, abx.pred_A, abx.pred_B))
print('A list of %i charge-balanced compositions was selected for further analysis:' %len(candidates)) print('A list of %i charge-balanced compositions was selected for further analysis:' %len(candidates))
#print(candidates) #print(candidates)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
We then collect all the input features for each of the given materials (including the DFT-calculated atomic feature) and evalute $t_{sisso}$, $t$ and $\tau$ for each of the materials. We then collect all the input features for each of the given materials (including the DFT-calculated atomic feature) and evalute $t_{sisso}$, $t$ and $\tau$ for each of the materials.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
def get_calc_feat(element): # For a given element gets the DFT calculated features from 'atomic_features.csv' file def get_calc_feat(element): # For a given element gets the DFT calculated features from 'atomic_features.csv' file
calc_feat=[] calc_feat=[]
with open('data/perovskites_tolerance_factor/atomic_features.csv') as f: with open('data/perovskites_tolerance_factor/atomic_features.csv') as f:
for line in f: for line in f:
line = line.split(',') line = line.split(',')
if line[2] == element: if line[2] == element:
calc_feat.append(float(line[12])*1E10) # rS (Angsrom) calc_feat.append(float(line[12])*1E10) # rS (Angsrom)
calc_feat.append(float(line[14])*1E10) # rP (Angstrom) calc_feat.append(float(line[14])*1E10) # rP (Angstrom)
calc_feat.append(float(line[1])) # Z_A calc_feat.append(float(line[1])) # Z_A
calc_feat.append(float(line[5])*6.241509074461E18) # HOMO (eV) calc_feat.append(float(line[5])*6.241509074461E18) # HOMO (eV)
calc_feat.append(float(line[6])*6.241509074461E18) # LUMO (eV) calc_feat.append(float(line[6])*6.241509074461E18) # LUMO (eV)
calc_feat.append(float(line[8])*6.241509074461E18) # EA (eV) calc_feat.append(float(line[8])*6.241509074461E18) # EA (eV)
calc_feat.append(float(line[9])*6.241509074461E18) # IP (eV) calc_feat.append(float(line[9])*6.241509074461E18) # IP (eV)
return calc_feat return calc_feat
exp_features=pd.read_csv('data/perovskites_tolerance_factor/TableS1.csv',index_col=0) exp_features=pd.read_csv('data/perovskites_tolerance_factor/TableS1.csv',index_col=0)
features=[] features=[]
for i in candidates: for i in candidates:
abx=PredictABX3(i) abx=PredictABX3(i)
if i in exp_features.index: # If the predicted material is in the Table consider it if i in exp_features.index: # If the predicted material is in the Table consider it
features.append([i, 0, abx.rA, abx.rB, abx.rX, abx.nA, abx.nB, abx.nX, abx.rA/abx.rB, abx.rA/abx.rX, abx.rB/abx.rX, abx.rX/abx.rB, *get_calc_feat(abx.pred_A), *get_calc_feat(abx.pred_B), *get_calc_feat(abx.X)]) features.append([i, 0, abx.rA, abx.rB, abx.rX, abx.nA, abx.nB, abx.nX, abx.rA/abx.rB, abx.rA/abx.rX, abx.rB/abx.rX, abx.rX/abx.rB, *get_calc_feat(abx.pred_A), *get_calc_feat(abx.pred_B), *get_calc_feat(abx.X)])
#make a data frame of the new materials with the chosen features #make a data frame of the new materials with the chosen features
exploit=pd.DataFrame(features, columns=['material', 'tau_pred_label', exploit=pd.DataFrame(features, columns=['material', 'tau_pred_label',
'rA','rB','rX','nA','nB','nX','rA_rB_ratio','rA_rX_ratio','rB_rX_ratio','rX_rB_ratio', 'rA','rB','rX','nA','nB','nX','rA_rB_ratio','rA_rX_ratio','rB_rX_ratio','rX_rB_ratio',
'rS_A','rP_A','Z_A','HOMO_A','LUMO_A','EA_A','IP_A', 'rS_A','rP_A','Z_A','HOMO_A','LUMO_A','EA_A','IP_A',
'rS_B','rP_B','Z_B','HOMO_B','LUMO_B','EA_B','IP_B', 'rS_B','rP_B','Z_B','HOMO_B','LUMO_B','EA_B','IP_B',
'rS_X','rP_X','Z_X','HOMO_X','LUMO_X','EA_X','IP_X']) 'rS_X','rP_X','Z_X','HOMO_X','LUMO_X','EA_X','IP_X'])
exploit.eval('t_sisso = ' + tolerance_factor_dict["t_sisso"][0], inplace=True) exploit.eval('t_sisso = ' + tolerance_factor_dict["t_sisso"][0], inplace=True)
exploit.eval('t ='+ tolerance_factor_dict["t"][0], inplace=True) exploit.eval('t ='+ tolerance_factor_dict["t"][0], inplace=True)
exploit.eval('tau ='+ tolerance_factor_dict["tau"][0], inplace=True) exploit.eval('tau ='+ tolerance_factor_dict["tau"][0], inplace=True)
exploit exploit
``` ```
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
x_exploit_t_sisso=exploit['t_sisso'].to_numpy() x_exploit_t_sisso=exploit['t_sisso'].to_numpy()
p_t_sisso_exploit=clf2.predict_proba(x_exploit_t_sisso.reshape(-1,1))[:,1] p_t_sisso_exploit=clf2.predict_proba(x_exploit_t_sisso.reshape(-1,1))[:,1]
exploit['p_t_sisso'] = p_t_sisso_exploit exploit['p_t_sisso'] = p_t_sisso_exploit
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
By using the established threshold, one can now determine which are the materials predicted to be stable at the perovskite structure using the tolerance factor identified by SISSO. By using the established threshold, one can now determine which are the materials predicted to be stable at the perovskite structure using the tolerance factor identified by SISSO.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
stable_candidates_t_sisso=[] stable_candidates_t_sisso=[]
for i in list(range(len(exploit))): for i in list(range(len(exploit))):
if exploit['t_sisso'][i] < threshold_t_sisso: if exploit['t_sisso'][i] < threshold_t_sisso:
stable_candidates_t_sisso.append(exploit['material'][i]) stable_candidates_t_sisso.append(exploit['material'][i])
print('According to t_sisso, %i compositions are predicted to be stable as perovskites:' %len(stable_candidates_t_sisso)) print('According to t_sisso, %i compositions are predicted to be stable as perovskites:' %len(stable_candidates_t_sisso))
print(stable_candidates_t_sisso) print(stable_candidates_t_sisso)
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
We can also compare the results obtained by means of $t_{sisso}$ with those coming from $t$ and $\tau$. In the $P(t_{sisso})$ vs. $t$ plot below, the datapoints are colored acording to their stability predicted by $\tau$. We can also compare the results obtained by means of $t_{sisso}$ with those coming from $t$ and $\tau$. In the $P(t_{sisso})$ vs. $t$ plot below, the datapoints are colored acording to their stability predicted by $\tau$.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
for i in list(range(len(exploit))): for i in list(range(len(exploit))):
if exploit['tau'][i] < + tolerance_factor_dict["t_sisso"][1]: if exploit['tau'][i] < + tolerance_factor_dict["t_sisso"][1]:
exploit['tau_pred_label'][i]=1 exploit['tau_pred_label'][i]=1
else: else:
exploit['tau_pred_label'][i]=0 exploit['tau_pred_label'][i]=0
exploit exploit
``` ```
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
#plot #plot
plt.figure(figsize=(8,8)) plt.figure(figsize=(8,8))
plot2=sns.scatterplot(x='t', y='p_t_sisso', data=exploit, hue='tau_pred_label', plot2=sns.scatterplot(x='t', y='p_t_sisso', data=exploit, hue='tau_pred_label',
palette=['orange','magenta'], markers=['s'], s=80) palette=['orange','magenta'], markers=['s'], s=80)
plot2.set_xlabel("$t$", fontsize=20) plot2.set_xlabel("$t$", fontsize=20)
plot2.set_ylabel("$P(t_{sisso})$", fontsize=20) plot2.set_ylabel("$P(t_{sisso})$", fontsize=20)
plot2.tick_params(labelsize=20) plot2.tick_params(labelsize=20)
plt.axvline(tolerance_factor_dict["t"][1][0]) plt.axvline(tolerance_factor_dict["t"][1][0])
plt.axvline(tolerance_factor_dict["t"][1][1]) plt.axvline(tolerance_factor_dict["t"][1][1])
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
# $\tau$ calculator: <br> Predicting the stability as perovskite of user-selected materials # $\tau$ calculator: <br> Predicting the stability as perovskite of user-selected materials
Here, user can enter the chemical formula of an arbitrary $ABX_3$ material, which is classified as stable vs unstable in the pervoskite structure by using $\tau$ from the original article. Here, user can enter the chemical formula of an arbitrary $ABX_3$ material, which is classified as stable vs unstable in the pervoskite structure by using $\tau$ from the original article.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
# defining some widgets # defining some widgets
text = widgets.Text( text = widgets.Text(
value="CaTiO3", value="CaTiO3",
description='ABX3', ) description='ABX3', )
def printmd(string): def printmd(string):
display(Markdown(string)) display(Markdown(string))
printmd("Please enter a material with formula ABX3, e.g., CaTiO3:") printmd("Please enter a material with formula ABX3, e.g., CaTiO3:")
# button, output, function and linkage # button, output, function and linkage
butt = widgets.Button(description='ENTER') butt = widgets.Button(description='ENTER')
butt.style.button_color = 'lightblue' butt.style.button_color = 'lightblue'
butt.style.font_weight='bold' butt.style.font_weight='bold'
outt = widgets.Output() outt = widgets.Output()
def on_butt_clicked(b): def on_butt_clicked(b):
with outt: with outt:
clear_output() clear_output()
# ABX3 # ABX3
# Prediction will be done based on the articles tolerance factor (tau). If there is another factor, it should be replaced below # Prediction will be done based on the articles tolerance factor (tau). If there is another factor, it should be replaced below
ABX3=PredictABX3(text.value) ABX3=PredictABX3(text.value)
rB=ABX3.rB rB=ABX3.rB
rX= ABX3.rX rX= ABX3.rX
nA = ABX3.nA nA = ABX3.nA
rA_rB_ratio = ABX3.rA/ABX3.rB rA_rB_ratio = ABX3.rA/ABX3.rB
rX_rB_ratio = ABX3.rX/ABX3.rB rX_rB_ratio = ABX3.rX/ABX3.rB
p_tau=[eval(tolerance_factor_dict["tau"][0])]*len(p_t_sisso_train)
p_tau=clf2.predict_proba(np.array(p_tau).reshape(-1,1))[:,1]
if (str(ABX3.chosen_ox_states)=="nan"): if (str(ABX3.chosen_ox_states)=="nan"):
print("Unfortunately, no charge-balanced combinations for the cations oxidation states were found, therefore tau can not be evaluated for the given formula.") print("Unfortunately, no charge-balanced combinations for the cations oxidation states were found, therefore tau can not be evaluated for the given formula.")
else: else:
p_tau=[eval(tolerance_factor_dict["tau"][0])]*len(p_t_sisso_train)
p_tau=clf2.predict_proba(np.array(p_tau).reshape(-1,1))[:,1]
print("The entered chemical formula has: \n\n Larger cation (A): %s \n Smaller cation (B): %s \n Anion (X): %s" % (ABX3.pred_A,ABX3.pred_B,ABX3.X)) print("The entered chemical formula has: \n\n Larger cation (A): %s \n Smaller cation (B): %s \n Anion (X): %s" % (ABX3.pred_A,ABX3.pred_B,ABX3.X))
tau_value=eval(tolerance_factor_dict["tau"][0]) tau_value=eval(tolerance_factor_dict["tau"][0])
a=[ABX3.pred_A,ABX3.pred_B,ABX3.X,"3"] a=[ABX3.pred_A,ABX3.pred_B,ABX3.X,"3"]
correct_formula=''.join(a) correct_formula=''.join(a)
print('\n τ = %.2f which is %s %.3f (τ threshold), so %s is predicted %s' print('\n τ = %.2f which is %s %.3f (τ threshold), so %s is predicted %s'
% (tau_value,'<' if tau_value < tolerance_factor_dict["tau"][1] else '>',tolerance_factor_dict["t_sisso"][1] , correct_formula , 'perovskite' if tau_value < tolerance_factor_dict["tau"][1] else 'nonperovskite')) % (tau_value,'<' if tau_value < tolerance_factor_dict["tau"][1] else '>',tolerance_factor_dict["t_sisso"][1] , correct_formula , 'perovskite' if tau_value < tolerance_factor_dict["tau"][1] else 'nonperovskite'))
print(' The Platt-scaled probability to be perovskite is %.2f' % p_tau[0]) print(' The Platt-scaled probability to be perovskite is %.2f' % p_tau[0])
butt.on_click(on_butt_clicked) butt.on_click(on_butt_clicked)
widgets.VBox([text,butt,outt]) widgets.VBox([text,butt,outt])
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
<b>Platt-scaled classification probabilities</b> <br> <b>Platt-scaled classification probabilities</b> <br>
Here, for the user selected material, Platt-scaled classification probabilities $P(t_{sisso})$ are plotted based on Here, for the user selected material, Platt-scaled classification probabilities $P(t_{sisso})$ are plotted based on
$\tau$ values and labels. The plot is a 2D colormap and user can select from the drop-down list which variables are put on the X and Y axes. As a default $P(t_{sisso})$ is plotted with respect to rA, rB variables. $\tau$ values and labels. The plot is a 2D colormap and user can select from the drop-down list which variables are put on the X and Y axes. As a default $P(t_{sisso})$ is plotted with respect to rA, rB variables.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
# P(τ) colormap plot - variables as dropdown list - As a default P(τ) is plotted with respect to rA, rB # P(τ) colormap plot - variables as dropdown list - As a default P(τ) is plotted with respect to rA, rB
ABX3 = PredictABX3(text.value) ABX3 = PredictABX3(text.value)
rX = ABX3.rX rX = ABX3.rX
nA = ABX3.nA nA = ABX3.nA
rA = ABX3.rA rA = ABX3.rA
rB = ABX3.rB rB = ABX3.rB
rA_rB_ratio=rA/rB rA_rB_ratio=rA/rB
list_var_rArBrX = np.linspace(0.1, 2.2, 200) list_var_rArBrX = np.linspace(0.1, 2.2, 200)
list_var_nA = [1,2,3] list_var_nA = [1,2,3]
dropdown_variables_list = widgets.Dropdown(options = ['','rA , rX', 'rB , rX','nA , rA','nA , rB','nA , rX']) dropdown_variables_list = widgets.Dropdown(options = ['','rA , rX', 'rB , rX','nA , rA','nA , rB','nA , rX'])
def dropdown_variables(change): def dropdown_variables(change):
X, Y = np.meshgrid(list_var_rArBrX, list_var_rArBrX) X, Y = np.meshgrid(list_var_rArBrX, list_var_rArBrX)
if change == "rA , rB": if change == "rA , rB":
def f(rb, ra): def f(rb, ra):
rB=rb rB=rb
rA_rB_ratio=ra/rb rA_rB_ratio=ra/rb
return (ne.evaluate(tolerance_factor_dict["tau"][0])) return (ne.evaluate(tolerance_factor_dict["tau"][0]))
plt.xlabel("$r_B(\mathrm{\AA})$",fontsize=20) plt.xlabel("$r_B(\mathrm{\AA})$",fontsize=20)
plt.ylabel("$r_A(\mathrm{\AA})$",fontsize=20) plt.ylabel("$r_A(\mathrm{\AA})$",fontsize=20)
plt.locator_params(axis="y", nbins=6) plt.locator_params(axis="y", nbins=6)
plt.plot(ABX3.rB,ABX3.rA,"gs", label=text.value,markeredgewidth=4) plt.plot(ABX3.rB,ABX3.rA,"gs", label=text.value,markeredgewidth=4)
elif change['new'] == "rA , rX": elif change['new'] == "rA , rX":
def f(rx, ra): def f(rx, ra):
rX=rx rX=rx
rA_rB_ratio=ra/rB rA_rB_ratio=ra/rB
return (ne.evaluate(tolerance_factor_dict["tau"][0])) return (ne.evaluate(tolerance_factor_dict["tau"][0]))
plt.xlabel("$r_X(\mathrm{\AA})$",fontsize=20) plt.xlabel("$r_X(\mathrm{\AA})$",fontsize=20)
plt.ylabel("$r_A(\mathrm{\AA})$",fontsize=20) plt.ylabel("$r_A(\mathrm{\AA})$",fontsize=20)
plt.locator_params(axis="y", nbins=6) plt.locator_params(axis="y", nbins=6)
plt.plot(ABX3.rX,ABX3.rA,"gs", label=text.value,markeredgewidth=4) plt.plot(ABX3.rX,ABX3.rA,"gs", label=text.value,markeredgewidth=4)
elif change['new'] == "rB , rX": elif change['new'] == "rB , rX":
def f(rb, rx): def f(rb, rx):
rB=rb rB=rb
rX=rx rX=rx
rA_rB_ratio=rA/rb rA_rB_ratio=rA/rb
return (ne.evaluate(tolerance_factor_dict["tau"][0])) return (ne.evaluate(tolerance_factor_dict["tau"][0]))
plt.xlabel("$r_B(\mathrm{\AA})$",fontsize=20) plt.xlabel("$r_B(\mathrm{\AA})$",fontsize=20)
plt.ylabel("$r_X(\mathrm{\AA})$",fontsize=20) plt.ylabel("$r_X(\mathrm{\AA})$",fontsize=20)
plt.locator_params(axis="y", nbins=6) plt.locator_params(axis="y", nbins=6)
plt.plot(rB,rX,"gs", label=text.value,markeredgewidth=4) plt.plot(rB,rX,"gs", label=text.value,markeredgewidth=4)
elif change['new'] == "nA , rA": elif change['new'] == "nA , rA":
def f(ra, na): def f(ra, na):
nA=na nA=na
rA_rB_ratio=ra/rB rA_rB_ratio=ra/rB
return (ne.evaluate(tolerance_factor_dict["tau"][0])) return (ne.evaluate(tolerance_factor_dict["tau"][0]))
X, Y = np.meshgrid(list_var_rArBrX,list_var_nA) X, Y = np.meshgrid(list_var_rArBrX,list_var_nA)
plt.xlabel("$r_A(\mathrm{\AA})$",fontsize=20) plt.xlabel("$r_A(\mathrm{\AA})$",fontsize=20)
plt.ylabel("$n_A$",fontsize=20) plt.ylabel("$n_A$",fontsize=20)
plt.locator_params(axis="y", nbins=len(list_var_nA)) plt.locator_params(axis="y", nbins=len(list_var_nA))
plt.plot(ABX3.rA,ABX3.nA,"gs", label=text.value,markeredgewidth=4) plt.plot(ABX3.rA,ABX3.nA,"gs", label=text.value,markeredgewidth=4)
elif change['new'] == "nA , rB": elif change['new'] == "nA , rB":
def f(rb, na): def f(rb, na):
nA=na nA=na
rB=rb rB=rb
rA_rB_ratio=rA/rb rA_rB_ratio=rA/rb
return (ne.evaluate(tolerance_factor_dict["tau"][0])) return (ne.evaluate(tolerance_factor_dict["tau"][0]))
X, Y = np.meshgrid(list_var_rArBrX,list_var_nA) X, Y = np.meshgrid(list_var_rArBrX,list_var_nA)
plt.xlabel("$r_B(\mathrm{\AA})$",fontsize=20) plt.xlabel("$r_B(\mathrm{\AA})$",fontsize=20)
plt.ylabel("$n_A$",fontsize=20) plt.ylabel("$n_A$",fontsize=20)
plt.locator_params(axis="y", nbins=len(list_var_nA)) plt.locator_params(axis="y", nbins=len(list_var_nA))
plt.plot(ABX3.rB,ABX3.nA,"gs", label=text.value,markeredgewidth=4) plt.plot(ABX3.rB,ABX3.nA,"gs", label=text.value,markeredgewidth=4)
elif change['new'] == "nA , rX": elif change['new'] == "nA , rX":
def f(rx, na): def f(rx, na):
nA=na nA=na
rX=rx rX=rx
rA_rB_ratio=rA/rB rA_rB_ratio=rA/rB
return (ne.evaluate(tolerance_factor_dict["tau"][0])) return (ne.evaluate(tolerance_factor_dict["tau"][0]))
X, Y = np.meshgrid(list_var_rArBrX,list_var_nA) X, Y = np.meshgrid(list_var_rArBrX,list_var_nA)
plt.xlabel("$r_X(\mathrm{\AA})$",fontsize=20) plt.xlabel("$r_X(\mathrm{\AA})$",fontsize=20)
plt.ylabel("$n_A$",fontsize=20) plt.ylabel("$n_A$",fontsize=20)
plt.locator_params(axis="y", nbins=len(list_var_nA)) plt.locator_params(axis="y", nbins=len(list_var_nA))
plt.plot(ABX3.rX,ABX3.nA,"gs", label=text.value,markeredgewidth=4) plt.plot(ABX3.rX,ABX3.nA,"gs", label=text.value,markeredgewidth=4)
Z = list(f(X, Y)) Z = list(f(X, Y))
Z = [np.nan_to_num(i, copy=False) for i in Z] Z = [np.nan_to_num(i, copy=False) for i in Z]
X = list(X) X = list(X)
Y = list(Y) Y = list(Y)
p_t_sisso=[clf2.predict_proba(np.array(i).reshape(-1,1))[:,1] for i in Z] p_t_sisso=[clf2.predict_proba(np.array(i).reshape(-1,1))[:,1] for i in Z]
plt.contourf(X, Y, p_t_sisso, 50, cmap='RdBu') plt.contourf(X, Y, p_t_sisso, 50, cmap='RdBu')
cbar = plt.colorbar() cbar = plt.colorbar()
cbar.set_label(label=r'P($\tau$)',size=22) cbar.set_label(label=r'P($\tau$)',size=22)
cbar.locator=ticker.MaxNLocator(nbins=10) cbar.locator=ticker.MaxNLocator(nbins=10)
cbar.update_ticks() cbar.update_ticks()
for t in cbar.ax.get_yticklabels(): for t in cbar.ax.get_yticklabels():
t.set_fontsize(15) t.set_fontsize(15)
plt.title("Platt-scaled probability",x=0.5, y=1.05,fontsize=22) plt.title("Platt-scaled probability",x=0.5, y=1.05,fontsize=22)
plt.tick_params(axis = 'both', which = 'major', labelsize = 18) plt.tick_params(axis = 'both', which = 'major', labelsize = 18)
plt.tick_params(which='minor',length=5,width=3.0) plt.tick_params(which='minor',length=5,width=3.0)
plt.tick_params(which='major',length=5.0, width=3.00) plt.tick_params(which='major',length=5.0, width=3.00)
plt.tick_params(axis='y',which='minor',right='off') plt.tick_params(axis='y',which='minor',right='off')
plt.tick_params(axis='x',which='major',top='off') plt.tick_params(axis='x',which='major',top='off')
plt.tick_params(axis='x',which='minor',top='off') plt.tick_params(axis='x',which='minor',top='off')
plt.tick_params(axis='y',which='major',right='off') plt.tick_params(axis='y',which='major',right='off')
plt.rcParams['figure.figsize'] = [9.5, 7] plt.rcParams['figure.figsize'] = [9.5, 7]
plt.locator_params(axis="x", nbins=6) plt.locator_params(axis="x", nbins=6)
plt.legend(prop={'size': 13},bbox_to_anchor=(1.132, 1.10)) plt.legend(prop={'size': 13},bbox_to_anchor=(1.132, 1.10))
plt.show() plt.show()
dropdown_variables("rA , rB") dropdown_variables("rA , rB")
dropdown_variables_list.observe(dropdown_variables, names='value') dropdown_variables_list.observe(dropdown_variables, names='value')
display(dropdown_variables_list) display(dropdown_variables_list)
``` ```
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
``` ```
......
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment