Commit 1a629640 authored by Luigi

Create module perovskites_tolerance_factor

parent c1a811d1
H,2.2
He,0
Li,0.98
Be,1.57
B,2.04
C,2.55
N,3.04
O,3.44
F,3.98
Ne,0
Na,0.93
Mg,1.31
Al,1.61
Si,1.9
P,2.19
S,2.58
Cl,3.16
Ar,0
K,0.82
Ca,1
Sc,1.36
Ti,1.54
V,1.63
Cr,1.66
Mn,1.55
Fe,1.83
Co,1.88
Ni,1.91
Cu,1.9
Zn,1.65
Ga,1.81
Ge,2.01
As,2.18
Se,2.55
Br,2.96
Kr,3
Rb,0.82
Sr,0.95
Y,1.22
Zr,1.33
Nb,1.6
Mo,2.16
Tc,1.9
Ru,2.2
Rh,2.28
Pd,2.2
Ag,1.93
Cd,1.69
In,1.78
Sn,1.96
Sb,2.05
Te,2.1
I,2.66
Xe,2.6
Cs,0.79
Ba,0.89
La,1.1
Ce,1.12
Pr,1.13
Nd,1.14
Pm,1.13
Sm,1.17
Eu,1.2
Gd,1.2
Tb,1.1
Dy,1.22
Ho,1.23
Er,1.24
Tm,1.25
Yb,1.1
Lu,1.27
Hf,1.3
Ta,1.5
W,2.36
Re,1.9
Os,2.2
Ir,2.2
Pt,2.28
Au,2.54
Hg,2
Tl,1.62
Pb,2.33
Bi,2.02
Po,2
At,2.2
Rn,0
Fr,0.7
Ra,0.9
Ac,1.1
Th,1.3
Pa,1.5
U,1.38
Np,1.36
Pu,1.28
Am,1.3
Cm,1.3
Bk,1.3
Cf,1.3
Es,1.3
Fm,1.3
Md,1.3
No,1.3
Lr,1.3
Rf,nan
Db,nan
Sg,nan
Bh,nan
Hs,nan
Mt,nan
%% Cell type:markdown id: tags:
<div id="teaser" style=' background-position: right center; background-size: 00px; background-repeat: no-repeat;
padding-top: 20px;
padding-right: 10px;
padding-bottom: 170px;
padding-left: 10px;
border-bottom: 14px double #333;
border-top: 14px double #333;' >
<div style="text-align:center">
<b><font size="6.4">Finding a tolerance factor to predict perovskite stability with SISSO </font></b>
</div>
<p>
created by:
Lucas Foppa<sup>1</sup>,
Thomas Purcell<sup>1</sup>,
Luigi Sbailò <sup> 1</sup>,
Christopher Bartel <sup> 2</sup>,
and Luca M. Ghiringhelli<sup>1</sup> <br><br>
<sup>1</sup> Fritz Haber Institute of the Max Planck Society, Berlin, Germany <br>
<sup>2</sup> UC Berkeley, CA, USA <br>
<span class="nomad--last-updated" data-version="v1.0.0">[Last updated: May 30, 2020]</span>
<div>
<img style="float: left;" src="./assets/perovskites_tolerance_factor/Logo_MPG.png" width="200">
<img style="float: right;" src="./assets/perovskites_tolerance_factor/Logo_NOMAD.png" width="250">
</div>
</div>
%% Cell type:markdown id: tags:
This tutorial shows how tolerance factors for perovskite stability can be derived from data with the sure independence screening and sparsifying operator (SISSO) descriptor-identification approach.
The SISSO method is described in detail in:
<div style="padding: 1ex; margin-top: 1ex; margin-bottom: 1ex; border-style: dotted; border-width: 1pt; border-color: blue; border-radius: 3px;">
R. Ouyang, S. Curtarolo, E. Ahmetcik, M. Scheffler, L. M. Ghiringhelli: <span style="font-style: italic;">SISSO: a compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates</span>, Phys. Rev. Materials 2, 083802 (2018) <a href="https://journals.aps.org/prmaterials/abstract/10.1103/PhysRevMaterials.2.083802" target="_blank">[PDF]</a> .
</div>
This tutorial is based on the following publication:
<div style="padding: 1ex; margin-top: 1ex; margin-bottom: 1ex; border-style: dotted; border-width: 1pt; border-color: blue; border-radius: 3px;">
C. Bartel, C. Sutton, B. R. Goldsmith, R. Ouyang, C. B. Musgrave, L. M. Ghiringhelli, M. Scheffler: <span style="font-style: italic;">New tolerance factor to predict the stability of perovskite oxides and halides</span>, Sci. Adv. 5, eaav0693 (2019) <a href="https://advances.sciencemag.org/content/advances/5/2/eaav0693.full.pdf" target="_blank">[PDF]</a> .
</div>
# Perovskites and the Goldschmidt tolerance factor
Perovskites are a class of materials with the basic formula $ABX_3$ and a common structure in which a smaller metal cation $B$ (e.g. a transition metal) resides in corner-sharing octahedra of $X$ anions (e.g. $O^{2-}$, $Cl^-$, $Br^-$) and a larger metal cation $A$ (e.g. alkali, alkaline earth or lanthanide) has 12-fold coordination with the $X$ anions. This class of compounds displays a remarkable variety of electronic, magnetic, optical, mechanical and transport properties, which derives from the possibility of tuning the material properties through the composition. In fact, ca. 90% of the natural metallic elements of the periodic table can be stabilized in a perovskite structure. Therefore, perovskites are versatile materials suitable for a number of applications including photovoltaics, thermoelectrics and catalysis.
The first step in designing new perovskites is to assess their stability. For this purpose, the Goldschmidt tolerance factor, $t$, has been extensively used to predict the stability of a material in the perovskite structure based on the (Shannon) ionic radii, $r_i$, of each ion in the chemical formula $(A,B,X)$:
$$ t=\frac{r_A+r_X}{\sqrt2(r_B+r_X)} $$
$t$ measures how well the $A$-site cation fits into the corner-sharing octahedral network of a cubic crystal structure. It indicates the compatibility of a given set of ions with the ideal, cubic perovskite structure ($t\approx1$). Distortions from the cubic structure arise from size mismatch between cations and anions, which results in perovskite structures other than cubic (e.g. orthorhombic, rhombohedral). However, when these distortions are too large (e.g. $t<0.8$ or $t>1.05$), the perovskite structure may be unstable and non-perovskite structures are formed.
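As a quick illustration of the formula above, the following minimal sketch evaluates $t$ for two compositions. The function names `goldschmidt_t` and `in_window` are hypothetical helpers introduced here, the radii (in Å) are the values listed for SrTiO3 and AgBrO3 in the dataset used in this tutorial, and the window $0.8 \le t \le 1.05$ is only the example range quoted in the text, not a sharp stability criterion.

```python
import math


def goldschmidt_t(r_a, r_b, r_x):
    """Goldschmidt tolerance factor t = (r_A + r_X) / (sqrt(2) * (r_B + r_X))."""
    return (r_a + r_x) / (math.sqrt(2) * (r_b + r_x))


def in_window(t, t_min=0.8, t_max=1.05):
    """Illustrative check against the example window quoted in the text."""
    return t_min <= t <= t_max


# Shannon radii (rA, rB, rX) in angstrom, taken from the tutorial dataset
t_srtio3 = goldschmidt_t(1.44, 0.60, 1.40)  # SrTiO3, labelled 1 in the dataset
t_agbro3 = goldschmidt_t(1.28, 0.31, 1.40)  # AgBrO3, labelled -1 in the dataset

print(f"SrTiO3: t = {t_srtio3:.3f}, inside window: {in_window(t_srtio3)}")
print(f"AgBrO3: t = {t_agbro3:.3f}, inside window: {in_window(t_agbro3)}")
```

SrTiO3 gives $t\approx1.00$, close to the ideal cubic value, whereas AgBrO3 gives $t\approx1.11$, outside the example window.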
The accuracy of the Goldschmidt factor is, however, often insufficient to screen for new potential materials, and several modifications have been proposed to overcome this issue. For instance, the input radii have been refined and the dimensionality of the factor has been increased. In this tutorial, we show how data can be used to derive tolerance factors for perovskite stability.
# The SISSO method for descriptor identification
A crucial step in data-driven materials science is the identification of descriptors, functions of parameters characterizing the phenomena governing a certain property. Descriptors allow distinguishing materials and, crucially, should be obtained (measured or calculated) more easily than the property itself, so that they can be evaluated for large sets of still unknown materials to search for new ones.
The sure independence screening and sparsifying operator (SISSO) method combines a symbolic-regression-based feature construction with compressed sensing for the identification of the best low-dimensional descriptors based on data. Within SISSO feature construction, an initial set of input features (the primary features) offered by the user is systematically combined by the application of mathematical operators (e.g. addition, multiplication, exponential, square root), generating a large space (containing up to a billion elements) of candidate features. The candidate features are then ranked according to their fit to the target property (for classification problems, the number of materials in the overlap of convex-hull regions) and the top-ranked features are further used for descriptor selection.
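The feature-construction step can be sketched in a few lines of NumPy. This is a toy illustration under simplifying assumptions, not the implementation of the `sisso` package: the helper `expand_one_rung` is hypothetical, it uses a small operator set (inverse, logarithm, subtraction, multiplication, division), and it does not remove redundant or invalid features. The three rows of input values are those listed for SrTiO3, AgCdBr3 and YTmO3 in the tutorial dataset.

```python
import itertools

import numpy as np


def expand_one_rung(features):
    """Apply unary and binary operators once to all features (toy version)."""
    new = dict(features)
    # Negative or zero inputs yield nan/inf; silence the warnings in this sketch
    with np.errstate(divide="ignore", invalid="ignore"):
        for name, x in features.items():
            new[f"1/({name})"] = 1.0 / x
            new[f"log({name})"] = np.log(x)
        for (na, a), (nb, b) in itertools.combinations(features.items(), 2):
            new[f"({na})-({nb})"] = a - b
            new[f"({na})*({nb})"] = a * b
            new[f"({na})/({nb})"] = a / b
            new[f"({nb})/({na})"] = b / a
    return new


# Primary features for SrTiO3, AgCdBr3 and YTmO3 (values from the dataset)
primary = {
    "nA": np.array([2.0, 1.0, 3.0]),
    "rA_rB_ratio": np.array([2.4, 1.34737, 1.22727]),
    "rB_rX_ratio": np.array([0.428571, 0.484694, 0.628571]),
}

rung1 = expand_one_rung(primary)
rung2 = expand_one_rung(rung1)
print(len(primary), len(rung1), len(rung2))
```

Even with three primary features and five operators, the number of candidates grows rapidly with each additional application of the operators, which is why a screening step is needed before descriptor selection.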
For further details on compressed sensing methods (including SISSO) for descriptor identification, a dedicated notebook is available in the NOMAD toolkit.
%% Cell type:markdown id: tags:
# Import required modules
%% Cell type:code id: tags:
``` python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import math
import seaborn as sns
from sisso.feature_creation.feature_space import FeatureSpace
from perovskites_tolerance_factor.PredictPerovskites import PredictABX3, PredictAABBXX6
from sklearn import tree
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.calibration import CalibratedClassifierCV
```
%% Cell type:markdown id: tags:
# Get the data
The data consists of a list of 576 $ABX_3$ solids experimentally characterized at ambient conditions, classified as stable or unstable in the perovskite structure, together with the following features:
<div >
<ul>
<li>$r_A, r_B, r_X$: Shannon ionic radii of each ion, with $r_A>r_B$ </li>
<li>$n_A, n_B, n_X$: oxidation states of each ion </li>
<li>$\frac{r_A}{r_B}, \frac{r_A}{r_X}, \frac{r_B}{r_X}$: ionic radii ratios </li>
<li>$Z_A, Z_B, Z_X$: nuclear charges </li>
<li>$r_{s,A}, r_{s,B}, r_{s,X}$: calculated radius where the radial distribution of the $s$ orbital has its maximum </li>
<li>$r_{p,A}, r_{p,B}, r_{p,X}$: calculated radius where the radial distribution of the $p$ orbital has its maximum</li>
<li>$HOMO_A, HOMO_B, HOMO_X$: calculated energy of the highest occupied atomic orbital</li>
<li>$LUMO_A, LUMO_B, LUMO_X$: calculated energy of the lowest unoccupied atomic orbital</li>
<li>$EA_A, EA_B, EA_X$: calculated electron affinity</li>
<li>$IP_A, IP_B, IP_X$: calculated ionization potential</li>
</ul>
</div>
The calculated features were obtained with DFT-PBE using the FHI-aims all-electron full-potential code and correspond to properties of isolated atoms.
%% Cell type:code id: tags:
``` python
#load data
df = pd.read_csv("data/perovskites_tolerance_factor/data_perovskite.csv", index_col=0)
#show data
df
```
%% Output
exp_label rA (AA) rB (AA) rX (AA) nA (Unitless) nB (Unitless) \
material
AgBrO3 -1 1.28 0.31 1.40 1 5
AgCdBr3 -1 1.28 0.95 1.96 1 2
PbAgBr3 -1 1.49 1.15 1.96 2 1
AgCaCl3 -1 1.28 1.00 1.81 1 2
AgClO3 -1 1.28 0.12 1.40 1 5
... ... ... ... ... ... ...
RbUO3 1 1.72 0.76 1.40 1 5
SmTiO3 1 1.24 0.67 1.40 3 3
SrTeO3 -1 1.44 0.97 1.40 2 4
SrTiO3 1 1.44 0.60 1.40 2 4
YTmO3 -1 1.08 0.88 1.40 3 3
nX(Unitless) rA_rB_ratio (Unitless) rA_rX_ratio (Unitless) \
material
AgBrO3 -2 4.12903 0.914286
AgCdBr3 -1 1.34737 0.653061
PbAgBr3 -1 1.29565 0.760204
AgCaCl3 -1 1.28000 0.707182
AgClO3 -2 10.66670 0.914286
... ... ... ...
RbUO3 -2 2.26316 1.228570
SmTiO3 -2 1.85075 0.885714
SrTeO3 -2 1.48454 1.028570
SrTiO3 -2 2.40000 1.028570
YTmO3 -2 1.22727 0.771429
rB_rX_ratio (Unitless) ... LUMO_B (eV) EA_B (eV) IP_B (eV) \
material ...
AgBrO3 0.221429 ... 0.055110 -3.678151 12.554312
AgCdBr3 0.484694 ... -1.157118 0.948262 9.271930
PbAgBr3 0.586735 ... -0.246293 -1.475587 7.755963
AgCaCl3 0.552486 ... -1.945848 0.149995 6.309260
AgClO3 0.085714 ... 0.019724 -3.935230 13.876021
... ... ... ... ... ...
RbUO3 0.542857 ... -1.995273 0.546862 5.590258
SmTiO3 0.478571 ... -4.219539 -0.313899 7.119307
SrTeO3 0.692857 ... 0.193946 -2.575489 9.729526
SrTiO3 0.428571 ... -4.219539 -0.313899 7.119307
YTmO3 0.628571 ... -1.072406 0.522244 6.424662
rS_X (AA) rP_X (AA) Z_X (elem_charge) HOMO_X (eV) LUMO_X (eV) \
material
AgBrO3 0.4608 0.4333 8 -9.030485 -0.068724
AgCdBr3 0.7514 0.8834 35 -7.858439 0.055110
PbAgBr3 0.7514 0.8834 35 -7.858439 0.055110
AgCaCl3 0.6785 0.7567 17 -8.594666 0.019724
AgClO3 0.4608 0.4333 8 -9.030485 -0.068724
... ... ... ... ... ...
RbUO3 0.4608 0.4333 8 -9.030485 -0.068724
SmTiO3 0.4608 0.4333 8 -9.030485 -0.068724
SrTeO3 0.4608 0.4333 8 -9.030485 -0.068724
SrTiO3 0.4608 0.4333 8 -9.030485 -0.068724
YTmO3 0.4608 0.4333 8 -9.030485 -0.068724
EA_X (eV) IP_X (eV)
material
AgBrO3 -3.078804 16.431366
AgCdBr3 -3.678151 12.554312
PbAgBr3 -3.678151 12.554312
AgCaCl3 -3.935230 13.876021
AgClO3 -3.078804 16.431366
... ... ...
RbUO3 -3.078804 16.431366
SmTiO3 -3.078804 16.431366
SrTeO3 -3.078804 16.431366
SrTiO3 -3.078804 16.431366
YTmO3 -3.078804 16.431366
[576 rows x 31 columns]
%% Cell type:code id: tags:
``` python
#count the number of materials in each class in the whole dataset
print('In the whole dataset, %s compositions are unstable and %s are stable.' % (df['exp_label'].value_counts().values[0], df['exp_label'].value_counts().values[1]))
```
%% Output
In the whole dataset, 313 compositions are unstable and 263 are stable.
%% Cell type:code id: tags:
``` python
#split the data into 80% training and 20% testing
train,test=train_test_split(df,test_size=0.2)
```
%% Cell type:code id: tags:
``` python
#count the number of materials in each class in the training/test sets
print('In the training set, %s compositions are unstable and %s are stable.' % (train['exp_label'].value_counts().values[0], train['exp_label'].value_counts().values[1]))
print('In the test set, %s compositions are unstable and %s are stable.' % (test['exp_label'].value_counts().values[0], test['exp_label'].value_counts().values[1]))
```
%% Output
In the training set, 237 compositions are unstable and 223 are stable.
In the test set, 76 compositions are unstable and 40 are stable.
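%% Cell type:markdown id: tags:
The split above is random, so the class balance of the training and test sets changes from run to run. If a reproducible split with the same class proportions in both sets is preferred, one option is to pass `stratify` and `random_state` to `train_test_split`. The sketch below uses a toy DataFrame with two classes of 313 and 263 members, the class sizes printed for the whole dataset; which label receives which count is arbitrary here, and the names `toy`, `train_s` and `test_s` as well as the seed value are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the dataset: only the label column, 313 + 263 = 576 rows
toy = pd.DataFrame({"exp_label": np.array([-1] * 313 + [1] * 263)})

# stratify keeps the class proportions equal in both sets;
# random_state makes the split reproducible
train_s, test_s = train_test_split(
    toy,
    test_size=0.2,
    stratify=toy["exp_label"],
    random_state=0,
)

print(train_s["exp_label"].value_counts().sort_index())
print(test_s["exp_label"].value_counts().sort_index())
```

With stratification, both sets keep the class ratio of the full dataset up to rounding.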
%% Cell type:code id: tags:
``` python
#sort the training data by the labels (stable/unstable)
train.sort_values(by=['exp_label'],inplace=True)
```
%% Output
<ipython-input-7-da9e169065e3>:2: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
train.sort_values(by=['exp_label'],inplace=True)
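%% Cell type:markdown id: tags:
The `SettingWithCopyWarning` above is raised because a DataFrame derived from another one is modified in place. A common way to avoid it, sketched here on a toy DataFrame with hypothetical names, is to assign the sorted result to a variable instead of using `inplace=True`.

```python
import pandas as pd

# Toy data: a subset taken from a larger DataFrame, as after a train/test split
toy = pd.DataFrame(
    {"exp_label": [1, -1, 1, -1, 1]},
    index=["m1", "m2", "m3", "m4", "m5"],
)
subset = toy.iloc[:4]

# sort_values returns a new, sorted DataFrame; the original subset is untouched
subset_sorted = subset.sort_values(by=["exp_label"])
print(subset_sorted)
```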
%% Cell type:markdown id: tags:
# Generate the candidate feature space from the primary features and operators
The two ingredients to create the feature space with SISSO are the features to be used (i.e. the primary features) and the set of mathematical operators to be applied. Another input from the user is the number of times the operators are applied, the so-called rung (max_phi).
%% Cell type:code id: tags:
``` python
#define list of primary features - user has to choose
cols = [
# 'rA (AA)',
# 'rB (AA)',
# 'rX (AA)',
'nA (Unitless)',
# 'nB (Unitless)',
# 'nX(Unitless)',
'rA_rB_ratio (Unitless)',
# 'rA_rX_ratio (Unitless)',
'rB_rX_ratio (Unitless)',
# 'rS_A (AA)',
# 'rP_A (AA)',
# 'Z_A (elem_charge)',
# 'HOMO_A (eV)',
# 'LUMO_A (eV)',
# 'EA_A (eV)',
# 'IP_A (eV)',
# 'rS_B (AA)',
# 'rP_B (AA)',
# 'Z_B (elem_charge)',
# 'HOMO_B (eV)',
# 'LUMO_B (eV)',
# 'EA_B (eV)',
# 'IP_B (eV)',
# 'rS_X (AA)',
# 'rP_X (AA)',
# 'Z_X (elem_charge)',
# 'HOMO_X (eV)',
# 'LUMO_X (eV)',
# 'EA_X (eV)',
# 'IP_X (eV)'
]
```
%% Cell type:code id: tags:
``` python
#define list of operators - user has to choose
ops = [
# "add",
"sub",
# "abs_diff",
"mult",
"div",
# "exp",
# "neg_exp",
"inv",
# "sq",
# "cb",
# "sixth_power",
# "sqrt",
# "cbrt",
"log",
# "abs",
# "sin",
# "cos",
]
```
%% Cell type:code id: tags:
``` python
#feature space creation - user has to choose rung
fs = FeatureSpace.from_df(
train,
"exp_label",
ops,
cols,
max_phi=3, # rung
n_sis_select=100,
parameterize=False,
fix_c_0=False,
)
```
%% Output
/home/sbailo/anaconda3/envs/sisso/lib/python3.8/site-packages/sisso/feature_creation/nodes/functions.py:41: RuntimeWarning: divide by zero encountered in true_divide
return np.divide(1.0, alpha * x + a) + c
/home/sbailo/anaconda3/envs/sisso/lib/python3.8/site-packages/sisso/feature_creation/nodes/functions.py:71: RuntimeWarning: divide by zero encountered in log
return alpha * np.log(x + a) + c
/home/sbailo/anaconda3/envs/sisso/lib/python3.8/site-packages/sisso/feature_creation/nodes/functions.py:71: RuntimeWarning: invalid value encountered in log
return alpha * np.log(x + a) + c
/home/sbailo/anaconda3/envs/sisso/lib/python3.8/site-packages/sisso/feature_creation/nodes/functions.py:26: RuntimeWarning: divide by zero encountered in true_divide
return np.divide(x[pvt:], x[:pvt] * alpha + a) + c
/home/sbailo/anaconda3/envs/sisso/lib/python3.8/site-packages/sisso/feature_creation/nodes/functions.py:26: RuntimeWarning: invalid value encountered in true_divide
return np.divide(x[pvt:], x[:pvt] * alpha + a) + c
%% Cell type:code id: tags:
``` python
#visualize the feature space created
fs.all_df
```
%% Output
nA () rA_rB_ratio () rB_rX_ratio () nA - rA_rB_ratio () \
material
FeGaO3 3.0 1.25806 0.442857 -1.74194
TlBeCl3 1.0 3.77778 0.248619 2.77778
BaTeO3 2.0 1.65979 0.692857 -0.34021
AgCdBr3 1.0 1.34737 0.484694 0.34737
AgVO3 1.0 2.37037 0.385714 1.37037
... ... ... ... ...
EuCrO3 3.0 1.80645 0.442857 -1.19355
YAlO3 3.0 2.00000 0.385714 -1.00000
LuFeO3 3.0 1.60938 0.457143 -1.39062
YGaO3 3.0 1.74194 0.442857 -1.25806
TlMnCl3 1.0 2.04819 0.458564 1.04819
nA*rA_rB_ratio () nA - rB_rX_ratio () nA*rB_rX_ratio () \
material
FeGaO3 3.77418 -2.557143 1.328571
TlBeCl3 3.77778 -0.751381 0.248619
BaTeO3 3.31958 -1.307143 1.385714
AgCdBr3 1.34737 -0.515306 0.484694
AgVO3 2.37037 -0.614286 0.385714
... ... ... ...
EuCrO3 5.41935 -2.557143 1.328571
YAlO3 6.00000 -2.614286 1.157142
LuFeO3 4.82814 -2.542857 1.371429
YGaO3 5.22582 -2.557143 1.328571
TlMnCl3 2.04819 -0.541436 0.458564
rA_rB_ratio - rB_rX_ratio () rA_rB_ratio*rB_rX_ratio () \
material
FeGaO3 -0.815203 0.557141
TlBeCl3 -3.529161 0.939228
BaTeO3 -0.966933 1.149997
AgCdBr3 -0.862676 0.653062
AgVO3 -1.984656 0.914285
... ... ...
EuCrO3 -1.363593 0.799999
YAlO3 -1.614286 0.771428
LuFeO3 -1.152237 0.735717
YGaO3 -1.299083 0.771430
TlMnCl3 -1.589626 0.939226
-nA*rA_rB_ratio + nA - rA_rB_ratio () ... 1.0/rB_rX_ratio () \
material ...
FeGaO3 5.51612 ... 3.573874
TlBeCl3 1.00000 ... 57.403584
BaTeO3 3.65979 ... 3.976149
AgCdBr3 1.00000 ... 3.745468
AgVO3 1.00000 ... 14.566891
... ... ... ...
EuCrO3 6.61290 ... 7.368658
YAlO3 7.00000 ... 10.370378
LuFeO3 6.21876 ... 5.665851
YGaO3 6.48388 ... 6.851771
TlMnCl3 1.00000 ... 9.148303
1.0*rB_rX_ratio () 1.0*rA_rB_ratio/rB_rX_ratio**2 () \
material
FeGaO3 0.279808 1.25806
TlBeCl3 0.017421 3.77778
BaTeO3 0.251500 1.65979
AgCdBr3 0.266989 1.34737
AgVO3 0.068649 2.37037
... ... ...
EuCrO3 0.135710 1.80645
YAlO3 0.096429 2.00000
LuFeO3 0.176496 1.60938
YGaO3 0.145948 1.74194
TlMnCl3 0.109310 2.04819
1.0*rB_rX_ratio**2/rA_rB_ratio () \
material
FeGaO3 0.794875
TlBeCl3 0.264706
BaTeO3 0.602486
AgCdBr3 0.742187
AgVO3 0.421875
... ...
EuCrO3 0.553572
YAlO3 0.500000
LuFeO3 0.621357
YGaO3 0.574073
TlMnCl3 0.488236
1.0*rB_rX_ratio/(nA*rA_rB_ratio) () \
material
FeGaO3 1.056047
TlBeCl3 0.065811
BaTeO3 0.834873
AgCdBr3 0.359733
AgVO3 0.162723
... ...
EuCrO3 0.735460
YAlO3 0.578571
LuFeO3 0.852147
YGaO3 0.762696
TlMnCl3 0.223887
1.0*nA*rA_rB_ratio/rB_rX_ratio () \
material
FeGaO3 0.946927
TlBeCl3 15.195057
BaTeO3 1.197787
AgCdBr3 2.779836
AgVO3 6.145408
... ...
EuCrO3 1.359694
YAlO3 1.728396
LuFeO3 1.173506
YGaO3 1.311138
TlMnCl3 4.466530
1.0*rB_rX_ratio/rA_rB_ratio**2 () \
material
FeGaO3 0.442857
TlBeCl3 0.248619
BaTeO3 0.692857
AgCdBr3 0.484694
AgVO3 0.385714
... ...
EuCrO3 0.442857
YAlO3 0.385714
LuFeO3 0.457143
YGaO3 0.442857
TlMnCl3 0.458564
1.0*rA_rB_ratio**2/rB_rX_ratio () 1.0/rA_rB_ratio () \
material
FeGaO3 2.258065 0.155893
TlBeCl3 4.022219 0.016362
BaTeO3 1.443299 0.289224
AgCdBr3 2.063157 0.174361
AgVO3 2.592595 0.062765
... ... ...
EuCrO3 2.258065 0.108568
YAlO3 2.592595 0.074388
LuFeO3 2.187499 0.129851
YGaO3 2.258065 0.112588
TlMnCl3 2.180721 0.102667
1.0*rA_rB_ratio ()
material
FeGaO3 6.414670
TlBeCl3 61.117845
BaTeO3 3.457530
AgCdBr3 5.735240
AgVO3 15.932552
... ...
EuCrO3 9.210833
YAlO3 13.443093
LuFeO3 7.701130
YGaO3 8.881906
TlMnCl3 9.740255
[460 rows x 823 columns]
%% Cell type:code id: tags:
``` python
#count the number of features created
print('From %s primary features and %s operators, SISSO generated a feature space containing %s candidate features.' % (len(cols),len(ops),fs.all_df.shape[1]))
```
%% Output
From 3 primary features and 5 operators, SISSO generated a feature space containing 823 candidate features.
%% Cell type:markdown id: tags:
# Select the best candidate features
Next, the generated candidate features are selected in two steps. In the first step, they are ranked according to the number of materials $N$ that fall in overlapping regions of the stable and unstable domains, and only the top-ranked features are kept. The domain of a class (stable or unstable) is defined as the range between the minimum and maximum values of the feature within that class. The best candidate features are those with the lowest $N$. The length of the overlap domain, $S$, is used to rank features with similar $N$. $N$ and $S$ correspond to equations 2 and 3, respectively, in the original SISSO publication (Phys. Rev. Materials 2, 083802 (2018)).
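For a single (one-dimensional) feature, the ranking criterion described above can be sketched as follows. This is a simplified illustration, not the code used by the `sisso` package: `domain_overlap` is a hypothetical helper, the labels are assumed to be 1 for stable and -1 for unstable, the toy feature values are made up, and $S$ is returned as a plain, unnormalized length, which may differ from the exact definition in the publication.

```python
import numpy as np


def domain_overlap(feature, labels):
    """Return (N, S): materials inside the class-domain overlap and its length."""
    feature = np.asarray(feature, dtype=float)
    labels = np.asarray(labels)
    stable = feature[labels == 1]
    unstable = feature[labels == -1]
    # The overlap of the two [min, max] class domains
    lo = max(stable.min(), unstable.min())
    hi = min(stable.max(), unstable.max())
    if hi < lo:  # the domains do not overlap: perfect separation
        return 0, 0.0
    n_overlap = int(np.sum((feature >= lo) & (feature <= hi)))
    return n_overlap, float(hi - lo)


labels = np.array([1, 1, 1, -1, -1, -1])
candidates = {
    "feat_separating": np.array([0.1, 0.2, 0.3, 0.5, 0.6, 0.7]),
    "feat_mixing": np.array([0.1, 0.4, 0.6, 0.3, 0.5, 0.7]),
}

# Rank by N first and use S to break ties (tuples compare element-wise)
ranked = sorted(candidates, key=lambda name: domain_overlap(candidates[name], labels))
for name in ranked:
    print(name, domain_overlap(candidates[name], labels))
```

In this toy example the first feature separates the two classes completely ($N=0$), so it is ranked ahead of the second one, for which four materials fall inside the overlap region.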