" print('Estimated calculation time: {}min and {:.1f}s'.format(minutes, seconds), flush=True)\n",
" if time_estimate > 1000:\n",
" print('The calculation is estimated to take too long. Please change the settings to reduce the calculation time (e.g. reduce the SISSO rung, the number of operations or the number of features)')\n",
This tutorial shows how to find descriptive parameters (short formulas) that predict the crystal structure (here, rocksalt (RS) or zincblende (ZB)), using the example of octet binary compounds. It is based on the sure independence screening and sparsifying operator (SISSO) algorithm, which searches for an optimal descriptor by scanning huge feature spaces.
<div style="padding: 1ex; margin-top: 1ex; margin-bottom: 1ex; border-style: dotted; border-width: 1pt; border-color: blue; border-radius: 3px;">R. Ouyang, S. Curtarolo, E. Ahmetcik, M. Scheffler, L. M. Ghiringhelli: <span style="font-style: italic;">SISSO: a compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates</span>, Phys. Rev. Materials 2, 083802 (2018) <a href="https://journals.aps.org/prmaterials/abstract/10.1103/PhysRevMaterials.2.083802" target="_blank">[PDF]</a>.</div>
With the default settings, the method reproduces the results of:
<div style="padding: 1ex; margin-top: 1ex; margin-bottom: 1ex; border-style: dotted; border-width: 1pt; border-color: blue; border-radius: 3px;">L. M. Ghiringhelli, J. Vybiral, S. V. Levchenko, C. Draxl, M. Scheffler: <span style="font-style: italic;">Big Data of Materials Science: Critical Role of the Descriptor</span>, Phys. Rev. Lett. 114, 105503 (2015) <a href="http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.114.105503">[PDF]</a>.</div>
%% Cell type:markdown id: tags:
<details>
<summary>
<div style="padding: 1ex; margin-top: 1ex; margin-bottom: 1ex; border-style: dotted; border-width: 1pt; border-color: blue; border-radius: 3px;"><b>Explanation of the method (click to expand)</b></div></summary>
We present a tool for predicting the crystal structure of octet binary compounds by using a set of descriptive parameters (a descriptor) based on free-atom data of the atomic species constituting the binary material. We apply a recently developed method, the sure independence screening and sparsifying operator (SISSO), which makes it possible to find an optimal descriptor in a huge feature space containing billions of features. In this tutorial, an $\ell_0$-optimization is used as the sparsifying operator.
<div style="padding: 1ex; margin-top: 1ex; margin-bottom: 1ex; border-style: dotted; border-width: 1pt; border-color: blue; border-radius: 3px;">R. Ouyang, S. Curtarolo, E. Ahmetcik, M. Scheffler, L. M. Ghiringhelli: <span style="font-style: italic;">SISSO: a compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates</span>, Phys. Rev. Materials 2, 083802 (2018) <a href="https://journals.aps.org/prmaterials/abstract/10.1103/PhysRevMaterials.2.083802" target="_blank">[PDF]</a>. <br></div>
SISSO($\ell_0$) works iteratively. In the first iteration, the k features with the largest correlation (scalar product) with the property vector are collected. The feature with the largest correlation is the 1D descriptor. Next, a residual is constructed as the error made by the 1D model. A new set of k features is then selected as those with the largest correlation with this residual. The 2D descriptor is the pair of features that yields the smallest fitting error upon least-squares regression, among all possible pairs contained in the union of the sets selected in this and the first iteration. In each subsequent iteration, a new residual is constructed as the error made in the previous iteration, and a new set of k features is extracted as those with the largest correlation with that residual. The nD descriptor is the n-tuple of features that yields the smallest fitting error upon least-squares regression, among all possible n-tuples contained in the union of the sets obtained in the current and all previous iterations. If k=1, the method reduces to the so-called orthogonal matching pursuit.
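
The iteration described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation used by this notebook: the function names (`sis`, `l0_fit`, `sisso_l0`) and the synthetic data are hypothetical, and the sketch assumes that no feature column is constant and that at least `k * max_dim` features are available.

```python
from itertools import combinations

import numpy as np


def sis(features, target, k, excluded):
    """SIS step: indices of the k not-yet-selected features with the
    largest absolute scalar product with the target (or residual)."""
    scores = np.abs(features.T @ target)
    if excluded:
        scores[np.array(excluded, dtype=int)] = -np.inf
    return [int(i) for i in np.argsort(scores)[::-1][:k]]


def l0_fit(features, target, candidates, dim):
    """l0 step: least-squares fit of every dim-tuple of candidates,
    returning (rmse, tuple, residual) of the best one."""
    best = None
    for combo in combinations(candidates, dim):
        # Design matrix: selected feature columns plus an intercept column.
        X = np.column_stack([features[:, list(combo)], np.ones(len(target))])
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        residual = target - X @ coef
        rmse = float(np.sqrt(np.mean(residual ** 2)))
        if best is None or rmse < best[0]:
            best = (rmse, combo, residual)
    return best


def sisso_l0(features, target, k, max_dim):
    """Return a list of (feature index tuple, rmse), one entry per dimension."""
    # Standardize columns so that scalar products rank features by correlation.
    features = (features - features.mean(axis=0)) / features.std(axis=0)
    pool, residual, models = [], target - target.mean(), []
    for dim in range(1, max_dim + 1):
        pool += sis(features, residual, k, pool)   # grow the union of SIS sets
        rmse, combo, residual = l0_fit(features, target, pool, dim)
        models.append((combo, rmse))
    return models


# Synthetic (made-up) data: the target depends on columns 3 and 7 only.
rng = np.random.default_rng(0)
F = rng.normal(size=(60, 20))
y = 2.0 * F[:, 3] - F[:, 7]
models = sisso_l0(F, y, k=3, max_dim=2)
```

With `k=1`, each SIS step adds a single feature and the exhaustive search has only one n-tuple to fit, which is the orthogonal-matching-pursuit limit mentioned above.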
The prediction of the ground-state structure for binary compounds from a simple descriptor has a notable history in materials science [1-7], where descriptors were designed by chemically/physically-inspired intuition. The tool presented here allows for the machine-learning-aided automatic discovery of a descriptor and a model for the prediction of the difference in energy between a selected pair of structures for 82 octet binary materials.
References:
<ol>
<li>J. A. van Vechten, Phys. Rev. 182, 891 (1969).</li>
<li>J. C. Phillips, Rev. Mod. Phys. 42, 317 (1970).</li>
<li>J. St. John and A. N. Bloch, Phys. Rev. Lett. 33, 1095 (1974).</li>
<li>J. R. Chelikowsky and J. C. Phillips, Phys. Rev. B 17, 2453 (1978).</li>
<li>A. Zunger, Phys. Rev. B 22, 5839 (1980).</li>
<li>D. G. Pettifor, Solid State Commun. 51, 31 (1984).</li>
<li>Y. Saad, D. Gao, T. Ngo, S. Bobbitt, J. R. Chelikowsky, and W. Andreoni, Phys. Rev. B 85, 104104 (2012).</li>
</ol>
In this example, you can run a compressed-sensing-based algorithm to find the optimal descriptor and model that predict the difference in energy between crystal structures (here, zincblende vs. rocksalt).
The descriptor is selected out of a large number of candidates constructed as functions of basic input features, the primary features.
</details>
%% Cell type:markdown id: tags:
The idea demonstrated in this tutorial is to start from simple physical quantities ("primary features", here properties of the constituent free atoms such as orbital radii) and to generate millions (or billions) of candidate formulas by applying arithmetic operations that combine the primary features. These candidate formulas constitute the so-called "feature space". SISSO is then used to select only a few of these formulas that explain the data.
By clicking directly on "Run" below, you can reproduce the 2D map published in <a href="http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.114.10550" target="_blank">PRL 2015</a>. You can also select the primary features and allowed operations (using the check-boxes), as well as the SISSO rung (i.e., the number of iterations in the construction of the feature space), the number of features selected at each iteration of the SIS step, and the maximum number of dimensions of the model. Then press "Run". \
After the results are shown for all models, from one-dimensional up to the chosen maximum dimension, you can press "Plot interactive map" to reveal a map of the RS vs. ZB relative stability for the highest-dimensional model. If the highest-dimensional model is 2D, the separation line between the two phases (i.e., the locus where the predicted $\Delta$E is zero) is shown. For higher-dimensional models, the 3rd and 4th dimensions can be visualized via the size or the color of the data-point markers. Drop-down menus let you assign axes, markers, and colors to the descriptor components of your choice.
With the selection of "PRL2015" as the SISSO rung, a special feature space is loaded, which also allows the 1D and 3D models published in <a href="http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.114.10550" target="_blank">PRL 2015</a> to be reproduced. This is because <a href="http://journals.aps.org/prl/abstract/10.1103/PhysRevLett.114.10550" target="_blank">PRL 2015</a> adopted a slightly different criterion for constructing the feature set than <a href="https://journals.aps.org/prmaterials/abstract/10.1103/PhysRevMaterials.2.083802" target="_blank">PRM 2018</a>.
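
For a 2D linear model, the separation line mentioned above follows directly from setting the prediction to zero. The sketch below assumes a model of the form $\Delta E = c_0 + c_1 d_1 + c_2 d_2$ with $c_2 \neq 0$; the function names and the coefficient values are hypothetical and are not taken from the fitted model.

```python
import numpy as np


def predict_delta_e(d1, d2, c0, c1, c2):
    """Predicted energy difference of a 2D linear model."""
    return c0 + c1 * d1 + c2 * d2


def separation_line(d1, c0, c1, c2):
    """Value of d2 at which the predicted energy difference vanishes."""
    return -(c0 + c1 * d1) / c2


# Made-up coefficients, for illustration only.
c0, c1, c2 = 0.5, -2.0, 4.0
d1 = np.linspace(0.0, 1.0, 5)
d2 = separation_line(d1, c0, c1, c2)
```

Points on opposite sides of this line receive predictions of opposite sign, i.e. they are assigned to different phases.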
print('Estimated calculation time: {}min and {:.1f}s'.format(minutes, seconds), flush=True)
if time_estimate > 1000:
    print('The calculation is estimated to take too long. Please change the settings to reduce the calculation time (e.g. reduce the SISSO rung, the number of operations or the number of features)')
print("\nThe number of selected features per SIS iteration is bigger than the number of features available. Please reduce the number of selected features per SIS iteration (number of features generated / max number of dimensions) or increase the number of selected features and operations.")
except RuntimeError:
    print('The present selection does not lead to the creation of any derived features in the highest selected rung, please select at least one binary or power operator, or reduce the maximum rung')