Commit f968bd76 authored by Luigi Sbailo

Close all windows

parent 471487b6
......@@ -64,7 +64,9 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"heading_collapsed": true
},
"source": [
"# Import required modules"
]
......@@ -76,7 +78,8 @@
"ExecuteTime": {
"end_time": "2022-01-24T19:30:00.178630Z",
"start_time": "2022-01-24T19:30:00.172488Z"
}
},
"hidden": true
},
"outputs": [],
"source": [
......@@ -107,7 +110,9 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"heading_collapsed": true
},
"source": [
"# Get the data\n",
"The data consists of a list of 576 $ABX_3$ solids experimentally characterized at ambient conditions, classified as stable or unstable in the perovskite structure, together with the following features:\n",
......@@ -138,6 +143,7 @@
"end_time": "2022-01-24T19:30:00.677764Z",
"start_time": "2022-01-24T19:30:00.609313Z"
},
"hidden": true,
"scrolled": true
},
"outputs": [],
......@@ -159,7 +165,8 @@
"ExecuteTime": {
"end_time": "2022-01-24T19:30:00.834548Z",
"start_time": "2022-01-24T19:30:00.826359Z"
}
},
"hidden": true
},
"outputs": [],
"source": [
......@@ -175,6 +182,7 @@
"end_time": "2022-01-24T19:30:01.291495Z",
"start_time": "2022-01-24T19:30:01.187018Z"
},
"hidden": true,
"scrolled": true
},
"outputs": [],
......@@ -208,7 +216,8 @@
"ExecuteTime": {
"end_time": "2022-01-24T19:30:01.493453Z",
"start_time": "2022-01-24T19:30:01.482670Z"
}
},
"hidden": true
},
"outputs": [],
"source": [
......@@ -219,7 +228,9 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"heading_collapsed": true
},
"source": [
"# Generate the candidate feature space from the primary features and operators\n",
"The two sets of elements needed to create the feature space with SISSO are the features to be used (i.e. the primary features) and the set of mathematical operators to be applied. Another input from the user is the number of times the operators are applied, the so-called rung (max_phi). "
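As a rough illustration of how primary features, operators and the rung combine, the sketch below builds a small candidate space in plain Python. The operator set, the function name `build_feature_space` and the example radii are assumptions made for illustration; the SISSO code used in the notebook applies further checks that are not reproduced here, and this sketch only combines each unordered pair of features once.

```python
import itertools
import math

# Hypothetical operator sets; the notebook's actual operator list is not shown here.
UNARY_OPS = {"sq": lambda x: x * x, "sqrt": math.sqrt}
BINARY_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def build_feature_space(primary, rung):
    """Apply every operator `rung` times, starting from the primary features.

    `primary` maps a feature name to a list of values (one per material).
    """
    space = dict(primary)
    for _ in range(rung):
        current = list(space.items())  # snapshot: new features feed the next rung
        for name, values in current:
            for op_name, op in UNARY_OPS.items():
                try:
                    new = [op(v) for v in values]
                except (ValueError, ZeroDivisionError):
                    continue  # skip features undefined for some material
                space.setdefault(f"{op_name}({name})", new)
        for (n1, v1), (n2, v2) in itertools.combinations(current, 2):
            for op_name, op in BINARY_OPS.items():
                try:
                    new = [op(a, b) for a, b in zip(v1, v2)]
                except ZeroDivisionError:
                    continue
                space.setdefault(f"({n1}{op_name}{n2})", new)
    return space


primary = {"rA": [1.44, 1.61], "rB": [0.605, 0.72]}  # illustrative values only
rung1 = build_feature_space(primary, rung=1)
rung2 = build_feature_space(primary, rung=2)
print(len(rung1), len(rung2))
```

The number of candidates grows quickly with the rung, which is why the selection step that follows is needed.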
......@@ -233,6 +244,7 @@
"end_time": "2022-01-24T19:30:02.153584Z",
"start_time": "2022-01-24T19:30:02.081199Z"
},
"hidden": true,
"scrolled": true
},
"outputs": [],
......@@ -265,7 +277,8 @@
"ExecuteTime": {
"end_time": "2022-01-24T19:30:02.661295Z",
"start_time": "2022-01-24T19:30:02.655031Z"
}
},
"hidden": true
},
"outputs": [],
"source": [
......@@ -314,7 +327,8 @@
"ExecuteTime": {
"end_time": "2022-01-24T19:30:03.044852Z",
"start_time": "2022-01-24T19:30:03.039471Z"
}
},
"hidden": true
},
"outputs": [],
"source": [
......@@ -348,7 +362,8 @@
"ExecuteTime": {
"end_time": "2022-01-24T19:30:03.562527Z",
"start_time": "2022-01-24T19:30:03.545410Z"
}
},
"hidden": true
},
"outputs": [],
"source": [
......@@ -402,7 +417,9 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"heading_collapsed": true
},
"source": [
"# Select the best candidate features and classification thresholds\n",
"Next, the generated candidate features are selected in two steps. In a first step (SIS), they are ranked according to the number of materials $N$ that fall in overlapping regions of stable and unstable domains and only the top-ranked features are kept. The domain is defined as the range between the maximum and minimum values of the feature for each of the classes (stable and unstable). The best candidate features are those with the lowest $N$. The length of the overlap domain, $S$, is used to rank features presenting the same $N$. $N$ and $S$ correspond to equations 2 and 3, respectively, in Phys. Rev. Materials 2, 083802 (2018)."
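The overlap ranking described above can be sketched as follows. This is a minimal reading of the text, not the notebook's implementation: the function name `overlap_score` is hypothetical, and the exact definitions of $N$ and $S$ in the cited paper may differ in detail (for example in how boundary points are counted).

```python
def overlap_score(values, labels):
    """Return (N, S) for one candidate feature.

    N: number of materials whose feature value lies inside the region where
       the stable and unstable value ranges overlap.
    S: length of that overlap region.
    `labels` holds True for stable and False for unstable materials.
    """
    stable = [v for v, y in zip(values, labels) if y]
    unstable = [v for v, y in zip(values, labels) if not y]
    lower = max(min(stable), min(unstable))
    upper = min(max(stable), max(unstable))
    if lower > upper:  # the two ranges do not overlap at all
        return 0, 0.0
    n_overlap = sum(lower <= v <= upper for v in values)
    return n_overlap, float(upper - lower)


labels = [True, True, True, False, False, False]
separated = overlap_score([1, 2, 3, 4, 5, 6], labels)
overlapping = overlap_score([1, 2, 4, 3, 5, 6], labels)
# Rank features by N first and use S to break ties (lower is better).
ranking = sorted([("f_overlapping", overlapping), ("f_separated", separated)],
                 key=lambda item: item[1])
print(ranking)
```

Sorting the `(N, S)` tuples directly gives the tie-breaking behaviour described in the text, because Python compares tuples element by element.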
......@@ -415,7 +432,8 @@
"ExecuteTime": {
"end_time": "2022-01-24T19:30:14.311889Z",
"start_time": "2022-01-24T19:30:09.485916Z"
}
},
"hidden": true
},
"outputs": [],
"source": [
......@@ -434,7 +452,8 @@
"ExecuteTime": {
"end_time": "2022-01-24T19:30:14.616975Z",
"start_time": "2022-01-24T19:30:14.313796Z"
}
},
"hidden": true
},
"outputs": [],
"source": [
......@@ -450,7 +469,9 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"hidden": true
},
"source": [
"In a second step (SO), classification trees are used to choose the best candidate feature among those selected by the overlaps (above). For each of the selected candidate features, a classification tree is trained, providing a threshold for the classification and its accuracy. The selected candidate features get ranked based on their accuracy. "
]
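To make the role of the classification tree concrete, the sketch below fits a single-threshold classifier by exhaustive search. It is a hand-rolled stand-in that maximizes accuracy directly, whereas a library decision tree would typically choose the split by an impurity criterion; the function name `fit_stump` is hypothetical.

```python
def fit_stump(values, labels):
    """Find the single threshold that best separates stable from unstable.

    Returns (threshold, stable_below, accuracy), where `stable_below` says
    whether values below the threshold are predicted stable.
    """
    unique = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(unique, unique[1:])]
    best = (None, True, 0.0)
    for thr in candidates:
        for stable_below in (True, False):
            correct = sum(
                ((v < thr) == stable_below) == bool(y)
                for v, y in zip(values, labels)
            )
            accuracy = correct / len(values)
            if accuracy > best[2]:
                best = (thr, stable_below, accuracy)
    return best


threshold, stable_below, accuracy = fit_stump([1, 2, 3, 4], [1, 1, 0, 0])
print(threshold, stable_below, accuracy)
```

Running this for each feature kept by the overlap step and sorting by accuracy reproduces the ranking idea described above.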
......@@ -463,6 +484,7 @@
"end_time": "2022-01-24T19:30:19.140067Z",
"start_time": "2022-01-24T19:30:15.949001Z"
},
"hidden": true,
"scrolled": true
},
"outputs": [],
......@@ -488,7 +510,9 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"hidden": true
},
"source": [
"The top-ranked candidate feature corresponds to the SISSO-derived tolerance factor. We call this descriptor $t_{sisso}$. "
]
......@@ -500,7 +524,8 @@
"ExecuteTime": {
"end_time": "2022-01-24T19:30:26.768946Z",
"start_time": "2022-01-24T19:30:26.764986Z"
}
},
"hidden": true
},
"outputs": [],
"source": [
......@@ -513,7 +538,9 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"hidden": true
},
"source": [
"$t_{sisso}$ can now be evaluated for all the materials, including those in the test set. For comparison, we also evaluate the Goldschmidt factor ($t$) and the $\\tau$ descriptor found in Sci. Adv. 5, eaav0693 (2019)."
]
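For reference, the two comparison descriptors can be written down directly. The sketch below uses the standard Goldschmidt expression and the form of $\tau$ given in the cited Sci. Adv. paper; the function names are hypothetical, and the SrTiO$_3$ radii are illustrative Shannon values rather than data taken from the notebook.

```python
import math


def goldschmidt_t(r_a, r_b, r_x):
    """Goldschmidt tolerance factor t = (r_A + r_X) / (sqrt(2) * (r_B + r_X))."""
    return (r_a + r_x) / (math.sqrt(2) * (r_b + r_x))


def tau(r_a, r_b, r_x, n_a):
    """tau = r_X/r_B - n_A * (n_A - (r_A/r_B) / ln(r_A/r_B)).

    n_a is the oxidation state of the A cation; r_a must differ from r_b,
    otherwise the logarithm in the denominator is zero.
    """
    ratio = r_a / r_b
    return r_x / r_b - n_a * (n_a - ratio / math.log(ratio))


# Illustrative radii (in angstrom) for SrTiO3: Sr2+, Ti4+, O2-.
t_value = goldschmidt_t(r_a=1.44, r_b=0.605, r_x=1.40)
tau_value = tau(r_a=1.44, r_b=0.605, r_x=1.40, n_a=2)
print(round(t_value, 3), round(tau_value, 3))
```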
......@@ -526,6 +553,7 @@
"end_time": "2022-01-24T19:30:28.932963Z",
"start_time": "2022-01-24T19:30:28.861892Z"
},
"hidden": true,
"scrolled": true
},
"outputs": [],
......@@ -555,7 +583,9 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"heading_collapsed": true
},
"source": [
"# Evaluate the descriptor performance\n",
"The accuracy of the classification tree for $t_{sisso}$ can now be evaluated for train and test sets. For the classification tree, a maximum depth of one is used here. "
......@@ -568,7 +598,8 @@
"ExecuteTime": {
"end_time": "2022-01-24T19:30:30.511142Z",
"start_time": "2022-01-24T19:30:30.349614Z"
}
},
"hidden": true
},
"outputs": [],
"source": [
......@@ -593,7 +624,8 @@
"ExecuteTime": {
"end_time": "2022-01-24T19:30:31.022923Z",
"start_time": "2022-01-24T19:30:31.012377Z"
}
},
"hidden": true
},
"outputs": [],
"source": [
......@@ -607,7 +639,9 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"hidden": true
},
"source": [
"The accuracy of the classification tree for $t$ can also be evaluated. In order to mimic the original calibration of $t$ performed by Goldschmidt, we adopt here a maximum depth of two for the classification tree, corresponding to a classification based on two thresholds."
]
......@@ -619,7 +653,8 @@
"ExecuteTime": {
"end_time": "2022-01-24T19:30:32.671849Z",
"start_time": "2022-01-24T19:30:32.445127Z"
}
},
"hidden": true
},
"outputs": [],
"source": [
......@@ -642,7 +677,8 @@
"ExecuteTime": {
"end_time": "2022-01-24T19:30:33.129720Z",
"start_time": "2022-01-24T19:30:33.121887Z"
}
},
"hidden": true
},
"outputs": [],
"source": [
......@@ -656,7 +692,9 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"hidden": true
},
"source": [
"Platt-scaled classification probabilities $P(t_{sisso})$ are also computed based on the $t_{sisso}$ values and the labels. Such probabilities indicate the likelihood that a material is stable in the perovskite structure (as opposed to the stable vs. non-stable classification provided by the threshold alone)."
]
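Platt scaling amounts to fitting a sigmoid to the descriptor values so that they map onto probabilities. The sketch below fits $P = 1/(1+e^{a s + b})$ by plain gradient descent on the log loss; it is an assumption-laden stand-in (the notebook's own calibration routine is not shown here, and Platt's original method additionally smooths the target labels). The function names and toy data are hypothetical.

```python
import math


def fit_platt(scores, labels, lr=0.1, n_iter=5000):
    """Fit P(stable | s) = 1 / (1 + exp(a * s + b)) by gradient descent."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(a * s + b))
            # For this parametrization, d(log loss)/d(a*s + b) = y - p.
            grad_a += (y - p) * s
            grad_b += (y - p)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def platt_probability(score, a, b):
    """Probability of the positive (stable) class for one descriptor value."""
    return 1.0 / (1.0 + math.exp(a * score + b))


# Toy data: in this made-up example lower descriptor values tend to be stable (label 1).
scores = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
labels = [1, 1, 0, 1, 0, 0]
a, b = fit_platt(scores, labels)
probs = [platt_probability(s, a, b) for s in scores]
print([round(p, 2) for p in probs])
```

Unlike the hard threshold, the resulting probabilities change smoothly with the descriptor, so materials close to the decision boundary receive values near 0.5.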
......@@ -668,7 +706,8 @@
"ExecuteTime": {
"end_time": "2022-01-24T19:30:34.287862Z",
"start_time": "2022-01-24T19:30:34.253551Z"
}
},
"hidden": true
},
"outputs": [],
"source": [
......@@ -690,7 +729,8 @@
"ExecuteTime": {
"end_time": "2022-01-24T19:30:35.101804Z",
"start_time": "2022-01-24T19:30:34.829472Z"
}
},
"hidden": true
},
"outputs": [],
"source": [
......@@ -709,7 +749,9 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"hidden": true
},
"source": [
"Finally, the performance of $t_{sisso}$ and $t$ can be compared by plotting $t_{sisso}$ vs. $t$ and $P(t_{sisso})$ vs. $t$. In these plots, each datapoint is colored according to the (true) experimental label (i.e. whether a material is a perovskite (blue) or not (red)) for both train and test sets. "
]
......@@ -721,7 +763,8 @@
"ExecuteTime": {
"end_time": "2022-01-24T19:30:36.332172Z",
"start_time": "2022-01-24T19:30:36.034838Z"
}
},
"hidden": true
},
"outputs": [],
"source": [
......@@ -744,7 +787,8 @@
"ExecuteTime": {
"end_time": "2022-01-24T19:30:36.848603Z",
"start_time": "2022-01-24T19:30:36.564951Z"
}
},
"hidden": true
},
"outputs": [],
"source": [
......@@ -760,7 +804,9 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"heading_collapsed": true
},
"source": [
"# Predict stability of unseen materials (exploitation) using $t_{sisso}$\n",
"With $t_{sisso}$ in hand, one can explore large composition spaces in search of (new) compositions which are likely to form perovskites. We start by creating a list of candidate materials to be tested. "
......@@ -773,7 +819,8 @@
"ExecuteTime": {
"end_time": "2022-01-24T19:30:38.161952Z",
"start_time": "2022-01-24T19:30:38.156437Z"
}
},
"hidden": true
},
"outputs": [],
"source": [
......@@ -786,7 +833,9 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"hidden": true
},
"source": [
"For each of the compositions, one has to predict which cations are at the $A$ and $B$-sites of the perovskite structure (based on their relative sizes) and assign oxidation states for each cation (based on their periodic table group and common oxidation states). Oxidation states are not only needed for the evaluation of the tolerance factors, but also for the determination of the Shannon ionic radii, which are themselves oxidation-state dependent. Furthermore, based on oxidation states, one can assess if a given composition is charge-balanced or not. In order to perform these tasks, we use PredictABX3.py, provided as SI of Sci. Adv. 5, eaav0693 (2019). From the list of given compositions, we determine A and B and exclude the compounds that cannot be charge-balanced. "
]
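The charge-balance criterion can be illustrated with a few lines of Python. This is only a sketch of the idea, not the logic of PredictABX3.py: the function name and the small oxidation-state table are hypothetical, and the assignment of cations to the $A$ and $B$ sites is taken as given.

```python
import itertools

# Hypothetical table of allowed oxidation states, for illustration only.
OXIDATION_STATES = {
    "Sr": [2],
    "Ti": [2, 3, 4],
    "Na": [1],
    "O": [-2],
    "Cl": [-1],
}


def charge_balanced_states(a, b, x, table=OXIDATION_STATES):
    """List all (n_A, n_B, n_X) combinations with n_A + n_B + 3 * n_X == 0."""
    return [
        (n_a, n_b, n_x)
        for n_a, n_b, n_x in itertools.product(table[a], table[b], table[x])
        if n_a + n_b + 3 * n_x == 0
    ]


srtio3 = charge_balanced_states("Sr", "Ti", "O")
naticl3 = charge_balanced_states("Na", "Ti", "Cl")
srsro3 = charge_balanced_states("Sr", "Sr", "O")
print(srtio3, naticl3, srsro3)
```

A composition with an empty result, such as the last one, would be excluded from the candidate list.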
......@@ -798,7 +847,8 @@
"ExecuteTime": {
"end_time": "2022-01-24T19:30:39.179393Z",
"start_time": "2022-01-24T19:30:39.036361Z"
}
},
"hidden": true
},
"outputs": [],
"source": [
......@@ -815,7 +865,9 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"hidden": true
},
"source": [
"We then collect all the input features for each of the given materials (including the DFT-calculated atomic feature) and evaluate $t_{sisso}$, $t$ and $\\tau$ for each of the materials. "
]
......@@ -827,7 +879,8 @@
"ExecuteTime": {
"end_time": "2022-01-24T19:30:41.245876Z",
"start_time": "2022-01-24T19:30:40.045816Z"
}
},
"hidden": true
},
"outputs": [],
"source": [
......@@ -874,7 +927,8 @@
"ExecuteTime": {
"end_time": "2022-01-24T19:30:41.307104Z",
"start_time": "2022-01-24T19:30:41.300785Z"
}
},
"hidden": true
},
"outputs": [],
"source": [
......@@ -885,7 +939,9 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"hidden": true
},
"source": [
"Using the established threshold, one can now determine which materials are predicted to be stable in the perovskite structure according to the tolerance factor identified by SISSO."
]
......@@ -897,7 +953,8 @@
"ExecuteTime": {
"end_time": "2022-01-24T19:30:42.627971Z",
"start_time": "2022-01-24T19:30:42.621032Z"
}
},
"hidden": true
},
"outputs": [],
"source": [
......@@ -912,7 +969,9 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"hidden": true
},
"source": [
"We can also compare the results obtained by means of $t_{sisso}$ with those coming from $t$ and $\\tau$. In the $P(t_{sisso})$ vs. $t$ plot below, the datapoints are colored according to their stability predicted by $\\tau$."
]
......@@ -925,6 +984,7 @@
"end_time": "2022-01-24T19:30:43.776426Z",
"start_time": "2022-01-24T19:30:43.709174Z"
},
"hidden": true,
"scrolled": true
},
"outputs": [],
......@@ -944,7 +1004,8 @@
"ExecuteTime": {
"end_time": "2022-01-24T19:30:44.476499Z",
"start_time": "2022-01-24T19:30:44.221569Z"
}
},
"hidden": true
},
"outputs": [],
"source": [
......@@ -966,7 +1027,8 @@
"ExecuteTime": {
"end_time": "2022-01-24T19:21:25.843492Z",
"start_time": "2022-01-24T19:21:25.839935Z"
}
},
"heading_collapsed": true
},
"source": [
"# $\\tau$ calculator: <br> Predicting the stability as perovskite of user-selected materials\n",
......@@ -980,7 +1042,8 @@
"ExecuteTime": {
"end_time": "2022-01-24T19:30:47.853594Z",
"start_time": "2022-01-24T19:30:47.795534Z"
}
},
"hidden": true
},
"outputs": [],
"source": [
......@@ -1029,7 +1092,9 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"hidden": true
},
"source": [
"<b>Platt-scaled classification probabilities</b> <br>\n",
"Here, for the user-selected material, Platt-scaled classification probabilities $P(t_{sisso})$ are plotted based on\n",
......@@ -1044,6 +1109,7 @@
"end_time": "2022-01-24T19:30:49.803156Z",
"start_time": "2022-01-24T19:30:49.405573Z"
},
"hidden": true,
"scrolled": false
},
"outputs": [],
......@@ -1161,7 +1227,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"hidden": true
},
"outputs": [],
"source": []
}
......@@ -1182,7 +1250,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
"version": "3.7.3"
}
},
"nbformat": 4,
......