{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-09-17T14:26:04.070620Z",
     "start_time": "2020-09-17T14:26:04.062967Z"
    }
   },
   "source": [
    "<div id=\"teaser\" style=' background-position:  right center; background-size: 00px; background-repeat: no-repeat; \n",
    "    padding-top: 20px;\n",
    "    padding-right: 10px;\n",
    "    padding-bottom: 170px;\n",
    "    padding-left: 10px;\n",
    "    border-bottom: 14px double #333;\n",
    "    border-top: 14px double #333;' > \n",
    "\n",
    "   \n",
    "   <div style=\"text-align:center\">\n",
    "    <b><font size=\"6.4\">Discovery of new topological insulators in alloyed tetradymites\n",
    "        </font></b>    \n",
    "  </div>\n",
    "\n",
    "<p>\n",
    "created by: Luigi Sbailò, Thomas A. R. Purcell, Luca M. Ghiringhelli, and Matthias Scheffler   \n",
    "   \n",
    "Fritz Haber Institute of the Max Planck Society, Faradayweg 4-6, D-14195 Berlin, Germany <br>\n",
    "<span class=\"nomad--last-updated\" data-version=\"v1.0.0\">[Last updated: Sep 29, 2020]</span>\n",
    "  \n",
    "<div> \n",
    "<img  style=\"float: left;\" src=\"assets/tetradymite_PRM2020/Logo_MPG.png\" width=\"200\"> \n",
    "<img  style=\"float: right;\" src=\"assets/tetradymite_PRM2020/Logo_NOMAD.png\" width=\"250\">\n",
    "</div>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Introduction\n",
    "This tutorial shows how to find descriptive parameters (short formulas) that predict whether alloyed materials are topological or trivial insulators, using alloyed tetradymites as the example. It is based on the sure independence screening and sparsifying operator (SISSO) algorithm, which searches for an optimal descriptor by scanning huge feature spaces.\n",
    "\n",
    "<div style=\"padding: 1ex; margin-top: 1ex; margin-bottom: 1ex; border-style: dotted; border-width: 1pt; border-color: blue; border-radius: 3px;\">R. Ouyang, S. Curtarolo, E. Ahmetcik, M. Scheffler, L. M. Ghiringhelli: <span style=\"font-style: italic;\">SISSO: a compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates</span>, Phys. Rev. Materials  2, 083802 (2018) <a href=\"https://journals.aps.org/prmaterials/abstract/10.1103/PhysRevMaterials.2.083802\" target=\"_blank\">[PDF]</a>.</div>\n",
    "\n",
    "With the default settings, the notebook reproduces the results of:\n",
    "\n",
    "<div style=\"padding: 1ex; margin-top: 1ex; margin-bottom: 1ex; border-style: dotted; border-width: 1pt; border-color: blue; border-radius: 3px;\">G. Cao, R. Ouyang, L. M. Ghiringhelli, M. Scheffler, H. Liu, C. Carbogno, and Z. Zhang: <span style=\"font-style: italic;\">Artificial intelligence for high-throughput discovery of topological insulators: The example of alloyed tetradymites</span>,  Phys. Rev. Materials 4, 034204 (2020) <a href=\"https://journals.aps.org/prmaterials/abstract/10.1103/PhysRevMaterials.4.034204\" target=\"_blank\">[PDF]</a>.</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<details>\n",
    "    <summary>\n",
    "        <div style=\"padding: 1ex; margin-top: 1ex; margin-bottom: 1ex; border-style: dotted; border-width: 1pt; border-color: blue; border-radius: 3px;\"><b>Explanation of the method (click to expand/collapse)</b></div></summary>\n",
    "\n",
    "We present a tool that predicts whether alloyed tetradymites are topological or trivial insulators, using a set of descriptive parameters (a descriptor) built from free-atom data of the atomic species in the $AB-LMN$ materials, where $A,B \\in \\{ \\textrm{As, Sb, Bi} \\}$ and $L,M,N \\in \\{ \\textrm{S, Se, Te} \\}$. We apply a recently developed method, sure independence screening and sparsifying operator (SISSO), which finds an optimal descriptor in a huge feature space containing billions of features. In this tutorial, an $\\ell_0$-optimization is used as the sparsifying operator.\n",
    "The method is described in:\n",
    "               \n",
    "<div style=\"padding: 1ex; margin-top: 1ex; margin-bottom: 1ex; border-style: dotted; border-width: 1pt; border-color: blue; border-radius: 3px;\">\n",
    "R. Ouyang, S. Curtarolo, E. Ahmetcik, M. Scheffler, L. M. Ghiringhelli: <span style=\"font-style: italic;\">SISSO: a compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates</span>, Phys. Rev. Materials  2, 083802 (2018) <a href=\"https://journals.aps.org/prmaterials/abstract/10.1103/PhysRevMaterials.2.083802\" target=\"_blank\">[PDF]</a>. <br> </div>\n",
    "               \n",
    "In this tutorial, we focus on the classification flavor of SISSO($\\ell_0$). \n",
    "In the space of descriptors, each category’s domain (here, topological vs trivial insulator) is approximated as\n",
    "the region of space within the convex hull of the corresponding training data. SISSO finds the low-dimensional descriptor yielding the minimum overlap between these convex regions. In practice, the algorithm is iterative. At the first iteration, the SIS step selects the $k$ features that yield the smallest overlap when convex regions (in 1D, segments encompassing all the data of one category) are constructed over the training data. The feature giving the smallest overlap is already the 1D model. At each subsequent iteration $i$, the SIS step selects $k$ new features that do the same for the training points that fell in the overlap regions at the previous steps (i.e., the residuals). Then, in the SO step, all $i$-tuples that can be formed from the features selected in all SIS steps so far are ranked by the size of their overlap. The $i$-tuple with the smallest overlap is the $i$D model. \n",
    "\n",
    "To obtain a model that can classify unseen data points, at each dimension (iteration) a soft-margin support-vector machine <a href=\"https://link.springer.com/article/10.1007/BF00994018\" target=\"_blank\">[C. Cortes & V. Vapnik, Machine Learning 20, 273 (1995)]</a> is trained to define the separating hyperplanes. The resulting model is identified by the coefficients and intercept of the hyperplanes.\n",
    "               \n",
    "</details>"
   ]
  },
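The 1D ranking behind the SIS step can be illustrated with a minimal, self-contained sketch. It assumes two classes and 1D features. For such features, the convex hull of each class is just the segment [min, max] of its values. Features are ranked by the number of training points inside the overlap segment, with the overlap length as a tie-breaker. That tie-breaking rule is an assumption of this sketch and not necessarily what SISSO++ does. The function names `overlap_1d` and `sis_select` are hypothetical.

```python
def overlap_1d(values, labels):
    """Return (points in overlap, overlap length) for a two-class 1D feature.

    In 1D, the convex hull of each class is the segment [min, max] of its values.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    lows, highs = [], []
    for c in classes:
        vals = [v for v, lab in zip(values, labels) if lab == c]
        lows.append(min(vals))
        highs.append(max(vals))
    start, end = max(lows), min(highs)
    if start > end:  # the two segments do not overlap
        return 0, 0.0
    n_inside = sum(1 for v in values if start <= v <= end)
    return n_inside, end - start


def sis_select(features, labels, k):
    """Keep the k features whose class segments overlap the least (SIS-like step)."""
    ranked = sorted(features, key=lambda name: overlap_1d(features[name], labels))
    return ranked[:k]


labels = [1, 1, 1, 0, 0, 0]  # toy labels: 1 = topological, 0 = trivial
features = {
    "f_good": [1.0, 1.2, 1.4, 2.0, 2.2, 2.4],  # segments are disjoint
    "f_mid": [1.0, 1.2, 2.1, 2.0, 2.2, 2.4],   # small overlap [2.0, 2.1]
    "f_bad": [1.0, 2.0, 3.0, 1.5, 2.5, 3.5],   # large overlap [1.5, 3.0]
}
print(sis_select(features, labels, k=2))  # ['f_good', 'f_mid']
```

At higher dimensions, the same counting would be repeated only on the residual points, and the overlap would be computed between convex polygons or polytopes rather than segments.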
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-09-30T09:55:37.970299Z",
     "start_time": "2020-09-30T09:55:37.934551Z"
    }
   },
   "source": [
    "The idea demonstrated in this tutorial is to start from simple physical quantities (\"primary features\", here properties of the constituent free atoms such as the Pauling electronegativity) and to generate millions (or billions) of candidate formulas by recursively applying arithmetic operations to the primary features. These candidate formulas constitute the so-called \"feature space\". SISSO is then used to select the few formulas that best explain the data.\n",
    "\n",
    "By clicking directly on \"Run\" below, i.e., with the default selection, you can reproduce the 2D map published in <a href=\"https://journals.aps.org/prmaterials/abstract/10.1103/PhysRevMaterials.4.034204\" target=\"_blank\">PRM 2020</a>. You can also select primary features and allowed operations (by clicking the check-boxes), as well as the SISSO rung (i.e., the number of iterations in the construction of the feature space), the number of features selected at each iteration of the SIS step, and the maximum number of dimensions of the model. The materials considered here have up to 5 different atomic species in the unit cell, with the prototype formula $AB-LMN$, where the cations $A,B \\in \\{ \\textrm{As, Sb, Bi} \\}$ and the anions $L,M,N \\in \\{ \\textrm{S, Se, Te} \\}$. We have therefore grouped the selectable features into cation and anion features. Selecting, e.g., a cation feature adds it to the primary feature set for both the $A$ and the $B$ element, but each of the two is treated separately in the feature construction and in the SISSO optimization. After selecting the features and the other settings, press \"Run\". \\\n",
    "Once the results are shown for all models, from one-dimensional up to the maximum chosen dimension, you can press \"Plot interactive map\" to display a map of topological vs trivial tetradymite insulators for the highest-dimensional model. If the highest-dimensional model is 2D, the support-vector-machine separation line between the two phases is shown. For higher-dimensional models, the 3rd and 4th dimensions can be visualized via the size or the color of the data-point markers. Drop-down menus let you assign axes, marker sizes, and colors to the descriptor components of your choice.\n",
    "\n",
    "When \"PRM2020\" (the default) is selected as SISSO rung, a precomputed feature space is loaded that contains far fewer features than the production calculation used in <a href=\"https://journals.aps.org/prmaterials/abstract/10.1103/PhysRevMaterials.4.034204\" target=\"_blank\">PRM 2020</a>. This makes it possible to reproduce the same result in the notebook in a reasonable time. Still, the provided feature space contains thousands of the top-ranked features, among which SISSO finds the best nD model. "
   ]
  },
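One rung of the feature-space construction can be sketched in plain Python. This is a simplified illustration and not the SISSO++ implementation. It assumes that each feature is a list of values over the training materials. It applies a few of the notebook's operator names (`sq`, `sqrt`, `inv`, `add`, `abs_diff`, `div`) to every feature or pair of features and discards any candidate with an undefined value. SISSO++ additionally tracks units and removes redundant features, which this sketch omits. The helper names are hypothetical.

```python
import itertools
import math

# Operator name -> (function on one or two numbers, expression template).
# A function returns None when the result is undefined for that input.
UNARY = {
    "sq": (lambda a: a * a, "({})^2"),
    "sqrt": (lambda a: math.sqrt(a) if a >= 0 else None, "sqrt({})"),
    "inv": (lambda a: 1.0 / a if a != 0 else None, "1/({})"),
}
BINARY = {
    "add": (lambda a, b: a + b, "({} + {})"),
    "abs_diff": (lambda a, b: abs(a - b), "|{} - {}|"),
    "div": (lambda a, b: a / b if b != 0 else None, "({} / {})"),
}
COMMUTATIVE = {"add", "abs_diff"}


def evaluate(fn, *columns):
    """Apply fn element-wise over the materials; None if any value is undefined."""
    out = [fn(*vals) for vals in zip(*columns)]
    return None if any(v is None for v in out) else out


def build_rung(features, unary_ops, binary_ops):
    """Return the input features plus all valid candidates one operation deeper."""
    new = dict(features)
    for name, col in features.items():
        for op in unary_ops:
            fn, template = UNARY[op]
            vals = evaluate(fn, col)
            if vals is not None:
                new[template.format(name)] = vals
    for (n1, c1), (n2, c2) in itertools.combinations(features.items(), 2):
        for op in binary_ops:
            fn, template = BINARY[op]
            orders = [(n1, c1, n2, c2)]
            if op not in COMMUTATIVE:  # x / y and y / x are different features
                orders.append((n2, c2, n1, c1))
            for na, ca, nb, cb in orders:
                vals = evaluate(fn, ca, cb)
                if vals is not None:
                    new[template.format(na, nb)] = vals
    return new


# Pauling electronegativities on sites A and B for three toy compounds
primary = {"x_A": [2.02, 2.05, 2.18], "x_B": [2.05, 2.02, 2.18]}
rung1 = build_rung(primary, ["sq", "sqrt", "inv"], ["add", "abs_diff", "div"])
print(len(rung1))  # 12: 2 primary + 6 unary + 4 binary
print(rung1["|x_A - x_B|"])
```

Applying `build_rung` to its own output would give the next rung. Because binary operators act on every pair of features, the number of candidates grows roughly quadratically per rung. This is why the rung is the main setting that controls the size of the feature space.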
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-09-15T12:06:55.215052Z",
     "start_time": "2021-09-15T12:06:55.209182Z"
    },
    "init_cell": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<script>\n",
       "    code_show=true; \n",
       "    function code_toggle() {\n",
       "        if (code_show)\n",
       "        {\n",
       "            $('div.input').hide();\n",
       "        } \n",
       "        else \n",
       "        {\n",
       "            $('div.input').show();\n",
       "        }\n",
       "        code_show = !code_show\n",
       "    } \n",
       "    $( document ).ready(code_toggle);\n",
       "    window.runCells(\"startup\");\n",
       "</script>\n",
       "The Python code for this notebook is by default hidden for easier reading.\n",
       "To toggle on/off the code, click <a href=\"javascript:code_toggle()\">here</a>.\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%%HTML\n",
    "<script>\n",
    "    code_show=true; \n",
    "    function code_toggle() {\n",
    "        if (code_show)\n",
    "        {\n",
    "            $('div.input').hide();\n",
    "        } \n",
    "        else \n",
    "        {\n",
    "            $('div.input').show();\n",
    "        }\n",
    "        code_show = !code_show\n",
    "    } \n",
    "    $( document ).ready(code_toggle);\n",
    "    window.runCells(\"startup\");\n",
    "</script>\n",
    "The Python code for this notebook is by default hidden for easier reading.\n",
    "To toggle on/off the code, click <a href=\"javascript:code_toggle()\">here</a>."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-09-15T12:06:55.220065Z",
     "start_time": "2021-09-15T12:06:55.216481Z"
    },
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "from sissopp import Inputs, FeatureSpace, SISSOClassifier, FeatureNode, Unit\n",
    "from sissopp.py_interface import read_csv\n",
    "from sissopp.py_interface.import_dataframe import get_unit\n",
    "from tetradymite_PRM2020.visualizer import Visualizer\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import os"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-06-21T15:58:13.011817Z",
     "start_time": "2021-06-21T15:58:01.884774Z"
    }
   },
   "outputs": [],
   "source": [
    "# The dataset is stored in the NOMAD Archive and can be accessed with this query.\n",
    "from nomad import client, config\n",
    "config.client.url = 'http://nomad-lab.eu/prod/rae/api'\n",
    "query = client.query_archive(query={\n",
    "    'dataset_id': ['BjT-NFK0QdOx81_z5TmyeQ']},\n",
    "                                  per_page=100,\n",
    ")\n",
    "print(query)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-09-15T12:06:55.233765Z",
     "start_time": "2021-09-15T12:06:55.221460Z"
    },
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "df_train = pd.read_pickle('./data/tetradymite_PRM2020/training_set')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-06-21T15:58:13.328332Z",
     "start_time": "2021-06-21T15:58:13.053778Z"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# This piece of code is not run at initialization.\n",
    "# It creates the .xyz files of the crystal structures that are visualized.\n",
    "\n",
    "path_structure = './data/tetradymite_PRM2020/structures/'\n",
    "try:\n",
    "    os.mkdir(path_structure)\n",
    "except OSError:\n",
    "    !rm ./data/tetradymite_PRM2020/structures/*\n",
    "compounds=df_train.index.to_list()\n",
    "scale_factor = 10**10  # converts NOMAD lengths from metres to Angstrom\n",
    "alist = []\n",
    "for compound in compounds:\n",
    "    for entry in range (1581):\n",
    "        labels = query[entry].section_run[0].section_system[-1].atom_labels\n",
    "        if (len(labels)>5):\n",
    "            continue\n",
    "        \n",
    "        labels_1 = str(labels[0])+'_'+str(labels[1])+'_'+str(labels[3])+'_'+str(labels[4])+'_'+str(labels[2])\n",
    "        labels_2 = str(labels[0])+'_'+str(labels[1])+'_'+str(labels[4])+'_'+str(labels[3])+'_'+str(labels[2])\n",
    "        labels_3 = str(labels[1])+'_'+str(labels[0])+'_'+str(labels[3])+'_'+str(labels[4])+'_'+str(labels[2])\n",
    "        labels_4 = str(labels[1])+'_'+str(labels[0])+'_'+str(labels[4])+'_'+str(labels[3])+'_'+str(labels[2])\n",
    "\n",
    "        if compound in list([labels_1, labels_2, labels_3, labels_4]):\n",
    "\n",
    "            n_atoms = len(labels)\n",
    "            system = query[entry].section_run[0].section_system[-1]\n",
    "            lat_x, lat_y, lat_z = system.lattice_vectors.magnitude * scale_factor\n",
    "            # 3x3x3 supercell: 27 periodic images of the unit cell, so the .xyz header must count n_atoms*27 atoms\n",
    "            with open(path_structure + str(compound) + \".xyz\", \"w\") as file:\n",
    "                file.write(\"%d\\n\\n\" % (n_atoms * 27))\n",
    "                for i in [0, 1, 2]:\n",
    "                    for j in [0, 1, 2]:\n",
    "                        for k in [0, 1, 2]:\n",
    "                            for n in range(n_atoms):\n",
    "                                el = system.atom_labels[n]\n",
    "                                xyz = system.atom_positions[n].magnitude * scale_factor\n",
    "                                xyz = xyz + i * lat_x + j * lat_y + k * lat_z\n",
    "                                file.write(el)\n",
    "                                file.write(\"\\t%f\\t%f\\t%f\\n\" % (xyz[0], xyz[1], xyz[2]))\n",
    "            alist.append(compound)\n",
    "\n",
    "            break\n",
    "    "
   ]
  },
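The cell above scans all archive entries for every compound, so its cost scales as (number of compounds) × (number of entries). A hedged alternative is to build a lookup table once. The table would map each of the four equivalent compound names to the first matching entry. As in `labels_1` to `labels_4`, the equivalent names come from swapping the two cations and swapping the anions at label positions 3 and 4. The sketch works on plain lists of atom labels. In the notebook, these would come from `query[entry].section_run[0].section_system[-1].atom_labels`. The function names are hypothetical. The sketch also skips every entry that does not have exactly 5 labels, which is slightly stricter than the notebook's `len(labels) > 5` check.

```python
from itertools import product


def equivalent_names(labels):
    """The four compound names the notebook treats as the same structure.

    Mirrors labels_1..labels_4: cations (positions 0, 1) may be swapped,
    the anions at positions 3 and 4 may be swapped, and position 2 goes last.
    """
    a, b, x2, x3, x4 = labels
    return [
        "_".join([c1, c2, n1, n2, x2])
        for (c1, c2), (n1, n2) in product([(a, b), (b, a)], [(x3, x4), (x4, x3)])
    ]


def build_index(entries_labels):
    """Map every equivalent compound name to the first entry that matches it."""
    index = {}
    for entry, labels in enumerate(entries_labels):
        if len(labels) != 5:  # only 5-atom cells are used
            continue
        for name in equivalent_names(labels):
            index.setdefault(name, entry)  # keep the first match, like `break`
    return index


entries_labels = [
    ["Bi", "Sb", "Te", "Se", "S"],
    ["As", "Bi", "S", "Te", "Se"],
    ["Bi", "Bi", "Te", "Te", "Te", "Te"],  # not a 5-atom cell: skipped
]
index = build_index(entries_labels)
print(index["Sb_Bi_S_Se_Te"])  # 0
print(index["Bi_As_Se_Te_S"])  # 1
```

With such a table, `index.get(compound)` would replace the inner loop over entries, and each archive entry would be visited only once.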
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-09-15T12:06:55.365899Z",
     "start_time": "2021-09-15T12:06:55.235398Z"
    },
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "zeta = {'S':16, 'As':33, 'Se':34, 'Sb':51, 'Te':52, 'Bi':83}\n",
    "chi = {'S':2.58, 'As':2.18, 'Se':2.55, 'Sb':2.05, 'Te':2.12, 'Bi':2.02}\n",
    "lambd = {'S':0.05, 'As':0.19, 'Se':0.22, 'Sb':0.4, 'Te':0.49, 'Bi':1.25}\n",
    "\n",
    "df_feat = pd.DataFrame(index=df_train.index, columns=[\n",
    "                                                     'z_A','z_B','z_L','z_M','z_N',\n",
    "                                                     'x_A','x_B','x_L','x_M','x_N',\n",
    "                                                     'l_A','l_B','l_L','l_M','l_N',\n",
    "                                                     ])\n",
    "for comp in df_train.index:\n",
    "    ablmn = comp.split('_')\n",
    "    df_feat.loc[comp] = pd.Series({\n",
    "                                   'z_A':zeta[ablmn[0]],\n",
    "                                   'z_B':zeta[ablmn[1]],\n",
    "                                   'z_L':zeta[ablmn[2]],\n",
    "                                   'z_M':zeta[ablmn[3]],\n",
    "                                   'z_N':zeta[ablmn[4]],\n",
    "                                   'x_A':chi[ablmn[0]],\n",
    "                                   'x_B':chi[ablmn[1]],\n",
    "                                   'x_L':chi[ablmn[2]],\n",
    "                                   'x_M':chi[ablmn[3]],\n",
    "                                   'x_N':chi[ablmn[4]],\n",
    "                                   'l_A':lambd[ablmn[0]],\n",
    "                                   'l_B':lambd[ablmn[1]],\n",
    "                                   'l_L':lambd[ablmn[2]],\n",
    "                                   'l_M':lambd[ablmn[3]],\n",
    "                                   'l_N':lambd[ablmn[4]],\n",
    "                                  }) \n",
    "\n",
    "df_feat['Class'] = df_train['Class']"
   ]
  },
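The introduction describes a grouping rule: one check box for a cation (or anion) feature adds that primary feature for every site of its group. That rule can be written as a small helper. It mirrors the loop in the button handler further down and uses the per-site column names of `df_feat` built above (`z_A`, `x_B`, `l_N`, ...). The helper name `expand_selection` and the `SITES` table are hypothetical.

```python
# Sites belonging to each group in the AB-LMN prototype
SITES = {"cations": ["A", "B"], "anions": ["L", "M", "N"]}


def expand_selection(grouped):
    """Turn grouped check-box names like 'x_cations' into per-site columns like 'x_A'."""
    columns = []
    for name in grouped:
        prop, group = name.split("_")
        columns.extend(f"{prop}_{site}" for site in SITES[group])
    return columns


print(expand_selection(["x_cations", "l_anions"]))
# ['x_A', 'x_B', 'l_L', 'l_M', 'l_N']
```

Each resulting column is a separate primary feature, so SISSO can, e.g., combine `x_A` and `x_B` with a binary operator.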
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-09-15T12:06:55.374165Z",
     "start_time": "2021-09-15T12:06:55.367318Z"
    },
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "def get_feat_space_and_sisso_regressor(\n",
    "    selected_ops=[\"add\", \"abs_diff\", \"div\", \"sq\", \"exp\"],\n",
    "    selected_features = 'all',\n",
    "    max_rung=2,\n",
    "    n_sis_select=50,\n",
    "    n_dim=2,\n",
    "    n_residual=10,\n",
    "    default=True,\n",
    "):\n",
    "\n",
    "    if default:\n",
    "        \n",
    "        selected_ops = [\"add\", \"sub\", \"mult\", \"div\", \"abs_diff\", \"sq\", \"cb\", \"sqrt\", \"cbrt\", \"inv\", \"abs\"] \n",
    "        selected_features = 'all'\n",
    "        inputs = read_csv(\n",
    "            df_train, \n",
    "            prop_key=\"Class\",\n",
    "            cols='all',\n",
    "            max_rung=max_rung,\n",
    "            leave_out_frac=0.0,\n",
    "            )\n",
    "    else:\n",
    "        \n",
    "        inputs = read_csv(\n",
    "            df_feat, \n",
    "            prop_key=\"Class\",\n",
    "            cols=selected_features,\n",
    "            max_rung=max_rung,\n",
    "            leave_out_frac=0.0\n",
    "            )\n",
    "        \n",
    "    inputs.max_rung = max_rung\n",
    "    inputs.allowed_ops = selected_ops\n",
    "    inputs.n_sis_select = n_sis_select\n",
    "    inputs.n_dim = n_dim\n",
    "    inputs.n_residual = n_residual\n",
    "    inputs.n_model_store = 1\n",
    "    inputs.calc_type = \"classification\"\n",
    "    inputs.sample_ids_train = df_feat.index.tolist()\n",
    "    inputs.prop_train = df_feat[\"Class\"].to_numpy()\n",
    "    inputs.prop_test = np.array([])\n",
    "    inputs.prop_label = \"Class\"\n",
    "    inputs.task_names = [\"all_mats\"]\n",
    "\n",
    "        \n",
    "    feat_space = FeatureSpace(inputs)\n",
    "    \n",
    "    sisso = SISSOClassifier(inputs, feat_space)\n",
    "        \n",
    "    return feat_space, sisso "
   ]
  },
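The parameters `n_sis_select` and `n_dim` passed to `get_feat_space_and_sisso_regressor` control the cost of the SO step. Let $k$ = `n_sis_select`. The sketch assumes, as in the explanation of the method, that at dimension $i$ all $i$-tuples of the $i k$ features selected so far are ranked. Under that assumption, the number of candidate models is the binomial coefficient $\binom{i k}{i}$. The sketch evaluates this number. It also computes the largest `n_sis_select` suggested by the error message in the run handler, namely the number of generated features divided by the maximum dimension. SISSO++ may prune these combinations further, and the helper names are hypothetical.

```python
from math import comb


def n_so_candidates(n_sis_select, dim):
    """Number of dim-tuples drawn from the dim * n_sis_select features selected by SIS."""
    return comb(dim * n_sis_select, dim)


def max_n_sis_select(n_feat, n_dim):
    """Largest features-per-SIS-iteration that still fits into n_feat features."""
    return n_feat // n_dim


for dim in (1, 2, 3):
    print(dim, n_so_candidates(50, dim))  # prints "1 50", "2 4950", "3 551300"
print(max_n_sis_select(1000, 3))  # 333
```

The rapid growth with dimension is why the default settings stop at a 2D model with 50 features per SIS iteration.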
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-09-15T12:06:55.393669Z",
     "start_time": "2021-09-15T12:06:55.375598Z"
    },
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "# In this cell interactions with buttons are defined\n",
    "\n",
    "from ipywidgets import widgets, interactive\n",
    "from IPython.display import HTML, clear_output\n",
    "\n",
    "def handle_rung_selection(change):\n",
    "    if change['new'] == 'PRM2020':\n",
    "        default_operations =  ['add', 'sub', 'abs_diff', 'mult', 'div', 'exp', 'neg_exp', 'inv', 'sq', 'cb', \n",
    "                            'sqrt', 'cbrt', 'log', 'abs']\n",
    "        default_features = ['z_cations','x_cations','l_cations','z_anions','x_anions','l_anions']\n",
    "\n",
    "        for op, widget in zip(possible_operations, op_list):\n",
    "            widget.value = op in default_operations\n",
    "            widget.disabled = True\n",
    "        for feat, widget in zip(possible_features, feat_list):\n",
    "            widget.value = feat in default_features\n",
    "            widget.disabled = True\n",
    "        rung_selection.value = 'PRM2020'\n",
    "        feat_per_iter_selection.value = 50\n",
    "        dimension_selection.value = 2    \n",
    "    else:\n",
    "        for widget in op_list+feat_list:\n",
    "            widget.disabled = False\n",
    "\n",
    "def plot_button_clicked(button):\n",
    "    with out2:\n",
    "        model = sisso.models[1][0]\n",
    "        classified=model.prop_train\n",
    "        compounds = df_train.index.to_list()\n",
    "        df=pd.DataFrame(data={\n",
    "            \"Compound\":compounds,\n",
    "            \"Classification\":classified})\n",
    "        for feat in sisso.models[sisso.n_dim-1][0].feats:\n",
    "            df[str(feat.expr)]=feat.value\n",
    "        classes = ['Topological insulators', 'Trivial insulators']\n",
    "        visualizer=Visualizer(df, sisso, classes)\n",
    "        visualizer.show()\n",
    "        \n",
    "\n",
    "def default_button_clicked(button):\n",
    "    \n",
    "    rung_selection.value = 'PRM2020'\n",
    "    feat_per_iter_selection.value = 50\n",
    "    dimension_selection.value = 2\n",
    "    \n",
    "def run_button_clicked(button):\n",
    "    with out2:\n",
    "        clear_output()    \n",
    "    with out1:        \n",
    "        clear_output()\n",
    "        print('Calculating...', flush=True)\n",
    "        selected_features = []\n",
    "        allowed_operations = []\n",
    "        for op, widget in zip(possible_operations, op_list):\n",
    "            if widget.value:\n",
    "                allowed_operations.append(op)\n",
    "\n",
    "        for sel_feat, widget in zip(possible_features, feat_list):\n",
    "            if widget.value:\n",
    "                feat = sel_feat.split('_')[0]\n",
    "                typ = sel_feat.split('_')[1]\n",
    "                if (typ=='cations'):\n",
    "                    selected_features.append(feat + '_'+ 'A')        \n",
    "                    selected_features.append(feat + '_'+ 'B')        \n",
    "                if (typ=='anions'):\n",
    "                    selected_features.append(feat + '_'+ 'L')        \n",
    "                    selected_features.append(feat + '_'+ \"M\")        \n",
    "                    selected_features.append(feat + '_'+ \"N\")        \n",
    "                            \n",
    "        if rung_selection.value == 'PRM2020':\n",
    "            selected_features = \"all\"\n",
    "            tier = 0\n",
    "            default = True\n",
    "        else:\n",
    "            tier = rung_selection.value\n",
    "            default = False\n",
    "            \n",
    "        global feat_space\n",
    "        global sisso\n",
    "\n",
    "        try:\n",
    "            feat_space, sisso = get_feat_space_and_sisso_regressor(\n",
    "                selected_ops = allowed_operations,\n",
    "                selected_features = selected_features,\n",
    "                max_rung = tier,\n",
    "                n_sis_select = feat_per_iter_selection.value,\n",
    "                n_dim = dimension_selection.value,\n",
    "                n_residual = 10,\n",
    "                default = default\n",
    "            )\n",
    "\n",
    "\n",
    "            clear_output()\n",
    "            if (dimension_selection.value>1):\n",
    "                plot_button.disabled=False\n",
    "            else:\n",
    "                plot_button.disabled=True\n",
    "\n",
    "            print(\"Number of features generated: \" + str(feat_space.n_feat))\n",
    "            print(\"\")\n",
    "\n",
    "            try:\n",
    "                sisso.fit()\n",
    "\n",
    "                for i in range(dimension_selection.value):\n",
    "                    print(str(i+1)+'D model')\n",
    "                    print(\"# misclassified: {} \".format(int(sisso.models[i][0].n_convex_overlap_train)))\n",
    "                    string = \"SVM dividing line: c0\"\n",
    "                    for nf, feat  in enumerate(sisso.models[i][0].feats):\n",
    "                        string = string + str(' + a'+str(nf)+'*'+str(feat.expr))\n",
    "                    string = string + \" = 0\"\n",
    "                    print(string)\n",
    "                    string = \"c0:{:.4}\".format(sisso.models[i][0].coefs[0][-1])\n",
    "                    for j in range(i+1):\n",
    "                        string = string + str(\"  |  a\"+str(j)+\":{:.4}\".format(sisso.models[i][0].coefs[0][j]))\n",
    "                    print(string + '\\n')\n",
    "                global df\n",
    "\n",
    "            except RuntimeError:\n",
    "                print(\"\\nThe number of selected features per SIS iteration is bigger than the number of features available. Please reduce the number of selected features per SIS iteration (number of features generated / max number of dimensions) or increase the number of selected features and operations.\")\n",
    "        except Exception:\n",
    "            print('The present selection does not lead to the creation of any derived features in the highest selected rung, please select at least one binary or power operator, or reduce the maximum rung')"
   ]
  },
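The handler above prints each model as a dividing hyperplane $c_0 + \sum_j a_j d_j = 0$. Here $d_j$ are the descriptor components, the intercept `c0` is read from `coefs[0][-1]`, and each `a_j` is read from `coefs[0][j]`. A pure-Python sketch can show how a soft-margin linear SVM produces such coefficients. It minimizes the regularized hinge loss with subgradient descent. This is an illustrative stand-in and not the solver used by SISSO++. The regularization strength `lam`, the step size, and the number of epochs are arbitrary choices for this toy example.

```python
import math


def train_linear_svm(X, y, lam=0.01, epochs=2000, lr=0.1):
    """Soft-margin linear SVM (primal) via full-batch subgradient descent.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))),
    with labels y_i in {-1, +1}. Returns (w, b).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for t in range(1, epochs + 1):
        eta = lr / math.sqrt(t)  # decaying step size
        grad_w = [lam * wj for wj in w]  # gradient of the L2 term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # point violates the margin: its hinge term is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - eta * gj for wj, gj in zip(w, grad_w)]
        b -= eta * grad_b
    return w, b


def predict(w, b, x):
    """Class (+1 or -1) of point x, given by the side of the hyperplane it lies on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy 2D descriptor values; +1 and -1 stand for the two classes
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -1.0], [-1.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print("SVM dividing line: {:.4} + {:.4}*d0 + {:.4}*d1 = 0".format(b, w[0], w[1]))
```

In this primal form, `lam` corresponds to 1/(nC) for the usual SVM penalty C. A smaller `lam` therefore tolerates fewer margin violations, which is the "soft" part of the soft-margin SVM.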
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-09-15T12:06:55.691702Z",
     "start_time": "2021-09-15T12:06:55.395093Z"
    },
    "init_cell": true,
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "a6173b0cb1eb48fd82eb15d5f54f1207",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "VBox(children=(HBox(children=(VBox(children=(Label(value=''), Checkbox(value=True, disabled=True, indent=False…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "cb_layout = widgets.Layout(width = '15px')\n",
    "thin_layout = widgets.Layout(width = '100px')\n",
    "mid_layout = widgets.Layout(width = '200px')\n",
    "wide_layout = widgets.Layout(width = '300px')\n",
    "\n",
    "possible_operations = ['add', 'sub', 'abs_diff', 'mult', 'div', 'exp', 'neg_exp', 'inv', 'sq', 'cb', \n",
    "                        'sqrt', 'cbrt', 'log', 'abs']\n",
    "\n",
    "possible_features = ['z_cations','x_cations','l_cations','z_anions','x_anions','l_anions']\n",
    "\n",
    "tooltips = {\n",
    "    \"z_cations\" : \"Atomic number\",\n",
    "    \"x_cations\" : \"Pauling electronegativity\",\n",
    "    \"l_cations\" : \"Spin-orbit coupling\",\n",
    "    \"z_anions\" : \"Atomic number\",\n",
    "    \"x_anions\" : \"Pauling electronegativity\",\n",
    "    \"l_anions\" : \"Spin-orbit coupling\",\n",
    "}\n",
    "\n",
    "labels = {\n",
    "    'add' : '$x + y$', 'sub' : '$x - y$', 'abs_diff' : '$|x - y|$', 'mult' : '$x \\cdot y$', 'div' : '$x / y$',\n",
    "    'exp' : '$\\exp(x)$', 'neg_exp' : '$\\exp(-x)$', 'inv' : '$1/x$', 'sq' : '$x^2$', 'cb' : '$x^3$', \n",
    "    'six_pow' : '$x^6$', 'sqrt' : '$\\sqrt{x}$', 'cbrt' : '$\\sqrt[3]{x}$', 'log' : '$\\log(x)$',\n",
    "    'abs' :  '$|x|$', 'sin' : '$\\sin(x)$', 'cos' : '$\\cos(x)$', 'z_cations' : '$Z_{cations}$', 'x_cations' : '$\\chi_{cations}$', \n",
    "    'l_cations' : '$\\lambda_{cations}$', 'z_anions' : '$Z_{anions}$', 'x_anions' : '$\\chi_{anions}$', 'l_anions' : '$\\lambda_{anions}$'  \n",
    "}\n",
    "\n",
    "op_list = []\n",
    "op_labels  = []\n",
    "feat_list = []\n",
    "feat_labels = []\n",
    "for operation in possible_operations:\n",
    "    op_list.append(widgets.Checkbox(description='', value=True, indent=False, layout=cb_layout))\n",
    "    op_labels.append(widgets.Label(value=labels[operation]))\n",
    "for feature in possible_features:\n",
    "    feat_list.append(widgets.Checkbox(description=tooltips[feature], value=True, indent=False, layout=cb_layout))\n",
    "    feat_labels.append(widgets.Label(value=labels[feature]))\n",
    "    \n",
    "op_box = widgets.VBox([widgets.Label()]+op_list)\n",
    "op_label_box = widgets.VBox([widgets.Label(value='Operations:', layout=thin_layout)]+op_labels)\n",
    "for box in op_list: box.disabled = True\n",
    "feat_box = widgets.VBox([widgets.Label()]+feat_list)\n",
    "feat_label_box = widgets.VBox([widgets.Label(value='Features:', layout=thin_layout)]+feat_labels)\n",
    "for box in feat_list: box.disabled = True\n",
    "\n",
    "rung_selection = widgets.Dropdown(options=['PRM2020', 1, 2, 3], value='PRM2020', layout=thin_layout)\n",
    "feat_per_iter_selection = widgets.BoundedIntText(value = 50, min=10, max=200, step=1, layout=thin_layout)\n",
    "dimension_selection = widgets.BoundedIntText(value = 2, min=1, max=4, step=1, layout = thin_layout)\n",
    "settings_box = widgets.VBox([\n",
    "    widgets.Label(value='Settings:', layout=wide_layout),\n",
    "    widgets.Label(value='SISSO rung:', layout=wide_layout),\n",
    "    rung_selection,\n",
    "    widgets.Label(value='To unfreeze the feature selection,' , layout=wide_layout),\n",
    "    widgets.Label(value='please select any rung other than PRM2020.', layout=widgets.Layout(width = '300px', bottom='10px') ),\n",
    "    widgets.Label(value='Number of selected features per SIS iteration:',  layout=wide_layout),\n",
    "    feat_per_iter_selection,\n",
    "    widgets.Label(value='Maximum number of dimensions:', layout=wide_layout),\n",
    "    dimension_selection])\n",
    "\n",
    "default_button = widgets.Button(description = 'Default selection', layout=mid_layout)\n",
    "run_button = widgets.Button(description = 'Run', layout=mid_layout)\n",
    "plot_button = widgets.Button(description = 'Plot interactive map', disabled=True, layout=mid_layout)\n",
    "button_box = widgets.VBox([default_button, run_button, plot_button])\n",
    "\n",
    "default_button.on_click(default_button_clicked)\n",
    "run_button.on_click(run_button_clicked)\n",
    "plot_button.on_click(plot_button_clicked)\n",
    "rung_selection.observe(handle_rung_selection, names='value')\n",
    "\n",
    "out1 = widgets.Output()\n",
    "out2 = widgets.Output()\n",
    "\n",
    "gui_box = widgets.HBox([op_box, op_label_box, feat_box, feat_label_box, settings_box, button_box])\n",
    "out_box = widgets.VBox([gui_box, out1, out2])\n",
    "\n",
    "display(out_box)"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Initialization Cell",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}