{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-09-17T14:26:04.070620Z",
     "start_time": "2020-09-17T14:26:04.062967Z"
    }
   },
   "source": [
    "<div id=\"teaser\" style=' background-position:  right center; background-size: 00px; background-repeat: no-repeat; \n",
    "    padding-top: 20px;\n",
    "    padding-right: 10px;\n",
    "    padding-bottom: 170px;\n",
    "    padding-left: 10px;\n",
    "    border-bottom: 14px double #333;\n",
    "    border-top: 14px double #333;' > \n",
    "\n",
    "   \n",
    "   <div style=\"text-align:center\">\n",
    "    <b><font size=\"6.4\">Discovery of new topological insulators in alloyed tetradymites\n",
    "        </font></b>    \n",
    "  </div>\n",
    "\n",
    "<p>\n",
    "created by: Luigi Sbailò, Thomas A. R. Purcell, Luca M. Ghiringhelli, and Matthias Scheffler   \n",
    "   \n",
    "Fritz Haber Institute of the Max Planck Society, Faradayweg 4-6, D-14195 Berlin, Germany <br>\n",
    "<span class=\"nomad--last-updated\" data-version=\"v1.0.0\">[Last updated: Sep 29, 2020]</span>\n",
    "  \n",
    "<div> \n",
    "<img  style=\"float: left;\" src=\"assets/tetradymite_PRM2020/Logo_MPG.png\" width=\"200\"> \n",
    "<img  style=\"float: right;\" src=\"assets/tetradymite_PRM2020/Logo_NOMAD.png\" width=\"250\">\n",
    "</div>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Introduction\n",
    "This tutorial shows how to find descriptive parameters (short formulas) that predict whether alloyed materials are topological or trivial insulators, using the example of tetradymites. It is based on the sure independence screening and sparsifying operator (SISSO) algorithm, which searches for an optimal descriptor by scanning huge feature spaces.\n",
    "\n",
    "<div style=\"padding: 1ex; margin-top: 1ex; margin-bottom: 1ex; border-style: dotted; border-width: 1pt; border-color: blue; border-radius: 3px;\">R. Ouyang, S. Curtarolo, E. Ahmetcik, M. Scheffler, L. M. Ghiringhelli: <span style=\"font-style: italic;\">SISSO: a compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates</span>, Phys. Rev. Materials  2, 083802 (2018) <a href=\"https://journals.aps.org/prmaterials/abstract/10.1103/PhysRevMaterials.2.083802\" target=\"_blank\">[PDF]</a>.</div>\n",
    "\n",
    "With the default settings, the method reproduces the results of:\n",
    "\n",
    "<div style=\"padding: 1ex; margin-top: 1ex; margin-bottom: 1ex; border-style: dotted; border-width: 1pt; border-color: blue; border-radius: 3px;\">G. Cao, R. Ouyang, L. M. Ghiringhelli, M. Scheffler, H. Liu, C. Carbogno, and Z. Zhang: <span style=\"font-style: italic;\">Artificial intelligence for high-throughput discovery of topological insulators: The example of alloyed tetradymites</span>,  Phys. Rev. Materials 4, 034204 (2020) <a href=\"https://journals.aps.org/prmaterials/abstract/10.1103/PhysRevMaterials.4.034204\">[PDF]</a>.</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<details>\n",
    "    <summary>\n",
    "        <div style=\"padding: 1ex; margin-top: 1ex; margin-bottom: 1ex; border-style: dotted; border-width: 1pt; border-color: blue; border-radius: 3px;\"><b>Explanation of the method (click to expand/collapse)</b></div></summary>\n",
    "\n",
    "We present a tool for predicting whether alloyed tetradymites are topological or trivial insulators, by using a set of descriptive parameters (a descriptor) based on free-atom data of the atomic species constituting the $AB-LMN$ materials, where $A,B \\in \\{ \\textrm{As, Sb, Bi} \\}$ and $L,M,N \\in \\{ \\textrm{S, Se, Te} \\}$. We apply a recently developed method, sure independence screening and sparsifying operator (SISSO), which finds an optimal descriptor in a huge feature space containing billions of features. In this tutorial an $\\ell_0$-optimization is used as the sparsifying operator.\n",
    "The method is described in:\n",
    "               \n",
    "<div style=\"padding: 1ex; margin-top: 1ex; margin-bottom: 1ex; border-style: dotted; border-width: 1pt; border-color: blue; border-radius: 3px;\">\n",
    "R. Ouyang, S. Curtarolo, E. Ahmetcik, M. Scheffler, L. M. Ghiringhelli: <span style=\"font-style: italic;\">SISSO: a compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates</span>, Phys. Rev. Materials  2, 083802 (2018) <a href=\"https://journals.aps.org/prmaterials/abstract/10.1103/PhysRevMaterials.2.083802\" target=\"_blank\">[PDF]</a>. <br> </div>\n",
    "               \n",
    "In this tutorial, we focus on the classification flavor of SISSO($\\ell_0$). \n",
    "In the space of descriptors, each category’s domain (here, topological vs trivial insulator) is approximated as\n",
    "the region of space within the convex hull of the corresponding training data. SISSO finds the low-dimensional descriptor yielding the minimum overlap between these convex regions. In practice, the algorithm is iterative. At the first iteration, in the SIS step, it selects the $k$ features which yield the smallest overlap when convex regions (segments encompassing all the data in one category) over the training data are constructed. In the first iteration, the feature giving the smallest overlap is already the 1D model. At each subsequent iteration $i$, the SIS step selects $k$ new features that do the same for those training points which were in the overlap regions at the previous steps (i.e., the residuals). Then, in the SO step, all $i$-tuples of features, formed by combining in all possible ways the features selected in the SIS steps, are ranked by the size of the overlap. The $i$-tuple with the smallest overlap is the $i$D model. \n",
    "\n",
    "In order to better identify a predictive model to classify unseen data points, at each dimension (iteration) a soft-margin support-vector machine <a href=\"https://link.springer.com/article/10.1007%252FBF00994018\" target=\"_blank\">[C. Cortes & V. Vapnik, Machine learning 20, 273 (1995)]</a> is trained to define the separating hyperplanes. The resulting model is identified by the coefficients and intercept of the hyperplanes.\n",
    "               \n",
    "</details>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-09-30T09:55:37.970299Z",
     "start_time": "2020-09-30T09:55:37.934551Z"
    }
   },
   "source": [
    "The idea demonstrated in this tutorial is to start from simple physical quantities (\"primary features\", here properties of the constituent free atoms such as Pauling electronegativity) and to generate millions (or billions) of candidate formulas by applying arithmetic operations that combine the primary features. These candidate formulas constitute the so-called \"feature space\". Then, SISSO is used to select only a few of these formulas that explain the data.\n",
    "\n",
    "By clicking directly on \"Run\" below, i.e., with the default selection, you can reproduce the 2D map as published in <a href=\"https://journals.aps.org/prmaterials/abstract/10.1103/PhysRevMaterials.4.034204\" target=\"_blank\">PRM 2020</a>. You can also select primary features and allowed operations (by clicking the check-boxes), as well as the SISSO rung (i.e., the number of iterations in the construction of the feature space), the number of features that are selected at each iteration of the SIS step, and the maximum number of dimensions of the model. The materials considered here have up to 5 different atomic species in the unit cell, with the prototype formula $AB-LMN$, where the cations $A,B \\in \\{ \\textrm{As, Sb, Bi} \\}$ and the anions $L,M,N \\in \\{ \\textrm{S, Se, Te} \\}$. We have therefore grouped the features to be selected into those for cations and anions. This means that by selecting, e.g., a cation feature, that feature is added to the primary feature set for both the $A$ and $B$ elements, but each is treated separately in the feature construction and SISSO optimization. After selecting the features and other settings, press \"Run\". \\\n",
    "After the results are shown for all models from one dimensional to the maximum chosen dimension, you can press \"Plot interactive map\" to reveal a map of the tetradymites classified as topological or trivial insulators, for the highest-dimensional model. If the highest-dimensional model is 2D, the support-vector-machine separation line between the two phases is shown. For higher dimensional models, the 3rd and 4th dimensions can be visualized via the size or the color of the data-point markers. Drop-down menus let you assign axes, marker sizes, and colors to the descriptor components of your choice.\n",
    "\n",
    "With the selection of \"PRM2020\" (or the default selection) as SISSO rung, a special feature space is loaded, which contains far fewer features than the production calculation used in <a href=\"https://journals.aps.org/prmaterials/abstract/10.1103/PhysRevMaterials.4.034204\" target=\"_blank\">PRM 2020</a>. This makes it possible to reproduce the same result in the notebook in a reasonable time. Still, the provided feature space contains thousands of the top-ranked features, and SISSO finds the best nD model. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-06-21T15:58:01.856952Z",
     "start_time": "2021-06-21T15:58:01.835979Z"
    }
   },
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "%autoreload 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-06-21T16:55:44.085425Z",
     "start_time": "2021-06-21T16:55:44.050435Z"
    },
    "init_cell": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<script>\n",
       "    code_show=true; \n",
       "    function code_toggle() {\n",
       "        if (code_show)\n",
       "        {\n",
       "            $('div.input').hide();\n",
       "        } \n",
       "        else \n",
       "        {\n",
       "            $('div.input').show();\n",
       "        }\n",
       "        code_show = !code_show\n",
       "    } \n",
       "    $( document ).ready(code_toggle);\n",
       "    window.runCells(\"startup\");\n",
       "</script>\n",
       "The Python code for this notebook is by default hidden for easier reading.\n",
       "To toggle on/off the code, click <a href=\"javascript:code_toggle()\">here</a>.\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%%HTML\n",
    "<script>\n",
    "    code_show=true; \n",
    "    function code_toggle() {\n",
    "        if (code_show)\n",
    "        {\n",
    "            $('div.input').hide();\n",
    "        } \n",
    "        else \n",
    "        {\n",
    "            $('div.input').show();\n",
    "        }\n",
    "        code_show = !code_show\n",
    "    } \n",
    "    $( document ).ready(code_toggle);\n",
    "    window.runCells(\"startup\");\n",
    "</script>\n",
    "The Python code for this notebook is by default hidden for easier reading.\n",
    "To toggle on/off the code, click <a href=\"javascript:code_toggle()\">here</a>."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-06-21T16:55:44.106011Z",
     "start_time": "2021-06-21T16:55:44.087469Z"
    },
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "from sissopp import get_max_number_feats, get_estimate_n_feat_next_rung, generate_fs, SISSOClassifier, generate_phi_0_from_csv, FeatureSpace\n",
    "from tetradymite_PRM2020.visualizer import Visualizer\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import os"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-06-21T15:58:13.011817Z",
     "start_time": "2021-06-21T15:58:01.884774Z"
    }
   },
   "outputs": [],
   "source": [
    "# The dataset is stored in the NOMAD Archive and can be accessed with this query.\n",
    "from nomad import client, config\n",
    "config.client.url = 'http://nomad-lab.eu/prod/rae/api'\n",
    "query = client.query_archive(query={\n",
    "    'dataset_id': ['BjT-NFK0QdOx81_z5TmyeQ']},\n",
    "                                  per_page=100,\n",
    ")\n",
    "print(query)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-06-21T16:55:44.142771Z",
     "start_time": "2021-06-21T16:55:44.107888Z"
    },
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "df_train = pd.read_pickle('./data/tetradymite_PRM2020/training_set')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-06-21T15:58:13.328332Z",
     "start_time": "2021-06-21T15:58:13.053778Z"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# This piece of code is not run at initialization. \n",
    "# It serves to create the molecular structures which are visualized.\n",
    "\n",
    "path_structure = './data/tetradymite_PRM2020/structures/'\n",
    "try:\n",
    "    os.mkdir(path_structure)\n",
    "except OSError:\n",
    "    !rm ./data/tetradymite_PRM2020/structures/*\n",
    "compounds=df_train.index.to_list()\n",
    "scale_factor = 10**10\n",
    "alist = []\n",
    "for compound in compounds:\n",
    "    for entry in range (1581):\n",
    "        labels = query[entry].section_run[0].section_system[-1].atom_labels\n",
    "        if (len(labels)>5):\n",
    "            continue\n",
    "        \n",
    "        labels_1 = str(labels[0])+'_'+str(labels[1])+'_'+str(labels[3])+'_'+str(labels[4])+'_'+str(labels[2])\n",
    "        labels_2 = str(labels[0])+'_'+str(labels[1])+'_'+str(labels[4])+'_'+str(labels[3])+'_'+str(labels[2])\n",
    "        labels_3 = str(labels[1])+'_'+str(labels[0])+'_'+str(labels[3])+'_'+str(labels[4])+'_'+str(labels[2])\n",
    "        labels_4 = str(labels[1])+'_'+str(labels[0])+'_'+str(labels[4])+'_'+str(labels[3])+'_'+str(labels[2])\n",
    "\n",
    "        if compound in list([labels_1, labels_2, labels_3, labels_4]):\n",
    "\n",
    "            n_atoms = len (labels)\n",
    "            lat_x, lat_y, lat_z = query[entry].section_run[0].section_system[-1].lattice_vectors.magnitude * scale_factor\n",
    "            file = open(path_structure + str(compound) +\".xyz\",\"w\") \n",
    "            # The loops below write 3 x 3 x 3 = 27 replicas of the cell, so the XYZ header must count 27 * n_atoms atoms.\n",
    "            file.write(\"%d\\n\\n\"%(n_atoms*27))\n",
    "            for i in [0,1,2]:\n",
    "                    for j in [0,1,2]:\n",
    "                        for k in [0,1,2]:\n",
    "                            for n in range (n_atoms):\n",
    "                                el = query[entry].section_run[0].section_system[-1].atom_labels[n]\n",
    "                                xyz = query[entry].section_run[0].section_system[-1].atom_positions[n].magnitude * scale_factor\n",
    "                                xyz += i*lat_x\n",
    "                                xyz += j*lat_y\n",
    "                                xyz += k*lat_z\n",
    "                                file.write (el)\n",
    "                                file.write (\"\\t%f\\t%f\\t%f\\n\"%(xyz[0],xyz[1],xyz[2]))\n",
    "            file.close()\n",
    "            alist.append(compound)\n",
    "\n",
    "            break\n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-06-21T16:55:44.270578Z",
     "start_time": "2021-06-21T16:55:44.144647Z"
    },
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "zeta = {'S':16, 'As':33, 'Se':34, 'Sb':51, 'Te':52, 'Bi':83}\n",
    "chi = {'S':2.58, 'As':2.18, 'Se':2.55, 'Sb':2.05, 'Te':2.12, 'Bi':2.02}\n",
    "lambd = {'S':0.05, 'As':0.19, 'Se':0.22, 'Sb':0.4, 'Te':0.49, 'Bi':1.25}\n",
    "\n",
    "df_feat = pd.DataFrame(index=df_train.index, columns=[\n",
    "                                                     'z_A','z_B','z_L','z_M','z_N',\n",
    "                                                     'x_A','x_B','x_L','x_M','x_N',\n",
    "                                                     'l_A','l_B','l_L','l_M','l_N',\n",
    "                                                     ])\n",
    "for comp in df_train.index:\n",
    "    ablmn = comp.split('_')\n",
    "    df_feat.loc[comp] = pd.Series({\n",
    "                                   'z_A':zeta[ablmn[0]],\n",
    "                                   'z_B':zeta[ablmn[1]],\n",
    "                                   'z_L':zeta[ablmn[2]],\n",
    "                                   'z_M':zeta[ablmn[3]],\n",
    "                                   'z_N':zeta[ablmn[4]],\n",
    "                                   'x_A':chi[ablmn[0]],\n",
    "                                   'x_B':chi[ablmn[1]],\n",
    "                                   'x_L':chi[ablmn[2]],\n",
    "                                   'x_M':chi[ablmn[3]],\n",
    "                                   'x_N':chi[ablmn[4]],\n",
    "                                   'l_A':lambd[ablmn[0]],\n",
    "                                   'l_B':lambd[ablmn[1]],\n",
    "                                   'l_L':lambd[ablmn[2]],\n",
    "                                   'l_M':lambd[ablmn[3]],\n",
    "                                   'l_N':lambd[ablmn[4]],\n",
    "                                  }) \n",
    "\n",
    "df_feat['Class'] = df_train['Class']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-06-21T16:55:44.292705Z",
     "start_time": "2021-06-21T16:55:44.271942Z"
    },
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "def get_featspace_sisso(\n",
    "    df,\n",
    "    ops= ['add', 'sub', 'abs_diff', 'mult', 'div', 'exp', 'neg_exp', 'inv', 'sq', 'cb', \n",
    "                        'sqrt', 'cbrt', 'log', 'abs'],\n",
    "    cols=\"all\",\n",
    "    max_phi=2,\n",
    "    n_sis_select=50,\n",
    "    remove_double_divison=True,\n",
    "    max_dim=3,\n",
    "    n_residual=1,\n",
    "    default=True,\n",
    "):\n",
    "\n",
    "    if default:\n",
    "        phi_0, prop_label, prop_unit, prop, prop_test, task_sizes_train, task_sizes_test, leave_out_inds = generate_phi_0_from_csv(\n",
    "            df_train, \n",
    "            \"Class\",\n",
    "            cols='all',\n",
    "            task_key=None,\n",
    "            leave_out_frac=0.0,\n",
    "            leave_out_inds=None,\n",
    "            max_rung=1\n",
    "        )\n",
    "        feat_space = generate_fs(\n",
    "            phi_0, \n",
    "            prop, \n",
    "            task_sizes_train, \n",
    "            [\"add\", \"sub\", \"mult\", \"div\", \"abs_diff\", \"sq\", \"cb\", \"sqrt\", \"cbrt\", \"inv\", \"abs\"], \n",
    "            [],\n",
    "            \"classification\",\n",
    "            0, \n",
    "            n_sis_select\n",
    "        )\n",
    "    else:\n",
    "        phi_0, prop_label, prop_unit, prop, prop_test, task_sizes_train, task_sizes_test, leave_out_inds = generate_phi_0_from_csv(\n",
    "            df_feat, \n",
    "            \"Class\", \n",
    "            cols=cols, \n",
    "            task_key=None, \n",
    "            leave_out_frac=0.0, \n",
    "            leave_out_inds=None, \n",
    "            max_rung=max_phi\n",
    "        )\n",
    "        feat_space = generate_fs(\n",
    "            phi_0, \n",
    "            prop, \n",
    "            task_sizes_train, \n",
    "            ops,\n",
    "            [],\n",
    "            \"classification\",\n",
    "            max_phi, \n",
    "            n_sis_select\n",
    "        )\n",
    "        \n",
    "    sisso = SISSOClassifier(\n",
    "        feat_space,\n",
    "        prop_label,\n",
    "        prop_unit,\n",
    "        prop,\n",
    "        prop_test,\n",
    "        task_sizes_train,\n",
    "        task_sizes_test,\n",
    "        leave_out_inds,\n",
    "        max_dim,\n",
    "        10,\n",
    "        10\n",
    "    )\n",
    "    return feat_space, sisso"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 103,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-06-21T16:55:44.326081Z",
     "start_time": "2021-06-21T16:55:44.294079Z"
    },
    "init_cell": true
   },
   "outputs": [],
   "source": [
    "# In this cell interactions with buttons are defined\n",
    "\n",
    "from ipywidgets import widgets, interactive\n",
    "from IPython.display import HTML, clear_output\n",
    "\n",
    "def handle_rung_selection(change):\n",
    "    if change['new'] == 'PRM2020':\n",
    "        default_operations =  ['add', 'sub', 'abs_diff', 'mult', 'div', 'exp', 'neg_exp', 'inv', 'sq', 'cb', \n",
    "                            'sqrt', 'cbrt', 'log', 'abs']\n",
    "        default_features = ['z_cations','x_cations','l_cations','z_anions','x_anions','l_anions']\n",
    "\n",
    "        for op, widget in zip(possible_operations, op_list):\n",
    "            widget.value = op in default_operations\n",
    "            widget.disabled = True\n",
    "        for feat, widget in zip(possible_features, feat_list):\n",
    "            widget.value = feat in default_features\n",
    "            widget.disabled = True\n",
    "        rung_selection.value = 'PRM2020'\n",
    "        feat_per_iter_selection.value = 50\n",
    "        dimension_selection.value = 2    \n",
    "    else:\n",
    "        for widget in op_list+feat_list:\n",
    "            widget.disabled = False\n",
    "\n",
    "def plot_button_clicked(button):\n",
    "    with out2:\n",
    "        model = sisso.models[1][0]\n",
    "        classified=model.prop_train\n",
    "        compounds = df_train.index.to_list()\n",
    "        df=pd.DataFrame(data={\n",
    "            \"Compound\":compounds,\n",
    "            \"Classification\":classified})\n",
    "        for feat in sisso.models[sisso.n_dim-1][0].feats:\n",
    "            df[str(feat.expr)]=feat.value\n",
    "        classes = ['Topological insulators', 'Trivial insulators']\n",
    "        visualizer=Visualizer(df, sisso, classes)\n",
    "        visualizer.show()\n",
    "        \n",
    "\n",
    "def default_button_clicked(button):\n",
    "    \n",
    "    default_operations =  ['add', 'sub', 'abs_diff', 'mult', 'div', 'exp', 'neg_exp', 'inv', 'sq', 'cb', \n",
    "                        'sqrt', 'cbrt', 'log', 'abs']\n",
    "    default_features = ['z_cations','x_cations','l_cations','z_anions','x_anions','l_anions']\n",
    "    \n",
    "    for op, widget in zip(possible_operations, op_list):\n",
    "        widget.value = op in default_operations\n",
    "        widget.disabled = True\n",
    "    for feat, widget in zip(possible_features, feat_list):\n",
    "        widget.value = feat in default_features\n",
    "        widget.disabled = True\n",
    "    rung_selection.value = 'PRM2020'\n",
    "    feat_per_iter_selection.value = 50\n",
    "    dimension_selection.value = 2\n",
    "    \n",
    "def run_button_clicked(button):\n",
    "    with out2:\n",
    "        clear_output()    \n",
    "    with out1:        \n",
    "        clear_output()\n",
    "        print('Calculating...', flush=True)\n",
    "        selected_features = []\n",
    "        allowed_operations = []\n",
    "        for op, widget in zip(possible_operations, op_list):\n",
    "            if widget.value:\n",
    "                allowed_operations.append(op)\n",
    "\n",
    "        for sel_feat, widget in zip(possible_features, feat_list):\n",
    "            if widget.value:\n",
    "                feat = sel_feat.split('_')[0]\n",
    "                typ = sel_feat.split('_')[1]\n",
    "                if (typ=='cations'):\n",
    "                    selected_features.append(feat + '_'+ 'A')        \n",
    "                    selected_features.append(feat + '_'+ 'B')        \n",
    "                if (typ=='anions'):\n",
    "                    selected_features.append(feat + '_'+ 'L')        \n",
    "                    selected_features.append(feat + '_'+ \"M\")        \n",
    "                    selected_features.append(feat + '_'+ \"N\")        \n",
    "                            \n",
    "        if rung_selection.value == 'PRM2020':\n",
    "            selected_features = \"all\"\n",
    "            tier = 0\n",
    "            default = True\n",
    "        else:\n",
    "            tier = rung_selection.value\n",
    "            default = False\n",
    "            \n",
    "        global feat_space\n",
    "        global sisso\n",
    "        \n",
    "        try:\n",
    "            feat_space, sisso = get_featspace_sisso(\n",
    "                    df = df_train,\n",
    "                    ops = allowed_operations,\n",
    "                    cols = selected_features,\n",
    "                    max_phi = tier,\n",
    "                    n_sis_select = feat_per_iter_selection.value,\n",
    "                    remove_double_divison=True,\n",
    "                    max_dim = dimension_selection.value,\n",
    "                    n_residual = 1,\n",
    "                    default = default)\n",
    "            clear_output()\n",
    "            if (dimension_selection.value>1):\n",
    "                plot_button.disabled=False\n",
    "            else:\n",
    "                plot_button.disabled=True\n",
    "\n",
    "            print(\"Number of features generated: \" + str(feat_space.n_feat))\n",
    "            print(\"\")\n",
    "\n",
    "            try:\n",
    "                sisso.fit()\n",
    "\n",
    "                for i in range(dimension_selection.value):\n",
    "                    print(str(i+1)+'D model')\n",
    "                    print(\"# misclassified: {} \".format(int(sisso.models[i][0].n_convex_overlap_train)))\n",
    "                    string = \"SVM dividing line: c0\"\n",
    "                    for nf, feat  in enumerate(sisso.models[i][0].feats):\n",
    "                        string = string + str(' + a'+str(nf)+'*'+str(feat.expr))\n",
    "                    string = string + \" = 0\"\n",
    "                    print(string)\n",
    "                    string = \"c0:{:.4}\".format(sisso.models[i][0].coefs[0][-1])\n",
    "                    for j in range(i+1):\n",
    "                        string = string + str(\"  |  a\"+str(j)+\":{:.4}\".format(sisso.models[i][0].coefs[0][j]))\n",
    "                    print(string + '\\n')\n",
    "                global df\n",
    "\n",
    "            except RuntimeError:\n",
    "                print(\"\\nThe number of features selected per SIS iteration is larger than the number of features available. Please reduce the number of features selected per SIS iteration (number of features generated / max number of dimensions), or increase the number of selected features and operations.\")\n",
    "        except Exception:\n",
    "            print('The present selection does not create any derived features in the highest selected rung. Please select at least one binary or power operator, or reduce the maximum rung.')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-06-21T16:55:44.615778Z",
     "start_time": "2021-06-21T16:55:44.327382Z"
    },
    "init_cell": true,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "cb_layout = widgets.Layout(width = '15px')\n",
    "thin_layout = widgets.Layout(width = '100px')\n",
    "mid_layout = widgets.Layout(width = '200px')\n",
    "wide_layout = widgets.Layout(width = '300px')\n",
    "\n",
    "possible_operations = ['add', 'sub', 'abs_diff', 'mult', 'div', 'exp', 'neg_exp', 'inv', 'sq', 'cb', \n",
    "                        'sqrt', 'cbrt', 'log', 'abs']\n",
    "\n",
    "possible_features = ['z_cations','x_cations','l_cations','z_anions','x_anions','l_anions']\n",
    "\n",
    "tooltips = {\n",
    "    \"z_cations\" : \"Atomic number\",\n",
    "    \"x_cations\" : \"Pauling electronegativity\",\n",
    "    \"l_cations\" : \"Spin orbit coupling\",\n",
    "    \"z_anions\" : \"Atomic number\",\n",
    "    \"x_anions\" : \"Pauling electronegativity\",\n",
    "    \"l_anions\" : \"Spin orbit coupling\",\n",
    "}\n",
    "\n",
    "labels = {\n",
    "    'add' : '$x + y$', 'sub' : '$x - y$', 'abs_diff' : '$|x - y|$', 'mult' : r'$x \\cdot y$', 'div' : '$x / y$',\n",
    "    'exp' : r'$\\exp(x)$', 'neg_exp' : r'$\\exp(-x)$', 'inv' : '$1/x$', 'sq' : '$x^2$', 'cb' : '$x^3$', \n",
    "    'six_pow' : '$x^6$', 'sqrt' : r'$\\sqrt{x}$', 'cbrt' : r'$\\sqrt[3]{x}$', 'log' : r'$\\log(x)$',\n",
    "    'abs' :  '$|x|$', 'sin' : r'$\\sin(x)$', 'cos' : r'$\\cos(x)$', 'z_cations' : '$Z_{cations}$', 'x_cations' : r'$\\chi_{cations}$', \n",
    "    'l_cations' : r'$\\lambda_{cations}$', 'z_anions' : '$Z_{anions}$', 'x_anions' : r'$\\chi_{anions}$', 'l_anions' : r'$\\lambda_{anions}$'  \n",
    "}\n",
    "\n",
    "op_list = []\n",
    "op_labels  = []\n",
    "feat_list = []\n",
    "feat_labels = []\n",
    "for operation in possible_operations:\n",
    "    op_list.append(widgets.Checkbox(description='', value=True, indent=False, layout=cb_layout))\n",
    "    op_labels.append(widgets.Label(value=labels[operation]))\n",
    "for feature in possible_features:\n",
    "    feat_list.append(widgets.Checkbox(description=tooltips[feature], value=True, indent=False, layout=cb_layout))\n",
    "    feat_labels.append(widgets.Label(value=labels[feature]))\n",
    "    \n",
    "op_box = widgets.VBox([widgets.Label()]+op_list)\n",
    "op_label_box = widgets.VBox([widgets.Label(value='Operations:', layout=thin_layout)]+op_labels)\n",
    "feat_box = widgets.VBox([widgets.Label()]+feat_list)\n",
    "feat_label_box = widgets.VBox([widgets.Label(value='Features:', layout=thin_layout)]+feat_labels)\n",
    "\n",
    "rung_selection = widgets.Dropdown(options=['PRM2020', 1,2,3], layout=thin_layout)\n",
    "feat_per_iter_selection = widgets.BoundedIntText(value=26, min=10, max=100, step=1, layout=thin_layout)\n",
    "dimension_selection = widgets.BoundedIntText(value = 3, min=1, max=4, step=1, layout = thin_layout)\n",
    "settings_box = widgets.VBox([\n",
    "    widgets.Label(value='Settings:', layout=wide_layout),\n",
    "    widgets.Label(value='SISSO rung:', layout=wide_layout),\n",
    "    rung_selection,\n",
    "    widgets.Label(value='To unfreeze the feature selection,' , layout=wide_layout),\n",
    "    widgets.Label(value='please select any rung other than PRM2020.', layout=widgets.Layout(width = '300px', bottom='10px') ),\n",
    "    widgets.Label(value='Number of selected features per SIS iteration:',  layout=wide_layout),\n",
    "    feat_per_iter_selection,\n",
    "    widgets.Label(value='Maximum number of dimensions:', layout=wide_layout),\n",
    "    dimension_selection])\n",
    "\n",
    "default_button = widgets.Button(description = 'Default selection', layout=mid_layout)\n",
    "run_button = widgets.Button(description = 'Run', layout=mid_layout)\n",
    "plot_button = widgets.Button(description = 'Plot interactive map', disabled=True, layout=mid_layout)\n",
    "default_button.on_click(default_button_clicked)\n",
    "run_button.on_click(run_button_clicked)\n",
    "plot_button.on_click(plot_button_clicked)\n",
    "button_box = widgets.VBox([default_button, run_button, plot_button])\n",
    "\n",
    "rung_selection.observe(handle_rung_selection, names='value')\n",
    "\n",
    "\n",
    "out1 = widgets.Output()\n",
    "out2 = widgets.Output()\n",
    "\n",
    "gui_box = widgets.HBox([op_box, op_label_box, feat_box, feat_label_box, settings_box, button_box])\n",
    "out_box = widgets.VBox([gui_box, out1, out2])\n",
    "\n",
    "display(out_box)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "celltoolbar": "Initialization Cell",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}