diff --git a/exploratory_analysis.ipynb b/exploratory_analysis.ipynb
index 45cd401c590e15eb30c3074a6055dd99ba33cfda..0fa07c2b49e73d64de3463b22b75c30eeb1118a2 100644
--- a/exploratory_analysis.ipynb
+++ b/exploratory_analysis.ipynb
@@ -36,7 +36,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "In this tutorial, we use unsupervised learning for a preliminary exploration of materials science data. More specifically, we analyze 82 octet binary materials known to crystallize in zinc blende (ZB) and rocksalst (RS) structures. Our aim is to show how to facilitate the visualization of unlabeled data and gain an understanding of the relevant inner structures inside the dataset. As a first step in our data analysis, we would like to detect whether data points can be classified into different  clusters, where each cluster is aimed to group together objects that share similar features. With an explorative analysis we would like to visualize the structure and spatial displacement of the clusters, but when the feature space is higlhly multidimensional such visualization is directly not possible. Hence, we project the feature space into a two-dimensional manifold which, instead, can be  visualized. To avoid losing relevant information, the embedding into a lower dimensional manifold must be performed while preserving the most informative features in the original space. Below we introduce into different clustering and embedding methods, which can be combined to obtain different visualizations of our dataset."
+    "In this tutorial, we use unsupervised learning for a preliminary exploration of materials science data. More specifically, we analyze 82 octet binary materials known to crystallize in zinc blende (ZB) and rocksalt (RS) structures. Our aim is to show how to facilitate the visualization of unlabeled data and gain an understanding of the relevant inner structure of the dataset. As a first step in our data analysis, we would like to detect whether data points can be grouped into different clusters, where each cluster groups together objects that share similar features. With an exploratory analysis we would like to visualize the structure and spatial arrangement of the clusters, but when the feature space is highly multidimensional such visualization is not directly possible. Hence, we project the feature space onto a two-dimensional manifold, which can be visualized. To avoid losing relevant information, the embedding into a lower-dimensional manifold must be performed while preserving the most informative features of the original space. Below we introduce different clustering and embedding methods, which can be combined to obtain different visualizations of our dataset."
    ]
   },
   {
@@ -50,10 +50,10 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Cluster analysis is performed to group together data points that are more similar to each other in comparison with points belonging to other clusters. Clustering can be achieved by means of many different algorithms, each with proper characteristics and input parameters. The choice of the clustering algorithms to be used depends on the specific data set analyzed, and, once an optimal algorithm has been chosen, it is often necessary to iteratively modify the input parameters until results achieve the desired resolution. We focus on four different algorithms as described below.\n",
-    "- ___k_-means__ partitions the data set into _k_ clusters, where each data point belongs to the cluster with the nearest mean. This partition ultimately minimizes the within-cluster variance to find the most compact partitioning of the data set. _K_-means uses an iterative refinement technique that is fast and scalable, but if falls in local minima. Thus, the algorithm is iterated multiple times with different initial conditions and the best outcome is finally chosen. Drawbacks of this algorithm are that the number of clusters _k_ is an input parameter which must be known in advance and clusters are convex shaped.\n",
-    "- __Hierarchical clustering__ builds a hierarchy of clusters with a bottom-up (__agglomerative__) or top-down (__divisive__) approach. In a bottom-up approach, that we deploy below, starting with all data points placed in its own cluster, different pairs of clusters are iteratively merged together where the decision of the clusters to be merged is determined in a greedy manner. This is iterated until all points are grouped within one cluster, and the resulting hierarchy of clusters is presentend in a dendogram. If a distance thereshold is given,  merging of clusters when outside this distance, this stops the algorithm when no more mergings are possible. The algorithm then returns a certain number of clusters as a function of the threshold distance . An advantage of this algorithm is that the construction of dendroids allows for a visual inspection of the clustering, but hierarchical clustering is considerably slower than the other algorithms discussed above and not well suited for big data.\n",
-    "- Density-based spatial clustering of applications with noise (__DBSCAN__) is a  algorithm that, without knowing the exact number of clusters, groups points that are close to each other leaving outliers marked as noise and not defined in any cluster. In this algorithm a neighborood distance _$\\epsilon$_  and a number of points _min-samples_ are used to determine whether a point belongs to a cluster: in case the point has a number _min-samples_ of other points  within the distance _$\\epsilon$_ is marked as core point and belongs to a cluster; otherwise, the point is marked as noise. This algorithm is fast and clusters can assume any shape, but the choice of the distance _$\\epsilon$_ migth be non trivial.\n",
+    "Cluster analysis is performed to group together data points that are more similar to each other than to points belonging to other clusters. Clustering can be achieved by means of many different algorithms, each with its own characteristics and input parameters. The choice of clustering algorithm depends on the specific dataset analyzed, and, once an algorithm has been chosen, it is often necessary to iteratively tune its input parameters until the results achieve the desired resolution. We focus on four different algorithms, described below.\n",
+    "- ___k_-means__ partitions the dataset into _k_ clusters, where each data point belongs to the cluster with the nearest mean. This partition minimizes the within-cluster variance, i.e. it seeks the most compact partitioning of the dataset. _k_-means uses an iterative refinement technique that is fast and scalable, but it can get trapped in local minima. Thus, the algorithm is run multiple times with different initial conditions and the best outcome (lowest within-cluster variance) is chosen. The drawbacks of this algorithm are that the number of clusters _k_ is an input parameter that must be known in advance, and that the clusters it finds are convex.\n",
+    "- __Hierarchical clustering__ builds a hierarchy of clusters with a bottom-up (__agglomerative__) or top-down (__divisive__) approach. In the bottom-up approach, which we use below, each data point starts in its own cluster, and pairs of clusters are iteratively merged, with the pair to be merged chosen in a greedy manner. This is iterated until all points are grouped within one cluster, and the resulting hierarchy of clusters is presented in a dendrogram. If a distance threshold is given, clusters that are farther apart than the threshold are not merged, so the algorithm stops when no more merges are possible and returns a number of clusters that depends on the threshold distance. An advantage of this algorithm is that the dendrogram allows for a visual inspection of the clustering, but hierarchical clustering is comparatively slow, since it works with the pairwise distances between all points, and it is not well suited for big data.\n",
+    "- Density-based spatial clustering of applications with noise (__DBSCAN__) is an algorithm that does not require the number of clusters in advance: it groups points that are close to each other and marks outliers as noise, without assigning them to any cluster. In this algorithm, a neighborhood distance _$\\epsilon$_ and a number of points _min-samples_ determine whether a point belongs to a cluster: a point with at least _min-samples_ points within the distance _$\\epsilon$_ is marked as a core point, points within _$\\epsilon$_ of a core point join its cluster, and points that cannot be reached from any core point are marked as noise. This algorithm is fast and clusters can assume any shape, but the choice of the distance _$\\epsilon$_ might be nontrivial.\n",
     "- The fast search and find of density peaks (__DenPeak__) algorithm is a density-based algorithm that is able to automatically locate non-spherical clusters. Density peaks are assumed to be sourrounded by lower density regions. Based on the position of the highest density peak, the peaks can be visualized on a graph that shows their sourrounding density and the distance from the first peak. It is then possible to choose the peaks to include from this plot, where each peak represents a different cluster."
    ]
   },
@@ -68,7 +68,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Visualization of a dataset is not possible when it is defined in a highly multidimensional space, but a visual analysis can help to detect inner structures in the dataset. Hence, in order to make such visualization possible, we reduce the dimensionality of the system with methodologies specifically developed to avoid losing critical information during the embedding into a lower dimensionality space. In this tutorial, we use three different embedding methods that are summarize below.\n",
+    "Direct visualization of a dataset is not possible when it is defined in a high-dimensional space, yet a visual analysis can help to detect inner structures in the dataset. Hence, in order to make such visualization possible, we reduce the dimensionality of the data with methods specifically developed to avoid losing critical information during the embedding into a lower-dimensional space. In this tutorial, we use three different embedding methods, summarized below.\n",
     "- Principal component analysis (__PCA__) is a linear projection method that seeks for an orthogonal transformation of the dataset so as to render the variables of the dataset uncorrelated. The dimensionality reduction is then performed onto the features with highest variance to preserve as much information as possible. This is a deterministic but linear method, that fails to catch non linear correlations.\n",
     "- Multi-dimensional scaling (__MDS__) constructs a pairwise distance matrix in the original space, and seeks a low-dimensional representation that preserves the original distances as much as possible. This method tends to preserve local structures better than global structures and scales badly with the number of data points. \n",
     "- T-distributed Stochastic Neighbor Embedding (__t-SNE__) is a non-linear dimensionality reduction method that converts similarities between data points to joint probabilities and minimizes the Kullback-Leibler divergence between the joint probabilities of the embedding and the original space. The cost function is not convex and results depend on the inizialization. Non linear effects in this method might occasionally produce misleading results, therefore several iterations of the method are recommended.\n"
@@ -90,8 +90,13 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
+   "execution_count": 88,
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2021-01-01T16:07:03.608283Z",
+     "start_time": "2021-01-01T16:07:03.598933Z"
+    }
+   },
    "outputs": [],
    "source": [
     "from ase.io import read\n",
@@ -105,18 +110,26 @@
     "from sklearn.svm import SVC\n",
     "from sklearn.model_selection import train_test_split\n",
     "import plotly.express as px\n",
+    "import plotly.graph_objects as go\n",
     "import ipywidgets as widgets\n",
-    "from IPython.display import display\n",
+    "from IPython.display import display, clear_output\n",
     "from pydpc import Cluster as DPCClustering\n",
     "import matplotlib.pyplot as plt"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
+   "execution_count": 89,
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2021-01-01T16:07:04.219755Z",
+     "start_time": "2021-01-01T16:07:04.215903Z"
+    }
+   },
    "outputs": [],
    "source": [
+    "import warnings\n",
+    "warnings.filterwarnings(\"ignore\", category=DeprecationWarning)  # hide DeprecationWarning messages in the notebook output\n",
     "pd.options.mode.chained_assignment = None"
    ]
   },
@@ -125,7 +138,7 @@
    "metadata": {},
    "source": [
     "# Get the data\n",
-    "We load the data and place it into a Panda's dataframe. Data has been downloaded from the NOMAD archive and the NOMAD atomic data collection. It consists of RS-ZB energy differences (in eV/atom) of the 82 octet binary compounds, structure objects containing the atomic positions of the materials and properties of the atomic constituents. The following atomic features are included:\n",
+    "We load the data and place it into a pandas DataFrame. The data have been downloaded from the NOMAD Archive and the NOMAD atomic data collection. They consist of the RS-ZB energy differences (in eV/atom) of the 82 octet binary compounds, structure objects containing the atomic positions of the materials, and properties of the atomic constituents. The following atomic features are included:\n",
     "\n",
     "- Z:  atomic number\n",
     "- period: period in the periodic table\n",
@@ -138,8 +151,12 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 25,
    "metadata": {
+    "ExecuteTime": {
+     "end_time": "2021-01-01T15:08:43.334067Z",
+     "start_time": "2021-01-01T15:08:43.240012Z"
+    },
     "scrolled": true
    },
    "outputs": [],
@@ -189,13 +206,18 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "A 'Clustering' class is defined that includes all clustering algorithms that are covered during the tutorial. Before creating an instance of this class, a dataframe variable 'df' must have been defined. The clustering functions in the class, assign labels to the entries in the dataframe according to the results of the clustering."
+    "We define a 'Clustering' class that includes all the clustering algorithms covered in this tutorial. A dataframe variable 'df' must be defined before an instance of this class is created. The clustering methods of the class assign labels to the entries of the dataframe according to the outcome of the clustering."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
+   "execution_count": 8,
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2021-01-01T14:50:23.468443Z",
+     "start_time": "2021-01-01T14:50:23.458779Z"
+    }
+   },
    "outputs": [],
    "source": [
     "class Clustering:\n",
@@ -248,73 +270,125 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The embedding algorithms are handled with a graphical interface that is generated using Jupyter Widgets, that allows to generate a plot with the desiered embedding algorithm by pushing a bottom. Before plotting data with any embedding algorithm, a dataframe 'df' must have been defined and cluster labels assigned to each data point."
+    "The embedding algorithms are handled with a graphical interface built with Jupyter Widgets, which generates a plot with the desired embedding algorithm at the push of a button. Before plotting data with any of the embedding algorithms, a dataframe 'df' must have been defined, and cluster labels must have been assigned to each data point."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
+   "execution_count": 320,
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2021-01-01T20:10:15.901128Z",
+     "start_time": "2021-01-01T20:10:15.888977Z"
+    }
+   },
    "outputs": [],
    "source": [
-    "btn_PCA = widgets.Button(description='PCA')\n",
-    "btn_MDS = widgets.Button(description='MDS')\n",
-    "btn_tSNE = widgets.Button(description='t-SNE')\n",
-    "btn_kmeans = widgets.Button(description='k-means')\n",
-    "btn_hierarchical = widgets.Button(description='hierarchical')\n",
-    "btn_dbscan = widgets.Button(description='DBSCAN')\n",
-    "btn_plot = widgets.Button (description='plot')\n",
+    "def show_embedding():\n",
     "\n",
+    "    btn_PCA = widgets.Button(description='PCA')\n",
+    "    btn_MDS = widgets.Button(description='MDS')\n",
+    "    btn_tSNE = widgets.Button(description='t-SNE')\n",
+    "    btn_kmeans = widgets.Button(description='k-means')\n",
+    "    btn_hierarchical = widgets.Button(description='hierarchical')\n",
+    "    btn_dbscan = widgets.Button(description='DBSCAN')\n",
+    "    btn_plot = widgets.Button(description='plot')\n",
     "\n",
-    "def btn_eventhandler_embedding (obj):\n",
+    "    def btn_eventhandler_embedding(obj):\n",
     "\n",
-    "    method = str (obj.description)\n",
-    "    \n",
-    "    try:\n",
-    "        df \n",
-    "    except NameError:\n",
-    "        print(\"Please define a dataframe 'df'\")\n",
-    "        return\n",
-    "    try:\n",
-    "        df['clustering'][0]\n",
-    "    except KeyError:\n",
-    "        print(\"Please assign labels with a clustering algorithm\")\n",
-    "        return\n",
-    "    try:\n",
-    "        hover_features\n",
-    "    except NameError:\n",
-    "        print(\"Please create a list 'hover_features' containing all hover features\")\n",
-    "        return\n",
-    "              \n",
-    "    if (method == 'PCA'):\n",
-    "        transformed_data = PCA(n_components=2).fit_transform(df[features])\n",
-    "        df['x_emb']=transformed_data[:,0]\n",
-    "        df['y_emb']=transformed_data[:,1]\n",
-    "        df['embedding'] = 'PCA'\n",
-    "    elif (method == 'MDS'):\n",
-    "        transformed_data = MDS (n_components=2).fit_transform(df[features])\n",
-    "        df['x_emb']=transformed_data[:,0]\n",
-    "        df['y_emb']=transformed_data[:,1]\n",
-    "        df['embedding'] = 'MDS'\n",
-    "    elif (method == 't-SNE'):\n",
-    "        transformed_data = TSNE (n_components=2).fit_transform(df[features])\n",
-    "        df['x_emb']=transformed_data[:,0]\n",
-    "        df['y_emb']=transformed_data[:,1]\n",
-    "        df['embedding'] = 't-SNE'\n",
-    "    plot_embedding()\n",
+    "        method = obj.description  # the button label selects the embedding method\n",
     "\n",
-    "def plot_embedding():\n",
-    "    print (\"Clustering algorithm used: \",df['clustering'][0], \"\\t Embedding method used: \", df['embedding'][0])    \n",
-    "#     df[\"labels\"]=df[\"labels\"].astype(str)\n",
-    "    display(px.scatter(df,x='x_emb',y='y_emb',color=df['labels'].astype(str),hover_data=df[hover_features], hover_name=df.index ))\n",
-    "    \n",
-    "    \n",
-    "btn_PCA.on_click(btn_eventhandler_embedding)\n",
-    "btn_MDS.on_click(btn_eventhandler_embedding)\n",
-    "btn_tSNE.on_click(btn_eventhandler_embedding)\n",
+    "        try:\n",
+    "            df \n",
+    "        except NameError:\n",
+    "            print(\"Please define a dataframe 'df'\")\n",
+    "            return\n",
+    "        try:\n",
+    "            df['clustering'].iloc[0]\n",
+    "        except KeyError:\n",
+    "            print(\"Please assign labels with a clustering algorithm\")\n",
+    "            return\n",
+    "        try:\n",
+    "            hover_features\n",
+    "        except NameError:\n",
+    "            print(\"Please create a list 'hover_features' containing all hover features\")\n",
+    "            return\n",
     "\n",
+    "        if (method == 'PCA'):\n",
+    "            transformed_data = PCA(n_components=2).fit_transform(df[features])\n",
+    "            df['x_emb']=transformed_data[:,0]\n",
+    "            df['y_emb']=transformed_data[:,1]\n",
+    "            df['embedding'] = 'PCA'\n",
+    "        elif (method == 'MDS'):\n",
+    "            transformed_data = MDS(n_components=2).fit_transform(df[features])\n",
+    "            df['x_emb']=transformed_data[:,0]\n",
+    "            df['y_emb']=transformed_data[:,1]\n",
+    "            df['embedding'] = 'MDS'\n",
+    "        elif (method == 't-SNE'):\n",
+    "            transformed_data = TSNE(n_components=2).fit_transform(df[features])\n",
+    "            df['x_emb']=transformed_data[:,0]\n",
+    "            df['y_emb']=transformed_data[:,1]\n",
+    "            df['embedding'] = 't-SNE'\n",
+    "        plot_embedding()\n",
     "\n",
-    "box = widgets.HBox ([btn_PCA,btn_MDS,btn_tSNE])"
+    "    def plot_embedding():\n",
+    "        with fig.batch_update():\n",
+    "            fig['data'][0]['x']=df['x_emb']\n",
+    "            fig['data'][0]['y']=df['y_emb']\n",
+    "            fig['data'][0]['customdata']=np.expand_dims(df['min_struc_type'].to_numpy(),axis=1)\n",
+    "            fig['data'][0]['hovertemplate']=r\"<b>%{text}</b><br><br> Low energy structure:  %{customdata[0]}<br>\"\n",
+    "            fig['data'][0]['marker'].color=df['labels'].to_numpy()\n",
+    "            fig['data'][0]['marker'].colorscale=((0, '#14213d'), (1, '#fca311'))\n",
+    "            fig['data'][0]['text']=df.index\n",
+    "            fig.update_layout(plot_bgcolor='rgba(229,236,246, 0.5)',\n",
+    "                              xaxis=dict(visible=True),\n",
+    "                              yaxis=dict(visible=True))\n",
+    "        label.value = f\"Clustering algorithm used: {df['clustering'].iloc[0]} | Embedding method used: {df['embedding'].iloc[0]}\"\n",
+    "\n",
+    "    btn_PCA.on_click(btn_eventhandler_embedding)\n",
+    "    btn_MDS.on_click(btn_eventhandler_embedding)\n",
+    "    btn_tSNE.on_click(btn_eventhandler_embedding)\n",
+    "    label = widgets.Label(value='Select a dimension reduction method to visualize the 2-dimensional embedding')\n",
+    "    fig = go.FigureWidget()\n",
+    "    fig.add_trace(go.Scatter(\n",
+    "        name='embedding',\n",
+    "        mode='markers',\n",
+    "    ))\n",
+    "    fig.update_layout(plot_bgcolor='rgba(229,236,246, 0.5)',\n",
+    "                      xaxis=dict(visible=False, title='x_emb'),\n",
+    "                      yaxis=dict(visible=False, title='y_emb'))\n",
+    "\n",
+    "    return widgets.VBox([widgets.HBox([btn_PCA, btn_MDS, btn_tSNE]), label, fig])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 322,
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2021-01-01T20:11:05.452861Z",
+     "start_time": "2021-01-01T20:11:05.304006Z"
+    },
+    "scrolled": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "fae6d0da3c25423791f763044233393e",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "VBox(children=(HBox(children=(Button(description='PCA', style=ButtonStyle()), Button(description='MDS', style=…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "show_embedding()"
    ]
   },
   {
@@ -326,8 +400,13 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
+   "execution_count": 35,
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2021-01-01T15:13:02.811407Z",
+     "start_time": "2021-01-01T15:13:02.806201Z"
+    }
+   },
    "outputs": [],
    "source": [
     "features = []\n",
@@ -353,13 +432,17 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Feature standardization if the operation of rescaling data so as to be shaped as a Gaussian with zero mean and unit variance, and it is a common requirement for machine learning algorithms. In fact, estimators can be biased towards dimensions presenting higher absolute values, or outliers can undermine the learning capabilites  of the algorithm. Hence, we standardize our dataset by subtracting the mean value and dividing it by the standard deviation for each variable."
+    "Feature standardization is the operation of rescaling each feature to zero mean and unit variance, and it is a common requirement for machine learning algorithms. In fact, estimators can be biased towards features with larger absolute values or wider ranges, which can then dominate distance-based algorithms such as _k_-means. Hence, we standardize our dataset by subtracting the mean value and dividing by the standard deviation of each feature, i.e. $x' = (x - \\mu)/\\sigma$, where $\\mu$ and $\\sigma$ are the mean and standard deviation of that feature."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 36,
    "metadata": {
+    "ExecuteTime": {
+     "end_time": "2021-01-01T15:13:03.146889Z",
+     "start_time": "2021-01-01T15:13:03.133137Z"
+    },
     "scrolled": true
    },
    "outputs": [],
@@ -376,13 +459,30 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 37,
    "metadata": {
+    "ExecuteTime": {
+     "end_time": "2021-01-01T15:13:05.302852Z",
+     "start_time": "2021-01-01T15:13:03.661340Z"
+    },
     "scrolled": true
    },
-   "outputs": [],
+   "outputs": [
+    {
+     "data": {
+      "image/png": "\n",
+      "text/plain": [
+       "<Figure size 1440x1080 with 16 Axes>"
+      ]
+     },
+     "metadata": {
+      "needs_background": "light"
+     },
+     "output_type": "display_data"
+    }
+   ],
    "source": [
-    "hist = df[features].hist( bins=10, figsize = (20,15))"
+    "hist = df[features].hist(bins=10, figsize=(20, 15));"
    ]
   },
   {
@@ -397,7 +497,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "K-means requires the knowledge of the number of clusters and clustering depends on the initial conditions, hence the algorithm is iterated,  up to _max\\_iter_ times, with different initial conditions until convergence. As we know that our octet binary materials crystallize in the RS and ZB structures, a natural distinction in this dataset is between materials with the most stable conformationzin in the RS vs ZB structure. Hence we seek for two clusters, aiming to find clusters of materials with the same most stable structure. "
+    "$k$-means requires the number of clusters to be known in advance, and its outcome depends on the initial conditions. Each run refines the clusters until convergence, for at most _max\\_iter_ iterations, and the algorithm is repeated with different initial conditions to keep the best result. Since our octet binary materials crystallize in either the RS or the ZB structure, a natural partition of this dataset separates materials whose most stable structure is RS from those whose most stable structure is ZB. Hence, we seek two clusters, aiming to group together materials with the same most stable structure."
    ]
   },
   {
@@ -409,8 +509,12 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 94,
    "metadata": {
+    "ExecuteTime": {
+     "end_time": "2021-01-01T16:08:09.067069Z",
+     "start_time": "2021-01-01T16:08:09.040255Z"
+    },
     "scrolled": true
    },
    "outputs": [],
@@ -422,9 +526,32 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
+   "execution_count": 95,
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2021-01-01T16:08:09.336832Z",
+     "start_time": "2021-01-01T16:08:09.331988Z"
+    }
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "AgBr    0\n",
+      "AgCl    1\n",
+      "AgF     1\n",
+      "AgI     0\n",
+      "AlAs    0\n",
+      "AlN     1\n",
+      "AlP     0\n",
+      "AlSb    0\n",
+      "AsGa    0\n",
+      "AsB     0\n",
+      "Name: labels, dtype: int32\n"
+     ]
+    }
+   ],
    "source": [
     "print(df['labels'][:10])"
    ]
@@ -440,25 +567,132 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Clicking any of the bottons below will display the dataset embedding according to the label placed on the botton. Different clusters are visualized with different colors, and by hovering over points it is possible to see the material they represent and some defined features. In this case we are interested to see which is the lowest energy structure of the materials, then we select only the 'min_struc_type' as hovering feature. Please note that any other feature can be added to the 'hover_features' list."
+    "Clicking any of the buttons below displays the two-dimensional embedding of the dataset obtained with the method named on the button. Different clusters are shown in different colors, and hovering over a point shows the material it represents together with some selected features. Here we are interested in the lowest-energy structure of each material, so we select only 'min_struc_type' as hover feature. Please note that any other feature can be added to the 'hover_features' list."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
+   "execution_count": 101,
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2021-01-01T16:10:25.790100Z",
+     "start_time": "2021-01-01T16:10:25.732129Z"
+    }
+   },
    "outputs": [],
    "source": [
-    "hover_features = ['min_struc_type']"
+    "btn_PCA = widgets.Button(description='PCA')\n",
+    "btn_MDS = widgets.Button(description='MDS')\n",
+    "btn_tSNE = widgets.Button(description='t-SNE')\n",
+    "btn_kmeans = widgets.Button(description='k-means')\n",
+    "btn_hierarchical = widgets.Button(description='hierarchical')\n",
+    "btn_dbscan = widgets.Button(description='DBSCAN')\n",
+    "btn_plot = widgets.Button (description='plot')\n",
+    "hover_features = ['min_struc_type']  # features shown when hovering over points\n",
+    "\n",
+    "def btn_eventhandler_embedding (obj):\n",
+    "\n",
+    "    method = str (obj.description)\n",
+    "    \n",
+    "    try:\n",
+    "        df \n",
+    "    except NameError:\n",
+    "        print(\"Please define a dataframe 'df'\")\n",
+    "        return\n",
+    "    try:\n",
+    "        df['clustering'].iloc[0]\n",
+    "    except KeyError:\n",
+    "        print(\"Please assign labels with a clustering algorithm\")\n",
+    "        return\n",
+    "    try:\n",
+    "        hover_features\n",
+    "    except NameError:\n",
+    "        print(\"Please create a list 'hover_features' containing all hover features\")\n",
+    "        return\n",
+    "              \n",
+    "    if (method == 'PCA'):\n",
+    "        transformed_data = PCA(n_components=2).fit_transform(df[features])\n",
+    "        df['x_emb']=transformed_data[:,0]\n",
+    "        df['y_emb']=transformed_data[:,1]\n",
+    "        df['embedding'] = 'PCA'\n",
+    "    elif (method == 'MDS'):\n",
+    "        transformed_data = MDS (n_components=2).fit_transform(df[features])\n",
+    "        df['x_emb']=transformed_data[:,0]\n",
+    "        df['y_emb']=transformed_data[:,1]\n",
+    "        df['embedding'] = 'MDS'\n",
+    "    elif (method == 't-SNE'):\n",
+    "        transformed_data = TSNE (n_components=2).fit_transform(df[features])\n",
+    "        df['x_emb']=transformed_data[:,0]\n",
+    "        df['y_emb']=transformed_data[:,1]\n",
+    "        df['embedding'] = 't-SNE'\n",
+    "    plot_embedding()\n",
+    "\n",
+    "def plot_embedding():\n",
+    "    clear_output()\n",
+    "\n",
+    "    fig = go.FigureWidget()\n",
+    "\n",
+    "    fig.add_trace(go.Scatter(x=df['x_emb'],y=df['y_emb'],\n",
+    "                             mode='markers',\n",
+    "                             marker=dict(color=df['labels']),\n",
+    "                             customdata=np.expand_dims(df['min_struc_type'].to_numpy(),axis=1),\n",
+    "                             text=df.index,\n",
+    "                             hovertemplate=\n",
+    "                             r\"<b>%{text}</b><br><br>\" +\n",
+    "                             \"x axis: %{x:,.2f}<br>\" +\n",
+    "                             \"y axis: %{y:,.2f}<br>\" +\n",
+    "                             \"Lowest-energy structure: %{customdata[0]}<br>\"\n",
+    "                            ))\n",
+    "    print(\"Clustering algorithm used: \", df['clustering'].iloc[0], \"\\t Embedding method used: \", df['embedding'].iloc[0])\n",
+    "#     df[\"labels\"]=df[\"labels\"].astype(str)\n",
+    "#     display(px.scatter(df,x='x_emb',y='y_emb',color=df['labels'].astype(str),hover_data=df[hover_features], hover_name=df.index ))\n",
+    "#     with output:\n",
+    "    display(fig)\n",
+    "    \n",
+    "btn_PCA.on_click(btn_eventhandler_embedding)\n",
+    "btn_MDS.on_click(btn_eventhandler_embedding)\n",
+    "btn_tSNE.on_click(btn_eventhandler_embedding)\n",
+    "\n",
+    "\n",
+    "box = widgets.HBox([btn_PCA, btn_MDS, btn_tSNE])"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 104,
    "metadata": {
+    "ExecuteTime": {
+     "end_time": "2021-01-01T16:10:36.562307Z",
+     "start_time": "2021-01-01T16:10:36.553397Z"
+    },
     "scrolled": false
    },
-   "outputs": [],
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Clustering algorithm used:  k-means \t Embedding method used:  t-SNE\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "372ac81158154de79dae475e8db9b9d6",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "FigureWidget({\n",
+       "    'data': [{'customdata': array([['RS'],\n",
+       "                                   ['RS'],\n",
+       "         …"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
    "source": [
     "display(box)"
    ]
@@ -481,8 +715,13 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
+   "execution_count": 14,
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2020-12-18T18:43:38.603954Z",
+     "start_time": "2020-12-18T18:43:38.597727Z"
+    }
+   },
    "outputs": [],
    "source": [
     "def composition_RS_ZB (df):\n",
@@ -503,11 +742,69 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 15,
    "metadata": {
+    "ExecuteTime": {
+     "end_time": "2020-12-18T18:43:39.273928Z",
+     "start_time": "2020-12-18T18:43:39.238298Z"
+    },
     "scrolled": true
    },
-   "outputs": [],
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>RS</th>\n",
+       "      <th>ZB</th>\n",
+       "      <th>Materials in cluster</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>85</td>\n",
+       "      <td>15</td>\n",
+       "      <td>40</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>16</td>\n",
+       "      <td>83</td>\n",
+       "      <td>42</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   RS  ZB Materials in cluster\n",
+       "0  85  15                   40\n",
+       "1  16  83                   42"
+      ]
+     },
+     "execution_count": 15,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
    "source": [
     "composition_RS_ZB(df)"
    ]
@@ -516,7 +813,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "We can see that K-means finds two distinct clusters, and in one of these clusters there are more 'RS' stable structures while in the other there are more 'ZB' stable structures. This is a hint that in the space described by the atomic features, materials with the same most stable structure are close to each other. On the other hand, we know that K-means is only able to detect spherically shaped clusters, therefore delimiting clusters containing only one specific stable structure is difficult under this assumption."
+    "We can see that $k$-means finds two distinct clusters: one contains mostly materials whose most stable structure is RS, while the other contains mostly materials whose most stable structure is ZB. This suggests that, in the space spanned by the atomic features, materials sharing the same most stable structure lie close to each other. On the other hand, $k$-means implicitly assumes convex, roughly spherical clusters, so it may struggle to delimit clusters that each contain only one stable structure when the true groups do not have this shape."
    ]
   },
   {
@@ -536,8 +833,13 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
+   "execution_count": 16,
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2020-12-18T18:50:54.589458Z",
+     "start_time": "2020-12-18T18:50:54.569294Z"
+    }
+   },
    "outputs": [],
    "source": [
     "eps = 3\n",
@@ -547,20 +849,98 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 17,
    "metadata": {
+    "ExecuteTime": {
+     "end_time": "2020-12-18T18:50:55.527429Z",
+     "start_time": "2020-12-18T18:50:55.518975Z"
+    },
     "scrolled": true
    },
-   "outputs": [],
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "1ac9898689124574aef1ccc5775edf6f",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(Button(description='PCA', style=ButtonStyle()), Button(description='MDS', style=ButtonStyle()),…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
    "source": [
     "display(box)"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
+   "execution_count": 18,
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2020-12-18T18:50:57.413977Z",
+     "start_time": "2020-12-18T18:50:57.382729Z"
+    }
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>RS</th>\n",
+       "      <th>ZB</th>\n",
+       "      <th>Materials in cluster</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>0</td>\n",
+       "      <td>100</td>\n",
+       "      <td>17</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>70</td>\n",
+       "      <td>30</td>\n",
+       "      <td>30</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   RS   ZB Materials in cluster\n",
+       "0   0  100                   17\n",
+       "1  70   30                   30"
+      ]
+     },
+     "execution_count": 18,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
    "source": [
     "composition_RS_ZB(df)"
    ]
@@ -569,7 +949,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "We can see that the algorithm found two different clusters, and we notice that each cluster is more representative of the RS vs ZB stable structure compared to K-means. However, this happens at the cost of neglecting many points that have been classified as noise."
+    "We can see that the algorithm finds two clusters, and each of them is more strongly dominated by a single stable structure (RS or ZB) than the clusters found by $k$-means. However, this comes at the cost of leaving out many points, which are labeled as noise and assigned to no cluster."
    ]
   },
   {
@@ -597,8 +977,12 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 19,
    "metadata": {
+    "ExecuteTime": {
+     "end_time": "2020-12-18T18:51:42.316557Z",
+     "start_time": "2020-12-18T18:51:42.305922Z"
+    },
     "scrolled": true
    },
    "outputs": [],
@@ -609,11 +993,30 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 20,
    "metadata": {
+    "ExecuteTime": {
+     "end_time": "2020-12-18T18:51:42.933612Z",
+     "start_time": "2020-12-18T18:51:42.924728Z"
+    },
     "scrolled": false
    },
-   "outputs": [],
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "1ac9898689124574aef1ccc5775edf6f",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(Button(description='PCA', style=ButtonStyle()), Button(description='MDS', style=ButtonStyle()),…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
    "source": [
     "display(box)"
    ]
@@ -629,14 +1032,44 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "One advantage of hierarchical methods is that they allow to decompose and understand the clustering process. Indeed, below we plot a dendogram that shows all agglomeration steps that from having all single objects as individual clusters group objects into a unique cluster. On the y-axys there is the distance threshold, and the number of biforcations in the dendogram for a certain value on the y-axis represents the number of clusters that are generated chossing that value as distance threshold. Hence, from the dendogram we can select the value of distance threshold that we need to have a certain number of clusters. "
+    "One advantage of hierarchical methods is that they let us decompose and inspect the clustering process. Below we plot a dendrogram showing every agglomeration step, from the initial state where each object is its own cluster to the final state where all objects are merged into a single supercluster. The y-axis shows the distance threshold: the number of vertical branches crossed by a horizontal line drawn at a given height equals the number of clusters obtained when that height is used as the distance threshold. Hence, the dendrogram lets us read off the distance threshold needed to obtain a desired number of clusters."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
+   "execution_count": 21,
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2020-12-18T18:52:40.950213Z",
+     "start_time": "2020-12-18T18:52:40.723508Z"
+    }
+   },
+   "outputs": [
+    {
+     "ename": "AttributeError",
+     "evalue": "'AgglomerativeClustering' object has no attribute 'distances_'",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+      "\u001b[0;31mAttributeError\u001b[0m                            Traceback (most recent call last)",
+      "\u001b[0;32m<ipython-input-21-4f7faa6e95b9>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m     27\u001b[0m \u001b[0mplt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtitle\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Hierarchical Clustering Dendrogram'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     28\u001b[0m \u001b[0;31m# plot the top three levels of the dendrogram\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 29\u001b[0;31m \u001b[0mplot_dendrogram\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmodel\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtruncate_mode\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'level'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mp\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     30\u001b[0m \u001b[0mplt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mxlabel\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Number of points in node (or index of point if no parenthesis).\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     31\u001b[0m \u001b[0mplt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshow\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+      "\u001b[0;32m<ipython-input-21-4f7faa6e95b9>\u001b[0m in \u001b[0;36mplot_dendrogram\u001b[0;34m(model, **kwargs)\u001b[0m\n\u001b[1;32m     14\u001b[0m         \u001b[0mcounts\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mi\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcurrent_count\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     15\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 16\u001b[0;31m     linkage_matrix = np.column_stack([model.children_, model.distances_,\n\u001b[0m\u001b[1;32m     17\u001b[0m                                       counts]).astype(float)\n\u001b[1;32m     18\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
+      "\u001b[0;31mAttributeError\u001b[0m: 'AgglomerativeClustering' object has no attribute 'distances_'"
+     ]
+    },
+    {
+     "data": {
+      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXwAAAEICAYAAABcVE8dAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAVZklEQVR4nO3cf7BkZZ3f8feHGVFABJTx18wgKMgPE7HgirpZBYMGBktnTUgCqAirIWQXYypuBTZbq27pJrqVrbgG3MnIIjFunNWVVTAgq2VYNMDKTILIQDAjKAw/ZEBAHXYlA9/8cc44Pc2duT339r13uM/7VdVVfc55uvt7nu7+9NPP6dOpKiRJC98e812AJGluGPiS1AgDX5IaYeBLUiMMfElqhIEvSY0w8HcjSdYnOWE3qOOsJN/eyfarkrx7Nh9jhNtfk+S9M6lhHJK8Psnt813HOCQ5IcnG+a5Ds8fAnyNJfpjkTUPrtgu9qnpFVV0z58XtoqpaUVX/ZTYfI8meST6c5P8m2dz33yVJDh7jY8zoQwegqr5VVYePq6ZB/Yfa3yb5WZKfJlmX5IIkz5yNx9PCZ+AvAEkWT+M2i2ajljH6c+BtwBnAfsDRwDrgxPksatB0+n0azquqfYEXAR8ATgOuTJI5eOxfGve+zlHfaYiBvxsZ/BaQZI9+NPeDJA8l+UKS5/bbDk5SSd6T5C7gm/36Lya5P8mjSa5N8oqB+740yR8nuTLJZuCNSZYnuSzJpv4xLhyq5z8keTjJnUlWDKzfbjolyT9Lcls/Er01yTH9+q31b13/9hH74U3Am4GVVXVjVW2pqker6qKq+pNJ2n84yecGlrf2z+J++awkd/R13JnkHUmOBFYBr0vy8ySP9G2f2e/3XUl+nGRVkr36bSck2Zjk/CT3A58Zngbpn8PfSnJz/zz8WZJnDWz/N0nuS3Jvkvf2dR46VZ9U1eb+29/bgNcBb+nvb5TXybv7/Xkwye8M1LJX/7p4OMmtwKuH+vWH/b7eDGxOsjjJ29JNPT7Svw6OHGh/TJL/3ffzF/t9/+hO+u6AJF/tX38P99eXDdzfNUk+muS6/jm6Isnzkvxpum88N2aM3/haYODvvv4l8GvA8cCLgYeBi4baHA8cCZzUL18FHAY8H/hfwJ8OtT8D+H1gX+B64KvAj4CDgaXAmoG2rwFuBw4E/gD4k+Spo8ok/xj4MHAm8By6QHqo3/wD4PV0I/TfAz6X5EUj7PubgO9U1d0jtN2pJPsAnwRW9CPlXwFuqqrbgHOB66vq2VW1f3+TjwMvB14FHErXLx8cuMsXAs8FXgKcs4OH/SfAycAhwCuBs/paTgb+db9/h9I9f7ukqu4C1tL1K4z2OvlV4HC6b0cfHAjpDwEv6y8nAZMdlzmd7sNlf+ClwOeBfwUsAa4Erkg3/bYn8BfApXT983lg+AN+uO/2AD7TLx8E/A1w4dBtTgPeRfc8vIzudfuZ/n5u6/dBo6oqL3NwAX4I/Bx4ZODyGPDtoTZv6q/fBpw4sO1FwP8DFtMFdAEv3cnj7d+32a9fvhT47MD21wGbgMWT3PYsYMPA8t79fb2wX74GeG9//Wrg/SP2wU10o/atj/HtHbT7NLBmivsarOHDwOcGtm3tn8XAPn1f/yNgr0n2c7D/A2wGXjbUT3f2108AHgeeNbD9BGDj0HP4zoHlPwBW9dcvAf79wLZD+zoPnWofh9avAT69C6+TZQPbvwOc1l+/Azh5YNs5k+zLrw8s/y7whYHlPYB7+j54Q389A9u/DXx0R303yX69Cnh4aP9/Z2D5D4GrBpbfSvfhPe/v76fLxRH+3Pq1qtp/6wX4jZ20fQnwF/1X50fo3thPAC8YaPPLEXCSRUk+1n+1/yndmxW6EfpT2gPLgR9V1ZYdPP79W69U1WP91WdP0m453Uj+KZKcmeSmgX34O0P17MhDdME1Y1W1GfindKP5+5L89yRH7KD5EroPt3U
DNX+tX7/Vpqr62yke9v6B64+xrd9ezPbPwXS/wSwFftJfH+V1Mmo9P5rksQa3v3iwTVU92W9f2m+7p/oknuS2MNR3SfZO8p+T/Kh/zV4L7J/tjy/9eOD630yyPNlrUjtg4O++7qabhth/4PKsqrpnoM3gm+sMYCXddMF+dKM76Eatk7W/GzgoMz94djfdV+3tJHkJ3Uj9POB5/QfcLUP17Mg3gOMG53OnsJkuqLd64eDGqrq6qt5M9yHyf/q6YPv+AHiQLkReMdDn+1XVYKjM5O9l7wMG92n5rt5BkuXAscC3+lWjvE52Vs9gDQdN0mZwf++l+4DZWkv629/T39fSoWm/4f0b7rsP0E01vaaqnkP3LQFGe41oGgz83dcq4Pf74CTJkiQrd9J+X+AXdKPjvYF/N8X9f4fuTfqxJPskeVaSvzeNOi8GfivJsekc2te8D90bfFNf/9l0I/wpVdU3gK/TjVyP7Q8W7pvk3CS/PslNbgLekOSgJPsBv711Q5IX9Aca96Hrn5/TjYChGy0u6+eft45YPw38xyTP72+/NMlJjMcXgLOTHJlkb7Y/NrBT/Wj4eOArdM/dlf2mXX2dDNfz2/3B02XA+0Zo/5YkJyZ5Bl1g/wK4jm5u/QngvP75WgkcN8X97Uv3AftIugPNzsfPMgN/9/VHwOXAXyb5GXAD3YHUHfks3dfte4Bb+/Y7VFVP0M2BHgrcBWykm/rYJVX1RboDwf8N+BnwZeC5VXUr3Zzr9XTB+neB/7kLd30qXaj9GfAo3beDCbrR/3ANX+/b3Uz3082vDmzegy6Y7qWbBjmebVNp3wTWA/cnebBfdz6wAbihn2b4Bt0odMaq6iq6A8j/o3+M6/tNv9jJzS7sn/8fA58AvkQ37/5kv31XXyeDfo/uNXMn8JfAf52i/tuBdwL/ie7b0FuBt1bV41X1OPAPgffQHTN5J93zsLN9+wSwV39fN9BNn2kWZfspN0lzpf+1zC3AM3dyLOVpK8lf0x2w/sx816KOI3xpDiV5e/8zxgPofgJ6xUIJ+yTHJ3lhP6XzbrqfpDpq341MGfjpTmd/IMktO9ieJJ9MsiHdySbHjL9MacH453THNX5AN+f9L+a3nLE6HPgu3RTcB4BTq+q++S1Jg6ac0knyBroDXZ+tqqccdEtyCt3BnlPo5g7/qKpGnUOUJM2RKUf4VXUt237zO5mVdB8GVVU30P2Odiy/oZYkjc84/sBoKdufYLGxX/eUr3JJzqE/HX2fffY59ogjdnT+iyRpMuvWrXuwqpZM3fKpxhH4k50kMek8UVWtBlYDTExM1Nq1a8fw8JLUjiSTnRE9knH8Smcj259Rt4zuN8+SpN3IOAL/cuDM/tc6rwUe9ci8JO1+ppzSSfJ5un+6OzDd/35/CHgGQFWtojsb8hS6MwcfA86erWIlSdM3ZeBX1elTbC/gN8dWkSRpVnimrSQ1wsCXpEYY+JLUCANfkhph4EtSIwx8SWqEgS9JjTDwJakRBr4kNcLAl6RGGPiS1AgDX5IaYeBLUiMMfElqhIEvSY0w8CWpEQa+JDXCwJekRhj4ktQIA1+SGmHgS1IjDHxJaoSBL0mNMPAlqREGviQ1wsCXpEYY+JLUCANfkhph4EtSIwx8SWqEgS9JjTDwJakRBr4kNcLAl6RGGPiS1IiRAj/JyUluT7IhyQWTbN8vyRVJvptkfZKzx1+qJGkmpgz8JIuAi4AVwFHA6UmOGmr2m8CtVXU0cALwh0n2HHOtkqQZGGWEfxywoaruqKrHgTXAyqE2BeybJMCzgZ8AW8ZaqSRpRkYJ/KXA3QPLG/t1gy4EjgTuBb4HvL+qnhy+oyTnJFmbZO2mTZumWbIkaTpGCfxMsq6Glk8CbgJeDLwKuDDJc55yo6rVVTVRVRNLlizZxVIlSTMxSuBvBJYPLC+jG8kPOhu4rDobgDuBI8ZToiRpHEYJ/BuBw5Ic0h+IPQ24fKjNXcCJAEleABwO3DHOQiVJM7N
4qgZVtSXJecDVwCLgkqpan+Tcfvsq4CPApUm+RzcFdH5VPTiLdUuSdtGUgQ9QVVcCVw6tWzVw/V7gH4y3NEnSOHmmrSQ1wsCXpEYY+JLUCANfkhph4EtSIwx8SWqEgS9JjTDwJakRBr4kNcLAl6RGGPiS1AgDX5IaYeBLUiMMfElqhIEvSY0w8CWpEQa+JDXCwJekRhj4ktQIA1+SGmHgS1IjDHxJaoSBL0mNMPAlqREGviQ1wsCXpEYY+JLUCANfkhph4EtSIwx8SWqEgS9JjTDwJakRBr4kNcLAl6RGjBT4SU5OcnuSDUku2EGbE5LclGR9kr8ab5mSpJlaPFWDJIuAi4A3AxuBG5NcXlW3DrTZH/gUcHJV3ZXk+bNUryRpmkYZ4R8HbKiqO6rqcWANsHKozRnAZVV1F0BVPTDeMiVJMzVK4C8F7h5Y3tivG/Ry4IAk1yRZl+TMye4oyTlJ1iZZu2nTpulVLEmallECP5Osq6HlxcCxwFuAk4DfTfLyp9yoanVVTVTVxJIlS3a5WEnS9E05h083ol8+sLwMuHeSNg9W1WZgc5JrgaOB74+lSknSjI0ywr8ROCzJIUn2BE4DLh9q8xXg9UkWJ9kbeA1w23hLlSTNxJQj/KrakuQ84GpgEXBJVa1Pcm6/fVVV3Zbka8DNwJPAxVV1y2wWLknaNakano6fGxMTE7V27dp5eWxJerpKsq6qJqZzW8+0laRGGPiS1AgDX5IaYeBLUiMMfElqhIEvSY0w8CWpEQa+JDXCwJekRhj4ktQIA1+SGmHgS1IjDHxJaoSBL0mNMPAlqREGviQ1wsCXpEYY+JLUCANfkhph4EtSIwx8SWqEgS9JjTDwJakRBr4kNcLAl6RGGPiS1AgDX5IaYeBLUiMMfElqhIEvSY0w8CWpEQa+JDXCwJekRhj4ktQIA1+SGjFS4Cc5OcntSTYkuWAn7V6d5Ikkp46vREnSOEwZ+EkWARcBK4CjgNOTHLWDdh8Hrh53kZKkmRtlhH8csKGq7qiqx4E1wMpJ2r0P+BLwwBjrkySNySiBvxS4e2B5Y7/ul5IsBd4OrNrZHSU5J8naJGs3bdq0q7VKkmZglMDPJOtqaPkTwPlV9cTO7qiqVlfVRFVNLFmyZMQSJUnjsHiENhuB5QPLy4B7h9pMAGuSABwInJJkS1V9eRxFSpJmbpTAvxE4LMkhwD3AacAZgw2q6pCt15NcCnzVsJek3cuUgV9VW5KcR/frm0XAJVW1Psm5/fadzttLknYPo4zwqaorgSuH1k0a9FV11szLkiSNm2faSlIjDHxJaoSBL0mNMPAlqREGviQ1wsCXpEYY+JLUCANfkhph4EtSIwx8SWqEgS9JjTDwJakRBr4kNcLAl6RGGPiS1AgDX5IaYeBLUiMMfElqhIEvSY0w8CWpEQa+JDXCwJekRhj4ktQIA1+SGmHgS1IjDHxJaoSBL0mNMPAlqREGviQ1wsCXpEYY+JLUCANfkhph4EtSIwx8SWrESIGf5OQktyfZkOSCSba/I8nN/eW6JEePv1RJ0kxMGfhJFgEXASuAo4DTkxw11OxO4PiqeiXwEWD1uAuVJM3MKCP844ANVXVHVT0OrAFWDjaoquuq6uF+8QZg2XjLlCTN1CiBvxS4e2B5Y79uR94DXDXZhiTnJFmbZO2mTZtGr1KSNGOjBH4mWVeTNkzeSBf450+2vapWV9VEVU0sWbJk9ColSTO2eIQ2G4HlA8vLgHuHGyV5JXAxsKKqHhpPeZKkcRllhH8jcFiSQ5LsCZwGXD7YIMlBwGXAu6rq++MvU5I0U1OO8KtqS5LzgKuBRcAlVbU+ybn99lXAB4HnAZ9KArClqiZmr2xJ0q5K1aTT8bNuYmKi1q5dOy+PLUlPV0nWTXdA7Zm2ktQIA1+SGmHgS1IjDHxJaoSBL0mNMPAlqREGviQ1wsCXpEYY+JLUCANfkhph4EtSIwx8SWqEgS9JjTDwJakRBr4kNcLAl6RGGPi
S1AgDX5IaYeBLUiMMfElqhIEvSY0w8CWpEQa+JDXCwJekRhj4ktQIA1+SGmHgS1IjDHxJaoSBL0mNMPAlqREGviQ1wsCXpEYY+JLUCANfkhph4EtSI0YK/CQnJ7k9yYYkF0yyPUk+2W+/Ockx4y9VkjQTUwZ+kkXARcAK4Cjg9CRHDTVbARzWX84B/njMdUqSZmiUEf5xwIaquqOqHgfWACuH2qwEPludG4D9k7xozLVKkmZg8QhtlgJ3DyxvBF4zQpulwH2DjZKcQ/cNAOAXSW7ZpWoXrgOBB+e7iN2EfbGNfbGNfbHN4dO94SiBn0nW1TTaUFWrgdUASdZW1cQIj7/g2Rfb2Bfb2Bfb2BfbJFk73duOMqWzEVg+sLwMuHcabSRJ82iUwL8ROCzJIUn2BE4DLh9qczlwZv9rndcCj1bVfcN3JEmaP1NO6VTVliTnAVcDi4BLqmp9knP77auAK4FTgA3AY8DZIzz26mlXvfDYF9vYF9vYF9vYF9tMuy9S9ZSpdknSAuSZtpLUCANfkhox64Hv3zJsM0JfvKPvg5uTXJfk6Pmocy5M1RcD7V6d5Ikkp85lfXNplL5IckKSm5KsT/JXc13jXBnhPbJfkiuSfLfvi1GOFz7tJLkkyQM7Oldp2rlZVbN2oTvI+wPgpcCewHeBo4banAJcRfdb/tcCfz2bNc3XZcS++BXggP76ipb7YqDdN+l+FHDqfNc9j6+L/YFbgYP65efPd93z2Bf/Fvh4f30J8BNgz/mufRb64g3AMcAtO9g+rdyc7RG+f8uwzZR9UVXXVdXD/eINdOczLESjvC4A3gd8CXhgLoubY6P0xRnAZVV1F0BVLdT+GKUvCtg3SYBn0wX+lrktc/ZV1bV0+7Yj08rN2Q78Hf3lwq62WQh2dT/fQ/cJvhBN2RdJlgJvB1bNYV3zYZTXxcuBA5Jck2RdkjPnrLq5NUpfXAgcSXdi5/eA91fVk3NT3m5lWrk5yl8rzMTY/pZhARh5P5O8kS7wf3VWK5o/o/TFJ4Dzq+qJbjC3YI3SF4uBY4ETgb2A65PcUFXfn+3i5tgofXEScBPw94GXAV9P8q2q+uks17a7mVZuznbg+7cM24y0n0leCVwMrKiqh+aotrk2Sl9MAGv6sD8QOCXJlqr68pxUOHdGfY88WFWbgc1JrgWOBhZa4I/SF2cDH6tuIntDkjuBI4DvzE2Ju41p5eZsT+n4twzbTNkXSQ4CLgPetQBHb4Om7IuqOqSqDq6qg4E/B35jAYY9jPYe+Qrw+iSLk+xN92+1t81xnXNhlL64i+6bDkleQPfPkXfMaZW7h2nl5qyO8Gv2/pbhaWfEvvgg8DzgU/3IdkstwH8IHLEvmjBKX1TVbUm+BtwMPAlcXFUL7q/FR3xdfAS4NMn36KY1zq+qBfe3yUk+D5wAHJhkI/Ah4Bkws9z0rxUkqRGeaStJjTDwJakRBr4kNcLAl6RGGPiS1AgDX5IaYeBLUiP+P03AEu5791vuAAAAAElFTkSuQmCC\n",
+      "text/plain": [
+       "<Figure size 432x288 with 1 Axes>"
+      ]
+     },
+     "metadata": {
+      "needs_background": "light"
+     },
+     "output_type": "display_data"
+    }
+   ],
    "source": [
     "def plot_dendrogram(model, **kwargs):\n",
     "    # Create linkage matrix and then plot the dendrogram\n",
@@ -689,45 +1122,550 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
+   "execution_count": 22,
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2020-12-18T18:55:01.700344Z",
+     "start_time": "2020-12-18T18:55:01.658371Z"
+    },
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>energy_RS</th>\n",
+       "      <th>energy_ZB</th>\n",
+       "      <th>energy_diff</th>\n",
+       "      <th>min_struc_type</th>\n",
+       "      <th>Z(A)</th>\n",
+       "      <th>Z(B)</th>\n",
+       "      <th>period(A)</th>\n",
+       "      <th>period(B)</th>\n",
+       "      <th>IP(A)</th>\n",
+       "      <th>IP(B)</th>\n",
+       "      <th>...</th>\n",
+       "      <th>r_s(B)</th>\n",
+       "      <th>r_p(A)</th>\n",
+       "      <th>r_p(B)</th>\n",
+       "      <th>r_d(A)</th>\n",
+       "      <th>r_d(B)</th>\n",
+       "      <th>clustering</th>\n",
+       "      <th>labels</th>\n",
+       "      <th>x_emb</th>\n",
+       "      <th>y_emb</th>\n",
+       "      <th>embedding</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>AgBr</th>\n",
+       "      <td>-108781.333959</td>\n",
+       "      <td>-108781.303925</td>\n",
+       "      <td>-0.030033</td>\n",
+       "      <td>RS</td>\n",
+       "      <td>1.044160</td>\n",
+       "      <td>0.500776</td>\n",
+       "      <td>5.0</td>\n",
+       "      <td>4.0</td>\n",
+       "      <td>-0.682185</td>\n",
+       "      <td>-0.155241</td>\n",
+       "      <td>...</td>\n",
+       "      <td>0.039278</td>\n",
+       "      <td>-0.022042</td>\n",
+       "      <td>0.109901</td>\n",
+       "      <td>0.249586</td>\n",
+       "      <td>-0.095787</td>\n",
+       "      <td>Hierarchical</td>\n",
+       "      <td>1</td>\n",
+       "      <td>-39.526848</td>\n",
+       "      <td>22.154827</td>\n",
+       "      <td>t-SNE</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>AgCl</th>\n",
+       "      <td>-79397.451083</td>\n",
+       "      <td>-79397.408285</td>\n",
+       "      <td>-0.042797</td>\n",
+       "      <td>RS</td>\n",
+       "      <td>1.044160</td>\n",
+       "      <td>-0.558171</td>\n",
+       "      <td>5.0</td>\n",
+       "      <td>3.0</td>\n",
+       "      <td>-0.682185</td>\n",
+       "      <td>-0.550095</td>\n",
+       "      <td>...</td>\n",
+       "      <td>-0.349442</td>\n",
+       "      <td>-0.022042</td>\n",
+       "      <td>-0.333307</td>\n",
+       "      <td>0.249586</td>\n",
+       "      <td>-0.806601</td>\n",
+       "      <td>Hierarchical</td>\n",
+       "      <td>1</td>\n",
+       "      <td>-50.796947</td>\n",
+       "      <td>28.824591</td>\n",
+       "      <td>t-SNE</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>AgF</th>\n",
+       "      <td>-74477.428165</td>\n",
+       "      <td>-74477.274407</td>\n",
+       "      <td>-0.153758</td>\n",
+       "      <td>RS</td>\n",
+       "      <td>1.044160</td>\n",
+       "      <td>-1.028814</td>\n",
+       "      <td>5.0</td>\n",
+       "      <td>2.0</td>\n",
+       "      <td>-0.682185</td>\n",
+       "      <td>-2.285188</td>\n",
+       "      <td>...</td>\n",
+       "      <td>-1.848794</td>\n",
+       "      <td>-0.022042</td>\n",
+       "      <td>-1.773734</td>\n",
+       "      <td>0.249586</td>\n",
+       "      <td>-1.659579</td>\n",
+       "      <td>Hierarchical</td>\n",
+       "      <td>2</td>\n",
+       "      <td>38.285912</td>\n",
+       "      <td>95.895142</td>\n",
+       "      <td>t-SNE</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>AgI</th>\n",
+       "      <td>-171339.208181</td>\n",
+       "      <td>-171339.245107</td>\n",
+       "      <td>0.036925</td>\n",
+       "      <td>ZB</td>\n",
+       "      <td>1.044160</td>\n",
+       "      <td>1.559722</td>\n",
+       "      <td>5.0</td>\n",
+       "      <td>5.0</td>\n",
+       "      <td>-0.682185</td>\n",
+       "      <td>0.283853</td>\n",
+       "      <td>...</td>\n",
+       "      <td>0.872252</td>\n",
+       "      <td>-0.022042</td>\n",
+       "      <td>0.811648</td>\n",
+       "      <td>0.249586</td>\n",
+       "      <td>-0.628898</td>\n",
+       "      <td>Hierarchical</td>\n",
+       "      <td>1</td>\n",
+       "      <td>-50.215115</td>\n",
+       "      <td>13.525675</td>\n",
+       "      <td>t-SNE</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>AlAs</th>\n",
+       "      <td>-34200.077513</td>\n",
+       "      <td>-34200.290775</td>\n",
+       "      <td>0.213262</td>\n",
+       "      <td>ZB</td>\n",
+       "      <td>-0.901775</td>\n",
+       "      <td>0.383115</td>\n",
+       "      <td>3.0</td>\n",
+       "      <td>4.0</td>\n",
+       "      <td>0.560131</td>\n",
+       "      <td>0.912996</td>\n",
+       "      <td>...</td>\n",
+       "      <td>0.594594</td>\n",
+       "      <td>-0.753988</td>\n",
+       "      <td>0.700845</td>\n",
+       "      <td>-0.437955</td>\n",
+       "      <td>0.437324</td>\n",
+       "      <td>Hierarchical</td>\n",
+       "      <td>0</td>\n",
+       "      <td>-7.230611</td>\n",
+       "      <td>-354.558777</td>\n",
+       "      <td>t-SNE</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>...</th>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>SrTe</th>\n",
+       "      <td>-137269.487147</td>\n",
+       "      <td>-137269.107853</td>\n",
+       "      <td>-0.379295</td>\n",
+       "      <td>RS</td>\n",
+       "      <td>0.529060</td>\n",
+       "      <td>1.500892</td>\n",
+       "      <td>5.0</td>\n",
+       "      <td>5.0</td>\n",
+       "      <td>0.423168</td>\n",
+       "      <td>0.722286</td>\n",
+       "      <td>...</td>\n",
+       "      <td>1.094378</td>\n",
+       "      <td>0.978781</td>\n",
+       "      <td>1.070186</td>\n",
+       "      <td>-0.931917</td>\n",
+       "      <td>-0.237949</td>\n",
+       "      <td>Hierarchical</td>\n",
+       "      <td>1</td>\n",
+       "      <td>-5.772508</td>\n",
+       "      <td>10.822165</td>\n",
+       "      <td>t-SNE</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>OZn</th>\n",
+       "      <td>-25540.809205</td>\n",
+       "      <td>-25540.911173</td>\n",
+       "      <td>0.101968</td>\n",
+       "      <td>ZB</td>\n",
+       "      <td>0.071193</td>\n",
+       "      <td>-1.087644</td>\n",
+       "      <td>4.0</td>\n",
+       "      <td>2.0</td>\n",
+       "      <td>-1.815302</td>\n",
+       "      <td>-1.348317</td>\n",
+       "      <td>...</td>\n",
+       "      <td>-1.571137</td>\n",
+       "      <td>-0.514985</td>\n",
+       "      <td>-1.552129</td>\n",
+       "      <td>-0.231026</td>\n",
+       "      <td>1.148139</td>\n",
+       "      <td>Hierarchical</td>\n",
+       "      <td>2</td>\n",
+       "      <td>71.758156</td>\n",
+       "      <td>-28.533104</td>\n",
+       "      <td>t-SNE</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>SZn</th>\n",
+       "      <td>-29945.889373</td>\n",
+       "      <td>-29946.165186</td>\n",
+       "      <td>0.275813</td>\n",
+       "      <td>ZB</td>\n",
+       "      <td>0.071193</td>\n",
+       "      <td>-0.617001</td>\n",
+       "      <td>4.0</td>\n",
+       "      <td>3.0</td>\n",
+       "      <td>-1.815302</td>\n",
+       "      <td>0.114207</td>\n",
+       "      <td>...</td>\n",
+       "      <td>-0.016253</td>\n",
+       "      <td>-0.514985</td>\n",
+       "      <td>-0.000901</td>\n",
+       "      <td>-0.231026</td>\n",
+       "      <td>1.681250</td>\n",
+       "      <td>Hierarchical</td>\n",
+       "      <td>3</td>\n",
+       "      <td>35.325409</td>\n",
+       "      <td>-33.065041</td>\n",
+       "      <td>t-SNE</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>SeZn</th>\n",
+       "      <td>-57752.319875</td>\n",
+       "      <td>-57752.583012</td>\n",
+       "      <td>0.263137</td>\n",
+       "      <td>ZB</td>\n",
+       "      <td>0.071193</td>\n",
+       "      <td>0.441945</td>\n",
+       "      <td>4.0</td>\n",
+       "      <td>4.0</td>\n",
+       "      <td>-1.815302</td>\n",
+       "      <td>0.381952</td>\n",
+       "      <td>...</td>\n",
+       "      <td>0.316936</td>\n",
+       "      <td>-0.514985</td>\n",
+       "      <td>0.368439</td>\n",
+       "      <td>-0.231026</td>\n",
+       "      <td>1.005977</td>\n",
+       "      <td>Hierarchical</td>\n",
+       "      <td>3</td>\n",
+       "      <td>28.319963</td>\n",
+       "      <td>-44.431747</td>\n",
+       "      <td>t-SNE</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>TeZn</th>\n",
+       "      <td>-118239.807676</td>\n",
+       "      <td>-118240.052677</td>\n",
+       "      <td>0.245001</td>\n",
+       "      <td>ZB</td>\n",
+       "      <td>0.071193</td>\n",
+       "      <td>1.500892</td>\n",
+       "      <td>4.0</td>\n",
+       "      <td>5.0</td>\n",
+       "      <td>-1.815302</td>\n",
+       "      <td>0.722286</td>\n",
+       "      <td>...</td>\n",
+       "      <td>1.094378</td>\n",
+       "      <td>-0.514985</td>\n",
+       "      <td>1.070186</td>\n",
+       "      <td>-0.231026</td>\n",
+       "      <td>-0.237949</td>\n",
+       "      <td>Hierarchical</td>\n",
+       "      <td>3</td>\n",
+       "      <td>4.945252</td>\n",
+       "      <td>-34.944271</td>\n",
+       "      <td>t-SNE</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "<p>82 rows × 27 columns</p>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "          energy_RS      energy_ZB  energy_diff min_struc_type      Z(A)  \\\n",
+       "AgBr -108781.333959 -108781.303925    -0.030033             RS  1.044160   \n",
+       "AgCl  -79397.451083  -79397.408285    -0.042797             RS  1.044160   \n",
+       "AgF   -74477.428165  -74477.274407    -0.153758             RS  1.044160   \n",
+       "AgI  -171339.208181 -171339.245107     0.036925             ZB  1.044160   \n",
+       "AlAs  -34200.077513  -34200.290775     0.213262             ZB -0.901775   \n",
+       "...             ...            ...          ...            ...       ...   \n",
+       "SrTe -137269.487147 -137269.107853    -0.379295             RS  0.529060   \n",
+       "OZn   -25540.809205  -25540.911173     0.101968             ZB  0.071193   \n",
+       "SZn   -29945.889373  -29946.165186     0.275813             ZB  0.071193   \n",
+       "SeZn  -57752.319875  -57752.583012     0.263137             ZB  0.071193   \n",
+       "TeZn -118239.807676 -118240.052677     0.245001             ZB  0.071193   \n",
+       "\n",
+       "          Z(B)  period(A)  period(B)     IP(A)     IP(B)  ...    r_s(B)  \\\n",
+       "AgBr  0.500776        5.0        4.0 -0.682185 -0.155241  ...  0.039278   \n",
+       "AgCl -0.558171        5.0        3.0 -0.682185 -0.550095  ... -0.349442   \n",
+       "AgF  -1.028814        5.0        2.0 -0.682185 -2.285188  ... -1.848794   \n",
+       "AgI   1.559722        5.0        5.0 -0.682185  0.283853  ...  0.872252   \n",
+       "AlAs  0.383115        3.0        4.0  0.560131  0.912996  ...  0.594594   \n",
+       "...        ...        ...        ...       ...       ...  ...       ...   \n",
+       "SrTe  1.500892        5.0        5.0  0.423168  0.722286  ...  1.094378   \n",
+       "OZn  -1.087644        4.0        2.0 -1.815302 -1.348317  ... -1.571137   \n",
+       "SZn  -0.617001        4.0        3.0 -1.815302  0.114207  ... -0.016253   \n",
+       "SeZn  0.441945        4.0        4.0 -1.815302  0.381952  ...  0.316936   \n",
+       "TeZn  1.500892        4.0        5.0 -1.815302  0.722286  ...  1.094378   \n",
+       "\n",
+       "        r_p(A)    r_p(B)    r_d(A)    r_d(B)    clustering  labels      x_emb  \\\n",
+       "AgBr -0.022042  0.109901  0.249586 -0.095787  Hierarchical       1 -39.526848   \n",
+       "AgCl -0.022042 -0.333307  0.249586 -0.806601  Hierarchical       1 -50.796947   \n",
+       "AgF  -0.022042 -1.773734  0.249586 -1.659579  Hierarchical       2  38.285912   \n",
+       "AgI  -0.022042  0.811648  0.249586 -0.628898  Hierarchical       1 -50.215115   \n",
+       "AlAs -0.753988  0.700845 -0.437955  0.437324  Hierarchical       0  -7.230611   \n",
+       "...        ...       ...       ...       ...           ...     ...        ...   \n",
+       "SrTe  0.978781  1.070186 -0.931917 -0.237949  Hierarchical       1  -5.772508   \n",
+       "OZn  -0.514985 -1.552129 -0.231026  1.148139  Hierarchical       2  71.758156   \n",
+       "SZn  -0.514985 -0.000901 -0.231026  1.681250  Hierarchical       3  35.325409   \n",
+       "SeZn -0.514985  0.368439 -0.231026  1.005977  Hierarchical       3  28.319963   \n",
+       "TeZn -0.514985  1.070186 -0.231026 -0.237949  Hierarchical       3   4.945252   \n",
+       "\n",
+       "           y_emb  embedding  \n",
+       "AgBr   22.154827      t-SNE  \n",
+       "AgCl   28.824591      t-SNE  \n",
+       "AgF    95.895142      t-SNE  \n",
+       "AgI    13.525675      t-SNE  \n",
+       "AlAs -354.558777      t-SNE  \n",
+       "...          ...        ...  \n",
+       "SrTe   10.822165      t-SNE  \n",
+       "OZn   -28.533104      t-SNE  \n",
+       "SZn   -33.065041      t-SNE  \n",
+       "SeZn  -44.431747      t-SNE  \n",
+       "TeZn  -34.944271      t-SNE  \n",
+       "\n",
+       "[82 rows x 27 columns]"
+      ]
+     },
+     "execution_count": 22,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "df"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 24,
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2020-12-18T18:55:39.726794Z",
+     "start_time": "2020-12-18T18:55:39.689441Z"
+    }
+   },
    "outputs": [],
    "source": [
-    "Clustering().dpc()"
+    "import hdbscan"
    ]
   },
   {
-   "cell_type": "markdown",
-   "metadata": {},
+   "cell_type": "code",
+   "execution_count": 67,
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2020-12-18T19:02:33.653988Z",
+     "start_time": "2020-12-18T19:02:33.633756Z"
+    }
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "HDBSCAN(algorithm='best', allow_single_cluster=False, alpha=1.0,\n",
+       "        approx_min_span_tree=True, cluster_selection_epsilon=0.0,\n",
+       "        cluster_selection_method='eom', core_dist_n_jobs=4,\n",
+       "        gen_min_span_tree=False, leaf_size=40,\n",
+       "        match_reference_implementation=False, memory=Memory(location=None),\n",
+       "        metric='euclidean', min_cluster_size=9, min_samples=1, p=None,\n",
+       "        prediction_data=False)"
+      ]
+     },
+     "execution_count": 67,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
    "source": [
-    "We choose the values on the x,y-axis, and the algorithm will return the clusters that are given by the peaks that we selected. In this case we select the 3 peaks which are the closest to the top right vertex."
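+    "min_cluster_size = 9  # smallest group of materials HDBSCAN will report as a cluster\n",
+    "clusterer = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size, min_samples=1)  # min_samples=1 keeps the clustering permissive, so fewer points are labeled as noise\n",
+    "clusterer.fit(df[features])"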
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
+   "execution_count": 68,
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2020-12-18T19:02:34.026348Z",
+     "start_time": "2020-12-18T19:02:34.022206Z"
+    }
+   },
    "outputs": [],
    "source": [
-    "Clustering().dpc(2,4.)"
+    "cluster_labels = clusterer.labels_  # label -1 marks points not assigned to any cluster (noise)\n",
+    "df['labels'] = cluster_labels"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 69,
    "metadata": {
-    "scrolled": true
+    "ExecuteTime": {
+     "end_time": "2020-12-18T19:02:34.365919Z",
+     "start_time": "2020-12-18T19:02:34.356374Z"
+    },
+    "scrolled": false
    },
-   "outputs": [],
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "1ac9898689124574aef1ccc5775edf6f",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "HBox(children=(Button(description='PCA', style=ButtonStyle()), Button(description='MDS', style=ButtonStyle()),…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
    "source": [
     "display(box)"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
+   "execution_count": 70,
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2020-12-18T19:02:34.791129Z",
+     "start_time": "2020-12-18T19:02:34.759139Z"
+    }
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>RS</th>\n",
+       "      <th>ZB</th>\n",
+       "      <th>Materials in cluster</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>78</td>\n",
+       "      <td>21</td>\n",
+       "      <td>41</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>0</td>\n",
+       "      <td>100</td>\n",
+       "      <td>26</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   RS   ZB Materials in cluster\n",
+       "0  78   21                   41\n",
+       "1   0  100                   26"
+      ]
+     },
+     "execution_count": 70,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
    "source": [
     "composition_RS_ZB(df)"
    ]
@@ -796,7 +1734,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.7"
+   "version": "3.7.9"
   }
  },
  "nbformat": 4,