From d85c997479be03937c9f3fab3badac649c1d6e2b Mon Sep 17 00:00:00 2001
From: Luigi Sbailo <luigi.sbailo@physik.hu-berlin.de>
Date: Mon, 28 Mar 2022 14:10:53 +0200
Subject: [PATCH] Reinsert fast search and find of density peaks in tutorial

---
 exploratory_analysis.ipynb | 116 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 114 insertions(+), 2 deletions(-)

diff --git a/exploratory_analysis.ipynb b/exploratory_analysis.ipynb
index c532662..daafd34 100644
--- a/exploratory_analysis.ipynb
+++ b/exploratory_analysis.ipynb
@@ -864,11 +864,123 @@
     "The effect of _min_samples_ is to fix how conservative respect to outliers detection the algorithm should be. Increasing its value the distorsion effects of the mutual reachability distance become more evident, while decreasing it less points are classified as outliers. \n",
     "Can you obtain more meaningful results by decreasing the value of this parameter?"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Fast search and find of density peaks\n",
+    "---"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The fast search and find of density peaks algorithm is introduced in:\n",
+    "\n",
+    "A. Rodriguez, A. Laio: <span style=\"font-style: italic;\">Clustering by fast search and find of density peaks</span>, Science (2014).\n",
+    "\n",
+    "The implementation of the algorithm that we use is taken from https://pypi.org/project/pydpc/."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from pydpc import Cluster as DPCClustering"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The fast search and find of density peaks algorithm lets us select the clusters based on a graphical interpretation."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "Clustering().dpc()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2022-03-28T12:09:46.286109Z",
+     "start_time": "2022-03-28T12:09:46.279639Z"
+    }
+   },
+   "source": [
+    "In the plot above, each point represents a candidate peak that becomes the core of a cluster if it is selected.\n",
+    "All points of the dataset are placed in the plot, and the top right of the plot always contains one point, which represents the peak in the highest density region.\n",
+    "The other points are placed according to their local density and their distance ('delta' in the graph) from the nearest point of higher density.\n",
+    "By choosing threshold values on the x- and y-axes, we select the peaks, and hence the clusters, that the algorithm returns.\n",
+    "Here, we select the 3 peaks closest to the top right corner."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "Clustering().dpc(2.4, 3.8)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "show_embedding()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "composition_RS_ZB(df)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We have found two clusters that each contain only materials with the same most stable structure, and a mixed cluster that contains materials with either structure.\n",
+    "It is interesting to visualize this result with MDS, where we can see that the mixed cluster lies between the pure clusters as a transition zone.\n",
+    "\n",
+    "These clustering results suggest that the atomic features we have used are sufficient for classifying materials according to their most stable structure.\n",
+    "Even though the RS and ZB clusters are not clearly separated, since a mixed cluster is also found, a supervised machine learning model might be able to learn to classify the 82 octet binary materials.\n",
+    "We might also expect such a model to face challenges, especially when classifying materials in the transition area.\n",
+    "A supervised learning algorithm, namely SISSO, has been used for such classification, and we refer to other tutorials in the AI toolkit for this application (see https://nomad-lab.eu/prod/analytics/public/user-redirect/notebooks/tutorials/compressed_sensing.ipynb and https://nomad-lab.eu/prod/analytics/public/user-redirect/notebooks/tutorials/descriptor_role.ipynb).\n",
+    "\n",
+    "In this tutorial, we have seen an example application of unsupervised learning used to explore the structure of a multi-dimensional dataset.\n",
+    "We have performed a clustering analysis that led us to find clusters representative of different external labels, i.e. the most stable structure.\n",
+    "This clustering gave us clear evidence that the set of features used for clustering should be enough to determine the value of the external labels.\n",
+    "A subsequent step of such an analysis would be the deployment of a supervised learning algorithm to find an interpretable relationship between the input features and the labels."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
   }
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
+   "display_name": "Python 3",
    "language": "python",
    "name": "python3"
   },
@@ -882,7 +994,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.7"
+   "version": "3.7.3"
   }
  },
  "nbformat": 4,
-- 
GitLab