From b81b533d9efbdc40b4d28c41d95d51aed671e80f Mon Sep 17 00:00:00 2001
From: Luigi Sbailo <luigi.sbailo@gmail.com>
Date: Tue, 8 Dec 2020 17:52:55 +0100
Subject: [PATCH] Restore lost cell

---
 compressed_sensing.ipynb | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/compressed_sensing.ipynb b/compressed_sensing.ipynb
index d411ca3..84242fb 100644
--- a/compressed_sensing.ipynb
+++ b/compressed_sensing.ipynb
@@ -865,6 +865,21 @@
     "visualizer.show()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Predicting new materials (extrapolation)\n",
+    "<div style=\"list-style:disc; margin: 2px;padding: 10px;border: 0px;border:8px double   green; font-size:16px;padding-left: 32px;padding-right: 22px; width:89%\">\n",
+    "<li>Perform a leave-one-out cross-validation (LOOCV) using SISSO.</li>\n",
+    "<li>Analyze the prediction accuracy and how often the same descriptor is selected.</li>\n",
+    "</div>\n",
+    "\n",
+    "We have seen that we can fit the energy differences of materials accurately. But what about predicting the energy difference of a 'new' material, one that was not included when determining the model? We test the prediction performance via LOOCV. In a LOOCV, the following procedure is performed for each material: the selected material is excluded, the model is built on the remaining materials, and the model accuracy is tested on the excluded material. This means that we need to run the SISSO function 82 times. <br>\n",
+    "Load the data in the next cell and run the LOOCV in the cell after it. Note that running the LOOCV can take up to ten minutes. Use the remaining two cells of this chapter to analyze the results.<br>\n",
+    "How does the prediction error compare to the fitting error? How often is the same descriptor selected? Are there materials that yield an unusually high or low error?"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
-- 
GitLab