" <h4 class=\"modal-title\" id=\"lasso-motivation-modal-label\">Introduction and motivation</h4>",
" </div>",
" <div class=\"modal-body lasso_instructions\">",
" <p> In this tutorial notebook, we present a tool that produces two-dimensional structure maps for octet binary compounds, by starting from a high-dimensional rotational, translation, and permutational invariant representation (a descriptor) of the spatial structure (the geometry) that identifies each data point (material).",
" ",
" <p> The low-dimensional embedding methods (here, two-dimensional for the sake of visualization) are <i>unsupervised</i> machine-learning algorithms; so, in our example, the algorithm processes only the similarity (the distance) between the points in the high-dimensional representation. </p>",
" ",
" <p> In the linear method, <b>principal component analysis (<a href=\"https://en.wikipedia.org/wiki/Principal_component_analysis\" target=\"_blank\">PCA</a>)</b>, the direction (linear combination of the input coordinates) with the maximum variance is identified as the first principal component (PC). The direction perpendicular to the first PC with the largest variance is the second PC.",
" The process can be iterated up to as many dimensions as the initial dimensionality of the data, but here we stop at the second dimension and give the amount of total variance recovered by the first two principal components. </p>",
" <p> In the two popular non-linear methods we chose, <b>multidimensional scaling (<a href=\"https://en.wikipedia.org/wiki/Multidimensional_scaling\" target=\"_blank\">MDS</a>) </b> tries to preserve the distances from the given high-dimensional to the two-dimensional representation, ",
" whereas <b>t-Distributed Stochastic Neighbor Embedding (<a href=\"https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding\" target=\"_blank\">t-SNE</a>) </b> tries to preserve the local shape for groups of neighboring points. Both methods use a notion of distance that in our example is the Euclidean norm, even if in principle it could be any proper norm. </p>",
"",
" <p> In the results, we show the data points colored according to the classification (zincblende or rocksalt) in Part 1 and Part 2, while in Part 3 the data points are colored by a property which is retrieved from the database by means of a query. The labeling and consequent coloring is independent of the embedding method used, therefore the labeling is an <i>a posteriori</i> check that the high-dimensional representation could contain information about the labeling itself. In practice, if the coloring identifies clearly distinct areas, then the two dimensional representation is a map for the prediction of the labels, so that a new data point of unknown labeling, that lands in the two-dimensional map in an area of points with known labeling, is expected to belong to that same labeling. </p>",
" ",
"<p>The merit of the embedding methods is to provide relatively inexpensive tools to visually test whether a given set of features contains information about an investigated property (label). For this reason, they are widely used as preliminary tools for discovering structures in the data. </p>",
"object": "<script>\nvar beaker = bkHelper.getBeakerObject().beakerObj;\n</script>\n<style type=\"text/css\">\n .lasso_instructions{\n font-size: 15px;\n } \n</style>\n<!-- Button trigger modal -->\n<button type=\"button\" class=\"btn btn-default\" data-toggle=\"modal\" data-target=\"#lasso-motivation-modal\">\n Introduction and motivation\n</button>\n\n<!-- Modal -->\n<div class=\"modal fade\" id=\"lasso-motivation-modal\" tabindex=\"-1\" role=\"dialog\" aria-labelledby=\"lasso-motivation-modal-label\">\n <div class=\"modal-dialog modal-lg\" role=\"document\">\n <div class=\"modal-content\">\n <div class=\"modal-header\">\n <button type=\"button\" class=\"close\" data-dismiss=\"modal\" aria-label=\"Close\"><span aria-hidden=\"true\">×</span></button>\n <h4 class=\"modal-title\" id=\"lasso-motivation-modal-label\">Introduction and motivation</h4>\n </div>\n <div class=\"modal-body lasso_instructions\">\n <p> In this tutorial notebook, we present a tool that produces two-dimensional structure maps for octet binary compounds, by starting from a high-dimensional rotational, translation, and permutational invariant representation (a descriptor) of the spatial structure (the geometry) that identifies each data point (material).\n \n </p><p> The low-dimensional embedding methods (here, two-dimensional for the sake of visualization) are <i>unsupervised</i> machine-learning algorithms; so, in our example, the algorithm processes only the similarity (the distance) between the points in the high-dimensional representation. </p>\n \n <p> In the linear method, <b>principal component analysis (<a href=\"https://en.wikipedia.org/wiki/Principal_component_analysis\" target=\"_blank\">PCA</a>)</b>, the direction (linear combination of the input coordinates) with the maximum variance is identified as the first principal component (PC). The direction perpendicular to the first PC with the largest variance is the second PC.\n The process can be iterated up to as many dimensions as the initial dimensionality of the data, but here we stop at the second dimension and give the amount of total variance recovered by the first two principal components. </p>\n <p> In the two popular non-linear methods we chose, <b>multidimensional scaling (<a href=\"https://en.wikipedia.org/wiki/Multidimensional_scaling\" target=\"_blank\">MDS</a>) </b> tries to preserve the distances from the given high-dimensional to the two-dimensional representation, \n whereas <b>t-Distributed Stochastic Neighbor Embedding (<a href=\"https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding\" target=\"_blank\">t-SNE</a>) </b> tries to preserve the local shape for groups of neighboring points. Both methods use a notion of distance that in our example is the Euclidean norm, even if in principle it could be any proper norm. </p>\n\n <p> In the results, we show the data points colored according to the classification (zincblende or rocksalt) in Part 1 and Part 2, while in Part 3 the data points are colored by a property which is retrieved from the database by means of a query. The labeling and consequent coloring is independent of the embedding method used, therefore the labeling is an <i>a posteriori</i> check that the high-dimensional representation could contain information about the labeling itself. 
In practice, if the coloring identifies clearly distinct areas, then the two dimensional representation is a map for the prediction of the labels, so that a new data point of unknown labeling, that lands in the two-dimensional map in an area of points with known labeling, is expected to belong to that same labeling. </p>\n \n<p>The merit of the embedding methods is to provide relatively inexpensive tools to visually test whether a given set of features contains information about an investigated property (label). For this reason, they are widely used as preliminary tools for discovering structures in the data. </p>\n </div>\n <div class=\"modal-footer\">\n <button type=\"button\" class=\"btn btn-default\" data-dismiss=\"modal\">Close</button>\n<!-- <button type=\"button\" class=\"btn btn-primary\">Save changes</button> -->\n </div>\n </div>\n </div>\n</div>"
"value": "/usr/local/lib/python2.7/dist-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.\n \"This module will be removed in 0.20.\", DeprecationWarning)\nUsing TensorFlow backend.\n"
}
]
}
},
"evaluatorReader": true,
"lineCount": 16
},
{
"id": "sectionGehWM5",
"type": "section",
"title": "<p style=\"color: #20335d; font-size: 15pt;font-weight: 900;\">1.2 Represent the system: the Partial Radial Distribution Function (PRDF) as\"descriptor\"</p>",
"level": 2,
"evaluatorReader": false,
"collapsed": false
},
{
"id": "markdownb4dAuG",
"type": "markdown",
"body": [
"<div class=\"modal-body lasso_instructions\">",
"The Partial Radial DIstribution Function (PRDF) considers distributions of pairwise distances $d_{\\alpha \\beta}$ <br>",
"between two atom type $\\alpha$ and $\\beta$.",
"[<a href=\"http://journals.aps.org/prb/abstract/10.1103/PhysRevB.89.205118\" target=\"blank\">K. T. Schütt et al., Phys. Rev. B 89, 205118 (2014)</a>] ",
"The goal of this section is to reduce the dimensionality of a dataset to two dimensions for visualization purposes <br>",
"",
" <p> Here, we use <b>multidimensional scaling </b>, a non-linear method that tries to preserve the distances from the given high-dimensional to the two-dimensional representation. <br>",
"In \"Part2: Guided Exercise\" you will be able to use different methods\". [<a href=\"https://en.wikipedia.org/wiki/Multidimensional_scaling\" target=\"_blank\">more info</a>]<br>",
"value": "/usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py:2699: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`.\n VisibleDeprecationWarning)\nDEBUG: Processing configuration 1/10\n"
},
{
"type": "err",
"value": "DEBUG: Actual feature matrix needs 4104 bytes\n"
"value": "INFO: Click on the button 'View interactive 2D scatter plot' to see the plot.\n"
}
],
"payload": "<div class=\"output_subarea output_html rendered_html\"><a target=\"_blank\" href=\"/user/tmp/c560d2423b2a837c.html\">Click here to open the Viewer</a></div>"
"We first load the libraries, set up the paths to the data, and calculate the Partial Radial Distribution Function (PRDF) descriptor exactly as in Part 1. <br>",
"value": "INFO: Click on the button 'View interactive 2D scatter plot' to see the plot.\n"
}
],
"payload": "<div class=\"output_subarea output_html rendered_html\"><a target=\"_blank\" href=\"/user/tmp/068a9c5fe3101642.html\">Click here to open the Viewer</a></div>"
}
},
"evaluatorReader": true,
"lineCount": 49
},
{
"id": "sectionIhW1xp",
"type": "section",
"title": "<p style=\"color: #20335d; font-weight: 900;\">Part 3: Interactive query of the database and two-dimensional embedding </p>",
"level": 1,
"evaluatorReader": false,
"collapsed": false
},
{
"id": "markdownkdhnI9",
"type": "markdown",
"body": [
"<div class=\"modal-body lasso_instructions\">",
"In this part, you will interactively query the database, then calculate the partial radial distribution function (as done in Part 1), visualize the results in a high-dimensional data in two-dimensions, ",
"and finally generate an interactive Viewer with the results.<br>",
"",
"<br>",
"The difference with Part 1 and Part 2 is that you will query the database, and perform the data-analytics operations (seen in Part 1 and Part 2) on the result of your query.",
"title": "<p style=\"color: #20335d; font-size: 15pt;font-weight: 900;\">3.3 Result visualization in the interactive NOMAD Viewer</p>",
"level": 2,
"evaluatorReader": false,
"collapsed": false
},
{
"id": "markdowndvKLyf",
"type": "markdown",
"body": [
"<div class=\"modal-body lasso_instructions\">",
"We will now plot the results. Contrarily to what we have done in Part 1 and Part 2, here you can change the property that you want to use to color code the results.",
"<br>",
"In particular, given the Query results at point 3.1, you can decide to color code according to: ",
"<ul>",
"",
"<li> band gap (keyword: <font color=\"blue\"> band_gap</font>) <br>",
"value": "INFO: The color in the plot is given by the target value.\n"
},
{
"type": "err",
"value": "INFO: Click on the button 'View interactive 2D scatter plot' to see the plot.\n"
}
],
"payload": "<div class=\"output_subarea output_html rendered_html\"><a target=\"_blank\" href=\"/user/tmp/e918a6abeebd3442.html\">Click here to open the Viewer</a></div>"