" <p> In the two popular non-linear method we chose, <b>multidimensional scaling (<a href=\"https://en.wikipedia.org/wiki/Multidimensional_scaling\" target=\"_blank\">MDS</a>) </b> tries to preserve the distances from the given high-dimensional to the two-dimensional representation, ",
" and the <b>t-Distributed Stochastic Neighbor Embedding (<a href=\"https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding\" target=\"_blank\">t-SNE</a>) </b> tries to preserve the local shape of groups of neighboring points. Both methods use a notion of distance that in our example is the Euclidean norm, even if in principle it could be any proper norm. </p>",
"",
" <p> In the results, we show the data points colored according to the difference in energy between the Rocksalt (RS) and Zincblende (ZB) crystal structures (both relaxed to their local minima) of the material they represent. The labeling and consequent coloring is independent of the embedding method used, therefore the labeling is an <i>a posteriori</i>",
" <p> In the results, we show the data points colored according to the difference in energy between the Rocksalt (RS) and Zincblende (ZB) crystal structures (both relaxed to their local minima) of the material they represent. The labeling and consequent coloring are independent of the embedding method used, therefore the labeling is an <i>a posteriori</i>",
" check that the high-dimensional representation could contain information about the labeling itself. In practice, if the coloring identifies clearly distinct areas, then the two dimensional representation is a map for the prediction of the labels, so that a new data point of unknown labeling, that lands in the 2D map in a area of points with known labeling, is expected to belong to that same labeling. </p>",
" ",
"<p>The merit of the embedding methods is to provide relatively inexpensive tools to visually test whether a given set of features contains information about an investigated property (label). For this reason, they are widely used as preliminary tools for discovering structures in the data. </p>",
...
...
@@ -141,7 +141,7 @@
"result": {
"type": "BeakerDisplay",
"innertype": "Html",
"object": "<script>\nvar beaker = bkHelper.getBeakerObject().beakerObj;\n</script>\n<style type=\"text/css\">\n .lasso_instructions{\n font-size: 15px;\n } \n</style>\n<!-- Button trigger modal -->\n<button type=\"button\" class=\"btn btn-default\" data-toggle=\"modal\" data-target=\"#lasso-motivation-modal\">\n Introduction and motivation\n</button>\n\n<!-- Modal -->\n<div class=\"modal fade\" id=\"lasso-motivation-modal\" tabindex=\"-1\" role=\"dialog\" aria-labelledby=\"lasso-motivation-modal-label\">\n <div class=\"modal-dialog modal-lg\" role=\"document\">\n <div class=\"modal-content\">\n <div class=\"modal-header\">\n <button type=\"button\" class=\"close\" data-dismiss=\"modal\" aria-label=\"Close\"><span aria-hidden=\"true\">×</span></button>\n <h4 class=\"modal-title\" id=\"lasso-motivation-modal-label\">Introduction and motivation</h4>\n </div>\n <div class=\"modal-body lasso_instructions\">\n <p> In this tutorial, we present a tool that produces two-dimensional structure maps for octet binary compounds, by starting from a high-dimensional set of <i>features</i> (coordinates) that identify each data point (material), based on free-atom data of the atomic species constituting the binary material. </p>\n \n <p> The low-dimensional embedding methods (here, two-dimensional for the sake of visualization) are <i>unsupervised</i> machine-learning algorithms; so, in our example, the algorithm processes only the spatial arrangement of the points in the high-dimensional representation that is determined by the user. </p>\n \n <p> In the linear method, <b>principal component analysis (<a href=\"https://en.wikipedia.org/wiki/Feature_scaling\" target=\"_blank\">PCA</a>)</b>, the direction (linear combination of the input coordinates) with the maximum variance is identified as the first principal component (PC). 
Among the directions perpendicular to the first PC, the one with the largest variance is the second PC.\n The process can be iterated up to as many dimensions as the initial dimensionality of the data, but here we stop at the second dimension and give the amount of total variance recovered by the first two principal components. </p>\n <p> Of the two popular non-linear methods we chose, <b>multidimensional scaling (<a href=\"https://en.wikipedia.org/wiki/Multidimensional_scaling\" target=\"_blank\">MDS</a>) </b> tries to preserve the pairwise distances between points when mapping the given high-dimensional representation to the two-dimensional one, \n while <b>t-Distributed Stochastic Neighbor Embedding (<a href=\"https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding\" target=\"_blank\">t-SNE</a>) </b> tries to preserve the local shape of groups of neighboring points. Both methods rely on a notion of distance, which in our example is the Euclidean norm, although in principle it could be any proper norm. </p>\n\n <p> In the results, we show the data points colored according to the difference in energy between the Rocksalt (RS) and Zincblende (ZB) crystal structures (both relaxed to their local minima) of the material they represent. The labeling and consequent coloring is independent of the embedding method used, therefore the labeling is an <i>a posteriori</i>\n check that the high-dimensional representation could contain information about the labeling itself. In practice, if the coloring identifies clearly distinct areas, then the two-dimensional representation is a map for the prediction of the labels: a new data point of unknown labeling that lands in an area of the 2D map populated by points with a known label is expected to carry that same label. </p>\n \n<p>The merit of the embedding methods is to provide relatively inexpensive tools to visually test whether a given set of features contains information about an investigated property (label). 
For this reason, they are widely used as preliminary tools for discovering structures in the data. </p>\n<p> The prediction of RS vs ZB structure from a simple descriptor has a notable history in materials science [1-7], where descriptors were designed by chemically/physically-inspired intuition. </p>\n\n <p>References:</p>\n <ol>\n <li>J. A. van Vechten, Phys. Rev. 182, 891 (1969).</li>\n <li>J. C. Phillips, Rev. Mod. Phys. 42, 317 (1970).</li>\n <li>J. St. John and A.N. Bloch, Phys. Rev. Lett. 33, 1095 (1974).</li>\n <li>J. R. Chelikowsky and J. C. Phillips, Phys. Rev. B 17, 2453 (1978).</li>\n <li>A. Zunger, Phys. Rev. B 22, 5839 (1980).</li>\n <li>D. G. Pettifor, Solid State Commun. 51, 31 (1984).</li>\n <li>Y. Saad, D. Gao, T. Ngo, S. Bobbitt, J. R. Chelikowsky, and W. Andreoni, Phys. Rev. B 85, 104104 (2012).</li>\n </ol>\n </div>\n <div class=\"modal-footer\">\n <button type=\"button\" class=\"btn btn-default\" data-dismiss=\"modal\">Close</button>\n<!-- <button type=\"button\" class=\"btn btn-primary\">Save changes</button> -->\n </div>\n </div>\n </div>\n</div>"
"object": "<script>\nvar beaker = bkHelper.getBeakerObject().beakerObj;\n</script>\n<style type=\"text/css\">\n .lasso_instructions{\n font-size: 15px;\n } \n</style>\n<!-- Button trigger modal -->\n<button type=\"button\" class=\"btn btn-default\" data-toggle=\"modal\" data-target=\"#lasso-motivation-modal\">\n Introduction and motivation\n</button>\n\n<!-- Modal -->\n<div style=\"display: none;\" class=\"modal fade\" id=\"lasso-motivation-modal\" tabindex=\"-1\" role=\"dialog\" aria-labelledby=\"lasso-motivation-modal-label\">\n <div class=\"modal-dialog modal-lg\" role=\"document\">\n <div class=\"modal-content\">\n <div class=\"modal-header\">\n <button type=\"button\" class=\"close\" data-dismiss=\"modal\" aria-label=\"Close\"><span aria-hidden=\"true\">×</span></button>\n <h4 class=\"modal-title\" id=\"lasso-motivation-modal-label\">Introduction and motivation</h4>\n </div>\n <div class=\"modal-body lasso_instructions\">\n <p> In this tutorial, we present a tool that produces two-dimensional structure maps for octet binary compounds, by starting from a high-dimensional set of <i>features</i> (coordinates) that identify each data point (material), based on free-atom data of the atomic species constituting the binary material. </p>\n \n <p> The low-dimensional embedding methods (here, two-dimensional for the sake of visualization) are <i>unsupervised</i> machine-learning algorithms; so, in our example, the algorithm processes only the spatial arrangement of the points in the high-dimensional representation that is determined by the user. </p>\n \n <p> In the linear method, <b>principal component analysis (<a href=\"https://en.wikipedia.org/wiki/Feature_scaling\" target=\"_blank\">PCA</a>)</b>, the direction (linear combination of the input coordinates) with the maximum variance is identified as the first principal component (PC). 
Among the directions perpendicular to the first PC, the one with the largest variance is the second PC.\n The process can be iterated up to as many dimensions as the initial dimensionality of the data, but here we stop at the second dimension and give the amount of total variance recovered by the first two principal components. </p>\n <p> Of the two popular non-linear methods we chose, <b>multidimensional scaling (<a href=\"https://en.wikipedia.org/wiki/Multidimensional_scaling\" target=\"_blank\">MDS</a>) </b> tries to preserve the pairwise distances between points when mapping the given high-dimensional representation to the two-dimensional one, \n while <b>t-Distributed Stochastic Neighbor Embedding (<a href=\"https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding\" target=\"_blank\">t-SNE</a>) </b> tries to preserve the local shape of groups of neighboring points. Both methods rely on a notion of distance, which in our example is the Euclidean norm, although in principle it could be any proper norm. </p>\n\n <p> In the results, we show the data points colored according to the difference in energy between the Rocksalt (RS) and Zincblende (ZB) crystal structures (both relaxed to their local minima) of the material they represent. The labeling and consequent coloring are independent of the embedding method used, therefore the labeling is an <i>a posteriori</i>\n check that the high-dimensional representation could contain information about the labeling itself. In practice, if the coloring identifies clearly distinct areas, then the two-dimensional representation is a map for the prediction of the labels: a new data point of unknown labeling that lands in an area of the 2D map populated by points with a known label is expected to carry that same label. </p>\n \n<p>The merit of the embedding methods is to provide relatively inexpensive tools to visually test whether a given set of features contains information about an investigated property (label). 
For this reason, they are widely used as preliminary tools for discovering structures in the data. </p>\n<p> The prediction of RS vs ZB structure from a simple descriptor has a notable history in materials science [1-7], where descriptors were designed by chemically/physically-inspired intuition. </p>\n\n <p>References:</p>\n <ol>\n <li>J. A. van Vechten, Phys. Rev. 182, 891 (1969).</li>\n <li>J. C. Phillips, Rev. Mod. Phys. 42, 317 (1970).</li>\n <li>J. St. John and A.N. Bloch, Phys. Rev. Lett. 33, 1095 (1974).</li>\n <li>J. R. Chelikowsky and J. C. Phillips, Phys. Rev. B 17, 2453 (1978).</li>\n <li>A. Zunger, Phys. Rev. B 22, 5839 (1980).</li>\n <li>D. G. Pettifor, Solid State Commun. 51, 31 (1984).</li>\n <li>Y. Saad, D. Gao, T. Ngo, S. Bobbitt, J. R. Chelikowsky, and W. Andreoni, Phys. Rev. B 85, 104104 (2012).</li>\n </ol>\n </div>\n <div class=\"modal-footer\">\n <button type=\"button\" class=\"btn btn-default\" data-dismiss=\"modal\">Close</button>\n<!-- <button type=\"button\" class=\"btn btn-primary\">Save changes</button> -->\n </div>\n </div>\n </div>\n</div>"
},
"selectedType": "BeakerDisplay",
"elapsedTime": 0,
...
...
@@ -330,7 +330,7 @@
" ",
" <br>",
" <div class=\"row\"> <!-- Start of row-->",
" <p class=\"lasso_selection_description\"><b>Units of measure: </b> ",
" <p class=\"lasso_selection_description\"><b>Unit of measures: </b> ",