Commit 96d3ea33 authored by Luca Massimiliano Ghiringhelli

text fixes in the intro

parent 6a4d5277
......@@ -159,22 +159,22 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The materials of interest are represented as vectors in a vector space $X$ according to some chosen representation. The coordinates $x_i$ could represent features of the material e.g. bond distanc, lattice parameters, composition of elements etc. The ML moodels try to predict a target property $y$ with the minimal error according to some loss function. In our case $y$ is the formation energy. For this example three ML models have been used, MBTR, SOAP and n-gram. Additionally, calculation were performed on a simple representation just containing atomic properties, which is expected to produce much larger errors. All of this data is compiled into ``data.csv``."
"The materials of interest are represented as vectors in a vector space $X$ according to some chosen representation. The coordinates $x_i$ could represent features of the material, e.g., bond distance, lattice parameters, composition of elements, etc. The ML models try to predict a target property $y$ with the minimal error according to some loss function. In our case, $y$ is the formation energy. For this example, three ML models have been used. Specifically, kernel-ridge-regression models were trained by using three different descriptors of the atomic structures: <a href="https://arxiv.org/abs/1704.06439" target="_blank">MBTR</a>, <a href="https://arxiv.org/abs/1502.01366" target="_blank">SOAP</a>, <a href="https://www.nature.com/articles/s41524-019-0239-3" target="_blank">$n$-gram</a>. Additionally, calculations were performed on a simple representation just containing atomic properties, which is expected to produce much larger errors. All of this data is compiled into ``data.csv``."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A DA is defined by a function $\\sigma: X \\rightarrow \\{true, false\\}$, which describes a series of inequality constraints on the coordinates $x_i$. Thus, these selectors describe intersections of axis-parallel half-spaces resulting in simple convex regions in $X$. This allows to systematically reason about the described sub-domains (e.g., it iseasy to determine their differences and overlap) and also to sample novel points from them. These domains are found through subgroup discovery (SGD), maximizing the impact on the model error. This impact is defined by the product of selector coverage and the error reduction within, i.e.: \n",
"A DA is defined by a function $\\sigma: X \\rightarrow \\{true, false\\}$, which describes a series of inequality constraints on the coordinates $x_i$. Thus, these selectors describe intersections of axis-parallel half-spaces resulting in simple convex regions in $X$. This allows to systematically reason about the described sub-domains (e.g., it is easy to determine their differences and overlap) and also to sample novel points from them. These domains are found through subgroup discovery (SGD), maximizing the impact on the model error. This impact is defined by the product of selector coverage and the error reduction within, i.e.: \n",
"$\\mathrm{impact}(\\sigma) = \\left( \\frac{s}{k} \\right)^\\gamma \\left( \\frac{1}{k} \\sum\\limits^k_{i=1} l_i(f) - \\frac{1}{s} \\sum\\limits_{i \\in I(\\sigma)} l_i(f) \\right)^{1-\\gamma}$ \n",
"where:\n",
"\n",
"$k$... number of points in the data set \n",
"$s$... number of points in the DA set \n",
"$I(\\sigma)$... set of selected indices ($I(\\sigma) = \\{i: 1\\le i\\le k, \\sigma(x_i)= \\mathrm{true}\\}$) \n",
"$l_i$... loss function (e.g. squared error) \n",
"$f$... ML model (i.e. $f: X\\rightarrow \\mathbb{R}$) \n",
"$l_i$... loss function (e.g., squared error) \n",
"$f$... ML model (i.e., $f: X\\rightarrow \\mathbb{R}$) \n",
"$\\gamma$... coverage weight, $0 < \\gamma < 1$"
]
},
......
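
The impact formula in the second markdown cell can be made concrete with a small sketch. This is not code from the notebook: the helper names `make_selector` and `impact`, the toy data, and the choice to return 0 when the selector is empty or does not reduce the mean error are all assumptions made for illustration. The selector is a conjunction of per-coordinate inequality constraints, matching the axis-parallel description in the text.

```python
from typing import Callable, Dict, Sequence, Tuple


def make_selector(bounds: Dict[int, Tuple[float, float]]) -> Callable[[Sequence[float]], bool]:
    """Build sigma: X -> {True, False} from per-coordinate bounds (hypothetical helper).

    bounds maps a coordinate index j to (low, high); a point is selected
    only if low <= x[j] <= high holds for every listed coordinate.
    """
    def sigma(x: Sequence[float]) -> bool:
        return all(low <= x[j] <= high for j, (low, high) in bounds.items())
    return sigma


def impact(losses: Sequence[float], selected: Sequence[bool], gamma: float = 0.5) -> float:
    """Coverage^gamma times error-reduction^(1 - gamma), as in the formula above.

    losses[i] is l_i(f); selected[i] is sigma(x_i).
    """
    k = len(losses)
    s = sum(selected)
    if k == 0 or s == 0:
        return 0.0  # assumption: an empty selection has no impact
    global_mean = sum(losses) / k
    da_mean = sum(l for l, keep in zip(losses, selected) if keep) / s
    reduction = global_mean - da_mean
    if reduction <= 0:
        # assumption: clamp to 0, since a fractional power of a
        # negative number is not a real value
        return 0.0
    return (s / k) ** gamma * reduction ** (1 - gamma)


# Toy data: four points with two features each, and their per-point losses.
points = [(0.1, 1.0), (0.2, 2.0), (0.8, 1.5), (0.9, 3.0)]
losses = [0.1, 0.1, 0.9, 0.9]

sigma = make_selector({0: (0.0, 0.5)})  # constraint: 0.0 <= x_0 <= 0.5
mask = [sigma(x) for x in points]
value = impact(losses, mask, gamma=0.5)
```

Here the selector covers half of the points ($s/k = 0.5$) and lowers the mean loss from 0.5 to 0.1, so with $\gamma = 0.5$ the impact is $\sqrt{0.5 \cdot 0.4}$. A larger $\gamma$ favours broad selectors, while a smaller one favours strong error reduction.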