"The materials of interest are represented as vectors in a vector space $X$ according to some chosen representation. The coordinates $x_i$ could represent features of the material e.g. bond distanc, lattice parameters, composition of elements etc. The ML moodels try to predict a target property $y$ with the minimal error according to some loss function. In our case $y$ is the formation energy. For this example three ML models have been used, MBTR, SOAP and n-gram. Additionally, calculation were performed on a simple representation just containing atomic properties, which is expected to produce much larger errors. All of this data is compiled into ``data.csv``."
"The materials of interest are represented as vectors in a vector space $X$ according to some chosen representation. The coordinates $x_i$ could represent features of the material e.g. bond distance, lattice parameters, composition of elements etc. The ML moodels try to predict a target property $y$ with the minimal error according to some loss function. In our case, $y$ is the formation energy. For this example, three ML models have been used. Specifically, kernel-ridge-regression models were trained by using three different descriptor of the atomc structures: <a href="https://arxiv.org/abs/1704.06439" target="_blank">MBTR</a>, <a href="https://arxiv.org/abs/1502.01366" target="_blank">SOAP/a>, <a href="https://www.nature.com/articles/s41524-019-0239-3" target="_blank">$n$-gram</a>. Additionally, calculation were performed on a simple representation just containing atomic properties, which is expected to produce much larger errors. All of this data is compiled into ``data.csv``."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A DA is defined by a function $\\sigma: X \\rightarrow \\{true, false\\}$, which describes a series of inequality constraints on the coordinates $x_i$. Thus, these selectors describe intersections of axis-parallel half-spaces resulting in simple convex regions in $X$. This allows to systematically reason about the described sub-domains (e.g., it iseasy to determine their differences and overlap) and also to sample novel points from them. These domains are found through subgroup discovery (SGD), maximizing the impact on the model error. This impact is defined by the product of selector coverage and the error reduction within, i.e.: \n",
"A DA is defined by a function $\\sigma: X \\rightarrow \\{true, false\\}$, which describes a series of inequality constraints on the coordinates $x_i$. Thus, these selectors describe intersections of axis-parallel half-spaces resulting in simple convex regions in $X$. This allows to systematically reason about the described sub-domains (e.g., it iseasy to determine their differences and overlap) and also to sample novel points from them. These domains are found through subgroup discovery (SGD), maximizing the impact on the model error. This impact is defined by the product of selector coverage and the error reduction within, i.e.: \n",