Info on the normalization of the Wyckoff positions
In order to compare prototypes a property called normalized_wyckoff was introduced.
See encyclopedia-preprocessing/Nomad/Preprocessing/System/preprcessormaterial3d.py and the function in structure.py, or its use in classify4me_normalizer.py, the classifier based on the prototypes (which does not use the encyclopedia preprocessor).
It returns a dictionary with all the Wyckoff positions and, for each of them, how many atoms of each type are at that position. The atom labels are normalized by calling the atom type with the most atoms in the cell x_1, the second one x_2, and so on. Ties are resolved by looking at which atom is more common at the first Wyckoff position in alphabetical order, then at the second position, and so on, until one atom is more common than the other or the two turn out to be truly equivalent.
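As a rough illustration of the dictionary described above, the sketch below counts the atoms of each type at each Wyckoff position. The input format (one (Wyckoff letter, element) pair per atom) and the name wyckoff_counts are assumptions made for this example; they are not taken from the actual normalized_wyckoff code.

```python
from collections import Counter, defaultdict


def wyckoff_counts(sites):
    """Count atoms of each element at each Wyckoff position.

    `sites` is a hypothetical input format: one (wyckoff_letter, element)
    pair per atom in the cell.
    """
    counts = defaultdict(Counter)
    for letter, element in sites:
        counts[letter][element] += 1
    # Convert to plain dicts, with the positions in alphabetical order.
    return {letter: dict(counts[letter]) for letter in sorted(counts)}


# FeTi3 example used in this note: Fe at a, two Ti at b, one Ti at c.
sites = [("a", "Fe"), ("b", "Ti"), ("b", "Ti"), ("c", "Ti")]
print(wyckoff_counts(sites))
# {'a': {'Fe': 1}, 'b': {'Ti': 2}, 'c': {'Ti': 1}}
```

At this stage the dictionary still uses the element names; the normalization replaces them with x_1, x_2, and so on.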
We want to get rid of the atom labels and replace them with something intrinsic that does not depend on the element names, whether the formula is MgCu3 or FeTi3. These two might have the same prototype (let's assume that they do). If you have Fe at position a, two Ti at position b and one Ti at position c, you cannot directly compare this with the Mg/Cu structure and see that one is equal to the other. To solve this problem we introduce the x_1, x_2 notation.
We replace each element with its new label, and both formulas become x_2(x_1)3. In this way we can compare the formulas and see that they are equal: the atom at position a is x_2 and the other atom is x_1.
A problem arises when two atoms have exactly the same count, e.g. Fe2O2: how do we decide which one to call x_1 and which x_2? When the counts are equal, we look at the first Wyckoff position, and if one atom is present there more often than the other, we choose that one. It could be that both Fe and O are present there equally often, so that this position cannot decide. In that case we look at the next Wyckoff position in alphabetical order, and so on, until we can decide. If no position ever decides, the two atoms are equivalent, and it does not matter which one is called x_1 and which x_2: either choice works the same.
The idea is to assign labels so that the same prototype always gets the same labels, while anything that differs gets different labels. We order first by the most common atom; if that is not enough, we go through the Wyckoff positions in alphabetical order and check which atom is more common at each. Basically, the only thing to implement is a comparison that decides whether, e.g., Co comes before or after Mn. We first compare the atom counts in the formula: if 'a > b' we get 1, if 'a < b' we get -1, and if 'a = b' we get 0. A non-zero result gives the ordering. Otherwise we continue through the sorted Wyckoff positions and compare, for each one, how many times each atom occurs there. If one occurs more often than the other, we know how they are ordered; we go on until we find a way to order them, or else they are equal, which means the order doesn't matter. Sorting with this comparison orders all the atom names in a way that doesn't depend on the atom name itself: it depends only on how often the atoms occur and how often they sit at the different Wyckoff positions.
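The comparison described above could be sketched roughly as follows. This is only an illustration under assumptions: the names cmp, compare_atoms and sort_atoms, and the shape of the counts dictionary (Wyckoff letter mapped to per-element counts), are made up for this example and are not the actual implementation. The Fe2O2 occupancy is also invented, purely to exercise the tie-break.

```python
from functools import cmp_to_key


def cmp(a, b):
    """Return 1 if a > b, -1 if a < b, and 0 if they are equal."""
    return (a > b) - (a < b)


def compare_atoms(x, y, counts):
    """Three-way comparison of two elements that ignores their names.

    `counts` (assumed shape) maps Wyckoff letter -> {element: atom count}.
    """
    # First compare the total number of atoms of each element.
    total_x = sum(c.get(x, 0) for c in counts.values())
    total_y = sum(c.get(y, 0) for c in counts.values())
    result = cmp(total_x, total_y)
    if result != 0:
        return result
    # Tie: walk the Wyckoff positions in alphabetical order.
    for letter in sorted(counts):
        result = cmp(counts[letter].get(x, 0), counts[letter].get(y, 0))
        if result != 0:
            return result
    return 0  # never decided: the two atoms are equivalent


def sort_atoms(counts):
    """Order the elements so that the most common one comes first."""
    elements = sorted({el for c in counts.values() for el in c})
    key = cmp_to_key(lambda x, y: compare_atoms(x, y, counts))
    return sorted(elements, key=key, reverse=True)


# Made-up Fe2O2 occupancy: equal totals, tie at a, O wins at b.
counts = {"a": {"Fe": 1, "O": 1}, "b": {"O": 1}, "c": {"Fe": 1}}
print(sort_atoms(counts))  # ['O', 'Fe']
```

Equivalent atoms compare as 0, so in this sketch their relative order simply falls back to the initial alphabetical order, which is harmless because either choice yields the same normalized result.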
Then we use the position in the sorted array as the label. Now we have a label, a number that depends only on the formula and the Wyckoff positions and not on the actual atom name. So if we have a different formula with the same Wyckoff positions, we will get exactly the same labels. That means we can then compare the Wyckoff positions and labels, saying there is an atom x_1 at position a and an atom x_2 at position b, and if two structures have the same prototype they will now look exactly the same.
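To make the final step concrete, here is a minimal sketch of relabeling and comparing two structures. The function name normalize_labels and the input shape (Wyckoff letter mapped to per-element counts) are assumptions for illustration only. The ordering is written here as a sort key (total count, then the counts at each Wyckoff position in alphabetical order), which gives the same order as comparing pairs of atoms step by step. The MgCu3 occupancy is assumed to match the FeTi3 one, following the assumption in the text that the two share a prototype.

```python
def normalize_labels(counts):
    """Replace element names with x_1, x_2, ... based on counts only.

    `counts` (assumed shape) maps Wyckoff letter -> {element: atom count}.
    """
    letters = sorted(counts)
    elements = sorted({el for c in counts.values() for el in c})

    def key(element):
        # Total count first, then counts per position in alphabetical order.
        per_position = [counts[letter].get(element, 0) for letter in letters]
        return (sum(per_position), *per_position)

    # Most common element first; its position in the list gives the label.
    ordered = sorted(elements, key=key, reverse=True)
    labels = {el: f"x_{i}" for i, el in enumerate(ordered, start=1)}
    return {
        letter: {labels[el]: n for el, n in counts[letter].items()}
        for letter in letters
    }


feti3 = {"a": {"Fe": 1}, "b": {"Ti": 2}, "c": {"Ti": 1}}
mgcu3 = {"a": {"Mg": 1}, "b": {"Cu": 2}, "c": {"Cu": 1}}  # assumed occupancy

print(normalize_labels(feti3))
# {'a': {'x_2': 1}, 'b': {'x_1': 2}, 'c': {'x_1': 1}}
print(normalize_labels(feti3) == normalize_labels(mgcu3))  # True
```

Once the labels are normalized, a plain dictionary comparison is enough to tell whether two structures occupy the Wyckoff positions in the same way.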