**Info on the normalization of the Wyckoff positions**

To make prototypes comparable, a property called `normalized_wyckoff` was introduced.

[See `encyclopedia-preprocessing/Nomad/Preprocessing/System/preprcessormaterial3d.py`, the function `get_normalized_wyckoff` in `structure.py`, or its use in `classify4me_normalizer.py`, the classifier based on the prototypes (which does not use the encyclopedia preprocessor).]

It returns a dictionary with all the Wyckoff positions and, for each of them, how many atoms of each type are at that position. The atom labels are normalized: the atom type with the most atoms in the cell is called `x_1`, the second most common `x_2`, and so on.

Ties are resolved by going through the Wyckoff positions in alphabetical order: the atom type that is more common at the first position comes first; if the two are still tied, the second position is checked, and so on, until one atom type is more common than the other or the two turn out to be truly equivalent.

**Detailed description:**

We want to get rid of the element names and replace them with labels that are intrinsic to the structure, so that the result is the same whether the formula is MgCu3 or FeTi3. These two might have the same prototype (let's assume that they do). If Fe sits at position *a*, two Ti at position *b* and one Ti at position *c*, we cannot directly compare this with the other structure and see that the two are equal. To make them comparable we introduce the `x_1`, `x_2` notation. Replacing the element names with the new labels gives `x_2x_13` for both formulas (one `x_2` atom and three `x_1` atoms), so the formulas can be compared and seen to be equal: the atom at position *a* is `x_2` and the other atom type is `x_1`.
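The count-based relabeling can be sketched as follows. This is a minimal illustration, not the actual `get_normalized_wyckoff` code: the input shape (a dictionary mapping each Wyckoff letter to element counts), the helper name `label_by_count`, and the assumption that MgCu3 occupies the same positions as FeTi3 are all made up for the example. Ties between equally common atom types are not handled here.

```python
from collections import Counter


def label_by_count(wyckoff):
    """Relabel elements as x_1, x_2, ... by total atom count (hypothetical helper).

    `wyckoff` maps a Wyckoff letter to {element: number of atoms}.
    Ties are left in first-seen order; this sketch does not resolve them.
    """
    totals = Counter()
    for occupants in wyckoff.values():
        totals.update(occupants)  # adds the per-position counts

    # Most common element first.
    ranked = sorted(totals, key=lambda element: -totals[element])
    labels = {element: f"x_{i}" for i, element in enumerate(ranked, start=1)}

    return {
        letter: {labels[element]: n for element, n in occupants.items()}
        for letter, occupants in wyckoff.items()
    }


# Assumed occupations, for illustration only.
fe_ti3 = {"a": {"Fe": 1}, "b": {"Ti": 2}, "c": {"Ti": 1}}
mg_cu3 = {"a": {"Mg": 1}, "b": {"Cu": 2}, "c": {"Cu": 1}}

print(label_by_count(fe_ti3))
# {'a': {'x_2': 1}, 'b': {'x_1': 2}, 'c': {'x_1': 1}}
print(label_by_count(fe_ti3) == label_by_count(mg_cu3))
# True
```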

A problem arises when two atom types have exactly the same count, e.g. Fe2O2: how do we decide which one is `x_1` and which is `x_2`? When the counts are equal, we look at the first Wyckoff position and check whether one atom type is more common there than the other; if so, we choose that one. If we cannot decide there (both Fe and O may be present at that position equally often), we move on to the next Wyckoff position in alphabetical order and look again, and so on until we can decide. If no position ever lets us decide, the two atom types are equivalent and the choice does not matter: either one can be `x_1` and the other `x_2`, and it will work the same.

The idea is to assign labels so that the same prototype always gets the same labels, while anything that differs gets different labels. We order first by how common an atom type is in the formula; if that is not enough, we go through the Wyckoff positions in alphabetical order and check which atom type is more common at each. The only thing to implement is therefore a comparison that decides whether, e.g., Co comes before or after Mn. First we compare the atom counts: if `a > b` we get 1, if `a < b` we get -1, and if `a = b` we get 0. In the first two cases the order is settled. Otherwise we go through the sorted Wyckoff positions and compare how many times each atom type occurs at each position. As soon as one occurs more often than the other, the order is decided; if that never happens, the two are equal and their order doesn't matter. Sorting the atom names with this comparison gives an order that doesn't depend on the names themselves, only on how often each atom type occurs overall and at the different Wyckoff positions.
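A hedged sketch of such a comparison, using the 1 / -1 / 0 convention described above. The names `make_comparator` and `order_elements` and the input shape (Wyckoff letter mapped to element counts) are assumptions for illustration; the real implementation may be organized differently.

```python
from functools import cmp_to_key


def make_comparator(wyckoff):
    """Build a comparison function for two element names (hypothetical helper).

    `wyckoff` maps a Wyckoff letter to {element: number of atoms}.
    """
    letters = sorted(wyckoff)  # Wyckoff positions in alphabetical order

    def total(element):
        return sum(occupants.get(element, 0) for occupants in wyckoff.values())

    def compare(a, b):
        # 1 if `a` is more common, -1 if less common, 0 if undecidable.
        count_a, count_b = total(a), total(b)
        if count_a != count_b:
            return 1 if count_a > count_b else -1
        # Same total count: check position by position.
        for letter in letters:
            n_a = wyckoff[letter].get(a, 0)
            n_b = wyckoff[letter].get(b, 0)
            if n_a != n_b:
                return 1 if n_a > n_b else -1
        return 0  # truly equivalent, the order does not matter

    return compare


def order_elements(wyckoff):
    """Return the elements with the most common one first."""
    elements = sorted({el for occupants in wyckoff.values() for el in occupants})
    # reverse=True because compare() returns 1 for the more common element.
    return sorted(elements, key=cmp_to_key(make_comparator(wyckoff)), reverse=True)


# Equal counts: decided at the first Wyckoff position.
print(order_elements({"a": {"Fe": 2}, "b": {"O": 2}}))   # ['Fe', 'O']
print(order_elements({"a": {"O": 2}, "b": {"Fe": 2}}))   # ['O', 'Fe']
```

The element names only enter through the initial alphabetical sort, which matters solely when the comparison returns 0, i.e. when the two atom types are equivalent anyway.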

Then we use the position in the sorted array as the label. The label is now a number that depends only on the formula and the Wyckoff positions, not on the actual atom name. So a different formula with the same Wyckoff occupation gets exactly the same labels. We can then compare the Wyckoff positions and labels directly, e.g. an atom `x_1` at position *a* and an atom `x_2` at position *b*: two structures with the same prototype now look exactly the same.
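To illustrate the end result, here is a compact sketch of the whole normalization. It swaps the pairwise comparison for an equivalent tuple sort key (total count first, then the counts per Wyckoff position in alphabetical order). The function name `normalized_wyckoff_sketch`, the input shape, and the example occupations are assumptions, not the actual `get_normalized_wyckoff` interface.

```python
def normalized_wyckoff_sketch(wyckoff):
    """Replace element names by x_1, x_2, ... (illustrative, not the real API).

    `wyckoff` maps a Wyckoff letter to {element: number of atoms}.
    """
    letters = sorted(wyckoff)
    elements = sorted({el for occupants in wyckoff.values() for el in occupants})

    def sort_key(element):
        per_letter = [wyckoff[letter].get(element, 0) for letter in letters]
        # Negated so that larger counts sort first.
        return (-sum(per_letter), [-n for n in per_letter])

    ranked = sorted(elements, key=sort_key)
    labels = {element: f"x_{i}" for i, element in enumerate(ranked, start=1)}
    return {
        letter: {labels[el]: n for el, n in wyckoff[letter].items()}
        for letter in letters
    }


# Two made-up structures with the same occupation pattern...
fe2o2 = {"a": {"Fe": 2}, "b": {"O": 2}}
mn2s2 = {"a": {"S": 2}, "b": {"Mn": 2}}
# ...and one with a different pattern.
other = {"a": {"Fe": 2}, "b": {"O": 1}, "c": {"O": 1}}

print(normalized_wyckoff_sketch(fe2o2))
# {'a': {'x_1': 2}, 'b': {'x_2': 2}}
print(normalized_wyckoff_sketch(fe2o2) == normalized_wyckoff_sketch(mn2s2))
# True
print(normalized_wyckoff_sketch(fe2o2) == normalized_wyckoff_sketch(other))
# False
```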