Commit 73b4ea01 authored by Niklas Leimeroth's avatar Niklas Leimeroth
Browse files

comment remove_jobs and deletes

parent 785f2ff7
%% Cell type:markdown id:widespread-perry tags:
%% Cell type:markdown id:compound-aquatic tags:
# Fitting an EAM potential
EAM potentials are pair functionals.
In a generalised form they are equal to Finnis-Sinclair, effective medium theory or glue potentials. Their total energy can be written as
$E = \sum_{ij}V(r_{ij}) + \sum_i F(\rho_i)$
with
$\rho_i = \sum_j \rho(r_{ij})$
The original functions for V, $\rho$ and F were derived from different theories, but they can be chosen freely.
Fitting is done using atomicrex https://atomicrex.org. In the fit process an objective or cost function is minimized. The objective function is defined as
$\chi^2 = \sum_i w_i r_i$
where $w_i$ is a weight and $r_i$ is a residual that describes the difference to target values. This residual can be defined in different ways, so it is not possible to simply compare the residual for different fitting processes or codes. A more in depth explanation and some examples can be found on https://atomicrex.org/overview.html#objective-function.
%% Cell type:code id:domestic-convert tags:
%% Cell type:code id:convenient-feeding tags:
``` python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pyiron import Project, ase_to_pyiron
```
%% Cell type:markdown id:hazardous-organic tags:
%% Cell type:markdown id:hollywood-indication tags:
### Import the training data
%% Cell type:code id:capital-rental tags:
%% Cell type:code id:united-preview tags:
``` python
data_pr = Project("../../datasets")
if len(data_pr.job_table()) == 0:
data_pr.unpack("Cu_training_archive")
data_pr.job_table()
```
%% Output
id status chemicalformula job \
0 759 finished None df1_A1_A2_A3_EV_elast_phon
1 760 finished None df3_10k
2 761 finished None df2_1k
subjob projectpath \
0 /df1_A1_A2_A3_EV_elast_phon /home/niklas/pyiron/projects/
1 /df3_10k /home/niklas/pyiron/projects/
2 /df2_1k /home/niklas/pyiron/projects/
project \
0 pyiron_potentialfit/datasets/imported_datasets/Cu_database/
1 pyiron_potentialfit/datasets/imported_datasets/Cu_database/
2 pyiron_potentialfit/datasets/imported_datasets/Cu_database/
timestart timestop totalcputime computer \
0 2021-02-08 10:33:52.341472 None None zora@cmti001#1
1 2021-02-08 10:33:53.993230 None None zora@cmti001#1
2 2021-02-08 10:33:54.435308 None None zora@cmti001#1
hamilton hamversion parentid masterid
0 TrainingContainer 0.4 None None
1 TrainingContainer 0.4 None None
2 TrainingContainer 0.4 None None
%% Cell type:code id:chinese-interest tags:
%% Cell type:code id:rotary-voice tags:
``` python
data_job = data_pr.load("df1_A1_A2_A3_EV_elast_phon")
df = data_job.to_pandas()
```
%% Cell type:code id:enormous-courage tags:
%% Cell type:code id:reliable-chain tags:
``` python
pr = Project("WorkshopPotential")
pr.remove_jobs()
#pr.remove_jobs()
```
%% Output
Are you sure you want to delete all jobs from 'WorkshopPotential'? y/(n) y
%% Cell type:markdown id:unlimited-strike tags:
%% Cell type:markdown id:early-accounting tags:
### Create an atomicrex job
%% Cell type:code id:satisfied-meditation tags:
%% Cell type:code id:photographic-source tags:
``` python
job = pr.create_job(pr.job_type.Atomicrex, "PotentialDF1")
```
%% Cell type:markdown id:strange-window tags:
%% Cell type:markdown id:southern-portuguese tags:
### Add the structures that should be fitted.
It is possible to assign different weights to certain structures or properties, depending on what should be investigated using the potential. Here every structure has the same weight, but the force vector with N*3 values is normalized to have the same total weight as the single value energy. Therefore it is divided by the number of atoms.
%% Cell type:code id:national-still tags:
%% Cell type:code id:practical-nickname tags:
``` python
for id, row in df.iterrows():
struct = ase_to_pyiron(row.atoms)
s = job.structures.add_structure(struct, f"id{id}", relative_weight=1)
s.fit_properties.add_FitProperty("atomic-energy", target_value=row.energy/row.number_of_atoms, relative_weight=1)
s.fit_properties.add_FitProperty("atomic-forces", target_value=row.forces, relative_weight=1/row.number_of_atoms)
```
%% Cell type:markdown id:joined-active tags:
%% Cell type:markdown id:discrete-forth tags:
### Define the type of potential and necessary functions.
In this case an eam potential is fitted.
%% Cell type:code id:chinese-poison tags:
%% Cell type:code id:capable-associate tags:
``` python
job.potential = job.factories.potentials.eam_potential()
```
%% Cell type:markdown id:sweet-overhead tags:
%% Cell type:markdown id:superior-bench tags:
It is necessary to define a pair potential, an electronic density function and an embedding function.
For all of those it is possible to choose between different functional forms.
Classic pair potentials are physically motivated and have a very limited number of paramaters that are derived from a experimentally measured quantity.
Splines or polynomials offer more flexibility, but can lead to unphysical oscillations or overfitting. Compared with the machine learning potentials shown later the number of parameters is very low no matter which functions you choose and the problem is highly non linear.
In this case a generalized morse function is used for the pair interaction. It has the form
$(\frac{D_0}{S-1}exp(-\beta \sqrt{2S}(r-r_0))-\frac{D_0S}{S-1}exp(-\beta\sqrt{2/S}(r-r_0)))+\delta $
The parameters in the morse potential can be derived from phyiscal quantities, but in this case they are just educated guesses. For example $r_0$ is the equilibrium distance of a dimer. The nearest neighbor distance in fcc Cu is about 2.5 $\mathring A$ so it is taken as initial value.
In the case of analytic functions the initial parameter choices should not matter too much, since the functional form is constrained.
The electronic density and embedding function will be splines. Depending on the properties that are calculated other functional forms could give better results. The inital parameters require more testing and hand tuning than the parameters of analytic functions.
%% Cell type:code id:miniature-semester tags:
%% Cell type:code id:fantastic-input tags:
``` python
V = job.factories.functions.morse_B(identifier="V_CuCu", D0=0.35, r0=2.5, beta=2, S=2, delta=0)
```
%% Cell type:markdown id:adaptive-andorra tags:
%% Cell type:markdown id:moderate-italic tags:
Pre defined functions like the morse function can be plotted to see the influence of the initial parameter values
%% Cell type:code id:geological-defense tags:
%% Cell type:code id:rising-posting tags:
``` python
V.plot()
```
%% Output
(<Figure size 720x504 with 1 Axes>,
<AxesSubplot:xlabel='r [$\\AA$]', ylabel='func(r)'>)
%% Cell type:markdown id:intimate-anger tags:
%% Cell type:markdown id:southeast-brazil tags:
Additionally it is a good idea to define limits for the parameters. This is optional for local minimizers, but the fit can quickly run away without limits. Global optimizers typically require them to constrain the sampled space.
%% Cell type:code id:facial-electric tags:
%% Cell type:code id:beginning-monthly tags:
``` python
V.parameters.D0.min_val = 0
V.parameters.D0.max_val = 2
V.parameters.r0.min_val = 1.5
V.parameters.r0.max_val = 3.0
V.parameters.S.min_val = 1.1
V.parameters.S.max_val = 10.0
V.parameters.delta.min_val = -1
V.parameters.delta.max_val = 1
V.parameters.beta.min_val = 0.1
V.parameters.beta.max_val = 10
```
%% Cell type:markdown id:offshore-julian tags:
%% Cell type:markdown id:meaning-badge tags:
Additionally a screening function needs to be defined for the morse potential
%% Cell type:code id:found-reconstruction tags:
%% Cell type:code id:agricultural-girlfriend tags:
``` python
V.screening = job.factories.functions.exp_A_screening(identifier="V_cutoff", cutoff=7)
```
%% Cell type:markdown id:indie-assembly tags:
%% Cell type:markdown id:talented-ukraine tags:
The electron density is chosen to be a spline function. The cutoff has to be defined. Derivatives left and right are optional, they default to 0. For the right cutoff this is fine, since the forces should smoothly go to 0. For the left this is not necessarily the best choice, since the function value should increase at very close distances. Very large absolute values will lead to osciallations and should be avoided.
%% Cell type:code id:experimental-hanging tags:
%% Cell type:code id:surprised-completion tags:
``` python
rho = job.factories.functions.spline(identifier="rho_CuCu", cutoff=7, derivative_left=-1)
```
%% Cell type:markdown id:extreme-questionnaire tags:
%% Cell type:markdown id:incorporate-pepper tags:
For a spline function it is necessary to define node points. They can be equally spaced or sampled with higher density around turning points, f.e. the first neighbor distance.
Too few nodepoints mean low flexibilty, too many lead to overfitting. This requires some testing to find an optimal choice.
%% Cell type:code id:disturbed-realtor tags:
%% Cell type:code id:quantitative-demonstration tags:
``` python
rho_nodes = np.linspace(0.5, 7.0, 7).round(2)
rho_nodes
```
%% Output
array([0.5 , 1.58, 2.67, 3.75, 4.83, 5.92, 7. ])
%% Cell type:markdown id:taken-month tags:
%% Cell type:markdown id:prime-commonwealth tags:
The nodes need initial values. The electron density should be proportional to $e^{-r}$, so this function is chosen to calculate them.
%% Cell type:code id:understood-finance tags:
%% Cell type:code id:loaded-claim tags:
``` python
decaying_exp = lambda r: np.exp(-r)
rho_initial = decaying_exp(rho_nodes)
```
%% Cell type:markdown id:statistical-stylus tags:
%% Cell type:markdown id:fatty-recorder tags:
A density can't be negative so the lower limit is set to 0. The upper limit is chosen to be 3 times the initial values. These choices aswell as the choice for $e^{-r}$ as initial values are somewhat arbitrary. The electron density from single atoms does not directly influence the calculated energies and forces, instead the summed up density at some place is used in the embedding function, so the final numerical values are an interplay between electron density and embedding function. Since the latter will also be a spline function it can only be defined for a certain range of rho values as node points. Therefore it is better to limit the range of electron density values and define larger limits for the embedding function instead.
%% Cell type:code id:crucial-portfolio tags:
%% Cell type:code id:suited-middle tags:
``` python
rho_mins = np.zeros((len(rho_nodes)))
rho_maxs = 3*rho_initial.round(6)
rho.parameters.create_from_arrays(rho_nodes, rho_initial, min_vals=rho_mins, max_vals=rho_maxs)
```
%% Cell type:raw id:meaning-bowling tags:
%% Cell type:raw id:executed-coast tags:
Finally the last node point at the cutoff range is set to 0 and fitting is disabled to prevent a discontinuous change of energy at the cutoff.
%% Cell type:code id:thrown-leone tags:
%% Cell type:code id:rational-execution tags:
``` python
rho.parameters["node_7.0"].start_val = 0
rho.parameters["node_7.0"].enabled = False
```
%% Cell type:markdown id:declared-junior tags:
%% Cell type:markdown id:continuing-elements tags:
$-\sqrt(\rho)$ can be used as initial guess for the embedding energy, which is taken from second moment approximation tight binding.
The node points have to be chosen in a range compatible to the electron density. This can be estimated by calculating it for a densely packed structure.
Alternatively atomicrex writes the maximum electron density of all structures to the output. This can be used as a hint for the node points for consequent fits.
Everything else is similar to the electron density.
%% Cell type:code id:enhanced-throw tags:
%% Cell type:code id:thrown-statement tags:
``` python
F = job.factories.functions.spline(identifier="F_CuCu", cutoff=5)
F_nodes = np.linspace(0.0, 5.0, 7).round(2) #9 is worse 11 is worse 7 is best
F_init = -np.sqrt(F_nodes)
F_maxs = np.zeros(len(F_nodes))
F_mins = -np.ones(len(F_nodes))*5
F.parameters.create_from_arrays(F_nodes, F_init, F_mins, F_maxs)
F.parameters["node_0.0"].enabled=False
F.parameters["node_0.0"].start_val = 0
F.parameters["node_5.0"].enabled=False
F.parameters["node_5.0"].start_val = 0
```