Commit 3e2639db authored by Thomas Purcell

Fix Command line tutorial image problem

parent 16ab8877
@@ -351,7 +351,9 @@ build-gnu-gcov:
 pages:
   stage: doc_builds
   script:
-    - source cpp_sisso_gnu_py_env/bin/activate
+    - source cpp_sisso_gnu_param_py_env/bin/activate
+    - export LD_LIBRARY_PATH=$HOME/intel/oneapi/intelpython/latest/lib/:$HOME/intel/oneapi/intelpython/latest/lib/python3.7:$LD_LIBRARY_PATH
+    - export PYTHONPATH=$HOME/intel/oneapi/intelpython/latest/lib/python3.7/site-packages/:cpp_sisso_gnu_param_py_env/lib/python3.7/site-packages/
     - cd docs/
     - make html
     - mv _build/html/ ../public
...
-Acknowledgements
----
+# Acknowledgements
 `SISSO++` would not be possible without the following packages:
@@ -9,7 +8,7 @@ Acknowledgements
 - [googletest](https://github.com/google/googletest)
 - [NLopt](http://github.com/stevengj/nlopt)
-# How to cite these packages:
+## How to cite these packages:
 Please make sure to give credit to the right people when using `SISSO++`:
 For classification problems cite:
...
@@ -69,7 +69,6 @@ Below are reconstructions of both files for this calculation (To see the file cl
 <details>
 <summary>feature_space/SIS_summary.txt</summary>
-```text
 # FEAT_ID Score Feature Expression
 0 0.920868624862486329 ((E_HOMO_B / r_p_A) / (r_sigma + r_p_B))
 1 0.919657911026942054 ((|r_pi - r_s_A|) / (r_s_A^3))
@@ -115,12 +114,10 @@ Below are reconstructions of both files for this calculation (To see the file cl
 38 0.262777418218664849 ((E_LUMO_A^6) / (r_p_B^3))
 39 0.253659279222423484 ((E_LUMO_A / r_p_B) * (E_LUMO_B * E_LUMO_A))
 #-----------------------------------------------------------------------
-```
 </details>
 <details>
 <summary>feature_space/selected_features.txt</summary>
-```text
 # FEAT_ID Feature Postfix Expression (RPN)
 0 9|14|div|18|15|add|div
 1 19|12|abd|12|cb|div
@@ -166,7 +163,7 @@ Below are reconstructions of both files for this calculation (To see the file cl
 38 10|sp|15|cb|div
 39 10|15|div|11|10|mult|mult
 #-----------------------------------------------------------------------
-```
 </details>
 In both files the change in rung is represented by the commented out dashed (--) line.
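The postfix strings in `selected_features.txt` can be expanded back into the infix expressions listed in `SIS_summary.txt`. The short Python sketch below illustrates the idea; it assumes the integer tokens index the primary features in the column order of `data.csv` and that the operator abbreviations (`abd` for `abs_diff`, `sp` for `six_pow`, `cb` for cube, and so on) mean what the expressions above suggest, so treat it as an illustration rather than the parser used by `SISSO++` itself.

```python
# Sketch: expand a postfix (RPN) string such as "9|14|div|18|15|add|div" into
# an infix expression. The feature ordering and operator abbreviations below
# are assumptions inferred from the files reproduced above.

FEATURES = [
    "Z_A", "Z_B", "period_A", "period_B", "IP_A", "IP_B", "EA_A", "EA_B",
    "E_HOMO_A", "E_HOMO_B", "E_LUMO_A", "E_LUMO_B", "r_s_A", "r_s_B",
    "r_p_A", "r_p_B", "r_d_A", "r_d_B", "r_sigma", "r_pi",
]

BINARY = {
    "add": "({} + {})", "sub": "({} - {})", "abd": "(|{} - {}|)",
    "mult": "({} * {})", "div": "({} / {})",
}
UNARY = {
    "sq": "({}^2)", "cb": "({}^3)", "sp": "({}^6)", "sqrt": "sqrt({})",
    "cbrt": "cbrt({})", "inv": "(1 / {})", "abs": "(|{}|)",
    "exp": "exp({})", "neg_exp": "exp(-{})", "log": "ln({})",
    "sin": "sin({})", "cos": "cos({})",
}

def postfix_to_infix(postfix: str) -> str:
    """Rebuild the infix expression with a stack, one token at a time."""
    stack = []
    for token in postfix.split("|"):
        if token.isdigit():                     # primary feature index
            stack.append(FEATURES[int(token)])
        elif token in BINARY:                   # binary operator: pop two operands
            b, a = stack.pop(), stack.pop()
            stack.append(BINARY[token].format(a, b))
        elif token in UNARY:                    # unary operator: pop one operand
            stack.append(UNARY[token].format(stack.pop()))
        else:
            raise ValueError(f"Unknown token: {token}")
    return stack.pop()

print(postfix_to_infix("9|14|div|18|15|add|div"))
# -> ((E_HOMO_B / r_p_A) / (r_sigma + r_p_B)), i.e. feature 0 above
```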
@@ -183,7 +180,6 @@ An example of these files is provided here:
 <details>
 <summary>models/train_dim_2_model_0.dat</summary>
-```csv
 # c0 + a0 * ((EA_B - IP_A) * (|r_sigma - r_s_B|)) + a1 * ((E_HOMO_B / r_p_A) / (r_sigma + r_p_B))
 # Property Label: $E_{RS} - E_{ZB}$; Unit of the Property: eV
 # RMSE: 0.0931540779192557; Max AE: 0.356632500670745
@@ -281,7 +277,6 @@ An example of these files is provided here:
 SeZn , 2.631368992806530e-01, 2.463580576975095e-01, 7.384497385908948e-01, -2.320488278555971e+00
 TeZn , 2.450012951740060e-01, 1.776248032825628e-01, 2.763715059556858e+00, -2.304848319397327e+00
-```
 </details>
@@ -295,7 +290,6 @@ To do this we have to modify the `sisso.json` file to automatically leave out a
 <details>
 <summary> updated sisso.json file</summary>
-```json
 {
     "data_file": "data.csv",
     "property_key": "E_RS - E_ZB",
@@ -311,7 +305,7 @@ To do this we have to modify the `sisso.json` file to automatically leave out a
     "leave_out_inds": [],
     "opset": ["add", "sub", "abs_diff", "mult", "div", "inv", "abs", "exp", "log", "sin", "cos", "sq", "cb", "six_pow", "sqrt", "cbrt", "neg_exp"]
 }
-```
 </details>
 Now let's make ten cross-validation directories in the working directory, copy `data.csv` and `sisso.json` into each of them, and run a separate calculation in each one, as sketched below.
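One way to script this setup is the following Python sketch. The executable name (`sisso++`) and its command-line arguments are assumptions here; adapt the final call to however you normally launch the code (for example through `mpiexec`).

```python
# Sketch: create ten cross-validation directories, copy the inputs into each,
# and launch a separate calculation in every one. The executable name and
# invocation ("sisso++ sisso.json") are assumptions -- adjust to your setup.
import shutil
import subprocess
from pathlib import Path

for ii in range(10):
    cv_dir = Path(f"cv_{ii}")
    cv_dir.mkdir(exist_ok=True)
    shutil.copy("data.csv", cv_dir)
    shutil.copy("sisso.json", cv_dir)
    # Run each calculation from inside its own directory
    subprocess.run(["sisso++", "sisso.json"], cwd=cv_dir, check=True)
```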
@@ -339,7 +333,6 @@ A full example of the testing set output file is reproduced below:
 <details>
 <summary>The test data file cv_0/models/test_dim_2_model_0.dat</summary>
-```csv
 # c0 + a0 * ((E_HOMO_B / r_p_A) / (r_sigma + r_p_B))
 # Property Label: $E_{RS} - E_{ZB}$; Unit of the Property: eV
 # RMSE: 0.212994478440008; Max AE: 0.442277221520276
@@ -363,7 +356,6 @@ A full example of the testing set output file is reproduced below:
 BrNa , -1.264287278827400e-01, -1.888626375989341e-01, -8.644123624074346e-01
 CSi , 6.690237272359810e-01, 3.948280949265375e-01, -3.351692484156472e+00
-```
 </details>
 ## Analyzing the Results with Python
@@ -388,7 +380,9 @@ To visualize these results we will also use `plot_validation_rmse` at the end, a
 Here is an example of the `plot_validation_rmse` output:
 <details>
 <summary> Cross-Validation results </summary>
-![image](command_line/cv/cv_10_error.png)
+![image](./command_line/cv/cv_10_error.png)
 </details>
 These initial results, particularly the high standard error of the mean for the 1D and 3D models, indicate that more cross-validation samples are needed (note: your values will differ because the random samples differ), so let's increase the total number of samples to 100 and redo the analysis.
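If you also want the raw numbers behind such a plot, the test-set RMSE reported in the header of each output file can be aggregated by hand. The sketch below assumes the `cv_*/models/test_dim_*_model_0.dat` layout and the `# RMSE: ...; Max AE: ...` header line shown above; none of these helpers are part of the `SISSO++` Python API.

```python
# Sketch: collect the test-set RMSE from the header of every
# cv_*/models/test_dim_<D>_model_0.dat file and report the mean validation
# RMSE and its standard error per dimension.
import re
from pathlib import Path

import numpy as np

rmse = {}                                       # dimension -> list of test RMSEs
for dat in sorted(Path(".").glob("cv_*/models/test_dim_*_model_0.dat")):
    dim = int(re.search(r"test_dim_(\d+)_model", dat.name).group(1))
    with open(dat) as fh:
        for line in fh:
            match = re.search(r"# RMSE:\s*([0-9.eE+-]+)", line)
            if match:
                rmse.setdefault(dim, []).append(float(match.group(1)))
                break

for dim in sorted(rmse):
    vals = np.array(rmse[dim])
    sem = vals.std(ddof=1) / np.sqrt(len(vals))  # standard error of the mean
    print(f"{dim}D: validation RMSE = {vals.mean():.4f} +/- {sem:.4f} eV")
```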
@@ -417,7 +411,8 @@ As can be seen from the standard error measurements the results are now reasonab
 <details>
 <summary> Converged cross-validation results </summary>
-![image](command_line/cv/cv_100_error.png)
+![image](./command_line/cv/cv_100_error.png)
 </details>
 Because the validation errors for the three- and four-dimensional models are within each other's error bars, and the standard error increases when going to the fourth dimension, we conclude that the three-dimensional model has the ideal complexity.
@@ -433,7 +428,8 @@ To see the distributions for this system we run
 <details>
 <summary> Distribution of Errors </summary>
-![image](./command_line/error_cv.png)
+![image](./command_line/cv/error_cv_dist.png)
 </details>
 One thing that stands out in the plot is the large error seen in a single point for both the one- and two-dimensional models.
@@ -596,7 +592,7 @@ The result of which is shown below:
 <details>
 <summary> Final 3D model </summary>
-![image](./command_line/3d_model.png)
+![image](./command_line/cv/3d_model.png)
 </details>
 Additionally, you can output the model as a Matlab function or a LaTeX string using the following commands.
@@ -610,9 +606,9 @@ Additionally you can generate a output the model as a Matlab function or a LaTeX
 A copy of the generated matlab function is below.
 <details>
-<summary> Final 3D model </summary>
+<summary> Matlab function of the Final 3D model </summary>
-```matlab
 function P = model(X)
 % Returns the value of E_{RS} - E_{ZB} = c0 + a0 * ((r_d_B / r_d_A) * (r_p_B * E_HOMO_A)) + a1 * ((IP_A^3) * (|r_sigma - r_s_B|)) + a2 * ((IP_A / r_p_A) / (r_p_B + r_p_A))
 %
@@ -650,5 +646,5 @@ A copy of the generated matlab function is below.
 P = reshape(c0 + a0 * f0 + a1 * f1 + a2 * f2, [], 1);
 end
-```
 </details>
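For reference, the same descriptor can also be evaluated outside of Matlab. The Python sketch below re-implements the formula quoted in the header of the generated function; the coefficients are placeholders to be copied from the corresponding model file (e.g. `models/train_dim_3_model_0.dat`), and the column names are assumed to be the primary-feature names without their unit suffixes.

```python
# Sketch: evaluate the 3D model
#   E_RS - E_ZB = c0 + a0*((r_d_B/r_d_A)*(r_p_B*E_HOMO_A))
#                    + a1*((IP_A^3)*|r_sigma - r_s_B|)
#                    + a2*((IP_A/r_p_A)/(r_p_B + r_p_A))
# The coefficients are placeholders; copy the fitted values from the model
# file or the generated Matlab function.
import numpy as np
import pandas as pd

C0, A0, A1, A2 = 0.0, 0.0, 0.0, 0.0   # <-- replace with the fitted coefficients

def predict(df: pd.DataFrame) -> np.ndarray:
    """Evaluate the 3D descriptor; columns assumed to be unit-free feature names."""
    f0 = (df["r_d_B"] / df["r_d_A"]) * (df["r_p_B"] * df["E_HOMO_A"])
    f1 = df["IP_A"] ** 3 * np.abs(df["r_sigma"] - df["r_s_B"])
    f2 = (df["IP_A"] / df["r_p_A"]) / (df["r_p_B"] + df["r_p_A"])
    return (C0 + A0 * f0 + A1 * f1 + A2 * f2).to_numpy()
```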
-Performing Classification with SISSO++
----
+# Performing Classification with SISSO++
 Finally, besides regression problems, `SISSO++` can be used to solve classification problems.
 As an example of this we will adapt the previous example by replacing the property with an identifier of whether the material favors the rock-salt or zinc-blende structure, and change the calculation type to `classification`.
 It is important to note that while this problem only has two classes, multi-class classification is also possible.
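A minimal sketch of the corresponding input changes is shown below. The `data_file` and `property_key` settings follow directly from the text; the key used to select the calculation type is written here as `calc_type`, which is an assumption that should be checked against the `SISSO++` input documentation.

```python
# Sketch: adapt sisso.json for the classification run. The "calc_type" key
# name is an assumption -- verify it against the SISSO++ input documentation.
import json

with open("sisso.json") as fh:
    inputs = json.load(fh)

inputs["data_file"] = "data_class.csv"   # data file with the Class column
inputs["property_key"] = "Class"         # classify RS vs. ZB instead of regressing E_RS - E_ZB
inputs["calc_type"] = "classification"   # assumed key for the calculation type

with open("sisso_class.json", "w") as fh:
    json.dump(inputs, fh, indent=4)
```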
@@ -10,7 +10,6 @@ Here is the updated data file, with the property `E_RS - E_ZB (eV)` replaced wit
 <details>
 <summary>Here is the full data_class.csv file for the calculation</summary>
-```
 # Material,Class,Z_A (nuc_charge) ,Z_B (nuc_charge) ,period_A,period_B,IP_A (eV_IP) ,IP_B (eV_IP) ,EA_A (eV_IP),EA_B (eV_IP) ,E_HOMO_A (eV) ,E_HOMO_B (eV) ,E_LUMO_A (eV),E_LUMO_B (eV) ,r_s_A ,r_s_B ,r_p_A ,r_p_B ,r_d_A ,r_d_B,r_sigma ,r_pi
 AgBr,0,47,35,5,4,-8.0580997467,-12.649600029,-1.66659998894,-3.73930001259,-4.71000003815,-8.00100040436,-0.479000002146,0.708000004292,1.32000005245,0.75,1.87999999523,0.879999995232,2.97000002861,1.87000000477,1.570000052448,0.689999938012
 AgCl,0,47,17,5,3,-8.0580997467,-13.9018001556,-1.66659998894,-3.97079992294,-4.71000003815,-8.69999980927,-0.479000002146,0.574000000954,1.32000005245,0.680000007153,1.87999999523,0.759999990463,2.97000002861,1.66999995708,1.760000050064,0.63999992609
@@ -94,7 +93,7 @@ Here is the updated data file, with the property `E_RS - E_ZB (eV)` replaced wit
 SZn,1,30,16,4,3,-10.1354999542,-11.7951002121,1.08070003986,-2.84489989281,-6.21700000763,-7.10599994659,-1.19400000572,0.64200001955,1.10000002384,0.740000009537,1.54999995232,0.850000023842,2.25,2.36999988556,1.059999942781,0.559999942785
 SeZn,1,30,34,4,4,-10.1354999542,-10.9460000992,1.08070003986,-2.75099992752,-6.21700000763,-6.65399980545,-1.19400000572,1.31599998474,1.10000002384,0.800000011921,1.54999995232,0.949999988079,2.25,2.18000006676,0.89999997616,0.599999904638
 TeZn,1,30,52,4,5,-10.1354999542,-9.86670017242,1.08070003986,-2.66599988937,-6.21700000763,-6.10900020599,-1.19400000572,0.0989999994636,1.10000002384,0.939999997616,1.54999995232,1.13999998569,2.25,1.83000004292,0.569999992854,0.649999916554
-```
 </details>
 ## Running `SISSO++` for Classification problems
@@ -148,7 +147,6 @@ The two output files stored in `feature_space/` are also very similar, with the
 <details>
 <summary>feature_space/SIS_summary.txt</summary>
-```text
 # FEAT_ID Score Feature Expression
 0 2.00218777423865069 (r_sigma + r_p_B)
 1 2.0108802733799549 (r_pi - r_p_A)
@@ -192,7 +190,8 @@ The two output files stored in `feature_space/` are also very similar, with the
 38 -0.999999633027590651 (period_A * Z_B)
 39 -0.999999625788926316 (Z_B / EA_A)
 #-----------------------------------------------------------------------
-```
+</details>
 Additionally the model files change to better represent the classifier.
 The largest changes are in the header, where the coefficients now represent the linear decision boundaries calculated using support-vector machines (SVM).
@@ -200,7 +199,6 @@ The estimated property vector in this case refers to the predicted class from SV
 <details>
 <summary>models/train_dim_2_model_0.dat</summary>
-```csv
 # [(EA_B * Z_A), (r_sigma + r_p_B)]
 # Property Label: $$Class$$; Unit of the Property: Unitless
 # # Samples in Convex Hull Overlap Region: 0;# Samples SVM Misclassified: 0
@@ -298,7 +296,6 @@ The estimated property vector in this case refers to the predicted class from SV
 SeZn , 1.000000000000000e+00, 1.000000000000000e+00, -8.252999782560001e+01, 1.849999964239000e+00
 TeZn , 1.000000000000000e+00, 1.000000000000000e+00, -7.997999668110000e+01, 1.709999978544000e+00
-```
 </details>
 ## Updating the SVM Model Using `sklearn`
@@ -332,11 +329,14 @@ These changes are a result of different SVM libraries leading to slightly differ
 <summary> `SISSO++` Classification </summary>
 ![image](./classification/sissopp.png)
 </details>
 <details>
 <summary> sklearn SVM </summary>
 ![image](./classification/c_1000.png)
 </details>
 However, as we decrease the value of `c`, an increasing number of points becomes misclassified, suggesting that the model is potentially over-fitting the data.
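The comparison above can be reproduced with a few lines of `scikit-learn`. The sketch below refits a linear SVM on the two selected descriptors, `(EA_B * Z_A)` and `(r_sigma + r_p_B)`, read from the training model file; the assumed column layout of that file is a simplification, and `C` is the regularization strength referred to as `c` in the text.

```python
# Sketch: refit the 2D classification model with scikit-learn's linear SVM and
# count misclassified materials as C is decreased. Reading the descriptor
# values straight from the SISSO++ model file (assumed columns: sample ID,
# class, predicted class, feature 0, feature 1) is a simplifying assumption.
import numpy as np
from sklearn.svm import SVC

data = np.genfromtxt("models/train_dim_2_model_0.dat",
                     comments="#", delimiter=",", usecols=(1, 3, 4))
y, X = data[:, 0], data[:, 1:]

for C in (1000.0, 1.0, 0.01):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    n_missed = int((clf.predict(X) != y).sum())
    print(f"C = {C:g}: {n_missed} misclassified training points")
```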
...
Image files changed in this commit:
docs/tutorial/command_line/cv/cv_100_error.png (17.8 KiB)
docs/tutorial/command_line/cv/cv_10_error.png (16.3 KiB)
docs/tutorial/command_line/cv/error_cv_dist.png (32 KiB)