Commit 3e2639db authored by Thomas Purcell's avatar Thomas Purcell

Fix Command line tutorial image problem

parent 16ab8877
......@@ -351,7 +351,9 @@ build-gnu-gcov:
pages:
stage: doc_builds
script:
- source cpp_sisso_gnu_py_env/bin/activate
- source cpp_sisso_gnu_param_py_env/bin/activate
- export LD_LIBRARY_PATH=$HOME/intel/oneapi/intelpython/latest/lib/:$HOME/intel/oneapi/intelpython/latest/lib/python3.7:$LD_LIBRARY_PATH
- export PYTHONPATH=$HOME/intel/oneapi/intelpython/latest/lib/python3.7/site-packages/:cpp_sisso_gnu_param_py_env/lib/python3.7/site-packages/
- cd docs/
- make html
- mv _build/html/ ../public
......
Acknowledgements
---
# Acknowledgements
`SISSO++` would not be possible without the following packages:
......@@ -9,7 +8,7 @@ Acknowledgements
- [googletest](https://github.com/google/googletest)
- [NLopt](http://github.com/stevengj/nlopt)
# How to cite these packages:
## How to cite these packages:
Please make sure to give credit to the right people when using `SISSO++`:
For classification problems cite:
......
......@@ -69,7 +69,6 @@ Below are reconstructions of both files for this calculation (To see the file cl
<details>
<summary>feature_space/SIS_summary.txt</summary>
```text
# FEAT_ID Score Feature Expression
0 0.920868624862486329 ((E_HOMO_B / r_p_A) / (r_sigma + r_p_B))
1 0.919657911026942054 ((|r_pi - r_s_A|) / (r_s_A^3))
......@@ -115,12 +114,10 @@ Below are reconstructions of both files for this calculation (To see the file cl
38 0.262777418218664849 ((E_LUMO_A^6) / (r_p_B^3))
39 0.253659279222423484 ((E_LUMO_A / r_p_B) * (E_LUMO_B * E_LUMO_A))
#-----------------------------------------------------------------------
```
</details>
<details>
<summary>feature_space/selected_features.txt</summary>
```text
# FEAT_ID Feature Postfix Expression (RPN)
0 9|14|div|18|15|add|div
1 19|12|abd|12|cb|div
......@@ -166,7 +163,7 @@ Below are reconstructions of both files for this calculation (To see the file cl
38 10|sp|15|cb|div
39 10|15|div|11|10|mult|mult
#-----------------------------------------------------------------------
```
</details>
In both files, the change in rung is marked by a commented-out, dashed (--) line.
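To make the postfix (RPN) strings in `selected_features.txt` easier to read, they can be decoded back into infix form. Below is a minimal Python sketch; the operator tokens (`add`, `div`, `abd` for absolute difference, `cb` for cube, `sp` for sixth power) are inferred from the examples above, and feature indices are rendered as `f<idx>`.

```python
def postfix_to_infix(expr):
    """Decode a SISSO++ postfix feature string (tokens separated by '|')
    into an infix expression; feature indices are rendered as f<idx>.
    Operator names inferred from the tutorial's example output."""
    binary = {"add": "+", "sub": "-", "mult": "*", "div": "/"}
    stack = []
    for tok in expr.split("|"):
        if tok in binary:
            b, a = stack.pop(), stack.pop()
            stack.append(f"({a} {binary[tok]} {b})")
        elif tok == "abd":          # absolute difference
            b, a = stack.pop(), stack.pop()
            stack.append(f"|{a} - {b}|")
        elif tok == "cb":           # cube
            stack.append(f"({stack.pop()}^3)")
        elif tok == "sp":           # sixth power
            stack.append(f"({stack.pop()}^6)")
        else:                       # anything else is a feature index
            stack.append(f"f{tok}")
    return stack.pop()

print(postfix_to_infix("9|14|div|18|15|add|div"))
```

For example, feature 0 above decodes to `((f9 / f14) / (f18 + f15))`, matching the expression `((E_HOMO_B / r_p_A) / (r_sigma + r_p_B))` in `SIS_summary.txt`.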
......@@ -183,7 +180,6 @@ An example of these files is provided here:
<details>
<summary>models/train_dim_2_model_0.dat</summary>
```csv
# c0 + a0 * ((EA_B - IP_A) * (|r_sigma - r_s_B|)) + a1 * ((E_HOMO_B / r_p_A) / (r_sigma + r_p_B))
# Property Label: $E_{RS} - E_{ZB}$; Unit of the Property: eV
# RMSE: 0.0931540779192557; Max AE: 0.356632500670745
......@@ -281,7 +277,6 @@ An example of these files is provided here:
SeZn , 2.631368992806530e-01, 2.463580576975095e-01, 7.384497385908948e-01, -2.320488278555971e+00
TeZn , 2.450012951740060e-01, 1.776248032825628e-01, 2.763715059556858e+00, -2.304848319397327e+00
```
</details>
......@@ -295,7 +290,6 @@ To do this we have to modify the `sisso.json` file to automatically leave out a
<details>
<summary> updated sisso.json file</summary>
```json
{
"data_file": "data.csv",
"property_key": "E_RS - E_ZB",
......@@ -311,7 +305,7 @@ To do this we have to modify the `sisso.json` file to automatically leave out a
"leave_out_inds": [],
"opset": ["add", "sub", "abs_diff", "mult", "div", "inv", "abs", "exp", "log", "sin", "cos", "sq", "cb", "six_pow", "sqrt", "cbrt", "neg_exp"]
}
```
</details>
Now let's create ten cross-validation directories in the working directory, copy `data.csv` and `sisso.json` into each of them, and run a separate calculation in each one.
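The directory setup can be scripted; a minimal Python sketch is shown below. The file names come from the tutorial, but the `sisso++` launch command (commented out) is an assumption and should match however you ran the earlier calculations.

```python
import os
import shutil

# Create ten cross-validation working directories (cv_0 .. cv_9) and
# copy the input files into each one.
for i in range(10):
    cv_dir = f"cv_{i}"
    os.makedirs(cv_dir, exist_ok=True)
    for fname in ("data.csv", "sisso.json"):
        if os.path.isfile(fname):
            shutil.copy(fname, cv_dir)
    # Then launch a separate calculation inside each directory, e.g.
    # (binary name and MPI invocation assumed):
    #   cd cv_<i> && mpiexec -n 2 sisso++ sisso.json
```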
......@@ -339,7 +333,6 @@ A full example of the testing set output file is reproduced below:
<details>
<summary>The test data file cv_0/models/test_dim_2_model_0.dat</summary>
```csv
# c0 + a0 * ((E_HOMO_B / r_p_A) / (r_sigma + r_p_B))
# Property Label: $E_{RS} - E_{ZB}$; Unit of the Property: eV
# RMSE: 0.212994478440008; Max AE: 0.442277221520276
......@@ -363,7 +356,6 @@ A full example of the testing set output file is reproduced below:
BrNa , -1.264287278827400e-01, -1.888626375989341e-01, -8.644123624074346e-01
CSi , 6.690237272359810e-01, 3.948280949265375e-01, -3.351692484156472e+00
```
</details>
## Analyzing the Results with Python
......@@ -388,7 +380,9 @@ To visualize these results we will also use `plot_validation_rmse` at the end, a
Here is an example of the `plot_validation_rmse` output:
<details>
<summary> Cross-Validation results </summary>
![image](command_line/cv/cv_10_error.png)
![image](./command_line/cv/cv_10_error.png)
</details>
These initial results, particularly the high standard error of the mean for the 1D and 3D models, indicate that more cross-validation samples are needed (note: your values will differ because the random samples differ). Let's therefore increase the total number of samples to 100 and redo the analysis.
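As a quick check of whether more samples are needed, the standard error of the mean of the validation RMSEs can be computed directly; it shrinks roughly as one over the square root of the number of samples. A minimal sketch with illustrative RMSE values (your CV runs will produce different numbers):

```python
import math

def standard_error(samples):
    """Standard error of the mean for a list of validation RMSEs."""
    n = len(samples)
    mean = sum(samples) / n
    # Sample variance (Bessel-corrected), then SEM = sqrt(var / n)
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return math.sqrt(var / n)

# Illustrative values only, not the tutorial's actual CV results:
rmses = [0.21, 0.18, 0.25, 0.19, 0.23]
print(round(standard_error(rmses), 4))
```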
......@@ -417,7 +411,8 @@ As can be seen from the standard error measurements the results are now reasonab
<details>
<summary> Converged cross-validation results </summary>
![image](command_line/cv/cv_100_error.png)
![image](./command_line/cv/cv_100_error.png)
</details>
Because the validation errors of the three- and four-dimensional models lie within each other's error bars, and the standard error increases when going to the fourth dimension, we conclude that the three-dimensional model has the ideal complexity.
......@@ -433,7 +428,8 @@ To see the distributions for this system we run
<details>
<summary> Distribution of Errors </summary>
![image](./command_line/error_cv.png)
![image](./command_line/cv/error_cv_dist.png)
</details>
One thing that stands out in the plot is the large error of a single point for both the one- and two-dimensional models.
......@@ -596,7 +592,7 @@ The result of which is shown below:
<details>
<summary> Final 3D model </summary>
![image](./command_line/3d_model.png)
![image](./command_line/cv/3d_model.png)
</details>
Additionally, you can output the model as a Matlab function or a LaTeX string using the following commands.
......@@ -610,9 +606,9 @@ Additionally you can generate a output the model as a Matlab function or a LaTeX
A copy of the generated Matlab function is below.
<details>
<summary> Final 3D model </summary>
<summary> Matlab function of the Final 3D model </summary>
```matlab
function P = model(X)
% Returns the value of E_{RS} - E_{ZB} = c0 + a0 * ((r_d_B / r_d_A) * (r_p_B * E_HOMO_A)) + a1 * ((IP_A^3) * (|r_sigma - r_s_B|)) + a2 * ((IP_A / r_p_A) / (r_p_B + r_p_A))
%
......@@ -650,5 +646,5 @@ A copy of the generated matlab function is below.
P = reshape(c0 + a0 * f0 + a1 * f1 + a2 * f2, [], 1);
end
```
</details>
Performing Classification with SISSO++
---
# Performing Classification with SISSO++
Finally, besides regression problems, `SISSO++` can also be used to solve classification problems.
As an example, we will adapt the previous calculation by replacing the property with an identifier of whether the material favors the rock-salt or the zinc-blende structure, and change the calculation type to `classification`.
It is important to note that while this problem only has two classes, multi-class classification is also possible.
......@@ -10,7 +10,6 @@ Here is the updated data file, with the property `E_RS - E_ZB (eV)` replaced wit
<details>
<summary>Here is the full data_class.csv file for the calculation</summary>
```
# Material,Class,Z_A (nuc_charge) ,Z_B (nuc_charge) ,period_A,period_B,IP_A (eV_IP) ,IP_B (eV_IP) ,EA_A (eV_IP),EA_B (eV_IP) ,E_HOMO_A (eV) ,E_HOMO_B (eV) ,E_LUMO_A (eV),E_LUMO_B (eV) ,r_s_A ,r_s_B ,r_p_A ,r_p_B ,r_d_A ,r_d_B,r_sigma ,r_pi
AgBr,0,47,35,5,4,-8.0580997467,-12.649600029,-1.66659998894,-3.73930001259,-4.71000003815,-8.00100040436,-0.479000002146,0.708000004292,1.32000005245,0.75,1.87999999523,0.879999995232,2.97000002861,1.87000000477,1.570000052448,0.689999938012
AgCl,0,47,17,5,3,-8.0580997467,-13.9018001556,-1.66659998894,-3.97079992294,-4.71000003815,-8.69999980927,-0.479000002146,0.574000000954,1.32000005245,0.680000007153,1.87999999523,0.759999990463,2.97000002861,1.66999995708,1.760000050064,0.63999992609
......@@ -94,7 +93,7 @@ Here is the updated data file, with the property `E_RS - E_ZB (eV)` replaced wit
SZn,1,30,16,4,3,-10.1354999542,-11.7951002121,1.08070003986,-2.84489989281,-6.21700000763,-7.10599994659,-1.19400000572,0.64200001955,1.10000002384,0.740000009537,1.54999995232,0.850000023842,2.25,2.36999988556,1.059999942781,0.559999942785
SeZn,1,30,34,4,4,-10.1354999542,-10.9460000992,1.08070003986,-2.75099992752,-6.21700000763,-6.65399980545,-1.19400000572,1.31599998474,1.10000002384,0.800000011921,1.54999995232,0.949999988079,2.25,2.18000006676,0.89999997616,0.599999904638
TeZn,1,30,52,4,5,-10.1354999542,-9.86670017242,1.08070003986,-2.66599988937,-6.21700000763,-6.10900020599,-1.19400000572,0.0989999994636,1.10000002384,0.939999997616,1.54999995232,1.13999998569,2.25,1.83000004292,0.569999992854,0.649999916554
```
</details>
## Running `SISSO++` for Classification problems
......@@ -148,7 +147,6 @@ The two output files stored in `feature_space/` are also very similar, with the
<details>
<summary>feature_space/SIS_summary.txt</summary>
```text
# FEAT_ID Score Feature Expression
0 2.00218777423865069 (r_sigma + r_p_B)
1 2.0108802733799549 (r_pi - r_p_A)
......@@ -192,7 +190,8 @@ The two output files stored in `feature_space/` are also very similar, with the
38 -0.999999633027590651 (period_A * Z_B)
39 -0.999999625788926316 (Z_B / EA_A)
#-----------------------------------------------------------------------
```
</details>
Additionally, the model files change to better represent the classifier.
The largest changes are in the header, where the coefficients now represent the linear decision boundaries calculated using support-vector machines (SVM).
......@@ -200,7 +199,6 @@ The estimated property vector in this case refers to the predicted class from SV
<details>
<summary>models/train_dim_2_model_0.dat</summary>
```csv
# [(EA_B * Z_A), (r_sigma + r_p_B)]
# Property Label: $$Class$$; Unit of the Property: Unitless
# # Samples in Convex Hull Overlap Region: 0;# Samples SVM Misclassified: 0
......@@ -298,7 +296,6 @@ The estimated property vector in this case refers to the predicted class from SV
SeZn , 1.000000000000000e+00, 1.000000000000000e+00, -8.252999782560001e+01, 1.849999964239000e+00
TeZn , 1.000000000000000e+00, 1.000000000000000e+00, -7.997999668110000e+01, 1.709999978544000e+00
```
</details>
## Updating the SVM Model Using `sklearn`
......@@ -332,11 +329,14 @@ These changes are a result of different SVM libraries leading to slightly differ
<summary> `SISSO++` Classification </summary>
![image](./classification/sissopp.png)
</details>
<details>
<summary> sklearn SVM </summary>
![image](./classification/c_1000.png)
</details>
However, as we decrease the value of `c`, an increasing number of points becomes misclassified, suggesting the model is potentially overfitting the data.
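The effect of `c` can be reproduced with `sklearn` directly. Below is a hedged sketch using toy, linearly separable 2D points standing in for the two selected SISSO descriptors (the values are illustrative, not the tutorial's data); with a large `C` the linear SVM separates the training data perfectly, and shrinking `C` relaxes the fit.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable 2D data in place of the two selected descriptors.
X = np.array([[-1.0, -1.0], [-1.2, -0.8], [-0.9, -1.1],
              [1.0, 1.0], [0.8, 1.2], [1.1, 0.9]])
y = np.array([0, 0, 0, 1, 1, 1])

miss = {}
for C in (1000.0, 1.0, 0.001):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Count training points the fitted boundary misclassifies.
    miss[C] = int((clf.predict(X) != y).sum())
    print(f"C={C}: {miss[C]} training points misclassified")
```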
......