Commit 6ed03b8e authored by Thomas Purcell

Update classification docs per Chris' comments

parent 841966b0
...@@ -125,23 +125,23 @@ mpiexec -n 2 sisso++ sisso.json ...@@ -125,23 +125,23 @@ mpiexec -n 2 sisso++ sisso.json
``` ```
and get the following on-screen output and get the following on-screen output
``` ```
time input_parsing: 0.000935793 s time input_parsing: 0.00104308 s
time to generate feat sapce: 0.00244188 s time to generate feat sapce: 0.00736403 s
Projection time: 0.000730991 s Projection time: 0.0013411 s
Time to get best features on rank : 0.000519991 s Time to get best features on rank : 0.000585079 s
Complete final combination/selection from all ranks: 0.00114894 s Complete final combination/selection from all ranks: 0.000355005 s
Time for SIS: 0.0025301 s Time for SIS: 0.00245714 s
Time for l0-norm: 0.215944 s Time for l0-norm: 0.105334 s
Projection time: 0.000841856 s Projection time: 0.00138497 s
Time to get best features on rank : 0.00506878 s Time to get best features on rank : 0.00287414 s
Complete final combination/selection from all ranks: 0.00351715 s Complete final combination/selection from all ranks: 0.000135899 s
Time for SIS: 0.00997591 s Time for SIS: 0.00476503 s
Time for l0-norm: 0.951096 s Time for l0-norm: 2.79099 s
Percent of training data in the convex overlap region: 2.43902% Percent of training data in the convex overlap region: 2.43902%
[(r_sigma + r_p_B)] [(r_sigma + r_p_B)]
Percent of training data in the convex overlap region: 0% Percent of training data in the convex overlap region: 0%
[(Z_B / EA_A), (r_sigma + r_s_B)] [(EA_A * Z_B), (r_sigma + r_p_B)]
``` ```
As with the regression problems, the standard output provides information about what step the calculation just finished and how long it took to complete so you can see where a job failed or ran out of time. As with the regression problems, the standard output provides information about what step the calculation just finished and how long it took to complete so you can see where a job failed or ran out of time.
However, the final summary now lists the features that best separate the classes, that is, those with the fewest points inside the overlap region of the classes' convex hulls. However, the final summary now lists the features that best separate the classes, that is, those with the fewest points inside the overlap region of the classes' convex hulls.
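To make the "convex overlap region" concrete, here is a minimal one-dimensional sketch. It is not the SISSO++ implementation, which handles the general multi-dimensional case; in one dimension the convex hull of a class reduces to the interval between its smallest and largest feature value. The helper `overlap_fraction` is a hypothetical name, boundary points are assumed to count as inside the region, and the eight `(r_sigma + r_p_B)` values are copied from the `models/train_dim_2_model_0.dat` listing in this tutorial.

```python
def overlap_fraction(values, labels):
    """Fraction of samples inside the overlap of two 1D class hulls."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch only handles two classes")
    # In one dimension the convex hull of a class is the interval [min, max].
    hulls = []
    for cls in classes:
        cls_values = [v for v, lab in zip(values, labels) if lab == cls]
        hulls.append((min(cls_values), max(cls_values)))
    # The overlap region is the intersection of the two intervals.
    lower = max(h[0] for h in hulls)
    upper = min(h[1] for h in hulls)
    if lower > upper:
        return 0.0  # the hulls do not touch, so no sample can be inside
    # Assumption: points on the boundary count as inside the overlap region.
    n_inside = sum(1 for v in values if lower <= v <= upper)
    return n_inside / len(values)


# name: (class, r_sigma + r_p_B), a subset of the tutorial's model file
samples = {
    "AgBr": (0, 2.450000047680),
    "BaO": (0, 4.320000201465),
    "MgSe": (0, 2.430000007159),
    "MgTe": (0, 2.290000021464),
    "AgI": (1, 2.300000071522),
    "C2": (1, 0.629999995232),
    "CdS": (1, 2.230000019073),
    "ClCu": (1, 2.199999988077),
}
labels = [cls for cls, _ in samples.values()]
values = [val for _, val in samples.values()]

percent = 100.0 * overlap_fraction(values, labels)
print(f"Percent of this subset in the overlap region: {percent:.2f}%")
```

In this subset only MgTe and AgI fall inside the overlap interval. Going by the full 82-sample table, the same two points appear to be the only ones inside, and 2/82 is about 2.439%, which agrees with the value printed for the one-dimensional model.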
...@@ -151,8 +151,8 @@ The two output files stored in `feature_space/` are also very similar, with the ...@@ -151,8 +151,8 @@ The two output files stored in `feature_space/` are also very similar, with the
# FEAT_ID Score Feature Expression # FEAT_ID Score Feature Expression
0 2.00218777423865069 (r_sigma + r_p_B) 0 2.00218777423865069 (r_sigma + r_p_B)
1 2.0108802733799549 (r_pi - r_p_A) 1 2.0108802733799549 (|r_pi - r_p_A|)
2 2.0108802733799549 (|r_pi - r_p_A|) 2 2.0108802733799549 (r_pi - r_p_A)
3 3.00521883927864941 (r_pi * r_sigma) 3 3.00521883927864941 (r_pi * r_sigma)
4 6.0271211617331506 (r_sigma / IP_B) 4 6.0271211617331506 (r_sigma / IP_B)
5 6.02820376741344255 (r_sigma + r_s_B) 5 6.02820376741344255 (r_sigma + r_s_B)
...@@ -180,19 +180,20 @@ The two output files stored in `feature_space/` are also very similar, with the ...@@ -180,19 +180,20 @@ The two output files stored in `feature_space/` are also very similar, with the
26 -0.999999978575254467 (Z_B * Z_A) 26 -0.999999978575254467 (Z_B * Z_A)
27 -0.999999973721653945 (EA_B^6) 27 -0.999999973721653945 (EA_B^6)
28 -0.999999961553741268 (E_HOMO_A^6) 28 -0.999999961553741268 (E_HOMO_A^6)
29 -0.999999902601353075 (IP_B * Z_A) 29 -0.99999991416031242 (IP_B^3)
30 -0.999999878198415182 (r_d_A^6) 30 -0.999999902601353075 (IP_B * Z_A)
31 -0.999999858299492561 (IP_A * Z_A) 31 -0.999999878198415182 (r_d_A^6)
32 -0.99999985529594615 (Z_B / E_LUMO_B) 32 -0.999999858299492561 (IP_A * Z_A)
33 -0.999999850065982798 (E_HOMO_B * Z_A) 33 -0.99999985529594615 (Z_B / E_LUMO_B)
34 -0.999999771428597528 (period_B * Z_A) 34 -0.999999850065982798 (E_HOMO_B * Z_A)
35 -0.999999756076769386 (E_HOMO_A * Z_A) 35 -0.999999771428597528 (period_B * Z_A)
36 -0.999999699570055189 (EA_B * Z_A) 36 -0.999999756076769386 (E_HOMO_A * Z_A)
37 -0.99999967830096359 (EA_A * Z_B) 37 -0.999999734902030979 (E_HOMO_B^3)
38 -0.999999633027590651 (period_A * Z_B) 38 -0.999999699570055189 (EA_B * Z_A)
39 -0.999999625788926316 (Z_B / EA_A) 39 -0.99999967830096359 (EA_A * Z_B)
#----------------------------------------------------------------------- #-----------------------------------------------------------------------
</details> </details>
Additionally the model files change to better represent the classifier. Additionally the model files change to better represent the classifier.
...@@ -201,129 +202,137 @@ The estimated property vector in this case refers to the predicted class from SV ...@@ -201,129 +202,137 @@ The estimated property vector in this case refers to the predicted class from SV
<details> <details>
<summary>models/train_dim_2_model_0.dat</summary> <summary>models/train_dim_2_model_0.dat</summary>
# [(EA_B * Z_A), (r_sigma + r_p_B)] # [(EA_A * Z_B), (r_sigma + r_p_B)]
# Property Label: $$Class$$; Unit of the Property: Unitless # Property Label: $Class$; Unit of the Property: Unitless
# # Samples in Convex Hull Overlap Region: 0;# Samples SVM Misclassified: 0 # # Samples in Convex Hull Overlap Region: 0;# Samples SVM Misclassified: 0
# Decision Boundaries # Decision Boundaries
# Task w0 w1 b # Task w0 w1 b
# # all_0, -1.218479788468588e-03, -1.840577490326880e+00, 4.197450511898939e+00, # all__0.0_1.0, -1.213391554552964e-02, -1.090594288515569e+01, 2.501183842953395e+01,
# Feature Rung, Units, and Expressions # Feature Rung, Units, and Expressions
# 0; 1; eV_IP * nuc_charge; 7|0|mult; (EA_B * Z_A); $\left(EA_{B} Z_{A}\right)$; (EA_B .* Z_A); EA_B,Z_A # 0; 1; eV_IP * nuc_charge; 6|1|mult; (EA_A * Z_B); $\left(EA_{A} Z_{B}\right)$; (EA_A .* Z_B); EA_A,Z_B
# 1; 1; Unitless; 18|15|add; (r_sigma + r_p_B); $\left(r_{sigma} + r_{p, B}\right)$; (r_sigma + r_p_B); r_sigma,r_p_B # 1; 1; Unitless; 18|15|add; (r_sigma + r_p_B); $\left(r_{sigma} + r_{p, B}\right)$; (r_sigma + r_p_B); r_sigma,r_p_B
# Number of Samples Per Task # Number of Samples Per Task
# Task, n_mats_train # Task, n_mats_train
# # all, 82 # all , 82
# Sample ID , Property Value , Property Value (EST) , Feature 0 Value , Feature 1 Value # Sample ID , Property Value , Property Value (EST) , Feature 0 Value , Feature 1 Value
AgBr , 0.000000000000000e+00, 0.000000000000000e+00, -1.757471005917300e+02, 2.450000047680000e+00 AgBr , 0.000000000000000e+00, 0.000000000000000e+00, -5.833099961290000e+01, 2.450000047680000e+00
AgCl , 0.000000000000000e+00, 0.000000000000000e+00, -1.866275963781800e+02, 2.520000040527000e+00 AgCl , 0.000000000000000e+00, 0.000000000000000e+00, -2.833219981198000e+01, 2.520000040527000e+00
AgF , 0.000000000000000e+00, 0.000000000000000e+00, -2.008544983864900e+02, 2.790000051256000e+00 AgF , 0.000000000000000e+00, 0.000000000000000e+00, -1.499939990046000e+01, 2.790000051256000e+00
AgI , 1.000000000000000e+00, 1.000000000000000e+00, -1.651344988344000e+02, 2.300000071522000e+00 AgI , 1.000000000000000e+00, 1.000000000000000e+00, -8.832979941382000e+01, 2.300000071522000e+00
AlAs , 1.000000000000000e+00, 1.000000000000000e+00, -2.390960025792000e+01, 1.629999995228000e+00 AlAs , 1.000000000000000e+00, 1.000000000000000e+00, -1.031250000000000e+01, 1.629999995228000e+00
AlN , 1.000000000000000e+00, 1.000000000000000e+00, -2.427749931815000e+01, 1.939999997612000e+00 AlN , 1.000000000000000e+00, 1.000000000000000e+00, -2.187500000000000e+00, 1.939999997612000e+00
AlP , 1.000000000000000e+00, 1.000000000000000e+00, -2.495999944204000e+01, 1.650000035759000e+00 AlP , 1.000000000000000e+00, 1.000000000000000e+00, -4.687500000000000e+00, 1.650000035759000e+00
AlSb , 1.000000000000000e+00, 1.000000000000000e+00, -2.400709939004000e+01, 1.480000019070000e+00 AlSb , 1.000000000000000e+00, 1.000000000000000e+00, -1.593750000000000e+01, 1.480000019070000e+00
AsGa , 1.000000000000000e+00, 1.000000000000000e+00, -5.701520061504000e+01, 1.470000028615000e+00 AsGa , 1.000000000000000e+00, 1.000000000000000e+00, -3.567299902452000e+00, 1.470000028615000e+00
AsB , 1.000000000000000e+00, 1.000000000000000e+00, -9.196000099200001e+00, 1.289999961847000e+00 AsB , 1.000000000000000e+00, 1.000000000000000e+00, -3.544200003135000e+00, 1.289999961847000e+00
BN , 1.000000000000000e+00, 1.000000000000000e+00, -9.337499737750001e+00, 1.099999964237000e+00 BN , 1.000000000000000e+00, 1.000000000000000e+00, -7.518000006650001e-01, 1.099999964237000e+00
BP , 1.000000000000000e+00, 1.000000000000000e+00, -9.599999785400000e+00, 1.130000054836000e+00 BP , 1.000000000000000e+00, 1.000000000000000e+00, -1.611000001425000e+00, 1.130000054836000e+00
BSb , 1.000000000000000e+00, 1.000000000000000e+00, -9.233499765400000e+00, 1.820000052445000e+00 BSb , 1.000000000000000e+00, 1.000000000000000e+00, -5.477400004845000e+00, 1.820000052445000e+00
BaO , 0.000000000000000e+00, 0.000000000000000e+00, -1.683303947449600e+02, 4.320000201465000e+00 BaO , 0.000000000000000e+00, 0.000000000000000e+00, 2.223999977112000e+00, 4.320000201465000e+00
BaS , 0.000000000000000e+00, 0.000000000000000e+00, -1.593143939973600e+02, 4.040000200273000e+00 BaS , 0.000000000000000e+00, 0.000000000000000e+00, 4.447999954224000e+00, 4.040000200273000e+00
BaSe , 0.000000000000000e+00, 0.000000000000000e+00, -1.540559959411200e+02, 3.980000197889000e+00 BaSe , 0.000000000000000e+00, 0.000000000000000e+00, 9.451999902726000e+00, 3.980000197889000e+00
BaTe , 0.000000000000000e+00, 0.000000000000000e+00, -1.492959938047200e+02, 3.840000212194000e+00 BaTe , 0.000000000000000e+00, 0.000000000000000e+00, 1.445599985122800e+01, 3.840000212194000e+00
BeO , 1.000000000000000e+00, 1.000000000000000e+00, -1.202359962464000e+01, 1.830000072725000e+00 BeO , 1.000000000000000e+00, 1.000000000000000e+00, 5.044000148776000e+00, 1.830000072725000e+00
BeS , 1.000000000000000e+00, 1.000000000000000e+00, -1.137959957124000e+01, 1.550000071533000e+00 BeS , 1.000000000000000e+00, 1.000000000000000e+00, 1.008800029755200e+01, 1.550000071533000e+00
BeSe , 1.000000000000000e+00, 1.000000000000000e+00, -1.100399971008000e+01, 1.490000069149000e+00 BeSe , 1.000000000000000e+00, 1.000000000000000e+00, 2.143700063229800e+01, 1.490000069149000e+00
BeTe , 1.000000000000000e+00, 1.000000000000000e+00, -1.066399955748000e+01, 1.350000083454000e+00 BeTe , 1.000000000000000e+00, 1.000000000000000e+00, 3.278600096704400e+01, 1.350000083454000e+00
C2 , 1.000000000000000e+00, 1.000000000000000e+00, -5.234399914740000e+00, 6.299999952320000e-01 C2 , 1.000000000000000e+00, 1.000000000000000e+00, -5.234399914740000e+00, 6.299999952320000e-01
CaO , 0.000000000000000e+00, 0.000000000000000e+00, -6.011799812320000e+01, 3.619999915355000e+00 CaO , 0.000000000000000e+00, 0.000000000000000e+00, 2.431200027464000e+00, 3.619999915355000e+00
CaS , 0.000000000000000e+00, 0.000000000000000e+00, -5.689799785620000e+01, 3.339999914163000e+00 CaS , 0.000000000000000e+00, 0.000000000000000e+00, 4.862400054928000e+00, 3.339999914163000e+00
CaSe , 0.000000000000000e+00, 0.000000000000000e+00, -5.501999855040000e+01, 3.279999911779000e+00 CaSe , 0.000000000000000e+00, 0.000000000000000e+00, 1.033260011672200e+01, 3.279999911779000e+00
CaTe , 0.000000000000000e+00, 0.000000000000000e+00, -5.331999778740000e+01, 3.139999926084000e+00 CaTe , 0.000000000000000e+00, 0.000000000000000e+00, 1.580280017851600e+01, 3.139999926084000e+00
CdO , 0.000000000000000e+00, 0.000000000000000e+00, -1.442831954956800e+02, 2.510000020265000e+00 CdO , 0.000000000000000e+00, 0.000000000000000e+00, 6.709599971768000e+00, 2.510000020265000e+00
CdS , 1.000000000000000e+00, 1.000000000000000e+00, -1.365551948548800e+02, 2.230000019073000e+00 CdS , 1.000000000000000e+00, 1.000000000000000e+00, 1.341919994353600e+01, 2.230000019073000e+00
CdSe , 1.000000000000000e+00, 1.000000000000000e+00, -1.320479965209600e+02, 2.170000016689000e+00 CdSe , 1.000000000000000e+00, 1.000000000000000e+00, 2.851579988001400e+01, 2.170000016689000e+00
CdTe , 1.000000000000000e+00, 1.000000000000000e+00, -1.279679946897600e+02, 2.030000030994000e+00 CdTe , 1.000000000000000e+00, 1.000000000000000e+00, 4.361239981649200e+01, 2.030000030994000e+00
BrCs , 0.000000000000000e+00, 0.000000000000000e+00, -2.056615006924500e+02, 4.870000123980000e+00 BrCs , 0.000000000000000e+00, 0.000000000000000e+00, -1.993599951266000e+01, 4.870000123980000e+00
ClCs , 0.000000000000000e+00, 0.000000000000000e+00, -2.183939957617000e+02, 4.940000116827000e+00 ClCs , 0.000000000000000e+00, 0.000000000000000e+00, -9.683199763292000e+00, 4.940000116827000e+00
CsF , 0.000000000000000e+00, 0.000000000000000e+00, -2.350424981118500e+02, 5.210000127556000e+00 CsF , 0.000000000000000e+00, 0.000000000000000e+00, -5.126399874684000e+00, 5.210000127556000e+00
CsI , 0.000000000000000e+00, 0.000000000000000e+00, -1.932424986360000e+02, 4.720000147822000e+00 CsI , 0.000000000000000e+00, 0.000000000000000e+00, -3.018879926202800e+01, 4.720000147822000e+00
BrCu , 1.000000000000000e+00, 1.000000000000000e+00, -1.084397003651100e+02, 2.129999995230000e+00 BrCu , 1.000000000000000e+00, 1.000000000000000e+00, -5.734749913200000e+01, 2.129999995230000e+00
ClCu , 1.000000000000000e+00, 1.000000000000000e+00, -1.151531977652600e+02, 2.199999988077000e+00 ClCu , 1.000000000000000e+00, 1.000000000000000e+00, -2.785449957840000e+01, 2.199999988077000e+00
CuF , 0.000000000000000e+00, 0.000000000000000e+00, -1.239314990044300e+02, 2.469999998806000e+00 CuF , 0.000000000000000e+00, 0.000000000000000e+00, -1.474649977680000e+01, 2.469999998806000e+00
CuI , 1.000000000000000e+00, 1.000000000000000e+00, -1.018914992808000e+02, 1.980000019072000e+00 CuI , 1.000000000000000e+00, 1.000000000000000e+00, -8.684049868560000e+01, 1.980000019072000e+00
GaN , 1.000000000000000e+00, 1.000000000000000e+00, -5.789249837405000e+01, 1.780000030999000e+00 GaN , 1.000000000000000e+00, 1.000000000000000e+00, -7.566999793080000e-01, 1.780000030999000e+00
GaP , 1.000000000000000e+00, 1.000000000000000e+00, -5.951999866948000e+01, 1.490000069146000e+00 GaP , 1.000000000000000e+00, 1.000000000000000e+00, -1.621499955660000e+00, 1.490000069146000e+00
GaSb , 1.000000000000000e+00, 1.000000000000000e+00, -5.724769854548000e+01, 1.320000052457000e+00 GaSb , 1.000000000000000e+00, 1.000000000000000e+00, -5.513099849244000e+00, 1.320000052457000e+00
Ge2 , 1.000000000000000e+00, 1.000000000000000e+00, -3.036800003052800e+01, 1.159999966620000e+00 Ge2 , 1.000000000000000e+00, 1.000000000000000e+00, -3.036800003052800e+01, 1.159999966620000e+00
CGe , 1.000000000000000e+00, 1.000000000000000e+00, -2.791679954528000e+01, 1.439999997614000e+00 CGe , 1.000000000000000e+00, 1.000000000000000e+00, -5.694000005724000e+00, 1.439999997614000e+00
GeSi , 1.000000000000000e+00, 1.000000000000000e+00, -3.177599906921600e+01, 1.139999985693000e+00 GeSi , 1.000000000000000e+00, 1.000000000000000e+00, -1.328600001335600e+01, 1.139999985693000e+00
AsIn , 1.000000000000000e+00, 1.000000000000000e+00, -9.012080097216000e+01, 1.779999971388000e+00 AsIn , 1.000000000000000e+00, 1.000000000000000e+00, -8.457900077121000e+00, 1.779999971388000e+00
InN , 1.000000000000000e+00, 1.000000000000000e+00, -9.150749742995001e+01, 2.089999973772000e+00 InN , 1.000000000000000e+00, 1.000000000000000e+00, -1.794100016359000e+00, 2.089999973772000e+00
InP , 1.000000000000000e+00, 1.000000000000000e+00, -9.407999789691999e+01, 1.800000011919000e+00 InP , 1.000000000000000e+00, 1.000000000000000e+00, -3.844500035055000e+00, 1.800000011919000e+00
InSb , 1.000000000000000e+00, 1.000000000000000e+00, -9.048829770092000e+01, 1.629999995230000e+00 InSb , 1.000000000000000e+00, 1.000000000000000e+00, -1.307130011918700e+01, 1.629999995230000e+00
BrK , 0.000000000000000e+00, 0.000000000000000e+00, -7.104670023921000e+01, 3.820000171660000e+00 BrK , 0.000000000000000e+00, 0.000000000000000e+00, -2.174549937248500e+01, 3.820000171660000e+00
ClK , 0.000000000000000e+00, 0.000000000000000e+00, -7.544519853586000e+01, 3.890000164507000e+00 ClK , 0.000000000000000e+00, 0.000000000000000e+00, -1.056209969520700e+01, 3.890000164507000e+00
FK , 0.000000000000000e+00, 0.000000000000000e+00, -8.119649934773000e+01, 4.160000175236000e+00 FK , 0.000000000000000e+00, 0.000000000000000e+00, -5.591699838639000e+00, 4.160000175236000e+00
IK , 0.000000000000000e+00, 0.000000000000000e+00, -6.675649952879999e+01, 3.670000195502000e+00 IK , 0.000000000000000e+00, 0.000000000000000e+00, -3.292889904976300e+01, 3.670000195502000e+00
BrLi , 0.000000000000000e+00, 0.000000000000000e+00, -1.121790003777000e+01, 2.899999976160000e+00 BrLi , 0.000000000000000e+00, 0.000000000000000e+00, -2.443349897863000e+01, 2.899999976160000e+00
ClLi , 0.000000000000000e+00, 0.000000000000000e+00, -1.191239976882000e+01, 2.969999969007000e+00 ClLi , 0.000000000000000e+00, 0.000000000000000e+00, -1.186769950390600e+01, 2.969999969007000e+00
FLi , 0.000000000000000e+00, 0.000000000000000e+00, -1.282049989701000e+01, 3.239999979736000e+00 FLi , 0.000000000000000e+00, 0.000000000000000e+00, -6.282899737362000e+00, 3.239999979736000e+00
ILi , 0.000000000000000e+00, 0.000000000000000e+00, -1.054049992560000e+01, 2.750000000002000e+00 ILi , 0.000000000000000e+00, 0.000000000000000e+00, -3.699929845335400e+01, 2.750000000002000e+00
MgO , 0.000000000000000e+00, 0.000000000000000e+00, -3.607079887392000e+01, 2.770000010735000e+00 MgO , 0.000000000000000e+00, 0.000000000000000e+00, 5.539999961856000e+00, 2.770000010735000e+00
MgS , 0.000000000000000e+00, 0.000000000000000e+00, -3.413879871372000e+01, 2.490000009543000e+00 MgS , 0.000000000000000e+00, 0.000000000000000e+00, 1.107999992371200e+01, 2.490000009543000e+00
MgSe , 0.000000000000000e+00, 0.000000000000000e+00, -3.301199913024000e+01, 2.430000007159000e+00 MgSe , 0.000000000000000e+00, 0.000000000000000e+00, 2.354499983788800e+01, 2.430000007159000e+00
MgTe , 0.000000000000000e+00, 0.000000000000000e+00, -3.199199867244000e+01, 2.290000021464000e+00 MgTe , 0.000000000000000e+00, 0.000000000000000e+00, 3.600999975206400e+01, 2.290000021464000e+00
BrNa , 0.000000000000000e+00, 0.000000000000000e+00, -4.113230013849000e+01, 3.559999942780000e+00 BrNa , 0.000000000000000e+00, 0.000000000000000e+00, -2.504949897527000e+01, 3.559999942780000e+00
ClNa , 0.000000000000000e+00, 0.000000000000000e+00, -4.367879915234000e+01, 3.629999935627000e+00 ClNa , 0.000000000000000e+00, 0.000000000000000e+00, -1.216689950227400e+01, 3.629999935627000e+00
FNa , 0.000000000000000e+00, 0.000000000000000e+00, -4.700849962237000e+01, 3.899999946356000e+00 FNa , 0.000000000000000e+00, 0.000000000000000e+00, -6.441299736497999e+00, 3.899999946356000e+00
INa , 0.000000000000000e+00, 0.000000000000000e+00, -3.864849972720000e+01, 3.409999966622000e+00 INa , 0.000000000000000e+00, 0.000000000000000e+00, -3.793209844826600e+01, 3.409999966622000e+00
BrRb , 0.000000000000000e+00, 0.000000000000000e+00, -1.383541004658300e+02, 4.690000057220000e+00 BrRb , 0.000000000000000e+00, 0.000000000000000e+00, -2.066399931907500e+01, 4.690000057220000e+00
ClRb , 0.000000000000000e+00, 0.000000000000000e+00, -1.469195971487800e+02, 4.760000050067000e+00 ClRb , 0.000000000000000e+00, 0.000000000000000e+00, -1.003679966926500e+01, 4.760000050067000e+00
FRb , 0.000000000000000e+00, 0.000000000000000e+00, -1.581194987297900e+02, 5.030000060796000e+00 FRb , 0.000000000000000e+00, 0.000000000000000e+00, -5.313599824904999e+00, 5.030000060796000e+00
IRb , 0.000000000000000e+00, 0.000000000000000e+00, -1.299994990824000e+02, 4.540000081062000e+00 IRb , 0.000000000000000e+00, 0.000000000000000e+00, -3.129119896888500e+01, 4.540000081062000e+00
Si2 , 1.000000000000000e+00, 1.000000000000000e+00, -1.390199959278200e+01, 1.129999995230000e+00 Si2 , 1.000000000000000e+00, 1.000000000000000e+00, -1.390199959278200e+01, 1.129999995230000e+00
CSi , 1.000000000000000e+00, 1.000000000000000e+00, -1.221359980106000e+01, 1.430000007151000e+00 CSi , 1.000000000000000e+00, 1.000000000000000e+00, -5.957999825478000e+00, 1.430000007151000e+00
Sn2 , 1.000000000000000e+00, 1.000000000000000e+00, -5.195999741550001e+01, 1.340000033380000e+00 Sn2 , 1.000000000000000e+00, 1.000000000000000e+00, -5.195999741550001e+01, 1.340000033380000e+00
CSn , 1.000000000000000e+00, 1.000000000000000e+00, -4.361999928950000e+01, 1.759999990465000e+00 CSn , 1.000000000000000e+00, 1.000000000000000e+00, -6.235199689860000e+00, 1.759999990465000e+00
GeSn , 1.000000000000000e+00, 1.000000000000000e+00, -4.745000004769999e+01, 1.479999959471000e+00 GeSn , 1.000000000000000e+00, 1.000000000000000e+00, -3.325439834592000e+01, 1.479999959471000e+00
SiSn , 1.000000000000000e+00, 1.000000000000000e+00, -4.964999854565000e+01, 1.459999978544000e+00 SiSn , 1.000000000000000e+00, 1.000000000000000e+00, -1.454879927634000e+01, 1.459999978544000e+00
OSr , 0.000000000000000e+00, 0.000000000000000e+00, -1.142241964340800e+02, 3.999999910595000e+00 OSr , 0.000000000000000e+00, 0.000000000000000e+00, 2.744800090792000e+00, 3.999999910595000e+00
SSr , 0.000000000000000e+00, 0.000000000000000e+00, -1.081061959267800e+02, 3.719999909403000e+00 SSr , 0.000000000000000e+00, 0.000000000000000e+00, 5.489600181584000e+00, 3.719999909403000e+00
SeSr , 0.000000000000000e+00, 0.000000000000000e+00, -1.045379972457600e+02, 3.659999907019000e+00 SeSr , 0.000000000000000e+00, 0.000000000000000e+00, 1.166540038586600e+01, 3.659999907019000e+00
SrTe , 0.000000000000000e+00, 0.000000000000000e+00, -1.013079957960600e+02, 3.519999921324000e+00 SrTe , 0.000000000000000e+00, 0.000000000000000e+00, 1.784120059014800e+01, 3.519999921324000e+00
OZn , 1.000000000000000e+00, 1.000000000000000e+00, -9.017699718480000e+01, 2.189999967815000e+00 OZn , 1.000000000000000e+00, 1.000000000000000e+00, 8.645600318880000e+00, 2.189999967815000e+00
SZn , 1.000000000000000e+00, 1.000000000000000e+00, -8.534699678430000e+01, 1.909999966623000e+00 SZn , 1.000000000000000e+00, 1.000000000000000e+00, 1.729120063776000e+01, 1.909999966623000e+00
SeZn , 1.000000000000000e+00, 1.000000000000000e+00, -8.252999782560001e+01, 1.849999964239000e+00 SeZn , 1.000000000000000e+00, 1.000000000000000e+00, 3.674380135524000e+01, 1.849999964239000e+00
TeZn , 1.000000000000000e+00, 1.000000000000000e+00, -7.997999668110000e+01, 1.709999978544000e+00 TeZn , 1.000000000000000e+00, 1.000000000000000e+00, 5.619640207272000e+01, 1.709999978544000e+00
</details> </details>
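The decision boundary row (`w0`, `w1`, `b`) can be applied by hand to the feature values in the table. The sketch below is an illustration, not part of the `sissopp` API: `decision_value` and `predict_class` are hypothetical helpers, and the sign convention (a positive value maps to class 1) is inferred from the samples in the file rather than taken from the SISSO++ documentation. The coefficients and the five samples are copied from the updated model file above.

```python
# w0, w1, b from the "Decision Boundaries" row of the updated model file
W0 = -1.213391554552964e-02
W1 = -1.090594288515569e+01
B = 2.501183842953395e+01


def decision_value(f0, f1):
    """Signed value of the linear decision function w0*f0 + w1*f1 + b."""
    return W0 * f0 + W1 * f1 + B


def predict_class(f0, f1):
    """Assumed convention: positive decision values belong to class 1."""
    return 1.0 if decision_value(f0, f1) > 0.0 else 0.0


# name: (class, EA_A * Z_B, r_sigma + r_p_B), copied from the table
samples = {
    "AgBr": (0.0, -5.833099961290000e+01, 2.450000047680000e+00),
    "AgI": (1.0, -8.832979941382000e+01, 2.300000071522000e+00),
    "BaO": (0.0, 2.223999977112000e+00, 4.320000201465000e+00),
    "BeO": (1.0, 5.044000148776000e+00, 1.830000072725000e+00),
    "MgTe": (0.0, 3.600999975206400e+01, 2.290000021464000e+00),
}

for name, (true_class, f0, f1) in samples.items():
    value = decision_value(f0, f1)
    print(f"{name:5s} true={true_class:.0f} "
          f"est={predict_class(f0, f1):.0f} decision={value:+.4f}")
```

For these five samples the estimated class matches the `Property Value (EST)` column. AgBr and AgI evaluate to roughly -1 and +1, so they sit close to the margin of the linear SVM.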
### Cross-Validation ## Cross-Validation
While we won't do it here, cross-validation should also be performed for classification problems. While we won't do it here, cross-validation should also be performed for classification problems.
For those calculations the number of misclassified points in the test set is the most important measure of the error. For those calculations the number of misclassified points in the test set is the most important measure of the error.
## Updating the SVM Model Using `sklearn` ## Updating the SVM Model Using `sklearn`
The final decision boundary listed in the classification model is found via a [linear support vector machine (SVM) model](https://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf).
The objective function for SVM (equation 1 in the linked pdf) balances the size of the margin (the distance between the points and the decision boundary) against the number of misclassified points.
In `libsvm` the trade-off between these two components is controlled by the cost parameter, `c`, where a larger `c` prioritizes reducing the number of misclassified points over increasing the margin size.
Because the classification algorithm is based on the overlap region of the convex hulls, the `c` value for the SVM model is set at a fairly high value of 1000.0. Because the classification algorithm is based on the overlap region of the convex hulls, the `c` value for the SVM model is set at a fairly high value of 1000.0.
This will prioritize reducing the number of misclassified points, but does make the model more susceptible to being overfit. While this choice makes sense within the scope of the SISSO algorithm, it may not be the best one for all models.
To account for this, the Python interface can refit the linear SVM using the `svm` module of `sklearn`. To account for this, the Python interface can refit the linear SVM using the `svm` module of `sklearn`.
To do this in Python we need to run and store the updated models into separate files Using this functionality we can vary the `c` parameter from 1.0 to 1000.0 on a log scale and evaluate how well each model performs.
``` Importantly, we could also use `sklearn` to perform cross-validation on the SVM models to help quantify that performance (we will not do that here because the data are linearly separable and a hard margin is appropriate).
To update the SVM models in Python, we need to run and store the updated models into separate files as shown below:
```python
>>> from sissopp.postprocess.classification import update_model_svm >>> from sissopp.postprocess.classification import update_model_svm
>>> model_1 = update_model_svm("models/train_dim_2_model_0.dat", 1.0, 100000, filename="models/train_dim_2_model_0_c_1.dat") >>> model_1 = update_model_svm("models/train_dim_2_model_0.dat", 1.0, 100000, filename="models/train_dim_2_model_0_c_1.dat")
The updated coefficient for the decision boundaries: The updated coefficient for the decision boundaries:
[array([[-2.94570042e-04, -8.09254771e-01, 1.91416311e+00]])] [array([[ 7.34341190e-04, -8.31373991e-01, 2.06766275e+00]])]
>>> model_10 = update_model_svm("models/train_dim_2_model_0.dat", 10.0, 100000, filename="models/train_dim_2_model_0_c_10.dat") >>> model_10 = update_model_svm("models/train_dim_2_model_0.dat", 10.0, 100000, filename="models/train_dim_2_model_0_c_10.dat")
The updated coefficient for the decision boundaries: The updated coefficient for the decision boundaries:
[array([[-9.37486903e-04, -1.73634223e+00, 3.94114178e+00]])] [array([[-2.33513390e-03, -1.83281827e+00, 4.27591385e+00]])]
>>> model_100 = update_model_svm("models/train_dim_2_model_0.dat", 100.0, 100000, filename="models/train_dim_2_model_0_c_100.dat") >>> model_100 = update_model_svm("models/train_dim_2_model_0.dat", 100.0, 100000, filename="models/train_dim_2_model_0_c_100.dat")
The updated coefficient for the decision boundaries: The updated coefficient for the decision boundaries:
[array([[-8.00966318e-03, -3.83350338e+00, 8.58806106e+00]])] [array([[-5.91833282e-03, -4.40795646e+00, 1.01687678e+01]])]
>>> model_1000 = update_model_svm("models/train_dim_2_model_0.dat", 1000.0, 100000, filename="models/train_dim_2_model_0_c_1000.dat") >>> model_1000 = update_model_svm("models/train_dim_2_model_0.dat", 1000.0, 100000, filename="models/train_dim_2_model_0_c_1000.dat")
The updated coefficient for the decision boundaries: The updated coefficient for the decision boundaries:
[array([[-0.01834904, -7.01118464, 15.7364891 ]])] [array([[-1.10004264e-02, -9.01866680e+00, 2.07093411e+01]])]
``` ```
Comparing the final `c=1000.0` results to the ones found by `SISSO++` we see that the coefficients for the decision boundary are slightly different. Comparing the final `c=1000.0` results to the ones found by `SISSO++` we see that the coefficients for the decision boundary are slightly different.
These changes are a result of different SVM libraries leading to slightly different results; however, if we plot both of these models, we see that the boundaries are fairly close to each other suggesting that the changes are minor. These changes are a result of different SVM libraries leading to slightly different results; however, if we plot both of these models, we see that the boundaries are fairly close to each other suggesting that the changes are minor.
...@@ -333,7 +342,7 @@ These changes are a result of different SVM libraries leading to slightly differ ...@@ -333,7 +342,7 @@ These changes are a result of different SVM libraries leading to slightly differ
>>> plot_classification("models/train_dim_2_model_0_c_1000.dat", filename="c_1000.png", fig_settings={"size":{"width": 5.0, "height": 5.0}}).show() >>> plot_classification("models/train_dim_2_model_0_c_1000.dat", filename="c_1000.png", fig_settings={"size":{"width": 5.0, "height": 5.0}}).show()
``` ```
<details> <details>
<summary> `SISSO++` Classification </summary> <summary> SISSO++ Classification </summary>
![image](./classification/sissopp.png) ![image](./classification/sissopp.png)
...@@ -345,7 +354,8 @@ These changes are a result of different SVM libraries leading to slightly differ ...@@ -345,7 +354,8 @@ These changes are a result of different SVM libraries leading to slightly differ
![image](./classification/c_1000.png) ![image](./classification/c_1000.png)
</details> </details>
However, as we decrease the value of `c`, an increasing number of points become misclassified, suggesting the model is potentially over-fitting the data.
However, as we decrease the value of `c`, an increasing number of points become misclassified, suggesting the model is potentially over-fitting the data and would not properly classify new data points.
```python ```python
>>> from sissopp.postprocess.plot.classification import plot_classification >>> from sissopp.postprocess.plot.classification import plot_classification
...@@ -356,31 +366,32 @@ However as we decrease the value of `c` an increasing number of points becomes m ...@@ -356,31 +366,32 @@ However as we decrease the value of `c` an increasing number of points becomes m
>>> plot_classification("models/train_dim_2_model_0_c_1000.dat", filename="c_1000.png", fig_settings={"size":{"width": 5.0, "height": 5.0}}).show() >>> plot_classification("models/train_dim_2_model_0_c_1000.dat", filename="c_1000.png", fig_settings={"size":{"width": 5.0, "height": 5.0}}).show()
``` ```
<details>
<summary> sklearn SVM c=1.0 </summary>
![image](./classification/c_1.png)
</details>
<details>
<summary> sklearn SVM c=10.0 </summary>
![image](./classification/c_10.png)
</details>
<details>
<summary> sklearn SVM c=100.0 </summary>
![image](./classification/c_100.png)
</details>
<details>
<summary> sklearn SVM c=1000.0 </summary>
![image](./classification/c_1000.png)
</details>
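The effect of `c` seen in the plots can also be checked numerically from the printed coefficients. The following is a rough sketch under stated assumptions: each printed array is read as `[w0, w1, b]` (matching the column order of the model file), positive decision values are taken to mean class 1, and `margin_width` and `n_misclassified` are hypothetical helpers rather than `sissopp` functions. Only five samples from the model file are used, so the counts describe this subset and not the full training set.

```python
import math

# Coefficients printed by update_model_svm in the updated docs, read as (w0, w1, b)
coefficients = {
    1.0: (7.34341190e-04, -8.31373991e-01, 2.06766275e+00),
    1000.0: (-1.10004264e-02, -9.01866680e+00, 2.07093411e+01),
}

# (name, class, EA_A * Z_B, r_sigma + r_p_B), copied from the model file
samples = [
    ("AgI", 1, -8.832979941382000e+01, 2.300000071522000e+00),
    ("CdS", 1, 1.341919994353600e+01, 2.230000019073000e+00),
    ("BaO", 0, 2.223999977112000e+00, 4.320000201465000e+00),
    ("MgSe", 0, 2.354499983788800e+01, 2.430000007159000e+00),
    ("MgTe", 0, 3.600999975206400e+01, 2.290000021464000e+00),
]


def margin_width(w0, w1):
    """Width of the SVM margin, 2 / ||w||, in feature-space units."""
    return 2.0 / math.hypot(w0, w1)


def n_misclassified(w0, w1, b):
    """Count samples whose predicted class differs from the listed class."""
    count = 0
    for _, true_class, f0, f1 in samples:
        predicted = 1 if w0 * f0 + w1 * f1 + b > 0.0 else 0
        if predicted != true_class:
            count += 1
    return count


for c, (w0, w1, b) in coefficients.items():
    print(f"c={c:7.1f} margin={margin_width(w0, w1):.3f} "
          f"misclassified={n_misclassified(w0, w1, b)} of {len(samples)}")
```

On this subset the `c=1.0` boundary has a much wider margin and puts MgSe and MgTe on the wrong side, while the `c=1000.0` boundary classifies all five samples correctly.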
docs/tutorial/classification/c_1.png: 20.2 KiB before, 19.3 KiB after
docs/tutorial/classification/c_10.png: 19.2 KiB
docs/tutorial/classification/c_100.png: 19.2 KiB
docs/tutorial/classification/c_1000.png: 20.3 KiB before, 19.2 KiB after
This diff is collapsed.
{ {
"data_file": "data.csv", "data_file": "data_class.csv",
"property_key": "Class", "property_key": "Class",
"desc_dim": 2, "desc_dim": 2,
"n_sis_select": 20, "n_sis_select": 20,
......
docs/tutorial/classification/sissopp.png: 20.4 KiB before, 19.1 KiB after