Commit d8c1b421 authored by Thomas Purcell

Merge branch 'master' of github.com:tpurcell90/sissopp

parents e6001c42 1ffb0f4d
Showing
with 1594 additions and 12 deletions
C++ Implementation of SISSO
C++ Implementation of SISSO with python bindings
===
## Overview
......@@ -35,7 +35,7 @@ set(CMAKE_CXX_FLAGS "-O2" CACHE STRING "")
#################
# Feature Flags #
#################
set(BUILD_PYTHON ON CACHE BOOL "")
set(USE_PYTHON ON CACHE BOOL "")
set(EXTERNAL_BOOST OFF CACHE BOOL "")
```
Here the `-O2` flag is for optimizations, it is recommended to stay as `-O2` or `-O3`, but it can be changed to match compiler requirements.
......@@ -64,7 +64,7 @@ make install
Once all the commands are run `sisso++` should be in the `~/cpp_sisso/main directory/bin/` directory.
### Install `_sisso`
To install the python bindings first ensure your python path matches the path used to configure `boost` and then repeat the same commands as above but set `BUILD_PYTHON` in `initial_config.cmake` to `ON`.
To install the python bindings first ensure your python path matches the path used to configure `boost` and then repeat the same commands as above but set `USE_PYTHON` in `initial_config.cmake` to `ON`.
Once installed you should have access to the python interface via `import cpp_sisso`.
......@@ -76,9 +76,8 @@ To see a sample of the input files look in `~/sisso++/main directory/test/exec_t
To use the code two files are necessary: `sisso.json` and `data.csv`.
`data.csv` stores all the data for the calculation in a `csv` file.
The first row in the file corresponds to the feature meta data with the following format `expression (Unit) : [lower theoretical bound, upper theoretical bound) | (lower theoretical absolute bound, upper theoretical absolute bound]`.
For the bounds
For example if one of the primary features used in the set is the lattice constant of a material the header would be `lat_param (AA) : (0.0, infty)` (the absolute boundary is set from the theoretical bounds) and the header for an arbitrary energy that whose value is between -50 eV and 50 eV and can't be zero would be `E (ev) : [-50.0, 50.0] | (0.0, 50.0]`.
The first row in the file corresponds to the feature meta data with the following format `expression (Unit)`.
For example if one of the primary features used in the set is the lattice constant of a material the header would be `lat_param (AA)`.
The first column of the file are sample labels for all of the other rows, and is not used.
The input parameters are stored in `sisso.json`, here is a list of all possible variables that can be sored in `sisso.json`
......@@ -99,13 +98,9 @@ The expression of the column where the task identification is stored. (Default:
The set of operators to use to combine the features during feature creation. (If empty use all available features)
#### `param_opset`
The list of operators to be parameterized against the property and the list of parameters to optimize over (same list as opset and an operator can be defined for both).
#### `calc_type`
The type of calculation to run either regression, log_regression, or classification
The type of calculation to run either regression or classification
#### `desc_dim`
......@@ -145,7 +140,7 @@ Maximum absolute value allowed in the feature's training data (Default: 1e50)
#### `leave_out_inds`
The indexes from the data set to use as the test set. If empty and `leave_out_frac > 0` the selection will be random
The indicies from the data set to use as the test set. If empty and `leave_out_frac > 0` the selection will be random
#### `leave_out_frac`
......
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cc="http://creativecommons.org/ns#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns="http://www.w3.org/2000/svg"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
version="1.1"
viewBox="0 0 179.83904 152.625"
stroke-miterlimit="10"
id="svg15"
sodipodi:docname="FV.svg"
width="179.83904"
height="152.625"
style="fill:none;stroke:none;stroke-linecap:square;stroke-miterlimit:10"
inkscape:version="0.92.4 (unknown)">
<metadata
id="metadata21">
<rdf:RDF>
<cc:Work
rdf:about="">
<dc:format>image/svg+xml</dc:format>
<dc:type
rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
<dc:title></dc:title>
</cc:Work>
</rdf:RDF>
</metadata>
<defs
id="defs19" />
<sodipodi:namedview
pagecolor="#ffffff"
bordercolor="#666666"
borderopacity="1"
objecttolerance="10"
gridtolerance="10"
guidetolerance="10"
inkscape:pageopacity="0"
inkscape:pageshadow="2"
inkscape:window-width="2560"
inkscape:window-height="1411"
id="namedview17"
showgrid="false"
inkscape:pagecheckerboard="true"
fit-margin-top="10"
fit-margin-left="10"
fit-margin-right="10"
fit-margin-bottom="10"
inkscape:zoom="2.4633342"
inkscape:cx="8.9457869"
inkscape:cy="93.592415"
inkscape:window-x="1920"
inkscape:window-y="0"
inkscape:window-maximized="1"
inkscape:current-layer="svg15" />
<clipPath
id="g7f066f2381_0_4.0">
<path
d="M 0,0 H 907.0866 V 377.95276 H 0 Z"
id="path2"
inkscape:connector-curvature="0"
style="clip-rule:nonzero" />
</clipPath>
<path
style="fill:#ffffff;fill-rule:nonzero"
inkscape:connector-curvature="0"
id="path9"
d="M 72.375,67.39063 H 50.03125 V 142.625 H 24.21875 V 67.39063 H 10 V 54.95313 L 24.21875,48 v -6.9375 q 0,-16.15625 7.95312,-23.60937 Q 40.125,10 57.65625,10 71.01562,10 81.4375,13.98438 L 74.82812,32.9375 q -7.78125,-2.45312 -14.39062,-2.45312 -5.5,0 -7.95313,3.26562 -2.45312,3.25 -2.45312,8.32813 V 48 H 72.375 Z" />
<path
style="fill:#ffffff;fill-rule:nonzero;stroke-width:1.0324142"
inkscape:connector-curvature="0"
id="path11"
d="M 103.17161,142.625 64.75,48 h 28.778734 l 19.485596,53.92187 q 3.24761,10.23438 4.06367,19.375 h 0.53294 q 0.44967,-8.125 4.06367,-19.375 L 141.06027,48 h 28.77876 l -38.42163,94.625 z" />
</svg>
docs/.icons/images/FV_green.png

14.6 KiB

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cc="http://creativecommons.org/ns#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns="http://www.w3.org/2000/svg"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
version="1.1"
viewBox="0 0 200.00001 200"
stroke-miterlimit="10"
id="svg15"
sodipodi:docname="FV_green.svg"
width="200"
height="200"
style="fill:none;stroke:none;stroke-linecap:square;stroke-miterlimit:10"
inkscape:version="0.92.5 (0.92.5+68)"
inkscape:export-filename="/home/knoop/local/hilde/docs/images/FV_green.png"
inkscape:export-xdpi="300"
inkscape:export-ydpi="300">
<metadata
id="metadata21">
<rdf:RDF>
<cc:Work
rdf:about="">
<dc:format>image/svg+xml</dc:format>
<dc:type
rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
<dc:title></dc:title>
</cc:Work>
</rdf:RDF>
</metadata>
<defs
id="defs19" />
<sodipodi:namedview
pagecolor="#007d6b"
bordercolor="#666666"
borderopacity="1"
objecttolerance="10"
gridtolerance="10"
guidetolerance="10"
inkscape:pageopacity="1"
inkscape:pageshadow="2"
inkscape:window-width="1852"
inkscape:window-height="1051"
id="namedview17"
showgrid="false"
inkscape:pagecheckerboard="true"
fit-margin-top="10"
fit-margin-left="10"
fit-margin-right="10"
fit-margin-bottom="10"
inkscape:zoom="2.4633342"
inkscape:cx="44.669726"
inkscape:cy="68.542608"
inkscape:window-x="68"
inkscape:window-y="0"
inkscape:window-maximized="1"
inkscape:current-layer="svg15" />
<clipPath
id="g7f066f2381_0_4.0">
<path
d="M 0,0 H 907.0866 V 377.95276 H 0 Z"
id="path2"
inkscape:connector-curvature="0"
style="clip-rule:nonzero" />
</clipPath>
<path
style="fill:#ffffff;fill-rule:nonzero"
inkscape:connector-curvature="0"
id="path9"
d="m 88.455485,91.07813 h -22.34375 v 75.23437 h -25.8125 V 91.07813 H 26.080485 V 78.64063 L 40.299235,71.6875 V 64.75 q 0,-16.15625 7.95312,-23.60937 7.95313,-7.45313 25.48438,-7.45313 13.35937,0 23.78125,3.98438 L 90.908605,56.625 q -7.78125,-2.45312 -14.39062,-2.45312 -5.5,0 -7.95313,3.26562 -2.45312,3.25 -2.45312,8.32813 v 5.92187 h 22.34375 z" />
<path
style="fill:#ffffff;fill-rule:nonzero;stroke-width:1.0324142"
inkscape:connector-curvature="0"
id="path11"
d="M 119.25209,166.3125 80.830485,71.6875 h 28.778735 l 19.48559,53.92187 q 3.24761,10.23438 4.06368,19.375 h 0.53293 q 0.44967,-8.125 4.06367,-19.375 L 157.14076,71.6875 h 28.77875 l -38.42163,94.625 z" />
</svg>
# Contributing
When contributing to this repository, please first discuss the change you wish to make via an issue,
email, or any other method with the maintainers of this repository. This will make life easier for everyone.
## Report Issues
Please use the [issue tracker](https://gitlab.com/sissopp-developers/sissopp/-/issues) to report issues. Before posting an issue please ensure that it meets the following requirements:
- The issue has not been reported previously (have a brief look at the issues page)
- The issue is described in terms of actual vs. expected behavior
- A minimal example of the issue is provided
## Contribute Code via Merge Request
To contribute code to `SISSO++`, please use a merge request (see the guidelines for preparing a merge request [here](https://docs.gitlab.com/ee/user/project/merge_requests/creating_merge_requests.html)).
- Please _document_ and _test_ your changes. Tests are found in `sissopp/tests` and are written with [pytest](https://docs.pytest.org/en/stable/) for the python bindings and [googletest](https://github.com/google/googletest) for the C++ interface.
- If a new feature is introduced, please create a minimal test of the binary file.
# Credits
`SISSO++` would not be possible without the following packages:
- [boost](https://www.boost.org/)
- [Coin-Clp](https://github.com/coin-or/Clp)
- [LIBSVM](https://www.csie.ntu.edu.tw/~cjlin/libsvm/)
- [googletest](https://github.com/google/googletest)
- [mkdocs](https://www.mkdocs.org/) and [mkdocs-material](https://squidfunk.github.io/mkdocs-material/)
### How to cite these packages:
Please make sure to give credit to the right people when using `SISSO++`:
For classification problems cite:
- [How to cite Coin-Clp](https://zenodo.org/record/3748677#.YBuxDVmYVhE)
- [How to cite LIBSVM](https://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#f203)
## Installation
The package uses a CMake build system and is compatible with all versions of the C++ standard from C++14 onward.
### Prerequisites
To install `SISSO++` the following packages are needed:
- CMake version 3.10 and up
- A C++ compiler (compatible with C++14 and later)
- BLAS/LAPACK (Architecture specific compilations like MKL or ACML are recommended)
- MPI
- Boost with the following libraries compiled (mpi, serialization, system, and filesystem)
To build the optional python bindings the following are also needed:
- Python 3 interpreter
- Boost with the python and numpy libraries compiled
### Install `SISSO++`
`SISSO++` is installed using a CMake build system, with some basic configuration files stored in `cmake/toolchains/`.
As an example, here is an `initial_config.cmake` file used to build `SISSO++` and the python bindings with the GNU compiler:
```
###############
# Basic Flags #
###############
set(CMAKE_CXX_COMPILER g++ CACHE STRING "")
set(CMAKE_CXX_FLAGS "-O3" CACHE STRING "")
#################
# Feature Flags #
#################
set(USE_PYTHON ON CACHE BOOL "")
set(EXTERNAL_BOOST OFF CACHE BOOL "")
```
Here the `-O3` flag sets the optimization level. It is recommended to keep it at `-O3` or `-O2`, but it can be changed to match compiler requirements.
When building Boost from source (`EXTERNAL_BOOST OFF`) the number of processes used when building Boost may be set using the
`BOOST_BUILD_N_PROCS` flag in CMake. For example, to build Boost using 4 processes, the following flag should be included in the
`initial_config.cmake` file:
```
set(BOOST_BUILD_N_PROCS 4 CACHE STRING "")
```
This flag will have no effect when linking against external boost, i.e. `EXTERNAL_BOOST ON`.
To install `SISSO++` run the following commands (this assumes the GNU compiler and MKL are used; if you are using a different compiler or BLAS library, change the flags accordingly):
```
export MKLROOT=/path/to/mkl/
export BOOST_ROOT=/path/to/boost
cd ~/SISSO++/main directory
mkdir build/;
cd build/;
cmake -C initial_config.cmake ../
make install
```
Once all the commands are run `SISSO++` should be in the `~/SISSO++/main directory/bin/` directory.
### Install the Python Bindings
To install the python bindings first ensure your python path matches the path used to configure `boost` and then repeat the same commands as above but set `USE_PYTHON` in `initial_config.cmake` to `ON`.
Once installed you should have access to the python interface via `import sissopp`.
C++ Implementation of SISSO with python bindings
===
## Overview
This package provides a C++ implementation of SISSO with built-in Python bindings for an efficient python interface.
Future work will expand the python interface to include more postprocessing analysis tools.
## Installation
The package uses a CMake build system and is compatible with all versions of the C++ standard from C++14 onward.
### Prerequisites
To install `sisso++` the following packages are needed:
- CMake version 3.10 and up
- A C++ compiler (compatible with C++14 and later)
- BLAS/LAPACK (Architecture specific compilations like MKL or ACML are recommended)
- MPI
- Boost with the following libraries compiled (mpi, serialization, system, and filesystem)
To build the optional python bindings the following are also needed:
- Python 3 interpreter
- Boost with the python and numpy libraries compiled
### Install `sisso++`
`sisso++` is installed using a CMake build system, with some basic configuration files stored in `cmake/toolchains/`.
As an example, here is an `initial_config.cmake` file used to build `sisso++` and the python bindings with the GNU compiler:
```
###############
# Basic Flags #
###############
set(CMAKE_CXX_COMPILER g++ CACHE STRING "")
set(CMAKE_CXX_FLAGS "-O2" CACHE STRING "")
#################
# Feature Flags #
#################
set(USE_PYTHON ON CACHE BOOL "")
set(EXTERNAL_BOOST OFF CACHE BOOL "")
```
Here the `-O2` flag sets the optimization level. It is recommended to keep it at `-O2` or `-O3`, but it can be changed to match compiler requirements.
When building Boost from source (`EXTERNAL_BOOST OFF`) the number of processes used when building Boost may be set using the
`BOOST_BUILD_N_PROCS` flag in CMake. For example, to build Boost using 4 processes, the following flag should be included in the
`initial_config.cmake` file:
```
set(BOOST_BUILD_N_PROCS 4 CACHE STRING "")
```
This flag will have no effect when linking against external boost, i.e. `EXTERNAL_BOOST ON`.
To install `sisso++` run the following commands (this assumes the GNU compiler and MKL are used; if you are using a different compiler or BLAS library, change the flags accordingly):
```
export MKLROOT=/path/to/mkl/
export BOOST_ROOT=/path/to/boost
cd ~/sisso++/main directory
mkdir build/;
cd build/;
cmake -C initial_config.cmake ../
make install
```
Once all the commands are run `sisso++` should be in the `~/sisso++/main directory/bin/` directory.
### Install `_sisso`
To install the python bindings first ensure your python path matches the path used to configure `boost` and then repeat the same commands as above but set `USE_PYTHON` in `initial_config.cmake` to `ON`.
Once installed you should have access to the python interface via `import cpp_sisso`.
## Running the code
### Input files
To see a sample of the input files, look in `~/sisso++/main directory/test/exec_test`.
To use the code two files are necessary: `sisso.json` and `data.csv`.
`data.csv` stores all the data for the calculation in a `csv` file.
The first row in the file holds the feature metadata in the format `expression (Unit)`.
For example, if one of the primary features in the set is the lattice constant of a material, the header would be `lat_param (AA)`.
The first column of the file contains the sample labels for the other rows and is not used in the calculation.
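As a rough illustration of this layout, the sketch below builds a small `data.csv`-style table in memory with Python's standard `csv` module. Only the `lat_param (AA)` header comes from the text above; the `Sample` label header, the `prop (eV)` column, the sample names, and all values are hypothetical placeholders, and the files in `test/exec_test` remain the authoritative reference.

```python
import csv
import io


def build_header(features):
    """Format (expression, unit) pairs as 'expression (Unit)' column headers."""
    return [f"{expr} ({unit})" for expr, unit in features]


# Hypothetical columns; only `lat_param (AA)` is taken from the README text.
columns = [("prop", "eV"), ("lat_param", "AA")]

# Hypothetical rows: a sample label followed by one value per column.
rows = [
    ("sample_0", 1.25, 4.05),
    ("sample_1", 0.75, 5.43),
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
# The first column holds the sample labels; its header text is a placeholder.
writer.writerow(["Sample"] + build_header(columns))
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

Writing `csv_text` to a file named `data.csv` would give a file in the shape described above, assuming the property column follows the same `expression (Unit)` convention as the features.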
The input parameters are stored in `sisso.json`. Here is a list of all possible variables that can be stored in `sisso.json`:
#### `data_file`
The name of the csv file where the data is stored. (Default: "data.csv")
#### `property_key`
The expression of the column where the property to be modeled is stored. (Default: "prop")
#### `task_key`
The expression of the column where the task identification is stored. (Default: "Task")
#### `opset`
The set of operators to use to combine the features during feature creation. (If empty, use all available operators)
#### `calc_type`
The type of calculation to run: either regression or classification.
#### `desc_dim`
The maximum dimension of the model to be created
#### `n_sis_select`
The number of features that SIS selects over each iteration
#### `max_rung`
The maximum rung of the feature (height of the tallest possible binary expression tree - 1)
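To make the rung definition concrete, here is a minimal sketch that represents a feature as a nested tuple and counts its rung as the height of the expression tree minus one. The tuple encoding and the operator names (`div`, `abs_diff`, `mult`) are assumptions made for illustration and are not the package's internal representation. The example feature mirrors `(|r_p_B - (r_s_B)|) / ((r_d_A) * (E_HOMO_B))`, which the sample model header in this README lists with a rung of 2.

```python
def rung(node):
    """Return the rung of an expression tree given as nested tuples.

    A primary feature is a plain string (rung 0). An operator node is a
    tuple (op_name, child, ...) and sits one rung above its deepest child.
    """
    if isinstance(node, str):
        return 0
    _, *children = node
    return 1 + max(rung(child) for child in children)


# Hypothetical encoding of (|r_p_B - r_s_B|) / (r_d_A * E_HOMO_B).
feature = (
    "div",
    ("abs_diff", "r_p_B", "r_s_B"),
    ("mult", "r_d_A", "E_HOMO_B"),
)

print(rung("r_p_B"))
print(rung(feature))
```

Under this reading, `max_rung` caps how many operators may be nested on the longest path from the root of a feature down to a primary feature.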
#### `n_residual`
Number of residuals used to select the next subset of materials in the iteration. (Affects SIS after the 1D model) (Default: 1)
#### `n_models_store`
Number of models to output as file for each dimension (Default: n_residual)
#### `n_rung_store`
The number of rungs where all of the training/testing data of the materials are stored in memory. (Default: `max_rung` - 1)
#### `n_rung_generate`
The number of rungs to generate on the fly during each SIS step. Must be 1 or 0. (Default: 0)
#### `min_abs_feat_val`
Minimum absolute value allowed in the feature's training data (Default: 1e-50)
#### `max_abs_feat_val`
Maximum absolute value allowed in the feature's training data (Default: 1e50)
#### `leave_out_inds`
The indices from the data set to use as the test set. If empty and `leave_out_frac > 0`, the selection will be random.
#### `leave_out_frac`
Fraction (in decimal form) of the data to use as a test set (Default: 0.0 if `leave_out_inds` is empty, otherwise `len(leave_out_inds) / number of rows in the data file`)
#### `fix_intercept`
If true, set the intercept to 0.0 for all regression models (Default: false).
This does not work for classification.
#### `max_feat_cross_correlation`
The maximum Pearson correlation allowed between selected features (Default: 1.0)
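The sketch below assembles a `sisso.json`-style configuration from the keys listed above, using Python's standard `json` module. The key names come from this README; the value types (for example, a list for `opset` and `leave_out_inds`) and the specific values for `desc_dim`, `n_sis_select`, and `max_rung` are assumptions chosen for illustration, so compare against the sample input in `test/exec_test` before relying on them.

```python
import json

# Keys are the ones documented above; values are illustrative assumptions.
config = {
    "data_file": "data.csv",            # documented default
    "property_key": "prop",             # documented default
    "task_key": "Task",                 # documented default
    "opset": [],                        # empty: see the `opset` entry above
    "calc_type": "regression",
    "desc_dim": 2,                      # hypothetical choice
    "n_sis_select": 10,                 # hypothetical choice
    "max_rung": 2,                      # hypothetical choice
    "n_residual": 1,                    # documented default
    "leave_out_inds": [],
    "leave_out_frac": 0.0,              # documented default for empty indices
    "fix_intercept": False,             # documented default
    "max_feat_cross_correlation": 1.0,  # documented default
}

config_text = json.dumps(config, indent=4)
print(config_text)
```

Saving `config_text` as `sisso.json` next to the data file would give a starting point that can then be adjusted parameter by parameter.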
### Perform the Calculation
Once the input files are prepared, the code can be run using the following command:
```
mpiexec -n 2 ~/sisso++/main directory/bin/sisso++ sisso.json
```
### Analyzing the Results
Once the calculations are done, two sets of output files are generated.
A list of all selected features is stored in: `feature_space/selected_features.txt` and every model used as a residual for SIS is stored in `models`.
The model output files are split into train/test files sorted by the dimensionality of the model and by the train RMSE. The model with the lowest RMSE is stored in the lowest number file.
For example, `train_dim_3_model_0.dat` holds the best 3D model, `train_dim_3_model_1.dat` the second best, and so on.
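As a small sketch of how this naming scheme could be used in a script, the helper below parses file names of the form `train_dim_<D>_model_<N>.dat` and picks the lowest-numbered file for each dimension. The function names are hypothetical, and the matching `test_dim_<D>_model_<N>.dat` pattern is an assumption inferred from the train/test split mentioned above.

```python
import re

# Pattern for the documented `train_dim_3_model_0.dat` style of file name.
# The `test_` prefix is an assumption based on the train/test split.
MODEL_FILE = re.compile(r"^(train|test)_dim_(\d+)_model_(\d+)\.dat$")


def parse_model_filename(name):
    """Return (split, dimension, model index), or None if the name does not match."""
    match = MODEL_FILE.match(name)
    if match is None:
        return None
    split, dim, index = match.groups()
    return split, int(dim), int(index)


def best_models(names, split="train"):
    """Map each dimension to the file with the lowest model index."""
    best = {}
    for name in names:
        parsed = parse_model_filename(name)
        if parsed is None or parsed[0] != split:
            continue
        _, dim, index = parsed
        if dim not in best or index < best[dim][0]:
            best[dim] = (index, name)
    return {dim: name for dim, (_, name) in sorted(best.items())}


# Hypothetical directory listing.
names = [
    "train_dim_3_model_1.dat",
    "train_dim_3_model_0.dat",
    "test_dim_3_model_0.dat",
    "train_dim_1_model_0.dat",
]
print(best_models(names))
```

In practice the list of names could come from `os.listdir("models")`, given that the models are stored in the `models` directory.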
Each model file has a large header containing information about the selected features and the generated model:
```
# c0 + a0 * [(|r_p_B - (r_s_B)|) / ([(r_d_A) * (E_HOMO_B)])] + a1 * [(|r_p_B - (r_s_A)|) * ([(IP_A) / (r_s_A)])] + a2 * [(|E_HOMO_B - (EA_B)|) / ((r_p_A)^2)]
# RMSE: 0.0779291679452223; Max AE: 0.290810937048465
# Coefficients
# Task; a0 a1 a2 c0
# 0, 7.174549961742731e+00, 8.687856036798111e-02, 2.468463139364077e-01, -3.995345676823570e-02,
# Feature Rung, Units, and Expressions
# 0, 2, 1 / eV, [(|r_p_B - (r_s_B)|) / ([(r_d_A) * (E_HOMO_B)])]
# 1, 2, eV, [(|r_p_B - (r_s_A)|) * ([(IP_A) / (r_s_A)])]
# 2, 2, 1 / AA^2 * eV, [(|E_HOMO_B - (EA_B)|) / ((r_p_A)^2)]
# Number of Samples Per Task
# Task; n_mats_train
# 0, 78
```
After this header the following data is stored in the file:
```
#Property Value Property Value (EST) Feature 0 Value Feature 1 Value Feature 2 Value
```
With this file the model can be perfectly recreated using the python binding.
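Independently of the python binding, the linear form in the first header line (`c0 + a0 * f0 + a1 * f1 + a2 * f2`) can be checked by hand. The sketch below is plain Python and does not use the `SISSO++` API. The coefficients are copied from the sample header above, while the feature values and function names are hypothetical.

```python
import math

# Coefficients copied from the sample header above (task 0).
COEFS = (7.174549961742731e+00, 8.687856036798111e-02, 2.468463139364077e-01)
INTERCEPT = -3.995345676823570e-02


def estimate(feature_values, coefs=COEFS, intercept=INTERCEPT):
    """Evaluate c0 + a0 * f0 + a1 * f1 + a2 * f2 for one sample."""
    if len(feature_values) != len(coefs):
        raise ValueError("expected one feature value per coefficient")
    return intercept + sum(a * f for a, f in zip(coefs, feature_values))


def rmse(actual, estimated):
    """Root-mean-square error between two equally long sequences."""
    if len(actual) != len(estimated):
        raise ValueError("sequences must have the same length")
    squared = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, estimated))
    return math.sqrt(squared / len(actual))


# Hypothetical feature values for two samples (not from a real output file).
samples = [(0.1, 2.0, 0.5), (0.0, 0.0, 0.0)]
predictions = [estimate(sample) for sample in samples]
print(predictions)
```

Applying `estimate` to the feature columns of a model file and comparing against the `Property Value (EST)` column is one way to sanity-check a parsed file, and `rmse` can be compared with the RMSE reported in the header.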
### Using the Python Library
To see how the python interface can be used, look at `examples/python_interface_demo.ipynb`.
If you get an error about not being able to load MKL libraries, you may have to run `conda install numpy` to get proper linking.
# References
## Work that was performed using `SISSO++`
File added
docs/Tutorial/combined/3d_model.png

41.8 KiB

docs/Tutorial/combined/cv/cv_10_error.png

34 KiB

docs/Tutorial/combined/cv/cv_50_error.png

35 KiB
