Update README.md

Commit dd5b6a95 authored 3 years ago by Thomas Purcell
--- a/README.md
+++ b/README.md
@@ -5,192 +5,4 @@ C++ Implementation of SISSO with python bindings
 This package provides a C++ implementation of SISSO with built in Python bindings for an efficient python interface.
 Future work will expand the python interface to include more postprocessing analysis tools.

-## Installation
-The package uses a CMake build system and is compatible with C++14 and all later versions of the C++ standard.
-
-### Prerequisites
-To install `sisso++`, the following packages are needed:
-
- CMake version 3.10 and up
- A C++ compiler (compatible with C++14 and later)
- BLAS/LAPACK (architecture-specific implementations such as MKL or ACML are recommended)
- MPI
- Boost with the following libraries compiled (mpi, serialization, system, and filesystem)
-
-To build the optional python bindings the following are also needed:
-
- Python 3 interpreter
- Boost with the python and numpy libraries compiled
-
-### Install `sisso++`
-`sisso++` is installed using a CMake build system, with some basic configuration files stored in `cmake/toolchains/`.
-As an example, here is an `initial_config.cmake` file used to build `sisso++` and the python bindings with the GNU compiler.
-```
-###############
-# Basic Flags #
-###############
-set(CMAKE_CXX_COMPILER g++ CACHE STRING "")
-set(CMAKE_CXX_FLAGS "-O2" CACHE STRING "")
-
-#################
-# Feature Flags #
-#################
-set(USE_PYTHON ON CACHE BOOL "")
-set(EXTERNAL_BOOST OFF CACHE BOOL "")
-```
-Here the `-O2` flag sets the optimization level. Keeping it at `-O2` or `-O3` is recommended, but it can be changed to match compiler requirements.
-
-When building Boost from source (`EXTERNAL_BOOST OFF`) the number of processes used when building Boost may be set using the
-`BOOST_BUILD_N_PROCS` flag in CMake.  For example, to build Boost using 4 processes, the following flag should be included in the
-`initial_config.cmake` file:
-```
-set(BOOST_BUILD_N_PROCS 4 CACHE STRING "")
-```
-This flag will have no effect when linking against external boost, i.e. `EXTERNAL_BOOST ON`.
-
-To install `sisso++`, run the following commands. This assumes the GNU compiler and MKL are used; if you are using a different compiler or BLAS library, adjust the flags accordingly.
-```
-export MKLROOT=/path/to/mkl/
-export BOOST_ROOT=/path/to/boost
-
-cd ~/sisso++/main directory
-mkdir build/;
-cd build/;
-
-cmake -C initial_config.cmake ../
-make install
-```
-
-Once all the commands are run `sisso++` should be in the `~/cpp_sisso/main directory/bin/` directory.
-
-### Install `_sisso`
-To install the python bindings, first ensure your python path matches the path used to configure `boost`, then repeat the same commands as above with `USE_PYTHON` set to `ON` in `initial_config.cmake`.
-
-Once installed you should have access to the python interface via `import cpp_sisso`.
-
-## Running the code
-
-### Input files
-
-To see a sample of the input files look in `~/sisso++/main directory/test/exec_test`
-
-To use the code two files are necessary: `sisso.json` and `data.csv`.
-`data.csv` stores all the data for the calculation in a `csv` file.
-The first row in the file holds the feature metadata in the format `expression (Unit)`.
-For example, if one of the primary features in the set is the lattice constant of a material, the header would be `lat_param (AA)`.
-The first column of the file contains the sample labels for all of the other rows and is not used in the calculation.
-
-The input parameters are stored in `sisso.json`. Here is a list of all possible variables that can be stored in `sisso.json`:
-
-#### `data_file`
-
-The name of the csv file where the data is stored. (Default: "data.csv")
-
-#### `property_key`
-
-The expression of the column where the property to be modeled is stored. (Default: "prop")
-
-#### `task_key`
-
-The expression of the column where the task identification is stored. (Default: "Task")
-
-#### `opset`
-
-The set of operators used to combine the features during feature creation. (If empty, all available operators are used)
-
-#### `calc_type`
-
-The type of calculation to run: either regression or classification
-
-#### `desc_dim`
-
-The maximum dimension of the model to be created
-
-#### `n_sis_select`
-
-The number of features that SIS selects over each iteration
-
-#### `max_rung`
-
-The maximum rung of the feature (height of the tallest possible binary expression tree - 1)
-
-#### `n_residual`
-
-Number of residuals used to select the next subset of materials in the iteration. (Affects SIS after the 1D model) (Default: 1)
-
-#### `n_models_store`
-
-Number of models to output as files for each dimension (Default: `n_residual`)
-
-#### `n_rung_store`
-
-The number of rungs where all of the training/testing data of the materials are stored in memory. (Default: `max_rung` - 1)
-
-#### `n_rung_generate`
-
-The number of rungs to generate on the fly during each SIS step. Must be 1 or 0. (Default: 0)
-
-#### `min_abs_feat_val`
-
-Minimum absolute value allowed in the feature's training data (Default: 1e-50)
-
-#### `max_abs_feat_val`
-
-Maximum absolute value allowed in the feature's training data (Default: 1e50)
-
-#### `leave_out_inds`
-
-The indices from the data set to use as the test set. If empty and `leave_out_frac > 0`, the selection will be random
-
-#### `leave_out_frac`
-
-Fraction (in decimal form) of the data to use as a test set (Default: 0.0 if `leave_out_inds` is empty, otherwise `len(leave_out_inds) / (number of rows in data file)`)
-
-#### `fix_intercept`
-
-If true set the intercept to 0.0 for all Regression models (Default: false)
-
-This does not work for classification
-
-#### `max_feat_cross_correlation`
-
-The maximum Pearson correlation allowed between selected features (Default: 1.0)
-
-### Perform the Calculation
-Once the input files are made the code can be run using the following command
-
-```
-mpiexec -n 2 ~/sisso++/main directory/bin/sisso++ sisso.json
-```
-
-### Analyzing the Results
-
-Once the calculations are done, two sets of output files are generated.
-A list of all selected features is stored in: `feature_space/selected_features.txt` and every model used as a residual for SIS is stored in `models`.
-The model output files are split into train/test files sorted by the dimensionality of the model and by the train RMSE. The model with the lowest RMSE is stored in the lowest number file.
-For example `train_dim_3_model_0.dat` will have the best 3D model, `train_dim_3_model_1.dat` would have the second best, etc.
-Each model file has a large header containing information about the features selected and model generated
-```
-# c0 + a0 * [(|r_p_B - (r_s_B)|) / ([(r_d_A) * (E_HOMO_B)])] + a1 * [(|r_p_B - (r_s_A)|) * ([(IP_A) / (r_s_A)])] + a2 * [(|E_HOMO_B - (EA_B)|) / ((r_p_A)^2)]
-# RMSE: 0.0779291679452223; Max AE: 0.290810937048465
-# Coefficients
-# Task;    a0                      a1                      a2                      c0
-# 0,       7.174549961742731e+00,  8.687856036798111e-02,  2.468463139364077e-01, -3.995345676823570e-02,
-# Feature Rung, Units, and Expressions
-# 0,  2, 1 / eV,                                           [(|r_p_B - (r_s_B)|) / ([(r_d_A) * (E_HOMO_B)])]
-# 1,  2, eV,                                               [(|r_p_B - (r_s_A)|) * ([(IP_A) / (r_s_A)])]
-# 2,  2, 1 / AA^2 * eV,                                    [(|E_HOMO_B - (EA_B)|) / ((r_p_A)^2)]
-# Number of Samples Per Task
-# Task;   n_mats_train
-# 0,      78
-```
-After this header the following data is stored in the file:
-```
-#Property Value          Property Value (EST)    Feature 0 Value         Feature 1 Value         Feature 2 Value
-```
-With this file the model can be perfectly recreated using the python binding.
-
-### Using the Python Library
-To see how the python interface can be used look at `examples/python_interface_demo.ipynb`
-If you get an error about not being able to load MKL libraries, you may have to run `conda install numpy` to get proper linking.
-
+For a more detailed explanation, please visit our documentation at: https://tpurcell.pages.mpcdf.de/cpp_sisso
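
The input-file description removed in this hunk can be illustrated with a minimal Python sketch, assuming the keys and defaults listed above. The values for `calc_type`, `desc_dim`, `n_sis_select`, and `max_rung` are illustrative only (the removed text gives no defaults for them), the JSON types are assumptions, and `write_settings` and `parse_header` are hypothetical helper names, not part of `sisso++` or its python bindings.

```python
import json
import re

# Keys and defaults follow the removed README text. Entries marked
# "illustrative" have no documented default; their values and the JSON
# types used here are assumptions, not verified against sisso++.
settings = {
    "data_file": "data.csv",            # documented default
    "property_key": "prop",             # documented default
    "task_key": "Task",                 # documented default
    "opset": [],                        # empty: fall back to everything available
    "calc_type": "regression",          # illustrative; exact spelling is an assumption
    "desc_dim": 2,                      # illustrative
    "n_sis_select": 10,                 # illustrative
    "max_rung": 2,                      # illustrative
    "n_residual": 1,                    # documented default
    "leave_out_frac": 0.0,              # documented default
    "fix_intercept": False,             # documented default
    "max_feat_cross_correlation": 1.0,  # documented default
}


def write_settings(path="sisso.json"):
    """Write the settings dictionary to a JSON file (hypothetical helper)."""
    with open(path, "w") as handle:
        json.dump(settings, handle, indent=4)


def parse_header(column):
    """Split an 'expression (Unit)' column header into (expression, unit).

    Returns (expression, None) when the header carries no unit in parentheses.
    """
    match = re.fullmatch(r"\s*(.+?)\s*\((.*)\)\s*", column)
    if match is None:
        return column.strip(), None
    return match.group(1), match.group(2).strip()
```

For the header example given in the removed text, `parse_header("lat_param (AA)")` separates the expression `lat_param` from the unit `AA`; a header without parentheses is returned with no unit.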