From dd5b6a956ebb9311fe81f36a18c2202b7b94d3cb Mon Sep 17 00:00:00 2001
From: Thomas Purcell <purcell@fhi-berlin.mpg.de>
Date: Tue, 24 Aug 2021 12:27:55 +0000
Subject: [PATCH] Update README.md

---
 README.md | 190 +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1 file changed, 1 insertion(+), 189 deletions(-)

diff --git a/README.md b/README.md
index bad363fb..135832d0 100644
--- a/README.md
+++ b/README.md
@@ -5,192 +5,4 @@ C++ Implementation of SISSO with python bindings
 This package provides a C++ implementation of SISSO with built in Python bindings for an efficient python interface.
 Future work will expand the python interface to include more postporcessing analysis tools.
 
-## Installation
-The package uses a CMake build system, and compatible all versions of the C++ standard library after C++ 14.
-
-### Prerequisites
-To install the sisso++ the following packages are needed:
-
-- CMake version 3.10 and up
-- A C++ complier (compatible with C++ 14 and later)
-- BLAS/LAPACK (Architecture specific compilations like MKL or ACML are recommended)
-- MPI
-- Boost with the following libraries compiled (mpi, serialization, system, and filesystem)
-
-To build the optional python bindings the following are also needed:
-
-- Python 3 interpreter
-- Boost with the python and numpy libraries compiled
-
-### Install `sisso++`
-`sisso++` is installed using a cmake build system, with some basic configuration files stored in `cmake/toolchains/`
-As an example here is an `initial_config.cmake` file used to construct `sisso++` and the python bindings using the gnu compiler.
-```
-###############
-# Basic Flags #
-###############
-set(CMAKE_CXX_COMPILER g++ CACHE STRING "")
-set(CMAKE_CXX_FLAGS "-O2" CACHE STRING "")
-
-#################
-# Feature Flags #
-#################
-set(USE_PYTHON ON CACHE BOOL "")
-set(EXTERNAL_BOOST OFF CACHE BOOL "")
-```
-Here the `-O2` flag is for optimizations, it is recommended to stay as `-O2` or `-O3`, but it can be changed to match compiler requirements.
-
-When building Boost from source (`EXTERNAL_BOOST OFF`) the number of processes used when building Boost may be set using the
-`BOOST_BUILD_N_PROCS` flag in CMake. For example, to build Boost using 4 processes, the following flag should be included in the
-`initial_config.cmake` file:
-```
-#set(BOOST_BUILD_N_PROCS 4 CACHE STRING "")
-```
-This flag will have no effect when linking against external boost, i.e. `EXTERNAL_BOOST ON`.
-
-To install `sisso++` run the following commands (this assumes gnu compiler and MKL are used, if you are using a different compiler/BLAS library change the flags to the relevant data)
-```
-export MKLROOT=/path/to/mkl/
-export BOOST_ROOT=/path/to/boost
-
-cd ~/sisso++/main directory
-mkdir build/;
-cd build/;
-
-cmake -C initial_config.cmake ../
-make install
-```
-
-Once all the commands are run `sisso++` should be in the `~/cpp_sisso/main directory/bin/` directory.
-
-### Install `_sisso`
-To install the python bindings first ensure your python path matches the path used to configure `boost` and then repeat the same commands as above but set `USE_PYTHON` in `initial_config.cmake` to `ON`.
-
-Once installed you should have access to the python interface via `import cpp_sisso`.
-
-## Running the code
-
-### Input files
-
-To see a sample of the input files look in `~/sisso++/main directory/test/exec_test`
-
-To use the code two files are necessary: `sisso.json` and `data.csv`.
-`data.csv` stores all the data for the calculation in a `csv` file.
-The first row in the file corresponds to the feature meta data with the following format `expression (Unit)`.
-For example if one of the primary features used in the set is the lattice constant of a material the header would be `lat_param (AA)`.
-The first column of the file are sample labels for all of the other rows, and is not used.
-
-The input parameters are stored in `sisso.json`, here is a list of all possible variables that can be sored in `sisso.json`
-
-#### `data_file`
-
-The name of the csv file where the data is stored. (Default: "data.csv")
-
-#### `property_key`
-
-The expression of the column where the property to be modeled is stored. (Default: "prop")
-
-#### `task_key`
-
-The expression of the column where the task identification is stored. (Default: "Task")
-
-#### `opset`
-
-The set of operators to use to combine the features during feature creation. (If empty use all available features)
-
-#### `calc_type`
-
-The type of calculation to run either regression or classification
-
-#### `desc_dim`
-
-The maximum dimension of the model to be created
-
-#### `n_sis_select`
-
-The number of features that SIS selects over each iteration
-
-#### `max_rung`
-
-The maximum rung of the feature (height of the tallest possible binary expression tree - 1)
-
-#### `n_residual`
-
-Number of residuals to used to select the next subset of materials in the iteration. (Affects SIS after the 1D model) (Default: 1)
-
-#### `n_models_store`
-
-Number of models to output as file for each dimension (Default: n_residual)
-
-#### `n_rung_store`
-
-The number of rungs where all of the training/testing data of the materials are stored in memory. (Default: `max_rung` - 1)
-
-#### `n_rung_generate`
-
-The number of rungs to generate on the fly during each SIS step. Must be 1 or 0.
-(Default: 0)
-
-#### `min_abs_feat_val`
-
-Minimum absolute value allowed in the feature's training data (Default: 1e-50)
-
-#### `max_abs_feat_val`
-
-Maximum absolute value allowed in the feature's training data (Default: 1e50)
-
-#### `leave_out_inds`
-
-The indicies from the data set to use as the test set. If empty and `leave_out_frac > 0` the selection will be random
-
-#### `leave_out_frac`
-
-Fraction (in decimal form) of the data to use as a test set (Default: 0.0 if `leave_out_inds` is empty, otherwise `len(leave_out_inds)) / Number of rows in data file`
-
-#### `fix_intercept`
-
-If true set the intercept to 0.0 for all Regression models (Default: false)
-
-This does not work for classification
-
-#### `max_feat_cross_correlation`
-
-The maximum Pearson correlation allowed between selected features (Default: 1.0)
-
-### Perform the Calculation
-Once the input files are made the code can be run using the following command
-
-```
-mpiexec -n 2 ~/sisso++/main directory/bin/sisso++ sisso.json
-```
-
-### Analyzing the Results
-
-Once the calculations are done, two sets of output files are generated.
-A list of all selected features is stored in: `feature_space/selected_features.txt` and every model used as a residual for SIS is stored in `models`.
-The model output files are split into train/test files sorted by the dimensionality of the model and by the train RMSE. The model with the lowest RMSE is stored in the lowest number file.
-For example `train_dim_3_model_0.dat` will have the best 3D model, `train_dim_3_model_1.dat` would have the second best, etc.
-Each model file has a large header containing information about the features selected and model generated
-```
-# c0 + a0 * [(|r_p_B - (r_s_B)|) / ([(r_d_A) * (E_HOMO_B)])] + a1 * [(|r_p_B - (r_s_A)|) * ([(IP_A) / (r_s_A)])] + a2 * [(|E_HOMO_B - (EA_B)|) / ((r_p_A)^2)]
-# RMSE: 0.0779291679452223; Max AE: 0.290810937048465
-# Coefficients
-# Task; a0 a1 a2 c0
-# 0, 7.174549961742731e+00, 8.687856036798111e-02, 2.468463139364077e-01, -3.995345676823570e-02,
-# Feature Rung, Units, and Expressions
-# 0, 2, 1 / eV, [(|r_p_B - (r_s_B)|) / ([(r_d_A) * (E_HOMO_B)])]
-# 1, 2, eV, [(|r_p_B - (r_s_A)|) * ([(IP_A) / (r_s_A)])]
-# 2, 2, 1 / AA^2 * eV, [(|E_HOMO_B - (EA_B)|) / ((r_p_A)^2)]
-# Number of Samples Per Task
-# Task; n_mats_train
-# 0, 78
-```
-After this header the following data is stored in the file:
-```
-#Property Value Property Value (EST) Feature 0 Value Feature 1 Value Feature 2 Value
-```
-With this file the model can be perfectly recreated using the python binding.
-
-### Using the Python Library
-To see how the python interface can be used look at `examples/python_interface_demo.ipynb`
-If you get an error about not being able to load MKL libraries, you may have to run `conda install numpy` to get proper linking.
-
+For a more detailed explanation please visit our documentation at: https://tpurcell.pages.mpcdf.de/cpp_sisso
-- 
GitLab