cpp_sisso

    Thomas Purcell authored
    Using these expressions one can reconstruct features using the primary feature set
    Capability to decompose features into the prevalence of primary features in them
    3041b3ab

    C++ Implementation of SISSO

    Overview

    This package provides a C++ implementation of SISSO with built-in Python bindings for an efficient Python interface. Future work will expand the Python interface to include more postprocessing analysis tools.

    Installation

    The package uses a CMake build system and is compatible with all versions of the C++ standard from C++14 onward.

    Prerequisites

    To install sisso++, the following packages are needed:

    • CMake version 3.10 and up
    • A C++ compiler (compatible with C++14 and later)
    • BLAS/LAPACK (architecture-specific implementations such as MKL or ACML are recommended)
    • MPI
    • Boost with the following libraries compiled (mpi, serialization, system, and filesystem)

    To build the optional Python bindings, the following are also needed:

    • Python 3 interpreter
    • Boost with the python and numpy libraries compiled

    Install sisso++

    To install sisso++, run the following commands. They assume that the GNU compiler and MKL are used; if you are using a different compiler or BLAS library, change the flags accordingly.

    export MKLROOT=/path/to/mkl/
    export BOOST_ROOT=/path/to/boost
    
    cd ~/sisso++/main directory
    mkdir build/;
    cd build/;
    
    cmake -C initial_config.cmake ../
    make install

    initial_config.cmake should contain the following:

    ###############
    # Basic Flags #
    ###############
    set(CMAKE_CXX_COMPILER g++ CACHE STRING "")
    set(CMAKE_CXX_FLAGS "-O2" CACHE STRING "")
    set(USE_PYTHON OFF CACHE BOOL "")

    Here the -O2 flag sets the optimization level. Keeping it at -O2 or -O3 is recommended, but it can be changed to match system requirements.

    Once all the commands have run, the sisso++ executable should be in the ~/sisso++/main directory/bin/ directory.

    Install _sisso

    To install the Python bindings, first ensure that your Python path matches the path used to configure Boost, then repeat the same commands as above with USE_PYTHON set to ON in initial_config.cmake.

    Once installed, you should have access to the Python interface via import _sisso.

    Running the code

    Input files

    To see a sample of the input files, look in ~/sisso++/main directory/test/.

    To use the code, two files are necessary: sisso.json and data.csv. data.csv stores all the data for the calculation in CSV format. The first row of the file holds the feature metadata, with each entry in the format expression (Unit). For example, if one of the primary features is the lattice constant of a material, its header would be lat_param (AA). The first column of the file contains the sample labels for the rows and is not used in the calculation.
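
    As a rough illustration of the header format described above, the following sketch writes a small data.csv using only the Python standard library. The first-column name sample, the units chosen for prop, the sample labels, and all numbers are made-up placeholders rather than values from the package or its test data.

```python
import csv
import io


def write_data_csv(stream, columns, rows):
    """Write a table whose header entries follow the "expression (Unit)" format.

    columns: list of (expression, unit) pairs, one per data column.
    rows:    list of (sample_label, values) pairs.
    """
    writer = csv.writer(stream)
    # The first column holds sample labels; its header name is an assumption.
    writer.writerow(["sample"] + [f"{expr} ({unit})" for expr, unit in columns])
    for label, values in rows:
        writer.writerow([label, *values])


# Placeholder columns and values, for illustration only.
columns = [("prop", "eV"), ("lat_param", "AA")]
rows = [("mat_1", [0.12, 4.05]), ("mat_2", [0.34, 5.43])]

buffer = io.StringIO()
write_data_csv(buffer, columns, rows)
csv_text = buffer.getvalue()
print(csv_text)
```

    To produce a file on disk, pass an open file object (opened with newline="") instead of the in-memory buffer.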

    The input parameters are stored in sisso.json. Here is a list of all possible variables that can be stored in sisso.json:

    data_file

    The name of the csv file where the data is stored. (Default: "data.csv")

    prop_key

    The expression of the column where the property to be modeled is stored. (Default: "prop")

    task_key

    The expression of the column where the task identification is stored. (Default: "Task")

    opset

    The set of operators to use to combine the features during feature creation. (If empty, all available operators are used)

    desc_dim

    The maximum dimension of the model to be created

    n_sis_select

    The number of features that SIS selects over each iteration
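
    To illustrate what n_sis_select controls, here is a minimal sketch of a screening step that ranks candidate features by the absolute value of their correlation with a target vector and keeps the top n_sis_select. It is not the package's implementation: the use of a plain Pearson correlation, the function names, and the toy numbers are all assumptions made for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant inputs)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def sis_select(features, target, n_sis_select):
    """Return the names of the n_sis_select features most correlated with target."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    return ranked[:n_sis_select]


# Toy feature values, for illustration only.
features = {
    "f_a": [1.0, 2.0, 3.0, 4.0],
    "f_b": [1.0, -1.0, 1.0, -1.0],
    "f_c": [4.0, 3.0, 2.0, 1.5],
}
target = [2.0, 4.0, 6.0, 8.0]
selected = sis_select(features, target, n_sis_select=2)
print(selected)
```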

    max_rung

    The maximum rung of the feature (height of the tallest possible binary expression tree - 1)
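
    The rung of a feature can be read as the number of operator levels stacked on top of the primary features. The sketch below computes it for an expression stored as nested tuples. This representation and the operator labels are hypothetical and are not part of the package. The example expression mirrors the first feature in the model header shown later in this README, which is listed there with rung 2.

```python
def rung(expr):
    """Return the rung of an expression tree.

    A primary feature is a plain string and has rung 0. An operator node is a
    tuple (operator_label, operand, ...) and sits one rung above its deepest operand.
    """
    if isinstance(expr, str):
        return 0
    _operator, *operands = expr
    return 1 + max(rung(operand) for operand in operands)


# (|r_p_B - r_s_B|) / (r_d_A * E_HOMO_B), using hypothetical operator labels.
feature = (
    "div",
    ("abs_diff", "r_p_B", "r_s_B"),
    ("mult", "r_d_A", "E_HOMO_B"),
)
print(rung("r_p_B"), rung(feature))
```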

    n_residual

    Number of residuals used to select the next subset of materials in the iteration. (Affects SIS after the 1D model) (Default: 1)

    n_rung_store

    The number of rungs where all of the training/testing data of the materials are stored in memory. (Default: max_rung - 1)

    n_rung_generate

    The number of rungs to generate on the fly during each SIS step. Must be 1 or 0. (Default: 0)

    min_abs_feat_val

    Minimum absolute value allowed in the feature's training data (Default: 1e-50)

    max_abs_feat_val

    Maximum absolute value allowed in the feature's training data (Default: 1e50)

    leave_out_inds

    The indices from the data set to use as the test set. If empty and leave_out_frac > 0, the selection will be random.

    leave_out_frac

    Fraction (in decimal form) of the data to use as a test set. (Default: 0.0 if leave_out_inds is empty, otherwise len(leave_out_inds) / number of rows in the data file)
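
    The parameters listed above can be collected into a sisso.json file, for example with Python's json module. In the sketch below, the values of desc_dim, n_sis_select, and max_rung are arbitrary placeholders, and the JSON types (for instance, lists for opset and leave_out_inds) are assumptions; the sample inputs in the test directory show the exact layout the code expects. The helper default_leave_out_frac is hypothetical and only restates the default rule for leave_out_frac.

```python
import json

max_rung = 2  # placeholder value

sisso_inputs = {
    "data_file": "data.csv",
    "prop_key": "prop",
    "task_key": "Task",
    "opset": [],                   # empty: use all available operators
    "desc_dim": 2,                 # placeholder value
    "n_sis_select": 10,            # placeholder value
    "max_rung": max_rung,
    "n_residual": 1,
    "n_rung_store": max_rung - 1,  # documented default
    "n_rung_generate": 0,          # must be 0 or 1
    "min_abs_feat_val": 1e-50,
    "max_abs_feat_val": 1e50,
    "leave_out_inds": [],
    "leave_out_frac": 0.0,
}


def default_leave_out_frac(leave_out_inds, n_rows):
    """Default test fraction: 0.0 if no indices are given, else their share of the rows."""
    if not leave_out_inds:
        return 0.0
    return len(leave_out_inds) / n_rows


if sisso_inputs["n_rung_generate"] not in (0, 1):
    raise ValueError("n_rung_generate must be 0 or 1")

json_text = json.dumps(sisso_inputs, indent=4)
print(json_text)
```

    Writing json_text to a file named sisso.json gives an input file that can be passed to the command in the next section.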

    Perform the Calculation

    Once the input files are prepared, the code can be run using the following command:

    mpiexec -n 2 ~/sisso++/main directory/bin/sisso++ sisso.json

    Analyzing the Results

    Once the calculations are done, two sets of output files are generated. A list of all selected features is stored in feature_space/selected_features.txt, and every model used as a residual for SIS is stored in the models directory. The model output files are split into train and test files, sorted by the dimensionality of the model and by the train RMSE. The model with the lowest RMSE is stored in the lowest-numbered file. For example, train_dim_3_model_0.dat holds the best 3D model, train_dim_3_model_1.dat holds the second best, and so on. Each model file has a large header containing information about the selected features and the generated model:

    # c0 + a0 * [(|r_p_B - (r_s_B)|) / ([(r_d_A) * (E_HOMO_B)])] + a1 * [(|r_p_B - (r_s_A)|) * ([(IP_A) / (r_s_A)])] + a2 * [(|E_HOMO_B - (EA_B)|) / ((r_p_A)^2)]
    # RMSE: 0.0779291679452223; Max AE: 0.290810937048465
    # Coefficients
    # Task;    a0                      a1                      a2                      c0
    # 0,       7.174549961742731e+00,  8.687856036798111e-02,  2.468463139364077e-01, -3.995345676823570e-02,
    # Feature Rung, Units, and Expressions
    # 0,  2, 1 / eV,                                           [(|r_p_B - (r_s_B)|) / ([(r_d_A) * (E_HOMO_B)])]
    # 1,  2, eV,                                               [(|r_p_B - (r_s_A)|) * ([(IP_A) / (r_s_A)])]
    # 2,  2, 1 / AA^2 * eV,                                    [(|E_HOMO_B - (EA_B)|) / ((r_p_A)^2)]
    # Number of Samples Per Task
    # Task;   n_mats_train
    # 0,      78

    After this header the following data is stored in the file:

    #Property Value          Property Value (EST)    Feature 0 Value         Feature 1 Value         Feature 2 Value

    With this file, the model can be perfectly recreated using the Python bindings.
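
    Independently of the Python bindings, the coefficients can also be read straight from the header with the standard library. The sketch below parses the Coefficients block of the header shown above (with the long expression line shortened) and evaluates c0 + a0 * f0 + a1 * f1 + a2 * f2 for one sample. The function names are hypothetical, the feature values passed to evaluate are made-up placeholders, and the parser assumes the header layout matches the sample exactly.

```python
HEADER = """\
# c0 + a0 * [feature 0] + a1 * [feature 1] + a2 * [feature 2]
# RMSE: 0.0779291679452223; Max AE: 0.290810937048465
# Coefficients
# Task;    a0                      a1                      a2                      c0
# 0,       7.174549961742731e+00,  8.687856036798111e-02,  2.468463139364077e-01, -3.995345676823570e-02,
# Feature Rung, Units, and Expressions
"""


def parse_coefficients(header_text):
    """Return {task: {coefficient_name: value}} from a model file header."""
    lines = [line.lstrip("#").strip() for line in header_text.splitlines()]
    start = lines.index("Coefficients")
    # The names follow "Task;" on the line after the "Coefficients" marker.
    names = lines[start + 1].split(";")[1].split()
    coefficients = {}
    for line in lines[start + 2:]:
        if line.startswith("Feature Rung"):
            break
        fields = [field.strip() for field in line.split(",") if field.strip()]
        if not fields:
            continue
        coefficients[fields[0]] = dict(zip(names, map(float, fields[1:])))
    return coefficients


def evaluate(task_coefficients, feature_values):
    """Evaluate c0 + sum_i a_i * feature_i for one sample."""
    return task_coefficients["c0"] + sum(
        task_coefficients[f"a{i}"] * value for i, value in enumerate(feature_values)
    )


coefficients = parse_coefficients(HEADER)
# Placeholder feature values for a single sample, for illustration only.
estimate = evaluate(coefficients["0"], [0.01, 1.5, 0.2])
print(coefficients["0"], estimate)
```

    Applying evaluate to the feature columns of each data row should reproduce the Property Value (EST) column up to rounding, which is a quick consistency check on a parsed file.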

    Using the Python Library

    To see how the Python interface can be used, look at examples/python_interface_demo.ipynb. If you get an error about not being able to load MKL libraries, you may have to run conda install numpy to get proper linking.