Commit 04b291bb authored by Thomas Purcell

Update paper

parent 5d253bee
---
-title: 'SISSO++'
+title: 'SISSO++: A C++ Implementation of the Sure Independence Screening and Sparsifying Operator'
tags:
- SISSO
- Symbolic Regression
@@ -32,17 +32,18 @@ SISSO is introduced for both regression and classification tasks.
In practice, SISSO first constructs a large and exhaustive feature space of billions of potential descriptors by taking in a set of user-provided *primary features*, and then iteratively applying a set of unary and binary operators, e.g., addition, multiplication, exponentiation, and squaring.
From this exhaustive pool of candidate descriptors, the best ones are identified via sure-independence screening, from which the best low-dimensional linear models are found via an $\ell_0$ regularization.
-Because symbolic regression generates an interpretable equation, it has become an increasingly popular method across scientific disciplines [@Wang2019a, @Neumann2020, @Udrescu2020a].
+Because symbolic regression generates an interpretable equation, it has become an increasingly popular method across scientific disciplines [@Wang2019a], [@Neumann2020], [@Udrescu2020a].
A particular advantage of these approaches is their capability to model complex phenomena using relatively simple descriptors.
-Because of this, SISSO has been used successfully in the past to model, explore, and predict important material properties, including: the stability of different phases [@Bartel2018a, @Schleder2020], the catalytic activity and reactivity [@Han2021, @Xu2020, @Andersen2021], and glass transition temperatures [@Pilania2019].
-Beyond regression problems, SISSO has also been used successfully to classify materials into different crystal prototypes [@Ouyang2019], or whether a material crystallizes in its ground state as a perovskite [@Bartel2019], or to determine if a material is a topological insulator or not [@Cao2020].
+Because of this, SISSO has been used successfully in the past to model, explore, and predict important material properties, including: the stability of different phases [@Bartel2018a], [@Schleder2020], the catalytic activity and reactivity [@Han2021], [@Xu2020], [@Andersen2021], and glass transition temperatures [@Pilania2019].
+Beyond regression problems, SISSO has also been used successfully to classify materials into different crystal prototypes [@Ouyang2019a], or whether a material crystallizes in its ground state as a perovskite [@Bartel2019a], or to determine if a material is a topological insulator or not [@Cao2020].
The SISSO++ package is a modular and extensible C++ implementation of the SISSO method with Python bindings.
Specifically, SISSO++ applies this methodology for regression, log regression, and classification problems.
Additionally, the library includes multiple Python functions to facilitate post-processing, analyzing, and visualizing the resulting models.
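
As an illustration of the three steps described in this summary (feature construction from primary features, sure-independence screening, and $\ell_0$ selection of a low-dimensional linear model), the following is a minimal pure-Python sketch. It is not the SISSO++ implementation or its API: the function names (`build_feature_space`, `sis`, `l0_select`), the toy data, and the choice of a single round of three operators are all assumptions made for this example, and the screening is done only once against the target property.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0.0 or syy == 0.0 else sxy / math.sqrt(sxx * syy)


def build_feature_space(primary):
    """Apply one round of operators (square, add, multiply) to the primary features."""
    feats = dict(primary)
    for name, col in primary.items():
        feats[f"({name})^2"] = [v * v for v in col]
    for (na, a), (nb, b) in itertools.combinations(primary.items(), 2):
        feats[f"({na} + {nb})"] = [u + v for u, v in zip(a, b)]
        feats[f"({na} * {nb})"] = [u * v for u, v in zip(a, b)]
    return feats


def sis(feats, target, n_keep):
    """Screening: keep the n_keep features most correlated with the target."""
    ranked = sorted(feats, key=lambda k: abs(pearson(feats[k], target)), reverse=True)
    return ranked[:n_keep]


def least_squares(columns, y):
    """Fit y ~ c0 + sum_i c_i * columns[i]; return (coefficients, squared error)."""
    rows = [[1.0] + [col[r] for col in columns] for r in range(len(y))]
    p = len(rows[0])
    # Augmented normal equations [X^T X | X^T y], solved by Gauss-Jordan elimination.
    A = [[sum(row[i] * row[j] for row in rows) for j in range(p)]
         + [sum(row[i] * t for row, t in zip(rows, y))] for i in range(p)]
    for i in range(p):
        piv = max(range(i, p), key=lambda r: abs(A[r][i]))
        if abs(A[piv][i]) < 1e-12:
            return None, float("inf")  # collinear features: skip this model
        A[i], A[piv] = A[piv], A[i]
        for r in range(p):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [u - f * v for u, v in zip(A[r], A[i])]
    coefs = [A[i][p] / A[i][i] for i in range(p)]
    sse = sum((sum(c * v for c, v in zip(coefs, row)) - t) ** 2
              for row, t in zip(rows, y))
    return coefs, sse


def l0_select(feats, names, target, dim):
    """Exhaustive l0 search: best linear model using exactly `dim` screened features."""
    best = (float("inf"), None, None)
    for combo in itertools.combinations(names, dim):
        coefs, sse = least_squares([feats[k] for k in combo], target)
        if sse < best[0]:
            best = (sse, combo, coefs)
    return best


# Toy data (hypothetical): two primary features sampled on a 3x3 grid.
primary = {
    "a": [1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0],
    "b": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
}
target = [1.0 + 2.0 * u * v + 3.0 * v * v for u, v in zip(primary["a"], primary["b"])]

feats = build_feature_space(primary)
kept = sis(feats, target, n_keep=4)
sse, combo, coefs = l0_select(feats, kept, target, dim=2)
print("screened features:", kept)
print("selected descriptor:", combo, "coefficients:", coefs, "SSE:", sse)
```

Because the toy target is built as `1 + 2*a*b + 3*b^2`, the two-dimensional search should recover the features `(a * b)` and `(b)^2`. The exhaustive loop in `l0_select` is also why the screening step matters: the number of candidate models grows combinatorially with the size of the screened subset.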
# Statement of need
-The main goal of the SISSO++ package is to provide a user-friendly, easily extendable version of the SISSO method for the scientific community that can be used both on high-performance architectures for data production and on personal computing devices for analyzing and visualizing the results.
+The main goal of the SISSO++ package is to provide a user-friendly, easily extendable version of the SISSO method for the scientific community.
+The code can be used both on high-performance architectures for data production and on personal computing devices for analyzing and visualizing the results.
For this reason, all computationally intensive tasks are written in C++ and support parallelization via MPI and OpenMP.
Additionally, the Python bindings allow one to easily incorporate the methods into computational workflows and postprocess results.
Furthermore, this can facilitate the future integration of SISSO into existing machine-learning frameworks, e.g., scikit-learn [@scikit-learn].
@@ -68,6 +69,6 @@ The following features are implemented in SISSO++:
# Acknowledgements
-The authors would like to thank Markus Rampp and Meisam Tabriz for technical support. We would also like to thank Lucas Foppa, Jingkai Quan, Aakash Naik, and Luigi Sbailò for testing and providing valuable feedback. T.P. would like to thank the Alexander von Humboldt Foundation for their support through the Alexander von Humboldt Postdoctoral Fellowship Program. This project was supported by TEC1p (the European Research Council (ERC) Horizon 2020 research and innovation programme, grant agreement No. 740233), BiGmax (the Max Planck Society’s Research Network on Big-Data-Driven Materials-Science), and the NOMAD pillar of the FAIR-DI e.V. association.
+The authors would like to thank Markus Rampp and Meisam Tabriz of the MPCDF for technical support. We would also like to thank Lucas Foppa, Jingkai Quan, Aakash Naik, and Luigi Sbailò for testing and providing valuable feedback. T.P. would like to thank the Alexander von Humboldt Foundation for their support through the Alexander von Humboldt Postdoctoral Fellowship Program. This project was supported by TEC1p (the European Research Council (ERC) Horizon 2020 research and innovation programme, grant agreement No. 740233), BiGmax (the Max Planck Society’s Research Network on Big-Data-Driven Materials-Science), and the NOMAD pillar of the FAIR-DI e.V. association.
# References