Commit 6c1e03f5 authored by Thomas Purcell

Update paper per Chris v2.0 comments

parent d4df6d55
@@ -217,3 +217,14 @@ url = {https://pubs.acs.org/doi/full/10.1021/acs.jcim.9b00807},
volume = {59},
year = {2019}
}
@article{scikit-learn,
title={Scikit-learn: Machine Learning in {P}ython},
author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
journal={Journal of Machine Learning Research},
volume={12},
pages={2825--2830},
year={2011}
}
@@ -23,38 +23,26 @@ affiliations:
date: September 2021
bibliography: paper.bib
---
a) Summary:
Describe SISSO for a non-specialist audience [what are primary features, operators, symbolic regression, etc.] and what are its use cases in science.
2:25
(b) Statement of need:
Add what makes this implementation superior to existing ones with clear examples. In my eyes these are
(i) performance/scalability
(ii) documentation and extendibility
(iii) user-friendliness (advanced features and scripts to perform recurring tasks)
2:26
A statement of need: Does the paper have a section titled ‘Statement of Need’ that clearly states what problems the software is designed to solve and who the target audience is?
(c) Features:
API, documentations, tutorial and quickstart guides are also important features
# Summary
The sure-independence screening and sparsifying operator (SISSO) method [@Ouyang2017] is an algorithm belonging to the field of artificial intelligence.
As a symbolic regression technique SISSO is used to identify low-dimensional, analytic functions, the so called descriptors, that best reproduce or classify a target data set.
In practice, SISSO first constructs a large and exhaustive feature space of billions of of potential descriptors by taking in a set of user-provided primary features, i.e. the input features, and then iteratively applying a set of analytical unary and binary operators, e.g., addition, multiplication, exponentiation, and squaring.
As a symbolic regression technique, SISSO is used to identify low-dimensional, analytic functions, the so-called descriptors, that best reproduce or classify a target data set.
In practice, SISSO first constructs a large and exhaustive feature space of billions of potential descriptors by taking in a set of user-provided primary features, i.e., the input features, and then iteratively applying a set of analytical unary and binary operators, e.g., addition, multiplication, exponentiation, and squaring.
From this exhaustive pool of candidate descriptors, the best one is identified by performing an $\ell_0$-regularization to find the best low-dimensional linear model of the features using the SISSO operator.
Because symbolic regression generates an interpretable equation, it has become an increasingly popular method across scientific disciplines [@Wang2019a; @Neumann2020; @Udrescu2020a].
In particular, SISSO has been used successfully in the past to solve numerous problems in material science, including: the stability of materials [@Bartel2018a, @Schleder2020], catalysis [@Han2021, @Xu2020, @Andersen2021], and glass transition temperatures [@Pilania2019].
Beyond regression problems SISSO has also successfully used classify materials into different crystal prototypes [@Ouyang2019] or to determine if a material is a topological insulator or not [@Cao2019].
A particular advantage of these approaches is their capability to model complex phenomena using relatively simple features.
Because of this, SISSO has been used successfully in the past to model, explore, and predict important material properties, including the stability of different phases [@Bartel2018a; @Schleder2020], catalytic activity and reactivity [@Han2021; @Xu2020; @Andersen2021], and glass transition temperatures [@Pilania2019].
Beyond regression problems, SISSO has also been used successfully to classify materials into different crystal prototypes [@Ouyang2019] and to determine whether a material is a topological insulator [@Cao2019].
The SISSO++ package is a modular and extensible C++ implementation of the SISSO method with Python bindings.
Specifically, SISSO++ applies this methodology to regression, log regression, and classification problems.
Additionally, the library includes multiple Python functions to facilitate post-processing, analyzing, and visualizing the resulting models.
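
The workflow described in this section (operator-based feature construction, screening, and an $\ell_0$ search) can be illustrated with a toy sketch. This is not the SISSO++ API: the function names `build_feature_space`, `sis`, and `l0_fit`, the operator set, the synthetic data, and the single round of operator application are all assumptions made for illustration, and only a one-dimensional descriptor is searched.

```python
import itertools

import numpy as np


def build_feature_space(primary, names):
    """Apply a few unary and binary operators once to the primary features."""
    base = list(zip(names, primary.T))
    feats = dict(base)
    # Unary operators: squaring and exponentiation.
    for name, col in base:
        feats[f"({name})^2"] = col**2
        feats[f"exp({name})"] = np.exp(col)
    # Binary operators: addition and multiplication of distinct pairs.
    for (n1, c1), (n2, c2) in itertools.combinations(base, 2):
        feats[f"({n1} + {n2})"] = c1 + c2
        feats[f"({n1} * {n2})"] = c1 * c2
    return feats


def sis(feats, residual, n_keep):
    """Screening step: keep the features most correlated with the residual.

    For the first dimension the residual is the target itself.
    """
    def score(key):
        return abs(np.corrcoef(feats[key], residual)[0, 1])

    return sorted(feats, key=score, reverse=True)[:n_keep]


def l0_fit(feats, candidates, target, dim):
    """Exhaustive l0 search: least-squares fit of every dim-tuple of candidates."""
    best = None
    for combo in itertools.combinations(candidates, dim):
        # Design matrix: selected features plus a constant column for the intercept.
        A = np.column_stack([feats[k] for k in combo] + [np.ones_like(target)])
        coefs, *_ = np.linalg.lstsq(A, target, rcond=None)
        rmse = np.sqrt(np.mean((A @ coefs - target) ** 2))
        if best is None or rmse < best[0]:
            best = (rmse, combo, coefs)
    return best


# Synthetic data (hypothetical): the target depends on the product of two inputs.
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 2.0, size=(50, 3))
y = 3.0 * X[:, 0] * X[:, 1] + 1.5

feats = build_feature_space(X, ["a", "b", "c"])
candidates = sis(feats, y, n_keep=5)
rmse, descriptor, coefs = l0_fit(feats, candidates, y, dim=1)
print(descriptor, coefs, rmse)
```

Because the synthetic target is an exact linear function of `a * b`, the sketch should select `(a * b)` as the one-dimensional descriptor. The full method applies the operators repeatedly and, for higher-dimensional models, repeats the screening step on the residual of the previous model before the $\ell_0$ search.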
# Statement of need
The main goal of the SISSO++ package is to provide a user-friendly, easily-extendable version of the SISSO method for the use of the scientific community that can be used both on high-performance architectures for data production and on personal computing devices for analyzing and visualizing the results.
The main goal of the SISSO++ package is to provide a user-friendly, easily extendable version of the SISSO method for the scientific community that can be used both on high-performance architectures for data production and on personal computing devices for analyzing and visualizing the results.
For this reason, all computationally intensive tasks are written in C++ and support parallelization via MPI and OpenMP.
Additionally, Python bindings make it easy to incorporate the methods into computational workflows and to postprocess results.
Furthermore, this can facilitate the future integration of SISSO into existing machine-learning frameworks, e.g., scikit-learn [@scikit-learn].
The code is designed in a modular fashion, which simplifies the process of extending the code for other applications.
Finally, the project's extensive documentation and tutorials provide a good entry point for new users of the method.