Update paper per Chris v2.0 comments

Commit 6c1e03f5 authored 3 years ago by Thomas Purcell
--- a/joss/paper.bib
+++ b/joss/paper.bib
@@ -217,3 +217,14 @@ url = {https://pubs.acs.org/doi/full/10.1021/acs.jcim.9b00807},
 volume = {59},
 year = {2019}
 }
+@article{scikit-learn,
+ title={Scikit-learn: Machine Learning in {P}ython},
+ author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
+         and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
+         and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
+         Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
+ journal={Journal of Machine Learning Research},
+ volume={12},
+ pages={2825--2830},
+ year={2011}
+}
--- a/joss/paper.md
+++ b/joss/paper.md
@@ -23,38 +23,26 @@ affiliations:
 date: September 2021
 bibliography: paper.bib
 ---
-a) Summary:
-Describe SISSO for a non-specialist audience [what are primary features, operators, symbolic regression, etc.] and what are its use cases in science.
-2:25
-(b) Statement of need:
-Add what makes this implementation superior to existing ones with clear examples. In my eyes these are
-(i) performance/scalability
-(ii) documentation and extendibility
-(iii) user-friendliness (advanced features and scripts to perform recurring tasks)
-2:26
-A statement of need: Does the paper have a section titled ‘Statement of Need’ that clearly states what problems the software is designed to solve and who the target audience is?
-
-(c) Features:
-API, documentations, tutorial and quickstart guides are also important features

 # Summary
 The sure-independence screening and sparsifying operator (SISSO) method [@Ouyang2017] is an algorithm belonging to the field of artificial intelligence.
-As a symbolic regression technique SISSO is used to identify low-dimensional, analytic functions, the so called descriptors, that best reproduce or classify a target data set.
-In practice, SISSO first constructs a large and exhaustive feature space of billions of of potential descriptors by taking in a set of user-provided primary features, i.e. the input features, and then iteratively applying a set of analytical unary and binary operators, e.g., addition, multiplication, exponentiation, and squaring.
+As a symbolic regression technique, SISSO is used to identify low-dimensional, analytic functions, the so-called descriptors, that best reproduce or classify a target data set.
+In practice, SISSO first constructs an exhaustive feature space of billions of candidate descriptors by starting from a set of user-provided primary features, i.e., the input features, and iteratively applying a set of analytical unary and binary operators, e.g., addition, multiplication, exponentiation, and squaring.
 From this exhaustive pool of candidate descriptors, sure-independence screening first selects the subset of features most correlated with the target, and the sparsifying operator then performs an $\ell_0$-regularized search over this subset to find the best low-dimensional linear model.

 Because symbolic regression generates an interpretable equation, it has become an increasingly popular method across scientific disciplines [@Wang2019a; @Neumann2020; @Udrescu2020a].
-In particular, SISSO has been used successfully in the past to solve numerous problems in material science, including: the stability of materials [@Bartel2018a, @Schleder2020], catalysis [@Han2021, @Xu2020, @Andersen2021], and glass transition temperatures [@Pilania2019].
-Beyond regression problems SISSO has also successfully used classify materials into different crystal prototypes [@Ouyang2019] or to determine if a material is a topological insulator or not [@Cao2019].
+A particular advantage of these approaches is their ability to model complex phenomena using relatively simple features.
+Because of this, SISSO has been used successfully to model, explore, and predict important material properties, including the stability of different phases [@Bartel2018a; @Schleder2020], catalytic activity and reactivity [@Han2021; @Xu2020; @Andersen2021], and glass transition temperatures [@Pilania2019]. Beyond regression problems, SISSO has also been used successfully to classify materials into different crystal prototypes [@Ouyang2019] and to determine whether a material is a topological insulator [@Cao2019].

 The SISSO++ package is a modular and extensible C++ implementation of the SISSO method with Python bindings.
 Specifically, SISSO++ applies this methodology to regression, log regression, and classification problems.
 Additionally, the library includes multiple Python functions for post-processing, analyzing, and visualizing the resulting models.

 # Statement of need
-The main goal of the SISSO++ package is to provide a user-friendly, easily-extendable version of the SISSO method for the use of the scientific community that can be used both on high-performance architectures for data production and on personal computing devices for analyzing and visualizing the results.
+The main goal of the SISSO++ package is to provide a user-friendly, easily extendable version of the SISSO method for the scientific community that can be used both on high-performance architectures for data production and on personal computing devices for analyzing and visualizing the results.
 For this reason, all computationally intensive tasks are written in C++ and support parallelization via MPI and OpenMP.
 Additionally, Python bindings make it easy to incorporate the methods into computational workflows and to post-process the results.
+Furthermore, these bindings can facilitate the future integration of SISSO into existing machine-learning frameworks, e.g., scikit-learn [@scikit-learn].
 The code is designed in a modular fashion, which simplifies extending it to other applications.
 Finally, the project's extensive documentation and tutorials provide a good entry point for new users of the method.
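The two-stage procedure described in the Summary (feature-space construction, then sure-independence screening followed by an $\ell_0$ search) can be illustrated with a minimal pure-Python sketch. This is not the SISSO++ API or its C++ implementation: the function names (`build_feature_space`, `abs_corr`, `least_squares`, `sisso`), the tiny operator set, the single rung of feature construction, and the use of absolute Pearson correlation with the current residual as the screening score are all assumptions chosen for illustration, and only regression is covered.

```python
import itertools
import math

# Hypothetical, deliberately small operator sets (not the SISSO++ operator list).
UNARY = {"sq": lambda x: x * x, "exp": math.exp}
BINARY = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def build_feature_space(primary):
    """One rung of construction: the primary features plus one operator application."""
    space = dict(primary)
    for name, vals in primary.items():
        for op, f in UNARY.items():
            space[f"{op}({name})"] = [f(v) for v in vals]
    for (n1, v1), (n2, v2) in itertools.combinations(primary.items(), 2):
        for op, f in BINARY.items():
            space[f"{op}({n1},{n2})"] = [f(a, b) for a, b in zip(v1, v2)]
    return space


def abs_corr(x, y):
    """Absolute Pearson correlation, used here as the (assumed) SIS score."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def least_squares(columns, y):
    """Fit y = c0 + sum_i c_i * columns[i] via the normal equations.

    Returns (coefs, rmse, predictions), or (None, inf, None) if singular.
    """
    X = [[1.0, *row] for row in zip(*columns)]
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p], b[i], b[p] = A[p], A[i], b[p], b[i]
        if abs(A[i][i]) < 1e-12:
            return None, math.inf, None
        for r in range(k):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [a - f * c for a, c in zip(A[r], A[i])]
                b[r] -= f * b[i]
    coefs = [b[i] / A[i][i] for i in range(k)]
    pred = [sum(c * x for c, x in zip(coefs, row)) for row in X]
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y))
    return coefs, rmse, pred


def sisso(primary, y, n_dim=2, n_sis=5):
    """Sketch of the SISSO loop: SIS on the residual, then an exhaustive l0 search."""
    space = build_feature_space(primary)
    selected, residual, best = [], list(y), None
    for dim in range(1, n_dim + 1):
        # SIS: add the n_sis unselected features most correlated with the residual.
        ranked = sorted(
            (name for name in space if name not in selected),
            key=lambda name: abs_corr(space[name], residual),
            reverse=True,
        )
        selected += ranked[:n_sis]
        # l0 step: try every dim-sized subset of the accumulated SIS subspace.
        best = None
        for combo in itertools.combinations(selected, dim):
            coefs, rmse, pred = least_squares([space[n] for n in combo], y)
            if coefs is not None and (best is None or rmse < best[2]):
                best = (combo, coefs, rmse, pred)
        residual = [t - p for t, p in zip(y, best[3])]
    return best[:3]


if __name__ == "__main__":
    x1 = [0.1, 0.4, 0.7, 1.0, 1.3, 1.6, 1.9, 2.2]
    x2 = [1.2, 0.3, 0.9, 0.5, 1.1, 0.2, 0.8, 1.4]
    # Synthetic target built from two candidate descriptors plus an intercept.
    y = [2 * a * b + 0.5 * a * a + 1 for a, b in zip(x1, x2)]
    combo, coefs, rmse = sisso({"x1": x1, "x2": x2}, y, n_dim=2)
    print(combo, coefs, rmse)
```

With `n_dim=2` the search should recover the pair `mul(x1,x2)` and `sq(x1)` used to generate the synthetic target. The exhaustive subset search is what makes the $\ell_0$ step expensive, which is why the screening step first restricts it to a small subspace.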