diff --git a/joss/paper.bib b/joss/paper.bib
index e64a1d1c32296d755915e11e1d8b4f494d6c1709..cc186e56cd5d8a8c3aba367db130b11773a0f652 100644
--- a/joss/paper.bib
+++ b/joss/paper.bib
@@ -217,3 +217,14 @@ url = {https://pubs.acs.org/doi/full/10.1021/acs.jcim.9b00807},
 volume = {59},
 year = {2019}
 }
+@article{scikit-learn,
+ title={Scikit-learn: Machine Learning in {P}ython},
+ author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
+ and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
+ and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
+ Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
+ journal={Journal of Machine Learning Research},
+ volume={12},
+ pages={2825--2830},
+ year={2011}
+}
diff --git a/joss/paper.md b/joss/paper.md
index c09ed273372220bd83898686a97a625e8afd50c8..21c546f860d9ea143e1fae3149f0ab7bedd01ae4 100644
--- a/joss/paper.md
+++ b/joss/paper.md
@@ -23,38 +23,26 @@ affiliations:
 date: September 2021
 bibliography: paper.bib
 ---
-a) Summary:
-Describe SISSO for a non-specialist audience [what are primary features, operators, symbolic regression, etc.] and what are its use cases in science.
-2:25
-(b) Statement of need:
-Add what makes this implementation superior to existing ones with clear examples. In my eyes these are
-(i) performance/scalability
-(ii) documentation and extendibility
-(iii) user-friendliness (advanced features and scripts to perform recurring tasks)
-2:26
-A statement of need: Does the paper have a section titled ‘Statement of Need’ that clearly states what problems the software is designed to solve and who the target audience is?
-
-(c) Features:
-API, documentations, tutorial and quickstart guides are also important features
 # Summary
 The sure-independence screening and sparsifying operator (SISSO) method [@Ouyang2017] is an algorithm belonging to the field of artificial intelligence.
-As a symbolic regression technique SISSO is used to identify low-dimensional, analytic functions, the so called descriptors, that best reproduce or classify a target data set.
-In practice, SISSO first constructs a large and exhaustive feature space of billions of of potential descriptors by taking in a set of user-provided primary features, i.e. the input features, and then iteratively applying a set of analytical unary and binary operators, e.g., addition, multiplication, exponentiation, and squaring.
+As a symbolic regression technique, SISSO is used to identify low-dimensional, analytic functions, the so-called descriptors, that best reproduce or classify a target data set.
+In practice, SISSO first constructs a large and exhaustive feature space of billions of potential descriptors by taking in a set of user-provided primary features, i.e., the input features, and then iteratively applying a set of analytical unary and binary operators, e.g., addition, multiplication, exponentiation, and squaring.
 From this exhaustive pool of candidate descriptors, the best one is identified by performing an $\ell_0$-regularization to find the best low-dimensional linear model of the features using the SISSO operator.
 Because symbolic regression generates an interpretable equation, it has become an increasingly popular method across scientific disciplines [@Wang2019a, @Neumann2020, @Udrescu2020a].
-In particular, SISSO has been used successfully in the past to solve numerous problems in material science, including: the stability of materials [@Bartel2018a, @Schleder2020], catalysis [@Han2021, @Xu2020, @Andersen2021], and glass transition temperatures [@Pilania2019].
-Beyond regression problems SISSO has also successfully used classify materials into different crystal prototypes [@Ouyang2019] or to determine if a material is a topological insulator or not [@Cao2019].
+A particular advantage of these approaches is their capability to model complex phenomena using relatively simple features.
+Because of this, SISSO has been used successfully in the past to model, explore, and predict important material properties, including the stability of different phases [@Bartel2018a, @Schleder2020], catalytic activity and reactivity [@Han2021, @Xu2020, @Andersen2021], and glass transition temperatures [@Pilania2019]. Beyond regression problems, SISSO has also been used successfully to classify materials into different crystal prototypes [@Ouyang2019] or to determine whether a material is a topological insulator [@Cao2019].
 The SISSO++ package is a modular and extensible C++ implementation of the SISSO method with python bindings.
 Specifically, SISSO++ applies this methodology for regression, log regression, and classification problems.
 Additionally the library include multiple python functions to facilitate the post-processing, analyzing, and visualizing the resulting models.
 # Statement of need
-The main goal of the SISSO++ package is to provide a user-friendly, easily-extendable version of the SISSO method for the use of the scientific community that can be used both on high-performance architectures for data production and on personal computing devices for analyzing and visualizing the results.
+The main goal of the SISSO++ package is to provide a user-friendly, easily extendable version of the SISSO method for the scientific community that can be used both on high-performance architectures for data production and on personal computing devices for analyzing and visualizing the results.
 For this reason, all computational-intensive task are written in C++ and support parallelization via MPI and openMP.
 Additionally, Python bindings allow to easily incorporate the methods in computational workflows and to easily postprocess results.
+Furthermore, this can facilitate the future integration of SISSO in existing machine-learning frameworks, e.g., scikit-learn [@scikit-learn].
 The code is designed in a modular fashion, which simplifies the process of extending the code for other applications.
 Finally the project's extensive documentation and tutorials provide a good access point for new-users of the method.