TY - JOUR
T1 - Machine learning transition temperatures from 2D structure
AU - Sifain, Andrew E.
AU - Rice, Betsy M.
AU - Yalkowsky, Samuel H.
AU - Barnes, Brian C.
N1 - Funding Information:
Research was sponsored by the CCDC Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-19-2-0090. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the CCDC Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation. This work was supported in part by a grant of computer time from the DOD High Performance Computing Modernization Program at the ARL DoD Supercomputing Resource Center. We thank Brendan Gifford and Jason Morrill for fruitful discussions.
Publisher Copyright:
© 2021
Copyright:
Copyright 2021 Elsevier B.V., All rights reserved.
PY - 2021/6
Y1 - 2021/6
N2 - A priori knowledge of physicochemical properties such as melting and boiling points could expedite materials discovery. However, theoretical modeling from first principles poses a challenge for efficient virtual screening of potential candidates. As an alternative, the tools of data science are becoming increasingly important for exploring chemical datasets and predicting material properties. Herein, we extend a molecular representation, or set of descriptors, first developed for quantitative structure-property relationship modeling by Yalkowsky and coworkers known as the Unified Physicochemical Property Estimation Relationships (UPPER). This molecular representation has group-constitutive and geometrical descriptors that map to enthalpy and entropy, two thermodynamic quantities that drive thermal phase transitions. We extend the UPPER representation to include additional information about sp2-bonded fragments. Additionally, instead of using the UPPER descriptors in a series of thermodynamically-inspired calculations, as per Yalkowsky, we use the descriptors to construct a vector representation for use with machine learning techniques. The concise and easy-to-compute representation, combined with a gradient-boosting decision tree model, provides an appealing framework for predicting experimental transition temperatures in a diverse chemical space. An application to energetic materials shows that the method is predictive, despite a relatively modest energetics reference dataset. We also report competitive results on diverse public datasets of melting points (i.e., OCHEM, Enamine, Bradley, and Bergström) comprising over 47k structures. Open source software is available at https://github.com/USArmyResearchLab/ARL-UPPER.
KW - Cheminformatics
KW - Gradient boosting
KW - Machine learning
KW - Melting and boiling points
KW - Phase transitions
KW - Quantitative structure-property relationships
UR - http://www.scopus.com/inward/record.url?scp=85101770676&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85101770676&partnerID=8YFLogxK
U2 - 10.1016/j.jmgm.2021.107848
DO - 10.1016/j.jmgm.2021.107848
M3 - Article
AN - SCOPUS:85101770676
VL - 105
JO - Journal of Molecular Graphics and Modelling
JF - Journal of Molecular Graphics and Modelling
SN - 1093-3263
M1 - 107848
ER -