Combining Machine Learning and Computational Chemistry for Predictive Insights Into Chemical Systems
John A. Keith, Valentin Vassilev-Galindo, Bingqing Cheng, Stefan Chmiela, Michael Gastegger, Klaus-Robert Müller, and Alexandre Tkatchenko
Chem. Rev., 2021-07-07. DOI: 10.1021/acs.chemrev.1c00107
Machine learning models are poised to make a transformative impact on chemical sciences by dramatically accelerating computational algorithms and amplifying insights available from computational chemistry methods. However, achieving this requires a confluence and coaction of expertise in computer science and physical sciences. This Review is written for new and experienced researchers working at the intersection of both fields. We first provide concise tutorials of computational chemistry and machine learning methods, showing how insights involving both can be achieved. We follow with a critical review of noteworthy applications that demonstrate how computational chemistry and machine learning can be used together to provide insightful (and useful) predictions in molecular and materials modeling, retrosyntheses, catalysis, and drug design. A lasting challenge in applied physical and chemical sciences has been to answer the question: how can one identify and make chemical compounds or materials that have optimal properties for a given purpose? A substantial part of research in physics, chemistry, and materials science concerns the discovery and characterization of novel compounds that can benefit society, but most advances are still generally attributed to trial-and-error experimentation, which requires significant time and cost. Current global challenges create greater urgency for faster, better, and less expensive research and development efforts. Computational chemistry (CompChem) methods have significantly improved over time, and they promise paradigm shifts in how compounds are fundamentally understood and designed for specific applications. Machine learning (ML) methods have in the past decades witnessed an unprecedented technological evolution enabling a plethora of applications, some of which have become daily companions in our lives. 1−3 Applications of ML include technological fields, such as web search, translation, natural language processing, self-driving vehicles, and control architectures, and in the sciences, for example, medical diagnostics, 4−8 particle physics, 9 nanosciences, 10 bioinformatics, 11,12 brain-computer interfaces, 13 social media analysis, 14 robotics, 15,16 and team, social, or board games. 17−19 These methods have also become popular for accelerating the discovery and design of new materials, chemicals, and chemical processes. 20 At the same time, we have witnessed hype, criticism, and misunderstanding about how ML tools are to be used in chemical research. From this, we see a need for researchers working at the intersection of CompChem+ML to more critically recognize the true strengths and weaknesses of each component in any given study. Specifically, we wanted to review why and how CompChem+ML can provide useful insights into the study of molecules and materials. While developing this Review, we polled the scientific community with an anonymous online survey that asked for questions and concerns regarding the use of ML models with chemistry applications. Respondents raised excellent points, including:
1. ML methods are becoming less understood while they are also more regularly used as black-box tools.
2. Many publications show inadequate technical expertise in ML (e.g., inappropriate splitting of training, testing, and validation sets).
3. It can be difficult to compare different ML methods and know which is the best for a particular application, or whether ML should even be used at all.
4. Data quality and context are often missing from ML modeling, and data sets need to be made freely available and clearly explained.
Additionally, when asked about the most exciting active and emerging areas of ML in the next five years, respondents mentioned a wide range of topics, from catalysis discovery, drug and peptide design, and "above the arrow" reaction predictions to generative models that promise to fundamentally transform chemical discovery. When asked about challenges that ML will not surmount in the next five years, respondents mentioned modeling complex photochemical and electrochemical environments, discovering exact exchange-correlation functionals, and completely autonomous reaction discovery. This Review will give our perspective on many of these topics. As context for this Review, Figure 1 shows a heatmap depicting the frequency of ML keywords found in scientific articles that also have keywords associated with different American Chemical Society (ACS) technical divisions. Preparing this figure required several steps. First, lists of ML keywords were chosen. Second, lists of keywords were created by perusing ACS division symposia titles from the past five years. Third, Python scripts used Scopus Application Programming Interfaces (APIs) to identify the number of scientific publications that matched sets of ML and division symposia keywords. Figure 1 elucidates several interesting points. First, the most popular ML approaches across all divisions are clearly neural networks, followed by genetic algorithms and support vector machines/kernel methods. Second, divisions such as physical (PHYS), analytical (ANYL), and environmental (ENVR) are already using diverse sets of ML approaches; divisions such as inorganic (INOR), nuclear (NUCL), and carbohydrate (CARB) are primarily employing more distinct subsets of approaches; and other divisions, such as educational (CHED), history (HIST), law (CHAL), and business-oriented divisions (BMGT and SCHB), that is, divisions that produce much fewer scholarly journal articles, are not linked to publications that mention ML. Third, ML has become more prevalent across practically all divisions over time. For further insight, Table 1 lists the top four keywords obtained from recent ACS symposium titles, as well as their respective contribution percentages reflected in Figure 1. There, one sees that a handful of keywords can significantly overshadow matches in some of the bins, for example, "electro", "sensor", "protein", and "plastic". With any ML application, there will be a risk of imperfect data or user bias, but this is a useful launch point to appreciate how and where ML is being used in chemical sciences. A key takeaway is that we are witnessing an unprecedented crescendo of interest in ML over the last ten years (e.g., Figure 1c) thanks to improved understanding of the intersectionality of traditional science and engineering disciplines with rapidly evolving disciplines such as CompChem and data science. (The Python scripts used to generate Figure 1 and Table 1 are freely available with a Creative Commons attribution license.
Readers are welcome to use, adapt, and share these scripts with appropriate attribution: https://github.com/keithgroup/scopus_searching_ML_in_chem_literature.) This Review will classify concepts involving CompChem, ML, and chemical and physical intuition (CPI) using a rendition of a "data to wisdom" hierarchy (Figure 2). Scholars have noted shortcomings with similar constructs, 21 but we use it to reflect a stepladder for scientific progress, starting from collecting data and ending with overall impact. CompChem, ML, and CPI each have different strengths and weaknesses and bring synergistic opportunities. CPI alone can be employed to climb the ladder from data to impact, but current CPI may only provide limited understanding or applicability outside of available data sets. However, CompChem is extraordinarily well-suited for generating high-quality data that contain useful information (vide infra, section 2), often more easily than via traditional experimentation. ML is likewise extremely well-suited for recognizing and accurately quantifying nonlinear relationships (vide infra, section 3), a task that is especially difficult for even the most expert-level CPI alone. A key opportunity is that useful ML requires robust data sets, and these can be provided by CompChem as long as the CPI component is selecting and correctly interpreting appropriate methods for the task at hand to productively climb the ladder toward impact (vide infra, section 4). We stress that the impact generation process shown in Figure 2 is by no means a linear one; on the contrary, it contains many loops and dead ends. As we show later (in section 4), within the troika of CompChem+ML+CPI, ML acts as a catalyst that accelerates exploratory data-driven hypothesis generation. Automatically generated hypotheses are then validated and calibrated with CompChem and CPI to yield further improved ML modeling (enriched by more physical prior knowledge), which then loops back with improved hypotheses. This feedback loop is the key to modern knowledge discovery leading to insight, wisdom, and hopefully positive impacts on society. We consider quantum mechanics as described by the nonrelativistic time-independent Schrödinger equation as our "standard model" because it accurately represents the physics of charged particles (electrons and nuclei) that make up almost all molecules and materials. Indeed, this opinion has been held by some for almost a century: "The fundamental laws necessary for the mathematical treatment of a large part of physics and the whole of chemistry are thus completely known, and the difficulty lies only in the fact that application of these laws leads to equations that are too complex to be solved." (P. A. M. Dirac, 1929). Any theoretical method for predicting molecular or material phenomena must first be rooted in quantum mechanics theory and then suitably coarse-grained and approximated so that it can be applied in a practical setting. CompChem, or more precisely computational quantum chemistry, defines computationally driven numerical analyses based on quantum mechanics. In this section, we will explain how and why different CompChem methods capture different aspects of the underlying physics. Specifically, this section provides a concise overview of the broad range of CompChem methods that are available for generating data sets that would be useful for ML-assisted studies of molecules and materials. 2.1.1. Models and Levels of Abstraction. Models extract information from data.
The renowned statistician George Box famously discussed "good models" as those characterized as "simple", "illuminating", and "useful". 22 Good models should be parsimonious and describe essential relationships without overelaboration. The ideal gas equation, PV = nRT, exemplifies a good model. The ideal gas equation relates the macroscopic pressure (P), volume (V), amount of gas (n, in moles), and temperature (T) of gases under idealized conditions, without requiring explicit knowledge of the processes occurring on an atomic scale. Its simple functional form needs just one parameter, the ideal gas constant R, and this makes it possible to formulate useful insights, such as how, at constant pressure, a gas expands with rising temperature. On the other hand, this elegant equation only holds for conditions where the gas behaves as an ideal gas. The derivation of more accurate models of gases requires more mathematically complicated equations of state that rely on more free parameters, 23 which in turn obfuscate physical insights, require more computational effort to solve, and thus make the model less "good". This example also offers a convenient connection to the ML models that will be discussed later in section 3. As mathematical models for complex phenomena become more complicated and less intuitive to derive, ML models that infer nonlinear relationships from data become more applicable as increasing amounts of empirical data become available. Alternatively, the conventional CompChem treatment entails first determining the system's relevant geometry and its total ground state energy, from which physical properties of interest (e.g., pressure, volume, band gap, polarizability, etc.) can be obtained using quantum and statistical mechanics. In this section, we discuss the relevant CompChem methods for these tasks. While the mathematical physics for these methods might occasionally be too complicated for a user to fully understand, many algorithms exist so that they can still be easily run in a "black-box" way with modern computational chemistry software and accompanying tutorials. 24−27 CompChem thus serves as an invaluable tool to generate data and information for knowledge and insights across many length and time scales. Figure 3 is an adaptation of a multiscale hierarchy of different classes of CompChem methods. It shows their applicability for modeling different length and time scales and depicts how large-scale models may be developed based on smaller-scale theories. 2.1.2. CompChem Representations. Integral to every CompChem study is the user's representation of the system, that is, how the user chooses to describe the system. CompChem representations can range from simple and lucid (e.g., a precise chemical system such as a water molecule isolated in a vacuum) to complex and ambiguous (e.g., a putative but speculative depiction of a solid−liquid interface under electrochemical conditions). Approximate wavefunctions (expressed in a basis set of mathematical functions) or approximate Hamiltonians (referred to as levels of theory), as described below in this section, can also be considered representations. One might then say that many representations for different components of a system will constitute an overall representation, and this is true. The point we make is that the validity of any computational result depends on the overall representation, and sometimes an incorrect representation may provide a correct result due to "fortuitous error cancellation".
In CompChem studies, a valid representation is one that captures the nature of the physical phenomena of a system. For a molecular example, if one is determining the bond energy of a large biodiesel molecule using CompChem methods, 28 it may or may not be justified to approximate a nearby long-chain alkyl group (−C_nH_(2n+1)) simply as a methyl group (−CH3) or even a hydrogen atom. Indeed, choosing such a representation can sometimes be a useful example of CPI, since alkyl bonds usually exhibit relatively short-ranged interactions (a feature that will be discussed in the context of ML in more detail in section 4.1.3). An atomic scale geometry with fewer atoms would reduce the computational cost of the study or allow a more accurate but more computationally expensive calculation to be run. On the other hand, it might also be a poor choice if the chemical group, for example, a substituted alkyl group, participated in physical organic interactions, such as subtle steric, induction, or resonance effects. 29 For a solid-state example, a user might exercise good CPI by assuming that a relatively small unit cell under periodic boundary conditions would capture salient features of a bulk material or a material surface (as is often the case for many metals). On the other hand, subtle symmetry-breaking effects in materials (e.g., distortions arising from tilting octahedra groups in perovskites 30 or surface reconstruction phenomena that occur on single crystals 31 ) might only be observed when considering larger and more computationally expensive unit cells. Relevant to both examples, it may also be that the CompChem method itself brings errors that obfuscate phenomena that the user intends to model. In general, CompChem errors may be due to (1) errors introduced by the user in the initial setup of the CompChem application or (2) errors in the CompChem method when treating the physics of the system. In section 3, we will discuss how the choice of ML representation plays a similarly critical role in determining whether and to what extent an ML model is useful. 2.1.3. Method Accuracy. The quantitative accuracy of a CompChem model stems from its suitability in describing the system. As explained above, the observed accuracy will depend on the representation being used. High-quality CompChem calculations have traditionally been benchmarked against data sets that consist of well-controlled and relatively precise thermochemistry experiments on small, isolated molecules. 32,33 The error bars for standard calorimetry experiments are approximately 4 kJ/mol (or 1 kcal/mol or 0.04 eV), and computational methods that can provide greater accuracy than this are said to achieve "chemical accuracy". Note that this term should be used when describing the accuracy of the method compared to the most accurate data possible; for example, if one CompChem method was found to reproduce another CompChem method within 1 kJ/mol, but both methods reproduce experimental data with errors of 20 kJ/mol, then neither method should be called chemically accurate. There are many well-established reasons why CompChem models can bring errors. For example, errors may be due to size consistency 34 or size extensivity 35 problems that are intrinsic to the CompChem method, and larger systems sometimes embody significant medium- and long-range interactions (e.g., van der Waals forces) 36 or self-interaction errors 37 that might not be noticeable in small test cases.
The recommended path forward is to consider which fundamental interactions are in play in the system and then use a CompChem model that is adequate at describing those interactions. Besides this, users should make use of existing tutorial references that provide practical knowledge about which parameters in a CompChem calculation should be carefully noted, for example, ref 38. Historically, the most popular CompChem methods for molecular and materials modeling (the B3LYP 39 and PBE 40 exchange-correlation functionals; see section 2.2.3) are often said to have an expected accuracy of about 10−15 kJ/mol (or 2−4 kcal/mol or 0.1−0.2 eV) when modeling differences between the total energies of two similar systems, and errors are expected to be somewhat larger when considering transition state energies. Though this is used as a simple rule, it is obviously an oversimplification, and actual accuracy can only be assessed by thoughtful benchmarking of the case being considered. 41−45 2.1.4. Precision and Reproducibility. In CompChem, one normally assumes that any two users using the same representation for the system with the same code on the same computing architecture will obtain the exact same result within the numerical precision of the computers being used. This is not always the case, especially for molecular dynamics (MD) simulations that often rely on stochastic methods. 46 Computational precision also becomes more concerning when there are different versions of codes in circulation, errors that might arise from different compilers and libraries, and a lack of consensus in the community about which computational methods and which default settings should be used for specific application systems, for example, grid density selections 47 or standard keywords for molecular dynamics simulations. 46,48 There have been efforts to confirm that different codes can reproduce energies for the same system representation, 48,49 but some commercial codes hold proprietary licenses that restrict publications that critically benchmark calculation accuracy and timings across different codes. A path forward to benefit the advancement of insight is the development of (open) source codes 50 that perform as well as, if not better than, commercial codes. While increased access to computational algorithms is beneficial, it also raises the need for enforcing high standards of quality and reproducibility. 51,52 We are also glad to see active developments to show more lucidly how any set of computational data is generated, precisely with which codes, keywords, and auxiliary scripts and routines. 53−56 We are now in an era where truly massive amounts of data and information can be generated for CompChem+ML efforts. To go forward, one needs to know what constitutes good and useful data, and the next section provides an overview of how to do this using CompChem. Earlier we mentioned that a usual task in CompChem is to calculate the ground state energy of an atomic scale system. Indeed, CompChem methods can determine the energy for a hypothetical configuration of atoms, and this constitutes the potential energy surface (PES) of the system (Figure 4). The PES is a hypersurface spanning 3N dimensions, where N is the number of atoms in the system. Since the PES is used to analyze chemical bonding between atoms within the system, the PES can also be simplified by ignoring translational and rotational degrees of freedom for the entire system.
This reduces the dimensionality of the PES from 3N to 3N − 5 for linear systems (e.g., diatomic molecules or perfectly linear molecules such as acetylene) or 3N − 6 for all other nonlinear systems. Furthermore, since visualization is difficult beyond three dimensions, PES drawings will show a 1-D or 2-D projection of this hypersurface, where the z-axis is conventionally used to represent the scale for system energy. Any arbitrary PES will contain several interesting features. Minima on the PES correspond to mechanically stable configurations of a molecule or material, for example, reactant and product states of a chemical reaction or different conformational isomers of a molecule. Because they are minima, the second derivative of the energy given by the PES with respect to any dimension will be positive. Minima can also be connected by pathways, which indicate chemical transformations (Figure 4, red line). Along such pathways, the second derivative can be positive, zero, or negative, but all other second derivatives must be positive. Transition states are first-order saddle points and thus represent a maximum in one coordinate and a minimum along all others. They correspond to the lowest energy barriers connecting two minima on the PES and are hence important for characterizing transitions between PES minima (e.g., chemical reactions). Second-order saddle points 57 and bifurcating pathways 58 can also exist, but these are not discussed further here. A wide range of higher-level properties of the system can be predicted or derived using the PES, including predicted thermodynamic binding constants, kinetic rate constants for reactions, or properties based on dynamics of the system. The task is then to choose an appropriate CompChem method that can carry out energy and gradient calculations on the system's PES. Figure 5 shows several different hierarchies of CompChem methods capable of doing this. Note that all of the methods mentioned in this figure fall into the bottom two regions of the multiscale hierarchy in Figure 3. [Figure 4 caption: Potential energy surface (PES) of a fictional system with two coordinates, R1 and R2. The minima of the PES correspond to stable states of a system, such as equilibrium configurations and reactants or products. Minima can be connected by paths (red line), along which rearrangements and reactions can occur. The maximum along such a path is called a transition state: a first-order saddle point, a maximum in one coordinate and a minimum in all others, corresponding to the minimum energy required to transition between two PES minima.] All of these methods in principle could be used to develop coarse-grained or continuum models as well. Also note that the methods in Figure 5 will bring very different computational costs and opportunities for methods involving ML. 2.2.1. Wavefunction Theory Methods. In standard computational quantum chemistry, a system's energy can be computed in terms of the Schrödinger equation. 61−63 The wavefunction that will be used to represent the positions of electrons and nuclei in the system, Ψ(r, R), is hard to intuit since it can be complex-valued. However, its square describes the real probability density of the nuclear (R) and electronic (r) positions.
In a real system, the position and interactions of a single particle in the system with respect to all other particles will be correlated, and this makes exactly solving the Schrödinger equation impossible for almost all systems of practical interest. To make the problem more tractable, one may exploit the Born−Oppenheimer approximation; 64 since nuclei are expected to move much more slowly than the electrons, they can be approximated as stationary at any point along the PES. This allows the energy to be calculated using the time-independent Schrödinger equation by solving the eigenvalue problem:

ĤΨ = EΨ (1)

Here, the Hamiltonian operator (Ĥ) is the sum of the kinetic (T̂) and potential (V̂) operators, Ψ is the wavefunction (i.e., an eigenfunction) that represents particles in the system, and E is the energy (i.e., an eigenvalue). In this way, nuclei can be treated as fixed point charges, and then eq 1 can be transformed into the so-called electronic Schrödinger equation, where the Hamiltonian Ĥ_el and wavefunction Ψ_el(r; R) now only depend on the nuclear coordinates R in a parametric fashion:

Ĥ_el Ψ_el(r; R) = E_el Ψ_el(r; R), with Ĥ_el = T̂_e + V̂_eN + V̂_NN + V̂_ee (2)

The above expression has Ĥ_el composed of single-electron (e) and pairwise electron−nuclear (eN), nuclear−nuclear (NN), and electron−electron (ee) terms. Here, we will now implicitly assume the Born−Oppenheimer approximation throughout and leave off the subscript indicating the electronic problem. However, we note that the Born−Oppenheimer approximation is not always sufficient, and computationally intensive nonadiabatic quantum dynamics may be required. 65 In certain cases, semiclassical treatments are appropriate; for example, nonadiabatic effects between electrons and nuclei can be considered using nuclear-electronic orbital methods. 66 A second common approximation is to expand the total electronic wavefunction in terms of one-electron wavefunctions (i.e., spin orbitals) ϕ(r_i). Electrons are fermions and therefore exhibit antisymmetry, which in turn results in the Pauli exclusion principle. Antisymmetry means that the interchange of any two particles within the system should bring an overall sign change to the wavefunction (i.e., from + to −, or vice versa). This property is conveniently captured mathematically by combining one-electron spin orbitals into the form of a Slater determinant:

Ψ = (1/√(n!)) det[ϕ_i(r_j)] (3)

Note that a determinant's sign changes whenever two columns or rows are interchanged, and in a Slater determinant this corresponds to interchanging electrons and thus the physically appropriate sign change for the overall wavefunction. Additionally, 1/√(n!) is a normalizing factor that ensures the wavefunction is normalized. The spin orbitals can be treated as a mathematical expansion in a basis set of μ functions χ_μ, each having coefficients c_μi, which are generally Gaussian basis functions: 67

ϕ_i = Σ_μ c_μi χ_μ (4)

The different types of mathematical functions bring different strengths and weaknesses, but these will not be discussed further here. A universal point is that larger basis sets will have more basis functions and thus give a more flexible and physical representation of electrons within the system. On one hand, this can be crucial for capturing subtle electronic structure effects due to electron correlation. On the other hand, larger basis sets also necessitate significantly higher computational effort. A standard technique to avoid high computational effort in electronic structure calculations is to replace nonreacting core electrons with analytic functions using effective core potentials (ECPs, i.e., pseudopotentials). 74−89
This requires reformulating the basis sets that describe the valence space of the atoms; see, for example, refs 90 and 91. Larger nuclei that bring higher atomic numbers and larger numbers of electrons will also exhibit relativistic effects, 92 and relativistic Hamiltonians are based on the Dirac equation 93,94 or quantum electrodynamics. 95 These methods can range from reasonably cost-effective methods 96,97 to those bringing extremely high computational cost. 98 Practical applications have traditionally used standard nonrelativistic Hamiltonian methods, along with ECPs (or pseudopotentials) that have been explicitly developed to account for the compressed core orbitals that result from relativistic effects. Using the Born−Oppenheimer approximation (eq 2) together with a Slater determinant wavefunction (eq 3) expressed in a finite basis set (eq 4) brings about the simplest wavefunction-based method, the Hartree−Fock (HF) approach (for historical context, see refs 99−101). The HF method is a mean-field approach, where each electron is treated as if it moves within the average field generated by all other electrons. It is generally considered inaccurate when describing many chemical systems, but it continues to serve as a critical pillar for CompChem electronic structure calculations, since it either establishes the foundation for all other accurate methods or provides energy contributions (i.e., exact exchange) that are not provided by some CompChem methods. CompChem methods that achieve accuracy higher than HF theory are said to contain electron correlation, a critical component for understanding molecules and materials (as described in more detail in section 2.2.2). Expressing Ψ as a Slater determinant and rearranging eq 2 while temporarily neglecting nuclear−nuclear interactions allows one to define the HF energy in terms of integrals over the electronic spin orbitals:

E_HF = Σ_i ⟨ϕ_i|T̂_e|ϕ_i⟩ + Σ_i ⟨ϕ_i|V̂_eN|ϕ_i⟩ + (1/2) Σ_{i,j} (⟨ϕ_i ϕ_j|ϕ_i ϕ_j⟩ − ⟨ϕ_i ϕ_j|ϕ_j ϕ_i⟩) (5)

where the first two terms are referred to as one-electron integrals and represent the kinetic energy of the electrons and the potential energy contributions from electron−nuclei interactions. The remaining terms are two-electron integrals that describe the potential energy arising from electron−electron interactions and are called Coulomb and exchange integrals. Using Lagrange multipliers, one can express the HF equation in a compact matrix form, the so-called Roothaan−Hall equations, 102−104 which allow for an efficient solution:

FC = SCϵ (6)

Each matrix has a size of μ × μ, where μ is the number of basis functions used to express the orbitals of the system. C is a coefficient matrix collecting the basis coefficients c_μi (see eq 4), while S is the overlap matrix measuring the degree of overlap between individual basis functions, and ϵ is a diagonal matrix of the spin orbital energies. Finally, F is the Fock matrix, with elements of a similar form as in eq 5 but expressed in terms of the basis functions χ_μ. One important detail not readily apparent in eq 6 is that the Fock matrix depends on the orbital coefficients that must be provided before eq 6 can be solved. As such, eq 6 cannot be solved in closed form but instead requires a so-called self-consistent field (SCF) approach. Starting from an arbitrary set of trial (i.e., initial guess) functions, one iteratively solves for optimal molecular orbital coefficients, which are then used to construct a new Fock matrix, until a minimum energy is reached in accordance with the variational principle of quantum mechanics.
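As a concrete illustration of this workflow, the short sketch below runs a restricted HF calculation with the open-source PySCF package. The molecule, geometry, and basis set are arbitrary choices made only for illustration; this is a minimal sketch, not a recommendation of settings for production work.

```python
from pyscf import gto, scf  # PySCF: open-source quantum chemistry package

# Define the representation: a water molecule (coordinates in Angstrom)
# expanded in a minimal Gaussian basis set (eq 4).
mol = gto.M(
    atom="O 0.0 0.0 0.0; H 0.0 0.757 0.587; H 0.0 -0.757 0.587",
    basis="sto-3g",
)

# Restricted Hartree-Fock: iterates the Roothaan-Hall equations (eq 6)
# to self-consistency, starting from an automatic initial guess.
mf = scf.RHF(mol)
e_hf = mf.kernel()  # total HF energy in Hartree
print(f"HF total energy: {e_hf:.6f} Ha")
```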
Evaluating and transforming the two-electron integrals in eq 5 is a significant bottleneck for these calculations, and thus the computational effort of the HF method formally scales as O(μ^4) with the number of basis functions. This means that a calculation on a system twice as large will require at least 2^4 = 16 times as much computing time. The electronic exchange interaction resulting from the antisymmetry of the wavefunction imposes a strong constraint on the mathematical form of ML models for electronic wavefunctions. Construction of efficient and reliable antisymmetric ML models for the many-body wavefunction is an important area of current research. 105,106 2.2.2. Correlated Wavefunction Methods. The system's correlation energy is defined as the sum of electron−electron interactions that originate beyond the mean-field approximation for electron−electron interactions that is provided by HF theory. While the correlation energy makes up a rather small contribution to the overall energy of a system (usually about 1% of the total energy), because internal energies in molecular and material systems are so enormous, this contribution becomes rather significant. As an example, most molecular crystals would be unstable as solids if calculated using the HF level of theory. The missing component is the attractive forces that are obtained from levels of theory that account for correlation energy. Correlation energies are obtained by calculating additional electron−electron interaction energies that arise from different arrangements of electron configurations (i.e., different possible excited states) that are not treated with the mean-field approach of HF theory. The most complete correlation treatment is the full configuration interaction (FCI) method, which is the exact numerical solution of the electronic Schrödinger equation (in the complete basis limit) that considers interactions arising from all possible excited configurations of electrons. The FCI wavefunction takes the form of a linear combination of all possible excited Slater determinants that can be generated from a single HF reference wavefunction by electron excitations:

Ψ_FCI = a_0 Ψ_HF + Σ_{α,β} a_α^β Ψ_α^β + Σ_{α,β,γ,δ} a_αγ^βδ Ψ_αγ^βδ + ... (7)

where Ψ_α^β represents the Slater determinant obtained by exciting an electron from orbital α into an unoccupied orbital β, and the coefficients a are expansion coefficients determining the weight of the different contributing configurations. Expectedly, FCI calculations scale extremely poorly with the number of electrons in the system (O(n!)), as the number of possible configurations grows rapidly, making them feasible only for small molecules. For an example of the state of the art, FCI calculations have been used to benchmark highly accurate methods on a benzene molecule. 107 Most correlated wavefunction methods use a subset of the possible configurations in eq 7 to be computationally tractable. The configuration interaction (CI) 108 method, for example, only includes determinants up to a certain excitation level (e.g., single and double excitations in CISD). Alternatively, MPn 35 methods (e.g., MP2) recover the correlation energy by applying different orders of perturbation theory. Coupled cluster theory, another widely used post-HF method, includes additional electron configurations via cluster operators. 109 One coupled cluster method that involves single, double, and perturbative triple excitations, CCSD(T), is referred to as the "gold standard" approach for CompChem electronic structure methods since it brings high accuracy for molecular energies.
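The correlation hierarchy described above can be climbed in a few lines with PySCF, assuming it is installed; the sketch below compares HF, MP2, and CCSD(T) total energies for N2. The bond length and basis set are illustrative assumptions only.

```python
from pyscf import gto, scf, mp, cc

# N2 near its equilibrium bond length (Angstrom), correlation-consistent basis.
mol = gto.M(atom="N 0 0 0; N 0 0 1.098", basis="cc-pvdz")
mf = scf.RHF(mol)
e_hf = mf.kernel()

# MP2: second-order perturbative estimate of the correlation energy.
e_mp2_corr = mp.MP2(mf).kernel()[0]

# CCSD plus the perturbative (T) triples correction ("gold standard").
mycc = cc.CCSD(mf)
e_ccsd_corr = mycc.kernel()[0]
e_t_corr = mycc.ccsd_t()

print("E(HF)      =", e_hf)
print("E(MP2)     =", e_hf + e_mp2_corr)
print("E(CCSD(T)) =", e_hf + e_ccsd_corr + e_t_corr)
```

Note how each correlated method reuses the HF reference (mf), mirroring the point made above that post-HF methods are built on top of an HF calculation.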
However, there are many newer advances that improve upon CCSD(T). 107,110 Note that just because a method has a reputation for being accurate does not mean that it will be accurate for all systems. For example, consider again the benzene molecule, which is best illustrated with dotted resonance bonds depicting a planar molecule with equal C−C bond lengths. Such a geometry will not be found to be stable with many different CompChem methods, in part because of subtle chemical bonding interactions or errors that arise from specific choices of basis sets used with different levels of theory. 111,112 A key point to reiterate is that correlated wavefunction methods are founded on HF theory, and so they are even more computationally demanding than HF calculations, for example, O(n^5) for MP2, O(n^6) for CCSD and CISD, and O(n^7) for CCSD(T). However, this computational expense is alleviated by continually improving computing resources (e.g., the usability of graphics processing units (GPUs)) 113−116 and the development of efficiency-enhancing algorithms, such as pseudospectral methods, 117−119 resolution of the identity (RI), 120 domain-based local pair natural orbital methods (DLPNO), 121 and explicitly correlated R12/F12 methods. 122 There are also ongoing efforts to develop other CompChem methods based on quantum Monte Carlo 123 and density matrix renormalization group theory (DMRG) 124 to provide high accuracy with scaling competitive with other computational methods. Efforts are beginning to be implemented that use ML to accelerate these types of calculations. 105,106,125−129 Schemes have also been developed to exploit systematic errors between different levels of theory with different basis sets so that approximations can be extrapolated toward an exact result. Examples include the complete basis set (CBS), 130 Gaussian Gn, 131 Weizmann (Wn), 132 and high accuracy extrapolated ab initio thermochemistry (HEAT) 133 methods. For a recent review on these and other methods, see ref 134. These schemes are also becoming a target of recent work using ML methods. 135 HF determinants provide good baseline approximations of the ground state electronic structure of many molecules, but they may describe poorly the more complicated bonding that arises during bond dissociation events, excited states, and conical intersections. 136−139 Some many-body wavefunctions are best described as a superposition of two or more configurations, for example, when configurations other than the HF determinant in eq 7 have similar or larger expansion coefficients a. For this reason, high-quality single-reference methods like CCSD(T) fail because the theory assumes that salient electronic effects are captured by the initial single HF configuration. (In fact, methods such as CCSD(T) have been implemented with diagnostic approaches available that let users know when there may be cause for concern.) 140−142 In these cases, it may no longer be trivial to find reliable black-box or automated procedures (e.g., in situations involving resonance states, chemical reactions, molecular excited states, transition metal complexes, metallic materials, etc.). 136
So-called multiconfiguration approaches, 136 such as the generalized valence bond (GVB) method, 143 the complete active space self-consistent field (CASSCF), 144 the multireference CI (MRCI) methods, 145 complete active space perturbation theory (CASPT2), 146 and multireference coupled cluster (MRCC), 147,148 can model these systems more physically since they employ several suitable reference configurations with different degrees of correlation treatment. These methods are not black-box and should be expected to require an experienced practitioner with CPI to choose the reference states, which can substantially influence the quality of results. 149 This is an area, though, where ML can bring progress in automating the selection of physically justified active spaces. 129 In closing, there are a large number of available correlated wavefunction methods, but many are even more costly than HF theory by virtue of requiring an HF reference energy expression as shown in eq 5. Figure 5a depicts a so-called "magic cube" (an extension beyond a traditional "Pople diagram" 135,150 ) that concisely shows a full hierarchy of computational approaches across different Hamiltonians, basis sets, and correlation treatment methods. This makes it easy to identify different wavefunction methods that should be more accurate and more likely to provide useful atomic scale insights (as well as those that would be more computationally intensive). Another important aspect highlighted in the "magic cube" is that higher-level wavefunction methods require larger basis sets to successfully model electron correlation effects. A CCSD(T) computation carried out with a small basis set, for example, might only offer the same accuracy as MP2 while being two orders of magnitude more expensive to evaluate. 108 As was mentioned earlier with the benzene system, spurious errors with different basis sets might still be found that indicate problems with specific combinations of levels of theory and basis sets. The deep complexity of correlated wavefunction methods makes this a promising area for continued efforts in CompChem+ML research. 2.2.3. Density Functional Theory. Density functional theory (DFT) 151 is another method to calculate the quantum mechanical internal energy of a system, using an energy expression that relies on functionals (i.e., functions of a function) of the electronic density ρ = |Ψ_el(r; R)|^2:

E = E[ρ] (8)

Compared to wavefunction theory, DFT should be far more efficient since the dimensionality of a density representation for electrons will always be three, rather than the 3n dimensions for any n-electron system described by a many-body wavefunction method. DFT has an important drawback in that the exact expression for the energy functional is currently unknown; all approximations bring some degree of uncontrollable error, and this has precipitated disagreeable opinions from purists in chemical physics, especially those who are developing correlated wavefunction methods. However, there is also substantial evidence that DFT approximations are reasonably reliable and accurate for many practical applications that bring information, knowledge, and sometimes insight. We now provide a bird's-eye view of DFT-based methods. One thrust of DFT developments since its inception has focused on designing accurate expressions strictly in terms of a density representation, and these approaches are referred to as "kinetic energy (KE-)" or "orbital-free (OF-)" DFT. 152
Some energy contributions (e.g., nuclear−electron energy and classical electron−electron energy terms) can be expressed exactly, but other terms, such as the kinetic energy as a functional of the density, are not known and must be approximated. OF-DFT is very computationally efficient (these methods should scale linearly with system size 153,154 ), but these formulations have not yet been developed to rival the accuracy or transferability of wavefunction methods, though they have been used for studying different classes of chemical and materials systems. 155−157 OF-DFT methods are also used in exciting applications modeling chemistry and materials under extreme conditions. 158−160 One should expect that once highly accurate forms are developed and matured, accurate CompChem calculations on the electronic structures of systems having more than a million atoms might become commonplace. Indeed, there are efforts to use ML to develop more physical OF-DFT methods. 161,162 The most commonly used form of DFT (which is also one of the most widely used CompChem methods in use today) is called Kohn−Sham (KS-)DFT. 163 In KS-DFT, one assumes a fictitious system of noninteracting electrons with the same ground state density as the real system of interest. This makes it possible to split the energy functional in eq 8 into a new form that involves an exact expression of the kinetic energy for noninteracting electrons:

E[ρ] = T_ni[ρ] + V_eN[ρ] + V_ee[ρ] + ΔT[ρ] + ΔV_ee[ρ] (9)

Here, T_ni[ρ] is the kinetic energy of the noninteracting electrons, V_eN[ρ] is the exact nuclear−electron potential, and V_ee[ρ] is the Coulombic (classical) energy of the noninteracting electrons. The last two terms are corrections due to the interacting nature of electrons and nonclassical electron−electron repulsion. KS-DFT also expands the three-dimensional electron density into a spin orbital basis ϕ, similar to HF theory, to define the one-electron kinetic energy in a straightforward manner. This allows the T_ni, V_eN, and V_ee expressions to be evaluated exactly, and one arrives at the KS energy:

E_KS[ρ] = T_ni[ρ] + V_eN[ρ] + V_ee[ρ] + E_xc[ρ] (10)

The last two correction terms in eq 9 arise from electron interactions, and these are combined into the so-called "exchange-correlation" term (E_xc), which uniquely defines which scheme of KS-DFT is being used. In theory, an exact E_xc term would capture all differences between the exact FCI energy and the system of noninteracting electrons for a ground state. The KS-DFT equations can be cast in a similar form as the Roothaan−Hall equations (eq 6), which allows for a computationally efficient solution. Moreover, the elements of the KS matrix (which replaces the Fock matrix F) are easier to evaluate due to the fact that several of the computationally intensive integrals are now accounted for via E_xc. Hence, the formal scaling for KS-DFT is O(n^3) with respect to the number of electrons. Even though this is much poorer scaling than ideally linear-scaling OF-DFT, the exact treatment of noninteracting electrons makes KS-DFT more accurate. Furthermore, there are several modern exchange-correlation functionals that routinely achieve much higher accuracy than HF theory with less computational cost, and thus KS-DFT is a competitive alternative to many correlated wavefunction methods in many modern applications. A remaining problem is constructing a practical expression for the exchange-correlation functional, as its exact functional form remains unknown. This has spawned a wealth of approximations that have been founded with different degrees of first-principles and/or empirical schemes.
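In practice, switching from HF to KS-DFT in a code like PySCF amounts to choosing an exchange-correlation approximation; the minimal sketch below makes that choice explicit (the geometry, basis set, and functional are illustrative assumptions, not recommendations).

```python
from pyscf import gto, dft

mol = gto.M(
    atom="O 0.0 0.0 0.0; H 0.0 0.757 0.587; H 0.0 -0.757 0.587",
    basis="def2-svp",
)

mf = dft.RKS(mol)    # restricted Kohn-Sham DFT (eq 10)
mf.xc = "b3lyp"      # exchange-correlation functional; try "pbe" or "scan" instead
e_dft = mf.kernel()  # total KS-DFT energy in Hartree
print(e_dft)
```

The single mf.xc string is the only thing that changes when moving up or down the "Jacob's Ladder" of functionals discussed next; everything else in the calculation stays the same.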
Classes of KS-DFT functionals are defined by whether the exchange-correlation functional is based on just the homogeneous electron gas (i.e., the "local density approximation", LDA), that and its derivative (i.e., the "generalized gradient approximation", GGA), or other additional terms that should result in physically improved descriptions or error cancellations. The resulting hierarchy of KS-DFT functionals is often referred to as a "Jacob's Ladder" of DFT (Figure 5b). Generally, the higher up the ladder one goes, the more accurate but more computationally demanding the calculation. 164 However, the intrinsic inexactness of DFT makes it difficult to assess which functionals are physically better than others. 165,166 Nevertheless, the Jacob's Ladder hierarchy is useful for clearly designating how and why newer methods should perform in specific applications (for perspective, see refs 167−169). Indeed, by being based on a ground-state representation of the homogeneous electron gas, DFT calculations can sometimes more easily bring physical insight into some systems that are very challenging for wavefunction theory to examine (e.g., metals, where HF theory provides divergent exchange energy behaviors 170,171 ). On the other hand, DFT is also generally not well-suited for studying physical phenomena involving localized orbitals or band structures such as those found in semiconducting materials with small band gaps, molecular or material excited charge transfer states, or interaction forces that can arise due to excited states, e.g., dispersion (or London) forces. The former features can normally be treated using Hubbard-corrected DFT+U models that require a system-specific U−J parameter 172,173 or more generalizable but much more computationally expensive hybrid DFT approaches. Dispersion forces (i.e., van der Waals interactions) are nonexistent in semilocal DFT approximations, and it is now commonplace to introduce them into DFT calculations using a variety of different methods. 36 There is also growing interest in using embedded CompChem calculation schemes that can partition systems into discrete regions that can be treated separately with highly accurate correlated wavefunction theory and computationally efficient KS-DFT schemes. 174−178 DFT has also been extended to the modeling of excited states in the form of time-dependent (TD-)DFT. 179 Similar to ground state DFT, TDDFT is a less computationally expensive alternative to excited state wavefunction-based methods. The approach yields reasonable results where excitations induce only small changes in the ground state density, e.g., low-lying excited states. 179,180 However, due to its single-reference nature, TDDFT tends to break down in situations where more than one electronic configuration contributes significantly to the excited state. Just as with correlated wavefunction methods, there are already signs of CompChem+ML efforts to improve the applicability of DFT-based methods. 181 2.2.4. Semiempirical Methods. Further approximations based on wavefunction and DFT methods have been developed to simplify and accelerate energy calculations. These so-called semiempirical methods still explicitly consider the electronic structure of a molecule, but in a more approximate way than the methods described above. Semiempirical approaches based on wavefunction theory include methods like extended Hückel theory and neglect of diatomic differential overlap (NDDO). 186
Both approaches are simplifications of the HF equations (eq 5) obtained by introducing approximations to the different integrals. In the NDDO approach, 187 only some of the two-electron integrals in eq 5 are retained, and the two orbitals on the right- and left-hand sides of the remaining two-center (and one-center) integrals are then approximated by introducing a set of empirical functions, one for each unique type of integral. Moreover, the overlap matrix in eq 6 is assumed to be diagonal, which greatly simplifies the energy evaluation. This reduces the required computational effort tremendously and allows the scaling of these approaches to be reduced to O(N^2). NDDO serves as a basis for more sophisticated semiempirical schemes, such as AM1, 188 PM7, 189 and MNDO, 190 where the energy is usually determined self-consistently using a minimally sized basis set. Inadequacies in the theory can be compensated by different empirical parametrization schemes that can allow these calculations to rival the accuracy of higher-level theory for some systems. For example, Dral et al. 191 provided a recent "big-data" analysis of the performance of several semiempirical methods with large data sets. Semiempirical schemes are also carried over to approximate KS-DFT with so-called density functional tight binding (DFTB). 192 DFTB simplifies the KS equations (eq 10) by decomposing the total electron density ρ into a density of free and neutral atoms ρ_0 and a small perturbation term δρ (ρ = ρ_0 + δρ). Expanding eq 10 in the perturbation δρ makes it possible to partition the total energy into three terms amenable to different approximation schemes:

E_DFTB = E_BS + E_Coul + E_rep (11)

E_rep is a repulsive potential containing interactions between the nuclei and contributions from the exchange-correlation functional (these are typically approximated via pairwise potentials). The charge fluctuation term E_Coul is modeled as a Coulomb potential of Gaussian charge distributions computed from the approximate density. Finally, E_BS refers to the "band structure" term, which considers the electronic structure and contains contributions from T_ni, V_eN, and the exchange-correlation functional (see eq 10). To compute E_BS, the density is expressed in a minimal basis of atomic orbitals, similar to NDDO. The necessary Hamiltonian and overlap integrals are then evaluated via an approximate scheme based on Slater−Koster transformations. In addition to the energy, atomic partial charges are also computed in this step, which are then used in E_Coul. As a consequence, the DFTB equations can also be solved self-consistently. DFTB methods are parametrized by finding suitable forms for the repulsive potential and adjusting the parameters used in the Slater−Koster integrals. Non-self-consistent and self-consistent tight-binding DFT methods 193,194 have been developed for simulating large-scale systems. Semiempirical methods have also been a target of different ML schemes, yielding improved parametrization schemes and more accurate functional approximations. 195−198 2.2.5. Nuclear Quantum Effects. The quantum nature of lighter elements, such as H−Li, and even heavier elements that form strong chemical bonds (the C−C bond in graphene, for example 199 ) gives rise to significant nuclear quantum effects (NQEs). Such effects are responsible for large deviations from the Dulong−Petit limit of the heat capacity of solids, isotope effects, and deviations of the particle momentum distribution from the Maxwell−Boltzmann distribution. 200
To capture NQEs, path-integral molecular dynamics (PIMD) 201,202 or centroid molecular dynamics (CMD) 203,204 can be used, but these methods are associated with much higher computational costs (usually about 30 times higher) compared with classical MD simulations using point nuclei. Moreover, because systems may be influenced by competing NQEs, the extent of NQEs is sensitive to the potential energy surface assumed. (Semi)local DFT approaches may not even qualitatively predict isotope fractionation ratios, and usually hybrid DFT is needed to reach quantitative accuracy. 205 However, employing hybrid DFT calculations or other high-level methods in PIMD/CMD simulations can accrue extremely high computational costs. For this reason, ML force fields have been proposed as an efficient means to carry out PIMD simulations, enabling an essentially exact quantum-mechanical treatment of both electronic and nuclear degrees of freedom, at least for small molecules with dozens of atoms. 206,207 2.2.6. Interatomic Potentials. Interatomic potentials introduce an additional level of abstraction compared to the methods described above. Instead of using exact quantum mechanical expressions to create the PES for the system, analytic functions are used to model a presupposed PES that contains explicit interactions between atoms, while electrons are treated in an implicit manner (sometimes using partial charge schemes). 251−256 Interatomic potentials thus are (oftentimes dramatically) more computationally efficient than correlated wavefunction, DFT, and semiempirical approaches. This efficiency makes it possible to study even larger systems of atoms (e.g., biomolecules, surfaces, and materials) than is possible with other computational methods. Note that different empirical potentials bring substantially different computational efficiencies; for example, Lennard-Jones (LJ) potentials 248 and related models 249,250 are more efficient than classical force fields (FFs) like AMBER and CHARMM, while those are in turn more efficient than most bond-order potentials, such as ReaxFF. 245,246 The degree of efficiency arises from the balance of using accurate or physically justified functional forms, approximations, and model parametrizations. There are many different formulations (see Figure 5c), and we will discuss the most general classes. An overview of the different types of potentials and their features is provided in Table 2. For extensive discussions of these methods, including semiempirical approaches, we refer to the extensive review by Akimov and Prezhdo (ref 257). An excellent review of interatomic potentials is provided by Harrison et al. (ref 258), and an excellent overview of modern methods can be found in a special issue of J. Chem. Phys. 259 The distinctions between different types of FFs can sometimes be blurry, and we will differentiate categories in ascending complexity. One of the simplest interatomic potentials is the LJ potential: 260

E = Σ_{i<j} 4 ε_ij [ (σ_ij/r_ij)^12 − (σ_ij/r_ij)^6 ] (12)

It models the total energy as the sum of all pairwise interactions between atoms i and j using an attractive and a repulsive term depending on the interatomic distance r_ij. ε_ij modulates the strength of the interaction function, while σ_ij defines where it reaches its minimum. The LJ potential is a prototypical "good model" of interatomic potentials, as it has a sufficiently simple physical form with only two parameters while still yielding useful results.
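Equation 12 is simple enough to implement directly. The sketch below is a minimal, unoptimized Python implementation in reduced units (ε = σ = 1 are illustrative defaults, and the O(N^2) double loop is written for clarity rather than speed); the dimer check at the end uses the fact that the pair energy equals −ε at the minimum-energy separation.

```python
import numpy as np

def lj_energy(coords, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones energy (eq 12) for an array of atomic positions."""
    n = len(coords)
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):  # sum over unique pairs i < j
            r = np.linalg.norm(coords[i] - coords[j])
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 ** 2 - sr6)  # repulsive - attractive
    return energy

# Sanity check: a dimer at r_min = 2**(1/6) * sigma should give E = -epsilon.
coords = np.array([[0.0, 0.0, 0.0], [2 ** (1 / 6), 0.0, 0.0]])
print(lj_energy(coords))  # -1.0
```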
For covalent systems, such as bulk carbon or silicon, pairwise distances alone are not sufficient to capture the local coordination of the atoms, and many empirical potentials 212,213,261 for these systems were expressed as a function of the pairwise distances and three-body terms within a certain cutoff distance. The pairwise term can take the form of LJ-type, electrostatic, or harmonic potentials, and the three-body term is usually a function of the angles formed by sets of three atoms. So-called class I classical FFs introduce a more complicated energy expression:

E = Σ_bonds k_ij (r_ij − r̄_ij)^2 + Σ_angles k_ijk (θ_ijk − θ̄_ijk)^2 + Σ_dihedrals k_ijkl [1 + cos(n ϕ_ijkl − ϕ̄_ijkl)] + Σ_{i<j} q_i q_j / r_ij + E_LJ (13)

The first three terms are the energy contributions of the distances (r_ij), angles (θ_ijk), and dihedral angles (ϕ_ijkl) between bonded atoms. Because of this, they are also referred to as bonded contributions. Bond and angle energies are modeled via harmonic potentials, with the k_ij and k_ijk parameters modulating the potential strength, and r̄_ij and θ̄_ijk are the equilibrium distances and angles. The dihedral term is modeled with a Fourier series to capture the periodicity of dihedral angles, with k_ijkl and ϕ_ijkl as free parameters. The last two terms account for nonbonded interactions: the long-range electrostatics are modeled as the Coulomb energy between charges q_i and q_j, and the van der Waals energy is treated via an LJ potential (eq 12). In class I/II FFs, empirical parameters are tabulated for a variety of elements in wide ranges of chemical environments (for example, ref 262). Parameters for any one system should not necessarily be assumed to transfer well to other systems, and reparametrizations may be needed depending on the application. Different sets of parametrization schemes give rise to different types of classical FFs, such as CHARMM 217 and Amber. 214 An extension beyond these FFs is class II (i.e., "polarizable") FFs, where the static charges are replaced by environment-dependent functions (e.g., AMOEBA 263 ). A significant advantage of the class I and II types of FFs is that they are computationally efficient, which makes them well suited for MD simulations of complex and extended (bio)molecules, such as proteins, lipids, or polymers. Implementations of FF calculations on GPUs make these simulations extremely productive. 264−268 A disadvantage of class I and II types of interatomic potentials is that they rely on predefined bonding patterns to compute the total energy, and this limits their transferability. In general, bonds between atoms are defined at the beginning of the simulation run and cannot change. Furthermore, bonding terms make use of harmonic potentials that are not suitable for modeling bond dissociation. Reactive potentials, which eschew harmonic potential dependencies and thus can describe the formation and breaking of chemical bonds, include the embedded atom method (EAM, Figure 5c), which is used widely in materials science. 235 EAM is a type of many-body potential primarily used for metals, where each atom is embedded in the environment of all others. The total energy is given by

E = Σ_i F_i(ρ̃_i) + (1/2) Σ_{i≠j} V_ij(r_ij) (14)

F_i is an embedding function and ρ̃_i an approximation to the local electron density based on the environment of atom i. F_i(ρ̃_i) can be seen as a contribution due to nonlocalized electrons in a metal. V_ij is a term describing the core−core repulsion between atoms. An EAM potential is determined by the functional forms used for F_i and V_ij, as well as how the density is expressed.
Its dependence on the local environment without the need for predefined bonds makes EAM well suited for modeling material properties of metals. An extension of EAM is modified EAM (MEAM), 236 which includes directional dependence in the description of the local density ρ̃_i, but this brings greater computational cost. EAMs also form the conceptual basis of the embedded atom neural network (EANN) machine learning potentials (MLPs). 269 Another common type of reactive potentials are bond-order potentials (BOPs). In general, BOPs model the total energy of a system as interactions between neighboring atoms:

$$E = \frac{1}{2}\sum_{i \neq j} f_{\mathrm{cut}}(r_{ij})\left[V_{\mathrm{rep}}(r_{ij}) + b_{ij(k)}\, V_{\mathrm{att}}(r_{ij})\right] \qquad (15)$$

V_rep and V_att are repulsive and attractive potentials depending on the interatomic distance r_ij. A cutoff function f_cut restricts all interactions to the local atomic environment. b_ij(k) is the bond-order term, from which the potential takes its name. This term measures the bond order between atoms i and j (i.e., "1" for a single bond, "2" for a double bond, and "0.6" for a partially dissociated bond). Bond orders can also depend on neighboring atoms k in some implementations. BOPs are typically used for covalently bound systems, such as bulk solids and liquids containing hydrogen, carbon, or silicon (e.g., carbon nanotubes and graphene). Depending on the exact form of the expressions in eq 15, different types of BOPs are obtained, such as Tersoff 240,241 and REBO 239,242 potentials. BOPs can also be extended to incorporate dynamically assigned charges, yielding potentials like COMB 243,270 or ReaxFF. 245,246 As with EAMs, BOPs have also been used as a starting point for constructing more elaborate MLPs 271−273 that will be discussed in more detail in section 3. While efficient and versatile, all interatomic potentials described above are inherently constrained by their functional forms. A different approach is pursued by MLPs, such as Behler−Parrinello neural networks, 274 q-SNAP, 275 and GAP potentials 276 (Figure 5c). In MLPs, suitable functional expressions for interactions and energy are determined in a fully data-driven manner and are ultimately limited only by the amount and quality of available reference data. One can then use substantially more data to generate a much more accurate MLP than would be possible when using, for instance, a ReaxFF potential trained on similar data sets. 277 For the sake of completeness, we note that all approaches described here are fully atomistic: each atom is modeled as an individual entity. It is also possible to combine groups of atoms into pseudoparticles, giving rise to so-called coarse-grained methods. On an even higher level of abstraction, whole environments can be modeled as a single continuum. As such approaches are not the subject of the present Review, we refer the interested reader, for example, to refs 278 and 279. Once an energy calculation is completed by one of the CompChem methods above, many other interesting molecular properties can be calculated. Most of these properties can be obtained as the response of the energy to a perturbation, for example, changes in nuclear coordinates R, external electric (ϵ) or magnetic (B) fields, or the nuclear magnetic moments {I_i}. Given an expression for the energy, which depends on the above quantities, so-called response properties can be computed via the corresponding partial derivatives of the energy. A general response property Π then takes the form

$$\Pi^{(n_R,\, n_\epsilon,\, n_B,\, n_I)} = \frac{\partial^{\,n_R+n_\epsilon+n_B+n_I} E}{\partial R^{n_R}\, \partial \epsilon^{n_\epsilon}\, \partial B^{n_B}\, \partial I^{n_I}} \qquad (16)$$

where the n's indicate the n-th order partial derivative with respect to the quantity in the subscript. 102
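To make eq 16 concrete, the sketch below approximates the two most common responses, forces and the Hessian, by central finite differences of a generic energy callable; the quadratic test energy is a stand-in for a real electronic-structure calculation, and production codes would use analytic derivatives where available:

```python
import numpy as np

def forces_fd(energy_fn, coords, h=1e-5):
    """Nuclear forces F = -dE/dR (eq 16 with n_R = 1) by central differences."""
    forces = np.zeros_like(coords)
    for idx in np.ndindex(coords.shape):
        cp, cm = coords.copy(), coords.copy()
        cp[idx] += h
        cm[idx] -= h
        forces[idx] = -(energy_fn(cp) - energy_fn(cm)) / (2.0 * h)
    return forces

def hessian_fd(energy_fn, coords, h=1e-4):
    """(3N, 3N) Hessian: second derivatives of E with respect to R."""
    flat = coords.ravel()
    n = flat.size
    hess = np.zeros((n, n))
    for a in range(n):
        xp, xm = flat.copy(), flat.copy()
        xp[a] += h
        xm[a] -= h
        gp = -forces_fd(energy_fn, xp.reshape(coords.shape)).ravel()
        gm = -forces_fd(energy_fn, xm.reshape(coords.shape)).ravel()
        hess[:, a] = (gp - gm) / (2.0 * h)
    return 0.5 * (hess + hess.T)  # symmetrize away numerical noise

# Harmonic stand-in energy E = sum(R^2): F = -2R and Hessian = 2I.
coords = np.array([[0.1, -0.2, 0.0]])
energy = lambda c: float(np.sum(c**2))
print(forces_fd(energy, coords))   # ~ [[-0.2, 0.4, 0.0]]
print(hessian_fd(energy, coords))  # ~ 2 * identity(3)
```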
A common response property is the nuclear forces F = −Π^(1,0,0,0), the negative first derivatives of the energy with respect to the nuclear positions. Such calculations enable a plethora of different geometry optimization schemes for chemical structures on the PES. Hessian calculations, corresponding to the second derivative of the energy with respect to nuclear positions, are necessary to confirm the location of first-order saddle points on the PES and to identify normal modes and their frequencies for vibrational partition functions, which are useful for modeling temperature dependencies based on statistical thermodynamics. Hessian calculations are computationally costly because they are normally evaluated with finite-difference methods involving many nuclear force calculations. Many methods have been developed to allow CompChem algorithms to sample minimum energy regions of the PES 280−284 or precisely locate points of interest. 285,286 Historically, many of these techniques have relied on approximate or full Hessian calculations, 287 but other approaches, such as the nudged-elastic band 288,289 and string 290−292 methods, are popular alternatives that do not require a Hessian calculation. There have also been efforts using different forms of ML to accelerate procedures or overcome long-standing challenges in efficient sampling of and optimization on the PES. 293−298 The general expression above can provide a wealth of other quantities, some of which are relevant for molecular spectroscopy or provide a direct connection to experiment (see Table 3). Infrared spectra can be simulated based on dipole moments μ = −Π^(0,1,0,0), while molecular polarizabilities α = −Π^(0,2,0,0) offer access to polarized and depolarized Raman spectra. Nuclear magnetic shielding tensors σ = Π^(0,0,1,1) are a central response property of a magnetic field. These allow the computation of chemical shifts recorded in nuclear magnetic resonance (NMR) spectroscopy via their trace, σ_iso = (1/3) Tr σ. The beauty of this formalism lies in the fact that a single energy calculation method provides access to a wide range of quantum chemical properties in a highly systematic manner. A large number of modern MLPs use the response of the potential energy with respect to nuclear positions to obtain energy-conserving forces. However, far fewer applications model perturbations with respect to electric and magnetic fields. Ref 299 extends the descriptor used in the Faber−Christensen−Huang−Lilienfeld (FCHL) kernel by adding an explicit field-dependent term that makes it possible to predict dipole moments across chemical compound space. Ref 300 introduces a general neural network (NN) framework to model interactions of a system with vector fields, which was then used to predict dipole moments, polarizabilities, and nuclear magnetic shielding tensors as response properties. An important aspect of CompChem is describing molecules within a solution environment. Simulating a dynamical environment composed of many surrounding molecules is usually not feasible with electronic-structure methods. To circumvent this problem, solvation modeling schemes have been devised (see refs 301−306 for discussions on this topic). The most popular approaches are so-called polarizable continuum solvent models (PCMs). 279 They model the electrostatic interaction of a solute molecule with its environment by representing the charge distribution of the solvent molecules as a continuous electric field, the reaction field.
This dielectric continuum can be interpreted as a thermally averaged representation of the environment and is typically assigned a constant permittivity depending on the particular solvent to be modeled (ε = 80.4 for water). The solute is placed inside a cavity embedded in this continuum. The charge distribution of the molecule then polarizes the continuous medium, which in turn acts back on the molecule. To compute the electrostatic interactions arising from this mutual polarization with electronic structure theory, a self-consistent scheme is employed. After constructing a suitable molecular cavity, a Poisson problem of the following form is solved:

$$-\nabla \cdot \left[\epsilon(\mathbf{r})\, \nabla V(\mathbf{r})\right] = 4\pi \rho_m(\mathbf{r}) \qquad (17)$$

Here, ρ_m(r) is the charge distribution of the solute and ϵ(r) is the position-dependent permittivity, which usually is set to one within the cavity and to the ε of the solvent on the outside. V(r) is the electrostatic potential, composed of the two terms

$$V(\mathbf{r}) = V_m(\mathbf{r}) + V_s(\mathbf{r}) \qquad (18)$$

where V_m(r) is the solute potential and V_s(r) is the apparent potential due to the surface charge distribution σ(s),

$$V_s(\mathbf{r}) = \int_{\Gamma} \frac{\sigma(\mathbf{s})}{|\mathbf{r}-\mathbf{s}|}\, \mathrm{d}\mathbf{s} \qquad (19)$$

Γ indicates the surface of the cavity. Eq 17 is solved numerically to obtain the surface charge distribution σ(s). Once σ(s) has been determined in this fashion, the potential is computed according to eq 19 and used to construct an effective Hamiltonian of the form

$$\hat{H}_{\mathrm{eff}} = \hat{H} + \hat{V}_s \qquad (20)$$

where Ĥ is the vacuum Hamiltonian. These equations are then solved self-consistently in a Roothaan−Hall or KS approach, yielding the electrostatic solvent−solute interaction energy. This scheme is also called the self-consistent reaction field (SCRF) approach. Continuum models differ in how the cavities are constructed and how eq 17 is solved to obtain the surface charge distribution. Variants include the original PCM model, also referred to as dielectric PCM (D-PCM), 307 the integral equation formulation of PCM (IEFPCM), 308 SMD, 309 conductor PCM (C-PCM), 310 and the conductor-like screening model (COSMO). 311 The latter two approaches replace the dielectric medium by a perfect conductor to allow for a particularly efficient computation of σ(s); a numerical sketch of this idea appears at the end of this discussion. PCMs can be further extended with statistical thermodynamics treatments to account for solute size and concentration effects, and this leads to models such as COSMO-RS. 312 A drawback of most PCM-like approaches is that they neglect local solvent structures. Thus, they cannot reliably account for situations where explicit solvent interactions are important, for example, when a transition state is stabilized through hydrogen bonding at specific sites. 301 Furthermore, while implicit models might be parametrized to fit bulk-like properties of mixed or ionic solvents (e.g., ref 313), the complex local solvent environments presented by these systems are better treated by other means. For mixed solvent systems, a range of hybrid schemes such as COSMO-RS, 305 reference interaction site models (RISMs), 314,315 or QM/MM 316−318 approaches have been developed. As an in-depth discussion of these alternative schemes exceeds the scope of this Review, we instead refer to other references. 319 By solving for electronic structures, by whatever means is appropriate, one obtains molecular energies and energy spectra (typically corresponding to quasiparticles given by KS or HF orbitals). From these, one can then compute molecular or material properties that arise from quantum mechanical and statistical operators, for example, thermodynamic energies, response properties, highest and lowest occupied molecular orbital energies, and band gaps, among other properties.
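As a numerical sketch of the conductor-like (C-PCM/COSMO) idea referenced above: discretize the cavity surface, solve a linear system for the screening charges, and scale by a dielectric factor. The spherical cavity, point-charge solute, COSMO-style diagonal element 1.07(4π/a_k)^{1/2}, and scaling f(ε) = (ε − 1)/(ε + 0.5) are common conventions adopted here as assumptions; this is an illustration, not a production implementation:

```python
import numpy as np

def fibonacci_sphere(n, radius):
    """Quasi-uniform grid of n points on a sphere of given radius."""
    k = np.arange(n)
    phi = np.pi * (3.0 - np.sqrt(5.0)) * k
    z = 1.0 - 2.0 * (k + 0.5) / n
    r = np.sqrt(1.0 - z * z)
    return radius * np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

def cosmo_energy(solute_xyz, solute_q, radius=2.0, eps=80.4, n_surf=500):
    """Conductor-like screening energy (atomic units) for point charges
    inside a single spherical cavity."""
    s = fibonacci_sphere(n_surf, radius)           # surface points on Gamma
    area = 4.0 * np.pi * radius**2 / n_surf        # per-point surface area
    # Solute potential v_k at each surface point (V_m of eq 18 on Gamma).
    d = np.linalg.norm(s[:, None, :] - solute_xyz[None, :, :], axis=2)
    v = (solute_q[None, :] / d).sum(axis=1)
    # Coulomb matrix between surface charges; assumed COSMO self-interaction
    # term on the diagonal.
    A = 1.0 / (np.linalg.norm(s[:, None, :] - s[None, :, :], axis=2)
               + np.eye(n_surf))                   # diagonal overwritten below
    np.fill_diagonal(A, 1.07 * np.sqrt(4.0 * np.pi / area))
    f_eps = (eps - 1.0) / (eps + 0.5)              # dielectric scaling factor
    q_surf = -f_eps * np.linalg.solve(A, v)        # screening charges sigma(s)
    return 0.5 * q_surf @ v                        # solute-surface interaction

# A unit charge at the cavity center: compare with the Born-like estimate
# -f(eps)/(2*radius) ~ -0.245 hartree for radius = 2 bohr.
print(cosmo_energy(np.zeros((1, 3)), np.array([1.0])))
```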
Many properties are determined by the character of the orbitals, and knowledge of these helps in deriving useful insight for designing molecules and materials for a particular function. Furthermore, one is often interested in how these molecules behave over time (i.e., the dynamics given some statistical ensemble that depends on temperature, pressure, etc.) over all possible degrees of freedom. By understanding how energies and forces change over time, one can predict thermal and pressure dependencies as well as spectroscopic properties, building toward insightful predictions. Molecular and materials chemistry is vastly complex and variable, and one often faces the question of whether to span wider chemical spaces or to explore a specific phenomenon more deeply. A key problem is that, even after either effort, it is often unclear how information about one system relates to another system. For instance, one may decide to calculate all possible properties of ethanol with a CompChem method, but understanding how any calculated property correlates with an analogous property of isopropanol is usually still difficult. There is great interest in understanding chemical and materials space through applications of quantitative structure activity/property relationships, 326,327 cheminformatics, 328 conceptual DFT, 329 and alchemical perturbation DFT. 330 All these applications benefit from greater access to CompChem data, and all show promise when interfaced with ML for transformative applications. ML has had a dramatic impact on many aspects of our daily lives and has arguably become one of the most far-reaching technologies of our era. It is hard to overstate its importance in solving long-standing computer science challenges, such as image classification 331−334 or natural language processing, 335−339 tasks that require knowledge that is hard to capture in a traditional computer program. 340−342 Previous classical artificial intelligence (AI) approaches relied on very large sets of rules and heuristics, but these were unable to cover the full scope of these complex problems. Over the past decade, advances in ML algorithms and computer technology have made it possible to learn underlying regularities and relevant patterns from massive data sets, enabling the automatic construction of powerful models that can sometimes even outperform humans at those tasks. This development inspired researchers to approach challenges in science with the same tools, driven by the hope that ML would revolutionize their respective fields in a similar way. Here, we give an overview of these developments in chemistry and physics to serve as an orientation for newcomers to ML. We will start by introducing the field of ML in general terms, explaining which tasks ML does well, when it might not be the best solution to a problem, and where its strengths and weaknesses lie. In the most general sense, ML algorithms estimate functional relationships without being given any explicit instructions on how to analyze or draw conclusions from the data. Learning algorithms can recover mappings between a set of inputs and corresponding outputs, or extract structure from the inputs alone. Without output labels, the algorithm is left on its own to discover structure in the data. Universal approximators 343,344 are commonly used for that purpose.
These reconstruct any function that fulfills a few basic properties, such as continuity and smoothness, as long as enough data is available. Smoothness is a crucial ingredient that makes a function learnable, because it implies that neighboring points are correlated in similar ways. That property means that one can draw successful conclusions about unknown points as long as they are close to the training data (coming from the same underlying probability distribution). 341 In contrast, completely random processes in the above sense allow no predictions. An association that immediately springs to mind is traditional regression analysis, but ML goes a step further.

Figure 6. Supervised learning algorithms have to balance two sources of error during training: the bias and variance of the model. A highly biased model is based on flawed assumptions about the problem at hand (underfitting). Conversely, a high variance causes a model to follow small variations in the data too closely, therefore making it susceptible to picking up random noise (overfitting). The optimal bias-variance trade-off minimizes the generalization error of the model, that is, how well it performs on unknown data. It can be estimated with cross-validation techniques.

Regression analyses aim to reconstruct the function that goes through a set of known data points with the lowest error, but ML techniques aim to identify functions that predict interpolations between data points and thus minimize the prediction error for new data points that might later appear. 345 Those contrasting objectives are mirrored in the different optimization targets. In traditional regression, the optimization task

$$\min_{f}\ \sum_{i=1}^{n} \ell(f(x_i),\, y_i) \qquad (21)$$

only measures the fit to the data, but learning algorithms typically aim to find models f̂ that satisfy

$$\hat{f} = \arg\min_{f}\ \sum_{i=1}^{n} \ell(f(x_i),\, y_i) + \|\Gamma\Theta\|^2 \qquad (22)$$

Both optimization targets reward a close fit, often using the squared loss ℓ(f(x), y) = (f(x) − y)^2. However, the key difference is an additional regularization term in eq 22, which influences the selection of candidate models by introducing additional properties that promote generalization. To understand why this is necessary, it is helpful to consider that eq 22 is only a proxy for the optimization problem

$$\min_{f}\ \mathbb{E}_{p(x,y)}\left[\ell(f(x),\, y)\right] \qquad (23)$$

that we would actually like to solve. In an ideal world, we would minimize the loss function over the complete distribution of inputs and labels p(x, y). However, this is obviously impossible in practice, so we apply the principle of Occam's razor, which presumes that simpler (parsimonious) hypotheses are more likely to be correct. With this additional consideration, we hope to be able to recover a reasonably general model, despite only having seen a finite training set. A common way to favor simpler models is via an additional term in the cost function, which is what ∥ΓΘ∥^2 in eq 22 expresses. Here, Γ is a matrix that defines "simplicity" with regard to the model parameters Θ. Usually, Γ^TΓ = λI (where I is the identity matrix and λ > 0) is chosen to simply favor a small L_2-norm on the parameters, such that the solution does not rely on individual input features too strongly. This particular approach is called Tikhonov regularization, 346−348 but other regularization techniques also exist. 349,350 A model that is heavily regularized (i.e., using a large λ) will eventually become biased in that it is too simplistic to fit the data well. In contrast, a lack of regularization might yield an overly complex model with high variance.
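A minimal sketch of eq 22 in action, using a polynomial model and Γ^TΓ = λI (Tikhonov/ridge regularization); the data, polynomial degree, and λ grid are arbitrary demonstration choices:

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(-1.0, 1.0, 20)
y_train = np.sin(np.pi * x_train) + 0.1 * rng.standard_normal(x_train.size)

def fit_ridge(x, y, degree, lam):
    """Closed-form minimizer of ||X theta - y||^2 + lam ||theta||^2 (eq 22)."""
    X = np.vander(x, degree + 1)
    return np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)

x_test = np.linspace(-1.0, 1.0, 200)
y_test = np.sin(np.pi * x_test)            # noise-free ground truth
for lam in (0.0, 1e-3, 10.0):              # none, moderate, heavy regularization
    theta = fit_ridge(x_train, y_train, degree=12, lam=lam)
    rmse = np.sqrt(np.mean((np.vander(x_test, 13) @ theta - y_test) ** 2))
    print(f"lambda={lam:g}  test RMSE={rmse:.3f}")
# Typically lam = 0 overfits (high variance) and lam = 10 underfits (high bias).
```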
Such an "overly fit" model will follow the data exactly, to the point that it also models the noise components, and consequently fails to generalize (see Figure 6). Finding the appropriate amount of regularization λ to manage under- and overfitting is known as attaining a good bias-variance trade-off. 351 We will introduce a process called cross-validation to address this challenge further below (see section 3.4.3).

3.1.1. What Does ML Do Well? Implicit Knowledge from Data. ML algorithms can infer functional relationships from data in a statistically rigorous way without detailed knowledge about the problem at hand. ML thus captures implicit knowledge from a data set, even aspects for which CPI might not be available. Traditional modeling approaches, such as the classical force fields discussed in section 2.2.6, rely on preconceived notions about the PES that is being modeled and, thus, the way the physical system behaves. In contrast, ML algorithms start from a loss function and a much more general model class. Within the limits permitted by the noise inherent to the data, generalization can be improved to arbitrary accuracy given increasingly larger informative training data sets. This process allows us to explore a problem even before there is a reasonably full understanding. An ML predictor can serve as a starting point for theory building and be regarded as a versatile tool in the modeling loop: building predictive models, improving them, enriching them with formal insight, improving them further, and ultimately extracting a formal understanding. More and more research efforts are starting to combine data-driven learning algorithms with rigorous scientific or engineering theory to yield novel insights and applications. 10,16,352 Redundancy in CompChem Calculations. To obtain a quantum chemical property for the compounds in a data set, CompChem calculations need to be repeated independently for each input, even if the inputs are very similar. No formally rigorous method exists to exploit redundancies in the calculations in such a scenario. The empiricism of learning algorithms, however, does provide a pathway to extract information based on compound structure similarity. A data-driven angle allows one to ask questions in new ways that give rise to new perspectives on established problems. For example, unsupervised algorithms like clustering or projection methods group objects according to latent structural patterns and provide insights that would remain hidden when only looking at individual compounds.

3.1.2. What Does ML Do Poorly? Lack of Generality and Precision. Some difficult problems in chemistry and physics can be solved accurately with CompChem, but doing so would require significant resources. For example, enumerating all pairwise interactions in a many-body system will inevitably scale quadratically, and there is no obvious path around this. One might ask if empirical approaches can address such fundamental problems more efficiently, but this is unfortunately not possible, since ML is more suited to finding solutions in general function spaces than to replacing deterministic algorithms where constraints guide the solution process. However, if we are not as interested in finding a full solution but rather some aspect of it, the stochastic nature of ML can be beneficial.
For instance, a traditional ML approach might not be the best tool for explicitly solving the Schrödinger equation, but it might be a far more useful tool for developing a force field that returns the energy of a system without the need for a cumbersome wavefunction and a self-consistent algorithm. As an example, Hermann et al. 105 used deep NNs to show how ML methods may be suitable for overcoming challenges faced by traditional CompChem approaches. Reliance on High-Quality Data. ML algorithms require a large amount of high-quality data, and it is hard to decide a priori when a data set is sufficient. Sometimes, a data set may be large, but it does not adequately sample all the relevant systems one intends to model. For example, an MD simulation might generate many thousands of molecular conformations used to train an ML force field, but perhaps that sampling only occurred in a local region of the PES. In this case, the ML force field would be effective at modeling the regions of the PES it was trained on but useless in other regions until more data and broader sampling occurred. This limitation is general to all empirical models, which are limited in their extrapolation abilities. Inability to Derive High-Level Concepts. Standard ML algorithms cannot conceptualize knowledge from a data set. Two main reasons are the nonlinearity and excessive parametric complexity of most models, which allow many equally viable solutions for the same problem. 353,354 It can be hard to gain insight into the modeled relationship because it is not based on a small set of simple rules. Techniques have emerged to make ML models interpretable (explainable AI, XAI 355). While helpful, drawing scientific insight clearly still requires human expertise. 352,355−361 Furthermore, the path from an ML model back to a physical set of equations is being explored, but it is far from being fully automated. 362−368 Prone to Artifacts. Despite following the rules of best practice, ML algorithms can give unexpected and undesired results. Instead of extracting meaningful relationships, they may occasionally exploit nuisance patterns within the underlying experimental design, like the model architecture, the loss function, or artifacts in the data set. This results in a "clever Hans" predictor, 360 which technically manages the learning problem but uses a trivial solution that is only applicable within the narrow scope of the particular experimental setup at hand. The predictor will appear to be performing well while actually harvesting the wrong information and, therefore, not allowing any generalization or transferable insights. For example, a recently proposed random forest predictor for the success of Buchwald−Hartwig coupling reactions 369 was later revealed to give almost the same performance when the original inputs were replaced by Gaussian noise. 370,371 This finding strongly suggested that the ML algorithm exploited some hidden underlying structure in the input data, irrespective of the chemical knowledge that was provided through the descriptor. Even though the model might appear quite useful, any conclusions that rely on the importance of the chemical features used in the model were thus rendered questionable at best. This example demonstrates that out-of-sample validation alone is often not sufficient to establish that a proposed model has indeed learned something meaningful.
Therefore, the hypothesis described by the model must be challenged in extensive testing in practically relevant scenarios, like actual physical simulations. In other words, the ML model needs to lead to a better understanding of the modeling itself and the underlying chemistry. ML models are classified by the type of learning problem they solve. Consider, for instance, a data scientist who develops an ML model that can predict acidity constants (pK_a values) for any molecule. A researcher with knowledge of physical organic chemistry might be aware of the empirical Taft equation 29 that provides a linear free energy relationship between molecules on the basis of empirical parameters that account for a molecule's fundamental field, inductive, resonance, and steric effects (e.g., values related to Hammett ρ and σ values). There are several ways the data scientist might develop an ML model for this or another application. Examples mentioned here include supervised, unsupervised, and reinforcement learning.

3.2.1. Supervised Learning. Supervised learning addresses learning problems where the ML model f̂_ML: 𝒳 → 𝒴 connects a set of known inputs {x_i} and outputs {y_i}, either to perform a regression or a classification task. While the former maps onto a continuous space (e.g., energy, polarizability), the latter outputs a categorical value (e.g., acid or base; metal or insulator) for each data point. Using the pK_a predictor example, a supervised learning algorithm could be trained to correlate recognizable chemical patterns or structures to experimentally known pK_a values. The goal would be to deduce the relationship between these inputs and outputs, such that the model is able to generalize beyond the known training set. A standard universal approximator has to accomplish this learning task without any preconceived notion about the problem at hand and will, therefore, likely require many examples before it can make accurate predictions. Recently, much research has investigated ways to incorporate high-level concepts into the learning algorithm in the form of prior knowledge. 207,372 In this vein, one could take into account chemically relevant parameters, such as Hammett constants, so that the parametrized ML model incorporates the modified Hammett or Taft equation. An example of a classification problem in materials science is the categorization of materials, where identifying characteristics of the electronic structure can be used to distinguish between insulators and metals. 373

3.2.2. Unsupervised Learning. Unsupervised learning describes problems in which only the inputs are known, with no corresponding labels. In this setting, the goal is to recover some of the underlying structure of the data to gain a higher-level understanding. Unsupervised learning problems are not as rigorously defined as supervised problems in the sense that there can be multiple correct answers, depending on the model and objective function that is applied. For example, one might be interested in separating conformers of a molecule from an MD trajectory, given exclusively the positions of the atoms. A clustering algorithm (like the k-means algorithm) could identify those conformers by grouping the data based on common patterns. 374,375 Alternatively, a projection technique could reveal a low-dimensional representation of the data set. 376 Often data is represented in high dimension, despite being intrinsically low-dimensional.
With the right projection technique, it is possible to retain the meaningful properties in a representation with fewer degrees of freedom. A conceptually simple embedding method is principal component analysis (PCA), in which the relationship to be preserved is the scalar product between the data points. 340 There are many other linear and nonlinear projection methods, such as multidimensional scaling, 377 kernel PCA (KPCA), 378,379 t-distributed stochastic neighbor embedding (t-SNE), 380 sketch-map, 381 and uniform manifold approximation and projection (UMAP). 382 Finally, anomaly detection is another extension of unsupervised learning, where "outliers" in the available data can be discovered. 383 However, without knowing the labels (in this example, the potential energy associated with each geometry), there is no way to conclusively verify that the result is correct. The literature is gradually seeing more instances of unsupervised learning, particularly to reveal important chemical properties and to efficiently explore chemical/materials spaces.

3.2.3. Reinforcement Learning. Reinforcement learning (RL) describes problems that combine aspects of supervised and unsupervised learning. RL problems often involve defining an agent within an environment that learns by receiving feedback in the form of punishments and rewards. The progress of the agent is characterized by a combination of explorative activity and exploitation of already gathered knowledge. 384 For chemistry applications, RL techniques are being increasingly used for finding molecules with desired properties in large chemical spaces. 10 Universal approximators have their origins in the 1960s, when the hope was to construct "learning machines" with capabilities similar to those of the human brain. An early mathematical model of a single simplified neuron emerged that was called a perceptron (eq 24): 385,386

$$f(\mathbf{x}) = \theta\!\left(\sum_{i=1}^{N} w_i x_i + b\right) \qquad (24)$$

Here, x denotes the N-dimensional input to the perceptron and θ(·) is a step nonlinearity. The perceptron has N + 1 parameters consisting of the w_i (so-called weights) and a single b (a so-called threshold) that are adapted to the data. This adaptation process is typically called "learning" (vide infra), and it amounts to minimizing a predefined loss function. In the 1960s, this simple NN had very limited use, as it was only able to model a linear separating hyperplane. Even simple nonlinear functions like the XOR were out of reach. 387 Thus, excitement waned but then reappeared two decades later with the emergence of novel models consisting of more neurons arranged in multilayer NN structures 388 (eq 25). Recent algorithmic and hardware advances now allow deep and increasingly complex architectures. 1

$$f(\mathbf{x}) = \sum_{j} w_j^{(2)}\, g\!\left(\sum_{i} w_{ji}^{(1)} x_i + b_j\right) \qquad (25)$$

In eq 25, g(·) denotes an activation function, a nonlinear transformation that allows complex mappings between input and output. As with the perceptron, the parameters of multilayer NNs can be learned efficiently using iterative algorithms that compute the gradient of the loss function using the so-called back-propagation (BP) algorithm. 388−390 In the late 1980s, artificial NNs were then proven to be universal approximators of smooth nonlinear functions, 343,391,392 and so they gained broad interest even outside the then still relatively small ML community. In 1995, a novel technique called the Support Vector Machine (SVM) 345,393 and kernel-based learning were proposed, 379,394−396 which came with some useful theoretical guarantees.
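Before turning to SVMs, eqs 24 and 25 can be written out directly in NumPy; the random weights below are stand-ins for parameters that back-propagation would normally fit:

```python
import numpy as np

rng = np.random.default_rng(1)

def perceptron(x, w, b):
    """Eq 24: threshold unit returning a class label in {0, 1}."""
    return np.heaviside(w @ x + b, 0.0)

def mlp(x, W1, b1, w2):
    """Eq 25: one hidden layer with nonlinearity g (here tanh)."""
    return w2 @ np.tanh(W1 @ x + b1)

x = rng.standard_normal(4)                           # N = 4 input features
w, b = rng.standard_normal(4), 0.1                   # perceptron parameters
W1, b1 = rng.standard_normal((8, 4)), rng.standard_normal(8)
w2 = rng.standard_normal(8)                          # hidden-to-output weights
print(perceptron(x, w, b), mlp(x, W1, b1, w2))
```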
SVMs implement a nonlinear predictor:

$$f(\mathbf{x}) = \sum_{j=1}^{n} \alpha_j\, K(\mathbf{x}_j, \mathbf{x}) + b \qquad (26)$$

where K is the so-called kernel. The kernel implicitly defines an inner product in some feature space and thus avoids an explicit mapping of the inputs. This "kernel trick" 397 makes it possible to introduce nonlinearity into any learning algorithm that can be expressed in terms of inner products of the input. 379 It has since been applied to many other algorithms beyond SVMs, 394 such as Gaussian processes (GPs), 348 PCA, 378,379 and independent component analysis (ICA). 398 The most effective kernels are tailored to the specific learning task at hand, but there are many generic choices, such as the polynomial kernel K(x_j, x) = (⟨x_j, x⟩ − b)^d, which describes inner products between degree-d polynomials. Another popular choice is the Gaussian kernel K(x_j, x) = exp(−∥x_j − x∥^2/(2σ^2)). It is one of the most versatile kernels because it only imposes smoothness assumptions on the solution, depending on the width parameter σ. 347,395 As seen in eq 26, an SVM can also be understood as a shallow NN with a fixed set of nonlinearities. In other words, the kernel explicitly defines a similarity metric to compare data points, whereas NNs have more freedom to shape this transformation during training because they nest parametrizable nonlinear transformations on multiple scales. This difference gives both techniques unique strengths and drawbacks. Despite that, there exists a duality between both approaches that allows NNs to be translated into kernel machines and analyzed more formally (see refs 399−401). In the context of CompChem, NNs and kernel-based methods are the most used ML approaches. Simpler learners, such as nearest neighbor models or decision trees, can still be surprisingly effective. Those have also been successfully used to solve a wide spectrum of problems, including drug design, chemical synthesis planning, and crystal structure classification. 402−407 In the following, we summarize the overall ML process, starting from a data set all the way to a trained and tested model. The ML workflow typically includes the following stages: (1) gathering and preparing the data, (2) choosing a representation, (3) training the model, which involves (3a) training model candidates, (3b) evaluating model accuracy, and (3c) tuning hyperparameters, and finally (4) testing the model out of sample. Note that the progression to a good ML model is not necessarily linear, and some steps (except the out-of-sample test) may require reiteration as we learn about the problem at hand.

3.4.1. Data Sets. On a fundamental level, ML models could simply be regarded as sophisticated parametrizations of data sets. While the architectural details of the model matter, the reference data set forms the backbone that ultimately determines the model's effectiveness. If the data set is not representative of the problem at hand, the model will be incomplete and behave unpredictably in situations that have been improperly captured. The same applies to any other shortcomings of the data set, such as biases or noise artifacts, which will also be reflected in the model. Some of these data set issues are likely to remain unnoticed when following the standard model selection protocol, since training and test data sets are usually sampled from the same distribution. If the sampling method is too narrow, errors seen during the cross-validation procedure may appear to be encouragingly small, but the ML model will fail catastrophically when applied to a real problem.
If the training and test sets come from different distributions, then techniques to compensate for this covariate shift can be used. 408,409 Robust models can generally only be constructed from comprehensive data sets, but it is possible to incorporate certain patterns into models to make them more data-efficient. Prior scientific knowledge or intuition about specific problems can be used to reduce the function space from which an ML algorithm has to select a solution. If some of the unphysical solutions are removed a priori, less data are necessary to identify a good model. This is why NNs and kernel methods, despite both being broad universal function classes, bring different scaling behaviors. The choice of the kernel function provides a direct way to include prior knowledge such as invariances, symmetries, or conservation laws, whereas NNs are typically used if the learning problem cannot be characterized so specifically. 207,372,410 In general, without prior knowledge, NNs often require larger data sets to produce the same accuracy as well-constrained kernel methods that embody problem knowledge. This consideration is particularly important if the data is expensive, for example, if it comes from high-quality experiments or expensive computations.

3.4.2. Descriptors. To apply ML, the data set needs to be encoded into a numerical representation (i.e., features/descriptors) that allows the learning algorithm to extract meaningful patterns and regularities. 411−419 This is particularly challenging for unstructured data like molecular graphs, which have well-defined invariant or equivariant characteristics that are hard to capture in a vectorial representation. For example, atoms of the same type are indistinguishable from each other, but it is hard to represent them without imposing some kind of order (which inevitably assigns an identity to each atom). Furthermore, physical systems can be translated and rotated in space without affecting many attributes. Only a representation that is adapted to those transformations can solve the learning problem efficiently. It turned out to be a major challenge to reconcile all invariances of molecular systems in a descriptor without sacrificing its uniqueness or computability. Some representations cannot avoid collisions, where multiple geometries map onto the same representation. Others are unique but prohibitively expensive to generate. Many solutions to this problem have been proposed, based on general strategies such as invariant integration, 207 parameter sharing, 352,421−423 density representations, 276 or fingerprinting techniques. 424−433 Alternatively, an NN model can infer the representation from data. 352,424,434,435 To date, none of the proposed approaches is without compromise, which is why the optimal choice of descriptor depends on the learning task at hand.

3.4.3. Training. The training process is the key step that ties together the data set and model architecture. Through the choice of the model architecture, we implicitly define a function space of possible solutions, which is then conditioned on the training data set by selecting suitable parameters. This optimization task is guided by a loss function that encodes our two somewhat opposing objectives: (1) achieving a good fit to the data, while (2) keeping the parametrization general enough such that the trained model becomes applicable to data that is not covered in the training set (see the two terms in eq 22).
Satisfying the latter objective involves a process called model selection, in which a suitable model is chosen from a set of variants that have been trained with exclusive focus on the first objective. Depending on the model architecture, more or less sophisticated optimization algorithms can be applied to train the set of model candidates. Kernel-based learning algorithms are typically linear in their parameters α⃗ (see eq 26). Coupled with a quadratic loss function, ℓ(f̂(x), y) = (f̂(x) − y)^2, they yield a convex optimization problem. Convex problems can be solved quickly and reliably due to having only a single solution that is guaranteed to be globally optimal. This solution can be found algebraically by taking the derivative of the loss function and setting it to zero. For example, kernel ridge regression (KRR) and GPs then yield a linear system of the form

$$(\mathbf{K} + \lambda \mathbf{I})\,\vec{\alpha} = \vec{y} \qquad (27)$$

which is typically solved in a numerically robust way by factorizing the kernel matrix K. There exists a broad spectrum of matrix factorization algorithms, such as the Cholesky decomposition, that exploit the symmetry and positive definiteness properties of kernel matrices. 436−440 Factorization approaches are, however, only feasible if enough memory is available to store the matrix factors, and this can be a limitation for large-scale problems. In that case, numerical optimization algorithms provide an alternative: they take a multistep approach to solve the optimization problem iteratively by following the gradient:

$$\Theta_{t+1} = \Theta_t - \gamma\, \nabla_{\Theta} L(\Theta_t) \qquad (28)$$

where γ is the step size (or learning rate) and L is the loss function. Iterative solvers follow the gradient of the loss function until it vanishes at a minimum, which is much less computationally demanding per step, because it only requires the evaluation of the model f. In particular, kernel models can be evaluated without storing K (see eq 28). NNs are constructed by nesting nonlinear functions in multiple layers, which yields nonconvex optimization problems. Closed-form solutions similar to eq 27 do not exist, which means that NNs can only be trained iteratively, that is, analogously to eq 28. Several variants of this standard gradient descent algorithm exist, including stochastic or mini-batch gradient descent, where only an n-sized portion of the training data (x, y)_{i:i+n} is considered in every step. Because of multiple local minima and saddle points on the loss surface, the global minimum is exponentially hard to obtain (since these algorithms usually converge to a local minimum). However, thanks to the strong modeling power of NNs, local solutions are usually good enough. 441 Hyperparameters. In addition to the parameters that are determined when fitting an ML model to the data set (i.e., the node weights/biases or regression coefficients), many models contain so-called hyperparameters that need to be fixed before training. Two types of hyperparameters can be distinguished: ones that influence the model, such as the type of kernel or the NN architecture, and ones that affect the optimization algorithm, for example, the choice of regularization scheme or the aforementioned learning rate. Both tune a given model to the prior beliefs about the data set and thus play a significant role in model effectiveness. Hyperparameters can be used to gauge the generalization behavior of a model. Hyperparameter spaces are often rather complex: certain parameters might need to be selected from unbounded value spaces, others could be restricted to integers or have interdependencies.
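The following sketch ties eq 27 to the hyperparameter discussion: kernel ridge regression with a Gaussian kernel, solved via Cholesky factorization, with a simple grid scan over σ and λ scored on a held-out validation split. The data set and grids are arbitrary demonstration choices:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-3, 3, 60))
y = np.sinc(x) + 0.05 * rng.standard_normal(x.size)
x_tr, y_tr, x_va, y_va = x[::2], y[::2], x[1::2], y[1::2]  # train/validation

def gaussian_kernel(a, b, sigma):
    """K(a_i, b_j) = exp(-|a_i - b_j|^2 / (2 sigma^2))."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * sigma**2))

def krr_fit(x, y, sigma, lam):
    """Solve (K + lam I) alpha = y (eq 27) via Cholesky factorization."""
    K = gaussian_kernel(x, x, sigma)
    return cho_solve(cho_factor(K + lam * np.eye(len(x))), y)

best = None
for sigma in (0.1, 0.5, 1.0):          # kernel width hyperparameter
    for lam in (1e-6, 1e-3, 1e-1):     # regularization hyperparameter
        alpha = krr_fit(x_tr, y_tr, sigma, lam)
        pred = gaussian_kernel(x_va, x_tr, sigma) @ alpha
        err = np.sqrt(np.mean((pred - y_va) ** 2))
        if best is None or err < best[0]:
            best = (err, sigma, lam)
print("validation RMSE, sigma, lambda:", best)
```

When K is too large to factorize, iterative gradient-based solvers in the spirit of eq 28 become the practical alternative.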
Because of this complexity, hyperparameters are usually optimized using primitive exhaustive search schemes like grid or random searches, in combination with educated guesses for suitable search ranges. Common gradient-based optimization methods typically cannot be applied to this task. Instead, the performance of a given set of hyperparameters is measured by evaluating the respective model on a held-back portion of the training data called the validation data set (see Figure 6). This process is also referred to as model selection. Model Selection. Cross-validation or out-of-sample testing is a technique to assess how a trained ML model will generalize to previously unseen data. 340,395 For a reasonably complex model, it is typically not challenging to generate the right responses for the data known from the training set. This is why the training error is not indicative of how the model will fulfill its ultimate purpose of predicting responses for new inputs. Alas, since the probability distribution of the data is typically unknown, it is not possible to determine this so-called generalization error exactly. Instead, this error is often estimated using an independent test subset that is held back and later passed through the trained model to compare its responses to the known test labels. If the model suffers from overfitting on the training data, this test will yield large errors. It is important to remember not to tweak any parameters in response to these test results, as this will skew the assessment of the model performance and will lead to overfitting on the test set. 442 Besides cross-validation, there are alternative ways to estimate the generalization error, for example, via maximization of the marginal likelihood in Bayesian inference. 443

CHEMICAL SYSTEMS

We now discuss ways that the CompChem methods described in section 2 and the ML methods in section 3 can be implemented as CompChem+ML approaches for insights into chemical systems. We often notice a lack of details about why an ML model is used and how it actually contributes worthwhile scientific insights. Thus, we will summarize the underlying attributes of conventional CompChem+ML efforts and then explain why these attributes are important for specific applications. To begin, consider molecules or materials in a data set; any entry will be related to another based on an abstract concept of "similarity". While similarity is an application-dependent concept, it should go hand in hand with CPI. For instance, physical properties of chemical systems can be attributed to the structure or composition of the chemical fragments within those systems. Thus, if the chemical structures and compositions of two entries in the database were similar, then their physical properties would also likely be similar. For CompChem+ML using a supervised algorithm, a CompChem prediction might be made on a hypothetical system, pinpointed by an ML model that was trained to identify chemical fragments that correlate with labeled physical properties. This would be a direct exploitation of chemical similarity. Alternatively, for CompChem+ML using an unsupervised algorithm, the ML model would identify an underlying distribution or key features based on the similarity between pairs of entries in the data set without labels. This would be a more nuanced leveraging of chemical similarity. In both cases, the accuracy, efficiency, and reliability of the ML models depend strongly on how similarity is defined and measured.
In this section, we will first describe state-of-the-art descriptors and kernels for atomic systems that can be used to quantify the similarity between chemical systems. We will then explain the essential attributes of good atomic descriptors. (Table 4 footnotes: "√" = satisfies condition; "○" = partially satisfies condition; "X" = does not satisfy condition. Computational efficiency is ranked with grades Ⓐ−Ⓓ in descending order; the efficiency class reflects the extent to which the descriptor requires expensive operations (e.g., a hierarchical processing or matching of inputs). "Periodic" indicates that the descriptor has been used within periodic boundary conditions. Invariances: "T" = translational, "R" = rotational, "P" = permutational. A descriptor is referred to as smooth if its first derivative with respect to nuclear positions is continuous. Some descriptors are only invariant to permutations represented in the training data.) Lastly, we will elucidate why and how specific combinations of these descriptors and ML algorithms are beginning to revolutionize the field of CompChem. In CompChem, molecules and materials are usually represented by the Cartesian coordinates and the chemical elements of all the atoms. Thus, the sizes of the vector representations containing the coordinates and the nuclear charges will be 3N and N, respectively, for a system of N atoms. Even though these atomic coordinates provide a complete description of the system, they are hardly ever used as the input of an ML model because this vector would introduce substantial redundancy. For instance, an ML model might treat two identical molecules that are rotated or translated as different molecules, and that in turn might cause the ML model to predict different physical properties for the two otherwise indistinguishable molecules. There are further difficulties when comparing molecules having different numbers of atoms. To work around these problems, atomic coordinates are usually converted into an appropriate representation ψ that is suitable for a particular task. Such conversions are useful because they allow the incorporation of physical invariances. Mathematically speaking, the representation fulfills

$$\psi(\hat{S}\mathbf{x}) = \psi(\mathbf{x}) \qquad (29)$$

where Ŝ indicates a symmetry operation, for example, a rigid rotation about an axis C_i, an exchange of two identical atoms, or a translation of the whole system in Cartesian space. It can also be advantageous to adopt a coarse-grained representation of the system. 449,450 For example, the dihedral angles of a peptide might be retained without the positions of the side chains, the positions of ions in a solution might be retained without the explicit coordinates of the solvent, or just the center of mass of a water molecule might be used in place of the full three-centered atomistic representation. The choice of these coarse-grained representations provides a way to incorporate prior knowledge of the data, or such representations can be learned in an unsupervised learning step. 451

4.1.1. Descriptors. Atomistic systems can be represented in a myriad of ways. Some descriptions are designed to emphasize particular aspects of a system, while others aim to disambiguate similar chemical or physical principles across a wide range of molecules or materials. The set of desirable properties in a representation thus depends on the task at hand. All adhere to the aforementioned physical symmetries and invariances needed for chemical systems.
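The invariance requirement of eq 29 can be demonstrated with the simplest conceivable descriptor, a sorted vector of pairwise distances, which is invariant to translations, rotations, and permutations (though, as discussed below, such 2-body descriptors are not complete):

```python
import numpy as np

def sorted_distance_descriptor(coords):
    """Toy invariant descriptor (eq 29): sorted pairwise distances."""
    diff = coords[:, None, :] - coords[None, :, :]
    dists = np.linalg.norm(diff, axis=2)
    iu = np.triu_indices(len(coords), k=1)
    return np.sort(dists[iu])

rng = np.random.default_rng(0)
mol = rng.standard_normal((5, 3))

# Random orthogonal rotation (QR decomposition), translation, and permutation.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
transformed = mol @ Q.T + np.array([1.0, -2.0, 0.5])
transformed = transformed[rng.permutation(5)]

print(np.allclose(sorted_distance_descriptor(mol),
                  sorted_distance_descriptor(transformed)))  # True
```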
Many have similar theoretical foundations that can be understood through the basis onto which the atomic density is projected, 452 and the connection between them has been summarized in a recent review. 453 Table 4 gives a coarse characterization of popular representations. 276,411,412,415,417,418,454,455 To create this overview, we had to adopt a reductionist perspective, which inevitably hides the complexities involved in developing robust atomistic representations. Whether a representation satisfies a particular property sometimes cannot be answered unequivocally. For example, is a descriptor unique if the ML model showed pathologically erroneous results? Should a symmetry be perfectly satisfied, even if it is a bad ML feature? We therefore stress that the table simply presents representations and their attributes. A representation that satisfies more attributes is not necessarily better if it also lacks another important attribute. We kindly refer the reader to the respective original publications for more information. The descriptors in Table 4 can be classified into two categories: global and atomic (i.e., not global). Traditional descriptors used in cheminformatics are global descriptors based on the covalent connectivity of atoms. These include simple valence counting and common neighbor analysis, 456 the presence or absence of predefined atomic fragments (e.g., the Morgan fingerprints 427), pairwise distances between atoms (e.g., Coulomb Matrix, 413 Sine Matrix, 414 Ewald Sum Matrix, 414 Bag of Bonds (BoB) 415), etc. Coulomb matrices have known problems because of their lack of smoothness, but these are partly addressed by employing the Wasserstein norm rather than Euclidean or Manhattan norms. 457 However, atomic descriptors 411,412,416−420,458 are generally more popular than the global ones in ML and CompChem. In atomic descriptors, a chemical system is described as a set of atomic environments {𝒳_1, ..., 𝒳_i, ..., 𝒳_N}, and each 𝒳_i consists of the atoms (chemical species and position) within a sphere of radius r_cut centered at a specific atom i. One needs to combine the set of atomic descriptors of all environments to construct a descriptor for the entire atomic structure. The most straightforward way to do this is to average the atomic descriptors,

$$\psi_A = \frac{1}{N_A} \sum_{i \in A} \psi(\mathcal{X}_i) \qquad (30)$$

where the sum runs over all N_A atoms i in structure A and 𝒳_i is the environment around atom i. When there are multiple chemical species, the descriptors for the local environments of different species can either be included in the single sum, or the averaging can be performed for the environments of each species separately and the species-specific averaged local descriptors can be concatenated. More sophisticated comparisons between sets of atomic environments are also possible, for example, by considering the root-mean-square displacement (RMSD), 454 the best match between the environments of the two structures (best-match), 459 or by combining local descriptors using a regularized entropy match (REMatch). 459 We will now describe the Smooth Overlap of Atomic Positions (SOAP) descriptors, 412 since many other descriptors based on the atomic density are similar and differ mainly by how the density is projected onto basis functions. 420,452
To construct SOAP descriptors, one first considers an atomic environment 𝒳 that contains only one atomic species, and a Gaussian function of width σ is then placed on each atom i in 𝒳 to make an atomic density function:

$$\rho(\mathbf{r}) = \sum_{i \in \mathcal{X}} \exp\!\left(-\frac{|\mathbf{r}-\mathbf{r}_i|^2}{2\sigma^2}\right) f_{\mathrm{cut}}(|\mathbf{r}_i|) \qquad (31)$$

Here, r denotes a point in Cartesian space, r_i is the position of atom i relative to the central atom of 𝒳, and the cutoff function f_cut smoothly decays to zero beyond the cutoff radius r_cut. This density representation ensures invariance with respect to translations and permutations of atoms of the same species, but not rotations. To obtain a rotationally invariant descriptor, one expands the density in a basis of spherical harmonics, Y_lm(r̂), and a set of orthogonal radial functions, g_n(|r|), as

$$\rho(\mathbf{r}) = \sum_{nlm} c_{nlm}\, g_n(|\mathbf{r}|)\, Y_{lm}(\hat{\mathbf{r}}) \qquad (32)$$

to construct the power spectrum of the density using the expansion coefficients (up to a normalization constant):

$$\psi_{nn'l} = \sum_{m} c^{*}_{nlm}\, c_{n'lm} \qquad (33)$$

One then obtains a vector of descriptors ψ = {ψ_nn′l} by considering all components l ≤ l_max and n, n′ ≤ n_max, which act as band limits that control the spatial resolution of the atomic density. The generalization to more than one chemical species is straightforward: 459 one constructs separate densities for each species α and then computes the power spectra ψ^{αα′}_{nn′l} for each pair of elements α and α′, where the two species indices correspond to the c* and c coefficients, respectively. The resulting vectors corresponding to each of the α and α′ pairs are then concatenated to obtain the descriptor vector of the complete environment. Atom-centered symmetry function (ACSF) descriptors, sometimes called Behler−Parrinello symmetry functions, 411 differ from SOAP in that they project the atomic densities onto selected two-body or three-body symmetry functions. FCHL 417 descriptors follow similar principles while also considering the correlations between the atomic densities coming from different chemical species. The many-body tensor representation (MBTR) 418 approach involves taking histograms of atom counts, inverse pairwise distances, and angles. Atomic cluster expansion (ACE) descriptors 420 first express atomic densities using spherical harmonics and then generate invariant products by contracting the spherical harmonics with the Clebsch−Gordan coefficients. Length-Scale Hyperparameters. Most atomic descriptors use length-scale hyperparameters specifically chosen for a given problem and system. 276,411,412,415,417,418,454,455 There are several ways to automate hyperparameter selection. Ref 374 introduced general heuristics for choosing the SOAP hyperparameters for a system with arbitrary chemical composition based on characteristic bond lengths. Ref 465 adopts the strategy of first generating a comprehensive set of ACSFs and then selecting a subset using sparsification methods such as farthest point sampling (FPS) 466 and CUR matrix decomposition. 467 Incompleteness of Atomic Descriptors. A structural descriptor is complete when there is no pair of distinct configurations that produces the same descriptor. 468 For atomic descriptors, this means that different atomic environments (after considering the invariances of rotation, translation, and permutation of identical atoms) should adopt distinct descriptors. Without completeness, an ML model using the descriptors as input will give identical predictions for physically different systems. Ensuring completeness while preserving the invariances is nontrivial, however.
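As a concrete example of a simple two-body atomic descriptor, here is a minimal radial symmetry function in the spirit of the ACSF G² functions mentioned above; the η and r_s grids and the cutoff are illustrative choices:

```python
import numpy as np

def f_cut(r, r_cut):
    """Smooth cosine cutoff: decays to zero at r_cut (zero beyond)."""
    return np.where(r < r_cut, 0.5 * (np.cos(np.pi * r / r_cut) + 1.0), 0.0)

def acsf_radial(coords, i, etas, r_shifts, r_cut=5.0):
    """Radial (two-body) symmetry functions for atom i:
    G2 = sum_j exp(-eta * (r_ij - r_s)^2) * f_cut(r_ij)."""
    r_ij = np.linalg.norm(coords - coords[i], axis=1)
    r_ij = r_ij[r_ij > 1e-10]              # exclude the central atom itself
    G = [(np.exp(-eta * (r_ij - rs) ** 2) * f_cut(r_ij, r_cut)).sum()
         for eta in etas for rs in r_shifts]
    return np.array(G)

# Descriptor of atom 0 in a toy 4-atom cluster (arbitrary geometry).
coords = np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0],
                   [0.0, 1.4, 0.0], [0.0, 0.0, 2.0]])
print(acsf_radial(coords, 0, etas=(0.5, 4.0), r_shifts=(1.0, 2.0)))
```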
One of the simplest descriptors is based on permutationally invariant pairwise atomic distances (2-body descriptors), and ref 412 demonstrated that these are generally not complete, since one can construct two distinct tetrahedra using the same set of distances. Many have assumed that ...

Representing a many-body chemical system in terms of atomic environments brings physical significance, since certain extensive physical properties (e.g., the total energy, total electrostatic charge, and polarizability of a system) can be approximated by the sum of the atomic contributions coming from each atomic environment, for example, E = Σ_i ε(𝒳_i). This approximation is valid because the atomic contribution associated with a central atom is largely determined by its neighbors, and long-range interactions can be approximated in a mean-field manner without explicitly considering distant atoms. Such "locality" is tacitly assumed in many ML models for CompChem, and it is a crucial necessity for most common atomistic potentials and MLPs (section 2.2.6). Most MLPs (e.g., BPNN, 274 GAP, 276 and DeepMD 462) approximate the total energy of a system as sums of local atomic energies. Figure 7 illustrates locality by showing a KPCA map of the atomic environments of carbon in the QM9 set (see section 3.3 for more detailed descriptions of the data set). By color-coding the KPCA plot with the local energies from a SOAP-based GAP model trained on QM9 energies, 470 one observes a systematic and smooth trend in energies across clusters. The total molecular energy can then be accurately predicted by the sum of local energies, which means the total energy can be approximated on the basis of all the local environments contained in the molecule. For example, an NN potential trained on liquid water simulations can predict the densities, lattice energies, and vibrational properties of diverse ice phases, because the local atomic environments found in liquid water span environments similar to those observed in the ice phases. 471 Another GAP potential of carbon trained on amorphous structures and other crystalline phases predicted novel carbon structures in random structure searches as well as approximate reaction barriers. 472,473 The locality approximation is typically rationalized based on the multiscale nature of interatomic interactions in chemical systems. It is generally expected that shorter interatomic distances correspond to stronger interactions, such that a cutoff may be imposed after a certain radial distance d given a certain energy accuracy threshold ϵ. The multiscale nature of physical interactions underlies the usual classification of chemical interactions, from strong covalent bonds and ionic interactions to weaker noncovalent hydrogen bonds and van der Waals interactions. However, our understanding of noncovalent interactions in large molecules and materials is still emerging, 36 and no general rules-of-thumb exist to define the cutoff distance d corresponding to a defined ϵ. Moreover, the sufficiency of the locality argument also depends on the phase of the system and on whether the system is extended or not. 474 Hence, for systems having long-range interactions (which includes most chemical systems), the locality assumption needs revision. There are currently three schools of approaches for handling long-range interactions. The first is to use global ML models, such as (s-)GDML, 207,372 which learn global interactions directly.
Global models tend to be more data-efficient because they focus on learning a full molecular or material PES, but this significantly limits transferability, since the ML model alone can only be used on the system it was trained upon. The second is to learn charges 475, 476 and multipoles 477 for each atom; the long-range electrostatic interactions based on these environment-dependent charges or multipoles can then be explicitly included using Coulomb's law. To ensure that the sum of the atomic charges preserves overall neutrality, charge equilibration schemes can be used. 478 The third is to capture long-range electrostatic effects by introducing a nonlocal long-distance equivariant (LODE) representation, 479, 480 which depends on the electrostatic field generated by the decorated atom density.

4.1.4. Advantages of Built-In Symmetries. Built-in symmetry in ML models substantially compresses the dimensionality of atomic representations and ensures that physically equivalent systems are predicted to have identical properties. One of the most rigorous ways of imposing symmetry onto a model f is via invariant integration over the relevant group S:

f̃(x) = (1/|S|) Σ_{π∈S} f(P_π x)    (34)

where P_π x is a permutation of the input. However, the cardinality of even basic symmetry groups is exceedingly high, which makes this operation prohibitively expensive. This combinatorial challenge can be solved by limiting the invariant integral to the physical point group and fluxional symmetries that actually occur in the training data set, as done in sGDML. 207 Alternative approaches, such as parameter sharing 352, 421−423 or density representations, 276 have also proven effective. For example, the DeepMD potential has two versions: the Smooth Edition (DeepPot-SE), which explicitly preserves all the natural symmetries of the molecular system, and another version that does not. 462 DeepPot-SE offers much improved stability and accuracy. 207, 462 For ML predictions of scalar properties, the rotationally invariant atomic descriptor framework described earlier is appropriate. One may, however, wish to predict vectorial or tensorial properties, including dipole moments, polarizability, and elasticity. A covariant version of the descriptors may then be advantageous, which can be expressed as

ψ(S x) = S ψ(x)

where S indicates a symmetry operation such as a rigid rotation about an axis.

End-to-End NN Representations. All descriptors introduced above rely on a suitable set of hyperparameters (e.g., length scales and radial and angular resolution). Determining an optimal set of hyperparameters can be a tedious process, especially when heuristics are unavailable or fail due to the structural and compositional complexity of the system. A poor choice of descriptors can limit the accuracy of the final ML model, for example, when certain interatomic distances cannot be resolved. End-to-end NN representations follow a different strategy and learn a representation directly from reference data. Using the atom types and positions of a system as inputs, end-to-end NNs construct a set of atom-wise features x_i. These features are then used to predict the property of interest, for example, the energy as a sum of atom-wise contributions. Unlike static descriptors, the representation is optimized as part of the overall training process. In this way, end-to-end NNs can adapt to structural features in the data and to the target properties in a fully automatic fashion, eliminating the need for extensive feature engineering by the practitioner.
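Before detailing specific end-to-end architectures, the invariant-integration idea of eq 34 can be made concrete with a short sketch. This toy example (not the sGDML implementation) symmetrizes an arbitrary function over all permutations of its atomic inputs; the factorial cost is exactly the combinatorial bottleneck noted above.

```python
# Toy sketch of invariant integration (eq 34): average a function over
# all permutations of its atomic inputs. Feasible only for tiny N,
# which illustrates why sGDML restricts the sum to the physically
# relevant symmetries instead.
from itertools import permutations
import numpy as np

def symmetrize(f, positions):
    """Return the permutation-averaged value of f over atom orderings."""
    n = len(positions)
    total, count = 0.0, 0
    for perm in permutations(range(n)):
        total += f(positions[list(perm)])
        count += 1
    return total / count  # invariant under any relabeling of atoms

# Example: a deliberately non-symmetric function of three "atoms"
f = lambda x: x[0, 0] + 2.0 * x[1, 0] + 3.0 * x[2, 0]
pos = np.random.rand(3, 3)

# The symmetrized value is identical for any input ordering:
print(symmetrize(f, pos))
print(symmetrize(f, pos[[2, 0, 1]]))
```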
The deep tensor NN framework (DTNN) 352 introduced a procedure to iteratively refine a set of atom-wise features {x_i} based on interactions with neighboring atoms. Higher-order interactions can then be captured in a hierarchical fashion. For example, a first information pass would only capture radial information, but further interactions would recover angular relations, and so on. In DTNN, a learnable representation depending only on the atom types, x_i^0 = e_Z_i (an embedding vector for nuclear charge Z_i), serves as the initial set of features. These are then refined by successive applications of an update function depending on the atomic environment that takes the general form

x_i^(l+1) = F( x_i^l + Σ_j G(x_j^l, g(|r_i − r_j|)) f_cut(|r_i − r_j|) )    (36)

Here, l indicates the number of overall update steps. The sum runs over all atoms j in the local environment, and the cutoff function f_cut ensures smoothness of the representation. Each feature is updated with information from all neighboring atoms through the interaction function G. Apart from the neighbor features x_j, G also depends on the interatomic distance |r_i − r_j|, which is usually expressed in the form of a radial basis vector g. After the update, an atom-wise transformation F can be applied to further modulate the features. Since each update depends only on interatomic distances and the summation over neighboring atoms is commutative, end-to-end NNs of this type automatically achieve a representation that is invariant to rotation, translation, and permutation of atoms. Using these atom-type-dependent embeddings compactly encodes elemental information, which is advantageous for systems comprising many different chemical elements. Such multicomponent systems can be problematic to treat with predefined descriptors (e.g., ACSFs or SOAP), as these typically introduce additional entries for each possible combination of atom types, resulting in a large number of descriptor dimensions. Since the introduction of DTNN, many different types of end-to-end NNs have been developed, and these vary in their choice of the functions F and G. For example, SchNet 434 uses continuous convolutions inspired by convolutional neural networks (CNNs) to describe the interatomic interactions. In this case, the update in eq 36 takes the form

x_i^(l+1) = x_i^l + NN_tr( Σ_j x_j^l ∘ NN_rad(g(|r_i − r_j|)) )    (37)

where the feature transformation (NN_tr) and the radial dependence (NN_rad) are both modeled as trainable NNs and ∘ denotes element-wise multiplication. Other ML models introduce additional physical information. The hierarchical interacting particle NN (HIP-NN) 484 enforces a physically motivated partitioning of the overall energy between the different refinement steps, while the PhysNet architecture 485 introduces explicit terms for long-range electrostatic and dispersion interactions. In ref 421, Gilmer et al. categorized graph networks of this general type as message-passing NNs (MPNNs) and introduced the concept of edge updates. Edge updates make it possible to use interatomic information beyond the radial distance metric in the refinement procedure, and they have since been adapted for other architectures. 486 Another interesting extension is end-to-end NNs that incorporate higher-order features besides the scalar x_i used in the original DTNN framework. These are equivariant features that encode rotational symmetry and can be based on angles, dipole moment vectors, or features that can be expressed as spherical harmonics with l > 0. Atoms then no longer exchange only radial information in each interaction pass but also higher-order structural information, such as dipole−dipole interactions or angular terms.
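To make the scalar update of eq 36 concrete (the equivariant variants just mentioned require additional machinery), here is a minimal NumPy sketch of a single interaction pass. The functional forms of G, F, the radial basis, and the cutoff are arbitrary toy choices, and the weights are random rather than trained.

```python
# Toy sketch of one message-passing update (eq 36) with scalar features.
# G and F are arbitrary illustrative choices; real models (DTNN, SchNet)
# learn these functions from data.
import numpy as np

rng = np.random.default_rng(0)
n_atoms, n_features, n_rbf = 5, 16, 20
r_cut = 5.0

positions = rng.uniform(0.0, 4.0, size=(n_atoms, 3))
x = rng.normal(size=(n_atoms, n_features))    # atom-wise features x_i^l
W_rad = rng.normal(size=(n_rbf, n_features))  # toy stand-in for NN_rad

def g(d):
    """Radial basis expansion of an interatomic distance."""
    centers = np.linspace(0.0, r_cut, n_rbf)
    return np.exp(-((d - centers) ** 2) / 0.5)

def f_cut(d):
    """Smooth cosine cutoff ensuring the update vanishes at r_cut."""
    return 0.5 * (np.cos(np.pi * d / r_cut) + 1.0) * (d < r_cut)

x_new = x.copy()
for i in range(n_atoms):
    for j in range(n_atoms):
        if i == j:
            continue
        d = np.linalg.norm(positions[i] - positions[j])
        # Interaction G(x_j, g(d)): neighbor features modulated by a
        # distance-dependent filter, weighted by the smooth cutoff.
        filt = g(d) @ W_rad
        x_new[i] += filt * x[j] * f_cut(d)
# An atom-wise transformation F (here simply tanh) modulates the result.
x_new = np.tanh(x_new)
```

Stacking several such passes is what lets later updates pick up angular and higher-order correlations from purely radial inputs.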
In addition, equivariant end-to-end NNs can also be used to predict vectorial or tensorial properties in a manner similar to the rotational-symmetry-adapted SOAP kernel. Examples include TensorField networks, 464 Cormorant, 463 DimeNet, 487 PiNet, 488 and FieldSchNet. 300 After a descriptor vector for each chemical structure is defined, one can construct the design matrix and the kernel matrix for a set of structures. These matrices can then be used as the input of ML models. As described in section 2, supervised ML methods, such as NNs and GPs, can be used to approximate nonlinear and high-dimensional functions, particularly when massive amounts of training data are available. Thus, one should expect CompChem to be very useful for generating large amounts of almost noise-free training data for specific systems or atomic configurations, as long as a physically accurate method is applied in the right way with appropriate computational resources. In contrast, experimental observations can be difficult to measure and reproduce precisely. Note that most CompChem+ML efforts have a scope similar to that of decades-old quantitative structure activity/property relationship (QSAR/QSPR) models, which are often based on experiments or CompChem modeling. 326, 327, 489 Thus, researchers in CompChem+ML should be aware of potentially relatable work done by the QSAR/QSPR communities and of the extent to which the questions being posed have already been sufficiently answered. On the other hand, ML usually provides higher accuracy than non-ML statistical models, and so QSAR/QSPR efforts have been turning toward ML models as well. 490 We have explained how data from different CompChem methods, each containing different degrees of physical rigor, can be used to train ML models. ML models in turn can be created to approximate underlying high-dimensional functions intrinsic to physical systems. For example, research efforts are under way toward learning electron densities, 491 density functionals, 162 and molecular polarizabilities. 492 Besides these direct learning strategies, ML has been used to enhance the performance and suitability of CompChem models. As mentioned in section 1, the Δ-ML 493 approach is now a common technique in which an ML model learns a correction that improves the quality of a theoretically insufficient but computationally affordable method. This approach has been used to learn many-body corrections for water molecules, allowing a relatively inexpensive KS-DFT approach like BLYP to more accurately reproduce CCSD(T) data. 494 Along similar lines, Shaw and co-workers used CompChem features along with an NN to reweight terms of an MP2 interaction energy, providing ML-enhanced methods with increased performance. 126 Miller and co-workers have developed ML models in which molecular orbitals themselves are learned to generate a density matrix functional that provides CCSD(T)-quality PESs from a single reference calculation. 495 von Lilienfeld and co-workers have investigated how the choice of regressors and molecular representations for ML models impacts accuracy, and their findings suggest ways that ML models may be trained to be more accurate and less computationally expensive than hybrid DFT methods. 496 Burke and co-workers have studied how ML methods can lead to improved understanding and more physical exact KS-DFT 181, 497−499 and OFDFT 161 functionals.
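Several of these enhancement strategies follow the Δ-ML pattern: learn the difference between an expensive reference method and a cheap baseline, then add the learned correction to new baseline calculations. A minimal sketch, assuming precomputed descriptors and energies at both levels of theory (the file names are hypothetical, and the kernel and hyperparameters are illustrative only):

```python
# Minimal Δ-ML sketch: learn E_high − E_low and use it to correct cheap
# baseline predictions. Descriptors and energies are assumed to be
# precomputed; file names are hypothetical placeholders.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split

X = np.load("descriptors.npy")          # (n_samples, n_features)
e_low = np.load("energies_low.npy")     # cheap method (e.g., a GGA)
e_high = np.load("energies_high.npy")   # reference (e.g., CCSD(T))

delta = e_high - e_low                  # the target is the correction

X_tr, X_te, d_tr, d_te, low_tr, low_te = train_test_split(
    X, delta, e_low, test_size=0.2, random_state=0)

model = KernelRidge(kernel="laplacian", alpha=1e-6, gamma=1e-3)
model.fit(X_tr, d_tr)

e_pred = low_te + model.predict(X_te)   # corrected energies
e_ref = low_te + d_te                   # true high-level energies
print("MAE of corrected energies:", np.mean(np.abs(e_pred - e_ref)))
```

The correction surface is usually much smoother than the total energy, which is why Δ-ML models often need far less reference data than direct learning.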
Brockherde et al. have presented an approach in which ML models directly learn the Hohenberg−Kohn map from the one-body potential, efficiently finding the functional and its derivative. 162, 184 Akashi and co-workers have also reported the out-of-training transferability of NNs that capture total energies, which shows a path toward generalizable methods. 500 Toward predictive insights, there are many other broadly useful approaches. One can exploit the "universal approximator" nature of ML architectures to find a function that gives the best solution in a variational setting; for instance, restricted Boltzmann machines 501 or deep NNs can serve as basis representations of wavefunctions 105, 106, 502 in quantum Monte Carlo calculations. Alternatively, the use of active learning can increase the efficiency, accuracy, scalability, and transferability of ML models. 503−505 We have laid out the general framework for CompChem+ML studies, but this discussion would not be complete without more details about training data (i.e., garbage in, garbage out). We now review the landscape of data sets in CompChem and how they will likely evolve over time. The past decade has seen continually increasing usefulness and availability of "big data" from CompChem, including community-wide data repositories comprising millions of atomistic structures along with diverse physical and chemical properties. 506−509 Such repositories are becoming the norm, and it is increasingly customary for users to deposit raw or processed simulation data there for the benefit of the research community. This brings the possibility of robust validation tests for ML models, but it also necessitates approaches that are well-equipped to handle large and complex data sets. Typical data sets may come from diverse origins, such as MD trajectories from ab initio simulations, data sets of small molecules and molecular conformers, or other training sets used for developing ML and non-ML FFs for specific applications. As the data sets grow, so does the scope of publications that involve ML, as shown in Figure 1.

4.3.1. Benchmark Data Sets. ML models must be validated before they can be trusted for predictions. Validations of descriptors or model trainings are performed on benchmark data sets, and several popular ones are summarized in Table 5. These allow ML models to be compared on the same ground and provide large amounts of data for robust training. Their public availability also ensures that the data sets can evolve with time and be extended as part of community efforts. 529 Among the entries in Table 5, the most often used is the QM9 set, which consists of approximately 134 000 of the smallest organic molecules containing up to 9 heavy atoms (C, O, N, or F; excluding H), along with CompChem-computed molecular properties such as total energies, dipole moments, and HOMO−LUMO gaps. Several ML studies have already been published using this data set (see Figure 8, ref 496). A popular challenge associated with QM9 is to develop a next-generation ML model that learns the electronic energies of random assortments of organic molecules with higher accuracy and less training data than existing models. Doing so tests next-generation molecular representations and training algorithms. Figure 8 illustrates how the choice of architecture and descriptors can influence the predictive performance and data efficiency of ML models, using different properties of the QM9 data set as examples.
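Benchmarks of this kind usually reduce to learning curves: held-out test error as a function of training-set size. A generic sketch of that workflow (any regressor or descriptor could be substituted; the file names are hypothetical):

```python
# Generic learning-curve sketch for a QM9-style benchmark: test error
# versus training-set size. Descriptors and labels are assumed to be
# precomputed; file names are hypothetical placeholders.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

X = np.load("qm9_descriptors.npy")   # (n_molecules, n_features)
y = np.load("qm9_energies.npy")      # e.g., atomization energies

rng = np.random.default_rng(0)
idx = rng.permutation(len(y))
test, pool = idx[:2000], idx[2000:]  # fixed test set, growing train set

for n_train in (100, 1000, 10000):
    train = pool[:n_train]
    model = KernelRidge(kernel="laplacian", alpha=1e-9, gamma=1e-4)
    model.fit(X[train], y[train])
    mae = np.mean(np.abs(model.predict(X[test]) - y[test]))
    print(f"n_train={n_train:6d}  test MAE={mae:.4f}")
```

On a log−log plot, such curves are roughly linear, and their slope and offset are exactly what Figure 8 compares across architectures and descriptors.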
The next significant advance will potentially come from a combination of supervised and unsupervised learning models.

4.3.2. Visualization of Data Sets. As structural data sets grow, it becomes infeasible to manually identify hidden patterns or curate the data. Data-driven and automated frameworks for visualizing these data sets are becoming increasingly popular. 530−533 Dimensionality reduction effectively translates high-dimensional data (i.e., the xyz coordinates of molecules or materials in different atomic configurations) into a low-dimensional space easily visualized on paper or a computer screen. In this way, entries such as those in the QM9 set can be shown (see Figure 9). The KPCA maps in Figure 9 are based on dimensionality reduction of the global SOAP descriptors, which are constructed by combining all the atomic SOAP descriptors using eq 30. Each dot represents a small molecule in the QM9 set, and the maps thus illustrate the similarity between molecules, rather than the relations between the carbon atomic environments shown in Figure 7. The maps in Figure 9 are color-coded using different molecular properties, such as atomization energies, composition, and optical properties, and these properties are strongly correlated with the principal axes. These KPCA maps are therefore an intuitive and condensed way to help navigate the QM9 set. Similarly, ref 321 used SOAP sketch-maps in conjunction with quasi-chemical theory to visualize similarities in local solvation structures, demonstrating an unsupervised learning procedure that identifies structures that significantly impact the solvation energies of small ions. Generally speaking, these data-driven maps are generated by processing the design matrix (or kernel matrix) associated with a data set using the dimensionality reduction techniques introduced in section 3.2. A simple option is the ASAP code, 374 a Python-based command-line tool that automates analysis and mapping. Figures 7 and 9 were generated with ASAP using only the two commands displayed in the figures (a minimal version of this mapping pipeline is sketched below). Data sets can also be explored in an intuitive manner using interactive visualizers 534 that run in a web browser and display the 3D structure corresponding to each atomistic entry in the data set.

4.3.3. Mining for Chemistry. Conventional publications are an essential part of any CompChem knowledge base, and ML is becoming useful for accelerating information extraction from the scientific literature via text mining. 535−537 This topic was previously comprehensively reviewed in the context of cheminformatics. 538, 539 Natural language processing has already driven text-mining efforts for materials science discovery 538 and for experimental synthesis conditions of oxides. 528, 540 CompChem+ML can also amplify existing efforts in chemometrics, 541 the science of data-driven extraction of chemical information. 542 This area has also branched into the related disciplines of data mining for specific classes of materials 543 and catalysis informatics. 544 These approaches have great promise, especially for deriving information and knowledge from data, but it remains challenging to implement them in ways that achieve insight (and true impact). Some have shown paths forward for doing so. For example, ML models can obtain knowledge from failed experimental data more reliably than humans, who are more susceptible to survivor bias, 545 and ML can also be used to distill physical laws and fundamental equations from experimental 363 and computational data. 546
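Circling back to the mapping workflow noted above: tools like ASAP automate essentially the following pipeline, sketched here with scikit-learn's KernelPCA on a precomputed global descriptor matrix (file names are hypothetical, and the kernel hyperparameter is illustrative):

```python
# Sketch of a KPCA map: project global descriptor vectors to 2D and
# color-code by a molecular property, as in Figure 9. File names are
# hypothetical stand-ins for a precomputed data set.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import KernelPCA

X = np.load("global_soap.npy")       # (n_molecules, n_features)
prop = np.load("atomization.npy")    # property used for color-coding

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=1e-3)
Z = kpca.fit_transform(X)

plt.scatter(Z[:, 0], Z[:, 1], c=prop, s=4, cmap="viridis")
plt.xlabel("KPCA 1")
plt.ylabel("KPCA 2")
plt.colorbar(label="atomization energy")
plt.show()
```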
ML models can also be used to reliably predict SMILES representations (a string-based representation of molecular graphs), allowing encoded information to be derived from low-resolution images found in the literature. 547 ML models can interpret experimental X-ray absorption near-edge structure (XANES) data and predict real-space information about coordination environments. 548 Likewise, scanning tunneling microscopy (STM) data can be used to classify structural and rotational states on surfaces, 549 and name indicators can be used to predict tandem mass spectrometry (MS/MS) properties. 550 In closing, we see exciting opportunities for future applications that connect data and text mining to chemometrics across chemical space. We previously mentioned that ML can handle large data sets and extract insights while circumventing the high cost of quantum-mechanical calculations through statistical learning. CompChem+ML also has great potential for developing MLPs. Car and Parrinello proposed running MD using electronic-structure methods in 1985. 551 Such simulations are now mainstream but remain computationally demanding and are normally restricted to small system sizes (∼100 atoms) and short simulation times (∼10^−12 s). Alternatively, the accurate atomistic potentials introduced in section 2.2.6 have been developed to allow Monte Carlo and MD simulations, but sufficiently accurate potentials are sometimes not available. MLPs have emerged as a way to achieve accuracy as high as that of KS-DFT or correlated wavefunction methods at a fraction of the cost. MLPs have been constructed for far-reaching systems, from small organic molecules to bulk condensed materials and interfaces. 433, 552, 553 Several coauthors of the current Review have written a separate review focused more narrowly on this topic, 554 so we provide only a brief overview here. Training an MLP to reproduce a system's PES usually requires generating diverse and high-quality CompChem data points that cover the relevant temperature and pressure conditions, reaction pathways, polymorphs, defects, compositions, etc. 555−562 After data points comprising atomic configurations, system energies, and forces are obtained, different methods for constructing MLPs employ either different descriptors (see the examples in Table 4) or different ML architectures to interpolate the full PES. Again, smoothness is an essential feature of any PES, so special considerations are needed to avoid numerical noise that would result in discontinuities. 563, 564 Kernel-method-based MLPs, such as GAP 276, 565 and sGDML, 207, 372, 566 ensure smoothness by relying on smoothly varying basis functions, but the scaling of kernel-based methods with respect to the number of training points is a challenge without reduction mechanisms. 396, 567 As a much more efficient but somewhat less accurate alternative to GAP, SNAP 568 uses the coefficients of the SOAP descriptors and assumes a linear or quadratic relation between energies and the SOAP bispectrum components. 569 The most popular MLPs are currently NN-based, owing to their flexibility and capacity to train on large amounts of data. Among these, the ANI 511, 513 and BPNN 274, 433, 570 potentials use ACSF descriptors as inputs, while deep NNs such as SchNet 422, 434, 571 and DeepMD 572 use the coordinates and nuclear charges of atoms. We now focus on a few example applications.

4.4.1. Predicting Thermodynamic Properties.
Many CompChem efforts focus on predicting thermodynamic properties at finite temperatures, such as heat capacity, density, and chemical potential. Although many physical properties are already accessible from MD simulations, estimating the free energies that establish the relative stability of different states using electronic-structure methods remains difficult. The configurational part of the Gibbs free energy of a bulk system that has N distinguishable particles with atomic coordinates r = {r_1...N} and associated potential energy U(r) can be expressed as

G = −k_B T ln ∫ exp(−U(r)/k_B T) dr

integrated over all possible coordinates r, where k_B is the Boltzmann constant. To rigorously determine G, one must exhaustively sample the regions of configuration space that have relatively high weight arising from the Boltzmann factor. This normally requires thermodynamic integration or enhanced sampling methods (e.g., umbrella sampling, 573 metadynamics, 574 or transition path sampling 575 ) that require simulation times and system sizes far beyond what is accessible with MD simulations based on KS-DFT or correlated wavefunction methods. However, MLPs have lifted both the time-scale and system-size limits. An early example 576 used an MLP with umbrella sampling 573 and the free energy perturbation method 577 to reveal the influence of van der Waals corrections on the thermodynamic properties of liquid water. Later, the combination of an MLP trained on hybrid DFT data and free energy methods reproduced several thermodynamic properties of water from quantum mechanics, including the density of ice and water, the difference in melting temperature between normal and heavy water, and the stability of different forms of ice. 578, 579 Ref 580 employed the DeepMD approach to study the relatively long time-scale nucleation of gallium. MLPs for high-pressure hydrogen provided evidence on how hydrogen gradually turns into a metal in giant planets. 581 In all these examples, high accuracy and long time scales were required to model the specific phenomena and reveal physical insights, and it is precisely the combination of CompChem+ML that enables both.

4.4.2. Nuclear Quantum Effects. As mentioned in section 2.2.5, NQEs in chemical systems containing light elements bring challenges for atomistic modeling, because the added mobility of lighter atoms in dynamics simulations requires higher computational cost to treat. To make matters even more complicated, many atomistic potentials (see section 2.2.6), particularly those for water or organic molecules, cannot be used to model NQEs, because they often describe covalent bonds as rigid and thus cannot describe the fluctuations of bond lengths and angles. As a remedy, several studies have trained an MLP using higher rungs of KS-DFT (e.g., hybrid DFT or meta-GGA) and then used this potential in PIMD simulations. 578, 582−584 The study of water mentioned in the previous section, which used MLPs trained on hybrid DFT, revealed that NQEs are critical for promoting the hexagonal packing of molecules inside ice that ultimately leads to the 6-fold symmetry of snowflakes. 578 Highly data-efficient ML potentials can even be trained on reference data at the computationally very expensive quantum-chemical CCSD(T) level of accuracy. For example, the sGDML 206, 207, 585 approach has been shown to faithfully reproduce such FFs for small molecules, which were then used to perform simulations with effectively fully quantized electrons and nuclei.
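As a worked numeric illustration of the configurational free energy expression above, the integral can be evaluated directly for a toy one-dimensional double-well potential, something that is impossible in high dimensions and that motivates the sampling methods just discussed. The potential and units here are arbitrary:

```python
# Worked example: configurational free energy of a 1D double well,
# G = -kT ln ∫ exp(-U(x)/kT) dx, evaluated by numerical quadrature.
# Direct quadrature only works in low dimensions, which is why
# sampling methods are needed for real systems.
import numpy as np

kT = 0.25                           # arbitrary energy units
U = lambda x: (x**2 - 1.0)**2       # double-well potential

x = np.linspace(-3.0, 3.0, 20001)
boltz = np.exp(-U(x) / kT)
Z = np.trapz(boltz, x)              # configurational partition function
G = -kT * np.log(Z)
print(f"Z = {Z:.4f}, G = {G:.4f}")

# The probability weight concentrates near the two minima at x = ±1,
# exactly the "high weight" regions that sampling must capture:
p = boltz / Z
mask = np.abs(x) < 0.5
print("P(|x| < 0.5) =", np.trapz(p[mask], x[mask]))
```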
Locating stationary points on the PES is a frequent task in CompChem, since these are needed for explaining reaction kinetics. Explorations for stationary points normally require many energy and force evaluations. ML approaches are being implemented to dramatically accelerate searches for minimum energy pathways and transition states, and ML can also address the challenge of efficiently sampling equilibrium or transition states by accelerating enhanced sampling methods such as umbrella sampling 573 and metadynamics. 574 These procedures make use of collective variables (CVs) that define a reaction coordinate, and computing the associated free energy surface (FES) amounts to generating the marginal probability distribution in these CVs. Unfortunately, the choice of CVs is not always clear for specific systems, and ML has shown some promise in guiding their determination. 589−591 Another direction is to exploit the fact that ML models can be considered universal approximators of FESs. 592 For example, there are reports of adaptive enhanced sampling methods using a Gaussian mixture model, 593 using an NN architecture to represent the FES, 594 or representing the bias function in variational sampling simulations 595 (a compact numerical sketch of the FES-from-samples idea appears below). ML methods also offer fundamentally new ways to explore chemical compound and configuration space. Generative models can learn the structural and elemental distribution underlying chemical systems, and once trained, these models can be used to sample directly from this distribution. It is furthermore possible to bias the generated structures toward exhibiting desired properties, for example, drug activity or thermal conductivity. As a consequence, generative models offer exciting new avenues in drug and materials design. 596, 597 Generative methods in CompChem include recurrent neural networks (RNNs), which can be used for the sequential generation of molecules encoded as SMILES strings. 598−600 Segler et al. demonstrated how such a recurrent model can first learn general molecular motifs and then be fine-tuned to sample molecules exhibiting activity against a variety of medical targets. 599 Autoencoders (AEs) are another frequently used ML method for molecular generation. AEs learn to transform molecular graphs or SMILES into a low-dimensional feature space and back. The resulting feature vector represents a smooth encoding of the molecular distribution and can be used to effectively sample chemical space. 601−607 An interesting extension of AEs are conditional AEs, which capture not only the distribution of molecular structures but also its dependence on various properties. 426, 608 This makes it possible to directly generate structures exhibiting certain property ranges or combinations without the need for biasing or additional optimization steps. AEs can also form the basis of another approach for exploring chemical space: generative adversarial networks (GANs). 609, 610 In a GAN, a generator model (often an AE) attempts to create samples that closely match the underlying data, while a discriminator tries to distinguish true from generated samples. These architectures can be enhanced by using RL objectives. RL learns an optimal sequence of actions (e.g., placement of atoms) leading to a desired outcome (e.g., a molecule with a certain property). This makes it possible to drive generative processes toward certain objectives, allowing for the targeted generation of molecules with particular properties. 611−614
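Looping back to the FES discussion above: because the FES is the negative-log marginal distribution of the CVs, a density model fitted to sampled CV values yields a free energy estimate directly, F(s) = −k_B T ln p(s). In the sketch below, a Gaussian mixture stands in for the density model (loosely in the spirit of the adaptive GMM-based method cited above), with synthetic samples in place of MD output:

```python
# Sketch: estimate a free energy surface F(s) = -kT ln p(s) from
# samples of a collective variable s, using a Gaussian mixture as the
# density model. The samples are synthetic stand-ins for MD output.
import numpy as np
from sklearn.mixture import GaussianMixture

kT = 2.5  # roughly kJ/mol at 300 K

# Synthetic CV samples drawn from two metastable basins:
rng = np.random.default_rng(1)
s = np.concatenate([rng.normal(-1.0, 0.25, 8000),
                    rng.normal(1.2, 0.35, 2000)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(s)

grid = np.linspace(-2.5, 2.5, 200).reshape(-1, 1)
log_p = gmm.score_samples(grid)   # log p(s) on the grid
F = -kT * log_p
F -= F.min()                      # set the global minimum to zero

for si, Fi in zip(grid[::40, 0], F[::40]):
    print(f"s = {si:+.2f}   F = {Fi:6.2f}")
```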
RL in general is a promising alternative strategy for generative models, 615, 616 and it offers the possibility of tight integration into drug design cycles. 617 Alternative approaches combine autoregressive models with graph convolution networks. 618, 619 While these methods use SMILES or graphs to encode molecular structures, generative models have recently been extended to operate on the 3D coordinates of molecules and materials. 620, 621 Gebauer et al. proposed an autoregressive generative model based on the SchNet architecture, called g-SchNet. 622 Once trained on the QM9 data set, g-SchNet was able to generate equilibrium structures without the need for optimization procedures. It was further found that the model could be biased toward certain properties. In another promising approach, Noé et al. used an invertible NN based on normalizing flows to learn the distribution of atomic positions (e.g., sampled from an MD trajectory). This network can then be used to directly sample molecular configurations from this distribution without performing costly simulations. 298 Multiscale modeling is a term for including simulations or information from different scales (see Figure 3). ML has been introduced into QM/MM-like schemes that enable improved multiscale simulations, 300, 325, 623 and on the side of coarse-graining. 624 Different coarse-graining potentials have been developed, 625 but the inherent functional form of these potentials relies on CPI as well as trial-and-error procedures. Several works have used ML to construct coarse-grained potentials by matching mean forces. 449, 450, 626, 627 In closing, we see promise for incorporating experimental priors into ML models, for instance, improving an ML PES trained on CompChem data by complementing it with experimental measurements. We are not aware of such efforts for developing highly accurate MLPs beyond the atomic scale, although much work has been done along this line to refine FFs of RNAs and proteins, often incorporating methods from ML, including the maximum entropy approach. 628 The central challenge posed at the beginning of this Review was how to identify and make chemical compounds or materials having optimal properties for a given purpose. Doing so would help address critical and broad issues from pollution to global warming to human diseases. Traditional developments are often slow, expensive, and restricted by nontransferable empirical optimizations, and so efforts have turned to CompChem+ML to alleviate this. 515, 525, 629, 630 CompChem+ML is enabling searches through larger areas of chemical space much faster than before. 20, 631−634 This section does not extensively review the large amount of work using CompChem+ML in these different areas but rather highlights examples of applications that have resulted in notable insights, so that others might use these works as templates for future efforts. Molecular and materials design is usually considered an optimization problem. 270, 426, 602, 607, 635 Thus, a comprehensive understanding of chemical space is needed to identify compounds with desired properties that are subject to certain required constraints (e.g., a specific thermal stability or a suitable optical gap for absorbing sunlight). Those properties will also depend on many key variables (e.g., constitutive elements, crystal forms, and geometrical and electronic characteristics, among others), which makes property prediction complex. 531
CompChem calculations, as explained in section 2, should provide a continuous description of properties across a continuous representation (i.e., a descriptor or fingerprint) of molecules that is used to map molecular configurations to target properties, and vice versa. ML methods can then be implemented to search large databases and extract structure−property relationships for designing compounds with specific characteristics. 531, 635−637 Optimizations would then be performed on the structure-based function learned from training configurations, and the composition of the chemical compound would be recovered from the continuous representation. As a prototypical example of molecular design via high-throughput screening, Gomez-Bombarelli et al. 632 showed a computation-driven search for novel thermally activated delayed fluorescence organic light-emitting diode (OLED) emitters. That work first filtered a search space of 1.6 million molecules down to approximately 400 000 candidates using ML to anticipate criteria for desirable OLEDs. To evaluate candidates, they estimated an upper bound on the delayed fluorescence rate constant (k_TADF). TD-DFT calculations were then used to provide refined predictions of specific properties of thousands of promising novel OLED molecules across the visible spectrum, so that synthetic chemists, device scientists, and industry partners could choose the most promising molecules for experimental validation and implementation. Notably, this example of CompChem+ML resulted in new devices that exhibited an external quantum efficiency of over 22%. Figure 10 shows the high accuracy of ML in predicting useful properties for high-throughput screening of molecules and materials based on k_TADF calculations. This work exemplifies how ML can accelerate the design of novel compounds in a way that would not be possible using traditional CompChem methods alone. Integrating features relevant to the learning task allows one to improve the accuracy of ML predictions for a given target property. Park and Wolverton 638 improved the performance of the crystal graph convolutional neural network (CGCNN) 639 by adding to the original framework information about the Voronoi-tessellated crystal structure, explicit 3-body correlations of neighboring constituent atoms, and an optimized representation of interatomic bonds. The new approach, labeled iCGCNN, achieved a predictive accuracy 20% higher than that of the original CGCNN when determining the thermodynamic stability of compounds (i.e., predicting hull distances). When used for high-throughput searches, iCGCNN exhibited a success rate higher than that of an undirected high-throughput search and higher than that of CGCNN. Figure 11 shows the improvement in predictions of nearly stable compounds after using more appropriate descriptors. This study showcases how descriptors can be tailored to further enhance the success of ML-aided high-throughput screening. (A schematic sketch of this generic screening funnel appears below.) A grand challenge in chemistry is to understand synthetic pathways to desired molecules. 640, 641 Retrosynthesis involves the design of chemical steps to produce molecules and materials that are crucial to drug discovery, medicinal chemistry, and materials science. As a different kind of optimization problem, the general tactic is to analyze atomic-scale compounds recursively, map them onto synthetically achievable building blocks, and then assemble those blocks into the desired compound. 642−644
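Before examining retrosynthesis further, note that the high-throughput studies above share a generic funnel pattern, in which a cheap ML surrogate ranks a large library and only the top candidates advance to expensive calculations (TD-DFT in the OLED study). A schematic sketch, where the arrays and both functions are hypothetical stand-ins:

```python
# Schematic high-throughput screening funnel: an ML surrogate ranks a
# large candidate library, and only top candidates go on to expensive
# CompChem evaluation. Arrays and both functions are hypothetical.
import numpy as np

def surrogate_score(descriptors):
    """Cheap ML stand-in that predicts the target property."""
    return descriptors @ np.ones(descriptors.shape[1])

def expensive_calculation(index):
    """Placeholder for, e.g., a TD-DFT evaluation of one candidate."""
    return np.random.rand()

library = np.random.rand(100_000, 32)   # descriptor matrix (toy size;
                                        # the OLED study began with 1.6M)
scores = surrogate_score(library)

top = np.argsort(scores)[-400:]         # shortlist for refinement
refined = {int(i): expensive_calculation(i) for i in top}
best = max(refined, key=refined.get)
print("best candidate index:", best)
```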
Three main issues make retrosynthesis a formidable intellectual challenge. 645 First, simple combinatorics makes the space of possible reactions greater than the space of possible molecules. Second, reactants seldom contain only one reactive functional group, and predictions must therefore account for multiple functional groups. Third, because organic synthesis is a multistep process, one failed step in a route can invalidate the entire synthesis. Given these challenges, ML is becoming more established in determining reaction rules from CompChem data. 641 Computer-aided synthesis planning was actually first attempted in the 1960s. 646 Many have since attempted to formalize chemical perception and synthetic thinking using computer programs. 647−649 These programs are typically based on one of three possible algorithm types: 649 1. Algorithms that use reaction rules (manually encoded or automatically derived from databases). 2. Algorithms that use principles of physical chemistry based on ab initio calculations to predict energy barriers. 3. Algorithms based on ML techniques. ML approaches are used to try to overcome the generalization issues of rule-based algorithms (which normally suffer from incompleteness, infeasible suggestions, and human bias) while also avoiding the high cost of CompChem calculations. It is now possible to obtain purely data-driven approaches for synthesis planning, and these are promoting rapid advancement in the field. For example, Coley and co-workers 650 designed a data-driven metric, the SCScore, for describing a real synthesis, modeled after the idea that products are, on average, more synthetically complex than each of their reactants. Defining a metric for selecting the most promising disconnections (those that produce easily synthesizable compounds) is crucial for avoiding combinatorial explosions. Figure 12 shows that a data-driven metric, the SCScore, is more suitable than other heuristic metrics for perceiving the complexity of each step in a given synthesis. This work offered a valuable contribution to the retrosynthesis working pipeline by providing a method that implicitly learns which structures and motifs are more prevalent as reactants. Apart from isolated approaches or algorithms that deal with specific tasks within retrosynthesis, software is already available to advance this field. One example is the Chematica program, 651 which has implemented a new module that combines network theory, modern high-power computing, AI, and expert chemical knowledge to design synthetic pathways. A scoring function is used to promote synthetic brevity and penalize any reactivity conflicts or nonselectivities, allowing the program to find solutions that might be hard for a human to identify. Figure 13A shows the decision tree for one of the almost 50 000 reaction rules used in Chematica. Reaction rules can be considered the allowed moves from which synthetic pathways are built, and such moves lead to an enormous synthetic space (the number of possibilities within n steps scales as 100^n), such as the one shown by the graph in Figure 13B. Chematica explores this large synthetic space by truncating and reverting from unpromising connections, driving its searches toward the most efficient sequences of steps. Moreover, in the pathways presented to the user, each substance can be further analyzed with molecular mechanics tools.
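The search strategy underlying such programs can be caricatured in a few lines: recursively apply reaction rules to a target, score candidate precursor sets with a complexity metric, and expand the most promising branches first. Everything below (the rules, the molecules, and the score function, which plays an SCScore-like role) is a toy stand-in, not the actual logic of Chematica or SCScore:

```python
# Toy best-first retrosynthesis search. Molecules are strings, "rules"
# map a product to candidate precursor sets, and score() is a toy
# stand-in for a learned complexity metric such as SCScore.
import heapq

RULES = {  # hypothetical one-step disconnections
    "target": [("intermediate_A", "reagent_1"), ("intermediate_B",)],
    "intermediate_A": [("building_block_1", "building_block_2")],
    "intermediate_B": [("building_block_3",)],
}
PURCHASABLE = {"reagent_1", "building_block_1",
               "building_block_2", "building_block_3"}

def score(mol):
    """Toy complexity metric: longer names pretend to be harder to make."""
    return len(mol)

def retrosynthesize(target):
    # Priority queue of (total_complexity, frontier, route) entries.
    queue = [(score(target), (target,), [])]
    while queue:
        cost, frontier, route = heapq.heappop(queue)
        todo = [m for m in frontier if m not in PURCHASABLE]
        if not todo:
            return route  # every leaf is purchasable: a complete route
        mol, rest = todo[0], todo[1:]
        for precursors in RULES.get(mol, []):
            new_frontier = tuple(rest) + precursors
            new_cost = sum(score(m) for m in new_frontier)
            heapq.heappush(queue, (new_cost, new_frontier,
                                   route + [(mol, precursors)]))
    return None  # search space exhausted without a route

print(retrosynthesize("target"))
```

Real planners replace the toy rules with tens of thousands of expert- or data-derived transformations and the toy score with learned models, which is precisely where the 100^n explosion must be tamed.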
Chematica was used to obtain insights into the synthetic pathways to eight targets (seven bioactive substances and one natural product). All of the computer-planned routes were not only successfully carried out in the laboratory, but they also resulted in improved yields and cost savings over previously known paths. This work opened an avenue for chemists to finally obtain reliable pathways from in silico retrosynthesis. For further reading, we recommend the two-part review by Coley and co-workers. 652, 653 Catalysis research involves understanding how to impact chemical product yields and selectivities. 654 Traditional catalysis is normally discussed in textbooks in terms of homogeneous (i.e., within a solution phase), heterogeneous (occurring at a solid/liquid interface), and biological (occurring within enzymes and riboenzymes) processes, but it is best not to use these terms too strictly, because actual reaction environments 665, 666 or other external resonances 667 can blur these distinctions. Catalysis makes up roughly 35% of the world's gross domestic product, 668 and it is important to guide research toward the end goal of achieving greater sustainability with catalytic processes. 669−671 These reasons help make catalysis a fertile training ground for applying and developing theoretical models (e.g., refs 672−674) that can be used along with CompChem or CompChem+ML. The research field is also burgeoning, with many reports and review articles 544, 675−679 that discuss perspectives and progress using ML methods for catalysis science. Here, we mention notable examples. CompChem+ML methods are enabling more data generation by allowing costly CompChem calculations to be run more efficiently, and more information means more comprehensive predictions of chemical and materials phase diagrams for catalysis 680, 681 as well as stability and reactivity descriptors. Figure 14 shows examples of the palettes of insight available using state-of-the-art CompChem+ML modeling for identifying activity and selectivity maps, as well as visualizations of data using t-SNE. 687 Regarding the modeling of deeply complex chemical environments, Artrith and Kolpak developed MLPs for investigating the relationships between solvent, surface composition and morphology, surface electronic structure, and catalytic activity in interfacial systems composed of thousands of atoms. 689 We expect that such simulations for elucidating electro- and photocatalysis will continue to improve in size, scale, and accuracy. For other physical insights, new approaches by Kulik, Getman, and co-workers have focused on developing ML models appropriate for elucidating complex d-orbital participation in homogeneous catalysis. 690 Rappe and co-workers have used regularized random forests to analyze how local chemical pressure affects adsorbate states on surface sites for the hydrogen evolution reaction. 691 Almost trivially simple ML approaches can be used in catalysis studies to deduce insights into interaction trends between single metal atoms and oxide supports, 692 to identify the significance of features (e.g., adsorbate type or coverage) where CompChem theories break down, 693 or to identify trends that result in optimal catalysis across multiple objectives, such as activity and cost (Figure 15). 694 ML is also opening opportunities for CompChem+ML studies on highly detailed and complex networks of reactions. 695−700
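The "trivially simple" approaches just mentioned often amount to fitting an off-the-shelf regressor to tabulated CompChem data and inspecting which features matter. A sketch with scikit-learn, where the feature names and data file are hypothetical placeholders:

```python
# Sketch: rank catalysis-relevant features by importance with a random
# forest trained on tabulated CompChem data. The feature names and the
# data file are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

features = ["d_band_center", "coordination", "coverage", "adsorbate_EA"]
data = np.load("adsorption_data.npz")     # hypothetical archive
X, y = data["X"], data["E_ads"]           # descriptors, adsorption energies

forest = RandomForestRegressor(n_estimators=500, random_state=0)
forest.fit(X, y)

for name, imp in sorted(zip(features, forest.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name:15s} importance = {imp:.3f}")
```

Feature importances are only a first diagnostic, but they often reveal which physical descriptors dominate a trend and where a simple theory breaks down.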
Such reaction-network models can in principle significantly extend the range of utility of microkinetics modeling for predicting the products of catalysis. 701, 702 ML also enables studies of complicated reaction networks that allow predictions of regioselective products based on CompChem data, 703 of asymmetric catalysis important for natural product synthesis, 704, 705 and of biochemical reactions. 706 Efforts to better understand "above-the-arrow" optimizations of reaction conditions relate back to the retrosynthetic challenges discussed above. 707, 708 Ideally, these efforts will continue while making use of rapid advances in CompChem+ML that enable predictive atomistic simulations to be run faster and more accurately. We see reason for excitement about different approaches, but we again stress the importance of ensuring that models provide unique and physical results (see section 3, where we discuss the risk of "clever Hans" predictors 360 ). The central objective of drug discovery is to find structurally novel molecules with precise selectivity for a medicinal function. This involves identifying new chemical entities and obtaining structures with different physicochemical and polypharmacological properties (i.e., combinations of beneficial pharmacological effects or adverse side effects). 709, 710 Drug discovery involves the identification of targets (a property optimization task, as in materials design) and the determination of compounds with good on-target effects and minimal off-target effects. 711 Traditionally, a drug discovery program may take around six years before a drug candidate can be used in clinical trials, and six or seven more years are required for the three clinical phases. Thus, it is important to identify adverse effects as soon as possible to minimize time and monetary costs. 712 Accelerating drug discovery relies on predicting how and where a certain drug binds to more than one protein, a phenomenon that sometimes results in polypharmacology. Researchers are developing ready-to-use tools to facilitate drug discovery research, 713 but CompChem+ML is expected to continue providing even more benefits to the drug development pipeline. 714 In a recent study, Zhavoronkov et al. 617 developed a deep generative model for de novo small-molecule design: the generative tensorial reinforcement learning (GENTRL) model, which was used to discover potent inhibitors of discoidin domain receptor 1 (DDR1), a kinase target implicated in fibrosis and other diseases. The drug discovery process was carried out in only 46 days, beginning with the collection of appropriate training data and finishing with the synthesis and experimental testing of selected compounds (Figure 16A). GENTRL was used to screen a total of 30 000 structures (some examples compared to the parent DDR1 kinase inhibitor are shown in Figure 16B) down to only 40 structures, selected randomly while ensuring coverage of the resulting chemical space and of the distribution of root-mean-squared deviation values. Six of these molecules were then selected for experimental validation (see Figure 16C), with one of them demonstrating favorable pharmacokinetics in mice. The predicted conformation of the successful compound according to pharmacophore modeling was very similar to the one predicted to be preferred and stable by CompChem methods.
This work illustrates the utility of CompChem+ML approaches for giving insights into drug design by rapidly producing compound candidates that are synthetically feasible and active against a desired target. Besides generating new chemical structures with favorable pharmacokinetics, ML methods are also used in pharmaceutical research and development for peptide design, compound activity prediction, and assisting in scoring protein−ligand interactions (docking). 709, 715−717 An example of the latter was proposed by Batra et al. 718 for efficiently identifying ligands that can potentially limit the host−virus interactions of SARS-CoV-2. Those authors designed a high-throughput strategy based on CompChem+ML that involved high-fidelity docking studies to find candidates displaying high binding affinities. The ML model was used to search through thousands of ligands approved by the Food and Drug Administration (FDA) and a million biomolecules in the BindingDB database. 514 From these, insights were obtained for more than 19 000 molecules satisfying the Vina score (i.e., an important physicochemical measure of the therapeutic potential of a molecule that is used to rank molecular conformations and predict the free energy of binding). Figure 17 shows the Vina score predictions that led to the selection of the best candidates, some of which are also illustrated in the figure. The Vina scores for the top ligands were further confirmed using expensive docking approaches, resulting in the identification of 75 FDA-approved and 100 other ligands potentially useful for treating SARS-CoV-2. This study highlights a reasonable CompChem+ML strategy for making useful suggestions that help expert biologists and medical professionals focus on fewer candidates when performing either robust CompChem efforts or synthesis and trial experiments. Recent CompChem methods, algorithms, and codes have empowered new studies for a wealth of physical and chemical insights into molecules and materials. Today, the combination of CompChem+ML can be equipped to address new and more challenging questions in different domains of physics, materials science, chemistry, biology, and medicine. Productive research efforts in this direction necessitate interdisciplinary teams and increasing availability of high-quality data across appropriate regions of chemical compound space. Discovering new chemicals and materials requires thorough investigations. One needs to predict reaction pathways and interactions between molecules, optimize environmental conditions for catalytic reactions, enhance selectivities that eliminate undesired side reactions or side effects, and navigate other system-specific degrees of freedom. Addressing this complexity calls for a statistical view of chemical design and discovery, and CompChem+ML provides a natural synergy for obtaining predictive insights that lead to wisdom and impact. This Review provided a bird's-eye view of CompChem and ML and how they can be used together to make transformative impacts in the chemical sciences. The successes of CompChem+ML are particularly visible in physical chemistry and include the drastic acceleration of molecular and materials modeling and the discovery and prediction of chemicals with desired properties.

Figure 17. Vina score predictions for the isolated protein (S-protein) and the protein-receptor complex (interface) for all molecules in the BindingDB data sets, together with exemplary top cases that satisfy the screening criteria. ML models trained on accurate CompChem databases are of utmost importance for efficiently gaining insights into possible treatments, even for newly discovered diseases.
Looking forward, we highlight several outstanding needs and opportunities. 1. Reliance on ML in CompChem algorithms must be increased: ML algorithms can be integrated into CompChem algorithms at almost any simulation level (Figure 3). ML algorithms are already available to accelerate calculations of CompChem energies, navigation along reaction pathways, and sampling of larger regions of the PES, but reluctance to use them impedes progress. In general, these algorithms must be made more effective, efficient, accessible, user-friendly, and reproducible to benefit fundamental and applied research (see, for example, ref 719). 2. More general ML approaches are needed: ML methods must continue to evolve beyond the now-common applications of learning a narrow region of a PES or identifying straightforward structure/property relationships. New ML methods should have the capacity to predict energetic and electronic properties and their more convoluted relationships across chemical space. Such approaches should grow toward describing compositional (chemical arrangement of atoms in a molecule) and configurational (physical arrangement of atoms in space) degrees of freedom on an equal footing. Further progress in this field requires developing new universal ML models suitable for insights across diverse systems and physicochemical properties. 3. ML representations must include the right physics: ML methods that are claimed to be accurate but incorrectly describe the true physics of a system will eventually fail to achieve meaningful insights while lowering the reputation of other work in the field. Current ML representations (descriptors) can successfully describe local chemical bonding, but few if any treat the long-range electrostatics, polarization, and van der Waals dispersion interactions that are critical for rationalizing physical systems, both large and small. Combining intermolecular interaction theory (a key focus of advanced CompChem methods) with ML is an important direction for future progress toward studying complex molecular systems. 4. CompChem+ML applications need to strive toward achieving realistic complexity: Investigations using highly accurate CompChem methods normally require overly simplified model systems, while more realistic model systems necessitate less accurate but computationally efficient CompChem methods. This compromise should no longer be necessary. We are due for a paradigm shift in how the thermodynamics, kinetics, and dynamics of systems in complex chemical environments (e.g., multiscale biological processes like drug design and/or catalytic processes at solid−liquid interfaces under photochemical excitations) can be treated more faithfully with less corner-cutting. An emerging idea is to dispatch ML approaches into computationally efficient model Hamiltonians for electronic interactions based on correlated wavefunction, KS-DFT, tight-binding, or molecular orbital techniques and/or the many-body dispersion method. ML can predict the Hamiltonian parameters, and the quantum-mechanical observables would then be calculated via diagonalization of the corresponding Hamiltonian. The challenge is to find an appropriate balance between prediction accuracy and computational efficiency to dramatically enhance larger-scale simulations.
5. Much more experimental data is needed: Validation of ML predictions requires extensive comparisons with experimental observables such as reaction rates, spectroscopic observations, solvation energies, and melting temperatures. Such experiments may previously have been considered too routine, too mundane, or not insightful enough on their own, but all high-quality data bring great value to future CompChem+ML efforts that tightly integrate quantum mechanics, statistical simulations, and fast ML predictions within a comprehensive molecular simulation framework. 720 6. Much more comprehensive data sets need to be assembled and curated: Current CompChem+ML efforts have profited heavily from the availability of benchmark data sets for relatively small molecules that allow comparisons of existing models. 413, 527 While efforts fixated on boosting prediction accuracies and shrinking requisite training-set sizes for ML models have had their merits, it is time to move on: further improvements are meaningless if the ML models are not making useful and insightful predictions themselves. More useful predictions will require knowledge from larger data sets, and these will inevitably contain heterogeneous combinations of different levels of theory or experiments that must be analyzed and "cleaned", with uncertainties adequately quantified, for models to learn productively. Such hybrid data sets may be the key to arriving at novel hypotheses in chemistry that could then be experimentally tested. 7. Bolder and deeper explorations of chemical space are needed: So far, most efforts to generate chemical data have focused on exploring parts of chemical space for new compounds for a targeted purpose. This should change. Combining ML model uncertainty estimates across broader swaths of chemical space could open pathways for fruitful statistical explorations, say, in an active learning framework. This could uncover new synergies between data that otherwise would not have been possible, enabling advances in scientific understanding and improved ML models. Generative models can bridge the gap between sampling and targeted structure generation by imposing optimal compound properties, for example, for inverse chemical design. 125, 621, 622 This and other reviews 20, 554, 634, 720−724 have described how ML has become instrumental for recent progress in CompChem. We would also like to mention the inspiration that ML has drawn from being applied to physical and chemical problems. ML methods generally assume that data are subject to measurement noise, while CompChem data are generally approximate but also noise-free from a statistical perspective. ML modeling still requires regularization, but the regularizers should reflect the underlying physics of molecular and materials systems. ML models used in vision applications contain discrete convolution filters that are suboptimal for chemical modeling, but recognition of this shortcoming has led to novel continuous convolution filters that are well suited for chemistry and have also become a popular novel architecture for core ML methods. 434 Furthermore, invariances, symmetries, and conservation laws are key ingredients of physical and chemical systems. Incorporating them into ML has led to novel and useful models for chemistry, since such models can learn from significantly less data, which in turn makes it possible to build force fields at unprecedentedly high levels of theory. 206, 207, 372
The use of these powerful ML techniques in computer vision, natural language processing, and other applications is currently being explored. Structural information from molecular graphs provides the basis for novel tensor NNs and message-passing architectures, 352, 421 as well as for graph explanation methods. 725 Many further challenges exist that have led, or will lead, to mutual cross-fertilization between ML and chemistry. These interdisciplinary efforts also initiate progress in the respective application domains. The power of this path is that solving a burning problem in chemistry with a novel, crafted ML model may also result in unforeseen insights into how to better design core ML methods. Interestingly, the exploratory use of ML for knowledge discovery in chemistry typically requires novel ML models and unforeseen scientific innovations, and this can lead to interesting insights that are not necessarily limited to chemistry alone but are likely to reach beyond it. To conclude, the past decade has shown that it is not enough to simply apply existing ML algorithms; breakthroughs are happening through a handshaking of innovations, resulting in novel ML algorithms and architectures driven by the pursuit of novel insights in chemistry while retaining a deep understanding of the underlying physical and chemical principles. Research programs that foster interdisciplinary exchange, such as IPAM (www.ipam.ucla.edu), have seeded this progress, and they should be continued. Mixed teams with members educated in different aspects of physics, chemistry, and ML have been instrumental. This also brings the need to solve a new educational challenge: developing new generations of researchers with an academic curriculum that interweaves chemistry, physics, and computer science to enable meaningful (multilingual) research contributions to this exciting emerging field.
Deep Learning Deep Learning in Neural Networks: An Overview Deep Learning DNA Methylation-Based Classification of Central Nervous System Tumours Scoring of Tumor-Infiltrating Lymphocytes: From Visual Estimation to Machine Learning Analysis of DNA Methylation Profiles Distinguishes Primary Lung Squamous Cell Carcinomas From Head and Neck Metastases End-to-End Lung Cancer Screening With Three-Dimensional Deep Learning on Low-Dose Chest Computed Tomography Morphological and molecular breast cancer profiling through explainable machine learning Searching for Exotic Particles in High-Energy Physics With Deep Learning Bioinformatics Prediction of HIV Coreceptor Usage Improved Protein Structure Prediction Using Potentials From Deep Learning Optimizing Spatial Filters for Robust EEG Single-Trial Analysis Online Learning of Social Representations An Adaptive Deep Reinforcement Learning Framework Enables Curling Robots With Human-Like Performance in Real World Conditions The Art of Winning an Unfair Game Mastering the Game of Go With Deep Neural Networks and Tree Search Machine Learning for Chemical Discovery The Wisdom Hierarchy: Representations of the DIKW Hierarchy Physical Chemistry: A Molecular Approach Essentials of Computational Chemistry: Theories and Models Understanding Molecular Simulation: From Algorithms to Applications OpenMM 7: Rapid Development of High Performance Algorithms for Molecular Dynamics Accurate Bond Energies of Biodiesel Methyl Esters From Multireference Averaged Coupled-Pair Functional Calculations The Classification of Tilted Octahedra in Perovskites × 7) Surface by Atomic Force Microscopy Assessment of Gaussian-3 and Density Functional Theories for a Larger Experimental Test Set Summer School in Quantum Chemistry Many-Body Perturbation Theory and Coupled Cluster Theory for Electron Correlation in Molecules Theory and Practice of Modeling Van Der Waals Interactions in Electronic-Structure Calculations Quantifying the Effects of the Self-Interaction Error in DFT: When Do the Delocalized States Appear? The Devil in the Details: A Tutorial Review on Some Undervalued Aspects of Density Functional Theory Calculations Density-Functional Thermochemistry. III. The Role of Exact Exchange Generalized Gradient Approximation Made Simple A Look at the Density Functional Theory Zoo With the Advanced GMTKN55 Database for General Main Group Thermochemistry, Kinetics and Noncovalent Interactions Benchmark Database of Barrier Heights for Heavy Atom Transfer, Nucleophilic Substitution, Association, and Unimolecular Reactions and Its Use to Test Theoretical Methods Quantifying Density Errors in DFT Critical Assessment of the Performance of Density Functional Methods for Several Atomic and Molecular Properties Quantifying Uncertainties in Solvation Procedures for Modeling Aqueous Phase Reaction Mechanisms Sharing Data From Molecular Simulations Integration Grid Errors for Meta-Gga-Predicted Reaction Energies: Origin of Grid Errors for the M06 Suite of Functionals Promoting Transparency and Reproducibility in Enhanced Molecular Simulations Reproducibility in Density Functional Theory Calculations of Solids The Need for Open Source Software in Machine Learning Computational Chemistry Faces a Coding Crisis Challenge to Scientists: Does Your Ten-Year-Old Code Still Run? 
Examples of Effective Data Sharing in Scientific Publishing Managing the Computational Chemistry Big Data Problem: The ioChem-BD Platform AiiDA 1.0, a Scalable Computational Infrastructure for Automated Reproducible Workflows and Data Provenance Saddle Points of Index 2 on Potential Energy Surfaces and Their Role in Theoretical Reactivity Investigations Bifurcations on Potential Energy Surfaces of Organic Reactions Anatomy of Relativistic Energy Corrections in Light Molecular Systems Jacob's Ladder of Density Functional Approximations for the Exchange-Correlation Energy Quantisierung als Eigenwertproblem (Erste Mitteilung) Quantisierung als Eigenwertproblem (Zweite Mitteilung) Quantisierung als Eigenwertproblem (Vierte Mitteilung) Multicomponent Quantum Chemistry: Integrating Electronic and Nuclear Quantum Effects via the Nuclear-Electronic Orbital Method Accurate Correlation Consistent Basis Sets for Molecular Core-Valence Correlation Effects: The Second Row Atoms Al-Ar, and the First Row Atoms B-Ne Revisited Self-Consistent Molecular-Orbital Methods. I. Use of Gaussian Expansions of Slater-Type Atomic Orbitals Fully Optimized Contracted Gaussian Basis Sets for Atoms Li to Kr Energy Band Calculations by the Augmented Plane Wave Method A Linearised Relativistic Augmented-Plane-Wave Method Utilising Approximate Pure Spin Basis Functions Self-Consistent Mixed-Basis Approach to the Electronic Structure of Solids Separable Dual-Space Gaussian Pseudopotentials Ab Initio Effective Potentials for Use in Molecular Quantum Mechanics Ab Initio Effective Core Potentials for Molecular Calculations. Potentials for Main Group Elements Na to Bi Ab Initio Effective Core Potentials for Molecular Calculations. Potentials for the Transition Metal Atoms Sc to Hg Segmented Contraction Scheme for Small-Core Lanthanide Pseudopotential Basis Sets Small-Core Multiconfiguration-Dirac-Hartree-Fock-Adjusted Pseudopotentials for Post-D Main Group Elements: Application to PbH and PbO In Handbook of Relativistic Quantum Chemistry Reformulation of the Screened Heine-Abarenkov Model Potential Ab Initio Effective Core Potentials: Reduction of All-Electron Molecular Structure Calculations to Calculations Involving Only Valence Electrons Improved Ab Initio Effective Core Potentials for Molecular Calculations Optimally Smooth Norm-Conserving Pseudopotentials Pseudopotentials for High-Throughput DFT Calculations Norm-Conserving and Ultrasoft Pseudopotentials for First-Row and Transition Elements From Ultrasoft Pseudopotentials to the Projector Augmented-Wave Method A Straightforward Method for Generating Soft Transferable Pseudopotentials Systematically Convergent Basis Sets With Relativistic Pseudopotentials. II. Small-Core Pseudopotentials and Correlation Consistent Basis Sets for the Post-D Group 16−18 Elements Revised Basis Sets for the LANL Effective Core Potentials Relativistic Effects in Chemistry: More Common Than You Thought The Quantum Theory of the Electron The Douglas-Kroll-Hess Approach Gradients in the Ab Initio Scalar Zeroth-Order Regular Approximation (ZORA) Approach The Dirac Equation in Quantum Chemistry: Strategies to Overcome the Current Computational Problems Self-Consistent Field, With Exchange, for Beryllium A Simplification of the Hartree-Fock Method Näherungsmethode zur Lösung des quantenmechanischen Mehrkörperproblems Introduction to Computational Chemistry New Developments in Molecular Orbital Theory The Molecular Orbital Theory of Chemical Valency VIII.
A Method of Calculating Ionization Potentials Deep-Neural-Network Solution of the Electronic Schrödinger Equation Ab Initio Solution of the Many-Electron Schrödinger Equation With Deep Neural Networks The Ground State Electronic Energy of Benzene Coupled-Cluster Theory in Quantum Chemistry Describing Noncovalent Interactions Beyond the Common Approximations: How Accurate Is the "Gold Standard"? Popular Theoretical Methods Predict Benzene and Arenes to Be Nonplanar Comment on a Spurious Prediction of a Non-Planar Geometry for Benzene at the MP2 Level of Theory Generating Efficient Quantum Chemistry Codes for Novel Architectures TeraChem: A Graphical Processing Unit-Accelerated Electronic Structure Package for Large-Scale Ab Initio Molecular Dynamics Monte Carlo on Graphical Processing Units Real-Space Density Functional Theory on Graphical Processing Units: Computational Approach and Comparison to Gaussian Basis Set Methods Solution of the Hartree-Fock Equations by a Pseudospectral Method: Application to Diatomic Molecules Pseudospectral Full Configuration Interaction Fast Evaluation of the Coulomb Potential for Electron Densities Using Multipole Accelerated Resolution of Identity Approximation An Efficient and Near Linear Scaling Pair Natural Orbital Based Local Coupled Cluster Method Explicitly Correlated R12/F12 Methods for Electronic Structure Quantum Monte Carlo and Related Approaches The Density Matrix Renormalization Group Algorithm in Quantum Chemistry Unifying Machine Learning and Quantum Chemistry With a Deep Neural Network for Molecular Wavefunctions Improving the Accuracy of Møller-Plesset Perturbation Theory With Neural Networks Transferable MP2-Based Machine Learning for Accurate Coupled-Cluster Energies Machine Learning Configuration Interaction Automation of Active Space Selection for Multireference Methods via Machine Learning on Chemical Bond Dissociation A Complete Basis Set Model Chemistry. VII.
Use of the Minimum Population Localization Method Gaussian-3 (G3) Theory for Molecules Containing First and Second-Row Atoms W4 Theory for Computational Thermochemistry: In Pursuit of Confident Sub-kJ/Mol Predictions HEAT: High Accuracy Extrapolated Ab Initio Thermochemistry A Computational Chemist's Guide to Accurate Thermochemistry for Organic Molecules Boosting Quantum Machine Learning Models With a Multilevel Combination Technique: Pople Diagrams Revisited Multireference Electron Correlation Methods: Journeys Along Potential Energy Surfaces Applicability of the Multi-Reference Double-Excitation CI (MRD-CI) Method to the Calculation of Electronic Wavefunctions and Comparison With Related Techniques Optimizing Conical Intersections Without Derivative Coupling Vectors: Application to Multistate Multireference Second-Order Perturbation Theory (MS-CASPT2) Multireference Character for 3d Transition-Metal-Containing Molecules Comparison of the T1 and D1 Diagnostics for Electronic Structure Theory: A New Definition for the Open-Shell D1 Diagnostic Data-Driven Approaches Can Overcome the Cost-Accuracy Trade-Off in Multireference Diagnostics Complete Active Space SCF Method (CASSCF) Using a Density Matrix Formulated Super-CI Approach Multiconfiguration Self-Consistent Field and Multireference Configuration Interaction Methods and Applications A Perspective on the CASPT2 Method Multireference Nature of Chemistry: The Coupled-Cluster View Perspective: Multireference Coupled Cluster Theories of Dynamical Electron Correlation Erratum: O2-Binding to Heme: Electronic Structure and Spectrum of Oxyheme, Studied by Multiconfigurational Methods Two-Dimensional Chart of Quantum Chemistry Density-Functional Theory of Atoms and Molecules; International Series of Monographs on Chemistry Orbital-Free Density Functional Theory for Materials Research Introducing PROFESS 2.0: A Parallelized, Fully Linear Scaling Program for Orbital-Free Density Functional Theory Calculations ATLAS: A Real-Space Finite-Difference Implementation of Orbital-Free Density Functional Theory Nonlocal Kinetic Energy Functionals by Functional Integration Generalized Density Functional Theories Using the k-Electron Densities: Development of Kinetic Energy Functionals Nonlocal Orbital-Free Kinetic Energy Density Functional for Semiconductors Transport Properties of Lithium Hydride at Extreme Conditions From Orbital-Free Molecular Dynamics Ionic and Electronic Transport Properties in Dense Plasmas by Orbital-Free Density Functional Theory Two-Temperature Warm Dense Hydrogen as a Test of Quantum Protons Driven by Orbital-Free Density Functional Theory Electronic Forces Orbital-Free Bond Breaking via Machine Learning Bypassing the Kohn-Sham Equations With Machine Learning Self-Consistent Equations Including Exchange and Correlation Effects Advances in Density-Functional Calculations for Materials Modeling Handbook of Computational Chemistry Learn Density Functional Theory Rungs 1 to 4 of DFT Jacob's Ladder: Extensive Test on the Lattice Constant, Bulk Modulus, and Cohesive Energy of Solids Spin-Component-Scaled Double Hybrids: An Extensive Search for the Best Fifth-Rung Functionals Blending DFT and Perturbation Theory Reducing Density-Driven Error Without Exact Exchange Range Separated Hybrid Density Functional With Long-Range Hartree-Fock Exchange Applied to Solids Hartree-Fock Ab Initio Treatment of Crystalline Systems DFT+U in Dudarev's Formulation With Corrected Interactions Between the Electrons With Opposite Spins: The Form of
Hamiltonian, Calculation of Forces, and Bandgap Adjustments Correlated Metals and the LDA + U Method Quantum Embedding Theories Self-Consistently Determined Properties of Solids Without Band-Structure Calculations Advances in Correlated Electronic Structure Methods for Solids, Surfaces, and Nanostructures Exact Density-Functional-Theory Embedding Scheme Embedded Correlated Wavefunction Schemes: Theory and Applications Progress in Time-Dependent Density-Functional Theory Excited-State Potential Energy Curves From Time-Dependent Density-Functional Theory: A Cross Section of Formaldehyde's 1A1Manifold Finding Density Functionals With Machine Learning Machine Learning the Physical Nonlocal Exchange-Correlation Functional of Density-Functional Theory Machine Learning Approaches Toward Orbital-Free Density Functional Theory: Simultaneous Training on the Kinetic Energy Density Functional and Its Functional Derivative Quantum Chemical Accuracy From Density Functional Approximations via Machine Learning Completing Density Functional Theory by Machine Learning Hidden Messages From Molecules Development and Use of Quantum Mechanical Molecular Models. 76. AM1: A New General Purpose Quantum Mechanical Molecular Model Optimization of Parameters for Semiempirical Methods VI: More Modifications to the NDDO Approximations and Re-Optimization of Parameters Ground States of Molecules. 38. The MNDO Method. Approximations and Parameters Semiempirical Quantum-Chemical Orthogonalization-Corrected Methods: Benchmarks for Ground-State Properties Density-Functional Tight-Binding for Beginners Self-Consistent-Charge Density-Functional Tight-Binding Method for Simulations of Complex Materials Properties GFN2-xTB -An Accurate and Broadly Parametrized Self-Consistent Tight-Binding Quantum Chemical Method With Multipole Electrostatics and Density-Dependent Dispersion Contributions Machine Learning of Parameters for Accurate Semiempirical Quantum Chemical Calculations Machine-Learned Approximations to Density Functional Theory Hamiltonians. Sci. Rep. 2017 Accurate Many-Body Repulsive Potentials for Density-Functional Tight Binding From Deep Tensor Neural Networks A Density Functional Tight Binding Layer for Deep Learning of Chemical Hamiltonians Quantum Tunneling of Thermal Protons Through Pristine Graphene Nuclear Quantum Effects in Water and Aqueous Systems: Experiment, Theory, and Current Challenges Ab Initio Path Integral Molecular Dynamics: Basic Ideas Exploiting the Isomorphism Between Quantum Theory and Classical Statistical Mechanics of Polyatomic Fluids The Formulation of Quantum Statistical Mechanics Based on the Feynman Path Centroid Density. IV. 
Algorithms for Centroid Molecular Dynamics Communication: Relation of Centroid Molecular Dynamics and Ring-Polymer Molecular Dynamics to Exact Quantum Dynamics Quantum Fluctuations and Isotope Effects in Ab Initio Descriptions of Water Dynamical Strengthening of Covalent and Non-Covalent Molecular Interactions by Nuclear Quantum Effects at Finite Temperature Towards Exact Molecular Dynamics Simulations With Machine-Learned Force Fields The Lennard-Jones Potential: When (Not) to Use It Parameterization of Highly Charged Metal Ions Using the 12−6-4 LJ-type Nonbonded Model in Explicit Water Application of the Morse Potential Function to Cubic Metals The Classical Equation of State of Gaseous Helium, Neon and Argon Comparison of Simple Potential Functions for Simulating Liquid Water Computer Simulation of Local Order in Condensed Phases of Silicon Assisted Model Building With Energy Refinement. A General Program for Modeling Molecules and Their Interactions An Overview of the Amber Biomolecular Simulation Package Development and Testing of a General Amber Force Field The Biomolecular Simulation Program Biomolecular Force Field Based on the Free Enthalpy of Hydration and Solvation: The GROMOS Force-Field Parameter Sets 53A5 and 53A6 Definition and Testing of the GROMOS Force-Field Versions 54A7 and 54B7 Parametrization of Aliphatic CHn United Atoms of GROMOS96 Force Field Development and Testing of the OPLS All-Atom Force Field on Conformational Energetics and Properties of Organic Liquids Optimized Intermolecular Potential Functions for Liquid Hydrocarbons DREIDING: A Generic Force Field for Molecular Simulations Merck Molecular Force Field. I. Basis, Form, Scope, Parameterization, and Performance of MMFF94 UFF, a Full Periodic Table Force Field for Molecular Mechanics and Molecular Dynamics Simulations Compass: An Ab Initio Force-Field Optimized for Condensed-Phase Applications -Overview With Details on Alkane and Benzene Compounds Thermodynamically Consistent Force Fields for the Assembly of Inorganic, Organic, and Biological Nanostructures: The INTERFACE Force Field Empirical Potential Derivation for Ionic Materials Current Status of the AMOEBA Polarizable Force Field An Empirical Polarizable Force Field Based on the Classical Drude Oscillator Model: Development History and Recent Applications Parametrizing a Polarizable Force Field From Ab Initio Data. I. 
The Fluctuating Point Charge Model Water Potential With Flexible Monomers: Dimer Potential Energy Surface, VRT Spectrum, and Second Virial Coefficient A Second Generation Distributed Point Polarizable Water Model Ab Initio Force Field Methods Derived From Quantum Mechanics The Embedded-Atom Method: A Review of Theory and Applications Modified Embedded-Atom Potentials for Cubic Materials and Impurities A Simple Empirical N-Body Potential for Transition Metals Long-Range Finnis-Sinclair Potentials Empirical Potential for Hydrocarbons for Use in Simulating the Chemical Vapor Deposition of Diamond Films New Empirical Approach for the Structure and Energy of Covalent Systems Modeling Solid-State Chemistry: Interatomic Potentials for Multicomponent Systems A Second-Generation Reactive Empirical Bond Order (REBO) Potential Energy Expression for Hydrocarbons Classical Atomistic Simulations of Surfaces and Heterogeneous Interfaces With the Charge-Optimized Many Body (COMB) Potentials Charge Optimized Many-Body Potential for the Si/SiO 2 System The ReaxFF Reactive Force-Field: Development, Applications and Future Directions A Reactive Force Field for Hydrocarbons APT a Next Generation QM-based Reactive Force Field Model An Empirical Valence Bond Approach for Comparing Reactions in Solutions and in Enzymes An Improved Multistate Empirical Valence Bond Model for Aqueous Proton Solvation and Transport Reactive Force Fields Made Simple An Approach to Computing Electrostatic Charges for Molecules Class IV Charge Models: A New Semiempirical Approach in Quantum Chemistry Electrostatic Effects in Proteins: Comparison of Dielectric and Charge Models Charge Transfer With Polarization Current Equalization. A Fluctuating Charge Model With Correct Asymptotics Describing Molecular Polarizability by a Bond Capacity Model Charge Equilibration for Molecular Dynamics Simulations Large-Scale Computations in Chemistry: A Bird's Eye View of a Vibrant Field Review of Force Fields and Intermolecular Potentials Used in Atomistic Computational Materials Research Preface: Special Topic: From Quantum Mechanics to Force Fields Empirical Interatomic Potential for Silicon With Improved Elastic Properties General Force Field: A Force Field for Drug-Like Molecules Compatible With the CHARMM All-Atom Additive Biological Force Fields AMOEBA Polarizable Atomic Multipole Force Field for Nucleic Acids Routine Microsecond Molecular Dynamics Simulations With AMBER on GPUs. 1. Generalized Born Routine Microsecond Molecular Dynamics Simulations With AMBER on GPUs. 2. 
Explicit Solvent Particle Mesh Ewald GPUaccelerated Molecular Modeling Coming of Age Strong Scaling of General-Purpose Molecular Dynamics Simulations on GPUs Tinker-Hp: A Massively Parallel Molecular Dynamics Package for Multiscale Simulations of Large Complex Systems With Advanced Point Dipole Polarizable Force Fields Embedded Atom Neural Network Potentials: Efficient and Accurate Machine Learning With a Physically Inspired Representation Perspective: Materials Informatics and Big Data: Realization of the "Fourth Paradigm Physically Informed Artificial Neural Networks for Atomistic Modeling of Materials Intelligent-ReaxFF: Evaluating the Reactive Force Field Parameters With Machine Learning Machine Learnt Bond Order Potential to Model Metal-Organic (Co-C) Heterostructures Generalized Neural-Network Representation of High-Dimensional Potential-Energy Surfaces Extending the Accuracy of the SNAP Interatomic Potential Form Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, Without the Electrons Neural Network and ReaxFF Comparison for Au Properties The Power of Coarse Graining in Biomolecular Simulations Polarizable Continuum Model First Principles Global Optimization of Metal Clusters and Nanoalloys OGOLEM: Global Cluster Structure Optimisation for Arbitrary Mixtures of Flexible Molecules. A Multiscaling, Object-Oriented Approach Global Optimization by Basin-Hopping and the Lowest Energy Structures of Lennard-Jones Clusters Containing Up to 110 Atoms ABCluster: The Artificial Bee Colony Algorithm for Cluster Global Optimization Minima Hopping: An Efficient Search Method for the Global Minimum of the Potential Energy Surface of Complex Molecular Systems Geometry Optimization Optimization Methods for Finding Minimum Energy Paths Estimating the Hessian for Gradient-Type Geometry Optimizations A Climbing Image Nudged Elastic Band Method for Finding Saddle Points and Minimum Energy Paths A Generalized Solid-State Nudged Elastic Band Method Growing String Method With Interpolation and Optimization in Internal Coordinates: Method and Examples Optimization-Based String Method for Finding Minimum Energy Path Quadratic String Method for Determining the Minimum-Energy Path Based on Multiobjective Optimization Acceleration of Saddle-Point Searches With Machine Learning Local Bayesian Optimizer for Atomic Structures Low-Scaling Algorithm for Nudged Elastic Band Calculations Using a Surrogate Machine Learning Model Machine Learning in Computational Chemistry: An Evaluation of Method Performance for Nudged Elastic Band Calculations Nudged Elastic Band Calculations Accelerated With Gaussian Process Regression Based on Inverse Interatomic Distances Boltzmann Generators: Sampling Equilibrium States of Many-Body Systems With Deep Learning Operators in Quantum Machine Learning: Response Properties in Chemical Space Origins of Complex Solvent Effects on Chemical Reactivity and Computational Tools to Investigate Them: A Review Advances and Challenges in Modeling Solvated Reaction Mechanisms for Renewable Fuels and Chemicals Quantum Mechanical Continuum Solvation Models A Universal Approach to Solvation Modeling The COSMO and COSMO-RS Solvation Models Molecular Theory of Solvation Electrostatic Interaction of a Solute With a Continuum. 
A Direct Utilization of Ab Initio Molecular Potentials for the Prevision of Solvent Effects A New Integral Equation Formalism for the Polarizable Continuum Model: Theoretical Background and Applications to Isotropic and Anisotropic Dielectrics Universal Solvation Model Based on Solute Electron Density and on a Continuum Model of the Solvent Defined by the Bulk Dielectric Constant and Atomic Surface Tensions Quantum Calculation of Molecular Energies and Energy Gradients in Solution by a Conductor Solvent Model COSMO: A New Approach to Dielectric Screening in Solvents With Explicit Expressions for the Screening Energy and Its Gradient Conductor-Like Screening Model for Real Solvents: A New Approach to the Quantitative Calculation of Solvation Phenomena Quantum Mechanical Continuum Solvation Models for Ionic Liquids A Cavity Corrected 3D-RISM Functional for Accurate Solvation Free Energies Hybrid Solvation Models for Bulk, Interface, and Membrane: Reference Interaction Site Methods Coupled With Density Functional Theory Progress in Ab Initio QM/MM Free-Energy Simulations of Electrostatic Energies in Proteins: Accelerated QM/MM Studies of pKa, Redox Reactions and Solvation Free Energies Explicit Solvation Matters: Performance of QM/MM Solvation Models in Nucleophilic Addition Hybrid QM/MM Study of Thio Effects in Transphosphorylation Reactions: The Role of Solvation First-Principles Modeling of Chemistry in Mixed Solvents: Where to Go From Here? A Review of Methods for the Calculation of Solution Free Energies and the Modelling of Systems in Solution Machine Learning-Guided Approach for Studying Solvation Environments Quasi-Chemical Theories of Associated Liquids The Hydration Number of Li+ in Liquid Water Fast Predictions of Liquid-Phase Acid-Catalyzed Reaction Rates Using Molecular Dynamics Simulations and Convolutional Neural Networks Solvation Free Energy Calculations With Quantum Mechanics/Molecular Mechanics and Machine Learning Models Quantitative Correlation of Physical and Chemical Properties With Chemical Structure: Utility for Prediction On the Importance of Chemical Structure Curation in Cheminformatics and QSAR Modeling Research Conceptual Density Functional Theory: Status, Prospects Alchemical Perturbation Density Functional Theory Improving Neural Networks by Preventing Co-Adaptation of Feature Detectors Going Deeper With Convolutions ImageNet Large Scale Visual Recognition Challenge ImageNet Classification With Deep Convolutional Neural Networks A Neural Probabilistic Language Model Google's Neural Machine Translation System Efficient Estimation of Word Representations in Vector Space. arXiv Attention Is All You Need The Elements of Statistical Learning: Data Mining, Inference, and Prediction Gaussian Processes in Machine Learning Multilayer Feedforward Networks Are Universal Approximators The Variational Gaussian Process. arXiv preprint The Nature of Statistical Learning Theory Networks for Approximation and Learning The Connection Between Regularization Operators and Support Vector Kernels Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning) Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping Dropout: A Simple Way to Prevent Neural Networks From Overfitting Neural Networks and the Bias/Variance Dilemma Quantum-Chemical Insights From Deep Tensor Neural Networks On the Inductive Bias of Neural Tangent Kernels.
arXiv, 2019, 1905.12173 Explaining Nonlinear Classification Decisions With Deep Taylor Decomposition Explainable AI: Interpreting, Explaining and Visualizing Deep Learning How to Explain Individual Classification Decisions On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation Methods for Interpreting and Understanding Deep Neural Networks From Machine Learning to Explainable AI Unmasking Clever Hans Predictors and Assessing What Machines Really Learn Explaining deep neural networks and beyond: A review of methods and applications Automated Reverse Engineering of Nonlinear Dynamical Systems Distilling Free-Form Natural Laws From Experimental Data Discovering Governing Equations From Data by Sparse Identification of Nonlinear Dynamical Systems Sparse Learning of Stochastic Dynamical Equations Discovering Governing Reactions From Concentration Data Visual Interaction Networks: Learning a Physics Simulator From Video Deep Hidden Physics Models: Deep Learning of Nonlinear Partial Differential Equations Predicting Reaction Performance in C-N Cross-Coupling Using Machine Learning Predicting Reaction Performance in C-N Cross-Coupling Using Machine Learning Response to Comment on "Predicting Reaction Performance in C-N Cross-Coupling Using Machine Learning Big Data of Materials Science: Critical Role of the Descriptor Predicting the Phase Diagram of Titanium Dioxide With Random Search and Pattern Recognition Handbook of Data Visualization Kernel Principal Component Analysis Nonlinear Component Analysis as a Kernel Eigenvalue Problem Visualizing Data Using T-Sne Simplifying the Representation of Complex Free-Energy Landscapes Using Sketch-Map UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv A Unifying Review of Deep and Shallow Anomaly Detection Reinforcement Learning: A Survey The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain Principles of Neurodynamics. 
Perceptrons and the Theory of Brain Mechanisms Perceptrons: An Introduction to Computational Geometry Learning Representations by Back-Propagating Errors Une procédure d'apprentissage pour réseau à seuil asymétrique (A learning scheme for asymmetric threshold networks) Neural Networks for Pattern Recognition On the Approximate Realization of Continuous Mappings by Neural Networks Approximation by Superpositions of a Sigmoidal Function Support-Vector Networks An Introduction to Kernel-Based Learning Algorithms Learning With Kernels: Support Vector Machines, Regularization, Optimization, and Beyond Input Space Versus Feature Space in Kernel-Based Methods Predicting Time Series With Support Vector Machines Kernel-Based Nonlinear Blind Source Separation On Relevant Dimensions in Kernel Feature Spaces Kernel Analysis of Deep Networks Analyzing Local Structure in Kernel-Based Learning: Explanation, Complexity, and Reliability Assessment A Machine Learning Approach to Predicting Protein-Ligand Binding Affinity With Applications to Molecular Docking Machine Learning Approach for Structure-Based Zeolite Classification Materials Screening for the Discovery of New Half-Heuslers: Machine Learning Versus Ab Initio Methods Covariate Shift Adaptation by Importance Weighted Cross Validation Machine Learning in Non-Stationary Environments: Introduction to Covariate Shift Adaptation Engineering Support Vector Machine Kernels That Recognize Translation Initiation Sites Atom-Centered Symmetry Functions for Constructing High-Dimensional Neural Network Potentials On Representing Chemical Environments Fast and Accurate Modeling of Molecular Atomization Energies With Machine Learning Crystal Structure Representations for Machine Learning Models of Formation Energies Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and Nonlocality in Chemical Space FCHL Revisited: Faster and More Accurate Quantum Machine Learning Alchemical and Structural Distribution Based Representation for Universal Quantum Machine Learning Unified Representation of Molecules and Crystals for Machine Learning How to Represent Crystal Structures for Machine Learning: Towards Fast Prediction of Electronic Properties Atomic Cluster Expansion for Accurate and Transferable Interatomic Potentials Neural Message Passing for Quantum Chemistry SchNet: A Continuous-Filter Convolutional Neural Network for Modeling Quantum Interactions A Novel Approach to Describe Chemical Environments in High-Dimensional Neural Network Potentials Convolutional Networks on Graphs for Learning Molecular Fingerprints High-Throughput Screening of Bimetallic Catalysts Enabled by Machine Learning Molecular Generative Model Based on Conditional Variational Autoencoder for De Novo Molecular Design Extended-Connectivity Fingerprints Comparing Molecular Patterns Using the Example of SMARTS: Applications and Filter Collection Analysis Comparing Molecular Patterns Using the Example of SMARTS: Theory and Algorithms SMILES, a Chemical Language and Information System: 1: Introduction to Methodology and Encoding Rules SMILES. 2.
Algorithm for Generation of Unique SMILES Notation Neural Network Potential-Energy Surfaces in Chemistry: A Tool for Large-Scale Simulations Perspective: Machine Learning Potentials for Atomistic Simulations SchNet-A Deep Learning Architecture for Molecules and Materials Model for Molecular Energies From a Neural Network Approach Based on Local Information Gaussian Processes and Fast Matrix-Vector Multiplies Deep Kernel Learning. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics Product Kernel Interpolation for Scalable Gaussian Processes. arXiv Gpytorch: Blackbox Matrix-Matrix Gaussian Process Inference With Gpu Acceleration Exact Gaussian Processes on a Million Data Points Neural Networks: Tricks of the Trade Assessment and Validation of Machine Learning Methods for Predicting Molecular Atomization Energies The Evidence Framework Applied to Classification Networks A Practical Bayesian Framework for Backpropagation Networks The Evidence Framework Applied to Support Vector Machines A New Look at the Statistical Model Identification Estimating the Dimension of a Model Network Information Criterion-Determining the Number of Hidden Units for an Artificial Neural Network Model Ensemble Learning of Coarse-Grained Molecular Dynamics Force Fields With a Kernel Approach Data-Driven Collective Variables for Enhanced Sampling Atom-Density Representations for Machine Learning Physics-Inspired Structural Representations for Molecules and Materials. arXiv Metrics for Measuring Distances in Configuration Spaces Structural Characterization of Deformed Crystals by Analysis of Common Atomic Neighborhood Wasserstein Metric for Improved Quantum Machine Learning With Adjacency Matrix Representations WACSF -Weighted Atom-Centered Symmetry Functions as Descriptors in Machine Learning Potentials Comparing Molecules and Solids Across Structural and Alchemical Space Many-Body Descriptors for Predicting Molecular Properties With Machine Learning: Analysis of Pairwise and Three-Body Interactions in Molecules Deep Potential Molecular Dynamics: A Scalable Model With the Accuracy of Quantum Mechanics End-to-End Symmetry Preserving Inter-Atomic Potential Energy Model for Finite and Extended Systems Cormorant: Covariant Molecular Neural Networks Tensor Field Networks: Rotation-and Translation-Equivariant Neural Networks for 3d Point Clouds. arXiv Automatic Selection of Atomic Fingerprints and Reference Configurations for Machine-Learning Potentials Farthest-Point Optimized Point Sets with Maximized Minimum Distance CUR Matrix Decompositions for Improved Data Analysis Incompleteness of Atomic Structure Representations Fourier Series of Atomic Radial Distribution Functions: A Molecular Fingerprint for Machine Learning Models of Quantum Chemical Properties Extracting Ice Phases From Liquid Water: Why a Machine-Learning Water Model Generalizes So Well. 
arXiv Machine Learning Based Interatomic Potential for Amorphous Carbon An Accurate and Transferable Machine Learning Potential for Carbon When Do Short-Range Atomistic Machine-Learning Models Fall Short General-Purpose Machine Learning Potentials Capturing Nonlocal Charge Transfer A Fourth-Generation High-Dimensional Neural Network Potential With Accurate Electrostatics Including Non-Local Charge Transfer Transferable Atomic Multipole Machine Learning Models for Small Organic Molecules Interatomic Potentials for Ionic Systems With Density Functional Accuracy Based on Charge Densities Obtained by a Neural Network Incorporating Long-Range Physics in Atomic-Scale Machine Learning Multi-Scale Approach for the Prediction of Atomic Scale Properties Accurate Interatomic Force Fields via Machine Learning With Covariant Kernels Symmetry-Adapted Machine Learning for Tensorial Properties of Atomistic Systems Feature Optimization for Atomistic Machine Learning Yields a Data-Driven Construction of the Periodic Table of the Elements Hierarchical Modeling of Molecular Energies Using a Deep Neural Network PhysNet: A Neural Network for Predicting Energies, Forces, Dipole Moments, and Partial Charges Neural Message Passing With Edge Updates for Predicting Properties of Directional Message Passing for Molecular Graphs PiNN: A Python Library for Building Atomic Neural Networks of Molecules and Materials Best Practices for QSAR Model Development, Validation, and Exploitation Deep Neural Nets as a Method for Quantitative Structure-Activity Relationships Transferable Machine-Learning Model of the Electron Density Accurate Molecular Polarizabilities With Coupled Cluster Theory and Machine Learning Big Data Meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach Machine-Learning Approach for One-And Two-Body Corrections to Density Functional Theory: Applications to Molecular and Condensed Water Universal Density Matrix Functional From Molecular Orbital-Based Machine Learning: Transferability Across Organic Molecules Prediction Errors of Molecular Machine Learning Models Lower Than Hybrid DFT Error Can Exact Conditions Improve Machine-Learned Density Functionals? 
Understanding Machine-Learned Density Functionals Understanding Kernel Ridge Regression: Common Behaviors From Simple Functions to Density Functionals Neural-Network Kohn-Sham Exchange-Correlation Potential and Its Out-of-Training Transferability Solving the Quantum Many-Body Problem With Artificial Neural Networks An Improved Neural Network Method for Solving the Schrodinger Equation Quantum Machine Learning Using Atom-in-Molecule-Based Fragments Selected on the Fly Molecular Dynamics With on-the-Fly Machine Learning of Quantum-Mechanical Forces ANI-1: An Extensible Neural Network Potential With DFT Accuracy at Force Field Computational Cost Data Descriptor: ANI-1, a Data Set of 20 Million Calculated Off-Equilibrium Conformations for Organic Molecules The ANI-1ccx and ANI-1x Data Sets, Coupled-Cluster and Density Functional Theory Properties for A Web-Accessible Database of Experimentally Determined Protein-Ligand Binding Affinities The Harvard Clean Energy Project: Large-Scale Computational Screening and Design of Organic Photovoltaics on the World Community Grid Computation-Ready, Experimental Metal-Organic Frameworks: A Tool to Enable High-Throughput Screening of Nanoporous Crystals FreeSolv: A Database of Experimental and Calculated Hydration Free Energies, With Input Files Virtual Exploration of the Chemical Universe Up to 11 Atoms of C, N, O, F: Assembly of 26.4 Million Structures (110.9 Million Stereoisomers) and Analysis for New Ring Systems, Stereochemistry, Physicochemical Properties, Compound Classes, and Drug Discovery Toward a Database of Hypothetical Zeolite Structures MoleculeNet: A Benchmark for Molecular Machine Learning An Introduction to Electrocatalyst Design Using Machine Learning for Renewable Energy Storage. arXiv Materials Design and Discovery With High-Throughput Density Functional Theory: The Open Quantum Materials Database (OQMD) PubChemQC PM6: Data Sets of 221 Million Molecules With Optimized Molecular Geometries and Electronic Properties PubChemQC Project: A Large-Scale First-Principles Electronic Structure Database for Data-Driven Chemistry QM7-X: A Comprehensive Dataset of Quantum-Mechanical Properties Spanning the Chemical Space of Quantum Chemistry Structures and Properties of 134 Kilo Molecules Machine-Learned and Codified Synthesis Parameters of Oxide Materials Public (Q)SAR Services, Integrated Modeling Environments, and Model Repositories on the Web: State of the Art and Perspectives for Future Development Anatole von Lilienfeld, O. Machine Learning of Molecular Electronic Properties in Chemical Compound Space Materials Cartography: Representing and Mining Materials Space Using Structural and Electronic Fingerprints Unsupervised Machine Learning in Atomistic Simulations, Between Predictions and Understanding Learning invariant representations of molecules for atomization energy prediction Interactive Structure-Property Explorer for Materials and Molecules Challenging Problems in Data Mining Research Top 10 Algorithms in Data Mining Data-Mining for Processes in Chemistry, Materials, and Engineering. 
Processes Unsupervised Word Embeddings Capture Latent Knowledge From Materials Science Literature Machine Learning in Chemoinformatics and Drug Discovery Materials Synthesis Insights From Scientific Literature via Text Extraction and Machine Learning Chemometrics: Views and Propositions ChemDataExtractor: A Toolkit for Automated Extraction of Chemical Information From the Scientific Literature Data Mining Our Way to the Next Generation of Thermoelectrics Extracting Knowledge From Data Through Catalysis Informatics Machine-Learning-Assisted Materials Discovery Using Failed Experiments Functional Form of the Superconducting Critical Temperature From Machine Learning Molecular Structure Extraction From Documents Using Deep Learning Supervised Machine-Learning-Based Determination of Three-Dimensional Structure of Metallic Nanoparticles Learning Surface Molecular Structures via Machine Vision PDeep: Predicting MS/MS Spectra of Peptides With Deep Learning Unified Approach for Molecular Dynamics and Density-Functional Theory Machine Learning for Molecular and Materials Science Machine Learning Interatomic Potentials as Emerging Tools for Materials Science Machine Learning Force Fields Structure-Based Sampling and Self-Correcting Machine Learning for Accurate Calculations of Potential Energy Surfaces and Vibrational Levels Potential Energy Surfaces Fitted by Artificial Neural Networks Potential Energy Surfaces From High Fidelity Fitting of Ab Initio Points: The Permutation Invariant Polynomial -Neural Network Approach Permutation Invariant Potential Energy Surfaces for Polyatomic Reactions Using Atomistic Neural Networks Communication: Fitting Potential Energy Surfaces With Fundamental Invariant Neural Network A Critical Comparison of Neural Network Potentials for Molecular Reaction Dynamics With Exact Permutation Symmetry Ab Initio Potential Energy Surfaces and Quantum Dynamics for Polyatomic Bimolecular Reactions Energy Landscapes for Machine Learning A Global Potential Energy Surface for the H 2 + OH ↔ H 2 O + H Reaction Using Neural Networks Combining Ab Initio Computations, Neural Networks, and Diffusion Monte Carlo: An Efficient Method to Treat Weakly Bound Molecules Novo Exploration and Self-Guided Learning of Potential-Energy Surfaces Constructing Accurate and Data Efficient Molecular Force Fields Using Machine Learning Efficient Sampling of Atomic Configurational Spaces Spectral Neighbor Analysis Method for Automated Generation of Quantum-Accurate Interatomic Potentials Performance and Cost Assessment of Machine Learning Interatomic Potentials Representing Potential Energy Surfaces by High-Dimensional Neural Network Potentials SchNetPack: A Deep Learning Toolbox for Atomistic Systems Deep Potential: A General Representation of a Many-Body Potential Energy Surface Nonphysical Sampling Distributions in Monte Carlo Free-Energy Estimation: Umbrella Sampling Escaping Free-Energy Minima Transition Path Sampling: Throwing Ropes Over Rough Mountain Passes, in the Dark How Van Der Waals Interactions Determine the Unique Properties of Water Statistical Mechanics: Theory and Molecular Simulation Ab Initio Thermodynamics of Liquid and Solid Water Quantum-Mechanical Exploration of the Phase Diagram of Water Ab Initio Phase Diagram And Nucleation of Gallium Evidence for Supercritical Behaviour of High-Pressure Liquid Hydrogen Nuclear Quantum Effects in Water at the Triple Point: Using Theory as a Link Between Experiments The Interplay of Structure and Dynamics in the Raman Spectrum of Liquid Water Over 
the Full Frequency and Temperature Range Isotope Effects in Liquid Water via Deep Potential Molecular Dynamics Molecular Force Fields With Gradient-Domain Machine Learning: Construction and Application to Dynamics of Small Molecules With Coupled Cluster Forces Efficient Global Structure Optimization With a Machine-Learned Surrogate Model Machine Learning Enhanced Global Optimization by Clustering Local Environments to Enable Bundled Atomic Energies Hierarchically Structured Allotropes of Phosphorus From Data-Driven Exploration Intrinsic Map Dynamics Exploration for Uncharted Effective Free-Energy Landscapes Neural-Network-Based Path Collective Variables for Enhanced Sampling of Phase Transformations Data-Driven Collective Variables for Enhanced Sampling Exploration, Sampling, and Reconstruction of Free Energy Surfaces With Gaussian Process Regression Gaussian Mixture-Based Enhanced Sampling for Statics and Dynamics Stochastic Neural Network Approach for Learning High-Dimensional Free Energy Surfaces Neural Networks-Based Variationally Enhanced Sampling The Advent of Generative Chemistry Aspuru-Guzik, A. Inverse Molecular Design Using Machine Learning: Generative Models for Matter Engineering Generative Recurrent Networks for De Novo Drug Design Generating Focused Molecule Libraries for Drug Discovery With Recurrent Neural Networks De Novo Design of Bioactive Small Molecules by Artificial Intelligence Constrained Graph Variational Autoencoders for Molecule Design Junction Tree Variational Autoencoder for Molecular Graph Generation Grammar Variational Autoencoder Machine Learning-Based Screening of Complex Molecules for Polymer Solar Cells Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules Entangled Conditional Adversarial Autoencoder for De Novo Drug Discovery Application of Generative Autoencoder in De Novo Molecular Design An Advanced Generative Adversarial Autoencoder Model for De Novo Generation of New Molecules With Desired Molecular Properties in Silico Sequence Generative Adversarial Nets with Policy Gradient Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models MolGAN: An Implicit Generative Model for Mol-CycleGAN: A Generative Model for Molecular Optimization Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation Deep Reinforcement Learning for De Novo Drug Design Deep Learning Enables Rapid Identification of Potent DDR1 Kinase Inhibitors Multi-Objective De Novo Drug Design With Conditional Graph Generative Model Molecular Geometry Prediction Using a Deep Generative Graph Neural Network Generating Equilibrium Molecules With Deep Neural Networks. 
arXiv Symmetry-Adapted Generation of 3d Point Sets for the Targeted Discovery of Molecules A Framework for Machine-Learning-Augmented Multiscale Atomistic Simulations on Parallel Supercomputers Machine Learning Force Fields and Coarse-Grained Variables in Molecular Dynamics: Application to Materials and Biological Systems Coarse-Graining Methods for Computational Biology Many-Body Coarse-Grained Interactions Using Gaussian Approximation Potentials DeePCG: Constructing Coarse-Grained Models via Deep Neural Networks Combining Simulations and Solution Experiments as a Paradigm for RNA Force Field Refinement Deep Learning for Computational Chemistry Using Rule-Based Labels for Weak Supervised Learning: A ChemNet for Transferable Chemical Property Prediction The High-Throughput Highway to Computational Materials Design Design of Efficient Molecular Organic Light-Emitting Diodes by a High-Throughput Virtual Screening and Experimental Approach Discovering Charge Density Functionals and Structure-Property Relationships With PROPhet: A General Framework for Coupling Machine Learning and First-Principles Exploring Chemical Compound Space With Quantum-Based Machine Learning Inverse Strategies for Molecular Design Universal Fragment Descriptors for Predicting Properties of Inorganic Crystals Chemical Product Design - Recent Advances and Perspectives Developing an Improved Crystal Graph Convolutional Neural Network Framework for Accelerated Materials Discovery Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties Comparative Tests of Theoretical Procedures for Studying Chemical Reactions Linking the Neural Machine Translation and the Prediction of Organic Chemistry Reactions Planning Chemical Syntheses With Deep Neural Networks and Symbolic AI Alphachem A Short Review of Chemical Reaction Database Systems, Computer-Aided Synthesis Design, Reaction Prediction and Synthetic Feasibility Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction Computer-Assisted Design of Complex Organic Syntheses Computer-Aided Synthesis Design: 40 Years On Retrosynthetic Reaction Prediction Using Neural Sequence-to-Sequence Models Synthetic Complexity Learned From a Reaction Corpus Efficient Syntheses of Diverse, Medicinally Relevant Targets Planned by Computer and Executed in the Laboratory Autonomous Discovery in the Chemical Sciences Part I: Progress Autonomous Discovery in the Chemical Sciences Part II: Outlook A Brief History of Catalysis Bridging Homogeneous and Heterogeneous Catalysis by Heterogeneous Single-Metal-Site Catalysts Electrocatalysis for the Oxygen Evolution Reaction: Recent Development and Future Perspectives Electrocatalytic and Homogeneous Approaches to Conversion of CO2 to Liquid Fuels Solar Thermochemical Production of Hydrogen - a Review Effect of Crystallization Modes in TIPS-pentacene/insulating Polymer Blends on the Gas Sensing Properties of Organic Field-Effect Transistors A Critical Review on Hydrogen Evolution Electrocatalysis: Re-Exploring the Volcano-Relationship Methods, Mechanism, and Applications of Photodeposition in Photocatalysis: A Review A Review of Surface Plasmon Resonance-Enhanced Photocatalysis Photoelectron-Transfer Catalysis: Its Connections With Thermal and Electrochemical Analogs Photosensitization by Reversible Electron Transfer: Theories, Experimental Evidence, and Examples The 2020 Plasma Catalysis Roadmap Combining Non-Thermal Plasma With Heterogeneous Catalysis in Waste Gas Treatment: A Review Resonant Catalysis of
Thermally Activated Chemical Reactions With Vibrational Polaritons Encyclopedia of Inorganic and Bioinorganic Chemistry Handbook of Green Chemistry Green Catalysis, Heterogeneous Catalysis On Inventing Reactions for Atom Economy Green Chemistry and Catalysis Theoretical Surface Science and Catalysis-Calculations and Concepts From the Sabatier Principle to a Predictive Theory of Transition-Metal Heterogeneous Catalysis Introducing Structural Sensitivity Into Adsorption-Energy Scaling Relations by Means of Coordination Numbers Machine Learning in Catalysis Machine Learning for Catalysis Informatics: Recent Applications and Prospects Towards Operando Computational Modeling in Heterogeneous Catalysis Machine Learning for Heterogeneous Catalyst Design and Discovery High-Throughput Experimentation Meets Artificial Intelligence: A New Pathway to Catalyst Discovery Automatic Prediction of Surface Phase Diagrams Using Ab Initio Grand Canonical Monte Carlo Automated Discovery and Construction of Surface Phase Diagrams Using Machine Learning Configurational Energies of Nanoparticles Based on Metal-Metal Coordination A Coordination-Based Model for Transition Metal Alloy Nanoparticles Size-, Shape-, and Composition-Dependent Model for Metal Nanoparticle Stability Prediction Unfolding Adsorption on Metal Nanoparticles: Connecting Stability With Catalysis Optimization of the Facet Structure of Transition-Metal Catalysts Applied to the Oxygen Reduction Reaction Accelerating T-SNE Using Tree-Based Algorithms Accelerated Discovery of CO 2 Electrocatalysts Using Active Machine Learning Understanding the Composition and Activity of Electrocatalytic Nanoalloys in Aqueous Solvents: A Combination of DFT and Accurate Neural Network Potentials Machine Learning Accelerates the Discovery of Design Rules and Exceptions in Stable Metal-Oxo Intermediate Formation Chemical Pressure-Driven Enhancement of the Hydrogen Evolving Activity of Ni 2 P From Nonmetal Surface Doping Interpreted via Machine Learning Interaction Trends Between Single Metal Atoms and Oxide Supports Identified With Density Functional Theory and Statistical Learning Machine Learning Corrected Alchemical Perturbation Density Functional Theory for Catalysis Applications Machine Learning Meets Volcano Plots: Computational Discovery of Cross-Coupling Catalysts Graph Theory Approach to High-Throughput Surface Adsorption Structure Generation To Address Surface Reaction Network Complexity Using Scaling Relations Machine Learning and DFT Calculations Discovery of Novel Chemical Reactions by Deep Generative Recurrent Neural Network ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning Neural Networks for the Prediction of Organic Chemistry Reactions Mapping the Space of Chemical Reactions Using Attention-Based Neural Networks Computer-Generated Kinetics for Coupled Heterogeneous/Homogeneous Systems: A Case Study in Catalytic Combustion of Methane on Platinum Sequential-Optimization-Based Framework for Robust Modeling and Design of Heterogeneous Catalytic Systems Machine Learning for Predicting Product Distributions in Catalytic Regioselective Reactions Holistic Prediction of Enantioselectivity in Asymmetric Catalysis Prediction of Higher-Selectivity Catalysts by Computer-Driven Workflow and Machine Learning Predictive Multivariate Models for Bioorthogonal Inverse-Electron Demand Diels-Alder Reactions The Digitization of Organic Synthesis Making Better Decisions During Synthetic Route Design: Leveraging 
Prediction to Achieve Greenness-by-Design Computer-Based De Novo Design of Drug-Like Molecules Active-Learning Strategies in Computer-Assisted Drug Discovery. Drug Discovery Today Deep Learning in Biomedicine Industrial Enzymes: Trends, Scope and Relevance REINVENT 2.0: An AI Tool for De Novo Drug Design Drug Discovery With Explainable Artificial Intelligence The Rise of Deep Learning in Drug Discovery Spectrum of Deep Learning Algorithms in Drug Discovery Designing Antimicrobial Peptides: Form Follows Function Screening of Therapeutic Agents for COVID-19 Using Machine Learning and Ensemble Docking Studies A Machine Learning and Informatics Program Package for the Analysis, Mining, and Modeling of Chemical and Materials Data Machine Learning for Molecular Simulation Quantum Machine Learning in Chemical Compound Space Retrospective on a Decade of Machine Learning for Chemical Discovery Machine Learning the Ropes: Principles, Applications and Directions in Synthetic Chemistry Explaining Graph Neural Network Predictions by Identifying Relevant Walks. arXiv