Exploration on Quantum Chemical Potential Energy Surfaces: Towards the Discovery of New Chemistry
Chapter 1: Introduction

Published:12 Dec 2022

Special Collection: 2022 ebook collection
2022. "Introduction", Exploration on Quantum Chemical Potential Energy Surfaces: Towards the Discovery of New Chemistry, Koichi Ohno, Hiroko Satoh
Download citation file:
The number of known chemical substances has, to date, grown to over 190 million and is continually increasing. However, the number of currently known compounds is still much smaller than the number of possible chemical structures estimated to exist in the chemical space. It is challenging to explore unknown regions of this space theoretically with the end goal of identifying all chemical species that can be obtained from a given set of atoms and all reaction channels involving isomerization and decomposition. In principle, such information could be generated based on the mathematical properties of potential energy surfaces (PESs), although the simple pointbypoint exploration of such surfaces requires tremendous computational resources. It would therefore be helpful to mitigate such difficulties in order to discover new molecules and reactions and to estimate what is possible or impossible in the chemical space. This chapter summarizes the fundamental concepts and basic knowledge of the exploration of quantum chemical PESs.
1.1 Chemical Space
We are surrounded by numerous chemical compounds including naturally occurring molecules such as oxygen, water and foodstuffs indispensable to life, as well as artificially synthesized materials such as those found in electronic devices. In fact, the total number of known compounds registered in the CAS Registry^{1 } now amounts to more than 190 million and is increasing by approximately 16 million per year such that the total is expected to reach 1.8 billion in 100 years (Figure 1.1). Even so, this value is still much less than even the number of permutations that can be made from 13 different atoms arranged in a simple chain (13!/2 or approximately 3.1 billion). Thus, the number of undiscovered molecules is much larger than the number of presently known compounds and the search for new chemical species remains a challenge.
The chemical space has continually expanded since the advent of chemistry as a field of study and the use of computers beginning in the 1960s has assisted in this process. The first computational approaches to the exploration of the chemical space were informational methods based on graph theory, combinatorics and artificial intelligence techniques.^{2–8 } Until recently, computing power has been insufficient for the applications of quantum mechanics (QM) to chemical discovery, but the development of various new theories and computational methods, as well as significant improvements in computing resources, has made QM practically accessible in recent years.
A specific chemical formula can stand for different chemical structures (isomers) with different geometrical arrangements of the identical constituent atoms. Thus, the following questions need to be solved to explore the chemical space. What kinds of isomers can be produced from a given atomic composition? How are the isomers converted into one another? How are the isomers decomposed into smaller species, or conversely, how are they made from smaller species? These questions are important in molecular and reaction discovery. Many of the enumeration problems have been solved by treating chemical structures as chemical graphs in the field of chemoinformatics. One can enumerate isomers and their possible interconversion very efficiently with chemoinformatics methods. However, the quantitative prediction of molecular properties, such as reactivity and reaction mechanisms, and the enumeration of chemical structures beyond valence bond theory still require the use of QM.
The possible interconversions of compounds can also be examined by directly investigating the mathematical properties of potential energy surfaces (PESs). A PES indicates the electronic energy of a chemical system as a function of nuclear coordinates, as shown in Figure 1.2.^{9–11 } Typically, the calculation of a PES is challenging if the number of variables involved is greater than three because the computational demands are drastically increased in such cases.^{10 } Even so, various techniques have recently been developed that allow PESs to be evaluated as a means of identifying new molecules and reaction mechanisms. This book is intended to serve as a guide for the exploration of PESs based on QM calculations.
The present chapter summarizes the fundamental concepts and basic knowledge of the exploration of PESs.
1.2 Exploration of PESs
The theoretical analysis of a PES is essentially based on solving the timeindependent Schrödinger equation:^{10 }
where H is the Hamiltonian operator of the chemical system, Ψ is the wave function and E is the energy eigenvalue. This equation can be solved using the Born–Oppenheimer approximation, in which coupling between the nuclear and electronic motion is neglected. This approach allows the electronic part of the equation to be solved with fixed nuclear positions as parameters and the resulting PES forms the basis for solving the nuclear motion. Hence, most computational approaches involve solving the electronic Schrödinger equation for a given set of nuclear coordinates. The obtained energy eigenvalue E as a function of nuclear coordinates yields the PES.
Eqn (1.1) typically yields several eigenvalues, with the lowest value corresponding to the ground state and the higher ones to excited states. It should be noted that the present discussion is focused solely on groundstate PESs. Even so, the calculation of excitedstate PESs is important, especially in photochemistry where some surfaces come closer to or into contact with one another at points known as conical intersections.^{10,12 }
The number of nuclear coordinates on a PES increases with the number of atoms involved in the system. In the case of a diatomic system, the PES is reduced to a curve described as a function of interatomic distance. In more general cases with more than two atoms, the PES is not an ordinary surface in 3D space but rather a hypersurface with a dimension larger than two. Figure 1.3 presents a 3D PES in which the x and y axes are the nuclear coordinates and the z axis is the potential energy. Each minimum located in a basin on a PES corresponds to an equilibrium (EQ) molecular structure. In addition, each firstorder saddle point in the boundary region between adjacent basins on the PES, having a maximum along only one direction and a minimum in all other perpendicular directions, represents a transition state (TS) structure. Such a TS connects the basin corresponding to a reactant to another basin corresponding to a product via a minimum energy path (MEP). When the coordinates for each atom are massweighted, the MEP is referred to as the intrinsic reaction coordinate (IRC).^{13 } In addition, a path leading to fragment species is known as a dissociation channel (DC). There are two types of DCs. One is an ascending (uphill) DC starting from an EQ structure, referred to as a UDC, while the other is a descending (downhill) DC from a TS, referred to as a DDC.
This book focuses on the derivation of EQ, TS and DC structures on PESs because the identification of new EQ structures (EQs) can allow the discovery of formerly unknown chemical structures, while the evaluation of new TS structures allows novel reaction paths to be elucidated. Also, finding a new DC is equivalent to discovering a new fragmentation mechanism, which is essentially the reverse of an unknown synthetic path. The determination of EQ, TS and DC structures is thus an essential aspect of exploring PESs.
The EQ and TS points on a PES correspond to minimum points and firstorder saddle points, respectively. These are stationary points at which the first derivative (that is, the slope) along any direction is zero. At such stationary points, the Hessian matrix (or the Hessian) composed of the second derivatives must be diagonalized to check the eigenvalues. In the case of an EQ point, all eigenvalues will be positive, indicating that the point is located at a minimum, while a TS will be a firstorder saddle point with exactly one negative eigenvalue and the others positive. One may thus test for these mathematical conditions in a pointbypoint manner moving along the PES. However, this thorough exploration of the PES involves a large computational cost and so it is important to consider different exploration procedures.
The size of a PES depends on the number of variables and the range of variable values. If N is the number of atoms involved in the analysis, then the PES has 3 × N − 6 variables. It should also be noted that the three translational degrees of freedom and three rotational degrees of freedom do not change the energy of the system as far as the internal coordinates are not changed.
As a simple example, a system comprising ten atoms has 3 × N − 6 (3 × 10 − 6 = 24) dimensions. In the case where there is no preexisting knowledge of the system, pointbypoint lattice sampling along the PES may be the optimal means to choose for the exploration. In such a case, the number of sampling points will be very large but still finite. Assuming a linear range of 10 Å for each coordinate with intervals as small as 0.1 Å between sampling points, the number of sampling points is potentially 10/0.1 (=100) for each coordinate axis and the total number of lattice sampling points for 24 dimensions becomes 100^{24} (=10^{48}). Obtaining the energy value for a sampling point requires a certain amount of computation time. As an example, the QM calculations for a tenatom system typically require a computation time on the order of seconds. Assuming an average of one second for the PES calculations, the total central processing unit (CPU) time would amount to 1 × 10^{48} s (=3.2 ×10^{40} years), which is longer than the age of the universe (approximately 10^{10} years) (Figure 1.4). Even if advances in quantum computing reduce the CPU time for QM calculations drastically to 10^{−10} s, the CPU time for the lattice sampling along a PES for 10 atoms would be 3.2 × 10^{30} years, which is still much longer than the age of the universe. Therefore, the pointbypoint sampling of a PES is not a viable strategy.
Before discussing algorithms for the exploration of PESs, it is helpful to consider methods for obtaining potential energy values. If semiempirical techniques or model potentials are employed rather than more sophisticated QM methods, the CPU time that is required is drastically reduced. These simple methods are parameterized based on the use of certain datasets. They would give good results if the molecular system is in the category of the datasets but cannot reliably predict energies and structures outside of the category. This is also a drawback associated with the machine learning (ML)assisted construction of a PES generator based on training sets, even if data are generated by highlevel QM calculations. Consequently, reducing the CPU time using techniques based on parameterization or ML is not optimal, especially if the goal is to discover new chemistry via the exploration of PESs.
Algorithms are also available for the exploration of PESs but require efficiency considerations. Because the number of important points on a PES (that is, EQ, TS and DC points) is much less than the total number of lattice points, selective sampling is advantageous. One solution to this issue is a downhill walk following the steepest decent pathway from an initial point on the PES. This can be accomplished based on geometry optimization techniques that are available in most QM software programs. This process can be initiated from any point on the PES (Figure 1.5) and will generally converge at an EQ point or a DC path. If the initial point is in a basin surrounded by a dividing ridge, geometry optimization will result in an EQ structure, while if the initial point is in a semiinfinite flat area, a DC will be obtained.
While it might seem possible to identify all EQ and DC structures on a PES by repeating this geometry optimization beginning from various points, this is only viable if the initial points are evenly distributed over the PES toward all the EQ and DC structures. In addition, if the initial points are in the same basin or in the same flat area, they will converge to the same EQ or DC structure, respectively. Thus, it is important to avoid duplicate searches to efficiently achieve full exploration of the PES.
In contrast to downhill walking, uphill walking is more complex. Starting from a TS and following the steepest descent pathway can lead to an EQ structure, but starting from an EQ point and climbing to a TS requires guidance on identifying the correct pathway out of numerous possibilities (Figure 1.6). In fact, the steepest ascent from any initial point will most likely not end at a TS but rather at a local maximum at which all the Hessian eigenvalues are positive. Such locations are not important stationary points in chemistry. Walking uphill using certain guidelines, such as normal modes, can eventually lead to a TS but many trials will be required to achieve this goal. In particular, gentle slopes or soft vibrational modes are often related to internal rotations associated with conformational changes in a molecule without bond breaking or formation (Figure 1.7). Conformational searching is an important topic, but does not allow the discovery of chemical reactions in which bond breaking and formation take place.
The atoms in a reactant are in constant thermal motion and so may pass through a TS region and enter a product basin after a very long journey over a PES. Assuming that the PES cannot be precalculated, the computational requirements for a molecular dynamics (MD) approach are primarily determined by the need for repeated electronic energy calculations. In many systems, thermal energy will be insufficient to cause the system to pass through a TS within an acceptable simulation time, and so direct MD simulations (Figure 1.8a) cannot be used for a full exploration of the PES. Fortunately, many alternative methods that address these basic problems have been developed. One such technique is metadynamics,^{14 } which allows both the free energy surface associated with a PES and the temperature of the system to be explored (Figure 1.8b). Some studies have explored PESs based on simulations of free energy surfaces;^{15,16 } however, this book focuses only on the exploration of temperatureindependent PESs.
An ideal approach involves directly identifying an uphill reaction path from an EQ structure toward each of the TSs that surround the EQ point without any stopovers. However, tracking the reaction path toward a TS in this manner is a longstanding challenge and may not be possible for systems with more than three variables.^{10 } Thus, alternative approaches to assessing reaction paths are discussed in Section 1.3.
1.3 Generation of Reaction Route Maps
A thorough exploration of a PES will generate a reaction route map, which is a reaction network consisting of EQ, TS and DC structures along with the reaction pathways connecting these sites. A global reaction route map can assist in the evaluation of new regions of the chemical space with the aim of identifying new compounds. The various search methods that can be used to generate reaction route maps are described in detail in Chapter 2, while examples of PES exploration are provided in Chapter 3. In the following sections, basic issues related to discovering EQ, TS and DC structures, IRCs and reaction networks are briefly surveyed.
1.3.1 Finding an EQ Structure
Locating EQs is essentially based on a search for local minima on the PES, typically with the aim of finding new EQs that correspond to unknown chemical species. In the case that a variety of EQs exists and can be thermally interconverted based on moving through low energy barriers, the EQ structure having the lowest energy (that is, the global minimum) becomes important. As noted earlier, EQs can be obtained through geometry optimization but this process does not necessarily give the desired EQ structure. In fact, even after many iterations of geometry optimization beginning from various initial points, this EQ structure may not be reached. In contrast, if the global reaction route map is available for a particular chemical formula, the entire catalog of EQs (including isomers) may be determined, allowing new EQs and their energy distribution to be rapidly obtained.
The stability of an EQ structure is related to the heights of the energy barriers of the TSs surrounding the EQ structure, as discussed in the next subsection.
1.3.2 Finding a TS
A TS is defined as a firstorder saddle point on a PES and finding a TS is generally much more difficult than identifying an EQ structure. The methods available for this purpose can be classified into three types: (a) optimization starting from an initial guess, (b) a singleended TS search involving one or more uphill walks starting from an EQ point (Figure 1.9a) and (c) a doubleended TS search based on the interpolation of the intermediate points (or curve) between two given EQs (Figure 1.9b).
If the initial guess associated with type (a) is in a region in which only one Hessian eigenvalue is negative and all the other eigenvalues are positive (region A in Figure 1.10), the process will converge to a TS using conventional optimization methods.^{10 }
The simplest singleended method involves a single uphill walk from an EQ point to find a TS, while more general singleended approaches perform a series of uphill walks from an EQ point to find all the TSs surrounding that point. The latter processes are helpful for two reasons.
Firstly, the reaction paths around the EQ structure can be denoted as EQ–TS1, EQ–TS2, EQ–TS3 and so forth, to generate a network of nodes on the reaction route map. Secondly, the lowest energy TS (that is, the TS having the lowest potential energy) among the various TSs around the EQ point can be determined, which is important because this TS determines the stability of the EQ structure. Specifically, if the energy of this TS is sufficiently high, the chemical structure of the EQ point will not be easily converted and thus will be stable (Figure 1.11a). Conversely, if the energy of this TS is very low, the EQ structure will be unstable and readily converted to other structures (Figure 1.11b).
Doubleended TS search methods can be applied in the case where the reactant and product are directly connected via a TS. In general, the reactant and product do not necessarily need to be connected by a single TS but may be connected by a series of elementary reactions through several intermediates. In such cases, some doubleended TS search methods may be inapplicable and it becomes necessary to find the intermediates using other techniques. Note that the previous discussion assumes that the reactant and product each are single chemical species. If the reaction involves more than two reactant and/or product molecules, other methods should be used. Approaches to locating intermediates and TS structures in multicomponent systems are described in Chapter 2.
1.3.3 Finding a DC
As discussed earlier, both UDC and DDC paths can exist. The former can be found by walking uphill from EQs, while the latter are identified by walking downhill from TSs. It is important to locate DCs so as to estimate the stability of the EQs to which they are connected and to ascertain synthetic reaction routes associated with the dissociated components of each DC.
The height of a UDC relative to the EQ point equals the dissociation energy (approximately for finite separations and exactly for infinite separations). If this height is very low, the structure of the EQ point is easily decomposed into fragments specified by the UDC path. In addition, if the energy barrier on going from the EQ point to the DDC via the TS is minimal, the EQ structure is easily decomposed into fragments specified by the DDC path. Thus, the energy profile of the DC is a very important aspect for assessing the stability of the reaction products.
If a DC path leading to an EQ structure is found, the associated reaction scheme provides information related to the retrosynthesis to produce the EQ structure that can be designed based on the structures related to the DC path (Figure 1.12). The DC structures can be precursors or synthons.
1.3.4 Finding an IRC
Once a TS is identified, two downhill walks, one forward and the other backward, can be initiated from that point along the negative eigenvector direction. These downhill walks along the steepest decent path (based on massweighted coordinates) will result in a set of IRC pathways, each of which leads to an EQ or a DC structure. Thus, following the IRC from a TS yields EQ–TS–EQ′, EQ–TS–DC or DC–TS–DC′ reaction pathways.
Combining an IRC connection of the EQ–TS–EQ′ type with a singleended search for a TS around an EQ results in a reaction network comprising an EQ–TS–EQ′–TS′–EQ″–⋯ chain and several branches with EQ nodes that are directly connected to many TSs.
1.3.5 Finding a Reaction Network
Generating a global reaction route map for a given set of atoms requires the construction of an entire reaction network including EQ, TS and DC structures and IRCs using the methods described earlier, as summarized in Figure 1.13a. The process starts with identifying an EQ structure from an arbitrarily chosen initial geometry. Subsequently, TSs and DCs can be located by walking uphill from the EQ point using a singleended TS search technique. Once a TS is found, walking downhill from that TS along the IRC will yield EQ or DC structures. This procedure is then repeated in the vicinity of each newly located EQ structure until no new EQs are found. This process of sequentially identifying EQ–TS–EQ′–TS′–EQ″ chains will eventually generate the entire reaction network in the form of a global reaction route map (Figure 1.13b).
1.4 Exploration Based on Intention
If a global reaction route map is available for a chemical formula, all possible chemical species belonging to that formula can be found on the map. That is, this map provides a catalog of chemical structures and their reaction pathways for each chemical formula and so represents an effective approach to discovering unknown chemistry. However, the amount of computation required to generate the global reaction route map is expected to increase significantly as the system size expands.
It is notable that geometry optimization starting from an appropriate initial structure can yield an EQ structure that may represent an unknown chemical structure. Choosing this initial structure is not trivial but numerous novel compounds have been synthesized in this manner based on experience, knowledge or intuition. Similarly, the computational exploration of a PES may lead to the discovery of new chemical structures based on geometry optimization starting from an initial structure selected solely on the basis of intuition. Examples of such approaches to the identification of novel compounds and reactions are provided in Chapter 4.
1.5 Selection of Computational Levels
The exploration of a PES requires both reliable and efficient electronic structure calculations. That is, a sufficient level of accuracy is necessary to obtain meaningful results although, at the same time, the computational demands must not be excessive. Consequently, the selection of an appropriate calculation level is of utmost importance when performing a PES search.
Methods using semiempirical or ML parameters based on experimental data are not well suited to the discovery of unexpected structures or reactions on a PES. That is, appropriate nonempirical or firstprinciples methods must be employed to solve eqn (1.1). Although relativistic treatments are needed for systems including heavy atoms, relativistic effects are neglected in the present discussion. In the case of standard quantum chemical calculations, the quality of the results is inevitably limited by two major problems arising from the manyelectron expansion related to electron correlation and the oneelectron expansion due to the finite sizes of basis sets.^{10,12,17 } Within wavefunction theory, four main methods may be used to treat electron correlation: configuration interaction (CI), manybody perturbation theory (MBPT), coupled cluster (CC) and quantum Monte Carlo (QMC).
Typically, basis sets are classified based on their size as either minimal or single zeta, double zeta, triple zeta and so forth, with additional diffuse and polarization functions. A wellknown issue associated with atomcentered localized basis functions is the basis set superposition error (BSSE). For any given extent of separation of two molecules, the BSSE can be approximated using the counterpoise (CP) correction,^{10,12,17 } although calculating CP corrections for an entire PES is cumbersome and increases the computational demands. In contrast, complete basis set (CBS) models^{12,17 } do not require CP correction. One possible solution to this problem is to select an appropriate basis set that yields relative energy values reasonably comparable to those obtained at the CBS limit.
A suitable spin multiplicity, such as singlet, doublet or triplet, has to be selected as a component of the input to electronic structure calculations of a PES. Moreover, the correct treatment of openshell scenarios requires unrestricted levels of calculations to be performed. In cases where biradical dissociation is involved, unrestricted levels are required to obtain a wellbehaved (smooth) PES. In such cases, the effects of spin contamination must be adequately addressed if the 〈S^{2}〉 value begins to deviate from S(S + 1).^{10,12,17 }
Readers are encouraged to consult computational chemistry textbooks for further details about suitable levels of theory, simulation methods and basis set selection.
Abbreviations
 BSSE
Basis set superposition error
 CBS
Complete basis set
 CC
Coupled cluster
 CI
Configuration interaction
 CP
Counterpoise
 CPU
Central processing unit
 DC
Dissociation channel
 DDC
Descending (downhill) DC path beginning from a TS point
 EQ
Equilibrium
 EQs
EQ structures
 IRC
Intrinsic reaction coordinate
 MBP
Manybody perturbation
 MD
Molecular dynamics
 MEP
Minimum energy path
 ML
Machine learning
 PES
Potential energy surface
 QM
Quantum mechanics
 QMC
Quantum Monte Carlo
 TS
Transition state
 UDC
Ascending (uphill) DC path beginning from an EQ point