Skip to Main Content
Skip Nav Destination

Quantum mechanical/molecular mechanical (QM/MM) methodology has seen significant growth over the last couple of decades such that it is now considered as an established methodology within the the theoretical chemist's toolbox. The usefuleness of QM/MM methods in understanding mechanistic phenomena within the realm of enzymology is well represented in the literature. However, the application of this powerful method to other biologically relevant problems is less well explored. In particular this review focuses on the application of QM/MM methods to the established field of computational drug design. An initial discussion on the different approaches to using QM/MM methods within drug discovery is followed by an outline of the methodological aspects. Finally some key applications of QM/MM methods in the field of structure based drug design illuminate the important role that these methods play in both providing detailed information about binding interactions and the development of more accurate scoring protocols.

Quantum mechanical/molecular mechanical (QM/MM) hybrid methods have their origins in the 1970s in the pioneering work of Warshell and Levitt.1  However, until the last two decades, the uptake of this methodology and its application to solving problems of biochemical interest was essentially non-existent. In the beginning of the 1990s the situation changed dramatically and since this time there has been an explosion in the number of publications that use QM/MM methods to study systems of biochemical interest. Several recent reviews in the literature have provided excellent coverage of the recent developments with a strong focus on the usefulness of QM/MM methods to studying enzyme reactions.2–25 

The dramatic uptake of QM/MM methodology has been driven by significant developments in the methodology accompanied by a rapid increase in computer power and decrease in computer cost. Moreover, along with the development in QM/MM methodology, the development of QM methods to deal with larger systems with ever increasing accuracy has allowed these high-level QM methods to be applied to biochemical systems. One can now find examples in the literature where enzyme-catalysed reactions have been modelled computationally to within chemical accuracy (ca. 1kcal/mol) – a feat unimaginable only a few years previously.26  Despite these impressive strides in accuracy and efficiency, the application of QM/MM methods to the field of drug design has received far less attention until recently.8,13,18 

Computational drug discovery is a well-established discipline within the field of computational chemistry.27–56  Computational chemistry can be applied to the development of new drugs from both the perspective of the small molecule34,47,57–59  (drug-like compound) and the biological target (receptor).27,29,39,45,55  In cases where the drug target is unknown or a structure of the target receptor is unavailable, a quantitative–structure activity relationship (QSAR) can be developed by comparison of several ligands with varying biological responses (activities).34,47,57–59  One of the oldest and most popular 3D-QSAR methods is a Comparative Molecular Field Analysis (CoMFA), which uses an sp3-hybridized carbon with a charge of +1 to evaluate steric and electrostatic features of the compounds in the training set.60–64  By comparing the steric and electrostatic features of the compounds in the training set as a function of their activity a relationship between regions that benefit from having certain properties in relative molecular positions (e.g. decreased steric bulk next to a negatively charged region of the molecule) can be identified. While both ligand-based and receptor-based methods have seen significant successes throughout their application, in the current review the focus will be on receptor (structure) based methods.

Structure-based drug design (SBDD) stems from the lock-and-key principle originally proposed by Emil Fischer over 100 years ago. In the lock-and-key model the ligand and the receptor are both considered as rigid objects and the activity of a ligand results from the complementary matching of the steric and electrostatic features of the ligand (key) into the receptor's binding pocket (lock). This principle was subsequently revised in 1958 by Koshland with the proposal of the induced-fit model whereby both the ligand and the receptor are considered to behave dynamically and as such the ‘lock’ and the ‘key’ are both able to adapt to some extent to provide a better match where possible during the binding process. However, modelling the dynamic flexibility of a complete macromolecule (receptor) is still not computationally feasible when one wishes to screen a large database of potential drug-like molecules. As such, within modern docking methods a compromise between the lock-and-key method and the induced fit model is generally applied. These methods treat the ligand as a flexible molecule (or a series of different conformations of the ligand is used) and is docked into the rigid receptor,65–74  or semi rigid receptor75–80  (where flexibility of the side chains in the binding pocket is allowed). Irrespective of the approach used, the key requirement for SBDD is a reliable structure of the target macromolecule.

The SBDD approach usually involves multiple stages. These can be broadly classified as follows.

  1. Generation of poses/conformations of the ligand in the active site – this process determines whether the ligand is able, and if so in which orientation, to physically fit into the active site.

  2. Scoring and ranking the individual poses – in this process the static poses generated in the first step are given a score, usually based on electrostatic fit between the ligand and the residues that compose the binding site.

  3. Refinement of the docked structures and their subsequent ranking – this third step is the most complicated and various different approaches exist to introduce ligand and binding site flexibility into the refinement of the structure (e.g. molecular dynamics simulations, random search methods, etc.).55 

Quantum mechanical/molecular mechanical (QM/MM) scoring refinement focuses on the third stage of the SBDD approach – the structural refinement and determination of the final ranking. This refinement process has several problems at a computational level. The methodology employed in the refinement needs to be efficient in order to deal with the large numbers of ligands and poses that need to be refined. Because of this the use of molecular mechanics (i.e. empirical force field based approaches) has been the method of choice. However, empirical methods are inherently limited in their accuracy to the quality/specificity of the parameterization.81,82  As a result, in repeated regular structures, such as proteins, which are combinations of only 20 amino acids, force field methods are remarkably successful (e.g. the CHARMM,83,84  AMBER,85,86  GROMOS,87,88 etc. force fields).89  However, in the case of ligands, which contain an infinite variety of chemical motifs, the parametrization of effective force fields is much more difficult. As such, docking and scoring programs generally rely on universal force fields for modelling both the ligand and the protein, which are much less successful, but offer a common level of accuracy between the protein and the ligand. In principle, the use of quantum mechanical (QM) methods in the place of molecular mechanical methods would solve these problems, offering a common and accurate description of both the ligand and peptide. However, the main drawback for QM methods is that their computational cost prohibits their application to such problems.

The middle ground between purely ab initio and purely empirical methods are the set of methods known as semi-empirical methods. These methods offer the chemical flexibility of the ab initio methods (though not necessarily the accuracy) but are orders of magnitude more efficient. Throughout the literature one is able to find several examples of the applications of semi-empirical methods in the context of QM/MM biochemical modelling.8,13,90–100  The latest semi-empirical methods (OMx,101–103  SCC-DFTB,104–106  PDDG/PM3,107  PM6-DH2,108 etc.) offer an accurate and efficient description of the structure and energetics of small molecules. Moreover, these methods have been successfully combined with biochemical force fields to provide an accurate description of the interactions in the binding site, between a ligand – treated at the QM level of theory – and the protein – treated at the MM level of theory.8,13,98,99,102 

The general strategy in QM/MM scoring refinement is to run a QM/MM minimization of the binding site (ligand+protein) to refine the structure and binding energies of the various poses obtained from the first two stages of the virtual screening. This approach simultaneously provides a more accurate modelling of both the ligand and binding site and, as such, should provide enhanced differentiation between the various ligands, as well as a more detailed understanding of the interactions that are present in the binding site. As such, this method should provide considerable benefit in lead optimization activities.

In Section 1.2 an outline of the main QM/MM methodology employed in QM/MM scoring refinement is provided. In Section 1.3 two illustrative examples of the application of this methodology to challenging scoring problems are provided. In the final section (1.4) the main difficulties associated with QM/MM scoring are addressed and the potential future directions of the field are discussed.

The basic concept of a QM/MM method is to describe the chemically interesting region of the system (in this case the ligand and potentially the binding pocket) using a QM method and the remainder of the system, the environment, at the MM level of theory. In the most common case the ligand binds to the receptor non-covalently (Figure 1.1) and as such only the ligand is treated at the QM level of theory as the MM description of the protein (receptor) is reliable when appropriate biochemical force fields are chosen. The description of the ligand region using a QM method and the receptor using an MM method is trivial. However, the goal of any QM/MM method is to describe the interaction between the two regions – in other words, how do we best describe the coupling between QM and MM parts of the system? The coupling terms between the ligand and the receptor are exactly the interactions that will dictate the binding ability of the ligand and can be quite strong. Indeed, a balanced treatment of the three components, QM, MM and the coupling terms are equally important for any QM/MM scoring function to be successful.

Figure 1.1

Partitioning of the full system into the QM region (Ligand – dark grey) and the MM region (Receptor – light grey).

Figure 1.1

Partitioning of the full system into the QM region (Ligand – dark grey) and the MM region (Receptor – light grey).

Close modal

There exist two families of QM/MM approaches: those based on the additive scheme and those based on the subtractive scheme (see Section 1.2.2). The additive scheme, initially proposed by Warshel and Levitt in 1976,1  is employed in the majority of QM/MM programs presently in use and generally contains three terms:

Equation 1.1

where EQM/MM(Tot) is the total energy of the full system calculated at the QM/MM level of theory, EQM(Lig) is the energy of the ligand calculated at the QM level of theory, EMM(Rec) is the energy of the receptor calculated at the MM level of theory and EQM–MM(Lig,Rec) is the explicit coupling term between the ligand and the receptor. As we shall see in the subtractive scheme there is no explicit coupling term present; however, the variation in how the coupling term is defined allows one to further differentiate between various additive QM/MM approaches. These approaches were described in detail by Bakowies and Thiel in 1996,109  and have recently been reviewed in the literature; as such, an account will not be provided here.21  The QM/MM approach discussed herein refers to the electrostatic embedding approach,109  unless otherwise stated. For details of other methods the reader is referred to the literature.2,4,5,9,10,14,19–21,110–112 

Where there are no bonds that cross the QM/MM boundary, as is the case for non-covalently bound ligands, the coupling term, in the electrostatic embedding approach, has two components:

Equation 1.2

where refers to the van der Waals (vdW) interactions between the ligand atoms and the atoms in the receptor; and refers to the electrostatic interactions between the ligand and receptor atoms. If a covalent bond exists between the QM and MM regions a boundary scheme needs to be chosen in order to properly describe the forces acting across the boundary and also to ensure appropriate ‘capping’ of the QM region. The various schemes for dealing with QM–MM boundary regions have been recently reviewed and the interested reader can find details of these from the reviews.10,20,25 

The vdW interactions are treated at the MM level of theory and as such the atoms in the ligand require vdW parameters to be assigned to them. This assignment is usually done by analogy with ‘similar’ atom types that exist in the force field. However, alternative, and more rigorous, approaches for assigning vdW parameters to the atoms treated at the QM level of theory have also been presented.16,113  The error introduced by the MM treatment of the vdW interactions between the ligand and the receptor regions is expected to be minimal due to the short-range nature of these interactions.114 

Due to its long-range nature, the predominant coupling term between the ligand and the receptor regions is the electrostatic term (). Within the electronic embedding approach, the electrostatic coupling between the ligand and receptor descriptions is achieved by performing the QM calculation in the presence of the point charges (taken as the partial charges assigned within the force field) of the receptor. This is achieved by including the partial charges as one-electron terms in the QM Hamiltonian, resulting in the additional term (), defined in Equation (1.3).

Equation 1.3

The electron positions are given by ri, the point charges, as defined in the force field, for the receptor are denoted qj and are located at Rj and Qk are the charges on the nuclei in the ligand located at Rk. The indices i, j and k correspond to the N electrons, the L point charges in the receptor and M nuclei in the ligand, respectively.

The inclusion of the point charges from the receptor in the Hamiltonian results in the polarization of the electronic structure of the ligand. If the charges in receptor change their positions during a geometry optimization or a molecular dynamics simulation, then the electronic structure is able to respond to these positional changes realistically as the electrostatic coupling is treated at the QM level of theory.

The long-range nature of the electrostatic interactions imbues them with a special significance in QM/MM calculations. The long-range nature implies that a significantly large proportion of the environment needs to be included in a calculation in order to correctly reproduce the electrostatic effects on the chemically interesting region of the system.7,115,116  However, increasing the size of the environment (receptor and solvent) carries two important pitfalls; (1) the larger the number of degrees of freedom in a system, the greater the chance of falling into localized, disconnected, minima; and (2) the larger the system, the more computationally intensive the calculation becomes. During the initial development of QM/MM methods this second factor was considered essentially irrelevant as the calculation of the QM component was expected to be the rate-limiting step. However, as modern QM methods have increased in both speed and accuracy this is no longer necessarily true, particularly when semi-empirical methods are employed for the QM calculation.

The concern of localized, disconnected minima is particularly troublesome in SBDD where the goal is to determine the relative binding energy of different ligands in the binding site and often the variation can be as little as several kcal/mol across a series of competitive ligands. Given this small window of energy differences, spurious variations in ranking that can occur from changes in the environment (e.g. alterations in the H-bonding network of the solvent) or in distant parts of the receptor (e.g. rotation of an amino acid on the protein surface causing a stronger or weaker salt-bridge to be formed) are unacceptable. One approach that has been commonly employed to deal with this problem has been to freeze atoms in the solvent and receptor that are farther than a pre-determined distance (typically 10–15Å) away from the ligand atoms. This procedure reduces the number of degrees of freedom in the QM/MM simulation and treats the ‘frozen’ part of the system as a fixed external potential that interacts with the mobile inner ‘core’ of the system. Despite the fixed nature of the outer region, this approach still requires the computation of all interactions between the frozen atoms in the system in addition to the interaction between every atom in the frozen region with every atom within the inner core region. Thus, freezing the more distant components of a system does not alleviate the second pitfall – i.e. the computational cost associated with extended, solvated systems.

This problem of increasing system size has, of course, also plagued those in the MM community where there has been significant effort in developing accurate methods to treat long-range electrostatics. For large systems, where periodic boundary conditions are employed, the use of the Ewald summation is well established and has been employed in the MM community for some time.117–119  This approach has also been extended and implemented in a number of packages for QM/MM simulations as well.96,120,121  However, the majority of QM/MM simulations are performed on finite (non-periodic) systems where the Ewald summation is no longer valid.116,122–124  Therefore the use of solvent boundary potentials125–129  has become increasing prevalent.

The generalized solvent boundary potential126  (GSBP) is of particular interest and has already been implemented by a number of QM/MM developers.130–134  Within this approach the total system (Tot) is divided into three components – the receptor, which has the same definition as stated previously and is treated at the QM level of theory, an outer region, which includes parts of the receptor and solvent that is within a defined distance from the receptor and is again treated atomistically at the MM level of theory, and finally an extended outer region for all atoms that are outwith the defined boundary. Within the GSBP approach this extended outer region can include irregularly shaped parts of the macromolecule as well as bulk solvent.130,134  The solvent molecules are described by a continuous polarizable dielectric, whilst the irregular charge distribution is described by a fixed set of point charges. This approach has been implemented and tested for a number of biomolecular simulations and it is a promising approach for the treatment of long-range electrostatic interactions in QM/MM simulations, which can easily be extended to a range of QM/MM applications, including the scoring refinement step in SBDD.130–134 

The IMOMM (integrated molecular orbital/molecular mechanics) method by Morokuma and co-workers is the prototypical example of a subtractive QM/MM method.135  The IMOMM method was subsequently extended by the authors to allow for the combination of two QM methods in the IMOMO136  and finally to the popular ONIOM (our n-layered integrated molecular orbital and molecular mechanics) method.137,138  However, the later ONIOM and IMOMO methods are not, strictly speaking, QM/MM approaches given that the lower level of theory is often a semi-empirical or ‘cheaper’ QM method such as Hartree-Fock, rather than an MM method (a two-level ONIOM method, where the lower level is an MM method, corresponds to the IMOMM approach, which is most commonly used in SBDD studies).139–141  The strength of the subtractive QM/MM method is primarily in its ease of implementation. There is no explicit coupling between the regions calculated at the QM and MM levels of theory (the EQM–MM(Lig,Rec) term in Equation 1.1). Rather, the coupling between the ligand and its environment is handled at the lower (MM) level of theory.

Equation 1.4

The subtractive scheme requires three separate calculations in order to construct the total energy of the system. The first calculation is at the MM level of theory on the entire system. The ligand is then calculated at the QM level of theory and finally, independently, at the MM level of theory. By subtracting the MM energy of the ligand the subtractive scheme removes all ‘internal’ MM contributions of the ligand from the MM energy and thus results in an MM energy that contains only the energy of the receptor and solvent and the MM energy of the interaction between the ligand and its environment. Thus, within the subtractive method the coupling of the ligand to the receptor is simply contained within the initial MM calculation of the full system.

The simplicity of the subtractive method, unfortunately, does come at a cost. As the coupling interaction between the ligand and its environment is handled completely at the MM level of theory, each ligand needs to be well parameterized within the force field representation that is being used. While this is possible for small databases of ligands that are being screened,139–141  one of the driving concepts behind QM/MM SBDD is the ability to accurately represent the vast chemical space that ligand compounds occupy using QM methods. Finding or developing adequate parameters for a full library of compounds can be quite challenging and will generally be non-transferrable between different systems or different force fields. The accuracy of the subtractive method is therefore limited by the accuracy of the MM representation of the ligand.

This type of coupling, at the molecular mechanical level of theory, is often referred to as mechanical embedding.109  The QM calculation of the ligand is generally not performed in the presence of the point charges of the environment and as such the QM density is not polarized by the environment. Alternative subtractive schemes have recently been introduced that now allow electrostatic embedding.142–145 

The overwhelming application of QM/MM methods within structure-based drug design can be divided into two main groups. Firstly, there are several applications of QM/MM methods to gain a detailed understanding of the binding site interactions that play a role in discriminating between closely related ligands. In this type of application there is often only a small number of closely related ligands, which nonetheless demonstrate a large degree of diversity in their binding ability. From an empirical perspective the variation in activity of closely related ligands is often difficult to explain and, as such, detailed investigations into the binding modes of such ligands can shed further light onto the mechanism by which the ligands are stabilized in the binding site and consequently can aid in the further development of new lead compounds. The second main application of QM/MM methods is in the scoring refinement of a set of compounds based on an initial screen. The scoring refinement typically will make use of semi-empirical methods and focus less on the details of the binding mechanism, but rather uses the QM/MM method to gain an accurate representation of the various ligands, relative to the initial force field based screening protocol. In the following we consider two case studies that highlight the important role of QM/MM methods in both of these application areas.

The use of QM/MM methods to understand the various factors that help to differentiate the binding ability of closely related ligands is exemplified by two recent publications by Tuttle et al.98,102  The first of these demonstrates how both the non-covalent interactions between the binding site and the effects of strain energy on ligands, imposed by the binding site, affect the binding ability of Latrunculin A and two of its analogues to G-actin.98  In the second study, the role of dispersion in helping to stabilize ligands in the binding site was explored through dispersion augmented (-D) QM-D/MM methods.102 

Latrunculins form 1:1 complexes with globular actin (G-actin), which interferes with the ability of the protein to polymerize and form actin filaments. The use of these compounds in chemical biology as probe molecules is widespread and as such a ‘diverted synthesis’ campaign was carried out by Fürstner et al.98,146  in order to develop a library of closely related analogues that could shed light onto the mechanism by which the compounds achieved their remarkable biological activity. As a result of this study, one compound (L32, Scheme 1.1) was identified as having biological activity that was as good as, or even better, than the naturally occurring latrunculins. The aim of the computational study was therefore to elucidate the binding mode of Latrunculin A (LA, Scheme 1.1) and two of its analogues – the naturally occurring compound Latrunculin B (LB, Scheme 1.1) and the synthetic analogue L32 – to G-actin. Experimentally, the binding strength was measured by the ability of the compound to inhibit the biological response of actin towards fibroblasts. The experimental study revealed that the synthetic analog (L32) was as effective an inhibitor as LA (LA≥L32>LB).98 

Scheme 1.1

Naturally occurring compounds Latrunculin A (LA), Latrunculin B (LB), and the synthetic analogue L32.

Scheme 1.1

Naturally occurring compounds Latrunculin A (LA), Latrunculin B (LB), and the synthetic analogue L32.

Close modal

The ‘standard’ approach for preparing biological systems for modern QM/MM studies was applied. Following the preparation of the receptor and environment, the ligands were constructed and a conformational search was carried out in order to determine low-energy conformers of the relatively flexible ligands. The resulting low-energy conformers were then docked into the receptor, the resulting complexes were then minimized and finally optimized at the QM/MM level of theory.

The structure of actin with LA bound is available from the protein data bank (PDB ID: 1ESV).147  There are several unresolved residues in the protein structure, although as these residues are remote from the binding site, the backbone was capped with neutral residues at these positions. The protonation state of the ionizable residues was checked with propKa148  and reduce,149,150  which was also used to determine the orientation of the ambiguous residues (His, Asn, Gln). The crystal waters, three Ca2+ ions and ATP were retained; however, no additional hydration was performed.

A conformational search was performed for LA, LB and L32 using the PM3 Hamiltonian as implemented in Spartan.151  The ten lowest energy conformers of each molecule were selected and subsequently optimized within MNDO99152  using the MNDO/H Hamiltonian,153,154  which provides reasonable geometries and relative energies for hydrogen bonded systems.153,155  The reliability of the semi-empirical structures was checked via re-optimization of the conformers at the BLYP156,157  level of theory with the def2-SVP basis set158  available in Turbomole.159–163 

For each compound, the ten conformers (optimized with BLYP) were then docked into the actin-binding site using AutoDock 3.0 (AD3).70  The AD3 grid was constructed such that it encompassed and was centred on the binding site of LA (available from the original PDB). AD3 is not parameterized for substrates, such as ATP or Ca2+, and as such these residues are ignored during the docking procedure. As ten different conformational variants of each ligand were tested, the ligand was kept rigid during the docking process. The orientation resulting in the best binding energy for each conformer was selected from each docking run for optimization.

The complex with the best binding energy from each docking simulation was pre-minimized in CHARMM.84,164,165  The standard CHARMM force field166–168  used in this work does not contain parameters for the LA, LB or L32. However, given the preparative nature of the force field minimization (i.e. optimization of the full complex would be performed at the QM/MM level of theory), the parameters and topology of the ligands were automatically generated using the commercial CHARMm force field169  in the Accelrys suite of programs. The ligands were kept rigid and the remainder of the system was then relaxed during a 10 000-step ABNR minimization.

The structures resulting from the force field minimization were optimized at the hybrid quantum mechanical/molecular mechanical (QM/MM) level of theory with the modular program package ChemShell.170,171  The QM energy and gradients were provided by MNDO99,152  with MNDO/H153,154 as the QM level of theory. Chemshell's internal force field driver, using the CHARMM parameter and topology data,166–168  supplied the MM energy and gradients. No electrostatic cut-offs were employed in the QM/MM calculations. Electrostatic embedding109  was used to couple the QM and MM regions. The QM/MM geometry optimizations were performed with the HDLCOpt algorithm,172  as implemented in ChemShell. Only the most stable orientation of the ligand in the binding site, as judged by the complex with the lowest QM/MM energy after optimization was used in the analysis of the ligands’ binding ablities.

The binding sites of the three ligands form the QM/MM optimized structures (see Figure 1.2). The binding site of the LA complex (see Figure 1.2a) revealed a H-bond network similar to that observed in the PDB crystal structure that the complex was derived from.147  This similarity both in the orientation and the H-bonding network between the docked and optimized structure (see Figure 1.2a) and the crystal structure indicates that the methods applied were able to accurately reproduce the correct binding mode. The complex exhibits strong H-bonds between the head group and the surrounding residues, while the macrocycle has no obvious H-bond pairing with the protein. In the head group the NH functionality forms a strong H-bond to Asp107 and the carbonyl forms a strong H-bond to Thr136, while there exists only a weak interaction with Lys163. The O atom of the 6-membered ring (O1) forms a strong H-bond with Tyr19.98 

Figure 1.2

H-bond network of (a) LA (b) LB and (c) L32 in the actin binding site after the QM/MM optimization.

Figure 1.2

H-bond network of (a) LA (b) LB and (c) L32 in the actin binding site after the QM/MM optimization.

Close modal

Analysis of the H-bonding between analogues (LB and L32) and the receptor revealed some changes in the H-bonding network. In the case of LB, these changes led to a slightly weaker H-bond network with the two strong and one average H-bond in the LA complex being exchanged for one strong and two average H-bonds in the LB complex (see Figure 1.2b). However, for the L32 ligand, although there is a shift in the binding pattern, the qualitative strength of the H-bonds (two strong and one average) remains similar to the LA complex. These minor changes in the hydrogen bonding network did offer some insight into the relative binding ability and, consequently, the biological activity of the three compounds (i.e. LA≥L32>LB) although it was unlikely that such minor changes in the binding enthalpy, associated with the redistribution in H-bonding, could fully explain the observed effect on the overall binding ability.98 

The changes in the H-bond network of the binding site offer a qualitative explanation for the observed differences in the biological activity of the three ligands. However, the contribution from the hydrophobic effect will aid in stabilizing the ligands in the binding site. In Figure 1.3, the hydrophobicity of the binding pocket in the LA complex is mapped onto a Connolly surface173,174  of the receptor. The hydrophobicity is determined by the partial charges on the amino acid residues (assigned in the CHARMM force field). The majority of the macrocycle prefers a hydrophobic (non-polar) environment (green regions in Figure 1.4). This is achieved by a small part of the macrocycle (lower left region in Figure 1.4), although the majority of the system is solvent exposed (no surface) or surrounded by polar (red) residues.

Figure 1.3

Hydrophobicity mapped onto a Connolly surface of the receptor in the LA complex.

Figure 1.3

Hydrophobicity mapped onto a Connolly surface of the receptor in the LA complex.

Close modal
Figure 1.4

The dispersive interactions (black dashed lines) stabilizing the hapten in the binding site are described with the same accuracy as the ionic H-bonds between GluH50 and the hapten (pink dashed lines).

Figure 1.4

The dispersive interactions (black dashed lines) stabilizing the hapten in the binding site are described with the same accuracy as the ionic H-bonds between GluH50 and the hapten (pink dashed lines).

Close modal

The hydrophobic contribution to the stability of the complex could be estimated using an empirical function based on the change in the solvent accessible surface area (ΔSASA) in the bound and unbound states. The relationship proposed by Spolar and Record:175 

Equation 1.5

results in a stabilizing free energy contribution of −20±4.5 kcal/mol when the LA complex is formed. For the smaller ligands LB and L32 the stabilizing effect is slightly smaller (−19±4.3 kcal/mol and−17±4.3 kcal/mol) as a greater percentage of the ligand remains solvent exposed in the optimized structures. The calculated values clearly indicate that the binding of the ligand is favourable from a hydrophobic perspective; however, given that the error bars for this empirical approach the criteria offers no clear distinction between the three ligands.

The individual binding energy of the substrates can in principle be calculated directly. However, binding potential energies generally over-estimate the binding free energy of a ligand to the receptor, which is measured experimentally, due to the neglect of the dynamic factors that influence the binding ability of ligands. Nonetheless, given the structural similarity of the three compounds it was reasonable to assume that entropic and desolvation effects are approximately equivalent for the three ligands. Therefore, the relative binding potential energies (ΔΔEbind) may be reasonable.

The lowest energy structures of the complexes and the ligands optimized in the gas phase (QM=MNDO/H) can be used to calculate binding energies. For LA the binding potential energy is −72.1 kcal/mol. The relative stabilities of the ligands and the complexes can be determined from the isodesmic reactions defined in Scheme 1.1. The partners in the isodesmic reactions (pent-2-ene, hept-2,4-ene, nona-2,6-dienoic acid and 3,8-dimethyl-2,6-dienoic acid) were optimized in the gas phase at the MNDO/H level of theory. The isodesmic reactions indicate that the LB complex is destabilized by 12.5 kcal/mol and the L32 complex by 7.5 kcal/mol, relative to the LA complex. Applying the same isodesmic reactions to the isolated ligands, we find that LB and L32 are destabilized by 2.3 and 1.1 kcal/mol relative to LA in the gas phase.98 

The cycle depicted in Scheme 1.2 demonstrates that the relative binding energy (ΔΔEbind) of the ligands is the difference of their relative stability in the enzyme (ΔEisoR) and in the gas phase (ΔEisoL). Using the values obtained from the isodesmic reactions, the relative binding energies decrease in the order LA> L32> LB (0 versus 6.4 versus 10.2 kcal/mol). In all cases the binding of the ligand to the receptor causes a destabilization of the ligand due to the steric constraints imposed by the binding site. The LA and L32 ligands are destabilized by similar amounts (15.2 kcal/mol and 16.8 kcal/mol, respectively), but much less so than the LB ligand (21.5 kcal/mol). A similar distortion results for the receptor upon binding which is consistent with the ligand size, i.e. as the ligand size decreases, the distortion of the receptor structure from the unbound state decreases.98 

Scheme 1.2

(a) Thermodynamic cycle for calculating relative binding energies. (b) Isodesmic reactions for comparing the stability of LAR to LB and LB to L32.

Scheme 1.2

(a) Thermodynamic cycle for calculating relative binding energies. (b) Isodesmic reactions for comparing the stability of LAR to LB and LB to L32.

Close modal

Through the QM/MM optimization of the ligand–receptor complexes this study was able to determine that the distortion energies of the ligands, in concert with the differences in the H-bond network, resulted in the large variation in the biological activity for the ligands. The LA and L32 ligands both have H-bond networks that are similar in strength, equally, the ligand distortion of L32 and LA are similar. Thus, LA and L32 have similar binding abilities to the receptor. In contrast, the H-bond network is slightly weaker for LB and the distortion of the ligand from its optimum geometry is larger in this case.

Dispersion interactions, such as π-stacking, play an important role in biochemical structure and stability. As such there are often a number of π-stacking partners in a binding site that a ligand is able to interact with in order to create a stable complex. However, describing dispersion interactions using DFT or semi-empirical methods is notoriously difficult and as such the quality of the methods in describing such interactions within a QM/MM context has been unclear. However, a recent study on the binding of a hapten to the 34E4 enzyme has illustrated the utility of dispersion-augmented QM methods in obtaining an accurate description of the role of both dispersion and hydrogen bonding interactions in receptor–ligand binding.

Dispersion-augmented QM methods (usually denoted by the suffix -D to the QM method, e.g. OM3-D) contain an additional parameterized term, with the familiar r6  scaling.176,177  Despite the empirical nature of this term, several studies at both the DFT and semi-empirical level have shown that with the appropriate parameterization the inclusion of an empirical dispersion term greatly increases the accuracy of these QM-based methods.95,102,176–180 

Within a QM/MM context, mechanistic studies of enzyme-catalysed reactions181  have revealed that including an empirical dispersion term with DFT-based methods for modelling the QM region improves the accuracy of the calculated potential energy landscape. However, the use of dispersion-augmented methods within a SBDD format is equally important if selected residues of the biomolecule are also included in the QM region. If the receptor is modelled completely at the MM level of theory, the dispersion interactions between the ligand and receptor are handled at the MM level of theory, as described in Section 1.2.1, and this is generally adequate. However, if selected binding site residues are included into the QM region, or if the ligand will potentially have competitive intramolecular dispersion interactions that affect the orientation and conformation of the ligand in the binding site, then a dispersion-augmented QM method is required.

A recent study by Tuttle et al. on the role of dispersion forces in the binding site of the antibody 34E4 demonstrated the importance of dispersion forces in stabilizing the binding site and binding the associated hapten (see Figure 1.4).102  Using the OM3-D semi-empirical method in combination with the CHARMM force field the authors equilibrated the hapten–34E4 complex at the QM/MM level of theory. The OM3-D dispersion terms were found to contribute 26 kcal/mol to the binding of the hapten. The use of dispersion-corrected QM methods is therefore necessary to obtain a consistent and unbiased description of the non-covalent interactions that contribute to the binding energy and consequently the ranking of relative ligands in their ability to bind to a receptor.

QM/MM methods are currently being used in rescoring (scoring refinement) protocols within the SBDD strategy. One of the most important and outstanding applications of these methods is to the field of metalloprotein receptors.8,13,99,182  The necessity of using a QM/MM method in the scoring of ligands that bind to metalloproteins systems is the requirement of obtaining a realistic representation of the charge transfer that occurs when ligands bind to the metal centre in the protein. Moreover, a recent study has also highlighted the deficiency of standard force fields when metal ions are incorporated into the binding site, which can lead to the necessity of including proximal residues from the protein into the QM region as well.182 

Proteins that contain metal ions, and as such a large charge, in the binding site can result in the polarization of proximal residues that alters significantly their charge distribution away from the generic charge distribution that is parameterized in the force field description of the amino acids. However, rather than recalculating the charges for these specific amino acids and fixing them at new potentials in the unbound state, the QM/MM methodology allows for an ‘on-the-fly’ recalculation of the charge distribution simply through the inclusion of these susceptible amino acids into the QM region as well. Clearly, the difficulty with increasing the size of the QM region lies in the additional computational cost associated with this expansion. However, with efficient and accurate semi-empirical QM methods this computational cost is not necessarily prohibitive on modern computing architectures.

In the study by Cho and Rinaldo, they used an ‘extended QM/MM rescoring method’ that uses single point QM/MM calculations on structures obtained from the docking program Glide to recalculate the charge distribution in both the ligand atoms, the metal ion and the atoms in surrounding protein amino acids.182  The recalculated charges from the QM/MM calculation are then fed back into the docking program, creating a modified force field, which is used to re-dock the ligands into the receptor-binding site. This iterative process can conceivably be continued until the fluctuations between charges, in successive docking runs, have converged. In the study by Cho and Rinaldo, their method was tested on a set of eight matrix metalloproteinases, which had associated ligands with known binding orientations and energies.182  In seven of the eight cases the correct binding mode was identified with excellent agreement between the calculated and experimental binding orientation. Moreover, the predicted binding energy was also significantly improved, relative to the binding energy predicted with the standard Glide force field. Thus, although this approach does not directly use the QM/MM potential energy in the scoring function, the use of QM/MM methods to refine the docking force field in a self-consistent manner is a further example of the utility of QM/MM methods in SBDD.

A more direct application to the use of QM/MM methods in refining the scoring of ligand binding to a receptor is exemplified by the Balaz et al. where 28 hydroxamate inhibitors were bound to the zinc-dependent matrix metalloproteinase-9 (MM9).99  In this study the authors used a four-step process to obtain a refined scoring for the ligands to the MM9 receptor, as follows.

  1. This involved an initial docking of the ligand to the receptor using a standard docking force field (FlexX) with the best poses from this study chosen based on (i) a structural criterion such that the oxygen atoms in the ligands were aligned with the catalytic zinc atom; and (ii) if (i) was satisfied for multiple poses the best FlexX ranking was then employed.

  2. After the docked poses for each of the ligands was obtained the structures were then optimized at the QM/MM level of theory in order to obtain an accurate description of the interaction of the ligand with the metal atom. The QM region for each system involved the ligand, the metal atom, three histidine residues that were covalently linked to the zinc ion and a proximal glutamate that was expected to closely interact with the ligand. The atoms involved in the QM region were used in this study (see Figure 1.5).

  3. After the initial QM optimization, the bonds and angles between the zinc ion and the ligand were constrained and a MM MD simulation was carried out on the complex in order to allow the binding site of the complex to relax and to obtain an ensemble of structures that was consistent with the bound state of the system.

  4. Finally, structures at 100 fs intervals along the 5 ps MD simulation trajectory were extracted and single point QM/MM calculations were carried out on these structures in order to obtain QM/MM binding energy for the time-averaged structures.

Figure 1.5

QM region included employed in the QM/MM study of 28 hydroxamate inhibitors binding to matrix metalloproteinase-9.99  Zinc atom (grey) displayed using the VDW radius, while all other residues are displayed in licorice style.

Figure 1.5

QM region included employed in the QM/MM study of 28 hydroxamate inhibitors binding to matrix metalloproteinase-9.99  Zinc atom (grey) displayed using the VDW radius, while all other residues are displayed in licorice style.

Close modal

The inclusion of the MD simulation and QM/MM energy on the time-averaged structures (steps 3 and 4) resulted in a significant improvement in the correlation between the experimental and predicted binding energies. In comparison with the predicted binding from the FlexX simulation, the correlation coefficient is increased from 0.044 (for FlexX) to 0.900 when the QM/MM method described above was used.99  It should be noted that this excellent correlation was only possible after the use of a ΔSASA (change in solvent accessible surface area) term in order to include an estimation of desolvation effects. Nonetheless, with this term taken into account, there is still a marked improvement in the accuracy of the correlation when changing from the single optimized QM/MM energy (i.e. obtained after step 2) with a ΔSASA term included, to the energy of the time-averaged structures (i.e. after step 4).99  This result clearly indicates the importance of including both the effect of sampling and obtaining accurate energies for the ensemble of structures in order to discriminate between the binding abilities of structurally similar ligands.

The use of QM/MM methods within a drug discovery context is still within its infancy. To date the majority of applications focus on providing a more refined, detailed understanding of specific interactions within the binding site. There exist a few text cases in the literature which exemplify the use of QM/MM methods as part of a screening protocol; however, the application of this approach to screening libraries of compounds is still a development area and has not yet been taken up significantly by the drug-discovery community.

While QM/MM methods often provide an improved description of ligand–receptor binding this description is often not sufficient to independently provide the correct ranking of ligands in terms of their binding energies. As was observed in the case of the matrix metalloproteinase study,99  assessing the hydrophobic (entropic) contribution to the binding ability of a ligand to a receptor is often of equal or even greater importance than the stability gained through individual intermolecular interactions between the ligand and the binding site residues in determining the relative binding ability of ligands. Thus while an improved understanding and description of the intermolecular interactions in the binding site is important, the current inability of QM/MM methods to accurately account for the hydrophobic effect still minimizes their impact in the field.

The area where QM/MM methods have made the most significant impact and appear likely to continue to do so, is in the biding of ligands to metalloproteins and other receptors with highly charged binding sites. The role of quantum effects in these systems (particularly effects such as charge transfer and polarization) for stabilizing the ligand binding are of such importance that the standard empirical force field approaches, which are incapable of describing these effects, are completely inadequate in. In these cases, QM/MM methods offer a significant advantage and as these methods develop further to allow more rapid and diverse screening protocols, their application to metalloprotein receptors will remain a priority.

The author acknowledges the Royal Society of Edinburgh for support through a Scottish Executive Personal Research Fellowship, the EPSRC (EP/F031769) and the Glasgow Centre for Physical Organic Chemistry for funding.

Close Modal

or Create an Account

Close Modal
Close Modal