- 1.1 Introduction
- 1.2 Physicochemical Properties
- 1.3 Lipophilicity
- 1.3.1 Measuring Log P and Log D
- 1.3.2 Zwitterions
- 1.3.3 Other Solvent Systems
- 1.3.4 Membrane–Water Partition Coefficients
- 1.3.5 Chromatographic Log D Measurement
- 1.3.6 Calculating Log P and Log D7.4
- 1.4 Ionisation Constants
- 1.4.1 Measuring Ionisation Constants
- 1.4.2 Calculating Ionisation Constants
- 1.4.3 Manipulating pKa in Medicinal Chemistry Strategy
- 1.5 Hydrogen Bonding
- 1.5.1 Polar Surface Area
- 1.5.2 Quantifying the Contribution of a Hydrogen Bond
- 1.6 Solubility
- 1.6.1 Measurement of Solubility
- 1.6.2 Calculating Solubility
- 1.7 The Rule of Five
- 1.7.1 Beyond the Rule of Five
- 1.8 Ligand Efficiency Metrics
- 1.9 Compound Quality and Drug-likeness
- 1.10 Conclusions
- 1.11 Hints and Tips
- Key References
- References
Chapter 1: Physicochemical Properties
Published: 03 Feb 2023
A. M. Davis and P. D. Leeson, in The Handbook of Medicinal Chemistry, ed. S. E. Ward and A. Davis, The Royal Society of Chemistry, 2023, ch. 1, pp. 1-39.
1.1 Introduction
Chemistry is implicitly concerned with chemical structure, but just as much of medicinal chemistry thinking is occupied with the impact of the structure on the physicochemical properties of the molecule. Optimising the shape, hydrophobicity, hydrogen bonding and charge distribution of a lead molecule to be complementary to the target binding site is central to increasing potency and selectivity. Improving solubility and permeability and other absorption, distribution, metabolism, elimination, and toxicological (ADMET) properties requires the control of bulk physicochemical properties. The control of physicochemical properties is directly related to the much used, but ill-defined, notion of “compound quality”. In this chapter we shall describe some of the more common physicochemical properties, how they are calculated and measured, and their impact on medicinal chemistry optimisation.
1.2 Physicochemical Properties
The fundamental physicochemical properties most often used in defining compound quality are shown in Table 1.1.
Property | Description |
---|---|
Molecular weight (MW) | Surrogate for molecular size |
Heavy atom count | Surrogate for molecular size |
log P | n-Octanol/water partition coefficient |
log D7.4 | n-Octanol/water distribution coefficient at pH 7.4 |
pKa | Ionisation constant |
Aqueous solubility | Equilibrium solubility at pH 7.4 |
HBA or sum of O + N atoms | Count of hydrogen bond acceptors |
HBD or sum of OH + NH atoms | Count of hydrogen bond donors |
Ro5 | Poor solubility and permeability is more likely when MW > 500, clog P > 5, O + N > 10, OH + NH > 5 |
PSA or TPSA | Dynamic or static total polar surface area |
Fsp3 | Fraction sp3 carbons = Csp3/C |
nAr | Count of aromatic rings |
Property forecast index (PFI) | Chrom log D7.4 + nAr |
Intrinsic PFI (iPFI) | Chrom log P + nAr |
LLE or LipE | Lipophilic Ligand Efficiency = p[Activity] − log P or log D |
LE | Ligand efficiency = p[Activity] × 1.37/heavy atom count (kcal mol−1 HA−1) |
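The three composite metrics at the foot of Table 1.1 are simple arithmetic, which the following minimal sketch illustrates. The function names and the example compound (pIC50 = 8.0, 30 heavy atoms, log D7.4 = 3.0, chromatographic log D7.4 = 3.5, three aromatic rings) are hypothetical and chosen only for illustration.

```python
def ligand_efficiency(p_activity, heavy_atoms):
    """LE = p[Activity] x 1.37 / heavy atom count (kcal/mol per heavy atom)."""
    return 1.37 * p_activity / heavy_atoms


def lipophilic_ligand_efficiency(p_activity, log_p_or_d):
    """LLE (LipE) = p[Activity] - log P (or log D)."""
    return p_activity - log_p_or_d


def property_forecast_index(chrom_log_d74, n_aromatic_rings):
    """PFI = chromatographic log D7.4 + number of aromatic rings."""
    return chrom_log_d74 + n_aromatic_rings


# Hypothetical compound, for illustration only.
le = ligand_efficiency(8.0, 30)
lle = lipophilic_ligand_efficiency(8.0, 3.0)
pfi = property_forecast_index(3.5, 3)
print(f"LE = {le:.2f}, LLE = {lle:.1f}, PFI = {pfi:.1f}")
```

Because LE normalises potency by size and LLE by lipophilicity, the two are usually tracked together when comparing compounds within a series.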
1.3 Lipophilicity
The lipophilicity of a molecule is the most important contributor to ADMET properties, and “reduce lipophilicity” has become something of a mantra to medicinal chemists. As we shall see in this chapter, and what will be apparent throughout the book, high lipophilicity is in general detrimental to achieving good drug-like properties.
The terms lipophilicity and hydrophobicity are used interchangeably in medicinal chemistry, probably erroneously, as the driving force for membrane partition, or partition of compounds into protein pockets, is hydrophobic rather than lipophilic. IUPAC provides distinct definitions:1
Lipophilicity represents the affinity of a molecule for a lipophilic environment.
Hydrophobicity is the association of non-polar groups or molecules in an aqueous environment, which arises from the tendency of water to exclude non-polar molecules.
While dispersion forces between non-polar surfaces do contribute to the overall partition of a compound out of water into a protein active site or non-polar environment such as the centre of a lipid bilayer, much of the driving force is hydrophobic, as we will describe further.
Thermodynamic measurements show that the free energy change governing partitioning is largely entropically driven.2 Table 1.2 shows the thermodynamic parameters for the transfer of some simple hydrophobic groups from their liquid state to water at 25 °C. As can be seen, the free energy of transfer is dominated by the entropy term, and at 25 °C the enthalpy change is almost zero.
Compound | ΔG° hydrocarbon to water (kJ mol−1) | ΔH° hydrocarbon to water (kJ mol−1) | TΔS° hydrocarbon to water (kJ mol−1) |
---|---|---|---|
(structure not shown) | +19.3 | +2.1 | −17.2 |
(structure not shown) | +26.2 | +2.0 | −24.2 |
(structure not shown) | +32.4 | +0.0 | −32.4 |
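The entropic dominance in Table 1.2 can be checked with the relation ΔG° = ΔH° − TΔS°. The short sketch below uses only the tabulated ΔH° and TΔS° values; the variable and function names are illustrative.

```python
# (dH, TdS) pairs in kJ/mol, taken from the three rows of Table 1.2.
rows = [(2.1, -17.2), (2.0, -24.2), (0.0, -32.4)]


def free_energy(dh, tds):
    """Gibbs free energy of transfer: dG = dH - TdS."""
    return dh - tds


def entropic_share(dh, tds):
    """Fraction of dG that comes from the -TdS term."""
    return -tds / free_energy(dh, tds)


for dh, tds in rows:
    dg = free_energy(dh, tds)
    share = entropic_share(dh, tds)
    print(f"dG = {dg:+.1f} kJ/mol, entropic share = {share:.0%}")
```

In each row the recomputed ΔG° matches the tabulated value, and the −TΔS° term accounts for roughly 90% or more of the free energy of transfer.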
The origin of the hydrophobic effect is still debated, not least due to the importance of the hydrophobic effect to protein folding, but the authors favour the so-called “iceberg model”.3 In bulk water, each water molecule can donate two hydrogen bonds and accept two hydrogen bonds through its lone pairs on oxygen, and in bulk water all four interactions are satisfied in a dynamic 3-D network. A hydrophobic group placed in water requires a cavity to be made to accommodate it, which disrupts the hydrogen bond network each water molecule makes with surrounding water molecules. In order to make a cavity in water for a hydrophobic group, some of the water molecules' hydrogen bonds are necessarily broken and the dynamic process of hydrogen bonds being made and broken in liquid water is disrupted. Those water molecules around the hydrophobic group become structured, almost ice-like, in trying to maximise their hydrogen bonding potential. When the hydrophobic group is removed from water and partitions into a hydrophobic receiver phase, the dynamic hydrogen bond network is re-established, the cavity filled, the ordered waters becoming disordered again, which is entropically favourable.4
When placed in water, a flexible molecule will try to minimise its exposed hydrophobic surface area and to maximise its hydrogen bonding potential to surrounding water molecules. When placed in a hydrophobic acceptor phase the opposite occurs, and the molecule will try to minimise its potential for unpaired hydrogen bonds. For these reasons, partitioning of flexible molecules has a conformational component, and the molecule's conformation in water can be different from its conformation in the hydrophobic phase. For example, the macrocyclic immunosuppressant FK506 has different conformations in chloroform and water.5 Minimising the exposure of hydrophobic surface area in water is called “hydrophobic collapse”; likewise, minimising unpaired hydrogen bonds in non-polar media can be termed “hydrophilic collapse”. The whole process has recently been referred to as “chameleonic” efficiency.6 The existence of hydrophobic and hydrophilic collapse may be seen in correlations of clog P/D vs. measured log D7.4, where the slope is often <1.0.
For partition into a protein active site, one might expect a positive contribution from dispersion forces between the hydrophobic groups on the ligand and hydrophobic sidechains in the cavity, particularly as the cavity is already preformed (unless it is induced by the ligand). Where hydrophobicity drives drug binding, plots of pKi vs. log P/D might therefore be expected to have slopes >1.0. In practice the slopes are often <1.0, suggesting little benefit in partitioning into the protein active site over partitioning into n-octanol. A few rare examples with slopes >1.0 exist.7,8 As highlighted by Fersht,7 the log(kcat/Km) for the chymotrypsin-catalysed hydrolysis of a series of esters of the form R–CH(NHAc)CO2Me increases with increasing hydrophobicity (as measured by n-octanol–water partition coefficients) with slope = 2.2. Fersht also cites the inhibition of chymotrypsin by a series of substituted formanilides, which increases with increasing hydrophobicity with a slope of 1.5.
For many years, the standard system in which to measure lipophilicity has been the partitioning of a molecule between n-octanol and water. The equilibrium of a neutral compound between n-octanol and water is measured, normally at 20 °C and the partition coefficient is reported on a log10 scale.
A log P of zero represents an equal partition between water and the n-octanol phase.
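As a minimal sketch of this definition, assuming the usual form log P = log10([compound]octanol/[compound]water) for the neutral species, the snippet below converts measured phase concentrations into log P and shows what a given log P means for the fraction of compound found in the n-octanol layer. The function names and the `volume_ratio` parameter (octanol volume divided by water volume) are illustrative assumptions.

```python
import math


def log_p(conc_octanol, conc_water):
    """log10 of the octanol/water concentration ratio of the neutral compound."""
    return math.log10(conc_octanol / conc_water)


def fraction_in_octanol(log_p_value, volume_ratio=1.0):
    """Fraction of total compound in the octanol phase at equilibrium.

    volume_ratio is V(octanol) / V(water); 1.0 means equal phase volumes.
    """
    p = 10 ** log_p_value
    return p * volume_ratio / (1 + p * volume_ratio)


print(log_p(1.0, 1.0))            # equal concentrations in both phases
print(fraction_in_octanol(0.0))   # half of the compound in each phase
print(fraction_in_octanol(4.0))   # almost all compound in octanol
```

At log P = 4 with equal phase volumes, only about 0.01% of the compound remains in the water phase, which illustrates why the aqueous concentration becomes hard to measure for lipophilic compounds.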
The solvent n-octanol became the standard hydrophobic phase for the partition experiment, as it is non-water miscible, UV-transparent, but has enough polarity due to its hydroxyl group to dissolve many drug molecules, unlike more hydrophobic alkane phases. While the properties of n-octanol have proven useful, it should be remembered that n-octanol–water is a model partitioning system, and its choice as a model system has some benefits, but also brings some compromises in quantifying hydrophobicity (for example, the n-octanol layer contains ∼33% water on a molar basis9 ). Some of these benefits and compromises will be highlighted later.
n-Octanol–water was the solvent system chosen to quantify hydrophobicity by Corwin Hansch in the 1960s in his seminal papers at the birth of modern QSAR.10,11 Since then, many compounds have had their log P values measured, and large compilations exist, which provide the basis for prediction algorithms to calculate log P, such as the classic clog P algorithm.12 The availability of good prediction algorithms and their supplementation with new measurements to improve predictive power, have sustained n-octanol–water as the industry standard partitioning system in which to measure log P.
1.3.1 Measuring Log P and Log D
The standard experimental procedure for measuring partition coefficients is known as the “shake-flask” method. Traditionally this would measure the equilibration of one compound, pre-dissolved in either water or n-octanol, shaken in a glass laboratory bottle at constant temperature until equilibrium is attained. The mixture is then centrifuged to separate the phases, each phase sampled and the concentration of the analyte in each phase measured by UV spectroscopy. It was a highly labour-intensive measurement. The method has now been automated to run on modern laboratory automation in 96-well plates, in mixtures of up to 10 compounds using LCMS with single ion monitoring, removing the throughput bottleneck of the traditional method.13 Due to the sensitivity of measuring concentrations in both phases to estimate the log P, shake flask has a dynamic range of −2 < log P < 4.
n-Octanol–water largely only supports the partition of neutral species. When the drug molecule contains an ionisable centre (i.e. is an acid, base or zwitterion), the distribution of the compound between n-octanol and water becomes dependent upon the aqueous-phase pH, and so the equilibrium must be measured at a particular pH. The distribution coefficient, D, is the partition of a compound at a particular pH and is also recorded on a log10 scale.
For an acid:
For a base:
A theoretical plot of log D vs. pH for an acid with pKa = 4 and a base with pKa = 9, both having log P = 2, is shown in Figure 1.1.
For every 1 log unit change in aqueous phase pH, either above the pKa of an acid or below the pKa of a base, the log D decreases by 1 log unit. As a standard point of comparison, the distribution coefficient is usually measured at pH 7.4, representing physiological pH. Log D measured at pH 6.5 is also used, as this is relevant to absorption from the lower intestine. The log P can still be measured if the aqueous phase pH is chosen such that the compound is largely in its neutral form, but that necessitates a bespoke experiment.
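The ideal behaviour described above can be sketched numerically. The code assumes the standard pH-partition expressions, in which only the neutral form partitions: log D = log P − log10(1 + 10^(pH − pKa)) for a monoprotic acid and log D = log P − log10(1 + 10^(pKa − pH)) for a monoprotic base. The function names are illustrative, and the example values are those of the theoretical acid (pKa = 4) and base (pKa = 9) with log P = 2.

```python
import math


def log_d_acid(log_p, pka, ph):
    """Ideal log D of a monoprotic acid; only the neutral form partitions."""
    return log_p - math.log10(1 + 10 ** (ph - pka))


def log_d_base(log_p, pka, ph):
    """Ideal log D of a monoprotic base; only the neutral form partitions."""
    return log_p - math.log10(1 + 10 ** (pka - ph))


# Theoretical acid (pKa 4) and base (pKa 9), both with log P = 2.
for ph in (2.0, 4.0, 6.5, 7.4, 9.0, 11.0):
    acid = log_d_acid(2.0, 4.0, ph)
    base = log_d_base(2.0, 9.0, ph)
    print(f"pH {ph:4.1f}: acid log D = {acid:5.2f}, base log D = {base:5.2f}")
```

Well away from the pKa on the ionised side, each unit change in pH shifts the computed log D by one unit; at pH = pKa, log D sits 0.3 below log P because half of the compound is ionised.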
The log D–pH curves in Figure 1.1 are ideal curves assuming only the neutral form can partition into the hydrophobic phase in accord with the pH partition hypothesis. Experimentally, n-octanol–water distribution coefficients do not behave as the theoretical curves in Figure 1.1 suggest. At higher pH values for an acid or lower pH values for a base, a few pH units away from the pKa, distribution in n-octanol–water can become pH independent, suggesting that partition of the ionised form into n-octanol comes to dominate over partition of the diminishing neutral form, see Figure 1.2.
However, the plateau log D value is dependent upon the aqueous phase ionic strength, suggesting the partitioning species is an intermolecular ion pair, the drug molecule taking with it a counterion from the buffer. In a study with the acidic drug proxicromil, at pH 7.4 in n-octanol–water, where the water phase contained 0.15 M NaCl, more than 95% of the observed partition of proxicromil was due to the ion pair, while only 5% was due to the neutral form of proxicromil.14
There is little literature evidence supporting the physiological importance of intermolecular ion-pair partitioning to membrane permeability, as observed in n-octanol–water partitioning. In fact, introducing an intramolecular ion pair is a successful strategy to minimise passive partitioning across the blood–brain barrier, as exemplified by the development of non-sedating antihistamines.
It could be argued that the problem of ion-pair partitioning in solvents such as n-octanol should be embraced, and that all distribution coefficients should be determined using a physiological concentration of salt in the buffer. However, this approach should be rejected, since the contribution of the ion-pairing component to the observed distribution coefficient will depend strongly on the nature of the organic solvent, being much larger for aliphatic alcohols such as n-octanol than for chloroform or alkanes.15,16
For practical purposes the existence of ion-pair partitioning seen in n-octanol should be treated as an artifact of the chosen partitioning system and dependent on experimental conditions. Experimentally one would try to avoid measuring a log Do/w on these ion-pair plateaux. This can be done by minimising the ionic strength of the aqueous phase or by extrapolating the distribution coefficient to zero ionic strength.
Ionic strength also influences the partitioning of molecules into phospholipid membranes, although the mechanism is different and, in this case, can have biological significance. In a study of the partitioning of proxicromil into DOPC membrane vesicles, an increase in ionic strength led to an increase in partitioning of proxicromil, but this is almost exclusively because increased ionic strength reduces the surface potential at the phospholipid membrane–water interface and modulates the self-induced surface charge driven by the interfacial partitioning of the proxicromil ion. This is because proxicromil can partition so that its hydrophobic structure dips into the membrane while its carboxylic acid headgroup remains exposed at the membrane–water interface. The ionic strength effect can be well accounted for by the Gouy–Chapman model of the electrical double layer.14 This indicates that the increased values of Dmem observed at higher ionic strengths are not due to the intervention of ion-pair partitioning. This is even more significant for bases, as we shall detail later.
1.3.2 Zwitterions
For a zwitterionic compound, which contains an acidic and a basic ionisable group, at aqueous phase pH values lying between the pKa values, partitioning is a maximum where the molecule can exist as a neutral and a zwitterionic species.
For a zwitterion the log D at a particular pH is given by:
Log D is at a maximum between the pKa values, but between the pKa values, the compound exists both as a zwitterionic form and a neutral form, the exact ratio being dependent on the microscopic pKa values of the different ionising species. As n-octanol–water can support the partitioning of the intramolecular ion pair and the neutral form, this may not be relevant for phospholipid bilayer partitioning, as mentioned above. If a high proportion of the neutral form is required, the separation of the pKas may also be important – as when the acid and base pKa are closer together, the proportion of the neutral form increases. Zwitterions are an important class of drugs,17 and can achieve permeability by both passive permeability (driven by the neutral form) and active transport, Figure 1.3. The fact that non-sedating histamine receptor H1 antagonists such as fexofenadine use zwitterionic nature to minimise CNS exposure while achieving good gastrointestinal permeability due to active uptake, indicates some of the benefits of designing zwitterions.18,19 Levodopa is a zwitterionic prodrug of dopamine, used in the symptomatic treatment of Parkinson's disease. In this case forming the zwitterion increases oral absorption and delivery to the brain, presumably by competition with natural amino acids for active transport.20 In the brain it is decarboxylated to release dopamine, supplementing the dopamine deficit in Parkinson's patients. A number of groups have attempted to increase the delivery of levodopa to the brain by forming ester and amide prodrugs of the prodrug to increase their passive permeability.21
As we shall see in this chapter, and as will be apparent throughout the book, both high log D7.4 and high log P are detrimental. So what are good ranges to aim for? For oral drugs, log D7.4 values of 0–2 and log P values of 0–3 would seem good target ranges, leading to compounds with both good water and good lipid solubility. Similar targets have been cited by other authors.22 These ranges for log P and log D7.4 are also suitable for CNS penetration, although oral, and especially CNS penetration, adds further limits on other physicochemical properties, as we shall go on to describe in the following sections.
Log D7.4 can be lowered either by decreasing the log P of the molecule or by moving the pKa of ionisable centres further away from pH 7.4 (lower for an acid or higher for a base). But lowering the log D7.4 of a high log P compound purely by changing pKa could, at the extremes, lead to highly amphiphilic compounds with cell lytic properties.
1.3.3 Other Solvent Systems
Water–hydrophobic solvent phase partitioning as a quantitative scale of hydrophobicity goes all the way back to the birth of QSAR. Early work used olive oil–water partitioning. Hans Horst Meyer23 first proposed and demonstrated that anaesthetic potency was related to lipid solubility. Two years later a similar proposal was made by Charles Overton.24 They showed the narcotic effect on tadpoles was proportional to the partition coefficient of the compounds from water into olive oil, as a model for the partition of the narcotics into the biological membrane. Beginning in the 1960s, after the pioneering work of Hansch and co-workers, n-octanol became the dominant system, as discussed above. But other hydrophobic solvents have been chosen in which to measure partitioning with water, including alkanes, toluene, chloroform as a hydrophobic phase possessing a weak hydrogen bond donor, propylene glycol dipelargonate as a complementary hydrophobic acceptor solvent,25 and phospholipid membrane vesicles26 as a more biologically relevant system.
1.3.4 Membrane–Water Partition Coefficients
Phospholipid vesicles have often been used as a model hydrophobic phase, of particular interest in modelling drug membrane effects. Different phospholipids have been used, but dimyristoyl phosphatidyl choline (DMPC) is commonly chosen. In order to measure partitioning, the alkyl chains need to be in the melted state, so the experimental temperature needs to be above the phase transition temperature of the membrane, which for DMPC is about 23 °C. Besides partitioning into the hydrophobic core of the membrane vesicles, ionisable compounds can partition in an interfacial way, as already mentioned for proxicromil. This is particularly efficient for bases, where the positive charge results in ionic interactions with the negatively charged phosphates present in the phospholipid headgroups, while the hydrophobic part of the molecule can reach towards the hydrophobic membrane core.
Indeed, the partitioning of basic compounds into membrane vesicles is more enthalpically driven than that of structurally related neutral compounds, which is predominantly entropically driven as expected for hydrophobic partition.26 This interfacial partition has been proposed to be the driving force for high tissue affinity of basic compounds, as shown by their high volumes of distribution (Vss)27 and enhanced lung residency of basic respiratory drugs,28 but this phenomenon may also in part drive the adaptive changes in membrane structure seen as phospholipidosis.29 A recent review indicates over 50 clinically used drugs have been shown to induce phospholipidosis.30 Most of them are lipophilic basic drugs with median log P = 4.2 and pKa = 9.2.
Due to the interfacial partitioning dominant for charged drug molecules, membrane partitioning systems are useful in understanding membrane affinity.
1.3.5 Chromatographic Log D Measurement
The n-octanol–water partition system for quantifying hydrophobicity is limited by the solubilities of molecules in the water and n-octanol phases, and for hydrophobic molecules, the analytical precision of measuring concentration differences between often high concentrations in the n-octanol phase and very low concentrations in the water phase. In order to quantify hydrophobicity in a high-throughput manner, retention times under standard conditions on reverse-phase HPLC columns,31–34 or n-octanol coated columns, have been used. These methods have achieved widespread popularity because they are high-throughput and permit a wide range of lipophilicities, beyond the scope of shake-flask methods, to be estimated using retention times on reverse-phase HPLC gradient systems.
Young and colleagues35 have demonstrated that chromatographic log D7.4 values provided improved correlations over n-octanol–water log D7.4 values with a number of key ADMET endpoints including solubility, permeability and CYP inhibition, especially for very hydrophobic molecules (although one might question why so many hydrophobic molecules were made in the first place). They also highlighted the additional and independent importance of aromatic ring count in a number of these endpoints. The property forecast index (PFI) was defined as the chromatographically determined log D7.4 plus the aromatic ring count. Aromatic ring count was found, independently of lipophilicity, to have a negative influence on a number of developability parameters, including CYP inhibition, hERG inhibition, solubility and selectivity. PFI values were better at distinguishing the top 100 marketed oral drugs in 2009 from GSK development candidates 2001–2009 than log D7.4 alone. The guidance for three key properties for development of oral drugs which emerged from this study is dose ≤100 mg, FaSSIF solubility ≥100 µg mL−1 and PFI ≤6.81 FaSSIF solubility is a biorelevant solubility measure in fasted state simulated intestinal fluid.36 As an indication that the control of physicochemical properties can mitigate unforeseen risks, Fournier and colleagues also found that the intrinsic property forecast index was related to phototoxicity.37
In summary, many different partitioning systems have been used to experimentally quantify hydrophobicity, and each system has its strengths and weaknesses. What is most important to the medicinal chemist is applying an experimental method to quantify hydrophobicity, to calibrate predictions for the next compound to be synthesised, so that this property can be controlled in the design of new molecules, as it is the fundamental compound quality indicator.
1.3.6 Calculating Log P and Log D7.4
Probably the best known log P calculator, often acknowledged as the “gold standard”, is the clog P algorithm. It was developed at Pomona College38 and is available through their own software package and also the DAYLIGHT cheminformatics system.39,40 This empirically derived calculator uses a fragment-based approach to estimate log P based on the 2-D graph of the molecule. The program fragments the molecule into polar fragments and isolating carbons (an isolating carbon is one which is not doubly or triply bonded to a polar fragment). The fragmental constants were estimated from known molecules in the measured log P database. Where a fragment is unknown, its value can be estimated; historically this resulted in a “missing fragment” error. The additive approach includes correction terms to account for neighbourhood interactions of polar atoms and groups (which reduce their overall hydrophilicity due to the interaction of their hydration shells), intramolecular hydrogen bonding and electronic effects.
A simpler but widely used log P algorithm proposed by Ghose and Crippen41 uses atom-based functions. Many different log P calculators have been proposed and are in common use, mostly being variants on the fragmental and atomic fundamental methods. The performance of over 20 algorithms, on two public databases and the Pfizer database of 95 809 measurements, has been evaluated.42 While many methods produced reasonable results on the public databases, few were successful in predicting the Pfizer internal dataset. It is in the nature of empirical models that they do well on compounds similar to those on which the model was trained (e.g. public data), but can do worse on unseen data, or more dissimilar compounds (such as the Pfizer internal database). A simple equation, based on the number of carbon atoms and number of heteroatoms, out-performed many methods:
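A minimal sketch of such a count-based estimate is given below. The coefficients are an assumption: they follow the form commonly quoted from the cited benchmarking study, log P = 1.46 + 0.11 NC − 0.11 NHET, where NC is the number of carbon atoms and NHET the number of heteroatoms, and should be checked against reference 42 before use. The function name and the example atom counts are hypothetical.

```python
def simple_log_p(n_carbon, n_hetero,
                 intercept=1.46, per_carbon=0.11, per_hetero=-0.11):
    """Count-based log P estimate.

    The default coefficients are assumed from the commonly quoted form of
    the equation and are not taken from the text itself.
    """
    return intercept + per_carbon * n_carbon + per_hetero * n_hetero


# Hypothetical molecule with 20 carbon atoms and 5 heteroatoms.
print(round(simple_log_p(20, 5), 2))
```

An estimate of this kind carries no structural information beyond atom counts, so it is best read as a baseline against which more elaborate calculators can be judged.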
Many pharmaceutical companies, with their own internal measured log P/log D7.4 databases, use QSAR approaches to either “tune” the published methods, by having them as input descriptors to a multivariate QSAR model, or calculate log P/D7.4 directly from QSAR models trained on their internal measurement databases using their favourite molecular descriptors.
For medicinal chemists the important question is “does the algorithm I use predict the lipophilicity of my chemical series with acceptable accuracy and precision?” Which method is most suitable may differ from project to project according to the chemistry being pursued. For this reason, experimental determination is necessary. However, even if log P cannot be calculated with accuracy, it can still be used within a series, as pairwise differences in lipophilicity within a series are often well predicted, even if the absolute value is incorrectly estimated. For example, within a chemical series, when structural changes are being made remote from the ionising centre, the ionisation constants are often unchanged. Correlations between measured log D7.4 and calculated log P values can then be used to guide further compound optimisation. In a series of dual β2/D2 receptor agonists, different subseries gave different correlation lines between calculated log P and measured log D (largely due to differences in the basic pKa of each subseries), but within each subseries the calculations and measurements were well correlated,43 Figure 1.4.
If automatically updating QSAR models are used to guide your project, any systematic errors in the predictions should be resolved as representatives from your series are measured and their values enter the log P model. One of the authors (AD) has spent many years calculating log D, measuring log D and updating local models for projects with the latest measurements – so projects had available the most accurate and precise predictions to aid the decision on which compound to make next.
A number of free log P calculators are available. The very useful structure–activity tool Datawarrior contains a log P calculator,44 as do SWISS ADME45 and the MedChem Toolkit iPad app.46
Free log D7.4 calculators are rarer. In order to calculate log D7.4 from log P, the ionisation constants of the molecule must also be calculated. The commercial Physchem suite from ACDLabs has implemented a fully integrated package for calculating log P, pKa and log D,47 and ADMETLAB48 has a log D7.4 calculator. QSAR methods have been used to estimate log D7.4 directly from chemical structure49 without the need to calculate log P and pKa separately.
1.4 Ionisation Constants
The ionisation constant of a compound is given by the following equation and quantified as an equilibrium constant Ka:
Since biological membranes only efficiently support the passive partition of neutral molecules, the ionisation state of a molecule is an important property. The strength of drug–receptor interactions such as charge-reinforced hydrogen bonding and salt bridges also implicitly depends upon the ionisation constant. Formulation of salts, and solubility, also critically depend on the ionisation constants. The ionisation constant, Ka, is normally recorded as its negative logarithm, the pKa, with most drugs with ionisable centres having pKa values in the range 2–12. The pKa is the pH at which the compound in solution is 50% ionised.
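For a monoprotic acid HA ⇌ H+ + A−, the usual definition is Ka = [H+][A−]/[HA], from which the Henderson–Hasselbalch relationship gives the ionised fraction at any pH. The sketch below assumes this standard form; the function names are illustrative and the example pKa values are hypothetical.

```python
def fraction_ionised_acid(pka, ph):
    """Fraction of a monoprotic acid present as the anion A- at a given pH."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


def fraction_ionised_base(pka, ph):
    """Fraction of a monoprotic base present as the cation BH+ at a given pH."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


# Hypothetical acid (pKa 4) and base (pKa 9) at physiological pH 7.4.
print(f"acid: {fraction_ionised_acid(4.0, 7.4):.2%} ionised")
print(f"base: {fraction_ionised_base(9.0, 7.4):.2%} ionised")
```

When pH equals pKa both functions return 0.5, matching the statement that the pKa is the pH of 50% ionisation.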
1.4.1 Measuring Ionisation Constants
Ionisation constants can be measured by a number of methods, including potentiometric titration, spectrophotometry and NMR. The determination of ionisation constants is not trivial: it is not amenable to high throughput and, in the cases of potentiometric titration and NMR, requires a large amount of material. Fortunately, pKas for most ionisable functional groups can be estimated by well-established linear free-energy relationships.
1.4.2 Calculating Ionisation Constants
In the 1930s, quantitative structure reactivity relationships (QSARs) defined the field of physical organic chemistry, through the work of Hammett,50 Brønsted and Pedersen, and others.51 Hammett showed that the effect of meta- and para-substituents attached to a benzene ring on reaction rate or position of the chemical equilibrium in which the reaction centre participates, is determined by the electronic effect of the substituent. Different reactions may have different sensitivities to the substituent effect. Hammett defined an experimentally based descriptor σ, using the ionisation of benzoic acids as the calibrating reaction, and σm or σp represented the difference in pKa between the meta- or para-substituted benzoic acids and unsubstituted benzoic acid itself. The sensitivity of a particular reaction to the electronic effect of a substituent was described by ρ, the slope of the correlation between the logarithm of the equilibrium (K) or rate constant (k) for that reaction and the sigma values, eqn (1.1):
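In its usual form the Hammett relationship reads log(K/K0) = ρσ, where K0 refers to the unsubstituted parent. For an ionisation equilibrium this rearranges to pKa = pKa(parent) − ρσ, which the sketch below applies to substituted benzoic acids. The σ values, the parent pKa of 4.20 and ρ = 1 (the calibrating reaction in water at 25 °C) are commonly tabulated literature values entered here as assumptions; the function name is illustrative.

```python
# Commonly tabulated Hammett sigma constants (assumed literature values).
SIGMA = {
    "4-NO2": 0.78,   # para-nitro, electron withdrawing
    "3-Cl": 0.37,    # meta-chloro, electron withdrawing
    "4-OCH3": -0.27, # para-methoxy, electron donating
}

PKA_BENZOIC_ACID = 4.20  # assumed parent pKa in water at 25 C


def hammett_pka(sigma, parent_pka=PKA_BENZOIC_ACID, rho=1.0):
    """Estimate pKa from log(K/K0) = rho * sigma, i.e. pKa = pKa0 - rho*sigma."""
    return parent_pka - rho * sigma


for name, sigma in SIGMA.items():
    print(f"{name}-benzoic acid: estimated pKa = {hammett_pka(sigma):.2f}")
```

For a different acid or base class the same function applies with that class's parent pKa and ρ, which is how one substituent table can serve many reaction series.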
In Hammett's initial publication he tabulated σ values for 31 substituents, and linear energy correlations for 39 reactions. While Hammett's observations were empirical, he recognised that they may indicate a common underlying physics-based explanation, relating to electron attraction or repulsion by inductive and resonance effects.
In the 1950s Taft extended Hammett's work to define a sigma scale for aliphatic systems.52 Taft used the acid- and base-catalysed hydrolysis of esters as the calibrating reaction. He proposed that as the acid- and base-catalysed hydrolysis of esters proceed through a very similar tetrahedral transition-state structure, and under identical conditions, both reactions would experience similar steric influences. As the base-catalysed reaction goes from a neutral ground state to a negatively charged transition state, it would be more sensitive to electronic effects of substituents than the acid-catalysed hydrolysis reaction, which moves from a positively charged ground state (rapid pre-protonation of the ester carbonyl) to a similarly positively charged transition state. Therefore, the acid-catalysed hydrolysis of esters would be less sensitive to the electronic effect of substituents, while remaining sensitive to their size. On the other hand, base-catalysed hydrolysis would be sensitive to both size and electronic effects. Taft was therefore also able to define a new steric substituent constant scale, termed Es. Taft gave the symbol σ* to the electronic effect of the aliphatic substituent. The sensitivities of a particular reaction to the electronic and steric effects of a substituent are described by ρ* and δ, respectively, the regression coefficients from the correlation between the logarithm of the equilibrium (K) or rate constant (k) for that reaction and the σ* and Es values of the substituents, eqn (1.2):
$$\log\left(\frac{k}{k_0}\right) = \rho^{*}\sigma^{*} + \delta E_\mathrm{s} \qquad (1.2)$$
where k0 (or K0 for an equilibrium constant) refers to the reference substituent.
In the 1960s Swain and Lupton went on to factor the two effects into their Field & Resonance (F & R) scales for aromatic substituents,53 building on the earlier proposal by Hammett.
Using tabulated σ/σ* values and ρ values for different ionising centres, the pKas of most ionising centres can be calculated where measurements are absent. When the author (AD) started work in the pharmaceutical industry, he spent many hours manually calculating pKas using tabulations of Hammett and Taft equations, and the book by Perrin “pKa Prediction of Organic Acids and Bases” was considered a bible.54 Nowadays, expert systems such as ACDLabs can calculate pKas by applying these empirical relationships. The software uses a set of Hammett equations, and an internal database of σ-values, together with complex structural perception to identify the electronic environment of the ionising centre. But pKas can also be calculated using the physics-based approaches of computational chemistry,55 and increasingly using machine learning algorithms56 including some open-source packages.57
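As an illustration of the kind of manual calculation described above, the sketch below estimates the pKa of substituted benzoic acids from the Hammett relationship in the form pKa = pKa(parent) − ρΣσ. The parent pKa of 4.20 and the σ values are commonly tabulated literature values quoted here for illustration; simple additivity of σ values for multiple substituents is an assumption, and the dictionary and function names are hypothetical.

```python
# Commonly tabulated Hammett sigma constants (illustrative subset).
SIGMA = {
    "m-Cl": 0.37,
    "p-Cl": 0.23,
    "m-NO2": 0.71,
    "p-NO2": 0.78,
    "p-CH3": -0.17,
    "p-OCH3": -0.27,
}

BENZOIC_ACID_PKA = 4.20  # parent pKa in water at 25 degC (literature value)
RHO_BENZOIC_ACID = 1.00  # rho is 1 by definition for the calibrating reaction


def hammett_pka(substituents, pka_parent=BENZOIC_ACID_PKA, rho=RHO_BENZOIC_ACID):
    """Estimate pKa as pKa(parent) - rho * sum(sigma), assuming additivity."""
    return pka_parent - rho * sum(SIGMA[s] for s in substituents)


for subs in (["p-NO2"], ["m-Cl"], ["p-OCH3"], ["m-Cl", "p-NO2"]):
    print(f"{' + '.join(subs)}: estimated pKa = {hammett_pka(subs):.2f}")
```

For another ionising centre (phenols or anilinium ions, for example) the same function would be used with the appropriate parent pKa and ρ value taken from a tabulation such as Perrin's.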
1.4.3 Manipulating pKa in Medicinal Chemistry Strategy
Manipulating pKa is an important strategy in drug design, to optimise potency through direct drug–receptor interactions, manipulation of overall physical properties such as log D7.4, improving solubility by introduction of an ionising centre, controlling other pharmacokinetic properties such as lung retention28 and modulating off-target activities. For example, reduction of pKa of basic amines can successfully reduce unwanted hERG activity.58
In a classic example, pKa was shown to be a major controller of antibacterial activity of sulphanilamide drugs showing a biphasic dependence on potency, Figure 1.5. From pKa 7–3, potency decreases since only the neutral form of the compound can transport into the cell (pH partition hypothesis). From pKa 7–11, potency decreases since the active species is the anion and its concentration decreases as the pKa increases, providing an optimal activity at around pKa 7.59
Ionisation constants are sensitive to conformational and stereoelectronic effects which are not always well predicted. For example, Muller showed that antiperiplanar sigma effects have a strong influence on pKa values of 4-fluoropiperidines, while 3-fluoropiperidines are subject to dipolar effects depending on axial or equatorial substitution,60 Figure 1.6. Even in alicyclic systems such stereoelectronic effects can be significant.61 For this reason one should not rely on calculations alone: where fine control of pKa is required, measurements can highlight important effects.
A comprehensive recent review by Walters62 of the role of charge type and pKa control in medicinal chemistry programs demonstrates how important pKa control can be in optimising efficacy, safety, CNS penetration and tissue distribution. For example, pKa manipulation was critical in the design of a series of geldanamycin HSP90 inhibitors. Many weakly basic anticancer agents accumulate extensively in the acidic lysosomes through ion trapping, reducing their activity, since anticancer drug targets are often localised in the cell cytosol or nucleus. Some cancer cells have defective acidification of lysosomes, which causes a redistribution of trapped drugs from the lysosomes to the cytosol. Such differences in drug localisation between normal and cancer cells can be exploited to improve cancer-cell selectivity. Controlling the pKa of geldanamycin analogues to around 8.1 gave the maximum degree of selectivity for HL-60 leukemic cells vs. normal human fibroblasts,63 Figure 1.7.
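The ion-trapping argument can be made semi-quantitative with the pH-partition hypothesis. The sketch below is an idealised equilibrium estimate, assuming a monoprotic base, that only the neutral form crosses the lysosomal membrane, and that the neutral form reaches the same concentration on both sides. The compartment pH values (4.8, 6.5 and 7.2) are illustrative assumptions, not values from the cited study.

```python
def total_over_neutral(pka: float, ph: float) -> float:
    """Total/neutral concentration ratio for a monoprotic base at a given pH."""
    return 1.0 + 10.0 ** (pka - ph)


def lysosome_to_cytosol_ratio(pka: float,
                              ph_lysosome: float = 4.8,
                              ph_cytosol: float = 7.2) -> float:
    """Equilibrium accumulation ratio, assuming only the neutral base permeates."""
    return total_over_neutral(pka, ph_lysosome) / total_over_neutral(pka, ph_cytosol)


for pka in (6.0, 8.1, 10.0):
    normal = lysosome_to_cytosol_ratio(pka)
    # Hypothetical "defective acidification": lysosomal pH raised to 6.5.
    defective = lysosome_to_cytosol_ratio(pka, ph_lysosome=6.5)
    print(f"pKa {pka:4.1f}: normal {normal:7.1f}-fold, defective {defective:5.1f}-fold")
```

In this simple model the accumulation ratio rises with pKa towards a ceiling set by the pH difference between compartments, and falls sharply when lysosomal acidification is impaired. It does not by itself reproduce the optimum near pKa 8.1 reported for the geldanamycin series, which also reflects permeability and binding.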
In a second cited example, lowering the pKa of a lipophilic basic Met inhibitor, GEN-203 (N-ethyl-3-fluoro-4-aminopiperidine), through the introduction of an additional fluorine in the 3-position of the aminopiperidine, led to a decreased volume of distribution of the compound in mouse by approximately 4-fold (from 3.6 to 0.99 L kg−1), while maintaining cell potency and in vivo efficacy against the target kinase, Figure 1.8. Compared to GEN-203, GEN-890 had reduced liver and bone-marrow toxicity in mice, Table 1.3.64
Compound | R | pKa | rHsp90 affinity Ki (nM)
---|---|---|---
GDA | | NA | 117.8
1 | | 5.8 | 42.1
2 | | 6.5 | 102.5
3 | | 7.8 | 33.7
4 | | 8.1 | 34.5
5 | | 9.9 | 203.5
6 | | 10.5 | 53.8
7 | | 12.4 | 250.3
Reduction in pKa and/or log P of a series of H1-antihistamines (for insomnia) through the introduction of a difluoroethyl side chain lowered the pKa of a guanidine-based 2-amino dihydroquinazoline 5-HT5a antagonist (from 9.9 to 8.9) and resulted in a significantly improved brain-to-plasma ratio, enhancing the pharmacological utility of these compounds.65
As we have previously mentioned, converting bases to zwitterions has been a successful strategy in producing non-sedating antihistamines. The incorporation of a carboxylic acid within a series of basic renin inhibitors to make them zwitterionic led to improved off-target profiles (CYP3A4 time-dependent inhibition and hERG affinity) relative to analogous non-zwitterionic inhibitors.66
1.5 Hydrogen Bonding
Hydrogen bonds are key drug–receptor interactions driving enthalpic binding, but also a key means of manipulating bulk physicochemical properties. It might be thought that increasing hydrogen bond strength would directly contribute to increased affinity in drug–receptor interactions dependent upon hydrogen bonds and be an easy way to optimise interactions at the ligand–target interface. Indeed, different functional groups have intrinsically different hydrogen bonding abilities, and various hydrogen bonding scales have been derived, usually by measuring the free energy of hydrogen bond formation of donors with a fixed reference acceptor, or of acceptors with a fixed donor, in non-hydrogen bonding solvents.67 But these parameters have found few applications in medicinal chemistry projects. Δlog P scales, defined as the difference between the log P values in two different solvent systems, often n-octanol–water and alkane–water, appear to encode the hydrogen bonding capacity of a solute and its uptake into the brain,68 and Δlog P measurements have recently been proposed as a way of describing intramolecular hydrogen bonding.69
One reason why intrinsic hydrogen bonding ability is less important in drug–receptor interactions may be that hydrogen bonding is an exchange process. Hydrogen bonds that the solute makes to water molecules in the solvent are broken and exchanged for new hydrogen bonds in the receptor cavity. Increasing the hydrogen bonding ability of a functional group may favour the formation of the new bond, say in the active site of a protein, but disfavour the breaking of other hydrogen bonds, say to bulk water, Figure 1.9.
The overall benefit gained by the exchange may be difficult to predict. Hydrogen bond counts are, however, widely used and, in particular, the number of hydrogen bond donors appears to be a very important compound quality metric, as it appears to have a large impact on permeability and brain penetration. In marketed oral drugs there is a wider tolerated number of hydrogen bond acceptors as it is the primary means of manipulating log P and balancing hydrophobicity. In contrast, the number of hydrogen bond donors appears more strictly “controlled”. The reasons for this are not well understood, but desolvating a hydrogen bond donor may be more difficult than desolvating a hydrogen bond acceptor. In addition, the choice of n-octanol–water as the standard system for describing hydrophobicity may overly favour the partitioning of donors compared to the centre of a lipid bilayer, due to the high solubility of water in n-octanol itself, and the different nature of dissolved water versus bulk water. This would explain why Δlog P is such a good surrogate for hydrogen bond donor count in modelling blood–brain barrier permeability. For whatever reason, it is very clear that controlling the number of hydrogen bond donors while balancing hydrophobicity, by manipulating the hydrophobic groups and hydrogen bond acceptor count and polar surface area, is a fundamental design strategy in medicinal chemistry.
1.5.1 Polar Surface Area
The total polar surface area descriptor is a means of quantifying the overall number of polar hydrogen bonding groups contained in the molecule. It has been suggested that for CNS drugs the PSA should be below 90 Å²,70 while it can be somewhat higher for systemically restricted oral drugs.71 A PSA <60 Å² has been reported to be desirable to ensure complete absorption.72 Polar surface area is a key part of Pfizer's central nervous system (CNS) multi-parameter optimisation (MPO) algorithm for identifying drugs with greater probability of success in testing hypotheses in the clinic.73,74 Their MPO uses several key physicochemical properties, Table 1.4.
Properties | Transformation | Weight | Most desirable range (T0 = 1.0) | Less desirable range (T0 = 0.0)
---|---|---|---|---|
clog P | Monotonic decreasing | 1.0 | clog P ≤ 3.0 | clog P > 5.0 |
clog D | Monotonic decreasing | 1.0 | clog D ≤ 2 | clog D > 4 |
MW | Monotonic decreasing | 1.0 | MW ≤ 360 | MW > 500 |
TPSA | Hump function | 1.0 | 40 < TPSA ≤ 90 | TPSA ≤ 20; TPSA > 120 |
HBD | Monotonic decreasing | 1.0 | HBD ≤ 0.5 | HBD > 3.5 |
pKa | Monotonic decreasing | 1.0 | pKa ≤ 8.0 | pKa > 10.0 |
This analysis indicates that higher CNS MPO desirability scores enhance the odds of identifying compounds with drug-like ADME and safety attributes, such as high passive permeability, low P-gp liability, low clearance and high cellular viability.
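The scoring scheme in Table 1.4 can be sketched in a few lines. The example below assumes that each transformed score T0 changes linearly between the "most desirable" and "less desirable" bounds, and that the six equally weighted scores are summed to give a value between 0 and 6. The function names and the example property values are hypothetical.

```python
def decreasing(x: float, best: float, worst: float) -> float:
    """Monotonic decreasing score: 1.0 at or below `best`, 0.0 at or above `worst`."""
    if x <= best:
        return 1.0
    if x >= worst:
        return 0.0
    return (worst - x) / (worst - best)  # assumed linear interpolation


def hump(x: float, zero_low: float, one_low: float,
         one_high: float, zero_high: float) -> float:
    """Hump score: 1.0 inside [one_low, one_high], falling linearly to 0.0 outside."""
    if x <= zero_low or x >= zero_high:
        return 0.0
    if x < one_low:
        return (x - zero_low) / (one_low - zero_low)
    if x <= one_high:
        return 1.0
    return (zero_high - x) / (zero_high - one_high)


def cns_mpo(clogp: float, clogd: float, mw: float,
            tpsa: float, hbd: float, pka: float) -> float:
    """Sum of six equally weighted desirability scores (thresholds from Table 1.4)."""
    return (decreasing(clogp, 3.0, 5.0)
            + decreasing(clogd, 2.0, 4.0)
            + decreasing(mw, 360.0, 500.0)
            + hump(tpsa, 20.0, 40.0, 90.0, 120.0)
            + decreasing(hbd, 0.5, 3.5)
            + decreasing(pka, 8.0, 10.0))


# Hypothetical compound, for illustration only.
print(f"CNS MPO score: {cns_mpo(2.8, 1.5, 320.0, 65.0, 1, 8.5):.2f}")
```

Because each property contributes a continuous score rather than a pass or fail, a compound that is slightly outside one desirable range can still score well overall if the other properties are well placed.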
The PSA metric does not discriminate between polar surface contributed by donors and acceptors. As described above, controlling hydrogen bond donor surface area is much more critical than controlling hydrogen bond acceptor surface area, and therefore minimising hydrogen bond donors while manipulating hydrogen bond acceptors to control overall log P is a more nuanced medicinal chemistry strategy.
1.5.2 Quantifying the Contribution of a Hydrogen Bond
The authors' favourite paper quantifying the contribution of a hydrogen bond to drug–receptor interactions comes from a systematic study of enzyme kinetics with site-directed mutation of the enzyme tyrosyl-tRNA synthetase. Of eleven possible hydrogen bonds identified from the crystal structure of tyrosyl-tRNA synthetase with bound tyrosyl adenylate, eight are from side chains that could be mutated. After mutation, the change in free energy of activation was determined from the measurement of kcat/Km. The loss of a charge-reinforced hydrogen bond was found to be worth an approximately 1000-fold decrease in binding, while loss of a neutral hydrogen bond was found to be worth only 2.5–15-fold,75 Figure 1.10.
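These fold changes can be put on a free-energy scale with ΔΔG = RT ln(fold change). The sketch below assumes a temperature of 298.15 K, which is an assumption for illustration and not necessarily the temperature of the original kinetic measurements.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def fold_change_to_ddg(fold: float, temp_k: float = 298.15) -> float:
    """Free-energy difference (kcal/mol) equivalent to a fold change in binding."""
    return R_KCAL_PER_MOL_K * temp_k * math.log(fold)


for fold in (2.5, 15.0, 1000.0):
    print(f"{fold:7.1f}-fold  ->  {fold_change_to_ddg(fold):.2f} kcal/mol")
```

On this scale a neutral hydrogen bond is worth roughly 0.5–1.6 kcal mol−1, while a charge-reinforced hydrogen bond is worth about 4 kcal mol−1.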
Equally elegant and comprehensive studies of the interaction of vancomycin and ristocetin antibiotics with small peptides came to similar conclusions.76 While the benefit in forming a neutral hydrogen bond may be small as hydrogen bonding is an exchange process, the penalty in not forming one in a protein active site may be large. This brings us to the much-pondered question, is it good to displace an active-site water or not? While it may be entropically favourable to displace a water molecule from a protein active site back to the bulk phase, it may be enthalpically favourable or unfavourable. While no definitive literature exists on the subject, rules of thumb suggest one should consider the hydrogen bonding environment around the water molecule, and whether the water molecule is a happy or unhappy water. If it is already forming four hydrogen bonds in the protein active site, it is likely to be a structural water and hard to replace, as displacing it with a donor or acceptor will form fewer hydrogen bonds than were lost, destabilising the protein and costing free energy. If the water molecule is forming three hydrogen bonds, one may gain more free energy by forming an additional hydrogen bond to it. If it is making only one or two hydrogen bonds to the protein, it may be an unhappy water, which can be mimicked by the hydrogen bonding group of the drug, and it may be favourable to displace it.
1.6 Solubility
It is a truism that in order for a drug to act on its protein target it must be in solution. Therefore, adequate aqueous solubility is a critically important molecular property. For poorly soluble compounds, dissolution rate is also an important factor, although dissolution rate is likely highly correlated to the overall equilibrium solubility, in that poorly soluble compounds are likely slower to dissolve, as has been demonstrated for a series of substituted benzoic acid salts of benzylamine.77 Modern formulation techniques can improve solubility and dissolution, as discussed in the Pharmaceutical Properties Chapter 21, but add complexity and time to the development process. Poorly soluble drug candidates spend up to two years longer in development compared with those without solubility problems.78
But what is sufficient solubility? Much depends upon the dose. A simple rule of thumb divides the dose (in mg) by the solubility (mg mL−1): a resulting volume of <500 mL, an estimate of the standing volume available in the GI tract, is preferable to obviate the need for special formulations.78 The concept of maximal absorbable dose (MAD) is a more sophisticated ranking tool for potential drug candidates:79
$$\mathrm{MAD} = S \times K_\mathrm{a} \times \mathrm{SIWV} \times \mathrm{SITT}$$
where S = solubility (mg mL−1) at pH 6.5; Ka = trans-intestinal absorption rate constant (min−1); SIWV = small intestinal water volume (approximately 250 mL for man) and SITT = small intestinal transit time (approximately 270 min for man).
Estimates for a 1 mg kg−1 dose give a minimum acceptable solubility of 5, 50 and 500 µg mL−1 for high, medium and low permeability compounds, respectively.
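Both the dose-to-solubility rule of thumb and the MAD can be sketched as simple functions. The example assumes the usual product form MAD = S × Ka × SIWV × SITT with the default volumes and transit time quoted above. The absorption rate constants and the 70 mg dose used in the loop are hypothetical values chosen only to show the trend.

```python
def dissolution_volume_ml(dose_mg: float, solubility_mg_per_ml: float) -> float:
    """Volume of fluid needed to dissolve the dose (compare with ~500 mL)."""
    return dose_mg / solubility_mg_per_ml


def max_absorbable_dose_mg(solubility_mg_per_ml: float, ka_per_min: float,
                           siwv_ml: float = 250.0, sitt_min: float = 270.0) -> float:
    """MAD = S x Ka x SIWV x SITT, in mg."""
    return solubility_mg_per_ml * ka_per_min * siwv_ml * sitt_min


def min_solubility_mg_per_ml(dose_mg: float, ka_per_min: float,
                             siwv_ml: float = 250.0, sitt_min: float = 270.0) -> float:
    """Solubility at which the MAD just equals the intended dose."""
    return dose_mg / (ka_per_min * siwv_ml * sitt_min)


# Hypothetical absorption rate constants spanning low to high permeability.
for label, ka in [("low", 0.001), ("medium", 0.01), ("high", 0.1)]:
    s_min = min_solubility_mg_per_ml(dose_mg=70.0, ka_per_min=ka)
    print(f"{label:6s} permeability: minimum solubility ~ {1000 * s_min:.0f} ug/mL")
```

The loop illustrates the trade-off built into the equation: for a fixed dose, each ten-fold drop in the absorption rate constant requires a ten-fold higher solubility.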
The Biopharmaceutics Classification System (BCS) classifies drugs based on their permeability and solubility, Table 1.5.
BCS class 1: high permeability, high solubility | BCS class 2: high permeability, low solubility |
BCS class 3: low permeability, high solubility | BCS class 4: low permeability, low solubility |
The FDA has issued guidance allowing applications for biowaivers for BCS class 1 immediate release solid-dose oral drugs from the need for in vivo bioequivalence testing, as absorption is unlikely to be dependent upon dissolution and gastric emptying time.
A drug substance is classified as highly soluble if the highest single therapeutic dose is completely soluble in 250 millilitre (mL) or less of aqueous media over the pH range of 1.2–6.8 at 37 ± 1 °C.
The assessment of permeability should preferentially be based on the extent of absorption derived from absolute bioavailability or mass balance human pharmacokinetic studies. But permeability can also be assessed by validated and standardised in vitro methods using Caco-2 cells, with the results discussed in the context of available data on human pharmacokinetics. If high permeability is inferred by means of an in vitro cell system, permeability independent of active transport should be proven. If high permeability is not demonstrated, the drug substance is considered to have low permeability for BCS classification purposes.
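The classification logic described above can be summarised in a short sketch. It assumes that the solubility supplied is the lowest value measured across pH 1.2–6.8 at 37 °C and that the permeability call has already been made by one of the accepted methods. The function name and example numbers are hypothetical.

```python
def bcs_class(highest_dose_mg: float,
              min_solubility_mg_per_ml: float,
              highly_permeable: bool) -> int:
    """Return the BCS class (1-4) from dose, worst-case solubility and permeability."""
    # Highly soluble: the highest single dose dissolves in 250 mL or less.
    volume_needed_ml = highest_dose_mg / min_solubility_mg_per_ml
    highly_soluble = volume_needed_ml <= 250.0

    if highly_permeable:
        return 1 if highly_soluble else 2
    return 3 if highly_soluble else 4


# Hypothetical examples: a 100 mg dose at two different worst-case solubilities.
print(bcs_class(100.0, 1.0, highly_permeable=True))   # dissolves in 100 mL
print(bcs_class(100.0, 0.1, highly_permeable=True))   # needs 1000 mL
```

Note that permeability defaults to "low" unless high permeability has been positively demonstrated, which is why it is passed in as an explicit flag rather than inferred.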
For low solubility drugs the media in which solubility is measured can be critical in accurately predicting in vivo dissolution. Fasted state simulated intestinal fluid (FaSSIF) simulates conditions in the proximal small intestine, and solubility measured in FaSSIF can more accurately model the likely behaviour of poorly soluble drugs in vivo. Measurement in fed state simulated gastric fluid can indicate whether the formulation should be taken before or after meals.36
Solubility may be even more important than permeability in driving drug absorption. Measurements of solubility cover a wider dynamic range than measures of permeability (Caco-2 permeability for example) and a high solubility may be enough to drive absorption for poorly permeable drugs. Poorly permeable but high solubility BCS class 3 drugs such as cimetidine, amoxycillin and atenolol can show high and rapid absorption if the dissolution rate is high, and they then behave as BCS class 1 drugs. Biowaivers can be granted for BCS class 3 drugs if all of the excipients are qualitatively the same and quantitatively similar (except for film coating or capsule shell excipients).80
1.6.1 Measurement of Solubility
The measurement of solubility, while superficially a simple experiment, has many pitfalls and caveats. The experimentally determined solubility is dependent upon buffer choice, ionic strength, temperature, pH, supersaturation, the starting solid-state form and its history and impurities, amongst other factors.81 Many experimental protocols try to measure thermodynamic equilibrium, but for some compounds this can take extended periods of time. The solubility is often measured at a particular pH, often pH 7.4 and in a particular buffer, at a fixed time point (4 h) although the solubility of the neutral form can be measured, in which case this is known as the intrinsic solubility.
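The link between intrinsic solubility and the solubility measured at a given pH can be illustrated with the Henderson–Hasselbalch relationship. The sketch below assumes a single ionisable centre and ignores the upper limit set by the solubility of the salt form, so it overestimates solubility far from the pKa. The function name and example values are hypothetical.

```python
def solubility_at_ph(intrinsic_solubility: float, pka: float, ph: float,
                     acid: bool = True) -> float:
    """Total solubility of a monoprotic acid or base at a given pH.

    Acid: S = S0 * (1 + 10**(pH - pKa))
    Base: S = S0 * (1 + 10**(pKa - pH))
    The result has the same units as `intrinsic_solubility` (S0).
    """
    exponent = (ph - pka) if acid else (pka - ph)
    return intrinsic_solubility * (1.0 + 10.0 ** exponent)


# Hypothetical acid with S0 = 1 ug/mL and pKa 4.4, across the GI pH range.
for ph in (1.2, 4.4, 6.8, 7.4):
    print(f"pH {ph}: {solubility_at_ph(1.0, 4.4, ph):.1f} ug/mL")
```

This is one reason why the pH and buffer must always be reported alongside a measured solubility: the same compound can differ by orders of magnitude between gastric and intestinal pH.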
Dissolution rate is an even more involved experiment. In order to control as many of the potential sources of variability as possible, intrinsic dissolution rate requires a compressed spinning disc of the solid to control surface area and fluid flow across the surface,82,83 which requires large amounts of drug.
In order to generate solubility data for a large number of compounds, solubility can be measured from a stock DMSO solution injected into a buffer, after suitable equilibration. Solubility is determined from the turbidity threshold, or quantitated spectrophotometrically or by HPLC, after filtration of undissolved drug. In a validation study at AstraZeneca the measured solubilities of 200 compounds showed a good correlation with a gold-standard manual method measured from solid, with an average deviation of 3-fold.84 Some compounds showed much larger differences. It was noted that the authors did not expect a perfect correlation within the replicate error of each other's measurements, as they had changed the solid state form.
Solubility is often measured in pH 7.4 buffered aqueous solution, but as already mentioned, for pharmaceutical applications, dissolution into real or simulated intestinal fluid, under fasted (FaSSIF) or fed (FeSSIF) conditions, can be used to better model in vivo drug dissolution.36
1.6.2 Calculating Solubility
Solubility is difficult to measure accurately and precisely and more difficult to predict. Most solubility calculators use empirical QSAR equations based on literature data or in-house datasets. Their predictiveness for an individual chemical series should always be tested.
The Yalkowsky General Solubility Equation,85 which is not in fact a QSAR model, but a physically derived equation, contains a negative coefficient for log P and melting point (mpt), as dissolution would require the breaking of crystal packing interactions and the disruption of water structure by the hydrophobic nature of the drug.
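A minimal sketch of the General Solubility Equation is given below, assuming its commonly quoted form log S = 0.5 − 0.01(mpt − 25) − log P, with S the molar aqueous solubility of the neutral compound and mpt in °C. Treating compounds that melt at or below 25 °C as having no melting-point penalty is part of that assumed form. The example inputs are hypothetical.

```python
def gse_log_s(log_p: float, melting_point_c: float) -> float:
    """Estimated log10 molar solubility from log P and melting point (degC)."""
    # Liquids at 25 degC carry no crystal-packing penalty in this form of the GSE.
    melting_term = max(melting_point_c - 25.0, 0.0)
    return 0.5 - 0.01 * melting_term - log_p


# Hypothetical compounds: same log P, different melting points.
for mp in (75.0, 125.0, 225.0):
    print(f"log P 3.0, mpt {mp:.0f} degC: log S = {gse_log_s(3.0, mp):.2f}")
```

The two negative terms separate the contributions discussed above: each log P unit costs one log unit of solubility, and each 100 °C of melting point costs another.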
Few QSAR models, nor indeed the Yalkowsky GSE for solubility, do better than an average error of +/−0.9 log units over a range of diverse drug-like chemistries, a range so wide as to be of limited utility in optimising solubility for most projects. But within local project series, solubility may be better predicted. The controlling influence of lipophilicity and the fact that many other ADMET properties depend upon log P, means the best way to control solubility is to control lipophilicity. Other molecular strategies involve breaking up the crystal packing by introducing 3-D character into the drug molecule, and more pharmaceutical approaches such as changing the counter ion for salts, or forming co-crystals, as described in the Pharmaceutical Properties Chapter 21. Some of this thinking drives interest in another compound quality metric, Fsp3 (the fraction of carbon atoms that are sp3 hybridised).86 Increasing Fsp3 was found to correlate with decreased melting point (lower crystal packing forces) and increased solubility, in accord with the Yalkowsky general solubility equation.
Young et al. reported on the detrimental effect that aromatic rings can have on solubility, above and beyond that due to their log P.35 Molecular weight had no effect on solubility.87 As aromatic ring count correlates inversely with Fsp3,88 one can appreciate that both aromatic ring count and Fsp3 point to an independent contribution of crystal packing forces being important for solubility over and above their contribution to hydrophobicity.
1.7 The Rule of Five
The rule of five (Ro5), formulated by Lipinski in 1997,89 has become a central part of medicinal chemistry lore. Before the late 1980s, drug discovery relied on chemical starting points from natural products, follower drugs and classical medicinal chemistry design, and depended strongly on in vivo screening. The requirement for aqueous solubility sufficient for in vivo activity acted as a natural limiter on medicinal chemistry optimisation. However, around 1990 the situation changed. The advent of HTS, combinatorial chemistry and advances in molecular biology enabled the screening of 100 000s of compounds in in vitro assays, finding many more and diverse start points for projects than had previously been possible. But along with the advantages came compromises. Compounds were no longer solubilised in water but in DMSO, enabling screening at high concentrations to find micromolar hits, sometimes exceeding their thermodynamic aqueous solubilities. The focus of drug optimisation became in vitro screening, and combinatorial screening libraries increased molecular weight and lipophilicity, since this appeared to increase the chances of finding sufficiently potent hits in the HTS campaigns.
Scientists at Pfizer were the first to realise that HTS was adversely affecting the physicochemical property distribution of their whole compound library. Leads were getting bigger and more lipophilic, and medicinal chemistry optimisation led to further drift in undesirable physicochemical properties, notably poor solubility and permeability.
In a bid to define guidelines for chemical synthesis, an aspirational library was chosen on the assumption it would represent a physicochemical property space consistent with success, with compound progression to Phase II as a selection filter. The hypothesis was that compounds with poorer physicochemical properties would have been weeded out by preclinical and Phase I safety evaluation, and that properties of resulting Phase II compounds would define a “holy grail” profile for early drug discovery. Considering the target audience of medicinal chemists, Lipinski and co-workers chose pragmatic descriptors known to be important for solubility and permeability: molecular weight, log P and hydrogen bonding descriptors. For a calculated log P value they applied both the Moriguchi (Mlog P) method, which did not suffer the missing fragment problem of the then gold standard clog P, as well as clog P itself. For hydrogen bonding, donor and acceptor counts using sums of OH + NH groups and O + N atoms respectively, were unambiguous and did remarkably well compared to quantitative measures such as the Abraham solvatochromic or Raevsky C scales. In a similar manner to setting a confidence interval on an IC50 from an in vitro assay, they chose “cutoffs” so that 90% of the Phase II compounds had parameters within the calculated range. Hence the rule of 5 was born.
The Ro5 states that poor solubility or permeability are judged as more likely when:
MW > 500
clog P > 5 or Mlog P > 4.15
N + O count > 10
NH + OH count > 5
The Ro5 was proposed to be violated when two or more of these thresholds are exceeded. Compound classes, such as some natural products, known to be substrates for transporters, were recognised as being exceptions to the rule. The Lipinski paper has become a medicinal chemistry classic defining the whole genre of drug-like properties and compound quality metrics, stimulating a massive literature on these topics.
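The four criteria listed above translate directly into a simple filter. The sketch below uses the clog P threshold only (not Mlog P) and takes pre-computed property values as inputs; the function names and the example values are hypothetical.

```python
def ro5_flags(mw: float, clogp: float, n_plus_o: int, nh_plus_oh: int) -> list:
    """Return the list of Ro5 thresholds that a compound exceeds."""
    flags = []
    if mw > 500:
        flags.append("MW > 500")
    if clogp > 5:
        flags.append("clog P > 5")
    if n_plus_o > 10:
        flags.append("N + O count > 10")
    if nh_plus_oh > 5:
        flags.append("NH + OH count > 5")
    return flags


def ro5_alert(mw: float, clogp: float, n_plus_o: int, nh_plus_oh: int) -> bool:
    """True when two or more thresholds are exceeded."""
    return len(ro5_flags(mw, clogp, n_plus_o, nh_plus_oh)) >= 2


# Hypothetical compounds, for illustration only.
print(ro5_flags(350.0, 2.5, 5, 2), ro5_alert(350.0, 2.5, 5, 2))
print(ro5_flags(620.0, 5.8, 9, 3), ro5_alert(620.0, 5.8, 9, 3))
```

Returning the individual flags as well as the overall alert keeps the output interpretable, since the four properties are not equally important to control.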
However, HTS identifies hits or leads rather than drugs. As medicinal chemistry optimisation appears most often to add mass and sometimes lipophilicity, ideally screening libraries for HTS should be smaller and less lipophilic than drugs, in fact more lead-like.90,91 Further, the molecular complexity of larger molecules was suggested to preclude the chances of finding efficient leads, providing yet further justification for HTS collections being more lead-like.92 An analysis92 of historical lead drug-pairs previously compiled by Sneader93 highlighted the increase in molecular weight between leads and drugs by an average of 42 daltons, which has also been proposed to be the answer to the ultimate question of life, the universe and everything.94
A more recent analysis of physical property changes seen in successful hit-to-candidate optimisations from the 2016–2017 literature shows a large average increase in molecular weight of 85, but this was accompanied by no average change in clog P, suggesting that control of lipophilicity is being widely pursued in contemporary drug discovery.95 When starting points are already advanced literature leads, the changes in physicochemical properties are likely to be smaller or non-existent, as seen in the development of follow-on compounds from first-in-class drugs.96
One approach to overcome the hard cut-off approach of the Ro5 metrics is to use a desirability function approach as defined in the “quantitative estimate of drug-likeness” (QED), which scores molecules on a continuous scale, in the same way as the CNS MPO discussed above. QED is derived from the sum of desirability functions for eight different molecular properties that are based on their distributions in a curated historical set of >700 drugs.97 A number of drugs that fail Lipinski criteria have QED scores that overlap with drugs that pass the criteria.
1.7.1 Beyond the Rule of Five
Medicinal chemists continually debate whether the rule of 5 and drug-like and lead-like thinking should fundamentally underpin medicinal chemistry strategy or if they overly constrain it, and studies of “beyond the rule of five” (bRo5) compounds and drugs have become popular and influential. Many of these papers are directly or indirectly critical of the “rule of five”, often highlighting sets of approved drug compounds that “violate” the rules.98–102 A consistent subset of these publications focuses in particular on antibiotic and other anti-infective compounds, despite the fact that Lipinski et al. explicitly stated that orally active antibiotic, fungicide and antiparasitic compounds fell outside the “rule of five space”. Reports that potential antiparasitic drugs were being cast aside or deprioritised because they failed two or more of the “rules” so concerned the original Ro5 authors that a recent position piece was published, pointing out the salient facts and recommending the use of less stringent criteria for compound progression in an area that is significantly under-funded.103
AbbVie analysed the property space of all compounds in their preclinical DMPK database that violated Lipinski's Ro5 (n = 1116) by virtue of failing >1 rule.104 Many such bRo5 molecules had poor in vitro permeability, which was ascribed to technical issues such as poor solubility and adherence to laboratory apparatus.104 AbbVie scientists therefore decided to directly screen bRo5 compounds for their bioavailability in vivo in rats. A surprising observation was that the upper quartile bioavailability value found was 27%, i.e. one quarter of the bRo5 compounds tested had bioavailability >25%. This shows bioavailability can be achieved in bRo5 space across a range of structures, which is of course unexpected based on the Ro5 teaching. The AbbVie research found, contrary to preconceptions, that in order for bRo5 compounds to demonstrate acceptable oral absorption, they must nevertheless have the right balance of physicochemical properties endemic to successful oral drugs, namely, optimal lipophilicity, a low number of aromatic rings (echoing the importance of aromatic ring count highlighted in GSK's PFI metric) and a low number of rotatable bonds. They developed a multiparametric mnemonic, AB-MPS, combining the physicochemical properties most closely correlated with an increased probability of higher oral bioavailability in bRo5 space:
$$\text{AB-MPS} = \left|\mathrm{clog}\,D - 3\right| + N_\mathrm{AR} + N_\mathrm{RotB}$$
where clog D is the calculated log D, NAR is the number of aromatic rings and NRotB is the number of rotatable bonds.
Values of AB-MPS < 14 were associated with higher bioavailability.104 Determination of experimental polar surface area105 (EPSA) may be a useful tool for predicting acceptable permeability for bRo5 chemical series with high TPSA values. EPSA, a relative polarity profile derived from supercritical fluid chromatography (SFC) retention, may more adequately describe the formation of intramolecular hydrogen bonds, which some bRo5 compounds exploit to hide their polarity (hydrophobic collapse or their chameleonic efficiency) in non-polar milieu and hence facilitate membrane permeability.106 The rotatable bond influence in AB-MPS is large and probably captures bioavailable bRo5 macrocyclic compounds, where intramolecular hydrophobic interactions and H-bonding are facilitated by the macrocyclic structure having restricted rotatable bonds, as notably demonstrated by the permeability of the immunosuppressive drug cyclosporine and its analogues.107
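The AB-MPS score is simple enough to compute by hand, but a sketch makes the relative weight of each term explicit. It assumes the commonly quoted form AB-MPS = |clog D − 3| + number of aromatic rings + number of rotatable bonds; the function name and the example inputs are hypothetical.

```python
def ab_mps(clogd: float, n_aromatic_rings: int, n_rotatable_bonds: int) -> float:
    """AB-MPS score; lower values were associated with higher bioavailability."""
    return abs(clogd - 3.0) + n_aromatic_rings + n_rotatable_bonds


# Hypothetical bRo5 compounds, for illustration only.
for name, clogd, n_ar, n_rot in [("flexible", 5.5, 4, 14),
                                 ("macrocycle-like", 4.0, 2, 6)]:
    score = ab_mps(clogd, n_ar, n_rot)
    verdict = "below" if score < 14 else "at or above"
    print(f"{name}: AB-MPS = {score:.1f} ({verdict} the threshold of 14)")
```

Because rotatable bond counts in bRo5 space are typically much larger than either of the other two terms, they tend to dominate the score, consistent with the observation above about macrocycles.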
Orally available compounds can exist on the edge of Ro5 space and beyond, but may have some compromises. The emergence of proteolysis targeting chimeras (PROTACs) as a new drug modality has the potential to rival or surpass CRISPR/Cas9 as a therapeutic modality in targeted protein degradation. PROTACs comprise a template structure binding to the protein of interest, a linker moiety and an E3 ligase warhead, and so have inherently high molecular weight (mostly >700); they are forcing chemists, whether sceptics or fans of Ro5, to seek opportunities in bRo5 space.108 Those PROTACs based on the VHL E3 ligase warhead have less favourable Ro5 properties than those based on cereblon E3 ligase ligands. This is largely irrespective of the protein ligand of interest and linker and is probably due to increased hydrogen bond donor count. In a series of cereblon-based PROTACs, bioavailability in the rat of >30% was seen in 1/3 of compounds tested, in comparison with molecules in conventional chemical space, where 2/3 had >30% bioavailability.109 These bioavailable PROTACs required high chrom log D values of 5–7. VHL-based PROTACs had <5% bioavailability in this study. While oral absorption may be compromised in the VHL class, the catalytic nature of the PROTAC mechanism of action may negate the need for the 24-hour target coverage required of typical reversible non-covalent small-molecule antagonists,110 and we are now seeing the first PROTAC molecules reaching clinical stages of drug development.
Many authors have examined changes in drug and discovery compound physicochemical properties over time.91,111,112 The four Ro5 properties differ in that molecular weight and hydrogen bond acceptor counts are increasing significantly over time, whereas hydrogen bond donor count and lipophilicity remain near constant. It is fair to say the 90% cut-off values used for the Ro5 properties depend on the time frame chosen, leading some to argue that the notion of drug-likeness is intrinsically flawed.111 However, those properties most resistant to change – particularly lipophilicity and hydrogen bond donor count – are arguably the most important to control in drug discovery programs.
1.8 Ligand Efficiency Metrics
Following the focus on drug-like and lead-like physical properties, medicinal chemists have tried to define further measures using in vitro potency, against which optimisation pathways can be quantified. A number of efficiency metrics113 have been defined, which seek to quantify the physicochemical property “cost” of achieving potency. Some selected ligand efficiency metrics are shown below:
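For reference, the two metrics discussed in the remainder of this section are commonly written as follows. This is a sketch using standard definitions rather than a reproduction of the original list. HA denotes the number of heavy (non-hydrogen) atoms, p(activity) stands for pIC50, pKi or pKd as appropriate, and the factor of about 1.37 kcal mol⁻¹ (2.303RT near 300 K) is an assumption about temperature.

```latex
\begin{align}
\mathrm{LE}  &= \frac{-\Delta G}{\mathrm{HA}}
              \approx \frac{1.37 \times \mathrm{p(activity)}}{\mathrm{HA}}
              \quad (\text{kcal mol}^{-1}\ \text{per heavy atom}) \\
\mathrm{LLE} &= \mathrm{p(activity)} - \log P
              \quad (\text{or } \log D_{7.4})
\end{align}
```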
The most widely applied measures are ligand efficiency (LE)116 and lipophilic ligand efficiency (LLE).114 Both measures have useful features, as well as some compromises, which are summarised in Table 1.6.
LE (ligand efficiency) | LLE (lipophilic ligand efficiency)
---|---
Pros: | Pros:
Cons: | Cons:
No ideal values: | Values as high as possible (ideally >5):
Ligand efficiency (LE) was first proposed as a method for comparing hit molecules from a screen according to their average binding energy per atom,115 and a way of quantifying the “bang for your buck”.116 LE applies the principles of Occam's razor – every atom should contribute. With a drug having to balance so many factors to reach the clinic (potency, selectivity, ADMET properties, solubility, formulation and solid-state properties, etc.), carrying molecular features which do not clearly make a contribution, or are not specifically designed in for a reason, can build in unnecessary risks. Many medicinal chemistry programs, in particular those derived from fragment hits, focus on maintaining the implicit efficiency of a core scaffold or fragment as it is optimised towards leads and drugs. Fragment-based drug discovery (FBDD) has embraced the use of LE; indeed, FBDD has become a successful hit generation capability that truly embraces the lead-like concept90,117 to facilitate successful multiparameter optimisation to clinical candidates. The fact that starting fragments, often having only mM affinity, retain the same binding pose once decorated with further substitution to reach nM potency drug candidates speaks to the efficiency of binding of the original fragment. In a cross-industry comparison on a target-by-target basis, the patented molecules of Astex, a specialist fragment-based drug discovery organisation, had lower molecular weight, log P and rotatable bond counts than those of peer pharmaceutical companies working on the same shared targets.118
A plot of fragment–lead pairs from all publications in 2015–2019 119 shows that efficient fragments tend to yield efficient leads (Figure 1.11). However, the spread in the correlation shows that less efficient fragments can still yield highly efficient leads if properly optimised. Hence it was proposed that LE should not be the sole driver of selection, but rather a guide for optimisation, although it may be difficult to find highly ligand-efficient leads from less ligand-efficient fragments. As the authors highlight, extremely high fragment ligand efficiencies are difficult to maintain during optimisation. This stems from the fact that LE values decline as molecular size increases.120
LE has been criticised as a metric.121,122 However, attempting to quantify how effectively an increase in molecular size translates to affinity gain over the course of hit to lead to candidate optimisation is a valid objective. This is captured by plots of potency vs. size (as well as other properties), which are highly useful and necessary tools for all drug discovery programs. Such plots can be used to track changes in LE that occur during optimisation, the absolute values of LE being less important. While LE is not a perfect metric, this does not mean it lacks utility.123
Optimising potency is a very strong primary driver for drug discovery projects, as potency is broadly correlated to dose, and lower dose drugs have improved compliance, are cheaper, and tend to be more selective and carry fewer toxicological liabilities. With hydrophobicity being such a strong driver of potency for targets as well as off-targets (receptor active sites tend to be more hydrophobic than protein surfaces or bulk solvent) and ADMET properties, optimising potency in a chemical series by increasing hydrophobicity introduces a myriad of liabilities. For these reasons increasing lipophilic ligand efficiency (LLE)114 was proposed as a key driver in optimisation. Essentially, increasing LLE – the difference between p(activity) and log P or log D – is linked to increasing specificity for the target versus non-specific lipid binding, and therefore it has a plausible thermodynamic basis.124
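One way to see this basis, offered here as a sketch rather than the authors' own derivation, is to assume that activity is expressed as an equilibrium dissociation constant K_d and that lipophilicity is the octanol–water partition coefficient P. The symbols K_a (the association constant, 1/K_d) and ΔG(oct→site) are introduced only for this illustration.

```latex
\begin{align}
\mathrm{LLE} &= \mathrm{p}K_\mathrm{d} - \log P
              = \log K_\mathrm{a} - \log P
              = \log\!\left(\frac{K_\mathrm{a}}{P}\right) \\
\Delta G_{\mathrm{oct}\rightarrow\mathrm{site}} &= -2.303\,RT \times \mathrm{LLE}
\end{align}
```

Under these assumptions LLE is the logarithm of the equilibrium constant for moving the ligand from octanol to the binding site, so a higher LLE means the ligand prefers the target over a non-specific lipid-like phase.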
LLE tends to increase during optimisation and LLE values in candidates should be as high as possible. This is reflected by the mean values of LLE of >4 found in drugs.125
The “conundrum of drug discovery”,126 Figure 1.12, captures the essence of the LLE concept. Moving up and down the diagonal of this graph, a line of constant LLE, should in general be avoided in medicinal chemistry programs. Three strategies can be adopted: (a) increase potency and LLE while lowering log P by finding a new drug–receptor polar interaction, which is in general the preferred way to go; (b) lower log P and increase LLE without increasing potency, for example by finding polar substituents that are tolerated because they extend from the binding site into bulk solvent; and (c) if LLE values are already high enough, trade potency for ADMET improvements. It should be noted that while LLE can be increased by lowering log P, if log P is <0, permeability may suffer. An optimal log P range of ∼0–3 can be targeted,22 but the actual log P value required in a candidate will ultimately depend on the multiparameter optimisation compromises that are usually needed,127 the chemical class pursued and the nature of the drug target ligand binding site.
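The bookkeeping behind such potency vs. log P plots can be sketched in a few lines. The example below assumes the common definitions LE ≈ 1.37 × p(activity)/HA and LLE = p(activity) − log P. The class and function names and the two compounds are hypothetical and serve only to show how a hit-to-lead move changes both metrics.

```python
from dataclasses import dataclass

# Approximate value of 2.303*R*T near 300 K in kcal/mol (an assumption).
KCAL_PER_LOG_UNIT = 1.37


@dataclass
class Compound:
    name: str
    p_activity: float  # e.g. pIC50
    log_p: float       # measured or calculated log P (or log D7.4)
    heavy_atoms: int   # non-hydrogen atom count

    @property
    def le(self) -> float:
        """Ligand efficiency in kcal/mol per heavy atom."""
        return KCAL_PER_LOG_UNIT * self.p_activity / self.heavy_atoms

    @property
    def lle(self) -> float:
        """Lipophilic ligand efficiency."""
        return self.p_activity - self.log_p


def in_logp_window(log_p: float, low: float = 0.0, high: float = 3.0) -> bool:
    """True if log P lies in the ~0-3 range suggested in the text."""
    return low <= log_p <= high


# Hypothetical hit and lead, for illustration only.
hit = Compound("hit", p_activity=6.0, log_p=4.0, heavy_atoms=25)
lead = Compound("lead", p_activity=8.0, log_p=2.5, heavy_atoms=30)

for c in (hit, lead):
    print(c.name, round(c.le, 3), c.lle, in_logp_window(c.log_p))
print("change in LLE:", lead.lle - hit.lle)
```

In this made-up pair potency rises while log P falls, which corresponds to strategy (a): the compound moves towards the top left of the plot rather than along the constant-LLE diagonal.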
Multiple analyses of changes to LLE in optimisations from the recent literature have been reviewed, providing strong support for the application of this metric in drug discovery programs.128–132 LLE is improved in optimisation across multiple target classes, irrespective of whether it was addressed deliberately in the discovery phase, or not. Rather than relying solely on the Darwinian selection pressures for compound progression imposed by the discovery assay cascade, addressing the potency/lipophilicity balance rationally from the outset is recommended as a powerful strategy to help streamline compound design choices. Indeed, a survey of literature optimisations from 2014 revealed that when authors considered lipophilicity, log P was significantly reduced and LLE increased, vs. those that did not.130
An example showing successful optimisation trajectories from HTS hits to candidates acting at the CCR5 chemokine receptor is shown in Figure 1.13. In the discovery of both maraviroc and AZD5672, the project teams optimised physicochemical properties to achieve good DMPK properties and to overcome hERG inhibition. LLE is markedly improved, as shown by the movement to the top left of the potency vs. clog P plot. While molecular weight and HA count increased, LE nevertheless was also increased in these examples. Across a larger set of 60 lead-to-drug pairs, LLE was increased by 2 units on average, whereas average LE did not change, i.e. LE was equally likely to increase or decrease.128
It is evident from Figure 1.13 that the optimised CCR5 candidate drugs possess both LE and LLE values which are bettered by few other ligands for this target. This observation appears to be a general, if not exclusive trend, as shown in Figure 1.14.125 Compared to the median LE and LLE values of compounds acting at the drug's target, 96% of the drugs have improved LE or LLE, or both.125 In this study, drugs approved in 2010–2020 had greater potency, but were equally likely to possess either higher or lower values of molecular weight, lipophilicity, hydrogen bond donors or acceptors, or aromatic ring count, in comparison with the median values of compounds acting at their targets. Thus, the most commonly used simple physicochemical properties, including the Ro5 and QED, failed to differentiate newer drugs from non-drugs on a target basis. In addition to LE and LLE, 2010–2020 drugs were distinguished from their target compounds by having increased numbers of stereocentres, and those that were carboaromatic (but not heteroaromatic) had reduced aromatic ring count and lipophilicity relative to their target compounds.125
1.9 Compound Quality and Drug-likeness
As medicinal chemists, our goal is to make drugs and to design “drug-like” compounds. We want to discover molecules that engage with chosen molecular targets at an acceptable dose, that can be delivered safely to patients by the chosen delivery route to treat unmet medical need. How well a compound matches this “drug-like” profile is the true measure of compound quality. In this sense, physicochemical properties are surrogates for true drug-likeness.
Drug-likeness applies to all therapeutic modalities: small-molecule drugs (oral, injectable, inhaled and topical); peptides; nucleic acids, oligonucleotides and antisense therapeutics; protein hormones, therapeutic antibodies and antibody-drug conjugates; vaccines and other protein scaffolds; non-protein biologics; sugars and sugar derivatives; lipids and lipopeptides; metals and metal prodrugs; modified differentiated cells and stem cell therapies; modified bacteria; and microbiome-based therapies. Drug-likeness is therefore a difficult, perhaps impossible, term to define, especially as, from a regulatory perspective, “drug” is not an intrinsic property of medicines. The “drug” attribute is conferred by a regulatory agency such as the US Food and Drug Administration (FDA), based on available evidence at the time of the New Drug Application (NDA). Because scientific evidence requirements vary from country to country, the “drug” attribute is not only time-dependent (marketing approval can be bestowed, and later withdrawn), but also geographically based (not all drugs are approved in all countries).133 It follows that compound quality is an equally difficult term to pin down. Within each modality class and target profile, however, a compound quality space can be defined. For much of this chapter we have referred only to small-molecule drugs, and for the most part, oral drugs. The physicochemical parameters we have discussed are important for binding to drug targets and are surrogates for the key experimental compound quality ADMET measures of permeability, metabolic clearance and solubility.
A candidate drug is the result of many years of a drug discovery campaign and the synthesis of hundreds or thousands of compounds, each of which we hoped could be “the one”. Overall, the success rate of candidate drugs reaching the market is low (∼1–5%). Spotting “the one that can make it” is a great challenge and the research and guidance summarised in this chapter, and book, is about increasing the chances of designing and selecting “the one”. A quality compound needs to possess biological properties of specificity for on-target potency vs. off-targets, and it needs to engage the target in vivo to produce a therapeutic benefit with a suitable safety margin. Known risks can be frontloaded into discovery testing cascades, and the major attributes of a quality small-molecule compound are:
Potent (<10 nM vs. primary target)
Selective (>100 × selective vs. other targets based on maximal exposure levels)
Not a victim or perpetrator of drug–drug interactions (not a CYP inhibitor or inducer, and no interaction with any significant drug transporter that could affect the exposure of co-administered drugs)
Pharmacokinetic profile suitable for exposure and therapeutic efficacy via the chosen dose route and frequency
Soluble and formulatable
Scalable synthetic route.
Some risks are hard to define and are not absolutes. For example, are nitro groups toxic liabilities or can they be acceptable in oral drugs? Certainly marketed drugs can contain nitro groups, but these are a risk for reactive metabolism and have become abhorrent to most medicinal chemists. Likewise, some but not all aromatic amines can be genotoxic, and these can be screened out preclinically. Tables of molecular alerts containing unwanted or undesirable functionality such as AZFILTERS134 have been published based on the experiences (and often biases) of medicinal chemists.
Metrics such as LE and LLE, discussed above, try to capture properties that may or may not be explicit in a candidate drug profile. They could mitigate risks beyond poor permeability and solubility (the focus of the Ro5) and may also more broadly improve the chances of successful progression through clinical development.
1.10 Conclusions
Having a clear view on medicinal chemistry design strategies has never been more important. Control of physical properties such as hydrophobicity, ionisation and hydrogen bond donors, and the efficient use of chemical structure so that every atom plays an important part (the known knowns) can provide a framework to help mitigate later failure (the unknown unknowns). But these are guidelines, and candidate drugs most often contain project-specific compromises or sit in niches. For example: natural products have been optimised by evolution, and PROTACs and covalent drugs have different PK-PD requirements than non-covalent reversible drugs; different delivery routes allow different property ranges; and different diseases accept different safety compromises. We can aspire as Picasso did to “Learn the Rules Like a Pro, So You Can Break Them Like an Artist.” As we partner with artificial intelligence (AI) and powerful machine learning algorithms for augmented drug design – and autonomous drug design by AI is now widely discussed – we should remember that machines are good at learning rules, but it takes a human to break them and demonstrate the true art of medicinal chemistry.
1.11 Hints and Tips
While the number of compound quality metrics and associated guidance keeps increasing, and we have made no attempt here to be comprehensive, we believe that medicinal chemistry strategy in relation to physical property control, particularly for oral drugs, is clear:
Optimise potency, which results in a lower dose.
Control log P – between 0 and 3 is ideal.
Where the compound series has an ionisable centre, you can manipulate log D by pKa modulation and log P modulation. Beware of lowering log D7.4 by modulation of pKa; log P modulation should always be the focus.
LLE ≥5 is a good target value for a drug candidate. 96% of drugs possess LE or LLE, or both, greater than the median value for published compounds acting at the drug's target.
Control the number of hydrogen bond donors: 0–2 will aid in cell penetration, absorption and CNS permeability.
Control aromatic ring count (especially phenyl rings), ideally ≤3.
Balance hydrophobicity by adjusting numbers of polar hydrogen bond acceptors.
Target aqueous solubility >100 micromolar in pH 7.4 buffer; if necessary use physiologically relevant assays such as FaSSIF solubility.
Acids and bases can help with solubility and bulk physical properties, but avoid extremes of pKa and avoid the addition of more than acid or basic groups if oral absorption is required.
Zwitterions can limit CNS penetration and still maintain oral absorption by passive and active processes.
Use calculations to guide compound design but validate with measurements to improve the accuracy of physical chemical control.
Never apply any rules literally – rules are merely guidelines – and the extremes of physical properties are not out of bounds, but carry additional risks and compromises.
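The numerical guidelines in the list above can be gathered into a simple checklist. The sketch below is illustrative only: the function name, argument names and example values are hypothetical, and the thresholds are copied from the tips (log P 0–3, LLE ≥5, hydrogen bond donors 0–2, aromatic rings ≤3, solubility >100 micromolar). In keeping with the final tip, it returns flags for discussion rather than a pass/fail verdict.

```python
def guideline_flags(log_p: float, lle: float, hbd: int,
                    aromatic_rings: int, solubility_um: float) -> list[str]:
    """Return the guideline values a compound falls outside of.

    Flags mark additional risk to weigh up, not grounds for rejection.
    """
    flags = []
    if not 0.0 <= log_p <= 3.0:
        flags.append("log P outside 0-3")
    if lle < 5.0:
        flags.append("LLE below 5")
    if hbd > 2:
        flags.append("more than 2 hydrogen bond donors")
    if aromatic_rings > 3:
        flags.append("more than 3 aromatic rings")
    if solubility_um <= 100.0:
        flags.append("solubility not above 100 micromolar at pH 7.4")
    return flags


# Hypothetical compounds, for illustration only.
print(guideline_flags(log_p=2.5, lle=5.5, hbd=1, aromatic_rings=2, solubility_um=250.0))
print(guideline_flags(log_p=4.2, lle=3.8, hbd=3, aromatic_rings=4, solubility_um=20.0))
```

Input values would normally come from measurement where available, with calculated values used only as a guide, as recommended above.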
Key References
The original “Rule of Five” paper by Lipinski and co-workers remains a “must” read:
C. A. Lipinski, F. Lombardo, B. W. Dominy and P. J. Feeney, Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings, Adv. Drug Delivery Rev., 1997, 23(1–3), 3–25.
Contemporary perspectives on the “Rule of Five”:
C. P. Tinworth and R. J. Young, Facts, Patterns, and Principles in Drug Discovery: Appraising the Rule of 5 with Measured Physicochemical Data, J. Med. Chem., 2020, 63(18), 10091–10108.
D. A. DeGoey, H. J. Chen, P. B. Cox and M. D. Wendt, Beyond the Rule of 5: Lessons Learned from AbbVie's Drugs and Compound Collection, J. Med. Chem., 2018, 61, 2636–2651.
The origin of the lead-like concept:
S. J. Teague, A. M. Davis, P. D. Leeson and T. Oprea, The design of leadlike combinatorial libraries, Angew. Chem., Int. Ed., 1999, 38(24), 3743–3748.
M. M. Hann, A. R. Leach and G. Harper, Molecular Complexity and Its Impact on the Probability of Finding Leads for Drug Discovery, J. Chem. Inf. Comput. Sci., 2001, 41(3), 856–864.
Applications of efficiency metrics:
T. W. Johnson, R. A. Gallego and M. P. Edwards, Lipophilic Efficiency as an Important Metric in Drug Design, J. Med. Chem., 2018, 61, 6401–6420.
R. J. Young and P. D. Leeson, Mapping the Efficiency and Physicochemical Trajectories of Successful Optimizations, J. Med. Chem., 2018, 61, 6421–6467.