Chapter 6: Representing Chemical Structures in Databases for Drug Design
Published: 04 Nov 2011
Special Collection: 2011 ebook collection, 2011-2015 physical chemistry subject collection
Series: Drug Discovery
J. M. Barnard, P. W. Kenny, and P. N. Wallace, in Drug Design Strategies: Quantitative Approaches, ed. D. J. Livingstone and A. M. Davis, The Royal Society of Chemistry, 2011, ch. 6, pp. 164-191.
Many different computer representations of chemical structures are used in drug design. Most treat the molecule as a topological graph, though the analogy is imperfect and has problems with features such as tautomerism, aromaticity and stereochemistry. Commonly used representations include 2D structure diagrams, systematic nomenclature, line notations (e.g. SMILES), connection tables and the recently developed International Chemical Identifier (InChI). Several approaches are used to indicate stereochemical configuration. 3D structural representations are also used in identifying molecules with appropriate conformations for biological activity. The presence or absence of substructure fragments in a molecule is used to build chemical “fingerprints”, which are useful both in structure search systems and for measuring the similarity between molecules. It is frequently important to establish a unique representation for a molecule, and a number of canonicalisation algorithms and structure normalisation procedures (especially to deal with tautomerism and protonation) are used to achieve this. In some cases, these need to consider the predominant form under physiological conditions. “Business rules” for normalisation are especially important in chemical registration systems, which also need to deal with different salts and isotopically labelled compounds, as well as unknown and partially known structures. In recent years, many techniques have been developed for the analysis of structural databases; these include clustering, R-group decomposition, “reduced feature” representations and matched molecular pair analysis.
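The fingerprint idea mentioned above can be illustrated with a minimal sketch. It is a toy model and not the chapter's own method: a molecule is a list of atom symbols plus a list of bonds (a hydrogen-suppressed graph), the only "fragments" are single atoms and bonded atom pairs, and the fragment dictionary `KEYS` is a hypothetical one chosen for this example. The Tanimoto coefficient is used as the similarity measure because it is a common choice for bit-string fingerprints; the abstract does not name a particular measure.

```python
# Toy fragment-based fingerprint; all names here are illustrative assumptions.

def fragments(atoms, bonds):
    """Return the fragments present: single atoms and bonded atom pairs."""
    frags = set(atoms)
    for i, j in bonds:
        # Sort the pair so that C-O and O-C map to the same fragment key.
        a, b = sorted((atoms[i], atoms[j]))
        frags.add(f"{a}-{b}")
    return frags

def fingerprint(atoms, bonds, keys):
    """One bit per dictionary key: 1 if the fragment is present, else 0."""
    present = fragments(atoms, bonds)
    return tuple(1 if key in present else 0 for key in keys)

def tanimoto(fp_a, fp_b):
    """Bits set in both fingerprints divided by bits set in either."""
    both = sum(a & b for a, b in zip(fp_a, fp_b))
    either = sum(a | b for a, b in zip(fp_a, fp_b))
    return both / either if either else 1.0

# Hypothetical fragment dictionary (assumption for this sketch).
KEYS = ["C", "N", "O", "C-C", "C-N", "C-O"]

# Hydrogen-suppressed graphs: atom symbols plus bonds as index pairs.
ETHANOL = (["C", "C", "O"], [(0, 1), (1, 2)])
ETHYLAMINE = (["C", "C", "N"], [(0, 1), (1, 2)])

fp_ethanol = fingerprint(*ETHANOL, KEYS)
fp_ethylamine = fingerprint(*ETHYLAMINE, KEYS)
similarity = tanimoto(fp_ethanol, fp_ethylamine)
```

Here the two molecules share two of the six fragments set in either fingerprint (`C` and `C-C`), so the similarity is 2/6. Production systems use much larger fragment sets or hashed paths, but the presence/absence principle is the same.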