Chemoinformatics Approaches to Virtual Screening
Chemoinformatics is broadly a scientific discipline encompassing the design, creation, organization, management, retrieval, analysis, dissemination, visualization and use of chemical information. It is distinct from other computational molecular modeling approaches in that it uses unique representations of chemical structures in the form of multiple chemical descriptors; has its own metrics for defining the similarity and diversity of chemical compound libraries; and applies a wide array of statistical, data mining and machine learning techniques to very large collections of chemical compounds in order to establish robust relationships between chemical structure and physical or biological properties. Chemoinformatics addresses a broad range of problems in chemistry and biology; however, its best-known applications have arguably been in drug discovery, where chemoinformatics tools have played a central role in the analysis and interpretation of structure-property data collected by means of modern high-throughput screening. The early stages of modern drug discovery often involve screening small molecules for their effects on a selected protein target or a model of a biological pathway. In the past fifteen years, innovative technologies that enable rapid synthesis and high-throughput screening of large compound libraries have been adopted by almost all major pharmaceutical and biotech companies. As a result, the number of compounds routinely available for rapid screening against new targets and pathways in search of novel drug candidates has increased enormously. In contrast, such technologies have rarely been available to the academic research community, limiting its ability to conduct large-scale chemical genetics or chemical genomics research. However, the public availability of experimental data for chemoinformatics research has changed dramatically in recent years.
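To make the idea of chemoinformatics-specific similarity and diversity metrics concrete, the sketch below uses the Tanimoto coefficient, one widely used similarity measure for binary structural fingerprints. The text above does not name a particular metric, so this choice is an assumption. Each fingerprint is assumed to be given as the set of indices of its "on" bits (in practice such fingerprints would be generated by a chemoinformatics toolkit), and library diversity is illustrated, again as an assumption, by the mean pairwise Tanimoto distance (1 minus similarity).

```python
from itertools import combinations


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient between two binary fingerprints, each represented
    (by assumption) as the set of indices of its 'on' bits:
    shared bits / (bits in A + bits in B - shared bits)."""
    if not fp_a and not fp_b:
        return 1.0  # convention chosen for this sketch: two empty fingerprints are identical
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def mean_pairwise_diversity(library: list[set[int]]) -> float:
    """Illustrative library diversity: mean Tanimoto distance (1 - similarity)
    over all unordered pairs of compounds; 0.0 for fewer than two compounds."""
    pairs = list(combinations(library, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)
```

For example, the hypothetical fingerprints {1, 2, 3, 4} and {3, 4, 5, 6} share two of six distinct bits, giving a similarity of 1/3. Python sets keep the sketch short; practical implementations usually operate on fixed-length bit vectors for speed.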
The term "virtual screening" is commonly associated with methodologies that rely on explicit knowledge of the three-dimensional structure of the target protein to identify potential bioactive compounds. Traditional docking protocols and scoring functions rely on explicitly defined three-dimensional coordinates and standard atom-type definitions for both receptors and ligands. Although reasonably accurate in many cases, conventional structure-based virtual screening approaches are relatively computationally expensive, which has largely precluded their use for screening very large compound collections. Significant progress has been achieved over many years of research in developing structure-based virtual screening approaches. This book, in contrast, is the first monograph to summarize innovative applications of efficient chemoinformatics approaches to the screening of large chemical libraries. The focus on virtual screening expands chemoinformatics beyond its traditional boundaries as a synthetic and data-analytical area of research towards its recognition as a predictive and decision-support scientific discipline.
The approaches discussed by the contributors to the monograph rely on chemoinformatics concepts such as:
- representation of molecules using multiple descriptors of chemical structure
- advanced chemical similarity calculations in multidimensional descriptor spaces
- the use of advanced machine learning and data mining approaches for building quantitative and predictive structure-activity models
- the use of chemoinformatics methodologies for the analysis of drug-likeness and property prediction
- the emerging trend of combining chemoinformatics and bioinformatics concepts in structure-based drug discovery
The chapters of the book are organized in the logical flow that a typical chemoinformatics project would follow: from structure representation and comparison, through data analysis and model building, to applications of structure-property relationship models for hit identification and chemical library design. The book opens with an overview of modern methods of compound library design, followed by a chapter devoted to molecular similarity analysis. Four sections describe virtual screening based on the use of molecular fragments, 2D pharmacophores and 3D pharmacophores. The application of fuzzy pharmacophores to library design is the subject of the next chapter, followed by a chapter dealing with QSAR studies based on local molecular parameters. Probabilistic approaches based on 2D descriptors for the assessment of biological activities are also described, along with an overview of modern methods and software for ADME prediction. The book ends with a chapter describing a new approach to encoding receptor binding sites and their respective ligands in a multidimensional chemical descriptor space, which affords an interesting and efficient alternative to traditional docking and screening techniques.
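The idea of building predictive structure-activity models from similarity in a multidimensional descriptor space can be illustrated with k-nearest-neighbour (kNN) regression, chosen here only as one simple example of the machine learning methods mentioned above, not as the method used in any particular chapter. In this minimal sketch, each training compound is a hypothetical pair of (descriptor vector, measured activity), descriptors are assumed to be already scaled to comparable ranges, and the activity of a new compound is predicted as the mean activity of its k nearest training compounds by Euclidean distance.

```python
import math
from typing import Sequence

# Hypothetical data layout: each training entry is (descriptor_vector, activity).
# Descriptors are assumed to be pre-scaled so that no single one dominates the distance.
TrainingEntry = tuple[Sequence[float], float]


def knn_predict(query: Sequence[float],
                training_set: list[TrainingEntry],
                k: int = 3) -> float:
    """Predict the activity of `query` as the mean activity of its k nearest
    training compounds in descriptor space (Euclidean distance)."""
    if not training_set:
        raise ValueError("training_set must not be empty")
    if k < 1:
        raise ValueError("k must be at least 1")
    # Rank training compounds by distance to the query; math.dist raises
    # ValueError if the descriptor vectors have different lengths.
    ranked = sorted(training_set, key=lambda entry: math.dist(query, entry[0]))
    neighbours = ranked[:k]
    return sum(activity for _, activity in neighbours) / len(neighbours)


if __name__ == "__main__":
    # Toy, made-up descriptor vectors and activities for illustration only.
    training = [
        ((1.0, 2.0), 5.0),
        ((1.2, 2.1), 5.4),
        ((0.9, 1.8), 4.8),
        ((4.0, 4.0), 8.0),
        ((4.2, 3.9), 8.2),
    ]
    print(knn_predict((1.1, 2.0), training, k=3))
```

In the toy example the query lies close to the first three training compounds, so the prediction is their mean activity, about 5.07. Averaging over k neighbours rather than copying the single nearest one dampens noise in individual measurements; in practice k and the descriptor scaling would be tuned, for example by cross-validation.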
Ligand-based approaches, which are the focus of this work, are more computationally efficient than structure-based virtual screening, yet there are very few books on modern developments in this field. The emphasis on extending the experience accumulated in traditional areas of chemoinformatics research, such as Quantitative Structure-Activity Relationships (QSAR) and chemical similarity searching, towards virtual screening makes this monograph essential reading for researchers in computer-aided drug discovery. Moreover, owing to their generic data-analytical focus, chemoinformatics approaches will find growing application in many other areas of chemical and biological research, such as synthesis planning, nanotechnology, proteomics, physical and analytical chemistry, and chemical genomics.