CHAPTER 8: Statistical Learning
Published: 30 Oct 2015
Special Collection: 2015 ebook collection
Statistical learning is one of the most useful and widely used methodologies in computational drug discovery. As more drug discovery data is released from literature sources, pharmaceutical databases, and elsewhere, it becomes increasingly important to apply statistical learning methods appropriately to make the best use of the available data. Unsupervised learning is a subfield of statistical learning that helps us understand the overall structure of our data and may reveal hidden structure that can be exploited; cluster analysis, for example, may reveal dense or sparse regions of chemistry space. Supervised learning, by contrast, also uses calculated descriptors, but goes on to build a model that relates those descriptors to a measured response. With such models it is possible to make reliable predictions about molecules prior to testing, or even about molecules that are yet to be synthesised. However, models are only as useful as their quality and the way in which they are used. The last part of this section demonstrates some best practices in statistical learning to ensure that the models generated are useful and that they are applied appropriately.