
Statistical learning is one of the most useful and widely used methodologies in computational drug discovery. As more drug discovery data are released from the literature, pharmaceutical databases, and other sources, it becomes increasingly important to apply statistical learning methods appropriately to make the best use of the available data. Unsupervised learning is a subfield of statistical learning that helps us understand the overall structure of our data and may reveal hidden structures that can be exploited; cluster analysis, for example, may reveal dense or sparse regions of chemistry space. Supervised learning, by contrast, also uses calculated descriptors, but goes on to build a model that correlates those descriptors with a measured response. With such models it is possible to make reliable predictions about molecules prior to testing, or even about molecules that are yet to be synthesised. However, models are only as useful as their quality and the way in which they are used. The last part of this section demonstrates some best practices in statistical learning to ensure that the models generated are useful and that they are applied appropriately.
