Chapter 2: DNA-encoded Library Machine Learning Applications
Published: 21 Feb 2025
Special Collection: 2025 eBook Collection
Series: Drug Discovery Series
E. A. Sigel, in DNA-encoded Library Technology for Drug Discovery, ed. G. Liu, C. J. Krusemark, and J. Li, Royal Society of Chemistry, 2025, vol. 85, ch. 2, pp. 17-40.
Machine learning (ML) has begun to realize its promise in many domains over the last several years. While small-molecule drug discovery has lagged behind other areas, developments in computing capabilities, data generation, and algorithms have enabled significant progress in molecule prediction. DNA-encoded libraries (DELs) are an efficient way to generate the quantity of data required for effective model building, providing a mechanism for protein-target-specific prediction with economics that make it feasible for individual organizations. DEL-based machine learning (DEL-ML) has been demonstrated to work for a variety of targets, and both its use in industry and the range of reported approaches continue to expand. With this initial success, a number of challenges and considerations facing the DEL-ML practitioner have been identified, including denoising of DEL data; the choice of ML algorithm, hyperparameters, and molecule representations; and the need for relevant assessment metrics, particularly given the high resource and time costs of testing predictions. To fully realize the potential of DEL-ML, key improvements in drug discovery infrastructure and broad availability of DEL data are needed.