In the era of big data, one of the key challenges is converting data into knowledge through organization and search. This is also true in chemistry, where novel technologies such as DNA-encoded libraries, peptide libraries and new in silico enumeration methods produce immense numbers of molecules and large amounts of related data. Handling these extremely large sets of molecules is complex and requires compromises that often come at the expense of interpretability. In this chapter we introduce and discuss an alternative, novel approach called “chemical topic modeling”, which has been adopted from the text-mining field. This probabilistic framework offers an intuitive and meaningful way to organize large data sets. On the ChEMBL database (v23), an extremely heterogeneous set of more than 1.6 million molecules, the method has proven its efficacy and robustness: a 100-topic model yielded interesting topics such as “proteins”, “DNA” or “steroids”. These rather general, yet sensible and human-understandable topics can provide the basis for further investigation.
