CHAPTER 5: Statistics, Data Mining and Modeling
Published: 16 Mar 2020
Special Collection: 2020 ebook collection
M. Reboiro-Jato, D. Glez-Peña, and H. López-Fernández, in Processing Metabolomics and Proteomics Data with Open Software: A Practical Guide, ed. R. Winkler, The Royal Society of Chemistry, 2020, pp. 120-200.
Once mass spectrometry data have been pre-processed to identify the true peaks, they can be used in different ways. For instance, one may want to build a predictive model that differentiates between two conditions (e.g. case versus control) and classifies new samples, or to discover differentially expressed molecules. To accomplish these and other tasks, adequate statistics and data mining techniques should be chosen and applied. With these goals in mind, this chapter presents different strategies for sample comparison, dimensionality reduction techniques (e.g. Principal Component Analysis), cluster analysis methods (e.g. hierarchical clustering analysis), different ways to find important variables (e.g. biomarker discovery), and the creation and evaluation of predictive models based on machine learning techniques. These topics are covered in a practical way, showing reusable examples that use real, publicly available mass spectrometry datasets.
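As a rough illustration of two of the tasks listed above (dimensionality reduction and finding important variables), the following minimal Python sketch may help. It is not taken from the chapter: the data are synthetic, and the function names (`make_synthetic_peaks`, `pca_scores`, `welch_t`), the group sizes, and the intensity shift are all assumptions chosen for illustration. It computes PCA scores through a singular value decomposition and ranks peaks by a Welch t-statistic between two conditions.

```python
import numpy as np


def make_synthetic_peaks(n_per_group=20, n_peaks=50, n_informative=5,
                         shift=3.0, seed=0):
    """Hypothetical stand-in for a pre-processed peak intensity matrix.

    Rows are samples, columns are peaks. The first `n_informative` peaks
    are shifted upward in the 'case' group (label 1).
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(2 * n_per_group, n_peaks))
    y = np.repeat([0, 1], n_per_group)  # 0 = control, 1 = case
    X[y == 1, :n_informative] += shift
    return X, y


def pca_scores(X, n_components=2):
    """Project samples onto the leading principal components via SVD."""
    Xc = X - X.mean(axis=0)  # center each peak
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)  # fraction of variance per component
    return scores, explained[:n_components]


def welch_t(X, y):
    """Per-peak Welch t-statistic comparing case (1) against control (0)."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) +
                 b.var(axis=0, ddof=1) / len(b))
    return (b.mean(axis=0) - a.mean(axis=0)) / se


X, y = make_synthetic_peaks()
scores, explained = pca_scores(X)
t = welch_t(X, y)
top = np.argsort(-np.abs(t))[:5]  # indices of the five most discriminating peaks
print("Variance explained by PC1, PC2:", explained)
print("Top-ranked peak indices:", sorted(top.tolist()))
```

With real data, the synthetic matrix would be replaced by the peak table produced during pre-processing, and the raw t-statistics would normally be followed by p-values and a multiple-testing correction before any peak is reported as a candidate biomarker.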