
In this chapter we discuss current trends in multivariate biomarker discovery based on high-dimensional p ≫ N data, that is, data with far more variables than samples (such as genomic, proteomic and metabolomic data generated by high-throughput technologies). First, we examine common misconceptions in biomarker discovery and provide clear guidance on which methods to use, which to avoid, and why. We then turn to feature selection, which – while still neglected by some studies – is the most important aspect of biomarker discovery. Next, we present selected supervised learning algorithms (linear discriminant analysis, support vector machines and random forests), which – when coupled with appropriate feature selection techniques – may serve as the cores of efficient methods of multivariate biomarker discovery. We also stress the importance of biological interpretation of biomarkers and the necessity of their proper validation. Finally, we present a novel data mining method for identifying multivariate biomarkers that are parsimonious, robust and biologically interpretable.
