CHAPTER 3: Recent Developments in Exploratory Data Analysis and Pattern Recognition Techniques
Published: 06 Nov 2014
Series: Issues in Toxicology
M. Grootveld, in Metabolic Profiling: Disease and Xenobiotics, ed. M. Grootveld, The Royal Society of Chemistry, 2014, pp. 74-116.
In this treatise, recent developments in the analysis of multivariate and megavariate datasets are reviewed and illustrated, with particular reference to the satisfaction of required assumptive criteria, and to the development and operation of reliable means of assessing the value of such systems (diagnostic or otherwise) in the context of acceptable and cross-validated models. The review includes an appraisal and re-evaluation of the applications (and limitations) of Canonical Correlation and Classification and Regression Tree analysis methods; of moderated t-statistic techniques, specifically the Significance Analysis of Microarrays (SAMs) and Empirical Bayesian Approach Modelling (EBAM) methods; and of machine-learning techniques such as (1) Self-Organising Maps (SOMs), (2) Support Vector Machines (SVMs) and (3) Random Forests (RFs). The Genetic Algorithms (GAs), Gaussian Graphical Models (GGMs), Independent Component Analysis (ICA) and Correlated Component Regression (CCR) analytical approaches receive particular focus, in addition to those which are more commonly employed, albeit in a rather limited sense. Finally, the applicability of the CCR technique to the multivariate (MV) analysis of X>n datasets (which are much more commonly encountered than X<n ones in ‘omics’ research investigations) is outlined.