
Combinations of advanced statistical analysis with analytical/bioanalytical chemistry, developed over many decades, have been vital for interpreting and understanding the significance of results acquired in research (both natural-science and clinical) and in industry, with applications in numerous fields, including the biomedical sciences, healthcare and the environmental sciences. Herein, multicomponent nuclear magnetic resonance (NMR) analysis is used as a model to delineate how advanced statistical tools, both univariate and multivariate, can be implemented to analyse complex spectral datasets effectively in metabolomic applications, and to provide valuable, validated conclusions therein. From an analytical chemist's perspective, computational techniques are now embedded in spectral interpretation. However, applying such advanced statistical probes presents challenges, which will be explored throughout this chapter.

Although some statistical approaches were developed much earlier, such as the pioneering Bayesian statistics of the 18th century,1  the close affinity between statistics and science dates back only to the late 19th and early 20th centuries, and their interdisciplinary usage is still not fully established. Works by Karl Pearson and Francis Galton explored regression towards the mean, principal component analysis (PCA), chi-squared contingency table testing and correlation.2  Later, Spearman developed factor analysis, as well as Spearman's rank correlation coefficient, and applied them to research in the social sciences.3  William Gosset developed the t-test, which is embedded in most statistical testing applied in scientific fields to date and which, unfortunately, remains the most widely abused and misapplied univariate statistical test.4  Ronald Fisher tied the aforementioned ideas together, showing how the chi-squared and t-distributions both arise from the Gaussian distribution, and formulated the distribution that underlies the F-test. Fisher later developed analysis of variance (ANOVA) and defined p-values for determining the level of statistical significance.5  Fisher also applied this work to genetics, in particular to allele frequencies and to the estimation of genetic linkage within a population by maximum likelihood methods.6  The null (H0) and alternative (H1) hypotheses were then established7  and remain fundamental to all experimental designs.

These statistical tools are now applied in almost every field; indeed, in science every research area has an element of statistical interpretation, from genomics in disease diagnostics, forensic science and hereditary studies, to the microbiome and the discovery of biomarkers using biological immunoassays and ‘state-of-the-art’ bioanalytical techniques. In this chapter, modern advanced statistical methodologies will be explored through a major, now commonly employed multicomponent analytical/bioanalytical chemistry technique, namely NMR spectroscopy. The statistical approaches and challenges associated with the collection of large or very large NMR datasets run in parallel with those of other multicomponent analytical techniques, such as liquid chromatography–mass spectrometry (LC–MS), and hence the NMR examples provided serve as appropriate test models. Despite occasional problems with low sensitivity (especially at biofluid concentrations <10 μmol L−1) and with untargeted analyses, in which xenobiotic resonances may overlap with those of endogenous metabolites, NMR offers many laboratory advantages over LC–MS in view of its high reproducibility, its non-destructive methodology with minimal sample preparation, and the simultaneous detection and determination of 100 or more metabolites at operating frequencies ≥600 MHz.8  NMR is suitable across many different fields and sample types, including food chemistry, geological studies, drug discovery, forensics and an increasing number of metabolomics areas, for example lipidomics and fluxomics. Both solid and liquid samples can be analysed, which is a clear advantage.
Moreover, major recent advances in computational capability have enhanced the applications of metabolomics-linked statistical approaches, and current software modules and their applications will also be discussed here.

NMR as a technique matured much later than these statistical methods, and therefore statistical tools were combined with NMR data only subsequently. Nuclear spin was first described by Pauli in 1926,9  and Rabi developed these ideas further in 1939, applying radiofrequency fields to a molecular beam of lithium chloride to observe nuclear magnetic resonance (Figure 1.1).10  Overhauser predicted dynamic nuclear polarisation in 1953,11  Redfield explored the theory of NMR relaxation processes,12  and Bloch and Purcell, who independently first demonstrated NMR in condensed matter in 1945–1946 and shared the 1952 Nobel Prize in Physics, developed the technique further over the following decades.13,14  Early continuous wave (CW) experiments observed nuclear spin resonance using a permanent magnet or electromagnet and a radiofrequency oscillator to produce two fields, B0 and B1, respectively. To achieve resonance, either the frequency of the B1 field or the strength of the B0 field was swept; in the field-sweep mode, the magnetic field is continuously varied and the peak signal (resonance) is recorded on an oscilloscope or an x–y recorder. These methodologies have since advanced substantially, and at present radiofrequency (B1) pulse sequences are applied to nuclei within a static magnetic field, B0, and the resulting free induction decay is Fourier transformed to give the spectrum.

Figure 1.1

7Li nucleus NMR signal originally observed by Rabi et al.10  Reproduced from ref. 10, https://doi.org/10.1103/PhysRev.55.526, with permission from American Physical Society, Copyright 1939.

A range of NMR facilities is currently available, from spectrometers with operating frequencies as high as 1200 MHz, which require a Gauss safety line and regular cryogen fills, to more accessible, cryogen-free permanent-magnet benchtop instruments operating at up to 80 MHz. Progress in low-field NMR spectroscopy and its analytical applications, albeit very uncommon in metabolomics, has recently been reviewed.15  A plethora of spectrometers exist between these two extremes, all with the capacity to acquire a wide range of molecular analyte data that can subsequently be employed for statistical evaluations and comparisons. Biological fluids have been examined using both low-16  and high-frequency17  NMR technologies to monitor a range of endogenous metabolites and xenobiotics, although high-frequency spectrometers are often preferred because of their much-enhanced resolution and deconvolution of NMR signals. An example of the differences in resolution between low- (60 MHz) and high-field (400 MHz) NMR analysis is shown in Figure 1.2.
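The operating frequencies quoted above map onto magnetic field strengths through the Larmor relation, ν = γB0/2π. The short Python sketch below, assuming the rounded 1H value γ/2π ≈ 42.577 MHz T−1, converts frequency to field and shows why resolution improves with field: chemical shift separations expressed in Hz scale with the spectrometer frequency, whereas J-couplings in Hz do not, so a multiplet spans far fewer ppm at 600 MHz than at 60 MHz. The 7 Hz coupling constant is an illustrative assumption.

```python
# Larmor relation for 1H: nu = (gamma / 2*pi) * B0
GAMMA_1H_MHZ_PER_T = 42.577  # 1H gyromagnetic ratio divided by 2*pi, in MHz per tesla (rounded)


def field_for_frequency(nu_mhz: float) -> float:
    """Static field B0 (in tesla) at which 1H nuclei resonate at nu_mhz."""
    return nu_mhz / GAMMA_1H_MHZ_PER_T


def ppm_to_hz(delta_ppm: float, nu_mhz: float) -> float:
    """Width in Hz of a chemical shift interval (in ppm) at a given spectrometer frequency."""
    # 1 ppm of an nu_mhz MHz spectrometer frequency corresponds to nu_mhz Hz
    return delta_ppm * nu_mhz


for nu in (60, 80, 400, 600, 1200):
    # A hypothetical 7 Hz J-coupling expressed in ppm is simply 7 / nu (Hz divided by MHz)
    print(f"{nu:>5} MHz: B0 = {field_for_frequency(nu):5.2f} T; "
          f"0.01 ppm = {ppm_to_hz(0.01, nu):4.1f} Hz; 7 Hz coupling = {7 / nu:.4f} ppm")
```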

Figure 1.2

400 and 60 MHz 1H NMR spectra of the same urine sample from a control participant. Assignments: [1] trimethylsilylpropanoic acid–CH3; [2] acetone–CH3; [3] pyruvate–CH3; [4] citrate–CH2A/B; [5] creatinine/creatine–CH3; [6] cis-aconitate–CH2; [7] taurine–CH2; [8] trimethylamine-N-oxide–CH3; [9] glycine–CH2; [10] taurine–CH2; [11] unassigned–CH2; [12] creatine–CH2; [13] glycolate–CH2; [14] creatinine–CH2; [15] H2O–OH; [16] histidine–CH; [17] indoxyl sulphate–CH; [18] indoxyl sulphate–CH; [19] hippurate–CH; [20] hippurate–CH; [21] hippurate–CH; and [22] formate–CH.

Most biological fluids, such as urine and saliva, are predominantly composed of water, and therefore pulse sequences have been developed to suppress its resonance so that the signals of interest can be observed. Pulse sequences such as water suppression enhanced through T1 effects (WET), one-dimensional nuclear Overhauser effect spectroscopy with presaturation (NOESY),18  presaturation (PRESAT) and WATERGATE19  are highly suitable for analysing samples whose spectra contain the broad, intense signal arising from the 1H nuclei in H2O. In presaturation, for example, the water signal is selectively irradiated at its characteristic frequency before excitation, saturating it so that it contributes little signal; this strategy reveals metabolites present at lower concentrations whose resonances lie at similar frequencies (chemical shift values, δ).

Furthermore, biological fluids such as blood serum or plasma contain low-molecular-mass metabolites together with large macromolecules, usually proteins and lipoproteins. The metabolite signals are superimposed on the broad signals of these macromolecules, which distorts the baseline and obscures or broadens metabolite resonances. In this case, applying a Carr–Purcell–Meiboom–Gill (CPMG) pulse sequence overcomes the problem by exploiting the differences in the transverse (T2) relaxation of metabolites and macromolecules: a train of spin-echo pulses attenuates the fast-relaxing signals arising from large macromolecules, such as proteins, so that they are largely removed from the spectrum.20
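The selectivity of the CPMG filter can be illustrated with a back-of-the-envelope calculation. Assuming simple mono-exponential transverse relaxation, a signal retains a fraction exp(−t/T2) of its magnetisation after a total spin-echo time t = 2nτ (n repetitions of τ–180°–τ). The T2 values and echo timing below are hypothetical, illustrative numbers, and effects such as diffusion, J-modulation and pulse imperfections are ignored.

```python
import math


def cpmg_retained_fraction(t2_s: float, echo_delay_s: float, n_loops: int) -> float:
    """Fraction of transverse magnetisation remaining after a CPMG echo train.

    Assumes mono-exponential T2 decay over a total echo time of
    2 * echo_delay_s * n_loops (tau - 180 - tau repeated n_loops times).
    """
    total_echo_time = 2.0 * echo_delay_s * n_loops
    return math.exp(-total_echo_time / t2_s)


# Hypothetical timing: tau = 0.4 ms and 100 loops, i.e. 80 ms of total echo time.
tau, loops = 0.0004, 100
macromolecule = cpmg_retained_fraction(t2_s=0.010, echo_delay_s=tau, n_loops=loops)    # T2 ~ 10 ms
small_metabolite = cpmg_retained_fraction(t2_s=1.0, echo_delay_s=tau, n_loops=loops)   # T2 ~ 1 s
print(f"macromolecule retains {macromolecule:.4f}; small metabolite retains {small_metabolite:.3f}")
```

With these assumed values the broad protein signal is reduced to well below 0.1% of its original intensity, while the small-molecule signal keeps over 90%.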

Analysis of biofluids has been used for both pharmacokinetic studies and biomarker detection, and requires multidimensional data analysis using sophisticated and comprehensive statistical techniques.21

Although the metabolome was explored centuries ago through organoleptic analysis, such as smelling urine as a means of diagnosing diseases, bioanalytical techniques were first applied to analyse and determine molecules in urine by Pauling et al. (1971),22  and the metabolome concept has been reviewed by Kell and Oliver (2016).23  Pioneering work by the groups of Nicholson and Sadler first detected drug metabolites using 1H NMR analysis in the early 1980s, observing acetaminophen and its metabolites.24  These groups were also the first to monitor metabolic conditions in urine samples.24,25  Over the last 20 years, the metabolome itself, and the means of applying multianalyte bioanalytical techniques to it, have been further defined by others, such as Nicholson and Lindon.26  The progressive development of techniques employed for metabolomics studies, and their establishment as key investigatory tools, has been documented by Kell and Oliver.23  The study of the metabolome allows metabolic processes within a system to be monitored by identifying and determining low-molecular-mass metabolites within biofluids, tissues or cells. The metabolome can be affected by many biological processes, either through external stimuli, such as an intervention (e.g. a medication, diet or exercise regimen), or through internal stimuli. An internal stimulus can be introduced via the modification of gene expression using techniques such as cell transfection, both in vivo and in vitro. Metabolomics techniques assess these changes by providing a ‘snapshot’ of the status of biological processes occurring at a specific point in time.
This responsive information can provide a high level of detail regarding cellular metabolic processes, facilitate phenotypic evaluations, and hence yield an overall ‘picture’ or ‘fingerprint’ of the chemopathological status of a disease. Even more valuably, metabolomics is able to probe changes in disease status, for example the effects of a drug treatment, the removal of a tumour or the regression of an inflammatory condition. Hence, these strategies may be successfully employed to monitor the severity and progression of a disease, information which further increases our understanding of the aetiology, manifestation and progression of particular conditions.27

Two approaches can be undertaken in metabolomics, the choice depending primarily on the objectives of the study and the hypotheses formulated:28

  • A targeted approach focused on the quantitative analysis of a limited number or class of metabolites that are linked by one or more specific biochemical pathways. This makes it possible to compare variations in metabolites in a precise and specific way.

  • The objective of an untargeted approach is to detect “all” metabolites present in biological samples. It involves simultaneous analysis of as many metabolites as possible.

Targeted analysis is usually performed by mass spectrometry (MS), more specifically MS as a detection system for liquid- or gas-chromatographically separated metabolites, whose sensitivity makes it an excellent method for quantifying and identifying a set of specific metabolites. Because it requires no prior knowledge of the compounds present in a sample and captures the global metabolic profile, NMR is ideally suited to an untargeted approach, but thanks to its straightforward quantification capacity, it can also be used in a targeted manner for selected metabolites.

NMR is able to detect and quantify metabolites in a high-throughput, simultaneous, non-destructive manner and requires minimal sample preparation. LC–MS strategies can target specific metabolites more sensitively than NMR, but lack the latter's capacity for absolute metabolite quantification.29  A combination of LC–MS and NMR methodologies is therefore important for a global understanding of the metabolic effects involved in disease processes, and can be essential for thorough metabolite determinations; a full description of the strengths and weaknesses of these techniques can be found in Nicholson and Lindon.26

A number of other analytical techniques can be applied, such as Fourier transform infrared spectroscopy (FTIR), ultraviolet–visible spectroscopy (UV–Vis) and variants of NMR and LC–MS.30  However, their applications remain limited and specific, each having its own advantages and limitations.

When handling very large and complex datasets from untargeted metabolomics approaches, which include a plethora of metabolites at a range of concentrations, it can be difficult to interpret and understand the significance of the data acquired. Multivariate (MV) statistics have been integrated into multicomponent NMR analysis of complex mixtures such as biofluids in view of the large number of datapoints produced. MV statistics aid the conversion of large datasets into visual formats that are more comprehensible to the analyst, via the recognition of patterns or signatures in multianalyte datasets. The combination of an analytical technique with MV and univariate statistical analysis strategies in this manner is termed ‘chemometrics’.
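Principal component analysis (PCA), listed earlier among Pearson's contributions, is a typical example of how MV statistics turn a samples × buckets matrix into a low-dimensional picture. The following is a minimal NumPy sketch, not the implementation of any particular chemometrics package: the mean-centred matrix is decomposed by singular value decomposition, and the scores (one point per sample) can then be plotted to reveal groupings. The toy data and group structure are hypothetical.

```python
import numpy as np


def pca_scores(X: np.ndarray, n_components: int = 2):
    """PCA of a (samples x buckets) matrix via singular value decomposition.

    Columns are mean-centred first; returns scores, loadings and the
    fraction of total variance explained by each retained component.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T                     # bucket contributions
    explained = (s**2 / np.sum(s**2))[:n_components]
    return scores, loadings, explained


# Hypothetical toy data: 10 "control" and 10 "case" spectra of 50 buckets,
# where the cases have elevated intensity in buckets 10-14.
rng = np.random.default_rng(0)
X = rng.normal(1.0, 0.05, size=(20, 50))
X[10:, 10:15] += 0.5
scores, loadings, explained = pca_scores(X)
print(explained.round(2))
```

PCA is unsupervised: the group labels are never used in the calculation, and separation along PC1 emerges only because the case/control difference dominates the variance. Supervised methods are needed when the effect of interest is smaller than other sources of variation.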

Any metabolomic approach is usually initiated by one or more biological/clinical questions that the clinician or biologist wishes to answer. Whether monitoring the effects of treatment or improving diagnostic tools, the experimental design must be carefully considered by assessing all sources of variation in order to reduce bias and avoid the introduction of irrelevant variability. Samples should be collected with appropriate research ethics approval and the informed consent of all participants (both healthy and diseased, where relevant), and in metabolomics studies it is important to maintain a consistent cohort. Physiological factors, such as age, fasting or non-fasting protocols, gender, diet, physical activity, medical conditions, family medical history and genotype, should all be taken into account prior to sample collection.8  Analytical factors should also be considered; for example, samples should be collected in a uniform manner, using the same collection tubes, whilst remaining mindful that some collection vessels may contain substances with NMR resonances that interfere with the spectra acquired. This is common with anticoagulants used for blood collection: ethylenediaminetetraacetic acid (EDTA), for example, gives rise not only to signals from the EDTA chelator itself, but also to those of its slowly-exchanging Ca2+- and Mg2+-complexes.31  Likewise, the citrate anticoagulant has intense 1H NMR signals. Lithium heparin tubes are therefore often recommended for plasma collection in order to avoid interfering signals. Contamination can occur in the early stages of sample collection, so sterility must be ensured, and this also needs to be considered during sample transportation.

Sample stability is also an important experimental factor. Some samples are unstable at ambient temperature, to which they may be exposed whilst waiting on an autosampler. This can cause degradation, changes in concentration or a complete loss of metabolites. A common pitfall of biofluid storage is microbiological contamination of samples at ambient or even lower temperatures. Biological fluids should therefore be stored at low temperatures, typically −80 °C, prior to analysis.32  Sodium azide can be added to inhibit microbial growth, which would otherwise alter metabolite levels, whilst samples are maintained at ambient temperature on an NMR sample belt.33  Furthermore, freeze–thaw cycles should be minimised; indeed, it has been shown that plasma samples should undergo no more than three freeze–thaw cycles prior to analysis.32

A suitable internal or external standard provides a reference signal outside the spectral region of analyte interest without interfering with the sample. For example, an internal standard, typically sodium 3-(trimethylsilyl)[2,2,3,3-d4]propionate (TSP), can be added to saliva and urine samples. However, TSP can bind to proteins in plasma, serum, synovial fluid and cerebrospinal fluid samples,34  and therefore is not added in these cases. A more suitable internal standard can be used instead, such as 4,4-dimethyl-4-silapentane-1-ammonium trifluoroacetate, which has been shown to have limited interactions with proteins35  and has also been proposed as an internal standard that does not interact with cationic peptides; alternatively, an external standard contained in a capillary can be used. Other useful internal standards for C2HCl3-based biofluid and tissue biopsy lipid extracts include 1,3,5-trichlorobenzene and tetramethylsilane, although the latter is not recommended as it readily evaporates. Electronic REference To access In vivo Concentrations (ERETIC) can also be used as an electronic standard for biofluid or tissue extract NMR samples.34
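Where an internal standard such as TSP is used for quantification, the usual relationship compares integrals per contributing proton: Ca = Cstd × (Ia/Na)/(Istd/Nstd), where I is a signal integral and N the number of protons giving rise to it. The sketch below assumes fully relaxed, quantitative acquisition conditions and a standard that is free in solution (protein binding, as noted above for TSP in plasma or serum, would invalidate it); the integral values and the 1.0 mmol L−1 TSP concentration are hypothetical. The TSP trimethylsilyl singlet arises from 9 protons and the creatinine N-methyl singlet from 3.

```python
def concentration_from_internal_standard(i_analyte: float, n_analyte: int,
                                         i_std: float, n_std: int, c_std: float) -> float:
    """Analyte concentration from integrals relative to an internal standard.

    Assumes each integral is proportional to concentration times the number of
    contributing protons (fully relaxed spectra, unbound standard).
    """
    return c_std * (i_analyte / n_analyte) / (i_std / n_std)


# Hypothetical integrals: TSP Si(CH3)3 singlet (9 H) at 1.0 mmol/L; creatinine N-CH3 singlet (3 H).
c_creatinine = concentration_from_internal_standard(
    i_analyte=12.0, n_analyte=3, i_std=9.0, n_std=9, c_std=1.0)
print(f"creatinine: {c_creatinine:.2f} mmol/L")
```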

Buffering of the sample is also important in NMR, as changes in pH can modify chemical shift values.34  Indeed, biofluid pH values vary significantly between sample classes in vivo, and biofluids are therefore typically buffered to pH 7.0 or 7.4.34  For example, in blood plasma the 1H NMR profiles of histidine, tyrosine and phenylalanine are all affected by pH, and both tyrosine and phenylalanine can be NMR-invisible at neutral pH in view of their binding to albumin.36
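Why pH matters can be seen from a simple two-state model. For a proton near a titratable group in fast exchange, the observed chemical shift is the population-weighted mean of the shifts of the protonated (HA) and deprotonated (A−) forms, with populations given by the Henderson–Hasselbalch equation: δobs = (δHA + δA−·10^(pH−pKa))/(1 + 10^(pH−pKa)). The pKa and limiting shifts below are hypothetical values chosen only for illustration; with them, a change from pH 7.0 to 7.4 moves the signal by about 0.05 ppm, comparable to a typical bucket width.

```python
def observed_shift(ph: float, pka: float, delta_acid: float, delta_base: float) -> float:
    """Population-weighted chemical shift of a titratable group in fast exchange.

    The deprotonated fraction follows Henderson-Hasselbalch: 1 / (1 + 10**(pKa - pH)).
    """
    frac_base = 1.0 / (1.0 + 10.0 ** (pka - ph))
    return (1.0 - frac_base) * delta_acid + frac_base * delta_base


# Hypothetical, illustrative parameters (ppm for shifts).
params = dict(pka=6.0, delta_acid=8.6, delta_base=7.6)
for ph in (6.0, 7.0, 7.4):
    print(ph, round(observed_shift(ph, **params), 3))
```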

Appropriate extraction techniques can be performed for solid samples such as leaves, seeds, biological tissues, cells, foods, drugs and so forth. However, it is critically important that no solids are retained in liquid sample extracts as this will interfere with the homogeneity of the magnetic field. Freeze-drying and/or drying with liquid nitrogen, followed by vortexing and centrifuging, are often necessary to ensure there is no retention of any solid sample debris.34 

Acquisition parameters should, of course, be kept uniform throughout the acquisition process; full recommendations for such parameters are available in Emwas et al.33  The temperature in the NMR room and, more importantly, within the NMR probe should be consistent. Pulse and acquisition parameters, including the number of scans, acquisition time and number of acquisition points, should be kept constant. The NMR instrument should be shimmed, tuned and matched appropriately.33  Occasionally, backward linear prediction (BLIP) can be appropriate to remove spectral artefacts, and forward linear prediction (FLIP) to extend a truncated free induction decay (FID) when the acquisition time is too short, processes that can improve spectral quality and resolution. In metabolomics studies, these need to be applied consistently throughout the post-processing stage to ensure that apparent signals are not merely part of a “ringing pattern” or noise.

A careful experimental design ensures that any statistically significant results reflect genuine effects rather than errors introduced in the early sampling stages. Guidelines for urinary profiling have been established in the literature.33  However, there is no harmonisation across the field in view of the range of NMR field strengths and experimental parameters, such as pulse sequences, in use.

Before performing statistical analyses on spectral data, it is important to apply several pre-treatment steps that ensure the quality of the raw data and limit possible biases. Indeed, in view of possible imperfections in the acquisition (e.g. noise), the signal processing, and the intrinsic nature of the biological samples (such as dilution effects between samples), it is very often necessary to apply corrective measures to the spectra acquired.37  Different treatment methods can be used, each with its own advantages and disadvantages. The choice of method depends on the biological issue to be addressed, the properties of the samples analysed and the data analysis methods selected. Most of these post-processing steps are applicable not only to NMR datasets, but also to LC–MS ones.

  • (A) Raw NMR spectral data acquisition

    One crucial step is to ensure appropriate dilution of the metabolites so that the internal standard, if one is used, can be referenced. Furthermore, internal standards at much lower concentrations than the monitored metabolites may give rise to inaccurate results. Water suppression may also dampen the intensities of signals in close proximity to the water signal; for example, it has been shown that both the α- and β-glucose proton resonances can be substantially affected by suppression techniques if the power is not adjusted accordingly when using certain pulse sequences, for example NOESY PRESAT.16,33,38  Further important quality-control assessments include the recognition of signals from drugs, their metabolites, alcohol and so forth, which are commonly found in biofluid matrices. These resonances, if not properly identified, may interfere and lead to spurious statistical significance, misassignment of biomolecule signals, and unrecognised drug-induced modifications to the biofluid profiles explored. Positive signal identification is therefore important for ensuring valid statistical significance, and is discussed in greater depth below.

  • (B) Phase and baseline corrections

    Phase correction is crucial to ensure that signals have a uniform lineshape and that no negative signals or baseline roll are present, since these can artificially raise or lower bucket intensities and thereby distort, for example inflate, the apparent statistical significance. Baseline correction also helps to ensure accurate signal integration.33

  • (C) Alignment

    A regular problem encountered during data processing is the shifting of signals between the NMR profiles of different samples. Several parameters can influence these peak shifts: instrumental factors, pH modifications, temperature variations, differing salt concentrations or variable concentrations of specific ions. This problem is frequently encountered with urine samples, whose pH values are particularly variable and which are subject to large variations in dilution.39  Several algorithms are available to realign peaks, each with its own advantages and drawbacks. By shifting, stretching or compressing the spectra along their horizontal (chemical shift) axis, these methods maximise the correlation between them (Figure 1.3).

  • (D) Bucketing

    Classically acquired NMR spectra consist of several thousand data points, whereas NMR spectral data acquired on biological tissue extracts or biofluids contain information corresponding to only about 50 to 100 metabolites. This gap between the number of available variables and the number of useful variables must be reduced before statistical processing. This segmentation stage, called bucketing (or binning), primarily reduces the dimensionality of the dataset by extracting N variables from each acquired metabolic profile (spectrum). This approach also diminishes the problem of spectral misalignment.

    The most common technique segments the spectrum into N windows of equal width, known as bins or buckets, usually 0.01 to 0.05 ppm wide. The total area within each bucket is measured instead of individual intensities, leading to a smaller number of variables. However, because of the inflexibility of this segmentation, parts of the same resonance or peak could be split into two or more bins, dividing the chemical information between them, which could influence the subsequent data analysis. To address this problem, segmentation into variable intervals was developed. This technique, called intelligent bucketing, attempts to split the spectrum so that each bucket contains only one signal, peak or pattern (Figure 1.4). Of note, this method is highly sensitive to pH variations, and therefore spectral realignment needs to be optimal before it is applied.8

    Bucketing thus involves integrating the signal within each bucket to create an NMR data matrix. It is important that alignment is performed beforehand to ensure that a given resonance is not split across two integration regions rather than one.

  • (E) Normalisation

    Normalisation is then performed in order to maximise the information extracted while minimising the noise and variability arising from any sample dilutions involved. It is applied to each spectrum, and aims to render samples comparable with each other, including across repeated runs. Furthermore, it allows minimisation of possible biases introduced by the experimenter when collecting, handling and preparing samples.40  Ideally, a metabolite constitutively expressed in biofluids or tissues could serve as an internal standard. One of the only metabolites used for this purpose in metabolomics analysis is creatinine, and creatinine normalisation is widely applied to urine samples. However, it remains controversial, with a growing number of studies linking creatinine variations to age, weight, exercise or gender.41,42  Moreover, creatinine normalisation should not be applied to solutions containing more than 2.5% (v/v) 2H2O, as deuterium has been shown to exchange with the 1H nuclei of the –CH2 group of this biomolecule, a process that gives rise to time-dependent decreases in its signal intensity.43  To overcome this lack of reliable internal standards, several normalisation methods have been developed.

    Normalisation can be expressed either as a percentage of the entire spectral area or relative to the signal intensity of an internal standard. Some resonances, including those of xenobiotics, urea and water in biofluid samples, may need to be removed prior to this process, even though they may be of some metabolomic or diagnostic/prognostic importance.

    Quantile normalisation gives all spectra the same distribution of bucket intensities: the values within each spectrum are sorted in ascending order, the mean across spectra is calculated at each rank, and each value is replaced by the mean for its rank. If spectra share the same distribution, all the quantiles will be identical; for example, the intensity of the highest-concentration metabolite in each spectrum will be set to the mean of the highest values across all spectra.44  However, the highest-concentration metabolite may vary significantly across different samples, and therefore this mean value may not be appropriate for all samples. Following quantile normalisation, each spectrum consists of the same set of values; however, within features, the distribution will differ.44  Similarly, cubic-spline normalisation aims to give spectra the same distribution of metabolite features, but assumes non-linear relationships between a baseline and the individual spectra. In the cubic spline method, the geometric mean of each spectral feature across all spectra is calculated to form the baseline, and a cubic spline is then fitted between the baseline and the spectral intensities several times in order to achieve normalisation. By contrast, variance stabilisation normalisation (VSN) operates differently from the methods described above, and aims to maintain a constant variance for each predictor variable within the dataset.

    Other normalisation methods, including probabilistic quotient normalisation, histogram matching, contrast normalisation and cyclic locally-weighted regression, can also be considered for metabolomics datasets, but are beyond the scope of this work.

  • (F) Scaling or bucket normalisation

    Scaling can then be performed to standardise each bucket region. The buckets associated with the most concentrated metabolites have a greater variance than others, so in variance-based multivariate data analyses some buckets may carry greater weight than others. To avoid this bias, it is essential to rescale the weight of each variable. This can be performed using autoscaling, which subtracts the column mean from each observation and divides by the standard deviation, or the widely preferred Pareto scaling, which also subtracts the column mean but then divides by the square root of the standard deviation. Hence, Pareto-scaled variables do not have unit variance: the variance of each becomes equal to its original standard deviation, so differences between variables are reduced but not eliminated. For example, urinary metabolites present at low concentrations, such as formate, produce lower intensities than that of creatinine, and scaling these variables accordingly ensures that each variable (column, metabolite) contributes comparably to the analysis, irrespective of its original concentration.

    Scaling methodologies have been reviewed by Gromski et al.,45  whose findings suggest that VAST (variable stability) scaling is the best methodology for NMR data; this represents an extension of autoscaling. Small variations in predictor variables are accounted for using this method, as the autoscaled data are further multiplied by the ratio of each variable's mean to its standard deviation (the inverse of its coefficient of variation), which gives greater weight to stable variables.45  Other scaling methodologies, such as range scaling and level scaling, are not explored herein.

    Data transformation is also useful for making distributions more bell-shaped, that is, for reducing distributional skewness; logarithmic or cube-root transformations are often recommended for metabolomics datasets.

    Some authors recommend spectral or chromatographic smoothing for noise reduction; however, small signals clearly need to be retained as far as possible during this process.46 Overall, the quality of spectral data pre-processing prior to statistical analysis determines the quality and accuracy of the results.
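The scaling and transformation options above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a hypothetical bucket table `X` (rows = samples, columns = buckets) of strictly positive intensities; the VAST formula used here (autoscaled values multiplied by mean/SD) is one common formulation and is an assumption, not the exact implementation reviewed by Gromski et al.

```python
import numpy as np

def autoscale(X):
    """Mean-centre each bucket (column) and divide by its standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pareto_scale(X):
    """Mean-centre each bucket and divide by the square root of its standard deviation."""
    return (X - X.mean(axis=0)) / np.sqrt(X.std(axis=0, ddof=1))

def vast_scale(X):
    """Autoscale, then multiply by mean/SD (the inverse coefficient of variation)."""
    mean, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    return (X - mean) / sd * (mean / sd)

def skewness(x):
    """Sample skewness (third standardised moment)."""
    d = x - x.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

# Hypothetical bucket table: 200 samples x 3 buckets of very different magnitude,
# e.g. a creatinine-like (large) bucket and a formate-like (small) bucket.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=[4.0, 1.0, 0.0], sigma=0.8, size=(200, 3))

auto = autoscale(X)        # every column now has unit variance
pareto = pareto_scale(X)   # each column's variance equals its original SD
vast = vast_scale(X)       # low-CV (stable) buckets receive more weight
log_X = np.log(X)          # log transform: requires strictly positive intensities
cbrt_X = np.cbrt(X)        # cube-root transform: also defined for zero or negative values
```

Comparing the column variances makes the difference concrete: autoscaling forces unit variance, whereas Pareto scaling leaves each bucket with a variance equal to its original standard deviation, and both transforms reduce the right skew of the simulated intensities.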

Figure 1.3

Effect of realignment on the creatinine signal: before (upper panel) and after (lower panel) realignment.

Figure 1.4

1H-NMR spectrum before (A) and after (B) equidistant bucketing. (C) Equidistant bucketing, in which the bucket size is constant. As shown, a single resonance could be erroneously divided into two buckets. (D) Intelligent bucketing, in which each signal is divided into only a single bucket.

Positive signal identification can be performed without statistical approaches, and there is a plethora of metabolite databases and identification platforms, such as the Human Metabolome Database (HMDB),47 MetaboLights,48 the BioMagResBank (BMRB),49 the Spectral Database for Organic Compounds (SDBS), the Madison-Qingdao Metabolomics Consortium Database (MMCD),50 the Birmingham Metabolite Library (BML-NMR),51 NMRshiftDB52 and the Metabolomics Workbench,53 which can markedly facilitate signal identification. In NMR, the multiplicity, integral, J couplings and chemical shift value of each signal must be considered prior to its assignment. However, statistical approaches have also been used in conjunction with bioanalytical techniques to identify signals that are correlated with each other, a process that also facilitates assignments. These methodologies have been demonstrated both for model samples containing just a single molecule and for complex mixtures such as biofluids, and they provide a pseudo-two-dimensional NMR spectrum. Figure 1.5 shows confirmation of the identity of n-butyric acid in faecal water using the statistical total correlation spectroscopy (STOCSY) strategy, which demonstrates the ability of this technique to tackle such complex mixtures.54 Figure 1.6 shows the elucidation of a mixture of sucrose and glucose signals using the STOCSY approach.
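The core STOCSY calculation is a correlation between one "driver" bucket and every other bucket across a set of spectra. The sketch below is illustrative only: it assumes a hypothetical, already aligned and normalised bucket matrix containing simulated n-butyrate-like signals (terminal-CH3, β-CH2 and α-CH2) plus one unrelated signal. A common display plots the covariance trace coloured by r², but plotting is omitted here.

```python
import numpy as np

def stocsy(X, driver):
    """Correlation (r) and covariance between the driver bucket and every bucket in X."""
    Xc = X - X.mean(axis=0)
    d = Xc[:, driver]
    cov = Xc.T @ d / (X.shape[0] - 1)
    r = cov / (Xc.std(axis=0, ddof=1) * d.std(ddof=1))
    return r, cov

rng = np.random.default_rng(1)
conc = rng.uniform(0.5, 2.0, size=40)      # hypothetical n-butyrate levels in 40 samples
other = rng.uniform(0.5, 2.0, size=40)     # an unrelated metabolite
# Columns: terminal-CH3, beta-CH2, alpha-CH2 (same molecule), then the unrelated signal.
X = np.column_stack([3 * conc, 2 * conc, 2 * conc, other])
X = X + rng.normal(scale=0.05, size=X.shape)   # measurement noise
r, cov = stocsy(X, driver=1)                   # beta-CH2 bucket as the driver
r_squared = r ** 2
```

Buckets arising from the same molecule share its concentration variation across samples, so they correlate strongly with the driver, whereas the unrelated signal does not.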

Figure 1.5

STOCSY analysis of faecal water, with a correlation scale of R2 = 0.0–1.0 (right-hand ordinate axis), showing two other signals (those of the terminal-CH3 (t) and α-CH2 (t) functions) correlated with the β-CH2 function driver signal at δ = 1.562 ppm, which aids the positive identification of n-butyrate.54  Reproduced from ref. 54, https://doi.org/10.1016/j.csbj.2016.02.005, under a CC By 4.0 license, https://creativecommons.org/licenses/by/4.0/.

Figure 1.6

STOCSY analysis of a sucrose/glucose admixture (metabolites are represented by the letters S and G respectively).55  Reproduced from ref. 55 with permission from American Chemical Society, Copyright 2007.

STOCSY has also been used in conjunction with the statistical recoupling of variables, giving rise to the R-STOCSY approach, which reveals correlations between distant clusters of signals.56 Earlier methodologies for applying statistics to NMR spectra, predominantly from Nicholson's group, include statistical heterospectroscopy (SHY), which exploits the covariance between signals to reveal correlations between datasets acquired with two different analytical techniques, such as MS and NMR; SHY, and the application of STOCSY to MS data, have been demonstrated by Crockford et al.57 and Nicholson et al.58 However, sufficient computing power is required to perform these techniques.57 Diffusion-ordered spectroscopy (DOSY) and STOCSY have also been combined to yield the S-DOSY technique, which can be employed in complex mixture analysis to facilitate assignments, deconvolute overlapping metabolite signals and compare the diffusional variances of signals.55
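SHY extends the same correlation idea across two platforms: each NMR bucket is correlated with each MS feature measured on the same samples. The following minimal sketch uses hypothetical simulated matrices and the hypothetical helper name `shy_correlation`. Note that the result has one entry per NMR/MS variable pair, which is why full-resolution datasets demand substantial computing power.

```python
import numpy as np

def shy_correlation(X_nmr, X_ms):
    """Pearson correlation between every NMR bucket and every MS feature,
    computed across the same samples: shape (n_nmr_buckets, n_ms_features)."""
    Zx = (X_nmr - X_nmr.mean(axis=0)) / X_nmr.std(axis=0, ddof=1)
    Zy = (X_ms - X_ms.mean(axis=0)) / X_ms.std(axis=0, ddof=1)
    return Zx.T @ Zy / (X_nmr.shape[0] - 1)

rng = np.random.default_rng(3)
X_nmr = rng.normal(size=(30, 5))          # hypothetical NMR buckets, 30 samples
X_ms = np.column_stack([
    4.0 * X_nmr[:, 2] + rng.normal(scale=0.2, size=30),  # MS feature from the same metabolite
    rng.normal(size=30),                                  # unrelated MS feature
])
R = shy_correlation(X_nmr, X_ms)
```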

Other useful non-statistical techniques include 2D NMR experiments, such as 1H–1H COSY and 1H–13C HSQC, which help with the assignment of NMR signals without involving such statistical complexity. Metabolite prediction has also been trialled by relating the chemical shifts of signals to the concentration of the biofluid itself, generating a chemical shift and concentration dataset matrix. From this, a prediction model was constructed, comprising an algorithm model and salient navigator signals, such as those of creatine, creatinine and citrate, to aid prediction capability.59 Chemical shift prediction remains at an early stage of development, and requires uniform sample preparation and operating frequencies in order to achieve successful assignments, as has been demonstrated for proteins60 and multicomponent biofluids.59

At present there are numerous computational packages that can support the statistical analysis of NMR datasets. These include XLSTAT 2019, an add-on to Microsoft Excel; MetaboAnalyst 4.0,61 a user-friendly online interface built on R scripts; SIMCA, an all-in-one software package for multivariate analysis; MATLAB; MVAPACK; and the script-based programming languages Python and R with their associated packages. Most of the statistical methods described here can be applied with any of these packages, several of which (for example MetaboAnalyst, MVAPACK, Python and R) are freely available to researchers.

Statistical analysis can be univariate or multivariate, each of which offers advantages and disadvantages, most of which are covered by Saccenti et al.62 Univariate analysis is simple to implement; however, it does not consider inter-relationships between metabolite concentrations. Metabolites may be independent or dependent variables, but they are interlinked via metabolic pathways and may be correlated with other metabolites in the system. Moreover, statistical power is limited when only one metabolite is observed at a time. Multivariate analysis can be problematic in view of the high dimensionality of the data, which can mask metabolites and make noise or unimportant variables appear significant when they are not. Usually, a combination of univariate and multivariate statistics addresses these issues. Most of these tests rest on certain assumptions, which can be found in any statistical textbook, for example that the data have been suitably pre-processed, normalised and scaled, as discussed above.

Each statistical technique herein will be described, and a case study showing its application in 1H NMR spectral analysis will be considered. A range of applications will be explored to show the diversity of fields in metabolomics, but the predominant theme will be biofluids and liquid biopsies. A summary table showing the advantages and disadvantages of each technique in metabolomics applications will be provided at the end of this chapter. Often, a combination of techniques (for example 1D and 2D NMR spectroscopy, LC–MS and so forth) is used to classify samples and to establish the statistical significance of the results acquired, as required for validation. This chapter covers the statistical methods most frequently employed in metabolomics research investigations at present.

Univariate data analysis is crucial in any metabolomics data analysis strategy. A variable may be insignificant in a multivariate model, but significant in a univariate context, because multivariate models can miss or mask significant variables when all metabolites (and metabolite relationships) are examined simultaneously. Hence, univariate data analysis should be integrated into metabolomics experimental designs; this is particularly important for the validation of specific potential biomarkers.

Student's t-tests can be used to assess the statistical significance of differences in univariate datasets consisting of two-sample comparisons, or more if suitable corrections for the false discovery rate are applied. There are several variations of this test that rely on similar concepts, including the unequal-variance (Welch's) t-test, together with the non-parametric Mann–Whitney U test, which is not recommended here. These tests may be paired or unpaired, depending on whether the samples are dependent (for example, repeated measurements on the same participants) or independent, respectively. An unpaired t-test evaluates the statistical significance of any difference between the mean values of two independent groups, and the degrees of freedom are taken into account when establishing that significance. As with all other parametric tests for evaluating differences between mean values, the critical assumptions of normality and intra-sample variance homogeneity (relaxed in the unequal-variance t-test) apply, as does additivity in randomised-block ANOVA designs without replication, in which interactions between predictor variables cannot be considered.

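$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \tag{1.1}$$

$$t = \frac{\bar{x}_d - \mu_0}{s_d/\sqrt{n}} \tag{1.2}$$

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \tag{1.3}$$

$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \tag{1.4}$$

$$\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} \tag{1.5}$$

Here, x̄ represents the mean, µ0 is the population mean specified by the null hypothesis, s is the standard deviation, n is the sample size, x̄d and sd are the mean and standard deviation of the paired differences, s1² and s2² are the variances of groups 1 and 2, n1 and n2 are the corresponding sample sizes, s² is the pooled sample variance, and ν is the number of degrees of freedom.

In eqn (1.1) and (1.2) the degrees of freedom are (n−1); in eqn (1.2), n represents the number of paired samples. For eqn (1.3), the degrees of freedom are (n1+n2−2). The Welch–Satterthwaite equation, eqn (1.5), is required to calculate the degrees of freedom for eqn (1.4).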
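The one-sample, paired, pooled and unequal-variance t statistics described above can be computed directly and cross-checked against SciPy's `ttest_rel` and `ttest_ind`. The sketch below assumes simulated, hypothetical data rather than the salivary datasets discussed; the helper names (`one_sample_t`, `paired_t`, `pooled_t`, `welch_t`, `two_sided_p`) are illustrative and not part of any library.

```python
import numpy as np
from scipy import stats

def one_sample_t(x, mu0=0.0):
    """t = (mean - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n)), n - 1

def paired_t(before, after):
    """Paired test: the one-sample test on the differences (n = number of pairs)."""
    return one_sample_t(np.asarray(after, dtype=float) - np.asarray(before, dtype=float))

def pooled_t(x1, x2):
    """Unpaired Student's t-test with pooled variance; n1 + n2 - 2 degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    s2 = ((n1 - 1) * np.var(x1, ddof=1) + (n2 - 1) * np.var(x2, ddof=1)) / (n1 + n2 - 2)
    return (np.mean(x1) - np.mean(x2)) / np.sqrt(s2 * (1 / n1 + 1 / n2)), n1 + n2 - 2

def welch_t(x1, x2):
    """Unequal-variance t statistic with Welch-Satterthwaite degrees of freedom."""
    v1, v2 = np.var(x1, ddof=1) / len(x1), np.var(x2, ddof=1) / len(x2)
    t = (np.mean(x1) - np.mean(x2)) / np.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(x1) - 1) + v2 ** 2 / (len(x2) - 1))
    return t, df

def two_sided_p(t, df):
    """Two-sided p value from the t distribution."""
    return 2 * stats.t.sf(abs(t), df)

rng = np.random.default_rng(4)
before = rng.normal(10.0, 2.0, size=12)           # hypothetical pre-treatment levels
after = before + rng.normal(1.5, 1.0, size=12)    # same participants, post-treatment
group1 = rng.normal(5.0, 1.0, size=15)            # two hypothetical independent groups
group2 = rng.normal(6.0, 2.5, size=20)

t_paired, df_paired = paired_t(before, after)
t_welch, df_welch = welch_t(group1, group2)
p_welch = two_sided_p(t_welch, df_welch)
```

Because the two simulated groups have clearly different spreads, the unequal-variance test is the appropriate unpaired choice here; its degrees of freedom are generally non-integer.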

Percival et al. applied a paired Student's t-test in a metabolomics investigation monitoring methanol and other metabolites in saliva using 1H NMR analysis.38 Two samples were collected from each smoking participant, before and after smoking a single cigarette; thus, a paired test was appropriate. The paired t-test showed highly significant differences for molecules such as methanol and propane-1,2-diol, which were significantly elevated post-smoking (p < 10−6 and p = 2.0 × 10−4, respectively).

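The Mann–Whitney U test compares every observation in one sample group with every observation in the other, and counts the number of pairs in which the observation from the first group is the larger (ties count as one half); this count is the U statistic, and it is computed for both sample groups. Dividing U by the product of the two sample sizes gives the area under the receiver operating characteristic (ROC) curve, which will be described in more detail later.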
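The pair-counting definition of U, and its link to AUROC, can be made explicit with a toy example. The data below are hypothetical, and the helper names are illustrative only; ties are counted as one half.

```python
import numpy as np

def mann_whitney_u(x, y):
    """U for group x: number of (x_i, y_j) pairs with x_i > y_j; ties count one half."""
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[None, :]
    return float(np.sum(x > y) + 0.5 * np.sum(x == y))

def auroc_from_u(x, y):
    """Normalising U by n1*n2 gives the AUROC for 'x tends to be larger than y'."""
    return mann_whitney_u(x, y) / (len(x) * len(y))

cases, controls = [3, 5, 7], [1, 2, 4]
u_cases = mann_whitney_u(cases, controls)      # 8.0: 8 of the 9 pairs favour cases
u_controls = mann_whitney_u(controls, cases)   # 1.0; the two U values sum to n1*n2 = 9
auc = auroc_from_u(cases, controls)            # 8/9
```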

Fold-change analysis can also be performed to assess the magnitude of change in variable levels, and can be used to describe an “X-fold” increase between sample classifications. It is simply a ratio of two mean values.
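A fold change and its log2 form can be computed as follows; the group intensities are hypothetical. The log2 scale is often preferred because increases and decreases of the same magnitude become symmetric about zero (twofold up is +1, twofold down is −1).

```python
import numpy as np

def fold_change(case, control):
    """Ratio of mean intensities (case / control) and its base-2 logarithm."""
    fc = np.mean(case) / np.mean(control)
    return fc, np.log2(fc)

fc, log2_fc = fold_change([4.0, 6.0], [1.0, 3.0])   # means 5.0 and 2.0, so fc = 2.5
```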

Analysis of variance has been successfully applied in metabolomics investigations, such as the detection and determination of methanol in smokers’ salivary profiles using 1H NMR analysis;38 one typical experimental design is shown in eqn (1.6). This analysis of covariance (ANCOVA) model included the sampling time-point (Ti), smoking/non-smoking group (Sj), between-participant (P(j)k) and gender (Gl) sources of variation. The mean value, µ, and the unexplained error, eijklm, are also incorporated into this mathematical model, in addition to the first-order interaction effect between the smoking/non-smoking groups and sampling time points, that is, TSij. Participants were ‘nested’ within treatment groups.

$$Y_{ijklm} = \mu + T_i + S_j + P_{(j)k} + G_l + TS_{ij} + e_{ijklm} \quad \text{(ANCOVA)} \tag{1.6}$$

This ANCOVA test complemented the results acquired with the aforementioned paired Student's t-test, but is particularly advantageous because the ANCOVA model factored in all possible sources of variation, including interaction effects and unexplained error. However, ANOVA or ANCOVA models can be applied in different ways that are also applicable to metabolomics. ANOVA-simultaneous component analysis (ASCA), for example, allows comparison of data acquired on the same human participants at increasing time points, or when considering alternative second variables. It combines ANOVA with PCA (the latter is described below), so that two experimental factors can be handled together while their separate effects and magnitudes are also observed. ASCA can also isolate contributions from statistical interaction effects, just as univariate ANOVA and ANCOVA models can.
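The ANOVA-then-PCA idea behind ASCA can be sketched for a single factor in NumPy: the column-centred data are split into an effect matrix of factor-level means plus a residual matrix, and PCA (via SVD) is applied to the effect matrix. This is a simplified illustration with hypothetical data and illustrative helper names; a full ASCA analysis repeats the decomposition for each factor and interaction (straightforward in balanced designs) and usually assesses the effects by permutation.

```python
import numpy as np

def effect_matrix(Xc, labels):
    """Replace each row of the centred matrix Xc with the mean of its factor level."""
    labels = np.asarray(labels)
    E = np.zeros_like(Xc)
    for level in np.unique(labels):
        E[labels == level] = Xc[labels == level].mean(axis=0)
    return E

def asca_one_factor(X, labels, n_components=2):
    """Split centred X into effect + residual, then run PCA (SVD) on the effect matrix."""
    Xc = X - X.mean(axis=0)
    E = effect_matrix(Xc, labels)
    R = Xc - E
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    ss = {"effect": np.sum(E ** 2), "residual": np.sum(R ** 2), "total": np.sum(Xc ** 2)}
    return scores, loadings, ss

rng = np.random.default_rng(5)
labels = np.repeat(["3wk", "6wk", "9wk"], 6)      # hypothetical time-point factor
X = rng.normal(size=(18, 4))
X[labels == "9wk", 0] += 2.0                      # time effect on bucket 0 only
scores, loadings, ss = asca_one_factor(X, labels)
effect_fraction = ss["effect"] / ss["total"]      # share of variation due to the factor
```

For one factor, the effect and residual sums of squares add up exactly to the total, mirroring the univariate ANOVA partition.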

Factorial ANOVA can handle more than one factor simultaneously at multiple levels, and repeated-measures ANOVA can be applied in longitudinal studies. ANOVA techniques are generally more widely applied in targeted MS-based metabolomics; however, there are a few examples in NMR-based metabolomics applications, as discussed below.

ANCOVA can accommodate both qualitative (categorical factor) and quantitative (continuous covariate) variables.

Ruiz-Rodado et al. successfully applied ANCOVA and ASCA to 1H NMR metabolomics datasets from mice with Niemann–Pick disease, type C1.63 The ANCOVA model accounted for three factors and six sources of primary variation, as shown in eqn (1.7), for the univariate response variable Yijk. Between-disease classification (Di), between-gender (Gj) and experimental sampling time-point (Tk; for example, 3, 6, 9 and 11 weeks of age) sources of variation were incorporated into the experimental ANCOVA design. Interactions between each pair of factors were also considered, namely DGij, DTik and GTjk. Interaction effects are computed to assess whether the effect or significance of one variable depends on the level or classification of another. µ represents the population mean value in the absence of all sources of variation, and eijk is the residual (unexplained) error contribution.

$$Y_{ijk} = \mu + D_i + G_j + T_k + DG_{ij} + DT_{ik} + GT_{jk} + e_{ijk} \quad \text{(ANCOVA)} \tag{1.7}$$

Once key features had been identified using multivariate analysis, ANCOVA was applied in a univariate context to reveal significant metabolites that were time-dependent, such as 3-hydroxyphenylacetate, or gender-dependent, such as tyrosine. Moreover, this tool was able to show significant metabolites for combinations of variables: for example, the “time-point × disease” interaction effect revealed inosine as one of the significant biomarkers, and the “gender × disease” interaction effect showed a combined lysine/ornithine resonance as one of the significant distinguishing spectral features. Thus, this technique can be used successfully in metabolomics across numerous markers, providing a distinct p value for each metabolite investigated. False discovery rate corrections and power calculations can also be applied, as discussed below.
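A per-metabolite test of one interaction term in the eqn (1.7) design can be sketched as a comparison between the full linear model and the same model with that term removed (an extra sum-of-squares F-test). This is a minimal sketch under stated assumptions: the data are simulated and hypothetical, factors use treatment (reference-level) coding, and the drop-one comparison is only appropriate here for the two-way interaction terms; testing main effects in the presence of interactions needs a marginality-respecting scheme such as Type II sums of squares, which is not shown. In practice the test would be repeated for each bucket, followed by a false discovery rate correction.

```python
import numpy as np
from scipy import stats

def dummies(levels):
    """Treatment (reference-level) coding: one 0/1 column per non-reference level."""
    levels = np.asarray(levels)
    return np.column_stack([(levels == u).astype(float) for u in np.unique(levels)[1:]])

def interact(A, B):
    """Interaction columns: products of every pair of dummy columns from two factors."""
    return np.column_stack([A[:, i] * B[:, j] for i in range(A.shape[1]) for j in range(B.shape[1])])

def rss_and_rank(M, y):
    """Residual sum of squares and rank of a least-squares fit of y on design matrix M."""
    beta, *_ = np.linalg.lstsq(M, y, rcond=None)
    resid = y - M @ beta
    return float(resid @ resid), np.linalg.matrix_rank(M)

def drop_term_f_test(terms, drop, y):
    """F-test comparing the full model with the same model lacking the term `drop`."""
    n = len(y)
    full = np.column_stack([np.ones(n)] + list(terms.values()))
    reduced = np.column_stack([np.ones(n)] + [v for k, v in terms.items() if k != drop])
    rss_full, rank_full = rss_and_rank(full, y)
    rss_red, rank_red = rss_and_rank(reduced, y)
    df_num, df_den = rank_full - rank_red, n - rank_full
    F = ((rss_red - rss_full) / df_num) / (rss_full / df_den)
    return F, stats.f.sf(F, df_num, df_den)

# Hypothetical balanced design: disease (D), gender (G), time point (T, weeks), 3 mice per cell.
rng = np.random.default_rng(2)
grid = np.meshgrid([0, 1], [0, 1], [3, 6, 9, 11], indexing="ij")
D, G, T = (np.repeat(a.ravel(), 3) for a in grid)
# Simulated bucket intensity with a disease x time interaction at later time points.
y = 1.0 + 0.5 * D + 0.1 * T + 0.3 * D * (T >= 9) + rng.normal(scale=0.1, size=D.size)

Dd, Gd, Td = dummies(D), dummies(G), dummies(T)
terms = {"D": Dd, "G": Gd, "T": Td,
         "DG": interact(Dd, Gd), "DT": interact(Dd, Td), "GT": interact(Gd, Td)}
F_dt, p_dt = drop_term_f_test(terms, "DT", y)
```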

An alternative to ASCA is multilevel simultaneous component analysis (MSCA), which can also allow for paired datasets; it divides the data into two levels, for example between-individual and within-individual variation, and then monitors the variance associated with each.64 ASCA has largely superseded MSCA, since it is essentially a multivariate extension of ANOVA; this, together with the more frequent use of other multilevel techniques, explains why MSCA is less commonly used in metabolomics studies.

It is, of course, essential to incorporate multivariate data analysis into a metabolomics investigation. A metabolite may appear insignificant in a univariate analysis but significant in a multivariate context, generally because its effects correlate, perhaps strongly, with a pattern of other metabolite variables. Moreover, the insignificance of a variable in univariate analysis may also be explained by high levels of biological and/or measurement variation. Multivariate analysis may overcome this problem by further explaining classifications attributable to biological/measurement variations, and it combines variables into components according to their correlations and inter-relationships.

The most common unsupervised multivariate method is PCA, which is particularly useful for data mining and outlier detection. It summarises the variance contained in the dataset in a small number of principal components (PCs, latent variables). The principle consists of applying a rotation in the N-dimensional variable space, so that the new axis system, composed of the principal components, maximises the dispersion of the samples. The first principal component (PC1) represents the direction in this space containing the largest variance in the analysed data. The second principal component (PC2) represents the direction of next-greatest variance in the subspace orthogonal to PC1, and so on. This procedure can continue until the entire variance is explained, but the essential information contained within the dataset is usually synthesised by a limited number of PCs (usually ≪ N). Each PC corresponds to a linear combination of the N original metabolite variables, and the weights (loadings) represent the contributions of these variables to each component. Plotting these components makes it possible to visualise the associated metabolic signatures. One advantage of the method is that it identifies possible groupings of individuals and/or variables without a priori considerations. However, the primary sources of variance within a cohort of samples may not be related to the effect studied, and because this unsupervised analysis does not use the sample/participant classifications, it may not reveal the effect of interest. Supervised analysis methods, however, may identify variations in metabolites that are, or may be, correlated with the parameters of interest of the study. Both approaches also allow the detection of samples with atypical behaviour (“outliers”) relative to the remainder of the population. Figure 1.7 shows 95% confidence ellipses for a typical PCA scores plot. These may be established using a multivariate generalisation of Student's t-test, known as Hotelling's T2 test.30 T2 measures how far an observation lies from the centre of the PCA model in the space spanned by the retained PCs. In Figure 1.7, the points highlighted with blue arrows are outliers.
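PCA scores and Hotelling's T2 can be computed from a singular value decomposition of the mean-centred data. The sketch below assumes hypothetical data with one planted outlier and illustrative helper names; the F-based limit used, a(n−1)/(n−a)·F(1−α; a, n−a), is one common form, and software packages may use slightly different expressions. For two components, the boundary T2 = limit is the 95% confidence ellipse drawn on a scores plot.

```python
import numpy as np
from scipy import stats

def pca(X, n_components=2):
    """PCA by SVD of the column-centred data: scores, loadings, explained-variance ratios."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, Vt[:n_components].T, explained[:n_components]

def hotelling_t2(scores):
    """T2 for each sample: sum over components of score**2 / score variance."""
    return np.sum(scores ** 2 / scores.var(axis=0, ddof=1), axis=1)

def t2_limit(n, a, alpha=0.05):
    """One common F-based T2 limit for n samples and a components."""
    return a * (n - 1) / (n - a) * stats.f.ppf(1 - alpha, a, n - a)

rng = np.random.default_rng(6)
X = rng.normal(size=(30, 5))     # hypothetical bucket table (30 samples x 5 buckets)
X[0, 0] += 10.0                  # plant one atypical sample
scores, loadings, explained = pca(X, n_components=2)
t2 = hotelling_t2(scores)
outliers = np.where(t2 > t2_limit(len(X), 2))[0]
```

The loadings returned alongside the scores show which buckets drive the component on which an outlier scores strongly.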

Figure 1.7

PCA plot of urinary profiles from feline urine from drug-treated (1000 MG CD), untreated (UNTREATED_NPC) and healthy control (CONTROL) groups shown in red, blue and green respectively. PC1 and PC2 represent 57% and 8% of the dataset variance respectively, and the 95% confidence ellipses are shown. Two clear outliers are highlighted by the arrows.

These outliers could arise for a variety of reasons; for example, xenobiotics and/or unusual or unexpected metabolites may be detected in the urine, or the sample may display unexpected intensity alterations in a particular profile region. The PCA scores plot will indicate not only which samples are outliers, but also on which principal component they score strongly; the corresponding loadings plot then shows which bucket regions load strongly on that component. This information aids in identifying the causes of the atypical behaviour of these samples/participant donors.

Figure 1.7 shows a typical PCA plot obtained from feline urine samples, with two outliers identified. A clearer example is shown in Figure 1.8 (provided by Kwon et al.65), who assessed green coffee bean metabolites; one sample was removed in view of poor spectral shimming, and the spectrum of this poorly shimmed sample is shown in the inset.

Figure 1.8

PCA scores plot (background) of green coffee bean extracts with a 95% confidence ellipse, revealing an outlier sample lying outside the ellipse; the unshimmed spectrum of this outlier is shown in the foreground.65  Reproduced from ref. 65 with permission from Elsevier, Copyright 2014.

An alternative to PCA is simultaneous component analysis (SCA), which takes into account different sources of variation by separating datasets into sub-matrices;64 however, PCA is more commonly employed in this field. An extension of PCA, group-wise PCA (GPCA), has recently been developed in order to distinguish between overlapping groups of variables, and may become more commonly used in metabolomics investigations in the near future.66

Partial least-squares discriminant analysis (PLS-DA) is a supervised MV analysis technique that is able to distinguish between disease or alternative classifications, and focuses on ‘between-class’ maximisation. This method aims to predict a qualitative response variable Y from an explanatory data matrix X. The PLS components are constructed to capture the maximum variance of the data (X) that is as highly correlated as possible with Y. In this case, Y is a discrete variable whose value depends only on the categorical class associated with the sample. PLS analysis allows identification of the explanatory variables most important for predicting Y, and thus makes it possible to highlight the most discriminant variables between the groups, and whether metabolites are upregulated or downregulated, via latent structures and variable importance in projection (VIP) scores. As with PCA, the components can be plotted to observe clusterings, with each component orthogonal to the others; however, in PLS-DA the first component captures the greatest covariance between X and Y, the second the next greatest, and so forth. PLS-DA has many variants, including orthogonal PLS-DA (O-PLS-DA), multilevel PLS-DA (M-PLS-DA), powered PLS-DA (PPLS-DA) and N-way PLS-DA, each of which has its own advantages and disadvantages for metabolomics use. For example, O-PLS-DA can only handle two groups in comparative evaluations; in this case, the orthogonal signal correction filter applied separates predictive (Y-correlated) variation from variation that is orthogonal (uncorrelated) to Y.

Supervised analysis, unlike PCA, can lead to biased results and overestimation of the predictive capabilities of the model. Indeed, the large number of variables generates a high-dimensional space in which it is almost always possible to find a direction separating the samples. Therefore, it is essential to verify the quality of the models established using validation methods such as permutation testing, cross-validation and ROC analysis.

  • Cross-validation: Cross-validation is the most common validation method used in metabolomics. It relies on two parameters to evaluate the model's performance: R² and Q²Y. R² (X and Y) represents the proportion of the variance of the X and Y matrices explained by the model, and the cumulative Q²Y represents the predictive quality of the model; it can be interpreted as an estimate of R² for new data. The closer these values are to 1, the more predictive the model is considered, and the more significant the separation.67

  • Permutation test: The objective of this test is to confirm that the initial model is superior to other models obtained by permuting the class labels, that is, randomly reassigning them to different individuals. The initial model is statistically compared with all of the randomly assigned models, and a p-value is calculated from this comparison. A p-value lower than 0.05 indicates that the initial model performs better than 95% of the randomly assigned models.68

  • Cross-validation and permutation tests are complementary, and both must be performed in order to validate a model. Cross-validation evaluates the capacity of the model to correctly predict the class of a new sample, whereas the permutation test establishes whether the model performs better than would be expected by chance.68

  • ROC analysis: The area under the receiver operating characteristic curve (AUROC) can then be used to monitor the sensitivity and specificity of individual metabolites, and the performance of the test system as a whole. AUROC values range from 0.0 to 1.0: a value of 1.0 represents perfect distinction between classes, values greater than 0.5 are considered discriminatory, and a value of 0.50 indicates that the model is no more likely to classify a sample correctly than tossing a coin.69 PLS-DA is then validated using permutation testing, which defines the p value for the PLS-DA discriminatory ability. Further validation can be performed using leave-one-out cross-validation (LOOCV) and 10-fold cross-validation to obtain the Q2 and R2 values; Q2 values greater than 0.5 are considered satisfactorily discriminatory. Advantageously, PLS-DA provides VIPs, which show which metabolites are responsible for the distinction observed, and whether these metabolites are up- or downregulated.

PLS-DA analysis was effectively applied in a 1H NMR investigation of post-mortem brain extracts from patients with Huntington's disease and control patients (Figure 1.9). Permutation testing with 2000 permutations was applied to validate the frontal lobe and striatum region analyses, yielding p values of 0.003 and <0.001, respectively.70 In addition, permutation testing was repeated using only 1000 permutations, giving values of 0.108 and 0.015 for the frontal lobe and striatum respectively, which indicated that the frontal lobe was less affected by the pathological implications of Huntington's disease than the striatum.70 VIPs were also useful for identifying the metabolites responsible for these significant differences, and their up- or downregulation status (Figure 1.9, panels B and D). AUROC, sensitivity and specificity values were 0.942, 0.869 and 0.865 respectively for the training/discovery set, and 0.838, 0.818 and 0.857 respectively using 10-fold cross-validation, demonstrating the success of the model.

Figure 1.9

PLS-DA showing 95% confidence ellipses and corresponding VIPs of Huntington's disease (red) versus control (blue) frontal lobe extracts in (A) and (B), and the striatum region shown in (C) and (D).70  Reproduced from ref. 70 with permission from Elsevier, Copyright 2016.

Figure 1.10

OPLS-DA S-plot revealing discriminatory metabolites. Abbreviations: N-acetyl-aspartate (NAA), gamma-aminobutyric acid (GABA), glutamate (Glu), glutamine (Gln). Reproduced from ref. 71 with permission from Elsevier, Copyright 2018.

OPLS-DA discriminates between two groups using orthogonal latent structures. Its loadings can be displayed as an S-plot, named after its sigmoidal (S-shaped) curve, and validation and permutation tests can also be performed. The S-plot visualises the OPLS-DA loadings (Figure 1.10), and is useful in the identification of significant metabolites.
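An OPLS-type filter and the S-plot coordinates can be sketched for a single response. This is a minimal NumPy illustration of the Trygg–Wold orthogonal-filtering idea, assuming hypothetical data, a 0/1 class response and one orthogonal component; it is not the implementation used by Quansah et al., and the helper names are illustrative. The S-plot places each variable by its covariance (x-axis, magnitude) and correlation (y-axis, reliability) with the predictive score, so strongly discriminating variables fall at the two ends of the S.

```python
import numpy as np

def opls_single_y(X, y, n_orth=1):
    """Minimal single-response OPLS: strip Y-orthogonal components from the centred X,
    then return predictive scores, orthogonal scores and the filtered X."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    orth = []
    for _ in range(n_orth):
        w = X.T @ y
        w /= np.linalg.norm(w)               # predictive weight vector
        t = X @ w
        p = X.T @ t / (t @ t)                # loading for that score
        w_o = p - (w @ p) * w                # part of p orthogonal to w
        w_o /= np.linalg.norm(w_o)
        t_o = X @ w_o                        # Y-orthogonal score (uncorrelated with y)
        p_o = X.T @ t_o / (t_o @ t_o)
        X = X - np.outer(t_o, p_o)           # deflate the Y-orthogonal variation
        orth.append(t_o)
    w = X.T @ y
    w /= np.linalg.norm(w)
    T_orth = np.column_stack(orth) if orth else np.empty((len(y), 0))
    return X @ w, T_orth, X

def s_plot(t, X):
    """S-plot coordinates: covariance (x-axis) and correlation (y-axis) of each variable with t."""
    Xc = X - X.mean(axis=0)
    tc = t - t.mean()
    cov = Xc.T @ tc / (len(t) - 1)
    corr = cov / (Xc.std(axis=0, ddof=1) * tc.std(ddof=1))
    return cov, corr

rng = np.random.default_rng(9)
y = np.repeat([0.0, 1.0], 20)                  # hypothetical control/treated labels
X = rng.normal(size=(40, 8))
z = rng.normal(size=40)
X[:, 4:] += 1.5 * z[:, None]                   # structured variation unrelated to class
X[y == 1, 2] += 4.0                            # bucket 2 differs between classes

t_pred, T_orth, X_filtered = opls_single_y(X, y, n_orth=1)
cov, corr = s_plot(t_pred, X)
```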

The successful use of O-PLS-DA has been demonstrated by Quansah et al., who observed the effects of an anti-ADHD psychostimulant, methylphenidate (MPH), on brain metabolite levels.71 Using this methodology, the researchers were able to distinguish significant from non-significant groupings. A significant difference was observed between the high acute-dose (5.0 mg kg−1) MPH-treated and age-matched saline-treated control groups, with an OPLS-DA model showing R2X = 0.60, R2Y = 0.54 and Q2 = 0.44, and a permutation test p value of 0.0005. A lower acute dose of 2.0 mg kg−1 MPH gave insignificant results when compared with the saline-treated control group, showing R2X = 0.45, R2Y = 0.05 and Q2 < 0.1, with a permutation test p value of 0.93. The lower acute dose of 2.0 mg kg−1 given twice daily was also not significantly different from that of the control group.

The significant results pertaining to the higher dose were further analysed using an S-plot (Figure 1.10); the results acquired were complementary to those obtained with ANOVA and revealed significant metabolites. The most discriminatory metabolites in the OPLS-DA analysis lie at either end of the S-plot, and include glucose, N-acetyl-aspartate (NAA), inosine, gamma-aminobutyric acid (GABA), glutamine (Gln), hypoxanthine, acetate, aspartate and glycine (Figure 1.10).

Canonical correlation analysis (CCorA) is a valuable technique for revealing correlations between two sets of variables, usually predictor and response variables. In the approach described here, independent PCs are first formed for each of the two datasets, and CCorA is then used to explore the significance of the inter-relationships between them. This has been demonstrated by Probert et al.,31 in which scores vector datasets were derived from separate 1H NMR and traditional clinical chemistry determination datasets, respectively. In this study, the loading vectors showed that the 1H NMR triacylglycerol resonances (normalised to the total lipoprotein triacylglycerol-CH3 function) loaded strongly on PC1–PC4 of the 1H NMR-based dataset (shown in red in Figure 1.11), and that the laboratory-determined total, low-density-lipoprotein (LDL)- and high-density-lipoprotein (HDL)-associated cholesterol levels (normalised to the total triacylglycerol concentration) loaded on PC1* and PC2* of the clinical chemistry dataset (shown in green in Figure 1.11). This CCorA analysis demonstrated, firstly, that the PC2* scores vectors were positively correlated with those of PC2, consistent with their common HDL sources. Secondly, PC1* was negatively correlated with PC4; that is, a linear combination of plasma triacylglycerol-normalised total cholesterol and LDL-cholesterol concentrations was anti-correlated with the 1H NMR PC arising from the LDL-triacylglycerols.
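The canonical correlations between two blocks of scores (for example, PCs from an NMR dataset and PCs from a clinical chemistry dataset) can be computed from the singular values of the product of their orthonormal bases. The sketch assumes hypothetical simulated scores and requires more samples than variables in each block. Canonical correlations are reported as non-negative values, so an anti-correlation such as PC1* versus PC4 appears in the signs of the canonical weights (not computed here) rather than in the correlation itself.

```python
import numpy as np

def canonical_correlations(A, B):
    """Canonical correlations between column blocks A and B (same samples in rows),
    from the singular values of Qa.T @ Qb, where Qa and Qb are orthonormal bases."""
    Qa, _ = np.linalg.qr(A - A.mean(axis=0))
    Qb, _ = np.linalg.qr(B - B.mean(axis=0))
    return np.linalg.svd(Qa.T @ Qb, compute_uv=False)

rng = np.random.default_rng(10)
nmr_scores = rng.normal(size=(40, 4))           # hypothetical PC1-PC4 scores (NMR dataset)
clin_scores = np.column_stack([
    -1.5 * nmr_scores[:, 3] + rng.normal(scale=0.1, size=40),  # PC1*, anti-correlated with PC4
    rng.normal(size=40),                                          # PC2*, unrelated here
])
rho = canonical_correlations(nmr_scores, clin_scores)   # sorted, largest first
```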

Figure 1.11

CCorA analysis from a PCA plot in which Y1 and Y2 represent scores vector datasets arising from the separate 1H NMR and clinical chemistry datasets respectively. Reproduced from ref. 31, https://doi.org/10.1038/s41598-017-06264-2, under the terms of a CC BY 4.0 license, https://creativecommons.org/licenses/by/4.0/.


Extended canonical variate analysis (ECVA) is a supervised algorithm, more complex than PLS-DA, that seeks the directions in the data which maximise the ratio of between-class variation to within-class variation.72 In its interval form (iECVA), it examines individual spectral regions in addition to the dataset as a whole. A key benefit of ECVA is that it can discriminate between more than two groups simultaneously, with a reduced risk of overfitting.
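The criterion underlying canonical variate analysis, maximising between-class relative to within-class variation, amounts to a generalised eigenvalue problem. The sketch below shows classical CVA on simulated data (three hypothetical classes, five variables) using `scipy.linalg.eigh`. It is not ECVA itself, which, as we understand it, reformulates this step with PLS regression so that it copes with many collinear spectral variables; that extension is not shown. The small `ridge` term is an assumption added to keep the within-class scatter matrix invertible.

```python
import numpy as np
from scipy.linalg import eigh

def canonical_variates(X, y, ridge=1e-6):
    """Directions maximising between-class / within-class scatter (classical CVA)."""
    p = X.shape[1]
    grand_mean = X.mean(axis=0)
    S_w = np.zeros((p, p))
    S_b = np.zeros((p, p))
    for c in np.unique(y):
        Xc = X[y == c]
        dev = Xc - Xc.mean(axis=0)
        S_w += dev.T @ dev                           # within-class scatter
        d = (Xc.mean(axis=0) - grand_mean)[:, None]
        S_b += len(Xc) * (d @ d.T)                   # between-class scatter
    S_w += ridge * np.eye(p)                         # keeps S_w positive definite
    ratios, W = eigh(S_b, S_w)                       # solves S_b w = lambda S_w w
    order = np.argsort(ratios)[::-1]                 # largest ratio first
    return ratios[order], W[:, order]

rng = np.random.default_rng(2)
y = np.repeat([0, 1, 2], 10)       # three hypothetical classes
X = rng.normal(size=(30, 5))
X[:, 0] += 3 * y                   # only variable 0 separates the classes
ratios, W = canonical_variates(X, y)
print("between/within ratios:", np.round(ratios, 3))
```

With g classes there are at most g − 1 non-zero ratios, so only the first two canonical variates carry discriminatory information here.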

Figure 1.12 shows the number of misclassifications obtained for each spectral interval, overlaid on the average NMR spectrum of 26 wine samples. The region with the lowest number of misclassifications, and therefore the most discriminatory one, is the 100th interval, with only two misclassifications. Figure 1.13 shows a scores plot of EVC3 versus EVC1, with clear distinctions between wineries based on the selected 100th interval (Table 1.1).
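The interval-wise screening behind an iECVA plot such as Figure 1.12 can be imitated as follows: split the spectrum into equal-width intervals, build a cross-validated classifier on each, and count the misclassifications. In this hedged sketch, scikit-learn's shrinkage linear discriminant analysis is swapped in as a stand-in classifier, because ECVA is not available in scikit-learn; the simulated data (26 samples, 200 spectral points, 20 intervals) and the helper name are hypothetical.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_predict

def interval_misclassifications(X, y, n_intervals, cv=5):
    """Cross-validated misclassification count for each equal-width spectral interval."""
    edges = np.linspace(0, X.shape[1], n_intervals + 1).astype(int)
    clf = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")
    errors = []
    for start, stop in zip(edges[:-1], edges[1:]):
        pred = cross_val_predict(clf, X[:, start:stop], y, cv=cv)
        errors.append(int(np.sum(pred != y)))
    return np.array(errors)

rng = np.random.default_rng(3)
y = np.repeat([0, 1, 2], [9, 9, 8])   # 26 hypothetical samples from three producers
X = rng.normal(size=(26, 200))        # 200 hypothetical spectral points
X[:, 70:80] += 3.0 * y[:, None]       # class information planted in interval 7 only

errors = interval_misclassifications(X, y, n_intervals=20)
best = int(np.argmin(errors))
print("misclassifications per interval:", errors)
print("most discriminatory interval:", best)
```

Interval indices are zero-based, so the planted region at points 70–79 is reported as interval 7.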

Figure 1.12

iECVA plot overlay with average NMR spectra from 26 wine samples from La Rioja.73  Reproduced from ref. 73 with permission from American Chemical Society, Copyright 2012.

Figure 1.13

ECVA score plot with 26 wine samples showing different wineries in La Rioja based on the 100th interval shown in Figure 1.12. Reproduced from ref. 73 with permission from American Chemical Society, Copyright 2012.

Table 1.1

Statistical and Classification Strategies, Including Their Advantages and Disadvantages, in NMR Metabolomics Applications.

| Metabolomic methodology | Classification or statistical | Univariate or multivariate | Supervision | Advantages (+) and disadvantages (−) |
|---|---|---|---|---|
| ANOVA | Statistical | Univariate | Unsupervised | + Hypothesis testing, with the ability to evaluate the statistical significance of a wide range of contributory variables, and their interactions, simultaneously. Partitions the total experimental variance into differential ‘predictor’ components, which may be fixed or random effects. Essential assumptions can be satisfied with suitable transformations, for example logarithmic or square-root ones. |
| Student's t-test | Statistical | Univariate | Unsupervised | + Hypothesis testing. − Only appropriate for comparing the means of two sample groups, and includes no correction for false discovery rate. |
| Mann–Whitney U test | Statistical | Univariate | Unsupervised | + Hypothesis testing; non-parametric equivalent of the two-sample t-test. + Does not require the data to be normally distributed. |
| Fold-change analysis | Statistical | Univariate | Unsupervised | + Hypothesis testing; represents the ratio of two sample group mean values, and the significance of these indices may be tested. |
| ASCA | Statistical | Multivariate | Unsupervised | + Can consider paired samples, for example from the same person at different time-points, or two or more possible predictor variables simultaneously. |
| PCA | Statistical | Multivariate | Unsupervised | + Outlier detection. + Dimensionality reduction and preliminary exploration of sample or participant clusterings in 2D or 3D. |
| PLS-DA | Statistical | Multivariate | Supervised | + VIP scores for identifying significant metabolites. − Subject to overfitting, so permutation and validation testing are essential. |
| O-PLS-DA | Statistical | Multivariate | Supervised | + S-plot for identifying significant metabolites. − Can only consider two groups; validation and permutation testing are required. |

Both univariate and multivariate statistical approaches enable users to explore which metabolites are up- or downregulated. However, the biological significance of these changes is not revealed unless pathway analysis is performed. Metabolite set enrichment analysis (MSEA) and metabolomics pathway analysis (MetPA) can determine whether metabolite concentration changes relate to metabolic pathways whose perturbation may be involved in the disease process explored.74,75 These features are integrated into MetaboAnalyst 4.0. Disturbed metabolic pathways involving metabolites identified and quantified by NMR analysis are identified by exploiting databases such as KEGG (Kyoto Encyclopedia of Genes and Genomes) or Reactome, and are then reconstructed and visualised using a software tool such as Cytoscape (www.cytoscape.org), MetaboAnalyst (www.metaboanalyst.ca) or MetExplore (metexplore.toulouse.inra.fr/index.html/).
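One form of enrichment test used by such tools is over-representation analysis, in which the overlap between a list of significantly altered metabolites and a pathway's metabolite set is assessed with a one-sided hypergeometric test. The minimal sketch below assumes that form; the function name and all of the counts are hypothetical, and the result is cross-checked against the explicit hypergeometric sum.

```python
import math
from scipy.stats import hypergeom

def ora_p_value(n_reference, n_in_set, n_significant, n_overlap):
    """One-sided over-representation p-value, P(X >= n_overlap), X ~ hypergeometric."""
    # scipy argument order: (k, M = population size, n = set size, N = number drawn)
    return hypergeom.sf(n_overlap - 1, n_reference, n_in_set, n_significant)

# Hypothetical counts: 200 quantified metabolites, 12 of them in one pathway,
# 20 significantly altered, of which 5 fall in that pathway
p = ora_p_value(n_reference=200, n_in_set=12, n_significant=20, n_overlap=5)

# Cross-check against the explicit sum of hypergeometric probabilities
exact = sum(math.comb(12, k) * math.comb(200 - 12, 20 - k)
            for k in range(5, 13)) / math.comb(200, 20)
print(f"over-representation p = {p:.3g} (explicit sum: {exact:.3g})")
```

Only 1.2 pathway hits would be expected by chance here (20 × 12/200), so observing 5 gives a small p-value; in a real analysis, such p-values would themselves be corrected across all pathways tested.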

It is important to appreciate the challenges of applying such statistical tools to metabolomics datasets. Data should be interpreted and described accurately, and bias can easily be introduced from both analytical and biological perspectives. Although analytical variance is addressed in previous sections of this chapter, it is important not to introduce bias through sample preparation, whether this arises from extraction, storage, or the procedures and analytical conditions employed.76 Such bias can lead to metabolite assignment and/or concentration errors. Biological variance must also be considered, and hence it is important to follow the experimental design factors noted above: for example, potential metabolic differences between ages, genders, body mass indices and races should be incorporated into experimental designs. Unlike in univariate analysis, this is often very difficult to achieve in multivariate analysis models. Moreover, the statistical power of metabolomics experimental models can be poor in view of low sample numbers. In a multivariate sense, power usually cannot be assessed a priori, only a posteriori, unless pilot datasets are available; in a univariate sense, by contrast, the required sample size can be determined a priori. Validation of metabolomics datasets is also required, for example by ensuring that biomarkers are: (i) reproducible at another laboratory site; and (ii) consistently decreased or elevated upon treatment, if they are indeed influenced by such processes.
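For a single metabolite compared between two groups, the a priori sample size can be calculated from the non-central t distribution. The sketch below assumes a two-sided, two-sample Student's t-test with equal group sizes and a standardised effect size d (Cohen's d); the function names are hypothetical and the effect sizes are illustrative, not taken from any study cited here. In practice, d would be estimated from pilot data or earlier studies, and correcting alpha for multiple testing raises the required n.

```python
import numpy as np
from scipy import stats

def t_test_power(d, n_per_group, alpha=0.05):
    """Power of a two-sided, two-sample Student's t-test for standardised effect size d."""
    df = 2 * n_per_group - 2
    nc = d * np.sqrt(n_per_group / 2.0)             # non-centrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # probability that |T| exceeds the critical value when the true effect is d
    return stats.nct.sf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc)

def sample_size_per_group(d, power=0.8, alpha=0.05, n_max=10_000):
    """Smallest n per group whose power reaches the target."""
    for n in range(2, n_max + 1):
        if t_test_power(d, n, alpha) >= power:
            return n
    raise ValueError("target power not reached within n_max")

for d in (0.5, 0.8, 1.0):
    n = sample_size_per_group(d)
    print(f"d = {d}: n = {n} per group (power = {t_test_power(d, n):.3f})")
```

Because every metabolite has its own effect size, multivariate power estimates are typically obtained by simulation or from pilot datasets rather than from a single formula of this kind.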

PCA can also be employed to monitor ‘between-replicate’ analytical reproducibility. Figure 1.14 shows results from an experiment in which the 1H NMR profiles of four urine samples were acquired in duplicate on each of two separate days, so that both within- and between-assay effects could be evaluated. These results demonstrate a high level of discrimination between the individual samples, together with an acceptable level of agreement between the within- and between-assay replicates of each sample.
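One simple way to put numbers on what Figure 1.14 shows visually is to compare the spread of replicate scores around their own centroid with the distances between sample centroids. The sketch below does this on simulated spectra; only the sample codes are borrowed from the figure, and the replicate count, noise level and summary statistics are assumptions rather than the procedure used for the figure.

```python
import numpy as np

rng = np.random.default_rng(4)
samples = ["RK2", "RK28", "RK37", "RK41"]
n_reps = 4                                            # e.g. duplicates on each of two days
profiles = rng.normal(size=(len(samples), 100))       # hypothetical "true" spectra
X = np.vstack([prof + rng.normal(scale=0.1, size=(n_reps, 100)) for prof in profiles])
labels = np.repeat(np.arange(len(samples)), n_reps)

Xc = X - X.mean(axis=0)
U, s, _ = np.linalg.svd(Xc, full_matrices=False)
scores = U[:, :3] * s[:3]                             # PC1-PC3 scores, as in a 3D scores plot

centroids = np.array([scores[labels == k].mean(axis=0) for k in range(len(samples))])
# mean distance of each replicate from its own sample centroid
within = np.mean([np.linalg.norm(scores[labels == k] - centroids[k], axis=1).mean()
                  for k in range(len(samples))])
# mean distance between pairs of sample centroids
between = np.mean([np.linalg.norm(a - b)
                   for i, a in enumerate(centroids) for b in centroids[i + 1:]])
print(f"mean within-sample spread: {within:.2f}; mean between-sample distance: {between:.2f}")
```

A within-sample spread that is small relative to the between-sample distances corresponds to the tight, well-separated replicate clusters seen in a reproducible assay.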

Figure 1.14

Two- and three-dimensional PCA scores plots showing clear distinctions between four separate urine samples (coded RK2, RK28, RK37 and RK41), each analysed as n=2 replicates using 1H NMR spectroscopy at an operating frequency of 400 MHz. The colour codes provided on the right-hand side also indicate the dates on which the samples were analysed; for example, RK2-1 and RK2-2 correspond to the same sample analysed in duplicate on two separate occasions. Duplicate samples were prepared for analysis on each occasion.


An example of a retrospective power calculation, from the investigation performed by Quansah et al.,77 is shown in Figure 1.15. This study monitored markers in rat brains following acute methylphenidate administration, used a total of n=36 samples (18 untreated and 18 treated), and monitored 13 biomarker variables. Predicted statistical power values of 0.99 and 1.00 were achieved with 16 and 24 samples respectively, which justified the sample size of n=18 per group selected for this study.

Figure 1.15

Power calculations performed by Quansah et al.,77  in which the line reading 1.0 shows the optimum number of samples required for statistical significance. Reproduced from ref. 77 with permission from Elsevier, Copyright 2017.


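Power analysis also helps to ensure that ethical boundaries are respected, since it avoids using more animals or participants than necessary while also guarding against underpowered studies.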

Some techniques, such as PLS-DA, are designed to cope with a large number of variables.78 However, overfitting becomes possible when the number of variables exceeds the number of samples, an increasingly common situation in metabolomics research.78

Type I and Type II errors both need to be considered: a Type I error is the incorrect rejection of a true null hypothesis (a false positive), whereas a Type II error is the failure to reject a false null hypothesis (a false negative). The probability of avoiding a Type II error is the statistical power of the test.76

Violating the assumptions of a statistical test may sometimes confer false significance on a variable, and vice versa. Common misconceptions regarding p values, confidence intervals and statistical power were explored by Greenland et al.79 Bonferroni, Bonferroni–Holm and Šidák corrections can be applied to control Type I errors across multiple comparisons, and the Benjamini–Hochberg approach can be applied to control the false discovery rate in univariate analysis.
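The corrections named above are short enough to write out directly. The NumPy sketch below returns adjusted p-values for Bonferroni, Holm (step-down), Šidák and Benjamini–Hochberg; the function names and input p-values are hypothetical, and for routine analyses an established implementation such as `statsmodels.stats.multitest.multipletests` may be preferable.

```python
import numpy as np

def bonferroni(p):
    p = np.asarray(p, dtype=float)
    return np.minimum(p * p.size, 1.0)

def sidak(p):
    p = np.asarray(p, dtype=float)
    return 1.0 - (1.0 - p) ** p.size

def holm(p):
    """Holm step-down: scale the i-th smallest p by (m - i + 1), then enforce monotonicity."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    stepped = np.maximum.accumulate(p[order] * (m - np.arange(m)))
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(stepped, 1.0)      # restore original order
    return adjusted

def benjamini_hochberg(p):
    """BH step-up: scale the i-th smallest p by m / i, then take running minima from the top."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    stepped = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(stepped, 1.0)      # restore original order
    return adjusted

p = [0.01, 0.02, 0.03, 0.04, 0.05]   # hypothetical per-metabolite p-values
for name, fn in [("Bonferroni", bonferroni), ("Holm", holm),
                 ("Sidak", sidak), ("Benjamini-Hochberg", benjamini_hochberg)]:
    print(f"{name:>18}: {np.round(fn(p), 4)}")
```

For these five p-values, Bonferroni raises the largest to 0.25, whereas Benjamini–Hochberg adjusts all five to 0.05, illustrating why false discovery rate control is less conservative than family-wise error control.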

Metabolomics is becoming a more integral part of diagnostics, helping to establish disease conditions and to predict their progression more rapidly. Several diseases present challenging diagnostic problems, and metabolomic markers can be combined with other types of data for enhanced discrimination. An example of this is shown by Glaab et al.,80 in which metabolomics data and positron emission tomography (PET) brain neuroimaging data were combined to increase discriminatory power in the diagnosis of Parkinson's disease, using support vector machine (SVM) and random forest (RF) classifiers evaluated with leave-one-out cross-validation (LOOCV) and receiver operating characteristic (ROC) analysis. In addition, the publications consulted throughout this chapter show that metabolomics is used in a variety of fields, and that statistical techniques and machine learning technologies are useful in combination. Applications of spectral analysis are becoming increasingly interdisciplinary, enabling more robust models and more accurate statistical analysis.
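The evaluation strategy mentioned above (leave-one-out predictions scored by ROC analysis) can be sketched with scikit-learn. This is a hedged illustration on simulated data, not a reproduction of Glaab et al.: the two data blocks, their effect sizes and the random forest settings are assumptions, and the blocks are combined by simple column concatenation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(6)
y = np.repeat([0, 1], 20)                        # hypothetical controls vs patients
metab = rng.normal(size=(40, 20))                # hypothetical metabolite block
metab[:, :3] += 0.5 * y[:, None]                 # weak metabolic signal
imaging = rng.normal(size=(40, 5))               # hypothetical imaging-derived block
imaging[:, :2] += 2.0 * y[:, None]               # stronger imaging signal

def loocv_auc(X, y):
    """ROC AUC computed from pooled leave-one-out predicted class probabilities."""
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    proba = cross_val_predict(clf, X, y, cv=LeaveOneOut(), method="predict_proba")
    return roc_auc_score(y, proba[:, 1])

auc_metab = loocv_auc(metab, y)
auc_combined = loocv_auc(np.hstack([metab, imaging]), y)
print(f"AUC, metabolites only: {auc_metab:.2f}; combined blocks: {auc_combined:.2f}")
```

The leave-one-out probabilities are pooled before the AUC is computed, because each held-out fold contains only a single sample.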

Within the NMR-based metabolomics field, one major drawback is limited analyte sensitivity. However, methodologies such as hyperpolarisation and signal-enhancing technologies, for example cryoprobes, are increasing the sensitivity of biomarker analyte detection. Statistical techniques and machine learning strategies are also evolving, and are often used in combination to cope effectively with the high dimensionality of datasets acquired in NMR-linked metabolomics. Indeed, enhancements in computing power are promoting faster turnaround times for data processing and analysis.

Statistical approaches can successfully be applied to spectral and chromatographic datasets acquired in numerous fields, whether for the diagnosis of diseases, the stratification of disease developmental stages and prognostics, or the creation of pseudo-two-dimensional spectra, as in STOCSY-type approaches. Sound relationships can be established from NMR datasets provided that appropriate standard operating procedures are followed and experimental designs are carefully considered. Statistical methods can distinguish between the metabolic patterns of different disease classifications and disease stages in both univariate and multivariate senses. Machine learning complements statistical techniques, and aids further understanding of metabolite clustering.

ANCOVA: Analysis of Covariance
ANOVA: Analysis of Variance
ASCA: Analysis of Variance Simultaneous Component Analysis
AUC: Area Under the Curve
BLIP: Backward Linear Prediction
BML-NMR: Birmingham Metabolite Library-Nuclear Magnetic Resonance
CCorA: Canonical Correlation Analysis
CV: Cross-Validation
CW: Continuous Wave
DFA: Discriminant Function Analysis
ECVA: Extended Canonical Variate Analysis
EDNN: Ensemble Deep Neural Networks
EDTA: Ethylenediaminetetraacetic Acid
FLIP: Forward Linear Prediction
HCA: Hierarchical Clustering Analysis
HMDB: Human Metabolome Database
HPLC: High-Performance Liquid Chromatography
KNN: K-Nearest Neighbours
LDA: Linear Discriminant Analysis
LOOCV: Leave-One-Out Cross-Validation
LS: Least Squares
MANOVA: Multivariate Analysis of Variance
MLR: Multiple Linear Regression
MMCD: Madison Metabolomics Consortium Database
MS: Mass Spectrometry
NMR: Nuclear Magnetic Resonance
O-PLS-DA: Orthogonal Partial Least Squares Discriminant Analysis
PC: Principal Component
PCA: Principal Component Analysis
PLS-DA: Partial Least Squares Discriminant Analysis
RBF: Radial Basis Function
RF: Random Forest
ROC: Receiver Operating Characteristic
SDBS: Spectral Database for Organic Compounds
SHY: Statistical Heterospectroscopy
STOCSY: Statistical Total Correlation Spectroscopy
SOM: Self-Organising Maps
SVM: Support Vector Machine
VIP: Variable Importance in Projection

The authors are grateful to Katy Woodason for useful discussions. BCP would like to acknowledge De Montfort University for her fee waiver for her PhD studies.

1. T. Bayes and R. Price, An Essay towards solving a Problem in the Doctrine of Chances, Philos. Trans. R. Soc. London, 1763, vol. 53, pp. 370–418.
2. K. Pearson, On lines and planes of closest fit to systems of points in space, London, Edinburgh Dublin Philos. Mag. J. Sci., 1901, vol. 2, no. 11, pp. 559–572.
3. C. Spearman, General Intelligence Objectively Determined and Measured, Am. J. Psychol., 1904, vol. 15, no. 2, pp. 201–292.
4. W. S. Gosset, The application of the law of error to the work of the brewery, Guinness Internal Note, 1904.
5. R. A. Fisher, Statistical Methods for Research Workers, Oliver and Boyd, Edinburgh, 1925.
6. R. A. Fisher and B. Balmukand, The estimation of linkage from the offspring of selfed heterozygotes, J. Genet., 1928, vol. 20, no. 1, pp. 79–92.
7. J. Neyman, E. S. Pearson and K. Pearson, IX. On the problem of the most efficient tests of statistical hypotheses, Philos. Trans. R. Soc., A, 1933, vol. 231, pp. 694–704.
8. A.-H. M. Emwas et al., NMR-based metabolomics in human disease diagnosis: applications, limitations, and recommendations, Metabolomics, 2013, vol. 9, no. 5, pp. 1048–1072.
9. W. Pauli, On the hydrogen spectrum from the standpoint of the new quantum mechanics, Z. Phys., 1926, vol. 36, pp. 336–363.
10. I. I. Rabi et al., The Molecular Beam Resonance Method for Measuring Nuclear Magnetic Moments. The Magnetic Moments of 3Li6, 3Li7 and 9F19, Phys. Rev. J. Arch., 1939, vol. 55, p. 526.
11. A. W. Overhauser, Polarization of Nuclei in Metals, Phys. Rev., 1953, vol. 92, p. 411.
12. A. G. Redfield, On the Theory of Relaxation Processes, IBM J. Res. Dev., 1957, vol. 1, no. 1, pp. 19–31.
13. E. M. Purcell et al., Resonance Absorption by Nuclear Magnetic Moments in a Solid, Phys. Rev. J. Arch., 1945, vol. 69, p. 37.
14. F. Bloch et al., Nuclear Induction, Phys. Rev., 1946, vol. 69, p. 127.
15. M. Grootveld et al., Progress in Low-Field Benchtop NMR Spectroscopy in Chemical and Biochemical Analysis, Anal. Chim. Acta, 2019, vol. 1067, pp. 11–30.
16. B. Percival et al., Low-Field, Benchtop NMR Spectroscopy as a Potential Tool for Point-of-Care Diagnostics of Metabolic Conditions: Validation, Protocols and Computational Models, High-Throughput, 2019, vol. 1, p. 2.
17. J. K. Nicholson et al., 750 MHz 1H and 1H-13C NMR Spectroscopy of Human Blood Plasma, Anal. Chem., 1995, vol. 67, no. 5, pp. 783–811.
18. G. Lippens, C. Dhalluin and J. M. Wieruszeski, Use of a water flip-back pulse in the homonuclear NOESY experiment, J. Biomol. NMR, 1995, vol. 5, no. 3, pp. 327–331.
19. M. Piotto, V. Saudek and V. Sklenář, Gradient-tailored excitation for single-quantum NMR spectroscopy of aqueous solutions, J. Biomol. NMR, 1992, vol. 2, no. 6, pp. 661–665.
20. Le Guennec et al., Alternatives to Nuclear Overhauser Enhancement Spectroscopy Presat and Carr-Purcell-Meiboom-Gill Presat for NMR-Based Metabolomics, Anal. Chem., 2017, vol. 89, pp. 8582–8588.
21. J. Lindon, J. Nicholson and Holmes, The Handbook of Metabonomics and Metabolomics, Elsevier Science, 1st edn, 2008.
22. L. Pauling et al., Quantitative analysis of urine vapor and breath by gas-liquid partition chromatography, Proc. Natl. Acad. Sci. U.S.A., 1971, vol. 68, no. 10, pp. 2374–2376.
23. D. B. Kell and S. G. Oliver, The Metabolome 18 Years on: A Concept Comes of Age, Metabolomics, 2016, vol. 12, no. 9, p. 148.
24. J. R. Bales et al., Urinary-excretion of acetaminophen and its metabolites as studied by proton NMR-Spectroscopy, Clin. Chem., 1984, vol. 30, pp. 1631–1636.
25. J. K. Nicholson et al., Monitoring metabolic disease by proton NMR of urine, Lancet, 1984, vol. 2, pp. 751–752.
26. J. Lindon and J. Nicholson, Spectroscopic and Statistical Techniques for Information Recovery in Metabonomics and Metabolomics, Annu. Rev. Anal. Chem., 2008, vol. 1, no. 1, pp. 45–69.
27. J. B. German et al., Metabolomics: building on a century of biochemistry to guide human health, Metabolomics, 2005, vol. 1, no. 1, pp. 3–9.
28. S. Beisken et al., Getting the right answers: understanding metabolomics challenges, Expert Rev. Mol. Diagn., 2015, vol. 15, no. 1, pp. 97–109.
29. Y. Sandlers, The future perspective: metabolomics in laboratory medicine for inborn errors of metabolism, Transl. Res., 2017, vol. 189, pp. 65–75.
30. J. C. Lindon, E. Holmes and J. Nicholson, Metabonomics techniques and applications to pharmaceutical research & development, Pharm. Res., 2006, vol. 23, no. 6, pp. 1075–1088.
31. F. Probert et al., NMR analysis reveals significant differences in the plasma metabolic profiles of Niemann Pick C1 patients, heterozygous carriers, and healthy controls, Sci. Rep., 2017, vol. 7, p. 6320.
32. J. Pinto et al., Human plasma stability during handling and storage: impact on NMR metabolomics, Analyst, 2014, vol. 139, no. 5, pp. 1168–1177.
33. A.-H. M. Emwas et al., Recommendations and Standardization of Biomarker Quantification Using NMR-Based Metabolomics with Particular Focus on Urinary Analysis, J. Proteome Res., 2016, vol. 15, pp. 360–373.
34. O. Beckonert et al., Metabolic profiling, metabolomic and metabonomic procedures for NMR spectroscopy of urine, plasma, serum and tissue extracts, Nat. Protoc., 2007, vol. 2, no. 11, pp. 2692–2703.
35. M. F. Alum et al., 4,4-Dimethyl-4-silapentane-1-ammonium trifluoroacetate (DSA), a promising universal internal standard for NMR-based metabolic profiling studies of biofluids, including blood plasma and serum, Metabolomics, 2008, vol. 4, pp. 122–127.
36. J. K. Nicholson and K. R. Gartland, 1H NMR studies on protein binding of histidine, tyrosine and phenylalanine in blood plasma, NMR Biomed., 1989, vol. 2, p. 2.
37. M. Martin et al., PepsNMR for 1H NMR metabolomic data pre-processing, Anal. Chim. Acta, 2018, pp. 1–13.
38. B. Percival et al., Detection and Determination of Methanol and Further Potential Toxins in Human Saliva Collected from Cigarette Smokers: A 1H NMR Investigation, JSM Biotechnol. Biomed. Eng., 2018, vol. 5, no. 1, p. 1081.
39. A. Alonso, S. Marsal and A. Julià, Analytical methods in untargeted metabolomics: state of the art in 2015, Front. Bioeng. Biotechnol., 2015, vol. 3, no. 23, pp. 1–20.
40. P. Giraudeau, I. Tea, G. S. Remaud and S. Akoka, Reference and normalization methods: Essential tools for the intercomparison of NMR spectra, J. Pharm. Biomed. Anal., 2014, vol. 93, pp. 3–16.
41. A. Scalabre et al., Evolution of Newborns’ Urinary Metabolomic Profiles According to Age and Growth, J. Proteome Res., 2017, vol. 16, pp. 3732–3740.
42. C. M. Slupsky et al., Investigations of the Effects of Gender, Diurnal Variation, and Age in Human Urinary Metabolomic Profiles, Anal. Chem., 2007, vol. 79, no. 18, pp. 6995–7004.
43. K. E. Haslauer, D. Hemmler, P. Schmitt-Kopplin and S. Sophie Heinzmann, Guidelines for the Use of Deuterium Oxide (D2O) in 1H NMR Metabolomics, Anal. Chem., 2019, vol. 91, no. 17, pp. 11063–11069.
44. S. M. Kohl et al., State-of-the art data normalization methods improve NMR-based metabolomic analysis, Metabolomics, 2012, vol. 8, no. 1, pp. 146–160.
45. P. S. Gromski et al., The influence of scaling metabolomics data on model classification accuracy, Metabolomics, 2015, vol. 11, pp. 684–695.
46. H. K. Liland, Multivariate methods in metabolomics – from pre-processing to dimension reduction and statistical analysis, TrAC, Trends Anal. Chem., 2011, vol. 30, no. 6, pp. 827–841.
47. D. S. Wishart et al., HMDB 4.0: The Human Metabolome Database for 2018, Nucleic Acids Res., 2018, vol. 46, pp. D608–D617.
48. K. Haug et al., MetaboLights – an open-access general-purpose repository for metabolomics studies and associated meta-data, Nucleic Acids Res., 2013, vol. 41, no. D1, pp. D781–D786.
49. E. L. Ulrich et al., BioMagResBank, Nucleic Acids Res., 2008, vol. 36, pp. D402–D408.
50. Q. Cui et al., Metabolite identification via the Madison Metabolomics Consortium Database, Nat. Biotechnol., 2008, vol. 26, pp. 162–164.
51. C. Ludwig et al., Birmingham Metabolite Library: A publicly accessible database of 1D 1H and 2D 1H J-resolved NMR authentic metabolite standards (BML-NMR), Metabolomics, 2012, vol. 8, no. 1, pp. 8–12.
52. C. Steinbeck and S. Kuhn, NMRShiftDB – compound identification and structure elucidation support through a free community-built web database, Phytochemistry, 2004, vol. 65, no. 19, pp. 2711–2717.
53. M. Sud et al., Metabolomics Workbench: An international repository for metabolomics data and metadata, metabolite standards, protocols, tutorials and training, and analysis tools, Nucleic Acids Res., 2016, vol. 44, no. D1, pp. D463–D470.
54. A. C. Dona et al., A guide to the identification of metabolites in NMR-based metabonomics/metabolomics experiments, Comput. Struct. Biotechnol. J., 2016, vol. 14, pp. 135–153.
55. L. M. Smith et al., Statistical Correlation and Projection Methods for Improved Information Recovery from Diffusion-Edited NMR Spectra of Biological Samples, Anal. Chem., 2007, vol. 79, pp. 5682–5689.
56. B. J. Blaise et al., Two-Dimensional Statistical Recoupling for the Identification of Perturbed Metabolic Networks from NMR Spectroscopy, J. Proteome Res., 2010, vol. 9, pp. 4513–4520.
57. D. J. Crockford et al., Statistical Heterospectroscopy, an Approach to the Integrated Analysis of NMR and UPLC-MS Data Sets: Application in Metabonomic Toxicology Studies, Anal. Chem., 2006, vol. 78, pp. 363–371.
58. J. K. Nicholson et al., Statistical Heterospectroscopy, an Approach to the Integrated Analysis of NMR and UPLC-MS Data Sets: Application in Metabonomic Toxicology Studies, Anal. Chem., 2004, vol. 9, no. 3, pp. 363–371.
59. P. G. Takis et al., Deconvoluting interrelationships between concentrations and chemical shifts in urine provides a powerful analysis tool, Nat. Commun., 2017, vol. 8, p. 1662.
60. L. Da-Wei et al., Reliable resonance assignments of selected residues of proteins with known structure based on empirical NMR chemical shift prediction, J. Magn. Reson., 2015, vol. 254, pp. 93–97.
61. J. Chong et al., MetaboAnalyst 4.0: towards more transparent and integrative metabolomics analysis, Nucleic Acids Res., 2018, vol. 46, no. W1, pp. W486–W494.
62. E. Saccenti et al., Reflections on univariate and multivariate analysis of metabolomics data, Metabolomics, 2014, vol. 10, no. 3, pp. 361–371.
63. V. Ruiz-Rodado et al., 1H NMR-Linked Metabolomics Analysis of Liver from a Mouse Model of NP-C1 Disease, J. Proteome Res., 2016, vol. 15, no. 10, pp. 3511–3527.
64. A. Lemanska et al., Chemometric variance analysis of 1H NMR metabolomics data on the effects of oral rinse on saliva, Metabolomics, 2012, vol. 8, no. S1, pp. 64–80.
65. D.-A. Kwon et al., Assessment of green coffee bean metabolites dependent on coffee quality using a 1H NMR-based metabolomics approach, Food Res. Int., 2015, pp. 175–182.
66. J. Camacho et al., Group-Wise Principal Component Analysis for Exploratory Data Analysis, J. Comput. Graph. Stat., 2017, vol. 26, no. 3, pp. 501–512.
67. D. I. Broadhurst and D. B. Kell, Statistical strategies for avoiding false discoveries in metabolomics and related experiments, Metabolomics, 2006, vol. 2, no. 4, pp. 171–196.
68. J. Xia, D. J. Broadhurst, M. Wilson and D. Wishart, Translational biomarker discovery in clinical metabolomics: An introductory tutorial, Metabolomics, 2013, vol. 9, no. 2, pp. 280–299.
69. R. Bünger and R. T. Mallet, Metabolomics and ROC Analysis: A Promising Approach for Sepsis Diagnosis, Crit. Care Med., 2016, vol. 118, no. 24, pp. 6072–6078.
70. S. F. Graham et al., Metabolic signatures of Huntington's disease (HD): 1H NMR analysis of the polar metabolome in post-mortem human brain, Biochim. Biophys. Acta, Mol. Basis Dis., 2016, vol. 1862, no. 9, pp. 1675–1684.
71. E. Quansah et al., Methylphenidate alters monoaminergic and metabolic pathways in the cerebellum of adolescent rats, Eur. Neuropsychopharmacol., 2018, vol. 28, no. 4, pp. 513–528.
72. A. Rinnan, F. Savorani and S. B. Engelsen, Simultaneous classification of multiple classes in NMR metabolomics and vibrational spectroscopy using interval-based classification methods: iECVA vs iPLS-DA, Anal. Chim. Acta, 2018, vol. 1021, pp. 20–27.
73. E. López-Rituerto et al., Investigations of La Rioja Terroir for Wine Production Using 1H NMR Metabolomics, J. Agric. Food Chem., 2012, vol. 60, pp. 3452–3461.
74. J. Xia and D. S. Wishart, MSEA: a web-based tool to identify biologically meaningful patterns in quantitative metabolomic data, Nucleic Acids Res., 2010, vol. 38, pp. W71–W77.
75. J. Xia and D. S. Wishart, MetPA: a web-based metabolomics tool for pathway analysis and visualization, Bioinformatics, 2010, vol. 26, no. 18, pp. 2342–2344.
76. H. N. B. Moseley, Error Analysis and Propagation in Metabolomics Data Analysis, Comput. Struct. Biotechnol. J., 2013, vol. 4, no. 5, p. e201301006.
77. E. Quansah et al., 1H NMR-based metabolomics reveals neurochemical alterations in the brain of adolescent rats following acute methylphenidate administration, Neurochem. Int., 2017, vol. 108, pp. 109–120.
78. P. S. Gromski et al., A tutorial review: Metabolomics and partial least squares-discriminant analysis – a marriage of convenience or a shotgun wedding, Anal. Chim. Acta, 2015, vol. 879, pp. 10–23.
79. S. Greenland et al., Statistical tests, P values, confidence intervals and power: a guide to misinterpretations, Eur. J. Epidemiol., 2016, vol. 31, pp. 337–350.
80. E. Glaab et al., Integrative analysis of blood metabolomics and PET brain neuroimaging data for Parkinson's disease, Neurobiol. Dis., 2019, vol. 124, pp. 555–562.

Figures & Tables

Figure 1.1

7Li nucleus NMR signal originally observed by Rabi et al.10  Reproduced from ref. 10, https://doi.org/10.1103/PhysRev.55.526, with permission from American Physical Society, Copyright 1939.

Figure 1.2

400 and 60 MHz 1H NMR spectra of the same urine sample from a control participant. Assignments include: [1] trimethylsilylpropanoic acid–CH3; [2] acetone–CH3; [3] pyruvate–CH3; [4] citrate–CH2A/B; [5] creatinine/creatine–CH3; [6] cis-aconitate–CH2; [7] taurine–CH2; [8] trimethylamine-N-oxide–CH3; [9] glycine–CH3; [10] taurine–CH2; [11] unassigned-CH2; [12] creatine–CH2; [13] glycolate–CH2; [14] creatinine–CH2; [15] H2O–OH; [16] histidine–CH; [17] indoxyl sulphate–CH; [18] indoxyl sulphate–CH; [19] hippurate–CH; [20] hippurate–CH; [21] hippurate–CH; and [22] formate–CH.

Figure 1.3

Effect of realignment on the creatinine signal: before (upper panel) and after (lower panel) realignment.

Figure 1.4

1H-NMR spectrum before (A) and after (B) equidistant bucketing. (C) Equidistant bucketing, in which the bucket size is constant. As shown, a single resonance could be erroneously divided into two buckets. (D) Intelligent bucketing, in which each signal is divided into only a single bucket.

Figure 1.5

STOCSY analysis of faecal water, with R² values (0.0–1.0, right-hand ordinate axis) for correlation with the driver signal, the β-CH2 function resonance at δ = 1.562 ppm. Two other signals, those of the terminal-CH3 (t) and α-CH2 (t) functions, are correlated with this resonance, which aids the positive identification of n-butyrate.54  Reproduced from ref. 54, https://doi.org/10.1016/j.csbj.2016.02.005, under a CC By 4.0 license, https://creativecommons.org/licenses/by/4.0/.
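
The principle of one-dimensional STOCSY illustrated in Figure 1.5 can be sketched as follows, assuming an aligned (samples × variables) intensity matrix: every variable is correlated with the chosen driver variable across samples, and signals from the same molecule give high R² values because their intensities co-vary with its concentration. The simulated "n-butyrate" data and the function name are hypothetical.

```python
import numpy as np

def stocsy(X, driver):
    """1D STOCSY: Pearson r and covariance of every column of X with the driver column.

    X      : (n_samples, n_variables) array of aligned spectral intensities
    driver : column index of the driver signal
    """
    Xc = X - X.mean(axis=0)
    d = Xc[:, driver]
    cov = Xc.T @ d / (X.shape[0] - 1)
    r = cov / (Xc.std(axis=0, ddof=1) * d.std(ddof=1))  # undefined for zero-variance columns
    return r, cov

rng = np.random.default_rng(0)
n = 40
conc = rng.uniform(1.0, 5.0, n)  # hypothetical n-butyrate concentrations in 40 samples
X = np.column_stack([
    3.0 * conc + rng.normal(0, 0.1, n),  # terminal-CH3 signal
    2.0 * conc + rng.normal(0, 0.1, n),  # beta-CH2 signal (driver)
    2.0 * conc + rng.normal(0, 0.1, n),  # alpha-CH2 signal
    rng.uniform(1.0, 5.0, n),            # unrelated metabolite signal
])
r, cov = stocsy(X, driver=1)
r2 = r ** 2
```

In a full STOCSY display, the covariance vector is usually plotted as a pseudo-spectrum coloured by R², so that correlated resonances stand out against the rest of the spectrum.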

Figure 1.6

STOCSY analysis of a sucrose/glucose admixture (signals of the two metabolites are labelled S and G, respectively).55  Reproduced from ref. 55 with permission from American Chemical Society, Copyright 2007.

Figure 1.7

PCA scores plot of feline urinary profiles from drug-treated (1000 MG CD), untreated (UNTREATED_NPC) and healthy control (CONTROL) groups, shown in red, blue and green, respectively. PC1 and PC2 account for 57% and 8% of the dataset variance, respectively, and 95% confidence ellipses are shown. Two clear outliers are highlighted by arrows.
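
A minimal NumPy sketch of how PCA scores, explained-variance fractions and a 95% confidence region can be used to flag outliers such as those arrowed in Figure 1.7. It assumes the data matrix is already preprocessed (aligned, bucketed, normalised and scaled), and it approximates the 95% region by the chi-square quantile for 2 degrees of freedom (−2 ln 0.05 ≈ 5.99) applied to squared Mahalanobis distances in the PC1/PC2 plane; many software packages instead draw a Hotelling's T² ellipse based on the F-distribution. The data and function names are hypothetical.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """PCA via SVD of the mean-centred data matrix (rows = samples)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)  # fraction of total variance per component
    return scores, explained[:n_components]

def outside_95_region(scores):
    """Flag samples whose squared Mahalanobis distance in the score plane exceeds
    the chi-square (2 d.f.) 95% quantile, -2 ln(0.05)."""
    centred = scores - scores.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(centred, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", centred, inv_cov, centred)
    return d2 > -2.0 * np.log(0.05)

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 50))                  # 30 hypothetical spectra, 50 buckets each
X = np.vstack([X, rng.normal(size=50) + 8.0])  # one gross outlier (e.g. a baseline offset)
scores, explained = pca_scores(X)
flags = outside_95_region(scores)
```

The offset sample dominates PC1 and is flagged. About 5% of well-behaved samples are expected to fall outside a 95% region by chance, so flagged spectra should be inspected (for example, for poor shimming) rather than discarded automatically.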

Figure 1.8

Background: PCA scores plot of green coffee bean extracts, in which a 95% confidence ellipse reveals an outlier. Foreground: the unshimmed spectrum of this outlier sample, which lies outside the confidence ellipse for all samples.65  Reproduced from ref. 65 with permission from Elsevier, Copyright 2014.

Figure 1.9

PLS-DA scores plots with 95% confidence ellipses, and the corresponding variable importance in projection (VIP) plots, for Huntington's disease (red) versus control (blue) brain extracts from the frontal lobe (A and B) and the striatum (C and D).70  Reproduced from ref. 70 with permission from Elsevier, Copyright 2016.

Figure 1.10

OPLS-DA S-plot revealing discriminatory metabolites. Abbreviations: N-acetyl-aspartate (NAA), gamma-aminobutyric acid (GABA), glutamate (Glu), glutamine (Gln). Reproduced from ref. 71 with permission from Elsevier, Copyright 2018.

Figure 1.11

Canonical correlation analysis (CCorA) of PCA scores, in which Y1 and Y2 represent the scores vectors derived from the separate 1H NMR and clinical chemistry datasets, respectively. Reproduced from ref. 31, https://doi.org/10.1038/s41598-017-06264-2, under the terms of a CC BY 4.0 license, https://creativecommons.org/licenses/by/4.0/.

Figure 1.12

Interval extended canonical variate analysis (iECVA) plot overlaid with the average NMR spectra of 26 wine samples from La Rioja.73  Reproduced from ref. 73 with permission from American Chemical Society, Copyright 2012.

Figure 1.13

ECVA scores plot of the 26 wine samples, showing discrimination between different wineries in La Rioja based on the 100th interval shown in Figure 1.12. Reproduced from ref. 73 with permission from American Chemical Society, Copyright 2012.

Figure 1.14

Two- and three-dimensional PCA scores plots showing clear distinctions between four sets of n = 2 replicates of four separate urine samples (coded RK2, RK28, RK37 and RK41), analysed by 1H NMR spectroscopy at an operating frequency of 400 MHz. The colour codes on the right-hand side also indicate the dates on which the samples were analysed; for example, RK2-1 and RK2-2 correspond to the same sample analysed in duplicate on two separate occasions. Duplicate samples were prepared for analysis on each occasion.

Figure 1.15

Power calculations performed by Quansah et al.,77  in which the line reading 1.0 shows the optimum number of samples required for statistical significance. Reproduced from ref. 77 with permission from Elsevier, Copyright 2017.
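
Figure 1.15 relates statistical power to sample size. As a hedged illustration only (Quansah et al. may have used a different procedure), the standard normal-approximation formula for a two-sided, two-sample comparison of means with equal group sizes is n = 2(z₁₋α/₂ + z₁₋β)²/d², where d is the standardised effect size (Cohen's d) and 1 − β is the target power. The sketch uses only the Python standard library; the function names and the Bonferroni example are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def power_two_sample(n_per_group, effect_size, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of means (normal approximation).

    effect_size is Cohen's d (difference in means / common standard deviation);
    the negligible contribution of the opposite rejection tail is ignored.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(effect_size * sqrt(n_per_group / 2) - z_crit)

def n_per_group(effect_size, power=0.8, alpha=0.05):
    """Smallest equal group size giving at least the requested power (same approximation)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# When many metabolites are tested at once, alpha might be Bonferroni-adjusted,
# e.g. 0.05 / 200 for a hypothetical 200 spectral buckets.
for d in (0.5, 1.0, 1.5):
    print(d, n_per_group(d), n_per_group(d, alpha=0.05 / 200))
```

For d = 1.0, α = 0.05 and 80% power this gives 16 samples per group (an exact t-test calculation gives slightly more), and the requirement grows sharply once α is corrected for the many variables typical of NMR metabolomics.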

Table 1.1

Statistical and Classification Strategies, Including Their Advantages and Disadvantages, in NMR Metabolomics Applications.

| Metabolomic Methodology | Classification or Statistical | Univariate or Multivariate | Supervision | Advantages (+) and Disadvantages (−) |
|---|---|---|---|---|
| ANOVA | Statistical | Univariate | Unsupervised | + Hypothesis testing, with the ability to evaluate the statistical significance of a wide range of contributory variables, and their interactions, simultaneously. Partitions the total experimental variance into separate ‘predictor’ components, which may be fixed or random effects. Its essential assumptions (normality and homogeneity of variance) can be satisfied with suitable transformations, for example logarithmic or square-root ones. |
| Student's t-test | Statistical | Univariate | Unsupervised | + Hypothesis testing. − Only appropriate for comparing the means of two sample groups, and includes no correction for the false discovery rate when many variables are tested. |
| Mann–Whitney U test | Statistical | Univariate | Unsupervised | + Hypothesis testing; non-parametric equivalent of the two-sample t-test. + Data do not need to be transformed to normality prior to use. |
| Fold-change analysis | Statistical | Univariate | Unsupervised | + Hypothesis testing; represents the ratio of two sample group mean values, and the significance of these ratios may be tested. |
| ASCA | Statistical | Multivariate | Unsupervised | + Can handle paired samples, for example those collected from the same person at different time-points, or two or more possible predictor variables simultaneously. |
| PCA | Statistical | Multivariate | Unsupervised | + Outlier detection. + Unsupervised multivariate technique for dimensionality reduction and the preliminary exploration of sample or participant clustering in 2D or 3D. |
| PLS-DA | Statistical | Multivariate | Supervised | + VIPs identify significant metabolites. − Prone to overfitting, so permutation and validation testing are essential. |
| O-PLS-DA | Statistical | Multivariate | Supervised | + S-plot identifies significant metabolites. − Can only consider two groups; validation and permutation testing are required. |
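
Several of the univariate entries in Table 1.1 can be illustrated together. The sketch below assumes NumPy and uses hypothetical data and function names; it computes log2 fold-changes, two-sided Mann–Whitney U p-values (normal approximation without tie or continuity correction, so small-sample results will differ slightly from exact tests), and Benjamini–Hochberg adjusted values, which address the false-discovery-rate limitation noted for the t-test.

```python
import numpy as np
from statistics import NormalDist

def average_ranks(x):
    """1-based ranks, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test (normal approximation, no tie or continuity correction)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(np.concatenate([a, b]))
    u1 = ranks[:n1].sum() - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, 2 * (1 - NormalDist().cdf(abs(z)))

def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted p-values controlling the false discovery rate."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]  # enforce monotonicity
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q

# Hypothetical data: 10 controls and 10 cases, 3 metabolite buckets; bucket 0 doubled in cases.
rng = np.random.default_rng(42)
controls = rng.lognormal(0.0, 0.2, size=(10, 3))
cases = rng.lognormal(0.0, 0.2, size=(10, 3))
cases[:, 0] *= 2.0
log2_fc = np.log2(cases.mean(axis=0) / controls.mean(axis=0))
p_values = np.array([mann_whitney_u(cases[:, j], controls[:, j])[1] for j in range(3)])
q_values = benjamini_hochberg(p_values)
```

Only bucket 0 is expected to remain significant after adjustment; reporting q-values alongside fold-changes keeps effect size and significance distinct.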

References

1. T. Bayes and R. Price, An Essay towards solving a Problem in the Doctrine of Chances, Philos. Trans. R. Soc. London, 1763, vol. 53, pp. 370–418.
2. K. Pearson, On lines and planes of closest fit to systems of points in space, London, Edinburgh Dublin Philos. Mag. J. Sci., 1901, vol. 2, no. 11, pp. 559–572.
3. C. Spearman, General Intelligence Objectively Determined and Measured, Am. J. Psychol., 1904, vol. 15, no. 2, pp. 201–292.
4. W. S. Gosset, The application of the law of error to the work of the brewery, Guinness Internal Note, 1904.
5. R. A. Fisher, Statistical Methods for Research Workers, Oliver and Boyd, Edinburgh, 1925.
6. R. A. Fisher and B. Balmukand, The estimation of linkage from the offspring of selfed heterozygotes, J. Genet., 1928, vol. 20, no. 1, pp. 79–92.
7. J. Neyman, E. S. Pearson and K. Pearson, IX. On the problem of the most efficient tests of statistical hypotheses, Philos. Trans. R. Soc., A, 1933, vol. 231, pp. 694–704.
8. A.-H. M. Emwas et al., NMR-based metabolomics in human disease diagnosis: applications, limitations, and recommendations, Metabolomics, 2013, vol. 9, no. 5, pp. 1048–1072.
9. W. Pauli, On the hydrogen spectrum from the standpoint of the new quantum mechanics, Z. Phys., 1926, vol. 36, pp. 336–363.
10. I. I. Rabi et al., The Molecular Beam Resonance Method for Measuring Nuclear Magnetic Moments. The Magnetic Moments of ₃Li⁶, ₃Li⁷ and ₉F¹⁹, Phys. Rev., 1939, vol. 55, p. 526.
11. A. W. Overhauser, Polarization of Nuclei in Metals, Phys. Rev., 1953, vol. 92, p. 411.
12. A. G. Redfield, On the Theory of Relaxation Processes, IBM J. Res. Dev., 1957, vol. 1, no. 1, pp. 19–31.
13. E. M. Purcell et al., Resonance Absorption by Nuclear Magnetic Moments in a Solid, Phys. Rev., 1945, vol. 69, p. 37.
14. F. Bloch et al., Nuclear Induction, Phys. Rev., 1946, vol. 69, p. 127.
15. M. Grootveld et al., Progress in Low-Field Benchtop NMR Spectroscopy in Chemical and Biochemical Analysis, Anal. Chim. Acta, 2019, vol. 1067, pp. 11–30.
16. B. Percival et al., Low-Field, Benchtop NMR Spectroscopy as a Potential Tool for Point-of-Care Diagnostics of Metabolic Conditions: Validation, Protocols and Computational Models, High-Throughput, 2019, vol. 1, p. 2.
17. J. K. Nicholson et al., 750 MHz 1H and 1H-13C NMR Spectroscopy of Human Blood Plasma, Anal. Chem., 1995, vol. 67, no. 5, pp. 783–811.
18. G. Lippens, C. Dhalluin and J. M. Wieruszeski, Use of a water flip-back pulse in the homonuclear NOESY experiment, J. Biomol. NMR, 1995, vol. 5, no. 3, pp. 327–331.
19. M. Piotto, V. Saudek and V. Sklenář, Gradient-tailored excitation for single-quantum NMR spectroscopy of aqueous solutions, J. Biomol. NMR, 1992, vol. 2, no. 6, pp. 661–665.
20. Le Guennec et al., Alternatives to Nuclear Overhauser Enhancement Spectroscopy Presat and Carr-Purcell-Meiboom-Gill Presat for NMR-Based Metabolomics, Anal. Chem., 2017, vol. 89, pp. 8582–8588.
21. J. Lindon, J. Nicholson and Holmes, The Handbook of Metabonomics and Metabolomics, Elsevier Science, 1st edn, 2008.
22. L. Pauling et al., Quantitative analysis of urine vapor and breath by gas-liquid partition chromatography, Proc. Natl. Acad. Sci. U.S.A., 1971, vol. 68, no. 10, pp. 2374–2376.
23. D. B. Kell and S. G. Oliver, The Metabolome 18 Years on: A Concept Comes of Age, Metabolomics, 2016, vol. 12, no. 9, p. 148.
24. J. R. Bales et al., Urinary excretion of acetaminophen and its metabolites as studied by proton NMR spectroscopy, Clin. Chem., 1984, vol. 30, pp. 1631–1636.
25. J. K. Nicholson et al., Monitoring metabolic disease by proton NMR of urine, Lancet, 1984, vol. 2, pp. 751–752.
26. J. Lindon and J. Nicholson, Spectroscopic and Statistical Techniques for Information Recovery in Metabonomics and Metabolomics, Annu. Rev. Anal. Chem., 2008, vol. 1, no. 1, pp. 45–69.
27. J. B. German et al., Metabolomics: building on a century of biochemistry to guide human health, Metabolomics, 2005, vol. 1, no. 1, pp. 3–9.
28. S. Beisken et al., Getting the right answers: understanding metabolomics challenges, Expert Rev. Mol. Diagn., 2015, vol. 15, no. 1, pp. 97–109.
29. Y. Sandlers, The future perspective: metabolomics in laboratory medicine for inborn errors of metabolism, Transl. Res., 2017, vol. 189, pp. 65–75.
30. J. C. Lindon, E. Holmes and J. Nicholson, Metabonomics techniques and applications to pharmaceutical research & development, Pharm. Res., 2006, vol. 23, no. 6, pp. 1075–1088.
31. F. Probert et al., NMR analysis reveals significant differences in the plasma metabolic profiles of Niemann Pick C1 patients, heterozygous carriers, and healthy controls, Sci. Rep., 2017, vol. 7, p. 6320.
32. J. Pinto et al., Human plasma stability during handling and storage: impact on NMR metabolomics, Analyst, 2014, vol. 139, no. 5, pp. 1168–1177.
33. A.-H. M. Emwas et al., Recommendations and Standardization of Biomarker Quantification Using NMR-Based Metabolomics with Particular Focus on Urinary Analysis, J. Proteome Res., 2016, vol. 15, pp. 360–373.
34. O. Beckonert et al., Metabolic profiling, metabolomic and metabonomic procedures for NMR spectroscopy of urine, plasma, serum and tissue extracts, Nat. Protoc., 2007, vol. 2, no. 11, pp. 2692–2703.
35. M. F. Alum et al., 4,4-Dimethyl-4-silapentane-1-ammonium trifluoroacetate (DSA), a promising universal internal standard for NMR-based metabolic profiling studies of biofluids, including blood plasma and serum, Metabolomics, 2008, vol. 4, pp. 122–127.
36. J. K. Nicholson and K. R. Gartland, 1H NMR studies on protein binding of histidine, tyrosine and phenylalanine in blood plasma, NMR Biomed., 1989, vol. 2, p. 2.
37. M. Martin et al., PepsNMR for 1H NMR metabolomic data pre-processing, Anal. Chim. Acta, 2018, pp. 1–13.
38. B. Percival et al., Detection and Determination of Methanol and Further Potential Toxins in Human Saliva Collected from Cigarette Smokers: A 1H NMR Investigation, JSM Biotechnol. Biomed. Eng., 2018, vol. 5, no. 1, p. 1081.
39. A. Alonso, S. Marsal and A. Julià, Analytical methods in untargeted metabolomics: state of the art in 2015, Front. Bioeng. Biotechnol., 2015, vol. 3, no. 23, pp. 1–20.
40. P. Giraudeau, I. Tea, G. S. Remaud and S. Akoka, Reference and normalization methods: Essential tools for the intercomparison of NMR spectra, J. Pharm. Biomed. Anal., 2014, vol. 93, pp. 3–16.
41. A. Scalabre et al., Evolution of Newborns’ Urinary Metabolomic Profiles According to Age and Growth, J. Proteome Res., 2017, vol. 16, pp. 3732–3740.
42. C. M. Slupsky et al., Investigations of the Effects of Gender, Diurnal Variation, and Age in Human Urinary Metabolomic Profiles, Anal. Chem., 2007, vol. 79, no. 18, pp. 6995–7004.
43. K. E. Haslauer, D. Hemmler, P. Schmitt-Kopplin and S. S. Heinzmann, Guidelines for the Use of Deuterium Oxide (D2O) in 1H NMR Metabolomics, Anal. Chem., 2019, vol. 91, no. 17, pp. 11063–11069.
44. S. M. Kohl et al., State-of-the art data normalization methods improve NMR-based metabolomic analysis, Metabolomics, 2012, vol. 8, no. 1, pp. 146–160.
45. P. S. Gromski et al., The influence of scaling metabolomics data on model classification accuracy, Metabolomics, 2015, vol. 11, pp. 684–695.
46. H. K. Liland, Multivariate methods in metabolomics – from pre-processing to dimension reduction and statistical analysis, TrAC, Trends Anal. Chem., 2011, vol. 30, no. 6, pp. 827–841.
47. D. S. Wishart et al., HMDB 4.0: The Human Metabolome Database for 2018, Nucleic Acids Res., 2018, vol. 46, pp. D608–D617.
48. K. Haug et al., MetaboLights – an open-access general-purpose repository for metabolomics studies and associated meta-data, Nucleic Acids Res., 2013, vol. 41, no. D1, pp. D781–D786.
49. E. L. Ulrich et al., BioMagResBank, Nucleic Acids Res., 2008, vol. 36, pp. D402–D408.
50. Q. Cui et al., Metabolite identification via the Madison Metabolomics Consortium Database, Nat. Biotechnol., 2008, vol. 26, pp. 162–164.
51. C. Ludwig et al., Birmingham Metabolite Library: A publicly accessible database of 1D 1H and 2D 1H J-resolved NMR authentic metabolite standards (BML-NMR), Metabolomics, 2012, vol. 8, no. 1, pp. 8–12.
52. C. Steinbeck and S. Kuhn, NMRShiftDB – compound identification and structure elucidation support through a free community-built web database, Phytochemistry, 2004, vol. 65, no. 19, pp. 2711–2717.
53. M. Sud et al., Metabolomics Workbench: An international repository for metabolomics data and metadata, metabolite standards, protocols, tutorials and training, and analysis tools, Nucleic Acids Res., 2016, vol. 44, no. D1, pp. D463–D470.
54. A. C. Dona et al., A guide to the identification of metabolites in NMR-based metabonomics/metabolomics experiments, Comput. Struct. Biotechnol. J., 2016, vol. 14, pp. 135–153.
55. L. M. Smith et al., Statistical Correlation and Projection Methods for Improved Information Recovery from Diffusion-Edited NMR Spectra of Biological Samples, Anal. Chem., 2007, vol. 79, pp. 5682–5689.
56. B. J. Blaise et al., Two-Dimensional Statistical Recoupling for the Identification of Perturbed Metabolic Networks from NMR Spectroscopy, J. Proteome Res., 2010, vol. 9, pp. 4513–4520.
57. D. J. Crockford et al., Statistical Heterospectroscopy, an Approach to the Integrated Analysis of NMR and UPLC-MS Data Sets: Application in Metabonomic Toxicology Studies, Anal. Chem., 2006, vol. 78, pp. 363–371.
58. J. K. Nicholson et al., Statistical Heterospectroscopy, an Approach to the Integrated Analysis of NMR and UPLC-MS Data Sets: Application in Metabonomic Toxicology Studies, Anal. Chem., 2004, vol. 9, no. 3, pp. 363–371.
59. P. G. Takis et al., Deconvoluting interrelationships between concentrations and chemical shifts in urine provides a powerful analysis tool, Nat. Commun., 2017, vol. 8, p. 1662.
60. L. Da-Wei et al., Reliable resonance assignments of selected residues of proteins with known structure based on empirical NMR chemical shift prediction, J. Magn. Reson., 2015, vol. 254, pp. 93–97.
61. J. Chong et al., MetaboAnalyst 4.0: towards more transparent and integrative metabolomics analysis, Nucleic Acids Res., 2018, vol. 46, no. W1, pp. W486–W494.
62. E. Saccenti et al., Reflections on univariate and multivariate analysis of metabolomics data, Metabolomics, 2014, vol. 10, no. 3, pp. 361–371.
63. V. Ruiz-Rodado et al., 1H NMR-Linked Metabolomics Analysis of Liver from a Mouse Model of NP-C1 Disease, J. Proteome Res., 2016, vol. 15, no. 10, pp. 3511–3527.
64. A. Lemanska et al., Chemometric variance analysis of 1H NMR metabolomics data on the effects of oral rinse on saliva, Metabolomics, 2012, vol. 8, no. S1, pp. 64–80.
65. D.-A. Kwon et al., Assessment of green coffee bean metabolites dependent on coffee quality using a 1H NMR-based metabolomics approach, Food Res. Int., 2015, pp. 175–182.
66. J. Camacho et al., Group-Wise Principal Component Analysis for Exploratory Data Analysis, J. Comput. Graph. Stat., 2017, vol. 26, no. 3, pp. 501–512.
67. D. I. Broadhurst and D. B. Kell, Statistical strategies for avoiding false discoveries in metabolomics and related experiments, Metabolomics, 2006, vol. 2, no. 4, pp. 171–196.
68. J. Xia, D. J. Broadhurst, M. Wilson and D. Wishart, Translational biomarker discovery in clinical metabolomics: An introductory tutorial, Metabolomics, 2013, vol. 9, no. 2, pp. 280–299.
69. R. Bünger and R. T. Mallet, Metabolomics and ROC Analysis: A Promising Approach for Sepsis Diagnosis, Crit. Care Med., 2016, vol. 118, no. 24, pp. 6072–6078.
70. S. F. Graham et al., Metabolic signatures of Huntington's disease (HD): 1H NMR analysis of the polar metabolome in post-mortem human brain, Biochim. Biophys. Acta, Mol. Basis Dis., 2016, vol. 1862, no. 9, pp. 1675–1684.
71. E. Quansah et al., Methylphenidate alters monoaminergic and metabolic pathways in the cerebellum of adolescent rats, Eur. Neuropsychopharmacol., 2018, vol. 28, no. 4, pp. 513–528.
72. A. Rinnan, F. Savorani and S. B. Engelsen, Simultaneous classification of multiple classes in NMR metabolomics and vibrational spectroscopy using interval-based classification methods: iECVA vs iPLS-DA, Anal. Chim. Acta, 2018, vol. 1021, pp. 20–27.
73. E. López-Rituerto et al., Investigations of La Rioja Terroir for Wine Production Using 1H NMR Metabolomics, J. Agric. Food Chem., 2012, vol. 60, pp. 3452–3461.
74. J. Xia and D. S. Wishart, MSEA: a web-based tool to identify biologically meaningful patterns in quantitative metabolomic data, Nucleic Acids Res., 2010, vol. 38, pp. W71–W77.
75. J. Xia and D. S. Wishart, MetPA: a web-based metabolomics tool for pathway analysis and visualization, Bioinformatics, 2010, vol. 26, no. 18, pp. 2342–2344.
76. H. N. B. Moseley, Error Analysis and Propagation in Metabolomics Data Analysis, Comput. Struct. Biotechnol. J., 2013, vol. 4, no. 5, p. e201301006.
77. E. Quansah et al., 1H NMR-based metabolomics reveals neurochemical alterations in the brain of adolescent rats following acute methylphenidate administration, Neurochem. Int., 2017, vol. 108, pp. 109–120.
78. P. S. Gromski et al., A tutorial review: Metabolomics and partial least squares-discriminant analysis – a marriage of convenience or a shotgun wedding, Anal. Chim. Acta, 2015, vol. 879, pp. 10–23.
79. S. Greenland et al., Statistical tests, P values, confidence intervals and power: a guide to misinterpretations, Eur. J. Epidemiol., 2016, vol. 31, pp. 337–350.
80. E. Glaab et al., Integrative analysis of blood metabolomics and PET brain neuroimaging data for Parkinson's disease, Neurobiol. Dis., 2019, vol. 124, pp. 555–562.