- 1.1 Introduction
- 1.2 Electrospray Ionization
- 1.2.1 Ionization Mechanisms
- 1.2.2 Ionization Response and Bias
- 1.2.3 Ion Transmission Efficiency
- 1.3 Contemporary LC-MS/MS Instrumentation
- 1.3.1 Liquid Chromatography
- 1.3.2 High-performance Tandem Mass Spectrometers
- 1.3.3 High-performance Characteristics of LC-MS/MS: Cycle Time, Peak Capacity, and Dynamic Range
- 1.4 Optimizing LC-MS/MS for Quantitative Proteomics Using Design of Experiments
- 1.5 Summary
CHAPTER 1: Practical Considerations and Current Limitations in Quantitative Mass Spectrometry-based Proteomics
Published: 10 Jan 2014
A. M. Hawkridge, in Quantitative Proteomics, ed. C. E. Eyers and S. Gaskell, The Royal Society of Chemistry, 2014, pp. 1-25.
Quantitative mass spectrometry (MS)-based proteomics continues to evolve through advances in sample preparation, chemical and biochemical reagents, instrumentation, and software. The breadth of proteomes and biological applications combined with unique experimental goals makes optimizing MS-based proteomics workflows a daunting task. Several MS-based instrument platforms are commercially available with LC-MS/MS being the most common for quantitative proteomics studies. Although the direction of LC-MS/MS instrumentation development is toward more user-friendly interfaces, there remain fundamental aspects of the technology that can be optimized for improving data quality. The intent of this chapter is to provide an introductory framework for understanding some of the more significant LC-MS/MS experimental conditions that can influence quantitative MS-based proteomics measurements, including electrospray ionization (ESI) bias and ion transmission efficiency. Because each commercial LC-MS/MS system is unique with regard to ESI source, transmission optics, ion isolation and trapping, ion fragmentation, and mass analysis, the use of design of experiments (DoE) is discussed as a potential approach for efficiently optimizing multiple inter-related factors.
1.1 Introduction
Mass spectrometry (MS)-based proteomics has become a prominent technology platform for quantitatively studying protein expression, modification, interaction, and degradation.1–4 Although more established protein quantification methods using antibodies, gel electrophoresis, and radiochemical labelling remain important in biological research, they do not provide the level of molecular specificity and breadth of unbiased proteome coverage achieved by MS-based approaches. Among the many types and configurations of mass spectrometers, reverse-phase liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) is perhaps the most commonly used for MS-based proteomics studies. LC-MS/MS can be configured to accommodate many types of quantitative experiments, spanning discovery (i.e., global shotgun proteomics) to targeted protein quantification (protein cleavage isotope dilution mass spectrometry, PC-IDMS). However, optimizing the performance of an LC-MS/MS system for quantitative proteomics measurements can be a daunting challenge when considering the type of sample, the choice of quantitative strategy (e.g., isobaric tagging, SILAC, label-free, PC-IDMS, etc.), the extensive number of adjustable instrument parameters, and the rapidly changing technology landscape. In order to optimize LC-MS/MS for detection limits, proteome coverage, precision, and accuracy, it is important to have a fundamental understanding of the inter-related effects of instrument settings and experimental conditions on data quality.
LC-MS/MS is not an inherently quantitative technique due in large part to bias in the electrospray ionization (ESI) response.5 Co-eluting species in a typical LC-MS/MS experiment must compete for a finite amount of charge in the ESI plume before reaching the mass spectrometer. For LC-MS/MS-based proteomics, the competing species are typically proteolytic peptides derived from intact proteins, each of which has unique physical (e.g., molecular weight) and chemical (e.g., hydrophobicity, isoelectric point) properties that give rise to different ESI responses. These physico-chemical properties also affect additional LC-MS/MS performance characteristics, including chromatographic, transmission, and fragmentation efficiency, further complicating LC-MS/MS quantification. Compounding these challenges is the pre-analytical variability introduced during sample preparation, including digestion efficiency, differential degradation rates of proteins and peptides, and preparation-induced modifications (e.g., methionine oxidation), that can impact quantitative accuracy. All of these factors must be considered in quantitative LC-MS/MS-based proteomics, which covers a broad spectrum of experimental workflows. These workflows can be categorized as either relative (Figure 1.1A) or absolute (Figure 1.1B).
There are several relative quantification methods developed for LC-MS/MS-based proteomics (Figure 1.1A), including chemical tagging, both isotopic and isobaric (e.g., ICAT6, iTRAQ7, and TMT8; see Chapter 3 for further information), in vitro metabolic labeling (e.g., SILAC9, 15N10; see Chapters 10 and 11 for further information), label-free approaches (see Chapter 6 for further information),11–15 and 16O/18O labeling.16 Label-free LC-MS/MS quantification requires a high degree of LC run-to-run reproducibility13,17 to achieve good precision, as the data-dependent MS/MS settings impact accuracy, dynamic range, and proteome coverage.18 Isobaric tagging improves run-to-run precision but can suffer from imprecision at the sample preparation stage and is subject to reduced accuracy.19 Although the multiplexing capability of isobaric tagging is a major strength, the dynamic range of measured protein abundance suffers with increasing numbers of tags. Digesting a protein sample in 18O water introduces a shift of 2 Da per incorporated 18O atom (up to 4 Da upon complete exchange of both C-terminal oxygens) in the proteolytic peptides relative to a protein sample digested in 16O water. This global approach is attractive for clinical proteome samples (e.g., plasma and tissues) that cannot be labelled in vitro, but it suffers from overlapping isotopic distributions, which make identification and quantification difficult.20 In vitro labeling, particularly stable isotope labeling of amino acids in cell culture (SILAC), provides excellent quantitative precision for global protein expression studies, accounts for pre-analytical variability during sample preparation, and is less sensitive to run-to-run LC reproducibility. However, because SILAC generates peptide pairs (2-plex labeling) or triplets (3-plex labeling), proteome coverage could, in principle, be reduced under the same LC-MS/MS method compared with label-free or isobaric tagging.21,22 This is a consequence of the instrument spending two or three times as much time performing MS/MS on differentially labeled peptides of the same primary sequence, and of the added charge load on automatic gain control (AGC)-based instruments, which limits the number of ions that can be accumulated and detected.
Absolute quantification methods for LC-MS/MS-based proteomics (i.e., PC-IDMS) require internal standards, such as stable isotope-labeled proteins or peptides of known concentration (Figure 1.1B; see Chapter 4 for further information).23–29 The accuracy and precision of the LC-MS/MS method are largely dependent on the quality and stability of the internal standard. A purified stable isotope-labeled form of the intact protein is ideal as it would, in principle, generate the measured (i.e., signature) tryptic peptide at the same rate and to the same extent as the endogenous protein target.30 However, stable isotope-labeled proteins are sometimes challenging to express and purify, making them more expensive and less common. Another strategy, termed QconCAT,31–34 uses a synthetic gene inserted into an expression vector that encodes a concatenated series of the desired tryptic peptides for subsequent stable isotope labeling and enzymatic digestion. This strategy has great potential for multiplexed assays that require reproducible production of targeted peptides. Although the digestion efficiency of the concatenated stable isotope-labeled peptides may differ from that of their endogenous target proteins, thus impacting accuracy, the peptide production rate should be similar irrespective of sample, providing a high level of precision. The most common type of PC-IDMS method uses stable isotope-labeled synthetic peptides generated by solid phase synthesis techniques at lower cost and with high purity. However, studies have shown that the quantitative accuracy obtained with synthetic stable isotope-labeled tryptic peptides is highly variable, depending on the rates of formation and the stability of the corresponding tryptic peptides.35,36 Thus, the strengths and weaknesses of each approach must be carefully considered when developing targeted LC-MS/MS assays. Optimization of LC-MS/MS conditions for PC-IDMS largely focuses on limits of detection, limits of quantification, and linear dynamic range while concurrently minimizing analysis time. Triple quadrupole LC-MS/MS systems are typically the instruments of choice for these measurements, but new hybrid high resolution/high mass accuracy instruments, such as the quadrupole-Orbitrap, have recently been introduced that could potentially shift this paradigm.
This chapter will focus on some of the fundamental and practical aspects of LC-MS/MS that impact quantitative proteomics measurements. The first section covers ESI mechanism(s), bias, and transmission. Ion transmission spans the ESI plume, the MS inlet, the intermediate pressure region, and the ion optics in the ultra-low pressure region of the mass spectrometer. Once these areas have been reviewed, the collective LC-MS/MS platform will be discussed in the context of peak capacity, duty cycle, and dynamic range. The final section will review experimental design (i.e., design of experiments, DoE) and highlight recent interest in this approach for optimizing LC-MS/MS. DoE offers perhaps the most universal empirical method for optimizing LC-MS/MS for complex quantitative proteomics measurements, where experimental goals and outcomes must be balanced against a rapidly changing technology landscape.
1.2 Electrospray Ionization
Electrospray ionization (ESI) was introduced by Fenn in 198437 and has transformed the fields of separation science and mass spectrometry with no foreseeable limit to its analytical utility. Wilm and Mann38,39 introduced nanoelectrospray (nESI) in the mid-1990s, which significantly improved the sensitivity of ESI and allowed for the characterization of low concentration tryptic peptides. The original nESI emitter configuration involved a high voltage applied to a metalized, tapered, fused silica tip with no applied back pressure. This configuration has since been replaced for LC-MS/MS applications with a non-metalized fused silica nESI emitter, illustrated in Figure 1.2A with the key dimensions and experimental parameters labeled. A standard nESI emitter is a 360 μm OD fused silica capillary pulled to a 10–30 μm ID tapered tip,40,41 typically positioned approximately 1–5 mm from the entrance of the mass spectrometer. A flow rate of 200–500 nL min−1 is applied with either a syringe or an LC pump, and a high voltage (1500–2500 V) is applied to an anode at the liquid/emitter interface. Despite the evolution and established use of nESI for quantitative proteomics, fundamental questions remain regarding the ionization mechanism(s), bias, dynamic range, and transmission efficiency of gas-phase ions into the mass spectrometer. A comprehensive discussion of these topics is provided in a seminal book edited by Cole42 and several well-written review articles.43–46
1.2.1 Ionization Mechanisms
The initial stages of ESI are illustrated in Figure 1.2A, whereby the positive high voltage applied to the solvent creates an excess of protons at the emitter tip to form a Taylor cone.39,47 The combination of solvent flow and charge repulsion (i.e., maximizing surface area) creates charged droplets that are emitted from the tip of the Taylor cone. The size of the droplets varies depending on the solvent composition, flow rate, and emitter tip diameter.38,39,43,48,49 Wilm and Mann38 estimated a parent droplet diameter of ∼200 nm emitted from a 1 μm metalized emitter tip at 20–40 nL min−1, having previously shown that the parent droplet diameter scales with the two-thirds power of the flow rate.39 Smith et al.49 used Doppler interferometry to study the dynamics of droplets emitted from a 50 μm metal capillary tip at 1–2 μL min−1 for different solvent compositions and measured parent droplet diameters for water of 10–40 μm. Based on these two studies, and given that the geometries and voltage junctions are completely different, a conservative estimate for the parent droplet diameter shown in Figure 1.2A is <5 μm for the experimental conditions shown. Regardless of the exact dimensions of the parent droplet, the dimensions and conditions shown are typical for nESI in LC-MS/MS quantitative proteomics measurements. In general, larger diameter droplets (>5 μm) generated at higher flow rates (>1 μL min−1) with larger diameter emitter tips have smaller charge/volume ratios, requiring sheath and countercurrent gas flows to desolvate the droplets and drive ionization of analytes. Smaller diameter droplets (<5 μm) generated in nESI (<1 μL min−1) have a higher charge/volume ratio and require less desolvation to drive ionization. With the exception of fast LC separations at μL min−1 flow rates for targeted PC-IDMS protein quantification, most LC-MS/MS proteomics work employs nESI because of the enhanced ionization efficiency and, thus, improved analysis/quantification.
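As a back-of-the-envelope check on the <5 μm estimate above, the two-thirds-power scaling can be applied to the ∼200 nm droplets reported by Wilm and Mann. The following Python sketch is illustrative only; the reference point (200 nm at ∼30 nL min−1) and the flow rates chosen are assumptions for the worked example, not values from this chapter.

```python
# Scale the ~200 nm parent droplet diameter reported at ~30 nL/min (midpoint of
# 20-40 nL/min) to typical nLC flow rates using d ∝ Q^(2/3) (Wilm and Mann).

def droplet_diameter_nm(flow_nl_min, d_ref_nm=200.0, q_ref_nl_min=30.0):
    """Estimate the parent droplet diameter (nm) at a given flow rate (nL/min)."""
    return d_ref_nm * (flow_nl_min / q_ref_nl_min) ** (2.0 / 3.0)

for q in (30, 300, 500, 1000):
    print(f"{q:5d} nL/min -> ~{droplet_diameter_nm(q) / 1000:.1f} um")
# ~0.2, ~0.9, ~1.3, ~2.1 um: all comfortably below the conservative <5 um estimate
```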
The formation of ionized gas-phase analytes from the ESI process is believed to occur by one of two mechanisms: 1. the charge residue model (CRM)50 proposed by Dole; and 2. the ion evaporation model (IEM)51 proposed by Iribarne and Thomson. The CRM maintains that the solvent from the parent charged droplet evaporates causing successive droplet fission events until all solvent has evaporated and residual charge ionizes the desolvated analyte. The IEM (Figure 1.2B) maintains that, as solvent evaporates from the parent droplet, the surface charge density increases creating an environment for charged yet solvated analytes to readily escape (‘evaporate’) once fission occurs at 80–100% of the Rayleigh limit.43 Although the fundamental ionization mechanism(s) for ESI is still actively studied,48,52–58 it is generally accepted that the IEM provides the most likely ionization mechanism for peptides and small proteins.
1.2.2 Ionization Response and Bias
The IEM is also consistent with the inherent ionization bias observed for the complex peptide mixtures analyzed in shotgun proteomics. For example, LC-MS analysis of an enzymatic digest of a pure protein does not result in equivalent intensities for each individual peptide. We can begin to estimate the ESI response for peptides from eqn 1.1, published by Fenn.5
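Eqn 1.1 is reconstructed below from the term definitions that follow; the grouping of the electrostatic term (with e the elementary charge and kB the Boltzmann constant) is an assumption consistent with ion evaporation treatments, not a verbatim transcription of Fenn's equation:

$$N_{iz} \propto A\,N_i \exp\!\left(\frac{-\Delta G^{0}_{iz}}{RT}\right) \exp\!\left(\frac{z e Q\,\Delta r}{4\pi\varepsilon_0\, r\,(r+\Delta r)\,k_{\mathrm{B}} T}\right) \qquad (1.1)$$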
The ESI response (or ion flux), Niz, of an ion (i) with z charges is a function of a proportionality constant (A) that relates bulk concentration to surface activity for a given analyte, the free energy of solvation (ΔG0iz), the gas constant (R), the temperature (T), the moles of analyte (Ni), the radius of the droplet (r), the distance the ions must travel to become desolvated (Δr), the excess charge (Q), and the gas permittivity constant (ε0). Although, in principle, all of the experimental factors from eqn 1.1 can be adjusted to optimize ESI response, the most practical include the droplet size (r), temperature (T), desolvation distance (Δr), and concentration (Ni).
The importance of droplet size (r) for ionization efficiency was already discussed in the context of ESI vs. nESI. Because r is inversely related to the ESI response, smaller droplets are best for shotgun proteomics measurements. The trade-off for nESI and low flow rates is sample throughput, as equilibration of the LC column takes longer. Temperature is another factor that can be controlled to tailor the ESI response, particularly at higher flow rates where desolvation of large droplets is more critical.59 Heated nESI sources have also been developed60,61 to study protein complex thermodynamics and to level the ESI response factors for target analytes. A thorough study of the effect of nESI emitter temperature on quantitative proteomics datasets may reveal significant advantages, particularly for label-free proteomics, where lower-responding peptides may increase in abundance and subsequently be selected for sequencing. The desolvation distance (Δr) for a set r and T can be adjusted by changing the nESI emitter tip-to-capillary inlet distance or the capillary length.41,62 Geromanos et al.41 found an emitter–capillary distance of 1 mm to be optimal for peptide signal intensity, and results from Page et al.62 showed a two-fold decrease in peak intensity for reserpine when the nESI emitter tip was moved from 2 mm to 5 mm from the inlet. It is important to note, however, that these distances are highly source dependent, hence the conservative range (1–5 mm) given in Figure 1.2A. Finally, the analyte concentration (Ni) can be increased to improve peptide signal intensity, but saturation begins to occur above ∼1–10 pmoles (the amount of a unique proteolytic peptide trapped on a column) for contemporary nLC-MS/MS systems. The primary drivers of ESI response bias are the inter-related partition constant (A) and ΔG0iz, both of which depend on the physico-chemical properties of the proteolytic peptides.
Peptide ionization bias in the ESI process has been attributed to several physico-chemical properties, including molecular weight, basicity, hydrophobicity, 3D structure, and solubility. Collectively, these properties are being used to understand the ESI response bias observed in nLC-MS/MS shotgun proteomics studies. Early fundamental work by Enke and co-workers63–65 showed that more hydrophobic amino acid residues preferentially ionize relative to more hydrophilic residues. Similar findings were reported by Muddiman and co-workers for intact proteins66 and DNA oligonucleotides,67 all of which corroborate the seminal study by Fenn5 with quaternary alkyl amines. Researchers have built on these earlier fundamental studies of simple systems and developed interpretative and predictive informatics tools to dissect the ESI response of complex proteomics datasets.68–72 Hydrophobicity, basicity, and molecular weight were found to be the top-ranked physico-chemical properties for predicting ESI response. The development and evolution of these bioinformatic tools will likely facilitate more efficient development (i.e., less trial and error) of targeted PC-IDMS assays and potentially expand our fundamental understanding of peptide ESI response in complex mixtures.
1.2.3 Ion Transmission Efficiency
The transmission of ESI-generated ions into the mass spectrometer is an inefficient process relative to the number of ions generated. Modern LC-MS systems can routinely achieve femtomole to attomole detection limits for targeted PC-IDMS studies. However, it has been shown experimentally that the ionization efficiency of ESI (and nESI), which approaches unity, far exceeds the transmission efficiency, which is estimated to be less than 1%.38,39 Figure 1.3A illustrates the ionization efficiency relative to the distance between the nESI emitter tip and the heated capillary inlet. As the charged droplets approach the MS inlet, the population of ions increases exponentially with successive fission events, as illustrated in Figure 1.2B. Furthermore, as discussed in the previous section, there is inherent bias during ionization, with better-ionizing peptides generating a higher ESI response (Figure 1.2B and Figure 1.3A). Thus, when considering that contemporary LC-MS systems can routinely provide low femtomole detection limits for tryptic peptides, it becomes clear that there is enormous opportunity to improve detection limits through higher transmission efficiencies.
There have been several approaches to improve ion transmission efficiency both at the MS inlet (Figure 1.3A) and within the MS inlet capillary/orifice (Figure 1.3B). These include fabricating multiple MS inlets,73–75 flared capillaries,76,77 atmospheric pressure separation (e.g., FAIMS),78–87 and air amplifiers.88,89 Each study has demonstrated improvements from two- to twenty-fold depending on the analyte. The number of ions that make it into the MS inlet also depends on the nESI emitter-to-capillary distance.62 Too far, and the majority of ions are lost at the face of the capillary inlet; too close, and ions do not have sufficient time and distance to form from the charged droplets (see section 1.2.2). Furthermore, Page et al.62 showed that a significant fraction of the ion current is lost to the inner wall of the capillary (Figure 1.3B), reducing ion formation and/or transmission. The MS inlet capillary or orifice illustrated in Figure 1.3B has been increasingly shortened, from ∼6 inch-long glass capillaries to ∼1 inch metal inlets that vary from capillaries to skimmer cones (e.g., Micromass Z-spray). The length, inner diameter, and temperature of the inlet dictate the conductance90,91 and the pumping speed necessary in the intermediate pressure regime (Figure 1.3C). For commercial instruments where the physical dimensions are not easily shortened to improve transmission efficiency,90 the only experimental parameters that can be adjusted are the applied voltage, the temperature, and the proximity of the nESI emitter to the MS inlet. The temperature is adjusted such that it is sufficiently high to effectively desolvate charged droplets (i.e., promote peptide ionization) yet low enough to maintain effective conductance into the intermediate pressure region. Increasing the MS inlet temperature can result in a drop in pressure in the intermediate pressure region, which could in turn negatively impact ion transmission. For example, ion transmission is sensitive to the position of the Mach disc relative to the skimmer cone, the former being sensitive to the amount of gas exiting the capillary. The illustration in Figure 1.3C gives a generic overview of both older ESI sources (skimmer cone) and newer ESI sources that incorporate ion funnel technology. The ion funnel developed in the Smith lab73,74,92–101 has replaced some of the skimmer cone-based sources in commercial LC-MS/MS platforms in an effort to improve ion transmission in the high/intermediate pressure regions immediately following the exit orifice of the mass spectrometer inlet. Although these technologies have proven effective in custom-built instruments, it is difficult to quantify how they impact commercial instruments, given restricted access to instrument controls and an inability to modify or measure losses at each stage of ion formation and transport. The same can be said for understanding losses during ion storage and mass analysis because each contemporary instrument platform from a given manufacturer is unique. Quadrupoles, linear ion traps, time-of-flight analyzers, Orbitraps, and multiple combinations thereof (i.e., hybrid MS/MS instruments) all have different operating principles and performance characteristics that make independent cross-instrument platform comparisons of ion transmission challenging.
The reality of quantitative LC-MS/MS-based proteomics is that the user has increasingly less control over how commercial instruments function and limited ability to modify the instrument in an effort to improve performance. To keep pace with the rate of instrument improvements and maximize performance, section 1.4 of this chapter provides a discussion of a systematic and empirical method for optimizing LC-MS/MS instruments, regardless of instrument manufacturer.
1.3 Contemporary LC-MS/MS Instrumentation
1.3.1 Liquid Chromatography
Chromatography is essential to MS-based proteomics for the purification, pre-fractionation, and separation of complex protein and peptide samples. Reverse-phase (RP) high-performance liquid chromatography (HPLC) is used almost synonymously with the 'LC' in LC-MS/MS because it provides excellent separation efficiency for tryptic peptides and its mobile phase compositions are compatible with ESI. The demands of MS-based proteomics have driven innovation in LC technology, particularly with regard to nanoflow systems. The benefits of lower flow rates have already been discussed, yet producing stable and reproducible nanoflow gradients on capillary reverse-phase columns is not trivial. Early efforts to achieve nanoflow gradients involved splitting the flow from a standard μL min−1 or mL min−1 HPLC system prior to the capillary column. Run-to-run reproducibility was challenging because back pressure on the analytical capillary column would increase with successive sample injections. Furthermore, the flow rate would change over the course of the gradient, as back pressure was highest at the beginning (high aqueous) relative to the end (high organic) of the gradient. These changes in flow rate affected the nESI conditions, which in turn produced variable electrospray responses and poor inter-run LC-MS/MS reproducibility.
Splitless nanoflow LC systems were introduced around ten years ago and have become the standard for high-performance LC-MS/MS-based quantitative proteomics due to their superior flow rate stability and retention time reproducibility.102 Flow rate stability is critical for generating uniformly sized charged droplets to ensure reproducible ionization efficiency across multiple LC-MS/MS runs. Retention times are also important, particularly for label-free quantification and scheduled SRM assays. Average retention time reproducibility for peptides in complex shotgun proteomic mixtures is <0.5% RSD for some of the more widely used commercial nLC systems.103 The technological advances in nLC systems related to retention time reproducibility have also been extended to microflow rates. This is becoming increasingly important with the introduction of hybrid LC-MS/MS instruments that can perform both global 'discovery' and targeted quantitative proteomics studies, the former typically employing nLC, whereas the latter uses μLC. Furthermore, these developments have been integrated into ultra-high pressure LC (UHPLC) systems that operate at pressures above 10 000 psi. UHPLC, in general, provides higher peak capacity (and therefore improved detection limits) and the possibility of faster LC-MS/MS run times compared with HPLC.104–106 Versatile splitless nano/microflow HPLC and UHPLC systems have been critical for improved precision and accuracy in quantitative LC-MS/MS. Current trends suggest more integrated nLC-nESI interfaces in the future, with chip-based columns, reduced dead volume connections, and better control over column and ESI conditions (e.g., column temperature, high voltage placement). Collectively, these advances will serve to make quantitative proteomics, particularly label-free quantitative studies, more routine and reproducible.
1.3.2 High-performance Tandem Mass Spectrometers
Contemporary global protein expression studies use high-performance tandem mass spectrometers, such as quadrupole time-of-flight (Q-TOF) and linear ion trap/quadrupole Orbitrap (LTQ-Orbitrap/Q-Exactive) instruments, to provide two essential types of data: 1. the accurate mass of the intact tryptic peptide (<5 ppm); and 2. the product ion spectrum of the tryptic peptide. These two pieces of data provide a means to confidently identify the peptide against the relevant protein database. In principle, the faster the instrument can generate these two forms of data at high quality, the greater the number of proteins that can be identified. Protein identification is a blunt metric for assessing LC-MS/MS performance, particularly in the context of quantitative proteomics given the wide-ranging goals one can set. However, the rapid pace of mass spectrometric instrumentation development over the past 10–15 years is astounding. Consider that, from 2001 to 2011, reports of comprehensive proteome coverage for yeast increased two-fold from 1484 proteins107 to 2990 proteins108 with essentially identical analysis times (28 h vs. 24 h, respectively), yet the number of identified peptides increased 2.5-fold from 5540 peptides107 to 13 682 peptides.108 The increased proteome coverage is primarily attributed to improvements in high-performance MS/MS technology, which produces high mass accuracy/high resolving power MS and MS/MS data, achieves faster duty cycles, and benefits from improved ion transmission.
Triple quadrupole LC-MS/MS systems have been mainstay instruments for the targeted quantification of small molecules and have become increasingly used for targeted protein quantification using protein cleavage isotope dilution (PC-IDMS).23,24,27,28 Triple quadrupole LC-MS/MS systems do not provide high resolving power or high mass accuracy data, yet they are capable of rapidly measuring analytes in complex mixtures with high specificity over four to five orders of magnitude in concentration. The specificity is achieved through mass-selective precursor-to-product ion transitions, which the quadrupoles can cycle through at rapid scan speeds. For a single precursor tryptic peptide ion, multiple signature product ions can be measured via rapid mass selective transitions to quantify and qualify the targeted peptide, which is representative of the protein concentration. Transition scan speeds are typically 10–50 Hz depending on the complexity of the sample, the number of peptides, the number of transitions necessary to uniquely define a peptide, and the concentration of the targeted peptide(s). Thus, monitoring a peptide with five transitions for 0.1 s each, one could collect 40 data points over a 20 s peak width; for two co-eluting peptides (i.e., target peptide and stable isotope-labeled internal standard) that require five transitions each at 0.1 s, one could obtain 20 data points for each peptide over the same 20 s peak width.
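The dwell-time arithmetic above generalizes easily; the following minimal Python sketch (a hypothetical helper, not from the chapter) reproduces both worked examples:

```python
# Points per chromatographic peak for an SRM method: one "cycle" visits every
# transition once, so the cycle time is (peptides x transitions x dwell).

def points_per_peak(n_peptides, transitions_per_peptide, dwell_s, peak_width_s):
    """Data points acquired per transition across one LC peak."""
    cycle_s = n_peptides * transitions_per_peptide * dwell_s
    return peak_width_s / cycle_s

# One peptide, five transitions at 0.1 s dwell, 20 s peak -> 40 points
print(points_per_peak(1, 5, 0.1, 20))  # 40.0
# Add a co-eluting SIL internal standard (two peptides) -> 20 points each
print(points_per_peak(2, 5, 0.1, 20))  # 20.0
```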
A potential shift away from triple quadrupole LC-MS/MS systems comes with the introduction of hybrid, high resolving power and high mass accuracy instruments (e.g., Triple-TOF109 and Q-Exactive110). The newer hybrid systems do not match the mass selection/transition scan speeds of classic triple quadrupole instruments (e.g., 10–100 Hz), but the high resolving power full scans of precursor and product ions potentially provide all the information needed to offset this shortcoming. The evolution of high-performance LC-MS/MS instrumentation will continue to challenge the existing paradigms for how quantitative proteomics experiments are designed, the type(s) of information that can be collected, and the need, or lack thereof, for distinct instrument platforms to make different types of quantitative measurements.
1.3.3 High-performance Characteristics of LC-MS/MS: Cycle Time, Peak Capacity, and Dynamic Range
‘High-performance’ LC-MS/MS has been used as a general term in this Chapter to describe many different types of instruments. Here, the term is given some figures of merit that can be used to better define LC-MS/MS performance. Two main types of instruments have been discussed in this Chapter: 1. high-performance LC-MS/MS for global protein expression level studies; and 2. high-performance LC-MS/MS for targeted protein quantification. However, it should be re-emphasized that, as LC-MS/MS instrumentation continues to evolve, the distinctions in application for one type of instrument versus the other continue to blur. Three metrics are discussed in the context of quantitative proteomics: duty cycle, peak capacity, and dynamic range.
LC-MS/MS duty cycle can be defined differently depending on the type of instrument, experiment, and desired data. The basic definition of LC-MS/MS duty cycle is given by eqn 1.2.
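In the notation defined below, eqn 1.2 is reconstructed as:

$$\text{Duty cycle} = \frac{T_{\text{ion}}}{T_{\text{cycle}}} \qquad (1.2)$$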
where Tion is the amount of time the instrument spends measuring an ion and the cycle time (Tcycle) is the total amount of time between each measurement. If we consider a typical global proteomics LC-MS/MS data-dependent acquisition (DDA) experiment, a high resolution/high mass accuracy full scan of the eluting proteolytic peptides provides a list of peptide m/z values for subsequent mass selection, dissociation, and detection. A typical full scan (300–2000 m/z) can provide up to ∼30 unique peptides that can be subjected to MS/MS. A standard DDA is set up so that the top ten most abundant peptides are selected for MS/MS analysis ('top ten'). Thus, the full scan + ten MS/MS scans + the signal processing time represents one cycle. Consider a Q-Exactive operating in a typical 'top ten' mode110 with a 256 ms high-resolution full scan and ten 64 ms MS/MS scans, which, once per-scan ion accumulation is included, give a total measurement time of 1.06 s; the actual time for this full cycle is 1.22 s if one also accounts for the signal processing overhead. Thus, the duty cycle for the Q-Exactive DDA experiment, as defined by eqn 1.2, is 1.06/1.22 = 0.87. In other words, the instrument spends 87% of its time detecting and fragmenting ions, which is remarkably efficient. However, this can be misleading if the goal is to identify/quantify as many peptides as possible, or if only specific precursor ions are of interest. For global LC-MS/MS studies, peak capacity is a potentially more useful measure of instrument performance.
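Eqn 1.3 is reconstructed here from the term definitions and the worked value that follow; it is the product of the chromatographic peak capacity (L/4σ) and the total number of MS/MS events available across the separation (the exact grouping is an inference):

$$\text{Peak capacity} = \frac{L}{4\sigma} \times \left( \#\,\text{MS/MS} \times \frac{L}{\text{Cycle time}} \right) \qquad (1.3)$$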
The terms are the total time the peptides have to elute (L), the average peak width (4σ), the number of MS/MS events in a cycle (# MS/MS), and the time for one DDA MS/MS cycle (Cycle time). If we assume a 90 min LC gradient, an average peak width of 25 s, # MS/MS events = 10 (top ten), and a cycle time of 1.5 s, the maximum theoretical peak capacity is 7.8 × 10^6:
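$$\frac{5400\ \text{s}}{25\ \text{s}} \times \left(10 \times \frac{5400\ \text{s}}{1.5\ \text{s}}\right) = 216 \times 36\,000 \approx 7.8 \times 10^{6}$$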
Of course, not all MS/MS events result in useful product ion spectra for peptide identification, owing to low signal or poor-quality mass spectra. Furthermore, there are several practical limitations encountered during a shotgun DDA experiment that impact the theoretical peak capacity. Abundant peptides can dominate the full scan mass spectrum, limiting the detection and subsequent analysis of less abundant peptides. The abundance of a peptide may be indicative of its true expression level (e.g., albumin in plasma) and/or it may be the result of higher ionization efficiency during the ESI process. To counter this effect, pre-fractionation of the sample or adjustment of the DDA LC-MS/MS method may be necessary. Shifting to UHPLC in theory improves the chromatographic peak capacity, but it increases the burden on the MS/MS: substituting a peak width of 10 s into eqn 1.3 more than halves the number of MS/MS sampling cycles across each peak (from ∼17 to ∼7 at a 1.5 s cycle time). However, narrower peak widths could potentially open elution windows that give peptides with lower surface affinities a better chance to compete effectively for charges on the droplet surface for subsequent detection and sequencing.
The abundances of proteins in mammalian cells can range from 1 to 10^7 copies per cell112,113 and, in plasma, the range of circulating protein concentrations can exceed 10^12.114 Global LC-MS/MS protein expression studies in cell lines can provide excellent proteome coverage, with recent publications reporting ∼50% coverage of all predicted proteins for human cell lines.112,113 Detecting quantitative changes for each protein across this range of abundances requires extensive fractionation, significant instrument time, and one or more quantitative strategies. For this discussion, three quantitative strategies will be considered: SILAC, isobaric tagging, and label-free approaches. The magnitude of detectable protein expression changes is defined by the linear dynamic range of each approach. The linear dynamic range for SILAC and isobaric tagging is around two orders of magnitude, whereas label-free quantification is around three orders of magnitude.115 However, the linear dynamic range can be extended in some cases depending on the complexity of the sample, the LC-MS/MS platform, and the LC-MS/MS settings. Targeted protein quantification studies by PC-IDMS LC-MS/MS have reported four to five orders of magnitude of linear dynamic range.115 The upper limit of the linear dynamic range occurs when the concentration (Ni, eqn 1.1) begins to approach and exceed the charges available per droplet. Thus, there comes a point where loading more sample onto the column becomes deleterious to quantitative measurements.
1.4 Optimizing LC-MS/MS for Quantitative Proteomics Using Design of Experiments
The diversity of MS-based quantitative proteomics experiments is immense, making a single set of optimal LC-MS/MS conditions impractical. The goals of each experiment can differ significantly and involve different LC-MS/MS platforms with distinct operating parameters. Furthermore, the technical expertise and established workflows of individual laboratories differ, resulting in a high degree of inter-laboratory variability.116 Given these factors, combined with the rapidly changing landscape of LC-MS/MS technology that is increasingly 'black box', it becomes necessary for each laboratory to efficiently and empirically develop optimized strategies for a broad range of quantitative proteomics experiments.
Design of experiments (DoE) was developed in the early part of the 20th century by Sir Ronald Fisher and has been used extensively to increase productivity in a wide range of fields, including manufacturing, agriculture, engineering, and the basic sciences.117,118 In DoE, an experimental framework is constructed that incorporates the principles of replication, randomization, and blocking (i.e., comparisons of parts of the overall experiment that are expected to be more homogeneous) to assess the statistical significance of experimental conditions (i.e., factors) on the desired outcome(s) (i.e., responses). This framework allows for the statistical evaluation of responses (e.g., proteome coverage, limits of detection) from multiple factors (e.g., ESI voltage, capillary temperature, CID energy, number of MS/MS events, etc.), which both reduces the time required to optimize the LC-MS/MS system and also detects factors (or combinations of factors) that interact with one another. Many types of factorial design have been developed and are described in detail elsewhere;119,120 most fall outside the scope of this Chapter. The main types discussed here are the two-level full factorial design (FullFD) and fractional factorial design (FracFD), in which each factor is tested at only two levels. These experiments (commonly referred to as two-level screens) make the following assumptions: 1. factors are fixed; 2. the responses between the fixed factor levels are linear; and 3. responses are normally distributed.
DoE has been used for optimizing MS-based measurements,121–124 but it has not been widely reported by researchers in academia compared with industrial researchers, who are not always permitted to publish in the peer-reviewed literature. An informative tutorial by Riter et al.125 makes this point and shows how DoE can be used to efficiently optimize LC-MS/MS for bottom-up proteomics. This approach was recently extended to newer LC-MS/MS systems using total yeast digests. Andrews et al.124 used a combination of FullFD and FracFD to optimize a nanoLC-LTQ-Orbitrap system for qualitatively and quantitatively studying a total yeast digest. The authors identified statistically significant main effects, such as capillary temperature, ionization time, monoisotopic precursor selection, number (#) of MS/MS events, and tube lens voltage, which collectively improved proteome coverage by ∼60%. The same group used DoE again to optimize a nanoLC-Q-Exactive system with a total yeast digest and empirically derived the critical factors that influenced protein identification.126 Clearly, there is great potential for DoE to empirically explore and determine the best LC-MS/MS settings for virtually any type of quantitative proteomics study.
Two-level FullFD provides the most comprehensive assessment of the factors, and combinations of higher-order interactions, that yield statistically significant response(s). Figure 1.4A shows a FullFD design for testing the response of four factors. However, this level of detail comes at the expense of efficiency: the number of experiments necessary to complete a two-level FullFD for k factors is 2^k for a single replicate. If we assume each LC-MS/MS experiment (i.e., Run #) takes 90 min, it will take 24 h to complete a single replicate of all 16 factor-level combinations for four factors. This is not an impractical interval of time to evaluate four experimental factors, even if done in triplicate over three days. However, it would not be unreasonable to identify a greater number of experimental factors that would impact the optimization of a typical quantitative LC-MS/MS experiment.124 If we consider a ten-factor FullFD study, a single replicate (2^10 = 1024 runs) would occupy about two months of instrument time assuming a 90 min LC gradient. Thus, more efficient and practical factorial designs are needed to explore the full range of factors one would encounter in an LC-MS/MS workflow, particularly for global protein expression studies or multi-peptide SRM-based assays.
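A minimal Python sketch (hypothetical, not from the chapter) of how such a two-level FullFD run list can be generated; the factor names are placeholders borrowed from the Orbitrap study discussed above:

```python
# A two-level full factorial (FullFD) run list: every combination of k factors
# at low (-1) and high (+1) levels, with run order randomized as DoE prescribes.
import itertools
import random

factors = ["capillary temp", "ionization time", "# MS/MS events", "tube lens V"]
design = list(itertools.product([-1, +1], repeat=len(factors)))  # 2^4 = 16 runs
random.shuffle(design)  # randomization guards against time-dependent drift

for run, levels in enumerate(design, start=1):
    settings = ", ".join(f"{name}={level:+d}" for name, level in zip(factors, levels))
    print(f"Run {run:2d}: {settings}")
```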
Two-level FracFDs are commonly used in place of FullFDs when the number of experimental factors is large and the time to optimize an experiment or process is limited. Figure 1.4B illustrates a common FracFD for the same four factors as in Figure 1.4A, which requires half the number of runs. The gain in efficiency, however, typically comes at the expense of confounding (i.e., an inability to differentiate contributions from individual or combined factors on a response), the extent of which is described by the design's resolution. A lower resolution requires fewer experiments but results in greater confounding, where a response cannot be attributed to a single factor or to higher-order factor interactions. For example, a two-level FracFD of ten factors at resolution three requires 16 LC-MS/MS runs for a single replicate; such a design leaves the main factors unconfounded with one another, but main factors are confounded with two factor interactions, and two factor–two factor interactions are confounded with each other. For the same study, a FracFD can be designed at resolution four, requiring 32 LC-MS/MS runs for a single replicate; such a design can assess the main factors free of two factor interactions, but two factor–two factor interactions remain confounded. Comparing the necessary LC-MS/MS analysis time for a two-level, ten-factor FullFD vs. a FracFD at resolution three vs. resolution four, the analysis time decreases from over two months to one day and two days, respectively.
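The run counts and instrument times quoted above follow directly from the 2^(k−p) structure of these designs; a small hypothetical helper makes the comparison explicit:

```python
# Runs and instrument time for one replicate of a two-level 2^(k-p) design,
# assuming a 90 min LC gradient per run (as in the text above).

def design_cost(n_factors, fraction_exponent=0, gradient_min=90):
    """Return (runs, hours) for a single replicate of a 2^(k-p) design."""
    runs = 2 ** (n_factors - fraction_exponent)
    return runs, runs * gradient_min / 60.0

for label, p in [("FullFD, 2^10", 0),
                 ("FracFD res IV, 2^(10-5)", 5),
                 ("FracFD res III, 2^(10-6)", 6)]:
    runs, hours = design_cost(10, p)
    print(f"{label}: {runs} runs, {hours:.0f} h ({hours / 24:.0f} days)")
# FullFD: 1024 runs, 1536 h (64 days); res IV: 32 runs, 48 h; res III: 16 runs, 24 h
```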
DoE represents one of the most underutilized approaches for efficiently optimizing LC-MS/MS systems for complex quantitative proteomics studies. Previous efforts to optimize LC-MS/MS systems have typically varied one factor at a time, an approach that is inefficient and blind to the significant factor interactions that impact proteome coverage, quantitative precision and accuracy, and limits of detection. By applying some general knowledge of LC-MS/MS instrumentation, one can focus on 5–15 critical factors and develop a two-level FracFD experiment to screen for the main effects and higher-order interactions. Additional FracFD or FullFD experiments with more than two levels can then be developed to further optimize a system, particularly for targeted assays, where subtle changes are likely more important. As more researchers become aware of the power of DoE, it is likely to become a major factor in developing LC-MS/MS methods for quantitative proteomics.
1.5 Summary
Quantitative LC-MS/MS-based proteomics has evolved at a tremendous rate over the past 10–15 years. Instrument manufacturers introduce new products roughly every two years, making it difficult to evaluate instrument performance and optimize for complex proteome analysis. Despite the frenetic pace of new instrument releases, there remain significant opportunities to improve the utility and quality of quantitative LC-MS/MS measurements. Fundamental studies of the ESI mechanism promise to yield important insights into improving ionization efficiency for low-responding peptides and potentially leveling the response so as to improve proteome coverage. Ion transmission remains a significant barrier to measuring low-abundance peptides and small populations of cells. The trend toward faster MS/MS scan speeds will continue to expand our ability to measure more proteins at greater proteome depth and promises to effectively exploit the higher peak capacity of UHPLC. DoE is an underutilized strategy for optimizing complex LC-MS/MS measurements that should find more widespread use in the proteomics community. The growth and acceptance of a broad class of quantitative proteomics approaches is a testament to the power and untapped potential of LC-MS/MS technology.
AMH gratefully acknowledges funding support from NIH (K25CA128666) and Virginia Commonwealth University.