Data-dependent vs. Data-independent Proteomic Analysis
In proteomics, one of the major aims is to compare samples of interest (such as healthy vs diseased tissue) to identify which proteins are differentially expressed and to quantify these differences. Mass spectrometry (MS) is one of the most popular methods used for such analyses.
There are currently two broad approaches to generating bottom-up or “shotgun” MS proteomic data: data-dependent acquisition (DDA) and data-independent acquisition (DIA) [1]. In tandem MS (MS/MS), DDA passes only selected peptides detected in the first MS stage on for fragmentation in the second stage, whereas DIA fragments all peptides detected in the first stage.
As with data acquisition, data analysis can be performed using one of two main approaches [1]: the database search, which compares measured spectra with those in established databases, and the de novo search, in which the MS/MS spectra are first deconvolved into multiple “pseudo spectra” that are then compared with known spectra using database searches. DDA uses the first approach, whereas DIA uses the second or a mixture of the two.
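As a rough illustration of the database-search idea, the sketch below scores a measured MS/MS spectrum against reference spectra by counting fragment peaks that agree within an m/z tolerance. The function names, the tolerance, and the toy peak lists are assumptions made for illustration; real search engines use far more elaborate scoring.

```python
from typing import Dict, List, Tuple


def shared_peak_count(measured: List[float], reference: List[float],
                      tol: float = 0.02) -> int:
    """Count reference fragment m/z values matched by a measured peak within +/- tol."""
    return sum(any(abs(m - r) <= tol for m in measured) for r in reference)


def best_match(measured: List[float], library: Dict[str, List[float]],
               tol: float = 0.02) -> Tuple[str, int]:
    """Return the library entry sharing the most peaks with the measured spectrum."""
    scores = {name: shared_peak_count(measured, peaks, tol)
              for name, peaks in library.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical reference fragment m/z values for two peptides.
library = {
    "PEPTIDE_A": [175.12, 304.16, 417.25, 530.33],
    "PEPTIDE_B": [147.11, 262.14, 375.22, 488.31],
}

# Hypothetical measured spectrum: three peaks agree with PEPTIDE_A, one is noise.
measured = [175.119, 304.165, 417.24, 600.00]

print(best_match(measured, library))  # ('PEPTIDE_A', 3)
```

A shared-peak count is used here only because it is easy to follow; it ignores peak intensities and says nothing about statistical confidence.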
Here, we will compare and contrast the DDA and DIA approaches in proteomic analysis, so that readers can gain a useful overview of where they are best applied and what their advantages and disadvantages are.
Data-dependent acquisition (DDA)
How it works
- Only selected peptides are fragmented further during the second stage of tandem MS
- Peptides are selected from within a narrow mass-to-charge (m/z) range according to their signal intensity
- Typically, the precursors of highest abundance (called the “top N” precursors) are selected for further analysis
- N is typically 10–15 peptides
- MS/MS data acquisition occurs sequentially for each peptide
- The resulting data are used to search one or more existing databases [1–5]
Advantages
- Simpler to set up and analyze
- Lower demand on computational resources
- Cheaper to run
- Database-dependent algorithms used for DDA analysis are generally faster than de novo algorithms
- DDA may be best for targeted analysis (where the target peptides are in an existing database) as it offers more sensitive quantification than DIA
- Allows relative quantification of peptides between samples using various labeling approaches (e.g., SILAC or iTRAQ) [1–5]
Disadvantages
- The MS instrument decides on the fly which precursors are the top N and then fragments them one after the other, which introduces a degree of bias
- As a result, DDA datasets can contain “gaps” where peptides have been identified in only some samples. Adjustments have been introduced to mitigate this, but it remains an issue
- Lower precision and reproducibility than DIA
- Low-abundance peptides are under-represented [1–5]
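The top-N selection described above, and the way it can leave gaps between runs, can be sketched as follows. This is a minimal illustration under stated assumptions: the peptide names and intensities are invented, `select_top_n` is a hypothetical helper, and refinements used on real instruments are ignored.

```python
from typing import Dict, List


def select_top_n(survey_scan: Dict[str, float], n: int) -> List[str]:
    """Return the n most intense precursors from one MS1 survey scan."""
    ranked = sorted(survey_scan, key=survey_scan.get, reverse=True)
    return ranked[:n]


# Invented MS1 intensities (arbitrary units) for the same sample measured twice.
# Small run-to-run fluctuations change the ranking of pep_C and pep_D.
run_1 = {"pep_A": 9.0e6, "pep_B": 7.5e6, "pep_C": 2.1e6, "pep_D": 2.0e6, "pep_E": 3.0e5}
run_2 = {"pep_A": 8.8e6, "pep_B": 7.9e6, "pep_C": 1.9e6, "pep_D": 2.2e6, "pep_E": 3.2e5}

picked_1 = select_top_n(run_1, n=3)
picked_2 = select_top_n(run_2, n=3)

print(picked_1)  # ['pep_A', 'pep_B', 'pep_C']
print(picked_2)  # ['pep_A', 'pep_B', 'pep_D']

# Peptides fragmented in only one of the two runs: the "gaps" in the dataset.
print(sorted(set(picked_1) ^ set(picked_2)))  # ['pep_C', 'pep_D']
```

In this toy case the low-abundance `pep_E` is never selected in either run, which mirrors the under-representation of low-abundance peptides noted in the list.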
Data-independent acquisition (DIA)
How it works
- All peptides are fragmented and analyzed during the second stage of tandem MS
- Tandem mass spectra are acquired either by fragmenting all ions that enter the mass spectrometer at a given time (called broadband DIA) or by sequentially focusing on a narrow m/z window of precursors and fragmenting all precursors detected within that window
- MS/MS data acquisition occurs in parallel across peptides
- The resulting MS2 spectra are highly multiplexed [1–5]
Advantages
- Does not require prior knowledge of the protein composition of the sample
- Less biased as all peptides are included in the analysis
- Allows greater temporal resolution, which is an advantage for certain analyses (e.g., looking at changes in protein expression or post-translational modifications over time within the same tissue)
- Can quantify proteins in complex mixtures over a large dynamic range, thereby overcoming the undersampling seen with DDA
- Offers higher precision and better reproducibility than DDA
- Best approach for discovery proteomics as no assumptions are made (e.g., comparison of large sample cohorts to see differences in protein expression)
- DIA data can be retrospectively reanalyzed with an improved algorithm to generate even better results [1–5]
Disadvantages
- The amount of data generated is much larger, which can place a high demand on computational resources
- Data analysis is challenging because of the multiplexed nature of the MS2 spectra
- The robust database-based search methods used for DDA cannot be applied directly
- Further improvements are required in the tools and software used to deconvolute the complex spectra produced
- De novo search algorithms used in DIA are usually iterative and may not always converge on the same answers
- Fragment ions in MS2 spectra cannot be traced back directly to their precursors because they can potentially result from multiple precursor ions
- Tends to be more expensive than DDA
- In terms of quantification, DIA has lower sensitivity than DDA as the complete spectrum must be scanned, reducing the acquisition time per data point
- De novo search algorithms are not as good at quantification as database search algorithms, which can also reduce quantification sensitivity
- Algorithms need to control the false discovery rate among the identified peptides while also identifying as many of the real peptides as possible [1–5]
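The windowed form of DIA described above can be sketched as follows: the m/z range is split into consecutive isolation windows, and every precursor that falls inside a window is fragmented together with the others in it. The m/z range, window width, precursor values, and per-scan time are illustrative assumptions, not recommended instrument settings.

```python
from typing import Dict, List, Tuple

Window = Tuple[float, float]


def make_windows(mz_start: float, mz_end: float, width: float) -> List[Window]:
    """Split [mz_start, mz_end) into consecutive, non-overlapping isolation windows."""
    windows = []
    lower = mz_start
    while lower < mz_end:
        upper = min(lower + width, mz_end)
        windows.append((lower, upper))
        lower = upper
    return windows


def assign_precursors(precursors: Dict[str, float],
                      windows: List[Window]) -> Dict[Window, List[str]]:
    """Group precursors by the window containing their m/z (lower bound inclusive)."""
    groups: Dict[Window, List[str]] = {w: [] for w in windows}
    for name, mz in precursors.items():
        for lower, upper in windows:
            if lower <= mz < upper:
                groups[(lower, upper)].append(name)
                break
    return groups


windows = make_windows(400.0, 500.0, 25.0)
precursors = {"pep_A": 412.7, "pep_B": 418.2, "pep_C": 455.9, "pep_D": 499.0}
groups = assign_precursors(precursors, windows)

for window, members in groups.items():
    # Windows with more than one member yield a multiplexed MS2 spectrum.
    print(window, members)

# Assumed time per MS2 scan; one cycle visits every window once.
scan_time_s = 0.05
cycle_time_s = len(windows) * scan_time_s
print(len(windows), cycle_time_s)
```

Here `pep_A` and `pep_B` share the 400–425 window, so their fragments end up in the same MS2 spectrum, which is the multiplexing that makes DIA analysis harder. Under these assumptions, narrowing the windows or widening the m/z range increases the number of windows and lengthens each cycle, leaving less acquisition time per data point.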
Some experts believe that, because of continual improvements in the algorithms and software used to deconvolute DIA data, DDA and DIA will eventually merge into a single hybrid method. Indeed, this already appears to be happening: a recent report, still in the process of publication, describes the development of a method called “data dependent-independent acquisition proteomics,” or “DDIA” for short. This method combines DDA and DIA in a single LC-MS/MS run and uses deep-learning tools for more streamlined data analysis.
Overall, because of its ease of setup and analysis, DDA is probably the best approach to use if you are new to tandem MS and/or discovery proteomics. On the other hand, DIA is the best approach if you are more experienced and you want an unbiased and deeper look at the proteome of your samples, particularly when these samples are from a little-studied organism (e.g., the water flea, a keystone species of aquatic habitats) or cell type (e.g., senescent cells).
References
- Hu A, Noble WS, Wolf-Yadlin A. Technical advances in proteomics: new developments in data-independent acquisition. F1000Res. 2016;5(F1000 Faculty Rev):419. doi: 10.12688/f1000research.7042.1.
- Kawashima Y, Watanabe E, Umeyama T, et al. Optimization of data-independent acquisition mass spectrometry for deep and highly sensitive proteomic analysis. Int J Mol Sci. 2019;20(23):E5932. doi: 10.3390/ijms20235932.
- Bruderer R, Bernhardt OM, Gandhi T, et al. Optimization of experimental parameters in data-independent mass spectrometry significantly increases depth and reproducibility of results. Mol Cell Proteomics. 2017;16(12):2296–2309. doi: 10.1074/mcp.RA117.000314.