Data-Independent Acquisition: A Superior Technique in Mass Spectrometry?
Data-independent acquisition (DIA) is a relatively new approach to acquisition in mass spectrometry (MS). Traditional data-dependent acquisition (DDA) takes only a selection of peptide signals forward for fragmentation, and then matches the resulting spectra against a pre-defined database. In contrast, DIA fragments every detectable peptide in a sample. It is therefore unbiased, which in theory makes it the better technique for discovery proteomics. To discuss this in more detail, we caught up with Florian Marty, Director Global Product Support at Biognosys AG, who firmly believes that DIA is the superior technique in this area.
Ruairi Mackenzie (RM): Tell us about DIA – in what situations might DIA be the best form of analysis for a sample?
Florian Marty (FM): Label-free DIA shows its strength when large sample cohorts are quantitatively compared in an exploratory analysis. Sample types such as cell lines, tissue, FACS-sorted cells, CSF, urine, and plasma are very well suited. We believe that in terms of label-free discovery proteomics, DIA is the best choice.
DIA is also well suited to the analysis of post-translational modifications (PTMs) because of the additional time resolution it has on the MS2 level. This should help distinguish peptide forms with identical parent mass, very similar chromatographic retention time, and only small differences in fragmentation pattern. Even though some key papers have been published in this area, more work needs to be done in this direction.
We envision that going forward the bottom-up discovery proteomics field will move in two main directions. The first will be label-free DIA for mid- to large-scale studies where dozens to thousands of samples need to be measured and analyzed with very high data completeness. The second will be isobaric labeling combined with deep fractionation, which can achieve very high proteome coverage but is limited by its multiplexing capabilities to up to 11 samples and a few conditions.
RM: What improvements does DIA make on comparable DDA acquisition?
FM: DIA, with its parallel nature of acquiring MS/MS spectra, overcomes the limitations of sequential MS/MS acquisition in DDA. In other words, on short gradients and using the Q Exactive HF, we could show that DIA can identify more peptide precursors than the number of MS/MS spectra that DDA could theoretically acquire in a sequential manner [1]. Further, the DIA data show higher precision (lower CVs) and better reproducibility (fewer missing values). As an example, we recently conducted a large-scale study of 1541 human non-depleted plasma samples from 4 clinical time points. In this study, we identified, on average, 450 proteins per run and achieved a data completeness of 77% on the protein level. On both the precursor and the protein level, the data was controlled at 1% FDR. Performing such a study would be very challenging using label-free DDA. Because of the semi-stochastic nature of DDA, MS1 alignment, also known as match between runs, is strongly recommended. MS1 alignment, however, requires very stable chromatography, which is not easy to maintain at nanoliter-per-minute flow rates for 1500 samples. Furthermore, MS1 alignment is typically performed without FDR control, which affects the quality of quantification.
Also, the data analysis is much faster: the full data set of 1541 runs can be processed in three days on a standard workstation, including the processing time needed for library generation. This is roughly 10 times faster than processing a similar DDA data set.
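The two quality metrics mentioned above, precision expressed as the coefficient of variation (CV) and data completeness, can be illustrated with a minimal Python sketch. The function names and the toy intensity table are hypothetical and are not taken from the study or from any Biognosys software; missing values are represented as None.

```python
import math


def coefficient_of_variation(values):
    """Percent CV (sample standard deviation / mean) over non-missing values."""
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        return None  # a CV needs at least two measurements
    mean = sum(observed) / len(observed)
    if mean == 0:
        return None
    variance = sum((v - mean) ** 2 for v in observed) / (len(observed) - 1)
    return 100.0 * math.sqrt(variance) / mean


def data_completeness(matrix):
    """Fraction of (protein, run) cells that hold a quantitative value."""
    total = sum(len(row) for row in matrix.values())
    filled = sum(1 for row in matrix.values() for v in row if v is not None)
    return filled / total if total else 0.0


# Hypothetical toy data: protein -> intensity per run (None = missing value).
quant = {
    "P1": [100.0, 110.0, 90.0, 100.0],
    "P2": [50.0, None, 55.0, 45.0],
    "P3": [None, None, 20.0, 22.0],
}

completeness = data_completeness(quant)
cvs = {protein: coefficient_of_variation(row) for protein, row in quant.items()}
```

In this toy table 9 of 12 cells are filled, so the completeness is 0.75. A lower CV across replicate runs corresponds to the higher precision described in the answer.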
RM: Does a technique like DIA produce a vast amount of data? If so, what are the preferred methods for handling and analyzing that data?
FM: Sure, the amount of data generated is bigger than for DDA. But I would not say this is a disadvantage. On the contrary, in DIA, every peptide precursor that is above the limit of detection is fragmented and all MS/MS information is stored. DIA therefore allows the researcher to convert a biochemical sample into a digital proteome, so that old data sets can be reprocessed with new algorithms to dig deeper. As for physical storage needs, one way of reducing the data size is to store the data in centroid mode. In our facility, this reduces the storage space needed by a factor of 5 to 10 without significant loss in performance.
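Why centroid mode saves space can be seen in a minimal Python sketch: each profile peak, sampled at many m/z points, is collapsed into a single (m/z, intensity) pair. This is a deliberately naive three-point centroider written for illustration only. It is not the algorithm used by any instrument vendor or conversion tool, and the function name and toy data are hypothetical.

```python
def centroid(profile, min_intensity=0.0):
    """Collapse profile points into one centroid per local maximum.

    profile: list of (mz, intensity) tuples sorted by m/z,
    with non-negative intensities.
    Returns a list of (intensity-weighted m/z, apex intensity) tuples.
    """
    centroids = []
    for k in range(1, len(profile) - 1):
        left, mid, right = profile[k - 1], profile[k], profile[k + 1]
        # A local maximum: not lower than the left neighbour, higher than the right.
        if mid[1] > min_intensity and mid[1] >= left[1] and mid[1] > right[1]:
            window = (left, mid, right)
            total = sum(i for _, i in window)
            weighted_mz = sum(m * i for m, i in window) / total
            centroids.append((weighted_mz, mid[1]))
    return centroids


# Hypothetical profile-mode data: two peaks sampled at nine points.
profile = [
    (500.00, 0.0), (500.01, 10.0), (500.02, 40.0), (500.03, 10.0), (500.04, 0.0),
    (600.00, 0.0), (600.01, 30.0), (600.02, 30.0), (600.03, 0.0),
]
peaks = centroid(profile)
```

Here nine profile points reduce to two centroids. Real centroiding fits the peak shape over more points and handles noise, which this sketch does not attempt.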
Data analysis can broadly be classified into two categories: spectrum-centric and peptide-centric strategies. Spectrum-centric analysis is more similar to classical DDA database searches. Tools such as DIA-Umpire [2,3] or Spectronaut Pulsar can perform such an analysis. The peptide-centric analysis strategy was first published by Gillet et al. and uses a peptide library to perform a targeted analysis of the data [4]. OpenSWATH [5] and Spectronaut Pulsar are specialized for this type of targeted analysis, which generally provides the deepest proteome coverage.
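The core idea of the peptide-centric strategy, looking up a library peptide's fragment ions in the DIA MS2 data and following their intensity over retention time, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions and not the implementation of OpenSWATH or Spectronaut: the function names, the 20 ppm tolerance, and the toy spectra are all hypothetical, and real tools additionally restrict the search to the matching precursor isolation window and score candidates against decoys.

```python
def extract_fragment_traces(spectra, fragment_mzs, tol_ppm=20.0):
    """Build one intensity-over-time trace per library fragment m/z.

    spectra: list of (retention_time, [(mz, intensity), ...]) MS2 scans.
    fragment_mzs: fragment m/z values taken from a (hypothetical) library entry.
    """
    traces = {target: [] for target in fragment_mzs}
    for rt, peaks in spectra:
        for target in fragment_mzs:
            tol = target * tol_ppm * 1e-6  # ppm tolerance converted to m/z units
            intensity = sum(i for mz, i in peaks if abs(mz - target) <= tol)
            traces[target].append((rt, intensity))
    return traces


def apex_rt(traces):
    """Retention time at which the summed fragment intensity is highest."""
    summed = {}
    for trace in traces.values():
        for rt, intensity in trace:
            summed[rt] = summed.get(rt, 0.0) + intensity
    return max(summed, key=summed.get)


# Hypothetical toy data: three MS2 scans and a two-fragment library entry.
spectra = [
    (10.0, [(500.25, 10.0), (600.30, 5.0)]),
    (10.1, [(500.251, 100.0), (600.301, 80.0), (700.0, 999.0)]),
    (10.2, [(500.25, 20.0)]),
]
library_fragments = [500.25, 600.30]

traces = extract_fragment_traces(spectra, library_fragments)
best_rt = apex_rt(traces)
```

The intense peak at m/z 700.0 is ignored because it is not in the library entry. Fragments that rise and fall together at one retention time are the evidence a targeted analysis uses to call a peptide.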
RM: What is the future and potential of a technique like DIA?
FM: Jesper Olsen's group at the CPR in Copenhagen recently showed, in their benchmark paper on the Thermo Fisher Q Exactive HF-X, that they can identify 5900 protein groups from a HeLa digest in a single-shot 30-minute DIA injection [6]. Ever-increasing speed, resolution, and sensitivity will further simplify the deconvolution of the vastly complex spectra in DIA. This, together with continuous improvement of the algorithms and software, will further increase the possible throughput as well as the proteome coverage of DIA. As described above, DIA with its excellent reproducibility and quantitative precision will be especially valuable in the emerging field of proteomics-based precision medicine. Further development will also make DIA more applicable in PTM research as more and more algorithms are being developed for site localization. Finally, improvements in the direct search of DIA data, as implemented in Spectronaut™ Pulsar (directDIA) or DIA-Umpire, will make DIA workflows more straightforward and easier to use. It is reasonable to assume that DDA and DIA will ultimately merge into a single hybrid method.
Florian Marty was speaking with Ruairi J Mackenzie, Science Writer for Technology Networks.
1. Bruderer, R., Bernhardt, O. M., Gandhi, T., Xuan, Y., Sondermann, J., Schmidt, M., … Reiter, L. (2017). Optimization of Experimental Parameters in Data-Independent Mass Spectrometry Significantly Increases Depth and Reproducibility of Results. Molecular & Cellular Proteomics, 16(12), 2296–2309. https://doi.org/10.1074/mcp.RA117.000314
2. Tsou, C.-C., Avtonomov, D., Larsen, B., Tucholska, M., Choi, H., Gingras, A.-C., & Nesvizhskii, A. I. (2015). DIA-Umpire: comprehensive computational framework for data-independent acquisition proteomics. Nature Methods, 12(3), 258–264. https://doi.org/10.1038/nmeth.3255
3. Tsou, C.-C., Tsai, C.-F., Teo, G., Chen, Y.-J., & Nesvizhskii, A. I. (2016). Untargeted, spectral library-free analysis of data independent acquisition proteomics data generated using Orbitrap mass spectrometers. Proteomics, 1–47. https://doi.org/10.1002/pmic.201500526
4. Gillet, L. C., Navarro, P., Tate, S., Röst, H., Selevsek, N., Reiter, L., … Aebersold, R. (2012). Targeted Data Extraction of the MS/MS Spectra Generated by Data-independent Acquisition: A New Concept for Consistent and Accurate Proteome Analysis. Molecular & Cellular Proteomics, 11(6), O111.016717. https://doi.org/10.1074/mcp.O111.016717
5. Röst, H. L., Rosenberger, G., Navarro, P., Gillet, L., Miladinović, S. M., Schubert, O. T., … Aebersold, R. (2014). OpenSWATH enables automated, targeted analysis of data-independent acquisition MS data. Nature Biotechnology, 32(3), 219–223. https://doi.org/10.1038/nbt.2841
6. Kelstrup, C. D., Bekker-Jensen, D. B., Arrey, T. N., Hogrebe, A., Harder, A., & Olsen, J. V. (2018). Performance Evaluation of the Q Exactive HF-X for Shotgun Proteomics. Journal of Proteome Research, 17(1), 727–738. https://doi.org/10.1021/acs.jproteome.7b00602