Unlocking New Possibilities for Proteomics Researchers

Industry Insight

Published: December 13, 2021

Ash Board PhD

Thermo Fisher Scientific recently announced the launch of Thermo Scientific Proteome Discoverer 3.0 software at ASMS 2021. With a wide range of comprehensive tools for proteomics studies, the software has the potential to improve peptide and protein identification, facilitating a wider and deeper understanding of biological systems.

To learn more about the newest version of Proteome Discoverer, how it leverages artificial intelligence (AI) and what this means for proteomics researchers, we spoke to Mark Sanders, Senior Director, Software Product Management at Thermo Fisher Scientific.

Ash Board (AB): What can proteomics researchers expect from the latest version of Proteome Discoverer, and what benefits does it provide over previous versions of the software?

Mark Sanders (MS): I think this was the highlight of our software introductions this year at ASMS. Proteome Discoverer 3.0 software with the CHIMERYS intelligent search algorithm brings a truly transformational improvement to proteomics data processing. Our latest version allows customers to dig deeper into their new and existing proteomics data to substantially increase the number of identified and quantified peptides. The improvement depends on the complexity of the sample, but for typical proteomics datasets we see a 1.8-fold increase in the number of unique peptide identifications and a 1.5-fold increase in the number of total protein identifications when compared to existing tools. We are very excited by this jump in performance. It will allow people to dig deeper into their data and get greater coverage of the important pathways that they're studying. It allows them more flexibility as well: they can choose to generate data at a higher throughput and get the same amount of coverage as they're getting today, but in a quarter of the time. Alternatively, they can achieve more coverage to unlock new insights. We're really excited to see what our customers can achieve with these new capabilities.

AB: The software is billed as having the CHIMERYS search engine by MSAID. Can you explain more about this and what it enables?

MS: The CHIMERYS intelligent search algorithm by our collaborators at MSAID solves a long-standing challenge in proteomics: we have to deal with chimeric, or mixed, tandem mass spectrometry (MS/MS) spectra. In a complex mixture, when you isolate the parent ion of a peptide and fragment it, you typically don't isolate a single peptide. You often have many precursors, and when you fragment them you get a mixed MS/MS spectrum. The current strategies to process proteomics data are only able to identify a few peptides per tandem mass spectrum. With CHIMERYS, we can identify up to 12 per spectrum. We have seen 12 peptides all eluting at the same time with similar masses, and we can capture all that information now. This leads to a substantial improvement in peptide identification and quantitation by more fully and faithfully deconvoluting the data that's generated by the instrumentation. This is information that's always been there, but now we have the tools to actually extract it from the data.
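
Editor's illustration: the idea of deconvoluting a chimeric spectrum can be sketched as explaining one observed spectrum as a non-negative mixture of candidate peptide spectra. The snippet below is a toy sketch under stated assumptions, not the CHIMERYS algorithm, whose internals are not described in this interview. The function name `deconvolute`, the binned intensity vectors and the candidate peptides are all hypothetical, and the solver is a generic projected gradient descent for non-negative least squares.

```python
def deconvolute(observed, templates, iterations=500):
    """Estimate non-negative mixture weights for candidate spectra.

    observed:  intensities of one (possibly chimeric) spectrum on shared m/z bins
    templates: one predicted intensity vector per candidate peptide (same bins)
    Generic projected gradient descent; an assumption, not the vendor's method.
    """
    n = len(templates)
    # Gram matrix (template-template dot products) and template-observed dot products.
    gram = [[sum(a * b for a, b in zip(ti, tj)) for tj in templates] for ti in templates]
    proj = [sum(t * y for t, y in zip(ti, observed)) for ti in templates]
    weights = [0.0] * n
    trace = sum(gram[i][i] for i in range(n))
    if trace == 0:
        return weights  # all templates are empty, nothing to fit
    step = 1.0 / trace  # the trace bounds the largest eigenvalue, so this step is stable
    for _ in range(iterations):
        grad = [sum(gram[i][j] * weights[j] for j in range(n)) - proj[i] for i in range(n)]
        # Gradient step, then clip to zero so no peptide gets a negative share.
        weights = [max(0.0, w - step * g) for w, g in zip(weights, grad)]
    return weights


# Hypothetical predicted spectra for three candidate peptides on six m/z bins.
peptide_a = [1.0, 0.5, 0.0, 0.0, 0.2, 0.0]
peptide_b = [0.0, 0.0, 1.0, 0.6, 0.2, 0.0]
peptide_c = [0.0, 0.3, 0.0, 0.0, 0.0, 1.0]

# A synthetic chimeric spectrum: 70% peptide A co-isolated with 30% peptide B.
observed = [0.7 * a + 0.3 * b for a, b in zip(peptide_a, peptide_b)]

weights = deconvolute(observed, [peptide_a, peptide_b, peptide_c])
print([round(w, 3) for w in weights])
```

In this toy case the weights recover the two co-isolated peptides and assign essentially nothing to the third candidate, which is the sense in which the information "has always been there" in the mixed spectrum.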

AB: The news release mentions how AI enables deeper mining of proteomics data. Can you describe how this works and how much more data can be extracted using this method?

MS: AI and machine learning are at the core of how CHIMERYS was developed. The training set for CHIMERYS was derived from over 1.4 million synthetic peptides that represent human proteins. The data was collected on Orbitrap instruments at multiple collision energies, using a variety of fragmentation methods. This resulted in a staggering 21.8 million high-quality MS/MS spectra, which were used as the basis for the INFERYS deep learning model that our colleagues at MSAID developed. They used graphics processing unit (GPU)-based neural networks to generate prediction models for tandem mass spectra. So essentially, if you give the software a peptide sequence and a collision energy, it will predict the MS/MS spectrum very accurately, including the intensities of the fragment ions. On top of that, there is also automatic refinement learning to predict peptide retention times for any gradient length, which we can use as another scoring factor.

With Proteome Discoverer 3.0, we've incorporated a new INFERYS 2.0 algorithm, which expands on what we had before. It adds collision-induced dissociation (CID) fragmentation and tandem mass tag (TMT)- and TMT Pro-labeled peptides on top of the existing higher-energy collisional dissociation (HCD) functionality, which was released in Proteome Discoverer 2.5 last year. So, we had this prediction capability in the product last year and have now improved it further. But the big improvement comes from how we've leveraged the INFERYS AI to create CHIMERYS, which allows us to deconvolute these chimeric MS/MS spectra accurately. CHIMERYS provides identification rates of around 80%, which is nearly double the identification rate of current approaches.
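
Editor's illustration: the answer above describes predicting a spectrum and a retention time for a peptide and using both as scoring factors. A minimal sketch of that kind of scoring, assuming the predictions already exist, might look like the following. The functions `cosine_similarity` and `combined_score`, the linear retention-time penalty and the `rt_tolerance` parameter are hypothetical choices made for illustration; they are not the INFERYS or CHIMERYS scoring scheme.

```python
import math


def cosine_similarity(predicted, observed):
    """Normalized dot product between two intensity vectors on the same m/z bins."""
    dot = sum(p * o for p, o in zip(predicted, observed))
    norm = math.sqrt(sum(p * p for p in predicted)) * math.sqrt(sum(o * o for o in observed))
    return dot / norm if norm else 0.0


def combined_score(predicted, observed, predicted_rt, observed_rt, rt_tolerance=2.0):
    """Spectral similarity scaled down linearly as the retention-time error grows.

    rt_tolerance (minutes) is an assumed cut-off: at or beyond it the score is zero.
    """
    spectral = cosine_similarity(predicted, observed)
    rt_factor = max(0.0, 1.0 - abs(predicted_rt - observed_rt) / rt_tolerance)
    return spectral * rt_factor


# Hypothetical predicted and observed fragment intensities for one candidate peptide.
predicted_spectrum = [1.0, 0.5, 0.0, 0.2]
observed_spectrum = [1.0, 0.5, 0.0, 0.2]

identical_score = cosine_similarity(predicted_spectrum, observed_spectrum)
# Same spectra, but the peptide eluted 1 minute away from its predicted time.
shifted_score = combined_score(predicted_spectrum, observed_spectrum, 30.0, 31.0)
print(identical_score, shifted_score)
```

The design point is that a candidate must match on fragment intensities and elute close to where it is predicted to, so either kind of disagreement lowers the score.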

AB: How does the software enable confident identification of post-translational modifications (PTMs)? Can you discuss why this is important for your customers?

MS: It is important to know how proteins are modified, as this is related to signaling and the function of proteins. It’s often not the presence of a protein but the level of either glycosylation or phosphorylation that really determines its function at that particular time. Proteome Discoverer software is an extensible framework, which means that users are able to deploy third-party nodes within the software to provide tailored functionality for their PTMs of interest. For example, we can incorporate the Byonic node from Protein Metrics for glycoproteomics, MS PepSearch for phosphorylation, or ptmRS from the Institute for Molecular Pathology in Vienna to provide site localization confidence. These nodes can be added to workflows, either independently or in tandem, to allow our users to deploy purpose-built workflows that specifically address their scientific needs.

AB: A major bottleneck in high-throughput proteomics is the large amount of data produced for analysis. Can you outline how the new software overcomes this bottleneck?

MS: Proteome Discoverer software is built to handle complex data through study management, flexible workflows, and results filtering with interactive graphical views for statistical analysis. We can manage the study right from the very beginning. With flexible workflows you can build custom workflows to meet your needs and answer your questions. In addition, there are many tools for results filtering and interactive graphical views for statistical analysis. By providing this fully integrated functionality we streamline the viewing and interpretation of these large datasets to reduce the time from results to meaningful insights.

AB: Lastly, compared to previous conferences, there seems to be a shift from new instrumentation introductions to new software introductions. Can you discuss why this has been a focus for Thermo Fisher Scientific this year?

MS: While I definitely see that trend in the industry, I think that Thermo Fisher Scientific has been a bit of an outlier in that we continue to have a strong commitment to delivering hardware improvements and we continue to launch major hardware products each year. Having said that, we also realize that there's an enhanced need for software offerings, both in terms of creating more intelligent instruments and in terms of providing tools that can move the user quickly from data to insights. With modern instruments, it's become very easy to generate massive amounts of data, and the data generated by our instruments has reached incredible levels of depth. In order to take full advantage of that, we've had to focus on developing new software solutions that really enable users to leverage this mountain of information and turn it into insights. We really want to streamline that process from data acquisition to results. This is essential to scientists’ productivity and ability to generate knowledge. We want to make sure they're fully equipped to tackle any challenges that they may face.

Mark Sanders was speaking to Dr. Ash Board, Editorial Director for Technology Networks.

Meet the Author

Ash Board PhD

Editorial Director

Ash Board is the editorial director at Technology Networks. He holds a PhD in Chemistry from the University of Nottingham and has over 10 years' experience in science publishing.