Today, mass spectrometry (MS)-based proteomics is considered one of the most comprehensive, key techniques in life science, and is routinely used by researchers across the globe to pose hypotheses and answer a range of analytical, biological and clinical questions. There are two principal tracks for identifying and characterizing proteins (or "proteoforms") by MS, termed "bottom-up" (BUP) and "top-down" (TDP) proteomics (Figure 1). In BUP, proteoforms are distinguished by reassembling snippets of information obtained from smaller pieces of data into an inferred version of the complete, original molecule, analogous to piecing together parts of a jigsaw puzzle. Conversely, TDP begins with a global overview and then determines the details of the individual proteoforms by dissecting through multiple layers of information.
Figure 1: Schematic representation of the two principal tracks in mass spectrometry that are used to identify and characterize (A) proteoforms: (B) bottom-up proteomics, where snippets of information are pieced together to recreate the original proteoform; and (C) top-down proteomics, where structural details for each proteoform are determined by dissecting through many layers of information. Image courtesy of the Kelleher laboratory, Northwestern University, Illinois.
The hard facts on BUP and TDP
In more technical terms, BUP proteomics involves enzymatically digesting all the proteoforms in a sample into smaller peptides, each representative of a specific section of the original protein (Figure 1B). Here, the entire complex mixture is analyzed by MS. Generally, peptides are quite soluble, biochemically well-behaved and can be readily separated by liquid chromatography prior to analysis by MS. Peptides are selected and, most frequently, the peptide backbone is fragmented by collision-induced dissociation (CID) and/or higher-energy collisional dissociation (HCD) to reveal the amino acid sequence of the peptide. Sequenced peptides are matched to sections of proteins by comparing the experimental MS-generated data to in silico information derived from compiled and curated protein databases, such as UniProt. The more peptides that match one protein, the greater the confidence that the protein was indeed present in the original sample.
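The digest-and-match logic described above can be sketched in a few lines of Python. This is a minimal illustration only, not how production search engines work: real tools compare fragment-ion spectra against theoretical spectra, whereas this sketch compares peptide strings directly. The cleavage rule (cut after K or R unless followed by P) is a common simplification of trypsin specificity, and the protein names, sequences and "observed" peptides are all hypothetical.

```python
from collections import defaultdict


def tryptic_digest(sequence, min_length=5):
    """Simplified trypsin rule: cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return [p for p in peptides if len(p) >= min_length]


def match_peptides(observed, database):
    """Map each protein to the observed peptides found in its in silico digest."""
    index = defaultdict(set)
    for protein_id, sequence in database.items():
        for peptide in tryptic_digest(sequence):
            index[peptide].add(protein_id)
    matches = defaultdict(list)
    for peptide in observed:
        for protein_id in sorted(index.get(peptide, ())):
            matches[protein_id].append(peptide)
    return dict(matches)


# Hypothetical toy database and hypothetical sequenced peptides.
TOY_DATABASE = {
    "PROT_A": "MSTAGKVLEELRPDNQWKAAGHHFR",
    "PROT_B": "MDDIAALVVDNGSGMCKAGFAGDDAPR",
}
OBSERVED = ["VLEELRPDNQWK", "AAGHHFR", "AGFAGDDAPR", "GGGGGGK"]

matches = match_peptides(OBSERVED, TOY_DATABASE)
for protein_id, peptides in matches.items():
    print(protein_id, len(peptides), peptides)
```

In this toy case PROT_A is supported by two peptides and PROT_B by one, which mirrors the idea that more matching peptides give more confidence in an identification; the last observed peptide matches nothing in the database.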
Conversely, TDP involves an initial global overview of all the intact proteoforms in a sample (Figure 1C). A key difference with TDP is that the proteins are not enzymatically digested into peptides. Rather, intact, full-length proteins are analyzed in an unadulterated form. Structural information on each proteoform is then obtained by fragmenting the intact protein to probe the framework of the individual proteoform and decipher its composition. TDP still requires separation of the intact proteins in complex biological samples, and this can be achieved using conventional techniques such as liquid chromatography or 2-D gel electrophoresis followed by MS analysis. Fragmentation of the intact proteins is achieved via dissociation methods such as HCD, electron-capture dissociation (ECD) and electron-transfer dissociation (ETD), or combinations thereof.
Comparing and contrasting the pros and cons of BUP and TDP
As with any technology-based analytical approach, BUP and TDP each have inherent advantages and difficulties. The main benefit of TDP is that the intact mass of every proteoform is retained and labile structural characteristics that are often destroyed in BUP are preserved. Consequently, a more complete overview of all the modification sites and patterns can be obtained. Thus, TDP can be used to detect protein isoforms, sequence variants, N- and/or C-terminal cleavage products and other degradation products. The approach can also successfully identify unpredicted or unexpected post-translational modifications (PTMs) and determine the presence of multiple known modifications and combinations thereof. In addition, TDP is more amenable to identifying low molecular mass proteins, which are often missed by BUP. Sample preparation for TDP is relatively simple and less time-consuming because the process of digesting proteins into peptides (fundamental to BUP) is eliminated. Despite these clear advantages, protein identification and proteoform characterization by TDP suffer from limitations in the detection and fragmentation of larger proteins, and from inherent dynamic range issues. Another drawback of TDP is that large-scale identification of individual proteins in complex samples is still restricted, because reliable and simple intact protein fractionation methods integrated with MS are lacking.
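To make the point about retained intact mass concrete, the following Python sketch computes the theoretical monoisotopic mass of a small, hypothetical sequence carrying different combinations of two PTMs. The residue masses and PTM mass shifts are standard monoisotopic values rounded to five decimals; the sequence and the choice of acetylation and phosphorylation are assumptions made purely for illustration.

```python
from itertools import combinations

# Standard monoisotopic residue masses (Da), rounded to five decimals.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once for the free N- and C-termini
PTM_SHIFT = {"acetyl": 42.01056, "phospho": 79.96633}


def intact_mass(sequence, ptms=()):
    """Neutral monoisotopic mass of an unfragmented chain plus its PTM shifts."""
    backbone = sum(RESIDUE_MASS[residue] for residue in sequence) + WATER
    return backbone + sum(PTM_SHIFT[ptm] for ptm in ptms)


def proteoform_masses(sequence, candidate_ptms):
    """Intact mass for every combination of the candidate PTMs (each used at most once)."""
    masses = {}
    for size in range(len(candidate_ptms) + 1):
        for combo in combinations(candidate_ptms, size):
            masses[combo] = intact_mass(sequence, combo)
    return masses


# Hypothetical sequence, for illustration only.
masses = proteoform_masses("MSTAGKVLEELRPDNQWKAAGHHFR", ["acetyl", "phospho"])
for combo, mass in masses.items():
    print(combo or ("unmodified",), round(mass, 4))
```

Each of the four combinations yields a distinct intact mass, so the co-occurrence of modifications on one molecule is visible before any fragmentation. In BUP, that linkage is lost whenever the modified sites end up on different peptides.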
When discussing his opinion on TDP, Roman A. Zubarev, professor at the Chemistry Division of the Karolinska Institute in Stockholm, says: “I don’t think top-down proteomics fully deserves its name yet.” Zubarev, whose particular interest is chemical proteomics (1), where the task is to identify one or several drug target proteins from many thousands of possibilities, believes that TDP is a vision of the future, and may very well remain so for another decade or two.
As he sees it, all proteomic work of major importance is still performed by BUP, and in his opinion, there are only a few good examples from the TDP field that demonstrate the applicability of the approach. He says: “Unfortunately, there is still no standard method that works, and works every time.” Nevertheless, he appreciates TDP, which he believes is best performed on a case-by-case basis, i.e., in a low-throughput manner.
Compared to TDP, BUP has a few, albeit core, advantages. Fundamentally, BUP is a relatively simple and reliable approach for determining the protein composition of any given sample of cells, tissues, etc., and is well-supported by available instrumentation and software. In addition, the upstream separation of peptides is considerably more advanced than that of intact proteins, and BUP has inherently higher sensitivity than TDP. Despite these key advantages, this widely adopted approach has some disadvantages. The characterization of small proteins is still quite challenging, because often too few proteolytic peptides are generated for unequivocal identification by MS. Also, protein sequence coverage is restricted to the peptides that are identified, labile PTMs are often lost and there is ambiguity as to the origin of redundant peptide sequences. Finally, information on whether protein isoforms, N- and/or C-terminal cleavage products, and degradation products are present in a sample is lost, and unpredicted PTMs or multiple modifications are missed.
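Two of these limitations, restricted sequence coverage and the ambiguous origin of redundant peptides, can be illustrated with a short Python sketch. The two "isoforms" and the list of identified peptides are hypothetical, and the helper names are introduced here only for illustration.

```python
def sequence_coverage(protein, peptides):
    """Fraction of residues in `protein` covered by at least one identified peptide."""
    covered = [False] * len(protein)
    for peptide in peptides:
        start = protein.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein)


def proteins_containing(peptide, database):
    """All database entries whose sequence contains the peptide."""
    return sorted(pid for pid, seq in database.items() if peptide in seq)


# Hypothetical isoforms: ISOFORM_2 lacks the first six residues of ISOFORM_1.
ISOFORMS = {
    "ISOFORM_1": "MSTAGKVLEELRPDNQWKAAGHHFR",
    "ISOFORM_2": "VLEELRPDNQWKAAGHHFR",
}
IDENTIFIED = ["VLEELRPDNQWK", "AAGHHFR"]  # hypothetical identified peptides

for protein_id, sequence in ISOFORMS.items():
    print(protein_id, round(sequence_coverage(sequence, IDENTIFIED), 2))
for peptide in IDENTIFIED:
    print(peptide, "->", proteins_containing(peptide, ISOFORMS))
```

Both identified peptides occur in both isoforms, so on this evidence alone the two cannot be told apart; only the N-terminal peptide unique to ISOFORM_1, if it were detected, would resolve the ambiguity.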
When asked his opinion on the key advantages of BUP and TDP, Zubarev responds with: “Bottom-up proteomics works. It is an off-the-shelf product. You buy standard instrumentation, software and chemicals, and you can perform decent proteomic work anywhere in the world.” He continues: “One of the strengths of top-down is that the PTM occupancies can be determined much more accurately, especially when multiple PTMs are present at the same time; however, no killer application has yet emerged from this capability.”
Neil Kelleher, professor of Chemistry, Molecular Biosciences, and Medicine at Northwestern University, adds: “By virtue of approximately 4 billion USD invested into the BUP ecosystem, proteome coverage has greatly improved. So many great, inventive people are working with BUP and now it has many implementations in spatial, proximity, and compositional proteomics. It’s really been a tremendous advance for basic biomedical research, and all those Cell, Science and Nature papers are a testament to this fact.”
When asked how TDP has influenced his research, Kelleher responds by saying: “It has enabled us to establish a proteoform-resolved operation and apply it towards basic and translational (clinical) research.” (2) He then provides specific examples to illustrate his point. “In chromatin biology, we’ve mapped all the abundant proteoforms and their PTMs. Another is this simple fact: when proteoforms are measured, it provides direct information on isoforms and PTM ‘stoichiometry’ (i.e., how much of the total pool of protein present has which PTMs). This is not widely appreciated in the field, but we benefit from it every day.”
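The notion of PTM stoichiometry described in the quote can be expressed as simple arithmetic. The Python sketch below is a rough illustration under two stated assumptions: the proteoform intensities are hypothetical numbers, and signal intensity is taken to be proportional to abundance, which real measurements only approximate.

```python
def proteoform_fractions(intensities):
    """Share of the total protein pool represented by each proteoform."""
    total = sum(intensities.values())
    return {ptms: value / total for ptms, value in intensities.items()}


def ptm_occupancy(intensities):
    """Fraction of the total pool that carries each PTM, summed over proteoforms."""
    occupancy = {}
    for ptms, fraction in proteoform_fractions(intensities).items():
        for ptm in set(ptms):
            occupancy[ptm] = occupancy.get(ptm, 0.0) + fraction
    return occupancy


# Hypothetical intensities for four proteoforms of one protein,
# keyed by the tuple of PTMs each proteoform carries.
INTENSITIES = {
    (): 50.0,
    ("acetyl",): 30.0,
    ("phospho",): 15.0,
    ("acetyl", "phospho"): 5.0,
}

fractions = proteoform_fractions(INTENSITIES)
occupancy = ptm_occupancy(INTENSITIES)
print(fractions)
print(occupancy)
```

Because each proteoform is measured intact, the sketch can report both the overall occupancy of each PTM and the share of molecules carrying both modifications at once, which is the quantity that is hard to recover once proteins have been digested into peptides.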
Whilst Kelleher wishes to emphasize the value of understanding and analyzing PTMs by TDP, Zubarev suggests that the desire for this knowledge is not quite there yet: “The demand for PTM analysis from biologists and clinicians has turned out to be an order of magnitude smaller than anticipated, and this is the key reason why top-down analysis is not as popular as it might have been. It is the lack of demand that, in turn, breeds a delay in delivery. If and when such a requirement arises, TDP will become an off-the-shelf product within a short time.”
Mainstream community acceptance of TDP is on the rise
Although gaining momentum, TDP is still not as popular, nor as widely practised, in the general community as BUP. In the past, TDP was usually relegated to the analysis of individual proteins or simple mixtures. Complex samples containing numerous proteins were traditionally analyzed by more-established BUP approaches. When asked if TDP will find a larger audience in the proteomic community, Zubarev responds with: “Eventually, yes.” He believes, however, that to make this leap to widespread acceptance and application, “TDP needs to find that killer app.” In his opinion, “BUP currently has two: (i) biomarker discovery in clinical proteomics; and (ii) drug target discovery (chemical proteomics); neither of which currently require top-down approaches.” Kelleher is confident that TDP will find broader acceptance, but: “To really move the needle, we would benefit from cell- and disease-biologists pushing the field to be gene-specific in our protein identifications and more precise with our molecular assignments (i.e., proteoforms) (2). Much of TDP has been impacting the private sector more so – and they are certainly part of the community, albeit typically quiet partners.”
Kelleher highlights two specific examples where TDP has flourished. “A tailored MALDI-MS top-down approach to identify bacterial pathogens has been clinically deployed in >3,000 hospitals worldwide; and the unit of measurement is bacterial proteoforms.” He continues by saying: “Although not widely appreciated, it is a significant clinical success!” Kelleher gives another example of applying TDP in the biopharmaceutical sector: “A lot of energy is going into applying top-down protein analysis to antibody-based therapeutic research; and eventually this will find a path from R&D further downstream into QC/QA and MAM assays for FDA approval by the mainstream Pharma companies of the world.”
Is the fusion of BUP and TDP the ultimate answer?
When asked what the future of TDP and BUP will bring, Zubarev says that the combination of BUP with TDP can definitely advance certain research areas, e.g., native antibody sequencing. “We are developing top-down methods for sequencing antibodies. There are efficient bottom-up methods available for that, but in our opinion, this must be complemented by top-down analysis.” His laboratory already has a bottom-up method for antibodies called SpotLight proteomics (3,4), and is now developing a new top-down platform that combines on-line protein separation with fragmentation in a novel segmented linear ion trap and an ion trap mass analyser.
When asked the same question, Kelleher responds with: “There is clear complementarity between the two, and several groups are moving to capture that.” He firmly believes that: “Proteoform-informed ‘bottom-up’ would be particularly efficient to firstly map all detectable proteoforms that exist in a system (or even the entire human proteome!) and then design assays for particular isoforms and PTMs that you now know are present and dynamic in the system.” Furthermore, he says: “This same modus operandi could be applied to protein biomarkers, as the technological readiness is now so much better than even five years ago.”
In 2012, Kelleher initiated and promoted the Human Proteoform Project (5) and is the coordinator of the Consortium for Top Down Proteomics. Since these developments, interest has been steadily increasing within the scientific community in striving for 100% sequence coverage of all proteoforms and effectively sequencing the human proteome. Two recent advances that have aided this goal are: (i) the use of native-mode electrospray for "native top-down MS"; and (ii) the advent of individual ion MS (I2MS) (6). Kelleher is excited about the future prospects of TDP, as he firmly believes that: “With these developments, we can now approach basically any target in biology and get useful data in a fairly short timeframe; thus opening up a new vista of investment and expectations that come along with it.”
1. Saei et al. (2019). ProTargetMiner as a proteome signature library of anticancer molecules for functional discovery. Nature Communications. DOI: https://doi.org/10.1038/s41467-019-13582-8
2. Smith and Kelleher. (2018). Proteoforms as the next proteomics currency. Science. DOI: https://doi.org/10.1126/science.aat1884
3. Lundström et al. (2017). SpotLight Proteomics: uncovering the hidden blood proteome improves diagnostic power of proteomics. Scientific Reports. DOI: https://doi.org/10.1038/srep41929
4. Lundström et al. (2019). SpotLight Proteomics-A IgG-Enrichment Phenotype Profiling Approach With Clinical Implications. International Journal of Molecular Sciences. DOI: https://doi.org/10.3390/ijms20092157
5. Kelleher. (2012). A Cell-Based Approach to the Human Proteome Project. Journal of the American Society for Mass Spectrometry. DOI: https://doi.org/10.1007/s13361-012-0469-9
6. Kafader et al. (2020). Multiplexed mass spectrometry of individual ions improves measurement of proteoforms and their complexes. Nature Methods. DOI: https://doi.org/10.1038/s41592-020-0764-5