The Next Big Thing in Genomics? Proteomics.
Explore how next-generation protein sequencing is unlocking proteomic discoveries.
Proteins are the primary drivers of phenotype and function.1 The roughly 20,000 protein-coding genes in the human genome can potentially generate millions of protein variants, known as proteoforms. These proteoforms are produced by a variety of mechanisms, including post-translational modifications (PTMs), alternative splicing and germline variations.1,2 Detection of these proteoforms is essential for linking the findings of genomic and transcriptomic studies to functionally relevant changes at the protein level.1,2 While several techniques are available to identify proteoforms, each has shortcomings that are now being addressed by Next-Generation Protein Sequencing™ (NGPS).
Limitations of traditional techniques for protein analysis
Top-down and bottom-up mass spectrometry (MS) are two techniques used for protein identification. In top-down MS, intact proteins are ionized, allowing them to be distinguished via MS based on differences in their mass-to-charge (m/z) ratio.2 The totality of the ions present within a sample yields a mass spectrum, which is mapped against existing knowledge to determine the sample composition. This approach allows for exact mass calculations of proteins and does not require digestion of proteins into peptides. Characterization of heavily modified proteins can be problematic, however, especially when they exist in a population of similarly modified proteoforms.3
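As a small worked example of the mass-to-charge ratio described above, the sketch below computes m/z for an ion formed by adding protons to a neutral molecule. It assumes positive-ion mode; the function name `mz` and the 16,950 Da example mass are illustrative and do not come from any cited study.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in daltons


def mz(neutral_mass_da: float, charge: int) -> float:
    """Return m/z for an ion carrying `charge` protons (positive-ion mode assumed)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass_da + charge * PROTON_MASS_DA) / charge


# One hypothetical intact protein appears at a different m/z for each charge state.
for z in (10, 15, 20):
    print(f"z={z:2d}  m/z={mz(16950.0, z):.3f}")
```

Because a single intact protein produces a series of peaks across charge states, a mixture of similarly modified proteoforms yields overlapping series, which is one reason such mixtures are hard to resolve.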
In contrast, peptides are the starting point for bottom-up MS, which is a core technology in proteomics. Here, proteins are subjected to proteolytic cleavage, and the resulting peptide products are analyzed by MS. The m/z ratio and predicted sequence are used to infer information about the proteins in the sample. Incomplete protein sequence coverage due to analysis of proteolytic products, combined with the inability to determine the proteoform origin of modified peptides, means that this approach cannot be used to fully understand proteoform complexity.3 Additionally, not all peptides within a sample may be efficiently ionized for downstream detection and analysis.2
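The proteolytic cleavage step can be illustrated with a minimal in-silico digest. The sketch below assumes the commonly used trypsin rule (cleave after lysine or arginine unless the next residue is proline); the article does not name a specific protease, and the function name `digest` and the example sequence are hypothetical.

```python
def digest(sequence: str, cleave_after: str = "KR", blocked_by: str = "P") -> list:
    """Split a protein sequence into peptides using a simple cleavage rule.

    Cleaves on the C-terminal side of any residue in `cleave_after`,
    unless the following residue is in `blocked_by`.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1 : i + 2]  # empty string at the end of the sequence
        blocked = next_residue != "" and next_residue in blocked_by
        if residue in cleave_after and not blocked:
            peptides.append(sequence[start : i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # trailing peptide with no cleavage site
    return peptides


# Hypothetical sequence: the R followed by P is not cleaved.
print(digest("AKRPGKLR"))
```

Each peptide is then analyzed on its own, so information about which modifications co-occurred on the same intact protein molecule is lost.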
At the other end of the spectrum of techniques for protein analysis are traditional approaches such as sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE), enzyme-linked immunosorbent assays (ELISAs), immunohistochemistry and western blots. While these techniques are easier to perform than MS, they generally require antibodies to identify proteins, which can make characterization of unknown proteins challenging. These antibody-dependent techniques are further complicated by issues of specificity and variability.4
The age of next-generation protein sequencing
Just as next-generation DNA sequencing has transformed our understanding of the human genome, advances in protein sequencing are delivering new insights about the foundations of disease and setting the stage for novel scientific discoveries. NGPS is an accessible yet sophisticated method for the interrogation of proteins and proteoforms.
In 2022, an international team led by Brian Reed, head of research at Quantum-Si, published a manuscript in Science entitled “Real-time dynamic single-molecule protein sequencing on an integrated semiconductor device.”5 Reed and his team described the use of dye-labeled N-terminal amino acid (NAA) recognizers, aminopeptidases and semiconductor chip technology to sequence proteins.
In this process, peptides are first immobilized on a semiconductor chip and exposed to a solution containing dye-labeled NAA recognizers and aminopeptidases. The recognizers repeatedly bind and unbind the immobilized peptides when their cognate NAAs are exposed at the N-terminus. This activity generates a distinct series of pulses, termed a recognition segment (RS), for each recognized NAA, with characteristic fluorescence and kinetic properties. The aminopeptidases sequentially remove individual NAAs, exposing subsequent residues for detection. This dynamic process repeats until the peptide has been completely sequenced (Figure 1).
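The cycle of recognition and cleavage described above can be sketched as a simple loop. This is a toy model, not the instrument's software: it ignores kinetics, assumes every residue is removed in order, and uses a hypothetical recognizer table in which only the FYW recognizer is taken from the article.

```python
# Recognizer name -> N-terminal amino acids it binds.
# "FYW" is described in the article; "LIV" is a hypothetical placeholder.
RECOGNIZERS = {"FYW": "FYW", "LIV": "LIV"}


def recognition_order(peptide: str, recognizers: dict = RECOGNIZERS) -> list:
    """Return (residue, recognizer or None) for each N-terminal residue in turn."""
    events = []
    remaining = peptide
    while remaining:
        naa = remaining[0]  # residue currently exposed at the N-terminus
        name = next((n for n, targets in recognizers.items() if naa in targets), None)
        events.append((naa, name))  # None marks a segment with no recognition
        remaining = remaining[1:]  # aminopeptidase removes the N-terminal residue
    return events


print(recognition_order("FAL"))
```

Residues that no recognizer binds still occupy a position in the trace, so the gaps between recognition segments carry information as well.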
The temporal order of NAA recognition and associated kinetic properties over the time course of sequencing are highly characteristic for a given peptide and are termed its kinetic signature. Kinetic signatures are a unique feature of NGPS and can be analyzed to provide both high-confidence identification of individual peptides and variant detection.
Figure 1. Next-generation protein sequencing. Credit: Quantum-Si.
Recognizers detect one to three different types of NAAs. Each recognizer has an NAA binding pocket that determines which NAAs it can recognize and the relative affinity for each of its target NAAs. Importantly, the region adjacent to the binding pocket contacts multiple residues downstream from the NAA, which also influences the recognizer’s binding affinity. The affinity of a recognizer is reflected in the average duration of signal pulses detected during the corresponding RS. Higher affinity interactions result in a more stable bound state and therefore longer average pulse duration (PD).
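The link between affinity and pulse duration can be illustrated with a simple two-state binding model, in which the time a recognizer stays bound is exponentially distributed with mean 1/k_off. This model is an assumption made for illustration, not a description of the published analysis, and the dissociation rates below are hypothetical values.

```python
import random


def mean_pulse_duration(k_off: float) -> float:
    """Expected bound time (seconds) for dissociation rate k_off (per second)."""
    return 1.0 / k_off


def simulate_pulses(k_off: float, n: int, seed: int = 0) -> list:
    """Draw n bound-state durations from an exponential distribution."""
    rng = random.Random(seed)
    return [rng.expovariate(k_off) for _ in range(n)]


# Hypothetical rates: the tighter binder dissociates more slowly.
for label, k_off in (("tight binder", 0.5), ("weak binder", 5.0)):
    pulses = simulate_pulses(k_off, 10_000)
    observed = sum(pulses) / len(pulses)
    print(f"{label}: expected {mean_pulse_duration(k_off):.2f} s, simulated {observed:.2f} s")
```

Under this model a slower dissociation rate, which corresponds to higher affinity when the association rate is held fixed, gives a longer average PD.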
The properties of the phenylalanine (F), tyrosine (Y) and tryptophan (W) recognizer (FYW) illustrate how recognizer-peptide complexes result in distinguishable binding characteristics. The FYW recognizer is derived from a ClpS-family protein that naturally binds to N-terminal F, Y and W. The recognizer’s binding pocket is optimal for F and accommodates Y and W with decreasing affinity. Thus, peptides with N-terminal F exhibit the strongest affinity and longest PD compared to identical peptides with N-terminal Y or W (Figure 2).
Figure 2. Peptides with N-terminal phenylalanine (F) exhibit the strongest affinity and longest PD compared to identical peptides with N-terminal tyrosine (Y) or tryptophan (W). Credit: Quantum-Si.
Each recognizer also binds to its cognate NAAs with characteristic preferences for downstream residues that influence PD. During the sequencing reaction, each RS has a specific PD characteristic of the NAA and the adjacent downstream residues. The order of recognizer binding, the appearance of segments lacking recognition, and the binding kinetics observed during the sequencing reaction constitute the full kinetic signature of a peptide.
Kinetic signatures are highly predictable patterns for protein identification and variant detection
Kinetic signatures are unique for each peptide sequence and are highly predictable. Quantum-Si has developed software incorporating a kinetic model that accurately predicts the kinetic signature of a peptide from its amino acid sequence. For peptide alignment and protein identification, the software uses a reference database of protein sequences and predicts the kinetic signature for each reference peptide sequence. The sequencing traces from every peptide on the chip are then aligned and mapped to the predicted reference signatures and assigned a score. The score is based on the presence of RSs in the expected order and the agreement between the observed PD value for each RS and the expected PD value based on the predicted kinetic signature.
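The two scoring criteria named above, segment order and PD agreement, can be combined in a toy scoring function. This sketch is not Quantum-Si's algorithm: the function names, the log-ratio error measure, the requirement of an exact order match, and all PD values are assumptions chosen to keep the example short.

```python
import math


def signature_score(observed: list, predicted: list) -> float:
    """Score in [0, 1] for how well an observed trace matches a predicted signature.

    Each signature is a list of (recognizer_name, mean_pd_seconds) tuples.
    Returns 0.0 if the recognition segments are not in the predicted order.
    """
    if not observed or len(observed) != len(predicted):
        return 0.0
    if [name for name, _ in observed] != [name for name, _ in predicted]:
        return 0.0
    # Mean absolute log-ratio between observed and predicted pulse durations.
    error = sum(
        abs(math.log(obs_pd / pred_pd))
        for (_, obs_pd), (_, pred_pd) in zip(observed, predicted)
    ) / len(observed)
    return math.exp(-error)


def best_match(observed: list, reference: dict) -> str:
    """Return the reference peptide ID with the highest score."""
    return max(reference, key=lambda pid: signature_score(observed, reference[pid]))


# Hypothetical predicted signatures for two reference peptides.
reference = {
    "peptide_A": [("FYW", 1.2), ("LIV", 0.4)],
    "peptide_B": [("LIV", 0.4), ("FYW", 1.2)],
}
observed = [("FYW", 1.0), ("LIV", 0.5)]
print(best_match(observed, reference))
```

A production aligner would also need to tolerate missed or truncated segments, which this sketch deliberately leaves out.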
Detecting disease-relevant proteoforms using NGPS
Proteoforms are different forms of a protein encoded by a single gene. These variants can arise from post-transcriptional mechanisms, such as alternative RNA splicing, and from PTMs, including phosphorylation and proteolytic cleavage. Proteoforms that have different sizes and modifications may change the functionality of the encoded protein and alter protein–protein and protein–ligand interactions; in doing so, protein variants can play a role in a spectrum of diseases, including cancer.6 For example, functionality of the tumor suppressor protein PTEN is modulated via alternative splicing, alternative translational initiation, and PTMs. These events generate conformationally unique PTEN proteoforms that differ in terms of downstream functionality and pathological outcomes. PTEN proteoform–proteoform interactions may also play a role in cellular homeostasis and cancer.6
Given the role of protein variants in disease, the availability of NGPS addresses the pressing need in the field of proteomics for a powerful yet accessible technique for proteoform identification and characterization.
Data recently presented by Gloria Sheynkman, PhD, assistant professor at the University of Virginia, described the use of the Quantum-Si Platinum® NGPS instrument to understand the diversity of filamentous proteins and to capture alternative splicing variants.7 Her lab is developing approaches to discover novel disease proteoforms, assay proteoform-specific functions, and elucidate the molecular mechanisms by which proteoforms rewire cellular networks to drive disease states.
Filament structural proteins are highly diverse, with more than 40 genes coding for intermediate filament (IF) proteins and homologous tropomyosins (TPMs). IFs, including nuclear lamins and cytoplasmic keratins, vimentin, desmin and neurofilaments, are regulated by PTMs and provide mechanical resilience to cells, contribute to organelle positioning, and facilitate intracellular communication.8 TPM splice proteoforms regulate distinct actin filament populations to mediate cell-specific structural properties.7
Molecular detection of paralogous and alternatively spliced proteoforms of IFs and TPMs could represent a new source of critical biomarkers or drug targets. However, IF and TPM paralogs (expressed by related genes) and proteoforms with similar physicochemical properties are difficult to distinguish with mass spectrometry, and antibodies don’t exist for all TPM isoforms, which limits the use of antibody-based techniques.
Dr. Sheynkman and her team used long-read RNA sequencing (RNA-Seq) to reveal alternative splicing variants of vimentin and TPM1/2 with 87% identity.7 They then performed protein sequencing to evaluate the ability of NGPS to detect filament protein isoforms involved in cancer and skeletal muscle diseases. NGPS was able to distinguish peptides from multiple TPM1/2 paralogs and splice proteoforms, as well as isobaric TPM1/2 peptides differing only in a leucine versus an isoleucine. In future work, Dr. Sheynkman and her team plan to apply NGPS to detect PTM differences in IF and TPM proteoforms.
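The leucine versus isoleucine case is difficult for mass-based methods because the two residues have the same elemental composition and therefore the same mass. The sketch below shows this with standard monoisotopic residue masses; the two four-residue peptides are hypothetical examples, not sequences from TPM1 or TPM2.

```python
# Monoisotopic residue masses in daltons (standard values, subset only).
RESIDUE_MASS_DA = {
    "A": 71.03711,
    "E": 129.04259,
    "I": 113.08406,
    "K": 128.09496,
    "L": 113.08406,
}
WATER_MASS_DA = 18.01056  # added once per peptide for the free termini


def peptide_mass(sequence: str) -> float:
    """Monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS_DA[residue] for residue in sequence) + WATER_MASS_DA


# Hypothetical peptides that differ only by L versus I.
for peptide in ("AELK", "AEIK"):
    print(peptide, f"{peptide_mass(peptide):.5f}")
```

Since both peptides have identical mass, their intact m/z values coincide at every charge state, whereas a method that reads out residue-specific binding kinetics has a different basis for telling them apart.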
This study represents the first integration of proteogenomics with NGPS technology to detect proteoform-specific peptides. Several of these peptides were not easily detectable or distinguishable by mass spectrometry, demonstrating the value of NGPS as an orthogonal approach to provide a more comprehensive characterization of disease-relevant proteoforms.
Unlocking the proteome with NGPS
Compared with NGPS, conventional protein analysis techniques can be time- and labor-intensive, suffer from throughput constraints, and provide less information. Protein detection assays such as western blots and ELISA, for example, are limited in their ability to resolve unknown variants, truncations, and PTMs. While MS can sometimes resolve these differences, the technique requires expensive instrumentation and specialized technical expertise, and it has lengthy turnaround times.
In contrast, NGPS advances and accelerates the characterization of proteins and interrogation of proteoforms, delivering a more detailed understanding of these molecules and their many variants. This powerful yet accessible technique helps researchers explore critical questions with greater precision, including identifying specific proteins important to pathways of interest, and how protein variation influences disease development. Continued adoption of NGPS will not only uncover protein complexity but may also reveal new biomarkers and drug targets, setting the stage for the development of novel medicines.