Leveraging Proteomics for Clinical Biomarker Discovery
Proteomic tools are enhancing the study of protein variants in health and disease.
Over the last 40 years, molecular blood-based biomarkers have become increasingly important in the diagnosis, treatment and management of disease. In particular, the last decade has seen a rapid increase in the number of DNA-related tests, fueled by the revolution in DNA sequencing which has dramatically reduced costs such that a full genome can be sequenced for a few hundred dollars.
The earliest blood-based biomarkers measured proteins, but the difficulty of measuring large numbers of proteins in blood has limited clinically useful discoveries to a few well-studied proteins. Indeed, there are fewer than 150 protein biomarkers today, compared with over 26,000 genetic biomarkers.
Using novel proteomic tools and associated technologies, such as machine learning, we can create and interrogate larger, more complex datasets, enabling more in-depth study of the biological functions of protein variants in health and disease.
By leveraging unbiased proteomics (the study and functional analysis of proteins), scientists can begin to identify novel biomarkers for diseases such as cancer and Alzheimer’s disease (AD) and, in the future, develop those biomarkers into clinically relevant and reliable diagnostic tests.
Biomarker discovery: interrogating the whole proteome
Historically, identifying novel disease-relevant proteins for biomarker discovery and clinical diagnostics has relied on immunoassays or affinity-based approaches. While supporting the necessary scale, these approaches have lacked the breadth to measure a large number of proteins, limiting their use to a few well-known proteins.
More recently, newer affinity approaches with greater breadth and scale have emerged, enabling the study of several thousand proteins. While these methods cover a broader set of proteins than the previous generation of affinity-based approaches, they are still limited to the proteins for which affinity probes have been developed.
Therefore, they cannot interrogate proteins that may play a critical role in the disease process but are absent from the panel, whether by design or because their importance is not yet known.
Importantly, affinity probes are designed to work against a specific protein variant or proteoform. While the human genome contains approximately 20,000 genes, it is estimated that there could be more than one million unique proteoforms per cell type in humans. It is this explosion of function that drives the difference between humans and worms (C. elegans), which have roughly the same number of genes (in fact, C. elegans has a few more).
Proteoforms arise through dynamic processes, including alternative splicing, allelic variation and post-translational modifications (PTMs). The specific proteoform may play a critical role. For example, in AD, the phosphorylated form of tau protein is a more specific marker of AD than total tau. To gain the deepest insights into diseases, we must analyze proteins at the proteoform level.
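To make the scale concrete, the short Python sketch below multiplies out the three sources of variation named above for a single gene. The isoform, allele and PTM-site counts are hypothetical, and the calculation assumes every PTM site is independently either modified or unmodified, so it gives a rough upper bound rather than a biological estimate. The final line simply divides the figures quoted in the text and ignores the fact that not every gene is expressed in every cell type.

```python
# Illustrative arithmetic only: the per-gene counts below are hypothetical,
# not measurements for any real gene.

def proteoform_upper_bound(splice_isoforms: int,
                           allelic_variants: int,
                           ptm_sites: int) -> int:
    """Upper bound on distinct proteoforms for one gene.

    Assumes each PTM site is independently modified or unmodified,
    which gives 2 ** ptm_sites modification patterns.
    """
    return splice_isoforms * allelic_variants * 2 ** ptm_sites


# A hypothetical gene: 3 splice isoforms, 2 alleles, 4 PTM sites.
per_gene = proteoform_upper_bound(3, 2, 4)
print(per_gene)  # 3 * 2 * 16 = 96

# Figures quoted in the text: ~20,000 genes, >1,000,000 proteoforms.
genes = 20_000
proteoforms = 1_000_000
average_per_gene = proteoforms / genes
print(average_per_gene)  # 50.0 proteoforms per gene on average
```

Even modest per-gene numbers multiply quickly, which is why one affinity reagent per proteoform does not scale.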
While specific antibodies or other affinity probes may be developed against specific proteoforms, this approach does not scale to unbiased discovery as it is not practical to develop millions of affinity reagents to cover the full spectrum of proteoforms. Hence, we need proteomic technologies that enable deep, unbiased coverage of the proteome to help us uncover those proteoforms that play a role in disease progression and treatment response.
Unbiased proteomics can potentially drive the discovery of protein biomarkers, leading to earlier disease detection and the development of targeted therapeutics. By adopting this approach, we enhance the likelihood of identifying protein biomarkers that have clinical relevance.
Therefore, interrogating the whole proteome in an unbiased manner through the generation of large datasets will enable the identification of disease signatures. This approach requires large-scale cohort studies. However, once protein biomarkers are identified, smaller studies in specific populations can be used to validate those findings.
Proteomics for complex and rare diseases
Proteomics can reveal relevant proteoforms that can be tracked over the entire duration of complex diseases, such as cancer, metabolic disorders or neurodegenerative diseases.
Recently, studies from Seer scientists and researchers from Memorial Sloan Kettering Cancer Center identified specific protein isoforms of four proteins associated with non-small cell lung cancer (NSCLC) progression using mass-spectrometry proteomics. Additionally, a pre-print study from Seer and Massachusetts General Hospital identified key biomarkers for Alzheimer’s disease for monitoring disease progression.
In cancer, liquid biopsies based on cell-free DNA have been established as a valuable diagnostic for detection. Combining cell-free DNA with proteomics in a multiomics approach may enable greater sensitivity of detection and ongoing monitoring of these progressive diseases.
PrognomIQ has taken a multiomics approach to cancer detection and recently made the results from its trial available in a pre-print on medRxiv. The company reports a classifier with 80% sensitivity at 89% specificity for Stage I lung cancer, which could allow oncologists to make better decisions on when and how to treat patients.
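For readers less familiar with these metrics, the Python sketch below shows how sensitivity and specificity are computed from a confusion matrix. The cohort counts are hypothetical and were chosen only so that they reproduce the 80% and 89% figures quoted above; they are not taken from the PrognomIQ pre-print.

```python
# Hypothetical counts chosen to reproduce the quoted percentages.
# They are NOT data from the pre-print.

def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of people with the disease that the test flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of people without the disease that the test clears."""
    return true_neg / (true_neg + false_pos)


# Imagined cohort: 100 Stage I cases and 1,000 cancer-free controls.
tp, fn = 80, 20      # cases detected / cases missed
tn, fp = 890, 110    # controls cleared / controls flagged in error

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")

# Positive predictive value depends on how common the disease is
# in the tested population, so this value holds only for this cohort.
ppv = tp / (tp + fp)
print(f"PPV in this cohort = {ppv:.1%}")
```

In this imagined cohort fewer than half of the positive results are true cancers, which illustrates why specificity matters as much as sensitivity when a test is applied to populations where the disease is uncommon.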
Proteomics could also provide crucial insights into rare diseases, which are often hard to diagnose and whose underlying biological processes are frequently poorly understood. If proteins prove to link more closely to the disease processes, deep, unbiased proteomics may uncover new biological insights and biomarkers for these under-studied conditions.
For example, in a recent pre-print study, researchers have used deep, unbiased proteomic technologies to uncover novel biomarkers for the rare pediatric lysosomal storage disorder CLN3 Batten disease, which affects about 1 in 100,000 individuals, making it the most common pediatric neurodegenerative disorder worldwide.
Biological mechanisms and disease processes can be further elucidated by connecting deep coverage protein data generated at scale to large-scale genomic data. Two such analyses are protein quantitative trait loci or pQTLs and Mendelian randomization.
pQTL analysis connects genomic variants to protein levels, which can help us identify the proteins and genetic variants related to disease processes in a cohort. Mendelian randomization uses genetic variants as natural experiments to test whether a relationship in a biological pathway is likely to be causal. Both analyses require scalable proteomic methods that can run on large cohorts, and deep coverage extends their reach.
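The Python sketch below illustrates both ideas in their simplest form: a pQTL association as the regression slope of protein level on genotype dosage, and a single-variant Mendelian randomization estimate as a Wald ratio. All values are made up for illustration, and the variable names are hypothetical. Real analyses adjust for covariates, report standard errors, correct for multiple testing and check the assumptions that make a variant a valid instrument, so this is a teaching sketch rather than the method used in any study mentioned here.

```python
# Toy data only: six hypothetical individuals, one hypothetical variant.

def ols_slope(x: list[float], y: list[float]) -> float:
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Genotype dosage: copies (0, 1 or 2) of the effect allele per person.
genotype = [0, 0, 1, 1, 2, 2]
# Measured plasma abundance of one protein (arbitrary units).
protein = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2]
# A continuous disease-related trait (arbitrary units).
trait = [0.5, 0.7, 1.0, 1.2, 1.5, 1.7]

# pQTL effect: change in protein level per copy of the allele.
beta_pqtl = ols_slope(genotype, protein)

# Variant-trait effect: change in the trait per copy of the allele.
beta_trait = ols_slope(genotype, trait)

# Wald ratio: estimated change in the trait per unit change in protein,
# valid only if the variant affects the trait solely through the protein.
wald_ratio = beta_trait / beta_pqtl

print(f"pQTL effect per allele:   {beta_pqtl:.2f}")
print(f"trait effect per allele:  {beta_trait:.2f}")
print(f"Wald ratio (MR estimate): {wald_ratio:.2f}")
```

Both slopes become more precise as the cohort grows, which is one reason these analyses depend on proteomic methods that scale.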
Clinical adoption of proteomics technology
There remains a significant need to make deep, unbiased, scalable plasma proteomics more accessible to researchers and clinician-scientists through the development of automated, standardized and easy-to-use sample preparation workflows.
Today, push-button, automated processes for proteomics sample preparation are helping researchers perform much of the heavy lifting that, in the past, typically required specialists with technical expertise. Automation of workflows also improves the reproducibility of the assay. Through this automated approach, we can conduct high-throughput, large-scale proteomic studies and collect substantial amounts of data.
Once the body of studies identifying disease-relevant protein variants becomes significant, the life sciences community can begin to create lab-developed tests that would then go through the regulatory processes for approved clinical diagnostic tests. As with the path taken by genomic diagnostics, this presents significant opportunities for delivering mass spec-based proteomics to the market.
Proteomics is facilitating large-scale studies aimed at advancing the discovery of protein variants, modifications and interactions that underpin biological mechanisms. This endeavor will enhance the breadth of clinically translatable datasets related to diseases and broaden the pool of potential biomarkers and drug targets accessible for therapeutic purposes.