Translational Research in Clinical Proteomics
Protein biomarkers and their role in the early detection of disease
In recent years, the discovery of biomarkers has advanced our ability to diagnose diseases such as cancer and coronary disease, monitor their progression, predict recurrence and identify suitable therapeutic treatments.
While genetic markers can indicate an increased risk of, or predisposition to, a certain disease, genetic markers alone do not indicate the presence of the disease. Phenotypic markers, such as changes in protein expression levels, are needed for that.
Clinical proteomics research that focuses on discovering protein biomarkers has been hailed as the "next step forward" in the early detection, diagnosis and treatment of diseases.
In this article, Gary Kruppa, VP of Proteomics at Bruker Daltonics, explores how proteomic methods can help achieve the ultimate goal of preventing and improving the treatment of disease.
Proteomics – the next generation of biomarkers?
Using proteomic methods to investigate changes in protein expression in cell cultures or tissue samples has led to the discovery of many potential biomarkers. There are, however, many hurdles in the validation of these biomarkers.
In general, for a biomarker to be useful for early detection of disease, it needs to be detected and validated in an easily sampled matrix, such as urine or plasma. Biomarker validation also requires clinical studies involving many hundreds or, more commonly, more than a thousand patients.
Plasma is often cited as the sample of choice, as plasma from blood circulates throughout the body and may contain proteins differentially expressed in diseased tissues. However, plasma also contains many very high abundance proteins required for normal biological function, such as serum albumin and hemoglobin. The high abundance of these proteins interferes with the ability of standard proteomics methods to detect protein biomarkers released into the blood from relatively small amounts of diseased tissue.
While state-of-the-art proteomics methods can detect and quantify approximately 5000 proteins from a human cell culture in a reasonable time of ~90 minutes per sample, the same methods applied to human plasma detect only 200-300 proteins, typically those of high abundance.
Furthermore, even a 90-minute run time for the thousands of samples required for validation of biomarkers is too long. Both high performance liquid chromatography (HPLC) and mass spectrometry (MS) instrumentation must be sufficiently robust to process many hundreds of samples without loss of performance. To advance, the field needs novel ways to detect more proteins in plasma more quickly, with robust HPLC and mass spectrometers that minimize downtime.
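To put the run-time problem in numbers, the short sketch below estimates how many days of uninterrupted acquisition a validation cohort would occupy on a single instrument. The cohort size of 1000 samples is an illustrative assumption, and the function name `instrument_days` is hypothetical. The calculation ignores blanks, quality-control runs, calibration and downtime, so real studies would take longer.

```python
def instrument_days(n_samples: int, minutes_per_sample: float,
                    hours_per_day: float = 24.0) -> float:
    """Days of continuous acquisition needed for a cohort on one instrument.

    Assumes back-to-back runs with no blanks, QC injections or downtime.
    """
    return n_samples * minutes_per_sample / (hours_per_day * 60.0)


# Illustrative cohort of 1000 samples at the 90-minute run time quoted above.
print(instrument_days(1000, 90))  # 62.5
```

Under these assumptions, a single 1000-sample study would tie up one instrument for about two months of continuous operation, which is why both shorter methods and robust hardware matter.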
Working towards deeper analysis of the plasma proteome
Over the last decade, specialist instrumentation suppliers have invested significantly to develop the power of mass spectrometers in terms of sensitivity and speed. In tandem with this technology, methods to deplete plasma of abundant proteins have also improved, allowing deeper analysis of the plasma proteome.
In standard bottom-up proteomics workflows, the peptides eluting from the chromatographic separation are detected by MS and then identified by MS/MS (tandem MS). Sensitive and fast MS/MS identification of proteins at >100 Hz is now possible, which means that more than 5000 proteins can be identified and quantified in a single 90-minute run from 200 ng of protein from a human cell culture [1]. In addition, new high-performance instrumentation enables the quantification of hundreds of proteins at 100 samples/day of depleted plasma, with consistent performance for very large sample cohorts [2].
To date, most biomarker discovery work in plasma has been performed with data dependent analysis (DDA) proteomics.
DDA vs DIA
In DDA methods, the peptides to be identified are selected by algorithms that identify all of the peaks in the MS spectrum and then use criteria such as peak intensity, charge state, mass-to-charge ratio (m/z) etc. to automatically calculate which peptides to target for MS/MS identification. Such algorithms generally also use dynamic exclusion lists so that once a peptide is targeted and fragmented for MS/MS, time will not be wasted in targeting it again.
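The selection logic described above can be illustrated with a minimal sketch of intensity-ranked ("top-N") precursor selection with dynamic exclusion. This is not any vendor's acquisition algorithm. The names `Peak` and `select_precursors`, the thresholds and the 30-second exclusion time are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    mz: float         # mass-to-charge ratio
    intensity: float  # signal intensity in the MS survey spectrum
    charge: int       # assigned charge state


def select_precursors(peaks, excluded, now, top_n=10, min_charge=2,
                      mz_tol=0.01, exclusion_s=30.0):
    """Pick up to top_n precursors from one survey spectrum.

    excluded maps an already-fragmented m/z to the time (s) at which its
    exclusion expires; it is updated in place. All defaults are illustrative.
    """
    # Drop exclusion entries that have expired.
    for mz in [m for m, expiry in excluded.items() if expiry <= now]:
        del excluded[mz]

    # Keep peaks with an acceptable charge state that are not excluded.
    candidates = [
        p for p in peaks
        if p.charge >= min_charge
        and not any(abs(p.mz - m) <= mz_tol for m in excluded)
    ]

    # Target the most intense candidates first.
    chosen = sorted(candidates, key=lambda p: p.intensity, reverse=True)[:top_n]

    # Exclude the chosen precursors so they are not fragmented again too soon.
    for p in chosen:
        excluded[p.mz] = now + exclusion_s
    return chosen
```

Calling `select_precursors` on consecutive survey spectra with the same `excluded` dictionary shows the intended behaviour: intense peptides are fragmented once, after which lower-intensity peptides get a turn until the exclusion time runs out.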
DDA methods have the advantage that the mass of the peptide targeted for identification is known and may be used in the subsequent identification from the MS/MS data. However, DDA experiments have drawbacks: the algorithms that perform the DDA selection rely on the MS data to decide what is targeted, and biological variation in the samples, as well as technical variation between runs, may cause a peptide selected in one run to be missed in a subsequent run.
This stochastic nature in the selection of peptides for identification in DDA experiments results in the so-called "missing value problem". The data will not be complete in terms of being able to compare the quantity of all of the peptides (and hence proteins) across all samples. While time is being spent doing MS/MS identification of one peptide, the ions from all the rest are filtered away, resulting in inefficient use of the ion current.
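One way to quantify the missing value problem is to ask what fraction of peptides was identified in every run, and what fraction of the peptide-by-run matrix is empty. The sketch below does this for a toy data set. The peptide labels and the function names `complete_fraction` and `missing_fraction` are hypothetical.

```python
def complete_fraction(runs):
    """Fraction of all observed peptides that were identified in every run.

    runs is a list of sets, one set of peptide identifiers per run.
    """
    observed = set().union(*runs)
    if not observed:
        return 0.0
    in_every_run = set.intersection(*runs)
    return len(in_every_run) / len(observed)


def missing_fraction(runs):
    """Fraction of empty cells in the peptide-by-run matrix."""
    observed = set().union(*runs)
    if not observed:
        return 0.0
    filled = sum(len(run) for run in runs)
    return 1.0 - filled / (len(observed) * len(runs))


# Toy example: three runs, each identifying a different subset of peptides.
runs = [{"A", "B", "C"}, {"A", "B", "D"}, {"A", "C", "D"}]
print(complete_fraction(runs))  # 0.25
print(missing_fraction(runs))   # 0.25
```

In this toy case only one of four peptides can be compared across all three runs, even though each run identified three peptides.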
To overcome these issues, a data independent analysis (DIA) approach was developed, in which, rather than selecting a single ion, all of the ions in a mass window are selected for fragmentation. If the windows are adjacent and scanned, then the DIA approach is deterministic – in principle, every peptide eluting from the chromatographic separation will be fragmented. The drawback of DIA experiments is that many peptides will be fragmented at once and it can be complex to determine which of the fragments belong together and came from a single precursor. However, software packages that can do this deconvolution are advancing rapidly, and as a result DIA experiments are becoming more common.
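The deterministic nature of DIA, and the reason deconvolution is needed, can be seen in a small sketch that tiles an m/z range with adjacent isolation windows and groups precursors by the window they fall into. The 400-1200 m/z range, the 25 m/z window width, the precursor values and the function names are illustrative assumptions, not settings taken from any particular method.

```python
from collections import defaultdict


def dia_windows(mz_start, mz_end, width):
    """Adjacent, non-overlapping isolation windows covering [mz_start, mz_end)."""
    windows = []
    low = mz_start
    while low < mz_end:
        high = min(low + width, mz_end)
        windows.append((low, high))
        low = high
    return windows


def group_by_window(precursor_mzs, windows):
    """Map window index -> precursors that would be fragmented together."""
    groups = defaultdict(list)
    for mz in precursor_mzs:
        for index, (low, high) in enumerate(windows):
            if low <= mz < high:
                groups[index].append(mz)
                break
    return dict(groups)


windows = dia_windows(400.0, 1200.0, 25.0)
print(len(windows))  # 32
print(group_by_window([500.25, 510.80, 812.40], windows))
# {4: [500.25, 510.8], 16: [812.4]}
```

Every precursor inside the covered range lands in exactly one window, so none is skipped. The first two precursors share a window, however, so their fragments appear in the same MS/MS spectrum and must be separated by software.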
A final approach for validating biomarkers is targeted proteomics, generally known as multiple reaction monitoring (MRM) if conducted on triple quadrupole instruments, and parallel reaction monitoring (PRM) if conducted on instruments with high resolution detectors such as quadrupole time-of-flight (Q-TOF) and Orbitrap type instruments. Generally, these are referred to as targeted methods as they use a target list, containing the chromatographic retention time and m/z of each target. Only those targets are selected for MS/MS identification and quantitation. It may be that, to detect, quantify, and validate more proteins in plasma, a combined method represents a better approach. Combining DIA with targeted proteomic analysis may provide more in-depth results.
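The target list used by MRM and PRM methods can be pictured as a small table of names, m/z values and expected retention times, from which the instrument works out which targets to monitor at any moment. The sketch below is a simplified illustration. The peptide names, the values, the 2-minute retention time window and the function name `active_targets` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    name: str       # label for the target peptide
    mz: float       # precursor mass-to-charge ratio
    rt_min: float   # expected chromatographic retention time (minutes)


def active_targets(targets, current_rt, rt_window=2.0):
    """Targets whose retention time window contains the current time."""
    half_window = rt_window / 2.0
    return [t for t in targets if abs(t.rt_min - current_rt) <= half_window]


target_list = [
    Target("peptide_1", 523.28, 12.4),
    Target("peptide_2", 647.85, 12.9),
    Target("peptide_3", 702.37, 25.1),
]

# At 12.5 minutes only the first two targets are being monitored.
print([t.name for t in active_targets(target_list, 12.5)])
# ['peptide_1', 'peptide_2']
```

Because only listed targets are ever selected, the same peptides are measured in every sample, which is what makes targeted methods attractive for validation.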
The introduction of fast, sensitive Q-TOF instruments with trapped ion mobility separation promises to further this research. DDA experiments on instruments such as Bruker’s timsTOF Pro are performed with unprecedented MS/MS speed and sensitivity using the so-called parallel accumulation serial-fragmentation (PASEF) method [1]. Ion mobility also provides an additional dimension of separation that helps to resolve isobaric species with the same elution profile. The same speed and sensitivity advantages apply to DIA experiments, termed diaPASEF. In this case, the mobility dimension results in more efficient usage of the ion current and provides an additional separation dimension that can be used in MS/MS data alignment to connect fragments to precursors with high confidence [3].
The goal of translational proteomics
While the ability to analyse 100 samples per day of depleted plasma is a big step forward, it would be desirable to have even deeper coverage of the plasma proteome, preferably without the depletion step.
Ongoing research is working towards the goal of detecting and quantifying more than 1000 proteins in a plasma sample in <30 minutes, and at a reasonable cost per sample. If that goal can be achieved, then routine longitudinal monitoring of the plasma proteome of a substantial population could be used to correlate changes in the plasma proteome with the onset of disease. This should enable the simultaneous discovery and validation of biomarker panels for various diseases.
Extending the reach into multidisciplinary OMICS
As well as focusing on new developments in proteomics, research that combines proteomics with metabolomics and genomics, under the umbrella term OMICS, promises to be very powerful.
It is well known that glycosylation and lipid expression patterns are heavily modified in certain cancers and other diseases. Metabolites, however, are a diverse set of molecules which generally require very different HPLC and mass spectrometric methods from proteins for analysis.
Metabolomics methods also need extensive development to become sufficiently robust, routine, and cost-effective to be deployed in the clinic.
Clinical research for biomarker discovery in both metabolomics and proteomics will continue, and it is only through further developments in these fields that the potential of multidisciplinary personalized medicine research will be realized.
1. Meier F, Brunner AD, Koch S, Koch H, Lubeck M, Krause M, Goedecke N, Decker J, Kosinski T, Park MA, Bache N, Hoerning O, Cox J, Räther O, Mann M. Online Parallel Accumulation-Serial Fragmentation (PASEF) with a Novel Trapped Ion Mobility Mass Spectrometer. Mol Cell Proteomics. 2018 Dec;17(12):2534-2545.
2. Prianichnikov N, Koch H, Koch S, Lubeck M, Heilig R, Brehmer S, Fischer R, Cox J. MaxQuant software for ion mobility enhanced shotgun proteomics. bioRxiv 651760; doi: https://doi.org/10.1101/651760.
3. Meier F, Brunner A, Frank M, Ha A, Voytik E, Kaspar-Schoenefeld S, Lubeck M, Raether O, Aebersold R, Collins BC, Röst HL, Mann M. Parallel accumulation – serial fragmentation combined with data-independent acquisition (diaPASEF): Bottom-up proteomics with near optimal ion usage. bioRxiv 656207; doi: https://doi.org/10.1101/656207.