The Evolution of Proteomics – Professor Ruedi Aebersold
Professor Ruedi Aebersold is a professor of systems biology at the Institute of Molecular Systems Biology (IMSB) of ETH Zurich and is regarded as one of the pioneers of proteomics research.
Aebersold has made significant contributions to the development of targeted proteomic techniques, including selected reaction monitoring (SRM) and data-independent acquisition (DIA). He is also one of the inventors of the Isotope-Coded Affinity Tag (ICAT) reagents used in quantitative mass spectrometry (MS).
Aebersold's research in quantitative proteomics has helped shape our understanding of how proteins function, interact and are localized in both normal and diseased states. The Aebersold laboratory utilizes high-throughput proteomic and computational methods, such as label-free shotgun proteomics, to precisely measure protein analytes in complex samples. By creating "snapshot" profiles, the research team are able to determine which cells contain abnormal levels of specific proteins, and by doing so hope to develop novel diagnostic markers for disease.
Molly Campbell (MC): In your opinion, what have been some of the most exciting breakthroughs in the proteomics field since its conception?
Ruedi Aebersold (RA): We work on MS-based proteomics. For me the most fascinating aspect of this technique is its versatility. Essentially the same liquid chromatography-tandem mass spectrometry (LC-MS/MS) technique and instrumentation can be used to explore many different biologically important properties of proteins if some additional tricks are applied. These properties include, of course, the amino acid sequence and abundance of proteins, but also their half-life, state of modification, localization in cells, participation in complexes and the precise contact sites of interacting proteins.
Recently, there has been a distinct trend to also tackle the higher-order structures of proteins and protein complexes, and the changes in those structures, using techniques including hydrogen-deuterium exchange (HDX), cross-linking, correlation profiling, native MS, thermal profiling and limited proteolysis (LiP). The information gained by many of these methods is frequently highly interesting and directly relevant to function.
MC: Your current research in quantitative proteomics looks to compare levels of protein expression between samples. Can you tell us more about your recently published work on proteomic profiling in different types of cancer for the discovery of new biomarkers?
RA: We have been doing quantitative comparisons between samples for 20 years, starting with the development of the ICAT technology in 1999. From that work we gained many insights into the response of cells and tissues to different conditions. For example, a PhD student, Ralph Schiess, discovered a set of plasma biomarkers to stratify prostate cancer with respect to diagnosis and treatment options. He then founded a company, ProteoMediX, that is in the process of bringing this marker panel to the clinic. We also gained many insights into specific biological processes, including their regulation by phosphorylation. As the techniques evolved and we could measure deeper into the proteome, we eventually learned that the response of cells to essentially any perturbation is very complex, typically involving hundreds of proteins.
This situation created a very challenging problem because it is not evident how a biologist would make sense of the resulting patterns, or which of the many observations should be prioritized for what is commonly referred to as “biological validation”. To overcome this essentially intractable problem, we decided to develop MS techniques that would allow us to quantitatively compare large numbers of samples (hundreds to thousands), so that we could use mathematical methods like clustering, machine learning, statistical associations or regression to discover patterns indicating the biochemical changes in cells in a data-driven way, rather than relying on prior biological knowledge. Out of these insights arose high-throughput targeting techniques and scoring software: initially SRM and tools like mProphet, and a bit later SWATH/DIA techniques and tools like OpenSWATH.
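Scoring software of this kind has to decide, across thousands of candidate measurements, which ones to trust. A widely used generic idea is target-decoy estimation of the false discovery rate: real "target" candidates are scored alongside deliberately false "decoy" candidates, and the ratio of decoys to targets above a score cutoff approximates the error rate among the accepted targets. The Python sketch below illustrates only that generic idea under the stated assumptions; it is not the mProphet or OpenSWATH implementation, which use more elaborate scoring models. The function name and example inputs are hypothetical.

```python
def target_decoy_qvalues(scores, is_decoy):
    """Return a q-value for every candidate, in input order.

    scores   -- higher means a better match (hypothetical scoring scale)
    is_decoy -- True where the candidate is a decoy
    Assumes one decoy per target, so decoys/targets estimates the FDR.
    Tied scores are not treated specially.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # FDR estimate at each score threshold, walking from best to worst score.
    fdr = [0.0] * len(scores)
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdr[i] = decoys / max(targets, 1)

    # q-value: the lowest FDR at which this candidate would still be accepted.
    qvalues = [0.0] * len(scores)
    lowest = float("inf")
    for i in reversed(order):
        lowest = min(lowest, fdr[i])
        qvalues[i] = lowest
    return qvalues


# Hypothetical scores for six candidates, two of which are decoys.
scores = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0]
is_decoy = [False, False, True, False, True, False]
q = target_decoy_qvalues(scores, is_decoy)

# Keep only targets at or below a chosen q-value cutoff.
accepted = [i for i, (qv, d) in enumerate(zip(q, is_decoy)) if not d and qv <= 0.01]
print(accepted)  # → [0, 1]
```

The q-value step makes the accepted set monotone in the cutoff: lowering the score threshold can never make an already accepted candidate look worse.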
We are really excited about these techniques because they provide fascinating insights into the inner workings of cells and tissues and have opened the door to population-based studies, for example using genetic reference panels like the BXD mouse panel. By doing multi-layer measurements in such panels we try to combine genomic and proteomic data to learn how cells translate genomic variability into proteomic and eventually phenotypic variability. The same approach is also very powerful for clinical studies, where measuring large numbers of replicates allows us to detect clinically significant signals, even against the noisy background of clinical samples.
MC: How important is data integration in proteomics research? How are advances in computational proteomics aiding data storage and dissemination?
RA: These are really two different questions. The second is about data management and the first about relating the biological meaning of the data to other types of biological data.
Data management, including storage, dissemination and processing, poses significant financial and technical challenges because, with advances in instrumentation, the data volume has also increased dramatically. It is not uncommon for a single study, e.g. a population-based study as mentioned above, to generate terabytes of data, a volume that is difficult for many groups to handle. Fortunately, cloud-based systems are becoming available, and I would also like to highlight PRIDE and the ProteomeXchange consortium, who have done an outstanding job of collecting and archiving data supporting published work and making the aggregate data accessible to the community, e.g. to support meta-analyses.
The first question is even more challenging to address because, in my view, it is at present not clear how different data types generated from the same biological objects, e.g. cultured cells or clinical tissue specimens, are best integrated. There are rather straightforward methods such as correlating data types (frequently transcript vs. protein abundance), but the knowledge gained from those analyses is limited. There is an interesting discussion in the field as to whether strictly data-driven approaches like machine learning have an equal, higher or lower potential to discover properties of biological systems compared to approaches that take into account the vast accumulated knowledge of biological processes. Personally, I have come to the conclusion that for understanding the evolved biological systems we are studying, prior biological knowledge is highly useful and likely essential.
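The straightforward correlation of data types mentioned above can be sketched in a few lines. The sketch assumes matched transcript and protein abundance values for the same genes and uses a rank-based (Spearman) correlation, one common choice because it is unchanged by monotone transformations such as log scaling. The numbers and function names are hypothetical, chosen only for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical abundances for five genes measured in the same sample.
transcripts = [10.0, 20.0, 30.0, 40.0, 50.0]
proteins = [100.0, 150.0, 400.0, 300.0, 900.0]
print(round(spearman(transcripts, proteins), 3))  # → 0.9
```

A single coefficient like this summarizes how well the two layers agree overall, but it says nothing about which genes disagree or why.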
MC: You have worked on the development of several proteomic techniques. What technical challenges do researchers face in proteomics research?
RA: For a long time, MS-based proteomic analyses were technically demanding at various levels, including sample processing, separation science, MS and the analysis of the spectra with respect to the sequence, abundance and modification states of peptides and proteins, as well as false discovery rate (FDR) considerations. I think we are in, or are approaching, the exciting state where these challenges are reasonably well, if not completely, resolved. When we get there, we will be able to focus more strongly on formulating interesting new biological or clinical research questions and experimental designs, and to tackle the highly fascinating question discussed above: how best to generate new biological knowledge from the available data. Personally, I am convinced that we will be most successful in this regard if we generate high-quality, highly reproducible data across large numbers of replicates, and it seems that proteomics is essentially at the point of achieving this.
MC: Your most recent paper adopted a multi-omics approach to explore heterogeneity in HeLa cells across laboratories. Why was a multi-omics approach advantageous over other techniques in this instance, and how significant were your findings for the field?
RA: We undertook the study for two reasons. First, we wanted to make a fact-based contribution to the discussion about the reproducibility of research results in the life sciences, and second, we wanted to generate a presently unique multi-layered data set to explore how genomic variability affects the different layers of gene expression along the central dogma.
With respect to the first question, we found that HeLa cells used for experimentation in different labs differ significantly in their molecular makeup and that these differences render the cells phenotypically different. We also discovered that cells cultured in the same lab change over time. These phenomena are the result of genomic drift. Combining this with the results of community benchmarking studies that we and others have undertaken over the past few years to assess the technical reproducibility of various aspects of MS-based proteomic methods, we now conclude that proteomics has reached a state where technical (and computational) reproducibility is very high. So any poor reproducibility of results that is observed is likely to be rooted in the complexity of biological systems.
With respect to the second question we discovered that the quantitative results at each measured layer, i.e. the way and extent to which the cells respond to genomic alterations (a situation that is similar to that in cancer cells), correlate to some extent along the path of gene expression but not strongly enough to make one layer predictive of the other. We also discovered that the response to copy number variation in specific gene loci was significantly buffered at the level of protein complexes. Excess protein that is synthesized due to higher ploidy at a locus tends to be degraded if it cannot associate with its intended complex partners. This mechanism contributes to protein homeostasis at the level of the modular proteome.
MC: Systems biology is evolving at a phenomenally fast rate. Having worked in the field for several decades, what do you envision for the future of proteomics?
RA: I envision a vastly increasing significance of proteomics in systems biology, for two main reasons, both of which have been addressed above. The first is that proteomics has reached a level of maturity where large and high-quality datasets can be generated with relative ease and at a moderate cost. We have witnessed in the field of genomics that robust and accessible high-throughput technologies are strongly transforming the life sciences. The second reason is that the different types of proteomic data that can now be generated contain a wealth of information that we have yet to learn to fully understand. In short, biology and medicine are essentially about function and phenotypes, and these are strongly determined by the composition and modular organization of the proteome, a state that we describe with the term “proteotype”.
Ruedi Aebersold was speaking with Molly Campbell, Science Writer, Technology Networks