Towards a Deeper Understanding of the Proteomic Landscape
A deep understanding of the proteomic landscape of both healthy and diseased cells, tissues and organs is required for the development of robust clinical protein biomarkers. At a recent London Proteomics Discussion Group Meeting, Professor Roman Fischer kicked off the event by addressing the apparent "failure" of proteomics to deliver new clinical biomarkers thus far: "Approximately two new proteins each year are added to those that can be utilized as clinical biomarkers." He continued, "Modern proteomics has not significantly impacted this number."
Why? Referencing Joshua LaBaer, Fischer puts forward the following factors:
- It is hard to find differences that are predictive
- It is very hard to find predictive markers in accessible fluids
- It is ridiculously hard to find accessible predictive markers that are not affected by related diseases
Fischer's presentation then outlined his group's work on high-throughput proteomics and on optimizing a laser-capture microdissection (LCM)-based method for spatial, cell-type-resolved proteomic analysis of neurons isolated from post-mortem human brains. Their strategy is available as an open-access article in the Journal of Proteome Research.
Fischer emphasized that spatial proteomics will be important for studying and understanding tumor biology by exploring the proteome of the tumor environment. His laboratory has applied its method in this context, taking incredibly thin slices from brain tissue samples, including tumor and non-tumor tissue, and exploring the landscape of protein expression.
This experiment produced 96 samples, which equated to almost 4,000 protein spatial maps and eight days of data acquisition time. Accompanying the oral presentation, Fischer provided a playful handout of 3D plastic printouts of the work. The expression of certain proteins, including peripherin, hemoglobin, histone and glycogen phosphorylase, was represented by an increase in layering of the plastic that you could physically touch. A sensory treat!
A 3D plastic printout of hemoglobin in a sample, represented by an increase in layering of plastic. Credit: Roman Fischer.
At the event, Technology Networks sat down with Fischer to discuss spatial proteomics, and to expand on some of the key aspects of his research.
Molly Campbell (MC): For our readers who may be unfamiliar, can you tell us briefly what spatial proteomics is, and the aims of this research area?
Roman Fischer (RF): One of the major issues when people want to conduct clinical proteomics research on tissues is that they ultimately analyze a protein out of context. They typically use punch biopsies, which are mushed up and analyzed. Using this approach, we lose the spatial context of the proteins, i.e. where they are in relation to blood vessels, for example. This becomes very important when you want to look at, say, tumor biology, as the blood supply brings nutrients to the tumor. To better understand the underlying mechanisms at play, the spatial context is required. This is not limited to tumor biology research; it's also applicable to organs. For example, a kidney has a large variety of different cell types, so it is critical to pin down where certain proteins are and explore how their expression relates to the spatial context and function. The next step is to expand this approach from 2D and create a 3D proteome.
MC: Your work has focused largely on exploring the proteome of brain cells. Why?
RF: Apart from the fact that brain tissue and cell types are still not well understood, I don't have a fancy biological reasoning. However, at the moment these are the low-hanging fruits, not because the tissue is available (brain tissue is really hard to obtain), but because the neurons are quite large. From a practical point of view, we look at motor neurons that have a large diameter when compared to other cells. They're easier to work with. This approach is a precursor, really, for where we want to go with spatially resolved, deep proteome analysis, where we don't look at specific brain cells but at areas and 3D structures.
MC: You presented some interesting data today. Can you tell us more about this, specifically the LCM-method?
RF: Some of this data was published last year: we compared Betz cells and Purkinje cells. They have different functions in motor coordination, and Betz cells are rare and hard to get hold of, so no one had really looked at their proteome. To have sufficient amounts for study, you need a lot of brain tissue to analyze. With our optimized methods we can get a deep(ish) proteome of 4,000 proteins from 200 of these cells. But these are not total cells; if you consider the fact that the cells have a diameter of 200 μm, we only look at a 10 μm thick slice of that, so basically we do not have even 10% of one of the cells. 200 of these slices were required to take a look at 4,000 proteins. A depth of 4,000 proteins is sufficient to learn about the function of the type of cells you are comparing. For example, the Betz cells were found to have more abundant energy-metabolism-related proteins, which can be directly linked to their biological function, as they need to fire nerve impulses much more frequently than the Purkinje cells they were compared with. I like to call this portion of the proteome that can be directly linked to the phenotype of a cell the “Pheno-Proteome”.
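The "not even 10%" figure can be sanity-checked with a little geometry. The sketch below is illustrative only and is not part of the published method: it assumes a perfectly spherical cell 200 μm across and a 10 μm slice cut through its center (the best case), and the function name `central_slab_fraction` is made up for this example.

```python
import math


def central_slab_fraction(diameter_um: float, slice_um: float) -> float:
    """Volume fraction of a sphere captured by a slab centered on its equator.

    Assumption: the cell is an ideal sphere and the slice passes through
    the center. Real neurons are not spheres, and off-center slices
    capture less than this.
    """
    r = diameter_um / 2
    h = slice_um / 2
    # Integrate the disc area pi * (r^2 - z^2) for z from -h to +h.
    slab_volume = 2 * math.pi * (r**2 * h - h**3 / 3)
    sphere_volume = 4 / 3 * math.pi * r**3
    return slab_volume / sphere_volume


# Fraction by thickness alone: 10 um out of 200 um.
linear_fraction = 10 / 200

# Fraction by volume for a central slice of an idealized spherical cell.
volume_fraction = central_slab_fraction(200, 10)

print(f"by thickness: {linear_fraction:.1%}")
print(f"by volume (central slice): {volume_fraction:.1%}")
```

Under these assumptions a single slice holds roughly 5% of the cell by thickness and about 7.5% by volume, so both readings sit below the 10% mentioned in the interview.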
MC: Your lab is involved in a wide range of collaborations, are you able to tell us about any that are particularly exciting areas of research?
RF: My own work is focused on the spatial proteomics space and high-throughput proteomics, because the two almost go hand-in-hand. We also work on addressing the fact that more and more research groups are interested in running large clinical cohorts in an unbiased way. Previously there would be a lot of "cherry picking" of samples when studying certain diseases, as throughput in proteomics would not allow you to analyze more than 100-200 samples. This pre-selection introduces bias. In order to avoid this bias, I am trying to convince my collaborators that, really, they should analyze all samples available. Not so long ago, this was not possible due to a lack of robust high-throughput capable LC-MS instrumentation.
This has changed recently, so we can easily analyze several hundred to thousands of samples on a single LC-MS platform within days. We have recently tested the feasibility of running large cohorts on a sample set of more than 2,500 undepleted plasma samples. Including quality controls and samples for spectral library generation, we ran a total of 4,500 injections on an LC-MS platform consisting of a Bruker timsTOF Pro connected to an Evosep One LC. On that platform we did not experience any of the problems very often associated with clinical samples, such as degrading performance or robustness. At 100 samples per day, a project that not so long ago would have taken a year of instrument time is suddenly doable.
At the moment we're thinking about how we could use these workflows to provide standardized proteomics analysis to large sample collections such as the UK Biobank, which is an enormous challenge, but I think we are well on the way to solving some of the methods-based issues.
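The throughput figures quoted above lend themselves to a quick back-of-the-envelope check. This is a rough sketch, assuming the instrument runs continuously with no downtime, maintenance or re-injections; the helper name `instrument_days` is hypothetical.

```python
import math


def instrument_days(injections: int, per_day: int) -> int:
    """Whole days of instrument time needed at a fixed daily throughput."""
    return math.ceil(injections / per_day)


total_injections = 4500  # samples + quality controls + library runs, as quoted
days_needed = instrument_days(total_injections, 100)

# Daily rate implied if the same run had filled a full year of instrument time.
implied_older_rate = total_injections / 365

print(f"{days_needed} days at 100 injections per day")
print(f"about {implied_older_rate:.1f} injections per day over a year")
```

On these assumptions the 4,500 injections fit into 45 days, which is where the contrast with "a year of instrument time" comes from.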
MC: Are you able to expand on these methods-based issues – what are the biggest challenges?
RF: This is quite interesting because we ran into bottlenecks that we didn't quite anticipate, especially when you come from smaller sample numbers such as 20 or even 200 samples, where you can do most things manually. But when you have 2,000 samples you run into problems, such as samples being provided in screw cap tubes. Now you need someone to manually unscrew 2,000 screw cap tubes and pipette a small volume of liquid out of each one to convert it into a format that you can then use for further processing. This is a problem that I didn't quite anticipate. It is not a new problem, but in the proteomics field in particular there are not many solutions yet. At the moment we're stuck in a semi-automatic processing space.
A problem related to this is that we have to work with barcoding, and that is something that we haven't really addressed yet in an academic lab. When you suddenly have so many samples, you need to track where each sample is going and when, and what has been done with it. None of these problems are really new, and they have been addressed in other research areas such as metabolomics. However, for a small academic proteomics lab, those challenges provide a steep learning curve. The next big one is data analysis and integration with other -omics and clinical data, for which we don't really have a solution yet.
This might sound like a problem of the past because a lot of software (commercial and free) is available to analyze proteomics data, but these tools don't really allow for large numbers of samples, and by that, I mean even 500 samples.
With this software you can almost analyze the data at the speed at which we acquire it, which is great, but this only applies up to a certain threshold, which seems to be approximately 500 samples. If you have a sample set of 2,000-4,000, the software simply doesn't cope, and this includes software that has been tuned for speed. With software that is not tuned for speed, the whole thing becomes unfeasible because we are 10 times slower at analyzing the data than we are at acquiring it. This can delay timelines by months, which is an even greater problem when you want to extend analysis to large sample cohorts such as the biobanks. Although there is rapid progress in the primary data analysis, we are only beginning to address data integration at this scale.
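To see how a tenfold gap between acquisition and analysis turns into months of delay, here is a minimal sketch. It assumes the 100-samples-per-day acquisition rate mentioned earlier also applies here, that analysis starts alongside acquisition, and that analysis time scales linearly with sample count. None of these assumptions come from the interview's analysis discussion, and `analysis_delay_days` is a made-up helper.

```python
def analysis_delay_days(samples: int, acquired_per_day: float, slowdown: float) -> float:
    """Extra days spent waiting for analysis after acquisition has finished.

    Assumptions: analysis begins together with acquisition and takes
    `slowdown` times as long as acquiring the same data.
    """
    acquisition_days = samples / acquired_per_day
    analysis_days = acquisition_days * slowdown
    return analysis_days - acquisition_days


# Cohort sizes from the 2,000-4,000 range mentioned in the interview.
for cohort in (2000, 4000):
    delay = analysis_delay_days(cohort, acquired_per_day=100, slowdown=10)
    print(f"{cohort} samples: about {delay:.0f} days behind acquisition")
```

Under these simplified assumptions a 2,000-sample cohort would finish analysis about 180 days after acquisition ends, which is in line with the delay of months described above.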
Roman Fischer was speaking to Molly Campbell, Science Writer, Technology Networks.