Dr Evangelia Petsalaki is a Group Leader at the European Bioinformatics Institute (EMBL-EBI), where her research team study human cell signaling in health and disease.
The Petsalaki group uses interdisciplinary approaches, including data-driven network inference, modeling of cell processes and data integration to understand how different environmental or genetic conditions affect cell signaling responses leading to diverse cell phenotypes. Their aim is to make both predictive and conditional models so they can anticipate what might happen in a biological network under different conditions.
The research group also collaborate with experimental teams that specialise in mass spectrometry (MS), imaging and cell biology to enhance their data sets and validate their models. Such models are being designed to help researchers answer specific biological questions, such as how stem cells "decide" what type of cell they will become, and what is the effect of cell signaling on cell shape and migration (i.e. where it "goes" in a tissue or organ).
Molly Campbell (MC): What do you regard as being the most exciting breakthrough in proteomics research since the field’s conception?
Evangelia Petsalaki (EP): Proteomics in the last 10 years has been galloping on all fronts. I don’t think that there is a single most important breakthrough. Rather, the entire field has managed to develop technologies and methods that have allowed unprecedented views into the proteome of the cells, from a very large array of conditions and sample types, from cells, to patient samples, and everything in between. If I had to choose one, the SWATH technology developed in the Aebersold group at the Institute of Molecular Systems Biology at ETH Zurich, really provides in-depth quantitation of entire proteomes.
However, the technology I am most excited about is not quite ready to be called a breakthrough yet, but I expect it to be revolutionary in the future. I am talking about the work from Swaminathan et al. from the Marcotte group at the University of Texas at Austin, published last year in Nature Biotechnology. Using Edman degradation, they were able to identify proteins from protein mixtures. They still have a lot of issues to overcome before this technology works at scale and on protein lysates, and is affordable. But when this is achieved, we are looking at a revolution in proteomics, where accurate, comprehensive proteomes and respective phosphoproteomes (and other post-translational modifications) can be measured effectively, similar to the way that genomes, transcriptomes and epigenomes are measured now.
MC: Your research group studies human cell signaling with the aim to understand what controls different cell responses. Why is it useful to study this area from an omics approach, particularly with a focus on phosphoproteomics and proteomics?
EP: Cell signaling represents the set of processes that define how a cell will respond to perturbations in its environment or messages from other cells. These processes are critical in the cell and their deregulation leads to many diseases, including cancer, which is in fact largely a signaling disease. Because of their importance they have been studied for many years. Most of our knowledge comes from very detailed studies done a long time ago, where signaling pathways were discovered and annotated.
Since the "high throughput" era began we have made some additions to these pathways, but we are still heavily relying on these initial annotations. While they have provided amazing contributions to the field and our knowledge, there are two issues with them. The first is that they represent the "average" pathway. However, cells respond differently to different conditions even if they activate similar "pathways". Therefore, assuming that pathways always have a specific structure regardless of the cell type or condition is an oversimplification. “Omics” approaches can help us fit these pathways to the observed data and adjust them or, even better, use data-driven approaches to extract them directly from the data.
The second problem is that, as these pathways were discovered with very small and detailed studies, they cover a very small space of the actual signaling networks in cells. “Omics” data opens the door to exploring the rest of this space. As signals in the cell are transmitted largely through a relay of protein phosphorylation, proteomics, and phosphoproteomics in particular, represent the actual signaling state of the cell at the time of measurement. They are therefore the ideal type of “omics” data to study these types of processes.
MC: As a computational lab, what approaches do you use to interpret phosphoproteomics data?
EP: First of all, we aim to use data-driven approaches. This means using statistical approaches to extract patterns from the data without restricting the analysis to what is already known about the system. The reason for this is that the majority of knowledge in cell signaling is accumulated around a handful of very well studied kinases and pathways. If you think of the cell as Europe, we only have a limited map for Portugal and a bit for Spain, and the rest of Europe is uncharted. Currently, most studies try to venture just a little off the map but still stay very close to the charted territory.
Since we collect the data from the entire cell (i.e. snapshots of the entirety of Europe), if we only restrict our study around previously known information, then we are ignoring an entire world of potential new discoveries. The other focus of the group is on integrating the phosphoproteomics data with other "omics" datasets that can provide information on other layers of cell regulation. To go back to the map analogy, imagine it being like getting different types of pictures of Europe, including the roads, the mountains etc. Integrating different types of information can give us a more complete picture of how cells work.
MC: What challenges do you encounter when handling proteomic data? How can these challenges be overcome?
EP: The two major challenges are that the data is very sparse, and that we have trouble measuring low abundance proteins. So, every time we take a measurement, we sample different parts of the proteome or phosphoproteome, and we are usually missing low abundance players that are often the most important ones, such as transcription factors.
In my group, one approach to mitigate this issue is to map the identified peptides on protein interaction networks and diffuse the signal on this network. This reduces the noise from spuriously identified proteins and enhances the functional signal. It also allows us to observe regions of the network that are highlighted by the different datasets and compare and study these, instead of trying to compare the sparse datasets between each other.
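As a rough illustration of what "diffusing the signal" over an interaction network can look like, here is a minimal sketch using a random walk with restart, one common network propagation scheme. It is not necessarily the method the Petsalaki group uses. The function name `diffuse`, the `restart` parameter, and the five-protein toy network with placeholder names A to E are all assumptions made for this example, and the sketch assumes every protein in the network has at least one interaction partner.

```python
def diffuse(adjacency, seed_scores, restart=0.5, iterations=100):
    """Random walk with restart over an undirected interaction network.

    adjacency: dict mapping each protein to the set of its interaction partners
    seed_scores: dict mapping measured proteins to their initial signal
    (Hypothetical helper for illustration only.)
    """
    nodes = sorted(adjacency)
    total = sum(seed_scores.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("at least one measured protein must be in the network")
    # Normalise the measured signal so that the starting scores sum to 1.
    start = {n: seed_scores.get(n, 0.0) / total for n in nodes}
    scores = dict(start)
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # Each neighbour passes on an equal share of its current score.
            incoming = sum(scores[m] / len(adjacency[m]) for m in adjacency[n])
            # Mix the propagated signal with the original measurement.
            updated[n] = (1 - restart) * incoming + restart * start[n]
        scores = updated
    return scores


# Toy linear network A - B - C - D - E; only A and C were "measured".
network = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
scores = diffuse(network, {"A": 1.0, "C": 1.0})
for protein in sorted(scores, key=scores.get, reverse=True):
    print(protein, round(scores[protein], 3))
```

In this toy case the unmeasured protein B, which sits between the two measured proteins, ends up with a higher score than the more distant D and E. That is the kind of effect that lets sparse datasets be compared through the network regions they highlight rather than protein by protein.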
However, with the advances in MS technologies developed by many companies and groups around the world, including the Mann group at the University of Copenhagen and the Aebersold group, and with emerging technologies that promise to allow "sequencing" of proteomes, analogous to genomes, developed by the Marcotte group and colleagues, I expect that these will not be issues for very long.
MC: You recently published a paper titled “Allosteric Modulation of Binding Specificity by Alternative Packing of Protein Cores”. Your research group suggested that your findings could be used to engineer proteins with novel functions. Please can you expand on this?
EP: This is a project completed during my postdoc time in Toronto and the lead on this is Dev Sidhu, at the Donnelly Center of the University of Toronto. He is a wizard in protein engineering and has done very important work in the field. In this paper we showed that modifying amino acid residues from the core of the protein provided conformational flexibility to the protein, resulting in changes in its ability to recognise specific ligands and even the binding site for these ligands.
This has direct implications for the protein's function and its effect on the cell’s functions. I am not a protein engineering expert but, as far as I know, modifications on the surface of the protein are typically used to modulate its ability to bind different ligands. Our finding shows that modifications in the core can provide structural flexibility and therefore more options as a starting point for engineering specific binding properties. By understanding the effects that changes in the protein core have on the protein surface and its binding properties, we can engineer proteins to have additional or modified functions.
EP: I think that despite all the advances with data generation, analysis and integration methods, an approach or set of approaches to truly integrate these data and generate testable hypotheses to push the boundaries of our knowledge forward is still elusive.
I am excited about efforts to create whole cell models that are happening in different groups around the world, such as the Covert and Karr groups, at Stanford University and the Mount Sinai School of Medicine respectively, and others in Japan, and we are also joining that effort now.
I think that combining true data integration efforts with executable models of cell function will provide breakthroughs in our understanding of how cells work, what is wrong in disease, why different human cells (either same human different cell types, or same cell type and different humans) respond differently to drugs, and many other important questions in biology and medicine.
Evangelia Petsalaki was speaking with Molly Campbell, Science Writer, Technology Networks.