The “Streetlight Effect” in Proteomics
An underlying problem in proteomics research
Our current understanding of human, animal and plant biology is largely derived from the insights provided by studying the DNA code. However, this code is just one of the integral components of biology’s central dogma. DNA must be read and converted to proteins, the “workhorses” of the cell, responsible for coordinating and conducting specific functions.
The introduction of high-throughput technologies, bioinformatic tools and artificial intelligence (AI)-based methods has advanced the field of proteomics in recent years. While not yet “in the clinic” so to speak, the study of proteins expressed in healthy or diseased states is guiding the development of diagnostic biomarkers, the identification of drug targets and the production of novel biopharmaceuticals. Across the broader life sciences, the applications of proteomics are numerous and varied.
“Proteomics has been transformed from an isolated field into a comprehensive tool for biological research that can be used to explain biological functions” – write Yahui Liu et al.
The future of proteomics is no doubt bright. However, a commentary article published in Nature Methods by Kustatscher et al. earlier this year brought attention to an underlying problem in the field: some proteins are getting more research attention than others.
The publication states that an estimated 5,000 proteins (approximately 25% of the human proteome) account for 95% of all life science publications. Most of these proteins were already known to the scientific community before the Human Genome Project era. Tumor protein p53, sometimes nicknamed the “guardian of the genome” due to its role in DNA repair and cell division, is one of the most frequently studied proteins. “One of the many chilling statistics revealed is the fact that p53 is the subject of 2 publications per day,” says Kathryn Lilley, professor of cellular dynamics at the University of Cambridge and a co-author of the publication.
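To put the cited figures in proportion, here is a back-of-the-envelope sketch in Python. The ~20,000-protein size of the human proteome is an assumed round figure, not a number from the article; the other values are the approximate statistics quoted above.

```python
# Back-of-the-envelope check of the publication skew described above.
# HUMAN_PROTEOME_SIZE is an assumed round figure (~20,000 protein-coding genes);
# the other numbers are the approximate statistics quoted in the article.
HUMAN_PROTEOME_SIZE = 20_000
WELL_STUDIED = 5_000          # proteins attracting most research attention
PUBLICATION_SHARE = 0.95      # share of life science publications they account for

fraction_of_proteome = WELL_STUDIED / HUMAN_PROTEOME_SIZE
print(f"{fraction_of_proteome:.0%} of the proteome accounts for "
      f"{PUBLICATION_SHARE:.0%} of publications")
# → 25% of the proteome accounts for 95% of publications
```

The arithmetic makes the mismatch concrete: roughly a quarter of known human proteins absorb nearly all of the field’s attention, leaving the remaining three quarters in the dark.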
Why does this annotation bias exist?
This inequality in protein annotation occurs due to a variety of different factors, Lilley explains: “Firstly, there are practical reasons why a protein might remain unannotated. This could be down to the fact that it is expressed at low levels and therefore rarely ‘measured’ in an experiment.”
Extremely small proteins, or those that possess certain properties (such as being hydrophobic), can prove challenging for even the most sophisticated analytical technologies. Other proteins – sometimes called “fleeting proteins” – adopt unstable states that exist for only a fraction of a second yet play key biological roles, and are unlikely to be captured in most studies.
“It could be that its corresponding gene or transcript does not appear as ‘interesting/significant’ in genomics studies, or it is not associated with any disease states. Moreover, it may be that the protein does not resemble any other protein in terms of likely domain structure, well-documented motifs or clear evolutionary trajectory,” Lilley says.
She describes the non-practical reasons as being “less palatable” to her mind: “There is security in numbers in scientific research. If a protein is well-studied, there may be more resources available which can be shared amongst different groups. If a protein is perceived to be of great interest by the scientific community, there is more chance of having research outputs published via high impact mechanisms, leading to high citation and subsequently a greater chance of continued funding.”
This cycle perhaps isn’t unique to the field of proteomics and speaks to wider issues within scientific research. But in this instance, it’s fueling what Lilley calls a “self-perpetuating microcosm of the well-studied proteome” at the expense of taking risks.
“When studies unearth sets of proteins that require further investigation, it is frustrating to trawl the literature only to find that historically such proteins have been ignored – many deemed simply not of significant interest to pursue, not trendy enough to attract funding, or generally considered to be a bit ‘dull’,” – Lilley.
Why are understudied proteins problematic?
Bias towards well-studied proteins limits our knowledge of cellular function and dysfunction, and ultimately hinders progress across life science research. “The understudied proteome contains many examples of proteins essential for proliferation, a key cellular process whose aberrant function underpins many diseases, cancer being the most pertinent in many avenues of research. This bias will extend to most cellular processes, and hence without functional annotation of this subset of proteins, we’ll have little to no chance of fully understanding how cells work,” says Lilley.
Many of the drugs used to treat human diseases target proteins. Data from the DrugBank database suggests that the entire collection of drugs approved by the US Food and Drug Administration (FDA) target 620 proteins in total, including transporters, enzymes, ion channels and receptors. “The understudied proteome contains a considerable number [of proteins] that are expected to be druggable,” says Lilley.
To create a new drug, there are various stages of preclinical and clinical development required. Bench research and preclinical trials are reliant on models that enable scientists to interrogate the drug’s function in vitro and in vivo. However, if our basic knowledge of cellular mechanisms is flawed, our models could be too. “Knowledge of the function and role in disease of this considerable subset of the proteome may result in a step change in drug discovery going forward,” Lilley notes.
The Understudied Proteins Initiative
Kustatscher and colleagues have brought the scale of the issue to light – but how do we tackle it? A change is clearly required within proteomics approaches to bring the perpetuating cycle to a halt. The Understudied Proteins Initiative, a novel Wellcome Trust-funded initiative developed by Kustatscher et al., outlines a solution: a coordinated effort from the functional proteomics community. The initiative suggests that sufficient data be gathered on an understudied protein – perhaps on its interactions, localization or expression – such that hypotheses on its function can be made. “In an ideal world, researchers could carry out some systems-level functional assays, where every protein is tested for a specific function. A good example of this is testing whether a protein binds RNA or not. There are many routine methods to carry out such a functionality screen, and they can also be applied across many conditions; some proteins may only bind RNA under a certain set of circumstances,” explains Lilley.
Using this functional data, it would then be easier to clarify which field or laboratory is best suited to conduct further, detailed studies of that protein. In essence, the task is divided into two parts: large-scale pre-characterization by omics scientists, followed by focused molecular biology studies. “More systems-wide studies will need agreement of the biological system, sets of conditions tested, sharing of resources and a holistic set of methods to ‘prod and poke’ the understudied proteome,” says Lilley. “What will be particularly essential will be data sharing, curation, integration of databases and creation of dynamic cellular models. Resources such as MuSIC 1.0, a hierarchical map of the cell from the Ideker lab, would be a very good starting point.”
She continues, “As a word of caution, however, the task at hand is almost incalculable in size. We have yet to adequately compute the size of the proteome. If one takes into consideration the number of proteoforms that may exist – in other words, the number of distinct chemical entities arising through post-transcriptional and post-translational processing, and the likely combinatorial nature of this processing – the size of the proteome expands by multiple orders of magnitude.”
No matter the size of the anticipated challenge, a start must be made somewhere. The Understudied Proteins Initiative published an open invitation to researchers, outlining its “roadmap” for the project. As a first step, an openly accessible survey has been launched: it presents a randomly selected human protein and asks the user to assign it an annotation level. Next, the survey asks the user to describe which tools, resources and considerations they would put forward for that assessment.
“Based on the responses to the survey, we aim to define the challenge for a community effort to tackle protein annotation bias. We will present and discuss the results in a workshop,” the initiative leaders state. Core questions to be addressed during the workshop include:
- What new information on an uncharacterized protein would spark detailed mechanistic studies?
- What tool(s) would provide that information?
- How could a consortium be structured?
- How would the information efficiently reach molecular biologists to instigate change?
Some of the greatest triumphs in science have been based on taking a potential risk. It seems imperative – arguably now more than ever – that researchers feel confident and comfortable pursuing studies on lesser-known or less understood proteins, irrespective of the anticipated analytical challenge or the perception that the protein is “dull”. Who knows what we might find – maybe solutions to some of the most challenging scientific conundrums of our time?
The Understudied Proteins Initiative is leading the way and encourages the community to get involved by participating in the survey and spreading the word.
“By providing a basic molecular characterization of all proteins, the Understudied Proteins Initiative will catalyze mechanistic investigations of understudied proteins, drive new biomedical research, and boost our understanding of the human proteome and its role in disease,” – The Understudied Protein Initiative.