Biologists still do not know exactly how many active protein-coding genes humans and other organisms have, even though the genomes of some species have been completely sequenced. This is because many of these genes and their protein products have so far only been predicted by computer algorithms, which remain imperfect.
The field of proteomics aims to discover all the proteins produced by a given organism. Such a proteome map would make it possible to deduce the precise number of protein-coding genes and their locations in the genome.
This is far more complex than mapping the genome from end to end, because it involves detecting every protein, even though some are present only in very small amounts, some are confined to specific organs, and some are synthesized only at certain times or stages of an organism’s life.
However, a recent workshop supported by the European Science Foundation (ESF) concluded that it is now feasible to map nearly the whole proteome (the sum total of all proteins) of an organism. Such an extensive map will be an essential foundation for developing, and eventually applying widely, a new generation of proteomics technologies that are faster, more sensitive and more reliable than present methods.
These technologies, in turn, could greatly advance the understanding of many diseases and lead to new therapies, according to the ESF workshop’s coordinator, Professor Rudolf Aebersold of the Institute of Molecular Systems Biology in Switzerland.
Most diseases, including cancer and many pathogenic infections, involve disruption of regulatory processes in cells or tissues, with associated changes in the abundance of proteins and in their interactions, Aebersold pointed out. “The idea would be that if we could map out the whole proteome, we could develop a toolbox structure enabling assays (for detecting proteins) to be done faster and more cheaply.” It would then be easier to identify the proteins implicated in a particular disease, helping both research into its underlying causes and, ultimately, diagnosis.
Mapping the proteome will also help resolve one of biology’s more recent puzzles: why all organisms, from the simplest to the most complex, contain significantly more predicted protein-coding genes than experimentally detected proteins.
The shortfall is substantial: in almost all organisms, only about 65 per cent as many proteins have been detected as had been predicted from analysis of the genome. This is a surprise because each gene had been understood to code for a protein on a one-to-one basis.
As Aebersold pointed out, there are several possible explanations for this apparent anomaly. The simplest is a lack of sensitivity in proteomics methods, which may therefore have so far failed to identify specific classes of proteins. However, as proteomics methods become ever more sensitive, this explanation is becoming increasingly unlikely.
A second possibility is that there are fewer coding genes than had been thought. The number has not yet been reliably counted and is currently estimated by computer predictions based on the sequence structure of genes already known. A third possibility is that there are as many genes as had been thought, but that either a substantial number code for proteins expressed in cells or tissues only very rarely or under specific conditions, or the mRNA, the intermediate product between the DNA of the genome and the protein, is not translated into the final protein product.
Aebersold hopes that the workshop will lead to a major EU-funded project to map complete proteomes, answer some of these questions, and arrive at a more accurate count of how many genes there are in humans and other organisms. “If something can be identified as a protein, that’s the most direct evidence we can have that a gene really exists,” said Aebersold. It will, however, be hard to tell when the job is done, because the proteome, unlike the genome, is not a stable, well-defined physical entity.
While the genome consists of the famous double helix of DNA, which can be physically sequenced from end to end, the proteome is simply the total of all proteins. By definition, therefore, one can never be absolutely sure that the last protein has been found, since some are present only intermittently or in small amounts and can easily be missed during analysis.
Mass spectrometry is used to perform this analysis and identify proteins in complex samples, after a technique such as chromatography has been applied to separate out the individual proteins. After separation, the spectrometer measures the ratio of mass to electric charge of the ionized molecules, and from these measurements the amino acid sequence of each protein can be determined.
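The relationship between mass and charge that the spectrometer records can be illustrated with a short calculation. The Python sketch below computes the mass-to-charge ratio (m/z) of a small peptide carrying one, two or three extra protons. It is an illustration only, assuming standard monoisotopic residue masses and an unmodified peptide; the function names are hypothetical, and real identification software compares many such values, together with the masses of peptide fragments, against sequence databases rather than relying on this forward calculation alone.

```python
# Standard monoisotopic residue masses in daltons (unmodified amino acids only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once for the free N- and C-termini of the chain
PROTON = 1.007276  # mass of each proton that carries one positive charge


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide (hypothetical helper)."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


def mass_to_charge(sequence: str, charge: int) -> float:
    """m/z of the ion [M + zH]^z+, i.e. the value a spectrometer would record."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (peptide_mass(sequence) + charge * PROTON) / charge


if __name__ == "__main__":
    # The same peptide appears at a different m/z for each charge state.
    for z in (1, 2, 3):
        print(f"PEPTIDE z={z}: m/z = {mass_to_charge('PEPTIDE', z):.4f}")
```

Because one peptide can show up at several m/z values depending on how many charges it carries, spectra have to be interpreted rather than simply read off, which is part of what makes large-scale protein identification demanding.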
As Aebersold noted, the biggest prize of proteomics will be a much simpler and more efficient technique for identifying individual proteins within samples, which could eventually have enormous diagnostic power as well as applications in research and across the whole field of biotechnology and pharmacology. The ESF workshop has prepared the ground by helping to establish a collaborative framework for Europe to participate fully in the great proteomics initiative, and it is now up to the EU to provide the funding.
The workshop Model Organism Proteomics, one of the series of ESF Exploratory Workshops, was held on 11-13 April 2007 in Zurich, Switzerland. Delegates were given a thorough overview of the state of the field and of Europe’s place in it, with details of various collaborations.