The ENCyclopedia Of DNA Elements (ENCODE), an international research consortium organized by the National Human Genome Research Institute (NHGRI), part of the National Institutes of Health (NIH), has published in the journal Nature the results of its exhaustive, four-year effort to build a parts list of all biologically functional elements in 1 percent of the human genome.
The analysis was led by the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI), drawing on expertise from 35 groups from 80 organizations around the world. The project served as a pilot to test the feasibility of a full-scale initiative to produce a catalog of all components of the human genome crucial for biological function.
The findings promise to reshape our understanding of how the human genome functions. They challenge the traditional view of our genetic blueprint as a tidy collection of independent genes, pointing instead to a network in which genes, regulatory elements and other types of DNA sequences interact in complex, overlapping ways.
“By integrating 200 datasets generated by various high-throughput methods we now have a very good idea what 1 percent of our DNA might be doing. Our results reveal important principles about the organization of functional elements in the human genome, providing new perspectives on everything from DNA transcription to mammalian evolution. In particular, we gained significant insight into DNA sequences that do not encode proteins, which we knew very little about before,” said Ewan Birney, Ph.D., head of genome annotation at EMBL-EBI, who led ENCODE’s massive data integration and analysis effort.
The ENCODE consortium’s major findings include the discovery that the majority of human DNA is transcribed into RNA and that these transcripts extensively overlap one another. This broad pattern of transcription challenges the long-standing view that the human genome consists of a small set of discrete genes, along with a vast amount of junk DNA that is not biologically active.
The new data indicate that the genome contains very little unused sequence; genes are just one of many types of DNA sequences that have a functional impact. The consortium identified many previously unrecognized start sites for transcription, as well as regulatory sequences that, contrary to traditional views, are located not only upstream but also downstream of transcription start sites.
Other surprises in the ENCODE data have major implications for our understanding of the evolution of genomes. Until recently, researchers had thought that most DNA sequences with important biological function would be constrained by evolution, making them likely to be conserved as species evolve. But about half of the functional elements in the human genome do not appear to have been constrained during evolution, suggesting that many species’ genomes contain a pool of functional elements that provide no specific benefits in terms of survival or reproduction.
“This impressive effort has uncovered many exciting surprises and blazed the way for future efforts to explore the functional landscape of the entire human genome,” said NHGRI Director Francis S. Collins, M.D., Ph.D. “Because of the hard work and keen insights of the ENCODE consortium, the scientific community will need to rethink some long-held views about what genes are and what they do, as well as how the genome’s functional elements have evolved. This could have significant implications for efforts to identify the DNA sequences involved in many human diseases.”
In addition to coordinating the analysis and integration of the ENCODE data, EMBL-EBI researchers, in collaboration with the BioSapiens Network of Excellence (NoE), have investigated how RNA transcripts are processed in human cells as part of the ENCODE effort.
In the 27 March issue of PNAS, they reported that alternative splicing, the phenomenon in which the same RNA transcript can be cut at two or more different positions to make different products, is very common in humans. It is unlikely, however, that alternative splicing adds substantially to the variety of functions and structures among proteins.
Over the next couple of years, the ENCODE project will be scaled up to the entire genome. The Ensembl project, a joint EMBL-EBI and Sanger Institute project jointly headed by Ewan Birney, has already generated some initial genome-wide datasets and integrated them with early full-scale datasets. This integration has led to the identification of just over 110,000 regulatory elements across the human genome. In parallel, the BioSapiens NoE is creating a pipeline for the systematic annotation of the proteins potentially produced by alternative splicing throughout the human genome.
“The collaboration with the ENCODE project holds great potential for new discoveries by the BioSapiens network,” said Professor Janet Thornton, BioSapiens coordinator.
“The goal for the next five years is delivering a more complete understanding across our genome,” said Birney. “The ENCODE pilot project is the first step towards this goal.”