The study, published June 9 in Nature Genetics, provides evidence for a 40-year-old hypothesis: because humans and chimpanzees differ so little in the proteins their genes produce, gene regulation must play an important role in evolution. Indeed, human and chimpanzee proteins are more than 99 percent identical.
The researchers showed that the number of evolutionary adaptations in transcription factor binding sites, the part of the machinery that regulates genes, may be roughly equal to the number of adaptations in the genes themselves.
“This is the most comprehensive and most direct analysis to date of the evolution of gene regulatory sequences in humans,” said senior author Adam Siepel, Cornell associate professor of biological statistics and computational biology.
“It’s taken these 40 years to get a clear picture of what’s going on in these sequences because we haven’t had the data until very recently,” said Leonardo Arbiza, a postdoctoral researcher in Siepel’s lab and the paper’s lead author.
Less than 2 percent of the human genome – the complete set of genetic material – contains genes that code for proteins. In cells, these proteins are instrumental in biological pathways that affect an organism’s health, appearance and behavior.
Much less is known about the remaining 98 percent of the genome. In the 1960s, however, scientists recognized that some of this non-protein-coding DNA regulates when and where genes are turned on and off and how much protein they produce. This regulatory machinery relies on proteins called transcription factors, which bind to specific short DNA sequences flanking a gene, called transcription factor binding sites, and thereby switch the gene on or off.
Among its findings, the study reports that, compared with protein-coding genes, binding site DNA carries close to three times as many “weakly deleterious mutations”: mutations that may weaken an individual or make it more susceptible to disease but are generally not severe. Weakly deleterious mutations persist at low frequencies in a population and are eventually weeded out over time. Such mutations are responsible for many inherited human diseases.
Protein-coding genes generally resist change, although occasionally a mutation produces a favorable trait and rises in frequency across a population, a process called positive selection. Transcription factor binding sites, by contrast, “show considerable amounts of positive selection,” said Arbiza, with evidence for adaptation in binding sites that regulate genes controlling blood cells, brain function and immunity, among others.
“The overall picture shows more evolutionary flexibility in the binding sites than in protein coding genes,” said Siepel. “This has important implications for how we think about human evolution and disease.”
This is one of the first studies to combine recent data identifying transcription factor binding sites, data on human genetic variation, and genome comparisons between humans and apes. A new computational method called INSIGHT (Inference of Natural Selection from Interspersed Genomically coHerent elemenTs), designed by Ilan Gronau, a postdoctoral researcher in Siepel’s lab and a co-author of the study, allowed the scientists to integrate these diverse data types and find evidence of natural selection in the regulatory DNA.
“Transcription factor binding sites are probably the regulatory elements we know the most about,” said Arbiza. “If you want to understand evolution of gene expression regulation, that’s a good starting point.”
Other researchers may now use INSIGHT to analyze other short regulatory sequences, such as those associated with microRNAs, small non-coding RNA molecules that also play a role in gene regulation.
The study was funded by the Packard Foundation, Alfred P. Sloan Foundation, National Science Foundation, National Institutes of Health, and a fellowship from the Cornell Center for Vertebrate Genomics.