Around 2.3 million years ago, a 900-kb chunk of DNA broke off, reversed itself, and reattached on the chromosome in the opposite orientation. These kinds of genomic rearrangements (known as inversions) are not entirely uncommon, but what is unusual is that both versions of the chromosome – each with a substantial chunk of DNA pointing in a different direction – still exist in the human gene pool. For more than 2 million years – throughout the entire course of modern human history – these two distinct forms have been trundled along in our genetic luggage.
Scientists have known about this ancient inversion for about seven years now, but four researchers at the Broad Institute recently took a closer look at this region of the genome, teasing apart its genetic diversity and complexity. They found that the inversion was just the tip of the iceberg.
“This was a locus that we and a thousand of our closest friends in human genetics knew was interesting,” says Steve McCarroll, senior author of a recent Nature Genetics paper that details their findings. “If you’d asked us or other people in human genetics about it, we would have described this inversion, which is common in Europeans. But the inversion turns out not to be the most interesting feature of the locus.”
Steve and his colleagues Linda Boettger, Bob Handsaker, and Michael Zody have found that there aren’t just two structural versions of the locus (a locus is a specific location in the genome: often a single gene, or in this case, a stretch containing many genes). Instead, there are nine. After the initial, 2-million-year-old rearrangement, much more recent changes began to emerge, including duplications of different parts of the locus. Steve and his colleagues at the Broad Institute specialize in detecting these extra (or in other cases, missing) copies, known as copy number variants (CNVs for short).
The team used two breakthrough approaches – one molecular and one computational – to detect and characterize these nine structures. The researchers chose this particular region, known as 17q21.31, in part because several markers for female fertility and neurological diseases are associated with it, and in part because it is structurally complicated.
“These regions are often ignored simply because of their complexity and because they seem very difficult to study,” Linda explains. “But that’s also what makes them so interesting.”
Because the region is so diverse, the researchers needed to look across about a thousand samples for the nine distinct structures to emerge. Existing methods are difficult to scale to that many samples. Instead, the team used a molecular technique called digital droplet PCR to look at this region of the genome in parents and children and find patterns of copy number inheritance. They also used a complementary, computational method to analyze data from the 1000 Genomes Project. The results of the two methodologies converged on many of the same findings.
“The key to our success was to combine several different techniques,” explains Bob, a senior principal software engineer. “We leveraged the new digital droplet PCR technology, but combined this with sequence data analysis from public sources (the 1000 Genomes Project), which allowed us to validate each approach against the other.”
Mike, a senior computational biologist, began working on this locus in 2006, publishing a paper with his colleague Evan Eichler at the University of Washington. Mike moved on to other projects, but in 2009 he heard that Steve and Linda had made fresh discoveries in the region. He joined them the following year.
Mike agrees with Bob that the true strength of the project comes from combining different strategies and expertise. “Between the different technologies, including complete, finished sequences for two of the nine versions that we discovered, we were able to describe and classify multiple structural types that we never completely sequenced,” Mike says.
The researchers found many intriguing results, including the fact that two of the structures independently acquired partial duplicate copies of the gene KANSL1. Interestingly, this gene has been shown to affect age-dependent female fertility in fruit flies, offering a potential explanation for the association between this region and human fertility.
One of the team’s long-term goals is to tie one or more of the nine structures to risk of disease. “We want to get to the point for this locus, and all such loci, where we can see that the phenotype [diseases like Parkinson’s, celiac, etc.] is traveling with a set of genetic markers for one of these structures,” Steve says.
The methodology that the researchers have used to tease apart structural variation at this locus can now be applied to other complex loci connected to disease.
“Regions of high physical complexity are certainly understudied,” Linda says. “These kinds of projects can be very confusing at the beginning, but when you tease them apart, when you look at copy number and puzzle through it, you will begin to see a beautiful picture of what it all looks like.”