The technique will allow better access to the genetic information on the Y chromosome of any species and thus can be used to study male infertility disorders and male-specific mutations. It can also aid conservation genetics efforts by helping to trace paternity and to track how males move within and between populations of endangered species, such as gorillas.
"Surprisingly, we found that in many ways the gorilla Y chromosome is more similar to the human Y chromosome than either is to the chimpanzee Y chromosome," said Kateryna Makova, the Francis R. and Helen M. Pentz Professor of Science at Penn State and one of two corresponding authors of the paper. "In regions of the chromosome where we can align all three species, the sequence similarity fits with what we know about the evolutionary relationships among the species -- humans are more closely related to chimpanzees. However, the chimpanzee Y chromosome appears to have undergone more changes in the number of genes and contains a different amount of repetitive elements compared to the human or gorilla. Moreover, a greater proportion of the gorilla Y sequences can be aligned to the human than to the chimpanzee Y chromosome."
The Y chromosome of mammals is extremely difficult to sequence for several reasons. One is that the Y chromosome is present in only one copy and makes up only about one to two percent of the total genetic material in a male's cells. To reduce this difficulty, the researchers used an experimental technique called flow-sorting to preferentially select the Y chromosome for sequencing based on the chromosome's size and genetic content.
"Flow-sorting increased the amount of the Y chromosome in our dataset to about thirty percent," said Paul Medvedev, assistant professor of computer science and engineering and of biochemistry and molecular biology at Penn State, the other corresponding author of the paper. "To further enrich our data for the Y chromosome, we developed a computational technique -- called RecoverY -- to sort the data into Y and non-Y sequences based on how frequently similar sequences appeared in our data."
The Y chromosome, like all DNA, is composed of a series of molecules called "bases" that are represented by the letters A, T, C, and G. Current genetic sequencing technologies produce "reads" of sequence that are much shorter than the entire length of the chromosome. These reads need to be placed in order and pieced together into longer and longer chunks by finding places where they overlap. The research team used two different sequencing technologies to help with this assembly of the DNA sequence of the Y chromosome.
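Piecing reads together by their overlaps can be shown in a few lines. The sketch below is a simplified illustration, not the assembly software the team used: it assumes error-free reads and joins two reads only when the end of one exactly matches the start of the other. The function names and the `min_len` parameter are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when no overlap of at least min_len bases exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def merge(a, b, min_len=3):
    """Join a and b across their overlap, or return None if they do not overlap."""
    n = overlap(a, b, min_len)
    return a + b[n:] if n else None


chunk = merge("ATTCGGA", "CGGATTA")  # the reads share the four bases "CGGA"
```

Here the two seven-base reads merge into the ten-base chunk `ATTCGGATTA`. Real assemblers must also tolerate sequencing errors and choose among many competing overlaps, which is where repeated sequences cause trouble.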
The first technology produces massive amounts of very short reads -- about 150 to 250 bases in length. Using this method, the researchers sequenced enough reads to cover the entire length of the Y chromosome about 450 times. They assembled these short reads into longer chunks that they then further connected using a second technology that produces longer reads -- about seven thousand bases in length on average.
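Coverage of "about 450 times" follows from simple arithmetic: the number of reads multiplied by the read length, divided by the length of the target sequence. The sketch below works through that formula. The target length of one million bases is a made-up round number for illustration, not the size of the gorilla Y chromosome, and the function names are hypothetical.

```python
import math


def coverage(num_reads, read_length, target_length):
    """Average number of times each base of the target is covered."""
    return num_reads * read_length / target_length


def reads_needed(depth, read_length, target_length):
    """Smallest number of reads that reaches the requested average depth."""
    return math.ceil(depth * target_length / read_length)


# Hypothetical target of 1,000,000 bases sequenced with 200-base reads.
needed = reads_needed(450, 200, 1_000_000)
depth = coverage(needed, 200, 1_000_000)
```

Under these assumed numbers, 2,250,000 reads give an average depth of 450.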
"By reducing non-Y chromosome reads from our data with flow sorting and the RecoverY technique that we developed, and by using this combination of sequencing technologies, we were able to assemble the gorilla Y chromosome so that more than half of the sequence data was in chunks longer than about 100,000 bases in length," said Medvedev.
Another reason the Y chromosome is so difficult to sequence is that it contains an unusually high number of repeat sequences -- regions where the sequence of As, Ts, Cs, and Gs is identical, or nearly identical, for thousands or millions of bases in a row. Many of these repeats, including some genes, appear as back-to-back series of the same repeated sequence or as long palindromes which, like the word "racecar," read the same forward and backward. The researchers used an experimental technique -- "droplet digital polymerase chain reaction" -- to determine the number of copies of the genes that appear in these series.
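The "racecar" comparison is a simplification. Because DNA is double-stranded, a DNA palindrome is usually defined as a sequence that reads the same as its reverse complement, meaning the opposite strand read in the opposite direction. The short sketch below checks that property for an exact match; the function names are hypothetical, and the long palindromes on the Y chromosome are far larger and not necessarily perfect.

```python
# Each base pairs with its complement: A with T, C with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read in the opposite direction."""
    return seq.translate(COMPLEMENT)[::-1]


def is_dna_palindrome(seq):
    """True when seq is identical to its own reverse complement."""
    return seq == reverse_complement(seq)


palindromic = is_dna_palindrome("GAATTC")
not_palindromic = is_dna_palindrome("GATTACA")
```

The six-base sequence `GAATTC` passes the check, while `GATTACA` does not.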
"Sequencing the Y chromosome is like trying to put together a jigsaw puzzle, without knowing the final picture, from a pile of pieces where only about one out of every hundred is useful, and most of the pieces you do need look identical," said Makova. "We've developed a pipeline for sequencing the Y chromosome that is more efficient than previous methods and reduces a number of the difficulties associated with determining the genetic sequence of the Y chromosome. Our method will open the door for studying the Y chromosome for more labs, more species, and more individuals within those species."
To demonstrate the utility of the gorilla Y chromosome sequence they generated, the researchers designed genetic markers that can be used to assess genetic relatedness among male gorillas and thus to aid conservation genetics efforts aimed at preserving this endangered species.