Assembling a Human Genome for Under $10,000
Aedes aegypti mosquito. Credit: Rice University
A team from Rice University, Baylor College of Medicine, Texas Children’s Hospital and the Broad Institute of MIT and Harvard has developed a new genome-assembly method that can assemble a human genome from scratch for less than $10,000.
The new technique, called 3-D assembly and described in a study published this week in Science, produces a genome sequence comparable to the one generated by the original Human Genome Project, which took 10 years and cost $4 billion.
Erez Lieberman Aiden, assistant professor of genetics at Baylor and a senior scientist at Rice’s Center for Theoretical Biological Physics (CTBP), led a team based at Baylor’s Center for Genome Architecture in developing 3-D assembly.
To illustrate the power of the new method, which can be applied not only to any patient but also to any species, Aiden’s team assembled the 1.2 billion-letter genome of Aedes aegypti, the mosquito that transmits the Zika virus. The researchers produced the first end-to-end assembly of each of A. aegypti’s three chromosomes. The new genome is expected to help scientists combat the Zika outbreak by revealing vulnerabilities in the mosquito, which the virus relies on to spread.
The human genome is a sequence of 6 billion chemical letters, called base pairs, divided among 23 pairs of chromosomes. Although the cost of DNA sequencing has fallen, determining the sequence of each chromosome from scratch, a process called “de novo genome assembly,” remains extremely expensive because chromosomes can be hundreds of millions of base pairs long. Today’s inexpensive sequencing technologies, by contrast, produce short reads: snippets of DNA sequence only about a hundred base pairs long, designed to be compared against an existing reference genome.
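A quick calculation shows the mismatch in scale. The sketch below assumes reads of exactly 100 base pairs and uses the standard relation between average sequencing depth C, number of reads N, read length L and genome size G, namely C = N × L / G. The depth values are illustrative assumptions, not figures from the study, and the function name is hypothetical.

```python
import math


def reads_needed(genome_bp: int, read_bp: int, depth: float) -> int:
    """Reads required for an average depth C, from C = N * L / G (so N = C * G / L)."""
    return math.ceil(depth * genome_bp / read_bp)


HUMAN_GENOME_BP = 6_000_000_000  # figure quoted in the article
READ_BP = 100                    # assumed read length ("about a hundred base pairs")

# Each depth below is an illustrative assumption, not a number from the study.
for depth in (1, 30):
    print(f"{depth}x depth: {reads_needed(HUMAN_GENOME_BP, READ_BP, depth):,} reads")
```

Even covering the genome once takes tens of millions of reads, each a tiny fraction of a chromosome that can run to hundreds of millions of base pairs, which is why putting the pieces in the right order is the hard part.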
Actually generating a reference genome and assembling all those long chromosomes involves combining many different technologies at a cost of hundreds of thousands of dollars. Because human genomes differ from one another, the use of a reference genome generated from one person in the process of diagnosing a disease or disorder in a different person can mask the true genetic changes responsible for a patient’s condition.
“As physicians, we sometimes encounter patients who we know must carry some sort of genetic change, but we can’t figure out what it is,” said Dr. Aviva Presser Aiden, a physician scientist in the Pediatric Global Health Program at Texas Children’s Hospital and a co-author of the new study. “To figure out what’s going on, we need technologies that can report a patient’s entire genome. But we also can’t afford to spend millions of dollars on every patient’s genome.”
Three-D assembly addresses this challenge by using maps of how chromosomes fold inside a cell’s nucleus to determine the sequence of each chromosome.
“Our method is quite different from traditional genome assembly,” said Olga Dudchenko, a postdoctoral fellow at the Center for Genome Architecture, who led the research. “Several years ago, our team developed an experimental approach that allows us to determine how the 2-meter-long human genome folds up to fit inside the nucleus of a human cell. In this new study we show that just as these folding maps trace the contour of the genome as it folds inside the nucleus, they can also guide us through the sequence itself.”
By carefully tracing the genome as it folds, the team members found they could stitch together hundreds of millions of short DNA reads into the sequences of entire chromosomes. Since the method uses only cheap short reads, it dramatically reduces the cost of de novo genome assembly and is likely to accelerate the use of de novo genomes in the clinic, Dudchenko said.
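The principle can be illustrated with a toy example. Stretches of DNA that sit close together along a chromosome touch each other inside the nucleus far more often than stretches that are far apart, so fragments with many mutual contacts are likely neighbors in the sequence. The sketch below assumes the short reads have already been joined into longer fragments (contigs) and that a matrix of contact counts between those contigs is available; it orders them by greedily joining the most strongly connected pairs. This is a simplified illustration, not the team’s actual pipeline. A real assembler must also handle problems this toy ignores, such as orienting each fragment and correcting fragments that were joined in error. The function names and the simulated contact data are hypothetical.

```python
from itertools import combinations


def order_contigs(contacts: list[list[int]]) -> list[int]:
    """Toy scaffolder: order contigs so strongly interacting ones end up adjacent.

    Joins the pair with the most contacts first, skipping any join that would
    give a contig more than two neighbors or close a loop.
    """
    n = len(contacts)
    if n == 0:
        return []
    parent = list(range(n))  # union-find structure, used to detect loops

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    neighbors: list[list[int]] = [[] for _ in range(n)]
    pairs = sorted(combinations(range(n), 2),
                   key=lambda p: contacts[p[0]][p[1]], reverse=True)
    for i, j in pairs:
        if len(neighbors[i]) < 2 and len(neighbors[j]) < 2 and find(i) != find(j):
            parent[find(i)] = find(j)
            neighbors[i].append(j)
            neighbors[j].append(i)

    # The joins form a single chain; walk it from one end to the other.
    order = [next(k for k in range(n) if len(neighbors[k]) < 2)]
    while True:
        nxt = [k for k in neighbors[order[-1]] if len(order) < 2 or k != order[-2]]
        if not nxt:
            return order
        order.append(nxt[0])


def simulate_contacts(true_order: list[int]) -> list[list[int]]:
    """Hypothetical data: contacts fall off with distance along the chromosome
    (here, arbitrarily, with the square of that distance)."""
    pos = {contig: i for i, contig in enumerate(true_order)}
    n = len(true_order)
    return [[0 if a == b else 1000 // (pos[a] - pos[b]) ** 2 for b in range(n)]
            for a in range(n)]


truth = [2, 0, 3, 1]
print(order_contigs(simulate_contacts(truth)))  # [1, 3, 0, 2], i.e. truth read backwards
```

The recovered order may come out reversed because contact counts alone do not say which end of the chromosome is which.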
“Sequencing a patient’s genome from scratch using 3-D assembly is so inexpensive that it’s comparable in cost to an MRI,” said Dudchenko, who also is a fellow at Rice’s CTBP. “Generating a de novo genome for a sick patient has become realistic.”
Unlike the genetic tests used in the clinic today, the de novo assembly of a patient genome does not rely on the reference genome produced by the Human Genome Project.
“Our new method doesn’t depend on previous knowledge about the individual or the species that is being sequenced,” Dudchenko said. “It’s like being able to perform a human genome project on whomever you want, whenever you want.”
“Or whatever you want,” said Lieberman Aiden, director of the Center for Genome Architecture and the corresponding author on the new study. “Because the genome is generated from scratch, 3-D assembly can be applied to a wide array of species, from grizzly bears to tomato plants. And it is pretty easy. A motivated high school student with access to a nearby biology lab can assemble a reference-quality genome of an actual species, like a butterfly, for the cost of a science fair project.”
The effort took on added urgency with the outbreak of the Zika virus, which is carried by A. aegypti. Researchers hoped to use the mosquito’s genome to identify a strategy to combat the disease, but the Aedes genome had not been well characterized, and its chromosomes are much longer than those of humans.
“We had been discussing these ideas for years — writing a chunk of code here, doing a proof-of-principle assembly there,” Lieberman Aiden said. “So we had assembly data for Aedes aegypti just sitting on our computers. Suddenly, there’s an outbreak of Zika virus, and the genomics community was galvanized to get going on Aedes. That was a turning point.”
“With the Zika outbreak, we knew that we needed to do everything in our power to share the Aedes genome assembly, and our methods, as soon as possible,” Dudchenko said. “This de novo genome assembly is just a first step in the battle against Zika, but it’s one that can help inform the community’s broader effort.”
The team also assembled the genome of the Culex quinquefasciatus mosquito, the principal vector for West Nile virus.
“Culex is another important genome to have since it is responsible for transmitting so many diseases,” Lieberman Aiden said. “Still, trying to guess what genome is going to be critical ahead of time is not a good plan. Instead, we need to be able to respond quickly to unexpected events. Whether it is a patient with a medical emergency or the outbreak of an epidemic, these methods will allow us to assemble de novo genomes in days instead of years.”
Other contributors to this work include Sanjit Batra, Arina Omer, Sarah Nyquist, Marie Hoeger, Neva Durand, Muhammad Shamim, Ido Machol, all with Baylor’s Center for Genome Architecture and Rice University, and Eric Lander, of the Broad Institute.