Powerful Genome Barcoding System Reveals Large-Scale Variation in Human DNA
News Jun 01, 2010
Genetic abnormalities are most often discussed in terms of differences so minuscule they are actually called "snips" - changes in a single unit along the 3 billion that make up the entire string of human DNA.
"There's a whole world beyond SNPs - single nucleotide polymorphisms - and we've stepped into that world," says Brian Teague, a doctoral student in genetics at the University of Wisconsin-Madison. "There are much bigger changes in there."
Variation on the order of thousands to hundreds of thousands of DNA's smallest pieces - large swaths varying in length or location or even showing up in reverse order - appeared 4,205 times in a comparison of DNA from just four people, according to a study published May 31 in the Proceedings of the National Academy of Sciences.
Those structural differences popped into clear view through computer analysis of more than 500 linear feet of DNA molecules analyzed by the powerful genome mapping system developed over nearly two decades by David C. Schwartz, professor of chemistry and genetics at UW-Madison.
"We probably have the most comprehensive view of the human genome ever," Schwartz says. "And the variation we're seeing in the human genome is something we've known was there and important for many years, but we haven't been able to fully study it."
To get a better picture of those structural variations, Schwartz and his team developed the Optical Mapping System, a wholly new type of genome analysis that directly examines millions of individual DNA molecules.
Common systems for analyzing genomes typically chop long DNA molecules into fragments less than a couple thousand base pairs long and multiply them en masse, like a copy machine, to develop a chemical profile of each piece.
Reading such small sections without seeing their place in the larger picture of DNA leaves out critical understanding. To make matters worse, interesting parts of the human genome are often found within DNA's trickiest stretches.
"Short pieces could really come from so many different locations," Teague says. "An enormous part of the genome is composed of repeating DNA, and important differences are often associated with areas that have a lot of repeated sections."
It's a problem inherent to the method that has irked Schwartz for a long time.
"Our new technology quickly analyzes huge DNA molecules one at a time, which eliminates the copy machine step, reduces the number of DNA jig-saw pieces and increases the unique qualities of each piece," Schwartz says. "These advantages allow us to discover novel genetic patterns that are otherwise invisible."
The genome mapping system in Schwartz's lab takes in much larger pieces, at least millions of base pairs at a time. Sub-millimeter sections of single DNA molecules - thread-like and, in full, 4 to 5 inches long in humans - are coaxed onto treated glass surfaces.
The long strands of DNA straighten out on the glass, and are clipped into sections by enzymes and scanned by automated microscopes. The pattern of these cuts along each molecule thread produces a unique barcode, identifying the DNA molecule and revealing genetic changes it harbors.
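As a rough illustration of the barcode idea, the sketch below treats a DNA molecule as a text string and "cuts" it wherever a recognition sequence occurs, recording the ordered fragment lengths. The function name, the toy sequence and the choice of recognition site are assumptions made for illustration; the article does not say which enzymes the lab uses, and the real system measures fragment sizes from microscope images rather than from known sequence.

```python
def restriction_barcode(sequence: str, site: str) -> list[int]:
    """Return the ordered fragment lengths left after cutting at every
    occurrence of `site`. Simplification: the cut is placed at the first
    base of the site, not at the enzyme's true cut position."""
    cuts = []
    start = 0
    while True:
        idx = sequence.find(site, start)
        if idx == -1:
            break
        cuts.append(idx)
        start = idx + 1  # keep scanning past this occurrence
    boundaries = [0] + cuts + [len(sequence)]
    # Drop zero-length fragments (a site at position 0 cuts nothing off).
    return [b - a for a, b in zip(boundaries, boundaries[1:]) if b - a > 0]


# Toy molecule with two occurrences of a hypothetical recognition site.
toy_molecule = "AAGAATTCTTTTGAATTCCC"
barcode = restriction_barcode(toy_molecule, "GAATTC")
print(barcode)
```

The ordered list of lengths, not any single fragment, is what acts as the barcode: a molecule that has gained, lost or inverted a large stretch of DNA yields a different pattern.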
The scan results are passed along to databases for storage and retrieval, and handled by software that stitches collections of bar-coded molecules together with others to reconstitute the entire strand of DNA and quickly pinpoint genetic changes.
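The stitching step can be pictured as matching each molecule's barcode against a longer map. The sketch below is a deliberately simple sliding-window comparison with a relative sizing tolerance; the function name, the tolerance value and the sample fragment sizes are all hypothetical, and the lab's actual software is not described in the article and is certainly more sophisticated.

```python
def locate_barcode(molecule: list[float], reference: list[float],
                   tolerance: float = 0.1) -> list[int]:
    """Return every start index in `reference` where the run of fragment
    sizes in `molecule` matches, allowing each fragment to differ from the
    reference fragment by at most `tolerance` (as a fraction of its size)."""
    hits = []
    n = len(molecule)
    for start in range(len(reference) - n + 1):
        window = reference[start:start + n]
        if all(abs(m - r) <= tolerance * r for m, r in zip(molecule, window)):
            hits.append(start)
    return hits


# Hypothetical reference map (fragment sizes in arbitrary units) that
# contains a repeated pattern: roughly 30 followed by roughly 8.
reference_map = [12.0, 30.5, 8.2, 45.0, 30.0, 8.0, 19.5]

short_hits = locate_barcode([30.5, 8.1], reference_map)       # two fragments
long_hits = locate_barcode([31.0, 8.0, 44.0], reference_map)  # three fragments
print(short_hits, long_hits)
```

In this toy data the two-fragment barcode fits in two places, while the three-fragment barcode fits in only one. That is the same point Teague makes about short pieces: the longer the molecule, the more distinctive its pattern and the less likely it is to be misplaced among repeats.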
"What we have here is a genetic version of Google Earth," Schwartz says. "I could sit down with you and start at chromosome 1, and we could pan and zoom through each one and actually see the genetic changes across an individual's genome."
To Teague, the Optical Mapping System provides access to a new frame of reference on human genetic variation.
"I've got a whole folder of papers on diseases that are ascribable to these structural differences," he says. "If you can see the genetic basis for those diseases, you can figure out the molecular differences in their development and pick drug targets to treat or cure or avoid them altogether. We fit into that storyline right up at the front."
It's been a long story.
"We've been thinking about these large structural variations for decades," says Schwartz, whose work is funded by the National Institutes of Health and the National Science Foundation. "The problem was that the system for discerning large structural variants was not available. So we had to build it."
The integrative building process included studying the behavior of fluids at microscopic scale, manipulating large DNA molecules and placing barcodes on them, automating high-powered microscopes to analyze single molecules, organizing the computing infrastructure to handle the data and algorithms to analyze whole human genomes, and more.
And after notable turns analyzing the DNA of corn, parasites, bacteria and even the mold that caused the 19th-century potato famine in Ireland, Schwartz has arrived at the human genome, his original target.
"It's like you spend years making a telescope, and then one day you point it at the sky and you discover things that no one else could see," he says. "We've integrated so many scientific problems together in a holistic way, which lets us solve very hard problems."
The result is a 30-day turnaround for one graduate student to analyze one human genome, but that's just a waypoint. Schwartz's team isn't just pointing at the sky. They are aiming for the stars by building new systems for personal genomics.
"This will go even further," says Konstantinos Potamousis, the lab's instrumentation innovator and a co-author on the study, which included researchers from UW-Madison, Mississippi State University, the University of Pittsburgh, the University of Southern California and the University of Washington. "Our systems scale nicely into the future because we've pioneered single molecule technologies. The newer systems we are building will provide more genetic information in far less time."
With development complete on new molecular devices, software and analysis, a large piece of the system is already in place.
And the speed of innovation will quicken the pace of genome analysis.
"Our newer genome analysis systems, if commercialized, promise genome analysis in one hour, at under $1,000," Schwartz says. "And we require that high speed and low cost to power the new field of personal genomics."