Applied Biosystems, an Applera Corporation business, has announced a significant development in the quest to lower the cost of DNA sequencing. Scientists from the company have sequenced a human genome using its next-generation genetic analysis platform. The sequence data generated by this project reveal numerous previously unknown and potentially medically significant genetic variations. The data also provide a high-resolution, whole-genome view of the structural variants in a human genome, making this one of the most in-depth analyses of any human genome sequence. Applied Biosystems is making this information available to the worldwide scientific community through a public database hosted by the National Center for Biotechnology Information (NCBI).
Applied Biosystems was able to analyze the human genome sequence for less than $60,000, which is the commercial price of all reagents needed to complete the project. This is a fraction of the cost of any previously released human genome data, including the approximately $300 million spent on the Human Genome Project. The cost of the Applied Biosystems sequencing project is also below the $100,000 milestone set by the industry for the new generation of DNA sequencing technologies, which are beginning to gain wider adoption by the scientific community.
The availability of this sequence data in the public domain is expected to help scientists gain a greater understanding of human genetic variation and potentially help them to explain differences in individual susceptibility and response to treatment for disease, which is the goal of personalized medicine. Although most human genetic information is the same in all people, researchers are generally more interested in studying the small percentage of genetic material that varies among individuals. They seek to characterize that variation either as single-base changes or as larger stretches of sequence variation known as structural variants. Structural variants include insertions, deletions, inversions, and translocations of DNA sequences ranging from a few to millions of base pairs. Because of their size, they have a higher potential of impacting genes and thus contributing to human disease.
Under the direction of Kevin McKernan, Applied Biosystems’ senior director of scientific operations, the scientists resequenced a human DNA sample that was included in the International HapMap Project. The team used the company’s SOLiD™ System to generate 36 gigabases of sequence data in 7 runs of the system, achieving throughput up to 9 gigabases per run, which is the highest throughput reported by any of the providers of DNA sequencing technology.
The 36 gigabases of sequence data cover the human genome more than 12 times over, which helped the scientists to determine the precise order of DNA bases and to confidently identify the millions of single-base variations (SNPs) present in a human genome. The team also analyzed the regions of the human genome where structure varies between individuals. These regions of structural variation were revealed by greater than 100-fold physical coverage, which shows the positions of larger segments of the genome that may vary relative to the human reference genome.
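The coverage figures above follow from simple arithmetic, sketched here under stated assumptions. The genome size of about 3 gigabases is a commonly used round number and is not given in the announcement, and the pair count and insert size passed to `physical_coverage` are hypothetical values chosen only to illustrate how a 100-fold figure could arise. Both function names are illustrative.

```python
# Assumption: a haploid human genome of roughly 3 gigabases (round figure,
# not stated in the announcement).
GENOME_SIZE_BASES = 3.0e9


def sequence_coverage(total_bases, genome_size=GENOME_SIZE_BASES):
    """Average number of times each base is read: bases sequenced / genome size."""
    return total_bases / genome_size


def physical_coverage(num_pairs, insert_size, genome_size=GENOME_SIZE_BASES):
    """Average number of paired-end inserts spanning each position.

    Each pair spans the whole insert, including the unsequenced middle,
    so physical coverage can greatly exceed sequence coverage.
    """
    return num_pairs * insert_size / genome_size


if __name__ == "__main__":
    # 36 gigabases of sequence, as reported above.
    print(f"sequence coverage: {sequence_coverage(36e9):.1f}x")
    # Hypothetical library: 200 million pairs with 1,500-base inserts.
    print(f"physical coverage: {physical_coverage(200_000_000, 1500):.0f}x")
```

With the assumed 3-gigabase genome, 36 gigabases works out to roughly 12-fold sequence coverage, in line with the figure quoted above. Physical coverage is much higher because it counts the full span of each insert rather than only the bases that were read.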
“We believe this project validates the promise of next-generation sequencing technologies, which is to lower the cost and increase the speed and accuracy of analyzing human genomic information,” said McKernan. “With each technological milestone, we are moving closer to realizing the promise of personalized medicine.”
McKernan’s team used the SOLiD System’s ultra-high-throughput capabilities to obtain deep sequence coverage of the genome of an anonymous African male of the Yoruba people of Ibadan, Nigeria, who participated in the International HapMap Project. The scientists were able to perform an in-depth analysis of structural variants by creating multiple paired-end libraries of genomic sequence that included a wide range of insert sizes. Most inserts exceeded 1,000 bases. The SOLiD System has the ability to analyze paired-end libraries with large insert sizes. For the millions of SNPs identified in the project, the SOLiD System’s 2-base encoding chemistry discriminated random or systematic errors from true SNPs to reveal these SNPs with greater than 99.94 percent sequencing accuracy.
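The idea behind 2-base encoding can be illustrated with a small sketch. It uses the widely described color-space scheme, in which each "color" is determined by a pair of adjacent bases, so a true single-base change alters two adjacent colors while an isolated color mismatch is more likely a measurement error. This is a simplified illustration and not Applied Biosystems' actual analysis software; the function names and the classification labels are hypothetical, and real reads also carry a known primer base and quality values that are ignored here.

```python
# Two-bit codes for the bases; the color for a pair of adjacent bases is
# the XOR of their codes (for example AA -> 0, AC -> 1, AG -> 2, AT -> 3).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colors(seq):
    """Translate a base sequence into its list of colors (length len(seq) - 1)."""
    codes = [BASE_CODE[base] for base in seq]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def classify_mismatch(ref_colors, read_colors):
    """Roughly classify how a read's colors differ from the reference colors."""
    if len(ref_colors) != len(read_colors):
        raise ValueError("color sequences must have the same length")
    diffs = [i for i, (r, s) in enumerate(zip(ref_colors, read_colors)) if r != s]
    if not diffs:
        return "match"
    if len(diffs) == 1:
        # A single changed color cannot come from a single-base substitution.
        return "likely measurement error"
    if len(diffs) == 2 and diffs[1] == diffs[0] + 1:
        i, j = diffs
        # A substitution changes the middle base only, so the XOR of the two
        # colors (which depends on the flanking bases) must stay the same.
        if ref_colors[i] ^ ref_colors[j] == read_colors[i] ^ read_colors[j]:
            return "candidate SNP"
    return "unclassified"


if __name__ == "__main__":
    reference = to_colors("ACGTACGT")      # [1, 3, 1, 3, 1, 3, 1]
    substituted = to_colors("ACGAACGT")    # T -> A at the fourth base
    print(classify_mismatch(reference, substituted))            # candidate SNP
    print(classify_mismatch(reference, [1, 3, 1, 0, 1, 3, 1]))  # likely measurement error
```

The design point is that every base is interrogated twice, so the two kinds of discrepancy leave different signatures in the color sequence.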
Another important attribute of the SOLiD System is that, unlike other available DNA sequencing platforms, the system is inherently scalable to support higher levels of throughput without requiring changes to the system’s hardware. The high throughput, accuracy, and paired-end analysis capability of the SOLiD System are expected to continue to reduce the cost of studying complex genomes and how variation in these genomes contributes to conditions such as cancer, diabetes, and heart disease.