Complete Genomics Publishes Paper Describing Its Informatics Approach for High-Accuracy Whole Human Genome Sequencing
News Jan 20, 2012
Complete Genomics performs whole human genome sequencing using proprietary biochemistry based on DNA nanoball arrays and combinatorial probe-anchor ligation sequencing. Because these methods [1] produce reads with unique characteristics, Complete Genomics has developed new informatics methods for calling single nucleotide polymorphisms (SNPs), short substitutions and insertions/deletions (indels).
"The methods described in this paper produce very accurate variant calls," said Dr. Clifford Reid, chairman, president and CEO of Complete Genomics. "The algorithms described in this paper have been used for all of our 69-genome public data repository and the more than 3,800 complete, deeply sequenced human genomes we have delivered to customers to date." Access to Complete Genomics' genome data repository is provided free of charge at http://www.completegenomics.com/sequence-data/download-data/.
The effectiveness of the company's sequencing and bioinformatics approach is borne out by customer research papers that used its data to investigate lung cancer [2], Miller syndrome [3], craniosynostosis [4] and hypercholesterolemia [5], published in Science, Nature, The American Journal of Human Genetics and Human Molecular Genetics, respectively. The approach is also compared favorably with another sequencing technology in the December issue of Nature Biotechnology [6].
Complete Genomics' approach employs a local de novo assembly process, which uses a combination of Bayesian analysis and graph-based techniques, for each variation. This de novo assembly approach, which was pioneered by Complete Genomics, has since been adopted by other organizations.
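As a rough illustration of what graph-based local assembly means in general, the Python sketch below builds a small k-mer graph from reads covering one region and enumerates the candidate sequences that connect two anchor k-mers. It is a generic textbook-style example, not Complete Genomics' algorithm; the function names, the reads and the parameters are all hypothetical.

```python
from collections import defaultdict


def build_kmer_graph(reads, k):
    """Link each (k-1)-mer to the (k-1)-mers that follow it in any read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def enumerate_haplotypes(graph, start, end, max_len):
    """Return every sequence spelled by a path from start to end.

    max_len bounds the search so that cycles in the graph cannot
    cause an endless walk.
    """
    results = []
    stack = [(start, start)]
    while stack:
        node, seq = stack.pop()
        if node == end:
            results.append(seq)
            continue
        if len(seq) >= max_len:
            continue
        for nxt in graph.get(node, ()):
            stack.append((nxt, seq + nxt[-1]))
    return sorted(results)


# Hypothetical reads from a region where one base differs between alleles.
reads = ["ACGTACCT", "TACCTGA", "ACGTAGCT", "TAGCTGA"]
graph = build_kmer_graph(reads, k=4)
candidates = enumerate_haplotypes(graph, "ACG", "TGA", max_len=20)
```

In this toy case the graph branches at the variant position and rejoins afterwards, so the search recovers one candidate sequence per allele. A production assembler would also weigh read support and base quality, which this sketch leaves out.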
The company's assembly approach allows it to call both alleles at a position independently. This enables Complete Genomics to make complex calls in cases where both alleles differ from the reference. Furthermore, its algorithms are particularly adept at detecting variants that are located close to each other. Complete Genomics' technology is also capable of detecting previously unknown indels, whereas some other approaches can only check whether a known indel is present. This additional insight is included in the rich variant reports that Complete Genomics delivers to its customers. These reports also include copy number variations (CNVs), structural variations (SVs), transposable element insertions and, where applicable, a comparison of tumor and normal samples. Because the standard data reports are this comprehensive, researchers working with Complete Genomics data have less analysis to do themselves.
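To make the idea of calling both alleles concrete, the following Python sketch scores every diploid genotype at a single position with a simple Bayesian model. It is a generic illustration, not the company's method: the uniform prior, the fixed per-base error rate and the function name `genotype_posteriors` are all assumptions made for the example.

```python
from itertools import combinations_with_replacement
from math import exp, log


def genotype_posteriors(observed, alleles="ACGT", error_rate=0.01):
    """Posterior probability of each unordered allele pair given read bases.

    Assumes a uniform prior over genotypes, independent reads, and that
    each read samples either allele with probability 1/2.
    """
    log_liks = {}
    for genotype in combinations_with_replacement(alleles, 2):
        total = 0.0
        for base in observed:
            p = 0.0
            for allele in genotype:
                match = 1 - error_rate if base == allele else error_rate / 3
                p += 0.5 * match
            total += log(p)
        log_liks[genotype] = total
    # Normalize in log space to avoid underflow with many reads.
    best = max(log_liks.values())
    weights = {g: exp(ll - best) for g, ll in log_liks.items()}
    norm = sum(weights.values())
    return {g: w / norm for g, w in weights.items()}


# Hypothetical pileup at a site whose reference base is A:
# half the reads show C and half show G.
posteriors = genotype_posteriors("CCCCCGGGGG")
call = max(posteriors, key=posteriors.get)
```

Because each genotype is scored as a pair of alleles and none is forced to match the reference, the highest-scoring call here is the C/G heterozygote, a case where both alleles differ from the reference base.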
Complete Genomics continues to refine its methods, improving the quality and lowering the cost of the data it produces to enable large-scale disease and cancer studies in the translational research market. "I'm always looking for ways to optimize our algorithms so that they run faster and produce more accurate output," said Bruce Martin, senior vice president of product development. "As a result, Complete Genomics can now map and assemble a genome in less than a day with very high sensitivity and specificity."