DNAnexus Inc. has announced that Stanford University, the Data Coordination Center (DCC) for the ENCyclopedia of DNA Elements (ENCODE) Project, a flagship functional genomics consortium funded by the National Human Genome Research Institute (NHGRI) at the National Institutes of Health (NIH), has adopted the company’s cloud genomics platform to support data analysis and sharing for Phase 3 of the project. The DNAnexus platform supports the DCC’s bioinformatics analysis of ENCODE data and makes the consortium’s bioinformatics methods available to the broader research community.
The goal of the ENCODE Project is to comprehensively catalog all the features of the human genome and provide a foundation for studying the genomic basis of human biology and disease. The ENCODE Consortium includes investigators and high-throughput sequencing centers at fourteen biomedical institutes across North America. The ENCODE DCC, tasked with centralizing the project’s raw sequencing data with uniform metadata standards and bioinformatics analysis, chose DNAnexus because the company provides:
• a secure and unified platform already connecting thousands of scientists around the world
• a scalable environment to process thousands of datasets and allow collaboration around petabyte-sized genomic analysis results
• transparency, reproducibility, and data provenance for consistency amongst ENCODE pipelines and results
Stanford researchers established and optimized the ENCODE consortium’s initial bioinformatics pipelines for the cloud. These pipelines are now running in production, transforming raw sequencing data into refined analysis results for downstream use by the broader consortium and the scientific community at large. The analysis is expected to require 10 million core-hours of compute and to generate nearly 1 petabyte of raw data over the next 18 months on the DNAnexus platform.
The development of analysis pipelines is a priority in the current phase of the ENCODE project to ensure that data released to the public are consistently processed. By collaborating with DNAnexus, Stanford was able to run version-controlled ENCODE pipelines to produce clear, consistent results and make them available in real time to researchers around the world. Stanford has open-sourced the ENCODE pipelines on GitHub, and they are also available in a public project on the DNAnexus platform.
“Many large-scale genomic studies have been limited by the lack of required compute power and collaborative data management infrastructure; this is a real hindrance in realizing the full potential of genomic medicine,” said Richard Daly, CEO of DNAnexus. “The DNAnexus global network provides hundreds of researchers at institutions worldwide secure and immediate access and use of ENCODE’s results. We believe the availability of the consortium’s gold-standard analysis pipelines and ENCODE data on a single integrated platform will accelerate genomic medicine.”