DNAnexus Cloud Genomics Platform to Support Data Management and Genomic Analysis
News Jul 01, 2015
DNAnexus Inc. has announced that Stanford University, which serves as the Data Coordination Center (DCC) for the ENCyclopedia of DNA Elements (ENCODE) Project, has adopted the company’s cloud genomics platform to support data analysis and sharing for its Phase 3 project. ENCODE is a flagship functional genomics consortium funded by the National Human Genome Research Institute (NHGRI) at the National Institutes of Health (NIH). The DNAnexus platform supports the DCC’s bioinformatics analysis of ENCODE data and makes the consortium’s bioinformatics methods available to the broader research community.
The goal of the ENCODE Project is to comprehensively catalog all the features of the human genome and to provide a foundation for studying the genomic basis of human biology and disease. The ENCODE Consortium includes investigators and high-throughput sequencing centers at fourteen biomedical institutes across North America. The ENCODE DCC, tasked with centralizing the project’s raw sequencing data with uniform metadata standards and bioinformatics analysis, chose DNAnexus because the company provides:
• a secure and unified platform already connecting thousands of scientists around the world
• a scalable environment to process thousands of datasets and allow collaboration around petabyte-sized genomic analysis results
• transparency, reproducibility, and data provenance for consistency amongst ENCODE pipelines and results
Stanford researchers were able to establish and optimize the ENCODE consortium’s initial bioinformatics pipelines for the cloud. These pipelines are now running in production, transforming raw sequencing data into refined analysis results for downstream use by the broader consortium and scientific community at large. It’s expected this analysis will require 10 million core-hours of compute and will generate nearly 1 petabyte of raw data over the next 18 months on the DNAnexus platform.
The development of analysis pipelines is a priority in the current phase of the ENCODE project, to ensure that data released to the public are consistently processed. By collaborating with DNAnexus, Stanford was able to run version-controlled ENCODE pipelines to produce clear, consistent results and make them available in real time to researchers around the world. Stanford has open-sourced the ENCODE pipelines on GitHub, and they are also available in a public project on the DNAnexus platform.
“Many large-scale genomic studies have been limited by the lack of required compute power and collaborative data management infrastructure; this is a real hindrance in realizing the full potential of genomic medicine,” said Richard Daly, CEO of DNAnexus. “The DNAnexus global network provides hundreds of researchers at institutions worldwide secure and immediate access and use of ENCODE’s results. We believe the availability of the consortium’s gold-standard analysis pipelines and ENCODE data on a single integrated platform will accelerate genomic medicine.”