Deploying Cloud Technologies to Fight Pediatric Cancer
May 10, 2018 | by Ruairi MacKenzie and Laura Elizabeth Mason, Science Writers, Technology Networks
St. Jude Children’s Research Hospital recently launched the St. Jude Cloud, an online data-sharing and collaboration platform that gives researchers access to the world's largest public repository of pediatric cancer genomics data. We caught up with Jinghui Zhang, PhD, Chair of the Department of Computational Biology and St. Jude Endowed Chair in Bioinformatics at St. Jude Children’s Research Hospital, to discuss the cloud and its impact on pediatric research.
Could you tell us about how the St. Jude Cloud project originated and developed?
Jinghui Zhang (JZ): St. Jude Cloud is a partnership between St. Jude Children’s Research Hospital, DNAnexus and Microsoft. Its goal is to overcome technological limitations such as storage and speed so that we can advance scientific discovery from our huge repositories of data. The partnership grew out of conversations among St. Jude, Microsoft and DNAnexus about the computational challenges of processing whole-genome data.
The team realized that bringing together all three organizations could solve challenges for St. Jude’s genomics efforts and free researchers to focus on complex research questions.
St. Jude generates the pediatric cancer data. Microsoft Genomics performs alignment and variant calling: the billion-piece puzzle of raw genomics data is aligned to a reference genome, and then the differences between the aligned genome and the reference genome are identified. DNAnexus provides an open, flexible cloud platform that supports the Microsoft Genomics service as well as other genomics services, giving researchers access to tools and diverse datasets in a collaborative ecosystem.
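As a rough illustration of the alignment and variant-calling step described above, here is a toy Python sketch. It is not the Microsoft Genomics pipeline: production tools rely on indexed aligners and statistical variant callers, whereas this example slides each read along a short reference by brute force and takes a simple majority vote. The function names, the sequences and the thresholds (`min_depth`, `min_fraction`) are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def align_read(reference, read):
    """Return the 0-based start where the read fits the reference with the fewest mismatches."""
    best_start, best_mismatches = 0, len(read) + 1
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(r != q for r, q in zip(window, read))
        if mismatches < best_mismatches:
            best_start, best_mismatches = start, mismatches
    return best_start


def call_variants(reference, reads, min_depth=2, min_fraction=0.5):
    """Align reads, pile up the observed bases, and report sites that differ from the reference."""
    # One counter per reference position: how often each base was observed there.
    pileup = [Counter() for _ in reference]
    for read in reads:
        start = align_read(reference, read)
        for offset, base in enumerate(read):
            pileup[start + offset][base] += 1

    variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads cover this site to make a call
        base, count = counts.most_common(1)[0]
        if base != reference[pos] and count / depth >= min_fraction:
            variants.append((pos, reference[pos], base))
    return variants


# Hypothetical reference and reads; the sample carries a G-to-A change at position 9.
reference = "ACGGTCATTGCAAGCTTAGG"
reads = ["GTCATTAC", "CATTACAA", "TACAAGCT", "AGCTTAGG"]
variants = call_variants(reference, reads)
print(variants)  # [(9, 'G', 'A')]
```

Each tuple gives the position, the reference base and the base seen in the sample. A whole genome involves billions of bases and hundreds of millions of reads, which is why this step is a computational bottleneck worth moving to the cloud.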
Could you tell us more about the three aspects of St. Jude Cloud -- data, tools and visualizations -- and explain how each of these aspects can help advance progress within the field of pediatric cancer?
JZ: Data: The St. Jude Cloud dataset has more than 5,000 whole-genome (WGS), 5,000 whole-exome (WES), and 1,200 RNA-Seq datasets from more than 5,000 pediatric cancer patients and survivors. Access to data is simple, fast and does not require downloading prior to exploration. The St. Jude Cloud enables data-sharing on a large scale and will help drive research and discovery forward. By 2019, we expect to make 10,000 whole-genome sequences available on St. Jude Cloud.
Tools: St. Jude Cloud features a collection of bioinformatics tools to help experts and non-specialists gain novel insights from genomics data. These include thoroughly validated end-to-end computational pipelines and interactive visualization tools that make large datasets easier to explore. Users can upload their own data or use existing data from St. Jude with tools including: PeCan PIE, Rapid RNA-Seq Fusion Detection, ChIP-Seq Peak Calling, WARDEN Differential Expression Analysis, and HLA Typing and Neoepitope Prediction.
Visualizations: St. Jude Cloud enables researchers to explore St. Jude data or their own results using innovative, interactive visualizations powered by ProteinPaint, the genomic visualization engine developed at St. Jude. Other St. Jude Cloud tools can generate custom visualizations. With data at their fingertips, researchers can produce custom visualizations of their own research data for exploration or comparison across samples, pinpoint specific mutations, or drill down to individual subjects.
The Cloud offers a wealth of genomic data, including the relatively novel deployment of Neoepitope Prediction. How can Neoepitope Prediction data advance cancer research?
JZ: Right now, the key to neoepitope prediction data is greater accuracy. We are hopeful that St. Jude Cloud can help researchers develop faster and more precise means of identifying all expressed neoepitopes, which in turn could aid in the development of personalized cancer vaccines.
Could you tell us more about the St. Jude Children’s Research Hospital—Washington University Pediatric Cancer Genome Project (PCGP)?
JZ: In 2010, St. Jude Children’s Research Hospital and Washington University School of Medicine in St. Louis announced the launch of the Pediatric Cancer Genome Project (PCGP), the world’s most ambitious effort to discover the origins of childhood cancer and seek new cures. During the three-year initiative, researchers analyzed genomics data from more than 800 children and adolescents with 23 different childhood cancers and sequenced more than 700 tumors.
The cancer genome sequencing efforts yielded one of the largest high-coverage whole-genome DNA sequence databases in cancer and an unparalleled view of the altered signaling pathways in cancer. It also generated primary DNA sequence data—a resource for both cancer and non-cancer researchers.
By comparing the complete genomes from cancerous and normal cells for more than 800 patients, the PCGP successfully pinpointed the genetic factors behind some of the toughest pediatric cancers. PCGP has some of the most accurate algorithms for detecting single nucleotide variations, structural variations and DNA copy number variations.
The cloud encourages researchers to upload their own data into the system’s secure cloud environment. Can you tell us more about how such vital data can be secured by the cloud?
JZ: Sharing research and scientific discoveries is vital to advancing cures and saving lives, especially in rare diseases like pediatric cancer. Making sure this data is shared in as secure a manner as possible is very important to us. It’s one of the reasons we chose to partner with DNAnexus and Microsoft.
As for data access, we have rigorous protocols in place to help ensure that only a bona fide researcher from an academic institution — with a real research question and the means to explore that question — is given access to St. Jude Cloud.
St. Jude Cloud allows users to visualize data from the PCGP and other published studies. Could you expand on which other studies are included within the repository?
JZ: In addition to PCGP data, St. Jude Cloud includes data from two other St. Jude-supported genomics initiatives: the Genomes for Kids clinical trial and the St. Jude Lifetime Cohort study. The Genomes for Kids clinical trial data contains genomics information about children and teens who have been diagnosed with a solid or a liquid tumor. The St. Jude Lifetime Cohort study has whole-genome and whole-exome sequencing data from more than 3,000 long-term pediatric cancer survivors.
Jinghui Zhang was speaking to Ruairi MacKenzie and Laura Elizabeth Mason, Science Writers for Technology Networks.