National Data Center for Cancer Genome Research

Published: Wednesday, May 02, 2012
Last Updated: Wednesday, May 02, 2012
In the wake of personalized medicine, scientists at the University of California, Santa Cruz, make progress in the management and analysis of large data sets.

The emerging field of "personalized" or "precision" medicine holds great promise in the fight against cancer. If scientists can identify the genetic changes that drive each patient's cancer cells, they can use that information to develop targeted treatments. But achieving this goal will require massive amounts of genomic and clinical data and a sophisticated infrastructure to manage and analyze the data.

The University of California, Santa Cruz, has now completed a first step in building this infrastructure, said UC Santa Cruz bioinformatics expert David Haussler. Haussler's team has established the Cancer Genomics Hub (CGHub), a large-scale data repository and user portal for the National Cancer Institute's cancer genome research programs. CGHub's initial "beta" release is providing cancer researchers with efficient access to a large and rapidly growing store of valuable biomedical data. The project is funded by the National Cancer Institute (NCI) through a $10.3 million subcontract with SAIC-Frederick Inc., the prime contractor for the Frederick National Laboratory for Cancer Research.

"By providing researchers with comprehensive catalogs of the key genomic changes in many major types and subtypes of cancer, these efforts will support the development of more effective ways to diagnose and treat cancer," said Haussler, a distinguished professor of biomolecular engineering in the Baskin School of Engineering at UC Santa Cruz and a Howard Hughes Medical Institute investigator.

In personalized care, doctors design treatments to target specific genetic changes found in a patient's cancer cells. Researchers are trying to catalog all the genetic abnormalities found in different types of cancers and find connections between specific genetic changes and how patients respond to different treatments. The scale and complexity of the information being gathered create a critical challenge in the area of data management.

Although recent studies using genetically targeted treatments have shown promising results, much more research is needed to enable their widespread use, Haussler said. "There won't be one magic bullet, because cancer is not one disease, or even 100 diseases. Every instance of cancer is different. We have to improve our understanding of the molecular biology of cancer and develop computer algorithms so that we can analyze the genetic changes in each individual patient. It will take time. But with cancer genomics, we will eventually know our enemy completely."

Haussler's team assembled the first draft of the human genome sequence in 2000 and created and maintains the UC Santa Cruz Genome Browser, a Web-based tool that is used extensively in biomedical research and serves as the platform for several large-scale genomics projects. His group's contributions to cancer genomics research include creation of a Cancer Genomics Browser for analyzing data from large-scale cancer studies.

Haussler's group built CGHub to support all three major NCI cancer genome sequencing programs: the Cancer Genome Atlas (TCGA), Therapeutically Applicable Research to Generate Effective Treatments (TARGET) and the Cancer Genome Characterization Initiative (CGCI). TCGA is a collaborative effort led by NCI and the National Human Genome Research Institute to map the genomic changes that occur in at least 20 major types and subtypes of adult cancer. The TARGET program is a related effort focusing on the five most common childhood cancers, and the CGCI makes available genomic data from HIV-associated cancers and certain lymphoid and childhood cancers.

These programs are laying the foundation for personalized cancer care by creating a database that scientists around the world can use to connect specific genomic changes with clinical outcomes. Haussler's group has been closely involved in data analysis for TCGA.

"TCGA is allowing us for the first time to look at cancer in full molecular detail," Haussler said. "Cancer is a disease caused by disruption of DNA molecules within the cell. When life starts, every cell in the body has the same DNA. In the course of a person's lifetime, however, some cells may accumulate changes in their DNA that cause them to go rogue and multiply without control, creating the disease we call cancer. For the first time now, we are able to look into an individual patient's cancer cells and see all the genetic disruptions, among which are the molecular drivers of that person's cancer."

There are currently only a few situations in which doctors can prescribe a treatment plan based on the specific genetic mutations in a patient's cancer cells. That is expected to change as projects like TCGA, TARGET, and CGCI yield a comprehensive catalog that researchers can use to find new targets for medicines and discover clues to improve patient outcomes. But there is an urgent need for an efficient and user-friendly portal to give researchers access to the data. The NCI genome projects are producing staggering amounts of data.

"The scale of this is far beyond anything faced in medical research before," Haussler said. "Each genome file, the DNA record from a tumor or normal tissue, is 300 billion bytes. And for every case there are two of these files, the cancer genome and the normal genome. Add to this RNA sequence data, and the prospect of deeper sequencing in the future, and we must plan for up to a terabyte (1,000 billion bytes) for each case."

TCGA currently generates about 10 terabytes of data each month. For comparison, the Hubble Space Telescope amassed about 45 terabytes of data in its first 20 years of operation. TCGA's output will increase tenfold or more over the next two years. Over the next four years, if the project produces a terabyte of DNA and RNA data from each of more than 10,000 patients, it will have produced 10 petabytes of data (a petabyte is 1,000 terabytes). And TCGA is just the beginning of the data deluge, Haussler said, noting that 10,000 cases is a small fraction of the 1.5 million new cancer cases diagnosed every year in the United States alone.

New data compression schemes are expected to reduce the total storage space needed, so the CGHub repository is designed initially to hold 5 petabytes and to allow further growth as needed. That is still a massive amount of data, and CGHub will need to accommodate transfers of extremely large data files.
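The storage figures quoted above can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope illustration using the numbers given in this article and decimal units (1 terabyte = 1,000 billion bytes); the variable names are illustrative, and the final ratio is a simple comparison of the projected total with the initial capacity, not a stated compression target.

```python
# Back-of-the-envelope check of the data volumes quoted in the article.
# Decimal units, as the article uses them: 1 TB = 10**12 bytes, 1 PB = 1,000 TB.
GB = 10**9
TB = 10**12
PB = 10**15

genome_file_bytes = 300 * GB      # one genome file (tumor or normal tissue)
files_per_case = 2                # cancer genome plus normal genome
planned_bytes_per_case = 1 * TB   # planning figure including RNA and deeper sequencing
cases = 10_000                    # patients over the next four years

# DNA data alone for one case, before RNA data is added
dna_bytes_per_case = genome_file_bytes * files_per_case

# Projected total if every case reaches the one-terabyte planning figure
total_bytes = planned_bytes_per_case * cases

# Months of TCGA output (10 TB/month) needed to match Hubble's 45 TB in 20 years
tcga_tb_per_month = 10
hubble_tb_total = 45
months_to_match_hubble = hubble_tb_total / tcga_tb_per_month

# Projected total compared with CGHub's initial 5 PB capacity (illustrative ratio only)
initial_capacity_bytes = 5 * PB
projection_to_capacity = total_bytes / initial_capacity_bytes

print(f"DNA data per case: {dna_bytes_per_case / GB:.0f} GB")
print(f"Projected total: {total_bytes / PB:.0f} PB")
print(f"Months of TCGA output to match Hubble: {months_to_match_hubble}")
print(f"Projection relative to initial capacity: {projection_to_capacity}")
```

At the current rate, TCGA produces as much data in under five months as Hubble did in two decades, and the uncompressed four-year projection is twice the repository's initial capacity.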

Managed by the UC Santa Cruz team, the CGHub computer system is located at the San Diego Supercomputer Center. It is connected by high-performance national research networks to major centers nationwide that are participating in these projects, including UC Santa Cruz. Haussler's team designed and oversees the storage and computing infrastructure for the repository, which has an automated query and download interface for large-scale, high-speed use. It will eventually also include an interactive web-based interface to allow researchers to browse and query the system and download custom datasets.

It may take years for cancer genomics research to bring about major changes in cancer care. The first step, and the focus of the NCI cancer genomics programs, is to determine which genomic changes are involved in each type of cancer and to understand the molecular and clinical effects of those changes. Then biomedical researchers must identify or develop treatments to block those effects.

"Right now, cancer research needs something on a very large scale, like the Large Hadron Collider in physics," Haussler said. "Instead of bringing subatomic particles together in high-energy collisions and computing their behavior, we're bringing cancer genomes together in a common database and computing the disease drivers."

The CGHub program director is Robert Zimmerman; project team members include technical director Mark Diekhans; operations manager Linda Rosewood; hardware systems lead Erich Weiler; engineering lead Chris Wilks; engineering consultant Brian Craft; and networking consultants Brad Smith and Jim Warner. The core code, including GT software for downloading data, was licensed from Annai Systems. The cancer genomics group at UC Santa Cruz also includes co-principal investigator Joshua Stuart, an associate professor of biomolecular engineering at UC Santa Cruz; assistant research scientist Jing Zhu; engineers Kyle Ellrott, Teresa Swatloski and Singer Ma; user testing engineer Mary Goldman; postdoctoral scholars Adam Ewing, Benedict Paten and Daniel Zerbino; research associate Charlie Vaske; and graduate students Tracy Ballinger, Steve Benz, Daniel Carlin, James Durbin, Ted Goldstein, Mia Grifford, Sam Ng, Amie Radenbaugh, Zack Sanborn and Chris Szeto.

