National Data Center for Cancer Genome Research

Published: Wednesday, May 02, 2012
Last Updated: Wednesday, May 02, 2012
With the rise of personalized medicine, scientists at the University of California, Santa Cruz, are making progress in managing and analyzing large data sets.

The emerging field of "personalized" or "precision" medicine holds great promise in the fight against cancer. If scientists can identify the genetic changes that drive each patient's cancer cells, they can use that information to develop targeted treatments. But achieving this goal will require massive amounts of genomic and clinical data and a sophisticated infrastructure to manage and analyze the data.

The University of California, Santa Cruz, has now completed a first step in building this infrastructure, said UC Santa Cruz bioinformatics expert David Haussler. Haussler's team has established the Cancer Genomics Hub (CGHub), a large-scale data repository and user portal for the National Cancer Institute's cancer genome research programs. CGHub's initial "beta" release is providing cancer researchers with efficient access to a large and rapidly growing store of valuable biomedical data. The project is funded by the National Cancer Institute (NCI) through a $10.3 million subcontract with SAIC-Frederick Inc., the prime contractor for the Frederick National Laboratory for Cancer Research.

"By providing researchers with comprehensive catalogs of the key genomic changes in many major types and subtypes of cancer, these efforts will support the development of more effective ways to diagnose and treat cancer," said Haussler, a distinguished professor of biomolecular engineering in the Baskin School of Engineering at UC Santa Cruz and a Howard Hughes Medical Institute investigator.

In personalized care, doctors design treatments to target specific genetic changes found in a patient's cancer cells. Researchers are trying to catalog all the genetic abnormalities found in different types of cancers and find connections between specific genetic changes and how patients respond to different treatments. The scale and complexity of the information being gathered create a critical challenge in the area of data management.

Although recent studies using genetically targeted treatments have shown promising results, much more research is needed to enable their widespread use, Haussler said. "There won't be one magic bullet, because cancer is not one disease, or even 100 diseases. Every instance of cancer is different. We have to improve our understanding of the molecular biology of cancer and develop computer algorithms so that we can analyze the genetic changes in each individual patient. It will take time. But with cancer genomics, we will eventually know our enemy completely."

Haussler's team assembled the first draft of the human genome sequence in 2000. The team also created and maintains the UC Santa Cruz Genome Browser, a Web-based tool that is used extensively in biomedical research and serves as the platform for several large-scale genomics projects. His group's contributions to cancer genomics research include creation of a Cancer Genomics Browser for analyzing data from large-scale cancer studies.

Haussler's group built CGHub to support all three major NCI cancer genome sequencing programs: The Cancer Genome Atlas (TCGA), Therapeutically Applicable Research to Generate Effective Treatments (TARGET) and the Cancer Genome Characterization Initiative (CGCI). TCGA is a collaborative effort led by NCI and the National Human Genome Research Institute to map the genomic changes that occur in at least 20 major types and subtypes of adult cancer. The TARGET program is a related effort focusing on the five most common childhood cancers, and the CGCI makes available genomic data from HIV-associated cancers and certain lymphoid and childhood cancers.

These programs are laying the foundation for personalized cancer care by creating a database that scientists around the world can use to connect specific genomic changes with clinical outcomes. Haussler's group has been closely involved in data analysis for TCGA.

"TCGA is allowing us for the first time to look at cancer in full molecular detail," Haussler said. "Cancer is a disease caused by disruption of DNA molecules within the cell. When life starts, every cell in the body has the same DNA. In the course of a person's lifetime, however, some cells may accumulate changes in their DNA that cause them to go rogue and multiply without control, creating the disease we call cancer. For the first time now, we are able to look into an individual patient's cancer cells and see all the genetic disruptions, among which are the molecular drivers of that person's cancer."

There are currently only a few situations in which doctors can prescribe a treatment plan based on the specific genetic mutations in a patient's cancer cells. That is expected to change as projects like TCGA, TARGET, and CGCI yield a comprehensive catalog that researchers can use to find new targets for medicines and discover clues to improve patient outcomes. But there is an urgent need for an efficient and user-friendly portal to give researchers access to the data. The NCI genome projects are producing staggering amounts of data.

"The scale of this is far beyond anything faced in medical research before," Haussler said. "Each genome file, the DNA record from a tumor or normal tissue, is 300 billion bytes. And for every case there are two of these files, the cancer genome and the normal genome. Add to this RNA sequence data, and the prospect of deeper sequencing in the future, and we must plan for up to a terabyte (1,000 billion bytes) for each case."

TCGA currently generates about 10 terabytes of data each month. For comparison, the Hubble Space Telescope amassed about 45 terabytes of data in its first 20 years of operation. TCGA's output will increase tenfold or more over the next two years. Over the next four years, if the project produces a terabyte of DNA and RNA data from each of more than 10,000 patients, it will have produced 10 petabytes of data (a petabyte is 1,000 terabytes). And TCGA is just the beginning of the data deluge, Haussler said, noting that 10,000 cases is a small fraction of the 1.5 million new cancer cases diagnosed every year in the United States alone.
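The storage figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the CGHub software; the variable and function names are illustrative assumptions, and the units are decimal, matching the article's definitions (a terabyte is 1,000 billion bytes, a petabyte is 1,000 terabytes).

```python
# Decimal units, matching the definitions given in the article.
GB = 10**9    # one billion bytes
TB = 10**12   # 1,000 billion bytes
PB = 10**15   # 1,000 terabytes

# Figures quoted in the article (names are illustrative, not CGHub identifiers).
GENOME_FILE_BYTES = 300 * GB       # one genome file
FILES_PER_CASE = 2                 # cancer genome + normal genome
PLANNED_BYTES_PER_CASE = 1 * TB    # planning figure incl. RNA and deeper sequencing
PLANNED_CASES = 10_000             # patients over the next four years
CURRENT_MONTHLY_BYTES = 10 * TB    # TCGA output per month at time of writing


def genome_bytes_per_case():
    """Bytes for the two genome files alone, before RNA data."""
    return GENOME_FILE_BYTES * FILES_PER_CASE


def projected_total_bytes(cases=PLANNED_CASES, per_case=PLANNED_BYTES_PER_CASE):
    """Total bytes if every case uses the full planning figure."""
    return cases * per_case


def months_at_constant_rate(total_bytes, monthly_bytes=CURRENT_MONTHLY_BYTES):
    """Months needed to produce total_bytes if output never increased."""
    return total_bytes // monthly_bytes


if __name__ == "__main__":
    print(genome_bytes_per_case() / GB)                      # 600.0
    print(projected_total_bytes() / PB)                      # 10.0
    print(months_at_constant_rate(projected_total_bytes()))  # 1000
```

The two genome files account for 600 billion bytes of the one-terabyte planning figure per case, and 10,000 cases at that figure give the 10 petabytes cited. The last line shows that the current rate of 10 terabytes per month would need 1,000 months to reach that total, which is consistent with the article's statement that output must grow tenfold or more.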

New data compression schemes are expected to reduce the total storage space needed, so the CGHub repository is designed initially to hold 5 petabytes and to allow further growth as needed. That is still a massive amount of data, and CGHub will need to accommodate transfers of extremely large data files.

Managed by the UC Santa Cruz team, the CGHub computer system is located at the San Diego Supercomputer Center. It is connected by high-performance national research networks to major centers nationwide that are participating in these projects, including UC Santa Cruz. Haussler's team designed and oversees the storage and computing infrastructure for the repository, which has an automated query and download interface for large-scale, high-speed use. It will eventually also include an interactive web-based interface to allow researchers to browse and query the system and download custom datasets.

It may take years for cancer genomics research to bring about major changes in cancer care. The first step, and the focus of the NCI cancer genomics programs, is to determine which genomic changes are involved in each type of cancer and to understand the molecular and clinical effects of those changes. Then biomedical researchers must identify or develop treatments to block those effects.

"Right now, cancer research needs something on a very large scale, like the Large Hadron Collider in physics," Haussler said. "Instead of bringing subatomic particles together in high-energy collisions and computing their behavior, we're bringing cancer genomes together in a common database and computing the disease drivers."

The CGHub program director is Robert Zimmerman; project team members include technical director Mark Diekhans; operations manager Linda Rosewood; hardware systems lead Erich Weiler; engineering lead Chris Wilks; engineering consultant Brian Craft; and networking consultants Brad Smith and Jim Warner. The core code, including GT software for downloading data, was licensed from Annai Systems. The cancer genomics group at UC Santa Cruz also includes co-principal investigator Joshua Stuart, an associate professor of biomolecular engineering at UC Santa Cruz; assistant research scientist Jing Zhu; engineers Kyle Ellrott, Teresa Swatloski and Singer Ma; user testing engineer Mary Goldman; postdoctoral scholars Adam Ewing, Benedict Paten and Daniel Zerbino; research associate Charlie Vaske; and graduate students Tracy Ballinger, Steve Benz, Daniel Carlin, James Durbin, Ted Goldstein, Mia Grifford, Sam Ng, Amie Radenbaugh, Zack Sanborn and Chris Szeto.
