New Statistical Tools Being Developed for Mining Cancer Data

Published: Monday, December 02, 2013
Last Updated: Monday, December 02, 2013
Team from Rice, BCM, UT Austin tackling big data variety.

Researchers at Rice University, Baylor College of Medicine (BCM) and the University of Texas at Austin are working together to create new statistical tools that can find clues about cancer that are hidden like needles in enormous haystacks of raw data.

“The motivation for this is all of these new high-throughput medical technologies that allow clinicians to produce tons of molecular data about cancer,” said project lead Genevera Allen, a statistician with joint appointments at Rice and BCM. “For example, when a tumor is removed from a cancer patient, researchers can conduct genomic, proteomic and metabolomic scans that measure nearly every possible aspect of the tumor, including the number and location of genetic mutations and which genes are turned off and on. The end result is that for one tumor, you can have measurements on millions of variables.”

This type of data exists — the National Institutes of Health (NIH) has compiled such profiles for thousands of cancer patients — but scientists don’t yet have a way to use the data to defeat cancer.

Allen and BCM collaborator Zhandong Liu teamed up to attack the problem in 2012 thanks to a seed-funding grant from Rice’s Ken Kennedy Institute for Information Technology (K2I). Based on results of that study, Allen, Liu and UT Austin computer scientist Pradeep Ravikumar have won a new $1.3 million federal grant that will allow them to create a new statistical framework for integrated analysis of multiple sets of high-dimensional data measured on the same group of subjects.

“There are a couple of things that make this challenging,” said Allen, the principal investigator (PI) on the new grant, which was awarded jointly by the National Science Foundation and the NIH. “First, the data produced by these high-throughput technologies can be very different, so much so that you get into apples-to-oranges problems when you try to make comparisons. Second, for scientists to leverage all of this data and better understand the molecular basis of cancer, these varied ‘omics’ data sets need to be combined into a single multivariate statistical model.”

For example, Allen said, some tests, like gene-expression microarrays and methylation arrays, return “continuous data,” numbers with decimal places that represent the amounts of a particular protein or biomarker. Other tests, like RNA-sequencing, return “count data,” integers that indicate how often a biomarker shows up. And for yet other tests, the output is “binary data.” An example of this would be a test for a specific mutation that produces a zero if the mutation does not occur and a one if it does.

“Right now, the state of the art for analyzing these millions of biomarkers would be to create one data matrix — think one Excel spreadsheet — where all the numbers are continuous and can be represented with bell-shaped curves,” said Allen, Rice’s Dobelman Family Junior Chair of Statistics and assistant professor of statistics and electrical and computer engineering. “That’s very limiting for two reasons. First, for all noncontinuous variables — like the binary value related to a specific mutation — this isn’t useful. Second, we don’t want to just analyze the mutation status by itself. It’s likely that the mutation affects a bunch of these other variables, like epigenetic markers and which genes are turned on and off. Cancer is complex. It’s the result of many things coming together in a particular way. Why should we analyze each of these variables separately when we’ve got all of this data?”

Developing a framework where continuous and noncontinuous variables can be analyzed simultaneously won’t be easy. For starters, most of the techniques that statisticians have developed for parallel analysis of three or more variables — a process called multivariate analysis — only work for continuous data.

“It is a multivariate problem, and that’s how we’re approaching it,” Allen said. “But a proper multivariate distribution does not exist for this, so we have to create one mathematically.”

To do this, Allen, Liu and Ravikumar are creating a mathematical framework that will allow them to find the “conditional dependence relationships” between any two variables.

To illustrate how conditional dependence works, Allen suggested considering three variables related to childhood growth — age, IQ and shoe size. In a typical child, all three increase together.

“If we looked at a large dataset, we would see a relationship between IQ and shoe size,” she said. “In reality, there’s no direct relationship between shoe size and IQ. They happen to go up at the same time, but in reality, each of them is conditionally dependent upon age.”
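
The effect Allen describes can be reproduced in a small simulation. What follows is a minimal sketch, not the team's method: the sample size, coefficients, variable names and the `partial_corr` helper are all assumptions made for illustration, and "score" is a made-up raw test score standing in for the IQ of the quote. Partial correlation is used here as a simple stand-in for conditional dependence, which is reasonable only for roughly linear, continuous data.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y after linearly regressing z out of both.

    Hypothetical helper for illustration only.
    """
    design = np.column_stack([np.ones_like(z), z])  # intercept + z
    resid_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    resid_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(resid_x, resid_y)[0, 1])

rng = np.random.default_rng(0)
n = 5000

# Simulated children: age drives both other variables, which never
# influence each other directly (all coefficients are invented).
age = rng.uniform(4, 14, size=n)
shoe = 0.5 * age + rng.normal(0.0, 0.5, size=n)
score = 3.0 * age + rng.normal(0.0, 3.0, size=n)

marginal = float(np.corrcoef(shoe, score)[0, 1])  # ignores age
conditional = partial_corr(shoe, score, age)      # accounts for age

print(f"shoe size vs. score, ignoring age:    {marginal:.2f}")
print(f"shoe size vs. score, conditional on age: {conditional:.2f}")
```

In this simulated data the first number is large and the second is close to zero: once age is accounted for, shoe size carries no further information about the score.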

For cancer genes, where the relationships aren’t as obvious, developing a mathematical technique to decipher conditional dependence could avoid the need to rule out such errors through years of expensive and time-consuming biological experiments.

Thanks to the seed-funding grant from K2I’s Collaborative Advances in Biomedical Computing program, Allen and her collaborators have already illustrated how to use these techniques. They’ve produced a network model for a half-million biomarkers related to a type of brain cancer called glioblastoma. The model acts as a sort of road map to guide researchers to the relationships that are most important in the data.

“All these lines tell us which genetic biomarkers are conditionally dependent upon one another,” she said in reference to the myriad connections in the model. “These were all determined mathematically, but our collaborators will test some of these relationships experimentally and confirm that the connections exist.”
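
To give a sense of how such a network of lines can be determined mathematically, here is a hedged sketch for the simplest setting only: purely continuous, Gaussian data, where two variables are conditionally independent exactly when the corresponding entry of the inverse covariance matrix is zero. It is not the mixed-data framework the team is developing. The three variables `A`, `B`, `C`, the chain structure and the cutoff of 0.1 are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Invented chain: A influences B, and B influences C.
# A and C are related only through B.
a = rng.normal(size=n)
b = a + rng.normal(size=n)
c = b + rng.normal(size=n)

names = ["A", "B", "C"]
data = np.column_stack([a, b, c])

# Inverse of the sample covariance (the "precision" matrix).
precision = np.linalg.inv(np.cov(data, rowvar=False))

# Rescale to partial correlations: each pair's correlation given
# all remaining variables.
scale = np.sqrt(np.diag(precision))
partial = -precision / np.outer(scale, scale)

# Draw a line between two variables when their partial correlation
# is clearly nonzero (the 0.1 cutoff is arbitrary).
threshold = 0.1
edges = [
    (names[i], names[j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(partial[i, j]) > threshold
]
print(edges)
```

The resulting network links A to B and B to C but draws no line between A and C, even though A and C are correlated in the raw data. This direct inversion works only when there are many more samples than variables; with far more biomarkers than patients the sample covariance matrix cannot be inverted, so a sparse estimation method would be needed instead.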

Allen said the team’s technique will also be useful for big data challenges that exist in fields ranging from retail marketing to national security.

“This is a very general mathematical framework,” she said. “That’s why I do math. It works for everything.”

