Noisy Data Facilitates Investigation of Breast Cancer Gene Expression
News Jan 23, 2015
Researchers from Dartmouth's Norris Cotton Cancer Center, led by Casey S. Greene, PhD, reported at the Pacific Symposium on Biocomputing on using denoising autoencoders (DAs) to extract key biological principles from gene expression data and summarize them as constructed features with convenient properties.
"Cancers are very complex," explained Greene. "Our goal is to measure which genes are being expressed, and to what extent they're being expressed, and then automatically summarize what the cancer is doing and how we might control it."
Normally, it is difficult to apply computational models across different studies because gene expression data are "noisy": many factors differ in how gene expression is measured from one study to the next. To begin their analysis, Greene's team deliberately added more noise to the data and then trained a computer to remove it. To do so, the computer had to learn key underlying features of breast cancer. "This approach of removing noise makes the models we constructed more generally applicable," Greene said.
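The "add noise, then learn to remove it" idea can be illustrated with a small program. The article does not describe the team's model architecture, so the following is only a minimal sketch of a generic denoising autoencoder, not the Dartmouth team's code. It assumes a single hidden layer with tied weights, "masking" noise (randomly zeroing a fraction of gene values), sigmoid units, and a cross-entropy loss on expression values scaled to [0, 1]. The function name, parameter values, and the synthetic data are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_denoising_autoencoder(X, n_hidden=5, corruption=0.1, lr=0.1,
                                epochs=300, seed=0):
    """Hypothetical sketch of a one-hidden-layer denoising autoencoder.

    X has shape (n_samples, n_genes) with values scaled to [0, 1].
    `corruption` is the assumed fraction of inputs zeroed out each epoch.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_genes = X.shape
    W = rng.normal(0.0, 0.1, size=(n_genes, n_hidden))  # shared encoder/decoder weights
    b_hidden = np.zeros(n_hidden)
    b_visible = np.zeros(n_genes)
    eps = 1e-9
    losses = []

    for _ in range(epochs):
        # 1. Add noise: randomly mask a fraction of each sample's gene values.
        X_noisy = X * (rng.random(X.shape) >= corruption)
        # 2. Encode the corrupted profile, then reconstruct it.
        H = sigmoid(X_noisy @ W + b_hidden)
        X_hat = sigmoid(H @ W.T + b_visible)
        # 3. Score the reconstruction against the CLEAN data, so the model
        #    is rewarded for removing the noise it was given.
        loss = -np.mean(np.sum(X * np.log(X_hat + eps)
                               + (1 - X) * np.log(1 - X_hat + eps), axis=1))
        losses.append(loss)
        # 4. Backpropagate (sigmoid + cross-entropy) and take a gradient step.
        d_out = (X_hat - X) / n_samples
        d_hidden = (d_out @ W) * H * (1 - H)
        W -= lr * (X_noisy.T @ d_hidden + d_out.T @ H)  # encoder + decoder terms
        b_hidden -= lr * d_hidden.sum(axis=0)
        b_visible -= lr * d_out.sum(axis=0)

    return W, b_hidden, losses


# Usage on synthetic data: 200 hypothetical samples x 50 genes driven by
# 2 hidden "programs" plus measurement noise (not real breast cancer data).
rng = np.random.default_rng(1)
programs = rng.random((2, 50))
activity = rng.random((200, 2))
X = np.clip(activity @ programs / 2 + rng.normal(0, 0.05, (200, 50)), 0, 1)

W, b_hidden, losses = train_denoising_autoencoder(X)
features = sigmoid(X @ W + b_hidden)  # constructed features, one row per sample
print(f"reconstruction loss: {losses[0]:.1f} -> {losses[-1]:.1f}")
print("feature matrix shape:", features.shape)
```

The key design choice is in step 3: the network only ever sees the corrupted input, but it is graded on how well it recovers the clean input, which pushes the hidden units toward patterns shared across many genes rather than toward any single noisy measurement. The rows of `features` are the kind of compact, constructed summary the article describes, here computed from clean data after training.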
Greene and the Dartmouth team studied DAs as a method to identify and extract complex patterns from genomic data. DAs train computers directly on the data, without requiring researchers to supply known biological principles. The model that the computer constructs can then be compared with previous discoveries to see where the data support those discoveries and where the data raise new questions. The team evaluated DAs by applying them to a large collection of breast cancer gene expression data. The results show that DAs recognized changes in gene expression that corresponded to the cancers' molecular and clinical information.
"These techniques and findings will enable others to use the DAs to evaluate gene expression data in a variety of disease sites," reported Greene. "While noise in data is usually viewed as a problem, adding noise to data can actually be a good thing because it can help reveal the underlying signal. When we did this to analyze data from breast cancers, we found gene expression features that generalize across studies and represent important clinical factors."
Next for Greene's research team are more complex models that take multiple levels of regulation into account. Their goal is to develop methods that not only model data but that can automatically explain to researchers what the models have learned.
Dr. Greene and his team of investigators do their research at Dartmouth's Norris Cotton Cancer Center in Hanover and Lebanon, New Hampshire. Their work is supported in part by NIH funding P20 GM103534 and the American Cancer Society Grant #IRG-82-003-27.