Noisy Data Facilitates Investigation of Breast Cancer Gene Expression
News Jan 23, 2015
Researchers from Dartmouth's Norris Cotton Cancer Center, led by Casey S. Greene, PhD, reported at the Pacific Symposium on Biocomputing on using denoising autoencoders (DAs) to extract key biological principles from gene expression data and summarize them into constructed features with convenient properties.
"Cancers are very complex," explained Greene. "Our goal is to measure which genes are being expressed, and to what extent they're being expressed, and then automatically summarize what the cancer is doing and how we might control it."
Normally, it is difficult to apply computational models across different studies because gene expression data are "noisy," meaning that many factors differ in the way gene expression is measured from one study to the next. To begin their analysis, Greene's team added more noise to the data and then trained a computer to remove it. To remove the noise, the computer had to learn key underlying features of breast cancer. "This approach of removing noise makes the models we constructed more generally applicable," Greene said.
Greene and the Dartmouth team studied DAs as a method to identify and extract complex patterns from genomic data. DAs train computers directly on the data, without requiring researchers to supply known biological principles. The model that the computer constructs can then be compared with previous discoveries to see where the data support those discoveries and where they raise new questions. The team evaluated the performance of DAs by applying them to a large collection of breast cancer gene expression data. The results show that DAs were able to recognize changes in gene expression that corresponded to the cancers' molecular and clinical information.
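For readers who want to see the add-noise-then-remove-it idea concretely, the following is a minimal sketch of a denoising autoencoder in plain Python. It is not the Dartmouth team's code. The two-subtype toy data, the six "genes," the masking noise, the tied weights, and every function name and parameter value here are assumptions chosen for illustration only.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def corrupt(x, noise_level, rng):
    """Masking noise: set each value to 0 with probability noise_level."""
    return [0.0 if rng.random() < noise_level else v for v in x]


def forward(W, b, c, x):
    """Encode x into hidden features h, then decode h into a reconstruction y.
    The decoder reuses the encoder weights W (tied weights, an assumption)."""
    h = [sigmoid(sum(w * v for w, v in zip(row, x)) + bj) for row, bj in zip(W, b)]
    y = [sigmoid(sum(W[j][i] * h[j] for j in range(len(W))) + c[i])
         for i in range(len(x))]
    return h, y


def cross_entropy(x, y):
    eps = 1e-9
    return -sum(xi * math.log(yi + eps) + (1 - xi) * math.log(1 - yi + eps)
                for xi, yi in zip(x, y))


def mean_loss(W, b, c, data):
    """Average reconstruction error when the model is given clean inputs."""
    return sum(cross_entropy(x, forward(W, b, c, x)[1]) for x in data) / len(data)


def init_params(n_genes, n_hidden, rng):
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_genes)] for _ in range(n_hidden)]
    return W, [0.0] * n_hidden, [0.0] * n_genes


def train(W, b, c, data, rng, noise_level=0.1, lr=0.1, epochs=300):
    """Stochastic gradient descent; updates W, b and c in place."""
    n_hidden, n_genes = len(W), len(c)
    for _ in range(epochs):
        for x in data:
            xc = corrupt(x, noise_level, rng)          # add noise
            h, y = forward(W, b, c, xc)                # try to remove it
            dy = [yi - xi for yi, xi in zip(y, x)]     # error against the CLEAN x
            dh = [sum(W[j][i] * dy[i] for i in range(n_genes)) * h[j] * (1 - h[j])
                  for j in range(n_hidden)]
            for j in range(n_hidden):
                for i in range(n_genes):
                    W[j][i] -= lr * (dh[j] * xc[i] + h[j] * dy[i])
                b[j] -= lr * dh[j]
            for i in range(n_genes):
                c[i] -= lr * dy[i]


def make_data(rng, n_per_group=4):
    """Hypothetical toy data: two 'subtypes' with opposite expression patterns."""
    patterns = [[0.9, 0.9, 0.9, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.9, 0.9, 0.9]]
    return [[v + rng.uniform(-0.05, 0.05) for v in p]
            for p in patterns for _ in range(n_per_group)]


rng = random.Random(0)
data = make_data(rng)
W, b, c = init_params(n_genes=6, n_hidden=2, rng=rng)
before = mean_loss(W, b, c, data)
train(W, b, c, data, rng)
after = mean_loss(W, b, c, data)
print("reconstruction error before training:", round(before, 3))
print("reconstruction error after training: ", round(after, 3))
```

The key design point is in `train`: the model sees the corrupted input `xc` but is scored against the clean input `x`, so it cannot succeed by copying its input and has to pick up structure shared across samples. In this sketch the hidden values `h` play the role of the "constructed features" mentioned above.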
"These techniques and findings will enable others to use the DAs to evaluate gene expression data in a variety of disease sites," reported Greene. "While noise in data is usually viewed as a problem, adding noise to data can actually be a good thing because it can help reveal the underlying signal. When we did this to analyze data from breast cancers, we found gene expression features that generalize across studies and represent important clinical factors."
Next for Greene's research team are more complex models that take multiple levels of regulation into account. Their goal is to develop methods that not only model data but that can automatically explain to researchers what the models have learned.
Dr. Greene and his team of investigators do their research at Dartmouth's Norris Cotton Cancer Center in Hanover and Lebanon, New Hampshire. Their work is supported in part by NIH grant P20 GM103534 and American Cancer Society grant #IRG-82-003-27.