Stanford/Packard Scientist's Data-Mining Technique Strikes Genetic Gold
News Jan 14, 2006
A method to mine existing scientific data may provide a wealth of information about the interactions among genes, the environment and biological processes, say researchers at the Stanford University School of Medicine, Lucile Packard Children's Hospital and Harvard Medical School.
Like panning for gold, they used the technique to sift through millions of bits of unrelated information - in this case, gene expression data from so-called microarray experiments - to pinpoint genes likely to be involved in leukemia, aging, injury and muscle development.
"This is just the tip of the iceberg," said bioinformatics specialist Atul Butte, MD, PhD, who is also a pediatrician at Lucile Packard Children's Hospital at Stanford.
"Nearly 100 different diseases have been studied using microarrays, spanning all of medicine."
"This is a new way to explore this type of data. We can study virtually everything that's been studied." Butte is the first author of the study, which is published in the Jan. 6 online issue of Nature Biotechnology.
"Libraries figured out a long time ago how to classify items using the Dewey decimal and other systems," said Butte, who estimates that the contents of the databases are doubling each year.
"We need to write software now that will help scientists assign the proper concepts to each experiment."
Butte and his Harvard co-author, Isaac Kohane, MD, PhD, used computer programs to automatically categorize the tens of thousands of microarray experiments in a single database based on the terms, or concepts, used by the submitter to describe the experiment.
They then looked for findings shared by several experiments with similar concepts, such as tissue type.
Comparing results from many similar experiments allowed them to identify correlations that may not be statistically significant in just one experiment.
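The general idea of grouping experiments by concept and pooling their evidence can be illustrated with a minimal sketch. It is not the authors' software: the concept vocabulary, the gene name, the p-values and the use of Fisher's method for combining independent p-values are all assumptions made here for illustration.

```python
import math
import re
from collections import defaultdict

# Hypothetical concept vocabulary (an assumption, not the study's actual terms).
CONCEPT_KEYWORDS = {
    "muscle": {"muscle", "myoblast"},
    "aging": {"aging", "aged"},
    "leukemia": {"leukemia", "leukemic"},
}

def tag_concepts(description):
    """Return the concepts whose keywords appear as whole words in the free text."""
    words = set(re.findall(r"[a-z]+", description.lower()))
    return {concept for concept, keywords in CONCEPT_KEYWORDS.items()
            if words & keywords}

def fisher_combined_p(p_values):
    """Combine independent p-values (each > 0) with Fisher's method.

    The statistic -2 * sum(ln p) follows a chi-square distribution with 2k
    degrees of freedom, whose upper tail has a closed form for even
    degrees of freedom, so no external library is needed.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    half = -sum(math.log(p) for p in p_values)  # half of the chi-square statistic
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total

def combine_by_concept(experiments, gene):
    """Group experiments by tagged concept and pool the evidence for one gene."""
    grouped = defaultdict(list)
    for experiment in experiments:
        if gene not in experiment["p"]:
            continue
        for concept in tag_concepts(experiment["description"]):
            grouped[concept].append(experiment["p"][gene])
    return {concept: fisher_combined_p(ps) for concept, ps in grouped.items()}

# Made-up example data: free-text descriptions and per-gene p-values.
EXPERIMENTS = [
    {"description": "Skeletal muscle biopsies from aged donors", "p": {"GENE_A": 0.08}},
    {"description": "Aging time course in mouse muscle", "p": {"GENE_A": 0.08}},
    {"description": "Leukemic blasts versus normal marrow", "p": {"GENE_A": 0.60}},
]

if __name__ == "__main__":
    for concept, p in sorted(combine_by_concept(EXPERIMENTS, "GENE_A").items()):
        print(f"{concept}: combined p = {p:.4f}")
```

In this toy data, neither aging experiment reaches the conventional 0.05 threshold on its own, but their combined p-value does. Fisher's method assumes the pooled experiments are independent, which real database submissions may not be.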
Butte and Kohane identified several previously unknown correlations: nine genes whose expression increased or decreased significantly with aging, two genes that are highly expressed in response to injury, and one gene whose expression drops significantly in leukemic cells.
They also confirmed these relationships by studying genes known to be associated with muscle tissue in both humans and mice.
"As a community, we've standardized the way the data itself is represented," said Butte, "but there are no formal requirements for the accompanying textual descriptions of this data."
"Sometimes people seem to almost copy and paste their entire scientific paper into the text box. We need to clean up our annotations because now we're showing that they have value."
"All the answers are already there," said Butte. "We've reached a critical mass with this data. But unless we're careful, we're going to end up with a big mess."