Purdue University is working with Argonne National Laboratory to supercharge a "metagenomics data" system that could bring innovations in personalized medicine, better management of oil spills and other clinical and ecological applications.
The complex human intestine contains more than 1,000 microbial species, and metagenomics concerns the study of this microbiome, which plays an important role in protecting against disease, modulating immunity and regulating metabolic processes possibly related to obesity and diabetes.
In 2008, Argonne created MG-RAST, an open portal for metagenomics data processing. This pipeline of hardware and software provides access to a treasure trove of genetic information on microbes that could have implications for personalized medicine tailored to individual patients; bio-enzymes for cleaning up contamination from oil spills; and the sustainability of operations such as mineral mining.
"In metagenomics, for example, you take a culture of the gut, you bring it into the lab and do the genetic analysis, and it gives you a profile of the bacteria and other microbes that are present," said Ananth Grama, a Purdue professor of computer science. "Then you can compare that profile with MG-RAST data to diagnose the patient."
The portal, however, is straining to keep up with a steady increase in demand, due in large part to dramatic advances in commercially available gene-sequencing instruments. The research team will work to solve this problem in a project funded by a five-year, $3.8 million grant from the National Institutes of Health.
The project involves Grama and co-principal investigators Saurabh Bagchi, a professor in the School of Electrical and Computer Engineering and Department of Computer Science, and Somali Chaterji, a visiting faculty member in the Department of Computer Science. The Purdue researchers will collaborate with an Argonne team led by Folker Meyer, a computational biologist at Argonne and a senior fellow at the Computation Institute at the University of Chicago.
"The analysis of metagenomic data has tremendous potential in both clinical and ecological applications," Bagchi said. "However, since its original design, MG-RAST has witnessed the frenetic development of next-generation sequencing technologies, a drastically altered computing landscape and other challenges."
Meyer, who is deputy director of the Biosciences division and a part of the Mathematics and Computer Science division at Argonne, said, "This grant will bring significant upgrades to the MG-RAST software infrastructure and harnesses the expertise of the Purdue team in distributed infrastructures and robust software development."
One goal is to develop a "federated infrastructure" that allows MG-RAST to seamlessly farm jobs out to various computer clusters around the country and harness the large distributed computing power that exists through federal investments by the National Science Foundation, U.S. Department of Energy and NIH.
"To do that you need federation software to figure out how to do all this data transfer back and forth and make sure all the credentials line up so the user doesn't have to log in separately to all these different machines," Bagchi said. "When a new job comes in, they want to be able to figure out which machines to assign it to. The correct choice may mean the difference between taking 10 hours to run or one hour to run."
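The assignment problem Bagchi describes can be illustrated with a minimal sketch. It is not MG-RAST code; the class names, fields and cost model below are assumptions made purely for illustration. The sketch estimates how long a job would take on each cluster (queue wait plus data transfer plus compute) and picks the cluster with the lowest estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    # Hypothetical description of one cluster in the federation.
    name: str
    queue_wait_hours: float      # expected wait before the job starts
    free_cores: int              # cores the job could use at once
    transfer_gb_per_hour: float  # bandwidth for moving input data there


@dataclass(frozen=True)
class Job:
    # Hypothetical description of one analysis job.
    input_gb: float    # size of the sequence data to transfer
    core_hours: float  # total compute work, assumed perfectly parallel


def estimated_hours(job: Job, cluster: Cluster) -> float:
    """Rough completion time: queue wait + data transfer + compute."""
    transfer = job.input_gb / cluster.transfer_gb_per_hour
    compute = job.core_hours / cluster.free_cores
    return cluster.queue_wait_hours + transfer + compute


def pick_cluster(job: Job, clusters: list[Cluster]) -> Cluster:
    """Return the cluster with the smallest estimated completion time."""
    if not clusters:
        raise ValueError("no clusters available")
    return min(clusters, key=lambda c: estimated_hours(job, c))


# Illustrative numbers only, not measurements from any real system.
job = Job(input_gb=100.0, core_hours=64.0)
small = Cluster("small-idle", queue_wait_hours=0.0, free_cores=8,
                transfer_gb_per_hour=100.0)
large = Cluster("large-busy", queue_wait_hours=0.5, free_cores=128,
                transfer_gb_per_hour=400.0)
best = pick_cluster(job, [small, large])
```

With these made-up numbers, the idle but small cluster is estimated at 9 hours and the busier but larger one at 1.25 hours, so the larger cluster is chosen. A real federation layer would also have to handle credentials, failures and changing load, which this sketch ignores.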
The team will collaborate with researchers in the University of Chicago's School of Medicine to test a pilot-scale system.
"Overall, we aim to improve the pipeline's functionality and data reproducibility, improve MG-RAST's software quality and performance through automated generation of testing suites and move toward a federated infrastructure to cater to a diverse and ever-growing user base," Chaterji said.