Tennessee To Participate In NSF ‘Big Data’ Genomics Research Project
News Oct 17, 2014
The National Science Foundation is supporting the data-driven genomic science community with $31 million in awards to develop tools, cyberinfrastructure and best practices for data science, and a researcher with the University of Tennessee Institute of Agriculture has been named as a co-principal investigator on one of the funded projects.
Margaret Staton, an assistant professor of bioinformatics in the Department of Entomology and Plant Pathology, will work with a team led by researchers from Washington State University on a $1.3 million grant from the Data Infrastructure Building Blocks (DIBBS) program. The Tripal Gateway project, one of 17 awarded by the NSF, is expected to build on existing cyberinfrastructure to enhance the capacity of genomic databases to manage, exchange and process big data.
So, just what is “big data”? In this research, “big data” refers to the massive amounts of information being collected on the genomics of plants: data sets so large that a single researcher or research group would have difficulty storing and analyzing all the data for a single plant, much less for multiple individuals or multiple species.
The Tripal Gateway project is based on open-source software known as Tripal (http://tripal.info), which was originally developed by WSU’s Steve Ficklin and Staton while the two were at Clemson University. Dorrie Main, also of WSU, and Kirsten Bett at the University of Saskatchewan, expanded the software, and Tripal is now used by at least 24 different plant and animal databases.
Under the new NSF-funded effort, Staton will implement the enhanced Tripal software for the Hardwood Genomics website (www.hardwoodgenomics.org). Like the now-complete Human Genome Project, the Hardwood Genomics Project seeks to create genomic resources, in this case for the most economically and phylogenetically important hardwood species in North America, including sugar maple, tulip poplar, and ash and oak species. Scott Schlarbaum, director of the UT Tree Improvement Program and a professor in the Department of Forestry, Wildlife and Fisheries, was among the scientists who established the Hardwood Genomics website.
“This is a great example of UTIA's breadth of research from the field to the lab to the computer. The UT Tree Improvement Program is critical for breeding, grafting, establishing and maintaining populations of trees that are used for DNA sequencing, RNA sequencing and genetic mapping. The data from those experiments is analyzed through computational resources and shared with other researchers and the public via the website, all with computational resources housed here.”
Staton explains that many of the scientists who use the Hardwood Genomics website want to embrace the new wave of "big data," but they lack the computational capabilities to process and understand huge data sets. “This grant is funding the development of new analysis software that will enable the website users to perform cutting-edge genetic and genomic science with many different types of biological data,” she said.
“Our group of online resources for plant scientists will be expanded to offer flexible, web-based data analysis tools. Our users will not just be able to download, search or view data, but to upload and perform analysis on their own datasets,” Staton added.
To get users started with the website's expected new capabilities, Staton will develop online educational materials for using the new software tools and offer training workshops at professional conferences. She hopes these tools will help promote the utility of the website to the research community. In addition, working with other scientists participating in the grant, Staton will help create, test and integrate cross-database mining capabilities and SSWAP (Simple Semantic Web Architecture and Protocol) services for the hardwoods database. She will also solicit feedback from website users to determine which services and research tools are most important for scientists to accomplish their research goals.
The ultimate goal of collecting and analyzing all this big data, says Staton, is forest health and tree sustainability.