How might artificial intelligence (AI) impact agriculture, the food industry, and the field of bioengineering? Dan Jacobson, a research and development staff member in the Biosciences Division at the US Department of Energy’s (DOE’s) Oak Ridge National Laboratory (ORNL), has a few ideas.
For the past 5 years, Jacobson and his team have studied plants to understand the genetic variables and patterns that make them adaptable to changing environments and climates. As a computational biologist, Jacobson uses some of the world’s most powerful supercomputers for his work, including the recently decommissioned Cray XK7 Titan and the IBM AC922 Summit, the world’s most powerful and smartest supercomputer for open science. Both are located at the Oak Ridge Leadership Computing Facility (OLCF), a DOE Office of Science User Facility at ORNL.
Last year, Jacobson and his team won an Association for Computing Machinery Gordon Bell Prize after using a special computing technique known as “mixed precision” on Summit to become the first group to reach exascale speed—approximately a quintillion calculations per second.
Jacobson’s team is currently working on numerous projects that form an integrated roadmap for the future of AI in plant breeding and bioenergy.
“We’ve been working on a couple of things. Recently, we’ve developed new ways to do what’s called ‘genomic selection,’ or designing an organism for breeding purposes. We’ve developed a new genomic selection algorithm that’s driven by emerging machine learning methods collectively called ‘explainable AI,’ a field that improves on black-box classifier AI methods by attempting to understand how these algorithms make decisions,” says Jacobson.
“This algorithm helps us determine which variations in a genome we need to combine to produce plants that can adapt to their environments. This informs breeding efforts, gene editing efforts, or combinations of those, depending on what sort of bioengineering strategy you want to take.”
Last year, Jacobson earned a Gordon Bell Prize after breaking the exascale barrier with a code that allows study of the combinatorial interactions between organisms and their environments. “We’re still using the model we used last year, but now, we’ve introduced this AI-driven genomic selection algorithm to our Combinatorial Metrics [CoMet] code and we’re feeding it environmental information across every day of a year, so we can do genome-wide association studies across climate time.
“Additionally, we’ve expanded to a global scale our efforts in what we’re calling ‘climatypes,’ the climate and environmental information that plants are adapting to. With the help of ORNL’s Peter Thornton and his group’s expertise in biogeography and climate, we built models of every square kilometer of land on the planet and encoded 50 years of environmental and climate data into these models, ranging from the soil up through light spectral quality and everything in between.
“To understand all the relationships between different environments, we compared these environments to each other on Summit using a new algorithm called Duo that we added to our CoMet code base. To our knowledge, this is the largest scientific calculation ever done.”
Jacobson believes these comparisons are highly useful: “These comparisons can help us determine exactly where we can target certain environments and what gene mutations and alleles we need to include to help these plants adapt to different environments. We can look at an environment and say, ‘For this environment, this is what we’re going to need to have in this plant’s genome for it to thrive as well as it can.’
“The integrated vision that we see is the connection of all the ‘-omics’ layers, from genomics (gene expression), proteomics (protein expression), and metabolomics (metabolite expression) all the way up through phenotypes, or observable traits; so, from genome to phenome and everything in between.
“Ideally, we’d like to have a combination of genotype data with climate and environmental data in an integrated model, from single nucleotides (the molecular structures that make up DNA) up to environment and climate at the planetary scale. Such comprehensive integrated models are now possible because we’ve actually calculated the light spectral scale of every point on the planet; that’s an astrophysical phenotype that comes from our nearest star, the Sun.
“First, we need to look at the combinatorial interactions in such models to see how they lead to the emergent properties that we’re trying to optimize in plants for future productivity and sustainability. Then, we can connect that with how plants have historically adapted to environments in order to design new ideal genotypes for bioenergy or food production that are optimized to thrive in specific environments.
“The next step is to look at the historical data and all these relationships and then project forward so that we can actually design genotypes that will not only thrive in the current environmental zones but continue to thrive in the future as the global network changes. The ability to project forward, for both annual crops and long-term perennial crops, is really important,” says Jacobson.
Harfouche et al. (2019) Accelerating Climate Resilient Plant Breeding by Applying Next-Generation Artificial Intelligence. Trends in Biotechnology. DOI: https://doi.org/10.1016/j.tibtech.2019.05.007