A new solution to an old, complex problem?
Proteins are molecular machines that carry out the physiological processes underpinning life, both in humans and in other organisms. Proteomics, the study of the proteome, focuses on identifying proteins and characterizing and analyzing their biology, and the field has grown and advanced at an impressive rate in recent years.
The vast and varied functionality of proteins is largely related to their shape and structure. Proteins fold into very specific shapes and structures that dictate exactly how they interact with other molecules. Take pharmacology, for example: almost all pharmaceutical drugs elicit their effects by targeting proteins in the human body. Thus, determining protein structure is a fundamental component of proteomics research, and has wide-ranging applications. However, this has been no easy feat, owing to the large number of proteins that exist and the myriad shapes they can adopt.
"We have been stuck on this one problem – how do proteins fold up – for nearly 50 years. To see DeepMind produce a solution for this, having worked personally on this problem for so long and after so many stops and starts, wondering if we’d ever get there, is a very special moment," – Professor John Moult, co-founder and chair of CASP, in a press release.
Over the years, an array of analytical technologies has been developed to try to solve the problem, including X-ray crystallography, cryo-electron microscopy and mass spectrometry-based approaches. However, these methods can be complex and costly, and an entire research project, a PhD for example, can be dedicated to determining the structure of a single protein.
AlphaFold uses AI to predict a protein's structure and shape by treating the protein as a "spatial graph". "We created an attention-based neural network system, trained end-to-end, that attempts to interpret the structure of this graph, while reasoning over the implicit graph that it’s building. It uses evolutionarily related sequences, multiple sequence alignment (MSA), and a representation of amino acid residue pairs to refine this graph," said AlphaFold's developers.
Professor John Moult and Professor Krzysztof Fidelis founded the Critical Assessment of protein Structure Prediction (CASP) in 1994 to catalyze research in protein structure prediction. CASP selects protein structures that have recently been determined experimentally as targets against which research groups can test the accuracy of their prediction methods. The scoring metric, known as the Global Distance Test (GDT), ranges from 0 to 100, where a score of around 90 is informally considered competitive with results obtained from experimental methods. AlphaFold achieved a median score of 92.4 GDT across all targets.
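The article does not spell out how GDT is calculated. As a rough illustration only, the commonly used GDT_TS variant averages the percentage of residues whose predicted alpha-carbon position lies within 1, 2, 4 and 8 Å of the experimental position. The sketch below assumes the two structures have already been superimposed; the full method also searches over many superpositions to find the best score, which is omitted here. The function name `gdt_ts` and the coordinates are hypothetical and are not taken from CASP or DeepMind code.

```python
import math

# Distance cutoffs in angstroms used by the GDT_TS variant of the score.
CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(predicted, reference, cutoffs=CUTOFFS):
    """Simplified GDT_TS for two already-superimposed C-alpha traces.

    predicted, reference: equal-length lists of (x, y, z) tuples in angstroms.
    Assumption: no superposition search is performed, unlike the full method.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Per-residue distance between predicted and experimental positions.
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    # Fraction of residues falling within each cutoff.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    # Average the fractions and express the result on a 0-100 scale.
    return 100.0 * sum(fractions) / len(fractions)


# Toy four-residue example: prediction errors of 0.5, 1.5, 3.0 and 10.0 angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]
score = gdt_ts(predicted, reference)
print(score)
```

In this toy case one, two, three and three of the four residues fall within the respective cutoffs, so the score is the average of 25%, 50%, 75% and 75%. A perfect prediction would score 100.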
According to its developers, the system can produce a highly accurate prediction of a protein's physical structure in a matter of days.
The developers said, "We trained this system on publicly available data consisting of ~170,000 protein structures from the protein data bank together with large databases containing protein sequences of unknown structure. It uses approximately 128 TPUv3 cores (roughly equivalent to ~100-200 GPUs) run over a few weeks, which is a relatively modest amount of compute in the context of most large state-of-the-art models used in machine learning today."
"This computational work represents a stunning advance on the protein-folding problem, a 50-year-old grand challenge in biology. It has occurred decades before many people in the field would have predicted. It will be exciting to see the many ways in which it will fundamentally change biological research," – Professor Venki Ramakrishnan, Nobel Laureate and president of the Royal Society, in a press release.
Expanding the frontiers of scientific knowledge
"AlphaFold’s astonishingly accurate models have allowed us to solve a protein structure we were stuck on for close to a decade, relaunching our effort to understand how signals are transmitted across cell membranes," said Professor Andrei Lupas, director at the Max Planck Institute for Developmental Biology.
In their announcement, the developers nod to the potential utility of protein structure prediction systems in future pandemic response strategies. Characterizing the structure of SARS-CoV-2's proteins, and of the human proteins with which the virus interacts to infect host cells, has been a major research focus for many groups over the last few months. "Earlier this year, we predicted several protein structures of the SARS-CoV-2 virus, including ORF3a, whose structures were previously unknown," DeepMind said.
Whilst the data underpinning this announcement has not yet been published, it is already garnering excitement in the scientific community.
"The progress announced today gives us further confidence that AI will become one of humanity’s most useful tools in expanding the frontiers of scientific knowledge, and we’re looking forward to the many years of hard work and discovery ahead!" DeepMind conclude.
1. Jumper J et al. High Accuracy Protein Structure Prediction Using Deep Learning. Fourteenth Critical Assessment of Techniques for Protein Structure Prediction (Abstract Book). https://predictioncenter.org/casp14/doc/CASP14_Abstracts.pdf. Accessed November 30, 2020.