Summary

Researchers developed ProtGPS, an AI tool that predicts protein localization in cells and how mutations affect disease. The model identifies functional disruptions and designs novel proteins for targeted therapies. This could revolutionize drug development, helping scientists create more effective treatments for diseases caused by protein mislocalization.

Key Takeaways

ProtGPS predicts protein localization, offering insights into function and disease.

The AI model identifies mutations that alter localization, revealing potential disease mechanisms.

Researchers designed novel proteins that localize to specific compartments, aiding drug development and therapeutic design.





Proteins are the workhorses that keep our cells running, and there are many thousands of types of proteins in our cells, each performing a specialized function. Researchers have long known that the structure of a protein determines what it can do. More recently, researchers have come to appreciate that a protein’s localization is also critical for its function. Cells are full of compartments that help to organize their many denizens. Along with the well-known organelles that adorn the pages of biology textbooks, these spaces also include a variety of dynamic, membrane-less compartments that concentrate certain molecules together to perform shared functions. Knowing where a given protein localizes, and which proteins it co-localizes with, can therefore be useful for better understanding that protein and its role in the healthy or diseased cell, but researchers have lacked a systematic way to predict this information.





Meanwhile, protein structure has been studied for over half a century, culminating in the artificial intelligence (AI) tool AlphaFold, which can predict protein structure from a protein’s amino acid code, the linear string of building blocks within it that folds to create its structure. AlphaFold and models like it have become widely used tools in research.

Proteins also contain regions of amino acids that do not fold into a fixed structure, but are instead important for helping proteins join dynamic compartments in the cell. Whitehead Institute Member Richard Young and colleagues wondered whether the code in those regions could be used to predict protein localization in the same way that other regions are used to predict structure.





Other researchers have discovered some protein sequences that code for protein localization, and some have begun developing predictive models for protein localization. However, researchers did not know whether a protein’s localization to any dynamic compartment could be predicted based on its sequence, nor did they have a tool comparable to AlphaFold for predicting localization. Now, Young, also a professor of biology at the Massachusetts Institute of Technology (MIT), Young lab postdoc Henry Kilgore, Regina Barzilay, the School of Engineering Distinguished Professor for AI and Health at MIT’s Computer Science and Artificial Intelligence Laboratory, and colleagues have built such a model, which they call ProtGPS. In a paper published on February 6 in the journal Science, with first authors Kilgore and Barzilay lab graduate students Itamar Chinn, Peter Mikhael, and Ilan Mitnikov, the cross-disciplinary team debuts its model. The researchers show that ProtGPS can predict which of twelve known types of compartments a protein will localize to, as well as whether a disease-associated mutation will change that localization. Additionally, the research team developed a generative algorithm that can design novel proteins to localize to specific compartments.





“My hope is that this is a first step towards a powerful platform that enables people studying proteins to do their research,” Young says, “and that it helps us understand how humans develop into the complex organisms that they are, how mutations disrupt those natural processes, and how to generate therapeutic hypotheses and design drugs to treat dysfunction in a cell.”





The researchers also validated many of the model’s predictions with experimental tests in cells.





“It really excited me to be able to go from computational design all the way to trying these things in the lab,” Barzilay says. “There are a lot of exciting papers in this area of AI, but 99.9% of those never get tested in real systems. Thanks to our collaboration with the Young lab, we were able to test and really learn how well our algorithm is doing.”

Developing the model

The researchers trained and tested ProtGPS on two batches of proteins with known localizations. They found that it could predict with high accuracy where proteins end up. The researchers also tested how well ProtGPS could predict changes in protein localization based on disease-associated mutations within a protein. Many mutations (changes to the sequence of a gene and its corresponding protein) have been found, based on association studies, to contribute to or cause disease, but the ways in which the mutations lead to disease symptoms remain unknown.





Figuring out the mechanism for how a mutation contributes to disease is important because then researchers can develop therapies to fix that mechanism, preventing or treating the disease. Young and colleagues suspected that many disease-associated mutations might contribute to disease by changing protein localization. For example, a mutation could make a protein unable to join a compartment containing essential partners.





They tested this hypothesis by feeding ProtGPS more than 200,000 proteins with disease-associated mutations, and then asking it both to predict where those mutated proteins would localize and to measure how much its prediction changed for a given protein from the normal to the mutated version. A large shift in the prediction indicates a likely change in localization.
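
To make the idea of a "shift in the prediction" concrete, here is a minimal Python sketch of one way such a comparison could be scored. It is not the ProtGPS code or its actual metric, which the article does not describe. The sketch assumes that a model returns one score per compartment and that the scores sum to 1, and it uses half the summed absolute difference between the normal and mutated predictions as the shift. The function names, compartment names, variant names, and numbers are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Assumption: a prediction maps compartment name -> score, with scores summing to 1.
Scores = Dict[str, float]


def localization_shift(wild_type: Scores, mutant: Scores) -> float:
    """Half the summed absolute difference between two score distributions.

    Ranges from 0 (identical predictions) to 1 (no overlap), provided each
    distribution sums to 1.
    """
    compartments = set(wild_type) | set(mutant)
    return 0.5 * sum(
        abs(wild_type.get(c, 0.0) - mutant.get(c, 0.0)) for c in compartments
    )


def rank_variants(
    wild_type: Scores, variants: Dict[str, Scores]
) -> List[Tuple[str, float]]:
    """Order variants from largest to smallest predicted shift."""
    shifts = [
        (name, localization_shift(wild_type, scores))
        for name, scores in variants.items()
    ]
    return sorted(shifts, key=lambda item: item[1], reverse=True)


# Made-up scores for one hypothetical protein and two hypothetical variants.
wild_type = {"nucleolus": 0.7, "nuclear_speckle": 0.2, "stress_granule": 0.1}
variants = {
    "variant_A": {"nucleolus": 0.2, "nuclear_speckle": 0.3, "stress_granule": 0.5},
    "variant_B": {"nucleolus": 0.68, "nuclear_speckle": 0.21, "stress_granule": 0.11},
}

for name, shift in rank_variants(wild_type, variants):
    print(f"{name}: shift = {shift:.2f}")
```

In this toy example, variant_A moves most of its score from one compartment to another and ranks first, while variant_B barely changes. A real analysis would also need a threshold or a statistical baseline to decide which shifts are large enough to matter, which this sketch does not attempt.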