In biology, the binding of cellular proteins to molecules called ligands underlies a myriad of functions essential for life, including cell signaling and enzymatic action. In biotechnology and medicine, altering proteins to fine-tune their binding affinity and specificity can yield tailored therapeutics with reduced side effects, highly sensitive diagnostic tools, efficient biocatalysis, targeted drug delivery systems and sustainable bioremediation solutions.





Existing approaches to such protein redesign have drawbacks. Traditional methods rely on time-consuming trial and error, and many models in the emerging field of computational design demand extensive information about the protein structure and the pocket where a ligand binds.

Researchers led by Truong Son Hy, PhD, from the University of Alabama at Birmingham, offer a simplified method they call ProteinReDiff that uses artificial intelligence to speed the redesign of ligand-binding proteins.





“Our framework enables the design of high-affinity ligand-binding proteins without reliance on detailed structural information,” said Hy, an assistant professor in the UAB Department of Computer Science. “We rely solely on initial protein sequences and ligand SMILES strings.”





SMILES, or the Simplified Molecular Input Line Entry System, is a longstanding specification of the structure of molecules using only computer-readable ASCII characters.
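To make the idea concrete, here is a minimal Python sketch showing a few well-known SMILES strings. The molecules chosen, the dictionary name and the helper function are illustrative assumptions and do not come from the study; the sketch only checks that each string is plain printable ASCII and does not validate chemistry.

```python
# Illustrative only: textbook SMILES strings for a few common molecules.
# These examples are not taken from the ProteinReDiff study.
EXAMPLE_SMILES = {
    "ethanol": "CCO",
    "acetic acid": "CC(=O)O",
    "benzene": "c1ccccc1",
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
}


def is_plain_ascii(smiles: str) -> bool:
    """Return True if the string is non-empty, printable ASCII text.

    This is a format check only; it does not verify that the string
    describes a chemically valid molecule.
    """
    return bool(smiles) and smiles.isascii() and smiles.isprintable()


if __name__ == "__main__":
    for name, smiles in EXAMPLE_SMILES.items():
        print(f"{name:12s} {smiles:24s} ascii={is_plain_ascii(smiles)}")
```

A real pipeline would typically parse such strings with a cheminformatics toolkit rather than treat them as raw text.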





“A key feature of our method is blind docking, which predicts how the redesigned protein interacts with its ligand without the need for predefined binding site information,” Hy said. “This streamlined approach significantly reduces reliance on detailed structural data, thus expanding the scope for sequence-based exploration of protein-ligand interactions.”





The researchers, including Viet Thanh Duy Nguyen of the FPT Software AI Center in Ho Chi Minh City, Vietnam, and Nhan D. Nguyen of the University of Chicago, trained the artificial intelligence framework ProteinReDiff on numerous known structures of proteins and their binding ligands. They were then able to redesign selected protein-ligand pairs by stochastically masking amino acids and using an equivariant denoising diffusion model to capture the joint distribution of ligand and protein complex conformations.
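The masking step can be pictured with a short Python sketch. This is not the authors' implementation: the article does not describe how positions are chosen, so the mask ratio, the `X` placeholder token, the function name and the example sequence are all hypothetical. The sketch only shows what hiding a random subset of residues in a sequence looks like before a model fills them back in.

```python
import random

MASK_TOKEN = "X"  # hypothetical placeholder for a hidden residue


def mask_sequence(sequence, mask_ratio=0.15, seed=None):
    """Randomly hide a fraction of the residues in a protein sequence.

    Returns the masked sequence and the sorted list of masked positions.
    The ratio and token are assumptions made for illustration.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be between 0 and 1")
    rng = random.Random(seed)  # a local generator keeps runs reproducible
    n_mask = round(len(sequence) * mask_ratio)
    positions = set(rng.sample(range(len(sequence)), n_mask))
    masked = "".join(
        MASK_TOKEN if i in positions else residue
        for i, residue in enumerate(sequence)
    )
    return masked, sorted(positions)


if __name__ == "__main__":
    toy_sequence = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up example sequence
    masked, positions = mask_sequence(toy_sequence, mask_ratio=0.25, seed=0)
    print(toy_sequence)
    print(masked)
    print(positions)
```

In a generative redesign setting, the hidden positions are the ones the model is free to change, while the unmasked residues anchor the design to the original protein.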





With regard to input characteristics, six of the eight other protein design models used for comparison relied on protein structure information as one of their inputs; only ProteinReDiff and a model called DPL relied solely on protein sequence and ligand SMILES inputs. With regard to outputs, only ProteinReDiff produced new protein designs that included protein sequence, protein structure and ligand structure.





With regard to performance, redesigned proteins from selected protein-ligand pairs produced by ProteinReDiff and the eight other protein design models were compared for ligand binding affinity, amino acid sequence diversity and structure preservation. ProteinReDiff achieved a greater improvement in ligand binding affinity than the other models.





“Our model excels in optimizing ligand binding affinity based solely on initial protein sequences and ligand SMILES strings, bypassing the need for detailed structural data,” Hy said. “These findings open new possibilities for protein-ligand complex modeling, indicating significant potential for ProteinReDiff in various biotechnological and pharmaceutical applications.”





The study, “ProteinReDiff: Complex-based ligand-binding proteins redesign by equivariant diffusion-based generative models,” is published in the journal Structural Dynamics, as part of a special topic on Artificial Intelligence and Structural Science.





ProteinReDiff stands for Protein Redesign based on Diffusion Models, and it incorporates key improvements inspired by the representation learning modules of AlphaFold2, an architecture for computational protein structure prediction. These modules allow the ProteinReDiff framework to capture intricate protein–ligand interactions, improve the fidelity of binding affinity predictions and enable more precise redesigns of ligand-binding proteins.





