How Is AI Shaping Proteomics and Multiomics?
Explore AI breakthroughs and challenges in proteomics and multiomics.

Artificial intelligence (AI) has emerged as a powerful toolset that could create new opportunities and help overcome hurdles in proteomics and wider omics disciplines. Bolstered by AI, these fields of research could have a profound impact on science and society.
At the Children's Medical Research Institute at the University of Sydney, Associate Professor Qing Zhong's research interests span big data analysis, machine learning and computational biology.
His work involves mining and managing large-scale proteomics and multiomics datasets. Among other projects, he aims to advance cancer research and to implement big data-driven, evidence-based computational tools that enable predictive, preventive and personalized medicine.
Zhong recently joined Technology Networks for a conversation on AI’s progress in proteomics and multiomics, barriers to its widespread implementation and his vision for a “continuous, high-resolution lens on biology.”
DIA-NN, DeeProM and AlphaFold
“AI applications in proteomics have gained significant traction recently,” Zhong said. “Particularly with the emergence of data-independent acquisition neural networks (DIA-NN) for streamlined DIA analysis, DeeProM for predicting cancer cell vulnerabilities and AlphaFold for protein structure prediction.”
Pioneered by the laboratory of Professor Ruedi Aebersold, DIA is considered a breakthrough technique in mass spectrometry (MS)-based proteomics. Unlike data-dependent acquisition (DDA), DIA offers unbiased analysis with broader proteome coverage and higher reproducibility, making it a useful method for discovery proteomics. Discovery research is essential for interrogating the mechanisms that underpin biological states such as health and disease. There is one drawback, however: DIA generates large amounts of data, which creates an analysis bottleneck.
“DIA-NN uses deep neural networks to handle large volumes of DIA data, simplifying peptide identification and quantitation,” Zhong explained. DIA-NN is also free to use, contributing to its growing popularity in high-throughput proteomics.
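For illustration only, the Python sketch below shows how a DIA-NN analysis might be scripted as one step of a larger pipeline. The executable name, file paths and command-line flags used here are assumptions and should be checked against the DIA-NN documentation for your installed version.

```python
import subprocess
from pathlib import Path

# Hypothetical inputs: replace with your own DIA-MS runs and spectral library.
raw_files = [Path("run_01.mzML"), Path("run_02.mzML")]   # DIA-MS acquisitions
spectral_lib = Path("spectral_library.tsv")              # spectral library
report = Path("diann_report.tsv")                        # main output report

# Assumed DIA-NN command-line flags (verify against your version's docs):
#   --f       input raw/mzML file (repeated once per file)
#   --lib     spectral library
#   --out     main output report
#   --qvalue  precursor false discovery rate threshold
#   --threads number of CPU threads
cmd = ["diann"]
for f in raw_files:
    cmd += ["--f", str(f)]
cmd += ["--lib", str(spectral_lib),
        "--out", str(report),
        "--qvalue", "0.01",
        "--threads", "8"]

# Run the search; raises an error if DIA-NN exits with a non-zero status.
subprocess.run(cmd, check=True)
print(f"DIA-NN report written to {report}")
```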
In 2022, researchers – including Zhong – published a pan-cancer proteomic map of 949 human cell lines. The team developed a deep learning-based computational pipeline, named Deep Proteomic Marker, or DeeProM.
“DeeProM enabled the full integration of proteomic data with drug responses and CRISPR-Cas9 gene essentiality screens to build a comprehensive map of protein-specific biomarkers of cancer vulnerabilities that are essential for cancer cell survival and growth,” Zhong said.
A significant challenge in the study of proteins is their versatility, the very property that makes them so useful in biology.
A protein's function is closely related to its structure. For decades, scientists have worked to develop methods capable of deciphering protein structure. The difficulty, however, is that the number of conformations a protein could adopt is enormous.
Enter AlphaFold, an AI program developed by Google DeepMind that is trained on vast amounts of data from the Protein Data Bank to predict protein structure.
“AlphaFold has revolutionized structural proteomics by accurately predicting protein folding, offering vital clues about protein function and interaction networks,” Zhong said. Its predictions are also informing de novo protein design – a longstanding challenge in the field – for a wide variety of applications, including the development of novel therapeutics, diagnostics and imaging reagents.
An estimated 2 million researchers across 190 countries are using AlphaFold to inform their work, from accelerating drug discovery and identifying protein structural alterations associated with diseases such as Alzheimer’s to engineering plastic-degrading enzymes. The model’s significant impact on science and society earned Google DeepMind’s Demis Hassabis and John M. Jumper one half of the 2024 Nobel Prize in Chemistry.
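As a rough illustration of how researchers typically access these predictions, the Python sketch below queries the public AlphaFold Protein Structure Database for a single UniProt accession. The endpoint and response field names are assumptions and should be verified against the database's current API documentation.

```python
import json
import urllib.request

# Hypothetical example: fetch AlphaFold's predicted structure for one
# UniProt accession from the public AlphaFold Protein Structure Database.
accession = "P69905"  # human hemoglobin subunit alpha, chosen as an example
url = f"https://alphafold.ebi.ac.uk/api/prediction/{accession}"  # assumed endpoint

with urllib.request.urlopen(url) as resp:
    entries = json.load(resp)

entry = entries[0]
pdb_url = entry["pdbUrl"]  # assumed field name for the predicted model file
print(f"Predicted structure for {accession}: {pdb_url}")

# Download the predicted coordinates for downstream structural analysis.
urllib.request.urlretrieve(pdb_url, f"{accession}_alphafold.pdb")
```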
“Together, these AI-driven approaches accelerate discoveries in disease mechanisms and therapeutic development, pushing proteomics beyond traditional experimental limits,” Zhong said.
AI hurdles yet to be surmounted
Though AI’s transformative potential in proteomics is being realized to some degree, its integration still faces significant challenges.
Data volume, quality and privacy
A common misconception about AI’s position in proteomics and multiomics research, according to Zhong, is that there’s already enough data to drive AI research at the same speed as fields such as natural language processing or computer vision. “In reality, although biomedical experiments generate vast quantities of raw data, only a fraction of these datasets are well annotated, standardized and of high quality,” he said.
“Unlike the billions of labeled texts or images available for training large language or vision models, biomedical data often remain scattered and behind institutional firewalls, limiting opportunities for building equally powerful AI systems,” Zhong continued.
Collaborative data-sharing frameworks, uniform standardization efforts and privacy-preserving technologies are urgently needed to accelerate AI-driven breakthroughs in proteomics and wider biomedical fields.
Privacy-preserving technologies will be integral to AI’s widespread adoption in healthcare research, where patient confidentiality is paramount. At the Human Proteome Organization 2024 World Congress, Zhong presented his recent preprint research that seeks to address this challenge.*
Zhong and colleagues developed a federated deep learning (FDL) approach, called ProCanFDL. FDL trains AI models without raw data ever leaving its source – instead of sending data to a central server, the model is brought to the data.
“Our system enables AI to learn from individual cancer proteomic data securely, behind local firewalls. In this system, each local computer trains its own AI model on private data, and only the updated local model parameters are aggregated to create a single, more robust global model,” Zhong explained.
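The core idea can be sketched in a few lines of Python. The toy example below uses a simple linear model and plain parameter averaging; it is not the ProCanFDL implementation, which involves deep networks and additional safeguards, but it illustrates how only model parameters – never raw data – move between the sites and the server.

```python
import numpy as np

def local_update(global_weights, X, y, lr=0.01, epochs=5):
    """Train a toy linear model locally; the raw data (X, y) never leaves the site."""
    w = global_weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
        w -= lr * grad
    return w

def federated_round(global_weights, sites):
    """One round: every site updates locally, then the server averages parameters."""
    local_weights = [local_update(global_weights, X, y) for X, y in sites]
    return np.mean(local_weights, axis=0)

# Simulated "sites", each holding private (features, labels) data behind its firewall.
rng = np.random.default_rng(0)
sites = [(rng.normal(size=(50, 10)), rng.normal(size=50)) for _ in range(3)]

weights = np.zeros(10)
for _ in range(20):
    weights = federated_round(weights, sites)
print("Global model parameters after federated training:", weights[:3])
```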
Local models were trained on simulated sites that contained data from a pan-cancer cohort and 29 cohorts that were held behind firewalls, representing 8 countries and 19,930 DIA-MS runs. “This global AI model demonstrated a significant improvement in accuracy for cancer subtyping tasks, highlighting its potential to uncover valuable insights into tumors and inform potential treatments – all while maintaining data security and privacy,” Zhong said.
The researchers predict their approach could enable the development of large-scale, privacy-compliant proteomics AI models across institutions globally, advancing digital health.
Funding for AI in omics
Headlines are frequently dominated by sizable AI investment announcements. Though reports suggest venture capital deal activity in AI for healthcare has flourished over the last five years, Zhong believes there is a funding disparity.
“While massive investments – sometimes running into the billions – fuel the development of large language models (LLMs) in the tech sector, the same level of financial backing remains scarce for omics research,” he said.
The impact? There are fewer opportunities to build “Large Omics Models” to a scale and size that compares to contemporary LLMs. “Limited funding slows the creation of foundational datasets, impedes the development of cutting-edge analytic tools and ultimately restricts the field’s growth,” Zhong emphasized, adding that there is an urgent need for greater philanthropic, governmental and industrial investment in omics-focused AI initiatives.
An omics version of ImageNet
Lastly, Zhong highlighted the pressing need for data standards and reproducibility in this line of research, especially as AI models in proteomics and wider omics studies become “increasingly data-hungry”.
“Much like ImageNet transformed the field of computer vision – and large, standardized corpora such as Wikipedia dumps did for language models – omics studies need a well-curated, widely accessible reference dataset,” Zhong continued.
The omics version of ImageNet, or “omics ImageNet”, as he described it, would help to unify metadata protocols, file formats and quality checks across different labs. Subsequently, this would enable reproducible, transparent benchmarking and foster collaboration.
“Establishing such a foundation could dramatically accelerate AI-driven discoveries, making it easier for teams around the world to contribute to – and build upon – the same high-quality datasets,” Zhong said.
A future without limits
In a future without barriers, Zhong believes that AI could transform proteomics and multiomics into a “continuous, high-resolution lens on biology”, one that operates at a massive scale, which “might even dwarf the data used to train LLMs, like ChatGPT,” he said.
“Much as LLMs have reshaped the way we interact with technology, ‘Large Omics Models’ would seamlessly integrate proteomic, genomic and other molecular data, revealing complex cellular processes and disease pathways in real-time,” Zhong said. “By predicting how proteins and other biomolecules evolve, interact and respond under diverse conditions, these models would drive breakthroughs from new diagnostics to highly tailored therapies.”
In this world, scientists could be freed from laborious data management issues. They could pursue creative research projects at an unprecedented pace. “Meanwhile, the broader public would reap the benefits of earlier disease detection, more precise interventions and a deeper comprehension of health that shapes public policy and healthcare worldwide,” Zhong said.
Zhong paints a compelling picture of a future where AI, powered by unified data standards and transformative “Large Omics Models,” revolutionizes proteomics and omics research, delivering profound benefits to science, medicine and society.
Given the rapid pace of current advancements, it may not be too long before we see whether this vision can become a reality.
*This article is based on research findings that are yet to be peer-reviewed. Results are therefore regarded as preliminary and should be interpreted as such. For further information, please contact the cited source.