In the era of precision oncology, the integration of high-throughput, multimodal datasets presents both a formidable challenge and a transformative opportunity. From genomic and pharmacological profiles to radiological imaging and chemical perturbation data, the convergence of diverse data types offers unprecedented potential to unravel the complex biological underpinnings of cancer progression and therapeutic response. Yet realizing this potential requires computational frameworks capable of extracting clinically actionable insights from vast, heterogeneous and often incomplete datasets.
We spoke to Dr. Benjamin Haibe-Kains, senior scientist at the Princess Margaret Cancer Centre, University Health Network, and professor in the Medical Biophysics Department of the University of Toronto, at the American Association for Cancer Research (AACR) Annual Meeting 2025. He discussed the challenges of working with clinical data, how AI/ML data models are helping and how the use of virtual biopsies could expand access to precision oncology.
Karen Steward, PhD (KS):
Senior Scientific Specialist
Technology Networks
Karen Steward holds a PhD in molecular microbiology and evolutionary genetics from the University of Cambridge. She moved into science writing in 2017 after over a decade as a research scientist.
What are some of the greatest challenges in collecting, collating and interrogating clinical data in a useful and meaningful way?
Benjamin Haibe-Kains, PhD (BHK):
Senior Scientist
Princess Margaret Cancer Centre, University Health Network
Dr. Benjamin Haibe-Kains is a senior scientist at the Princess Margaret Cancer Centre (PM), University Health Network, and professor in the Medical Biophysics Department of the University of Toronto, having earned his PhD in bioinformatics at the Université Libre de Bruxelles, Belgium. He is the Canada Research Chair in Computational Pharmacogenomics, the scientific director of the Cancer Digital Intelligence Program at PM and head of data science of the Structural Genomics Consortium. Dr. Haibe-Kains’ research focuses on integrating high-throughput data from various sources to analyze multiple facets of cancer progression and therapy response jointly using machine learning and artificial intelligence methods.
Beyond the critical hurdle of data governance, namely, obtaining approvals to access large-scale clinical datasets, the major challenges revolve around the heterogeneity and accessibility of clinical data. Extracting structured data from electronic medical records (EMRs), aligning them with standardized ontologies and integrating information from unstructured sources such as clinical notes, lab tests, radiology and pathology reports remain challenging tasks. Unstructured data, in particular, pose significant difficulties. However, recent advances in large language models (LLMs) and agentic AI systems offer promising solutions to automate and scale the curation of these complex datasets. Once curated, these rich clinical data can be further augmented with high-dimensional modalities such as medical imaging and molecular profiles (genomics and transcriptomics). Together, these multimodal datasets form a strong foundation for developing clinical tools to improve early diagnosis, guide treatment decisions and enable more precise monitoring.
KS:
Can you discuss some of the key advantages of using spatial analysis over individual information sources?
BHK:
It is now well-established that tumors are highly heterogeneous, composed of multiple cellular clones interacting dynamically with the tumor microenvironment. A growing area of research focuses on understanding how spatially organized niches of cancer cells, often referred to as ecotypes, influence tumor progression and therapeutic response. While we have seen significant advances from bulk to single-cell sequencing, spatial analysis represents the next critical frontier. By preserving the physical context of cells within tissues, spatial profiling allows us to map interactions between cancer cells, stromal components and immune infiltrates with unprecedented resolution. This added dimension of information is essential for uncovering clinically relevant patterns that are invisible when data are analyzed in isolation, ultimately enhancing our ability to predict outcomes and design targeted interventions.
KS:
How can data models be utilized to help improve data quality and usefulness?
BHK:
In the context of clinical data, AI models can significantly enhance data quality and utility through deep structuring and standardization. By extracting and harmonizing information across diverse sources, such as clinical notes, lab results, tumor profiles and circulating tumor DNA, AI enables the creation of richly contextualized patient datasets. Moreover, the integrative power of AI, particularly in multimodal data analysis, allows for the identification of convergent biological or clinical patterns. This not only improves interpretability but also helps flag inconsistencies, outliers or potential data entry errors. These capabilities open the door to automated quality control systems and scalable data aggregation, ultimately strengthening the foundation for robust clinical research and precision medicine.
KS:
Can you give some examples where data models have been used to predict missing data successfully?
BHK:
There are several compelling examples where AI/ML models have been used to impute missing data effectively in cancer research and clinical practice. Large language models have been applied to electronic health records to fill in missing variables, such as lab test results or medication histories, by learning patterns from similar patients across large cohorts (e.g., Med-BERT). Because metadata or clinical annotations of radiological and pathological images may be missing, AI models have been used to infer tumor grade, molecular subtype or biomarker status (e.g., to predict the overall survival of patients diagnosed with brain tumors). Deep learning models have been used to infer missing gene expression values in RNA-seq datasets, leveraging co-expression patterns and network-based relationships. For example, variational autoencoders and matrix completion methods can reconstruct transcriptomic profiles with high accuracy (e.g., stAI). Companies are also leveraging millions of gene expression profiles to generate, purely computationally, the transcriptomic profiles of healthy and cancerous cells that result from user-specified perturbations (e.g., gene knockdown).
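To make the idea of matrix completion concrete, here is a minimal sketch of one generic approach: iteratively refitting a low-rank SVD approximation and using it to fill the missing entries of a samples-by-genes matrix. It is an illustration only and is not the method used by stAI or any other tool named in the interview. The function name `impute_low_rank`, its parameters and the assumption that expression data are approximately low rank are all hypothetical choices made for this example, as is the assumption that every gene (column) has at least one observed value.

```python
import numpy as np


def impute_low_rank(matrix, rank=2, n_iter=200, tol=1e-8):
    """Fill NaN entries with values from an iteratively refitted rank-`rank` SVD.

    Hypothetical helper for illustration. Assumes every column has at
    least one observed (non-NaN) value.
    """
    matrix = np.asarray(matrix, dtype=float)
    missing = np.isnan(matrix)

    # Initial guess: replace each missing entry with its column (gene) mean.
    col_means = np.nanmean(matrix, axis=0)
    filled = np.where(missing, col_means, matrix)

    for _ in range(n_iter):
        # Best rank-`rank` approximation of the current filled matrix.
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]

        # Keep observed values; overwrite only the missing entries.
        updated = np.where(missing, low_rank, matrix)

        # Stop once the filled matrix has essentially stopped changing.
        change = np.linalg.norm(updated - filled) / max(np.linalg.norm(filled), 1e-12)
        filled = updated
        if change < tol:
            break

    return filled


# Usage on synthetic data: a rank-2 "expression" matrix with hidden entries.
rng = np.random.default_rng(0)
truth = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 12))
rows, cols = np.indices(truth.shape)
mask = (rows + cols) % 7 == 0  # deterministic pattern of missing entries
observed = truth.copy()
observed[mask] = np.nan

completed = impute_low_rank(observed, rank=2)
```

Observed values are never altered; only the masked entries are estimated. Real transcriptomic data are noisy and only approximately low rank, so the rank and stopping criteria would need to be chosen by validation on held-out entries.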
KS:
For those who may be unfamiliar with the concept, can you explain what virtual biopsies are?
BHK:
Virtual biopsies refer to non-invasive or minimally invasive methods that use imaging and computational analysis to characterize tumors in ways that traditionally required tissue sampling. Instead of extracting a physical tissue sample, virtual biopsies leverage data from radiological scans (e.g., MRI, CT or PET) or liquid biopsies (e.g., circulating tumor DNA) and apply advanced AI or ML algorithms to infer molecular, histological or prognostic features of the tumor.
The goal is to replicate the insights gained from conventional biopsies while avoiding the risks, limitations and sampling bias associated with invasive procedures. Virtual biopsies are especially valuable for capturing tumor heterogeneity, monitoring disease progression over time and guiding personalized treatment decisions without repeated surgeries or biopsies. Paverd et al. (2024) provide an excellent review of this topic.
KS:
What impact might virtual biopsies have on cancer diagnosis and monitoring?
BHK:
Virtual biopsies have the potential to transform cancer diagnosis and monitoring by enabling non-invasive, longitudinal and comprehensive assessments of tumors. By applying AI models to imaging data and/or circulating biomarkers to infer molecular and histological features, virtual biopsies would allow clinicians to detect tumors earlier, monitor treatment response in real time and capture spatial and temporal heterogeneity more effectively. As a result, virtual biopsies can support more personalized treatment strategies, reduce the need for invasive procedures and expand access to precision oncology, especially in settings where traditional biopsies are impractical.