Genomics Testing Needs Structured Data To Succeed
Dr. Heidi Rehm discusses how AI, data sharing and structured standards are shaping the future of genomic medicine.
As genomic technologies advance, the line between research and clinical application is increasingly blurred. Few have been as instrumental in shaping this evolving landscape as Dr. Heidi Rehm, a leading human geneticist whose work has shaped how genomic data is interpreted, shared and applied to patient care.
As the chief medical officer of the Broad Clinical Laboratories, chief genomics officer at Massachusetts General Hospital and a co-director of the Program in Medical and Population Genetics at the Broad Institute, Rehm stands at the intersection of science, medicine and data policy. Through her leadership in initiatives such as ClinGen (the Clinical Genome Resource) and GA4GH (the Global Alliance for Genomics and Health), she has championed global efforts to standardize genomic data.
Technology Networks caught up with Rehm at the annual GA4GH plenary meeting. In this Q&A, Rehm discusses how research and clinical practice inform one another, the growing role of AI in variant interpretation and why structured, sharable data will be key to realizing the full clinical potential of genomics in the decade ahead.
As the chief medical officer of the Broad Clinical Laboratories, you oversee genomic testing for both clinical and research use. How do you see these two applications influencing each other?
When we work in genomics, there's always a gray boundary between research and clinical use, because for every patient who gets a genetic test, we're often finding variants we've never seen before. We have to interpret them, so in many cases we are, in essence, conducting research every time we perform a genetic test.
Similarly, when clinical testing reveals information about patients, we want to use that information in research to better understand the disease, which can lead to clinical trials and endpoints for developing new drugs.
There's always an interplay between the research and the clinical side of genetics.
ClinGen has become a critical global resource for gene and variant interpretation. Can you tell us about this effort and its overall aims?
ClinGen is a worldwide effort. While it is funded in the United States, it engages nearly 3,000 people around the world, who participate as volunteer biocurators or as experts in their clinical domains.
They contribute to curating the relationships between genes and diseases, or between variants in those genes and diseases, as well as other types of curation. Our work also covers somatic cancer and pharmacogenomics, along with rare and common diseases.

It really runs the gamut of genetic disorders.
We really work together as a community to expertly curate that information so it can be reliably used in both research and clinical care.
ClinGen is currently engaging in pilot studies on the use of AI and machine learning in different aspects of our work, including some of the tools used in variant classification.

One example is SpliceAI, which was built on machine learning algorithms to help predict whether a variant will impact splicing.
In other cases, we may use AI to find literature, or use large language models to extract and organize content from papers, so that we can apply that evidence to a curation.
There are many ways that we might use AI and machine learning in different aspects of both the underlying evidence generation, as well as how we collect, store and categorize that information.
To most effectively support variant interpretation in patients, we need access to data from all around the world, because most variants are seen in one or maybe a handful of individuals.
One third of all genetic tests end in an inconclusive result due to a variant of uncertain significance. We need to figure out how to interpret these variants.
We need to be able to share evidence.
I think for genetic testing to have the most impact, we will need to be able to readily access structured data, including evidence from patients and from functional studies.
All the data being generated needs to be structured in standardized ways so that we can consume it rapidly, and it can be shared extensively so that anyone around the world can access it.
The standards for structuring that data, as well as sharing it in very efficient and effective ways, are going to be critical if we're to scale genomics at the level of impacting all individuals on the planet.
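The structured, sharable variant record Rehm describes can be sketched in code. The shape below is purely illustrative: the field names, gene symbol, variant notation and evidence values are all invented for this example, and real exchange schemas (such as those developed by GA4GH and used by ClinGen) are far richer and versioned.

```python
import json

def make_variant_record(gene, hgvs, classification, evidence):
    """Bundle a variant interpretation with its supporting evidence
    in a simple, machine-readable record (hypothetical shape)."""
    return {
        "gene": gene,                      # gene symbol (placeholder)
        "variant": hgvs,                   # HGVS-style expression (placeholder)
        "classification": classification,  # e.g., "uncertain_significance"
        "evidence": evidence,              # list of structured evidence items
    }

# All values below are made up for illustration.
record = make_variant_record(
    gene="GENE1",
    hgvs="NM_000000.0:c.100A>G",
    classification="uncertain_significance",
    evidence=[
        {"type": "population_frequency", "source": "population database",
         "value": 0.00001},
        {"type": "functional_assay", "source": "example functional study",
         "result": "normal"},
    ],
)

# Serializing to a standard format like JSON is what lets a record
# be consumed rapidly by labs and curators anywhere in the world.
payload = json.dumps(record, indent=2)
print(payload)
```

The point of the sketch is the principle, not the schema: when evidence is captured in named, typed fields rather than free text, it can be validated, aggregated across laboratories and reinterpreted as classifications change.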
GA4GH, or the Global Alliance for Genomics and Health, is an organization that's dedicated to global data sharing to improve health and the use of genomics in both research and clinical care. We bring the community together in the form of work streams that are developing standards in the different aspects of how we share and standardize data so it can be usable.
That may take the form of policies where we talk about how data should be shared and how to obtain consent from patients, but we also talk about technical standards.
The annual GA4GH plenary meeting is here in Uppsala, Sweden. We rotate this meeting around the world because it's critical to have in-person meetings where we can deeply engage different regions and show the benefit of incorporating GA4GH standards into the work of both our community members and the clinical and research enterprises each region works with.
By hosting these annual meetings, we can share our own experiences using these standards and how they've enabled our work. It also allows us to engage other groups to deploy these standards with us, demonstrate their benefit and contribute to improving them over time, so that we can keep pace with the major advances in genomics and health.