Using Machine Learning To Enhance Biomarkers for Cancer Immunotherapy
Tumor cells’ ability to avoid destruction by the immune system is one of the core hallmarks of cancer. The immune system aims to identify and eliminate malignant cells it recognizes as “non-self” through the presentation of tumor neoantigens by human leukocyte antigens (HLA) on the surface of cells.
HLA loss of heterozygosity (HLA-LOH) is a phenomenon that occurs in some cancer cells to help them to evade detection by the immune system. HLA alleles that encode the antigen-presentation machinery are deleted from the genomes of these cells, effectively hiding them from detection by the immune system. With the success of immunotherapies such as immune checkpoint inhibitors, understanding immune evasion in cancers is more important than ever. However, despite the importance of HLA-LOH identification in predicting response to immunotherapy, there are few accurate methods currently available for its detection.
Biotechnology company Personalis recently published a paper in Nature Communications detailing a new machine learning algorithm capable of detecting HLA-LOH from whole-exome sequencing data. This new machine learning approach – named DASH (Deletion of Allele-Specific HLAs) – detects HLA-LOH using data from paired normal and tumor tissue, with the aim of advancing the use of HLA-LOH as a biomarker for cancer immunotherapy.
To find out more about DASH and its applications, we spoke to Dr. Rachel Marty Pyke, manager of bioinformatics science at Personalis and lead author of the paper.
Sarah Whelan (SW): You explain that HLA LOH is important to cancer cells escaping immune recognition ‒ how important is immune escape in tumors, and how does it affect the potential use of immunotherapy?
Dr. Rachel Marty Pyke (RMP): Evidence of tumor immunoediting has been building for the past half-century, resulting in the addition of "evading immune destruction" to the "hallmarks of cancer". Immunoediting is the idea that immune cells attack and kill immunogenic cancer clones, leaving less immunogenic, or hidden, clones behind to survive and grow. Here are a couple of highlights from the literature:
- Chowell et al. have published two papers (Science 2017 and Nature Medicine 2019) showing that reduced germline variation in the HLA genes which code for major histocompatibility complexes (MHCs) leads to poorer response to checkpoint blockade immunotherapy, suggesting that a lack of diversity of neoantigens can facilitate immune escape and lead to poorer outcomes for patients.
- Just in the past month, two brilliant Nature papers quantified immunoediting to show that hotspot mutations are striking the optimal balance between oncogenicity and immunogenicity (Hoyos et al. and Luksza et al.).
The literature has shown that the necessity of immune escape shapes tumor evolution and impacts patient prognosis – two critical areas!
Immunotherapies work by either sensitizing or re-sensitizing a patient’s immune system to their tumor, allowing it to attack and kill the tumor. Immune escape can manifest in several ways that interfere with this process:
- Reducing the immunogenic neoantigens presented by the tumor can result in fewer targets for the immune system, making checkpoint inhibitors less effective.
- Breaking the antigen presentation machinery (like HLA) can stop the presentation of all or specific antigens, making checkpoint therapies, personalized cancer vaccines and adoptive T cell therapies all less effective.
- Tumors often upregulate checkpoints to block attacks by T-cells. Personalized cancer vaccines cannot overcome this method of immune escape alone, forcing the rise of combination therapies with checkpoint inhibitors.
SW: Can you give a brief summary of how this newly developed technique – DASH – works?
RMP: DASH relies on an input of high-quality exome sequencing of tumor and normal DNA from a patient. While we designed the method to work with the HLA-enriched ImmunoID NeXT Platform®, we also demonstrated that the method still works on other exomes, with slight performance decreases. From the exome sequencing data, HLA typing is performed, and the tumor and normal reads are aligned to a patient-specific HLA reference. Then, seven features are calculated and used as input to an XGBoost machine learning model that detects HLA LOH in specific genes.
The performance of the model hinges on its features. We worked to quantify the unique aspects of the HLA region that differentiate it from other parts of the genome. Two of our unique features are:
- Modified b-allele frequencies. The b-allele typically refers to the non-reference allele of two alleles – A and B. The term b-allele frequency was coined to refer to the intensity ratio between two alleles on a microarray. In our case, we use it to refer to the sequence depth ratio between the two alleles. Most copy number detection algorithms use b-allele frequencies to determine whether there is allelic imbalance. However, due to the complexity of the HLA region, the quality of probe capture can vary for specific alleles. To overcome this hurdle, we normalize the b-allele frequencies by the normal DNA to account for these probe biases.
- Flanking regions. The HLA genes are relatively short and sometimes have few genomic differences between homologous alleles. To increase our sensitivity and confidence in our calls, we draw on information from the regions surrounding the HLA genes.
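As a rough illustration of the first feature, the sketch below computes a depth-based b-allele frequency for one heterozygous HLA gene and then adjusts the tumor value using the matched normal sample. The function names, the example depths and the per-allele ratio used for the adjustment are assumptions made for illustration; the interview does not give DASH's exact formula.

```python
def raw_baf(depth_a: float, depth_b: float) -> float:
    """Raw b-allele frequency: share of allele-B depth among both alleles."""
    total = depth_a + depth_b
    if total <= 0:
        raise ValueError("no coverage on either allele")
    return depth_b / total


def normal_adjusted_baf(tumor_a: float, tumor_b: float,
                        normal_a: float, normal_b: float) -> float:
    """Tumor BAF after scaling each allele's depth by its depth in the normal.

    Hypothetical normalization: an allele that is captured poorly in both
    samples no longer looks like an imbalance, so a balanced tumor gives 0.5.
    """
    if normal_a <= 0 or normal_b <= 0:
        raise ValueError("normal sample must cover both alleles")
    ratio_a = tumor_a / normal_a
    ratio_b = tumor_b / normal_b
    return raw_baf(ratio_a, ratio_b)


# Toy depths (made up): probe capture favors allele A in both samples.
balanced = normal_adjusted_baf(240, 160, 120, 80)  # tumor keeps both alleles
b_lost = normal_adjusted_baf(240, 40, 120, 80)     # tumor has lost most of allele B
print(raw_baf(240, 160), balanced, b_lost)
```

In this toy case the raw tumor value sits below one half purely because of capture bias, while the adjusted value returns to one half for the balanced tumor and drops only when allele B is actually depleted.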
SW: What were your main aims when developing this new technique?
RMP: We had two main focuses. First, we wanted to tune the method specifically for the HLA region, so we focused on designing features that capture the unique challenges that the HLA region presents (described above). Second, we wanted to develop orthogonal approaches to truly understand the performance of the method.
SW: Did you experience any challenges during the development of this new technique?
RMP: Most of the challenges we faced in this project revolved around the validation of DASH. We took three main approaches to validation – in silico cell line dilutions, patient-specific digital polymerase chain reaction (PCR), and functional immunopeptidomics.
To perform the in silico cell line dilutions to quantify the limit of detection of our method, we had to profile several dozen cell lines to find ones with HLA LOH. Due to the complexity and diversity of the HLA genes, we had to design allele-specific, patient-specific primers to evaluate deletion with digital PCR. We went through several iterations to find the optimal primers for each patient that gave a clean signal. Finally, we found that the quantitative immunopeptidomics approach to validation was very challenging. While we were hoping to show robust validation results in this section, we found a limited signal with the method. We hypothesized several reasons for this finding, but it would likely take another research paper to truly understand whether the limitations were technical or biological.
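To see why diluted samples are harder to call, consider a simple mixing model, sketched below. It assumes that normal cells carry one copy of each HLA allele, that tumor cells have lost one allele entirely, and that both alleles are captured equally well. The function name and the copy-number parameters are hypothetical and are not taken from the DASH paper.

```python
def expected_lost_allele_fraction(purity: float,
                                  tumor_copies_kept: int = 1,
                                  tumor_copies_lost: int = 0) -> float:
    """Expected read fraction of the deleted allele in a tumor/normal mixture.

    purity is the fraction of cells that are tumor cells (0.0 to 1.0).
    Normal cells are assumed to contribute one copy of each allele.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    lost = purity * tumor_copies_lost + (1.0 - purity) * 1
    kept = purity * tumor_copies_kept + (1.0 - purity) * 1
    return lost / (lost + kept)


# The deleted allele's share of reads approaches the balanced value of 0.5
# as purity falls, so the LOH signal shrinks with each dilution step.
for purity in (1.0, 0.5, 0.2):
    print(purity, round(expected_lost_allele_fraction(purity), 3))
```

Under these assumptions, a sample with 20% tumor content deviates from a balanced allele ratio far less than a pure cell line does, which is the regime a dilution series is meant to probe.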
SW: How does DASH compare to LOHHLA (Loss of Heterozygosity in Human Leukocyte Antigen), the existing technique for detecting HLA LOH?
RMP: We evaluated the performance of DASH in comparison to the existing tool, LOHHLA, in two ways. First, we showed very similar performance on the set of patient tumors that we profiled with patient-specific digital PCR. Second, we evaluated the performances using in silico cell line dilutions. While both methods had strong specificities across all dilutions, we found that DASH was more sensitive at lower tumor purities and for sub-clonal events. Capturing HLA LOH in both scenarios is critical for real patient samples.
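For readers less familiar with the two metrics mentioned here, the sketch below shows how sensitivity and specificity are computed from known LOH status and a tool's calls. The function name and the labels are toy values invented for illustration; they are not results from the paper.

```python
def sensitivity_specificity(truth, calls):
    """Return (sensitivity, specificity) for paired boolean truth and calls.

    truth[i] is True when sample i really has HLA LOH;
    calls[i] is True when the tool reports HLA LOH for sample i.
    """
    if len(truth) != len(calls):
        raise ValueError("truth and calls must have the same length")
    pairs = list(zip(truth, calls))
    tp = sum(1 for t, c in pairs if t and c)
    fn = sum(1 for t, c in pairs if t and not c)
    tn = sum(1 for t, c in pairs if not t and not c)
    fp = sum(1 for t, c in pairs if not t and c)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy example (made-up labels): one true LOH event is missed, no false alarms.
truth = [True, True, True, False, False]
calls = [True, True, False, False, False]
print(sensitivity_specificity(truth, calls))
```

A tool can keep perfect specificity while missing true events, which is why the comparison at low tumor purity turns on sensitivity.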
SW: Are there any limitations of the new technique that you wish to highlight?
RMP: There are several potential areas for improvement.
First, the machine learning approach is highly dependent on the training dataset. Expanding the training dataset to more patients and optimizing the method for labeling the samples could improve the accuracy of the model.
Second, the quantitative immunopeptidomics approach yielded largely negative results. Another study would likely be required to understand the root cause.
Third, DASH focuses exclusively on LOH. However, several other allelic imbalance mechanisms, like expression imbalance, may be very relevant for immune escape. Future work could expand the scope to detect aberrations in other mechanisms.
SW: How do you plan to apply this technique moving forward? Could DASH be used to tailor treatment options for patients in the future?
RMP: While we believe that HLA LOH can serve as a biomarker on its own, we are most excited about integrating it with other readouts from our ImmunoID NeXT Platform into composite biomarkers. We recently published a paper in Clinical Cancer Research that describes the layering of DASH on top of neoantigen prediction in a biomarker called NEOPS™. We believe composite biomarkers that capture many different aspects of tumor-immune biology are the future and have the highest potential impact on clinical decision making.
Reference: Pyke RM, Mellacheruvu D, Dea S, et al. A machine learning algorithm with subclonal sensitivity reveals widespread pan-cancer human leukocyte antigen loss of heterozygosity. Nat Commun. 2022;13(1):1925. doi: 10.1038/s41467-022-29203-w
Rachel Marty Pyke was speaking to Sarah Whelan, Science Writer for Technology Networks.