TRON’s CoVigator Offers an Unseen Perspective of SARS-CoV-2
The SARS-CoV-2 variants Delta and Delta-plus are making their way around the world, but they are only two of the hundreds of unique mutations catalogued by TRON, a research institute in Mainz, Germany. Using the publicly available CoVigator web research tool, researchers and scientists can navigate TRON’s extensive database, which tracks the bio-geographical evolution of the virus using gene sequences from thousands of COVID-19 patients.
TRON gGmbH is a nonprofit research organization established as an independent spin-off of the University Medical Center of the Johannes Gutenberg University Mainz. TRON, short for “Translational Oncology,” bridges the gap between scientific research and pharmaceutical application, carrying out novel research to discover immunological mechanisms and therapeutic modulation of the immune system. Its research has focused mainly on cancers, but as the pandemic developed in 2020, the institute applied its tools and expertise to compile a living database of SARS-CoV-2 mutations.
“Vaccines are designed to trigger the human immune system to recognize an invading virus or bacteria and do what it normally does to eliminate it,” explained Martin Löwer, deputy director of the Biomarker Development Center (BDC) at TRON. “But to develop a vaccine requires understanding of how the virus or bacteria invades human cells and what it does to modify them. We analyze genomes to look for specific mutations in the DNA, which are important to understanding the evolution of a disease and how individualized medicine can affect the disease in different patients.”
Throughout 2020, as COVID-19 developed into a worldwide pandemic and millions of people around the globe became infected, these diverse populations offered hosts where the SARS-CoV-2 virus could evolve through mutation. Mutational variants are of considerable interest to health professionals and scientists as vaccines are developed to fight against the spread of the disease. We are seeing the dangerous impact of mutations as the Delta variant of SARS-CoV-2 infects immunized individuals and spreads virulently among the unvaccinated.
While several studies in 2020 had already reported different strains and an increase in the mutation rate, TRON scientists were interested in how evolution might affect the effectiveness of the vaccines as they began to appear. So they began analyzing the SARS-CoV-2 spike glycoprotein (spike protein), which the virus uses to invade human host cells. The mRNA vaccines approved for use are designed to induce immune responses against the spike protein.
“When we started, there was little genomic information for the virus,” said Thomas Bukur, a bioinformatics scientist at TRON. “But we quickly realized that this virus has the potential to mutate and evolve. We also realized that more and more countries were starting large sequencing initiatives. We wanted to create a collection of all this virus information and understand mutations in different populations and geographical regions over time.”
The sequencing initiatives created large and diverse repositories of virus sequence reads. These sequences continue to lead to discoveries of variants of the virus, such as Delta, but a database of the spike protein’s evolution across geographies and individuals was not available at the time. TRON used these repositories to begin its search for variants in the spike protein.
CoVigator screenshot. Credit: TRON
Discovering mutations takes supercomputing capabilities
Looking for variants in a genome is a complex process that requires a great deal of computation. A whole genome is assembled by aligning short segments of sequence produced by next-generation sequencing (NGS) machines. The SARS-CoV-2 genome, which is made of RNA, comprises roughly 30,000 bases (compared to the human genome’s 3.2 billion base pairs). Genomic repositories provide whole genome assemblies of the virus as well as raw sequence data sets that must first be aligned, much like putting together a 30,000-piece puzzle. Once the data are aligned, TRON’s analyses look at specific differences between the original, “wild type” reference genome and the sample being studied. The differences are annotated as variants (mutations), which, across hundreds of thousands of samples, could add up to millions of variants.
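The comparison between a reference genome and a sample can be illustrated with a minimal Python sketch. It is not TRON’s method: it assumes the two sequences have already been aligned to equal length, it reports only single-base substitutions, and the names `Variant` and `call_substitutions` are hypothetical. Real variant callers also handle insertions, deletions, read depth and sequencing error.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    position: int  # 1-based position in the reference sequence
    ref: str       # base in the reference ("wild type") genome
    alt: str       # base observed in the sample


def call_substitutions(reference: str, sample: str) -> List[Variant]:
    """List single-base differences between two pre-aligned sequences."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    for index, (ref_base, alt_base) in enumerate(
        zip(reference.upper(), sample.upper())
    ):
        # Skip alignment gaps ("-") and unknown bases ("N"); this sketch
        # only reports substitutions between known bases.
        if ref_base in "-N" or alt_base in "-N":
            continue
        if ref_base != alt_base:
            variants.append(Variant(index + 1, ref_base, alt_base))
    return variants
```

For example, `call_substitutions("ATGGCC", "ATGACC")` returns one variant, a G-to-A substitution at position 4.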
To identify non-synonymous mutations (those that change the resulting amino acid) in the spike protein, TRON scientists built a computational pipeline that uses various genome processing and bioinformatics tools. This Corona Virus Navigator (CoVigator) pipeline includes trimming, alignment, variant calling and other tasks using open source tools from many genomic software repositories, including the Broad Institute’s Genome Analysis Toolkit (GATK), BCFtools, LoFreq and iVar. The CoVigator NGS pipeline is implemented in the Nextflow framework and is publicly available on GitHub for use by other researchers.
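What makes a mutation non-synonymous can be shown with a short Python sketch. It is an illustration under stated assumptions, not part of the CoVigator pipeline: it uses the standard genetic code, works on DNA-alphabet codons (T rather than U), and the names `CODON_TABLE` and `classify_codon_change` are hypothetical.

```python
from typing import Tuple

# Standard genetic code, built from the conventional T, C, A, G ordering.
# "*" marks a stop codon.
_BASES = "TCAG"
_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    first + second + third: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(_BASES)
    for j, second in enumerate(_BASES)
    for k, third in enumerate(_BASES)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> Tuple[str, str, bool]:
    """Return (reference amino acid, altered amino acid, is_non_synonymous)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    # A change is non-synonymous when the encoded amino acid differs.
    return ref_aa, alt_aa, ref_aa != alt_aa
```

For example, `classify_codon_change("GAT", "GGT")` returns `("D", "G", True)`, a non-synonymous change, while `classify_codon_change("GAT", "GAC")` returns `("D", "D", False)` because both codons encode aspartic acid.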
TRON’s main work focuses on cancers and other diseases of high medical need. Assigning existing computational resources away from that work to SARS-CoV-2 analyses could delay other important discoveries. Working with primeLine Solutions and Intel’s Pandemic Response Technology Initiative, TRON acquired ten new Intel Server Systems nodes built on 2nd Generation Intel Xeon Scalable processors. The new system gave TRON 960 dedicated threads for running thousands of analysis tasks in parallel. TRON is now able to analyze and process more than 20,000 sequencing datasets in less than three hours, providing near-real-time analysis of the constantly growing, publicly available data sets.
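The figures above allow a rough back-of-the-envelope estimate of the compute budget per dataset. The Python sketch below is only arithmetic on the quoted numbers. It assumes exactly 20,000 datasets in exactly three hours, all 960 threads fully busy, and perfect parallel scaling, none of which the source states. The function name is hypothetical.

```python
def thread_minutes_per_dataset(threads: int, hours: float, datasets: int) -> float:
    """Average thread-minutes available per dataset under ideal scaling."""
    total_thread_minutes = threads * hours * 60
    return total_thread_minutes / datasets


# Figures quoted in the article: ten nodes, 960 threads in total.
threads_per_node = 960 // 10  # 96 threads on each node
budget = thread_minutes_per_dataset(threads=960, hours=3, datasets=20_000)
print(f"{threads_per_node} threads per node, "
      f"about {budget:.2f} thread-minutes per dataset")
```

Under these assumptions each dataset receives at most about 8.64 thread-minutes of processing. Because the article says “more than 20,000” datasets in “less than three hours”, the real per-dataset budget is smaller.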
CoVigator offers a new perspective on SARS-CoV-2
TRON’s initial study, which produced a preprint, used 146,917 whole genome assemblies and 2,393 next-generation sequencing (NGS) datasets from GISAID, NCBI Virus and NCBI SRA archives. The study revealed that a small percentage of samples contained the wild-type spike protein without variation, but it found 2,592 distinct variants across all samples. The mutation rate was low, but it increased over time. Furthermore, TRON found subclonal mutations, indicating potential co-infection with various SARS-CoV-2 strains and/or intra-host evolution of the virus, plus variants that might affect antibody binding or T-cell recognition.
“The most interesting finding from the research,” commented Löwer, “was seeing the many variants of the spike protein. Secondly, the ability to go back over the last year and a half and track exactly how it evolved offers a new perspective. We are able to detect even small changes early in its evolution. We can see from how it starts in a single patient and mutates within the patient to several variants and across populations and geographical regions. And because the virus continues to travel around the world, we see how mutations move across the globe over time.”
Löwer points out that, while TRON was not the first to sequence it, the B.1.1.7 (UK) lineage is characterized by the accumulation of 17 variants, eight of them located in the spike protein. Given the continuing discovery of variants, including the latest, Delta and Delta-plus, it is important to keep monitoring and cataloguing the evolution of SARS-CoV-2 and the effects of variants on vaccines.
“We now know that the immune system doesn't recognize the whole spike protein,” added Löwer. “It recognizes specific small parts of the protein we call epitopes. People have studied how vaccines trigger the immune system to identify these epitopes. We want to continue to look at whether the mutations we identify change the epitopes that the immune system detects. That is something vaccine producers and scientists would want to know.”
A tool for future pandemics
TRON’s work has given us new insight into the SARS-CoV-2 virus, but the study was only the beginning of an effort to first understand the evolution of the virus and then help scientists continually monitor and analyze it. The work combines a vast amount of data into a single resource and gives scientists the ability not only to look at many low-frequency variants, but also to look at the same variant across many hosts in parallel, presenting many different dimensions together in one database.
“Without new computational resources, we could have only completed the initial study,” stated Bukur. “With the new system, we are able to provide ongoing research that downloads data from millions of samples and processes them in parallel to identify spike protein variants. As soon as we have the results, we publish them to our CoVigator web service dashboard. With this platform, we are able to keep the work going and make the latest data available to the research community.”
Understanding mutations in new and known viruses is critical to addressing and continually managing the therapeutic response to widespread and dangerous illnesses. There will certainly be other pandemics. And, with climate change, scientists are watching existing dangerous tropical diseases migrate out of equatorial regions into more temperate zones. These threats will create new challenges for healthcare. The workflows and pipelines created by TRON scientists can be rapidly adapted to new viruses and strains of viruses, offering new tools for collaborative immuno-biology research and response.
“At some point, we can detect these variants during the sequencing,” concluded Löwer, “and be able to see in single patients those individual mutations, which now months later make up more dangerous variants, like Delta. But to see those very small changes quite early and decide whether it’s a variant or just random fluctuation of the data means we are working very closely to the noise level. But we know we can be specific because we now know where the critical variants occur. This will not be the last pandemic. And we hope to use what we have learned to develop methods for early detection of subsequent viruses and their mutations.”
Learn more about the TRON CoVigator NGS pipeline.