Disease Detectives: Working on the Frontline

“If you know the enemy and know yourself, you need not fear the result of a hundred battles.” – Sun Tzu, Chinese general, military strategist, writer, philosopher, and author of The Art of War.

Throughout human history, we have been at war with infectious agents, from the Black Death (1346–1353), which caused 75–200 million deaths worldwide, to the current coronavirus outbreak (death toll currently at 2,130*). Scientists and medical practitioners have long tried to apply Sun Tzu’s philosophy to disease outbreaks: to defeat an infectious agent, first understand it. One of the first epidemiologists was the British physician John Snow. In 1854, he tracked cases of cholera and eventually succeeded in identifying contaminated water as the means of disease transmission. Access to the water source was subsequently cut off, reducing the spread of the disease. Nowadays, more advanced technologies, such as genomic sequencing, are being harnessed to gain information about infectious agents.

What are viruses?

Viruses are among the smallest infectious agents on earth, and cannot survive and reproduce without a host. The primary aim of a virus is to deliver its DNA or RNA genome into the host cell so that its genetic code can be replicated and more virus particles made. Unfortunately for us and other animals who play host to them, viruses can make us sick by killing cells or disrupting cellular function. If the virus spreads readily between hosts, it may cause an epidemic (an outbreak of a disease that is greater than what we would normally see in that population) or a pandemic (an epidemic that has spread to other countries).

Why sequence a viral genome?

If the surface proteins of a virus are recognized by the immune system, the immune system can produce antibodies that block the infection. However, errors may be introduced into the viral DNA or RNA sequence during replication. If a mutation is disadvantageous, the variants carrying it are likely to be lost from the population. If, however, a change confers a selective advantage on the virus, the variant is likely to thrive and spread through the population. For example, if a mutation changes a viral surface protein in a way that reduces the ability of existing host antibodies to recognize and destroy the virus, these viral variants are likely to predominate. Such changes are called antigenic drift or antigenic shift. Antigenic drift occurs when the virus acquires small changes, which may make it more immunoevasive. Antigenic shift occurs when a virus undergoes a major change and may, for example, become capable of “jumping” from an animal reservoir to a human population; this is called the spillover effect. Antigenic drift occurs more frequently than antigenic shift.

Sequencing viruses during an epidemic can help to determine the source of the virus, track the spread of disease, create diagnostic tests and develop vaccines. The cost and time required for sample preparation and whole-genome sequencing have fallen dramatically in the past decade. Sequencing has therefore become a much more accessible tool in disease outbreak investigation.

The 2013 Ebola epidemic

The Ebola epidemic that began in 2013 caused deaths on an unprecedented scale for this disease, with 28,646 reported cases and 11,323 reported deaths. Scientists suspected that an antigenic shift event in the virus prompted the outbreak by enabling transmission from bats to humans. It was possible to trace the origin of the epidemic to the village of Meliandou, in Guinea, West Africa, where a two-year-old boy had died. The boy passed the disease on to his mother, his three-year-old sister, and his grandmother (all of whom died), before the disease moved out of the village and through the human population of Guinea, Liberia, and Sierra Leone. The virus spread through direct contact and was extremely infectious.

Genome sequencing and data sharing were important “weapons” during this epidemic. To this day, the currency in science is publications, with a common phrase being “publish or perish”. Typically, research labs would hold onto sequence data until the associated paper had been published. This approach can be devastating in the fast-paced world of epidemics, as publications can take months or years to reach the reader. During the Ebola epidemic, however, scientists bucked the trend and released data ahead of publication.

The Ebola pioneers were Dr Pardis Sabeti and her lab. On May 23, 2014, the lab was sent a sample from a suspected Ebola case. Once this was confirmed as Ebola, more patient samples were requested, and by June 15, 2014, 12 full viral genomes from infected Ebola patients had been sequenced. All the sequences were posted on the National Center for Biotechnology Information website, so other scientists could see the results instantly and free of charge. In late August 2014, a paper was published in Science detailing the results of 99 whole-genome sequences. The genome data provided information on how the virus mutated over time and pointed to the origin of the outbreak.

Sending samples to distant sequencing facilities has many problems, such as long waiting periods, compromised samples and an increased risk to personnel (the longer the travel to the sequencing facilities, the more people will handle the infectious samples). As the outbreak progressed, whole-genome sequencing was conducted in the field.
Dr Lauren Crowley was a scientist on the frontline of the outbreak in April 2015. With laboratory equipment that fitted into a single luggage bag, her team was able to get sequencing results within 24 hours of receiving a positive Ebola sample. This information was used to establish transmission chains, which supported efforts to set up effective quarantine and prevent further spread.

How are scientists using genomic data in the coronavirus (2019-nCoV) outbreak?

The same data sharing principles that were present in the past Ebola epidemic are starting to appear again in the new coronavirus outbreak. However, this time sequencing has become even more cost-effective and faster.

Coronaviruses represent a large family of human respiratory viruses. Most people will have been infected with a coronavirus at some point in their lives, with some strains causing upper respiratory infections similar to the common cold. Over the last 20 years, three coronavirus strains of note have emerged as products of antigenic shift, causing clinical signs ranging from minor symptoms to more “serious” issues. These strains are severe acute respiratory syndrome (SARS), Middle East respiratory syndrome (MERS) and now the new coronavirus (COVID-19). There is still confusion over the name of the new coronavirus: the disease has officially been named COVID-19 by the WHO, but the virus (2019-nCoV) is still currently known by multiple names*. SARS was first reported in early 2003 and originated from civet cats. During this epidemic, 8,098 people became sick and 774 died (case-fatality ratio: approximately 10%). MERS appeared in 2012 and was originally transmitted from camels. During the outbreak, there were 2,494 confirmed cases and 858 fatalities (case-fatality ratio: approximately 35%). The case-fatality ratio for COVID-19 cannot be calculated until the epidemic has ended.
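
The case-fatality ratios quoted above are simply deaths divided by confirmed cases. As a rough illustration, the short Python sketch below reproduces that arithmetic from the SARS and MERS figures in the text; the function name is our own and is not taken from any epidemiology package.

```python
def case_fatality_ratio(deaths: int, cases: int) -> float:
    """Return deaths as a percentage of confirmed cases."""
    if cases <= 0:
        raise ValueError("cases must be a positive number")
    return 100.0 * deaths / cases


# (deaths, confirmed cases) as quoted in the article
outbreaks = {"SARS": (774, 8098), "MERS": (858, 2494)}

for name, (deaths, cases) in outbreaks.items():
    print(f"{name}: {case_fatality_ratio(deaths, cases):.1f}%")
# SARS: 9.6%
# MERS: 34.4%
```

The unrounded values (9.6% and 34.4%) sit slightly below the rounded figures given in the text. A calculation like this is only meaningful once case and death counts are final, which is why no value is given for COVID-19.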

2019-nCoV emerged in Wuhan, China in late 2019. The first people infected with the coronavirus were associated with a seafood and live animal market. The exact dynamics of how the virus is transmitted are yet to be determined. Normally, respiratory illnesses are transmitted through droplets created when an infected person coughs or sneezes, or via contaminated surfaces. The symptoms of coronavirus infection range from mild to severe, and to date there is no specific treatment for the disease. Therapeutics are in development, but because the disease is so new, supportive care is currently the only recommended treatment.

As in the Ebola epidemic, scientists are doing a great job of keeping data publicly accessible and freely available. This approach has drastically accelerated research. On January 29, 2020, the Institut Pasteur sequenced the whole genome of the coronavirus known as “2019-nCoV”, becoming the first institute in Europe to do so since the start of the outbreak. The whole genome was confirmed in just three days. Around the world, 20 other sequences have been completed and are freely available. This rapid sequencing and dissemination of information has occurred more quickly than during the Ebola epidemic, mainly due to the availability of cheaper and quicker sequencing technologies.

So far, the 2019-nCoV samples appear to be very closely related to each other. The lack of diversity in these viruses suggests that there is little pre-existing immunity in the population, and therefore little selective pressure on the coronavirus strain to mutate in order to adapt and spread. At this early point in the epidemic, however, this is very speculative.

To prevent further spread, scientists and medical staff need to conduct thorough case and outbreak investigations. To identify those with 2019-nCoV, accurate diagnostic tests need to be developed and deployed. Diagnoses can be made using polymerase chain reaction (PCR) to detect the virus and chest X-rays to look for clinical changes. To have an effective PCR diagnostic, scientists need to identify unique and conserved regions of the virus. The more comprehensive our knowledge of the viral genome sequence, the better able scientists are to design PCR-based tests that are specific and sensitive in detecting coronavirus. As such, the WHO (in partnership with the German Center for Infection Research at Charité – Universitätsmedizin Berlin) and CDC have both published new laboratory tests and protocols.
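
To give a feel for what “conserved regions” means in practice, here is a minimal Python sketch that scans a set of already-aligned sequences for stretches where every genome carries the same bases. The sequences are short made-up strings, not real 2019-nCoV data, and the function name is hypothetical. Real primer design also has to check that a region is unique to the virus, as well as melting temperature and other constraints, none of which is attempted here.

```python
def conserved_windows(aligned, width):
    """Return (start, subsequence) for each window identical in all sequences.

    `aligned` is a list of equal-length strings from a multiple sequence
    alignment; '-' marks a gap. Positions are 0-based.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")

    # True for each column where every sequence has the same non-gap base.
    same = [len(set(col)) == 1 and col[0] != "-" for col in zip(*aligned)]

    hits = []
    for start in range(length - width + 1):
        if all(same[start:start + width]):
            hits.append((start, aligned[0][start:start + width]))
    return hits


# Toy alignment (hypothetical data): the second sequence differs at
# position 9 and the third at position 15.
toy_alignment = [
    "ATGGCTAGCTTACGGA",
    "ATGGCTAGCATACGGA",
    "ATGGCTAGCTTACGGT",
]

print(conserved_windows(toy_alignment, 8))
# [(0, 'ATGGCTAG'), (1, 'TGGCTAGC')]
```

The more genomes that are shared, the more confident one can be that a window flagged as conserved really is stable across the outbreak.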

Using genetic sequencing for Influenza predictions

In the Northern hemisphere, between October 2018 and May 2019, 37.4–42.9 million people became ill with influenza (flu), and 36,400–61,200 of these people died. Signs and symptoms of flu include respiratory symptoms, muscle aches and nausea, and severe cases can progress to pneumonia and respiratory failure.

Flu is caused by the influenza virus, which is part of the Orthomyxoviridae virus family. Influenza is a single-stranded, negative-sense RNA virus that causes acute respiratory illness in humans. There are three types (A, B and C) that circulate in the human population. Influenza viruses of type B and C mutate slowly and circulate at low levels.

Type A mutates more rapidly and can therefore evade the immune response of individuals who have previously been infected with, or vaccinated against, another strain of influenza. Our immune system primarily recognizes influenza by identifying two surface proteins: hemagglutinin (HA) and neuraminidase (NA). The influenza strain name depends on the versions of HA and NA the virus has, such as H1N1 and H3N2. H3N2 is known to have a high mutation rate, as it can change by 3–4 amino acid residues per year. Mutations in HA or NA that alter key residues responsible for immune recognition can make the virus undetectable to the immune system.
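
Counting amino acid changes between two versions of a protein is straightforward once the sequences are aligned. The Python sketch below is a minimal illustration, assuming two equal-length, already-aligned protein fragments; the fragments are toy strings rather than real HA sequences, and the function name is our own.

```python
def substitutions(reference, variant):
    """List (position, ref_residue, var_residue) where two aligned proteins differ.

    Positions are 1-based, following the usual convention for residue numbering.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, ref_aa, var_aa)
        for i, (ref_aa, var_aa) in enumerate(zip(reference, variant))
        if ref_aa != var_aa
    ]


# Hypothetical protein fragments that differ at two residues.
earlier_season = "MKTIIALSYIFCLALG"
later_season = "MKTIIALSHIFCLVLG"

for position, old, new in substitutions(earlier_season, later_season):
    print(f"{old}{position}{new}")
# Y9H
# A14V
```

Whether a given change matters for immune recognition depends on where it falls in the protein, which a simple count like this cannot tell us.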

Influenza vaccines are developed at the start of the flu season, based on scientific predictions about the influenza strains that are likely to cause problems in the upcoming flu season. To help optimize the development of vaccines, there is a global initiative called GISAID that aims to help share all types of influenza sequences. This initiative seeks to investigate influenza evolution by providing a public data set of complete influenza genome sequences from collections of isolates representing diverse species distributions. This resource provides international researchers with the information needed to develop vaccines, therapies and diagnostics, and to improve overall understanding of the molecular evolution of influenza and of the genetic factors that determine virulence. This knowledge can help mitigate the impact of annual influenza epidemics and inform the design of influenza vaccines. Once the vaccine design has been decided, it can take five to six months for the first batch to be available.


In the last few decades, genome sequencing has become faster and cheaper, meaning research teams across the globe can use it as a tool to fight and defeat emerging epidemics. During recent epidemics, we have seen increasing numbers of scientists selflessly putting the needs of the sick above ego. When valuable data are released swiftly to the global scientific community, more resources can be applied to epidemic problems. If humanity wants to defeat future epidemics and other crises, then scientists need to be willing to continue to make more data freely available.

* Correct at time of writing [20/02/2020]