A preprint article published by Harvard University suggests that trends in hospital traffic and search engine data from the Chinese city of Wuhan in late summer and early fall 2019 could be attributed to SARS-CoV-2.¹
The emergence of SARS-CoV-2, the virus that has caused the COVID-19 pandemic, was initially linked to the Huanan Seafood Market in Wuhan, China, in late November to early December 2019. However, the authors of the article highlight a few discrepancies in this account. First, research that linked two-thirds of the identified coronavirus cases to the Huanan Seafood Market failed to find a direct connection to the market for 14 of the cases, including the first case. Consequently, the point of origin and infection remains an open question.²,³,⁴
Second, wildlife sampled at the Huanan market could not be linked to SARS-CoV-2, implying that transmission of the virus could have occurred downstream of the spillover event, the point at which a virus "spills over" from one species to another.²,³,⁴
In the article, Nsoesie et al. describe how these discrepancies led them to consider whether SARS-CoV-2 could have been circulating in Wuhan before it was linked to the market.
Digital epidemiology and non-traditional data streams, including internet search trends, have previously proven to be valuable tools in respiratory disease surveillance. The researchers therefore adopted this approach to explore a potential alternative origin of SARS-CoV-2.
In the study, they used vehicle counts extracted from satellite imagery of hospital parking lots in Wuhan to estimate trends in hospital occupancy, and explored how these related to reported influenza-like illness.
They also used search trends from Baidu, the Chinese technology company specializing in internet-related services, to analyze searches for disease-related terms.
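To make the approach concrete, the short Python sketch below shows one way such signals could be compared. It is not the authors' method: scaling each hospital's counts by its own maximum is an assumed normalization, the function names `relative_volume` and `pearson` are hypothetical, and the numbers are made up for illustration rather than taken from the preprint.

```python
from math import sqrt


def relative_volume(counts):
    """Scale one hospital's car counts by its own peak (assumed normalization)."""
    peak = max(counts)
    if peak <= 0:
        raise ValueError("counts must contain at least one positive value")
    return [c / peak for c in counts]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical values for illustration only, NOT data from the preprint.
car_counts = [120, 150, 180, 240, 300]   # cars counted per satellite image
search_index = [10, 12, 15, 21, 25]      # search volume for a symptom term

volume = relative_volume(car_counts)
r = pearson(volume, search_index)
print(f"relative volume: {volume}")
print(f"correlation with search index: {r:.3f}")
```

A high coefficient here would only show that the two series rise and fall together; it would not identify what caused either trend.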
Increased number of hospital visits
The researchers collected 111 satellite images of Wuhan from January 9 2018 to April 30 2020. They found that, between 2018 and 2020, there was a general increasing trend of hospital occupancy as measured by the parking lot volume.
A steep increase in numbers began in August 2019 and culminated in a peak in December 2019. Between September and October 2019, five of the six hospitals showed their highest relative daily volume in the analyzed data series. Interestingly, this coincided with increased internet search queries for terms such as "diarrhea" and "cough". The authors note that the increase in searches for "diarrhea" is detectable only in late 2019, whereas "cough" shows yearly peaks that align with influenza season. This finding parallels the recent acknowledgement that gastrointestinal symptoms are, as the authors put it, "a unique feature of COVID-19 disease and may be the chief complaint of a significant proportion of presenting patients". Following the public health lockdown of Wuhan on January 23 2020, the authors found a large decrease in both hospital volume and search query data.
“This study is currently a preprint and so has not undergone peer-review. Using search engine data and satellite imagery of hospital traffic to detect disease outbreaks is an interesting idea with some validity. However, it’s important to remember that the data are only correlative and (as the authors admit) cannot identify the cause of the uptick. By focussing on hospitals in Wuhan, the acknowledged epicentre of the outbreak, the study forces the correlation. It would have been interesting (and possibly much more convincing) to have seen control analyses of other Chinese cities outside of the Hubei region.” - Professor Paul Digard, Chair of Virology, University of Edinburgh.
In the article the authors state: "While we cannot confirm if the increased volume was directly related to the new virus, our evidence supports other recent work showing that emergence happened before identification at the Huanan Seafood market. These findings also corroborate the hypothesis that the virus emerged naturally in southern China and was potentially already circulating at the time of the Wuhan cluster."
They add: "In August, we identify a unique increase in searches for diarrhea which was neither seen in previous flu seasons or mirrored in the cough search data. While surprising, this finding lines up with the recent recognition that gastrointestinal (GI) symptoms are a unique feature of COVID-19 disease and may be the chief complaint of a significant proportion of presenting patients."
Considering the limitations
It's important to note that analyzing search query data comes with its own set of limitations. There is no indication of why an individual searched for a specific term, and not all symptom searches reflect actual illness.
In the article, the authors go on to say: "These data are also vulnerable to fluctuations related to events we might not be aware of and individual search behavior changes over time, which may result in spurious signals. Surveillance using web-query data depends on adequate Internet access and Internet penetration in China can be highly variable. However, by the end of 2017, the internet penetration rate was 70.7% in Wuhan which was 14.9% higher than the national average."
To conclude, the researchers state that further research is required to establish when SARS-CoV-2 emerged, but that this study adds to the body of work on the value of digital resources for monitoring disease outbreaks.
This article is based on research findings that are yet to be peer-reviewed. Results are therefore regarded as preliminary and should be interpreted as such. For further information, please contact the cited source.
1. Nsoesie et al. (2020). Analysis of hospital traffic and search engine data in Wuhan China indicates early disease activity in the Fall of 2019. Harvard University's DASH repository. https://dash.harvard.edu/bitstream/handle/1/42669767/Satellite_Images_Baidu_COVID19_manuscript_preprint.pdf?sequence=1&isAllowed=y
2. Lu et al. (2020). Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. Lancet. DOI: 10.1016/S0140-6736(20)30251-8
3. Benvenuto et al. (2020). The 2019-new coronavirus epidemic: Evidence for virus evolution. J Med Virol. DOI: 10.1002/jmv.25688
4. Duchene et al. (2020). Temporal signal and the phylodynamic threshold of SARS-CoV-2. bioRxiv. DOI: 10.1101/2020.05.04.077735