Introduction
As of August 2021, the origins of the SARS-CoV-2 virus that caused the COVID-19 pandemic remain a mystery. This essay summarizes a recent article I published about the prevalence in news media articles of two popular hypotheses about SARS-CoV-2 virus origins: the natural emergence and the lab-leak hypotheses. Results show that for most of 2020, the natural emergence hypothesis was favored in news media content while the lab-leak hypothesis was largely absent. However, something changed around May 2021 that caused the prevalence of the lab-leak hypothesis to substantially increase in news media discourse. This shift was not uniform across media organizations but instead manifested itself more acutely in some outlets than others. A structural break analysis of daily news media usage of terms related to the laboratory escape hypothesis provides hints about potential sources for this sudden shift in the prevalence of the lab-leak hypothesis in mainstream news media.
Figure 1 illustrates the validity of the methodology used by tracking salient sociocultural phenomena related to the COVID-19 pandemic.
Prevalence of Two COVID-19-Origins Hypotheses in News Media
The first officially acknowledged signs of COVID-19 surfaced around December 2019 in Wuhan, China [1]. Chinese authorities initially reported that many early cases had been traced back to the Wuhan wet market. This was reminiscent of the 2003 SARS1 outbreak, in which a bat virus first jumped to civets, some of which were sold in wet markets, and from there leaped the species barrier again to infect people [2]. Many scientists and Chinese government officials proposed that a similar event could have happened again, perhaps with pangolins as the intermediate host this time [3] or through direct transmission from bats to humans [4]. The Wuhan wet market was pointed to as the possible breeding ground of the outbreak [5]. News media at the time echoed this first plausible explanation (see the first row of Figure 2) and largely ignored the possibility of a lab-leak, as evidenced in the second, third, and fourth rows of Figure 2. Over time, however, conclusive evidence supporting the natural emergence hypothesis has not materialized: no signs of prior infection of an intermediate animal host have been found despite an intensive search [6]. Early cases of COVID-19 with no link to the Wuhan wet market were also eventually reported [7]. This perhaps explains the decreasing prevalence of the intermediate host hypothesis in news media content since its peak around February–April of 2020; see Figure 2 a–d.
The possibility of a lab-leak was largely absent from news media discourse during most of 2020 and gained prominence only in mid-2021; see the second row of Figure 2. A compelling reason not to rule out the lab-leak hypothesis is that Wuhan hosts a virology laboratory that conducts research on coronaviruses, the Wuhan Institute of Virology (WIV). Media interest in the lab, however, has peaked only recently, as reflected in Figure 2 i.
The WIV is also China’s only maximum biosafety level-4 (BSL-4) laboratory, meaning it is authorized and equipped to work on the most dangerous viral pathogens; see Figure 2 j. Critically, since at least 2015, this research lab had been working on gain-of-function experiments to make coronavirus strains better able to infect cells lining the human respiratory tract [6], [8], allegedly under inadequate safety conditions [9]. Media mentions of the WIV engaging in this type of research picked up only in May–June of 2021; see Figure 2 k.
The hypothesis of a lab-leak was dismissed early in 2020 as a conspiracy theory by some prominent members of the scientific community, and their opinions were published in prestigious scientific journals such as The Lancet [10] and Nature Medicine [11]. This could perhaps explain why mainstream news media mostly echoed the natural emergence hypothesis and largely ignored the lab-leak alternative during most of 2020 as the world suffered the brunt of the pandemic.
At least one signatory of The Lancet letter [10] was a member and president of the EcoHealth Alliance, an organization that had funded coronavirus gain-of-function research at the Wuhan Institute of Virology with U.S. government grants from the National Institute of Allergy and Infectious Diseases (NIAID) [6], [8]. The NIAID coronavirus gain-of-function research grant to the Wuhan Institute of Virology through EcoHealth has only recently attracted substantial media attention; see subplot l in Figure 2.
The most similar public genome to SARS-CoV-2 is that of a bat coronavirus known as RaTG13, with a genome similarity to SARS-CoV-2 of 96% [4]. This virus was retrieved from a cave in Yunnan province (1,800 km away from Wuhan) and was sequenced and published by staff from the Wuhan Institute of Virology [4]. Media attention to it has also only recently become prominent; see Figure 2 m.
Several relevant molecular features of SARS-CoV-2 were also largely underreported by mainstream news media during 2020. In the middle of the SARS2 spike protein, a motif called the furin cleavage site is critical for the subunits of the spike protein (S1 and S2) to be cut apart by furin, a protein-cutting enzyme on the surface of human cells [12]. Such cleavage allows the virus to fuse with the target cell’s membrane, inject its genetic material into the cell, and cause the cell to generate new copies of the virus. The human furin protein will cut any protein chain that carries the motif amino acid sequence proline-arginine-arginine-alanine (PRRA). SARS2 is the only SARS-related beta-coronavirus with a furin cleavage site, making it particularly optimized to target human cells [6], [13]. Yet news media outlets largely overlooked this molecular feature of the virus until recently; see Figure 2 n.
At the S1/S2 junction, the 12-nucleotide sequence encoding the PRRA motif is T-CCT-CGG-CGG-GC. This motif renders the protein chain susceptible to cleavage by furin and allows viral particles to fuse with human cells’ membranes. The sequence has the unusual feature that its double arginine codon pattern, CGG-CGG, has never been found in any other beta-coronavirus [6]. This molecular characteristic also appears to have been absent from news media discourse until recently; see Figure 2 o. Perhaps as a result of the unusual molecular features of SARS-CoV-2 discussed above, some news media outlets have recently started to mention the possibility that the virus might have been manipulated in a lab; see Figure 2 p.
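As a small illustration of the codon pattern described above, the following Python sketch reads the complete codons in the 12-nucleotide sequence. It is only a sketch: the codon table is a hand-written subset of the standard genetic code covering what this example needs, and the function name is hypothetical. The leading T belongs to the preceding codon and is skipped. The trailing GC is an incomplete codon, but all four standard codons beginning with GC encode alanine, so the fourth residue can be resolved without knowing the next nucleotide.

```python
# Subset of the standard genetic code (DNA codons), limited to this example.
CODON_TABLE = {
    "CCT": "P",  # proline
    "CGG": "R",  # arginine
    "GCA": "A", "GCC": "A", "GCG": "A", "GCT": "A",  # alanine
}


def translate_insert(hyphenated):
    """Translate a hyphen-separated sequence such as 'T-CCT-CGG-CGG-GC'.

    Hypothetical helper: complete codons are looked up directly, a
    two-nucleotide fragment is resolved only if every possible third
    nucleotide gives the same amino acid, and single nucleotides
    (parts of neighbouring codons) are ignored.
    """
    residues = []
    for part in hyphenated.split("-"):
        if len(part) == 3:
            residues.append(CODON_TABLE[part])
        elif len(part) == 2:
            options = {CODON_TABLE.get(part + n) for n in "ACGT"}
            unambiguous = len(options) == 1 and None not in options
            residues.append(options.pop() if unambiguous else "?")
    return "".join(residues)


sequence = "T-CCT-CGG-CGG-GC"
motif = translate_insert(sequence)
cgg_count = sequence.split("-").count("CGG")
print(motif, cgg_count)
```

Here the two consecutive CGG codons are the double arginine pattern that the text describes as unusual.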
High Frequency Analysis of the COVID-19 Lab-leak Hypothesis Prevalence in News Media
Figure 2 shows only that the prevalence of the lab-leak hypothesis in news media content markedly increased in May and June of 2021. To visualize higher resolution dynamics around this period, the previous analysis is replicated using weekly frequency counts for a set of key target words denoting the lab-leak hypothesis theme. Figure 3 shows that the prevalence in news media of the lab-leak hypothesis theme increased during the month of May, spiked in the last week of that month, and then decreased gradually as the month of June progressed. There also appear to be milder peaks in the prevalence of this topic in mid-February and in the week at the end of March/beginning of April.
Figure 3 also illustrates that not all news media outlets have manifested a spike in the prevalence of the lab-leak hypothesis theme in their textual content. Instead, the increased prevalence has been driven mainly by just some outlets such as Fox News, The New York Post, The Wall Street Journal, and The Washington Post.
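The weekly counts discussed above could be computed along the following lines. This is a minimal Python sketch, not the pipeline used in the study: the function name, the tokenization, the grouping by ISO week, and the normalization by total words per week are all assumptions, and the sample articles are invented for illustration.

```python
import re
from collections import defaultdict
from datetime import date


def weekly_relative_frequency(articles, target_terms):
    """Return {(iso_year, iso_week): target mentions / total words}.

    `articles` is an iterable of (publication_date, text) pairs.
    Normalizing by the total word count is an assumption made here so
    that weeks with more published text are comparable to quieter weeks.
    """
    patterns = [
        re.compile(r"\b" + re.escape(term.lower()) + r"\b")
        for term in target_terms
    ]
    hits = defaultdict(int)
    words = defaultdict(int)
    for day, text in articles:
        iso = day.isocalendar()
        week = (iso[0], iso[1])
        lowered = text.lower()
        hits[week] += sum(len(p.findall(lowered)) for p in patterns)
        words[week] += len(re.findall(r"[a-z0-9'-]+", lowered))
    return {week: hits[week] / words[week] for week in words if words[week]}


# Invented sample data, for illustration only.
sample = [
    (date(2021, 5, 24), "lab leak theory gains ground"),
    (date(2021, 5, 26), "the lab leak debate"),
    (date(2021, 6, 10), "vaccines roll out"),
]
weekly = weekly_relative_frequency(sample, ["lab leak"])
for week in sorted(weekly):
    print(week, round(weekly[week], 3))
```

Grouping by calendar day instead of ISO week would give the daily resolution, at the cost of noisier counts.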
To examine the temporal dynamics of the lab-leak hypothesis theme in news media at even finer granularity, Figure 4 shows daily frequency counts of target words in news media content from 1 January 2021 to 30 June 2021. The usage peaks identified in Figure 3 guided a search for potentially relevant events around those dates that could have plausibly influenced media coverage of the lab-leak hypothesis.
Figure 4 highlights six such potentially relevant events. The first three are the beginning of the World Health Organization’s (WHO) field visit to China to investigate the origins of the pandemic, their visit to the Wuhan Institute of Virology, and the end of their field visit to China. The next event corresponds to the publication of the WHO report on its Wuhan field visit investigation and the simultaneous WHO recommendation calling for further studies while reiterating that several hypotheses about COVID-19 origins remain open [14].
The next relevant event concerns Nicholas Wade, a former science reporter at the New York Times, and his publication of “The origin of COVID: Did people or nature open Pandora’s box at Wuhan?” on May 5, 2021 [6]. In his article, Wade enumerated what he considered substantial evidence pointing in the direction of the lab-leak hypothesis, although he acknowledged that no definite proof existed yet for either the natural emergence or the lab-leak hypotheses.
The final event likely to have influenced press coverage of the lab-leak hypothesis is the order given by the U.S. president, Joe Biden, to the U.S. intelligence community on 26 May 2021 to further investigate the origins of the COVID-19 virus and report back to him within 90 days [15].
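The introduction mentions a structural break analysis of the daily counts, but this summary does not say which test was used. As one simple possibility, the Python sketch below finds a single mean-shift break by least squares: it tries every split point and keeps the one that minimizes the summed squared deviations of the two segments from their own means. The function name, the `min_segment` parameter, and the sample series are assumptions made for illustration, not the study's method or data.

```python
def best_single_break(series, min_segment=2):
    """Return the index k that best splits `series` into series[:k] and
    series[k:], assuming one shift in the mean (least-squares criterion).
    Returns None if the series is too short to split."""

    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((x - mean) ** 2 for x in segment)

    best_k, best_cost = None, float("inf")
    for k in range(min_segment, len(series) - min_segment + 1):
        cost = sse(series[:k]) + sse(series[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


# Invented daily counts with a jump after the fifth observation.
daily_counts = [1, 1, 2, 1, 1, 9, 10, 9, 10]
break_index = best_single_break(daily_counts)
print(break_index)
```

A detected break date only indicates when usage shifted; linking it to a specific event, such as those highlighted in Figure 4, remains a matter of interpretation.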
Latent Associations about COVID-19 Origins in News Media Content
While frequency analysis of a corpus of text can be informative about the thematic prevalence of certain topics, the technique is also limited in that it does not analyze the context in which words are being used. To overcome this limitation, we used embedding models to analyze the strength with which sets of words are associated (i.e., appear in the vicinity of each other or in similar contexts) in news media articles.
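To make the idea of association strength concrete, the Python sketch below scores two sets of words by the cosine similarity of their averaged vectors. The summary does not specify the embedding model or the similarity measure, so cosine similarity over mean vectors is an assumption, as are the function names. The three-dimensional vectors are made up for illustration; in practice they would come from a model trained on the news articles of a given time period.

```python
import math


def mean_vector(words, embeddings):
    """Average the vectors of the given words, component by component."""
    vectors = [embeddings[w] for w in words]
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(set_a, set_b, embeddings):
    """Association strength between two word sets (assumed measure)."""
    return cosine(mean_vector(set_a, embeddings), mean_vector(set_b, embeddings))


# Made-up toy vectors, for illustration only.
toy_embeddings = {
    "coronavirus": [0.9, 0.1, 0.2],
    "covid": [0.8, 0.2, 0.1],
    "pangolin": [0.7, 0.3, 0.0],
    "senate": [0.0, 0.1, 0.9],
}
virus_terms = ["coronavirus", "covid"]
related = association(virus_terms, ["pangolin"], toy_embeddings)
unrelated = association(virus_terms, ["senate"], toy_embeddings)
print(related > unrelated)
```

Repeating such a computation with vectors trained on successive time windows would yield a time series of association strength for each pair of word sets.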
Figure 5 shows the results of the analysis. The subplots in the first row of Figure 5, drawn with a dashed orange line, serve only to illustrate that the technique produces sensible results. It detects the timing of events such as Donald Trump’s infection and subsequent positive test for COVID-19, Joe Biden winning the Democratic Party nomination for the U.S. presidency around March/April 2020, and his subsequent victory in the U.S. presidential election of November 2020.
The second row of Figure 5 illustrates the decreasing prevalence of the intermediate host hypothesis in news media as shown by the declining association of coronavirus with potential intermediate hosts such as pangolins, civets, and bats, as well as peak association of the virus with wet markets between January and February of 2020.
The third row of Figure 5 shows how news media have recently started to associate terms such as covid or coronavirus more strongly with a lab-leak or the Wuhan Institute of Virology. Subplot k in the figure also shows that during 2020, the media mostly did not report on the gain-of-function research experiments conducted at the WIV since 2015 [6], [8]. Similarly, associations about the dangerous nature of gain-of-function research are stronger in mid-2021 than at any time in 2020. What all these associations have in common is that their strength peaked around May and June of 2021.
Subplots m and n in Figure 5 show that research grants from NIAID/NIH have become more prominently linked with gain-of-function research at the WIV in the last few months. Subplot o also illustrates how, in recent journalistic discourse, the lab-leak hypothesis is often associated with terms denoting racism. The embedding method used cannot discern whether such associations occur because news media content suggests that it is racist to propose the lab-leak hypothesis, or because some writers argue that the hypothesis was not properly scrutinized earlier owing to concerns about accusations of racism or a wish not to stir up racist sentiment. The pattern could also result from a combination of these possibilities. Finally, associations between the WIV and a potential laboratory accident have become more prominent in 2021, although the relationship was also briefly common in April and May of 2020; see subplot p in Figure 5.
Conclusion
As of August 2021, the origins of the SARS-CoV-2 virus that caused the COVID-19 pandemic remain a mystery. The results presented here suggest that for most of 2020, popular news media outlets mostly ignored or downplayed the possibility of a lab-leak as a reason for the virus outbreak. Perhaps the publication in prestigious scientific journals, such as The Lancet and Nature Medicine, of opinion pieces dismissing the lab-leak hypothesis [10], [11] played a role in media attitudes towards this hypothesis.
Alternatively, the fact that early in the pandemic the then U.S. president, Donald Trump, advocated for the lab-leak hypothesis without providing explicit evidence [17] could have rendered the hypothesis unpalatable for most news outlets, due in part to the notorious mutual animosity between news media organizations and Trump.
In May 2021, however, something caused the prevalence of the lab-leak hypothesis in news media discourse to spike in some, but not all, of the studied media outlets. The evidence presented above suggests, but does not establish, that Nicholas Wade’s essay may have triggered the sudden increase in attention to the lab-leak hypothesis in at least some outlets.
It is also noteworthy that most of the interest in the lab-leak hypothesis seems to have emerged in right-leaning news outlets (Fox News, The Wall Street Journal, and The New York Post), although The Washington Post, a prestigious left-leaning newspaper [19], has also shown an increasing prevalence of the lab-leak hypothesis in its content.
Accurate reporting on current events is essential to maintain trust between the public and news media organizations. If additional supporting evidence for the lab-leak hypothesis eventually surfaces while supporting evidence for the natural emergence hypothesis fails to materialize, the downplaying of the lab-leak hypothesis in mainstream news outlets during the first 16 months of the pandemic could erode public trust in news media.
References
[1] C. Sohrabi et al., “World Health Organization declares global emergency: A review of the 2019 novel coronavirus (COVID-19),” Int. J. Surg. Lond. Engl., vol. 76, pp. 71–76, Apr. 2020, doi: 10.1016/j.ijsu.2020.02.034.
[2] J. W. LeDuc and M. A. Barry, “SARS, the First Pandemic of the 21st Century,” Emerg. Infect. Dis., vol. 10, no. 11, p. e26, Nov. 2004, doi: 10.3201/eid1011.040797_02.
[3] T. Zhang, Q. Wu, and Z. Zhang, “Probable Pangolin Origin of SARS-CoV-2 Associated with the COVID-19 Outbreak,” Curr. Biol. CB, vol. 30, no. 8, p. 1578, Apr. 2020, doi: 10.1016/j.cub.2020.03.063.
[4] P. Zhou et al., “A pneumonia outbreak associated with a new coronavirus of probable bat origin,” Nature, vol. 579, no. 7798, pp. 270–273, Mar. 2020, doi: 10.1038/s41586-020-2012-7.
[5] China Daily, “Wuhan wet market closes amid pneumonia outbreak,” Jan. 01, 2020. https://www.chinadaily.com.cn/a/202001/01/WS5e0c6a49a310cf3e35581e30.html (accessed Jul. 03, 2021).
[6] N. Wade, “The origin of COVID: Did people or nature open Pandora’s box at Wuhan?,” Bulletin of the Atomic Scientists, May 05, 2021. https://thebulletin.org/2021/05/the-origin-of-covid-did-people-or-nature-open-pandoras-box-at-wuhan/ (accessed Jun. 30, 2021).
[7] J. F.-W. Chan et al., “A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster,” Lancet Lond. Engl., vol. 395, no. 10223, pp. 514–523, 2020, doi: 10.1016/S0140-6736(20)30154-9.
[8] P. Daszak, “Understanding the Risk of Bat Coronavirus Emergence,” Jun. 01, 2014. https://reporter.nih.gov/search/xQW6UJmWfUuOV01ntGvLwQ/project-details/9491676 (accessed Jul. 01, 2021).
[9] “Opinion | State Department cables warned of safety issues at Wuhan lab studying bat coronaviruses,” Washington Post, Apr. 14, 2020. Accessed: Jul. 17, 2021. [Online]. Available: https://www.washingtonpost.com/opinions/2020/04/14/state-department-cables-warned-safety-issues-wuhan-lab-studying-bat-coronaviruses/
[10] C. Calisher et al., “Statement in support of the scientists, public health professionals, and medical professionals of China combatting COVID-19,” The Lancet, vol. 395, no. 10226, pp. e42–e43, Mar. 2020, doi: 10.1016/S0140-6736(20)30418-9.
[11] K. G. Andersen, A. Rambaut, W. I. Lipkin, E. C. Holmes, and R. F. Garry, “The proximal origin of SARS-CoV-2,” Nat. Med., vol. 26, no. 4, pp. 450–452, Apr. 2020, doi: 10.1038/s41591-020-0820-9.
[12] B. A. Johnson et al., “Furin Cleavage Site Is Key to SARS-CoV-2 Pathogenesis,” bioRxiv, p. 2020.08.26.268854, Aug. 2020, doi: 10.1101/2020.08.26.268854.
[13] T. P. Peacock et al., “The furin cleavage site in the SARS-CoV-2 spike protein is required for transmission in ferrets,” Nat. Microbiol., vol. 6, no. 7, pp. 899–909, Jul. 2021, doi: 10.1038/s41564-021-00908-w.
[14] World Health Organization, “WHO calls for further studies, data on origin of SARS-CoV-2 virus, reiterates that all hypotheses remain open,” Mar. 30, 2021. https://www.who.int/news/item/30-03-2021-who-calls-for-further-studies-data-on-origin-of-sars-cov-2-virus-reiterates-that-all-hypotheses-remain-open (accessed Jul. 02, 2021).
[15] REUTERS, “Biden orders review of COVID origins as lab leak theory debated,” Reuters, May 26, 2021. https://www.reuters.com/business/healthcare-pharmaceuticals/biden-says-us-intelligence-community-divided-covid-origin-2021-05-26/ (accessed Jul. 02, 2021).
[16] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed Representations of Words and Phrases and their Compositionality,” in Advances in Neural Information Processing Systems 26, C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2013, pp. 3111–3119. Accessed: May 11, 2017. [Online]. Available: http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
[17] “Coronavirus: Trump stands by China lab origin theory for virus,” BBC News, May 01, 2020. Accessed: Aug. 07, 2021. [Online]. Available: https://www.bbc.com/news/world-us-canada-52496098
[18] COVID-19 Data Repository CSSE - JHU, COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. 2021. Accessed: Aug. 09, 2021. [Online]. Available: https://github.com/CSSEGISandData/COVID-19
[19] “AllSides Media Bias Ratings,” AllSides, 2019. https://www.allsides.com/blog/updated-allsides-media-bias-chart-version-11 (accessed May 10, 2020).
Nice article David! I think Philipp Markolin's excellent dissection and detailed explanation of the lab leak origin hypothesis won me over to a likely natural zoonotic event despite the arguments raised in Nicholas Wade's article, particularly over the origin of the furin cleavage site and the virus's unusual codon preference that you mention in your piece. Markolin doesn't dismiss the leak idea altogether, but zoonosis now makes more sense to me. Of course, the unfortunate association between the lab origin hypothesis and conspiracy mongering by the right wing press does nothing to clear the air or relieve tensions. And I've been taken aback by the frequently aggressive debunking of the lab origin hypothesis by the scientific community, which does nothing to increase public confidence in expert opinion either!
Sad I’m just seeing this write up! Have you since conducted a follow-up to this excellent data and analysis? I would love to compare current lab-leak origin frequency in media with mid-2021 frequency you observed in the data given the recent moderately confident US energy dept and FBI pro lab-leak conclusions. How might those impact or update your then-tentative conclusion?