key: cord-0995695-v7rik5td
authors: Tyshkovskiy, Alexander; Panchin, Alexander Y.
title: There is still no evidence of SARS‐CoV‐2 laboratory origin: Response to Segreto and Deigin (10.1002/bies.202100137)
date: 2021-10-26
journal: Bioessays
DOI: 10.1002/bies.202100194
sha: 09178c74ae565c27501cdad9c92e36357e60f09d
doc_id: 995695
cord_uid: v7rik5td

The causative agent of COVID‐19, SARS‐CoV‐2, has led to over 4 million deaths worldwide. Understanding the origin of this coronavirus is important for the prevention of future outbreaks. The dominant point of view that the virus transferred to humans either directly from bats or through an intermediate mammalian host has been challenged by Segreto and Deigin, who claim that the genome of SARS‐CoV‐2 has certain features suggestive of its artificial creation. Following their response to our commentary, here we continue the discussion of the proposed arguments for this hypothesis. We show that neither the existence of a furin cleavage site in SARS‐CoV‐2, nor the presence of specific sequences within the nucleotide insertion encoding that site are evidence for intelligent design. We also explain why existing genetic data, viral diversity and past human history suggest that a natural origin of the virus is the most likely scenario. Genetic evidence suggesting otherwise is yet to be presented.

More than a year after the start of the COVID-19 pandemic, the origin of SARS-CoV-2 remains a widely discussed topic. Some proponents of the lab origin hypothesis, including Segreto and Deigin, have claimed that the genome of the coronavirus (CoV) has certain features "which could be consistent with a lab origin", including (i) the similarity of the SARS-CoV-2 backbone and receptor binding domain (RBD) with the backbone of bat CoV RaTG13 and RBD of pangolin CoV MP789, respectively, and (ii) the presence of a 12-nucleotide insertion that resulted in the formation of the S1/S2 furin cleavage site (FCS). 
[1, 2] Even though the authors themselves admit that these observations are also consistent with the scenario of natural emergence, that is, "the genetic structure of SARS-CoV-2 is consistent with both natural or laboratory origin", they use them as an argument for the hypothesis of SARS-CoV-2 artificial creation. This line of argumentation appears to be scientifically invalid, since any genetic structure of the virus would be consistent with some scenario of laboratory engineering. Even if one finds a natural reservoir with a close relative of the original Wuhan SARS-CoV-2 strain demonstrating 99.8% genome similarity, one could still claim that the remaining ∼60 mutations in the genome were introduced in the Wuhan Institute of Virology (WIV). Therefore, given that the discovery of a complete clone of one of the earliest SARS-CoV-2 representatives is highly improbable, the general hypothesis that the virus has been modified in the lab doesn't appear to be falsifiable and therefore doesn't meet the necessary criteria for scientific hypotheses sensu Popper, unless a very specific scenario is presented. [3] In contrast, the natural evolution scenario can be falsified by both genetic sequence analysis and formal investigations. In addition, the hypothesis of natural origin has a higher prior probability given past human history as well as present viral abundance and diversity. Indeed, among more than 80 viral emerging infectious diseases since 1940, none have been caused by a genetically modified virus, [4] and only one, the 1977 Russian flu pandemic, is speculated to have originated in the laboratory through a live-vaccine trial, [5] although no complete evidence for this scenario has been established. [6] To date there are over 200 known circulating viruses that infect humans, none of which are a result of genetic engineering, and only one, the Marburg virus, has leaked from a lab prior to being described. 
[7] Remarkably, even in the latter case the infectious agent originated in nature without laboratory manipulations. Finally, it is estimated that wildlife currently harbors hundreds of thousands of viruses with zoonotic potential, orders of magnitude more than any laboratory. [8] In this regard, to be considered plausible, the hypothesis of lab origin of SARS-CoV-2 should be accompanied by a specific scenario and strong arguments that would significantly favor it over the more probable and parsimonious hypothesis of natural origin. As we demonstrate below, no such arguments have been provided by Segreto and Deigin in their initial manuscript and response to our commentary. [9] Meanwhile, the evidence in favor of SARS-CoV-2 natural origin has been growing and is reviewed elsewhere. [10]

POINTS OF DISAGREEMENT WITH THE SEGRETO / DEIGIN HYPOTHESIS

1. In our criticism of Segreto's and Deigin's lab leak hypothesis we provided a bioinformatic analysis, showing that SARS-CoV-2 could not have been made from coronaviruses RaTG13 and MP789. The authors responded that "we never claim that RaTG13 itself is a 'proposed ancestor' or the backbone used for a possible construction of SARS-CoV-2" and emphasized that their hypothesis was about a "RaTG13-like backbone and an RBD from a MP789-like pangolin CoV". 
[2] Although the authors didn't claim that RaTG13 itself was an ancestor of SARS-CoV-2 in their article in BioEssays, they explicitly and repeatedly stated that in other resources, including their online blogs: "And CoV2 is an obvious chimera (though not nesessarily a lab-made one), which is based on the ancestral bat strain RaTG13, in which the receptor binding motif (RBM) in its spike protein is replaced by the RBM from a pangolin strain, and in addition, a small but very special stretch of four amino acids is inserted, which creates a furin cleavage site that, as virologists have previously established, significantly expands the "repertoire" of the virus in terms of whose cells it can penetrate". [11] Remarkably, the blog post hasn't been corrected after our commentary as we are writing this response. In contrast with this detailed hypothesis, Segreto's and Deigin's article in BioEssays contains no definition of RaTG13-like or MP789-like viruses, allowing for multiple interpretations. [1] After all, SARS-CoV-2 itself has over 96% nucleotide identity to RaTG13 and therefore could be considered "RaTG13-like". In this case the hypothesis becomes trivial: SARS-CoV-2 was created from SARS-CoV-2. For this reason, in our response we divided this hypothesis into two more specific scenarios:

1. SARS-CoV-2 was made from viruses that are nearly identical to RaTG13 and MP789.
2. SARS-CoV-2 was made from undescribed viruses that are more distant from RaTG13 and MP789 (> 2% nucleotide difference).

We then provided arguments against both scenarios. The first scenario is not consistent with the bioinformatic analysis provided in our original commentary, which still holds true for all RaTG13-like and MP789-like viruses discovered since 2013 with at least ∼98% nucleotide identity. Moreover, since the start of the pandemic, new naturally occurring coronaviruses such as RacCS203, [12] RpYN06, [13] PrC31, [14] and RmYN02 [15] have been discovered that are more similar to SARS-CoV-2 in their polyprotein 1ab genomic sequences than RaTG13. 
Remarkably, several coronaviruses recently found in horseshoe bats in Laos have RBDs that are more similar to that of SARS-CoV-2, based on both nucleotide (93.6% similarity) and amino acid (97.4% similarity) sequences, than the RBDs of RaTG13 (85.5% and 89.2%, respectively) and MP789 (86.6% and 96.9%, respectively). [16] Finally, one of these bat coronaviruses, BANAL-52, demonstrates even higher overall genome similarity with SARS-CoV-2 (96.8%) than RaTG13 (96.2%). This provides the most conclusive evidence that known viruses from WIV cannot be viewed as SARS-CoV-2's templates. It is understandable why this first subhypothesis is especially appealing to lab origin proponents: the fact that RaTG13 has been identified by researchers from the WIV provides an additional link between the lab and the pandemic. But this scenario appears to be implausible.

The second scenario assumes that scientists at the WIV combined two undescribed coronaviruses, which are relatively distinct from RaTG13 and MP789 (> 2% nucleotide difference) but share higher similarity with the backbone and RBD encoding fragment of SARS-CoV-2, respectively. Although this assumption could be true, it is less probable than the hypothesis of natural recombination between the unknown coronaviruses, given the much higher prevalence of coronaviruses and recombination events in bat populations compared to laboratories. [8, 17] Moreover, since naturally occurring coronaviruses with higher similarity to SARS-CoV-2 within both backbone and RBD at the same time have been recently discovered in Laos, the hypothesis of artificial "recombination" between two unknown viruses becomes even less plausible. In the end, this scenario of SARS-CoV-2 creation appears to be less parsimonious than the hypothesis of a laboratory leak of a naturally evolved virus or a natural spillover. 
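The similarity percentages quoted above are pairwise identities computed over aligned sequences. The following is a minimal sketch of that calculation, not the method used in the cited studies: the function names are hypothetical, the inputs are assumed to be already aligned, and columns containing a gap are skipped (other conventions for gaps exist and give slightly different numbers). The second helper illustrates the arithmetic behind the "99.8% similarity, ∼60 mutations" example given earlier, assuming a genome length of 29,903 nt for the Wuhan reference sequence.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical positions between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are ignored, which is
    one of several possible conventions (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gap columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def expected_differences(identity_percent, genome_length):
    """Number of differing positions implied by a given percent identity."""
    return genome_length * (100.0 - identity_percent) / 100.0


# Toy sequences (hypothetical): 7 of 8 positions match.
print(percent_identity("ACGTACGT", "ACGTACGA"))  # 87.5

# 99.8% identity over an assumed 29,903-nt genome leaves about 60 differences.
print(round(expected_differences(99.8, 29903)))  # 60
```

Real comparisons of this kind depend on the alignment method and on which genomic region is compared, which is why backbone and RBD identities are reported separately in the text.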
Thus, both scenarios of SARS-CoV-2 construction from the RaTG13-like and MP789-like CoVs are not supported by available genomic data for related coronaviruses and are further weakened by new discoveries. [16] Moreover, Boni et al. have argued that under the most parsimonious scenario of S protein evolution, it is the RBD of RaTG13, rather than that of SARS-CoV-2, that has emerged as a result of recombination. [18]

2. On the other hand, the genetic similarity of SARS-CoV-2 with other coronaviruses appears to be completely consistent with the hypothesis of its natural origin. Specifically, we observed similar substitution patterns distinguishing nucleotide sequences of SARS-CoV-2 and SARS-CoV from their closest described bat relatives RaTG13 and Rs4231. [19] While this analysis does not exclude all of the numerous proposed scenarios of SARS-CoV-2 intelligent design, it argues against the use of artificially accelerated mutagenesis during SARS-CoV-2 evolution prior to the pandemic. For example, the mutagenic guanosine analog ribavirin specifically increases the rate of C > U and G > A transitions. [20] The RBD of SARS-CoV-2 was also shown to bind ACE2 receptors of bats, which makes the scenario of its emergence in these animals plausible. Commenting on the hypothesis of bat origin of SARS-CoV-2, Segreto and Deigin state that the RBD of the coronavirus is "peculiar because it is characterized by a very high binding affinity to the human ACE2 receptor, but it binds poorly to the bat ACE2 receptor". [2] However, the reference used by the authors to support this claim is not an experimental study but a computational analysis of SARS-CoV-2 binding to various vertebrate ACE2 receptors. [21] Moreover, several crucial experimental studies conducted afterwards appear to challenge this claim. Yan et al. used an infectious assay to show that the ACE2 of 25 of the 46 tested bat species supports SARS-CoV-2 entry and that for several bat species the infection rate was comparable to human ACE2. 
[22] Schlottau et al. also showed transient infection and virus transmission after SARS-CoV-2 exposure in a species of fruit bats. [23] In addition, the adaptation of bat coronavirus RBD to the human ACE2 receptor doesn't seem to be an unlikely event. For example, it was shown that a single T403R mutation in RaTG13 spike protein allows this bat coronavirus to utilize the human ACE2 receptor for infection of human cells and intestinal organoids. [24] Thus, naturally occurring sarbecoviruses can easily change their preference for ACE2 receptors of different hosts. This is further confirmed by the discovery of naturally occurring bat coronaviruses in Laos, which share RBDs highly similar to that of SARS-CoV-2. Remarkably, their spikes have been shown to bind the human ACE2 receptor and infect human cells in a pseudovirus entry assay with an efficiency comparable to that of the SARS-CoV-2 Wuhan strain. [16] Therefore, the higher affinity of SARS-CoV-2 for human cells than for those of some bat species appears to be consistent with the hypothesis of bat origin of the coronavirus with the subsequent tuning of its RBD to the new host.

3. In our initial analysis of Segreto's and Deigin's hypothesis we pointed out that their finding of a FauI site in the 12-nt FCS insertion encoding sequence is indistinguishable from a random coincidence, because there are many commercially available restriction enzymes and there is a high probability (∼99.5%) of finding at least one position cut by some enzyme in a nucleotide fragment of this length. [9] In their response Segreto and Deigin provided several alternative analyses of this probability based on a number of assumptions that are different from ours, such as limiting the search to restriction enzymes that recognize 5 bp or larger sequences. 
In the end, the authors concluded that we have "underestimated the probability of a 12-nt insertion not containing a 5+ restriction enzyme cut site by 2 orders of magnitude: rather than the 0.5% implied by their [our] calculation, the actual probability is around 50%". [2] We could discuss which statistical models and assumptions are more reasonable in the given circumstances. However, this is not necessary, since the smallest probability achieved by Segreto and Deigin in their own calculation (p = 0.468) is still far higher, by an order of magnitude, than the relaxed threshold (p = 0.05) used to reject the null hypothesis in statistical analysis. In this case, the null hypothesis is that the FauI restriction site has emerged in the 12-nt insertion by coincidence, and Segreto's and Deigin's statistical analysis provides no significant deviation from this scenario. Thus, the authors themselves have confirmed our thesis that the finding of some restriction site in a 12-nt fragment is consistent with random coincidence. Therefore, it shouldn't be used as an argument for the scenario of artificial construction of SARS-CoV-2, in the same way as two "heads" or "tails" in a row do not suggest that a coin is biased. Segreto and Deigin also speculate that "the FauI site is notable for its unique property which enables it to be used to screen precisely whether the two arginines (R) in the newly created RRAR polybasic cleavage site are still present, as FauI's recognition sequence is created by the CGGCGG codons coding for RR". This argument seems to be incorrect, since the FauI recognition consensus. [25] [26] [27] This does not match the RRAR site found in SARS-CoV-2. The SARS-CoV-2 FCS is also different from those present in other well studied coronaviruses, such as HKU1 [28] and MERS, [29] the use of which would be reasonable from an experimental perspective, unlike the use of the previously undescribed RRAR. 
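The restriction-site argument above can be made concrete with a short sketch. It is an illustration under stated assumptions, not a reproduction of either group's calculation. Assumed here (not given in the text): the 12-nt insertion is taken as the commonly reported sequence CCTCGGCGGGCA, the FauI recognition sequence is taken as CCCGC, the function names are hypothetical, and the enzyme panel passed to the simulation is a placeholder, so the simulated fractions are not expected to match the ∼99.5% or ∼50% figures discussed in the text.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def find_site(seq, site):
    """List (strand, 0-based start) for each occurrence of `site` in `seq`.

    The reverse strand is searched by looking for the reverse complement
    of the site on the forward strand.
    """
    patterns = {site: "+"}
    patterns.setdefault(reverse_complement(site), "-")  # skip if palindromic
    hits = []
    for pattern, strand in patterns.items():
        start = seq.find(pattern)
        while start != -1:
            hits.append((strand, start))
            start = seq.find(pattern, start + 1)
    return hits


def fraction_with_any_site(sites, length=12, trials=20000, seed=0):
    """Monte Carlo estimate of how often a random fragment of the given
    length contains at least one of the recognition sequences in `sites`,
    assuming equal base frequencies (a simplification)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        fragment = "".join(rng.choice("ACGT") for _ in range(length))
        if any(find_site(fragment, s) for s in sites):
            hits += 1
    return hits / trials


insertion = "CCTCGGCGGGCA"   # assumed sequence of the 12-nt insertion
faui = "CCCGC"               # assumed FauI recognition sequence

print(find_site(insertion, faui))  # [('-', 5)]: site found on the reverse strand

# Chance of one specific 5-bp site in a random 12-mer, versus a toy panel
# of several hypothetical 5-bp sites: the more enzymes are considered,
# the less surprising a single hit becomes.
print(fraction_with_any_site([faui]))
print(fraction_with_any_site([faui, "GGATC", "GACGC", "CTGAG", "GCATC"]))
```

The design point is the one made in the text: the probability of a coincidental hit depends strongly on how many recognition sequences are admitted to the search, so a single match in a 12-nt fragment carries little evidential weight.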
At the same time, various FCS have emerged independently multiple times in the natural history of coronaviruses, indicating that such events are not rare. Natural generation of novel multibasic cleavage sites via insertions has also been reported for other viruses, such as H5 and H7 influenza viruses. [30] [31] [32] Although the CGG codon is rarely used by the sarbecoviruses, the naturally occurring Rabbit coronavirus HKU14 (NCBI Reference Sequence: NC_017083.1) also has a CGGCGG sequence encoding two arginines, demonstrating that such coding sequences can emerge in betacoronaviruses in nature. Moreover, if the 12-nt sequence was acquired as an insertion via template switch, the source CGGCGG nucleotides were not required to code for arginines, because they were not necessarily in frame. This scenario seems plausible, since it has been shown that insertions are not infrequent in SARS-CoV-2 and that "although this insert has a high GC-content compared to the genomic average of SARS-CoV-2, it falls within the GC-content range of the long inserts" and is located within 20 nucleotides of a template switch hotspot at position 22 582. [33] Furthermore, Holmes et al. reported that "both CGG codons are more than 99.8% conserved among the > 1,800,000 near complete SARS-CoV-2 genomes sequenced to date," [10] suggesting that these codons are preserved by stabilizing selection and, therefore, may be associated with high fitness. In this regard, the FCS insertion structure appears to be fully consistent with the hypothesis of natural origin and subsequent evolution-driven codon optimization.

At the end of their response to our commentary, Segreto and Deigin state that "the out-of-frame insertion of the furin cleavage site is not proven to be natural". [2] It is important to reiterate that there is no way to prove something is natural based on genetic sequence alone, since one can reproduce any naturally occurring mutation in a lab setting. 
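What an "out-of-frame" insertion of 12 nucleotides does to a reading frame can be illustrated with a toy example. The sketch below is only an illustration: the reading frame, the inserted sequences and the function names are all hypothetical and are not taken from the SARS-CoV-2 genome. It checks whether the codons downstream of an insertion point are read unchanged after the insertion.

```python
def downstream_codons(seq, pos):
    """Frame-0 codons of `seq` that start at or after nucleotide index `pos`."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3) if i >= pos]


def insert_at(seq, pos, insert):
    """Return `seq` with `insert` placed before nucleotide index `pos`."""
    return seq[:pos] + insert + seq[pos:]


def frame_preserved(seq, pos, insert):
    """True if the codons downstream of the insertion point are unchanged."""
    mutated = insert_at(seq, pos, insert)
    return downstream_codons(seq, pos) == downstream_codons(mutated, pos + len(insert))


# Toy reading frame (hypothetical): ATG GCT AAT TCT CGT AGT GTA
toy_orf = "ATGGCTAATTCTCGTAGTGTA"
split_pos = 10  # falls inside the codon TCT (indices 9-11)

# A 12-nt insert splits the codon but leaves the downstream frame intact.
print(frame_preserved(toy_orf, split_pos, "AAACCCGGGTTT"))  # True

# An 11-nt insert at the same position shifts every downstream codon.
print(frame_preserved(toy_orf, split_pos, "AAACCCGGGTT"))   # False
```

Only the codon that is split and the newly inserted codons differ from the original reading; everything downstream is translated as before whenever the insert length is a multiple of three.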
As we have pointed out in our initial response, it is not unusual for insertions with the length of a multiple of three nucleotides to emerge in viral protein-coding sequences even if they occur in the middle of existing codons, because they do not cause a frameshift in the original viral protein coding sequence. [9, 34] Such cases have been observed even for some strains of SARS-CoV-2. Since the start of the pandemic, more than 20 long insertions in the coronavirus genome have been identified, some of which are even bigger than the 12-nt FCS insertion and have also resulted in a codon split. [33]

CONCLUSIONS

The genetic features of SARS-CoV-2 described by Segreto and Deigin, including nucleotide sequence similarity to several other coronaviruses and the presence of a 12-nt FCS insertion, do not favor artificial creation over natural evolution. The majority of arguments presented by the authors appear to be severely flawed. Some observations, such as the use of a non-canonical furin cleavage site, poorly fit the lab origin scenario, thus providing additional support for SARS-CoV-2 natural emergence. Past human history and present viral diversity in wildlife also suggest a higher prior probability of the virus's natural origin compared to intelligent design. Taken together, this allows us to conclude that natural emergence remains the most plausible scenario of SARS-CoV-2 origin and evidence to the contrary is yet to be presented. 
REFERENCES

[1] The genetic structure of SARS-CoV-2 does not rule out a laboratory origin: SARS-COV-2 chimeric structure and furin cleavage site might be the result of genetic manipulation
[2] The genetic structure of SARS-CoV-2 is consistent with both natural or laboratory origin: Response to Tyshkovskiy and Panchin
[3] Bioinformatic analysis indicates that SARS-CoV-2 is unrelated to known artificial coronaviruses
[4] Global trends in emerging infectious diseases
[5] The reemergent 1977 H1N1 strain and the gain-of-function debate
[6] Influenza in China in 1977: Recurrence of influenzavirus A subtype H1N1
[7] Forty-five years of Marburg virus research
[8] The global virome project
[9] There is no evidence of SARS-CoV-2 laboratory origin: Response to Segreto and Deigin
[10] The origins of SARS-CoV-2: A critical review
[11] Lab-Made? SARS-CoV-2 Genealogy Through the Lens of Gain-of-Function Research
[12] Evidence for SARS-CoV-2 related coronaviruses circulating in bats and pangolins in Southeast Asia
[13] Identification of novel bat coronaviruses sheds light on the evolutionary origins of SARS-CoV-2 and related viruses
[14] A novel SARS-CoV-2 related coronavirus with complex recombination isolated from bats in Yunnan province
[15] A novel bat coronavirus closely related to SARS-CoV-2 contains natural insertions at the S1/S2 cleavage site of the spike protein
[16] Coronaviruses with a SARS-CoV-2-like receptor-binding domain allowing ACE2-mediated entry into human cells isolated from bats of Indochinese peninsula
[17] Discovery of a rich gene pool of bat SARS-related coronaviruses provides new insights into the origin of SARS coronavirus
[18] Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic
[19] Excessive G-U transversions in novel allele variants in SARS-CoV-2 genomes
[20] Antiviral effect of ribavirin against HCV associated with increased frequency of G-TO-A and C-TO-U transitions in infectious cell culture model
[21] Broad host range of SARS-CoV-2 predicted by comparative and structural analysis of ACE2 in vertebrates
[22] ACE2 receptor usage reveals variation in susceptibility to SARS-CoV and SARS-CoV-2 infection among bat species
[23] SARS-CoV-2 in fruit bats, ferrets, pigs, and chickens: an experimental transmission study
[24] Spike mutation T403R allows bat coronavirus RaTG13 to use human ACE2
[25] Host cell proteases controlling virus pathogenicity
[26] Cleavage at the furin consensus sequence RAR/KR 109 and presence of the intervening peptide of the respiratory syncytial virus fusion protein are dispensable for virus replication in cell culture
[27] Proteolytic activation of the spike protein at a novel RRRR/S motif is implicated in furin-dependent entry, syncytium formation, and infectivity of coronavirus infectious bronchitis virus in cultured cells
[28] Furin cleavage sites naturally occur in coronaviruses
[29] The sequence at Spike S1/S2 site enables cleavage by furin and phospho-regulation in SARS-CoV2 but not in SARS-CoV1 or MERS-CoV
[30] Source of high pathogenicity of an avian influenza virus H5N1: Why H5 is better cleaved by furin
[31] Characterization of the 1918 influenza virus polymerase genes
[32] From low to high pathogenicity-Characterization of H7N7 avian influenza viruses in two epidemiologically linked outbreaks
[33] Insertions in SARS-CoV-2 genome caused by template switch and duplications give rise to new variants of potential concern
[34] Virulence-associated sequence duplication at the hemagglutinin cleavage site of avian influenza viruses

The author declares no conflict of interests. Data are available by request from the corresponding authors.