key: cord-1047935-ciqa1tax
authors: Li, Yang; Zhang, Yunjun; Liang, Mifang; Zhang, Yi; Ma, Xuejun; Zhang, Yong; Zhou, Xiaohua
title: Lack of evolutionary changes identified in SARS-CoV-2 for the re-emerging outbreak of COVID-19 in Beijing, China
date: 2021-12-25
journal: Biosaf Health
DOI: 10.1016/j.bsheal.2021.12.001
sha: 39758ded60b797e6f903f244a8f19432062cdebf
doc_id: 1047935
cord_uid: ciqa1tax

Although substantial evidence has shown that the COVID-19 resurgence in Beijing, China, was initiated by contaminated frozen products transported via the cold chain, international travelers with asymptomatic infections or false-negative nucleic acid test results may represent another possible route by which the virus entered Beijing. A key difference between these two hypotheses is whether the virus had been actively replicating, because, to date, no reports have shown that viruses stop evolving in living hosts. We studied the SARS-CoV-2 sequences from this outbreak with a modified leaf-dating method combined with the Bayes factor. The numbers of single nucleotide variants (SNVs) in the outbreak sequences were significantly lower than those called from B.1.1 records collected worldwide at the matching time (p = 0.047). In addition, leaf-dating estimated the ages of the viruses sampled from this outbreak to be earlier than their recorded collection dates (Bayes factors > 10), whereas control sequences (selected randomly, with ten replicates) showed no difference from their collection dates (Bayes factors < 10). Our results indicate that the re-emergence of SARS-CoV-2 in Beijing in June 2020 was caused by a virus that lacked the evolutionary changes seen in viruses collected at the corresponding time, and they provide evolutionary evidence that contaminated imported frozen food was the likely source of the reappearance of COVID-19 cases in Beijing.
The method developed here might also help provide the very first clues to the potential sources of future COVID-19 cases.

Keywords: Molecular clock; Frozen Virus; Leaf-dating; Bayes factors; SARS-CoV-2

A super-spreading COVID-19 outbreak at the Xinfadi (XFD) market in Beijing in June 2020 was thought to have been caused by contaminated imported frozen food.
However, this hypothesis raised three critical questions: (i) Could SARS-CoV-2 be transmitted from the environment to humans? (ii) Would the infectivity of SARS-CoV-2 be reduced by the transportation and storage involved in international cold-chain food logistics? (iii) Could "frozen" features be observed in the SARS-CoV-2 genomes sequenced from this outbreak? Significant progress has been made on these questions. Pang and colleagues combined molecular evidence (ancestral sequences circulating in Europe, B.1.1 lineage) with epidemiological investigations to conclude that environment-to-human transmission originating from contaminated imported food was responsible for the COVID-19 resurgence in Beijing [1]. In addition, SARS-CoV-2 was successfully isolated from the surface of an imported frozen cod package, and cytopathic effects (CPE) were observed in Vero-E6 cells inoculated with the isolated virus [2]. Therefore, the first two questions have been well addressed, whereas the third remains unresolved. Importantly, answering the third question could reveal whether international travelers with asymptomatic infections or false-negative nucleic acid test results brought the virus into Beijing [3]. We previously reported that the sequences found at XFD were "older" than European viruses collected at the same time [4], which led to the "frozen evolution" hypothesis. A typical characteristic of frozen viral isolates is that they accumulate no mutations while in storage. Because phylogenetic and evolutionary analyses have previously been used to identify a "frozen" virus as a potential cause of arbovirus re-emergence in France [5], here we aimed to conduct similar analyses to explore whether the XFD SARS-CoV-2 sequences show "frozen" genomic features.
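The reasoning behind the "frozen" signature can be made concrete with a simplified strict-clock picture. This is an illustrative approximation, not the full Bayesian leaf-dating model of [5], and the symbols are introduced here only for the sketch: d_i is the number of substitutions separating sequence i from the root of the tree, μ is the clock rate (substitutions per genome per unit time), t_0 is the root date, t_i is the true sampling date, and Δ is a hypothetical period during which the virus is stored without replicating.

```latex
\begin{aligned}
\text{replicating:}\quad & \mathbb{E}[d_i] = \mu\,(t_i - t_0)
  &&\Longrightarrow\quad \mathbb{E}[\hat{t}_i] = t_0 + \mathbb{E}[d_i]/\mu = t_i, \\
\text{frozen for } \Delta:\quad & \mathbb{E}[d_i] = \mu\,(t_i - \Delta - t_0)
  &&\Longrightarrow\quad \mathbb{E}[\hat{t}_i] = t_0 + \mathbb{E}[d_i]/\mu = t_i - \Delta .
\end{aligned}
```

Under this picture, a frozen virus carries fewer SNVs than contemporaneous sequences and receives an estimated date roughly Δ earlier than its recorded collection date, which are the two signals examined in this study.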
We first collected the SARS-CoV-2 sequences sampled at XFD in June 2020 from GISAID (https://www.gisaid.org; 5 records; accession IDs: EPI_ISL_3154875, EPI_ISL_469254, EPI_ISL_469255, EPI_ISL_469256 and EPI_ISL_850948) and the Genome Warehouse (https://ngdc.cncb.ac.cn/gwh/; 3 records; accession IDs: GWHANPA01000001, GWHANPB01000001 and GWHANPC01000001) (access date: 21/Oct/2021) (Table S1). Because the genomes from this outbreak belonged to the B.1.1 lineage, which had previously been circulating outside China, it is reasonable to compare the XFD genomes with records sampled worldwide at the corresponding time. We therefore built a collection of all B.1.1 sequences collected during June 2020, regardless of location (the B.1.1 collection) (Table S1). Sequences with fewer than 15 ambiguous bases (N) in the genome were then kept, leaving 2,355 of 3,541 sequences (Table S1). We aligned the XFD SARS-CoV-2 sequences with the Burrows-Wheeler Aligner (BWA) [6], using the SARS-CoV-2 reference sequence (NC_045512.2) as the reference genome. The resulting BAM files were sorted with SAMtools [7], and the sorted BAM files were then analyzed with BCFtools [7] to generate variant call format (VCF) files using the command line:

The leaf-dating method was developed to estimate the ages of sequences whose sampling dates are unknown [5]. Here, we treated both the XFD sequences and a subset of the B.1.1 collection as sequences of unknown age and estimated their computational collection time (DateE) by leaf-dating with BEAST v2.6.2 [8]. Briefly, background sequences with known collection times were randomly selected (seven sequences per week) from the B.1.1 collection. We used the HKY85 nucleotide substitution model with gamma-distributed rate variation, a strict clock model and exponential population growth. The dates (DateE) of the sequences being dated were given a uniform prior from 1/Jan/2020 to 30/Nov/2020.
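The two sequence-selection steps above (the ambiguity filter and the random draw of seven background sequences per week) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each record is an (accession, collection date) pair, defines a "week" as an ISO calendar week, and uses a fixed random seed so that a replicate can be reproduced; all three choices are assumptions made for the sketch.

```python
import random
from collections import defaultdict
from datetime import date

# Cut-off quoted in the text: keep genomes with fewer than 15 ambiguous N bases.
MAX_N = 15


def passes_n_filter(seq: str, max_n: int = MAX_N) -> bool:
    """Return True if the genome contains fewer than `max_n` 'N' bases."""
    return seq.upper().count("N") < max_n


def sample_background(records, per_week=7, seed=0):
    """Randomly pick up to `per_week` records from each ISO calendar week.

    `records` is a list of (accession, collection_date) tuples; this layout
    and the use of ISO weeks are assumptions made for the sketch.
    """
    by_week = defaultdict(list)
    for accession, collected in records:
        iso_year, iso_week, _ = collected.isocalendar()
        by_week[(iso_year, iso_week)].append((accession, collected))

    rng = random.Random(seed)  # fixed seed so a replicate can be reproduced
    picked = []
    for week in sorted(by_week):
        group = by_week[week]
        picked.extend(rng.sample(group, min(per_week, len(group))))
    return picked


if __name__ == "__main__":
    genomes = {"seq_ok": "ACGT" * 10, "seq_bad": "N" * 20 + "ACGT"}
    print([name for name, s in genomes.items() if passes_n_filter(s)])  # ['seq_ok']
    # Two toy records per day for 1-14 June 2020, i.e. two full ISO weeks.
    records = [(f"acc{i}", date(2020, 6, 1 + i // 2)) for i in range(28)]
    print(len(sample_background(records, seed=1)))  # 14
```

Running the sampler with ten different seeds would give ten independent background sets, mirroring the ten replicates described in the text.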
The remaining prior parameters were as described in a previous study [9]. The chain length was set to 100 million states with a 10% burn-in, and convergence was evaluated with Tracer v1.7.1 [10]. Ten replicates of leaf-dating with the Bayes factor test were run. To test whether the estimated date (DateE) of each sequence differed from its recorded collection date (DateR), the Bayes factor was calculated with the Savage-Dickey density ratio [11] from the prior and posterior distributions of sequence ages generated by leaf-dating. Bayes factors were interpreted as follows: a value of at least 10 indicated "strong" support for DateE ≠ DateR, a value between 3.2 and 10 "positive" support, and a value between 1 and 3.2 support "not worth more than a bare mention" [12]. The Mann-Whitney U test was applied to non-normally distributed variables, with a two-tailed P < 0.05 considered statistically significant.

To determine whether the SARS-CoV-2 genomes sequenced from XFD in Beijing differed intrinsically from the B. (Figure 3), which might indicate a problem with the true age [5]. We next calculated the Bayes factors over these outputs to test this hypothesis. The results showed that the age of the XFD virus that caused the outbreak in Beijing could be earlier than its DateR (Bayes factor > 10). In contrast, the low-mutation and regular control groups showed no significant difference from their DateR (Bayes factor < 10). Our results were robust to different sampling datasets (Figure 4). In addition, the ages of SARS-CoV-2 sequences with many mutations (≥ 17 SNVs) might be later than their DateR (Figure S2). This was expected, because sequences carrying many mutations should appear younger than their DateR. The mechanism by which so many mutations accumulated in such a short time requires further study.
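The Savage-Dickey step can be sketched as follows. For a point null θ = DateR nested inside a model with prior p(θ), the ratio gives BF10 = p(θ = DateR) / p(θ = DateR | data): the prior density divided by the posterior density, both evaluated at DateR. This is a generic illustration, not the authors' published code; it assumes the uniform prior window stated above, posterior draws of a sequence's date expressed as days since 1/Jan/2020, and a Gaussian kernel density estimate (with Silverman's rule-of-thumb bandwidth) for the posterior density. The KDE and bandwidth rule are our own choices.

```python
import math
import random
import statistics
from datetime import date

# Uniform prior window on the sampling date, as stated in the text.
PRIOR_START = date(2020, 1, 1)
PRIOR_END = date(2020, 11, 30)


def to_days(d: date) -> float:
    """Convert a calendar date to days since the start of the prior window."""
    return float((d - PRIOR_START).days)


def kde_density(samples, x):
    """Gaussian kernel density estimate at x (Silverman's rule-of-thumb bandwidth)."""
    n = len(samples)
    sd = statistics.stdev(samples)
    if sd == 0:
        raise ValueError("posterior samples have zero spread")
    h = 1.06 * sd * n ** (-1 / 5)
    total = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
    return total / (n * h * math.sqrt(2 * math.pi))


def savage_dickey_bf10(posterior_days, date_r: date) -> float:
    """BF10 for DateE != DateR = prior density / posterior density at DateR."""
    prior_density = 1.0 / (to_days(PRIOR_END) - to_days(PRIOR_START))
    posterior_density = kde_density(posterior_days, to_days(date_r))
    return prior_density / max(posterior_density, 1e-300)  # avoid division by zero


def interpret(bf10: float) -> str:
    """Map BF10 onto the verbal categories quoted in the text [12]."""
    if bf10 >= 10:
        return "strong"
    if bf10 >= 3.2:
        return "positive"
    if bf10 >= 1:
        return "not worth more than a bare mention"
    return "favours DateE = DateR"


if __name__ == "__main__":
    rng = random.Random(0)
    date_r = date(2020, 6, 15)
    # Hypothetical posterior draws centred about 66 days before DateR.
    early = [to_days(date(2020, 4, 10)) + rng.gauss(0, 10) for _ in range(2000)]
    # Hypothetical posterior draws centred on DateR.
    on_time = [to_days(date_r) + rng.gauss(0, 10) for _ in range(2000)]
    for label, draws in (("early", early), ("on_time", on_time)):
        bf = savage_dickey_bf10(draws, date_r)
        print(label, round(bf, 3), interpret(bf))
```

In practice the posterior draws would come from the post-burn-in BEAST log for the dated tip; a posterior concentrated away from DateR drives the density at DateR towards zero and the Bayes factor upwards.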
Based on epidemiological data, one significant conclusion about the potential source of the XFD outbreak was that it originated from a seafood booth contaminated with SARS-CoV-2 [1]. These facts led to two hypotheses about the virus that triggered the XFD outbreak: 1) it was imported by an infected person and somehow contaminated the frozen food; or 2) it was imported on the frozen food and then spread in Beijing [3]. The question can therefore be simplified to whether the frozen food was contaminated before or after its arrival in China, which is equivalent to asking whether or not the virus had been frozen. Using the modified leaf-dating method developed here, we showed that the re-emergence of SARS-CoV-2 in Beijing in June 2020 was caused by a virus that exhibited a lack of evolutionary changes compared with viruses collected at the corresponding time (Figures 2-4). In other words, we revealed "frozen" genomic features in the SARS-CoV-2 sequences from the COVID-19 outbreak at the Xinfadi market in Beijing in June 2020. Although a frozen viral isolate would not accumulate mutations while in storage, it should be noted that a low mutation count may be a necessary but not a sufficient condition for a "frozen virus." Our results from the modified leaf-dating method showed that the DateE of sequences with few mutations (≤ 7 SNVs) did not differ from their DateR (Bayes factors < 10) (Figures 2 and 3). In other words, a "frozen virus" shows not only fewer SNVs but also more complex evolutionary features (e.g., mutation positions and substitution patterns), which require further study. Thus, the SARS-CoV-2 in the XFD outbreak in Beijing showed a lack of evolutionary changes. The modified leaf-dating method proposed in this study provides a quantitative way to test, through the Bayes factor, for a difference between the computational age (estimated by leaf-dating) and the collection date, in two steps.
In the first step, the ages of the target sequences (e.g., the XFD and control sequences in this study) were assumed to be unknown and were estimated with the original leaf-dating method [5]. In the second step, the gap between the computational age (DateE) and the recorded collection date (DateR) of each target sequence was tested with a Bayes factor calculated through the Savage-Dickey density ratio. The code is available at https://github.com/yunPKU/BayesFactorCalculation. We note that the method used in this study could provide alternative insights for identifying the possible source of a re-emergence of COVID-

This study has some limitations. First, because the power of molecular clock analysis is reduced when the study period shrinks from years or decades to only months (as in this study), DateE carries large uncertainty, and a specific estimate of how much earlier the XFD sequences are than expected might be biased and misleading. Therefore, we tested DateE ≠ DateR with the Bayes factor instead of reporting a specific DateE. Second, the method cannot be applied to viruses that have undergone recombination.

Border control and quarantine have effectively prevented the spread of SARS-CoV-2 by infected travelers in China. However, strict monitoring strategies for imported goods, especially cold-chain products, need to be developed accordingly to prevent potential large secondary outbreaks in the country while emerging SARS-CoV-2 variants, such as Delta, are still spreading worldwide.

The reference group in the comparison was Beijing-Asia. P-values were determined by the Mann-Whitney U test for two-group comparisons, with medians reported. ***: P < 0.001; **: P < 0.01; *: P < 0.05; ns: P > 0.05.
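The SNV comparison can be sketched in two parts: counting SNV records in each genome's VCF, then comparing two groups of counts with a two-sided Mann-Whitney U test and labelling the result with the star codes from the legend. This is a minimal sketch, not the authors' code. It assumes one haploid genome per VCF, treats only records with a single-base REF and single-base ALT alleles as SNVs, and uses the large-sample normal approximation with a tie correction but no continuity correction. For small groups (the XFD set has eight genomes), an exact test, for example via SciPy's `scipy.stats.mannwhitneyu`, would usually be preferred.

```python
import math


def count_snvs(vcf_text):
    """Count single-nucleotide variant records in a single-sample VCF string."""
    n = 0
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):  # skip meta and header lines
            continue
        fields = line.split("\t")
        ref, alts = fields[3].upper(), fields[4].upper().split(",")
        if len(ref) == 1 and all(len(a) == 1 and a in "ACGT" for a in alts):
            n += 1
    return n


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for x, two-sided p) via the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_counts = {}
    for v in combined:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values()) / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    if sigma == 0:  # all values identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value


def significance_label(p):
    """Star codes used in the figure legend."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"
```

In use, one would compute `count_snvs` for every XFD and background VCF and pass the two lists of counts to `mann_whitney_u`. Multi-base records (indels, or an MNP written as a single GGG>AAC record) are not counted by this sketch; whether such events appear as one record or several depends on the variant caller's output.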
References
[1] Cold-chain food contamination as the possible origin of COVID-19 resurgence in Beijing.
[2] Cold-chain transportation in the frozen food industry may have caused a recurrence of COVID-19 cases in destination: Successful isolation of SARS-CoV-2 virus from the imported frozen cod package surface.
[3] Source of Beijing's big new COVID-19 outbreak is still a mystery.
[4] China's CDC experts investigate Xinfadi market three times, announce groundbreaking virus tracing discovery.
[5] A Bayesian phylogenetic method to estimate unknown sequence ages.
[6] Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM.
[7] The Sequence Alignment/Map format and SAMtools.
[8] BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis.
[9] Evolutionary & epidemiological analysis of 93 genomes.
[10] Posterior Summarization in Bayesian Phylogenetics Using Tracer 1.7.
[11] Bayesian hypothesis testing for psychologists: a tutorial on the Savage-Dickey method.
[12] Bayes factors.
[13] Evidence for host-dependent RNA editing in the transcriptome of SARS-CoV-2.

We appreciate the helpful discussion with Dr. Tao Zhao from Nvidia and Prof. Li

Supplementary data to this article can be found online.