key: cord-0139462-zk440c6t authors: Massey, Steven E title: SARS-CoV-2's closest relative, RaTG13, was generated from a bat transcriptome not a fecal swab: implications for the origin of COVID-19 date: 2021-11-18 journal: nan DOI: nan sha: f98f0c0eed9552a26f132d95a60113beebb4e5db doc_id: 139462 cord_uid: zk440c6t RaTG13 is the closest related coronavirus genome phylogenetically to SARS-CoV-2, consequently understanding its provenance is of key importance to understanding the origin of the COVID-19 pandemic. The RaTG13 NGS dataset is attributed to a fecal swab from the intermediate horseshoe bat Rhinolophus affinis. However, sequence analysis reveals that this is unlikely. Metagenomic analysis using Metaxa2 shows that only 10.3 % of small subunit (SSU) rRNA sequences in the dataset are bacterial, inconsistent with a fecal sample, which are typically dominated by bacterial sequences. In addition, the bacterial taxa present in the sample are inconsistent with fecal material. Assembly of mitochondrial SSU rRNA sequences in the dataset produces a contig 98.7 % identical to R.affinis mitochondrial SSU rRNA, indicating that the sample was generated from this or a closely related species. 87.5 % of the NGS reads map to the Rhinolophus ferrumequinum genome, the closest bat genome to R.affinis available. In the annotated genome assembly, 62.2 % of mapped reads map to protein coding genes. These results clearly demonstrate that the dataset represents a Rhinolophus sp. transcriptome, and not a fecal swab sample. Overall, the data show that the RaTG13 dataset was generated by the Wuhan Institute of Virology (WIV) from a transcriptome derived from Rhinolophus sp. tissue or cell line, indicating that RaTG13 was in live culture. This raises the question of whether the WIV was culturing additional unreported coronaviruses closely related to SARS-CoV-2 prior to the pandemic. The implications for the origin of the COVID-19 pandemic are discussed. 
Understanding the origin of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and coronavirus disease 2019 (COVID-19) is vital for preventing future pandemics. There are two main hypotheses regarding the origin of the COVID-19 pandemic. The zoonosis hypothesis proposes that the progenitor of SARS-CoV-2 jumped from a bat or intermediate host to a human (1). This scenario requires that the infected bat or intermediate host came into close contact with a human in a non-research setting, which allowed the transmission to occur. The contrasting lab leak hypothesis proposes that SARS-CoV-2 was transmitted into the human population from a research-related activity such as a laboratory experiment (2). The RaTG13 coronavirus genome, sequenced by the Wuhan Institute of Virology (WIV), is phylogenetically the closest known relative to SARS-CoV-2 (footnote 1), and its apparent provenance from the intermediate horseshoe bat Rhinolophus affinis has been used to support the proposed zoonotic origin of SARS-CoV-2 (3). However, the original Nature publication describing the RaTG13 genome sequence was sparse regarding sampling location and date of sequencing of RaTG13. A fragment of the RNA dependent RNA polymerase (RdRp) was the first part of RaTG13 to be sequenced, initially labelled as 'RaBtCov/4991' (4), and subsequently renamed 'RaTG13' in the Nature paper (the link between the two was identified in (5)). Further details were elaborated in an Addendum (6), which gave the date of sequencing of RaTG13 as 2018, and the sampling location as a mine in Mojiang, Yunnan Province, China, which had been associated with the death in 2012 of three miners who had been clearing bat guano, from a virus-like respiratory infection (7) (8) (9).
Footnote 1: While RaTG13 shows 96.2 % sequence identity with the SARS-CoV-2 genome (3), a new Rhinolophus malayanus sarbecovirus genome from Laos, BANAL-52 (59), shows 96.9 % nucleotide sequence identity with SARS-CoV-2 (data not shown).
However, a maximum likelihood phylogenomic tree shows that RaTG13 is the closest relative to SARS-CoV-2, with strong support (59).
Stranded mRNA Library Preparation kit (Illumina) was used to produce the sequencing library for sequencing on the HiSeq 3000 (Illumina) platform. An NGS dataset generated from an anal swab obtained from R.affinis (although the species is likely incorrect, as discussed below in Results) by Li et al. at the WIV (17) was used as a comparison. This dataset was used to generate the BtRhCoV-HKU2r (Bat Rhinolophus HKU2 coronavirus related) genome (National Center for Biotechnology Information, NCBI, accession number MN611522). The raw sequence data were obtained from the ENA (accession number SRR11085736) and were labelled as being generated from an R.affinis anal swab and sequenced on a HiSeq 3000 platform. Likewise, the dataset was described as being generated from an R.affinis anal swab in the publication describing its genome sequence, using the QIAamp Viral RNA Mini Kit and TruSeq Library Preparation kit (Illumina) (17). A transcriptome generated from an R.sinicus splenocyte cell line by the WIV was used as an additional comparison, and was obtained from the ENA (accession number SRR5819066). The protocol used to generate the dataset was not described on its ENA webpage; however, it was described as being sequenced using a HiSeq 2000 platform. An NGS dataset generated by the EcoHealth Alliance from an oral swab from the bat Miniopterus nimbae from Zaire, and which contained Ebola, was obtained from the SRA (accession number SRR14127641). The dataset was described on its SRA webpage as being generated using the VirCapSeq target protocol (18), and sequenced using a HiSeq 4000 platform. In this work, the four datasets will be described as the RaTG13, BtRhCoV-HKU2r anal swab, splenocyte transcriptome and Ebola oral swab datasets, respectively.
Metaxa2 (19) was used to identify forward reads that match small subunit (SSU) rRNA from mitochondria, bacteria and eukaryotes present in the four datasets. Phylogenetic affiliation was assigned to the lowest taxonomic rank possible from the read alignments by Metaxa2. Forward reads from the RaTG13, BtRhCoV-HKU2r anal swab and splenocyte transcriptome datasets were mapped to a variety of mitochondrial genomes corresponding to mammalian species known to have been studied at the WIV, using fastv (20). Some reads from the RaTG13, BtRhCoV-HKU2r anal swab and splenocyte transcriptome datasets were identified as corresponding to mitochondrial SSU rRNA using Metaxa2. These were assembled using Megahit (21). The resulting contigs were used to query the NCBI nr database using Blast (22), in order to determine their closest matches. The contig generated from the RaTG13 dataset was used to make a phylogenetic tree of mitochondrial SSU rRNA genes from different Rhinolophus species, obtained from the NCBI (accession numbers are listed in Supplementary Table 1). Sequence alignment, model testing and phylogenetic tree construction were conducted using Mega11 (23). First, a nucleotide alignment was constructed using Muscle (24). DNA model testing was conducted and the general time reversible (GTR) model (25) was determined to be the best fit to the data using the Akaike Information Criterion (26). Then, a maximum likelihood analysis was conducted using an estimated gamma parameter and 1000 bootstrap replicates. The number of forward reads from the RaTG13 NGS dataset mapping to the RaTG13 genome was determined using fastv. Eight novel coronavirus genome sequences in addition to the BtRhCoV-HKU2r anal swab dataset were generated from NGS datasets derived from bat anal swabs from southern China by Li et al. (17). Fastv was used to determine the number of reads mapping to these nine coronavirus genomes from their respective NGS datasets.
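As an illustration of how the SSU rRNA classifications described at the start of this section can be summarised, the following minimal sketch tallies reads by origin and expresses the result as a eukaryotic : bacterial ratio. It is not the author's script: the function name `euk_to_bact_ratio`, the origin labels and the read counts are all hypothetical, and Metaxa2's actual output files would need to be parsed into such labels first.

```python
from collections import Counter


def euk_to_bact_ratio(origins):
    """Return (eukaryotic, bacterial) counts scaled so the smaller side is 1.

    `origins` is an iterable of per-read origin labels (hypothetical labels:
    "eukaryota", "bacteria", "mitochondria", ...). Other labels are ignored.
    """
    counts = Counter(origins)
    euk, bact = counts["eukaryota"], counts["bacteria"]
    if euk == 0 or bact == 0:
        raise ValueError("need at least one read of each origin")
    smaller = min(euk, bact)
    return euk / smaller, bact / smaller


# Hypothetical labels, not data from this study.
labels = ["eukaryota"] * 83 + ["bacteria"] * 10 + ["mitochondria"] * 7
euk, bact = euk_to_bact_ratio(labels)
print(f"{euk:.1f} : {bact:.1f}")
```

Scaling by the smaller count keeps the ratio in the "x : 1" or "1 : x" form used when comparing datasets.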
Raw sequences from the RaTG13 and BtRhCoV-HKU2r anal swab datasets were mapped to a variety of mammalian nuclear genomes. The Rhinolophus ferrumequinum genome (NCBI accession number GCA_004115265.3) was the closest related bat genome to R.affinis available, and was used for mapping. In each case, the most recent assembly was used for mapping. First, the reads were trimmed and filtered using fastp (27), with polyX trimming enabled, and filtering out reads in which > 5 % of bases had a quality score of Q < 20. Then the reads were mapped using the splicing aware mapper BBMap (https://sourceforge.net/projects/bbmap/), using the default parameters and the usemodulo option. In order to assess the proportion of reads that mapped to gene sequences, reads from the RaTG13 dataset were mapped to a previous version of the Rhinolophus ferrumequinum bat nuclear genome (28) (NCBI accession number GCA_014108255.1), as a gff annotation file was not available for the most recent version of the genome assembly (GCA_004115265.3). The corresponding annotation file GCA_014108255.1_mRhiFer1.p_genomic.gff was incompatible with the sam file containing mapped reads, due to differences in chromosome naming between the annotation file and corresponding genome assembly file. This was corrected by modifying the sam file using the following commands:
sed -i 's/ Rhinolophus ferrumequinum isolate mRhiFer1 scaffold_m29_p_[0-9][0-9]*, whole genome shotgun sequence//g' mappingfile.sam
followed by
sed -i 's/ Rhinolophus ferrumequinum isolate mRhiFer1 mitochondrion, complete sequence, whole genome shotgun sequence//g' mappingfile.sam
Quantification of the number of reads mapped to annotated genes was conducted using the bedtools multicov function (29). To do this, the sam file was first converted to a bam file and then sorted and indexed using SAMtools (30).
(Table 1).
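The effect of the two sed substitutions given in the Methods can be sketched in Python as follows. This is a re-expression of the same regular expressions with Python's `re` module, not the procedure actually run by the author; the function name `strip_description` and the sequence name `X1.1` in the example are hypothetical placeholders.

```python
import re

# The same two patterns as the sed commands, applied in the same order.
_DESCRIPTIONS = [
    re.compile(
        r" Rhinolophus ferrumequinum isolate mRhiFer1 "
        r"scaffold_m29_p_[0-9][0-9]*, whole genome shotgun sequence"
    ),
    re.compile(
        r" Rhinolophus ferrumequinum isolate mRhiFer1 mitochondrion, "
        r"complete sequence, whole genome shotgun sequence"
    ),
]


def strip_description(line):
    """Remove the free-text description that follows the sequence name,
    leaving only the name, so it can match the names used in the gff file."""
    for pattern in _DESCRIPTIONS:
        line = pattern.sub("", line)
    return line


# Hypothetical SAM header line; "X1.1" stands in for a real sequence name.
header = (
    "@SQ\tSN:X1.1 Rhinolophus ferrumequinum isolate mRhiFer1 "
    "scaffold_m29_p_12, whole genome shotgun sequence\tLN:100"
)
print(strip_description(header))
```

Like `sed ... //g`, the substitution is applied to every occurrence on a line, so both header lines and alignment records are rewritten consistently.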
This implies it has undergone rRNA depletion during preparation, probably using the Ribo-Zero procedure which is part of the TruSeq library preparation protocol. This procedure involves enzymatic degradation of rRNA from both eukaryotes and bacteria. It is unclear if the procedure preferentially degrades rRNA from eukaryotes or bacteria, thus altering the ratio of bacterial : eukaryotic SSU rRNA sequences in the dataset. Consideration of the ratio of eukaryotic : bacterial SSU rRNA sequences reveals marked differences between the datasets. The ratio is 8.3 : 1 for the RaTG13 dataset, 88.7 : 1 for the splenocyte transcriptome dataset and 3.7 : 1 for the Ebola oral swab dataset (Table 1), indicating that eukaryotic SSU rRNA dominates these samples. In contrast, the BtRhCoV-HKU2r anal swab dataset has a ratio of 1 : 5.4, indicating that bacterial SSU rRNA dominates this dataset, as expected with fecal material. The ratio of eukaryotic : bacterial SSU rRNAs in the RaTG13 dataset is inconsistent with that of the BtRhCoV-HKU2r anal swab dataset, and consequently appears inconsistent with fecal material. Microbial taxonomic analysis provides a fingerprint that can be used to track the source of a sample by identifying taxa characteristic of the microhabitat from which it was derived (31). The results of the taxonomic analysis are displayed in Table 2. This has scientific relevance as bat coronaviruses present in oral mucosa are more likely to be transmissible via aerosols than those that are present in higher abundance in fecal swabs. Thus, determining whether RaTG13 was generated from an oral swab would provide better understanding of the emergence of SARS-CoV-2 (which is transmitted via respiratory droplets and aerosols rather than the fecal route (41)). A taxonomic analysis of the SSU rRNA sequences from the Ebola oral swab dataset is instructive. Firstly, there is a low proportion of reads corresponding to Lactococcus spp.
While some sequences in the BtRhCoV-HKU2r anal swab dataset might be expected to originate from the bat insectivorous diet (45), only a few arthropod nuclear rRNA sequences were observed in the BtRhCoV-HKU2r anal swab, RaTG13 and splenocyte transcriptome datasets (0.01 %, 0.02 % and 0.01 % of eukaryotic SSU rRNA sequences, respectively). However, the Ebola oral swab dataset has substantially more (0.4 % of eukaryotic SSU rRNA sequences). This is consistent with the insectivorous diet of M.nimbae, the bat species from which the oral swab was taken. Rhinolophus spp. are also insectivorous, and so the lower relative proportion of arthropod SSU rRNA sequences in the RaTG13 sample is an additional inconsistency with an oral swab. The observations listed above indicate differences between the Ebola oral swab microbiota, which are consistent with an oral microhabitat, and the microbiota present in the RaTG13 sample. These data indicate that the RaTG13 sample was not derived from an oral swab. Reads from the RaTG13, BtRhCoV-HKU2r anal swab and splenocyte transcriptome datasets were mapped onto a range of mammalian mitochondrial genomes (Table 3). Reads from the RaTG13 dataset mapped most efficiently to the R.affinis mitochondrial genome, with 75335 reads mapping with 97.2 % coverage. In comparison, 18017 reads mapped with 40.4 % coverage to the R.sinicus mitochondrial genome. This implies that the sample originated from R.affinis or a Rhinolophus species more closely related to R.affinis than R.sinicus. The low proportion of total reads mapping to the R.affinis mitochondrial genome (0.3 %) suggests the RaTG13 sample was subjected to DNase treatment during preparation, which is an optional step in the QIAamp Viral RNA Mini Kit protocol. This is consistent with the observation that the majority of reads that map to the Rhinolophus ferrumequinum genome map to annotated protein coding genes (discussed below), implying little host nuclear DNA was present in the sample.
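The coverage percentages quoted for the mitochondrial mapping can be understood as breadth of coverage, i.e. the fraction of reference positions covered by at least one read. The sketch below shows one way such a figure can be computed from read alignment intervals. It is an assumption that this matches what is meant by coverage here; fastv's own calculation may differ, and the function name and intervals are hypothetical.

```python
def breadth_of_coverage(intervals, ref_length):
    """Fraction of reference positions covered by at least one interval.

    `intervals` holds (start, end) pairs, 0-based with end exclusive.
    Overlapping or adjacent intervals are merged before counting.
    """
    if ref_length <= 0:
        raise ValueError("ref_length must be positive")
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        # Clip each interval to the reference bounds.
        start, end = max(start, 0), min(end, ref_length)
        if start >= end:
            continue
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: bank it and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps or touches the current run: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / ref_length


# Hypothetical alignments on a 200 bp reference.
print(breadth_of_coverage([(0, 50), (40, 100), (150, 200)], 200))
```

Read count and breadth are separate quantities: many reads stacked on one region give a high count but low breadth, which is why both are reported for each mitochondrial genome.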
Reads from the BtRhCoV-HKU2r anal swab dataset mapped most efficiently to the R.sinicus mitochondrial genome, with 29.8 % coverage and 10019 reads mapping, in contrast to the R.affinis mitochondrial genome, which mapped with 14.9 % coverage and 6278 reads mapping. This indicates that the sample was derived from R.sinicus or a Rhinolophus species more closely related to R.sinicus than R.affinis, and is consistent with the phylogenetic analysis below. This contradicts the description of the sample as being derived from R.affinis. Lastly, reads from the splenocyte transcriptome dataset mapped most efficiently to the R.sinicus mitochondrial genome, with 94.5 % coverage and 170591 reads mapping, in contrast to the R.affinis mitochondrial genome, which mapped with 32.2 % coverage and 88220 reads mapping. This indicates that the sample was derived from R.sinicus or a Rhinolophus species more closely related to R.sinicus than R.affinis. While mapping to the mitochondrial genome gives a convincing indication of the general phylogenetic affinities of the NGS datasets, mitochondrial SSU rRNA confers more precision. A 1139 bp contig generated by Megahit from SSU rRNA sequences extracted from the RaTG13 dataset using Metaxa2 was found to match R.affinis mitochondrial SSU rRNA (NCBI accession number MT845219) with 98.7 % sequence identity, with 8 mismatches (Figure 1). A maximum likelihood phylogenetic tree indicates that the RaTG13 contig was most closely related to R.affinis mitochondrial SSU rRNA, compared to other Rhinolophus species for which full length mitochondrial SSU rRNA sequences were available (Figure 2). Mitochondrial SSU rRNA sequences generated by Metaxa2 from the BtRhCoV-HKU2r anal swab dataset were likewise assembled using Megahit. A 960 bp contig aligned to Rhinolophus sinicus sinicus mitochondrial SSU rRNA (NCBI accession number KP257597.1), with only one mismatch.
This is surprising given that the anal swab sample is described as having been obtained from R.affinis (17). The eight mismatches of the 1139 bp contig to R.affinis mitochondrial SSU rRNA (derived from the subspecies himalayanus, sampled from Anhui province (46)) imply that the dataset was derived from a genetically distinct population/subspecies of R.affinis, or a closely related species. This is consistent with the observation that the R.affinis taxon has nine subspecies with marked morphological and echolocation differences, and might actually represent a species complex (47). In addition, Rhinolophus stheno, a species closely related to R.affinis (48) (49), was identified as being present in the mine in Mojiang in 2015 (50). Unfortunately, no mitochondrial SSU rRNA sequence is currently available from this species. Megahit-assembled contigs corresponding to plant nuclear SSU rRNA sequences were recovered from both the RaTG13 and BtRhCoV-HKU2r anal swab datasets. A 250 bp contig was recovered from the RaTG13 dataset that showed 100 % identity to Gossypium hirsutum (cotton) nuclear SSU rRNA. A 366 bp contig was recovered from the anal swab dataset that showed 100 % identity to Zea mays (maize) nuclear SSU rRNA, while a 390 bp contig showed 98.5 % identity to Arabis alpina (alpine rock cress) nuclear SSU rRNA, with 6 mismatches (data not shown).
(Table 4). These data show that the viral titer in the RaTG13 sample was relatively low (7.2 x 10^-5 of total reads map to the coronavirus genome), compared to the nine anal swab samples generated by Li et al. (which include the BtRhCoV-HKU2r anal swab sample), which ranged from 3.0 x 10^-5 to 4.9 x 10^-2 of total reads mapping to the respective coronavirus genomes. Unfortunately, there are no coronavirus datasets generated from cell lines by the WIV available for comparison.
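The viral read fractions compared above are simple proportions of mapped to total reads. A minimal sketch of the calculation and of the range comparison is given below; the function names and the read counts are hypothetical, and only the three fractions quoted in the text are taken from the source.

```python
def mapped_fraction(mapped_reads, total_reads):
    """Proportion of all reads in a dataset that map to the viral genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads / total_reads


def within_range(value, low, high):
    """True if `value` lies inside the closed interval [low, high]."""
    return low <= value <= high


# Hypothetical counts chosen only to illustrate the arithmetic.
fraction = mapped_fraction(1800, 25_000_000)
print(f"{fraction:.1e}")

# Fractions quoted in the text: RaTG13 versus the range of the nine
# anal swab samples of Li et al.
print(within_range(7.2e-5, 3.0e-5, 4.9e-2))
```

On the quoted figures the RaTG13 fraction falls inside the range observed for the anal swab samples, close to its lower bound.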
Finally, it was found that the raw reads that mapped to the BtHiCoV-CHB25 genome (SRA accession number SRR11085733) were generated from Rhinolophus larvatus, and not from Hipposideros pomona as reported in the supplementary file msphere.00807-19-st002.xlsx (17). In order to identify the origin of the bulk of reads in the RaTG13 dataset, they were mapped to a variety of mammalian genomes, corresponding to species for which cell lines were known to be in use at the WIV, as well as Rhinolophus ferrumequinum, which is the bat genome most closely related to R.affinis available (Table 5). The results show that the reads most efficiently map to the R.ferrumequinum genome, with 87.5 % of reads mapping. An even higher percentage of reads would be expected to map to the exact Rhinolophus species used to generate the RaTG13 sample (R.affinis or closely related species, as identified in the phylogenetic analysis above). The high percentage of reads mapping to R.ferrumequinum is inconsistent with a fecal swab, which is expected to have a majority of reads mapping to bacterial sequences. This is because fecal material is typically dominated by bacteria, with only a small amount of host nucleic acid present (53). Consistent with this expectation, only 2.6 % of the reads from the BtRhCoV-HKU2r anal swab sample mapped to the R.ferrumequinum genome (Table 5). In addition, the results appear inconsistent with an oral swab. This is because only 27.9 % of reads from the Ebola oral swab sample map to the Miniopterus natalensis genome (NCBI accession number GCF_001595765.1), which was the most closely related bat genome to M.nimbae available. However, only the forward reads from the NGS dataset were used for the mapping, as the reverse reads were not available. In addition, the RNA purification method used is unclear from the NCBI sample webpage. These two factors mean that the Ebola oral swab mapping results may not be directly comparable to the RaTG13-R.ferrumequinum mapping results.
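The percentage of reads mapping to a reference genome, as compared above, can be recovered from a SAM file by inspecting the FLAG field of each record. The sketch below is one possible way to do this and is not the method used in the study, where the figures may instead have come from the mapper's own summary statistics. The function name and the example records are hypothetical; the flag bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary) are those defined by the SAM format.

```python
def percent_mapped(sam_lines):
    """Percentage of primary SAM records whose read is mapped."""
    total = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & 0x900:
            continue  # secondary or supplementary: not a separate read
        total += 1
        if not flag & 0x4:
            mapped += 1  # unmapped bit is clear
    if total == 0:
        raise ValueError("no alignment records found")
    return 100.0 * mapped / total


# Hypothetical, truncated records: only the first three fields are shown.
records = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1",
    "r2\t16\tchr1",
    "r3\t4\t*",
    "r4\t4\t*",
    "r1\t256\tchr2",
]
print(percent_mapped(records))
```

Excluding secondary and supplementary records keeps each read from being counted more than once in the denominator.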
Further analysis of the RaTG13 reads mapped to the R.ferrumequinum genome shows that 62.1 % of mapped reads map to protein coding genes. 92.1 % of protein coding genes have at least one read that maps to them. These data confirm that the RaTG13 sample represents a transcriptome. In addition, the result indicates that the sample did not have large amounts of DNA present, as this would lead to mapping to parts of the genome that do not code for proteins, which is the large majority of the bat genome. This supports the inference from the mitochondrial genome mapping results that the sample was subjected to DNase treatment, which is an optional step in the QIAamp Viral RNA Mini Kit. The data presented here indicate that the RaTG13 genome was not generated from a bat fecal swab, but rather a Rhinolophus sp. cell line or tissue. Given that the original RaBtCov/4991 RdRp sequence fragment was described as having been generated from an R.affinis fecal swab (4), the chain of events leading from that original sample to the sample used to generate the genome sequence is unclear. As far as the author is aware, no live animals were reported as being captured in the collecting expeditions to Mojiang between August 2012 and July 2013, and there is no reported precedent for virus isolation from bat tissue at the WIV. If the sample was derived from a dead bat, it is hard to understand how the sample became depleted, as stated by Zheng-li Shi (54), given that tissue would have yielded substantial RNA. The presence of a substantial proportion of reads corresponding to Escherichia spp. and Lactococcus spp. in the NGS reads (Table 2) would also be hard to understand. There are two examples of mislabelling of coronavirus samples by researchers at the WIV identified here. The BtRhCoV-HKU2r anal swab sample appears to have been derived from R.sinicus sinicus and not R.affinis as described.
In addition, the BtHiCoV-CHB25 anal swab sample was derived from R.larvatus and not H.pomona as reported. These observations indicate that sample collection and processing were error prone, lending some credence to a lab leak scenario for the origin of COVID-19. Given that these samples may contain potential pandemic pathogens (PPPs), this is of great concern. In particular, it is of note that in a Master's thesis by Yu Ping, supervised by Zheng-li Shi and Cui Jie, work on the Ra4991_Yunnan (RaTG13) sample and other bat samples is described as being conducted in a 'BSL-2 cabinet', but it is unclear if the cabinet was situated in a BSL-2 lab or a regular lab (16). Bat coronaviruses are PPPs and so should be handled in a BSL-3 lab (55). A further curiosity is that neither Yu Ping nor Cui Jie is listed as an author in the Nature paper describing RaTG13 (3). The data imply that RaTG13 may have been in live culture at the WIV since before June 2017, which is when the Illumina sequencing of the RaTG13 (RaBtCov/4991) sample appears to have begun (information generated by Francisco de Asis de Ribera). This implies that isolation of RaTG13 was successfully conducted on the original sample collected from the Mojiang mine after its collection in July 2013 and before June 2017. It is unclear if RaTG13 is currently in live culture at the WIV. Of high relevance to the work described here is the statement on page 119 in the recently released EcoHealth-WIV grant 1R01AI110964-01 that the WIV had successfully isolated coronaviruses using bat cell lines: 'We have developed primary cell lines and transformed cell lines from 9 bat species using kidney, spleen, heart, brain and intestine. We have used these for virus isolation, infection assays and receptor molecule gene cloning.' (italics the author's) In the Nature paper describing RaTG13 (3), and its Addendum (6), there is no specific statement that the RaTG13 raw reads were generated from a fecal swab.
However, the dataset is labelled as such at the GSA, SRA and ENA, the original RaBtCov/4991 sample from the Mojiang mine is described as having been a fecal swab (4), and the Master's thesis describing the genome sequencing of RaTG13 describes it as having been generated from an anal swab/fecal pellet (16). Consequently, the Nature paper describing RaTG13 and its database entries should be amended to state the true provenance of RaTG13. Pertinent to this is the observation that serial passaging during stock maintenance typically causes SARS-CoV-2 to diverge genetically, given that it rapidly adapts to the cell culture conditions (56). This means that the reported RaTG13 genome sequence would be expected to have picked up mutations during its laboratory sojourn, from its date of isolation. This is consistent with the observation that the genome has an excess of synonymous mutations compared to the closely related Rhinolophus malayanus RmYN02 coronavirus (which was sampled in 2019 (57)), in relation to SARS-CoV-2 (58). This could be interpreted as being the result of evolutionary change occurring from its 2013 collection date, which could only happen if the virus was in culture (as noted in a preprint of (58) at https://www.biorxiv.org/content/10.1101/2020.04.20.052019v1). The observations outlined here have important implications regarding the origin of the COVID-19 pandemic. Zheng-li Shi, lead author of the Nature paper, has stated that RaTG13 was not in live culture at the WIV, as follows: "I would like to emphasize that we have only the genome sequence and didn't isolate this virus" and "…we did not do virus isolation and other studies on it" (54). The data presented here suggest otherwise, and point to the possibility of additional coronaviruses closely related to SARS-CoV-2 in culture at the WIV that have not yet been disclosed. This conclusion increases the plausibility of the lab leak scenario for the origin of COVID-19.
The investigation into the origin of the COVID-19 pandemic has represented a paradigm shift for how forensic and epidemiological investigations are conducted, with decentralized online groups of investigators making significant contributions. This work is the product of discussions with numerous investigators based on Twitter, some anonymous, several associated with the Decentralized Radical Autonomous Search Team Investigating COVID-19 (DRASTIC) (https://drasticresearch.org/), and others independent. The author has made widespread use of material generated by online investigators, including information generated by @BillyBostickson, @TheSeeker268, @franciscodeasis, @Daoyu15, @pathogenetics, @mrandersoninneo and @Ayjchan. The Master's thesis of Yu Ping was identified and translated by @TheSeeker268 and @franciscodeasis. The EcoHealth-WIV NIAID grant 1R01AI110964-01 was made available by an FOIA request made by The Intercept (theintercept.com). In particular, the author would like to thank @Florin_Uncovers for his tireless persistence and curiosity: this work is the direct result of our discussions together.
Figure 1. Alignment of R.affinis mitochondrial SSU rRNA (NCBI accession number MT845219) with the contig assembled from the RaTG13 dataset (RaTG13-sample).

R.affinis      TGCAAGTATCTGCA-CCCAGTGAGAATGCCCTCTAAATCACACCTGATTAAAAGGAGCGG
RaTG13-sample  TGCAAGTATCCGCACCCCAGTGAGAATGCCCTCTAAATCACGCCTGATTAAAAGGAGCGG
               **********.*** **************************.******************

R.affinis      GCATCAAGCACACTACAAAGTAGCTCATGACGCCTTGCTTAACCACGCCCCCACGGGAAA
RaTG13-sample  GCATCAAGCACACTACAAAGTAGCTCATGACGCCTTGCTTAACCACGCCCCCACGGGAAA
               ************************************************************

R.affinis      CAGCAGTGATAAAAATTAAGCCATGAACGAAAGTTCGACTAAGTTATACCTACTCCTTAG
RaTG13-sample  CAGCAGTGATAAAAATTAAGCCATGAACGAAAGTTCGACTAAGTTATACCTACTCCTTAG
               ************************************************************

R.affinis      GGTTGGTAAATTTCGTGCCAGCCACCGCGGTCACACGATTAACCCAAATTAACAGAAACA
RaTG13-sample  GGTTGGTAAATTTCGTGCCAGCCACCGCGGTCACACGATTAACCCAAATTAACAGAAACA
               ************************************************************

R.affinis      CGGCGTAAAGCGTGTTTAAGAATAC-AAAAAAAATAAAGTTAAATTCTAGCTAAGCTGTA
RaTG13-sample  CGGCGTAAAGCGTGTTTAAGAATACAAAAAAAAATAAAGTTAAATTCTAGCTAAGCTGTA
               ************************* **********************************

R.affinis      AAAAGCCATAGCTAAAATAAAAATAGACTACGAAAGTGACTTTACAAATTCTGAATACAC
RaTG13-sample  AAAAGCCATAGCTAAAATAAAAATAGACTACGAAAGTGACTTTACAAATTCTGAATACAC
               ************************************************************

R.affinis      GATAGCTAAGACCCAAACTGGGATTAGATACCCCACTATGCTTAGCCCTAAACCTAAACA
RaTG13-sample  GATAGCTAAGACCCAAACTGGGATTAGATACCCCACTATGCTTAGCCCTAAACCTAAACA
               ************************************************************

R.affinis      ATCAACACAACAACATTATTCGCCAGAGTACTACTAGCAACAGCTTAAAACTCAAAGGAC
RaTG13-sample  ATCAACACAACAACATTATTCGCCAGAGTACTACTAGCAACAGCTTAAAACTCAAAGGAC
               ************************************************************

R.affinis      TTGGCGGTGCTTCATACCCCTCTAGAGGAGCCTGTCCTATAATCGATAAACCCCGATAGA
RaTG13-sample  TTGGCGGTGCTTCATACCCCTCTAGAGGAGCCTGTCCTATAATCGATAAACCCCGATAAA
               **********************************************************.*

R.affinis      CCTCACCAGCTCTTGCCAATTCAGCCTATATACCGCCATCCTCAGCAAACCCTAAAAAGG
RaTG13-sample  CCTCACCAGCTCTTGCCAATTCAGCCTATATACCGCCATCCTCAGCAAACCCTAAAAAGG
               ************************************************************

R.affinis      AACTGCAGTAAGCACAAACATTAGACATAAAAACGTTAGGTCAAGGTGTAGCCTATGAGC
RaTG13-sample  AACTGCAGTAAGCACAAACATTAGACATAAAAACGTTAGGTCAAGGTGTAGCCTATGAGC
               ************************************************************

R.affinis      TGGGAAGAGATGGGCTACATTTTCTTCTCAAAGAACATTTAAAACTACATACGGAAGTTC
RaTG13-sample  TGGGAAGAGATGGGCTACATTTTCTTCTCAAAGAACATTTAAAACTACATACGGAAGTTC
               ************************************************************

R.affinis      TCATGAAATAGAGAGCGGAAGGTGGATTTAGTAGTAAATCAAGAACAAAGAGCTTGGTTG
RaTG13-sample  TCATGAAATAGAGAACGGAAGGTGGATTTAGTAGTAAATCAAGAACAAAGAGCTTGGTTG
               **************.*********************************************

R.affinis      AATTAGGCCATGAAGCACGCACACACCGCCCGTCACCCTCCTCAAATATGAAGGTAATAC
RaTG13-sample  AATTAGGCCATGAAGCACGCACACACCGCCCGTCACCCTCCTCAAATATGAAGGTAATGC
               **********************************************************.*

R.affinis      CCAAACCTATTACCACACACCCACAATATGAGAGGAGATAAGTCGTAACAAGGTAAGCGT
RaTG13-sample  CCAAACCTATTACCACACACCCACAATATGAGAGGAGATAAGTCGTAACAAGGTAAGCGT
               ************************************************************

[Multiple sequence alignment fragment, Rhinolophus mitochondrial SSU rRNA; some row labels lost in extraction]
                       TGCAAGTATCCGCACTCCAGTGAGAATGTCCTCTAAATCACACCTGATTAAAAGGAGCGG
                       ********** ** * ************ *********** *****************
R.yunnanensis          GCATCAAGCGCACTACAAAGTAGCTCATAACGCCTTGCTTAACCACACCCCCACGGGAAA
R.ferrumequinum-nippon GCATCAAGCACACTACAAAGTAGCTCACAACGCCTTGCTTAACCACGCCCCCACGGGAAA
R.pumilus              GCATCAAGCGCACTATAAAGTAGCTCATGACGCCTTGCTTAACCACGCCCCCACGGGAAA

The origins of SARS-CoV-2: a critical review
CoV-2 Have Arisen via Serial Passage through an Animal Host or Cell Culture?
Coexistence of multiple coronaviruses in several bat colonies in an abandoned mineshaft
The genetic structure of SARS-CoV-2 does not rule out a laboratory origin
Addendum: A pneumonia outbreak associated with a new coronavirus of probable bat origin
The analysis of six patients with severe pneumonia caused by unknown viruses
Novel virus discovery in bat and the exploration of receptor of bat coronavirus HKU9. Beijing : National Institute for Viral Disease Control and Prevention
Lethal Pneumonia Cases in Mojiang Miners (2012) and the Mineshaft Could Provide Important Clues to the Origin of SARS-CoV-2
The anomalous nature of the fecal swab data, receptor binding domain and other questions in RaTG13 genome
Major Concerns on the Identification of Bat Coronavirus Strain RaTG13 and Quality of Related Nature Paper
De-novo assembly of RaTG13 Genome Reveals Inconsistencies further Obscuring SARS-CoV-2 Origins
SARS-CoV-2's claimed natural origin is undermined by issues with genome sequences of its relative strains
Investigation of RaT13 and the 7896 clade
Anomalies in BatCoV/RaTG13 sequencing and provenance. Zhang, D. 2020, Zenodo
Master's Thesis: Geographic Evolution of Bat SARS-related Coronaviruses
Discovery of Bat Coronaviruses through Surveillance and Probe Capture-Based Next-Generation Sequencing
Virome Capture Sequencing Enables Sensitive Viral Diagnosis and Comprehensive Virome Analysis
Metaxa2: Improved Identification and Taxonomic Classification of Small and Large Subunit rRNA in Metagenomic Data
MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph
Basic local alignment search tool
MEGA11: Molecular evolutionary genetics analysis version 11
MUSCLE: a multiple sequence alignment method with reduced time and space complexity
Some probabilistic and statistical problems in the analysis of DNA sequences
Information theory and an extension of the maximum likelihood principle
Six reference-quality genomes reveal evolution of bat adaptations
BEDTools: a flexible suite of utilities for comparing genomic features
The sequence alignment/map (SAM)
The genus Micrococcus
Microbes vs. chemistry in the origin of the anaerobic gut lumen
The Family Peptostreptococcaceae
The Family Lachnospiraceae
Clostridium species as probiotics: potentials and challenges
Origin and cross-species transmission of bat coronaviruses in China
Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor
Bats are natural reservoirs of SARS-like coronaviruses
Modes of transmission of SARS-CoV-2 and evidence for preventive behavioral interventions
The Family Pasteurellaceae
The genus Haemophilus
The complete mitochondrial genome of Rhinolophus affinis himalayanus
Taxonomic implications of geographical variation in Rhinolophus affinis (Chiroptera: Rhinolophidae) in mainland Southeast Asia. Ith
Molecular phylogenetics and historical biogeography of Rhinolophus bats. Stoffberg
Patterns of sexual size dimorphism in horseshoe bats: Testing Rensch's rule and potential causes
SARS-related coronaviruses that use bat ACE2 receptor
Unexpected novel merbecovirus discoveries in agricultural sequencing datasets from Wuhan
Proposed Forensic Investigation of Wuhan Laboratories. s.l. : DRASTIC, 2021
A Pipeline for Faecal Host DNA Analysis by Absolute Quantification of LINE-1 and Mitochondrial Genomic Elements Using ddPCR
Wuhan coronavirus hunter Shi Zhengli speaks out
A cautionary perspective regarding the isolation and serial propagation of SARS-CoV-2 in Vero cells
A Novel Bat Coronavirus Closely Related to SARS-CoV-2 Contains Natural Insertions at the S1/S2 Cleavage Site of the Spike Protein
Synonymous mutations and the molecular evolution of SARS-CoV-2 origins
Coronaviruses with a SARS-CoV-2-like receptor binding domain allowing ACE2-mediated entry into human cells isolated from bats of Indochinese peninsula

[Multiple sequence alignment fragment, Rhinolophus mitochondrial SSU rRNA; some row labels lost in extraction]
                       GCATCAAGCACACTATAAAGTAGCTCATAACGCCTTGCTTAGCCACACCCCCACGGGAAA
                       * ******* ***** *********** ************ **** *************
                       CAGCAGTGATAAAAATTAAGCCATGAACGAAAGTTCGACTAAGTTATACCTACTTCCTAG
                       *********************************** **************** * **
R.yunnanensis          GGTTGGTAAATTTCGTGCCAGCCACCGCGGTCACACGATTAACCCAAATCAACAGAAACA
R.ferrumequinum-nippon GGTTGGTAAATTTCGTGCCAGCCACCGCGGTCACACGATTAACCCAAATTAACAGAAATA
R.pumilus              GGTTGGTAAATTTCGTGCCAGCCACCGCGGTCACACGATTAACCCAAATCAACAGAAACA
R.pusillus             GGTTGGTAAATTTCGTGCCAGCCACCGCGGTCACACGATTAACCCAAATCAACAGAAACA
R.monoceros            GGTTGGTAAATTTCGTGCCAGCCACCGCGGTCACACGATTAACCCAAATCAACAGAAATA
R.macrotis             GGTTGGTAAATTTCGTGCCAGCCACCGCGGTCACACGATTAACCCAAATCAACAGAAACA
R.rex                  GGTTGGTAAATTTCGTGCCAGCCACCGCGGTCACACGATTAACCCAAATCAACAGAAACA
R.affinis              GGTTGGTAAATTTCGTGCCAGCCACCGCGGTCACACGATTAACCCAAATTAACAGAAACA
RaTG13-sample          GGTTGGTAAATTTCGTGCCAGCCACCGCGGTCACACGATTAACCCAAATTAACAGAAACA
R.sinicus-sinicus      GGTTGGTAAATTTCGTGCCAGCCACCGCGGTCACACGATTAACCCAAATCAACAGAAATA
R.thomasi              GGTTGGTAAATTTCGTGCCAGCCACCGCGGTCACACGATTAACCCAAATCAACAGAAATA
************************************************* ******** * R.yunnanensis CGGCGTAAAGCGTGTTTAAGAATAC-AAGAAAAATAAAGTTAAACTCTAGCTAAGCCGTA R.ferrumequinum-nippon CGGCGTAAAGCGTGTTTAAGAGTAC---AAAAAATAAAGTTAAATCCTAACTAAGCCGTA R.pumilus CGGCGTAAAGCGTGTTTAAGAATAA--AAAAAAATAAAGTTAAATTCTAGCTAAGCTGTA R.pusillus CGGCGTAAAGCGTGTTTAAGAATAA-AAAAAAAATAAAGTTAAATTCTAGCTAAGCTGTA R.monoceros CGGCGTAAAGCGTGTTTAAGAATAA-AAAAAAAATAAAGTTAAATTCTAGCTAAGCTGTA R.macrotis CGGCGTAAAGCGTGTTTAAGAATAATAAAAAAAATAAAGTTAAATTCTAGCTAAGCTGTA R.rex CGGCGTAAAGCGTGTTTAAGAATAATAAAAAAAATAAGGTTAAATTCTAACTAAGCTGTA R.affinis CGGCGTAAAGCGTGTTTAAGAATAC-AAAAAAAATAAAGTTAAATTCTAGCTAAGCTGTA RaTG13-sample CGGCGTAAAGCGTGTTTAAGAATACAAAAAAAAATAAAGTTAAATTCTAGCTAAGCTGTA R.sinicus-sinicus CGGCGTAAAGCGTGTTTAAGAGTGC--AAAAAAATAAAGTTAAATTCTAGCTAAGCCGTA R.thomasi CGGCGTAAAGCGTGTTTAAGAACAT--AAAAAAATAAAGTTAAATTCTAGCTAAGCCGTA ********************* ******** ****** *** ****** ***AAAAGCCATAGCTAAAATAAAAATAAACTACGAAAGTGACTTTACGAATTCTGAACACAC ******* ***************** ** *************** * ****** **** R.yunnanensis GATAGCTAAGACCCAAACTGGGATTAGATACCCCACTATGCTTAGCCCTAAACCTAAACA R.ferrumequinum-nippon GATAGCTAAGACCCAAACTGGGATTAGATACCCCACTATGCTTAGCCCTAAACCTAAACA R.pumilus GATAGCTAAGACCCAAACTGGGATTAGATACCCCACTATGCTTAGCCCTAAACCTAAACA R.pusillus GATAGCTAAGACCCAAACTGGGATTAGATACCCCACTATGCTTAGCCCTAAACCTAAACA R.monoceros GATAGCTAAGACCCAAACTGGGATTAGATACCCCACTATGCTTAGCCCTAAACCTAAACA R.macrotis GATAGCTAAGATCCAAACTGGGATTAGATACCCCACTATGCTTAGCCCTAAACCTAAACA R.rex GATAGCTAAGACCCAAACTGGGATTAGATACCCCACTATGCTTAGCCCTAAACCTAAACA R.affinis GATAGCTAAGACCCAAACTGGGATTAGATACCCCACTATGCTTAGCCCTAAACCTAAACA RaTG13-sample GATAGCTAAGACCCAAACTGGGATTAGATACCCCACTATGCTTAGCCCTAAACCTAAACA R.sinicus-sinicus GATAGCTAAGACCCAAACTGGGATTAGATACCCCACTATGCTTAGCCCTAAACCTAAACA R.thomasi GATAGCTAAGACCCAAACTGGGATTAGATACCCCACTATGCTTAGCCCTAAACCTAAACA *********** ************************************************ GTCAACACAACAACATTATTCGCCAGAGTACTACTAGCAACAGCTTAAAACTCAAAGGAC * * ******** *** 
**************** ***** ******************* TTGGCGGTGCTTTATACCCCTCTAGAGGAGCCTGTCCTATAATCGATAAACCCCGATAGA ************ ************************* ********* ********* * CCTCACCAGCTCTTGCCAATTCAGCTTATATACCGCCATCCTCAGCAAACCCTAAAAAGG ********** ******** ***** *************** *** ************** R.yunnanensis AGCCACAGTAAGCACAAACATAAGACATAAAAACGTTAGGTCAAGGTGTAGCCCATGGGC R.ferrumequinum-nippon AACTGTAGTAAGCACAAACATAAGACATAAAGACGTTAGGTCAAGGTGTAGCCCATGAGC R.pumilus AACTGTAGTAAGCACAAACATAAGACATAAAAACGTTAGGTCAAGGTGTAGCCTATGAGC R.pusillus AACTGTAGTAAGCACAAACATAAGACATAAAAACGTTAGGTCAAGGTGTAGCCTATGAGC R.monoceros AACTGTAGTAAGCACAAACATAAGACATAAAAACGTTAGGTCAAGGTGTAGCCTATGAGC R.macrotis AACTGTAGTAAGCACAAACATAAGACATAAAAACGTTAGGTCAAGGTGTAGCCTATGAGC R.rex AACTGTAGTAAGCACAAACATAAGACATAAAAACGTTAGGTCAAGGTGTAGCCTATGAGC R.affinis AACTGCAGTAAGCACAAACATTAGACATAAAAACGTTAGGTCAAGGTGTAGCCTATGAGC RaTG13-sample AACTGCAGTAAGCACAAACATTAGACATAAAAACGTTAGGTCAAGGTGTAGCCTATGAGC R.sinicus-sinicus AACTACAGTAAGCACAAACATAAGACATAAAAACGTTAGGTCAAGGTGTAGCCTATGAGC R.thomasi AACTACAGTAAGCACAAACATAAGACATAAAAACGTTAGGTCAAGGTGTAGCCTATGAGC * * *************** ********* ********************* *** ** TGGGAAGAGATGGGCTACATTTTCTTCTTAAAGAACATTTAAAACTTCATACGGAAGCTC ************************** ******** ***** ***** ** * CCGTGAAACAAGGAGCAGAAGGTGGATTTAGTAGTAAATCAAGAACAAAGAGCTTGATTG ** ** * * ********************* ****** ********** *** AATTAGGCCATGAAGCACGCACACACCGCCCGTCACCCTCCTCAAATATAGAGGTAGCAC *** ************************ ********** ******** *** * * R.yunnanensis CCAAACCTATTACCATACACCCACAGTATGAGAGGAGATAAGTCGTAACAAGGTAAG-CG R.ferrumequinum-nippon CCAAACCTATTAACACGTACCCACAACATGAGAGGAGATAAGTCGTAACAAGGTAAG-CG R.pumilus CCAAACCTATTATCACGTACCCATAGTATGAGAGGAGATAAGTCGTAACAAGGTAAGCCG R.pusillus CCAAACCTATTACCACGTACCCATAGTATGAGAGGAGATAAGTCGTAACAAGGTAAG-CG R.monoceros CCAAACCTATTACCACGTACCCATAGTATGAGAGGAGATAAGTCGTAACAAGGTAAG-CG R.macrotis CCAAACCTATTAACACGTGCCCGTAGTATGAGAGGAGATAAGTCGTAACAAGGTAAG-CG R.rex 
CCAAACCTATTAACACGTACCCGTAGTATGAGAGGAGATAAGTCGTAACAAGGTAAG-CG R.affinis CCAAACCTATTACCACACACCCACAATATGAGAGGAGATAAGTCGTAACAAGGTAAG-CG RaTG13-sample CCAAACCTATTACCACACACCCACAATATGAGAGGAGATAAGTCGTAACAAGGTAAG-CG R.sinicus-sinicus CCAAACCTATTACCACGTACCCGTAATATGAGAGGAGATAAGTCGTAACAAGGTAAG-CG R.thomasi CTAAACCTATTATCACGTACCCGTAATATGAGAGGAGATAAGTCGTAACAAGGTAAG-CG * ********** ** *** * ****************************** **
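The alignment above can be summarised row against row as a pairwise percent identity. The following is a minimal sketch of that calculation, not the procedure behind the 98.7 % figure reported for the full contig. The function name `percent_identity` and the `count_gaps` switch are illustrative assumptions, and the three rows are copied from the 60-column block beginning CGGCGTAAAG.

```python
def percent_identity(row_a: str, row_b: str, count_gaps: bool = False) -> float:
    """Percent identity between two rows taken from the same alignment block.

    Hypothetical helper for illustration. Columns that are gaps in both rows
    are always skipped. Columns with a gap in only one row are skipped unless
    count_gaps is True, in which case they are scored as mismatches.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == "-" and b == "-":
            continue  # gap in both rows: no information
        if (a == "-" or b == "-") and not count_gaps:
            continue  # gap in one row: ignored by default
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Rows copied from one block of the alignment (60 columns each).
R_AFFINIS = "CGGCGTAAAGCGTGTTTAAGAATAC-AAAAAAAATAAAGTTAAATTCTAGCTAAGCTGTA"
RATG13_SAMPLE = "CGGCGTAAAGCGTGTTTAAGAATACAAAAAAAAATAAAGTTAAATTCTAGCTAAGCTGTA"
R_SINICUS = "CGGCGTAAAGCGTGTTTAAGAGTGC--AAAAAAATAAAGTTAAATTCTAGCTAAGCCGTA"

if __name__ == "__main__":
    for name, row in (("R.affinis", R_AFFINIS), ("R.sinicus-sinicus", R_SINICUS)):
        ungapped = percent_identity(row, RATG13_SAMPLE)
        gapped = percent_identity(row, RATG13_SAMPLE, count_gaps=True)
        print(f"{name} vs RaTG13-sample: {ungapped:.1f} % (gaps ignored), "
              f"{gapped:.1f} % (gaps as mismatches)")
```

In this block the R.affinis and RaTG13-sample rows are identical once gapped columns are ignored, whereas the R.sinicus-sinicus row differs from RaTG13-sample at three ungapped columns. How gaps are scored is a design choice that changes the reported value, so any identity figure should state which convention was used.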