key: cord-0104451-3y0xlb3i authors: Abid, Nabil; Chillemi, Giovanni; Rebai, Ahmed title: Did circoviruses intermediate the recombination between bat and pangolin coronaviruses, yielding SARS-CoV-2? date: 2020-09-29 journal: nan DOI: nan sha: 784b34db3c26da442b03b898527da0ca971264e3 doc_id: 104451 cord_uid: 3y0xlb3i
Since the first reports of coronavirus (CoV) disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), in Wuhan, Hubei province, China, scientists have been working around the clock to find sound answers to the question of its origin. While the number of scientific articles on SARS-CoV-2 is increasing, there are still many gaps as to its origin. All studies have so far failed to find a coronavirus in other animals that is more similar to human SARS-CoV-2 than the bat virus; bats are considered to be the primary reservoir. In this paper we address a new hypothesis, based on a possible recombination between a DNA virus and SARS-CoV viruses, to explain the rise of SARS-CoV-2. By comparing SARS-CoV-2 and related CoVs with circoviruses (CVs), we found strong sequence similarity between the genomic region at the 3'-end of Bat-CoV ORF1a and the origin of replication (Ori) of porcine CV type 2 (PCV2), as well as similar RNA secondary structures between the region encompassing the cleavage site of the CoV S gene and the PCV2 Ori. This constitutes initial evidence supporting a possible recombination, whose occurrence might explain the origin of SARS-CoV-2.
Many theories about the origin of SARS-CoV-2 have been proposed. The most debated ones have been its natural emergence after passing from bats to an intermediate animal, which served as a springboard for SARS-CoV-2 to jump from bats to humans (1), and the virus being deliberately engineered and accidentally released by humans (2).
The hypothesis of natural emergence is far better supported by recent data, and two scenarios have been suggested (3). The first is natural selection in an animal host before zoonotic transfer, which requires an intermediate animal host that has a high population density (to allow natural selection) and an ACE2-encoding gene similar to the human ortholog, since the SARS-CoV-2 virus has acquired both the polybasic cleavage site and mutations in the Spike protein suitable for human ACE2 binding. The second scenario is natural selection in humans following zoonotic transfer, where the progenitor of SARS-CoV-2 jumped into humans and acquired new genomic features through adaptation during undetected human-to-human transmission. It is now recognized that the newly emerged SARS-CoV-2 Spike (S) glycoprotein gained specific features that facilitated its spread as compared to its closely related Bat-CoV RaTG13 strain, as supported by both phylogenetic analysis (4) and a single nsp8 gene, typical of Bat-CoVs (5, 6). The most remarkable event in the SARS-CoV-2 S glycoprotein is the insertion of a cleavage site (residues PRRA) at the boundary between the S1 and S2 subunits, compared to the Bat-CoV RaTG13 strain (GISAID ID: EPI_ISL_402131). The severity of infection has not yet been fully linked to the newly acquired furin-like cleavage site; however, the site has likely facilitated the host species jump and a dramatic increase in cell-cell fusion capacity. Such events were previously reported for Sendai virus following the insertion of a second furin cleavage site in the F protein (7). Additionally, the insertion of a furin-like cleavage site in the SARS-CoV-2 S glycoprotein is reminiscent of a low-pathogenic avian influenza virus (AIV) (8, 9), which, when introduced into a poultry farm, acquired a polybasic cleavage motif that caused a deadly outbreak of highly pathogenic virus.
However, several questions remain unanswered: why the acquisition of the cleavage site is restricted to a particular genome region, as reported previously for AIV (10), and whether SARS-CoV-2 is a "chimera" of two viral strains or resulted from the accumulation of point mutations, as hypothesized in most published studies (1). CoVs are characterized by the reduced error rate of their viral polymerases, which is much lower than that of other RNA viruses (11, 12), unless they recombine their genomes with a related CoV. This feature, which has been observed for SARS-CoV-2 infection (as estimated from public genome sequences released since December 2019), might ultimately limit the mutagenic variability of the virus over a short period of time. Therefore, recombination might prove to be a prominent driving factor in the evolution of the virus and in the expansion of its species range and cellular tropism. Recent studies reported the detection of a Bat-CoV RmYN02 strain (GISAID ID: EPI_ISL_412977) containing insertions in the S1/S2 cleavage site (13) and the detection of two human SARS-CoV-2 variants (SARS-CoV-2 Variants 1 and 2) (14) showing deletion mutations in the furin-like cleavage site and its flanking sites. A more recent study reported three human variants (referred to here as SARS-CoV-2_mut1, SARS-CoV-2_mut2, and SARS-CoV-2_mut3) showing deletions at the S1/S2 junction (15). The detection of the human variants, in vivo and/or in vitro, suggests that this region of the S gene is under strong selective pressure, given that replication in permissive cells leads to the loss of this adaptive function. Given the importance of this region, we decided to further analyze a possible origin of the furin-like cleavage site in the SARS-CoV-2 S gene. The most important question is whether these insertions/deletions are due to homologous recombination or might have resulted from a recombination between two different viruses.
Here, we consider the hypothesis of a possible recombination between RNA and DNA viruses to explain the emergence of SARS-CoV-2. Although recombination between RNA and DNA viruses was long considered an unusual and extremely rare event, it was previously described through the isolation of a circovirus-like genome, called Boiling Springs Lake RNA-DNA hybrid virus (BSL-RDHV), from an acidic hot lake (16, 17). It was reported that the gene for the rolling-circle replication initiation protein (RC-Rep) of BSL-RDHV is inherited from a circovirus-like ancestor, whereas the capsid protein (Cap) gene is most closely related to that of ssRNA viruses (16). Additionally, a metagenomic study tracing the prevalence of circular DNA viruses in tissue specimens and environmental samples, followed by sequence annotation using bioinformatics tools, reported recombination events, never described before, between ssDNA viruses and ssRNA viruses (18).
Publicly available CoV genomic sequences were obtained from GenBank (https://www.ncbi.nlm.nih.gov/) and GISAID (https://www.gisaid.org/) (as of 08/09/2020). We used the basic local alignment search tool BLAST (National Center for Biotechnology Information [NCBI], Bethesda, United States) (19). The use of local alignment is justified by the fact that CVs and CoVs are too divergent to retain discernible sequence similarity under global alignment algorithms. Nucleotide BLAST was used with the following parameters: Word size: 7; Gap Existence: 4; Gap Extension: 2. Data mining was carried out using a local database comprising metagenomic data of DNA viruses in animal specimens and environmental samples, reported previously (18). To evaluate sequence variability, genome sequences were aligned using ClustalW (20) and MAFFT (21) and refined manually. Alignments for the region flanking the cleavage site of the S glycoprotein and for the PCV2 Ori were extracted from the genome alignment.
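The rationale for local rather than global alignment can be illustrated with a minimal Smith-Waterman sketch in Python. This is not the BLAST heuristic used in the study: it applies a single linear gap penalty rather than the affine gap existence/extension costs listed above, and the scoring values and the two toy sequences (made-up flanks around a shared hexamer) are assumptions chosen only for illustration.

```python
def smith_waterman(a, b, match=1, mismatch=-3, gap=-2):
    """Return (best local score, aligned part of a, aligned part of b).

    Scores are illustrative assumptions; gaps use one linear penalty,
    not the affine scheme (existence + extension) used by BLAST.
    """
    a, b = a.upper(), b.upper()
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # local scores, floored at 0
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    # Hypothetical toy sequences: unrelated flanks around a shared hexamer.
    print(smith_waterman("TTTTCGGCGGAAAA", "GGGGCGGCGGCCCC"))
```

In this toy pair the flanks share nothing, so an end-to-end alignment would be dominated by mismatches, while the local score reflects only the shared core. That is the situation the text describes for highly divergent CV and CoV genomes.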
The centroid secondary structures (SS) were first generated by RNAfold (http://rna.tbi.univie.ac.at/) using default parameters (22), then visualized and edited using VARNA v.3-93 (23). In order to show SS variability, the sequence motifs missing in some viral strains (RaTG13 and Pangolin CoVs) were completed with the corresponding sequences from SARS-CoV-2.
We selected, as a query for sequence homology search (using nucleotide BLAST), a mosaic region comprising the insertion region of the recently detected Bat-CoV RmYN02 strain (GISAID ID: EPI_ISL_412977) and its missing upstream and downstream nucleotide sequences from the SARS-CoV-2 Wuhan strain (GenBank accession no. NC_045512) (Fig. 1). The similarity search was carried out against circoviruses (taxid:39725) with appropriate alignment parameters (see Materials and Methods). Three similar 11-nt sequences were shown in the genomes of PCV2 The results showed a high variability, mainly in the downstream region of the alignment (Fig. 2A). The Raccoon CV used showed sequence similarity with SARS-CoV-1 and its related strains (yellow boxes), mapping mostly to the beginning of the alignment and to a region specific to SARS-CoV-1 strains. However, the highest sequence similarity of Raccoon CV was with SARS-CoV-2 and its related strains, mainly in the highly variable region (blue boxes). To assess whether the S gene of CoVs shows an SS similar to that of CV, we carried out the SS analysis of the RNA region flanking the cleavage site of different CoVs (Human SARS-CoV-2 Wuhan strain, Bat-CoV RaTG13 strain, and Pangolin-CoV isolate MP789) and compared them with the well-known SS of the PCV Ori (24, 25) (Fig. 3). The results showed five similar nucleotides in the loop structure (5'-gUAUA-3') and four similar nucleotides (5'-gCgC-3') in the stem of the PCV2 and SARS-CoV-2 SS palindrome (Fig. 3).
While these nucleotides are totally or partially shared by the Bat-CoV RaTG13 strain and Pangolin-CoV isolate MP789, they were different in the Bat-CoV RmYN02 strain and SARS-CoV-1 (data not shown). However, nucleotide substitutions at specific positions downstream of the cleavage site of Bat-CoV RmYN02 allow the generation of an SS similar to that of SARS-CoV-2 (data not shown), suggesting that nucleotide mutations in the stem-loop structure of CoVs might hamper the formation of an SS similar to that of PCV2. As reported previously, the SS of the PCV2 Ori showed four hexa-nucleotide sequences (H1-4) and two penta-nucleotide sequences (P1 and P2) (25), whereas we report here that SARS-CoV-2 showed three H (H1-3) and one P; both viruses shared H3 and, partially, P1, generating the furin-like cleavage site. The Bat-CoV RaTG13 strain and Pangolin-CoV isolate MP789 shared two H (H1 and H2) with SARS-CoV-2 (Fig. 3). According to these SS, the Bat-CoV RaTG13 strain is more closely related to SARS-CoV-2. The difference was an insertion of P and H3 sequences, identical to P1 and H3 of PCV2, generating a cleavage site (Fig. 3). Besides the SS homology between CoVs and PCV2, these findings suggest that Bat-CoV/PCV2 recombination occurs only when both viruses have identical nucleotide sequences at the region flanking the cleavage site of CoVs and at the PCV2 Ori. Thus, the region encompassing H1, H2, P1, and H3 (5'-CggCAGCggCAgCACCTCggCgg-3') of the PCV2 Ori was used as a query for similarity search among CoV strains (Fig. 2B). The results showed sequence similarity with the (Fig. 2B). Surprisingly, these sequences mapped to the 3'-end of ORF1a of these Bat-CoVs, whereas we expected them in the S gene, because a motif in the query sequence that is unique to the cleavage site of the SARS-CoV-2 S gene (5'-CggCgg-3') is shared by the PCV2 Ori.
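The composition of the 23-nt query can be checked with a short Python sketch. The element boundaries used here are an assumption: the text gives only the order of the elements (H1, H2, P1, H3) and their nominal lengths (hexa- and penta-nucleotides), and the 6+6+5+6 layout is simply one segmentation that covers the sequence exactly. The helper names are hypothetical.

```python
# Query sequence as quoted in the text (mixed case preserved from the source).
QUERY = "CggCAGCggCAgCACCTCggCgg"

# ASSUMPTION: boundaries inferred from element order and nominal lengths only.
ASSUMED_LAYOUT = [("H1", 6), ("H2", 6), ("P1", 5), ("H3", 6)]


def split_elements(seq, layout):
    """Cut seq into consecutive named elements given [(name, length), ...]."""
    seq = seq.upper()
    if sum(length for _, length in layout) != len(seq):
        raise ValueError("layout does not cover the sequence exactly")
    elements, start = {}, 0
    for name, length in layout:
        elements[name] = seq[start:start + length]
        start += length
    return elements


def find_all(seq, motif):
    """Return 0-based start positions of every (overlapping) motif occurrence."""
    seq, motif = seq.upper(), motif.upper()
    width = len(motif)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == motif]


if __name__ == "__main__":
    for name, element in split_elements(QUERY, ASSUMED_LAYOUT).items():
        print(name, element)
    print("CGGCGG at:", find_all(QUERY, "CggCgg"))
```

Under this assumed layout the 5'-CggCgg-3' motif falls entirely within the last hexanucleotide of the query, which is consistent with the statement that the motif is shared by the PCV2 Ori.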
During DNA replication of PCV2, the viral Rep protein nicks an octa-nucleotide sequence (5'-AgTATT†AC-3') of the Ori (loop structure) between T6 and A7 to generate a free 3′-OH end for initiation of plus-strand DNA replication (24). Interestingly, these Bat-CoV strains have a similar nucleotide sequence, mainly at the nick site (5'-AgTAgTT†AC-3') (Fig. 2B). The resulting in a rapid evolution of the virus in Bats and its jump into humans (Fig. 4).
The diversity and prevalence of environmental RNA and DNA viruses as epicenters of uncommon recombination events are still very poorly studied (33). Although DNA/RNA recombination has not previously been shown between PCV2 and Bat-CoV, a heterologous inter-family recombination event was reported between Bat-CoV and bat orthoreovirus (34). The sequence similarity and SS match could constitute another mechanism by which recombination events occur between these two unrelated viruses. While the SS of the CV Ori was discussed in the present study as the main structure for the initiation of virus replication, such structures regulate many stages of the replication cycle of RNA viruses, including genome replication and packaging and intracellular trafficking (35), and play a potential role in genetic recombination (36) (37) (38) (39). The high variability and rearrangement of the region downstream of the stem-loop SS were reported previously for PCV1 (40), which is closely related to PCV2, yielding deletion and/or extensive nucleotide reorganization of the hexa-nucleotide sequences. In particular, mutations engineered into H1/H2 of nonpathogenic PCV1 were invariably deleted so that the downstream H3/H4 was placed next to the palindrome (40).
According to this same study, viral genomes with mutations engineered into both H1/H2 and H3/H4 underwent extensive nucleotide reorganization to yield progeny viruses containing either H3/H4, h-like/H4, or h-like/H3/H4 sequences, generating sequences in a way similar to the downstream region of the SS of Bat-CoV RmYN02 and SARS-CoV-1 strains (data not shown). DNA replication of CVs is initiated by cleaving the loop structure at a specific nick site using the synthesized viral Rep protein (41). Therefore, we suggest that the deletions in the cleavage site of recently detected SARS-CoV-2 variants, even though they showed similar SS, are the result of selective pressure on the adaptive sequence rather than of specific point mutations or homologous recombination, in a way similar to that of CVs. Finally, it is worth noting that the sequence mapped to the PCV2 Ori was reported to contain a CpG oligodeoxynucleotide (ODN) (42), and a similar CoV sequence mapping to that position could therefore modulate the immune response. This needs further experimental investigation.
RNA viruses as potential candidate agents for global pandemics have been extensively discussed (43). However, owing to their high genetic variability and to inter- and intra-species recombination strategies that increase an already huge diversity, it is impossible to anticipate the emergence of a new viral strain. The emergence of SARS-CoV-2 could be seen as an opportunity to elucidate some uncommon RNA/DNA recombination events. We report in the present study a new perspective on the emergence of SARS-CoV-2 by focusing on the cleavage site of the S gene and on similar events that might have occurred in the genome of SARS-CoV-2. This suggests the importance of monitoring SARS-like CoVs in Bats and CVs in animal herds, as well as of establishing effective strategies to hamper interspecies spread of these two viruses.
CoV and animal CVs.
The remarkable insert in the SARS-CoV-2 S glycoprotein is indicated by a blue box and a question mark. The bat, swine, raccoon, and human are used to show the host species involved in the present study.
Extreme Genomic CpG Deficiency in SARS-CoV-2 and Evasion of Host Antiviral Defense
The proximal origin of SARS-CoV-2
A pneumonia outbreak associated with a new coronavirus of probable bat origin
Molecular epidemiology, evolution and phylogeny of SARS coronavirus
Genetic evolution analysis of 2019 novel coronavirus and coronavirus from other species
Insertion of the two cleavage sites of the respiratory syncytial virus fusion protein in Sendai virus fusion protein leads to enhanced cell-cell fusion and a decreased dependency on the HN attachment protein for activity
Variable impact of the hemagglutinin polybasic cleavage site on virulence and pathogenesis of avian influenza H7N7 virus in chickens, turkeys and ducks
From low to high pathogenicity-Characterization of H7N7 avian influenza viruses in two epidemiologically linked outbreaks
Genetic Predisposition To Acquire a Polybasic Cleavage Site for Highly Pathogenic Avian Influenza Virus Hemagglutinin
Characterization of severe acute respiratory syndrome coronavirus genomes in Taiwan: molecular epidemiology and genome evolution
Molecular evolution of the SARS coronavirus during the course of the SARS epidemic in China
A Novel Bat Coronavirus Closely Related to SARS-CoV-2 Contains Natural Insertions at the S1/S2 Cleavage Site of the Spike Protein
Identification of Common Deletions in the Spike Protein of Severe Acute Respiratory Syndrome Coronavirus 2
Attenuated SARS-CoV-2 variants with deletions at the S1/S2 junction
A novel virus genome discovered in an extreme environment suggests recombination between unrelated groups of RNA and DNA viruses
Extensive gene remodeling in the viral world: new evidence for nongradual evolution in the mobilome network
Discovery of several thousand highly diverse circular DNA viruses
Basic local alignment search tool
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice
A simple method to control over-alignment in the MAFFT multiple sequence alignment program
Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure
VARNA: Interactive drawing and editing of the RNA secondary structure
Identification of an Octanucleotide Motif Sequence Essential for Viral Protein, DNA, and Progeny Virus Biosynthesis at the Origin of DNA Replication of Porcine Circovirus Type 2
Palindrome Regeneration by Template Strand-Switching Mechanism at the Origin of DNA Replication of Porcine Circovirus via the Rolling-Circle Melting-Pot Replication Model
First detection and phylogenetic analysis of porcine circovirus type 2 in raccoon dogs
Discovery of a rich gene pool of bat SARS-related coronaviruses provides new insights into the origin of SARS coronavirus
A New Bat-HKU2-like Coronavirus in Swine
Isolation and characterization of a highly pathogenic strain of Porcine enteric alphacoronavirus causing watery diarrhoea and high mortality in newborn piglets
Discovery of a novel swine enteric alphacoronavirus (SeACoV) in southern China
Complete Genome Sequence of a Novel Swine Acute Diarrhea Syndrome Coronavirus
Fatal swine acute diarrhoea syndrome caused by an HKU2-related coronavirus of bat origin
Are we missing half of the viruses in the ocean
A Bat-Derived Putative Cross-Family Recombinant Coronavirus with a Reovirus Gene
RNA conformational changes in the life cycles of RNA viruses, viroids, and virus-associated RNAs
An RNA secondary structure bias for non-homologous reverse transcriptase-mediated deletions in vivo
Identification of a preferred region for recombination and mutation in HIV-1 gag
RNA structures facilitate recombination-mediated gene swapping in HIV-1
Evidence for a mechanism of recombination during reverse transcription dependent on the structure of the acceptor RNA
Sequences at the Origin of DNA Replication of Porcine Circovirus Type 1
Demonstration of nicking/joining activity at the origin of replication associated with the rep and rep′ proteins of porcine circovirus type 1
Identification of a sequence from the genome of porcine circovirus type 2 with an inhibitory effect on IFN-alpha production by porcine PBMCs
Are RNA Viruses Candidate Agents for the Next Global Pandemic? A Review
Acknowledgments: This work was supported by the Tunisian Ministry of Higher Education and Scientific Research, the 'Departments of Excellence-2018' Program (Dipartimenti di Eccellenza) of the Italian Ministry of Education writing-original draft preparation
Data and materials availability: All data are available in the main text.