key: cord-1003840-xxeyogwd authors: Sander, Anna-Lena; Moreira-Soto, Andres; Yordanov, Stoian; Toplak, Ivan; Balboni, Andrea; Ameneiros, Ramón Seage; Corman, Victor; Drosten, Christian; Drexler, Jan Felix title: Genomic determinants of Furin cleavage in diverse European SARS-related bat coronaviruses date: 2021-12-15 journal: bioRxiv DOI: 10.1101/2021.12.15.472779 sha: 32cc2f98bad8ec4c3518909e7c2fb062d0a4d807 doc_id: 1003840 cord_uid: xxeyogwd

The furin cleavage site in SARS-CoV-2 is unique within the Severe acute respiratory syndrome–related coronavirus (SrC) species. We re-assessed diverse SrC from European horseshoe bats and reveal molecular determinants such as purine richness, RNA secondary structures and viral quasispecies potentially enabling furin cleavage. Furin cleavage thus likely emerged from the SrC bat reservoir via molecular mechanisms conserved across reservoir-bound RNA viruses, supporting a natural origin of SARS-CoV-2.

Emerging coronaviruses of recent or regular zoonotic origin include the betacoronaviruses Severe acute respiratory syndrome coronavirus (SARS-CoV), Middle East respiratory syndrome coronavirus (MERS-CoV) and Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (1). SARS-CoV-2 is unique in its high transmissibility (2).
SARS-CoV and SARS-CoV-2 belong to the species Severe acute respiratory syndrome-related coronavirus (SrC), subgenus Sarbecovirus, and both use angiotensin-converting enzyme 2 (ACE2) as their main cellular receptor (3). Unlike SARS-CoV, SARS-CoV-2 contains a functional polybasic furin cleavage site (FCS) between the two subunits of the viral spike glycoprotein (4). The existence of a FCS has led to various hypotheses regarding the evolution of SARS-CoV-2, including conjectures about a possible non-natural origin from laboratory experiments (5, 6). Furin cleavage is thought to be essential for entry into human lung cells and may also determine the efficiency of infection of the upper respiratory tract and the consequent transmissibility of SARS-CoV-2 (7). So far, SARS-CoV-2 is unique among SrC, as even its closest known relatives, the bat coronavirus RaTG13 and the pangolin coronaviruses, lack a FCS (8). The natural hosts of SrC are horseshoe bats, which are widely distributed in the Old World (9). We and others previously showed that SrC in European horseshoe bats are conspecific with but distinct from those detected in Asia (10-13). Here, we describe the S1/S2 genomic region encompassing the FCS in SARS-CoV-2 in ten unique European bat-associated SrC in comparison to other sarbecoviruses and mammalian

We re-accessed stored original fecal samples from four horseshoe bat species (Rhinolophus hipposideros, R. euryale, R. ferrumequinum, R. blasii) collected in Italy, Bulgaria, Spain, and Slovenia during 2008-2009 and amplified an 816 nucleotide fragment of the viral RNA-dependent RNA polymerase (RdRp) of ten unique coronaviruses previously described (12, 13). Taxonomic classification based on this fragment showed that all ten coronaviruses belonged to the species SrC (Supplementary Table 1).
In a representative phylogeny that covered the diversity of known SrC, based on a partial S2 genomic region encompassing 495 nucleotides, European bat-associated SrC formed a sister clade to Chinese bat-associated SrC (Figure panel A, Supplementary Figure 1). Sequence comparison of the S1/S2 genomic region revealed remnants of a polybasic FCS motif (R-X-X-R) at the S1/S2 boundary in 12 of 71 unique bat-associated SrC from Europe, Asia, and Africa, with higher genetic diversity in European than in Asian bat-associated SrC (Figure panel B).

Besides the S1/S2 FCS, the coronaviral spike protein can also be activated by host cell proteases at the N-terminal S2 (S2') genomic region (8). Only MERS-CoV contains a FCS at both the S1/S2 and the S2' sites (14), whereas the other coronaviruses contain an intact FCS at either the S1/S2 or the S2' site. To better understand the evolution of FCS at both the S1/S2 and S2' sites within human coronaviruses, we investigated the genomic regions encompassing potential FCS within human-associated coronaviruses and related viruses found in their ancestral and intermediate hosts. Within bat-associated CoVs, only 10/102 (10%) and 11/102 (11%) showed a FCS in either the S2' or the S1/S2 genomic region, respectively, suggesting circulation of a broad genetic diversity in the genomic region encoding potential

without Accession numbers, SARS-CoV and SARS-CoV-2 are given in red, and SrC from civets and pangolins in blue type. Sequences are named as follows: GenBank Acc. number/strain name/host species/country of detection. Circles at nodes indicate support of grouping in ≥ 90% of 1,000 bootstrap replicates. Scale bar represents nucleotide substitutions per site. B) Bat-associated SrC harbor remnants of a FCS between the spike subunits S1 and S2.
Genomic regions at the interface of the S1 and S2 domains and the S2' position of the spike protein of 91 unique SrC were aligned using MAFFT (23). A scheme of the spike protein and its subunits S1 and S2 shows the position of the FCS in SARS-CoV-2. Conserved amino acids of the FCS are highlighted within the box. Red line separates sequences with and without remnants of FCS. C) FCS conservation at the S1/S2 interface and the S2' site in human coronaviruses and closely related animal coronaviruses. FCS predictions are shown in the

Sample collection and processing:

Bats were sampled with mist nets using minimally invasive methods under appropriate permits as described earlier (10, 11, 13). Specimens were screened for viral RNA of the genus Coronavirus by reverse-transcription PCR (RT-PCR) as described previously (24), amplifying 455 bp fragments of the RNA-dependent RNA polymerase (RdRp) gene. For further phylogenetic analyses, these amplicons were extended to an 816 bp fragment towards the 5' end (12). Nucleotide sequences were deposited in GenBank with accession numbers KC633198, KC633201-205, KC633209, KC633212 and KC633217.

S1/S2 genomic regions were characterized using a hemi-nested RT-PCR assay flanking the S1/S2 and S2' site (690 nt; pos 23,442-24,112 in SARS-CoV-2 Wuhan strain Acc. Number MT019529) using the following oligonucleotides: panSARS-S1S2-F1 (TDGCTGTTGTHTAYCARGATGT), panSARS-S1S2-F2 (CARGATGTWAAYTGYACWGATGT) and panSARS-S1S2-R (AGDCCATTRAACTTYTGHGCACA). Briefly, RNA was reverse transcribed for 30 min at 50°C using the SSIII One-Step Kit (Thermo Fisher), followed by 45 PCR cycles of 94°C for 15 seconds, 58°C for 45 seconds and 72°C for 1 minute. The second-round PCR was performed under the same conditions as the first round, without reverse transcription. PCR amplicons were Sanger sequenced.
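The three oligonucleotides above contain IUPAC ambiguity codes (D, H, R, W, Y), so each primer stands for a pool of concrete sequences. The following Python sketch is an illustration only and not part of the original workflow; the helper names `degeneracy`, `expand` and `matches` are assumptions introduced here. It counts the pool size per primer and checks whether a concrete template would be covered.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes and the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences as given in the methods text.
PRIMERS = {
    "panSARS-S1S2-F1": "TDGCTGTTGTHTAYCARGATGT",
    "panSARS-S1S2-F2": "CARGATGTWAAYTGYACWGATGT",
    "panSARS-S1S2-R": "AGDCCATTRAACTTYTGHGCACA",
}


def degeneracy(primer: str) -> int:
    """Number of concrete sequences represented by a degenerate primer."""
    count = 1
    for code in primer:
        count *= len(IUPAC[code])
    return count


def expand(primer: str) -> list[str]:
    """Enumerate every concrete sequence in the primer pool."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in primer))]


def matches(primer: str, site: str) -> bool:
    """True if a concrete site of equal length is covered by the primer."""
    return len(primer) == len(site) and all(
        base in IUPAC[code] for code, base in zip(primer, site)
    )


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), "nt, degeneracy", degeneracy(seq))
```

In this sketch the last eight positions of panSARS-S1S2-F1 are identical to the first eight of panSARS-S1S2-F2, which is consistent with the hemi-nested design described in the text.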
To detect single nucleotide variants within the S1/S2 site, PCR amplicons were sequenced using the Illumina NovaSeq 6000 Sequencing System with the NovaSeq 6000 SP Reagent Kit (500 cycles). Sequence reads obtained from the library were mapped against their corresponding S1/S2 genomic sequence obtained after PCR amplification in Geneious 9.1.8.

Nucleotide sequences obtained within this study were deposited in GenBank with accession numbers XXX to XXX.

References:

The evolutionary dynamics of endemic human coronaviruses. Virus Evol
Estimating infectiousness throughout SARS-CoV-2 infection course
Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor. Cell
The spike glycoprotein of the new coronavirus 2019-nCoV contains a furin-like cleavage site absent in CoV of the same clade
Science, not speculation, is essential to determine how SARS-CoV-2 reached humans
Investigate the origins of COVID-19
A Multibasic Cleavage Site in the Spike Protein of SARS-CoV-2 Is Essential for Infection of Human Lung Cells
Furin cleavage sites naturally occur in coronaviruses
Detection of a virus related to betacoronaviruses in Italian greater horseshoe bats. Epidemiology and infection
Identification of SARS-like coronaviruses in horseshoe bats (Rhinolophus hipposideros) in Slovenia
Genomic characterization of severe acute respiratory syndrome-related coronavirus in European bats and classification of coronaviruses based on partial RNA-dependent RNA polymerase gene sequences
Millet JK, Whittaker GR. Host cell entry of Middle East respiratory syndrome coronavirus after two-step, furin-mediated activation of the spike protein
Furin cleavage sites in the spike proteins of bat and rodent coronaviruses: Implications for virus evolution and zoonotic transfer from rodent species. One Health
Emergence of a highly pathogenic avian influenza virus from a low-pathogenic progenitor
Deep sequencing of H7N8 avian influenza viruses from surveillance zone supports H7N8 high pathogenicity avian influenza was limited to a single outbreak farm in Indiana during
Recombination resulting in virulence shift in avian influenza outbreak
Predisposition To Acquire a Polybasic Cleavage Site for Highly Pathogenic Avian Influenza Virus Hemagglutinin
Distinguishing low frequency mutations from RT-PCR and sequence errors in viral deep sequencing data
Highly Pathogenic Avian Influenza A(H7N9) Virus
A palindromic RNA sequence as a common breakpoint contributor to copy-choice recombination in SARS-COV-2. Arch Virol
MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic acids research
Generic detection of coronaviruses and differentiation at the prototype strain level by reverse transcription-PCR and nonfluorescent low-density microarray
Modular Evolution of Coronavirus Genomes
Multiple Comparisons of Log-Likelihoods with Applications to Phylogenetic Inference
Mfold web server for nucleic acid folding and hybridization prediction
Prediction of proprotein convertase cleavage sites. Protein Eng Des Sel

A tblastx search of the complete spike sequence of the Bulgarian SrC (GenBank Acc No. GU190215) within the Taxa ID 11118 (Coronaviridae) excluding the Taxa ID 2697049 (SARS-CoV-2) was performed on 21 June 2021. Hits with percentage identities below 80% were non-SrC sequences and were thus not included in the dataset. SARS-CoV sequences, experimentally infected or clones, as well as sequences with less than 27,000 nt or gaps in the spike protein were excluded, resulting in 80 sequences. One reference sequence each of SARS-CoV and SARS-CoV-2 as well as the nine sequences from European bats generated within this study were added, resulting in a final dataset of 91 sequences.
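The dataset assembly described above amounts to a few threshold filters followed by simple bookkeeping. The Python sketch below illustrates that logic under stated assumptions: the `Hit` record, its field names and the example entries are hypothetical placeholders, not real accessions or search results, and the single `excluded_source` flag is a simplification of the SARS-CoV, experimental infection and clone criteria.

```python
from dataclasses import dataclass

MIN_IDENTITY = 80.0     # percent; hits below this were non-SrC according to the text
MIN_LENGTH_NT = 27_000  # shorter sequences were excluded according to the text


@dataclass
class Hit:
    accession: str         # placeholder identifier, not a real GenBank accession
    identity: float        # tblastx percent identity to the query spike
    length_nt: int         # total sequence length in nucleotides
    spike_has_gaps: bool   # True if the spike region is incomplete
    excluded_source: bool  # True for SARS-CoV, experimental infections or clones


def keep(hit: Hit) -> bool:
    """Apply the inclusion criteria stated in the methods text."""
    return (
        hit.identity >= MIN_IDENTITY
        and hit.length_nt >= MIN_LENGTH_NT
        and not hit.spike_has_gaps
        and not hit.excluded_source
    )


# Hypothetical example records, one per exclusion reason plus one retained hit.
hits = [
    Hit("hypothetical_1", 92.5, 29_800, False, False),  # retained
    Hit("hypothetical_2", 71.0, 30_100, False, False),  # identity below 80%
    Hit("hypothetical_3", 88.0, 12_400, False, False),  # shorter than 27,000 nt
    Hit("hypothetical_4", 95.0, 29_700, True, False),   # gaps in the spike
    Hit("hypothetical_5", 99.0, 29_750, False, True),   # excluded source
]
kept = [h.accession for h in hits if keep(h)]

# Bookkeeping from the text: 80 retained hits, 2 reference sequences,
# and 9 sequences from European bats generated in the study.
final_dataset_size = 80 + 2 + 9

if __name__ == "__main__":
    print(kept, final_dataset_size)
```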
Because coronaviruses frequently recombine (25), only the S2 region (495 nt) of the 690 nt amplified fragment was used for the phylogenetic analyses. Maximum-likelihood phylogenies were generated using FastTree version 2.1.10 with a GTR substitution model and 1,000 bootstrap replicates. Local support values are based on the Shimodaira-Hasegawa (SH) test (26).
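As a complement to the methods, the screen for R-X-X-R remnants at the S1/S2 boundary can be sketched as a simple pattern search over translated spike fragments. The Python example below is a minimal illustration and not the prediction procedure used in the study; the function name `find_rxxr` and both example peptides are assumptions introduced here. The first peptide is modeled on the widely reported SARS-CoV-2 S1/S2 site and the second is an invented negative control.

```python
import re

# Minimal motif named in the text: R-X-X-R, with X being any amino acid.
# The lookahead allows overlapping occurrences to be reported.
RXXR = re.compile(r"(?=(R[A-Z]{2}R))")


def find_rxxr(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched tetrapeptide) for every R-X-X-R hit.

    Alignment gap characters are removed before searching.
    """
    ungapped = protein.upper().replace("-", "")
    return [(m.start(), m.group(1)) for m in RXXR.finditer(ungapped)]


# Illustrative peptides only; they are not sequences generated in the study.
examples = {
    "with_motif": "TNSPRRARSVAS",
    "without_motif": "TNSPAAARSVAS",
}

if __name__ == "__main__":
    for name, peptide in examples.items():
        print(name, find_rxxr(peptide))
```

A match on R-X-X-R only indicates a candidate site. Whether it is cleaved by furin depends on sequence context and would need a dedicated predictor or experimental confirmation.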