key: cord-0755305-seibnvcq
authors: Wu, Yiran; Zhao, Suwen
title: Furin cleavage sites naturally occur in coronaviruses
date: 2020-12-09
journal: Stem Cell Res
DOI: 10.1016/j.scr.2020.102115
sha: 448597366c93633accf86ce04deb241b310dcb20
doc_id: 755305
cord_uid: seibnvcq
The spike protein is a major research target in COVID-19, a pandemic caused by SARS-CoV-2. A 12-nt insertion at S1/S2 in the spike coding sequence yields a furin cleavage site, which has raised controversy over the origin of the virus. Here we analyzed the phylogenetic relationships of coronavirus spike proteins and mapped the furin recognition motif onto the tree. Furin cleavage sites arose independently multiple times in the evolution of the coronavirus family, supporting the hypothesis that SARS-CoV-2 has a natural origin.
The coronavirus disease 2019 (COVID-19) pandemic, caused by the coronavirus SARS-CoV-2, is still spreading rapidly. Scientists all around the world are studying the infection process of the virus and looking for therapeutic solutions to prevent and cure the disease. The spike protein is one of the most important targets in COVID-19 research, not only in mechanistic studies but also in vaccine development and therapeutic antibody design (Premkumar et al., 2020; Abraham, 2020). The spike protein forms a homotrimer (Fig. 1A) and protrudes from the membrane of the virion (Li, 2016; Wrapp et al., 2020; Walls et al., 2020). It binds to angiotensin-converting enzyme 2 (ACE2) on the human cell surface, undergoes conformational changes, and mediates the fusion of the virion with human cells (Li et al., 2003; Yan et al., 2020; Lan et al., 2020; Wang et al., 2020; Benton et al., 2020). The spike protein consists of two subunits: the N-terminal S1 subunit, which contains the receptor binding domain that interacts with ACE2 and induces the conformational change, and the C-terminal S2 subunit, which contains the fusion peptide responsible for fusing the membranes of the virion and the human cell (Li, 2016).
To function properly, the spike protein needs to be cleaved by host proteases to separate the two subunits (Simmons et al., 2013). Studies of SARS-CoV (which caused the severe acute respiratory syndrome outbreak in 2002-2003 and belongs to the same species as SARS-CoV-2) showed that cleavage at S1/S2 enhances the fusogenicity of the spike protein (Bosch et al., 2003). Another cleavage site, producing S2′ (the fusion peptide and the C-terminal region of S2), was also identified (Fig. 1B) (Belouzard et al., 2009). The S2′ site is not far from the S1/S2 site, and cleavage at either or both of them can separate the two subunits of spike. For coronaviruses, multiple proteases have been reported to be responsible for the cleavage, including TMPRSS2 (Hoffmann et al., 2020; Glowacka et al., 2011), cathepsin CTSL (Bosch et al., 2008; Ou et al., 2020), and trypsin (Belouzard et al., 2009; Bertram et al., 2011). The spike protein of the Middle East respiratory syndrome coronavirus (MERS-CoV) was found to be cleaved more effectively by another protease, furin (Millet and Whittaker, 2014). In MERS-CoV, both the S1/S2 and S2′ sites have the sequence RXXR, the furin recognition motif. In the human body, furin is ubiquitously expressed (Braun and Sauter, 2019). This may explain the highly lethal nature of MERS and the high rate of multiple system failure it causes (Millet and Whittaker, 2014). Surprisingly, SARS-CoV-2 has the furin recognition motif at S1/S2, caused by a 12-nucleotide insertion not present even in its closest relatives (Walls et al., 2020; Coutard et al., 2020). This has fueled a conspiracy theory that the furin site can only be the product of manual engineering, and thus that SARS-CoV-2 must have been created in a laboratory. Here, we analyzed the sequences of coronaviruses and found that furin sites arose independently multiple times during evolution. This shows that a natural origin of the furin cleavage site in the SARS-CoV-2 spike protein is highly plausible.
Thus, the insertion of the furin cleavage site into the SARS-CoV-2 spike protein is not necessarily a result of manual work.
E-mail addresses: wuyr@shanghaitech.edu.cn (Y. Wu), zhaosw@shanghaitech.edu.cn (S. Zhao).
Coronaviruses are members of the subfamily Orthocoronavirinae (also named Coronavirinae) in the family Coronaviridae. There are four genera: Alphacoronavirus, Betacoronavirus, Gammacoronavirus, and Deltacoronavirus. The first two genera only infect mammals, while the last two primarily infect birds but also mammals (Cui et al., 2019). We collected sequences of coronavirus spike proteins from the InterPro database (Mitchell et al., 2019) and performed phylogenetic analysis. The phylogenetic tree (Fig. 2A) generally matches the reported relationships of coronaviruses (Cui et al., 2019): three genera (Betacoronavirus, Gammacoronavirus, and Deltacoronavirus) each form a single clade, while only Alphacoronavirus has a small group left outside (see Fig. 5 for representative sequences). The left-out Alphacoronavirus group contains Rhinolophus bat coronavirus HKU2, which was reported to be closely related to two typical members of Alphacoronavirus, human coronavirus NL63 and human coronavirus 229E, in a phylogenetic analysis based on the RNA-dependent RNA polymerase coding region (Cui et al., 2019). We aligned the whole genomes of the three viruses and found that Rhinolophus bat coronavirus HKU2 is quite unique in its spike protein-coding region (Fig. S1). This result explains why this clade was separated from the other Alphacoronavirus members in our phylogenetic analysis. Furthermore, each of the five subgenera of Betacoronavirus was recovered as monophyletic (Fig. 2B): Sarbecovirus (e.g. SARS-CoV-2 and SARS-CoV), Merbecovirus (e.g. MERS-CoV), Embecovirus (e.g. human coronavirus OC43 and human coronavirus HKU1, both causing the common cold), and two small subgenera, Hibecovirus and Nobecovirus.
Hibecovirus and Nobecovirus have been found only in bats so far, and our phylogenetic analysis shows that they are related to Sarbecovirus. We mapped the furin recognition motif RXXR at both the S1/S2 and S2′ positions onto phylogenetic trees of spike protein sequences (Figs. 3-5 and S2, S3). In the Sarbecovirus + Hibecovirus + Nobecovirus clade (Fig. 3), furin cleavage sites at either position occur only in limited ranges. Strains of SARS-CoV-2 (we also added sequences from the GISAID database) have furin cleavage sites at spike S1/S2. Moreover, SARS-CoV-2 is the only virus in the subgenus Sarbecovirus with this feature, while even its closest relatives, bat coronavirus RaTG13 (sequence identity 97.7%) and pangolin coronaviruses (90.7%-92.9%), lack a furin site. However, in Hibecovirus, the sister clade of Sarbecovirus, a Hipposideros bat coronavirus collected in 2013 in Zhejiang Province, China, has a furin site at S1/S2. Interestingly, the other member of Hibecovirus lacks such a site, similar to the situation of SARS-CoV-2 and its close relatives. The SARS-CoV-2 strains and Hipposideros bat coronavirus (Zhejiang 2013) are not sister groups, in agreement with the distinct sequence patterns of their furin cleavage sites at spike S1/S2 (Fig. 6A). In addition, two strains of bat coronavirus HKU9, belonging to Nobecovirus, are the only members of the Sarbecovirus + Hibecovirus + Nobecovirus clade with a furin cleavage site at spike S2′. Our mapping results showed that the furin recognition motif is more common in Merbecovirus and Embecovirus (Figs. 4 and S2, S3). In Merbecovirus, furin sites at spike S1/S2 occur in three clades: the MERS-CoV strains, the bat coronavirus HKU5 strains, and coronavirus Neoromicia/PML-PHE1/RSA/2011 with its relatives (Figs. 4A and S2). In addition, MERS-CoV and bat coronavirus HKU5 are the only clades in Merbecovirus with a furin cleavage site at S2′.
In Embecovirus, the furin recognition motif at spike S1/S2 is nearly universal: with few exceptions, all strains have furin cleavage sites at spike S1/S2 (Figs. 4B and S3). Interestingly, the Longquan Aa mouse coronavirus (Wang et al., 2015) has lost this furin site, while its close relatives (e.g. China Rattus coronavirus HKU24, sequence identity 96.0%) maintain the furin cleavage site. This provides an example of naturally occurring sequence variation at spike S1/S2 among closely related coronaviruses. For spike S2′, by contrast, only a few individual strains have the furin recognition motif (Fig. S3). Our mapping results showed that furin cleavage sites are widely present across the whole coronavirus family (Fig. 5). For spike S1/S2, the furin recognition motif is universal in Gammacoronavirus, and also occurs in two clades of Alphacoronavirus: feline coronavirus and its relatives, and Chevrier's field mouse coronavirus. For spike S2′, the furin recognition motif occurs in several independent clades, covering all three genera. Notably, of the two human coronaviruses in Alphacoronavirus that cause the common cold, HCoV NL63 has a furin cleavage site at spike S2′, while HCoV 229E (protein sequence identity 63.8%) lacks this feature. The alignment of the linking regions between the spike S1 and S2 domains in representative Betacoronavirus members (Fig. 6A) shows that this region is less conserved than the neighboring folded S1 and S2 segments. Within a subgenus the sequences align well, but among subgenera the similarity is low. The furin cleavage site of SARS-CoV-2 spike S1/S2 is formed by an insertion of PRRA relative to other Sarbecovirus members, including the close relative RaTG13, showing that it arose very recently and independently. Similarly, Hipposideros bat coronavirus (Zhejiang 2013) in Hibecovirus has a furin site of independent origin, though when it arose is hard to determine because only two sequences have been published for this subgenus.
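The motif mapping and the PRRA insertion described above can be illustrated with a small sketch. It scans a protein sequence for the RXXR pattern and compares two short S1/S2 fragments. The function name `find_furin_motifs` and the two fragments are assumptions made for illustration; the fragments are written out by hand rather than taken from the paper's alignment files, and should be checked against the deposited sequences before any real use. A motif match alone does not demonstrate that furin actually cleaves the site.

```python
import re

# Furin recognition motif R-X-X-R. The lookahead lets overlapping matches be reported.
FURIN_MOTIF = re.compile(r"(?=(R[A-Z]{2}R))")


def find_furin_motifs(seq):
    """Return (0-based start, matched 4-mer) for every RXXR occurrence in seq."""
    return [(m.start(), m.group(1)) for m in FURIN_MOTIF.finditer(seq)]


# Illustrative S1/S2 linking-region fragments (assumed, hand-written for this sketch).
sars_cov_2 = "QTQTNSPRRARSVASQSIIAYTMSLG"
ratg13 = "QTQTNSRSVASQSIIAYTMSLG"

print(find_furin_motifs(sars_cov_2))  # motif present in the SARS-CoV-2 fragment
print(find_furin_motifs(ratg13))      # no RXXR in the RaTG13 fragment

# Removing the four inserted residues from the SARS-CoV-2 fragment
# reproduces the RaTG13 fragment in this toy comparison.
print(sars_cov_2.replace("PRRA", "", 1) == ratg13)
```

In a real analysis the scan would be restricted to the aligned S1/S2 and S2′ positions rather than run over the whole spike sequence, since RXXR can occur elsewhere by chance.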
Merbecovirus and Embecovirus both contain multiple coronavirus species with furin cleavage sites at spike S1/S2, but their situations differ. In Merbecovirus, furin cleavage sites prevail in three non-sister clades (Figs. 4A and 6A). Moreover, the positions of the furin recognition motifs in the linking regions are unique to each clade, as shown in alignments of both protein sequences (Fig. 6A) and nucleotide sequences (Fig. S4A). These observations indicate that, for each of the three clades in Merbecovirus, the furin cleavage site has an independent origin. In Embecovirus, in contrast, all the furin cleavage sites are variations on a 5-residue region with the consensus sequence RRXRR. The region is well aligned in both protein and nucleotide sequences (Figs. 6A and S4B). This suggests that the furin cleavage sites of Embecovirus share a common ancestor. In addition, in Alphacoronavirus and Gammacoronavirus, the S1/S2 cleavage sites reside on a different loop compared with the site in Betacoronavirus (Fig. 6B); therefore, furin cleavage sites at spike S1/S2 in these two genera arose independently from those in Betacoronavirus. Furin cleavage is critical to many viral diseases, including HIV, Ebola, and influenza H5 and H7 (Becker et al., 2012). Furin is a ubiquitously expressed protease. In the human body, it has a wider distribution than TMPRSS2, the major protease responsible for cleaving spike (Fig. S5). Therefore, coronaviruses whose spike contains a furin cleavage site may have an advantage in spreading. Deletion of the furin cleavage site in SARS-CoV-2 attenuates replication in respiratory cells (Johnson et al., 2020) and pathogenesis in hamsters (Johnson et al., 2020; Lau et al., 2020). Furin inhibitors suppress virus production and cytopathic effects in kidney cells (Cheng et al., 2020). Natural polymorphisms that lose the furin recognition motif in SARS-CoV-2 spike S1/S2 have been observed, but are very rare (Xing et al., 2020).
Variations in this region are more common in viruses cultured in vitro than in viruses isolated from clinical samples, suggesting that this cleavage site is under selection pressure in the human body (Lau et al., 2020; Liu et al., 2020). Our analysis shows that furin cleavage sites at spike S1/S2 arose independently several times in coronaviruses. Consequently, a natural origin of the site in SARS-CoV-2 is highly plausible. This is further supported by other observed natural variations in the linking region of S1 and S2. A natural insertion in the SARS-CoV spike, though unrelated to the furin recognition motif, has been reported. In Embecovirus, Longquan Aa mouse coronavirus (Wang et al., 2015) has a frameshift mutation that led to the loss of the furin recognition motif (Fig. S4B), and some strains of murine coronavirus have lost the furin recognition motif through substitution mutations (Fig. S3), e.g. MHV-2 (Yamada et al., 1997). Further study of the loss of the furin cleavage site in Embecovirus would help in interpreting S1/S2 cleavage in Betacoronavirus. In addition, independent occurrences of furin cleavage sites in surface glycoproteins are not unique to coronaviruses: for the hemagglutinin of influenza, only H5 and H7 have furin cleavage sites (Bottcher-Friebertshauser et al., 2013), and these subtypes are distant in the phylogenetic tree (Fig. S6).
Furin cleavage sites in spike proteins arose naturally and independently multiple times in coronaviruses. This feature of the SARS-CoV-2 spike protein is therefore not necessarily a product of manual intervention, though our observation does not rule out the lab-engineered scenario.
Fig. 3. Mapping of furin recognition motif on phylogenetic tree of spike protein sequences, the Sarbecovirus + Hibecovirus + Nobecovirus clade. Sequences in the SARS-CoV-2 clade were clustered with 95% identity threshold; in the SARS-CoV-2 clade were clustered with 99% identity threshold.
Most sequences were obtained from the InterPro database (Mitchell et al., 2019).
All the sequences sharing the same domain annotation SAR2-CoV-2 or SAR2-CoV (entries: IPR002552, IPR018548, IPR036326, and IPR042578) were selected and then filtered in three steps: 1) sequences shorter than 1000 amino acids or not starting with methionine were dropped; 2) the remaining sequences were subjected to multiple sequence alignment and phylogenetic analysis, and a few sequences found to lie outside the coronavirus family were dropped; 3) duplicate sequences were removed using the software CD-HIT version 4.8.1 (Fu et al., 2012; Li and Godzik, 2006). Finally, spike protein sequences from pangolin coronaviruses, which are closely related to SARS-CoV-2 but not included in the InterPro database, were added to the sequence set. Genomes of bat coronavirus RmYN02 and pangolin coronaviruses were obtained from the GISAID database (https://www.gisaid.org/), and spike protein sequences were generated using the NCBI ORFfinder server (https://www.ncbi.nlm.nih.gov/orffinder/). Additional sequences of SARS-CoV-2 spike proteins were also obtained from the GISAID database (as of Sep. 29, 2020); sequences with 25 or more continuous unreadable amino acids were dropped, and 93,420 spike protein sequences were used for analysis. Nucleotide sequences of representative spike sequences were obtained from the NCBI nucleotide database (https://www.ncbi.nlm.nih.gov/nuccore/).
Sequence sets were clustered using the software CD-HIT version 4.8.1 (Fu et al., 2012; Li and Godzik, 2006).
Multiple sequence alignments were performed using the MAFFT software version 7.299b (Katoh and Standley, 2013). The FFT-NS-2 strategy was selected. For other parameters, the default values were used. The alignment figures in Fig. S4 were generated using the ESPript 3.0 website (http://espript.ibcp.fr/ESPript/ESPript/) (Robert and Gouet, 2014).
The phylogenetic trees were generated from the multiple sequence alignments of spike proteins using the MEGA X software version 10.1.7 (Kumar et al., 2018).
The Neighbor Joining algorithm was selected, and the default values were used for all parameters.
A homology model of feline coronavirus UU16 was built based on the electron-microscopy structure of the human coronavirus NL63 spike protein (PDB ID: 5SZS; Walls et al., 2016) using the Prime program version 4.2 (Schrödinger, 2015) in the Schrödinger software platform. Missing loops of the SARS-CoV-2 spike protein were also modeled using the Prime program, based on the structure with PDB ID: 6VYB (Walls et al., 2020).
Sequences clustered with 95% identity threshold.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Fig. 6. Positions of furin cleavage sites at the linking region of S1 and S2. A) Multiple sequence alignment of representative Betacoronavirus spike protein S1/S2 region, with furin recognition motifs highlighted (red boxes in the sequence alignment). The phylogenetic tree of spike protein sequences is colored to indicate subgenera (coloring scheme the same as in Fig. 2B). B) Positions of furin cleavage sites in different coronavirus genera (red cartoon, furin recognition motif; red arrow, cleavage site); structures: SARS-CoV-2, PDB ID 6VYB (Walls et al., 2020), with missing loop added; feline coronavirus (FCoV) UU16, homology model based on PDB ID 5SZS (Walls et al., 2016); infectious bronchitis coronavirus (IBV), PDB ID 6CV0 (Shang et al., 2018). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
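The simple sequence filters described in the methods (minimum length, starting methionine, and rejection of long runs of unreadable residues) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function names are hypothetical, unreadable residues are assumed to be written as `X`, the two filters that the paper applied to different data sources are combined into one pass for brevity, and exact-match deduplication is only a stand-in for CD-HIT, which clusters by sequence identity.

```python
import re

MIN_LENGTH = 1000   # sequences shorter than this are dropped
MAX_BAD_RUN = 25    # runs of this many unreadable residues (or more) are rejected
UNREADABLE = "X"    # assumption: unreadable residues are written as 'X'


def passes_basic_filter(seq):
    """Keep sequences of at least 1000 residues that start with methionine."""
    return len(seq) >= MIN_LENGTH and seq.startswith("M")


def has_long_unreadable_run(seq):
    """True if seq contains 25 or more consecutive unreadable residues."""
    pattern = re.escape(UNREADABLE) + "{%d,}" % MAX_BAD_RUN
    return re.search(pattern, seq) is not None


def filter_records(records):
    """Filter (name, sequence) pairs and drop exact duplicate sequences.

    Exact-match deduplication is a simplified stand-in for the CD-HIT step.
    """
    seen = set()
    kept = []
    for name, seq in records:
        if not passes_basic_filter(seq) or has_long_unreadable_run(seq):
            continue
        if seq in seen:
            continue
        seen.add(seq)
        kept.append((name, seq))
    return kept
```

The step that removes sequences falling outside the coronavirus family depends on the alignment and tree, so it is not represented here.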
The receptor binding domain of the viral spike protein is an immunodominant and highly specific target of antibodies in SARS-CoV-2 patients
Passive antibody therapy in COVID-19
Structure, function, and evolution of coronavirus spike proteins
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein
Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus
Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2
Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor
Structural and functional basis of SARS-CoV-2 entry by using human ACE2
Receptor binding and priming of the spike protein of SARS-CoV-2 for membrane fusion
Proteolytic activation of the SARS-coronavirus spike protein: cutting enzymes at the cutting edge of antiviral research
The coronavirus spike protein is a class I virus fusion protein: structural and functional characterization of the fusion core complex
Activation of the SARS coronavirus spike protein via sequential proteolytic cleavage at two distinct sites
SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor
Evidence that TMPRSS2 activates the severe acute respiratory syndrome coronavirus spike protein for membrane fusion and reduces viral control by the humoral immune response
Cathepsin L functionally cleaves the severe acute respiratory syndrome coronavirus class I fusion protein upstream of rather than adjacent to the fusion peptide
Characterization of spike glycoprotein of SARS-CoV-2 on virus entry and its immune cross-reactivity with SARS-CoV
Cleavage and activation of the severe acute respiratory syndrome coronavirus spike protein by human airway trypsin-like protease
Host cell entry of Middle East respiratory syndrome coronavirus after two-step, furin-mediated activation of the spike protein
Furin-mediated protein processing in infectious diseases and cancer
The spike glycoprotein of the new coronavirus 2019-nCoV contains a furin-like cleavage site absent in CoV of the same clade
Origin and evolution of pathogenic coronaviruses
Discovery, diversity and evolution of novel coronaviruses sampled from rodents in China
Highly potent inhibitors of proprotein convertase furin as potential drugs for treatment of infectious diseases
Attenuated SARS-CoV-2 variants with deletions at the S1/S2 junction
Furin inhibitors block SARS-CoV-2 spike protein cleavage to suppress virus production and cytopathic effects
Natural polymorphisms are present in the furin cleavage site of the SARS-CoV-2 spike glycoprotein
Identification of common deletions in the spike protein of severe acute respiratory syndrome coronavirus 2
Activation of influenza viruses by proteases from host cells and bacteria in the human airway epithelium
CD-HIT: accelerated for clustering the next-generation sequencing data
Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences
MAFFT multiple sequence alignment software version 7: improvements in performance and usability
Deciphering key features in protein structures with the new ENDscript server
MEGA X: molecular evolutionary genetics analysis across computing platforms
Glycan shield and epitope masking of a coronavirus spike protein observed by cryo-electron microscopy
Cryo-EM structure of infectious bronchitis coronavirus spike protein reveals structural and functional evolution of coronavirus spike proteins
Options and obstacles for designing a universal influenza vaccine
We thank ShanghaiTech University for supporting this work.
Supplementary data to this article can be found online at https://doi.org/10.1016/j.scr.2020.102115.