key: cord-0168903-g7qioapl
authors: Jaimes, Javier A.; Andre, Nicole M.; Millet, Jean K.; Whittaker, Gary R.
title: Structural modeling of 2019-novel coronavirus (nCoV) spike protein reveals a proteolytically-sensitive activation loop as a distinguishing feature compared to SARS-CoV and related SARS-like coronaviruses
date: 2020-02-14
journal: nan
DOI: nan
sha: ad2b326221b48e904660968e5d10c4cb2882c5e0
doc_id: 168903
cord_uid: g7qioapl

The 2019 novel coronavirus (2019-nCoV) is currently causing a widespread outbreak centered on Hubei province, China and is a major public health concern. Taxonomically, 2019-nCoV is closely related to SARS-CoV and SARS-related bat coronaviruses, and it appears to share a common receptor with SARS-CoV (ACE-2). Here, we perform structural modeling of the 2019-nCoV spike glycoprotein. Our data provide support for the similar receptor utilization between 2019-nCoV and SARS-CoV, despite a relatively low amino acid similarity in the receptor binding module. Compared to SARS-CoV, we identify an extended structural loop containing basic amino acids at the interface of the receptor binding (S1) and fusion (S2) domains, which we predict to be proteolytically sensitive. We suggest this loop confers fusion activation and entry properties more in line with MERS-CoV and other coronaviruses, and that the presence of this structural loop in 2019-nCoV may affect virus stability and transmission.

The coronaviruses belong to the Coronaviridae family and the Orthocoronavirinae subfamily, which is divided into four genera: Alphacoronavirus, Betacoronavirus, Gammacoronavirus, and Deltacoronavirus. SARS-CoV, MERS-CoV, and 2019-nCoV are all betacoronaviruses, a genus that includes many viruses that infect humans, bats, and other wild animals (ICTV, 2018). Betacoronaviruses share many similarities within the ORF1ab polyprotein and most structural proteins; however, the spike protein and accessory proteins show significant diversity (Cui et al., 2019).
MERS-CoV has maintained a stable genome since its emergence in 2012, unlike other coronaviruses that readily evolve and can undergo notable recombination events (Perlman, 2020).

The market sells many species including seafood, birds, snakes, marmots and bats (Gralinski and Menachery, 2020). The market was closed on January 1st, 2020, and sampling and decontamination have been carried out in order to find the source of the infection. An origin of 2019-nCoV in bats has been strongly supported, but the presumed intermediate host remains to be identified; initial reports that 2019-nCoV originated in snakes have not been verified (Gralinski and Menachery, 2020; Zhou et al., 2020a).

The coronavirus spike protein (S) is the primary determinant of viral tropism and is responsible for receptor binding and membrane fusion. It is a large (approx. 180 kDa) glycoprotein that is present on the viral surface as a prominent trimer, and it is composed of two domains, S1 and S2 (Belouzard et al., 2012). The S1 domain mediates receptor binding and is divided into two subdomains, with the N-terminal domain often binding sialic acid and the C-domain binding a specific proteinaceous receptor (Hulswit et al., 2016). The receptor for SARS-CoV has been identified as [...]

The fusion peptide is activated through proteolytic cleavage at a site immediately upstream (S2'), which is common to all coronaviruses. In many (but not all) coronaviruses, additional proteolytic priming occurs at a second site located at the interface of the S1 and S2 domains (S1/S2) (Millet and Whittaker, 2015). The use of proteases in priming and activation, combined with receptor binding and ionic interactions (e.g. H+ and Ca2+), together control viral stability and transmission, and also control the conformational changes in the S protein that dictate the viral entry process into host cells (Belouzard et al. [...]

The rapid dissemination and sharing of information during the 2019-nCoV outbreak has surpassed that seen for both MERS-CoV and SARS-CoV; the latter virus was only identified after several months, with a genome available a month later (Gralinski and Menachery, 2020). In contrast, 2019-nCoV was identified and a genome sequence was available within a month of the initial appearance of the agent in patients (Gralinski and Menachery, 2020). Initial reports identified that 2019-nCoV contains six major open reading frames in the viral genome and various accessory proteins (Zhou et al., 2020a). The SARS-like virus Bat-CoV RaTG13 was observed to have a highly homologous genome, with two other bat SARS-like viruses (Bat-SL-CoVZC45 and Bat-SL-CoVZXC21) having 89-97% sequence identity (Gralinski and Menachery, 2020). The S protein of 2019-nCoV was found to be approximately 75% homologous to the SARS-CoV spike (Gralinski and Menachery, 2020; Zhou et al., 2020a).

In this study, we perform bioinformatic analyses and homology structural modeling of 2019-nCoV S, in comparison with closely related viruses. We identify a small structural loop at the S1/S2 interface of 2019-nCoV S that carries a short insert containing two arginine residues. These features are missing from all other SARS-CoV-related viruses, but are present in MERS-CoV S and in many other coronaviruses. We discuss the importance of this extended basic loop for S protein-mediated membrane fusion and its implications for viral transmission.
To obtain an initial assessment of shared and/or specific features of the 2019-nCoV spike (S) envelope glycoprotein, a protein sequence alignment was performed to compare the sequence of the Wuhan-Hu-1 strain of the novel coronavirus with that of the closely related human SARS-CoV S strain Tor2 (Supplementary Fig. 1). The overall protein sequence identity found by the alignment was 76% (Fig. 1A). A breakdown of the functional domains of the S protein, based on the SARS-CoV S sequence, reveals that the S1 receptor-binding domain was less conserved (64% identity) than the S2 fusion domain (90% identity). Within S1, the N-terminal domain (NTD) was found to be less conserved (51% identity) than the receptor binding domain (RBD, 74% identity). The relatively high degree of sequence identity for the RBD is consistent with the view that 2019-nCoV, like SARS-CoV, may use ACE2 as its host cell receptor, [...]

The composition of residues at the two known coronavirus S cleavage sites was analyzed using alignment data (Fig. 2B and C). The region around arginine 667 (R667) of SARS-CoV S, the S1/S2 cleavage site, aligned well with 2019-nCoV and the bat SARS-related sequences. Notably, an arginine at the position corresponding to SARS-CoV R667 is conserved in the other five sequences analyzed. The alignment shows that 2019-nCoV contains a four amino acid insertion, 681PRRA684, that is not found in any other sequence, including the closely related bat-SL-RaTG13 (Fig. 2B). Together with the conserved R685 found in 2019-nCoV at the putative S1/S2 cleavage site, the insertion introduces a stretch of three basic arginine residues that could potentially be recognized by members of the pro-protein convertase family of proteases (Seidah, 2011; Seidah et al., 2013). This insertion was conserved in all fifteen 2019-nCoV sequences analyzed (Supplementary Fig. 2).
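The effect of the 681PRRA684 insertion on the S1/S2 site can be illustrated with a simple motif scan. The following is a minimal sketch, not the analysis used in this study: the two sequence fragments are short illustrative excerpts around the S1/S2 site and should be replaced with sequences taken from the relevant GenBank records, the function name `find_multibasic_sites` is hypothetical, and the pattern R-X-X-R is only the minimal consensus commonly quoted for furin-type proprotein convertases, so a match indicates a candidate site rather than demonstrated cleavage.

```python
import re

# Illustrative fragments around the S1/S2 site (assumed for this sketch;
# use the full GenBank sequences for any real analysis).
FRAGMENTS = {
    "2019-nCoV": "QTQTNSPRRARSVASQ",  # contains the PRRA insertion followed by R685
    "SARS-CoV": "HTVSLLRSTSQ",        # contains the single arginine R667
}

# Minimal multibasic consensus R-X-X-R. The lookahead makes the search
# report overlapping matches as well.
MOTIF = re.compile(r"(?=(R..R))")


def find_multibasic_sites(seq):
    """Return (0-based start index, matched motif) for each R-X-X-R hit."""
    return [(m.start(1), m.group(1)) for m in MOTIF.finditer(seq)]


if __name__ == "__main__":
    for name, fragment in FRAGMENTS.items():
        hits = find_multibasic_sites(fragment)
        print(name, hits if hits else "no R-X-X-R motif")
```

Under these assumptions, the scan reports the RRAR stretch in the 2019-nCoV fragment and no hit in the SARS-CoV fragment, which carries only the single arginine at the corresponding position.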
Within the Betacoronavirus genus, the presence of a basic stretch of residues at the S1/S2 site is found for a number of species from [...]

We observed an average of ~30% identity among the four viral S proteins at the amino acid level, with the exception of HCoV-HKU1 and MHV, which share an amino acid identity of 59% at the S protein (Supplementary Fig. 3A). Despite the differences at the amino acid level, the overall structure of the four Betacoronavirus S proteins showed a similar folding pattern (Supplementary Fig. 3B), and major differences can only be spotted at specific sections of the functional domains where flexible loops are abundant (e.g. RBD and cleavage sites). Considering this, we built a first set of models for the 2019-nCoV S protein based on each of the above-mentioned structures (Supplementary Fig. 4). Interestingly, the secondary structure of the predicted 2019-nCoV S models showed no major differences regardless of which S structure was used as the modeling template. However, extended flexible loops at the RBD and/or clashes between S monomers at the S2 domain level were observed in the 2019-nCoV S models based on HCoV-HKU1, MHV and MERS-CoV (Supplementary Fig. 4, first three panels). In contrast, the predicted 2019-nCoV S model based on the SARS-CoV S structure displayed a much better organized folding, and no major clashes were observed between the S monomers (Supplementary Fig. 4, last panel). As described above, the identity between 2019-nCoV and SARS-CoV at the S protein amino acid level was 76%, and the phylogenetic analysis grouped 2019-nCoV in lineage B of the Betacoronavirus genus, closely related to SARS-CoV as well as to other CoVs that originated in bats (Fig. 1B). These two considerations, in addition to our preliminary modeling results, suggested SARS-CoV S as the most suitable template for modeling the 2019-nCoV S protein.
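The identity values used to compare candidate templates can be computed directly from a pairwise alignment. The sketch below is illustrative only and makes several assumptions: the sequences are toy strings rather than real S protein alignments, the helper names `percent_identity` and `rank_templates` are hypothetical, and identity is defined here over columns in which neither sequence has a gap, which is one of several conventions in use and may differ from the one applied by the alignment software used in this study.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def rank_templates(aligned_target, aligned_templates):
    """Sort candidate templates by identity to the target, highest first."""
    scores = {
        name: percent_identity(aligned_target, seq)
        for name, seq in aligned_templates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy pre-aligned sequences (hypothetical, for illustration only).
    target = "MKV-LLTA"
    templates = {"template_A": "MKVALLTA", "template_B": "MRV-LQSA"}
    for name, score in rank_templates(target, templates):
        print(f"{name}: {score:.1f}% identity")
```

Sequence identity is only one input to template choice; as noted above, the quality of the resulting models (loop organization and clashes between monomers) was also taken into account.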
Taking an alternative approach, the S protein sequence of 2019-nCoV was submitted to two [...]

To better compare the predicted structural characteristics of 2019-nCoV, we also performed homology modeling of four S proteins from Bat-CoVs belonging to lineage B in our phylogenetic analysis, which were shown to be closely related to 2019-nCoV. The modeled S proteins from the Bat-CoVs RaTG13, CoVZC45, CoVZXC21 and LYRa3 were compared to the predicted structure of 2019-nCoV S and to the template structure of SARS-CoV (Fig. 3). The amino acid homology of the modeled S proteins in comparison to the template SARS-CoV S was ~71% for all the Bat-CoV S proteins, with the exception of LYRa3 S, which shares a homology of 84.69% with the template S. Overall, all the modeled S proteins shared a similar folding pattern in comparison to SARS-CoV S, and both the S1 and S2 domains showed a uniform organization (Fig. 3). As expected, differences were mostly observed at the flexible loops forming the "head" of the S1 domain, especially at the NTD region (RBD region), where most of the amino acid variation was observed (Fig. 2A and 3). [...] and bottom left panels). These last two viruses showed deletions of 5 and 14 amino acids in the RBM sequence (Fig. 2A), which can explain the differential folding in the modeled proteins.

Structural modeling of 2019-nCoV S reveals a proteolytically-sensitive loop

Figure 2B shows a four amino acid insertion, 681PRRA684, as well as a conserved R685 at the S1/S2 site of 2019-nCoV. This insertion, which appears to be common among the lineage B of betacoronaviruses, suggests a differential mechanism of activation for 2019-nCoV compared to other SARS-CoV and SARS-like Bat-CoVs. At the structural level, the S1/S2 site has been shown to be difficult to solve for most CoV structures, resulting in either incomplete structures (missing the complete S1/S2 site) or structures with an altered (i.e. mutated) S1/S2 site (Walls et al., [...]
Solving the structure of the S1/S2 site was also found to be an issue in the SARS-CoV S structure we used for our modeling analyses. We have previously shown that the S1/S2 site can be modeled in other CoV S proteins and that it appears to organize as a flexible exposed loop that extends from the S structure, suggesting it could be easily accessible for proteolytic activation (Jaimes et al., 2020). To better study the structural organization of the S1/S2 site, we modeled the SARS-CoV S protein based on the S structures of MHV (S1/S2 site mutated in the structure), and of MERS-CoV and SARS-CoV (S1/S2 site missing in the structure), to see whether the predicted structure of the S1/S2 site was similar regardless of the template structure. We observed no differences in the modeled SARS-CoV S protein at the S1/S2 site, with an exposed flexible loop predicted in all three models (data not shown). Based on this, we proceeded to compare the S1/S2 site, as well as other major functional elements of the S2 domain (i.e. the S2' site and fusion peptide), in the predicted structures of our SARS-CoV, 2019-nCoV and Bat-CoV S models (Fig. 5). Remarkably, two features appear to exhibit distinctive characteristics in the 2019-nCoV S model: the fusion peptide, which is predicted to be organized in a more compact conformation for 2019-nCoV S than in SARS-CoV S (Fig. 5, surface models), and the region corresponding to the S1/S2 cleavage site, which contains R667 in the case of SARS-CoV (Fig. 5, S1/S2 alignment box and ribbon models). For SARS-CoV and the bat-CoV proteins, the S1/S2 site forms a short loop that lies close against the side of the trimeric structure. In the case of 2019-nCoV S, the S1/S2 site is predicted to form an extended loop that protrudes to the exterior of the trimer (Fig. 5). This feature suggests that the S1/S2 loop in 2019-nCoV S could be more exposed for proteolytic processing by host cell proteases.
As mentioned before, solving the structure of the S1/S2 site appears to present difficulties for most of the reported CoV S structures (Fig. 6, top panel). However, the exposed loop feature has been demonstrated in both modeled and cryo-EM CoV S structures with similar amino acid sequences at the S1/S2 site (i.e. FCoV and IBV, respectively) (Fig. 6, top panel). Interestingly, FCoV viruses do not always display an S1/S2 site (Fig. 6, top panel), which results in distinct cell entry mechanisms. We also performed an analysis of the S2' site of 2019-nCoV in comparison to the SARS-CoV and bat-CoV S proteins. As expected, differences in the modeled S2' site structure were not predicted in any of the studied spikes (Fig. 5, S2' ribbon models). This agrees with the fact that the S2' site appears to be conserved in the studied sequences (Fig. 5, S2' alignment box) and as we [...]

In this study, we show the presence of a distinct insert in the S1/S2 priming loop of 2019-nCoV S, which is not shared with SARS-CoV or any SARS-related viruses. The significance of this is yet to [...] Zhou et al., 2020b). One notable feature of the S protein S1/S2 cleavage site was first observed during the purification of the MHV S protein for structural analysis (Walls et al., 2016a). MHV with an intact cleavage loop was unstable when expressed, and so we consider that the S1/S2 loop controls virus stability, likely via access to the downstream S2' site that regulates fusion peptide exposure and activity. As such, it will be interesting to monitor the effects of S1/S2 loop insertions and proteolytic cleavability in the context of virus transmission, in addition to virus entry and pathogenesis.

Sequences: Amino acid sequences of the S protein used in the phylogenetic analysis were obtained from [...]

Figure 4. 2019-nCoV and bat-CoVs modeled RBM. Surface view of SARS-CoV S structure and 2019-nCoV, RaTG13, CoVZC45, CoVZXC21 and LYRa3 S models. SARS-CoV RBM (red) and flanking residues (yellow) are noted.
RBM in the modeled structures is also noted according to their amino acid homology (red) and differences (blue) to SARS-CoV.

Figure 5. 2019-nCoV S1/S2 and S2' activation sites. The S1/S2 and S2' activation sites of SARS-CoV and 2019-nCoV S models are shown in surface and ribbon views. S1/S2 and S2' sites of bat-CoVs are shown in ribbon view. Amino acid homology to SARS-CoV is noted as follows: S1/S2 site: homology (red) and differences (blue); S2' site: homology (yellow) and differences (magenta). Amino acid alignments of the S1/S2 and S2' sites are shown, and homology is also noted.

Figure 6. CoVs S1/S2 and S2' site. The S1/S2 and S2' activation sites of FCoV, MERS-CoV and IBV S models are shown in ribbon views. Amino acid homology to SARS-CoV is noted as follows: S1/S2 site: homology (red) and differences (blue); S2' site: homology (yellow) and differences (magenta). Amino acid sequences of the S1/S2 and S2' sites are shown.

GenBank accession numbers (in parentheses) from which whole genome or S gene sequences were obtained were Bat-SL-CoVZC45 (MG772933.1), Bat-SL-CoVZXC21 (MG772934.1), Bat-SL-LYRa3 (KF569997.1), BatCoV/133 (DQ648794.1) BatCoV-Neo [...] camMERS-CoV-HKU23 (KF906251.1), camMERS-CoV-KSA-505 (KJ713295.1), camMERS-CoV-NRCE-HKU205 (KJ477102.1) CivSARS-CoV-SZ3 (P59594.1), HCoV-229E (NC_002645.1), HCoV-HKU1 (AY597011.2), HCoV-OC43 (KF963244.1), Hedgehog-CoV/VMC/DEU (KC545383.1), hMERS-CoV-EMC/2012 (JX869059.2), hMERS-CoV-England-1 (KC164505.2), hMERS-CoV-Jordan-N3 (KC776174.1), hSARS-CoV-BJ01 (AY278488.2), hSARS-CoV-GZ02 (AY390556.1), hSARS-CoV-HKU39849 (JN854286.1) [...]

For S protein modeling, amino acid sequences of SARS-CoV Urbani (AAP13441.1) [...]

Distinct mutation in the feline coronavirus spike protein cleavage activation site in a cat with feline infectious peritonitis-associated meningoencephalomyelitis
Activation of the SARS coronavirus spike protein via sequential proteolytic cleavage at two distinct sites
Mechanisms of coronavirus cell entry mediated by the viral spike protein
A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster
Structure of the hemagglutinin precursor cleavage site, a determinant of influenza pathogenicity and the origin of the labile conformation
High reproduction number of Middle East respiratory syndrome coronavirus in nosocomial outbreaks: mathematical modelling in Saudi Arabia and South Korea
Origin and evolution of pathogenic coronaviruses
A mathematical model of the transmission of middle East respiratory syndrome coronavirus in dromedary camels (Camelus dromedarius)
Proteolytic cleavage of the E2 glycoprotein of murine coronavirus: host-dependent differences in proteolytic cleavage and cell fusion
Cell receptor-independent infection by a neurotropic murine coronavirus
Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor
Return of the Coronavirus: 2019-nCoV
New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0
Ready, set, fuse! The coronavirus spike protein and acquisition of fusion competence
The novel coronavirus 2019 (2019-nCoV) uses the SARS-coronavirus receptor ACE2 and the cellular protease TMPRSS2 for entry into target cells. bioRxiv
Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet
Fast assessment of human receptor-binding capability of 2019 novel coronavirus (2019-nCoV). bioRxiv
Coronavirus Spike Protein and Tropism Changes
Evidence supporting a zoonotic origin of human coronavirus strain NL63
Virus Taxonomy
A Tale of Two Viruses: The Distinct Spike Glycoproteins of Feline Coronaviruses
Feline coronavirus: Insights into viral pathogenesis based on the spike protein structure and function
Coronavirus 2019-nCoV Global Cases
MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform
MAFFT multiple sequence alignment software version 7: improvements in performance and usability
Pre-fusion structure of a human coronavirus spike protein
Functional analysis of potential cleavage sites in the MERS-coronavirus spike protein
The SARS-CoV Fusion Peptide Forms an Extended Bipartite Fusion Platform that Perturbs Membrane Order in a Calcium-Dependent Manner
Cleavage of a Neuroinvasive Human Respiratory Virus Spike Glycoprotein by Proprotein Convertases Modulates Neurovirulence and Virus Spread within the Central Nervous System
Functional assessment of cell entry and receptor usage for lineage B β-coronaviruses, including 2019-nCoV. bioRxiv
Structure of SARS coronavirus spike receptor-binding domain complexed with receptor
Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia
Mutation in spike protein cleavage site and pathogenesis of feline coronavirus
Transmission dynamics and control of severe acute respiratory syndrome
Transmission dynamics of 2019 novel coronavirus (2019-nCoV). bioRxiv
Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
Characterization of a highly conserved domain within the severe acute respiratory syndrome coronavirus spike protein S2 domain with characteristics of a viral fusion peptide
Trypsin treatment unlocks barrier for zoonotic bat coronaviruses infection
Jumping species-a mechanism for coronavirus persistence and survival
SARS-like WIV1-CoV poised for human emergence
Host cell entry of Middle East respiratory syndrome coronavirus after two-step, furin-mediated activation of the spike protein
Host cell proteases: Critical determinants of coronavirus tropism and pathogenesis
Another Decade, Another Coronavirus
Importation and Human-to-Human Transmission of a Novel Coronavirus in Vietnam
Neurovirulent Murine Coronavirus JHM.SD Uses Cellular Zinc Metalloproteases for Virus Entry and Cell-Cell Fusion
The proprotein convertases, 20 years later
The multifaceted proprotein convertases: their unique, redundant, complementary, and opposite functions
Cryo-EM structure of infectious bronchitis coronavirus spike protein reveals structural and functional evolution of coronavirus spike proteins
Cryo-EM structure of porcine delta coronavirus spike protein in the pre-fusion state
Cryo-electron microscopy structure of a coronavirus spike glycoprotein trimer
Glycan shield and epitope masking of a coronavirus spike protein observed by cryo-electron microscopy
Receptor recognition by novel coronavirus from Wuhan: An analysis based on decade-long structural studies of SARS
Discovery of seven novel Mammalian and avian coronaviruses in the genus deltacoronavirus supports bat coronaviruses as the gene source of alphacoronavirus and betacoronavirus and avian coronaviruses as the gene source of gammacoronavirus and deltacoronavirus
Summary of probable SARS cases with onset of illness from 1 [...]
Middle East respiratory syndrome coronavirus (MERS-CoV)
Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study
Angiotensin-converting enzyme 2 (ACE2) from raccoon dog can serve as an efficient receptor for the spike protein of severe acute respiratory syndrome coronavirus
Cryo-EM analysis of a feline coronavirus spike protein reveals a unique structure and camouflaging glycans
Isolation and Characterization of a Novel Bat Coronavirus Closely Related to the Direct Progenitor of Severe Acute Respiratory Syndrome Coronavirus
Cryo-EM structures of MERS-CoV and SARS-CoV spike glycoproteins reveal the dynamic receptor binding domains
A pneumonia outbreak associated with a new coronavirus of probable bat origin
Preliminary prediction of the basic reproduction number of the Wuhan novel coronavirus 2019-nCoV
A Novel Coronavirus from Patients with Pneumonia in China

We thank all members of the Whittaker and Daniel labs at Cornell University for comments and discussion, and Joshua Chappie for invaluable help with structural modeling. Work in the author's laboratory is supported by the National Institutes of Health (research grant R01AI35270).