key: cord-0771528-w72ma7fp authors: Dong, Shengjie; Sun, Jiachen; Mao, Zhuo; Wang, Lu; Lu, Yi-Lin; Li, Jiesen title: A guideline for homology modeling of the proteins from newly discovered betacoronavirus, 2019 novel coronavirus (2019-nCoV) date: 2020-03-29 journal: J Med Virol DOI: 10.1002/jmv.25768 sha: 96de1243ebc56e3815d9221e1d39bd7da283e7ae doc_id: 771528 cord_uid: w72ma7fp

During an outbreak of respiratory disease, including atypical pneumonia, in Wuhan, a previously unknown β-coronavirus was detected in patients. The newly discovered coronavirus is similar to some β-coronaviruses found in bats but differs from the previously known SARS-CoV and MERS-CoV. High sequence identity and similarity were found between 2019-nCoV and SARS-CoV. In this study, we searched for homologous templates for all nonstructural and structural proteins of 2019-nCoV. Among the nonstructural proteins, the leader protein (nsp1), the papain-like protease (nsp3), nsp4, the 3C-like protease (nsp5), nsp7, nsp8, nsp9, nsp10, the RNA-directed RNA polymerase (nsp12), the helicase (nsp13), the guanine-N7 methyltransferase (nsp14), the uridylate-specific endoribonuclease (nsp15), the 2'-O-methyltransferase (nsp16), and the ORF7a protein could be built from homology templates. Among the structural proteins, the spike protein (S protein), the envelope protein (E protein), and the nucleocapsid protein (N protein) can be constructed from the crystal structures of the corresponding SARS-CoV proteins. PL-Pro, 3CL-Pro, and RdRp are known to be important targets for designing antiviral drugs against 2019-nCoV. The S protein is also a critical target candidate for inhibitor screening or vaccine design against 2019-nCoV, because coronavirus replication is initiated by the binding of the S protein to cell-surface receptors. We believe these protein models will be useful for further structure-based virtual screening and related computer-aided drug development and vaccine design.
Severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV) have caused more than 10 000 cumulative cases. [1, 2] Very recently, there have been thousands of pneumonia cases in Wuhan, China. These cases were linked to a large seafood and animal market in Wuhan, where local government agencies quickly adopted sanitation and disinfection measures. It is now known that they were caused by a novel betacoronavirus, the 2019 novel coronavirus (2019-nCoV). [3-12] The genome and amino acid sequences were subsequently released by several research groups in different countries. [13-15] The genome of the newly discovered CoV consists of a single, positive-stranded RNA about 30 000 nucleotides long, and its overall organization is similar to that of other coronaviruses.
The newly sequenced genome encodes the open reading frames (ORFs) common to all betacoronaviruses: ORF1ab, which encodes several enzymatic proteins; the spike surface glycoprotein (S protein); the small envelope protein (E protein); the matrix protein (M protein); and the nucleocapsid protein (N protein), as well as several nonstructural accessory proteins. Specific primers and probes for detecting 2019-nCoV have been synthesized against genetic targets such as the ORF1ab and N gene regions. According to the gene sequencing data, Wu [...] terrupted. [17-21] Computational biology and bioinformatics are expected to play an important role in the response to the new coronavirus. [22, 23] Predicted structures of the structural and nonstructural proteins of 2019-nCoV should be put to use by more researchers, especially those engaged in drug research and development, because three-dimensional models of these proteins make it possible to screen for potential drugs against 2019-nCoV. In this study, using the amino acid sequences released on NCBI, we attempt to identify the closest homologous sequences and the most suitable existing templates for homology modeling of several crucial 2019-nCoV proteins, which should benefit subsequent drug-screening research.

The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. [24-29] It compares nucleotide or protein sequences against sequence databases and calculates the statistical significance of the matches. In this study, the maximum number of aligned sequences to display was set to 100, the expect threshold (the expected number of chance matches in a random model) was set to 10, and the word size (the length of the seed that initiates an alignment) was set to 6.
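The two BLAST settings above, word size and expect threshold, can be illustrated with a minimal Python sketch. It is not BLAST's implementation: the seeding step looks only for exact shared words of length 6 (blastp actually seeds on neighbourhood words scoring above a threshold), and the E-value uses the Karlin-Altschul relation E = K·m·n·e^(−λS) on raw lengths, without BLAST's effective-length corrections. The λ and K constants are assumed illustrative values for BLOSUM62 with gap costs 11/1; a real BLAST report prints the values it used.

```python
import math

# Assumed illustrative Karlin-Altschul parameters (BLOSUM62, gap costs 11/1).
LAMBDA = 0.267
K = 0.041


def seed_hits(query: str, subject: str, word_size: int = 6) -> list[tuple[int, int]]:
    """Exact shared words of length `word_size` as (query, subject) 0-based offsets.

    Simplification: blastp seeds on neighbourhood words, not only exact matches.
    """
    index: dict[str, list[int]] = {}
    for s in range(len(subject) - word_size + 1):
        index.setdefault(subject[s:s + word_size], []).append(s)
    hits = []
    for q in range(len(query) - word_size + 1):
        for s in index.get(query[q:q + word_size], []):
            hits.append((q, s))
    return hits


def bit_score(raw_score: float, lam: float = LAMBDA, k: float = K) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score: float, query_len: int, db_len: int,
            lam: float = LAMBDA, k: float = K) -> float:
    """Expected chance hits scoring >= S: E = K * m * n * exp(-lambda * S).

    Raw lengths are used; BLAST itself corrects for edge effects.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


if __name__ == "__main__":
    print(seed_hits("AAMFVFLVAA", "CCMFVFLVCC"))  # [(2, 2)]: one shared 6-mer
    e = e_value(raw_score=120, query_len=1200, db_len=10_000_000)  # example lengths
    print(f"E-value {e:.3g}, bit score {bit_score(120):.1f}")
```

Because E grows linearly with database size and falls exponentially with score, an expect threshold of 10 is permissive: it keeps weak hits for inspection, which suits a template search where distant homologs (such as the MERS-CoV proteins) are still informative.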
SWISS-MODEL is a fully automated protein structure homology-modeling server, accessible via the ExPASy web server or from the program DeepView (Swiss-PdbViewer). [30-33] Its purpose is to make protein modeling accessible to life science researchers worldwide. Building a homology model comprises four main steps: identification of structural template(s), alignment of the target sequence with the template structure(s), model building, and model quality evaluation. Clustal Omega is the latest addition to the Clustal family. [34, 35] It is a multiple sequence alignment program that uses seeded guide trees and HMM profile-profile techniques to align three or more sequences, and it produces biologically meaningful multiple sequence alignments of divergent sequences. In this study, we used these three popular online tools, BLAST, SWISS-MODEL, and Clustal Omega, to perform sequence alignment analyses on the 10 primary sequences of 2019-nCoV. The main findings on structure prediction of the proteins encoded by ORF1ab by homology modeling are summarized in Table 1.

Here, we explored the rationality and feasibility of homology modeling of the other proteins. According to the BLAST search results, the SARS spike glycoprotein is a good template for [...] 2019-nCoV S protein is about 76%, and the query coverage is 95%; the sequence identity between the MERS S protein (PDB ID: 5X59) and the 2019-nCoV S protein is about 34%, and the query coverage is 74% (see Figures S63 and S64). SWISS-MODEL identified 6ACD_C as a potential template for modeling the 2019-nCoV S protein (see Figure S65), and the homology model of the S protein could then be built (see Figure S66). Next, we attempted homology modeling of the ORF3a protein.
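The sequence identity and query coverage figures used above to compare candidate templates can be computed from a pairwise alignment as in the following Python sketch. It assumes one common convention (identity over all alignment columns, gaps included; coverage as aligned query residues over full query length), and the ranking rule with its 30% identity and 50% coverage cut-offs is a hypothetical heuristic for illustration, not the scoring that BLAST or SWISS-MODEL applies.

```python
from dataclasses import dataclass


def identity_and_coverage(aligned_query: str, aligned_template: str,
                          query_length: int) -> tuple[float, float]:
    """Percent identity and query coverage for one pairwise alignment.

    Identity = identical columns / all alignment columns (gap columns included).
    Coverage = query residues inside the alignment / full query length.
    """
    if not aligned_query or len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must be non-empty and equally long")
    identical = sum(1 for q, t in zip(aligned_query, aligned_template)
                    if q == t and q != "-")
    identity = 100.0 * identical / len(aligned_query)
    query_residues = sum(1 for q in aligned_query if q != "-")
    coverage = 100.0 * query_residues / query_length
    return identity, coverage


@dataclass(frozen=True)
class TemplateHit:
    label: str
    identity: float  # percent
    coverage: float  # percent


def rank_templates(hits: list[TemplateHit], min_identity: float = 30.0,
                   min_coverage: float = 50.0) -> list[TemplateHit]:
    """Drop hits below hypothetical cut-offs, then prefer identity, then coverage."""
    usable = [h for h in hits
              if h.identity >= min_identity and h.coverage >= min_coverage]
    return sorted(usable, key=lambda h: (h.identity, h.coverage), reverse=True)


if __name__ == "__main__":
    print(identity_and_coverage("MKT-LV", "MRTALV", query_length=10))
    hits = [
        TemplateHit("MERS-CoV S (PDB 5X59)", identity=34.0, coverage=74.0),
        TemplateHit("SARS-CoV S", identity=76.0, coverage=95.0),
    ]
    for hit in rank_templates(hits):
        print(hit)
```

With the S protein values reported above (about 76% identity and 95% coverage for the SARS-CoV template versus 34% and 74% for MERS-CoV 5X59), this heuristic ranks the SARS-CoV template first, consistent with the choice made in the text.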
Although the 2019-nCoV and SARS-CoV ORF3a proteins are highly homologous (see Figure S67), the crystal structure of this protein is not yet known (see Figure S68). Fortunately, the structure of the template protein for modeling the 2019-nCoV E protein has been determined experimentally (see Figure S69). SWISS-MODEL found this template (see Figure S70), with a sequence identity above 90% (see Figure S71). Unfortunately, no crystal structure of a template protein is available for modeling the membrane glycoprotein or the ORF6 protein (see Figures S72-S75). The structure of the template protein for the ORF7a protein, however, has been determined experimentally (see Figure S76); SWISS-MODEL found this template (see Figure S77), again with a sequence identity above 90% (see Figure S78). There is currently no suitable template for homology modeling of the ORF8 protein (see Figures S79 and S80). We then modeled the N protein, which had to be divided into two parts that were modeled independently (see Figure S81); the sequence identity is above 90% (see Figures S82 and S83). The ORF10 protein is a short peptide, and no template is currently available for it. The main findings on structure prediction of the other 2019-nCoV proteins by homology modeling are summarized in Table 2.

Coronavirus replication is initiated by the binding of the S protein to cell-surface receptors. The S protein consists of two functional subunits: S1 (globule), which is responsible for receptor binding, and S2 (stem), which is responsible for membrane fusion. The specific interaction between S1 and its receptor triggers a conformational change in the S2 subunit, leading to fusion of the viral envelope with the cell membrane and release of the nucleocapsid into the cytoplasm.
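For the N protein, which had to be split into two parts modeled independently, the decision of which template covers which stretch of the sequence can be sketched as a greedy interval selection in Python. Everything here is hypothetical: the template labels, residue ranges, identities, and the 419-residue query length are illustrative inputs, not the hits behind Figures S81-S83, and the greedy rule (take the highest-identity hit, then any non-overlapping hit) is one simple strategy rather than the procedure SWISS-MODEL follows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    template: str    # hypothetical template label
    start: int       # first query residue covered (1-based, inclusive)
    end: int         # last query residue covered (1-based, inclusive)
    identity: float  # percent identity over the covered region


def pick_segment_templates(hits: list[Hit]) -> list[Hit]:
    """Greedy: highest-identity hit first, then any hit not overlapping a chosen one.

    Returns the chosen hits ordered along the query sequence.
    """
    chosen: list[Hit] = []
    for hit in sorted(hits, key=lambda h: h.identity, reverse=True):
        if all(hit.end < c.start or hit.start > c.end for c in chosen):
            chosen.append(hit)
    return sorted(chosen, key=lambda h: h.start)


def uncovered_regions(chosen: list[Hit], query_length: int) -> list[tuple[int, int]]:
    """Query ranges (1-based, inclusive) left without any template."""
    gaps = []
    next_free = 1
    for hit in chosen:  # assumes `chosen` is sorted and non-overlapping
        if hit.start > next_free:
            gaps.append((next_free, hit.start - 1))
        next_free = max(next_free, hit.end + 1)
    if next_free <= query_length:
        gaps.append((next_free, query_length))
    return gaps


if __name__ == "__main__":
    hits = [
        Hit("template_A", 45, 180, 91.0),
        Hit("template_B", 250, 365, 93.0),
        Hit("template_C", 100, 300, 40.0),  # spans both but with low identity
    ]
    segments = pick_segment_templates(hits)
    print([h.template for h in segments])    # ['template_A', 'template_B']
    print(uncovered_regions(segments, 419))  # [(1, 44), (181, 249), (366, 419)]
```

Preferring two high-identity local templates over one long low-identity hit mirrors the trade-off described for the N protein; the reported uncovered ranges are the parts that such a two-part model would leave unmodeled.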
Receptor binding largely determines the host range and tissue tropism of coronaviruses. The S protein should therefore be a critical candidate target for drug screening or vaccine design against 2019-nCoV. [36, 37] Cartoon representations of the proteins modeled in this section are shown in Figure 2. The ribbons are predominantly red, indicating some degree of steric strain and atomic clashes in these structures. Consequently, before these models are used as targets for computer-aided drug design, their conformations should be thoroughly optimized by molecular mechanics or molecular dynamics.

TABLE 1 Initial reference templates, final recommended templates, and sequence identity information for homology modeling of the ORF1ab-related proteins

[...] common to all β-coronaviruses, including ORF1ab, which encodes many enzymatic proteins, the spike surface glycoprotein (S protein), the small envelope protein (E protein), the matrix protein (M protein), and the nucleocapsid protein (N protein), as well as several nonstructural proteins. [43] 2019-nCoV shares higher sequence homology with SARS-CoV than with MERS-CoV.
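The atomic clashes attributed to the Figure 2 models can be screened for before minimization with a simple van der Waals overlap check, sketched here in Python. The radii are Bondi values for heavy atoms; the 0.4 Å overlap cut-off and the rule of skipping atoms in the same or adjacent residues (a crude stand-in for excluding covalently bonded neighbours) are assumptions of this sketch, not the criteria used in the paper. The pairwise loop is O(n²), so a real protein would need a spatial grid or a dedicated validation tool.

```python
import math
from dataclasses import dataclass
from itertools import combinations

# Bondi van der Waals radii (angstrom); hydrogens are not handled in this sketch.
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


@dataclass(frozen=True)
class Atom:
    residue: int
    name: str
    element: str
    xyz: tuple[float, float, float]


def find_clashes(atoms: list[Atom], overlap_cutoff: float = 0.4,
                 min_residue_gap: int = 2) -> list[tuple[Atom, Atom, float]]:
    """Atom pairs whose van der Waals spheres overlap by >= overlap_cutoff (angstrom).

    Pairs in the same or adjacent residues are skipped to ignore bonded
    neighbours; this crude rule also hides genuine intra-residue clashes.
    """
    clashes = []
    for a, b in combinations(atoms, 2):
        if abs(a.residue - b.residue) < min_residue_gap:
            continue
        overlap = VDW_RADII[a.element] + VDW_RADII[b.element] - math.dist(a.xyz, b.xyz)
        if overlap >= overlap_cutoff:
            clashes.append((a, b, overlap))
    return clashes


if __name__ == "__main__":
    atoms = [
        Atom(1, "CA", "C", (0.0, 0.0, 0.0)),
        Atom(2, "CA", "C", (1.5, 0.0, 0.0)),
        Atom(10, "O", "O", (2.5, 0.0, 0.0)),
        Atom(20, "N", "N", (10.0, 0.0, 0.0)),
    ]
    for a, b, overlap in find_clashes(atoms):
        print(f"res {a.residue} {a.name} / res {b.residue} {b.name}: overlap {overlap:.2f} A")
```

A model that reports many such overlaps is a candidate for the molecular mechanics or molecular dynamics optimization recommended above; re-running the check afterwards gives a quick before-and-after comparison.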
We demonstrated that they are likely to share considerable sequence [...]

FIGURE 2 Cartoon-displayed models of proteins

Identification of a novel coronavirus in patients with severe acute respiratory syndrome
Isolation of a novel coronavirus from a man with pneumonia in Saudi Arabia
Clinical features of patients infected with 2019 novel coronavirus in Wuhan
A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster
A novel coronavirus outbreak of global health concern
Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study
A novel coronavirus from patients with pneumonia in China
Medical journals and the 2019-nCoV outbreak
A novel coronavirus emerging in China-key questions for impact assessment
Another decade, another coronavirus
Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia
Evolution of the novel coronavirus from the ongoing Wuhan outbreak and modeling of its spike protein for risk of human transmission
Can an anti-HIV combination or other existing drugs outwit the new coronavirus? Science
First case of 2019 novel coronavirus in the United States
Therapeutic efficacy of the small molecule GS-5734 against Ebola virus in rhesus monkeys
Broad-spectrum antiviral GS-5734 inhibits both epidemic and zoonotic coronaviruses
Remdesivir (GS-5734) protects African green monkeys from Nipah virus challenge
COVID-2019: the role of the nsp2 and nsp3 in its pathogenesis
Composition and divergence of coronavirus spike proteins and host ACE2 receptors predict potential intermediate hosts of SARS-CoV-2
Basic local alignment search tool
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
Domain enhanced lookup time accelerated BLAST
BLAST: at the core of a powerful and diverse set of sequence analysis tools
BLAST: improvements for better sequence analysis
NCBI BLAST: a better web interface
Automated comparative protein structure modeling with SWISS-MODEL and Swiss-PdbViewer: a historical perspective
Toward the estimation of the absolute quality of individual protein structure models
The SWISS-MODEL Repository-new features and functionality
SWISS-MODEL: homology modelling of protein structures and complexes
Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega
Clustal Omega for making accurate alignments of many protein sequences
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
Structural basis for the recognition of the SARS-CoV-2 by full-length human ACE2
The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak
Coronavirus infections-more than just the common cold
Supporting the health care workforce during the COVID-19 global epidemic
From containment to mitigation of COVID-19 in the US
Critical care utilization for the COVID-19 outbreak in Lombardy, Italy: early experience and forecast during an emergency response
A new coronavirus associated with human respiratory disease in China
A pneumonia outbreak associated with a new coronavirus of probable bat origin