key: cord-1038286-09ots6oj authors: Woo, Patrick C.Y.; Huang, Yi; Lau, Susanna K.P.; Tsoi, Hoi‐wah; Yuen, Kwok‐yung title: In Silico Analysis of ORF1ab in Coronavirus HKU1 Genome Reveals a Unique Putative Cleavage Site of Coronavirus HKU1 3C‐Like Protease date: 2013-11-14 journal: Microbiol Immunol DOI: 10.1111/j.1348-0421.2005.tb03681.x sha: 53c5dd5166b71eb75d2d7850e40aff428e91156e doc_id: 1038286 cord_uid: 09ots6oj Recently we have described the discovery and complete genome sequence of a novel coronavirus associated with pneumonia, coronavirus HKU1 (CoV‐HKU1). In this study, a detailed in silico analysis of the ORF1ab, encoding the 7,182‐amino acid replicase polyprotein in the CoV‐HKU1 genome showed that the replicase polyprotein of CoV‐HKU1 is cleaved by its papain‐like proteases and 3C‐like protease (3CL(pro)) into 16 polypeptides homologous to the corresponding polypeptides in other coronaviruses. Surprisingly, analysis of the putative cleavage sites of the 3CL(pro) revealed a unique putative cleavage site. In all known coronaviruses, the P1 positions at the cleavage sites of the 3CL(pro) are occupied by glutamine. This is also observed in CoV‐HKU1, except for one site at the junction between nsp10 (helicase) and nsp11 (member of exonuclease family), where the P1 position is occupied by histidine. This amino acid substitution is due to a single nucleotide mutation in the CoV‐HKU1 genome, CAG/A to CAT. This probably represents a novel cleavage site because the same mutation was consistently observed in CoV‐HKU1 sequences from multiple specimens of different patients; the P2 and P1′‐P12′ positions of this cleavage site are consistent between CoV‐HKU1 and other coronaviruses; and as the helicase is one of the most conserved proteins in coronaviruses, cleavage between nsp10 and nsp11 should be an essential step for the generation of the mature functional helicase. 
Experiments, including purification and C‐terminal amino acid sequencing of the CoV‐HKU1 helicase and trans‐cleavage assays of the CoV‐HKU1 3CL(pro), will confirm the presence of this novel cleavage site.

In early 2004, a novel human coronavirus associated with upper respiratory tract infections, human coronavirus NL63 (HCoV-NL63), was also discovered (11, 47). Phenotypically, the envelopes of all coronaviruses are studded with long, petal-shaped spikes, which give the virions the appearance of a crown under the electron microscope. Genotypically, coronaviruses possess the largest genomes of all RNA viruses, at about 30 kb. As a result of a unique mechanism of viral replication, coronaviruses have a high frequency of recombination (24, 25, 29, 31). In all coronavirus genomes, the genes that encode the RNA-dependent RNA polymerase (pol) and the four structural proteins are arranged in the order 5'-pol, spike (S), envelope (E), membrane (M), nucleocapsid (N)-3'. Based on serological and genotypic characterization, coronaviruses were divided into three distinct groups, with HCoV-229E and HCoV-NL63 being group 1 coronaviruses and HCoV-OC43 a group 2 coronavirus (11, 26, 47). Recently, we described the discovery of a novel group 2 coronavirus associated with pneumonia, coronavirus HKU1 (CoV-HKU1), and its complete genome sequence (48-50). In CoV-HKU1, more than two thirds of the genome is made up of a single open reading frame (ORF), ORF1ab, which encodes the putative replicase polyprotein. In other coronaviruses, this polyprotein, after cleavage by proteases encoded by this ORF, gives rise to more than 10 proteins, many of which are enzymes essential for the survival of the virus. In this article, we describe a detailed in silico analysis of this ORF in CoV-HKU1, which reveals a unique putative cleavage site of the 3C-like protease (3CLpro) in CoV-HKU1.
Other similarities and differences of this ORF in CoV-HKU1 as compared to other coronaviruses are also discussed.

The predicted amino acid sequences of ORF1ab in CoV-HKU1 were extracted from the CoV-HKU1 genome sequence (GenBank accession no. NC_006577) (48). The corresponding amino acid sequences of ORF1ab in other coronaviruses were extracted from GenBank (HCoV-OC43, GenBank accession no. NP_937947; BCoV, GenBank accession no. NP_150073; MHV A59, GenBank accession no. P16342; HCoV-229E, GenBank accession no. NP_073549; IBV, GenBank accession no. NP_066134; and SARS-CoV, GenBank accession no. NP_828849). The amino acid sequences of the RNA-dependent RNA polymerases of hepatitis C virus, rabbit hemorrhagic disease virus and poliovirus 1, and their secondary structures, were retrieved from the Protein Data Bank (PDB ID 1QUV, 1KHV and 1RDR respectively).

In silico analysis. Multiple alignment was performed using ClustalX 1.83 (46). Protein family analysis was performed using PFAM (2). Secondary structure prediction was performed using PROFsec (38). Three-dimensional modeling of the 3CLpro of CoV-HKU1 was performed using the 3CLpro of SARS-CoV as the template (54). The manually corrected sequence alignment and homology modeling requirement were submitted via DeepView (spdbv 3.7) (14) to the SWISS-MODEL protein modeling web server (http://swissmodel.expasy.org/) (40). The 3D structure was displayed using MOLMOL 2K.2 (22). Phylogenetic tree construction was performed using the neighbour-joining method with ClustalX 1.83.

The replicase polyprotein (7,182 amino acids) of CoV-HKU1 is translated from ORF1a (13,395 nt) and ORF1b (8,154 nt). Similar to other coronaviruses, a slippery sequence (UUUAAAC), followed by sequences that form a putative pseudoknot structure, is present (Fig. 1) (5, 6). Translation will presumably occur by a -1 RNA-mediated ribosomal frameshift at the end of the slippery sequence.
Therefore, instead of reading the transcript as UUUAAACGGG, the ribosome will read it as UUUAAACCGGG, with the final C of the slippery sequence read twice. In infectious bronchitis virus (IBV) and SARS-CoV, site-directed mutagenesis experiments have confirmed the importance of the slippery sequence as well as the pseudoknot structure for the frameshift to occur (7, 45).

Multiple alignments between the replicase polyprotein of CoV-HKU1 and those of other coronaviruses reveal the orthologs of two types of proteases, papain-like proteases (PLpro) and 3CLpro. As in other coronaviruses, multiple alignment and analysis of sequences around consensus cleavage sites indicate that the replicase polyprotein of CoV-HKU1 is cleaved by its PLpro and 3CLpro into 16 polypeptides (Table 1). The three N-terminal cleavage sites are putatively cleaved by the PLpro, whereas the others are putatively cleaved by the 3CLpro.

The putative PLpro cleavage sites in the replicase polyprotein of CoV-HKU1 follow the general rules determined by site-directed mutagenesis experiments in other group 2 coronaviruses. In MHV, site-directed mutagenesis studies have shown that at the p28/p65 cleavage site, a basic residue (arginine or lysine) in the P5 position, arginine in the P2 position, and glycine in the P1 position are the critical amino acids for PL1pro recognition and processing (17). In CoV-HKU1, arginine, arginine and glycine are present in these three positions respectively (Table 2). In MHV, site-directed mutagenesis studies have shown that at the p65/nsp1 cleavage site, arginine in the P5 position, alanine in the P1 position, and glycine in the P1' position are the critical amino acids for PL1pro recognition and processing (4). In CoV-HKU1, arginine, alanine and glycine are present in these three positions respectively (Table 2).
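The -1 ribosomal frameshift described earlier in this section can be illustrated with a short sketch. It is only an illustration under stated assumptions, not the method used in this study: the function name `minus_one_frameshift` is hypothetical, the only input is the 10-nucleotide stretch UUUAAACGGG quoted in the text, and the shift is assumed to occur exactly at the end of the first occurrence of the slippery sequence, so that its last nucleotide is read a second time.

```python
SLIPPERY = "UUUAAAC"  # slippery sequence reported in the text


def minus_one_frameshift(rna: str, slippery: str = SLIPPERY) -> str:
    """Return the sequence as effectively read after a -1 frameshift.

    Assumption (illustrative): the ribosome slips back by one nucleotide
    at the end of the first occurrence of the slippery sequence, so the
    last nucleotide of that sequence is read twice.
    """
    start = rna.find(slippery)
    if start == -1:
        raise ValueError("slippery sequence not found")
    end = start + len(slippery)
    # Everything up to the end of the slippery sequence, then resume
    # reading one nucleotide earlier (re-reading rna[end - 1]).
    return rna[:end] + rna[end - 1:]


print(minus_one_frameshift("UUUAAACGGG"))  # UUUAAACCGGG
```

The output reproduces the two readings given in the text: the transcript UUUAAACGGG is effectively read as UUUAAACCGGG.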
In MHV, site-directed mutagenesis studies have shown that at the nsp1/p44 cleavage site, phenylalanine in the P6 position, glycine in the P2 position, and glycine in the P1 position are the critical amino acids for PL2pro recognition and processing (20). In CoV-HKU1, phenylalanine, glycine and glycine are present in these three positions respectively (Table 2).

In contrast to the putative PLpro cleavage sites, analysis of the putative 3CLpro cleavage sites in the replicase polyprotein of CoV-HKU1 revealed a unique putative cleavage site. The 3CLpro of coronaviruses are so named because their structures and substrate specificities resemble those of the 3C proteases of picornaviruses. In all known coronaviruses, the P1 positions at the 3CLpro cleavage sites are occupied by glutamine, whereas the P1' positions are occupied by aliphatic amino acids, including alanine, glycine, cysteine, asparagine and serine (3, 9, 16, 23). In most cases, the P1' positions are occupied by alanine, glycine or serine. Occasionally, in group 2 coronaviruses, the P1' positions are occupied by cysteine (30); and in rhinoviruses, by asparagine (3). In CoV-HKU1, multiple alignment with the genome sequences of other coronaviruses revealed that the P1 positions of all but one of the putative 3CLpro cleavage sites are occupied by glutamine. The exception lies at the junction between nsp10 (helicase) and nsp11 (member of exonuclease family), where the P1 position is occupied by histidine. This probably represents a novel cleavage site, for the following reasons. First, the sequence is genuine rather than the result of RT-PCR or sequencing errors, because the same position has been amplified and sequenced four times in different clinical specimens of the index patient (48). Second, the same mutation was consistently found in the genome sequence of the CoV-HKU1 in the second patient with pneumonia (data not shown).
Therefore, it is unlikely to be a sporadic mutation that occurred in just one patient. Third, in all other group 2 coronaviruses with genome sequences available (MHV, BCoV and HCoV-OC43), the P2 and P1' to P12' positions of this cleavage site are occupied by leucine/valine [shown to be important for 3CLpro cleavage (35)], cysteine, serine/threonine, threonine, asparagine, leucine, phenylalanine, lysine, aspartic acid, cysteine, serine, lysine/arginine and serine respectively, and the same residues are present at the corresponding positions in CoV-HKU1 (Fig. 2). Fourth, this amino acid substitution is probably due to a single nucleotide mutation in the CoV-HKU1 genome, from CAG (as in MHV) or CAA (as in BCoV and HCoV-OC43), both of which encode glutamine, to CAT, which encodes histidine. Finally, as the helicase is one of the most conserved proteins in coronaviruses, cleavage between nsp10 and nsp11 should be an essential step for the generation of the mature functional helicase. The present novel putative cleavage site, as well as some other atypical 3CLpro cleavage sites, was not predicted by a program for prediction of 3CLpro cleavage sites (21). Experiments, including purification of the CoV-HKU1 helicase in its native form followed by C-terminal amino acid sequencing, and trans-cleavage assays of the CoV-HKU1 3CLpro, will confirm the presence of this novel cleavage site.

The replicase polyprotein of CoV-HKU1 is putatively cleaved by its PLpro and 3CLpro into 16 polypeptides (Table 1). Among these 16 putative polypeptides, p28, p65, nsp3, nsp4, nsp5, nsp6, nsp7 and nsp8 have no homologues with known functions by BLAST search. Similar to other coronaviruses, the N terminal of the putative nsp1 consists of an acidic domain (amino acids 1-333 of nsp1). However, unlike other coronaviruses, there are 16 tandem copies of a 10-amino acid repeat near the C terminal of the acidic domain.
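The single nucleotide change proposed above for the P1 position of the nsp10/nsp11 cleavage site (CAG or CAA to CAT) can be checked with a few lines of code. The sketch below is illustrative only: the helper names are hypothetical, and the codon table is deliberately restricted to the four codons of the standard genetic code that matter here (CAA and CAG for glutamine, CAT and CAC for histidine).

```python
# Subset of the standard genetic code (DNA alphabet); only the codons
# needed for this check are included.
CODON_TABLE = {"CAA": "Gln", "CAG": "Gln", "CAT": "His", "CAC": "His"}


def nucleotide_differences(codon_a: str, codon_b: str) -> int:
    """Count the positions at which two equal-length codons differ."""
    if len(codon_a) != len(codon_b):
        raise ValueError("codons must have equal length")
    return sum(a != b for a, b in zip(codon_a, codon_b))


def describe_substitution(from_codon: str, to_codon: str) -> tuple:
    """Return (amino acid before, amino acid after, nucleotide changes)."""
    return (
        CODON_TABLE[from_codon],
        CODON_TABLE[to_codon],
        nucleotide_differences(from_codon, to_codon),
    )


for from_codon in ("CAG", "CAA"):
    print(from_codon, "-> CAT:", describe_substitution(from_codon, "CAT"))
```

For both starting codons the result is a glutamine-to-histidine change that requires one nucleotide difference, at the third codon position.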
The first 14 tandem copies are perfect repeats of NDDENVVTGD, in which four of the 10 amino acids are either glutamic acid or aspartic acid. The last two copies are imperfect repeats, NNDEEIVTGD and NDDQIVVTGD respectively. In picornaviruses, three imperfect tandem copies of a 24-amino acid repeat, named VPg1, VPg2 and VPg3, have been found (10). However, no specific functions have been found for these tandem repeats. Downstream to the acidic domain is the PL1pro (amino acids 334-549 of nsp1), with the characteristic catalytic dyad of cysteine and a downstream histidine. Moreover, similar to the PL1pro of other coronaviruses, four conserved cysteine residues (Cys429, Cys432, Cys455 and Cys457) that may form a Zn2+-binding motif are present. Downstream to PL1pro is the X domain (amino acids 550-709 of nsp1), which contains putative ADP-ribose 1"-phosphatase (ADRP) activity [amino acids 575-685 of nsp1 belong to the Appr-1-p processing enzyme family (Pfam accession no. PF01661)]. The putative ADRP activity was first described for this domain by Snijder et al. (43). In Saccharomyces cerevisiae and other eukaryotes, ADRP and its functionally related enzyme cyclic nucleotide phosphodiesterase (CPDase) are important for tRNA processing (33). ADP-ribose 1",2"-cyclic phosphate (Appr>p) is produced as a result of tRNA splicing. Appr>p is converted to ADP-ribose 1"-phosphate (Appr-1"p) by CPDase. Appr-1"p is then further processed by ADRP. In other group 2 coronaviruses, both putative ADRP and CPDase (in NS2a) homologues are present (43). Interestingly, in CoV-HKU1 and also SARS-CoV, only ADRP is present, and CPDase is absent [NS2a is not present in CoV-HKU1 (48)]. Downstream to the X domain, there is a segment of 134 amino acids, also present in other coronaviruses, with no matches of known functions found by BLAST search.
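The composition of the 16 tandem copies of the 10-amino acid repeat in the acidic domain of nsp1, described earlier, can be tallied directly from the three repeat sequences given in the text. The following sketch is illustrative only: the function names are hypothetical, and the repeat region is assembled on the assumption that the 14 perfect copies are followed directly by the two imperfect copies in the order stated, with no intervening residues.

```python
PERFECT = "NDDENVVTGD"                   # perfect repeat unit from the text
IMPERFECT = ["NNDEEIVTGD", "NDDQIVVTGD"]  # last two (imperfect) copies
ACIDIC = set("DE")                        # aspartic acid and glutamic acid


def acidic_count(peptide: str) -> int:
    """Count aspartic acid (D) and glutamic acid (E) residues."""
    return sum(residue in ACIDIC for residue in peptide)


def mismatches(repeat: str, reference: str = PERFECT) -> int:
    """Count positions at which a repeat differs from the perfect unit."""
    if len(repeat) != len(reference):
        raise ValueError("repeat and reference must have equal length")
    return sum(a != b for a, b in zip(repeat, reference))


# Assumed layout: 14 perfect copies followed by the two imperfect copies.
repeat_region = PERFECT * 14 + "".join(IMPERFECT)

print(len(repeat_region))                          # 160
print(acidic_count(PERFECT))                       # 4
print({r: mismatches(r) for r in IMPERFECT})
```

Under these assumptions the repeat region spans 160 residues, each perfect unit contains four acidic residues as stated in the text, and the two imperfect copies differ from the perfect unit at three and two positions respectively.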
Downstream to this segment of unknown function is the putative PL2pro (amino acids 844-1140 of nsp1), with the characteristic catalytic dyad of cysteine and a downstream histidine. Similar to the PL2pro of other coronaviruses, four conserved cysteine residues (Cys1026, Cys1028, Cys1060 and Cys1062) that may form a Zn2+-binding motif are also present. At the C terminal of nsp1, similar to other coronaviruses, there is a Y domain with two hydrophobic stretches and 11 conserved Cys/His residues present in the N terminal 889 amino acids (30). Ziebuhr et al. speculated that this Y domain may be responsible for anchoring nsp1 into membranes and binding cations (55). Downstream to nsp1 is p44 (496 amino acid residues), which is putatively cleaved from nsp1 by PL2pro and contains a hydrophobic transmembrane domain.

Three-dimensional structure modeling of the putative 3CLpro (nsp2) of CoV-HKU1, using the three-dimensional structure of the SARS-CoV 3CLpro as the template, revealed that the monomer of the CoV-HKU1 3CLpro has a structure similar to those of the 3CLpro of SARS-CoV and HCoV-229E (Fig. 3). Three domains were present: domain I (amino acid residues 8-101), domain II (amino acid residues 102-184) and domain III (amino acid residues 201-303). Domains I and II each consist of a six-stranded antiparallel β barrel, and domain III is a globular structure with five α helices. The catalytic dyad of the protease is located in a cleft between domains I (His41) and II (Cys165).

As in other coronaviruses, the putative Pol (nsp9) of CoV-HKU1 consists of the N-terminal domain, the fingers subdomain, the palm subdomain and the thumb subdomain (Fig. 4). The N-terminal domain (residues 1-371) of the Pol of CoV-HKU1 shares 52-87% amino acid identities with those of other coronaviruses but <31% amino acid identities with those of other RNA viruses. The function of the N-terminal domain is unclear.
On the other hand, the fingers (motifs F and G), palm (motifs A-E) and thumb subdomains of the Pol of CoV-HKU1 are homologous to those of other coronaviruses and some positive-stranded RNA viruses, such as hepatitis C virus (of family Flaviviridae) (1), rabbit hemorrhagic disease virus (of family Caliciviridae) (34) and poliovirus 1 (of family Picornaviridae) (15) (Fig. 4). The critical amino acids, namely the aspartic acids in motif A and the XDD sequence in motif C, which have been demonstrated to be essential for the function of Pol in other coronaviruses and positive-stranded RNA viruses, are also present in the Pol of CoV-HKU1.

Similar to the helicase of SARS-CoV, the putative helicase (nsp10) of CoV-HKU1, which unwinds duplex RNA by nucleoside triphosphate hydrolysis activity, possesses six conserved motifs (18) (Fig. 5). Similar to other coronaviruses, based on the presence of conserved motifs, the putative helicase of CoV-HKU1 should belong to superfamily 1 of the three superfamilies of RNA helicases (12, 19). Furthermore, all nine cysteine residues and three histidine residues that are conserved in the N termini of the helicases in nidoviruses are present in the putative helicase of CoV-HKU1 (Fig. 5). It has been suggested that this cysteine/histidine region constitutes a Zn2+-binding domain, which controls the activities of the catalytic domain (42). Similar to the helicases of other nidoviruses and most other helicases of superfamily 1, the helicase of CoV-HKU1 probably unwinds the RNA duplex in a 5' to 3' fashion (41, 44).

The putative nsp11 possesses a putative 3'-to-5' exonuclease (ExoN) domain of the DEDD superfamily (56). The putative nsp12 possesses a putative poly(U)-specific endoribonuclease (XendoU) domain (27). The putative nsp13 possesses a putative 2'-O-methyltransferase (2'-O-MT) domain (8). These three putative enzymes, as well as ADRP and CPDase, are enzymes in RNA processing pathways.
ExoN, XendoU and 2'-O-MT are enzymes in a small nucleolar RNA processing and utilization pathway, in contrast to the pre-tRNA splicing pathway that ADRP and CPDase belong to. At the moment, no experiments have been performed to confirm the activities of these putative RNA processing enzymes. Further studies are required to elucidate the exact roles of these putative viral enzymes in the corresponding viruses.

Crystal structure of the RNA-dependent RNA polymerase of hepatitis C virus
The Pfam protein families database
Cleavage site analysis in picornaviral polyproteins: discovering cellular targets by neural networks
Characterization of a second cleavage site and demonstration of activity in trans by the papain-like proteinase of the murine coronavirus mouse hepatitis virus strain A59
Completion of the sequence of the genome of the coronavirus avian infectious bronchitis virus
The primary structure and expression of the second open reading frame of the polymerase gene of the coronavirus MHV-A59; a highly conserved polymerase is expressed by an efficient ribosomal frameshifting mechanism
Characterization of an efficient coronavirus ribosomal frameshifting signal: requirement for an RNA pseudoknot
RNA methylation under heat shock control
Expression of virus-encoded proteinases: functional and structural similarities with cellular enzymes
A tandem repeat gene in a picornavirus
A previously undescribed coronavirus associated with respiratory disease in humans
Helicases: amino acid sequence comparisons and structure-function relationships
Isolation and characterization of viruses related to the SARS coronavirus from animals in southern China
SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modelling
Structure of the RNA-dependent RNA polymerase of poliovirus
Conservation of substrate specificities among coronavirus main proteases
Identification of the murine coronavirus p28 cleavage site
Multiple enzymatic activities associated with severe acute respiratory syndrome coronavirus helicase
Virus-encoded RNA helicases
Identification of the murine coronavirus MP1 cleavage site recognized by papain-like proteinase 2
Coronavirus 3CLpro proteinase cleavage sites: possible relevance to SARS virus pathology
MOLMOL: a program for display and analysis of macromolecular structures
Viral proteinases
Sequence evidence for RNA recombination in field isolates of avian coronavirus infectious bronchitis virus
Recombination between nonsegmented RNA genomes of murine coronaviruses
The molecular biology of coronaviruses
Purification, cloning, and characterization of XendoU, a novel endoribonuclease involved in processing of intron-encoded small nucleolar RNAs in Xenopus laevis
Detection of severe acute respiratory syndrome (SARS) coronavirus nucleocapsid protein in SARS patients by enzyme-linked immunosorbent assay
Evidence of genetic diversity generated by recombination among avian coronavirus IBV
The complete sequence (22 kilobases) of murine coronavirus gene 1 encoding the putative proteases and RNA polymerase
High-frequency RNA recombination of murine coronaviruses
Characterization of the Saccharomyces cerevisiae cyclic nucleotide phosphodiesterase involved in the metabolism of ADP-ribose 1",2"-cyclic phosphate
Crystal structures of active and inactive conformations of a caliciviral RNA-dependent RNA polymerase
Picornaviral processing: some new ideas
Coronavirus as a possible cause of severe acute respiratory syndrome
Clinical progression and viral load in a community outbreak of coronavirus-associated SARS pneumonia: a prospective study
The PredictProtein server
SWISS-MODEL: an automated protein homology-modeling server
The human coronavirus 229E superfamily 1 helicase has RNA and DNA duplex-unwinding activities with 5'-to-3' polarity
A complex zinc finger controls the enzymatic activities of nidovirus helicases
Unique and conserved features of genome and proteome of SARS-coronavirus, an early split-off from the coronavirus group 2 lineage
The severe acute respiratory syndrome NTPase/helicase belongs to a distinct class of 5' to 3' viral helicases
Mechanisms and enzymes involved in SARS coronavirus genome expression
The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools
Identification of a new human coronavirus
Characterization and complete genome sequence of a novel coronavirus, coronavirus HKU1, from patients with pneumonia
Clinical features and molecular epidemiology of coronavirus HKU1 associated community-acquired pneumonia
Phylogenetic and recombination analysis of coronavirus HKU1, a novel coronavirus from patients with pneumonia
Relative rates of non-pneumonic SARS coronavirus infection and SARS coronavirus pneumonia
Detection of specific antibodies to severe acute respiratory syndrome (SARS) coronavirus nucleocapsid protein for serodiagnosis of SARS coronavirus pneumonia
Longitudinal profile of immunoglobulin G (IgG), IgM, and IgA antibodies against the severe acute respiratory syndrome (SARS) coronavirus nucleocapsid protein in patients with pneumonia due to the SARS coronavirus
The crystal structures of severe acute respiratory syndrome virus main protease and its complex with an inhibitor
The autocatalytic release of a putative RNA virus transcription factor from its polyprotein precursor involves two paralogous papain-like proteases that cleave the same peptide bond
Exoribonuclease superfamilies: structural analysis and phylogenetic distribution