key: cord-0736050-ok5fi567
authors: Parvez, Mohammad Khalid; Khan, Azmat Ali
title: Molecular modeling and analysis of hepatitis E virus (HEV) papain-like cysteine protease
date: 2014-01-22
journal: Virus Res
DOI: 10.1016/j.virusres.2013.11.016
sha: 3b5c406e95bdc2cd2dd35261b0737152cb7534c6
doc_id: 736050
cord_uid: ok5fi567

The biochemical or biophysical characterization of a papain-like cysteine protease in the HEV ORF1-encoded polyprotein still remains elusive. Very recently, we demonstrated the indispensability of ORF1 protease-domain cysteines and histidines in HEV replication, ex vivo (Parvez, 2013). In this report, the polyprotein partial sequences of HEV strains and genetically related RNA viruses were analyzed in silico. Employing the consensus-prediction results of the RUBV-p150 protease as the structural template, a 3D model of HEV-protease was deduced. Similar to RUBV-p150, a ‘papain-like β-barrel fold’ structurally confirmed the classification of HEV-protease. Further, we recognized a catalytic ‘Cys434-His443’ dyad homologue of RUBV-p150 (Cys1152-His1273) and FMDV-Lpro (Cys51-His148), in line with our previous mutational analysis that showed the essentiality of ‘His443’ but not ‘His590’ in HEV viability. Moreover, an HEV equivalent of the RUBV ‘Zn2+-binding motif’ (Cys1167-Cys1175-Cys1178-Cys1225-Cys1227) was identified as the ‘Cys457-His458-Cys459’ and ‘Cys481-Cys483’ residues within the ‘β-barrel fold’. Notably, unlike in RUBV, ‘His458’ also clustered therein, which was in conformity with the consensus cysteine protease ‘Zn2+-binding motif’. By homology, we also proposed an overlapping ‘Ca2+-binding site’ with the ‘D-X-[DNS]-[ILVFYW]-[DEN]-G-[GP]-XX-DE’ signature, and a ‘proline-rich motif’-interacting ‘tryptophan (W437-W472)’ module in the modeled structure. Our analysis of the predicted model therefore suggests critical roles of the ‘catalytic dyad’ and ‘divalent metal-binding motifs’ in HEV protease structural integrity, ORF1 self-processing, and RNA replication.
This, however, needs further experimental validation.

Proteases or proteinases are proteolytic enzymes that regulate biological processes in cellular organisms and viruses. Proteases are classified according to their catalytic site into four major classes: cysteine (thiol) proteases, serine proteases, aspartyl (acid) proteases, and metalloproteases. Of the 21 families of cysteine proteases discovered so far, almost half are encoded by viruses (Grzonka et al., 2001). The best characterized family of cysteine proteases is the 'papain' family, characterized by a two-domain structure that contains an active site (catalytic pocket) for substrate binding. Some RNA virus-encoded papain-like cysteine proteases with 'Cysteine(Cys)-Histidine(His)-Aspartate(Asp)' catalytic triads employ a structural 'Zn2+-binding motif' residing between the loops of the two β-barrel folds (Sommergruber et al., 1997; Yu and Lloyd, 1992). However, the foot-and-mouth disease virus (FMDV) leader protease (Lpro), one of the best characterized viral papain-like proteases (Ryan and Flint, 1997), as well as the rubella virus (RUBV) p150 (Chen et al., 1996) and the human coronavirus-229E (HCoV-229E) PL1pro (Herold et al., 1998), have shown a 'Cys-His' catalytic 'dyad' instead of a 'triad'. RNA-interacting Zn2+-binding proteins that are critical in virus replication include UL52 of herpes simplex virus type 1 (HSV-1) (Chen et al., 2005), the nucleocapsids of human immunodeficiency virus (HIV-1) and murine leukaemia virus (MLV) (Gorelick et al., 1996; Mark-Danieli et al., 2005), the V protein of Sendai virus (SENV) (Fukuhara et al., 2002), and the NS5A of hepatitis C virus (HCV) (Tellinghuisen et al., 2004). Of the several different 'Ca2+-binding motifs', the most common is a 'helix-loop-helix' structure referred to as the 'EF-hand'.
Viral 'EF-hand Ca2+-binding motifs' are also reported in the rotavirus VP7, HIV-1 gp41 and polyomavirus VP1 structural proteins, as well as in the RUBV nonstructural p150 (Zhou et al., 2007).

Hepatitis E virus (HEV), the etiological agent of acute and chronic hepatitis E, is a non-enveloped positive-sense RNA virus (Holla et al., 2013; Kamar et al., 2013). The HEV open reading frame 1 (ORF1) codes for a nonstructural polyprotein (∼186 kDa) wherein the validity of a papain-like cysteine protease domain (a.a. 434-590) (Fig. 1A) still remains a bottleneck in the understanding of its biology. Also, the expression and purification of HEV-protease, and therefore its biochemical or biophysical characterization, has not been achieved so far. Nevertheless, there is ample molecular evidence showing ORF1 polyprotein processing (Parvez, 2013; Karpe and Lole, 2011; Sehgal et al., 2006; Ropp et al., 2000; Ansari et al., 2000), along with a few contesting reports (Perttilä et al., 2013; Suppiah et al., 2011). Very recently, using the HEV-SAR55 replicon system, the indispensability of the ORF1 protease-domain 'catalytic' as well as X-domain 'protease-substrate' residues for HEV replication in hepatoma S10-3 cells was demonstrated (Parvez, 2013). With this supporting information and the known biology of genetically close RNA virus proteases, we proposed a three-dimensional (3D) structure of HEV-protease using homology modeling.

Fig. 1. [...] The active cysteines (red) and histidines (blue) are shown in the box, where 'X' represents the intervening residues. (B) Partial sequence alignment of proteases of HEV strains (n = 77), representing the four recognized viral genotypes 1, 2, 3 and 4 (GenBank accession numbers: AB074918, AB074920, AB089824, AY575857, AY575858, AY575859, AF082843, AF060669, AY115488, AB091394, AB222182, AB246676, AB222183, AB236320, AB189071, AB189072, AB189074, AB189073, AB189075, AB189070, AP003430, AB222184, AB073912, AB248521, AB248522, AB248520, AF455784, AB074917, AB220972, AB161719, AB161718, AB220973, AB220975, AB220978, AB220977, AB220979, AB220976, AB161717, AB074915, AB091395, AB200239, AB099347, AB193176, AB097811, AB193177, AB193178, AB097812, AB220971, AB080575, AB220974, AB108537, Q450072, AB197674, EF077630, AB197673, AY723745, AJ272108, AY594199, AB253420, DQ279091, AF028091, AF076239, AF459438, AF051830, X99441, D10330, M73218, AF444002, AF444003, L25547, L25595, L08816, M94177, D11093, X98292, AY230202 and M74506). Positions of the highly conserved cysteine (red), histidine (blue) and tryptophan (green) residues are indicated as per SAR55 (genotype 1). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

Protease-domain amino acid (a.a.) sequences of the four recognized HEV genotypes (1, 2, 3 and 4), represented by seventy-seven viral strains, together with RUBV-p150 and FMDV-Lpro, were analyzed using ClustalW 1.83 with a gap-open penalty of 10 and a gap-extension penalty of 0.5 (http://embnet.vital-it.ch/software/ClustalW.html) as well as MultAlin (www.multalin.toulouse.inra.fr/multalin). The program PSIPRED was used to predict the secondary structure of HEV-protease (target) (http://bioinf.cs.ucl.ac.uk/psipred). The PROFsec program was also used to predict the secondary structure elements helix (α and 3_10), β-strand (E = extended strand in β-sheet conformation of at least two residues length) and loop (L) in HEV-protease (https://www.predictprotein.org/get_results?req_id=453912). Notably, PROFsec employs a system of neural networks with an expected average accuracy of more than 72% (Rost et al., 2004).

Further, based on the available structural homology of RUBV-p150 (Zhou et al., 2007) with FMDV-Lpro (Guarne et al., 2000), p150 was used as the template for HEV-protease 3D structure modeling using MODELLER 8v2. Briefly, an alignment of the template-target sequences, the atomic coordinates of the template, and a script file served as the input. First, the target and template sequence files were converted into MODELLER-PIR format and the target-template alignment was converted into MODELLER-PAP format. MODELLER then calculated five 3D models (starting and ending models 1-5) of HEV-protease. The output files in PDB format, which contained the coordinates of the generated models, were viewed in PyMOL 1.6. The final modeled structure was validated in PROCHECK v.3.5 (http://nihserver.mbi.ucla.edu/SAVES/). Of the Ramachandran plots obtained using the PDB files of the predicted models, the one showing about 87% of residues in the most favored regions and about 1.3% in the disallowed regions was selected. Also, for the final model, a pseudo-energy profile was generated and the Discrete Optimized Protein Energy (DOPE) potential was evaluated. Furthermore, the formation of disulfide bridges, if any, between the active cysteine residues in the modeled HEV-protease was predicted using DISULFIND (https://www.predictprotein.org/get_results?req_id=453912).

Multiple alignment-based consensus sequences that contained highly conserved cysteine, histidine and tryptophan residues across the non-redundant collection of HEV-protease domains were derived (Fig. 1B). The PROFsec-predicted secondary structure revealed the elemental composition H (15.62%), E (33.75%) and L (50.62%) in HEV-protease (data not shown). For model building, the HEV consensus target (a.a. 434-563), when aligned with the RUBV-p150 and FMDV-Lpro sequences, showed the HEV-protease essential cysteines and histidines in close homology with RUBV-p150 (Fig. 2A). Interestingly, the PSIPRED prediction of the secondary structure of HEV-protease was in very close structural homology with that of RUBV-p150 (Fig. 2B).
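The H/E/L composition quoted above is the share of residues assigned to each predicted state. As a minimal sketch, assuming the prediction is available as a per-residue string of H, E and L characters, the percentages could be computed as follows. The function name `ss_composition` and the 20-residue toy string are hypothetical; the string is not the PROFsec output for HEV-protease.

```python
from collections import Counter


def ss_composition(states: str) -> dict[str, float]:
    """Return the percentage of residues in each state (H, E, L)."""
    if not states:
        raise ValueError("empty secondary-structure string")
    counts = Counter(states)
    total = len(states)
    return {state: 100.0 * counts.get(state, 0) / total for state in "HEL"}


# Hypothetical per-residue prediction, for illustration only.
toy_prediction = "LLEEEEELLLHHHHHLLEEE"
composition = ss_composition(toy_prediction)

for state, percent in composition.items():
    print(f"{state}: {percent:.2f}%")
```

Applied to a real per-residue prediction, the three percentages should sum to 100 up to rounding, as they do for the values reported in the text.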
The consensus-prediction results for the RUBV-p150 structural template allowed us, for the very first time, to construct a 3D model of HEV-protease with an active 'cysteine-histidine' backbone. Of the five initial models constructed, the one with the lowest DOPE score (data not shown) was selected for further analysis. In the absence of a crystal structure, our predicted model of HEV-protease was very close to the experimentally and structurally characterized RUBV-p150 (Fig. 3). Like p150, the predicted 'papain-like β-barrel fold' in our model structurally confirmed the linear sequence homology-based classification of HEV-protease (Koonin et al., 1992). Our comparative analysis of the viral protease models recognized the HEV 'Cys434-His443' homologue (Fig. 3) of RUBV 'Cys1152-His1273' that constituted the putative 'catalytic dyad' in the 'papain-fold'.

Based on homology analysis, we identified a 'Zn2+-binding motif' within the framework of the HEV-protease 'papain-like fold'. Similar to the RUBV-p150 'Cys1167-Cys1175-Cys1178-Cys1225-Cys1227' residues, clusters of 'Cys457-His458-Cys459' and 'Cys481-Cys483' were arranged closely in the proposed model and could form a potential Zn2+-binding pocket (Fig. 3). Our further scanning for the RUBV 'EF-hand Ca2+/CaM-binding motif' in HEV-protease identified partially homologous sequences. While the catalytic 'His1273' was excluded from the RUBV 'Ca2+/CaM-binding' sequence, both 'Cys434' and 'His443' of the dyad lay outside its HEV equivalent (Fig. 3). By homology, we therefore proposed an overlapping 'Ca2+-binding site' in the modeled HEV-protease.

A DISULFIND-based analysis further predicted the formation of three disulfide bridges, between the Cys434-Cys481, Cys471-Cys483 and Cys472-Cys563 residues, in the HEV-protease (data not shown). Two highly conserved tryptophan residues (W437 and W476) were observed in the HEV-protease domain (Fig. 1B), where W476 clustered in the predicted 'EF-hand loop'.
W476 could pair with the upstream-located W437 to form a putative 'WW/rsp5' module in the proposed HEV-protease model (Fig. 3).

The best characterized family of cysteine proteases is the 'papain' family, which includes the mammalian lysosomal cathepsins, where the 'papain-fold' catalytic dyad (Cys25-His159) is evolutionarily preserved in all known eukaryotic papain-like cysteine proteases (Grzonka et al., 2001). In HCoV-229E PL1pro, the conserved catalytic 'Cys1054-His1205' dyad was found to be indispensable for its proteolytic activity (Herold et al., 1998). Further, the RUBV-p150 'Cys1152-His1273' and FMDV-Lpro 'Cys51-His148' have been reported as Zn2+-dependent catalytic dyads (Liu et al., 2000; Chen et al., 1996; Gorbalenya et al., 1991). Our comparative analysis thus recognized the HEV 'Cys434-His443' homologue of RUBV that constituted the putative catalytic dyad. This was, however, in contrast with the previous sequence homology-based prediction of 'Cys434-His590' as the catalytic residues in the HEV-protease (Koonin et al., 1992). Moreover, in the Mexican and Burmese strains, 'His590' is substituted with 'leucine' and 'tyrosine', respectively, showing that this histidine is neither conserved nor indispensable. A previous alignment analysis of FMDV-Lpro with the equivalent regions of the equine rhinitis virus (ERV-1 and ERV-2) polyproteins had shown complete conservation of the active 'Cys51' amongst the three aphthoviruses, but not of 'His148', which was conserved only in ERV-2 (Gorbalenya et al., 1991). This also supports the non-conservation and positional dispensability of the second catalytic 'histidine' in viral thiol-proteases in general. Although the conserved 'Cys434' and non-conserved 'His590' were found to be non-essential for HEV viability, the indispensability of the highly conserved 'His443' (Parvez, 2013) does support our structural recognition of the catalytic dyad.
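The conservation argument above amounts to asking which residues occur at a given alignment column across strains. The following is a minimal sketch of that check, assuming the strains are already aligned to equal length. The helper names, strain labels and six-residue toy sequences are hypothetical and do not reproduce the 77-strain alignment of Fig. 1B.

```python
from collections import Counter


def column_residues(aligned_seqs: list[str], column: int) -> Counter:
    """Count the residues observed at one 0-based alignment column."""
    return Counter(seq[column] for seq in aligned_seqs)


def is_fully_conserved(aligned_seqs: list[str], column: int, residue: str) -> bool:
    """True if every sequence carries `residue` at the given column."""
    return column_residues(aligned_seqs, column)[residue] == len(aligned_seqs)


# Hypothetical toy alignment: the last column mimics a histidine that is
# replaced by leucine or tyrosine in some strains.
toy_alignment = {
    "strain_A": "CAWHGH",
    "strain_B": "CSWHGL",
    "strain_C": "CAWHGY",
}
seqs = list(toy_alignment.values())

print(is_fully_conserved(seqs, 3, "H"))   # histidine present in all toy strains
print(is_fully_conserved(seqs, 5, "H"))   # histidine substituted in some toy strains
print(dict(column_residues(seqs, 5)))
```

A column that fails this check is not necessarily dispensable; the text combines the sequence observation with mutational data before drawing that conclusion.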
Very reasonably, such structural diversity of cysteine proteases could be related to the extremely high mutation rate of viral RNA genomes. As a result, many viral proteases have functionally diverged to the point where sequence similarity between homologues can rarely be detected.

Several RNA virus papain-like proteases, such as the PLP2 of SARS-coronavirus (SARS-CoV) (Han et al., 2005), nsp1 of equine arteritis virus (EAV) (Tijms et al., 2001), the Lpro of encephalomyocarditis virus (EMV) (Dvorak et al., 2001), the NS3 of HCV (Barbato et al., 1999), the PL1pro of HCoV-229E, the p150 of RUBV (Zhou et al., 2007) and the Lpro of FMDV (Guarne et al., 2000), are shown to have 'Zn2+-binding motifs'. We therefore, using RUBV-p150 as a molecular and structural surrogate model, identified a 'Zn2+-binding motif' homologue in HEV-protease. Structural Zn2+-binding sites are generally coordinated by four cysteines and a histidine as ligands (Zhou et al., 2009). In line with this, 'His458' was also clustered with four 'cysteines' in our predicted Zn2+-binding pocket, unlike in the RUBV-p150 model, where no histidine was present among the five cysteines (Fig. 3). However, the Zn-binding motifs of the human rhinovirus 2 (HRV2) and poliovirus 2A proteases, 'Cys52-Cys54-Cys112-His114' (Sommergruber et al., 1997) and 'Cys55-Cys57-Cys115-His117' (Yu and Lloyd, 1992), respectively, have three cysteines plus a histidine. Such structural and functional diversity could be attributed to the extreme mutability and host-adaptability of RNA viruses in their evolutionary processes. Notably, the critical roles of Zn2+-binding papain-like proteases in viral life cycles are well known. Previously, using a genomic replicon system, the RUBV-p150 'Cys1175, Cys1178, Cys1225 and Cys1227' residues were shown to be essential for its replication (Zhou et al., 2009).
In line with this, our recent analysis of the nonviable HEV-protease domain mutants of the Cys457, Cys459, Cys481 and Cys483 residues (Parvez, 2013) strongly supports the potential of the predicted 'Zn2+-binding motif'.

Recently, a Ca2+-dependent association of calmodulin (CaM) with RUBV-p150 led to the mapping of a unique 'EF-hand Ca2+/CaM-binding motif' (CWLRAAANVAQAARAGAYTSAGCPKCAYGR; a.a. 1152-1182) that was required for its optimal stability under physiological conditions and for virion infectivity (Zhou et al., 2007). In the RUBV-p150 model, the 'Ca2+/CaM-binding motif' partially overlaps the 'Zn2+-binding motif', including the catalytic 'Cys1152-His1273' dyad (Zhou et al., 2007). While the catalytic 'His1273' was excluded from the RUBV 'Ca2+/CaM-binding' sequence, in our predicted model both 'Cys434' and 'His443' of the dyad lay outside its HEV equivalent. Moreover, the canonical 'EF-hand motif' signature 'D-X-[DNS]-[ILVFYW]-[DEN]-G-[GP]-XX-DE' is described to form the Ca2+-binding loops in living systems (Zhou et al., 2007). Notably, the presence of the essential cysteine residues in the overlapping putative 'Ca2+/CaM-binding motif' also suggested the formation of intramolecular disulfide bridges between cysteines that might structurally facilitate the orientation of the 'EF-hand' towards Ca2+ binding. Our DISULFIND-based analysis also predicted the formation of three disulfide bridges between active cysteines.

The 'WW-domain' or 'rsp5-domain', formed by two distantly located tryptophans (W; >20 residues apart, in general), is known as one of the smallest protein modules that specifically interacts with the 'proline-rich motif' of regulatory proteins (Sudol et al., 1995). This has, however, not been reported in viral proteins, including nonstructural cysteine proteases, so far. In our proposed model, W476 is therefore predicted to pair with the upstream-located W437 to form the putative 'W437-W476/rsp5-domain'.
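The spacing criterion quoted above (two tryptophans more than 20 residues apart) is easy to check on a numbered sequence; for W437 and W476 the separation is 476 - 437 = 39 residues. The sketch below assumes the domain sequence is numbered from a known first position. The function name `tryptophan_pairs` and the poly-alanine toy sequence are hypothetical and are not the HEV-protease sequence, and passing this spacing check does not by itself establish a WW module.

```python
def tryptophan_pairs(sequence: str, first_position: int = 1,
                     min_separation: int = 20) -> list[tuple[int, int]]:
    """List pairs of tryptophan positions more than `min_separation` apart."""
    positions = [first_position + i for i, aa in enumerate(sequence) if aa == "W"]
    return [
        (a, b)
        for idx, a in enumerate(positions)
        for b in positions[idx + 1:]
        if b - a > min_separation
    ]


# Hypothetical placeholder sequence numbered from residue 434, with
# tryptophans placed at positions 437 and 476 for illustration.
toy_sequence = "CAAW" + "A" * 38 + "W" + "AAA"
pairs = tryptophan_pairs(toy_sequence, first_position=434)

for first, second in pairs:
    print(f"W{first}-W{second}: {second - first} residues apart")
```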
Like closely related virus polyproteins, HEV ORF1 also contains a 'proline-rich hypervariable region' followed by the protease-domain (Fig. 1A), which indicates its possible interaction with the viral protease through the predicted 'W437-W476' module.

In conclusion, our present model of HEV-protease, supported by molecular data, suggests critical roles of the putative 'catalytic dyad' and 'divalent metal-binding motifs' in viral protease structural integrity, ORF1 self-processing, and RNA replication. These results are, however, not a confirmation of the structures presented, but do present testable hypotheses about the structure and function of the HEV-protease. Therefore, experimental validations involving molecular, biochemical and biophysical methods are further needed. Knowledge of the structure of HEV-protease would thus be valuable for understanding the virus biology and classification as well as for antiviral drug development.

Cloning, sequencing, and expression of the hepatitis E virus (HEV) nonstructural open reading frame
The solution structure of the N-terminal proteinase domain of the hepatitis C virus (HCV) NS3 protein provides new insights into its activation and catalytic mechanism
Mutations in the putative zinc-binding motif of UL52 demonstrate a complex interdependence between the UL5 and UL52 subunits of the human herpes simplex virus type 1 helicase/primase complex
Characterization of the rubella virus nonstructural protease domain and its cleavage site
Leader protein of encephalomyocarditis virus binds zinc, is phosphorylated during viral infection, and affects the efficiency of genome translation
Mutational analysis of the Sendai virus V protein: importance of the conserved residues for Zn binding, virus pathogenesis, and efficient RNA editing
Putative papain-related thiol proteases of positive-strand RNA viruses, identification of rubi- and aphthovirus proteases and delineation of a novel conserved domain associated with proteases of rubi-, alpha- and coronaviruses
Genetic analysis of the zinc finger in the Moloney murine leukemia virus nucleocapsid domain: replacement of zinc-coordinating residues with other zinc-coordinating residues yields noninfectious particles containing genomic-RNA
Structural studies of cysteine proteases and their inhibitors
Structural and biochemical features distinguish the foot-and-mouth disease virus leader proteinase from other papain-like enzymes
Papain-like protease 2 (PLP2) from severe acute respiratory syndrome coronavirus (SARS-CoV): expression, purification, characterization, and inhibition
Proteolytic processing at the amino terminus of human coronavirus 229E gene 1-encoded polyproteins: identification of a papain-like proteinase and its substrate
Molecular virology of hepatitis E virus
Hepatitis E virus infection
Deubiquitination activity associated with hepatitis E virus putative papain-like cysteine protease
Computer-assisted assignment of functional domains in the nonstructural polyprotein of hepatitis E virus: delineation of an additional group of positive-strand RNA plant and animal viruses
Characterization of the zinc binding activity of the rubella virus nonstructural protease
Single point mutations in the zinc finger motifs of the human immunodeficiency virus type 1 nucleocapsid alter RNA binding specificities of the gag protein and enhance packaging and infectivity
Molecular characterization of hepatitis E virus ORF1 gene supports a papain-like cysteine protease (PCP)-domain activity
Early secretory pathway localization and lack of processing for hepatitis E virus replication protein pORF1
Expression of the hepatitis E virus ORF1
The PredictProtein server
Virus-encoded proteinases of the picornavirus supergroup
Expression and processing of the Hepatitis E virus ORF1 nonstructural polyprotein
Lack of processing of the expressed ORF1 gene product of hepatitis E virus
Mutational analyses support a model for the HRV2 2A proteinase
Characterization of a novel protein-binding module-the WW domain
The NS5A protein of hepatitis C virus is a zinc metalloprotein
A zinc finger-containing papain-like protease couples subgenomic mRNA synthesis to genome translation in a positive-stranded RNA virus
Characterization of the roles of conserved cysteine and histidine residues in poliovirus 2A protease
Identification of a Ca2+-binding domain in the rubella virus nonstructural protease
A cysteine-rich metal-binding domain from rubella virus non-structural protein is essential for viral protease activity and virus replication

The authors would like to extend their sincere appreciation to the Deanship of Scientific Research at King Saud University, Riyadh for its funding of this research through the Research Group Project No. RGP-VPP-212.