key: cord-0900022-5b5t297i authors: Chaudhuri, Ankur title: Comparative analysis of non structural protein 1 of SARS-CoV2 with SARS-CoV1 and MERS-CoV: An in silico study date: 2021-05-24 journal: bioRxiv DOI: 10.1101/2020.06.09.142570 sha: 30bc8ab79a5634dfafae3d9b89fcd10472bed545 doc_id: 900022 cord_uid: 5b5t297i The recently emerged SARS-CoV2 caused a major pandemic of coronavirus disease (COVID-19). Non structural protein 1 (nsp1) is found in all beta coronavirus that causes several severe respiratory diseases. This protein is considered as a virulence factor and has an important role in pathogenesis. This study aims to elucidate the structural conformations of non structural protein 1 (nsp1), prediction of epitope sites and identification of important residues for targeted therapy against COVID-19. In this study, molecular modelling coupled with molecular dynamics simulations were performed to analyse the conformational change of nsp1 of SARS-CoV1, SARS-CoV2 and MERS-CoV at molecular level. Principal component analysis escorted by free energy landscape revealed that SARS-CoV2 nsp1 protein shows greater flexibility, compared to SARS-CoV1 and MERS-CoV nsp1. From the sequence alignment, it was observed that 28 mutations are present in SERS-CoV2 nsp1 protein compared to SERS-CoV1 nsp1. Several B-cell and T-cell epitopes were identified by immunoinformatics approach. SARS-CoV2 nsp1 protein binds with the interface region of the palm and finger domain of POLA1 by using hydrogen bond and salt bridge interactions. These findings can be used to develop therapeutics specific against COVID-19. In The ligand binding residues of nsp1 of SARS-CoV1, SARS-CoV2 and MERS-CoV were predicted by COACH meta server [27] . COACH generates complementary ligand binding sites of the target proteins by using two comparative processes, S-SITE and TM-SITE. These two methods recognize ligand-binding templates from the BioLiP database [28] by sequence profile comparisons and binding-specific substructure. The RANKPEP, sequence-based screening server was used to identify the T-cell epitopes [29] of the nsp1 protein of SARS-CoV1, SARS-CoV2 and MERS-CoV. This server predicts the short peptide that binds to MHC molecules from protein sequences using the position-specific scoring matrix (PSSM). All the HLA class I alleles were selected for prediction of epitopes of HLA class I. For the prediction of epitopes of HLA class II, we selected some alleles such as DRB10101, process. Sodium and chloride ions were added to neutralize the three systems. Each system was energy minimized using the steepest descent algorithm until the maximum force was found to be smaller than 1000.0 kJ/mol/nm. This was done to remove any steric clashes on the system. Each system was equilibrated with 100 ps isothermal-isochoric ensemble, NVT (constant number of particles, volume, and temperature) followed by 100 ps isothermal-isobaric ensemble NPT (constant number of particles, pressure, and temperature). The two types of ensemble of equilibration method stabilized the three systems at 310 K and 1 bar pressure. The Berendsen thermostat and Parrinello-Rahman were applied for temperature and pressure coupling methods respectively [37] . Particle Mesh Ewald (PME) method [38] was used for the calculations of the long range electrostatic interactions and the cut off radii for Van der Waals and coulombic short-range interactions were set to 0.9 nm. The Linear Constraint Solver (LINCS) constraints algorithm was used to fix the lengths of the peptide bonds and angles [39] . All the three systems were subjected to MD simulations for 50 ns. The resulting MD trajectories were utilized through the inbuilt tools of GROMACS for analysis purposes. The subsequent analyses were performed using VMD [40], USCF Chimera [41], Pymol [42] , and also the plots were created using xmgrace [43] . Principal components analysis or essential dynamics is a process which extracts the essential motions from the MD trajectory of the targeted protein molecule [44] . The nsp1 protein of SARS-CoV1, SARS-CoV2 and MERS-CoV are used for this purpose. The initial step of PCA analysis is to construct the covariance matrix which examines the linear relationship of atomic fluctuations for individual atomic pairs. The diagonalization of covariance matrix results in a matrix of eigenvectors and eigenvalues. The eigenvectors determine the movement of atoms having corre-sponding eigenvalues which represents the energetic contribution of an atom participating in motion. The covariance matrix and eigenvectors were analyzed using the gmx cover and gmx anaeig tool respectively. Gibbs free-energy landscape (FEL) elaborates the protein dynamic processes by representing the conformational states and the energy barriers [45] . The FEL of SARS-CoV1, SARS-CoV2 and MERS-CoV was constructed based on the first (PC1) and second (PC2) principal components, Rg and rmsd, and psi and phi angels. FEL was calculated and plotted by using gmx sham and gmx xpm2ps module of GROMACS. Nsp1 protein of SARS-CoV2 interacts with POLA1 and blocks the host cell replication process. [8]. The molecular interactions of nsp1 protein of SERS-CoV2 (COVID-19) with the catalytic subunit of human DNA polymerase alpha, POLA1 was analyzed by using the HADDOCK (High Ambiguity Driven protein-protein DOCKing) programme. It is a flexible docking approach for the modelling of biomolecular complexes. It encodes instruction from predicted or identified protein interfaces in ambiguous interaction restraints (AIRs) to drive the docking procedure [46] . The coordinates of the solved structure of the catalytic domain of DNA polymerase alpha, PO-LA1 was downloaded from PDB database (PDB ID: 6AS7) and prepared for the docking experiments by removing water, ions and the ligands. The interface residues were utilized for the docking procedure. The active residues of POLA1 (Asp860, Ser863, Leu864, Arg922, Lys926, Lys950, Asn954 and Asp1004) were retrieved from the literature [47] . The active residues (Leu16, Leu18, Phe31, Val35, Glu36, Leu39, Arg43, Leu46, Gly49, Iso71, Pro109, Arg119, Val121 and Leu123) of nsp1 of SARS-CoV2 were predicted from COACH server. The amino acids surrounding the active residues of both proteins were selected as passive in docking procedure. Active residues are the amino acids from the interface region of the two proteins that take part in direct binding with the other protein partner while passive residues are the amino acids that can interact indirectly in docking procedure. Approximately 163 structures in 8 clusters were obtained from HADDOCK server, which represented 81.5% of the water-refined models. PRODIGY software [48] was used to predict the binding affinity and dissociation constant for each SARS-CoV2 nsp1-POLA1 complex from the best three clusters. The generated model of SARS-CoV2 nsp1-POLA1 complex was optimized to avoid any stereochemical restraints by steepest descent energy minimization method. The optimized complex was validated by Ramachandran plot analysis of ψ/φ angle from PROCHECK. PISA server (http://www.ebi.ac.uk/ msd-srv/prot_int/, Last accessed May 25, 2020) was used to calculate total buried surface area, nature of interactions and amino acids involved in interactions at interface region. The sequence identity of nsp1 protein of SARS-CoV2 with SARS-CoV1 and MERS-CoV was 84.4% and 20.61% respectively. Multiple sequence alignment (MSA) of nsp1 proteins was performed to identify the conserved residues. The amino acids marked as asterisk illustrate the positions of nsp1 protein that were highly conserved over the evolutionary time scale (Fig. 1) . The differences in the amino acid changes were also recorded. It was observed that compared to SARS-CoV1, there were 28 mutations in the nsp1 protein of SARS-CoV2 (Fig. 1) . The three dimensional structure of nsp1 of SARS-CoV2 and MERS-CoV was modelled using the I-TASS-ER web server (Fig. 2) . The NMR structure of SARS-CoV1 nsp1 13-127 (PDB: 2hsx/2gdt) was identified as the template for three dimensional structure prediction of SARS-CoV2 nsp1 on the I-TASSER server. The highly significant templates used in the modelling of the nsp1 protein of SARS-CoV2 and MERS-CoV through I-TASSER are listed in Table 1 . It was evident from the high Z score (>1 means good alignment) and good coverage in case of most of the structural templates, the generated threading alignment predicts a good and confident model in both cases. As shown from the topological analysis, SARS-CoV2 nsp1 13-127 exhibits similar α/β-folds with SARS-CoV nsp1 13-127 that consist of a characteristic six β-barrel and a long α-helix. It was also observed that a short α helix (163-168) is present in the C-terminal region of the of the SARS-CoV2 nsp1 protein. This α-helix may play an important role in the inhibition of the host protein synthesis process [49] . In case of MERS-CoV, eight β-barrel and four α-helix constitutes the nsp1 protein which is different from the SARS-CoV1 and SARS-CoV2 (Fig. S1) . The stereochemical quality of the model nsp1 proteins was validated on the basis of Ramachandran analysis of ψ/φ angle from PROCHECK. Examination of Ramachandran plot of nsp1 protein of SARS-CoV2 and MERS-CoV showed above 92% residues lie in the allowed regions (Table S1) . From ProSA and ProQ analysis, it is clear that the overall model quality of the nsp1 protein of SARS-CoV2 and MERS-CoV are within the range of scores typically found for proteins of similar size ( Table S1 ). The important residues of the nsp1 protein of SARS-CoV1, SARS-CoV2 and MERS-CoV that are involved in the ligand binding process are listed in Table 2 . Combination of the finding of the different domains of SARS-CoV1, SARS-CoV2, and MERS-CoV may help better understanding of the entire structure of nsp1. Several studies revealed that specific T-cell response is required for the elimination of several viral infections such as influenza A, SARS-CoV, MERS-CoV and para-influenza. These studies conclude that T-cell mediated response is essential for the development of specific vaccines [50, 51] . CD8 + cytotoxic T-cells recognize the infected cells in the lungs whereas CD4 + helper T-cells are essential for the production of specific antibodies against viruses. Here we used RankPep server to predict peptide binders to MHC class I and MHC class II alleles from nsp1 protein sequences by using Position Specific Scoring Matrices (PSSMs). The antigenic epitopes of three nsp1 proteins with high binding affinity were predicted and summarized in Table 3 and Table 4 . Secreted neutralising antibodies play an important role to protect the body against viruses. The entry process of the viruses is blocked by the SARS-CoV specific neutralizing antibodies [52] . The Bepipred web server was employed for the linear B-cell epitope prediction study. SARS-CoV1, SARS-CoV2 and MERS-CoV nsp1 proteins were used for this purpose. The Kolaskar & Tongaonkar Antigenicity method was employed for the cross-checking of the predicted epitopes. The linear B-cell epitopes of the three nsp1 proteins are depicted in Table 5 . Both humoral and cellular immune responses are important factors against coronavirus infection [52] . Finally, in SARS-CoV2 nsp1 protein, four epitope rich regions (15-27, 45-81, 121-140 and 147-178) that were shared between T-cell and B-cell were reported. This information will be helpful for vaccine design by targeting SARS-CoV2 nsp1 protein. (Fig. 3C) . The high RMSF values of SARS-CoV2 indicated a larger degree of flexibility in this protein compared with SARS-CoV1 and MERS-CoV (Fig. 3C) . To further comprehend the structural stability of three nsp1 proteins, we studied the compactness parameter of the structure by figuring the radius of gyration (Rg). Radius of gyration (Rg) is detected as root mean square deviation between the center of gravity of the respected protein and its end. It detects the stability and firmness of the simulation system and changes over simulation time due to protein folded-un- (Fig. 3D) . From RMSD, RMSF and Rg analysis, it was observed that SARS-CoV2 shows flexible stability during MD simulation. An overall trend of backbone RMSD, SASA, RMSF and radius of gyration indicated that all three nsp1 protein systems were well equilibrated and stable during the simulation run. Hydrogen bonds provide most of the directional interactions that represents protein structure, protein folding and molecular recognition. The core of most protein structures is composed of secondary structures such as α-helix and β-sheet, which are stabilized by H-bonds between main chain carbonyl oxygen and amide nitrogen. The formation of hydrogen bonds as a function of simulation time was analyzed. The average number of hydrogen bonds of SARS-CoV1, SARS-CoV2 and MERS-CoV were 69±5, 92±10 and 90±11, respectively (Fig. 4) . 3.4 Ramachandran (ψ/Φ) space of the residues by molecular dynamics Simulation. The frequency of dihedral angles phi (Φ) and psi (Ψ) was monitored during the 50 ns simulation time. The plot of the dihedral angle frequencies in a Ramachandran-like graph provides confor-mational preferences of the three nsp1 proteins. The allowed regions of the Ramachandran plot of three nsp1 proteins are delimited by green lines in Figure 5 . High peaks mean that this combination of Ramachandran-angles is assumed often during simulation. In case of SARS-CoV1, the major Ramachandran ψ/Φ angle distribution, as obtained by the MD analysis was found to peak at Ramachandran coordinates of -100° ≤ Φ ≤ -60° and -55° ≤ Ψ ≤ -40°. The other comparatively medium distributions were at -80° ≤ Φ ≤ -60° and 120° ≤ Ψ ≤ 150° in the Ramachandran (ψ, Φ) space. The highly populated Φ/Ψ values were close to the right-handed α-helical space and medium populated Φ/Ψ values were close to the β-sheet space (Fig.5A and Table 6 ). In case of SARS-CoV2, it is worth noting that the distribution of β-sheets (-90° ≤ Φ ≤ -70°, 110° ≤ Ψ ≤ 150°) is more pronounced with respect to the one for α-helices (-95° ≤ Φ ≤ -65°, -40° ≤ Ψ ≤ -30°) ( Fig.5B and Table 6 ). The most highly populated (Φ, Ψ) angles for MERS-CoV were -90° ≤ Φ ≤ -65° and 120° ≤ Ψ ≤ 150°, which defined as β-sheet. The other medium and week distributions were at right-handed α-helix and left handed α-helix respectively ( Fig.5C and Table 6 ). The analysis of the Ramachandran (ψ/Φ) space of three nsp1 proteins suggests that the distribution of the secondary structures spanned mainly swings between α-helices and β-sheets. and MERS-CoV nsp1, respectively. It was suggested that SARS-CoV2 nsp1 protein appeared to cover a larger conformational space due to its greater flexibility when compared with the other two nsp1 proteins (Fig. 6) . This observation correlates with the trajectory of the stability parame-ters such as RMSD, RMSF and Rg of SARS-CoV2 nsp1. It is outward from the PCA plot that the collective motions of the SARS-CoV1 nsp1 protein are localized in a small subspace compared to the SARS-CoV2 and MERS-CoV nsp1 (Fig.6) The Gibbs free energy landscape (FEL) deliver a precise portrayal of a protein's most stable conformational space, which are certainly important to study the conformational changes during simulation. Gibbs free energy landscape (FEL) was calculated by using PC1, PC2 coordinates and RMSD, Rg coordinates of the three nsp1 protein molecules. Both 2D and 3D graphs of the FEL were plotted using PC1, PC2 and RMSD, Rg coordinates. The ∆G values for SARS-CoV1, SARS-CoV2 and MERS-CoV nsp1 protein were 0 to 13.3, 12.2 and 12.9 kJ/mol respectively. The broader, shallow and narrow energy basin were observed during the trajectory analysis of the simulation. Each nsp1 protein has a different pattern for the FEL. SARS-CoV1 nsp1 has less conformational mobility, restricting to a more confined conformational space within a single local basin with the lowest energy (Fig.7A) . Extraction of snapshots from this deep wells represents the most native-like conformation which had a RMSD of 0.2702 nm and Rg of 1.2807 nm (Fig S2A) . In case of SARS-CoV2 nsp1, local minima distributed to about three to four energy basins within the energy landscape which indicates wide range of conformations (Fig. 7B) . One of the well populated conformations was located at a RMSD of around 0.5433 nm and a Rg of around 1.7332 nm (Fig.S2B) . MARS-CoV nsp1 has a broader and narrow with two to three energy basins (Fig. 7C) . The native-like conformation from the low energy basins was centered at the coordinate (RMSD = 0.5724 nm, Rg = 1.6221 nm) (Fig.S2C ). Figure 7 displays the FELs of SARS-CoV1, SARS-CoV2 and MERS-COV nsp1 proteins where the deeper blue indicates the stable conformational states having lower energy. The catalytic domain of DNA polymerase alpha, POLA1 is involved in the replication process. The molecular association of SARS-COV2 nsp1 with POLA1 was predicted by the HADDOCK programme. The nsp1-POLA1 docked complexes were analyzed based on Z-score and HAD-DOCK score (Fig. 8 & Table S2 ). PRODIGY was used to predict the binding energy for each nsp1-POLA1 complex from the best there cluster. Three best docked complexes were selected on the basis of lowest binding energy. (Fig. S3) . The best energy values obtained were -13.0 kcal/ mol, -11.6 kcal/mol and -9.6 kcal/mol. (Table S2 ). Interface area, involvement of amino acids and molecular interactions were calculated by PISA server. The interface area of the best docked complex (ΔG = -13.0 kcal/mol) was 1260.9 Å 2 and shown in Fig. 9A . The best complex structure of SARS-COV2 nsp1-POLA1 was evaluated by the Ramachandran plot which shows 98.4% residues are present in the allowed region. From ProSA and ProQ analysis, it is clear that the overall model quality of the final protein-protein complex is within the range of scores typically found for proteins of similar size (Table S1 ). Interaction studies of this nsp1-POLA1 complex showed thirteen hydrogen bonds and eight salt bridge interactions at the interface region (Table Fig. 9B) . It was observed from the docking experiments that the residues of finger domain (Lys923, Gln927, Gln932) and palm domain (Glu1060, Lys1020, Lys1024, Asn1030, Lys1031 and Glu1037) of POLA1 are mainly involved in the binding process with SARS-COV2 nsp1 protein (Fig. 8) . The hydrogen bonds and salt bridges interactions play an important role towards the stability of the SARS-COV2 nsp1-POLA1 complex formation. Right-handed α-helix ProQ* LG score = 3.578 LG score = 3.607 LG score = 4.418 .9 -0.6 9 -13.0 1260.9 Cluster2 -127.4±2. PyMOL: an open-source molecular graphics tool XMGRACE, Version 5.1.19 Essential dynamics of proteins, Proteins: Structure, Function, and Genetics The energy landscapes and motions of proteins, Science The HAD-DOCK2.2 Web Server: User-Friendly Integrative Modeling of Biomolecular Complexes Crystal Structure Of The Catalytic Core Of Human Dna Polymerase Alpha In Ternary Complex With An Dna-Primed Dna Template And Dctp PRODIGY: a web server for predicting the binding affinity of protein-protein complexes Structural basis for translational shutdown and immune evasion by the Nsp1 protein of SARS-CoV-2 T cell-mediated immune response to respiratory coronaviruses Understanding the T cell immune response in SARS coronavirus infection, Emerging Microbes & Chronological evolution of IgM, IgA, IgG and neutralisation antibodies after infection with SARS-associated coronavirus