key: cord-0861305-r7lallv0
authors: De Maio, Flavio; Lo Cascio, Ettore; Babini, Gabriele; Sali, Michela; Della Longa, Stefano; Tilocca, Bruno; Roncada, Paola; Arcovito, Alessandro; Sanguinetti, Maurizio; Scambia, Giovanni; Urbani, Andrea
title: Improved binding of SARS-CoV-2 Envelope protein to tight junction-associated PALS1 could play a key role in COVID-19 pathogenesis
date: 2020-09-04
journal: Microbes Infect
DOI: 10.1016/j.micinf.2020.08.006
sha: ac7a5fed452b38f471bb9c6377e35df572b47d55
doc_id: 861305
cord_uid: r7lallv0

The Envelope (E) protein of SARS-CoV-2 is the most enigmatic of the four structural proteins. Most current knowledge of it rests on direct comparison with the SARS E protein, which was initially undervalued and subsequently proved to be a key factor in ER-Golgi localization and in tight junction disruption. We compared the genomic sequences of the E protein of SARS-CoV-2, SARS-CoV and the closely related genomes of bats and pangolins obtained from the GISAID and GenBank databases. Compared with the known SARS E protein, the SARS-CoV-2 E protein showed a significant difference in amino acid sequence at its C-terminal end. Subsequently, in silico modelling analyses of E protein conformation and docking provided evidence of a strengthened binding of the SARS-CoV-2 E protein to the tight junction-associated PALS1 protein. Based on our computational evidence and on data related to SARS-CoV, we believe that the SARS-CoV-2 E protein interferes more stably with PALS1, leading to enhanced epithelial barrier disruption, amplifying the inflammatory processes, and promoting tissue remodelling. These findings raise a warning about the underestimated role of the E protein in the pathogenic mechanism and open the route to detailed experimental investigations.

[...] abundantly expressed inside the infected cell and actively involved in the pathogenic viral mechanisms [15].
SARS-CoV E protein is the smallest of the structural proteins (76 amino acids) and is organized in three main domains: a short (approx. 8-10 amino acids) luminally oriented N-terminal domain, a long α-helical transmembrane domain composed of ≈ 22 amino acid residues, and a cytoplasmically oriented C-terminal domain [16, 17]. Homologous assembly of the E protein creates a pentameric channel through its transmembrane domain that directly alters virus replication [18]. Conversely, monomeric E protein affects the host's intracellular activities through its C-terminal domain, which is predicted to have a β-coil-β structure, leading to its localization in the endoplasmic reticulum, Golgi [...]

[...] Crumbs-PALS1-PATJ complex is fundamental for the development and maintenance of apical-basal polarity of epithelial cells [21]. Therefore, interactions between the SARS E protein and PALS1 induced relocation of PALS1 to the virus assembly site and disrupted tight junctions, promoting virus spread. Little information has been collected so far on the SARS-CoV-2 E protein, and it has mainly focused on the sequences conserved from SARS, [...]

The amino acid sequence of the SARS-CoV-2 Envelope protein was extracted, and the NMR structure of the homologous protein of SARS-CoV (PDB code: 2MM4) was used as a template. The starting 3D model was then built using the homology modelling protocol of Prime in the Schrödinger Suite [25]. According to the Ramachandran plot analysis of 58 residues, 93.1% lie in the most favoured regions, 6.9% in the allowed regions, and none in the disallowed regions. The missing C- and N-terminal residues, not present in the chosen template, were finally added.

[...] protein in the neighbourhood of CRB1 peptide at 5 Å of the protein. A search grid was generated with Glide5 by selecting the 8 C-terminal residues of CRB1 (PPAMERLI) to define the binding pocket, thus including the entire binding site of the peptide-protein complex.
Then, the 8 C-terminal residues of each E protein were built, i.e. EGVPDLLV and SRVPDLLV for SARS-CoV and SARS-CoV-2, respectively, and their protonation states were assigned with PROPKA. Using the peptide-protein docking protocol of Glide [29], multiple conformers of each peptide were generated, docked onto the protein and post-processed using MM-GBSA.

Multiple sequence alignments showed near-perfect identity among all genomes of bats, pangolin, SARS-CoV and SARS-CoV-2 in the N-terminal and transmembrane regions of the E protein: only a few synonymous mutations were identified in these two regions (Supplementary Table 2).

Figure 1B and 1C show the two predicted structure models of the monomeric full-length E protein, with the N-terminal (blue), transmembrane (green) and C-terminal (red) domains as well as the amino acid variants (yellow). As expected, the transmembrane domains of both proteins presented the highest accuracy, with a total confidence score of more than 90% on ~ 80% of the full-length proteins. The full-length domains of the SARS-CoV variants were further characterised by placing them in a membrane [...] membrane bilayer. The end of the C-terminal domain, accounting for 11 amino acid residues, and the beginning of the N-terminal end did not reach the previously indicated accuracy. Moreover, the deletion of two amino acid residues and the arginine substitution at the C-terminus could affect the protein structure, altering the spatial disposition of the β-coil-β motif (Figure 1D and E). The structure of this subunit appeared highly mobile but remained substantially unaltered along the short MD simulations for both E variants.

In order to verify the potential implications of the altered amino acid sequence, the binding poses of the two C-terminal octapeptides belonging to SARS-CoV and SARS-CoV-2 were determined and compared with the crystallographic structure of the PALS1-CRB1 complex [31].
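To make the comparison of the docked octapeptides concrete, the short Python sketch below lists the positions at which two peptides differ and the C-terminal residues they share. The three sequences are the ones quoted in the text (EGVPDLLV, SRVPDLLV and the CRB1 peptide PPAMERLI); the function names `mismatches` and `shared_c_terminus` are illustrative assumptions and are not part of the authors' workflow.

```python
# Octapeptides as quoted in the text; the dictionary keys are labels
# chosen for this sketch only.
PEPTIDES = {
    "SARS-CoV E": "EGVPDLLV",
    "SARS-CoV-2 E": "SRVPDLLV",
    "CRB1": "PPAMERLI",
}


def mismatches(a, b):
    """Return the 1-based positions where two equal-length peptides differ."""
    if len(a) != len(b):
        raise ValueError("peptides must have the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


def shared_c_terminus(a, b):
    """Return the longest common suffix, i.e. the shared C-terminal residues."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return a[len(a) - n:]


sars = PEPTIDES["SARS-CoV E"]
sars2 = PEPTIDES["SARS-CoV-2 E"]
print("differing positions:", mismatches(sars, sars2))
print("shared C-terminal residues:", shared_c_terminus(sars, sars2))
```

For the two E octapeptides this reports positions 1 and 2 as the only differences (E/S and G/R) and VPDLLV as the shared C-terminal stretch, which contains the Asp-Leu-Leu-Val residues discussed for the binding pocket.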
The poses with the lowest ΔG, calculated via MM-GBSA using default parameters, are shown in Figure 2A. Accordingly, the free energy of binding for the SARS-CoV and SARS-CoV-2 Envelope C-termini amounts to -63.62 and -97.10 kcal/mol, respectively. These values must be compared with the value of -92.5 kcal/mol obtained by performing the same analysis on the PALS1-CRB1 complex, where the endogenous peptide was shortened to 8 amino acids in order to compare its in silico affinity with that of the SARS-CoV variants. Interestingly, the SARS-CoV-2 peptide is able to bind PALS1 with a significantly higher affinity than the SARS-CoV variant, reaching and slightly exceeding the affinity value of the endogenous ligand, even though the two octapeptides differ in only two of the 8 selected amino acids. In particular, the last four residues of both E C-termini are the same (Asp, Leu, Leu, Val) and bind PALS1 similarly to what is observed for the endogenous CRB1, even though the corresponding sequence of the CRB1 peptide is slightly different (Glu, Arg, Leu, Ile). As shown in the interaction maps in Figure 2C and 2D, the side chain of the last residue of the E proteins, which is a valine, interacts with Leu267, Leu321 and Phe330 of PALS1; its free terminal carboxyl group, instead, makes a salt bridge with Lys261 and H-bond interactions with the amide hydrogens of Leu267, Gly268 and Ala269. The next two leucine residues of the E C-termini make van der Waals contacts; in particular, the second one interacts with Phe318 (while CRB1 interacts with this residue via cation-π through its Arg).
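The size of the reported differences can be checked with simple subtraction. The sketch below is only an arithmetic illustration using the three MM-GBSA values quoted above; the names `MMGBSA`, `delta` and `ranking` are assumptions introduced here, and the code does not reproduce the docking or scoring itself.

```python
# MM-GBSA scores in kcal/mol, copied from the text; more negative
# means a more favourable predicted binding.
MMGBSA = {
    "SARS-CoV E": -63.62,
    "SARS-CoV-2 E": -97.10,
    "CRB1": -92.5,
}


def delta(name, reference):
    """Score of `name` minus score of `reference`, rounded to 2 decimals."""
    return round(MMGBSA[name] - MMGBSA[reference], 2)


def ranking():
    """Peptide labels ordered from most to least favourable score."""
    return sorted(MMGBSA, key=MMGBSA.get)


print(f"SARS-CoV-2 E vs SARS-CoV E: {delta('SARS-CoV-2 E', 'SARS-CoV E'):.2f} kcal/mol")
print(f"SARS-CoV-2 E vs CRB1:       {delta('SARS-CoV-2 E', 'CRB1'):.2f} kcal/mol")
print("ranking:", ranking())
```

With the quoted values, the SARS-CoV-2 octapeptide scores 33.48 kcal/mol lower than the SARS-CoV octapeptide and 4.60 kcal/mol lower than the shortened CRB1 peptide, in line with the ordering described in the text.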
Aspartate is the last common residue inside the binding pocket and its side chain makes a salt bridge with [...] negatively charged pocket (zoomed region in Figure 2B [...]

Given the very small sequence length of the E protein, full-genome multiple sequence alignment might reduce the overall precision in this region; this is likely the reason for the erroneously aligned E proteins in previous works, which led to misinterpretation of these amino acid deletions and substitutions at the C-terminal end [32, 33]. SARS-CoV matches the orthologous E proteins of bat-CoV Rs3367, while the SARS-CoV-2 E protein is identical to that of bat RaTG13 and, except for synonymous mutations, to those of bat CoVs and the recently identified pangolin [...]

A novel coronavirus from patients with pneumonia in China
CoV-2 related coronaviruses in Malayan pangolins
The proximal origin of SARS-
Coronavirus entry and release in polarized epithelial cells: A review
A potential role for integrins in host cell entry by SARS-
Functional assessment of cell entry and receptor usage for SARS-CoV-2 and other lineage B betacoronaviruses
SARS-CoV replication and pathogenesis in an in vitro model of the human conducting airway epithelium
Membrane binding proteins of coronaviruses
CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase variant
CoV-2: Moving toward more biased codon usage in viral structural and nonstructural genes
COVID-19 relationships in different species: a one health perspective.
Microbes Infect SARS-CoV-2 spike 2020
Immunoinformatic analysis of the SARS-CoV-2 envelope protein as a strategy to assess cross-protection against COVID-19
Coronavirus envelope protein: current knowledge
Coronavirus envelope (E) protein remains at the site of assembly
Model of a putative pore: the pentameric α-helical bundle of SARS coronavirus E protein in lipid bilayers
A SARS-CoV-2 protein interaction map reveals targets for drug repurposing
MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform
MAFFT multiple sequence alignment software version 7: Improvements in performance and usability
A hierarchical approach to all-atom protein loop prediction
CHARMM-GUI input generator for NAMD, GROMACS, AMBER, OpenMM, and CHARMM/OpenMM simulations using the CHARMM36 additive force field
High performance molecular simulations through multi-level parallelism from laptops to supercomputers
Protein and ligand preparation: Parameters, protocols, and influence on virtual screening enrichments
Extra precision glide: docking and scoring incorporating a model of hydrophobic enclosure for protein-ligand complexes
SARS-CoV2 Envelope protein: non-synonymous mutations and its consequences
Structures of the human Pals1 PDZ domain with and without ligand suggest gated access of Crb to the PDZ peptide-binding groove
Understanding the nature of variations in structural sequences coding for coronavirus spike, envelope, membrane and nucleocapsid proteins of SARS-CoV-2
Genome composition and divergence of the novel coronavirus (2019-nCoV) originating in China
Epithelial barrier function: at the front line of asthma immunology and allergic airway inflammation
Epithelial-mesenchymal transition in lung development and disease: Does it exist and is it important?
The impact of aging on epithelial barriers