key: cord-0266529-7peyinxs authors: Nouira, Fatma; Hamdi, Manel; Redissi, Alaeddine; Kouidhi, Soumaya; Charfeddine, Cherine; M’saad, Meriem; Cherif, Ameur; Messaoudi, Sabri; Aldulaijan, Sarah; Raouafi, Noureddine; Dhouib, Adnene; Mosbah, Amor title: In silico screening of TMPRSS2 SNPs that affect its binding with SARS-CoV2 spike protein and directly involved in the interaction affinity changes date: 2021-09-29 journal: bioRxiv DOI: 10.1101/2021.09.29.462283 sha: 31c03085d0c8d78fe05f0d0195c3441e293192f9 doc_id: 266529 cord_uid: 7peyinxs In this paper, we used in silico analysis to shed light on the possible interaction between TMPRSS2 and SARS-CoV2 spike (S) protein by examining the role of TMPRSS2 single nucleotide polymorphisms (SNPs) in relation with susceptibility and inter-individual variability of SARS-CoV2 infection. First, we used molecular docking of human TMPRSS2 protein to predict the binding site of TMPRSS2, especially the TMPRSS2 link loops, in order to assess the effect TMPRSS2 SNPs. The latter lead to missense variants on the interaction between TMPRSS2 and SARS-CoV2 S protein. In a second step, we further refine our analysis by performing a structure-function analysis of the complexes using PyMol software, and finally by MD simulations to validate the as-obtained results. Our findings show that 17 SNPs among the 692 natural TMPRSS2 coding variants are in positions to influence the binding of TMPRSS2 with the viral S protein. All of them give more important interaction energy as assessed by docking. Among the 17 SNPs, four missense variants E389A, K392Q, T393S and Q438E lead to “directly increasing” the interaction affinity and 2 missense variants R470I and Y416C cause it “directly decreasing”. The R470I and Y416C present in African and American population, respectively. While the other 4 SNP variants (E389A; K392Q; T393S and Q438E) are present only in the European population, which could link the viral infection susceptibility to demographic, geographic and genetic factors. The pathogenesis of the coronavirus disease (COVID-19) is triggered by the entry of 51 SARS-CoV2 via the spike protein into angiotensin-converting enzyme 2 (ACE2)-bearing 52 host cells, especially pneumocytes, resulting in overactivation of the immune system 53 (cytokine storm), which attacks the infected cells and damages the lung tissue (Hakmi 54 et al.,2020) . Cell entry of the betacoronaviruses, depends on the binding of the surface 55 unit, S1, of the viral spike protein to ACE2 receptor, which facilitates viral attachment to 56 the surface of the target cells. Moreover, to fuse membranes, the S protein needs to be 57 proteolytically activated at the S1/S2 boundary, such that S1 dissociates and S2 undergoes 58 a radical structural modification, therefore, viral entry requires not only the binding to 59 the ACE2 receptor but also the priming of the viral S protein by the transmembrane 60 protease serine 2 (TMPRSS2), which cleaves the S proteins at the S1/S2 and S2 sites 61 Most recently, the ongoing COVID-19 pandemic has created the hypothesis that inter-80 individual genetic differences may affect the spatial transmission dynamics of SARS-81 is to detect the TMPRSS2 polymorphisms affecting binding interfaces, and which are 109 directly associated with the increase or decrease of the interaction affinity with the S1/S2 domain of the spike protein, which can be considered as protective or susceptibility 111 variants to SARS CoV-2 infection. 112 When this study was started, the crystal structure of human TMPRSS2 has not been filed 134 in the Protein Data Bank (PDB), therefore, we modelled the structure of human 135 TMPRSS2 employing I-TASSER (Iterative Threading Assembly Refinement), which is a 136 strong predictor of protein 3D structure, aiming to determine by computational 137 calculations the spatial location of every atom in a protein molecule from the amino acid 138 sequence (Zhang, 2008) . In April 2021, the crystal structure of human TMPRSS2 in 139 complex with Nefamostat has been deposited in the Protein Data Bank (code PDB: 7MEQ), we compared our structure to the one recently deposited in PDB in order to 141 verify the quality and reliability of our model using PyMOL software. 142 SNPs 144 Although there is not enough information about the active site and the catalytic site of 145 TMPRSS2, by running a protease conserved domain (CD), TMPRSS2 was analyzed, and 146 all its link loops residues were predicted with PyMOL. Following the identification of 147 the binding interfaces, we selected only the variants located at the level of these 148 connecting loops from the list of missense and damaging TMPRSS2 SNPs already 149 predicted and extracted from databases. dbSNP is a database of genetic variants 150 implemented at the National Center for Biotechnology Information "NCBI" and 151 GnomaAD database were exploited to characterize the selected SNPs (population and 152 allelic frequency). 153 To explore the structural changes in the protein encoded by different alleles of 155 TMPRSS2, molecular models of all the selected protein variants were developed and 156 superimposed over the structurally resolved template of wild-type TMPRSS2 using 157 SWISS-MODEL, which allows a fully automated protein structure homology modelling. 158 The FASTA sequence of TMPRSS2 was obtained from the UniProt protein knowledge 159 database (UniProt Id O15393, corresponding to 492 amino acid transcript). The sequence 160 of each TMPRSS2 variant is generated at the base of the wild-type sequence by a simple 161 substitution of the amino acid coding for the missense mutation. 162 AutoDock Vina was used to carry out the molecular docking between S1/S2 domain of 164 SARS-CoV2 spike protein and TMPRSS2 wild type or missense variants. In our analysis 165 we used, as a receptor, the TMPRSS2 wild type or missense variants, and, as a ligand, the 166 S1/S2 domain of SARS-CoV2 spike protein model (Code PDB :6ZB4) downloaded from 167 RCSB-PDB database. To obtain the optimal docking, the interactions of the wild-type 168 receptor and variants with the partner were simulated using different parameters therefore receptor and ligand we used a grid size was set to 80× 80 × 80 points with a 170 spacing of 1 Å. 171 To further understand the effect of polymorphisms on receptor recognition by the S1/S2 173 domain of SARS-CoV2 a structural analysis was performed by PyMOL. This is an 174 approach combined with the molecular docking output files to analyze the interactions 175 between the ligand and its receptor. We evaluate the complexes obtained by docking to 176 monitor intermolecular hydrogen bonds, electrostatic, and hydrophobic interactions 177 between SARS-CoV2 S protein and TMPRSS2 missense variants compared to the wild 178 type. Representation of TMPRSS2 protein binding loops. 245 The simplest way that missense variation could impact SARS-CoV-2 infection would be 248 by altering the TMPRSS2-S interface. TMPRSS2 missense variants located at residues 249 that bind the S protein are most likely to have such effects. Structurally, all TMPRSS2 variants bear the characteristic domains of TMPRSS2 wild 261 type. The overall protein architecture of TMPRSS2 allelic variants is largely conserved. 262 The 3D structure of the 17 TMPRSS2 SNPs presents a significant similarity with the 3D 263 structure of the wild type and the Ramachandran analysis of the different analogues 264 proves that all the amino acids of the models are found in the favorable regions. 265 Firstly, the interaction of wild type TMPRSS2 and the selected missense variants with 282 the 3D structure of spike monomer protein were simulated using Autodock Vina. 283 Additionally, both hydrogen, electrostatic bonds and hydrophobic bonds within 284 different TMPRSS2-spike protein complexes were assessed using PyMOL. We classified 285 the polymorphisms into two categories. The first one includes mutations that directly 286 increase the interaction affinity within the complex, these variants increase the number 287 of electrostatic interactions or decrease the distance of interaction between the receptor 288 and ligand residues. We term these mutations as "directly increasing". The second one 289 includes mutations that directly decrease the interaction affinity of the complex by 290 decreasing the number of electrostatic interactions or increasing the distance of 291 interaction between the residues of receptor and its ligand, we term these mutations as 292 "directly decreasing". All these mutants were found to affect the residues in the 293 TMPRSS2/S protein binding interface and in direct contact with the virus domain S1/S2 294 residues. 295 Therefore, this structure analysis allowed us to identify four missense variants E389A, 296 K392Q, T393S and Q438E "directly increasing" the interaction affinity and 2 missense 297 variants R470I and Y416C "directly decreasing" it. 298 299 Figure 3 . Secondary structure of the catalytic domain of TMPRSS2 protein (shown in 300 blue) and the domain S1/S2 of the SARS-CoV2 (shown in orange). The mutated amino 301 acids that "directly increasing" are colored in magenta and those "directly decreasing" 302 are presented as green spheres. 303 The complexes of the binding site (255-492) of TMPRSS2 and missense variants with the 305 spike protein were subject of MD simulation studies over a period of 120 ns to 306 understand their stability and study the structural consequences of these substitutions. 307 To determine the stability and mechanistic aspects of the wild type and mutant 308 complexes, hydrogen bond interactions, RMSD, RMSF, Rg and their binding profiles 309 were analyzed and discussed below. 310 The RMSD is usually used to measure the protein drift from a reference structure, to 312 study the residue behavior of the protein during the simulations and to describe the 313 dynamic stability of systems as it measures the global fluctuations of proteins or 314 complexes. It reflected the mobility of an atom during the MD simulation trajectory. As 315 a result, a higher residue RMSD value suggests higher mobility; inversely, a lower residue 316 RMSD value suggests lower mobility. Therefore, the RMSD analysis was carried out for 317 the MD simulations of each system to determine the change in the overall stability of 318 the protein after mutation, more specifically to understand the effect of TMPRSS2 319 missense variants on the stability of complexes. In addition, we compared the RMSD of 320 the wild type and mutants TMPRSS2/S protein complexes to the free forms of receptor 321 ( Figure S1 ) and the wild type complex to mutants TMPRSS2/S protein complexes (Figure 322 complexes. Data are displayed in Figure S4 . 343 Analysis of the hydrogen bonds (HB) during ligand binding is another important factor 345 that influences protein stability, it has a significant role to strengthen protein-protein 346 interactions. To elucidate how the mutations affect the TMPRSS2 and viral protein 347 interaction at molecular level, the dynamics of hydrogen bonds of each system is 348 displayed in Figure S5 suggested that TMPRSS2 DNA polymorphisms were likely to be associated with 382 susceptibility to COVID-19 and would contribute to differences in SARS-CoV2 infection. 383 In the other hand, recent studies showed that ACE2 genetic variation is very rare in the obtained. After modelling the TMPRSS2 3D structure using I-TASSER, we have 397 compared the catalytic domain of our structure to the one recently deposited in PDB 398 (code PDB: 7MEQ), which shows a strong similarity with an RMSD value equal to 0.705 399 Å, then we predicted the binding site with the S1/S2 domain of the S protein. In a further 400 step, we focused only on the missense variants whose spatial position is at the level of 401 the binding site in order to identify those that are able to modify the interaction affinity 402 in a direct way with the S protein. 403 In the present study, we performed an in-silico analysis of the SNP variants localized at 404 the binding loops. Those SNPs can be directly involved in the alteration of interaction 405 affinity based on molecular docking to obtain the complexes of TMPRSS2. We also 406 selected missense variants with the viral protein to predict the interaction affinity 407 between the two partners, followed by a structure function analysis to identify the key 408 bonds of interaction. In a last step, we carried out a MD study for the wild-type complex 409 and variants that have the potential to alter the interaction affinity between TMPRSS2 and the S protein with aiming to validate the previous results and identify missense Imputation of exome sequence variants into population-based samples and 466 blood-cell-trait-associated loci in African Americans: NHLBI GO Exome Sequencing 467 Project Targeting TMPRSS2 in SARS-CoV-2 infection Status of TMPRSS2-ERG fusion in prostate cancer patients from India: correlation 473 with clinico-pathological details and TMPRSS2 Met160Val polymorphism Proteolytic activation of influenza viruses by serine proteases TMPRSS2 and HAT from 477 human airway epithelium The protein data bank Efficient approximation 481 of ligand rotational and translational entropy changes upon binding for use in MM-PBSA 482 calculations Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia 486 in Wuhan, China: a descriptive study Structure, function 490 and variants analysis of the androgen-regulated TMPRSS2, a drug target candidate for COVID-491 19 infection Pymol: An open-source molecular graphics tool. CCP4 Newsletter 493 on protein crystallography BFEE: A user-495 friendly graphical interface facilitating absolute binding free-energy calculations The MM/PBSA and MM/GBSA methods to estimate 498 ligand-binding affinities In silico exploration of small-molecule α-helix mimetics as inhibitors of 501 SARS-COV-2 attachment to ACE2 G_mmpbsa -a GROMACS tool for high-throughput MM-510 PBSA calculations The genomeaggregationdatabase 512 (gnomAD) Type II transmembrane serine protease gene variants associate with 515 breast cancer Enhanced isolation of SARS-CoV-2 by TMPRSS2 expressing cells SIFT: Predicting amino acid changes that affect protein 522 function First comprehensive 525 computational analysis of functional consequences of TMPRSS2 SNPs in susceptibility to 526 SARS-CoV-2 among different populations Scalable molecular dynamics with NAMD CADD: 532 predicting the deleteriousness of variants throughout the human genome 1000 Genomesproject 535 25-Stawiski Assessment 539 of risk conferred by coding and regulatory variations of TMPRSS2 and CD26 in susceptibility 540 to SARS-CoV-2 infection in human Flexibility and binding affinity in protein-ligand, 542 protein-protein and multi-component protein interactions: limitations of current computational 543 approaches Genetic variants in TMPRSS2 and Structure of SARS-CoV-2 spike 548 glycoprotein and TMPRSS2 complex SWISS-MODEL: homology 551 modeling of protein structures and complexes I-TASSER server for protein 3D structure prediction The results reported in this study show a remarkable change in the interaction affinity 423 of the missense variants with the spike protein compared to the native protein and 424suggest that these missense variants may be directly involved in the modification of 425 interaction affinity between the human TMPRSS2 protein and SARS-CoV2. The binding 426 free energy of the 6 SNP variants is higher than that of the native one but the two variants 427