key: cord-0892311-bcgjwxms authors: Escalante, Diego E.; Wang, Austin; Ferguson, David M. title: Structural Modeling of the TMPRSS Subfamily of Host Cell Proteases Reveals Potential Binding Sites date: 2021-06-15 journal: bioRxiv DOI: 10.1101/2021.06.15.448583 sha: 29750af50213c2f0be6b7c46de884b38a1278460 doc_id: 892311 cord_uid: bcgjwxms The transmembrane protease serine subfamily (TMPRSS) has at least eight members with known protein sequence: TMPRSS2, TMPRRS3, TMPRSS4, TMPRSS5, TMPRSS6, TMPRSS7, TMPRSS9, TMPRSS11, TMPRSS12 and TMPRSS13. A majority of these TMPRSS proteins have key roles in human hemostasis as well as promoting certain pathologies, including several types of cancer. In addition, TMPRSS proteins have been shown to facilitate the entrance of respiratory viruses into human cells, most notably TMPRSS2 and TMPRSS4 activate the spike protein of the SARS-CoV-2 virus. Despite the wide range of functions that these proteins have in the human body, none of them have been successfully crystallized. The lack of structural data has significantly hindered any efforts to identify potential drug candidates with high selectivity to these proteins. In this study, we present homology models for all members of the TMPRSS family including any known isoform (the homology model of TMPRSS2 is not included in this study as it has been previously published). The atomic coordinates for all homology models have been refined and equilibrated through molecular dynamic simulations. The structural data revealed potential binding sites for all TMPRSS as well as key amino acids that can be targeted for drug selectivity. The Type II Transmembrane Serine Protease (TTSP) family consists of 4 different subfamilies that are differentiated by specific domains: the HAT/DESC subfamily, the Hepsin/TMPRSS subfamily, the Matriptase subfamily, and the Corin subfamily. 1 The proteins are created as single-chain, inactive proenzymes that are activated by cleaving the basic amino acid residue, arginine or lysine, within the conserved activation motif between the pro-domain and catalytic domain. 1, 2 Once activated, they remain associated with the membrane through disulfide bonds between the catalytic and transmembrane domains. 1 There are currently 14 different TMPRSS proteins : TMPRSS2, TMPRSS3, TMPRSS4, TMPRSS5, TMPRSS6, TMPRSS7, TMPRSS9, TMPRSS11A, TMPRSS11B, TMPRSS11D, TMPRSS11E , TMPRSS11F, TMPRSS12, and TMPRSS13. 3 These proteases have been found to be in multiple organs and tissues throughout the body. [4] [5] [6] [7] [8] [9] [10] This subfamily has a diverse set of physiological functions like homeostasis and proteolytic cascades, and while not all functions have been found, their involvement in various pathogenicities is becoming apparent as shown in Table 1 below. 11 Several inhibitors have also been identified including aprotinin, camostat, leupeptin, AEBSF, nafamostat, and gabexate. 12 Table 1 Known roles of TMPRSS proteins. Recently, the Hepsin/TMPRSS subfamily has recently received considerable attention not only due to the potential role these enzymes play in the development of cancers but to the function TMPRSS2 plays in mediating the virulence of SARS-CoV-2. The TMPRSS2 protein and ACE2 receptor are key elements of the main pathway by which SARS-CoV-2 (and related CoV pathogens) are internalized by lung cells. 12 The virus uses TMPRSS2 to cleave the Spike protein at a specific site (defined by Arg255 and Ile256) to activate membrane insertion. 12 ACE2 recruits the Spike protein to the host cell surface through specific interaction with the receptor binding domain of the Spike protein. 12 The Spike protein contains two functional subunits: the S1 subunit that allows binding of the virus to the host cell surface receptor and the S2 subunit that allows the fusion of the viral and cellular membranes. 13, 14 In vivo studies have shown TMPRSS2-knockout mice with down-regulated TMPRSS2 show less severe lung pathology as compared to controls when exposed to SARS-CoV-2. 15, 16 Additional work has shown that TMPRSS2 is the main processing enzyme for virus entry in lung cells. 17 TMPRSS2 has therefore emerged as a primary target for the design and discovery of drugs for treating SARS-CoV-2 infections. An excellent starting point in the search for potent drugs is camostat. This compound is clinically approved for use in treating pancreatitis in Japan and is a known inhibitor of TMPRSS2. While camostat has shown efficacy in preventing mice infected with SARS-CoV from dying, 12 it is a pan-trypsin-like serine protease inhibitor and is not highly selective for TMPRSS2. Similarly, there are no known or reported small molecules that are highly selective for any of the other member of the TMPRSS family. The main reason for this is the lack of crystal structures of the TMPRSS protein subfamily. Any structural information on these proteases could lead to the development of drugs more selective than the current alternatives. This study extends prior work on camostat bound to TMPRSS2 to identify common elements of recognition across the TMPRSS sub-family. Homology Modeling . The 14 proteins (TMPRSS2, TMPRSS3, TMPRSS4, TMPRSS5, TMPRSS6, TMPRSS7, TMPRSS9, TMPRSS11A, TMPRSS11B, TMPRSS11D, TMPRSS11E, TMPRSS11F, TMPRSS12, TMPRSS13) amino acid sequence was obtained from the UnitProt database (Gene ID: O15393, P57727, Q9NRS4, Q9H3S3, Q9DBI0, Q7RTY8, Q7Z410, Q6ZMR5, Q86T26, O60235, Q9UL52, Q6ZWK6, Q86WS5, Q9BYE2, respectively), and the crystal structures with high sequence identity, available in the Protein Databank (PDB), were retrieved through the BLASTp algorithm. The differences between proteins will be discussed in the results section. Across the 14 receptors, 25 crystal structures were used as templates to build the TMPRSS protein models. Of the 25 crystal structures, the 3 notable crystals structures were a urokinase-type plasminogen activator with a position 190 alanine mutation (1O5E), DESC1, part of the TTSP family, (2OQ5), and a human plasma kallikrein (6O1G) each with a known ligand included with each crystal structure. The 3 ligands included were 6-Chloro-2-(2-Hydroxy-Biphenyl-3-yl)-1H-indole-5-carboxamidine, benzamide, and 7SD from 1O5E, 2OQ5, and 6O1G, respectively, and were kept in the homology models created. The Pfam database was used to obtain the amino acid region corresponding to the trypsin domain for each respective TMPRSS protein. Found through the Pfam database, all the crystal structures were truncated to only their trypsin domains. The Schrodinger prime homology model suite program was used to align each sequence and identify any conserved secondary structure assignments. To construct a homology model for each TMPRSS TMPRSSxhm with x being the specific name of the protein. The models were refined through a process of short molecular dynamics (MD) simulations that will be described below. Molecular Dynamics. All of the molecular dynamic simulation stages were completed by using the SANDER.MPI function of the AMBER 18 software package unless otherwise stated 18 . The ligand structures were pre-processed with the Antechamber package to assign AM1-BCC partial charges. The ligand and enzyme structures were processed using LEaP to assign ff15ipq 19 or gaff 20 and ff14SB 21 force field parameters for the enzyme and ligand, respectively. All the complexes were submerged within a periodic box of TIP3P water with a 10Å buffer region. Each molecule was initially minimized by using the steepest descent method for 100 steps followed by 9900 steps of conjugate gradient minimization. Then, a stepwise heating procedure occurred where the system temperature was slowly ramped from 0K to 300K over 15,000 steps and then relaxed over 5,000 steps to where the average temperature was kept constant at 300k using a weak-coupling algorithm. For both heating stages, the position of all the nonsolvent atoms were restrained with a harmonic potential with a force constant of 25 kcal/mol-Å. After that, a two-step procedure was performed in which the average pressure was maintained at 1bar for 20,000 steps through pressure relaxation time of 0.2ps and the Berendsen barostat. A harmonic potential with a force constant of 5 kcal/mol-Å restrained the position of all non-solvent atoms. A relaxation stage of 20,000 steps followed in which the pressure relaxation time was increased to 2ps, and the only position of the receptor C was restrained by a harmonic potential with a force constant of 0.5 kcal/mol-Å. Lastly, all the production runs were carried out without any restraints and were kept at a consistent 300K temperature and 1bar pressure. All fifteen sequences have a high enough degree of identity to template models (>40%), as shown in Figure 1 . This allowed us to build homology models for all members of the TMPRSS family. All of the homology models were subject to loop refinement using the Schrodinger refinement package. The refined structures were equilibrated in a water box at 300K and 1atm as described in the methods section. All the TMPRSS structures rapidly equilibrated and their root-mean-squared-deviation plateaued at an average of 1.5-1.6Å. The equilibrated atomic coordinates are provided in the supplementary information section. A structural alignment of all equilibrated TMPRSS' is shown in Figure 2 . Most of the tertiary structure is conserved by all members of this family (i.e. the alpha-helices and beta-sheets). However, the loop found between residues 50-70 is very dissimilar, which is a result of the low sequence identity in this region. Furthermore, all TMPRSS proteins share a conserved histidine, aspartic acid, and serine that form the catalytic triad as shown in Supplementary Table 2 . While the catalytic triad appears to be in different points across the amino acid sequences, the histidine, aspartic acid, and serine are all located at similar points in structural space within the trypsin region. From the structural alignment it is observed that the catalytic triad is also conserved in geometry, as shown in Figure 3 . -NNPWH TMPRSS3 1 IVGGNM-SLLSQWPWQASLQFQ-G----YHLCGGSVITPLWIITAAHCVYD-L--YLPKS TMPRSS4 1 VVGVEE-ASVDSWPWQVSIQYD-K----QHVCGGSILDPHWVLTAAHCFRK-H--TDVFN TMPRSS5 1 IVGGQS-VAPGRWPWQASVALG-F----RHTCGGSVLAPRWVVTAAHCMHS-FRLARLSS TMPRSS6 1 IVGGAV-SSEGEWPWQASLQVR-G----RHICGGALIADRWVITAAHCFQE-DSMASTVL TMPRSS7 1 IIGGTD-TLEGGWPWQVSLHFV-G----SAYCGASVISREWLLSAAHCFHG-NRLSDPTP TMPRSS9 1 IVGGME-ASPGEFPWQASLREN-K----EHFCGAAIINARWLVSAAHCFNE-FQ--DPTK TMPRSS9_2 1 VVGGFG-AASGEVPWQVSLKEG-S----RHFCGATVVGDRWLLSAAHCFNH-TK---VEQ TMPRSS11A 1 Contact interaction diagram between GBPA and TMPRSS2 active site. The anchoring residues near the active site are highlighted in yellow, the catalytic triad is highlighted in red and the lysine distal anchor is highlighted in blue. shown in Figure 6 . Pocket A has two critical positions that vary between TMPRSS proteins. An analysis of Figure 1 shows TMPRSS2 has an uncharged, polar amino acid residue in position 181, which is The alignments and structural analyses have shown there is significant variation across the TMPRSS family of protein to exploit in the design of selective inhibitors. Of the potential pockets and positions that influence binding, pockets A, B, C, and D have differences in the primary sequence that may drive ligand binding and selectivity. From these specific regions, differences in position 87 in pocket D may be important for ligand binding. Prior work has shown that lys87 may be a key anchor point for the carboxylate group of GBPA (the active metabolite of camostat). Only TMPRSS2 and TMPRSS4 share the conserved lysine at position 87. This is interesting because the Spike protein is known to be cleaved by both TMPRSS2 and TMPRSS4 to gain cell entry. Targeting this position may therefore provide an approach to selectively inhibit these two enzyme and reduce off target binding of camostat. Pocket A also contains an area of dense negative charges through its primary sequence. This allows for a strong ionic interaction between ligands containing positively charged amino acids and TMPRSS proteins. Because of this interaction, current ligands with large bulky positively charged regions, like nafamostat, can form strong ionic interactions with pocket A. Within the TTSP family, members of the TMPRSS subfamily have been implicated in disease progression involving cancer and infectious diseases such as COVID-19. The connection between Spike protein processing, disease progression, and TMPRSS2 activity has sparked great interest in the repurposing of drugs such as camostat and nafamostat for COVID-19 drug therapy. The alignments presented and analysis performed has shown there are significant differences in amino acid residues within the ligand binding pockets of the TMPRSS family of enzymes. Since drug selectivity and off target binding can have dramatic consequences on adverse side effects of drugs the data given here is designed to help identify unique epitopes in TMPRSS2 that can be exploited in the design of more selective agents. One of the more interesting results of this study is the identification of lys87 as a potential selectivity element for inhibiting TMPRSS2 and TMPRSS4. This lysine is only conserved in these two family members which are both enzymes that have been shown to process the SARS-CoV Spike protein. The ability to selectively block TMPRSS2,4 and avoid off target binding to other TMPRSS proteins would provide significant advantages in the development of better therapeutic agents that block protease-mediate cleavage of the Spike protein. One problem that is not easily addressed is the conserved nature of the polar-anionc region in pocket A. Drugs like nafamostat that contain a bulky cationic group are recognized across all members that display this motif. This, may in part explain the promiscuous binding of compounds showing pan-trypsin protease activity. These results suggest it may not be possible to improve the selectivity of these agents for repurposing in the design of COVID-19 agents. Supplemental Table 2 Comparison of TMPRSS Proteins showing the catalytic triad residues, peptidase and trypsin regions. The percent identity and percent positives was calculated relative to TMPRSS2. The atomic coordinates for all the generated homology models can be found as PDB files in the supplementary .zip folder. 100 100 Type Ii Transmembrane Serine Proteases Membrane-Anchored Serine Proteases in Health and Disease. Progress in molecular biology and translational science Uniprot: A Hub for Protein Information Polyserase-I, a Human Polyprotease with the Ability to Generate Independent Serine Protease Domains from a Single Translation Product Expression of Transmembrane Serine Protease Tmprss2 in Mouse and Human Tissues Tmprss2, a Serine Protease Expressed in the Prostate on the Apical Surface of Luminal Epithelial Cells and Released into Semen in Prostasomes, Is Misregulated in Prostate Cancer Cells Evidence That Tmprss2 Activates the Severe Acute Respiratory Syndrome Coronavirus Spike Protein for Membrane Fusion and Reduces Viral Control by the Humoral Immune Response Sars-Cov-2 Receptor Ace2 Is an Interferon-Stimulated Gene in Human Airway Epithelial Cells and Is Detected in Specific Cell Subsets across Tissues Tmprss2 and Tmprss4 Promote Sars-Cov-2 Infection of Human Small Intestinal Enterocytes A Novel Transmembrane Serine Protease (Tmprss3) Overexpressed in Pancreatic Cancer Type Ii Transmembrane Serine Protease (Ttsp) Deregulation in Cancer Tmprss2: A Potential Target for Treatment of Influenza Virus and Coronavirus Infections Interaction between Heptad Repeat 1 and 2 Regions in Spike Protein of Sars-Associated Coronavirus: Implications for Virus Fusogenic Mechanism and Identification of Fusion Inhibitors. The Lancet Human Coronavirus: Host-Pathogen Interaction Tmprss2 Contributes to Virus Spread and Immunopathology in the Airways of Murine Models after Coronavirus Infection Sars-Cov-2 Cell Entry Depends on Ace2 and Tmprss2 and Is Blocked by a Clinically Proven Protease Inhibitor Further Along the Road Less Traveled: Amber Ff15ipq, an Original Protein Force Field Built on a Self-Consistent Physical Model Automatic Atom Type and Bond Type Perception in Molecular Mechanical Calculations Ff14sb: Improving the Accuracy of Protein Side Chain and Backbone Parameters from Ff99sb Structural Modeling and Analysis of the Sars-Cov-2 Cell Entry Inhibitor Camostat Bound to the Trypsin-Like Protease Tmprss2 Supplemental Figure 1 TMPRSS2 Potential Pockets for ligand binding.