key: cord-0786876-1pwxh3ln
authors: Ohno, Ayako; Maita, Nobuo; Tabata, Takanori; Nagano, Hikaru; Arita, Kyohei; Ariyoshi, Mariko; Uchida, Takayuki; Nakao, Reiko; Ulla, Anayt; Sugiura, Kosuke; Kishimoto, Koji; Teshima-Kondo, Shigetada; Nikawa, Takeshi; Okumura, Yuushi
title: Crystal structure of inhibitor-bound human MSPL that can activate high pathogenic avian influenza
date: 2020-07-06
journal: bioRxiv
DOI: 10.1101/2020.06.12.149229
sha: db09d4917794f324bb328d9aeb7fbb6c715cd594
doc_id: 786876
cord_uid: 1pwxh3ln

Influenza virus infection is triggered when the viral surface envelope glycoprotein hemagglutinin (HA) is cleaved by host cell proteases of the transmembrane protease serine (TMPRSS) family. The extracellular regions of TMPRSS2, -3, -4, and MSPL are composed of LDLA, SRCR, and SPD domains. MSPL can cleave the consensus multibasic (R-X-X/R-R) and monobasic (Q(E)-T/X-R) motifs on HA, whereas TMPRSS2 and -4 cleave only monobasic motifs. To better understand HA cleavage mediated by MSPL, we solved the crystal structure of the extracellular region of human MSPL in complex with a furin inhibitor. The structure revealed that the three domains are gathered around the C-terminal α-helix of the SPD domain. In the bound furin inhibitor, the side chain of P1-Arg inserts into the highly conserved S1 pocket, whereas the side chain of P2-Lys interacts with the Asp/Glu-rich 99-loop that is unique to MSPL. We also constructed a homology model of TMPRSS2, which has been identified as an initiator of SARS-CoV-2 infection. The model suggests that TMPRSS2 is better suited to Ala/Val residues than to Lys/Arg residues at the P2 site.

Mosaic serine protease large form (MSPL), also known as TMPRSS13, was originally identified from a human lung cDNA library and is a member of the type II transmembrane serine proteases (TTSPs) (1, 2). TTSPs comprise a transmembrane domain near the N-terminus and a catalytic serine protease domain at the C-terminus.
Human MSPL is expressed in lung, placenta, pancreas, and prostate (1). Little is known about the biological function of human MSPL, although it has been reported to cleave the spike proteins of porcine epidemic diarrhea virus (PEDV) (3) and of MERS- and SARS-CoV (4), influenza virus hemagglutinin (5), and pro-hepatocyte growth factor (6). TTSPs share a similar overall organization comprising an N-terminal cytoplasmic domain, a transmembrane region, and stem and catalytic domains at the C-terminus (7). All TTSPs are synthesized as single-chain zymogens and are subsequently activated into two-chain active forms by cleavage within a highly conserved activation motif. The two chains are linked by a disulfide bridge, so that activated TTSPs remain bound to the cell membrane (8). The catalytic domain contains a highly conserved catalytic triad of three amino acids (His, Asp, and Ser). A separate conserved Asp lies at the bottom of the S1 substrate-binding pocket, so that substrates with Arg or Lys at the P1 position are preferred.

Based on similarities in domain structure, the serine protease domain, and chromosomal location, TTSPs are classified into four subfamilies: hepsin/TMPRSS, matriptase, HAT/DESC, and corin (7, 9). MSPL belongs to the hepsin/TMPRSS subfamily. In this subfamily, hepsin and spinesin contain a single scavenger receptor cysteine-rich repeat (SRCR) domain in the stem region, whereas MSPL, TMPRSS2, -3, and -4 contain a low-density lipoprotein receptor A (LDLA) domain next to the single SRCR domain in the stem region (9). The SRCR domain contains approximately 100-110 amino acids that adopt a compact fold consisting of a curved β-sheet wrapped around an α-helix, and it is stabilized by 2-4 disulfide bonds. Depending on the number and position of the cysteine residues, SRCR domains have been divided into three subclasses (groups A, B, and C) (10).
By contrast, the canonical LDLA domain is approximately 40 amino acids long and contains six conserved cysteine residues that form disulfide bonds. The LDLA domain also contains a calcium ion coordinated by six highly conserved residues near the C-terminus. Together, the disulfide bonds and calcium binding stabilize the overall structure of the LDLA domain (11).

It was recently reported that TMPRSS2, -4, and MSPL are involved in influenza virus infection by cleaving the glycoprotein hemagglutinin (HA) on the viral surface (4, 5, 12, 13, 14). Specifically, these proteases cleave HA into the HA1 and HA2 subunits. Proteolytic cleavage of HA is essential for influenza virus infection: HA1 mediates host cell binding and the initiation of endocytosis, and HA2 controls viral-endosomal fusion (15). To date, two main HA processing consensus motifs have been identified in influenza viruses. One is the single-basic motif (Q(E)-T/X-R) found in human seasonal influenza viruses, which contains a single arginine at the cleavage site. The other is the multiple-basic-residue motif (R-X-X/R-R and K-K/R-K/T-R) found in highly pathogenic avian influenza viruses, which contains several basic amino acids at the cleavage site. TMPRSS2 and -4 recognize the single-basic motif, whereas MSPL recognizes both the single-basic and the multiple-basic-residue motifs (5). However, it is not clear why only MSPL is able to recognize the multiple-basic-residue motif. The multibasic motif is also known to be recognized by the ubiquitously expressed furin and by proprotein convertases (PCs) 5/6 in the trans-Golgi network (16). A previous study showed that the enzymatic activity of MSPL is inhibited by decanoyl-RVKR-cmk, which mimics the furin substrate (5). To date, the extracellular region of hepsin is the only structure reported among the hepsin/TMPRSS family of proteins (17).
The crystal structure of hepsin revealed that the SRCR domain is located on the side opposite the active site of the SPD, and that these domains are splayed apart. Because hepsin lacks the LDLA domain, the relative orientation of the LDLA, SRCR, and SPD domains in other members of the hepsin/TMPRSS family, such as MSPL, is still unknown. To elucidate the spatial arrangement of the three domains and the basis of substrate specificity, we determined the crystal structure of the extracellular region of MSPL in complex with the decanoyl-RVKR-cmk peptide at 2.6 Å resolution. To our surprise, the spatial arrangement of the SRCR and SPD domains in MSPL is markedly different from that in hepsin. The complex structure explains how MSPL is able to recognize both the single- and multiple-basic-residue motifs. In addition, we constructed a homology model of TMPRSS2, which is a key protease in SARS-CoV-2 infection. The model was used to investigate the target sequence preference at the S1/S2 site of the SARS-CoV-2 spike protein.

The extracellular region of hMSPL is composed of an LDLA domain (residues 198-221), an SRCR domain (residues 222-313), and a serine protease domain (residues 321-556) (Fig. 1A). We expressed and purified the extracellular region (residues 187-586) of hMSPL and crystallized the protein with decanoyl-RVKR-cmk, which is known as a furin inhibitor. Diffraction data were collected at the Photon Factory AR-NE3a (Tsukuba, Japan), and the structure was solved to a resolution of 2.6 Å (Fig. 1B). This is the first published structure of an LDLA-containing hepsin/TMPRSS subfamily protein. The refined model contains hMSPL residues 188-558 (except 319 and 320), decanoyl-RVKR-cmk, and a calcium ion. Glycans attached to residues Asn250 and Asn400 were observed, but no phosphorylated residues were found (18).
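As a reading aid, the domain boundaries quoted above can be written as a small lookup table. The following is a minimal Python sketch and not part of the original analysis: the names `HMSPL_DOMAINS` and `domain_of` are hypothetical, and the residue ranges are simply those stated in the text.

```python
from typing import Optional

# Domain boundaries of the hMSPL extracellular region, as stated in the text
# (inclusive residue numbers). The names used here are illustrative only.
HMSPL_DOMAINS = {
    "LDLA": (198, 221),
    "SRCR": (222, 313),
    "SPD": (321, 556),
}


def domain_of(residue: int) -> Optional[str]:
    """Return the domain containing `residue`, or None if it lies outside all three."""
    for name, (start, end) in HMSPL_DOMAINS.items():
        if start <= residue <= end:
            return name
    return None


if __name__ == "__main__":
    # The two glycosylated asparagines reported in the refined model.
    for res in (250, 400):
        print(f"Asn{res}: {domain_of(res)}")
```

With these ranges, Asn250 falls in the SRCR domain and Asn400 in the SPD, while residues 314-320, which lie between the SRCR domain and the SPD, are not assigned to any domain.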
The extracellular region of hMSPL is composed of the non-catalytic portion in the N-terminal region (LDLA and SRCR domains) and the catalytic part at the C-terminus (Fig. 1B). The three domains are linked to each other by disulfide bonds. hMSPL is activated by hydrolytic cleavage at Arg320-Ile321, and residues in the 321-581 region are converted to the mature SPD (5). Ile321 is located in a pocket where its N atom interacts with Asp505 (Fig. S1A). Therefore, this structure could represent a mature form in which hMSPL has been processed by an intrinsic protease during expression in the cell. The LDLA domain of hMSPL is 24 residues in length and is composed of two turns and a short α-helical region. A canonical LDLA domain has an N-terminal antiparallel β-sheet and three disulfide bonds (11). Therefore, the LDLA domain of MSPL lacks half of the canonical N-terminal region. Since the SRCR domain of MSPL has only two disulfide bonds, it does not belong to either group A or group B (19). Intriguingly, the 3D structures of the SRCR domains of MSPL and hepsin are very similar despite their low sequence homology (23% sequence identity), suggesting that the SRCR domain of MSPL belongs to group C (10).

To date, hepsin (PDB entry: 1P57) is the only protein in the same TTSP subfamily with an available 3D structure. However, hepsin lacks the LDLA domain. Here, we compared the structures of hMSPL and hepsin ( There are only three residues between the transmembrane domain and the N-terminal Thr188 residue of our structural model. Hence, the extracellular region of MSPL must be located very close to the plasma membrane. Indeed, the region predicted to be close to the plasma membrane is enriched in basic residues, such as Arg191, Lys193, Lys213, Lys215, and Arg556 (Fig. 2C). The extracellular region of hepsin is also thought to lie flat against the plasma membrane (17).
Hence, both MSPL and hepsin may bind substrate in close proximity to the transmembrane region. However, the extracellular region of MSPL is oriented in the opposite way to that of hepsin.

As expected, the SPD of MSPL displays the conserved architecture of the trypsin- and chymotrypsin-like (S1 family) serine proteases (Fig. 1B). In activated MSPL, Ile321 at the cleavage site forms a salt bridge with the conserved Asp505 residue located immediately before the catalytic Ser506 residue (Fig. S1A). This interaction might be generated by the activating cleavage. Formation of the S1 pocket and the oxyanion hole comes about via a conformational change in the nearby hairpin loop (Fig. 3). This salt bridge was also observed in other proteases such as plasma kallikrein (20) (PDB entry: 1Z8G) and hepsin.

The furin inhibitor peptide binds to the SPD of MSPL through P1-Arg, P2-Lys, the C-terminal cmk (chloromethylketone; an active-site-directed group), and the N-terminal decanoyl group (Fig. 1C, 3). Covalent interaction between the furin inhibitor and the catalytic residues (His361, Ser506) occurs via nucleophilic attack on the cmk moiety. P1-Arg inserts into the deep S1 pocket, and its carbonyl oxygen atom binds directly to the backbone amides of the oxyanion hole (Gly504 and Ser506). In addition, the guanidino group of P1-Arg forms a salt bridge with the side chain of Asp500, as well as hydrogen bonds with the side chain of Ser501 and the backbone carbonyl of Gly529. Asp500 is located at the bottom of the S1 pocket. These residues are highly conserved among the hepsin/TMPRSS subfamily (Fig. 4). The interaction between P1-Arg and MSPL is characteristic of trypsin- and chymotrypsin-like serine proteases. By contrast, P2-Lys interacts with residues in the so-called 99-loop (chymotrypsinogen numbering), which contains the catalytic residue Asp409. The Nζ of P2-Lys forms five hydrogen bonds: with the backbones of Asp403 and Glu405, the side chains of Tyr401 and Asp406, and a water molecule.
This water molecule also mediates hydrogen bonds with the side chains of Asp406 and the catalytic Asp409 residue. Interestingly, with the exception of the catalytic Asp409, the residues that interact with the side chain of P2-Lys are not conserved among the hepsin/TMPRSS subfamily (Fig. 4, cyan dot). This may explain why other TMPRSSs and hepsin recognize the single-basic motif but not the di-basic motif.

The crystal structure of the furin inhibitor in complex with mouse furin has been determined (21). Although furin has the same Ser-His-Asp catalytic triad as MSPL, its catalytic domain belongs to the superfamily of subtilisin-like serine proteases (22). The catalytic domain of furin therefore has a different overall fold from that of MSPL, which belongs to the trypsin- and chymotrypsin-like (S1 family) serine proteases. Despite this difference in overall fold, the inhibitor peptide (decanoyl-RVKR-cmk) can bind to both enzymes. We therefore compared the structure of the MSPL-bound furin inhibitor with that of the furin-bound inhibitor (Fig. 5). Except for P1-Arg, the two inhibitor conformations do not superimpose. In the MSPL:furin inhibitor complex, the inhibitor bends at P3-Val. By contrast, in the furin:furin inhibitor complex, the inhibitor adopts an extended conformation. As a consequence, the P1, P2, and P4 residues contact furin, whereas the P3 residue is directed away from it. Nonetheless, the structural differences between furin and MSPL do not prevent the inhibitor from binding to both proteins.

In 2020, the SARS-CoV-2 pandemic killed over 0.5 million people (https://ourworldindata.org/covid-deaths) and resulted in a worldwide recession as people were forced to socially distance. In the early stage of infection, the spike protein of SARS-CoV-2 is cleaved by human TMPRSS2 and converted to the infectious form (23, 24). To date, the structure of TMPRSS2 has not been reported.
To investigate the structural features of TMPRSS2, we constructed a homology model (Fig. 6) using MSPL as the template. Eight out of nine disulfide bonds are conserved (Fig. 4), and the relative domain arrangement of TMPRSS2 is similar to that of MSPL. However, the SPD, specifically the β12-β13 loop region, displays significant differences (Fig. 6). These structural changes result in a wide substrate-binding groove, so that TMPRSS2 may capture the target peptide more readily. Furthermore, Glu404, an important residue for P2-Lys recognition in MSPL, is replaced by Lys225 in TMPRSS2 (Fig. 4, 6B). As mentioned earlier, this substitution leads to the preference of TMPRSS2 for monobasic targets. In fact, the S1/S2 cleavage site of the SARS-CoV-2 spike protein is reported to have Ala rather than a basic residue at the P2 position (25, 26, 27). In summary, our homology model reflects the features of target peptide recognition by TMPRSS2.

Our structure also helps to predict the tertiary structure of TMPRSS3, whose gene is responsible for autosomal recessive nonsyndromic deafness. Mutations identified in patients with this syndrome were mapped onto a homology model of TMPRSS3 to better understand the disease. Seven missense TMPRSS3 mutants associated with deafness in humans (D103G, R109W, C194F, R216L, W251C, P404L, and C407R) were unable to activate the epithelial sodium channel (ENaC) (28, 29). One of these seven mutations, D103G, was found in the LDLA domain of TMPRSS3 (28, 30). Because Asp103 in TMPRSS3 corresponds to Asp221 in MSPL, the LDLA structure stabilized by calcium binding may be important for the function of the protein. Indeed, mutations in the LDLA and SRCR domains (D103G, R109W, and C194F), as well as in the SPD of TMPRSS3, affect its autoactivation by proteolytic cleavage at the junction between the SRCR domain and the SPD (30).
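The motif-based reasoning used in this section (a basic P1 residue in the S1 pocket, plus the question of whether further basic residues sit upstream of it) can be illustrated with a short Python sketch. It is a deliberate simplification and not the authors' method. The function name `classify_site`, the table `REPORTED_FOR_HA`, the rule "a basic P1 plus at least one basic residue at P4-P2 counts as multibasic", and the example peptide QTAR are all assumptions made for illustration; only RVKR (the inhibitor sequence) and the protease preferences in the table are taken from the text.

```python
BASIC = {"R", "K"}

# Proteases reported in the text to recognize each HA motif class.
REPORTED_FOR_HA = {
    "monobasic": ("MSPL", "TMPRSS2", "TMPRSS4"),
    "multibasic": ("MSPL",),
}


def classify_site(p4_to_p1: str) -> str:
    """Classify a four-residue P4-P3-P2-P1 segment (one-letter codes).

    Simplified rule (an assumption, not the paper's definition):
    - P1 must be Arg or Lys, otherwise the site is 'non-basic P1';
    - any additional Arg/Lys at P4, P3 or P2 makes the site 'multibasic';
    - otherwise the site is 'monobasic'.
    """
    segment = p4_to_p1.upper()
    if len(segment) != 4:
        raise ValueError("expected exactly four residues (P4 to P1)")
    upstream, p1 = segment[:3], segment[3]
    if p1 not in BASIC:
        return "non-basic P1"
    if any(res in BASIC for res in upstream):
        return "multibasic"
    return "monobasic"


if __name__ == "__main__":
    # RVKR is the inhibitor sequence from the text; QTAR is a hypothetical
    # peptide chosen to resemble the Q(E)-T/X-R monobasic motif.
    for peptide in ("RVKR", "QTAR"):
        kind = classify_site(peptide)
        print(peptide, kind, REPORTED_FOR_HA.get(kind, ()))
```

Under this rule RVKR is classified as multibasic and the hypothetical QTAR as monobasic. The sketch captures only sequence-level bookkeeping; it says nothing about the structural determinants, such as the 99-loop, discussed above.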
In this study, we have elucidated the structure of the extracellular domain of MSPL, the spatial arrangement of its three domains (LDLA, SRCR, and SPD), and the substrate sequence specificity of MSPL. These findings will be useful in designing novel anti-influenza drugs that prevent HPAI virus uptake into the host cell. MSPL also contributes to the cleavage and activation of the spike proteins of severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV) (4).

Soluble recombinant hMSPL was generated using a previously established stable cell line expressing hMSPL (5), and it accumulated in serum-free culture medium (SFCM). The peptide inhibitor (decanoyl-RVKR-cmk) was purchased from Merck-Millipore and Table S1. The structure of the complex was solved by the molecular replacement method using the program MolRep (33), with the SPD of human plasma kallikrein (PDB code: 2ANY), which shows the highest sequence identity score (46.1%), as the search model. The model of the SPD was manually fixed with COOT (34) and refined with Refmac5 (35). Once the SPD of MSPL was well refined, interpretable electron density for the unmodeled region became evident. The model of the LDLA and SRCR domains was then built manually. The final model contained one MSPL, one furin inhibitor, four sugars, 80 ions, and 65 water molecules, with R-work and R-free values of 18.5% and 25.1%, respectively. The refinement statistics are summarized in Table S1. In the MSPL-peptide inhibitor complex, some residues (the N-terminal 3xFLAG-tag and His187, Gly319, Arg320, and C-terminal Thr559-Val581) are missing due to disorder. All the structures in the figures were prepared using PyMOL (http://www.pymol.org/). The MSPL/peptide inhibitor interfaces were analyzed using LIGPLOT (36). The sequence alignment of the extracellular regions of MSPL and TMPRSS2 was obtained using the BLAST webserver (https://www.uniprot.org/blast/).
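A pairwise alignment of this kind is commonly summarized by its percent identity. As a hedged illustration, the Python sketch below computes that value from two gapped alignment rows of equal length. The function name `percent_identity` and the toy sequences are hypothetical, and the denominator convention (all alignment columns, including gapped ones) is an assumption; alignment tools may count positions differently.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of a pairwise alignment.

    Assumption: identity = identical, non-gap columns / total alignment columns.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    if not row_a:
        raise ValueError("alignment is empty")
    matches = sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)
    return 100.0 * matches / len(row_a)


if __name__ == "__main__":
    # Toy alignment rows (hypothetical, not MSPL or TMPRSS2 sequences).
    print(round(percent_identity("ACD-FG", "ACDEYG"), 1))
```

In the toy example four of the six columns are identical, giving roughly 66.7% identity.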
The amino acid identity between MSPL and TMPRSS2 was 39.8%, with a score of 704 and an E-value of 1.1e-86. The homology model of TMPRSS2 was built using MODELLER (37). Electrostatic surface potentials were calculated using the APBS server (http://server.poissonboltzmann.org/). The coordinates and structure factors of the MSPL-peptide inhibitor complex have been deposited in the RCSB Protein Data Bank (PDB code: 6KD5).

The authors have jointly contributed to project design, data analysis, and manuscript

(B) Electrostatic surface potential of MSPL and TMPRSS2 SPD. MSPL has a narrow groove that fits with the downstream peptide chain (green arrow). By comparison, in TMPRSS2 the groove is significantly wider, and the peptide binding site is bowl-shaped (cyan oval A). A positively-charged area derived from Lys225 is indicated in green oval B. The potential map is colored from red (-5kT/e) to blue (+5kT/e).

1. Cloning and expression of novel mosaic serine proteases with and without a transmembrane domain from human lung
2. MSPL/TMPRSS13
3. TMPRSS2 and MSPL Facilitate Trypsin-Independent Porcine Epidemic Diarrhea Virus Replication in Vero Cells
4. DESC1 and MSPL activate influenza A viruses and emerging coronaviruses for host cell entry
5. Novel Type II Transmembrane Serine Proteases, MSPL and TMPRSS13, Proteolytically Activate Membrane Fusion Activity of the Hemagglutinin of Highly Pathogenic Avian Influenza Viruses and Induce Their Multicycle Replication
6. TMPRSS13, a type II transmembrane serine protease, is inhibited by hepatocyte growth factor activator inhibitor type 1 and activates pro-hepatocyte growth factor
7. Type II transmembrane serine proteases in development and disease
8. Type II transmembrane serine proteases. Insights into an emerging class of cell surface proteolytic enzymes
9. Membrane-anchored serine proteases in vertebrate cell and developmental biology
10. Crystal structure of the cysteine-rich domain of scavenger receptor MARCO reveals the presence of a basic and an acidic cluster that both contribute to ligand recognition
11. Three-dimensional structure of a cysteine-rich repeat from the low-density lipoprotein receptor
12. MDCK cells that express proteases TMPRSS2 and HAT provide a cell system to propagate influenza viruses in the absence of trypsin and to study cleavage of HA and its inhibition
13. Proteolytic activation of the 1918 influenza virus hemagglutinin
14. TMPRSS4 is a type II transmembrane serine protease involved in cancer and viral infections
15. Influenza Virus-Mediated Membrane Fusion: Determinants of Hemagglutinin Fusogenic Activity and Experimental Approaches for Assessing Virus Fusion. Viruses, 4
16. Influenza virus hemagglutinin with multibasic cleavage site is activated by furin, a subtilisin-like endoprotease
17. The Structure of the Extracellular Region of
18. Phosphorylation of the type II transmembrane serine protease, TMPRSS13, in hepatocyte growth factor activator inhibitor-1 and -2-mediated cell-surface localization
19. The Scavenger Receptor Cysteine-Rich (SRCR) domain: an ancient and highly conserved protein module of the innate immune system
20. Structure of plasma and tissue kallikreins
21. The crystal structure of the proprotein processing proteinase furin explains its stringent specificity
22. Subtilases: the superfamily of subtilisin-like serine proteases
23. SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor
24. The insert sequence in SARS-CoV-2 enhances spike protein cleavage by TMPRSS
25. Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein
26. The spike glycoprotein of the new coronavirus 2019-nCoV contains a furin-like cleavage site absent in CoV of the same clade
27. A Multibasic Cleavage Site in the Spike Protein of SARS-CoV-2 Is Essential for Infection of Human Lung Cells
28. A novel TMPRSS3 missense mutation in a DFNB8/10 family prevents proteolytic activation of the protein
29. The cutting edge: membrane-anchored serine protease activities in the pericellular microenvironment
30. The transmembrane serine protease (TMPRSS3) mutated in deafness DFNB8/10 activates the epithelial sodium channel (ENaC) in vitro
31. iMOSFLM: a new graphical interface for diffraction-image processing with MOSFLM
32. How good are my data and what is the resolution?
33. Molecular replacement with MOLREP
34. Coot: model-building tools for molecular graphics
35. REFMAC5 for the refinement of macromolecular crystal structures
36. LIGPLOT: a program to generate schematic diagrams of protein-ligand interactions
37. Comparative protein modelling by satisfaction of spatial restraints
38. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice
39. ESPript/ENDscript: extracting and rendering sequence and 3D information from atomic structures of proteins
40. Structure validation by Cα geometry: ϕ, ψ and Cβ deviation

We thank the beamline staff at the PF-AR and SPring

The authors declare that they have no conflict of interest.