SPACA6 structure reveals a conserved superfamily of gamete fusion-associated proteins

Authors: Vance, Tyler D.R.; Yip, Patrick; Jiménez, Elisabet; Li, Sheng; Gawol, Diana; Byrnes, James; Usón, Isabel; Ziyyat, Ahmed; Lee, Jeffrey E.
Journal: bioRxiv (2022-03-23)
DOI: 10.1101/2022.03.23.484325

SPACA6 is a sperm-expressed surface protein that is critical for gamete fusion during mammalian sexual reproduction. Despite this fundamental role, little is known about how SPACA6 specifically functions. We determined the crystal structure of SPACA6 at 2.2-Å resolution, revealing a two-domain protein containing a four-helix bundle and an Ig-like β-sandwich connected via a quasi-flexible linker. Based on the structural analysis, we propose that SPACA6 is a founding member of a superfamily of gamete fusion-associated proteins, herein dubbed the IST superfamily. The IST superfamily is defined structurally by its distorted four-helix bundle and a pair of disulfide-bonded CXXC motifs. A structure-based search of the AlphaFold human proteome identified additional members of this superfamily; remarkably, many of these proteins are linked to gamete fusion. The SPACA6 structure and its connection to other IST-superfamily members provide a missing link in our knowledge of mammalian gamete fusion.

Significance Statement

SPACA6 is a human sperm protein vital for the fusion of gametes, though its exact function remains a mystery. We present the first solved structure of SPACA6: a two-domain fold composed of an Ig-like domain and a distorted four-helix bundle. Dali searches of the PDB and the AlphaFold database reveal a family of structurally related proteins, several of which are also known to play a role in gamete fusion; as such, SPACA6 is a founding member of a conserved protein superfamily, dubbed the IST superfamily.
Evolutionary analysis to identify functionally relevant structural elements in SPACA6 shows conserved flexibility between the two domains and several conserved surfaces that could function as protein-protein interfaces.

Every human life begins with two separate haploid gametes: a sperm from the father and an oocyte […]

Outside of fertilization, the process of fusing two lipid bilayers has been studied extensively. In general, membrane fusion is energetically unfavorable and requires protein catalysts that undergo conformational changes to draw two membranes close together, disrupt their continuity, and induce fusion (8, 9). These protein catalysts, dubbed fusogens, have been found in many fusion systems. They are necessary for viral entry into host cells (e.g., gp160 in HIV-1, spike in coronaviruses, hemagglutinin in influenza viruses) (10-12), the formation […]

Recently, several sperm-expressed proteins with phenotypes similar to those of IZUMO1 and JUNO have been discovered (20, 29-33). Sperm Acrosome Membrane-Associated protein 6 (SPACA6) was identified as essential for fertilization in a large-scale mutagenesis study in mice: transgene insertion into the Spaca6 gene produced sperm that penetrated the perivitelline space but were unable to fuse (34). Subsequent knockout studies in mice confirmed that Spaca6 is essential for gamete fusion (29, 31). SPACA6 is expressed almost exclusively in the testis and has a localization pattern similar to that of IZUMO1, i.e., within the inner membrane of sperm prior to the acrosome reaction, followed by relocation to the equatorial region […]

In the interest of better understanding the fundamental processes behind human sperm-egg fusion, and thereby informing future advances in both family planning and infertility treatment, we undertook structural and biochemical studies of SPACA6.
The crystal structure of the SPACA6 ectodomain revealed a four-helix bundle (4HB) and an immunoglobulin-like (Ig-like) domain connected by a quasi-flexible region. Interestingly, the domain architecture of SPACA6 is similar to that of human IZUMO1, with both proteins sharing an uncommon motif: a 4HB with a triangular face of helices and a pair of disulfide-bonded CXXC motifs. We propose that IZUMO1 and SPACA6 define a larger, structurally related superfamily of gamete fusion-associated proteins. Using the hallmark features specific to this superfamily, we carried out an exhaustive search of the AlphaFold-predicted human proteome, revealing additional members of the superfamily that are all linked to gamete fusion. It now appears that a common structural fold defines a superfamily of proteins associated with gamete fusion, and our structure provides a molecular picture of this important part of the human gamete-fusion machinery.

A soluble monomeric SPACA6 ectodomain

SPACA6 is a single-pass transmembrane protein with one N-linked glycan and six predicted disulfide linkages (SI Appendix, Fig. S1A and Fig. S2). We expressed the extracellular domain of human SPACA6 (residues 27-246) in Drosophila S2 cells and purified the protein using nickel-affinity, cation-exchange, and size-exclusion chromatography (SI Appendix, Fig. S1B). The purified SPACA6 ectodomain was highly stable and homogeneous. Size-exclusion chromatography coupled with multi-angle light scattering (SEC-MALS) revealed a single peak with a calculated molecular weight of 26.2 ± 0.5 kDa (SI Appendix, Fig. S1C), consistent with the size of a monomeric SPACA6 ectodomain and indicating that no oligomerization occurred during purification. Furthermore, circular dichroism (CD) spectroscopy revealed mixed α/β structure with a melting temperature (Tm) of 51.3 °C (SI Appendix, Fig. S1D and S1E).
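A melting temperature such as 51.3 °C is the midpoint of a thermal unfolding curve. The methods describe normalizing the CD ellipticity change between 0 (folded) and 1 (unfolded) and fitting a non-linear model, but the model itself is not named in the surviving text. As an assumption, the minimal sketch below uses a two-state Boltzmann sigmoid, takes the first and last points as the folded and unfolded baselines, and replaces a least-squares library with a simple two-stage grid search; the function names are hypothetical and the example curve is synthetic, not SPACA6 data.

```python
import math
from typing import List, Sequence, Tuple


def normalize_melt(signal: Sequence[float]) -> List[float]:
    """Scale a melt curve to fraction unfolded: first point -> 0 (folded), last -> 1 (unfolded).

    Assumes flat baselines, so the end points stand in for the folded and unfolded signals.
    """
    folded, unfolded = signal[0], signal[-1]
    span = unfolded - folded
    if span == 0:
        raise ValueError("signal shows no transition")
    return [(s - folded) / span for s in signal]


def boltzmann(t: float, tm: float, slope: float) -> float:
    """Two-state sigmoid: fraction unfolded at temperature t (exactly 0.5 at t == tm)."""
    z = (t - tm) / slope
    # Branch on the sign of z so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def _grid(start: float, stop: float, step: float) -> List[float]:
    """Evenly spaced values from start to stop inclusive."""
    n = int(round((stop - start) / step))
    return [start + i * step for i in range(n + 1)]


def fit_tm(temps: Sequence[float], fraction: Sequence[float]) -> Tuple[float, float]:
    """Least-squares (tm, slope) via a coarse grid search refined by a fine grid search."""

    def sse(tm: float, slope: float) -> float:
        return sum((boltzmann(t, tm, slope) - f) ** 2 for t, f in zip(temps, fraction))

    coarse = [(tm, s) for tm in _grid(min(temps), max(temps), 0.5) for s in _grid(0.5, 10.0, 0.5)]
    tm0, s0 = min(coarse, key=lambda p: sse(*p))
    fine = [
        (tm, s)
        for tm in _grid(tm0 - 0.5, tm0 + 0.5, 0.01)
        for s in _grid(max(0.1, s0 - 0.5), s0 + 0.5, 0.01)
    ]
    return min(fine, key=lambda p: sse(*p))


if __name__ == "__main__":
    # Synthetic melt sampled like the protocol (20-80 °C in 5 °C steps); not SPACA6 data.
    temps = [20.0 + 5.0 * i for i in range(13)]
    ellipticity = [-20.0 + 15.0 * boltzmann(t, 51.3, 2.5) for t in temps]
    tm, slope = fit_tm(temps, normalize_melt(ellipticity))
    print(f"Tm = {tm:.2f} °C, slope = {slope:.2f} °C")
```

A real analysis would usually fit sloping pre- and post-transition baselines rather than trusting the end points; the grid search is used here only to keep the sketch dependency-free.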
Deconvolution of the CD spectra showed 38.6% α-helix and 15.8% β-strand content (SI Appendix, Fig. S1D).

The SPACA6 ectodomain was crystallized using a random matrix microseeding approach (35), yielding a 2.2-Å resolution dataset (SI Appendix, Fig. S3 and Table S1). The structure was determined using a combination of fragment-based molecular replacement and SAD phasing data from bromide soaks (SI Appendix, Fig. S4 and Table S1), with the final refined model consisting of residues 27-246. The SPACA6 ectodomain, with dimensions of 20 Å × 20 Å × 85 Å, is made up of seven helices and nine β-strands and adopts an elongated tertiary fold stabilized by six disulfide bonds (Fig. 1A). The structure consists of two domains, an N-terminal four-helix bundle (4HB) and a C-terminal Ig-like domain, with an intermediary hinge region between them (Fig. 1B).

The 4HB domain of SPACA6 includes four main helices (Helices 1-4) arranged in a coiled-coil fashion (Fig. 2A) that alternate between antiparallel and parallel interactions (Fig. 2B). A small additional single-turn helix (Helix 1') packs perpendicularly against the bundle, forming a triangular shape with Helices 1 and 2. This triangle slightly distorts the coiled-coil packing relative to the tight packing of Helices 3 and 4 (Fig. 2A).

The 4HB is centered around an internal hydrophobic core made up predominantly of aliphatic and aromatic residues (Fig. 2C). The core accommodates a disulfide bond between Cys41 and Cys55, which pinches Helices 1 and 2 together at the top, accentuating the triangular shape (Fig. 2D). Two additional disulfide bonds are formed between the CXXC motif of Helix 1' and another CXXC motif found at the tip of a β-hairpin in the hinge region (Fig. 2D). A conserved arginine residue (Arg37) of unknown function resides within the triangular hollow produced by Helices 1', 1, and 2.
The Cβ, Cγ, and Cδ aliphatic carbons of Arg37 interact with the hydrophobic core, and its guanidinium group contacts the loop between Helices 1' and 1 through main-chain and side-chain interactions with Thr32 (SI Appendix, Fig. S5A and S5B). Tyr34 stretches over the hollow, leaving two small cavities through which Arg37 can interact with solvent.

Ig-like β-sandwich domains form a large superfamily of proteins that share a common characteristic: two or more multi-stranded, amphipathic β-sheets interacting via a hydrophobic core (36). The C-[…]

The four-stranded β-sheet twists significantly along its length, producing asymmetric edges that differ in shape and electrostatics. The thinner edge presents a flat hydrophobic surface to the environment, which stands out against the otherwise uneven and electrostatically diverse surface of SPACA6 (SI Appendix, Fig. S6B and S6C). A halo of exposed backbone carbonyl/amino groups and polar side chains surrounds this hydrophobic surface (SI Appendix, Fig. S6C). The wider edge is partially covered by a capping coiled segment that blocks the N-terminal portion of the hydrophobic core and forms three hydrogen bonds with the exposed backbone polar groups of Strand F (SI Appendix, Fig. S6D). The C-terminal portion of this edge forms a large pocket with a partially exposed hydrophobic core. The pocket is surrounded by positive charges contributed by three pairs of arginine residues (Arg162-Arg221, Arg201-Arg205, and Arg212-Arg214) and a central histidine (His220) (SI Appendix, Fig. S6E).

The hinge region is a short segment between the helical and Ig-like domains that is made up of a single antiparallel three-stranded β-sheet (Strands A, B, and C), a small 3₁₀ helix, and several long […]

The structure of SPACA6 bears a striking similarity to IZUMO1 (40-42).
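Part of the comparison with IZUMO1 rests on interdomain orientation: a bend of about 50° from the central axis in IZUMO1 versus a lean of about 10° in SPACA6. The text reports these angles from a structural alignment of the Ig-like domains without saying how they were measured. As an assumption, one simple way to express such an angle is the angle between two domain axis vectors. In the sketch below, `axis_from_ends` is a hypothetical helper that approximates an axis by the line joining two centroids, and the coordinates are made up.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def centroid(points: Sequence[Vec3]) -> Vec3:
    """Mean position of a set of atom coordinates (e.g., Cα atoms, in Å)."""
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def axis_from_ends(start_atoms: Sequence[Vec3], end_atoms: Sequence[Vec3]) -> Vec3:
    """Hypothetical helper: approximate a domain's long axis as the vector
    from the centroid of atoms at one end to the centroid of atoms at the other."""
    a = centroid(start_atoms)
    b = centroid(end_atoms)
    return (b[0] - a[0], b[1] - a[1], b[2] - a[2])


def angle_between(u: Vec3, v: Vec3) -> float:
    """Angle between two axis vectors, in degrees (0-180)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    # Clamp to [-1, 1] so rounding error cannot push acos outside its domain.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))


if __name__ == "__main__":
    # Made-up coordinates: a bundle axis tilted 50° away from a reference axis along z.
    reference_axis = (0.0, 0.0, 1.0)
    tilt = math.radians(50.0)
    bundle_axis = axis_from_ends(
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(0.5 + 30.0 * math.sin(tilt), 0.0, 30.0 * math.cos(tilt))],
    )
    print(f"interdomain angle = {angle_between(reference_axis, bundle_axis):.1f} degrees")
```

An undirected axis (for example one from a principal-component fit) would need the result folded into 0-90°, since such an axis has no intrinsic direction.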
Both SPACA6 and IZUMO1 […]

Previous studies have noted the potential for structural similarity between SPACA6 and IZUMO1 (7, 31, 41); an early attempt at a homology model even predicted an N-terminal 4HB in mouse […]

Whereas connectivities and secondary structure elements are well conserved between IZUMO1 and SPACA6, a structural alignment of their Ig-like domains revealed that the two proteins differ in the orientation of the two domains relative to each other (SI Appendix, Fig. S10). The helical bundle of IZUMO1 is bent relative to the β-sandwich, producing a previously described "boomerang" shape that deviates by about 50° from the central axis (40). In contrast, the helical bundle in SPACA6 leans approximately 10° in the opposite direction. These differences in orientation likely result from differences within the hinge region. At the primary sequence level, IZUMO1 and SPACA6 share almost no sequence similarity in the hinge apart from the cysteine residues, a glycine, and an aspartate. As a result, their hinge hydrogen-bonding and electrostatic networks are completely different. The β-sheet secondary structure element is shared between IZUMO1 and SPACA6, although […] Table S2).

Recently, DeepMind (Alphabet/Google) developed AlphaFold, a neural network-based system that accurately predicts protein 3D structure from primary sequence (46). Shortly after we solved the SPACA6 structure, the AlphaFold Database was released, providing predicted structural models that cover 98.5% of all proteins in the human proteome (46, 47). Using […]

Unlike IZUMO proteins, the other SPACA proteins (i.e., SPACA1, SPACA3, SPACA4, SPACA5, and SPACA9) are predicted to be structurally divergent from SPACA6 (SI Appendix, Fig. S12). Only SPACA9 has a 4HB, but it is not predicted to have the same parallel-antiparallel helix arrangement as SPACA6 or the same disulfide linkages. Only SPACA1 has a similar Ig-like domain.
SPACA3, SPACA4, and SPACA5 are predicted by AlphaFold to have structures completely different from that of SPACA6. Interestingly, SPACA4 is also known to play a role in fertilization, but further upstream than SPACA6, where it aids the interaction between sperm and the oocyte zona pellucida (50).

Another match to the SPACA6 4HB, as predicted by AlphaFold, is TMEM95. TMEM95 contains the pair of CXXC motifs and the additional disulfide between Helices 1 and 2 (Fig. 3A and SI Appendix, Fig. S11). Although TMEM95 lacks an Ig-like domain, it has a region with the same disulfide-bonding pattern as the hinge regions of SPACA6 and IZUMO1 (Fig. 3B). Interestingly, TMEM95 is a sperm-specific, single-pass transmembrane protein whose ablation leaves male mice infertile (31, 32). Sperm lacking TMEM95 have normal morphology, motility, and ability to penetrate the zona pellucida and bind the oolemma, but they cannot fuse with the oocyte membrane. TMEM95, much like SPACA6 and IZUMO1, is evolutionarily conserved as far back as amphibians (Fig. 4 and SI Appendix, Fig. S13).

Thus, the striking overall structural similarities between SPACA6 and IZUMO1 suggest that these are the founding members of a conserved structural superfamily of gamete fusion-associated proteins that also includes TMEM95 and IZUMO2, IZUMO3, and IZUMO4. We propose the name IST superfamily, after the initials of the three members known so far to be associated with gamete fusion: IZUMO1, SPACA6, and TMEM95. Because only some members possess an Ig-like domain, the hallmark feature of the IST superfamily is the 4HB domain, which has unique characteristics shared by all these proteins: 1) the distorted 4HB has helices packed in an alternating antiparallel/parallel fashion (Fig. 5A), 2) the bundle has a triangular face made from two helices within the bundle and a third, perpendicular helix (Fig.
5B), and 3) a double CXXC motif connects the perpendicular helix in the 4HB to a flexible hinge region via dual disulfide bonds (Fig. 5C). The CXXC motif, found in […]

No signal was detected when SPACA6 was used as the analyte against either sensor-bound IZUMO1 or sensor-bound JUNO (SI Appendix, Fig. S14A and S14B). This lack of signal suggests that the SPACA6 ectodomain does not interact with the ectodomains of IZUMO1 or JUNO. Because the BLI assay relies on biotinylation of free lysine residues on the bait protein, this modification may prevent binding if lysine residues are involved in the interaction. In addition, the orientation of the bait relative to the sensor may create steric hindrance; thus, traditional pull-down assays were also performed with recombinant SPACA6, IZUMO1, and JUNO ectodomains.

The SPACA6 surface has three patches of highly conserved residues

Despite the known necessity of SPACA6 for gamete fusion and its similarity to IZUMO1, SPACA6 does not appear to perform the equivalent function of binding JUNO. We therefore combined our structural data with evolutionary conservation data to identify regions likely to be functionally important. Sequence alignments of SPACA6 homologs suggest that the general structure is conserved beyond mammals; for example, the cysteine residues are present even in distantly related amphibians (Fig. 6A). Using the ConSurf server, conservation scores from a multiple-sequence alignment of 66 sequences were mapped onto the surface of SPACA6. This type of analysis can reveal residues that have been maintained throughout a protein's evolution and can suggest which surface areas play a role in function.

The SPACA6 structure has three highly conserved surface patches (Fig. 6B). Patch 1 spans the 4HB and the hinge region and contains the two conserved CXXC disulfide bridges, the Arg233-[…], Fig.
S6E), which presents several positively charged residues toward the sperm surface. Interestingly, this patch holds an antibody epitope previously shown to prevent SPACA6 from functioning (29). Patch 3 spans the hinge and one side of the Ig-like domain; this region has conserved prolines (Pro126, Pro127, Pro150, Pro154) and outward-facing polar/charged residues. Strangely, most residues on the 4HB surface are quite variable (Fig. 6B), despite the fold's conservation throughout the SPACA6 homologs (as indicated by the conserved hydrophobic core of the bundle) and beyond into the IST superfamily.

Although the hinge is the smallest region of SPACA6 and has the fewest definable secondary structure elements, many hinge region residues (including Patch 3) are highly conserved amongst SPACA6 […]

To specifically identify regions of flexibility, hydrogen-deuterium exchange mass spectrometry (H-DXMS) was performed on SPACA6 and compared to previously acquired data on IZUMO1 (40) (Fig. 7A and 7B). SPACA6 is clearly more flexible than IZUMO1, as shown by the higher deuterium exchange over the entire structure after 100,000 seconds of exchange. In both structures, the C-[…]

SPACA6 and its fellow members of the IST superfamily appear to be highly conserved in mammals, as well as in select birds, reptiles, and amphibians; indeed, SPACA6 is even known to be essential for fertilization in zebrafish (56). This distribution is similar to those of other known gamete fusion-associated proteins such as DCST1 (33), DCST2 (33), FIMP (30), and SOF1 (31), suggesting that these factors are part of a conserved molecular mechanism for fertilization used by higher eukaryotes that lack the HAP2 (also known as GCS1) protein, which is the fusion protein […]

In summary, the functions of the members of the IST superfamily of gamete fusion-associated proteins remain an enticing mystery.
The structure of SPACA6 provides insight into the next steps that will connect these shared structures to gamete attachment and fusion.

Thermal denaturation assays were performed at a wavelength of 207 nm by increasing the temperature from 20 to 80 °C in 5 °C intervals, with 2-min equilibration between temperature points. Four scans were taken per temperature point, averaged, and baseline corrected. The resulting change in ellipticity was normalized between 0 (folded) and 1 (unfolded) and fit to a non-linear […]

Biolayer Interferometry

The binding affinities of SPACA6 to IZUMO1 and JUNO were measured by BLI using a single-[…]

Transfer membranes were then incubated with a 1:10,000 dilution of primary mouse anti-6xHis antibody (Roche, cat. #11922416001) for 1 h, followed by a 1:10,000 dilution of secondary HRP-conjugated goat anti-mouse IgG (H+L) antibody (Invitrogen, cat. #62-6520) for 1 h, with three 10-min PBST washes in between. The membranes were developed using UltraScence Pico Western Substrate (BIO-HELIX), and the chemiluminescence signal was imaged using a G:Box gel documentation system (Syngene). Reported results are representative of independent duplicate trials.

Small angle X-ray scattering

SEC-SAXS was performed at NSLS-II using the mail-in service on the Life Science X-ray Scattering (LiX) 16-ID beamline (79-81). The SPACA6 ectodomain was dialyzed into commercially prepared 1X PBS (Millipore Sigma) and concentrated to ~6.5 mg mL⁻¹ prior to shipment on ice. The protein was centrifuged at 20,000 × g for 10 min before 45 µL was loaded onto a Superdex 200 Increase 5/150 GL column (Cytiva) at 0.5 mL min⁻¹ on a Shimadzu bio-inert HPLC system for in-line SAXS measurements. Flow from the column was split 2:1 using a passive splitter between the X-ray scattering measurements and the UV/Vis and refractive index detectors.
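The surviving text names Lixtools for buffer subtraction and profile analysis but does not list the individual analysis steps. As an assumption about what profile analysis involves, a standard first step on a buffer-subtracted SAXS profile is a Guinier fit, ln I(q) = ln I(0) − (Rg²/3)q², which estimates the radius of gyration (Rg) and the forward scattering I(0) from the low-q region. The sketch below is plain least squares on ln I versus q², written for illustration. It does not reproduce Lixtools, the function names are hypothetical, and the example profile is synthetic.

```python
import math
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def guinier_fit(q: Sequence[float], intensity: Sequence[float], q_max: float) -> Tuple[float, float, float]:
    """Fit ln I(q) = ln I(0) - (Rg^2 / 3) q^2 using points with q <= q_max.

    Returns (Rg, I(0), largest_q_used * Rg). The last value should stay below about 1.3
    for a compact particle (lower for elongated ones) for the approximation to hold.
    """
    pts = [(qi, ii) for qi, ii in zip(q, intensity) if qi <= q_max and ii > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positive points in the Guinier window")
    x = [qi * qi for qi, _ in pts]
    y = [math.log(ii) for _, ii in pts]
    slope, intercept = linear_fit(x, y)
    if slope >= 0:
        raise ValueError("non-negative slope: no Guinier region in this window")
    rg = math.sqrt(-3.0 * slope)
    return rg, math.exp(intercept), max(qi for qi, _ in pts) * rg


if __name__ == "__main__":
    # Synthetic, noise-free profile for a particle with Rg = 25 Å (not SPACA6 data); q in Å^-1.
    q = [0.005 * i for i in range(1, 11)]
    intensity = [100.0 * math.exp(-(25.0 ** 2) * qi ** 2 / 3.0) for qi in q]
    rg, i0, qrg = guinier_fit(q, intensity, q_max=0.05)
    print(f"Rg = {rg:.2f} Å, I(0) = {i0:.1f}, q_max*Rg = {qrg:.2f}")
```

Because SPACA6 is elongated, a stricter q·Rg limit than the usual globular cut-off would be the cautious choice for real data.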
Subsequent buffer subtraction, peak selection, and profile analysis were performed using Lixtools (79) […]

References

1. Fertilization: a sperm's journey to and interaction with the oocyte
2. Fertilizing capacity of spermatozoa deposited into the fallopian tubes
3. The capacitation of the mammalian sperm
4. Molecular Basis of Human Sperm Capacitation
5. Sperm penetration through cumulus mass and zona pellucida
6. Sperm acrosome reaction: its site and role in fertilization
7. The Fertilization Enigma: How Sperm and Egg Fuse
8. Mechanics of membrane fusion
9. Virus and cell fusion mechanisms
10. The HIV-1 envelope glycoproteins: fusogens, antigens, and immunogens
11. Structure, Function, and Evolution of Coronavirus Spike Proteins
12. The structure and function of the hemagglutinin membrane glycoprotein of influenza virus
13. Syncytin is a captive retroviral envelope protein involved in human placental morphogenesis
14. Genomewide screening for fusogenic human endogenous retrovirus envelopes identifies syncytin 2, a gene conserved on primate evolution
15. Syncytin-A knockout mice demonstrate the critical role in placentation of a fusogenic, endogenous retrovirus-derived, envelope gene
16. Arabidopsis hapless Mutations Define Essential Gametophytic Functions
17. GENERATIVE CELL SPECIFIC 1 is essential for angiosperm fertilization
18. Arabidopsis HAP2 (GCS1) is a sperm-specific gene required for pollen tube guidance and fertilization
19. The conserved plant sterility gene HAP2 functions after attachment of fusogenic membranes in Chlamydomonas and Plasmodium gametes
20. The cell biology of fertilization: Gamete attachment and fusion
21. Severely reduced female fertility in CD9-deficient mice
22. Requirement of CD9 on the egg plasma membrane for fertilization
23. The gamete fusion process is defective in eggs of Cd9-deficient mice
24. Oocyte CD9 is enriched on the microvillar membrane and required for normal microvillar shape and distribution
25. CD9 tetraspanin generates fusion competent sites on the egg membrane for mammalian fertilization
26. The immunoglobulin superfamily protein Izumo is required for sperm to fuse with eggs
27. Juno is the egg Izumo receptor and is essential for mammalian fertilization
28. JUNO, the receptor of sperm IZUMO1, is expressed by the human oocyte and is essential for human fertilisation
29. Sperm SPACA6 protein is required for mammalian Sperm-Egg […]
30. Spermatozoa lacking Fertilization Influencing Membrane Protein (FIMP) fail to fuse with oocytes in mice
31. Sperm proteins SOF1, TMEM95, and SPACA6 are required for sperm−oocyte fusion in mice
32. TMEM95 is a sperm membrane protein essential for mammalian fertilization
33. Evolutionarily conserved sperm factors, DCST1 and DCST2, are required for gamete fusion
34. A transgenic insertion on mouse chromosome 17 inactivates a novel immunoglobulin superfamily gene potentially involved in sperm-egg fusion
35. Microseed matrix screening to improve crystals of yeast cytosine deaminase
36. Protein folds in the all-beta and all-alpha classes
37. The Immunoglobulin Superfamily: Domains for Cell Surface Recognition
38. The Immunoglobulin Fold: Structural Classification, Sequence Patterns and Common Core
39. Using Dali for Protein Structure Comparison
40. Molecular architecture of the human sperm izumo1 and egg juno fertilization complex
41. The structure of sperm Izumo1 reveals unexpected similarities with Plasmodium invasion proteins
42. Structure of IZUMO1-JUNO reveals sperm-oocyte recognition during mammalian fertilization
43. SNAREs: engines for membrane fusion
44. ATG14 promotes membrane tethering and fusion of autophagosomes to endolysosomes
45. Coiled-coils: The long and short of it
46. Highly accurate protein structure prediction with AlphaFold
47. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models
48. Izumo is part of a multiprotein family whose members form large complexes on mammalian sperm
49. IZUMO family member 3, IZUMO3, is involved in male fertility through the acrosome formation
50. The conserved fertility factor SPACA4/Bouncer has divergent modes of action in vertebrate fertilization
51. The CXXC Motif: A Rheostat in the Active Site
52. The CXXC Motif Is More than a […]
53. Similarities and differences in the thioredoxin superfamily
54. A role for sperm surface protein disulfide isomerase activity in gamete fusion: evidence for the participation of ERp57
55. Oocyte-triggered dimerization of sperm IZUMO1 promotes sperm-egg fusion in mice
56. The Sperm Protein Spaca6 is Essential for Fertilization in Zebrafish
57. HAP2/GCS1: Mounting evidence of our true biological EVE?
58. Virus and eukaryote fusogen superfamilies
59. Cadherins: a molecular family important in selective cell-cell adhesion
60. Integrin Structure, Activation, and Interactions
61. Bacterial adhesins: function and structure
62. On the remarkable mechanostability of scaffoldins and the mechanical clamp motif
63. DICHROWEB, an online server for protein secondary structure analyses from circular dichroism spectroscopic data
64. DIALS: implementation and evaluation of a new integration package
65. How good are my data and what is the resolution?
66. Exploiting distant homologues for phasing through the generation of compact fragments, local fold refinement and partial solution combination
67. Crystallographic ab initio protein structure solution below atomic resolution
68. Phaser crystallographic software
69. ALEPH: a network-oriented approach for the generation of fragment-based libraries and for structure interpretation
70. ALIXE: a phase-combination tool for fragment-based molecular replacement
71. An introduction to experimental phasing of macromolecules illustrated by SHELX; new autotracing features
72. Macromolecular structure determination using X-rays, neutrons and electrons: recent developments in Phenix
73. Decision-making in structure solution using Bayesian estimates of map quality: the PHENIX AutoSol wizard
74. Towards automated crystallographic structure refinement with phenix.refine
75. Features and development of Coot
76. The PDB_REDO server for macromolecular structure model optimization
77. MolProbity: More and better reference data for improved all-atom structure validation
78. Improvements to the APBS biomolecular solvation software suite
79. Tools for supporting solution scattering during the COVID-19 pandemic
80. Solution scattering at the Life Science X-ray Scattering (LiX) beamline
81. Robotic sample changers for macromolecular X-ray crystallography and biological small-angle X-ray scattering at the National Synchrotron Light Source II
82. ATSAS 3.0: expanded functionality and new tools for small-angle scattering data analysis
83. Restoring low resolution structure of biological macromolecules from solution scattering using simulated annealing
84. DAMMIF, a program for rapid ab-initio shape determination in small-angle scattering
85. Uniqueness of ab initio shape determination in small-angle scattering
86. SASpy: a PyMOL plugin for manipulation and refinement of hybrid models against small angle X-ray scattering data