key: cord-0716947-r18br8a4 authors: Crépin, Thibaut; Swale, Christopher; Monod, Alexandre; Garzoni, Frederic; Chaillet, Maxime; Berger, Imre title: Polyproteins in structural biology date: 2015-05-18 journal: Curr Opin Struct Biol DOI: 10.1016/j.sbi.2015.04.007 sha: 4c747be5467e8f28cfacfe7ecea2072f297f0e3b doc_id: 716947 cord_uid: r18br8a4
Polyproteins are chains of covalently conjoined smaller proteins that occur in nature as a versatile means to organize the proteome of viruses, including HIV. During maturation, viral polyproteins are typically cleaved by highly specific proteases into constituent proteins with different biological functions, and structural analyses at defined stages of this maturation process can provide clues for antiviral intervention strategies. Recombinant polyproteins that use similar mechanisms are emerging as powerful tools for producing hitherto inaccessible protein targets, such as the influenza polymerase, for high-resolution structure determination by X-ray crystallography. Conversely, covalent linking of individual protein subunits into single polypeptide chains is exploited to overcome sample preparation bottlenecks. Moreover, synthetic polyproteins provide a promising tool to dissect the dynamic folding of polypeptide chains into three-dimensional architectures in single-molecule structure analysis by atomic force microscopy (AFM). The recent use of natural and synthetic polyproteins in structural biology and major achievements are highlighted in this contribution.
Polyproteins composed of covalently linked individual proteins with different biological functions are prevalent in nature. For instance, SARS coronavirus, the agent that causes severe acute respiratory syndrome, realizes its entire proteome from two large polyproteins, each encoded by a long single open reading frame (ORF) [1].
The expressed SARS polyproteins are then processed into the individual functional protein subunits by the action of highly specific proteases that are also encoded by the ORFs [1,2]. A further example is human immunodeficiency virus (HIV), which causes acquired immunodeficiency syndrome (AIDS). The RNA genome of this retrovirus is organized in three major genes, gag, pol and env, which encode polyproteins that undergo proteolytic processing at defined stages during maturation [3]. Recombinant polyprotein approaches mimicking viral polyproteins have recently emerged as a powerful means to express high-value protein complexes for structure determination [4,5]. Recently, the long elusive influenza polymerase has been produced successfully from a self-processing synthetic polyprotein, enabling high-resolution structure determination [6,7]. Polyprotein fusions that are not processed by a protease, but remain covalently conjoined by engineered linkers, have been instrumental in obtaining important insight into numerous essential physiological processes, including multidrug efflux, co-translational protein targeting and enzymatic processing of chromatin, among others [8-12]. Moreover, single-chain engineering approaches linking protein domains into novel, artificial polyproteins have resulted in new classes of high-affinity binder molecules as potential protein therapeutics [13] and have accelerated the elucidation of mechanisms governing protein folding by single-molecule techniques [14]. Thus, polyprotein technologies have recently gained prominence as particularly useful tools for opening previously inaccessible protein samples to detailed structural and mechanistic studies, as illustrated in the following.
In this contribution, the use of polyproteins in structural biology is discussed by highlighting recently determined structures of naturally occurring polyproteins on the one hand, and by reviewing recent structural studies in which recombinant polyprotein constructs were utilized on the other.
Polyproteins are used in nature by many viruses to structure their proteome. As a consequence, viral polyproteins are an intense focus of research efforts [15-17]. For instance, inhibition by small molecules of proteases that process polyproteins during viral maturation can provide a powerful handle to combat viral disease [2,18-20]. In HIV, the physical infrastructure of the virus is provided by the gag gene, which gives rise to the precursor Gag (group specific antigen) polyprotein [21]. During maturation, the viral protease processes Gag into several proteins and spacer peptides that dictate immature and mature viral capsid structure [22]. Maturation is a two-stage process. Precursor Gag polyprotein first forms a hexameric lattice at the plasma membrane of an infected cell. This induces budding and release of immature viral particles. Proteolytic processing of Gag then rearranges the viral structure to the mature form [23]. Inhibition of the Gag-processing protease enabled preparation of immature retroviral capsids suitable for structure analysis by electron cryo-tomography and subtomogram averaging methods (Figure 1) [24-27]. Comparison of the structures of HIV Gag protein in reconstituted tubular arrays on the one hand, and in intact virus particles on the other, provided unprecedented insights into the conformational plasticity of this precursor polyprotein [26,27]. The studies also revealed that retroviral capsid proteins can adopt different quaternary arrangements during virus assembly, notwithstanding conserved tertiary structures [27].
A different, less frequently encountered type of natural, non-viral polyprotein is the so-called tandemly repetitive polyprotein (TRP). TRPs are also produced as large precursor proteins and then processed by proteases into several copies of proteins with similar function. TRPs are made of consecutively arranged repeats of amino-acid stretches. Examples of known TRPs include the polyprotein lipid binding proteins of nematodes [28] and filaggrins, which are keratinocyte-produced TRPs crucial to the health and appearance of skin [29,30]. The solution structure of a mature, post-translationally processed repeat unit of a TRP, ABA-1A from the nematode polyprotein allergen of Ascaris, was determined, representing the first structure of this class of proteins [31]. ABA-1A adopts a novel fold comprising two juxtaposed four-helical bundles that share a long central alpha-helix (Figure 1). Nematode polyprotein allergens have no known counterpart in humans. The Ascaris ABA-1A structure may therefore serve as a starting point for the development of new drugs and therapeutic intervention strategies against disease states caused by these intestinal parasites.
An inverse 'polyprotein' concept of covalently linking functional protein units into long modular polypeptide chains characterizes mega-enzymes that functionally arrange multiple domains into ordered assembly lines for the production of a wide variety of bioactive molecules. Modular polyketide synthases (PKS) and their metazoan homologs, fatty acid synthases (FAS), belong to this class of catalysts [32,33]. Recently, the structure and function of a modular multi-domain long-chain acyl-CoA carboxylase from Mycobacterium avium was elucidated [34]. The crystal structure revealed extensive swapping of functional domains in the holo-enzyme, which is a homo-hexamer (Figure 1). Thus, four intertwined protomers are involved in completing one catalytic reaction cycle [34].
Figure 1. Natural polyprotein structures. Polyproteins, prevalent in biology, are illustrated here by the structures of the human immunodeficiency virus (HIV) immature capsid determined by electron cryo-tomography, revealing molecular details of hexameric Gag [24,26,27], the structure of a repeated unit of the ABA-1 nematode polyprotein allergen derived from nuclear magnetic resonance (NMR) spectroscopy [31], and the crystal structure of a single-chain, multi-domain long-chain acyl-CoA carboxylase, LCC [34]. CT stands for carboxyltransferase, BCCP for biotin carboxyl carrier protein, and BC for biotin carboxylase components.
Influenza polymerase produced from a recombinant polyprotein
The highly successful strategy of viruses to utilize polyproteins to their advantage has found its equivalent in recombinant technology. A synthetic polyprotein, expressed recombinantly in baculovirus-infected insect cells, has enabled the structure determination of influenza polymerase. Although influenza is a common disease, a detailed understanding of the molecular mechanisms of the virus that causes it has remained elusive. The influenza polymerase, a key protein complex that replicates the genetic material of the virus, was discovered more than 40 years ago [35,36]. Atomic resolution information on the structure and function of this protein machine is essential, as it may open up important avenues for drug discovery. However, the influenza polymerase remained inaccessible for decades: producing this valuable protein complex for detailed analysis proved to be a seemingly insurmountable technical challenge.
This has now changed dramatically with structures of the influenza polymerase complex determined by X-ray crystallography [6,7]. These breakthrough studies provide unprecedented insight into the inner workings of this viral protein machine. This revolution in understanding influenza was brought about by applying a polyprotein strategy to produce influenza polymerase recombinantly, in the quality and quantity required for high-resolution structural and functional analysis (Figure 2). The strategy applied recapitulates the mechanism adopted by SARS coronavirus. A single ORF was constructed encoding a highly specific protease, NIa, from tobacco etch virus (TEV), fused in frame with the consecutively arranged PA, PB1 and PB2 subunits of the polymerase. At the far end of the polyprotein, a reporter protein was inserted to track protein production 'in real-time' during expression, by recording fluorescence [5]. All protein units within the polyprotein were spaced apart by customized linkers containing the specific site for cleavage by the TEV protease, purification tag sequences and spacer residues (Figure 2). The polyproteins encoding influenza polymerases were successfully expressed with the MultiBac baculovirus insect cell expression system developed for producing multiprotein complexes [37]. The polyprotein encoding influenza polymerase was proteolytically cleaved into the constituent protein subunits to yield the sample that crystallized.
Conversely, engineering of individual proteins into covalently linked polypeptide chains can also accelerate structure determination considerably. These 'polyproteins' remain conjoined as single chains during sample preparation and structure determination. Particularly prominent examples of such single-chain engineering include the insertion of T4 lysozyme into the primary sequence of G-protein coupled receptors (GPCRs) to facilitate crystallogenesis [38].
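The self-processing polyprotein layout described above for influenza polymerase (protease, PA, PB1, PB2, reporter, separated by cleavable linkers) can be illustrated with a minimal sketch. It is not the authors' construct: the subunit sequences are arbitrary placeholders, the linkers are reduced to the bare canonical TEV recognition sequence ENLYFQ/G (the purification tags and spacer residues mentioned in the text are omitted), and the function names `build_polyprotein` and `cleave_at_tev_sites` are hypothetical.

```python
import re

# Canonical TEV protease recognition sequence; cleavage occurs between Q and G (or S).
TEV_SITE = "ENLYFQG"


def build_polyprotein(units, linker=TEV_SITE):
    """Concatenate (name, sequence) units into one chain with a cleavable linker between neighbours."""
    return linker.join(sequence for _name, sequence in units)


def cleave_at_tev_sites(polyprotein):
    """Return the fragments released by cutting after the Q of every ENLYFQ(G/S) site."""
    # The lookahead (?=[GS]) requires G or S after the cut without consuming it,
    # so that residue stays at the N-terminus of the following fragment.
    cut_points = [m.end() for m in re.finditer(r"ENLYFQ(?=[GS])", polyprotein)]
    fragments, start = [], 0
    for cut in cut_points:
        fragments.append(polyprotein[start:cut])
        start = cut
    fragments.append(polyprotein[start:])
    return fragments


# Placeholder sequences only (NOT real protein sequences); the order follows the text.
UNITS = [
    ("TEV protease", "MAAAA"),
    ("PA", "MCCCC"),
    ("PB1", "MDDDD"),
    ("PB2", "MHHHH"),
    ("reporter", "MKKKK"),
]

if __name__ == "__main__":
    chain = build_polyprotein(UNITS)
    for (name, _seq), fragment in zip(UNITS, cleave_at_tev_sites(chain)):
        print(f"{name:>12}: {fragment}")
```

In this simplified model every released subunit retains a short remnant of the cleavage site (ENLYFQ at its C-terminus, G at its N-terminus), which is one reason why real linkers are customized for each junction.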
In a variation of the T4 lysozyme approach, the catalytic domain of Pyrococcus abyssi glycogen synthase was recently used to stabilize an intracellular loop of the human OX2 orexin receptor to determine its crystal structure bound to an insomnia drug [39]. Elaborate single-chain engineering into polyproteins was applied to determine the architecture of the bacterial multidrug efflux pump AcrABZ-TolC [8]. In this study, the efflux pump was assembled by preparing two single-chain polypeptide fusions, AcrB-AcrA-AcrB and AcrA-AcrZ, respectively, each conjoined by extended glycine/serine rich linkers. TolC was co-expressed with the AcrA-AcrZ fusion. This strategy resulted in a reconstituted complex with the correct stoichiometry, enabling structure determination by hybrid methods. Fitting of crystal coordinates into EM densities made it possible to model the AcrBZ/AcrA interactions (Figure 3) as well as the holocomplex containing TolC.
Figure 3. Linking individual proteins into polypeptide chains. Proteins engineered into single polypeptide chains were used to obtain suitable samples for structure determination of the AcrABZ complex [8], an alphabody (MA12) that neutralizes human interleukin IL-23 (p19 and p40) [13], and the first structure of a histone-modifying enzyme, the Polycomb Repressive Complex (PRC) 1 ubiquitylation module, bound to a nucleosome [12]. Linker amino acid segments are marked (L, L1, L2).
Single-chain engineering likewise enabled the development of alphabodies, a novel scaffold representing a promising alternative to antibodies for various biomedical and biotechnological applications [13]. Alphabodies comprise short individual alpha-helical protein segments, designed in silico, that are conjoined by covalent linkers into single-chain antiparallel coiled-coils that are highly stable and suitable for affinity maturation.
The crystal structure of a complex with human interleukin-23 (IL-23) revealed the structural basis of IL-23 antagonism by the alphabody MA12 (Figure 3).
Covalent linking of subunits of a protein complex was required to obtain the first crystal structure of a histone-modifying enzyme complex bound to a nucleosome core particle [12]. Polycomb repressive complex (PRC) 1, an essential regulator of cell fate, ubiquitylates nucleosomal histone H2A at residue K119. For this purpose, PRC1 uses its E3 ubiquitin ligase subunits, Ring1B and Bmi1, together with an E2 ubiquitin-conjugating enzyme, UbcH5c. The E2-E3 complex was produced using an engineered single-chain fusion of Ring1B to UbcH5c to overcome the low-affinity and salt-sensitive E2-E3 interaction. The resulting single-chain construct was co-expressed with Bmi1 to produce the trimeric complex, which was purified, reconstituted with nucleosome core particles, and crystallized [12]. The structure showed two copies of the PRC1 E2-E3 complex bound to one nucleosome, revealing intricate interactions (Figure 3).
Figure 4. [...] [10]). The cryo-EM structure of the ribosome-SRP-FtsY co-translational targeting complex in the closed state is shown on the right (adapted from [11]).
Around one third of the proteins in living cells are delivered to the plasma membrane. This is carried out by a universally conserved, complex mechanism involving ribosomes that are translating mRNA into membrane-bound nascent polypeptide chains, the signal recognition particle (SRP) and the SRP receptor, FtsY [9,40,41]. Snapshots of this elaborate process were obtained by single particle electron cryo-microscopy and biochemical analysis [9-11,42-44]. These studies revealed SRP binding to ribosome nascent chain complexes [42], followed by the 'early' [44], 'proofreading' [10] and 'closed' [11] states upon FtsY binding and GTP hydrolysis.
Successful structure determination critically relied on stabilizing SRP binding to FtsY. This was achieved by covalently linking the SRP subunit Ffh with FtsY into a single polypeptide chain with virtually wild-type activity (Figure 4).
Transient unfolding and refolding of proteins can be an essential feature of protein structure space in living organisms, for example in the translocation of proteins into and across cellular membranes or in the muscle stroke. Atomic force microscopy (AFM) has emerged as a powerful technique to probe protein structure, enabling analysis of the mechanical stability and folding pathway of protein specimens at the single-molecule level. A stretching force is applied through a microscopic cantilever to a biological target fixed to a support and is recorded by the deflection of a laser beam; under this force, the analyzed protein is unfolded to an extended state (Figure 5). Historically, polyproteins were used in single-molecule AFM to measure unique mechanical fingerprint profiles of a protein's response to the applied mechanical force [14,45]. The muscle protein titin is a prominent example [46-48]. Titin consists of several hundred repeated protein domains, including fibronectin and immunoglobulin-like folds. The use of polyproteins in single-molecule AFM considerably improves the statistical evaluation of individual domains within the polyprotein chain [14]. This can be exploited in the analysis of homomeric polyproteins that are constructed, genetically or by chemical fusion reaction, from identical copies of the protein species of choice. Moreover, in addition to providing a clear fingerprint, polyproteins have the advantage that a larger number of events can be recorded per experiment, as compared to only one event if monomeric proteins are used. For these reasons, polyproteins are emerging as work-horses of single-molecule structural biology by AFM.
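The fingerprint argument above can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions, not a method taken from this review: it models the stretched chain with the worm-like chain (WLC) interpolation formula commonly used in force spectroscopy, and the parameter values (persistence length, unfolding force, contour-length gain per domain) are illustrative choices, as are the function names. Each time the force reaches the unfolding threshold, one domain unfolds and the contour length increases, which produces the sawtooth pattern and one recorded event per domain.

```python
KT_PN_NM = 4.11  # thermal energy k_B*T at about 298 K, in pN*nm


def wlc_force(extension_nm, contour_nm, persistence_nm=0.4):
    """Worm-like chain interpolation: F = (kT/p) * [1/(4(1-x/L)^2) - 1/4 + x/L], in pN."""
    x = extension_nm / contour_nm
    if not 0 <= x < 1:
        raise ValueError("extension must lie in [0, contour length)")
    return (KT_PN_NM / persistence_nm) * (0.25 / (1 - x) ** 2 - 0.25 + x)


def simulate_pull(n_domains, unfold_force_pn=200.0, base_contour_nm=20.0,
                  gain_per_domain_nm=28.0, step_nm=0.1):
    """Stretch a chain of identical folded domains; return (trace, unfolding_peaks).

    All default values are assumed, illustrative numbers.
    """
    contour = base_contour_nm   # contour length of the currently unfolded part
    folded = n_domains          # domains that are still folded
    trace = []                  # (extension, force) samples of the whole pull
    unfolding_peaks = []        # (extension, force) just before each unfolding event
    i = 0
    while True:
        extension = i * step_nm
        force = wlc_force(extension, contour)
        trace.append((extension, force))
        if force >= unfold_force_pn:
            if folded == 0:
                break  # final rise: the chain detaches from the tip
            unfolding_peaks.append((extension, force))
            contour += gain_per_domain_nm  # one domain unfolds and lengthens the chain
            folded -= 1
        i += 1
    return trace, unfolding_peaks


if __name__ == "__main__":
    for n in (1, 8):
        _trace, peaks = simulate_pull(n)
        print(f"{n} domain(s): {len(peaks)} unfolding event(s) in one pull")
```

Within this toy model, a single pull on an eight-domain polyprotein yields eight unfolding events with a regular spacing, whereas a monomer yields one, which is the statistical advantage described in the text.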
Numerous polyproteins have been analyzed for their mechanical properties by single-molecule AFM, including poly-I27, derived from the I-band region of titin, oligo-calmodulin, poly-ubiquitin, polyproteins made of the virulence factor GB1 of Peptostreptococcus magnus [46-54] and others, providing unique insights into biological folding/unfolding mechanisms. The availability of a large and growing number of well characterized homomeric specimens now furthermore enables the construction of chimeric polyproteins as a tool to study mechanically uncharacterized proteins, by using the unique fingerprints of the known protein unit as a reference [14,45].
Figure 5. Polyprotein single-molecule structural biology. The setup of atomic force microscopy of polyproteins is shown in a schematic representation (adapted from [14]). The polyprotein is tethered on one end to the gold support resting on the piezoelectric positioning stage (bottom), and on the other end to the tip of a cantilever made of silicon nitride. A laser beam is focused on the back of the cantilever (top). The cantilever is displaced by the force that acts on the polyprotein chain, resulting in a change of the deflection of the laser beam, which is recorded by a photodetector. Increasing force causes each domain of the polyprotein to unfold, resulting in characteristic spikes in a force/extension diagram (inset).
Natural and synthetic polyproteins are at the core of contemporary structural biology. Analysis of viral polyproteins and the architectural consequences of their processing during maturation not only furthers our understanding of viral mechanisms but also provides important clues for drug design to combat viral diseases. Synthetic polyproteins are emerging as invaluable tools to accelerate research by unlocking hitherto inaccessible proteins for high-resolution structure determination.
Artificial polyproteins obtained by single-chain protein engineering approaches are instrumental in overcoming sample production bottlenecks and provide novel means to illuminate biological mechanisms, including folding and unfolding properties at the single-molecule level. We anticipate a major increase in the use of polyproteins in structural biology as valuable tools to tackle large and complex biological systems in the future.
[1] Coronaviruses: severe acute respiratory syndrome coronavirus and Middle East respiratory syndrome coronavirus in travelers
[2] From SARS to MERS: crystallographic studies on coronaviral proteases enable antiviral drug design
[3] HIV and the spectrum of human disease
[4] Robots, pipelines, polyproteins: enabling multiprotein expression in prokaryotic and eukaryotic cells
[5] Multiprotein complex production in insect cells by using polyproteins
[6] Structural insight into cap-snatching and RNA synthesis by influenza polymerase. First crystal structure of the influenza polymerase bound to cognate viral RNA substrate, providing unprecedented insights into cap-snatching and RNA synthesis. Influenza polymerase, inaccessible to structural analysis for decades since its discovery
[7] Structure of influenza A polymerase bound to the viral RNA promoter
[8] Structure of the AcrAB-TolC multidrug efflux pump. Structure determination of the bacterial AcrABZ-TolC multidrug efflux pump by hybrid methods combining X-ray crystallography and electron cryo-microscopy. Elaborate single-chain protein engineering was a prerequisite for
[9] Cryo-electron microscopy of ribosomal complexes in cotranslational folding, targeting, and translocation
[10] Structural basis of signal sequence surveillance and selection by the SRP-FtsY complex. Structure of the 'proof-reading' step in co-translational targeting, revealed by electron cryo-microscopy and biochemical analysis of a reconstituted assembly comprising ribosome nascent chain complex, SRP and SRP receptor, FtsY. The SRP protein subunit Ffh and FtsY were provided as an engineered protein fusion conjoined by an extended glycine/serine rich linker. The resulting protein fusion
[11] Ribosome-SRP-FtsY cotranslational targeting complex in the closed state. High-resolution electron cryo-microscopy study revealing the structure and mechanism of the 'closed' step in bacterial co-translational targeting. The emerging signal sequence bound to the Ffh M-domain was
[12] Crystal structure of the PRC1 ubiquitylation module bound to the nucleosome. Crystal structure determination of the first histone-modifying enzyme complex, the PRC1 ubiquitylation module, bound to a nucleosome. Single-chain engineering was required to stabilize the enzyme complex for crystallization
[13] Structural basis of IL-23 antagonism by an Alphabody protein scaffold
[14] Single molecule force spectroscopy using polyproteins
[15] Structural and functional insights into alphavirus polyprotein processing and pathogenesis
[16] Viral precursor polyproteins: keys of regulation from replication to maturation
[17] Immature and mature dengue serotype 1 virus structures provide insight into the maturation process
[18] The early years of retroviral protease crystal structures
[19] Structural biology of dengue virus enzymes: towards rational design of therapeutics
[20] Müller B: Retroviral proteases and their roles in virion maturation
[21] HIV Gag polyprotein: processing and early viral particle assembly
[22] Kräusslich HG: The molecular architecture of HIV
[23] Kräusslich HG: HIV-1 assembly, budding, and maturation. Cold Spring Harb Perspect Med
[24] Structure of the immature retroviral capsid at 8 Å resolution by cryo-electron microscopy
[25] Role of the SP2 domain and its proteolytic cleavage in HIV-1 structural maturation and infectivity
[26] Cryo-electron microscopy of tubular arrays of HIV-1 Gag resolves structures essential for immature virus assembly
[27] Structure of the immature HIV-1 capsid in intact virus particles at 8.8 Å resolution. The structure of the immature HIV-1 capsid in intact virus particles formed by HIV Gag precursor polyprotein was elucidated by electron cryo-tomography and sub-tomogram averaging methods, revealing unexpected structural plasticity
[28] The polyprotein lipid binding proteins of nematodes
[29] The multifunctional role of filaggrin in allergic skin disease
[30] Filaggrin - revisited
[31] Solution structure of a repeated unit of the ABA-1 nematode polyprotein allergen of Ascaris reveals a novel fold and two discrete lipid-binding sites
[32] Architecture of the polyketide synthase module: surprises from electron cryo-microscopy
[33] Evolutionary origins of the multienzyme architecture of giant fungal fatty acid synthase
[34] Structure and function of a single-chain, multi-domain long-chain acyl-CoA carboxylase
[35] At the centre: influenza A virus ribonucleoproteins
[36] The RNA synthesis machinery of negative-stranded RNA viruses
[37] MultiBac: expanding the research toolbox for multiprotein complexes
[38] Modified T4 lysozyme fusion proteins facilitate G protein-coupled receptor crystallogenesis
[39] Crystal structure of the human OX2 orexin receptor bound to the insomnia drug suvorexant
[40] Co-translational protein targeting to the bacterial membrane
[41] Fidelity of cotranslational protein targeting by the signal recognition particle
[42] Structure of the E. coli signal recognition particle bound to a translating ribosome
[43] Multiple conformational switches in a GTPase complex control co-translational protein targeting
[44] Cryo-EM structure of the E. coli translating ribosome in complex with SRP and its receptor
[45] Single molecule mechanical manipulation for studying biological properties of proteins, DNA, and sugars. Authoritative overview article presenting the state-of-the-art of single-molecule structure analysis of biological samples (proteins, nucleic acid, sugars) by using atomic force microscopy
[46] The giant protein titin: a regulatory node that integrates myocyte signaling pathways
[47] The complete gene sequence of titin, expression of an unusual approximately 700-kDa titin isoform, and its interaction with obscurin identify a novel Z-line to I-band linking system
[48] Hidden complexity in the mechanical properties of titin
[49] Quasi-simultaneous imaging/pulling analysis of single polyprotein molecules by atomic force microscopy
[50] Mechanical design of proteins studied by single-molecule force spectroscopy and protein engineering
[51] The mechanical stability of ubiquitin is linkage dependent
[52] Mechanically untying a protein slipknot: multiple pathways revealed by force spectroscopy and steered molecular dynamics simulations
[53] Polyprotein of GB1 is an ideal artificial elastomeric protein
[54] The molecular mechanism underlying mechanical anisotropy of the protein GB1
We thank all members of our laboratories for their assistance. We thank Timothy J. Richmond, Darren Hart, Christiane Schaffitzel, Alexander Pflug and Stefan Reich for their contributions and especially Rob Ruigrok for discussions and helpful suggestions. We are grateful to the Alliance
The authors declare no conflict of interest.
Papers of particular interest, published within the period of review, have been highlighted as: of special interest; of outstanding interest.