key: cord-0989048-jbe0p6av authors: Uetrecht, Charlotte; Heck, Albert J. R. title: Modern Biomolecular Mass Spectrometry and its Role in Studying Virus Structure, Dynamics, and Assembly date: 2011-08-29 journal: Angew Chem Int Ed Engl DOI: 10.1002/anie.201008120 sha: aead3b01863cfd0d9cb72b0618ff93aa16ded1da doc_id: 989048 cord_uid: jbe0p6av

Over a century since its development, the analytical technique of mass spectrometry is blooming more than ever and is applied in nearly all areas of the natural and life sciences. In the last two decades, mass spectrometry has also become amenable to the analysis of proteins and even intact protein complexes, and has thus begun to make a significant impact in the field of structural biology. In this Review, we describe the emerging role of mass spectrometry, with its different technical facets, in structural biology, focusing especially on structural virology. We describe how mass spectrometry has evolved into a tool that can provide unique structural and functional information about viral‐protein and protein‐complex structure, conformation, assembly, and topology, extending to the direct analysis of intact virus capsids of several million daltons in mass. Mass spectrometry is now used to address important questions in virology ranging from how viruses assemble to how they interact with their host.

Mass spectrometry (MS) is a very powerful analytical technique known to and used by most researchers in the natural and life sciences. The technique dates back about a hundred years, when researchers such as Thomson and Aston were amongst the first to separate particles of different mass-to-charge ratios (m/z), discovering isotopes of rare gases and other elements. For half a century, MS remained primarily in the hands of physicists and was often shrouded in secrecy, as it was also used to enrich uranium for the Manhattan Project.
[1] Around the 1960s, the technique began to be adopted by chemists, for instance those working in the petroleum industry, to investigate the chemical nature of the compounds formed in refining processes. In the domain of chemistry, MS subsequently flourished, particularly in the chemical analysis of unknown compounds (assisted by coupling to separation technologies such as gas and liquid chromatography) and as a research area in its own right, termed organic MS, concerned with studying the structures and fragmentation mechanisms of ions, and ion-molecule reactions. [2] Up to the 1980s, the impossibility of transferring larger molecules as intact gaseous ions into the vacuum of the mass spectrometer represented a serious bottleneck in the analysis. This problem was gradually solved by new desorption techniques, culminating in the introduction of matrix-assisted laser desorption/ionization (MALDI) [3] and electrospray ionization (ESI) MS. [4] These new ionization methods, together with breakthrough innovations in instrumentation, opened up applications in biology, nanotechnology, polymer science, and medicine, as well as in many other fields of the natural and life sciences. In the life sciences, MS became a key technology for peptide sequencing, [5] through which the identity of a protein can be revealed. The speed and sensitivity of MS allow the qualitative and quantitative analysis of the protein content of a whole cell or tissue, nowadays termed proteomics. [6] Similarly, the chemical analysis of the small-molecule content of a cell or body fluid can now be performed using MS, and is termed metabolomics. [7] New desorption techniques and dedicated mass spectrometers even allow MS-based imaging through the position-sensitive measurement of compound distributions (protein, neuropeptide, metabolite, drug molecule) in a tissue or organelle.
[8] Furthermore, ambient desorption technologies have been introduced to directly sample molecular compounds on surfaces and organisms. [9] The importance of MALDI and ESI for these revolutionary developments was recognized by the award of the Nobel Prize in Chemistry to the late John B. Fenn and to Koichi Tanaka. Fenn entitled his Nobel Lecture "Electrospray Wings for Molecular Elephants", [10] as ESI (and MALDI) expanded the mass range attainable by MS at least 1000-fold. Soon after the introduction of ESI, it became apparent that not only the mass of intact proteins, but also their tertiary and quaternary structure could be partially retained and therefore analyzed. This potential was evidenced by the early discovery that the noncovalent complex between myoglobin and its heme cofactor could be kept intact in the gas phase. [11] Groundbreaking work by Standing et al., [12, 13] Smith, Loo, et al., [14, 15] Robinson et al., [16] [17] [18] [19] and our own group [20, 21] on MS of intact macromolecular complexes led to a very powerful new tool in structural biology, now termed native mass spectrometry. [22] Besides the well-established peptide-sequencing approach and native MS, modern MS techniques have many other facets relevant to the analysis of proteins.
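Multiple charging is what brings very large analytes into a measurable m/z window in ESI and native MS. The following minimal Python sketch computes m/z values for protonated ions [M + zH]^z+ and recovers the charge and neutral mass from two adjacent charge-state peaks, a common first step when interpreting native mass spectra. The 800 kDa mass and the charge states are hypothetical numbers chosen for illustration, not measurements, and the function names are our own.

```python
"""Illustrative sketch: m/z of multiply charged ESI ions and charge-state
deconvolution from two adjacent peaks. The masses used are hypothetical."""

PROTON_MASS = 1.007276  # mass of a proton in Da


def mz_for_charge(mass: float, z: int) -> float:
    """m/z of an [M + zH]^z+ ion formed by protonation."""
    return (mass + z * PROTON_MASS) / z


def charge_and_mass(mz_high: float, mz_low: float) -> tuple[int, float]:
    """Infer charge z and neutral mass M from two adjacent peaks.

    mz_high carries charge z and mz_low carries charge z + 1, so
    M = z * (mz_high - p) = (z + 1) * (mz_low - p), which gives
    z = (mz_low - p) / (mz_high - mz_low).
    """
    if mz_high <= mz_low:
        raise ValueError("mz_high must be larger than mz_low")
    z = round((mz_low - PROTON_MASS) / (mz_high - mz_low))
    return z, z * (mz_high - PROTON_MASS)


if __name__ == "__main__":
    mass = 800_000.0  # hypothetical 800 kDa protein complex
    for z in (48, 49, 50, 51):
        print(f"z = {z}: m/z = {mz_for_charge(mass, z):.1f}")
    print(charge_and_mass(mz_for_charge(mass, 50), mz_for_charge(mass, 51)))
```

For this hypothetical complex, charge states around z = 50 fall near m/z 16,000, whereas a singly charged ion of the same mass would appear near m/z 800,000. Real spectra show a distribution of charge states, and mass assignment normally uses the whole series rather than a single pair of peaks.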
H/D (hydrogen/deuterium) exchange and chemical labeling of solvent-accessible amino acids in combination with MS, as well as cross-linking MS, can in particular add to our knowledge of protein structure-function relationships. In this Review, we describe how such MS tools can be used to study several aspects of protein structure and function, focusing in particular on the biochemical and biophysical properties of viruses and viral particles. Viruses are ideal model systems to study the assembly of protein complexes, since their protein shells, the capsids, often have the remarkable ability to self-organize their folding and assembly even in vitro, without the help of chaperones. [23] Moreover, their natural capacity to encapsulate material, that is, the viral genome, makes virus capsids interesting targets for nanotechnological applications that extend far beyond drug delivery. [24] The detailed biophysical and biochemical characterization of virus assembly and maturation processes is crucial, as such data may potentially be used to interfere with viral infection. [25] Technically, studying virus assemblies is rather challenging, as the structures formed can be very large, hampering analysis by conventional structural-biology techniques such as protein X-ray crystallography and NMR spectroscopy. Another problem is posed by the transient nature of the intermediates formed during assembly and/or maturation, which impedes their purification and analysis.
[26] Although virus structure, dynamics, and assembly have been studied for decades, in recent years MS has entered this area of research and tackled several important questions that were less accessible by other means. [27] Proteomics approaches were used early on to study the "new" SARS virus, [28] but MS is also an emerging method to reveal the constituents of a virus, the stoichiometry of the viral structural proteins, and virus assembly and the corresponding intermediates. [27] After briefly introducing a few general concepts in structural virology, we describe how modern MS can assist in virus characterization, focusing especially on structural aspects. Viruses are infectious agents that can replicate inside a host. [29] With sizes ranging from nanometers to micrometers, most viruses are invisible under the light microscope. [30] [31] [32] Viruses are ubiquitous; they occur in archaea, bacteria, plants, and animals. [33] [34] [35] [36] [37] [38] Estimates suggest that, after prokaryotes, viruses account for the second largest amount of biomass on earth. [39] Amongst the viruses, those infecting bacteria, called phages, are the most abundant. The viral genome encodes all viral proteins necessary for replication in the appropriate host; their number ranges from several hundred to just a few different proteins. Owing to their inability to reproduce independently, viruses are often not regarded as a form of life. [40] However, the discussion is still ongoing and was recently fueled by the discovery of the giant mimiviruses. [30, 41] Furthermore, the evolutionary ancestry of viruses is still unclear: did they arise from pieces of nucleic acid replicating in cells, or did they evolve by reduction from cellular organisms? [42, 43] The ubiquity of viruses amongst all species already foreshadows a broad diversity that complicates viral classification. Generally, viruses are distinguished based on either their host organism or their morphogenetic characteristics and genome organization.
[44] [45] [46] [47] The genome can be encoded by either RNA or DNA, in single- or double-stranded (ss and ds) form, as exemplified in Figure 1. The information can be located on (+) or (−) strands and in some cases requires reverse transcription for replication. Viruses can contain single or multiple pieces of nucleic acid in linear or circular form. The true evolutionary relationship between viruses can rarely be inferred from one single feature, which complicates virus taxonomy. The viral genome is typically enclosed in a protein shell termed the nucleocapsid. This capsid can be helical, icosahedral, or more complex in structure. The nucleocapsid alone can mediate host-cell attachment and entry, but eukaryotic viruses in particular are often enveloped by a lipid bilayer containing the adaptor proteins. [48] To replicate, viruses first have to recognize the host and introduce their genome into the cell. Next, the protein and nucleic acid machinery of the cell is taken over to produce the viral constituents; finally, the infectious virus is assembled and released. After successful attachment to the host cell, the nucleocapsid can enter the cytosol by various mechanisms (Figure 2), such as membrane fusion or phagocytosis, which are often used by eukaryotic viruses. [37, 49, 50] Bacteriophages generally inject their genome directly into the cell, [51, 52] whereas most other viruses release the genome from the capsid at the nuclear pore complexes. [ Exceptions are (+)ssRNA viruses, which uncoat in the cytoplasm and replicate their genome outside the nucleus, in close proximity to membranes. [54] Some viruses are integrated metastably into the host genome, for example retroviruses and lysogenic phages. Replication of such proviruses is then induced spontaneously or after a specific trigger. [55] [56] [57] Generally, following infection, viruses take control of the host-cell machineries to facilitate their own replication.
[58] At late stages of infection, the structural proteins either assemble around new copies of the nucleic acid or assemble independently and package the nucleic acid afterwards. [59] [60] [61] Eventually, mature virions are released by budding, like vesicles, or the cell ruptures as a consequence of the viral load. [62, 63] The virus can then spread and infect new cells, reinitiating the cycle. Although the lifecycles of different viruses involve multiple modes of replication, an important step is usually maturation, which marks the complete assembly of infectious particles that are ready for release. This process includes a variety of biochemical adaptations, such as conformational changes triggered by the incorporated nucleic acid, attachment of auxiliary proteins, or posttranslational modifications of proteins, such as phosphorylation, glycosylation, proteolysis, and cross-linking. [64] [65] [66] [67] [68] Viruses exert a strong selective pressure on the host: resistant cells have an evolutionary advantage through increased viability. As a consequence, viruses evolve by constantly altering their genome. RNA viruses, in particular, often show a high mutation rate as part of this constant struggle for survival. [69] Another mechanism, which can result in increased virulence and a switch in species specificity, is the recombination of partial genomes from related viral strains, as is common in influenza viruses. [70, 71] The viral nucleocapsid is of crucial importance in the viral lifecycle, since it encapsulates and protects the viral genome. In particular, their efficient self-assembly, their strength, and their efficiency in nucleic acid packaging mark nucleocapsids as intriguing structures. Typically, the capsid shell is formed by multiple copies of one or a few different structural proteins, [72, 73] although decorations with other proteins are common in non-enveloped viruses.
[74] These attached proteins can increase capsid stability or serve a role during infection and genome packaging. The high copy number of a limited set of small proteins beneficially decreases the length of nucleic acid needed to encode them, so that less space is required for encapsulation. [72] This illustrates some of the elegant and efficient principles underlying virus structure and function. Many capsid proteins (cp) can readily be produced recombinantly in high quantities and thus represent ideal model systems to study protein (self-)assembly. [23] Even though the building blocks in capsid assembly differ between viruses, capsid formation generally agrees with nucleation theory, as has also been proposed for amyloid assembly. [73, 75, 76] First, an assembly nucleus has to be formed, which may be an oligomeric assembly or just a conformationally changed cp monomer. After formation of this nucleus, further building blocks attach to it until the capsid is completed (Figure 3). Under conditions of efficient assembly, nucleation is generally the rate-limiting step, and only a small fraction of the proteins is in this intermediate state. The subsequent elongation steps take place at a much higher rate, leading to immediate propagation of the nucleus to a capsid. Therefore, the intermediate oligomeric species forming the assembly nucleus are typically present only in trace amounts under assembly conditions. [26, 73, 76, 77] For some viruses, it is possible to change the solution conditions to favor over-nucleation, whereby intermediates become kinetically trapped. Still, even under such conditions, processes such as protein misfolding and aggregation can further hamper the detection of these intermediates. [67, 78–80] A low nucleation rate ensures effective capsid assembly. Nucleation can be triggered by an increasing cp concentration or by posttranslational modifications, which secures a sufficiently high titer of cp in the cell before assembly proceeds.
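The kinetic argument above (slow nucleation, fast elongation, and the accumulation of kinetically trapped intermediates under over-nucleation) can be made concrete with a toy rate-equation model. The sketch below is a generic nucleation-elongation scheme, not a model taken from the Review: the capsid size of 12 building blocks, the trimeric nucleus, the irreversible steps, and the rate constants are all hypothetical assumptions chosen for illustration.

```python
"""Toy nucleation-elongation model of capsid assembly.

Hypothetical assumptions (not taken from the Review): a capsid made of 12
building blocks, a trimeric nucleus formed in a single slow step,
irreversible elongation by one building block at a time, no dissociation,
and arbitrary rate constants in arbitrary units.
"""

CAPSID_SIZE = 12   # building blocks per complete capsid (assumed)
NUCLEUS_SIZE = 3   # building blocks in the nucleus (assumed)


def simulate(k_nuc: float, k_el: float, c0: float = 1.0,
             t_end: float = 50.0, dt: float = 1e-3):
    """Integrate the rate equations with a forward-Euler scheme.

    x[i] is the concentration of species containing i building blocks
    (x[1] = free building blocks; x[2] stays zero in this model).
    Returns the final state and the largest fraction of building blocks
    that was ever present in incomplete intermediates.
    """
    x = [0.0] * (CAPSID_SIZE + 1)
    x[1] = c0
    peak_intermediates = 0.0
    for _ in range(int(t_end / dt)):
        free = x[1]
        nucleation = k_nuc * free ** NUCLEUS_SIZE            # slow step
        growth = {i: k_el * x[i] * free                      # fast steps
                  for i in range(NUCLEUS_SIZE, CAPSID_SIZE)}
        dx = [0.0] * (CAPSID_SIZE + 1)
        dx[1] = -NUCLEUS_SIZE * nucleation - sum(growth.values())
        dx[NUCLEUS_SIZE] += nucleation
        for i, flux in growth.items():
            dx[i] -= flux        # intermediate of size i is consumed ...
            dx[i + 1] += flux    # ... and becomes an intermediate of size i + 1
        x = [xi + dt * d for xi, d in zip(x, dx)]
        in_intermediates = sum(i * x[i] for i in range(NUCLEUS_SIZE, CAPSID_SIZE))
        peak_intermediates = max(peak_intermediates, in_intermediates / c0)
    return x, peak_intermediates


def capsid_fraction(x, c0: float = 1.0) -> float:
    """Fraction of building blocks that ended up in complete capsids."""
    return CAPSID_SIZE * x[CAPSID_SIZE] / c0


if __name__ == "__main__":
    for label, k_nuc in (("slow nucleation", 0.01), ("fast nucleation", 100.0)):
        state, peak = simulate(k_nuc=k_nuc, k_el=10.0)
        print(f"{label}: capsids {capsid_fraction(state):.2f}, "
              f"peak intermediates {peak:.2f}")
```

Comparing a slow and a fast nucleation rate constant at the same elongation rate reproduces the qualitative behavior described in the text. With slow nucleation, intermediates stay at a low level while capsids accumulate. With fast nucleation, the free building blocks are used up by many nuclei that cannot all be completed, so intermediates dominate and few capsids form. Because the toy model has no dissociation, it cannot show the slow annealing of trapped intermediates.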
In addition, interactions with the nucleic acid can facilitate nucleus formation. [78, 81, 82] The assembled viral capsids are often highly stable towards changes in their environment and can resist, for example, extreme pH values, high concentrations of denaturants and organic solvents, dilution to very low concentrations that do not support assembly, and even dehydration. [83] [84] [85] [86] [87] This stability is reflected by a strong hysteresis observed in virus dissociation experiments. Theoretical and experimental results suggest that the interaction between individual building blocks is rather weak and cannot account for the high apparent stability. [73, 88] However, in the capsid the binding energies of the subunits add up, explaining the pseudostability of capsids under conditions where assembly typically does not occur. Nevertheless, theory suggests that in such a pseudo-equilibrium there are always some free building blocks in solution that could dynamically exchange with proteins in the capsid. This process has been termed "capsid breathing", and some indirect experimental evidence supporting this model has been described. [88] [89] [90] [91] [92] We note that the term "breathing" has also been used to describe the significant conformational changes that can occur in viral cps, sometimes even resulting in transient externalization of domains that, according to structural models, are located on the inside. [91, 93] Most commonly, either helical or icosahedral capsid structures are observed, both of which allow the formation of a regular shell from multiple copies of a single cp as a result of their high symmetry. [72] The prototype of a helical virus is the tobacco mosaic virus (TMV). In this virus, cp monomers assemble around the RNA, and the length of the genome defines the nucleocapsid size. [94, 95] Icosahedral structures allow the complete closure of the shell using just one type of protein.
[72] Additionally, the almost spherical structure reduces the surface area relative to the enclosed volume. An icosahedron has 12 vertices, 20 faces, and 30 edges, corresponding to its different symmetry axes (fivefold, threefold, and twofold, respectively). At least 30, generally dimeric, building blocks are required to form the smallest possible icosahedral capsid, in which all proteins are located in pentamers. Larger capsids are formed by the addition of hexamers. Only certain numbers of hexamers can be inserted to produce a perfect icosahedron, as reflected by the triangulation number (T):

$$T = h^2 + hk + k^2$$

where h and k are non-negative integers that are not both zero; T = 2 is therefore not allowed. The number of building blocks corresponds to 30T, that is, 60T subunits arranged in 12 pentamers and 10(T − 1) hexamers. Even though viral capsids can be built from a single cp, the contacts surrounding a subunit differ between hexamers and pentamers. [96, 97] However, the conformational changes that compensate for this are often marginal, yielding cp structures that are quasi-equivalent. Additional hexamers may be introduced in a ring-like fashion, leading to prolate capsids as in bacteriophage Phi29. [98, 99] Complex viruses often deviate from icosahedral symmetry; for example, the HIV cp typically forms conical, but also rod-shaped, capsids. [100]

Figure 2. Viral lifecycle: a) Prokaryotic viruses (phages) attach to the host cell and directly inject their genome, as shown here for a tailed phage. Then, the genome is amplified and the proteins are synthesized in the cytosol. In the case of tailed phages, an empty capsid forms, then the genome is incorporated, followed by maturation. The assembled phages accumulate in the cytoplasm until the cell ruptures. b) Eukaryotic viruses transfer their capsid into the cytoplasm. Internalization can occur by membrane fusion in the case of enveloped viruses. At the nucleus, the capsid disintegrates and releases its genome, which is replicated in the nucleus. Protein synthesis and assembly take place in the cytoplasm. For (+)ssRNA viruses, the genome is also amplified in the cytoplasm. After assembly and possibly maturation, the virus is released from the cell through budding or by destroying the cell.

3. Mass Spectrometry in Structural Virology

Next, we describe how biochemical and biophysical properties of viruses can be studied by modern MS. Before focusing on some case studies, we first summarize some of the key technologies used in four boxes: "proteomics", "native and ion mobility mass spectrometry", "H/D-exchange mass spectrometry", and "chemical-labeling approaches coupled to mass spectrometry". These boxes refer readers to background Reviews on the individual methods. When applied to studying virus structure, dynamics, and assembly, the information obtained by these four approaches is schematically summarized in Figure 4.

Box 1: Proteomics

MS-based proteomics is currently the most powerful tool to obtain sequence information on proteins. [6] Common practice in proteome analyses is the sequencing of proteolytic peptides through fragmentation by collision-induced dissociation (CID) and/or electron transfer dissociation (ETD). [101, 102] The resulting fragmentation spectra are then searched against large protein-sequence databases, that is, against theoretical spectra predicted from continuously updated genome/protein databases. [103] Besides identifying proteins, MS-based proteomics can also be used to identify and localize naturally occurring posttranslational modifications (such as glycosylation, [104] phosphorylation, [105] and lysine and N-terminal acetylation [106]) or chemical modifications of proteins (induced by cross-linking reactions, surface mapping by oxidation, [107] acetylation, [108] or deuterium incorporation [107]). For peptide/protein identification, the highly specific protease trypsin is the enzyme of choice, although other enzymes are becoming popular, especially in conjunction with ETD.
[109] Box 2: Native and Ion Mobility Mass Spectrometry In ESI-MS, molecules are ionized by a combined process of desolvation and (de)protonation. ESI is most sensitive at low flow rates (nL min⁻¹), and optimal conditions are obtained when the analyte is dissolved in a 1:1 mixture of water and organic solvent. Such solutions are typically acidified to promote protonation. Although these solvent conditions allow ultrasensitive detection, they typically denature and unfold proteins. However, the goal of native MS [21, 22, 110] is to preserve higher-order protein structures to enable the investigation of protein conformation, protein-complex topology, and dynamics. Therefore, the samples need to be kept as close as possible to physiological conditions. An ESI-MS-compatible "volatile buffer" is provided by aqueous ammonium acetate, whereby salt concentrations can be varied from approximately 5 mM to 1 M while retaining a neutral pH value. Numerous biophysical validation studies have shown that many quaternary protein structures can be preserved under these conditions. During the ESI process, the volatile buffer easily evaporates, leaving "naked" protein ions, albeit substantially less charged than when sprayed from organic ESI solvents, as the surface of these folded species is more compact. Since larger protein assemblies may attain m/z values exceeding a few thousand, dedicated or modified time-of-flight (ToF) mass analyzers are required for their detection. [111, 112] Retaining quaternary structures in the gas phase makes it possible to measure the mass of intact protein complexes and sub-complexes, from which information about stoichiometry and topology can be extracted. [19, 113] To probe the structure of protein complexes, sub-complexes may also be formed intentionally, using either a low concentration of denaturant, a shift in pH value and/or ionic strength, [114] or CID.
[115] [116] [117] By in vitro reconstitution of membrane protein complexes in micelles, even membrane-embedded protein complexes can be studied by native MS. [118] A further strength of native MS is its high sensitivity, allowing even the analysis of endogenously expressed protein complexes. [119, 120] The available toolbox has recently been extended by coupling ion mobility (IM) separation to MS (IMMS). [121] In IMMS, ions are separated not only on the basis of their m/z but also, inside a gas-filled ion-mobility chamber, according to their drift time, which depends on their overall shape or collision cross section (Ω). Typically, molecules with larger Ω values, that is, larger apparent volumes, exhibit longer drift times. Using IMMS data, the Ω value, or average projected area, of a protein or protein complex can be determined. Early results revealed that solution-phase structures can be largely retained in the gas phase, in particular for larger protein complexes. [121, 122] IMMS is nowadays used in conjunction with computational modeling to generate refined structural models of protein complexes. [114, 123] For instance, when high-resolution structures (from X-ray crystallography or NMR spectroscopy) of the constituents of a protein complex are available, the Ω value of the intact complex and/or sub-complexes can be used to predict structural models.

Figure 3. Model for nucleated assembly: Conformational change or oligomer formation can result in nucleation. Nucleus formation is a slow reaction, whereas the subsequent addition of building blocks proceeds rapidly towards capsid completion. In cases where nucleation is a fast process, assembly intermediates accumulate because of over-nucleation, and few capsids are formed. [73]

Figure 4. Left, from top to bottom: how proteomics approaches can be used to probe the composition of a virus. Proteins of the SARS coronavirus were identified early on by MS; the detected peptides of the spike protein S1 are shown (mapped onto the structure model 1Q4Z). [143] Beneath are schematic illustrations of how proteomics was used to map the precise disposition of heterogeneous glycosylation patterns in the major HIV surface protein, [65] multiple phosphorylation sites in the HBV cp, [157] proteolytic cleavage sites in bacteriophage P22 gp4, [193] and the location of cross-links, as in HK97 maturation. [67, 165] Right, from top to bottom: how MS can be used to obtain more structural information, namely how H/D exchange and chemical labels, for example, cross-linking, were used to dissect a crucial step in HIV capsid maturation, [64] and how native and ion-mobility MS provide information on the stoichiometry of an assembly, in favorable cases even the binding affinity, [160] the amount of material encapsulated, [184] and the shape of a viral protein complex. [172]

Box 3: H/D-Exchange Mass Spectrometry

In H/D-exchange MS, the incorporation of deuterium atoms into proteins is monitored over time. [124] [125] [126] [127] [128] The method is based on the exchange of solvent-accessible backbone hydrogen atoms with deuterium atoms when a protein is placed in deuterated water (D2O). The subsequent increase in protein mass over time is measured by MS. Using intact or native MS, global exchange in a protein or protein complex can be monitored, providing information on major conformational changes, for example upon ligand binding. [129] The location of deuterium incorporation can be determined in more detail by monitoring the mass shift in peptic fragments produced after the H/D-exchange reaction. For this purpose, the samples are incubated in D2O for different times and then typically diluted into acidic solution (pH ≈ 2.5) at low temperature (0 °C) to slow down back-exchange processes. However, few proteases can efficiently digest proteins at such pH values.
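Peptide-level deuterium uptake is usually expressed relative to an undeuterated and a fully deuterated control, which corrects for the back-exchange that occurs during quenching, digestion, and chromatography. The following minimal sketch assumes centroid-based peptide masses and a simple rule for counting exchangeable backbone amides; the function names and the example values are hypothetical.

```python
"""Sketch of back-exchange-corrected deuterium uptake for one peptic peptide.

Assumptions (for illustration): the centroid of the isotope envelope is used
as the peptide mass, a fully deuterated control is available, and the
exchangeable backbone amides are counted as all residues except the
N-terminal one and prolines (some workflows also exclude the second residue).
"""

PROTON_MASS = 1.007276  # Da


def centroid_mass(mz_values, intensities, charge: int) -> float:
    """Intensity-weighted centroid m/z converted to a neutral mass."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")
    mz_centroid = sum(m * i for m, i in zip(mz_values, intensities)) / total
    return charge * (mz_centroid - PROTON_MASS)


def max_exchangeable_amides(sequence: str) -> int:
    """Backbone amide hydrogens that can exchange (N-terminus and Pro excluded)."""
    return sum(1 for residue in sequence[1:] if residue != "P")


def corrected_uptake(m_t: float, m_0: float, m_100: float, n_max: int) -> float:
    """Deuterons incorporated after labeling time t, corrected for back-exchange.

    m_0 is the undeuterated control and m_100 the fully deuterated control,
    both processed with the same quench, digestion, and LC conditions as m_t.
    """
    if m_100 <= m_0:
        raise ValueError("fully deuterated control must be heavier than m_0")
    return (m_t - m_0) / (m_100 - m_0) * n_max
```

For example, a peptide with six exchangeable amides whose centroid shifts by 1.5 Da at time t, while the fully deuterated control shifts by 3.0 Da, would be assigned 3.0 deuterons by `corrected_uptake(1001.5, 1000.0, 1003.0, 6)`. Plotting this value against labeling time for each peptide, and comparing two states (for instance, before and after maturation), highlights the regions whose exchange behavior changes.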
Pepsin has optimal activity at low pH values and is the preferred enzyme for H/D-exchange applications. Even though pepsin is regarded as a nonspecific protease, the detected peptides are reproducible. [130] After digestion, the peptides obtained are subjected to MS analysis and identified by exact mass and fragmentation pattern. The mass shift due to deuterium incorporation can then be monitored over time, elucidating which peptides are engaged in structural changes occurring upon protein-complex formation or, for instance, virus maturation. [66, 131] H/D exchange coupled to MS has become a valuable analytical tool for the study of protein dynamics. By combining this information with classical functional data, a more thorough understanding of protein function can be obtained. The H/D-exchange MS approach is comparable to, and has been used in conjunction with, NMR spectroscopic experiments in which H/D exchange is monitored over time. Although H/D-exchange analysis was for a long time largely limited to small proteins or protein domains, improved resolution and sensitivity of mass analyzers, combined with better software for data interpretation, now also allow large proteins, such as whole antibodies, to be investigated. [132]

Box 4: Chemical-Labeling Approaches Coupled to Mass Spectrometry

Besides H/D exchange, there are several alternative chemical approaches to probe the surface accessibility and interconnectivity of proteins in protein complexes that are frequently used in combination with MS. [133] In these approaches, specific amino acids are rapidly and efficiently labeled chemically under pseudo-physiological conditions. MS is then used to probe the chemically induced mass shifts in the peptides/amino acids affected by the label. The idea is that only accessible amino acids will be modified, allowing conformational changes in proteins to be monitored, for instance upon ligand binding or protein-complex formation.
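The mass-shift logic of such labeling experiments can be sketched in a few lines. The example below assumes monoisotopic residue masses and two common label increments, acetylation of a free amine (+42.0106 Da) and oxidation by a single oxygen atom (+15.9949 Da); the helper names and the short peptide used in the usage note are hypothetical.

```python
"""Sketch: counting chemical labels on a peptide from its observed mass.

Assumptions (for illustration): monoisotopic residue masses, an otherwise
unmodified linear peptide, and a single label type per search.
"""

WATER = 18.010565
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
LABEL_SHIFT = {"acetyl": 42.010565, "oxidation": 15.994915}


def peptide_mass(sequence: str) -> float:
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def amine_sites(sequence: str) -> int:
    """Free amines available for acetylation: lysine side chains plus the N-terminus."""
    return sequence.count("K") + 1


def count_labels(observed: float, sequence: str, label: str,
                 max_labels: int, tol: float = 0.01):
    """Return the number of labels that explains the observed mass, or None."""
    base = peptide_mass(sequence)
    for n in range(max_labels + 1):
        if abs(base + n * LABEL_SHIFT[label] - observed) <= tol:
            return n
    return None
```

For the hypothetical dipeptide GK (203.127 Da), an observed mass of about 287.148 Da is explained by two acetyl groups, that is, both free amines were modified. Comparing such label counts for the same peptide in two states, for example a free subunit and an assembled capsid, indicates which sites became less accessible.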
Most popular in the field of structural biology are oxidative labeling by hydroxyl radicals [134] [135] [136] and labeling of free accessible amine groups, for instance by acetylation. [108, 133, 137] Chemical labeling with molecules that carry at least two reactive groups can also be used to cross-link specific amino acids that are in close proximity to each other. Using such cross-linking approaches, intra- and intermolecular interactions can be identified in a protein complex. [138] [139] [140] The most commonly used bifunctional chemical cross-linkers target lysine residues, whereby the linker rigidity and the spacing between the two reactive groups define the range of interactions that can be probed, providing distance restraints for computational modeling. After modification of the proteins, residual reagent needs to be removed or inactivated. Following proteolysis, peptides originating from specific cross-linked regions of the proteins need to be filtered out of the background of unmodified peptides, for which dedicated software is usually a prerequisite. [139, 141] For instance, chemical cross-linking and MS were applied to probe subunit-subunit interactions in the bacteriophage P22 procapsid. [138] New viral infectious pathogens are most efficiently characterized by genotyping. [142] However, such experiments do not directly provide information about the expressed viral proteins. Using standard proteomics experiments (Box 1), viral proteins can easily be identified even from complex samples such as host-cell lysates. Such data provide direct information about the composition of the virion and may reveal its structural and associated proteins. [143] An illustrative example is provided by the human coronavirus causing SARS, which emerged around 2003 in Asia. After the outbreak, all the predicted structural proteins were soon identified by MS (Figure 4), and several glycosylation and phosphorylation sites could also be mapped.
[28, 143, 144] A few years later, some of these proteins could be structurally analyzed using high-resolution techniques such as X-ray crystallography. [145] Quantitative proteomics experiments have been used to study dynamic temporal changes in the host proteome upon infection by the pathogen, with the aim of identifying the host-pathogen interactome. [146, 147] For this purpose, some form of isotope labeling is often used, enabling the identification of proteins whose expression level is most affected by the infection. For example, isotope-labeling strategies in combination with proteomics provided information on the cellular changes upon SARS infection and allowed the identification of host-cell factors putatively involved in virus replication. [148, 149] Surface-exposed viral proteins or protein domains usually mediate the attachment of the virus to the host. For enveloped viruses such as HIV, these proteins are located in the lipid bilayer and are typically highly glycosylated. These glycans are likely involved in antigenicity, shielding the virus from the immune system. The sites and, in particular, the types of glycosylation are heavily affected by mutations, hampering vaccine development against HIV. [150] Proteomics methods have been used to reveal major changes in the glycosylation patterns between different viral strains. [151] [152] [153] After proteolytic digestion, the mass difference between glycosylated and enzymatically deglycosylated samples determines the size of the carbohydrate. Using tandem MS, the modified amino acid and the composition of the glycans can be determined. [65, 151, 152] Viral protein decoration by glycans may be very complex, as attached carbohydrates typically vary in length and type; all three structural classes (high-mannose, complex, and hybrid) occur and have been detected on the HIV gp120 protein (Figure 4).
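The mass-difference approach to glycan sizing can be sketched as follows. It assumes, as one common choice that the text does not specify, deglycosylation with PNGase F, which converts the glycosylated Asn into Asp and thereby adds 0.984 Da to the deglycosylated peptide. The composition search, its bounds, the function names, and the 1000 Da peptide in the test are hypothetical.

```python
"""Sketch: inferring an N-glycan composition from a glycopeptide and its
enzymatically deglycosylated counterpart.

Assumptions (for illustration): deglycosylation with PNGase F, which converts
the glycosylated Asn into Asp (+0.984016 Da); monoisotopic residue masses;
and a small, bounded composition search over HexNAc, Hex, dHex, and NeuAc.
"""
import itertools
import re

DEAMIDATION = 0.984016  # Asn -> Asp mass change introduced by PNGase F
MONOSACCHARIDE = {"HexNAc": 203.079373, "Hex": 162.052824,
                  "dHex": 146.057909, "NeuAc": 291.095417}
MAX_COUNT = {"HexNAc": 7, "Hex": 12, "dHex": 3, "NeuAc": 4}
SEQUON = re.compile(r"(?=N[^P][ST])")  # N-X-S/T with X != P; lookahead allows overlaps


def sequon_positions(sequence: str) -> list[int]:
    """0-based positions of Asn residues in N-X-S/T sequons."""
    return [m.start() for m in SEQUON.finditer(sequence)]


def glycan_mass(glyco_mass: float, deglyco_mass: float) -> float:
    """Summed monosaccharide residue mass attached to the peptide."""
    return glyco_mass - (deglyco_mass - DEAMIDATION)


def compositions(mass: float, tol: float = 0.02) -> list[dict]:
    """All compositions within tol (Da) of the given glycan residue mass."""
    names = list(MONOSACCHARIDE)
    ranges = [range(MAX_COUNT[n] + 1) for n in names]
    hits = []
    for counts in itertools.product(*ranges):
        total = sum(c * MONOSACCHARIDE[n] for c, n in zip(counts, names))
        if abs(total - mass) <= tol:
            hits.append(dict(zip(names, counts)))
    return hits
```

A residue mass of about 1216.42 Da, for instance, matches HexNAc2Hex5, the composition of the high-mannose glycan Man5GlcNAc2. Several compositions can fall within a given tolerance, which is one reason why the site and glycan assignments described above additionally rely on tandem MS.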
Presumably because of the high structural flexibility and heterogeneity of the glycan layer, only a deglycosylated variant of gp120 could be crystallized to date. Binding of gp120 to CD4 and chemokine receptors on human T-cells is apparently influenced by the glycosylation pattern, making this sort of analysis important. Although virus capsids may self-assemble, the assembly process in vivo is much more complex, and the host cellular machinery can regulate it. For example, phosphorylation of HBV cp enhances the formation of DNA from RNA in the assembling capsid. [154] This is likely connected to the enhanced capsid assembly and encapsulation of RNA observed after phosphorylation of multiple HBV cp amino acid residues. [155] [156] [157] Proteomics and mutagenesis studies have revealed that the host kinases PKA and PKC each phosphorylate one of the two central serine residues of HBV cp (Figure 4). [156, 157] Proteins involved in viral assembly and maturation processes are less prone to mutational changes than the surface proteins, and therefore offer potential targets for interference. MS has been used extensively not only to elucidate virus structure, but also to monitor dynamic changes therein throughout the viral life cycle. Such modifications and rearrangements, which can take place during virus assembly and maturation and also upon infection, have traditionally been mapped using fluorescent labels or globally monitored by spectroscopic techniques. [158, 159] Moreover, mutagenesis studies, such as alanine scanning, have been very valuable to characterize and localize posttranslational modifications occurring during maturation. [67, 156, 160, 161] In an elegant example, several MS-based technologies have been combined with electron microscopy (EM) and X-ray crystallography to investigate capsid formation and maturation of the icosahedral bacteriophage HK97.
This lambdoid phage stores its dsDNA under high pressure and thus requires a stable capsid structure and irreversible assembly to ensure protection of the genome. [68, 162] Initially, 420 cp subunits organize into pentamers and hexamers. These capsomers then build the spherical, thick-walled prohead I (Figure 5). The N-terminal D-domain functions as a scaffold and is cleaved off during maturation, leading to prohead II. Such proteolytic cleavage has been observed in many viruses, and the exact cleavage site can be identified using proteomics-based methods (Figure 4). [163] A critical step in HK97 maturation is the conformational change leading to a thin-walled icosahedral capsid or phage head. This expansion is usually accompanied by DNA packaging, but can be triggered in vitro in the absence of DNA and of the packaging machinery. Major structural changes associated with this transition were recognized in EM and crystallography studies. However, the resolution achieved was limited for some intermediate structures. H/D-exchange MS (Box 3) on the different intermediates in the head assembly, from capsomers to the nearly mature head I, could precisely demonstrate which amino acids were involved in the changes accompanying maturation (Figure 5). [66, 131, 164] The next step in maturation is the autocatalytic cross-linking of the cp subunits, locking the capsid in the expanded conformation. The specific peptide carrying a rather unusual lysine-asparagine cross-link could be identified by MS after proteolysis by trypsin (Figure 4). [67] More typical inter- and intramolecular cross-links are provided by disulfide bonds, which are commonly observed in mature viruses. For instance, the Cytomegalovirus glycoprotein B is stabilized by extensive disulfide bonds, which have largely been mapped using MS; such disulfide linkages are apparently common amongst Herpesviridae (Figure 4). [165]
Figure 5. HK97 and HIV virus maturation studied by mass spectrometry. a) The cp of bacteriophage HK97 preassembles into hexamers and pentamers. These then form prohead I together with the viral protease. Assembly is facilitated by the cp D-domain, which functions as a scaffold. The protease cleaves the D-domain and itself, leading to prohead II and marking the first irreversible step in capsid formation. Under certain conditions, expansion occurs, followed by the second irreversible step, autocatalytic cross-linking of cp subunits. Head I of a cross-link-deficient mutant closely resembles the mature head II. [66] b) The spine helix from the crystal structure of one subunit in prohead II (yellow) was aligned with the straightened helix of the same subunit in head II. A peptide analyzed by H/D exchange (c) is shown in blue. As revealed by the exchange data in (c), the straightening of the helix occurs upon transition from prohead II (P-II) to the expansion intermediate (EI). Also shown are the deuterons incorporated in the corresponding peptide fragments for the free capsomers and head I (H-I). Reprinted by permission from Macmillan Publishers Ltd: Nature, [66] 2009. d) In HIV, the immature capsid of the Gag polyprotein encloses the (+)ssRNA and becomes enveloped. After release from the cell, the viral protease cleaves Gag into three major proteins and some small peptides. The matrix domain stays bound to the lipid envelope through a myristoyl residue, and the nucleocapsid domain is associated with the ssRNA. A conformational change in cp leads to a collapse from a spherical to a conical capsid. At this stage the virus is infective. Other proteins, including the protease and peptide fragments, are omitted for clarity. [64]
Another approach commonly used to identify and map structural changes or binding surfaces between viral proteins is chemical cross-linking coupled to MS (Box 4).
[141, 160, 166] H/D-exchange and chemical cross-linking experiments [64, 166, 167, 168] have been combined to analyze the pleomorphic HIV capsids. Their variable appearance obstructs classical high-resolution structural analysis, since this generally relies on averaging multiple particles. [100] In HIV, the Gag polyprotein assembles around the (+)ssRNA at the plasma membrane. After budding of the enveloped virus, the viral protease residing in the spherical immature capsid cleaves the Gag protein. The released domains reorganize, leading to mature virions. The nucleocapsid domain is associated with the RNA in the now collapsed, conical core formed by the capsid domain (from here on cp), whereas the matrix domain stays bound to the envelope by an N-terminal myristoylation (Figure 5). This myristoyl residue was also detected by MS in the matrix domain of virus-like particles, demonstrating their similarity to the fully infectious virus. [167] Alanine scanning had already identified certain sites in the N- and C-terminal domains of the cp involved in intersubunit binding between these homotypic domains. However, only by using H/D exchange was a previously unknown site in the N-terminal region of cp found to be highly protected in in-vitro assembled and mature particles, whereas immature capsids behaved similarly to the free cp subunit. [167, 168] In general, the cp arrangement is similar for the different HIV capsid appearances. Moreover, chemical cross-linking revealed a previously unknown interaction between the N- and C-terminal domains of adjacent monomers (Figure 4). This contact probably drives maturation and is likely protected or inhibited in the Gag polyprotein. Another example focusing on virus assembly used H/D-exchange MS to locate regions in the MS2 bacteriophage cp dimer that undergo conformational changes upon binding to a stem-loop from the genomic ssRNA known to initiate assembly.
[169] The data revealed specific areas within the cp dimer that altered their exchange kinetics in the presence of the RNA, including the known RNA-binding sites. ESI-MS has tremendously expanded the attainable mass range for the analysis of biomolecules, and viruses and their capsids have provided showcase benchmarks. The successful transfer of TMV into the gas phase using ESI was demonstrated as early as 1996. [83] In these pioneering studies, exact mass determination was precluded by the limitations of the mass analyzer employed. However, collection of the electrosprayed TMV particles and subsequent EM analysis revealed that the viral structures had been largely retained. The TMV harvested after MS was even infective. Since then, multiple instrumental setups have been applied to estimate the mass of virus particles. A combination of m/z and charge detection in a ToF analyzer resulted in an estimated mass of 40 MDa for TMV, with a substantial uncertainty of 15 %. [170] ESI can thus readily be used to ionize viruses and viral particles, but accurate mass analysis is not straightforward when these particles become too large or heterogeneous. ESI has also been combined with gas-phase electrophoretic mobility molecular analyzers (GEMMA). [171, 172] Here, the high charge of the particles produced by ESI is reduced to obtain singly charged ions, which are subsequently separated and sized by their electrophoretic mobility. GEMMA analysis was successfully applied to the 4.6 MDa cowpea chlorotic mottle virus (CCMV). Although the mass resolution of such an instrument is still too low to enable accurate mass measurement, GEMMA does, in parallel, provide information about the electrophoretic mobility diameter of the analyzed particle. Such analysis indicated that the gas-phase CCMV particle had largely retained its quaternary structure (Figure 4).
[171, 172] A first, more accurate mass assessment was performed on intact MS2 particles, whereby ESI-ToF analysis allowed the identification of partly resolved charge states. [173] Unprecedented high-resolution data on intact viral capsids of HBV were obtained using a modified Q-ToF instrument. [111, 174] HBV capsids are rather unique in exhibiting two distinct icosahedral morphologies even in vivo, composed of 90 and 120 dimers with masses of approximately 3 and 4 MDa, respectively. The mass spectra displayed well-separated charge-state distributions (Figure 6) for both capsids, enabling a mass assignment within 0.1 % and revealing that both lattices were complete. The HBV capsids were surprisingly stable during transfer into the gas phase and through the vacuum of the mass spectrometer. Measuring the Ω values (collision cross sections) of the capsids by IMMS allowed an estimation of the capsid radii in good agreement with the dimensions of the particles in EM, verifying a largely retained capsid morphology in the gas phase. [175] Although the HBV capsid turned out to be very stable, models have suggested that dimers in the particles exchange with very low-abundance "free" cp dimers in solution. Native MS combined with CID on the intact HBV capsids was used to monitor the incorporation of isotopically labeled cp dimers into preassembled unlabeled HBV capsids. Slow exchange could be observed over a timeframe of months, albeit only for the 3 MDa particles and exclusively at low temperatures, [176] providing experimental evidence for the theoretically predicted "capsid breathing". [88] The even larger (ca. 10 MDa) norovirus capsid proved much less stable and prone to dissociation upon changes in pH value and ionic strength, as monitored by native MS. [177] At pH 6 in solution, only the intact T = 3 capsid was detected, although with insufficient resolution to allow an accurate mass assignment. More acidic pH values resulted chiefly in cp dimers.
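How a mass can be assigned from a resolved charge-state distribution, as for the HBV capsids above, can be sketched as follows. Assuming every peak corresponds to an [M + zH]z+ ion (no adducts), two adjacent peaks at m/z values p1 > p2 carrying charges z and z + 1 satisfy p1 - m_H = M/z and p2 - m_H = M/(z + 1), so z = (p2 - m_H)/(p1 - p2) and M = z(p1 - m_H); averaging over the whole series reduces the error. The function names and the simulated 3.0 MDa peak list are hypothetical, and real megadalton spectra show adduct-related peak broadening that this sketch ignores.

```python
PROTON = 1.007276  # proton mass in Da


def charge_from_adjacent(p_high, p_low, proton=PROTON):
    """Charge z of the higher-m/z peak, given adjacent peaks of charge z and z + 1."""
    return round((p_low - proton) / (p_high - p_low))


def mass_from_series(mz_peaks, proton=PROTON):
    """Estimate the neutral mass M from m/z values of consecutive charge states.

    Assumes all peaks belong to one species as [M + zH]z+ ions.
    Returns (mean mass, list of (charge, mass) per peak).
    """
    peaks = sorted(mz_peaks, reverse=True)  # highest m/z corresponds to the lowest charge
    z0 = charge_from_adjacent(peaks[0], peaks[1], proton)
    per_peak = [(z0 + i, (z0 + i) * (p - proton)) for i, p in enumerate(peaks)]
    mean_mass = sum(m for _, m in per_peak) / len(per_peak)
    return mean_mass, per_peak


# Simulated example: a hypothetical 3.0 MDa particle observed at charges 130 to 134.
true_mass = 3.0e6
simulated = [(true_mass + z * PROTON) / z for z in range(130, 135)]
mass, assignments = mass_from_series(simulated)
print(f"{mass / 1e6:.4f} MDa; charges {[z for z, _ in assignments]}")
```

Rounding z to the nearest integer is what makes the approach robust: small m/z errors shift the raw quotient slightly but rarely change the assigned charge when charge states are well separated.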
Intriguingly, at basic pH values higher-order oligomers were formed, preferentially at high ionic strength. Remarkably, the transition between T = 3 particles and other oligomers was fully reversible. The larger oligomers most likely arose from initial dissociation of the T = 3 capsids into dimers and subsequent reassembly. The main products of this pathway were identified as 60- and 80-mers, although smaller oligomers were also present under certain conditions (Figure 7). Atomic force microscopy (AFM) [177] and earlier EM studies [178] confirmed that T = 1 norovirus capsids consisting of 60 cp subunits can be formed at basic pH values. Ionic strength and pH value can thus drive the norovirus capsid into various morphologies, rendering it a particularly suitable model system to study capsid (dis)assembly. Owing to their low abundance and transient nature, potential oligomeric intermediates occurring during virus assembly, between the usually dimeric building block and the intact capsid, are extremely hard to probe. The nucleating intermediate is thought to form rather slowly, whereas the subsequent addition of building blocks proceeds rapidly towards capsid completion (Figure 3). The high sensitivity of native IMMS was used to detect and structurally characterize a wide variety of intermediate oligomers of norovirus and HBV coexisting with the intact capsid forms. [179] In combination with computational modeling, the shapes of these intermediates could be assessed, revealing that they all exhibited extended sheet-like structures, as would be expected for on-pathway products. Moreover, the anticipated assembly nuclei for norovirus (decamer) and HBV capsids (hexamer) could be confirmed from the MS data. An assembly pathway common to both viruses was proposed. [179] In the case of the MS2 bacteriophage, the assembly pathways are restricted by the bound RNA. Introducing longer RNA stretches favors assembly along the threefold axis (C3 axis).
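Assigning measured masses to oligomeric states, as for the norovirus 60- and 80-mers or the T = 1 and T = 3 particles discussed above, amounts to dividing by the subunit mass and checking how close the quotient is to an integer. The sketch below also uses the Caspar-Klug relation that an icosahedral capsid with triangulation number T = h² + hk + k² contains 60T subunits, which reproduces the 90 and 120 dimers of the T = 3 and T = 4 HBV capsids. The function names, the 0.5 % acceptance threshold, the subunit mass, and the "observed" masses are hypothetical placeholders, not values from the cited studies.

```python
def triangulation_number(h, k):
    """Caspar-Klug triangulation number T = h^2 + hk + k^2 for lattice indices h, k."""
    return h * h + h * k + k * k


def subunits_in_capsid(t):
    """An icosahedral capsid of triangulation number T contains 60*T subunits."""
    return 60 * t


def assign_oligomer(observed_mass, subunit_mass, max_rel_dev=0.005):
    """Return (n, relative deviation) for the n-mer closest to observed_mass.

    Returns None if the closest n-mer deviates by more than max_rel_dev
    (an assumed threshold; in native MS, adducts usually make masses slightly high).
    """
    n = max(1, round(observed_mass / subunit_mass))
    expected = n * subunit_mass
    rel_dev = (observed_mass - expected) / expected
    return (n, rel_dev) if abs(rel_dev) <= max_rel_dev else None


# HBV: T = 3 (h=1, k=1) and T = 4 (h=2, k=0) give 180 and 240 cp, i.e. 90 and 120 dimers.
print(subunits_in_capsid(triangulation_number(1, 1)) // 2,
      subunits_in_capsid(triangulation_number(2, 0)) // 2)

# Hypothetical subunit mass and observed masses (Da), for illustration only.
SUBUNIT = 58_000.0
for m in (3.49e6, 4.65e6, 1.045e7):
    print(m, assign_oligomer(m, SUBUNIT))
```

Reporting the relative deviation alongside n makes it easy to flag assignments where residual adducts or incomplete lattices, rather than a different stoichiometry, may explain the mass offset.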
Importantly, not only the polyanionic character but also the sequence of the RNA affects the assembly efficiency. [180] The abundance of two major intermediates was monitored by MS over time in the presence of various RNAs and interpreted by kinetic modeling. [181] In conclusion, the common protein-centric view of capsid assembly seems rather simplified. Furthermore, the structures of the two intermediates were deduced by IMMS of the intact and CID-fragmented species. [182] Both exhibit an extended, ring-like topology as found in the norovirus and HBV oligomers. For applications in nanotechnology, not only the integrity of the capsids is important, but also their dynamic properties, such as reversible assembly. Knowledge about such abilities is important for particle modification and to support reactions of encapsulated materials.
Figure 6. Capsid breathing in HBV monitored by tandem MS: 15N-labeled cp dimers of HBV were incubated with preassembled unlabeled capsids. At certain times, native MS spectra were recorded and either T = 3 or T = 4 capsids of HBV were selected for CID. Subunit exchange was detected in the T = 3 capsids after prolonged times, as judged by the growth of a signal assigned to labeled monomers, which were ejected from the capsids in tandem MS. Reproduced by permission of the PCCP Owner Societies from Ref. [176].
Limited proteolysis in combination with MS on intact capsids revealed the dynamic nature of the cp in Flock House virus and CCMV, in which cp regions located at the inner face become transiently exposed to the exterior. [91, 183] The incorporation of material into the capsids and the modification of capsids and other protein cages exploit such dynamic properties. Mass shifts induced by functionalizing virus particles or by creating mixed assemblies can be readily measured with MS, providing a tool to monitor the reaction and control the product quality (Figure 4).
[184] [185] [186] Dendrimers covalently attached to the inner face of a protein cage can be grown to subsequent generations. Thereby, the particle may gain stability, enabling applications in more extreme environments. The possibility to specifically modify the enclosed dendrimers further increases their utility in imaging. [184]
4. Outlook
MS has evolved into a technology that contributes meaningful insights into structural and functional aspects of proteins and protein complexes, such as the viral capsids and viruses described herein. Although we have focused here on structural virology, the applications are not limited to this field, and elegant work has also been described on ribosomes, [17] RNA polymerases, [115] and proteasomes. [187, 188] Although further improvements in mass analyzers and ionization techniques will advance the field, most progress in the near future will come from linking MS with other technologies, such as computational modeling, microscopy, and spectroscopy. Structural modeling of protein complexes is still largely underdeveloped, partly because of the size of the particles to be modeled. However, advances made in recent years, for instance in modeling of the yeast nuclear pore complex, [189] may be suitable for wider application and may benefit from Ω values and topology data of protein (sub)complexes obtained by native IMMS. Coupling of MS with EM or AFM can be envisaged to be bidirectional. The mass analyzer can be used as a sample purification step, using soft-landing approaches to select particles of interest that may then be structurally characterized by EM or AFM. [190] In this way, low-abundance viral-assembly intermediates might be trapped and studied by microscopic techniques.
Conversely, EM- and AFM-based methods can be used to "grab" selected organelles or even protein complexes out of their natural cellular environment and subsequently bring them to the mass spectrometer for identification and/or structural analysis. [191] Another exciting development that may become available in the near future uses ultrashort and intense X-ray laser pulses to obtain high-resolution structures of individual protein complexes in the gas phase. [192] Such a method would be highly complementary to the native MS and IMMS approaches described herein and could provide higher-resolution structural information, which may be directly compared with solution-phase structural data obtained by more conventional structural-biology techniques, such as NMR spectroscopy, X-ray crystallography, and EM. These are exciting times in biomolecular mass spectrometry.
We thank all colleagues in the Heck lab, especially the native MS group, who contributed to some of the research described herein. We would like to take this opportunity to thank all our collaborators through whom we entered the exciting field of structural virology. At the risk of accidentally leaving people out, we mention in particular Norman R. Watts, Paul T.