key: cord-0310793-7rty5s1b authors: Pak, Alexander J.; Gupta, Manish; Yeager, Mark; Voth, Gregory A. title: Inositol hexakisphosphate (IP6) accelerates immature HIV-1 Gag protein assembly towards kinetically-trapped morphologies date: 2022-03-29 journal: bioRxiv DOI: 10.1101/2022.03.29.486265 sha: ca65f0e24fc356d209f4ab5ef6c82491a38bdfe2 doc_id: 310793 cord_uid: 7rty5s1b During the late stages of the HIV-1 lifecycle, immature virions are produced by the concerted activity of Gag polyproteins, primarily mediated by the capsid (CA) and spacer peptide 1 (SP1) domains, which assemble into a spherical lattice, package viral genomic RNA, and deform the plasma membrane. Recently, inositol hexakisphosphate (IP6) has been identified as an essential assembly cofactor that efficiently produces both immature virions in vivo and immature virus-like particles in vitro. To date, however, several distinct mechanistic roles for IP6 have been proposed on the basis of independent functional, structural, and kinetic studies. In this work, we investigate the molecular influence of IP6 on the structural outcomes and dynamics of CA/SP1 assembly using coarse-grained (CG) molecular dynamics (MD) simulations and free energy calculations. Here, we derive a bottom-up, low-resolution, and implicit-solvent CG model of CA/SP1 and IP6, and simulate their assembly under conditions that emulate both in vitro and in vivo systems. Our analysis identifies IP6 as an assembly accelerant that promotes curvature generation and fissure-like defects throughout the lattice. Our findings suggest that IP6 induces kinetically-trapped immature morphologies, which may be physiologically important for later stages of viral morphogenesis and potentially useful for virus-like particle technologies. During the late stages of human immunodeficiency virus type 1 (HIV-1) replication, an immature particle is assembled at the plasma membrane interface and released [1] [2] [3] ; this latter step is aided by the endosomal sorting complex required for transport (ESCRT) machinery. 1, 4 The capsid (CA) and spacer peptide 1 (SP1) domains of the structural Gag polyprotein are responsible for coordinating Gag oligomerization into the immature lattice, [5] [6] [7] which is both hexagonal and spherical (and therefore, incomplete). [7] [8] [9] [10] The matrix (MA) and nucleocapsid (NC) domains mediate interactions with the plasma membrane and viral genomic RNA, respectively. 1, [11] [12] [13] The molecular mechanisms that regulate Gag assembly are potential targets for anti-retroviral therapies, as similarly demonstrated by capsid inhibitor drugs that perturb capsid assembly after maturation. [14] [15] [16] The CA domain of Gag consists of two globular domains: the N-terminal (CANTD) domain consists of 7 a-helices (helix 1-7) and the C-terminal (CACTD) domain consists of 4 a-helices (helix [8] [9] [10] [11] . [17] [18] [19] On the basis of atomistic structures resolved at 3.9 Å and 3.27 Å resolutions, using cryo-electron tomography (cryo-ET) 20 and X-ray crystallography 21 , respectively, several critical inter-and intra-hexameric contacts have been identified. A trimeric contact formed by helix 2 stabilizes inter-hexameric interactions between CANTD while a dimeric contact formed by helix 1 stabilizes intra-hexameric interactions between CANTD. 20, 22 Inter-hexameric interactions are further mediated by the CACTD dimeric contact formed by helix 9, which is partially preserved in the mature CA configuration. [22] [23] [24] [25] Within the CACTD domain, the major homology region (MHR; residues I285-L304) loop and GVGG (residues G352-V355) b-turn are two structural motifs that contain important interactions, likely required to nucleate a-helical folding of the CACTD/SP1 junction (residues P356-T371). 21 This helical junction forms a six-helix bundle (6HB) that is stabilized by hydrophobic knobs-in-holes interactions, and represents an important intrahexameric interaction that is minimally necessary for immature assembly. 21, 26 Mutations within the aforementioned hexameric contacts have been shown to abrogate immature assembly. 6, 21, [26] [27] Beyond the protein-protein interactions described above, immature particle assembly requires other cellular constituents. For instance, viral RNA and to a lesser extent, generic oligonucleotides, serve as catalysts for protein multimerization by promoting colocalization 11, [28] [29] [30] ; in the former case, this is primarily mediated by interactions between the NC domain and the RNA Y packaging signal. [31] [32] The plasma membrane serves as a scaffold to promote protein assembly through dimensional reduction, 11, [33] [34] which is directed by myristoyl insertion and interactions between the MA domain and phosphatidylinositol 4,5-bisphosphate (PIP2) lipids, [35] [36] [37] , and ultimately evolves into the lipid envelop of the released viral particle. In vitro assembly studies have previously identified short oligonucleotides and inositol hexakisphosphate (IP6) as two constituents minimally necessary for immature virus-like particle (VLPs) assembly. 30, [38] [39] In mammalian cells, IP6 is present in the cytoplasm at 10-40 µM concentrations. [40] [41] Recently, combined biochemical and structural studies have shown that IP6 is an essential cofactor that promotes immature particle production around a stoichiometric ratio of one IP6 molecule per immature hexamer. [42] [43] Within the immature hexamer, the negatively charged IP6 molecule binds between two six-membered lysine rings, K290 and K359, which line the interior pore of the hexamer and are positioned directly above the 6HB. 42 While it is evident that the presence of IP6 is essential, the precise role of IP6 during immature virus assembly remains unclear. For example, is the role of IP6 to stabilize the 6HB by neutralizing the two lysine rings along the central pore? Previous in vitro studies have shown that CA/SP1 morphology is sensitive to solution pH or salt (e.g. NaCl or KCl) concentrations, which modulate electrostatic interactions 42, 44 ; under high salt or low pH conditions, CA/SP1 proteins assemble into mature tubes while under low salt or high pH conditions, these same proteins assemble into spherical VLPs. 42, [44] [45] Intriguingly, alanine mutations of the lysine rings tended to increase in vitro VLP formation 46 while reducing in vivo particle production, 43 both with respect to wild-type (WT) virus. Alternatively, is the role of IP6 to influence assembly kinetics? For example, a recent study by Kucharska et al. suggests that Gag encodes all of the necessary interactions for assembly and that IP6 and viral RNA act as assembly rate modulators. 47 Finally, it is possible that IP6 serves both roles in a synergistic fashion. In this work, we use coarse-grained (CG) molecular dynamics (MD) simulations to investigate the molecular influence of IP6 on CA/SP1 assembly. To do so, we derive a bottom-up, low-resolution, and implicit-solvent CG model of CA/SP1 and IP6, and implement a MD trajectorybased enhanced sampling approach. By quantifying 3D and 2D assembly behavior under three different buffer conditions (150 mM monovalent salt, 150 mM salt + IP6, and 400 mM salt), as well as free energy calculations, we show that IP6 primarily acts as an assembly accelerant. Immature lattices formed through the influence of IP6 assemble at faster rates compared to without, thereby promoting curvature generation and fissure-like defects throughout the lattice. Our analysis suggests that IP6 induces kinetically-trapped immature morphologies, which furthermore, may be physiologically important for later stages of viral morphogenesis or leveraged for efficient production of VLPs. We first derived "bottom-up" implicit-solvent CG models for CA/SP1 (35 sites) and for IP6 (1 site), which were parameterized from AA MD trajectories (depicted in Fig. 1A) . The two reference AA MD simulations were of a CA/SP1 18-mer, i.e. a hexamer surrounded by six partial hexamers, and the same 18-mer complexed with seven IP6 molecules positioned between the K290 and K359 rings in each hexamer pore; the first trajectory was used to derive the CA/SP1 CG model while the second trajectory was used to derive CG interactions between CA/SP1 and IP6. The CG Hamiltonian consists of four additive terms (see Fig. 1A ) that respectively represent the excluded volume, flexibility, electrostatics, and short-range attraction of each molecule. Electrostatics are represented by a screened Yukawa potential; monovalent salt concentrations are implicitly modeled by adjusting the Debye length. Complete details on the mapping and parameterization procedure are provided in Methods. Using the aforementioned CG models, we performed two types of CG MD assembly simulations. We first simulated CA/SP1 assembly under conditions that emulate prior in vitro studies. Here, we included CA/SP1 with 50-nt oligonucleotide (represented as a linear polymer) randomly dispersed within a cubic box (see Fig. 1B ) at (i) 150 mM monovalent salt, (ii) 400 mM monovalent salt, and (iii) 150 mM monovalent salt and IP6 conditions. We next simulated CA/SP1 under simplified in vivo conditions, i.e., at a plasma membrane interface. We used the same salt and IP6 conditions, and additionally included a 9000-nt oligonucleotide and a CG lipid membrane using a generic model with 30 kBT bending rigidity. Overlay of a CG CA/SP1 monomer (solid blue, grey, and yellow spheres) and IP6 monomer (red sphere) over an AA CA/SP1 18-mer (transparent blue, grey, and yellow tubes) bound to 7 IP6 molecules (cyan, purple, and red sticks); blue, grey, and yellow represent the NTD, CTD, and SP1 domains, respectively. The additive interaction elements of the CG Hamiltonian are depicted to the right; excluded volume (Eexcl), hetero-elastic network model (EhENM), screened Yukawa electrostatics (Eelec), and protein-protein associative contacts (Eattr). (B) A representative snapshot of an initial simulation configuration; active CA/SP1 (blue, grey, and yellow beads), inactive CA/SP1 (white beads), IP6 (red beads), and oligonucleotide (orange beads) molecules are randomly distributed. (C) Schematic of the parallel sampling technique used to propagate assembly during CGMD simulations. For each CGMD simulation of length Δ , multiple CGMD trajectories were spawned (thin red lines) from the previously selected trajectory. From the probability distribution of resultant assembly sizes ( ( )), a trajectory that corresponded to the maximum ( ) was selected to spawn the next generation of trajectories. All selected trajectories (thick red lines) were appended to create the final trajectory. For both sets of simulations, we implemented a constant protein concentration algorithm and a path sampling algorithm. In the former, we intermittently identified CA/SP1 that have oligomerized with a hexameric CA/SP1 seed, which served as a nucleator. All remaining CA/SP1 were randomly partitioned into "active" and "inactive" monomers, similar to the ultra-coarsegraining strategy used in prior mature CA assembly simulations. 16, 48 Assuming an excess of initial CA/SP1 is provided, this strategy ensures that a constant concentration of CA/SP1 is available for assembly. To aid assembly path sampling, we discretized the assembly process into sequential segments of length Δ !" CG time steps. For each segment, 10 parallel CG MD simulations following randomized Langevin dynamics were propagated. Cluster sizes were computed at the final configuration of each parallel run, and the probability distribution of cluster sizes ( ( )) was approximated as a log-normal distribution. The trajectory yielding a cluster closest to the mean of ( ) was used to initialize the next set of parallel trajectories; the selected trajectory from each chuck was concatenated to create the final assembly trajectory, a strategy conceptually similar to parallel cascade MD. 49 Since CA/SP1 assembly is a rare-event process, our strategy facilitates cluster formation and extracts trajectories that represent the mean kinetics predicted by the underlying CG model. Complete details on the CG MD simulations are provided in Methods. To quantify free energies of monomer-monomer association, we performed 2D welltempered metadynamics (WTMetaD) simulations. [50] [51] Each system contained two monomers that were biased against two collective variables (CVs): the distances between the two CACTD centroids and the two CANTD centroids; when included, IP6 was restrained to be proximal to the K290 and K359 residues of one monomer. To remove volumetric entropic effects, the free energies were shifted by the Jacobian correction factor # ln $ . 52-53 All reported free energy surfaces (FESs) were averaged over three independent replicas after alignment to the free energy at large distances. Additional details on the WTMetaD simulations are provided in Methods. In vitro experiments have shown that the presence of 500 µM IP6 induces CA/SP1 (also 500 µM) assembly into spherical VLPs at pH 6 and physiological salt conditions, while the absence of IP6 induces amorphous protein aggregates. Under high salt conditions, CA/SP1 instead assembles into tubules. Mutating K290 and K359 to alanine was previously shown to yield VLPs, suggesting that electrostatic repulsion of the lysine ring inhibits assembly, which may be recovered by salt-induced electrostatic screening. Here, we use CG MD simulations to investigate the differences between IP6-induced and salt-screening-induced CA/SP1 assembly under similar in vitro conditions. We simulated 850 µM CA/SP1 assembly under three different conditions: (i) 150 mM monovalent salt (i.e., low salt), (ii) 400 mM monovalent salt (i.e., high salt), and (iii) 150 mM monovalent salt with 425 µM IP6; in all three cases, 50-nt RNA-representing polymers were also added to the system. Our results are summarized in Fig. 2 . The time-series profiles in Fig. 2A show that while CA/SP1 fails to assemble at low-salt conditions, both high-salt and low-salt with IP6 conditions lead to productive protein assembly. We note that under our simulated salt and IP6 concentrations, the presence of IP6 (elevated salt) yields a lattice consisting of 999±89 (534±27) CA/SP1 after 1500´10 6 tCG, suggesting that IP6 accelerates CA/SP1 by nearly two-fold compared to high-salt conditions. Beyond accelerated kinetics, the presence of IP6 appears to shift the morphological outcome of CA/SP1 assembly. As depicted in Fig. 2B , CA/SP1 assembles into a spherical VLP under low-salt with IP6 conditions. Here, the particle consists of fissure defects that are characteristic of immature HIV-1 virions as seen in prior cryo-ET lattice maps. 3, 8, 54 Alternatively, CA/SP1 assembles into a curved and contiguous hexameric lattice under high-salt conditions, as evident from Fig. 2C . Here, each protomer appears to maximize its inter-hexameric and intra-hexameric contacts, suggesting that the assembled lattice is representative of an energetically minimized state. In contrast, the VLP formed under the presence of IP6 is likely a kinetically trapped state that results from enhanced assembly kinetics. Prior assembly simulations have shown that similar defects emerge when protomer association energetics are enhanced 11 , suggesting that IP6 interactions induce a comparable effect. To quantify thermodynamic differences between protomer association under the aforementioned conditions, we performed WTMetaD simulations. We compare our computed FESs in Fig. 2D , which are projected onto the distances between CTD-CTD and NTD-NTD centersof-mass. The low-salt FES serves as a baseline to characterize how the underlying FES is altered under high-salt and low-salt with IP6 conditions; recall that 850 µM CA/SP1 does not spontaneously assemble under these conditions (see Fig. 2A ). All three cases exhibit qualitatively similar minimum free energy paths (MFEPs), in which association begins at the CTD-CTD interface, which is followed by additional association at both CTD-CTD and NTD-NTD interfaces. However, there are several key differences to note. For example, CTD-CTD association is facilitated under high-salt conditions, as demonstrated by the broadening of the FES towards smaller CTD-CTD distances when the NTD-NTD distance is below 4.5 nm. The shapes of the free energy minimum where the CTD-CTD (NTD-NTD) distance is around 2.5 (3.3) nm are also distinct. This local minimum is comparatively shallow in the absence of IP6, whereas the minimum is both deepened and broadened when IP6 is present. With IP6 present, the free energy barriers for CA/SP1 association and dissociation are 5.7 ± 0.6 and 1.5 ± 0.8 kcal/mol, respectively. The association barrier is notably lower than that of both low-salt (10.6 ± 0.1 kcal/mol) and high-salt (9.6 ± 0.2 kcal/mol) conditions, while the dissociation barriers (1.9 ± 0.3 and 0.8 ± 0.4 kcal/mol for low-salt and high-salt, respectively) are comparable across all three cases. In other words, rather than stabilizing the bound state, i.e., increasing the dissociation barrier, IP6 appears to promote CA/SP1 association by reducing the association barrier, in part by promoting colocalization up to CTD-CTD and NTD-NTD distances of 3.5 and 4.5 nm, respectively. We should note, however, that the above calculations only pertain to monomer-monomer association, while oligomer-monomer association may include additional collective effects. Nonetheless, these calculations serve as a minimal basis to identify and compare the impact of electrostatic screening and IP6 interactions on CA/SP1 assembly. In our prior CG MD simulations of CA/SP1 assembly on a membrane, we had found that initial protein assembly first required a perturbation or "puncta" to the local membrane curvature 11 as the assembling immature Gag without it could not sufficiently drive the membrane bending towards a budding event. Interestingly, prior experimental studies have shown that the number of virions released from cells is reduced by around 60-90% when IP6-producing enzymes are knocked out without loss of integrity to the produced virion. 43 We therefore hypothesize that the effect of IP6 on the assembling Gag may induce the necessary membrane curvature generation (and therefore, force) necessary for productive immature lattice assembly and budding. We thus simulated CA/SP1 assembly with RNA at a lipid membrane interface (30 kBT bending rigidity) under the same three conditions as above: (i) 150 mM monovalent salt (i.e. low salt), (ii) 400 mM monovalent salt (i.e. high salt), and (iii) 150 mM monovalent salt with IP6 (using a 1:1 CA/SP1:IP6 stoichiometric ratio); here, a 9000-nt RNA-representing polymer was added to the system. To reduce the lag time arising from the nucleation event, we ran short simulations until a small hexamer was formed, which we used to seed the rest of the lattice. We maintained a constant surface coverage of 2380 #/µm 2 for CA/SP1 protein. We summarize assembly time-series statistics in Fig. 3A . At our simulated protein concentrations, we find that CA/SP1 is unable to assemble at 150 mM salt; as discussed later, assembly under these salt concentrations is possible with elevated protein expression levels. In contrast, both 400 mM salt and 150 mM salt with IP6 conditions results in immature lattice growth. However, in the case of 150 mM salt with IP6, it is notable that the lattice grows to one containing 220±15 protomers after 400´10 6 CG time steps, while the 400 mM salt case grows to one containing 90±20 protomers within the same simulation time. In other words, the presence of IP6 at 150 mM salt accelerates assembly nearly 2.5-fold compared to that of elevated salt at 400 mM concentrations (i.e. increased electrostatic screening). As seen in Fig. 3B , accelerated assembly kinetics under the presence of IP6 also appears to incorporate fissure-like defects throughout the lattice, which correspondingly results in greater curvature generation. In comparison, as seen in Fig. 3C , the lattice assembled under 400 mM salt remains uniform, contiguous, and flatter in comparison to the IP6-containing lattice. Although these assembly rates are slower than those computed for the in vitro system, we note that the two systems should not be directly compared due to differences in effective protein concentration and dimensional reduction. In Fig. 3D , we quantify the 2D FES for CA/SP1 association when diffusion is restricted to the lateral xy plane. Similar to our prior analysis under in vitro conditions, we investigate the changes to the FES under high-salt and low-salt with IP6 conditions in comparison to low-salt conditions, under which extended assembly is not observed. Under high-salt conditions, the shape of the close contact free energy minimum located when the NTD-NTD and CTD-CTD distances are around 3.2 and 2.4 nm, respectively, is qualitatively similar to that of low-salt conditions. Interestingly, the free energy barriers for dissociation under low-salt and high-salt conditions are comparable (3.2 ± 0.6 and 4.1 ± 0.4 kcal/mol, respectively) while the barriers for association are slightly lowered in the latter case (7.9 ± 0.4 and 7.1 ± 0.3 kcal/mol, respectively, with p = 0.04 according to the unpaired t-test). Although the position of the free energy minimum is conserved under low-salt with IP6 conditions, the shape of this minimum is widened to favor an alternative CTD-CTD association path, while the barriers for dissociation and association are 2.0 ± 0.6 and 4.1 ± 0.4 kcal/mol, respectively. Hence, while the presence of IP6 yields a comparable dissociation barrier to that of the low-salt case, the association barrier is notably reduced by 3.8 ± 0.5 kcal/mol. It is also interesting to note that the association barriers across the three cases are consistently lower than that of the above in vitro system, while the dissociation barriers are consistently larger. Both effects are likely due to dimensional reduction, which effectively reduces the number of accessible transition state configurations. To further quantify CA/SP1 assembly kinetics, we performed several shorter simulations (10´10 6 CG MD timesteps) at varying protein surface coverage levels. While our previous simulations predict that CA/SP1 does not assemble under low-salt conditions at 2380 #/µm 2 protein surface coverage, assembly may proceed at elevated concentrations. In particular, we depict the mean CA/SP1 assembly rate as a function of protein surface coverage and assembly conditions in Fig. 4A . At low surface coverage (£ 2500 µm -2 ), both high-salt and low-salt with IP6 conditions promote assembly while low-salt conditions do not. At high surface coverage (³ 2500 µm -2 ), all three conditions result in productive assembly, with the largest rates achieved under low-salt with IP6 conditions, followed by high-salt and low-salt conditions. Assuming power law rate kinetics, we find that all three assembly conditions follow sublinear scaling with respect to protein surface coverage, in which low-salt, high-salt, and low-salt with IP6 conditions have scaling law orders of 0.63, 0.66, and 0.74, respectively. At this point, it is not clear why sublinear reaction orders emerge. The impact of RNA as a substrate for CA/SP1 co-localization or the assembly of CA/SP1 into small intermediates prior to lattice growth, [55] [56] [57] [58] including the basic trimer-of-dimer unit that has been implicated for both mature and immature lattice assembly, 11, 48, 59 are two factors that are likely to contribute. In the latter case, a sublinear reaction order may indicate that competing intermediates, that is, intermediates that are not productive for lattice assembly, increasingly form at elevated protein concentrations. Nonetheless, it is clear that the addition of IP6 and increased electrostatic screening due to increased salt consistently increase assembly rates compared to physiological salt conditions; low-salt with IP6 and high-salt conditions represent a 3-fold and 2-fold increase, respectively, in effective protein concentration compared to that of low-salt conditions. We further assess the morphological characteristics of the assembled lattices under varying protein surface coverages and assembly conditions. In Fig. 4B and Fig. 4C , we depict the rate of formation for complete and defective (i.e., incomplete) hexamers, respectively, within the assembled lattice. We find that high-salt conditions consistently yield the largest rate for hexamer growth while low-salt with IP6 conditions yield the lowest. Intriguingly, both high-salt and low-salt conditions accumulate defective hexamers at rates comparable to that of their hexamer growth rates, although reduced by around 30-50%. The inclusion of IP6, on the other hand, dramatically increases the rate of defect growth by 6-fold compared to the rate of hexamer growth. It is therefore evident that while defects are incorporated into the lattice under all three assembly conditions, and more so at elevated protein concentrations, the rate of defect incorporation is substantially increased due to the presence of IP6. We have developed a bottom-up (i.e., derived from atomistic simulation data), lowresolution, and implicit-solvent CG model for HIV-1 CA/SP1 and IP6 in order to study their assembly behavior on length-and time-scales relevant to the formation of viral particles. Despite the simplicity of the model, our CGMD simulations recapitulate key experimental observations, including the selective inducement of spherical VLPs under the presence of IP6 42, 46 that are morphologically consistent with immature lattices observed in vivo. 3, 8 Furthermore, we have investigated the sensitivity of CA/SP1 assembly kinetics and structural outcomes to salt conditions and the presence of IP6. Thermodynamically-driven self-assembly often favors a self-regulated nucleation and growth mechanism that ensures error correction and tends to consist of weak multivalent interactions between specific protomer-protomer interfaces resulting in the formation of structures that maximize these contacts. [60] [61] Assembly can also proceed along kinetically-trapped pathways that may yield morphologies alternative to the thermodynamically preferred structure. Our simulations suggest that Gag tubule formation, as seen under select in vitro conditions 44 or with single point mutations, 62 is slow and proceeds through disassembly/reassembly, a known error correction mechanism, 63 whereas spherical particle formation is kinetically-driven, in which lattice defects represent kinetically-trapped states. In the latter case, the lattice grows too quickly for error correction by local remodeling, producing defects that subsequently facilitate curvature throughout the lattice. Thus, we propose that tubules are a near-equilibrium assembly product while VLPs are a kinetically-trapped product. Relatedly, we find that IP6 promotes kineticallytrapped morphologies under conditions where CA/SP1 would not otherwise assemble by lowering the free energy barrier for association rather than stabilizing the associated state, which accelerates CA/SP1 assembly. The robust creation of immature lattice defects by IP6 may serve additional purposes beyond simple curvature generation. The ability to transform spherical particles into coneshaped cores via Gag proteolysis is an intriguing property of HIV-1 maturation. 1 During this process, viral protease (PR) cleaves Gag precursors at five different sites, in which cleavage of the site within the CA/SP1 6HB is the rate limiting step. 1, 64-65 Our prior cryo-ET and MD study showed that lattice defects facilitate uncoiling of the CA/SP1 partial helical bundle, which is crucial for protease access to the CA/SP1 cleavage site. 54 By inducing fissure-like defects via a kineticallydriven process, IP6 may facilitate the emergence of cleavage initiation sites during maturation. Recent studies of Gag multimerization at the plasma membrane interface suggest that the extent of membrane deformation, 11 and relatedly lipid phase segregation, 66 are coupled to the dynamics of Gag assembly. On the basis of our membrane-bound simulations, we propose that the free energy penalty for lattice growth due to membrane deformation resistance is mitigated by the IP6-induced lattice defects, which serve to relieve strain from lattice curvature. In addition, our free energy calculations indicate that IP6 lowers the free energy of the protomerprotomer bound state by 2.6 ± 0.5 kcal/mol compared to the case when IP6 is absent, suggesting that IP6 lowers the free energy of formation for the lattice. Both factors may explain why IP6 appears to be essential for efficient virion production, 43 although additional work is necessary to explain why other inositol phosphate molecules, such as inositol pentakisphosphate (IP5), and related polyanions are not as effective. In the absence of IP6, CA/SP1 constructs have been shown to assemble into mature tubes in vitro 42, 46 while our simulations produce relatively flatter and contiguous immature lattices. This observation underscores a limitation of the present CG model, which was trained using immature state statistics, and therefore restricted to the immature state. The mature state requires a structural transition from the immature state, which involves the uncoiling of the CA/SP1 helical junction and a hinge-like motion such that CANTD forms the primary hexameric pore. 24-25, 65, 67 Incorporating this structural transition into future CG models is likely to be nontrivial, but may be aided by recent methodological efforts to recapitulate conformational transitions within CG space. 68 Given that the presence of IP6 shifts mature CA assembly outcomes towards conical cores instead of tubules, 42 we speculate that mature CA assembles into tubules following a near-equilibrium pathway while the presence of IP6 induces kinetically-driven assembly into mature cores. Nonetheless, the present work highlights the utility and importance of kinetically-driven assembly in the context of immature HIV-1 viral particle production, although additional studies are needed to understand the connection between the fissure-like defects, the kinetics of PR activity throughout the immature lattice, and mature CA assembly during the final stages of maturation. In conclusion, our findings demonstrate the critical role of IP6 as both an assembly accelerant and morphological modulator of the immature lattice during the late stages of HIV-1 viral morphogenesis. Beyond HIV-1, IP6 has also been implicated as an assembly cofactor in other retroviruses, such as Rous sarcoma virus and equine infectious anemia virus. [69] [70] It would be interesting to investigate the utility of IP6 and its derivatives for the efficient production of broad VLPs that have been repurposed for biomedical technologies, including vaccine development and biologics delivery. [71] [72] [73] [74] Our results will hopefully stimulate future experimental studies as well as the use and development of even more sophisticated CG models. All-Atom (AA) MD simulations. We used the 18-mer atomic model of CA/SP1 from Ref. 20 (PDB 5L93) as our initial protein structure. Seven IP6 molecules (each phosphate was deprotonated) were placed between the seven K290/K359 double lysine rings using Monte Carlo insertion. The protein-cofactor complex was solvated by 163010 water molecules, 646 Na + ions, and 508 Clions (i.e. 150 mM NaCl) in a dodecahedron box with 1.5 nm of space between the edges of the protein and the box. All simulations used the CHARMM36m force field 75 and were performed using GROMACS 2016. 76 Minimization was performed using steepest descent until the maximum force was reduced to 500 kJ/mol/nm. Then, equilibration was performed in several phases. First, 10 ns were integrated in the constant NVT ensemble at 310 K using the stochastic velocity rescaling thermostat 77 with a damping time of 0.1 ps and a timestep of 2 fs. During this phase, the heavy atoms of the protein were harmonically restrained with a force constant of 1000 kJ/mol/nm 2 . An additional 10 ns were integrated in the constant NPT ensemble using the Nose-Hoover chain thermostat 78 at 310 K (2 ps damping time) and the Parrinello-Rahman barostat 79 at 1 bar (10 ps damping time) and a timestep of 2 fs; the restraints on the protein were maintained. Finally, an additional 530 ns were integrated in the constant NVT ensemble using the Nose-Hoover chain thermostat 78 at 310 K (2 ps damping time) and a timestep of 2 fs. Throughout this procedure, Hcontaining bonds were constrained using the LINCS algorithm. 80 Protein and cofactor configurations were gathered every 200 ps. The same process was repeated to generate a second trajectory without IP6 molecules. The final 500 ns of each trajectory were used for CG model generation. Using the generated AA MD trajectories described above, we derived a bottom-up model of CA/SP1 and IP6. To map the CA/SP1 protomer, we used the Essential Dynamics Coarse-Graining (EDCG) method 81 to identify residue groupings that maximized overlap of the CG model motions with the AA collective motions. We specified 35 CG sites to balance computational efficiency and least-squared error in the represented principal component subspace. We mapped the IP6 molecule into a single CG site using a linear center-ofmass mapping. After CG mapping, effective CG interactions ( !" ) were determined using the following four terms: Our CG model initially assumes that all close contacts between protomers contribute to protein association through two-body attractive interactions, i.e., -../ . We identified relevant CG site pairs on the basis of their pair correlation functions. If the first peak of the pair correlation function was located below 1.75 nm, and the standard deviation less than 0.15 nm, that pair of CG sites was selected. We represented -../ as a pairwise Gaussian potential, 7 8√$0 exp D− where r0 and s are the mean and standard deviation determined by a fit to the first peak of the pair correlation function between CG sites i and j through least-squares regression. The constant H was optimized for each pair interaction using relative-entropy minimization (REM). 85 During this process, some of the parameters exhibited diverging behavior or yielded repulsive interactions. Interactions with diverging behavior were removed from the CG model while the remaining parameters were reoptimized, although only minor changes were observed. We used the same interaction parameters for the RNA polymer as in our prior CG study of HIV-1 Gag assembly 11 and for the CG lipid as in our prior CG study of the SARS-CoV-2 virion. 86 CG assembly simulations. All simulations were prepared using PACKMOL and Moltemplate. [87] [88] For the in vitro simulations, 3012 CA/SP1 monomers, 200 50-nt RNA polymers, and 1500 IP6 molecules (when relevant) were randomly distributed throughout a 180´180´180 nm 3 cubic box; two replicas were simulated. For the membrane simulations, a CG lipid bilayer was prepared using a grid distribution of 152352 lipids (76176 per monolayer) in a 450´450 nm 2 lateral (xy) domain with an initial spacing of 4 nm between the two monolayer head groups in the z direction. An additional 4090 CA/SP1 monomers were uniformly distributed along a 2D grid 1.5 nm below the CG membrane followed by the 9000-nt RNA polymer (3 nm below the layer of CA/SP1) and a random distribution of 4096 IP6 molecules (3 nm below the RNA polymer). Due to computational cost, only one replica was simulated. All CG MD simulations were performed using LAMMPS 18Jun2019. 89 Conjugate gradient energy minimization was performed on each system until the change in force was less than 10 -6 . All systems were initially equilibrated using a Langevin thermostat 90 at 310 K (5 ps damping constant) over 5´10 6 tCG (with tCG = 50 fs) with attractive protein-protein interactions turned off to remove bias due to initial protein distributions. For the membrane simulations, a Berendsen barostat 91 at 0 atm (10 ps damping constant, applied over the lateral xy dimension) was also applied; during this process, the xy simulation domain expanded to 454´454 nm 2 , which remained fixed for the remainder of the simulation. We then integrated all CG molecules using a Langevin thermostat at 310 K (50 ps damping constant). We simultaneously integrated 10 trajectories over 50´10 6 tCG with tCG = 50 fs; during these simulations, we maintained an 85% and 15% active population of CA/SP1 which were randomly selected every 0.5´10 6 tCG for the in vitro and membrane-bound simulations, respectively. Trajectory snapshots were saved every 5´10 6 tCG. At the end of each set of 10 CGMD simulations, the sizes of the largest assembled CA/SP1 cluster in each trajectory was computed and their distribution fit to a log-normal function. The variance of the distribution was used to compute the standard error of the mean cluster size. The trajectory closest to the mean cluster size was used to initiate the next set of 10 simultaneous trajectories. CG metadynamics simulations. Free energy profiles were computed using the well-tempered metadynamics method 51 via LAMMPS interfaced with the PLUMED 2.4 plugin. 92 We used the same CGMD conditions as above. To emulate membrane-bound effects, we used two planar indenter forces ( 5=>%=. ( ) = ( − ? ) $ ) to restrict the z-diffusion of the NTD domain; here, = 1 kcal/mol/Å 2 . Gaussian biases with a height of 0.1 kcal/mol and width of 0.05 nm were deposited every 1000 tCG using a bias factor of 2 kBT. The biases were deposited along two collective variables (CVs): (i) the distance between the centers-of-mass of the CTD and (ii) that of the NTD. A harmonic restraint was applied with a force constant of 1000 kcal/mol/Å to prevent the two CVs from exceeding 8 nm. CG analysis and visualization. Assembled clusters were identified using two proximity criteria. Inter-hexameric contacts were defined by the CTD dimer interface, i.e. when the distance between CG types 25/26 were less than 1.8 nm. Intra-hexameric contacts were defined by the top of the CA/SP1 6HB, i.e. when the distance between CG type 32 was less than 1.8 nm. For each trajectory snapshot, an adjacency graph was constructed and extracted using the NetworkX 2.1 (http://networkx.github.io/) Python package. Extracted clusters were visualized using VMD 1.9.3. 93 HIV-1 assembly, budding, and maturation. Cold Spring Harb Perspect Med Maturation of retroviruses Structure and assembly of immature HIV Human immunodeficiency virus type 1 nucleocapsid p1 confers ESCRT pathway dependence Dimerization of the SP1 Region of HIV-1 Gag Induces a Helical Conformation and Association into Helical Bundles: Implications for Particle Assembly On the role of the SP1 domain in HIV-1 particle assembly: a molecular switch Electron cryotomography of immature HIV-1 virions reveals the structure of the CA and SP1 Gag shells Cryo electron tomography of native HIV-1 budding sites Timing of ESCRT-III protein recruitment and membrane scission during HIV-1 assembly Three-dimensional analysis of budding sites and released virus suggests a revised model for HIV-1 morphogenesis Immature HIV-1 lattice assembly dynamics are regulated by scaffolding from nucleic acid and the plasma membrane Properties and functions of the nucleocapsid protein in virus assembly Roles of Gag-RNA interactions in HIV-1 virus assembly deciphered by single-molecule localization microscopy Small-molecule inhibition of human immunodeficiency virus type 1 infection by virus capsid destabilization Structural and mechanistic bases for a potent HIV-1 capsid inhibitor Off-Pathway Assembly: A Broad-Spectrum Mechanism of Action for Drugs That Undermine Controlled HIV-1 Viral Capsid Formation Structure of the HIV-1 full-length capsid protein in a conformationally trapped unassembled state induced by small-molecule binding Crystal structure of dimeric HIV-1 capsid protein Structure of the carboxyl-terminal dimerization domain of the HIV-1 capsid protein An atomic model of HIV-1 capsid-SP1 reveals structures regulating assembly and maturation Crystal structure of an HIV assembly and maturation switch Structure of the immature HIV-1 capsid in intact virus particles at 8.8 Å resolution Mature HIV-1 capsid structure by cryo-electron microscopy and all-atom molecular dynamics X-ray crystal structures of native HIV-1 capsid protein reveal conformational variability X-ray structures of the hexameric building block of the HIV capsid Efficient particle production by minimal Gag constructs which retain the carboxy-terminal domain of human immunodeficiency virus type 1 capsid-p2 and a late assembly domain Mutation of the SP1 sequence impairs both multimerization and membrane-binding activities of human immunodeficiency virus type 1 Gag In vitro assembly properties of purified bacterially expressed capsid proteins of human immunodeficiency virus Role of RNA in facilitating Gag/Gag-Pol interaction RNA is a structural element in retrovirus particles Distinct binding interactions of HIV-1 Gag to Psi and non-Psi RNAs: implications for viral genomic RNA packaging Identification of the initial nucleocapsid recognition element in the HIV-1 RNA packaging signal Membrane interaction of retroviral Gag proteins Lipid domains in HIV-1 assembly Binding mechanism of the matrix domain of HIV-1 gag on lipid membranes Structural and molecular determinants of HIV-1 Gag binding to the plasma membrane Phosphatidylinositol (4,5) bisphosphate regulates HIV-1 gag targeting to the plasma membrane Modulation of HIV-like particle assembly in vitro by inositol phosphates Self-Assembly in-Vitro of Purified Ca-Nc Proteins from Rous-Sarcoma Virus and Human-Immunodeficiency-Virus Type-1 Do mammals make all their own inositol hexakisphosphate? Complex changes in cellular inositol phosphate complement accompany transit through the cell cycle Inositol phosphates are assembly co-factors for HIV-1 Cellular IP6 Levels Limit HIV Production while Viruses that Cannot Efficiently Package IP6 Are Attenuated for Infection and Replication A conformational switch controlling HIV-1 morphogenesis Proteolytic refolding of the HIV-1 capsid protein amino-terminus facilitates viral core assembly In Vitro Quantification of the Effects of IP6 and Other Small Polyanions on Immature HIV-1 Particle Assembly and Core Stability Biochemical Reconstitution of HIV-1 Assembly and Maturation Coarse-grained simulation reveals key features of HIV-1 capsid selfassembly Parallel Cascade Selection Molecular Dynamics (PaCS-MD) to generate conformational transition pathway Well-tempered metadynamics converges asymptotically Well-tempered metadynamics: a smoothly converging and tunable free-energy method The Jacobian factor in free energy simulations Free energy from constrained molecular dynamics Immature HIV-1 assembles from Gag dimers leaving partial hexamers at lattice edges as potential substrates for proteolytic maturation Assembly of viruses and the pseudo-law of mass action Distinguishing reversible from irreversible virus capsid assembly Viral self-assembly as a thermodynamic process Dynamic pathways for viral capsid assembly A Trimer of Dimers Is the Basic Building Block for Human Immunodeficiency Virus-1 Capsid Assembly Enzyme-assisted self-assembly under thermodynamic control The statistical mechanics of dynamic pathways to self-assembly Cryo-electron microscopy of tubular arrays of HIV-1 Gag resolves structures essential for immature virus assembly A two-pronged structural analysis of retroviral maturation indicates that core formation proceeds by a disassembly-reassembly pathway rather than a displacive transition Processing sites in the human immunodeficiency virus type 1 (HIV-1) Gag-Pro-Pol precursor are cleaved by the viral protease at different rates High-resolution structures of HIV-1 Gag cleavage mutants determine structural switch for virus maturation A lipid-based partitioning mechanism for selective incorporation of proteins into membranes of HIV particles Quenching protein dynamics interferes with HIV capsid maturation Multiconfigurational Coarse-Grained Molecular Dynamics Structures of immature EIAV Gag lattices reveal a conserved role for IP6 in lentivirus assembly Structure of the mature Rous sarcoma virus lattice reveals a role for IP6 in the formation of the capsid hexamer Virus-Like Particle Engineering: From Rational Design to Versatile Applications Engineering virus-like particles as vaccine platforms. Current Opinion in Virology Engineering Virus-like Particles for Antigen and Drug Delivery Engineered virus-like particles for efficient in vivo delivery of therapeutic proteins CHARMM36m: an improved force field for folded and intrinsically disordered proteins GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers Isothermal-isobaric molecular dynamics using stochastic velocity rescaling Nosé-Hoover chains: The canonical ensemble via continuous dynamics Crystal Structure and Pair Potentials: A Molecular-Dynamics Study LINCS: A linear constraint solver for molecular simulations A Systematic Methodology for Defining Coarse-Grained Sites in Large Biomolecules Systematic Multiscale Parameterization of Heterogeneous Elastic Network Models of Proteins On the Interaction of Elementary Particles. I. Proceedings of the Physico On the Dielectric "Constant" of Proteins: Smooth Dielectric Function for Macromolecular Modeling and Its Implementation in DelPhi The relative entropy is fundamental to multiscale and inverse thermodynamic problems A multiscale coarse-grained model of the SARS-CoV-2 virion PACKMOL: A package for building initial configurations for molecular dynamics simulations Moltemplate a Coarse-Grained Model Assembly Tool Fast Parallel Algorithms for Short-Range Molecular Dynamics Second-order integrators for Langevin equations with holonomic constraints Molecular dynamics with coupling to an external bath PLUMED 2: New feathers for an old bird VMD: visual molecular dynamics Frontera: The Evolution of Leadership Computing at the National Science Foundation, Practice and Experience in Advanced Research Computing (PEARC '20) This research was supported by the National Institute of Allergy and Infectious Diseases (NIAID) of the National Institutes of Health under grant R01 AI154092. A.J.P gratefully acknowledges support from the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under grant F32 AI150477. The authors also acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing HPC and visualization resources that have contributed to the research results reported within this paper. 94 This work also used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by the National Science Foundation grant number ACI-1548562. 95 Model parameters, simulation input files, and custom code are available on Github at: https://github.com/uchicago-voth/MSCG-models/tree/master/HIV_CASP1 . The input files and data generated from this study are accessible from Zenodo: https://zenodo.org/record/6335601.