key: cord-0811840-sr9srqk0 authors: Huber, Roland G; Marzinek, Jan K; LS Boon, Priscilla; Yue, Wan; Bond, Peter J title: Computational modelling of flavivirus dynamics: the ins and outs date: 2020-06-08 journal: Methods DOI: 10.1016/j.ymeth.2020.06.004 sha: f81d830b7ce6952c7300b526af4e0d34d9818bb3 doc_id: 811840 cord_uid: sr9srqk0 Enveloped viruses such as the flaviviruses represent a significant burden to human health around the world, with hundreds of millions of people each year affected by dengue alone. In an effort to improve our understanding of the molecular basis for the infective mechanisms of these viruses, extensive computational modelling approaches have been applied to elucidate their conformational dynamics. Multiscale protocols have been developed to simulate flavivirus envelopes in close accordance with biophysical data, in particular derived from cryo-electron microscopy, enabling high-resolution refinement of their structures and elucidation of the conformational changes associated with adaptation both to host environments and to immunological factors such as antibodies. Likewise, integrative modelling efforts combining data from biophysical experiments and from genome sequencing with chemical modification are providing unparalleled insights into the architecture of the previously unresolved nucleocapsid complex. Collectively, this work provides the basis for the future rational design of new antiviral therapeutics and vaccine development strategies targeting enveloped viruses. Viral diseases continue to impose a severe burden on human health and economy. Despite recent advances in the medical and pharmacological sciences, viral infections continue to cause millions of human deaths every year, and also have a major impact on livestock and agricultural productivity. Furthermore, viral pandemics such as HIV 1 pose a continuing global risk to public health. 
There is a constant threat of emerging viral pathogens which can cause epidemics, as exemplified by the 2015 outbreak of Zika virus (ZIKV) and its association with birth defects and neurological issues. Most recently, in late 2019, the first reports of an unknown respiratory infection emerged from Wuhan, China. The source of that infection was a novel coronavirus, related to those that had previously caused outbreaks of Severe Acute Respiratory Syndrome (SARS-CoV) and Middle East Respiratory Syndrome (MERS-CoV). Since the emergence of the illness resulting from the new SARS-CoV-2 virus, COVID-19, millions of infections and hundreds of thousands of deaths have been reported worldwide 2 . Both ZIKV and SARS-CoV-2 led the World Health Organization (WHO) to declare a Public Health Emergency of International Concern. Flaviviruses are typically transmitted by mosquitoes or ticks, and encompass around 50 known pathogens that cause human disease. Dengue (DENV) is the archetypal member, infecting up to ~400 million people per year worldwide 3 . All flaviviruses adopt a similar structure, composed of only three structural proteins, namely the envelope (E) and membrane (M) proteins, which are embedded within a lipid bilayer vesicle and arranged in an icosahedral fashion, and the capsid (C) proteins, which form a complex with the single-stranded, positive-sense RNA genome inside the virion core. 180 copies of the E and M proteins are arranged with icosahedral symmetry on the surface of the virion particle 4 . In the mature particle, these are found as three sets of parallel dimers or "rafts" which are arranged in a herringbone-like pattern. The E protein is exposed on the virion surface, and is responsible both for binding to cell receptors during infection and for fusion with endosomal membranes inside the cell, enabling genome release 5 . 
The discovery of new antiviral drugs is hampered by numerous challenges including the integration of the viral life cycle with that of the host, potential development of resistance, and the ongoing threat of newly arising pandemics. Vaccination is a potentially effective strategy to combat viral disease, but can be complicated by pathogen-specific factors such as antibody-dependent enhancement (ADE) 6 in which non-neutralizing antibodies actually facilitate viral infectivity. This is a particularly relevant issue for DENV, which exists as four distinct serotypes, and a recently available vaccine has been reported to worsen the outcome of patients who have not previously been infected 7 . To counter the challenges in viral disease therapy, research efforts continue to focus on better understanding the molecular mechanisms associated with infection. These have been facilitated by advances in structural biology, including the ever increasing resolutions attainable by cryo-electron microscopy (cryo-EM) 8 for whole virion particles, particularly when resultant density maps are combined judiciously with available X-ray crystallographic or solution NMR structures of individual viral proteins. Furthermore, the molecular dynamics (MD) simulation technique has now come of age. With the ever increasing computational power available and specialized hardware, along with appropriate multiscale approaches that enable accurate but coarse-grained (CG) representations of systems or components thereof, it is now possible to simulate entire viruses, and to use such simulations to refine integrative models incorporating experimental data 9 . As described in the following sections, such computational models have enabled the description of the conformational dynamics of flaviviruses during progression along their life cycle in unparalleled detail, as well as furthering our understanding of their interactions with host components such as antibodies. 
In addition, the architecture of the flavivirus nucleocapsid complex containing the RNA genome is not well understood, because it appears not to be organized in a clear geometric pattern amenable to cryo-EM, and because a significant component of the C protein is intrinsically disordered and hence challenging to study via traditional approaches, depending on conditions and maturation state 10 . Thus, we also describe our recent efforts to integrate biophysical experiments with simulation to characterize the C protein conformational ensemble, and to use high-throughput sequencing (HTS) with chemical modification techniques in combination with modelling to predict genomic RNA structures. Flavivirus infection is triggered through receptor mediated endocytosis, when the virus is internalized inside an endosome. The acidic endosomal pH triggers the E proteins to undergo a major structural rearrangement and a dimer-to-trimer transition. Through this process, hydrophobic fusion peptides (FPs), which are positioned at the tip of each trimeric E protein "spike", become exposed. This allows them to interact with the endosomal membrane, leading to its fusion with the viral envelope 11 . We and others have reported efforts to model the flavivirus lipid bilayer, and in particular, we describe in subsequent sections protocols to model the entire envelope particle in accordance with cryo-EM data, and to leverage such models to understand different stages of the viral life cycle. Once the capsid and RNA genome are released into the cell, this leads to polyprotein translation and viral replication. 4, 12, 13 Subsequently, the assembly of the nascent immature dengue virus (immDENV) proceeds via budding of its components into the endoplasmic reticulum (ER). The immDENV is rendered non-infectious due to the complex formed between the prM (precursor of membrane) protein with the E protein in the viral membrane. 
The immDENV envelope is composed of 60 trimeric spikes, in which each monomer consists of an E:prM complex. 14, 15 Under the low pH conditions in the trans-Golgi network (TGN), the maturation process is triggered, involving cleavage by furin protease of pr fragments to form the mature M proteins, and a large-scale icosahedral DENV surface rearrangement with a trimer-to-dimer transition of the E proteins. Nevertheless, the low pH prevents the pr molecules from dissociating from the E protein. 15, 16 This serves to prevent the virus particle from fusing within the cell prior to release, as the E protein FP regions are "capped" by the pr fragments. Upon release of the virus from the cell, the neutral pH of the extracellular environment causes dissociation of pr molecules, thus completing the maturation process. Maturation is inefficient and a substantial number of DENV particles released from the cell are either partially mature or immature. 17 However, when such particles are complexed with specific antibodies, this can lead to ADE, rendering them infectious by facilitating the endosomal internalization of the immDENV:antibody complex. In this context, although furin can cleave the pr:M complex from the antibody-bound virion inside the endosome 18 , it has remained unclear how pr dissociates in the acidic pH environment. Thus, we also describe below our recent use of simulations of the immature viral envelope in combination with cryo-EM, coflotation assays, and hydrogen deuterium exchange mass spectrometry (HDX-MS) to rationalize an example of ADE at the molecular level. It is also emerging that flaviviruses undergo major structural and dynamic changes outside of the cell, in response to different environmental triggers. In particular, the envelope is modulated when exposed to higher temperatures that may be associated with fever. Such viral conformational dynamics have been referred to as "breathing". 
[19] [20] [21] [22] The most prominent temperature-mediated transitions have been observed in specific strains of DENV2. The virus envelope has been observed to expand and become "bumpy" at 37°C, in contrast to 28°C where it exhibits a "smooth" envelope surface. 19, 23, 24 Although the existence of such variable morphologies may not be correlated with infectivity 23 , they likely have important consequences for antibody binding and neutralizing, and represent a challenge for therapeutics development. Interestingly, flavivirus "breathing" was thought to be irreversible, but as described below, a combination of multiscale modelling, cryo-EM, single-molecule fluorescence, and HDX-MS has enabled us to observe the process of viral expansion in atomic detail, and to elucidate the mechanism by which divalent cations enable the virion to reverse this process. 25 In the following, we summarize computational and integrative modelling methods, guided by biophysical data, that have yielded molecular insights into the "ins" and "outs" of enveloped flaviviruses. In Section 3, we begin by highlighting areas of ongoing interest in the context of viral envelope dynamics, which have been tackled recently using a combination of multiscale simulation and experiment. We then go on to detail efforts to characterize the virion's nucleocapsid cargo via simulation (Section 4) and RNA sequencing-guided modelling (Section 5), before reflecting upon where future efforts may lie in better understanding enveloped viruses (Section 6). Numerous all-atom and CG simulations of the flavivirus envelope have been reported. For example, several all-atom simulation studies investigated viral envelope conformational changes occurring upon exposure to low pH, both during the fusion and maturation processes. 
[26] [27] [28] [29] [30] [31] [32] [33] [34] The capacity for individual E/M proteins or their complexes to initiate membrane fusion 35, 36, 37, 38, 39, 40 and host membrane bending 41 has also been extensively studied via simulation. Nevertheless, it is important to consider the dynamics of the ~50 nm diameter flavivirus envelope in its entirety. 42, 43 The sheer size of the virus particle, comprising 180 monomeric E/M proteins, represents a significant computational cost in attempting to simulate relevant timescales 42 , leading to the frequent use of the well-established Martini CG force field, in which approximately four heavy atoms map to one CG particle. 44, 45, 46, 47 Furthermore, the flavivirus shell poses additional challenges as a result of the need to model a lipid membrane in which the viral proteins are embedded. In the following sections, we summarize key steps required in our protocol 47 for generating a complete flavivirus envelope model and for multiscale refinement of its structure in direct accordance with cryo-EM data; the refined model may subsequently be leveraged to explore conformational dynamics associated with the viral life cycle. Particle types are approximately transferable across systems in the Martini force field as a result of their being parameterized according to partitioning free energies 48, 49 . On the other hand, several approaches are possible for the treatment of protein conformation, including application of an elastic network model 50 (ENM) between backbone beads, on top of secondary structure dependent dihedrals, to preserve higher order structure. Our experience suggests that the optimal choice of ENM parameters tends to be system specific, and calibration against all-atom reference simulations is desirable. Thus, we first simulated the solvated DENV dimeric E protein ectodomain in atomic resolution for several hundred nanoseconds, and generated equivalent CG trajectories 47 ( Figure 1A ). 
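As an illustration of the elastic network idea described above, the following minimal sketch builds harmonic springs between backbone beads that lie within a distance window in a reference structure. The function names, the cut-off distances, and the force constant are illustrative assumptions, not the calibrated values used for the E protein.

```python
import math

def build_elastic_network(coords, lower=0.5, upper=0.9,
                          force_constant=500.0, min_seq_sep=2):
    """Return harmonic springs (i, j, r0, k) between backbone beads.

    coords: backbone bead positions (x, y, z) in nm, in sequence order.
    lower/upper: cut-off window in nm (illustrative, not calibrated values).
    force_constant: spring constant in kJ mol^-1 nm^-2 (illustrative).
    min_seq_sep: minimum sequence separation; closer beads are assumed
                 to be restrained by the regular bonded terms already.
    """
    springs = []
    for i in range(len(coords)):
        for j in range(i + min_seq_sep, len(coords)):
            r0 = math.dist(coords[i], coords[j])
            if lower <= r0 <= upper:
                springs.append((i, j, r0, force_constant))
    return springs

def network_energy(coords, springs):
    """Total harmonic energy of the network for a given conformation."""
    return sum(0.5 * k * (math.dist(coords[i], coords[j]) - r0) ** 2
               for i, j, r0, k in springs)

# Toy example: five beads on a straight line, 0.35 nm apart.
reference = [(0.35 * i, 0.0, 0.0) for i in range(5)]
springs = build_elastic_network(reference)
print(len(springs), network_energy(reference, springs))
```

Scanning `lower`, `upper`, and `force_constant` and comparing the resulting CG trajectories with all-atom reference data is one way such a calibration could be organised.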
Default Martini parameters along with a range of ENMs were tested, validated by comparing protein conformational properties from all-atom and CG simulations, including backbone root mean square deviations (RMSDs) with respect to the X-ray structure, protein radius of gyration, and pairwise residue contact maps. An ENM with non-standard cut-off distances and force constants was ultimately required to accurately describe the E protein conformational dynamics. 47 The viral lipid bilayer vesicle was subsequently constructed using the CHARMM-GUI 51 Martini-builder, with a diameter chosen in accordance with estimates from cryo-EM density maps. The DENV2 membrane composition included palmitoyl-oleoyl phosphatidylcholine (PC), palmitoyl-oleoyl phosphatidylethanolamine (PE), and palmitoyl-oleoyl phosphatidylserine (PS) in a 6:3:1 ratio, as guided by lipidomics data measured for C6/36 mosquito cells. 52 The complete protein coordinates of 180 E/M proteins representing the entire virion envelope, obtained from the cryo-EM structure, were next converted to CG representation. These were then directly overlaid with the lipid vesicle, and overlapping lipids deleted ( Figure 1B ). To facilitate insertion whilst reducing steric clashes, we employed a "shrinking" strategy for the lipids. This was performed for the coordinates of lipid tails by scaling their atomic positions towards the lipid tail centres of mass in a plane perpendicular to the tail principal axes 47 . This step did not involve any simulation, but was followed by multiple sets of energy minimization whilst "freezing" the coordinates of the protein backbone; this may be performed several times in an iterative manner to obtain a stable starting model of the lipid/protein envelope complex. The model may subsequently be solvated (in a sensibly chosen simulation unit cell, e.g. dodecahedron or truncated octahedron), with energy minimization and protein-restrained equilibrations in the NVT ensemble to relax the system. 
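The lipid tail "shrinking" step described above can be sketched as follows. This is a simplified illustration rather than the published implementation: the function names and the scaling factor are assumptions, and the principal axis is estimated by power iteration on the covariance matrix of the tail coordinates, which presumes an elongated tail with distinct first and last particles.

```python
import math

def centre_of_mass(points):
    """Unweighted centre of a set of (x, y, z) positions."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def principal_axis(points, iterations=50):
    """Unit vector along the longest axis of the tail (power iteration)."""
    com = centre_of_mass(points)
    centred = [tuple(p[k] - com[k] for k in range(3)) for p in points]
    cov = [[sum(c[a] * c[b] for c in centred) for b in range(3)]
           for a in range(3)]
    # Start from the first-to-last vector, usually close to the tail axis.
    v = [points[-1][k] - points[0][k] for k in range(3)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(3)) for a in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def shrink_tail(points, scale=0.5):
    """Scale positions towards the centre of mass, but only within the
    plane perpendicular to the principal axis; the component along the
    axis is left unchanged. scale=0.5 is an illustrative value."""
    com = centre_of_mass(points)
    axis = principal_axis(points)
    shrunk = []
    for p in points:
        d = [p[k] - com[k] for k in range(3)]
        along = sum(d[k] * axis[k] for k in range(3))
        parallel = [along * axis[k] for k in range(3)]
        perp = [d[k] - parallel[k] for k in range(3)]
        shrunk.append(tuple(com[k] + parallel[k] + scale * perp[k]
                            for k in range(3)))
    return shrunk

# Toy tail running along z with small lateral offsets in x (nm).
tail = [(0.1, 0.0, 0.0), (-0.1, 0.0, 0.4), (-0.1, 0.0, 0.8), (0.1, 0.0, 1.2)]
print(shrink_tail(tail))
```

Because only the lateral components are scaled, the tail keeps its length along the membrane normal while its footprint in the bilayer plane is reduced, which is what lowers the chance of steric clashes on insertion.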
We would advise that a short equilibration simulation time step (e.g. ~2 fs) be chosen initially, which can be gradually increased in progressive ~100,000-step equilibration runs. Finally, this may then be followed by NPT ensemble equilibration and unrestrained production runs. The MDAnalysis 53 or GROmaρs 54 tools may be used to extract a grid-based theoretical density map from a single simulation frame, or from numerous averaged frames extracted from progressive equilibration or production runs. Both protein and lipid should be included for density map calculations, and it is recommended to use a voxel grid-spacing close to the resolution of the cryo-EM map of interest. The Chimera 55 visualization and analysis tool (https://www.cgl.ucsf.edu/chimera/) enables map-in-map fitting of the theoretical, MD-generated density map to the experimental one. This is performed via local optimization using steepest ascent to maximize the overlap or correlation between two sets of map grid points. The correlation between two maps corresponds to the mean cosine similarity across voxels, and varies from -1 to 1. In our case the correlation reached ~0.75-0.8 during the simulations of the DENV particle, dependent upon precise system conditions ( Figure 1C ). 47 Due to the low resolution of the cryo-EM map obtained for the "bumpy" expanded state of DENV, the membrane-embedded components of the proteins (namely, E protein transmembrane helices and entire M proteins) could not be resolved. To elucidate the complete expanded virus structure and to explore the "breathing" pathway, a targeted MD (TMD) 56, 57 approach was taken to trigger the transition, using our refined structure of the smooth DENV virion as a starting point ( Figure 2 ). 
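To make the density map comparison described earlier in this section more concrete, the sketch below bins particle coordinates onto a cubic voxel grid and computes a correlation between two such maps. It is a toy stand-in, not the MDAnalysis, GROmaρs, or Chimera implementation; the function names are hypothetical, and the correlation is assumed here to be the cosine of the angle between the two vectors of voxel values, optionally after subtracting each map's mean.

```python
import math
from collections import Counter

def density_grid(coords, spacing):
    """Count particles per cubic voxel of edge `spacing`.

    Returns a mapping from integer voxel indices (ix, iy, iz) to counts.
    """
    return Counter(tuple(math.floor(c / spacing) for c in xyz)
                   for xyz in coords)

def map_correlation(map_a, map_b, voxels, about_mean=False):
    """Cosine-type correlation between two maps over the given voxels."""
    a = [float(map_a.get(v, 0)) for v in voxels]
    b = [float(map_b.get(v, 0)) for v in voxels]
    if about_mean:
        mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
        a = [x - mean_a for x in a]
        b = [y - mean_b for y in b]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm

# Toy example with a 1 nm grid: a map compared with itself and with a
# map whose density sits in a different voxel.
sim_map = density_grid([(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (1.5, 0.1, 0.1)], 1.0)
other_map = density_grid([(2.5, 0.1, 0.1)], 1.0)
voxels = sorted(set(sim_map) | set(other_map))
print(map_correlation(sim_map, sim_map, voxels))
print(map_correlation(sim_map, other_map, voxels))
```

In practice the grid spacing would be chosen close to the resolution of the cryo-EM map, and the counts would be averaged over many frames before comparison.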
TMD is currently available in Gromacs 58 when used in combination with PLUMED 59 , as well as in NAMD 60 , and employs a time dependent harmonic potential in which the reference RMSD evolves linearly, from its value at the first step to its value at the final step, over a specified simulation timescale. Over the TMD simulation, the force on each atom is given by the gradient of the potential:

$$U_{TMD}(t) = \frac{1}{2} k \left[ RMSD(t) - RMSD^{*}(t) \right]^{2}$$

where $RMSD(t)$ is the instantaneous RMSD value measured between the current and target coordinates, $RMSD^{*}(t)$ evolves linearly from the first to the final step, and $k$ is the associated force constant. The force constant used in the initial TMD simulations for triggering DENV "breathing" was 1,000 kcal mol⁻¹ Å⁻² applied to the E protein backbone beads; this was sufficiently strong that the structure follows the centre of the harmonic spring closely, but weak enough to avoid introducing numerical instabilities. Following an initial 100 ns of TMD simulation, the E protein coordinates of the entire virion typically reached ~0.4-0.6 nm RMSD with respect to the target, expanded virus structure. To further refine the structure, it is advised to run additional, shorter follow-up TMD simulations with a ~3-5 times increased force constant. Thus, an additional 10 ns simulation with a force constant of 4,000 kcal mol⁻¹ Å⁻² yielded a virion structure within ~0.1 nm RMSD with respect to the expanded experimental structure. The TMD simulations revealed the gradual opening of protrusions at the 3- and 5-fold vertices on the icosahedral surface of the virus, consistent with cryo-EM images. 23 Figure 2 . Multiscale simulation approach to study DENV "breathing". On the left, the application of TMD in CG resolution is illustrated for the transition between the smooth DENV2 (typical of lower temperatures as found in mosquitos) and expanded DENV2 (typical of higher temperatures as found in the human host). 
The envelope is shown in surface representation, with lipids in grey, and proteins in yellow, red, or blue for the 2-, 3- and 5-fold vertices, respectively. On the right, an atomistic representation for an isolated E protein pentamer is shown, following back-mapping from the CG expanded virion structure. Protein is shown as blue cartoon, with lipids (PC:PE:PS in a 6:3:1 ratio) in CPK lines format. Fluorescence and HDX-MS data suggested an influence of divalent magnesium or calcium cations upon the reversibility of DENV expansion, upon lowering the temperature from 37°C back to 28°C, 25 and localized the effect as occurring at peptide segments around the virion 5-fold vertices. In order to investigate the molecular basis for this, we "back-mapped" our expanded CG DENV envelope model to all-atom representation. To do this, geometric projection/reconstruction was performed using a freely available library of mapping definitions (available at http://cgmartini.nl/index.php/tools2/resolution-transformation), followed by several all-atom simulation relaxations. 61 From the back-mapped, relaxed virion structure, we extracted an E protein pentamer and its associated M proteins arranged around a 5-fold vertex. This was performed to reduce the system size in all-atom representation. The virus membrane was approximated as a flat lipid bilayer. This was built using CHARMM-GUI 62,63 and corresponded to a biologically relevant membrane composition, similarly to the preceding whole virion model, of a PC/PE/PS mixture in a ~6:3:1 ratio 52 . Subsequently, we embedded the pentamer with its transmembrane (TM) regions inserted into the membrane, and removed clashing lipids within ~0.3 nm of protein atoms. The system was then minimized in vacuum and solvated. 
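The lipid removal step mentioned above amounts to a simple distance filter, sketched below. The data layout (a dictionary of lipid identifiers mapped to atom positions) and the function name are hypothetical, and the brute-force pairwise search is only practical for small examples; a production workflow would normally use a neighbour-search routine from an analysis package instead.

```python
import math

def remove_clashing_lipids(lipids, protein_atoms, cutoff=0.3):
    """Keep only lipids with no atom closer than `cutoff` (nm) to the protein.

    lipids: dict mapping a lipid identifier to a list of (x, y, z) positions.
    protein_atoms: list of (x, y, z) positions of protein atoms.
    """
    kept = {}
    for lipid_id, atoms in lipids.items():
        clashes = any(math.dist(a, p) < cutoff
                      for a in atoms for p in protein_atoms)
        if not clashes:
            kept[lipid_id] = atoms
    return kept

# Toy example: one lipid overlaps the protein, the other does not.
protein = [(0.0, 0.0, 0.0), (0.15, 0.0, 0.0)]
bilayer = {
    "POPC_1": [(0.2, 0.1, 0.0), (0.2, 0.1, 0.5)],   # 0.11 nm from protein
    "POPE_2": [(1.0, 1.0, 0.0), (1.0, 1.0, 0.5)],   # well separated
}
print(sorted(remove_clashing_lipids(bilayer, protein)))
```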
In order to retain the higher order pentameric "expanded" structure, with its ectodomain positioned above the membrane according to cryo-EM data, we applied position restraints with a low force constant to the protein Cα atoms 25 during simulations totalling 200 ns. Atomistic simulations revealed that the emergent spaces at the 5-fold vertices are accessible to water and salt, and that divalent cations can "soak" the protein 5-fold interface and competitively interact with acidic residues to break inter-chain salt bridges. This is expected to reduce the stability of the expanded state, enabling reversible contraction of the virion to its smooth form at low temperatures. 19, 23, 24 More recently, a combination of atomic-resolution simulations and cryo-EM revealed how point mutations at the dimeric E protein interface modulate the threshold for virion "breathing", which is proposed to be a mechanism by which DENV adapts to its host environment to evade antibody recognition. 64 ADE can result from the recognition of prM proteins by non-neutralizing antibodies on the surface of immDENV particles, facilitating virion endocytosis. However, following cleavage by furin of the pr:M complexes inside the endosome, it is unclear how pr dissociates in the acidic pH environment, such that the E protein FP is exposed and the virus can infect the cell. To investigate this further, low resolution (~12-25 Å) cryo-EM structures of intermediates of such an immDENV:antibody complex along the maturation pathway were solved. 65 To improve the resolution and to resolve TM regions, a higher-resolution structure (of a fully immature virion, PDB: 4B03 66 ) was overlaid onto the fully immature virion solved at pH=8, via superimposition onto each of the individual 180 E protein ectodomains, using PyMOL. 
67 Subsequently, a similar procedure to that described in Section 3.3 was used to generate equilibrated CG models for the proteins of the immature virion particle embedded with an explicit lipid bilayer envelope; the resulting theoretical density maps overlapped well with the equivalent experimental ones, reproducing the spherical membrane vesicle ( Figure 3A ). The equilibrated model of the immDENV:pr (pH=8) complex was subsequently used as a starting point for TMD simulations, to explore the molecular pathway towards the intermediate-mature structure solved at pH=5 ( Figure 3B ). The initial RMSD between the ectodomain backbone in the initial versus target structures was ~5 nm. Initially, all E protein chains were simultaneously biased towards the target structure, but this led to significant clashes between E protein trimers (blue) and pentamers (red) ( Figure 3B ). Thus, we instead applied a sequential series of biases to the E proteins; this involved first biasing all chains between RMSD values from 5 to 3.5 nm along the pathway, followed by a bias of the blue and green chains only between RMSD values from ~3.5 to 2.5 nm, and finally a bias of all chains from ~2.5 nm RMSD until the end of the simulation ( Figure 3C ). These simulations revealed that the neighbouring blue E protein molecule was approached by the pr:red-E protein complexes, dislodging the pr molecule from the 5-fold symmetry axes; based on additional modelling studies, this was predicted to be accelerated in the presence of bound antibodies due to clashes between Fab domains. This extent of pr binding site occlusion on the red E protein was not observed for the blue or green molecules ( Figure 3C ). Similar conclusions were drawn when running TMD simulations in which biases were instead applied independently to red (5-fold), blue (3-fold), or green (2-fold) E protein molecules alone. 
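The TMD bias and the sequential biasing scheme described above can be illustrated with the short sketch below. It assumes a harmonic potential of the form U = ½ k [RMSD(t) − RMSD*(t)]², with RMSD*(t) interpolated linearly between its first-step and final-step values, and it assumes the current and target structures are already superimposed; production codes such as NAMD or PLUMED handle alignment and normalisation conventions themselves. All function names are hypothetical, and units are left to the caller as long as they are consistent.

```python
import math

ALL_CHAINS = ("red", "blue", "green")

def rmsd(current, target):
    """RMSD between two pre-aligned coordinate sets of equal length."""
    n = len(current)
    return math.sqrt(sum(math.dist(c, t) ** 2
                         for c, t in zip(current, target)) / n)

def reference_rmsd(step, n_steps, rmsd_start, rmsd_end):
    """RMSD*(t): linear interpolation from the first to the final step."""
    return rmsd_start + (step / n_steps) * (rmsd_end - rmsd_start)

def tmd_energy(current, target, step, n_steps, rmsd_start, rmsd_end, k):
    """U = 1/2 k [RMSD(t) - RMSD*(t)]^2."""
    dev = rmsd(current, target) - reference_rmsd(step, n_steps,
                                                 rmsd_start, rmsd_end)
    return 0.5 * k * dev ** 2

def tmd_forces(current, target, step, n_steps, rmsd_start, rmsd_end, k):
    """F_i = -dU/dr_i = -k (RMSD - RMSD*) (r_i - r_i_target) / (N RMSD)."""
    n = len(current)
    r = rmsd(current, target)
    if r == 0.0:
        return [(0.0, 0.0, 0.0)] * n  # gradient undefined at the target
    dev = r - reference_rmsd(step, n_steps, rmsd_start, rmsd_end)
    pref = -k * dev / (n * r)
    return [tuple(pref * (c[a] - t[a]) for a in range(3))
            for c, t in zip(current, target)]

def chains_to_bias(current_rmsd):
    """Sequential scheme from the text: all chains from 5 to 3.5 nm,
    blue and green only from ~3.5 to 2.5 nm, then all chains again."""
    if current_rmsd > 3.5:
        return ALL_CHAINS
    if current_rmsd > 2.5:
        return ("blue", "green")
    return ALL_CHAINS

# Toy example: two particles 5 length units away from their targets.
start = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
goal = [(3.0, 4.0, 0.0), (3.0, 4.0, 0.0)]
print(tmd_energy(start, goal, 50, 100, 5.0, 0.0, k=2.0))
print(tmd_forces(start, goal, 50, 100, 5.0, 0.0, k=2.0)[0])
print(chains_to_bias(3.0))
```

When the structure lags behind the moving reference (RMSD larger than RMSD*), the force on each particle points towards its target position, and its magnitude grows with both the lag and the force constant; this is why the choice of timescale and force constant largely sets the apparent rate of the transition.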
These results matched nicely with cryo-EM data, and combined with coflotation assays and HDX-MS measurements, this study indicates that pr:Fab complexes dissociate as a unit from DENV E proteins due to steric clashes, thereby facilitating maturation and infectivity. 65 It should be noted that whilst the predictions regarding conformational pathways in breathing or maturation are likely to be accurate, not least because of the incorporation of multiple experimental intermediate "snapshots" which help to validate the structural transitions, the kinetics of these processes are more difficult to accurately characterize. In part, this is because of the TMD approach, in which the specified timescale during the linear evolution of the structure as well as associated biasing force constant will largely determine the rate of change along the pathway of interest. On top of this, the nature of the CG methodology means that timescales should be treated with great care. Like other CG force fields, Martini simplifies the energy landscape, speeding up sampling rates, but this speed-up is not easily predictable and is unlikely to be consistent across different degrees of freedom, or different types of molecules 68 . From the experimental perspective, the rates of whole-virion conformational changes are similarly not well established, though single-molecule approaches are beginning to help measure such phenomena. 43 The flavivirus C protein is a small but multifunctional protein. A large part of its multifunctionality is derived from the intrinsically disordered regions (IDRs) at the N-terminus of the protein. Due to the flexible nature of the N-terminal IDRs, the structure and dynamics of these regions have remained undefined: X-ray crystallographic and NMR structures of flavivirus C proteins either solve the structure without the N-terminal tails or are unable to resolve the atoms in the region. 
[69] [70] [71] However, it remains an important task to characterise and understand the dynamics of these IDRs, and indeed, intrinsically disordered proteins (IDPs) are abundantly found in nature. 72, 73 Using MD simulations in conjunction with biophysical experiments, the dynamics of these IDRs can be better understood at atomic resolution. In recent years, the need for accurate treatment of IDRs in simulations has seen a rise in IDP-specific parameterised force fields. [74] [75] [76] [77] We recently showed that certain force fields are more suited to the DENV C protein than others. 78 Small angle X-ray scattering (SAXS) provides low resolution information on the shape of proteins in solution and is suited to IDPs due to their fuzzy and dynamic nature. 79 Selecting conformations from a simulation has the benefit of selecting structures that have been relaxed and equilibrated and are closer to the experimental conditions. This is as opposed to selecting conformations from a random pool of conformations for the IDRs derived from X-ray structures or NMR structures that may have unresolved clashes or are in non-aqueous conformations. Using an ensemble of conformations derived from MD simulations, we were able to fit the SAXS data better than using a randomly generated pool. 78 The SAXS data was used to validate the most representative ensemble of conformations from the different force fields by fitting a pool of structures derived from the simulations, and selected via a genetic algorithm that minimises the chi-squared value. The force field with the smallest chi-squared value was the IDP-parameterized force field, amber03ws. The resultant conformations that best fit the SAXS data showed that the alpha 1 helices towards the N-terminus of the protein may come together in a clamp-like motion to occlude a hydrophobic patch at the core of the DENV C protein. 
This is a refinement of the previously solved NMR structure, and may be a better representation of the protein in an aqueous environment. HDX-MS may also be useful to elucidate the dynamics of IDRs. The protein is incubated in the presence of deuterated water and then the reaction is quenched. During the time of incubation in the presence of deuterium, the backbone amide hydrogens exchange with the deuterium, and this change in mass can be detected through mass spectrometry. IDPs and IDRs show greater exchange of their backbone amide hydrogens with deuterium than do protein regions that are structured. The correlation of backbone hydrogen bond formation or solvent accessibility with amide backbone deuteriation may also be used to validate the ensemble of structures derived from different force fields. 78 Again, trajectories based on the amber03ws force field exhibited the highest correlations for both backbone hydrogen bond formation and solvent accessibility with experimentally measured amide backbone deuteriation, thus validating our calculated ensemble of C protein conformations. Inherent structural complexity and variability of viral genomic material make it an elusive target for common structural biology techniques. While it is possible to study genomic structures of simple viruses with cryo-EM, 80-83 more complex viral systems such as flaviviruses remain a challenge. [84] [85] [86] [87] [88] Determining the structure and organization of viral genomes is crucial for understanding the function of this core component of viruses at the molecular level. [89] [90] [91] [92] [93] [94] [95] [96] In the face of challenges to classic structural biology techniques, a new set of methods is called for to investigate this key area of viral biology. Recent advances in genome sequencing have allowed us to apply, at the whole-genome level, structural probing techniques that were previously confined to small RNA systems, particularly for viral genomes. 
89, 94, [97] [98] [99] [100] [101] [102] [103] Combining information obtained through structural probing and crosslinking experiments with computational modelling techniques allows us to gain insights into genome-level interactions between different loci that are of functional importance for the virus. 95 Structural probing can be performed at distinct phases within the viral life cycle, e.g. at a virion stage or within infected cells, and hence a time-resolved picture of genomic organization can be obtained. This allows us to identify key structural features at the stages of translation, replication, or packaging, as recently demonstrated for the ~11 kb RNA genome of DENV and ZIKV, potentially opening up novel therapeutic strategies targeting flaviviruses 95 (Figure 4 ). Figure 4 . Experimental sources of data to augment RNA structure prediction. The search space for the prediction of long RNA molecules is too large to obtain accurate predictions of structure using computational techniques alone. Incorporation of experimental data can increase the prediction accuracy dramatically. SHAPE data provides information on local structure by assessing the differential reactivity of the 2'-hydroxyl group in flexible versus constrained bases. SPLASH and other cross-linking techniques can provide information on the macro-organization of RNA by forming chimeric reads. Footprinting techniques can identify regions of RNA interacting with proteins by observing an increased resistance to digestion that correlates with increasing protein concentrations. RNA is capable of adopting a wide range of different structures that serve a regulatory or catalytic 104 purpose including translation, splicing, decay, and RNA transport. 105 As such, studying how RNAs fold and their energetic landscapes is important for understanding gene regulation. 
106 The problem of predicting RNA secondary structure from sequence is challenging, as a large search space of possible base pairs needs to be traversed and the energetics assessed with high accuracy. A variety of tools exist that predict likely RNA secondary structures based on folding free energy estimates and that can incorporate chemical reactivity information. 107-110 Generally, these approaches use energetic terms for base pairing and stacking interactions. For long RNAs the challenge is compounded because the search space grows exponentially with sequence length; hence the accuracy of prediction for long RNAs without the inclusion of additional experimental information is limited. 100 A common strategy to mitigate this effect is to limit the maximum distance between paired bases to a few hundred nucleotides, thus reducing the search space considerably. This trade-off yields more accurate predictions of local secondary structure at the cost of neglecting the longer-range interactions that are responsible for the macro-scale organization of RNA molecules. Unfortunately, it is evident from studies of viral systems that such long-range interactions are often key functional elements, as highlighted e.g. by the flavivirus 5'-3' circularization motifs that span the length of the entire genome. 111 At present, only a combination of different methods is able to capture long-range and short-range interactions with high accuracy. 95, 112, 113 While it has been shown that de novo prediction of RNA structures from sequence alone is unreliable for larger RNAs, 114, 115 this problem can be overcome to some degree by including experimental information on secondary structure in the modelling process. 100 Selective 2'-hydroxyl acylation analysed by primer extension (SHAPE) is an established protocol that yields such secondary structure information for all four bases (A, U, C, G).
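The trade-off introduced by limiting the maximum pairing distance can be illustrated with a toy example. The sketch below is a Nussinov-style base-pair maximization, which is a much simpler objective than the free-energy minimization used by the tools cited above; the function name `max_pairs`, the parameters `max_span` and `min_loop`, and the test sequences are hypothetical choices made purely for illustration.

```python
# Toy illustration only: counts the maximum number of nested base pairs,
# optionally forbidding pairs that are more than `max_span` positions apart.
# Real folding tools minimize free energy instead of maximizing pair counts.

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_pairs(seq, max_span=None, min_loop=3):
    """Return the maximum number of nested base pairs in `seq`.

    max_span: largest allowed distance (j - i) between paired positions,
              or None for no restriction.
    min_loop: minimum number of unpaired bases enclosed by any pair.
    """
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for length in range(min_loop + 1, n):          # length = j - i
        for i in range(n - length):
            j = i + length
            best = dp[i][j - 1]                    # case: position j unpaired
            for k in range(i, j - min_loop):       # case: j pairs with k
                if max_span is not None and j - k > max_span:
                    continue                       # pair exceeds span limit
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    long_range = "GGGG" + "A" * 20 + "CCCC"
    print(max_pairs(long_range))               # unrestricted search
    print(max_pairs(long_range, max_span=10))  # span-limited search
```

For a sequence whose only complementary segments lie at opposite ends, such as the `long_range` example, the unrestricted call finds the four nested G-C pairs, whereas the span-limited call finds none: the long-range helix is simply invisible to the restricted search.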
116 The concept underlying SHAPE probing is that local RNA structure, especially the difference between flexible and physically constrained bases, affects the reactivity of the 2'-hydroxyl group of constituent nucleotides. Flexible bases are able to orientate themselves to react with SHAPE reagents, resulting in acylation at the 2'-hydroxyl of the RNA. Hence, local secondary structure can be read out by observing the degree of 2'-hydroxyl acylation at a specific position. The readout can be performed at the full-genome level, as 2'-hydroxyl-modified nucleotides are significantly more likely than unmodified nucleotides to cause the reverse transcriptase to skip or misread the position. 98-100, 117 By sequencing a reverse-transcribed library of chemically modified RNA, an increase in mutations at modified positions can be observed. From these data, local secondary structure can be inferred by comparing the local mutation rate to that of a mock-treated (usually DMSO) sample. The modification ratio thus obtained can be used as a probabilistic penalty term in secondary structure prediction. 100, 101, 118 It should be noted that whereas structuredness precludes high reactivity signals at a specific position, low reactivity signals can still occur even in open or flexible regions. 118 In addition, low reactivity signals could also indicate protein-interacting sites, where the bound protein protects the region from reacting with SHAPE compounds. However, in general, reactivity can be translated into an odds ratio 118 that contributes significantly to the prediction accuracy of RNA secondary structure. Whereas naïve prediction accuracy for the 16S ribosomal RNA of E. coli reaches around 45% of base pairs and helices, including SHAPE data allows an increase in accuracy to well over 90%. 100 This technique has been successfully applied to probe the genomic structure of a range of viral systems, e.g. HIV 98, 101, 117 or flaviviruses.
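As a minimal sketch of the readout described above, the following Python snippet derives per-nucleotide reactivities from mutation counts in a modified and a mock-treated sample and converts them into a pairing penalty. Several choices here are assumptions rather than details given in this text: the background correction is taken as a simple difference of rates, the normalization to the most reactive positions is a simplified stand-in for the schemes used by real pipelines, and the log-linear penalty with slope 2.6 and intercept -0.8 kcal/mol is one convention used in SHAPE-directed folding. All function names are hypothetical.

```python
import math


def mutation_rates(mutation_counts, read_depths):
    """Per-nucleotide mutation rate; None where there is no read coverage."""
    return [m / d if d > 0 else None
            for m, d in zip(mutation_counts, read_depths)]


def raw_reactivities(modified_rates, mock_rates):
    """Background-subtracted signal: modified minus mock (e.g. DMSO) rate."""
    signal = []
    for mod, mock in zip(modified_rates, mock_rates):
        if mod is None or mock is None:
            signal.append(None)                 # no data at this position
        else:
            signal.append(max(mod - mock, 0.0))  # clip negative values
    return signal


def normalize(reactivities, top_fraction=0.1):
    """Scale so that the mean of the most reactive positions equals 1.

    Assumption: a simplified normalization chosen for illustration.
    """
    observed = sorted((r for r in reactivities if r is not None), reverse=True)
    if not observed:
        return list(reactivities)
    n_top = max(1, int(len(observed) * top_fraction))
    scale = sum(observed[:n_top]) / n_top
    if scale == 0:
        return list(reactivities)
    return [None if r is None else r / scale for r in reactivities]


def pairing_penalties(reactivities, slope=2.6, intercept=-0.8):
    """Pseudo-energy (kcal/mol) added when a nucleotide is paired.

    Assumed form: slope * ln(reactivity + 1) + intercept.
    Positions without data receive no penalty.
    """
    return [0.0 if r is None else slope * math.log(r + 1.0) + intercept
            for r in reactivities]


if __name__ == "__main__":
    modified = mutation_rates([10, 2, 5], [100, 100, 0])
    mock = mutation_rates([1, 2, 0], [100, 100, 100])
    reactivity = normalize(raw_reactivities(modified, mock))
    print(pairing_penalties(reactivity))
```

With this form, highly reactive positions receive a positive penalty that discourages pairing, while unreactive positions receive a small bonus; positions lacking coverage are left neutral so that missing data do not bias the prediction.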
95, 112, 119 Whereas SHAPE-MaP data can significantly improve the prediction accuracy of local secondary structure, there remains a level of structural uncertainty related to the macro-organization of genomic RNAs. Ensembles generated with common structure prediction tools yield a diversity of macro-patterns. Chemical cross-linking information is invaluable for narrowing down the space of these patterns to fewer possibilities by providing pair-wise RNA interaction information. 95 Several methods for cross-linking-based structure probing exist, mainly based on psoralen intercalation into helical segments and subsequent UV-induced covalent cross-linking. 89, 112, 113, 119, 120 One way of enriching for cross-linked RNA interactions is to use a biotinylated version of psoralen, which allows the selective extraction of cross-linked segments using streptavidin beads. Subsequent proximity ligation steps followed by PCR amplification and deep sequencing produce chimeric reads that reveal intra-genomic structures at various distances. Moreover, if conducted in the presence of host RNA, virus-host RNA-RNA interactions can be captured as well. Interestingly, a significant number of positions tend to form multiple interactions that are mutually exclusive, indicating the presence of either structural heterogeneity in the virus population, or constant fluctuations in virus structures. 95 Observing mutually exclusive interactions in the full ensemble is to be expected if structural heterogeneity exists in the underlying population, i.e. if there is more than one way in which the genome can be arranged within mature virions (or in infected cells, for that matter). Unfortunately, to date, we have been technically unable to conduct an experiment on a single virion, which would answer this question. If the observed ensemble is the result of in-virion dynamics, time averaging during the experiment may still reveal competing positions.
If structural heterogeneity exists only across virions, with a specific frozen state in each individual virion, the single-virion experiment would not show any competing positions, but only a subset of the full ensemble. Combining ensembles of structures that have been predicted with high local accuracy using SHAPE-constrained modelling with information about the macro-organization obtained through cross-linking experiments allows for a high-confidence selection of plausible RNA structures from the local to the global scale of organization.

RNA molecules do not exist in isolation inside cells or even inside virion particles, and can closely interact with proteins, e.g. in viral capsids or the ribosome. Such RNA-protein interactions have the potential to materially affect the structure of RNA, but are currently impossible to account for in free-energy-based structure prediction algorithms. The integrative modelling of RNA and protein-RNA interactions in the context of the whole viral nucleocapsid is still a new and challenging field. 45 Due to the flexible nature of RNA and the range of secondary and tertiary structures that RNAs can adopt, modelling the RNA genome is a non-trivial problem at the atomistic and even the CG level. There have been recent efforts to develop tools to model the 3D structure of RNA; these methods range from atomistic and CG representations to graph representations of RNA (i.e. helices as edges and loops as vertices), and have been reviewed extensively elsewhere. 121, 122 There are also currently a few methods that can model the interactions of large RNAs and multiple proteins. 123, 124 Two such methods are the Integrative Modeling Platform (IMP), 125, 126 which allows the definition of proteins and RNA as large beads whose spatial arrangement can be subjected to minimization to satisfy user-defined restraints, and PyRy3D (http://genesilico.pl/pyry3d/), which also employs an integrative modelling approach.
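The idea of arranging large beads so that they satisfy user-defined restraints can be conveyed with a stand-alone toy model. The sketch below does not use the IMP or PyRy3D APIs; it is a plain-Python illustration in which the harmonic distance restraints, the spherical confinement term, the greedy random-displacement search, and all names and parameter values are assumptions chosen for brevity.

```python
import math
import random


def restraint_score(coords, distance_restraints, radius):
    """Sum of squared restraint violations (lower is better).

    distance_restraints: (i, j, target_distance) tuples between bead indices.
    radius: beads farther than this from the origin are penalized, mimicking
            confinement within a particle of limited volume.
    """
    total = 0.0
    for i, j, target in distance_restraints:
        total += (math.dist(coords[i], coords[j]) - target) ** 2
    for point in coords:
        overshoot = math.hypot(*point) - radius
        if overshoot > 0:
            total += overshoot ** 2
    return total


def minimize(coords, distance_restraints, radius,
             steps=5000, step_size=1.0, seed=0):
    """Greedy random-displacement search: keep a move only if it does not
    worsen the score. Returns the final coordinates and their score."""
    rng = random.Random(seed)
    coords = [list(p) for p in coords]
    best = restraint_score(coords, distance_restraints, radius)
    for _ in range(steps):
        i = rng.randrange(len(coords))
        old = coords[i][:]
        coords[i] = [x + rng.uniform(-step_size, step_size) for x in old]
        new = restraint_score(coords, distance_restraints, radius)
        if new <= best:
            best = new            # accept the move
        else:
            coords[i] = old       # reject and restore the bead
    return coords, best


if __name__ == "__main__":
    # Three hypothetical beads, two of which start outside the particle.
    start = [(0.0, 0.0, 0.0), (30.0, 0.0, 0.0), (0.0, 30.0, 0.0)]
    restraints = [(0, 1, 5.0), (1, 2, 5.0)]
    print("initial score:", restraint_score(start, restraints, 10.0))
    _, final = minimize(start, restraints, 10.0)
    print("final score:  ", final)
```

Production tools use far richer scoring functions and sampling schemes; the point of the toy is only that heterogeneous sources of information can be combined once each is expressed as a term in a single score.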
Restraints for such integrative models can come from knowledge of protein-binding regions along the viral genome, length, size, base-pairing, long-range interactions, electrostatic interactions between the capsid and viral genome, and volume restraints of the viral particle.

Thorough verification of identified structural motifs is a crucial part of structural genomics research. Beyond the identification of structural peculiarities, identifying functional structures is a key goal. Functional structures, either existing within viral genomes or formed via virus-host RNA interactions in infected cells, may offer targets for therapeutic intervention. A key method to validate structural motifs arises from investigating the evolutionary history of a particular virus. 127, 128 By identifying covarying bases in the genetic history of a virus, evolutionarily conserved structural motifs can be identified. 127-129 A prerequisite for such an analysis is the availability of a large number of viral sequences, ideally spanning as diverse a range of time points and geographies as possible. Such a wealth of sequence information is usually only available for well-studied systems, e.g. HIV, influenza, or DENV. Emerging threats such as the Zika virus behind the 2016 pandemic prove more difficult to investigate by reference to the historical genetic record. While a large number of full-genome sequences is available for Zika viruses, most of these samples derive from the 2016 South American outbreak and exhibit only limited genetic diversity, making the identification of covariation for structure validation challenging. In contrast, analysing dengue viruses, which have been subjected to much more intense surveillance over time and across geographic regions, allowed us to validate predicted structural motifs with high confidence.
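One simple way to quantify covariation between two alignment columns is their mutual information. The sketch below is offered as an illustration of the principle and is not necessarily the measure used in the studies cited above; the function name and the four-sequence toy alignment are hypothetical, and plain mutual information does not check whether the covarying bases can actually pair.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    alignment: list of equal-length, pre-aligned sequences.
    """
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    joint = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        p_a = col_i[a] / n
        p_b = col_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


if __name__ == "__main__":
    # Columns 0 and 4 change together while preserving complementarity;
    # columns 1 to 3 are fully conserved.
    toy_alignment = ["GAAAC", "AAAAU", "CAAAG", "UAAAA"]
    print(mutual_information(toy_alignment, 0, 4))  # covarying columns
    print(mutual_information(toy_alignment, 0, 1))  # one column conserved
```

The second call illustrates the difficulty noted for Zika virus: when a column is fully conserved, its mutual information with any other column is zero, so an alignment with little genetic diversity carries no covariation signal, whether or not the underlying structure exists.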
95 Ultimately, verifying the functional significance of structural motifs relies on the ability to create viruses with mutations that disrupt the structures of interest and to subsequently measure the functional impact of these modifications via fitness assays in cells or in animals. This poses a particular challenge when the motifs are present in coding regions, as opposed to untranslated regions, as possible modifications are constrained by a number of factors, such as the need to preserve protein sequences through synonymous mutations and the need to avoid rare codons that would impact translation efficiency independently of any structural effects at the genome level. Moreover, the ability to design compensatory mutations is heavily constrained in coding regions due to the same factors and the need for both modifications to align.

Viruses have complex life cycles that depend upon interactions with a variety of host factors, and are constantly adapting, making therapeutics development challenging. Continuing increases in computational power, widespread use of multiscale approaches, and growing efforts to integrate diverse structural, biophysical, and HTS data into theoretical models have made possible the accurate description of entire virus particles, including their component protein, lipid, and nucleic acid material. As outlined here for flaviviruses, this is enabling us to refine envelope structure, predict its dynamic interplay with environmental and immunological triggers, elucidate capsid conformational ensembles, and map genomic architecture. Similar approaches are now being applied to the emerging virus SARS-CoV-2 that is responsible for the COVID-19 pandemic, and indeed, numerous structures of the trimeric pre-fusion spike glycoprotein responsible for infection have been published. 130 In the years to come, this will provide promising routes to novel antiviral and vaccine design strategies.
The Burden of HIV: Insights from the Global Burden of Disease Study
A Novel Coronavirus from Patients with Pneumonia in China
The Global Distribution and Burden of Dengue
Structure of a Flavivirus Envelope Glycoprotein in Its Low-pH-Induced Membrane Fusion Conformation
Structure of Dengue Virus: Implications for Flavivirus Organization, Maturation, and Fusion
Antibody-Dependent Enhancement of Infection and the Pathogenesis of Viral Disease
The Risks behind Dengvaxia Recommendation
Revolutionary Cryo-EM Is Taking over Structural Biology
Multiscale Modelling and Simulation of Viruses
Capsid Protein Structure in Zika Virus Reveals the Flavivirus Assembly Process
Rodenhuis-Zybert, I.; Wilschut, J. Flavivirus Cell Entry and Membrane Fusion
Structure of the Dengue Virus Envelope Protein after Membrane Fusion
A Ligand-Binding Pocket in the Dengue Virus Envelope Glycoprotein
The Flavivirus Precursor Membrane-Envelope Protein Complex: Structure and Maturation
Structures of Immature Flavivirus Particles
Structure of the Immature Dengue Virus at Low pH Primes Proteolytic Maturation
Influence of Pr-M Cleavage on the Heterogeneity of Extracellular Dengue Virus Particles
Lysosomal Enzyme Trafficking between Phagosomes, Endosomes, and Lysosomes in J774 Macrophages: Enrichment of Cathepsin H in Early Endosomes
Binding of a Neutralizing Antibody to Dengue Virus Alters the Arrangement of Surface Glycoproteins
Combined Effects of the Structural Heterogeneity and Dynamics of Flaviviruses on Antibody Recognition
Antibody-Mediated Neutralization of Flaviviruses: A Reductionist View
Dengue Virus: Two Hosts, Two Structures
Structural Changes in Dengue Virus When Exposed to a Temperature of 37°C
Dengue Structure Differs at the Temperatures of Its Human and Mosquito Hosts
Infectivity of Dengue Virus Serotypes 1 and 2 Is Correlated with E-Protein Intrinsic Dynamics but Not to Envelope Conformations
Histidine Protonation and the Activation of Viral Fusion Proteins
The pH Dependence of Flavivirus Envelope Protein Structure: Insights from Molecular Dynamics Simulations
Stability of Trimeric DENV Envelope Protein at Low and Neutral pH: An Insight from MD Study
Role of pH on Dimeric Interactions for DENV Envelope Protein: An Insight from Molecular Dynamics Study
Structure and Dynamics of the Monomer of Protein E of Dengue Virus Type 2 with Unprotonated Histidine Residues
Probing the Mechanism of pH-Induced Large-Scale Conformational Changes in Dengue Virus Envelope Protein Using Atomistic Simulations
Identification of Specific Histidines as pH Sensors in Flavivirus Membrane Fusion
Study of the Mechanism of Protonated Histidine-Induced Conformational Changes in the Zika Virus Dimeric Envelope Protein Using Accelerated Molecular Dynamic Simulations
New Insights into Flavivirus Biology: The Influence of pH over Interactions between PrM and E Proteins
Insertion of Dengue E into Lipid Bilayers Studied by Neutron Reflectivity and Molecular Dynamics Simulations
Molecular Basis of Endosomal-Membrane Association for the Dengue Virus Envelope Protein
Characterizing the Conformational Landscape of Flavivirus Fusion Peptides via Simulation and Experiment
A Funneled Conformational Landscape Governs Flavivirus Fusion Peptide Interaction with Lipid Membranes
Synthetic Fusion Peptides of Tick-Borne Encephalitis Virus as Models for Membrane Fusion
The Structural Dynamics of the Flavivirus Fusion Peptide-Membrane Interaction
Membrane Vesiculation Induced by Proteins of the Dengue Virus Envelope Studied by Molecular Dynamics Simulations
Multiscale Molecular Dynamics Simulation Approaches to the Structure and Dynamics of Viruses
Single-Molecule Studies of Flavivirus Envelope Dynamics: Experiment and Computation
The Role of the Membrane in the Structure and Biophysical Robustness of the Dengue Virion Envelope
Computational Virology: From the inside Out
Breaking the "Unbreakable
Pushing the Envelope: Dengue Viral Membrane Coaxed into Shape by Molecular Simulations
The MARTINI Coarse-Grained Force Field: Extension to Proteins
The MARTINI Force Field: Coarse Grained Model for Biomolecular Simulations
Combining an Elastic Network with a Coarse-Grained Molecular Force Field: Structure, Dynamics, and Intermolecular Recognition
Maker for Coarse-Grained Simulations with the Martini Force Field
The Stem Region of Premembrane Protein Plays an Important Role in the Virus Surface Protein Rearrangement during Dengue Maturation
MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations
GROmaρs: A GROMACS-Based Toolset to Analyze Density Maps Derived from Molecular Dynamics Simulations
UCSF Chimera-A Visualization System for Exploratory Research and Analysis
Targeted Molecular Dynamics Simulation of Conformational Change - Application to the T ↔ R Transition in Insulin
Targeted Molecular Dynamics: A New Approach for Searching Pathways of Conformational Transitions
GROMACS: Fast, Flexible, and Free
PLUMED 2: New Feathers for an Old Bird
Scalable Molecular Dynamics with NAMD
Going Backward: A Flexible Geometric Approach to Reverse Transformation from Coarse Grained to Atomistic Models
A Web-Based Graphical User Interface for CHARMM
CHARMM-GUI Membrane Builder toward Realistic Biological Membrane Simulations
Molecular Basis of Dengue Virus Serotype 2 Morphological Switch from 29°C to 37°C
Mechanism of Enhanced Immature Dengue Virus Attachment to Endosomal Membrane Induced by PrM Antibody
Immature and Mature Dengue Serotype 1 Virus Structures Provide Insight into the Maturation Process
The PyMOL Molecular Graphics System, Version 1.8
Perspective on the Martini Model
Solution Structure of Dengue Virus Capsid Protein Reveals Another Fold
Crystal Structure of the Capsid Protein from Zika Virus
Virus Core Protein: Tetramer Structure and Ribbon Formation
Structural Disorder in Viral Proteins
The Most Important Thing Is the Tail: Multitudinous Functionalities of Intrinsically Disordered Protein Termini
Balanced Protein-Water Interactions Improve Properties of Disordered Proteins and Non-Specific Protein Association
CHARMM36m: An Improved Force Field for Folded and Intrinsically Disordered Proteins
Force Field Development and Simulations of Intrinsically Disordered Proteins
The IDP-Specific Force Field Ff14IDPSFF Improves the Conformer Sampling of Intrinsically Disordered Proteins
Partial Intrinsic Disorder Governs the Dengue Capsid Protein Conformational Ensemble
A Practical Guide to Small Angle X-Ray Scattering (SAXS) of Flexible and Intrinsically Disordered Proteins
3.3 Å Cryo-EM Structure of a Nonenveloped Virus Reveals a Priming Mechanism for Cell Entry
Angstrom Cryo-EM Reconstruction of Tobacco Mosaic Virus from Images Recorded at 300 keV on a 4k x 4k CCD Camera
Near-Atomic Cryo-EM Structure of the Helical Measles Virus Nucleocapsid
Mechanisms of Assembly and Genome Packaging in an RNA Virus Revealed by High-Resolution Cryo-EM
New Structural Insights into the Genome and Minor Capsid Proteins of BK Polyomavirus Using Cryo-Electron Microscopy
The 3.8 Å Resolution Cryo-EM Structure of Zika Virus
Structure of the Thermally Stable Zika Virus
Cryo-EM Structure of the Mature Dengue Virus at 3.5-Å Resolution
Mature HIV-1 Capsid Structure by Cryo-Electron Microscopy and All-Atom Molecular Dynamics
In Vivo Mapping of Eukaryotic RNA Interactomes Reveals Principles of Higher-Order Organization and Regulation
Composition and Three-Dimensional Architecture of the Dengue Virus Replication and Assembly Sites
Pathways for Virus Assembly around Nucleic Acids
Role of RNA Structures Present at the 3'UTR of Dengue Virus on Translation, RNA Synthesis, and Viral Replication
The Search for Structure-Specific Nucleic Acid-Interactive Drugs: Effects of Compound Structure on RNA versus DNA Interaction Strength
Structural and Functional Motifs in Influenza Virus RNAs
Structure Mapping of Dengue and Zika Viruses Reveals Functional Long-Range Interactions
The Druggable Genome: An Update
Identifying RNA Contacts from SHAPE-MaP by Partial Correlation Analysis
RNA Motif Discovery by SHAPE and Mutational Profiling (SHAPE-MaP)
Selective 2'-Hydroxyl Acylation Analyzed by Primer Extension and Mutational Profiling (SHAPE-MaP) for Direct, Versatile and Accurate RNA Structure Analysis
Accurate SHAPE-Directed RNA Structure Determination
SHAPE-Directed RNA Secondary Structure Prediction
Toward Global RNA Structure Analysis
Genome-Wide Measurement of RNA Secondary Structure in Yeast
Ribozyme Structures and Mechanisms
The Roles of Structural Dynamics in the Cellular Functions of RNAs
Structural Genomics of RNA
Prediction of RNA Secondary Structure by Free Energy Minimization
Software for RNA Secondary Structure Prediction and Analysis
ViennaRNA Package 2.0
Genome-Scale Reconstruction of RNA Secondary Structure Integrating High-Throughput Sequencing Data
A 5′ RNA Element Promotes Dengue Virus RNA Synthesis on a Circular Genome
Integrative Analysis of Zika Virus Genome RNA Structure Reveals Critical Determinants of Viral Infectivity
Evaluation of the Suitability of Free-Energy Minimization Using Nearest-Neighbor Energy Parameters for RNA Secondary Structure Prediction
Ab Initio RNA Folding by Discrete Molecular Dynamics: From Structure Prediction to Folding Mechanisms
Selective 2′-Hydroxyl Acylation Analyzed by Primer Extension (SHAPE): Quantitative RNA Structure Analysis at Single Nucleotide Resolution
Architecture and Secondary Structure of an Entire HIV-1 RNA Genome
Computational Analysis of Conserved RNA Secondary Structure in Transcriptomes and Genomes
The RNA Base-Pairing Problem and Base-Pairing Solutions
Coarse-Grain Modelling of Protein-Protein Interactions
Advances and Assessment of 3D Structure Prediction
Computational Modeling of RNA 3D Structure Based on Experimental Data
Opportunities and Challenges in RNA Structural Modeling and Design
Integrative Structure Modeling with the Integrative Modeling Platform
Putting the Pieces Together: Integrative Modeling Platform Software for Structure Determination of Macromolecular Assemblies
Comparative Sequence Analysis and Patterns of Covariation in RNA Secondary Structures
A Statistical Test for Conserved RNA Structure Shows Lack of Evidence for Structure in LncRNAs
Measuring Covariation in RNA Alignments: Physical Realism Improves Information Measures
Cryo-EM Structure of the 2019-nCoV Spike in the Prefusion Conformation

This work was supported by A*STAR. JKM and PJB thank NRF (NRF2017NRF-CRP001-027) for funding.