key: cord-006951-tj22dh5o
authors: Li, Siyu; Erdemci-Tandogan, Gonca; van der Schoot, Paul; Zandi, Roya
title: The effect of RNA stiffness on the self-assembly of virus particles
date: 2018-01-31
journal: J Phys Condens Matter
DOI: 10.1088/1361-648x/aaa159
sha: 
doc_id: 6951
cord_uid: tj22dh5o

Under many in vitro conditions, some small viruses spontaneously encapsidate a single stranded (ss) RNA into a protein shell called the capsid. While viral RNAs are found to be compact and highly branched because of long distance base-pairing between nucleotides, recent experiments reveal that in a head-to-head competition between an ssRNA with no secondary or higher order structure and a viral RNA, the capsid proteins preferentially encapsulate the linear polymer! In this paper, we study the impact of genome stiffness on the encapsidation free energy of the complex of RNA and capsid proteins. We show that an increase in effective chain stiffness because of base-pairing could be the reason why under certain conditions linear chains have an advantage over branched chains when it comes to encapsidation efficiency. While branching makes the genome more compact, RNA base-pairing increases the effective Kuhn length of the RNA molecule, which could result in an increase of the free energy of RNA confinement, that is, the work required to encapsidate RNA, and thus less efficient packaging.

For self-avoiding chains, the size exponents ν (defined through R ∝ N^ν, with R the chain size and N the number of segments) are ν = 3/5 and ν = 1/2 for linear and branched polymers, respectively [9, 10]. However, because of their tertiary structures, which include pseudoknots, RNAs are significantly more compact than branched polymers. Indeed, several numerical studies and surveys have found an exponent of ν = 1/3 for RNA, reflecting this more compact structure [11, 12].
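To make the exponents concrete, the short Python sketch below compares the scaling estimate R ≈ ℓN^ν for a chain of N = 1000 segments. It assumes a prefactor of one and ℓ = 1 nm, so only the relative sizes are meaningful. The function name `chain_size` is hypothetical.

```python
# Hedged sketch: compare chain sizes R ~ ell * N**nu for the exponents quoted above.
# Assumption: the prefactor is 1 and ell = 1 nm, so only ratios between entries matter.
EXPONENTS = {
    "linear (self-avoiding)": 3 / 5,
    "branched (self-avoiding)": 1 / 2,
    "RNA (compact)": 1 / 3,
}


def chain_size(n_segments: int, nu: float, kuhn_length_nm: float = 1.0) -> float:
    """Return the scaling estimate R = ell * N**nu in nm (prefactor assumed to be 1)."""
    return kuhn_length_nm * n_segments ** nu


if __name__ == "__main__":
    n = 1000
    for label, nu in EXPONENTS.items():
        print(f"{label:26s} nu = {nu:.3f}  R ~ {chain_size(n, nu):6.1f} nm")
```

For N = 1000 the three estimates are roughly 63, 32 and 10 nm, which shows why a compact RNA is expected to fit more easily into a cavity of fixed size.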

Many small viruses encapsidate a single stranded RNA into a protein shell called the capsid. Under appropriate physico-chemical conditions of acidity and ionic strength, this process is spontaneous, and the virus can readily assemble in vitro from solutions containing protein subunits and RNA [13-19]. Note that in the absence of the genome, capsids do not form at physiological pH and salt concentrations. Many spherical viruses adopt structures with icosahedral symmetry [20, 21], which constrains the number of subunits in their capsids. The structural index T, introduced by Caspar and Klug, fixes the number of protein subunits in a viral shell at 60 times the T number, and T can assume only certain 'magic' integer values, T = 1, 3, 4, 7, . . . [22-25].
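The 'magic' T numbers follow from the Caspar-Klug construction, T = h² + hk + k² with non-negative integers h and k (not both zero). The Python sketch below is a hedged illustration, not part of the model used in this paper: it enumerates the allowed values up to a cutoff together with the corresponding 60T subunits. The function names are hypothetical.

```python
from __future__ import annotations


def allowed_t_numbers(t_max: int) -> list[int]:
    """Return the sorted Caspar-Klug numbers T = h^2 + h*k + k^2 with 0 < T <= t_max."""
    values = set()
    for h in range(0, t_max + 1):
        for k in range(0, h + 1):  # k <= h avoids listing mirror-image pairs twice
            t = h * h + h * k + k * k
            if 0 < t <= t_max:
                values.add(t)
    return sorted(values)


def subunit_count(t: int) -> int:
    """Number of protein subunits in an icosahedral shell with triangulation number T."""
    return 60 * t


if __name__ == "__main__":
    for t in allowed_t_numbers(13):
        print(f"T = {t:2d} -> {subunit_count(t)} subunits")
```

Integers such as 2, 5, 6, 8, 10 and 11 never appear, which is the sense in which T is restricted to 'magic' values.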

Quite interestingly, virus protein subunits are able to coassemble with a wide variety of negatively charged cargos, including non-cognate RNAs of different length and sequence, synthetic polyanions, and negatively charged nanoparticles [18, 26, 27]. It is now widely accepted that electrostatic interactions between the positive charges on the coat protein tails and the negative charges on the cargo are the main driving force for the spontaneous assembly of simple viruses in solution [13-17, 28, 29]. Still, several recent self-assembly experiments reveal the importance of non-electrostatic interactions, associated with specific structures of the genome, in the selection of one RNA over another by the capsid proteins [30].

The self-assembly studies of Comas-Garcia et al [31] reveal in particular the importance of RNA topology. They carried out a number of experiments in which a solution of the capsid proteins of cowpea chlorotic mottle virus (CCMV) was mixed with equal amounts of RNA1 of Brome mosaic virus (BMV) and RNA1 of CCMV. In this head-to-head competition, the amount of CCMV coat protein (CP) was chosen such that it could encapsidate only one of the genomes. Quite unexpectedly, RNA1 of CCMV (the cognate RNA) lost to RNA1 of BMV, i.e. only RNA1 of BMV was encapsidated by CCMV CPs. These experiments emphasize the impact of RNA structure on the assembly of viral shells, as RNA1 of BMV has a more compact structure than that of CCMV [32].

Following these experiments, a number of simulation studies using quenched (fixed) branched polymers as a model for RNA have shown that the optimal length of encapsidated RNA increases when its secondary structure is taken into account [12, 33]. Mean-field calculations using annealed (equilibrium) branched polymers as model RNAs have also shown that the length of encapsidated polymer increases as the propensity to form branch points increases [32, 34, 35]. More importantly, these calculations show that a higher level of branching considerably deepens the free-energy gain associated with the encapsulation of RNA by a positively charged shell. This implies that the efficiency of genome packaging goes up with an increasing level of branching, that is, with an increasingly compact secondary structure of the genome.

In fact, it was shown in [36, 37] that while RNA molecules of the same nucleotide length and composition might have similar amounts of base pairing, non-viral RNAs have significantly less compact structures than viral ones. The compactness of viral RNAs has been associated with the presence of a larger fraction of higher-order junctions or branch points in their secondary structure [36, 38, 39]. Figures 1(a) and (b) illustrate the secondary structures of CCMV RNA and of a random-sequence RNA of the same length, both obtained with the Vienna RNA software package [8]. As shown in the figure, CCMV RNA has a considerably larger number of branch points than the non-viral RNA of the same length.
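Branch-point counts like those compared in figure 1 can be read off a dot-bracket secondary structure, such as the output of the Vienna package's RNAfold program. The Python sketch below is a hypothetical helper, not the procedure used to produce figure 1. It classifies the loop closed by each base pair by the number of helices branching from it: no inner helix means a hairpin, two or more inner helices (three or more duplexes meeting) mean a junction, i.e. a branch point. It assumes a nested structure written only with '(', ')' and '.', so no pseudoknot brackets, and it does not count the exterior loop.

```python
from __future__ import annotations


def pair_table(structure: str) -> dict[int, int]:
    """Map each paired position of a dot-bracket string to its partner."""
    stack: list[int] = []
    pairs: dict[int, int] = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError(f"unsupported character {ch!r}")
    if stack:
        raise ValueError("unbalanced '('")
    return pairs


def loop_census(structure: str) -> dict[str, int]:
    """Count hairpins and junctions (loops closed by a pair that enclose >= 2 helices)."""
    pairs = pair_table(structure)
    census = {"hairpins": 0, "junctions": 0}
    for i, j in pairs.items():
        if i > j:
            continue  # visit every base pair once, from its opening position
        branches, k = 0, i + 1
        while k < j:
            if k in pairs and pairs[k] > k:
                branches += 1
                k = pairs[k] + 1  # skip over the enclosed helix
            else:
                k += 1
        if branches == 0:
            census["hairpins"] += 1
        elif branches >= 2:
            census["junctions"] += 1  # closing helix plus >= 2 inner helices
    return census


if __name__ == "__main__":
    print(loop_census("((..((...))..((...))..))"))
```

For the toy structure in the usage line, the census reports two hairpins and one three-way junction; stacked pairs and interior loops (exactly one inner helix) are counted as neither.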

Above-mentioned theoretical and experimental studies indicate that in a head-to-head competition between two different types of RNAs, the RNA with a larger number of branching junctions or branch points should have a competitive edge over others [32, 34, 35] . A naive physical explanation is that branching causes RNA molecules to become more compact than structureless linear polymers of similar chain length, which are then easier to accommodate in the limited space provided by the cavity of a capsid. According to these theories and simulations, a linear chain should definitely 'lose' to a branched one of the same number of monomers when competing head-to-head for a limited number of capsid proteins.

To probe the effect of RNA structure on the self-assembly of virions and to test the above theories more systematically, Beren et al [40] recently performed a set of in vitro packaging experiments with polyU, an RNA molecule that has no folded secondary structure. They examined whether RNA topology, i.e. the secondary structure or level of branching, allows the viral RNA to be exclusively packaged by its cognate capsid proteins. More specifically, they studied the competition between CCMV viral RNA and polyU of an equal number of nucleotides for the virus capsid proteins. They found that CCMV CPs are capable of packaging polyU RNAs and, quite interestingly, that polyU outcompetes the native CCMV RNA in a head-to-head competition for the capsid proteins. These findings are in sharp contrast with the previous experimental, theoretical, simulation and scaling studies noted above, which suggest that the branching and compactness of RNA must lead to more efficient capsid assembly. That being said, the scaling theory of [41] already hints at the subtle interplay between Kuhn length, solvent quality and linear charge density in dictating the free energy gain of encapsulation.

To explain these intriguing experimental findings, we employ a mean-field density functional theory and study the impact of RNA branching, while allowing for differences in Kuhn length. We further consider that double helical sequences have a larger linear charge density than non-hybridized sequences along the chain. In all previous theoretical and simulation studies related to the impact of RNA topology on virus assembly, the focus has been on the importance of the degree of branching, ignoring the impact of base-pairing on the RNA Kuhn length and linear charge density.

As noted above, the Kuhn length of single stranded RNA under physiological conditions of monovalent salt is between one and two nm, depending on the ionic strength [2], while that of double stranded RNA is about 140 nm [3, 4]. The average duplex length of viral RNA is about six nucleotide pairs [11], which corresponds to about five nm. This value is much smaller than the persistence length of double stranded RNA [36], suggesting that viral RNA can be modeled as a flexible polymer with an average Kuhn length of about six paired nucleotides. There are of course also loop sequences that in our model act as end, hinge and branching points, but how this translates into an effective Kuhn length for the entire branched-chain representation of the RNA is unclear. Plausibly, the effective Kuhn length of the internally hybridized chain should be larger than that of the equivalent unstructured, non-hybridized chain. Furthermore, another major difference between the linear and branched (base-paired) ssRNA structures seems to be the linear charge density, which doubles for the latter on account of base pairing (hybridization).

In this paper, we vary the degree of branching as well as the effective Kuhn length and linear charge density of a model RNA, and study their impact on the optimal length of genome encapsulated by capsid proteins. We find that, at least under certain conditions, as we increase the chain stiffness or Kuhn length, the free energy of encapsulation of RNA becomes less negative than that of a linear chain. Hence, a larger Kuhn length, associated with base-pairing, might decrease the efficiency of packaging of RNA compared to a linear polymer. In contrast, our results indicate that increasing the linear charge density improves the efficiency of packaging of both linear and branched polymers. Thus base-pairing has two competing effects: it makes the chain stiffer, which increases the work required to encapsidate the chain, but at the same time it increases the linear charge density, which lowers the encapsidation free energy and augments the packaging efficiency. These results are consistent with the experiments of Beren et al [40], in which the linear RNA, polyU, outcompetes the cognate RNA of CCMV when both are in solution with a limited amount of CCMV capsid proteins, that is, an amount sufficient to encapsidate either polyU or CCMV RNA but not both.

The remainder of this paper is organized as follows. In the next section, we introduce the model and present the equations that we employ later. In section 3, we present our results, and in section 4 we discuss the impact of the Kuhn length on capsid stability and on the optimal length of the encapsidated genome. Finally, in section 5, we present our conclusions and summarize our findings.

To obtain the free energy associated with a genome trapped inside a spherical capsid, we consider RNA as a generic flexible branched polyelectrolyte that interacts with positive charges residing on the inner surface of the capsid. We focus on annealed branched polymers because the degree of branching of RNA, a statistical quantity, can be modified by its interaction with the positive charges on the capsid proteins [42].

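Within mean-field theory, the free energy of a negatively charged chain in a salt solution confined inside a positively charged spherical shell can be written as [29, 32, 34, 35, 43-45]

$$
\begin{aligned}
\beta F = {} & \int d^3r \left[ \frac{\ell^2}{6}\,|\nabla\Psi(\mathbf{r})|^2 + \frac{\upsilon}{2}\,\Psi^4(\mathbf{r}) + W[\Psi(\mathbf{r})] \right] \\
& + \int d^3r \left[ \beta\tau\,\Phi(\mathbf{r})\Psi^2(\mathbf{r}) - \frac{1}{8\pi\lambda_B}\,|\nabla\beta e\Phi(\mathbf{r})|^2 - 2\mu\cosh\big(\beta e\Phi(\mathbf{r})\big) \right] \\
& + \int d^2r\; \beta\sigma\,\Phi(\mathbf{r}),
\end{aligned}
\tag{1}
$$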

with β = 1/(k_BT) the inverse temperature, υ the effective excluded volume per monomer, λ_B = βe²/(4πε) the Bjerrum length, e the elementary charge, μ the number density of monovalent salt ions, and τ the charge of a statistical Kuhn segment of the chain. The dielectric permittivity ε of the medium is assumed to be constant [46]. The quantity ℓ, the Kuhn length of the polymer, is defined as an effective stiffness averaged over the entire sequence along the genome. Further, the fields Ψ(r) and Φ(r) describe the square root of the monomer density and the electrostatic potential, respectively, and the term W[Ψ] corresponds to the free energy density of an annealed branched polymer as described in equation (2) below.

As discussed in the Introduction, the secondary structure of RNA molecules contains considerable numbers of junctions of single-stranded loops from which three or more duplexes exit. This makes RNA act effectively as a flexible branched polymer in solution. While the Kuhn length of a single stranded, non-self-hybridized ssRNA is a few nanometers and that of a double stranded RNA is about 140 nanometers, the Kuhn length of viral RNA is not well determined, as discussed above. In the absence of exact measurements, we employ an average or effective value for ℓ, which presumably will be larger if the number of consecutive base pairs (duplexes) between single stranded segments or stem loops along the RNA is larger. Further, we consider the limit of long confined chains consisting of a very large number of segments, N → ∞, where N denotes the number of segments. In this formal limit, we employ the ground-state dominance approximation implicit in equation (1), which has proven to be accurate provided N ≫ 1, i.e. for very long chains [47]. We specify below the connection between the number of segments and the number of nucleotides that make up the RNA, differentiating between self-hybridized and non-self-hybridized RNAs.

The first term in equation (1) is the entropic cost of deviations from a uniform chain density, and the second term describes the influence of excluded volume interactions. The last two lines of equation (1) are associated with the electrostatic interactions between the chain segments, the capsid and the salt ions at the level of Poisson-Boltzmann theory [43, 48, 49]. The term W[Ψ] represents the free energy density associated with the annealed branching of the polymer [50-53],

where f_e and f_b are the fugacities of the end and branch points of the annealed polymer, respectively [44]. Note that the stem-loop or hairpin configurations of RNA are counted as end points. The quantity f_eΨ/√(ℓ³) indicates the density of end points and

There are two additional constraints in the problem. First, the total number of monomers (Kuhn segments) inside the capsid is fixed [54, 55],

$$
N = \int d^3r\, \Psi^2(\mathbf{r}).
\tag{4}
$$

We impose this constraint through a Lagrange multiplier, E, introduced below. Second, there is a relation between the numbers of end and branch points,

$$
N_e - N_b = 2,
\tag{5}
$$

with N_e and N_b the total numbers of end and branch points in the cavity,

as there is only a single polymer in the cavity, which by construction has no closed loops since it has to mimic the secondary structure of an RNA. The polymer is linear if f_b = 0, and the number of branch points increases with increasing f_b. In our calculations, we vary f_b and find f_e through equations (3) and (5). Thus, f_e is not a free parameter.
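The end/branch-point relation is the degree-counting identity for a single loop-free chain with trifunctional branch points: summing vertex degrees over a tree gives n₁ + 2n₂ + 3n₃ = 2(n₁ + n₂ + n₃ − 1), so the number of ends exceeds the number of branch points by exactly two, however the tree is grown. The Python sketch below checks this on randomly grown trees. The trifunctional restriction is an assumption matching the annealed branched-polymer picture, and the random growth rule and function names are hypothetical, chosen purely for illustration.

```python
from __future__ import annotations

import random


def random_branched_tree(n_nodes: int, seed: int = 0) -> list[list[int]]:
    """Grow a random tree (n_nodes >= 2) by attaching each new node to a node of degree < 3."""
    rng = random.Random(seed)
    adjacency: list[list[int]] = [[]]
    for new in range(1, n_nodes):
        # any node with fewer than three neighbours can still accept another branch
        candidates = [v for v, nbrs in enumerate(adjacency) if len(nbrs) < 3]
        parent = rng.choice(candidates)
        adjacency.append([parent])
        adjacency[parent].append(new)
    return adjacency


def ends_and_branch_points(adjacency: list[list[int]]) -> tuple[int, int]:
    """Count degree-1 vertices (ends) and degree-3 vertices (branch points)."""
    ends = sum(1 for nbrs in adjacency if len(nbrs) == 1)
    branch_points = sum(1 for nbrs in adjacency if len(nbrs) == 3)
    return ends, branch_points


if __name__ == "__main__":
    for seed in range(3):
        n_e, n_b = ends_and_branch_points(random_branched_tree(200, seed))
        print(f"seed={seed}: N_e={n_e}, N_b={n_b}, N_e - N_b = {n_e - n_b}")
```

A purely linear chain (the f_b = 0 limit) is the special case with two ends and no branch points.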

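Varying the free energy functional with respect to the monomer density field Ψ(r) and the electrostatic potential Φ(r), subject to the constraint that the total number of monomers inside the capsid is constant [55], we obtain two self-consistent non-linear differential equations, which couple the monomer density with the electrostatic potential in the interior of the capsid. The resulting equations are

$$
\begin{aligned}
\frac{\ell^2}{6}\,\nabla^2\Psi(\mathbf{r}) &= -E\,\Psi(\mathbf{r}) + \upsilon\,\Psi^3(\mathbf{r}) + \beta\tau\,\Phi(\mathbf{r})\Psi(\mathbf{r}) + \frac{1}{2}\frac{\partial W}{\partial \Psi}, \\
\frac{1}{4\pi\lambda_B}\,\nabla^2\beta e\Phi(\mathbf{r}) &= 2\mu\sinh\big(\beta e\Phi(\mathbf{r})\big) - \frac{\tau}{e}\,\Psi^2(\mathbf{r}),
\end{aligned}
\tag{6}
$$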

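with E the earlier mentioned Lagrange multiplier enforcing the fixed number of monomers in the cavity. The boundary conditions for the electrostatic potential inside and outside of the spherical shell of radius R are

$$
\hat{\mathbf{n}}\cdot\nabla\beta e\Phi\big|_{r=R^-} - \hat{\mathbf{n}}\cdot\nabla\beta e\Phi\big|_{r=R^+} = \frac{4\pi\lambda_B\,\sigma}{e}, \qquad \Phi(R^-) = \Phi(R^+),
\tag{7}
$$

with n̂ the outward unit normal of the shell.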

The boundary condition (BC) for the electrostatic potential is obtained by minimizing the free energy, assuming that the surface charge density σ is fixed. The concentration of the polymer outside of the capsid is assumed to be zero. The BC for the monomer density field Ψ inside the capsid is of Neumann type (n·∇Ψ|_s = 0), which can be obtained from the energy minimization [55], but our findings are robust and our conclusions do not change if we impose the Dirichlet boundary condition Ψ(r)|_{r=R} = 0. The former represents a neutral surface, whilst the latter represents a repelling surface [47].

We solved the coupled equations given in equation (6) for the Ψ and Φ fields, subject to the boundary conditions in equation (7), using a finite element method. The polymer density profiles Ψ² as a function of the distance r from the center of the shell are shown in figure 2 for different values of the RNA stiffness and a fixed number of nucleotides, presuming the RNA not to have any secondary structure. Note that for simplicity we assume that a linear chain with ℓ = 1 nm contains one nucleotide per Kuhn segment and carries one negative charge per segment, so τ = −e; a chain with ℓ = 2 nm has two nucleotides and two negative charges per segment, and so on. Thus in our figures the numerical value of ℓ (in nm) also indicates the number of nucleotides in one Kuhn length for linear chains. For the three plots in figure 2, the total number of nucleotides, calculated using equation (4), is equal to 1000. It is worth mentioning that equation (4) gives the total number of Kuhn lengths N, which we multiply by the number of nucleotides per Kuhn length to obtain the total number of nucleotides.

As illustrated in the figure, the polymer density at the wall becomes larger as the Kuhn length decreases, even though the linear charge density is fixed. In all plots of figure 2 we assumed that the excluded volume is kept constant. Arguably, the excluded volume parameter υ depends on ℓ, and it is usually assumed that υ ∝ ℓ³ [47]. As we discuss in section 4, our conclusions about the role of stiffness in the encapsidation free energy are robust and should not depend sensitively on the strength of the excluded volume interaction.
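As a rough illustration of how steep the assumed υ ∝ ℓ³ dependence is, the Python sketch below rescales a reference excluded volume across the Kuhn lengths used in the figures. Pairing υ = 0.05 nm³ with ℓ = 1 nm is a hypothetical reference choice for illustration only, not a pairing stated in the text, and the function name is likewise hypothetical.

```python
def excluded_volume(kuhn_nm: float, v_ref_nm3: float = 0.05, kuhn_ref_nm: float = 1.0) -> float:
    """Scale a reference excluded volume as upsilon ~ ell**3 (hypothetical reference values)."""
    return v_ref_nm3 * (kuhn_nm / kuhn_ref_nm) ** 3


if __name__ == "__main__":
    for ell in (1.0, 2.0, 4.0, 8.0):
        print(f"ell = {ell:.0f} nm -> upsilon = {excluded_volume(ell):.2f} nm^3")
```

Going from ℓ = 1 nm to ℓ = 8 nm multiplies υ by 8³ = 512, which is why it matters that the conclusions are checked against this scaling.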

To investigate the packaging efficiency of a linear chain as a function of its stiffness, we obtained the encapsidation free energy of the linear polymer model as a function of the number of nucleotides for different values of ℓ, as illustrated in figure 3. The figure shows that the optimal number of nucleotides trapped in the shell increases as ℓ decreases. We emphasize again that since we assumed the size of a single nucleotide to be about one nm, the numerical value of ℓ (in nm) represents the number of nucleotides within one Kuhn length. This implies that the number of nucleotides, and hence the number of charges per Kuhn segment, increases as the Kuhn length increases. For example, in our parametrization ℓ = 4 nm represents four nucleotides (so τ = −4e). We observe the same behavior for the free energy of branched polymers, that is, increasing ℓ shifts the optimal length of genome towards shorter chains. In our model, the stiffness ℓ is larger for RNAs whose average number of base pairs in the duplex segments is larger.

The concept of the number of nucleotides per Kuhn length is trickier to implement for the branched polymers taken as a model for self-hybridized ssRNA. For example, a branched polymer with Kuhn length ℓ = 1 nm represents in our model description two nucleotides and a charge of τ = −2e. When the average number of base pairs in the duplex segments of an ssRNA is about 8, we take the Kuhn length to be about 8 nm, but the number of nucleotides per Kuhn length will be 16 and the charge per Kuhn length τ = −16e. Thus, in our prescription for self-hybridized ssRNA, the number of nucleotides within a Kuhn length is twice the numerical value of ℓ as a result of base pairing.
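The bookkeeping of the last two paragraphs can be summarized in a few lines. The Python sketch below assumes, as the text does, that one nucleotide spans about 1 nm and carries one elementary charge, and that a base-paired segment holds two strands. The class and method names are hypothetical.

```python
from dataclasses import dataclass

NM_PER_NUCLEOTIDE = 1.0  # assumption taken from the text: one nucleotide spans ~1 nm


@dataclass(frozen=True)
class ChainModel:
    kuhn_length_nm: float  # effective Kuhn length ell
    base_paired: bool      # True for the branched (self-hybridized) RNA model

    def nucleotides_per_segment(self) -> float:
        """Linear chain: ell / (1 nm) nucleotides; base-paired chain: twice as many (two strands)."""
        n = self.kuhn_length_nm / NM_PER_NUCLEOTIDE
        return 2 * n if self.base_paired else n

    def charge_per_segment(self) -> float:
        """tau in units of the elementary charge e (one negative charge per nucleotide)."""
        return -self.nucleotides_per_segment()

    def total_nucleotides(self, n_segments: float) -> float:
        """Convert the number of Kuhn segments N from the normalization constraint into nucleotides."""
        return n_segments * self.nucleotides_per_segment()


if __name__ == "__main__":
    for model in (ChainModel(1.0, False), ChainModel(2.0, True), ChainModel(4.0, True)):
        print(model, "tau =", model.charge_per_segment(), "e")
```

With these rules a base-paired chain with ℓ = 4 nm carries τ = −8e per segment, and a linear chain with ℓ = 2 nm needs 500 segments to represent 1000 nucleotides.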

We also examined the impact of the branching fugacity on the optimal number of nucleotides. There is a direct relation between the fugacity and the number of branch points: as the fugacity increases, the number of branch points of the RNA increases too, see [32, 34, 35]. Figure 4 illustrates that the optimal number of nucleotides increases and the encapsidation free energy becomes more negative, indicating a more stable complex, as the fugacity of branching and hence the number of branch points increases. The solid line in the figure shows the free energy of a linear polymer. For the case shown in the figure, the Kuhn length of the linear chain is ℓ = 1 nm, but that of the branched polymers is ℓ = 4 nm, corresponding to four base pairs (eight nucleotides). The number of charges within one Kuhn length is then τ = −8e. Figure 4 reveals that the free energy of the linear chain is lower than that of the branched one in certain regions of parameter space. For example, for a branched polymer with fugacity f_b = 0.1, ℓ = 4 nm and τ = −8e (dotted line), the encapsidation free energy of a linear chain with ℓ = 1 nm and τ = −e is always lower than that of the branched polymer; thus, in a head-to-head competition with a limited number of proteins, the linear chain will be preferentially encapsidated by the capsid proteins. This shows that the work of compaction of linear chains could be lower than that of a branched polymer, depending on the stiffness and the degree of branching of the polymers involved. Note that for fixed ℓ, as the number of branch points (f_b) increases, the branched polymer eventually outcompetes the linear polymer for binding to capsid proteins, as illustrated in the figure.

We next studied the free energy of a branched polymer with a fixed fugacity for different values of the stiffness ℓ. As illustrated in figure 5 for a fugacity f_b = 0.1, the linear chain (solid line) 'loses' to a branched one when four nucleotides have formed two base pairs, with ℓ = 2 nm and τ = −4e (dashed line). However, the figure shows that as ℓ increases to 4 nm and 8 nm (dotted and dotted-dashed lines), the encapsidation free energies of the branched polymers become larger than that of the linear chain, indicating that in a head-to-head competition the linear polymer will be encapsidated. Thus, if the average number of nucleotides in duplex segments increases, it becomes energetically more costly to confine RNA inside the capsid.

Figure caption (figure number lost in extraction): As the stiffness increases, the optimal number of nucleotides moves towards shorter chains. The quantity τ indicates the number of negative charges in one Kuhn segment. Other parameters used are the total number of charges on the capsid Q_c = 1800, the excluded volume parameter υ = 0.05 nm³, a salt concentration of 100 mM (which sets μ), the radius of the capsid cavity R = 12 nm and the absolute temperature T = 300 K.
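To put the figure parameters in more familiar units, the Python sketch below converts them into a salt number density μ, a Bjerrum length, a Debye screening length and a capsid surface charge density. Several assumptions are not stated in the text: a relative permittivity ε_r = 78 for water; μ taken as the number density of each ion species of a 1:1 salt, so that linearizing the −2μ cosh(βeΦ) term gives κ² = 8πλ_Bμ; and the Q_c = 1800 charges spread uniformly over a sphere of radius R = 12 nm. The function names are hypothetical.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19     # C
VACUUM_PERMITTIVITY = 8.8541878128e-12  # F/m
BOLTZMANN = 1.380649e-23                # J/K
AVOGADRO = 6.02214076e23                # 1/mol


def bjerrum_length_nm(temperature_k: float, eps_r: float) -> float:
    """lambda_B = e^2 / (4 pi eps0 eps_r k_B T), returned in nm."""
    lb_m = ELEMENTARY_CHARGE**2 / (
        4 * math.pi * VACUUM_PERMITTIVITY * eps_r * BOLTZMANN * temperature_k
    )
    return lb_m * 1e9


def salt_number_density_per_nm3(molar: float) -> float:
    """Convert a concentration in mol/L to ions per nm^3 (1 L = 1e24 nm^3)."""
    return molar * AVOGADRO / 1e24


def debye_length_nm(lambda_b_nm: float, mu_per_nm3: float) -> float:
    """Screening length 1/kappa with kappa^2 = 8 pi lambda_B mu (1:1 salt, assumed)."""
    return 1.0 / math.sqrt(8 * math.pi * lambda_b_nm * mu_per_nm3)


def surface_charge_per_nm2(q_c: float, radius_nm: float) -> float:
    """Charges per nm^2 if Q_c charges are spread uniformly over a sphere of radius R."""
    return q_c / (4 * math.pi * radius_nm**2)


if __name__ == "__main__":
    lb = bjerrum_length_nm(300.0, eps_r=78.0)  # eps_r = 78 is an assumption
    mu = salt_number_density_per_nm3(0.100)    # 100 mM
    print(f"lambda_B ~ {lb:.3f} nm")
    print(f"mu       ~ {mu:.4f} nm^-3")
    print(f"1/kappa  ~ {debye_length_nm(lb, mu):.3f} nm")
    print(f"sigma    ~ {surface_charge_per_nm2(1800, 12.0):.3f} e/nm^2")
```

Under these assumptions, λ_B comes out near 0.71 nm, the screening length near 0.96 nm (much smaller than R = 12 nm), and σ close to one elementary charge per nm².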

Recent experiments have emphasized the crucial role of RNA topology in the efficiency of virus assembly. As noted in the introduction, Comas-Garcia et al [31] have shown that CCMV capsid proteins exclusively encapsidate BMV RNA in the presence of the cognate CCMV RNA under conditions where there is a limited number of capsid proteins in solution. The simulations and analytical studies performed in [32-35, 56] are consistent with these results: the viral RNA with a larger degree of branching has a competitive edge over other viral RNAs or non-viral randomly branched RNAs, keeping all other chain properties equal.

Indeed, all mean-field theories, numerical calculations and simulations up to now have indicated that the encapsidation free energy of both annealed and quenched branched polymers is significantly lower than that of linear polymers. This suggests that if there are equal amounts of linear and branched polymers in a solution, but only enough capsid proteins to encapsulate half of the genomes in solution, only the branched polymer is encapsidated. Nevertheless, according to a series of more recent experiments by Beren et al [40], in a head-to-head competition between a linear (polyU) chain and CCMV RNA of equal length, the linear chain surprisingly outcompetes the cognate RNA, in contrast to theoretical predictions.

While previous theoretical studies have focused on the scaling behavior of linear and branched flexible polymers [32, 34, 35, 48, 56-58], in this paper we study the impact of the stiffness or Kuhn length on the encapsidation of RNA by capsid proteins. In general, the duplexed segments of viral RNA contain on average about five to six base pairs [11]. Note that some studies show that viral RNAs have between 60 and 70% of their nucleotides in duplexes, so the linear charge density is almost a factor of two larger and the effective chain length about half as long [59]. We argue that while base pairing on the one hand makes the RNA more compact, on the other hand it increases the effective Kuhn length, or the statistical length of the polymer unit. This leads to an increase in the work of compaction of the flexible chain by capsid proteins, which is directly related to the encapsidation free energy of the polymer plotted in figure 5. We emphasize again that the findings of this paper are not in contradiction with the previous studies: the more strongly branched a polymer is, the more competitive it becomes for encapsidation by capsid proteins. However, in this work we show that because of base-pairing, the RNA also becomes stiffer and under appropriate conditions can no longer outcompete the linear polymer for binding to capsid proteins.

Since branching due to base-pairing causes both the stiffness and the linear charge density of an otherwise linear polymer to increase, one might wonder which effect, higher charge density or larger stiffness, makes the viral RNA less competitive than a linear polymer. Figure 6 separates the effects of stiffness and charge density. The dashed lines in the figure correspond to linear polymers with ℓ = 1 nm but different numbers of charges per Kuhn segment, τ = −e, −4e, −10e; the longer the dashes, the higher the charge density. As illustrated in the figure, the encapsidation free energy becomes lower as the charge density increases. The charge density has the same impact on the encapsidation free energy of branched polymers: figure 6 shows that as the charge density of the branched polymers increases (dotted lines), their free energy decreases, with a larger spacing between the dots indicating a higher charge density. Quite interestingly, the figure shows that the effect of stiffness overshadows the impact of charge density. A branched polymer with stiffness ℓ = 2 nm and charge density τ = −4e or −10e has a higher free energy than a linear polymer with … These examples do not correspond to 'real' RNA, as it is not possible to have more than 2e of charge per base pair, but they clarify that base-pairing has three competing effects. First, it makes RNA stiffer, which increases the work of encapsidation; second and third, it gives rise to branching and to a higher charge density, both of which lower the encapsidation free energy and enhance the packaging efficiency of RNA by capsid proteins.

Another important point to consider is the change in the excluded volume interaction that must somehow be connected with the variation in the Kuhn length. We repeated the calculations done for figure 5, but with an excluded volume parameter that scales approximately as ℓ³ [47]. We found that our conclusion is robust and that the excluded volume interaction only slightly modifies the boundary in parameter space where the linear polymers are able to outcompete the branched ones. The results of this study can explain the intriguing findings of the experiments of Beren et al [40], in which the unstructured polyU RNA is preferentially packaged and outcompetes the native CCMV RNA, despite the fact that viral RNAs have more branch points and as such have a more compact structure. Last but not least, we note that the interaction of RNA with capsid proteins could modify the preferred curvature of the proteins and result in capsids of different sizes and T numbers, as demonstrated in [40]. However, since very little is known about this effect, in this paper we focused exclusively on the impact of RNA stiffness resulting from base pairing on the RNA encapsidation free energy.

The results of our field theory calculations show that competition between different forms of RNA for encapsulation by virus coat proteins is a complex function of the degree of branching, the effective stiffness of the polymer, the linear charge density and excluded volume interactions. The conclusion of previous works, that the more branched an RNA is on account of its secondary, base-paired structure, the larger its competitive edge for encapsulation by coat proteins, needs to be refined. Under appropriate conditions of linear charge density and effective chain stiffness, we find that a linear chain may in fact outcompete even the native RNA of a virus, as was recently also shown experimentally. Of course, our conclusions are based on a coarse-grained model in which the RNA binding domains of the coat proteins are represented by a smooth, uniformly charged wall. In future work we intend to model more realistically these polycationic tails that form a complex with the polynucleotide. Of particular interest here is the impact of excluded volume interactions between these tails and the polynucleotide.


The authors would like to thank Jef Wagner for useful discussions. This work was supported by the National Science Foundation through Grant No. DMR-1719550 (RZ).

Siyu Li https://orcid.org/0000-0002-4326-7560