key: cord-0878390-is7plpzx authors: Chen, Shi-Jie title: Graph, Pseudoknot, and SARS-Cov-2 Genomic RNA – a Biophysical Synthesis date: 2021-02-03 journal: Biophys J DOI: 10.1016/j.bpj.2021.01.030 sha: 827dcdd6cb89a3774b690edef3ac1e8e740f510e doc_id: 878390 cord_uid: is7plpzx

Schlick et al. performed a systematic computational study to search for the structurally critical nucleotides that may serve as drug targets [1]. Specifically, they exhaustively scanned mutations to identify nucleotides whose mutation would destroy the native structure. The study led to several surprising findings, including the discovery that mutating only 2-3 critical nucleotides is sufficient to cause dramatic structural changes and hence possible disruption of frameshifting.

Conformational sampling is one of the major challenges in the computational study of sequence-structure relationships. Incomplete or poor-quality sampling is often the culprit behind inaccurate predictions of RNA folding. Schlick et al. developed and employed a highly innovative graph-theoretic approach, RAG (RNA-As-Graphs), to tackle the sampling problem [2]. The key strategy of RAG is to transform an RNA 2D structure into a tree graph or a dual graph. In a tree graph, loops are represented as vertices and helices as edges; in a dual graph, helices are represented as vertices and loops as edges. For example, in terms of a dual graph, a two-stem H-type pseudoknot can be represented as two vertices (the two helix stems) connected by two edges (the two cross-linked loops). Because mapping from a structure to a graph discards information about helix and loop lengths and retains only the "topology" of the network of helix-loop connectivity, RAG leads to a drastic reduction in the (graphical) conformational space. Another notable feature that distinguishes the RAG model from other coarse-grained RNA folding models is that it provides a platform for the direct application of powerful and rigorous graph theory tools.
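The dual-graph mapping described above can be illustrated with a minimal Python sketch. It is not the RAG implementation: the function names are hypothetical, the input is assumed to be extended dot-bracket notation (with `[`/`]` marking the pseudoknotted helix), a helix is taken to be a run of directly stacked base pairs (so a bulge would split a helix), and a junction with no unpaired nucleotides is assumed to contribute no edge.

```python
from collections import Counter

# Bracket families; "[" / "]" mark the pseudoknotted helix (assumed input convention).
OPEN_TO_CLOSE = {"(": ")", "[": "]", "{": "}"}
CLOSE_TO_OPEN = {close: open_ for open_, close in OPEN_TO_CLOSE.items()}


def base_pairs(dot_bracket):
    """Return the base pairs (i, j), i < j, sorted by the 5' position i."""
    stacks = {open_: [] for open_ in OPEN_TO_CLOSE}
    pairs = []
    for j, ch in enumerate(dot_bracket):
        if ch in OPEN_TO_CLOSE:
            stacks[ch].append(j)
        elif ch in CLOSE_TO_OPEN:
            pairs.append((stacks[CLOSE_TO_OPEN[ch]].pop(), j))
    return sorted(pairs)


def helix_of_position(pairs):
    """Map each paired position to a helix index (a run of directly stacked pairs)."""
    helix = {}
    index = -1
    previous = None
    for i, j in pairs:
        if previous != (i - 1, j + 1):
            index += 1  # the stack is interrupted, so a new helix starts here
        helix[i] = helix[j] = index
        previous = (i, j)
    return helix


def dual_graph(dot_bracket):
    """Return a multigraph as a Counter mapping (helix_a, helix_b) to its edge count."""
    helix = helix_of_position(base_pairs(dot_bracket))
    edges = Counter()
    last_helix = None  # helix of the most recent paired nucleotide
    gap = 0            # unpaired nucleotides seen since that nucleotide
    for pos in range(len(dot_bracket)):
        if pos not in helix:
            gap += 1
            continue
        current = helix[pos]
        # A single-stranded segment between two helix strands becomes one edge;
        # a hairpin loop shows up as a self-loop on its helix.
        if last_helix is not None and gap > 0:
            edges[tuple(sorted((last_helix, current)))] += 1
        last_helix, gap = current, 0
    return edges


# Two H-type pseudoknots with different helix and loop lengths.
h_type_short = "((((...[[[[))))....]]]]"
h_type_long = "(((.....[[[[[[)))..]]]]]]"
print(dual_graph(h_type_short))
print(dual_graph(h_type_long))
```

Under these assumptions both strings map to the same graph, two vertices joined by two edges, which shows how helix and loop lengths drop out of the representation while the connectivity is kept.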
For example, the graph partitioning algorithm gives a rigorous method to modularize a structure. Moreover, the exact enumeration of all possible graphs for a given number of vertices makes it possible to exhaustively sample all possible RNA motifs and structures, including new folds not yet discovered in experimentally determined structures.

In the study of sequence-structure relationships, exploring the sequence space with high computational efficiency is another major challenge. To tackle this challenge, Schlick et al. developed the RAG-IF approach to select sequences that fold into a given target structure (the inverse folding problem) [3]. Realizing that RNA folding is intrinsically a 3D problem, Schlick

The above approach leads to a number of minimal mutants that destroy the pseudoknot structure and/or helix stem S2. For example, mutants [13441A-G, 13443A-C] in S1 and [13483G-C, 13485U-C] in S2 cause switches from the native pseudoknot to a three-way junction and to a three-stem structure with an internal loop, respectively. Although the computation focused mostly on stem S2, the same approach can be used to find critical mutations that destroy stems S1 and S3. The successful application of graph-theoretic approaches demonstrates the advantage of coarse-grained modeling for RNA folding [4] and the power of rigorous mathematical tools, such as graph theory, in biophysical modeling. As discussed below, these graph-theoretic approaches may play a unique role in tackling further challenges in the biophysics of the SARS-CoV-2 frameshifting element (FSE).

There are two major challenges in the biophysical modeling of the FSE RNA structures. First, the FSE RNA may form multiple alternative low-energy structures at both the 2D and the 3D structure levels [5, 6]. For example, at the 2D structure level, nucleotide 13448G can switch between base pairing with 13417U and base pairing with 13475U, giving rise to two different 2D structures.
At the 3D structure level, computer simulation indicates the formation of different 3D folds, with the 5' end spacer sequence threading (or not threading) through the junction region between stems S1 and S3. In addition, the formation of the different 3D structures can be further complicated by metal ion effects [7]. Second, the folding of the FSE may be influenced by potential long-range interactions between the FSE and other nonlocal regions in the SARS-CoV-2 genome. For example, including nucleotides upstream from the slippery site can lead to the formation of alternative helices [8]. When 50 nucleotides upstream from the slippery site are included, VfoldPK [9], a free energy-based model for the folding of pseudoknotted structures, predicts two low-energy structures with different sets of base pairs for helix S1 and a new helix formed by long-range base pairing between the slippery-spacer region and the 50-nt upstream sequence. Consideration of such a full structure may be necessary to design drugs that bind the FSE target. However, in the process of viral translation, the sliding ribosome may disrupt upstream structures. Therefore, it is important to predict how disrupting long-range interactions in the upstream structure affects possible structural rearrangements in the downstream FSE. The RAG model, with its unique graph-based structure sampling algorithm, offers a highly promising tool for predicting alternative folds, even for larger systems.

The fact that the RAG/RAG-IF models rely on RNA 2D structure prediction programs highlights the need for an accurate 2D structure prediction program. Sequence alignment can often provide reliable information about conserved nucleotides and base pairs. However, sequence alignment usually cannot give all the base pairs of a structure. Therefore, we need physical models to predict the remaining base pairs.
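To see why accurate free energies matter when several low-energy structures compete, as discussed above for the FSE, the relative populations of alternative folds can be estimated from their Boltzmann weights. The Python sketch below is only an illustration: the function name is hypothetical, the free energies are made-up numbers rather than values from any cited study, and the ensemble is assumed to contain only the structures listed.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal/(mol*K)


def boltzmann_populations(free_energies, temperature_k=310.15):
    """Return equilibrium populations for structures with free energies in kcal/mol."""
    rt = GAS_CONSTANT * temperature_k
    lowest = min(free_energies.values())
    # Subtracting the lowest free energy keeps the exponentials numerically stable.
    weights = {name: math.exp(-(g - lowest) / rt) for name, g in free_energies.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical free energies (kcal/mol) for two competing folds; not values from the source.
example = {"fold_A": -25.0, "fold_B": -24.0}
populations = boltzmann_populations(example)
for name, p in populations.items():
    print(f"{name}: {p:.2f}")
```

With these illustrative numbers, a 1 kcal/mol gap at 37 °C still leaves roughly 16% of the population in the higher-energy fold, so an error of about 1 kcal/mol in a predicted free energy can noticeably change which structure appears dominant.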
Although the RAG algorithm offers an excellent solution to the conformational sampling problem, calculating the free energy for structures, especially for those containing convoluted pseudoknots and loop-helix base triple interactions [10], requires an accurate physical model.

[1] Structure-Altering Mutations of the SARS-CoV-2 Frame Shifting RNA Element
[2] RAG: RNA-As-Graphs web resource
[3] Inverse folding with RNA-As-Graphs produces a large pool of candidate sequences with target topologies
[4] The Multiscale Future of RNA Modeling
[5] Cryoelectron Microscopy and Exploratory Antisense Targeting of the 28-kDa Frameshift Stimulation Element from the SARS-CoV-2 RNA Genome
[6] Perspectives on Viral RNA Genomes and the RNA Folding Problem
[7] Modeling the structure of the frameshift stimulatory pseudoknot in SARS-CoV-2 reveals multiple possible conformers
[8] Structure of the full SARS-CoV-2 RNA genome in infected cells
[9] Vfold: a web server for RNA structure and folding thermodynamics prediction
[10] Predicting loop-helix tertiary structural contacts in RNA pseudoknots

The author thanks Yangwei Jiang, Jun Li, Sicheng Zhang, and Yuanzhe Zhou for many useful discussions and Travis Hurst for critical reading of the manuscript. This work was supported by the National Institutes of Health under Grants R01-GM117059 and R35-GM134919 to S.-J.C.