key: cord-0817685-tviofv9o
authors: Sun, Lei; Li, Pan; Ju, Xiaohui; Rao, Jian; Huang, Wenze; Ren, Lili; Zhang, Shaojun; Xiong, Tuanlin; Xu, Kui; Zhou, Xiaolin; Gong, Mingli; Miska, Eric; Ding, Qiang; Wang, Jianwei; Zhang, Qiangfeng Cliff
title: In vivo structural characterization of the SARS-CoV-2 RNA genome identifies host proteins vulnerable to repurposed drugs
date: 2021-02-09
journal: Cell
DOI: 10.1016/j.cell.2021.02.008
sha: f4b4906959f53de3a95daf04774c3acff6ab5a7d
doc_id: 817685
cord_uid: tviofv9o

SARS-CoV-2 is the cause of the ongoing Coronavirus Disease 2019 (COVID-19) pandemic. Understanding of the RNA virus and its interactions with host proteins could improve therapeutic interventions for COVID-19. Using icSHAPE, we determined the structural landscape of SARS-CoV-2 RNA in infected human cells and from refolded RNAs, as well as of the regulatory untranslated regions of SARS-CoV-2 and six other coronaviruses. We validated several structural elements predicted in silico and discovered structural features that affect the translation and abundance of subgenomic viral RNAs in cells. The structural data informed a deep learning tool to predict 42 host proteins that bind to SARS-CoV-2 RNA. Strikingly, antisense oligonucleotides targeting the structural elements and FDA-approved drugs inhibiting the SARS-CoV-2 RNA binding proteins dramatically reduced SARS-CoV-2 infection in cells derived from human liver and lung tumors. Our findings thus shed light on coronavirus and reveal multiple candidate therapeutics for COVID-19 treatment.

SARS-CoV-2 is an RNA virus of the Coronaviridae family, which also includes the [...] SARS-CoV-2 and a mutant (Figure 1C). The icSHAPE structure data informed our downstream analyses (Figure 1D).

For the in vivo icSHAPE structural map of the SARS-CoV-2 RNA genome, we obtained an average of about 150 million reads for each library replicate (Supplementary Table S1).
Underscoring the very high quality of our sequencing data, we found that the inter-replicate Pearson correlation coefficients were higher than 0.98 for RNA expression (RPKM) levels of the host transcriptome (Figure S1A), and the correlation of the RT stops caused by NAI-N3 modifications on the viral RNA genome exceeded 0.99 (Figure S1B). Finally, we obtained icSHAPE scores for more than 99.88% of the nucleotides of the in vivo SARS-CoV-2 RNA genome structure by using icSHAPE-pipe (Figures 2A-B, Supplementary Table S2).

To assess the accuracy of our in vivo structure, we calculated an AUC to quantitatively evaluate the predictive performance of icSHAPE scores for the structure models, using a previously established method (Burkhardt et al., 2017; Zubradt et al., 2017) (STAR Methods). We first compared the structural data we obtained for 18S rRNA, 28S rRNA, and the signal [...] (Figure S1C), indicating that the icSHAPE scores are consistent with the reference structures.

We also compared our structural data with those of another extensively studied coronavirus, Mouse Hepatitis Virus (MHV), which has a SHAPE reactivity score-directed RNA structural model for its 5'UTR region (Yang et al., 2015). The two structural models were very similar, with the exception that MHV has apparently lost the SL3 elements. Further, we compared the icSHAPE scores of viral RNAs with the very recently published theoretical models of the secondary RNA structures of the SARS-CoV-2 5'UTR and 3'UTR (Rangan et al., 2020). We observed a high AUC (AUC = 0.854) for the 5'UTR but a relatively low AUC for the 3'UTR (AUC = 0.692). The low AUC may be explained by the dynamic structure of the 3'UTR in cells, for example the alternative conformations between the extended bifurcated stem-loop [...] potential regulatory regions.
Interestingly, we observed another five co-variations within a duplex formed between the 3'UTR and "ORF10". ORF10 is a cryptic ORF upstream of the 3'UTR that was predicted computationally but lacks empirical evidence for the protein or the subgenomic RNA (Kim et al., 2020). Our structural data raised the possibility that this region is part of a structure within the 3'UTR (Figure 3).

Overall, these results support that our SARS-CoV-2 structural analysis using icSHAPE constraints yielded a reliable RNA structural model. This model enabled our identification of candidate functional structural elements, and represents a rich resource to support both basic hypothesis-driven investigations about host-virus biology and the development of potential antiviral applications (e.g., ASO- or siRNA-based therapies).

In particular, our comparative structural analyses included a SARS-CoV-2 mutant containing a C241T mutation in the 5'UTR, which is accompanied by the glycine mutation at residue 614 found in the dominant pandemic form (Korber et al., 2020). We observed increased flexibility around this position (Figure 4B). Remarkably, this structural change will on the one hand result in the loss of a highly stable UUCG tetraloop (Ennifar et al., 2000; Thapar et al., 2014), and on the other hand create a single-stranded U-rich sequence (Schnell et al., 2012).

Over all the untranslated regions we examined, the icSHAPE profile data revealed conserved structures largely consistent with the phylogeny (Figure 4B), both in 5'UTR [...] populations based on long-read sequencing (Kim et al., 2020). We examined these data in the context of our icSHAPE scores and found that the abundance of a particular subgenomic viral RNA was positively correlated with the extent of single-stranded regions within its 5' TRS-B region (Figure 5B, left, r = 0.239, p = 0.035, Spearman correlation; Figure S5D).
Notably, we analyzed the structure relationship with both the canonical and noncanonical subgenomic viral RNAs. To further pursue this structure-abundance correlation using the structure of the TRS-L region, we re-examined our icSHAPE data to identify and exclusively count those reads that i) cross a fusion site and ii) specifically map to a confirmed subgenomic viral RNA (Figure S5E, STAR Methods). We found that the TRS-L sequence adopted different secondary structures in different subgenomic viral RNAs depending on the flanking sequence, and that the extent of single-stranded RNA in the TRS-L correlated with abundance (Figure 5B, right, r = 0.646, p = 1.645e-6, Spearman correlation; Figure S5F). For example, the TRS-L is more single-stranded in the subgenomic "N" RNA than in the subgenomic "pp1ab" RNA, and the subgenomic "N" RNA is more abundant than the subgenomic "pp1ab" RNA (Figure 5C). These data suggest that the abundance of a specific subgenomic viral RNA species may be influenced by its RNA 5' structure.

In addition, we examined our icSHAPE scores of the SARS-CoV-2 RNA in the context of recently reported translation efficiency (TE) data for the subgenomic viral RNAs (Finkel et al., 2020). We observed a high Spearman correlation coefficient between TE and the frequency of single-stranded regions in vivo (r = 0.762, p = 0.028, Spearman correlation; Figure 5D). These data suggest that the subgenomic viral RNA structures may functionally impact translation.

[...] showing heightened sequence conservation across betacoronaviruses, and various stems demonstrating functional roles in viral infection. For example, studies suggested that the first stem-loop (SL1) in the 5'UTR is necessary for coronavirus replication (Li et al., 2008a).
The third stem-loop contains a TRS core sequence (CS region, CUAAAC), which has been speculated to be critical for the discontinuous transcription characteristic of coronaviruses (van den Born et al., 2005). In viral genome 3'UTRs, mutually exclusive RNA structures have been shown to control various stages of the RNA synthesis pathway (Goebel et al., 2004). Recent virus structural modeling efforts using SARS-CoV-2 genome sequences have confirmed the existence of many of these stem-loops and driven predictions of yet more of these in SARS-CoV-2 (Andrews et al., 2020; Rangan et al., 2020).

Our work emphasized that most stem-loops exist in both refolded RNA molecules in vitro and in viruses within host cells, suggesting that co-transcriptional folding and refolding lead to similar, stable structures. But more importantly, our in vivo data also point to potential structural differences when compared with the in vitro and theoretical studies. For example, we observed that the proposed loop region in SL3 is not reactive, supporting the possibility of long-range functional interactions with downstream TRS-B regions, which are understood to be integral to successful discontinuous transcription (Enjuanes et al., 2006). We also noticed that the small stem-loop downstream of SL4 proposed by Rangan et al. (2020) is absent from our in vivo structural data. Instead, our results indicate this region adopts a long, single-stranded conformation in vivo; interestingly, the sequence context of this region is AU-rich, suggesting it may be a hotspot for the binding of RBPs that prefer AU-rich single-stranded structure elements.

Overall, our study identified many single-stranded regions in the SARS-CoV-2 genome that are potential targets for interventions through siRNA, ASO, etc.
Importantly, our work also revealed and validated structural elements with strong co-evolution support throughout the genome (including in CDS regions), suggesting stable, functionally conserved RNA structures. Computational methods like ROSETTA and FARFAR are efficient for modeling tertiary structure when accurate secondary structural models are available (Das and Baker, 2007; Leman et al., 2020). Thus, our data will inform reliable tertiary structure models of the SARS-CoV-2 genome, which may reveal druggable pockets vulnerable to small molecules. Indeed, functional RNA structural elements can be targeted by small compounds to disrupt viral infectivity (Ren and Patel, 2014). Thus, the RNA structures we have uncovered in SARS-CoV-2 could facilitate target discovery and the development of antiviral therapeutics.

Our in vivo RNA structure also provides the groundwork to accurately predict host RBPs [...]

In addition to recruiting the translation machinery, SARS-CoV-2 may interact with many host proteins including RNA metabolism proteins and enzymes such as helicases. For example, based on our predictions, the helicase DDX42 is likely hijacked by the virus to help evade the cellular innate immune response (Beachboard and Horner, 2016). Interestingly, our findings suggest that stress granule proteins including TIA1, IGF2BP1, and PTBP1 interact with the SARS-CoV-2 RNA genome. Previous studies reported that TIA1 interacts with the minus-strand 3' terminal stem loop (SL) of the West Nile virus RNA, which inhibits stress granule formation and facilitates flavivirus genome RNA synthesis (Emara and Brinton, 2007). Intriguingly, inhibition of stress granules is known to promote replication of MERS-CoV (Nakagawa et al., 2018).
Overall, these SARS-CoV-2 RNA-host protein interactions will substantially extend our insight into SARS-CoV-2 biology and shed light on the molecular mechanism of viral infection.

Finally, the present study illustrates how the identification of conserved RNA structures and host RBPs that bind to viral RNA genomes can be exploited to develop antiviral drugs. Using an innovative Caco-2 cell SARS-CoV-2 infection platform to test antiviral drugs, we found that inhibitor drugs targeting predicted host factor proteins successfully reduced SARS-CoV-2 infection. Treatments with ASOs targeting conserved RNA structures and predicted RBP binding sites, or siRNA knock-down of predicted host factors, also showed moderate inhibitory effects against SARS-CoV-2 infection, suggesting effective approaches for interventions. Overall, our strategy holds great promise for repurposing existing drugs and developing innovative strategies to fight against the still-ongoing SARS-CoV-2 pandemic and to combat viral disease more generally.

Although this study provides a rich resource of SARS-CoV-2 RNA structures and uses this information to predict host proteins that are vulnerable for drug repurposing, there are nevertheless a number of limitations, stemming both from the technology we used for structure measurement and regarding the validations of the drug candidates. First, the SARS-CoV-2 RNA structural information obtained by icSHAPE must be understood as an ensemble representing different life stages of the virus (e.g., replication/transcription, packaging [...] principle bind to SARS-CoV-2 RNA; cellular context information such as protein abundance and localization data are not considered. More physiologically relevant predictions of host factors could be obtained by incorporating these parameters into PrismNet predictions.
Finally, although we have demonstrated that some repurposed FDA-approved drugs can effectively inhibit viral infection in different cells using both the SARS-CoV-2 N trans-complementation system and the bona fide SARS-CoV-2, their mechanisms of action should be studied further, and their efficacy and side-effects must be assessed by in vivo validations using animal models prior to any possible clinical application.

Detailed methods are provided in the online version of this paper and include the following: [...]

[...] Biochem 77, 77-100.
Li, Z., and Nagy, P.D. (2011). Diverse roles of host RNA binding proteins in RNA virus replication. RNA Biol 8, 305-315.
Liu, N., Dai, Q., Zheng, G., He, C., Parisien, M., and Pan, T. (2015). N(6)-methyladenosine-dependent RNA structural switches regulate RNA-protein interactions. Nature 518, 560-564.
Liu, Y., Wang, Y., Wang, X., Xiao, Y., Chen, L., Guo, L., Li, J., Ren, L., and Wang, J. (2020). [...]

[...] water, and samples were heated to denature RNA structure at 90°C for 2 min. Samples were then transferred onto ice immediately for more than 2 min. 3.3 µl of 3.3× SHAPE refolding buffer (333 mM HEPES (pH 7.5), 20 mM MgCl2 and 333 mM NaCl) was added to the RNA and incubated at 37°C for 5 min. 1 µl of 1 M NAI-N3 was added to the refolded samples and incubated at 37°C for 10 min. In vitro modified RNA was extracted as outlined in the above steps.

[...] Table S2). We amplified these regions using primers including the T7 promoter sequence (Supplementary Table S5). We synthesized RNA in vitro from PCR products using a HiScribe™ T7 Quick kit following the manufacturer's instructions. After overnight incubation, DNA was removed using DNase I. Then, the in vitro transcribed RNA was purified using a Hipure RNA pure Micro Kit.

[...] RNA probe was shortened to one hour (to minimize the RNA structure refolding) (Figure 6C).
For the mutation and rescue RNA probes of ILF3 targets, all the steps followed the description provided in Figure 6B (Figure 6D).

Immunoblotting was used to examine RNA pull-down results, using antibodies for IGF2BP1, TIA1, PTBP1, hnRNPK, hnRNPA1, NONO, U2AF2, CAPRIN1, ILF3, and GAPDH (Abcam). Elution samples from RNA pull-downs were boiled at 95°C for 10 min, followed by immunoblotting as previously described (Sun et al., 2019).

qPCR was used to quantify SARS-CoV-2 infection and amplification before collection of the infected cells.

Biotin-labeled RNA of the SARS-CoV-2 UTR and the control GFP RNA were synthesized using the HiScribe™ T7 Quick kit. We followed the manufacturer's instructions with the following modification: we added biotin-16-UTP into the 10 mM NTP mix for biotin labeling. Human lung cells A549 (1 × 10^7) were lysed using the lysis buffer. Cell lysates were incubated with the RNA probe for 3 hours, and C1 beads were then added for another hour. After washing, the pulled-down proteins were eluted in 30 µl of 1× LDS Sample buffer (Thermo Fisher, cat#NP0007) and heated at 90°C for 10 min. The SARS-CoV-2 UTR and the GFP [...]

The SARS-CoV-2-GFP∆N genome was assembled using in vitro ligation of five fragments A, [...]

The icSHAPE sequencing data was processed using icSHAPE-pipe.
The processing steps were as follows:
1) Duplicated reads in raw fastq files were collapsed;
2) 3' adaptor sequence in the reads and the first 10 nt from the 5' end were removed using trimmomatic (Bolger et al., 2014);
3) Clean reads were mapped to human rRNA with bowtie2 (Langmead and Salzberg, 2012);
4) Unmapped reads were mapped to the human genome using STAR (Dobin et al., 2013);
5) Remaining unmapped reads were mapped to the SARS-CoV-2 sequence (Genbank ID: NC_045512.2) with bowtie2;
6) Sam files were converted into .tab files using icSHAPE-pipe sam2tab;
7) The icSHAPE score was calculated using icSHAPE-pipe [...] NAI_rep1.tab,NAI_rep2.tab -size virus_len.txt -wsize 50 -out virus_shape.gTab;
8) The .gTab file was converted to .shape format using icSHAPE-pipe genSHAPEToTransSHAPE -i virus_shape.gTab -s virus.fa.len -c 100 -o virus_shape.shape. We set -c 100 to retain bases with a read depth greater than 100 (Table S2).

To assess data quality, Pearson correlation coefficients were calculated based on the RPKM of the host transcriptome between replicates. We also compared the consistency of the reverse transcription (RT) stop counts of SARS-CoV-2 across all samples (Figure S1).

[...] 2008). For 18S rRNA and 28S rRNA with 3D models, we used the PDB structure (id: 6ek0) to calculate the solvent accessibility for the 2'-OH of each nucleotide in a 3D model (retaining those bases with solvent accessibility > 3) to evaluate the AUC.

To define structurally variable regions between in vivo and in vitro conditions, we used a method combining a binomial test and a permutation test to call significantly different structural regions (Figure 2). The algorithm is summarized in the four steps below.

Step 1: Estimate the random background noise.
We calculated the L1 distance of icSHAPE reactivity scores for each nucleotide between replicates (for in vivo and in vitro separately). Then we aggregated all L1 distances from in vivo and in vitro conditions, which were used as the background distribution of the technical variations of icSHAPE scores. We defined the top 5% of the L1 distances as the threshold of random noise, that is, ΔS is the 0.95 quantile of the pooled L1 distances.

Step 2: Search for significantly different regions with sliding windows. The virus genome was split into sliding windows (window size: 10 nt, window step: 1 nt). The L1 distances of icSHAPE reactivity scores between the two conditions were calculated, and windows in which the number of differential nucleotides (L1 distance > ΔS) is greater than or equal to 3 were defined as differential windows.

Step 3: Keep the top differential windows. We only preserved the differential windows whose average L1 distances were in the top 10% of all differential windows.

Step 4: Merge overlapping windows.

To construct RNA secondary structural models for a complete SARS-CoV-2 genome, we used the partition program and MaxExpect program in the RNAstructure software suite (Reuter and Mathews, 2010) to predict secondary structure with icSHAPE scores as the pseudo-energy constraint. We set the maximum pairing distance to 300 nt. To identify a combination of slope and intercept parameters, we used a grid search to predict a structure of the UTR and flanking region that is consistent with the Rfam model (Kalvari et al., 2018a). We then used these parameters to predict the structure. We used a sliding window with a length of 5000 nt and a step size of 1000 nt to predict the structure of the full-length viral RNA. Structure models with higher pairing probabilities produced by the partition program were selected for RNA structures of overlapping regions.
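The four-step procedure for calling structurally different regions described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes per-nucleotide icSHAPE scores are held in NumPy arrays with NaN for missing values, the function names `noise_threshold` and `differential_regions` are hypothetical, the handling of ties and missing values is an assumption, and the binomial and permutation tests mentioned in the text are not included.

```python
import math
import numpy as np


def noise_threshold(replicate_pairs, q=0.95):
    """Step 1: pool per-nucleotide |rep1 - rep2| over all replicate pairs
    (in vivo and in vitro) and return the q-quantile as the noise threshold."""
    dists = [np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
             for a, b in replicate_pairs]
    pooled = np.concatenate(dists)
    pooled = pooled[~np.isnan(pooled)]  # ignore nucleotides without a score
    return float(np.quantile(pooled, q))


def differential_regions(in_vivo, in_vitro, delta_s,
                         wsize=10, min_diff=3, top_frac=0.10):
    """Steps 2-4: return merged, 0-based, half-open (start, end) intervals."""
    dist = np.abs(np.asarray(in_vivo, dtype=float)
                  - np.asarray(in_vitro, dtype=float))
    # Missing scores (NaN) are treated as non-differential (an assumption).
    is_diff = np.nan_to_num(dist, nan=0.0) > delta_s

    # Step 2: slide a window with a 1-nt step; keep windows that contain
    # at least `min_diff` differential nucleotides.
    windows = []
    for start in range(len(dist) - wsize + 1):
        end = start + wsize
        if is_diff[start:end].sum() >= min_diff:
            windows.append((start, end, float(np.nanmean(dist[start:end]))))
    if not windows:
        return []

    # Step 3: keep the top fraction of differential windows by mean L1 distance.
    windows.sort(key=lambda w: w[2], reverse=True)
    n_keep = max(1, math.ceil(top_frac * len(windows)))
    kept = sorted((s, e) for s, e, _ in windows[:n_keep])

    # Step 4: merge overlapping windows.
    merged = [list(kept[0])]
    for s, e in kept[1:]:
        if s < merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    return [tuple(m) for m in merged]
```

In use, `noise_threshold` would be called once with the in vivo and in vitro replicate pairs, and the resulting ΔS passed to `differential_regions` together with the replicate-averaged score vectors of the two conditions.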
We visualized RNA structure using VARNA [...]

We retrieved the Coronaviridae sequences from the ViPR database (https://www.viprbrc.org/). We kept only sequences with complete genomes and removed duplicate genome sequences, obtaining 10,852 sequences. To remove redundant sequences, we used CD-HIT (Fu et al., 2012) to remove sequences with a similarity higher than 99%. Finally, 1,367 sequences were retained for downstream analysis.

The full-length SARS-CoV-2 genome was divided into fragments according to the secondary structure model we built, with each fragment being an independent secondary structure. The sequence and structure model of each fragment were used to construct a Stockholm file. The Stockholm file was used to construct a covariance model (.cm file) with cmbuild (from Infernal). The homologous aligned sequences were retrieved from the sequence database (1,367 sequences) with cmsearch. Duplicated sequences in the alignment file were then removed. The remaining sequences were used to build a new covariance model with cmbuild. The new covariance model can be used to search for homologous sequences as described above. This process was repeated at most three times or until no new sequences could be added. We developed this method mainly with reference to Rfam's method for constructing seed alignments (Kalvari et al., 2018b).

The covariation score in the resulting alignment was calculated with reference to RNAalifold (Hofacker, 2007). To summarize, given a multi-sequence alignment file, the covariation score for column i and column j is defined as [...]

Covarying base pairs with a score ranging from 0.4-0.5 were defined as weak covariation, scores ranging from 0.5-0.7 were defined as medium covariation, and scores greater than 0.7 were defined as strong covariation (Figures 2, 3, 4 and S3, S4, S5).
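The covariation classes above can be illustrated with a short Python sketch. The scoring formula itself is missing from this text, so `covariation_score` below is only an assumption: it implements a plain mean pairwise Hamming distance over valid base pairs, in the spirit of the covariance term used by RNAalifold-style methods, and is not necessarily the formula or scale the authors used. The function names, the treatment of the 0.5 boundary, and the "none" label for scores below 0.4 are likewise hypothetical.

```python
from itertools import combinations

# Watson-Crick and G-U wobble pairs (uppercase RNA alphabet assumed).
VALID_PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def covariation_score(aligned_seqs, i, j):
    """Assumed score: mean Hamming distance between the (i, j) base pairs of
    every pair of sequences, counted only when both can form a valid pair."""
    pairs = [seq[i] + seq[j] for seq in aligned_seqs]
    total, n_comparisons = 0, 0
    for p, q in combinations(pairs, 2):
        n_comparisons += 1
        if p in VALID_PAIRS and q in VALID_PAIRS:
            total += (p[0] != q[0]) + (p[1] != q[1])
    return total / n_comparisons if n_comparisons else 0.0


def covariation_class(score):
    """Map a score to the classes in the text: 0.4-0.5 weak, 0.5-0.7 medium,
    above 0.7 strong. Boundary handling is an assumption of this sketch."""
    if score > 0.7:
        return "strong"
    if score >= 0.5:
        return "medium"
    if score >= 0.4:
        return "weak"
    return "none"
```

Because the normalization of the original score is not recoverable here, the thresholds in `covariation_class` should be applied to scores computed with the authors' definition rather than to the output of the simplified `covariation_score`.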
The icSHAPE score of every subgenomic RNA in the TRS-L region was calculated as described above in "Data quality control and icSHAPE score calculation".

The Spearman correlation coefficient and the two-tailed P values were calculated using the Python package function scipy.stats.spearmanr.

For input, the sequences and the icSHAPE data of the SARS-CoV-2 UTRs and flanking regions were split into sliding windows (window size: 101 nt, window step: 20 nt). Input sequences were encoded with one-hot encoding (A, C, G, U; 4 dimensions), and the structural data were encoded as the fifth dimension (icSHAPE values ranging from 0 to 1; 1 dimension). Missing icSHAPE scores (Null) were encoded as "-1".

For each RBP and each sliding window, if the binding probability output by the PrismNet model was larger than 0.85, we defined the sequence window as a predicted binding site of the RBP. Overlapping binding sites for the same protein were merged (Supplementary Table S4).

Raw mass spectrometry data were searched against the human proteome (Uniprot database) with Proteome Discoverer Software. Subsequently, the MiST scoring algorithm was used to calculate the specific binding proteins in SARS-CoV-2 using the default parameters (Jager et al., 2011). We used the threshold (MiST score > 0.7) to confidently obtain interacting host proteins. To further validate the data quality and identified proteins, we compared the total number of identified peptide spectra matched for the protein between replicates (r = 0.97, [...]

[...] pairs. The red dashed boxes label the structural regions with differences compared to Rangan's structural models (Rangan et al., 2020).
See also Figure S1, S2, Table S1, S2

Figure 3. Schematic of the SARS-CoV-2 RNA structure (1nt-394nt and 21473nt-29876nt).
Nucleotides are colored with icSHAPE reactivity scores; blue bars show the probability of base pairing.
Nucleotides with a color background were predicted as co-variant base pairs. The boxplot insets at the bottom show the distributions of icSHAPE reactivity scores. Note that a full-length structure model of the SARS-CoV-2 RNA genome is shown in Figure S3. The start and stop sites of each ORF are labeled with green and yellow colors.
See also Figure S3, [...]

[...] pairs.
See also Figure S4, S5, Table S2

[...] 24h, including a "Scramble" control treated with a non-targeting ASO, a "Not treated" control with no ASO treatment, and a "NC" control treated with an ASO targeting ORF1ab (without a predicted RBP binding site). Data represent the mean ± SEM; n = 3 biological replicates.
n.s.: not significant. ***<0.005, **<0.01 and *<0.05 using one-way ANOVA and post hoc Student's t-test.
See also Table S4.

[...] RBPs predicted to bind the UTRs of SARS-CoV-2, with locations and binding probabilities, Related to Figure 6.
Table S5. Sequence of siRNAs, qPCR primers for siRNA knockdown validation, ASOs, RNA probes for RNA pull down, and primers for coronaviruses in vitro transcription. Related to Figures 4, 6, 7, S4, S5, S6 and S7.
Table S6. Predicted conserved RNA structural elements in the SARS-CoV-2 RNA genome. Related to Figure 3, 7, S3 and S7.

Sun et al. determined the SARS-CoV-2 RNA genome structure in infected cells and from refolded RNAs, which enabled prediction of 42 host proteins that bind to viral RNA using a deep learning tool, and identification of FDA-approved drugs for repurposing to reduce SARS-CoV-2 infection in cells.
We generated in vivo structure maps and models of the SARS-CoV-2 RNA genome

Figure panel labels (graphics and per-nucleotide sequence drawings not included):
UTR structure panels: BtCoV-HKU9 AUC=0.848; BtCoV-HKU5 AUC=0.815; HCoV-HKU1 AUC=0.738; HCoV-NL63 AUC=0.797; BtCoV-HKU9 (Rfam ID: RF03117) AUC=0.764; SARS-CoV AUC=0.632; SARS-CoV-2-T AUC=0.668. Annotations: Start codon; uORF; s2m; SL1, SL2, SL3, SL4, SL5, SL6, SL7; 5'UTR SL1, SL2, SL3, SL4; 3'UTR SL (ORF10).
Covariation legend: Weak covariation; Medium covariation; Strong covariation.
RBP panel: IGF2BP1, ILF3, WDR33, NOP56, KHSRP, HNRNPUL1, HNRNPU, HNRNPK, HNRNPC, CPSF2, UPF1, SND1, MOV10, RTCB, HNRNPA1, CAPRIN1, TIAL1, SAFB2, PTBP1, NPM1, NONO, KHDRBS1, IGF2BP3, FAM120A, EWSR1, U2AF2, TROVE2, TIA1, TARDBP, SBDS, RBM27, RBM22, QKI, HNRNPF, GTF2F1, GNL3, FUS, FBL, DDX42, CSTF2.
Pull-down panel: lanes Control, WT, mut1, rescue1, mut2, rescue2; blots ILF3, GAPDH; mutations G29579C, G29573C, U29568A, U29566A, C29586G, C29592G, A29596U, A29598U.
ASO target panel: regions 6449-6498 and 29502-29541; mut-disrupt = G9463A+U9466C+U9511C; rescue = mut-disrupt + A9472G+C9520U+A9517G.

Zika virus produces noncoding RNAs using a multi-pseudoknot structure that confounds a cellular exonuclease
An in silico map of the SARS-CoV-2 RNA Structurome
RNA STRAND: the RNA secondary structure and statistical analysis database
Hepatitis C virus hijacks P-body and stress granule components around lipid droplets
In Vivo Mapping of Eukaryotic RNA Interactomes Reveals Principles of Higher-Order Organization and Regulation
Innate immune evasion strategies of DNA and RNA viruses
RiboVision suite for visualization and analysis of ribosomes
Trimmomatic: a flexible trimmer for Illumina sequence data
Hierarchy and dynamics of RNA folding
Operon mRNAs are organized into ORF-centric structures that predict translation efficiency
RIC-seq for global in situ profiling of RNA-RNA spatial interactions
Automated de novo prediction of native-like RNA tertiary structures
Overlapping local and long-range RNA-RNA interactions modulate dengue virus genome cyclization and replication
Accurate SHAPE-directed RNA structure determination
Pervasive tertiary structure in the dengue virus RNA genome
STAR: ultrafast universal RNA-seq aligner
An interactive web-based dashboard to track COVID-19 in real time
Interaction of TIA-1/TIAR with West Nile and dengue virus products in infected cells interferes with stress granule formation and processing body assembly
Biochemical aspects of coronavirus replication and virus-host interaction
The crystal structure of UUCG tetraloop
Dengue virus genomic variation associated with mosquito adaptation defines the pattern of viral non-coding RNAs and fitness in human cells
The coding capacity of SARS-CoV-2
Structural and mechanistic insights into hepatitis C viral translation initiation
CD-HIT: accelerated for clustering the next-generation sequencing data
Structure of the RNA-dependent RNA polymerase from COVID-19 virus
Characterization of the RNA components of a putative molecular switch in the 3' untranslated region of the murine coronavirus genome
A SARS-CoV-2 protein interaction map reveals targets for drug repurposing
The Molecular Modeling Toolkit: A New Approach to Molecular Simulations
RNA consensus structure prediction with RNAalifold
SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor
POSTAR: a platform for exploring post-transcriptional regulation coordinated by RNA-binding proteins
Recognition of RNA N(6)-methyladenosine by IGF2BP proteins enhances mRNA stability and translation
Global landscape of HIV-human protein complexes
A novel cell culture system modeling the SARS-CoV-2 life cycle
Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families
Non-Coding RNA Analysis Using the Rfam Database
Viral IRES RNA structures and ribosome interactions
The Architecture of SARS-CoV-2 Transcriptome
SARS-CoV-2 structure and replication characterized by in situ cryo-electron tomography
Spike: evidence that D614G increases infectivity of the COVID-19 virus
RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference
Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor
Fast gapped-read alignment with Bowtie 2
Macromolecular modeling and design in Rosetta: recent methods and frameworks
Structural lability in stem-loop 1 drives a 5' UTR-3' UTR interaction in coronavirus replication
icSHAPE-pipe: A comprehensive toolkit for icSHAPE data analysis and evaluation