key: cord-0801594-h8u16436 authors: Yang, Shaomin; Zhou, Hong; Cruz-Cosme, Ruth; Liu, Mingde; Xu, Jiayu; Niu, Xiaoyu; Li, Yaolan; Xiao, Lizu; Wang, Qiuhong; Zhu, Hua; Tang, Qiyi title: Circular RNA profiling reveals abundant and diverse circRNAs of SARS-CoV-2, SARS-CoV and MERS-CoV origin date: 2020-12-08 journal: bioRxiv DOI: 10.1101/2020.12.07.415422 sha: 42968a739f0b7a1c16ebc1439acccccadc805d2e doc_id: 801594 cord_uid: h8u16436 Circular RNAs (circRNAs) encoded by DNA genomes have been identified across host and pathogen species as parts of the transcriptome. Accumulating evidences indicate that circRNAs play critical roles in autoimmune diseases and viral pathogenesis. Here we report that RNA viruses of the Betacoronavirus genus of Coronaviridae, SARS-CoV-2, SARS-CoV and MERS-CoV, encode a novel type of circRNAs. Through de novo circRNA analyses of publicly available coronavirus-infection related deep RNA-Sequencing data, we identified 351, 224 and 2,764 circRNAs derived from SARS-CoV-2, SARS-CoV and MERS-CoV, respectively, and characterized two major back-splice events shared by these viruses. Coronavirus-derived circRNAs are more abundant and longer compared to host genome-derived circRNAs. Using a systematic strategy to amplify and identify back-splice junction sequences, we experimentally identified over 100 viral circRNAs from SARS-CoV-2 infected Vero E6 cells. This collection of circRNAs provided the first line of evidence for the abundance and diversity of coronavirus-derived circRNAs and suggested possible mechanisms driving circRNA biogenesis from RNA genomes. Our findings highlight circRNAs as an important component of the coronavirus transcriptome. Summary We report for the first time that abundant and diverse circRNAs are generated by SARS-CoV-2, SARS-CoV and MERS-CoV and represent a novel type of circRNAs that differ from circRNAs encoded by DNA genomes. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a single strand and positive 25 sense RNA virus and belongs to the Betacoronavirus genus of the family of Coronaviridae (CoVs). 26 It is responsible for the ongoing global pandemic of COVID-19. SARS-CoV-2 shares ~80% 27 homology with severe acute respiratory syndrome coronavirus (SARS-CoV) and is more closely CoVs use an RNA-dependent RNA polymerase (RdRp) to generate genomic RNA and 75 sgRNA transcripts in the cytoplasm of host cells. We thus reasoned that CoV circRNAs, if existed, 76 are likely to circularize independent of splicing, which occurs in the nucleus. Several circRNA 77 prediction algorithms have been developed to identify BSJ reads from RNA-Seq data and to 78 predict the 5' and 3' breakpoints (23). CIRI2 (23) is the only tool that adopts an MLE-based 79 algorithm to unbiasedly identify back-splice junction (BSJ) reads independent of a circRNA 80 reference annotation file. It is more sensitive and accurate than two other de novo circRNA 81 identification tools (23). Therefore, we used the recommended CIRI2 pipeline (24) to perform de 82 novo circRNA discovery and assembly. 83 To improve the assembly accuracy and to simplify follow-up comparison, we combined 84 reads of biological triplicates into single datasets. After mapping with BWA-MEM (25), we 85 obtained 1,216,403,242 total reads from the SARS-CoV-2 dataset with 36.6% mapped to SARS-86 CoV-2. The MERS-CoV dataset had a similar percentage (30.2% of 316,893,928 total reads) 87 mapped to the viral genome. And 87.0% of the 1,127,121,362 total reads from the SARS-CoV 88 dataset was mapped to SARS-CoV. The SARS-CoV-2 and SARS-CoV datasets showed sharp 89 peaks at the 5' leader sequence and high coverage towards the 3' end of the genome ( Figure 1A 90 and 1B). Genome coverage of the MERS-CoV dataset was substantially lower due to the removal 91 of linear RNAs by RNase R (Figure1A and 1B) . We observed above-threshold coverage in the last 92 5,000 nucleotides (nt) of the MERS-CoV genome, corresponding to E, N, ORF8b and the 3'UTR. 93 CIRI2 identifies circRNAs by aligning chimeric reads to the 3' donor sequence and the 5' 94 acceptor sequence and determining the exact breakpoints of the BSJ ( Figure 1C ). By this definition, To examine the circRNA landscape, we mapped all identified circRNAs by the 5' and 3' 108 breakpoints of the BSJs to their respective genomic locations and estimated the back-splicing 109 frequency by counting the reads spanning the BSJs ( Figure 1D -1F). We identified two major types 110 of back-splicing events shared by all three CoVs: 1) long-distance back-splicing between the 3' 111 end of the genome and the 5' end of the genomes; 2) local back-splicing in regions corresponding 112 to the N gene of SARS-CoV-2 and SARS-CoV and the 3'UTR of MERS-CoV). We also noticed 113 back-splicing events that specifically occur in SARS-CoV-2 or MERS-CoV. Local back-splicing 114 around position 1500-2500 (Nsp2), 5500-6500nt (Nsp3) and 22000-23000nt (S) of the MERS-115 CoV genome occurred at high frequency ( Figure 1F ), whereas middle-distance back-splicing from 116 SARS-CoV-2 genomic region 7501-8000 (Nsp3) to 1-500 (5'UTR) and from 27501-28000 117 (ORF7a/ORF7b) to 22001-22500nt (S) was observed at high frequency ( Figure 1D ). 118 Next, we performed de novo reconstruction and quantification of full-length SARS-CoV-119 2, SARS-CoV and MERS-CoV circRNAs using the CIRI-full (24) algorithm. We got 300 120 reconstructed SARS-CoV-2 circRNAs, of which 127 (42.3%) were full-length. Of 201 assembled 121 SARS-CoV circRNAs, 122 (60.7%) were full-length. We also got 1,024 reconstructed MERS-122 CoV circRNAs, with 81.6% were fully assembled, suggesting that RNase R treatment improves 123 circRNA reconstruction. De novo assembly of host circRNAs resulted in 4,815 (49.9%) full-length 124 monkey circRNAs and 31,808 (100%) full-length human circRNAs. 125 Furthermore, we compared the features of circRNAs derived from CoVs with those from 126 the host genomes. The length of nuclear genome-derived circRNAs (nu-circRNAs) is highly 127 conserved across species with the majority ranging from 250 to 500 nt (24). We observed similar 128 length distribution in full-length monkey and human genome-derived circRNAs ( Figure 2A ). CoV 129 circRNAs shared a different length distribution pattern ( Figure 2B ). The average length of SARS-130 CoV-2 and MERS-CoV circRNAs was over 150 nt longer than that of the host circRNAs ( Figure 131 2A and 2B). And more SARS-CoV-2 and MERS-CoV circRNAs were over 1,000 nt long whereas 132 host circRNAs are rarely over 750 nt in length. Since CoV have both positive and negative 133 genomic and subgenomic RNAs, we examined the strandness of CoV circRNAs. CircRNAs 134 generated by both host genomes showed no strand preference (Vero: 51.9% positive-stranded; 135 Calu-3: 51.0% positive-stranded). In contrast, 59.5% of SARS-CoV-2 circRNAs, 56.3% of SARS-136 CoV circRNAs, and 85.1% of MERS-CoV circRNAs were negative-stranded ( Figure 2A ). This 137 result suggests that CoV circRNAs have a preference for negative strand. Additionally, only 1 FSJ could be detected in predicted full-length CoV circRNAs, whereas about 145 50% of host circRNAs had at least 2 FSJs ( Figure 2D ). Next, we looked for predicted full-length 146 CoV circRNAs that share the same BSJ breakpoints but differ in length. We found that MERS-147 CoV circRNA 1262|29148 produces two isoforms, both of which contain one FSJ. The longer 148 isoform (1,051nt) has the FSJ 2223|29060, whereas the shorter isoform (155nt) has the FSJ 149 1316|29049. This result shows that very few CoV circRNAs could have isoforms. 150 In conclusion, we analyzed SARS-CoV-2, SARS-CoV and MERS-CoV related deep RNA-151 Seq datasets, and identified a large amount of CoV circRNAs. The circRNAs of CoV origin have 152 features in common and can be distinguished from circRNAs derived from the human and monkey 153 host genomes. We have shown that CoV circRNAs are expressed at higher level and longer in 154 length than host circRNAs and tends to be negative stranded. We identified BSJ hotspots for 155 circRNAs derived from each CoV, and found that distant back-splicing from the tail of the genome 156 to the head of the genome and local back-splicing in regions corresponding to the N gene and the 157 3'UTR occur at the highest frequency. 158 159 We extracted total RNA from Vero E6 cells mock-treated or infected with SARS-CoV-2 at 24 hpi. 161 Forward and reverse divergent primers were designed to maximize the chances of amplifying BSJ 162 sequences ( Figure 3A and 3B). To validate the two major back-splicing events, we performed 163 inverse RT-PCR with primer pairs that targeting either the distant BSJ hotspot 29001-29903|1~500 164 or the local BSJ hotspots 28501~29500|27501~28500 ( Figure S2A -S2C). We also performed 165 inverse RT-PCR with divergent primer sets targeting the most abundant SARS-CoV-2 circRNAs 166 predicted by CIRI2 ( Figure 3C ). Majority of the inverse RT-PCR reactions using the infected 167 sample as template resulted in products ranging from 200bp to 800bp, whereas no amplification 168 was seen from the mock samples. Notably, many candidate inverse RT-PCR products were more 169 abundant than that of circHIPK3, a known highly expressed human circRNA that served as a 170 positive control ( Figure 3C , S2A and S2B). We gel-purified candidate PCR products based on the 171 size, subcloned by TA cloning, and Sanger-sequenced at least 8 colonies for each candidate BSJ 172 sequence. The sequencing results revealed the surprising diversity of SARS-CoV-2 circRNAs and 173 support our predictions from the bioinformatic analyses. First, all gel-purified bands represent 174 more than one PCR product of the same size. While highly expressed circRNAs, such as 175 29194|27797 and 28853|28467, represent over 50% of the confirmed clones (29194|27797: 5/7 176 with 29083-F and 27893-R; 28853|28467: 4/8 with 28809-F and 28494-R; Figure 3D and 3F), 177 most other purified bands contain a variety of circRNAs (data not shown). Secondly, we confirmed 178 that the breakpoints of a given circRNA is surprisingly flexible. For example, PCR products 179 amplified by 29668-F/29572-F and 51-R contain a distant BSJ. However, the 3' breakpoint ranges 180 from genomic location 29,080nt to 29,767nt, and the 5' breakpoint was between genomic location 181 7nt and 19nt ( Figure S3B ). When a deviation of 10nt was considered for the breakpoints, the 182 predicted BSJ 29758|8 represent 8 out of the 13 BSJs confirmed by sequencing. Thirdly, both the 183 distant and the local back-splicing events were validated by multiple BSJs. We detected distant 184 fusion from ORF6, N, ORF10 and the 3'UTR to the 5' UTR (data not shown). We also detected 185 local fusion within N, and from N to ORF7a, ORF7b, and ORF8 (data not shown). In summary, 186 our RT-PCR and sequencing results validated the diversity of SARS2 circRNAs at the genome 187 level and at the circRNA level. confirmed that the viral titer was comparable among the infected samples ( Figure S3A ). We found 200 that the bands (red arrowheads) corresponding to circRNA 29122|28295 were strong in all the 201 samples except for infected-24hpi-rep2, which is still detectable but significantly lower (Figure 202 3I). Interestingly, we found that the abundance of others candidate BSJ products (green arrows) 203 amplified by these primer sets was different between 8hpi and 24hpi samples. This result suggests 204 that circRNA expression level and pattern could change over the course of infection. 205 We also confirmed a few features of CoV circRNAs characterized bioinformatically. First, 206 we detected a variety of FSJs in SARS-CoV-2 circRNAs. The major type of FSJ was accompanied 207 with a long-distance back-splicing to the 5'UTR to create sgRNA-like circRNAs. We found 5 208 circRNAs that contained FSJ 75|28266 and 4 circRNAs that contained FSJ 76|26480 (data not 209 shown), suggesting TRS-mediated fusion of the leader sequence with N and M gene, respectively. 210 Interestingly, the BSJs in sgRNA-like circRNAs were more flexible. The 3' breakpoints ranges 211 from 28465 to 2927, and the 5' breakpoint ranges from 3 to 40 ( Figure S3B ). It is likely that these 212 circRNAs used sgRNAs as template for synthesis. We also detected FSJs that represent Two circRNAs with unexpected repetitive back-splicing caught our attention. One had two 222 different distant back-splicing events (28465|40 and 28526|1) followed by the same TRS-L 223 dependent fusion, 75|28266 ( Figure S3C ). The other had two rounds of fusion from 28465 to 28320 224 followed by a third fusion from 28467 to 28282 ( Figure S3D ). Since the BSJs within the same 225 circRNAs were slightly different, it is unlikely to be an artifact of the rolling-cycle amplification 226 of circRNAs by RT. These two cases suggest that SARS-CoV-2 circRNAs form BSJs independent 227 of splicing. It is likely that SARS-CoV-2 circRNA are generated through the template-switching 228 mechanism that drives the formation of discontinuous transcripts. In support of this hypothesis, 229 we found that the upstream sequences of the acceptors were homologous to the donor sequence 230 ( Figure 3D-H, data not shown) . TRS-dependent FSJs in SARS-CoV-2 circRNAs had 11-12 231 homologous nucleotides between the leader and the body sequence. Also, BSJs with 3-6 232 nucleotides homology around the breakpoint was frequently observed. 233 In conclusion, we have demonstrated that SARS-CoV-2 produces a surprising diversity of 234 circRNAs that are abundantly present in the infected Vero E6 cells. CoV-2 by systematic capturing and identifying viral circRNAs produced from the predicted BSJ 249 We bioinformatically identified 351, 224 and 2,764 circRNAs derived from SARS-CoV-251 2, SARS-CoV and MERS-CoV, respectively ( Figure 1D-1F) , and experimentally identified more 252 than 100 SARS-CoV-2 circRNAs (data not shown). Comparing the BSJ landscapes and 253 frequency among SARS-CoV-2, SARS-CoV and MERS-CoV revealed two major circularization 254 events shared by all the three CoVs: 1) distant fusion between RNA located at the tail and the 255 head of the genome; 2) local fusion in the conserved N gene ( Figure 1D-1F) . These events were 256 confirmed by experimentally identified circRNAs ( Figure 3C-H and S3B) . What distinguishes 257 CoV circRNAs from host circRNAs are the expression level ( Figure S1F ), the length ( Figure 2A 258 and 2B), the strand preference ( Figure 2C) , and the circRNA exon number ( Figure 2D) . 259 The collection of experimentally identified SARS-CoV-2 circRNAs further distinguishes 260 CoV circRNAs from Nu-circRNAs. First, we observed striking flexibility in the breakpoints of 261 SARS-CoV-2 circRNAs. Analysis of sequences around the 3' and 5' breakpoints of 262 experimentally identified SARS-CoV circRNAs suggest that homology-mediated inaccurate 263 fusion drives the back-splicing event (data not shown), whereas nu-circRNAs tend to splice 264 accurately on the AGGT splicing signal. Secondly, we found two cases where multiple back-265 splicing events occurred in the same circRNAs ( Figure S3C and S3D) , suggesting back-splicing 266 occurs as the RNA is synthesized. It further suggests that the RNA configuration could create 267 BSJ hotspots that enable repetitive back-splicing. 268 As we wrote this manuscript, another group reported the first bioinformatic identification 269 of circRNAs in SARS-CoV-2, SARS-CoV and MERS-CoV (27). Interestingly, they came to 270 several opposing conclusions about CoV circRNAs, including the abundance, the strandness and 271 the expression level. It is likely due to the datasets they used and the circRNA analysis pipeline 272 and strategy they adopted. First, we chose SARS-CoV-2 and SARS-CoV datasets with higher 273 sequencing depth and pooled biological triplicates before the analysis. As a result, we identified 274 240 circRNAs shared by CIRI2 and finc_circ ( Figure S1E ), twice the number they found. Since 275 CoV circRNA does not form BSJs through splicing, AG|GT signal-base algorithms are likely to 276 have an extreme high false discovery rate, which could lead to their opposing conclusion on 277 strand-preference. Secondly, we chose BSJ-spanning read counts as the indication of abundance 278 and made comparison between the host and the viral circRNAs of the same dataset. We have 279 shown that many CoV circRNAs were spliced tail-to-head. Using transcript per million (TPM) as 280 the index would greatly underestimate the abundance of CoV circRNAs. Similarly, they 281 considered the span between the 5' and 3' breakpoints of the BSJ is the length of the circRNA, 282 assuming that CoV circRNAs do not have FSJs, is an unreasonable way to analyze the data. For 283 our analysis, we only quantified fully assembled circRNAs predicted by CIRI2-full, rendering 284 our length analysis more reliable. Lastly, the group claimed that the number of circRNA 285 identified by their pipeline increased over the course of infection. However, our experimental 286 results suggest that the most abundant SARS-CoV-2 circRNA, 29122|28295, was highly 287 expressed at 8 hpi and was likely to down-regulated at 24 hpi ( Figure 3I ). Considering the 288 flexibility of circRNA BSJs, we have observed experimentally and the inaccuracy of 289 bioinformatic algorithms in calling circRNAs. We believe using a systematic approach to 290 examine circRNA expression diversity and abundance at different stages of infection is needed 291 before any conclusion could be drawn. Bands indicated by red arrows were gel-purified and sequenced. Note the intensity of most 347 candidate BSJs were comparable to that of the positive control, circHIPK3 of host origin. 348 Infection also enhanced the expression of circHIPK3. (D-H The SARS-CoV-2 infection experiment was performed in BSL3 labs as described previously (35). Table S1 . PCR was performed with GoTaq Master Mix (Promega) with 1ul cDNA template 432 at 1:20 dilution. Following agarose gel (2%) electrophoresis, candidate circRNA PCR products were size-433 selected and gel-purified (Gel purification kit, Zymo) and subcloned with TA cloning kit (ThermoFisher). At least 8 colonies were checked for insertion of candidate PCR products by PCR with M13 universal 435 primers. Amplified insertions were PCR purified (DNA purification kit, Zymo) and subjected to Sanger A pneumonia outbreak associated with a new coronavirus of probable bat 443 origin Study Group of the International Committee on Taxonomy of, The 445 species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV 446 and naming it SARS-CoV-2 Continuous and Discontinuous RNA Synthesis 448 in Coronaviruses The Architecture of SARS-CoV-2 Transcriptome Circular RNA and its mechanisms in disease: From the bench to 452 the clinic Circular RNAs are abundant, conserved, and associated with ALU repeats The Biogenesis, Functions, and Challenges of Circular RNAs CircHIPK3 sponges miR-558 to suppress heparanase expression in bladder 458 cancer cells The Landscape of Circular RNA in Cancer Intrathecal circHIPK3 shRNA alleviates neuropathic 461 pain in diabetic rats Roles of circular RNAs in immune regulation and 464 autoimmune diseases Epstein-Barr virus-derived circular RNA LMP2A induces stemness in EBV-466 associated gastric cancer Identification of virus-encoded circular RNA The Epstein Barr virus circRNAome Circular DNA tumor viruses make circular RNAs Discovery of Kaposi's sarcoma herpesvirus-encoded circular RNAs and a 474 human antiviral circular RNA Kaposi's Sarcoma-Associated Herpesvirus-Encoded circRNAs Are 476 Expressed in Infected Tumor Tissues and Are Incorporated into Virions Transforming activity of an oncoprotein-encoding circular RNA from human 478 papillomavirus circBase: a database for circular RNAs Comparative tropism, replication kinetics, and cell damage profiling of SARS-482 CoV-2 and SARS-CoV with implications for clinical manifestations, transmissibility, and 483 laboratory studies of COVID-19: an observational study Discovery of SARS-CoV-2 antiviral drugs through large-scale compound 485 repurposing Competing endogenous RNA network profiling reveals novel host 487 dependency factors required for MERS-CoV propagation Circular RNA identification based on multiple seed matching Reconstruction of full-length circular RNAs enables 492 isoform-level quantification Fast and accurate short read alignment with Burrows-Wheeler transform The hepatitis delta 496 (delta) virus possesses a circular RNA Heping Zheng, Yousong Peng, Identification and 499 characterization of circRNAs encoded by MERS-CoV, SARS-CoV-1 and SARS-CoV-2. 500 Transcriptomic analysis reveals novel mechanisms of SARS-CoV-2 infection 502 in human lung cells Fast gapped-read alignment with Bowtie 2 Qualimap 2: advanced multi-sample 506 quality control for high-throughput sequencing data Circular RNAs are a large class of animal RNAs with regulatory potency. 508 Application of ggplot2 to Pharmacometric Graphics Formation of DNA replication structures in herpes virus-512 infected cells requires a viral DNA binding protein A systemic and molecular study of subcellular localization of SARS-CoV-2 514 proteins Growth, detection, 516 quantification, and inactivation of SARS-CoV-2 Detection and Analysis of Circular RNAs by RT-PCR