key: cord-1043957-16npbwit
authors: Gao, Shan; Cheng, Zhi; Jin, Xiufeng; Wang, Fang; Xuan, Yibo; Zhou, Hao; Liu, Chang; Ruan, Jishou; Duan, Guangyou; Li, Xin
title: A negative feedback model to explain regulation of SARS-CoV-2 replication and transcription
date: 2020-08-29
journal: bioRxiv
DOI: 10.1101/2020.08.23.263327
sha: e318dab40db0f2e45437716634226882b0c5a769
doc_id: 1043957
cord_uid: 16npbwit

Background: Coronavirus disease 2019 (COVID-19) is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Although a preliminary understanding of the replication and transcription mechanisms of SARS-CoV-2 has recently emerged, their regulation remains unclear.

Results: Based on reanalysis of public data, we propose a negative feedback model to explain the regulation of replication and transcription in, but not limited to, SARS-CoV-2. The key step leading to new discoveries was the identification of the cleavage sites of nsp15, an RNA uridylate-specific endoribonuclease encoded by CoVs. According to this model, nsp15 regulates the synthesis of subgenomic RNAs (sgRNAs) and genomic RNAs (gRNAs) by cleaving transcription regulatory sequences in the body. The expression level of nsp15 determines the relative proportions of sgRNAs and gRNAs, which in turn change the expression level of nsp15 to reach equilibrium between the replication and transcription of CoVs.

Conclusions: The replication and transcription of CoVs are regulated by a negative feedback mechanism that influences the persistence of CoVs in hosts. Our findings enrich fundamental knowledge in the field of gene expression and its regulation, and provide new clues for future studies. One important clue is that nsp15 may be an important and ideal target for the development of drugs (e.g. uridine derivatives) against CoVs.

Coronavirus disease 2019 is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
As enveloped viruses with positive-sense, single-stranded RNA genomes, CoVs have the largest genomes (26-32 kb) among all RNA virus families. SARS-CoV-2 has a genome of ~30 kb [1]. In addition to ORF1a and 1b (Figure 1A), the SARS-CoV-2 genome has sequences encoding four conserved structural proteins [spike protein (S), envelope protein (E), membrane protein (M), and nucleocapsid protein (N)] and six accessory proteins (ORF3a, 6, 7a, 7b, 8, and 10) that are yet to be experimentally verified.

ORF3a, 6, 7a, 7b, 8 and 10) of SARS-CoV-2 has been reported in a study [2] that directly validates the prevailing "leader-to-body fusion" model using Nanopore RNA-seq, a direct RNA sequencing method [3]. In the model validated in that study (Figure 1A), replication and transcription require gRNA(+) as the template for the synthesis of antisense genomic RNAs [gRNAs(-)] and antisense subgenomic RNAs [sgRNAs(-)] by RNA-dependent RNA polymerase (RdRP). When RdRP pauses as it crosses a body transcription regulatory sequence (TRS-B) and switches the template to the leader TRS (TRS-L), sgRNAs(-) are formed. Otherwise, RdRP reads gRNAs(+) continuously, without interruption, resulting in gRNAs(-). Thereafter, gRNAs(-) and sgRNAs(-) are used as templates to synthesize gRNAs(+) and sgRNAs(+), respectively; sgRNAs(+) are used as templates for the translation of the 10 proteins mentioned above.

TRS-L usually forms the first 60-70 nts of the 5' UTR in a CoV genome. TRS-Bs of varied length are located immediately upstream of ORFs except ORF1a and 1b (Figure 1A). During antisense-strand synthesis, the discontinuous transcription (referred to as polymerase jumping or template switching) by RdRP is described by the "leader-to-body fusion" model that is conserved in the entire nidovirus order, including arteriviruses, toroviruses, roniviruses and animal coronaviruses.
Although previous studies [4] investigated the underlying mechanisms, the molecular basis remains unknown. In the present study, we aimed to determine the molecular basis of the "leader-to-body fusion" model and construct a model to explain the regulation of SARS-CoV-2 replication and transcription.

Using public Nanopore RNA-seq data (Materials and Methods), 575,106 out of 879,679 reads from a SARS-CoV-2-infected Vero cell sample were aligned to the SARS-CoV-2 reference genome. Among all aligned reads, 575,106 sense reads represented gRNAs(+) or sgRNAs(+), while 30 antisense reads represented gRNAs(-) or sgRNAs(-). This exceedingly high ratio (575,106 vs. 30) between sense and antisense reads may be the result of significant differences between the gRNAs(+)/sgRNAs(+) and gRNAs(-)/sgRNAs(-) degradation efficiencies. This phenomenon, however, was not reported in the previous study [2]. Another explanation for the high ratio is that gRNAs(+)/sgRNAs(+) are protected by binding to the N proteins. Another extremely high ratio (198,198,542 vs. 11,820,438) between contiguous and junction-spanning reads was reported in that previous study [2]. This suggests that there are significant differences between gRNAs(+)/gRNAs(-) and sgRNAs(+)/sgRNAs(-).

By reanalysis of junction-spanning reads, we found that TRS-Bs share ~12 nt junction regions with TRS-Ls in SARS-CoV-2 (Figure 1A). Most junction regions in eight protein-coding genes (S, E, M, N, ORF3a, 6, 7a and 8) showed that ACGAAC is a highly conserved sequence among their genomes, particularly in their putative TRS-Bs. This suggests that canonical junction regions contain a specific motif for enzyme recognition.
Since all CoVs and even viruses in the entire nidovirus order adhere to the "leader-to-body fusion" model [2], this enzyme or these enzymes should be encoded by the ORF1a or 1b gene, given their likelihood to be translated. After analysing 16 non-structural proteins (nsp1-16) encoded by the endoribonuclease, NendoU [5]) is most likely to function in these junction regions, given that a homolog of nsp15 has cleavage sites containing "GU" [6]. Thus, the cleavage site of nsp15 was identified to follow the motif "GTTCGT|N" (the vertical line indicates the breakpoint and N indicates any nucleotide base), read in the antisense strands of CoV genomes. Furthermore, we found that almost all the genomic sites containing the motif "GTTCGT" have polyT (not less than three T) at the tail, which ensures the presence of at least one U for nsp15 cleavage.

Upon searching for "GTTCGT" in the genomes of betacoronavirus subgroup B, the occurrence of "GTTCGT" on the antisense strand was found to be more than 1.6 times that on the sense strand. In particular, "GTTCGT" occurred 3 and 9 times (Table 1) on the sense and antisense strands of the SARS-CoV-2 genome, respectively. These findings suggest that the basic function of nsp15 involves the degradation of gRNAs and sgRNAs and that the high ratio between sense and antisense reads (see above) results from substantially more cleavage of gRNAs(-)/sgRNAs(-) than that of gRNAs(+)/sgRNAs(+). Among the three sites containing "GTTCGT" on the sense strand of the SARS-CoV-2 genome (referred to as internal cleavage sites, ICSs), one is located in the coding sequence (CDS) of RdRP (nsp12), while the other two are located in the ORF8 gene (see below).
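
The strand-wise motif search described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names and the toy sequence are hypothetical and are not the pipeline or data used in this study, and reproducing the counts in Table 1 would require the actual reference genome.

```python
from typing import Tuple

# Illustrative sketch only: the helper names and the toy sequence below are
# hypothetical and are not taken from the analysis pipeline of this study.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_motif(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlapping matches."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def strand_counts(sense: str, motif: str = "GTTCGT") -> Tuple[int, int]:
    """Return (sense-strand count, antisense-strand count) for motif."""
    sense = sense.upper()
    motif = motif.upper()
    antisense = reverse_complement(sense)
    return count_motif(sense, motif), count_motif(antisense, motif)


# Toy sense-strand sequence (an assumption, not real viral sequence).
toy_sense = "AAGTTCGTTTTACGAACAAACGAACTT"
sense_hits, antisense_hits = strand_counts(toy_sense)
print(sense_hits, antisense_hits)
```

Counting "GTTCGT" on the antisense strand is equivalent to counting its reverse complement, "ACGAAC", on the sense strand, which is why the sketch needs only the sense-strand sequence as input.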
The two ICSs in ORF8 are also present in ORF8 of most SARS-CoV-2, SARS2-like CoV and SARS-like CoV genomes; however, they are absent in the genomes of SARS-CoVs obtained from humans (GenBank: AY274119 and AY278489) and

In total, three types of plasmids containing EGFP reporter genes, named pEGFP-C1, pSARS, and pCoV-ba (betacoronavirus subgroup A), were used in the experiments (Figure 2A). The plasmid pEGFP-C1 was used as a control, given that it contains 17-nt sequences encoding the first hairpin from Cytomegalovirus (CMV). Two types of plasmids preceded by 30- and 29-nt inserts were used to evaluate their factors that may exert influence at the replication or transcription level, we performed the following experiments: (1) using HEK293 cells to rule out the possible influence of differences in plasmid copy number, since all three types of plasmids containing the SV40 origins can be replicated to a copy number of between 400 and 1000 plasmids per cell within HEK293T; and (2) using qPCR to rule out the possible influence of differential transcription (Supplementary 1).

In the present study, we determined the molecular basis of the "leader- preliminary experiments, we concluded that the hairpin in pSARS resulted in the over-expression of EGFP, which caused cell death. These facts support the view that negative feedback is necessary to prevent the over-expression of viral genes.

Our findings enrich fundamental knowledge in the field of gene expression and its regulation, and provide new clues for future studies. The template switching ability and the high ratio between contiguous and junction-spanning reads indicated that RdRP (nsp12)

The 5' and 3' ends of gRNAs(+) and sgRNAs(+) were observed and double-checked using the software Tablet v1.15.09.01 [11].
In the present study, three types of plasmids, pEGFP-C1 (maintained in the lab of Hao Zhou), pSARS and pCoV-ba, and three types of cells (HeLa, HEK293 and HEK293T) maintained in our lab were used for transfection. To construct pSARS, pEGFP-C1 was PCR amplified using primers fVR and rRBS2 (Supplementary 1). Then the linear PCR product was self-ligated into a plasmid by homologous recombination technology using the ClonExpress II One Step Cloning Kit (Vazyme Biotech, China). Following the same procedure, primers fVR and rRBS3 (Supplementary 1) were used to construct pCoV-ba. The cells were cultured in Dulbecco's Modified Eagle Medium (DMEM) supplemented with 10% fetal bovine serum. For each experiment, 100,000 cells were seeded into one well of a 6-well plate for plasmid transfection. After 12 hours (0 hour in Figure 2B), transfection of 1 μg of plasmid into one well was performed using 3 μL PolyJet (SignaGen Laboratories, USA), following the manufacturer's instructions. The medium was changed at 12 hours after plasmid transfection. MTT (200 μl, 5 mg/ml) and 1.8 ml medium were added to each well and cultured at 37 °C for 4 h. Then, the medium was removed from each well, and the cells were washed with PBS and mixed with 1000 μl DMSO to dissolve the formazan product. Finally, formazan absorbance was measured by a microplate reader at a wavelength of 492 nm (Thermo Labsystems, Helsinki, Finland). LDH experiments were performed using an LDH cytotoxicity assay detection kit (Beyotime, China), following the manufacturer's instructions.

We are grateful for the help from the following faculty members of the College of Life Sciences at Nankai University: Xuetao Cao, Deling Kong, Quan Chen, Wenjun Bu, Ting Ma, Tao Zhang, Dawei Huang, Mingqiang Qiao, Yanqiang Liu, Qiang Zhao, Bingjun He and Zhen Ye.
This manuscript was online as a preprint on Aug 24, 2020 at https://www.biorxiv.org/content/10.1101/2020.08.23.263327v1.

Recombination, Reservoirs, and the Modular Spike: Mechanisms of Coronavirus Cross-Species Transmission
The Architecture of SARS-CoV-2 Transcriptome
Using pan RNA-seq analysis to reveal the ubiquitous existence of 5' and 3' end small RNAs
Sequence Motifs Involved in the Regulation of Discontinuous Coronavirus Subgenomic RNA Synthesis
Crystal structure of Nsp15 endoribonuclease NendoU from SARS-CoV-2
Biochemical Characterization of Arterivirus Nonstructural Protein 11 Reveals the Nidovirus-Wide Conservation of a
Barcode of the 2019 Novel Coronavirus Leads to Insights into Its Virulence
Fastq_clean: An optimized pipeline to clean the Illumina sequencing data with quality control. 2014 IEEE International Conference on
ggplot2: elegant graphics for data analysis
Using Tablet for visual exploration of second-generation sequencing data

Additional file 1: Figure S1. Comparison of fluorescent brightness.