key: cord-0952132-lkhad2ex authors: Zou, Wei; Xiong, Min; Hao, Siyuan; Zhang-Chen, Elizabeth Yan; Baumlin, Nathalie; Kim, Michael D.; Salathe, Matthias; Yan, Ziying; Qiu, Jianming title: The SARS-CoV-2 transcriptome and the dynamics of the S gene furin cleavage site in primary human airway epithelia date: 2021-02-04 journal: bioRxiv DOI: 10.1101/2021.02.03.429670 sha: 47c49f8969431898baaf3894a6fa3c334e859e45 doc_id: 952132 cord_uid: lkhad2ex The novel severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) caused the devastating ongoing coronavirus disease-2019 (COVID-19) pandemic which poses a great threat to global public health. The spike (S) polypeptide of SARS-CoV-2 consists of the S1 and S2 subunits and is processed by cellular proteases at the S1/S2 boundary. The inclusion of the 4 amino acids (PRRA) at the S1/S2 boundary forms a furin cleavage site (FCS), 682RRAR↓S686, distinguishing SARS-CoV-2 from its closest relative, the SARS-CoV. Various deletions surrounding the FCS have been identified in patients. When SARS-CoV-2 propagated in Vero cells, the virus acquired various deletions surrounding the FCS. In the present study, we studied the viral transcriptome in SARS-CoV-2 infected primary human airway epithelia (HAE) cultured at an air-liquid interface (ALI) with an emphasis on the viral genome stability at the S1/S2 boundary using RNA-seq. While we found overall the viral transcriptome is similar to that generated from infected Vero cells, we identified a high percentage of mutated viral genome and transcripts in HAE-ALI. Two highly frequent deletions were found at the S1/S2 boundary of the S gene: one is a deletion of 12 amino acids, 678TNSPRRAR↓SVAS689, which contains the FCS, another is a deletion of 5 amino acids, 675QTQTN679, which is two amino acids upstream of the FCS. Further studies on the dynamics of the FCS deletions in apically released virions revealed that the selective pressure for the FCS maintains the S gene stability in HAE-ALI but with exceptions, in which the FCS deletions are remained at a high rate. Thus, our study presents evidence for the role of unique properties of human airway epithelia in the dynamics of the FCS region during infection of human airways, which is donor-dependent. In this study, we used RNA-seq to analyze the viral transcriptome of SARS-CoV-2 in the 134 infected human airway epithelia (HAE) cultured at an air-liquid interface (HAE-ALI), which 135 mimics natural viral infection of human airways (35, 36) . While the viral transcriptome overall 136 recapitulated that in Vero cells, we discovered that there is a selective pressure in HAE-ALI to 137 suppress the deletions at the S1/S2 boundary and that this pressure appears individual donor 138 dependent. We identified two FCS region deletions that are strikingly amplified in two HAE-ALI were prepared from bronchial airway epithelial cells isolated from various donors. They were 155 obtained from the Cells and Tissue Core of the Center for Gene Therapy, University of Iowa and 156 polarized in Transwell inserts (0.33 cm 2 were aligned to the reference SARS-CoV-2 Wuhan-Hu-1 isolate genome (GenBank accession 234 no: MN908947) using BWA v0.7.5a-r405. Sequencing read coverage was calculated using 235 bedtools genomecov of version 2.27.1. We used STAR (2.7.3a) to identify the junction-spanning 236 reads as described previously except that we set the minimal size of deletions as 10 (15). observe an obvious 5'-leader peak in the infected HAE cells (Fig. 2) . Instead, we observed ~2-268 fold higher reads in the 3'-end than that in the 5'-end viral genome. and 16.93% of total junction-spanning reads in the groups of MOI 0.2 and MOI 2, respectively, 276 followed by ORF3a, ORF7a, M, ORF8, S, E, ORF6 coding RNAs (Fig. 3) . The junction-277 spanning reads associated with ORF7b and ORF9a/b were identified at a level of 0.01% or less 278 of the total junction-spanning reads and were only identified in part of all the six samples in two 279 MOI groups (Supplemental Material S1). In SARS-CoV-2 infected HAE cells, S RNA transcript 280 was expressed at a ratio of ~2% of total junction-spanning reads in both groups (Fig. 3) , 281 compared to that of ~8% in Vero cells. We detected relatively higher level of ORF3a (~8%) 282 transcript in SARS-CoV-2 infected HAE cells (Fig. 3) , in contrast to 5.22% in infected Vero cells 283 Interestingly, in all identified spanning-junction reads, only ~50% correlated to the 285 canonical sgRNA transcripts in both MOI infection groups. The other half junction-spanning 286 reads represent either reads covering 5'-leader sequence but with unexpected 3' sites located in 287 the middle of annotated ORFs or reads covering between different ORFs or inside an ORF 288 without 5'-leader sequence (Supplemental Material S1). It's important to note that a lot of 289 these noncanonical junction-spanning patterns were supported by only one read from the RNA-290 seq data, indicating that these noncanonical transcripts may arise from erroneous replicase 291 activity. 292 293 Identification of deletions surrounding the furin cleavage site at S1/S2. 294 Among the ~50% noncanonical junction-spanning reads, we identified a high abundant 295 36-bp deletion, mut-del1, located at nt 23,594-23,629 spanning the FCS (Fig. 4A ) that encodes 296 aa 678 TNSPRRAR↓SVAS 689 (Fig. 4B , "↓" indicates cleavage). It displayed at frequencies of 297 21.04% and 14.79% of total junction-spanning reads in MOI 0.2 and MOI 2 groups, respectively 298 (Tab. 2). Another 15-bp deletion, mut-del2, located at nt 23,583-23,597, encoding aa 299 675 QTQTN 679 (Fig. 4A) , just two amino acids ahead of the FCS. It accounted for 0.42% and 300 15.11% of the total junction-spanning reads in MOI 0.2 and MOI 2 groups, respectively (Tab. 2). 301 The ratio of mut-del1 is only slightly lower than the N sgRNA and nearly 10 times higher than 302 the S sgRNA transcript, indicating a high fraction of this mutation comes from the viral genome 303 (+gRNA). 304 To further reveal the ratio of the two deletions in total viral genome, the junction-305 spanning reads associated with the two deletions were normalized with the average reads 306 covering the same deletions. The results showed that 69.02% and 20.02% of the viral reads 307 related to this region contain the mut-del1 deletion, while 1.11% and 15.75% of this region 308 contain mut-del2 deletion in MOI 0.2 and MOI 2 groups, respectively (Tab. 2). It should be noted 309 that the total reads used for normalization include reads of both viral gRNA and sgRNAs. Thus, 310 here we were unable to distinguish the origin of these two deletions from the viral genome and 311 viral RNA transcripts in these total cellular transcriptome data. 312 Except for these two highly abundant mut-del1 and mut-del2 deletions, we also observed 313 a 21-bp FCS deletion at nt 23,595-23,615, encoding aa 678 TNSPRRA 684 , but only in the MOI 2 314 group with 1.13% of the total junction-spanning reads, and a 39-bp deletion at the N-terminus of 315 the S protein (nt 21,743-21,781 encoding aa 61 NVTWFHAIHVSGT 73 ) with 0.27% and 0.60% of 316 the total junction-spanning reads in MOI 0.2 and MOI 2 groups, respectively. 317 In addition to these deletions in S gene, we identified about 50 different in-frame or 318 frameshift deletions in M encoding region that appeared in all six samples of both MOI groups, 319 and there were even more deletions in M coding region that appeared in only a part of the six 320 RNA samples (Supplemental Material S1). Although the ratio of single deletion was low, the 50 321 deletion patterns that appeared in all 6 RNA samples had the ratios of 2.39% and 3.18% in MOI 322 0.2 and MOI 2 groups, respectively, which is similar or even higher than the identified canonical 323 junction-spanning reads related to M sgRNAs (Fig. 3) . Notably, most of these identified deletion apical virus release kinetics of the HAE-ALI KC19 is shown in Fig. 5 . Viral RNA was prepared 337 either for RNA-seq or for PCR amplicon-seq of a 384-nt sequence covering the FCS. Notably, 338 mut-del1 was not significantly detected (<0.1%) in all the apically released viruses collected at 339 >13 dpi (Tab. 3, Bx-20) . Nevertheless, for viruses collected from HAE-ALI KC19 , the mut-del2 was 340 detected at a high level (20.75% RNA-seq and 20.98% RNA-seq ) at 4 dpi and 13 dpi, respectively, 341 which reached a close level of 41.79% PCR-seq at 21 dpi. Although the viruses derived from HAE-342 ALI B3-20 , HAE-ALI L209 and HAE-ALI B4-20 contain a high level (23.17%, 30.33%, and 8.3%, 343 respectively) of mut-del2 at 3 dpi, it decreased to a level of <2% at ≥17 dpi (Tab. 3, Bx-20) . 344 HAE-ALI B9-20 did never produce significant mut-del2 (<0.1%). 345 Notably, SARS-CoV-2 isolate USA-WA1/2020 P0 stock provided by BEI was already 346 passaged 4 times in Vero cells, and it was reported that there appeared significant 347 heterogeneity at the S1/S2 boundary in Vero cell propagated virus (31). To verify this, we 348 sequenced the viral RNA of P0 (the originally received vial) and P1 (passaged once in Vero-E6 349 cells) virus stocks. The results showed that there was no detectable mut-del1 in the P0 stock 350 but a high rate of 21.69% RNA-seq and 40.47 PCR-seq in the P1 stock. However, while there was ~2% 351 of mut-del2 in the P0 stock, it only slightly increased to ~5% in the P1 stock. (Tab. 3, P0 and 352 P1). These results confirmed that even though there was no or low level of mut-del1 and mut-353 del2 in the P0 stock, there was a high level of mut-del1 and a low level of mut-del2 in the P1 354 stock, which we used for infection of all HAE-ALI cultures. 355 Taken together, the results demonstrated that the mut-del1 appears at a very low rate in 356 all the HAE-ALI produced viruses at late time points of infection (≥17 dpi), which was confirmed 357 by both RNA-seq and PCR amplicon-seq. Although the inoculum had the mut-del1 detected at a 358 high frequent rate of 21.69% RNA-seq (40.47% PCR-seq ), this deletion was obviously suppressed 359 when the virus propagated in the HAE-ALI cultures. In addition, except for the viruses produced 360 from HAE-ALI KC19 , mut-del2 also appeared at a low detection rate during the course of infection 361 (≥17 dpi) in various HAE-ALI cultures. 362 indicating the input inoculum replicated similarly in these two HAE-ALI cultures as in the others 374 ( Fig. 5) (36) . The sequencing results of the apically released viruses from infected HAE-ALI B15-20 375 showed that mut-del1 was detected at a rate of 31.87% at 3 dpi, which increased to a high level 376 at 54.22% at 13 dpi (Tab. 4, B15-20/mut-del1). However, mut-del2 was barely detectable at a 377 low rate (0.18%) at 3 dpi, and this rate remained very low at 0.69% at 13 dpi (Tab. 4, B15-378 20/mut-del2). For the viruses apically released from infected HAE-ALI B16-20 cultures, only mut-379 del1 was detected at a rate of 6.81% at 3 dpi, which nearly disappeared (at rate of 0.08%) at 13 380 dpi; whereas mut-del2 was detected at very low levels (<1%) at both 3 and 13 dpi (Tab. 4, B16-381 20). The suppression of mut-del1 in HAE-ALI B16-20 cultures was similar as what observed in the 382 previously tested five ALI cultures (Tabl. 3) . 383 Together with the detection rates of the viruses produced from the infected HAE KC19 -ALI 384 cultures, the above results suggested that the probability of FCS region deletions is dependent 385 on the HAE-ALI cultures made from airway epithelial cells of different donors. In contrast to the 386 HAE-ALI KC19 that produced a high rate of mut-del2, HAE-ALI B15-20 tended to generate the viruses 387 that have a high rate of the mut-del1 deletion. 388 In this study, we analyzed the transcriptome of SARS-CoV-2 in polarized human 391 bronchial airway epithelia, an in vitro model mimicking the SARS-CoV-2 infection in human 392 lower airways (35,36). We found that the transcriptome in HAE-ALI reflects more closely the 393 viral transcriptome in the airways of COVID-19 patients, supporting that HAE-ALI is a 394 physiologically relevant in vitro culture to study SARS-CoV-2. Neither RNA-seq data of clinical 395 SARS-CoV-2 positive nasopharyngeal specimens nor RNA-seq of SARS-CoV-2 infected HAE-396 ALI showed the 5'-leader sequence read peak (38,39). In SARS-CoV-2 infected Vero-E6 cells, 397 N sgRNAs accounted for up to 69% in total viral RNA transcripts, and S sgRNAs accounted for 398 8% of the total junction-spanning reads (15) (Fig. 3) . Nevertheless, in HAE-ALI cultures, SARS-399 CoV-2 still expresses the abundant N protein transcript, and relatively low level of S gene 400 mRNA. Of note, the overall sgRNA transcripts in infected HAE-ALI were mapped to only 50% of 401 all the canonical sgRNAs, much lower than that in Vero cells (15), which is partially due to the 402 high deletion rate of the FCS region derived from the inoculated virus (P1 stock, Tab. 3) . 403 Among all the SARS-CoV-2 viral genes, S gene is the one most variable, in particular 404 the S1/S2 junctional region, featuring the FCS. Increasing evidence has shown that the S1/S2 405 FCS region is highly unstable, and various deletions and mutations have been detected or 406 receptor binding and membrane fusion. The first is a priming cleavage that occurs at the S1/S2 423 boundary, and the second is the obligatory triggering cleavage that occurs within the S2' site 424 (Fig. 4A) . The priming cleavage at S1/S2 boundary causes the conformation changes of the S1 425 subunit for receptor binding and of the S2 subunit for conversion of a fusion competent form, by 426 enabling the S protein to better bind receptors or expose the hidden S2' cleavage site. The 427 cleavage at S2' triggers the fusion of the viral envelope with the host cell membrane (23). 428 Cleavage by furin at the S1/S2 site is required for subsequent transmembrane serine protease 2 429 The continuing 498 2019-nCoV epidemic threat of novel coronaviruses to global health -The latest 2019 499 novel coronavirus outbreak in Wuhan A pneumonia outbreak associated 504 with a new coronavirus of probable bat origin A 507 Novel Coronavirus from Patients with Pneumonia in China COVID-19: An 510 overview and a clinical update COVID-19 vaccines: The status and 513 perspectives in delivery points of view Coronaviridae Study Group of the International Committee on Taxonomy of 515 The species Severe acute respiratory syndrome-related coronavirus: 516 classifying 2019-nCoV and naming it SARS-CoV-2 SARS-CoV-2 is an appropriate name for the new 519 coronavirus Genomic 524 characterisation and epidemiology of 2019 novel coronavirus: implications for virus 525 origins and receptor binding Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated 528 from a patient with atypical pneumonia after visiting Wuhan A major 532 outbreak of severe acute respiratory syndrome in Hong Kong Identification of a novel coronavirus in patients with severe 539 acute respiratory syndrome Secondary structure of the SARS-541 CoV-2 5'-UTR The RNA Architecture of 543 the SARS-CoV-2 3'-Untranslated Region The 545 Architecture of SARS-CoV-2 Transcriptome High-Resolution Analysis of Coronavirus Gene Expression by RNA Sequencing 548 and Ribosome Profiling 2020. 551 Characterisation of the transcriptome and proteome of SARS-CoV-2 reveals a cell 552 passage induced in-frame deletion of the furin-like cleavage site from the spike 553 glycoprotein A contemporary view of 555 coronavirus transcription Continuous and Discontinuous 557 RNA Synthesis in Coronaviruses Sequence motifs involved in the 559 regulation of discontinuous coronavirus subgenomic RNA synthesis Coronavirus 561 biology and replication: implications for SARS-CoV-2 Coronaviruses: an overview of their replication and 563 pathogenesis Structure, Function, and Evolution of Coronavirus Spike Proteins Characteristics of SARS-CoV-2 and 567 COVID-19 Cell entry 569 mechanisms of SARS-CoV-2 SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is 573 Blocked by a Clinically Proven Protease Inhibitor A Multibasic Cleavage Site 575 in the Spike Protein of SARS-CoV-2 Is Essential for Infection of Human Lung Cells A Novel Bat Coronavirus Closely 579 Related to SARS-CoV-2 Contains Natural Insertions at the S1/S2 Cleavage Site of the 580 Spike Protein Proteolytic Cleavage of the 582 SARS-CoV-2 Spike Protein and the Role of the Novel S1/S2 Site. iScience The 584 spike glycoprotein of the new coronavirus 2019-nCoV contains a furin-like cleavage site 585 absent in CoV of the same clade SARS-CoV-2 growth, furin-589 cleavage-site adaptation and neutralization using serum from acutely infected 590 hospitalized COVID-19 patients Spike Protein of Severe Acute Respiratory Syndrome Coronavirus 2 Attenuated SARS-CoV-2 variants with deletions at the S1/S2 junction SARS-coronavirus-2 replication in Vero E6 cells: replication 603 kinetics, rapid adaptation and cytopathology Morphogenesis and cytopathic effect of SARS-CoV-2 infection in human 607 airway epithelial cells Long-Term 609 Modeling of SARS-CoV-2 Infection of In Vitro Cultured Polarized Human Airway 610 Epithelium Minus-strand copies of 612 replicating coronavirus mRNAs contain antileaders Whole Genome Amplification and Sequencing for Effective Population-Based 616 Surveillance and Control of Viral Transmission Mason. 2020. Shotgun Transcriptome and Isothermal Profiling of SARS-CoV Infection Reveals Unique Host Responses, Viral Diversification, and Drug Interactions Neuropilin-1 facilitates SARS-CoV-2 cell entry and infectivity SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is 637 Blocked by a Clinically Proven Protease Inhibitor SARS-CoV-2 variants with mutations at the S1/S2 cleavage 640 site are generated in vitro during propagation in TMPRSS2-deficient cells Proteolytic Activation of SARS-CoV-2 Spike at the S1/S2 Boundary: Potential Role 644 of Proteases beyond Furin The S1/S2 648 boundary of SARS-CoV-2 spike protein modulates cell entry pathways and transmission Analysis of ACE2 in polarized 652 epithelial cells: surface expression and function as receptor for severe acute respiratory 653 syndrome-associated coronavirus Inhibition 655 of influenza virus infection in human airway cell cultures by an antisense peptide-656 conjugated morpholino oligomer targeting the hemagglutinin-activating protease 657 TMPRSS2 SARS-CoV-2 Receptor ACE2 Is an Interferon-Stimulated 669 Gene in Human Airway Epithelial Cells and Is Detected in Specific Cell Subsets across 670 Tissues Coronavirus 672 membrane fusion mechanism offers a potential target for antiviral development Loss 679 of furin cleavage site attenuates SARS-CoV-2 pathogenesis Pathogenicity, 682 immunogenicity, and protective ability of an attenuated SARS-CoV-2 variant with a 683 deletion at the S1/S2 junction of the spike protein 2020. 687 Natural transmission of bat-like SARS-CoV-2∆PRRA variants in COVID-19 patients At 4 days post-infection, a piece of the insert membrane was 697 fixed in 4% paraformaldehyde in PBS at 4°C overnight. and subjected to direct 698 immunofluorescence analysis. The membranes were stained with anti-SARS-CoV-2 N protein 699 (NP) Genome coverage of SARS-CoV-2 infected HAE cells with MOIs of 0.2 and 2, 704 respectively Six total RNA samples, as indicated with six colors, extracted from HAE-ALI B2-20 cultures 706 infected with SARS-CoV-2 with at MOIs of 0.2 and 2, respectively The reads were mapped to the reference SARS-CoV-2 Wuhan-Hu-1 strain genome 708 (MN908947, NCBI), as shown with nucleotide numbers (X axis), using BWA and the sequencing 709 read coverage Identification and quantification of SARS-CoV-2 subgenomic RNAs ORF7a/b and ORF8, N protein, and ORF9a/b. The leader 715 sequence was labeled as L in blue box. The structural genes are labeled within boxes in orange 716 and the accessory genes are labeled within boxes in light green. (B) Subgenomic RNAs. Six 717 total RNA samples were extracted from SARS-CoV-2 infected HAE-ALI cultures (at MOIs of 0.2 718 and 2, respectively) and subjected to whole RNA-seq. Three repeats in each MOI group were 719 merged. Junction-spanning reads were identified using STAR (2.7.3a), and the transcript 720 abundance, as shown in % under HAE-ALI/MOI 0.2 or 2 The canonical junction-spanning reads related to each sgRNA were 723 calculated and the ratios are shown on right. The abundances of the subgenomic transcripts 724 identified in Vero cells in a previous study (15) are listed for comparison Features of the S gene of SARS-CoV-2 and the deletions detected in the FCS 727 region Key domains of the S polypeptide are diagrammed in the context 729 of the SARS-CoV-2 genome. The S1 protein, receptor binding unit, harbors N-terminal domain 730 (NTD) and receptor binding domain (RBD) subunit S2' proteolytic site, two heptad-732 repeats, HR1 and HR2, and a transmembrane domain (TM) followed by cytoplasmic peptide 733 (CP) (30). The S protein has acquired a polybasic site An FCS region of aa670-695, together with the two key deletions 735 mut-del1 (∆FCS1) and mut-del2 (∆FCS2), are shown with S amino acid sequences of the 736 Coverage plots of S gene 737 at nt 23,500 to 23,698 in SARS-CoV-2 infected HAE-ALI B2-20 . The coverage plots show the 738 most abundant junction-spanning reads in SARS-CoV-2 infected HAE-ALI B2-20 cultures are the 739 36 bp and 15 bp deletions in S gene of nt 23,594-23,629 and nt 23,583-23,597, respectively, 740 which deleted 12 aa and 5 aa shown in mut-del1 and mut-del2 in panel A infected at an MOI of 0.2 at 4 dpi, and RNA Sample 9 Apical virus release kinetics of SARS-CoV-2 infected HAE-ALI KC19 culture KC19 cultures were infected with SARS-CoV-2 at an MOI of 0.2 from the apical 746 side Plaque-forming units (PFU) were determined (y axis) and 748 plotted to the dpi. Values represent means ± standard deviations (SD) (error bars)