key: cord-0814188-9wx4ltyd authors: Zhang, Liguo; Richards, Alexsia; Khalil, Andrew; Wogram, Emile; Ma, Haiting; Young, Richard A.; Jaenisch, Rudolf title: SARS-CoV-2 RNA reverse-transcribed and integrated into the human genome date: 2020-12-13 journal: bioRxiv DOI: 10.1101/2020.12.12.422516 sha: 2c9c0fe1fd1770ed8a374244b1a02a04c0ad2a50 doc_id: 814188 cord_uid: 9wx4ltyd

Prolonged SARS-CoV-2 RNA shedding and recurrence of PCR-positive tests have been widely reported in patients after recovery, yet these patients most commonly are non-infectious 1-14. Here we investigated the possibility that SARS-CoV-2 RNAs can be reverse-transcribed and integrated into the human genome and that transcription of the integrated sequences might account for PCR-positive tests. In support of this hypothesis, we found chimeric transcripts consisting of viral fused to cellular sequences in published data sets of SARS-CoV-2 infected cultured cells and primary cells of patients, consistent with the transcription of viral sequences integrated into the genome. To experimentally corroborate the possibility of viral retro-integration, we describe evidence that SARS-CoV-2 RNAs can be reverse transcribed in human cells by reverse transcriptase (RT) from LINE-1 elements or by HIV-1 RT, and that these DNA sequences can be integrated into the cell genome and subsequently be transcribed. Human endogenous LINE-1 expression was induced upon SARS-CoV-2 infection or by cytokine exposure in cultured cells, suggesting a molecular mechanism for SARS-CoV-2 retro-integration in patients. This novel feature of SARS-CoV-2 infection may explain why patients can continue to produce viral RNA after recovery and suggests a new aspect of RNA virus replication.

Continuous or recurrent positive SARS-CoV-2 PCR tests have been reported in patients weeks or months after recovery from an initial infection 1-14.
Although bona fide re-infection with SARS-CoV-2 after recovery has been reported lately 15, cohort-based studies with strict quarantine on subjects recovered from COVID-19 suggested that "re-positive" cases were not caused by re-infection 16,17. Furthermore, no replication-competent virus was isolated or spread from these PCR-positive patients 1-3,5,6,12. The cause for such prolonged and recurrent viral RNA production is unknown. As positive-stranded RNA viruses, SARS-CoV-2 and other beta-coronaviruses such as SARS-CoV-1 and MERS employ an RNA-dependent RNA polymerase to replicate their genomic RNA and transcribe their sub-genomic RNAs 18-20. One possibility is that SARS-CoV-2 RNAs could be reverse-transcribed and integrated into the human genome, and transcription of the integrated DNA copies could be responsible for positive PCR tests.

To investigate the possibility of viral integration into virus-infected cells, we analyzed published RNA-Seq data from SARS-CoV-2-infected cells for evidence of chimeric transcripts, which would be indicative of viral integration into the genome and expression. Examination of these data sets 24-30 (Fig. S1a-b) revealed a substantial number of host-viral chimeric reads (Fig. 1a-c, S1c). These occurred in multiple sample types, including cells and organoids from lung/heart/brain/stomach tissues, as well as bronchoalveolar lavage fluid (BALF) cells directly isolated from COVID-19 patients (Fig. 1c). Chimeric read abundance was positively correlated with viral RNA level across the sample types (Fig. 1c). Chimeric reads generally accounted for 0.004%-0.14% of total SARS-CoV-2 reads across the samples, with a 69.24% maximal number of reads in BALF cells derived from severe COVID-19 patients and almost no chimeric reads from patient blood buffy coat cells (corresponding to almost no total SARS-CoV-2 reads).
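The percentages quoted above are ratios of chimeric read counts to total SARS-CoV-2 read counts within a sample. A minimal Python sketch of that arithmetic follows; the function name and the read counts in the usage lines are hypothetical illustrations and are not taken from the analyzed datasets.

```python
from typing import Optional


def chimeric_percent(chimeric_reads: int, total_viral_reads: int) -> Optional[float]:
    """Return chimeric reads as a percentage of total SARS-CoV-2 reads.

    Returns None when a sample has no SARS-CoV-2 reads at all, because the
    ratio is undefined in that case.
    """
    if chimeric_reads < 0 or total_viral_reads < 0:
        raise ValueError("read counts must be non-negative")
    if chimeric_reads > total_viral_reads:
        raise ValueError("chimeric reads cannot exceed total viral reads")
    if total_viral_reads == 0:
        return None
    return 100.0 * chimeric_reads / total_viral_reads


# Hypothetical counts chosen only to illustrate the calculation.
print(f"{chimeric_percent(14, 10_000):.3f}%")  # prints 0.140%
print(chimeric_percent(0, 0))                  # prints None
```

Returning None for a sample without viral reads keeps such samples distinguishable from samples that have viral reads but no chimeric ones.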
A majority of chimeric junctions mapped to the SARS-CoV-2 nucleocapsid (N) sequence (Fig. 1d-e). This is consistent with the finding that nucleocapsid (N) RNA is the most abundant SARS-CoV-2 sub-genomic RNA 31, and thus is most likely to be a target for reverse transcription and integration. These analyses support the hypothesis that SARS-CoV-2 RNA may retro-integrate into the genome of infected cells, resulting in the production of chimeric viral-cellular transcripts.

by a CMV promoter showed ~8-fold higher signals of N sequence detection, suggesting a higher copy-number of integrated N sequences than in cells expressing LINE-1 driven by its natural promoter (5'UTR) or HIV-1 RT (Fig. 2c). We were able to clone full-length N DNA from gDNA of cells overexpressing CMV-LINE-1 and confirmed its sequence by Sanger sequencing (Fig. S2b). We did not detect the full-length N sequence from gDNA of cells transfected with 5'UTR-LINE-1 or HIV-1 RT, which may be due to lower expression of RT in these cells (Fig. S2b). We further confirmed that purified SARS-CoV-2 RNA from infected cells can be reverse-transcribed in vitro by lysates of cells expressing either LINE-1 or HIV-1 RT (Fig. S2c-d).

We conducted single-molecule RNA-FISH (smRNA-FISH) using fluorophore-labeled oligo-nucleotide probes targeting N (Fig. 2a)

cells that are efficiently infected versus NHBE cells that are resistant to infection). Although the upregulation in Calu3 was not higher than that in NHBE, multiple LINE-1 elements were upregulated as compared to just one in NHBE (Fig. 3a, S4b, d). Expression analysis using LINE-1 specific primers 33,34 showed a ~3-4-fold up-regulation of LINE-1 in Calu3 cells when infected by SARS-CoV-2 (Fig. 3c). Moreover, PCR analysis on Calu3 cellular DNA showed retro-integration of SARS-CoV-2 N sequences after infection (Fig. 3d-

Discussion

In this study, we showed evidence that SARS-CoV-2 RNAs can be reverse-transcribed and integrated into the human genome by several sources of reverse transcriptase such as activated human LINE-1 or co-infected retrovirus (HIV). We found LINE-1 expression can be induced upon SARS-CoV-2 infection or cytokine exposure, suggesting a molecular mechanism responsible for SARS-CoV-2 retro-integration in patients. Moreover, our results suggest that the

3D optical sections were acquired with 0.2-μm z-steps using a DeltaVision Elite Imaging System microscope with a 100× oil objective (NA 1.4), a pco.edge 5.5 camera and DeltaVision SoftWoRx software (GE Healthcare). Image deconvolution was done using SoftWoRx. All figure panel images were prepared using FIJI software (ImageJ, NIH) and Adobe Illustrator 2020 (Adobe), showing deconvolved single z-slices.

To measure the LINE-1 ORF1p immuno-staining signal intensity, we projected cell optical sections (sum, 42 slices) with the "z projection" function in FIJI. We measured the sum of intensity of the entire cell area in the z-projected image as the signal intensity, subtracted the background intensity outside of cells and then divided by the mean of the "Basal media treatment" group to obtain the normalized signal intensity, as previously described 44,45. All images from the same experiment were acquired using the same exposure time and transmitted excitation light. All intensity measurements were done with non-deconvolved raw images. Box plot was done in R

Figure 1a).

To identify human-SARS-CoV-2 chimeric reads, raw sequencing reads were aligned to concatenated human and SARS-CoV-2 genomes plus transcriptomes by STAR (version 2.7.1a) 47. Human genome version hg38 with no alternative chromosomes and gene annotation version GRCh38.97 were used.
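As a rough illustration of what calling a chimeric read involves once the aligner has reported how many bases of a read map to each genome, the sketch below applies a minimum-overhang rule: both the human segment and the viral segment must reach a threshold length. This is a toy example and not STAR's implementation; the class name, function name and the threshold of 20 bases are hypothetical. The example read reuses the 149-nt read from Supplementary Figure 1 (57 nt mapped to human, 92 nt mapped to SARS-CoV-2).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitRead:
    """Mapped lengths (in bases) of the two segments of one read."""
    human_bases: int
    viral_bases: int


def is_chimeric(read: SplitRead, min_overhang: int) -> bool:
    """True if both segments have at least `min_overhang` mapped bases."""
    return read.human_bases >= min_overhang and read.viral_bases >= min_overhang


MIN_OVERHANG = 20  # hypothetical threshold, chosen only for illustration

example = SplitRead(human_bases=57, viral_bases=92)
print(is_chimeric(example, MIN_OVERHANG))            # prints True
print(is_chimeric(SplitRead(5, 144), MIN_OVERHANG))  # prints False
```

Requiring a minimum length on both sides guards against calling a read chimeric on the strength of a few bases that could align by chance.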
SARS-CoV-2 genome version NC_045512.2 and gene annotation (http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/bigZips/genes/) were used. The following STAR parameters 31 were used to call chimeric reads unless otherwise specified

To analyze human LINE-1 expression in RNA-Seq data, a published method, RepEnrich2 48, was used to map RNA-Seq reads to human repeat annotations, using human repeat masker (hg38). Differential expression was analyzed using EdgeR package (version

-2 infection: The role of cytokines in COVID-19 disease
Distinct viral reservoirs in individuals with spontaneous control of HIV-1.
Protein-nucleic acid interactions of LINE-1
REGN-COV2 antibodies prevent and treat SARS-CoV-2 infection in rhesus macaques and hamsters
Lentivirus-delivered stable gene silencing by RNAi in primary cells
Evolutionary conservation of the functional modularity of primate and murine LINE-1 elements
Gamma radiation increases endonuclease-dependent L1 retrotransposition in a cultured cell assay
TSA-Seq reveals a largely "hardwired" genome organization relative to nuclear speckles with small position changes tightly correlated with gene expression changes
Gene expression amplification by nuclear speckle association
R: A language and environment for statistical computing. R Foundation for Statistical Computing
STAR: ultrafast universal RNA-seq aligner
Transcriptional landscape of repetitive elements in normal and cancer human cells
edgeR: a Bioconductor package for differential expression analysis of digital gene expression data
Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation
Functional Studies of Missense TREM2 Mutations in Human Stem Cell-Derived Microglia
Human T Cells Expressing a CD19 CAR-T Receptor Provide Insights into Mechanisms of Human CD19-Positive beta Cell Destruction

Three biological replicates; mean ± s.e.m.
d) qPCR detection and copy-number estimation of SARS-CoV-2 N sequences in mock (green) and SARS-CoV-2 infected (magenta) Calu3 cellular DNA. HSPA1A: human HSPA1A gene as a reference SARS-CoV-2 N sequences as shown in Figure 1a. Three biological replicates; mean ± s.e.m.; n.d.: not detected. e) Gel purification of large-fragment genomic DNA (yellow box, top) from SARS-CoV-2 infected Calu3 cells and PCR detection of SARS-CoV-2 N sequences in the purified genomic DNA (bottom) with the same primer sets as in d). f) Endogenous LINE-1 expression fold-changes in Calu3 cells comparing Myeloid conditioned versus basal media treatment, measured by RT-qPCR with primers probing 5'UTR, ORF1, or 3'UTR regions of LINE-1. Reference genes: GAPDH and TUBB. Three biological replicates

Supplementary Figure 1. Human-SARS-CoV-2 chimeric reads RNA-Seq data. a) Published data used to identify human-CoV2 chimeric reads, summarizing GEO accession number (data ID), sample type, infection method/type (MOI: Multiplicity Of Infection), RNA-Seq format (single or paired-end with read length), and threshold to call chimeric reads (Min overhang: minimum number of bases mapped to either human or SARS-CoV-2 genome/transcriptome to call a chimeric read). b) Comparison of SARS-CoV-2 read fraction of total mappable reads in the published RNA-Seq datasets as shown in a). c) A chimeric read example (149 nt) from Calu3 (infected) RNA-Seq with 57 nt mapped to human Chromosome X (green) and 92 nt (magenta) mapped to the SARS-CoV-2 genome

Supplementary Figure 3. SARS-CoV-2 N RNA signals detected in cell nuclei by single-molecule RNA-FISH. a-b) Example images of single-molecule RNA-FISH (red/grey) targeting SARS-CoV-2 N sequence using probes shown in Figure 2a and merged channels with DAPI (blue) in SARS-CoV-2 infected HEK293T cells without (a) or with (b) human LINE-1. Insets in b): 4x enlargement of regions in white boxes to show nuclear signals of SARS-CoV-2 N sequence (white arrows).
c) Comparison of nuclear N RNA-FISH signals in
Left: example images as in a) and b); Right: fraction of HEK293T cells infected by SARS-CoV-2 (indicated by cytoplasmic FISH signals) showing nuclear N RNA-FISH signals in cell populations without (left bar, n = 109) or with (right bar
Combination of two independent cell samples
All images shown were single z-slices from 3D optical sections (0.2-μm z-steps)

Log2 fold-changes (x-axis) of different types of human repetitive elements (y-axis) with significant (FDR < 0.05) expression changes in SARS-CoV-2 versus mock NHBE (c) cells from published RNA-Seq data (GSE147507). b, d) Fold changes (y-axis) of different human LINE-1 families (x-axis) with significant (FDR < 0.05) expression changes in SARS-CoV-2 versus mock infected Calu3 (b) or NHBE (d) cells from published RNA-Seq data (GSE147507, see Supplementary Figure 1a)

Cytokine containing media treatment triggers LINE-1 expression in human cells. a) LINE-1 ORF1 protein immuno-staining (magenta, same exposure and intensity scaling, 1st column: no primary antibody control) plus merged channels with DAPI (blue) in HEK293T cells cultured in basal (1st and 2nd columns) or microglia conditioned media (3rd column) or LPS-treated microglia conditioned media
Endogenous LINE-1 expression fold-changes in Calu3 cells between CAR-T conditioned (diluted with basal media at indicated percentage in volume) versus basal media treatment measured by RT-qPCR with primers probing 5'UTR, ORF1
Three independent cell samples treated with two batches of media

Supplementary Table 1.
Primer sequences used in this study

Species      Name             Sequence (Forward)         Sequence (Reverse)
SARS-CoV-2   N1               GACCCCAAAATCAGCGAAAT       TCTGGTTACTGCCAGTTGAATCTG
SARS-CoV-2   N2               GGGAGCCTTGAATACACCAAAA     TGTAGCACGATTGCAGCATTG
SARS-CoV-2   N3               GGGGAACTTCTCCTGCTAGAAT     CAGACATTTTGCTCTCAAGCTG
SARS-CoV-2   N4               AAATTTTGGGGACCAGGAAC       TGGCACCTGTGTAGGTCAAC
SARS-CoV-2   N (full length)  ATGTCTGATAATGGACCCCAAAAT   TTAGGCCTGAGTTGAGTCAGC
Human        HSPA1A           ATCTCCACCTTGCCGTGTT        ATCCAGTGTTCCGTTTCCAG
Human        TUBB             TCCCTAAGCCTCCAGAAACG       CCAGAGTCAGGGGTGTTCAT
Human        GAPDH            GTCTCCTCTGACTTCAACAGCG     ACCACCCTGTTGCTGTAGCCAA
Human        LINE-1-5'UTR     GACGCAGAAGACGGTGATTT       TCACCCCTTTCTTTGACTCG
Human        LINE-1-ORF1      CTCGGCAGAAACCCTACAAG       CCATGTTTAGCGCTTCCTTC
Human