key: cord-0915177-htr5jad4
authors: Yan, Bingyu; Chakravorty, Srishti; Mirabelli, Carmen; Wang, Luopin; Trujillo-Ochoa, Jorge L.; Chauss, Daniel; Kumar, Dhaneshwar; Lionakis, Michail S.; Olson, Matthew R.; Wobus, Christiane E.; Afzali, Behdad; Kazemian, Majid
title: Reply to Grigoriev et al., “Sequences of SARS-CoV-2 ‘Hybrids’ with the Human Genome: Signs of Non-coding RNA?”
date: 2022-01-26
journal: Journal of virology
DOI: 10.1128/jvi.01690-21
sha: 94f09d626b76419595fe4da0e94dbaea4064681e
doc_id: 915177
cord_uid: htr5jad4

High-throughput sequencing reads from virally infected cells provide detailed information about both the infected host cells and the invading viruses (1). For example, RNA sequencing of infected cells yields reads that align unequivocally to either the host or the viral transcriptome, enabling quantification of host and viral gene expression (2). Occasionally, there are reads with split characteristics, having one part (e.g., the 5' end) unambiguously matching the host genome and another part (e.g., the 3' end) clearly matching the viral genome. The split characteristic with unambiguous matching on either part is the key here, typically requiring convincing stretches of sequence matches, such as the >30 bp that we used in our analysis (3). Such reads are termed host-virus chimeric reads (HVCRs). Indeed, HVCRs that meet standards of statistical reproducibility and signal-to-noise might carry novel insights into the biology of host-virus interactions (4, 5). Thus, it is important to detect statistically robust and biologically relevant HVCRs unambiguously. We and others have shown that detection of relevant HVCRs is complicated by unfaithful reverse transcriptase and polymerase enzymes that template-switch during typical high-throughput sequencing library preparation protocols (6-9).
The conventional HVCRs with split characteristics that we and others used in our studies should not be confused with what we term "composite" reads: host reads that contain short sequence matches to the viral genome or, vice versa, viral reads that contain short sequence matches to the host genome in the middle of the read. Such "composite" viral reads seem to be the subject of the letter contributed by Grigoriev et al. Our work evaluated only the biological relevance of conventional HVCRs and showed that, in the context of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection, they are most likely artifacts of library construction.
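The distinction between conventional HVCRs and "composite" reads, and the reason match length matters, can be illustrated with a minimal sketch. It is not the pipeline used in our analysis (3). The segment representation, the function names, and the labels "non-chimeric" and "ambiguous" are hypothetical. The chance-match estimate assumes exact matching against a genome of uniformly random, independent bases; real genomes contain repeats and aligners tolerate mismatches, so it only conveys the order of magnitude.

```python
from typing import List, Tuple

HUMAN_GENOME_BP = 3.2e9  # approximate size of the human genome in base pairs
MIN_MATCH_BP = 30        # the ">30 bp" threshold described in the text

# Hypothetical input format: ordered (source, aligned length in bp) pairs,
# where source is "host" or "virus".
Segment = Tuple[str, int]


def classify_read(segments: List[Segment], min_match: int = MIN_MATCH_BP) -> str:
    """Label a read from its ordered aligned segments (illustrative only)."""
    sources = {src for src, _ in segments}
    if sources != {"host", "virus"}:
        return "non-chimeric"
    # Conventional HVCR: one host part and one viral part, each > min_match.
    if len(segments) == 2 and all(length > min_match for _, length in segments):
        return "conventional HVCR"
    # Composite: a short match sits in the middle of the read.
    if len(segments) >= 3 and any(
        length <= min_match for _, length in segments[1:-1]
    ):
        return "composite"
    return "ambiguous"


def expected_chance_hits(k: int, genome_bp: float = HUMAN_GENOME_BP) -> float:
    """Expected exact occurrences of one k-mer on both strands of a
    random genome (assumption: uniform, independent bases)."""
    positions = genome_bp - k + 1
    return 2 * positions * 0.25 ** k


if __name__ == "__main__":
    print(classify_read([("host", 60), ("virus", 45)]))
    print(classify_read([("virus", 70), ("host", 18), ("virus", 60)]))
    for k in (12, 16, 31):
        print(k, expected_chance_hits(k))
```

Under these simplifying assumptions, a 12-base match is expected hundreds of times by chance in a genome of this size, whereas a 31-base match is expected far less than once.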
Because the sequence matches within "composite" reads (such as those identified by Grigoriev et al.) are short, they are more prone to statistical anomalies and alignment errors and are likely to align by chance to at least some regions of the 3.2 billion base pairs of the human genome. Thus, any analysis of "composite" events would need to include empirical or theoretical probabilities of such observations, established through rigorous control experiments, to rule out template switching, alignment errors, or statistical anomalies. Nevertheless, to avoid any misinterpretation, it is important to note that the observations of composite reads by Grigoriev et al. have no bearing on our original findings.

1. Integrated pan-cancer map of EBV-associated neoplasms reveals functional host-virus interactions
2. SARS-CoV-2 drives JAK1/2-dependent local complement hyperactivation
3. Host-virus chimeric events in SARS-CoV-2-infected cells are infrequent and artifactual
4. Epstein-Barr virus episome physically interacts with active regions of the host genome in lymphoblastoid cells
5. The landscape of viral associations in human cancers
6. Template-switching artifacts resemble alternative polyadenylation
7. Reverse transcriptase template switching and false alternative transcripts
8. Hypothesis: Artifacts, including spurious chimeric RNAs with a short homologous sequence, caused by consecutive reverse transcriptions and endogenous random primers
9. Suppression of artifacts and barcode bias in high-throughput transcriptome analyses utilizing template switching
10. SARS-CoV-2-host chimeric RNA-sequencing reads do not necessarily arise from virus integration into the host DNA
11. No evidence of human genome integration of SARS-CoV-2 found by long-read DNA sequencing