key: cord-0964829-2h3yk4zg
authors: Moustafa, Ahmed; Khalel, Rana Salah; Aziz, Ramy K.
title: Traces of SARS-CoV-2 RNA in Peripheral Blood Cells of Patients with COVID-19
date: 2021-08-01
journal: OMICS
DOI: 10.1089/omi.2021.0068
sha: 8577639e43a7289db5a595e8793a4786cbd6ab28
doc_id: 964829
cord_uid: 2h3yk4zg

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the third virus that caused coronavirus-related outbreaks over the past 20 years. The outbreak was first reported in December 2019 in Wuhan, China, but rapidly progressed into a pandemic of an unprecedented scale since the 1918 flu pandemic. Besides respiratory complications in patients with COVID-19, clinical characterization of severe infection cases showed several other comorbidities, including multiple organ failure, and septic shock. To better understand the systemic pathogenesis of COVID-19, we interrogated the virus's presence in the peripheral blood cells, which might provide a form of trafficking or hiding to the virus. By analyzing >2 billion sequence reads of high-throughput transcriptome sequence data from 180 samples of patients with active SARS-CoV-2 infection or healthy controls collected from 6 studies, we found evidence of traces of SARS-CoV-2 RNA in peripheral blood mononuclear cells in two samples from two independent studies. In contrast, the viral RNA was abundant in bronchoalveolar lavage specimens from the same patients. We also devised a “viral spike-to-actin” RNA normalization as a metric to compare across various samples and minimize errors caused by intersample variability in total human RNA abundance. Our observation suggests immune presentation and discounts the possibility of extensive viral infection of lymphocytes or monocytes.

T he severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the third virus that caused coronavirus-related outbreaks over the past 20 years. The first outbreak occurred in Asia in 2002  hence, the name SARS-CoV, which back then was not related to any of the known viruses (Marra et al., 2003; Rota et al., 2003) . Between 2002 and 2003, 8098 people became sick with SARS, and of those, 774 died (i.e., a mortality rate of 9.5%). Since 2004, there have been no more reports of SARS cases (via NHS, WHO, and CDC).

The second coronavirus-related outbreak started in the Arabian Peninsula in 2012 (Zaki et al., 2012) , causing a fatal disease, Middle East respiratory syndrome (MERS), with a significantly higher mortality rate of 40% of the cases infected by MERS-CoV virus (Zumla et al., 2015) .

More recently, in December 2019, the third coronavirusrelated outbreak was first reported in Wuhan, China, by the emergence of the SARS-CoV-2, initially dubbed ''the 2019 novel coronavirus'' (2019-nCoV). The spread of the virus led to a pandemic of an unprecedented scale since the 1918 flu pandemic.

As of mid-June 2021, >175 million confirmed COVID-19 cases globally, and >3.5 million deaths have been reported by the WHO (WHO Dashboard, continuously updated). Besides the respiratory complications in patients with COVID-19, clinical characterization of severe infection cases indicated further comorbidities, including multiple organ failure (liver, kidney, and heart) and septic shock (Cascella et al., 2020; Li et al., 2020; Poston et al., 2020) .

Since the first genome sequence of SARS-COV-2 has been determined and made public in January 2020 (Lu et al., 2020) , >2 million genomes have been sequenced worldwide and become available through the Global Initiative on Sharing All Influenza Data (Shu and McCauley, 2017) . The availability of those genomic sequences allows rapid screening of viral RNA in human tissues and environmental samples [e.g., sewage (Bibby and Peccia, 2013) ] using multiomic wet lab technologies and in silico screening tools for publicly available metatranscriptomic samples.

To better understand COVID-19 systemic pathogenesis, we conducted this study to interrogate the presence of the virus in the blood, or any of its components, as it might provide a form of trafficking or hiding to the virus, notably that some precarious studies reported the ability of the virus to infect lymphocytes (Wang et al., 2020b) . In contrast, others have suggested that the virus exerts its pathogenesis through ''attacking hemoglobin,'' although this hypothesis has been heavily criticized (Read, 2020) . The virus was sporadically reported to be found in the plasma or blood of patients with COVID-19 (Huang et al., 2020; Wang et al., 2020a) . Finally, peripheral blood mononuclear cells (PBMCs) were shown to harbor other infectious viruses, such as HIV, HCV, and HBV (Li et al., 2015; Wang et al., 2002) .

We computationally analyzed high-throughput sequence data from patients with active COVID-19 in several publicly available RNA-Seq datasets for the reasons mentioned previously. We found only evidence of traces of SARS-CoV-2 RNA in their PMBCs, whereas their bronchoalveolar lavage samples had large amounts of viral RNA.

Publicly available raw RNA-Seq FASTQ sequences, published by six studies (Arunachalam et al., 2020; Kusnadi et al., 2021; Manne et al., 2020; Wilk et al., 2020; Xiong et al., 2020; Zheng et al., 2020) , were retrieved (Table 1 ). For quality control of raw sequences, fastp (Chen et al., 2018) was used to remove adaptor sequences, trim low-quality ends, and remove short reads.

Filtered FASTQ sequences were aligned to the SARS-CoV-2 reference genome (GenBank accession NC_045512) by the Burrows-Wheeler aligner (Li and Durbin, 2009 ). Sambamba (Tarasov et al., 2015) was used to filter generated binary alignment map files for mapped sequences with a quality score >40 and alignment score >90. For further verification, identified SARS-CoV-2 matching sequences were manually inspected, and blastn (Altschul et al., 1990 ) was used to check them against the NCBI nucleotide ''nt'' database. Blast matches and alignments were visually reviewed. In addition, the matching sequences were annotated by blastx searches against the NCBI ''RefSeq protein'' database.

To estimate viral RNA abundance in a given sample and make a comparison between samples possible, we normalized the number of any positive SARS-CoV-2 matching hits to the total number of reads within that sample. In addition, we used the spike gene, which is highly specific to SARS-CoV-2, to estimate the extent of viral RNA load in a given sample. We normalized the abundance of spike genes to human actin RNA, being a transcript of a housekeeping gene. This normalization generated a ''spike-to-actin'' ratio that could be used as an accurate metric for viral RNA abundance relative to human RNA because the total number of reads may include nonhuman and nonviral samples. The abundance of viral RNA is estimated as the number of detected viral RNA reads to the total number of RNA reads in the sample.

MOUSTAFA ET AL.

For dimension reduction analysis, we used the plotPCA function implemented in the DESeq2 package (Love et al., 2014) , after applying the rlog function, which transforms the transcript count data to the log2 scale and normalizes the count data to the sizes of the libraries.

Filtered sequences, in FASTQ format, were processed in Salmon (Patro et al., 2017) for the quantification of the expressed transcripts against the human reference GENCODE (Frankish et al., 2019) Release 35 (GRCh38.p13). The DE-Seq2 package (Love et al., 2014) was used for transcript normalization and differential gene expression analysis.

The comparisons performed for the differential expression are (1) healthy controls versus patients with COVID-19 without viral RNA, (2) healthy controls versus patients with COVID-19 with viral RNA, and (3) patients with COVID-19 without viral RNA versus patients with COVID-19 with viral RNA. The cutoff for statistical significance was adjusted p < 0.001. Thus, the complete set of the differentially expressed genes (DEGs) is the union of the identified DEGs from the three comparisons. Patterns of differential gene expression were visualized as a heatmap using the R package pheatmap https://cran.r-project.org/package=pheatmap. Further data processing and visualization were performed in R https://www .r-project.org/. Toppgene (Chen et al., 2009 ) was used for enrichment analysis for differentially expressed transcripts. 

We analyzed 180 RNA-Seq datasets from 6 independent studies and found SARS-CoV-2 viral sequences in only 2 PBMC samples from 2 independent studies (Table 2) . Sample CRR119891 (Xiong et al., 2020) had four viral sequence reads ( Supplementary Fig. S1A, B) , matching SARS-CoV-2's polyprotein pp1ab (accession NP_828849), with blastx e-value 1 · 10 -29 , and surface glycoprotein (accession YP_009724390), with blastx e-value 2 · 10 -25 . In addition, sample SRR12626644 (Zheng et al., 2020) had two viral sequence reads (one paired-end read), matching SARS-CoV-2's ORF1a polyprotein (accession YP_ 009725295), with blastx e-value 9 · 10 -53 ( Supplementary  Fig. S1C ).

On the contrary, expectedly, we found viral sequences in all the bronchoalveolar lavage fluid (BALF) samples (Xiong et al., 2020) with a median abundance of 1.07% of the total sequence reads, which, despite being a relatively low percentage of the total reads, was sufficient to cover the entire genome of SARS-CoV-2 with coverage exceeding 4000 · for some areas in the viral genome ( Supplementary Fig. S2 ).

Based on the normalized gene expression values of the PBMC samples (Xiong et al., 2020) , principal component analysis (PCA) demonstrated separation between the COVID-19 and healthy control samples. Moreover, the t-SNE analysis showed a further separation between the patient samples in which viral RNA was detected in PBMCs and those with no detected viral RNA (Fig. 1) .

Our differential gene expression analysis of the PBMC samples (human transcriptome) identified 439 DEGs between the healthy controls and patients with COVID-19 without SARS-CoV-2 viral RNA (adjusted p < 0.001), 505 DEGs between healthy controls and patients with COVID-19 with detected SARS-CoV-2 viral RNA (adjusted p < 0.001), and 107 DEGs between patients with COVID-19 without detected SARS-CoV-2 viral RNA and patients with COVID-19 with detected SARS-CoV-2 viral RNA (adjusted p < 0.001). The pairwise comparisons of the three types resulted in 791 DEGs (Fig. 2) .

In agreement with the PCA analysis, hierarchical clustering delineated the patterns of DEGs and confirmed the separation of the three groups: controls, COVID-19 without SARS-CoV-2 viral RNA, and COVID-19 with SARS-CoV-2 viral RNA (Fig. 3) . Toppgene enrichment analysis for genes that were differentially expressed between healthy controls and COVID-19 samples with viral RNA and between COVID-19 without viral RNA and COVID-19 with viral RNA using provided enriched Gene Ontology (GO) terms under the Molecular Function, Biological Process, and Cellular Component categories (Table 3) .

When the statistically enriched terms were filtered to adjusted p < 0.05, the following categories stood out. Under the Molecular Function category, there was only one statistically significant enriched term, GO:0003823 (antigen binding). Likewise, only one statistically significant enriched term under the Cellular Component category satisfied the statistical filter, that is GO:0019814 (immunoglobulin complex). Under the Biological Process category, 19 statistically significant enriched terms were shortlisted. Many of these terms were related to immune responses and viral life cycle, including GO:0051707 (response to other organisms), GO:0002250 (adaptive immune response), GO:0045087 (innate immune response), and GO:0002449 (lymphocyte-mediated immunity).

When we calculated a spike-to-actin RNA ratio for each sample, values for all four BALF samples, as well as the one PBMC sample with SARS-CoV-2 transcripts, followed the same pattern of viral RNA abundance ratios and were strongly correlated (Spearman correlation q = 0.96, p = 0.00047; Fig. 4C )

Coronavirus-related infections are reported to be associated with hematological changes, including lymphopenia, thrombocytopenia, and leukopenia, by infecting blood cells, bone marrow stromal cells, or inducing autoantibodies (Yang et al., 2003) . In a former study characterizing the clinical features of patients with COVID-19, Huang et al. (2020) showed that using reverse-transcription polymerase chain reaction (RT-PCR) allowed them to detect coronavirus in plasma-isolated samples from the patients. Their report preferred to use the term ''RNAaemia,'' rather than ''viraemia,'' which they defined as the presence of viral RNA in the blood because they did not perform tests to confirm the presence of infectious SARS-CoV-2 virions in the blood of the patients.

A handful of early reports detected amplifiable viral RNA in the blood (Peng et al., 2020; Wang et al., 2020a) ; however, to our knowledge, no studies used an unbiased systematic approach to report SARS-CoV2 RNA in PBMCs (Azghandi and Kerachian, 2020) . In their preliminary analysis of RNA isolated from PBMCs, Corley et al. confirmed that they did not detect viral sequences (a preprint 10.1101/2020.04.13. 039263v1). High-throughput sequencing has been repeatedly demonstrated to be a practical approach for the identification and quantification of viruses in the blood (Moustafa et al., 2017) following similar methods as those used in viral metagenomics (Aziz et al., 2015; Breitbart et al., 2003) and uncultivated viral genomics (Roux et al., 2019) .

Therefore, we planned to exhaustively mine publicly available RNA-Seq PBMC datasets for SARS-CoV-2 RNA sequences. As of the writing of this report, only one study reported viral RNA in an RNA-Seq PBMC dataset (Xiong et al., 2020) . In that study, the authors profiled global gene expression in BALF and PBMC specimens of patients with COVID-19. Predictably, we detected viral RNA in all BALF samples (2 patients, 2 replicates each) with an average abundance of 1.07% of the total RNA reads in those samples, including human RNA (Table 1) . We also confirmed the finding by Xiong et al. (2020) and detected the viral RNA sequences in one of their PBMC samples from patients with COVID-19 (Table 1) Of note, to estimate the viral RNA load in a given sample and to compare between different samples, we normalized viral read counts to the total number of reads in each dataset (Fig. 4) . Because the total number of human-related reads may be strongly affected by potential microbial RNA (from pulmonary microbiota, for example) or by extensive viral counts, we also sought to estimate the ratio of viral genomic RNA to human cellular RNA. For that purpose, we used actin RNA, being a transcript for a human housekeeping genenot expected to be seen in bacterial cells or viral genomes and expected to have constitutive expression levels. We estimated the ratio of hits to RNA encoding the spike protein (S) to actin transcripts in the sample. Quite interestingly, the spike-to-actin ratio strongly correlated with viral RNA abundance values (computed as SARS-CoV-2 RNA reads normalized to the total number of reads) (Fig. 4C) .

These RNA traces are indeed quite rare; however, they confidently and unambiguously belong to SARS-COV-2 ( Supplementary Fig. S1 ). One viral RNA read translates into polyprotein (pp1ab, accession NP_828849), the largest protein of coronaviruses and involved in replicating and transcribing the viral genome. The other viral RNA from the same sample translates into surface (spike) glycoprotein (accession YP_009724390), which mediates the entry of the viral RNA into human cells expressing human angiotensinconverting enzyme 2(hACE2) (Ou et al., 2020; Walls et al., 2020) . The third viral RNA was from a different sample, and it translated into ORF1a polyprotein (accession number YP_009725295), a leader protein, from which the nonstructural proteins are derived.

In our differential expression analysis of the human transcriptome, we opted to limit the analysis to the samples from the Xiong et al. (2020) study because (1) it includes PBMCs from healthy donors and patients with COVID-19 with and without RNA viral sequences and (2) to exclude possible batch effects across RNA-Seq datasets generated from different studies. Our analysis shows a clear separation between the PBMC transcriptomes from healthy donors and patients with COVID-19 ( Figs. 1 and 3 ). The analysis also shows a further separation within the COVID-19 PBMC samples between samples without detected viral RNA and the sample with viral RNA, suggesting that the PBMC transcriptome with the presence of the viral RNA differs from that with no detected viral RNA sequences.

Although we do not reject the possibility of crosscontamination or barcode bleeding (Kircher et al., 2012; Mitra et al., 2015) for detecting the viral RNA in two PBMC RNA-Seq samples, such possibility is unlikely, given that control samples had no hits to SARS-CoV-2 RNA. A more likely possibility is that SARS-CoV-2 is being sampled by antigen-presenting cells (most likely dendritic cells) or presented to T lymphocytes, which are in the PBMC population. Because such an event (antigen presentation) may be relatively difficult to detect, this explains why it was detected in only 2 of 118 datasets.

One more possibility, which needs many more samples to consider, is that SARS-CoV-2 may be transiently or coincidentally internalized by one of the mononuclear cell types, suggesting a mechanism for the chronicity of the SARS-CoV-2 infection. This hypothesis requires further testing. However, in light of our analysis, it is difficult to support the early reports that SARS-COV-2 targets T lymphocytes in vivo, as suggested earlier, in a correspondence, based on cell culture experiment with pseudotyped viruses (Wang et al., 2020b) .

It is important to emphasize that, after 1.5 years of the start of the COVID-19 pandemic, a scientific consensus has been reached that this disease is not merely respiratory, but is a systemic disease involving multiple organs. Our results about the paucity of detectable SARS-CoV-2 RNA in peripheral blood cells are not contradictory to the systemic nature of the disease or the possibility of viral transport through the circulatory system. What the study reports, quite specifically, is that PBMCs are not specifically targeted by the virus. This finding is supported by reports on the lack of ACE2 expression by those immune cells (Hamming et al., 2004; Salamanna et al., 2020) .

Although publicly available SARS-CoV-2 genomic data have accumulated to a historical record (>2 million genome sequences), blood transcriptome and RNA-seq data remain insufficient. With more data becoming publicly available, it will be possible to revisit this hypothesis and others to improve our understanding of the progression, replication, chronicity, and recurrence of SARS-CoV-2 in infected individuals.

Basic local alignment search tool

Systems biological assessment of immunity to mild versus severe COVID-19 infection in humans

Detection of novel coronavirus (SARS-CoV-2) RNA in peripheral blood specimens

Multidimensional metrics for estimating phage abundance, distribution, gene density, and sequence coverage in metagenomes

Identification of viral pathogen diversity in sewage sludge by metagenome analysis

Metagenomic analyses of an uncultured viral community from human feces

Features, evaluation and treatment coronavirus (COVID-19)

Topp-Gene suite for gene list enrichment analysis and candidate gene prioritization

Fastp: An ultra-fast all-in-one FASTQ preprocessor

GEN-CODE reference annotation for the human and mouse genomes

Tissue distribution of ACE2 protein, the functional receptor for SARS coronavirus. A first step in understanding SARS pathogenesis

Clinical features of patients infected with 2019 novel coronavirus in Wuhan

Double indexing overcomes inaccuracies in multiplex sequencing on the illumina platform

Severely ill COVID-19 patients display impaired exhaustion features in SARS-CoV-2-reactive CD8 T cells

Fast and accurate short read alignment with burrows-wheeler transform

SARS-CoV-2 and viral sepsis: Observations and hypotheses

Is mother-to-infant transmission the most important factor for persistent HBV infection?

Moderated estimation of fold change and dispersion for RNA-Seq Data with DESeq2

Genomic characterisation and epidemiology of 2019 novel coronavirus: Implications for virus origins and receptor binding

Platelet gene expression and function in patients with COVID-19

The genome sequence of the SARS-associated coronavirus

Strategies for achieving high sequencing accuracy for low diversity samples and avoiding sample bleeding using illumina platform

Traces of SARS-CoV-2 RNA in the blood of COVID-19 patients. Infectious Diseases (except HIV/AIDS)

The blood DNA virome in 8,000 humans

Characterization of spike glycoprotein of SARS-CoV-2 on virus entry and its immune cross-reactivity with SARS-CoV

Salmon provides fast and bias-aware quantification of transcript expression

SARS-CoV-2 can be detected in urine, blood, anal swabs, and oropharyngeal swabs specimens

Management of critically ill adults with COVID-19

Flawed methods in 'COVID-19: Attacks the 1-beta chain of hemoglobin and captures the porphyrin to inhibit human heme metabolism

Characterization of a novel coronavirus associated with severe acute respiratory syndrome

Minimum information about an uncultivated virus genome (MIUViG)

Body localization of ACE-2: On the trail of the keyhole of SARS-CoV-2

GISAID: Global initiative on sharing all influenza data-from vision to Reality

Sambamba: Fast processing of NGS alignment formats

Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein

Detection of SARS-CoV-2 in different types of clinical specimens

SARS-CoV-2 infects T lymphocytes through its spike protein-mediated membrane fusion

Detection of dengue virus replication in peripheral blood mononuclear cells from dengue virus type 2-infected patients by a reverse transcription-real-time PCR assay

A single-cell atlas of the peripheral immune response in patients with severe coVID-19

Transcriptomic characteristics of bronchoalveolar lavage fluid and peripheral blood mononuclear cells in COVID-19 patients

The effect of SARS coronavirus on blood system: Its clinical findings and the pathophysiologic hypothesis

Isolation of a novel coronavirus from a man with pneumonia in Saudi Arabia

Longitudinal transcriptome analyses show robust T cell immunity during recovery from COVID-19

An early article representing part of this work, analyzing one of the six datasets, has been released as a preprint (Moustafa and Aziz, 2020) 

No competing financial interests exist. The authors have no connection with or interest in any of the analyzed datasets and have no direct links to the authors who generated those datasets.

A.M. was funded by the American University in Cairo (AUC) for establishing the AUC Centennial Laboratory for Bioinformatics and Integrative Genomics. R.K.A. was funded by the Cairo University anti-COVID task force and also by the Academy of Scientific Research and Technology (ASRT) JESOR project #3046 (Center for Genome and Microbiome Research). The funders were not involved in the article, nor did they interfere with the publication process.