key: cord-0713427-h184u2x2 authors: Chen, X.; Lin, Y.; Wu, T.; Xu, J.; Ma, Z.; Sun, K.; Li, H.; Luo, Y.; Zhang, C.; Chen, F.; Wang, J.; Kuo, T.; Li, X.; Geng, C.; Lin, F.; Huang, C.; Hu, J.; Yin, J.; Liu, M.; Tao, Y.; Zhang, J.; Ou, R.; Xiao, F.; Yang, H.; Xu, X.; Fu, S.; Jin, X.; Jiang, H.; Chen, R. title: Time-series plasma cell-free DNA analysis reveals disease severity of COVID-19 patients date: 2020-06-09 journal: nan DOI: 10.1101/2020.06.08.20124305 sha: 0a256d2a62ed04438adbb9f406fd7522b80f8c5f doc_id: 713427 cord_uid: h184u2x2 Symptoms of coronavirus disease 2019 (COVID-19) range from asymptomatic to severe pneumonia and death. Detection of individuals at high risk for critical condition is crucial for control of the disease. Herein, for the first time, we profiled and analyzed plasma cell-free DNA (cfDNA) of mild and severe COVID-19 patients. We found that in comparison between mild and severe COVID-19 patients, Interleukin-37 signaling was one of the most relevant pathways; top significantly altered genes included POTEH, FAM27C, SPATA48, which were mostly expressed in prostate and testis; adrenal glands, small intestines and liver were tissues presenting most differentially expressed genes. Our data thus revealed potential tissue involvement, provided insights into mechanism on COVID-19 progression, and highlighted utility of cfDNA as a noninvasive biomarker for disease severity inspections. points within 29 days of hospitalization; plasma cfDNA was extracted and sequenced to ~12x human haploid genome coverage for each time point (Fig. S1 and Table S2 ). We investigated gene expression pattern in cfDNA via analyzing the sequencing depth-normalized, relative coverage of the promoters. Gene set enrichment analysis was performed using the relative coverage around promoters of known genes on the two time-course data sets and controls (Fig. 1) . Six gene clusters showing significantly different patterns between mild and severe cases were identified ( Fig. 1A and Fig. 1C) . Notably, the average coverage around gene promoters from cluster 2 and 6 decreased along hospitalization time line for the severe case (suggesting upregulation of these genes), while such pattern did not exist in mild case (Fig. 1A) , indicating that the genes involved in disease course could be different in mild and severe cases. Pathway analysis was carried out on genes of the above six clusters separately (Fig. 1B) . For cluster 1 containing the maximum number of genes among the six clusters, the most enriched pathway is interleukin-37 (IL-37) signaling (p = 0.005, Table S3 ), which is involved in suppression of cytokine production and inflammation inhibition (19, 20) .Interestingly, this pathway had been reported to have potential therapeutic effect on COVID-19 (21) . Cluster 2 shows a clear trend of increased expression in the severe case, especially at the last three time points; significantly enriched pathways in this cluster include cytosolic sulfonation of small molecules (p = 0.009) and RUNX1 regulates genes involved in megakaryocyte differentiation and platelet function (p = 0.01), which is in accordance with the dropping platelet counts from clinical laboratory records of the severe case (Table 1 ). Cluster 6, which showed similar trend with cluster 2, involves Ub-specific processing proteases (p = 2.8×10 -9 ) and deubiquitination pathways (p = 3.3×10 -9 ), which are commonly hijacked by viruses for replication and pathogenesis, and were reported containing druggable targets to treat COVID-19 (22) . Other enriched pathways include olfactory signaling pathway (p = 0.003), G alpha (s) signaling events (p = 0.006), signal regulatory protein family interactions (p = 4.5×10 -7 ), and cell-cell communication (p = 2.3×10 -4 ), which are expected given inflammatory and immune responses in severe COVID-19. To detect differentiated genes and tissues between mild and severe cases, relative coverages around promoters from cfDNA 1) within control group, 2) among 4 time points from mild case, 3) among 4 time points from severe case, 4) between mild and severe cases, 5) between cases and controls, were compared using Mann-Whitney rank-sum tests (Fig. 2) . We laid emphasis on differences between mild and severe cases, while not significant either between cases and controls, or within controls. Table 2 lists top 10 genes out of all significantly differentially expressed genes ( Fig. 2A and Table S4 ), and the coverage pattern around the promoters of the top 10 genes were shown in Fig. S2 . Strikingly, the top 3 genes, namely POTEH (Prostate, Ovary, Testis-Expressed Protein On Chromosome 22), MGC39584 (Family With Sequence Similarity 27 Member C), and C7orf72 (Spermatogenesis Associated 48) are all highly linked to male gender: POTEH and MGC39584 are specifically expressed in prostate and testis, and C7orf72 is associated to spermatogenesis (23) . The significant difference of genes related to male reproductive system are corroborated by previous studies that claimed spermatogonia cells to be targets attacked by SARS-CoV-2 (24,25). Other genes worth noting include SQLE that catalyzes oxidation of squalene, a reported conjugate of therapeutic drugs for COVID-19 (26), . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 9, 2020. . (Severe4) for severe case, and data from two control individuals. Color scale represents average coverage around TSSs of each gene weighted by average whole-genome sequencing depth from cfDNA. 6 clusters with differentiated pattern between mild and severe cases were marked and corresponding gene lists and relative coverage for each data set were shown (C). Top significant . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 9, 2020. . pathways based on genes of the 6 clusters were illustrated and mapped to six different functional modules (B). (A) Meta p values of rank-sum tests on coverage distribution in 200bp TSSs of genes on whole genome for case and control (pink), mild and severe (green), and control (blue). Name of genes with -log10(p value) greater than 6 within each group were marked. (B) Meta p values of genes that were expressed specifically in 28 tissues from 1) controls, 2) mild time series, 3) severe time series, 4) mild and severe cases, 5) cases and controls, from top to bottom. (C) Percent of significant genes marked in (A) within each tissue from (B) were summarized for control (green), case and control (pink), mild and severe (blue), severe (grey). Tissues were sorted based on percent of significant genes between case and control. is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 9, 2020. . plasma collected at 4 time points for mild (light blue) and severe (dark blue) cases, and at once for the two cases (green, time for dotted lines are invented only for comparison). . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 9, 2020. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 9, 2020. Materials and Methods Fig. S1 to S3 Tables S1 to S8 References . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 9, 2020. . https://doi.org/10.1101/2020.06.08.20124305 doi: medRxiv preprint A total of 10 samples were collected from 2 patients with COVID-19 at 4 time points and 2 health controls. Leftover surplus blood was collected from participants who had signed consent forms after clinical diagnosis. Peripheral blood was stored using EDTA anticoagulant-coated tubes. The blood samples pretreatment and DNA extraction were proceeded at a Biosafety Level 2 (BSL-2) laboratory to ensure the appropriate biosafety practices (1). All samples were centrifuged at low speed (3000 rpm) for 10 min at 4°C within six hours after collection. The supernatant was centrifuged at high speed (1,4000 rpm) for 10 min at 4°C. Then the plasma was set at 56°C water bath for 30 minutes. Circulating cfDNA was extracted from 200uL plasma using MagPure Circulating DNA Mini KF Kit (MD5432-02) following the manufacturer's guide. The cfDNA was eluted by 200uL TE buffer for QC and 40uL for the rest. For cfDNA library construction, the extracted cfDNAs were processed to library using MGIEasy Cell-free DNA Library Prep kit (MGI, cat. No.: AA00226). For upstream data processing, firstly, soapnuke (version 1.5.0) (2) was used for trimming the sequencing adapters from raw reads, and filtering low quality and high ratio N. N rate threshold was set to 0.1, low quality rate was set to 0.5, and low quality threshold, namely the max mismatch number when match the adapter to 2 and the clean data quality system to sanger, was set to 12. Secondly, BWA (version 0.7.17-r1188) (3) was used to map reads against the human reference genome (build hg38). After sorting and removing duplication of the aligned reads, variants were detected by Haplotype caller from GATK (4). The above steps were performed by sention (5), a platform for processing of genomic data that combined alignment, variants calling, and quality recalibration together efficiently. Firstly, for each gene, average sequencing depth within 2kb region around transcription start sites (2kb-TSSs) and 200bp region around TSSs (200bp-TSSs) were calculated respectively by DepthOfCoverage package from GATK (4). The relative coverage around TSS was the above depth normalized by average depth of WGS data from each sample. Secondly, for each gene, Si = Dmaxi -Dmini was calculated, where Dmaxi represented the highest depth among group i within 2kb-TSSs/200bp-TSSs and Dmini represented the lowest depth among group i within 2kb-TSSs/200bp-TSSs. Genes with Scontrols > Scases were filtered, and remained genes were ranked by Scases. Gene set enrichment analysis was performed on the top 1% ranked genes by the heatmap package from R version 3.5.1, and clusters of genes were selected from dendrogram output by heatmap package. Pathway analysis was carried out by Reactom (6) on the 6 clusters of genes separately (Table S2) . Results based on 2kb-TSS regions were presented in Fig. 1 , and results based on 200bp-TSS regions were summarized in Fig. S3 . To test difference among different groups, including 1) Gcontrol: 2 samples from 2 control individuals, 2) Gmild: 4 samples from 1 mild case at 4 time points, 3) Gsevere: 4 samples from 1 severe case at 4 time points, 4)Gmild,severe: 4 mild and 4 severe samples, 5) Gcase,control: 2 controls and 8 cases, reads start counts, namely counts of starts of sequencing reads pairs, were calculated within 200bp region around transcription start site (TSS) of each gene. For the 2 samples from 2 control individuals and the 8 samples from 4 cases collected at 4 time points, pairwise Wilcox's Rank-sum tests were carried out by wilcox package from R version 3.5.1, and corresponding p values, including Gcontrol∈{pcontrol1,control2}, Gmild∈{pmild1,mild2, pmild1_mild3, pmild1_mild4, pmild2_mild3, pmild2,mild4, pmild3_mild4}, Gsevere ∈{psevere1,severe2, psevere1,severe3, psevere1,severe4, psevere2,severe3, psevere2,severe4, psevere3,severe4}, Gmild,severe∈{pmild1,severe1, pmild1,severe2, pmild1,severe3, pmild1,severe4, pmild2,severe1, pmild2,severe2, pmild2,severe3, pmild2,severe4, pmild3,severe1, pmild3,severe2, pmild3,severe3, pmild3,severe4, pmild4,severe1, pmild4,severe2, pmild4,severe3, pmild4,severe4}, Gcase,control∈{pcontrol1,mild1, pcontrol1,mild2, pcontrol1,mild3, pcontrol1,mild4, pcontrol1,severe1, pcontrol1,severe2, pcontrol1,severe3, pcontrol1,severe4, pcontrol2,mild1, pcontrol2,mild2, pcontrol2,mild3, pcontrol2,mild4, pcontrol2,severe1, pcontrol2,severe2, pcontrol2,severe3, pcontrol2,severe4,}, were generated. For each group of p values above, a combined p value was calculated using Wilkinson's method (7) . Cutoff for selection of significantly differentially expressed genes of each group was set to < 0.05 , namely, < 1.6 × 10 −6 . Tissue specificity was analyzed based on grouped meta p values above. 23,570 transcripts that are expressed specifically from 28 tissues were selected based on GTEx database (8, 9) . . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 9, 2020. . The sequences remaining after removal of reads identified as human were aligned to microbial genome databases, contains 4153 viral and 2328 bacterial genomes, 208 fungi and 79 parasites genomes, coronavirus 2 (SARS-CoV-2) genome downloaded from NCBI, NC_045512.2. For identifying a positive microbe, we used the methodological criteria according to (10) (11) (12) , focused on a). Coverage rate; b). Species level sequencing depth; c). Species level relative abundance and d). Unique mapped reads numbers. Concentration of mitochondrial cfDNA was calculated as count of cfDNA fragments (sequencing reads pairs) that were uniquely mapped to mitochondrial DNA divided by total count of cfDNA fragments that were uniquely mapped to whole genome (Build hg38). . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 9, 2020. . https://doi.org/10.1101/2020.06.08.20124305 doi: medRxiv preprint . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 9, 2020. . Relative coverage was calculated as coverage of cfDNA fragments weighted by average sequencing depth of each sample. Top 10 significant genes were ranked by p values from top to bottom. Regions within the grey rectangle were used for significant test. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 9, 2020. . https://doi.org/10.1101/2020.06.08.20124305 doi: medRxiv preprint (Table S8) . . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 9, 2020. . . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 9, 2020. . . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 9, 2020. . Table S3 . Results on pathway analysis of the 6 clusters from coverage around 2kb-TSSs from cfDNA of one mild case, one severe case, and two controls 1 . . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 9, 2020. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 9, 2020. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 9, 2020. . https://doi.org/10.1101/2020.06.08.20124305 doi: medRxiv preprint . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 9, 2020. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 9, 2020. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 9, 2020. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 9, 2020. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 9, 2020. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 9, 2020. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 9, 2020. . . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 9, 2020. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 9, 2020. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 9, 2020. . https://doi.org/10.1101/2020.06.08.20124305 doi: medRxiv preprint . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 9, 2020. . https://doi.org/10.1101/2020.06.08.20124305 doi: medRxiv preprint . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 9, 2020. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 9, 2020. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 9, 2020. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 9, 2020. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 9, 2020. . https://doi.org/10.1101/2020.06.08.20124305 doi: medRxiv preprint China Novel Coronavirus Investigating and Research Team, A Novel Coronavirus From Patients With Pneumonia in China Asian Critical Care Clinical Trials Group, Intensive care management of coronavirus disease 2019 (COVID-19): challenges and recommendation Inferring expressed genes by whole-genome sequencing of plasma DNA Cell-free DNA comprises an in vivo nucleosome footprint that informs its tissues-of-origin Plasma DNA Tissue Mapping by Genome-Wide Methylation Sequencing for Noninvasive Prenatal, Cancer, and Transplantation Assessments Orientation-aware Plasma Cell-Free DNA Fragmentation Analysis in Open Chromatin Regions Informs Tissue of Origin Targeting potential drivers of COVID-19: Neutrophil extracellular traps Neutrophil extracellular traps in COVID-19 IL-37: a new anti-inflammatory cytokine of the IL-1 family Interleukin-37: The Effect of Anti-Inflammatory Response in Human Coronary Artery Endothelial Cells Induction of pro-inflammatory cytokines (IL-1 and IL-6) and lung inflammation by COVID-19 antiinflammatory strategies Spermatogenesis-associated 48 Is Essential for Spermatogenesis in Mice Laboratory biosafety guidance related to coronavirus disease 2019 (COVID-19): interim guidance SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data Fast and accurate short read alignment with Burrows-Wheeler transform The Genome Analysis Toolkit: a MapReduce framework for analyzing nextgeneration DNA sequencing data Sentieon DNASeq variant calling workflow demonstrates strong computational performance and accuracy The Reactome Pathway Knowledgebase A statistical consideration in psychological research The Genotype-Tissue Expression (GTEx) project Microbiological diagnostic performance of metagenomic next-generation sequencing when applied to clinical practice Validation of metagenomic Next-Generation sequencing tests for universal pathogen detection Metagenomic sequencing detects respiratory pathogens in hematopoietic cellular transplant patients