key: cord-0919738-5c8fbpk9 authors: Chen, Chen; Li, Jizhou; Di, Lin; Jing, Qiuyu; Du, Pengcheng; Song, Chuan; Li, Jiarui; Li, Qiong; Cao, Yunlong; Xie, X. Sunney; Wu, Angela R.; Zeng, Hui; Huang, Yanyi; Wang, Jianbin title: MINERVA: A facile strategy for SARS-CoV-2 whole genome deep sequencing of clinical samples date: 2020-05-26 journal: bioRxiv DOI: 10.1101/2020.04.25.060947 sha: 332d33830d322100fb202152c5595cba393596ba doc_id: 919738 cord_uid: 5c8fbpk9 The novel coronavirus disease 2019 (COVID-19) pandemic poses a serious public health risk. Analyzing the genome of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) from clinical samples is crucial for the understanding of viral spread and viral evolution, as well as for vaccine development. Existing sample preparation methods for viral genome sequencing are demanding on user technique and time, and thus not ideal for time-sensitive clinical samples; these methods are also not optimized for high performance on viral genomes. We have developed MetagenomIc RNA EnRichment VirAl sequencing (MINERVA), a facile, practical, and robust approach for metagenomic and deep viral sequencing from clinical samples. This approach uses direct tagmentation of RNA/DNA hybrids using Tn5 transposase to greatly simplify the sequencing library construction process, while subsequent targeted enrichment can generate viral genomes with high sensitivity, coverage, and depth. We demonstrate the utility of MINERVA on pharyngeal, sputum and stool samples collected from COVID-19 patients, successfully obtaining both whole metatranscriptomes and complete high-depth high-coverage SARS-CoV-2 genomes from these clinical samples, with high yield and robustness. MINERVA is compatible with clinical nucleic extracts containing carrier RNA. With a shortened hands-on time from sample to virus-enriched sequencing-ready library, this rapid, versatile, and clinic-friendly approach will facilitate monitoring of viral genetic variations during outbreaks, both current and future. As of May 22, 2020, the ongoing COVID-19 viral pandemic has affected 5 million people 55 in over 200 countries and territories around the world, and has claimed more than 320 56 thousand lives 1 . Closely monitoring the genetic diversity and distribution of viral strains at 57 the population level is essential for epidemiological tracking, and for understanding viral 58 evolution and transmission; additionally examining the viral heterogeneity within a single 59 individual is imperative for diagnosis and treatment 2 . The disease-causing pathogen, 60 severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was identified from 61 early disease cases and its draft genome sequenced within weeks, thanks to the rapid 62 responses from researchers around the world 3-6 . The initial SARS-CoV-2 draft genome 63 was obtained independently from the same early COVID-19 patient samples using Recently, we reported a new RNA-seq library construction strategy that aims to address 99 some of these challenges: SHERRY avoids the problematic dsDL step in library 100 construction by taking advantage of the newly discovered Tn5 tagmentation activity on 101 RNA/DNA hybrids, to directly tag RNA/cDNA fragments with sequencing adapters 10 . As 102 such, SHERRY has minimal sample transfers and greatly reduced hands-on time, making 103 it simple, robust, and suitable for inputs ranging from single cells to 200 ng total RNA. We 104 now combine the advantages of a tailored SHERRY protocol, which improved coverage 105 of whole metatranscriptome, with a simplified post-library target enrichment protocol. MetagenomIc RNA EnRichment VirAl sequencing or MINERVA, is an easy-to-use, 107 versatile, scalable, and cost-effective protocol that yields high-coverage high-depth 108 SARS-CoV-2 genome, while preserving the sample's rich metatranscriptomic profile. The Metagenomic RNA enrichment viral sequencing (MINERVA). To analyze both 120 metagenomics and SARS-CoV-2 genetics from COVID-19 patient samples, we 121 developed a two-stage metagenomic RNA enrichment viral sequencing strategy termed 122 MINERVA (Fig. 1A) to address this, we used 10 ng mouse 3T3 cell total RNA as starting material, and tested 128 whether adding random decamers (N10) during RT could improve coverage evenness 129 (Fig. S2) . Compared with the standard SHERRY protocol, which uses 1 µM T30VN primer 130 during RT, the supplement of 1 µM N10 indeed improves gene body coverage evenness, 131 presumably by improving the RT efficiency. When the N10 concentration was further 132 increased to 10 µM, we observed almost no coverage bias in the gene body. The high 133 N10 concentration can result in an increased rRNA ratio in the product, sometimes as 134 high as 90%, but MINERVA employs rRNA removal as the first step prior to RT, thus 135 negating this problem. We also performed enzyme titration with homemade and 136 commercial Tn5 transposomes. Based on these N10 and Tn5 titration results, we used 137 10 µM N10 during RT and 0.5 µl V50 for each 20-µl tagmentation reaction in all following 138 experiments. The whole procedure from nucleic acid to metagenomic sequencing-ready 139 library, including wait time, takes 5.5 hours (Fig. S1) . These patients were admitted to Ditan Hospital within a three-month period from January 152 to April 2020, presenting different symptom severity ( Fig. 1B and Table S1-S3). Some 153 patients were re-sampled longitudinally to investigate temporal and intra-host viral 154 heterogeneity. We first tested the effect of sample input volume on MINERVA results. Using just 2.7-ul of sample input led to satisfactory SARS-CoV-2 coverage, and scaling 156 up the reaction volume to 5.4-ul further improved the MINERVA data quality (Fig. 1C) . 157 Using the same samples and at the same sequencing depth, more input in a higher 158 reaction volume generated deeper SARS-CoV-2 genome coverage. Gbp for each MINERVA library, and nearly 100 Gbp for each dsDL library (Fig. S3) . The 172 metagenomic compositions of SHERRY and dsDL libraries were comparable: total virus, 173 fungus, and bacteria ratios were highly concordant between the two methods (Fig. S4) ; 174 bacterial heterogeneity as measured by entropy is also correlated between the two. along PC1 ( Fig. 2A) , and this is reflected in analysis at both the genus and species levels, 182 conveying the unique microbial environment of this body site. This phenomenon is most 183 prominently reflected by the bacterial composition, but is also somewhat reflected in the 184 viral composition (Fig. S5) . We then identified the specific microbes that drive this 185 separation of sample types, and found some microbes to be body site-specific. For in known pathogenic species such as Candida, which is only found in orally obtained 189 samples ( Fig. S5) . There also appears to be separation between samples by COVID-19 190 symptom severity along PC2 ( Fig. 2A) , which is supported by our analysis of specific 191 microbial species (Fig. 2B) . We found the bacterial metagenomic signature could be used with non-critical patients ("mild", "moderate", and "severe" groups) (Fig. 2C) . The 203 Shannon Diversity Index, however, is similar for all disease severities, indicating that 204 although the overall bacterial abundance is reduced in critical patients, the relative 205 abundance of different species remains stable. Interestingly, this phenomenon appears 206 to also correlated with patient age -In elderly patients above the age of 60, both bacterial 207 ratio and species richness are significantly reduced in critical patients as compared to 208 8 non-critical; this trend is not observed in patients younger than 60 years of age (Fig S6) . 209 To further assess the relationship between bacterial metagenomic composition and 210 disease severity, we calculated the pairwise Bray-Curtis similarity for pharyngeal swab 211 samples, and found that mild, moderate, and severe patient samples are clustered in critical patients is significantly lower (Fig. 2E ). This effect is partly contributed by a 228 greater abundance of SARS-Cov-2 sequences, which could then lead to a lower Shannon 229 Index as signals from low abundance viruses could be drowned out. Intriguingly, several 230 other viral families display increased abundance in critical patients as compared to non-231 critical (Fig. 2F) , including many dsDNA viruses that are known to establish latency such 232 as herpesviruses and papillomaviruses. One speculation is that the correlation of 233 abundance of these viral families with disease severity could be due to reactivation of 234 latent viruses under immunosuppressive medication, changes in host immune activity, or 235 direct SARS-CoV-2 activity 15-17 . The effect of age that we observed for bacteria is less 236 pronounced in viruses (Fig. S8) , and is only observed for some viral families. Additional For severe viral pneumonia, co-infections can greatly affect patient outcomes 18-20 . One 241 recent study has showed that 50% of patients with COVID-19 who have died in this 242 pandemic had secondary bacterial infections 21 . By surveying the metagenomic landscape 243 of these samples, we observed several patient samples with exceptionally high 244 abundance of known pathogens, which could indicate a co-infection with SARS-Cov-2 in 245 those patients. We found ten cases of S. aureus and nine cases of C. albicans co-246 infections, and the rate of co-infection for both pathogens is generally correlated with 247 disease severity (Fig. 2G) . genome coverage than dsDL methods ( Fig. 3F and 3G) , and this advantage is more 265 pronounced for low viral load samples, including two samples with negative qPCR results, 266 and stool samples. By studying the relationship between SARS-CoV-2 qPCR results and 267 read ratio, we identified two groups of samples that resulted in low SARS-CoV-2 genome 268 coverage when processed using dsDL (Fig. 3H ). The first group had low SARS-CoV-2 269 read ratio, which prohibited the acquisition of enough SARS-CoV-2 sequencing reads. Ct values and read ratio, suggesting these samples had low total nucleic acid amount. Since dsDL approaches are less sensitive and require more input, this may explain why 273 MINERVA outperforms dsDL most evidently in stool samples. constructed a SARS-CoV-2 mutational profile (Fig. 4A) , which was distinct from the (Fig. 4D) . Summarily, despite the asymptomatic phenotype of some infected SNVs (iSNVs) detected in many samples (Fig. 4A) suggest that SARS-CoV-2 is rapidly 318 evolving within the hosts 2 . Through longitudinal sampling, we confirmed that iSNVs were 319 generally relatively stable across time and body sites (Fig. S10) , but found that some 320 patients harbored greater variations in iSNVs (Fig. 4E) . For P40 and P41, iSNVs were 321 stable within the same sample type across time, but varied across different sample types. immunity, and disease progression. We also showed that MINERVA metagenomic profiles 391 can identify potential co-infections of bacteria, fungi, and other viruses, which is 392 challenging to do with conventional approaches. In our samples, we found a co-infection 393 rate of ~ 20% (16/79 patients), which is higher than the rate reported by one secondary 394 14 study of COVID-19 co-infections 35 . In this secondary study, although they found 8% of 395 patients experiencing bacterial/fungal co-infection, the rate of broad spectrum antibiotic 396 use for COVID-19 patients is much higher (72%). It is well-known that co-infections in 397 severe pneumonia can greatly affect patient outcomes 19,20 , and it is estimated that 50% The laboratory tests and host immunity of COVID-19 patients with 464 different severity of illness The trinity of 467 COVID-19: immunity, inflammation and intervention Novel' Triggers of Herpesvirus Reactivation and Their 470 Immunosuppression Facilitates 473 the Reactivation of Latent Papillomavirus Infections Immunosuppression for hyperinflammation in 476 COVID-19: a double-edged sword? Co-infections: potentially lethal 479 and unexplored in COVID-19. The Lancet Microbe Online Publication Epidemiology, Co-Infections, and Outcomes of Viral 481 Pneumonia in Adults Bacterial and viral co-infections complicating severe influenza: 484 Incidence and impact among 507 U.S. patients Clinical course and risk factors for mortality of adult inpatients with 487 COVID-19 in Wuhan, China: a retrospective cohort study Genomic epidemiology of SARS-CoV-2 in Guangdong Province On the origin and continuing evolution of SARS-CoV-2. National 492 Science Review Online Publication No evidence of SARS-CoV-2 in semen of males recovering from 494 COVID-19 Clinical Characteristics and Results 497 of Semen Tests Among Men With Coronavirus Disease Presumed Asymptomatic Carrier Transmission of COVID-19 Covid-19 -Navigating the Uncharted This study was approved by the Ethics Committee of Beijing Ditan Hospital, Capital 650 Medical University 25 µl home-made pTXB1) was performed in tagmentation procedure. In 662 all tests, 10 ng 3T3 total RNA was used, and all reagents except for N10 or Tn5 663 transposome remain unchanged. All libraries were sequenced on Illumina NextSeq 500 664 with 2x75 paired-end mode. Clean data was aligned to GRCm38 genome and known 665 transcript annotation using 91 patients were enrolled in this study according 671 to the 7th guideline for the diagnosis and treatment of COVID-19 from the National Health 672 Commission of the People's Republic of China. All patients, diagnosed with COVID-19, 673 were hospitalized in Beijing Ditan Hospital and classified into four severity degrees, mild moderate, severe, and critical illness, according to the guideline. We collected 143 675 samples (60 pharyngeal swabs, 52 sputum samples, 25 stool samples, and 6 semen For all the clinical samples, nucleic acids extraction was performed in a BSL-3 laboratory Total RNA 681 was extracted using QIAamp Viral RNA Mini Kit (Qiagen) following the manufacturer's 682 instructions. In most samples (79 out of 85) we specifically omitted the use of carrier RNA 683 due to its interference on the most prevalent sample preparation protocols for high-684 throughput sequencing. After nucleic acids extraction, rRNA was removed by rDNA probe 685 hybridization and RNase H digestion The final elution volume was 687 12-20 µl for each sample. For carrier RNA removal tests, 1.7 µg polyA carrier RNA was 688 spiked into 18 µl of elute from QIAamp Viral RNA Mini Kit. To remove the carrier RNA from 689 these spike-in samples and other samples extracted with carrier RNA dsDL Metagenomic RNA library construction and sequencing 693 The libraries were constructed using MGIEasy reagents (BGI, China) following 694 manufacture's instruction. The purified RNA, after rRNA depletion and DNA digestion, 695 underwent reverse transcription, second strand synthesis, and sequencing adaptor 696 ligation. After PCR amplification 7 µl RNA from rRNA and DNA removal reaction was used for standard SHERRY 701 reverse transcription, with the following modifications: 1) 10 pmol random decamer (N10) 702 was added to improve coverage; 2) initial concentrations of dNTPs and oligo-dT (T30VN) 703 were increased to 25 mM and 100 µM, respectively. For 5.4 µl and 10.8 µl input, the entire 704 reaction was simply scaled up 2 and 4 folds, respectively. The RNA/DNA hybrid was 705 tagmented in TD reaction buffer P0756), and 1U/µl RNase inhibitor (TaKaRa, Cat.No. 2313B). The reaction was incubated at 55°C for 30 min. 20 µl tagmentation product was mixed with 20.4 µl M0492L), 0.4 µl SuperScript II reverse transcriptase, 710 and incubated at 42°C for 15 min to fill the gaps, followed by 70°C for 15 min to inactivate 711 Then index PCR was performed by adding 4 µl 10 712 µM unique dual index primers and 4 µl Q5 High-Fidelity 2X Master Mix, with the following 713 thermo profile: 98°C 30 s, 18 cycles of The PCR product was then purified with 0.8x VAHTS DNA Clean Beads These SHERRY libraries were sequenced on Illumina NextSeq 500 with 2x75 716 paired-end mode for metagenomic analysis For preparing MINERVA libraries through SARS-CoV-2 enrichment, 1µl SHERRY 718 metagenomic library was first quantified with N gene using quantitative PCR (F: 719 GGGGAACTTCTCCTGCTAGAAT dilution, then multiple libraries were pooled together based on qPCR results and 721 processed with TargetSeq One Cov Kit (iGeneTech, Cat.No.502002-V1) following 722 manufacturer's instruction. The iGeneTech Blocker was replaced by the IDT xGen 723 These MINERVA libraries were sequenced on Illumina 724 NextSeq 500 with 2x75 paired-end mode for deep SARS-CoV-2 analysis For metagenomic RNA-seq data, raw reads were quality controlled using BBmap (version 728 38.68) and mapped to the human genome reference (GRCh38) using STAR (version 729 2.6.1d) with default parameters. All unmapped reads were collected using samtools 730 (version 1.3) for microbial taxonomy assignment by Centrifuge (version 1.0.4) Bacterial Shannon diversity (entropy) was calculated at species level, and the species 736 abundance was measured based on total reads assigned at the specific clade normalized 737 by genome size and sequencing depth. Bacterial genus composition was analyzed based 738 on reads proportion directly assigned by Centrifuge. For dsDL sequencing data sampling was performed for each sample to obtain ~12M paired-end nonhuman reads, 740 which is the median of SHERRY datasets. Same workflow was performed as described 741 above for the removal of human reads and microbial taxonomy assignment Then we removed 745 duplicates from the primary alignment with Picard Tools v2.17.6. We used mpileup 746 function in samtools v1.10 to call SNP and InDel with parameter -C 50 -Q 30 -q 15 -E -d 747 0. We called mutation if the depth >=10 and strand bias >0.25. The strand bias is defined 748 as the value that minimum The sequencing data generated during this study have been uploaded to Sequencing Archive (PRJCA002533). However