key: cord-0733653-nd35wraw authors: Lu, Roujian; Niu, Peihua; Zhao, Li; Wang, Huijuan; Wang, Wenling; Tan, Wenjie title: Sequencing the Complete Genome of COVID-19 Virus from Clinical Samples Using the Sanger Method date: 2020-06-19 journal: China CDC Wkly DOI: 10.46234/ccdcw2020.088 sha: d3b328f799ee42beeb75e6811249a35429ebe3d0 doc_id: 733653 cord_uid: nd35wraw
What is already known on this topic? Coronavirus disease 2019 (COVID-19), a disease caused by a novel human coronavirus named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) or COVID-19 virus, was reported in December 2019. Complete genomes of the COVID-19 virus obtained from clinical samples using next generation sequencing (NGS) have been reported.
What is added by this report? Here we provide the technical data for sequencing the complete genome of the COVID-19 virus from clinical samples using the Sanger method. Two complete COVID-19 virus genome sequences (named WH19004-S and GX0002) were obtained from clinical samples of COVID-19 patients, and two single nucleotide polymorphisms (SNPs) in ORF7a (T/C, nt 27,493) and ORF8 (T/C, nt 28,253) of WH19004-S were identified by Sanger sequencing.
What are the implications for public health practice? The Sanger-based COVID-19 virus genome sequencing reported here can generate data of sufficiently high quality without expensive NGS equipment, which supports sequencing complete genomes from clinical samples and monitoring viral genetic variation in COVID-19 infections.
In December 2019, a novel coronavirus from patients with pneumonia was identified and subsequently named the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (1-3). SARS-CoV-2 has caused a coronavirus disease 2019 (COVID-19) pandemic with high morbidity and mortality.
Analyzing the genome of SARS-CoV-2 (also referred to as COVID-19 virus) from clinical samples is crucial for understanding viral spread and evolution as well as for vaccine development (4-6). At present, whole-genome sequences of the COVID-19 virus are often generated by next generation sequencing (NGS) (7). Although NGS methods have many advantages in terms of speed and parallelism, the accuracy and read length of Sanger sequencing are still superior, which has confined the use of NGS mainly to resequencing genomes (8). Here we introduce a detailed method to rapidly obtain the whole-genome sequence of the COVID-19 virus from clinical samples. The method is based on Sanger sequencing of multiple amplified nucleic acid fragments. We applied it to obtain 2 complete genome sequences of COVID-19 virus from clinical samples of patients with COVID-19.
In this study, bronchoalveolar lavage samples were collected from patients with COVID-19 in Hubei, China. The samples tested positive for COVID-19 virus RNA (Ct values: 28.78 and 31.86) by a real-time fluorescence-based reverse transcriptase polymerase chain reaction (rRT-PCR) assay as previously reported (7). Viral RNA was extracted from 140 μL of sample using QIAamp Viral mini kits (Qiagen, Germany) according to the manufacturer's instructions and eluted in 80 μL of elution buffer. A total of 38 sets of specific primers covering the whole COVID-19 virus genome were designed (Table 1) according to the reference sequence (WH19004, Accession ID: EPI_ISL_402120) obtained by NGS as previously reported (7). Overlapping fragments were obtained by RT-PCR as follows: 5 μL of extracted RNA was amplified with the QIAGEN OneStep RT-PCR Kit (Qiagen, Germany) using the program 50 ℃ for 30 min; 95 ℃ for 15 min; 40 cycles of 95 ℃ for 30 s, 50/55 ℃ for 30 s, and 72 ℃ for 1/2 min; and 72 ℃ for 5 min. All PCR products were confirmed by gel electrophoresis and sequenced using the Sanger method.
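As a rough illustration of the RT-PCR program described above, the following Python sketch adds up the programmed hold times for the two extension settings (1 or 2 min). It is a minimal sketch under stated assumptions, not the authors' tooling: the `Step` class and the `program_hold_seconds` function are hypothetical names, the step labels are an interpretation of a one-step RT-PCR program, and ramp times between temperatures are ignored, so a real run takes longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str      # interpretive label, not taken from the paper
    temp_c: int     # hold temperature in degrees Celsius
    seconds: int    # programmed hold time


def program_hold_seconds(extension_min: int, annealing_c: int = 50,
                         cycles: int = 40) -> int:
    """Sum the programmed hold times of the cycling program.

    Ramp times between steps are ignored (assumption), so this is a
    lower bound on the real run time.
    """
    pre = [
        Step("reverse transcription", 50, 30 * 60),
        Step("initial 95 C hold", 95, 15 * 60),
    ]
    cycle = [
        Step("denaturation", 95, 30),
        Step("annealing", annealing_c, 30),       # 50 or 55 C in the paper
        Step("extension", 72, extension_min * 60),  # 1 or 2 min in the paper
    ]
    post = [Step("final extension", 72, 5 * 60)]

    per_cycle = sum(s.seconds for s in cycle)
    return (sum(s.seconds for s in pre)
            + cycles * per_cycle
            + sum(s.seconds for s in post))


if __name__ == "__main__":
    for ext in (1, 2):
        minutes = program_hold_seconds(ext) / 60
        print(f"{ext} min extension: {minutes:.0f} min of programmed hold time")
```

Under these assumptions the program holds for 130 min with a 1 min extension and 170 min with a 2 min extension; the annealing temperature (50 or 55 ℃) does not change the total.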
The 5' and 3' ends of the genome were determined by rapid amplification of cDNA ends (RACE) using the Invitrogen 5' RACE System and 3' RACE System (Invitrogen, USA) according to the manufacturer's instructions. Gene-specific primers for 5' and 3' RACE
All sequencing fragments were assembled using DNAStar software. The open reading frames of the verified genome sequences were predicted using Geneious (version 11.1.5) and annotated using the Conserved Domain Database. The COVID-19 virus sequences were aligned with reference sequences using MAFFT (version 7.450). SNPs in each sequence were defined as sites that differ from the reference sequence.
Primers were designed across the entire genome to obtain overlapping amplicons of approximately 1,000-1,200 bp, giving a list of 38 primer pairs. In addition, 5' and 3' terminal sequencing primers were designed to obtain amplicons of 400-500 bp for sequencing (Table 1). All sequencing fragments were assembled using DNAStar software, and 2 complete sequences, named WH19004-S and GX0002, were obtained from the clinical samples (Figure 1).
A practical protocol for viral genome research on clinical samples is urgently needed to accelerate investigation of this virus and the disease it causes. In this study, we obtained 2 complete COVID-19 virus genome sequences, WH19004-S and GX0002, from clinical samples using the Sanger sequencing method. Although NGS is the current mainstream sequencing method and offers high throughput and speed, it also has drawbacks, such as relatively short reads. As a result, NGS lacks the capacity to link independent variations on the same nucleic acid molecule, so it is not well suited to discriminating alleles and phasing them to their respective parental homologs (9).
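The SNP definition given in the methods (a site that differs from the reference sequence) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes the two sequences have already been aligned (for example, two rows of a MAFFT alignment), `find_snps` is a hypothetical helper name, and the toy sequences are invented for the example rather than taken from WH19004.

```python
from typing import List, Tuple


def find_snps(reference: str, query: str,
              ref_start: int = 1) -> List[Tuple[int, str, str]]:
    """Return (reference_position, ref_base, query_base) for each substitution.

    Both inputs must be rows of the same alignment: equal length, with '-'
    marking gaps. Positions are counted in reference coordinates, so gap
    columns in the reference do not advance the position. Columns that
    contain a gap or an ambiguous base are skipped, not reported as SNPs.
    """
    if len(reference) != len(query):
        raise ValueError("aligned sequences must have the same length")

    snps = []
    pos = ref_start - 1
    for ref_base, query_base in zip(reference.upper(), query.upper()):
        if ref_base != "-":
            pos += 1  # advance only on real reference bases
        if ref_base in "ACGT" and query_base in "ACGT" and ref_base != query_base:
            snps.append((pos, ref_base, query_base))
    return snps


if __name__ == "__main__":
    # Toy aligned sequences (hypothetical, for illustration only).
    ref = "ATGTTCA-GGT"
    qry = "ATGCTCAAGAT"
    for position, ref_base, query_base in find_snps(ref, qry):
        print(f"nt {position}: {ref_base}/{query_base}")
```

On the toy input this reports substitutions at reference positions 4 (T/C) and 9 (G/A); the inserted base in the query is ignored because it is an indel rather than a single nucleotide substitution.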
In addition, the abundance of COVID-19 virus in clinical samples is often low, so conventional NGS requires deeper sequencing of each sample to obtain sufficient coverage and depth across the whole viral genome, which increases the time and cost of sequencing. By contrast, the Sanger method, one of the earliest sequencing methods, offers high accuracy and long reads and does not require expensive equipment. Sanger sequencing has been used to analyze genes for which NGS fails to achieve sufficient depth of coverage or to generate data of high enough quality, and it is also used to confirm NGS variants before they are clinically reported (10). The Sanger method is particularly suitable for general laboratories that have a standard PCR machine but lack an expensive NGS platform. In this study, we identified two SNPs in WH19004-S, in ORF7a (T/C, nt 27,493) and ORF8 (T/C, nt 28,253), by Sanger sequencing, compared with WH19004-NGS derived from NGS. The SNP in ORF7a of WH19004-S translates to two different amino acids (Ser or Pro). The roles of these SNPs in the genetic evolution of the COVID-19 virus, and whether they cause functional changes, need further investigation.
In summary, we report here a rapid, versatile, and clinic-friendly approach for sequencing the complete genome of COVID-19 virus from clinical samples using the Sanger method, which will facilitate monitoring of viral genetic variation during current and future outbreaks.
Conflict of interest: No conflicts of interest were reported.
Funding: This work was supported by the National Key Research and Development Program of China (2016YFD0500301).
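As an illustration of the Ser/Pro observation for the ORF7a SNP discussed above: in the standard genetic code all four TCN codons encode serine and all four CCN codons encode proline, so a single T/C difference that switches between Ser and Pro must sit at the first codon position. The Python sketch below shows this with a hypothetical codon; the paper does not report the actual codon at nt 27,493, so `TCA` is an assumption chosen only for the example, and `codon_variants` is a hypothetical helper name.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_variants(codon: str, offset: int, alleles: str) -> dict:
    """Translate the codon once per allele placed at `offset` (0, 1 or 2)."""
    if len(codon) != 3 or offset not in (0, 1, 2):
        raise ValueError("expected a 3-base codon and an offset of 0, 1 or 2")
    result = {}
    for allele in alleles:
        variant = codon[:offset] + allele + codon[offset + 1:]
        result[allele] = CODON_TABLE[variant.upper()]
    return result


if __name__ == "__main__":
    # Hypothetical codon: the real codon at the SNP site is not given in the paper.
    print(codon_variants("TCA", 0, "TC"))  # T/C at the first codon position
    print(codon_variants("TCA", 2, "TC"))  # T/C at the third codon position
```

With the hypothetical codon, T/C at the first position gives Ser (S) or Pro (P), whereas the same T/C difference at the third position leaves Ser unchanged.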
doi: 10.46234/ccdcw2020.088
1. A novel coronavirus outbreak of global health concern
2. Notes from the field: a novel coronavirus genome identified in a cluster of pneumonia cases-Wuhan
3. A novel coronavirus from patients with pneumonia in China
4. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China
5. Coronaviridae Study Group of the International Committee on Taxonomy of Viruses. The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2
6. Epidemiology, genetic recombination, and pathogenesis of coronaviruses
7. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
8. Deep sequencing: becoming a critical tool in clinical virology
9. The importance of phase information for human genomics
10. Sanger confirmation is required to achieve optimal sensitivity and specificity in next-generation sequencing panel testing