key: cord-0837412-f3pu4l9c
authors: Takenouchi, Toshiki; Iwasaki, Yuka W.; Harada, Sei; Ishizu, Hirotsugu; Uwamino, Yoshifumi; Uno, Shunsuke; Osada, Asami; Abe, Kodai; Hasegawa, Naoki; Murata, Mitsuru; Takebayashi, Toru; Fukunaga, Koichi; Saya, Hideyuki; Kitagawa, Yuko; Amagai, Masayuki; Siomi, Haruhiko; Kosaki, Kenjiro
title: Clinical Utility of SARS-CoV-2 Whole Genome Sequencing in Deciphering Source of Infection
date: 2020-10-24
journal: J Hosp Infect
DOI: 10.1016/j.jhin.2020.10.014
sha: 77745456f54e76b9dfe6292733816d678dd42630
doc_id: 837412
cord_uid: f3pu4l9c

COVID-19 caused by SARS-CoV-2 is a worldwide problem. From the standpoint of hospital infection control, determining the source of infection is critical. We conducted the present study to evaluate the efficacy of using whole genome sequencing to determine the source of infection in hospitalized patients who do not have a clear infectious contact history. Recently, we encountered two seemingly separate COVID-19 clusters in a tertiary hospital. Whole viral genome sequencing distinguished the two clusters according to the viral haplotype. However, the source of infection was unclear in 14 patients with COVID-19 who were clinically unlinked to clusters #1 or #2. These patients, who had no clear history of infectious contact within the hospital (“undetermined source of infection”), had haplotypes similar to those in cluster #2 but did not have two of the mutations used to characterize cluster #2, suggesting that these 14 cases of “undetermined source of infection” were not derived from cluster #2. Whole viral genome sequencing can be useful for confirming that sporadic COVID-19 cases with an undetermined source of infection are indeed not part of clusters at the institutional level.
Specimen collection and sample preparation

Residual nasopharyngeal swab specimens from subjects who tested positive on clinical RT-PCR testing were retrospectively collected and used in the present analysis. Total RNA was extracted from the specimens using the QIAamp MinElute Virus Spin Kit (Qiagen, Hilden, Germany), according to the manufacturer's instructions. The RNA was reverse transcribed to cDNA using a random hexamer primer and SuperScript III Reverse Transcriptase (Thermo Fisher Scientific Corporation). PCR-based amplification was performed using Artic ncov-2019 primers, version 3 [5], in two multiplex reactions according to the globally accepted "nCoV-2019 sequencing protocol" [6]. A sequencing library for amplicon sequencing was prepared using the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs). Paired-end sequencing was performed on the MiSeq platform (Illumina, CA). The FASTQ files were aligned to a reference sequence (Wuhan-Hu-1, MN908947.3) using the Burrows-Wheeler Aligner to generate BAM files [7, 8]. The BAM files were then processed with iVar, which soft-clips primer sequences from an aligned and sorted BAM file at the primer positions supplied in a BED file [9]. The quality of the genome sequencing data was evaluated using Qualimap [10]. The BAM files were processed with samtools and bcftools [11].

In the phylogenetic tree analysis, the data points from cluster #1 were distinct from the data points from cluster #2 (Figure 1). Cluster #1 appeared to have descended, at a relatively early stage, from the original SARS-CoV-2 of the Wuhan outbreak, whereas the data points from the other Japanese COVID-19 cases, including those in cluster #2 and the "undetermined source of infection" group, clustered closely together.
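The read-processing steps named in the methods above (alignment with the Burrows-Wheeler Aligner, primer soft-clipping with iVar, quality control with Qualimap) can be sketched as a list of command lines. This is a minimal illustration and not the authors' actual pipeline: the `bwa mem` subcommand, all file names, the output directory, and the helper `build_pipeline` are assumptions, and the samtools/bcftools step is left out because the text does not state how those tools were used.

```python
import shlex
from pathlib import PurePosixPath
from typing import List


def build_pipeline(sample: str, fastq_r1: str, fastq_r2: str,
                   reference: str = "MN908947.3.fasta",
                   primer_bed: str = "nCoV-2019.v3.primer.bed",
                   outdir: str = "results") -> List[List[str]]:
    """Return the command lines for one sample (nothing is executed here).

    All paths and defaults are hypothetical placeholders.
    """
    out = PurePosixPath(outdir)
    raw_sam = str(out / f"{sample}.sam")
    sorted_bam = str(out / f"{sample}.sorted.bam")
    trimmed_prefix = str(out / f"{sample}.trimmed")
    final_bam = str(out / f"{sample}.trimmed.sorted.bam")
    qc_dir = str(out / f"{sample}_qc")
    return [
        # Index the reference, then align the paired-end reads to it.
        ["bwa", "index", reference],
        ["bwa", "mem", "-o", raw_sam, reference, fastq_r1, fastq_r2],
        # iVar expects an aligned and sorted BAM file.
        ["samtools", "sort", "-o", sorted_bam, raw_sam],
        ["samtools", "index", sorted_bam],
        # Soft-clip primer sequences at the positions listed in the BED file.
        ["ivar", "trim", "-i", sorted_bam, "-b", primer_bed,
         "-p", trimmed_prefix],
        # Re-sort and index the trimmed output (written as <prefix>.bam).
        ["samtools", "sort", "-o", final_bam, trimmed_prefix + ".bam"],
        ["samtools", "index", final_bam],
        # Alignment quality report.
        ["qualimap", "bamqc", "-bam", final_bam, "-outdir", qc_dir],
    ]


if __name__ == "__main__":
    for cmd in build_pipeline("sample01", "sample01_R1.fastq",
                              "sample01_R2.fastq"):
        print(shlex.join(cmd))
```

Returning the commands as lists rather than running them keeps the sketch inspectable; each list could be passed to `subprocess.run(cmd, check=True)` once the tools and input files are in place.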
The viral genome haplotype analysis confirmed that cluster #1 and cluster #2 were distinguished by 15 mutations (Figure 2).

Figure 1. The dots represent publicly available data points. The squares represent cases in the present study. Clades are defined according to the color code shown at the bottom right. Note that the data points from cluster #1 and from cluster #2 were distinct. Cluster #2 and the "undetermined source of infection" cases, as well as the Japanese cases registered in GISAID, formed clusters belonging to a close branch (see Figure 2).

Supplemental Table 1. The clinical features of each subject are presented at the bottom of the table. The day of initial symptom onset was designated as "1." If other symptoms appeared, the day of onset for each symptom was designated in relation to the day of the initial onset.

We downloaded the full nucleotide sequences of the SARS-CoV-2 genomes from the GISAID database (https://www.gisaid.org/). We uploaded the full nucleotide sequences of our cohort to the GISAID database.

References

Nextstrain: real-time tracking of pathogen evolution
Artic ncov-2019 primers, version 3
nCoV-2019 sequencing protocol V.1
Fast and accurate short read alignment with Burrows-Wheeler transform
A new coronavirus associated with human respiratory disease in China
An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar
Qualimap 2: advanced multisample quality control for high-throughput sequencing data
The Sequence Alignment/Map format and SAMtools
A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118
Public Health Responses to COVID-19 Outbreaks on Cruise Ships - Worldwide
Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus
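The haplotype reasoning used in this study (a sporadic case is unlikely to derive from a cluster if it lacks mutations that every member of that cluster carries) can be illustrated with simple set operations. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and the sample identifiers and mutation labels (`m1`, `m2`, ...) are placeholders, not data from the study.

```python
from typing import Dict, Set


def shared_variants(cluster: Dict[str, Set[str]]) -> Set[str]:
    """Variants carried by every sample in the cluster."""
    samples = [set(v) for v in cluster.values()]
    if not samples:
        return set()
    return set.intersection(*samples)


def distinguishing_variants(cluster_a: Dict[str, Set[str]],
                            cluster_b: Dict[str, Set[str]]) -> Set[str]:
    """Variants shared by all of one cluster and seen in none of the other."""
    all_a = set().union(*cluster_a.values())
    all_b = set().union(*cluster_b.values())
    return (shared_variants(cluster_a) - all_b) | (shared_variants(cluster_b) - all_a)


def missing_markers(case: Set[str], cluster: Dict[str, Set[str]]) -> Set[str]:
    """Cluster-wide variants that the sporadic case does not carry."""
    return shared_variants(cluster) - case


if __name__ == "__main__":
    # Placeholder haplotypes: labels are hypothetical, not real mutations.
    cluster_1 = {"c1_a": {"m7", "m8"}, "c1_b": {"m7", "m8", "m9"}}
    cluster_2 = {"c2_a": {"m1", "m2", "m3", "m4"},
                 "c2_b": {"m1", "m2", "m3", "m4", "m5"}}
    sporadic_case = {"m1", "m2", "m6"}

    print(sorted(distinguishing_variants(cluster_1, cluster_2)))
    missing = missing_markers(sporadic_case, cluster_2)
    print(sorted(missing), "-> compatible with cluster:", not missing)
```

A case that lacks any variant shared by every cluster member is flagged as not compatible with that cluster. Real data would also need handling for positions with low coverage, which this sketch ignores.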