key: cord-0897198-agr12gb8
authors: Lee, Ji-eun; Chung, Jae Keun; Kim, Tae sun; Park, Jungwook; Lim, Mi hyeon; Hwang, Da jeong; Jeong, Jin; Kim, Kwang gon; Yoon, Ji-eun; Kee, Hye young; Seo, Jin jong; Kim, Min Ji
title: Genomic and phylogenetic analyses of SARS-CoV-2 strains isolated in the city of Gwangju, South Korea
date: 2020-12-17
journal: bioRxiv
DOI: 10.1101/2020.12.16.423178
sha: 054b926f070db58fb22f54efbdf8134067d1daff
doc_id: 897198
cord_uid: agr12gb8

Since the first identification of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in China in late December 2019, the coronavirus disease 2019 (COVID-19) has spread rapidly around the world. RNA viruses, including SARS-CoV-2, have higher mutation rates than DNA viruses during virus replication. Variations in the SARS-CoV-2 genome could contribute to the efficiency of viral spread and the severity of COVID-19. In this study, we analyzed the locations of genomic mutations to investigate the genetic diversity among isolates of SARS-CoV-2 in Gwangju. We detected non-synonymous and frameshift mutations in various parts of the SARS-CoV-2 genome. Phylogenetic analysis of whole genomes showed that the Gwangju isolates clustered within clades V and G. Our findings not only provide a glimpse into changes of prevalent virus clades in Gwangju, South Korea, but also support genomic surveillance of SARS-CoV-2 to aid in the development of efficient therapeutic antibodies and vaccines against COVID-19.

The coronavirus disease 2019 is caused by a newly emerged virus, named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (1, 2). Since its first [...] Chinese visitor from Wuhan, was identified on January 20, 2020 (5, 6). On February 3, 2020, the first case in the city of Gwangju, South Korea, was reported, and as of mid-July 2020, a total of 171 confirmed cases have been identified.

SARS-CoV-2 is an enveloped, positive-sense single-stranded RNA virus (7).
The SARS-CoV-2 genome encodes 16 non-structural proteins (nsps) involved in virus replication and four structural proteins: envelope (E), matrix (M), nucleocapsid (N), and spike (S) (8). RNA viruses have higher mutation rates than DNA viruses (9). Mutations in the SARS-CoV-2 genome are being continuously reported (10, 11), and these mutations may affect pathogenesis. Therefore, it is critically important to monitor the genome evolution of SARS-CoV-2. Investigation of the viral genomic variations is necessary to provide epidemiological information on SARS-CoV-2 and for the development of therapeutics and vaccines.

In this study, we isolated SARS-CoV-2 from COVID-19 patients in the city of Gwangju, South Korea, and investigated genomic mutations resulting in amino acid changes. To this end, we performed full-genome sequencing using a next-generation sequencing (NGS) tool and analyzed point mutations in the SARS-CoV-2 genome. We phylogenetically analyzed and classified virus strains from confirmed SARS-CoV-2-infected patients in the southwestern region of Korea.

RT-qPCR assays targeted the RNA-dependent RNA polymerase (RdRp) and E genes. Sequence information for the primers and probes used to detect these two genes is presented [...]

We performed phylogenetic analyses of 16 distinct SARS-CoV-2 sequences of isolates from Gwangju and 40 sequences deposited in the GISAID database up to March 31, 2020 (Fig. 1). According to GISAID data, three major clades of SARS-CoV-2 (S, V, and G) can be identified (13). These three clades are determined by the amino acid substitutions present: ORF8-L84S (S), ORF3a-G251V (V), and S-D614G (G).

In accordance with the above classification, we identified three clades (GH, GR, and V) on the basis of the point mutations identified in the Gwangju sequences (Fig. 1). Clade V included seven isolates from Gwangju. Five of these isolates (indicated in purple in Fig.
1, IHC597, 610, 614, 685, and 719) were associated with the Shincheonji religious group, and the remaining two (Fig. 1, IHC1215 and 1217) were related to another church in Gwangju. As revealed by the epidemiologic information, the five isolates from this clade were related (14).

The G clade comprised nine of the 16 isolates (labeled green in Fig. 1). We identified two subclades within clade G: GH, with seven isolates, and GR, with one isolate. Of the seven confirmed cases related to the seven GH isolates, six were linked to cluster secondary infections from door-to-door sales businesses at Geumyang Building (Fig. 1, IHC17695, 18094, 18561, 18670, 18755, and 20065). A Geumyang Building case led to a cluster outbreak in Gwangju from late June to mid-July and further led to additional cases at the Gwangneuksa temple, at a church, and at workplaces. The remaining case appeared to be associated with international travel-related exposure (Fig. 1, IHC2681). The case related to the one isolate in clade GR had a known international travel history (Fig. 1, IHC3486). Taken together, these results reveal that the V clade of SARS-CoV-2 was the major clade type in [...] Table 3.

Clades V (ORF3a-G251V) and G (S-D614G) comprised seven and nine of the 16 sequences of Gwangju isolates, respectively (Table 4). ORF8-L84S, representative of clade S, was not observed in any of the 16 genomes. In addition to the G251V mutation in ORF3a and the D614G mutation in S, we analyzed 19 additional point mutations in ORF1a, ORF1b, ORF3a, ORF7a, S, and N. Of these 19 mutations, 11 were found in ORF1a and ORF1b, which encode nsp1-16 and occupy two-thirds of the entire genome (Fig. 2).

We first analyzed the mutations in ORF1a/b and identified six mutation types in ORF1a (Table 3). Among them, the most common mutations were ORF1a-T265I (in seven samples, [...] 18094, 18561, 18670, 18755, and 20065).
In particular, we found that ORF1a-L3606F co-occurred with ORF3a-G251V, which determines clade V (Table 3). Further, M1769I, A968V, and L2609F occurred in nsp3 within ORF1a. Mutations within nsp2 and nsp3 may play a role in differentiating the infectivity of SARS-CoV-2 from that of SARS-CoV (15). In addition, we detected five mutations in ORF1b. P323L in nsp12 and Q2412L were observed in nine (IHC2681, 3127, 3486, 17695, 18094, 18561, 18670, 18755, and 20065) and six (IHC17695, 18094, 18561, 18670, 18755, and 20065) genome sequences, respectively. We also identified A449V and M633V in nsp12 and A1652V. The nsp12 gene of SARS-CoV-2 encodes the RdRp protein, which is essential for the virus replication machinery.

The remaining eight point mutations were found in ORF3a, ORF7a, S, and N. Four mutations, R195K, R203K, G204R, and T366I, were detected in the N protein, which participates in viral RNA genome packaging and viral particle release (16). Among them, G204R was observed together with ORF1b-P323L in the genome sequence of hCoV-19/Gwangju/IHC3486/2020 (Table 3). Based on GISAID data, two subclades, GR and GH, belonging to clade G, were determined by two mutations (N-G204R and ORF1b-P323L) and [...]

For the first time, we successfully isolated SARS-CoV-2 from COVID-19 patients in the southwestern region of South Korea. According to viral clade classification based on GISAID data, SARS-CoV-2 isolates in Gwangju mainly belong to the V and G clades. More specifically, during the initial outbreak in February 2020, the V clade was the major clade type, mainly due to spread in the Shincheonji religious group.
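The marker-based clade assignment described in the Results (ORF8-L84S for clade S, ORF3a-G251V for clade V, and S-D614G for clade G) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function name, the string format for mutations, and the example isolates are hypothetical, and the GH and GR subclades are deliberately left out.

```python
# Marker substitutions for the three major GISAID clades named in the text.
CLADE_MARKERS = {
    "S": "ORF8-L84S",
    "V": "ORF3a-G251V",
    "G": "S-D614G",
}


def assign_major_clade(mutations):
    """Return the major clade whose marker substitution is present.

    `mutations` is a set of strings such as "S-D614G" (an assumed format).
    Returns None when no marker, or more than one marker, is found.
    """
    found = [clade for clade, marker in CLADE_MARKERS.items() if marker in mutations]
    if len(found) == 1:
        return found[0]
    return None


# Hypothetical mutation sets, loosely modeled on combinations reported in the text;
# they do not reproduce the actual per-isolate data of Table 3.
example_isolates = {
    "isolate_A": {"ORF3a-G251V", "ORF1a-L3606F"},
    "isolate_B": {"S-D614G", "ORF1b-P323L", "N-G204R"},
    "isolate_C": {"ORF1a-T265I"},
}

clades = {name: assign_major_clade(muts) for name, muts in example_isolates.items()}
for name, clade in clades.items():
    print(name, clade)
```

Returning None for isolates with no marker or with conflicting markers keeps ambiguous genomes out of the clade counts rather than forcing them into a category.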
Since July 2020, the GH clade, related to door-to- [...]

References

Association of public health interventions with the epidemiology of the COVID-19 outbreak in Wuhan, China
Epidemiology, virology, and clinical features of severe acute respiratory syndrome-coronavirus-2 (SARS-CoV-2 Coronavirus Disease-19)
Coronavirus disease 2019 (COVID-19): situation report. World Health Organization
The continuing 2019-nCoV epidemic threat of novel coronaviruses to global health - The latest 2019 novel coronavirus outbreak in Wuhan, China
The first case of 2019 novel coronavirus pneumonia imported into Korea from Wuhan China: Implication for infection prevention and control measures
Identification of coronavirus isolated from a patient in Korea with COVID-19
Genotype and phenotype of COVID-19: Their [...]
Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR
Phylogenetic network analysis of SARS-CoV-2 genomes
First pediatric case of coronavirus disease 2019 in Korea
COVID-2019: The role of the nsp2 and nsp3 in its pathogenesis
Nucleocapsid-independent specific viral RNA packaging via viral envelope protein and viral RNA signal
Severe acute respiratory syndrome coronavirus ORF7a inhibits bone marrow stromal antigen 2 virion tethering through a novel mechanism of glycosylation interference