key: cord-0688790-rdkqjqs2 authors: Chen, Zigui; Chong, Ka Chun; Wong, Martin C.S.; Boon, Siaw S.; Huang, Junjie; Wang, Maggie H.; Ng, Rita W.Y.; Lai, Christopher K.C.; Chan, Paul K.S. title: A global analysis on replacement of genetic variants of SARS-CoV-2 in associate with containment capacity and changes in disease severity date: 2021-01-30 journal: Clin Microbiol Infect DOI: 10.1016/j.cmi.2021.01.018 sha: e02ad1f63008e18fa46a894bd98a5e30d0b6cb86 doc_id: 688790 cord_uid: rdkqjqs2 OBJECTIVES: To examine SARS-CoV-2 variant replacement in association with containment capacity and changes in case-fatality at country level. METHODS: Altogether, 69,571 SARS-CoV-2 full genomes collected globally within the first six months of pandemic were examined. The correlation between variant replacement and containment capacity was examined by logistic regression models using the WHO International Health Regulation (IHR) score, the Oxford COVID-19 Government Response Tracker (OxCGRT) and the vulnerability index “INFORM” as proxies; whereas correlation with changes in monthly crude case-fatality ratios was examined by a mixed effect model. RESULTS: At the global level, variant lineage G* characterized by the S-D614G mutation replaced the older lineages L and S in March 2020. European countries including Finland, France and Italy were the first group to reach 50% increment of G*, whereas only Singapore and South Korea had non-G* persisted throughout the first six months. Countries with higher IHR scores (β-coefficient: -0.001, 95% C.I. [-0.016, -0.001], p=0.034) and higher stringency indexes (OxCGRT) (β-coefficient: -0.011, 95% C.I. [-0.020, -0.001], p=0.035) were associated with lower levels of G* replacement; whereas higher vulnerability indexes (INFORM) (β-coefficient: 0.049, 95% C.I. [0.001, 0.097], p=0.044) were associated with higher replacement levels. Crude case-fatality ratio showed a positive correlation with G* replacement (β-coefficient: 0.034, 95% C.I. 
[0.011, 0.058], p=0.004), even after adjusting for testing capacity and other country-specific characteristics. CONCLUSIONS: SARS-CoV-2 variant lineage G* (S-D614G) replaced older lineages more efficiently in countries with lower containment capacity, and its possible association with increased disease severity deserves further investigation.

SARS-CoV-2 complete genome sequences collected within the first six months of the pandemic (on or before 30 June 2020) were downloaded from GISAID. (5) Open reading frames (ORFs) were called for all sequences using blastn v2.10.1 (7) and bedtools v2.29.2, (8) and concatenated based on aligned ORFs using mafft v7.471. (9) Sequences of poor quality, including those missing more than 5000 bp or having more than 10 ambiguous variations, were filtered out. A tree inferred from all sequences was prepared using FastTree v2 (10) to select representative genomes from the main topology branches for calling single nucleotide polymorphisms (SNPs) and for constructing a maximum likelihood phylogenetic tree using RAxML v8.2.12. (11) With reference to GISAID, variants were assigned to four lineages: L (reference sequence, nucleotide T28144, ORF8 amino acid L84), S (nt T28144C, ORF8 aa L84S), V (nt G26144T, ORF3a aa G251V) and G* (nt A23403G, S gene aa D614G). The lineage G* was further divided into three clades: G, GH (G25563T, ORF3a aa Q57H) and GR (G28883C, N gene aa G204R). The geographic source and date of collection were retrieved from GISAID to trace the dispersion and replacement of SARS-CoV-2 variants at global, continent and country levels.

We hypothesized that the degree of spread, and hence the replacement by newly emerged variants, could be related to containment capacity. Three internationally recognized indexes, namely the WHO International Health Regulation (IHR) score, the stringency index, and the vulnerability index, were used as proxies of containment capacity and implementation.
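The marker-based lineage definitions given in the methods above can be illustrated with a minimal Python sketch. The marker positions and bases are taken from the text; the function name `assign_lineage`, the input format (a mapping from 1-based reference position to observed base), and the order in which markers are checked are assumptions made for illustration only, not the authors' pipeline.

```python
# Marker nucleotides quoted in the methods (1-based reference coordinates).
# The precedence of checks below is an assumption, not the published procedure.

def assign_lineage(bases):
    """Assign a lineage/clade label from a {position: base} mapping.

    `bases` is a hypothetical input format: observed nucleotides keyed by
    1-based position in the reference genome.
    """
    if bases.get(23403) == "G":          # A23403G, S gene D614G -> lineage G*
        if bases.get(25563) == "T":      # G25563T, ORF3a Q57H -> clade GH
            return "GH"
        if bases.get(28883) == "C":      # G28883C, N gene G204R -> clade GR
            return "GR"
        return "G"
    if bases.get(28144) == "C":          # T28144C, ORF8 L84S -> lineage S
        return "S"
    if bases.get(26144) == "T":          # G26144T, ORF3a G251V -> lineage V
        return "V"
    if bases.get(28144) == "T":          # reference state -> lineage L
        return "L"
    return "unassigned"                  # marker sites missing or ambiguous


def is_g_star(label):
    """Lineage G* comprises clades G, GH and GR."""
    return label in {"G", "GH", "GR"}


if __name__ == "__main__":
    example = {23403: "G", 25563: "G", 28883: "C", 28144: "T", 26144: "G"}
    print(assign_lineage(example), is_g_star(assign_lineage(example)))
```

Checking the G* marker first reflects the clade hierarchy described in the text (G, GH and GR all carry A23403G); a genome carrying both the GH and GR markers would be labelled GH here, which is purely a choice of this sketch.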
The IHR Score. Members of the IHR report to the WHO annually on the implementation of capacity required by the regulations to sustain public health response and surveillance. These regulations are legal instruments designed to develop the capacity of all members for preventing, detecting, assessing, notifying, and responding to public health events of international concern. The IHR score includes 13 IHR capacity items assessed by

We obtained the numbers of cumulative incidence and deaths of COVID-19 from the WHO dashboard, (20) and determined the crude case-fatality ratios (CFRs) by dividing the cumulative number of deaths by the cumulative number of reported cases for each country. A scatterplot was generated to explore the association between CFRs and the proportion of variants belonging to lineage G* from January to June 2020. To account for between-country variability, we employed linear mixed effect models to examine the association between CFR and the proportion of variants belonging to lineage G* in each month. As testing capacity is highly correlated with the number of cases detected, the country-specific numbers of tests per 1,000 population by month and other country-specific characteristics (i.e. proportion of population aged 65 years or above, gross domestic product per capita, population density, and number of hospital beds per thousand population) were adjusted for in the model. (21) A random intercept term was used to adjust for between-country variations from repeated measurements. Suppose $y_{ij}$ is the monthly CFR on month $j$ in country $i$; the full model form is as follows:

$$y_{ij} = \beta_0 + \beta_1 x_{ij} + \beta_2 w_{ij} + \sum_{p} \beta_p z_i^{(p)} + \alpha_i + \varepsilon_{ij}$$

where $\beta_0$ is the fixed intercept, $x_{ij}$ is the monthly proportion of variants belonging to lineage G* with regression coefficient $\beta_1$, $w_{ij}$ is the monthly testing capacity with regression coefficient $\beta_2$ for country $i$ on month $j$, and $z_i^{(p)}$ is the $p$-th country-specific characteristic variable with regression coefficient $\beta_p$.
The country-specific random effect is modelled as $\alpha_i$, which follows a normal distribution with mean 0 and variance $\sigma_\alpha^2$, on top of the random error ($\varepsilon_{ij}$) within country $i$ over time. To examine whether the observation was affected by extreme CFRs from countries whose healthcare facilities were overwhelmed, a separate analysis excluding countries outside the regression prediction interval was conducted. A subgroup analysis for low and high testing capacity using a median cut-off was also conducted.

By June 30, 2020, there were at least 10,450,456 confirmed COVID-19 cases reported to WHO (22) (Figure 1A), with 69,571 high-quality complete genome sequences deposited in GISAID fulfilling the criteria for inclusion in this study (Figure 1B). These sequences were from six continents (62.9% Europe, 23.0% North America, 7.5% Asia, 3.7% Oceania, 1.6% South America, 1.4% Africa) (Table S1) and 100 countries/cities. The phylogenetic tree topology revealed four major lineages (L, S, V and G*), with lineage G* further divided into three clades (G, GH and GR) (Figure 2A). The sequence signature patterns of each lineage are shown in Figure 2B, with S-D614G consistently detected in clades G, GR and GH of lineage G*.

When the outbreak was first recognized in December 2019, all sequenced isolates (before 1 January 2020) were from China and belonged to lineage L, whereas lineage S was detected from early January in China (Figure 2C). Then, from January to mid-February, both lineages L and S were detected and contributed the majority of sequenced isolates. Isolates belonging to lineage V were first reported from China on January 21, soon followed by another newly detected lineage, G*, on January 24, also from China. Subsequently, two clades (GH and GR) of lineage G* were first reported on January 27 and February 16, respectively, both from the United Kingdom.
From the global perspective, lineage G* appeared in late January and quickly replaced L and S starting from early March. Lineage V remained a minor fraction until early May and was rarely detected thereafter (Figure 2C). Replacement by lineage G* as observed at the global level (Figure 3A) was reproduced in most continents, including Europe, North America, Oceania and Asia (Figure 3B). While lineage G* also predominated in South America and Africa, early sequences from these two continents were not available to make a clear distinction between replacement by, or persistence of, lineage G* right from the beginning.

Globally, replacement by lineage G* occurred in March 2020, but the timing of replacement varied at country level, and some countries did not exhibit substantial replacement throughout the first six months of the pandemic. We included countries with more than 50 high-quality full genome sequences collected within the first six months of the pandemic for country-level analysis. Hong Kong, a city of China that also provided more than 50 high-quality full genomes, was included in the analysis. Figure 4A shows the time to reach 50% increase, i.e. the point at which 50% of the newly identified isolates belonged to lineage G*. Finland, France and Italy were the first group of countries to reach 50% increase, in February, whereas others, e.g. Panama, Malaysia, Singapore and South Korea, did not reach 50% increase during our study period (till the end of June). Some countries, such as Senegal, Saudi Arabia, Denmark, Luxembourg, the Netherlands, Switzerland, Mexico and Brazil, were predominated by lineage G* right from the beginning.

The association between variant replacement and containment capacity was examined at country level.
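The 50% replacement milestone described for Figure 4A can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' method: the monthly binning, the record format (pairs of collection month and lineage label), and the function names `monthly_g_star_proportion` and `first_month_reaching` are all hypothetical.

```python
from collections import defaultdict

# Clades that make up lineage G* according to the text.
G_STAR_CLADES = {"G", "GH", "GR"}


def monthly_g_star_proportion(records):
    """Return {month: proportion of isolates in lineage G*}.

    `records` is an iterable of (month, lineage) pairs, where month is a
    'YYYY-MM' string (an assumed format; monthly binning is also assumed).
    """
    totals = defaultdict(int)
    g_star = defaultdict(int)
    for month, lineage in records:
        totals[month] += 1
        if lineage in G_STAR_CLADES:
            g_star[month] += 1
    return {month: g_star[month] / totals[month] for month in sorted(totals)}


def first_month_reaching(proportions, threshold=0.5):
    """Return the earliest month whose G* proportion meets the threshold,
    or None if the threshold is never reached in the study period."""
    for month in sorted(proportions):
        if proportions[month] >= threshold:
            return month
    return None


if __name__ == "__main__":
    # Toy records for one hypothetical country (not real data).
    toy = [
        ("2020-02", "L"), ("2020-02", "S"), ("2020-02", "G"),
        ("2020-03", "G"), ("2020-03", "GR"), ("2020-03", "L"),
    ]
    props = monthly_g_star_proportion(toy)
    print(props)
    print(first_month_reaching(props))
```

A country for which `first_month_reaching` returns `None` corresponds to the situation described above for Panama, Malaysia, Singapore and South Korea, where the 50% level was not reached by the end of June.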
We first included 45 countries with at least one of the three proxy parameters to indicate containment capacity (Figure S1).

We observed that S-D614G replacement started in late February 2020 and was followed by an exponential upsurge in reported cases two weeks later, in mid-March 2020. We thus used March 2020 as the "critical variant replacement period" to examine its association with containment capacity and response. The results showed that countries with higher containment and public health response capacity had delayed lineage replacement, probably due to their success in suppressing importation and/or delaying local spread of this newly emerged variant. Our findings support using the IHR score, OxCGRT and INFORM to reflect the stringency of government responses and the vulnerability of a country in the context of a pandemic. Of note, two countries in Asia, Singapore and South Korea, exhibited a clear persistence of the older lineages, L and V, respectively, suggesting that their later waves were mainly due to continuous circulation and upsurge of local infections rather than importation of new variants. Strategies to suppress importation in these two countries could be of learning value to others.

We examined the changes in population-level case-fatality ratio in association with the changes in proportion of lineage G* in an attempt to understand its effect on disease severity. Comparing crude fatality ratios at country level is expected to be subject to biases and confounders. We tried a few alternative analyses by excluding countries with extremely high case-fatality and by stratifying countries according to testing capacity. We also adjusted for country-specific demographic features. While a significant association between lineage G* replacement and increased disease severity was observed from our mixed effect model, one should interpret this cautiously as it may not represent a causative association.
For instance, information on the infected population in each country, such as age distribution and comorbidity status, was not available for a more robust analysis.

Our study has limitations. Firstly, the availability of genome sequences is subject to biases such as sequencing capacity, sampling location and timing. Secondly, deaths and infections due to SARS-CoV-2 are bound to be underreported and linked to complex confounders. Nevertheless, the variant replacement revealed in this study is no doubt a genuine observation, and its possible association with increased disease severity should be further verified using appropriate patient cohorts and biological models.

Complete genome sequences available in GISAID according to the date of collection.

A pneumonia outbreak associated with a new coronavirus of probable bat origin
World Health Organization. 2019-nCoV outbreak is an emergency of international concern 2020
World Health Organization. WHO Director-General's opening remarks at the media briefing on COVID-19 -11
Phylogenetic network analysis of SARS
Global initiative on sharing all influenza data
maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models
The University of Oxford. Variation in Government responses to COVID-19
Evaluation on different non-pharmaceutical interventions during COVID-19 pandemic: An analysis of 139 countries
Index for Risk Management INFORM Concept and Methodology Report-Version
The potential impact of vulnerability and coping capacity on the pandemic control of COVID-19
Human Development Index Ranking
World Bank and Central Intelligence Agency World Factbook
Countries by density by population 2020. World Population Review
World Health Organization. Coronavirus disease (COVID-19) outbreak situation. Dashboard
Data. Statistics and Research. Coronavirus (COVID-19) Testing 2020
Could the D614G substitution in the SARS-CoV-2 spike (S) protein be associated with higher COVID-19 mortality?
Making Sense of Mutation: What D614G Means for the COVID-19 Pandemic Remains Unclear

We gratefully acknowledge the authors, originating and submitting laboratories of the sequences from GISAID's EpiFlu™ Database (https://www.epicov.org/) on which this research is based.

PKSC conceived and supervised the study. ZC, KCC, MCSW, JH collected and analysed data. SSB, MHW, RWYN, CKCL interpreted data and prepared the manuscript.