key: cord-0964931-s0374olh authors: Scheepers, C.; Everatt, J.; Amoako, D. G.; Mnguni, A.; Ismail, A.; Mahlangu, B.; Wibmer, C. K.; Wilkinson, E.; Tegally, H.; San, J. E.; Giandhari, J.; Ntuli, N.; Pillay, S.; Mohale, T.; Naidoo, Y.; Khumalo, Z.; Makatini, Z.; Network for Genomic Surveillance South Africa,; Sigal, A.; Williamson, C.; Treurnicht, F.; Mlisana, K.; Venter, M.; Hsiao, N.-y.; Wolter, N.; Msomi, N.; Lessells, R.; Maponga, T.; Preiser, W.; Moore, P. L.; von Gottberg, A.; De Oliveira, T.; Bhiman, J. title: The continuous evolution of SARS-CoV-2 in South Africa: a new lineage with rapid accumulation of mutations of concern and global detection date: 2021-08-24 journal: nan DOI: 10.1101/2021.08.20.21262342 sha: 9b6afb5a0c087ff2f44d3921b1f17516012d0148 doc_id: 964931 cord_uid: s0374olh

SARS-CoV-2 variants of interest have been associated with increased transmissibility, neutralization resistance and disease severity. Ongoing SARS-CoV-2 genomic surveillance worldwide has improved our ability to rapidly identify such variants. Here we report the identification of a potential variant of interest assigned to the PANGO lineage C.1.2. This lineage was first identified in May 2021 and evolved from C.1, one of the lineages that dominated the first wave of SARS-CoV-2 infections in South Africa and was last detected in January 2021. C.1.2 has since been detected across the majority of the provinces in South Africa and in seven other countries spanning Africa, Europe, Asia and Oceania. The emergence of C.1.2 was associated with an increased substitution rate, as was previously observed with the emergence of the Alpha, Beta and Gamma variants of concern (VOCs). C.1.2 contains multiple substitutions (R190S, D215G, E484K, N501Y, H655Y and T859N) and deletions (Y144del, L242-A243del) within the spike protein, which have been observed in other VOCs and are associated with increased transmissibility and reduced neutralization sensitivity.
Of greater concern is the accumulation of additional mutations (C136F, Y449H and N679K) which are also likely to impact neutralization sensitivity or furin cleavage and therefore replicative fitness. While the phenotypic characteristics and epidemiology of C.1.2 are being defined, it is important to highlight this lineage given its concerning constellations of mutations.

More than a year into the COVID-19 pandemic, SARS-CoV-2 remains a global public health concern. Ongoing waves of infection result in the selection of SARS-CoV-2 variants with novel constellations of mutations within the viral genome [1-4]. Some emerging variants accumulate mutations within the spike region that result in increased transmissibility and/or immune evasion, making them of increased public health importance [2-4]. Depending on their clinical and epidemiological profiles, these are designated as either variants of interest (VOI) or variants of concern (VOC) [5], and ongoing genomic surveillance is essential for early detection of such variants. There are currently four VOCs (Alpha, Beta, Gamma and Delta) and four VOIs (Eta, Iota, Kappa and Lambda) in circulation globally. Of these, Alpha, Beta and Delta have had the most impact globally in terms of transmission and immune evasion, with Delta rapidly displacing other variants to predominate globally, including in South Africa.

Ongoing genomic surveillance in South Africa also detected an increase in sequences assigned to C.1 during the third wave of SARS-CoV-2 infections in May 2021, which was unexpected since C.1, first identified in South Africa [6, 7], was last detected in January 2021. Upon comparison of the mutational profiles between these and older C.1 sequences (which only contain the D614G mutation within the spike), it was clear that these new sequences had mutated substantially.
C.1 had minimal spread globally but was detected in Mozambique and had accumulated additional mutations resulting in the PANGO lineage C.1.1 [7]. These new sequences, however, were also very distinct from C.1.1, resulting in the assignment of the PANGO lineage C.1.2 on 22 July 2021 [8]. C.1.2 is highly mutated beyond C.1 and all other VOCs and VOIs globally, with between 44 and 59 mutations away from the original Wuhan Hu-1 virus (Fig. 1a). While the VOI Lambda (C.37) is phylogenetically closest to C.1.2, the latter has distinct lineage-defining mutations.

The C.1.2 lineage was first detected in the Mpumalanga and Gauteng provinces of South Africa in May 2021 (Fig. 1b and Supplementary Fig. 1a). In June 2021, it was also detected in the KwaZulu-Natal and Limpopo provinces of South Africa as well as in England and China (Fig. 1b and Supplementary Fig. 1b). As of August 13, 2021 the C.1.2 lineage has been detected in 6/9 South African provinces (including the Eastern Cape and Western Cape), the Democratic Republic of the Congo (DRC), Mauritius, New Zealand, Portugal and Switzerland (Fig. 1b and Supplementary Fig. 1b and c). As of August 13, 2021 we have identified 63 sequences that match the C.1.2 lineage, of which 59 had sufficient sequence coverage to be used in phylogenetic analyses and/or spike analysis. All C.1.2 sequences, including those with poor coverage (from the DRC and Mpumalanga), can be found on GISAID (www.gisaid.org), the global reference database for SARS-CoV-2 viral genomes [9,10], and are listed in Supplementary Tables 1 and 2. The majority of these sequences (n=53) are from South Africa. Though SARS-CoV-2 genomic surveillance is ongoing, there is normally a delay of 2-4 weeks between sampling and data being publicly available on GISAID. Provincial detection of C.1.2 to some extent mirrored the depth of sequencing across SA (Supplementary Fig.
1a, c and d), suggesting that it may be present in under-sampled provinces and that these numbers are most likely an underrepresentation of the spread and frequency of this variant within South Africa and globally. Nevertheless, we see consistent increases in the number of C.1.2 genomes in South Africa on a monthly basis: in May C.1.2 accounted for 0.2% (2/1054) of genomes sequenced, in June 1.6% (25/2177) and in July 2.0% (26/1326), similar to the increases seen in Beta and Delta in South Africa during early detection (Supplementary Fig. 1e).

Preliminary molecular clock estimates suggested that the overall rate of evolution of SARS-CoV-2 in 2020 was 8 × 10⁻⁴ substitutions/site/year, which equates to 24 substitutions per year [11] (Fig. 1c). To obtain an estimate of the rate of C.1.2 specifically, we performed a root-to-tip regression of C.1.2 against C.1 sequences. This suggested that the emergence of the C.1.2 lineage resulted from a rate closer to 1.4 × 10⁻³ substitutions/site/year, or ~41.8 mutations per year, which is approximately 1.7-fold faster than the current global rate and 1.8-fold faster than the initial estimate of SARS-CoV-2 evolution. This short period of increased evolution compared to the overall viral evolutionary rate was also associated with the emergence of the Alpha, Beta and Gamma VOCs [2,3,12], suggesting a single event, followed by the amplification of cases, which drove faster viral evolution [13].

C.1.2 shares some mutations with C.1 but has accumulated additional mutations within the ORF1ab, spike, ORF3a, ORF9b, E, M and N proteins (Fig. 2a). [...] H655Y, and T716I) appeared together early in the lineage evolution (Fig. 3a). Thereafter, the majority of sequences have also accumulated the mutations Y144del, N679K and T859N. The mutations P25L, W152R, R346K, T478K, L585F, N440K, P681H, A879T, D936H and H1101Q can be seen in some of the smaller clusters from more recent sequences, further highlighting continued evolution within the lineage.
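The clock-rate figures reported in this section (8 × 10⁻⁴ and 1.4 × 10⁻³ substitutions/site/year) can be checked with a minimal sketch. It assumes a genome length of 29,903 nucleotides (the Wuhan-Hu-1 reference; this value is not stated in the text), and all function names are hypothetical. The regression helper is a plain least-squares fit of root-to-tip distance against sampling date, where the slope is read as the clock rate; it illustrates the general idea and is not the tool or pipeline used in the analysis.

```python
# Sketch only: function names and the genome length are illustrative assumptions.
GENOME_LENGTH = 29_903  # assumed: Wuhan-Hu-1 reference genome length in nucleotides


def substitutions_per_year(rate_per_site, genome_length=GENOME_LENGTH):
    """Convert a per-site clock rate into substitutions per genome per year."""
    return rate_per_site * genome_length


def root_to_tip_fit(dates, distances):
    """Least-squares fit of root-to-tip distance on sampling date.

    dates: sampling dates as decimal years
    distances: root-to-tip genetic distances (substitutions/site)
    Returns (slope, intercept); the slope estimates the clock rate.
    """
    n = len(dates)
    if n != len(distances) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    if sxx == 0:
        raise ValueError("all sampling dates are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


overall = substitutions_per_year(8e-4)    # roughly 23.9, i.e. about 24 per year
lineage = substitutions_per_year(1.4e-3)  # roughly 41.9, close to the ~41.8 reported
fold = 1.4e-3 / 8e-4                      # 1.75, in line with the ~1.8-fold quoted
```

Under these assumptions the per-genome values follow directly from multiplying the per-site rate by the genome length, which is why a modest change in the per-site rate translates into roughly 18 extra substitutions per year.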
Several (52%, 13/25) of the spike mutations identified in C.1.2 have previously been identified in other VOIs and VOCs (Fig. 3b). These include D614G, common to all variants [14], and E484K and N501Y, which are shared with Beta and Gamma, with E484K also seen in Eta and N501Y in Alpha. The T478K substitution is seen in <50% of the C.1.2 viruses but is also observed in Delta. N440K and Y449H co-localize on the same outer face of the C.1.2 RBD (Fig. 3c). While these mutations are not characteristic of current VOCs/VOIs, they have been associated with escape from certain class 3 neutralizing antibodies [15, 16]. The combination of these mutations presents a potentially novel antigenic landscape for C.1.2 variant-specific antibodies. More striking, however, was the remodeling of NTD relative to the Wuhan Hu-1 sequence (blue, [...]

Mutations close to the furin cleavage site have also been observed in VOCs: H655Y has been seen in Gamma, and P681R/H have been seen in Alpha, Delta, and Kappa (S1/S2 region in predominating) and may therefore perform a similar role by increasing the local, relative positive charge and improving furin cleavage. Evolution involving the introduction of N679K or P681H has recently been seen within Gamma (P.1) [18]. The identification of convergent evolution between C.1.2 and other VOIs and VOCs suggests that this variant may also share concerning phenotypic properties with VOCs.

medRxiv preprint doi: https://doi.org/10.1101/2021.08.20.21262342; this version posted August 24, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

We are currently assessing the impact of this variant on antibody neutralization following SARS-CoV-2 infection or vaccination against SARS-CoV-2 in South Africa. We have identified a new SARS-CoV-2 variant assigned to the PANGO lineage C.
As part of monitoring the viral evolution by the Network for Genomic Surveillance in South Africa (NGS-SA) [40], seven sequencing hubs receive randomly selected samples for sequencing every week according to approved protocols at each site. These samples include remnant nucleic acid extracts or remnant nasopharyngeal and oropharyngeal swab samples from routine diagnostic SARS-CoV-2 PCR testing, from public and private laboratories in South Africa. Permission was obtained for associated metadata for the samples including date [...]

Sequencing was performed using the COVIDSeq or nCoV-2019 ARTIC network sequencing protocol (https://artic.network/ncov-2019), which is an amplicon-based next-generation sequencing approach (Illumina, Inc, USA) [41]. Raw reads from Illumina sequencing were assembled using the Exatype NGS SARS-CoV-2 pipeline v1.

The 'Phylogenetic Assignment of Named Global Outbreak Lineages' (PANGOLIN) software suite (https://github.com/hCoV-2019/pangolin) was used for the dynamic SARS-CoV-2 lineage classification [46]. The SARS-CoV-2 genomes in our dataset were also classified using [...]

At the time of writing, there were over 2.9 million SARS-CoV-2 genomes available on GISAID (https://www.gisaid.org). Due to the size of this dataset, sub-sampling was performed to obtain a representative but manageable sample of genomes. A preliminary dataset was downloaded from GISAID; the options 'complete', 'high coverage', and 'collection date complete' were selected to ensure that only genomes with complete date information and less than 5% N content were included.
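The N-content criterion described above can be illustrated locally with a minimal sketch. In the analysis the filtering was done through GISAID download options, so the functions below are hypothetical helpers that only mirror the stated threshold (less than 5% N); the function names and the dictionary input format are assumptions made for illustration.

```python
# Sketch only: hypothetical helpers mirroring the "< 5% N content" criterion.

def n_fraction(sequence):
    """Fraction of positions in a nucleotide sequence that are the ambiguity code N."""
    if not sequence:
        raise ValueError("empty sequence")
    return sequence.upper().count("N") / len(sequence)


def passes_n_filter(sequence, max_n_fraction=0.05):
    """True when the N content is strictly below the threshold."""
    return n_fraction(sequence) < max_n_fraction


def filter_genomes(genomes, max_n_fraction=0.05):
    """Keep the name -> sequence entries whose N content is below the threshold."""
    return {
        name: seq
        for name, seq in genomes.items()
        if passes_n_filter(seq, max_n_fraction)
    }
```

The comparison is strict, so a genome with exactly 5% N would be excluded, which matches the "less than 5%" wording.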
The preliminary dataset contained all C.1.2 genomes, genomes from the C.1 lineage (the original lineage to which C.1.2 was assigned), the C. [...]

We conducted temporal analysis to ensure that C.1.2 possesses a strong enough temporal signal for dated phylogenetic analysis, as well as to get an estimate of the molecular clock rate for the C. [...] sequences with CoVDB (https://covdb.stanford.edu/) showed that they contain several spike mutations not characteristic of either C.1 or C.1.2, suggesting they have been mis-assigned. There were also several samples that may violate the molecular clock assumption. These sequences were removed and the tree remade. The final tree showed a strong positive temporal signal, with a correlation coefficient of 0.97 and R² of 0.95. The slope of the regression suggested a preliminary clock rate estimate of 1.4 × 10⁻³.

Phylogenetic analysis was conducted with a custom Nextstrain SARS-CoV-2 build [47] [...] Fig. 3b). All sequences possessed at least eight major mutations; this, along with the clustering, was used as evidence to re-assign the sequences to C.1.2, resulting in a set of 54 C.1.2 genomes.

We modelled the spike protein on the basis of the Protein Data Bank coordinate set 7A94. We [...] shades of grey. Lineage-defining mutations (found in >50% of sequences) are colored dark purple, with additional mutations (present in <50% of sequences) colored light purple.
Key mutations known/predicted to influence neutralization sensitivity (C136F and P25L, Y144del, L242del/A243del, and E484K), or furin cleavage (H655Y and N679K) are indicated. Image was created using the PyMOL molecular graphics program.

[1] Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations
[2] Genomics and epidemiology of the P.1 SARS-CoV-2 lineage in Manaus
[3] Detection of a SARS-CoV-2 variant of concern in South Africa
[4] Emergence of a new SARS-CoV-2 variant in the UK
[5] WHO. Tracking SARS-CoV-2 variants
[6] Sixteen novel lineages of SARS-CoV-2 in South Africa
[7] A year of genomic surveillance reveals how the SARS-CoV-2 pandemic
[8] Cov-lineages. C.1.2 pango designation
[9] Data, disease and diplomacy: GISAID's innovative contribution to global health
[10] Global initiative on sharing all influenza data - from vision to reality
[11] Temporal signal and the phylodynamic threshold of SARS-CoV-2
[12] The molecular clock of variants of concern
[13] The emergence and ongoing convergent evolution of the N501Y lineages coincides with a major global shift in the SARS-CoV-2 selective landscape
[14] Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus
[15] Complete Mapping of Mutations to the SARS-CoV-2 Spike Receptor-Binding Domain that Escape Antibody Recognition
[16] Sites in SARS-CoV-2 RBD where mutations reduce binding by antibodies / sera
[17] SARS-CoV-2 501Y.V2 escapes neutralization by South African COVID-19 donor plasma
[18] Emergence and spread of SARS-CoV-2 P.1 (Gamma) lineage variants carrying Spike mutations 141-144del, N679K or P681H during persistent viral circulation in Amazonas
[19] SARS-CoV-2 Variants in Patients with Immunosuppression
[20] HIV-1 and SARS-CoV-2: Patterns in the Evolution of Two Pandemic Pathogens
[21] Affinity maturation of SARS-CoV-2 neutralizing antibodies confers potency, breadth, and resilience to viral escape mutations
[22] Persistence and Evolution of SARS-CoV-2 in an Immunocompromised Host
[23] Recurrent deletions in the SARS-CoV-2 spike glycoprotein drive antibody escape
[24] Case Study: Prolonged Infectious SARS-CoV-2 Shedding from an Asymptomatic Immunocompromised Individual with
[25] Deep Mutational Scanning of SARS-CoV-2 Receptor Binding Domain Reveals Constraints on Folding and ACE2 Binding
[26] Impact of South African 501.V2 Variant on SARS-CoV-2 Spike Infectivity and Neutralization: A Structure-based Computational Assessment
[27] Analysis of SARS-CoV-2 variant mutations reveals neutralization escape mechanisms and the ability to use ACE2 receptors from additional species
[28] Reduced neutralization of SARS-CoV-2 B.1.1.7 variant by convalescent and vaccine sera
[29] The N501Y spike substitution enhances SARS-CoV-2 transmission
[30] Increased transmission of SARS-CoV-2 lineage B.1.1.7 (VOC 2020212/01) is not accounted for by a replicative advantage in primary airway cells or antibody escape
[31] Loss of furin cleavage site attenuates SARS-CoV-2 pathogenesis
[32] Functional evaluation of proteolytic activation for the SARS-CoV-2 variant B.1.1.7: role of the P681H mutation
[33] Resistance of SARS-CoV-2 variants to neutralization by monoclonal and serum-derived polyclonal antibodies
[34] The monoclonal antibody combination REGEN-COV protects against SARS-CoV-2 mutational escape in preclinical and human studies
[35] Mutational escape from the polyclonal antibody response to SARS-CoV-2 infection is largely shaped by a single class of antibodies
[36] SARS-CoV-2 spike E484K mutation reduces antibody neutralisation
[37] N-terminal domain antigenic mapping reveals a site of vulnerability for SARS-CoV-2
[38] Antibody Resistance of SARS-CoV-2 Variants B.1.351 and B.1.1.7
[39] Identification of SARS-CoV-2 spike mutations that attenuate monoclonal and serum antibody neutralization
[40] A genomics network established to respond rapidly to public health threats in South Africa
[41] High throughput detection and genetic epidemiology of SARS-CoV-2 using COVIDSeq next-generation sequencing
[42] Genome detective coronavirus typing tool for rapid identification and characterization of novel coronavirus genomes
[43] Genome Detective: An automated system for virus identification from high-throughput sequencing data
[44] Clade assignment, mutation calling and sequence quality checks. Nextstrain (2021)
[45] AliView: a fast and lightweight alignment viewer and editor for large datasets
[46] A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
[47] Nextstrain: real-time tracking of pathogen evolution
[48] MAFFT multiple sequence alignment software version 7: Improvements in performance and usability
[49] IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies
[50] Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen)
[51] TreeTime: Maximum-likelihood phylodynamic analysis