key: cord-0754682-fok1eadx authors: Calvignac-Spencer, Sébastien; Budt, Matthias; Huska, Matthew; Richard, Hugues; Leipold, Luca; Grabenhenrich, Linus; Semmler, Torsten; von Kleist, Max; Kröger, Stefan; Wolff, Thorsten; Hölzer, Martin title: Rise and Fall of SARS-CoV-2 Lineage A.27 in Germany date: 2021-07-29 journal: Viruses DOI: 10.3390/v13081491 sha: 8565d8bf671d904e7d158b631d82779fc06e9768 doc_id: 754682 cord_uid: fok1eadx

Here, we report on the increasing frequency of the SARS-CoV-2 lineage A.27 in Germany during the first months of 2021. Genomic surveillance identified 710 A.27 genomes in Germany as of 2 May 2021, with a vast majority identified in laboratories from a single German state (Baden-Wuerttemberg, n = 572; 80.5%). Baden-Wuerttemberg is located near the border with France, from where most A.27 sequences were entered into public databases until May 2021. The first appearance of this lineage based on sequencing in a laboratory in Baden-Wuerttemberg can be dated to early January 2021. From then on, the relative abundance of A.27 increased until the end of February but has since declined; meanwhile, the abundance of B.1.1.7 increased in the region. The A.27 lineage shows a mutational pattern typical of VOIs/VOCs, including an accumulation of amino acid substitutions in the Spike glycoprotein. Among those, L18F, L452R and N501Y are located in the epitope regions of the N-terminal domain (NTD) or receptor-binding domain (RBD) and have been suggested to result in immune escape and higher transmissibility. In addition, A.27 does not show the D614G mutation typical for all VOIs/VOCs from the B lineage. Overall, A.27 should continue to be monitored nationally and internationally, even though in Germany A.27 was initially displaced by B.1.1.7 (Alpha), while B.1.617.2 (Delta) is now on the rise. 
Since the beginning of 2020, the world has witnessed the emergence and unprecedented spread of SARS-CoV-2, which has so far claimed at least 3.5 million lives worldwide (June 2021, WHO). Viral genome sequencing and subsequent integrated data analyses proved to be essential [1] [2] [3] [4] to track the spread of the virus and to detect emerging mutations as well as variants of concern/interest (VOC/I) that can be associated with antibody escape or higher transmissibility [5] [6] [7] [8]. Since January 2021, the German Electronic Sequence Data Hub (DESH) has been active, through which all sequencing laboratories in Germany transmit SARS-CoV-2 genome sequences to the Robert Koch Institute (RKI), Germany's national public health institute. This technical platform enables the pooling of sequence data throughout Germany, further enriched by in-house-sequenced samples as part of the SARS-CoV-2 Integrated Molecular Surveillance lab-network (IMS-SC2). Data submitted to DESH from sequencing laboratories across the country, as well as IMS-SC2 sequence data, in combination with additional international data from resources such as GISAID [9] or EMBL-EBI's COVID-19 Data Portal [10], enabled comprehensive genomic surveillance of the pandemic across Germany. SARS-CoV-2 genomes belonging to the A.27 lineage were already identified in France in mid-January 2021 and analyzed in the context of other sequences available at that time, originating from the Comoros archipelago, Western European countries (mainly metropolitan France), Turkey and Nigeria [11]. In addition, colleagues from the Rhein-Neckar district in Germany reported a local occurrence and decline of the A.27 variant in January and March 2021 [12], while Anoh et al. also found A.27 sequences on the rise in western sub-Saharan Africa between May 2020 and March 2021 [13]. 
Here, we extend these analyses and use the spread of A.27 in Germany in the first months of 2021 as an example of how genomic surveillance enables monitoring of SARS-CoV-2 variants at the national level. A.27 is of particular interest because the lineage shows a mutational pattern typical of VOIs/VOCs, including an accumulation of amino acid changes in the spike that are thought to lead to immune escape and higher transmissibility.

In addition to the sequences obtained via DESH, three samples assigned to the A.27 lineage were sequenced directly at the RKI as part of the IMS-SC2 lab-network. Amplicon sequencing was performed via either Illumina (n = 1) or Nanopore (n = 2), and consensus genomes were reconstructed using covPipe (Illumina data, unpublished, v3.0.1, https://gitlab.com/RKIBioinformaticsPipelines/ncov_minipipe, accessed on 1 May 2021) or poreCov (Nanopore data, [14], v0.7.8, https://github.com/replikation/poreCov, accessed on 1 May 2021). For Illumina, amplicon sequencing was performed using the CleanPlex SARS-CoV-2 Flex amplicon panel on an iSeq 100 system with 150 bp paired-end reads, yielding 283k reads for this sample. Nanopore sequencing was performed using the ARTIC V3 primer set on a MinION flow cell, resulting in 116k and 108k reads for the two A.27 samples, respectively. Both sequencing and reconstruction approaches resulted in high-quality consensus sequences with an N content of 0.40 % (both Nanopore-derived sequences) and 1.64 % (Illumina) per genome.

Based on the genomic data, we tested for differences in the proportion of the A.27 lineage using a Fisher exact test. Tests were performed separately for suspect and random sampling strategies. For each German state, the test was performed on a 2 × 2 count table showing, for pairs of consecutive calendar weeks (CW), the number of A.27 samples and the total number of non-A.27 samples in a state. 
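As an illustration of the test on a 2 × 2 count table for one pair of consecutive calendar weeks, the following is a minimal Python sketch and not the analysis code used in this study. It assumes a two-sided test (the text does not specify the alternative), and the function name and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test p-value for the table [[a, b], [c, d]].

    Rows are two calendar weeks; columns are the A.27 count and the
    non-A.27 count for one state. (Layout assumed for illustration.)
    """
    row1, row2 = a + b, c + d      # samples per week
    col1 = a + c                   # A.27 samples over both weeks
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x A.27 samples in the first week
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(col1, row1)
    p_observed = prob(a)
    # Sum over all tables that are at most as likely as the observed one;
    # the small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical counts: (A.27, non-A.27) in week k and in week k + 1
p = fisher_exact_two_sided(5, 495, 12, 488)
print(f"p = {p:.4f}")
```

Using exact integer binomial coefficients keeps the sketch free of external dependencies; for routine work an established statistics library would be the more natural choice.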
If no A.27 sequences were identified for a particular federal state in a given week, that week was skipped and sequences from the next week for that state were considered instead. We chose this approach to be more conservative in detecting an increase in the proportion. Only states in which A.27 samples were detected in at least three CW were considered. The obtained set of p-values was then corrected for multiple testing using the Benjamini-Hochberg procedure, and tests with an adjusted p-value below 0.1 were considered significant.

We used all available A.27 genomes and the selection of lineage A genomes from the NextStrain [15] global build to assemble a dataset for phylogenetic analyses including 718 A.27 genomes from RKI and GISAID. After excluding genomes of low quality or with a missing sampling date (<90 % reference genome coverage, >10 non-ACGTN ambiguous positions, >40 SNPs with respect to NC_045512.2), we aligned a total of 875 genomes with MAFFT using default parameters [16]. We used the resulting alignment to estimate a no-clock maximum-likelihood tree with IQ-TREE2 [17]. This tree showed a strong temporal signal in a regression of root-to-tip distance versus time (R² = 0.81) and was therefore used to estimate a time tree using tip dates only with TreeTime [18].

The PyMOL software version 1.7.2.1 [19] was used to label variant residues on the structure of the spike protein based on the template structure 7kj2 obtained from the Protein Data Bank (www.pdb.org, accessed on 1 July 2021) and provided by [20].

A total of 377 A.27 genomes from 16 different (mostly European) countries have been made available on GISAID (as of 10 May 2021). A.27 was first detected in Denmark in mid-December 2020, but most genomes (n = 214) were reported from France, including 12 from Mayotte, an overseas region and single territorial collectivity of France. The numbers might therefore point to an origin of A.27 in France, where the lineage may have spread early. 
However, A.27 was also found in Côte d'Ivoire (western sub-Saharan Africa) [13] and in Burkina Faso and Togo (based on GISAID sequences) in early 2021, indicating another possible origin of A.27 in West Africa. In addition, A.27 cases have been reported from West African countries based on sequencing data from Belgian forces involved in overseas operations [21].

Genomic surveillance at the RKI identified 710 sequences belonging to the lineage A.27 (as of 2 May 2021). The earliest sequence was discovered in a laboratory in Baden-Wuerttemberg (BW) and occurred in calendar week (CW) 01 in 2021; since then, a vast majority of A.27 cases in Germany were sequenced in BW (n = 572 to CW16, based on data through 2 May 2021). The relative abundance of this lineage in the region increased until CW08 (6.12 %) but has since then decreased (1.21 % in CW13). Meanwhile, the frequency of the VOC B.1.1.7 kept increasing in the region (5.84 % in CW 03, 71.0 % in CW 14; see corresponding reports at https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/DESH/Berichte-VOC-tab.html, accessed on 27 July 2021). Of the 710 A.27 genomes from Germany, 206 were obtained following a random sampling strategy, while 271 were collected as 'suspect samples' based on variant-specific PCRs or epidemiological circumstances (remainder unknown). For sequences from both categories ('suspect', 'random'), the proportion of reported A.27 sequences was compared between calendar weeks and significant increases were detected (Fisher exact test, adjusted p-value < 0.1, see Methods). In CW 07-08, we observed a 2.1-fold increase in proportion among the suspect samples in the BW region, followed the next week by a 3-fold increase among the samples obtained using a random sampling strategy (Figure 1A). 
Interestingly, among the random samples and during the same period (CW 07-10), Schleswig-Holstein also showed an impressive 33-fold increase in proportion (Figure 1A), but absolute numbers were still low (Table 1). Note that no A.27 sequences were detected in CW08 and 09 for that region.

based on suspect (n = 271) and random sampling (n = 206) strategies. Significant increases in proportion are marked with a star (adjusted p-value < 0.1). In CW 07-08, we observed a 2.1-fold increase in the proportion among suspect samples in the BW region and in CW 08-09 among random samples in the same region. Between CW07 and CW10 we observed an impressive 33-fold increase in proportion in Schleswig-Holstein among the random samples; however, no A.27 sequences were detected in CW08 and 09 for that region, and absolute numbers are low. (B) Epidemiological data of reported cases of lineage A.27 (n = 205) in Germany per age group. Median age of cases was 45 years. Note that information was not available in the reporting system for all data points, so they may have no value. In general, no information was available for CW04-06 as of 30 April 2021. (C) Distribution of cases over federal states based on epidemiological data; 92 % were notified in Baden-Wuerttemberg (date of reporting: 30 April 2021). Epidemiological data were not available for all federal states.

For 205 cases reported between CW03 and 14, additional information was available via the national electronic reporting system for surveillance of notifiable infectious diseases (SurvNet, implemented in 2001; [22]) (Figure 1B,C; geographic distribution similar to that seen in the entire dataset, data not shown). Most cases were reported for middle-aged patients (Figure 1B). Hospitalization was reported in 17 patients, and three died. None of the 205 reported cases with epidemiologic data originated in Schleswig-Holstein. 
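The week-to-week comparisons reported above combine a fold change in the A.27 proportion with p-values adjusted for multiple testing. The following Python sketch shows one way to compute both, assuming the standard Benjamini-Hochberg step-up adjustment; the function names and all counts and p-values are hypothetical and are not taken from Table 1.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fold_change(a27_prev, total_prev, a27_next, total_next):
    """Ratio of the A.27 proportion in the later week to the earlier week."""
    return (a27_next / total_next) / (a27_prev / total_prev)


# Hypothetical raw p-values from several state/week-pair comparisons
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [p < 0.1 for p in adj]  # threshold used in the text

# Hypothetical counts: 10 of 1000 samples, then 21 of 1000 samples
print(adj, significant, fold_change(10, 1000, 21, 1000))
```

A fold change computed from small counts is unstable, which is why a large ratio such as the one seen for Schleswig-Holstein needs to be read together with the absolute numbers.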
For phylogenetic analysis, we assembled a dataset of 718 A.27 genomes from RKI and GISAID and a selection of additional A genomes from NextStrain [15] as an outgroup. All A.27 genomes appeared in a monophyletic group whose most recent common ancestor (MRCA) was dated to early August 2020 (6 August 2020; Figure 2 ). However, all but three basal A.27 sequences formed a clade whose time to MRCA was much more recent, around late October 2020 (29 October 2020). The tree shows a basal polytomy from which different clades descend. One clade comprises many sequences originating in France, whereas the largest clade mainly comprises German sequences (Figure 2 ). The A.27 lineage is characterized by a high number of characteristic (lineage-defining) mutations that are also known from other VOC/VOI ( Figure 3C ). In total, A.27 harbors 17 lineage-defining mutations ( Figures 3A and 4A) , seven of which result in non-synonymous nucleotide substitutions in the spike protein: L18F, L452R, N501Y, A653V, H655Y, D796Y, G1219V. Among those, three alterations are of particular concern as they are located in the NTD or RBD epitope regions, respectively, and are suspected to result in immune escape (L18F, L452R and N501Y) and/or higher infectivity and transmissibility (L452R and N501Y). Antibody escape data for the RBD, integrating multiple experimental studies [5] (https://jbloomlab.github.io/SARS2_RBD_Ab_escape_maps, accessed on 3 May 2021), assign maximum mutation escape scores of 0.97 (L452R) and 0.90 (N501Y) over multiple antibodies/sera or antibody/serum types. 
The polymorphic positions are labeled on the S protein structure (Figure 3B), indicating the localization of L452R (reduced antibody neutralization [7, 8]) and N501Y (related to increased transmissibility [6]) in the RBD, A653V/H655Y in proximity to the S1/S2 furin cleavage site at position 681 that promotes infection and cell-cell fusion [23], and D796Y, closely located to the essential TMPRSS2-cleavage site at position 815 [24]. The L18F replacement is part of an antigenic supersite in the N-terminal domain of the spike protein [25]. Antibodies targeting NTD were shown to be less abundant, but more potent than those targeting RBD [25]. Mutations at this position were selected via in vitro passaging experiments and showed cross-resistance to monoclonal antibodies targeting NTD [25]. The S-L18F alteration also appears in VOCs that are associated with immune escape and reinfection, such as B.1.351 and P.1. Recently, the L452R change in the RBD, also known from B.1.427/429 [26] and B.1.617.2 and its novel sublineages AY.1 and AY.2 (Figure 3C), was suggested to contribute to escaping human leukocyte antigen-restricted cellular immunity while also increasing the binding affinity to ACE2, thus increasing viral infectivity and potentially enhancing virus replication [27]. S-N501Y also appears in VOCs B.1.1.7, P.1 and B.1.351; it may increase infectivity in vitro [28] and appears to confer resistance against some RBD-targeting antibodies (class 1) [29]. The H655Y substitution adjacent to the S1/S2 cleavage site and also known from P.1 was recently detected in a potential new VOI within the A lineage (temporarily designated A.VOI.V2) identified from three cases of incoming travelers from Tanzania to Angola [30]. Importantly, A.27 does not possess the D614G change, which emerged early in the pandemic and rapidly became dominant. D614G is associated with increased receptor binding [31] and infectivity [32]. 
Neither does A.27 contain the likely functionally equivalent Q613H that is present in A.23.1 [33] (Figure 3C). Almost all A.27 genomes have multiple deletions such as ORF3a:del258/259 and ORF8:del119/120, a few of which are highly conserved (Figures 3 and 4). We also observed in 58 (8.07 %) of the investigated A.27 genomes a 12-nt-long insertion (CTTTCGATCTCT) located at position 27,373 near the 3'-end of ORF6. However, all sequences with this insertion were submitted by a single laboratory via DESH, and further investigation showed that the inserted sequence is similar to a potential primer sequence in the first genomic amplicon. Therefore, a technical error in the sequencing protocol or in the genome reconstruction pipeline of the submitting laboratory cannot be excluded. The rare, relatively deep-branching lineages in the phylogenetic tree (Figure 2) prompted us to check the mutation profile of the corresponding three genomes (Figure 4, GISAID accessions: EPI_ISL_1170076, EPI_ISL_1567985, EPI_ISL_1353586). None possessed all seven of the aforementioned non-synonymous mutations in spike, nor any of the frequently observed deletions (Figure 4). Intriguingly, one of the genomes (EPI_ISL_1567985) encoded S-L452R, S-N501Y, and S-G1219V together with six other amino acid changes in the spike (A570D, D614G, P681H, T716I, S982A, I1221V), most interestingly comprising D614G and also P681H (Figure 4B). While we report here on these three unusual A.27 genomes, further analysis, including of their raw sequencing data, is required to confirm the observed changes that distinguish these sequences from standard A.27.

Figure 2. Among "standard" A.27 genomes harboring all seven spike mutations and the three prevalent deletions, we also found three genomes that show a varying mutation profile in the spike and lack all three high-frequency deletions. 
While the deletion on ORF6 at position 63 is listed with a high frequency (higher than 75 %) on outbreak.info (accessed on 24 May 2021), we only observe a frequency of 8.2 % in our data set based on a detection via NextStrain. * EPI_ISL_1353586 is not yet available on GISAID due to rejection of the sequence because of a frameshift.

Here, we report the increasing frequency of the SARS-CoV-2 lineage A.27 [...] we assume an earlier origin in West Africa. Thus, we speculate that West Africa may be the origin of the A.27 lineage, which may also explain why early and higher numbers of this specific lineage were first discovered in France and Mayotte. However, we must consider sparse sampling strategies and other caveats (random vs. suspect sampling, travel history, sequencing capacity, overall reporting, and data availability). Therefore, further analyses are needed to fully elucidate the origin of the SARS-CoV-2 A.27 lineage. Since A.27 shows mutations associated with both higher transmissibility and reduced antibody reactivity, we suggest that enhanced monitoring is warranted on both national and international levels. The A.27 lineage was indeed continuously detected in other European countries (France, Slovenia and the United Kingdom). However, the observed trend in BW clearly showed that A.27 was initially displaced by B.1.1.7 (Alpha), while B.1.617.2 (Delta) is now on the rise and should be monitored closely. Based on genomic data obtained via a random sampling strategy, we also found a significant increase in the proportion of A.27 sequences between CW 07 and CW 10 in the federal state of Schleswig-Holstein. Finally, our analyses identified rare but clearly divergent genomes assigned to A.27, suggesting that the current definition of the lineage probably needs to be reexamined. One of these rare genomes combines a D614G background and several mutations of concern and should therefore also be monitored. 
A.27 consensus genome sequences used in this study and obtained via DESH were uploaded to GISAID and the EMBL-EBI COVID-19 Data Portal (PRJEB44987). Note that due to sequencing or reconstruction errors (e.g., causing frameshifts) not all A.27 genome sequences obtained via DESH could be uploaded to GISAID immediately. For example, EPI_ISL_1353586 could not be uploaded to GISAID due to an incorrectly masked deletion at position 28,252 causing a frameshift. Once the potential errors that currently prevent uploading are fixed, the remaining sequences will also be made publicly available on GISAID. Meanwhile, all sequences and metadata obtained via DESH are uploaded daily to doi.org/10.5281/zenodo.5139363 (accessed on 27 July 2021), including all A.27 sequences used in this study.

[1] Computational strategies to combat COVID-19: Useful tools to accelerate SARS-CoV-2 and coronavirus research
[2] An integrated national scale SARS-CoV-2 genomic surveillance network
[3] SARS-CoV-2-Varianten: Evolution im Zeitraffer
[4] Genomic surveillance at scale is required to detect newly emerging strains at an early timepoint. medRxiv 2021
[5] Complete Mapping of Mutations to the SARS-CoV-2 Spike Receptor-Binding Domain that Escape Antibody Recognition
[6] The N501Y spike substitution enhances SARS-CoV-2 transmission
[7] Transmission, infectivity, and antibody neutralization of an emerging SARS-CoV-2 variant in California carrying a L452R spike protein mutation
[8] The Impact of Mutations in SARS-CoV-2 Spike on Viral Infectivity and Antigenicity
[9] Global initiative on sharing all influenza data-From vision to reality
[10] The COVID-19 Data Portal: Accelerating SARS-CoV-2 and COVID-19 research through rapid open access data sharing
[11] Spreading of a new SARS-CoV-2 N501Y spike variant in a new lineage
[12] Local emergence and decline of a SARS-CoV-2 variant with mutations L452R and N501Y in the spike protein
[13] SARS-CoV-2 variants of concern, variants of interest and lineage A
[14] poreCov-An easy to use, fast, and robust workflow for SARS-CoV-2 genome reconstruction via nanopore sequencing. Front
[15] Nextstrain: Real-time tracking of pathogen evolution
[16] MAFFT multiple sequence alignment software version 7: Improvements in performance and usability
[17] IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era
[18] Maximum-likelihood phylodynamic analysis
[19] PyMOL: An open-source molecular graphics tool. CCP4 Newsl. Protein Crystallogr
[20] A trimeric human angiotensin-converting enzyme 2 as an anti-SARS-CoV-2 agent
[21] Variant Analysis of SARS-CoV-2 Genomes from Belgian Military Personnel Engaged in Overseas Missions and Operations
[22] SurvNet electronic surveillance system for infectious disease outbreaks
[23] Furin cleavage of SARS-CoV-2 Spike promotes but is not essential for infection and cell-cell fusion
[24] TMPRSS2 and furin are both essential for proteolytic activation of SARS-CoV-2 in human airway cells
[25] N-terminal domain antigenic mapping reveals a site of vulnerability for SARS-CoV-2
[26] Neutralization of SARS-CoV-2 Variants B.1.429 and B.1.351
[27] An emerging SARS-CoV-2 mutant evading cellular immunity and increasing viral infectivity
[28] SARS-CoV-2 spike variants exhibit differential infectivity and neutralization resistance to convalescent or post-vaccination sera
[29] mRNA vaccine-elicited antibodies to SARS-CoV-2 and circulating variants
[30] A novel variant of interest of SARS-CoV-2 with multiple spike mutations detected through travel surveillance in Africa
[31] SARS-CoV-2 D614G spike mutation increases entry efficiency with enhanced ACE2-binding affinity
[32] SARS-CoV-2 spike-protein D614G mutation increases virion spike density and infectivity
[33] A SARS-CoV-2 lineage A variant (A.23.1) with altered spike has emerged and is dominating the current Uganda epidemic
[34] SARS-CoV-2 Lineage Comparison from outbreak

We gratefully acknowledge the authors from the originating laboratories responsible for obtaining the specimens and the submitting laboratories where genetic sequence data were generated and shared via DESH (German Electronic Sequence Data Hub) and GISAID, on which this research is based. We thank the data management team (MF4) and the DESH team at RKI for data handling and sharing, as well as the sequencing unit (MF2) for in-house sequencing in the context of the Integrated Molecular Surveillance SARS-CoV-2 (IMS-SC2). 
We thank all members of the bioinformatics team at RKI (MF1) involved in SARS-CoV-2 data handling, analyses, and reporting and particularly Oliver Drechsel for dealing with GISAID data uploads and Stephan Fuchs for fruitful discussions. We thank Jonas Fuchs for a helpful exchange, especially about the likely sequence insertion artifact found in a subset of the A.27 genomes. The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.