key: cord-0723703-i00c2ylu
authors: Shen, Lishuang; Bard, Jennifer Dien; Triche, Timothy J.; Judkins, Alexander R.; Biegel, Jaclyn A.; Gai, Xiaowu
title: Rapidly emerging SARS-CoV-2 B.1.1.7 sub-lineage in the United States of America with spike protein D178H and membrane protein V70L mutations
date: 2021-06-26
journal: Emerging microbes & infections
DOI: 10.1080/22221751.2021.1943540
sha: 892b3561e8a82d46ffa005d862d86c2146636c74
doc_id: 723703
cord_uid: i00c2ylu

The SARS-CoV-2 B.1.1.7 lineage is highly infectious and as of April 2021 accounted for 92% of COVID-19 cases in Europe and 59% of COVID-19 cases in the U.S. It is defined by the N501Y mutation in the receptor-binding domain (RBD) of the Spike (S) protein, and a few other mutations. These include two mutations in the N-terminal domain (NTD) of the S protein, HV69-70del and Y144del (also known as Y145del due to the presence of tyrosine at both positions). We recently identified several emerging SARS-CoV-2 variants of concern, characterized by Membrane (M) protein mutations, including I82T and V70L. We now identify a sub-lineage of B.1.1.7 that emerged through sequential acquisition of M:V70L in November 2020, followed by a novel S:D178H mutation first observed in early February 2021. The percentage of B.1.1.7 isolates in the US that belong to this sub-lineage increased from 0.15% in February 2021 to 1.8% in April 2021. To date, this sub-lineage appears to be U.S.-specific, with reported cases in 31 states, including Hawaii. As of April 2021, it constituted 36.8% of all B.1.1.7 isolates in Washington. Phylogenetic analysis and transmission inference with Nextstrain suggest this sub-lineage likely originated in either California or Washington. Structural analysis revealed that the S:D178H mutation is in the NTD of the S protein and close to two other signature mutations of B.1.1.7, HV69-70del and Y144del.
It is surface exposed and may alter NTD tertiary configuration or accessibility, and thus has the potential to affect neutralization by NTD-directed antibodies.

Introduction

B.1.1.7 emerged in the UK and was the first major SARS-CoV-2 variant of concern (VOC) that is both more transmissible and apparently more virulent [1]. It now accounts for 50-90% of the COVID-19 cases in the US and Europe. The Spike (S) protein N501Y mutation in the receptor-binding domain (RBD) confers higher binding affinity of the S protein for ACE2, while the two deletions in the N-terminal domain (NTD), HV69-70del and Y144del, may also play a role in ACE2 receptor binding or neutralizing antibody escape [2]. With millions of new B.1.1.7 cases in recent months, there is a very high probability of continuous acquisition of new mutations, some of which may result in the emergence of new and even more infectious sub-lineages of B.1.1.7. While these new mutations may not be significantly deleterious by themselves, when they appear in the context of other mutations within this VOC the result may be a more transmissible or pathogenic virus. This calls for rigorous genomic surveillance for newly acquired mutations in previously reported VOCs, including but not limited to B.1.1.7 and B.1.351.

Using the Children's Hospital Los Angeles (CHLA) COVID-19 Analysis Research Database (CARD) [3], and viral sequences submitted to GISAID and NCBI GenBank, we have routinely performed genomic epidemiology and genomic surveillance studies of local, national and international databases [4-9]. This allowed us to identify a new rapidly expanding SARS-CoV-2 lineage (B.1.575) with a signature mutation I82T in the M gene [7]. In the same study, we identified multiple other M mutations, including V70L, that are currently being encountered with significantly increased frequency. We have identified the M:V70L mutation in multiple SARS-CoV-2 lineages but primarily in the B.
The study conducted at Children's Hospital Los Angeles was approved by the Institutional Review Board under IRB CHLA-16-00429. Whole genome sequencing of the 2900 samples previously confirmed at Children's Hospital Los Angeles to be positive for SARS-CoV-2 by reverse transcription-polymerase chain reaction (RT-PCR) was performed as previously described [5]. Full-length SARS-CoV-2 sequences were periodically downloaded from GISAID [10, 11] and NCBI GenBank. They were combined with SARS-CoV-2 sequences from CHLA patients, annotated, and curated using a suite of bioinformatics tools, CHLA-CARD, as previously described [3]. A custom Surging Mutation Monitor (SMM) standardized and integrated the viral genome and demographic data in order to identify trends in surging mutations and lineages at the state and country levels. The current study was based on the 1.33 million global viral genomes that were available on 1 May 2021.

Phylogenetic analysis

Phylogenetic analysis was conducted using the Nextstrain phylogenetic pipeline (version 3.0.1) (https://nextstrain.org/). MAFFT (v7.4) was used for multiple sequence alignment [12]; IQ-TREE (multicore version 2.1.1 COVID-edition) and TreeTime version 0.7.6 were used to infer time-resolved evolutionary trees and to reconstruct ancestral sequences and mutations [13, 14].

Structural predictions of mutant Spike proteins were carried out against the wild type protein PDB qhd43416 (https://zhanglab.ccmb.med.umich.edu/COVID-19/) using the Missense3D service hosted online by Imperial College London (http://www.sbg.bio.ic.ac.uk/~missense3d/) [15]. CoV3D was used as the Spike Protein Mutation Viewer for the multiple mutations in B.1.1.7 (https://cov3d.ibbr.umd.edu/MutViewer/QTY83983).

Identification of a rapidly emerging B.1.1.7 sublineage

We evaluated 1,333,679 SARS-CoV-2 viral genomes available on 1 May 2021, including 2900 from our own institution and the rest from GISAID and NCBI GenBank.
We searched for SARS-CoV-2 mutations with a significantly higher prevalence rate both in the US and globally. Candidate mutations were further partitioned by pangolin lineage to identify emerging mutations in the context of a specific lineage, such as B.1.1.7 or B.1.351. We focused initially on the M mutations that we previously identified, including V70L, which was spiking near the end of 2020 [7]. Overall, the percentage of isolates that carried the M:V70L mutation had been relatively stable in the US and globally, with a gradual month-to-month increase (Table 1). In the vast majority of cases, the M:V70L mutation occurred on the B.1.1.7 lineage. While the percentage of B.1.1.7 isolates with the V70L mutation remained relatively stable across the world, the percentage fluctuated significantly in the US, attributable largely to the initially small number of B.1.1.7 cases in the U.S.

We identified the acquisition of another S mutation, D178H, in this B.1.1.7 sub-lineage (Figure 1 and Table 2), which was estimated to have occurred on 23 January 2021. By April, the prevalence of SARS-CoV-2 isolates carrying the S:D178H mutation had increased to 1.05% nationally and to as high as 14.77% in Washington. When we examined the prevalence of S:D178H in the context of the B. Table 3 ). In California, S:D178H was first seen in December 2020, but it was not seen within the B.1.1.7 lineage until 4 February 2021 (Table 3). Its prevalence increased to 1.6% (45/2904) in April compared to all isolates studied in California,

The 3D structure of the Spike protein was visualized using the CoV3D mutation viewer. Using these results we were able to show that the S:D178H mutation is structurally close to two signature deletions of B.1.1.7, HV69-70del and Y144del (Figure 4). They are all surface exposed and likely alter N-terminal domain (NTD) tertiary configuration.
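The lineage-partitioned prevalence calculation described above can be illustrated with a minimal Python sketch. This is not the authors' Surging Mutation Monitor. The record fields (`date`, `lineage`, `mutations`), the function name `monthly_prevalence`, and the toy records are all hypothetical and invented for illustration. The sketch only shows the idea of restricting isolates to one pangolin lineage and then computing, per month, the share that carries a given mutation.

```python
from collections import defaultdict


def monthly_prevalence(records, lineage, mutation):
    """Return {month: (carriers, total, percent)} for isolates of one lineage.

    `records` is an iterable of dicts with hypothetical keys:
    "date" (YYYY-MM-DD), "lineage" (pangolin name), "mutations" (set of str).
    """
    counts = defaultdict(lambda: [0, 0])  # month -> [carriers, total]
    for rec in records:
        if rec["lineage"] != lineage:
            continue  # partition by lineage first
        month = rec["date"][:7]  # keep only YYYY-MM
        counts[month][1] += 1
        if mutation in rec["mutations"]:
            counts[month][0] += 1
    return {
        month: (carriers, total, 100.0 * carriers / total)
        for month, (carriers, total) in sorted(counts.items())
    }


# Invented toy records, NOT data from the study.
TOY_RECORDS = [
    {"date": "2021-02-05", "lineage": "B.1.1.7", "mutations": {"M:V70L", "S:D178H"}},
    {"date": "2021-02-11", "lineage": "B.1.1.7", "mutations": {"M:V70L"}},
    {"date": "2021-02-17", "lineage": "B.1.1.7", "mutations": set()},
    {"date": "2021-02-20", "lineage": "B.1.1.7", "mutations": set()},
    {"date": "2021-02-21", "lineage": "B.1.351", "mutations": {"S:D178H"}},
    {"date": "2021-04-02", "lineage": "B.1.1.7", "mutations": {"M:V70L", "S:D178H"}},
    {"date": "2021-04-09", "lineage": "B.1.1.7", "mutations": {"M:V70L"}},
]

result = monthly_prevalence(TOY_RECORDS, "B.1.1.7", "S:D178H")
for month, (carriers, total, percent) in result.items():
    print(f"{month}: {carriers}/{total} = {percent:.1f}%")
```

Counting carriers and totals separately, rather than storing only the percentage, keeps the denominators visible, which matters when early months contain very few isolates of the lineage.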
It should be noted that our study relied primarily upon sequence data deposited at GISAID, which represents a limitation and could introduce bias, as state public health laboratories have varied sequencing capacity and nonuniform data sharing and reporting practices. It is, however, less likely to be biased significantly by the practice of a single laboratory. As an example, we included 2900 SARS-CoV-2 sequences that we obtained at Children's Hospital Los Angeles since March 2020. We found only 29 sequences belonging to the B. This abrupt change could reflect undersampling or potentially reflect superspreader events. Given the spike in the number of B.1.1.7 cases, this sub-lineage clearly appears to be more transmissible than even the original B.1.1.7 lineage. This finding warrants prompt and further attention by public health authorities, as this mutation profile is closely linked to the resurgence of cases in Washington in particular. It also strongly supports the now widely recognized need for more extensive SARS-CoV-2 viral sequencing of PCR-positive COVID-19 cases for detection of new mutations of concern as part of widespread genomic surveillance [16, 17].

The S:D178H mutation, while demonstrably associated here with the more pathogenic B.1.1.7 lineage, is not necessarily by itself more pathogenic. Dozens of SARS-CoV-2 genomes that carry the S:D178H mutation were reported before February, but none of these demonstrated the increased frequency seen when the mutation occurs in the context of the B.1.1.7 lineage. Phylogenetic analysis revealed a distinct and long branch leading to the new S:D178H branch after M:V70L. Together, these observations suggest that the S:D178H mutation is recurrent, but increased exponentially only in the context of the more pathogenic B.1.1.7 lineage, which serves as an argument for its fitness. This is the same observed

The S:D178H mutation arose independently again in the US on the B.1.1.7-M:V70L background.
The rapid increase in its prevalence only after its acquisition by the B.1.1.7-M:V70L sub-lineage suggests that this combination of mutations is associated with increased transmissibility. It is also of interest that this mutation occurs in the NTD, unlike most of the mutations associated with current VOCs, which are centred on the spike protein RBD, implying that NTD mutations beyond the original 69-70del and 144del are of concern. Finally, it should be noted that this NTD mutation co-exists with the previously reported M protein mutation M:V70L, suggesting that M protein mutations also contribute to enhanced biologic "fitness" or pathogenicity of this sub-lineage.

The appearance of the S:D178H mutation in the context of the B.1.1.7 lineage is temporally associated with the increased incidence of COVID-19 in Washington. New cases in Washington were higher than the national level at the time of the study. According to the New York Times COVID-19 dashboard, the 7-day average of new cases on May 2 was 1379 in Washington, which was only a 50% reduction compared to 2757 cases on December 15. In comparison, the number on May 2 was 49,270 in the US, a 77.3% reduction from the December 15 number of 217,325. The appearance of this new B.1.1.7 sub-lineage, temporally linked to increased cases in Washington, warrants further investigation.

The potential effect of the S:D178H mutation on immunity and vaccine "escape" also warrants further analysis. Mutations in the Spike N-terminal domain have been associated with a lack of neutralization by NTD-directed antibodies, especially when the N5 loop is affected [18, 19]. The NTD initiates viral binding to the ACE2 receptor-expressing host cell. Since D178H falls in the NTD close to the N5 loop, it may alter NTD structure and antibody recognition. It may thus have an immune evasion effect similar to that of the HV69-70del and Y144del mutations [20], or it may further enhance the effect of those two mutations, based on the 3D model.
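The case-count comparison above (Washington versus the US as a whole) is simple percent-reduction arithmetic, and a short Python check makes it easy to verify. The sketch uses the 7-day averages quoted in the text and reads the December 15 national figure as 217,325, which is an assumption about the intended grouping of digits. The function name `percent_reduction` is hypothetical.

```python
def percent_reduction(earlier, later):
    """Percent drop from an earlier value to a later value."""
    return 100.0 * (earlier - later) / earlier


# 7-day averages of new cases quoted in the text (Dec 15 -> May 2).
washington = percent_reduction(2757, 1379)
# Assumption: the national Dec 15 figure is read as 217,325.
national = percent_reduction(217325, 49270)

print(f"Washington: {washington:.1f}% reduction")
print(f"US overall: {national:.1f}% reduction")
```

Under that reading, both rounded values agree with the percentages reported in the text (about 50% for Washington and 77.3% nationally).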
These findings highlight the continued importance of active genomic surveillance to monitor the spread of this B.1.1.7-M:V70L-S:D178H lineage.

[1] Calum Semple G. SAGE meeting report
[2] Comprehensive mapping of mutations in the SARS-CoV-2 receptor-binding domain that affect recognition by polyclonal human plasma antibodies
[3] Children's Hospital Los Angeles COVID-19 Analysis Research Database (CARD) - a resource for rapid SARS-CoV-2 genome identification using interactive online phylogenetic tools
[4] Comprehensive genome analysis of 6,000 USA SARS-CoV-2 isolates reveals haplotype signatures and localized transmission patterns by state and by country
[5] High prevalence of SARS-CoV-2 genetic variation and D614G mutation in pediatric patients with COVID-19. Open Forum Infect Dis
[6] Persistent SARS-CoV-2 infection and increasing viral variants in children and young adults with impaired humoral immunity. medRxiv [Preprint]. 2021. Update in: EBioMedicine
[7] Emerging variants of concern in SARS-CoV-2 membrane protein: a highly conserved target with potential pathological and therapeutic implications
[8] Increased viral variants in children and young adults with impaired humoral immunity and persistent SARS-CoV-2 infection: a consecutive case series
[9] Utility of viral whole-genome sequencing for institutional infection surveillance during the coronavirus disease 2019 (COVID-19) pandemic
[10] Data, disease and diplomacy: GISAID's innovative contribution to global health
[11] GISAID: global initiative on sharing all influenza data - from vision to reality
[12] MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform
[13] IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era
[14] TreeTime: maximum-likelihood phylodynamic analysis
[15] Can predicted protein 3D structures provide reliable insights into whether missense variants are disease associated?
[16] Alarming COVID variants show vital role of genomic surveillance
[17] SARS-CoV-2 variants of concern in the United States - challenges and opportunities
[18] Antibody resistance of SARS-CoV-2 variants B.1.351 and B.1.1.7
[19] SARS-CoV-2 501Y.V2 escapes neutralization by South African COVID-19 donor plasma
[20] Recurrent deletions in the SARS-CoV-2 spike glycoprotein drive antibody escape

The authors would like to acknowledge all members of the Department of Pathology and Laboratory Medicine (PLM) at Children's Hospital Los Angeles for their dedication to providing excellent patient care throughout the pandemic, especially the rapid launch in March 2020 of both the COVID-19 diagnostic test in the Clinical Microbiology and Virology Laboratory and the SARS-CoV-2 whole genome sequencing assay at the Center for Personalized Medicine, both with the support of Thermo Fisher and Paragon Genomics, with whom these assays were developed. The authors would also like to acknowledge the frontline healthcare workers who remain devoted to the fight against COVID-19. We would also like to acknowledge all institutions that have contributed SARS-CoV-2 sequences in a timely manner, as well as NCBI, GISAID, and Nextstrain for providing valuable resources for SARS-CoV-2 genomics.

No potential conflict of interest was reported by the author(s).

Lishuang Shen http://orcid.org/0000-0002-0436-0199