key: cord-0991651-o90nt2uj authors: Parker, Matthew D; Lindsey, Benjamin B.; Shah, Dhruv R; Hsu, Sharon; Keeley, Alexander J; Partridge, David G; Leary, Shay; Cope, Alison; State, Amy; Johnson, Katie; Ali, Nasar; Raghei, Rasha; Heffer, Joe; Smith, Nikki; Zhang, Peijun; Gallis, Marta; Louka, Stavroula F; Hornsby, Hailey R; Whiteley, Max; Foulkes, Benjamin H; Christou, Stella; Wolverson, Paige; Pohare, Manoj; Hansford, Samantha E; Green, Luke R; Evans, Cariad; Raza, Mohammad; Wang, Dennis; Gaudieri, Silvana; Mallal, Simon; de Silva, Thushan I. title: Altered Subgenomic RNA Expression in SARS-CoV-2 B.1.1.7 Infections date: 2021-03-04 journal: bioRxiv DOI: 10.1101/2021.03.02.433156 sha: 65762d6fa68a93f6f6c173f97de61406f3aa6066 doc_id: 991651 cord_uid: o90nt2uj SARS-CoV-2 lineage B.1.1.7 viruses are more transmissible, may lead to greater clinical severity, and result in modest reductions in antibody neutralization. subgenomic RNA (sgRNA) is produced by discontinuous transcription of the SARS-CoV-2 genome and is a crucial step in the SARS-CoV-2 life cycle. Applying our tool (periscope) to ARTIC Network Oxford Nanopore genomic sequencing data from 4400 SARS-CoV-2 positive clinical samples, we show that normalised sgRNA expression profiles are significantly increased in B.1.1.7 infections (n=879). This increase is seen over the previous dominant circulating lineage in the UK, B.1.177 (n=943), which is independent of genomic reads, E gene cycle threshold and days since symptom onset at sampling. A noncanonical sgRNA which could represent ORF9b is found in 98.4% of B.1.1.7 SARS-CoV-2 infections compared with only 13.8% of other lineages, with a 16-fold increase in median expression. We hypothesise that this is a direct consequence of a triple nucleotide mutation in nucleocapsid (28280:GAT>CAT, D3L) creating a transcription regulatory-like sequence complementary to a region 3’ of the genomic leader. These findings provide a unique insight into the biology of B.1.1.7 and support monitoring of sgRNA profiles in sequence data to evaluate emerging potential variants of concern. One Sentence Summary The recently emerged and more transmissible SARS-CoV-2 lineage B.1.1.7 shows greater subgenomic RNA expression in clinical infections and enhanced expression of a noncanonical subgenomic RNA near ORF9b. The recently emerged SARS-CoV-2 lineages B.1.1.7 ( 20I/501Y.V1 or VOC-202012/01), B.1.351 ( 20H/501Y.V2 or VOC-202012/02 ) and P.1 ( 20J/501Y.V3 or VOC-202101/02) ( 1 ) have been classified as variants of concern by public health agencies. An increasing body of evidence suggests B.1.1.7 is more transmissible ( 2 , 3 ) and rapidly became the dominant circulating virus in the United Kingdom (UK) during October 2020 to February 2021 ( Figure 1A ). To date, B.1.1.7 has been reported in 93 countries ( https://cov-lineages.org/ , 22nd February 2021), with increasing prevalence. Recent data suggest a similar pattern for P.1 in Manaus ( 4 ) , Brazil ( 5 ) , and now 19 other countries ( https://cov-lineages.org/ , 22nd February 2021), where it appears to be more transmissible than it's ancestral lineage B.1.1.28 ( 4 ) . Along with B.1.351, these variants of concern harbour functionally important mutations in the SARS-CoV-2 Spike protein, some of which demonstrate evidence of convergent evolution across all three lineages (Supplementary Table S1 ). Preliminary data ( 6 , 7 ) shows that B.1.1.7 positive diagnostic respiratory samples may have lower cycle threshold (Ct) values, therefore higher viral loads, compared to other lineages. These findings suggest a potential reason for enhanced transmissibility, though they did not account for potential confounders such as days since symptom onset at sampling. Many of these studies also use S gene target failure (SGTF) as a surrogate for the presence of B.1.1.7 ( 6 ) , which might misclassify samples, depending on the prevalence of B.1.1.7 ( 8 ) . Several analyses from community tested cases also suggest increased mortality associated with B.1.1.7 ( 8 ) . Reasons for the potential viral load increase and enhanced mortality are currently unclear ( 9 ) . Genomic surveillance has been critical in rapidly identifying these variants and Nanopore sequencing of ARTIC Network ( 10 ) prepared SARS-CoV-2 amplicons is used by many laboratories to generate this data. We have recently reported an approach to quantify subgenomic RNA (sgRNA) expression profiles from genomic sequence data, which is produced as a result of a critical step in the SARS-CoV-2 replication cycle ( 11 ) . sgRNA is produced from the genomically encoded SARS-CoV-2 RNA Dependent RNA polymerase (RdRp) using discontinuous transcription of the positive, single-stranded SARS-CoV-2 genome from the 3' end. Negatively stranded RNAs are produced, which are shorter than the genome, owing to a template switch from the ORF to the leader sequence at the 5' end of the genome when RdRp encounters a transcription regulatory sequence in the genome body (TRS-B) to a complementary TRS 3' of the leader sequence (TRS-L). All sgRNAs therefore contain a leader sequence at their 3' end which can be used computationally for their identification. There are thought to be nine such canonical sgRNAs; Spike:S, E: Envelope, M: Membrane, N: Nucleocapsid, ORF3a, ORF6, ORF7a, ORF8 and ORF10, although multiple studies have found negligible ORF10 expression ( 12 , 13 ) . As part of COVID-19 Genomics Consortium UK (COG-UK) ( 14 ) we are sequencing SARS-CoV-2 positive nose-throat swabs from healthcare workers and patients at Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK ('Pillar 1' testing). Additionally, to relieve pressure on centralised sequencing services, we also sequence a selection from 'Pillar 2' testing, which represent SARS-CoV-2 positive samples from the community, tested at the UK's Lighthouse Laboratories ( Figure S1 ). We hypothesised that we would see differences in sgRNA expression profiles in distinct lineages of SARS-CoV-2, in particular, increased sgRNA in B.1.1.7 that may relate to its altered phenotype. We stratified sgRNA expression by lineage in 4400 SARS-CoV-2 sequences that reached our previously defined quality control thresholds (>90% genome coverage, >50K mapped reads, Table S2) Figure 1E&F ). Accordingly, total mapped reads and total genomic RNA reads (i.e. non-sgRNA) were also significantly higher for B.1.1.7 sequences compared to B.1.177 ( Figure S3 ). We observed a weak but significant negative correlation between raw sgRNA reads and It is possible that the days since symptom onset at sampling may vary between lineages in our dataset, either due to changes in sampling practice over time or presentation of individuals to healthcare services, which in turn could confound our sgRNA findings. We Figure 2F&S8 ). This increase appeared to be greatest on day one following symptom onset. Using the same methods for sgRNA estimation, we have previously shown that ACE2 and TMPRSS2 expression alter the kinetics of sgRNA expression in vitro , leading to a peak in expression at an earlier stage of infection ( 11 ) . It is possible that the increased affinity for ACE2 conferred by N501Y in the B. Figure 3A ). In addition to the noncanonical sgRNA resulting from the nucleocapsid R203K/G204R mutation (N*, Figure S2 homoplasies in a global phylogeny, suggesting that this event may have occurred independently on several occasions ( Figure S10 ). Taken together, our data suggests that sgRNA expression from existing ARTIC Nanopore sequencing data can be used in real time to examine the effect of SARS-CoV-2 variation on its ability to express its genome. We cannot say if the increased sgRNA expression is the cause or consequence of an increase in viral replication or a more efficacious entry, but our study provides further insight to guide exploration with mechanistic studies. A major advantage of this approach is that we can deconvolute the contribution of genomic and subgenomic RNA, which is impossible with current diagnostic PCR assays and we can, additionally, examine all ORFs simultaneously and discover noncanonical subgenomic RNA which could be of biological relevance. Finally we believe that sgRNA expression analysis should be carried out on all compatible genomic surveillance platforms to give an instant readout of altered expression profiles in emerging SARS-CoV-2 variants. This would use existing data to complement epidemiological and phylodynamic methods, and provide an early warning of variants that might be of concern with regards to greater transmissibility and/or disease severity. Subgenomic RNAs were classified using periscope ( 11 ) as described previously. Briefly, reads containing the leader sequence at their start are identified by a local alignment, the quality of this alignment determines which quality "bin" sgRNA reads are placed (HQ, LQ or LLQ). The amplicon from which the sgRNA evidence was generated is determined and a count of genomic reads for this amplicon used to normalise the raw sgRNA read counts. Samples were excluded from the subsequent analysis if: • Their consensus coverage was < 0.9 • They had less than 50,000 mapped reads (we have previously shown that fewer reads produce a less robust analysis) Lineages ( 1 ) were assigned using Pangolin ( https://github.com/cov-lineages/pangolin v2.1.7). Statistical analysis was performed in R ( 27 ) . Figures were generated in R using tidyverse ( 28 ) , apart from those that depict sequencing reads, which were generated in IGV ( 29 ) . Tests between groups were performed using an unpaired Wilcoxon test using the rstatix package (https://github.com/kassambara/rstatix) , adjusting p-values for any multiple comparisons using the Holm method. The grapevine pipeline ( https://github.com/COG-UK/grapevine ) was used for generating the phylogeny based on all data available on GISAID and COG-UK up until 16th February 2021. A representative sample of global sequences was obtained in 2 steps. First by randomly selecting one sequence per country per epi week followed by random sampling of the remaining sequences to generate a sample of 4000 sequences. The global tree was then pruned using code adapted from the tree-manip package ( https://github.com/josephhughes/tree-manip ). We then identified samples with D3L mutations and colour coded these tips according to their lineages. The visualisation was produced using R/ape, R/ggplot2, R/ggtree, R/treeio, R/phangorn, R/stringr, R/dplyr, R/aplot. Individuals presenting with active COVID-19 disease were sampled for SARS CoV-2 sequencing at Sheffield Teaching Hospitals NHS Foundation Trust, UK using samples collected for routine clinical diagnostic use. This work was performed under approval by the UK consortium (R&D NR0195). All SARS-CoV-2 consensus sequences that are of high enough quality are available on GISAID and ENA and from https://www.cogconsortium.uk/data/ . All sgRNA expression data is provided as supplementary files and at https://github.com/sheffield-bioinformatics-core/periscope-variants-publication . All raw sequencing data will be available on ENA as soon as possible. Periscope is available at https://github.com/sheffield-bioinformatics-core/periscope and the code used to generate the figures contained within this manuscript can be found as a RLU F q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q A ns **** ns **** ns **** ** **** ns ns A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology Early transmissibility assessment of the N501Y mutant strains of SARS-CoV-2 in the United Kingdom CMMID COVID-19 Working Group, Estimated transmissibility and severity of novel SARS-CoV-2 Variant of Concern 202012/01 Genomic characterisation of an emergent SARS-CoV-2 lineage in Manaus: preliminary findings Genomic characterization of a novel SARS-CoV-2 lineage from Rio de Janeiro, Brazil. medRxiv S-variant SARS-CoV-2 lineage B1.1.7 is associated with significantly higher viral loads in samples tested by ThermoFisher TaqPath RT-qPCR The COVID-19 Genomics UK (COG-UK) consortium, Early analysis of a potential link between viral load and the N501Y mutation in the SARS-COV-2 spike protein Increased hazard of death in community-tested cases of SARS-CoV-2 Variant of Concern Increased transmission of SARS-CoV-2 lineage B.1.1.7 (VOC 2020212/01) is not accounted for by a replicative advantage in primary airway cells or antibody escape periscope: sub-genomic RNA identification in SARS-CoV-2 ARTIC Network Nanopore Sequencing Data. bioRxiv (2020) The Architecture of SARS-CoV-2 Transcriptome The coding capacity of SARS-CoV-2 COG-UK) consortiumcontact@cogconsortium.uk, An integrated national scale SARS-CoV-2 genomic surveillance network Tracking SARS-CoV-2 VOC 202012/01 (lineage B.1.1.7) dissemination in Portugal: insights from nationwide RT-PCR Spike gene drop out data Improved sensitivity using a dual target, E and RdRp assay for the diagnosis of SARS-CoV-2 infection: Experience at a large NHS Foundation Trust in the UK Three adjacent nucleotide changes spanning two residues in SARS-CoV-2 nucleoprotein: possible homologous recombination from the transcription-regulating sequence The New SARS-CoV-2 Strain Shows a Stronger Binding Affinity to ACE2 Due to N501Y Mutation The high infectivity of SARS-CoV-2 B.1.1.7 is associated with increased interaction force between Spike-ACE2 caused by the viral N501Y mutation Sequence motifs involved in the regulation of discontinuous coronavirus subgenomic RNA synthesis SARS-CoV-2 Orf9b suppresses type I interferon responses by targeting TOM70 SARS-CoV-2 ORF9b Antagonizes Type I and III Interferons by Targeting Multiple Components of RIG-I/MDA-5-MAVS, TLR3-TRIF, and cGAS-STING Signaling Pathways SARS-CoV-2 ORF9b inhibits RIG-I-MAVS antiviral signaling by interrupting K63-linked ubiquitination of NEMO SARS corona virus peptides recognized by antibodies in the sera of convalescent cases SARS-CoV-2 proteome microarray for global profiling of COVID-19 specific IgG and IgM responses High-Throughput Transcription-mediated amplification on the Hologic Panther is a highly sensitive method of detection for SARS-CoV-2 R: A Language and Environment for Statistical Computing Others, Welcome to the Tidyverse Integrative genomics viewer We thank the Sheffield Bioinformatics Core for their useful thoughts and discussions. We