key: cord-0778005-ekeimh5b authors: Trieu, G.; Trieu, V. N. title: Mutational analysis of SARS-CoV-2. ORF8 and the evolution of the Delta and Omicron variants. date: 2021-12-21 journal: nan DOI: 10.1101/2021.12.19.21268069 sha: 4dd20c6c7d60dc19b1de8e62483d73014c5ec940 doc_id: 778005 cord_uid: ekeimh5b

SARS-CoV-2 is the virus responsible for the current pandemic. This virus is continually evolving, adapting to both innate and acquired immune responses and to therapeutic drugs. It is therefore important to understand how the virus is evolving in order to design appropriate therapeutics and vaccines in preparation for future variants. Here, we used the online SARS-CoV-2 databases Nextstrain and Our World in Data to map the evolution and epidemiology of the virus. We identified 30 high entropy residues which underwent a progressive evolution to arrive at the currently dominant variant, Delta. The virus underwent mutational waves, with the first wave affecting structural proteins important for its infectivity and the second wave affecting the ORFs important for its contagion. The most important driver of the second wave is ORF8 mutations at residues 119 and 120. Further mutations of these two residues are creating new clades that are offshoots from the Delta backbone. More importantly, the further expansion of the S protein in the Omicron variant is now followed by the acquisition of ORF8 mutations at residues 119 and 120. These findings demonstrate how SARS-CoV-2 mutates and point to two evolutionary paths: 1) mutational expansion on the Delta backbone among the ORFs, and 2) mutational expansion of the S protein on another backbone, followed by a mutational wave among the ORFs. Both are happening at the same time, with the Omicron variant early in the first wave, to be followed by a more aggressive second wave of mutations.
This preprint (which was not certified by peer review) was posted December 21, 2021. The copyright holder is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Since late 2019, an outbreak of upper respiratory infection and pneumonia caused by a novel coronavirus (2019-nCoV) has rapidly spread from its epicenter in Wuhan in Hubei province, China, to become a global epidemic with millions of cases and hundreds of thousands of deaths. The virus has now been named Severe Acute Respiratory Syndrome CoronaVirus 2 (SARS-CoV-2), and the disease it causes has been named CoronaVirus Disease 2019 (COVID-19). It is believed that the outbreak has a zoonotic origin, with animal-to-human transmission followed by human-to-human spread via aerosol droplets and contaminated surfaces. As with the prior outbreaks of ...

The genomic information for SARS-CoV-2 is known and has been shared. The SARS-CoV-2 genome encodes 28 confirmed proteins. Open reading frame 1ab (ORF1ab) encodes polyproteins PP1ab and PP1a, which are cleaved into 16 nonstructural proteins (Nsp1 to Nsp16). Additionally, there are four structural proteins (spike [S], envelope [E], membrane [M], and nucleocapsid [N]) and eight accessory proteins (ORF3a, ORF3b, ORF6, ORF7a, ORF7b, ORF8, and ORF9b). Mutational conservation is highest for the Nsp polyproteins, while the genome sequence encoding the accessory factors (ORFs) diverges greatly [1].

Study of the mutational changes of SARS-CoV-2 will help us understand the epidemiology of COVID-19 and project potential future mutational changes, so that interventional steps can be implemented to stop the pandemic. Global SARS-CoV-2 genomic sequencing efforts have contributed large amounts of sequencing data from several variants to public databases such as GISAID, Nextstrain, and NCBI SARS-CoV-2 Resources.
These databases have allowed the interrogation of viral diversity and the associated disease transmission in different countries [2].

In Nextstrain, the data are organized into a phylogenetic tree showing the evolutionary relationships of SARS-CoV-2 viruses. The site subsamples the available genome data for these analysis views, with ~600 genomes per continental region (~200 from before the last 4 months and ~400 from the most recent 4 months), in order to display a balanced global sequence distribution. Site numbering and genome structure use Wuhan-Hu-1/2019 as a reference, and the phylogeny is rooted relative to early samples from Wuhan. Temporal resolution assumes a nucleotide substitution rate of 8 × 10⁻⁴ substitutions per site per year. For each position, Nextstrain calculated the Shannon entropy of the distribution of amino acids [3], where a score of 0 corresponds to no variation and higher scores correspond to sites with increasing amino acid diversity. Epidemiological data were obtained through https://ourworldindata.org/coronavirus [4].

As shown in Figure 1, there are high entropy residues across the SARS-CoV-2 genome. Residues with entropy greater than 0.6 were examined for their ability to generate offshoots from the Delta backbone. Each of the 30 high entropy residues examined was able to separate Delta variants from non-Delta variants such as Alpha, Beta, and Omicron (Table 1). Of those, two are class 1 (S protein; residues 95 and 142) and exhibit a random distribution across the Delta phylogenetic tree. Four are class 2 (ORF1a, residue 2930; S protein, residue 158; ORF8, residues 119 and 120) and mark the start of new lineages on the Delta backbone.
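The per-position entropy score used above to select high entropy residues can be illustrated with a minimal sketch. It assumes the natural logarithm (the log base is not stated in the text), treats each alignment column as a plain sequence of one-letter amino acid codes, and uses hypothetical function names and toy columns; it is not Nextstrain's own implementation.

```python
import math
from collections import Counter


def shannon_entropy(residues):
    """Shannon entropy of the amino acids observed at one position.

    `residues` is any iterable of one-letter codes, one per genome.
    Natural log is an assumption; the text does not state the base.
    """
    counts = Counter(residues)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues observed at this position")
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log(p)
    return entropy


def high_entropy_sites(columns, threshold=0.6):
    """Return the positions whose entropy exceeds `threshold`.

    `columns` maps a position label to the residues seen at that position.
    The 0.6 cutoff is the one quoted in the text.
    """
    return [pos for pos, residues in columns.items()
            if shannon_entropy(residues) > threshold]


# Toy columns (hypothetical, not real sequence data).
toy_columns = {
    "site_a": "AAAAAA",   # fully conserved, entropy 0
    "site_b": "AAAABB",   # two states in a 4:2 split
}
selected = high_entropy_sites(toy_columns)
```

A conserved column scores 0, matching the description in the text, while a column split 4:2 between two residues scores about 0.64 and would pass the 0.6 cutoff under the natural-log assumption.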
One is class 3 (ORF7a; residue 71) and showed a phylogenetic linkage between two major branches of the Delta variant. The phylogenetic trees for these residues, displayed in divergence-linked radial layout, are shown in Figure 2. As shown in Figure 2d, these mutational changes occur in waves. The first wave consists of the structural proteins, with the N protein leading, followed by the S protein. The second wave consists of ORF proteins, with ORF8 mutations occurring early.

To further understand the role of ORF8 in the epidemiology of SARS-CoV-2, we scanned each residue of ORF8 for its ability to form new clades. As shown in Figure 3, only residues 27, 52, 73, 92, 119 and 120 were able to branch off into a new clade. Residues 52, 119 and 120 are part of the covalent dimer interface; residues 73 and 92 are part of the alternate dimeric interface.

To determine whether the residue 120 mutations have been disseminated globally, we performed the analysis with a regional subset of the data on Nextstrain from August 2021 to December 2021. The data shown in Figure 4 indicate that the mutations have spread widely into all regions under analysis, with the dominant regions being Egypt (F mutation) and India (L mutation). In Egypt, the number of new cases increased beginning in June 2021 but has now plateaued; the case fatality rate has been fluctuating, but there is no trend towards an increase (Figure 5). The data would suggest that F120 is potentially more contagious but not more virulent.
In contrast, the case fatality rate in India has been increasing to its highest level since the start of the pandemic, despite decreasing numbers of new cases, suggesting that L120, though endemic like F120 in Egypt, is more virulent.

We also examined the possibility that the emerging Omicron variant could acquire the F120 mutation characteristic of the Delta variant. As shown in Figure 6, this is indeed the case: a fraction of the Omicron variants has acquired the F120 mutation. This variant should be monitored carefully so that it does not become widespread, as the Delta variant once did following the completion of the second mutational wave.

The global analysis of SARS-CoV-2 through evaluation of high entropy residues is suggestive of stochastic evolution. The original SARS-CoV-2 mutated rapidly into ... contagious aerosols. It is possible that being able to bind the ACE2 receptor is an important initial step in establishing infectivity, followed by the ability to cause productive symptoms that can spread the virus. Infectivity serves as an anchor but is not sufficient for the growth of the virus. Contagion via productive sneezing and coughing is necessary to propagate the virus among the populace. Improving contagion before improving infectivity would fail, which would explain the observed order of the mutational waves. Once both infectivity and contagion had been optimized, the variant now known as Delta exploded.

This protein modulates adaptive host immunity through downregulation of MHC-I (major histocompatibility complex class I) molecules, and innate immune responses by suppressing the host's interferon-mediated antiviral response.
[12] The accessory protein ORF8 is one of the most rapidly evolving betacoronavirus proteins. While ORF8 expression is not strictly essential for SARS-CoV and SARS-CoV-2 replication, a 29-nucleotide deletion (Δ29) that occurred early in human-to-human transmission of SARS-CoV, splitting ORF8 into ORF8a and ORF8b, is correlated with milder disease [7]. A 382-nucleotide deletion (Δ382) in SARS-CoV-2 [8, 9] was also found to correlate with milder disease and a lower incidence of hypoxia [10]. These data would suggest that ORF8 is important in the virulence of the virus.

The crystal structure of SARS-CoV-2 ORF8 was determined at 2.04-Å resolution by X-ray crystallography. The structure reveals a ~60-residue core similar to SARS-CoV-2 ORF7a, with the addition of two dimerization interfaces unique to SARS-CoV-2 ORF8. The last three residues, 118-DFI-120, are part of the C-terminal covalent dimer interface and could be important in the dimerization of ORF8. The dimerization of ORF8 is a relatively new evolutionary event, unique to bat and human SARS-CoV-2 but not SARS-CoV [11]. Our analysis indicates that mutational changes to ORF8 affecting its dimerization site confer an evolutionary advantage on SARS-CoV-2, giving rise to the Delta variant. Additionally, the mutations that are starting new clades are all within the dimerization sites. This would suggest that undimerized ORF8 is more capable of inducing contagion.

The Contagion Airborne Transmission inequality can be used to calculate the risk of airborne transmission of respiratory infections.
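As a hedged illustration, the inequality can be read as a product of factors compared against a threshold. The sketch below follows the verbal list of factors given in this section; the class, field and function names are hypothetical, and the numbers in the usage example are arbitrary placeholders rather than measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExposureFactors:
    """Factors of the extrinsic side of the inequality (names are hypothetical)."""
    droplets_per_second: float        # droplets expelled per second
    virions_per_droplet: float        # average virus particles per droplet
    frac_past_source_mask: float      # fraction passing the emitter's face mask
    frac_aerosolized: float           # fraction of droplets that aerosolize
    frac_reaching_person: float       # fraction of aerosols reaching another person
    frac_containing_virus: float      # fraction of those droplets containing virus
    frac_inhaled_unmasked: float      # fraction inhaled by someone without a mask
    frac_past_receiver_mask: float    # fraction passing the receiver's mask
    duration_seconds: float           # duration of exposure


def inhaled_dose(f: ExposureFactors) -> float:
    """Multiply all factors to get the inhaled viral load over the exposure."""
    return (f.droplets_per_second
            * f.virions_per_droplet
            * f.frac_past_source_mask
            * f.frac_aerosolized
            * f.frac_reaching_person
            * f.frac_containing_virus
            * f.frac_inhaled_unmasked
            * f.frac_past_receiver_mask
            * f.duration_seconds)


def transmission_occurs(f: ExposureFactors, minimum_viral_load: float) -> bool:
    """True when the inhaled dose meets or exceeds the minimum required load."""
    return inhaled_dose(f) >= minimum_viral_load


# Arbitrary placeholder values chosen only to make the arithmetic easy to follow.
example = ExposureFactors(10, 2, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 100)
dose = inhaled_dose(example)  # 10 * 2 * 0.5**6 * 100
```

Because every term enters multiplicatively, halving any single factor (for example, by masking either person) halves the inhaled dose, which is why the sketch keeps each factor as a separate field.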
Transmission occurs if the extrinsic factor (Ex) is greater than or equal to the intrinsic factor (N_C19); that is, if the number of droplets expelled per second, times the average number of virus particles per droplet, times the fraction of droplets that make it past the face mask, times the fraction of droplets that aerosolize, times the fraction of aerosolized droplets that reach another person, times the fraction of those droplets that contain virus, times the fraction of droplets inhaled by someone not wearing a mask, times the fraction of droplets that make it through another person's mask, times the duration of exposure, is greater than or equal to the minimum inhaled viral load required.

ORF8 protein is abundantly secreted as a glycoprotein in vitro and in patients newly diagnosed with SARS-CoV-2 infection. The levels of ORF8 protein in the blood correlate with disease mortality in patients with acute infection, and fatality in hospital patients is associated with higher serum levels of ORF8. Glycosylated ORF8 stimulates PBMCs to produce SARS-CoV-2-specific cytokines (IL1b, IL6, IL8) but not IL2. ORF8 induces proinflammatory cytokines through activation of NLRP3-mediated inflammasome pathways [13].

Dysregulation of the interferon response is a strategy employed by viruses to evade host immunity. Previous investigations of the host response to SARS-CoV and MERS-CoV infection suggest that multiple coronaviruses employ this strategy [14] [15] [16] [17] [18]. NSP1, NSP6, NSP13 and other viral proteins interfere with IFN-beta translation [19] [20] [21].
ORF3b and ORF9b both specifically target a mitochondrial antiviral-signaling protein complex in order to inhibit type I IFN signaling [22, 23]. Host factors affecting SARS-CoV-2 infection outcomes include age and sex; furthermore, comorbidities have also been implicated in interferon dysregulation [24]. SARS-CoV-2 interaction with TGF-β also causes dysregulation of NK cell functions [25]. Although TGF-β is thought to suppress an excessive immune response to the virus, so that its suppression would be detrimental to COVID-19 patients [26], we have shown this to be incorrect: suppression of TGF-β is a therapeutic option against the virus, and it has been demonstrated that an untimely early production of TGF-β and the associated NK cell dysfunction are a hallmark of severe SARS-CoV-2 infection. ORF8 has been reported to bind to and disrupt TGF-β signaling [6] as well as to impair B-cell responses against COVID-19 [27]. TGF-β inhibitors such as OT-101, artemisinin, ... for the SOC alone group (N=10) (P=0.004, Log-rank test). These data provide clinical proof of concept that targeting the TGF-β pathway with artemisinin may contribute to a faster recovery of patients with mild-moderate SARS-CoV-2 infection when administered early in the course of their disease [35].

We thank the following individuals at Brush and Key Foundation for their help in this research: Nikkita Mehta, Jeffrey Park, Andrew Ionescu, Lily Asgari. Funding came from Oncotelic Inc.

https://ourworldindata.org/coronavirus (accessed December 17, 2021).
The amino acid residue and the date of divergence are indicated in text.

Comparative host-coronavirus protein interaction networks reveal pan-viral disease mechanisms
Nextstrain: real-time tracking of pathogen evolution
Predicting functionally important residues from sequence conservation
Coronavirus Pandemic (COVID-19)
Structure of SARS-CoV-2 ORF8, a rapidly evolving immune evasion protein
A unique view of SARS-CoV-2 through the lens of ORF8 protein
Secreted ORF8 is a pathogenic cause of severe Covid-19 and potentially targetable with select NLRP3 inhibitors
SARS-CoV and IFN: Too Little, Too Late
Dysregulated Type I Interferon and Inflammatory Monocyte-Macrophage Responses Cause Lethal Pneumonia in SARS-CoV-Infected Mice
The ORF8 protein of SARS-CoV-2 induced endoplasmic reticulum stress and mediated immune evasion by antagonizing production of interferon beta
Severe Acute Respiratory Syndrome Coronavirus Papain-Like
Regulation of IRF-3-dependent innate immunity by the papain-like protease domain of the severe acute respiratory syndrome coronavirus
Structural basis for translational shutdown and immune evasion by the Nsp1 protein of SARS-CoV-2, Science
A SARS-CoV-2 protein interaction map reveals targets for drug repurposing, Nat
Interplay between SARS-CoV-2 and the type I interferon response
SARS-CoV-2 ORF3b Is a Potent Interferon Antagonist Whose Activity Is Increased by a Naturally Occurring Elongation Variant
SARS-CoV-2 ORF9b suppresses type I interferon responses by targeting TOM70
Comorbidity and its impact on 1590 patients with COVID-19 in China: a nationwide analysis
Untimely TGFβ responses in COVID-19 limit antiviral functions of NK cells, Nat
TGF-β Activation and Function in Immunity
SARS-CoV-2 in severe COVID-19 induces a TGF-β-dominated chronic immune response that does not target itself
Severe COVID-19 Is Marked by a Dysregulated Myeloid Cell Compartment
Pathological inflammation in patients with COVID-19: a key role for monocytes and macrophages
Perforin and Granzymes Have Distinct Roles in Defensive Immunity and Immunopathology
Senescence of Activated Stellate Cells Limits Liver Fibrosis
Repurposing Anti-Malaria Phytomedicine Artemisinin as a COVID-19 Drug
Targeting Transforming Growth Factor-beta for Treatment of COVID-19-associated Kawasaki Disease in Children
Selectively targeting TGF-β with Trabedersen/OT-101 in treatment of evolving and mild ARDS in COVID-19
Targeting TGF-β pathway with COVID-19 Drug Candidate ARTIVeda/PulmoHeal Accelerates Recovery from Mild-Moderate COVID-19

Authors are employees and/or contractors of Oncotelic Inc.