key: cord-0823260-7xrduhq7 authors: Hassan, Sk. Sarif; Aljabali, Alaa A.A.; Panda, Pritam Kumar; Ghosh, Shinjini; Attrish, Diksha; Choudhury, Pabitra Pal; Seyran, Murat; Pizzol, Damiano; Adadi, Parise; Abd El-Aziz, Tarek Mohamed; Soares, Antonio; Kandimalla, Ramesh; Lundstrom, Kenneth; Lal, Amos; Azad, Gajendra Kumar; Uversky, Vladimir N.; Sherchan, Samendra P.; Baetas-da-Cruz, Wagner; Uhal, Bruce D.; Rezaei, Nima; Chauhan, Gaurav; Barh, Debmalya; Redwan, Elrashdy M.; Dayhoff, Guy W.; Bazan, Nicolas G.; Serrano-Aroca, Ángel; El-Demerdash, Amr; Mishra, Yogendra K.; Palu, Giorgio; Takayama, Kazuo; Brufsky, Adam M.; Tambuwala, Murtaza M. title: A Unique View of SARS-COV-2 through the Lens of ORF8 Protein date: 2021-04-15 journal: Comput Biol Med DOI: 10.1016/j.compbiomed.2021.104380 sha: 1a659914e13660eb5f861e4fb33df54fd5817ba7 doc_id: 823260 cord_uid: 7xrduhq7 Immune evasion is one of the unique characteristics of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and is attributed to its ORF8 protein. This protein modulates adaptive host immunity through down-regulation of major histocompatibility complex class I (MHC-I) molecules, and innate immune responses by suppressing the host's interferon-mediated antiviral response. To understand the host's immune perspective concerning the ORF8 protein, a comprehensive study of the ORF8 protein and its mutations has been performed. Chemical and structural properties of ORF8 proteins from different hosts, such as human, bat, and pangolin, suggest that the ORF8 of SARS-CoV-2 is much closer to the ORF8 of Bat RaTG13-CoV than to that of Pangolin-CoV. Eighty-seven mutations across unique variants of ORF8 in SARS-CoV-2 can be grouped into four classes based on their predicted effects [1]. Based on the geo-locations and timescale of sample collection, a possible flow of mutations was built.
Furthermore, conclusive flows of mutation accumulation were derived from sequence similarity analyses and amino acid conservation phylogenies. Therefore, this study seeks to highlight the uniqueness of the rapidly evolving SARS-CoV-2 through the ORF8 protein.

Highlights:
* ORF8 is an accessory protein that is currently prevalent and has been suggested to interfere with inflammatory responses.

[...] ORF7a, ORF7b, ORF8, and ORF10 (Figure 1A) [12-16]. Among these accessory proteins, SARS-CoV-2 ORF8 is a complete protein, as it is different from any other known coronavirus ORF8 and thereby can be associated with SARS-CoV-2 pathogenicity [17, 18]. The SARS-CoV-2 ORF8 displays an array of functions: inhibition of type I interferon, promotion of viral replication, induction of apoptosis, and modulation of ER stress [19-21].

The SARS-CoV-2 ORF8 is a 121-amino-acid (aa) protein with an N-terminal hydrophobic signal peptide (1-15 aa) and an ORF8 chain (16-121 aa), whose dimeric crystal structure was determined at 2.04 Å resolution (PDB ID: 7JTL) (Figure 1B) [22, 23].
The functional motif (VLVVL) of SARS-CoV ORF8b, responsible for the induction of cell stress pathways and activation of macrophages, is absent from the SARS-CoV-2 ORF8 protein [24]. In the later stages of the SARS-CoV epidemic, a 29-nucleotide deletion in ORF8 was found to split the encoded protein into ORF8a (39 aa) and ORF8b (84 aa), rendering it functionless [25]. Although such deletions have not been reported for SARS-CoV-2, a 382-nucleotide deletion variant (∆382) was identified in Singapore and other countries, which caused the deletion of the entire ORF8 protein [26]. Patients with the ∆382 variant exhibited less severe symptoms, including milder hypoxic conditions and lower cytokine activity, than patients infected with the wild-type virus [26]. The SARS-CoV-2 ORF8 also functions in interspecies transmission and viral replication efficiency, as the ∆382 deletion variant showed a reduced viral replication ability in human cells [27]. However, the SARS-CoV-2 ORF8 mainly acts as an immune modulator by down-regulating MHC class I molecules, thereby shielding infected cells from killing by cytotoxic T cells (Figure 1C). Simultaneously, it is a potent inhibitor of the type I interferon signaling pathway, a key component of the antiviral host immune response [28, 29]. The ORF8 also regulates the unfolded protein response (UPR) induced by ER stress by triggering ATF-6 activation, thus enhancing the survivability of infected cells [30].
Since this protein impacts various host processes and develops various strategies for evading the host immune responses, it is essential to study the ORF8 mutations (natural variability) to better understand the viral infectivity and [...]

The present study identified a set of distinct mutations across unique variants of the SARS-CoV-2 ORF8 and classified them according to their predicted effect on the host (i.e., disease or neutral) and their consequences for protein structural stability. Furthermore, the ORF8 protein of SARS-CoV-2 was compared with the Bat-RaTG13-CoV and Pangolin-CoV ORF8 to determine the evolutionary relationships regarding sequence similarity and originality of these paralogues. Similarly, a hydropathy and charge examination of the SARS- [...]

The SARS-CoV-2 ORF8 protein (YP_009724396) is a 121-amino-acid protein with an N-terminal hydrophobic signal peptide (1-15 aa) and an ORF8 chain (16-121 aa). Fig. S1 shows a schematic representation of ORF8 (SARS-CoV-2). In this protein, the total number of hydrophilic residues (63) was greater than that of the hydrophobic residues (58). The ORF8 protein of SARS-CoV-2 has only 55.4% nucleotide and 30% amino acid similarity [...] signal sequence, which directs its transport to the endoplasmic reticulum (ER). However, after the deletion of the 29 nucleotides, which splits the ORF8ab protein into ORF8a and ORF8b, only ORF8a can translocate to the ER, and ORF8b remains distributed throughout the cell. Likewise, the SARS-CoV-2 ORF8 protein contains an N-terminal hydrophobic signal peptide (1-15 aa), which is involved in the same function. The ER has an internal oxidative environment akin to other organelles, necessary for correct protein folding and oxidation processes.
Due to this oxidative environment, intra- or intermolecular disulfide bonds can form between unpaired cysteine residues, as the SARS-CoV ORF8ab protein is an ER-resident protein. There are ten cysteine residues in the ORF8 of SARS-CoV, which can be involved in disulfide linkages leading to the formation of homomultimeric complexes in the ER. Similarly, the ORF8 of SARS-CoV-2 has seven cysteines, which may be expected to form these types of disulfide linkages.

[...] (Table S2). From Table S2, it is inferred that the secondary structures of the ORF8 of SARS-CoV-2 and Bat-RaTG13-CoV are more closely related to each other than to the ORF8 of Pangolin-CoV. Based on the sequence alignment, the ORF8 of SARS-CoV-2 differs substantially from the ORF8 of Pangolin-CoV, with a greater number of amino acid differences (mutations). It can be hypothesized that the SARS-CoV-2 ORF8 protein uses the ORF8 of Bat RaTG13-CoV as a blueprint for its structure.

[...] Figures 4C and 4D show that the variability in the disorder predisposition among many variants of the ORF8 protein from SARS-CoV-2 isolates is noticeably greater than that between the reference ORF8 from SARS-CoV-2 and the ORF8 proteins from Bat-RaTG13-CoV and Pangolin-CoV.

As the analyses mentioned above showed the similarity profile based on sequence and structure alignments, awareness of the physicochemical properties of the SARS-CoV-2 ORF8 protein is required to understand the composition of these viral proteins, whether to develop subunit vaccines or to design drugs targeting these specific proteins [44]. Physicochemical analysis revealed that the total number of hydrophilic residues in the SARS-CoV-2 ORF8 protein was higher than that of the hydrophobic residues [24]. However, the predicted secondary structure and solvent accessibility analysis (Fig.
S8) indicated that the highest solubility score for this protein is four, indicating that although hydrophilic residues are higher in number, they are insufficient to ensure high protein solubility. Fig. S8 shows the predicted secondary structure and solvent accessibility of the ORF8 proteins of SARS-CoV-2, Bat-RaTG13-CoV, and Pangolin-CoV, obtained using the ab initio web server QUARK to reveal the differences. The frequencies of the hydrophobic, hydrophilic, and charged amino acids were compared among the four ORF8 proteins of SARS-CoV, SARS-CoV-2, Bat-RaTG13-CoV, and Pangolin-CoV. As seen in Table S3, SARS-CoV-2 ORF8, Bat-RaTG13-CoV ORF8, and Pangolin-CoV ORF8 are all similar in terms of hydrophobicity and hydrophilicity. Hydrophobicity and hydrophilicity are known to play an essential role in protein folding, which determines the tertiary structure of the protein and thereby affects [...]

The N-terminal signal peptide of ORF8 (D1) of SARS-CoV-2 is hydrophobic. We further analyzed mutations within this region and observed that hydrophobic-to-hydrophobic mutations dominated, indicating that the domain's hydrophobicity is maintained (Table S6). Therefore, we can postulate that there are probably no functional changes in the hydrophobic N-terminal signal peptide associated with the evolutionary variability. Furthermore, a change from hydrophilic to hydrophobic residues was found at two positions of the D1 region, further enhancing its hydrophobic nature. Although hydrophobic-to-hydrophilic and hydrophilic-to-hydrophilic mutations were also observed, they were not expected to have significant effects when compared to hydrophobicity changes, as hydrophobic mutations were observed at eight positions.

Next, we compared the SARS-CoV-2 ORF8 with the Bat-RaTG13-CoV and the Pangolin-CoV ORF8 to study the evolution of the mutations.
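The residue-class comparison described above (frequencies of hydrophobic, hydrophilic, and charged amino acids in each ORF8 protein) can be sketched as follows. This is a minimal illustration rather than the procedure used to produce Table S3: the class memberships in `HYDROPHOBIC` and `CHARGED` are assumptions chosen for the example, the function name is hypothetical, and the input is a toy sequence, not an actual ORF8 sequence.

```python
from collections import Counter

# Assumed residue classes for illustration only; the study's own
# classification scheme may differ.
HYDROPHOBIC = frozenset("AGVLIPFMW")
CHARGED = frozenset("DEKR")
STANDARD = frozenset("ACDEFGHIKLMNPQRSTVWY")


def residue_class_frequencies(sequence):
    """Return the fraction of hydrophobic, hydrophilic, and charged residues.

    Every residue outside HYDROPHOBIC is counted as hydrophilic, so the
    charged residues form a subset of the hydrophilic ones.
    """
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - STANDARD
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")

    counts = Counter(seq)
    n = len(seq)
    hydrophobic = sum(counts[aa] for aa in HYDROPHOBIC)
    charged = sum(counts[aa] for aa in CHARGED)
    return {
        "hydrophobic": hydrophobic / n,
        "hydrophilic": (n - hydrophobic) / n,
        "charged": charged / n,
    }


if __name__ == "__main__":
    # Hypothetical toy peptide, not a real ORF8 fragment.
    print(residue_class_frequencies("MKFLVDEGST"))
```

Applying the same function to each of the four ORF8 sequences would give directly comparable fractions, although the absolute values depend on which residues are assigned to each class.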
The mutations in the ORF8 protein relative to the reference ORF8 sequence of Bat-RaTG13-CoV were found to be of the neutral type, as predicted by the web server Meta-SNP. All of them are expected to cause a decrease in ORF8 stability, as determined using the server I-Mutant (Fig. S5). [...]

In this flow of mutations, we describe the occurrence of mutations in the US sequences in chronological order, taking the Wuhan ORF8 sequence YP_009724396 as the reference sequence (Fig. 7). The protein sequence QMI92505.1 possesses a mutation, L4F, of the neutral type with no change in hydropathy. However, it showed a decreasing effect on the stability of the protein. Next on the time scale, another sequence, QMT48896.1, was identified in which a second mutation, D63N, emerged. This mutation is of the neutral type, and no change in hydropathy was observed. Therefore, this sequence accumulated two neutral mutations, which may affect the protein's function, as both mutations cause a decrease in protein stability. The QMT96239.1 sequence harbors another mutation, G8R, which is of the disease-increasing type, and the hydropathy changed from hydrophobic to hydrophilic. Another mutation, D35Y, occurred as a second-order mutation in the QMU92030.1 sequence in addition to the G8R mutation. As D35Y is neutral and G8R is of the disease-increasing type, their combination may alter both the protein's structure and function. To support these mutation flows, we analyzed the protein sequence similarity based on phylogeny and amino acid composition. The reference ORF8 sequence YP_009724396 was found to be much more similar to the variants QMT48896.1 and QMI92505.1, which are in turn more similar to each other, as depicted in the sequence-based phylogeny (Figure 7A). This sequence- [...]
This sequence- protein sequence QLH58953.1 acquired a second mutation, P38S, which was found to be of 609 the disease-increasing type, and the hydropathy also changed from hydrophobic to 610 hydrophilic, indicating that these mutations may be of some importance. The protein 611 sequence QLH58821.1 possesses a second mutation, V62L, which was found to be of the 612 disease-neutral type with no hydropathy change. Here, this sequence accumulated two neutral 613 mutations, which may account for some functional changes. By comparing both the 614 sequence-based phylogeny and amino acid conservation-based phylogeny, we found that 615 according to sequence-based phylogeny, the Australian sequence is closely related to the 616 ORF8 Wuhan sequence. However, according to the pathway, it should be closely related to 617 both the Wuhan sequence and second-order mutations. This can be attributed to the presence 618 of 119 amino acid residues instead of 121 amino acid residues. In this case, the sequence has 619 two amino acid deletions. Therefore, it is present at the first node. 620 621 We analyzed the US sequences considering the Wuhan sequence (YP 009724396.1) as the 624 reference and found one sequence, QKC05159.1, with a single mutation and seven sequences 625 with two mutations each ( Figure 7C ). The first sequence, QKC05159.1, contained the L84S 626 mutation (strain-determining mutation), neutral. However, the hydropathy changed from 627 hydrophobic to hydrophilic, which may account for some significant change of a function. 628 The sequences that accumulated a second mutation along with L84S are as follows: 629 QMT28672.1: This sequence possesses a second mutation V5F, which was predicted to be of 630 neutral type with no hydropathy change. Hence this sequence acquired two neutral mutations, 631 and together these mutations may alter the protein's function. protein, it is proposed that virus with the S24L mutation is a new strain altogether. 
We also observed that hydrophobic-to-hydrophobic mutations are dominant in the D1 domain. Therefore, hydrophobicity is an essential property of the N-terminal signal peptide. However, in the D2 domain, hydrophobic-to-hydrophilic mutations are observed more frequently, making ionic interactions more favorable and allowing the protein to evolve toward greater pathogenic efficacy.

The ORF8 sequence of SARS-CoV-2 shows 93% similarity with that of the Bat-RaTG13-CoV and 88% similarity with that of the Pangolin-CoV ORF8. Thus, the ORF8 protein of SARS-CoV-2 can be considered a valuable candidate for deterministic evolutionary studies and for determining the origin of SARS-CoV-2. We also analyzed a wide variety of mutations in the SARS-CoV-2 ORF8 and compared them with the ORF8 of Bat-RaTG13-CoV and Pangolin-CoV in relation to charge and hydrophobicity. We found that the Bat-RaTG13-CoV ORF8 protein exhibits precisely the same properties as the SARS-CoV-2 ORF8 protein, whereas the properties of the Pangolin-CoV ORF8 are less similar to those of the SARS-CoV-2 ORF8. Furthermore, to study the evolutionary nature of mutations in the ORF8, we aligned three bat sequences and found that two of them were identical, while the third differed from the other two at only six amino acid positions. Thus, only two variants were identified for the Bat-RaTG13-CoV ORF8, which suggests that the mutation rate is slow in the Bat-RaTG13-CoV ORF8.

For pangolins, however, no differences were observed among four Pangolin-CoV ORF8 sequences, and therefore only a single variant of ORF8 was identified. The Bat-RaTG13-CoV, the Pangolin-CoV, and the SARS-CoV-2 ORF8 displayed a high similarity index based on sequence alignment, biochemical characteristics, and secondary structure analysis [51].
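The pairwise comparisons summarized above (a percentage figure between two ORF8 sequences and a count of differing amino acid positions) can be illustrated with a short sketch that operates on sequences that have already been aligned. It computes percent identity, a simpler measure than the similarity scores reported in this study, and it is not the alignment tool used by the authors. The function names and the toy sequences are hypothetical.

```python
def aligned_differences(seq_a, seq_b):
    """List (column, residue_a, residue_b) for every differing column.

    Both inputs must come from the same alignment, so they have equal
    length; columns are numbered from 1 along the alignment.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return [
        (col, a, b)
        for col, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


def percent_identity(seq_a, seq_b, gap="-"):
    """Percentage of alignment columns with identical residues.

    Columns where both sequences have a gap are ignored; a gap opposite
    a residue counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Hypothetical aligned toy fragments, not real ORF8 sequences.
    ref = "MKFLVFLGII"
    var = "MKFLVALG-I"
    print(aligned_differences(ref, var))
    print(percent_identity(ref, var))
```

Two sequences with an empty difference list would be counted as a single variant, which is the sense in which the bat and pangolin sequences are grouped above.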
Additionally, in the ORF8 of SARS-CoV-2, specific mutations were found to exhibit exact [...]

Table S1 to Table S9

Immunoinformatic Analysis of Structural and Epitope Variations in Spike and Orf8 Proteins of SARS-CoV-2/B
The epidemiology and pathogenesis of coronavirus disease (COVID-19) outbreak
The effect of human mobility and control measures on the COVID-19 epidemic in China
Editorial: The explosive epidemic outbreak of novel coronavirus disease 2019 (COVID-19) and the persistent threat of respiratory tract infectious diseases to global health security
COVID-19 pandemic: an overview of epidemiology, pathogenesis, diagnostics and potential vaccines and therapeutics
Differences and similarities between Severe Acute Respiratory Syndrome (SARS)-CoronaVirus (CoV) and SARS-CoV-2. Would a rose by another name smell as sweet?
Systematic Comparison of Two Animal Human Transmitted Human Coronaviruses: SARS-CoV-2 and SARS-CoV
Underpinning Research for Detection, Therapeutics, and Vaccines Development
Understanding COVID-19 via comparative analysis of dark proteomes of SARS-CoV-2, human SARS and bat SARS-like coronaviruses
A Genomic Perspective on the Origin and Emergence of SARS-CoV-2
Genomic Diversity of Severe Acute Respiratory Syndrome-Coronavirus 2 in Patients With Coronavirus Disease
Compositional diversity and evolutionary pattern of coronavirus accessory proteins
Advances in research on ACE2 as a receptor for 2019-nCoV
The ORF6, ORF8 and nucleocapsid proteins of SARS-CoV-2 inhibit type I interferon signaling pathway
The Architecture of SARS-CoV-2 Transcriptome
Coronavirus envelope protein: a small membrane protein with multiple functions
The ORF8 Protein of SARS-CoV-2 Mediates Immune Evasion through Potently Downregulating MHC-I, bioRxiv
Genomic Divergence and Functional Convergence, Pathogens
Novel Immunoglobulin Domain Proteins Provide Insights into Evolution and Pathogenesis of SARS-CoV-2-Related Viruses
Region Is Valuable in the Epidemiological Investigation of Severe Acute Respiratory Syndrome-Similar Coronavirus
Genomic characterization of a novel SARS-CoV-2
Functional pangenome analysis provides insights into the origin, function and pathways to therapy of SARS-CoV-2 coronavirus, bioRxiv
Pangenome Analysis Shows Key Features of E Protein Are Preserved in SARS and SARS-CoV-2, Front Cell Infect Microbiol
Severe Acute Respiratory Syndrome From Gene Structure to Pathogenic Mechanisms and Potential Therapy
Effects of a major deletion in the SARS-CoV-2 genome on the severity of infection and the inflammatory response: an observational cohort study
Understanding genomic diversity, pan-genome, and evolution of SARS-CoV-2
Immune evasion via SARS-CoV-2 ORF8 protein?
Host immune response and immunobiology of human SARS-CoV-2 infection
The 8ab protein of SARS-CoV is a luminal ER membrane-associated protein and induces the activation of ATF6
Mechanisms of severe acute respiratory syndrome pathogenesis and innate immunomodulation
Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions
SARS-Coronavirus Open Reading Frame-8b triggers intracellular stress pathways and activates NLRP3 inflammasomes
SARS-CoV-2 ORF8 and SARS-CoV ORF8ab: genomic divergence and functional convergence
p.i.p.s. Kurgan, Computational prediction of intrinsic disorder in proteins
Mutational analysis of SARS-CoV-2 ORF8 during six months of
Optimizing long intrinsic disorder predictors with protein evolutionary information
Dynamics, Accurate prediction of disorder in protein chains with a comprehensive and empirically designed consensus
Comprehensive review of methods for prediction of intrinsic disorder and its molecular functions
Comprehensive comparative assessment of in-silico predictors of disordered regions
Structure of SARS-CoV-2 ORF8, a rapidly evolving coronavirus protein implicated in immune evasion, bioRxiv
Structure of SARS-CoV-2 ORF8, a rapidly evolving coronavirus protein implicated in immune evasion
Accurate Diagnosis of COVID-19 by a Novel Immunogenic Secreted SARS-CoV-2 orf8 Protein
An overview of vaccine development for COVID-19
A simple method for displaying the hydropathic character of a protein
A.o.p.s. Dayhoff, structure, A model of evolutionary change in proteins
Understanding genomic diversity, pan-genome, and evolution of SARS-CoV-2
Implications of SARS-CoV-2 Mutations for Genomic RNA Structure and Host microRNA Targeting
Pathogenetic perspective of missense mutations of orf3a protein of sars-cov2
Urgent Need for Field Surveys of Coronaviruses in Southeast Asia to Understand the SARS-CoV-2 Phylogeny and Risk Assessment for Future Outbreaks
The structural basis of accelerated host cell entry by SARS-CoV-2
Nosocomial outbreak of COVID-19 pneumonia in Wuhan, China
SWISS-MODEL: homology modelling of protein structures and complexes
UCSF ChimeraX: Structure visualization for researchers
PDBsum: Structural summaries of PDB entries
Current methods of mutation detection
Magic-BLAST, an accurate RNA-seq aligner for long and short reads
The EMBL-EBI search and sequence analysis tools APIs in 2019
Collective judgment predicts disease-associated single nucleotide variants
I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure
Toward optimal fragment generations for ab initio protein structure assembly
Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field
WebLogo: a sequence logo generator
Phylogeny of the serpin superfamily: implications of patterns of amino acid conservation for structure and function
Use of amino acid sequence data in phylogeny and evaluation of methods using computer simulation
Missense mutations in SARS-CoV2 genomes from Indian patients
SARS-CoV2 envelope protein: Non-synonymous mutations and its consequences
DNA barcode analysis: a comparison of phylogenetic and statistical classification methods
The neighbor-joining method: a new method for reconstructing phylogenetic trees
Exploiting heterogeneous sequence properties improves prediction of protein disorder
Accurate prediction of disorder in protein chains with a comprehensive and empirically designed consensus
Comprehensive review of methods for prediction of intrinsic disorder and its molecular functions
Comprehensive comparative assessment of in-silico predictors of disordered regions
Computational Prediction of Intrinsic Disorder in Proteins

The schematic figures were created with BioRender.