key: cord-0906430-tx924rhr authors: Tsai, Ping-Hsing; Wang, Mong-Lien; Yang, De-Ming; Liang, Kung-How; Chou, Shih-Jie; Chiou, Shih-Hwa; Lin, Ta-Hsien; Wang, Chin-Tien; Chang, Tai-Jay title: Genomic variance of Open Reading Frames (ORFs) and Spike protein in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) date: 2020-07-17 journal: J Chin Med Assoc DOI: 10.1097/jcma.0000000000000387 sha: 95ac68b903e5d6defeef4c80069b69fbfd363331 doc_id: 906430 cord_uid: tx924rhr
The outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) began causing severe pneumonia in December 2019. Since then, the virus has spread widely from Wuhan, China, to Asia, Europe, and the United States, becoming a worldwide pandemic. Globally, over 3 084 740 cases of coronavirus disease 2019 have now been diagnosed, with 212 561 deaths. Current reports describe variants in SARS-CoV-2, mainly in its functional ribonucleic acid (RNA) genome, which is transcribed into the structural proteins: the transmembrane spike (S) glycoprotein; the nucleocapsid (N) protein, which holds the viral RNA genome; and the envelope (E) and membrane (M) proteins, which together with the spike protein form the viral envelope. The nonstructural portion of the genome includes ORF1ab, ORF3, ORF6, ORF7a, ORF8, and ORF10, with ORF1ab carrying highly conserved information for genome synthesis and replication.
METHODS: We apply genomic alignment analysis to SARS-CoV-2 sequences from GenBank (http://www.ncbi.nim.nih.gov/genebank/): MN908947 (China, C1); MN985325 (United States: WA, UW); MN996527 (China, C2); MT007544 (Australia: Victoria, A1); MT027064 (United States: CA, UC); MT039890 (South Korea, K1); MT066175 (Taiwan, T1); MT066176 (Taiwan, T2); LC528232 (Japan, J1); and LC528233 (Japan, J2), and from the Global Initiative on Sharing All Influenza Data database (https://www.gisaid.org). We use the Clustalw Multiple Sequence Alignment web service (https://www.genome.jp/tools-bin/clustalw) and the Geneious web platform (https://www.geneious.com).
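Sequence comparisons of the kind described in the Methods are often summarized as percent identity between two aligned sequences. The sketch below is a minimal illustration, not the authors' pipeline; it assumes the sequences are already aligned to equal length with gaps written as "-", and it follows one common convention (gapped columns are excluded from the denominator). Other tools normalize differently, so numbers from this hypothetical `percent_identity` helper need not match Clustalw or Geneious output.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two sequences that are already aligned.

    Assumption: gaps are written as '-', and any column in which either
    sequence has a gap is left out of the comparison.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0  # nothing left to compare
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical, not real SARS-CoV-2 sequences)
    print(percent_identity("MFVF-LVLL", "MFVYALVLL"))  # 87.5
```

In the toy example, eight columns are compared (the gapped column is skipped) and seven match, giving 87.5%.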
RESULTS: We analyze the database by genome alignment search for the nonstructural ORFs and the structural E, M, N, and S proteins. Mutations in ORF1ab, ORF3, and ORF6 are observed, and specific variants in the spike region are detected.
CONCLUSION: We perform genomic analysis and comparative multiple sequence alignment of SARS-CoV-2. Large-scale sequence alignments can trace, localize, and catch different mutant strains in the United States that may pose a severe, deadly threat to humans. Studies of the biological effects of SARS-CoV-2 in clinical animal models and in humans will be needed to find its mechanisms and shed light on the origin of the pandemic crisis.
The outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) began causing severe pneumonia in December 2019. 1 Since then, it has spread widely from Wuhan, China, to Asia, Europe, and the United States, becoming a worldwide pandemic. 2 Severe cases began at the Huanan Seafood Wholesale Market in China, where human pneumonia was confirmed to result from infection with a novel coronavirus (2019-nCoV), 3 later named SARS-CoV-2 by the International Committee on Taxonomy of Viruses. 4, 5 Globally, over 3 084 740 cases of coronavirus disease 2019 have now been diagnosed, with 212 516 deaths. 6 Current reports describe single nucleotide variants in many patients infected with SARS-CoV-2, which belongs to the beta-coronaviruses. The functional genomic ribonucleic acid (RNA) of SARS-CoV-2 is transcribed into its structural proteins: the transmembrane spike (S) glycoprotein, which mediates viral entry into the host cell by using the host's cellular angiotensin-converting enzyme 2 (ACE2); the nucleocapsid (N) protein, which holds the viral RNA genome; and the envelope (E) and membrane (M) proteins, which together with the spike protein form the viral envelope.
7 The nonstructural portion of the genome, including ORF1ab, ORF3, ORF6, ORF7a, ORF8, and ORF10, contains highly conserved information for genomic RNA synthesis and replication in ORF1ab, whereas the functions of the other ORF proteins are not yet clearly verified. 8 Infection begins when SARS-CoV attaches to a receptor on the host cell membrane and then induces membrane endocytosis to enter the host cell. ORF1 of the viral genome directs replication and, afterward, the synthesis of subgenomic RNAs. Meanwhile, N protein and newly synthesized genomic RNA assemble into helical nucleocapsids, while M protein is inserted into the endoplasmic reticulum (ER) and anchored in the Golgi of the host cell. E and M proteins then trigger the budding process. S, together with helical N on the membrane-bound ER, triggers translation of the required viral structural proteins and their transport to the Golgi. In the final stage, virions are released by exocytosis, completing the life cycle and replication of the virus. 9 The earlier SARS-CoV-1 of 2003 was possibly transmitted through bats and civets as its intermediate hosts, and finally to humans, causing severe respiratory symptoms with a 10% mortality rate. By contrast, Wuhan SARS-CoV-2 is suspected to have been transmitted from bats (RaTG13) to pangolins as intermediate hosts before reaching humans by unknown mechanisms, causing severe respiratory symptoms and the highest mortality to date. 10 The genomic sequence of RaTG13 shares 96% similarity with the Wuhan coronavirus. 11 Although the intermediate host is not yet clear, genomic sequence comparison indicates that the spike receptor-binding domain (RBD) of Wuhan SARS-CoV-2 is 90% homologous to that of the pangolin virus. This raises the possibility that a pangolin virus contributed its spike protein region, by recombination with RaTG13, to form the new recombinant Wuhan SARS-CoV-2 that was finally transmitted to humans. 12 The S protein of SARS-CoV-1 and SARS-CoV-2, which is responsible for viral entry, binds ACE2 on the host cell membrane through its RBD.
13 The surface spike (S) protein of SARS-CoV comprises two subunits, S1 and S2. The S protein of SARS-CoV-2 binds to the host receptor ACE2 through its S1 subunit, which contains the RBD, and then fuses the viral and host membranes through its S2 subunit, which contains the fusion peptide and is primed by a host protease. SARS-CoV-2 contains six major ORFs. ORF1ab occupies two-thirds of the length of the whole genome; in addition to its replication function, it plays roles in viral pathogenesis and is involved in cellular signaling and the modification of cellular gene expression. 14 At present, there is no established antiviral therapy or treatment for SARS-CoV-2. Further study of the molecular genomic variants involved in selection and packaging is critical for developing antiviral strategies. We use the Clustalw Multiple Sequence Alignment web service (https://www.genome.jp/tools-bin/clustalw) as our alignment tool to verify and compare sequences, and we perform phylogenetic analysis on the Geneious platform (https://www.geneious.com). ORF1ab encodes 16 proteins, joined together as a polyprotein, that carry out viral genome replication and synthesis. Our analysis of this long 7096-amino-acid protein reveals eight mutations located in different regions and found in sequences from various countries: T609I in the California (United States) sequence, G818S in Sweden and India, M902I in Korea, F3071Y in Spain, S3120L in China, L3606X in Italy and L3606F in Japan, F4321L in Sweden and India, and T6891M in Korea. ORF3a functions as an accessory protein that helps new virus synthesis and escape from the host cell. We find mutations at four positions: M128L in Korea, K136X in Spain, G196V in Spain, and G251V in Italy, Korea, and Sweden. There are no mutations in ORF6, ORF7a, and ORF10, but we do find one mutation in ORF8, L84S, in sequences from Spain, India, and China. The abundant M protein defines the shape of the viral envelope.
N protein functions primarily to bind the RNA genome of SARS-CoV, forming the nucleocapsid. 15 Although N is mostly involved in processes related to the viral genome, it is also involved in the RNA replication cycle and in the host cellular response to viral infection. Although SARS-CoV-1 and SARS-CoV-2 differ at many positions within the M and N proteins, we observe no variant in M protein, but we find a point mutation, S197L, in N protein from Spain.
3.6. S protein
S protein mediates the attachment of SARS-CoV-1 to host cell surface receptors and the subsequent fusion of the viral and host membranes that allows viral entry into the host cell. 15 Expression of S protein at the cell membrane can also mediate cell-cell fusion. This offers the virus a way to spread between cells while subverting virus-neutralizing antibody mechanisms, which act mainly by controlling protein interactions. By analysis of S protein, we find four mutations in sequences from 10 countries: S221W in Korea, S247R in Australia, F737C in Sweden, and A870V in India (Figs. 3-6). The ORF1ab mutations are shown in Fig. 1. ORF3a functions as an accessory protein that helps new virus synthesis and escape from the host cell. We find mutations at four positions: M128L in Korea, K136X in Spain, G196V in Spain, and G251V in Italy, Korea, and Sweden (Fig. 2). We do not observe any mutations in the ORF6, ORF7a, and ORF10 proteins, but we find one mutation in ORF8, L84S, in sequences from Spain, India, and China. No conclusion can yet explain how these mutations arose (Fig. 3). In a comparison of 10 strains from different countries, one mutation of E protein is observed, L37H in Korea (Fig. 4).
Fig. 4. Genomic analysis of E protein amino acid sequence. We found one amino acid mutation at position 37, L37H, with "H" in the sequence from South Korea in place of the "L" present in the other nine sequences. The yellow line indicates the difference in the 10-sequence alignment.
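The substitutions above are written as reference residue, position, and variant residue (for example, L37H in the E protein or S197L in the N protein). The sketch below shows one way such labels can be derived from a pairwise alignment against a reference. It is a minimal illustration under stated assumptions, not the method used by Clustalw or Geneious: both sequences are assumed to be pre-aligned with "-" for gaps, positions are counted on the ungapped reference, and insertions and deletions are simply skipped. The helper name `call_substitutions` and the toy sequences are hypothetical.

```python
from typing import List


def call_substitutions(ref_aln: str, query_aln: str) -> List[str]:
    """Report amino acid substitutions as <ref residue><position><query residue>.

    Positions are 1-based and counted on the ungapped reference, matching
    labels such as L37H. Insertions and deletions are skipped; only
    substitutions are reported.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have the same length")
    calls = []
    ref_pos = 0  # position in the ungapped reference
    for r, q in zip(ref_aln.upper(), query_aln.upper()):
        if r == "-":
            continue  # insertion in the query: no reference coordinate
        ref_pos += 1
        if q == "-" or q == r:
            continue  # deletion or identical residue
        calls.append(f"{r}{ref_pos}{q}")
    return calls


if __name__ == "__main__":
    # Toy alignment (hypothetical): one inserted residue and one substitution
    print(call_substitutions("MA-LSTL", "MAKLSHL"))  # ['T5H']
```

Because the inserted "K" has no reference coordinate, the substituted residue is labeled at reference position 5 rather than alignment column 6.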
Inside the envelope lies the nucleocapsid, which is formed from multiple copies of the nucleocapsid (N) protein bound to the positive-sense, single-stranded RNA genome in a continuous beads-on-a-string conformation. 16 The lipid bilayer envelope, membrane proteins, and nucleocapsid protect the virus when it is outside the host cell. 17 The N protein holds the viral RNA, and the M protein joins with the E and S proteins to create the viral envelope that protects the virus outside the host cell; however, we do not find any point mutation in M protein. We do find a point mutation, S197L, in N protein from Spain. The binding of M to N stabilizes the nucleocapsid (N protein-RNA complex) and the internal core of virions and, ultimately, promotes completion of viral assembly. 18 There is as yet no evidence on whether S197L abolishes N protein function (Fig. 5). By analysis of S protein, we find four mutations in sequences from 10 countries: S221W in Korea, S247R in Australia, F737C in Sweden, and A870V in India (Fig. 6). A previous report 19 showed that a single amino acid reversion (L294Q) in the S protein is sufficient to abrogate the phenotype, with the virus growing well at and below 32 °C. SARS-CoV-1 and SARS-CoV-2 share 80% sequence homology, but alignment reveals only 75% similarity in the spike protein. The S protein mediates viral entry into host cells by first binding to a host receptor through the RBD in the S1 subunit and then fusing the viral and host membranes through the S2 subunit, which is primed by host cell proteases. [20] [21] [22] [23] Unraveling which cellular factors SARS-CoV-2 uses for entry might provide insights into viral transmission and reveal therapeutic targets. The RBDs of SARS-CoV and Middle East respiratory syndrome coronavirus (MERS-CoV) recognize different receptors: SARS-CoV recognizes ACE2 as its receptor, whereas MERS-CoV recognizes dipeptidyl peptidase 4.
14, 24 SARS-CoV-2 also recognizes ACE2 as the host receptor for its S protein. 25 Therefore, it is critical to define the RBD in the SARS-CoV-2 S protein, the most likely target for blocking virus attachment, in order to develop new inhibitors, neutralizing antibodies, and vaccines. Tai et al 26 characterized the SARS-CoV-2 RBD and presented a multiple sequence alignment of the RBDs of the SARS-CoV-2, SARS-CoV, and MERS-CoV spike (S) proteins. They identified the RBD in the SARS-CoV-2 S protein and found that the RBD protein bound strongly to human and bat ACE2 receptors. The SARS-CoV-2 RBD displayed significantly higher binding affinity for the ACE2 receptor than the SARS-CoV RBD. Moreover, SARS-CoV RBD-specific antibodies could cross-react with the SARS-CoV-2 RBD protein, and SARS-CoV RBD-induced antisera could cross-neutralize SARS-CoV-2, suggesting the potential to develop SARS-CoV RBD-based vaccines for prevention of SARS-CoV-2 and SARS-CoV infection. 26 Hoffmann et al report that SARS-CoV-1 and SARS-CoV-2 share 76% amino acid identity in the spike protein. Using an amino acid alignment, they compared the receptor-binding motif of SARS-CoV-1 with the corresponding sequences of bat-associated beta-coronavirus S proteins. This comparison showed that SARS-CoV-2 possesses the amino acid residues crucial for binding ACE2 as its cellular receptor. They also point out similarities between SARS-CoV-2 and SARS-CoV-1 at the host cell entry stage and identify a potential target for antiviral intervention. By inspecting conserved amino acids within the ACE2-binding domain, the Hoffmann group showed that SARS-CoV-2 cell entry depends on two proteins, ACE2 and transmembrane serine protease 2 (TMPRSS2), and is blocked by a clinically proven protease inhibitor.
27, 28 Through a large-scale analysis of spike protein sequences from many countries, we find variants in US cases, including specimens from the east coast of the United States. These US variants differ from the sequence of Chinese origin (Fig. 7). Mutant-1 carries "G" at position 614 instead of the "D" of the Chinese strain (D614G). The Mutant-2 strain retains "D" at position 614, like the Chinese strain, but carries other mutations in different regions (Fig. 8A). Mutant 2-2 also retains "D" at position 614 but displays only one mutation, the same as that of the QIS60546 strain from China (Fig. 8B). Studies suggest that various viral strains spread from China to Europe, where one strain apparently acquired deadly mutations, and then finally spread to New York, while other, milder strains spread from China to the west coast of the United States. 29 That report states that SARS-CoV-2 has acquired mutations capable of substantially changing its pathogenicity. Does this observation match our finding of three variants in New York, and are these variants more severely transmitted to humans than the strains on the west coast of the United States? A limitation of this study is that we performed only data mining, by alignment and phylogenetic analysis of sequences from public databases such as the Global Initiative on Sharing All Influenza Data and the National Center for Biotechnology Information. It will be interesting to apply biological approaches to specimens in hand to observe the correlation between clinical and laboratory analyses directly. In conclusion, we analyze the database by genome alignment search for the nonstructural ORFs and the structural E, M, N, and S proteins. Large-scale analysis can catch different mutant strains in America that may pose a severe, deadly threat to humans. More studies of the biological effects of SARS-CoV-2 in clinical animal models and in humans will shed light on the origin of the pandemic crisis.
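The D614G comparison above (Mutant-1 versus Mutant-2 and Mutant 2-2) amounts to reading each aligned sequence at the alignment column that corresponds to reference position 614. The sketch below illustrates that lookup under stated assumptions; it is not the authors' Geneious workflow. It assumes all sequences are already aligned to the reference with "-" for gaps, and the helper names `ref_column` and `classify_at` as well as the toy sequences are hypothetical, not real spike sequences.

```python
from typing import Dict


def ref_column(ref_aln: str, ref_pos: int) -> int:
    """Return the 0-based alignment column of 1-based reference position ref_pos."""
    if ref_pos < 1:
        raise ValueError("reference positions are 1-based")
    seen = 0
    for col, residue in enumerate(ref_aln):
        if residue != "-":
            seen += 1
            if seen == ref_pos:
                return col
    raise ValueError(f"reference has fewer than {ref_pos} residues")


def classify_at(ref_aln: str, aligned: Dict[str, str], ref_pos: int = 614) -> Dict[str, str]:
    """Label each aligned sequence by its residue at one reference position."""
    col = ref_column(ref_aln, ref_pos)
    ref_res = ref_aln[col].upper()
    labels = {}
    for name, seq in aligned.items():
        if len(seq) != len(ref_aln):
            raise ValueError(f"{name} is not aligned to the reference")
        res = seq[col].upper()
        if res == ref_res:
            labels[name] = f"{ref_res}{ref_pos} (as reference)"
        elif res == "-":
            labels[name] = f"gap at {ref_pos}"
        else:
            labels[name] = f"{ref_res}{ref_pos}{res}"
    return labels


if __name__ == "__main__":
    # Toy sequences (hypothetical): 'D' sits at ungapped reference position 614
    ref = "-" + "A" * 613 + "D" + "A" * 6
    mutant_1 = "M" + "A" * 613 + "G" + "A" * 6
    mutant_2 = "M" + "A" * 613 + "D" + "A" * 6
    print(classify_at(ref, {"mutant_1": mutant_1, "mutant_2": mutant_2}))
    # {'mutant_1': 'D614G', 'mutant_2': 'D614 (as reference)'}
```

Mapping through the ungapped reference, rather than using column 614 directly, keeps the label correct when the alignment contains gaps before that position, as the leading gap in the toy reference shows.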
Middle East respiratory syndrome: emergence of a pathogenic human coronavirus.
SARS and MERS: recent insights into emerging coronaviruses.
Novel coronavirus (2019-nCoV) situation report 23.
World Health Organization. Coronavirus disease.
The International Committee on Taxonomy of Viruses (ICTV).
The genome sequence of the SARS-associated coronavirus.
Characterization of a novel coronavirus associated with severe acute respiratory syndrome.
The proximal origin of SARS-CoV-2.
Coronaviruses: an overview of their replication and pathogenesis.
Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China.
A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster.
Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan.
Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus.
SARS coronavirus replicase proteins in pathogenesis.
Coronavirus envelope protein: current knowledge.
The SARS coronavirus nucleocapsid protein-forms and functions.
A structural analysis of M protein in coronavirus assembly and morphology.
The membrane M protein carboxy terminus binds to transmissible gastroenteritis coronavirus core and contributes to core stability.
A single amino acid mutation in the spike protein of coronavirus infectious bronchitis virus hampers its maturation and incorporation into virions at the nonpermissive temperature.
Interaction between heptad repeat 1 and 2 regions in spike protein of SARS-associated coronavirus: implications for virus fusogenic mechanism and identification of fusion inhibitors.
MERS-CoV spike protein: targets for vaccines and therapeutics.
Structure of SARS coronavirus spike receptor-binding domain complexed with receptor.
Molecular basis of binding between novel human coronavirus MERS-CoV and its receptor CD26.
Dipeptidyl peptidase 4 is a functional receptor for the emerging human coronavirus-EMC.
A pneumonia outbreak associated with a new coronavirus of probable bat origin.
Characterization of the receptor-binding domain (RBD) of 2019 novel coronavirus: implication for development of RBD protein as a viral attachment inhibitor and vaccine.
SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor.
Patient-derived mutations impact pathogenicity of SARS-CoV-2. medRxiv 2020.
This research was funded by Taipei Veterans General Hospital (grant number V107E-002-2, V108D46-004-MY2-1, V108E-006-4, 108E-006-5, and 109VACS-003).