key: cord-0955566-amsrzqmo
authors: Shang, Jingzhe; Han, Na; Chen, Ziyi; Peng, Yousong; Li, Liang; Zhou, Hangyu; Ji, Chengyang; Meng, Jing; Jiang, Taijiao; Wu, Aiping
title: Compositional diversity and evolutionary pattern of coronavirus accessory proteins
date: 2020-10-30
journal: Brief Bioinform
DOI: 10.1093/bib/bbaa262
sha: 1eb392218253f9a6130ddca0732554eb192df6f9
doc_id: 955566
cord_uid: amsrzqmo

Accessory proteins play important roles in the interaction between coronaviruses and their hosts. Accordingly, a comprehensive study of the compositional diversity and evolutionary patterns of accessory proteins is critical to understanding the host adaptation and epidemic variation of coronaviruses. Here, we developed a standardized genome annotation tool for coronaviruses (CoroAnnoter) by combining open reading frame prediction, transcription regulatory sequence recognition and homologous alignment. Using CoroAnnoter, we annotated 39 representative coronavirus strains to form a compositional profile for all of the accessory proteins. The number of accessory proteins varied widely among coronaviruses, from 1 to 10, with SARS-CoV-2 and SARS-CoV having the most (9 and 10, respectively). The variation between SARS-CoV and SARS-CoV-2 accessory proteins could be traced back to related coronaviruses in other hosts. The genomic distribution of accessory proteins had significant intra-genus conservation and inter-genus diversity and could be grouped into 1, 4, 2 and 1 types for alpha-, beta-, gamma- and delta-coronaviruses, respectively. Evolutionary analysis suggested that accessory proteins located before proteins E and M (E-M) in the genome are more conserved, while those located after these proteins are more diverse.
Furthermore, comparison of the virus-host interaction networks of SARS-CoV-2 and SARS-CoV accessory proteins showed that they share multiple antiviral signaling pathways, including those involved in the apoptotic process, viral life cycle and response to oxidative stress. In summary, our study provides a tool for coronavirus genome annotation and builds a comprehensive profile for coronavirus accessory proteins covering their composition, classification, evolutionary pattern and host interaction.

Recently, SARS-CoV-2, a new coronavirus that causes the severe respiratory disease COVID-19, has become prevalent worldwide [1, 2]. The World Health Organization declared the SARS-CoV-2 outbreak a Public Health Emergency of International Concern on 30 January 2020 [3]. As of 23 July 2020, the SARS-CoV-2 virus had spread to 188 countries or regions, with more than 15.2 million people diagnosed. SARS-CoV-2 is the third coronavirus found to cause severe human disease. As early as 2003, the SARS-CoV coronavirus spread worldwide, infecting more than 8000 people with a fatality rate of nearly 10% [4, 5]. Ten years later, another coronavirus with a fatality rate of more than 20%, MERS-CoV, spread from the Arabian Peninsula to 27 countries [6-9]. Although the fatality rate of SARS-CoV-2 (about 3.61%) is lower than that of the previous two viruses [10], this virus is more contagious, with R0 values estimated at 2-6.47 [11-13]. The frequent emergence of different types of coronavirus that cause serious diseases has raised important questions about the diversity and evolutionary associations of these viruses.

Coronaviruses are enveloped, positive-sense, single-stranded RNA viruses with the largest genomes among RNA viruses [14]. Coronaviruses have a certain degree of replication fidelity, which may explain their relatively large genomes [15]. All coronaviruses have a similar genomic structure.
At the 5′ end, two-thirds of the genome comprises two large open reading frames (ORFs) (ORF1a and ORF1b) encoding the coronavirus replicase, which is highly conserved among genera. At the 3′ end, the genome encodes four structural proteins (S, E, M and N) and a variable number of accessory proteins. Based on the highly conserved ORF1ab coding region, coronaviruses can be divided into four genera: alpha-, beta-, gamma- and delta-coronavirus. Alpha- and beta-coronaviruses mainly infect mammals, while gamma- and delta-coronaviruses primarily infect birds [16]. In addition to SARS-CoV, MERS-CoV and SARS-CoV-2, there are currently four other coronaviruses that can infect humans: HCoV-229E [17], HCoV-OC43 [18], HCoV-NL63 [19] and HCoV-HKU1 [20]. These viruses are all either alpha- or beta-coronaviruses. Studies have shown that the genome compositions of these seven human-infecting coronaviruses are significantly different, especially for accessory proteins [21, 22].

Coronaviruses have a unique discontinuous transcription mechanism. By recognizing specific transcription regulating sequences (TRSs), mature mRNA can be obtained by one-time transcription [23]. In theory, when naming encoded proteins, especially the variable coronavirus accessory proteins, it is necessary to refer to the location of the TRS in the genome. However, because there is no standardized genome annotation tool, some genome annotation studies lack location information for the TRS. This makes the annotation and naming of coronavirus proteins inconsistent, especially for accessory proteins, which vary greatly in number and composition among viral strains. Therefore, a standardized method and tool for coronavirus genome annotation is urgently needed.

Coronavirus accessory proteins are normally located behind structural proteins, and each location may encode a different number of accessory proteins, which makes their composition highly complex.
Previous studies have shown that accessory proteins play an important role in virus-host interactions, especially in antagonizing or regulating host immunity and in virus adaptation to the host [24]. However, there is still no normalized annotation and analysis for coronavirus accessory proteins, and no in-depth explorations of the evolutionary relationships of accessory proteins between different strains have been conducted to date (Figure 1A).

To address these issues, we first developed a semi-automatic coronavirus annotation tool named CoroAnnoter for standardized annotation of all coronaviruses. We then obtained a comprehensive genome composition profile for all of the representative coronaviruses. The composition and evolutionary analysis revealed large variations in accessory proteins between strains as well as inherently conserved evolutionary patterns.

Representative coronavirus genome sequences were obtained from the genomes available in the National Center for Biotechnology Information (NCBI) database (Supplementary Table 1). The protein sequences of the self-built BLAST database were mainly obtained from the Virus Pathogen Resource (ViPR), including 86 747 protein sequences of all coronaviruses collected up to 16 February 2020. Some protein sequences were also taken from the annotated proteins of the genomes in the NCBI database.

The coronavirus genome annotation process consists of the following six steps. (i) ORFfinder is used to predict the ORFs of the genome. The ORF start codon is ATG, the genetic code is set to the standard code and the minimum predicted length is 60 bp. (ii) The BLAST program is used to conduct sequence alignment to obtain credible ORFs with sequence similarities. The E value is set to 1E-2, and the three best-matching sequences are retained. The BLAST results are then manually checked, and the two redundant results are removed.
(iii) An R script is used to obtain the 100 nucleotides before the 5′ end of each credible ORF. (iv) The MEME Suite is used to identify conserved transcription regulatory sequences (TRSs) [25]. The default classic mode is used with the zoops (zero or one occurrence per sequence) motif distribution, and the motif length is between 6 and 8 nucleotides. The five best results are kept for manual selection of the most suitable motif as the viral TRS. Apart from the leader sequence, the TRSs of the other ORFs are normally located at the 3′ end of these 100 nucleotides. (v) An R script is used to identify the positions of the TRS on the genome. (vi) The coronavirus genome is then annotated by integrating the information from the TRS and ORFs.

The similarity of accessory proteins is measured based on the score of the pairwise sequence alignment. We use the pairwise alignment function of the R package 'Biostrings' for sequence alignment [26]. The alignment type was global, based on the Needleman-Wunsch algorithm, and the substitution matrix was BLOSUM62.

The CDS sequences of the accessory proteins of each coronavirus were combined according to their order in the genome to form a virtual accessory protein sequence. Coronavirus accessory protein sequences of the same genus were placed into a file and submitted to Circoletto (http://tools.bat.infspire.org/circoletto/), an online server that visualizes similar sequences [27]. The E-value cutoff was set to 1E-5, and the defaults were used for all other parameters.

We collected all interaction proteins between coronaviruses and humans up to 1 May 2020. We then used the R package 'clusterProfiler' to conduct GO and KEGG annotation of the proteins [28], after which we removed the redundant annotation results based on the kappa coefficient (Kappa similarity >0.3). The IPA software was used to obtain interaction information between human proteins. The above information was then integrated and visualized using the Cytoscape software [29].
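Steps (i), (iii) and (v) of the annotation process can be illustrated with a short Python sketch. It is not the CoroAnnoter implementation, which relies on ORFfinder, BLAST, MEME and R scripts. The function names (`find_orfs`, `upstream_window`, `find_motif`) and the toy sequence are hypothetical, only the forward strand is scanned, the 60 bp minimum is assumed to include the stop codon, and the degenerate motif ACGAAY serves only as an example of a core sequence written with IUPAC codes.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_orfs(genome, min_len=60):
    """Return sorted (start, end) pairs, 0-based and half-open, of ORFs on the
    forward strand. An ORF runs from an ATG to the first in-frame stop codon;
    min_len is assumed here to include the stop codon."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(genome) - 2, 3):
            codon = genome[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


def upstream_window(genome, orf_start, size=100):
    """Return up to `size` nucleotides immediately before the ORF start."""
    return genome[max(0, orf_start - size):orf_start]


def find_motif(sequence, motif):
    """Return the 0-based start positions of an IUPAC-coded motif,
    including overlapping occurrences."""
    pattern = "".join(IUPAC[base] for base in motif)
    return [m.start() for m in re.finditer("(?=" + pattern + ")", sequence)]


# Hypothetical toy sequence: a short leader, one ORF of 66 nt, a short tail.
toy = "TTACGAACTT" + "ATG" + "GCT" * 20 + "TAA" + "CCACGAATCC"

for orf_start, orf_end in find_orfs(toy):
    window = upstream_window(toy, orf_start)
    print(orf_start, orf_end, window, find_motif(window, "ACGAAY"))
print(find_motif(toy, "ACGAAY"))
```

In the real workflow the candidate ORFs are additionally filtered by BLAST similarity and the motif is discovered by MEME rather than supplied by hand, so this sketch only shows how ORF coordinates, upstream windows and motif positions relate to each other.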
CoroAnnoter is free software and is available at https://github.com/wuaipinglab/CoroAnnoter.

The unique TRS recognition-based transcription mechanism can be used to guide the development of genome annotation tools for coronaviruses [30, 31]. By combining ORF prediction, TRS recognition and homologous alignment, we developed a semi-automatic and standardized genome annotation tool for coronaviruses named CoroAnnoter (Figure 1B). CoroAnnoter consists of the following steps (Supplementary Figure 1). First, the ORFs of the coronavirus genome sequence are predicted by ORFfinder [32], after which the potential ORFs are filtered based on sequence similarities via BLAST. After filtering, there were still some ORFs that should not be transcribed. To solve this problem, we introduced TRS position recognition to determine exactly which sub-genomes were transcribed. We extracted the 100 nucleotides before the 5′ end of each ORF to predict the conserved motifs using the MEME Suite [33], and then we manually determined the appropriate TRS core sequences (CSs) of 6-8 nucleotides. Finally, we annotated the coronavirus genome structure by integrating ORFs, BLAST similarities and TRS positions. The ORF following each TRS is an individual sub-genomic fragment. If more than one ORF is present at the same TRS position, they are named a, b, etc. (e.g. 3a, 3b, etc.). The CoroAnnoter tool is freely available at https://github.com/wuaipinglab/CoroAnnoter.

One advantage of CoroAnnoter is that it standardizes the naming of identified ORFs based on the TRS information. For example, MERS-CoV and Bat-CoV-HKU4 were found to share a similar genome structure and have the same TRS CS, ACGAAY. Nevertheless, the ORF names of some corresponding segments differ between these two coronaviruses. Four ORFs named 3, 4a, 4b and 5 in MERS-CoV were named 3a, 3b, 3c and 3d in Bat-CoV-HKU4, respectively [34].
However, with CoroAnnoter these four ORFs are consistently named 3, 4a, 4b and 5 in both viruses, because the same TRSs are located before ORF3, ORF4a and ORF5, respectively. Therefore, with CoroAnnoter, the genome structure could be uniformly annotated in 39 species representing all currently known coronaviruses.

Given the importance of the TRS in the unique sub-genomic transcription mechanism of coronaviruses, a conserved TRS CS was identified for each type of coronavirus (Supplementary Table 2). Systematic comparison revealed that the TRS CSs of coronaviruses in the same genus were highly similar. The original TRS CSs of each genus were most likely CTAAAC (alpha), ACGAAC (beta), AACAA (gamma) and ACACCA (delta). Within each genus, the TRS of each species may carry specific mutations relative to the genus TRS (Figure 2). For example, the TRS of the Human-CoV-NL63 strain of the alpha genus is CTMAAC, in which the third base M indicates A or C. In addition, some species-specific TRSs are inconsistent with their genus TRSs. For example, the TRS of the Human-CoV-OC43 strain of the beta genus is TYYAAAC, which is very different from ACGAAC (beta) and more similar to CTAAAC (alpha).

Based on the comprehensive annotation profile of the coronavirus genome, we found large variations in the number of accessory proteins (1-10) among coronaviruses (Figure 2). The number of accessory proteins in alpha-coronaviruses is relatively low, between 1 and 5, while beta-coronaviruses have 3-5 accessory proteins, except for SARS-CoV and SARS-CoV-2, which possess the largest number of accessory proteins among all coronaviruses (10 and 9, respectively). When compared with the evolutionarily similar strain, Bat-CoV-Hp, SARS-CoV and SARS-CoV-2 were found to have more complex compositions of accessory proteins.
Viruses related to SARS-CoV and SARS-CoV-2 from non-human hosts showed similarly complex accessory protein compositions (Supplementary Figure 2). The compositional variations between SARS-CoV and SARS-CoV-2 accessory proteins could also be found among their evolutionarily related viruses. For example, in some viruses protein 3b is split to give a shorter 3c protein, and some viruses have lost 3b altogether, leaving only the 3c protein. SARS-CoV possesses the complete 3b protein, while SARS-CoV-2 contains only the shorter 3c protein. Protein 8 splits into 8a and 8b in some strains. SARS-CoV contains the shorter 8a and 8b proteins, while SARS-CoV-2 has the complete protein 8. There are only two representative coronaviruses in the gamma genus, but they have relatively more accessory proteins (6 and 8, respectively). The number of accessory proteins in delta-coronaviruses is between 2 and 8.

To date, three of the seven human-infecting coronaviruses (SARS-CoV, MERS-CoV and SARS-CoV-2) have been shown to cause severe symptoms (Figure 2). We found that the number of accessory proteins of these three viruses is relatively high, between 5 and 10, while the other four viruses contain between 1 and 4 accessory proteins. The viruses from the alpha genus (229E and NL63) were found to have the lowest numbers of accessory proteins, 2 and 1, respectively. When compared with other viruses of the alpha genus, these viruses lack the accessory proteins behind the N protein. The other two human-infecting coronaviruses, HKU1 and OC43, both contain the HE protein and have 3 or 4 accessory proteins, respectively.

The distribution pattern of coronavirus accessory proteins was found to have intra-genus conservation and inter-genus diversity (Figures 2 and 3).
Based on the distribution characteristics of coronavirus accessory proteins, we can divide the structural compositions of the accessory proteins into eight types, namely, Alpha, Beta-Lineage-A, Beta-Lineage-B, Beta-Lineage-C, Beta-Lineage-D, Gamma-Lineage-A, Gamma-Lineage-B and Delta (Figure 3A). All alpha-coronaviruses belong to the conserved Alpha type, in which 1-2 accessory proteins after the S protein and multiple accessory proteins after the N protein were observed (Figure 3B). However, beta-coronaviruses are highly diverse and consist of four compositional types of accessory proteins, lineages A, B, C and D. Beta-Lineage-A strains have the 2a and HE proteins before the S protein, as well as 1-2 accessory proteins behind the S protein. Beta-Lineage-B strains, including SARS-CoV and SARS-CoV-2, possess multiple accessory proteins located behind S and M, respectively. Members of Beta-Lineage-C, which include MERS-CoV, possess four accessory proteins between S and E. Accessory proteins in Beta-Lineage-D have a distribution similar to that of the Alpha type (Figure 3C). The two representative gamma-coronaviruses were found to have completely different accessory protein compositions. Gamma-Lineage-A contains all of its accessory proteins between the M and N proteins, while the distribution of Gamma-Lineage-B accessory proteins is similar to that of Beta-Lineage-B proteins, with accessory proteins located behind the S and M proteins, respectively (Figure 3D). In the Delta type, there is one accessory protein behind the M protein, as well as multiple accessory proteins behind the N protein (Figure 3E). Interestingly, although the coronavirus accessory proteins have different compositions, the E and M proteins are always linked together with no accessory protein between them.

We hypothesized that accessory proteins located at the same genomic position could have a close phylogenetic relationship and share similar sequences.
To test this hypothesis, we merged the coding sequences (CDSs) of all of the accessory proteins of a coronavirus genome to construct a virtual protein sequence, and then we analyzed the similarities among all of the virtual protein sequences by multiple sequence alignment. We found that sequence similarities of virtual accessory proteins had significant intra-genus conservation and inter-genus diversity (Figure 4A), which was consistent with the distribution pattern among accessory protein sequences. Furthermore, we calculated the sequence similarities for all of the encoded accessory proteins. The results showed that only a small portion of the accessory proteins had similarity scores ≥40%, while most had scores below 40% (Supplementary Figure 3A). Accessory protein 3 in alpha-coronaviruses had relatively high consistency among different viral species. However, accessory protein 3 in beta-coronaviruses had significant sequence variation, forming three groups of consistent proteins (Supplementary Figure 3B).

By integrating the position distribution and sequence similarity of the accessory proteins, we found that they were relatively conserved before the E-M proteins, while they were more diverse behind the E-M proteins. Therefore, using the E-M proteins as a boundary, the accessory proteins could be divided into two parts: Pre-EM and Post-EM (Figure 4B). The Pre-EM accessory proteins could be distinguished and named according to their distribution pattern and TRS position as follows: Alpha-3, Beta-Lineage-A-4 (Beta-A4), Beta-Lineage-B-3 (Beta-B3), Beta-Lineage-C-3 (Beta-C3), Beta-Lineage-C-4a (Beta-C4a), Beta-Lineage-C-4b (Beta-C4b), Beta-Lineage-C-5 (Beta-C5) and Beta-Lineage-D-3 (Beta-D3). The 3b proteins of SARS-CoV and SARS-CoV-2 were found to have low consistency, which may have been because of truncation of the 3b protein of SARS-CoV-2 [22].
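The similarity scores used in this comparison come from global pairwise alignment (Needleman-Wunsch). The following Python sketch shows the algorithm itself under simplifying assumptions: it uses a plain match/mismatch score and a linear gap penalty instead of the BLOSUM62 matrix and the Biostrings defaults used in the study, and it reports percent identity, which is only one possible way of turning an alignment into a percentage. The function names and the example strings are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty.
    Returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical, non-gap columns as a percentage of the alignment length."""
    if not aligned_a:
        return 0.0
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


# Hypothetical example strings, not real accessory protein sequences.
score, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(score, top, bottom, round(percent_identity(top, bottom), 1))
```

A substitution matrix such as BLOSUM62 would replace the fixed match and mismatch values with a residue-pair lookup; the dynamic programming recurrence and the traceback stay the same.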
The accessory proteins of the two strains in Beta-A4 are more similar to those of Beta-C3 (Supplementary Figure 4A), while the accessory proteins of the two strains of Beta-D3 are more similar to those of Beta-C5. This implies that the Lineage C branch in the beta genus may be derived from Lineage A and Lineage D (Supplementary Figure 4B). In contrast to the pre-EM proteins, post-EM accessory proteins present high diversity, with only the Delta-5 and Delta-7 groups having sequence similarities ≥40%. The distribution and sequence characteristics of coronavirus accessory proteins suggest that there may be functional similarity within the genus.

Because the three human-infecting coronaviruses that cause severe symptoms all belong to the beta genus, we further investigated the characteristics of accessory proteins in beta-coronaviruses. The accessory proteins of the beta genus were found to have internal diversity and lineage conservation (Figure 5A). Three of the four lineages included coronaviruses that can infect humans. In Lineage-A, HKU1 and OC43 can cause mild symptoms in humans, and these contain a conserved accessory protein, Beta-A4 (Figure 5B). Studies have shown that OC43 Beta-A4 antagonizes type I IFN and NF-κB signaling [35]. Notably, Lineage-B includes SARS-CoV and SARS-CoV-2. The conserved protein in the pre-EM region of Lineage-B is Beta-B3, and the sequence similarity score of 3a between SARS-CoV and SARS-CoV-2 was 72% (Figure 5C), while it was only 20% for the Bat-CoV-Hp virus. Previous studies revealed that the 3a protein of SARS-CoV activates NF-κB, has pro-inflammatory functions and can promote apoptosis and facilitate release of the virus [36-39]. Overexpression of the 3b protein of SARS-CoV in Vero E6 cells leads to apoptosis and necrosis [40], while 7a causes apoptosis by interfering with Bcl-XL [41], and ORF8a enhances virus replication and induces apoptosis through a mitochondrial-dependent pathway [42].
Finally, the 8b protein negatively regulates viral replication and activates the NLRP3 inflammasome [43, 44]. The similarities of the four accessory proteins before the E protein in Lineage-C were close to or higher than 40% among strains, including MERS-CoV (Figure 5D). Studies have shown that the MERS-CoV 4a protein suppresses the host immune response and antagonizes type I IFN production and NF-κB activity [45] while also inhibiting the formation of stress granules and promoting viral protein translation [46, 47]. The MERS-CoV 4b protein also suppresses the host immune response, preventing IFN-β production and NF-κB signaling [48, 49].

SARS-CoV-2 has been shown to have a compositional structure of accessory proteins similar to that of SARS-CoV [22]. However, the similarities and differences in the functional interactions between the host and the accessory proteins of SARS-CoV-2 and SARS-CoV are still unclear. Here, we collected all public virus-host protein-protein interactions of SARS-CoV-2 (229) and SARS-CoV (33) to construct networks of interactions between coronavirus accessory proteins and host proteins (Figure 6A and Supplementary Table 3). We found that they share only four interacting genes: SMOC1, MARK3, DCTN2 and BAG6. In addition, some human interaction genes of the two viruses were found to interact with each other. These genes are involved in multiple pathways of host resistance to viral infections, including the apoptotic process, viral life cycle and response to oxidative stress. We found that SARS-CoV-2 and SARS-CoV share the BAG6 gene in apoptosis signaling. In addition, they were found to possess several proteins that can interact with each other: BCL2, BCL2L1, ITGB2, PACK1 and HMOX1 (Figure 6B). Furthermore, evaluation of the interaction network of each accessory protein revealed that all of the accessory proteins participate in the host immune response, except for proteins 8a and 10 in SARS-CoV (Supplementary Figures 5 and 6).
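The pathway annotations behind this comparison were filtered for redundancy with a kappa coefficient (Kappa similarity >0.3). How the kappa was computed is not described, so the following Python sketch rests on an assumption: it uses Cohen's kappa on the gene membership of two annotation terms, a common choice for this kind of filtering, and it keeps terms greedily in input order. The function names, term names and gene identifiers are hypothetical and carry no biological meaning.

```python
def kappa_similarity(term_a, term_b, universe):
    """Cohen's kappa between the gene memberships of two annotation terms,
    evaluated over a shared gene universe."""
    universe = set(universe)
    if not universe:
        raise ValueError("the gene universe must not be empty")
    a = set(term_a) & universe
    b = set(term_b) & universe
    n = len(universe)
    both = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    neither = n - both - only_a - only_b
    observed = (both + neither) / n
    # Chance agreement: both terms contain a gene, or both exclude it.
    expected = (len(a) * len(b) + (n - len(a)) * (n - len(b))) / (n * n)
    if expected == 1.0:
        return 1.0  # both terms cover all genes or none, so they agree trivially
    return (observed - expected) / (1.0 - expected)


def drop_redundant(terms, universe, threshold=0.3):
    """Keep terms in input order, skipping any term whose kappa with an
    already kept term exceeds the threshold (assumed greedy strategy)."""
    kept = []
    for name, genes in terms:
        if all(kappa_similarity(genes, kept_genes, universe) <= threshold
               for _, kept_genes in kept):
            kept.append((name, genes))
    return [name for name, _ in kept]


# Hypothetical toy data; the list is assumed to be sorted by significance.
universe = ["g%d" % i for i in range(1, 11)]
terms = [
    ("term_1", {"g1", "g2", "g3", "g4"}),
    ("term_2", {"g1", "g2", "g3", "g5"}),
    ("term_3", {"g8", "g9"}),
]
print(drop_redundant(terms, universe))
```

Because the filter is greedy, the order of the input list decides which of two overlapping terms survives, so sorting by enrichment significance before filtering is a design choice of this sketch rather than something stated in the text.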
The key role of coronavirus accessory proteins in virus-host interactions prompted us to systematically study their compositional diversity and evolutionary pattern. Based on the unique discontinuous transcription of the sub-genomes of coronaviruses, we developed a standardized coronavirus genome annotation tool named CoroAnnoter. Using CoroAnnoter, we can correct the inaccurate naming of accessory proteins in some previous coronavirus annotation results, such as different naming of the same protein in different studies [7, 50]. Furthermore, we constructed a comprehensive profile for coronavirus accessory proteins and standardized the naming of accessory proteins by integrating ORF sequence similarity and TRS positions. The normalized dataset enables subsequent systematic studies of the composition and evolutionary pattern of different coronavirus accessory proteins.

We found that the compositions of coronavirus accessory proteins have significant intra-genus conservation and inter-genus diversity. We divided all of the representative coronaviruses into eight types based on the composition and location of accessory proteins in the viral genome. As expected, these eight types derived from accessory proteins are consistent with the evolutionary relationship of coronaviruses based on conserved proteins, such as pp1b or pp1ab. The consistent classifications suggest that the evolution of conserved proteins and that of the relatively divergent accessory proteins occur together and are inherently associated. We found that the seven human-infecting coronaviruses are distributed among four types in the alpha and beta genera with significantly different compositions of accessory proteins, indicating that the ability to infect humans is not closely associated with the composition of accessory proteins. However, different types of coronaviruses present variable pathogenicity.
Coronaviruses of the Alpha and Beta-Lineage-A types cause mild cold symptoms, while coronaviruses of the Beta-Lineage-B and Beta-Lineage-C types cause potentially serious symptoms, including death. The compositional characteristics of the Alpha and Beta-Lineage-A types indicate that they both have few accessory proteins and simple compositional patterns; however, the accessory proteins of the Beta-Lineage-B and Beta-Lineage-C types are more complex. Therefore, the pathogenicity of coronaviruses may be associated with the composition of accessory proteins.

Traditionally, proteins with the same name and genomic position among associated viral strains are considered to be homologous. However, coronavirus accessory proteins have more complex characteristics. Using the structural proteins E and M as the boundary, we found that the accessory proteins in the pre-EM region have intra-genus conservation and inter-genus diversity, while the accessory proteins in the post-EM region are not conserved within genera. The balance of conservation and variation of accessory proteins may play important roles in their functions and viral adaptation. For example, the inhibition of virus replication by the conserved Alpha-3 protein may be beneficial to the long-term symbiosis of the virus in the host [7]. Additionally, the diverse accessory proteins of beta-coronaviruses have multiple functions, including antagonizing the host immune response and promoting viral protein translation [36, 39, 46, 47].

The interaction patterns between coronaviruses and their hosts are very important for the investigation of the pathogenic mechanism of the virus. Comparison of the virus-host interaction networks of SARS-CoV and SARS-CoV-2 accessory proteins revealed that they share multiple antiviral signaling pathways. This implies that although the two viruses differ in their host-virus interaction proteins, these proteins function through similar pathways.
Considering the large disease variations manifested by SARS-CoV-2 and SARS-CoV infection, the virus-host interaction mechanisms of the coronavirus accessory proteins need to be further investigated.

The divergent composition and evolutionary pattern of accessory proteins raise the question of their origins. Because the gain or loss of accessory proteins strongly affects the viral phenotypes, it is important to identify the sources of these coronavirus accessory proteins [21, 51, 52]. We searched all of the accessory proteins by BLAST and found a small number of similar sequences that may indicate their possible origins (Supplementary Table 4). Proteins 2a and HE from Beta-Lineage-A are homologous with torovirus and influenza virus proteins. The HE protein is known to be a glycoprotein that facilitates virus invasion [53]. However, protein 4b in Beta-Lineage-C is similar to an Escherichia coli protein with unclear function. Indeed, the sources of most accessory proteins are still unknown, which may be because there are still a large number of unknown sequences.

Three serious outbreaks caused by diverse coronaviruses in recent years indicate that more attention should be paid to coronaviruses in nature to which humans may be susceptible. Investigation of the genomic composition and evolutionary pattern of known coronaviruses may help in understanding the molecular characteristics and evolutionary origins of new viruses when they appear. We will further optimize the functions of the CoroAnnoter tool and develop a website providing online genome annotation and comparison services specifically for coronaviruses. It may also play a role in promoting standardized naming of coronavirus accessory proteins.

• CoroAnnoter is a semi-automatic and standardized genome annotation tool for coronavirus proteins that combines open reading frame prediction, transcription regulatory sequence recognition and homologous alignment.
• We generated a comprehensive profile for the composition, homology, function and source of all coronavirus accessory proteins.
• The genomic distributions of accessory proteins have significant intra-genus conservation and inter-genus diversity.
• Evolutionary analysis suggested that the accessory proteins are more conserved in the pre-EM group and significantly more diverse in the post-EM group.
• SARS-CoV-2 and SARS-CoV accessory proteins share multiple antiviral signaling pathways.

1. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China
2. A novel coronavirus from patients with pneumonia in China
3. Statement on the second meeting of the International Health Regulations (2005) Emergency Committee regarding the outbreak of novel coronavirus (2019-nCoV)
4. Identification of a novel coronavirus in patients with severe acute respiratory syndrome
5. A novel coronavirus associated with severe acute respiratory syndrome
6. Middle East respiratory syndrome coronavirus in dromedary camels: an outbreak investigation
7. Transmission and evolution of the Middle East respiratory syndrome coronavirus in Saudi Arabia: a descriptive genomic study
8. Incidence and mortality rate of Middle East respiratory syndrome-corona virus (MERS-Cov), threatens and opportunities
9. Clinical course and outcomes of critically ill patients with Middle East respiratory syndrome coronavirus infection
10. Cross-country comparison of case fatality rates of COVID-19/SARS-COV-2
11. The SARS-CoV-2 outbreak: what we know
12. Early transmissibility assessment of a novel coronavirus in Wuhan, China, Available at SSRN 2020
13. Estimation of the transmission risk of the 2019-nCoV and its implication for public health interventions
14. Virus taxonomy: classification and nomenclature of viruses
15. High fidelity of murine hepatitis virus replication is decreased in nsp14 exoribonuclease mutants
16. Discovery of seven novel mammalian and avian coronaviruses in the genus deltacoronavirus supports bat coronaviruses as the gene source of alphacoronavirus and betacoronavirus and avian coronaviruses as the gene source of gammacoronavirus and deltacoronavirus
17. Infectious RNA transcribed in vitro from a cDNA copy of the human coronavirus genome cloned in vaccinia virus
18. Human respiratory coronavirus OC43: genetic stability and neuroinvasion
19. Identification of a new human coronavirus
20. Characterization and complete genome sequence of a novel coronavirus, coronavirus HKU1, from patients with pneumonia
21. Molecular evolution of human coronavirus genomes
22. Genome composition and divergence of the novel coronavirus (2019-nCoV) originating in China
23. Sequence motifs involved in the regulation of discontinuous coronavirus subgenomic RNA synthesis
24. Accessory proteins of SARS-CoV and other coronaviruses
25. MEME SUITE: tools for motif discovery and searching
26. Biostrings: string objects representing biological sequences, and matching algorithms. R package version 2
27. Circoletto: visualizing sequence similarity with Circos
28. clusterProfiler: an R package for comparing biological themes among gene clusters
29. Cytoscape: a software environment for integrated models of biomolecular interaction networks
30. The molecular biology of coronaviruses
31. A new model for coronavirus transcription. Coronaviruses and Arteriviruses
32. ORF-FINDER: a vector for high-throughput gene identification
33. Fitting a mixture model by expectation maximization to discover motifs in biopolymers
34. Genomic characterization of a newly discovered coronavirus associated with acute respiratory distress syndrome in humans
35. Effect of human coronavirus OC43 structural and accessory proteins on the transcriptional activation of antiviral response elements
36. The 3a protein of SARS-coronavirus induces apoptosis in Vero E6 cells
37. The open reading frame 3a protein of severe acute respiratory syndrome-associated coronavirus promotes membrane rearrangement and cell death
38. Severe acute respiratory syndrome coronavirus ORF3a protein activates the NLRP3 inflammasome by promoting TRAF3-dependent ubiquitination of ASC
39. Severe acute respiratory syndrome-associated coronavirus 3a protein forms an ion channel and modulates virus release
40. Over-expression of severe acute respiratory syndrome coronavirus 3b protein induces both apoptosis and necrosis in Vero E6 cells
41. Induction of apoptosis by the severe acute respiratory syndrome coronavirus 7a protein is dependent on its interaction with the Bcl-XL protein
42. Open reading frame 8a of the human severe acute respiratory syndrome coronavirus not only promotes viral replication but also induces apoptosis
43. SARS coronavirus 8b reduces viral replication by down-regulating E via an ubiquitin-independent proteasome pathway
44. SARS-coronavirus open reading frame-8b triggers intracellular stress pathways and activates NLRP3 inflammasomes
45. Middle East respiratory syndrome coronavirus accessory protein 4a is a type I interferon antagonist
46. Middle East respiratory coronavirus accessory protein 4a inhibits PKR-mediated antiviral stress responses
47. Inhibition of stress granule formation by Middle East respiratory syndrome coronavirus 4a accessory protein facilitates viral translation, leading to efficient virus replication
48. MERS-CoV 4b protein interferes with the NF-κB-dependent innate immune response during infection
49. Middle East respiratory syndrome coronavirus ORF4b protein inhibits type I interferon production through both cytoplasmic and nuclear targets
50. Comparative analysis of twelve genomes of three novel group 2c and group 2d coronaviruses reveals unique group and subgroup features
51. Structural analysis of the evolutionary origins of influenza virus hemagglutinin and other viral lectins
52. Identification and characterization of a novel alpaca respiratory coronavirus most closely related to the human coronavirus 229E
53. Structure of coronavirus hemagglutinin-esterase offers insight into corona and influenza virus evolution

We thank Peng Chao and Qiming Liang from Shanghai Jiaotong University for providing human protein information that interacts with coronavirus accessory proteins.

No conflicts of interest.

Supplementary data are available online at https://academic.oup.com/bib.