key: cord-0707052-6t2vmkcr authors: Li, Fei-Feng; Zhang, Qiong; Wang, Gui-Yu; Liu, Shu-Lin title: Comparative analysis of SARS-CoV-2 and its receptor ACE2 with evolutionarily related coronaviruses date: 2020-11-07 journal: Aging (Albany NY) DOI: 10.18632/aging.104024 sha: 67a6e98389129dd659ccc21220ade1a614346664 doc_id: 707052 cord_uid: 6t2vmkcr

The COVID-19 pandemic is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which is spreading rapidly worldwide. To date, the origin and intermediate hosts of SARS-CoV-2 remain unclear. In this study, we conducted a comparative analysis of SARS-CoV-2 and non-SARS-CoV-2 coronavirus strains to elucidate their phylogenetic relationships. We found that (1) the SARS-CoV-2 strains analyzed could be divided into 3 clades with regional aggregation; (2) the non-SARS-CoV-2 common coronaviruses that infect humans or other organisms, causing respiratory syndrome and epizootic catarrhal gastroenteritis, could also be divided into 3 clades; (3) the hosts of the common coronaviruses closest to SARS-CoV-2 were Apodemus chevrieri (a rodent), Delphinapterus leucas (beluga whale), Hypsugo savii (bat), Camelus bactrianus (camel) and Mustela vison (mink); and (4) the gene sequences of the receptor ACE2 from different hosts could also be divided into 3 clades. The ACE2 gene sequences closest to that of humans in evolution include those from Nannospalax galili (Upper Galilee mountains blind mole rat), Phyllostomus discolor (pale spear-nosed bat), Mus musculus (house mouse), Delphinapterus leucas (beluga whale), and Catharus ustulatus (Swainson's thrush). We conclude that SARS-CoV-2 may have evolved from a distant common ancestor shared with the common coronaviruses rather than being a branch of any of them, implying that SARS-CoV-2, the agent of the COVID-19 pandemic, may have existed for a long time in a primary host that is yet to be identified.
SARS-CoV-2 is a member of the Coronavirus family, Betacoronavirus genus and Sarbecovirus subgenus, with a 30 kb genome [5, 6]. Currently, the bat coronavirus RaTG13 (GenBank No.: MN996532) is the most closely related to SARS-CoV-2 in whole-genome comparisons [7, 8], and pangolin, mink, snake and turtle have been proposed as intermediate hosts of this virus [1, 9, 10]. However, to date the origin and the intermediate hosts of SARS-CoV-2 remain unclear.

Here, we analyzed the complete genome sequences of 200 SARS-CoV-2 strains, including 176 from the United States (USA), 17 from China (CHN), 2 from Spain (ESP), 2 from Hungary (HUN), 1 from Peru (PER), 1 from Colombia (COL) and 1 from Pakistan (PAK), using the MEGA-X software [11]. As shown in Figure 1, the SARS-CoV-2 strains could be grouped into 3 clades, CI, CII and CIII. The viral genomes showed regional aggregation. The SARS-CoV-2 strains from China

Figure 2 note: 1 "Fu or Ne", the SARS-CoV-2 strains in clades CI, CII and CIII, respectively, that were furthest (Fu) or nearest (Ne) from the root of the evolutionary tree; 2 "Near with Fu or Ne", the common coronaviruses infecting humans that were nearest to the "Fu or Ne" strains.

To elucidate the relationships between SARS-CoV-2 and the common coronaviruses that also infect humans, we chose the genome sequences of six SARS-CoV-2 strains, namely the strains furthest and nearest from the root of the evolutionary tree in each clade: MT263395 (furthest) and MT263421 (nearest) in CI; MT251973 (furthest) and MT263420 (nearest) in CII; and MT259229 (furthest) and MT263389 (nearest) in CIII. We then combined the six SARS-CoV-2 strains with 293 common coronavirus strains that infect humans in a comparative sequence analysis. As shown in Figure 2, the 293 common coronaviruses that infect humans were divided into 3 clades, and 12 of them were particularly close to the SARS-CoV-2 strains in evolution (Figure 2 and Table 1).
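The choice of one "furthest" and one "nearest" strain per clade can be illustrated with a minimal sketch. It assumes that "furthest" and "nearest" refer to the summed branch length from the root to each tip; the paper does not state the exact criterion used in MEGA-X. The tree, the tip names (`strain_1` to `strain_8`), the branch lengths and the helper names `root_to_tip` and `pick_representatives` are all hypothetical and serve only as an illustration.

```python
from typing import Dict, List, Tuple

# A node is (name, branch_length, children); a leaf has an empty child list.
Node = Tuple[str, float, List]

# Hypothetical toy tree with three clades; every name and length is made up.
TOY_TREE: Node = ("root", 0.0, [
    ("CI", 0.010, [
        ("strain_1", 0.002, []),
        ("node_a", 0.004, [("strain_2", 0.001, []), ("strain_3", 0.006, [])]),
    ]),
    ("CII", 0.020, [("strain_4", 0.003, []), ("strain_5", 0.009, [])]),
    ("CIII", 0.015, [
        ("strain_6", 0.001, []),
        ("node_b", 0.002, [("strain_7", 0.005, []), ("strain_8", 0.0005, [])]),
    ]),
])


def root_to_tip(node: Node, dist: float = 0.0) -> Dict[str, float]:
    """Return the summed branch length from the root to every leaf below node."""
    name, length, children = node
    total = dist + length
    if not children:
        return {name: total}
    depths: Dict[str, float] = {}
    for child in children:
        depths.update(root_to_tip(child, total))
    return depths


def pick_representatives(tree: Node) -> Dict[str, Tuple[str, str]]:
    """For each child clade of the root, return (nearest tip, furthest tip)."""
    _, root_length, clades = tree
    picks: Dict[str, Tuple[str, str]] = {}
    for clade in clades:
        depths = root_to_tip(clade, root_length)
        nearest = min(depths, key=depths.get)
        furthest = max(depths, key=depths.get)
        picks[clade[0]] = (nearest, furthest)
    return picks


if __name__ == "__main__":
    for clade_name, (nearest, furthest) in pick_representatives(TOY_TREE).items():
        print(clade_name, "nearest:", nearest, "furthest:", furthest)
```

In practice the tree would be exported from the phylogenetics software in a standard format such as Newick and parsed with an established library rather than written by hand; the nested tuples above only keep the sketch self-contained.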
Interestingly, the 12 common coronaviruses caused exclusively respiratory syndrome, and they were identified in 2013, 2014 and 2015 (Table 1).

Figure 3. The evolutionary tree of common coronaviruses that infect other organisms and their phylogenetic comparisons with SARS-CoV-2. These common coronavirus strains could be grouped into 3 clades, with 6 of the coronavirus strains being particularly close to SARS-CoV-2 in evolution. Note: 1 "Fu or Ne", the SARS-CoV-2 strains in clades CI, CII and CIII, respectively, that were furthest (Fu) or nearest (Ne) from the root of the evolutionary tree; 2 "Near with Fu or Ne", the common coronaviruses infecting other organisms that were nearest to the "Fu or Ne" strains.

Figure 4. The ACE2 gene sequences from different hosts could be divided into 3 clades, with those closest to that of humans in evolution being from Nannospalax galili (Upper Galilee mountains blind mole rat), Phyllostomus discolor (pale spear-nosed bat), Mus musculus (house mouse), Delphinapterus leucas (beluga whale), and Catharus ustulatus (Swainson's thrush).

So far, the bat, pangolin, mink, snake and turtle have been assumed to be the intermediate hosts of SARS-CoV-2 [1, 7-10]. Researchers have also found many coronaviruses in other organisms [1, 9, 10]. To identify the intermediate hosts of SARS-CoV-2, we compared the genome sequences of the six SARS-CoV-2 strains with those of 53 common coronaviruses that infect other organisms. As shown in Figure 3, the common coronaviruses were divided into 3 clades, with six of them being particularly close to the SARS-CoV-2 strains in evolution (Figure 3 and Table 2). The diseases caused by the six common coronaviruses were respiratory syndrome and epizootic catarrhal gastroenteritis (Table 2).
The hosts of the common coronaviruses closest to SARS-CoV-2 were Apodemus chevrieri (a rodent), Delphinapterus leucas (beluga whale), Hypsugo savii (bat), Camelus bactrianus (camel) and Mustela vison (mink) (Table 2). Those common coronaviruses were identified in 1998, 2006, 2011 and 2015 (Table 2).

The Angiotensin-Converting Enzyme-2 (ACE2) gene encodes the ACE2 protein, which is the receptor of SARS-coronavirus (SARS-CoV), human respiratory coronavirus NL63 and SARS-CoV-2 [8, 12]. To understand whether different features of ACE2 might be correlated with infection by SARS-CoV, NL63 or SARS-CoV-2 [13-15], we compared the sequences of the ACE2 genes from 29 organisms, including human, chimpanzee, rat, bat, camel, mink, bovine and beluga whale. As shown in Figure 4, the 29 ACE2 gene sequences from different organisms were divided into 3 clades. The ACE2 gene sequence from Nannospalax galili (Upper Galilee mountains blind mole rat, MW008344634) was the closest to that of humans in evolution, followed by the sequences from Phyllostomus discolor (pale spear-nosed bat, NC040911), Mus musculus (house mouse, NC000086), Delphinapterus leucas (beluga whale, NW022098033) and Catharus ustulatus (Swainson's thrush, NC046222).

In summary, in this work we found that (1) the SARS-CoV-2 strains analyzed could be divided into 3 clades with regional aggregation; (2) the common coronaviruses that infect humans or other organisms, causing respiratory syndrome and epizootic catarrhal gastroenteritis, were particularly similar to SARS-CoV-2 and could be divided into 3 clades, with SARS-CoV-2 being clearly separated from the common coronaviruses in evolution; (3) the hosts of the common coronaviruses closest to SARS-CoV-2 were Apodemus chevrieri (a rodent), Delphinapterus leucas (beluga whale), Hypsugo savii (bat), Camelus bactrianus (camel) and Mustela vison (mink); and (4) the gene sequences of the receptor ACE2 from different hosts could be divided into 3 clades.
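As a rough illustration of what "closest to humans" can mean at the sequence level, the sketch below ranks aligned sequences by their p-distance (the proportion of differing sites) to a reference. This is an assumption made for illustration only: the paper reports closeness on a tree built in MEGA-X and does not state the distance model, and tree proximity need not coincide with a simple pairwise ranking. The sequences, the host labels (`host_A`, `host_B`, `host_C`) and the function names are hypothetical and are not real ACE2 data.

```python
from typing import Dict, List, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def rank_by_distance(reference: str, others: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return (name, distance) pairs sorted from closest to most distant."""
    scored = [(name, p_distance(reference, seq)) for name, seq in others.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical, already aligned toy sequences (not real ACE2 data).
REFERENCE = "ATGTCAAGCTCTTCC"
TOY_HOSTS = {
    "host_A": "ATGTCAAGCTCTTCT",
    "host_B": "ATGTCGAGCTATTCT",
    "host_C": "ATG-CAAGCTCTGCC",
}

if __name__ == "__main__":
    for name, dist in rank_by_distance(REFERENCE, TOY_HOSTS):
        print(f"{name}: {dist:.4f}")
```

A real analysis would start from a multiple sequence alignment of the full gene sequences and would typically use a substitution model that corrects for multiple changes at the same site.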
The ACE2 gene sequences closest to that of humans in evolution include those from Nannospalax galili (Upper Galilee mountains blind mole rat), Phyllostomus discolor (pale spear-nosed bat), Mus musculus (house mouse), Delphinapterus leucas (beluga whale), and Catharus ustulatus (Swainson's thrush). Based on these analyses, we conclude that SARS-CoV-2 may have evolved from a relatively distant common ancestor shared with the other coronaviruses rather than being a branch of any of them, implying that SARS-CoV-2, the agent of the COVID-19 pandemic, may have existed for a long time in a primary host that is yet to be identified.

Study concept or design: FFL, SLL; data collection: QZ, GYW; funding: FFL, SLL; drafting/revising of manuscript: all the authors.

1. Special Expert Group for Control of the Epidemic of Novel Coronavirus Pneumonia of the Chinese Preventive Medicine Association
2. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and coronavirus disease-2019 (COVID-19): the epidemic and the challenges
3. The COVID-19 epidemic
4. Cultivation of viruses from a high proportion of patients with colds
5. Genomic variance of the 2019-nCoV coronavirus
6. Evolutionary history, potential intermediate animal host, and cross-species analyses of SARS-CoV-2
7. The First Disease X is Caused by a Highly Transmissible Acute Respiratory Syndrome Coronavirus
8. A pneumonia outbreak associated with a new coronavirus of probable bat origin
9. Effect of throat washings on detection of 2019 novel coronavirus
10. Pangolin homology associated with 2019-nCoV
11. MEGA X: molecular evolutionary genetics analysis across computing platforms
12. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
13. Susceptibility to SARS coronavirus S protein-driven infection correlates with expression of angiotensin converting enzyme 2 and infection can be blocked by soluble receptor
14. Receptor and viral determinants of SARS-coronavirus adaptation to human ACE2
15. Comparative genetic analysis of the novel coronavirus (2019-nCoV/SARS-CoV-2) receptor ACE2 in different populations

The authors have declared that no conflicts of interest exist.