key: cord-1030609-ijsn8d7b
authors: Frank, Hannah K.; Enard, David; Boyd, Scott D.
title: Exceptional diversity and selection pressure on SARS-CoV and SARS-CoV-2 host receptor in bats compared to other mammals
date: 2020-04-20
journal: bioRxiv
DOI: 10.1101/2020.04.20.051656
sha: eee7a0b2b93593f4867f9ec508ecb77747ff9b7b
doc_id: 1030609
cord_uid: ijsn8d7b

Pandemics originating from pathogen transmission between animals and humans highlight the broader need to understand how natural hosts have evolved in response to emerging human pathogens and which groups may be susceptible to infection. Here, we investigate angiotensin-converting enzyme 2 (ACE2), the host protein bound by SARS-CoV and SARS-CoV-2. We find that the ACE2 gene is under strong selection pressure in bats, the group in which the progenitors of SARS-CoV and SARS-CoV-2 are hypothesized to have evolved, particularly in residues that contact SARS-CoV and SARS-CoV-2. We detect positive selection in non-bat mammals in ACE2 but in a smaller proportion of branches than in bats, without enrichment of selection in residues that contact SARS-CoV or SARS-CoV-2. Additionally, we evaluate similarity between humans and other species in residues that contact SARS-CoV or SARS-CoV-2, revealing potential susceptible species but also highlighting the difficulties of predicting spillover events. This work increases our understanding of the relationship between mammals, particularly bats, and coronaviruses, and provides data that can be used in functional studies of how host proteins are bound by SARS-CoV and SARS-CoV-2 strains.

Because studies only examine a small subset of the existing diversity, it is hard to determine whether the selected species are more or less similar to humans than a random sample of animals. Here, we investigate how angiotensin-converting enzyme 2 (ACE2), the host protein bound by SARS-CoV and SARS-CoV-2 [3,14,15], has evolved in bats compared to other mammals.
We analyze sequences drawn from 90 bat species, including 55 sequences generated for this study, an eight-fold increase over prior studies, and 108 other mammal species. Finally, we use [...]

We analyzed a total of 207 ACE2 sequences from 198 species (90 bat species; 108 non-bat species) representing 18 mammalian orders (Table S1). There are 24 amino acid sites on ACE2 that are important for stabilizing the binding of ACE2 with the receptor binding domain of SARS-CoV (22 sites; Table S2) and/or SARS-CoV-2 (21 sites; Table S2) [6,8,10,11,14-16]. Across these 24 sites, which we refer to by their position in the human ACE2, we found a minimum of [...] (Table S2). Of the 22 sites with more than one amino acid, bats were more diverse than other mammals at 13 and were more even at 15. That bats demonstrate a similar diversity in their ACE2 across these loci and greater diversity in some sites than that observed across the rest of mammals suggests they may be particularly diverse in their ACE2, and supports the idea that bats are more diverse than other suspected SARS-CoV and SARS-like CoV hosts [6].

Bats drive the signal of mammalian selection and adaptation to SARS-CoV and SARS-CoV-2

We also conducted a series of selection analyses, each on 5 phylogenetic trees drawn from Upham et al. [17]. Across all mammals, the 20 variable sites in ACE2 that contact SARS-CoV were not more likely to be under positive selection than other residues in the gene (MEME [...] when considering sites under selection at p < 0.1, residues that contact SARS-CoV-2 do indeed appear to be more likely to be under selection than other residues in the gene, likely due to the smaller loss of statistical power (MEME p < 0.1, Fisher's exact test, p < 0.02 for all trees).
Therefore, there is some evidence that the locus is evolving in response to coronaviruses; this is similar to the finding of strong selection in aminopeptidase N (ANPEP) in response to coronaviruses in mammals [18]. However, this pattern is driven by and strengthened in bats; in bats a greater proportion of residues that contact SARS-CoV (MEME p < 0.05, Fisher's exact test, p < 0.03 for all trees; MEME p < 0.1, Fisher's exact test, p < 0.02 for all trees; Table S3) and SARS-CoV-2 (MEME p < 0.05, Fisher's exact test, p < 0.02 for all trees; MEME p < 0.1, Fisher's exact test, p < 0.0004 for all trees; [...]

[...] (Table S2; 5 trees, MEME, p < 0.05), but in bats positions 27, 31, 35 and 354 (Table S2, 5 trees, MEME, p < 0.05) and 30, 38, 329 and 393 (Table S2, 5 trees, MEME, p < 0.1) were additionally under positive selection, while positions 45 (Table S2, 5 trees, MEME, p < 0.05) and 353 (Table S2, 5 trees, MEME, p < 0.1) were under selection in non-bat mammals but not bats.

Using aBSREL we tested two a priori hypotheses, the first that bats are under positive selection in ACE2 and the second that the family Rhinolophidae, the bat family in which the [...]

It is possible that the sequences we generated through target capture and genomic sequence were of poorer sequencing quality than the reference genomes (though the number of residues covered by sequences we generated and publicly available reference sequences was similar; two-tailed t-test, t = -0.49, p = 0.63). When we removed the bat sequences we generated and examined the remaining terminal branches, a greater proportion of bat branches were under selection than non-bat branches, but statistical significance was lost, likely due to reduced power (Table S4). Increased positive selection in bats in ACE2 compared with other mammals is consistent with their status as rich hosts of coronaviruses [5].
Host diversity of bats in a region is associated with higher richness of coronaviruses [5]; the diversity of bat ACE2 is consistent with the idea that a diversity of bats and their ACE2 sequences are coevolving with a diversity of viruses.

Two bat families, Rhinolophidae and Hipposideridae, have been associated with SARS-related betacoronaviruses [5], which use the ACE2 molecule as a viral receptor [9]. Interestingly, while we found evidence that Rhinolophidae are under selection in ACE2, we found widespread selection across bats. Branches in the rhinolophid/hipposiderid clade were not more likely to be under selection than other branches within bats (Fisher's exact test, p > 0.7 for all trees; Table S4) and bat lineages that live outside the predicted range of these viruses (e.g. in the Americas [5]) are also under positive selection. Therefore, there are still aspects of the bat-coronavirus relationship that we do not fully understand. At least one other coronavirus uses ACE2 to gain entry into the host cell, HCoV-NL63, which may have its origin in bats [22]; we found some evidence for increased selection in the residues that contact this virus in bats (MEME p < 0.05, Fisher's exact test, p < 0.07 for all trees; MEME p < 0.1, Fisher's exact test, p < 0.08 for all trees; Table S3), but not in non-bat mammals (MEME p < 0.05, Fisher's exact test, p > 0.6 for all trees; MEME p < 0.1, Fisher's exact test, p > 0.4 for all trees; Table S3). Many ACE2 residues that interact with HCoV-NL63 also interact with one or both of SARS-CoV and SARS-CoV-2 [23,24], which may be driving the evidence of selection in these residues. However, we did find evidence of selection in residues 321 and 326 in both bats and non-bat mammals (Table S2, 5 trees, MEME, p < 0.05), as well as selection in bats in residue 322 (Table S2, 5 trees, MEME, p < 0.05); these three residues contact HCoV-NL63 but not SARS-CoV or SARS-CoV-2.
Our finding of selection in residues that contact HCoV-NL63 but not SARS-CoV or SARS-CoV-2 contrasts with the findings of a smaller dataset of bats mostly from Europe, Asia and Africa [7] and may result from our greater power to detect signal, or from signal originating from bats in regions not previously tested.

[...] Table S1). However, amino acid similarity in these sites across different species often diverged from what we would have predicted from phylogeny alone. Notably, two rodents (Mesocricetus auratus and Peromyscus leucopus) had identical or very similar amino acids to humans in all but 2 sites for each virus, and many artiodactyls (e.g. cows, deer, sheep, goats), cetaceans, cats, and pangolins were as similar or more similar to humans than New World monkeys, both in residues that contact SARS-CoV and in residues that contact SARS-CoV-2. The civet fell in the middle of mammals in its similarity to humans in residues that contact either or both viruses. In general, bats were not very similar to humans at these 24 amino acid sites, some with as many as five changes that would likely reduce virus binding, the most observed across mammals. Additionally, in most bat sequences (56 of 91), at least one of the two salt bridges (Lys31-Glu35; Asp38-Lys353 in humans) within ACE2 would be disrupted by a change from a charged amino acid to an uncharged amino acid or to an amino acid with a clashing charge (Table S1). In Rhinolophidae, only one sequence of the ten examined did not have a change in position 31 or 35 that would result in a clash between two positively charged amino acids. Because of the large overlap in residues that contact SARS-CoV and SARS-CoV-2 (19 residues), species were generally about as similar to humans in residues that contact SARS-CoV as in residues that contact SARS-CoV-2 (Figure 2).
However, bats (two-sided Wilcoxon signed rank test, p < 0.0001) and carnivores (two-sided Wilcoxon signed rank test, p < 0.0004), particularly mustelids including ferrets, were more similar to humans in residues that contact SARS-CoV-2 than in residues that contact SARS-CoV (Figure 2).

Examination of the diversity of ACE2 sequences across mammals and the similarity between distantly related groups at key residues for interaction with SARS-CoV and SARS-CoV-2 allows one to make predictions about potential spillover hosts or other susceptible species. In some cases, similarity of host residues seems to predict infection ability well. Old [...] CoV-2 [29,30]. Pangolins were as similar in their ACE2 residues to humans as cats, lending some support for the idea that a virus that can bind pangolin ACE2 might be able to transmit to humans. Accordingly, it seems prudent to exercise precautions when interacting with species whose ACE2 is similar to that of humans in the contact residues for SARS-CoV and SARS-CoV-2, especially domestic animals such as cats, cows, goats and sheep. Care should also be taken [...]

[...] Table S1. We sought additional data on the diversity of the ACE2 gene across bats using a combination of samples collected in the field in Costa Rica and granted by museums (63 species; summarized in Table S1). For samples collected in the field, bats were captured in mist nets and a wing biopsy sample was collected. Bats were released immediately after sampling.

Sequences are available from GenBank (MT333480-MT333534; Table S1).

All sequences for ACE2 were aligned in Geneious [37]. Sequences were corrected by hand to remove sequences outside the coding region and adjust gaps to be in frame with the coding region using the human mRNA as a guide. Missing sequence, gaps and premature stop codons [...] R [39] (version 3.6.2).
We also calculated how "human-like" a species was across these 24 amino acids, as well as separately for residues contacting SARS-CoV and residues contacting SARS-CoV-2, by giving a score to each amino acid in each position. Residues that were identical or relatively equivalent to humans were given a score of 1; relative equivalency was inferred when amino acids retained similar properties and abilities to participate in hydrogen bonds, van der Waals forces or salt bridges. Residues that would likely be worse at binding were given scores of -1; reduced binding was inferred when amino acid properties were dramatically altered from those of the human amino acid (e.g. replacement of a positively charged amino acid with a negatively charged amino acid in a salt bridge). In general, asparagine and glutamine were considered similar enough not to disrupt binding, as were amino acids with the same charge and amino acids with small hydrophobic side chains (valine, leucine, isoleucine and methionine). Amino acids whose effect was hard to determine were given scores of zero. Exact determinations of the impact can be found in Table S2. The human-like score was calculated as the sum of the per-residue scores divided by the number of amino acids observed across all 24 sites or across all sites that contacted a given virus (since some species had missing data). We predicted N-linked glycosylation of Asn when Asn was found in the motif N-X-S/T, where X is not a proline [40]. Glycosylation was not taken into account when calculating the human-like score.
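As a rough illustration of the two rules described above (the human-like score and the N-X-S/T sequon check), a minimal Python sketch follows. The per-residue score table is a hypothetical stand-in covering only the four salt-bridge positions named in the text; the actual determinations are those in Table S2. The function names are likewise assumptions made for this example, not the authors' code.

```python
import re
from typing import Dict, List, Optional

# ILLUSTRATIVE score table (NOT Table S2). It covers only the four salt-bridge
# positions named in the text (human Lys31, Glu35, Asp38, Lys353).
# Same charge as human scores +1, opposite charge scores -1.
SITE_SCORES: Dict[int, Dict[str, int]] = {
    31: {"K": 1, "R": 1, "E": -1, "D": -1},
    35: {"E": 1, "D": 1, "K": -1, "R": -1},
    38: {"D": 1, "E": 1, "K": -1, "R": -1},
    353: {"K": 1, "R": 1, "E": -1, "D": -1},
}


def human_like_score(residues: Dict[int, Optional[str]]) -> Optional[float]:
    """Sum of per-site scores divided by the number of observed sites.

    `residues` maps a human ACE2 position to a one-letter amino acid, or to
    None (or no entry at all) when the site is missing for that species.
    Residues not listed in the table count as "hard to determine" (score 0).
    """
    total = 0
    observed = 0
    for position, table in SITE_SCORES.items():
        amino_acid = residues.get(position)
        if amino_acid is None:  # missing data: excluded from the denominator
            continue
        total += table.get(amino_acid, 0)
        observed += 1
    return total / observed if observed else None


# N-X-S/T sequon with X != Pro. The lookahead consumes only the Asn, so
# overlapping sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def predicted_n_glyc_sites(protein: str) -> List[int]:
    """Return 1-based positions of Asn residues that sit in an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(protein)]
```

Calling `human_like_score({31: "K", 35: "E", 38: "D", 353: "K"})` on the human residues returns 1.0, and a species with one site missing is scored over the remaining three sites only, which mirrors the missing-data handling described in the text.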
To determine whether interactions with coronaviruses were likely to be driving the evolution of ACE2, we used MEME [19] to infer the residues under selection across the mammal phylogeny, in just bats and in non-bat mammals, and used a Fisher's exact test to determine whether residues that interact with SARS-CoV, SARS-CoV-2 or HCoV-NL63 [23] were more likely to be under selection than other residues in ACE2. Only codons that showed variation (i.e. more than one amino acid across all 198 species) and that were present in humans were considered in the Fisher's exact test. We used a p < 0.05 cutoff for inferring selection at each site via MEME, but some results were sharper when using a p < 0.1 cutoff, likely due to the smaller loss of statistical power (Table S3). [...]

[...] (Table S4). As described in the results, to guard against bias due to potentially lower sequence quality in the sequences we generated, we repeated our Fisher's exact test using only terminal branches and removing sequences we generated; the trend of a larger proportion of bat branches being under selection was maintained but the results lost statistical significance (Table S4).

Similarity of residues was calculated based on the number of residues that were identical or highly similar in binding properties to those found in human ACE2, with penalties for residues that would likely disrupt binding (see methods).
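The site-level enrichment test described in the methods (are virus-contacting residues over-represented among MEME-selected sites?) can be sketched with a hand-rolled Fisher's exact test on a 2x2 table. This is a minimal sketch under assumptions: the source does not state whether the test was one- or two-sided or which software ran it, so a two-sided test is assumed here, and the function names are hypothetical.

```python
from math import comb
from typing import Iterable, Tuple


def enrichment_table(
    variable_sites: Iterable[int],
    contact_sites: Iterable[int],
    selected_sites: Iterable[int],
) -> Tuple[int, int, int, int]:
    """Build the 2x2 table, restricted to variable sites as in the text.

    Returns (contact & selected, contact & not selected,
             other & selected,   other & not selected).
    """
    variable = set(variable_sites)
    contact = set(contact_sites) & variable
    selected = set(selected_sites) & variable
    a = len(contact & selected)
    b = len(contact - selected)
    c = len(selected - contact)
    d = len(variable - contact - selected)
    return a, b, c, d


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one (ASSUMED convention).
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    denominator = comb(n, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(n - row1, col1 - x) / denominator

    p_observed = prob(a)
    lowest = max(0, col1 - (n - row1))
    highest = min(row1, col1)
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)  # tolerance for float ties
    )
    return min(1.0, p_value)
```

With hypothetical site numbers, `fisher_exact_two_sided(*enrichment_table(range(1, 21), {1, 2, 3, 4}, {1, 2, 3, 10}))` tests whether four contact sites are enriched for selection among twenty variable sites. In the paper this comparison is repeated for each of the 5 trees and each MEME cutoff.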
Scores of 1 indicate residues that contact the [...]

Table S1: Data for each sequence on accession number, whether sequence is in selection analyses, preservation of salt bridges, identity of residues contacting SARS-CoV or SARS-CoV-2, combination of residues, and scores for similarity to humans in residues contacting SARS-CoV and SARS-CoV-2

Table S2: Summary data about residues that contact SARS-CoV, SARS-CoV-2 and HCoV-NL63 including diversity metrics, number of trees in which residues are inferred to be under selection, which virus is contacted by each residue and the identity of amino acids that lead to a positive or negative score in terms of similarity to humans.

References

The Proximal Origin of SARS-CoV-2
Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
A pneumonia outbreak associated with a new coronavirus of probable bat origin
Bats as 'special' reservoirs for emerging zoonotic pathogens
Global patterns in coronavirus diversity
Angiotensin-converting enzyme 2 (ACE2) proteins of different bat species confer variable susceptibility to SARS-CoV entry
Evidence for ACE2-Utilizing Coronaviruses (CoVs) Related to Severe Acute Respiratory Syndrome CoV in Bats
Receptor recognition by novel coronavirus from Wuhan: An analysis based on decade-long structural studies of SARS
Functional assessment of cell entry and receptor usage for SARS-CoV-2 and other lineage B betacoronaviruses
Composition and divergence of coronavirus spike proteins and host ACE2 receptors predict potential intermediate hosts of SARS-CoV-2
Bat-to-human: Spike features determining 'host jump' of coronaviruses SARS-CoV, MERS-CoV, and beyond
Receptor and viral determinants of SARS-coronavirus adaptation to human ACE2
Isolation and characterization of viruses related to the SARS coronavirus from animals in Southern China
Structural basis for the recognition of the SARS-CoV-2 by full-length human ACE2
Structure of SARS coronavirus spike receptor-binding domain complexed with receptor
Crystal structure of the 2019-nCoV spike receptor-binding domain bound with the ACE2 receptor
Inferring the mammal tree: Species-level sets of phylogenies for questions in ecology, evolution, and conservation
Viruses are a dominant driver of protein adaptation in mammals
Detecting individual sites subject to episodic diversifying selection
Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor
Less is more: An adaptive branch-site random effects model for efficient detection of episodic diversifying selection
Bats and coronaviruses
Crystal structure of NL63 respiratory coronavirus receptor-binding domain complexed with its human receptor
The S proteins of human coronavirus NL63 and severe acute respiratory syndrome coronavirus bind overlapping regions of ACE2
Angiotensin-converting enzyme 2 protects from lethal avian influenza A H5N1 infections
New World Bats Harbor Diverse Influenza A Viruses
Probable Pangolin Origin of SARS-CoV-2 Associated with the COVID-19 Outbreak
Replication of SARS coronavirus administered into the respiratory tract of African Green, rhesus and cynomolgus monkeys
Susceptibility of ferrets, cats, dogs, and different domestic animals to SARS-coronavirus-2
SARS virus infection of cats and ferrets
A review of studies on animal reservoirs of the SARS coronavirus
ACE2 - angiotensin I converting enzyme 2. Bethesda (MD): National Library of Medicine (US)
An evaluation of transcriptome-based exon capture for frog phylogenomics across multiple scales of divergence (Class: Amphibia
Adaptive seeds tame genomic sequence comparison
BLAT - The BLAST-like alignment tool
The BioMart community portal: An innovative alternative to large, centralized data repositories
Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data
R: A language and environment for statistical computing
HyPhy 2.5 - A Customizable Platform for Evolutionary Hypothesis Testing Using Phylogenies