key: cord-0743370-ye570v9g
authors: Praharaj, Manas Ranjan; Garg, Priyanka; Nabi Khan, Raja Ishaq; Sharma, Shailesh; Panigrahi, Manjit; Mishra, B P; Mishra, Bina; kumar, G Sai; Gandham, Ravi Kumar; Singh, Raj Kumar; Majumdar, Subeer; Mohapatra, Trilochan
title: Prediction analysis of SARS-COV-2 entry in Livestock and Wild animals
date: 2020-05-15
journal: bioRxiv
DOI: 10.1101/2020.05.08.084327
sha: 664e32e6731dce63138346480dbfc6df71db3e0c
doc_id: 743370
cord_uid: ye570v9g
The interaction between the spike protein of SARS-CoV-2 and the ACE2 receptor on host cells is a potential factor in the infectivity of a host. The protein and nucleotide sequences of ACE2 were first compared across species to identify key differences among them. The ACE2 receptor of each species was then homology modelled and assessed for its ability to bind the spike receptor-binding domain of SARS-CoV-2. Of the several spike binding properties of ACE2, a significant difference between the known infected and uninfected species was observed for Entropy side chain, Van der Waals, Solvation Polar, Solvation Hydrophobic and Interface Residues. However, these parameters did not consistently categorize the animals of every Order as infected or uninfected. This showed that no single parameter should be used to predict SARS-CoV-2 entry. The logistic regression model constructed included Interaction energy, entropy side chain and entropy main chain for estimating the probability of viral entry in different species. In the mammalian class, most species of Carnivores, Artiodactyls, Perissodactyls, Pholidota, and Primates showed a high probability of viral entry. However, among the primates, baboons had a very low probability of viral entry. Among rodents, hamsters had a high probability of viral entry, whereas rats and mice had a very low probability. Rabbits had a medium probability of viral entry.
In birds, ducks had a very low probability, chickens a medium probability and turkeys the highest probability of viral entry.
Three large-scale disease outbreaks during the past two decades, viz., Severe Acute Respiratory Syndrome (SARS), Middle East Respiratory Syndrome (MERS), and Swine Acute Diarrhea Syndrome (SADS), were caused by three zoonotic coronaviruses. SARS and MERS, which emerged in 2003 and 2012, respectively, spread worldwide and claimed 774 (8,000 SARS cases) and 866 (2,519 MERS cases) human lives, respectively (2020), while SADS devastated livestock production by causing fatal disease in pigs in 2017. The SARS and SADS viruses share several features: both originated from bats in China and both are pathogenic to humans or livestock (Drosten, et al., 2003; Fan, et al., 2019; Zhou, et al., 2018). Seventeen years after the first highly pathogenic human coronavirus, SARS-CoV-2 is devastating the world with 4,560,457 cases and 304,309 deaths (as of May 15, 2020) (2020). This outbreak was first identified in Wuhan City, Hubei Province, China, in December 2019 and notified by WHO on 5 January 2020. The disease has since been named COVID-19 by WHO. Coronaviruses (CoVs) are enveloped, crown-like viral particles belonging to the subfamily Orthocoronavirinae in the family Coronaviridae and the order Nidovirales. They harbor a positive-sense, single-stranded RNA (+ssRNA) genome of 27-32 kb in size. Two large overlapping polyproteins, ORF1a and ORF1b, which are processed into the viral polymerase (RdRp) and other nonstructural proteins involved in RNA synthesis or host response modulation, cover two thirds of the genome. The remaining one third of the genome encodes four structural proteins (spike (S), envelope (E), membrane (M), and nucleocapsid (N)) and other accessory proteins.
The four structural proteins and ORF1a/ORF1b are relatively consistent among the CoVs; however, the number and size of the accessory proteins govern the length of the CoV genome (Fan, et al., 2019). This genome expansion is said to have facilitated the acquisition of genes that encode accessory proteins, which help CoVs adapt to a specific host (Forni, et al., 2017; Subissi, et al., 2014). Next generation sequencing has increased the detection and identification of new CoV species, resulting in expansion of the CoV subfamily. Currently, there are four genera (α-, β-, δ-, and γ-) with thirty-eight unique species in the CoV subfamily (ICTV classification); the three highly pathogenic CoVs, viz., SARS-CoV-1, MERS-CoV and SARS-CoV-2, are β-CoVs (Zaki, et al., 2012). Coronaviruses are notoriously promiscuous. Bats host thousands of these types without succumbing to illness. The CoVs are known to infect mammals and birds, including dogs, chickens, cattle, pigs, cats, pangolins, and bats. These viruses have the potential to leap to new species and, in the process, mutate to adapt to their new host(s). The COVID-19 global crisis likely started with a CoV-infected horseshoe bat in China. As SARS-CoV-2 spreads around the world, it may find entirely new reservoir hosts from which it could re-infect people in the future (2020). Recent reports of COVID-19 in a Pomeranian dog and a German shepherd in Hong Kong (2020), in a domestic cat in Belgium (2020), in five Malayan tigers and three lions at the Bronx Zoo in New York City (2020) and in minks (2020) make it all the more necessary to predict the species most likely to become reservoir hosts in times to come. Angiotensin-converting enzyme 2 (ACE2), an enzyme that physiologically counters RAAS activation, functions as a receptor for both the SARS viruses (SARS-CoV-1 and SARS-CoV-2) (Hoffmann, et al., 2020; Li, et al., 2003; Zhou, et al., 2020).
The ACE2 human RefSeqGene is 48037 bp in length with 18 exons and is located on chromosome X. ACE2 is found attached to the outer surface of cells in the lungs, arteries, heart, kidney, and intestines (Donoghue, et al., 2000; Hamming, et al., 2004). The interaction between SARS viruses and the ACE2 receptor is a potential factor in the infectivity of a cell (Li, et al., 2005; Wrapp, et al., 2020). By comparing ACE2 sequences, several species that might be infected with SARS-CoV-2 have been identified (Qiu, et al., 2020). Recent studies exposing cells or animals to SARS-CoV-2 revealed that humans, horseshoe bats, civets, ferrets, cats and pigs could be infected with the virus, and that mice, dogs, pigs, chickens, and ducks could not be infected or were poorly infected (Shi, et al., 2020; Zhou, et al., 2020). Pigs, chickens, fruit bats, and ferrets are being exposed to SARS-CoV-2 at the Friedrich-Loeffler Institute, and initial results suggest that Egyptian fruit bats and ferrets are susceptible, whereas pigs and chickens are not (2020). In the effort to predict potential hosts, no studies are available that combine ACE2 sequence comparison among species with homology modeling and prediction to define the interaction of ACE2 with the spike protein of SARS-CoV-2. Therefore, the present study was undertaken to predict viral entry in potential hosts through sequence comparison, homology modeling and prediction.
In this study, complete or partial ACE2 protein and nucleotide sequences of 48 mammalian, reptilian and avian species available on NCBI were analyzed (Table 1) to understand the possible difference(s) in the ACE2 sequences that may correlate with SARS-CoV-2 entry into the cell. Within the mammalian class, the Orders Artiodactyla, Perissodactyla, Chiroptera, Rodentia, Carnivora, Lagomorpha, Primates, Pholidota and Proboscidea; within the Reptilian class, the Orders Testudines and Crocodilia; and within the Avian class, the Orders Accipitriformes, Anseriformes and Galliformes were considered in the study.
These orders were chosen to cover the possible reservoir hosts and laboratory animal models that could be infected with SARS-CoV-2. The within- and between-group distances were calculated in MEGA 6.0 (Tamura, et al., 2013). A codon-based Z test of selection (null hypothesis of strict neutrality, dN = dS) was used to evaluate synonymous and non-synonymous substitutions across the ACE2 sequences among the Orders. Phylogenetic analysis of the protein sequences was done using MEGA 6.0 (Tamura, et al., 2013). Initially, the sequences were aligned using Clustal W (Thompson, et al., 1994). The aligned sequences were analyzed for the best nucleotide substitution model on the basis of Bayesian information criterion scores using the JModelTest software v2.1.7 (Darriba, et al., 2012). The tree was constructed by the Neighbor-joining method under the best model obtained, with 1000 bootstrap replicates. The structure of the novel coronavirus spike receptor-binding domain complexed with its receptor ACE2, determined through X-ray diffraction, is available in the PDB database with ID 6LZG (Wang, et al., 2020). The ACE2 model from this PDB entry was used as the template for homology modeling with SWISS-MODEL (Waterhouse, et al., 2018). SWISS-MODEL is a fully automatic homology modeling server for protein structure, which can be accessed through the ExPASy web server. For protein-protein docking, the homology modelled ACE2 structure of each of the 48 hosts was used as the receptor and the spike receptor-binding domain of SARS-CoV-2 (from 6LZG) as the ligand. The GRAMM-X docking server was used for protein-protein docking, which generated a docked complex (Tovchigrechko and Vakser, 2006). Post-docking analysis was carried out using Chimera software (Pettersen, et al., 2004), which is an extensible program for interactive visualization and analysis of molecular structures for use in structural biology.
It provides the user with high quality 3D images, density maps, and trajectories of small molecules and biological macromolecules, such as proteins. The homology modelled structure of each species was compared with the human 6LZG structure to calculate the RMSD (root mean squared deviation). As most of the deviation values could not be calculated with the 6LZG model, the deviation(s) with respect to the different human models 108a and 6M18 (Yan, et al., 2020) were calculated. A significant (P < 0.05) correlation between the deviation values calculated from 6LZG and 6M18 was observed. As most of the values could be calculated as deviations from 6M18, these values were used for further analysis along with the parameters below. For the binding of the modelled ACE2 structure(s) to the spike receptor-binding domain, several parameters (referred to as spike binding properties of ACE2) were estimated using FoldX software (Strokach, et al., 2019): Interaction Energy, Backbone Hydrogen bond, Side chain Hydrogen bond, Van-der-Waals interaction, Electrostatic interaction, Solvation polar, Solvation hydrophobic, Entropy sidechain, Entropy mainchain, torsional clash, backbone clash, helix dipole, disulfide, electrostatic kon, Interface Residues, Interface Residue Clashing and Interface Residues VdW Clashing. To date, clear-cut information on infection with SARS-CoV-2 (infected or uninfected) is available for 17 species (Supplementary table 1). Initially, for each parameter (spike binding property of ACE2), the difference between the infected and uninfected groups was tested using the Mann-Whitney non-parametric test in GraphPad Prism 7.00 (GraphPad Software, La Jolla, California, USA).
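The per-parameter comparison described above can be sketched as follows. This is a minimal illustration and not the GraphPad Prism procedure: it computes the Mann-Whitney U statistic and a two-sided p-value from the normal approximation without tie correction, whereas statistical packages often use exact p-values for small groups. The function name `mann_whitney_u` and the example values are hypothetical and are not data from the study.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction).

    Returns (U, p), where U is the smaller of the two U statistics.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    # U for group x: number of (x_i, y_j) pairs with x_i > y_j; ties count 0.5.
    u1 = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    mean_u = n1 * n2 / 2.0
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p = erfc(abs(z) / sqrt(2.0))  # two-sided tail probability of a standard normal
    return u, p


# Made-up values of one binding parameter for two groups (illustration only).
infected_group = [1.0, 2.0, 3.0, 4.0, 5.0]
uninfected_group = [6.0, 7.0, 8.0, 9.0, 10.0]
u_stat, p_value = mann_whitney_u(infected_group, uninfected_group)
print(f"U = {u_stat}, p = {p_value:.4f}")
```

With groups as small as those in the study, an exact test would be the safer choice; the approximation is used here only to keep the sketch short.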
For those parameters that were significant, the difference between each Order and the infected/uninfected groups was tested using the Mann-Whitney non-parametric test (Note: a species included in the infected/uninfected group was not included in its Order when that Order was compared with the infected/uninfected group; see Supplementary table 2 for more information). Later, a logistic regression model was constructed on all 18 parameters (17 from FoldX and the RMSD with respect to 6M18) estimated above. With 18 parameters, the minimum sample size required to derive statistics that represent each parameter is 1000 (Bujang, et al., 2018) (n = 100 + 50i, where i is the number of parameters; here n = 100 + (50 × 18) = 1000, with a minimum of 50 events per parameter). The data therefore needed to be extrapolated to at least 1000 records. This required us to assume that the ACE2 structure and sequence are conserved within a species. For Homo sapiens, we compared several ACE2 sequences and found that all the compared sequences were identical. On the assumption that the spike binding properties of ACE2 are conserved within a species, and given the pandemic nature of the disease, the data were extrapolated. All the parameters were included in a logistic regression (glm) to construct the best model (based on R²) for prediction. The goodness of fit was tested with the Hosmer and Lemeshow goodness of fit test. The reduction in null deviance was tested with the Chi-square test.
Recognition of the receptor is an important determinant of the host range and cross-species infection of viruses (Li, 2013). It has been established that ACE2 is the cellular receptor of SARS-CoV-2 (Zhou, et al., 2020).
This study aims to predict viral entry in hosts that can be reservoir hosts (Artiodactyla, Perissodactyla, Chiroptera, Carnivora, Lagomorpha, Primates, Pholidota, Proboscidea, Testudines, Crocodilia, Accipitriformes and Galliformes) and in hosts that can be appropriate small animal laboratory models (Rodentia) of SARS-CoV-2, through sequence comparison, homology modeling of ACE2 and prediction. The protein and DNA sequence lengths of ACE2 varied among hosts (Table 1). Among the sequences compared, the longest CDS was found in the Order Chiroptera (Myotis brandtii, 811 aa) and the shortest in the Order Proboscidea (Loxodonta africana, 800 aa). Homo sapiens ACE2 was taken as the standard against which all sequences were compared because of the ongoing COVID-19 pandemic and the availability of its 3D structure, 6LZG (Wang, et al., 2020). The within-group mean distance, a parameter indicative of the variability of nucleotide sequences within a group, was lowest in Perissodactyla followed by Primates and highest in Galliformes followed by Chiroptera (Table 2). This suggests that, within the primates, all the species considered are about as prone to SARS-CoV-2 infection as humans. Further, to estimate the probability of SARS-CoV-2 entry into species of other Orders, the distance of each Order from Primates was assessed (Table 3). This distance was lowest for Perissodactyls followed by Carnivores and highest for Galliformes followed by Anseriformes. This agrees with the recent reports of chickens (Galliformes) and ducks (Anseriformes) not being infected with SARS-CoV-2 (Shi, et al., 2020), and of tigers and lions being infected (2020). To decide on a cut-off distance that could establish whether a species can be infected or not, the individual distance of each species from Homo sapiens was evaluated (Supplementary Table 3).
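To make the idea of a per-species distance from the human sequence concrete, here is a minimal sketch. It uses the simple p-distance (the proportion of differing sites) with pairwise deletion of gapped sites as a stand-in, because the text does not state which distance model was selected in MEGA. The function names and the short toy alignment are hypothetical and are not ACE2 data.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distances_from_reference(alignment, reference):
    """Distance of every sequence in the alignment from the named reference."""
    ref_seq = alignment[reference]
    return {
        name: p_distance(ref_seq, seq)
        for name, seq in alignment.items()
        if name != reference
    }


# Made-up aligned nucleotide fragments (illustration only).
toy_alignment = {
    "reference": "ATGTCAAGCTCT",
    "species_a": "ATGTCAGGCTCT",
    "species_b": "ATGTTAGGC-CC",
}
toy_distances = distances_from_reference(toy_alignment, "reference")
for name, dist in sorted(toy_distances.items(), key=lambda item: item[1]):
    print(f"{name}: {dist:.3f}")
```

Model-based distances correct for multiple substitutions at a site and will generally be larger than the p-distance for divergent sequences, so values from this sketch are not comparable with the tables cited in the text.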
Meleagris gallopavo (turkey) was the species with the greatest distance from Homo sapiens. Recently, it was reported that SARS-CoV-2 does not infect pigs, chickens, ducks (Shi, et al., 2020) and rats (Zhang, et al., 2020). Among the species already established to be uninfected with SARS-CoV-2, the minimum distance from Homo sapiens is 0.187, for Rattus norvegicus (rat). Taking this distance as a cut-off would classify all the carnivores, the perissodactyls and a few artiodactyls (goat, buffalo, bison and sheep) as infected, and would exclude cattle (Artiodactyla), all the bats (Chiroptera) and the birds (Galliformes, Anseriformes and Accipitriformes). Similar distance values were observed on evaluating the protein sequences as well (Table 2) [...] Galliformes, followed by Accipitriformes, Testudines, Crocodilia and Chiroptera. The protein sequence alignment at 30-41 aa, 82-84 aa and 353-357 aa also showed similar sequence conservation and variability (Figure 2). The codon-based test of neutrality was done to understand the selection pressure on the ACE2 sequence in the course of evolution. The analysis showed significant negative selection between and within Orders for the ACE2 sequence, indicating that, although there is variation at the nucleotide level, synonymous substitutions dominate over non-synonymous substitutions. This negative selection indicates that the structure of ACE2 is being conserved through evolution. The aligned protein sequences were then used to find the best substitution model for phylogenetic analysis. The best model on the basis of BIC was found to be JTT + G. The phylogenetic analysis clearly classified the sequences of the species into their Orders. All the sequences were clearly grouped into two clusters.
The first cluster represented the Mammalian class, and the second cluster comprised two sub-clusters, of the Avian and Reptilian classes, with high bootstrap values (Figure 3). Within the mammalian cluster, the artiodactyls were sub-clustered farthest from the primates, and the rodents, lagomorphs and carnivores were clustered close to the primates with reliable bootstrap values. This only partially agrees with the reported infections: SARS-CoV-2 infection occurs in carnivores (Shi, et al., 2020), but rats, which also clustered close to the primates, were found uninfected with SARS-CoV-2 (Zhang, et al., 2020). The Chiroptera sub-cluster had a sub-node comprising the horseshoe bat (Rhinolophus ferrumequinum) and the fruit bats [...] Dec 2019 was traced back to have a probable origin from the horseshoe bat (Zhou, et al., 2020). The virus strain RaTG13 isolated from this bat was found to have 96.2% sequence similarity with the human SARS-CoV-2. This suggests that the virus could probably enter the fruit bat as well, since it clustered with the horseshoe bat at a common sub-node. These results again leave us with no concrete conclusions on viral entry in various hosts. Therefore, to assess the probability of viral entry in various species, homology modeling of ACE2 and its interaction with the coronavirus spike receptor-binding domain were analyzed for all the 48 hosts. Homology modeling was done for all the ACE2 sequences based on the X-ray diffraction structure defined in 6LZG (PDB database). The models constructed were then studied for their interaction with the spike receptor-binding domain defined in the same entry. The modelled interaction of human ACE2 showed four hydrogen bonds between ACE2 and the spike receptor-binding domain. The hydrogen bonds between ACE2 and the spike receptor-binding domain varied among species (Fig 4). In FoldX, several parameters (spike binding properties of ACE2) were estimated for the binding of ACE2 with the spike receptor-binding domain.
A logistic regression model was constructed on 17 species (known infected or uninfected) using these parameters. When each parameter was considered individually, a significant difference between the infected and uninfected groups was observed for Entropy side chain, Van der Waals, Solvation Polar, Solvation Hydrophobic and Interface Residues (Supplementary Table 4). Each Order was then tested as a group for its possibility of infection by comparing it with the infected and uninfected groups for all these significant parameters (Figure 5, Figure 6 & Figure 7). For the parameters solvation hydrophobic and entropy side chain, artiodactyls were significantly different (P < 0.05) from the uninfected group and not significantly different from the infected group (Figure 5). This indicates that the artiodactyls considered in the study can be infected. The testudines were significantly different from the infected group and not significantly different from the uninfected group for all the parameters (Figure 6). This suggests that the species considered under testudines may not be infected. However, the Order Chiroptera was not significantly different from either the infected or the uninfected group (Figure 7) for any of the five parameters, leaving no clue about the probability of infection in this group. This suggests that a single parameter at a time, as has been considered in recent reports (Qiu, et al., 2020), should not be used to estimate the probability of virus entry. Therefore, all the estimated parameters were considered in logistic regression to find the independent variables that best explain the entry of SARS-CoV-2. After evaluating several models, we selected a model with Interaction energy, entropy side chain and entropy main chain as independent variables, with an R² of 0.807.
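The form of a three-predictor logistic model like the one described above can be sketched as follows. This is not the authors' glm fit: it is a minimal batch gradient-ascent fit written with the standard library only, run for a fixed number of iterations. The function names, the learning rate, the iteration count and the toy values are all assumptions made for illustration, and the fitted coefficients have no relation to those of the study.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    """P(entry = 1 | x) for weights w, where w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit intercept and weights by batch gradient ascent on the log-likelihood."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(w, xi)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        # Step along the average gradient of the log-likelihood.
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


# Made-up, pre-scaled toy values (illustration only). Columns:
# interaction energy, entropy side chain, entropy main chain.
X = [
    [-1.2, 0.8, 0.5], [-0.9, 1.1, 0.3], [-1.0, 0.6, 0.9], [-0.7, 0.9, 0.4],
    [0.8, -0.7, -0.2], [1.1, -0.9, -0.6], [0.6, -1.2, -0.4], [0.9, -0.5, -0.8],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = known infected, 0 = known uninfected

weights = fit_logistic(X, y)
probabilities = [predict_proba(weights, x) for x in X]
for label, prob in zip(y, probabilities):
    print(f"observed = {label}, predicted probability = {prob:.3f}")
```

In practice a dedicated routine such as R's glm, which the text names, would also report the deviance and standard errors needed for the goodness-of-fit tests; this sketch shows only how predicted probabilities arise from the three predictors.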
The Hosmer and Lemeshow goodness of fit test showed no significant difference between the model and the observed data (p > 0.05), indicating that the model is a good fit. There was also a statistically significant reduction in null deviance on inclusion of these three parameters (Supplementary Table 5). The predicted probabilities are given in Table 4. Within the Order Artiodactyla, all species except Sus scrofa (pig) had a 99% probability of viral (SARS-CoV-2) entry using ACE2 as a receptor. It has been predicted that Bos indicus (Indian cattle) and Bos taurus (exotic cattle) can act as intermediate hosts of SARS-CoV-2 (Luan, et al., 2020) and that pigs are not susceptible (Shi, et al., 2020). Camels, which are reported to be infected with SARS-CoV (Gong and Bao, 2018), appear equally capable of SARS-CoV-2 infection. Among the rodents, hamsters had the highest probability of viral entry (Zhang, et al., 2020). SARS-CoV-2 has been shown to infect hamsters effectively (Lau, et al., 2020), whereas rats and mice were found less likely to be infected (Zhang, et al., 2020). All the Carnivores in the study except Lontra canadensis (otter) had a high probability of viral entry. Reports of SARS-CoV-2 infection in cats (Shi, et al., 2020), tigers and lions (2020) support the estimates obtained in the study. Rabbits had a medium probability of viral entry, in line with the recent evidence of SARS-CoV-2 replication in rabbit cell lines (Chu, et al., 2020). In bats, the probability of viral entry was high in the family Vespertilionidae. Rhinolophus ferrumequinum (horseshoe bat) and Phyllostomus discolor (pale spear-nosed bat) had a lower probability of viral entry. A kidney cell line from the Rhinolophus genus was found to be infected with SARS-CoV but not with SARS-CoV-2 (Chu, et al., 2020). However, the probability of viral entry in chickens and ducks was found to be low.
All the primates except the baboon were predicted to have ~100% probability of viral entry, consistent with the devastating nature of the disease in humans. Among the reptiles, both the testudines and the crocodilia showed a low probability of viral entry. In the class Aves, Anas platyrhynchos (ducks) and Haliaeetus albicilla (eagles) showed the lowest probability, followed by Gallus gallus (chicken). Aquila chrysaetos chrysaetos (Golden Eagle) and Meleagris gallopavo (turkey) showed the highest probability of viral entry. Most of the species considered in this study showed a high probability of viral entry. However, viral entry is not the only factor that determines infection in COVID-19, as viral loads were found to be high in asymptomatic patients (Rabi, et al., 2020; Zou, et al., 2020). The important factors that determine disease or infection in a host are host defense potential, underlying health conditions, host behavior and number of contacts, age, atmospheric temperature, population density, airflow and ventilation, and humidity (Lakshmi Priyadarsini and Suresh, 2020).
... Structure of SARS coronavirus spike receptor-binding domain complexed with receptor. Science 2005;309(5742):1864-1868.
Li, W., et al. Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus. Nature 2003;426(6965): ...
... swine acute diarrhoea syndrome caused by an HKU2-related coronavirus of bat origin. Nature 2018;556(7700):255-258.
Zhou, P., et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 2020;579(7798):270-273.
Representative protein modelled structures showing the interaction between ACE2 of (A) Human, (B) Cat, (C) Donkey, (D) Exotic cattle, (E) Chinese alligator and (F) Greater horseshoe bat, and the spike receptor-binding domain of SARS-CoV-2.
Entropy side chain - Significant difference on comparison of Artiodactyls with the uninfected group and no significant difference from the infected group. (C)
Solvation hydrophobic - Significant difference on comparison of Artiodactyls with the uninfected group and no significant difference from the infected group. (D)
Solvation hydrophobic - No significant difference on comparison of Chiroptera with the infected and uninfected groups. (D)
We are grateful to Director NIAB and Director IVRI for the support. The author has declared no competing interests.