key: cord-0957244-jtszdioh authors: Sharma, Swarkar; Singh, Inderpal; Haider, Shazia; Malik, Md. Zubbair; Ponnusamy, Kalaiarasan; Rai, Ekta title: ACE2 Homo-dimerization, Human Genomic variants and Interaction of Host Proteins Explain High Population Specific Differences in Outcomes of COVID19 date: 2020-05-19 journal: bioRxiv DOI: 10.1101/2020.04.24.050534 sha: f0787016dfda0afd8348b343fc5f4b345df1797c doc_id: 957244 cord_uid: jtszdioh Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a positive single-stranded RNA virus that causes a highly contagious Corona Virus Disease (COVID19). Entry of SARS-CoV-2 in human cells depends on binding of the viral spike (S) proteins to cellular receptor Angiotensin-converting enzyme 2 (ACE2) and on S-protein priming by host cell serine protease TMPRSS2. Recently, COVID19 has been declared pandemic by World Health Organization (WHO) yet high differences in disease outcomes across countries have been seen. We provide evidences to explain these population-level differences. One of the key factors of entry of the virus in host cells presumably is because of differential interaction of viral proteins with host cell proteins due to different genetic backgrounds. Based on our findings, we conclude that a higher expression of ACE2 is facilitated by natural variations, acting as Expression quantitative trait loci (eQTLs), with different frequencies in different populations. We suggest that high expression of ACE2 results in homo-dimerization, proving disadvantageous for TMPRSS2 mediated cleavage of ACE2; whereas, the monomeric ACE2 has higher preferential binding with SARS-CoV-2 S-Protein vis-a-vis its dimerized counterpart. Further, eQTLs in TMPRSS2 and natural structural variations in the gene may also result in differential outcomes towards priming of viral S-protein, a critical step for entry of the Virus in host cells. In addition, we suggest that several key host genes, like SLC6A19, ADAM17, RPS6, HNRNPA1, SUMO1, NACA, BTF3 and some other proteases as Cathepsins, might have a critical role. To conclude, understanding population specific differences in these genes may help in developing appropriate management strategies for COVID19 with better therapeutic interventions. The recent emergence of corona virus disease (COVID19) caused by SARS-CoV-2 has resulted in >4.1 Million infections and >285 thousands deaths worldwide so far, and the numbers are increasing exponentially [https://covid19.who.int/]. SARS-CoV-2 is reported to be originated in bats and transmitted to humans via unknown intermediate host. However, its origin is still being questioned time and again. With the declaration of SARS-CoV-2 as pandemic by WHO, extensive research worldwide has been carried out. It has been established that Human ACE2 mediates SARS-CoV-2 entry into cells through its Spike (S) Protein, which primarily makes entry to host body through respiratory tract with nasal epithelial cells as potential initial infection site (Sungnak et al., 2020) . ACE2 is a functional receptor on human cells for this newly originated coronavirus (Walls et al., 2020) with a higher affinity than the severe acute respiratory syndrome coronavirus (SARS-CoV) originated in . However, no substantial evidence exists about the higher expression of ACE2 being primarily associated with the degree of infection (Kuster et al., 2020) . Additionally, COVID19 lethality is mostly driven by the extent of underlying lung injury; whereas, a negative correlation has been reported between ACE2 expression and lung injury (Imai et al., 2005) . A recent report also suggests an inhibition of SARS-CoV-2 by human recombinant soluble (hrs) ACE2 (Monteil et al., 2020) , making it an interesting question to explore. High differences in clinical outcomes across countries have been [https://covid19.who.int/] noted which demonstrate that neither all people who are exposed to SARS-CoV-2 develop infection nor all infected patients end up in severe respiratory illness (Guan et al., 2020) , which cannot be explained by immunity alone (Shi et al., 2020) . This leads one to hypothesize about differential genetic susceptibility to COVID19 and virulence of SARS-CoV-2 in different populations (Kaiser, 2020) . Efforts are being made to gain a better understanding of this disease and even propose prophylactic-hypothetical role of Bacillus Calmette-Guérin (BCG) vaccine, a vaccine primarily used against tuberculosis (TB), for reduced morbidity and mortality for COVID19 in human population (Miller et al., 2020) . Due to extensive contemporary research, evidences in favour (Stawiski et al., 2020) as well as against (Cao et al., 2020) the existence of SARS-CoV-2 S-protein bindingresistant ACE2 natural variants, in different populations, have been found recently. In addition, studies highlighting role of eQTLs in ACE2 expression, resulting in potential differential COVID19 fatality (Cao et al., 2020; Chen et al., 2020a) , are pouring in. Where most such studies are targeting only natural variations in ACE2 gene as SARS-CoV-2 differential susceptibility factor, recent evidences suggest that additional host proteins like cellular serine protease TMPRSS2 act as co-factors and are critical for efficient cellular infection by SARS-CoV-2 (Hoffmann et al., 2020) . Further, it is equally important to consider that rare functional variants, with uncertain consequences, may not explain large-scale population level differential clinical outcomes. This highlights the importance of identification of other potential co-factors and underlying mechanisms these genes could be involved in. At the same time, understanding the interactions of these host proteins with SARS-CoV-2 along with ACE2 may explain many of the unanswered questions. Further evaluation of these host genes, and exploring their natural occurring variants , along with their expression patterns, may also shed some light on better conceptual framework of differential susceptibility to COVID19 and the virulence of SARS-CoV-2. In the present study, we have tried to exploit existing literature to understand mechanisms of correlations, of several relevant host genes, including ACE2, which interact specifically with some prominent viral proteins, using in-silico approaches to understand the role of these as factors responsible for population level differences. As SARS-CoV-2 is primarily a respiratory pathogen, the present study was conducted to understand mechanisms of its entry to cells in lungs, and we believe it could be extrapolated to other respiratory tract tissues, to explore potential factors that are responsible for differential outcomes in respiratory illness. To begin with, we started with the most studied host protein ACE2 and tried to understand its interaction with SARS-CoV-2 S-Protein followed by adding other interacting proteins. We further added layers of other methods to have a better understanding of the potential causes and outcomes. Recently submitted experimental structure (6M17.pdb) of ACE2 dimer bound to two B0AT1 (coded by SLC6A19) and two receptor-binding domains (RBD) of S1 of SARS-CoV-2 Spike protein (S-protein), with overall resolution of 2.90 Angstrom and SARS-CoV2 Spike Glycoprotein with RBD domain in Up conformation (6VSB.pdb) were downloaded from the Protein Data Bank for structural visualization and dynamics analysis (Wrapp et al., 2020; Yan and Zhang, 2020) . This structure was corrected for missing amino acids, side chains, missing hydrogen atoms, disulphide bonds etc. It was also optimized for pKa corresponding to physiological pH 7.2 and energy minimized to correct steric clashes using the Protein Preparation Wizard of Schrodinger Maestro (Madhavi Sastry et al., 2013) . Predefined solvation model TIP3P was used and overall neutrality of the system was maintained by addition of Na+ and Cl-counter ions (Mark and Nilsson, 2001) . Physiological salt concentration of 0.15 Molar was generated through addition of NaCl. Periodic boundary condition of 10 Angstrom was set using the System Builder Tool of Desmond software (2006) . Total two MD simulations, each 10 nanosecond long, were conducted. In one of the simulations, six chains i.e. ACE2 dimer with each monomer bound to a B0AT1 and RBD domain was simulated, where the resulting solvated system consisted of 424,847 atoms. For the other simulation, monomeric ACE2 bound to viral RBD consisted of 174,900 atoms. Root mean square fluctuation (RMSF) and principal component analysis were done through R based BIO3D module (Grant et al., 2006) . MMGBSA free energy of binding between proteins was 4 calculated through Prime Software of Schrodinger Suite (Jacobson et al., 2002) . Structural visualizations and images were traced using pyMOL and VMD (Jacobson et al., 2002) (Humphrey et al., 1996) . Genotype-Tissue Expression (GTEx) portal at https://gtexportal.org/ was used to explore various gene expression profiles. By Tissue, Multigene Query as well as transcript browser was used to understand expression of splice variants and exons in Lungs. eQTLs were viewed by GTEx IGV Browser as well as GTEx Locus Browser was used to plot gene specific eQTLs. eQTLs and other gene regulatory information, splice variants, and ESTs were also explored through UCSC genome browser, https://genome.ucsc.edu/ with build GRCh37/hg19. Genetic data for global population groups was explored through the GnomAD portal at TMPRSS2 has been recently characterized as a critical component for cell entry by SARS-CoV-2 (Hoffmann et al., 2020) . To understand its interactions, the protein sequence of surface glycoprotein (YP_009724390.1) of SARS CoV-2 and two transcripts of TMPRSS2 (NP_005647.3 and NP_001128571.1) protein of homo sapiens, were retrieved from the NCBI-Protein database. Pairwise sequence alignment of TMPRSS2 isoforms was carried out by Clustal Omega tool (Sievers et al., 2011) . The protein domain information and transcript variation were retrieved from UniProt (UniProt, 2019), Prosite (Sigrist et al., 2013) , Pfam (El-Gebali et al., 2019) and ENSEMBL , respectively. The homology model of the S-protein and TMPRSS2 were constructed using Swiss model (Waterhouse et al., 2018) , whereas 3D structure of the ACE2 was retrieved from the PDB database (PDB ID: 6M17). These structures were energy minimized by the Chiron energy minimization server (Ramachandran et al., 2011) . The binding site residues of the proteins retrieved from Uniport and literature. The mutant structure of the TMPRSS2 protein was generated using WHATIF server (Chinea et al., 1995) and energy minimized. The effect of the mutant was analysed using HOPE (Venselaar et al., 2010) and I-mutant (Capriotti et al., 2006) . The I-mutant method allows to predict stability of the protein due to mutation. The docking studies for wild and mutant TMPRSS2 with S-protein and ACE2 were carried out using HADDOCK (Dominguez et al., 2003) . We downloaded the SARS-CoV-2 genome (Accession number: MT121215) from the NCBI database. In order to find the Host-Pathogen Interactions (HPIs), the SARS-CoV-2 protein sequences were subjected to Host-Pathogen Interaction Database (HPIDB 3.0) (Ammari et al., 2016; Kumar and Nanduri, 2010) . In addition to other host genes, we added two more proteins (TMPRSS2 and SLC6A19) which were found to have an important role in the mechanism of viral entry (Hoffmann et al., 2020) . The protein-protein interaction (PPI) and transcription factor regulation of human proteins were retrieved from GeneMANIA (Warde-Farley et al., 2010) and literature (Barros et al., 2012; David et al., 2010; Maitland et al., 2011; Tumer et al., 2013; Yu et al., 2010; Zhang et al., 2009) , respectively. The Host Pathogen Interaction Network (HPIN) was visualized using Cytoscape (Shannon et al., 2003) which includes the information collected from HPIDB, PPIs and TFs. Modules were defined as the set of statistics and functionally significant interacting genes (Reichardt and Bornholdt, 2006) which was constructed by using MCODE (Bader and Hogue, 2003) . Further, Hubs were identified using Network analyzer (Assenov et al., 2008) , the plugin of Cytoscape v3.2.1. We also studied the hub proteins' association with diseases, using DisGeNET (Menche et al., 2015) . The topological properties of the network and their functional significance was studied within the formalism of network theory which could predict important organizational and regulatory candidates in the network. We used network theoretical concepts, namely probability of degree distribution, clustering coefficient and average neighborhood connectivity, to characterize the structural and organizational features of the network (Details in supplementary Information). Removal of high degree nodes (hubs) in biological networks may cause lethality to the corresponding organism; the phenomenon has been referred to as the centrality-lethality rule (Jeong et al., 2001) , verified by various genomic investigations. This idea of understanding enabled us to determine the topological features of the network architecture, important functionally related modules and hubs (Malik et al., 2019; Nafis et al., 2016) . The topological properties were calculated after each subsequent removal, using the Network Analyzer plugin in Cytoscape version 3.7.1. The calculated topological properties of knock-out experiment were compared with those of the main network and the change in the properties helped in understanding the role of leading hubs in the network. SARS-CoV-2 uses ACE2 for entry, and its S-protein priming by the serine protease TMPRSS2 is a key factor. Despite the availability of other proteases, like cathepsin B and L, in the host cells, yet only TMPRSS2 activity is essential for pathogenesis and spread (Hoffmann et al., 2020) . Mutagenesis experiments have reported the importance of Arginine and Lysine residues in ACE2, positioned between 697 to 716 as an important recognition site for TMPRSS2 mediated cleavage ( Figure S1a ) of ACE2. Mapping these residues on the recently resolved structure (61M7.pdb) suggests that this region lies in symmetric dimerization interface between ACE2 homodimer; and is also masked from outside by two BoAT1 in an overall 1:2:1 hetero-tetramer of B0AT1 and ACE2 ( Figure S1a ) (Yan and Zhang, 2020) . In a recently reported study, the overexpression of ACE2 has been reported to have an overall protective role in viral infection among patients (Vaduganathan et al., 2020) . Increased density of ACE2 on the cell surface has been observed to protect from lung injury (Imai et al., 2005) ; also observed in many other instances (Kuster et al., 2020) . We propose that an increased expression of ACE2 due to various factors, including naturally occurring variations influencing expression of the gene, may increase collisions and hence result in homo-dimerization of this protein. Binding of B0AT1 to ACE2 is not dependent on ACE2 homodimerization, but its presence along with dimerised ACE2 effectively shields the latter from interaction with TMPRSS2, with a possible prevention or reduction in its efficiency to cleave ACE2, a hypothesis based on visualization of the available experimentally resolved structure 61M7.PDB (Yan and Zhang, 2020) . This observation is significant since the cleaved ACE2 interacts more efficiently with the SARS CoV-2 RBD domain. 7 Additionally, we also explored if the protease domains (PD) of each of the homodimerized ACE2 could independently bind with whole of the Spike Glycoprotein without affecting each other or any steric hindrance (Yan and Zhang, 2020) . In the absence of any experimentally resolved structure available in literature with ACE2 homodimer and complete SARS-CoV2 Spike Glycoproteins binding to these, we aligned the RBD domains of the 6M17.pdb with the prefusion SARS-CoV2 Spike Glycoprotein in the optimal ACE2 binding conformation (Wrapp et al., 2020; Yan and Zhang, 2020) with "up CTD1" and "open S1 subunit" (Song et al., 2018) (Figure 1') . Both Spike Glycoproteins (6VSB.pdb) aligned well with viral RBD of (6M17.pdb). Further these were also observed without any steric clash with the ACE2 or partner Spike Glycoprotein in modelled structure. Both of these protruded radially outward from ACE2 binding site at PDs but inwards, at anchorage sites on virus membrane, with distance of approximately 27 nanometers (nm) (Figure 1') . However, considering the reported inter spike distance, between 13-15 nm (Neuman et al., 2006) and relative positions of Spike proteins on the viral shell, given almost similar sizes of SARS-CoV (Goldsmith et al., 2004; Neuman et al., 2006) and SARS-CoV-2 (Chen et al., 2020b) , these are anticipated to be either parallel or protruding radially outwards from its anchorage site, contrasting to the findings in modelled structure (Figure 1 ). With this finding of inverse orientation of the Spike protein from virus shell in modelled structure, we propose that in case of ACE2 dimerization, only one of the dimerized ACE2 can bind with the Spike protein due to steric hinderances. Thus, the other unbound ACE2 partner may participate in its physiological role and continue to protect from lung injury. To understand the mechanistic effect of dimerization, we conceived and simulated two situations: one, where ACE2 dimer is bound to two B0AT1 and two RBD domains of SARS CoV-2; and second, where only monomeric ACE2 is bound to one viral RBD. From the RMSF analysis of the simulated trajectories, we observed a higher order fluctuation in monomeric ACE2 homo-dimerization interface; while the rest of the ACE2 displayed a similar fluctuation profile, including residues interacting with RBD ( Figure This observation further explains the protective role of overexpressed ACE2, resulting in its dimerization and reduced affinity for the RBD domain of SARS CoV-2 S1 protein. However, we are also not clear whether the presence of B0AT1 has any allosteric effect on this observation, in addition to already observed masking of ACE2 from TMPRSS2 mediated cleavage. Additionally, another metalloprotease ADAM17 has been reported to compete with the TMPRSS2 and cleave ACE2 in a way that only cleavage by TMPRSS2 was reported to drive the SARS CoV entry inside the cell (Heurich et al., 2014) , which is yet to be explored in relation to 8 SARS CoV-2. Recent literature also indicates the potential role of other proteases, like cathepsin B/L that can functionally replace TMPRSS2, which needs to be evaluated extensively. (Sungnak et al., 2020) All these hypotheses warrant further in-silico analyses with better computational resources, as well as experimental study designs; and for these we are open to seek collaborations. Yet, the observations made are of high importance and emphasise on evaluation of expression of the ACE2 and BoAT1 along with TMPRSS2 among patients with varied clinical response to SARS CoV-2 infection, differential outcomes, and correlation with observed mortality. Therefore, a differential expression of these genes among patients displaying varied responses from asymptomatic to acute symptoms is worth an exploration, which may act as biomarkers if proven to predict severity and susceptibility to COVID19. The observations, from our MD analyses (Figures S1'') show that higher expression of ACE2 gene may promote ACE2 homo-dimerization, rendering less binding affinity to SARS CoV-2 RBD as well as masking TMPRSS2 cleavage site (Figure 1 ). Based on evidences appearing in literature of eQTLs and ACE2 higher expression in East Asians (Chen et al., 2020a) as well as reported protective role of ACE2 in lung injury (Imai et al., 2005; Kuster et al., 2020; Vaduganathan et al., 2020) , we explored more about ACE2 expression and its differential genomic backgrounds in different population groups of the world. Tissue Specific evaluation of ACE2 gene in GTEx portal indicated low level expression of the ACE2 in Lungs Genomes Phase 3 dataset and filtered for variants with at least 10% frequency of the alternate allele in global population. Overlapping the variants from both the exercises resulted in shortlisting of 2 SNPs rs1978124 and rs2106809, which were observed with differential frequency distribution in different populations of the world ( Figure S5 ) at both super-population group and sub-population group level. Interestingly, EAS and EUR sub population groups were observed to show relatively uniform frequency distribution within group. However, in AFR, AMR and SAS sub population groups, intra population group differences were observed to be higher, 9 indicating diversity in the gene pool of population groups. HaploReg annotations indicated that these SNPs are in a region with Enhancer histone marks and DNAase activity. The same were observed through multiple regulatory elements tracks in UCSC Genome browser ( Figure S2d and S4 ) including GH0XJ015596 enhancer marked by GeneHancer database. (Fishilevich et al., 2017) Linkage Disequilibrium (LD) values as (r 2 ) with other putative proxy functional variants were also explored and mapped with UCSC genome browser. Interestingly, it was observed that SNPs rs1978124 and rs2106809 have a strong LD block of >100kb with various SNPs in absolute LD but upstream of ACE2 gene across population groups ( Figure S6 ). This indicates a strong enhancer activity from the region that may affect higher expression of ACE2 gene in lungs, which may facilitate ACE2 homo-dimerization (Figure 1) . However, it also requires experimental validation. GTEx portal indicated relatively high levels of TMPRSS2 expression in Lungs (Figure S2a and S7c), differential transcription ( Figure S7c ) but no protein expression in lungs ( Figure S7a) . Evaluation of the Protein atlas portal, source of Figure S7a , indicated that both antibodies used for detection TMPRSS2 (not shown in Ms) were restricted to target near N terminal of the protein (either cytoplasmic domain or proximal extracellular domain near membrane). Evaluation of the GTEx portal for cis-eQTLs for TMPRSS2 gene in lungs, returned a large number of eQTLs but with a peculiar feature (Figure 3a and b) . The eQTLs in lungs were different as compared to other tissues and had following features: found to be concentrated in region of the gene with potential alternate transcripts (Figure a,b and S8) , towards end of the gene in relatively high expressing exons (Figure 4b) and coding for the amino acid sequence that has putative functional role in protein as serine protease domain (Figure 4a ) critical for ACE2 cleavage and SARS CoV-2 S-Protein priming. Variations data was retrieved from 1000 Genomes Phase 3 dataset and filtered for variants with at least 10% frequency of the alternate allele in global population and overlapped with cis-eQTLs data, resulting in 10 SNPs (rs463727, rs55964536, rs4818239, rs734056, rs4290734, rs2276205, rs34783969, rs11702475, rs62217531 and rs383510). Annotation of the SNPs indicated that all the SNPs clustered together and were in strong LD block ( Figure S8 and S9a). HaploReg annotations indicated Enhancer histone marks and DNAase activity in the regions overlapping these SNPs. Amongst these, rs4818239 showed a prominent putative functional role ( Figure S9b ) which requires experimental validation. However, the frequency distribution of the variant rs4818239 in different populations groups of 1000G showed an interesting differential pattern (Figure S9c) . We also explored if there were any alternate functional variations, yet common in populations, by screening TMPRSS2 gene through genomAD browser and filtered for only missense variations. The search returned rs75603675 (NP_001128571.1:p.Gly8Val or G8V) and rs12329760 (NP_001128571.1:p.Val197Met or V197M), also observed with differential frequency distribution in 1000G populations (Figure S9d and e) . The observations indicate TMPRSS2 variants may influence interaction with ACE2 as well as SARS CoV-2 (Figure 1 ) resulting in population specific differential outcomes. We further opted to explore the structure and function of TMPRSS2 protein. Pairwise sequence alignment of two isoforms of TMPRSS2 suggested that Isoform-2 lacked 37 residues at N-terminal compared to Isoform-1, which was the longest transcript and coded for 529 amino acids ( Figure S10 ). Since the human TMPRSS2 protein structure was not available in the PDB database, we generated a computational protein model. SRCR (GO Term: Scavenger Receptor Activity as annotated in UniProt) and introduces an amino acid with different properties, which could disturb this domain and abolish its function. Analyses of another variation rs75603675 (p.Gly8Val or G8V) showed that wild-type residue, glycine, providing flexibility might be necessary for the protein function and an alteration in this position could abolish this function, as the observed torsion angles for this residue were unusual. It could be speculated that only glycine was flexible enough to make these torsion angles, and a change at the location into another residue would force the local backbone into an incorrect conformation, disturbing the local structure. The variant residue was also observed to be more hydrophobic than the wild-type residue. Since sequence similarity search did not find any significant template at the N-terminal of this protein, we could not generate a quality model for isoform-2, which contains G8V variation; hence we carried out sequence-based stability analysis. The results suggested that G8V variation could increase the stability of the TMPRSS2 protein with ∆∆G value of -0.10 kcal/mol. Further, it is known that Arginine and lysine residues within amino acids 697 to 716 are essential for efficient ACE2 cleavage by TMPRSS2 (Heurich et al., 2014) and recent studies have shown that SARS-CoV-2 uses the ACE2 for entry and the serine protease TMPRSS2 for S-protein priming. Based on information from previous studies, we docked the TMPRSS2 p.Val197Met wild-type and variant protein with ACE2 and S-protein of SARS-CoV-2. The docking results suggest that the variant V197M protein could promote the binding to ACE2 and inhibit the binding with S-protein ( Figure S11) . However, these observations need critical revaluation as well as experimental work to understand these interactions better. One of the interesting gene, SLC6A19, that codes for protein B0AT1, expresses in a very limited number of tissues and is reported to be absent in Lungs (Figure S2a and S12a,b,c). Our findings indicated its protective role with competitive hinderance in binding to TMPRSS2. As its expression was observed to be very low in Lungs, we resolved to look for indirect signatures that may have putative role in providing differential susceptibility. We noticed several variations clustering together in a potential enhancer region (Figure S12d and e) and with differential frequencies in different populations. eQTL analyses at GTEx portal also showed a huge list of potential SNPs upregulating or down regulating SLC6A19 expression in Pancreas, liver, and whole blood cells, overlapping with potential Transcription factor binding sites ( Figure S12f) . We hypothesize a potential chance of some natural occurring variation/s that could induce expression of BoAT1 in respiratory tract thus, providing a protective role in this scenario (Figure 1) . However, this requires extensive computational exploration as well as experiential validations. We also believed that there are additional genes and factors which might be playing an additional role in providing differential susceptibility to COVID19. Thus, we carried out a systems biology study to identify Interaction Network (HPIN) was created. The constructed HPIN contained 163 interactions, involving 31 nodes, which included 4 viral proteins, 27 human proteins and 8 Transcription Factors (TF) (Figure 5a) . From the HPIN, we identified hubs, namely RPS6, NACA, HNRNPA1, BTF3 and SUMO1 with 19, 18, 17, 16 & 12 degrees, respectively in the network (Figure 5a) . This indicated the affinity to attract a large number of low degree nodes towards each hub, which is a strong evidence of controlling the topological properties of the network by these few hubs (Good et al., 2011) . Interestingly, out of five significant hubs, four hubs (RPS6, NACA, HNRNPA1 and BTF3) present in module (Nodes: 16, Edges: 118, Score: 15.73) and one hub SUMO1 was present at motif level which is considered one of the most important regulating motif of biological network at a fundamental level (Figure 5b) association study few diseases were highly associated with these important hub proteins ( Figure S13a) ; HNRNPA1 (36%) followed by RPS6 (25%), SUMO1 (21%), BTF3 (9%) and NACA (9%) (Figure S13b) . Clinically, patients with COVID-19 present with respiratory symptoms, Anoxia, fatigue, heart failure etc could be associated with these hub proteins, mainly HNRNPA1& SUMO1. By hub removal methodology; we tried to understand the effect of hub removal and calculated the topological properties of the HPIN as a control. The probability of degree distribution (P(k)), Clustering coefficient (C(k)) and Neighbourhood connectivity (CN(k)) showed that the HPIN followed a power law scaling behavior. The power law behaviour was also checked and confirmed by using statistical test for power law fitting (p-value ≥ 0.1) (Clauset et al., 2009) in the hub-removal process. The removal of RPS6, HNRNPA1, SUMO1, NACA & BTF3 from the HPIN brings significant variations in the topological properties of the HPIN where, degree distribution (α and β) change significantly (Figure S14a) . Similarly, the variations in the measurements of the exponents of clustering coefficient and neighbourhood connectivity (λ and µ) also showed significant ( Figure S14a ). The knockout experiment of hub could able to highlight the local perturbations driven by these hubs and their effect on global network properties. The result suggest removal of NACA and RPS6 hubs could turn down the degree distribution, so it could be crucial for communicating the signals (Figure S14b) . In case of clustering coefficient, perturbation of BFT3 shows minimum γ, indicating removal of BFT3 makes the network less compact, which may lead to the delay in flow of signal. The HPIN perturbation increases in case of BTF3 hub removal. In Neighbourhood connectivity increase in µ indicates that the information processing in the network becomes faster when SUMO1 hubs are removed, which means that local perturbations due to removal of hubs are strong enough to cause significant change in global scenario (Canright and Engø-Monsen, 2004) ( Figure S14b ). This indicates hubs are not robust, but it helps in the stability of the network. Overall network analysis showed ACE2 is not only the key molecule for entry and survival of SARS-CoV-2 virus, the hub proteins like RPS6, HNRNPA1, SUMO1, NACA & BTF3 might also play a vital role. Analysing these interactions could provide further important understanding for the underlying biological mechanism of SARS-CoV-2 virus infection and identifying putative drug targets. To conclude (Figure 1) , higher expression of ACE2 is facilitated by natural variations with different frequencies in different populations and is functionally associated with expression of the gene. The higher expression of ACE2 facilitates homo-dimerization resulting in hindrance to TMPRSS2 mediated cleavage of ACE2. It becomes more difficult in presence of B0AT1 that usually does not express in Lungs. We also propose that the monomeric ACE2 has higher preferential binding with SARS-CoV-2 S-Protein vis-a-vis its dimerized counterpart. Further, natural variations in TMPRSS2, with potential functional role, and their differential frequencies may also result in differential outcomes towards interaction with ACE2 or priming of viral S-protein, a critical step for entry of virus in host cells. In addition, we have identified some other potential key host genes like ADAM17, RPS6, HNRNPA1, SUMO1, NACA and BTF3, that might have a critical role. With all this background, it is anticipated that in populations like Indian populations, with highly diverse gene pool, a great variation in clinical outcomes is expected and that could be population/region specific, but primarily due to gene pool structure of the region, despite similar exposure levels to SARS CoV-2 and resources. Understanding these population specific differences may help in developing appropriate management strategies. Additional information about knock-out experiment in Network analysis Degree Distribution (P(k)): Degree k is the number of interaction a node in the HPIN. P(k) is the probability of randomly chosen node to have k interaction with the neighbour. The probability of degree distribution (P(k)) of the network is calculated by: where, nk is equal to the number of nodes with degree k and N is equal to the size of the network (Albert and Barabási, 2002; Barabasi and Albert, 1999) . Clustering Coefficient (C(k)): Clustering coefficient defines how strongly the nodes in a network tend to cluster together. Clustering coefficient of the i th node in undirected network can be obtained by: where, ei is the number of connected pairs of the nearest-neighbour of the i-th node and ki denotes the degree of the respective node (Ravasz and Barabasi, 2003; Ravasz, et al., 2002) . Neighborhood Connectivity (CN(k)): Neighbourhood connectivity of a node is defined as the connectivity of all the neighbours of the node (Maslov and Sneppen, 2002) . The average connectivity of the nearest neighbours of a node is given by: ) / (#) = 0 1!(1|#) 3 where, P(q|k) denotes the conditional probability that a link belonging to a node with connectivity k points to a node with connectivity q. For scale free network, CN(k) is constant while it follows power law CN(k)= c -µ for hierarchical network with µ approximately equal to 0.5 (Pastor-Satorras, et al., 2001) . The positive and negative signs in µ indicates the assortivity or disassortivity of network respectively (Barrat, et al., 2004) . Lung eQTLs that overlapped with the common dbSNPs are shown and along r 2 values depicting LD in the region is plotted. All the SNPs appeared to cluster towards 3' end of the gene. Other annotations include GeneHancer regulator elements and interaction between GeneHancer regulatory elements as curved lines can be seen. (dark blue color for TMPRSS2 gene). Alternate splicing graph is also indicated. intersecting enhancer region GH05J001199. It also depicts common variants (with applied filter on track for SNPs with frequency greater than 20 percent in global populations) in dbSNP version 151. (f) eQTLs in different tissues for SLC6A19 were plotted through GTEx Locus Browser. Size of the dot indicate level of significance (as negative p values) whereas colour depicts positive or negative correlation with Normalized effect size (NES) of the eQTL from -1 to 0 to 1. Red color shades represent upregulation whereas blue color shades show downregulation. It also depicts Linkage disequilibrium (LD) in region with value range from 0 to 1 as white to black shades with 1(dark) as absolute LD. Significant upregulation of the gene by eQTLs was observed in Pancreas whereas downregulation in Liver and no data on Lungs as it is reported not to be expressing in Lungs at GTEx portal. Proceedings of the 2006 ACM/IEEE conference on Supercomputing HPIDB 2.0: a curated database for host-pathogen interactions Computing topological parameters of biological networks Rac1 signalling modulates a STAT5/BCL-6 transcriptional switch on cell-cycle-associated target gene promoters On the Robustness of Centrality Measures Under Conditions of Imperfect Data Roles in networks Comparative genetic analysis of the novel coronavirus (2019-nCoV/SARS-CoV-2) receptor ACE2 in different populations Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information Individual Variation of the SARS-CoV2 Receptor ACE2 Gene Expression and Regulation Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study The use of position-specific rotamers in model building by homology Power-Law Distributions in Empirical Data HnRNP proteins controlled by c-Myc deregulate pyruvate kinase mRNA splicing in cancer HADDOCK: a protein-protein docking approach based on biochemical or biophysical information The Pfam protein families database in 2019 GeneHancer: genome-wide integration of enhancers and target genes in GeneCards Ultrastructural characterization of SARS coronavirus Scaffold proteins: hubs for controlling the flow of cellular information Bio3d: an R package for the comparative analysis of protein structures Clinical Characteristics of Coronavirus Disease 2019 in China TMPRSS2 and ADAM17 cleave ACE2 differentially and only proteolysis by TMPRSS2 augments entry driven by the severe acute respiratory syndrome coronavirus spike protein SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor VMD: Visual molecular dynamics Angiotensin-converting enzyme 2 protects from severe acute lung failure On the Role of the Crystal Environment in Determining Protein Side-chain Conformations Lethality and centrality in protein networks How sick will the coronavirus make you? The answer may be in your genes HPIDB--a unified resource for host-pathogen interactions SARS-CoV2: should inhibitors of the renin-angiotensin system be withdrawn in patients with COVID-19? Eur Heart J Protein and ligand preparation: parameters, protocols, and influence on virtual screening enrichments Prostate cancer stem cells: do they have a basal or luminal phenotype? Methodology of predicting novel key regulators in ovarian cancer network: a network theoretical approach Uncovering disease-disease relationships through the incomplete interactome Correlation between universal BCG vaccination policy and reduced morbidity and mortality for COVID-19: an epidemiological study. medRxiv Inhibition of SARS-CoV-2 Infections in Engineered Human Tissues Using Clinical-Grade Soluble Human ACE2 Identification of key regulators and their controlling mechanism in a combinatorial apoptosis network: a systems biology approach Supramolecular architecture of severe acute respiratory syndrome coronavirus revealed by electron cryomicroscopy Automated minimization of steric clashes in protein structures Statistical mechanics of community detection Cytoscape: a software environment for integrated models of biomolecular interaction networks COVID-19 infection: the perspectives on immune responses Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega New and continuing developments at PROSITE Cryo-EM structure of the SARS coronavirus spike glycoprotein in complex with its host cell receptor ACE2 Human ACE2 receptor polymorphisms predict SARS-CoV-2 susceptibility. bioRxiv SARS-CoV-2 entry factors are highly expressed in nasal epithelial cells together with innate immune genes Enterocyte-specific regulation of the apical nutrient transporter SLC6A19 (B(0)AT1) by transcriptional and epigenetic networks Proteomics. Tissue-based map of the human proteome. Science UniProt: a worldwide hub of protein knowledge Renin-Angiotensin-Aldosterone System Inhibitors in Patients with Covid-19 Protein structure analysis of mutations causing inheritable diseases. An e-Science approach with life scientist friendly interfaces The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function SWISS-MODEL: homology modelling of protein structures and complexes Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2 Y-box binding protein 1 is up-regulated in proliferative breast cancer and its inhibition deregulates the cell cycle Role of HIF-1alpha in the regulation ACE and ACE2 expression in hypoxic human pulmonary artery smooth muscle cells A pneumonia outbreak associated with a new coronavirus of probable bat origin Jaypee Institute of Information Technology, Noida, Sector-62 Statistical mechanics of complex networks Emergence of scaling in random networks The architecture of complex weighted networks Specificity and stability in topology of protein networks Dynamical and correlation properties of the internet Hierarchical organization in complex networks The authors declare no competing interests associated with MS. For declaration purposes, SS is founder, chief scientific advisor of a startup "Biodroid Innovations Pvt Ltd" and IP is director of "Bioinfores Pvt. Ltd.".