key: cord-0732765-o0c0en9i authors: Ghavasieh, A.; Bontorin, S.; Artime, O.; De Domenico, M. title: Multiscale statistical physics of the Human-SARS-CoV-2 interactome date: 2020-09-08 journal: nan DOI: 10.1101/2020.09.06.20189266 sha: ab0aacc48cdb4010275eb6a529b9b69a437d6375 doc_id: 732765 cord_uid: o0c0en9i

Protein-protein interaction (PPI) networks have been used to investigate the influence of SARS-CoV-2 viral proteins on the function of human cells, providing a deeper understanding of COVID-19 and grounds for drug repurposing strategies. However, our knowledge of the (dis)similarities between this virus and other viral agents is still very limited. Here we compare the novel coronavirus PPI network against 45 known viruses, from the perspective of statistical physics. Our results show that classic analyses such as percolation are not sensitive to the distinguishing features of viruses, whereas the analysis of biochemical spreading patterns allows us to meaningfully categorize the viruses and quantitatively compare their impact on human proteins. Remarkably, when Gibbsian-like density matrices are used to represent each system's state, the corresponding macroscopic statistical properties measured by the spectral entropy reveal the existence of clusters of viruses at multiple scales. Overall, our results indicate that SARS-CoV-2 exhibits similarities to viruses like SARS-CoV and Influenza A at small scales, while at larger scales it exhibits more similarities to viruses such as HIV1 and HTLV1.

Figure 1: Virus information summary. The 45 viruses used in this study are shown against their size, in terms of viral proteins, and coloured by their official family classification.

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. This version was posted September 8, 2020. It is made available under a CC-BY 4.0 International license.

[...] is shown.
Proteins targeted by viruses are highlighted in two ways. On the one hand, marker size identifies targeted proteins: the bigger the marker, the larger the number of times a protein is targeted by viruses in our data set. On the other hand, distinctly colored markers of constant size encode distinct viruses (45 in total, including SARS-CoV-2); on the right-hand side, the same color scheme is used to show the contribution of each virus to the most frequently targeted proteins.

erwise). In the classical version of percolation analysis, one removes a randomly chosen fraction of nodes [20]. This point of view assumes that, as a first approximation, there is an intrinsic relation between connectivity and functionality: when node removal occurs, the more capable a system is of remaining assembled, the better it will perform its tasks. Hence, we have a quantitative way to assess the robustness of the system. If one wants to single out the role played by a certain property of the system, then instead of being selected randomly, the nodes can be removed sequentially following that criterion. For instance, if we want to find out how relevant the most connected elements are to functionality, we can remove a fraction of the nodes with the largest degree [21, 22].
Technically, the criterion can be any metric that allows us to rank nodes, although in practice topologically oriented protocols, such as degree or betweenness, are the most frequently used because they are readily accessible. Percolation is therefore, to all intents and purposes, a topological analysis, since its input and output are based on structural information.

In the past, percolation has proved useful for shedding light on several aspects of protein-related networks, such as the identification of functional clusters [23] and protein complexes [24], the verification of the quality of functional annotations [25], or the critical properties as a function of mutation and duplication rates [26], to name but a few. Following this line of research, we apply percolation analysis to all the PPI networks to understand whether this technique provides any information that allows us to differentiate among viruses. The protocols considered are the random selection of nodes, the targeting of nodes by degree (i.e., the number of connections they have), and their removal by betweenness centrality (i.e., a measure of the likelihood that a node lies in the information flow exchanged through the system by means of shortest paths). We apply these attack strategies and compute the resulting (normalized) size of the largest connected component S in the network, which serves as a proxy for the remaining functional part, as commented above. Thus, when S is close to unity the function of the network has been scarcely impacted by the intervention, while when S is close to 0 the network can no longer be operative. The results are shown in Fig. 3. Surprisingly, for each attack protocol, we observe that the curves of the size of the largest connected component neatly collapse onto a common curve. In other words, percolation analysis completely fails at finding virus-specific discriminators.
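As a rough illustration of the attack protocols described above, the following sketch removes nodes either at random or in order of decreasing degree and records the normalized size S of the largest connected component after each removal. The toy edge list, the function names, and the choice to rank nodes by their initial (not recomputed) degree are assumptions made for illustration only and are not taken from the study; betweenness-based removal is omitted for brevity.

```python
import random
from collections import defaultdict, deque


def build_adjacency(edges):
    """Return an undirected adjacency map {node: set(neighbors)}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def largest_component_size(adj, removed):
    """Size of the largest connected component, ignoring `removed` nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over surviving nodes
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best


def percolation_curve(adj, order):
    """Normalized largest-component size S after removing order[:k], k = 0..N."""
    n = len(adj)
    removed = set()
    curve = [largest_component_size(adj, removed) / n]
    for node in order:
        removed.add(node)
        curve.append(largest_component_size(adj, removed) / n)
    return curve


def random_order(adj, seed=0):
    """Random removal order (seeded for reproducibility)."""
    nodes = sorted(adj)
    random.Random(seed).shuffle(nodes)
    return nodes


def degree_order(adj):
    """Highest initial degree first; ties broken by node label."""
    return sorted(adj, key=lambda node: (-len(adj[node]), node))


# Toy network (hypothetical): a hub "h" with four neighbors plus a short chain.
edges = [("h", "a"), ("h", "b"), ("h", "c"), ("h", "d"), ("d", "e"), ("e", "f")]
adj = build_adjacency(edges)
s_degree = percolation_curve(adj, degree_order(adj))
s_random = percolation_curve(adj, random_order(adj))
```

Comparing `s_degree` and `s_random` for a given network shows how strongly the targeted protocol accelerates fragmentation; in the study, it is the comparison of such curves across viruses that reveals no virus-specific signal.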
Viruses do respond differently depending on the ranking used, but this is somewhat expected due to the correlation between the metrics employed and the position of the nodes in the network.

We can shed some light on the similar virus-wise response to percolation by looking at [...] different symptomatology, their overall structure shows a high level of similarity when it comes to the protein-protein interaction. Indeed, for every pair of viruses we find the fraction of nodes $f_N$ and the fraction of links $f_L$ that simultaneously participate in both. Averaging over all pairs, we obtain $f_N = 0.9996 \pm 0.0002$ and $f_L = 0.9998 \pm 0.0007$. This means that the interactomes are structurally very similar, and so are the dismantling ranks. If purely topological analysis is not able to differentiate between viruses, then we need more convoluted, non-standard techniques to tackle this problem. In the next sections we will employ these alternative approaches.

[...] is also dependent on its degradation rate $B_i$ and the amount of protein synthesized at a rate $F_i$. The resulting Law of Mass Action: [...] $A_{ij} x_i x_j$ [...] summarizes the formation of complexes and the degradation/synthesis processes that occur in a PPI. Regulatory dynamics can instead be characterized by an interaction with neighbors described by a Hill function that saturates at unity: [...]

The effect is simulated by introducing a negative constant perturbation at the steady-state concentration/activity, $x_i \to x_i - \alpha x_i$, $\forall i \in V$ (e.g., $\alpha = 0.2$), and tracking its propagation to the rest of the network by solving the corresponding set of coupled equations. For an M-M-like model (with [...] Ultimately, the idea is that a constant perturbation brings the system to a new steady state [...]

The steps we follow to assess the impact of the viral nodes on the human interactome via the microscopic dynamics are described next. We first obtain the equilibrium states of the human interactome by numerical integration of the equations. Then, for each virus, we compute the system response to perturbations starting at every node $i \in V$, which is eventually encoded in $G^{v}$. Finally, we repeat these steps for both the Bio-Chem and M-M models. The amount of correlation generated is a measure of the impact of the virus on the interactome equilibrium state. We estimate it as the 1-norm of the correlation vectors, $\lVert G^{v} \rVert_1 = \sum_i |G^{v}_i|$, which we refer to as the Cumulative Correlation. The results are presented in Fig. 4.

By allowing for multiple sources of perturbation, the biggest responses in magnitude will come from direct neighbors of these sources, making them the dominant contributors to $\lVert G^{v} \rVert_1$.
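The perturbation protocol above can be illustrated with a small numerical sketch. Several ingredients are assumptions made here because the corresponding equations did not survive in this text: the mass-action form $dx_i/dt = F - B x_i - x_i \sum_j A_{ij} x_j$ with uniform rates F and B, plain Euler integration, a perturbation implemented by holding each source node at $(1-\alpha)$ times its steady-state value, and a response measured as the relative change $|\Delta x_i| / x_i$ of the remaining nodes. The function names and the toy path graph are hypothetical, and the response vector is only a stand-in for the study's $G^{v}$, whose exact definition is not reproduced here.

```python
def steady_state(adj, F=1.0, B=1.0, clamp=None, dt=0.01, steps=20000):
    """Euler-integrate dx_i/dt = F - B*x_i - x_i * sum_j x_j (j: neighbors of i).

    `clamp` maps node -> fixed value and models a sustained perturbation.
    The equation form and the uniform F, B are assumptions (see lead-in).
    """
    clamp = clamp or {}
    x = {node: clamp.get(node, 1.0) for node in adj}
    for _ in range(steps):
        new_x = {}
        for i in adj:
            if i in clamp:
                new_x[i] = clamp[i]  # perturbed sources are held constant
                continue
            coupling = sum(x[j] for j in adj[i])
            new_x[i] = x[i] + dt * (F - B * x[i] - x[i] * coupling)
        x = new_x
    return x


def cumulative_response(adj, sources, alpha=0.2, **kwargs):
    """Return (1-norm of the response vector, response per non-source node)."""
    base = steady_state(adj, **kwargs)
    clamp = {s: (1.0 - alpha) * base[s] for s in sources}
    perturbed = steady_state(adj, clamp=clamp, **kwargs)
    response = {
        i: abs(perturbed[i] - base[i]) / base[i] for i in adj if i not in clamp
    }
    return sum(response.values()), response


# Toy path graph a - b - c - d - e (hypothetical), perturbed at node "a".
path = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
total, resp = cumulative_response(path, sources=["a"], alpha=0.2)
```

In this toy setting the response decays with distance from the perturbed node, so the direct neighbor of the source dominates `total`, in line with the observation that direct neighbors are the main contributors to the cumulative measure.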
With $I_i$ not being dependent on the source degree, these results support the idea that with these [...] although some of them are highly dependent on the choice of threshold (Fig. 5). In this section, [...]

Note that a low value of the Massieu function indicates high information flow between the nodes. The von Neumann entropy can be directly derived from the Massieu function by [...] encoding the information content of graph $G$. Finally, the difference between the von Neumann entropy and the Massieu function follows [...] where $U(\beta, G)$ is the counterpart of internal energy in statistical physics. In the following, we [...]

Table 1: The summary of clustering results at small scales ($\beta \approx 1$ from Fig. 6) is presented.
Remarkably, at this scale, SARS-CoV-2 groups with a number of respiratory diseases including SARS-CoV, Influenza A and HAdV.

Table 2: The summary of clustering results at large scales ([...] Fig. 6) is presented. Here, SARS-CoV-2 shows higher similarity to HIV1, HTLV1 and HPV type 16.

[...] can be adopted to characterize and categorize the complex nature of viruses and their impact on human cells. In this study, we used an approach based on statistical physics to analyze virus-human [...] appears to be more similar to viruses like HIV1 and HTLV1. As mentioned earlier, the response [...]

[...] the targeted human proteins and build a virus-host interactome by merging this information with BIOSTR. However, for the subsequent analyses, which are focused only on the human interactome, we discard virus-virus interactions.

It is worth noting that a different procedure had to be used to build the COVID-19 virus-host interactions. Since SARS-CoV-2 is too novel, we could not find its PPI in the STRING repository and instead considered the targets experimentally observed in Gordon et al. [13], consisting of 332 human proteins. The remainder of the procedure used to build the virus-host PPI is the same as before. See Fig. 1 for summary information about each virus.
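The construction step described above can be sketched as follows: human PPI edges are merged with virus-host edges, virus-virus interactions are dropped, and the set of directly targeted human proteins is collected. The function name, the data layout and the protein labels (`P1`, `V1`, ...) are hypothetical placeholders; the actual BIOSTR and target data used in the study are not reproduced here.

```python
def build_virus_host_interactome(human_edges, viral_edges, viral_proteins):
    """Merge human PPI edges with virus-host edges, dropping virus-virus links.

    Returns (edges, targets): the merged set of undirected edges and the set of
    human proteins directly targeted by at least one viral protein.
    """
    viral = set(viral_proteins)
    # Undirected edges are stored as frozensets; self-loops are skipped.
    edges = {frozenset(e) for e in human_edges if e[0] != e[1]}
    targets = set()
    for u, v in viral_edges:
        u_viral, v_viral = u in viral, v in viral
        if u_viral and v_viral:
            continue  # virus-virus interaction: discarded
        if not (u_viral or v_viral):
            continue  # no viral endpoint: not a virus-host edge
        edges.add(frozenset((u, v)))
        targets.add(v if u_viral else u)
    return edges, targets


# Hypothetical toy data: three human proteins and two viral proteins.
human_edges = [("P1", "P2"), ("P2", "P3")]
viral_edges = [("V1", "P1"), ("V1", "V2"), ("P3", "V2")]
edges, targets = build_virus_host_interactome(
    human_edges, viral_edges, viral_proteins=["V1", "V2"]
)
```

The returned `targets` set plays the role of the perturbation sources in the downstream analyses, while `edges` restricted to human proteins gives the human interactome on which those analyses run.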
[...] viral RNA responsible for triggering the innate immune response: it is fundamental for activating the pro-inflammatory response that includes interferons. For this reason, it is targeted by several virus families, which are able to hinder the innate immune response by evading its specific interferon response.

Contributions. AG, OA and SB performed numerical experiments and data analysis. MDD conceived and designed the study. All authors wrote the manuscript.
The proximal origin of SARS-CoV-2
The genetic landscape of a cell
Epidemiologic features and clinical course of patients infected with SARS-CoV-2 in Singapore
A trial of lopinavir-ritonavir in adults hospitalized with severe COVID-19
Remdesivir, lopinavir, emetine, and homoharringtonine inhibit SARS-CoV-2 replication in vitro
Network medicine: a network-based approach to human disease
Focus on the emerging new fields of network physiology and network medicine
Human symptoms-disease network
Network medicine approaches to the genetics of complex diseases
The human disease network
The multiplex network of human diseases
Network medicine in the age of biomedical big data
Structural genomics and interactomics of 2019 Wuhan novel coronavirus, 2019-nCoV, indicate evolutionary conserved functional regions of viral proteins
Structural analysis of SARS-CoV-2 and prediction of the human interactome
Fractional diffusion on the human proteome as an alternative to the multi-organ damage of SARS-CoV-2
Network medicine framework for identifying drug repurposing opportunities for COVID-19
Predicting potential drug targets and repurposable drugs for COVID-19 via a deep generative model for graphs
Network robustness and fragility: Percolation on random graphs
Introduction to Percolation Theory
Error and attack tolerance of complex networks
Breakdown of the Internet under intentional attack
Identifying protein complexes from interaction networks based on clique percolation and distance restriction
Percolation of annotation errors through hierarchically structured protein sequence databases
Infinite-order percolation and giant fluctuations in a protein interaction network
Computational Analysis of Biochemical Systems: A Practical Guide for Biochemists and Molecular Biologists
Propagation of large concentration changes in reversible protein-binding networks
An Introduction to Systems Biology
Quantifying the connectivity of a network: The network correlation function method
Universality in network dynamics
The statistical physics of real-world networks
Classical information theory of networks
The von Neumann entropy of networks
Structural reducibility of multilayer networks
Spectral entropies as information-theoretic tools for complex network comparison
Complex networks from classical to quantum
Enhancing transport properties in interconnected systems without altering their structure
Scale-resolved analysis of brain functional connectivity networks with spectral entropy
Unraveling the effects of multiscale network entanglement on disintegration of empirical systems. Under revision
STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets
BioGRID: a general repository for interaction datasets
The BioGRID interaction database: 2019 update
Gene help: integrated access to genes of genomes in the reference sequence collection

Competing financial interests. The authors declare no competing financial interests.

Acknowledgements. The authors thank Vera Pancaldi for useful discussions.