key: cord-0283259-z2ibquvr authors: Mahajan, Tarun; Dar, Roy D title: Internetwork connectivity of molecular networks across species of life date: 2020-08-03 journal: bioRxiv DOI: 10.1101/2020.08.03.233304 sha: 8447fed1f64f62b6139acbf90ed6614e3bb0a894 doc_id: 283259 cord_uid: z2ibquvr Background Molecular interactions have been studied as independent complex networks in systems biology. However, molecular networks dont exist independently of each other. In a network of networks approach (called multiplex), we uncover the design principles for the joint organization of transcriptional regulatory network (TRN) and protein-protein interaction (PPI) network. Results We find that TRN and PPI networks are non-randomly coupled in the TRN-PPI multiplex across five different eukaryotic species. Gene degrees in TRN (number of downstream genes) are positively correlated with protein degrees in PPI (number of interacting protein partners). Gene-gene interactions in TRN and protein-protein interactions in PPI also non-randomly overlap in the multiplex. These design principles are conserved across the five eukaryotic species. We show that the robustness of the TRN-PPI multiplex is dependent on these design principles. Further, functionally important genes and proteins, such as essential, disease-related and those involved in host-pathogen PPI networks, are preferentially situated in essential parts of the human multiplex with highly overlapping interactions. Conclusion We unveil the multiplex architecture of TRN and PPI networks across different species. Multiplex architecture may thus define a general framework for studying molecular networks across the different species of life. This approach may uncover the building blocks of the hierarchical organization of molecular interactions. Biological functions and characteristics are consequences of complex interactions between numerous components [1] . These components can be molecules such as DNA, RNA, proteins and other small molecules or larger units such as cells, tissues, whole organisms or entire ecosystems. These interactions are organized into a hierarchy of networks. Networks at different levels of this hierarchy have been studied extensively. For instance, at the subcellular level, transcriptional regulatory networks (TRN) model protein-DNA interactions [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] , proteinprotein interaction (PPI) networks capture physical interactions between proteins [6, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26] and metabolic networks map interactions between the set of biochemical reactions in an organism [1, 27, 28, 29] . Analysis of individual network layers has answered important biological questions ranging from organization of gene expression [5, 8, 29, 30, 31] , predicting phenotype from molecular interaction networks [16, 24] , to understanding disease biology [32, 33, 34, 35, 36] . However, biological networks do not function in isolation. These networks comprise of different types of interactions and even interact with other networks [1, 37] . For instance, TRN and PPI networks interact with each other. Proteins are translated from genes in accordance with the regulatory program encoded in the TRN. These translated proteins interact with each other in the PPI layer. Transcription factor proteins interact with other proteins in the PPI layer and also regulate downstream genes in the TRN network. Further, PPI networks can also encode different kinds of physical interactions between proteins, such as the ones revealed by Yeast Two-Hybrid (Y2H) binary, Affinity Purification (AP) protein complexes, synthetic lethality, dosage lethality, genetic interactions, etc, [14] . Such multilayer networks (comprising multiple networks) can be interdependent when different network layers interact with each other to form a network of networks (NON) architecture [38] . For instance, the interaction between TRN and PPI networks forms an interdependent NON ( Figure 1 ). Alternatively, multilayer networks can be multiplex with different networks, which encode distinct types of interactions between the same molecular species such as the different types of PPI interactions. Until recently, network science has focused largely on the study of individual biological networks. Even some of the studies that worked with multiple networks aggregated or integrated the different networks and did not consider a multilayer approach [39, 40, 41, 42] . This could partly be attributed to the fact that multilayer networks have gained popularity only in recent years, especially in statistical physics [38] . Now, extensive work has been done to study robustness properties of multilayer networks [43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53] . Counter-intuitively, interdependent networks are more fragile to random failure than independent individual networks [43] . Real interdependent networks mitigate this vulnerability by means of specific intra-and interlayer degree-degree correlation or coupling [46, 51] . For a given TRN-PPI interdependent (or multiplex) network ( Figure 1A ), degreedegree coupling (C D ) is quantified as the correlation between the connectivity of a protein in the PPI network, K, and the connectivity of its corresponding gene in the TRN, either in-degree (k in , number of regulations incident on a gene from upstream transcription factors), out-degree (k out , number of downstream genes regulated by a transcription factor), or total degree (k = k in + k out ). In this case, C D can be negative, positive, or zero. Particularly, positive C D makes the multiplex robust to attack ( Figure 1A ) [54, 55] . With positive C D , hub nodes are likely to be hub nodes in all the network layers ( Figure 1A, top) . For negative C D , hub nodes in one layers are dependent on spokes in the other network layer. For zero C D , hubs and spokes are randomly dependent on each other across network layers ( Figure 1B, bottom) . Here, we investigate C D across species and assess whether it is positive, negative, or uncoupled, and if there exists a global trend across various species. Edge overlap or redundancy between network layers also mitigates vulnerability in interdependent networks [56] . Two genes in the TRN network have an interaction between them if one is the transcription factor for the other. While in PPI networks, interaction between two proteins depicts physical or functional interaction between these proteins. We define multiplex redundancy as similarity in interactions between the TRN and PPI networks. We quantify redundancy (C R ) by counting the number of common interactions simultaneously in both the TRN and PPI networks. If TRN and PPI networks are represented as graphs, then C R can be measured by counting the number of edges which are simultaneously present in both the graphs ( Figure 1B and Methods). Higher redundancy makes the multiplex more robust to attack ( Figure 1B) . We study the multilayer network of TRN and PPI networks in nine different species, namely H. pylori, M. tuberculosis, E. coli, S. cerevisiae, C. elegans, D. melanogaster, A. thaliana, M. musculus and H. sapiens, spanning two domains of life (bacteria and eukaryotes). We collected TRN networks from nine different sources, and 16 different PPI networks from five different sources (see Methods and Table S1 , Additional File 2). Two of the PPI sources are publicly curated databases-BioGRID [14] and HINT [57] . Interlayer connectivity is defined by one-to-one correspondence between a gene and its corresponding protein. Therefore, this multilayer network can be reduced to an equivalent multiplex network [43] . Henceforth, we call the TRN-PPI multilayer network a TRN-PPI multiplex. Based on quality control, five (S. cerevisiae, C. elegans, D. melanogaster, M. musculus and H. sapiens) of the nine species were used for further analysis; multiplexes with visually continuous percolation curves representing second-order like behavior were studied. Here we show that for species TRN-PPI multiplexes, positive C D increases robustness to targeted attack on the genes and proteins. Further, increasing C R also increases robustness. We find a trade-off between robustness and independence. Independent multiplexes with no degree-degree coupling and redundancy are highly vulnerable, while positively degree-degree coupled and highly redundant multiplexes are highly robust. We show that this trade-off exists for different species individually. Multiplex coupling is also correlated with the distribution of functionally important genes and proteins such as essential, disease and pathogen-interacting genes and proteins. These functionally important genes are selectively situated in redundant and essential parts of the multiplex and consequently, are vulnerable. Interlayer degree-degree coupling and redundancy offer design mechanisms for tuning robustness in molecular multiplex networks. We study coupling between TRN and PPI networks using two multiplex propertiesdegree-degree coupling (C D ) and redundancy coupling (C R ). C D is quantified by Pearson's or Spearman's rank correlation between PPI degree, K, and TRN outdegree, k out (Figures 2A-2D, Methods) . We quantify C R by counting the total number of interactions present simultaneously in both TRN and PPI networks ( Figure 1B , Methods). We find that C D is significantly positive across different eukaryotes (Figures 2A-2D and Figure S1 Supplementary Information, Additional File 1). C R is also non-randomly positive across different eukaryotes (Figures 2E-2H and Figure S2 Supplementary Information, Additional File 1). Node-specific (for each gene-protein pair in the multiplex) C R values are more long-tailed compared to a randomly shuffled null model. Shuffled null model is generated by randomly shuffling labels on genes in TRN, while keeping protein labels fixed in PPI. Multiplex degree-degree and redundancy couplings modulate robustness We study the relationship between multiplex degree-degree (C D ) and redundancy (C R ) couplings and robustness (R) across species and for individual species. We use a previously reported formalism to study topological robustness of the TRN-PPI multiplex under targeted attack on its nodes [43, 58] (Methods). Robustness is related to the size of the mutually connected giant component (MCGC). MCGC is defined as the largest connected component between both layers of the multiplex (Methods). Buldyrev et al. [43] and Kleinberg et al. [58] track the size of MCGC under attack to quantify robustness. We specifically focus on targeted attack. Under targeted attack, at each step of the attack, gene-protein pairs are removed in decreasing order of multiplex degree, K mult (i) = max(K(i), k out (i)) [58] , where K mult (i) is the multiplex degree for gene-protein pair i, K(i) is the degree of protein i in the PPI network and k out (i) is the out-degree of gene i in the TRN network (Methods). Absolute robustness is then measured by tracking the relative size of MCGC (MCGC divided by number of gene-protein pairs in the multiplex) as we successively remove gene-protein pairs from the multiplex. Figure 3 and Figure S3 (Supplementary Information, Additional File 1) show relative size of MCGC as a function of fraction of gene-protein pairs removed during targeted attack for all the species. We call the curve for MCGC the attack curve. Absolute robustness is quantified by area under the attack curve (we will call this area RobustArea) (Methods). Large RobustArea implies large robustness for a given multiplex, and vice versa small RobustArea means low robustness. Cohens d [59] is used to quantify effect size for robustness by comparing RobustArea for a given multiplex against an appropriate null model (Methods). This quantity is used as an estimate of robustness (R) in this work. We use a Zero-Coupling-Zero-Redundancy null model (see Supplementary Information, Additional File 1). Under this null model, we generate multiplexes with C D and C R fixed to zero. Attack curves for different eukaryotic species are shown in Figure 3 . Visually, we see that organismal RobustArea values are larger than the null model on average (except for yeast with HINT PPI network). This is quantified in Figure 4A , where C D and R are positively correlated across different eukaryotic species with Spearman's correlation coefficient of 0.68 (p-value = 0.044), and C R and R are positively correlated across different eukaryotic species with Spearman's correlation coefficient of 0.72 (p-value = 0.037). We also quantify the dependence of R on C D and C R for individual species (Figures 4B and 4C ). For each species, we sample a subset of the TRN-PPI multiplex such that the sampled multiplex has specific values of C D and C R (Supplementary Information, Additional File 1). This sampling is repeated 1000 times. Attack curves and RobustArea are then computed for the sampled multiplexes. Therefore, for the given combination of C D and C R , we get a distribution of RobustArea, and R can be calculated. For each species, we explore C D and C R values over a grid ( Figure 4B ). Changing both C D and C R increases R (Figures 4B and S6 Supplementary Information Additional File 1) . For all the species, multiplexes with highest C D and C R values exhibit maximum R ( Figures 4B and S6 Supplementary Information, Additional File 1). Further, for a fixed value of C R , R increases with C D . Similarly, for fixed C D , R also increases with C R . The effect of C R is stronger than C D , which is evident for human multiplex where R is high for high C R irrespective of C D . The independent effect of C D and C R on R is shown in Figure 4C . Across species, we find that C D , C R and R are positively correlated with the number of gene-proteins in the multiplex ( Figure S4 , Supplementary Information, Additional File 1). To assess whether across species correlation between C D (C R ) and R is simply an artifact of the difference between the sizes of the multiplexes or not, we sample two subsets of different sizes for yeast, fly and human (Figures S13-S15, Supplementary Information, Additional File 1). For each of these species, we find that the larger sized susbet has a larger robustness while keeping C R and C D fixed ( Figure S13 , Supplementary Information, Additional File 1). However, despite this dependence of R on multiplex size, dependence on C D and C R can still be assessed by comparing the two subsets (Figures S13 and S14, Supplementary Information, Additional File 1). This suggests that the correlations with R seen in Figure 4A and 4C are indeed because of C D and C R in addition to the dependence on the number of gene-protein pairs in the multiplex. Essential genes and proteins are essential for multiplex robustness We collected essential and non-essential genes for three species (yeast, fly and human) from the Online GEne Essentiality (OGEE) database [60, 61] . We quantify C R and C D couplings for essential and non-essential genes (Figures 5A and S7 Supplementary Information, Additional File 1, respectively). For human and fly, essential genes have higher C R than non-essential genes ( Figure 5A ). However, there is no significant difference in C D ( Figure S5 Supplementary Information, Additional File 1). For yeast, C R is not significantly different between essential and non-essential, while C D for essential genes is higher than non-essential genes. To gauge importance of essential (non-essential) genes for the multiplex, we selectively attack essential (non-essential) genes and proteins in decreasing order of multiplex degree. Partial attack curves are generated by successively attacking essential (non-essential) genes in decreasing order of multiplex degree for the subset. The attack process is halted once all the genes in the subset of essential (non-essential) genes have been removed (Methods). RobustArea is quantified by computing area under such partial attack curves. R is calculated by comparing against a randomly selected set of genes. For fly and human, higher C R of essential genes compared to non-essential and random genes co-occurs with lower R ( Figure 5A ). For yeast with HINT PPI network, higher C D of essential genes compared to non-essential and random genes co-occurs with lower R ( Figure S7A Supplementary Information, Additional File 1). Attacking essential genes breaks the multiplex faster than attacking non-essential or a random set of genes, which shows that essential genes and proteins are situated in a highly coupled and important part of the multiplex. We also study the impact of C R on robustness by sampling subsets of genes and proteins (size 100) in the human multiplex ( Figure S12 Supplementary Information, Additional File 1). As the redundancy of the sampled genes and proteins increases, robustness decreases against a random set of genes. Thus, redundancy controls selective placement of a subset of genes and proteins in important parts of the multiplex. Pathogen-and disease-related genes and proteins are situated in essential parts of the human multiplex Pathogen We collected human-pathogen protein-protein interaction data for 13 different pathogens. Data for 12 of these 13 pathogens was collected from a publicly available database, HPIDB 3.0 [62, 63] . This is a curated database which currently contains 69,787 unique protein interactions between 66 host and 668 pathogen species. We studied interactions for 12 different human pathogens from HPIDB 3.0 ( Figure 5B ). Besides these 12 pathogens, we also included human-pathogen protein interactions for various human coronaviruses (HCoVs). We collected a list of 119 human proteins which interact with various HCoVs [64] . Therefore, in total we have 13 pathogens in our analysis. For each pathogen, we divide the human multiplex into two sets of gene-protein pairs: pathogen-related (S P ) and not pathogen-related (S N P ). We perform targeted attack on these sets of genes and proteins and generate partial attack curves (Methods and Supplementary Information, Additional File 1). The size of S P is greater than S N P for all of the 13 pathogens. Therefore, for an equitable comparison, we randomly sampled genes and proteins from S N P to match the number in S P . This process is repeated 100 times. For a baseline comparison, we also randomly sample genes and proteins to match the number in S N P . This random sampling was also performed 100 times. R is computed for pathogen-related and not pathogenrelated genes and proteins by comparing partial RobustArea against the partial RobustArea for the random set. For all the pathogens, except Zika, targeted attack on the pathogen-related genes and proteins makes the multiplex highly vulnerable ( Figure 5B , left). This vulnerability co-occurs with higher C R for the pathogenrelated genes and proteins compared to not pathogen-related genes and proteins for all the pathogens, except dengue and influenza with BioGRID PPI ( Figure 5B , right). C D is similar between essential and non-essential genes ( Figure S7B We collected disease-related genes from a publicly available database, DisGeNET [65, 66, 67] . The current version (v6.0) contains gene-disease associations between 17,549 genes and 24,166 diseases, disorders, traits, and clinical or abnormal human phenotypes. We collected disease-gene associations for diseases which have at least 100 genes (with HINT PPI network) in the human multiplex considered in this work. After this filtering, we retain 24 diseases in our analysis ( Figure 5C ). For each disease, we divide the human multiplex into two sets of genes and proteins: disease-related (S D ) and not disease-related (S N D ). We perform targeted attack on these sets of genes and proteins and generate partial attack curves (Methods and Supplementary Information, Additional File 1). Size of S D is greater than S N D for all the 24 diseases. Therefore, for an equitable comparison, we randomly sampled genes and proteins from S N D to match the number in S D . This process is repeated 100 times. For a baseline comparison, we also randomly sample genes and proteins to match the number in S D . This random sampling was also performed 100 times. RobustArea and R are computed as it was done for pathogens previously (Methods and Supplementary Information, Additional File 1) . For all the diseases, except Unipolar Depression (with HINT PPI) and Liver Cirrhosis, Experimental (with BioGRID PPI), targeted attack on the disease-related genes and proteins makes the multiplex highly vulnerable ( Figure 5C , left). This vulnerability co-occurs with higher C R for the disease-related genes and proteins compared to not disease-related genes and proteins for all the diseases, ( Figure 5C , right). C D is similar between disease and non-disease genes ( Figure S7C , right Supplementary Information, Additional File 1). Further, given our simulations (Figure S12 Supplementary Information, Additional File 1), this suggests that higher redundancy makes the disease-related genes and proteins highly important for the human multiplex. R and C R for S D are negatively correlated with Spearman's rank correlation of -0.69 (p-value = 2.1×10 −7 ), Figure S9C We also collected the set of genes that contain mutations which have been causally implicated in cancer from the Network of Cancer Genes (NCG) database [68] . NCG also includes information on whether a given cancer gene is an oncogene or a tumor suppressor gene (TSG). For the set of oncogenes (TSGs) in NCG, we divide the human multiplex into two sets of genes and proteins: oncogene (TSG) (S ON (S T SG )) and not oncogene (TSG) (S N ON (S N T SG )). As before, we compare partial robustness between both the sets. We find that targeted attack on oncogenes (TSGs) makes the multiplex highly vulnerable ( Figure 5D , left). This vulnerability co-occurs with higher C R for the set of oncogenes (TSGs) ( Figure 5D , right). C D is similar between oncogenes (TSGs) and non-oncogenes (TSGs) ( Figure S7D , right Supplementary Information, Additional File 1 ). Recently, robustness properties of different biological multiplex and multilayer networks have been studied. These include brain networks [51, 58] , multiplex of PPI interactions [58] , and a TRN-metabolic multilayer network [69] . However, these studies have limitations, which are described next. Kleineberg et al. [58] do not draw broader conclusions about the organization of PPI multiplex across different species. Further, the different layers encode different types of interactions between proteins. This framework does not capture interaction between different types of molecules. Their conclusions are based on a generative model of network growth. This makes the results contingent on the accuracy of the generative model. This generative model is based on geometric principles and does not incorporate biological motivations or mechanisms. Klosik et al. [69] study robustness under random failure of a TRN-metabolic multilayer network. The study only focuses on E. coli and there are no species wide comparisons. Moreover, they do not study the dependence of robustness on degree-degree coupling and redundancy. Another recent study focuses on the interdependent or multiplex network of TRN-PPI-metabolic networks in human [70] . Liu et al. 2019 [70] show that this multiplex is more robust than an uncoupled or shuffled multiplex. They also showed that essential and cancer genes are preferentially arranged in essential parts of the multiplex. However, they do not study the dependence of robustness on multiplex properties. Further, there is no cross-species analysis. This study bridges the gap between theoretical developments and sub-cellular multilayer networks of molecular interactions. The central goal of our work is to investigate the organization and traits of molecular multiplexes. We focus our attention on the multiplex of TRN and PPI networks across five different eukaryotes. Our analysis spans five different TRN and 9 different PPI networks. We show that degreedegree coupling and redundancy are universal principles that shape robustness of the multiplex. Both are independent modulators of robustness. Though maximum robustness is achieved for a degree-degree coupling of 1 and a completely redundant multiplex, the observed species multiplexes have low absolute degree-degree coupling and redundancy. This suggests that robustness is not the only evolutionary pressure shaping the TRN-PPI multiplex. Independence might be a countering force to robustness. One possible explanation for low absolute degree-degree coupling, redundancy and hence robustness could be an inability to tune degree-degree coupling and redundancy independently. Redundancy and degree-degree coupling are positively correlated across species (Figure S11 Supplementary Information, Additional File 1). Multiplexes in nature might be tuning degree-degree coupling and redundancy in unison. Therefore, increasing robustness would increase redundancy as well. If redundancy were high, both TRN and PPI layers would be encoding similar interactions and the amount of unique information captured by the multiplex would be low [71] . Species multiplexes might have an upper bound on redundancy which could explain the low absolute values for robustness. Robustness, independence and redundancy are only some of the pressures which might affect the structure of TRN-PPI multiplex across the domains of life. Other topological factors might possibly be involved. For instance, theoretical studies have established that controllability [72] and navigability [73] both depend on multiplex structure. Further, these results have been confirmed in macroscale networks [72, 73] . For multiplex networks, with one-to-one correspondence between the nodes in the two layers, controllability decreases with increasing degree-degree coupling [72] . This means that the TRN-PPI multiplex might become less controllable at high degree-degree coupling, and more genes and proteins will need to be controlled to steer the multiplex towards a desired state. Navigability is negatively affected by redundancy as well. As the number of overlapping edges, and hence redundancy, increases, navigability decreases [73] . Navigability is quantified by two different metrics; maximum entropy of trajectories explored by a random walker over the multiplex, and uniformity in the steady state probability distribution of node occupation under random walks over the multiplex. Maximum entropy decreases and probability distribution of node occupation becomes more heterogeneous with increasing edge overlap. At high redundancy or edge overlap, low maximum entropy will mean that a random walker can only explore a limited set of trajectories, and highly heterogeneous steady-state distributions will lead to unbiased occupancy among nodes. Therefore, controllability and navigability might exert countering pressures to robustness in shaping the structure of the TRN-PPI multiplex. Disassortative mixing is another important property of molecular networks [74] . Individual network disassortativity might interact with multiplex coupling to create higher order effects, where degrees of neighbors in individual networks might be coupled in the multiplex. Such high order coupling may have additional impact on robustness and other multiplex properties. We have identified degree-degree coupling and redundancy as two modulators of multiplex robustness. These modulators can potentially be tuned to control robustness in naturally existing TRN-PPI multiplexes. Further, we can even custom design synthetic TRN-PPI multiplexes to have desired robustness values. For instance, if robustness is the desired property for a set of genes, the multiplex could be rewired such that protein hubs are also highly regulated transcriptionally. On the other hand, if independence between TRN and PPI layers is the desired behavior, degree-degree coupling and redundancy can be reduced synthetically. In principle, similar ideas can be extended to multiplexes comprising different types of molecular species, for example, protein coding mRNA, miRNA and protein-binding mRNA. The results of this study can be easily extended to other molecular multiplexes and can inform the design of novel multiplexes with different molecular species to achieve a desired biological function. Besides the global design principles for multiplex organization, we have also shown that functionally important genes and proteins have a distinct distribution over the TRN-PPI multiplex. Essential, disease-and pathogen-related genes and proteins are preferentially situated in essential parts of the multiplex. This topological placement is dictated by redundancy. Attack on these functionally important genes quickly dismantles the multiplex. For diseases and pathogens, this suggests that these diseases and pathogens have evolved with the human multiplex and preferentially interact with the vulnerable genes and proteins. Thus, multiplex framework can be useful in the study of disease evolution. Network analysis has previously been used for repurposing existing drugs [75] . We believe that our multiplex approach might help in better identification of drug targets, since a multiplex better captures the complexity of the underlying molecular networks. Therefore, multiplex framework might have application in network medicine. One limitation of any network analysis of molecular interactions is incomplete data. This problem is further compounded due to partial overlap between the TRN and PPI networks. However, it has been shown previously that if the size of the incomplete network is above a certain threshold such that a giant component exists, incomplete networks are representative of the complete network [76] . This suggests that our analysis is representative of the complete species multiplexes. Addition of more genes and proteins in the multiplex might change the specific values of C D , C R and R, however the general dependence between robustness and degree-degree coupling and redundancy will hold. Further, we have not incorporated information about isoform proteins in this study. However, it is straightforward to include such information. In the presence of isoforms, the correspondence between TRN genes and PPI proteins will be one-to-many rather than one-to-one. Quantifying structure and topology in complex biological networks has been actively researched within network biology. Design principles include the universality of scale-free networks [1, 77] from metabolic networks across species [28] and gene regulatory networks [5, 8, 13, 30, 78, 79] , to power grids and the internet [80] , lethal deletions in the hubs of yeast [16] , disassortative mixing in molecular networks [74] [74], and the existence of sub-modules and reoccurring network motifs [9, 12] . Network biology has only recently investigated the influence of multilayered multiplex networks in comparison to single network layers in isolation [51, 58, 69, 70, 71, 81] . This study contributes to the understanding of internetwork connectivity in layered molecular interaction networks. It is the first to compare TRN-PPI multiplexes across species covering two domains of life. We discover global trends across species with degree-degree coupling and edge redundancy positively correlated with increased robustness. Robustness is explored in the context of the TRN-PPI multiplex and is proposed as one of the selective pressures by which evolution has shaped internetwork connectivity, degree-degree coupling and redundancy. The design principles presented here may be useful for the future design and understanding of multiplex networks and to improve efficacy for targeting specific gene subgroups, e.g. in disease. This research presents a multiplex framework for additional investigations of design principles in interlayered biological networks. We compiled network data for nine different species. These nine species span two domains of life, namely bacteria and eukaryotes. There are three bacteria-H. pylori, M. tuberculosis and E. coli -and six eukaryotes-S. cerevisiae, C. elegans, D. melanogaster, A. thaliana, M. musculus and H. sapiens. For the eukaryotes, we collected three datasets-one TRN and two PPI networks. Among the bacteria, E. coli also has three datasets (one TRN and two independent PPI networks), while H. pylori and M. tuberculosis have one TRN and one PPI networks each. These datasets have been collected from diverse sources (see Table S1 , Supplementary Information Additional File 2). We use PPI data from multiple published sources-speciesspecific publications [15, 18, 24] , BioGRID database [14] and HINT database [57] . Different experimental methods uncover different information about PPI networks [26] . Therefore, we only use PPI networks inferred from Yeast two-hybrid (Y2H) experiments [82] . Y2H infers binary protein-protein interactions and is a prominent strategy for identifying protein-protein interactions [83] . BioGRID and HINT do not have data for H. pylori and M. tuberculosis. PPI networks for these bacteria were collected from individual publications, Huser et al. [15] and Wang et al. 2010 [24] respectively. E. coli only exists in HINT. We include another published PPI network for E. coli [18] . A consolidated database of TRN networks across species does not exist. Therefore, we collected protein-DNA interactions from different publications. References for all the species are given in Table S1 , Supplementary Information Additional File 2. Available TRN and PPI networks are incomplete. Consequently, they only contain a fraction of the total number of possible genes and proteins in the genome and proteome, respectively. Further, TRN and PPI networks used in this study have different numbers of genes and proteins (see Table S1 , Supplementary Information Additional File 2) . For a given species, we have only considered genes and proteins which are present in both TRN and PPI networks in our analysis. The following characteristics of the TRN and PPI networks used in this study are included in Table S1 , Supplementary Information Additional File 2-Number of genes and proteins, % proteome coverage in the TRN-PPI multiplex (fraction of the total proteome covered in the multiplex), number of network edges (edges represent connections between genes and proteins in TRN and PPI networks respectively), average degrees (average K in PPI and average kin or kout in TRN) and size of the Largest Connected Component (LCC) (subset of genes/proteins in TRN/PPI where every gene/protein is reachable from every other gene/protein) are shown for all nine species. List of essential genes was collected for three species (yeast, fly and human) from the Online GEne Essentiality (OGEE) database [60, 61] . The database has gene essentiality information on 48 species. We collected human-pathogen protein-protein interaction data for 12 different pathogens from a publicly available database HPIDB 3.0 [62, 63] . This curated database contains 69,787 unique protein interactions between 66 host and 668 pathogen species. Human-pathogen protein interactions for various human coronaviruses (HCoVs) were collected from a recently published paper [64] . We collected disease-related genes from a publicly available database DisGeNET [65, 66, 67] . The current version (v6.0) contains gene-disease associations between 17,549 genes and 24,166 diseases, disorders, traits, and clinical or abnormal human phenotypes. We collected disease-gene associations for diseases which have at least 100 genes in the human multiplex considered in this work. We collected the set of genes which contain mutations which have been causally implicated in cancer from the Network of Cancer Genes (NCG) [68] . NCG also includes information on whether a given cancer gene is an oncogene or TSG. Multiplex formulation of transcriptional regulatory and protein-protein interaction networks TRN and PPI networks are modeled as interdependent networks ( Figure 1A) . TRN layer encodes the transcriptional program for producing proteins from genes. The proteins translated from the TRN layer participate in protein-protein interactions in the PPI layer. There is one-to-one correspondence between genes and proteins in the TRN and PPI network layers. This specific configuration of interdependent networks can be reduced to a multiplex network [43] , and we can apply the framework developed by Buldyrev et al. 2010 [43] . We use graph theory to model and analyze TRN-PPI multiplex in this work. TRN and PPI networks are modeled as graphs with nodes representing genes and proteins respectively ( Figure 1A) . Connections between nodes are represented by edges. PPI edges are undirected. Edges in TRN have directionality-transcription factors have edges emanating from them, while downstream genes have incoming edges. The connectivity pattern of edges is quantified by the concept of degree at each node. In PPI networks, degree (K) is the number of edges incident on a protein. For TRN, in-degree (k in ) is the number of transcription factors upstream of a gene, and out-degree (k out ) is the number of genes downstream of a transcription factor. Since TRN and PPI layers have different coverage of the genome and proteome (see Table S1 , Supplementary Information Additional File 2) , all analysis was done with genes and proteins present in both TRN and PPI networks. Quantifying multiplex coupling Degree-degree coupling We quantify degree-degree coupling (C D ) using either Pearson's correlation or Spearman's rank correlation coefficient (Eqn. 1). where cor() is the sample Pearson's correlation or Spearman's rank correlation coefficient, k out is the TRN out-degree and K is the PPI degree. Redundancy coupling (C R ) is quantified by the number of edges simultaneously present in TRN and PPI. Assume that G 1 and G 2 are graphs representing TRN and PPI networks respectively, and V 1 and V 2 are the corresponding vertex sets. Let E(G 1 ) and E(G 2 ) be the edge sets for G 1 and G 2 respectively. The elements of the edge sets are vertex pairs. For instance, (V 1 i , V 1 j ) ∈ G 1 means that gene i is a transcription factor regulator for gene j, (V 2 i , V 2 j ) ∈ G 2 means that proteins i and j interact with each other. Interactions common between TRN and PPI can be mathematically represented by the number of common edges between G 1 and G 2 (Eqn. 2). where Edges 12 is the number of edges common between G 1 and G 2 and e represents an edge either in G 1 or G 2 . In Figure 2 , we compute node-specific redundancy coupling. Here, C R for each gene-protein is equal to the number of redundant edges incident on that gene-protein pair. C R is either calculated as a z-score, which is computed as where Edges null 12 is the number of redundant edges in a null model, or C R = Edges 12 or C R = mean(Edges 12 ) − mean(Edges null 12 ). The definition of C R used is specified in each figure's caption. Quantifying multiplex robustness We use MCGC to quantify multiplex robustness [43] . MCGC is the set of genes and proteins which are simultaneously connected in both the network layersevery gene/protein in MCGC is reachable from every other gene/protein in MCGC. MCGC is computed by finding the intersection between the largest connected components (LCCs) of the TRN and PPI network layers. To quantify response to targeted attack, we track the size of the largest MCGC at each step of the attack. We simulate attack on the multiplex via the following algorithm. 1 Compute multiplex degree for all gene-protein pairs in the multiplex. Multiplex degree is defined as K mult (i) = max(K(i), k out (i)) [58] , where K mult (i) is the multiplex degree for gene-protein pair i, K(i) is the degree of protein i in the PPI network and k out (i) is the out-degree of gene i in the TRN network. Order multiplex degrees into a list of gene-protein pairs, D, arranged in decreasing order of multiplex degree. This algorithm will generate a sequence of values, which give the trajectory of the MCGC as the multiplex is successively attacked. If we plot this trajectory as a function of the fraction of gene-protein pairs removed from the multiplex, robustness to attack can be assessed from either the area under the curve or the critical number of nodes removed for which the MCGC is fragmented [46] . We use area under the curve (RobustArea) as the measure for multiplex robustness. Thus, RobustArea is given as where RobustArea ∈ (0, 1] is the area under the curve, f is the fraction of geneprotein pairs removed during the attack, and RM CGC(f ) is the size of MCGC relative to the total number of gene-protein pairs in the multiplex (n) after f fraction of gene-protein pairs have been removed from the multiplex. RobustArea quantifies absolute robustness. Relative robustness (R) is computed by comparing RobustArea against a null model. Cohens d is used to compute effect size of relative robustness (Eqn. 2). Where R is the relative robustness, RobustArea obs and RobustArea null are the RobustArea values for the observed multiplex and null model respectively and mean() and var() are the mean and variance functions respectively. We have assumed that RobustArea obs and RobustArea null have the same number of samples. Robustness for partial attack curves For Figure 5 , we estimate robustness for a subset of functionally important geneprotein pairs and compare that against the remaining gene-protein pairs in the multiplex. Here, we explain the strategy to conduct such a comparison. Assume that S = {S 1 , S 2 , ..., S M } is a collection of sets of gene-protein pairs in a multiplex. Here M is the total number of sets. Sets S i , i ∈ {1, 2, ..., M }, can be mutually exclusive or not. Let M min be the size of the smallest set in S. For an equitable comparison, we randomly sample M min number of gene-protein pairs from all the subsets, except for the smallest subset. We sample each subset 100 times. For each sampled version of a subset S i ∈ S, we attack the gene-protein pairs in S i in decreasing order of multiplex degree for gene-protein pairs in that set, using the attack algorithm explained previously. We stop the attack once M min number of gene-protein pairs have been removed from the multiplex. We also perform a similar partial attack on a set of randomly selected gene-protein pairs. This size of the random set is set equal to M min . Robustness can be calculated for each subset based on the obtained partial attack curves. Relative robustness for each subset is calculated by comparing RobustArea for that subset against the random set. Figure S5 , Supplementary Information, Additional file 1. C out D : degree-degree coupling between kout and K, C in D : degree-degree coupling between k in and K. C) (Left) Dependence of robustness on degree-degree coupling while redundancy is fixed. For each species curve, R is calculated by comparison against the sampled multiplex with C D = 0. (Right) Dependence of robustness on redundancy coupling while degree-degree coupling is fixed to the value in the full multiplex. For each species curve, R is calculated by comparison against the sampled multiplex with C R = 0. In all the panels, error bars show 95% CIs. In panel B, BioGRID PPI networks are used; results with HINT PPI networks are shown in Figure S6B , Supplementary Information, Additional File 1. For panels B and C, C R is set equal to the number of redundant edges in the multiplex. Figure 5 Functionally important genes and proteins are redundant and essential for human multiplex. A) (Left) Essential genes and proteins (red) have lower robustness to targeted attack across species compared to non-essential genes and proteins (blue). (Right) Lower R is accompanied by higher redundancy (C R ) for essential genes and proteins compared to non-essential genes and proteins. B) (Left) Pathogen-related genes and proteins (red) have lower R to targeted attack across pathogens compared to non-pathogen genes and proteins (blue). (Right) Lower R is accompanied by higher C R for pathogen genes and proteins compared to non-pathogen genes and proteins. C) (Left) Disease-related genes and proteins (red) have lower R to targeted attack across diseases compared to non-disease genes and proteins (blue). (Right) Lower R is accompanied by higher C R for disease genes and proteins compared to non-disease genes and proteins. D) (Left) oncogenes (tumor suppressor genes (TSGs)) and proteins (red) have lower R to targeted attack across pathogens compared to non-oncogenes (-TSGs) and proteins (blue). (Right) Lower R is accompanied by higher C R for oncogenes (TSGs) and proteins compared to non-oncogenes (-TSGs) and proteins. In all the panels, database used for PPI networks is annotated on the y-axis (Methods). In all the panels, relative robustness (R) is measured against a random set of genes and proteins used as the null (Methods), and C R values are calculated as the difference in redundancy against the random set of genes. Error bars show 95% CIs. For all the panels, C R is calculated as the mean difference in the number of redundant edges between the species' multiplex and the random set of gene-protein pairs. Network biology: understanding the cell's functional organization YEASTRACT: providing a programmatic access to curated transcriptional regulatory associations in Saccharomyces cerevisiae through a web services interface Constructing transcriptional regulatory networks A gene-centered C. elegans protein-DNA interaction network Topological and causal structure of the yeast transcriptional regulatory network Evidence for dynamically organized modularity in the yeast proteinprotein interaction network An Arabidopsis transcriptional regulatory map reveals distinct functional and evolutionary features of novel transcription factors Transcriptional regulatory networks in Saccharomyces cerevisiae Network motifs: simple building blocks of complex networks A compendium of Caenorhabditis elegans regulatory transcription factors: a resource for mapping transcription regulatory networks A core transcriptional network for early mesoderm development in Drosophila melanogaster Network motifs in the transcriptional regulation network of Escherichia coli Structure and evolution of transcriptional regulatory networks The BioGRID interaction database: 2017 update A second-generation proteinprotein interaction network of helicobacter pylori Lethality and centrality in protein networks DroID 2011: a comprehensive, integrated resource for protein, transcription factor, RNA and gene interactions for Drosophila The binary protein-protein interaction landscape of Escherichia coli Towards a proteome-scale map of the human proteinprotein interaction network A network of proteinprotein interactions in yeast A human protein-protein interaction network: a resource for annotating the proteome STRING v10: proteinprotein interaction networks, integrated over the tree of life Global protein function prediction from protein-protein interaction networks Global proteinprotein interaction network in the human pathogen Mycobacterium tuberculosis H37Rv Functional and topological characterization of protein interaction networks High-quality binary protein interaction map of the yeast interactome network Functional cartography of complex metabolic networks The large-scale organization of metabolic networks Metabolic network structure determines key aspects of functionality and regulation Evolutionary dynamics of prokaryotic transcriptional regulatory networks Genomic analysis of regulatory network dynamics reveals large topological changes Drugtarget network Network medicine: a network-based approach to human disease The human disease network Network pharmacology: the next paradigm in drug discovery Human symptomsdisease network An extensive network of coupling among gene expression machines Modular biological function is most effectively captured by combining molecular interaction data types Integration of external signaling pathways with the core transcriptional network in embryonic stem cells Integrating transcriptional and protein interaction networks to prioritize condition-specific master regulators Network motifs in integrated cellular networks of transcriptionregulation and proteinprotein interaction Catastrophic cascade of failures in interdependent networks Percolation of partially interdependent networks under targeted attack Correlated multiplexity and connectivity of multiplex random networks Breakdown of interdependent directed networks Interdependent networks: Reducing the coupling strength leads to a change from a first to second order percolation transition Inter-similarity between coupled networks Percolation in real interdependent networks Abrupt transition in the structural formation of interconnected networks Avoiding catastrophic failure in correlated networks of networks Percolation theory on interdependent networks based on epidemic spreading Simultaneous first-and second-order percolation transitions in interdependent networks Multilayer Networks: Structure and Function Network robustness of multiplex networks with interlayer degree correlations Percolation in multiplex networks with overlap HINT: High-quality protein interactomes and their applications in understanding human disease Geometric correlations mitigate the extreme vulnerability of multiplex networks against targeted attacks Statistical Power Analysis for the Behavioral Sciences OGEE v2: an update of the online gene essentiality database with special focus on differentially essential genes in human cancer cell lines OGEE: an online gene essentiality database Hpidb-a unified resource for host-pathogen interactions Hpidb 2.0: a curated database for host-pathogen interactions Network-based drug repurposing for novel coronavirus 2019-ncov/sars-cov-2 Disgenet: a discovery platform for the dynamical exploration of human diseases and their genes Disgenet: a comprehensive platform integrating information on human disease-associated genes and variants The disgenet knowledge platform for disease genomics: 2019 update The network of cancer genes (ncg): a comprehensive catalogue of known and candidate cancer genes from cancer sequencing screens The interdependent network of gene regulation and metabolism is robust where it needs to be Robustness and lethality in multilayer biological molecular networks Structural reducibility of multilayer networks Effect of degree correlation on exact controllability of multiplex networks Efficient exploration of multiplex networks Specificity and stability in topology of protein networks Network-based in silico drug efficacy screening Uncovering disease-disease relationships through the incomplete interactome Scale-free networks in cell biology Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles From specific gene regulation to genomic networks: a global analysis of transcriptional regulation in Escherichia coli Scale-free networks Control of multilayer biological networks and applied to target identification of complex diseases A novel genetic system to detect proteinprotein interactions Yeast two-hybrid, a powerful tool for systems biology Degree-degree coupling (C D ) and redundancy coupling (C R ) can assume any value in the multiplex of TRN (yellow layer) and PPI (green layer) networks across species. A) (Right, Top) For C D > 0, highly connected genes in TRN (red spheres) are more likely to produce hubs or highly connected proteins in PPI (red spheres), while sparsely connected genes in TRN (blue spheres) are highly likely to produce spokes or sparsely connected proteins in PPI (blue spheres). (Right, Bottom) For C D = 0, TRN and PPI will be uncoupled in the multiplex, and the association between genes and proteins would be randomized. Here, highly connected genes produce spoke proteins and vice-versa sparsely connected genes produce hub proteins. (Left) Based on theoretical studies, C D is expected to be positively correlated with robustness. Robustness is quantified by area under the attack curve for the Mutually Connected Giant Component (MCGC) (Methods). B) (Left, Top) For C R > 0, there will be non-zero number of edges which are simultaneously present in both TRN and PPI (edges marked green). (Left, Bottom) For C R = 0, TRN and PPI will not have common edges We are grateful to the members of the Dar lab for fruitful discussions on the manuscript. All TRN and PPI networks are provided as an R programming language [84] data object ("NetworkMultiplex.RData" in Additional File 3). Data for all the pathogens and diseases are also provided as R programming language data objects in Additional file 3. The authors declare that they have no competing interests. Table S1 . Supplementary table listing the properties and sources for the TRN and PPI networks used in this study.Additional File 3 -Code. Custom scripts written in the R programming language [84] for reproducing