key: cord-0272487-kvqtaj3n
authors: Jiang, Xiong-Fei; Xiong, Long; Bai, Ling; Lin, Jie; Zhang, Jing-Feng; Yan, Kun; Zhu, Jia-Zhen; Zheng, Bo; Zheng, Jian-Jun
title: Structure and dynamics of human disease-complication network
date: 2021-12-14
journal: bioRxiv
DOI: 10.1101/2021.12.13.472342
sha: 414605cb63e55302c7b9cc167390967f90495f6c
doc_id: 272487
cord_uid: kvqtaj3n

A complication is an unanticipated disease that arises following, or is induced by, a disease, a treatment or a procedure. We compile the Human Disease-Complication Network from medical data and investigate the characteristics of the network. We observe that the modules of the network are dominated by the classes of diseases. The relations between modules are unveiled in detail. Three nontrivial motifs are identified in the network. We further simulate the dynamics of the motifs with the Boolean dynamic model. Each motif represents a specific dynamic behavior, which is potentially functional in the disease system, such as generating temporal progressions and governing the responses to fluctuating external stimuli.

Author summary

Advances in molecular biology have led to the new discipline of network medicine, which investigates human diseases from a networked-structure perspective. Recently, clinical records have been introduced into research on complex networks of diseases. An important available medical dataset that has been overlooked so far is the complications of diseases, which are of vital importance to patients. We compile the Human Disease-Complication Network, representing the causality between upstream diseases and their downstream complications. This work not only helps us to comprehend why certain groups of diseases appear collectively, but also provides a new paradigm for investigating the dynamics of disease progression. For clinical applications, the investigation of complications may yield new approaches to disease prevention, diagnosis and treatment.
Advances in molecular biology lead to a new discipline of network medicine, which has [...] between disease genes and disease phenotypic features has brought the concept of diseasome [1, 29-31].

To go beyond global features, motifs are used to characterize the local properties of these biological networks. The pioneering work by Milo et al. identifies motifs in networks from biochemistry, neurobiology, ecology, and engineering [32]. Since then, motifs have been observed in many biological networks [33-35], and their dynamics have been investigated further [34, 36].

A critical medical dataset that has been overlooked so far is the complications of diseases, which are vital for patients in clinical practice. For example, complications may affect physicians' clinical decisions in certain circumstances [37]. A complication is an unanticipated disease that arises following, or is induced by, a disease, a treatment or a procedure. Diseases form a networked structure with their complications. For example, COVID-19 generates numerous complications, such as venous thromboembolism and acute kidney injury [38]. We compile the Human Disease-Complication Network (HDCN), representing the causality between upstream diseases and their downstream complications. We systematically investigate the structure of the network, paying particular attention to the disease modules. Taking into account the complexity of network dynamics, we further study the motifs, with which the dynamics of the disease system can be depicted. This work not only helps us understand how different medical subdisciplines are organized, but also provides a comprehensive understanding of why certain groups of diseases appear collectively.

Materials

We collected data from the Clinical Medicine Knowledge Database, which includes 6715 diseases. The complications of diseases are extracted from the descriptions of diseases in the database.
A node denotes a disease, and a directed link from the i-th node to the j-th node is drawn if the i-th disease generates the j-th complication. Then, we [...]

The k-core analysis uncovers a nucleus of the network, i.e., a set of high-degree nodes that are connected to each other. It has been widely used in networks to identify kernels more robustly than a simple ranking of centrality measures [39, 40]. The k-core of a network consists of the nodes $i$ with degree $k_i \geq k$, and it can be extracted by iteratively removing all nodes $i$ with degree $k_i < k$ when $k > 0$.

Motifs are defined as local patterns that occur in the real network significantly more frequently than in randomized networks with the same degree sequence [32, 34]. For any given network, the occurrence number $N_g$ of the g-th connected subset is related to the network size and the degree distribution. To measure the statistical significance, a randomized ensemble of networks is generated as a null model [32]. The statistical significance Z score is defined as

$$Z_g = \frac{N_g - \langle N_g \rangle}{\sigma},$$

where $N_g$ is the number of appearances of the g-th subset in the real HDCN, and $\langle N_g \rangle$ and $\sigma$ are the average number of appearances and its standard deviation in the randomized ensemble of networks. The motifs are those subsets that appear with significantly higher frequency in the real network than in the randomized ones, as measured by the Z score. In this paper, we mainly focus on the 3-node and 4-node connected subsets.

Boolean dynamics

Boolean dynamics is a common model in gene regulatory networks and signal transduction networks [41, 42]. Each node in a Boolean network represents a sub-cellular component such as a protein, gene, transcription factor or metabolite. The state of an input node $i$ is described by a binary value $X_i$: $X_i = 1$ means that component $i$ is active or expressed, and $X_i = 0$ means that it is inactive or not expressed.
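The k-core pruning and the motif Z score defined earlier in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used to produce the results reported here: links are treated as undirected when computing the degree $k_i$, the population standard deviation is used for $\sigma$, and the function names, toy edges and motif counts are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def k_core(edges, k):
    """Return the node set of the k-core by iteratively removing nodes with degree < k.

    Assumption: links are treated as undirected and self-loops are ignored.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    alive = set(neighbors)
    changed = True
    while changed:
        changed = False
        for node in list(alive):
            # Degree counted only among nodes that have not been removed yet.
            if len(neighbors[node] & alive) < k:
                alive.remove(node)
                changed = True
    return alive


def z_score(n_real, n_random):
    """Z = (N_g - <N_g>) / sigma, with <N_g> and sigma taken over the randomized ensemble.

    Assumption: sigma is the population standard deviation of the ensemble counts.
    """
    sigma = pstdev(n_random)
    if sigma == 0:
        raise ValueError("ensemble counts have zero variance; Z score is undefined")
    return (n_real - mean(n_random)) / sigma


# Hypothetical toy network: a triangle a-b-c with a pendant node d.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]
print(sorted(k_core(toy_edges, 2)))

# Hypothetical subgraph counts: 10 in the real network, 4 and 6 in two randomized ones.
print(z_score(10, [4, 6]))
```

In practice the randomized ensemble would contain many degree-preserving rewirings of the real network; two counts are used here only to keep the arithmetic easy to follow.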
The state of node $j$ at time $t + 1$, $X_j(t + 1)$, is determined by a logic operation on the current states $X_i(t)$ of its upstream regulators. The logic operation is the Boolean update function, expressed either with logic operators or by comparing a weighted sum of the inputs with an activation threshold. The output nodal dynamics is described by a differential equation like [...] where $F(X_1, T_{1j}; \cdots; X_i, T_{ij})$ is the Boolean update function.

Here, we introduce the Boolean dynamic model to simulate the progression of complications in the HDCN. Disease progression in the HDCN is comparable to other biological processes, such as those in gene regulatory networks. A complication is generated by the collective effect of upstream diseases, and it will presumably self-cure after the upstream diseases are cured. It is therefore reasonable to use the Boolean dynamic model to investigate the complication dynamics. Accordingly, $X_i$ represents the activation of the i-th upstream disease, $T_{ij}$ is the activation threshold of the i-th disease for the j-th complication, $\alpha$ is the lifetime of the cured disease, and $F(X_1, T_{1j}; \cdots; X_i, T_{ij})$ denotes the summarized effect of all $i$ upstream diseases on the j-th complication.

Qualitatively, the HDCN consists of a giant connected component and very few small disconnected components (see Fig 1) [...] diseases. The in-degree distribution decays faster than the out-degree distribution, but still deviates significantly from the Poisson distribution expected for a random graph.

In the HDCN, diseases form a streaming structure: a disease both generates complications and arises as a complication of other diseases. Therefore, we use the in-degree $k_{in}$ and out-degree $k_{out}$ to quantitatively categorize the diseases into three structural levels, i.e., upstream, intermediate and downstream. The diseases with the highest $k_{out}$ and $k_{in}$ are listed in Table 1.
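As a minimal sketch of the degree bookkeeping described above, the following Python snippet counts $k_{out}$ and $k_{in}$ from a directed edge list and ranks nodes by out-degree. The edge list and node labels are hypothetical placeholders rather than data from the HDCN, and the function names are illustrative only.

```python
from collections import Counter


def degree_table(edges):
    """Map each node to (k_out, k_in) for a directed edge list of (source, target) pairs."""
    unique_edges = set(edges)  # ignore repeated links
    k_out = Counter(source for source, _ in unique_edges)
    k_in = Counter(target for _, target in unique_edges)
    nodes = set(k_out) | set(k_in)
    # Counter returns 0 for nodes that never appear as a source (or target).
    return {node: (k_out[node], k_in[node]) for node in nodes}


def top_by_out_degree(table, n):
    """Return the n nodes with the largest k_out, ties broken by node name."""
    return sorted(table, key=lambda node: (-table[node][0], node))[:n]


# Hypothetical toy network: an edge (i, j) means disease i generates complication j.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"), ("C", "D")]
toy_table = degree_table(toy_edges)
print(toy_table["A"])  # (k_out, k_in) of the most upstream toy node
print(top_by_out_degree(toy_table, 2))
```

In this toy example node A only generates complications and node D only receives them, which mirrors the upstream and downstream levels, while B and C sit in between.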
A node with high $k_{out}$ and low $k_{in}$ is an upstream disease, such as Acute Lymphoblastic Leukemia ($k_{out} = 16$, $k_{in} = 1$), since it can lead to other diseases yet is hardly produced by others. On the contrary, a node with low $k_{out}$ and high $k_{in}$ is a downstream disease, such as Pneumonia ($k_{out} = 0$, $k_{in} = 181$), [...]

To quantify the correlation between the in-degree and out-degree of nodes, we compute the Spearman rank correlation coefficient SC between these two rankings [43]. A negative value of SC = −0.26 indicates that the in-degree and out-degree of the same node are significantly asymmetric, i.e., a disease connected with more upstream diseases usually results in fewer downstream complications, and vice versa.

With the k-core method [39, 40], we identify a small, well-connected nucleus consisting of 98 diseases with the maximal coreness of 7 (see S1 Table). The [...]

Although the HDCN layout is generated without any a priori knowledge of disease classes, it is naturally and visibly clustered according to the major disease classes. To understand this clustering quantitatively, we identify the community structure of the HDCN. In a complex network, the community structure is the grouping of nodes into clusters with a high density of internal links and a relatively low density of links between clusters [44, 45].

The out-degree and in-degree distributions P(k) of the HDCN. In-degree is the number of edges pointing to a vertex, and out-degree is the number of edges pointing away from it. The probability distribution of the out-degree follows an approximate power law, while that of the in-degree decays exponentially.

[...] distributed in the related organs, rather than forming a single module as in the gene-disease network [30].
This difference arises because cancers in the former usually cause complications in the related organs, while cancers in the latter may share the same genes and thus form dense connections with each other. Gynecology, Nephrology and Urology are grouped into a single module, since all three belong to the genitourinary system, in which the related organs are physiologically close and interlinked by the blood supply and some meatus. In a similar vein, the diseases in Endocrine, Metabolic, Hematology and Rheumatology form a single module, which has a global effect on the whole human disease system.

To quantify the influence of modules on the progression of complications, we define the complicating ability of the p-th module as [...] where $\kappa^{out}_p$ and $\kappa^{in}_p$ are the total numbers of out-links and in-links of the p-th module that connect to other modules. A high value of $C_p$ suggests that the module is very likely to trigger downstream complications, while a negative $C_p$ with a high absolute value means that the module is prone to be caused by upstream diseases. As shown in Table 2, Orthopedics, Dermatology, and Cancer are the modules with the highest $C_p$. Therefore, these three are the most influential disease modules, with a strong capability to generate downstream diseases. Meanwhile, General Disease, [...] Table 3. [...] disease layer. The $k_{out}$ and $k_{in}$ of the four nodes in the bi-fan are shown in Table 4. [...] has not been observed in other networks. As shown in Table 3, the OFFL consists of [...] The frequency of motifs appeared significantly higher than in a random network, [...]

For the AND-gate, the differential equations are written as [...] and [...] where $F(u, T) = \frac{u/T}{1 + u/T}$. In the remainder of this paper we will adopt [...] gene regulation networks and neuronal connectivity networks [34]. Therefore, the AND-gate of the FFL is probably a general mechanism to protect biological functions.
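To make the AND-gate behavior of the FFL concrete, the following Python sketch integrates a small upstream (U), intermediate (M), downstream (D) system with the forward Euler method. Only the activation function $F(u, T) = (u/T)/(1 + u/T)$ is taken from the text. The form of the rate equations, $dM/dt = F(U, T_{UM}) - \alpha M$ and $dD/dt = F(U, T_{UD})\,F(M, T_{MD}) - \alpha D$, is an assumption modeled on the standard feed-forward-loop literature [36], with $\alpha$ used as a decay rate, and all parameter values are hypothetical.

```python
def activation(u, threshold):
    """F(u, T) = (u/T) / (1 + u/T), the saturating activation function given in the text."""
    x = u / threshold
    return x / (1.0 + x)


def simulate_ffl_and(u_of_t, t_end=20.0, dt=0.01, alpha=1.0,
                     t_um=0.5, t_ud=0.5, t_md=0.5):
    """Forward-Euler integration of an assumed AND-gate FFL (U -> M, U -> D, M -> D).

    Assumed equations (not confirmed by the source):
        dM/dt = F(U, T_UM) - alpha * M
        dD/dt = F(U, T_UD) * F(M, T_MD) - alpha * D
    Returns a list of (M, D) values, one per time step.
    """
    m = 0.0
    d = 0.0
    trajectory = []
    steps = int(round(t_end / dt))
    for step in range(steps):
        u = u_of_t(step * dt)
        dm = activation(u, t_um) - alpha * m
        dd = activation(u, t_ud) * activation(m, t_md) - alpha * d
        m += dt * dm
        d += dt * dd
        trajectory.append((m, d))
    return trajectory


# Sustained upstream disease versus a brief pulse (hypothetical stimuli).
sustained = simulate_ffl_and(lambda t: 1.0)
brief = simulate_ffl_and(lambda t: 1.0 if t < 0.2 else 0.0)
print(max(d for _, d in sustained), max(d for _, d in brief))
```

Under these assumptions, the downstream level D rises only after M has accumulated, so a brief pulse of U produces a much smaller peak in D than a sustained stimulus does.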
For the OR-gate, Eq 5 is substituted by [...] If not stated otherwise, this function is adopted for all OR-gates. The results are exhibited in [...]

In the bi-fan motif, the upstream diseases $U_{1,2}$ have two affecting pathways, $U_1 \to D_{1,2}$ and $U_2 \to D_{1,2}$. Likewise, $U_1$ and $U_2$ may act in an AND-gate or OR-gate manner to control $D_{1,2}$.

For the OR-gate, both $D_{1,2}$ can be activated by $U_1$ and $U_2$ independently, i.e., $U_1 \to D_{1,2}$ and $U_2 \to D_{1,2}$. The differential equations of the bi-fan are written as [...] and [...] As shown in Fig 5, [...] when $U_1$ and $U_2$ co-occur simultaneously will the morbidity of $D_1$ and $D_2$ be high. To some extent, the FFL motif provides a temporal mechanism to prevent the wild [...]

For the AND-gate, both $D_{1,2}$ should be activated by $U_1$ and $U_2$ jointly. Eq 6 and 7 are substituted by [...] simultaneously. The response rates of $D_{1,2}$ in the OR-gate are higher than those in the AND-gate, while the relaxation rates of $D_{1,2}$ in the OR-gate are slower than those in the AND-gate.

Overlapping feedforward loop

The OFFL has a more complicated structure than the other two motifs. For simplicity, we assume that $U_1$ and $U_2$ always act in an OR-gate manner to control $M$ and $D$. However, $M$ still acts with $U_1$ and $U_2$ in either an AND-gate or an OR-gate manner to control $D$. We discuss both cases below.

For a mixed model of AND-gate and OR-gate, the differential equations of the OFFL are written as [...] and [...] The results are shown in Fig 6. The progression level of the disease $D$ in the OFFL is [...] For the OR-gate, Eq 9 is written as [...]

[...] vaginal versus a cesarean delivery) are influenced by such heuristics [37]. If the prior patient had complications in one delivery mode, the physician will be more likely to switch to the other, and likely inappropriate, delivery mode for the subsequent patient, regardless of that patient's indicators.
More importantly, this strategy has small but significant negative effects on patient health outcomes and increases resource use. [...] which have widespread effects on the whole disease system.

In parallel, many methods have been introduced for the classification of human diseases, such as machine learning [46], integration of phenotypic similarity with genomics [47], pathway-based classification [48] and consensus-based techniques [49]. However, contemporary approaches usually do not consider the interactions among diseases [50]. This failure stems partly from the focused nature of medical training and from the reductionist paradigm of modern medicine. To overcome this shortcoming, the network framework has been applied to define human disease [29, 51]. In our work, the disease modules clustered in the HDCN may provide complementary information for classifying human diseases more accurately.

In this paper, the HDCN is constructed from medical data. We investigate its topological characteristics, including the degree distribution, clustering coefficient and k-core. Further, we identify the disease modules, which are dominated by the classes of diseases. The relations between modules are unveiled in detail.
[1] Exploring the human diseasome: the human disease network
[2] Systems biology and the future of medicine
[3] Network medicine: A network-based approach to human disease
[4] Computational network biology: data, models, and applications
[5] Dynamics-based data science in biology
[6] Comparative host-coronavirus protein interaction networks reveal pan-viral disease mechanisms
[7] A scored human protein-protein interaction network to catalyze genomic interpretation
[8] Towards a proteome-scale map of the human protein-protein interaction network
[9] A human protein-protein interaction network: a resource for annotating the proteome
[10] The transcriptional landscape of the mammalian genome
[11] Metabolic network analysis reveals microbial community interactions in anammox granules
[12] The large-scale organization of metabolic networks
[13] The small world of metabolism
[14] Global reconstruction of the human metabolic network based on genomic and bibliomic data
[15] A network of noncoding regulatory RNAs acts in the mammalian brain
[16] Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets
[17] Rational siRNA design for RNA interference
[18] NetworKIN: a resource for exploring cellular phosphorylation networks
[19] Multiscale analysis of independent Alzheimer's cohorts finds disruption of molecular, genetic, and clinical networks by human herpesvirus
[20] A gene-coexpression network for global discovery of conserved genetic modules
[21] A genetic network mediating the control of bud break in hybrid aspen
[22] Exploring genetic interactions and networks with yeast
[23] Quantitative genetic interactions reveal biological modularity
[24] Using electronic patient records to discover disease correlations and stratify patient cohorts
[25] Comorbidity network for chronic disease: A novel approach to understand type 2 diabetes progression
[26] A dynamic network approach for the study of human phenotypes
[27] Probing genetic overlap among complex human phenotypes
[28] Human symptoms-disease network
[29] The multiplex network of human diseases
[30] The human disease network
[31] Diseasome and comorbidities complexities of SARS-CoV-2 infection with common malignant diseases
[32] Network motifs: simple building blocks of complex networks
[33] Network motifs: theory and experimental approaches
[34] Network motifs in the transcriptional regulation network of Escherichia coli
[35] The origin of motif families in food webs
[36] Structure and function of the feed-forward loop network motif
[37] Heuristics in the delivery room
[39] Identification of influential spreaders in complex networks
[40] Quantifying the social structure of elites in ancient China
[41] Simulation of prokaryotic genetic circuits
[42] Robustness in simple biochemical networks
[43] The proof and measurement of association between two things
[44] Finding and evaluating community structure in networks
[46] Data driven approach for eye disease classification with machine learning
[47] Disease classification: from phenotypic similarity to integrative genomics and beyond
[48] Pathway-based classification of genetic diseases
[49] A novel classification system for research reporting in rare and progressive genetic conditions
[50] The human disease network: Opportunities for classification, diagnosis, and prediction of disorders and disease genes
[51] Human disease classification in the postgenomic era: a complex systems approach to human pathobiology

Supporting information

S1 Data availability. The data used for this article are available in the supporting information. The source of the directed network is given in DATA.xlsx, in which the node names are in Chinese.