key: cord-0907476-jih9mpij authors: Martínez, Ricardo; Sánchez-Soriano, Joaquín title: Mathematical indices for the influence of risk factors on the lethality of a disease date: 2021-12-08 journal: J Math Biol DOI: 10.1007/s00285-021-01700-4 sha: 76c935d2f378394ae1b2199dac71c6e13c6f495a doc_id: 907476 cord_uid: jih9mpij We develop a theoretical model to measure the relative relevance of different pathologies of the lethality of a disease in society. This approach allows a ranking of diseases to be determined, which can assist in establishing priorities for vaccination campaigns or prevention strategies. Among all possible measurements, we identify three families of rules that satisfy a combination of relevant properties: neutrality, irrelevance, and one of three composition concepts. One of these families includes, for instance, the Shapley value of the associated cooperative game. The other two families also include simple and intuitive indices. As an illustration, we measure the relative relevance of several pathologies in lethality due to COVID-19. The COVID-19 pandemic is exerting a substantial global impact both from a humanitarian and economic perspective. As of February 2021, the virus has already resulted in over two million deaths. As with many other diseases, it is crucial to determine how preexisting conditions (e.g., other pathologies or genetic factors) influence the evolution and end of the disease. Identifying risk factors is relevant for the design of efficient public health systems and health services, including strategic interventions that can limit the fatal outcomes of a disease. However, when several factors are present and when these do not always occur simultaneously, the following question reasonably arises: which factors are the most relevant to a prognosis? The ability to quantify the relevance of a pathology in the observed mortality is a determinant in the design of a successful strategy to combat a disease. 
This is especially the case when establishing priorities between different population groups in the design of a vaccination campaign. In the case of COVID-19, several papers have examined the influence of preexisting comorbidities on COVID-19 mortality. For example, using logistic regression, Nogueira et al. (2020) evaluated and ranked the risk factors for COVID-19 mortality in Portugal. In the New York City Area, Richardson et al. (2020) studied risk factors for the evolution of hospitalized patients with COVID-19. Bello-Chavolla et al. (2020) used Cox proportional-hazards regression to score risk factors of lethality in patients with COVID-19 in Mexico. Stoian et al. (2020) statistically analyzed patterns of comorbidity, gender, and age in the mortality of patients with COVID-19 in Romania. Therefore, it is of research interest to examine the extent to which each risk factor contributes to COVID-19 mortality. A major aim of epidemiological research is to measure disease occurrence in relation to specific variables, which are known as risk factors. Impact measures are used to assess the contribution of one or several risk factors to the occurrence of incident cases at the population level (Benichou 2007) . Therefore, in a given population, the methodological task of adequately evaluating the impact of risk factors, which are not always present at the same time, on the final outcome of a disease is a relevant mathematical question. The first major problem involves determining how to assess risk factors that contribute to a particular goal. Since we can view risk factors as collaborating to achieve a goal, one possibility is to approach the problem from the perspective of cooperative game theory. In this case, it is first necessary to associate a cooperative game with the problem at hand, and then to apply a solution concept. One of the most commonly used solutions is the Shapley value (Shapley 1953) . 
Its widespread use is due to its relevant properties and simple interpretation (see, for instance, Roth 1988; Algaba et al. 2019b). Another alternative is to use the theory of distributive justice (Rawls 1971; Roemer 1996) to define indices or measures that are closely related to the analyzed attribution problem, and which have properties that make them relevant and suitable in the corresponding context. On many occasions, the two approaches are interrelated, and one studies whether the solutions provided by one approach correspond to some concept or principle of the other. Moreover, these approaches can also be applied to other biological systems in which several factors are implicated in producing a desirable or non-desirable result (e.g., climate change, genetics, or artificial and biological networks). The most popular measures of epidemiological risk are relative risk, the odds ratio, and attributable risk (Levin 1953). For attributable risk, several approaches from the perspective of game theory are available when there are several risk factors. Cox (1985) investigated the problem of risk attribution by considering a risk function from the set of risk factors to [0, 1]. This can be seen as the characteristic function of a game, where the factors play the role of the players and the risk allocation functions are defined. In turn, the author demonstrated that there exists a unique risk allocation function that satisfies three reasonable properties in the epidemiological context: additivity, independence of labeling, and independence of irrelevant factors. Moreover, this risk allocation function is the Shapley value of the risk attribution problem seen as a game. Eide and Gefeller (1995) and Gefeller (1994) introduced sequential and average attributable fractions as measures of risk based on the results of Cox (1985), particularly those related to the Shapley value. Land and Gefeller (1997) and Gefeller et al.
(1998) provided a game theoretic justification of the results in Eide and Gefeller (1995) by means of a set of axioms (symmetry, marginal rationality, and internal marginal rationality) different from those used in Cox (1985) . Likewise, they used the (multifactorial) attributable risk function instead of a general risk function. McElduff et al. (2002) and Llorca and Delgado-Rodriguez (2004) introduced a proportional weighting scheme to distribute attributable risk among the different risk factors, and Rabe and Gefeller (2006) compared this method to the one based on the Shapley value. To finish with the problem of distributing attributable risk, Gefeller (1998, 2000) introduced a multiplicative version of the Shapley value to distribute the attributable risk. Cooperative games have been also applied to attribution problems arising from biological situations, particularly from genetics. For example, Moretti et al. (2007) introduced microarray games to analyze the relevance of genes. The authors proposed the Shapley value of the game as a relevance index for genes based on properties with a genetic interpretation (partnership rationality, partnership feasibility, partnership monotonicity, equal splitting, null gene). Lucchetti et al. (2010) investigated the Shapley value and the Banzhaf value (Banzhaf 1965) for microarray games by considering new relevant properties: symmetry, individual consistency, average loss, and total loss. Moretti et al. (2010) and Cesari et al. (2018) analyzed the co-expression networks of genes to explore the relevance of genes in terms of their relationships with other genes, relying on centrality indices in networks and the Shapley value of an associated game. Microarray games were also applied to study neuroblastic tumors in Albino et al. (2008) and to detect the genes involved in autism in Esteban and Wall (2011) . 
Phylogenetic trees are arborescent schemes that illustrate evolutionary relationships between various species or other entities that are believed to have a common ancestry. These schemes are useful for measuring (genetic) biodiversity. Therefore, if we are interested in preserving biodiversity, the problem of finding adequate measures to quantify the biological diversity of a species or genus is of great biological interest. In one sense, this is an attribution problem where we are interested in knowing which species are responsible for what part of the diversity. For each (unrooted) phylogenetic tree, Haake et al. (2008) defined a game (known as a phylogenetic tree game) and proposed the Shapley value of that game as a suitable measure of the diversity of species. Moreover, the authors characterized the Shapley value of those games by means of a set of axioms that are considered relevant for a diversity measure: Pareto efficiency, symmetry, additivity, and group proportionality. Redding et al. (2008) demonstrated that the Shapley value in Haake et al. (2008) and the fair proportion index (Redding and Mooers 2006) are highly correlated. Also, Hartmann (2013) showed that the Shapley value and the fair proportion index become equivalent when the number of species increases. However, Fuchs and Jin (2015) proved that the Shapley value of the (rooted) phylogenetic tree game and the fair proportion index are, in fact, the same, and Fuchs and Paningbatan (2020) studied the correlation between the (unrooted) Shapley value and the fair proportion index when the β-splitting model is used to generate random phylogenetic trees. More recently, Stahn (2020) studied the main differences between the Shapley values of phylogenetic tree games, and Wicke and Steel (2020) examined the combinatorial properties of phylogenetic diversity index, including the different versions of the Shapley value. As mentioned above, attribution problems are relevant in many other fields. 
For example, Brander et al. (2011) offered an account of the importance of the attribution problem in climate change, and Burger et al. (2020) conducted an in-depth review of the attribution problem in the context of climate change, both from a technical perspective and its legal and policy applications. A final example of attribution problems is in the field of artificial and biological networks. In this case, the attribution problem refers to measuring the contribution of each element of a network to a function for the successful performance of that function. Keinan et al. (2004) used the Shapley value for fair attribution of functional contribution in networks and provided a wide range of potential applications of this approach. In this paper, we approach the epidemiological problem of multifactorial risk attribution, but we directly use the risk profile of individuals in the population, as in the microarray problems in Moretti et al. (2007) and others. This is in contrast to using attributable risk, as in Cox (1985) , Gefeller (1995), or Gefeller (1994) . Moreover, instead of considering solutions for an associated game, we directly consider population data to define indices for the influence of risk factors on the lethality of a disease. Another difference compared to previous approaches is that we only consider individuals who have a specific outcome in the development of the disease and not all possible outcomes. Therefore, we measure the relative influence of risk factors in a particular outcome (e.g., the lethality of a disease). In this way, we obtain a ranking of the influence of risk factors on the outcome of interest (i.e., seeking to identify the risk factors that have the greatest impact on the outcome of interest). To be more precise, in our setting, a problem is determined by a set of pathologies, a set of individuals who have passed away, and a lethality matrix, which specifies the pathologies that led to an individual's death. 
An index is a measure of the lethality relevance of the pathologies as a function of the lethality matrix. Following the axiomatic methodology in the theory of fair distribution (Rawls 1971; Roemer 1996), we investigate whether there exist indices that satisfy combinations of properties which are suitable in this context. In particular, we focus on three axioms: neutrality, irrelevance, and composition. Neutrality says that the mere name of the pathology should not affect the measurement. Irrelevance states that the lethality relevance of an irrelevant pathology is zero. Finally, composition requires that when we bring together data from two subgroups, the index of the total group can be determined by a suitable composition of the indices of the subgroups. We consider three types of composition: additive composition, sized composition, and incidence composition. Additive composition requires that the index is additive with respect to the set of individuals; sized composition requires that the index is weighted additive with respect to the size of the subgroups of individuals; and incidence composition requires that the index is weighted additive with respect to the incidence of pathologies in each subgroup. The combination of neutrality, irrelevance, and one of the composition properties gives rise to a unique family of indices for the influence of risk factors on the lethality of a disease. In this way, we obtain three families of indices. Each of these families has as a member a simple index that is remarkable and easy to interpret. In particular, the family obtained with additive composition contains the equal attribution index, which emerges as the most convenient alternative of that family since it coincides with the Shapley value of the natural associated game. The family obtained with sized composition contains the share index, which measures the proportion of individuals who died with each pathology.
Finally, the family obtained with incidence composition contains the ratio index, which measures the ratio of pathologies out of all possible cases. There are several problems in the literature consisting of a set of attributes and a population whose individuals have at least one of those attributes (see Algaba et al. 2019a ). Then, a game is associated with each problem, and the measure of relevance of each attribute usually coincides with a solution concept of that cooperative game. Efficiency is one of the basic requirements for the solution. One of these problems is the museum pass problem (Ginsburgh and Zang 2003) . Our model also presents several particularities with respect to this problem and those related to it. First, we do not require efficiency in the definition of the lethality influence index. The lethality relevances do not need to add up to a given amount, and different indices may add up differently. And second, we consider the possibility that there are individuals in the population who do not have any of the attributes, that is, there may be individuals who have died without any of the considered pathologies. This possibility is explicitly excluded in Ginsburgh and Zang (2003) , Bergantiños and Moreno-Ternero (2015) , and Dehez and Ginsburgh (2020) . In this sense, our results, if properly applied to each domain, can be also understood as a generalization of these papers, when we eliminate efficiency. Indeed, one of the families of indices we characterize contains the solution proposed by these authors for their frameworks. Finally, to illustrate the application of our theoretical framework, we measure the relevance of several pathologies in the lethality of COVID-19. According to the Novel Coronavirus Pneumonia Emergency Response Epidemiology Team (see Team 2020), the most prominent comorbidities implicated in COVID-19 mortality are hypertension, cardiovascular disease, diabetes, and chronic respiratory disease. 
From their data, we run several simulations to apply the indices we characterize. We find that the relevance of some pathologies differs from the impact reported in their study.

The rest of the paper is organized as follows. Section 2 presents the mathematical model we will use. In Sect. 3, we propose some simple and intuitive indices that are suitable for measuring the relative relevance of pathologies in the lethality of a disease in society, one of which is the equal attribution index. In Sect. 4, we present some properties that we consider relevant in this context. Our main results are given in Sect. 5. We characterize the three families of rules that satisfy the properties. The equal attribution index belongs to one of these families and we show that it coincides with the Shapley value of an associated game. Section 6 illustrates the application of the equal attribution index of lethality relevance to the case of COVID-19. Finally, Sect. 7 offers concluding remarks.

Let $P = \{1, \dots, p\}$ be a set of pathologies. Suppose we want to assess their relevance as the cause of death in a group of individuals $N = \{1, \dots, n\}$. A lethality matrix is a matrix $X \in \{0, 1\}^{n \times p}$ with $n$ rows (one for each individual) and $p$ columns (one for each pathology), where

$$x_{ia} = \begin{cases} 1 & \text{if individual } i \text{ had pathology } a, \\ 0 & \text{otherwise.} \end{cases}$$

We denote by $x_{i\cdot}$ the $i$-th row of $X$, which indicates the pathologies that $i$ has. We also denote by $x_{\cdot a}$ the $a$-th column of $X$, which indicates the individuals with pathology $a$. Let $\mathcal{D}^N$ be the domain of all possible matrices with individuals in $N$. We shall also consider a variable-population generalization of the model. Then, there is a set of potential individuals, which are indexed by the natural numbers $\mathbb{N}$. Let $\mathcal{N}$ be the set of finite subsets of $\mathbb{N}$, with generic element $N$. We denote by $\mathcal{D} \equiv \bigcup_{N \in \mathcal{N}} \mathcal{D}^N$ the class of all possible matrices with variable population.
Although we have only mentioned pathologies in the model, it is possible to consider other risk factors such as gender and age without the need to make any special modifications to the model. Therefore, the model is sufficiently general to analyze other types of biological problems in which the impact of different factors must be assessed. Lethality relevance is measured using an index. It is a mapping $\lambda : \mathcal{D} \to \mathbb{R}^p_{\geq 0}$ that assigns a vector $\lambda(X)$ to each lethality matrix $X \in \mathcal{D}^N$, where $\lambda_a(X)$ is the relevance of pathology $a \in P$ on the lethality.

Example 1 Consider the case of a group of individuals $N = \{1, \dots, 6\}$ who die with one or several pathologies in $P = \{\text{diabetes}, \text{high blood pressure}, \text{bronchitis}\}$. Data are represented using a $6 \times 3$ lethality matrix $X$ whose columns correspond, in this order, to diabetes, high blood pressure, and bronchitis. The first row, $x_{1\cdot} = (0, 1, 1)$, indicates that Individual 1 died with high blood pressure and bronchitis but was not diabetic. We observe that three out of six people had diabetes when they passed away, the same number as those with high blood pressure. Also, in this example, any individual with high blood pressure was affected by bronchitis. At this point, the obvious question is: what is the relevance of each pathology on the lethality of this society?

This section presents several lethality influence indices that can be applied to quantify the relevance of risk factors (preexisting pathologies) on the lethality of a disease. The first lethality influence index is relatively straightforward: relevance is defined as the number of individuals who died having been diagnosed with the pathology. Just counting the number of cases (and ignoring the size of the population, for example) may not be very accurate, but it can be considered as a primary measure in epidemiology. In fact, this indicator has been recurrent in the media during the pandemic.

Count index. For each $X \in \mathcal{D}^N$ and each $a \in P$,

$$\lambda^C_a(X) = \sum_{i \in N} x_{ia}.$$

The next index states that lethality relevance is the share of individuals who died having been diagnosed with a pathology.
The objective of this index is to measure what proportion of the deceased individuals had a certain pathology. This kind of index can also be considered as a primary measure in epidemiology and, in addition, it is common to find it in epidemiological reports and studies (see, for example, Sanyaolu et al. 2020).

Share index. For each $X \in \mathcal{D}^N$ and each $a \in P$,

$$\lambda^S_a(X) = \frac{1}{n} \sum_{i \in N} x_{ia}.$$

The third alternative measures lethality relevance as the share of occurrences of a pathology out of all possible cases. While the previous index only takes into account what proportion of individuals suffered from a certain pathology, this measure captures the fact that individuals may have more than one pathology and, therefore, what it measures is the impact on the total number of pathologies suffered by the deceased. For example, if a certain pathology were present in all individuals, its share index would be 1, but if all the deceased had, in addition, two other different pathologies each, this should be taken into account when evaluating its influence on lethality because it was always accompanied by two other pathologies. In this example, its ratio index would be 0.33, which better captures this circumstance.

Ratio index. For each $X \in \mathcal{D}^N$ and each $a \in P$,

$$\lambda^R_a(X) = \begin{cases} \dfrac{\sum_{i \in N} x_{ia}}{\|X\|_1} & \text{if } \|X\|_1 > 0, \\ 0 & \text{otherwise,} \end{cases}$$

where $\|X\|_1 = \sum_{i \in N} \sum_{b \in P} x_{ib}$ is the total number of pathology occurrences in $X$.

The last index is also simple. It works as follows: assign to each of the deceased a mass of 1 unit; then, split this mass among the pathologies the individual experienced; finally, add across individuals. At this point it is legitimate to wonder why the mass of 1 unit should be equally split among the different pathologies. We must take into account that the mass 1 corresponds to one individual who died with several pathologies. Ex ante, it may not be possible to know the primary cause of the death of such an individual, especially if it is evaluated in conjunction with a disease that is common to all cases (COVID-19, for instance).
When it is not feasible to explore in depth the causes of death of each single person, it is natural and convenient to assume that all pathologies are equally responsible for the decease of that individual. Applying the same reasoning to all individuals and combining the data, it is possible to capture the relative relevance of each pathology in the whole population. That is the goal of the equal attribution index.

Equal attribution index. For each $X \in \mathcal{D}^N$ and each $a \in P$,

$$\lambda^E_a(X) = \sum_{i \in N^*_a} \frac{1}{\|x_{i\cdot}\|_1},$$

where $N^*_a = \{i \in N : x_{ia} = 1\}$ and $\|x_{i\cdot}\|_1 = \sum_{b \in P} x_{ib}$ is the number of pathologies of individual $i$.

Example 2 Continuing with Example 1, we first illustrate the functioning of the equal attribution index. From matrix $X$, we construct a matrix of weights in which each individual has a mass of one unit. Individual 1, represented by the first row, has high blood pressure and bronchitis, and so the mass is split between these pathologies by assigning a weight of $\frac{1}{2}$ each. The procedure is applied to the rest of the population. Finally, we sum the weights to obtain the relevance of each pathology. Thus,

$$\lambda^E(X) = \left(\tfrac{11}{6}, \tfrac{8}{6}, \tfrac{11}{6}\right).$$

From the information in $X$, we conclude that diabetes and bronchitis are equally relevant in the lethality of this disease in society. Also, both diabetes and bronchitis have a greater impact compared to high blood pressure, with the relevance of high blood pressure being 27.3% less than that of the other two pathologies. For the same example, the count index is

$$\lambda^C(X) = (3, 3, 4).$$

As we observe, different results are obtained with this index. According to the count index, the two pathologies with the same relevance are diabetes and high blood pressure. On applying the share index, we get

$$\lambda^S(X) = \left(\tfrac{3}{6}, \tfrac{3}{6}, \tfrac{4}{6}\right).$$

Notice that the count and share indices differ in absolute terms, but the relative relevance of any pair of pathologies is the same in both methods. Finally, lethality relevance according to the ratio index is

$$\lambda^R(X) = \left(\tfrac{3}{10}, \tfrac{3}{10}, \tfrac{4}{10}\right).$$

Given a lethality matrix $X \in \mathcal{D}$, we define the aggregate impact of the pathologies in $P$ as the aggregation of lethality relevances, $\Lambda(X) = \sum_{a \in P} \lambda_a(X)$.
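As an illustration, the four indices described in this section can be sketched in a few lines of Python. The matrix below is an assumption: it is one 6 × 3 matrix consistent with the counts and the first row described in Example 1, and the order of the remaining rows is arbitrary. The function names are hypothetical, and exact fractions are used so that the values can be compared directly with those of Example 2.

```python
from fractions import Fraction

# Hypothetical lethality matrix consistent with Example 1.
# Columns: diabetes, high blood pressure, bronchitis.
# Only the first row is fixed by the text; the row order is an assumption.
X = [
    [0, 1, 1],
    [0, 1, 1],
    [1, 1, 1],
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 0],
]

def count_index(X):
    """Number of deceased individuals with each pathology."""
    return [sum(col) for col in zip(*X)]

def share_index(X):
    """Proportion of deceased individuals with each pathology."""
    n = len(X)
    return [Fraction(c, n) for c in count_index(X)]

def ratio_index(X):
    """Share of each pathology among all pathology occurrences."""
    total = sum(map(sum, X))
    if total == 0:  # no pathology present at all
        return [Fraction(0)] * len(X[0])
    return [Fraction(c, total) for c in count_index(X)]

def equal_attribution_index(X):
    """Split one unit per individual equally among that individual's pathologies."""
    relevance = [Fraction(0)] * len(X[0])
    for row in X:
        k = sum(row)
        if k == 0:  # individual without any of the pathologies
            continue
        for a, has_pathology in enumerate(row):
            if has_pathology:
                relevance[a] += Fraction(1, k)
    return relevance

def aggregate_impact(values):
    """Sum of the lethality relevances over all pathologies."""
    return sum(values)

for name, index in [("count", count_index), ("share", share_index),
                    ("ratio", ratio_index), ("equal attribution", equal_attribution_index)]:
    values = index(X)
    print(name, [str(v) for v in values], "aggregate:", aggregate_impact(values))
```

Because every index only sums over rows, any reordering of the rows of this matrix yields the same values.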
It is worth noting that the aggregate impacts of different indices, as the previous example illustrates, are different in general. Indeed, in Example 2, $\Lambda^E(X) = 5$, $\Lambda^C(X) = 10$, $\Lambda^S(X) = \frac{5}{3}$, and $\Lambda^R(X) = 1$.

The first axiom is a standard principle of impartiality. It simply requires that the name of the pathology should not be relevant for the measurement of lethality relevance.

Neutrality. For each $X \in \mathcal{D}^N$ and each permutation $\pi$,

$$\lambda(\pi(X)) = \pi(\lambda(X)),$$

where $\pi(X)$ is a permutation of the columns of $X$ and, since we can identify columns with pathologies, $\pi(\lambda(X))$ is the same permutation applied to the vector $\lambda(X)$. Furthermore, since $\pi$ can be looked at as a bijective mapping from $P$ to $P$, we abuse notation and also denote by $\pi$ the permutation of the elements of the set of pathologies $P$, where $\pi(a)$ is the new number (column in the matrix) associated with pathology $a$ when permutation $\pi$ is applied.

We say that a pathology is irrelevant for a lethality if it was not present in any death. The next property says that the lethality relevance of an irrelevant pathology must be zero.

Irrelevance. For each $X \in \mathcal{D}^N$ and each $a \in P$, if $x_{ia} = 0$ for all $i \in N$, then $\lambda_a(X) = 0$.

Suppose there are two disjoint groups of individuals $N$ and $M$ (e.g., two regions of a country). Now consider a larger society resulting from combining $N$ and $M$. The question is how to recalculate the index for $N \cup M$ from the indices for $N$ and $M$. The name of this recalculation is composition. Depending on how the composition of the indices for $N$ and $M$ is undertaken, different composition properties will be generated. Additive composition states that the relevance on the lethality of each pathology in the large population, $N \cup M$, is the sum of the relevance in $N$ and $M$.

Additive composition. For each $X \in \mathcal{D}^N$ and each $Y \in \mathcal{D}^M$ with $N \cap M = \emptyset$,

$$\lambda(X \oplus Y) = \lambda(X) + \lambda(Y),$$

where $X \oplus Y \in \mathcal{D}^{N \cup M}$ is the matrix resulting from stacking $X$ above $Y$ (by rows). However, we could consider that population size should be taken into account when relating the relevance of the pathologies in the larger population and the smaller ones.
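Sized composition states that the relationship between the relevance on the lethality of each pathology in the large population and the relevance in the populations $N$ and $M$ must be weighted by their sizes.

Sized composition. For each $X \in \mathcal{D}^N$ and each $Y \in \mathcal{D}^M$ with $N \cap M = \emptyset$,

$$\lambda(X \oplus Y) = \frac{n}{n+m}\,\lambda(X) + \frac{m}{n+m}\,\lambda(Y).$$

Finally, we could consider that the incidence of pathologies in each population should be taken into account, and then a new composition property could be defined. Incidence composition says that the relationship between the relevance on the lethality of each pathology in the large population and the relevance in the populations $N$ and $M$ must be weighted by the incidence of the pathologies in each population.

Incidence composition. For each $X \in \mathcal{D}^N$ and each $Y \in \mathcal{D}^M$ with $N \cap M = \emptyset$ and $\|X\|_1 + \|Y\|_1 > 0$,

$$\lambda(X \oplus Y) = \frac{\|X\|_1}{\|X\|_1 + \|Y\|_1}\,\lambda(X) + \frac{\|Y\|_1}{\|X\|_1 + \|Y\|_1}\,\lambda(Y).$$

Next, we explore the compatibility or incompatibility of some groups of the previous properties. In this regard, Proposition 1 states that additive composition is incompatible with each of the other two composition requirements. More specifically, there is no lethality index, other than the null index, that satisfies additive composition together with sized or incidence composition.

Proposition 1 The null index, $\lambda^0(X) = 0_P$ for all $N \in \mathcal{N}$ and for each $X \in \mathcal{D}^N$, is the unique index that satisfies the following pairs of properties: i) Additive composition and sized composition. ii) Additive composition and incidence composition.

Proof Since $\lambda^0(X) = 0_P$ for each $X \in \mathcal{D}$, it is obvious that it satisfies the three composition properties. Let us suppose that there exists a non-null index $\lambda$ satisfying additive composition and sized composition. Since this index is not null, there exists $X \in \mathcal{D}^N$ such that $\lambda(X) \neq 0_P$. Next we take $X$ and any $Y \in \mathcal{D}^M$, $N \cap M = \emptyset$. Since $\lambda$ satisfies additive composition and sized composition, then

$$\lambda(X) + \lambda(Y) = \lambda(X \oplus Y) = \frac{n}{n+m}\,\lambda(X) + \frac{m}{n+m}\,\lambda(Y),$$

so that $\frac{m}{n+m}\lambda(X) + \frac{n}{n+m}\lambda(Y) = 0_P$. Since $\lambda$ takes nonnegative values, this forces $\lambda(X) = 0_P$, which is a contradiction.

Let us now suppose that there exists a non-null index $\lambda$ satisfying additive and incidence composition. Since this index is not null, there exists $X \in \mathcal{D}^N$ such that $\lambda(X) \neq 0_P$. Next we take $X$ and any $Y \in \mathcal{D}^M$ so that $\|Y\|_1 > 0$, $N \cap M = \emptyset$. Since $\lambda$ satisfies additive and incidence composition, then

$$\lambda(X) + \lambda(Y) = \lambda(X \oplus Y) = \frac{\|X\|_1}{\|X\|_1 + \|Y\|_1}\,\lambda(X) + \frac{\|Y\|_1}{\|X\|_1 + \|Y\|_1}\,\lambda(Y),$$

so that $\frac{\|Y\|_1}{\|X\|_1 + \|Y\|_1}\lambda(X) + \frac{\|X\|_1}{\|X\|_1 + \|Y\|_1}\lambda(Y) = 0_P$. Since $\|Y\|_1 > 0$ and $\lambda$ takes nonnegative values, this forces $\lambda(X) = 0_P$, which is again a contradiction.

Proposition 2 The constant indices, $\lambda(X) = K$ for some $K \in \mathbb{R}^P_{\geq 0}$ and for each $X \in \mathcal{D}$, are the unique indices that satisfy sized composition and incidence composition.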
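The three composition properties can be checked numerically on a small example. The sketch below rests on several assumptions: the two groups `X` and `Y` are hypothetical, the sized weights are taken to be n/(n+m) and m/(n+m), and the incidence weights are taken to be each group's share of the total number of pathology occurrences, which is our reading of the verbal definitions above. It is a sanity check on one pair of matrices, not a proof.

```python
from fractions import Fraction

def counts(X):
    """Count index: individuals with each pathology."""
    return [sum(col) for col in zip(*X)]

def share(X):
    """Share index: counts divided by the population size."""
    return [Fraction(c, len(X)) for c in counts(X)]

def ratio(X):
    """Ratio index: counts divided by the total number of occurrences."""
    total = sum(map(sum, X))
    return [Fraction(c, total) if total else Fraction(0) for c in counts(X)]

def combine(wx, lx, wy, ly):
    """Weighted sum wx * lx + wy * ly of two index vectors."""
    return [wx * a + wy * b for a, b in zip(lx, ly)]

# Two hypothetical disjoint groups; stacking by rows gives X (+) Y.
X = [[0, 1, 1], [1, 0, 0]]
Y = [[1, 1, 1], [0, 0, 1], [1, 0, 1]]
Z = X + Y

n, m = len(X), len(Y)
sx, sy = sum(map(sum, X)), sum(map(sum, Y))

# Count index under additive composition (weights 1 and 1).
additive_ok = counts(Z) == combine(1, counts(X), 1, counts(Y))

# Share index under sized composition (weights proportional to group sizes).
sized_ok = share(Z) == combine(Fraction(n, n + m), share(X),
                               Fraction(m, n + m), share(Y))

# Ratio index under incidence composition (weights proportional to occurrences).
incidence_ok = ratio(Z) == combine(Fraction(sx, sx + sy), ratio(X),
                                   Fraction(sy, sx + sy), ratio(Y))

# The share index is not additive on this pair of groups.
share_is_additive = share(Z) == combine(1, share(X), 1, share(Y))

print(additive_ok, sized_ok, incidence_ok, share_is_additive)
```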
Proof It is straightforward to prove that constant indices satisfy sized composition and incidence composition. Let λ be an index that satisfies sized and incidence composition, then we have that 1. For each X ∈ D N and for each Y ∈ D M , so that (m We can now apply the previous case and obtain that λ(X ) = λ(Y ) and λ(Y ) = λ(Y ), therefore λ(X ) = λ(Y ). 3. For each X ∈ D N with X 1 = 0 and for each Y ∈ D M with Y 1 = 0, we can do the same as in the previous case and also obtain that λ(X ) = λ(Y ). Now, let X ∈ D N and Y ∈ D M be two lethality matrices, and let Z ∈ D Q be a lethality matrix such that N ∩ Q = ∅ and M ∩ Q = ∅. The pairs of lethality matrices X , Z and Y , Z will be in the conditions of one of the three cases described above. Therefore, we have that λ(X ) = λ(Z ) and λ(Y ) = λ(Z ), so we conclude that λ(X ) = λ(Y ). Therefore, if an index λ satisfies sized and incidence composition then λ(X ) = K for all X ∈ D, for some K ∈ R P ≥0 . Proposition 1 and Proposition 2 imply that any group of compatible properties (which do not lead to a constant index) has, at most, three requirements, namely neutrality, irrelevance, and one of the composition. Proposition 3 shows that any of the compositions is compatible with both neutrality and irrelevance. More precisely, it states that there exist indices that satisfy any pair of those three requirements but violate the third one. Proof i) The 0 − 1 index satisfies neutrality and irrelevance, but it violates the three composition properties. For each X ∈ D N and each a ∈ P, ii) The following index satisfies neutrality and additive composition, but it violates irrelevance. For each X ∈ D N and each a ∈ P, λ 1 a (X ) = X 1 . iii) The following index satisfies irrelevance and additive composition, but it violates neutrality. For each X ∈ D N and each a ∈ P, iv) The following index satisfies neutrality and sized composition, but it violates irrelevance. 
For each X ∈ D N and each a ∈ P, v) The following index satisfies irrelevance and sized composition, but it violates neutrality. For each X ∈ D N and each a ∈ P, vi) The following index satisfies neutrality and incidence composition, but it violates irrelevance. For each X ∈ D N and each a ∈ P, vii) The following index satisfies irrelevance and incidence composition, but it violates neutrality. For each X ∈ D N and each a ∈ P, x ia if a = 1 and X 1 ≥ 1, 0 otherwise. Finally, in the next theorem we establish which properties are satisfied by the four lethality influence indices introduced in Sect. 3. Table 1 shows the properties satisfied for each of these rules. i) The count index satisfies neutrality, irrelevance, additive composition, but not sized composition and incidence composition. ii) The share index satisfies neutrality, irrelevance, sized composition, but not additive composition and incidence composition. iii) The ratio index satisfies neutrality, irrelevance, incidence composition, but not additive composition and sized composition. iv) The equal attribution index satisfies neutrality, irrelevance, additive composition, but not sized composition and incidence composition. Proof By using the definitions of the indices is straightforward to prove that they all satisfy neutrality and irrelevance. Next, we prove what composition property they satisfy and which do not. 1. It is immediate to prove that the count index satisfies additive composition. Now, by Proposition 1 the count index does not satisfy sized composition and incidence composition. Therefore, the share index satisfies sized composition. However, by Proposition 1 it does not satisfy additive composition, and by Proposition 2 it does not satisfy incidence composition. Sized composition For each X ∈ D N and each Y ∈ D M . 
We distinguish three cases: (a) If X 1 = 0 and Y 1 = 0, it is obvious that (c) If X 1 > 0 and Y 1 > 0, reasoning similarly as in (4), we obtain that Therefore, the ratio index satisfies incidence composition. However, by Proposition 1 it does not satisfy additive composition, and by Proposition 2 it does not satisfy sized composition. Therefore, the equal attribution index satisfies additive composition. Now, by Proposition 1 it does not satisfy sized composition and incidence composition. In this section, we present our main findings. Theorem 2 and Proposition 4 constitute a theoretical justification for the prominence of the equal attribution index in the measurement of lethality relevance, while Theorem 3 and 4 show characterizations of the families of indices that contain the share index and the ratio index, respectively. We shall observe that the only difference among the three families of indices characterized in this section is the definition of the property of composition. The first result characterizes the family of indices that satisfy neutrality, irrelevance, and additive composition. It states that under these requirements, the relevance must be a weighted sum of the lethality of the individuals. Theorem 2 An index λ satisfies neutrality, irrelevance, and additive composition if and only if, there exist neutral functions 3 w i : {0, 1} P → R ≥0 for each i ∈ N such that for each X ∈ D N and for each a ∈ P, Proof Let λ be an index and let w i : {0, 1} P −→ R ≥0 be neutral functions such that, for each X ∈ D N and for each a ∈ P, λ a (X ) = n i=1 w i (x i· )x ia . We start by showing that this index satisfies the three properties. • Neutrality. Let X ∈ D N , and let π be a permutation of columns of X . For each a ∈ P, Now, when we apply permutation π to the vector λ(X ), its π −1 (a)-th coordinate becomes the a-th coordinate. Therefore, λ(π(X )) = π(λ(X )) holds. • Irrelevance. Let X ∈ D N , and let a ∈ P such that x ia = 0 for each i ∈ N . 
Now, we prove the converse. Let $\lambda$ be an index that satisfies neutrality, irrelevance, and additive composition. Suppose first that $N$ is a singleton (a group of just one individual, $N = \{i\}$). On the one hand, by irrelevance, $\lambda_a(X) = 0$ whenever the pathology $a$ is irrelevant, and the statement holds. On the other hand, by neutrality, $\lambda_b(X) = \lambda_c(X)$ for any pair of non-irrelevant pathologies $b$ and $c$. Indeed, let $\pi$ be the permutation such that $\pi(b) = c$, $\pi(c) = b$, and $\pi(a) = a$ for all $a \neq b, c$. Then $\lambda_c(\pi(X)) = \lambda_b(X)$; but since $b$ and $c$ are non-irrelevant and $X$ is just a vector, $\pi(X) = X$, and hence $\lambda_c(X) = \lambda_b(X)$. We now consider the function $w_i : \{0, 1\}^P \to \mathbb{R}_{\geq 0}$ defined by $w_i(x_{i\cdot}) = \lambda_b(x_{i\cdot})$, where $b$ is any non-irrelevant pathology in $x_{i\cdot}$, and $w_i(0_P) = 0$. Since $\lambda$ satisfies neutrality, $w_i$ is a neutral function, and $\lambda_a(x_{i\cdot}) = w_i(x_{i\cdot})\, x_{ia}$ for each $a \in P$.
Now, suppose that $N$ is such that $n \geq 2$. Notice that $X = x_{1\cdot} \oplus \cdots \oplus x_{n\cdot}$, where each $x_{i\cdot}$ is a lethality matrix with just one individual. Since $\lambda$ satisfies additive composition, it follows that $\lambda(X) = \lambda(x_{1\cdot}) + \cdots + \lambda(x_{n\cdot})$. Let $a \in P$. We have already seen that $\lambda_a(x_{i\cdot}) = w_i(x_{i\cdot})\, x_{ia}$ for each $i \in N$. Therefore, $\lambda_a(X) = \sum_{i=1}^{n} w_i(x_{i\cdot})\, x_{ia}$.
By inspecting Table 1, we conclude that the share and ratio indices do not belong to the family characterized in Theorem 2. On the other hand, the count index is an element of the family, with $w_i(x_{i\cdot}) = 1$ for all $i \in N$. The equal attribution index also belongs to this family, with $w_i(x_{i\cdot}) = 1 / \|x_{i\cdot}\|_1$ whenever $\|x_{i\cdot}\|_1 \geq 1$.
Among the many indices that satisfy neutrality, irrelevance, and additive composition, the equal attribution index has a distinctive feature. Because it is closely related to a well-known solution concept for cooperative games, it can also be justified from a game-theoretic perspective. We can naturally define a cooperative game as follows. The set of players is the set of pathologies $P$, and the value function for each coalition $Q \subseteq P$ is given by $v(Q) = |\{i \in N : x_{ia} = 1 \text{ for some } a \in Q\}|$, where $|T|$ is the cardinality of set $T$. For a given cooperative game $(P, v)$, a solution is a vector $s \in \mathbb{R}^p_{\geq 0}$ such that $\sum_{a \in P} s_a = v(P)$, where $s_a$ represents the allocation to player $a$.
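As a small numerical illustration of the family in Theorem 2, the following Python sketch evaluates a weighted-sum index for the count weight and for the equal attribution weight, and checks additive composition on two toy groups. The matrices `X` and `Y`, the function names, and the use of a single weight function shared by all individuals are assumptions made for the example; the theorem allows individual-specific neutral weights.

```python
from fractions import Fraction


def weighted_index(X, weight):
    """Return lambda_a(X) = sum_i weight(row_i) * x_ia for every pathology a."""
    p = len(X[0])
    return [sum(weight(row) * row[a] for row in X) for a in range(p)]


def count_weight(row):
    # Count index: every individual carries weight 1.
    return Fraction(1)


def equal_attribution_weight(row):
    # Equal attribution index: weight 1 / (number of pathologies in the row).
    # It depends only on the row sum, so it is invariant to column permutations.
    k = sum(row)
    return Fraction(1, k) if k > 0 else Fraction(0)


# Hypothetical toy data: rows are deceased individuals, columns are pathologies.
X = [[1, 0, 0], [1, 0, 1], [1, 1, 1]]
Y = [[0, 1, 1], [0, 1, 1]]
Z = X + Y  # X (+) Y: the two groups stacked into one population

for w in (count_weight, equal_attribution_weight):
    summed = [x + y for x, y in zip(weighted_index(X, w), weighted_index(Y, w))]
    # Additive composition: the index of the merged group is the sum of the parts.
    assert weighted_index(Z, w) == summed

print(weighted_index(Z, count_weight))
print(weighted_index(Z, equal_attribution_weight))
```

Exact rational arithmetic (`Fraction`) is used so that the composition check is an equality rather than a floating-point comparison.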
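Several authors have proposed different solution concepts based on different notions of fairness. Among those, the Shapley value (Shapley 1953) emerges as the most prominent (see Roth 1988; Algaba et al. 2019b). Its expression is the following. For each $a \in P$,
$$\mathrm{Sh}_a(v) = \sum_{Q \subseteq P \setminus \{a\}} \frac{|Q|!\,(p - |Q| - 1)!}{p!}\,\bigl[v(Q \cup \{a\}) - v(Q)\bigr].$$
Example 3 Consider the game with $P = \{1, 2, 3\}$ and $v(\emptyset) = 0$, $v(\{1\}) = 3$, $v(\{2\}) = 3$, $v(\{3\}) = 4$, $v(\{1, 2\}) = 5$, $v(\{1, 3\}) = 5$, $v(\{2, 3\}) = 4$, $v(\{1, 2, 3\}) = 5$. Then $\mathrm{Sh}(v) = \left(\tfrac{11}{6}, \tfrac{8}{6}, \tfrac{11}{6}\right)$.
As we can observe, the equal attribution index in Example 2 and the Shapley value in Example 3 provide the same lethality relevance. The next proposition states that they always coincide.
Proposition 4 For each $X \in D^N$, the equal attribution index coincides with the Shapley value of the associated cooperative game; that is, $\lambda^E_a(X) = \mathrm{Sh}_a(v_X)$ for each $a \in P$.
Proof We first prove that the Shapley value satisfies neutrality, irrelevance, and additive composition. It is obvious that the Shapley value satisfies neutrality and irrelevance. We now prove that it also satisfies additive composition. Let $X \in D^N$, $Y \in D^M$, and $Z = X \oplus Y \in D^{N \cup M}$, and let $(P, v_X)$, $(P, v_Y)$, and $(P, v_Z)$ be the cooperative games associated with $X$, $Y$, and $X \oplus Y$, respectively. For each $a \in P$, since $v_Z = v_X + v_Y$ and the Shapley value is additive, $\mathrm{Sh}_a(v_Z) = \mathrm{Sh}_a(v_X) + \mathrm{Sh}_a(v_Y)$. Therefore, by Theorem 2, for each $X \in D^N$ and each $a \in P$, $\mathrm{Sh}_a(v_X) = \sum_{i=1}^{n} w_i(x_{i\cdot})\, x_{ia}$ for some neutral functions $w_i$.
Let $X \in D^N$ be such that $N = \{i\}$ (i.e., a singleton), and let $(P, v_X)$ be the associated game. We distinguish two cases:
1. $\|x_{i\cdot}\|_1 \geq 1$. By the definition of the game, $v_X(P) = 1$. Since the Shapley value is efficient (Shapley 1953), it follows that $\sum_{a \in P} \mathrm{Sh}_a(v_X) = v_X(P) = 1$. Now we have the following chain of equalities: $1 = \sum_{a \in P} \mathrm{Sh}_a(v_X) = \sum_{a \in P} w_i(x_{i\cdot})\, x_{ia} = w_i(x_{i\cdot})\, \|x_{i\cdot}\|_1$. Hence $\mathrm{Sh}_a(v_X) = x_{ia} / \|x_{i\cdot}\|_1 = \lambda^E_a(X)$ for each $a \in P$.
2. $\|x_{i\cdot}\|_1 = 0$. By the definition of the game, $v_X(P) = 0$. Now, it is obvious that for all $a \in P$, $\mathrm{Sh}_a(v_X) = 0 = \lambda^E_a(X)$.
Therefore, the equal attribution index and the Shapley value coincide for singletons. Let $X \in D^N$ be such that $n \geq 2$. Then $X$ can be written as $X = x_{1\cdot} \oplus \cdots \oplus x_{n\cdot}$. We have shown that the equal attribution index and the Shapley value coincide for each of the singletons. Therefore, by additive composition, both coincide in general.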
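The coincidence stated above can be checked numerically. The following Python sketch builds the game $v(Q)$ from a lethality matrix, computes the Shapley value by averaging marginal contributions over all orderings of the pathologies, and compares it with the equal attribution index. The matrix `X` is a hypothetical one chosen only because it reproduces the coalition values listed in Example 3; it is not claimed to be the matrix of Example 2, and all function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def coalition_value(X, Q):
    """v(Q): number of individuals with at least one pathology in coalition Q."""
    return sum(1 for row in X if any(row[a] == 1 for a in Q))


def shapley_value(X):
    """Shapley value of the game (P, v), by enumerating all orderings of P."""
    p = len(X[0])
    orders = list(permutations(range(p)))
    totals = [Fraction(0)] * p
    for order in orders:
        seen = []
        for a in order:
            # Marginal contribution of pathology a to the pathologies before it.
            totals[a] += coalition_value(X, seen + [a]) - coalition_value(X, seen)
            seen.append(a)
    return [t / len(orders) for t in totals]


def equal_attribution(X):
    """Each death is split equally among the pathologies present in that row."""
    p = len(X[0])
    return [
        sum(Fraction(row[a], sum(row)) for row in X if sum(row) > 0)
        for a in range(p)
    ]


# Hypothetical matrix (5 individuals, 3 pathologies) consistent with Example 3.
X = [[1, 0, 0], [1, 0, 1], [1, 1, 1], [0, 1, 1], [0, 1, 1]]

print(shapley_value(X))
print(equal_attribution(X))
```

Enumerating all orderings costs $p!$ evaluations, which is fine for a handful of pathologies; the equal attribution formula gives the same vector in time linear in the size of the matrix.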
Therefore, we have shown that the justification for using the equal attribution index is twofold: first, it belongs to the family of indices that is uniquely determined by a suitable combination of properties (Theorem 2); and second, it corresponds to the Shapley value of the associated natural game (Proposition 4).
As pointed out above, the share and ratio indices do not belong to the family characterized in Theorem 2. However, the following results identify the families of indices of lethality relevance to which the share and ratio indices belong. Each family is obtained by changing only the composition property in use. Thus, the share index belongs to the family of indices determined by neutrality, irrelevance, and sized composition together, while the ratio index belongs to the family of indices determined by neutrality, irrelevance, and incidence composition together. (As shown in Theorem 1, the share index and the ratio index fulfill the properties in Theorem 3 and Theorem 4, respectively.)
Theorem 3 An index $\lambda$ satisfies neutrality, irrelevance, and sized composition if and only if there exist neutral functions $w_i : \{0, 1\}^P \to \mathbb{R}_{\geq 0}$ for each $i \in N$ such that, for each $X \in D^N$ and for each $a \in P$,
$$\lambda_a(X) = \frac{1}{n} \sum_{i=1}^{n} w_i(x_{i\cdot})\, x_{ia}.$$
Proof Let $\lambda$ be an index and let $w_i : \{0, 1\}^P \to \mathbb{R}_{\geq 0}$ be neutral functions such that, for each $X \in D^N$ and for each $a \in P$, $\lambda_a(X) = \frac{1}{n} \sum_{i=1}^{n} w_i(x_{i\cdot})\, x_{ia}$. We start by showing that this index satisfies the three properties. The proofs that this index satisfies neutrality and irrelevance are analogous to the corresponding ones in Theorem 2. Regarding sized composition, let $X \in D^N$ and $Y \in D^M$. Also, let $Z = X \oplus Y$. For each $a \in P$ we have that $(n + m)\, \lambda_a(Z) = \sum_{i \in N} w_i(x_{i\cdot})\, x_{ia} + \sum_{i \in M} w_i(y_{i\cdot})\, y_{ia} = n\, \lambda_a(X) + m\, \lambda_a(Y)$.
Conversely, let $\lambda$ be an index that satisfies neutrality, irrelevance, and sized composition. Suppose that $N$ is a singleton. By irrelevance, $\lambda_a(X) = 0$ whenever the pathology $a$ is irrelevant, and the statement holds. By neutrality, $\lambda_b(X) = \lambda_c(X)$ for any pair of non-irrelevant pathologies.
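Therefore, there must exist a neutral function $w_i : \{0, 1\}^P \to \mathbb{R}_{\geq 0}$ such that $\lambda_a(x_{i\cdot}) = w_i(x_{i\cdot})\, x_{ia}$ for each $a \in P$. Suppose next that $N$ is such that $n \geq 2$. Then, for each $X \in D^N$, $X = x_{1\cdot} \oplus \cdots \oplus x_{n\cdot}$, where each $x_{i\cdot}$ is a lethality matrix with just one individual. Since $\lambda$ satisfies sized composition, it holds that $\left(\sum_{i \in N} 1\right) \lambda(X) = 1 \cdot \lambda(x_{1\cdot}) + \cdots + 1 \cdot \lambda(x_{n\cdot})$. Let $a \in P$. We have already seen that $\lambda_a(x_{i\cdot}) = w_i(x_{i\cdot})\, x_{ia}$ for any $i \in N$. Therefore, $\lambda_a(X) = \frac{1}{n} \sum_{i=1}^{n} w_i(x_{i\cdot})\, x_{ia}$.
Theorem 4 An index $\lambda$ satisfies neutrality, irrelevance, and incidence composition if and only if there exist neutral functions $w_i : \{0, 1\}^P \to \mathbb{R}_{\geq 0}$ for each $i \in N$ such that, for each $X \in D^N$ and for each $a \in P$,
$$\lambda_a(X) = \frac{1}{\|X\|_1} \sum_{i=1}^{n} w_i(x_{i\cdot})\, x_{ia} \ \text{ if } \|X\|_1 > 0, \qquad \lambda_a(X) = 0 \ \text{ otherwise.}$$
Proof Let $\lambda$ be an index and let $w_i : \{0, 1\}^P \to \mathbb{R}_{\geq 0}$ be neutral functions such that, for each $X \in D^N$ and for each $a \in P$, $\lambda_a(X)$ is given by the expression above. We once again start by showing that this index satisfies the three properties. The proofs that this index satisfies neutrality and irrelevance are, mutatis mutandis, analogous to the corresponding ones in Theorem 2. Regarding incidence composition, let $X \in D^N$ and $Y \in D^M$. Also, let $Z = X \oplus Y$. We distinguish three cases:
• $\|X\|_1 = 0$ and $\|Y\|_1 = 0$. The identity immediately follows since all factors are zero.
• $\|X\|_1 > 0$ and $\|Y\|_1 = 0$. In this case, $\|X \oplus Y\|_1 = \|X\|_1$. For each $a \in P$, it holds that $\|Z\|_1\, \lambda_a(Z) = \sum_{i \in N} w_i(x_{i\cdot})\, x_{ia} = \|X\|_1\, \lambda_a(X) = \|X\|_1\, \lambda_a(X) + \|Y\|_1\, \lambda_a(Y)$.
• $\|X\|_1 > 0$ and $\|Y\|_1 > 0$. For each $a \in P$, we have that $\|Z\|_1\, \lambda_a(Z) = \sum_{i \in N} w_i(x_{i\cdot})\, x_{ia} + \sum_{i \in M} w_i(y_{i\cdot})\, y_{ia} = \|X\|_1\, \lambda_a(X) + \|Y\|_1\, \lambda_a(Y)$.
Conversely, let $\lambda$ be an index that satisfies neutrality, irrelevance, and incidence composition. Suppose that $N$ is a singleton. By irrelevance, $\lambda_a(X) = 0$ whenever the pathology $a$ is irrelevant, and the statement holds. By neutrality, $\lambda_b(X) = \lambda_c(X)$ for any pair of non-irrelevant pathologies. Therefore, there must exist a neutral function $w_i : \{0, 1\}^P \to \mathbb{R}_{\geq 0}$ such that $\lambda_a(x_{i\cdot}) = w_i(x_{i\cdot})\, x_{ia}$ for each $a \in P$. Now, suppose that $N$ is such that $n \geq 2$. Then, for each $X \in D^N$, $X = x_{1\cdot} \oplus \cdots \oplus x_{n\cdot}$, where each $x_{i\cdot}$ is a lethality matrix with just one individual. Since $\lambda$ satisfies incidence composition, it holds that $\left(\sum_{i \in N} \|x_{i\cdot}\|_1\right) \lambda(X) = \|x_{1\cdot}\|_1 \cdot \lambda(x_{1\cdot}) + \cdots + \|x_{n\cdot}\|_1 \cdot \lambda(x_{n\cdot})$. If $\sum_{i \in N} \|x_{i\cdot}\|_1 = 0$, the result follows since $\lambda$ satisfies irrelevance. Let us suppose that $\sum_{i \in N} \|x_{i\cdot}\|_1 > 0$, and let $a \in P$.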
We have already seen that $\lambda_a(x_{i\cdot}) = w_i(x_{i\cdot})\, x_{ia}$ for any $i \in N$. Therefore, $\lambda_a(X) = \frac{1}{\sum_{i \in N} \|x_{i\cdot}\|_1} \sum_{i \in N} \|x_{i\cdot}\|_1\, w_i(x_{i\cdot})\, x_{ia}$. Note that $\sum_{i \in N} \|x_{i\cdot}\|_1 = \|X\|_1$. Now, we consider the following functions: $\tilde{w}_i(x_{i\cdot}) = \|x_{i\cdot}\|_1\, w_i(x_{i\cdot})$, which are also neutral.
At this stage, we have characterized three families of indices, each of which contains simple and intuitive indices. Thus, depending on the characteristics of the specific problem, it is possible to select the index that is most compatible with the properties relevant to that problem. The weights also make it possible to differentiate between the individuals of the population when computing the index. For example, $w_i$ can measure the life expectancy of individual $i$; in this way, the lethality indices would also take into account the reduction in the life span of the deceased. Moreover, by Proposition 3, Theorems 2, 3, and 4 are tight; that is, all the properties in their statements are necessary for the characterizations. Indeed, none of the rules given in Proposition 3 satisfies the conclusions of the theorems, since each one fails to satisfy one of the three properties required in their statements.
In this section, we illustrate a possible application of the model presented in the previous sections. The coronavirus disease 2019 (COVID-19) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). First identified in December 2019 in Wuhan, China, COVID-19 resulted in an ongoing pandemic with devastating effects. As of February 2021, around 100 million cases have been reported worldwide, affecting more than 180 countries and resulting in around 2.2 million deaths. Many researchers have analyzed how preexisting health conditions have influenced COVID-19 mortality. One of the first studies was published by The Novel Coronavirus Pneumonia Emergency Response Epidemiology Team. It involved a descriptive and exploratory analysis of all cases of COVID-19 diagnosed nationwide in China in February 2020.
Among the several aspects analyzed in the study, the authors identified hypertension, cardiovascular disease, diabetes, and chronic respiratory disease as the main sources of comorbidity among patients who died due to COVID-19 (see Table 2). A relevant question that many publications have examined, primarily from a medical perspective (see, for example, Zhou et al. 2020; Guan et al. 2020; Li et al. 2020; Nogueira et al. 2020; Richardson et al. 2020; Bello-Chavolla et al. 2020; Stoian et al. 2020, among others), concerns the impact of these four diseases on the lethality of COVID-19. Our theoretical framework provides a tool to answer this question by quantifying the relative relevance of comorbid pathologies in the lethality of COVID-19.
Let $N$ be the set of patients who died with diagnosed COVID-19, and let $P = \{\mathrm{HYP}, \mathrm{CAR}, \mathrm{DIA}, \mathrm{RES}\}$ be the set of the four pathologies considered. The lethality matrix $X$, obtained from the microdata, identifies the preexisting diseases (other than COVID-19 infection) of each patient (Table 3): an entry is 1 if the patient died with the pathology in the column, and 0 otherwise. If the microdata were public, computing the equal attribution index would be straightforward, and the medical implications of the results could be analyzed directly. This is easily achievable by a health authority with access to the information. Unfortunately, these microdata are not publicly available. However, we can overcome the lack of individual-level data using simulations. Although simulation results are not as reliable as results based on real data, we believe they are sufficiently valid to illustrate the application of the theoretical model. We do not know the specific entries of the lethality matrix $X$, but we know that the sum of the entries in the first column must coincide with the number of patients who died with hypertension. The same reasoning applies to the remaining columns/pathologies.
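A simulation under this column-sum constraint can be sketched in Python as follows. This is only one possible scheme and rests on several assumptions that are not taken from the paper: pathologies are assigned to individuals independently across columns, the column totals in `column_totals` are hypothetical placeholders rather than the values behind Table 4, and the number of runs and the seed are arbitrary.

```python
import random

PATHOLOGIES = ["HYP", "CAR", "DIA", "RES"]


def simulate_lethality_matrix(n, column_totals, rng):
    """Random 0/1 matrix with n rows whose column sums equal column_totals."""
    X = [[0] * len(column_totals) for _ in range(n)]
    for a, total in enumerate(column_totals):
        # Assumption: the individuals with pathology a are drawn uniformly,
        # independently of the other pathologies.
        for i in rng.sample(range(n), total):
            X[i][a] = 1
    return X


def equal_attribution(X):
    """Each death is split equally among the pathologies present in that row."""
    index = [0.0] * len(X[0])
    for row in X:
        k = sum(row)
        if k > 0:
            for a, x in enumerate(row):
                index[a] += x / k
    return index


def relative_relevance(index):
    """Normalize an index so that its coordinates add up to one."""
    total = sum(index)
    return [value / total for value in index]


rng = random.Random(0)            # fixed seed, only for reproducibility
column_totals = [20, 12, 9, 5]    # hypothetical totals, NOT the paper's data
runs = 1000
average = [0.0] * len(PATHOLOGIES)
for _ in range(runs):
    X = simulate_lethality_matrix(50, column_totals, rng)
    nu = relative_relevance(equal_attribution(X))
    average = [acc + value / runs for acc, value in zip(average, nu)]

for name, value in zip(PATHOLOGIES, average):
    print(name, round(value, 3))
```

Averaging over many simulated matrices smooths out the dependence on any single random assignment; a health authority with the real microdata would apply `equal_attribution` to the observed matrix instead.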
Therefore, we proceed as follows:
1. Let $N$ contain 50 individuals whose pathologies are distributed similarly to Table 2. In particular, suppose that 20 individuals died with hypertension, 11 individ-
The results are shown in Table 4. The third column, probably the most interesting one, indicates the relative relevance of each disease in the lethality caused by COVID-19. From our findings, we conclude that the relative relevance of hypertension is 51.2%. Interestingly, if we restrict ourselves to the four preexisting diseases, only 44% of patients died with hypertension. Thus, the analysis indicates that the impact of hypertension on deaths due to COVID-19 is noticeably greater than it might appear at first glance. By contrast, the other pathologies are less relevant than their shares of deaths suggest: 22.2% vs. 25% for cardiovascular disease, 19.7% vs. 22% for diabetes, and 6.9% vs. 9% for chronic respiratory disease.
In 2020, the COVID-19 pandemic left the world devastated. Therefore, one of the main objectives of this paper was to identify the preexisting pathologies that may have contributed to COVID-19 mortality. The ability to quantify the relative relevance of comorbidities in the lethality of COVID-19 may be crucial. In this paper, we established a theoretical model with that purpose in mind. In this context, we presented several alternatives to measure lethality relevance: the count, share, ratio, and equal attribution indices. Following an axiomatic methodology, we proposed several properties that emerge as natural requirements in this context: neutrality, irrelevance, and three different ways of understanding the concept of composition, namely, additive composition, sized composition, and incidence composition. By combining neutrality, irrelevance, and one of the composition properties, we characterized three families of indices for measuring the relative relevance of the pathologies in the lethality of a disease.
9 The relative relevance is calculated by normalizing $\lambda^E$, i.e., for each $a \in P$, $\nu^E_a = \lambda^E_a / \|\lambda^E\|_1$.
Theorem 2 states that an index satisfies neutrality, irrelevance, and additive composition if and only if it is a weighted combination of the absence or presence of pathologies in the affected individuals. Theorem 3 states that an index satisfies neutrality, irrelevance, and sized composition if and only if it is an average of a weighted combination of the absence or presence of pathologies in the affected individuals. Theorem 4 states that an index satisfies neutrality, irrelevance, and incidence composition if and only if it is a proportion of a weighted combination of the absence or presence of pathologies in the affected individuals.
The equal attribution index belongs to the family determined in Theorem 2. Also, Proposition 4 shows that this index coincides with the Shapley value of the associated cooperative game. The application of cooperative game theory to our problem seems very natural. Indeed, we may consider that mortality occurs due to a confluence of cooperating pathologies, which raises the need to determine how to allocate responsibility for that loss of life. Therefore, the equal attribution index emerges as the most convenient index of lethality relevance in its family. The share index belongs to the family determined in Theorem 3, whereas the ratio index belongs to the family determined in Theorem 4. These two indices are also simple and intuitive, and they are two relevant members of their respective families of indices.
As an illustration, we applied the proposed theoretical model to quantify the relevance of four pathologies in the lethality of COVID-19: hypertension, cardiovascular disease, diabetes, and chronic respiratory disease. We found that the equal attribution index imputed more relative relevance to hypertension than the impact suggested in Team (2020).
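As a numerical illustration of the families recalled above for Theorems 3 and 4, the following Python sketch computes the share and ratio indices on two toy groups and checks the corresponding composition identities. The definitions used here (share: column sum divided by the number of individuals; ratio: column sum divided by the total number of ones, both with unit weights) and the matrices `X` and `Y` are assumptions made for the example, as are the function names.

```python
from fractions import Fraction


def ones(X):
    """||X||_1: total number of ones in the lethality matrix."""
    return sum(sum(row) for row in X)


def share_index(X):
    """Average form: (1/n) * sum_i x_ia, i.e. unit weights in Theorem 3."""
    n, p = len(X), len(X[0])
    return [Fraction(sum(row[a] for row in X), n) for a in range(p)]


def ratio_index(X):
    """Proportion form: sum_i x_ia / ||X||_1, i.e. unit weights in Theorem 4."""
    p, total = len(X[0]), ones(X)
    if total == 0:
        return [Fraction(0)] * p
    return [Fraction(sum(row[a] for row in X), total) for a in range(p)]


# Hypothetical toy data: rows are deceased individuals, columns are pathologies.
X = [[1, 0, 0], [1, 0, 1], [1, 1, 1]]
Y = [[0, 1, 1], [0, 1, 1]]
Z = X + Y  # X (+) Y: the two groups stacked into one population
n, m = len(X), len(Y)

# Sized composition: (n + m) * lambda(Z) = n * lambda(X) + m * lambda(Y).
sized_lhs = [(n + m) * z for z in share_index(Z)]
sized_rhs = [n * x + m * y for x, y in zip(share_index(X), share_index(Y))]

# Incidence composition: ||Z|| * lambda(Z) = ||X|| * lambda(X) + ||Y|| * lambda(Y).
incidence_lhs = [ones(Z) * z for z in ratio_index(Z)]
incidence_rhs = [ones(X) * x + ones(Y) * y
                 for x, y in zip(ratio_index(X), ratio_index(Y))]

print(sized_lhs == sized_rhs, incidence_lhs == incidence_rhs)
```

On the same data, the share index of the merged group differs from the plain sum of the two share vectors, which is consistent with the statement that it does not satisfy additive composition.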
Although we justify the application of the equal attribution index, other indices introduced in this paper may deserve deeper analysis. The count index is one such index. It identifies the lethality relevance of a pathology with the share of deaths in which the pathology is implicated. This seems to be precisely what is done implicitly in several medical studies, which simply state the percentage of deaths that occurred when the underlying pathology was present. Finally, the mathematical approach to the attribution problem adopted in this paper can also be applied to other biological or social contexts, including those outside epidemiology, where a similar mathematical structure arises.
Funding Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Identification of low intratumoral gene expression heterogeneity in neuroblastic tumors by genome-wide expression analysis and game theory
Relationship between labeled network games and other cooperative games arising from attributes situations
Weighted voting doesn't work: a mathematical analysis
Predicting mortality attributable to SARS-CoV-2: a mechanistic score relating obesity and diabetes to COVID-19 outcomes in Mexico
Biostatistics and epidemiology: measuring the risk attributable to an environmental or genetic factor
The axiomatic approach to the problem of sharing the revenue from museum passes
The value of attribution
The law and science of climate change attribution
An application of the Shapley value to the analysis of co-expression networks
A new measure of attributable risk for public health applications
Approval voting and Shapley ranking
Sequential and average attributable fractions as aids in the selection of preventive strategies
Using game theory to detect genes involved in
Equality of Shapley value and fair proportion index in phylogenetic trees
Correlation between Shapley values of rooted phylogenetic trees under the beta-splitting model
Variants of the attributable risk in a multifactorial situation: theory and computational realization
Second thoughts: averaging attributable fractions in the multifactorial situation: assumptions and interpretation
The museum pass game and its value
He JX (2020) Comorbidity and its impact on 1590 patients with COVID-19 in China: a nationwide analysis
The Shapley value of phylogenetic trees
The equivalence of two phylogenetic biodiversity measures: the Shapley value and Fair Proportion index
Fair attribution of functional contribution in artificial and biological networks
A game-theoretic approach to partitioning attributable risks in epidemiology
A multiplicative approach to partitioning the risk of disease
A multiplicative variant of the Shapley value for factorizing the risk of disease
The occurrence of lung cancer in man
Risk factors for severity and mortality in adult COVID-19 inpatients in Wuhan
A new way to estimate the contribution of a risk factor in populations avoided nonadditivity
The Shapley and Banzhaf values in microarray games
Estimating the contribution of individual risk factors to disease in a person with more than one risk factor
The class of microarray games and the relevance index for genes
Using coalitional games on biological networks to measure centrality and power of genes
The role of health preconditions on COVID-19 deaths in Portugal: evidence from surveillance data of the first 20293 infection cases
The attributable risk in a multifactorial situation
Incorporating evolutionary measures into conservation prioritization
The most original species often capture more phylogenetic diversity than expected
Presenting characteristics, comorbidities, and outcomes among 5700 patients hospitalized with COVID-19 in the New York City area
The Shapley value: essays in honor of Lloyd S. Shapley
A value for N-person games
Death by SARS-CoV 2: a Romanian COVID-19 multi-centre comorbidity study
Biodiversity, Shapley value and phylogenetic trees: some remarks
The epidemiological characteristics of an outbreak of 2019 novel coronavirus diseases (COVID-19)
Combinatorial properties of phylogenetic diversity indices
Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations