key: cord-0186450-9ihcjjn1
authors: Amornbunchornvej, Chainarong; Surasvadi, Navaporn; Plangprasopchok, Anon; Thajchayapong, Suttipong
title: Framework for inferring empirical causal graphs from binary data to support multidimensional poverty analysis
date: 2022-05-12
journal: nan
DOI: nan
sha: 4d8fe4e2bb6a26435d682b4a4a5b6d57770df19f
doc_id: 186450
cord_uid: 9ihcjjn1

Poverty is one of the fundamental issues that mankind faces. The Multidimensional Poverty Index (MPI) is used to measure poverty in a population beyond its monetary aspects. However, MPI cannot provide information regarding associations and causal relations among poverty factors. Does education cause income inequality in a specific region? Is lacking education a cause of health issues? Without knowing causal relations, policy makers cannot pinpoint the root causes of the poverty issues of a specific population, which might differ across populations. Additionally, MPI requires binary data, which cannot be analyzed by most causal inference frameworks. In this work, we propose an exploratory-data-analysis framework for finding possible causal relations, with confidence intervals, among binary data. The proposed framework reveals not only how severe the issue of poverty is but also the causal relations among poverty factors. Moreover, knowing a confidence interval of the degree of causal direction tells us how strong a causal relation is. We evaluated the proposed framework against several baseline approaches on simulated datasets and used two real-world datasets as case studies: 1) twin births in the United States (the relation between birth weight and mortality of twins), and 2) Thailand population surveys from 378k households of Chiang Mai and 353k households of Khon Kaen provinces. Our framework performed better than baselines in most cases. 
The first case study reveals that almost all mortality cases in twins involve low birth weight, but not all low-birth-weight twins died. The second case study reveals that smoking is associated with drinking alcohol in both provinces and that there is a causal relation of smoking causing drinking alcohol only in Chiang Mai province. The framework can be applied beyond the poverty context.

Poverty is one of the fundamental issues that mankind faces. More than 100 million people were pushed back below the extreme poverty line, living on less than 1.25 USD per day, during the COVID-19 pandemic [13]. Ending poverty in all its forms everywhere has also been recognized as the greatest global challenge in the 2030 Agenda for Sustainable Development. However, poverty alleviation often requires comprehensive measures that depend on the ground-truth realities and the extent of each region's capability to tackle poverty issues. The first crucial and challenging step is to understand the factors associated with poverty, and then to identify the root cause(s) of the issues that give rise to poverty. One well-known measure of poverty is the "Multidimensional Poverty Index (MPI)" [3, 5], which has been proposed for estimating the degree of poverty in specific areas and populations. The MPI measures poverty beyond monetary issues by including other factors such as deficiency in health, inadequate education, and a truncated standard of living. The principle of MPI allows poverty-related factors and their weights to be adjusted according to the ground-truth realities in each region. Despite the usefulness and flexibility of MPI, its focus has been primarily on 1) the degree of poverty from multiple factors and 2) the contribution of each factor toward poverty, without any information regarding causation among factors. Knowing the causal relationships among MPI factors would enhance the practical capacity of MPI itself and subsequently enable the issues of poverty to be solved more effectively. 
Nevertheless, MPI works only on binary data, which cannot be analyzed by most causal inference frameworks.

The scope of poverty issues goes beyond the monetary [3, 4, 9]. Poverty can relate to other factors, such as social capital and homogeneity of population, in multiple ways [35]. Causal inference plays a key role in explanation, prediction, decision making, etc. [10, 24]. It reveals causal relations between variables/factors, which leads to an understanding of the influence among variables. In policy making, causal inference can be used to estimate the outcomes of policy changes [11] and to support policy design [34]. Poverty can be caused by many factors such as health issues [32, 34], education issues [41], income inequality [40], etc. Understanding causal relationships is a key step in designing effective policies to combat poverty [34]. The work in [25] uses frequent pattern mining to infer causal relations, called "causal rules", from discrete variables using the concept of odds ratios. The framework is consistent with the potential outcome framework [28, 30] in causal inference [25]. However, the framework assumes that the directions of causal relations are given. In the related field of Bayesian networks [17, 29, 36, 37], the work in [36, 37] provides software in the form of an R package on the Comprehensive R Archive Network (CRAN) [33] that can be used to learn network structures in general, which is suitable for inferring causal networks. To the best of our knowledge, there is still no causal-inference framework based on structural causal models on binary variables that utilizes estimation statistics, which can provide magnitudes of difference between groups (e.g. cause and effect) [8], and that is capable of inferring causal directions with the degree of causal direction in the form of confidence intervals. 
By knowing a confidence interval of the degree of causal direction, we know not only the causal relation but also how strong it is. To fill the gap, in this work we formalize the definition of structural causal models on binary variables and propose a framework to infer causal relations from binary data using estimation statistics. Our framework is capable of:
• Inferring the causal graph: inferring causal relations among binary variables in the form of a causal graph using frequent pattern mining with non-parametric hypothesis testing; and
• Inferring magnitudes of difference in terms of confidence intervals: inferring dependency, association, and degree of causal direction in the form of confidence intervals using estimation statistics.
We validated our framework on simulation data by comparing the proposed method with baseline approaches. We demonstrated the application of our framework by inferring causal relations among mortality, birth weight, and other risk factors in the U.S. twins dataset, and causal directions of poverty indicators from datasets of hundreds of thousands of Thailand households, to support data analysis in poverty. The proposed framework can be utilized on binary data beyond the field of poverty causal inference. Note that the causal relations inferred by this work are not the real causal relations; they are empirical causal relations that need to be validated. Our main goal is to develop an exploratory data analysis tool that pinpoints possible causal relations to support researchers before validation in field studies to find real causal relations.

To increase the readability of notation, in a directed graph we use $X \to Y$ to represent a directed edge from $X$ to $Y$, and $X \leftarrow Y$ for a directed edge from $Y$ to $X$. We also use $X \to Y$ to represent that $X$ causes $Y$ in a causal graph.

Definition 1 (Structural Causal Model (SCM) [31]). An SCM $\mathfrak{C} = (S, P_N)$ has a directed graph $G = (V, E)$, where $V = \{X_1, \ldots, X_d\}$ is a set of variable nodes and $E$ is a set of causal directions. The set $S$ consists of $d$ equations:

$$X_j := f_j(PA_j, N_j), \quad j = 1, \ldots, d,$$

where $PA_j \subseteq V \setminus \{X_j\}$ is the set of parents of $X_j$ in $G$ s.t. $\forall X_i \in PA_j$, $(X_i, X_j) \in E$, or $X_i$ causes $X_j$, denoted $X_i \to X_j$. $P_N = P_{N_1, \ldots, N_d}$ is a joint distribution over the noise variables $N_1, \ldots, N_d$, where all noise variables are independent of each other: $\forall i \neq j$, $N_i \perp\!\!\!\perp N_j$.

In a binary SCM (b-SCM), the structural equations combine the parents and the noise variable through logical operators, where $\wedge$ and $\vee$ are the "AND" and "OR" operators respectively. Each parent-child pair $(X_i, X_j)$ carries a binary parameter that equals 1 if $X_i$ has a positive causation relation with $X_j$ and 0 if $X_i$ has a negative causation relation with $X_j$. If $PA_j = \emptyset$, then $X_j = N_j$. Note that $X_i$ is not a cause of $X_j$ if and only if $X_i \notin PA_j$.

Suppose we have a dataset $D = \{\vec{x}_1, \ldots, \vec{x}_n\}$ that was generated from a b-SCM $\mathfrak{C}$, where $\vec{x}_i = (x_{i,1}, \ldots, x_{i,d})$ is the $i$-th vector of realizations of the random variables $X_1, \ldots, X_d$ in $\mathfrak{C}$. However, the equations in $S$ of $\mathfrak{C}$ are unknown to us. In this work, we are interested in finding both direct and indirect causes of any variable $X_j$. Hence, we define a transitive causal graph to represent this idea. Assuming that there are no confounding factors outside the variables in $\mathfrak{C}$, we formalize Problem 1: inferring the transitive causal graph $\hat{G}$ of $\mathfrak{C}$ from $D$.

[...] where $X \in PA_Y$, the causation parameter of $(X, Y)$ is 1, and $N_Y$ is the noise variable of $Y$ s.t. $X \perp\!\!\!\perp N_Y$. By setting $X = 1$, we have $Y = 1$. Hence, $P(Y = 1 \mid X = 1) = 1$. For $P(X = 1 \mid Y = 1)$, Bayes' rule gives

$$P(X = 1 \mid Y = 1) = \frac{P(Y = 1 \mid X = 1)\, P(X = 1)}{P(Y = 1)} = \frac{P(X = 1)}{P(Y = 1)}.$$

Because $X \to Y$, the probability $P(Y = 1) > P(X = 1)$ by Proposition 3.2. This makes $P(X = 1 \mid Y = 1) < 1 = P(Y = 1 \mid X = 1)$.

In the backward direction, suppose $P(Y = 1 \mid X = 1) > P(X = 1 \mid Y = 1)$. According to Reichenbach's Common Cause Principle, if $X \not\perp\!\!\!\perp Y$, there are three possible relations between $X, Y$: 1) $X \to Y$, 2) $Y \to X$, and 3) $X, Y$ have the same confounding variable $Z'$ s.t. $Z' \to X$, $Z' \to Y$, and $X \perp\!\!\!\perp Y \mid Z'$. We show that 2) and 3) lead to cases that contradict the assumption. [...] This case implies $P(Y = 1 \mid X = 1) < P(X = 1 \mid Y = 1)$, which contradicts the given assumption $P(Y = 1 \mid X = 1) > P(X = 1 \mid Y = 1)$. Since relations 2) and 3) in Reichenbach's Common Cause Principle are not possible under this assumption, only relation 1), $X \to Y$, remains. 
For the case in which $X, Y$ have a negative association ($X = 0$ causes $Y = 1$), the above proof remains valid if we replace $X = 1$ with $X = 0$. □

Theorem 3.4. Let $D = \{\vec{x}_1, \ldots, \vec{x}_n\}$ be generated from a b-SCM $\mathfrak{C}$, and assume that the noise variables $N_1, \ldots, N_d$ of $\mathfrak{C}$ are i.i.d. with probability $p < 1$ of being 1. Then Algorithm 2 is a solution of Problem 1; that is, it infers the transitive causal graph $\hat{G}$ of $\mathfrak{C}$.

Proof. In the forward direction, given $D = \{\vec{x}_1, \ldots, \vec{x}_n\}$, we show that Algorithm 2 provides $\hat{G}$ of $\mathfrak{C}$. In lines 2-6, Algorithm 2 infers the pairs of variables that are statistically dependent and keeps them in $E_0$. Then, in lines 7-16, for any pair $(X_i, X_j) \in E_0$, the algorithm checks whether [...] $X_i \perp\!\!\!\perp X_j \mid Z$ [...] but $P(X_j = 1 \mid X_i = 1) \leq P(X_i = 1 \mid X_j = 1)$. In both 1) and 2), $(X_i, X_j) \notin \hat{E}$ in line 24. Hence, Algorithm 2 provides the exact transitive causal graph.

In the backward direction, given any $\hat{G} = (V, \hat{E})$ of $\mathfrak{C}$, we show that $\hat{G}$ is inferred by Algorithm 2. Suppose there is a transitive causal graph $\hat{G}' = (V, \hat{E}')$ of $\mathfrak{C}$ s.t. $\hat{G} \neq \hat{G}'$ and Algorithm 2 cannot infer $\hat{G}'$. Since the variables of the two graphs are the same, there is only one possibility: $\hat{E} \neq \hat{E}'$. Suppose $(X_i, X_j) \in \hat{E}$ but $(X_i, X_j) \notin \hat{E}'$. This means that $X_i$ causes $X_j$ in $\hat{G}$ but $X_i$ is not a cause of $X_j$ in $\hat{G}'$. However, this is impossible since both $\hat{G}$ and $\hat{G}'$ are transitive causal graphs of the same $\mathfrak{C}$; if $(X_i, X_j) \in \hat{E}$, then $(X_i, X_j)$ must be in $\hat{E}'$. This contradiction gives $\hat{G} = \hat{G}'$, so Algorithm 2 provides the unique solution. □

Given a dataset $D = \{\vec{x}_1, \ldots, \vec{x}_n\}$, where $\vec{x}_i = (x_{i,1}, \ldots, x_{i,d})$ is the $i$-th vector of realizations of the random variables $X_1, \ldots, X_d$, the main purpose of this work is to provide a solution for Problem 1 by inferring a transitive causal graph $\hat{G} = (V, \hat{E})$ from $D$. In the context of poverty analysis, $D$ can be represented as an $n \times d$ matrix whose rows represent households and whose columns represent the poverty factors among which we would like to find causal relations. 
The output of the framework is the adjacency matrix of a causal graph among poverty factors. Figure 1 illustrates an overview of the proposed framework. In the first step, the framework performs "Bootstrapping" to generate $B = \{D'_1, \ldots, D'_b\}$ from $D$ (Section 4.4). Then, it aligns the data in $B$ using Algorithm 3. Afterwards, the framework infers a transitive causal graph $\hat{G}$ using Algorithm 2, which deploys several statistics derived from $B$. The core of the statistical estimation in the framework is the estimation of conditional probability using Algorithm 4.

To infer whether two variables $X, Y$ are statistically independent given $Z$, i.e. $X \perp\!\!\!\perp Y \mid Z$, we can check the following statement:

$$|P(X, Y \mid Z) - P(X \mid Z)\, P(Y \mid Z)| = 0. \quad (3)$$

If Eq. 3 holds, then we can conclude that $X \perp\!\!\!\perp Y \mid Z$. Otherwise, $X \not\perp\!\!\!\perp Y \mid Z$. However, in real datasets, if the distributions that generate the data are unknown, we cannot compute the probability $P(\cdot)$ directly. In the data mining community, the concepts of support and confidence [1, 2, 22] can be used to estimate the probability of any given event. Before computing conditional probabilities using support and confidence, we need to align the dataset $D$ using Algorithm 3. The degree of dependency between $X, Y$ given $Z$ is then estimated by Eq. 4 and Eq. 5; in both, $X, Y$ are independent if the value is close to zero.

For the odds ratio in Eq. 6, $\mathrm{oddRatio}(X, Y) > 1$ implies that $X, Y$ have a positive association, while $\mathrm{oddRatio}(X, Y) < 1$ implies that $X, Y$ have a negative association; $\mathrm{oddRatio}(X, Y) = 1$ implies no direction of association. The odd difference, an alternative to the odds ratio in Eq. 6, is defined in Eq. 7: $\mathrm{oddDiff}(X, Y) > 0$ implies that $X, Y$ have a positive association, while $\mathrm{oddDiff}(X, Y) < 0$ implies that $X, Y$ have a negative association. There is no association if $\mathrm{oddDiff}(X, Y) = 0$.

After we check, using Eq. 4, that there is no variable $Z$ s.t. $X \perp\!\!\!\perp Y \mid Z$, the next step in Algorithm 2 for deciding whether $X \to Y$ is to check the estimated conditional probabilities of the pair, which are approximated from the data. 
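The estimates described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the exact forms of Eq. 4 to Eq. 8 are not legible in this text, so the functions below assume a support/confidence estimate of conditional probability, the unconditional form of the check in Eq. 3, and a direction score equal to the difference between the two conditional probabilities. All function names (`support`, `cond_prob`, `dependency`, `direction_score`, `bootstrap_ci`) are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Row = Sequence[int]  # one record: a tuple of 0/1 values, one per variable


def support(rows: List[Row], conds: Dict[int, int]) -> float:
    """Fraction of rows in which every column in `conds` takes the given value."""
    if not rows:
        return 0.0
    hits = sum(all(r[c] == v for c, v in conds.items()) for r in rows)
    return hits / len(rows)


def cond_prob(rows: List[Row], target: Dict[int, int], given: Dict[int, int]) -> float:
    """Confidence-style estimate: supp(target and given) / supp(given)."""
    denom = support(rows, given)
    if denom == 0.0:
        return 0.0  # convention of this sketch when the condition never occurs
    return support(rows, {**given, **target}) / denom


def dependency(rows: List[Row], x: int, y: int) -> float:
    """|P(X=1, Y=1) - P(X=1) P(Y=1)|; close to zero suggests independence."""
    joint = support(rows, {x: 1, y: 1})
    return abs(joint - support(rows, {x: 1}) * support(rows, {y: 1}))


def direction_score(rows: List[Row], x: int, y: int) -> float:
    """Assumed score: P(Y=1 | X=1) - P(X=1 | Y=1); positive favours X -> Y."""
    return cond_prob(rows, {y: 1}, {x: 1}) - cond_prob(rows, {x: 1}, {y: 1})


def bootstrap_ci(rows: List[Row], stat: Callable[[List[Row]], float],
                 n_boot: int = 200, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile confidence interval of `stat` over bootstrap replicates."""
    rng = random.Random(seed)
    n = len(rows)
    values = sorted(
        stat([rows[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )
    lo = values[int((alpha / 2) * (n_boot - 1))]
    hi = values[int((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi


# Toy data in which X=1 always comes with Y=1, but Y=1 also occurs without X=1.
toy = [(1, 1)] * 30 + [(0, 1)] * 30 + [(0, 0)] * 40
ci = bootstrap_ci(toy, lambda d: direction_score(d, 0, 1))
```

On the toy data, every row with the first variable equal to 1 also has the second equal to 1, so the score is positive and the sketch favours the first variable as the cause. Whether a score of this kind is trustworthy still depends on the independence checks and the no-outside-confounder assumption discussed in the text.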
Given that $X_1, \ldots, X_n \sim F$ are random variables that are independent and identically distributed (i.i.d.) w.r.t. an unknown distribution $F$ with mean $\mu < \infty$ and variance $\sigma^2 < \infty$, the realizations of these random variables form a set $D' = \{x_1, \ldots, x_n\}$. By sampling with replacement from $D'$ $b$ times, we obtain $b$ sets of data sampled from $D'$: $D'_1, \ldots, D'_b$. The process of sampling $D'$ to obtain $D'_1, \ldots, D'_b$ is called "Bootstrapping". The summary statistics of $D'_1, \ldots, D'_b$ approach those of $D'$ when the number of bootstrap replicates $b$ is large [8, 12, 14]. From the perspective of hypothesis testing, suppose the null hypothesis is $H_0: \mu = 0$ and the alternative hypothesis is $H_1: \mu > 0$; we can test whether $H_0$ or $H_1$ is supported by $D'$ using the bootstrap sets $D'_1, \ldots, D'_b$. However, using hypothesis testing alone has several disadvantages: 1) hypothesis testing only indicates whether $H_0$ or $H_1$ is supported by the data, with no information regarding the magnitude of the summary statistics we estimate [19]; 2) hypothesis testing always rejects $H_0$ in some systems even when the effect might be too small [16]; and 3) hypothesis testing faces the problem of repeatability [21]. To address these issues, "estimation statistics" has been developed, which is considered a more informative methodology than hypothesis testing [8, 15, 18, 23]. From the perspective of estimation statistics, the bootstrap sets $D'_1, \ldots, D'_b$ can be used to estimate the $100(1 - \alpha)\%$ confidence interval (CI) of $\mu$. Moreover, if we have two datasets, we can compare the magnitude of the difference between them using the mean-difference CI.

Given [...] Let $\mu_{\Im}$ be the expectation of $\Im$. The null hypothesis is $H_0: \mu_{\Im} = 0$, while the alternative hypothesis is $H_1: \mu_{\Im} > 0$. We use the Mann-Whitney test [27], which is a nonparametric test, to determine whether we can reject $H_0$ at the significance level $\alpha = 0.05$. If $H_0$ is rejected, then we can conclude that $X \not\perp\!\!\!\perp Y \mid Z$. 
From the perspective of estimation statistics, we report the 95%-CI of $\mu_{\Im}$.

2. Odd difference $\mathrm{oddDiff}(X, Y)$: We compute the set $\{\mathrm{oddDiff}_1(X, Y), \ldots, \mathrm{oddDiff}_b(X, Y)\}$ using Eq. 7 from $B$. Let $\mu$ be the expectation of this set. We use the Mann-Whitney test [27] to determine whether we can reject $H_0: \mu = 0$. If $H_0$ is rejected, then we can conclude that the alternative hypothesis $H_1: \mu \neq 0$ is supported. After rejecting $H_0$, $X, Y$ have a positive association if $\mu > 0$; otherwise, for $\mu < 0$, $X, Y$ have a negative association. We report the 95%-CI of $\mu$.

3. Causal direction $\mathrm{causalDir}(X, Y)$: Analogously, the bootstrap replicates of $\mathrm{causalDir}(X, Y)$ in Eq. 8 are tested with the Mann-Whitney test against $H_0$ that their expectation $\mu$ is 0, with the alternative $H_1: \mu \neq 0$. If we cannot reject $H_0$, then there is no conclusion regarding the causal direction of $X, Y$. In contrast, if $H_0$ is rejected, $X \to Y$ if $\mu > 0$; otherwise, for $\mu < 0$, $Y \to X$. We report the 95%-CI of $\mu$.

Let $n$ be the number of data points, $d$ the number of dimensions, and $b$ the number of bootstrap replicates. The vector alignment in Algorithm 3 and the conditional probability estimation in Algorithm 4 both require $O(n)$. Checking whether $X \not\perp\!\!\!\perp Y$, or any other independence check, requires $O(bn) = O(n)$ for the bootstrapping approach, whose replicates are needed to estimate the conditional probability in Eq. 4, where $b$ is typically considered a constant. In Algorithm 2, lines 1-6 require $O(d^2 n)$. Lines 7-16 also require $O(d^2 n)$, since the number of edges is bounded by $O(d^2)$ and the independence check is $O(n)$. Lines 17-26 also require $O(d^2 n)$, for the same bound on the number of edges and because computing the conditional probability requires $O(n)$. Hence, Algorithm 2 has time complexity $O(d^2 n)$.

In the first simulation, there are 10 poverty indicators. Let $X_1, \ldots, X_{10}$ be the random variables of the poverty indicators, $p$ be the probability of a random variable being 1, and $N$ be a random variable with $P(N = 1) = p$. The following equations represent the directed causal relations of these random variables. 
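The simulation's structural equations are not legible in this text. Purely as an illustration of how binary data of this kind can be generated, the sketch below draws samples from a small hypothetical model in which each variable is the logical OR of its parents and an independent Bernoulli($p$) noise term. The three-variable chain, the OR form, and the name `simulate_bscm` are assumptions of this sketch, not the ten-indicator model used in the paper.

```python
import random
from typing import Dict, List


def simulate_bscm(parents: Dict[int, List[int]], n: int, p: float,
                  seed: int = 0) -> List[List[int]]:
    """Draw n rows from a toy binary model: X_j = OR(parents of X_j) OR N_j.

    N_j is an independent Bernoulli(p) noise term. `parents` maps each node to
    its parent nodes and must list nodes in topological order (parents first).
    """
    rng = random.Random(seed)
    order = list(parents)
    rows = []
    for _ in range(n):
        x: Dict[int, int] = {}
        for j in order:
            noise = 1 if rng.random() < p else 0
            # A node is 1 if its noise fires or any of its parents is 1.
            x[j] = int(noise or any(x[i] for i in parents[j]))
        rows.append([x[j] for j in order])
    return rows


# Hypothetical chain X0 -> X1 -> X2 (NOT the paper's ten-indicator model).
toy_parents = {0: [], 1: [0], 2: [1]}
data = simulate_bscm(toy_parents, n=500, p=0.1)
```

Under this assumed form, a cause equal to 1 forces its effect to 1, which mirrors the step "by setting $X = 1$, we have $Y = 1$" in the proof above, and a smaller $p$ yields sparser data.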
For each individual, a value of one in the indicator $X_i$ means that this individual has a poverty issue in the indicator $X_i$. In the first simulation, data are generated using $p \in \{0.5, 0.3, 0.1, 0.05\}$, with 500 individuals for each value. In the second simulation, data are generated with a varying number of individuals $n \in \{50, 100, 150, 300, 500, 750, 1000\}$ and $p = 0.3$.

Infant mortality can be a predictor of poverty [39]. By understanding the causal factors of infant mortality, policy makers might be able to understand more about the poverty situation in an area. This dataset consists of several variables regarding pairs of twins, birth weights, the mortality outcome, etc., from the twin births in the United States in 1989-1991. There are 71,345 pairs of twins in the dataset. The dataset was used in [26], which was included in the literature survey in [20]. In this work, since we are interested only in inferring causal relations among binary variables, we reformat the dataset and use only binary variables: the birth weights of twins and the mortality outcome, along with other risk variables. For the birth weight, the value is one if at least one of the twins weighs 1000 grams or less. Otherwise, it is zero. For the mortality outcome, one represents death and zero represents being alive. We also included other parental risk-factor variables in the analysis: alcohol use, Anemia, Cardiac, chronic hypertension, Diabetes, Eclampsia, Hemoglobinopathy, Herpes, Incompetent cervix, Lung, Pregnancy-associated hypertension, tobacco use, and Uterine bleeding. Each risk-factor variable is one if the risk is present and zero otherwise. Our goal is to use the dataset to evaluate whether the framework is able to reveal the causal relation between birth weight and twin mortality.

The poverty survey used in this paper is from the work in [9]. The survey was taken in 2018. 
The survey covered 378,466 households in Chiang Mai province and 353,910 households in Khon Kaen province. It was conducted for the purpose of analyzing the multidimensional poverty index (MPI) [3, 5]. The survey consists of 31 poverty indicators (see some indicators in Table 1). The MPI value is between 0 and 1. MPI is close to 0 when there is no poverty in any indicator, while it is close to 1 if everyone has issues in almost all indicators. Hence, a lower MPI is better.

To the best of our knowledge, there is no direct method that deals with causal inference from binary variables using frequent pattern techniques except the work in [25]. It uses frequent pattern mining to infer causal relations, called "causal rules", from discrete variables using the concept of odds ratios. The framework is consistent with the potential outcome framework [28, 30] in causal inference [25]. However, the causal-rule framework in [25] assumes that the causal directions are given. Therefore, we modified the causal rule framework so that it can infer causal directions using the same approach as our framework. For the Bayesian network, we deploy the PC algorithm [17], a first practical constraint-based structure learning algorithm, from the "bnlearn" package in R [36, 37]. The PC algorithm is designed for inferring causal structure from data, which suits the causal inference task in this paper. Another baseline is the Frequent-pattern approach, which can be applied to data from binary variables. This approach uses the support and confidence from association rule mining directly to find causal relations. For example, if the confidence of Y given X is higher than that of X given Y, then X causes Y. We compare all methods on the tasks of 1) inferring the transitive causal graph and 2) inferring the directed causal graph. 
In the task of inferring the transitive causal graph, if X causes Y and Y causes Z, then inferring that X causes Z is good enough. However, in task 2), all methods must be able to infer that X causes Y directly but X does not cause Z directly. We measure the performance of all methods on simulation datasets by comparing the inferred causal graphs from both tasks with the ground-truth graph using precision (Pre), recall (Re), and F1 score. A true positive (TP) is the case in which a causal relation or causal edge (e.g. X causes Y) exists in both the inferred and ground-truth graphs. A false positive (FP) is the case in which the causal edge exists in the inferred graph but not in the ground-truth graph. A false negative (FN) is the case in which the causal edge exists in the ground-truth graph but not in the inferred graph. The precision is TP/(TP+FP), the recall is TP/(TP+FN), and the F1 score is 2(Pre*Re)/(Pre+Re).

The performance of three approaches in simulation with different levels of $p$ (the probability of a variable being 1) is reported in Tables 2 and 3. For the task of inferring the transitive causal graph (Table 2), based on the F1 scores, the Frequent pattern approach performed the best, and our approach was second. The PC method performed the worst. For high values of $p$, all approaches performed at their best; the F1 score is equal to 1. However, when $p$ decreases, only the Frequent pattern approach performed well. In the task of inferring the directed causal graph (Table 3), however, the Frequent pattern approach performed the worst, while our approach performed the best. When $p$ decreases, only our approach performed well.

The performance of four approaches in simulation with different numbers of individuals is reported in Fig. 3 and Fig. 4. For the task of inferring the transitive causal graph (Fig. 3), based on the F1 scores, the Frequent pattern approach performed the best, and our approach was second. The Causal rule method was third, and the PC algorithm was last. For high values of $n$, all approaches performed at their best; the F1 score is equal to 1. However, when $n$ decreases, only the Frequent pattern approach performed well. In the task of inferring the directed causal graph, however, the PC algorithm and the Frequent pattern approach performed poorly, while our approach performed the best. When $n$ decreases, only our approach performed well. The result in Fig. 3 is consistent with the result in Table 3.

Fig. 2. Inferred directed causal graphs from a simulated dataset in Section 5.1 with $p = 0.1$, $n = 500$ using four approaches: A. Causal rule method, B. Frequent pattern, C. PC algorithm, and D. Proposed method. Each node represents a variable (e.g. node 1 represents $X_1$ in Eq. 10 and node 4 represents $X_4$ in Eq. 11). Edges represent causal relations between variables.

Fig. 2 illustrates the results of inferring directed causal graphs with the four methods. The proposed method (Fig. 2 D) inferred the correct directed causal graph. The Frequent pattern method (Fig. 2 B) inferred a causal graph that cannot distinguish between direct and indirect causal relations. For example, in Eq. 12, $X_6$ is directly caused by $X_1, X_4$ and indirectly caused by $X_2, X_3$ (Eq. 10) [...]. These results indicate that the Frequent pattern approach is a proper method for the task of inferring the transitive causal graph, which is simpler than the task of inferring the directed causal graph. In contrast, our proposed approach is more appropriate for the task of inferring the directed causal graph. Hence, if the task is to infer directed causal relations from binary data, our approach should be used. 
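The edge-level evaluation behind these comparisons can be sketched in a few lines of Python. This is a minimal illustration rather than the evaluation code used in the study: edges are assumed to be (cause, effect) pairs in an acyclic graph, and the helper names `transitive_closure` and `edge_scores` are hypothetical.

```python
from typing import Set, Tuple

Edge = Tuple[int, int]  # (cause, effect)


def transitive_closure(edges: Set[Edge]) -> Set[Edge]:
    """Add an edge (a, d) whenever a path a -> ... -> d exists (acyclic input assumed)."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and a != d and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def edge_scores(inferred: Set[Edge], truth: Set[Edge]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over edge sets, as defined in the text."""
    tp = len(inferred & truth)   # edges present in both graphs
    fp = len(inferred - truth)   # edges only in the inferred graph
    fn = len(truth - inferred)   # edges only in the ground-truth graph
    pre = tp / (tp + fp) if tp + fp else 0.0
    re = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pre * re / (pre + re) if pre + re else 0.0
    return pre, re, f1


# Ground truth: 1 -> 2 -> 3. The inferred graph also reports the indirect edge 1 -> 3.
truth = {(1, 2), (2, 3)}
inferred = {(1, 2), (2, 3), (1, 3)}

directed_task = edge_scores(inferred, truth)
transitive_task = edge_scores(transitive_closure(inferred), transitive_closure(truth))
```

In this toy case the extra indirect edge lowers precision on the directed task but is counted as correct on the transitive task. That is the same pattern reported for a method that cannot separate direct from indirect relations.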
Let $X$ be the variable for the status of twin birth weights (one if the weight of either child is below 1000 grams and zero otherwise) and $Y$ be the variable for the twin mortality status (one if both children are dead and zero otherwise), considered along with the other parental risk-factor variables. The result of causal inference by the proposed framework is as follows. Only the causal relation between birth weight and twin mortality exists. There is a dependency between $X$ and $Y$. The 95% confidence interval of the dependency estimate $\hat{P}(X \not\perp\!\!\!\perp Y)$ in Eq. 4 is [0.018, 0.020]. The Mann-Whitney test rejects the $H_0$ that $\hat{P}(X \not\perp\!\!\!\perp Y) = 0$ at the significance threshold of 0.05, which implies that a dependency between $X$ and $Y$ exists. The Mann-Whitney test also rejects the $H_0$ that the causalDir value of the pair is 0 at the significance threshold of 0.05. The 95% confidence interval of causalDir in Eq. 8 is [0.523, 0.552], which indicates a causal direction between the two variables. Lastly, the mean of the estimated conditional probability of the effect being one given that the cause is one is 0.94, with a 95% confidence interval of [0.926, 0.950]. Additionally, assuming that $X, Y$ have no confounding factor outside the dataset, since the cause has no parent, this estimated conditional probability equals the corresponding intervention distribution under the $do$ operator, i.e. the distribution of the effect when the cause is fixed to one [31]. Hence, the causal relation between the two variables is supported. It implies that almost all mortality cases in twins involve low birth weight, but not all low-birth-weight twins died. No other risk variables have strong causal relations. This result is consistent with the work in [6], which found that low birth weight has a smaller effect on twin mortality than previously believed; it is not the sole cause of birth mortality. While low birth weight plays a key role in twin mortality, other confounding factors (e.g. genetic) might contribute a significant effect on twin mortality [6].

In Khon Kaen province, there is a sole dependency, between smoking cigarettes $X_{25}$ and drinking alcohol $X_{24}$. The 95% confidence interval of $\hat{P}(X_{24} \not\perp\!\!\!\perp X_{25})$ in Eq. 4 is [0.092, 0.094]. 
The Mann-Whitney test rejects the $H_0$ that $\hat{P}(X_{24} \not\perp\!\!\!\perp X_{25}) = 0$ at the significance threshold of 0.05. There is no evidence of causation between them. In Chiang Mai province, on the other hand, there is also a sole dependency, between smoking cigarettes and drinking alcohol, but the result shows that smoking cigarettes might cause drinking alcohol. The Mann-Whitney test rejects the $H_0$ that $\hat{P}(X_{24} \not\perp\!\!\!\perp X_{25}) = 0$ at the significance threshold of 0.05. The 95% confidence interval of $\hat{P}(X_{24} \not\perp\!\!\!\perp X_{25})$ in Eq. 4 is [0.059, 0.061]. The Mann-Whitney test rejects the $H_0$ that $\mathrm{oddDiff}(X_{24}, X_{25}) = 0$ at the significance threshold of 0.05. The 95% confidence interval of $\mathrm{oddDiff}(X_{24}, X_{25})$ in Eq. 7 is [0.060, 0.062], which implies a positive association. The Mann-Whitney test rejects the $H_0$ that $\mathrm{causalDir}(X_{25}, X_{24}) = 0$ at the significance threshold of 0.05. The 95% confidence interval of $\mathrm{causalDir}(X_{25}, X_{24})$ in Eq. 8 is [0.254, 0.262], which implies $X_{25} \to X_{24}$. Lastly, the mean of $\hat{P}(X_{24} = 1 \mid X_{25} = 1)$ is 0.73 and the 95% confidence interval of $P(X_{24} = 1 \mid X_{25} = 1)$ is [0.725, 0.733]. Additionally, assuming that $X_{24}, X_{25}$ have no confounding factor outside the dataset, since $X_{25}$ has no parent, $\hat{P}(X_{24} = 1 \mid X_{25} = 1) = P(X_{24} = 1 \mid do(X_{25} = 1))$ [31], where $P(X_{24} \mid do(X_{25} = 1))$ represents the intervention distribution of $X_{24}$ obtained by fixing $X_{25} = 1$. This implies that smokers tend to drink alcohol but not vice versa.

The MPI of Khon Kaen is 0.018 while the MPI of Chiang Mai is 0.024. This implies that Chiang Mai has a higher degree of poverty than Khon Kaen. Nevertheless, by using MPI alone, we cannot discover the possible causal relation of smoking causing drinking alcohol. Given the literature, it is not surprising that smoking is associated with drinking alcohol [38]. 
However, because these results come from exploratory data analysis, the causal relation between smoking and drinking alcohol in this study should be regarded as a guideline to a possible causal relation, and it needs to be validated in an experimental study.

In this work, we proposed an exploratory-data-analysis framework for finding possible causal relations among factors that contribute to poverty, using data sources similar to those used in MPI analysis. By combining the causal graph and MPI, we know not only how severe the issue of poverty is but also the causal relations among poverty factors, which can help us target the right issues to solve poverty effectively. We evaluated the proposed framework against several baseline approaches on simulation datasets with varying degrees of noise and numbers of data points. Our framework performed better than the baselines (the Frequent pattern and Causal rule methods) in most cases. The first case study, on twin births in the United States, revealed that almost all mortality cases in twins involve low birth weight, but not all low-birth-weight twins died. The second case study revealed that smoking is associated with drinking alcohol in both provinces. While there was no causal relation in Khon Kaen province, there was a causal relation of smoking causing drinking alcohol in Chiang Mai province. Note that the causal relations inferred by this work are not the real causal relations; they are empirical causal relations that need to be validated. Our main goal is to develop an exploratory data analysis tool that pinpoints possible causal relations to support researchers before validation in field studies to find real causal relations. The framework can be applied beyond the poverty context. Lastly, the framework in this work has already been implemented in the R programming language [33] in the form of the R package "BiCausality" [7]. 
Frequent pattern mining
Mining Association Rules Between Sets of Items in Large Databases
The global multidimensional poverty index (MPI) 2021
United Nations Development Programme Human Development Report Office Background Paper No
Multidimensional Poverty Index 2010: research briefing. OPHI Briefing
The costs of low birth weight
BiCausality: Binary Causality Inference Framework in R
Anon Plangprasopchok, and Suttipong Thajchayapong. 2020. A nonparametric framework for inferring orders of categorical data from category-real pairs
Identifying linear models in multi-resolution population data using minimum description length principle to predict household income
Variable-Lag Granger Causality and Transfer Entropy for Time Series Analysis
Machine Learning and Causal Inference for Policy Evaluation
Bootstrap of the mean in the infinite variance case
The Contribution of Data-Driven Technologies in Achieving the
Some asymptotic theory for the bootstrap
Estimation statistics should replace significance testing
The earth is round (p < .05): Rejoinder
Order-independent constraint-based causal structure learning
Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis
The essential guide to effect sizes: Statistical power, meta-analysis, and the interpretation of research results
A Survey of Learning Causality with Data: Problems and Methods
The fickle P value generates irreproducible results
Frequent pattern mining: current status and future directions
Moving beyond P values: data analysis with estimation graphics
From Observational Studies to Causal Rule Mining
Causal effect inference with deep latent-variable models. Advances in neural information processing systems
On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other
Counterfactuals and causal inference
Bayesian networks: A model of self-activated memory for evidential reasoning
Causal inference in statistics: An overview
Elements of causal inference: foundations and learning algorithms
A Causality Between Health and Poverty: An Empirical Analysis and Policy Implications in the Korean Society
2022. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing
Frank Schilbach, and Vikram Patel. 2020. Poverty, depression, and anxiety: Causal evidence and mechanisms
Learning Bayesian networks with the bnlearn R Package
Bayesian Network Constraint-Based Structure Learning Algorithms: Parallel and Optimized Implementations in the bnlearn R Package
Drinking and Smoking: a Field Study of their Association
Urban poverty and infant mortality rate disparities
Sustainable Utilization of Financial and Institutional Resources in Reducing Income Inequality and Poverty
The poverty trap of education: Education-poverty connections in Western China

The authors would like to thank the National Electronics and Computer Technology Center (NECTEC), Thailand, for providing the resources needed to finish this work successfully. This paper was supported in part by the Thai People Map and Analytics Platform (TPMAP), a joint project between the office of National Economic and Social Development Council (NESDC) and the National Electronics and Computer Technology Center (NECTEC), National Science and Technology Development Agency (NSTDA), Thailand.