key: cord-1018999-8o4kfhe1
authors: Cox, Louis Anthony; Popken, Douglas A.
title: Should air pollution health effects assumptions be tested? Fine particulate matter and COVID-19 mortality as an example
date: 2020-09-02
journal: Glob Epidemiol
DOI: 10.1016/j.gloepi.2020.100033
sha: 480b27e73b0975285b0ca6868484bb5db0213e1f
doc_id: 1018999
cord_uid: 8o4kfhe1

In the first half of 2020, much excitement in news media and some peer reviewed scientific articles was generated by the discovery that fine particulate matter (PM2.5) concentrations and COVID-19 mortality rates are statistically significantly positively associated in some regression models. This article points out that they are non-significantly negatively associated in other regression models, once omitted confounders (such as latitude and longitude) are included. More importantly, positive regression coefficients can and do arise when (generalized) linear regression models are applied to data with strong nonlinearities, including data on PM2.5, population density, and COVID-19 mortality rates, due to model specification errors. In general, statistical modeling accompanied by judgments about causal interpretations of statistical associations and regression coefficients - the current weight-of-evidence (WoE) approach favored in much current regulatory risk analysis for air pollutants - is not a valid basis for determining whether or to what extent risk of harm to human health would be reduced by reducing exposure. The traditional scientific method based on testing predictive generalizations against data remains a more reliable paradigm for risk analysis and risk management. In our view, testing falsifiable theory-based predictions against data not used in deriving the theory is the sine qua non of traditional sound science as applied in disciplines from astronomy to zymology (e.g., Craig 2013), and we know of no clear methodological reason not to apply it in air pollution health effects research.
One purpose of this paper is to discuss and illustrate how nonparametric and graphical (Bayesian network) methods can help to implement this approach in practice, taking as an illustrative example the question of whether a data set provides evidence that past levels of exposure to fine particulate matter (PM2.5) air pollution increase risks of COVID-19-associated mortality. This approach was contrasted with a popular alternative framework, widely favored in regulatory risk assessment and policy making over the past decade, in which scientists use their best judgments - often said to reflect the (never precisely defined) "weight of evidence" (WoE) from all sources that they consider - to draw causal conclusions and make policy recommendations. In this view, regulatory risk assessments and review processes should focus on building consensus about the need to regulate by agreeing on expert judgments about the most appropriate causal interpretations and policy implications of statistical "relationships" and "links" between air pollution and health. The epidemiological "links" in question usually refer to positive exposure concentration-response (C-R) regression coefficients in (selected) regression equations. There is no formal requirement in the WoE framework that further technical details or explanations of these coefficients be specified, such as whether or to what extent they reflect residual confounding, model specification errors, measurement errors, or other non-causal factors, before considering them as evidence for a causal relationship in "causal determination" judgments; thus, for example, finding that multiple high-quality observational studies show an association, even though copollutant exposures are difficult to address, exemplifies the support needed for a judgment of "likely to be a causal relationship" in a WoE framework (Cox 2019a).
The WoE approach does not require that causal judgments have more precise conceptual or operational meanings (e.g., distinguishing between necessary, sufficient, or contributing causes; or between direct and indirect effects; or providing an explicit philosophical or logical basis for defining causal effect); or make unambiguous predictions (e.g., about whether or by how much reducing air pollution levels would reduce health risks, given levels of other causally relevant variables); or that such predictions be tested against data before the conclusions are accepted and used to make policy recommendations. To the contrary, advocates of the WoE approach have objected that upholding these principles for air pollution health effects research would "place a nearly unattainable burden of proof" on a community not accustomed to having to provide such empirical proof for its convictions (Goldman and Dominici 2019, 1399). A second purpose of this paper is to argue that the WoE framework's use of expert judgments about the causal interpretation of significant positive regression coefficients is unnecessary and unsound. It is unnecessary because less restrictive nonparametric techniques allow identification of mutual information (i.e., statistical dependence) between variables without imposing the assumptions of parametric regression models, which invite the risk of model specification errors. It is unsound because significant positive regression coefficients arise in numerous non-causal ways, including model specification error, measurement error, residual confounding, and non-random subject selection; and the regression models and coefficients themselves provide no basis for judging why one is significantly positive, or whether it would remain so if model specification errors and other errors were removed.
Human judgment cannot overcome this statistical limitation: if the information that is logically necessary to ascertain causality is missing, judgment alone cannot provide it. However, if direct causes provide unique information about their effects, nonparametric tests for mutual information (or, conversely, for conditional independence) between random variables provide data-driven tests for potential evidence of causality - no mutual information, no evidence of potential causality between exposure and response - without the necessity of judging why regression coefficients might (or might not) be positive (e.g., Berrett and Samworth 2019; Bouezmarni and Taamouti 2014). Such nonparametric information-based tests are robust to many forms of distortion and measurement error that could invalidate more restrictive parametric modeling assumptions (ibid). The second half of this paper seeks to illustrate the use of nonparametric and information-based (conditional independence test) methods for testing for potential evidence of causality in the PM2.5-COVID-19 example. North (2020) framed these two approaches as a clash of paradigms for how best to use data to reach causal conclusions. We view the first paradigm as the traditional scientific method used in most areas of applied science, and consider its demands for tests of assumptions and conclusions against data to be an essential part of this paradigm. Recent methods of causal analysis have led to increased ability to meet these demands using observational data (Cox 2018).
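The conditional-independence idea can be sketched with a simple partial-correlation check. This is only one of many possible conditional-independence tests (a linear-Gaussian stand-in for the nonparametric tests cited above), and the data and variable names below are synthetic illustrations, not the study's data:

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear influence of
    covariates z from each - a simple linear stand-in for a conditional
    independence test."""
    z1 = np.column_stack([np.ones(len(x)), z])
    rx = x - z1 @ np.linalg.lstsq(z1, x, rcond=None)[0]
    ry = y - z1 @ np.linalg.lstsq(z1, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(0)
temp = rng.normal(size=5000)                       # e.g., winter temperature
pm25 = temp + 0.5 * rng.normal(size=5000)          # exposure driven by temperature
deaths = 2.0 * temp + 0.5 * rng.normal(size=5000)  # response driven by temperature only

r_marginal = np.corrcoef(pm25, deaths)[0, 1]                  # strong raw association
r_partial = partial_corr(pm25, deaths, temp.reshape(-1, 1))   # near zero
print(f"marginal r = {r_marginal:.2f}, partial r = {r_partial:.2f}")
```

Here the marginal exposure-response association is strong, but conditioning on the common cause drives it toward zero, so the proposed necessary condition for causation is not met.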
Specifically, we propose and illustrate the following data analysis steps for implementing the four proposed principles of scientific approach and applying them to assess whether a data set (or more than one) provides evidence that exposures (e.g., to PM2.5) increase risk of an adverse response (e.g., COVID-19 mortality):

- Use conceptually and operationally clear definitions of exposure and response variables and of the causal effects of interest between them. We accept without change the definitions of PM2.5 exposure concentrations and COVID-19 mortality in the data sources used for our example. We propose that to be of interest, a causal effect of exposure on response must satisfy the condition that the response is not conditionally independent of exposure, given the values of other covariates. This is a deliberately expansive constraint, intended to reflect a necessary rather than a sufficient condition; it allows for predictive causation (changes in exposure help to predict subsequent changes in response); manipulative causation (changes in exposure change response probabilities); necessary causation (response probability does not change unless exposure changes); sufficient causation (response probability changes if exposure changes); and various types of contributing causation (response probability changes based on the values of exposure and other variables) (Cox 2018). What it does not allow for is that an association between PM2.5 and COVID-19 mortality can be judged to be "causal," or to provide evidence of a causal or likely causal relationship, if COVID-19 is conditionally independent of PM2.5 given the values of other variables (e.g., winter temperatures, which might affect both PM2.5 and COVID-19 mortality rates).
- Show explicit, independently verifiable derivations of causal conclusions from stated assumptions and data.
To implement this requirement, we use standard software packages (e.g., nonparametric classification and regression tree (CART) software and Bayesian network (BN)-learning software) to perform conditional independence tests (e.g., Vitolo et al. 2018; Nagarajan et al. 2013; Cox 2018). In general, the causal conclusions derived by applying conditional independence tests to observational data are as follows: If the null hypothesis of conditional independence between exposure and response (here, PM2.5 and COVID-19 mortality risk) is not rejected (i.e., if there is no arrow between them in a BN, and if PM2.5 is not identified as a significant predictor of COVID-19 mortality risk after conditioning on other variables such as winter temperatures in a CART tree), then these tests provide no evidence that exposure is a cause of response, in the rather permissive sense just discussed. (Of course, pooling data across many individually underpowered studies might allow a more powerful test. Absence of arrows only indicates that no effect was detected and not necessarily that no effect exists; an effect that is too small to be detected cannot be ruled out. However, for large data sets (e.g., Vitolo et al. 2018), the plausible size of undetected effects is limited, and simulation can be used to put plausible upper bounds on the sizes of hypothesized unobserved effects (Cox and Popken 2015).) If the null hypothesis is rejected, then the data do provide evidence that exposure might be a cause of the response (here, that PM2.5 might be a cause of COVID-19 mortality): the proposed necessary condition is satisfied.

- Provide careful qualification of causal interpretations and conclusions to correctly and transparently characterize remaining uncertainties and ambiguities.
This is done by noting that the conditional independence tests are used to test whether a proposed necessary condition for any causal relationship of interest is satisfied; they do neither more nor less. The remaining uncertainties if conditional independence between PM2.5 and COVID-19 mortality risk is not rejected are about the sizes of effects that might still exist without having been detected (i.e., without having led to rejection of the null hypothesis of conditional independence). This can be illuminated by studying the smallest effect sizes that are reliably detected. The remaining uncertainties if conditional independence between PM2.5 and COVID-19 mortality risk is rejected are about false-positive rates and about why the two variables are not conditionally independent (e.g., does this reflect predictive causation, manipulative causation, omitted confounders, or something else?).

- Show results from empirical tests of causal conclusions (predictive generalizations) implied by causal theories or models against observational data.

In this paper, the empirical tests of causal conclusions consist simply of the conditional independence tests for whether COVID-19 mortality risk is conditionally independent of PM2.5, given the values of other variables (e.g., those identified in a CART tree, a random forest ensemble, or via BN learning as predictors of COVID-19 mortality). The predictive generalization that COVID-19 mortality risk should depend on PM2.5 (if PM2.5 is a direct cause of it) is tested empirically against observational data via conditional independence tests, and the results can be shown explicitly, e.g., as CART trees or BNs with splits or arrows indicating detected dependence relations for which the null hypothesis of conditional independence is rejected based on the observational data.
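The CART conditioning step mentioned above can be illustrated with a minimal variance-reduction split search. This is a toy stand-in for the criterion used by packages such as rpart, run on synthetic data with hypothetical variable names, not on the county data set:

```python
import numpy as np

def best_split_gain(x, y):
    """Largest reduction in sum of squared errors achievable by one binary
    split on predictor x - the criterion CART-style trees use to choose
    split variables."""
    order = np.argsort(x)
    ys = y[order]
    n = len(ys)
    csum, csq = np.cumsum(ys), np.cumsum(ys ** 2)
    total_sse = csq[-1] - csum[-1] ** 2 / n
    best = 0.0
    for i in range(1, n):
        left = csq[i - 1] - csum[i - 1] ** 2 / i
        right = (csq[-1] - csq[i - 1]) - (csum[-1] - csum[i - 1]) ** 2 / (n - i)
        best = max(best, total_sse - left - right)
    return best

rng = np.random.default_rng(1)
temp = rng.normal(size=2000)                       # hypothetical common cause
pm25 = temp + 1.0 * rng.normal(size=2000)          # exposure, a noisy copy of temp
deaths = 2.0 * temp + 0.5 * rng.normal(size=2000)  # response depends on temp only

# A greedy tree prefers the predictor whose split reduces error most, so the
# true cause is chosen ahead of its noisy correlate:
print(best_split_gain(temp, deaths) > best_split_gain(pm25, deaths))
```

Once the tree has split on the true cause, the correlated exposure adds little further error reduction within branches, which is why it tends not to appear as a significant predictor after conditioning.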
Thus, we propose that conditional independence tests now widely available in software used throughout much of machine learning, computational statistics, and data science can be used to support the four steps of the scientific approach. By contrast, the portion of the WoE framework on which we focus also has four steps and supporting statistical methods, as discussed and illustrated subsequently; to us, the key part is an expert judgment about whether significant positive C-R regression coefficients should be treated as evidence that reducing exposure would reduce risk, e.g., based on the Bradford Hill considerations (i.e., strength, consistency, temporality, plausibility, etc. of associations) (Cox 2018). More generally, our main proposal is that innovations in data science, such as conditional independence tests and supporting software, make such judgments unnecessary, at least to the extent that these tools can be applied reliably to the types and sizes of data sets that are available, which has become increasingly practical with the development and widespread application of relevant machine learning and computational statistics methods and packages in recent years (e.g., Dorie et al. 2018; Glymour et al. 2019; Vitolo et al. 2018; Nagarajan et al. 2013; Cox 2018). Instead, the results of testing testable implications of the causal hypothesis that exposure increases risk can be shown as evidence about the extent to which data do or do not support the hypothesis that exposure is a cause of an adverse effect in an exposed population. Many testable implications of the hypothesis that exposures cause adverse health effects, along with principles and algorithms for testing these implications using observational data, have been developed over the past century, and have been shown to work well in practice by various metrics for many simulated and real data sets (Dorie et al. 2018; Glymour et al. 2019).
Examples include the following (Cox 2018):

- Effects depend on their direct causes. Conditional independence tests test this by ascertaining whether data allow the corresponding null hypothesis, that an effect is conditionally independent of a hypothesized direct cause, to be rejected. If the probability distribution for the effect differs significantly for different values of the hypothesized direct cause, holding other potential direct causes fixed, this provides evidence that the effect depends on the hypothesized direct cause.
- Information flows from causes to their direct effects over time. Changes in causes help to predict and explain subsequent changes in the probability distribution of their direct effects. Various formalizations of this concept have been developed (Wiener, 1956; Granger, 1969), recently leading to software implementing nonparametric tests and estimation procedures for information flows between time series variables based on transfer entropy (Behrendt et al. 2019).

The example in this paper focuses on conditional independence testing, which is relatively well developed and undemanding (Glymour et al., 2019; Nagarajan et al. 2013; Pearl 2009): unlike ICP, it can be applied to a single data set; and unlike transfer entropy, it does not require time series data for both cause and effect. But the larger point is that innovations in data science and computational statistics make it practical to test many proposed implications of causality with observational data (Cox 2018; Glymour et al. 2019), or with a mix of observational and interventional data (Triantafillou et al. 2015). Doing so, and displaying the results, advances the application of principles 1-4 above. North (2020) interprets these innovations as a new paradigm for causal modeling.
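The "information flows from causes to effects over time" implication can be sketched with a linear Granger-style comparison. This is a simplified linear stand-in for transfer entropy, and all series below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3000
x = rng.normal(size=n)          # hypothetical cause (e.g., an exposure series)
y = np.zeros(n)                 # hypothetical effect
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.3 * rng.normal()

def sse(design, target):
    """Residual sum of squares from an OLS fit."""
    beta = np.linalg.lstsq(design, target, rcond=None)[0]
    resid = target - design @ beta
    return resid @ resid

target = y[1:]
own_past = np.column_stack([np.ones(n - 1), y[:-1]])            # restricted model
with_cause = np.column_stack([np.ones(n - 1), y[:-1], x[:-1]])  # adds lagged x

# When information flows from x to y over time, adding lagged x
# substantially improves prediction of y beyond y's own past:
print(sse(with_cause, target) < sse(own_past, target))
```

Reversing the roles of x and y in this construction would show no comparable improvement, which is the asymmetry that Granger-style and transfer-entropy tests exploit.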
North may well be right, but we also view testing theory-derived predictions against observations using independently verifiable calculations and reproducible procedures as defining elements of scientific method since at least Galileo (Ross 1971). By contrast, we view the WoE paradigm's rejection of the need for such empirical tests in favor of the authoritative judgments of selected experts as a retreat from the traditional requirements of sound science. Although the WoE paradigm is sometimes described as an approach to "assessing all the evidence," we focus here on its use of judgment to assess the causal significance of epidemiological evidence consisting of significant positive C-R regression coefficients. This type of "evidence" has played a dominant role in recent claims about adverse health effects attributed to PM2.5, including suggestions that PM2.5 increases COVID-19-related mortality risk (Wu et al. 2020). We object to it on the grounds that finding a significant positive C-R regression coefficient typically has no implications for the hypothesis of causality, and appealing to judgment cannot fix this limitation of what regression coefficients show (e.g., that conditioning on exposure reduces mean squared prediction error for the response) or make them show something more relevant for causal inference (e.g., whether changing exposure would change the probability distribution of the response) (Pearl 2009). North traces the divergence of these paradigms to the acceptance into epidemiology and regulatory risk assessment of the influential work of Sir Austin Bradford Hill in the 1960s, which sought a basis for making intuitive judgments about whether epidemiological associations were best explained as being causal, without applying formal causal analysis methods or testing competing explanations.
Ironically, those who favor the WoE framework often characterize calls to apply the scientific method as an attack on science, rather than as a challenge to experts to back up their judgments with science (Tollefson 2019). The frequency and ferocity of ad hominem attacks (e.g., Drugmand 2020) suggest that North's diagnosis of a clash of paradigms may well be correct. This article continues the discussion using an important recent real-world example to illustrate how the paradigms differ and why the choice between them matters: interpreting studies associating air pollution and COVID-19. Concern that compelling evidence for human health benefits caused by tighter regulation of air pollution might be unattainable from real-world data (Goldman and Dominici 2019) is well-founded: the benefits assumed and claimed in WoE analyses have proved difficult to find in evidence-based studies that have compared public health risks before and after pollution-reducing interventions or changes (Burns et al. 2020), even under conditions where they should have been easily seen if they were approximately as large as claimed (Cox and Popken 2015). For example, for fine particulate matter (PM2.5), a positive correlation between levels of PM2.5 and mortality is clear in many studies - both PM2.5 levels and mortality rates are higher in some times and places than in others, inducing strong correlations and regression coefficients between them. This has sufficed to drive causal determinations and recommendations to regulate in a WoE framework that deals in vague "links" and "relationships." But in multiple studies in multiple countries over many years, reducing PM2.5 has not been found to have an unequivocal causal effect on - or, in many studies, even a clear association with - changes in all-cause mortality risk (Burns et al. 2020, Vitolo et al. 2018), which would be the hallmark of a genuine causal relationship between them (Pearl 2009). For example, Burns et al.
(2020), after reviewing 42 such studies, conclude that "Given the heterogeneity across interventions, outcomes, and methods, it was difficult to derive overall conclusions regarding the effectiveness of interventions in terms of improved air quality or health. Some evidence suggests that interventions are associated with improvements in air quality and human health, with very little evidence suggesting interventions were harmful." Of course, as Burns et al. (2020) also emphasize, absence of clear evidence is not clear evidence of absence of an effect, although perhaps it is clear evidence of absence of an effect large enough to detect in the studies reviewed, or of the sizes predicted by regression models when regression coefficients are interpreted causally (Cox and Popken 2015). The absence of clear evidence that regulations or other interventions that reduce ambient air pollution in recent decades have caused reductions in all-cause mortality has often been met by using expert judgments, regression models of associations, and counterfactual causal interpretations of regression model results to predict that these changes should take place in theory (i.e., according to the regression models if they are interpreted causally), whether or not they actually do take place. Predictions from computer models stocked with consensus assumptions, rather than empirical validation of predictions against data, are generally treated as sufficient in the WoE paradigm to draw conclusions and policy recommendations to be shared with policy-makers and the press.
For example, in the United States, the United States Environmental Protection Agency (US EPA) BenMAP-C computer model uses regression models and expert judgments to predict how changes in air pollution would change public health effects, even though the detailed documentation for its health impact functions repeatedly notes that causal information was not included (Cox 2016). It is well understood in epidemiology that, technically, correlation is not causality and regression coefficients reflect only whether predictors help to predict the dependent variable in a regression model (e.g., reducing the mean squared error (MSE) or increasing the value of the likelihood function for regression-based predictions), and not whether or how much changing the values of predictors would change the distribution of the dependent variable (Pearl 2009). Nonetheless, it remains common practice to present estimated or assumed air pollution concentration-health response (C-R) associations and regression coefficients as if they were causal relationships with life-and-death implications; users of BenMAP-C often make this assumption (Cox 2016). Authoritative expert judgment and consensus bridge the evidentiary gap: a C-R association is treated as if it were causal if appropriate authorities - or the scientists doing the work and reporting the results - agree that they think causality is the best explanation for it (Cromar and Ewart 2016). As one recent example among many, Chen et al. (2020) estimated that a "reduction in PM2.5 during the [COVID-19] quarantine period avoided a total of 3214 PM2.5-related deaths (95% CI 2340-4087) in China, 73% of which were from cardiovascular diseases" during a 34-day quarantine period.
These numbers were calculated by assuming (and therefore predicting) that reducing PM2.5 concentrations causes approximately proportional reductions in daily mortalities, with the constant of proportionality being estimated from statistical associations in past data. Thus, the claim that reducing PM2.5 "avoided a total of 3214 PM2.5-related deaths" is not driven by observations of any actual reduction in death counts compared to what would have been expected in the absence of reduced PM2.5. Rather, it reflects a judgment that previously estimated statistical slope coefficients describing C-R associations should be used to project reduced mortalities. Such judgment-based projections require no observations about actual death counts during the quarantine period. That comparing model predictions to real-world observations is not necessary for applying the WoE paradigm is also well illustrated by studies in the United States that predict human health benefits from reducing ambient pollution levels. Such predictions can be generated conveniently using the BenMAP-C software, which supplies the judgment-based assumptions and regression models needed to generate positive health benefits estimates. In defending this use against objections that it treats association as causation, assumptions as data, and hypothetical predictions as facts (Cox 2016), proponents explained that "The purpose of our report was not to demonstrate causation between exposure to O3 and PM2.5 air pollution and adverse health effects. Our estimates of excess morbidity and mortality are based not simply on observed associations but, rather, on the 'hundreds of epidemiology studies and decades of related scientific research' that clearly establish a relationship between exposure to PM2.5 and O3 and adverse health outcomes.
… BenMAP is a well-established research tool that has been used by many investigators to develop estimates of the health benefits that can be achieved by reducing air pollution" (Cromar and Ewart 2016). Here, the precedent and popularity of treating an established statistical C-R "relationship" (specifically, positive C-R regression coefficients) between exposure to PM2.5 and O3 and adverse health outcomes as being causal is deemed sufficient justification for continuing the practice. The burden of empirical proof - either showing that the model projections successfully predict real-world experience (e.g., that substantial reductions in PM2.5 are followed by corresponding changes in the adverse health effects said to be caused by PM2.5), or else explaining why not and revising the assumptions in the BenMAP model accordingly - is avoided by substituting computer simulations for reality and appealing to expert judgment and tradition to decide whether to accept the simulation results as real for purposes of policy making and risk communication. Failure of the projected benefits to be detected in real data (Cox and Popken 2015; Vitolo et al. 2018; Burns et al. 2020) is of no consequence in a WoE framework that treats expert judgment, precedent, and consensus as the ultimate arbiters for which modeling assumptions and predictions should be accepted. But this also deprives those who rely on the results of the opportunity to learn from reality and to correct errors in modeling assumptions. "Burden of proof" (Goldman and Dominici 2019) may be too strong a phrase, in that epidemiological papers seldom seek to prove their causal conclusions, but only to present evidence, which is usually less than conclusive.
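The projection logic behind "avoided deaths" estimates of this kind can be reproduced in a few lines. The inputs below are invented placeholders (not Chen et al.'s actual coefficient or baseline), and the log-linear attributable-fraction form is one common choice in such health impact calculations; the point is that no observed death counts from the intervention period enter the calculation at all:

```python
import math

# All values below are hypothetical placeholders for illustration only.
beta = 0.0006                    # assumed log-linear slope per ug/m3 of PM2.5
delta_c = 15.0                   # assumed average PM2.5 reduction (ug/m3)
baseline_daily_deaths = 10000.0  # assumed baseline daily deaths in the region
days = 34                        # assumed length of the intervention period

# Attributable fraction implied by a log-linear C-R model:
attributable_fraction = 1.0 - math.exp(-beta * delta_c)
avoided_deaths = baseline_daily_deaths * days * attributable_fraction
print(f"projected 'avoided deaths': {avoided_deaths:.0f}")
```

Any positive slope estimate mechanically yields a positive projection: the output restates the assumed coefficient rather than reporting an observation.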
However, the guidance that causal claims should make explicit, empirically testable predictions (such as that effects are not conditionally independent of their direct causes, and other implications discussed previously), and that these predictions should in fact be tested and the results presented before stating causal conclusions, is not onerous to implement, as illustrated next. As COVID-19 mortalities mounted worldwide in the first two quarters of 2020, environmental activists and scientists rushed to shape policy with headlines and scientific articles warning that fine particulate matter air pollution (PM2.5) increases risk of COVID-19-related illness and death. Once again, WoE thinking and unverified model predictions paved the way. For example, Jiang et al. (2020) used a Poisson regression model to conclude that PM2.5 and humidity increased the risk of daily COVID-19 incidence in three Chinese cities, while coarse particulate air pollution (PM10) and temperature decreased this risk. Bashir et al. (2020) was interpreted as implying that "A small increase in long-term exposure to PM2.5 leads to a large increase in the COVID-19 death rate." This interpretation attracted national headlines and widespread political concern (Friedman 2020). These examples follow a common technical approach with the following steps, which we view as exemplifying WoE thinking as it applies to interpreting evidence from one (or more) regression models:

1. Collect data on estimated air pollution levels, one or more adverse health outcomes of interest (such as COVID-19 mortality), and covariates of interest (e.g., humidity, temperature, population density, etc.).
2. Fit one or more regression models to the data, treating air pollution levels as predictors and adverse health outcomes as dependent variables. Include other variables as covariates at the modeler's discretion.
3.
If the regression coefficient for a pollutant as a predictor of an adverse health outcome is significantly positive in the one or more regression models, use judgment to interpret this as evidence that reducing levels of the pollutant would reduce risk of the adverse health outcome.
4. Communicate the results to policy makers and the press using the policy-relevant language of causation and change - that is, claim that a given reduction in pollution would create a corresponding reduction in adverse health outcomes - rather than in the (technically accurate) language of association and difference: that a given difference in estimated exposures is associated with a corresponding difference in the conditional expected value of a dependent variable predicted by the selected regression model.

Step 3 is based on a judgment that a positive regression coefficient in a modeler-selected regression model is evidence of a causal relationship: that it implies or suggests that reducing exposure would reduce risk, even if the experiment has not actually been performed. In this respect, it incorporates the central principle of the WoE framework: that a well-informed expert scientist can make a useful judgment about whether the association indicated by a statistically significant positive regression coefficient is likely to be causal. We next scrutinize this assumption. As noted by Dominici et al.
(2014), either significant positive coefficients or significant negative regression coefficients (or no significant regression coefficient at all) for air pollution as a predictor of mortality risk can often be produced from the same data, depending on the modeling choices made; thus "There is a growing consensus in economics, political science, statistics, and other fields that the associational or regression approach to inferring causal relations - on the basis of adjustment with observable confounders - is unreliable in many settings." In the field of air pollution health effects research, however, investigators continue to rely on regression modeling in step 2 of the above approach. A skilled regression modeler can usually produce a model with a significant positive regression coefficient for exposure in step 2, allowing steps 3 and 4 to proceed. We illustrate next how this can be done, using a data set on PM2.5 and COVID-19 mortality in the United States as an example. The data set, described and provided via a web link in Appendix A, compiles county-level data on historical ambient PM2.5 concentration estimates, COVID-19 mortality rates and case rates (per 100,000 people) through April of 2020, along with numerous other county-level variables. A key step in regression modeling is to select variables to include in the model.

Figure 1. Importance plots for several variables as predictors of COVID-19 mortality (left) and case rates (right) per 100,000 people. The plots are generated by random forest nonparametric model ensembles that explain about 48% of the variance in mortality rates and 40% of the variability in case rates among counties in the United States as of April, 2020. Appendix A provides details and data. "Importance" is measured as the percentage increase in mean squared prediction error ("%IncMSE") if a variable is dropped as a predictor.
Variable labels for the most important variables are defined in the text and in Appendix A; see the data sources in Appendix A for all variables.

The presence of an arrow between two variables in the Bayesian network (BN) of Figure 2, such as the one between PCT_HISP and PM2.5, does not suggest that either causes the other: it simply reflects that counties with higher percentages of Hispanic populations tend to also have higher PM2.5 levels. However, if variables depend on their direct causes, then absence of an arrow between two variables corresponds to absence of empirical evidence in the BN that either directly causes the other. COVID-19 mortality in Figure 2 is shown as depending directly on latitude and longitude (which are presumably surrogates for other biologically effective causes), as well as on time since first case in a county (FirstCaseDays), average winter temperature, and ethnic composition (PCT_BLACK and PCT_HISP). Figure 3 shows an analogous BN for the COVID-19 case rate, which depends directly on latitude and longitude, ethnic composition (PCT_BLACK and PCT_HISP), time since first case in a county (FirstCaseDays), and education (PCT_ED_HS).

Bayesian network learning is a relatively new technique for exploring and visualizing direct and indirect dependencies among variables. As an alternative, Figure 4 shows a classification and regression tree (CART) for COVID-19 mortality. The CART algorithm (implemented in the rpart package in R) recursively partitions counties into clusters with significantly different COVID-19 mortality rates, based on the results of binary tests ("splits"), such as whether Longitude < -75.61 (yes = left branch, no = right branch). For example, the counties with Longitude < -75.61, PCT_BLACK < 0.2636, and time since first case < 44.5 days have an average COVID-19 mortality rate of less than 3 per 100,000 (2.436, although 3 significant digits is spurious precision), compared to a rate over 50 times greater (148.7 per 100,000) for counties further to the east with high population densities and longer times since first cases.
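The split-selection step that CART repeats at each node can be sketched in a few lines (a minimal illustration with made-up data, not the county data set; the variable names are ours):

```python
# One CART-style split: scan thresholds on a predictor and keep the one that
# minimizes the total within-group sum of squared errors (SSE) of the outcome.
def best_split(x, y):
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    best_sse, best_threshold = float("inf"), None
    for j in range(1, len(xs)):
        if xs[j] == xs[j - 1]:
            continue  # can only split between distinct predictor values
        left, right = ys[:j], ys[j:]
        sse = sum((v - sum(left) / len(left)) ** 2 for v in left) + \
              sum((v - sum(right) / len(right)) ** 2 for v in right)
        if sse < best_sse:
            best_sse, best_threshold = sse, (xs[j - 1] + xs[j]) / 2
    return best_sse, best_threshold

# Toy outcome with a sharp change at x = 0.5; the split recovers the breakpoint.
x = [i / 100 for i in range(100)]
y = [0.0] * 50 + [1.0] * 50
sse, threshold = best_split(x, y)
print(threshold)  # midpoint between 0.49 and 0.50, i.e. about 0.495
```

Growing a full tree such as Figure 4 amounts to applying this search over every candidate predictor at every node and recursing on the two resulting subsets until a stopping rule (e.g., minimum node size) is met.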
Although CART trees are subject to residual confounding due to their binary splits of continuous variables, and are not very robust in the sense that fitting them to multiple random samples from the same data set often produces different trees (e.g., with WINTERAVGTMP in some and FebMinTmp2000.2019 in others), they provide a relatively simple, well-established nonparametric technique for exploring significant predictors of a selected dependent variable such as DeathRate100K. The predictors identified in Figure 4 were used as the right-hand-side variables of a linear regression model, equation (1). For simplicity, equation (1) follows Wu et al. (2020) in assuming that risk depends on a weighted sum of the terms on its right side, ignoring interaction terms (e.g., that increasing PM2.5 should not increase death rates if population density = 0); consequences of this modeling choice are discussed later.

Fitting equation (1) to the data set via ordinary least squares (OLS) regression yields Table 1. The regression coefficient for PM2.5 (2000-2016AveragePM25 in Tables 1 and 2) is negative and not significantly different from 0 (p = 0.87), consistent with Figure 2. However, regression modeling allows modelers to select variables to include in the model, which can drive the results that get published (Dominici et al. 2014). For example, dropping Longitude from the regression model yields Table 2. Now the regression coefficient for PM2.5 is positive instead of negative, and it is highly significant (p = 0.000053) instead of non-significant. In effect, PM2.5 acts as a partial surrogate for longitude in predicting COVID-19 mortality risk, so that omitting longitude induces PM2.5 to enter with a significant positive coefficient. Interpreting Table 2 as evidence that an increase in PM2.5 increases COVID-19 mortality risk would be mistaken: it is only evidence that the modelers made choices (such as omitting longitude from the model) that led to a positive regression coefficient.
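The sign change produced by dropping longitude can be reproduced in a small simulation (a sketch with invented coefficients and noise levels, not the county data; here a longitude-like confounder drives both PM2.5 and mortality risk, and PM2.5 itself has no effect):

```python
import random

random.seed(1)
n = 2000
lon  = [random.gauss(0, 1) for _ in range(n)]           # confounder (longitude proxy)
pm   = [0.8 * l + random.gauss(0, 0.5) for l in lon]    # exposure tracks the confounder
risk = [1.0 * l + random.gauss(0, 0.5) for l in lon]    # risk driven by confounder only

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def slope(xv, yv):
    """Simple OLS slope of yv on xv."""
    return cov(xv, yv) / cov(xv, xv)

# Model that omits the confounder: risk ~ pm yields a strongly positive slope.
naive = slope(pm, risk)

# Adjusting for the confounder (Frisch-Waugh: residualize both variables on
# lon, then regress residual on residual) drives the exposure slope toward 0.
b_pm, b_risk = slope(lon, pm), slope(lon, risk)
pm_res   = [p - b_pm * l for p, l in zip(pm, lon)]
risk_res = [r - b_risk * l for r, l in zip(risk, lon)]
adjusted = slope(pm_res, risk_res)

print(f"omitting confounder: {naive:.2f}; adjusting for it: {adjusted:.2f}")
```

The same mechanics drive the contrast between Tables 1 and 2: the exposure earns its positive coefficient only when the variable that drives both it and the outcome is left out.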
This example not only illustrates the obvious point that omitting from a regression model predictors such as longitude, on which both PM2.5 and COVID-19 mortality rate depend (Figures 2, 4 and 5), can induce a significant positive C-R regression coefficient for PM2.5 when COVID-19 mortality rate is regressed against it and other variables; it also illustrates the more constructive point that nonparametric methods can help to identify variables that must be conditioned on to avoid such spurious C-R coefficients. The BN in Figure 2 indicates that longitude provides information about both PM2.5 and COVID-19 mortality rates that the other variables do not, implying that it is a potential confounder that must be adjusted for in order to obtain unbiased C-R coefficients (Textor et al. 2015). The linear model (1) includes longitude as a predictor for this reason.

More generally, there are many reasons that PM2.5 might have a significant positive regression coefficient that do not imply that increasing PM2.5 would increase risk. As a simple conceptual example, suppose that PopDensityLog is a confounder of the PM2.5-Risk association, and that the structural equations describing the causal relationships among these variables are as follows:

E(Risk) = PopDensityLog² (2)
PM2.5 = PopDensityLog² (3)

and that the regression model fit to the data is

Risk = b0 + b1*PopDensityLog + b2*PM2.5 (4)

If model (4) is fit to a large data set, e.g., 1000 observations in which PopDensityLog is randomly sampled from a continuous distribution (e.g., with values uniformly distributed between 0 and 1) and corresponding values of E(Risk) and PM2.5 are calculated using equations (2) and (3), respectively, then the ordinary least-squares fit will be b0 = 0, b1 = 0, and b2 = 1. Thus, regression identifies a significant positive coefficient for PM2.5, and not for the confounder PopDensityLog, because these parameter values minimize prediction error (MSE). But this coefficient has no relevance for determining how or whether changing PM2.5 would change Risk.
A claim that such a regression analysis had "controlled for" potential confounding from PopDensityLog by including it in regression model (4), and yet still found that PM2.5 increased risk, would be wrong. A judgment that such an analysis provides evidence that increasing PM2.5 levels increases risk would be mistaken.

As a less hypothetical example, suppose we create a synthetic data set that is identical to the one for Table 2, except for the addition of a new Risk variable defined as Risk = PopDensityLog². In other words, we artificially create a variable that we know is determined only by population density, via the nonlinear formula Risk = (log(population density))². (This example is suggested by Figure 6, which shows a scatter plot of COVID-19 deaths per 100,000 against PopDensityLog.) Fitting a multiple linear regression model to the data with this artificial Risk variable as the dependent variable yields the results in Table 3. All but one of the predictors, including PM2.5 (2000-2016AveragePM25), have highly statistically significant positive regression coefficients, even though, by construction, Risk does not depend on anything other than PopDensityLog. The reason is that the multiple linear regression model's assumption that risk depends only on a weighted sum of the predictors is false. As illustrated in Figure 6 for the real risk variable (DeathRate100k), risk varies nonlinearly with PopDensityLog. The mistaken modeling assumption of linearity is sufficient to induce many other predictors to enter the regression model with significant positive coefficients, because including them helps to reduce the mean squared prediction error due to model misspecification. Again, interpreting such a positive regression coefficient for exposure as evidence that reducing exposure would reduce risk is mistaken. Instead, positive regression coefficients are only evidence that the assumed regression model does not describe the data.
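The effect of fitting a linear model to a nonlinear truth can be checked directly (a self-contained sketch: the outcome is the square of a density variable by construction, the exposure is a noisy function of the same density, and OLS is solved from the normal equations; all numbers are invented for illustration):

```python
import random

random.seed(0)
n = 1000
dens = [random.random() for _ in range(n)]                  # PopDensityLog in [0, 1]
risk = [d ** 2 for d in dens]                               # true risk: density only
pm   = [d ** 2 + 0.01 * random.gauss(0, 1) for d in dens]   # exposure, density-driven

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gaussian elimination."""
    k = len(X[0])
    A = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    c = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[piv] = A[piv], A[p]
        c[p], c[piv] = c[piv], c[p]
        for r in range(p + 1, k):
            f = A[r][p] / A[p][p]
            for q in range(p, k):
                A[r][q] -= f * A[p][q]
            c[r] -= f * c[p]
    b = [0.0] * k
    for p in range(k - 1, -1, -1):
        b[p] = (c[p] - sum(A[p][q] * b[q] for q in range(p + 1, k))) / A[p][p]
    return b

# Linear model: risk = b0 + b1*dens + b2*pm.  Density is "controlled for",
# yet the exposure absorbs the unmodeled curvature and gets a large coefficient.
b0, b1, b2 = ols([[1.0, d, p] for d, p in zip(dens, pm)], risk)
print(f"b0={b0:.3f}  b1(dens)={b1:.3f}  b2(pm)={b2:.3f}")
```

Modeling the quadratic term directly would return the explanatory weight to density; as in Table 3, the linearity assumption alone is what hands the exposure its large positive coefficient.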
Nonparametric methods help to avoid these difficulties. Figure 7 shows a CART tree for the same example as in Table 3. In this tree, as also in a non-parametric Bayesian network fit to the same data, the only predictor of PopDensityLog² found is PopDensityLog.

Figure 7. A CART tree model for the dependent variable PopDensityLog².

This example illustrates that even including the right variables (such as measured confounders) in an adjustment set to obtain unbiased estimates of a C-R coefficient in a regression model, controlling for confounders without introducing collider biases (Textor et al. 2015), does not suffice to prevent spurious significant positive C-R coefficients if the model form is incorrectly specified – for example, by assuming a generalized linear model or no interactions when nonlinearities and interactions among predictors are important, as they are in this example (Figures 4 and 5). More constructively, it shows that nonparametric methods can help to avoid such spurious C-R coefficients by clarifying which variables provide unique information about a dependent variable (Figure 7), and which merely reduce errors in predicting the dependent variable by helping to correct for the errors introduced by improper specification of the model (Table 3).

More generally, PM2.5 could have a significant positive regression coefficient as a predictor of COVID-19 mortality risk for any or all of the following reasons (Cox 2020):

• Model specification errors, e.g., if mortality rate is assumed to depend on a weighted sum of variables, but in fact its dependence is better described by a model with nonlinearities or interaction terms, as in Table 3;
• Omitted confounders, as in the example of PM2.5 and COVID-19 mortality risk depending on latitude and longitude (independently of other factors, as shown in Figure 2), if these factors are omitted;
• Measurement errors in explanatory variables, e.g., if PM2.5 is correlated with other variables that are measured or estimated with error, so that including PM2.5 in the regression reduces prediction error due to uncertainties about those variables;
• Residual confounding, e.g., if older people tend to live in more polluted areas and, independently, to have higher mortality rates, but age is only measured in wide categories such as "% of people aged 65 or older";
• Use of surrogate variables, e.g., average winter temperature since 2000, rather than more causally relevant variables such as low temperatures during the months of COVID-19 in 2020;
• Unmodeled interactions or dependencies among variables, e.g., if PM2.5 modifies or is modified by variables such as humidity and temperature that affect respiratory illnesses and COVID-19 mortality.

A positive regression coefficient explained by one or more of these sources does not provide evidence that reducing PM2.5 would reduce mortality risk. We do not conclude from the foregoing considerations that PM2.5 exposure does not increase COVID-19 mortality risk, but only that the published analyses do not show that it does. This need not mean that the conclusion is false, but it does mean that the conclusion is not implied by the data and regression analyses from which it is said to be derived. Likewise, attempts to use weight of evidence (WoE) judgments to synthesize all relevant evidence across studies risk producing conclusions of dubious validity if they do not correct such errors and biases in the individual studies being synthesized. Repeating errors many times (as when many investigators fit generalized linear regression models to many different data sets while ignoring key confounders, nonlinearities, interaction terms, etc.
in each case) can produce consistency without making results any less erroneous or increasing the weight of evidence for a genuine effect. The analyses presented here illustrate that such errors are easy to avoid using modern data science methods, such as non-parametric trees (or ensembles) to avoid model specification errors and to incorporate nonlinearities and interactions, and tests of conditional independence in BNs (or CART trees) to identify potential confounders that should not be omitted. In the examples presented, these methods show that a statistically significant C-R regression coefficient "linking" PM2.5 to COVID-19 mortality risk could be an artifact of omitted variables and improperly specified parametric regression modeling; for the example in Table 3, this significant positive coefficient disappears when these errors are remedied using non-parametric methods (Figure 7).

The traditional paradigm of causal determination by expert judgment, associated with Sir Austin Bradford Hill, is being challenged by another paradigm based on statistical procedures to distinguish between association and causation. Perhaps the most fundamental prescription of the causal paradigm is to recognize that statistical "links" such as positive regression coefficients (or relative risks greater than 1, or positive attributable risks and burdens of disease, and so forth) are neither more nor less than indicators of statistical association, which do not necessarily or usually provide relevant evidence about causation (Pearl 2009). Modern methods such as the Bayesian network in Figure 2 can help to discover what evidence a data set does provide that some variables depend, directly or indirectly, on others. For example, Figure 2 suggests that latitude and longitude have direct effects on COVID-19 mortality risk (meaning, effects in addition to those mediated by the other variables in Figure 2).
This discovery might not have been anticipated intuitively by an investigator, leading to the omission of these confounders, as in Wu et al. (2020). Such computer-assisted discoveries from data may assist, but not replace, the scientific work of formulating testable predictions about whether and how much changes in some variables affect changes in others, and then testing these predictions against new data and reporting the results. If COVID-19 mortality risk appears to be conditionally independent of PM2.5 in non-parametric analyses with adequate power to detect even relatively small effects (Figure 4), then parametric regression modeling that imposes assumptions on the data sufficient to create a positive regression coefficient for PM2.5 (Tables 2 and 3) should not be construed as evidence that changing PM2.5 would change COVID-19 mortality risk. But neither should it preclude a search for alternative hypotheses, backed by empirical testing, that better explain the observations. For example, for the same average PM2.5 concentration over the past 20 years, do counties with constant or increasing PM2.5 levels over time have significantly earlier first dates of COVID-19 mortalities than counties with PM2.5 levels that decreased over time? Figure 2 leaves open this possibility, and additional research might pursue it further.

The question of which scientific hypotheses are worth investigating further is surely a proper matter for expert judgment. Interpretation of regression coefficients as evidence that reducing exposure would reduce risk is not. Thus, we conclude that empirical testing of predictive generalizations against data should not be skipped in favor of applying judgment to regression coefficients to draw policy-relevant causal conclusions. Regression coefficients simply do not provide the information needed to determine – or to make sound judgments about – whether or to what extent they are likely to be causal (Table 3).
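A crude version of the conditional-independence check described above can be sketched by stratifying on the suspected confounder (a toy illustration, not a substitute for the BN or CART tests; the data and functional forms are invented):

```python
import random

random.seed(2)
n = 5000
dens = [random.random() for _ in range(n)]                  # confounder
pm   = [d + random.gauss(0, 0.3) for d in dens]             # exposure tracks confounder
risk = [d ** 2 + random.gauss(0, 0.05) for d in dens]       # outcome: confounder only

def corr(a, b):
    """Pearson correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

marginal = corr(pm, risk)   # unconditionally, exposure and outcome look linked

# Condition on the confounder: within narrow density strata the link fades.
within = []
for k in range(10):
    idx = [i for i in range(n) if k / 10 <= dens[i] < (k + 1) / 10]
    within.append(corr([pm[i] for i in idx], [risk[i] for i in idx]))
avg_within = sum(within) / len(within)

print(f"marginal r = {marginal:.2f}; mean within-stratum r = {avg_within:.2f}")
```

Formal BN learning replaces these ad hoc strata with proper conditional-independence tests, but the logic is the same: an association that fades once the right variables are conditioned on is not evidence of a direct effect.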
Judgment that seeks to bridge the gap between association and causation based on positive regression coefficients, as in BenMap and its applications (Cromar and Ewart 2017), is akin to a

We collected data from many sources, including most of those cited by Wu et al. (2020), but with alternate authoritative sources for temperature, humidity, and cases/deaths data. We used more recent data for PM2.5, demographics, temperatures, and cases/deaths, and added further sources and fields. For example, we collected USDA county-level economic characterizations along with various county attributes compiled by the UC Berkeley Yu Group (2020).

In the CAT software, click on "Tree" to generate CART trees (Figure 4); we used the tree generated by the rpart package, as this is older and better documented than the partykit package. The CAT software provides links to documentation on the algorithms and R packages used; book-length treatments are also available (e.g., Cox LA Jr., Popken DA, Sun RX. Causal Analytics for Applied Risk Analysis. Springer, 2018). A short introduction to Bayesian network learning, random forest, and CART algorithms is Cox (2018). In the CAT software, click on "Importance" to generate random forest importance plots (Figure 1), and "Bayesian" to generate a BN using the bnlearn R package (Figure 2) (Nagarajan et al. 2013). The "Bayesian" option allows constraints to be entered on possible arrow directions. To generate Figure 2, we specified Longitude and Latitude as sources (only outward-pointing arrows allowed) and DeathRate100k as a sink (only inward-pointing arrows allowed), since latitude and longitude cannot be effects of other variables (but might be causes), and death cannot be a cause of other variables (but might be an effect).
Table 1
Table 2
Table 3

References

Correlation between environmental pollution indicators and COVID-19 pandemic: A brief study in Californian context
RTransferEntropy – Quantifying information flow between different time series using effective transfer entropy
Nonparametric independence testing via mutual information
Nonparametric tests for conditional independence using conditional distributions
Interventions to reduce ambient air pollution and their effects on health: An abridged Cochrane systematic review
Air pollution reduction and mortality benefit during the COVID-19 outbreak in China
Implications of nonlinearity, confounding, and interactions for estimating exposure concentration-response functions in quantitative risk analysis
Should health risks of air pollutants be studied scientifically? Global Epidemiology
Improving causal determination
Modernizing the Bradford Hill criteria for assessing causal relationships in observational data
Concentration-response associations used to estimate public health benefits of less pollution are not valid causal predictive models
Has reducing fine particulate matter and ozone caused reduced mortality rates in the United States?
Constructing theories in communication research
Reply: Concentration-response associations used to estimate public health benefits of less pollution are not valid causal predictive models
A note on the correctness of the causal ordering algorithm
Science and regulation: Particulate matter matters
Automated versus do-it-yourself methods for causal inference: Lessons learned from a data analysis competition
EPA clean air panel chair dismisses his oil industry ties, slams Harvard study on air pollution and COVID risks
Review of causal discovery methods based on graphical models
Don't abandon evidence and process on air pollution policy
Investigating causal relations by econometric models and cross-spectral methods
Invariant causal prediction for nonlinear models
Effect of ambient air pollutants and meteorological variables on COVID-19
Bayesian Networks in R with Applications in Systems Biology
Commentary on "Should health risks of air pollution be studied scientifically?"
Causal inference in statistics: An overview
Martinus Nijhoff, The Hague
Causal ordering and identifiability
Robust causal inference using directed acyclic graphs: the R package 'dagitty'
Air pollution science under siege at US environment agency
Top EPA adviser attacks agency decision-making ahead of major review of air pollution standards
Constraint-based causal discovery from multiple interventions over overlapping variable sets
Modeling air pollution, climate, and health data using Bayesian Networks: A case study of the English regions
The theory of prediction
Exposure to air pollution and COVID-19 mortality in the United States

Acknowledgments

We thank five anonymous reviewers and the Editor-in-Chief for their helpful comments and suggestions. These encouraged us to explicitly list how the principles of the scientific paradigm in the introduction can be applied using CART trees, random forest, and Bayesian networks; explain why our critique of regression models is not a universal condemnation of modeling, but only a criticism of clearly identifiable and avoidable errors, such as omitting variables on which exposure and response both depend, or fitting generalized linear main-effects models to data with important nonlinearities and interactions; and clarify that conditional independence tests provide
information about evidence of possible causality even if not all arrows in a Bayesian network model have clear causal interpretations. We appreciate the reviewers' questions and comments, which contributed to a more explicit discussion of these and other methodological points.

The authors have no competing interests to declare.