key: cord-0028110-e41hz0zm authors: Erickson, Chrystal L.; Barron, Ileana G.; Zapata, Isain title: The effects of hydraulic fracturing activities on birth outcomes are evident in a non-individualized county-wide aggregate data sample from Colorado date: 2021-10-07 journal: J Public Health Res DOI: 10.4081/jphr.2021.2551 sha: 3658b20a47ab3ca77d21a02b3344ca24667bde85 doc_id: 28110 cord_uid: e41hz0zm Background: There is growing concern about the recent increase in oil and gas development using hydraulic fracturing. Studies linking adverse birth outcomes and maternal proximity to hydraulic fracturing wells exist but tend to use individualized maternal and infant data contained in protected health care records. In this study, we extended the findings of these past studies to evaluate if analogous effects detected with individualized data could be detected from non-individualized county-wide aggregated data. Design and methods: This study used a retrospective cohort of 252,502 birth records from 1999 to 2019 gathered from a subset sample of 5 counties in the state of Colorado where hydraulic fracturing activities were conducted. We used Generalized Linear Models to evaluate the effect of county-wide well density and production data over unidentified birth weight, and prematurity data. Covariates used in the model were county-wide statistics sourced from the US Census. Results: Our modeling approach showed an interesting effect where hydraulic fracturing exposure metrics have a mixed effect directional response. This effect was detected on birth weight when well density, production and their interaction are accounted for. The interaction effect provides an additional interpretation to discrepancies reported previously in the literature. Our approach only detected a positive association to prematurity with increased production. 
Conclusions: Our findings demonstrate two main points: First, the effect of hydraulic fracturing is detectable by using countywide unidentified data. Second, the effect of hydraulic fracturing can be complicated by the number of operations and the intensity of the activities in the area.

Hydraulic fracturing is a highly contentious topic. There is great fear and anxiety over the implications of this unconventional oil drilling practice on air and groundwater pollution. In Colorado, this concern has led to several recent local and statewide ballot measures to either restrict wells a certain distance from homes and schools, or to stop this practice altogether. 1, 2 Hydraulic fracturing involves drilling vertically then horizontally for several miles to access shale embedded with natural gas or coalbed methane. During the process, a pressurized mixture of sand, water, and proprietary fracking fluid is injected into wellbores, fracturing the rock and unlocking trapped hydrocarbons. 3 Many of the chemicals used as hydraulic fracturing fluids are known carcinogens and have been linked to reproductive or developmental toxicity. 4, 5 According to the U.S. Geological Survey (USGS), since the end of the 20th century, the use of this technique to produce oil and gas from previously unproductive formations has dramatically increased, which has pushed hydraulic fracturing and related processes into regions where oil and gas had not previously been produced. 6 This drilling practice has evolved and spread worldwide as an alternative to coal mining for energy production. 7 With this relatively new proliferation of unconventional drilling, only recently have studies begun to emerge evaluating evidence of long-term negative health effects of hydraulic fracturing. 4, 5, 8-11 This research has shown there are negative environmental and health impacts for those residing in areas where hydraulic fracturing takes place.
These negative effects include higher illness rates, negative birth outcomes, and other changes to morbidity and mortality when compared to areas where there is no fracking. 4, 5, 12 The only report evaluating such effects in Colorado found positive associations between density and proximity to natural gas wells within a 10-mile radius of maternal residence and prevalence of congenital heart disease and possibly neural tube defects. 9 However, several other studies evaluating the effects of hydraulic fracturing on birth outcomes, including birth weight, in other states and Canada have had mixed results. 12, 13 The majority of studies have reported a negative association that varies in effect size, 14-17 but contradictory results that include positive associations with certain caveats have also been reported, in addition to those from Colorado. 18, 19

The proliferation of hydraulic fracturing for oil and natural gas production has led to an increase in interest in the public health impact of this industry. Research in this field can be complicated due to data accessibility and concerns of privacy violations. In this study we focus on the assessment of maternal health outcomes while considering data privacy. The main goal of our study was to evaluate the potential of using non-individualized, county-wide data to detect the effects of hydraulic fracturing activities on birth outcomes. This goal was achieved by using county-wide exposure metrics of hydraulic fracturing well density and production and by adjusting to known demographic covariates sourced from Census data. Our study provides an alternate approach to evaluate health effects of hydraulic fracturing activities and provides additional evidence highlighting the complicated effect associations that should be considered in further studies.

Birth weight of an infant is an important determinant of its chances of survival and healthy growth and development. 20 Because birth weight is conditioned by the health and nutritional status of the mother, the proportion of infants born with low birth weight closely reflects the health status of the communities into which they are born and thus is used as a health indicator by the CDC to assess the health of the nation. 21 A normal term birth weight in the United States is between 2,500 and 4,000 grams. 22 It is unclear how evidence of association between birth weight and hydraulic fracturing can still be inconclusive when recent cohort studies have included very large sample sizes of over a million 15 and close to 3 million 18 births along with very accurate covariate data to adjust for confounding variables. In contrast, prematurity has been more consistently associated with hydraulic fracturing activities, where a higher exposure is associated with increased risk of preterm birth 12, 18 with only a single report showing no association. 17 A common strategic pattern in these studies is the use of well proximity to the mother's site of residence as a proxy estimator of exposure. This proxy estimator approach has provided very important evidence to associate hydraulic fracturing activities with health-related outcomes; however, the use of such a strategy requires birth location data along with the mother's and infant's clinical data, which can potentially identify cohort subjects, causing privacy issues. Privacy issues are at this moment a very strong public concern that requires urgent, real and equitable solutions. 23 Unfortunately, public concerns about the handling of protected data reduce a research team's access to valuable data.
24, 25 For this reason, and in view of the healthcare burden and public health concern for hydraulic fracturing activities, we took a similar approach to previous studies but only using non-identifiable county-wide data to explore the wide utility of the methodology. The use of nonidentifiable data can reduce the burden for researchers, facilitating and accelerating discovery in the field while addressing privacy concerns. Therefore, the objective of this study was to evaluate generalized effects of hydraulic fracturing exposure from aggregated birth weight and gestational age data (measured as increased risk of additional weeks of premature birth) collected from a subset sample of 5 counties across the state of Colorado. The approach presented in this study opens up the rationale for creative research approaches that can facilitate research advancement in a way that does not invade the privacy of individuals. Our study was designed to evaluate the utility of non-identifiable county-wide birth weight and prematurity records with generalized production metrics as proxies for hydraulic fracturing exposure data. This evaluation was performed while adjusting for confounding demographic variables sourced from generalized census data by county. Since our study was designed with the goal of being able to detect birth weight and prematurity effects based on county-wide data, which could be expected to be noisier in comparison to individualized data sets, only a small subset of counties in the state were included. The five Colorado counties included in the study, Adams, Baca, Garfield, Moffat, and Weld, covered a wide range of characteristics in terms of population metrics and hydraulic fracturing activities. Counties like Adams and Weld counties are semi-urban and densely populated while counties like Baca are rural and very sparsely populated. The geographical location of these counties is presented in Figure 1 . 
Our study was vetted by the Institutional Review Board (IRB #2020-0023). Although the information is publicly available, a formal request for data was submitted to the Colorado Department of Public Health and Environment's Vital Birth Statistics registry. Specifically, annual, county-wide birth statistics for the five Colorado counties included birth weight (in grams), estimated gestational age at birth (in weeks) and the sex of the infant. Each data set included all babies born between the 1999 and 2019 calendar years. No additional statistics or personal data associated with the mothers or infants were attached to this data. The initial dataset included 277,837 births; incomplete records and cases of extreme prematurity (<28 weeks) were excluded from the study. Extremely premature babies were excluded to avoid adding into the data any bias associated with the survivability of these babies. This is because the survival of babies born extremely premature is highly dependent on access to a neonatal intensive care unit, which is a variable not considered in the study, 26 and which can also be associated with complex demographic factors. 27 The final data set included 252,502 births (90.88% of the initial data set). For the purpose of adjusting birth weight models, we used preterm birth as a covariate, which is defined as a live birth delivered before 37 completed weeks of gestation. However, in our models we used prematurity defined as the preterm weeks, obtained by subtracting the estimated gestational age from the maximum value in the data set (45 weeks). This conversion helped the estimation by inverting the scale direction (higher value = higher prematurity), focusing the range on what is relevant for interpretation purposes (as the viability of a birth is greatly reduced with increased prematurity).
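The record filtering and the prematurity conversion described above can be illustrated with a minimal sketch. The field names (`birth_weight_g`, `gestational_age_weeks`, `sex`) and the four example records are hypothetical and used only for illustration; the 28-, 37- and 45-week constants and the two birth counts are the values stated in the text. The analysis itself was run in SAS, so this is not the study's code.

```python
# Constants taken from the text of the study.
MAX_GESTATIONAL_WEEKS = 45   # maximum gestational age in the data set
PRETERM_CUTOFF_WEEKS = 37    # preterm = live birth before 37 completed weeks
EXTREME_PRETERM_WEEKS = 28   # births below this age were excluded


def prematurity_weeks(gestational_age_weeks):
    """Invert the scale so that a higher value means higher prematurity."""
    return MAX_GESTATIONAL_WEEKS - gestational_age_weeks


def term_category(gestational_age_weeks):
    """Classify a birth as 'preterm' or 'normal' using the 37-week cutoff."""
    return "preterm" if gestational_age_weeks < PRETERM_CUTOFF_WEEKS else "normal"


def keep_record(record):
    """Drop incomplete records and extremely premature births (<28 weeks)."""
    required = ("birth_weight_g", "gestational_age_weeks", "sex")
    if any(record.get(field) is None for field in required):
        return False
    return record["gestational_age_weeks"] >= EXTREME_PRETERM_WEEKS


# Hypothetical example records (field names are assumptions, not the registry's).
records = [
    {"birth_weight_g": 3400, "gestational_age_weeks": 40, "sex": "F"},
    {"birth_weight_g": 2300, "gestational_age_weeks": 35, "sex": "M"},
    {"birth_weight_g": 900, "gestational_age_weeks": 26, "sex": "F"},    # extreme prematurity
    {"birth_weight_g": None, "gestational_age_weeks": 39, "sex": "M"},   # incomplete
]

cleaned = [
    {
        **r,
        "prematurity_weeks": prematurity_weeks(r["gestational_age_weeks"]),
        "term": term_category(r["gestational_age_weeks"]),
    }
    for r in records
    if keep_record(r)
]

# Share of the initial data set retained, using the counts reported in the text.
retained_pct = round(100 * 252_502 / 277_837, 2)
```

With this conversion, a birth at 40 weeks maps to 5 preterm weeks and a birth at 35 weeks maps to 10, so larger values of the dependent variable correspond to earlier births.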
Hydraulic fracturing well data was collected through the Colorado Oil and Gas Conservation Commission's website (available at: https://cogcc.state.co.us/data.html#/cogis). This information is publicly available. Data collected included the number of wells in active production and the total gas production by county, reported yearly. Data was matched by both county and year for the same time range of 1999 to 2019 as in the birth outcomes dataset. Well density was calculated by year per county by dividing the number of wells by the total surface area of the county. Production values were log transformed prior to analysis to reduce scale issues. Confounding demographic data that has been previously associated with birth weight and prematurity risk were compiled from the 2000 and 2010 US Census reports and from the 2019 American Community Survey (ACS) (data available at https://data.census.gov/cedsci/). All covariate data is publicly available. Data gathered included the following categories: Population, which included total county population and population density (calculated in the same manner as well density, total population divided by total surface area of the county); Age, which included the percentage of population under 5 years of age, percentage of population under 18 years of age and percentage of population over 65 years of age; Gender, which only included the female percentage ratio; Race, which included the percentage of households that identify as Caucasians (alone), African-Americans, Asians, Native Americans in addition to Hispanics (although Hispanic is not a race but an ethnicity); Education, which included the percentage of the population that completed high school and the percentage of the population that completed a Bachelor's level degree or above; Income, which included adjusted household income (income by household adjusted to 2019 dollars to address inflation) and percentage of population at poverty level (defined by Federal guidelines). Data was analyzed
using Generalized Linear Models. Dependent variables were birth weight (in grams) and prematurity (as risk for additional preterm weeks). To reduce collinearity issues by category, each covariate was assessed by including it as a single independent factor in the model and selecting the best performing p-value (smallest value within the category). Variables selected by this single independent factor approach were the same for both models. The birth weight model was set as Gaussian distributed since the original units for this variable are continuous and normally distributed; the prematurity model was set as Poisson distributed since the original values are discrete counts with a Poisson distribution. Preliminary models for birth weight evaluated the additive interaction effects of Well density*Production and Sex*Term (Term defined as preterm if the estimated gestational age was below 37 weeks, otherwise defined as normal). Significant values for the interaction terms in these preliminary models justified their inclusion in the final model (P≤0.05). For birth weight, the final model estimate is expressed in the original units (grams) and defined as follows: In a similar manner, the prematurity final model was defined through preliminary models where the inclusion of the additive interaction effect of Well density*Production was evaluated. The Sex*Term interaction was not evaluated for prematurity since the variable Term duplicates the dependent variable. For prematurity, the final model estimate is expressed in hazards ratio of one-week increase and defined as follows: All modeling analyses and descriptive statistics were performed in SAS v9.4 (SAS Institute, Cary NC) through PROC MEANS and PROC GLIMMIX. Significant differences were declared at an α threshold of 0.05.
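The covariate screening step described above (one single-factor model per covariate, then keeping the covariate with the smallest p-value within each category) can be sketched as follows. This is an illustration under stated assumptions: the covariate names and every p-value in `example_p_values` are hypothetical placeholders, not results from the study, and the single-factor models themselves, which were fitted in SAS, are not reproduced here.

```python
def select_covariates(single_factor_p_values):
    """For each covariate category, keep the covariate with the smallest p-value.

    `single_factor_p_values` maps category -> {covariate name -> p-value},
    where each p-value comes from a model with that covariate as the only factor.
    """
    selected = {}
    for category, p_by_covariate in single_factor_p_values.items():
        selected[category] = min(p_by_covariate, key=p_by_covariate.get)
    return selected


# HYPOTHETICAL p-values, for illustration only (not taken from the study).
example_p_values = {
    "Population": {"total_population": 0.004, "population_density": 0.0007},
    "Age": {"pct_under_5": 0.03, "pct_under_18": 0.01, "pct_over_65": 0.002},
    "Education": {"pct_high_school": 0.0001, "pct_bachelor_or_above": 0.02},
    "Income": {"adjusted_household_income": 0.04, "pct_poverty": 0.003},
}

selected = select_covariates(example_p_values)
```

Keeping a single covariate per category limits the number of closely related demographic variables entering the full model together, which is the collinearity concern the screening is meant to address.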
Our study used a total of 252,502 birth records spanning from 1999 to 2019 in five counties (Adams, Baca, Garfield, Moffat and Weld) that allow for hydraulic fracturing activities in the state of Colorado. These counties represent a wide range of demographic characteristics that occur in the state of Colorado. These demographic characteristics span all categories measured in the study, which are presented in detail in Supplementary Table 1. Birth outcomes are presented in Table 1. We observed mean birth weights to significantly vary in some counties (p=3.35E-126), along with mean estimated gestational age at birth, which also significantly varied in some counties (p=9.91E-32). No significant variation of sex ratios by county was observed (p=0.1104). However, the observed female percent ratios were all significantly biased towards a lower female proportion and deviated from the expected 50% (p-value range for Binomial probability of 0.0217 for Baca County to 7.86E-19 for Adams County). Last, preterm ratios did show significant differences by county (p=5.20E-8). All these differences are reported, although evaluating them is outside the scope of our study. Exposure to hydraulic fracturing activities was evaluated in our study through production and well density by county per year. Yearly trends for these two exposure metrics are presented in Figure 2. The production output of these 5 counties varied through the years, with mean values within 3 orders of magnitude, from the least producing Baca County with an average of 1,710,339 units per year to Garfield County with an average of 416,260,300 units per year. Similarly, well density varied within a range of 0.0618 to 4.7536 active wells per square mile as yearly average. Among the five counties evaluated, we observed mixed increases and decreases in production and well density through the span of the studied timeframe.
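For reference, the two exposure metrics summarized above can be computed as in the following minimal sketch. The function names and the example well count and county area are hypothetical. The natural logarithm is an assumption, since the text states only that production was log transformed. The two county production averages are the values reported above.

```python
import math


def well_density(active_wells, county_area_sq_mi):
    """Active wells per square mile for one county in one year."""
    if county_area_sq_mi <= 0:
        raise ValueError("county area must be positive")
    return active_wells / county_area_sq_mi


def log_production(total_production):
    """Log-transformed yearly production (natural log is an assumption)."""
    if total_production <= 0:
        raise ValueError("production must be positive to be log transformed")
    return math.log(total_production)


# Hypothetical county-year: 500 active wells in a 2,000 square mile county.
example_density = well_density(500, 2000)

# Mean yearly production values reported in the text.
baca_mean_production = 1_710_339
garfield_mean_production = 416_260_300

# Gap between the least and most productive counties, in orders of magnitude.
orders_of_magnitude = math.log10(garfield_mean_production / baca_mean_production)
```

On the log scale the gap between the least and most productive counties is a difference of a few units rather than a factor of several hundred, which is the scale issue the transformation is meant to reduce.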
It is noteworthy that well density does not directly explain production, as the most productive county (Garfield) has 45% lower average well density than the densest county (Weld). Single factor Generalized Linear Models were used for a preliminary evaluation of covariate inclusion for birth weight and prematurity. These covariates were always significant in these models (data not shown), although some categories became non-significant as they were included in the full model along with all other covariate categories. For the birth weight model (Table 2), we successfully detected a strong positive association to exposure variables (well density and production) using county-wide, non-individualized data, which achieves part of our main goal. This association poses a very interesting situation, since the well density by production interaction was significant but contributed to the model with a negative parameter estimate. The mix of positive and negative contributions of the two exposure variables is suggestive of a complex association to exposure where not only the proximity, but also the amount of activity is important for estimating health effects. Although two recent studies have incorporated production output into their evaluations, 18, 28 interaction effects have not been explicitly reported besides main effects. Only one of these studies, by van Tran in 2020, 18 evaluated birth weight, finding small mixed associations when comparing rural to urban locations while also accounting for active and inactive wells. In the literature, the most common exposure metric used to evaluate the effect of hydraulic fracturing activities is well proximity to the mother's residence, which is determined by geographical location.
In addition to the previous studies, other studies have incorporated well density into the equation 9, 16, 17, 19 with some of them reporting mixed results; however, by not incorporating additional metrics to account for the production intensity, no further explanation of the results could be offered. Some covariate categories in the full model were also significant, including sex of the infant, Term (denominated as preterm or normal at birth) and gender ratio as the strongest effect variables based on absolute values of parameter estimates. The covariate category of Income that represented the percentage of the population at poverty level was also significant but contributed marginally to the model. For the prematurity model (Table 3) , our approach was also successful in detecting significant associations using countywide, non-individualized data. For this model, we observed a different pattern when compared to the association seen with the birth weight data where only production was significantly associated to prematurity while well density and its interaction with production were not. Although not all variables in the exposure category were significant, parameter estimate effect directions were the same as in birth weight with the main effects positively associated while the interaction effect was negatively associated. Our findings closely resemble previous reports where increased preterm risk is consistently associated with exposure to hydraulic fracturing activities. 9, 16, 18, 19, 28 For this model, demographic covariates more consistently remained as statistically significant when included in the full model. Among those categories that remained significant, sex of the infant was strongly associated while gender ratio showed a modest effect based on the parameter estimates. Population, age, education and income categories were also significantly associated, although their contributions to increased risk were much smaller. 
In addition to presenting significant associations and parameter estimates for both of the full models in Tables 2 and 3, we present an outcome scenario table based exclusively on exposure (Table 4). This table presents an empirical visualization of exposure on birth weight and prematurity. This table was constructed using parameter estimates presented in Tables 2 and 3 and minimum and maximum values for the full dataset. For well density, the minimum and maximum values were 0.0062573 and 7.0582524 active wells per square mile respectively, while for production, they were 328,894 and 955,469,444 respectively. In this table, it is clear how hydraulic fracturing increases birth weight when well density is at the low end (an increase of ~100 g) when comparing lowest versus highest production. This difference is reversed when well density is at the high end (a decrease of ~400 g) when comparing the lowest versus the highest production. The negative effect of the interaction between these parameters is the source of this discrepancy. A normal birth term cohort model (parameter estimates not shown) is also presented in Table 4 as a comparison; this cohort presents an analogous scenario. A very similar pattern also occurs for prematurity when following the same rationale, although it must be emphasized that for this model only production was significant.

Table 3. Prematurity model results. Type 3 test for fixed effect outcomes and parameter estimates.

Table 4. Outcome scenarios based exclusively on exposure. Parameter estimates for exposure variables were used to calculate outcome scenarios using the data set's lowest and highest values. This outcome scenario allows for visualization of the effect of well density, production and the interaction effects. For birth weight, all three effects were significant in the full model while for prematurity, production was the only significant effect.
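The logic behind an exposure-only scenario table of this kind can be sketched as follows. The three coefficients are hypothetical: they were chosen only to reproduce the reported sign pattern (positive main effects, negative interaction) and are not the estimates in Table 2. The minimum and maximum exposure values are those reported in the text, and the natural logarithm is an assumption about the log transformation.

```python
import math

# HYPOTHETICAL coefficients (grams), chosen only to mimic the sign pattern;
# they are NOT the parameter estimates reported in the study.
B_DENSITY = 100.0         # main effect of well density
B_LOG_PRODUCTION = 12.5   # main effect of log production
B_INTERACTION = -9.0      # well density x log production interaction

# Minimum and maximum exposure values reported in the text.
DENSITY_MIN, DENSITY_MAX = 0.0062573, 7.0582524
PRODUCTION_MIN, PRODUCTION_MAX = 328_894, 955_469_444


def exposure_contribution(density, production):
    """Exposure-only part of the linear predictor for birth weight (grams)."""
    log_prod = math.log(production)  # natural log is an assumption
    return (
        B_DENSITY * density
        + B_LOG_PRODUCTION * log_prod
        + B_INTERACTION * density * log_prod
    )


def production_effect(density):
    """Change from lowest to highest production at a fixed well density."""
    return (
        exposure_contribution(density, PRODUCTION_MAX)
        - exposure_contribution(density, PRODUCTION_MIN)
    )


low_density_effect = production_effect(DENSITY_MIN)    # positive with these values
high_density_effect = production_effect(DENSITY_MAX)   # negative with these values

# Density at which the production slope (B_LOG_PRODUCTION + B_INTERACTION * d)
# changes sign.
crossover_density = -B_LOG_PRODUCTION / B_INTERACTION
```

Because the slope of production is `B_LOG_PRODUCTION + B_INTERACTION * density`, a negative interaction term means that the effect of higher production is positive at low well density and negative once density passes the crossover value, which is the reversal the scenario table shows.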
Our findings suggest that by evaluating interaction effects in exposure models, researchers can better define complex relationships that often lead to associations with counterintuitive effect directions. The main goal of our study was to explore the use of non-individualized, county-wide data to estimate the effects of hydraulic fracturing on birth weight and increased prematurity risk. Our study was conceived as the starting point and an alternative to current mainstream methods. The approach we describe can be further refined to incorporate more specialized and detailed metrics, which are likely to yield more precise estimates. For this reason, the intention with this study was not to emphasize directly the parameters estimated but to demonstrate how this approach can be a viable alternate method for analysis. The approach presented here can be useful when privacy concerns and the handling of protected data are likely to be a limitation of future studies. 24, 25 Findings in our study are limited by the representativeness of the sample used. In Colorado, an estimated 74% of counties are classified as rural but only 13% of the total population in the state lives in those counties 29 . For this reason, the counties included in our study were not randomly selected. The five counties sampled were selected for being demographically and geographically similar to population parameters for the whole state. In the sampling considered here, a large proportion of the data (91%) came from Adams and Weld counties, which are two largely suburban and semi-urban counties that are included in the Denver-Aurora Combined Statistical Area. In contrast Baca, Garfield and Moffat counties are largely rural, noting that, by volume, Garfield County is a major producer of natural gas in the state. The diversity of hydraulic fracturing practices along with their demographics and population proportions make the 5 counties chosen a good representation of the state. 
The representativeness of our sample can be debated, and for that reason we avoid making specific parameter estimate inference the focus of our study. Despite the inherent study limitations and challenges posed by using non-identified data, it is noteworthy that this study closely corroborates the findings of the only previous study performed in Colorado, by McKenzie et al. in 2014, 9 where mixed effect associations were detected for birth weight along with a positive association to preterm birth risk using detailed location data along with the mother's and the child's information. The use of non-individualized county-wide aggregated data likely affects the signal to noise ratio, with noise expected to be larger in our approach. The larger noise that comes from imprecise aggregate data is a welcome challenge for demonstrating the concept we present in this study: lower resolution data dampens the signal, so an effect that is still detected implies a strong association. Unfortunately, this premise also has a negative side, because our approach is likely to decrease statistical power, which can be detrimental when precise estimations of weaker effects are required. Additionally, the implementation and validity of our approach is predicated on a large series of related studies; these studies provide a precedent that is necessary to evaluate and judge the outcomes of this new approach. Due to the limitations discussed previously, this new concept cannot replace traditional mainstream approaches but provides an additional option to explore the data. The main goal of our study was to evaluate the potential of using non-individualized, county-wide data to detect the effects of hydraulic fracturing activities on birth outcomes.
We achieved this main goal by detecting strong associations between birth outcomes and county-wide exposure metrics of well density and production, while adjusting for known demographic covariates that were sourced from Census derived data. More specifically, birth weight was found to be positively associated with well density and production but negatively associated with their interaction effect. This mixed effect direction association to exposure parameters provides an interpretation of the mixed outcomes reported by previous studies. In contrast, we only detected a strong positive association between production and increased prematurity risk, which is concordant with previous studies. In summary, our study provides an alternate approach to evaluate health effects of hydraulic fracturing activities and provides additional evidence highlighting the complicated effect associations that should be considered in further studies.

1. Partisanship and proximity predict opposition to fracking in Colorado
2. Local Government Fracking Regulation: A Colorado Case Study
3. The process of unconventional natural gas production: hydraulic fracturing
4. A Systematic review of the epidemiologic literature assessing health outcomes in populations living near oil and natural gas operations: Study quality and future recommendations
5. Unconventional oil and gas development and risk of childhood leukemia: Assessing the evidence
6. When did hydraulic fracturing become such a popular approach to oil and gas production? Accessed
7. Shale gas: Analysis of its role in the global energy market
8. Childhood hematologic cancer and residential proximity to oil and gas development
9. Birth outcomes and maternal residential proximity to natural gas development in rural Colorado
10. Relationships between indicators of cardiovascular disease and intensity of oil and natural gas activity in Northeastern Colorado
11. Congenital heart defects and intensity of oil and gas well site activities in early pregnancy
12. Systematic review of the association between oil and natural gas extraction processes and human reproduction
13. Density and proximity to hydraulic fracturing wells and birth outcomes in Northeastern British Columbia, Canada
14. Fracking and infant mortality: fresh evidence from Oklahoma
15. Hydraulic fracturing and infant health: New evidence from Pennsylvania
16. Maternal residential proximity to unconventional gas development and perinatal outcomes among a diverse urban population in Texas
17. Perinatal outcomes and unconventional natural gas operations in Southwest Pennsylvania
18. Residential proximity to oil and gas development and birth outcomes in California: A retrospective cohort study of 2006-2015 births
19. Unconventional natural gas development and birth outcomes in Pennsylvania
20. The contribution of low birth weight to infant mortality and childhood morbidity
21. United Health Foundation. Natality public-use data. America's Health Rankings analysis of CDC WONDER Online Database
22. International Notes Update: Incidence of low birth weight
23. On privacy in the age of COVID-19
24. Information privacy research: An interdisciplinary review
25. A review of big data and medical research
26. Survival of infants born at periviable gestational ages
27. Racial/ethnic disparities in neonatal intensive care: A systematic review
28. Drilling and production activity related to unconventional gas development and severity of preterm birth
29. The State of Health in Rural Colorado

We would like to thank Dr. Amanda Brooks for her assistance in improving the quality of our manuscript.

Ethics approval and consent to participate: The study was vetted by the Rocky Vista University Institutional Review Board (IRB #2020-0023). This study used non-identifiable data that is available in the public domain.

Informed consent: Not applicable.

The data used to support the findings of this study are available from the corresponding author upon request.