key: cord-0910067-572g21m3
authors: Turner, Nicholas A; Pan, William; Martinez-Bianchi, Viviana S; Panayotti, Gabriela M Maradiaga; Planey, Arrianna M; Woods, Christopher W; Lantos, Paul M
title: Racial, Ethnic, and Geographic Disparities in Novel Coronavirus (SARS-CoV-2) Test Positivity in North Carolina
date: 2020-09-08
journal: Open Forum Infect Dis
DOI: 10.1093/ofid/ofaa413
sha: 9e34be3d8ab4528476cfd5f70afe3e418f5f661b
doc_id: 910067
cord_uid: 572g21m3

BACKGROUND: Emerging evidence suggests that Black and Hispanic communities in the United States are disproportionately affected by coronavirus disease 2019 (COVID-19). A complex interplay of socioeconomic and healthcare disparities likely contributes to disproportionate COVID-19 risk.

METHODS: We conducted a geospatial analysis to determine whether individual- and neighborhood-level attributes predict the local odds of testing positive for SARS-CoV-2. We analyzed 29,138 SARS-CoV-2 tests within the 6-county catchment area of the Duke University Health System from March to June 2020. We used generalized additive models to analyze the spatial distribution of SARS-CoV-2 positivity. Adjusted models included individual-level age, gender, and race, as well as neighborhood-level Area Deprivation Index (ADI), population density, demographic composition, and household size.

RESULTS: Our dataset included 27,099 negative and 2,039 positive unique SARS-CoV-2 tests. The odds of a positive SARS-CoV-2 test were higher for males (OR 1.43, 95% CI 1.30-1.58), Blacks (OR 1.47, 95% CI 1.27-1.70), and Hispanics (OR 4.25, 95% CI 3.55-5.12). Among neighborhood-level predictors, percent Black population (OR 1.14, 95% CI 1.05-1.25) and percent Hispanic population (OR 1.23, 95% CI 1.07-1.41) were also associated with the odds of a positive SARS-CoV-2 test.
Population density, average household size, and Area Deprivation Index were not associated with SARS-CoV-2 test results after adjusting for race.

CONCLUSIONS: The odds of testing positive for SARS-CoV-2 were higher for both Black and Hispanic individuals, as well as within neighborhoods with a higher proportion of Black or Hispanic residents, confirming that Black and Hispanic communities are disproportionately affected by SARS-CoV-2.

Coronavirus disease 2019 (COVID-19) was first reported in the United States in January 2020. Within two months, cases had been confirmed in all 50 states.(1) As of June 30, 2020, 2,545,250 cases and 126,369 deaths had been reported.(2) Emerging data suggest that particular racial and ethnic groups in the United States may be disproportionately affected by the pandemic. For example, hospitalization surveillance data gathered by the Centers for Disease Control and Prevention (CDC) found that Black patients accounted for 33% of COVID-19-related hospitalizations despite representing just 18% of the catchment population.(3) Similarly, a recent report from the Baltimore-Washington, DC region found a SARS-CoV-2 test positivity rate above 40% among Hispanic individuals.(4)

Geographic, racial, and socioeconomic disparities in disease risk have implications for pandemic mitigation, suppression, and surveillance strategies. Disproportionate comorbidity burdens may increase the risk of disease or adverse outcomes among the most vulnerable. Access to medical evaluation may be hindered by distance from healthcare facilities, lack of reliable transportation, and differences in insurance coverage. Financial strain may hinder the ability of individuals to adhere to social distancing and stay-at-home orders, and fear of exposure may inhibit sick individuals from seeking timely medical care.
To investigate the potential influence of geographic and racial disparities on the likelihood of COVID-19, we conducted a geospatial analysis of SARS-CoV-2 test results using clinical testing data from the Duke University Health System (DUHS). We hypothesized that the spatial distribution of the probability of a positive test result would be heterogeneous, with test positivity more likely among residents of urban, low-income, and minority communities.

This study was determined exempt by the Duke University Health System Institutional Review Board. Waivers of informed consent and HIPAA authorization were granted.

SARS-CoV-2 nucleic acid amplification testing data were obtained from the electronic health records of patients within DUHS. The DUHS clinical sites include three inpatient hospitals and many outpatient facilities. We queried all patients whose record included a test with a name containing the terms "COVID" or "SARS" from March 11, 2020 (the date of the first test performed) through June 26, 2020. Our unit of analysis was the unique individual test. Tests were de-duplicated according to the following rules: for individuals tested multiple times, we included only one test every 14 days, counting sequentially from their first test. For any individual who tested positive, we included their first positive test in our analysis but excluded all subsequent test results.

For each subject identified, we obtained the test result, date of the test, date of birth, gender, self-reported race, self-reported ethnicity, and the longitude and latitude coordinates of their residential address. Ethnicity was consolidated into the categories "Hispanic," "Not Hispanic," and "Unavailable." Individual race was grouped into the categories "Black," "White," "Asian," "Multiracial," "Other," and "Unavailable."
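The test de-duplication rules above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' code (the analysis itself was done in R): the record type `LabTest` and the helper `deduplicate` are hypothetical; "one test every 14 days" is read as fixed 14-day windows counted from each subject's first test, keeping the first test in each window; and, following the counts reported in the Results, subjects who ever tested positive contribute only their first positive test.

```python
from dataclasses import dataclass
from datetime import date

WINDOW_DAYS = 14  # length of each de-duplication window, counted from the first test


@dataclass(frozen=True)
class LabTest:
    """One test record (hypothetical structure)."""
    subject_id: str
    test_date: date
    positive: bool


def deduplicate(tests):
    """Return the tests retained for analysis, grouped by subject in input order."""
    by_subject = {}
    for t in tests:
        by_subject.setdefault(t.subject_id, []).append(t)

    kept = []
    for subject_tests in by_subject.values():
        subject_tests.sort(key=lambda t: t.test_date)
        positives = [t for t in subject_tests if t.positive]
        if positives:
            # Subjects who ever tested positive contribute only their first positive test.
            kept.append(positives[0])
            continue
        # Negative-only subjects: keep the first test in each 14-day window.
        first_day = subject_tests[0].test_date
        seen_windows = set()
        for t in subject_tests:
            window = (t.test_date - first_day).days // WINDOW_DAYS
            if window not in seen_windows:
                seen_windows.add(window)
                kept.append(t)
    return kept


# Usage with made-up records (not study data).
tests = [
    LabTest("A", date(2020, 4, 1), False),
    LabTest("A", date(2020, 4, 5), False),   # day 4: same window as April 1, dropped
    LabTest("A", date(2020, 4, 16), False),  # day 15: second window, kept
    LabTest("B", date(2020, 4, 2), False),   # subject B later tests positive, dropped
    LabTest("B", date(2020, 4, 9), True),    # first positive test, kept
    LabTest("B", date(2020, 4, 20), True),   # later positive test, dropped
]
kept = deduplicate(tests)
```

Whether earlier negative tests from subjects who later tested positive were retained is not fully specified in the Methods; this sketch drops them, which is one plausible reading.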
The category "Other" included individuals who had self-reported their race as "Other," as well as a small number of individuals who identified as "Native American," "Alaska Native," "Pacific Islander," or "Native Hawaiian." We excluded a small number of records that were missing gender or date of birth. SARS-CoV-2 test results were dichotomized as Positive or Negative. We also excluded tests that were either not performed or that …

For ease of computation and interpretability, numeric variables (household size, population density, age, and day) were centered on 0 by subtracting their mean, then scaled by dividing by their standard deviation. Variables that were already on a percent scale (percent Black population and percent Hispanic population) or a percentile scale (Area Deprivation Index) were centered on 0 by subtracting 0.5.

We limited our analysis to subjects whose address fell in one of six counties in North Carolina: Durham, Chatham, Orange, Person, Granville, or Wake. DUHS is the major health system within Durham County, and the DUHS catchment extends into the city of Raleigh. Within the city of Raleigh, as well as in the southwestern extent of our study area, the DUHS patient catchment overlaps with other health systems whose patient records we could not access. Thus, the density of patient address locations declined with increasing distance from Durham.

To maximize data density and to rationally exclude spatial outliers, we used ArcGIS to construct a two-standard-deviational ellipse. This method draws an ellipse around the mean center of the data points, with axes scaled to two standard deviations of the point coordinates, intended to contain approximately 95% of all data points. Thus, our analysis was limited to subjects whose address fell both within the six-county study region and within this ellipse.

Our primary model was a logistic generalized additive model (GAM).
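The two preprocessing steps described above, variable scaling and the spatial restriction to a two-standard-deviational ellipse, can be sketched with NumPy. This is an illustration, not the authors' code: the function names are hypothetical, the use of the sample standard deviation (`ddof=1`) is an assumption, and the ellipse here is defined by Mahalanobis distance from the mean center (axes along the principal directions of the coordinate covariance, semi-axes of k standard deviations), which may differ from the exact formula ArcGIS uses.

```python
import numpy as np


def standardize(x):
    """Center on 0 by subtracting the mean, then divide by the (sample) standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)


def center_proportion(p):
    """Center a 0-1 proportion or percentile on 0 by subtracting 0.5 (no rescaling)."""
    return np.asarray(p, dtype=float) - 0.5


def inside_sd_ellipse(coords, k=2.0):
    """Return a boolean mask of points inside a k-standard-deviation ellipse.

    coords: array of shape (n, 2) holding (longitude, latitude) pairs.
    The ellipse is centered on the mean center of the points, with axes along
    the principal directions of the coordinate covariance matrix.
    """
    coords = np.asarray(coords, dtype=float)
    center = coords.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(coords, rowvar=False))
    diff = coords - center
    # Squared Mahalanobis distance of each point from the mean center.
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return d2 <= k ** 2


# Usage with simulated coordinates (not study data) plus one distant outlier.
rng = np.random.default_rng(0)
points = rng.normal(size=(1000, 2)) * [0.10, 0.05] + [-78.9, 36.0]
points = np.vstack([points, [[-70.0, 36.0]]])
keep = inside_sd_ellipse(points)
print(f"kept {keep.sum()} of {len(points)} points; outlier kept: {keep[-1]}")
```

For the ellipse as defined in this sketch, coverage is not fixed at 95%: if the coordinates were bivariate normal, a Mahalanobis radius of 2 would enclose about 86% of points (1 - exp(-2)), so the share of points retained by any "two standard deviation" ellipse depends on how its axes are scaled and on the shape of the point distribution.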
GAMs are regression models that use smooth, nonparametric functions (splines built from piecewise polynomial segments) to model nonlinear relationships between independent variables and an outcome variable of interest (9). We used the statistical programming language R (www.r-project.org) and the brms and mgcv packages (9-11). The mgcv package is a comprehensive package for the specification of GAMs. The brms package, through its dependency on mgcv, allows the construction of Bayesian GAMs that are sent to the program Stan (www.mc-stan.org) for sampling of the posterior probability distribution.

The response variable in our models was the binary result of SARS-CoV-2 testing (negative vs positive); our individual-level linear predictors were gender, race, ethnicity, and test date (expressed as day of the year); and our neighborhood-level linear predictors were average household size, population density, percent Black, percent Hispanic, and Area Deprivation Index percentile. We used a tensor product thin plate spline of longitude and latitude to model geographic heterogeneity of SARS-CoV-2 test results in 2-dimensional geographic space. Tensor product splines allow for different degrees of smoothness (or "wiggliness") in the x (longitude) and y (latitude) dimensions. We also used thin plate splines for patient age and for day, with the foreknowledge that testing availability, particularly for pediatric subjects, varied over time. Thus, age and day represented a varying testing landscape and not merely a reflection of SARS-CoV-2 epidemiology.

For this stage of model selection, we used mgcv, which uses penalized maximum likelihood estimation and estimates models very quickly compared with its Bayesian counterpart, brms. A key parameter for GAMs is the number of "knots," or junctions between smoothing polynomial segments. We selected the number of knots through trial and error, incrementally increasing the number of knots and comparing models using analysis of deviance.
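The trial-and-error knot search described above can be illustrated with a simplified stand-in. This sketch does not reproduce mgcv: it fits an unpenalized logistic regression spline (a piecewise-linear truncated power basis, fit by iteratively reweighted least squares) rather than a penalized thin plate spline, and it compares successive models with a likelihood-ratio (analysis-of-deviance) test, stopping when adding knots no longer significantly reduces the deviance. The nested quantile knot placement (1, 3, 7, 15, ... knots), the 0.05 threshold, and all function names are assumptions made for illustration.

```python
import math

import numpy as np


def spline_basis(x, n_knots):
    """Piecewise-linear truncated power basis with knots at evenly spaced quantiles.

    With n_knots = 2**j - 1, each knot set contains the previous one, so
    successive models are nested.
    """
    knots = np.quantile(x, np.arange(1, n_knots + 1) / (n_knots + 1))
    columns = [np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(columns)


def fit_logistic(X, y, max_iter=100, tol=1e-8):
    """Fit a logistic regression by IRLS; return (coefficients, deviance)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.clip(1.0 / (1.0 + np.exp(-eta)), 1e-10, 1 - 1e-10)
        w = mu * (1.0 - mu)
        z = eta + (y - mu) / w  # working response
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    mu = np.clip(1.0 / (1.0 + np.exp(-(X @ beta))), 1e-12, 1 - 1e-12)
    deviance = -2.0 * np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu))
    return beta, deviance


def chi2_sf_even(x, df):
    """Chi-square upper-tail probability for even df (closed form)."""
    half = max(x, 0.0) / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def choose_knots(x, y, max_j=5, alpha=0.05):
    """Add knots (1, 3, 7, ...) until the deviance drop is no longer significant."""
    best = 1
    _, best_dev = fit_logistic(spline_basis(x, best), y)
    for j in range(2, max_j + 1):
        k = 2 ** j - 1
        _, dev = fit_logistic(spline_basis(x, k), y)
        p_value = chi2_sf_even(best_dev - dev, k - best)  # k - best is always even here
        if p_value > alpha:
            break
        best, best_dev = k, dev
    return best


# Usage with simulated data (not study data): a nonlinear effect of x.
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=2000)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-np.sin(2 * x)))).astype(float)
print("chosen number of knots:", choose_knots(x, y))
```

In mgcv the comparable tuning parameter is the basis dimension `k` of each smooth, and the penalty then shrinks unneeded wiggliness, so in practice the basis size matters less there than in this unpenalized sketch.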
The number of knots for our final model was chosen once model performance no longer improved with increasing knot numbers.

For our final model, which was estimated using brms, we chose loosely regularizing priors for our fixed parameters, selecting normal distributions with mean 0 and standard deviation 1. Default priors, which were minimally informative Student t distributions, were accepted for smoothed terms. We ran two models: a partially adjusted model that included only geographic coordinates, subject age, and date of test; and a fully adjusted model that also included individual gender, race, and ethnicity, the block group-level predictors, and a nested random intercept for census tract and block group.

Among 417 duplicate tests from 265 unique subjects, only each subject's first positive test was included in our analysis. There were 4,569 tests among 3,122 subjects who had only negative tests; analysis of these test results was limited to one test every 14 days, counting in intervals from each subject's first test.

To evaluate spatial statistical trends, we predicted our fitted models onto a longitude-latitude grid covering the entire geographic area of interest. Census data were matched by block group. We expressed our results as a local odds ratio (OR), computed by dividing the local odds by the average odds for the entire study area. We defined a local OR as "significant" where there was at least a 95% probability that the local odds differed from the mean odds for the entire study area. We used contours to circumscribe these areas, using red and blue to denote areas with significantly higher or lower OR, respectively.

Table 1. Block-group-matched census traits stratified by race are presented in Table S1 (supplemental materials). The temporal trends in SARS-CoV-2 positivity were fairly constant over time among most racial and ethnic groups.
However, we observed a dramatic increase in test positivity among Hispanic individuals, particularly pronounced between May and June 2020 (Figure 2).

Hispanic individuals were slightly over-represented among those with missing data, representing 13.8% of those with missing data versus 8.7% of those with complete data. The SARS-CoV-2 positivity rate was also slightly higher among subjects with missing address or demographic data than among subjects with complete data (9.3% versus 6.0%).

Table 2. Gender, race, ethnicity, and age were associated with the probability of a positive SARS-CoV-2 test. The odds of a positive SARS-CoV-2 test were higher for males (…).

The odds that a SARS-CoV-2 test would be positive were spatially heterogeneous, with the local OR of a positive test ranging from 0.17 to 3.03 (Figure 3). In the cities of Durham and Raleigh, there were areas with a significantly high OR of a positive test. We identified several smaller areas in Person, Orange, Chatham, and Wake Counties where the OR of a positive test was significantly low. Adjustment for both individual and areal variables narrowed the overall OR range to 0.64 to 1.34 and eliminated the high and low OR clusters seen in our unadjusted model.

Although early reporting suggests the potential for racial disparities in COVID-19 disease burden, published data and formal epidemiologic studies are limited to date. Most published geospatial analyses have been conducted at larger spatial scales and have analyzed data aggregated at the county or state level. In this study, we examined the association between both individual and geographic predictors and a positive SARS-CoV-2 test.
It is widely recognized that the COVID-19 pandemic has disproportionately affected racial and ethnic minorities, and this is documented in an emerging body of literature. Our study is unique in its use of individual location data to evaluate not only individual variables, but also the effect of neighborhood variables and location itself. The OR of testing positive was increased across a range of minority groups, most notably Blacks, Hispanics, and those reporting a multiracial background. Neighborhood-level variables representing racial and ethnic composition were also associated with a greater OR of a positive SARS-CoV-2 test. The spatial distribution of testing results revealed a higher OR of a positive test in the urban centers of Durham and Raleigh. This corresponds closely to racial and ethnic segregation within these communities, which accounts for why the effect of location was blunted by adjustment for individual race and ethnicity. Household size, Area Deprivation Index, and population density were not clearly associated with individual SARS-CoV-2 testing results, but our models indicate that substantial unmeasured variance remains at the neighborhood level. It is likely that many exposures, including nutrition, number of people in the home, housing quality, wealth, education, and healthcare access, produce an environment of disparate health risk in segregated neighborhoods.

Our findings are consistent with other early reports noting an increased burden of COVID-19 among Blacks and Hispanics (4, 12-18). In particular, a similar sharp increase within Hispanic communities was recently described in the Baltimore-Washington, DC area, slightly preceding the time period examined here (4).
A complex interplay of socioeconomic factors and structural disparities across multiple levels (environment, occupation, housing, multigenerational living arrangements, education, transportation) likely contributes to increased risk (19). The COVID-19 pandemic exhibits a disparity among minorities that is well documented for numerous other health conditions, including diabetes mellitus, hypertension, and cardiovascular disease (20, 21). Other recent county-level geospatial analyses have found higher rates of air pollution, unemployment, and uninsured status among the minority communities most affected by COVID-19 (22). It is well documented that healthcare service access is patterned by race and socioeconomic status, and these inequities further influence access to testing and clinical outcomes (23). Of particular relevance to COVID-19, minorities may be disproportionately represented in service industries considered essential during the pandemic, placing them at elevated risk of exposure to SARS-CoV-2. Still more troubling are the potential implications for immigrant communities, where fear of deportation may further hinder access to testing and appropriate healthcare, household occupancy is often higher, and the pressure to continue working is even more severe (24).

The delayed but dramatic increase in SARS-CoV-2 test positivity among Hispanic individuals was not associated with any specific geographic or occupational setting, and we are left to speculate on how the explosive emergence of COVID-19 in this population came about. It is most likely that COVID-19 cases among Hispanic individuals increased simultaneously in geographically discontinuous areas.
This could be explained by socially segregated networks within the Hispanic community, for instance among geographically separated family members or shared meeting spaces such as churches and workplaces that draw from several discontinuous neighborhoods. While outbreaks have previously been reported within churches, nursing facilities, congregate living settings, and prisons (25-27), the emergence we observed in the local Hispanic population seems unlikely to be related to any of these. The lack of close geographic case clustering argues against a typical point source (as might be seen with a church, prison, or congregate setting), and the majority of Hispanic individuals testing positive for SARS-CoV-2 appear to be young (median age 33.5, IQR 21.5-46.5), community-dwelling individuals. Similar community outbreaks affecting young, healthy individuals have been reported among workers in essential industries where social distancing might not be feasible (e.g., meat packing workers or warehouse workers) or where exposure is an occupational hazard (e.g., healthcare workers) (28).

Our study has several limitations. Our geospatial patient locations were extracted from electronic medical records, and we could not verify our subjects' addresses. A patient's residential address is usually not their sole location and cannot account for their exposures away from home. Our use of neighborhood-level risk factors is limited to block-group-level resolution; this is the smallest census unit for which robust demographic data are made public, but block groups do not necessarily correspond to real-world neighborhood definitions and are variable in shape, area, and relationship with neighboring block groups.
Perhaps most importantly, many of the same limitations to healthcare access among marginalized and minority populations might also limit our assessment of these communities; in other words, the highest-risk communities may be undertested compared with more affluent areas. Thus, our work may underestimate the abundance of positive SARS-CoV-2 tests within minority communities that still lack access to testing.

Factors contributing to COVID-19 risk are complex, but emerging data suggest that Black and Hispanic populations are at elevated risk. Further research with more detailed, prospective collection of subject-specific clinical and socioeconomic data will be needed to identify the drivers of increased COVID-19 risk among minorities. While ongoing research will take time, urgent action is needed on the part of healthcare providers, public health officials, and government leaders to ensure the protection of the most vulnerable populations amid this rapidly evolving pandemic. Moreover, enhanced risk awareness in vulnerable communities may increase demand for testing and improve the acceptability of risk mitigation strategies.

References
Geographic Differences in COVID-19 Cases, Deaths, and Incidence - United States
Centers for Disease Control (CDC). Hospitalization Rates and Characteristics of Patients Hospitalized with Laboratory-Confirmed Coronavirus Disease 2019 - COVID-NET, 14 States
SARS-CoV-2 Positivity Rate for Latinos in the …
Area Deprivation Index Databases
Introduction of an Area Deprivation Index Measuring Patient Socioeconomic Status in an Integrated Health System: Implications for Population Health
Health Innovation Program. Area Deprivation Index. UW Health Innovation Program
Neighborhood Disadvantage is Associated with High Cytomegalovirus Seroprevalence in Pregnancy
Generalized additive models: an introduction with R
Package 'mgcv': Mixed GAM Computation Vehicle with GCV/AIC/REML Smoothness Estimation
brms: An R Package for Bayesian Multilevel Models Using Stan
COVID-19 and African Americans
This Will Come as No Surprise…. The American Journal of Medicine
Social Vulnerability and Racial Inequality in COVID-19 Deaths in Chicago. Health Education & Behavior: the official publication of the Society for Public Health Education
Racial demographics and COVID-19 confirmed cases and deaths: a correlational analysis of 2886 US counties
Coronavirus disease 19 in minority populations of …
Epidemiology of the 2020 Pandemic of COVID-19 in the State of Texas: The First Month of Community Spread
Disparities In Outcomes Among COVID-19 Patients In A Large Health Care System In California
Moving Health Education and Behavior Upstream: Lessons From COVID-19 for Addressing Structural Drivers of Health Inequities
Prevalence of Diabetes by Race and Ethnicity in the United States
Cumulative Incidence of Hypertension by 55 Years of Age in Blacks and Whites: The CARDIA Study
Assessing Differential Impacts of COVID-19 on Black Communities
Racial and ethnic differences in access to medical care
Covid-19: Black people and other minorities are hardest hit in US
COVID-19 in Correctional and Detention Facilities - United States
High SARS-CoV-2 Attack Rate Following Exposure at a Choir Practice - Skagit County
Assessment of SARS-CoV-2 Infection Prevalence in Homeless Shelters - Four …
… Among Workers in Meat and Poultry Processing Facilities - 19 States

The authors report no commercial financial conflicts with the presented work.

Author contributions: PML, WP, and NAT: data acquisition and statistical analysis. CWW, PML, WP, NAT, VSMB, AMP, and GMMP: writing, editing, and manuscript preparation.
Access to data and data analysis: Nicholas Turner and Paul Lantos had full access to the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. The content of this manuscript is original and has not been previously published.

Figure 3: Spatial distribution of COVID-19 testing results. The study area depicted is a 6-county area around Durham, NC. The elliptical shape that intersects the study area is a 2-standard-deviational ellipse, intended to contain approximately 95% of the subject locations. The odds of a positive test were modeled using the home address coordinates of individual subjects as a smoothed, 2-dimensional independent variable. These models were then predicted on a dense grid of coordinate pairs covering the study area. The local OR, depicted in the color background, was computed by dividing the odds at each coordinate pair in the prediction grid by the average odds. Areas circumscribed by high (red) or low (blue) contours are those in which the local OR has at least a 95% probability of differing from the average. Areas with the highest OR in our unadjusted model included the cities of Durham and Raleigh. Adjusting for individual and neighborhood variables eliminated much of the geographic heterogeneity in OR.
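The local OR and significance contours described in this caption can be sketched from posterior draws, such as those a Bayesian GAM would produce on the prediction grid. The following assumptions are ours and hypothetical: `logit_draws` is an array of posterior draws of the log-odds at each grid point; the study-area "average odds" is taken as the average log-odds across grid points within each draw (a geometric mean of the odds); and "at least a 95% probability of differing from the average" is read as a one-sided posterior probability of at least 0.95 that the local OR is above 1 (red) or below 1 (blue).

```python
import numpy as np


def local_odds_ratios(logit_draws, threshold=0.95):
    """Local OR and significance flags from posterior draws.

    logit_draws: array of shape (n_draws, n_grid) with posterior draws of the
    log-odds of a positive test at each prediction-grid point.
    Returns (local_or, high, low): the posterior median OR at each grid point,
    and boolean masks for points significantly above or below the average.
    """
    logit_draws = np.asarray(logit_draws, dtype=float)
    # Log local OR: grid log-odds minus the study-area average log-odds, per draw.
    log_or = logit_draws - logit_draws.mean(axis=1, keepdims=True)
    local_or = np.exp(np.median(log_or, axis=0))
    high = (log_or > 0).mean(axis=0) >= threshold  # red contours
    low = (log_or < 0).mean(axis=0) >= threshold  # blue contours
    return local_or, high, low


# Usage with simulated draws (not study output) for a 3-point grid.
rng = np.random.default_rng(2)
true_logit = np.array([-1.0, 0.0, 1.0])
draws = true_logit + rng.normal(scale=0.1, size=(4000, 3))
local_or, high, low = local_odds_ratios(draws)
print(np.round(local_or, 2), high, low)
```

Computing the OR within each draw before summarizing keeps the uncertainty in the study-area average and in the local odds together, which is what makes the posterior probability of "differing from the average" well defined.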