key: cord-1004243-vvo6gk93
authors: Sy, Karla Therese L.; White, Laura F.; Nichols, Brooke E.
title: Reproducible Science Is Vital for a Stronger Evidence Base During the COVID-19 Pandemic
date: 2021-11-16
journal: Geogr Anal
DOI: 10.1111/gean.12314
sha: 1bf2b759aedebb9df9cec37df0748066e3e5d1e3
doc_id: 1004243
cord_uid: vvo6gk93

Reproducible research becomes even more imperative as we build the evidence base on SARS-CoV-2 epidemiology, diagnosis, prevention, and treatment. In his study, Paez assessed the reproducibility of COVID-19 research during the pandemic, using a case study of population density. He found that most articles that assess the relationship between population density and COVID-19 outcomes do not publicly share data and code; the few exceptions include our paper, which he stated "illustrates the importance of good reproducibility practices". Paez recreated our analysis using our code and data, reanalyzed it from the perspective of spatial analysis, and arrived at a different conclusion with his new model. The disparity between our findings and Paez's, together with the conflicting existing literature on the topic, gives greater impetus to the need for further research. Given the near exponential growth of COVID-19 research across a wide range of scientific disciplines, reproducible science is a vital component of producing reliable, rigorous, and robust evidence on COVID-19, which will be essential to inform the clinical practice and policy needed to effectively end the pandemic.

Reproducible research allows scientists to verify findings, reanalyze data that may lead to different conclusions, and examine inconsistent results across studies. Reproducible science fosters greater collaboration across disciplines, improves scientific discourse, and strengthens scientific evidence. However, in recent years the scientific community has established that there is a "reproducibility crisis" (Sayre and Riegelman 2019). In a 2016 survey of researchers, more than 70% reported having tried and failed to reproduce another scientist's experiments, and more than 50% had failed to reproduce their own (Baker 2016). This lack of scientific rigor and robustness has consequences for the reliability of research findings and for the extent to which the public should trust research. During the current COVID-19 pandemic in particular, there has been near exponential growth of COVID-19 literature and evidence across a wide range of scientific disciplines (Brainard 2020). Reproducible research becomes even more imperative as we build the evidence base on SARS-CoV-2 epidemiology, diagnosis, prevention, and treatment.

In his study, Paez assessed the reproducibility of COVID-19 research during the pandemic, using a case study of population density (Paez 2021). He found that most publications that assessed the association between population density and COVID-19 outcomes fell short of the gold standard of reproducible research. The literature on this association remains inconclusive, as several studies in various geographic settings have come to opposing conclusions. Paez found that different researchers used a range of statistical techniques, which could account for these varying findings. Reproducing studies to verify their findings or reanalyze their data could provide greater insight into why associations differ across studies.
Unfortunately, Paez found that most articles that assess the relationship between population density and COVID-19 outcomes do not publicly share data and code. The few exceptions include our article (Sy, White, and Nichols 2021), which he stated "illustrates the importance of good reproducibility practices". He was able to recreate our analysis using our code and data. He also reanalyzed our data from the perspective of spatial analysis, and his new model came to a different conclusion.

The perspective that Paez brings to our work is an example of the importance of reproducibility. He empirically demonstrated that scientific results can lead to different conclusions depending on model choice and specification, and that the discrepancy between his analysis and ours likely stemmed from different modeling choices, which would not have been apparent had we not made our code and data available. It is well known that variation in the statistical models and assumptions used can vastly change results, and analytic choices usually depend on the scientist's field of research. When 29 research groups were asked to answer the same research question using the same data set, the reported effect sizes ranged from an odds ratio (OR) of 0.89 to an OR of 2.83 (Silberzahn et al. 2018). Reproducible science also helps scientists across different fields work together to address knowledge and methodological gaps, fostering more rigorous interdisciplinary research. In this case, Paez's expertise in spatial analysis complemented our expertise in infectious disease epidemiology and offered an alternative conclusion to our findings.

In his article, Paez noted that the mixed linear models in our analysis were an appropriate modeling choice, but he identified two potential limitations that he attempted to address with spatial models: he addressed potential non-random sample selection using Heckman's selection model with spatial filtering, and he replaced the log-transformation of population density with a quadratic expansion. In our study, we conducted several sensitivity analyses to assess the robustness of our results and address potential limitations of our model; however, we noted that sampling could be an issue: "we had to only include counties that had sufficient case data in order to accurately estimate R0". When Heckman's selection model was used, Paez found that "the coefficient for population density is still positive, but the magnitude changes: in effect, it appears that the effect of density is more pronounced than what [Sy, White, and Nichols] Model 3 indicated," which corresponds to what we predicted in our discussion section: "if we included all counties, the true association between population density and R0 would likely be greater than what we report in our analysis given our findings that the counties excluded in the analysis had a significantly lower density and expected very low R0 due to lack of cases." His selection model thus confirmed our prediction about the direction of the potential selection bias. Moreover, decisions about how variables are operationalized can also lead to different conclusions. Paez's transformation of the population density variable, which allows the relationship between population density and the basic reproductive number to be non-monotonic, offered an alternative conclusion: higher density is not always associated with a greater risk of disease spread.
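To make the contrast between the two density specifications concrete, they can be sketched with a simplified linear predictor. The symbols below (d_i for county population density, x_i for other covariates, and the beta, gamma, and epsilon terms) are introduced here purely for illustration and do not appear in either article; both published models also include additional structure such as random effects and, in Paez's case, spatial filters and a selection equation, all of which are omitted from this sketch.

    R_{0,i} = \beta_0 + \beta_1 \log(d_i) + \mathbf{x}_i^{\top}\boldsymbol{\gamma} + \varepsilon_i
    R_{0,i} = \beta_0 + \beta_1 d_i + \beta_2 d_i^2 + \mathbf{x}_i^{\top}\boldsymbol{\gamma} + \varepsilon_i

The first (log) form is monotonic in density whenever \beta_1 is positive, so higher density always implies a higher predicted basic reproductive number. The second (quadratic) form permits a turning point at d^{*} = -\beta_1 / (2\beta_2) when \beta_2 < 0, which is what allows the conclusion that higher density is not always associated with greater spread.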
The disparity between our findings and Paez's gives greater impetus to the need for further research on this topic. As noted in our article, "recent literature has been conflicting, where some research also suggests a density-dependence of COVID-19 transmission (Rocklöv and Sjödin 2020; Rubin et al. 2020) and other measures of the severity of the outbreak (Anand et al. 2020; Wong and Li 2020), while other research suggests that there are other factors that can better explain the pandemic." Aside from differences in model specification, the variation in conclusions may also reflect (a) unmeasured confounding, (b) data quality issues that cause misclassification, (c) selection bias, or (d) other unknown biases. These sources of bias are ubiquitous in observational studies, and researchers do their best to mitigate the biases that obscure the true association between exposure and outcome. Reproducible research provides an avenue to critically examine published results for potential sources of bias that were not properly accounted for. The COVID-19 pandemic is an unprecedented time for scientific research, with experts from various fields working together to rapidly improve our understanding of SARS-CoV-2. Reproducible science is a vital component of producing reliable, rigorous, and robust literature on COVID-19, which will be essential to inform the clinical practice and policy needed to effectively end the pandemic.

References:
Prevalence of SARS-CoV-2 Antibodies in a Large Nationwide Sample of Patients on Dialysis in the USA: A Cross-sectional Study
1,500 Scientists Lift the Lid on Reproducibility
Scientists are Drowning in COVID-19 Papers. Can New Tools Keep Them Afloat?
Recommendations to Funding Agencies for Supporting Reproducible Research
Longitudinal Analyses of the Relationship Between Development Density and the COVID-19 Morbidity and Mortality Rates: Early Evidence from Metropolitan Counties in the United States
Does Density Aggravate the COVID-19 Pandemic?
Reproducibility of Research During COVID-19: Examining the Case of Population Density and the Basic Reproductive Rate from the Perspective of Spatial Analysis
High Population Densities Catalyse the Spread of COVID-19
Association of Social Distancing, Population Density, and Temperature With the Instantaneous Reproduction Number of SARS-CoV-2 in Counties Across the United States
Replicable Services for Reproducible Research: A Model for Academic Libraries
Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results
Population Density and Basic Reproductive Number of COVID-19 Across United States Counties
Spreading of COVID-19: Density Matters