key: cord-1004234-91zftk60 authors: Kobilov, Botir; Rouen, Ethan; Serafeim, George title: Predictable country-level bias in the reporting of COVID-19 deaths date: 2021-08-11 journal: Journal of Government and Economics DOI: 10.1016/j.jge.2021.100012 sha: d039f96d525ab0665a64dc3c1da7cc5e6131e818 doc_id: 1004234 cord_uid: 91zftk60

We examine whether a country's management of the COVID-19 pandemic relates to the downward biasing of the number of reported deaths from COVID-19. Using deviations from historical averages of the total number of monthly deaths within a country, we find that the probability of underreporting COVID-related deaths was 58.6% for countries with the most stringent policies, compared to 28.2% for countries with the least stringent policies. For countries with the lowest ex ante healthcare capacity, in terms of the number of available hospital beds, the probability of underreporting deaths is 52.5% on average, compared to 23.1% for countries with the greatest capacity.

Keywords: Government policy, crisis management, reporting, incentives

As severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spread across the globe in 2020, researchers and governments devoted significant resources to estimating the fatality ratio and understanding the efficacy of various mitigation efforts. With healthcare resources in short supply in various countries, policy makers also attempted to model pressure on healthcare systems based on the spread of the virus and the ex ante healthcare capacity of the countries (Verity et al. 2020; Verelst, Kuylen, and Beutels 2020).
These studies largely relied on governments' reporting of the number of deaths resulting from coronavirus disease 2019 (COVID-19) and assumed that the reported information was unbiased, meaning that any errors in reporting were exogenous to the policies being implemented and to the healthcare capacity of a country (Flaxman et al. 2020). This assumption of unbiased data is unlikely to hold, given that the institutional structures of countries have been shown to affect government reporting of measures of accountability. Prior research suggests that governments could manipulate public statistics for political gain. For example, Milesi-Ferretti (2003) suggests that fiscal rules can encourage "creative accounting" to portray budgets in ways that benefit politicians. Dafflon and Rossi (1999) study how governments could alter their public accounting data to meet the requirements for European Union membership. In addition, several papers document the propensity of governments to manipulate financial data to disguise the magnitudes of deficits (von Hagen and Wolff 2006; Buti, Nogueira, and Turrini 2007). While academics and the popular press have reported significant variation in the number of deaths during the pandemic that is unexplained by both COVID-19 and historical trends, that is, deaths that are likely related to COVID-19 but go unreported as such (e.g., Viglione 2020; Wu, McCann, and Peltier 2020; The Economist 2020a), we know of no attempt to formally study the role that political incentives or country capacity play in this propensity to underreport. This gap is particularly concerning given the abundant attention paid to the relations among government policies, healthcare capacity, and deaths related to COVID-19 (e.g., Fuller et al. 2021). In this paper, we examine two factors that may encourage governments to bias downward the number of deaths related to COVID-19.
First, we examine the relation between the stringency of the policies adopted to curtail the spread of the virus and the propensity to underreport. Policies such as lockdowns have caused protests against politicians, likely increasing pressure on those politicians to show that the policies were effective (Carothers and Press 2020). Second, we examine whether a country's ex ante healthcare capacity, measured as the number of hospital beds per 1,000 residents, is associated with underreporting of COVID-19 deaths. Low healthcare capacity may reflect poor planning or an inability of the government to care for its people, suggesting that governments unable to meet the demands placed on their healthcare systems by the pandemic could be inclined to underreport COVID-19 deaths. Moreover, a negative relation between healthcare capacity and the probability of underreporting could arise if a lack of hospital beds is correlated with a country's inability to keep track of deaths and thereby classify them as COVID-related. However, this resources explanation could instead generate a positive association between healthcare capacity and underreporting, because in countries with low resources, deaths might be classified as COVID-related even when they are not. When we use a different proxy for those resources, the number of physicians per 1,000 people, we find no association between underreporting and this proxy, mitigating concerns that the number of hospital beds merely captures healthcare resources unrelated to reporting bias. We find statistically and economically significant results consistent with our conjectures. Using deviations from historical averages of the total number of monthly deaths within a country, we find that the probability of underreporting COVID-related deaths in countries that took the most stringent steps to curtail the spread of the virus was 58.6% on average, compared to 28.2% for countries with the least stringent policies.
For countries with the lowest ex ante healthcare capacity, in terms of the number of available beds, the probability of underreporting deaths is 52.5% on average, compared to 23.1% for countries with the greatest capacity. These results point to the need to account for the misreporting of outcomes when evaluating the efficiency of the government policies and preparedness measures adopted across countries. The remainder of this paper is structured as follows. Section I describes the data, variable construction, and research design. Section II documents our empirical results. Section III concludes. To calculate our main variable of interest, the underreporting of COVID-19 deaths at the country level, we begin by measuring baseline expected monthly deaths for the 51 countries for which data are available. We first obtain data from the New York Times (NYT), which reports for 33 countries the number of deaths from all causes during the pandemic, as well as total deaths for prior years. We augment these data with similar data for 10 additional countries reported by Eurostat. We also hand collect these data from governmental and statistical agencies for an additional eight countries. Positive values of unexplained excess deaths (deaths above the historical average that are not reported as COVID-related) could reflect COVID-19 deaths that are not classified as such, but they could also arise because of random variation across time within a country. We note that negative values could arise because other causes of death (e.g., accidents, flu-related deaths) might have decreased in 2020 or because of random variation across time. As we discuss later, we use several different measures of underreporting, including binary variables that require the unexplained excess death measure to exceed 1%, 2%, or 5%, and a continuous measure, to show that our results are robust to various specifications. OxCGRT (the Oxford COVID-19 Government Response Tracker) collects information at the daily level on the actions taken by governments to curtail the spread of COVID-19. The goal of this data collection process is to develop a proxy for the intensity of a government's lockdown policies during the pandemic.
To that end, OxCGRT identifies nine relevant containment and closure policies and creates a daily variable for each, scaled from 0 to 100, where 0 means that the government has implemented no policy and 100 means that it has implemented the most stringent policy possible (e.g., if all schools in a country are closed, the country would have a school closure score of 100). The policies OxCGRT considers are school closures, workplace closures, cancellations of public events, limits on gatherings, public transportation closures, stay-at-home orders, restrictions on domestic movement, restrictions on international travel, and public health information campaigns (see Hale et al. 2020 for a detailed description of the data). These variables are then summed and rescaled to create a daily index, ranging from 0 to 100, of the overall stringency of current policies for every country. We then take each country's average index on the 15th day of the month to create country-month pairs. We refer to this index in our empirical results as the variable Stringency Index. From the "Our World in Data" COVID-19 dataset, we collect our measure of healthcare capacity, Hospital beds, measured as the number of hospital beds per 1,000 people in the country in 2018, the most recent year for which the data are available. To improve the predictive power of the model and the precision of our estimates, and to mitigate omitted variable bias, we include several control variables in our regressions. Governments in countries with greater interconnection with the global economy and larger trade flows with partner countries may have different reporting incentives than less economically connected countries. In addition, countries with greater connectedness to the global economy might be at greater risk of exposure to COVID-19.
To account for this variation, we control in our regressions for trade as a share of gross domestic product (GDP) for each country as measured in 2019, the latest year with available data (Krugman 2008; Khan, Nallareddy, and Rouen 2020). The United Nations estimates that 90% of all reported COVID-19 cases come from urban areas (United Nations 2020). Since the virus spreads through person-to-person contact, we control for the percentage of a country's population living in urban areas in 2019 to ensure that our results are not driven by country-level variation in urbanization. We note that countries with more rural populations might be less able to account accurately for COVID-19-related deaths, leading to underreporting. In addition, approximately 10.5% of patients with pre-existing cardiovascular disease (CVD) who contract COVID-19 die, compared with only 0.9% of patients who are otherwise healthy (Ritchie et al. 2021). Therefore, we control for the percentage of the population with CVD in 2017, the most recent year for which the data are available, to ensure that our results are not driven by this related factor. Moreover, because the pandemic began to spread at different times across countries, which might affect unexplained excess deaths, we include a control variable measuring the number of days since the first reported COVID-19 death in each country. Including this variable in our models allows us to control for the different stages of the pandemic in each country. Importantly, the ability to diagnose and record deaths related to COVID-19 is directly related to a country's testing capacity, which varied significantly across time and across countries. 5 We include a country's monthly testing capacity, measured as the number of tests per 1,000 residents, as a control to alleviate concerns that the results are driven by how many people are being tested.
These data are downloaded from Our World in Data and supplemented with hand collection for countries excluded from that dataset. 6 Lastly, because the Stringency Index varies within a country over time, we can also include country fixed effects in some specifications to account for time-invariant correlated omitted variables at the country level. Because the healthcare capacity variable is time-invariant within our sample period, we cannot include country fixed effects in the models that include healthcare capacity. All variables and their data sources are provided in Appendix A. When we predict whether a country underreports deaths, given features describing the country's management of and exposure to the pandemic, the response variable takes two values: underreporting or not. This type of response can be analyzed using binomial logistic regression, where the response variable takes the value 1 for underreporting and 0 otherwise. We code a country-month observation as 1 if unexplained excess deaths exceed 1%. We also test the robustness of our results using 2% and 5% thresholds, to limit the influence of observations with small increases in deaths that are more likely due to chance or other factors. Because our analysis asks whether a country has incentives to underreport fatalities given the explanatory variables, not the magnitude of that underreporting, we focus on logistic regressions. However, as a robustness test, we also estimate OLS regressions using a continuous measure of unexplained excess deaths and report those coefficients alongside the logistic regression results.
5 See https://coronavirus.jhu.edu/testing/international-comparison for detailed data on country-level testing capacity.
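The two main constructed variables described in this section, the monthly Stringency Index and the unexplained-excess-death indicators, can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' or OxCGRT's code: the sub-index names are hypothetical shorthand, the daily index is assumed to be the simple mean of the nine 0 to 100 sub-indices (equivalent to summing them and rescaling to 0 to 100), the baseline is assumed to be the average of the same calendar month across prior years, and unexplained excess deaths are assumed to be total deaths minus that baseline minus reported COVID-19 deaths, scaled by total monthly deaths before comparison with the 1%, 2%, and 5% cutoffs.

```python
from __future__ import annotations

from datetime import date
from statistics import mean

# Hypothetical shorthand for the nine OxCGRT sub-indices (OxCGRT's own column
# names differ); each is assumed to be already rescaled to 0-100.
INDICATORS = (
    "school_closing", "workplace_closing", "cancel_public_events",
    "restrict_gatherings", "close_public_transport", "stay_at_home",
    "internal_movement", "international_travel", "public_info_campaigns",
)


def daily_stringency(scores: dict[str, float]) -> float:
    """Daily 0-100 index: mean of the nine sub-indices (a sum rescaled to 0-100)."""
    missing = set(INDICATORS) - set(scores)
    if missing:
        raise ValueError(f"missing sub-indices: {sorted(missing)}")
    return sum(scores[name] for name in INDICATORS) / len(INDICATORS)


def monthly_stringency(daily: dict[date, float]) -> dict[tuple[int, int], float]:
    """Country-month value: the daily index observed on the 15th of each month."""
    return {(d.year, d.month): v for d, v in daily.items() if d.day == 15}


def expected_deaths(history: dict[int, dict[int, int]], month: int) -> float:
    """Baseline: mean all-cause deaths in the same calendar month across prior years.

    `history` (hypothetical layout) maps year -> {month -> all-cause deaths}.
    """
    return mean(by_month[month] for by_month in history.values())


def unexplained_excess_share(total: int, expected: float, covid_reported: int) -> float:
    """Deaths above the baseline not reported as COVID-19, as a share of total deaths."""
    return (total - expected - covid_reported) / total


def underreporting_flags(share: float, cutoffs=(0.01, 0.02, 0.05)) -> dict[float, int]:
    """One binary underreporting indicator per cutoff (1 if the share exceeds it)."""
    return {c: int(share > c) for c in cutoffs}
```

For instance, a day on which the only measure is a school closure scored 90 yields a daily index of 10, and a month with 11,500 total deaths, a baseline of 10,000, and 1,000 reported COVID-19 deaths has an unexplained excess share of about 4.3%, which is flagged at the 1% and 2% cutoffs but not at 5%.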
Since logistic regression coefficients are reported in log odds, and in order to provide an economically meaningful interpretation, we derive marginal effects at means for a select set of values of the Stringency Index and healthcare capacity to show how the predicted probability of underreporting changes as the levels of the Stringency Index and healthcare capacity change. As described below, Figures 2 through 4 plot those marginal effects (regression results are reported in Tables 2 through 4 and Table 6). We cluster standard errors at the country level to mitigate serial correlation in the error term within countries. During the pandemic, which first appeared in China in December 2019 and has since spread to 219 countries, there has been significant country-level variation in the deviation from the historical average in the number of monthly deaths unattributed to the pandemic (The Economist 2020a; Viglione 2020; World Health Organization 2020). Our data sources, at the country level, are reported in Table 1, Panel A. For the 51 countries for which monthly historical death rates are available, we find an average of 3,140 unexpected deaths per country from January through December 2020. At the extreme, the United States had 50,876 unexpected deaths during this period (see Table 1, Panel B, for descriptive statistics). Figure 1 shows, for each country in our sample, its highest monthly ratio of total reported deaths to historical expected deaths.
6 Despite hand-collecting efforts, testing capacity was not available for all countries, so we lose some observations in this analysis. In addition, for most countries, testing capacity in the early months of 2020 was recorded as null in the data. We assume that testing in the first three months of the year was zero, but our results are robust to excluding these months from our analysis.
For example, Bolivia and Ecuador had 2.5 times as many deaths (unattributed to COVID) as historical averages would predict for the same months, while Japan and Bulgaria had almost no unexpected deaths. Some behaviors during the pandemic, such as quarantining at home, may have reduced deaths from other causes, for example by slowing the spread of other illnesses such as seasonal flu (The Economist 2020b; Oguzoglu 2020). Therefore, this increase in unexpected deaths is likely attributable, at least in part, to COVID-19 deaths that were not reported as such. This misreporting could be due to random error, as countries unprepared for the outbreak may also have been unable to devote sufficient resources to analyzing causes of death. Alternatively, it could be due to systematic biases related to the management of the outbreak or to a country's capacity to handle the outbreak, and thereby its exposure to the outbreak. We further examine the potential for systematic and predictable bias in the sections that follow. Epidemiological responses to contain the outbreak of SARS-CoV-2, such as travel restrictions and geographical lockdowns, have had large negative impacts on countries' economies. As a result, the equivalent of 400 million full-time jobs has been lost worldwide, and global gross domestic product in 2020 was expected to decline by 4.9% (International Labour Organization 2020; International Monetary Fund 2020). Still, there is growing evidence that the economic sacrifice of lockdowns has played a vital role in curtailing the spread of the virus (Hale et al. 2020). Given the economic consequences of the actions taken to curtail the spread, political leaders may have been incentivized to underreport deaths from COVID-19 to provide evidence that they handled the response to the pandemic properly. We find support for this conjecture.
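The probabilities reported in the results that follow are predicted values from logistic regressions, evaluated with the other regressors held at their means. The sketch below illustrates both steps with the Python standard library: it fits a one-regressor logit of an underreporting indicator by Newton-Raphson, then converts coefficients into a predicted probability with a focal regressor set to a chosen value and the others at their sample means. It is a simplified, hypothetical illustration: the paper's models include control variables and country-clustered standard errors, which are omitted here, in practice one would use a statistics package, and the data and coefficient values are made up rather than taken from the paper.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit(x: list, y: list, iters: int = 50, tol: float = 1e-10) -> tuple:
    """Maximum-likelihood fit of P(y=1 | x) = sigmoid(b0 + b1*x) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # entries of the 2x2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: (information matrix)^-1 times the score vector.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


def prob_at_means(intercept: float, coefs: dict, means: dict, focal: str, value: float) -> float:
    """Predicted probability with `focal` set to `value` and every other regressor at its mean."""
    log_odds = intercept + sum(
        beta * (value if name == focal else means[name]) for name, beta in coefs.items()
    )
    return sigmoid(log_odds)


if __name__ == "__main__":
    # Made-up country-month data: stringency level and a 0/1 underreporting flag.
    x = [10, 20, 30, 40, 50, 60, 70, 80]
    y = [0, 0, 1, 0, 1, 0, 1, 1]
    b0, b1 = fit_logit(x, y)
    means = {"stringency": sum(x) / len(x)}
    for level in (10, 90):
        print(level, round(prob_at_means(b0, {"stringency": b1}, means, "stringency", level), 3))
```

Holding the controls at their means makes each probability a single point on the fitted curve, which is why a positive slope on stringency translates into predicted probabilities that rise monotonically with the index.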
Figure 2 shows that, for a country that has taken few actions to curtail the spread of SARS-CoV-2 (i.e., a Stringency Index of 10 out of 100), the probability of underreporting the number of deaths related to COVID-19 by 1% or more is 28.2%. By contrast, countries implementing the most stringent policies to curtail the virus's spread are roughly twice as likely to underreport deaths related to COVID-19 (i.e., the likelihood of underreporting is 58.6%). As reported in Table 2, these findings are robust to various specifications, including adding a host of country-level control variables, changing the dependent variable from an indicator equal to 1 when unexpected deaths exceed 1% of total deaths to one equal to 1 when they exceed 5% of total deaths, and using a continuous dependent variable of total unexpected deaths. In Columns 1 through 7 of Table 2, we also include a calendar control, measured as the number of days since the first reported death related to COVID-19. We include our control for testing capacity in Columns 8 and 9. Because testing increased over time, these two variables have a correlation of 0.53, so in this table, as well as in Tables 3 and 4, we do not include both in the same specification, to mitigate concerns about multicollinearity. 7 Importantly, the marginal probability estimates for underreporting, reported both in Figure 2 and in Table 6, increase monotonically as the Stringency Index increases. We next examine whether the exposure risk of a country's population to SARS-CoV-2, based on a country's healthcare capacity, is related to underreporting of total deaths from COVID-19. The number of hospital beds is frequently used as an indicator of a country's ability to cope with COVID-19 and provide care for its citizens (Cavallo, Donoho, and Forman 2020).
Therefore, governments in countries with fewer hospital beds per capita may be incentivized to underreport the number of deaths related to COVID-19, since political leaders risk being blamed for deaths that could be perceived as avoidable but for an inadequate healthcare system. In addition, countries with the lowest healthcare capacity are also less likely to be able to appropriately diagnose all causes of death when the healthcare system is taxed, thereby giving rise to a predictable negative relation between the number of beds and the probability of underreporting. As reported in Figure 3 and Tables 3 and 6, we find a negative relation between the likelihood of underreporting deaths related to COVID-19 and the number of hospital beds. The probability of underreporting COVID-related deaths in countries with fewer than 100 beds per 100,000 people (i.e., fewer than 1 bed per 1,000 people) is 52.5%, compared to 23.1% in countries with the highest ex ante healthcare capacity. As in our prior analysis, these results remain qualitatively unchanged when we change the specification or add country-level controls. These additional results are reported in Table 6. Similar to the results for the Stringency Index, the marginal probability estimates for underreporting, reported in Figure 3 and in Table 6, decrease monotonically as healthcare capacity increases. It is plausible that there is an endogenous relation between a country's management of the virus outbreak and its exposure to the virus, given that ex ante healthcare capacity is likely to influence the actions a country takes to address the spread of the virus. For example, countries with more hospital beds might take less stringent measures because they can better cope with infected people.
7 When we include both Calendar control and Testing capacity in the same regression, the statistical significance of the coefficient on Stringency Index is reduced.
To address this concern, Figure 4, Panel A reports the relation between unexpected deaths, country management of the virus, and country exposure to the virus. The Stringency Index has a mediating effect on the relation between exposure and underreporting, but the intuition remains the same. Specifically, as reported in Figure 4 and Table 4, healthcare capacity remains directionally consistent and predictive of underreporting in most specifications, but its statistical significance decreases (in Column 2 of Table 4, the coefficient on Hospital beds is significant at the 5% level, while it is significant at the 10% level in Columns 1 and 3). Importantly, the Stringency Index remains statistically significant across all specifications. As shown in Figure 4, Panel B, for countries with the lowest healthcare capacity, the probability of underreporting remains above 40%. In our main analysis, we use a panel of country-months to examine the relation between government policies and the underreporting of COVID-19-related deaths. To better show how these policies relate to underreporting over time, we conduct detailed within-country tests. Panel A of Table 2 reports logistic regressions with time fixed effects (and country fixed effects in Column 2) and suggests that, within a country, increasingly stringent policies are associated with underreporting. In Panel B of Table 2, we also report this fixed effects specification for the continuous measure of misreporting and again find that our inferences do not change. To examine the relation between lockdown stringency and underreporting further, we conduct a quasi-event study. We create an indicator equal to 1 if a country's Stringency Index is above 20 (the level at which policies start going into effect) in a given month and 0 otherwise.
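The treatment indicator and the narrow event window can be sketched for a single country's monthly Stringency Index series. This is a hypothetical illustration: following the description of Column 1 of Table 5, we assume the indicator switches on in the first month the index reaches 20 and stays on thereafter, and we assume the narrow-window sample runs from two months before to two months after that first month. The text's own wording ("above 20" versus "20 or greater", and a four-month window) leaves these boundary conventions open, so the threshold comparison and window bounds are parameters to adjust.

```python
from __future__ import annotations

THRESHOLD = 20.0  # Stringency Index level at which policies start going into effect


def first_treated_month(stringency: list[float]) -> int | None:
    """Position of the first month whose index is at least THRESHOLD (None if never)."""
    return next((i for i, s in enumerate(stringency) if s >= THRESHOLD), None)


def treatment_indicator(stringency: list[float]) -> list[int]:
    """D_it for one country: 0 before the first treated month, 1 from then on."""
    first = first_treated_month(stringency)
    if first is None:
        return [0] * len(stringency)
    return [int(i >= first) for i in range(len(stringency))]


def event_window(stringency: list[float], before: int = 2, after: int = 2) -> list[int]:
    """Month positions kept in the narrow-window sample (empty if never treated)."""
    first = first_treated_month(stringency)
    if first is None:
        return []
    lo = max(0, first - before)
    hi = min(len(stringency) - 1, first + after)
    return list(range(lo, hi + 1))
```

Because the indicator is absorbing, a month in which the index dips back below 20 after the first crossing is still coded as treated, matching the "remains equal to one" description.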
We estimate the following event study with standard two-way fixed effects:

$$Y_{it} = \alpha_i + \gamma_t + \beta D_{it} + \delta X_{it} + \varepsilon_{it},$$

where $Y_{it}$ is a continuous measure of misreporting for country $i$ in month $t$, $\alpha_i$ are country fixed effects, $\gamma_t$ are time (month) fixed effects, $D_{it}$ is the indicator described above ($\text{Treat}_i \times \text{Post}_t$), $\beta$ is the main coefficient of interest, identifying the average treatment effect on the treated, and $X_{it}$ is a time-varying control (i.e., total monthly deaths, including those that are COVID-related and those that are unattributed to COVID-19). Table 5 reports the results. In Column 1, $D_{it}$ remains equal to one for the entire period after the Stringency Index takes a value of 20 or greater. In Column 2, we limit the sample to four months for each country, from two months before to two months after the Stringency Index first becomes 20 or greater. In both columns, the coefficient $\beta$ is positive and significant, providing evidence that, within countries, government underreporting of COVID-related deaths increases as COVID-related lockdown policies become more stringent. Our results are subject to two important caveats. First, our measure of excess deaths assumes that total deaths are reported accurately over time. Second, it assumes that changes in behavior during the pandemic do not influence the frequency with which other types of death are expected to occur. The first caveat is less of a concern, given that it is easier to misclassify the cause of a death than to not report that death at all. The second assumption, however, is unlikely to hold: deaths from behaviors like driving have been affected by the pandemic, and other infectious diseases have spread less rapidly as more people have stayed home during lockdowns (The Economist 2020b; Oguzoglu 2020). Still, these changes in behavior are likely to bias against our results, given that total deaths unrelated to COVID-19 are likely to have decreased during this period.
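The two-way fixed effects event study described above, a regression of the form y_it = alpha_i + gamma_t + beta * D_it + delta * X_it + e_it, can be estimated with the within transformation: in a balanced panel, subtracting each country's mean and each month's mean (and adding back the grand mean) removes the country and month effects exactly, after which OLS on the transformed variables recovers beta and delta. The sketch below assumes a balanced panel stored as dictionaries keyed by (country, month); unbalanced panels need iterative demeaning or explicit dummy variables, and the sketch does not compute the country-clustered standard errors the paper reports. The function names are hypothetical.

```python
from typing import Dict, Tuple

Panel = Dict[Tuple[str, int], float]  # (country, month) -> value; must be balanced


def two_way_demean(panel: Panel) -> Panel:
    """Subtract country and month means and add back the grand mean.

    In a balanced panel this removes country and month fixed effects exactly.
    """
    countries = {c for c, _ in panel}
    months = {t for _, t in panel}
    if len(panel) != len(countries) * len(months):
        raise ValueError("panel must be balanced")
    grand = sum(panel.values()) / len(panel)
    c_mean = {c: sum(panel[(c, t)] for t in months) / len(months) for c in countries}
    t_mean = {t: sum(panel[(c, t)] for c in countries) / len(countries) for t in months}
    return {(c, t): v - c_mean[c] - t_mean[t] + grand for (c, t), v in panel.items()}


def twfe_estimate(y: Panel, d: Panel, x: Panel) -> Tuple[float, float]:
    """Return (beta, delta) by OLS on the two-way demeaned y, D, and X."""
    yt, dt, xt = two_way_demean(y), two_way_demean(d), two_way_demean(x)
    keys = list(y)
    s_dd = sum(dt[k] ** 2 for k in keys)
    s_xx = sum(xt[k] ** 2 for k in keys)
    s_dx = sum(dt[k] * xt[k] for k in keys)
    s_dy = sum(dt[k] * yt[k] for k in keys)
    s_xy = sum(xt[k] * yt[k] for k in keys)
    # Solve the 2x2 normal equations [[s_dd, s_dx], [s_dx, s_xx]] @ [beta, delta] = [s_dy, s_xy].
    det = s_dd * s_xx - s_dx ** 2
    if det == 0:
        raise ValueError("treatment and control are collinear after demeaning")
    beta = (s_xx * s_dy - s_dx * s_xy) / det
    delta = (s_dd * s_xy - s_dx * s_dy) / det
    return beta, delta
```

Demeaning is used here instead of adding one dummy per country and per month because it yields the same slope estimates (by the Frisch-Waugh-Lovell theorem) without building a large design matrix.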
Overall, our results suggest that incentives and capacity to respond to the pandemic shape reported outcomes. This finding has implications for researchers who seek to understand the efficacy of actions taken by, or characteristics of, countries that might mitigate or exacerbate deaths related to COVID-19. Our results point to the need to account for the misreporting of outcomes to assess the efficacy of these actions and characteristics more accurately.

Table 1. This table reports information about the data used to create the variables in our main analysis. Panel A lists the countries used in the analysis, the years used to calculate the expected number of deaths based on the historical average, the geographic level at which the data were measured, and the source from which the data were collected. For the countries marked with an asterisk, data were supplemented with United Nations demographic data and data from country-specific statistical agencies. Panel B reports descriptive statistics for the variables used in the analysis. All variables are defined in Appendix A.

Table 5. This table reports the relation between lockdown stringency and underreporting in a quasi-event study. We estimate the following event study with standard two-way fixed effects:

$$Y_{it} = \alpha_i + \gamma_t + \beta D_{it} + \delta X_{it} + \varepsilon_{it},$$

where $Y_{it}$ is a continuous measure of misreporting, $D_{it}$ is an indicator ($\text{Treat}_i \times \text{Post}_t$) equal to 1 if a country's Stringency Index is above 20 (the level at which policies start going into effect) in a given month and 0 otherwise, $\alpha_i$ are country fixed effects, $\gamma_t$ are time (month) fixed effects, $\beta$ is the main coefficient of interest, identifying the average treatment effect on the treated countries, and $X_{it}$ is a time-varying control (monthly deaths). In Column 1, $D_{it}$ remains equal to one for the entire period after the Stringency Index takes a value of 20 or greater. In Column 2, we limit the sample to four months for each country, from two months before to two months after the Stringency Index first becomes 20 or greater.
In both specifications, standard errors are clustered at the country level and reported in parentheses. All variables are defined in Appendix A.

Table 6. This table reports marginal probabilities derived from logistic regressions and shows the predicted probabilities of underreporting COVID-19-related deaths by at least 5% for each decile of the Stringency Index and healthcare capacity (Hospital beds). Ranks in Columns 1 and 3 are in ascending order, with 10 representing the most stringent policies to curb the spread of the pandemic. Ranks in Columns 2 and 4 are in descending order, with 10 representing the lowest number of hospital beds per 1,000 residents. Column 1 reports predicted probabilities derived from the logistic regression in Column 4 of Table 2. Column 2 reports predicted probabilities derived from the logistic regression in Column 3 of Table 3. Column 3 repeats the analysis in Column 1 but controls for hospital beds, and Column 4 repeats the analysis in Column 2 with the Stringency Index as an additional control. In all specifications, standard errors are clustered at the country level and reported in parentheses. All variables are defined in Appendix A.
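The decile ranks used in the marginal-probability table can be sketched as follows: sort the observations by the variable, split the sorted order into ten equal-sized groups, and number them so that 10 denotes the most stringent policies (ascending order) or the fewest hospital beds (descending order). This is an illustrative assumption about how the ranks are formed, not the authors' code; the function name is hypothetical, and ties simply keep the original order of the observations.

```python
def decile_ranks(values: list, descending: bool = False) -> list:
    """Decile rank 1-10 for each value, by its position in sorted order.

    Ascending: the largest values land in decile 10 (e.g., most stringent policies).
    Descending: the smallest values land in decile 10 (e.g., fewest hospital beds).
    Ties keep their original relative order because Python's sort is stable,
    including when reverse=True.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=descending)
    ranks = [0] * n
    for position, i in enumerate(order):
        ranks[i] = position * 10 // n + 1
    return ranks


if __name__ == "__main__":
    beds = [0.6, 1.0, 2.9, 4.3, 13.0, 2.1, 3.3, 7.9, 5.7, 0.9]  # made-up beds per 1,000
    print(decile_ranks(beds, descending=True))
```

Each decile's predicted probability would then be obtained by evaluating the fitted logistic regression at a representative value of the variable for that decile, with the other regressors held at their means.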
From deficits to debt and back: Political incentives under numerical fiscal rules
The global rise of anti-lockdown protests, and what to do about it
Hospital capacity and operations in the coronavirus disease 2019 (COVID-19) pandemic: Planning for the Nth patient
Public accounting fudges towards EMU: A first empirical survey and some public choice considerations
The COVID-19 pandemic is worse than official figures show
The southern hemisphere skipped the flu season in 2020
Estimating effects of non-pharmaceutical interventions on COVID-19 in Europe
Mitigation policies and COVID-19-associated mortality: 37 European countries
Variation in government responses to COVID-19
ILO monitor: COVID-19 and the world of work
World economic outlook update
The role of taxes in the disconnect between corporate performance and economic growth
Trade and wages, reconsidered
Good, bad or ugly? On the effects of fiscal rules with creative accounting
COVID-19 lockdowns and decline in traffic related deaths and injuries
Mortality risk of COVID-19
Policy brief: COVID-19 in an urban world
Estimates of the severity of coronavirus disease 2019: A model-based analysis
Indications for healthcare surge capacity in European countries facing an exponential increase in coronavirus disease (COVID-19) cases
How many people has the coronavirus killed?
What do deficits tell us about debt? Empirical evidence on creative accounting with fiscal rules in the EU
Coronavirus disease (COVID-19): Situation report 161
Estimating clinical severity of COVID-19 from the transmission dynamics in Wuhan, China
207,000 missing deaths: Tracking the true toll of the coronavirus outbreak