key: cord-0952233-y6o6z989
authors: Gómez, Juan-Pedro; Mironov, Maxim
title: Using Soccer Games as an Instrument to Forecast the Spread of COVID-19 in Europe
date: 2021-02-23
journal: Financ Res Lett
DOI: 10.1016/j.frl.2021.101992
sha: 9342fcb990004d0a9afc9f69890ff295d0a8a46e
doc_id: 952233
cord_uid: y6o6z989

We provide strong empirical support for the contribution of soccer games held in Europe to the spread of the COVID-19 virus in March 2020. We analyze more than 1,000 games across 194 regions from 10 European countries. Daily cases of COVID-19 grow significantly faster in regions where at least one soccer game took place two weeks earlier, consistent with the existence of an incubation period. These results weaken as we include stadiums with smaller capacity. We discuss the relevance of these variables as instruments for the identification of the causal effect of COVID-19 on firms, the economy, and financial markets.

There is anecdotal evidence that soccer games have contributed to the spread of the COVID-19 pandemic in Europe. 1 In this paper, we provide strong empirical support for this conjecture and discuss the implications of our findings for the identification of the causal impact of COVID-19 on firms, the economy, and financial markets.

Although it makes sense to assume that the original outbreak of the pandemic in China at the end of 2019 is exogenous, this becomes a more questionable assumption for the propagation of cases across countries and regions in Europe during the first quarter of 2020. For instance, the uninstrumented number of cases, especially at the beginning of the pandemic, is likely to overestimate the incidence of COVID-19 in well-connected versus remote cities. 2 Similarly, cities and regions with more inhabitants and higher population density are likely to experience faster virus spread (Rocklöv and Sjödin (2020)).
On the one side, these regions tend to accumulate a higher percentage of firms and human capital, hence making any correlation between the number of cases and firm variables (like productivity, growth, solvency, or liquidity) potentially spurious. On the other side, these regions are likely to concentrate more economic and medical resources to detect and counterattack the pandemic. Thus, the raw number of COVID-19 cases might capture the inverse quality of the regional health system, which is likely correlated with firm performance and regional growth. To overcome these endogeneity issues, we propose four variables related to soccer games played across European regions from 10 countries during the first quarter of 2020. These variables constitute a novel and valuable instrument to explore the causal effect of COVID-19 infections on firm performance, management decisions, and the economy. Methodologically, the exclusion restriction is well founded. National leagues and pan-European tournaments, like the UEFA Champions and Europa leagues, were scheduled well before the original outbreaks of COVID-19 in China. Although there is evidence of the behavioral impact of victories and losses of soccer matches on stock returns (e.g., Edmans, García, and Norli (2007) ), our soccer-related instruments are independent from the game's output. As far as we know, there is neither theory nor evidence that links directly the number of attendants to a soccer match or the capacity of the venue where it is played with, for instance, stock returns, cash holdings, or dividends of firms headquartered in the region, or, alternatively, growth in regional product or unemployment. Theoretically, the physical interaction among spectators in large venues as well as their arrival and departure from stadiums increase the likelihood of being infected with the virus, ultimately working as "super-spreader" events. 
The evidence in this paper is consistent with this conjecture and offers solid support for the relevance of these instruments to predict the spread of COVID-19 cases across European regions. We collect data from soccer games from all competitions (domestic and international) played in 194 regions across Belgium, France, Italy, Germany, the Netherlands, Poland, Spain, Sweden, Switzerland, and the UK, between January 1 and until the end of March 2020 (most games in Europe were canceled after March 10). In our main analysis, we include games played in venues with a minimum capacity of 25,000 people. In total, there are 1,051 qualifying games during this period. 3 We also collect the confirmed cases of COVID-19 in these regions until the end of March, plus three economic and demographic variables: gross regional product, population, and density. We construct four variables related to the soccer matches. Namely: a dummy variable that takes a value of one if there was a soccer game in the region, zero otherwise; a variable that accumulates the number of games played in the region; a variable that accumulates all the spectators who attended those games; and a variable that accumulates the capacity (maximum number of spectators) of the venues where the matches took place. We document the following findings. First, for any single country and day from March 1 through 14, the rate of change in the number of COVID-19 cases relative to the previous day is, on average, 5.5 percentage points higher in regions where there was at least one soccer game two weeks earlier relative to regions with no games in the same period (as reference, the average rate of change is 23% per day during this period). Additionally, the daily increment of cases is, on average, about 6 basis points higher for every 1% increase in the attendance and venue capacity of games played two weeks earlier. 
These results are significant at the 1% level and robust to the inclusion of regional demographic and economic control variables known to affect the virus spread (e.g., Rocklöv and Sjödin (2020)). Second, games held either the previous week or more than two weeks earlier have no significant effect on the growth of daily cases. This is consistent with the incubation period and the lack of mass testing in the early stages of the pandemic. 4 Third, as we expand the sample to include games held in venues with smaller capacity, the statistical significance of the coefficient on the three soccer-related variables decreases, becoming statistically indistinguishable from zero when we include stadiums with a minimum capacity above 10,000 spectators. This evidence is consistent with the effect of "super-spreaders" of the virus documented in other large events (e.g., Dave et al. (2020) and Felbermayr, Hinz, and Chowdhry (2020)). Fourth, the games played by a (local) team of a given region in another region have no significant effect on the number of cases in the local region, regardless of the game attendance or the venue capacity. Thus, there is no evidence that soccer fans traveling to other regions, or people gathering in bars in the local region to watch the game, have contributed significantly to the spread of the virus.

The rest of the paper is organized as follows. Section 2 describes the data. Results are presented in Section 3. We discuss the limitations of the analysis in Section 4, before concluding with Section 5.

Our variables and their sources are described in the Appendix. Our sample consists of 2,162 region-day observations. 5 We collect the accumulated number of diagnosed cases of COVID-19 per day and region from March 1 through March 14, 2020, in 194 regions from Belgium, France, Italy, Germany, the Netherlands, Poland, Spain, Sweden, Switzerland, and the UK. 6 We call this variable Cases.
7 Panel A of Table 1 shows that, on average, there are 96 accumulated cases per day and region, with an average of 35 accumulated cases per million regional inhabitants and day (variable Cases/Population). We then collect data on soccer games from all competitions (domestic and international) played in the 194 regions between January 1 and the end of March 2020 (most games in Europe were canceled after March 10). Initially, we include only games played in venues with a minimum capacity of 25,000 people. In total, there are 1,051 qualifying games during the sample period. For each game, we collect the date, the playing teams, the attendance (when available), the venue capacity, and the region and country where the venue is located. Finally, we also collect the following demographic variables for each region: Population, Density, and Gross Regional Product (GRP) per capita.

3 For robustness, we also collect data from games that took place in stadiums with a minimum capacity of 10,000 spectators, increasing the sample to 2,314 matches.
4 "Coronavirus disease 2019 (COVID-19) Situation Report -73," WHO, April 2, 2020.
5 Data on COVID-19 cases from Poland start on March 4, from Switzerland on March 6, and from England on March 9.
6 We are unable to obtain regional data of COVID-19 cases from Northern Ireland, Scotland, or Wales. Hence, only English regions are considered.
7 Table A in the Appendix shows the exact definition and source for each variable.

First, we want to explore whether there is a pattern in the relation between attendance at these events and the propagation of the virus. Every day, from March 1 through 14, we calculate the number of matches (# Games), the Attendance, and the venue Capacity of the games that took place in each region 1, 2, …, and up to 30 days before. Figure 1 plots the average value of each variable across the 14 days and 194 regions for each day lag. Notice that game attendance and venue capacity are highly correlated across lags (correlation coefficient 0.98).
The average match attendance is about 60% of venue capacity, and this percentage is very stable across lags. The figure shows periodic spikes around 7, 21, and 28-day lags for the three variables. Considering that the first day of our sample is Sunday, March 1, these spikes reflect the higher concentration of soccer matches on weekends (70% of soccer matches take place on weekends). Figure 2 confirms this by plotting the number of soccer games across all regions in our sample, from January 14 through March 14. The horizontal axis marks Saturdays. We can see that a disproportionate number of games fall on Saturday or Sunday. Thus, in order to smooth out the effect of weekends, we accumulate games, attendance, and venue capacity over weekly windows. For every region in our sample and for every day from March 1 through 14, we compute the number of soccer matches, the accumulated attendance, and the accumulated venue capacity 1, 2, …, and up to 6 weeks earlier. We also calculate the variable I_Games, which takes a value of 1 if there was at least one soccer match in the region during a given week, zero otherwise. Table 1, Panel A reports the statistics accumulated over the 6-week window. From March 1 through 14, on average, there were games in 44% of the regions over the previous 6 weeks. Additionally, for every day and region, there were on average 3.29 games accumulated over the previous 6 weeks, attended by an average of 78,953 (accumulated) people and played in venues with an average (accumulated) capacity of 136,092 spectators. Table B in the Appendix includes a list of all regions, with the accumulated number of cases, games, attendance, and venue capacity in our sample. 8

Table 1 Summary Statistics for the Sample of Region-Days
In Panel A, each observation is a region-day pair. Every day from March 1 through March 14, 2020, Cases is the accumulated number of diagnosed cases of COVID-19 in the region during that period. Cases/Population is the number of cases per million inhabitants. We consider all regions in Belgium, France, Italy, Germany, the Netherlands, Poland, Spain, Sweden, Switzerland, and the UK. The distribution of observations across regions is in Table B.1 of Appendix B. Every day from March 1 through March 14, # Games, Attendance, and Capacity are the accumulated number of soccer matches played in the region, their attendance, and the venue capacity, respectively, over the previous 6 weeks. I_Games is a dummy variable that takes a value of 1 if there was at least one soccer match in the region where the firm is located during the previous 6 weeks, zero otherwise. Population is thousands of inhabitants per region; Density is the number of inhabitants per square km; GRP is the Gross Regional Product per capita in USD. Log(x) denotes the natural logarithm of x. $\Delta \mathrm{Log}(1 + x_t) = \mathrm{Log}((1 + x_t)/(1 + x_{t-1}))$. In Panel B, we report the average across regions of the weekly accumulated number of games, attendance, and venue capacity for up to 6 weekly lags.

Thus, the median value of the three variables in Table 1 is zero.

Table 1 presents the average of each variable across the 14 sample days and 194 regions for each week lag. Except for the first week, 9 the estimates are very similar across weeks. On average, across weeks 2 through 6, 33% of the regions hosted at least one soccer match per week. There were 0.55 games per week and region, attended by 13,192 people and played in venues with an average capacity of about 22,590 spectators. 10

We now analyze the relation between, on the one hand, the number, attendance, and venue capacity of the soccer games held until all competitions were interrupted and, on the other, the propagation of COVID-19 cases across days and regions during the first two weeks of March 2020.
There is evidence that the incubation period of COVID-19 (that is, the "pre-symptomatic" period between becoming infected and developing symptoms of the disease) can be as long as two weeks. Thus, there is likely a lag between the time when match spectators become infected and the time they are tested after developing symptoms compatible with the disease. This is especially relevant in the first two weeks of March 2020, when mass testing (in particular, of asymptomatic people) had not yet been implemented in any country. Figure 3 shows that by March 15, all countries in our sample, except Switzerland and (marginally) Germany, had a ratio of COVID-19 tests per thousand people below 0.2. Most likely, at the onset of the pandemic, only people with symptoms were tested and, eventually, diagnosed as new cases of COVID-19 infections. Therefore, considering the incubation window and that only symptomatic people were tested at that point, we expect the predictive power of our instruments to become significant with a lag after the game.

To test this prediction, we run the following panel regression in region r and day t from March 1 through 14, 2020: 11

$$\Delta \mathrm{Log}(1 + \mathit{Cases}_{r,t}) = a + b_1 \Delta \mathrm{Log}(1 + \mathit{Cases}_{r,t-1}) + b_2 \mathrm{Log}(\mathit{Population}_r) + b_3 \mathrm{Log}(\mathit{Density}_r) + b_4 \mathrm{Log}(\mathit{GRP}_r) + \sum_{w=1}^{6} c_w WX_{r,t-w} + FE_{c \times t} + \epsilon_{r,t} \qquad (1)$$

$\Delta \mathrm{Log}(1 + \mathit{Cases}_{r,t})$ represents the (log) difference between 1 plus the number of cases in region r on day t and on day t-1. Likewise, $\Delta \mathrm{Log}(1 + \mathit{Cases}_{r,t-1})$ is the same variable lagged 1 day. For every lagged week $w = \{1, 2, \ldots, 6\}$ and region r, the variable $WX_{r,t-w}$ represents, alternatively, the dummy variable $I\_Games_{t-w}$, which takes a value of one if there was a soccer match in the region on any day in the window $(t - (1 + 7 \times (w - 1)),\; t - 7 \times w)$; the natural logarithm of 1 plus the accumulated number of match attendants over the week, $\mathrm{Log}(1 + \mathit{Attendance}_{t-(1+7\times(w-1))} - \mathit{Attendance}_{t-7\times w})$; and the natural logarithm of 1 plus the accumulated venue capacity of the games played over the week, $\mathrm{Log}(1 + \mathit{Capacity}_{t-(1+7\times(w-1))} - \mathit{Capacity}_{t-7\times w})$. We control for each region's population, density, and gross regional product per capita (GRP). Our object of interest is the series of coefficients on the weekly lagged predictors, $c_w$, $w = \{1, 2, \ldots, 6\}$. $FE_{c \times t}$ represents country times day fixed effects. All variables are defined in Appendix A. Standard errors are clustered at the region level.

9 Games were canceled throughout Europe around March 10. Thus, the variable estimates from March 11 through 14 over the first week-lag are smaller than the corresponding estimates for weeks 2 through 6.
10 If the region did not have any games, the capacity is zero. Thus, the average capacity is below 25,000, the minimum required stadium capacity to be included in the sample.
11 In order to keep all observations, we add 1 to Cases, since otherwise the logarithm of zero is not defined.

The figure shows the number of daily tests for COVID-19 per thousand people from February 1 through March 31, 2020, for the countries in our sample for which data are available. The graph is retrieved from https://ourworldindata.org/coronavirus-testing. Data are collected by Our World in Data, Oxford Martin School, University of Oxford. Data description and sources per country can be found at https://ourworldindata.org/coronavirus-testing#source-information-country-by-country

Table 2 presents the results from regression (1) for the three soccer variables. The rate at which the daily number of cases of COVID-19 increases is positive and significantly related to the increase in cases the previous day. It is also higher in more populated and wealthier (higher Log(GRP)) areas. With respect to the lagged soccer variables, only the coefficient $c_2$, corresponding to I_Games, Log(Attendance), or Log(Capacity) two weeks earlier, is significant. The other lags are non-significant for any of the three variables.
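As an illustration of how the variables entering regression (1) could be assembled, the following Python sketch computes ΔLog(1 + Cases) from a cumulative case series and the three weekly lagged soccer regressors for a single region. The function names, the tuple layout of a game record, the example games, and the reading of the weekly window as closed on both ends (seven days, from t − 7w to t − (1 + 7(w − 1))) are assumptions made for this sketch; they are not taken from the authors' data or code.

```python
import math
from datetime import date, timedelta


def delta_log_cases(cum_cases):
    """Return Log((1 + x_t) / (1 + x_{t-1})) for consecutive days.

    cum_cases is a list of cumulative case counts, one per day.
    Adding 1 keeps days with zero cases in the sample.
    """
    return [math.log((1 + cur) / (1 + prev))
            for prev, cur in zip(cum_cases, cum_cases[1:])]


def week_window(t, w):
    """First and last day of week lag w before day t (assumed inclusive)."""
    start = t - timedelta(days=7 * w)
    end = t - timedelta(days=1 + 7 * (w - 1))
    return start, end


def weekly_regressors(games, t, w):
    """Build I_Games, Log(1 + Attendance), Log(1 + Capacity) for week lag w.

    games is a list of (game_date, attendance, capacity) tuples for one region.
    """
    start, end = week_window(t, w)
    in_window = [g for g in games if start <= g[0] <= end]
    attendance = sum(g[1] for g in in_window)
    capacity = sum(g[2] for g in in_window)
    return {
        "I_Games": 1 if in_window else 0,
        "log_attendance": math.log(1 + attendance),
        "log_capacity": math.log(1 + capacity),
    }


# Hypothetical region with two games (made-up numbers, for illustration only).
games = [
    (date(2020, 2, 22), 40000, 60000),
    (date(2020, 3, 1), 30000, 50000),
]
t = date(2020, 3, 7)
for w in range(1, 4):
    print(w, week_window(t, w), weekly_regressors(games, t, w))
```

In this toy example the game on March 1 falls in the first weekly lag of March 7, the game on February 22 falls in the second, and the third lag contains no game, so its dummy is zero and both logarithms equal Log(1) = 0.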
In specification (1), for any single country and day from March 1 through 14, the rate of change in the number of COVID-19 cases relative to the previous day is, on average, 5.5 percentage points higher in regions where there was a soccer game two weeks earlier relative to regions with no games in the same period. This result is statistically significant at the 1% level as well as economically significant (the average growth rate of cases was 23% per day during this period). Specifications (2) and (3) show that the rate of change is, on average, about 6 basis points higher for every 1% increase in attendance and venue capacity, respectively. Both results are significant at the 1% level. These results are consistent with the documented incubation period of the virus and the lack of massive testing during the sample period. Finally, we test if our results change when we include venues with smaller minimum capacity. There is evidence of the role played by large gatherings of people in the dissemination of the virus. These are known as "super-spreader" events (e.g., Dave et al (2020) and Felbermayr, Hinz, and Chowdhry (2020) ). To test the importance of the minimum venue capacity, we expand the sample to include games that took place in venues with a minimum capacity of 10,000 spectators. The extended sample includes 2,314 games. Table 3 presents the results of regression (1) when we consider games held in venues with a minimum capacity of 20, 15 and 10 thousand spectators, respectively, for each of the three soccer variables. Like in Table 2 , the daily increment in the number of cases of COVID-19 is positive and significantly related to the increase of cases the previous day. It is also higher in more populated and wealthier (higher Log(GRP)) areas. 
When we include stadiums with a minimum capacity of 20,000 spectators, the rate of change in the number of COVID-19 cases relative to the previous day is, on average, higher by 4.2 percentage points in regions where there was a soccer game two weeks earlier relative to regions with no games in the same period. This is lower than the 5.5% difference in Table 2. The coefficient is significant at the 5% level (down from 1% in Table 2). The Attendance and Capacity variables show a similar qualitative pattern.

Table 2. This table reports the coefficients from the following regression:

$$\Delta \mathrm{Log}(1 + \mathit{Cases}_{r,t}) = a + b_1 \Delta \mathrm{Log}(1 + \mathit{Cases}_{r,t-1}) + b_2 \mathrm{Log}(\mathit{Population}_r) + b_3 \mathrm{Log}(\mathit{Density}_r) + b_4 \mathrm{Log}(\mathit{GRP}_r) + \sum_{w=1}^{6} c_w WX_{r,t-w} + FE_{c \times t} + \epsilon_{r,t}$$

$\Delta \mathrm{Log}(1 + \mathit{Cases}_{r,t})$ represents the (log) difference between 1 plus the number of cases in region r on day t with respect to day t-1. Likewise, $\Delta \mathrm{Log}(1 + \mathit{Cases}_{r,t-1})$ is the same variable lagged 1 day. For every lagged week $w = \{1, 2, \ldots, 6\}$ and region r, the variable $WX_{r,t-w}$ represents, alternatively, the dummy variable $I\_Games_{t-w}$, which takes a value of one if there was a soccer match in the region on any day in the window $(t - (1 + 7 \times (w - 1)),\; t - 7 \times w)$; the natural logarithm of 1 plus the accumulated number of match attendants over the week, $\mathrm{Log}(1 + \mathit{Attendance}_{t-(1+7\times(w-1))} - \mathit{Attendance}_{t-7\times w})$; or the natural logarithm of 1 plus the accumulated venue capacity over the week, $\mathrm{Log}(1 + \mathit{Capacity}_{t-(1+7\times(w-1))} - \mathit{Capacity}_{t-7\times w})$. We control for each region's Population, Density, and Gross Regional Product per capita (GRP). $FE_{c \times t}$ represents country times day fixed effects. Appendix A includes the definition and source of each variable. Standard errors (in parentheses) are clustered at the region level. ***, **, * represent statistical significance at the 1, 5, and 10% level, respectively.
However, when we expand the minimum capacity to 15,000 spectators, the coefficient is not statistically different from zero for any of the three variables (only marginally at the 10% for Attendance). These results are confirmed when the minimum capacity is lowered to 10,000 spectators. We interpret these results as consistent with the evidence of other super-spread events. A minimum agglomeration is needed for the spread of the virus to be statistically detectable. In this section, we discuss some limitations of our analysis. In the first place, our regressions only explain, on average, 18% of the change in daily cases. Thus, the coefficients on the soccer variables should be interpreted in a cross-sectional way: they help explain differences in the incidence of COVID-19 across regions in the early stages of the pandemic, rather than the absolute numbers of contagions within each region. Furthermore, relative to our sample period, people's awareness has increased and governments around the world have taken measures to promote public hygiene and social distancing. Currently, we would expect any public gathering or mass event to result in much lower COVID-19 spreading. For this reason, using soccer games as an instrument variable is only applicable during the outbreak of the pandemic across Europe in March. This limitation is shared by other studies based on large gatherings, like motorcycle rallies and ski resorts, mentioned in the Introduction. Unlike these events, however, soccer competitions have two advantages as an instrument. First, they take place across several countries, hence expanding the sample size considerably. Second, the games are staggered through the first quarter of 2020, in contrast with other mass events like Carnival celebrations, which take place rather simultaneously across Europe in the same period. 
Table 3 Regression of Change in Cases on Weekly Lagged Games, Attendance and Capacity Sorted by Minimum Venue Capacity (below 25K spectators)
This table reports the coefficients from the following regression:

$$\Delta \mathrm{Log}(1 + \mathit{Cases}_{r,t}) = a + b_1 \Delta \mathrm{Log}(1 + \mathit{Cases}_{r,t-1}) + b_2 \mathrm{Log}(\mathit{Population}_r) + b_3 \mathrm{Log}(\mathit{Density}_r) + b_4 \mathrm{Log}(\mathit{GRP}_r) + \sum_{w=1}^{6} c_w WX_{r,t-w} + FE_{c \times t} + \epsilon_{r,t}$$

$\Delta \mathrm{Log}(1 + \mathit{Cases}_{r,t})$ represents the (log) difference between 1 plus the number of cases in region r on day t with respect to day t-1. Likewise, $\Delta \mathrm{Log}(1 + \mathit{Cases}_{r,t-1})$ is the same variable lagged 1 day. For every lagged week $w = \{1, 2, \ldots, 6\}$ and region r, the variable $WX_{r,t-w}$ represents, alternatively, the dummy variable $I\_Games_{t-w}$, which takes a value of one if there was a soccer match in the region on any day in the window $(t - (1 + 7 \times (w - 1)),\; t - 7 \times w)$; the natural logarithm of 1 plus the accumulated number of match attendants over the week, $\mathrm{Log}(1 + \mathit{Attendance}_{t-(1+7\times(w-1))} - \mathit{Attendance}_{t-7\times w})$; or the natural logarithm of 1 plus the accumulated venue capacity over the week, $\mathrm{Log}(1 + \mathit{Capacity}_{t-(1+7\times(w-1))} - \mathit{Capacity}_{t-7\times w})$. We control for each region's Population, Density, and Gross Regional Product per capita (GRP). $FE_{c \times t}$ represents country times day fixed effects. Appendix A includes the definition and source of each variable. >20K, >15K, and >10K represent the minimum capacity of venues included in the sample. Standard errors (in parentheses) are clustered at the region level. ***, **, * represent statistical significance at the 1, 5, and 10% level, respectively.

Finally, another limitation is that people might also have caught the coronavirus in bars where soccer matches were broadcast, without being physically present in the match venue. To assess the impact of this indirect channel of contagion, we perform the following exercise.
For every game in our sample, we replicate Table 2 but considering the spread of cases in the region when a local team plays outside the region. In this case, we might expect an increase of bar attendance in the region of the local team but not mass gathering of people as we predict in the region where the game is actually played. 12 That is, in regression (1), for every lagged week w={1,2,…,6} and region r, the variable WX r,t − w now represents, alternatively, the dummy variable, I Games t− w , that takes a value of one if there was a soccer match in which a team from region r played outside that region any day t ∈ (t − (1 + 7 × (w − 1), t − 7 × w); the natural logarithm of 1 plus the accumulated number of match attendants to those games,Log(1 + Attendance t − (1 + 7 × (w − 1)) − Attendance t − 7 × w ), or the natural logarithm of 1 plus the accumulated venue capacity of those games, Log(1 + Capacity t − (1 + 7 × (w − 1)) − Capacity t − 7 × w ). We include the same set of controls as in equation (1). Standard errors are clustered at the region level. Results are reported in Table 4 . Even accounting for the impact of cross-border movements of fans, the celebration of any game where a local team plays outside the region has no significant effect on the virus spread in the region, regardless of the venue attendance or capacity. The evidence about the soccer variables introduced in this paper may help overcome potential endogeneity issues in the analysis of how the spread of COVID-19 has affected the economy and firm decisions. Despite the limited time span (March 2020) of these variables, the impact of the COVID-19 pandemic is so deep and unprecedented, that we believe this analysis is relevant. 
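To make the role of an instrument concrete, the sketch below contrasts an ordinary least squares slope with a single-instrument estimator, Cov(z, y) / Cov(z, x), on a deliberately tiny made-up data set. Here z stands in for a soccer dummy, x for case growth, y for a firm-level outcome, and u for an unobserved regional factor that affects both x and y; all names and numbers are hypothetical and chosen only so that the arithmetic is exact. This is a simplified illustration without the controls, fixed effects, or clustered standard errors of regression (1), not the estimation procedure used in the paper.

```python
def covariance(a, b):
    """Population covariance of two equally long lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)


def ols_slope(x, y):
    """Slope from regressing y on x (with intercept)."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """Single-instrument IV slope: Cov(z, y) / Cov(z, x)."""
    return covariance(z, y) / covariance(z, x)


# Hypothetical toy data (not from the paper).
z = [0, 0, 1, 1]            # instrument, e.g. a game dummy
u = [1, -1, 1, -1]          # unobserved confounder, uncorrelated with z
x = [2 * zi + ui for zi, ui in zip(z, u)]   # endogenous regressor
y = [3 * xi + 5 * ui for xi, ui in zip(x, u)]  # outcome; true slope on x is 3

print("OLS slope:", ols_slope(x, y))
print("IV slope:", iv_slope(z, x, y))
```

Because u moves both x and y, the OLS slope in this toy example is pushed away from the value of 3 used to generate the data, while the instrument, which is uncorrelated with u by construction, recovers it.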
Gómez and Mironov (2020), for instance, show that, only after instrumenting the number of COVID-19 cases with the soccer variables, there is evidence of a causal relation between the propagation of the virus and the cross-section of stock returns from firms headquartered in these regions. The accumulated drop in stock performance during March and April 2020 is significantly higher for firms in regions with higher incidence of (instrumented) COVID-19, but only when the company's CEO is older than 60 years. The existing evidence shows that older people are more likely to suffer from severe illness or even death in case of contagion. Thus, the market is discounting the likelihood of the company's CEO possibly dying of COVID-19. These instruments could also be used to analyze the causal effect of the virus on the drop in regional gross product or employment, or on corporate variables like revenue, cash holdings, dividends, investments, inventories, and accounts payable, as more data becomes available.

Table 4 Regression of Change in Cases on Weekly Lagged Games, Attendance and Capacity when a Regional Local Team Plays in a Different Region
This table reports the coefficients from the following regression:

$$\Delta \mathrm{Log}(1 + \mathit{Cases}_{r,t}) = a + b_1 \Delta \mathrm{Log}(1 + \mathit{Cases}_{r,t-1}) + b_2 \mathrm{Log}(\mathit{Population}_r) + b_3 \mathrm{Log}(\mathit{Density}_r) + b_4 \mathrm{Log}(\mathit{GRP}_r) + \sum_{w=1}^{6} c_w WX_{r,t-w} + FE_{c \times t} + \epsilon_{r,t}$$

$\Delta \mathrm{Log}(1 + \mathit{Cases}_{r,t})$ represents the (log) difference between 1 plus the number of cases in region r on day t with respect to day t-1. Likewise, $\Delta \mathrm{Log}(1 + \mathit{Cases}_{r,t-1})$ is the same variable lagged 1 day. For every lagged week $w = \{1, 2, \ldots, 6\}$ and region r, the variable $WX_{r,t-w}$ represents, alternatively, the dummy variable $I\_Games_{t-w}$, which takes a value of one if there was a soccer match where a local team from region r played outside that region on any day in the window $(t - (1 + 7 \times (w - 1)),\; t - 7 \times w)$; the natural logarithm of 1 plus the accumulated number of match attendants to those games, $\mathrm{Log}(1 + \mathit{Attendance}_{t-(1+7\times(w-1))} - \mathit{Attendance}_{t-7\times w})$; or the natural logarithm of 1 plus the accumulated venue capacity of those games, $\mathrm{Log}(1 + \mathit{Capacity}_{t-(1+7\times(w-1))} - \mathit{Capacity}_{t-7\times w})$. We control for each local region's Population, Density, and Gross Regional Product per capita (GRP). $FE_{c \times t}$ represents country times day fixed effects. Appendix A includes the definition and source of each variable. Standard errors (in parentheses) are clustered at the region level. ***, **, * represent statistical significance at the 1, 5, and 10% level, respectively.

12 Arguably, this is not a perfect experiment, since fans of a local team from a given region might have travelled to attend the game when the team plays in another region, later spreading the virus at home (see footnote 1). The number of local fans travelling to another region is likely to increase with the game attendance and the venue capacity. We cannot disentangle this effect from the virus spread from bar attendants in the local region.

None. Tables A and B.

The Contagion Externality of a Superspreading Event: The Sturgis Motorcycle Rally and COVID-19
Sports Sentiment and Stock Returns
Après-ski: The Spread of Coronavirus from Ischgl through Germany
COVID-19 and the Value of CEOs: The Unintended Effect of Soccer Games across European Stocks
Identification of critical airports for controlling global infectious disease outbreaks: Stress-tests focusing in Europe
High population densities catalyze the spread of COVID-19