key: cord-0788683-9jhwjmya authors: Andersen, M. S.; Bento, A. I.; Basu, A.; Marsicano, C.; Simon, K. title: College Openings, Mobility, and the Incidence of COVID-19 Cases date: 2020-09-23 journal: nan DOI: 10.1101/2020.09.22.20196048 sha: 2497a8abc011a7a7a62bd77f881bbb61fa3b01e5 doc_id: 788683 cord_uid: 9jhwjmya Most U.S. colleges have reopened campuses for in-person teaching this Fall, following rapid closures at the onset of the COVID-19 pandemic this Spring. Despite administrative efforts at mitigation and preventive measures, the large congregation of students within close quarters has caused public health concerns. In this paper, we examine college reopenings' association with changes in human mobility within campuses and in COVID-19 incidence in the counties of the campuses, over a two-week period before and after college reopenings. To estimate the daily reproduction number (Rt), we used a Bayesian framework. Using a difference-in-differences design comparing areas with a college campus, before and after reopening, to areas without a campus, we find that after college reopenings for face-to-face instruction, COVID-19 incidence in the county increased on average by a statistically significant 0.024 per thousand residents, following increases in mobility on campus. Similarly, we estimated increased transmission locally after reopening, with a rising trend in Rt. The increase in cases was larger in counties with colleges that drew students from areas with increasing incidence rates. During late Summer 2020, colleges reopened and welcomed hundreds of thousands of students back to campus in the United States (1) . However, in several prominent cases, institutions switched to online instruction after rapid increases in COVID-19 cases on campus (2) . These early experiences call into question the feasibility of resuming in-person instruction during the COVID-19 pandemic. 
Several reports indicate that reopening may have promoted the spread of infections among students, and likely in the community (3). However, we are unaware of any research that formally tests this hypothesis in the U.S. or elsewhere. We hypothesized that reopening colleges would increase COVID-19 cases within the college community, with potential spillovers onto the neighboring areas. We also hypothesized that increases in incidence would be greater on campuses that attract students from areas with higher incidence of COVID-19, and that effects would be concentrated among campuses providing face-to-face instruction. It is outside the scope of our study to understand the impact of specific actions colleges may have taken more recently in response to rising rates. We describe our methods in the Supplementary Information.

Our sample period ran from July 15th, 2020 to September 13th, 2020. Of the 3,142 counties in the United States, only 779 contained a college campus in our universe of 1,409 colleges (Panel A of eFigure 1). The number of devices visiting campus increased significantly in the week leading up to the start of classes and after classes began (Panel B, Figure 1). In the aggregate, the number of devices on campus rose by 47.3% (eTable 1 Model 1, 95% CI: 36.3% - 58.2%, (4)) in the two weeks after classes started, relative to the two-week period before the start of classes. In eTable 2, we break out these results further by week relative to the reopening date. The increase was larger among schools that opened for primarily in-person education (55.7%; CI: 43.0% - 68.5%) than among those that did not (33.2%; CI: 23.4% - 43.0%) (eTable 1 Model 2, (4)). eFigures 1 and 2 (4) present conventional event studies that use counties with a school that has not yet opened, or without a school, as a comparison group and yield similar conclusions.
Using our difference-in-differences framework, we find that reopening a college was associated with a statistically significant increase of 0.017 cases per 1,000 people (eTable 1, Model 1, (4); 95% CI: 0.003 - 0.030). When we separate the types of opening (Model 2), we find that reopening for primarily in-person instruction increases COVID-19 cases in the county by 0.024 cases per thousand (Model 2, 95% CI: 0.008 - 0.040), with no significant effect of opening online. These trends are illustrated in Figure 2. Using our weighted measure of exposure to cases through students' out-of-county migration, we find that a campus with a ten percent higher exposure level is associated with an additional 0.0119 cases per thousand county residents per day (Model 3, 95% CI: 0.0078 - 0.0160) relative to other campuses, holding all else constant. Further, when including in-person instruction interactions, we find that a ten percent increase in the out-of-county exposure measure is associated with an additional 0.0142 (CI: 0.0092 - 0.0192) cases per thousand in counties in which the first school opened for in-person instruction, with no association for other counties.

We performed several additional analyses and specification checks, which we report on GitHub and describe here. In eTables 3-5 (4), we report similar results using alternative measures of our main disease-related outcome. In eTable 3 (4), we use the number of new cases over the last three days, which ensures that results are not driven by weekends. eTable 4 (and eFigure 1, Panel C) uses new cases over a week, and eTable 5 (and eFigure 1, Panel D) reports results using Rt as the dependent variable. Using the Rt estimates, the main effect of reopening is consistently significant and indicates that reopening was associated with an increase in transmission, but we no longer find significant differences by instructional modality.
Weighting counties by population (eTable 6, (4)) resulted in no significant differences between online and in-person reopening, nor was there a statistically significant association with reopening. However, we continued to find that reopening schools that drew students from higher-risk areas was associated with an increase in COVID-19 cases in the county. In addition, we used higher-resolution measures of in-person teaching (eTable 7, (4)), which indicate that there is a dose-response relationship between the degree of in-person teaching and new cases. Institutions that opened for primarily in-person instruction and those that adopted hybrid or hyflex models were predicted to have higher numbers of visitors than those at the extremes of the in-person to online distribution.

Our findings demonstrate that reopening a college is associated with an increase in the number of cellular devices on campus in the week leading up to the start of classes, irrespective of the type of instruction offered. Our second main finding is that reopening a college with in-person instruction significantly increases the rate of new daily cases in the county. We also find a significant and positive relationship between daily new cases in the county and a county's exposure to new cases from students' home states in the two weeks before reopening. This is consistent with the county estimates of Rt, which reflect a steady increase in transmission two weeks after the uptick in mobility. Using our classification of counties, our results indicate that reopening college campuses for in-person instruction is associated with more than 3,000 (3,219 [95% CI: 1,094 - 5,344]) additional cases of COVID-19 per day in the United States, assuming that our results are not due solely to new cases being diagnosed on campus rather than at home.
However, because of the nature of the case report data, we were unable to disentangle how many of the cases we measure as our outcome are "imported" (student arrivals) and how many are local transmissions from the students. Furthermore, asymptomatic cases will only be caught if testing is done on campus regardless of symptoms. Nevertheless, our results are inconsistent with large numbers of "imported" cases, since an imported case would lead to an increase in COVID-19 cases in the first week of reopening, assuming that a positive test result is returned nine days after infection, whereas we find no increase in COVID-19 cases until the second week of classes, as reflected by Rt. We did not quantify potential spillovers to the communities surrounding campuses, as these effects will take more weeks to be observed and would benefit from college-level incidence data. Evaluating the effectiveness of specific mitigation measures taken by colleges, especially the ways in which colleges have reacted to the initial increases in cases with strong countermeasures, was also beyond the scope of this initial study; these remain priorities for forthcoming studies. Several other limitations of our analysis, arising primarily from a lack of data, are discussed in the SI.

We submit that our findings are critical in the context of adaptive public health management strategies, and in particular for colleges as they consider additional strategies to mitigate disease burden and decrease transmission. Institutional leaders should think carefully as they plan their Spring 2021 semesters, examining not only the "situation on the ground" in their own institutions' communities, but also in those from which they draw a substantial number of students.

Most schools began allowing students to move in around seven days before the start of class (B). Cases in counties with a school that opened for in-person education increased significantly by 8 days after reopening (C).
Trends in Rt (D) indicate that reopening, rather than instructional modality, was associated with changes in Rt over time. Each panel reports estimates from a single regression model including controls for county/CBG and date. 95% confidence intervals are based on standard errors clustered on county and allowing for serial correlation in the error term.

medRxiv preprint doi: https://doi.org/10.1101/2020.09.22.20196048; this version posted September 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

We collected data on opening dates and announced instructional methods from the College Crisis Initiative (1) for 1,431 public and non-profit colleges and universities ("colleges") in the United States. The College Crisis Initiative collects data on nearly all non-profit and public four-year degree-granting institutions with full-time undergraduates that receive Title IV aid. It excludes four-year for-profit institutions, specialty institutions like seminaries or stand-alone law schools, and institutions with graduate-only programs. This list comes from the Integrated Postsecondary Education Data System (IPEDS), which lists in total 6,527 institutions ranging from research universities to non-degree-granting institutions like local cosmetology schools. IPEDS indicates that of those, 2,009 are four-year public and non-profit degree-granting institutions with first-time, full-time undergraduates. Our sample, therefore, represents nearly 70 percent of these institutions. Further, this represents 70 percent of total undergraduate enrollment among all institutions of higher education in the United States (author calculations based on IPEDS administrative 2018 data).
We merged college opening dates with "shapefiles" (the geographic coordinates) for college campuses (2), which allowed us to assign college campuses to Census Block Groups (CBGs) and counties. Our final sample included 1,404 colleges, as 27 colleges did not map to a CBG. We dropped an additional 2 schools that did not have an announced reopening strategy for a final sample of 1,409 schools. We assigned reopening strategies based on the mode of instruction reported on the date instruction began for Fall 2020. Campuses were classified as primarily in-person (886) or primarily online (483). Across these colleges there are campuses in 1,446 college-county pairs.

We extracted cellular data from SafeGraph's Social Distancing Metrics datafiles, an opt-in sample of approximately 20 million mobile devices. These data measure the number of devices that are detected each day in each CBG, from July 15th through Sunday, September 13th. SafeGraph data have been used in several recent publications (3-5). We collected daily state and county confirmed COVID-19 cases for the same period (6), dividing by U.S. Census population estimates to form our main outcome. In addition to case counts, we estimated the county-level net reproduction number (Rt) (7) (details in the eMethods) as a measure of disease transmission.

We used data from 2019 to construct estimates of each college's exposure to counties around the United States (see supplement for details) based on the number of devices moving onto campus around the start of classes for the Fall 2019 semester. We use 2019 data because that allows us to create potential exposure indices for all campuses, including those that have not reopened. Using our estimate of a college's exposure to counties, we computed the weighted average of the log weekly incidence just prior to campus reopening in 2020, for those areas from which a college drew students. We used the change in this measure from July 15th to two weeks before reopening as a measure of the unexpected incidence in students' home areas.

We used the daily cases, incubation period, and serial interval previously estimated (7). This allowed us to estimate the effective reproduction number for each county. The basic reproduction number R0 characterizes the mean number of secondary cases produced by a primary infector during the exponential growth phase of the epidemic, before interventions are applied and when the depletion of susceptible individuals is negligible (8). The effective reproduction number R(t) represents the mean number of secondary cases generated by a primary infector at time t (9). This measure is useful for tracking the effectiveness of control measures, which aim to push it below the epidemic threshold (corresponding to R(t)=1). R(t) incorporates factors affecting the spread of the epidemic (e.g., individuals' behavior and the depletion of susceptibles). To estimate R(t), we use the same methodology described previously (9, 10) to distinguish between locally acquired and imported cases. Thus, we assume that the daily number of new cases (by date of symptom onset) with locally acquired infection, L(t), can be approximated by a Poisson distribution:

$$L(t) \sim \mathrm{Pois}\left(R(t) \sum_{s=1}^{t} \phi(s)\, C(t-s)\right)$$

where C(t) is the number of new cases (either locally acquired or imported) at time t (date of symptom onset), R(t) is the effective reproduction number at time t, and φ is the generation time distribution.
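To make the Poisson approximation above concrete, the following minimal Python sketch computes the expected number of local cases as R times a generation-time-weighted sum of past cases, and draws R from its posterior with a random-walk Metropolis-Hastings step under a flat prior on (0, 1000]. It is an illustration under stated assumptions, not the authors' implementation: it assumes a single constant R over the window (the paper estimates a daily R(t)), a generation-time distribution discretized to whole days, and far fewer iterations than a production run. All function names are hypothetical. The gamma shape and rate in the usage lines are the values quoted in this section; note that shape/rate gives a mean of roughly 7.5 days, so the parameterization should be checked against the intended serial interval before reuse.

```python
import math
import random


def discretized_gamma(shape, rate, max_lag):
    """Gamma density evaluated at lags 1..max_lag, renormalized to sum to 1."""
    log_norm = shape * math.log(rate) - math.lgamma(shape)
    dens = [math.exp(log_norm + (shape - 1) * math.log(s) - rate * s)
            for s in range(1, max_lag + 1)]
    total = sum(dens)
    return [d / total for d in dens]


def infection_pressure(cases, phi, t):
    """sum over s >= 1 of phi(s) * C(t - s), truncated at the start of the series."""
    return sum(phi[s - 1] * cases[t - s]
               for s in range(1, min(t, len(phi)) + 1))


def log_likelihood(R, local, pressure):
    """Poisson log-likelihood of local cases with mean R * pressure.

    Days with zero pressure carry no information about R and are skipped.
    """
    ll = 0.0
    for x, p in zip(local, pressure):
        if p > 0:
            mu = R * p
            ll += x * math.log(mu) - mu - math.lgamma(x + 1)
    return ll


def sample_R(local, cases, phi, n_iter=20000, burn_in=2000, step=0.1, seed=0):
    """Random-walk Metropolis-Hastings draws of a constant R (assumed sketch)."""
    rng = random.Random(seed)
    pressure = [infection_pressure(cases, phi, t) for t in range(len(cases))]
    R = 1.0
    ll = log_likelihood(R, local, pressure)
    draws = []
    for i in range(n_iter):
        proposal = R + rng.gauss(0.0, step)
        # Flat prior: proposals outside (0, 1000] have zero density and are rejected.
        if 0.0 < proposal <= 1000.0:
            ll_new = log_likelihood(proposal, local, pressure)
            if rng.random() < math.exp(min(0.0, ll_new - ll)):
                R, ll = proposal, ll_new
        if i >= burn_in:
            draws.append(R)
    return draws


if __name__ == "__main__":
    phi = discretized_gamma(4.87, 0.65, 20)   # shape and rate as quoted in the text
    cases = [100.0] * 40                      # synthetic total cases, for illustration only
    local = [1.5 * infection_pressure(cases, phi, t) for t in range(40)]
    draws = sample_R(local, cases, phi)
    print("posterior mean of R:", sum(draws) / len(draws))
```

Precomputing the infection pressure once keeps each Metropolis-Hastings iteration cheap, since only the scalar R changes between proposals. A daily R(t) could be obtained by running the same sampler on short sliding windows, which is again an assumption of this sketch rather than something the text specifies.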
To estimate the time between consecutive generations of cases, we adopted the serial interval (which measures the time difference between symptom onset in infectors and in the people they infect) estimated from the literature (7), namely a gamma distribution with mean 5.0 days and standard deviation 3.4 days (shape=4.87, rate=0.65). The likelihood Λ of the observed time series of cases from day 1 to T can be written as:

$$\Lambda = \prod_{t=1}^{T} P\left(L(t),\; R(t) \sum_{s=1}^{t} \phi(s)\, C(t-s)\right)$$

where P(x,y) is the Poisson density of observing x events, given the parameter y, and φ(s) is the generation time distribution evaluated at lag s. We then use Metropolis-Hastings MCMC sampling to estimate the posterior distribution of R(t). The Markov chains were run for 100,000 iterations, with a burn-in period of 10,000 steps, assuming non-informative prior distributions for R(t) (flat distribution in the range (0-1000]). Convergence was checked by visual inspection of multiple chains started from different starting points.

We constructed a measure of a college's exposure to different geographic areas using movement data from 2019. We collected start dates for the Fall 2019 semester and estimated the change in the number of devices on campus from a pre-period, which we defined as the period from 14 to 8 days before the start of classes, to a post-period, which we defined as the first week of classes. Because the number of devices in each origin county varied over time, we normalized the visitors from each county by the ratio of the average number of devices from July 15, 2019 through September 30, 2019 to the number of devices observed on day t.
We omitted movements that resulted in a decrease in devices on campus and then converted these movement changes into a measure of exposure, defined as the fraction of devices moving to campus from county c relative to the total number of devices moving to campus. We converted our empirical measure of movement into a measure of disease exposure by averaging, with these exposure shares as weights, the log number of new cases per 1,000 over the week ending two weeks before the start of classes. Lastly, because it is possible that campus administrators considered the disease exposure of their students when choosing to reopen for face-to-face instruction, we computed the residual from a regression of our disease exposure measure from fourteen days before campus opened on disease exposure as of July 15th.

Our main analyses use a balanced panel of counties and CBGs. In cases where a county or CBG has multiple colleges represented (as happens for 266 of 3,139 counties and 110 of 216,992 CBGs), we included only the college with the earliest start date and broke ties by choosing the college with the highest total enrollment. We used event-study and difference-in-differences methods to assess the relationship between college reopenings and our two main outcomes: mobility to campuses, and county daily COVID-19 reported cases. Our event study assessed the changes in mobility and in COVID-19 cases relative to when a college reopened, controlling for the geographic level of our analysis (either the CBG, for the mobility models, or the county, for the COVID-19 case models), calendar date, and college (since a college can span more than one CBG or county). We constructed counterfactual time trends for counties without a college, or whose college had not yet opened, using the distribution of opening dates for all other colleges in our data; e.g., since 11.8% opened on August 31st, the counterfactual assumes that 11.8% of non-opened colleges opened on that date.
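The counterfactual assignment just described can be sketched in a few lines of Python. This is one possible reading, not a description of the authors' code: it draws a placebo opening date for each comparison unit at random from the empirical distribution of observed opening dates, whereas the same idea could equally be implemented with fractional weights. The function names, the date strings, and the unit identifiers are all hypothetical.

```python
import random
from collections import Counter


def opening_date_shares(observed_dates):
    """Share of observed colleges opening on each date (e.g. 0.118 for Aug 31)."""
    counts = Counter(observed_dates)
    n = len(observed_dates)
    return {date: count / n for date, count in sorted(counts.items())}


def assign_counterfactual_dates(observed_dates, comparison_units, seed=0):
    """Draw a placebo opening date for each comparison unit.

    Dates are sampled with probability equal to their share among observed
    openings, so the placebo distribution mirrors the real one in expectation.
    """
    rng = random.Random(seed)
    shares = opening_date_shares(observed_dates)
    dates = list(shares)
    weights = [shares[d] for d in dates]
    draws = rng.choices(dates, weights=weights, k=len(comparison_units))
    return dict(zip(comparison_units, draws))


if __name__ == "__main__":
    # Toy data: three colleges opened on Aug 17 and one on Aug 31.
    observed = ["2020-08-17"] * 3 + ["2020-08-31"]
    placebo = assign_counterfactual_dates(observed, ["county_a", "county_b"])
    print(opening_date_shares(observed))
    print(placebo)
```

Fixing the seed makes the placebo dates reproducible across runs, which matters if event-time indicators are rebuilt for each specification.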
Our regression models for the event studies, using $g$ to denote the geographic unit (CBG or county), can be written as:

$$y_{gt} = \sum_{k} \beta_k \, \mathbf{1}[t - T_c = k] + \alpha_g + \delta_t + \varepsilon_{gt}$$

where $y_{gt}$ is the outcome, $T_c$ is the reopening date for campus $c$ (the campus assigned to unit $g$), $\alpha_g$ and $\delta_t$ are geographic-unit and date fixed effects, and $\varepsilon_{gt}$ is an idiosyncratic error term. In some cases, we computed these event studies allowing [...]

We replaced the time-relative-to-opening indicators with a single indicator for the post period to estimate the difference-in-differences model:

$$y_{gt} = \beta \, \mathrm{Post}_{gt} + \alpha_g + \delta_t + \varepsilon_{gt}$$

where $\beta$ is the coefficient of interest. In our difference-in-differences models, we assessed changes relative to the reopening date, controlling for county (or CBG), college, and date effects. We also estimated models that included interactions with an indicator for a campus being primarily in-person and with our student exposure index. We used Driscoll-Kraay standard errors, which account for first-order autocorrelation and clustering of observations at the county level. Results were considered statistically significant if the two-sided p-value was less than 0.05.

As a limitation, our data do not contain measures of within-college mitigation strategies, so we cannot assess their effectiveness. Similarly, we are unable to test what has occurred once colleges change decisions, such as changing modes temporarily or encouraging students to return home (11). One can also make a strong theoretical argument that sending students home risks further spreading the pandemic across the country by providing new infusions of cases in areas that may have relatively low rates of transmission. These concerns have been raised by public health officials, some of whom have publicly opposed closing dormitories, even after a college or university transitioned to online education (12).
Further research on the effects of sending students home is needed to understand the risks and benefits of closing residence halls. Our results are limited by the nature of our data. Our mobility analysis relies on observing cellular GPS signals, and these devices may not always report their location. In addition, it is unlikely that devices correspond one-to-one with people, since college students may have more than one device (a phone and a cell-enabled tablet or watch) that provides data under distinct identifiers. We are also unable to distinguish cases among college students from cases among others in the county community. Finally, our measure of mobility does not account for students who may live in off-campus housing and take classes online.
The College Crisis Initiative - Crisis to Innovation.
Schools Briefing: University Outbreaks and Parental Angst. The New York Times.
How Colleges Became the New Covid Hot Spots. The New York Times.
The College Crisis Initiative - Crisis to Innovation.
Department of Homeland Security. Colleges and Universities Campuses.
The spread of social distancing.
Social distancing responses to COVID-19 emergency declarations strongly differentiated by income.
Interdependence and the cost of uncoordinated responses to COVID-19.
Serial Interval of COVID-19 among Publicly Reported Confirmed Cases.
Modeling Infectious Diseases in Humans and Animals.
Measurability of the epidemic reproduction number in data-driven contact networks.
A New Framework and Software to Estimate Time-Varying Reproduction Numbers During Epidemics.
College student contribution to local COVID-19 spread: Evidence from university spring break timing. Available at SSRN 3606811.
CDC official affirms coronavirus deaths really are coronavirus deaths.

SafeGraph, a data company that aggregates anonymized location data from numerous applications in order to provide insights about physical places, provided data for this project. To enhance privacy, SafeGraph excludes census block group information if fewer than five devices visited an establishment in a month from a given census block group.