key: cord-0683313-usqgyxuz
authors: Badruddoza, S.; Amin, M. D.
title: Causal Impacts of Teaching Modality on U.S. COVID-19 Spread in Fall 2020 Semester
date: 2020-11-03
journal: nan
DOI: 10.1101/2020.10.28.20221986
sha: 32bdf356b44afbb3e638d854eb55524b37ef913a
doc_id: 683313
cord_uid: usqgyxuz

We study the impact of college openings and teaching modality on county-level COVID-19 cases and deaths using information on 745 U.S. colleges. We group the colleges by their teaching mode in the fall 2020 semester (in-person, online, or hybrid) and employ a logistic model and a gradient boosting algorithm to estimate propensity scores for the three groups, which adjust for pre-treatment imbalances in college- and county-level covariates. We find that greater enrollment, individual mask policies, and fewer Republican votes in the county are major predictors of adopting online or hybrid modes. Treatment effects provide evidence that college reopenings, especially with in-person teaching elements, increase daily new cases and deaths.

The COVID-19 pandemic and its containment measures have affected human health and economic activity to an unprecedented degree (e.g., McKibbin and Fernando, 2020; Atkeson, 2020). Countries across the world have implemented partial or full business and school closures to mitigate the spread of infection (Panovska-Griffiths et al., 2020; Di Domenico et al., 2020). Many U.S. colleges temporarily closed or switched to online instruction in the spring semester, and over six out of ten colleges reopened in the fall semester with an in-person plan or a combination of in-person and online teaching (Gallagher and Palmer, 2020; College Crisis Initiative, 2020). College students mainly fall in the 18-to-29 age cohort, which has a lower death rate (0.4%) from COVID-19 (Wrighton and Lawrence, 2020) but a greater chance of socialization than the other age cohorts.
Although colleges are following best practices (including symptom screening, contact tracing, cleaning and disinfecting, and mandatory mask policies), there is a chance that reopening colleges may spread the virus among the faculty and staff serving students and in the region around the college (Hubler and Hartocollis, 2020).

Figures (1) and (2) show daily new COVID-19 cases and deaths, respectively, grouped by college teaching modality. Both series are shown as percentages of their values on August 10th, 2020, to eliminate initial differences and make the figures comparable across groups. Median values are plotted to remove the influence of extreme observations. Figure 1 suggests that colleges teaching in person are located in counties with rapidly increasing COVID-19 cases. Small bumps in cases right before the start dates indicate reopening-induced testing. In Figure 2, new deaths in counties with colleges teaching in person grow faster than in the hybrid and online groups. Thus, the figures suggest that both COVID-19 cases and deaths grow faster in counties where colleges started the fall semester in in-person mode.

Rosenbaum and Rubin (1983, 1984) discuss propensity score matching (PSM) to adjust for differences in pre-treatment variables. A propensity score is the conditional probability of receiving the treatment given the pre-treatment variables. In our context, the probabilities of selecting a teaching mode can be obtained from the following specification:

$$\{\text{Online}_i,\ \text{Hybrid}_i,\ \text{In-person}_i\} = f(\text{College features},\ \text{Regional features}) \tag{3}$$

The probabilities of treatment generated from equation (3) are used to create weights that adjust the effects of treatments in estimating equation (2).
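The weighting step can be illustrated with a minimal sketch, assuming plain inverse-probability weighting: each college is weighted by the reciprocal of the predicted probability of the mode it actually adopted. The function name, the mode labels, and the probabilities below are hypothetical and chosen only for illustration; the software used for the estimation may normalize or otherwise adjust the weights differently.

```python
# Sketch of inverse-probability weights for three teaching modes.
# All names and numbers here are illustrative assumptions, not values from the paper.

MODES = ("online", "hybrid", "in_person")


def inverse_probability_weight(probs, chosen):
    """Return 1 / P(chosen mode | covariates) for a single college.

    probs  -- dict mapping each mode in MODES to its predicted probability
    chosen -- the mode the college actually adopted
    """
    total = sum(probs[mode] for mode in MODES)
    if abs(total - 1.0) > 1e-8:
        raise ValueError("predicted probabilities must sum to 1")
    p = probs[chosen]
    if p <= 0.0:
        raise ValueError("the adopted mode must have positive probability")
    return 1.0 / p


# Made-up propensity scores for one college that is predicted to favor online teaching.
example = {"online": 0.6, "hybrid": 0.3, "in_person": 0.1}

# Weight the college would receive under each possible adopted mode.
weights = {mode: inverse_probability_weight(example, mode) for mode in MODES}
```

Because the reciprocal grows quickly as a probability approaches zero, colleges that adopt a mode the model considers unlikely for them carry the most weight in the weighted sample.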
For example, if a college has a high probability of choosing online, then hybrid, then in-person, greater weights are assigned to in-person, then hybrid, followed by online during the estimation (McCaffrey et al., 2013).

The key difference between equation (3) and the Rosenbaum-Rubin matching model is that their model mainly deals with binary treatments, whereas we have three treatments. Imbens (2000) extends the concept to multiple treatments and suggests a multinomial or nested logit for multivalued discrete response models where the ordering of the response does not matter (see Lopez and Gutman (2017) for a review). Recent works have also examined the application of machine learning approaches such as bagging or boosting, random forests, other tree-based methods, and neural networks (McCaffrey, Ridgeway, and Morral, 2004; Setoguchi et al., 2008; Lee, Lessler and Stuart, 2010; McCaffrey et al., 2013).

We estimate a multinomial logit model for PSM (equation 3) using cross-sectional data on college and county features. The predicted probabilities of adopting a teaching mode for each college are then inverted and used as weights in a panel fixed effects (FE) regression following equation (2). The sources of variation are colleges for the cross-section, and college times days for the panel. For robustness, we repeat the task with a gradient boosting model instead of the multinomial logit. A logit model provides interpretable coefficient estimates, whereas gradient boosting offers greater predictive accuracy (Ridgeway et al., 2014; Burgette, Griffin and McCaffrey, 2017). McCaffrey, Ridgeway, and Morral (2004) and McCaffrey et al. (2013, 2015) discuss the technical details of the two approaches. In simple words, a gradient booster grows decision trees from randomly chosen college and county characteristics to predict the treatment group, and then repeats the process to explain the residuals and re-adjust the importance of predictors to achieve the optimal prediction. The algorithm penalizes complex and large decision trees in order to avoid overfitting (Chen and Guestrin, 2016). The importance factors on predictors from the gradient booster are not as interpretable as coefficients from a multinomial logit, so we use the gradient booster only to predict the propensity scores and do not discuss the importance factors.

medRxiv preprint doi: https://doi.org/10.1101/2020.10.28.20221986; this version posted November 3, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.

We collected characteristics of four-year and two-year undergraduate degree-granting colleges and universities from the National Center for Education Statistics (2020). Most of the colleges available in the data are public or private non-profit. Data on teaching modality in the fall 2020 semester come from the College Crisis Initiative (2020). The College Crisis Initiative reports fall reopening plans, which we categorize into three types: (1) in-person, (2) online, and (3) hybrid. Colleges in the "in-person" group conduct classes in person with certain exceptions for online delivery and open residence halls; colleges in the "online" group primarily conduct classes online with some exceptions for lab components and may have some students on campus; and colleges in the "hybrid" group either switch the teaching mode on a rolling basis or offer courses with both in-person and online access. We manually collected the official start dates of the fall 2020 semester from the respective college websites.
County characteristics were extracted from the American Community Survey (2020). Merging all these data eventually gives 745 colleges in the contiguous U.S. states. Alaska and Hawaii were omitted to avoid large heterogeneity in regional features.

We obtain daily new COVID-19 cases and deaths at the county level from the New York Times (2020). The impact of teaching modality is assessed on county-level outcomes (and not at the college level) because students may spread the virus to non-students and immunocompromised people in the community, which is a pressing policy concern. We also merged the data with individual mask policies at the county and state levels from HealthData.gov (2020). Finally, county-level shares of Republican votes in the 2016 presidential election are included from McGovern et al. (2017) to take residents' perception of COVID-19 risk into account (e.g., Tyson, 2020).

... (2) and (3) ... Table (2) and table (4) indicate considerable variation across treatment groups. The mean-difference t-tests are significant, especially when variables are compared between the online and in-person groups. The difference justifies the use of PSM in our analysis. In fact, comparing propensity scores across treatment groups is equivalent to comparing the pre-treatment covariates, which is done in the next section.

4. Results

To calculate the probability of choosing each teaching mode and re-adjust the weights, we use a multinomial logistic regression. The results in table (5) show that colleges that chose the online teaching mode over in-person have greater enrollment and endowment per student and are public in nature. Online-teaching colleges are located in counties that have a larger population, an ongoing mask ordinance, and fewer Republican votes. From the hybrid and in-person comparison in table (5), column (2), hybrid-adopting colleges have more enrollment, a lower cost of attendance, and fewer students per faculty, and are located in counties with fewer Republican votes. To summarize, greater enrollment, fewer Republican supporters in the county, and a prevailing individual mask ordinance are common predictors of adopting an online or hybrid teaching modality over in-person.

We use the multinomial logit model to calculate the propensity scores for each college. The scores need to be sufficiently similar across groups for causal estimation. Figure (3) compares the distributions of the propensity scores using box plots. The distributions overlap, an essential feature for causal estimation (McCaffrey et al., 2013). We invert the scores and use them as weights to adjust for differences in college and county characteristics across treatment groups. The maximum Kolmogorov-Smirnov (KS) statistic of distributional equality is 0.33 for the unweighted sample and 0.12 for the weighted sample, with p-values of zero and 0.059, respectively. Therefore, at the 5 percent level, we reject the null hypothesis of distributional equality for the unweighted sample but fail to reject the null for the weighted sample. That is, we use PSM to create a weighted sample such that any teaching mode is equally likely to be chosen by a college. We now utilize the weighted sample to estimate the causal effects of teaching modality on COVID-19 cases and deaths.

Tables (6) and (7) ...

We find a mixed result on the effects of online and hybrid modes: neither is consistently greater than the other. One explanation is that students might choose online when offered a combination (hybrid) of ...

HealthData.gov. 2020. "COVID-19 state and county policy orders." https://healthdata.gov/dataset/covid-19-state-and-county-policy-orders, Accessed: 2020-09-30.

"Toolkit for weighting and analysis of nonequivalent groups: a tutorial for the R TWANG ..."
Source: Deaths are obtained from the New York Times (2020), and teaching modality from the College Crisis Initiative (2020).

American Community Survey. 2020