key: cord-0764139-f70a2twc authors: Yeo, Sing Chen; Lai, Clin K.Y.; Tan, Jacinda; Gooley, Joshua J. title: A targeted e-learning approach to reduce student mixing during a pandemic date: 2020-06-12 journal: bioRxiv DOI: 10.1101/2020.06.10.135533 sha: 51d977319f0a9a078bc0ded302e750de5ec77dc3 doc_id: 764139 cord_uid: f70a2twc

The COVID-19 pandemic has resulted in widespread closure of schools and universities. These institutions have turned to distance learning to provide educational continuity. Schools now face the challenge of how to reopen safely and resume in-class learning. However, there is little empirical evidence to guide decision-makers on how this can be achieved. Here, we show that selectively deploying e-learning for larger classes is highly effective at decreasing campus-wide opportunities for student-to-student contact, while allowing most in-class learning to continue uninterrupted. We conducted a natural experiment at a large university that implemented a series of e-learning interventions during the COVID-19 outbreak. Analyses of >24 million student connections to the university Wi-Fi network revealed that population size can be manipulated by e-learning in a targeted manner according to class size characteristics. Student mixing showed accelerated growth with population size according to a power law distribution. Therefore, a small e-learning dependent decrease in population size resulted in a large reduction in student clustering behaviour. Our results show that e-learning interventions can decrease potential for disease transmission while minimising disruption to university operations. Universities should consider targeted e-learning a viable strategy for providing educational continuity during early or late stages of a disease outbreak.

year, in which student clustering behaviour on school days changed little over the semester (Fig. 2c-d). The transition to e-learning for classes with >25 students effectively eliminated student clustering.
These findings were further visualised by plotting the data on a map of the university to identify hot spots of clustering activity (Fig. 2e). After e-learning, there was a marked reduction in student clustering in buildings where students usually converged for classes and social activities.

Each e-learning transition was associated with a decrease in the number of unique pairs of students with spatiotemporal overlap (Fig. 3a). Over a typical day, nearly half a million unique pairs of students showed Wi-Fi connection overlap. This number was cut in half after e-learning was implemented for classes with >50 students, and it dropped further as more restrictive e-learning policies were enacted. We then examined the degree of overlap for individual students (i.e., the number of unique pairs formed by a student), focusing on the top 100 students per day with the greatest amount of spatiotemporal overlap with their peers (Fig. 3b). Each e-learning transition was associated with a decrease in the degree of student overlap, with network plots demonstrating weakening of the spatiotemporal student network (Fig. 3c).

Next, we investigated scaling properties of student mixing patterns with the number of students detected on campus. The number of Wi-Fi access points with student clustering increased with student population size according to a power law distribution (Fig. 4a). The relationship was super-linear, whereby growth in the number of student clusters accelerated with larger numbers of students on campus. Similar results were observed for the daily duration of student clustering (Fig. 4b).
These findings were reproducible using data from the prior academic year, demonstrating that scaling properties of student clustering behaviour with population size were generalisable and not related to the COVID-19 pandemic or implementation of e-learning

the power law function was greater), and that e-learning resulted in a marked decrease in the frequency and duration of clustering behaviour for all cluster sizes. In line with these observations, the number of unique pairs of students with spatiotemporal overlap exhibited super-linear scaling with daily population size (Fig. 4c), as did the degree of overlap for 'highly-connected' individual students with their peers (Fig. 4d).

The scaling properties of student mixing patterns have important implications for strategies that seek to minimise person-to-person contact during a disease outbreak. We showed that a small decrease in student population size resulted in a large reduction in student clustering behaviour. Hence, an important goal for reducing risk of disease transmission is to decrease the number of students on campus. This can be achieved in a predictable manner by implementing e-learning for all classes that exceed a given class size. It is more practical to focus on larger classes because they are often conducted in a lecture format that can be converted easily to e-learning (e.g., video lecture), and they are a main driver of student clustering behaviour on campus.

The power law scaling we observed is consistent with prior work demonstrating accelerated growth of human interactions with city population size [14-18]. Epidemiological models indicate that these scaling relationships drive super-linear growth of disease transmission rates as cities get bigger [15,17,18].
Like cities, universities are complex systems composed of different infrastructural and social elements whose hierarchical structures give rise to scaling laws [14,19]. However, we found that the growth rates of student mixing patterns on campus (determined by the exponent of the power law scaling function) were greater than those reported in studies of how human interactions scale with city size. This may be related to differences in student network dynamics and university infrastructural components compared with the cities in which universities reside. Earlier work examined social connectivity patterns derived from mobile phone call records and internet interactions [16-18]. Such methods capture information about social networks but not the potential for physical contact between individuals. By comparison, Wi-Fi connection data provide information on student proximity patterns, which is more relevant for assessing disease transmission risk.

Our study is the first to measure campus-wide spatiotemporal mixing of students using the university's

points. Students are only detected if they have a Wi-Fi enabled device that is actively scanning for a Wi-Fi access point. The location where a student is connected also depends on the proximity and range of the nearest Wi-Fi access point. Despite these limitations, prior studies have shown that Wi-Fi connections are as accurate as dedicated physical sensors (e.g., infrared beam-break or thermal sensors) for estimating student occupancy of university rooms and buildings [20,21]. The daily pattern of student Wi-Fi connections also conforms to expectations for different sites on campus including teaching spaces, libraries, food courts, and residential buildings [20,22-24]. An important limitation of our study is that we did not investigate student mixing with university staff or visitors because we only had Wi-Fi data for students.
In future work, it will be important to evaluate the scaling properties of clustering behaviour while considering all persons on campus. Here, we took advantage of the university's existing Wi-Fi network infrastructure to collect data from students without the need for their active participation. This approach can be adopted for continual monitoring of students' Wi-Fi connection patterns and clustering behaviour, and it can be extended to include all other users of the Wi-Fi network.

In conclusion, e-learning is an intervention that universities can use to provide educational continuity while decreasing student-to-student contact during a disease outbreak. We recommend that e-learning be incorporated into each university's pandemic preparedness plan. First, universities should evaluate their class size distribution to determine the impact of a given e-learning policy on the daily number of students with in-class learning. This information makes it possible to achieve a targeted reduction in student population size because the number of students on campus is dependent on the proportion of students with in-class learning. Second, universities should develop the capacity to count the number of students on campus because population size is a main driver of student clustering behaviour and mixing patterns. This can be achieved using existing Wi-Fi network infrastructure, and the data can be used to derive scaling properties of student

implement e-learning in view of the local and nationwide health response to a disease outbreak. A partial transition to e-learning may be most appropriate before widespread community spread has occurred, or during recovery to normal school operations. A full transition to e-learning may be required near the peak of an epidemic.
Taken together, our study establishes a roadmap that universities can follow for making evidence-based decisions on students' learning and safety during the COVID-19 pandemic and future disease outbreaks.

Methods

Our study was performed using university archived data managed by the NUS Institute for Applied Learning Sciences and Educational Technology (ALSET). The ALSET Data Lake stores and links deidentified student data across different university units for the purpose of conducting educational analytics research [26]. Data tables in the ALSET Data Lake are anonymised by student tokens, which map identifiable data to a hash string using a one-way function that does not allow recovery of the original data. The same student-specific tokens are represented across tables, allowing different types of data to be combined without knowing students' identities. The data types used in our study included basic demographic information (age, sex, ethnicity, citizenship, year of matriculation), class enrolment information, and Wi-Fi connection metadata. Students included in our study provided informed consent to the NUS Student Data Protection Policy, which explains that student data may be used for research and evaluating university policies. Research analyses were approved by the NUS Learning Analytics Committee on Ethics.

The timeline of COVID-19 cases in Singapore was determined using daily situation reports published online by the Ministry of Health (MOH) [27,28]. Nationwide alerts and policies regarding the public health response were taken from press releases available on the MOH website [29]. University policies enacted during the COVID-19 outbreak were compiled from circulars distributed to staff and students, and they are archived by the NUS Office of Safety, Health, and Environment [30].
Student timetables and class size characteristics

Student data were analysed in the second semester of the 2018/19 and 2019/20 school years. This allowed us to compare student behaviour before and during the COVID-19 outbreak over an equivalent period (from January to May). Students' class schedules and class sizes were derived from student enrolment data provided by the NUS Registrar's Office. At NUS, students enrol in course modules, many of which are further divided into different lectures, class groups, tutorials, or laboratory sessions. We analysed data from students taking at least one module that required in-class learning (23,668 and 23,993 students in the 2018/19 and 2019/20 school years, respectively). Data were excluded from students taking only fieldwork or project-based modules with no in-class component (2,722 and 3,240 students). Class size was defined as the number of students who were scheduled to meet in the same place for a given course module. The timing and location of classes were retrieved using the NUSMods application programming interface (https://api.nusmods.com/v2/). Timetable data were sorted for each school day of the semester to identify students with scheduled in-class learning. These data were also used to determine which classes were converted to e-learning based on class size. This allowed us to calculate the daily number of students with in-class learning, e-learning only, or no class.

Connections to the NUS Wi-Fi network are continually monitored by NUS Information Technology to evaluate and improve services provided to the university. The campus-wide wireless network comprises several thousand Wi-Fi access points and deploys different types of routers (Cisco Aironet 1142, 2702I and 2802I) and wireless protocols (802.11n 2.4 GHz, 802.11n 5 GHz, and 802.11ac 5 GHz).
Each time that a person's Wi-Fi enabled device associates with the NUS wireless network, the transmission data are logged. Students' Wi-Fi connection metadata were added daily to the ALSET Data Lake by a data pipeline managed by

access control (MAC) address used to identify the Wi-Fi enabled device of the student (e.g., smartphone, tablet, or laptop), the name and location descriptor of the Wi-Fi access point, and the start and end time of each Wi-Fi connection. The name and location descriptor usually carried information about the room or building in which the Wi-Fi access point was located. By cross-referencing these data with the known timing and location of classes, we categorised Wi-Fi access points into teaching facilities (lecture theatres or classrooms) and non-teaching facilities.

The Wi-Fi dataset comprised more than 24 million student connections to the wireless network over 2 semesters. Students' connection data were binned in 15-min intervals to reduce the size of the data, resulting in 11,328 epochs that spanned 118 days in each semester. In instances where students were connected to more than one Wi-Fi access point in the same epoch, they were assigned to the access point at which their Wi-Fi enabled device received the greatest volume of data (i.e., based on megabytes of data received). The resulting table of Wi-Fi connections and access points was used to derive time and location information for each student over the semester. This enabled us to count the daily number of students who connected to the Wi-Fi network, and the number of students who were connected to the same Wi-Fi access point within a 15-min epoch. The latter was used to examine student clustering behaviour.
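The epoch binning and access point assignment described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (their analyses were run in R), and the record layout, function names, and token strings are hypothetical. It assumes each record has already been reduced to one row per student, access point, and timestamp, with the megabytes received, so connections that span several epochs would need to be split beforehand.

```python
from collections import defaultdict
from datetime import datetime

EPOCH_MINUTES = 15
EPOCHS_PER_DAY = 24 * 60 // EPOCH_MINUTES  # 96 epochs per day


def epoch_index(ts: datetime) -> int:
    """Return the 15-min epoch of the day (0..95) that contains ts."""
    return (ts.hour * 60 + ts.minute) // EPOCH_MINUTES


def assign_access_points(records):
    """Assign each student to one access point per epoch.

    records: iterable of (student_token, access_point, timestamp, megabytes)
    (hypothetical layout). When a student appears at several access points
    in the same epoch, the one with the largest total volume of data
    received is kept; ties keep the first access point encountered.
    """
    volume = defaultdict(float)
    for student, ap, ts, megabytes in records:
        volume[(ts.date(), epoch_index(ts), student, ap)] += megabytes

    best = {}
    for (day, epoch, student, ap), megabytes in volume.items():
        key = (day, epoch, student)
        if key not in best or megabytes > best[key][1]:
            best[key] = (ap, megabytes)
    return {key: ap for key, (ap, _) in best.items()}


def students_per_access_point(assignment):
    """Group students by (day, epoch, access point)."""
    groups = defaultdict(set)
    for (day, epoch, student), ap in assignment.items():
        groups[(day, epoch, ap)].add(student)
    return dict(groups)
```

As a consistency check on the reported figures, 118 days at 96 epochs per day gives 118 × 96 = 11,328 epochs, which matches the number stated in the text.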
We defined a cluster as >25 students connected to the same Wi-Fi access point because of the high potential for student-to-student contact, and because it aligned with the university's e-learning policy prior to suspension of in-class learning (i.e., e-learning for class size >25). The duration of student clustering at each Wi-Fi access point was calculated as the sum of 15-min epochs with >25 students. Data were analysed using R statistical software (version 3.6.3) [31].

Geospatial clustering was visualised by plotting students' data on a map of the NUS campus. The researchers did not have access to the geospatial coordinates for Wi-Fi access points. Therefore, general location information provided in the Wi-Fi metadata (e.g., name of the building or room) was used to determine the building locations manually. Using sources that included the official NUS campus map and venues listed on class timetables, we confirmed the geospatial coordinates for 80% of Wi-Fi access points.

access points within the building. Subsequently, we merged the clustering duration data with the ESRI shapefiles using the "sf" package (version 0.9-0) [32] in the R software environment. The QGIS platform was then used to visualise student clustering for 124 buildings across the NUS campus. Buildings with incomplete Wi-Fi data and student hostels were excluded from the analysis.

The number of unique pairs of students with spatiotemporal overlap in their Wi-Fi connections was determined for 4 representative weeks of the semester (weeks 4, 5, 11, 12). These time intervals captured the transition from normal in-class learning to e-learning for classes with >50 students (week 4 to 5), and the transition from e-learning for classes with >25 students to e-learning for all classes (week 11 to 12). The decision to focus on these temporal windows was driven by practical reasons related to computing resources required to analyse the data.
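The cluster definition (>25 students at one access point in a 15-min epoch), the clustering duration, and the count of unique student pairs can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' R code: the `occupancy` mapping and the function names are hypothetical, and the input is assumed to hold one set of student tokens per epoch and access point.

```python
from collections import defaultdict
from itertools import combinations

CLUSTER_THRESHOLD = 25  # a cluster is MORE than 25 students
EPOCH_MINUTES = 15


def clustering_duration(occupancy, threshold=CLUSTER_THRESHOLD):
    """Return minutes of clustering per access point.

    occupancy: {(epoch, access_point): set of student tokens}
    (hypothetical layout). Each epoch in which the access point has more
    than `threshold` students contributes 15 minutes.
    """
    minutes = defaultdict(int)
    for (_epoch, access_point), students in occupancy.items():
        if len(students) > threshold:
            minutes[access_point] += EPOCH_MINUTES
    return dict(minutes)


def unique_pairs(occupancy):
    """Return the set of unordered student pairs that shared an access
    point in at least one epoch. Sorting the tokens makes (a, b) and
    (b, a) the same pair, so each pair is counted once."""
    pairs = set()
    for students in occupancy.values():
        pairs.update(combinations(sorted(students), 2))
    return pairs
```

Enumerating pairs this way grows quadratically with the number of students at an access point (26 students already yield 325 pairs), which gives a sense of why a full-semester pair analysis can be computationally demanding.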
For each student, the degree of Wi-Fi connection overlap was determined by counting the number of unique students with whom the student shared a Wi-Fi connection. Our analyses focused on the top 100 students per day with the greatest degree of overlap with their peers because we expected this group would best illustrate the impact of e-learning on individual student networks. This student group size was also practical for visualising effects of e-learning on student network structure, which was performed using the "igraph" package [33] (version 1.2.5) with the force-directed layout algorithm (layout_with_fr) in the R software environment.

Student clustering behaviour on school days was modelled as a function of daily student population size using a power law scaling equation: $Y = Y_0 N^{\beta}$. In this equation, $Y$ is the measure of student mixing (e.g., number of Wi-Fi access points with a student cluster, duration of student clustering, or pairs of students with Wi-Fi connection overlap); $Y_0$ is a constant; $N$ is the daily population size estimated by the number of students who connected to the NUS Wi-Fi network; and the exponent $\beta$ reflects the underlying dynamics (e.g., hierarchical structure, social networks, and infrastructure) of the university ecosystem. We considered other mathematical functions, including exponential and hyperbolic equations, but they did not fit the data as well. Variables that show power law scaling are linearly related when each variable is logarithmically transformed (i.e., $\ln Y = \ln Y_0 + \beta \ln N$). We therefore took the natural logarithm of each pair of variables (i.e., the student mixing variable and daily population size) and performed linear regression to confirm the expected linear relationship. The coefficient of determination ($R^2$ value) was used to evaluate goodness-of-fit for the regression model.
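The log-log regression step can be illustrated with a short Python sketch. It writes the power law as Y = Y0 · N^β, where the symbol names Y, Y0, and N are chosen here for illustration, and fits it by ordinary least squares on the natural logarithms of both variables. This is not the authors' Sigmaplot or R workflow, and the function names are hypothetical. It assumes all values are strictly positive, so days with zero clusters would have to be excluded or handled separately.

```python
import math


def fit_power_law(population, mixing):
    """Fit Y = Y0 * N**beta by least squares on (ln N, ln Y).

    Returns (y0, beta, r_squared). Requires positive inputs and some
    variation in both variables.
    """
    xs = [math.log(n) for n in population]
    ys = [math.log(y) for y in mixing]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    beta = sxy / sxx                      # slope of the log-log line
    intercept = mean_y - beta * mean_x    # equals ln(Y0)
    r_squared = (sxy * sxy) / (sxx * syy)
    return math.exp(intercept), beta, r_squared


def relative_change(fraction_remaining, beta):
    """Ratio of Y after vs before when N is scaled by fraction_remaining."""
    return fraction_remaining ** beta
```

For a purely illustrative exponent of β = 1.5 (not a value reported in the study), halving the population gives `relative_change(0.5, 1.5)`, roughly 0.35, so the mixing measure falls by about 65% for a 50% reduction in population. This is the sense in which super-linear scaling turns a modest decrease in population size into a larger decrease in mixing.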
Modelling and regression analyses were performed using Sigmaplot software (Version 14; Systat Software, Inc) and R statistical software.

The data that support the findings of this study will be made available from the corresponding author upon reasonable request. Requests will be handled in compliance with data sharing and data management policies of the National University of Singapore.

Class sizes were categorised as small (green; ≤25 students), medium (blue; >25 to ≤50 students), or large (red; >50 students). (b) The combined student enrolment in medium and large classes was greater than enrolment in small classes. (c) The cumulative distribution plot shows the number of students whose smallest class of the day exceeded a given class size threshold. The black trace with shaded grey lines shows the daily mean and range. The red dropline shows that the transition to e-learning for classes with >50 students resulted in about 5,000 students per day who had classes delivered only by e-learning. The blue dropline shows that the transition to e-learning for classes with >25 students resulted in about 9,000 students per day who had classes delivered only by e-learning. When all classes were shifted to e-learning there were about 18,000 students per day taking their classes online.

Extended Data Fig. 3. The daily number of students detected on campus was predicted by the number of students

campus. Data are shown for the second semester of the 2019/20 school year at the National University of Singapore (NUS) during the COVID-19 outbreak. Different definitions of a student cluster were tested ranging from >5 to >50 students detected at the same Wi-Fi access point.
For all cluster sizes, student clustering behaviour showed accelerated growth with increasing number of students detected on campus, including (a) the number of Wi-Fi locations with a student cluster, and (b) the duration of student clustering at these locations. Each dataset was fitted with a power law function, with β representing the scaling exponent. Insets show results for linear regression after taking the natural logarithm of each variable. Circle colours correspond to different parts of the semester with normal in-class learning (green), e-learning for classes with >50 students (red), e-learning for classes with >25 students (blue), and e-learning for all classes (orange). Open circles indicate non-class days.

The cumulative duration of student clustering (>25 students connected to the same Wi-Fi access point) is shown for (a) the second semester of the 2018/19 school year, and (b) the second semester of the 2019/20 school year in which the COVID-19 outbreak occurred. Data are plotted for Wi-Fi access points with at least one student cluster detected during the semester (785 out of 6,573 locations

The authors declare no competing interests.