key: cord-0254450-ee0lghe3 authors: Susswein, Z.; Rest, E. C.; Bansal, S. title: Disentangling the rhythms of human activity in the built environment for airborne transmission risk date: 2022-04-16 journal: nan DOI: 10.1101/2022.04.07.22273578 sha: 7346f8ad49bc90388c44838edc3867826f7e5b94 doc_id: 254450 cord_uid: ee0lghe3 Since the outset of the COVID-19 pandemic, substantial public attention has focused on the role of seasonality in suppressing transmission. Misconceptions have relied on seasonal mediation of respiratory diseases driven solely by environmental variables. However, seasonality is expected to be driven by host social behavior, particularly in highly susceptible populations. A key gap in understanding the role of social behavior in respiratory disease seasonality is our incomplete understanding of the seasonality of indoor human activity. We leverage a novel data stream on human mobility to characterize activity in indoor versus outdoor environments in the United States. We use a mobile app-based location dataset encompassing over 5 million locations nationally. We classify locations as primarily indoor (e.g. stores, offices) or outdoor (e.g. playgrounds, farmers markets), disentangling location-specific visitor counts into indoor and outdoor, to arrive at a fine-scale measure of indoor to outdoor human activity across time and space. We find the proportion of indoor to outdoor activity during a baseline year is seasonal, peaking in winter months. The measure displays a latitudinal gradient with stronger seasonality at northern latitudes and an additional summer peak in southern latitudes. We statistically fit this baseline indoor-outdoor activity measure to inform incorporation of this complex empirical pattern into infectious disease dynamic models. 
However, we find that the disruption of the COVID-19 pandemic caused these patterns to shift significantly from baseline, and the empirical patterns are necessary to predict spatio-temporal heterogeneity in disease dynamics. Our work empirically characterizes, for the first time, the seasonality of human social behavior at a large scale with high spatio-temporal resolution, and provides a parsimonious parameterization of seasonal behavior that can be included in infectious disease dynamics models. We provide critical evidence and methods necessary to inform the public health of seasonal and pandemic respiratory pathogens and improve our understanding of the relationship between the physical environment and infection risk in the context of global ecological change.

The seasonality of infectious diseases is a widespread and familiar phenomenon. Although a number of potential mechanisms driving seasonality in directly transmitted infectious disease have been proposed, the causal process behind seasonality is still largely an open question [1, 2, 3]. In the case of the influenza virus, seasonal changes in humidity have been identified as a potential mechanism, with drier winter months enhancing transmission [4, 5, 6]; similar patterns have been observed for respiratory syncytial virus and hand, foot and mouth disease [7, 8]. However, humidity is but one of many mechanisms contributing to seasonality in infectious disease transmission. Seasonal changes in temperature, human mixing patterns, and the immune landscape, among other factors, are thought to contribute to transmission dynamics [9, 10, 11, 12, 2]. The relative importance of these disparate mechanisms varies across directly-transmitted pathogens and is still largely unexplained [1, 3]. The influence of seasonal host behavior on respiratory disease seasonality remains particularly understudied [13, 11] except for a few notable examples [14, 15, 16].
For respiratory pathogens spread via the aerosol transmission route, in particular, seasonality may be mediated by multiple behaviorally-driven mechanisms. Aerosol transmission, a significant mode of transmission for a number of respiratory pathogens including tuberculosis, measles and influenza [17], has become increasingly acknowledged during the COVID-19 pandemic [18, 19, 20, 21, 22]. The role of aerosols in respiratory disease transmission allows for transmission outside of the traditional 6 ft. radius and 5 minute duration for the droplet mode, and implicates human mixing in indoor locations with poor ventilation as high-risk for transmission, regardless of the intensity of the social contact. While more is known about the spatio-temporal variation in the indoor environment that is experienced when people spend time indoors (e.g. [23]) as well as the impact this has on airborne pathogen transmission (e.g. [24, 25]), limited information is available on rates of indoor activity. In the US, most studies quantifying indoor and outdoor time are conducted in the context of air pollutants, suffer from small study size, lack spatio-temporal resolution, and are outdated. The most cited estimates originate from the 1980s-90s and estimate that Americans spend upwards of 90% of their time indoors [26]; more recent data agree with these estimates [27, 28]. While it is well-understood that seasonal differences and latitude likely affect time spent indoors, little is known of the spatio-temporal variation in indoor activity beyond this one monolithic estimate, vastly limiting our ability to comprehensively characterize the seasonality of airborne disease exposure risk.

Because our understanding of the drivers of seasonality for respiratory diseases has been limited, the modeling of seasonally-varying infectious disease dynamics has traditionally been done using environmental data-driven or phenomenological approaches.
Environmental data-driven approaches incorporate seasonality into epidemiological models through environmental correlates of seasonality, such as solar exposure or outdoor temperature [12, 7, 29]. This approach to seasonal dynamics controls for inter-seasonal variation in transmission dynamics and measures the strength of correlations between proposed metrics and seasonal variation in force of infection, although the observed relationship is rarely causally relevant for respiratory disease transmission. In contrast, phenomenological models such as seasonal forcing approaches modulate transmissibility over time without specifying a particular mechanism for this modulation [30, 2]. By applying well-understood functions (such as sine functions), seasonal forcing allows for flexible specification and quantification of dynamics, such as periodicity or oscillation damping, and indirectly captures seasonal variation in non-environmental factors such as school mixing. A significant remaining gap in seasonal infectious disease modeling is thus the ability to empirically incorporate spatio-temporal variation in behavioral mechanisms driving seasonality of disease exposure and transmission.

Despite the role of the indoor built environment in exposure via the airborne transmission route, seasonal variation in indoor human mixing has not yet been systematically characterized, nor integrated into mathematical models of seasonal respiratory pathogens. To address this gap, we construct a novel metric quantifying the relative propensity for human mixing to be indoor at a fine spatio-temporal scale across the United States. We derive this metric using anonymized mobile GPS panel data of visits of over 45 million mobile devices to approximately 5 million public locations across the United States.
We find a systematic latitudinal gradient, with indoor activity patterns in the northern and southern United States following distinct temporal trends at baseline, but find that the COVID-19 pandemic disrupted this structure. Lastly, we fit simple parametric models to incorporate these seasonal activity dynamics into models of infectious disease transmission when indoor activity is expected to be at baseline. Our work provides the evidence and methods necessary to inform the epidemiology of seasonal and pandemic respiratory pathogens and improve our understanding of the relationship between the physical environment and infection risk in the light of global change.

Based on anonymized location data from mobile devices, we construct a novel metric that measures the relative propensity for human activity to be indoors at a fine geographic (US county) and temporal (weekly) scale. We characterize the systematic spatio-temporal structure in this metric of indoor activity seasonality with a time series clustering analysis. We also characterize the shift that occurred in the baseline patterns of indoor activity seasonality during the COVID-19 pandemic. We note that this seasonal variation in the propensity of human activity to be indoors is different from the variation in overall rates of contact between individuals, which does not vary seasonally (see Supplementary S1). Lastly, we fit non-linear models to the indoor activity metric at baseline, comparing the ability of a simple model to capture seasonal variation in transmission risk.

Figure 1: (A) [...] Cook County (in the northern US) has high indoor activity in the winter months, and a deep trough in indoor activity in the summer months. Maricopa County (in the southern US) sees moderate indoor activity in the winter and an additional peak in indoor activity during the summer. We apply a 3-week rolling window mean for visualization purposes. (B) A heatmap of the indoor activity seasonality metric for all US counties by week for the calendar year 2018. Counties are grouped by state and are ordered alphabetically by state. We see significant spatio-temporal heterogeneity with distinct trends in the summer versus winter seasons.

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. This version posted April 16, 2022.

The indoor activity seasonality metric, σ, captures the relative frequency of visits to indoor versus outdoor locations within an area. The components of σ capture the degree to which indoor and outdoor locations are occupied; when σ = 1, a given county is at its specific average propensity for indoor activity relative to outdoor. When σ < 1, activity within the county is more frequently outdoor and less frequently indoor than average, while σ > 1 indicates that activity is more frequently indoor and less frequently outdoor than average. Thus, a σ of 1.2 indicates that the county's activity is 20% more indoor than average and a σ of 0.80 indicates that the county's activity is 20% less indoor than average (additional details in methods). Through this metric, we measure the relative propensity for human activity to be indoors for every community (i.e. US county) across time, finding systematic heterogeneity between counties (Figure 1A).
The representative examples of Cook County, Illinois (home of the city of Chicago in the midwestern US) and Maricopa County, Arizona (the home of the city of Phoenix in the southwestern US) highlight systematic spatial and temporal heterogeneity in indoor activity dynamics. In Cook County, indoor activity varies over time, at its peak in the winter, with the relative odds of an indoor contact well above average. During the summer, σ in Cook County reaches its trough, with activity systematically more outdoor on average. On the other hand, the variation of σ across time in Maricopa County is characterized by a smaller winter peak in indoor activity, and an additional peak in the summer (i.e. July and August); this peak occurs concurrently with the trough in Cook County. Unlike in Cook County, σ in Maricopa County is lowest in the spring and fall. These representative counties illustrate the systematic within-county variation in indoor activity over time, as well as the between-county variation in temporal trends as represented in Figure 1B for all US communities. To identify systematic geographic structure, we cluster the heterogeneous time series of county-level, weekly indoor activity. We find three geographic clusters corresponding to groups of locations that experience similar indoor contact dynamics ( Figure 2 ). These clusters primarily split the country into two clusters: a northern cluster and southern cluster. Among the communities in the northern cluster, activity is more commonly outdoor over the summer months, trending toward indoor during fall, with a peak in the winter months, as observed in Cook County. Comparatively, the southern cluster has a larger winter peak (i.e. between December and February) and a smaller summer peak (i.e. between July and August); most summer peaks are less extreme than that of Maricopa County (shown). We hypothesize that these two clusters are consistent with climate zones. 
We compare these clusters to climate zones defined for the construction of the indoor built environment and find that there is substantial consistency between the two (Supplementary Figure S3). The third cluster differs substantially: it is geographically discontiguous and its two annual peaks occur during the spring (close to April) and fall (closer to November) seasons. Thus, the counties in this cluster have outdoor activity more frequently than average during both the winter and the summer. The counties in this cluster correspond to locations that are hubs for winter tourism, which we speculate is driving their unique dynamics (Supplementary Figure S4).

Figure 2: Using a time series clustering approach on the indoor activity time series for each US county, we identify groups of counties that experience similar trends in indoor activity. Locations in the northern cluster (light blue) follow a single peak pattern with the highest indoor activity occurring every winter. Locations in the southern cluster (dark blue) experience two peaks in indoor activity each year, one in the winter and a second, smaller one in the summer. The third cluster also experiences two peaks not matching environmental conditions, but potentially corresponding to winter tourism areas. We apply a 3-week rolling window mean to the time series for visualization purposes.

In addition to the description of indoor activity seasonality at baseline, we examine the impact of a large-scale disruption, the COVID-19 pandemic, on these patterns. We compare indoor activity seasonality during the COVID-19 pandemic in 2020 to the baseline patterns of 2018 and 2019. We find that the temporal trends in indoor activity are more heterogeneous in 2020 than those of previous years (see Supplementary Figure S5 for a characterization of the variability). We focus on four case studies to highlight the varying impacts on indoor activity of the pandemic disruption (Figure 3). We find that in most locations indoor activity deviated from pre-pandemic trends. However, we highlight that in a subset of counties, such as Maricopa County (home of the city of Phoenix, AZ), the time series were largely unperturbed relative to prior years. We also find that in early 2020, when there was substantial social distancing in the United States (e.g. school closures, remote work), activity was more likely to be outdoor than in prior years, independent of changes in overall activity levels. With our case studies, we highlight that social distancing policies can have different impacts on airborne exposure risk in different locations: while some locations, such as Travis County (home of Austin, Texas), shifted activities outdoors during this period, reducing their overall risk further, other locations, such as Charleston County (home of Charleston, South Carolina), increased indoor activity above the seasonal average during this period, potentially diminishing the effect of reducing overall mobility.

Figure 3: [...] 2020 in four case study locations. We find that most locations saw a shift in their indoor activity patterns, while others (such as Maricopa County) did not. We also find that while overall activity was diminished uniformly during the Spring of 2020, indoor activity decreased in some locations (Travis County, Texas and Baltimore County, Maryland) and increased in others (Charleston County, South Carolina). We apply a 3-week rolling window mean to the time series for visualization purposes.

Figure 4: (A) [...] (... to seasonal forcing model components) fit the northern cluster better than the southern cluster, with a markedly poorer fit for the southern cluster's second, summer peak. (B) Regional seasonal forcing models display variation in patterns of disease incidence omitted by a non-seasonal model, but even region-level seasonal forcing does not fully capture within-cluster county-level variation.

We use this finely grained spatio-temporal information on indoor activity to incorporate airborne exposure risk seasonality into compartmental models of disease dynamics using common, coarser seasonal forcing approaches. To investigate the impact of heterogeneity in σ on estimation of seasonal forcing for infectious disease models, we fit a sinusoidal model to the time series of indoor activity for each of the primary clusters (Figure 4A). We find that the parameters of seasonality vary across clusters: the amplitude is higher and phase is lower in the northern cluster compared to the southern cluster, indicating a difference in the variability of indoor and outdoor activity seasonality in each cluster (Supplementary Figure S7). The sinusoidal model was a poorer fit for the southern cluster, particularly around the second peak of indoor activity during the summer months.
These differences in best fit indicate that sinusoidal models may have an overly restrictive functional form, limiting the accuracy of the approximation, and may underestimate the impacts of seasonality on transmission, obscuring systemic differences between regions. Furthermore, differences in seasonal activity of the observed magnitude can have important implications for disease modeling; applying region-level and county-level forcing to a simple disease model alters incidence patterns ( Figure 4B ). Although region-level seasonality changes incidence timing and peak size relative to a non-seasonal model, it does not fully capture the changes produced by county-level seasonality. These differences indicate that while coarser geographic approximations of seasonality can be appropriate, these approximations can also oversimplify, reducing the accuracy of disease models. Additionally, while simple models of baseline indoor activity can capture seasonality in exposure risk, disruptions such as pandemics can alter this baseline structure and increase heterogeneity. The seasonality of influenza, SARS-CoV-2 and other respiratory pathogens depends not only on environmental variables, but also the social behavior of hosts. In settings with little prior immunity -like the COVID pandemic in early 2020 -host social behavior (generating contacts during which transmission may occur) primarily drives heterogeneity in disease dynamics, and seasonality is dwarfed by susceptibility [31] . In settings with higher rates of immunity, contact remains critically important and seasonal changes in contacts (both direct and indirect) can contribute to movement of R t above and below 1 -providing noticeable changes in incidence. 
Although environmental variables play a role in the seasonality of respiratory pathogens, the role of host social behavior in pathogen seasonality is poorly understood, driven by a poor understanding of indoor versus outdoor social interactions and interactions between behavior and the environment. In this study, we propose a fine-grain measure of indoor activity seasonality across time and space. We determine that indoor activity seasonality displays significant spatio-temporal heterogeneity and that this variability can be decomposed into two geographic groups representing distinct temporal dynamics in indoor activity. We also find that while indoor activity seasonality may be highly predictable under baseline conditions, disruptions such as the COVID-19 pandemic can alter these patterns. Finally, we provide an illustration of how our findings can be incorporated into classical infectious disease models using parsimonious models of exposure seasonality. The indoor activity seasonality that we quantify may reflect heterogeneity in transmission risk via a number of mechanisms including those affecting host contact, susceptibility, or transmissibility. Increased indoor activity may indicate longer-duration airborne contact (e.g. co-location without direct interaction) between susceptible and infected individuals, elevating respiratory transmission risk. Increased indoor density may also suggest increased droplet contact (e.g. a conversation in close proximity), under homogeneous mixing. Additionally, indoor activity may suggest increased susceptibility as poor ventilation, increased pollutants, reduced solar exposure, and low humidity of the indoor environment has been shown to weaken immune response [32] . 
Finally, increased indoor activity may indicate an increase in transmissibility due to higher exposure, as low humidity caused by HVAC in indoor environments has been shown to increase viral survival and HVAC re-circulation has been shown to increase viral dispersion [33, 34]. While our new measure does not disentangle these component mechanisms, it represents an integrated seasonality in exposure risk due to all of these factors, and can help lead us to a more complete understanding of the heterogeneity in disease dynamics and outcomes.

We find that spatio-temporal heterogeneity in the indoor activity metric can be classified into two large geographically-contiguous groups in the northern and southern United States. These groups closely correspond to built environment climate zones, potentially explaining this systematic variability. We note, however, that while these clusters overlap with climate classifications, this correspondence does not suggest that environmental variables such as temperature and humidity should be used to represent behavioral heterogeneity. Climatic factors within these climate zones may be related to, but not necessarily correlated with, the seasonality of human mixing within these zones. Additionally, even in the case that environmental factor variability drives behavioral variability, it would be critical to capture the effect of behavior on disease directly so as to not obscure any direct effects of climatic factors on disease.

We illustrate how to incorporate seasonality in exposure risk into future models of disease dynamics using a simple phenomenological model. We use this traditional model of infectious disease dynamics to evaluate the implications of spatial coarseness of seasonal forcing.
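As a rough illustration of what seasonally forced dynamics of this kind can look like, the sketch below multiplies the transmission rate of a discrete-time SIR model by a seasonal term. The model structure, the weekly time step, and every parameter value (`beta0`, `gamma`, the amplitude and the phase) are assumptions chosen for illustration; they are not the model or the fitted values used in this study.

```python
from math import sin, pi

def sinusoidal_forcing(week, amplitude, phase, period=52):
    """Seasonal multiplier of the form 1 + amplitude * sin(omega * week + phase)."""
    return 1 + amplitude * sin(2 * pi * week / period + phase)

def simulate_sir(forcing, beta0=1.0, gamma=0.5, i0=1e-4, weeks=104):
    """Discrete-time SIR on population fractions with weekly steps.

    forcing: function mapping a week index to a multiplier on beta0
    (hypothetical stand-in for an indoor activity seasonality term).
    Returns the weekly incidence (new infections as a population fraction).
    """
    s, i = 1 - i0, i0
    incidence = []
    for week in range(weeks):
        # New infections cannot exceed the remaining susceptible fraction.
        new_infections = min(s, beta0 * forcing(week) * s * i)
        recoveries = gamma * i
        s -= new_infections
        i += new_infections - recoveries
        incidence.append(new_infections)
    return incidence

# Non-seasonal baseline versus an illustrative winter-peaking forcing term.
flat = simulate_sir(lambda week: 1.0)
forced = simulate_sir(lambda week: sinusoidal_forcing(week, 0.3, pi / 2))
```

Replacing the `forcing` argument with a lookup into an empirical weekly series for a county would give the empirically driven counterpart of the same sketch.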
Our results suggest that the substantial local heterogeneity in the dynamics of indoor activity across time and space could be large enough to alter seasonality in infectious disease dynamics. We suggest that researchers should carefully consider the spatial scale on which they model seasonality. We additionally highlight that the use of simple or complex functional forms of seasonality requires statistical fits to baseline data, and in the case of disruptions, these fitted models may no longer be appropriate. As we show, patterns of human mobility changed substantially during the COVID-19 pandemic, potentially contributing to changes in infectious disease seasonality. Recent work during the COVID-19 pandemic demonstrates the impact of reduced occupancy in indoor locations and increasing outdoor activity on the likelihood of disease transmission. In particular, behavioral interventions or nudges that reduce occupancy are more impactful than reducing overall mobility as they reduce visitor density and the likelihood of density-dependent airborne transmission. Similarly, the availability of outdoor areas in urban settings, such as public parks, has been demonstrated to reduce case rates when population mobility becomes less restricted [35] . Our results suggest that such public health strategies should be implemented in a targeted manner, informed by real-time data and with clear communication of the goals. We found notable changes occurred in indoor activity seasonality at the start of the COVID-19 pandemic, despite relatively consistent patterns during the spring season in prior years. Designing a behavioral strategy and measuring its effectiveness without real-time data could thus be misleading. Our finding of two distinct geographic clusters of indoor activity suggests the need for geographical targeting of strategies to reduce indoor transmission risk. 
While northern latitudes might benefit from decreased indoor occupancy and increased outdoor activity in Northern Hemisphere winters, southern latitudes should be targeted for such interventions in summer months. Lastly, our findings highlight the need to clearly communicate the goals of behavioral interventions. While all communities universally reduced overall activity during the early days of the COVID-19 pandemic, some increased indoor activity during this time, potentially diminishing the positive effects of the social distancing policies put into place. A public health education campaign to clarify the role of indoor interactions in transmission risk may have ameliorated this. Our study leverages a novel data stream that has been made available due to the COVID-19 pandemic. Such novel data streams offer many opportunities to address long-unanswered questions in infectious disease behavior dynamics, but these data must be interpreted carefully. Safegraph's mobile-app-based location data does not include data on individuals less than 16 years of age [36] . While we may expect that children under 12 may be accompanied by adults that may be represented in the dataset, our metric likely does not capture the activity dynamics of older children (children 12-15 make up 5% of the US population). For those included in the Safegraph database, representation is dependent on smartphone usage as well as a number of business processes not transparent to users of the data, thus we expect that there is geographic variation in the representativeness of the data. Smartphone ownership has increased in recent years with 85% of US adults reporting smartphone ownership; however, smartphone usage does vary significantly by age with only 61% of adults over 65 reporting smartphone use [37] . Based on an analysis done by Safegraph, the panel is representative of race, educational attainment and income at the US county level (but not at finer spatial scales) [38] . 
On the other hand, a recent independent analysis shows that older and non-white individuals are less likely to be captured in the panel for POI-specific analyses [39] . It is important to note that both studies are associative in nature as the devices in the panel are fully anonymized so no device-level demographic data exists. Continued work to understand the sampling biases of such datasets will be needed so that improved bias correction approaches can be developed [39] . Additionally, we limit our scope in this study to consider only number of visits and do not incorporate information about visit duration. The dataset counts all visits of one minute or longer. For disease transmission, there may be a threshold duration required for an interaction between an infected and susceptible individual for infection to be propagated. These thresholds are not well-understood for all respiratory diseases, but evidence that SARS-CoV-2 transmission can occur with brief encounters has emerged [40] . While the Safegraph dataset does provide median dwell times for POIs, the likely significant heterogeneity in the distribution of dwell times remains unknown and is difficult to capture in an aggregated manner. Our metric and analysis also focuses on the US county scale to reflect the finest scale generally used for infectious disease modeling as well as public health decision-making. This choice is likely to ignore some within-county heterogeneity, and means that our metric does not represent the experience of all groups, particularly by socioeconomic status. For example, low-income and racially marginalized communities have systematically less access to outdoor, natural spaces and spend more time indoors due to structural inequities including lack of paid leave [28, 41, 42] . Thus our estimate of a county's indoor transmission risk represents an underestimate of the risk experienced by individuals in these communities. 
We commit to continued work to better characterize the transmission risk experienced by vulnerable populations. Lastly, we acknowledge that data modeling work that can influence public health policy decisions, particularly during an ongoing crisis, must be done with care to prevent misconceptions from having adverse effects on risk perception and policies [43] . We thus strongly note that while our measure of indoor behavioral seasonality provides a potential driver of respiratory disease seasonality, it remains one among many complex factors which integrate to predict the transmission potential of an ongoing epidemic or pandemic [44] . Thus we cannot rely on behavioral seasonality to diminish transmission naturally, and pandemic intervention strategies should not be planned around behavioral seasonality while population susceptibility remains high in so many locations. Ongoing global change events highlight the importance of this work, as it informs how widespread disruptions may shift patterns of indoor activity, potentially altering traditional infectious disease seasonality. Global change events will continue to cause significant disruption to normal behavior patterns; mechanistic understanding of infectious disease seasonality and real-time data collection will be crucial components to future disease control efforts. While other global change events may impact indoor activity in different ways than the COVID-19 pandemic, a rigorous understanding of the impact of host behavior on infectious disease allows policymakers and emergency preparedness experts to effectively address future disruptions. We use the SafeGraph Weekly Patterns data, which provides foot traffic at public locations ("points of interest") across the US based on the usage of mobile apps with GPS [45] . The data are from 2018 to 2020, and 4.6 million POIs are sampled in all years of our study. 
The data are anonymized by applying noise and omitting data associated with a single mobile device, and are provided at the weekly temporal scale. Data are sampled from over 45 million smartphone devices (of approximately 275-290 million smartphone devices in the US during 2018-2021 [46]), and do not include devices that are out of service, powered off, or opted out of location services. As a data cleaning step, we use spatial imputation for any county-weeks in which the visitor count is less than 100. Ethical review for this study was sought from the Institutional Review Board at Georgetown University and the study was approved on October 14, 2020.

SafeGraph Points of Interest (POIs) are locations where consumers can spend money and/or time and include schools, hospitals, parks, grocery stores, restaurants, etc., but do not include home locations. Each POI is assigned a six-digit North American Industry Classification System (NAICS) code in the SafeGraph Core Places dataset to classify each location into a business category. We classify each NAICS category as primarily indoor (e.g. schools, hospitals, grocery stores), primarily outdoor (e.g. parks, cemeteries), or unclear if the location is a potentially mixed indoor and outdoor setting. Approximately 90% of POIs were classified as indoors, 6.5% were classified as outdoors, and 3.5% were classified as unclear.

We define $\sigma_{it}$, equation (1), as the propensity for visits to be to indoor locations relative to outdoor locations. We aggregated raw visitor counts, defined when a device is present at a non-home POI for longer than one minute, to all indoor POIs and all outdoor POIs in a given week ($t$) at the U.S. county level ($i$).
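As an illustration, the construction of the metric for a single county, including the normalization and mean-centering steps described in this section, could be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the function and argument names are hypothetical, and "mean-centered" is interpreted here as dividing by the county's mean ratio so that a value of 1 corresponds to the county's average, which matches the interpretation of σ given in the results.

```python
from statistics import mean

def indoor_activity_metric(indoor, outdoor, ref_indoor=None, ref_outdoor=None):
    """Relative indoor activity for one county, one value per week.

    indoor, outdoor: weekly visitor counts to indoor and outdoor POIs.
    ref_indoor, ref_outdoor: counts from the reference year used for the
    maxima (the text uses 2019); defaults to the series themselves.
    Assumes every outdoor count is positive.
    """
    max_in = max(indoor if ref_indoor is None else ref_indoor)
    max_out = max(outdoor if ref_outdoor is None else ref_outdoor)
    # Ratio of normalized indoor visits to normalized outdoor visits.
    ratio = [(n_in / max_in) / (n_out / max_out)
             for n_in, n_out in zip(indoor, outdoor)]
    # Assumed form of mean-centering: rescale so the county average is 1.
    mu = mean(ratio)
    return [r / mu for r in ratio]

# Toy counts for four weeks (made-up numbers, for illustration only).
sigma = indoor_activity_metric([100, 80, 60, 80], [20, 40, 60, 40])
# sigma is approximately [2.0, 0.8, 0.4, 0.8]
```

Values above 1 mark weeks in which activity is more indoor than the county's own average, and values below 1 mark weeks in which it is more outdoor.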
Visitor counts are normalized by the maximum visitor counts for indoor or outdoor locations in each county during the year 2019. This metric is then mean-centered to arrive at a relative measure of indoor activity seasonality, σ_it, which is comparable across all counties. We note that µ_σ is not spatially structured (see Supplementary Figure S2).

To characterize groups of US counties with similar indoor activity dynamics, we use a complex networks-based time series clustering approach [47]. We first calculate the pairwise similarity between z-normalized indoor activity time series for each pair of counties, i and j, using the Pearson correlation coefficient (ρ_ij). For pairs of locations where ρ_ij ≥ 0.9, we represent the pairwise time series similarities as a weighted network where nodes are US counties and edges represent strong time series similarity. We then cluster the time series similarity network using community structure detection. This method effectively clusters nodes (counties) into groups of nodes that are more connected within than between. The resulting clustering thus represents a regionalization of the U.S. in which regions consist of counties that have more similar indoor activity dynamics to each other than to other regions. One benefit of the network-based community detection approach over traditional clustering methods is that community detection does not require user specification of the number of clusters (regions, in this case); instead the number of clusters emerges organically from the data connectivity [48]. For community detection, we use the Louvain method [49], a multiscale method in which modularity is first optimized using a greedy local algorithm, on the similarity network with edge weights (i.e. time series correlations) using an igraph implementation in Python [50].
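The metric and the similarity network can be illustrated with a short sketch on synthetic data. Several pieces are assumptions rather than the authors' code: the metric is taken to be the ratio of max-normalized indoor counts to max-normalized outdoor counts, divided by its own mean over time (one reading of "mean-centered"); the maxima are taken over the synthetic year rather than over 2019; and connected components are used as a plainly simpler stand-in for Louvain community detection, which is not implemented here. All function and county names are hypothetical.

```python
import math


def indoor_activity(indoor, outdoor, indoor_max, outdoor_max):
    """Assumed form of the metric: max-normalized indoor counts divided by
    max-normalized outdoor counts, rescaled so the series has mean 1."""
    ratio = [(n_in / indoor_max) / (n_out / outdoor_max)
             for n_in, n_out in zip(indoor, outdoor)]
    mean_ratio = sum(ratio) / len(ratio)
    return [r / mean_ratio for r in ratio]


def pearson(x, y):
    """Pearson correlation; the value is unchanged by z-normalizing x and y."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def similarity_edges(series, threshold=0.9):
    """Weighted edges (i, j, rho) for county pairs with rho >= threshold."""
    names = sorted(series)
    edges = []
    for pos, i in enumerate(names):
        for j in names[pos + 1:]:
            rho = pearson(series[i], series[j])
            if rho >= threshold:
                edges.append((i, j, rho))
    return edges


def connected_components(nodes, edges):
    """Stand-in grouping (NOT Louvain): connected components of the network."""
    neighbors = {n: set() for n in nodes}
    for i, j, _ in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    seen, groups = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Synthetic weekly outdoor counts with opposite seasonal phases.
weeks = range(52)


def outdoor_counts(amplitude, phase):
    return [200 + amplitude * math.sin(2 * math.pi * t / 52 + phase)
            for t in weeks]


outdoor = {
    "north_a": outdoor_counts(100, 0.0),
    "north_b": outdoor_counts(80, 0.1),
    "south_a": outdoor_counts(100, math.pi),
    "south_b": outdoor_counts(90, math.pi + 0.1),
}
indoor = [1000.0] * 52  # constant indoor counts keep the example simple
sigma = {name: indoor_activity(indoor, out, max(indoor), max(out))
         for name, out in outdoor.items()}
edges = similarity_edges(sigma)
groups = connected_components(sigma, edges)
```

On real data the weighted edge list would be passed to a Louvain implementation rather than to `connected_components`; the two can disagree whenever a connected component contains several densely connected sub-groups.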
We investigate the COVID-19 pandemic's impact on indoor activity seasonality by comparing pre-pandemic mobility patterns in 2018 and 2019 with mobility patterns during the COVID-19 pandemic in 2020. We compared the proportion of indoor visitor counts at the county level, σ_it, across 2018, 2019, and 2020 to examine changes in indoor activity seasonality during the COVID-19 pandemic. We also examined total activity, aggregating visitor counts to indoor, outdoor, and unclear POIs by week and mean-centering them for each US county during the COVID-19 pandemic in 2020.

We seek to illustrate the impact of incorporating seasonality into an infectious disease model using a phenomenological model versus empirical data. To achieve this, we parameterize a simple compartmental disease model with a seasonality term, using either our empirically-derived indoor activity seasonality metric or an analytical phenomenological model of seasonality fit to this metric.

We first fit our empirically-derived indoor activity seasonality metric using a time-varying non-linear model. We specify the time-varying effect as a sinusoidal function, as is commonly done to incorporate seasonality into infectious disease models phenomenologically. The indoor activity seasonality, σ_it, for cluster i at week t is specified as: σ_it = 1 + α_i sin(ω_i t + ϕ_i), where α_i is the sine wave amplitude, ω_i is the frequency and ϕ_i is the phase. We fit a model for locations in the northern cluster separately from those in the southern cluster, as identified above. We fit the parameters for this model using the nlme package in R.
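As an illustration of fitting the sinusoidal form σ_t = 1 + α sin(ωt + ϕ), the sketch below uses ordinary least squares in Python rather than the nlme mixed-effects machinery in R, so it should be read as a simplified stand-in. It relies on the identity α sin(ωt + ϕ) = a sin(ωt) + b cos(ωt) with a = α cos ϕ and b = α sin ϕ, which makes the problem linear for a fixed ω; candidate periods are then scanned to choose ω. The function name, the period grid, and the synthetic parameter values are assumptions for the example.

```python
import math


def fit_sinusoid(t, sigma, periods):
    """Least-squares fit of sigma(t) = 1 + alpha * sin(omega * t + phi).

    For each candidate period (in weeks) the coefficients a and b of
    a*sin(omega*t) + b*cos(omega*t) are solved from the 2x2 normal
    equations, and the period with the smallest squared error is kept.
    """
    y = [s - 1.0 for s in sigma]
    best = None
    for period in periods:
        omega = 2.0 * math.pi / period
        s = [math.sin(omega * ti) for ti in t]
        c = [math.cos(omega * ti) for ti in t]
        sss = sum(v * v for v in s)
        scc = sum(v * v for v in c)
        ssc = sum(u * v for u, v in zip(s, c))
        sys_ = sum(u * v for u, v in zip(y, s))
        syc = sum(u * v for u, v in zip(y, c))
        det = sss * scc - ssc * ssc
        if abs(det) < 1e-12:
            continue  # sine and cosine columns are degenerate for this period
        a = (sys_ * scc - syc * ssc) / det
        b = (syc * sss - sys_ * ssc) / det
        sse = sum((yi - a * si - b * ci) ** 2
                  for yi, si, ci in zip(y, s, c))
        if best is None or sse < best[0]:
            best = (sse, math.hypot(a, b), omega, math.atan2(b, a))
    _, alpha, omega, phi = best
    return alpha, omega, phi


# Synthetic two-year weekly series with known (made-up) parameters.
true_alpha, true_omega, true_phi = 0.2, 2 * math.pi / 52, 0.5
t = list(range(104))
sigma = [1 + true_alpha * math.sin(true_omega * ti + true_phi) for ti in t]
alpha, omega, phi = fit_sinusoid(t, sigma, periods=range(40, 65))
```

Scanning a grid of periods keeps the example dependency-free; a general nonlinear optimizer would estimate ω continuously, at the cost of needing sensible starting values.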
We model infectious disease dynamics through a simple SIR model of disease spread. We incorporate alternative seasonality terms to consider the impact of heterogeneity in indoor seasonality on disease dynamics. For the northern and southern clusters separately, we define modeled seasonality as β(t) = 1 + α sin(ωt + ϕ), with the fitted parameters for each cluster (Supplementary Figure S7). We also consider two exemplar locations for empirical estimates of seasonality, where β(t) = σ_t after rolling window smoothing: Cook County as an example county from the northern cluster, and Maricopa County as an example county from the southern cluster. We also compare against a null expectation where β(t) = 1. (All seasonality functions are illustrated in Supplementary Figure S6.) We assume that β_0 = 0.0025 and γ = 2 (on a weekly time scale).

Consistency between climate zone and indoor activity geography

Figure S3: (A) The IECC climate zones are based on temperature, humidity, and rainfall in each county and govern the type of building material and amount of ventilation required in a building. (B) The consistency between the two primary clusters of indoor activity identified by our analysis and the IECC climate zones.

Figure S4: The third indoor seasonality cluster displays some correlation with areas of increased winter tourism, including US ski areas in western and northeastern states, potentially contributing to off-season activity increases.
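Returning to the seasonally forced SIR model described in the methods, the following is a minimal simulation sketch. The equations are not given explicitly in the text, so the mass-action form used here (infection term β_0 β(t) S I, recovery term γ I) is an assumption, as are the population size of 1000, the single initial infection, the fixed-step RK4 integrator, and the illustrative forcing amplitude of 0.2; only β_0 = 0.0025 and γ = 2 per week come from the text.

```python
import math


def simulate_sir(beta_t, beta0=0.0025, gamma=2.0, n=1000.0, i0=1.0,
                 weeks=104, steps_per_week=100):
    """Integrate a seasonally forced SIR model with fixed-step RK4.

    Assumed form (not stated explicitly in the text):
        dS/dt = -beta0 * beta(t) * S * I
        dI/dt =  beta0 * beta(t) * S * I - gamma * I
        dR/dt =  gamma * I
    Time is in weeks; n and i0 are illustrative assumptions.
    Returns the (S, I, R) state at the start and at the end of each week.
    """
    def deriv(t, state):
        s, i, _ = state
        infection = beta0 * beta_t(t) * s * i
        return (-infection, infection - gamma * i, gamma * i)

    def shifted(state, k, h):
        return tuple(x + h * dx for x, dx in zip(state, k))

    def rk4_step(t, state, h):
        k1 = deriv(t, state)
        k2 = deriv(t + h / 2, shifted(state, k1, h / 2))
        k3 = deriv(t + h / 2, shifted(state, k2, h / 2))
        k4 = deriv(t + h, shifted(state, k3, h))
        return tuple(x + h / 6 * (a + 2 * b + 2 * c + d)
                     for x, a, b, c, d in zip(state, k1, k2, k3, k4))

    h = 1.0 / steps_per_week
    state = (n - i0, i0, 0.0)
    trajectory = [state]
    for week in range(weeks):
        for k in range(steps_per_week):
            state = rk4_step(week + k * h, state, h)
        trajectory.append(state)
    return trajectory


def null_forcing(t):
    """Null expectation: no seasonal variation in transmission."""
    return 1.0


def sinusoidal_forcing(t, alpha=0.2, omega=2 * math.pi / 52, phi=0.0):
    """Sinusoidal seasonality with illustrative (not fitted) parameters."""
    return 1.0 + alpha * math.sin(omega * t + phi)


null_run = simulate_sir(null_forcing)
seasonal_run = simulate_sir(sinusoidal_forcing)
```

Under these assumptions the basic reproduction number at the start of the null run is roughly β_0 n / γ = 1.25. An empirical forcing can be supplied by passing a function that interpolates a smoothed σ_t series in place of `sinusoidal_forcing`.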
Figure S5: (A) Indoor seasonality during 2020 can be clustered into four groups, although clusters are more geographically fragmented than in previous years. (B) Time series for 2020 indoor seasonality clusters display heterogeneous trends that were not apparent in previous years, with some clusters more variable than others.

Figure S6: The seasonal forcing functions (β(t)) we used in the epidemiological model. The non-seasonal model (grey) shows no variation in transmission risk over time. We model northern seasonality via a sinusoidal model fit to the northern indoor activity data (light blue solid) and via the empirically-measured indoor seasonality from a county in the northern cluster (Cook County, light blue dotted). We model southern seasonality via a sinusoidal model fit to the southern indoor activity data (dark blue solid) and via the empirically-measured indoor seasonality from a county in the southern cluster (Maricopa County, dark blue dotted).

Figure S7: Inferred parameters for the sinusoidal model fits of the indoor activity data for the northern and southern clusters show a similar frequency, but greater amplitude and shorter phase in the northern cluster. Values displayed are mean parameter estimates. Standard errors for all parameters are smaller than 5e-3 and thus are not displayed.
The calendar of epidemics: Seasonal cycles of infectious diseases
Seasonality and the dynamics of infectious diseases
Seasonal infectious disease epidemiology
Absolute humidity modulates influenza survival, transmission, and seasonality
Absolute humidity and the seasonal onset of influenza in the continental United States
Urbanization and humidity shape the intensity of influenza epidemics in US cities
Epidemic dynamics of respiratory syncytial virus in current and future climates
The influence of temperature and humidity on the incidence of hand, foot, and mouth disease in Japan
Seasonality and comparative dynamics of six childhood infections in pre-vaccination Copenhagen
Social contacts and mixing patterns relevant to the spread of infectious diseases
Drivers of infectious disease seasonality: potential implications for COVID-19
Exploring the seasonal drivers of varicella zoster transmission and reactivation
Seasonality of viral infections: mechanisms and unknowns
Explaining seasonal fluctuations of measles in Niger using nighttime lights imagery
Seasonality, disease and behavior: Using multiple methods to explore socio-environmental health risks in the Mekong Delta
Measuring the seasonality of human contact patterns and its implications for the spread of respiratory infectious diseases. medRxiv
Recognition of aerosol transmission of infectious agents: a commentary
Ten scientific reasons in support of airborne transmission of SARS-CoV-2. The Lancet
Airborne transmission of respiratory viruses
Transmission of COVID-19 virus by droplets and aerosols: A critical review on the unresolved dichotomy
Airborne transmission of SARS-CoV-2: theoretical considerations and available evidence
It is time to address airborne transmission of coronavirus disease 2019 (COVID-19)
Daily indoor-to-outdoor temperature and humidity relationships: a sample across seasons and diverse climatic regions
Sensitivity of airborne transmission of enveloped viruses to seasonal variation in indoor relative humidity
Dynamics of airborne influenza A viruses indoors and dependence on humidity
Human activity patterns: a review of the literature for estimating time spent indoors, outdoors, and in transit
The National Human Activity Pattern Survey (NHAPS): a resource for assessing exposure to environmental pollutants
Time-location patterns of a diverse population of older adults: the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air)
Shifting patterns of seasonal influenza epidemics
Seasonally forced disease dynamics explored as switching between attractors
Susceptible supply limits the role of climate in the early SARS-CoV-2 pandemic
Seasonality of respiratory viral infections
COVID-19 outbreak associated with air conditioning in restaurant
A probabilistic transmission dynamic model to assess indoor airborne infection risks
Associations between COVID-19 transmission rates, park use, and landscape structure
"What about bias in your dataset?": Quantifying Sampling Bias in SafeGraph Patterns
Leveraging administrative data for bias audits: Assessing disparate coverage with mobility data for COVID-19 policy
COVID-19 in a correctional facility employee following multiple brief exposures to persons with COVID-19 - Vermont
Who has access to urban vegetation? A spatial analysis of distributional green equity in 10 US cities
Perceptions of nature and access to green space in four urban neighborhoods
Misconceptions about weather and seasonality must not misguide COVID-19 response
Ignoring spatial heterogeneity in drivers of SARS-CoV-2 transmission in the US will impede sustained elimination. medRxiv
Individuals of any age who own at least one smartphone and use the smartphone(s) at least once per month
Characterizing an epidemiological geography of the United States: influenza as a case study. medRxiv
Data Clustering: Algorithms and Applications
Fast unfolding of communities in large networks

Research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under award number R01GM123007. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

We gratefully acknowledge data sharing by SafeGraph, which made this study possible.

The raw data underlying the results presented in the study are openly available from SafeGraph for researchers. The data generated by our study, including the indoor seasonality metric, are available for download at https://github.com/bansallab/indoor_outdoor.

The authors declare that they have no competing interests.