key: cord-0972134-xhdyxoqm authors: Dey, Tanujit; Lee, Jaechoul; Chakraborty, Sounak; Chandra, Jay; Bhaskar, Anushka; Zhang, Kenneth; Bhaskar, Anchal; Dominici, Francesca title: Lag time between state-level policy interventions and change points in COVID-19 outcomes in the United States date: 2021-06-18 journal: Patterns (N Y) DOI: 10.1016/j.patter.2021.100306 sha: 641b908b702f2d497c9875d15ecac139159c4e53 doc_id: 972134 cord_uid: xhdyxoqm State-level policy interventions have been critical in managing the spread of the new coronavirus. Here, we study the lag time between policy interventions and change in COVID-19 outcome trajectory in the United States. We develop a stepwise drifts random walk model to account for non-stationarity and strong temporal correlation and subsequently apply a change-point detection algorithm to estimate the number and times of change points in the COVID-19 outcome data. Furthermore, we harmonize data on the estimated change points with non-pharmaceutical interventions adopted by each state of the United States, which provides us insights regarding the lag time between the enactment of a policy and its effect on COVID-19 outcomes. We present the estimated change points for each state and the District of Columbia and find five different emerging trajectory patterns. We also provide insight into the lag time between the enactment of a policy and its effect on COVID-19 outcomes. Correspondence fdominic@hsph.harvard.edu A rigorous study on the lag time between implemented policies and their effects on the trajectory of COVID-19 outcomes is desired as more data are available. Timeseries models and data-driven search algorithms can be used to effectively model COVID-19 outcomes and detect their changes due to their associated policies. We find five patterns in the trajectory of US COVID-19 outcomes and a 10-to 14-day lag time between implementation of the policies and their effects on COVID-19 outcomes. 
The outbreak of the COVID-19 pandemic
In December of 2019, a new strain of coronavirus was identified in Wuhan, China. The first case of COVID-19 in the United States of America was identified on January 19-20, 2020, in the state of Washington, after a man returned from Wuhan, China, on January 15, 2020. 1 The first non-travel-related COVID-19 case was confirmed on February 26, 2020, which raised concerns related to community transmission of the new coronavirus in the United States. 2 On March 11, 2020, after noticing patterns of international spread, the World Health Organization declared COVID-19 a pandemic. With the number of COVID-19 cases rising steadily in the United States and around the world, concerted intervention by states and the US federal government was required in order to effectively monitor and prevent the spread of the virus. The goal of these interventions was to ensure and maintain access to testing for as many people as required. On March 13, 2020, President Donald Trump declared a state of emergency in an attempt to strengthen the response of the federal government to the pandemic. As commented on by the Kaiser Family Foundation (KFF), US states took slow action to contain the spread of the virus, especially those states hit hardest by the outbreak, suggesting that effective policy responses were delayed. In one such example, the first patient with diagnosed COVID-19 died on February 29, 2020, whereas the declaration of a state of emergency by national and state-level governments was announced nearly a full month later. According to KFF, every state in the United States had made an emergency declaration by March 16, 2020, with most of these declarations listed as either a State of Emergency or a Public Health Emergency. 3 Although these emergency declarations were made, the United States had already reported 4,507 cases on March 16 and 188,461 by March 31, 2020.
4 In response, several states issued stay-at-home orders, such as California issuing its stay-at-home order on March 19, 2020. 5 Importance of state-wise policies compared with federal-level policies According to KFF, many states have implemented policies in order to increase access to COVID-19 testing and treatment in addition to improving the management of other health conditions. 3 Governments across the world, at either the national or the subnational level, have similar methods for implementing public policies. Many nations implemented policies after considering the challenges faced by other nations when implementing a similar policy. This allowed legislators to draw valuable lessons during implementation and alter policies in the aim of achieving the best possible outcome for their constituents. 6 Policy decisions are further conditioned by ''contextual factors, including institutional (e.g., constitutional and legalistic structures) factors, cultural orientations, economies, and political styles (among others).'' 6 Policy networks, which are entities that seek to influence policy, relationships with other legislations, and the related outcomes, both respond and contribute to the shifting of attention to policy issues and the change of government agendas. 6 Examples of policy networks include political parties, elected offices, non-governmental organizations, and public entities, which communicate across numerous connections vital to the policy-making process. 6 The COVID-19 pandemic has encouraged a rapid and dramatic shift in the priorities of policy networks and, correspondingly, shifts in the priorities of many government decision-making venues, such as legislatures and parliaments. 6 The Swiss parliament, for instance, interrupted its usual slate for the spring session and adopted other issues, such as climate change and pension changes. 
6 Since the start of COVID-19, many policy networks, like those at the state level, have begun to focus more on the fundamental purpose of particular policy areas, whether that be education or to provide food for struggling families. Other examples include the implementation of California's state-wide law to prohibit the eviction of tenants on commercial property, 6 or initiatives put in place, such as Connecticut providing $95.5 million in SNAP benefits to families of children eligible for free and reduced-price school meals through the Pandemic Electronic Benefits Transfer program. 5 Literature review on previous studies Social distancing measures were either fully or partially relaxed by all US states and the District of Columbia weeks after they were first issued in order to suppress the transmission of COVID-19 and reduce the growth in cases of severe coronavirus disease. Using segmented linear regression, one study measured the extent to which the relaxation of social distancing measures influenced the control of the epidemic in the United States. 7 Following the gradual easing back of social distancing measures across the United States, Tsai et al. 7 observed an immediate and significant turnaround in the suppression of the epidemic as the country's ability to monitor the disease burden associated with COVID-19 was compromised by the premature relaxation of social distancing steps. However, another study suggested a slightly dissimilar conclusion when applying reduced-form econometric methods to empirically evaluate the effect of anti-contagion policies on the growth rate of infections. 8 When reduced-form econometric methods were used, it was found that anti-contagion policies have significantly decelerated the growth rate of infections 8 . 
It was estimated that among the six countries (China, South Korea, Italy, Iran, France, and the United States) analyzed, anti-contagion policies have prevented or delayed approximately 61 million confirmed cases and averted 495 million total infections. 8 In another study, a group of researchers used a stochastic individual-based model for the transmission of COVID-19 to describe individual contact networks stratified into household, school, community, and workplace layers, using demographic and epidemiological data from the United Kingdom. 9 The authors found that if social distancing measures were relaxed, including the reopening of schools, they must be accompanied by large-scale, population-wide testing for symptomatic individuals alongside contact tracing. 9 Another rapid review was also conducted to assess the effects of quarantine, alone or in combination with other measures, on individuals who have come in contact with confirmed cases of COVID-19. 10 The review concluded that evidence of COVID-19 was limited to modeling studies, but consistently indicated that quarantine is crucial to the reduction of infection and mortality during the pandemic. Another study used generalized linear mixed-effects models with state-level clustering in order to estimate county-level associations with an overall social vulnerability index (SVI) as well as SVI subcomponent scores with COVID-19 case fatality rate (CFR). 11 The authors found no significant association between overall SVI and COVID-19 incidence but found that the social status and minority status subcomponents of the SVI were both predictors of higher incidence and CFR. Other papers have been suggestive of the existence of social inequities in the United States for COVID-19 outcomes in regard to national, state, and local public health data. 
12 The analysis of US countylevel COVID-19 deaths, confirmed cases, and positive tests in Illinois and New York City, which is stratified by zip codes, area percent crowding, poverty, and population of color, revealed that socioeconomic disparities for COVID-19 outcomes exist in the United States. 12 To provide an evidence base for policy and resource allocation, the paper suggests the use of straightforward cost-effective methods to report on social disparities in COVID-19 results. In addition to social disparities, the association of public health interventions with epidemiological characteristics of the COVID-19 pandemic was evaluated using individual-level data on 32,583 laboratory-confirmed COVID-19 cases in Wuhan, China. 13 The rates of laboratory-confirmed COVID-19 cases were calculated across five periods: first, from December 8, 2019, to January 9, 2020, when no public health interventions were executed; then, January 10-22, 2020, during the Chinese New Year Holiday; January 23 to February 1, 2020, during which travel restrictions and quarantine were enforced; February 2-16, 2020; and February 17 to March 8, 2020, when universal symptom surveys were completed. Over the series of public health interventions, it was identified that there was a temporal association between the interventions and the improved control of the COVID-19 pandemic in Wuhan, China. 13 This indicates that public health interventions were associated with improved control of the COVID-19 pandemic in the earlier stages of the outbreak in Wuhan, China, which may inform public health policy in other countries and regions. 13 In addition to public health interventions, non-pharmaceutical interventions (NPIs) appeared to be effective in containing the COVID-19 outbreak in China. 
14 The study constructed a travelnetwork-based susceptible-exposed-infectious-removed model using epidemiological parameters estimated for the early stages of the outbreak in Wuhan before NPIs were implemented, in order to simulate the outbreak of the pandemic across cities in mainland China. Based on the model's results, without the implementation of NPIs, the number of confirmed COVID-19 cases would possibly have shown a 67-fold increase, indicating that the early detection and isolation of cases were forecast to prevent more infections than travel restrictions and reductions in contact. 14 Also, it was proposed that integrated NPIs would have achieved the strongest, most robust, and most rapid effect of preventing more infections. 14 However, if these NPIs had been implemented 1-3 weeks earlier in China, it is estimated that the total number of infections could have been reduced by 66%-95%. 14 In conclusion, to mitigate health, social, and economic impacts in affected regions around the world, integrated NPI strategies should be planned, implemented, and modified earlier than they were. 14 COVID-19 infections and the effect of policies over time: A change-point perspective As the data related to this pandemic are recorded over time, a natural choice for the development of methodological research could be time-series analysis. Also, because the infection rate of COVID-19 changes based on several reasons, one of the most important research goals could be the association of those changes in infection rates with the interventions imposed to stop or delay the growth of the pandemic. Therefore, a change-point analysis of the COVID-19 data associated with the growth rate of the pandemic should be rigorously studied. In the short period of time from the start of the pandemic until now, we are indeed witnessing an increasing number of contributions in this area of research. Using data from Germany, Dehning et al. 
15 applied a Bayesian epidemiological model to analyze the time dependence of the growth rate of COVID-19 infections from a change-point perspective. Their research found that the growth rate of the pandemic is indeed correlated with the time points at which public interventions (policies) were decided. Jiang et al. 16 performed a change-point analysis based on COVID-19 health outcomes (cases, deaths) using a piecewise linear trend model. They analyzed the trajectory of cumulative COVID-19 cases and deaths across 30 countries. In addition, they developed a forecasting model for predicting cumulative deaths in the United States. In yet another study, based on data from European countries, researchers identified the change points in the COVID-19 epidemic. 17 A mixed-effects Poisson regression model was used to assess the relationship between the level of social distancing and the observed decay in the national epidemic. Wagner et al. 18 performed an interrupted time-series analysis to evaluate the association between the interventions taken by the US states and the reduction in the spread of COVID-19. The current gaps in knowledge on this topic include four critical issues that need to be further considered altogether. First, the models for COVID-19 outcomes need to be flexible to better adapt to continuously evolving changes in the pandemic. Restrictive models such as model-based regression could be limited in modeling diverse patterns pertinent to COVID-19 outcomes. Second, the COVID-19 outcomes in a day are heavily dependent on the previous day. To be more effective at accounting for this temporal correlation, one will need a model that effectively models this strong temporal correlation. Third, the COVID-19 outcomes typically show multiple changing trajectories due to policy changes adopted by the states. The COVID-19 models need to consider such multiple structural changes in the trajectory. 
Fourth, as the time lag between a newly adopted policy and its resulting change in COVID-19 outcomes can vary, a probabilistic approach could be relevant to model the lag time. In short, to combine all of these four issues, one will need to develop a unified approach that rigorously finds change points in COVID-19 outcomes and links these change points to their associated policies via a probabilistic approach. In this paper we study the lag time between policy interventions and a change in daily COVID-19 outcome trajectory in the United States, a critical topic that has been less studied in the literature. First, we modeled the COVID-19 time series data using a stepwise drifts random walk to account for nonstationarity, strong temporal correlation, and multiple changes in the rate of change for the daily COVID-19 outcome (confirmed cases and deaths). Second, because a rigorous estimation of change points in the COVID-19 outcomes is very important, we apply a genetic algorithm (GA) with the minimum description length criterion to estimate the number and times of change points in the US COVID-19 outcome. GAs are a data-driven search technique based on the natural selection principle and, in particular, have been very effective in detecting multiple unknown change points in time-series data, [19] [20] [21] [22] unlike some previous studies 17, 18 that considered at most only one change point. We estimated the change points separately for each state and the District of Columbia. Third, we estimated the time-dependent growth rate for each US state based on its estimated change points and then identified five different emerging trajectory patterns in COVID-19 outcomes. 
Fourth, to link our estimated change points with population-level NPIs adopted by each US state, we created a random variable that assesses the relationship between the change points in COVID-19 health outcomes and the dates of NPIs and then tested the hypothesis of whether there is any impact of the NPIs on the COVID-19 outcomes within several selected days. For this, the underlying probabilistic model associated with this random variable is also identified.
Data used for this study
Data for COVID-19 outcomes
We used the US COVID-19 dataset from the New York Times data repository at https://github.com/nytimes/covid-19-data for our analysis of the US pandemic change points. For the data analysis, we selected the daily confirmed cases (from March 8, 2020, to February 28, 2021) and daily deaths (from March 18, 2020, to February 28, 2021) for each of the 50 states and the District of Columbia. We performed all the statistical analysis using the statistical software R version 4.0.5.
We extracted the dates marking policy implementations, available until July 30, 2020, for each of the 50 states and the District of Columbia, from the KFF GitHub (https://github.com/KFFData/COVID-19-Data/tree/kff_master/State%20Policy%20Actions/State%20Social%20Distancing%20Actions). We then manually updated the policy dates until February 22, 2021, using the current status of the different types of policies from August 3, 2020, to February 22, 2021. We made available the current status of each policy in the KFF GitHub (here). If from August to February there was a change in the status of any of the below-mentioned policy types, it was added to the original spreadsheet with policy dates until July 30, 2020. Policies were sorted into seven different categories: stay-at-home orders, mandatory quarantines, non-essential business closures, bans on large gatherings, school closures, limitations on restaurants, and declarations of a state of emergency.
The date of rollback for five of these policies (stay-at-home orders, mandatory quarantines, non-essential business closures, bans on large gatherings, and restaurant limits) was also recorded. Each state could have multiple policies of the same type if, for example, the policy was implemented, rolled back, and reimplemented. In this scenario, two separate policy implementation dates would be recorded. The updated policy data spreadsheet can be found at our GitHub website. Based on the updated policy data spreadsheet, we visualized the number of lockdown (Figure 1) and the number of reopening (Figure 2) policy implementations for each US state.
Figure 1. The number of lockdown policy implementations as recorded by the KFF is displayed for each of the 50 states. For each state, there are seven different types of lockdown policies, including stay-at-home orders, mandatory quarantine, non-essential business closures, large gathering bans, school closures, restaurant limits, and state-of-emergency declarations.
Figure 2. The number of reopening policy implementations as recorded by the KFF is displayed for each of the 50 states. For each state, there are five different dates for different reopening policies, including the rollback of stay-at-home orders, mandatory quarantine, non-essential business closures, large gathering bans, and restaurant limits.
We applied our change-point method to each US state and the District of Columbia. To illustrate the change-point results in detail, we selected the COVID-19 data from Florida and walk through the analysis method and its interpretation. Figure 3 summarizes our GA change-point estimation results for the log-transformed 7-day moving average series of daily new COVID-19 confirmed cases in Florida from March 8, 2020, to February 28, 2021.
On the left, each vertical line denotes the daily number of new COVID-19 confirmed cases, the blue line represents the 7-day moving average case series, and the red vertical dotted lines show the GA-estimated change-point times. As shown in the figure, the GA method estimates 10 change points on March 19, March 27, April 5, May 30, July 16, August 1, August 21, August 29, September 28, 2020, and January 9, 2021, segmenting the study period into 11 different regimes. These change-point times are also superimposed on the daily cumulative COVID-19 cases as displayed on the right of the figure. The GA appears to describe well the changes in the day-to-day growth rate of the number of new cases (on the left) as well as the changes in the growth rate for day-to-day cumulative cases (on the right). Table 1 summarizes the estimated values of the means and standard deviations for day-to-day changes in the Florida log-transformed 7-day moving average case series with the 10 GA-estimated change points. The mean and standard deviation estimates for the day-to-day changes (or growth rates) cover 11 different regimes, starting from a daily increase of 41.79% in the initial period of March 8-March 18, 2020, and ending with −2.09% in the last period of January 9-February 28, 2021. In other words, the 7-day moving average of new cases increased, on average, at a rate of 41.79% per day from March 8 until March 18, 2020, and, after 10 changes, decreased at a rate of 2.09% per day in the period of January 9-February 28, 2021. Next, we applied the GA method to the log-transformed 7-day moving average series of Florida daily COVID-19 deaths from March 18, 2020, to February 28, 2021. The GA detected seven change points on April 1, April 8, June 15, July 29, September 2, October 24, 2020, and January 24, 2021, partitioning the study period into eight regimes. Figure 4 depicts our GA change-point estimation results.
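The regime summaries of the kind reported in Table 1 can be sketched as follows, assuming the change points are already known. This is a minimal illustration, not the authors' R code: the function names `moving_average` and `regime_drifts` are hypothetical, the series is synthetic, and how the paper converts a mean log change into a percentage is not stated in this section, so the sketch reports the mean and standard deviation of the log differences directly.

```python
import math
from statistics import mean, stdev


def moving_average(values, window=7):
    """Trailing moving average; the first window - 1 days have no value."""
    return [mean(values[i - window + 1:i + 1])
            for i in range(window - 1, len(values))]


def regime_drifts(series, change_points):
    """Mean and SD of day-to-day log changes within each regime.

    series: strictly positive values (for example a 7-day moving average).
    change_points: sorted indices into series where a new regime starts.
    Each regime must contain at least two day-to-day changes.
    """
    logs = [math.log(v) for v in series]
    # diffs[i] is the change from day i to day i + 1, so it belongs to day i + 1.
    diffs = [b - a for a, b in zip(logs, logs[1:])]
    bounds = [0] + [cp - 1 for cp in change_points] + [len(diffs)]
    return [(mean(diffs[lo:hi]), stdev(diffs[lo:hi]))
            for lo, hi in zip(bounds, bounds[1:])]


# Synthetic series (not real data): log growth of 0.10 per day for 10 days,
# then a log decline of 0.05 per day for 10 days.
series = [math.exp(0.10 * t) for t in range(11)]
series += [series[-1] * math.exp(-0.05 * k) for k in range(1, 11)]
for drift, spread in regime_drifts(series, change_points=[11]):
    print(f"mean log change {drift:+.4f}, sd {spread:.4f}")
```

Under a stepwise drifts random walk, the mean of the differenced series within a regime estimates that regime's drift, which is why the summary reduces to segment-wise means and standard deviations once the change points are fixed.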
The left of the figure shows the number of daily deaths as vertical lines, the 7-day moving average of daily deaths as the blue line, and the GA-estimated change-point times as red vertical lines. The right shows the daily cumulative deaths with these seven GA change-point times overlaid. The GA change-point times appear to detect well the changing mean of day-to-day differences in the deaths series (on the left) and also the changes in the rate of mean changes in day-to-day cumulative deaths (on the right). Table 2 summarizes the estimated values of the means and standard deviations for day-to-day changes in the Florida log-transformed 7-day moving average death series with the seven GA-estimated change points. The estimated mean for the day-to-day growth rate in Florida COVID-19 deaths was 20.47% in the initial regime for March 18-31, 2020, and, after seven changes, decreased to −0.80% in the last regime for January 24-February 28, 2021. Equivalently, this result implies that the 7-day moving average of COVID-19 deaths changed at a rate of 20.47% per day until March 31, 2020, and at a rate of −0.80% per day for the period of January 24-February 28, 2021.
Figure 5 depicts our GA change-point results for the daily new cases. Except for Hawaii, with its highest peak occurring in August 2020, all of the US states experienced their highest peaks between November 2020 and January 2021. Focusing on the continental United States, we identified five emerging patterns of the trajectory of US confirmed cases. The first pattern, denoted by +−+, is observed in Connecticut, Delaware, District of Columbia, Massachusetts, Michigan, New Hampshire, New Jersey, New York, Rhode Island, and Vermont, showing an early peak during April and May, a fast decrease thereafter, and the highest peak between mid-November 2020 and January 2021.
The second pattern, denoted by +++, occurs in Illinois, Indiana, Louisiana, Maryland, Nebraska, Pennsylvania, Virginia, and Washington, characterized by the first peak between late March and May, another peak during the summer, and then the highest peak between mid-November 2020 and January 2021. The third pattern, −++, which shows two substantial peaks, the first substantial peak around July and the highest peak occurring between mid-November 2020 and January 2021, is observed in Alabama, Alaska, Arizona, California, Florida, Georgia, Idaho, Mississippi, Nevada, Oregon, South Carolina, Tennessee, Texas, and Utah. The fourth pattern, −/+, shows low constant trends until August and then a sharp increase, with the highest peak occurring between November 2020 and January 2021, and appears in Colorado, Maine, Minnesota, Montana, New Mexico, North Dakota, Ohio, South Dakota, West Virginia, Wisconsin, and Wyoming. The rest of the states experienced a steady increasing trend and then a fast increase, with the highest peak during the winter, the fifth pattern, //+, in daily case series. Table 3 summarizes the continental US states and the District of Columbia according to their corresponding patterns. We find geographical similarities within these five patterns, especially in patterns 1, 3, 4, and 5. Specifically, we note several northeastern states in pattern 1, many southern states in pattern 3, northern states in pattern 4, and a few central states in pattern 5.
Five patterns are also found in the daily deaths as displayed in Figure 6. The first pattern (+−+), an early considerable peak during April and May, a fast decrease thereafter, and another peak from December 2020 to February 2021, is observed in Colorado, Connecticut, Delaware, District of Columbia, Illinois, Indiana, Maryland, Massachusetts, Michigan, Minnesota, New Hampshire, New Jersey, New York, Pennsylvania, and Rhode Island.
The second pattern (+++) of a first peak in April, another substantial peak in the summer, and then a third peak in the winter appears in Florida, Louisiana, and Washington. The third pattern (+//), with three incremental peaks, one peak during April and May, a larger peak during July and August, and the highest peak between December 2020 and February 2021, is found in Alabama, Arizona, California, Georgia, Idaho, Mississippi, Nevada, New Mexico, Oregon, South Carolina, Texas, and Utah. The fourth pattern (−−+), a low steady trend and then one large peak from November 2020 to February 2021, can be found in Iowa, Kansas, Montana, Nebraska, North Dakota, South Dakota, West Virginia, and Wisconsin. The fifth pattern (//+), a steadily increasing trend, is observed in Arkansas, Kentucky, Missouri, North Carolina, Ohio, Oklahoma, Tennessee, and Virginia. Table 4 summarizes these 45 states and the District of Columbia according to their similar patterns. Geographical similarities are also identified: several northeastern states in pattern 1, southern states in pattern 3, and a few central states in pattern 4.
In Table 5, we report the results of the hypothesis test using the Wilcoxon signed-rank test for test 1. At a 5% level of significance, the null hypotheses with d = 3 and d = 7 are rejected. Figure 7 is the visual representation of the goodness-of-fit test to the log-normal distribution. All plots in the figure indicate that the log-normal model is well justified as the candidate distribution of our random variable Y with change point based on daily confirmed cases. Table 6 presents the numerical goodness-of-fit results based on the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). It shows that the log-normal model gives the best fit to the data compared with other competing models. To be more specific, both AIC and BIC values for the log-normal are at least 50 points below their closest competitor (Weibull model).
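To make the AIC and BIC comparison concrete, the sketch below computes both criteria from a maximized log-likelihood. It is an illustration under stated assumptions: the lag values are invented, the function names are hypothetical, and an exponential model stands in as the competitor because its maximum-likelihood fit has a closed form (the Weibull competitor named in the text requires an iterative fit and is not reproduced here).

```python
import math


def lognormal_loglik(data):
    """Maximized log-normal log-likelihood and its number of parameters."""
    logs = [math.log(y) for y in data]
    n = len(logs)
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n  # MLE uses divisor n
    loglik = -sum(logs) - 0.5 * n * math.log(2 * math.pi * var) - 0.5 * n
    return loglik, 2


def exponential_loglik(data):
    """Maximized exponential log-likelihood and its number of parameters."""
    n = len(data)
    rate = n / sum(data)  # MLE of the rate is 1 / sample mean
    return n * math.log(rate) - n, 1


def aic(loglik, k):
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    return k * math.log(n) - 2 * loglik


# Hypothetical lag times in days (not the paper's data).
lags = [2, 3, 5, 8, 8, 9, 12, 20, 35, 60]
for name, fit in [("log-normal", lognormal_loglik), ("exponential", exponential_loglik)]:
    ll, k = fit(lags)
    print(f"{name}: AIC={aic(ll, k):.2f}, BIC={bic(ll, k, len(lags)):.2f}")
```

Lower values indicate a better trade-off between fit and model complexity under either criterion; BIC penalizes each extra parameter more heavily than AIC once the sample has more than about seven observations.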
Now, we compute the log-normal parameter estimates and their standard errors for our fitted log-normal probability model and summarize the results in Table 7 . The fitted log-normal cumulative distribution function plot for our random variable Y is displayed in Figure 8 . This summarizes how our model can be used to figure with a certain degree of certainty within how many days we will see an impact of a policy or NPI on the change point in daily confirmed cases. For example, based on the red dotted horizontal line, we find that there is a 50% chance that we will see the effect of an NPI on changing the positivity rate within 8 days. Similarly, based on the blue-dotted horizontal line, there is a 95% chance that we will see the effect of an NPI on changing the positivity rate within 61 days. The Wilcoxon signed-rank test results for the lag time between NPIs and change points for the death cases (test 2) are summarized in Table 8 . At a 5% level of significance, the null hypotheses with d = 3 and d = 7 are rejected, but the null hypotheses with d = 10 and 14 are not. It shows that the effect of the policy on change points in the COVID-19 daily death counts may occur within 10 days of an NPI. Figure 9 displays the goodness-of-fit to the log-normal distribution. All plots in the figure indicate that the log-normal model is satisfactorily justified as the candidate distribution of our random variable Y with change point based on daily deaths. Table 9 represents the goodness-of-fit results based on AIC and BIC. The result numerically confirms that the log-normal model gives the best fit to the data compared with other competing models. To be more specific, both AIC and BIC values for the log-normal are at least 40 points below their closest competitor (Weibull model). Table 10 shows the log-normal parameter estimates and their standard errors for our fitted log-normal model. 
Using these estimates, we estimated the cumulative distribution function of our random variable Y for the death cases. Figure 10 depicts this result and summarizes how our model can be used to assess within how many days we will see an impact of a policy or NPI on the change point in daily death counts. For instance, from the red dotted horizontal line, there is a 50% chance that we will see the effect of an NPI on changing the daily death counts within 9 days. Similarly, from the blue dotted horizontal line, we can conjecture that there is a 95% chance that we will see the effect of an NPI on changing the daily death counts within 74 days.
For each state and each dataset of COVID-19 cases and deaths, we calculated the number of positive and negative changes in the estimated COVID-19 growth rates that followed lockdown policies and also the number of positive and negative changes in the estimated COVID-19 growth rates that followed reopening policies. The change in growth rates at the current regime was determined by subtracting the previous growth rate from the current growth rate. A positive change in the growth rate represents an increase in the growth rate of COVID-19 cases or deaths, while a negative change in the growth rate indicates a decrease in the growth rate of COVID-19 cases or deaths. These results are summarized in Table 11. Overall, lockdown policies are more often associated with a decrease in growth rates (~87.6% for confirmed cases and ~69.2% for deaths), and reopening policies are more often associated with an increase in growth rates (~68.9% for confirmed cases and ~53.4% for deaths). Considering that many other factors can influence growth rates, this identified association between policy type and growth rate change is notable.
We then conducted a chi-square test for independence between the policy type and the growth rate change using the results in Table 11 and found that there was a significant association between policy type and growth rate change for cases (χ² = 113.29, p < 0.001) and for deaths (χ² = 13.41, p < 0.001). We see a less prominent association between policy type and growth rate change for COVID-19 deaths. This is likely because death is a more distal measure of pandemic trajectory, and therefore there are more confounders between the policy and its change point that will affect the strength of the relationship. To further understand this association, in Figure 11 we visualize the distribution of the magnitude of the growth rate changes associated with lockdown and reopening policies for confirmed cases (top) and deaths (bottom). As expected, lockdown policies have overall lower growth rate changes for both confirmed cases and deaths, whereas reopening policies tend to have increased growth rate changes for both outcomes. We find that the separation in growth rate change between lockdown policies and reopening policies is greater for confirmed cases than for deaths.
As the pandemic became more apparent and the COVID-19 virus spread more widely, governments and public health agencies responded to this evolving situation, intending to reduce COVID-19 transmission by implementing several health policies. The US COVID-19 data show that the confirmed cases and deaths have had multiple changes in their growth rates according to the implemented policies. However, how many changes there are and when those changes occurred are not clear for many states and should be rigorously estimated by using a reasonable approach. Otherwise, a naive use of the existing epidemiological models that ignores these changes could result in an unrealistic prediction with spurious patterns.
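The chi-square test of independence between policy type and the sign of the growth rate change, reported above, can be sketched for a 2×2 table as follows. The counts here are hypothetical and are not those of Table 11; the function name is an assumption, and the sketch applies no continuity correction, which may differ from the settings used in the original R analysis.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 df) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: rows are policy types (lockdown, reopening),
# columns are growth rate changes (decrease, increase).
counts = [[80, 20],
          [30, 70]]
stat, p_value = chi_square_2x2(counts)
print(f"chi-square = {stat:.2f}, p = {p_value:.3g}")
```

A small p-value indicates that the direction of the growth rate change is not independent of the policy type; it does not by itself establish that the policy caused the change.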
To better understand the actual COVID-19 data with these changing patterns considered, we applied several statistical approaches to study such patterns of change. We used two different sources of data and combined their separate analytical results to develop a unified method from a statistical data science perspective. For the first part of our unified method, we used two major outcomes (confirmed cases and deaths) associated with the COVID-19 pandemic. By using a stepwise drifts random walk model and a GA technique for those outcomes, we assessed whether there were changes in the growth rates of the outcomes over time for all the US states, and we applied the GA change-point method to estimate the change points in each state. We found that there were similarities in the trajectory of US COVID-19 outcomes among the states and then categorized the continental US states and the District of Columbia into five groups based on the underlying changing patterns associated with the outcomes. As summarized in Tables 3 and 4, we found strong geographical similarities within the five pattern groups, especially among the northeastern, northern, southern, and central states. In the second part of our unified method, we used the findings from the change-point analysis to test whether there is any connection between the policies (NPIs) adopted by the states over time and the change points estimated from the COVID-19 outcomes. By using a time-to-event modeling approach, we created a random variable that represents the lag times between the policies and their associated change points. Figure 12 provides a graphical example for Florida that illustrates how this random variable can be interpreted. Once this random variable is observed, our hypothesis testing problem is well defined.
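The lag-time random variable described above can be computed from a dated sequence of policies and estimated change points. The sketch below is one possible reading of the definition, assuming that each lag is measured from the most recent policy to the first change point that follows it and that a change point with no new policy since the previous change point is skipped. The function name, the event encoding, and the dates are hypothetical.

```python
from datetime import date

def lag_times(events):
    """Days from the most recent policy to the first change point that follows it.

    events: list of (date, kind) tuples, where kind is "P" (policy) or "C" (change point).
    A change point with no policy since the previous change point is skipped.
    """
    lags = []
    last_policy = None
    for day, kind in sorted(events):
        if kind == "P":
            last_policy = day
        elif kind == "C" and last_policy is not None:
            lags.append((day - last_policy).days)
            last_policy = None  # a later change point needs a new policy
    return lags

# Hypothetical dates: three policies, two change points, one more policy, one more change point.
events = [
    (date(2020, 3, 10), "P"), (date(2020, 3, 17), "P"), (date(2020, 3, 24), "P"),
    (date(2020, 4, 5), "C"), (date(2020, 5, 1), "C"),
    (date(2020, 5, 20), "P"), (date(2020, 6, 2), "C"),
]
print(lag_times(events))  # [12, 13]
```

Here the second change point contributes no lag because no policy was enacted between the first and second change points.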
We tested the hypothesis on the short-term and long-term impact of policies on the change of growth rate for COVID-19 confirmed cases and deaths by choosing several values of the lag time in days. We found that a policy implementation takes on average about 10-14 days to be effective in changing the growth rate of COVID-19 outcomes. As the realizations of a random variable can be characterized by a probability distribution, we further found that our defined random variable can be modeled by the two-parameter log-normal distribution, which is commonly used for time-to-event data. Once we fitted the log-normal distribution to the data associated with our random variable, we were able to visualize how our log-normal model can be used to estimate, with a given degree of certainty, within how many days we will see an impact of a policy or NPI on the change point based on the outcomes of the COVID-19 pandemic. Also, our fitted log-normal model can be further used as a generative model to illustrate behavioral patterns of general public policy measures (NPIs) in changing the course of public health in a pandemic situation. For example, our generative model can be used as a tool to help local governments decide when to put a policy into effect and how long it will take to run its course before they can see any significant impact on public health outcomes. There are some directions to be considered for future research. First, our lag-time random variable, when evaluating the time gap between policy changes and their subsequent change points, considers only the latest policy change and the first subsequent change point. This approach could underestimate longer-term associations between policy changes and their change points. By extending our model, one could create multiple lag-time random variables for different policies and therefore could more rigorously assess the long-term effects of each policy.
This extended approach could also be useful to assess potential impacts of repeated implementations of the same policy on COVID-19 outcomes. Second, our lag-time log-normal model does not consider covariates that could have an impact on the lag-time distribution. Identifying such influential covariates and incorporating them into a relevant parametric or semiparametric model would be useful. Third, one could be interested in incorporating heterogeneity of policy changes among different groups. To be specific, school closure, for example, could affect primary, secondary, and high schools and colleges differently and could also differ across regions and locations within the same state. In addition, the impact of state-level policies versus local policies in long-term care facilities could be further studied, as a substantial portion of COVID-19 deaths in the United States have been reported in long-term care facilities. Finally, although we identified five patterns of the US COVID-19 trajectory and found geographical similarities within the patterns, further study is needed to investigate what might be driving the five patterns.
Resource availability
Lead contact
Further information regarding this study should be requested from the lead contact, Francesca Dominici (fdominic@hsph.harvard.edu). This study did not generate any physical materials.
Data and code availability
All code and data are available at the GitHub repository: https://github.com/jaechoullee/COVID-19-Policy-and-Changepoints.
Change-point detection for COVID-19 outcomes
Suppose $\{X_1, \ldots, X_n\}$ denotes a 7-day moving average series of daily new COVID-19 confirmed cases over a study period of $n$ days for a given state. Because the 7-day moving average COVID-19 case series is non-stationary with a very strong temporal correlation, we use a random walk process to model these features of the COVID-19 data.
In general, a random walk process is a stochastic process with the Markov property: the current state depends only on the immediately preceding state and not on any earlier states. 23 However, since adopted health policies could influence the rate of change in the number of new COVID-19 cases, we develop a model that takes these possible rate changes into account to avoid bias in the model parameter estimation. 24 More specifically, we assume that the rate of change in the 7-day moving average new COVID-19 case series has changed $m$ times on days $\tau_1, \ldots, \tau_m$. To incorporate these rate changes into the COVID-19 data with non-stationary and strong autocorrelation features, we consider a random walk model with varying stepwise drifts:

$$\ln X_t = \ln X_{t-1} + \Delta_t + Z_t, \qquad \text{(Equation 1)}$$

where $X_t$ denotes the 7-day moving average of the COVID-19 outcome (either daily confirmed cases or daily deaths) on day $t$, and $\Delta_t$ represents a time-dependent stepwise drift on day $t$ with multiple changes occurring at times $\tau_1, \ldots, \tau_m$, modeled as

$$\Delta_t = \delta_j \quad \text{if day } t \text{ is in regime } j, \quad j = 1, \ldots, m + 1,$$

and $\{Z_t\}$ is a mean-zero Gaussian white noise process with a variance of $\Omega_t^2$ varying at times $\tau_1, \ldots, \tau_m$:

$$\Omega_t^2 = \omega_j^2 \quad \text{if day } t \text{ is in regime } j, \quad j = 1, \ldots, m + 1.$$

Note that since $m$ change points occurred at times $\tau_1, \ldots, \tau_m$, the random walk model in Equation (1) experiences a total of $m + 1$ different regimes. The model parameters $\delta_j$ and $\omega_j^2$ are interpreted as the expected value and variance of day-to-day changes in $\ln X_t$, respectively, during the period in regime $j$, for $j = 1, \ldots, m + 1$. To elaborate, we reexpress the model in Equation (1) as

$$\ln X_t - \ln X_{t-1} = \Delta_t + Z_t, \qquad \text{(Equation 2)}$$

and, by taking expectation and variance on both sides of Equation (2), we obtain

$$E(\ln X_t - \ln X_{t-1}) = \delta_j \quad \text{and} \quad \mathrm{Var}(\ln X_t - \ln X_{t-1}) = \omega_j^2$$

if day $t$ is in regime $j$.

Figure 8. Cumulative density plot from the log-normal model showing days to decide as a function of cumulative probability for the confirmed cases.

We note that the difference in log-transformed daily outcomes, $\ln X_t - \ln X_{t-1}$, approximates the daily growth rate $p_t = (X_t - X_{t-1}) / X_{t-1}$.
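To make the stepwise drifts random walk in Equation (1) concrete, the following sketch simulates a series with known change points and then recovers the per-regime drift and variance as the sample mean and variance of the day-to-day log differences. It is an illustration only, not the authors' implementation: the function names, the starting value, the convention that a new regime begins on the change-point day, and all parameter values are assumptions.

```python
import math
import random

def simulate_log_series(n, change_points, drifts, sds, x0=100.0, seed=0):
    """Simulate X_1..X_n from ln X_t = ln X_{t-1} + drift_t + noise_t.

    change_points: sorted 1-based days on which a new regime starts (assumed convention).
    drifts, sds: one drift and one noise standard deviation per regime (m + 1 values).
    """
    rng = random.Random(seed)
    log_x = math.log(x0)
    series = []
    regime = 0
    for t in range(1, n + 1):
        if regime < len(change_points) and t >= change_points[regime]:
            regime += 1
        log_x += drifts[regime] + rng.gauss(0.0, sds[regime])
        series.append(math.exp(log_x))
    return series

def regime_estimates(series, change_points):
    """Sample mean and variance of ln X_t - ln X_{t-1} within each regime."""
    diffs = [math.log(b) - math.log(a) for a, b in zip(series, series[1:])]
    # diffs[k] is the log difference observed on day k + 2.
    bounds = [2] + list(change_points) + [len(series) + 1]
    estimates = []
    for start, end in zip(bounds, bounds[1:]):
        segment = diffs[start - 2:end - 2]
        mean = sum(segment) / len(segment)
        var = sum((d - mean) ** 2 for d in segment) / len(segment)
        estimates.append((mean, var))
    return estimates

# Hypothetical setting: 300 days, change points on days 100 and 200.
true_drifts = [0.03, -0.02, 0.01]
series = simulate_log_series(300, [100, 200], true_drifts, [0.01, 0.01, 0.01])
estimates = regime_estimates(series, [100, 200])
```

With the change points treated as known, the per-regime sample mean and variance are the natural estimates of the drift and variance parameters; estimating the change points themselves is the harder problem that the search algorithm addresses.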
To elaborate, the growth rate $p_t$ can be reexpressed as $X_t = (1 + p_t) X_{t-1}$, which in turn is equivalent to $\ln X_t - \ln X_{t-1} = \ln(1 + p_t)$. If the growth rate $p_t$ is relatively small in magnitude, as is the case for COVID-19, the right-hand term is approximated as $\ln(1 + p_t) \approx p_t$. Therefore, applying the stepwise drifts random walk model to log-transformed outcomes is an effective way to model changes in growth rates. If the change-point number $m$ and the change-point times $\tau_1, \ldots, \tau_m$ are known a priori, we can estimate the random walk model parameters using the maximum likelihood method. However, because $m$ and $\tau_1, \ldots, \tau_m$ are unknown in many practical settings, including COVID-19 outcomes, we treat these change points as unknown parameters and estimate them from the data. In addition, although the starting dates of newly implemented health policies are known, we do not know the exact dates of actual changes in day-to-day growth rate because many factors induce the changes with unknown time lags. For these reasons, we propose estimating the change-point number and times based on the COVID-19 data. To estimate the change-point number $m$ and change-point times $\tau_1, \ldots, \tau_m$ in the model (Equation 1), we use a GA. The GA is a data-driven search algorithm that finds an optimal solution for a given target function by implementing the principle of natural selection. Similar to the GA methods developed by Davis et al., Li and Lund, and Lee et al., 19-22 our GA method uses a penalized likelihood approach with a penalty based on the minimum description length criterion. Our GA method can successfully detect multiple change points, unlike some other at-most-one-change-point methods used in recent COVID-19 change-point results.
17, 18 Further, our GA method is distinct from the existing GA methods in that our GA uses the likelihood function derived from the stepwise drifts random walk model in Equation (1) to specifically adapt to the non-stationary and strong temporal correlation features in the COVID-19 data. By applying the GA method with a stepwise drifts random walk model, we estimate the change points in the 7-day moving average of daily positive cases from March 8, 2020, to February 28, 2021, and the change points in the 7-day moving average of daily deaths from March 18, 2020, to February 28, 2021. To gather evidence regarding a potential association between a policy (NPI) and the change points in COVID-19 outcomes, we have created a random variable Y by using the following definition: Y = the number of days from the last policy to the next change point. Note that we ignore any change point that has no policy enacted since the previous change point. Furthermore, we calculate Y separately from the change points for the number of daily positive cases and the number of daily deaths. To be specific, assume a state policy (NPI) is denoted by $P$ and a change point by $C$, and suppose we find the following sequence of events: $P_1, P_2, P_3, C_1, C_2, P_4, C_3$. Then, the first value of Y is the number of days between $P_3$ and $C_1$; the second value of Y is the number of days between $P_4$ and $C_3$, and so on. Figure 12 is an example of how the values of this random variable Y were computed based on the daily confirmed COVID-19 case data of the state of Florida. We did this computation for all 50 states and the District of Columbia to obtain the set of observed values for our defined random variable Y. Once the Y values are computed, our first hypothesis test is as follows: Test 1. $H_0$ (null hypothesis): after a policy is implemented, a day-to-day growth rate change in COVID-19 confirmed cases occurs on average in at most $d$ days.
$H_1$ (alternative hypothesis): after a policy is implemented, a day-to-day growth rate change in COVID-19 confirmed cases occurs on average in more than $d$ days. Here, we test the above hypotheses for four different values of $d$ (in days): 3 (for immediate impact), 7, 10 (for moderate impact), and 14 (for long-term impact). Overall, this hypothesis testing scheme reveals "if and how long it takes for an NPI to cause significant impact (change point) on the daily positive case counts." Next, we formulate similar statistical hypotheses but with Y defined based on the change points for the daily count of COVID-19-related deaths as follows: Test 2. $H_0$ (null hypothesis): after a policy is implemented, a day-to-day growth rate change in the number of COVID-19-related deaths occurs on average in at most $d$ days. $H_1$ (alternative hypothesis): after a policy is implemented, a day-to-day growth rate change in the number of COVID-19-related deaths occurs on average in more than $d$ days. We also test these hypotheses for four different values of $d$ (in days): 3 (for immediate impact), 7, 10 (for moderate impact), and 14 (for long-term impact). This hypothesis testing will help us better understand "if and how long it takes for an NPI to cause significant impact (change point) on the daily number of COVID-19-related deaths." Because the data on our Y (the number of days from the last policy to the next change point for daily positive cases and daily deaths) do not exactly follow a normal distribution and the sample size is not large enough for asymptotic normality, we use a non-parametric test, the Wilcoxon test, to test the hypotheses in tests 1 and 2.
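A one-sided test of this form can be sketched as follows, assuming that the Wilcoxon test referred to here is the one-sample signed-rank variant applied to the differences between each lag and the threshold. The function name and the lag values are hypothetical, and the p-value uses the large-sample normal approximation without a continuity or tie correction, so an exact or library implementation may be preferable for small samples.

```python
import math
from statistics import NormalDist

def wilcoxon_greater(values, d):
    """One-sided Wilcoxon signed-rank test of H0: location <= d vs H1: location > d.

    Observations equal to d are dropped and tied absolute differences get averaged ranks.
    Returns (W_plus, p_value), where W_plus is the sum of ranks of positive differences.
    """
    diffs = [v - d for v in values if v != d]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average of ranks i + 1 through j + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, diff in zip(ranks, diffs) if diff > 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / sd
    return w_plus, 1.0 - NormalDist().cdf(z)

# Hypothetical lag times in days (illustrative only, not the study data).
example_lags = [3, 5, 6, 8, 9, 11, 14, 20, 35, 60]
w_plus, p_value = wilcoxon_greater(example_lags, d=7)
```

Repeating the call for each threshold (3, 7, 10, and 14 days) and comparing each p-value with the chosen significance level mirrors the structure of tests 1 and 2.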
To quantify the uncertainties for our random variable Y, we need to identify which probability distribution fits the data best so that we can calculate the probability of impact of a policy decision on the change point of the outcome (positive case counts and death counts) from the pandemic. It is important to use a probability distribution that accurately reflects the nature of the data. There are a few choices in this situation. From the way we have defined the random variable Y, the Y values can be treated as time-to-event data. For time-to-event or lifetime data, a common choice is a log-normal distribution; in fact, later we will justify why the "log-normal" model is more suitable for this problem compared with other competing models. Therefore, we assume that Y follows a log-normal distribution with parameters $\mu$ and $\sigma > 0$. The probability density function (PDF) of Y is hence expressed as

$$f(y) = \frac{1}{y \sigma \sqrt{2\pi}} \exp\left\{ -\frac{(\ln y - \mu)^2}{2\sigma^2} \right\} \quad \text{for } y > 0.$$

Note that we have two sets of data on Y, one when we use the positive case counts for the change points and the other when we use the death counts for the change points. Therefore, we fit a separate log-normal distribution to each of these separately calculated sets of Y values. We visually assess the goodness of fit by overlaying the histogram with the PDF. We also produce Q-Q and P-P plots. For a numerical verification, we compare the log-normal model to a few other competing models (Gamma, Gumbel, and Weibull) using the two model selection indices AIC and BIC. The model with the smallest AIC and BIC values is selected as our best model. In both cases, the log-normal model gives us the smallest AIC and BIC values compared with all the others. All our hypothesis testing was performed with a 5% level of significance.
We thank the reviewers for their considerate comments and suggestions that have greatly improved the quality of this paper. F.D.
was supported by National Institutes of Health (NIH) grant P30ES000002 and the 2020 Starr Friedman Award.
T.D., J.L., S.C., J.C., K.Z., and A.B. contributed to formulation of the idea, data preparation, data analysis, data interpretation, and writing of the manuscript. J.L., J.C., and A.B. additionally contributed to the data preparation and the manuscript. F.D. contributed to formulation of the idea, study design, data interpretation, funding, and writing of the manuscript. All authors contributed to the interpretation of the results and critical revision of the manuscript for important intellectual content and approved the final version of the manuscript. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.
The authors declare that they have no competing interests.
We worked to ensure gender balance in the recruitment of human subjects. One or more of the authors of this paper self-identifies as an underrepresented ethnic minority in science. The author list for this paper includes contributors from the location where the research was conducted who participated in the data collection, design, analysis, and/or interpretation of the work.
References
1. First case of 2019 novel coronavirus in the United States
2. Evidence for limited early spread of COVID-19 within the United States
3. State COVID-19 data and policy actions
4. Coronavirus (COVID-19
5. West: coronavirus-related restrictions by state
6. COVID-19 and the policy sciences: initial reactions and perspectives
7. COVID-19 transmission in the U.S. before vs. after relaxation of statewide social distancing measures
8. The effect of large-scale anti-contagion policies on the COVID-19 pandemic
9. Determining the optimal strategy for reopening schools, the impact of test and trace interventions, and the risk of occurrence of a second COVID-19 epidemic wave in the UK: a modelling study
10. Quarantine alone or in combination with other public health measures to control COVID-19: a rapid review
11. impact of social vulnerability on COVID-19 incidence and outcomes in the United States
12. Revealing the unequal burden of COVID-19 by income, race/ethnicity, and household crowding: US county versus zip code analyses
13. Association of public health interventions with the epidemiology of the COVID-19 outbreak in Wuhan, China
14. Effect of non-pharmaceutical interventions for containing the COVID-19 outbreak in China
15. Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions
16. Time series analysis of COVID-19 infection curve: a change-point perspective
17. The effect of social distance measures on COVID-19 epidemics in Europe: an interrupted time series analysis
18. Social distancing merely stabilized COVID-19 in the United States
19. Structural break estimation for nonstationary time series models
20. Multiple changepoint detection via genetic algorithms
21. Trends in extreme U.S. temperatures
22. Trend assessment for daily snow depths with changepoint considerations
23. Time Series Analysis and its Applications with R Examples
24. Detection of undocumented changepoints: a revision of the two-phase regression model