key: cord-0804411-3rxdpvfq
authors: Salichos, L.; Warrell, J.; Cevasco, H.; Chung, A.; Gerstein, M.
title: Genetic determination of regional connectivity in modelling the spread of COVID-19 outbreak for improved mitigation strategies
date: 2021-02-02
journal: nan
DOI: 10.1101/2021.01.30.21250785
sha: 420487081952cb2974562c17f9e9dfb0c05370ea
doc_id: 804411
cord_uid: 3rxdpvfq

Covid-19 has resulted in the death of more than 1,500,000 individuals. Due to the pandemic's severity, thousands of genomes have been sequenced and publicly stored with extensive records, an unprecedented amount of data for an outbreak in a single year. Simultaneously, prediction models offered region-specific and often contradictory results, while states or countries implemented mitigation strategies with little information on success, precision, or agreement with neighboring regions. Even though viral transmissions have already been documented in a historical and geographical context, few studies have aimed to model geographic and temporal flow from viral sequence information. Here, using a case study of 7 states, we model the flow of the Covid-19 outbreak with respect to phylogenetic information, viral migration, inter- and intra-regional connectivity, and epidemiologic and demographic characteristics. By assessing regional connectivity from genomic variants, we can significantly improve predictions in modeling the viral spread and intensity. Contrary to previous results, our study shows that the vast majority of the first outbreak can be traced to very few lineages, despite the existence of multiple worldwide transmissions. Moreover, our results show that while the distance from hotspots is initially important, connectivity becomes increasingly significant as the virus establishes itself. Similarly, isolated local strategies, such as relying on herd immunity, can negatively impact neighboring states.
Our work suggests that we can achieve more efficient unified mitigation strategies with selective interventions.

Covid-19 related deaths have surpassed 1,500,000 worldwide and 330,000 in the United States. Due to the importance of the pandemic, many resources are available for COVID-19 genome research, including GenBank, GISAID, and Nextstrain [1] [2] [3]. Due to the severity of the pandemic [...]

medRxiv preprint doi: https://doi.org/10.1101/2021.01.30.21250785; this version posted February 2, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

[...] Pennsylvania, Maryland and Virginia, a secondary outbreak, which also circulates in NY, CT and MA, appears significant (figure 2, S1-7).

Assessing Viral Connectivity Between States
By selecting a set of (whenever possible) 50 reference sequences per state (in addition to the set of world reference sequences), we built a phylogenetic tree that includes sequences from all 7 states. We then inferred a connectivity map between the different states by parsing the tree's bipartitions (figure 3i). For this, we examined all possible connected pairs of sequences that cluster together, while moving hierarchically from smaller to larger bipartitions without double-counting. To establish directionality between pairs, we used sampling dates. For example, the pair (NY-PV09151_USA_NY_2020-03-22, CT-UW-6574_USA_CT_2020-04-03) would be counted as NY -> CT, which denotes one transitional connection from NY to CT (incoming for CT). Similarly, the pair (NY-PV08434_USA_NY_2020-03-18, NY-NYUMC659_USA_NY_2020-03-18) would be counted as ingrown connectivity NY -> NY. Overall, the NY outbreak showed the highest connectivity, while VA and MA showed the lowest.
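The date-based direction rule described above can be illustrated with a minimal sketch. It assumes that sequence labels end in `_<STATE>_<YYYY-MM-DD>`, as in the two examples given, and that the clustered pairs have already been extracted from the tree's bipartitions (that step is not shown). The function names `parse_label` and `count_connections` are hypothetical, and keeping the listed order for same-date pairs from different states is an assumption, since the text does not say how such ties are resolved.

```python
from collections import Counter
from datetime import date


def parse_label(label):
    """Split a label such as 'NY-PV09151_USA_NY_2020-03-22' into (state, sampling date).

    Assumes the last two underscore-separated fields are the state code and an ISO date.
    """
    fields = label.split("_")
    return fields[-2], date.fromisoformat(fields[-1])


def count_connections(pairs):
    """Count directed state-to-state connections; the earlier sample is taken as the source."""
    counts = Counter()
    for first, second in pairs:
        (state_a, date_a), (state_b, date_b) = parse_label(first), parse_label(second)
        if date_a > date_b:
            # Swap so that the connection points from the earlier sample to the later one.
            state_a, state_b = state_b, state_a
        # Same-date pairs keep their listed order (an assumption, not from the text).
        counts[(state_a, state_b)] += 1
    return counts


# The two example pairs quoted in the text.
example_pairs = [
    ("NY-PV09151_USA_NY_2020-03-22", "CT-UW-6574_USA_CT_2020-04-03"),
    ("NY-PV08434_USA_NY_2020-03-18", "NY-NYUMC659_USA_NY_2020-03-18"),
]
for (source, target), n in count_connections(example_pairs).items():
    print(f"{source} -> {target}: {n}")
```

Pairs with the same source and target state correspond to what the text calls ingrown connectivity; pairs with different states count as outgoing for the source and incoming for the target.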
Interestingly, even though CT showed high connectivity comparable to NJ, its lower number of outgoing versus incoming connections explains the low connectivity shown by MA. This is also supported by the outbreak's high transitional connectivity from NY to CT (NY -> CT), rendering CT a potential bottleneck (figure 3).

Previously, by considering the geographic distance from the initial hotspots NY and WA, we found a strong association between the distance and the severity of the outbreak for the first [...]

[...] incoming, outgoing and ingrowing transmission rates between states, and a transmission-based normalized distance of each state from New York representing the viral flow (described below). The full feature set includes: maximum Reproduction rate per state (Rt), usually in April, Urbanization Index (U), Geographic or transition-based distance from New York (D or trD), and incoming, outgoing and ingrowing transmission rates. To determine the importance of regional transitional connectivity in addition to these factors in explaining and predicting the outbreak intensity during the whole first wave, we built 4 regression models with increasing complexity that combine phylogenetic information with epidemiological data from 10 dates (April 29th, May [...] (see methods). By replacing D with trD we were able to significantly increase our model's
predictability throughout the first wave (p=0.0003, figure S8). In our third model (figure 4iii), we returned to using the geographic distance D, but this time we also included each state's total incoming, outgoing and ingrowing rates. Finally, in our 4th model (figure 4iv), we again replaced D with trD, while also including states' incoming and outgoing rates. While our 4th model also integrates transitional connectivity in trD, this information is also used in calculating each state's incoming, outgoing and ingrowing connectivity. Therefore, as expected, the factors trD and incoming and outgoing rates often behave in a complementary manner. However, model 4 is still significantly more informative than model 3 (p=0.0273). Moreover, model 4 indicates that the initial importance of trD at the beginning of the outbreak is gradually replaced by the state's connectivity rate as the outbreak spreads away from the initial hotspots.

[...] Yule process, ii) exponential growth, iii) logistic growth, iv) Bayesian Skyline, v) Birth-Death skyline. The BEAST suite also includes multiple software tools that aid in selecting appropriate models and parameters (BEAUti) to infer a phylogenetic tree using Bayesian inference, coalescent theory and speciation with respect to the time of sequence collection. We evaluated the efficacy of these models using Tracer v1.7.1 (46). The best model (Yule process) for this data was selected based on the estimated sample size, posterior probabilities, and reports on algorithm convergence.

Estimating Transitional Connectivity
Using custom scripts, we parsed the inferred phylogenetic trees into groups of sequences based on the tree bipartitions. Then, by further parsing the groups in ascending order of group size (from groups of 2 to X=10), we determined all possible pairs and state connectivity based on dates. For example, the pair of sequences {NY-PV09151_USA_NY_2020-03-22 and CT-UW-6574_USA_CT_2020-04-03} would depict an outgoing connectivity between New York (NY) and Connecticut (CT), denoted as NY>CT +1 (see figure 3i).

[...] on the prediction of the per-state death rate. We use data from 7 states (NY, CT, MA, PA, NJ, VA, MD), over a series of 10 timepoints from April 29 to July 23. We regress the per-state death rate (the cumulative ratio of deaths to cases from the earliest date) on either three variables (Transmission rate (R0), Urbanism, Distance from NYC) or six variables (Transmission rate (R0), Urbanism, Distance from NYC, ingoing/outgoing/ingrowing rates per state). Prior to the analysis, we Z-score all variables (enforcing zero mean and unit variance). For distance from NYC, we use either the geographic distance between the state's capital and NYC, or the transition distance as defined below.
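The preprocessing and regression steps described above can be sketched as follows. This is only an illustration under stated assumptions: the text says the variables are Z-scored and the death rate is regressed on three or six of them, but it does not name the estimator, so ordinary least squares is assumed here. The helper names `zscore` and `fit_linear` are hypothetical, and the numbers in the demo are random placeholders, not data from the study.

```python
import numpy as np


def zscore(values):
    """Standardize each column to zero mean and unit variance."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean(axis=0)) / values.std(axis=0)


def fit_linear(features, target):
    """Fit a least-squares model on Z-scored inputs (assumed estimator).

    No intercept column is needed: after Z-scoring, features and target all
    have zero mean, so the least-squares intercept is zero.
    """
    features_z, target_z = zscore(features), zscore(target)
    coefficients, *_ = np.linalg.lstsq(features_z, target_z, rcond=None)
    return coefficients, features_z @ coefficients


# Placeholder demo: 7 rows (one per state) and 3 columns standing in for
# transmission rate, urbanism, and distance from NYC. Values are random.
rng = np.random.default_rng(0)
demo_features = rng.normal(size=(7, 3))
demo_target = rng.normal(size=7)
demo_coefficients, demo_predictions = fit_linear(demo_features, demo_target)
print(demo_coefficients)
```

Because all variables are on the same standardized scale, the fitted coefficients can be compared directly across factors, which is one common reason to Z-score before a regression of this kind.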
For each model, we calculate the log-likelihood by fitting a variance parameter to the predicted outputs and using a Gaussian noise model. Hence, we set [...]

[...] probability between states i and j. We make such updates for i = s1, j = s2 and i = s2, j = s1 simultaneously, hence breaking the link in both directions. We then recalculate the distances between states i and j, given that the link between s1 and s2 has been broken. We then use these to estimate the overall predicted reduction in the death rate given the break as: [...]

Here, we used SARS-CoV-2 genomes to determine regional connectivity in a case study of 7 states, where New York acted as an initial hotspot. By combining epidemiological, demographic and genetic information, we used four regression models to evaluate the importance of different factors that contribute to outbreak severity throughout the first viral wave.
Our results can explain the discordance between regions and strategies, especially between the first and second pandemic waves. For example, states at a distance from hotspots are able to deal with a milder initial outbreak before the virus establishes itself at a later timepoint, depending on transitional distance (i.e., the speed of the wave) and regional connectivity. Similarly, states with lower connectivity (e.g., naturally or physically isolated regions) can be more efficient in battling the viral spread, as they deal with a reduced viral wave and fewer incoming infections. This also suggests that reducing incoming transmission routes (through pharmaceutical or non-pharmaceutical interventions) can have a significant effect in addition to local mitigation strategies such as lockdowns.
This does not necessarily mean complete isolation, but rather a blockade on transmission routes with high connectivity. However, our results also suggest that states deciding to follow less stringent mitigation strategies are also largely responsible for their outgoing viral connectivity, affecting neighboring regions, while often taking advantage of the low incoming connectivity resulting from neighboring lockdowns in return.
By deriving genetic connectivity between regions using genomic information, we combined genetic information with demographic and epidemiological data to create a model and a proxy for the flow of the viral wave in order to study factors that temporally contribute to the severity of local outbreaks throughout the pandemic. Then we used this model to consider the outcome of selective intervention strategies using geographic blockades. Overall, our results suggest that unified mitigation strategies are more efficient in tackling a pandemic, while also providing a framework within which to pursue these strategies. Our framework can be implemented for both pharmaceutical (e.g., vaccination) and non-pharmaceutical interventions (e.g., lockdowns, blockades).
NextStrain: Real-time tracking of pathogen evolution
The hepatitis C sequence database in [...]
Evolution and epidemic spread of SARS-CoV-2 in Brazil
Evolutionary and structural analyses of SARS-CoV-2 D614G spike protein mutation now documented worldwide
Accommodating individual travel history and unsampled diversity in Bayesian phylogeographic inference of SARS-CoV-2
Tracking the COVID-19 pandemic in Australia using genomics
Genomic surveillance reveals multiple introductions of SARS-CoV-2 into [...]
Evidence for Limited Early Spread of COVID-19 Within the United [...]
Coast-to-Coast Spread of SARS-CoV-2 during the Early Epidemic in the United States
Cryptic transmission of SARS-CoV-2 in Washington State
The emergence of SARS-CoV-2 in Europe and North America
Epidemiological data from the COVID-19 outbreak, real-time case information
Prediction models for diagnosis and prognosis of covid-19: Systematic review and critical appraisal
Estimation of Excess Deaths Associated with the COVID-19 Pandemic in the United States
Population-level COVID-19 mortality risk for non-elderly individuals overall and for non-elderly individuals without underlying diseases in pandemic epicenters
Impact of Non-pharmaceutical [...]

The authors declare no competing interests.

[...] York outbreak, which we consider as the outbreak epicenter. In (2iii-vi) we show the phylogenetic tree analysis for Massachusetts, New Jersey, Virginia and Connecticut as rooted by the older lineage that contains sequences from Wuhan dating in 2019.
Figure 3. In i) we use a tree adaptation example to explain the workflow that we implemented in order to assign directed connectivity, incoming, ingrowing and outgoing connections between [...]

[...] (iii-iv) Models 3-4 (six factors). Likelihood significance was found for models (1) vs (2) and (3) vs (4). (Model 1 vs. 2 / 3 vs. 4; p=0.0003, 0.0273 resp., 2-sided t-test for Pearson's r). We then estimated the sum of total deaths that would be saved if we remove any geographic link [...]