key: cord-0282826-gwxn50d9
authors: Moncla, L. H.; Black, A.; DeBolt, C.; Lang, M.; Graff, N. R.; Perez-Osorio, A. C.; Mueller, N. F.; Haselow, D.; Lindquist, S.; Bedford, T.
title: Repeated introductions and intensive community transmission fueled a mumps virus outbreak in Washington State
date: 2020-10-21
journal: nan
DOI: 10.1101/2020.10.19.20215442
sha: f1587832f7c85fa1e76d7b4bf9c4f36f9b24673c
doc_id: 282826
cord_uid: gwxn50d9

In 2016/2017, Washington State experienced a mumps outbreak despite high childhood vaccination rates, with cases more frequently detected among school-aged children and members of the Marshallese community. Sequencing 166 mumps genomes revealed that mumps was introduced into Washington approximately 13 times, primarily from Arkansas, sparking multiple, co-circulating transmission chains. Neither vaccination status nor age was a strong determinant of transmission. Instead, the outbreak in Washington was overwhelmingly sustained by transmission within the Marshallese community. Our findings underscore the utility of genomic data to clarify epidemiologic factors driving transmission, and pinpoint contact networks as critical determinants of mumps transmission in Washington.

In 2016 and 2017, mumps virus swept the United States in the country's largest outbreak since the pre-vaccine era [1]. Washington State was heavily affected, reporting 889 confirmed and probable cases. Longitudinal studies [2], epidemiologic outbreak investigations [3], and epidemic models [4] suggest that mumps vaccine-induced immunity wanes over 13-30 years, consistent with the preponderance of young adult cases in recent outbreaks. As in other recent mumps outbreaks, most Washington cases in 2016/17 were vaccinated. Unusually though, incidence was highest among children aged 10-18 years, younger than expected given waning immunity.
The outbreak was also peculiar in that approximately 52% of the total cases were Marshallese, an ethnic community that comprises ~0.3% of Washington's population. These same phenomena were also observed in Arkansas. Of the 2,954 confirmed and probable Arkansas cases, 57% were Marshallese, and 57% were children aged 5-17 [5]. Among school-aged children in Arkansas and Washington, >90% had previously received 2 doses of MMR vaccine [5]. The high proportion of vaccinated cases, younger-than-expected age at infection, disproportionate impact on the Marshallese community, and epidemiologic link to Arkansas suggest that factors beyond waning immunity are necessary to explain mumps transmission during this outbreak in Washington.

Review Board, and classified as not involving human subjects. Samples were selected for sequencing to maximize temporal and epidemiologic breadth and to ensure successful sequencing. As such, samples were chosen based on the date of sample collection, the PCR cycle threshold (Ct) (targeting samples with Ct < 30), case vaccination status, and community status (Marshallese or non-Marshallese). All metadata, including case vaccination status, were transferred from WA DOH to FHCRC in a de-identified form.

medRxiv preprint doi: https://doi.org/10.1101/2020.10.19.20215442; this version posted October 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

transcription reaction, and for each pool for each PCR reaction. These negative controls were carried through the library preparation process and sequenced alongside actual samples. Any samples whose negative controls from any step in the process resulted in >10x mumps genome coverage were re-extracted and sequenced.
Human reads were removed from raw FASTQ files by mapping to the human reference genome GRCh38 with bowtie2 [18] version 2.3.2 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml). Reads that did not map to the human genome were output to separate FASTQ files and used for all subsequent analyses. Illumina data were analyzed using the pipeline described in detail at https://github.com/lmoncla/illumina_pipeline. Briefly, raw FASTQ files were trimmed using Trimmomatic [19] (http://www.usadellab.org/cms/?page=trimmomatic), trimming in sliding windows of 5 base pairs and requiring a minimum Q-score of 30. Reads that were trimmed to a length of <100 base pairs were discarded. Trimming was performed with the following command: java -jar

then remapped each sample's trimmed FASTQ files to its own consensus sequence. These bam files were again manually inspected in Geneious, and a final consensus sequence was

American mumps genomes on nextstrain.org/mumps/na. We used a discrete trait model [27] and estimated migration rates using Bayesian stochastic search variable selection (BSSVS) and ancestral states with 27 geographic locations. Here, "state" refers to the inferred ancestral identity of an internal node, where the inferred identity could be any of the 27 geographic locations (US states and Canadian provinces) in the dataset. For the prior on non-zero rates for BSSVS, we specified a Poisson distribution with mean 0.69 with an offset of 26. As a prior on each pairwise migration rate, we used an exponential distribution with mean 1. All other priors were left at default values.
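To make the BSSVS prior concrete, the following minimal sketch evaluates an offset Poisson prior on the number of non-zero rates. It assumes the common reading that the offset (26, one fewer than the 27 locations) is the minimum number of non-zero rates and that the Poisson with mean 0.69 describes the excess above that minimum. The function name `offset_poisson_pmf` is illustrative and is not a BEAST function.

```python
import math

def offset_poisson_pmf(k, mean=0.69, offset=26):
    """Prior probability of exactly k non-zero rates.

    Assumption: (k - offset) ~ Poisson(mean), so values of k below
    the offset have prior probability zero.
    """
    excess = k - offset
    if excess < 0:
        return 0.0
    return math.exp(-mean) * mean ** excess / math.factorial(excess)

# Prior mass on the smallest allowed numbers of non-zero rates.
prior_table = {k: offset_poisson_pmf(k) for k in range(26, 31)}
```

Because 0.69 is close to ln 2, roughly half of the prior mass under this reading falls on exactly 26 non-zero rates, so the prior favors sparse migration networks.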
We ran this analysis for 100 million steps, sampling every 10,000, and removed the first 10% of sampled states as burn-in. A maximum clade credibility tree was summarized with TreeAnnotator, using the mean heights option. All tree plotting was performed with baltic (https://github.com/evogytis/baltic). Input XML files and output results are available at https://github.com/blab/mumps-wa-phylodynamics/tree/master/phylogeography.

Testing for descendants in divergence trees

transmission chains than other groups, we developed a statistic to quantify transmission in the tree. Using the tree JSON output from the Nextstrain pipeline [21], we traversed the tree from root to tip. For each tip that lay on an internal node, i.e., had a branch length of nearly 0 ($< 1 \times 10^{-16}$), we counted the number of descendants. We collapsed very small branches (those with branch lengths less than $1 \times 10^{-16}$) to obtain polytomies. We then classified tips as either having descendants (i.e., the number of descendants was > 0) or not having descendants. Here, we define a "descendant" as a tip that occurs in any downstream portion of the tree, i.e., it falls along the same lineage but to the right of the parent tip. A diagram of what we classify as "descendant tips" is shown in Figure 4a. The probability of having descendants was evaluated as a function of community status, age, and vaccination status with logistic regression as described below.

For each Washington tip in the tree, we classified it as either having descendants (coded as 1) or not having descendants (coded as 0).
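The descendant classification described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the tree is assumed to be a nested dictionary with hypothetical keys `name`, `branch_length`, and `children` (the actual Nextstrain JSON schema differs), and a tip is taken to "lie on" an internal node when it attaches to that node with a branch length below the threshold. Its descendants are then all other tips below that node that are reached through a non-zero branch.

```python
EPS = 1e-16  # branch lengths below this are treated as zero

def is_tip(node):
    return not node.get("children")

def collapse(node):
    """Merge zero-length internal branches into their parent (modifies the tree)."""
    if is_tip(node):
        return node
    merged = []
    for child in node["children"]:
        child = collapse(child)
        if not is_tip(child) and child["branch_length"] < EPS:
            merged.extend(child["children"])  # creates a polytomy
        else:
            merged.append(child)
    node["children"] = merged
    return node

def count_tips(node):
    if is_tip(node):
        return 1
    return sum(count_tips(child) for child in node["children"])

def descendant_counts(root):
    """Map each tip name to its number of downstream tips."""
    counts = {}

    def visit(node):
        if is_tip(node):
            counts.setdefault(node["name"], 0)
            return
        on_node, downstream = [], 0
        for child in node["children"]:
            if is_tip(child) and child["branch_length"] < EPS:
                on_node.append(child)           # tip sits on this node
            else:
                downstream += count_tips(child)  # tips further down the tree
        for tip in on_node:
            counts[tip["name"]] = downstream
        for child in node["children"]:
            visit(child)

    visit(collapse(root))
    return counts

def has_descendants(root):
    """Binary outcome per tip: 1 if it has any descendants, else 0."""
    return {name: int(n > 0) for name, n in descendant_counts(root).items()}
```

The binary outcome returned by `has_descendants` corresponds to the 0/1 response variable used in the logistic regression.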
For each tip, we coded its corresponding age, vaccination status, and community membership as a predictor variable input into a logistic regression model. We coded these attributes as follows: For community membership, non-Marshallese tips were coded as 0 and Marshallese tips were coded as 1. Age was coded as a single, continuous variable. In our dataset, there were 3 classifications for vaccination status: up-to-date, not up-to-date, and unknown vaccination status. According to the Advisory Committee on Immunization Practices (ACIP) [28], individuals aged 5-18 had to have received both recommended doses of mumps-containing vaccine, children aged 15 months to 5 years required 1 dose of mumps-containing vaccine, and adults over 18 had to have received at least 1 dose of mumps-containing vaccine to be classified as up-to-date for mumps vaccination. Individuals under 15 months are considered up-to-date without any doses of mumps-containing vaccine. Not up-to-date individuals are those with a known vaccination status who did not qualify under criteria to be classified as up-to-date. Individuals who could not provide documentation regarding their MMR vaccination history were considered to have "unknown" vaccination status. Individuals with "known" vaccination status could either be fully up-to-date, undervaccinated, or unvaccinated.
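As a compact restatement of the classification rules above, the sketch below maps an age and a documented dose count to one of the three categories. The function name, argument names, and category strings are hypothetical. The handling of someone aged exactly 5 years (placed in the 5-18 bracket here) is an assumption, since the text lists that age in two brackets.

```python
def vaccination_status(age_years, doses, documented=True):
    """Classify mumps vaccination status from age and documented dose count.

    Assumption: age exactly 5 is treated as part of the 5-18 bracket.
    """
    if not documented or doses is None:
        return "unknown"
    if age_years < 1.25:      # under 15 months: no doses required
        required = 0
    elif age_years < 5:       # 15 months to 5 years: 1 dose
        required = 1
    elif age_years <= 18:     # 5-18 years: both recommended doses
        required = 2
    else:                     # adults over 18: at least 1 dose
        required = 1
    return "up-to-date" if doses >= required else "not up-to-date"

# Example: a 10-year-old with one documented dose is not up-to-date.
example = vaccination_status(10, 1)
```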
To ensure that we measured the effect of vaccination among individuals who knew their vaccination status, we coded vaccination information using two dummy variables in our logistic regression, one signifying whether vaccination status was known or not, and one indicating whether vaccination was up-to-date or not. We then fit a logistic regression model to these data using the glm function in R (https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/glm), specifying a binomial model as

$\Pr(\text{having descendants}) \sim \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$,

where $x_1$ represents a 0 or 1 value for membership in the Marshallese community (not Marshallese coded as 0, Marshallese coded as 1), $x_2$ represents the numeric value of age, $x_3$ represents a 0 or 1 value for whether vaccination status is unknown (known vaccination status coded as 0, unknown vaccination status coded as 1), and $x_4$ represents a 0 or 1 value for whether vaccination status is up-to-date (up-to-date coded as 0 and not up-to-date coded as 1). Under this formulation, an individual with unknown vaccination status would be coded as $x_3 = 1$, $x_4 = 0$, an individual who is up-to-date would be coded as $x_3 = 0$, $x_4 = 0$, and an individual who is not up-to-date would be coded as $x_3 = 0$, $x_4 = 1$. This encoding allows us to evaluate the effects of having an unknown vaccination status and a vaccination status that is not up-to-date. Age was normalized such that values fall between 0 and 1 by: ($x_2$ - minimum age in dataset) / (maximum age in dataset - minimum age in dataset).

P-values were assigned via a Wald test, and inferred coefficients were exponentiated to return odds ratios. All code used to parse the divergence tree and formulate and fit the regression
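The predictor coding described above can be sketched as follows. This is a minimal illustration rather than the authors' R code: the function names and the status strings ("unknown", "not up-to-date", "up-to-date") are hypothetical labels chosen for the example, and model fitting itself is left to a statistics package.

```python
import math

def normalize_age(age, min_age, max_age):
    """Min-max scale age to the interval [0, 1]."""
    return (age - min_age) / (max_age - min_age)

def encode_case(marshallese, age, vacc_status, min_age, max_age):
    """Return the predictors (x1, x2, x3, x4) for one tip."""
    x1 = 1 if marshallese else 0                     # community membership
    x2 = normalize_age(age, min_age, max_age)        # normalized age
    x3 = 1 if vacc_status == "unknown" else 0        # status unknown
    x4 = 1 if vacc_status == "not up-to-date" else 0 # known and not up-to-date
    return (x1, x2, x3, x4)

def odds_ratio(beta):
    """Exponentiate a fitted logistic regression coefficient."""
    return math.exp(beta)

# Example: a Marshallese case aged 32 with unknown status, ages spanning 0-64.
row = encode_case(True, 32, "unknown", min_age=0, max_age=64)
```

With this coding, up-to-date individuals with known status form the reference category, so the coefficients on $x_3$ and $x_4$ are each interpreted relative to that group.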
Using the full set of North American mumps sequences, we designated all non-Washington North American sequences as "background" sequences. We then separated Washington sequences into Marshallese tips (57 total sequences) and non-Marshallese tips (52 total sequences). For this analysis, we excluded the genotype K sequence in our dataset due to its extreme divergence from other viruses sampled in Washington, which were all genotype G. For each group (Marshallese vs. non-Marshallese), we then generated subsampled datasets composed of a random sample of 1 to n sequences, where n is the number of total sequences available for that group. For each number of sequences, we performed 10 independent subsampling trials. Subsampling was performed without replacement. So, for community members, we generated 10 datasets in which 1 community member sequence was sampled, then 10 datasets in which 2 community member sequences were sampled, etc., up to 10 datasets in which all 57 community member sequences were sampled. For each subsampled dataset, we then combined these subsampled datasets with the background North American

Here, "state" refers to the inferred ancestral identity of an internal node, where the identity could be inferred as "Marshallese" or "not Marshallese". The multitype tree model [29] in BEAST 2 v2.6.2 [30] infers the effective population sizes of each deme and the migration rates between them.
Because the multitype tree model requires that all partitions contain all demes, we could only analyze 4 clades that circulated in Washington State and included both Marshallese and non-Marshallese tips. We generated an XML in BEAUti v2.6.2 with 4 partitions, and linked the clock, site, and migration models. We used a strict, fixed clock, set to $4.17 \times 10^{-4}$ substitutions per site per year, and used an HKY substitution model with 4 gamma-distributed rate categories. We chose this substitution rate based on the clock rate estimated from the North American tree inferred with IQ-TREE and TreeTime. Migration rates were inferred with the prior specified as a truncated exponential distribution with a mean of 1 and a maximum of 50. Effective population sizes were inferred with the prior specified as a truncated exponential distribution with a mean of 1, a minimum value of 0.001, and a maximum value of 10,000. All other priors were left at default values. In order to improve convergence, we employed 3 heated chains using the package CoupledMCMC [31], where proposals for chains to swap were performed every 100 states. The analysis was run for 100 million steps, with states sampled every 1 million steps. We ran this analysis 3 independent times, and combined log and tree file output from those independent runs using LogCombiner, with the first 10% (1000 states) of each run discarded as burn-in. We then summarized these combined output log and tree files. A maximum clade credibility tree was inferred using TreeAnnotator with the mean heights option. To ensure that results were not appreciably altered by the migration rate prior, we also repeated these analyses with migration rates inferred with the prior specified as a truncated exponential distribution with a mean of 10 and a maximum of 50.
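To give a feel for the two migration rate priors, the sketch below draws from a truncated exponential by inverse-CDF sampling. It assumes that the stated "mean" is the mean of the untruncated exponential and that truncation simply restricts the support; how BEAST 2 parameterizes this prior internally is not confirmed here, and the function name is illustrative.

```python
import math
import random

def sample_truncated_exponential(mean, upper, lower=0.0, rng=random):
    """Draw from an exponential (given untruncated mean) restricted to [lower, upper]."""
    rate = 1.0 / mean
    # CDF of the untruncated exponential at the two bounds.
    f_lo = 1.0 - math.exp(-rate * lower)
    f_hi = 1.0 - math.exp(-rate * upper)
    # Uniform draw on [f_lo, f_hi), mapped back through the inverse CDF.
    u = f_lo + rng.random() * (f_hi - f_lo)
    return -math.log(1.0 - u) / rate

rng = random.Random(1)
default_prior = [sample_truncated_exponential(1, 50, rng=rng) for _ in range(20000)]
wide_prior = [sample_truncated_exponential(10, 50, rng=rng) for _ in range(20000)]
```

Under these assumptions, the upper bound of 50 barely affects the mean-1 prior but trims the tail of the mean-10 prior, so the sensitivity analysis probes a substantially more permissive range of migration rates.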
non-Marshallese tips. We generated 3 randomly subsampled datasets, and for each one ran 3 independent chains, with each chain run for 50 million steps, sampling every 500,000. For one of the subsampled datasets, none of the chains converged after 20 days. In each of the remaining 2 subsampled datasets, 2 out of 3 chains converged. We combined these converged chains using LogCombiner, with the first 10% of each run discarded as burn-in. We then summarized these combined output log and tree files, and inferred a maximum clade credibility tree using TreeAnnotator with the mean heights option.

The analysis as described above assumes that each introduction into Washington State is an independent observation of the same structured coalescent process, and that the dataset represents a random sample of the underlying population. Additionally, this approach requires a priori definition of which sequences are part of the same Washington State transmission chain. Finally, the above analysis could only make use of the 4 Washington introductions with both

these issues, we supplemented the above approach with an additional analysis using the approximate structured coalescent [32] in MASCOT [33]. Using all of the Washington sequences, we specified three demes: Marshallese in Washington, non-Marshallese in Washington, and outside of Washington.
To account for any transmission that happened outside of Washington State, the "outside of Washington" deme acted as a "ghost deme" from which we did not use any samples. The effective population size of this "outside of Washington" deme then describes the rate at which lineages from any location outside of Washington share a common ancestor. Including specific samples from outside of Washington would bias the inferred effective population size towards the coalescent rates of the sampled locations, by incorporating local transmission dynamics of other locations. We then estimated migration rates and effective population sizes for all 3 demes, but fixed the migration rates such that the unsampled deme ("outside of Washington") could only act as a source population. This constraint is motivated by the absence of obvious migration out of Washington State in our previous analysis. We ran this analysis for 10 million steps, sampling every 5000, and discarded the first 10% of states as burn-in.

in 889 confirmed and probable cases across Washington (Fig. 1). Individuals aged <1 to 64 years were affected, but the highest rate of infection occurred in children aged 10-14 (44.9 cases per 100,000) and 15-19 (47.0 per 100,000) (Supplemental Table 1). Among individuals aged 5-19 years old, 91% were considered up-to-date on mumps vaccine. Adults in the age group most likely to be parents of school-aged children (20-39) were infected at a rate of only 12.9 cases per 100,000, but comprised a significant proportion (29%) of total cases (Supplemental Table 1).
While Marshallese individuals comprise only ~0.3% of Washington's total population, they accounted for 52% of reported mumps cases (Supplemental Table 2). Among Marshallese individuals aged 5-19, 93% were up-to-date on vaccination, suggesting that this over-representation is not attributable to poor vaccine coverage.

We combined our sequence data with publicly available full genome sequences sampled from North America between 2006 and 2018, and built a time-resolved phylogeny, inferring migration history among 26 US states and Canadian provinces (Fig. 2, Supplemental Figure 1).

American mumps viruses sampled from the same times. Except for 2 sequences (one from Wisconsin in 2006, genotype A, and one from Washington in 2017, genotype K, both excluded from Fig. 2), all samples in our dataset were genotype G viruses. Ten Washington sequences were highly divergent from other North American genotype G viruses, with a time to the most recent common ancestor (TMRCA) of ~22 years (Fig. 2). The remaining Washington sequences nest within the diversity of other North American viruses, and descend from the same mumps lineage that has circulated in North America since 2006 (Fig. 2). We observe substantial geographic mixing along the tree. While viruses from Massachusetts (dark green tips and branches) seeded outbreaks in the Northeast and Midwest, we also infer transmission from Massachusetts to Texas, Louisiana, Alabama, and British Columbia. Despite the close
geographic proximity between British Columbia and Washington, most British Columbia sequences form a distinct cluster on a long branch (Fig. 2), suggesting seeding from an unsampled location. Although viruses from Washington are scattered throughout the phylogeny, most cluster within a clade of viruses sampled in Arkansas (Fig. 2).

Estimating the number and timing of viral introductions is important for estimating epidemiologic parameters and evaluating surveillance networks, but is challenging with case count data alone. The Washington Department of Health had identified a single potential index case in October of 2016. To determine whether the genomic data supported a single introduction, we separated each introduction inferred in the maximum clade credibility tree and plotted each as its own transmission chain (Fig. 3a). We then enumerated the number of transitions into Washington in each tree in the posterior set, and plotted the distribution of Washington introductions consistent with the phylogeny (Fig. 3b).

Genomic data show that mumps was introduced into Washington State approximately 13 independent times (95% highest posterior density, HPD: 11-16), from geographically disparate locations (Fig. 3). Ten sampled tips descend from long branches (~22 years), suggesting likely transmission from unsampled geographic locations. We infer introductions from Ontario and Missouri that each lead to 1-3 sampled cases (Fig. 3b), suggesting limited onward transmission following these introductions. In contrast, 4 introductions from Arkansas account for 92/110 sequenced cases, suggesting that these introductions led to more sustained chains of transmission following introduction (Fig. 3b).
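The counting of introductions across posterior trees, and the HPD summary reported above, can be sketched as follows. The nested-dictionary tree with a hypothetical `state` key stands in for a posterior tree annotated with inferred locations; this is an illustration of the general approach, not the authors' code. The HPD is computed as the shortest interval containing at least 95% of the sampled values, which is one common empirical definition.

```python
import math

def count_introductions(node, target="Washington", parent_state=None):
    """Count parent-to-child branches that enter `target` from another location.

    A root already in `target` is not counted, since its source is unknown.
    """
    state = node["state"]
    entered = parent_state is not None and parent_state != target and state == target
    n = 1 if entered else 0
    for child in node.get("children", []):
        n += count_introductions(child, target, state)
    return n

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the samples."""
    xs = sorted(samples)
    k = math.ceil(mass * len(xs))  # number of samples the interval must cover
    best = min(range(len(xs) - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]

# Usage: one count per posterior tree, then summarize the distribution.
# counts = [count_introductions(tree) for tree in posterior_trees]
# low, high = hpd_interval(counts)
```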
We refer to the largest cluster as the "primary outbreak clade," and infer its introduction from Arkansas to Washington around August of 2016 (August 7, 2016, 95% HPD: July 11, 2016 to September 19, 2016, Fig. 3b), 3.5 months before

We developed a transmission metric to quantify whether Marshallese cases were enriched at the beginnings of successful transmission chains. We traverse the full genome divergence phylogeny (Supplemental Figure 2a) from root to tip. When we encounter a tip that lies on an internal node, we enumerate the number of tips that descend from its parent node. We then classify each tip in the phylogeny as either having descendants or not, and compare the proportion of tips with and without descendants among groups (Fig. 4a, see Methods for more details). Given our sampling proportion (110 sequences/889 total cases, ~12%), we do not expect to have captured true parent/child infection pairs. Rather, we expect to have preferentially sampled long, successful transmission chains within the state. This allowed us to assess whether infections with particular attributes (community membership, vaccination status, age) are predictors for being upstream in these chains, and thus associated with sustained transmission. We evaluated the probability of having descendants in the tree as a function of vaccination status, age, and community status with logistic regression (see Methods for details and full model). Neither age nor vaccination status was significantly associated with the presence of downstream tips in the tree (Supplemental Table 3).
However, Marshallese cases were significantly more likely to have downstream descendants than non-Marshallese cases (odds ratio = 3.2, p = 0.00725, Supplemental Table 3). While only 27% (14/52) of non-Marshallese tips were ancestral to downstream samples, 56% (32/57) of Marshallese tips had downstream descendants. These results suggest that community membership was a significant determinant of sustained transmission while controlling for vaccination status and age.

In the absence of recombination, closely-linked infections will cluster together on the tree, while unrelated infections should fall disparately on the tree, forming multiple smaller clusters. We inferred the number of Washington-associated clades in the tree as a function of whether sampled infections came from Marshallese or non-Marshallese individuals. Using the full North American phylogeny, we removed all Washington sequences and separated them into viruses sampled from cases noted as Marshallese or non-Marshallese. Then, separately for each group, we added sequences back into the tree one by one, until all sequences for that group had been added. For each number of sequences, we performed 10 independent trials (see Methods for complete details), and at each step, enumerated the number of inferred Washington clusters in the phylogeny. For comparison, we also grouped tips by vaccination status and repeated this analysis.

For tips from non-Marshallese individuals, the number of inferred clusters increases linearly as tips are added to the tree (Fig. 4b).
This suggests that these infections are not directly related, and are not part of sustained transmission chains (Fig. 4b). In contrast, the number of inferred clusters for Marshallese tips stabilizes after ~10 tips are added, even as almost 50 more sequences are added to the tree. This pattern likely arises because many Marshallese infections are part of the same long transmission chain, such that newly added tips nest within existing clusters. We do not observe similar differences among vaccination groups (Supplemental Figure 3). These findings are consistent with distinct patterns of transmission among Marshallese versus non-Marshallese cases: transmission among Marshallese individuals resulted in a small number of large clusters, while transmission among non-

We next separated each Washington introduction and colored each tip by community membership. Every introduction that was not seeded from Arkansas led to exclusively non-Marshallese tips (Fig. 4c). The primary outbreak clade is particularly enriched, containing 43 Marshallese tips and 26 non-Marshallese tips, hinting that transmission chains are longer when Marshallese cases are present in a cluster.

Internal nodes on a phylogeny represent ancestors to subsequently sampled tips, while terminal nodes represent viral infections that did not give rise to sampled progeny. If the mumps outbreak were primarily sustained by transmission within one group, the backbone of the phylogeny and the majority of internal nodes should be inferred as that group. We selected the 4 introductions that contained both Marshallese and non-Marshallese tips (Fig.
4c, the 4 Arkansas introductions), and reconstructed ancestral states along the phylogeny and migration/transmission rates between Marshallese and non-Marshallese groups using a structured coalescent model.

Of 88 internal nodes, 74 were inferred to circulate within the Marshallese community with posterior probability of at least 0.95 (Fig. 5a, b). Transmission events from Marshallese into non-Marshallese demes resulted in short, terminal transmission chains (Fig. 5a, dark blue branches). This suggests that transmission was overwhelmingly maintained within the Marshallese community, and that infections seeded into the non-Marshallese community did not

estimate 29 transmission events from Marshallese to non-Marshallese groups (95% HPD: 21, 37), and only 6 (95% HPD: 0, 14) from non-Marshallese to Marshallese groups (Fig. 5d). This strongly suggests that transmission predominantly occurred in one direction: transmission events leading to non-Marshallese infections usually died out, and did not typically re-seed circulation within the Marshallese community. These results hold true regardless of migration rate prior (Supplemental Figure 4).

To ensure that our results were not driven by unequal sampling within the analyzed clades, we generated 3 datasets in which the number of Marshallese and non-Marshallese tips were subsampled to be equal. For each of these 3 subsampled datasets, we ran 3 independent chains under the same model described above. Chains converged for 2 of the 3 subsampled datasets.
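The equal-size subsampling used for this sensitivity analysis can be sketched as follows. The function names, the seed, and the choice to downsample both groups to the size of the smaller one are assumptions made for illustration; the text does not specify whether subsampling was done per clade or across the pooled tips.

```python
import random

def balanced_subsample(group_a, group_b, rng):
    """Randomly downsample both groups (without replacement) to the smaller size."""
    n = min(len(group_a), len(group_b))
    return rng.sample(group_a, n), rng.sample(group_b, n)

def replicate_subsamples(group_a, group_b, replicates=3, seed=0):
    """Generate several independent balanced datasets."""
    rng = random.Random(seed)
    return [balanced_subsample(group_a, group_b, rng) for _ in range(replicates)]

# Illustrative tip labels only; group sizes are made up for the example.
marshallese = [f"M{i}" for i in range(40)]
non_marshallese = [f"N{i}" for i in range(25)]
datasets = replicate_subsamples(marshallese, non_marshallese)
```

Each replicate keeps every tip from the smaller group and a different random subset of the larger group, so agreement across replicates indicates that the inferred direction of transmission is not an artifact of which tips were retained.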
In the converged chains, we recover very similar tree topologies (Supplemental

The structured coalescent model requires both groups to be present in each cluster, excluding several small Washington introductions composed entirely of non-Marshallese tips (Fig. 4c). To account for this, we used all Washington genotype G sequences in our dataset and estimated a single tree using an approximate structured coalescent model [33]. All Washington sequences were annotated as either Marshallese or not Marshallese. To provide a "source" population for the extensive diversity among our disparate Washington introductions, we also specified a third, unsampled deme, for which migration was only allowed to proceed outward. As above, we inferred very few non-Marshallese internal nodes (Supplemental Figures 6 and 7). All internal nodes in the primary outbreak group are inferred as Marshallese with high probability, while non-Marshallese cases are present as terminal nodes. We recovered support for a single non-

previously vaccinated individuals, and result in different virus lineages infecting individuals in different vaccination categories. We colored the tips of all Washington cases in our phylogeny to represent whether they were derived from individuals who were up-to-date, not up-to-date, or whose vaccination status was unknown.
Mirroring overall vaccination coverage in Washington, the vast majority of samples in our dataset were from up-to-date individuals. The not up-to-date individuals present in our dataset are dispersed throughout the phylogeny and do not cluster together (Fig. 6), suggesting that there is no genetic difference between viruses infecting individuals with different vaccination statuses.

Mumps has historically caused outbreaks in communities with strong, interconnected contact patterns 5,45,46, and in dense housing environments 47. While a combination of waning immunity and dense housing settings makes college campuses ideal for mumps outbreaks, the Washington and Arkansas outbreaks show that populations other than young adults are at risk.

[...] there has been limited published data on long-term health impacts of nuclear exposure, and significant concern remains within the community 13. Finally, when Marshallese individuals do access care, they report experiencing disdain from healthcare workers 57 and sub-optimal care 55. Interviews with medical workers show that blame for poor Marshallese health outcomes is sometimes placed on host genetics or cultural practices 57, poor health literacy 58, or choosing to delay care 58, with less consideration given to how the economic and legal impacts of US occupation affect the health of Marshallese individuals.
These factors compound, and Marshallese individuals report hesitation to seek medical care, even when sick 55. Hesitancy to seek care could have contributed to mumps transmission if sick individuals were primarily cared for at home without knowledge of or the ability to implement community-isolation protocols.

The findings of this paper demonstrate the importance of expanding our understanding of populations at risk for mumps re-emergence, so that rapid and comprehensive outbreak response strategies can be implemented to mitigate negative health impacts for all affected communities. Finally, future work to disentangle the complex interplay between healthcare access, social and economic disparity, and respiratory virus risk will be essential for mitigating health impacts of mumps and other respiratory viruses.

Acknowledgments

[...] Marshallese community, and works as a translator to assist the community in Washington State. These insightful discussions were absolutely critical for contextualizing our results. We would also like to sincerely thank Kelsey Florek for locating and sharing mumps samples from Wisconsin, Ohio, Missouri, Alabama, and North Carolina, which greatly enhanced the analyses presented here. We also thank Jeff Joy for graciously sharing mumps genomes from British [...]

[...] week, and the y-axis represents the number of confirmed and probable cases for that week.
We combined all publicly available North American mumps genomes and built a time-resolved phylogeny. Here, we display the maximum clade credibility tree, where each color represents a unique US state or Canadian province and the x-axis represents the collection date (for tips), or the inferred time to the most recent common ancestors (for internal nodes). We inferred geographic history using a discrete trait model. The color of each internal node represents the posterior probability of the inferred geographic location, where increasingly grey tone represents decreasing probability.

We separated each introduction inferred on the maximum clade credibility tree (Figure 2) and plotted them independently. Large, colored dots represent the inferred geographic location that the Washington introduction was seeded from.
Branches that extend earlier than July of 2016 are dotted to represent that transmission likely occurred via other, unsampled locations. For reference, the cumulative case counts from Arkansas and Washington are plotted below. b. For each tree in the posterior set, we inferred the number of introductions into Washington. We plot the proportion of trees in the posterior set in which that number of introductions was inferred.

A schematic for quantifying tips that lie "upstream" in transmission chains. For tips that lie on an internal node, i.e., have a branch length of zero, we infer the number of child tips that descend from that tip's parental node. For each tip in the example tree, the number of descendants we would infer is annotated alongside it. All tips that have a nonzero branch length are annotated as having 0 descendants. We can then compare whether sequences of particular groups (here, blue vs. red) are more likely to have descendants in the tree via logistic regression. b. We separated all Washington tips and classified them into Marshallese and not Marshallese. We then performed a rarefaction analysis and plotted the number of inferred Washington clusters (y-axis) as a function of the number of sequences included in the analysis (x-axis). Dark blue represents not Marshallese sequences, and light blue represents Marshallese sequences. Each dot represents the number of trials in which that number of clusters was inferred, and the solid line represents the mean across trials. c. The exploded tree as shown in Figure 3a is shown, but tips are now colored by whether they represent Marshallese or non-Marshallese cases.
For reference, the number of Washington cases (y-axis) is plotted over time (x-axis), where bar color represents whether those cases were Marshallese or not.

Using the 4 Washington clusters that had a mixture of Marshallese and non-Marshallese cases, we inferred phylogenies using a structured coalescent model. Each group of sequences shared a clock model, migration model, and substitution model, but each topology was inferred separately, allowing us to incorporate information from all 4 clusters into the migration estimation. For each cluster, the maximum clade credibility tree is shown, where the color of each internal node represents the posterior probability that the node is Marshallese. b. For each internal node shown in panel a, we plot the posterior probability of that node being Marshallese. Across all 4 clusters, 74 out of 88 internal nodes (84%) are inferred as Marshallese with a posterior probability of at least 0.95. c. The posterior distribution of the number of "jumps" or transmission events from Marshallese to not Marshallese (light blue) and not Marshallese to Marshallese (dark blue) inferred for the primary outbreak clade.

The exploded tree as shown in Figure 3a is shown, but tips are now colored by whether they represent cases from individuals who are up-to-date for mumps vaccination, not up-to-date, or cases for which vaccination status was unknown.
The color of the large dot represents the inferred geographic location from which the Washington introduction was seeded.

CDCMMWR. Notifiable Diseases and Mortality Tables
Effectiveness of a Third Dose of MMR Vaccine for Mumps Outbreak
Vaccine waning and mumps re-emergence in the United
Mumps in a highly vaccinated Marshallese community in Arkansas, USA: an outbreak report
Bravo for the Marshallese: Regaining Control in a Post-Nuclear
Health consequences and health systems response to the Pacific U.S. Nuclear Weapons Testing Program
An investigation into the prevalence of thyroid disease on Kwajalein
A brief history of people and events related to atomic weapons testing in the
A history of the people of Bikini following nuclear weapons testing in the Marshall Islands: with recollections and views of elders of Bikini Atoll
Barriers and opportunities: a community-based participatory research study of health beliefs related to diabetes in a US
Background gamma radiation and soil activity measurements in the northern Marshall
Prevalence of antibody to hepatitis A and hepatitis B viruses in selected populations of the South Pacific
Serologic markers for hepatitis B among Marshallese accidentally exposed to fallout radiation in 1954
Diabetes mellitus prevalence in out-patient Marshallese adults on Ebeye Island, Republic of the Marshall Islands
Effect of US health policies on health care access for Marshallese migrants
Fast gapped-read alignment with Bowtie 2
Trimmomatic: a flexible trimmer for Illumina sequence data
ViPR: an open bioinformatics database and analysis
resource for virology research
Nextstrain: real-time tracking of pathogen evolution
MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform
IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies
Maximum-likelihood phylodynamic analysis
Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen)
Bayesian phylogeography finds its roots
Prevention of Measles, Rubella, Congenital Rubella Syndrome, and Mumps
Epidemiology of a mumps outbreak in a highly vaccinated island population and use of a third dose of measles-mumps-rubella vaccine for outbreak
Mumps outbreak in Orthodox Jewish communities in the United States
Mumps vaccine effectiveness in primary schools and households, the Netherlands
We the People: Pacific Islanders in the United States
Census Bureau. Historical Households Tables
Diabetes and Hypertension in Marshallese Adults: Results from Faith-
Inequities in Access to Medical Care Among Adults Diagnosed with Diabetes: Comparisons Between the US Population
Barriers to health services perceived by Marshallese immigrants
Congress, 108th United States. Compact of Free Association Amendments Act of
Marshallese COFA Migrants in Arkansas
Interpretive policy analysis: Marshallese COFA migrants and the