key: cord-0441966-5a66ozrd authors: Douglas, Jordan; Welch, David title: PEACH Tree: A Multiple Sequence Alignment and Tree Display Tool for Epidemiologists date: 2021-12-12 journal: nan DOI: nan sha: c8db69fcec1c41b0459eed31bd737d114c7abe46 doc_id: 441966 cord_uid: 5a66ozrd

PEACH Tree is an easy-to-use online tool for displaying multiple sequence alignments and phylogenetic trees side by side. It is powerful for rapidly tracing evolutionary and transmission histories because it filters invariant sites out of the display and lets the user readily filter samples out of the display as well. These features, coupled with the ability to display epidemiological metadata, make the tool well suited to infectious disease epidemiology. PEACH Tree further enables much-needed communication between the fields of genomics and infectious disease epidemiology, a need highlighted by the COVID-19 pandemic.

Genomic methods are now instrumental in efforts to control infectious diseases. While epidemiological models (Brauer, 2008) are primarily informed by case counts, trajectories, and population data (such as movement, contact, and demography), genomic methods exploit the fast mutation rates of certain pathogens to infer evolutionary and transmission histories (Grenfell et al., 2004). Widespread pathogen sequencing forms the basis of pathogen surveillance technologies (Gardy and Loman, 2018) such as Nextstrain (Hadfield et al., 2018) and GISAID (Shu and McCauley, 2017), and has informed public health responses to a range of epidemics (Baize et al., 2014; Faria et al., 2017; Seemann et al., 2020; Douglas et al., 2021c). Historically, infectious disease epidemiology and genomics have existed as distinct fields; however, the rise of real-time sequencing and its ability to inform outbreak response have demonstrated the benefit of strong communication between the two. Visualisation is a vital element in scientific communication.
Biological sequences, such as viral genomes, can be aligned and viewed using a wide range of existing programs (Larsson, 2014; Waterhouse et al., 2009; Larkin et al., 2007). Phylogenetic trees can then be inferred from multiple sequence alignments (MSAs) and displayed with a range of software packages (Paradis et al., 2004; Rambaut, 2009; Vaughan, 2017; Douglas, 2021). Viewing a large MSA or a large tree is a nontrivial task, and the display can easily become overloaded with information. However, when studying infectious diseases, the segregating sites (i.e., alignment positions that vary among the samples) are typically of more interest than the invariant sites, and some cases, outbreaks, species, or other taxonomic groups are often of more interest than others. Displaying the full dataset may overwhelm the user with unwanted information and impede computational performance. For an epidemiologist in particular, the abilities to easily trace transmission histories, view symptom onset dates, and link genomes to case numbers (and other metadata) are desirable features in any software package built to view infectious disease transmission.

We present PEACH Tree (Plotting Epidemiological and Alignment CHaracters onto phylogenetic Trees), a program for viewing multiple sequence alignments and phylogenetic trees that is specifically designed for (but not restricted to) infectious disease epidemiology and pathogen surveillance. PEACH Tree is responsive, easy to use, and runs in the web browser. When opening PEACH Tree, the user is prompted for an MSA (FASTA format) and/or a tree (Newick or NEXUS format). If a tree is not provided, a neighbour-joining tree can be constructed from the MSA (Saitou and Nei, 1987). The user may also upload case metadata in comma- or tab-separated values format, describing, for instance, sample dates or symptom onset dates.
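Finding the segregating sites described above amounts to scanning each alignment column and keeping those in which more than one distinct character occurs. The following is a minimal Python sketch of that idea, not PEACH Tree's own browser implementation; the helper names `parse_fasta` and `segregating_sites` are hypothetical, and treating gaps (`-`) and ambiguous characters (`N`, `?`) as non-alleles is an assumption.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into a {name: sequence} dict, preserving input order."""
    sequences: dict[str, str] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The sample name is the first whitespace-delimited token of the header
            name = (line[1:].split() or [""])[0]
            sequences[name] = ""
        elif name is not None:
            sequences[name] += line.upper()
    return sequences


def segregating_sites(alignment: dict[str, str], ignore: str = "-N?") -> list[int]:
    """Return 0-based column indices at which at least two distinct alleles occur.

    Characters in `ignore` (gaps, ambiguous bases) are not counted as alleles;
    this is an assumption, and a real viewer may treat them differently.
    """
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences are not aligned (unequal lengths)")
    sites = []
    for i in range(length):
        alleles = {s[i] for s in seqs if s[i] not in ignore}
        if len(alleles) > 1:  # the column varies among the samples
            sites.append(i)
    return sites


if __name__ == "__main__":
    example = ">A\nACGTACGT\n>B\nACGAACGT\n>C\nACGTACTT\n"
    # Report positions 1-based, as an alignment viewer would label them
    print([i + 1 for i in segregating_sites(parse_fasta(example))])  # [4, 7]
```

Hiding every column not returned by `segregating_sites` leaves only the informative positions, which is why the filtered display stays small even for long genomes.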
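When no tree is supplied, a neighbour-joining tree is built from the MSA (Saitou and Nei, 1987). A compact sketch of that algorithm follows, assuming uncorrected p-distances between sequences (the distance measure PEACH Tree actually uses is not stated here); `p_distance`, `distance_matrix`, and `neighbour_joining` are hypothetical names. At each step the sketch joins the pair of clusters that minimises the Q criterion, assigns branch lengths to the two joined clusters, and recomputes distances to the new internal node, finally returning an unrooted tree in Newick format.

```python
def p_distance(a: str, b: str, ignore: str = "-N?") -> float:
    """Proportion of comparable sites at which two aligned sequences differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x not in ignore and y not in ignore]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment: dict[str, str]) -> tuple[list[str], list[list[float]]]:
    """All pairwise p-distances between the sequences of an alignment."""
    names = list(alignment)
    return names, [[p_distance(alignment[a], alignment[b]) for b in names] for a in names]


def neighbour_joining(names: list[str], dist: list[list[float]]) -> str:
    """Neighbour joining (Saitou and Nei, 1987); returns an unrooted Newick string."""
    nodes = list(names)  # Newick text of each current cluster
    d = [[float(x) for x in row] for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        totals = [sum(row) for row in d]
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k)
        _, i, j = min(
            ((n - 2) * d[a][b] - totals[a] - totals[b], a, b)
            for a in range(n)
            for b in range(a + 1, n)
        )
        # Branch lengths from the new internal node to clusters i and j
        li = 0.5 * d[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        keep = [k for k in range(n) if k not in (i, j)]
        # Distances from the new internal node to every remaining cluster
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[r]] for r, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [f"({nodes[i]}:{li:.4g},{nodes[j]}:{lj:.4g})"]
    if len(nodes) == 3:
        # Attach the last three clusters to a single central node
        la = 0.5 * (d[0][1] + d[0][2] - d[1][2])
        lb, lc = d[0][1] - la, d[0][2] - la
        return f"({nodes[0]}:{la:.4g},{nodes[1]}:{lb:.4g},{nodes[2]}:{lc:.4g});"
    if len(nodes) == 2:
        half = d[0][1] / 2
        return f"({nodes[0]}:{half:.4g},{nodes[1]}:{half:.4g});"
    return f"{nodes[0]};"
```

For example, `neighbour_joining(*distance_matrix({"x": "AAAA", "y": "AAAT", "z": "AATT"}))` returns `(x:0.25,y:0,z:0.25);`. On non-additive data, NJ can produce slightly negative branch lengths; this sketch leaves them as computed rather than clamping them to zero.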
PEACH Tree then plots the phylogenetic tree alongside the MSA and renders further epidemiological annotations onto the display (Figure 1). By default, only segregating sites are shown, rather than the complete MSA. The user can focus on a subset of samples (such as a monophyletic group), in which case the segregating sites are recalculated for that subset. This can be useful, for instance, for understanding genomic variants within a particular outbreak or cluster. The tree can be displayed as a transmission tree, in which each internal node is oriented to represent a transmission event from its top child to its bottom child. The orientation of a transmission node can be flipped by clicking on it. A Scalable Vector Graphics (SVG) or Portable Network Graphics (PNG) file can be readily downloaded from the web browser.

Figure 1: Top: … (Douglas et al., 2021b). Only 13 [monophyletic] cases from the full tree are displayed, along with their segregating sites. Sites are coloured by minor alleles. Internal nodes are coloured by their posterior clade support probability, and the infectious periods are displayed as orange bars (modelled as running from 2 days before to 5 days after symptom onset). Note that onset dates were randomised for privacy protection. Bottom: phylogenetic tree of Rubulavirinae L proteins (Douglas et al., 2021a), with branches coloured by their substitution rate under the relaxed clock model (Douglas et al., 2021d). All sites (within the indicated range) are displayed.

While numerous MSA and tree visualisation tools already exist, PEACH Tree stands apart for several reasons. First, it is a fast and responsive web-browser implementation. Second, it displays MSAs and trees together, side by side. Third, it is specifically catered to the domain of infectious disease transmission and comes with a range of tools for easily focusing on samples and sites, and for displaying epidemiological annotations.
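Focusing on a subset of samples means recomputing which columns vary among the selected samples only; colouring by minor alleles then highlights, in each such column, the samples that differ from the most common base. The sketch below assumes that "minor allele" means any allele other than the most frequent one within the focused subset, with ties going to the allele seen first. That is our reading rather than a documented rule, and `focus_minor_alleles` and `minor_allele_cells` are hypothetical names.

```python
from collections import Counter


def focus_minor_alleles(
    alignment: dict[str, str], subset: list[str], ignore: str = "-N?"
) -> dict[int, set[str]]:
    """Map each column that segregates within `subset` to its minor alleles.

    Assumption: the major allele is the most frequent one in the subset
    (Counter.most_common breaks ties by first occurrence); all others are minor.
    """
    seqs = [alignment[name] for name in subset]
    if not seqs:
        return {}
    minor: dict[int, set[str]] = {}
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs if s[i] not in ignore)
        if len(counts) > 1:  # the column varies within the focused subset
            major = counts.most_common(1)[0][0]
            minor[i] = set(counts) - {major}
    return minor


def minor_allele_cells(
    alignment: dict[str, str], subset: list[str], ignore: str = "-N?"
) -> dict[str, list[int]]:
    """For each focused sample, the columns where it carries a minor allele (cells to colour)."""
    minor = focus_minor_alleles(alignment, subset, ignore)
    return {
        name: [i for i, alleles in minor.items() if alignment[name][i] in alleles]
        for name in subset
    }
```

With the whole alignment `{"A": "ACGTACGT", "B": "ACGAACGT", "C": "ACGTACTT", "D": "TCGTACGT"}`, columns 0, 3, and 6 (0-based) segregate; focusing on `["A", "B", "C"]` drops column 0, because only sample D carries a different base there.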
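In Figure 1, the infectious-period bars are modelled from each case's symptom onset date, shifted 2 days earlier for the start and 5 days later for the end. A minimal sketch of computing those intervals from an uploaded metadata table follows. It rests on several assumptions: the column names `sample` and `onset`, ISO 8601 (YYYY-MM-DD) dates, and choosing the delimiter by checking the header line for a tab. `infectious_periods` is a hypothetical name.

```python
import csv
import io
from datetime import date, timedelta


def infectious_periods(
    metadata_text: str,
    id_col: str = "sample",  # hypothetical column names
    onset_col: str = "onset",
    days_before: int = 2,
    days_after: int = 5,
) -> dict[str, tuple[date, date]]:
    """Map each sample to the interval (onset - days_before, onset + days_after)."""
    lines = metadata_text.splitlines()
    if not lines:
        return {}
    # Assumption: tab-separated if the header line contains a tab, otherwise comma-separated
    delimiter = "\t" if "\t" in lines[0] else ","
    periods: dict[str, tuple[date, date]] = {}
    for row in csv.DictReader(io.StringIO(metadata_text), delimiter=delimiter):
        onset_text = (row.get(onset_col) or "").strip()
        if not onset_text:
            continue  # no onset date recorded, so no bar is drawn for this sample
        onset = date.fromisoformat(onset_text)  # expects YYYY-MM-DD
        periods[row[id_col]] = (
            onset - timedelta(days=days_before),
            onset + timedelta(days=days_after),
        )
    return periods
```

Each interval spans 7 days under these defaults. Keeping the offsets as parameters makes it easy to model a different infectious period for another pathogen.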
This project was funded by the New Zealand Ministry of Health and the Ministry of Business, Innovation and Employment.

Emergence of Zaire Ebola virus disease in Guinea.
Compartmental models in epidemiology.
UglyTrees: a browser-based multispecies coalescent tree visualizer.
Evolutionary history of cotranscriptional editing in the paramyxoviral phosphoprotein gene.
Real-time genomics for tracking severe acute respiratory syndrome coronavirus 2 border incursions after virus elimination, New Zealand.
Phylodynamics reveals the role of human travel and contact tracing in controlling the first wave of COVID-19 in four island nations.
Adaptive dating and fast proposals: revisiting the phylogenetic relaxed clock model.
Establishment and cryptic transmission of Zika virus in Brazil and the Americas.
Towards a genomics-informed, real-time, global pathogen surveillance system.
Unifying the epidemiological and evolutionary dynamics of pathogens.
Nextstrain: real-time tracking of pathogen evolution.
AliView: a fast and lightweight alignment viewer and editor for large datasets.
APE: analyses of phylogenetics and evolution in R language.
The neighbor-joining method: a new method for reconstructing phylogenetic trees.
Tracking the COVID-19 pandemic in Australia using genomics.
GISAID: Global initiative on sharing all influenza data – from vision to reality.
IcyTree: rapid browser-based visualization for phylogenetic trees and networks.
Jalview Version 2 – a multiple sequence alignment editor and analysis workbench.