key: cord-265682-yac7kzaf authors: Eden, John-Sebastian; Rockett, Rebecca; Carter, Ian; Rahman, Hossinur; de Ligt, Joep; Hadfield, James; Storey, Matthew; Ren, Xiaoyun; Tulloch, Rachel; Basile, Kerri; Wells, Jessica; Byun, Roy; Gilroy, Nicky; O’Sullivan, Matthew V; Sintchenko, Vitali; Chen, Sharon C; Maddocks, Susan; Sorrell, Tania C; Holmes, Edward C; Dwyer, Dominic E; Kok, Jen title: An emergent clade of SARS-CoV-2 linked to returned travellers from Iran date: 2020-04-10 journal: Virus Evol DOI: 10.1093/ve/veaa027 sha: doc_id: 265682 cord_uid: yac7kzaf

The SARS-CoV-2 epidemic has rapidly spread outside China with major outbreaks occurring in Italy, South Korea, and Iran. Phylogenetic analyses of whole-genome sequencing data identified a distinct SARS-CoV-2 clade linked to travellers returning from Iran to Australia and New Zealand. This study highlights potential viral diversity driving the epidemic in Iran, and underscores the power of rapid genome sequencing and public data sharing to improve the detection and management of emerging infectious diseases.

From a public health perspective, the real-time whole-genome sequencing (WGS) of emerging viruses enables the informed development and design of molecular diagnostic assays, and tracing patterns of spread across multiple epidemiological scales (i.e. genomic epidemiology). However, WGS capacities and data sharing policies vary in different countries and jurisdictions, leading to potential sampling bias due to delayed or underrepresented sequencing data from some areas with substantial SARS-CoV-2 activity. Herein, we show that the genomic analyses of SARS-CoV-2 strains from Australian returned travellers with COVID-19 disease may provide important insights into viral diversity present in regions currently lacking genomic data.

In late December 2019, a cluster of cases of pneumonia of unknown aetiology in Wuhan city, Hubei province, China was reported by health authorities (Wuhan Municipal Health Commission 2019).
A novel betacoronavirus, designated SARS-CoV-2, was identified as the causative agent (Wu et al. 2020) of the disease now known as COVID-19, with substantial human-to-human transmission (Lu et al. 2020). To contain a growing epidemic, Chinese authorities implemented strict quarantine measures in Wuhan and surrounding areas in Hubei province. Significant delays in the global spread of the virus were achieved, but despite these measures, cases were exported to other countries. As of 9 March 2020, cases had been reported in more than 100 countries on all continents except Antarctica; the total number of confirmed infections exceeded 110,000 and there were nearly 4,000 deaths (Dong, Du, and Gardner 2020). Although the majority of cases have occurred in China, major outbreaks have also been reported in Italy, South Korea, and Iran (World Health Organisation 2020a). Importantly, there is widespread local transmission in multiple countries outside China following independent importations of infection from visitors and returned travellers.

Viral extracts were prepared from respiratory tract samples where SARS-CoV-2 was detected by reverse-transcription polymerase chain reaction (RT-PCR) using World Health Organisation (2020b) recommended primers and probes targeting the E and RdRp genes. In New South Wales (NSW), Australia, WGS for SARS-CoV-2 was developed based on an existing amplicon-based Illumina sequencing approach (Di Giallonardo et al. 2018). Viral extracts were reverse transcribed with SSIV VILO cDNA master mix and then used as input for multiple overlapping PCR reactions (2.5 kb each) spanning the viral genome using Platinum SuperFi master mix (primers provided in Supplementary Table S1). Amplicons were pooled equally, purified, and quantified. Nextera XT libraries were prepared and sequencing was performed with multiplexing on an Illumina iSeq (300 cycle flow cell). In New Zealand, the ARTIC network protocol was used for WGS (Quick 2020).
In short, 400-bp tiling amplicons designed with Primal Scheme (Grubaugh et al. 2019a) were used to amplify viral cDNA prepared with SuperScript III. A sequence library was then constructed using the Oxford Nanopore ligation sequencing kit and sequenced on an R9.4.1 MinION flow cell. Near-complete viral genomes were then assembled de novo in Geneious Prime 2020.0.5 or through reference mapping with RAMPART V1.0.6 (Hadfield 2019) using the ARTIC network nCoV-2019 novel coronavirus bioinformatics protocol (Loman and Rambaut 2020). In total, 13 SARS-CoV-2 genomes were sequenced from cases in NSW diagnosed between 24 January and 3 March 2020, as well as a single genome from the first patient in Auckland, New Zealand sampled on 27 February 2020 (Table 1). Australian and New Zealand sequences were aligned to global reference strains sourced from GISAID with MAFFT (Katoh 2002) and then compared phylogenetically using a maximum-likelihood approach, PhyML v2.2.4 (Guindon and Gascuel 2003).

The Australian strains of SARS-CoV-2 were dispersed across the global SARS-CoV-2 phylogeny (Fig. 1A). The first four cases of COVID-19 disease in NSW occurred between 24 and 26 January 2020, and these were closely related, differing by 1-2 single nucleotide polymorphisms (SNPs) from the dominant variant circulating in Wuhan at the time (prototype strain MN908947/SARS-CoV-2/Wuhan-Hu-1). As the four patients identified in this period had recently returned from China, this region was the likely source of infection. From 1 February 2020, travel to Australia from mainland China was restricted to returning Australian residents and their children, who were placed in home quarantine for 14 days. Despite the intensive testing of such returning travellers, no further cases of COVID-19 were detected in NSW until 28 February 2020, when SARS-CoV-2 was detected in an individual returning from Iran (NSW05).
A close contact of this individual also tested positive (NSW14), providing the first evidence of local transmission within NSW. This was followed by further Iran travel-linked cases in NSW (NSW06, NSW11, NSW12, and NSW13) and New Zealand (NZ01). Of note, the genomes of all patients with a history of travel to Iran were part of a monophyletic group defined by three nucleotide substitutions (G1397A, T28688C, and G29742T) in the SARS-CoV-2 genome relative to the Wuhan prototype strain (Fig. 1B). G1397A and T28688C both occur in coding regions, with G1397A producing a non-synonymous change (V378I) in the ORF1ab-encoded non-structural protein 2 region. G29742T occurs in the 3′-UTR. In addition to the Australian and New Zealand strains, this clade also included a traveller who had returned to Canada from Iran (BC_37_0-2), providing further evidence of its likely link to the Iranian epidemic. Indeed, a search of all currently available GISAID sequences and metadata revealed no other complete genome sequences from patients with documented history of travel to or residence in Iran (as of 9 March 2020). A search of partial sequences identified two SARS-CoV-2 sequences which originated in Iran (413553/IRN/Tehran15AW/2020-02-28 and 413554/IRN/Tehran9BE/2020-02-23) spanning a 363 nt region of the viral nucleoprotein (N). Although short in length, these two sequences covered one of the informative SNPs defining this clade, T28688C, and both Iranian strains matched the sequences from patients with travel histories to Iran and grouped by phylogenetic analysis (Supplementary Figs. S1 and S2).

Technological advancements and the widespread adoption of WGS in pathogen genomics have transformed public health and infectious disease outbreak responses (Popovich and Snitkin 2017). Previously, disease investigations often relied on the targeted sequencing of a small locus to identify genotypes and infer patterns of spread along with epidemiological data (Dudas and Bedford 2019).
As seen with the recent West African Ebola (Dudas et al. 2017) and Zika virus epidemics (Grubaugh et al. 2018), rapid WGS significantly increases resolution of diagnosis and surveillance, thereby strengthening links between genomic, clinical, and epidemiological data (Grenfell 2004), and potentially uncovering outbreaks in unsampled locations (Grubaugh et al. 2019b). This advance improves our understanding of pathogen origins and spread, ultimately leading to stronger and more timely intervention and control measures (Grubaugh et al. 2019c). Following the first release of the SARS-CoV-2 genome (Wu et al. 2020), public health and research laboratories worldwide have rapidly shared sequences on public data repositories such as GISAID (Shu and McCauley 2017) (n = 236 genomes as of 9 March 2020) that have been used to provide near real-time snapshots of global diversity through public analytic and visualization tools (Hadfield et al. 2018).

Although all known cases linked to Iran are contained in this clade, it is important to note the presence of two Chinese strains sampled during mid-January 2020 from Hubei and Shandong provinces. It is expected that further Chinese strains will be identified that fall in this clade, and across the entire phylogenetic diversity of SARS-CoV-2, as this is where the outbreak started, and China is also the likely source of the outbreak in Iran itself. However, while we cannot completely discount that the cases in Australia and New Zealand came from other sources including China, our phylogenetic analyses, as well as epidemiological (recent travel to Iran) and clinical data (date of symptom onset), provide evidence that this clade of SARS-CoV-2 is directly linked to the Iranian epidemic, from where genomic data are currently lacking. Importantly, the seemingly multiple importations of very closely related viruses from Iran into Australia suggest that this diversity reflects the early stages of SARS-CoV-2 transmission within Iran.
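
The clade assignment described in the results rests on three substitutions relative to the Wuhan-Hu-1 prototype (G1397A, T28688C, and G29742T), and the partial Iranian N-gene sequences could be checked at only one of them. The following Python sketch illustrates that kind of site check; it is not the pipeline used in this study. The function names, the call labels ("derived", "reference", "other", "uncovered") and the `start` parameter are hypothetical, and the sketch assumes each sequence has already been aligned to the reference so that it contains no insertions relative to it.

```python
import re
from typing import Dict, List, NamedTuple


class Substitution(NamedTuple):
    ref: str  # base in the reference genome
    pos: int  # 1-based position in reference coordinates
    alt: str  # derived base that defines the clade


def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'G1397A' into (ref, pos, alt)."""
    m = re.fullmatch(r"([ACGT])(\d+)([ACGT])", label.upper())
    if m is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return Substitution(m.group(1), int(m.group(2)), m.group(3))


# The three clade-defining substitutions reported in the text.
CLADE_SUBSTITUTIONS: List[Substitution] = [
    parse_substitution(s) for s in ("G1397A", "T28688C", "G29742T")
]


def call_sites(aligned_seq: str, start: int = 1,
               subs: List[Substitution] = CLADE_SUBSTITUTIONS) -> Dict[str, str]:
    """Classify each clade-defining site in a reference-aligned sequence.

    `aligned_seq` must be in reference coordinates (assumption: no insertions),
    and `start` is the 1-based reference position of its first base, which
    lets partial sequences be checked as well as full genomes.
    """
    calls: Dict[str, str] = {}
    for sub in subs:
        label = f"{sub.ref}{sub.pos}{sub.alt}"
        idx = sub.pos - start
        if idx < 0 or idx >= len(aligned_seq):
            calls[label] = "uncovered"  # site lies outside this sequence
            continue
        base = aligned_seq[idx].upper()
        if base == sub.alt:
            calls[label] = "derived"
        elif base == sub.ref:
            calls[label] = "reference"
        elif base in "ACGT":
            calls[label] = "other"
        else:
            calls[label] = "uncovered"  # gap or ambiguous base such as N
    return calls


def consistent_with_clade(calls: Dict[str, str]) -> bool:
    """True if every covered site is derived and at least one site is covered."""
    covered = [c for c in calls.values() if c != "uncovered"]
    return bool(covered) and all(c == "derived" for c in covered)
```

A sequence that covers only one site, like the 363 nt N-gene fragments, can be consistent with the clade under this rule but cannot confirm membership on its own, which is why the text also relies on phylogenetic grouping and travel history.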
Supplementary data are available at Virus Evolution online.

Evolution of Human Respiratory Syncytial Virus (RSV) over Multiple Seasons in New South Wales
An Interactive Web-Based Dashboard to Track COVID-19 in Real Time
The Ability of Single Genes vs Full Genomes to Resolve Time and Space in Outbreak Analysis
Virus Genomes Reveal Factors That Spread and Sustained the Ebola Epidemic
Unifying the Epidemiological and Evolutionary Dynamics of Pathogens
An Amplicon-Based Sequencing Framework for Accurately Measuring Intrahost Virus Diversity Using PrimalSeq and iVar
A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood
Nextstrain: Real-Time Tracking of Pathogen Evolution
MAFFT: A Novel Method for Rapid Multiple Sequence Alignment Based on Fast Fourier Transform
Genomic Characterisation and Epidemiology of 2019 Novel Coronavirus: Implications for Virus Origins and Receptor Binding
Whole Genome Sequencing-Implications for Infection Prevention and Outbreak Investigations
GISAID: Global Initiative on Sharing All Influenza Data-From Vision to Reality
(2020b) Coronavirus Disease (COVID-19) Technical Guidance: Laboratory Testing for 2019
A New Coronavirus Associated with Human Respiratory Disease in China
Briefing on the Current Pneumonia Epidemic Situation in Our City

The members of the nCoV-2019 Study Group also include Linda Donovan, Shanil Kumar, Tyna Tran, Danny Ko, Christine Ngo, Tharshini Sivaruban, Verlaine Timms, Connie Lam, Mailie Gall, Karen-Ann Gray, Rosemarie Sadsad, and Alicia Arnott. The authors acknowledge the Sydney Informatics Hub and the use of the University of Sydney's high-performance computing cluster, Artemis, and all the laboratories that have referred SARS-CoV-2 samples to the Centre for Infectious Diseases and Microbiology Laboratory Services, NSW Health Pathology - Institute of Clinical Pathology and Medical Research, Westmead Hospital.
We would finally like to thank all the authors who have kindly shared genome data on GISAID, and we have included a table (Supplementary Table S2) outlining the authors and institutes involved.

Data including the sequences in this study are available for download from https://www.gisaid.org/.

This study was supported by the Prevention Research Support Programme funded by the New South Wales Ministry of Health and the NHMRC Centre of Research Excellence in Emerging Infectious Diseases (GNT1102962).

Conflict of interest: None declared.