key: cord-0727690-09te4v9m authors: Lumley, Sheila F; Constantinides, Bede; Sanderson, Nicholas; Rodger, Gillian; Street, Teresa L; Swann, Jeremy; Chau, Kevin K; O'Donnell, Denise; Warren, Fiona; Hoosdally, Sarah; laboratory, OUH Microbiology; Control team, OUH Infection Prevention and; O'Donnell, Anne-Marie; Walker, Timothy M; Stoesser, Nicole E; Butcher, Lisa; Peto, Tim EA; Crook, Derrick W; Jeffery, Katie; Matthews, Philippa C; Eyre, David W title: Epidemiological data and genome sequencing reveals that nosocomial transmission of SARS-CoV-2 is underestimated and mostly mediated by a small number of highly infectious individuals. date: 2021-07-28 journal: J Infect DOI: 10.1016/j.jinf.2021.07.034 sha: 0f3c8ba81481dc57e88532d9a70e7b0a04b7718f doc_id: 727690 cord_uid: 09te4v9m Objectives: Despite robust efforts, patients and staff acquire SARS-CoV-2 infection in hospitals. We investigated whether whole-genome sequencing enhanced the epidemiological investigation of healthcare-associated SARS-CoV-2 acquisition. Methods: From 17-November-2020 to 5-January-2021, 803 inpatients and 329 staff were diagnosed with SARS-CoV-2 infection at four Oxfordshire hospitals. We classified cases using epidemiological definitions, looked for a potential source for each nosocomial infection, and evaluated genomic evidence supporting transmission. Results: Using national epidemiological definitions, 109/803(14%) inpatient infections were classified as definite/probable nosocomial, 615(77%) as community-acquired and 79(10%) as indeterminate. There was strong epidemiological evidence to support definite/probable cases as nosocomial. Many indeterminate cases were likely infected in hospital: 53/79(67%) had a prior-negative PCR and 75(95%) contact with a potential source. 89/615(11% of all 803 patients) with apparent community-onset had a recent hospital exposure. Within 764 samples sequenced 607 genomic clusters were identified (>1 SNP distinct). 
Only 43/607(7%) clusters contained evidence of onward transmission (subsequent cases within ≤1 SNP). 20/21 epidemiologically-identified outbreaks contained multiple genomic introductions. Most (80%) nosocomial acquisition occurred in rapid super-spreading events in settings with a mix of COVID-19 and non-COVID-19 patients. Conclusions: Current surveillance definitions underestimate nosocomial acquisition. Most nosocomial transmission occurs from a relatively limited number of highly infectious individuals. Limiting acquisition of SARS-CoV-2 by patients and staff in hospitals is an infection prevention and control (IPC) priority. Despite robust efforts, both patients and staff are infected in hospitals; 10-40% of hospital-diagnosed COVID-19 cases are thought to have been acquired in hospital, with 8700 deaths following nosocomial infection reported in the UK, [1] [2] [3] [4] and higher rates of seroconversion are reported in healthcare workers compared to the general population. [5, 6] Distinguishing which patients have acquired infection in hospital allows potential transmission events to be investigated. Epidemiological rules are frequently used for nosocomial classification and outbreak investigation, using spatial and temporal patient data to make assumptions about acquisition and transmission. However, such rules may exclude plausible transmission and leave uncertainty around the source of individual infections. SARS-CoV-2 whole-genome sequencing (WGS) has been proposed as an adjunct to assist hospital outbreak investigation. Individuals infected with identical or near-identical (≤1 SNP) viruses, are more likely to be linked in a transmission chain than those with more distantly related viruses, as demonstrated by previous retrospective studies that have utilised WGS to identify nosocomial infections and outbreaks. 
[7] [8] [9] [10] [11] [12] We investigated whether sequencing could enhance epidemiological investigation of healthcare-associated SARS-CoV-2 acquisition in two areas: i) confirming/excluding nosocomial acquisition and ii) understanding the role of outbreaks in nosocomial acquisition. We highlight the benefits and pitfalls of this approach, to help guide local practice in individual centres. The Oxford University Hospitals NHS Foundation Trust comprises four hospitals with ~1100 beds (mostly in 4-bed bays within wards of 20-30 beds) and ~13,500 staff. The four hospitals are presented as "A", a large acute hospital admitting both COVID-19 and non-COVID-19 patients; "B", a smaller general hospital admitting both COVID-19 and non-COVID-19 patients; "C", a hospital focused predominantly on cancer care; and "D", a largely elective orthopedic hospital, with C and D not routinely admitting COVID-19 patients. Ward admission and discharge dates were available for all patients from 14 days before the first positive PCR, and the work location for those staff working exclusively or predominantly on a single ward. Public Health England (PHE) guidance for COVID-19 IPC was followed throughout the study, including the use of patient pathways, personal protective equipment (PPE), symptomatic and asymptomatic staff and patient testing (summarised in Supplement). [13] Infections in patients and hospital staff were detected by symptomatic and asymptomatic SARS-CoV-2 PCR testing of combined nasal and oropharyngeal swabs by Thermo Fisher TaqPath assay (2553/2773, 92% samples) and other platforms (details in Supplement). PCR-positive samples were stored at -80℃ for WGS. Sequencing was attempted on all stored samples, regardless of cycle threshold (Ct) value, using the ARTIC LoCost protocol [14] (details in Supplement). 
Nosocomial SARS-CoV-2 infection was defined following NHS England and NHS Improvement definitions: [15]
- Community-onset: PCR-positive ≤2 days after hospital admission/attendance
Enhanced nosocomial classification: prior negative PCR results (available as a result of admission screening, weekly ward screening and symptomatic testing) and admissions in the 14 days prior to diagnosis were used to determine whether additional support existed for nosocomial acquisition.
For the purpose of identifying plausible transmission events, indicative incubation periods were defined as 1-14 days prior to a positive PCR test. [16] Infectious periods were defined from 4 days before to 7 days after a positive PCR for patients, and 4 days before to the day of the positive PCR test for staff (reflecting that staff isolated at home for 10 days following a positive test). [17] Mean serial intervals, i.e. the time between symptom onset in a transmission donor and symptom onset in the recipient, have been estimated at 4-7 days; here 5 days is used. [18, 19]
Individuals acquiring SARS-CoV-2 are denoted "recipients", and those transmitting infection "donors". A "plausible donor" for a recipient is identified by the donor and recipient being present on the same ward ("ward contact") during the donor's infectious period and the recipient's incubation period. "Hospital contact" was defined as presence in the same hospital on the same calendar day, during the donor's infectious period and the recipient's incubation period. Epidemiological outbreaks were defined following PHE guidance. [20]
We initially classified all cases according to the epidemiological definitions above, and then tested whether there was epidemiological evidence of a potential source case for each new definite/probable/indeterminate patient and staff infection. We then evaluated how many of these epidemiologically linked cases were within ≤1 SNP of each other, i.e. had genomic evidence to support transmission.
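The plausible-donor rule defined above (ward contact during the donor's infectious period and the recipient's incubation period) can be illustrated with a minimal Python sketch. It is not the study's R code: the `Case` record, the function names and the example wards are hypothetical, and "ward contact" is assumed here to mean presence on the same ward on the same calendar day.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Case:
    """Hypothetical record for one PCR-positive patient or staff member."""
    case_id: str
    is_staff: bool
    first_positive: date
    # (ward, first_day, last_day) tuples, both days inclusive.
    # For staff this stands in for their single work location.
    ward_stays: tuple


def incubation_window(case):
    """Indicative incubation period: 1-14 days before the positive PCR."""
    return (case.first_positive - timedelta(days=14),
            case.first_positive - timedelta(days=1))


def infectious_window(case):
    """4 days before the positive PCR to 7 days after (patients)
    or to the day of the test (staff, who then isolate at home)."""
    days_after = 0 if case.is_staff else 7
    return (case.first_positive - timedelta(days=4),
            case.first_positive + timedelta(days=days_after))


def ward_days(case, window):
    """Set of (ward, day) pairs on which the case was present within the window."""
    start, end = window
    present = set()
    for ward, first_day, last_day in case.ward_stays:
        day = max(first_day, start)
        while day <= min(last_day, end):
            present.add((ward, day))
            day += timedelta(days=1)
    return present


def is_plausible_donor(donor, recipient):
    """True if donor and recipient share a ward on at least one day that lies in
    the donor's infectious period and the recipient's incubation period."""
    if donor.case_id == recipient.case_id:
        return False
    return bool(ward_days(donor, infectious_window(donor))
                & ward_days(recipient, incubation_window(recipient)))


# Illustrative, made-up cases (not study data).
patient_donor = Case("P1", False, date(2020, 12, 1),
                     (("W1", date(2020, 11, 25), date(2020, 12, 10)),))
recipient = Case("P2", False, date(2020, 12, 6),
                 (("W1", date(2020, 12, 1), date(2020, 12, 10)),))
staff_donor = Case("S1", True, date(2020, 12, 1),
                   (("W2", date(2020, 11, 20), date(2020, 12, 20)),))
late_recipient = Case("P3", False, date(2020, 12, 20),
                      (("W2", date(2020, 12, 7), date(2020, 12, 20)),))

print(is_plausible_donor(patient_donor, recipient))     # shared days on W1
print(is_plausible_donor(staff_donor, late_recipient))  # windows do not overlap
```

Representing presence as a set of (ward, day) pairs keeps the overlap test a plain set intersection; the same approach extends to "hospital contact" by substituting a hospital identifier for the ward.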
Following this, we searched for epidemiologically defined outbreaks involving infected patients and staff. Community-onset cases were only included as part of an outbreak if they could have plausibly seeded the outbreak (i.e. their diagnosis preceded the first staff or nosocomial patient case on that ward), and not if admitted during an ongoing outbreak. Combined epidemiological and genomic analysis was performed using R version 4.0.2 [21], with visualisation using the ggplot2 [22] and igraph [23] packages. Multiple sequence alignment and phylogenetic analysis were performed with MAFFT [24] and IQTree [25] respectively; phylogenies were prepared and visualised using Treeswift [26] and Toytree [27].
From 17-November-2020 to 5-January-2021, 1132 individuals (803 inpatients, with 1104 admissions, and 329 staff) were newly diagnosed with PCR-confirmed SARS-CoV-2 infection (Figure 1). The median inpatient age at diagnosis was 67 years (IQR 49-81, range 0-102), and 43% were female. 188/803 (23%) inpatient infections were classified as nosocomial (definite, probable or indeterminate). Length of stay after the first positive PCR was a median of 6 days in community-onset cases vs. 8 days in nosocomial cases (Kruskal-Wallis p<0.001). All-cause mortality within 28 days of a first positive SARS-CoV-2 PCR was 14% after community-onset and 23% after nosocomial infection (p<0.001) (Table 1). Swabs from 764/1132 (67%) PCR-positive individuals were successfully sequenced, including 116/188 (62%) nosocomial cases and 261/329 (79%) staff cases (Tables S1, S2, Figure S1).
Based on standard national definitions, 188/803 (23%) inpatient infections were classified as nosocomial, subgrouped as definite (n=51), probable (n=58) or indeterminate (n=79).
In the UK, patients who acquired SARS-CoV-2 infection in hospital but were discharged before testing positive are not reported as nosocomial and so are not accounted for in these numbers, in part because community testing results are not routinely available to hospitals. Applying an epidemiological outbreak definition considering all ward overlaps led to the identification of 3 outbreaks, the largest containing over 700 individuals, highlighting that it is an unworkable definition when inpatient prevalence is high. Therefore, to more closely replicate IPC practice and provide more interpretable data, the definition of an epidemiological outbreak was restricted to only include ward overlaps with patients and staff on the ward of nosocomial diagnosis.
a) Genomics provides confirmatory evidence of nosocomial acquisition for most nosocomial
Genomics helped clarify uncertainty around cases without a prior-negative PCR (Table 2). In the probable group, two individuals lacked a prior-negative PCR; one was sequenced alongside a donor and confirmed as genomically-linked (≤1 SNP), and therefore likely nosocomially-acquired. In contrast, in the indeterminate group, 26 individuals lacked a prior-negative PCR. 15/24 (63%) were sequenced alongside ≥1 potential donors; only 6/15 (40%) were genomically-linked. Hence absence of a prior-negative PCR in the indeterminate group was associated with a lower likelihood of nosocomial acquisition, but sequencing did support some of these infections having a nosocomial source. Of the 69 individuals with community-onset infection with a prior hospital admission and a plausible donor, 37 were sequenced alongside ≥1 plausible donor(s). 17/37 were genomically-linked, indicating that 17 additional infections previously categorised as "community-associated" were plausibly nosocomially acquired (Table 2).
Amongst the 116 nosocomial cases sequenced, 13 (11%) were genetically-linked to ≥1 other case within 0-1 SNPs but with no documented ward or hospital contact (either patient or staff). These may represent community acquisition in the case of indeterminate cases, but may also be due to undiagnosed/unsequenced individuals providing the missing epidemiological link, e.g. due to incomplete admission and ward-based patient screening and undiagnosed staff cases. Genomic data were unable to provide confirmation of nosocomial acquisition for 22/116 (19%) of sequenced nosocomial cases. Although we can conclude that these cases were not linked to any of the other cases sequenced, we cannot use this information to exclude nosocomial acquisition from undiagnosed/unsequenced individuals, due to incomplete sampling/sequencing.
Genomic data were used to refine the epidemiologically-defined outbreaks (Figure 2B). Considering the cohort as a whole, rather than just those in an epidemiological outbreak as above, of the 764 individuals with samples sequenced, 200 were placed in one of 43 genomic clusters on the basis of being linked to at least one other case within 0-1 SNPs, and 564 were singletons. Therefore, during the period of study, SARS-CoV-2 was introduced to OUH on at least 607 occasions (43 clusters plus 564 singletons), with evidence of onward transmission in 43 clusters (7% of introductions) (Figure 3). The median cluster size was 2 (range 2-32).
Use of hospital-level ward data, accounting for all patient moves before and after testing PCR positive, led to identification of an unfeasibly large epidemiological outbreak of over 700 individuals. However, using these data in combination with WGS provides a more plausible identification of 15 additional individuals linked to outbreaks, who were missed by ward-based application of the outbreak definition due to patient ward moves during their incubation period, highlighting that outbreaks can span multiple wards.
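The way genomic clusters and singletons add up to a count of introductions (43 clusters plus 564 singletons giving 607) can be sketched as single-linkage grouping on pairwise SNP distances. The following is an illustrative Python sketch only: the study reports using R and igraph, and the function name, the union-find approach and the toy distances here are assumptions rather than the authors' implementation.

```python
from collections import defaultdict


def snp_clusters(samples, distances, threshold=1):
    """Group samples so that each member is within `threshold` SNPs of at least
    one other member (single linkage), using a small union-find.

    `distances` maps (sample_a, sample_b) pairs to SNP counts; pairs that are
    absent are treated as unlinked.
    """
    parent = {s: s for s in samples}

    def find(x):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), snps in distances.items():
        if snps <= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for s in samples:
        groups[find(s)].add(s)
    return list(groups.values())


# Toy data (made up): A, B and C are chained within 1 SNP; D and E are distinct.
samples = ["A", "B", "C", "D", "E"]
distances = {("A", "B"): 0, ("B", "C"): 1, ("A", "C"): 1,
             ("A", "D"): 5, ("D", "E"): 3}

groups = snp_clusters(samples, distances)
clusters = [g for g in groups if len(g) > 1]     # evidence of onward transmission
singletons = [g for g in groups if len(g) == 1]  # no sequenced case within 1 SNP
introductions = len(clusters) + len(singletons)  # minimum number of introductions
print(len(clusters), len(singletons), introductions)
```

Single linkage means two samples in the same cluster may differ by more than 1 SNP if they are connected through an intermediate case, which matches the description of cases being "linked to at least one other case within 0-1 SNPs".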
Only 7/25 epidemiologically-defined outbreaks started with a known community-onset case; 2/7 were successfully sequenced, and only 1 was confirmed to be genetically related to subsequent cases within 0-1 SNPs. Despite the partial sequencing of community-onset cases, these data are consistent with limited direct patient-patient spread from known community-onset SARS-CoV-2 infected patients.
Approximately two-thirds of staff infections were genetically distinct in this dataset, with 170/261 (65%) >1 SNP different to all other cases, across 90 work locations. Although these cases occurred on wards with existing outbreaks, they were more common in areas with transient patient contact, e.g. outpatient areas and dialysis units.
The distribution of genomic clusters differed by hospital site, consistent with the extent of exposure to COVID-19 admissions: only isolated/single cases occurred at hospital "D" and only two clusters were observed in hospital "C" (one staff pair and one trio containing 1 staff member and 2 patients). In contrast, hospitals "A" and "B" saw multiple larger clusters (notably, the proportion of cases sequenced was the same across all sites). In addition to differences in COVID-19 case load/infectious pressure, other factors may have played a role, such as: patient pathways, including co-location of non-COVID-19 and COVID-19 cohort wards; estates/facilities, including number of patient side rooms and ventilation; differences in staff mobility between COVID-19 and non-COVID-19 wards; staff facilities (communal/break areas); and adherence to social distancing.
Broadly two patterns of nosocomial acquisition were seen; patterns are shown on a representative example phylogeny in
in specialties with patients highly dependent on nursing care (e.g. trauma, acute medicine, neurology). The median outbreak size was 9 (range 7-32) and median duration 12 days (range 7-35 days).
Although infrequent, these 8 superspreading events accounted for 80% of cases linked to a genomic outbreak. With a serial interval of 5 days, some outbreaks may represent exposure to a single superspreading infection, but most are subsequently propagated amongst staff/patients. Incomplete sampling and asymptomatic individuals without symptom onset dates prevent confident identification of the source of each outbreak; however, in two clusters, staff cases preceded patient cases, so staff could have acted as the index cases. In the remaining six outbreaks, there were no cases diagnosed prior to the first definite/probable nosocomial case to act as a plausible index; therefore the outbreak was likely seeded by an undiagnosed or unsequenced patient/staff/visitor. No outbreaks were seeded by direct patient-patient transmission from known positive patients; however, we cannot exclude a non-sequenced cross-covering staff member providing the missing epidemiological link, by acquiring infection from a known positive patient and seeding an outbreak on a non-COVID ward.
These are characterised by slow "rumbling" accumulation of nosocomial and staff cases on a ward, on both non-COVID and mixed wards with side rooms accommodating COVID and non-COVID patients (Figure 4). They consist of multiple introductions of distinct viral variants over a more prolonged period of time, giving the appearance of a slowly progressing outbreak, but with no, or minimal, onward transmission within the unit. Genomic data is required to distinguish recurrent introductions from genomic outbreaks. Recurrent introductions involving one or more definite/probable case occurred on 6 different wards across hospitals "A", "B" and "D". Each mimicked an outbreak, with 3-6 staff and nosocomial cases occurring on the ward over 2-6 week periods; however, all were genetically distinct introductions.
In this retrospective cohort study of healthcare-associated SARS-CoV-2 transmission using combined epidemiological and sequencing data we make several key findings that challenge current surveillance definitions and reveal most nosocomial transmission occurs from a relatively limited number of highly infectious individuals. [28, 29] . However, in contrast to other nosocomial infections, we found evidence that most nosocomial acquisition occurs in explosive superspreading events, with clusters of genomically-related cases occurring in short time periods, as observed by others for SARS-CoV-2 in both community and hospital settings [9, [30] [31] [32] [33] . WGS added most value when investigating outbreaks during periods of high SARS-CoV-2 prevalence, given high rates of ward-based contact with infected patients. The majority of epidemiologically-defined outbreaks consisted of multiple genomic introductions with some smaller genomic clusters. The role of staff in outbreaks is overestimated from epidemiological data alone, with genomics confirming only 52% of staff epidemiologically placed in an outbreak were genomically-linked, and the majority of sequenced staff cases were genomic singletons (≥2 SNPs from any other case). Additionally, in hospitals not routinely admitting COVID-19 patients, rates of transmission were low, suggesting that isolated acquisition from staff is relatively uncommon, and that transmission requires a 'perfect storm' of mixed COVID and non-COVID wards, emergency admissions and dependent patients accommodated in bays. The main limitations of genomic data were two-fold. Firstly, although epidemiological data is available for all patients, genomic data is limited by sample availability and difficulty of generating sequences at low viral loads. Here 67% of the cohort were successfully sequenced, in line with other similar hospital cohorts (20-70%) [7, 11, 34] . As such, genomic data does not enable nosocomial acquisition to be ruled out. 
Incomplete hospital sequencing datasets suffer from an 'absence of evidence' when attempting to exclude nosocomial acquisition, which should not be mistaken for 'evidence of absence' of nosocomial acquisition. This may be mitigated in the future by integrated community epidemiological and genomic datasets, and could be addressed through probabilistic inference methods that can account for missing data, or by further optimising sequencing yields. Future approaches to evaluating transmission could also consider proxy markers of infectiousness such as Ct values (reflecting viral loads). Secondly, SARS-CoV-2 transmits rapidly relative to its rate of evolution, and outbreaks are too short for substantial genetic variation to accumulate; genomic data alone is therefore insufficient to confer linkage or resolve the ordering of transmission, and a combination of epidemiological and genomic data is required.
suggestive of a superspreading event, rapid action should be taken, which may involve temporary ward closure to mitigate secondary transmission, recognising that those recently infected have the highest viral loads [37] and are most infectious to patients and staff. [38] If resources allow, use of dedicated staff in high risk areas, and self-isolation at home for staff exposed to a high risk event, may also be appropriate. Challenges include recognising outbreaks spanning multiple areas and implementing effective testing and control measures, e.g. for patients who move between wards and staff who cross-cover multiple wards, including during nights and those contracted by outside agencies. Communication that patients discharged from a superspreading ward are at high risk for acquisition should lower the threshold for post-discharge SARS-CoV-2 screening/testing. Variations in rates of nosocomial transmission suggest screening should be prioritised on wards and in specialities with the highest risks (e.g. acute medicine, trauma, neurology in our setting). As vaccination-mediated reductions in inpatient COVID cases occur, it will be important to raise awareness that patients on low risk wards/pathways are still at risk of nosocomial acquisition, in addition to highlighting that in general outbreaks are caused by patients or staff not known to be positive.
This study demonstrates that retrospective analysis of genomic data is useful in some circumstances to guide future IPC practice, with results consistent with similar studies in the UK [7] [8] [9] [10]. It remains to be seen whether the additional costs of generating and analysing this genomic data in near real-time (<48 hours from sample to dissemination of results) are justified by additional IPC gains, or whether the rapid and rigorous application of gold standard epidemiological methods in response to fast accumulation of nosocomial PCR-based diagnoses is the key intervention. This question will be addressed by studies such as the COG-HOCI trial [39]. Regardless of WGS, there is a clear need for automated systems to rapidly assimilate epidemiological data tracking patients over space and time, to allow transmissions based on locations other than ward of diagnosis to be quickly identified and fed to IPC teams. In conclusion, epidemiological investigation can be enhanced by genomic data, to provide insights into nosocomial acquisition and outbreaks in the hospital setting, and provide practical insights to optimise IPC interventions.
DWE declares lecture fees from Gilead, outside the submitted work. No other author has a conflict of interest to declare.
there is an untoward incident or outbreak or high prevalence. Patients received an admission PCR test regardless of symptoms, with weekly asymptomatic ward screening thereafter, increasing to twice weekly from 01-December-2020, with symptomatic testing as required.
Symptomatic and twice monthly asymptomatic staff testing programmes were available throughout the study period as previously described, [5] and voluntary twice weekly lateral flow device testing was introduced for staff from 23-November-2020. [40] Of the 2773 PCRs performed on this cohort during the study period, 2553 (92%) were performed using the Thermo
Samples were sequenced using a multiplex PCR-based approach with the ARTIC LoCost protocol and v3 primers [14] using R9.4
A multiple sequence alignment (MSA) of 764 sequences based on Wuhan-Hu-1 was constructed using MAFFT parameters '--auto --6merpair --addfragments'. Phylogenetic reconstruction from the MSA was performed with IQTree parameters '-m GTR+G -blmin 1e-9'. The phylogeny was rooted with Wuhan-Hu-1 and branch lengths were SNP-scaled.
The estimated evolutionary rate for SARS-CoV-2 is 1.1×10⁻³ substitutions/site/year, which equates to 2-3 substitutions per genome per month. [42] This evolutionary rate is slow compared to the rapid transmission of cases in the community and in hospital, with a serial interval of approximately 5 days. [18] This leads to the possibility that two people may be infected with identical viruses by chance, rather than due to direct person-to-person transmission. Using a cumulative Poisson distribution, with approximations of one mutation every 2 weeks and a serial interval of 5 days, there is a 70% chance of no new SNPs per transmission, 95% of 0-1 SNPs, and 99% of 0-2 SNPs; therefore, the higher the SNP threshold set for considering transmission, the more true transmissions are captured. However, the extent of population-level diversity also needs to be considered, i.e. the probability of two randomly chosen sequences being within 0, 0-1, or 0-2 SNPs within this data set was 0.9%, 2.1% and 3.5% respectively.
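The Poisson figures quoted above can be reproduced with a short calculation. This is a minimal sketch using only the stated approximations (one mutation every 2 weeks, serial interval of 5 days); the variable names are illustrative, and the genome length of 29,903 nucleotides for the Wuhan-Hu-1 reference is an assumption brought in from outside the text to check the per-month substitution rate.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k + 1))


MUTATIONS_PER_DAY = 1 / 14   # approximation from the text: one mutation every 2 weeks
SERIAL_INTERVAL_DAYS = 5     # serial interval used in the study

# Expected number of new SNPs accumulated over one transmission.
lam = MUTATIONS_PER_DAY * SERIAL_INTERVAL_DAYS

# Probability of at most k new SNPs per transmission, for k = 0, 1, 2.
probs = {k: poisson_cdf(k, lam) for k in (0, 1, 2)}
for k, p in probs.items():
    print(f"P(<= {k} SNPs per transmission) = {p:.3f}")

# Cross-check of the quoted evolutionary rate.
GENOME_LENGTH = 29_903       # assumed Wuhan-Hu-1 reference length (not stated in the text)
subs_per_month = 1.1e-3 * GENOME_LENGTH / 12
print(f"Substitutions per genome per month = {subs_per_month:.2f}")
```

Rounded to whole percentages, the three cumulative probabilities correspond to the 70%, 95% and 99% quoted above, and the per-month rate falls within the stated 2-3 substitutions.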
Therefore a plausible genetic relationship was defined as ≤1 SNP between two samples in this study, an association close enough to support transmission whilst minimising overcalling of linkage in non-nosocomial cases.
Clinical Characteristics of 138 Hospitalized Patients With 2019 Novel Coronavirus-Infected Pneumonia in Wuhan, China
Nosocomial COVID-19 infection: examining the risk of mortality. The COPE-Nosocomial Study (COVID in Older PEople)
Covid infections caught in hospital rise by a third in one week
Up to 8,700 patients died after catching Covid in English hospitals. The Guardian
Differential occupational risks to healthcare workers from SARS-CoV-2 observed during a prospective observational study
Risk of hospital admission with coronavirus disease 2019 in healthcare workers and their households: nationwide linkage cohort study
Rapid feedback on hospital onset SARS-CoV-2 infections combining epidemiological and sequencing data
Combined epidemiological and genomic analysis of nosocomial SARS-CoV-2 transmission identifies community social distancing as the dominant intervention reducing outbreaks
Superspreaders drive the largest outbreaks of hospital onset COVID-19 infection
Genomic and healthcare dynamics of nosocomial SARS-CoV-2 transmission
Rapid implementation of SARS-CoV-2 sequencing to investigate cases of health-care associated COVID-19: a prospective genomic surveillance study
Whole-genome sequencing to track SARS-CoV-2 transmission in nosocomial outbreaks
COVID-19: infection prevention and control (IPC)
nCoV-2019 sequencing protocol v3 (LoCost)
Correct measurement and reporting of healthcare associated-COVID-19 cases: letter to Regional Chief Nurses
The Incubation Period of Coronavirus Disease 2019 (COVID-19) From Publicly Reported Confirmed Cases: Estimation and Application
Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing
Rapid review of available evidence on the serial interval and generation time of COVID-19
A Systematic Review of COVID-19 Epidemiology Based on Current Evidence
COVID-19: epidemiological definitions of outbreaks and clusters in particular settings
The R project in statistical computing
Elegant Graphics for Data Analysis
Adding unaligned sequences into an existing alignment using MAFFT and LAST
Toytree: A minimalist tree visualization and manipulation library for Python
Diverse sources of C. difficile infection identified on whole-genome sequencing
Transmission of Staphylococcus aureus between health-care workers, the environment, and patients in an intensive care unit: a longitudinal cohort study based on whole-genome sequencing
Phylogenetic analysis of SARS-CoV-2 in Boston highlights the impact of superspreading events
Clustering and superspreading potential of SARS-CoV-2 infections in Hong Kong
Estimating the overdispersion in COVID-19 transmission using outbreak sizes outside China
Genomic epidemiology of superspreading events in Austria reveals mutational dynamics and transmission properties of SARS-CoV-2
Applying prospective genomic surveillance to support investigation of hospital-onset COVID-19
Transmission dynamics of SARS-CoV-2 within-host diversity in two major hospital outbreaks in South Africa
Nosocomial Outbreak of SARS-CoV-2 in a "Non-COVID-19" Hospital Ward: Virus Genome Sequencing as a Key Tool to Understand Cryptic Transmission
SARS-CoV-2 viral dynamics in acute infections
Transmission dynamics of SARS-CoV-2 in the hospital setting
Project Hospital-Onset COVID-19 Infections Study - Full Text View - ClinicalTrials
Home-based SARS-CoV-2 lateral flow antigen testing in hospital workers
Temporal signal and the phylodynamic threshold of SARS-CoV-2
We would like to thank all OUH staff who participated in the staff testing program, and the staff and medical students who