Use of whole-genome sequencing in the molecular investigation of care-associated HCoV-OC43 infections in a hematopoietic stem cell transplant unit
Authors: Beury, Delphine; Fléchon, Léa; Maurier, Florence; Caboche, Ségolène; Varré, Jean-Stéphane; Touzet, Hélène; Faure, Karine; Dubuisson, Jean; Hot, David; Guery, Benoit; Goffard, Anne
Journal: J Clin Virol (2019-11-16). DOI: 10.1016/j.jcv.2019.104206

BACKGROUND: Respiratory viral infections are recognized as a frequent cause of illness in hematopoietic stem cell transplantation (HSCT) recipients, but HCoV-OC43 infections have rarely been investigated as healthcare-associated infections in this population.

OBJECTIVES: In this report, HCoV-OC43 isolates collected from HSCT patients were retrospectively characterized to identify potential clusters of infection that may indicate hospital transmission.

STUDY DESIGN: Whole-genome and S gene sequences were obtained from nasal swabs using next-generation sequencing, and phylogenetic trees were constructed. Sequence similarity matrices and estimation of the time to the most recent common ancestor were used to compare clusters of patients' sequences. Amino acid substitutions were analysed.

RESULTS: Genotypes B, E, F and G were identified. Two clusters of patients were defined from chronological data and phylogenetic trees. Analysis of amino acid substitutions in the S protein sequences identified substitutions specific to genotype F strains circulating among European people.

CONCLUSIONS: HCoV-OC43 may be implicated in healthcare-associated infections.

Although highly pathogenic human coronaviruses (HCoV) such as SARS-CoV and MERS-CoV have emerged recently, the majority of HCoV infections are caused by HCoV-229E, HCoV-NL63, HCoV-OC43 and HCoV-HKU1. These coronaviruses are mainly recognized as causative agents of community-acquired infections.
With the exception of SARS-CoV and MERS-CoV, few published studies have addressed healthcare-associated HCoV infections [1] [2] [3] [4] [5]. HCoV-229E, HCoV-NL63, HCoV-OC43 and HCoV-HKU1 are usually associated with mild disease in immunocompetent patients but can cause severe respiratory tract infections in fragile populations. Indeed, the development of molecular diagnostic tests has shown that HCoVs are clearly responsible for severe infections in immunocompromised patients, including hematopoietic stem cell transplant (HSCT) recipients [6] [7] [8]. HCoV-OC43, HCoV-229E, HCoV-NL63 and HCoV-HKU1 account for 6.7-15.4% of the viruses detected during respiratory infections in HSCT recipients [9]. HCoV-OC43 is detected worldwide and circulates year-round, with a slight winter predominance in temperate countries. Efficient transmission via small droplets and prolonged shedding by HSCT recipients contribute to virus dissemination and highlight the need for a better understanding of hospital-acquired HCoV infections [10] [11] [12] [13]. Although an outbreak of HCoV-NL63 respiratory infections in a long-term care facility has recently been reported [14], limited data are available on the overall genomic characteristics of HCoV-OC43 healthcare-associated infections.

The main objective of this work was to identify potential clusters of infection that may indicate hospital transmission. Next-generation sequencing (NGS) was used to generate whole-genome sequences from nasal swabs, and phylogenetic analyses were then applied to characterize the sequences. The secondary objective was to identify molecular signatures of the circulating strains. Eight HSCT recipients in whom HCoV-OC43 was detected in nasal swabs by multiplex respiratory viral PCR (Anyplex II RVS 16 detection kit) at Lille University Hospital between 2013 and 2015 were included in the study. Demographic and clinical data were collected.
All patients were informed that they were included in a research cohort and agreed that their biological samples could be used for research purposes. The nasal swabs studied here were included in the biological collection of Lille University Hospital, declared to the French Ministry of Higher Education and Research (reference number DC-2008-642). Under French law, this declaration implies approval by an ethics committee. Patients are designated MDSX, where MDS stands for "Maladies Du Sang" (blood diseases) and X is the patient number.

Viral genome sequencing was performed as described by Maurier et al. [15]. Briefly, extracted viral RNA was reverse transcribed and PCR amplified using multiple primer pairs, and the amplicons were pooled before preparing sequencing libraries with the NxSeq AmpFREE LowDNA (Lucigen) library kit and barcoding them with Illumina-compatible adaptors. Libraries were sequenced paired-end (2 × 300 cycles) on an Illumina MiSeq. Sequencing analysis and genome assembly were performed as described by Maurier et al. [15]. The 8 newly sequenced genomes (MDS2, MDS4, MDS6, MDS11, MDS12, MDS14, MDS15 and MDS16) were annotated by comparison with the HCoV-OC43 isolate from Mexico (GenBank accession KX344031) [16] using BlastN [17]. This isolate served as a reference and allowed us to compute the coordinates of the following functional elements: the ORF1a, ORF1b, ns2A, HE, S, M and N genes.

For the whole-genome evolutionary analysis of MDS2, MDS4, MDS6, MDS11, MDS12, MDS14, MDS15 and MDS16, we selected 40 published full-length human coronavirus genomes with diverse genotypes and geographical origins, for a total of 48 sequences. We also examined the phylogeny of the S gene specifically, adding 12 sequences from partially sequenced genomes from Caen hospital (France) and one published sequence from China (supplementary table), for a total of 61 sequences.
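Transferring gene coordinates from the KX344031 reference onto a new genome amounts to walking a gapped pairwise alignment and counting residues in both rows. The Python sketch below illustrates that idea under stated assumptions: the alignment is already available as two equal-length strings with '-' for gaps (as an aligner such as BlastN reports for each hit), positions are 1-based, and the helper names `map_ref_to_query` and `transfer_feature` are hypothetical. It is not the authors' annotation pipeline, and a real pipeline would handle feature boundaries that fall in gaps more carefully.

```python
from __future__ import annotations


def map_ref_to_query(ref_aln: str, qry_aln: str,
                     ref_start: int = 1, qry_start: int = 1) -> dict[int, int | None]:
    """Map 1-based reference positions to 1-based query positions.

    ref_aln and qry_aln are the two rows of a gapped pairwise alignment
    ('-' marks a gap); ref_start/qry_start are the coordinates of the first
    aligned residue in each sequence. Reference positions facing a gap in
    the query map to None.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned rows must have equal length")
    mapping: dict[int, int | None] = {}
    r, q = ref_start, qry_start
    for a, b in zip(ref_aln, qry_aln):
        if a != "-":
            # Record where this reference residue lands (None if deleted in query).
            mapping[r] = q if b != "-" else None
            r += 1
        if b != "-":
            q += 1  # every non-gap query character advances the query coordinate
    return mapping


def transfer_feature(mapping: dict[int, int | None],
                     start: int, end: int) -> tuple[int, int] | None:
    """Return query coordinates for a reference feature [start, end],
    or None if either boundary falls in a query gap or outside the alignment."""
    qs, qe = mapping.get(start), mapping.get(end)
    if qs is None or qe is None:
        return None
    return qs, qe
```

For example, with the toy alignment `ATG-CCGTAA` (reference) over `ATGTCC-TAA` (query), a feature spanning reference positions 4 to 5 maps to query positions 5 to 6, whereas a feature ending at reference position 6, which is deleted in the query, is returned as None.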
For each of the two datasets, three distantly related bovine coronaviruses served as the outgroup, and a multiple sequence alignment was built with MUSCLE in SeaView 4.6.4, followed by manual correction guided by the protein-coding sequences in coding regions [18, 19]. Based on the multiple sequence alignments, the similarity between each pair of nucleotide sequences was computed using the Sequence Manipulation Suite [20]. Phylogenetic trees were then inferred by maximum likelihood in PhyML 3.1 under a general time reversible (GTR) nucleotide substitution model with a proportion of invariant sites (+I) and gamma-distributed rate heterogeneity (+Γ), with 1000 bootstrap replicates [21]. The substitution model was chosen by optimizing the AIC score with Smart Model Selection [22]. A total of 30,815 nucleotide positions were included in the whole-genome analysis and 4137 nucleotide positions in the S gene analysis. FigTree 1.4.4 was used to produce the figures [23].

Comparative amino acid analyses of the ORF1a, ORF1b, ns2A, HE, S, M and N proteins were performed on the 58 sequences for which complete genomes were available (8 MDS, the 40 previously included and 10 others) and on 76 sequences for the S gene only (8 MDS + 68 others) (supplementary table). For this analysis, we added sequences not used in the phylogenetic analysis: because each position of the sequences is examined separately, a larger set of sequences is needed, which reduces the chance of selecting erroneous substitutions. The goal is to ensure that the informative substitutions found are characteristic of a particular genotype, that is, found only in that genotype and shared by all known sequences of that genotype. In both analyses, signature amino acid substitutions were determined by selecting a subset of informative sites with DIVEIN [24]. Conserved sites were highlighted manually with colour. We followed the methodology used in [25].
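The genotype-signature criterion stated above (a residue is a signature of a genotype if every sequence of that genotype carries it and no sequence of any other genotype does) can be written as a direct scan over alignment columns. The Python sketch below is a hypothetical re-implementation of that rule for illustration only, not DIVEIN's algorithm; columns are numbered on the alignment rather than on the ATCC VR759 reference. It also includes a simple pairwise identity function, assuming identity is computed over columns where neither sequence has a gap (the exact convention used by the Sequence Manipulation Suite may differ). The function names are hypothetical.

```python
from __future__ import annotations


def signature_sites(groups: dict[str, list[str]]) -> dict[str, list[tuple[int, str]]]:
    """Return, per genotype, the (1-based alignment column, residue) pairs
    shared by every member of the genotype and absent from all sequences
    of every other genotype."""
    lengths = {len(s) for seqs in groups.values() for s in seqs}
    if len(lengths) != 1:
        raise ValueError("all sequences must come from the same alignment")
    (ncol,) = lengths
    result: dict[str, list[tuple[int, str]]] = {g: [] for g in groups}
    for col in range(ncol):
        for g, seqs in groups.items():
            residues = {s[col] for s in seqs}
            if len(residues) != 1:
                continue  # not conserved within this genotype
            (res,) = residues
            if res == "-":
                continue  # ignore columns where the genotype shares a gap
            # Residues seen at this column in all other genotypes.
            others = {s[col] for h, o in groups.items() if h != g for s in o}
            if res not in others:
                result[g].append((col + 1, res))
    return result


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences over columns where
    neither sequence has a gap (an assumed convention)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(x == y for x, y in pairs)
    return 100.0 * same / len(pairs)
```

For instance, with toy genotypes F = [`MYHKL`, `MYHKV`], G = [`MYYKL`, `MFYKL`] and B = [`MYYRL`], column 3 (H) is the only signature of F and column 4 (R) the only signature of B; G has none, because its members disagree at column 2 and share their other residues with other genotypes.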
Notably, we added Asian sequences that were also used in [25] and [26]. We estimated divergence times with BEAST 1.8.4 [12] for both the whole genome and the S gene [27]. An exponential population size model and a relaxed clock with a coalescent tree prior (Bayesian skyline plot (BSP) for the S gene and uncorrelated exponential distribution for the whole genome) were tested to estimate the time to the most recent common ancestor (tMRCA). All information on the sequences used (identifiers, strains, bibliographical references) is summarized in supplementary Table S1. Complete genome sequences were deposited in GenBank under accession numbers MK303619 to MK303625 and MK327281.

HCoV-OC43 was detected by RT-PCR in nasal swabs collected from 8 HSCT recipients. Table 1 shows the characteristics of these patients. The median time from HSCT to HCoV-OC43 infection was 239.5 days (range 13-1052 days). All patients except MDS4 had graft-versus-host disease, for which they received a daily steroid dose higher than 1 mg/kg. All patients except MDS4 died; however, in no case was death directly related to HCoV-OC43 infection. Fig. 1 presents the timeline of events for the patients involved in the HCoV-OC43 infection clusters, which is why patients MDS15 and MDS16 are not shown. Patients frequently attended the hospital, either for outpatient consultations or for inpatient stays. They shared the same hospital units and the same healthcare worker teams. At the time of nasal swab collection, all patients had acute respiratory symptoms. Other respiratory pathogens were detected in nasal swabs from only 2 patients (Table 1). No concomitant bacterial or fungal co-infection was detected. All episodes occurred during winter.

To characterize the overall diversity of HCoV-OC43 circulating among the 8 HSCT patients, 7 full-length genome sequences were obtained from nasal swabs by NGS.
The sequence obtained from patient MDS15 was incomplete, as the region between positions 7473 and 7689 in ORF1ab was missing. Given the small size of the missing fragment, this partial sequence was also included in all phylogenetic analyses. To genotype the HCoV-OC43 isolates, maximum-likelihood phylogenetic trees were reconstructed from the full-length genome and S gene sequences obtained, together with sequences retrieved from GenBank (Fig. 2A/B). The tree obtained from full-length genomes showed that the MDS sequences belong to 4 known genotypes: B, E, F and G (Fig. 2A) [25, 26, 28]. The MDS4, MDS6, MDS14 and MDS12 sequences, belonging to genotype F, are grouped into a single cluster separated from the other genotype F sequences, with a bootstrap value of 79.6%. These sequences were obtained from samples collected in 2013, except for MDS12, which was obtained from a sample collected in 2014. This cluster was named "cluster 2013" (Fig. 2A). To support the hypothesis of a common origin of the clinical isolates, the divergence times of the MDS sequences were estimated with BEAST. The estimated mean evolutionary rate was 4.0 × 10^-4 (3.4 × 10^-4 to 4.5 × 10^-4) substitutions/site/year for the full-length genome and 5.5 × 10^-4 (4.5 × 10^-4 to 6.5 × 10^-4) substitutions/site/year for the S gene. Based on both the full-length genome and the S gene data, the emergence of genotype A was dated to the 1960s and that of genotypes B to D to the late 1990s to early 2000s (Table 2).

Using the prototype strain ATCC VR759 as the reference for amino acid positions, sequence MDS16 shares 19 amino acid substitutions with the other genotype B sequences and carries 18 specific amino acid substitutions, distributed across the whole genome, relative to the other genotype B sequences (Fig. 3A).
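Substitutions such as Y177H are named by the reference residue, its position in the reference strain (here ATCC VR759) and the residue observed in the query. The Python sketch below assumes that the reference and query proteins are already aligned; it numbers positions on the ungapped reference (skipping columns where the reference has a gap), then uses set operations to split one sequence's substitutions into those shared with its genotype and those specific to it. Here "shared" is assumed to mean present in every other member of the genotype and "specific" to mean present in none of them, which is one possible reading of the counts above; the function names and toy sequences are hypothetical.

```python
from __future__ import annotations


def call_substitutions(ref_aln: str, qry_aln: str) -> set[str]:
    """List substitutions of an aligned query relative to the reference,
    numbered on the ungapped reference, e.g. 'Y177H'. Columns where the
    reference has a gap are skipped (insertions are not reported); a gap
    in the query is reported as a deletion written with '-'."""
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned rows must have equal length")
    subs: set[str] = set()
    pos = 0
    for r, q in zip(ref_aln, qry_aln):
        if r == "-":
            continue  # insertion relative to the reference: no reference position
        pos += 1
        if q != r:
            subs.add(f"{r}{pos}{q}")
    return subs


def shared_and_specific(target: set[str],
                        others: list[set[str]]) -> tuple[set[str], set[str]]:
    """Split the target's substitutions into those found in every other
    member of its genotype and those found in none of them."""
    if not others:
        return set(), set(target)
    in_all = set.intersection(*others)
    in_any = set.union(*others)
    return target & in_all, target - in_any
```

With the toy reference `MKY-LD` and query `MKHALN`, the calls are Y3H and D5N; the inserted A is ignored because the reference has a gap at that column. Substitutions found in some but not all other members fall into neither set under this definition.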
Likewise, the sequences of cluster 2013 carry the genotype F-specific substitution (Y177H) and 9 additional amino acid substitutions relative to the other genotype F sequences [25]. Notably, the MDS6, MDS12 and MDS14 sequences show 12 additional substitutions relative to genotype F, which are absent from the MDS4 sequence. The MDS15 sequence, for which only the S region could be analyzed, shares the same amino acid substitutions as the other genotype E sequences and shows only 1 specific substitution relative to them. The MDS2 and MDS11 sequences appear similar to the other genotype G sequences. Because S gene sequences of other French HCoV-OC43 isolates have been published, they could be included in the phylogenetic analysis: the MDS sequences of cluster 2013 group with other French sequences, whereas those of cluster 2014 appear more closely related to Malaysian, Chinese and American sequences (Fig. 2B) [29].

The analysis of signature amino acid substitutions showed 3 notable features in the genotype B sequence MDS16: absence of the D471N substitution, the L482F substitution, which is shared with other sequences of genotypes B, C, F and G, and the L505S substitution, which is also observed in genotype C sequences. For genotype E, the amino acid ... (Fig. 3B). These data show that no amino acid substitution appears specific to the genotype B and G sequences circulating in France. However, the genotype F sequences circulating among French people had a distinctive molecular signature: absence of the D1264H and C1319S substitutions, and presence of the A1054S and D1252G substitutions.

In this work, we retrospectively describe 8 HCoV-OC43 infections that occurred during the winters of 2013 and 2014 in the HSCT unit of Lille University Hospital. Six patients were divided into 2 clusters, named 2013 and 2014 after the year of occurrence, on the basis of their timing and genomic characteristics.
The HCoV-OC43 cases described in this report can be classified as healthcare-associated infections, defined as infections arising in patients from the community who have a history of previous exposure to healthcare but do not meet the criteria for nosocomial infection [30]. The 6 patients in the two clusters regularly attended the hospital for inpatient and outpatient care, including consultations, day hospitalizations and inpatient stays. They were therefore frequently in contact with healthcare professionals and hospital visitors, who may be considered multiple potential sources of HCoV infection during periods of usual HCoV-OC43 circulation in France [31]. Outside the hospital, they lived in geographically unrelated places and no common activity could be identified. Hospital attendance therefore appears to be the only link between the patients, even though we were unable to identify any direct contact between them. HCoV-OC43 can be transmitted by an asymptomatic patient with prolonged shedding, or by a healthcare worker who is either asymptomatic or has an upper respiratory tract infection [10, 32]. Phylogenetic analyses showed that the whole-genome sequences within each cluster were very closely related, had very high levels of similarity and shared recent common ancestors. All these elements provide additional evidence that these infections can be classified as healthcare-associated infections. As observed for influenza viruses, parainfluenza virus type 3 and respiratory syncytial virus, our data show that HCoV-OC43 is involved in healthcare-associated infections [33] [34] [35]. Because we were unable to identify the index case of the cluster 2013 infections, the infection of patient MDS12 during winter 2014 with a strain closely related to those circulating the previous year is surprising.
However, previous studies have shown that the genomes of other highly and less pathogenic coronaviruses varied little during hospital-acquired outbreaks [36, 37]. Our data suggest that HCoV-OC43 sequences are also very stable. Phylogenetic trees based on the whole genome and the S gene show that genotypes B, E, F and G currently circulate among French patients, in agreement with a previous report [29]. Although HCoV-OC43 has previously been detected in patients with haematological malignancies, the HCoV-OC43 genotypes involved in these infections have rarely been investigated [38] [39] [40]. The analysis of more HCoV-OC43 strains from other healthcare-associated infections would reveal the relative prevalence of each genotype in this type of infection in different localities. The divergence times of the sequences included in the phylogenetic analyses were estimated to gather additional evidence for a common origin of the clinical isolates. Our estimates are similar to those previously described [25, 28], which supports the tMRCA method used. Our results show that each cluster diverged from a distinct ancestor, dated to 2010.9 for cluster 2013 and to 2011.5 for cluster 2014. Because few full-length HCoV-OC43 genomes have been published in databases, phylogenetic analyses were also conducted on S gene sequences. The results obtained from whole-genome and S gene sequences were similar, and the many published S gene sequences allowed comparison of the HCoV-OC43 sequences isolated in Lille with those isolated in other parts of the world. The strains described here probably represent a subset of the strains currently circulating among European people.
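Two of the reported quantities can be restated in more intuitive units. Multiplying a mean rate in substitutions/site/year by the number of aligned positions gives the expected number of substitutions accumulated per lineage per year, and a decimal-year tMRCA such as 2010.9 can be converted to an approximate calendar date. The Python sketch below does both with the point estimates reported in this article (4.0 × 10^-4 over 30,815 positions, 5.5 × 10^-4 over 4137 positions, tMRCAs of 2010.9 and 2011.5). It ignores the credible intervals. The conversion convention (fractional part times the number of days in that year) is an assumption, because how BEAST decimal dates map onto calendar days depends on how the tip dates were encoded. The helper names are hypothetical.

```python
from datetime import date, datetime, timedelta


def substitutions_per_year(rate: float, n_sites: int) -> float:
    """Expected substitutions per lineage per year: rate (subst/site/year) x sites."""
    return rate * n_sites


def decimal_year_to_date(y: float) -> date:
    """Approximate calendar date for a decimal year such as 2010.9,
    scaling the fractional part by the number of days in that year."""
    year = int(y)
    start = datetime(year, 1, 1)
    days_in_year = (datetime(year + 1, 1, 1) - start).days
    return (start + timedelta(days=(y - year) * days_in_year)).date()


if __name__ == "__main__":
    # Mean rates and alignment lengths as reported in the text.
    print(round(substitutions_per_year(4.0e-4, 30815), 1))  # whole genome
    print(round(substitutions_per_year(5.5e-4, 4137), 1))   # S gene
    # tMRCA point estimates for the two clusters.
    print(decimal_year_to_date(2010.9))  # cluster 2013
    print(decimal_year_to_date(2011.5))  # cluster 2014
```

Under these assumptions, the whole-genome rate corresponds to roughly 12 substitutions per lineage per year and the S gene rate to roughly 2, and the two cluster tMRCAs fall in late November 2010 and early July 2011, respectively.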
However, our data show that two kinds of HCoV-OC43 strains circulated among the patients included in the study: genotype G strains closely related to Asian strains, and genotype F strains more closely related to previously described European strains [25, 26, 28, 29, 41]. Amino acid substitution analyses suggested that some substitutions, especially in the S protein, could be specific to strains circulating in the European population; however, the number of strains analyzed here and the number of available sequences from European HCoV-OC43 isolates are too small, and further studies will be needed to confirm these findings. Our study has some limitations. As previously mentioned, we were unable to identify the index case of the HCoV-OC43 infections in the HSCT unit or to determine precisely the contact history of the patients. Thus, we cannot exclude that other patients were involved in the HCoV-OC43 healthcare-associated infections described. Similarly, we were unable to assess infection or asymptomatic carriage among the healthcare workers of the HSCT unit. Some reports have shown that different HCoV-OC43 strains circulated during the same winter, suggesting that HCoV may be introduced into hospitals through multiple routes [25, 42, 43]. In fact, the HCoV strains involved in healthcare-associated infections were probably acquired from strains circulating in the community at that time. For all these reasons, the cases reported in this study did not meet the definition of nosocomial transmission, and we used the term coronavirus healthcare-associated infections. Another weakness of our work concerns the identification of amino acid substitution patterns specific to strains circulating among French people. Our data suggest that some substitutions might be specific to strains circulating in Europe, but the analysis of a larger number of strains would be necessary to confirm this hypothesis.
To conclude, we retrospectively described the molecular investigation of HCoV-OC43 infections in an HSCT unit using whole-genome sequencing combined with advanced phylogenetic analyses. These tools allowed us to associate HCoV-OC43 infections with healthcare. Moreover, we suggest that two kinds of HCoV-OC43 strains circulate in the French population, one sharing common ancestors with Asian strains and the other closely related to a European lineage.

This work was supported by a Lille Hospital University grant (EPI-CoV) and by CNRS and Lille University (PEPS 2015). The Transparency document associated with this article can be found in the online version. The manuscript has not been published or presented elsewhere, and the authors have no conflict of interest to declare.

References
Coronavirus-related nosocomial viral respiratory infections in a neonatal and paediatric intensive care unit: a prospective study
A major outbreak of severe acute respiratory syndrome in Hong Kong
Outbreaks of human coronavirus in a pediatric and neonatal intensive care unit
Case characteristics among Middle East respiratory syndrome coronavirus outbreak and non-outbreak cases in Saudi Arabia from
Nosocomial amplification of MERS-coronavirus in South Korea
A prospective hospital-based study of the clinical impact of non-severe acute respiratory syndrome (Non-SARS)-related human coronavirus infection
The clinical impact of coronavirus infection in patients with hematologic malignancies and hematopoietic stem cell transplant recipients
Epidemiologic and clinical characteristics of coronavirus and bocavirus respiratory infections after allogeneic stem cell transplantation: a prospective single-center study
Human rhinovirus and coronavirus detection among allogeneic hematopoietic stem cell transplantation recipients
Prolonged shedding of human coronavirus in hematopoietic cell transplant recipients: risk factors and viral genome evolution
Genetic variability of human coronavirus OC43-, 229E-, and NL63-like strains and their association with lower respiratory tract infections of hospitalized infants and immunocompromised patients
Co-circulation of four human coronaviruses (HCoVs) in Queensland children with acute respiratory tract illnesses in
Human Respiratory Coronaviruses Detected In Patients with Influenza-Like Illness in Arkansas, USA
Severe Respiratory Illness Outbreak Associated with Human Coronavirus NL63 in a Long-Term Care Facility
A complete protocol for whole-genome sequencing of virus from clinical samples: application to coronavirus OC43
Complete genome sequence of human coronavirus OC43 isolated from Mexico
Basic local alignment search tool
MUSCLE: multiple sequence alignment with high accuracy and high throughput
SeaView version 4: a multiplatform graphical user interface for sequence alignment and phylogenetic tree building
The sequence manipulation suite: JavaScript programs for analyzing and formatting protein and DNA sequences
New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0
SMS: Smart Model Selection in PhyML
DIVEIN: a web server to analyze phylogenies, sequence divergence, diversity, and informative sites
Identification and evolutionary dynamics of two novel human coronavirus OC43 genotypes associated with acute respiratory infections: phylogenetic, spatiotemporal and transmission network analyses
Genotype shift in human coronavirus OC43 and emergence of a novel genotype by natural recombination
Bayesian phylogenetics with BEAUti and the BEAST 1.7
Molecular epidemiology of human coronavirus OC43 reveals evolution of different genotypes over time and recent emergence of a novel genotype due to natural recombination
Genomic analysis of 15 human coronaviruses OC43 (HCoV-OC43s) circulating in France from 2001 to 2013 reveals a high intra-specific diversity with new recombinant genotypes
Classification of healthcare-associated infection: a systematic review 10 years after the first proposal
Seasonal variation of respiratory pathogen colonization in asymptomatic health care professionals: a single-center, cross-sectional, 2-season observational study
Respiratory virus shedding in a cohort of on-duty healthcare workers undergoing prospective surveillance
Respiratory viral infections after hematopoietic stem cell transplantation in children
The impact of infection control upon hospital-acquired influenza and respiratory syncytial virus, Scand
Impact of respiratory viruses in hospital-acquired pneumonia in the intensive care unit: a single-center retrospective study
PCR sequencing of the spike genes of geographically and chronologically distinct human coronaviruses 229E
Direct sequencing of SARS-Coronavirus S and N genes from clinical specimens shows limited variation
Epidemiology of viral respiratory tract infections in an outpatient haematology facility
Significant transplantation-related mortality from respiratory virus infections within the first one hundred days in children after hematopoietic stem cell transplantation
Clinical significance of human coronavirus in Bronchoalveolar Lavage samples from hematopoietic cell transplant recipients and patients with hematologic malignancies
Comparative molecular epidemiology of two closely related coronaviruses, bovine coronavirus (BCoV) and human coronavirus OC43 (HCoV-OC43), reveals a different evolutionary pattern
A novel human coronavirus OC43 genotype detected in mainland China
Epidemiology characteristics of human coronaviruses in patients with respiratory infection symptoms and phylogenetic analysis of HCoV-OC43 during 2010-2015 in Guangzhou

The authors thank J. Ogiez for the management of the biological collection. Supplementary material related to this article can be found in the online version at https://doi.org/10.1016/j.jcv.2019.104206.