key: cord-0999678-4iqd90x0
authors: Li, Ci-Xiu; Burrell, Rebecca; Dale, Russell C; Kesson, Alison; Blyth, Christopher C; Clark, Julia E; Crawford, Nigel; Jones, Cheryl A.; Britton, Philip N.; Holmes, Edward C.
title: Diagnosis and analysis of unexplained cases of childhood encephalitis in Australia using metagenomic next-generation sequencing
date: 2021-05-10
journal: bioRxiv
DOI: 10.1101/2021.05.10.443367
sha: e7f5f15c10f3f63490f69a9261c43e5b44d64ec8
doc_id: 999678
cord_uid: 4iqd90x0

Encephalitis is most often caused by a variety of infectious agents, the identity of which is commonly determined through diagnostic tests utilising cerebrospinal fluid (CSF). Immune-mediated disorders are also a differential in encephalitis cases. We investigated the clinical characteristics and potential aetiological agents of unexplained encephalitis through metagenomic next-generation sequencing of residual clinical samples of multiple tissue types and independent clinical review. A total of 43 specimens, from both sterile and non-sterile sites, were collected from 18 encephalitis cases with no cause identified by the Australian Childhood Encephalitis study. Samples were subjected to total RNA sequencing to determine the presence and abundance of potential pathogens, to reveal mixed infections, pathogen genotypes, and epidemiological origins, and to describe the possible aetiologies of unexplained encephalitis. From this, we identified five RNA and two DNA viruses associated with human infection from both non-sterile (nasopharyngeal aspirates, nose/throat swabs, urine, stool, rectal swab) and sterile (cerebrospinal fluid, blood) sites. These comprised two human rhinoviruses, two human seasonal coronaviruses, two polyomaviruses and one picobirnavirus. With the exception of picobirnavirus, all have been previously associated with respiratory disease. Human rhinovirus and seasonal coronaviruses may be responsible for five of the encephalitis cases reported here.
Immune-mediated encephalitis was considered clinically likely in six cases and RNA sequencing did not identify a possible pathogen in these cases. The aetiology remained unknown in nine cases. Our study emphasises the importance of respiratory viruses in the aetiology of unexplained childhood encephalitis and suggests that the routine inclusion of non-CNS sampling in encephalitis clinical guidelines/protocols could improve the diagnostic yield.

Author Summary

Encephalitis is caused by both infectious agents and auto-immune disorders. However, the aetiological agents, including viruses, remain unknown in around half the cases of encephalitis in many cohorts. Importantly, diagnostic tests are usually based on the analysis of cerebrospinal fluid, which may limit their utility. We used a combination of meta-transcriptomic sequencing and independent clinical review to identify the potential causative pathogens in cases of unexplained childhood encephalitis. Accordingly, we identified seven viruses associated with both sterile and non-sterile sampling sites. Human rhinovirus and seasonal coronaviruses were considered most likely responsible for five of the 18 encephalitis cases studied, while immune-mediated encephalitis was considered the cause in six cases, and we were unable to determine the aetiology in nine cases. Overall, we demonstrate the role of respiratory viruses as a cause of unexplained encephalitis and show that sampling sites other than cerebrospinal fluid are of diagnostic value.

Metagenomic next-generation sequencing (mNGS) has successfully identified a broad range of infectious agents in a range of clinical syndromes [8-10] and is gradually being established as a powerful and reliable diagnostic platform [8]. Indeed, mNGS has identified an increasing number of novel or unexpected pathogens associated with encephalitis [11].
Total RNA sequencing (meta-transcriptomics) may be especially powerful as it provides a simple way to characterise all the actively transcribing microbes in a sample and estimate their abundance [12-14]. Total RNA sequencing identifies not only the RNA viruses present in a sample, but also those DNA viruses, bacteria, parasites and fungi that are actively transcribing RNA [15].

The Australian Childhood Encephalitis (ACE) study has comprehensively identified, collected data on, and reviewed cases of this severe syndrome nationally through active sentinel hospital surveillance since 2013, and requested banking of salvaged laboratory biospecimens from cases [7]. Herein, we describe the use of total RNA sequencing on samples of differing tissue types obtained from 18 cases of childhood encephalitis categorised as having unknown

Diagnostic testing for a range of viral agents by PCR on CSF, blood/plasma, sputum and stool samples was performed in the local hospitals. All results were negative with the exception of one positive detection of rotavirus from a stool sample, and one detection each of rhinovirus and coronavirus (OC43) in respiratory samples (Table S1). None were considered significant to the clinical presentation. Similarly, serological tests were negative (not consistent with acute infection) for all cases (Table S1). Some cases were also tested for antibodies against ganglioside, muscle specific tyrosine kinase (MUSK), acetylcholine receptor (AChR), N-Methyl-D-Aspartate receptor (NMDAR), voltage-gated potassium channel (VGKC) and neuromyelitis optica (NMO). All were negative (Table S1).

A multidisciplinary expert team (PNB, RD, AK, CAJ) re-reviewed the clinical presentation and available diagnostic testing using published criteria for assigning causation in encephalitis and criteria for clinically diagnosing autoimmune encephalitis [16, 17].
Following this review, nine were considered to likely have infectious causes, six immune-mediated causes, and three could not be further classified (

(NPA) and endotracheal aspirates (ETA), were utilised in metagenomic testing (Figure 1B).

All 43 samples were individually examined using meta-transcriptomics, generating 2.77 billion raw paired-end reads in total (between 12.2 and 81.0 million reads per sample) (Table S2). For each library, 48.46 to 85.79% of the reads were retained after removal of low complexity and redundant reads (Figure S1, Table S2), and 1.10 to 80.19% of these reads were subsequently retained after removal of human reads (Figure S1, Table S2). The resultant sequence reads and assembled contigs were annotated against NCBI reference databases, revealing a number of microbes, including potential pathogens. These are described in more detail below.

lower than 100 RPM in case 09 (blood) (Figure 2, Table 2).

To identify specific viral genotypes/lineages and their epidemiological origins, phylogenetic analyses were performed using the resulting complete or partial virus genomes (

publicly available sequence (Figure 3D, Table S3).

One picobirnavirus was identified from one stool sample (case 17), and the near

(Table S3).

respiratory swab) and one in a sterile site (blood) (

of clinical evaluation and mNGS analysis they are considered possible causative agents. Targeted assays of the CSF or antibody tests could be beneficial for pathogen determination in future cases.

Another possible explanation for the pathogenicity of these respiratory viruses at non-CSF sites is that they represent para-infectious encephalitis resulting from indirect CNS

Logic model for determining pathogenic potential of mNGS detections

We applied a logic model (Figure S2)

Abundance levels (reads per million total reads) were estimated using MetaPhlAn2. The taxonomic tree was visualized using GraPhlAn.
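The abundance unit used here, reads per million total reads (RPM), is a simple normalisation of a read count by library size. The sketch below shows only that arithmetic; it is not the MetaPhlAn2 procedure used in the study. The function name and the example counts are hypothetical, and whether "total reads" means raw or filtered reads is an assumption left to the caller.

```python
def reads_per_million(mapped_reads: int, total_reads: int) -> float:
    """Normalise a read count to reads per million total reads (RPM).

    Assumption: `total_reads` is whatever library size the analysis
    normalises against; the text does not say whether this is the raw
    or the filtered read count.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if mapped_reads < 0:
        raise ValueError("mapped_reads must not be negative")
    # Multiply first so that simple integer inputs divide cleanly.
    return mapped_reads * 1_000_000 / total_reads


# Hypothetical counts for illustration only, not values from the study.
example_rpm = reads_per_million(mapped_reads=1_500, total_reads=30_000_000)
print(f"{example_rpm:.1f} RPM")
```

Normalising in this way makes abundances comparable between libraries of different depth, which matters here because the per-sample read counts vary widely.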
^Abundance was determined to be high if sequence reads were above 10000 RPM, or low if below 1000 RPM. In the event of detection in multiple samples, sterile samples took priority.

Table S1. Summary of laboratory tests performed on the encephalitis cases.

1   N N N P* N N
2   N N N N
3   N N N N N P# N N
4   N N N N N
5   N N N N N N N N
6   N N N N N N N N N N
7   N N N N N N N N N N N E
8   N N N N N N N N N N N N P
9   N N N N N N
10  N N N N N N N N
11  N N N N N N N N N N N N
12  N N N N N N
13  N N N N N N N N N N P N N N N
14  N N N N N N N
15  N N N N N N N N N N N N
16  N N N N N N N N N N
17  N N N N
18  N N N N

Table S3. Identity of viruses identified in this study with the most closely related sequence available on public sequence databases.

Genes or genome alignments used in the phylogenetic analysis    Domain    Alignment length (bp)

HCoV-OC43/03-16118/NSW/AU/2019    MK303622/HCoV-OC43/MDS11    99.73%    Spike protein
MH940245/HCoV-HKU1/SI17244    99.75%    Partial Spike protein
MH940245/HCoV-HKU1/SI17244    99.75%    Partial Spike protein
FJ445114/HRV-A9/F01    96.48%    Near complete genome
FJ445137/HRV-B52/F10    92.82%    Near complete genome
FJ445137/HRV-B52/F10    92.82%    Near complete genome
44% (aa)    RdRp    524 amino acids
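The abundance rule stated in the table note (high above 10000 RPM, low below 1000 RPM, with sterile samples taking priority when a virus is detected in several samples) can be written out as a small decision procedure. This is a minimal sketch, not the authors' code. The class and function names are hypothetical, the label for values between the two thresholds is an assumption because the note does not define that band, and choosing the highest RPM within the preferred group is an illustrative tie-break.

```python
from dataclasses import dataclass

HIGH_RPM = 10_000  # "high" threshold from the table note
LOW_RPM = 1_000    # "low" threshold from the table note

# Sterile sites named in the text; labels are illustrative.
STERILE_SITES = {"CSF", "blood"}


@dataclass
class Detection:
    site: str   # sample type, e.g. "CSF", "blood", "NPA", "stool"
    rpm: float  # abundance in reads per million


def abundance_level(rpm: float) -> str:
    """Classify abundance using the two thresholds in the table note."""
    if rpm > HIGH_RPM:
        return "high"
    if rpm < LOW_RPM:
        return "low"
    # Assumption: the note does not say how this band is labelled.
    return "unclassified"


def representative_detection(detections: list[Detection]) -> Detection:
    """Pick the detection to report, giving sterile samples priority.

    Assumption: within the preferred group, the highest RPM is used.
    """
    if not detections:
        raise ValueError("at least one detection is required")
    sterile = [d for d in detections if d.site in STERILE_SITES]
    pool = sterile or detections
    return max(pool, key=lambda d: d.rpm)


# Hypothetical detections for illustration only.
hits = [Detection("NPA", 25_000.0), Detection("blood", 80.0)]
chosen = representative_detection(hits)
print(chosen.site, abundance_level(chosen.rpm))
```

In this example the blood detection is reported even though the respiratory sample has far more reads, which is the behaviour the note describes for detections in multiple samples.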