key: cord-0307766-a06w44lz authors: Aljabr, Waleed; Alruwaili, Muhannad; Penrice-Randal, Rebekah; Alrezaihi, Abdulrahman; Harrison, Abbie Jasmine; Ryan, Yan; Bentley, Eleanor; Jones, Benjamin; Alhatlani, Bader Y.; AlShahrani, Dayel; Mahmood, Zana; Rickett, Natasha J.; Alosaimi, Bandar; Naeem, Asif; Alamri, Saad; Alsran, Hadel; Hamed, Maaweya; Dong, Xiaofeng; Assiri, Abdullah; Alrasheed, Abdullah R.; Hamza, Muaawia; Carroll, Miles W.; Gemmell, Matthew; Darby, Alistair; Donovan-Banfield, I’ah; Stewart, James P.; Matthews, David A.; Davidson, Andrew D.; Hiscox, Julian A. title: Amplicon and metagenomic analysis of MERS-CoV and the microbiome in patients with severe Middle East respiratory syndrome (MERS) date: 2020-11-29 journal: bioRxiv DOI: 10.1101/2020.11.28.400671 sha: 2e484212e455a42e1301b00fc5e74ed068b27f5f doc_id: 307766 cord_uid: a06w44lz Middle East Respiratory Syndrome coronavirus (MERS-CoV) is a zoonotic infection that emerged in the Middle East in 2012. Symptoms range from mild to severe and include both respiratory and gastrointestinal illnesses. The virus is mainly present in camel populations with occasional spill overs into humans. The severity of infection in humans is influenced by numerous factors and similar to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) underlying health complications can play a major role. Currently, MERS-CoV and SARS-CoV-2 are co-incident in the Middle East and a rapid way is required of sequencing MERS-CoV to derive genotype information for molecular epidemiology. Additionally, complicating factors in MERS-CoV infections are co-infections that require clinical management. The ability to rapidly characterise these infections would be advantageous. To rapidly sequence MERS-CoV, we developed an amplicon-based approach coupled to Oxford Nanopore long read length sequencing. The advantage of this approach is that insertions and deletions can be identified – which are the major drivers of genotype change in coronaviruses. This and a metagenomic approach were evaluated on clinical samples from patients with MERS. The data illustrated that whole genome or near whole genome information on MERS-CoV could be rapidly obtained. This approach provided data on both consensus genomes and the presence of minor variants including deletion mutants. Whereas, the metagenomic analysis provided information of the background microbiome. Coronaviruses were once described as the backwater of virology as they did not cause extensive disease in humans. However, with the emergence of severe acute respiratory syndrome coronavirus (SARS-CoV) Department of the Army, has been recently evaluated in a human trial 5 and also in dromedary camels 6 . Additionally, a Phase 1 clinical trial has started in humans in Saudi Arabia assessing the safety and tolerability of a vaccine based on the ChAdOx1 MERS vaccine 7, 8 . The sporadic nature of the outbreaks and the lack of suitable animal models has hindered research. Given the lack of medical counter measures, shutting down transmission changes is essential in bringing such outbreaks under control 9 , and sequencing the genomes of viruses can aid in this through molecular epidemiology, as was demonstrated with SARS-CoV in Singapore 10, 11 . This approach can also be used to identify properties of the virus relevant to health authorities, including the rate of virus evolution and whether targets of medical counter measures remain stable and valid. During the 2013-2016 West African Ebola virus outbreak, high resolution sequencing using Illumina 12 and then long-read length/MinION 13 based sequencing was used to provide measures of virus evolution 12 . The data also showed the origin of the outbreak was a single zoonotic transmission event and was spread through subsequent human to human transmission in Guinea and to surrounding countries 12, 14 . The utility of the portable MinION based approach for rapid contact tracing was demonstrated in the West African outbreak through shutting down transmission chains arising from sexual contact 15 . Likewise, MinION based sequencing proved crucial in the 2018 Nigerian Lassa fever outbreak to show that independent transmission events from rodents rather than human to human contact was responsible for the spread of the outbreak 16 . MinION sequencing is also used for deriving consensus genomes for SARS-CoV-2 from clinical samples 17, 18 . Thus, providing sequence information of the genomes of viruses (or pathogens in general), is informative for health authorities, but only if the data is provided in real or near real time. Also, geopolitical considerations must be taken into consideration in terms of sample ownership and data sharing 19 , especially given the Helsinki Declaration. MERS-CoV is highly transmissible 20 and sporadic outbreaks are ongoing in the Kingdom of Saudi Arabia and surrounding regions. Working with clinical specimens can prove challenging, particularly in understanding the pathology of MERS-CoV in how viral load and which co-infections influence outcome and the severity of this infectious disease. In order to aid in the diagnosis of MERS-CoV, and to rapidly generate viral genome sequence, the application of an amplicon-based and metagenomic MinION sequencing approach was to sequence viral RNA. During replication of coronaviruses the approximately 30 kb positive sense RNA genome, a nested set of subgenomic mRNAs are synthesized. Subgenomic mRNAs located toward the 3' end of the genome are generally more abundant than transcripts nearer the 5' end 21, 22 . The data demonstrated that whole genome or near whole genome information on MERS-CoV could be rapidly obtained from samples taken from infected patients and sequenced using the amplicon approach. Further, the amplicon-based sequencing approach provided data on both consensus genomes and the presence of minor variants including deletion mutants. Metagenomic analysis provided information of the background microbiome and how this was associated with outcome. Real time analysis of the microbiome in patients and the identification of antibiotic resistance markers may provide better chemotherapeutic approaches to manage co-infections. Ethical approval was obtained from the Institutional Review Board no 18-102, King Fahad Medical City, Riyadh, Saudi Arabia. Informed consent to participant was not required since only remaining left-over samples were used for this study. Nasopharyngeal aspirates (NPA), oropharyngeal, tracheal aspirates, throat swabs were collected from MERS-CoV positive patients (25- To prepare RNA from virus infected cells as a control for amplification, the MRC-5 cell lines was infected with MERS-CoV (ERASMUS strain) at a MOI of 5. Total RNA was purified using the Trizol method. RNA samples were treated with Turbo DNase (Invitrogen), by adding 1 l of TURBO DNase with 5 l 10x buffer in a 56 l reaction. The reaction was incubated at 37 o C for 30 minutes. 5 l of inactivating agent was added and incubated at room temperature for 2 minutes. The reaction was centrifuged at 10,000g for 90s and the RNA supernatant was transferred into a new tube. Primers for the generation of overlapping amplicons were designed using the Primer3Plus platform and re-checked again by Primer Blast (NCBI) to avoid primers with a high self-complementary score. Primers were synthesized by Eurofins Genomics (the sequence of the primers is shown Table 2 ). The stock concentration was 100 M, and primers were diluted in DNase/RNase free H2O to make a 10 M Fast5 files were base called again using Guppy to mitigate for the lower accuracy associated with the rapid base calling setting initially used on the MinIT. To investigate deletions in the MERS-CoV genome, library reads were filtered by expected amplicon size and then aligned to the NCBI MERS reference NC_019843.3 using minimap2 as per the artic pipeline and SVIM was used to identify deletions 18 . In addition, samtools was used to sort and index the alignment files, picard was used to remove amplification duplicates then a custom perl script was used count the nucleotides at each position on the reference genome, when the mapping quality was above 10. Data was visualised using R studio using the ggplot2. To consider the minor variation, the nucleotide depth at a single position with less than 20 counts was not taken forward into analysis to mitigate for random sequencing errors. For rapid assessment of the meta-transcriptome approach, Fastq files were uploaded to Oxford Nanopore Technology's (ONT's) cloud-based pipeline EPI2ME In addition, meta-transcriptomic reads were assessed with Kraken2. Host removal was carried out to remove human reads. The Human genome assembly GRCh38.p13 was used as a reference. The Human genome assembly was indexed with minimap2 using the parameter "-ax map-ont" 23 . The quality-controlled Oxford Nanopore fastq reads were mapped to the indexed human genome assembly with minimap2 using default parameters. The resulting sam file was sorted and converted to bam with the command "samtools sort" 24 . A fastq file with unmapped reads was produced with the "samtools fastq" command with the parameter "-f 4" 24 The quality controlled, host removed reads were classified using Kraken2 25 . The standard kraken output and sample report output files were produced with the options "--output" and "--report" respectively. The database used during Kraken classification consisted of the "bacteria" and "viral" kraken2 reference libraries (built 23rd April 2020 Coronaviruses encode several proteins that are involved in proof reading during virus replication 29,30 , therefore using nucleotide divergence for contact tracing may be problematic. Recombination and resulting deletions (and potentially insertions) may account for the wide genome diversity observed in some strains of coronaviruses 31-33 . These recombination events can led to new strains of coronaviruses 34 or potentially affect vaccine strategies 35, 36 . In order to identify these genome changes with sequencing, rapid long read length approaches offered by Oxford Nanopore were advantageous. Sample quality can also inform design of sequencing strategies and a RT-PCR approach was used to recover viral genome/transcript information. Rather than designing very short amplicons, longer amplicons were selected through selection of appropriate primer pairs along the MERS-CoV genome (Fig. 1 ). Twenty MERS-CoV genome sequences were aligned with a reference sequence terminus. This resulted in the selection of thirty sets of primer pairs ( Table 2 ) that could be used to walk across the MERS-CoV genome with overlap between each generated amplicon. Alternatively, primer pairs can be selected that allowed the generation of larger amplicons that spanned more of the genome in contiguous reads (Fig. 1) . To evaluate the utility of the selected primers for the amplification of viral RNA under controlled conditions, RNA was purified from MRC-5 cells that had been infected with the EMC strain of MERS-CoV at a MOI of 5. Infection was carried out under CL3+ conditions and total RNA purified from these infected MRC5-cells at 16 hrs post-infection. This RNA was used as a template to prime cDNA synthesis using random hexamers. Primer pairs ( Table 2) were tested by gradient PCR to determine the optimum annealing temperature (data not shown). This resulted in amplification conditions (Table 3) such that the MERS-CoV genome was amplified using either 30 amplicons ( Fig. 2A) , 15 amplicons (Fig. 2B ) or 8 amplicons (Fig. 2C ), using the same set of conditions for each set of amplicons ( Table 2 ). The rationale being that using the same amplification conditions across all primer pairs would be more efficient if large scale sample analysis was required. The data indicated that for the 30amplicon approach, PCR products were observed that spanned the MERS-CoV genome. For the 15 and 8 amplicon approach products were also observed that spanned the MERS-CoV genome. However, amplification of these products varied in efficiency. The The nucleotide substitution rate can drive the selection of genotypic and phenotypic variants of MERS-CoV. Whilst variation and potential functional changes in consensus genomes have been compared between patients, the ability to monitor minor variants and their frequency and how these contribute to the overall viral phenotype within a patient is unknown. The minor variant population in infection has been shown to influence the kinetics of virus replication and be associated with patient outcome 37 . Therefore, methodologies were developed that could be used to assess the minor variant frequency within a sample from a patient. The custom perl script used to call the consensus also revealed the nucleotide depth and the counts of each nucleotide at each position (Fig. 5) . The depth was used to normalise the mutation frequency into a proportion instead of a raw count, allowing comparison for samples of different read depth. Nucleotides that had a count less than 20 were removed from analysis. As proof of principle, this approach was applied to the sequencing data obtained from patients 10 and 115. Patient 10 appeared to have more base changes in comparison to patient 115 (Fig. 6) . Transitions (A>G, G>A, C>U, U>C) were more frequently observed and C>U seemed more prominent than other mutations. This is consistent with mRNA editing by APOBEC. APOBEC3 family members have been shown to be involved in restricting the growth of human coronavirus NL63 (HCoV-NL63) 38 and identified as potential drivers of C to U transitions in SARS-CoV-2 39 . We would note that assessing minor variant populations using data from MinION sequencing may be problematic due to the higher error rate in sequencing than for example Illumina based approaches 13 . However, with sufficient read depth in a sample, an overview of the minor variant frequency/population can still be obtained. Sequencing data from patients 10 and 115 were interrogated for deletions using SVIM. Table 4 shows deletions identified with more than 5 supporting reads. Patient Acinetobacter baumannii species (LpxC, adeI, adeJ, mexT, adeN, adeK, ADC-2) ( The abundance of bacteria was compared between fatal cases and non-fatal cases using DESeq2 (Fig. 10) . In general, bacteria from the Proteobacteria phyla were in higher abundance in the fatal cases in comparison to non-fatal, including species Table 2 : Details of primers sequence used in this study. The expected product size is given for the 30-amplicon approach (see Fig. 1 ). Table 4 : Analysis of deletions present in the MERS-CoV genome from patients 115 and 10. Columns from left to right; Patient number, deletion start position (bp), deletion end position (bp), the number of supporting reads for this deletion, the quali ty score (which takes into consideration the mapping quality scores, where a value greater than 10 has higher confidence), standard deviation (SD) of the deletion span (bp) and SD of the position of the deletion from the supporting reads. Coordinates are gi ven for the affected gene, and in the case of overlap, the second gene is provided. Table 4 . Pathogenic human coronavirus infections: causes and consequences of cytokine storm and immunopathology MERS-CoV infection is associated with downregulation of genes encoding Th1 and Th2 cytokines/chemokines and elevated inflammatory innate immune response in the lower respiratory tract Hajjassociated infections Treatment with interferon-alpha2b and ribavirin improves outcome in MERS-CoV-infected rhesus macaques Safety and immunogenicity of an anti-Middle East respiratory syndrome coronavirus DNA vaccine: a phase 1, openlabel, single-arm, dose-escalation trial Efficacy of an Adjuvanted Middle East Respiratory Syndrome Coronavirus Spike Protein Vaccine in Dromedary Camels and Alpacas Humoral Immunogenicity and Efficacy of a Single Dose of ChAdOx1 MERS Vaccine Candidate in Dromedary Camels ChAdOx1 and MVA based vaccine candidates against MERS-CoV elicit neutralising antibodies and cellular immune responses in mice Use of quarantine in the control of SARS in Comparative full-length genome sequence analysis of 14 SARS coronavirus isolates and common mutations associated with putative origins of infection SARS transmission pattern in Singapore reassessed by viral sequence variation analysis Temporal and spatial analysis of the 2014-2015 Ebola virus outbreak in West Africa Real-time, portable genome sequencing for Ebola surveillance Virus genomes reveal factors that spread and sustained the Ebola epidemic Resurgence of Ebola Virus Disease in Guinea Linked to a Survivor With Virus Persistence in Seminal Fluid for More Than 500 Days Metagenomic sequencing at the epicenter of the Nigeria 2018 Lassa fever outbreak Data sharing: Make outbreak research open access Transmissibility of MERS-CoV Infection in Closed Setting Mutations in coronavirus nonstructural protein 10 decrease virus replication fidelity One severe acute respiratory syndrome coronavirus protein complex integrates processive RNA polymerase and exonuclease activities Experimental evidence of recombination in coronavirus infectious bronchitis virus Sequence evidence for RNA recombination in field isolates of avian coronavirus infectious bronchitis virus coronavirus through recombination Feline coronavirus type II strains 79-1683 and 79-1146 originate from a double recombination between feline coronavirus type I and canine coronavirus Genetic diversity of MERS-CoV spike protein gene in Saudi Arabia Agarose gel electrophoresis of amplicons generated using 30 (A), 15 (B) and C) combinations of primers pairs. These primer pairs were used to generate amplicons in combination with reverse transcription of RNA extracted from MERS-CoV infected cells We would like to acknowledge the Research Center at King Fahad Medical City for funding this study (Grant no. 019-003). The authors declare that they have no conflict of interest.