key: cord-0975871-orkz1boo authors: Al-Shomrani, Badr M.; Manee, Manee M.; Alharbi, Sultan N.; Altammami, Mussad A.; Alshehri, Manal A.; Nassar, Majed S.; Bakhrebah, Muhammed A.; Al-Fageeh, Mohamed B. title: Genomic Sequencing and Analysis of Eight Camel-Derived Middle East Respiratory Syndrome Coronavirus (MERS-CoV) Isolates in Saudi Arabia date: 2020-06-03 journal: Viruses DOI: 10.3390/v12060611 sha: b65cd32d4a5347f1938f7d6ebb7b1f68f120522a doc_id: 975871 cord_uid: orkz1boo Middle East respiratory syndrome coronavirus (MERS-CoV) causes severe respiratory illness in humans; the second-largest and most deadly outbreak to date occurred in Saudi Arabia. The dromedary camel is considered a possible host of the virus and also to act as a reservoir, transmitting the virus to humans. Here, we studied evolutionary relationships for 31 complete genomes of betacoronaviruses, including eight newly sequenced MERS-CoV genomes isolated from dromedary camels in Saudi Arabia. Through bioinformatics tools, we also used available sequences and 3D structure of MERS-CoV spike glycoprotein to predict MERS-CoV epitopes and assess antibody binding affinity. Phylogenetic analysis showed the eight new sequences have close relationships with existing strains detected in camels and humans in Arabian Gulf countries. The 2019-nCov strain appears to have higher homology to both bat coronavirus and SARS-CoV than to MERS-CoV strains. The spike protein tree exhibited clustering of MERS-CoV sequences similar to the complete genome tree, except for one sequence from Qatar (KF961222). B cell epitope analysis determined that the MERS-CoV spike protein has 24 total discontinuous regions from which just six epitopes were selected with score values of >80%. Our results suggest that the virus circulates by way of camels crossing the borders of Arabian Gulf countries. This study contributes to finding more effective vaccines in order to provide long-term protection against MERS-CoV and identifying neutralizing antibodies. Coronaviruses are a family of single-stranded, positive-sense RNA genomes that range from 26,000 to 32,000 basepairs in length. This family can infect both humans and animals, and has been found in many mammal hosts, including bats, camels, and pigs [1] . Coronaviruses are also known to have high genetic diversity influenced by mutation and recombination, which can lead to the development of new viruses. Before 2003, human coronaviruses were primarily known for causing common colds. In 2003, severe acute respiratory syndrome coronavirus (SARS-CoV) was first diagnosed in southern China as an epidemic, and went on to infect almost 8000 people in 29 countries, with a fatality rate of approximately 10% [2, 3] . Almost ten years later, in 2012, another novel and deadly human coronavirus emerged, named the Middle East respiratory syndrome coronavirus (MERS-CoV); it infected more The consensus MERS-CoV genome obtained through analyzing isolates from eight infected camels from different locations across Saudi Arabia was 30,118 nt in length with a GC content of 41%. The eight constituent genomes shared >99% identity. The camel MERS-CoV genome is similar to that of human MERS-CoV, containing ten ORFs (ORF1ab, ORF3, spike [S], ORF5, ORF4a, ORF4b, envelope [E] , membrane [M] , nucleocapsid [N] , and ORF8b). BLAST comparison to other camel and human MERS-CoV strains available from NCBI revealed that all share ∼99% identity with one another. The comparative analysis of homologous sequences of spike proteins and genes has been performed for the 31 Betacoronavirus strains ( Table 1 ). The MERS-CoV strains were nearly identical across their spike proteins and genes (>99.50% and >81%, respectively). Indicative of a very recent emergence into the human population. Phylogenetic analysis was performed on the complete genomes and spike proteins of the eight new MERS-CoV sequences and 23 representative viruses from the genus Betacoronavirus. The eight camel-derived MERS-CoV sequences clustered in a distinct clade and were all relatively close to other MERS-CoV sequences from human and camel hosts ( Figure 1 ). The 2019-nCoV falls in a distinct clade and shows a closer relationship to Bat-SARS-Like (GU190215.1) than the MERS-CoV strains. In agreement with the complete genome results, analysis based on the spike proteins showed that 2019-nCoV has a closer relationship to Bat-SARS-Like and SARS-CoV strains than MERS-CoV strains ( Figure 2 ). We observed very high conservation between the new eight sequenced MERS-CoV genomes, with sequence identity >99%, despite the different geographical locations of sampling. In addition, a specific sample collected in 2015 from a human in Qatar was revealed to be more closely related to the eight new MERS-CoV isolates than to other UAE and Oman MERS-CoV strains. Genetically, the MERS-CoV genome can overall be considered more diverse than those of other betacoronavirus family members. Phylogenetic analysis of the MERS-CoV spike protein and its counterparts showed that the four clades formed four well-supported branches ( Figure 2 ). B cell epitopes are significant in the defense of the immune system against viral diseases, allowing B cells to identify different viral infections and activate responses against them. There has been many studies aiming to design MERS-Cov vaccine [24] [25] [26] . In this study, epitope analysis of MERS-CoV showed that the MERS-CoV spike protein has 24 discontinuous regions in total, from which just six epitopes with score values of >0.8 could be selected. The higher the score, the greater the potential of a true discontinuous epitope; the highest probability obtained was 0.898. Table 2 details the peptides of these high-scoring discontinuous epitopes and their sequence locations, epitope lengths, and associated scores. Their positions on the 3D structure of the MERS-CoV spike protein (PDB ID: 5X5F) are illustrated in Figure 3 . MERS-CoV is a zoonotic virus for which ongoing studies indicate major roles in transmission for direct contact with live camels and with humans having symptomatic MERS [27] [28] [29] , with dromedary camels being considered intermediate hosts that are the main source of human infections. The second largest outbreak to date occurred in Saudi Arabia, and MERS-CoV is widespread among humans and camels in the Gulf countries [30] , where it remains a potential hazard to regional and global health and welfare. The present study seeks to determine the evolutionary relationships between eight new MERS-CoV isolates obtained from Arabian camels in different regions of Saudi Arabia and 23 other previously sequenced betacoronavirus strains, based on phylogenetic trees constructed with BEAST for the complete viral genome and the spike protein. In the phylogenetic analysis, the eight new strains clustered in a separate clade, exhibiting close relationships with strains obtained from MERS-CoV-infected camels and humans in Saudi Arabia, the United Arab Emirates (UAE), Qatar, and Oman. This finding supports the expectation that the virus circulates via camels crossing borders with Oman and other Gulf countries, and is also consistent with the report of another study that movement of camels between Saudi Arabia, the UAE, and Oman contributes to the spread of the virus throughout the Arabian Peninsula [31] . Based on these results, it is recommended that restrictions be implemented to prevent these animals from migrating between Gulf countries. Within the Betacoronavirus genus, SARS-CoV and MERS-CoV are distinct from 2019-nCoV [32] ; the 2019-nCoV genome and spike glycoprotein, respectively, showed 48.83% and 41.95% sequence identity with human-derived MERS-CoV [33] . The 2019-nCoV virus was found to be most closely related to Bat-SARS-Like [1]. This confirms the identification of bats as reservoirs for coronaviruses, including MERS-CoV and 2019-nCoV. In conclusion, this study determined the evolutionary relationships of newly sequenced MERS-CoV, along with 23 previously sequenced isolates of the genus Betacoronavirus. We also performed genomic characterization of MERS-CoV, highlighting its origin and likely means of transmission through Gulf countries. In addition, we analyzed the MERS-CoV spike protein for suitable epitopic candidates and identified six viral peptides with good antigen presentation scores (>80%). The samples were collected from 30 dromedary camels at different locations in Saudi Arabia, following the guidelines of the Food and Agriculture Organization of the United Nations. The camel was appropriately restrained, and the camel nasal wing was then held and raised by the thumb and index finger, followed by gentle insertion of the nasal swab by the upper half of the nasal vestibule as deep as possible until the swab is handled. The swab was then laterally rotated, transferred into a tube containing virus transport media, and kept at 4 • C in preparation for PCR screening. Total RNA was successfully extracted from eight samples of camel nasal fluids using a MagMax-96 viral RNA extraction kit (Thermo Fisher Scientific, Waltham, MA, USA) as per the manufacturer's protocol. The final DNAase-treated RNA product was stored at −80 • C. Eight complete genomes of MERS-CoV from C. dromedarius have been sequenced in our lab. The samples came from multiple geographical locations in Saudi Arabia: four from Jeddah city, one from Northern Saudi Arabia, two from Mecca city, and one from Riyadh city (Figure 4 ). For the sake of comparison, we also retrieved 21 complete genomes of betacoronavirus strains from the National Center for Biotechnology Information (NCBI) database (http://www.ncbi.nlm.nih.gov/genomes/ VirusVariation/Database/nph-select2.cgi), along with those of two 2019-nCoV strains isolated in Saudi Arabia (https://www.gisaid.org). In total, 31 betacoronavirus sequences were analyzed (Table 3 ). The concentration of each RNA sample was measured on a Qubit 3.0 using the Qubit RNA BR assay kit (Thermo Fisher Scientific, Waltham, MA, USA) and reverse transcribed with a SuperScript II cDNA kit (Thermo Fisher Scientific, Waltham, MA, USA). A minimum RNA concentration of 100 ng/uL was used for cDNA conversion. High-quality cDNA was subjected to Ampliseq library preparation as per the manufacturer's protocol using MERS Ampliseq Panels (Thermo Fisher Scientific, Waltham, MA, USA) with a mean insert size of 200 bp. Each library was assigned to a distinct barcode using the Ion Xpress Barcode Adapters 1-16 kit and purified using Agencourt AMpure XP beads (Beckman Coulter, Brea, CA, USA). Purified libraries were efficiently quantified on a StepOnePlus Real-Time PCR system using an Ion Library TaqMan quantification kit (Thermo Fisher Scientific, Waltham, MA, USA) as per the manufacturer's protocol. Libraries were normalized to 25 pM, and template-positive Ion Sphere particles (ISPs) were obtained by pooling all ten normalized libraries and performing clonal amplification on a OneTouch 2 system using the Ion PGM HiQ OT2 200 kit (Thermo Fisher Scientific, Waltham, MA, USA). Template-positive ISPs were then loaded on Ion 318 chips and sequenced on an Ion PGM instrument using the Ion PGM HiQ Sequencing kit (Thermo Fisher Scientific, Waltham, MA, USA). The resulting data were preprocessed (base calling, base quality recalibration, alignment and consensus sequence assembly) using Torrent Suite Server v 5.6. Ninety percent of the consensus sequences were obtained from this data; the remaining gaps were filled by Sanger sequencing on a 3730xl Genetic Analyzer using custom-designed primers (Macrogen LLC, Seoul, Korea) and BigDye Terminator v 3.1 Cycle Sequencing chemistry (Thermo Fisher Scientific, Waltham, MA, USA). To fill the gaps that remained after mapping Ion reads, we designed 11 primer sets (Table 4 ) against the NCBI MERS-CoV reference genome assembly (NC_019843.3). Primers were designed and checked using Primer Express software (v3.0) (Applied Biosystems, Foster City, CA, USA) and NetPrimer (Premier Biosoft, Palo Alto, CA, USA), with gap flanking regions incorporated into the primers. Sequences for all the major gaps were amplified by PCR using Platinum TM Hot Start PCR Master Mix 2X (Thermo Fisher Scientific, Waltham, MA, USA). PCR products were sequenced using the BigDye ® Terminator Sequencing Kit (Applied Biosystems, Foster City, CA, USA) on an Applied Biosystems 3730xl DNA Analyzer. Most minor gaps were sample-specific; to ensure complete genome coverage for phylogenetic analysis, these gaps were closed manually after performing multiple sequence alignment (MSA) on the fully-sequenced genomes. MSA was carried out using Geneious software [34] . Table 4 . Primer sets used for filling sequence gaps. Gap1_F1 CCTCGTTCTCTTGCAGAACTT Gap1_R1 CCGGACGAAACCGTGTATT Gap1_F2 CACTTCCCCTCGTTCTCTTG Gap1_R2 ACCGTGTATTGTGACCGAGA Gap2_F1 ACCAATTGGCTTATAGCTCTAGT Cap2_R1 CCTTTAGGATCCGCCTCAAA Gap3_F1 GGGATTACCCTAAGTGTGATAGAG Gap3_R1 GTGCACTGACATTAGCAGTTG Gap4_F1 CTGGGTTGTACCCAACCATT Gap4_R1 CGTGCTGTAGGGTAGTAAATCG Gap5_F1 TTTGCCAGGTTGTGATGG Gap5_R1 ATACTCTCTGTACTCTGTAGCAT Gap6_F1 TTCTTTACTTACCTGTGTAACCTCA Gap6_R1 TCATAGGAGTGGAATTTCTCCAAA Gap7_F1 CAGCATCAGCTCGTGATCTT Gap7_R1 CTCCCAGAGCCTGATTAAACTTAT Gap8_F1 CTCCTTTGGCCGTAGATGTT Gap8_R1 CCGCTAGCAGGAATGTATGT Gap8_F2 CCTTTGGCCGTAGATGTTGT Gap8_R2 GCCGCTAGCAGGAATGTATG Gap9_F1 AGATCGCGGCAATCGTT Gap9_R1 GGCACTGTTCACTTGCAATC We first performed multiple sequence alignment of the 31 strains using MAFFT (v7) [35] . Next, we estimated the phylogenetic relationships using a time-framed Bayesian evolution analysis approach via a Markov Chain Monte Carlo (MCMC) inference method; this approach was implemented in the BEAST package (v1.8.2) [36] . For analysis of whole genomes, we used the GTR+I substitution model, a strict clock model, and a constant-size tree prior. For the spike protein dataset, we used the Blosum62+Gamma substitution model, a strict clock model, and a constant-size tree prior. MCMC analyses were run for 20 million iterations, sampling every 100 thousand iterations after a 10% burn-in. We assessed each run's convergence using Tracer (v1.6) [37] . The BEAST package TreeAnnotator was utilized to summarize a maximum clade credibility (MCC) tree for every dataset using MCMC tree samples. Phylogenetic clustering patterns were visualized and analyzed with FigTree (v1.4.3) (http://tree.bio.ed.ac.uk/software/figtree/). The IEDB [38] and BCPRED [39] servers were utilized to predict B cell epitopes of the MERS-CoV spike protein. IEDB software can predict both linear and discontinuous epitopes using a protein 3D structure (PDB format) as input. The software associates each epitope with its score, which is defined as a Protrusion Index (PI) method. For example, score value 0.8 means the predicted epitope would include within 80% of the protein amino acids and 20% of the protein amino acids will be outside of the epitope. Only epitopes that were located on the outer surface of the protein were chosen. Six residues in length was regarded as adequate to prompt a defensive immune reaction. We focused on discontinuous antigenic epitopes as they are increasingly explicit and have higher dominant attributes over linear epitopes [40, 41] . DQ648857.1Batcoronavirus(BtCoV/279/2005) MK211374.1CoronavirusBtRl-BetaCoV/SC2018 KT444582.1SARS-likecoronavirusWIV16 MN988713.1Severeacuterespiratorysyndromecoronavirus2isolate2019-nCoV/USA-IL1 KY417145.1BatSARS-likecoronavirusisolateRf4092 KF636752.1BatHp-betacoronavirus/Zhejiang2013 MK211374.1CoronavirusBtRl-BetaCoV/SC2018 KT444582.1SARS-likecoronavirusWIV16 KF636752.1BatHp-betacoronavirus/Zhejiang2013 GQ153547 DQ648857.1Batcoronavirus(BtCoV/279/2005) MN988713.1Severeacuterespiratorysyndromecoronavirus2isolate2019-nCoV/USA-IL1 KY417145.1BatSARS-likecoronavirusisolateRf4092 Writing-original draft Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding Coronaviruses in avian species-review with focus on epidemiology and diagnosis in wild birds Emergence of SARS-like coronavirus poses new challenge in China Structure, function, and evolution of coronavirus spike proteins Middle East respiratory syndrome coronavirus infection is inhibited by griffithsin Isolation of a novel coronavirus from a man with pneumonia in Saudi Arabia Lack of Middle East respiratory syndrome coronavirus transmission from infected camels Seroepidemiology for MERS coronavirus using microneutralisation and pseudoparticle virus neutralisation assays reveal a high prevalence of antibody in dromedary camels in Egypt Middle East respiratory syndrome Animal models of Middle East respiratory syndrome coronavirus infection Human infection with MERS coronavirus after exposure to infected camels Middle East respiratory syndrome coronavirus infection in dromedary camels in Saudi Arabia Evidence for camel-to-human transmission of MERS coronavirus Camelid genomes reveal evolution and adaptation to desert environments Medicine and Surgery of Camelids Reported direct and indirect contact with dromedary camels among laboratory-confirmed MERS-CoV cases Unraveling the drivers of MERS-CoV transmission Middle East respiratory syndrome coronavirus: Transmission and phylogenetic evolution The recent ancestry of Middle East respiratory syndrome coronavirus in Korea has been shaped by recombination CoV spike protein: A key target for antivirals Epitope-based peptide vaccine design and target site depiction against Middle East Respiratory Syndrome Coronavirus: An immune-informatics study Characterization of novel monoclonal antibodies against MERS-coronavirus spike protein Production of a Monoclonal Antibody targeting the M protein of MERS-CoV for detection of MERS-CoV using a synthetic peptide epitope formulated with a CpG-DNA-liposome complex Zoonotic origin and transmission of Middle East respiratory syndrome coronavirus in the UAE Middle East respiratory syndrome (MERS) coronavirus and dromedaries Middle East respiratory syndrome coronavirus transmission Middle East respiratory syndrome coronavirus (MERS-CoV) in dromedary camels Prevalence of Middle East respiratory syndrome coronavirus (MERS-CoV) in dromedary camels in Abu Dhabi emirate A novel coronavirus from patients with pneumonia in China The SARS-CoV-2 outbreak from a one health perspective. One Health Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability Bayesian phylogenetics with BEAUti and the BEAST 1.7 The immune epitope database and analysis resource: From vision to blueprint Predicting linear B-cell epitopes using string kernels Conformational B-cell epitope prediction on antigen protein structures: A review of current algorithms and comparison with common binding site prediction methods Prediction of residues in discontinuous B-cell epitopes using protein 3D structures