key: cord-0807032-zb1pzdd0 authors: Badaoui, Bouabid; Sadki, Khalid; Talbi, Chouhra; Driss, Salah; Tazi, Lina title: Genetic Diversity and Genomic Epidemiology of SARS-COV-2 in Morocco date: 2020-06-25 journal: bioRxiv DOI: 10.1101/2020.06.23.165902 sha: fb0e71964095a703fd2100cb50c393ff0cd0737b doc_id: 807032 cord_uid: zb1pzdd0
COVID-19 is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It was declared a pandemic because of its rapid spread worldwide. In this study, we investigate the genetic diversity and genomic epidemiology of SARS-CoV-2 using 22 virus genome sequences reported by three different laboratories in Morocco up to 07/06/2020, together with 40,366 virus genomes from around the world. The SARS-CoV-2 genomes from Moroccan patients revealed 62 mutations, of which 30 were missense mutations. The mutations Spike_D614G and NSP12_P323L were present in all 22 analyzed sequences, followed by N_G204R and N_R203K, which occurred in 9 of the 22 sequences. The mutations NSP10_R134S, NSP15_D335N, NSP16_I169L, NSP3_L431H, NSP3_P1292L and Spike_V6F each occurred once in our sequences and have not been recorded in any other sequence worldwide. These mutations should be investigated to determine their potential effects on virulence worldwide. Phylogenetic analyses revealed that the Moroccan SARS-CoV-2 genomes included 9 viruses belonging to clade 20A, 9 to clade 20B and 2 to clade 20C. This finding suggests that the epidemic in Morocco did not spread along a single predominant SARS-CoV-2 route. Rather, multiple unrelated introductions of SARS-CoV-2 into Morocco via different routes likely occurred, giving rise to the diversity of virus genomes in the country. Furthermore, the SARS-CoV-2 virus very likely circulated cryptically in Morocco from 15 January, before the first case was detected on 2 March.
An outbreak of respiratory illness, named COVID-19, caused by the SARS-CoV-2 virus, was reported in Wuhan, Hubei Province, China, at the beginning of December 2019 and then spread quickly to the rest of the world. On 30 January 2020, the WHO declared that the coronavirus disease outbreak constituted a Public Health Emergency of International Concern. As of June 7th, 2020, the pandemic had infected more than 7,725,457 people around the world and caused more than 427,683 deaths (https://www.worldometers.info/coronavirus/). The disease was confirmed to have spread to Morocco on 2 March 2020, when the first COVID-19 case was confirmed. As of June 7, 2020, Morocco had reported 8692 confirmed cases and 212 deaths. A key feature of this disease is a high-level inflammatory response involving pro-inflammatory cytokines. This is especially true in the severe form, which causes pneumonia and severe acute respiratory syndrome (Andersen et al. 2020). The SARS-CoV-2 genome is 29.8-29.9 kb long, with a long 5′ end that contains orf1ab. The 3′ end codes for the structural proteins: small envelope (E) protein, matrix (M) protein, spike (S) protein and nucleocapsid (N) protein. Furthermore, SARS-CoV-2 contains six accessory proteins, encoded by the ORF3a, ORF6, ORF7a, ORF7b, and ORF8 genes (Khailany, Safdar, and Ozaslan 2020). Compared with SARS-CoV, SARS-CoV-2 has higher transmissibility and lower pathogenicity (Fu, Cheng, and Wu 2020), for reasons that are as yet unknown. Virus evolution in nature proceeds through multiple mechanisms, of which nucleotide substitution is of utmost importance (Lauring and Andino 2010). Furthermore, genomic epidemiology of emerging viruses is of chief relevance for capturing virus evolution and spread (Fraser et al. 2009; Gardy et al. 2018). This approach proved very effective during the Ebola virus disease epidemic in West Africa (Baize et al. 2014) and during the Zika virus epidemic in Brazil (Faria et al. 2017).
To investigate the mutations underlying the evolution of SARS-CoV-2 in Morocco, 40,390 genome sequences of SARS-CoV-2 were collected from GISAID (https://www.gisaid.org/). Of these, 22 were from Moroccan patients. The sequences were used to study the diversity and genomic epidemiology of SARS-CoV-2 in Morocco. To assess the genetic variation of SARS-CoV-2 in Morocco, the 40,390 complete genomes and their corresponding metadata were retrieved from the GISAID database (Shu and McCauley 2017). They were analyzed using the Nextstrain bioinformatics pipeline (Hadfield et al. 2018). Augur was used to build a multiple sequence alignment of all the genomes with MAFFT (Katoh et al. 2002), with 'Wuhan-Hu-1/2019' and 'Wuhan/WH01/2019' as reference genomes. The phylogeny was then inferred by maximum likelihood with IQ-TREE (Nguyen et al. 2015). In this study, the analysis of 20 genomes from Moroccan patients revealed 62 mutations (Supplementary Table 1), of which 30 were missense mutations (Supplementary Table 2). Among these mutations, Spike_D614G and NSP12_P323L were present in all 22 analyzed sequences, followed by N_G204R and N_R203K, which occurred in 9 of the 22 sequences (Table 1). Herein we report on the putative history of SARS-CoV-2 transmission in Morocco as revealed by genomic epidemiology (Figures 1 and 2). Phylogenetic analysis (Figure 1) shows that the SARS-CoV-2 genomes from Moroccan patients generated in this study are dispersed across the evolutionary tree of SARS-CoV-2 viruses. The tree was estimated from the 40,390 genomes available on GISAID as of June 7th, 2020. These viruses included 9 from clade 20A, 9 from clade 20B and 2 from clade 20C (Figure 4). However, it is more likely that the virus circulated undetected around the beginning of February, before the first case was discovered on 2 March. Among the missense mutations, Spike_D614G, NSP12_P323L, N_R203K and N_G204R occurred with high frequency worldwide.
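The mutation labels used above combine a gene name, the reference amino acid, its position in the protein and the alternate amino acid (for example Spike_D614G). The sketch below shows how such labels can be derived from an aligned coding sequence and then counted across samples. It is a minimal illustration under stated assumptions, not the Nextstrain/Augur code the authors ran. It assumes each gene's coding sequence has already been extracted in frame and aligned to the reference. The function names, sample names and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code: the 64 codons, enumerated in TCAG order,
# map position by position onto this amino-acid string ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence; gapped or ambiguous codons become 'X'."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[i:i + 3].upper(), "X") for i in range(0, usable, 3))


def call_missense(gene: str, ref_cds: str, query_cds: str) -> list[str]:
    """Return labels like 'Spike_D614G' for amino-acid substitutions.

    Assumes both sequences are aligned, in frame and of equal length.
    Positions with unknown residues ('X') or stop codons ('*') are skipped,
    so nonsense changes are not reported as missense.
    """
    ref_aa, query_aa = translate(ref_cds), translate(query_cds)
    labels = []
    for pos, (ref, alt) in enumerate(zip(ref_aa, query_aa), start=1):
        if ref != alt and "X" not in (ref, alt) and "*" not in (ref, alt):
            labels.append(f"{gene}_{ref}{pos}{alt}")
    return labels


def mutation_counts(samples: dict[str, list[str]]) -> Counter:
    """Count how many samples carry each mutation label (once per sample)."""
    counts: Counter = Counter()
    for labels in samples.values():
        counts.update(set(labels))
    return counts


# Toy usage with hypothetical sequences and sample names.
samples = {
    "sample_A": call_missense("toy", "ATGGATCCT", "ATGGGTCTT"),
    "sample_B": call_missense("toy", "ATGGATCCT", "ATGGGTCCT"),
}
print(samples)
print(mutation_counts(samples))
```

In this toy run, sample_A carries toy_D2G and toy_P3L, while sample_B carries only toy_D2G. The per-sample count is the kind of tally behind statements such as "present in all 22 analyzed sequences".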
This plausibly contributes to the increased transmissibility of SARS-CoV-2. Indeed, throughout its evolution within the host, a virus tends to proliferate efficiently while limiting host morbidity, so as to maximize transmission (Alizon et al. 2009). This is consistent with the reduced pathogenicity that accompanies the increase in transmission. The Spike_D614G mutation had a major effect on the virus's efficiency in infecting hosts (Junior et al. 2020 [preprint]). It also showed a high ability to hinder the immune response of hosts previously exposed to versions of SARS-CoV-2 lacking the Spike_D614G mutation (Zhang et al. 2020 [preprint]). This aspect should be emphasized in future vaccine research. The P323L mutation replaces a proline, a residue that plays a prominent role in protein folding and aggregation. Nevertheless, it increased neither the infectivity of SARS-CoV-2 nor its fitness under natural selection (Maitra et al. 2020). This might be because the change from proline to leucine (P323L) does not alter protein function, as both amino acids belong to the nonpolar aliphatic R-group class. Other missense mutations occurred either at low frequency or only once worldwide. Spike_M1237I, for example, was reported 11 times worldwide. NSP12_M196I, NSP3_A1819V, M_L13F, NSP14_D324A, NSP14_T75I and NSP5_V125I were each reported only once.
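The argument about P323L rests on grouping amino acids by R-group class. A minimal sketch of that check is shown below. It assumes the common textbook five-class grouping (nonpolar aliphatic, aromatic, polar uncharged, positively charged, negatively charged) and the Gene_RefPosAlt label format used in this paper. The helper names are hypothetical. Sharing a class is only a crude indicator of a conservative substitution and is not evidence that protein function is unchanged.

```python
# Textbook five-class grouping of the 20 standard amino acids by R group
# (assumption: the grouping used in common biochemistry texts).
R_GROUP = {
    **dict.fromkeys("GAPVLIM", "nonpolar aliphatic"),
    **dict.fromkeys("FYW", "aromatic"),
    **dict.fromkeys("STCNQ", "polar uncharged"),
    **dict.fromkeys("KRH", "positively charged"),
    **dict.fromkeys("DE", "negatively charged"),
}


def parse_label(label: str) -> tuple[str, str, int, str]:
    """Split a label such as 'NSP12_P323L' into (gene, ref, position, alt)."""
    gene, change = label.rsplit("_", 1)
    return gene, change[0], int(change[1:-1]), change[-1]


def same_r_group(label: str) -> bool:
    """True if the reference and alternate residues share an R-group class."""
    _, ref, _, alt = parse_label(label)
    return R_GROUP[ref] == R_GROUP[alt]


# Classify the four high-frequency mutations discussed in the text.
for label in ["NSP12_P323L", "Spike_D614G", "N_R203K", "N_G204R"]:
    _, ref, _, alt = parse_label(label)
    print(f"{label}: {R_GROUP[ref]} to {R_GROUP[alt]}, same class: {same_r_group(label)}")
```

Under this grouping, NSP12_P323L (proline to leucine) and N_R203K stay within one class, whereas Spike_D614G and N_G204R cross classes.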
Preliminary identification of potential vaccine targets for the COVID-19 coronavirus (SARS-CoV-2) based on SARS-CoV immunological studies
Virulence evolution and the trade-off hypothesis: History, current state of affairs and the future
The proximal origin of SARS-CoV-2
Emergence of Zaire Ebola virus disease in Guinea
Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: A descriptive study
Establishment and cryptic transmission of Zika virus in Brazil and the Americas
Pandemic potential of a strain of influenza A (H1N1): Early findings. Science
Understanding SARS-CoV-2-mediated inflammatory responses: From mechanisms to potential therapeutic tools
Human coronavirus: Host-pathogen interaction
Towards a genomics-informed, real-time, global pathogen surveillance system
Nextstrain: Real-time tracking of pathogen evolution
The global population of SARS-CoV-2 is composed of six major subtypes
MAFFT: A novel method for rapid multiple sequence alignment based on fast Fourier transform
Genomic characterization of a novel SARS-CoV-2
Quasispecies theory and the behavior of RNA viruses
Mutations in SARS-CoV-2 viral RNA identified in Eastern India: Possible implications for the ongoing outbreak in India and impact on viral structure and host susceptibility
IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies
Phylogenetic analysis of nCoV-2019 genomes
GISAID: Global initiative on sharing all influenza data – from vision to reality
The D614G mutation in the SARS-CoV-2 spike protein reduces S1 shedding and increases infectivity