key: cord-0747768-ycuty2g6
authors: Badaoui, Bouabid; Sadki, Khalid; Talbi, Chouhra; Salah, Driss; Tazi, Lina
title: Genetic diversity and genomic epidemiology of SARS-CoV-2 in Morocco
date: 2021-02-03
journal: Biosaf Health
DOI: 10.1016/j.bsheal.2021.01.003
sha: a21735e3242e209006818eedd3d432e1f601ef37
doc_id: 747768
cord_uid: ycuty2g6
Coronavirus disease 2019 (COVID-19) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which was declared a pandemic because of its rapid worldwide spread. In this study, we investigate the genetic diversity and genomic epidemiology of SARS-CoV-2 using 22 virus genome sequences reported by three different laboratories in Morocco up to June 7, 2020, together with 40,366 virus genomes from around the world. The SARS-CoV-2 genomes from Moroccan patients revealed 62 mutations, of which 30 were missense mutations. The mutations Spike_D614G and NSP12_P323L were present in all 22 analyzed sequences, followed by N_G204R and N_R203K, which occurred in 9 of the 22 sequences. The mutations NSP10_R134S, NSP15_D335N, NSP16_I169L, NSP3_L431H, NSP3_P1292L and Spike_V6F occurred once in the Moroccan sequences and have no record in other sequences worldwide. Phylogenetic analyses revealed that the Moroccan SARS-CoV-2 genomes included 9 viruses belonging to clade 20A, 9 to clade 20B and 2 to clade 20C, suggesting that no single SARS-CoV-2 route predominated in the epidemic spread in Morocco. Therefore, multiple unrelated introductions of SARS-CoV-2 into Morocco have occurred through different routes, giving rise to the diversity of virus genomes in the country. Furthermore, SARS-CoV-2 in all probability circulated cryptically in Morocco from January 15, 2020, before the first case was officially detected on March 2, 2020.
An outbreak of respiratory illness, named COVID-19 and caused by SARS-CoV-2, was first reported in Wuhan (Hubei Province, China) in early December 2019, and many cases were subsequently reported in other countries worldwide. On 30 January 2020, the WHO declared that the coronavirus disease constituted a Public Health Emergency of International Concern. As of June 7, 2020, the pandemic had infected more than 7,725,457 people globally and caused more than 427,683 deaths (https://www.worldometers.info/coronavirus/). The first COVID-19 case in Morocco was confirmed on 2 March 2020, and as of June 7, 2020, Morocco had reported 8692 confirmed cases and 212 deaths. The most important feature of this disease is a high-level inflammatory response involving pro-inflammatory cytokines which, in especially severe forms, causes pneumonia and severe acute respiratory syndrome [1]. The SARS-CoV-2 genome is 29.8-29.9 kb in size, with a long 5′ region that contains ORF1ab. The 3′ end encodes the structural proteins: the small envelope (E) protein, matrix (M) protein, spike (S) protein and nucleocapsid (N) protein. The SARS-CoV-2 genome also contains six accessory proteins, encoded by the ORF3a, ORF6, ORF7a, ORF7b and ORF8 genes [2]. Compared with SARS-CoV, SARS-CoV-2 has higher transmissibility and lower pathogenicity [3], for reasons that remain unknown. Virus evolution in nature arises through multiple mechanisms, of which nucleotide substitution is of utmost importance [4]. Genomic epidemiology of emerging viruses is highly relevant for capturing virus evolution and spread [5, 6]. This approach proved very effective during the Ebola virus epidemic in West Africa [7] and the spread of Zika virus in Brazil [8].
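The genome organization described above can be made concrete with a small lookup that maps a nucleotide position to the open reading frame containing it. This is a minimal sketch, not part of the authors' pipeline: the coordinate table is an assumption based on the commonly used GenBank annotation of the Wuhan-Hu-1 reference (NC_045512.2), and `GENES` and `locate` are hypothetical names. Codon numbers are counted from the start of each ORF, so positions in ORF1ab past its ribosomal frameshift, and the per-NSP numbering used later in this paper (e.g. NSP12_P323L), are not handled.

```python
from typing import Optional, Tuple

# Assumed 1-based, inclusive CDS coordinates on the Wuhan-Hu-1 reference
# (GenBank NC_045512.2); illustrative only, not taken from the paper.
GENES = [
    ("ORF1ab", 266, 21555),
    ("S", 21563, 25384),
    ("ORF3a", 25393, 26220),
    ("E", 26245, 26472),
    ("M", 26523, 27191),
    ("ORF6", 27202, 27387),
    ("ORF7a", 27394, 27759),
    ("ORF7b", 27756, 27887),
    ("ORF8", 27894, 28259),
    ("N", 28274, 29533),
    ("ORF10", 29558, 29674),
]


def locate(position: int) -> Optional[Tuple[str, int, int]]:
    """Return (gene, codon number, position within codon) for a 1-based
    genome position, or None if it lies outside the annotated ORFs.

    Codon numbering assumes one reading frame from the ORF start, so it is
    not valid past the ORF1ab frameshift. Where ORFs overlap (ORF7a/ORF7b),
    the first match in GENES is returned.
    """
    for gene, start, end in GENES:
        if start <= position <= end:
            offset = position - start
            return gene, offset // 3 + 1, offset % 3 + 1
    return None


if __name__ == "__main__":
    print(locate(23403))  # ('S', 614, 2)
```

Under these assumed coordinates, position 23403 falls at the second base of spike codon 614, the site usually associated with the D614G substitution.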
To investigate the mutations underlying the evolution of SARS-CoV-2 in Morocco, 40390 SARS-CoV-2 genome sequences, 22 of them from Moroccan patients, were collected from GISAID (https://www.gisaid.org/) to study the diversity and genomic epidemiology of SARS-CoV-2 in Morocco.
To assess the genetic variation of SARS-CoV-2 in Morocco, a total of 40390 complete SARS-CoV-2 genomes and their corresponding metadata were retrieved from the GISAID database [9] and analyzed using the Nextstrain bioinformatics pipeline [10]. Augur was used to perform a multiple sequence alignment of all the genomes with MAFFT [11], using 'Wuhan-Hu-1/2019' and 'Wuhan/WH01/2019' as reference genomes. The phylogeny was built by maximum likelihood using IQ-TREE [12]. For the comparative analysis of SARS-CoV-2 genome sequences, we used the following protocol:
i) Collecting SARS-CoV-2 sequences and the corresponding metadata from the GISAID database.
ii) Filtering the SARS-CoV-2 sequences to exclude inaccurate sequences based on missing bases and sequence length, and retaining a fixed number of samples per group according to their similarities.
iii) Performing a multiple sequence alignment with MAFFT.
iv) Inferring a phylogenetic tree from the multiple sequence alignment, obtaining a time-resolved tree with TreeTime, and inferring ancestral sequences and traits.
v) Identifying the mutations.
In this study, the analysis of 22 genomes from Moroccan patients revealed 62 mutations (Supplementary Table 1), of which 30 were missense mutations (Supplementary Table 2). Among these, Spike_D614G and NSP12_P323L were present in all 22 analyzed sequences, followed by N_G204R and N_R203K, which occurred in 9 of the 22 sequences (Table 1). Phylogenetic analysis (Fig. 1) shows that the SARS-CoV-2 genomes from Moroccan patients are dispersed across the SARS-CoV-2 evolutionary tree estimated from the 40390 genomes available on GISAID as of June 7, 2020.
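Steps ii) and v) of the protocol can be illustrated with a short sketch. It is not the Nextstrain/Augur implementation: the quality thresholds (`min_length`, `max_n_fraction`) are hypothetical values, query sequences are assumed to be already aligned to the reference without indels, and `passes_filter`, `missense_mutations` and `count_carriers` are illustrative names. Missense calls are labeled in the `Gene_RefPosAlt` style used in the text, and the tally counts how many genomes carry each mutation, which is the kind of figure behind "present in all 22 analyzed sequences".

```python
from collections import Counter
from typing import Dict, Iterable, List

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE: Dict[str, str] = dict(zip(CODONS, AMINO_ACIDS))


def passes_filter(seq: str, min_length: int = 27000, max_n_fraction: float = 0.05) -> bool:
    """Step ii (sketch): reject short sequences and those with too many
    missing bases (N). Both thresholds are hypothetical."""
    if len(seq) < min_length:
        return False
    return seq.upper().count("N") / len(seq) <= max_n_fraction


def missense_mutations(ref: str, query: str, gene: str, start: int, end: int) -> List[str]:
    """Step v (sketch): compare codons of one gene (1-based, inclusive
    coordinates) between an aligned query and the reference.

    Codons containing N or gaps are skipped because they are not keys of
    CODON_TABLE; changes to or from a stop codon (nonsense) are excluded.
    """
    calls = []
    for codon_number, i in enumerate(range(start - 1, end - 2, 3), start=1):
        ref_aa = CODON_TABLE.get(ref[i:i + 3].upper())
        alt_aa = CODON_TABLE.get(query[i:i + 3].upper())
        if ref_aa and alt_aa and ref_aa != alt_aa and "*" not in (ref_aa, alt_aa):
            calls.append(f"{gene}_{ref_aa}{codon_number}{alt_aa}")
    return calls


def count_carriers(calls_per_genome: Iterable[List[str]]) -> Counter:
    """Count how many genomes carry each mutation (each genome counted once)."""
    carriers: Counter = Counter()
    for calls in calls_per_genome:
        carriers.update(set(calls))
    return carriers


if __name__ == "__main__":
    # Toy data only: a three-codon gene ATG GAT AAA (M D K) and two genomes.
    ref = "ATGGATAAA"
    genomes = ["ATGGGTAAA", "ATGGGTAAN"]
    calls = [missense_mutations(ref, g, "toyGene", 1, 9) for g in genomes]
    print(calls)  # [['toyGene_D2G'], ['toyGene_D2G']]
    print(count_carriers(calls))
```

Skipping ambiguous codons rather than guessing keeps low-coverage genomes from contributing spurious calls; in a real pipeline this choice interacts with the step ii filter.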
The Moroccan genomes included 9 viruses from clade 20A, 9 from clade 20B and 2 from clade 20C (Fig. 3). This indicates that no single SARS-CoV-2 lineage predominated in the epidemic spread in Morocco. In addition, the virus most likely circulated covertly around the beginning of February, before the first case was officially detected on March 2.
Among the missense mutations, Spike_D614G, NSP12_P323L, N_R203K and N_G204R occurred with high frequency worldwide, and these mutations very likely contribute to increased SARS-CoV-2 transmissibility. Throughout its evolution within the host, a virus tends to proliferate efficiently while limiting host morbidity, so as to maximize transmission [14]. This is consistent with the reduced pathogenicity that accompanies the increased transmissibility of SARS-CoV-2. The Spike_D614G mutation has a major effect on the efficiency with which the virus infects hosts [15] and showed a strong ability to hamper the immune response of hosts previously exposed to versions of SARS-CoV-2 lacking the Spike_D614G mutation [16]. This aspect should be emphasized in future vaccine research. Although the NSP12_P323L mutation substitutes a proline, an amino acid that plays a prominent role in protein folding and aggregation, it increased neither the infectivity of SARS-CoV-2 nor its fitness under natural selection [17]. This might be because the proline-to-leucine change (P323L) does not alter protein function, as both amino acids belong to the non-polar, aliphatic R-group class. Other missense mutations occurred at low frequency, such as Spike_M1237I, which was [...] to receptors on the host cell [18]. The spike glycoprotein is also the major target of neutralizing antibodies [19]. Hence, mutations in the spike surface glycoprotein could alter the antigenicity of SARS-CoV-2.
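The argument that P323L is a conservative change can be checked mechanically by parsing mutation labels and comparing the side-chain (R-group) classes of the reference and substituted residues. This is a minimal sketch under an assumption: it uses the common five-class textbook grouping (non-polar aliphatic, aromatic, polar uncharged, positively charged, negatively charged), and `R_GROUP`, `parse_label` and `is_conservative` are hypothetical names. A same-class substitution does not by itself guarantee unchanged protein function.

```python
import re
from typing import Dict, Tuple

# Assumed five-class R-group grouping of the 20 standard amino acids.
R_GROUP: Dict[str, str] = {}
for group, residues in {
    "nonpolar_aliphatic": "GAPVLIM",
    "aromatic": "FYW",
    "polar_uncharged": "STCNQ",
    "positive": "KRH",
    "negative": "DE",
}.items():
    for aa in residues:
        R_GROUP[aa] = group

# Labels look like Gene_RefPosAlt, e.g. "NSP12_P323L" or "Spike_D614G".
LABEL = re.compile(r"(\w+?)_([A-Z])(\d+)([A-Z])")


def parse_label(label: str) -> Tuple[str, str, int, str]:
    """Split a label such as 'NSP12_P323L' into (gene, ref, position, alt)."""
    match = LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    gene, ref, position, alt = match.groups()
    return gene, ref, int(position), alt


def is_conservative(label: str) -> bool:
    """True if the reference and substituted residues share an R-group class."""
    _, ref, _, alt = parse_label(label)
    return R_GROUP[ref] == R_GROUP[alt]


if __name__ == "__main__":
    for label in ["NSP12_P323L", "Spike_D614G", "N_R203K", "N_G204R", "Spike_V6F"]:
        print(label, is_conservative(label))
```

In this grouping, P323L and R203K come out as same-class substitutions, whereas D614G, G204R and V6F cross R-group classes.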
The mutations NSP10_R134S, NSP15_D335N, NSP16_I169L, NSP3_L431H, NSP3_P1292L and Spike_V6F, which occurred in the Moroccan sequences with no record in other sequences worldwide, warrant careful attention and should be investigated to determine their potential effects on SARS-CoV-2 virulence, as well as their association with immunological and clinical symptoms. Particular attention should be given to Spike_V6F, as the spike structural protein has been shown to be essential for the virus's ability to infect hosts and has been used as an important target for vaccine development [20]. It is tempting to speculate that the specific evolutionary profile of SARS-CoV-2 in Morocco results from the interaction between its evolutionary routes before reaching the country and its adaptation to the Moroccan genetic background. A further limitation of this study is that all the samples analyzed, except one, were obtained from public health laboratories and thus may not be representative of the general population.
The genetic analysis of the SARS-CoV-2 genomes from Moroccan patients revealed some new mutations with no record in other sequences worldwide. These mutations should be investigated to ascertain their potential effects on SARS-CoV-2 virulence and to evaluate their impact on the immune response. The phylogenetic and genomic epidemiology analyses revealed that the COVID-19 spread occurred through multiple and unrelated introductions of SARS-CoV-2 into Morocco via different routes.
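The criterion of a mutation having "no record in other sequences worldwide" amounts to a set difference between the mutations found in the focal genomes and those found in a background set. The sketch below assumes mutation calls are already available per genome as lists of labels (for instance from step v) of the protocol); `private_mutations` is a hypothetical name and the toy inputs are invented for illustration. Such a call only reflects the genomes present in the background set.

```python
from collections import Counter
from typing import Dict, Iterable, List


def private_mutations(focal: Iterable[List[str]], background: Iterable[List[str]]) -> Dict[str, int]:
    """Return mutations seen in the focal genomes (e.g. one country's) but in
    no background genome, with the number of focal genomes carrying each."""
    # Count each mutation at most once per focal genome.
    focal_counts = Counter(m for calls in focal for m in set(calls))
    # Every mutation observed anywhere in the background set.
    seen_elsewhere = {m for calls in background for m in calls}
    return {m: n for m, n in focal_counts.items() if m not in seen_elsewhere}


if __name__ == "__main__":
    # Toy labels, not real data.
    focal = [["geneA_D10G", "geneB_V6F"], ["geneA_D10G"]]
    background = [["geneA_D10G"], ["geneC_R3K"]]
    print(private_mutations(focal, background))  # {'geneB_V6F': 1}
```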
References
[1] The proximal origin of SARS-CoV-2
[2] Genomic characterization of a novel SARS-CoV-2
[3] Understanding SARS-CoV-2-mediated inflammatory responses: From mechanisms to potential therapeutic tools
[4] Quasispecies theory and the behavior of RNA viruses
[5] Pandemic potential of a strain of influenza A (H1N1): Early findings
[6] Towards a genomics-informed, real-time, global pathogen surveillance system
[7] Emergence of Zaire Ebola virus disease in Guinea
[8] Establishment and cryptic transmission of Zika virus in Brazil and the Americas
[9] GISAID: Global initiative on sharing all influenza data, from vision to reality
[10] Nextstrain: Real-time tracking of pathogen evolution
[11] MAFFT: A novel method for rapid multiple sequence alignment based on fast Fourier transform
[12] IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies
[13] Phylogenetic analysis of nCoV-2019 genomes
[14] Virulence evolution and the trade-off hypothesis: History, current state of affairs and the future
[15] The global population of SARS-CoV-2 is composed of six major subtypes
[16] SARS-CoV-2 spike-protein D614G mutation increases virion spike density and infectivity
[17] Mutations in SARS-CoV-2 viral RNA identified in Eastern India: Possible implications for the ongoing outbreak in India and impact on viral structure and host susceptibility
[18] Human coronavirus: Host-pathogen interaction
[19] Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: A descriptive study, The Lancet
[20] Preliminary identification of potential vaccine targets for the COVID-19 coronavirus (SARS-CoV-2) based on SARS-CoV immunological studies