key: cord-0978252-d0tq5wjx authors: Taiwo, Idowu A.; Adeleye, Nike; Anwoju, Fatimah O.; Adeyinka, Adeyemi; Uzoma, Ijeoma C.; Bankole, Taiwo T. title: Sequence analysis for SNP detection and phylogenetic reconstruction of SARS-cov-2 isolated from Nigerian COVID-19 cases date: 2022-01-18 journal: New Microbes New Infect DOI: 10.1016/j.nmni.2022.100955 sha: 023c36e0f7a54c6103c44e7dc1c017832f1f63b0 doc_id: 978252 cord_uid: d0tq5wjx
Background Coronaviruses are a group of viruses that belong to the Family Coronaviridae, Genus Betacoronavirus. In December 2019, a new coronavirus disease (COVID-19) characterized by severe respiratory symptoms was discovered. The causative pathogen was a novel coronavirus known as 2019-nCoV and later as SARS-CoV-2. Within two months of its discovery, COVID-19 became a pandemic causing widespread morbidity and mortality.
Methodology Whole genome sequence data of SARS-CoV-2 isolated from Nigerian COVID-19 cases were downloaded from the GISAID database. A total of 18 sequences that satisfied quality assurance (length > 29700 nts and number of unknown bases, denoted as 'N', < 5%) were used for the study. In addition, the genome sequence of SARS-CoV-2 obtained from Nigeria’s COVID-19 index case (Accession ID: EPI_ISL_413550) and the reference genome (Accession NC_045512.2) were obtained from the GISAID and GenBank databases, respectively. Multiple sequence alignment (MSA) was done in MAFFT (Version 7.471), while SNP calling was implemented in DnaSP (Version 6.12.03) and then visualized in Jalview (Version 2.11.1.0). Phylogenetic analysis was performed with MEGA X software.
Results Nigerian SARS-CoV-2 had 99.9% genomic similarity with four large conserved genomic regions. A total of 66 SNPs were identified, of which 31 were informative. Nucleotide diversity assessment gave Pi = 0.00048 and an average SNP frequency of 2.22 SNPs per 1000 nts.
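As a rough illustration of the quality criteria and the average SNP frequency reported above, the following Python sketch applies the two stated thresholds (length > 29700 nts, fewer than 5% 'N') to toy sequences and recomputes SNPs per 1000 nts. The function names and the toy sequences are hypothetical and are not part of the study's pipeline, and using the trimmed alignment length of 29,787 nts as the denominator is an assumption.

```python
def passes_quality_filter(seq, min_length=29700, max_n_fraction=0.05):
    """Return True if a genome is longer than min_length nts and has
    a fraction of unknown bases ('N') below max_n_fraction."""
    seq = seq.upper()
    if len(seq) <= min_length:
        return False
    return seq.count("N") / len(seq) < max_n_fraction


def snps_per_kb(n_snps, region_length):
    """SNP density expressed as SNPs per 1000 nucleotides."""
    return 1000.0 * n_snps / region_length


# Toy sequences for illustration only (not real viral genomes).
good = "ACGT" * 7500                      # 30,000 nts, no N
too_short = "ACGT" * 7000                 # 28,000 nts
too_many_n = "N" * 2000 + "ACGT" * 7000   # 30,000 nts, about 6.7% N

print(passes_quality_filter(good))        # True
print(passes_quality_filter(too_short))   # False
print(passes_quality_filter(too_many_n))  # False

# 66 SNPs over an assumed alignment length of 29,787 nts.
print(round(snps_per_kb(66, 29787), 2))   # 2.22
```

Under that assumed denominator, the recomputed value agrees with the reported average of 2.22 SNPs per 1000 nts.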
Non-coding genomic regions, particularly the 5'UTR and 3'UTR, had SNP densities of 3.77 and 35.4, respectively. The region with the highest SNP density was ORF10, with a frequency of 8.55 SNPs/1000 nts. This value was significantly higher (P<0.01) than that of the spike gene, the region of greatest interest in SARS-CoV-2 genomics. The majority (72.2%) of viruses in Nigeria are of the L lineage, with a preponderance of the D614G mutation, which accounted for 11 (61.1%) of the 18 viral sequences. Nigerian SARS-CoV-2 revealed 3 major clades, namely Oyo, Ekiti and Osun, on a maximum likelihood phylogenetic tree.
Conclusion and Recommendation There was a preponderance of L lineage (to include the new lineage scheme) and D614G mutants. Nigerian SARS-CoV-2 genome revealed ORF1ab as the region containing the highest SNP density as compared to the spike gene. The implication of this distribution of SNPs for the empirical lower infectivity of SARS-CoV-2 in Nigeria is discussed. This also underscores the need for more aggressive testing and treatment of COVID-19 in Nigeria. Additionally, attempts to produce testing kits for COVID-19 in Nigeria should consider the conserved regions identified in this study. Strict adherence to COVID-19 preventive measures is recommended in view of the Nigerian SARS-CoV-2 phylogenetic clustering pattern, which suggests intensive community transmission possibly rooted in the communal culture characteristic of many ethnicities in Nigeria.
Coronaviruses are a group of viruses that belong to the Family Coronaviridae, Genus Betacoronavirus [1, 2]. These viruses are of special interest because they possess the largest genome among RNA viruses and can infect a wide host range, causing intestinal and respiratory infections in animals and humans [3, 4]. In recent times, coronaviruses have attracted renewed interest in view of the novel coronavirus disease outbreak of 2019 (COVID-19) that originated in Wuhan, China [5].
The causative agent was found to be a novel coronavirus (2019-nCoV) that was identified in December 2019. Barely two months after its discovery, the disease had become a pandemic of global concern, causing widespread morbidity and mortality [6, 7]. Because COVID-19 is a serious respiratory disease, the causative pathogen was later known as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
Before the emergence of SARS-CoV-2, six coronaviruses were known to infect humans [5, 8]. Out of these, severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory
To ensure true homology, non-biological differences (i.e. differences due to technical variations) were removed from the retrieved sequences. The sequences were aligned in MAFFT and trimmed at the 5' and 3' ends in MEGA X to obtain homologous sequences of 29,787 nts each. Alignment (Table 4).
Multiple sequence alignment of the translated SARS-CoV-2 spike (S) gene revealed a D/G amino acid substitution at position 614 of the spike protein (Fig. 4). There was a preponderance of the D614G mutation: the majority, 11 (61.1%), of the analyzed SARS-CoV-2 genomic sequences had G at position 614 of the spike protein, compared with 7 (38.9%) that had D at the site.
Four major clusters were identified as depicted in Fig. 6.
Origin and evolution of pathogenic coronaviruses
The establishment of reference sequence for SARS-CoV-2 and variation analysis
Genome Sequence of a 2019 Novel Coronavirus (SARS-CoV-2) Strain Isolated in Nepal
Tsiodras S. Full-genome evolutionary analysis of the novel corona virus (2019-nCoV) rejects the hypothesis of emergence as a result of a recent recombination event
The proximal origin of SARS-CoV-2
SARS-CoV-2 genome sequence from Nigerian COVID-19 case
SARS-CoV-2 Genomes from Nigeria Reveal Multiple Virus Lineages and Spike Protein Mutation Associated with Higher Transmission and Pathogenicity
The authors declare no competing interest.