key: cord-0711114-fkik557p authors: Hassan, Sk Sarif; Basu, Pallab; Redwan, Elrashdy M.; Lundstrom, Kenneth; Choudhury, Pabitra Pal; Serrano-Aroca, Angel; Azad, Gajendra Kumar; Aljabali, Alaa A.A.; Palu, Giorgio; El-Aziz, Tarek Mohamed Abd; Barh, Debmalya; Uhal, Bruce D.; Adadi, Parise; Takayama, Kazuo; Bazan, Nicolas G.; Tambuwala, Murtaza; Lal, Amos; Chauhan, Gaurav; Baetas-da-Cruz, Wagner; Sherchan, Samendra P.; Uversky, Vladimir N. title: Periodically aperiodic pattern of SARS-CoV-2 mutations underpins the uncertainty of its origin and evolution date: 2021-09-22 journal: Environ Res DOI: 10.1016/j.envres.2021.112092 sha: d3cb86a302a3d7bd6d9d7ce8a8754283e5a383cf doc_id: 711114 cord_uid: fkik557p
Various lineages of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) have contributed to the prolongation of the coronavirus disease 2019 (COVID-19) pandemic. Several non-synonymous mutations in SARS-CoV-2 proteins have generated multiple SARS-CoV-2 variants. In our previous report, we showed that the evenly uneven distribution of unique SARS-CoV-2 protein variants is geo-location- or demography-specific. However, the correlation between the demographic transmutability of SARS-CoV-2 infection and mutations in various proteins remains unknown because of hidden symmetry/asymmetry in the occurrence of mutations. This study tracked how these mutations emerged in SARS-CoV-2 proteins in six model countries and globally. For each geo-location, considering mutations detected at least 500 times in each SARS-CoV-2 protein, we determined the country-wise percentage of invariant residues. Our data revealed that since October 2020, highly frequent mutations in SARS-CoV-2 have been observed mostly in the open reading frame (ORF) 7b and ORF8 proteins worldwide.
No such highly frequent mutations in any SARS-CoV-2 protein were found in the UK, India, or Brazil, which does not correlate with the degree of transmissibility of the virus in India and Brazil. However, we found signatures that SARS-CoV-2 proteins were evolving at a higher rate, and, considering global data, mutations were detected at the majority of available amino acid positions. Fractal analysis of each protein's normalized factor time series showed a periodically aperiodic emergence of dominant variants for SARS-CoV-2 protein mutations across different countries. Certain high-frequency variants emerged in the last couple of months, so emerging SARS-CoV-2 strains are expected to contain prevalent mutations in the ORF3a, membrane, and ORF8 proteins. In contrast to other beta-coronaviruses, SARS-CoV-2 variants have rapidly emerged through demographically dependent mutations. Characterizing the periodically aperiodic nature of the demographic spread of SARS-CoV-2 variants in various countries can contribute to identifying the origin of SARS-CoV-2.
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the causative agent of coronavirus disease 2019 (COVID-19) (Pedersen et al., 2020; Setti et al., 2020; Domingo et al., 2020). SARS-CoV-2 has spread rapidly and evolved, prolonging the pandemic and remaining a precarious clinical entity (Health, 2020; Tapper and Asrani, 2020). Since the beginning of the pandemic, SARS-CoV-2 has accumulated an increasing number of mutations, giving rise to patterns of genomic diversity (van Dorp et al., 2020; Zhou et al., 2020). These variations are scattered across geo-locations and may underlie geographically specific etiological effects (Mercatelli and Giorgi, 2020).
It was expected that these mutations could be used to monitor the spread of the virus and to identify sites putatively under selection as SARS-CoV-2 adapts to its new human host. SARS-CoV-2 may be evolving towards higher transmissibility because it may not yet be fully adapted to its human host. The most plausible mutations under putative natural selection are those that have emerged repeatedly and independently (van Dorp et al., 2020). It was reported that 198 sites in the SARS-CoV-2 genome appear to have already undergone recurrent, independent mutations (van Dorp et al., 2020). Various SARS-CoV-2 missense mutations are key evolutionary factors affecting the infectivity, virulence, and pathogenicity of the virus (Mercatelli and Giorgi, 2020). Several SARS-CoV-2 variants have significantly increased infectivity, as reported previously. It was previously reported that the mutation rate of SARS-CoV-2 is relatively low compared with other RNA viruses, such as influenza virus (Kupferschmidt, 2020; Khan et al., 2020; Gómez-Carballa et al., 2020). The low SARS-CoV-2 mutation rate might relate to its intrinsic proofreading ability (Romano et al., 2020; Ogando et al., 2020). Thus far, although several mutations have been detected, SARS-CoV-2 does not seem to be drifting antigenically (Yuan et al., 2021; Williams and Burgers, 2021). However, the mechanisms by which SARS-CoV-2 evolves or develops gain-of-function variations remain unclear. A non-uniform mutation pattern in the viral proteins was recently reported, further raising questions about the origin of SARS-CoV-2 (Hassan et al., 2021b). The rapidly evolving data on mutations and strains of SARS-CoV-2 make it difficult to firmly assert whether SARS-CoV-2 resulted from a zoonotic emergence or from an accidental laboratory escape (Sallard et al., 2021; Pipes et al., 2021; Nadeau et al., 2021; Seyran et al., 2021).
This issue of origin needs to be resolved because it has important consequences for the risk/benefit balance of human interactions with ecosystems, for the intensive breeding of wild and domestic animals, for some laboratory practices, and for scientific policy and biosafety regulations (Sallard et al., 2021). Despite these recent investigations, several issues related to the evolutionary patterns and origin of the COVID-19 pandemic remain to be fully characterized (Liu et al., 2019, 2020; Domingo, 2021a,b). No direct correlation was observed between the mutation pattern of SARS-CoV-2 and the infection rate in the first and second waves of COVID-19 (Ko et al., 2021; Lv et al., 2020; Kumar et al., 2020). In this study, the potential embedded mutation patterns of the S, E, M, N, ORF3a, ORF6, ORF7a, ORF7b, ORF8, and ORF10 proteins of SARS-CoV-2 are analyzed in six model countries, viz. the USA, UK, Brazil, Germany, India, and South Africa (SA), and globally.
The majority of publicly available SARS-CoV-2 genomic sequences were sourced from GISAID, NCBI, and CNGB. The SARS-CoV-2 sequence data were taken from the GISAID database (Shu and McCauley, 2017). Mutation data with their respective details were collected from the CoVal database. In this study, we focused on the spike (S), membrane (M), envelope (E), nucleocapsid (N), ORF3a, ORF6, ORF7a, ORF7b, ORF8, and ORF10 proteins of SARS-CoV-2. We also considered mutations within the proteins of interest from all geo-locations available in the CoVal database. In particular, a set of six model countries (the USA, UK, South Africa, India, Germany, and Brazil) was selected. Single-mutation details were retrieved from the CoVal database by searching for the name of the model country and the SARS-CoV-2 protein of interest. For example, details of all single mutations in the SARS-CoV-2 S protein from the UK were retrieved; a snapshot is presented in Figure 1.
Likewise, details of single mutations were retrieved for the other model countries and SARS-CoV-2 proteins. Before proceeding to the results, some definitions are recalled and redefined for ease of reading. Four classes were defined based on the date of first detection and the frequency of mutations in the SARS-CoV-2 proteins of interest. Class-I contains all mutations detected in the SARS-CoV-2 proteins across the world. Class-II contains only those mutations with a frequency greater than or equal to 500 (a reasonably high frequency for a mutation detected in a geo-location) in SARS-CoV-2 from affected patients worldwide. Mutations detected after October 2020 belong to Class-III. Mutations with a frequency larger than 500 since October 2020 are members of Class-IV. Amino acid residues at which no mutations were detected are termed "invariant residues". From the CoVal database, we first identified the distinct residue positions carrying mutations and counted them (r) for each class. The percentage of invariant residues in each protein of length l was then determined as $\frac{l - r}{l} \times 100$. The percentages of invariant residues (Class-I) in SARS-CoV-2 proteins in various countries are listed in Table 1.
Considering all mutations with amino acid changes in all available geo-locations, it was observed that, unlike the SARS-CoV-2 structural proteins, the ORF proteins (ORF3a, ORF6, ORF7a, ORF7b, ORF8, and ORF10) possessed mutations at every residue position of the respective protein. In the structural proteins, the percentage of invariant residues increased in the order E (6.67%) < N (10.74%) < S (14.69%) < M (24.32%). In other words, the highest and lowest proportions of mutated residue positions were detected in the E and M proteins, respectively.
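The four mutation classes and the invariant-residue formula can be expressed as simple filters. The following is a minimal Python sketch, not the authors' pipeline. Several parts of it are assumptions: the `MutationRecord` fields, the `CUTOFF` date, and the toy records. For `CUTOFF`, we read "since October 2020" as on or after 1 October 2020 for both Class-III and Class-IV. As a check of the formula, E has 75 residues, so r = 70 distinct mutated positions gives (75 - 70)/75 × 100 ≈ 6.67%, which matches the value reported for E. That r is back-calculated from the reported percentage, not taken from the data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


# Hypothetical record for one single-mutation entry; field names are
# illustrative assumptions, not the CoVal database's actual schema.
@dataclass(frozen=True)
class MutationRecord:
    protein: str          # e.g. "S", "E", "ORF8"
    position: int         # 1-based residue position within the protein
    first_detected: date  # date the mutation was first detected
    frequency: int        # number of times the mutation was detected


# Assumed start of the Class-III/IV window: on or after 1 October 2020.
CUTOFF = date(2020, 10, 1)


def class_i(records: List[MutationRecord]) -> List[MutationRecord]:
    """Class-I: every detected mutation."""
    return list(records)


def class_ii(records: List[MutationRecord]) -> List[MutationRecord]:
    """Class-II: mutations with a frequency of at least 500."""
    return [m for m in records if m.frequency >= 500]


def class_iii(records: List[MutationRecord]) -> List[MutationRecord]:
    """Class-III: mutations first detected on or after CUTOFF."""
    return [m for m in records if m.first_detected >= CUTOFF]


def class_iv(records: List[MutationRecord]) -> List[MutationRecord]:
    """Class-IV: Class-III mutations with a frequency larger than 500."""
    return [m for m in class_iii(records) if m.frequency > 500]


def invariant_percentage(length: int, positions: Iterable[int]) -> float:
    """(l - r) / l * 100, where r counts distinct mutated residue positions."""
    r = len(set(positions))
    return (length - r) * 100 / length


# Toy records for a 75-residue protein (E has 75 residues); values are made up.
toy = [
    MutationRecord("E", 10, date(2020, 5, 1), 800),
    MutationRecord("E", 10, date(2020, 11, 3), 20),
    MutationRecord("E", 40, date(2021, 2, 9), 650),
]

print(len(class_ii(toy)))                   # 2
print([m.position for m in class_iv(toy)])  # [40]
print(round(invariant_percentage(75, [m.position for m in toy]), 2))  # 97.33
```

Positions are de-duplicated before counting, so two different substitutions at the same residue (position 10 above) reduce the invariant percentage only once.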
Across the six countries, the highest and lowest numbers of invariant residues were found in the M and ORF3a proteins, respectively. Notably, the highest percentage of invariant residues among all proteins was observed for E in Germany. Among all the proteins, the highest number of mutations was detected in the ORF3a protein in Germany, India, Brazil, and South Africa, whereas in the USA and UK, ORF7b possessed the highest number of mutations. Notably, each SARS-CoV-2 protein possessed an almost similar number of mutations in the USA and the UK. Among the six countries, the fewest mutations across all proteins were found in COVID-19 patients from South Africa, whereas the highest number of mutations in SARS-CoV-2 was detected in the USA. Ranking the six geo-locations by the number of distinct mutations across all proteins (equivalently, in increasing order of invariant residues) gave UK > USA > Germany > SA > Brazil > India (from highest to lowest). In other words, the highest number of different mutations across all these proteins was detected in the UK, and the lowest in India.
In India, Brazil, and SA, no mutation with a frequency of at least 500 (originating in these countries) was detected in any SARS-CoV-2 protein, except for one or two of the most frequently reported deleterious mutations, D614G in S and Q57H in ORF3a. Note that in Germany, the S, N, ORF10, ORF3a, and ORF8 proteins possessed a couple of mutations originating from Germany with a frequency of more than 500, whereas E, M, ORF6, ORF7a, and ORF7b had none.
The percentages of invariant residues (Class-III) in the SARS-CoV-2 proteins in various countries are presented in Table 3. It was observed that the SARS-CoV-2 ORF3a protein was no longer a hotspot for dominant mutations. From October 2020 until June 2021, M and ORF8 had the lowest (66.22%) and the highest (99.174%) percentages of mutated residue positions, respectively, in SARS-CoV-2 across the world.
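Country rankings in these results can be stated in two opposite directions: by number of mutations or by invariant residues. The minimal Python sketch below shows that both orderings agree. It assumes that a geo-location's ranking pools the distinct mutated positions over all proteins studied, and the function name, country labels, and counts are hypothetical placeholders, not data from this study.

```python
from typing import Dict, List, Tuple


def rank_by_mutations(mutated: Dict[str, int], total_length: int) -> List[Tuple[str, int, float]]:
    """Rank geo-locations from most to fewest distinct mutated positions.

    `mutated` maps each geo-location to r, the number of distinct mutated
    residue positions pooled over the proteins studied (an assumption), and
    `total_length` is l, the pooled length of those proteins. For a fixed l,
    (l - r) * 100 / l falls as r rises, so this order is also the increasing
    order of invariant-residue percentage.
    """
    rows = [(loc, r, (total_length - r) * 100 / total_length) for loc, r in mutated.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Placeholder labels and counts, not data from this study.
ranked = rank_by_mutations({"country_A": 300, "country_B": 450, "country_C": 120}, 1000)
for loc, r, pct in ranked:
    print(f"{loc}: r = {r}, invariant = {pct:.1f}%")
```

Sorting the same rows by invariant percentage in ascending order yields an identical sequence, which is why a ranking can be reported either as decreasing mutations or as increasing invariant residues.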
Notably, ORF3a previously possessed the highest number of mutations (Bianchi et al., 2021; Hassan et al., 2020). Currently, ORF3a mutations in the USA, UK, and elsewhere appear to be relatively rare (less than 10%). In SA, however, the highest number of mutations since October 2020 has been detected in ORF3a. In the UK, Germany, India, and SA, the E protein had the lowest frequency of mutations, whereas in the USA and Brazil, the M protein had the highest number of mutations among all proteins. Since October 2020, the highest percentage of mutations in Germany, India, and Brazil has been detected in the ORF8 protein. The highest frequencies of mutations in the USA and UK were observed in ORF7a and ORF7b, respectively. An increasing order of the six geo-locations based on the variability of mutations in SARS-CoV-2 proteins (detected after September 2020 until June 2021) was India