key: cord-0427203-hxqudykh authors: Sengupta, Antara; Ghosh, Sreeya; Choudhury, Pabitra Pal title: Analysis of changes occurring in Codon Positions due to mutations through the cellular automata transition rules date: 2021-09-01 journal: bioRxiv DOI: 10.1101/2021.08.30.458305 sha: 793c1007c2f4794bbada2205147154ccb256a152 doc_id: 427203 cord_uid: hxqudykh

Variation in the nucleotides of a codon may cause variations in the evolutionary patterns of a DNA or amino acid sequence. To address the capability of each position of a codon to undergo non-synonymous mutations, the concept of degree of mutation is introduced. The degree of mutation of a particular position of a codon is the number of non-synonymous mutations that occur when the nucleotide at that position is substituted while the other two positions of the codon remain unaltered. A Cellular Automaton (CA) is used as a tool to model the mutation of any one of the four DNA bases A, C, T and G at a time, where the DNA bases correspond to the states of the CA cells. Point mutations (substitution type) of a codon, which characterize changes in the amino acids, are associated with local transition rules of a CA. Though a 4-state CA with 3-neighbourhood cells can have many transitions, it has been possible here to represent all possible point mutations of a codon in terms of combinations of 16 local transition functions of the CA. Further, these rules are divided into 4 equivalence classes. Also, according to the nature of the mutations, the 16 local CA rules of substitution are classified into 3 sets, namely 'No Mutation', 'Transition' and 'Transversion'. The experiment has been carried out with three sets of single nucleotide variations (SNVs) from three different viruses that cause diseases with broadly similar symptoms: SARS-CoV-1, SARS-CoV-2 and H1N1 Type A.
The aim is to understand the impact of nucleotide substitutions at different positions of a codon with respect to a particular disease phenotype.

The genetic code defines rules for translating the genetic information encoded in nucleotide triplets, or codons, into amino acids. It also defines the order in which amino acids are added during protein synthesis. The genetic code table contains $4^3 = 64$ codons, which encode the 20 standard amino acids and 3 stop signals. Hence, the code is degenerate. The multiplet structure of the genetic code [1] specifies that, instead of a one-to-one mapping, a single amino acid can be coded by one, two, three, four or six codons. Codon usage is an important determinant of gene expression and, surprisingly, transcription rather than translation plays the key role here [2, 3]. It has been reported that codon and amino acid usage is consistent with the forces acting on the four DNA bases rather than on codons or amino acids [4]. Analysis of codon usage gives insight into the evolution of an organism [5]. The choice of codon for an amino acid is subject to natural selection, and the amino acid composition of proteins tends to minimize the impact of mutations on protein structure [6]. A codon can have mutations at the first, second or third position. Mutations at the third position of a codon are more likely to be synonymous than mutations at the first or second positions [7]. Hence, the probability that a mutation replaces the amino acid with a new one is lower at the third position of a codon than at the first and second positions. The second position of a codon is the most conserved position.

According to the genetic code table, 61 codons code for 20 amino acids and there are three stop codons [28]. The standard classic model of the genetic code table consists of four rows and four columns.
The four rows represent the first base of each codon, the four columns represent the second base, and the right-hand side indicates the third base. A codon combines the 4 bases A, T, C, G at its 3 positions and as a whole codes for a particular amino acid. Since there are 20 different amino acids and 64 possible codons, more than one codon may code for a single amino acid. Hence, a change of nucleotide at any position of a codon due to mutation may either change the encoded amino acid or leave it unchanged; these are called non-synonymous and synonymous mutations respectively. This section aims to give a clear view of the mapping between codon and amino acid when mutations occur at the first, second and third positions of a codon.

Definition 2.1 (Degree of Mutation of a particular position of codon). Consider a codon $C$ with constituent nucleotides $(N_1, N_2, N_3)$, where $N_i \in N$ is a particular position of the codon. Let $S_i$ be any one nucleotide from the set of nucleotides $S = \{T, C, A, G\}$ at a particular position $N_i$ in codon $C$ when the nucleotides at the other two positions are constant. The degree of mutation $\delta(M)$ at a particular position of a codon is the number of non-synonymous mutations that occur when the nucleotide at that position is substituted while the other two positions of the codon are unaltered.

It is to be noted that when any two positions of a codon are constant, it is possible to make a change in 16 possible places, with at most three alternative nucleotides, when the position is initially occupied by any one of the four nucleotides (shown in Table 1). Thus the number of probable changes of amino acid (AA) due to mutations at the first position of a codon may be between 0 and 3 when the 2nd and 3rd positions are fixed. Hence, the degree of mutation at the first position of a codon varies from 0 to 3.
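Definition 2.1 can be illustrated with a minimal Python sketch. It assumes one reading of the definition: the degree of mutation is the number of distinct translation outcomes obtained by placing T, C, A and G at the chosen position, minus one, with a stop codon counted as its own outcome. The names `CODON_TABLE`, `outcomes` and `degree_of_mutation` are hypothetical and are not taken from the paper.

```python
from itertools import product

BASES = "TCAG"  # row/column order of the standard genetic code table

# Standard genetic code with codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...); '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def outcomes(position, fixed):
    """Amino acids obtained by placing T, C, A, G at `position` (1, 2 or 3)
    while the other two positions hold the two nucleotides in `fixed`."""
    result = []
    for base in BASES:
        codon = list(fixed)
        codon.insert(position - 1, base)
        result.append(CODON_TABLE["".join(codon)])
    return result


def degree_of_mutation(position, fixed):
    """Assumed reading of Definition 2.1: distinct outcomes minus one.
    Stop codons ('*') are treated as a separate outcome (an assumption)."""
    return len(set(outcomes(position, fixed))) - 1


# T fixed at the 2nd and 3rd positions, first position varied
print(outcomes(1, "TT"), degree_of_mutation(1, "TT"))
```

With T fixed at the 2nd and 3rd positions, the four codons TTT, CTT, ATT and GTT translate to F, L, I and V, so the sketch reports degree 3 for the first position in that context.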
When a mutation changes the nucleotide at the 2nd position, the probable change in amino acid is greatest, and the degree of mutation lies in the range 2 to 3. Mutations at the third position of a codon have very little capability to make non-synonymous changes in amino acids, and hence the range is 0 to 2. As an example, when T is constant at both the 2nd and 3rd positions, changing the nucleotide (A/T/C/G) at the first position yields the amino acids F, L, I and V according to the genetic code table, so the number of amino acid changes is 3. Hence, the degree of mutation $\delta(M)$ here is 3.

A Cellular Automaton (CA) ([29, 14, 30]) is a triplet $(Q, Q^{\mathbb{Z}}, \tau)$, where,
• $Q$ is a finite state set

Definition 2.2. A restriction from $\mathbb{Z}$ to a subset $S_i$ containing $i \in \mathbb{Z}$ induces a restriction of $C$ to $c_i$ given by $c_i : S_i \to Q$, where $c_i$ may be called the local configuration and $S_i$ the neighbourhood of the $i$th cell. The mapping $\mu_i : Q^{S_i} \to Q$ is known as a local transition function for the $i$th cell. Thus $\forall i \in \mathbb{Z}$, $\mu_i(c_i) \in Q$.

Point mutation of a codon can be associated with local transitions of Cellular Automata (CA) having 3-celled local configurations. 16 substitutions are possible with the four DNA bases A, C, T and G.

For third-position mutation, the local configuration of the $i$th cell may be denoted by $(x, y, c_i)$ such that $c_{i-2} = x$ and $c_{i-1} = y$. The local transition function for the $i$th cell is denoted by $\mu_{R(xyi)}$, where $R(xyi)$ is the rule number for some $R \in \{0, 1, 2, \ldots, 15\}$. The rules for third-position mutation are computed with respect to the constant first- and second-position nucleotides $x, y$.

For second-position mutation, the local configuration of the $i$th cell may be denoted by $(x, c_i, y)$ such that $c_{i-1} = x$ and $c_{i+1} = y$. The local transition function for the $i$th cell is denoted by $\mu_{R(xiy)}$, with rules $R(xiy)$.

For first-position mutation, the local configuration of the $i$th cell may be denoted by $(c_i, x, y)$ such that $c_{i+1} = x$ and $c_{i+2} = y$.
The rules for first-position mutation are given by the local transition function for the $i$th cell, denoted by $\mu_{R(ixy)}$.

These combinations of 16 CA rules can further be classified into three sets, which depict No Mutation, Transition and Transversion of nucleotides irrespective of the position where the point mutation occurs. According to the rules for point mutation with respect to constant nucleotides $x$ and $y$, one set of rules represents Transitions, where point mutations occur due to substitutions between the two purine bases (A or G) or between the two pyrimidine bases (T or C), and another set represents Transversions, where point mutations occur due to substitution of a purine base (A or G) by a pyrimidine base (T or C) or vice versa. These classifications have been tabulated in Table 2.

2.4. Amino Acids Arising due to Point Mutations Represented by Equivalent Rules

Definition 2.3. Any two local transition functions for an $i$th cell, denoted by $\mu_{R(xyi)}$ and $\mu_{R'(xyi)}$, are equivalent if both rules produce the same output. Here $R(xyi)$ and $R'(xyi)$ are rule numbers for $R, R' \in \{0, 1, 2, \ldots, 15\}$.

GT/GC/GA/GG: [3](GiT), [3](GiC), [3](GiA), [3](GiG); 1st: GT/GC/GA/GG: [3](iGT), [3](iGC), [3](iGA), [3]

To establish the novelty of the methodologies discussed in the previous section, it is necessary to apply them to a given dataset. To carry out the experiment, mutated genomic sequences of three viruses, SARS-CoV-1, SARS-CoV-2 and H1N1, are taken. 39680 genomic sequences of SARS-CoV-2 reported for Asian countries are collected from https://covidcg.org/, which is an open resource to track SNVs (single-nucleotide variations). For SARS-CoV-1, 54 mutated genomic sequences are considered. Data from 35008 patients with H1N1 type A are collected from the NCBI influenza virus database (https://www.ncbi.nlm.nih.gov/genomes/FLU/Database/nph-select.cgi#mainform). The collected dataset is summarized in the table.
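The 16 substitution rules, their three mutation classes and their four equivalence classes can be sketched in Python as follows. The numbering used here, `4 * index(old) + index(new)` with the bases ordered T, C, A, G, is an assumption: it is inferred from the rule numbers quoted in this paper (rule 1 for T → C, rule 4 for C → T, rule 12 for G → T, rule 14 for G → A) and is not stated explicitly in the recovered text. Grouping rules by the nucleotide they output is likewise an assumed reading of Definition 2.3. All function names are hypothetical, and the SNV list in the usage line is toy data, not data from the study.

```python
from collections import Counter

BASES = "TCAG"          # assumed state order, chosen to match the quoted rule numbers
PURINES = {"A", "G"}    # T and C are the pyrimidines


def rule_number(old, new):
    """Assumed numbering: 4 * index(old) + index(new), e.g. C -> T is rule 4."""
    return 4 * BASES.index(old) + BASES.index(new)


def substitution(rule):
    """Inverse of rule_number: the (old, new) nucleotide pair encoded by a rule."""
    return BASES[rule // 4], BASES[rule % 4]


def mutation_class(rule):
    """Classify a rule as 'No Mutation', 'Transition' or 'Transversion'."""
    old, new = substitution(rule)
    if old == new:
        return "No Mutation"
    if (old in PURINES) == (new in PURINES):
        return "Transition"      # purine <-> purine or pyrimidine <-> pyrimidine
    return "Transversion"        # purine <-> pyrimidine


def equivalence_class(rule):
    """Rules producing the same output nucleotide are treated as equivalent
    (assumed reading of Definition 2.3)."""
    return substitution(rule)[1]


def rule_percentages(snvs):
    """Percentage of SNVs per rule; `snvs` is an iterable of (old, new) pairs."""
    counts = Counter(rule_number(old, new) for old, new in snvs)
    total = sum(counts.values())
    return {rule: 100.0 * n / total for rule, n in sorted(counts.items())}


# Toy input (hypothetical): two C -> T, one G -> A, one G -> T
print(rule_percentages([("C", "T"), ("C", "T"), ("G", "A"), ("G", "T")]))
```

Under this numbering, rules 0, 5, 10 and 15 leave the nucleotide unchanged, four rules are transitions and eight are transversions, and each of the four output nucleotides defines one equivalence class of four rules.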
Mutations are observed at different positions of codons throughout the dataset. In this section we try to find the most frequent codon transitions. The degree of mutation is analysed for each mutation. It has been observed that the majority of mutations are of degree 3 for all datasets, as shown in Figure 1. The percentage of mutations occurring according to each CA rule is also shown. The SNVs of SARS-CoV-2 mostly follow rule 4 (51%), whereas CA rule 4 (19.01%) and CA rule 1 (18.71%) make approximately equal contributions in SARS-CoV-1. The SNVs of H1N1 mostly follow rule 14 (51%). Rule 4 indicates the substitution of nucleotide C by T and rule 1 specifies the substitution of T by C, i.e. between pyrimidines, while rule 14 indicates the substitution of nucleotide G by A, i.e. between purines.

A closer view is given of the codon-position-wise degree of mutation in SARS-CoV-1 and SARS-CoV-2, where rule 4 (C → T) is applied most often, and in H1N1, where rule 14 (G → A) is applied most often. The result is shown in Figure 4. It is remarkable that in both the SARS-CoV-1 and SARS-CoV-2 datasets most mutations took place at the 2nd position of codons and are of degree 3. In the H1N1 Type A virus, the mutations of degree 3 took place equally at the 1st and 2nd positions of codons. A few transversions (15.29%) also took place in SARS-CoV-2. In these cases base G of the codon is substituted by T; in CA terms this substitution comes under rule 12. In SARS-CoV-1, a few transversions are found in which substitutions take place between A and T, defined by CA rule 2 (T → A) and rule 8 (A → T). It is reported that the most harmful mutations due to substitutions take place between A and T. These kinds of mutations change the hydropathy and polarity of amino acids. Hence, the next point of investigation concerns them.
It has been observed that, according to the CA rules, 9.61% and 3.51% of the total SNVs found in the SARS-CoV-1 dataset follow rule 8 (A → T) and rule 2 (T → A) respectively (shown in Table 6).

In this article, point mutation (substitution type) of a codon has been associated with local transitions of Cellular Automata (CA) having 3-celled local configurations. Clearly, 16 substitutions are possible with the four DNA bases A, C, T and G, and these can be represented in terms of combinations of 16 local transition functions of a CA. The experiment has been carried out with three sets of SNVs from three different viruses that cause diseases with broadly similar symptoms: SARS-CoV-1, SARS-CoV-2 and H1N1 Type A. The aim is to understand the impact of nucleotide substitutions at different codon positions on the mutations that occur in a particular disease phenotype. With reference to supplementary Table S1, it is to be noted that although the number of genomic sequences taken for all three viruses is large, the H1N1 type A virus has comparatively few variants. Codon usage bias is observed in all organisms, including viruses. The reason may be either the pressure of natural selection or biases in the mutation process. According to the theory of the origin and evolution of the genetic code, codons are selected in such a way as to minimize the adverse effect of point mutations and translation errors. It has been observed that in all the datasets most mutations have taken place at codons having degree of mutation 3. Codons having degree of mutation 3 are capable of changing into up to 3 other amino acids through substitution of the nucleotide at a particular position. It has been observed that in the SNVs of SARS-CoV-1 the mutations mostly took place at the 2nd position, whereas in the SNVs of H1N1 type A the 1st and 2nd positions of codons are equally affected. In SARS-CoV-2 most mutations occurred at the 1st position of the codon.
The second position of a codon is the most functionally constrained position, and a change there is non-synonymous. According to the nature of the mutations, the 16 CA rules of substitution are classified into 3 classes, namely 'No Mutation', 'Transition' and 'Transversion'. The experimental results show more substitutions from the CA class 'Transition' than from the other two classes. Transition mutations are more likely than transversions, because transversions substitute nucleotides between purine (having 2 rings in its structure) and pyrimidine (having 1 ring). Hence, substitution of a single-ring structure by another single-ring structure is more likely than substitution of a double ring by a single ring. Transitions are more certain to change amino acids. Harmful substitutions from the CA class 'Transversion' (rule 2 and rule 8) are noticed (13.16% in total) between bases A and T in some SNVs of SARS-CoV-1; these are responsible for large structural changes in existing proteins.

In this article a Cellular Automaton has been used to model substitutions of the four DNA bases A, C, T and G at different positions of codons. Considering a codon as a triplet, substitution of nucleotides may take place at any one of the three positions of a codon and cause point mutations. All possible point mutations have been represented here as functions of 16 CA transition rules. Point mutations may or may not change the amino acid. The degree of mutation at a particular position of a codon is the number of amino acid changes due to substitution of the nucleotide at that position, when the other two positions of the codon are fixed. Hence, the degree of mutation specifies the capability of nucleotide substitutions at a particular position of a codon to produce new amino acids and their impact on the pathogenesis of a particular disease.
Thus, the aim of this work is to investigate the codon alteration patterns due to nucleotide substitutions and their impact during mutations of a gene responsible for a particular disease. Hence, the signature of a particular disease could be portrayed in the light of CA transition rules and codon alteration patterns.

[1] The genetic code multiplet structure
[2] Codon usage is an important determinant of gene expression levels largely through its effects on transcription
[3] Variation and selection on codon usage bias across an entire subphylum
[4] A simple model based on mutation and selection explains trends in codon and amino-acid usage and GC composition within and across genomes
[5] Codon usage analysis of zoonotic coronaviruses reveals lower adaptation to humans by SARS-CoV-2, Infection
[6] Amino acid composition of proteins reduces deleterious impact of mutations
[7] Variation in evolutionary processes at different codon positions
[8] Correction: Optimization of the standard genetic code according to three codon positions using an evolutionary algorithm
[9] Mathematical model for coronavirus disease 2019 (COVID-19) containing isolation class
[10] Mathematical models for devising the optimal SARS-CoV-2 strategy for eradication in China, South Korea, and Italy
[11] Co-infection with SARS-CoV-2 and influenza A virus in patient with pneumonia, China
[12] The role of clade competition in the diversification of North American canids
[13] Clade GR and clade GH isolates of SARS-CoV-2 in Asia show highest amount of SNPs
[14] Theory of cellular automata: A survey
[15] Cellular automata and its applications in bioinformatics: A review
[16] Towards modeling DNA sequences as automata
[17] A cellular automaton model for the study of DNA sequence evolution
[18] Reconstruction of DNA sequences using genetic algorithms and cellular automata: Towards mutation prediction?
[19] A New Kind of Computational Biology: Cellular Automata Based Models for Genomics and Proteomics
[20] Mutations of the "game of life": A generalized cellular automata perspective of complex adaptive systems
[21] Application of local rules and cellular automata in representing protein translation and enhancing protein folding approximation
[22] Computational model on COVID-19 pandemic using probabilistic cellular automata
[23] Fuzzy cellular automata model for discrete dynamical system representing spread of MERS and COVID-19 virus
[24] A novel cellular automata classifier for COVID-19 trend prediction
[25] A simple cellular automaton model for influenza A viral infections
[26] Probing the effects of the well-mixed assumption on viral infection dynamics
[27] Modeling influenza viral dynamics in tissue
[28] The origin of the genetic code
[29] Some algebraic properties of linear synchronous cellular automata
[30] Evolutions of some one-dimensional homogeneous cellular automata

The authors declare no competing interests.

Table S1. Virus-wise specification of all the SNVs.