key: cord-0860323-zh58a30y
authors: Victor, Manish Prakash; Das, Rohit; Ghosh, Tapash Chandra
title: An in-silico study on SARS-CoV-2: Its compatibility with human tRNA pool, and the polymorphism arising in a single lineage over a month
date: 2020-07-23
journal: bioRxiv
DOI: 10.1101/2020.07.23.217083
sha: 053658e14377217cfdd39d1060245668e78d94a4
doc_id: 860323
cord_uid: zh58a30y

SARS-CoV-2 has caused a global pandemic that has cost an enormous number of human lives in the recent past. The present study investigates viral codon adaptation, ORF stability and tRNA co-adaptation with humans. We observed that, for the codon usage bias seen in the viral ssRNA, the ORFs have folding free energies and codon adaptation index (CAI) values close to those of the mRNAs of the human housekeeping gene (HKG) CDS. However, the correlation between ORF stability and CAI in the viral ssRNA is stronger than that between mRNA stability and CAI in HKG, suggesting a greater expression capacity for SARS-CoV-2. Mutational analysis reflects polymorphism in the virus for the ORF1ab, surface glycoprotein and nucleocapsid phosphoprotein ORFs. Non-synonymous mutations have shown non-polar substitutions. Out of the twelve mutations, nine favor a higher tRNA copy number. Viruses in general have high mutation rates. To understand the chances of survival of mutated SARS-CoV-2, we simulated synonymous mutations. This resulted in 50% of the ORFs having higher stability than their native equivalents. Thus, considering only synonymous mutations, the virus can exhibit a lot of polymorphism. Collectively, our data provide new insights into SARS-CoV-2 mutations and human tRNA compatibility.

Significance

Survivability of SARS-CoV-2 in humans is essential for its spread. Its overlapping genes exhibit high codon optimization with humans despite a higher codon usage bias. While mutating, they appear to favor high-copy-number tRNAs (cognate or near-cognate) in humans.
Even though it has been well established that native transcripts possess the highest stability, our in-silico studies show that SARS-CoV-2 under mutation gives rise to ORFs with higher stability. These results highlight the virus's ability and the plausibility of survival for its mutants. Although the study focuses on one geographical location, it explains the ongoing behavior of SARS-CoV-2 toward a steady existence in humans, as all the different lineages have a common origin: Wuhan, China.

We have used Pearson correlations with a 95% confidence level as the measure of significance unless otherwise stated. The R language and environment (https://www.r-project.org/) was used to

Effective number of codons (ENC) quantifies the codon usage bias in genes/genomes. It ranges from 20 (extremely biased) to 61 (no bias). Thus, a higher ENC value indicates lower bias and vice versa. In our dataset, ENC = 45.001 for the SARS-CoV-2 predecessor strain and ENC = 56.550 for the housekeeping genes. The successor strains have also shown ENC values very close to one another and to that of the predecessor (Table 1). The co-adaptation index (co-AI), or tRNA co-adaptation, is an indicator of codon optimization to the tRNA pool (23, 24). As

genes, and it is used to estimate the gene expression level (24). Using the codon set of highly expressed genes in Homo sapiens, the SARS-CoV-2 predecessor strain yielded a mean CAI of 0.637, which is close to the HKG mean CAI of 0.73 (Refer Table S2

One hundred in-silico mutated sequences were generated through random synonymous codon shuffling of each ORF in the predecessor strain. For each random sequence of a gene, we maintained the codon usage frequency and the amino-acid sequence of the corresponding gene in the predecessor strain. The minimum folding energy was calculated for all the sequences (Refer S1.1, Supplementary materials and methods).
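The synonymous codon shuffling described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names (`split_codons`, `translate`, `shuffle_synonymous`) are hypothetical, sequences are assumed to be in-frame and written in the DNA alphabet (U is converted to T), and the standard genetic code is assumed. Codons are permuted only among positions that encode the same amino acid, so both the protein sequence and the codon usage counts are preserved. Folding energies are not computed here; that step would need an external RNA folding tool.

```python
import random
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(orf):
    """Split an in-frame ORF into codons, ignoring any trailing partial codon."""
    seq = orf.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def translate(orf):
    """Translate an ORF into a one-letter amino-acid string ('*' marks a stop)."""
    return "".join(CODON_TABLE[c] for c in split_codons(orf))


def shuffle_synonymous(orf, rng):
    """Return a variant of `orf` with codons permuted among synonymous positions."""
    codons = split_codons(orf)
    positions = defaultdict(list)
    for idx, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(idx)
    shuffled = list(codons)
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)  # reorder codons within one amino-acid class
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return "".join(shuffled)


def codon_counts(orf):
    """Codon usage counts of an ORF."""
    return Counter(split_codons(orf))
```

Calling `shuffle_synonymous(orf, random.Random(seed))` one hundred times with different seeds would give a set of randomized variants per ORF, each of which could then be passed to a folding-energy calculation.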
We found that 50% of the ORFs showed stability maxima higher than their native ORFs (Refer Table S2.4, Supplementary tables). The difference between the stabilities obtained is significant, as found through the one-tailed

Table S2.5, Supplementary tables). Overall, twelve codon changes (with different frequencies) have been recorded, resulting in a minuscule overall GC-content reduction (~0.01%) (Table 2). All the non-synonymous changes are non-polar (Table 3). Out of the twelve mutations recorded, seven showed a choice for rare codons, which after near-cognate tRNA mapping turned out to be a choice for a higher tRNA gene copy number; three showed a choice for frequently used codons; and the remaining two are neutral changes. In total, nine mutations favor the higher gene copy number of cognate and near-cognate tRNAs containing the anticodons (Table 2). The polymorphism was calculated between the predecessor strain and all the successor strains as the ratio of non-synonymous to synonymous polymorphisms (pn/ps). pn/ps was >1 for all the mutations (Table 4).

Housekeeping genes are established to be an evolutionarily conserved set of genes that tend to have a higher codon adaptation index. They are required for basic cell maintenance processes.

Table S2.1, Supplementary tables). As SARS-CoV-2 affects the respiratory system, we needed a set of housekeeping genes that can be relied upon even in the presence of any kind of pulmonary infection. The set of housekeeping genes (HKG) was taken as a control, as suggested by the acute pulmonary inflammation study (32).

(i.e. protein abundance). We calculated the co-AI, a correlate between the codons and tRNA gene copy number (Refer Results). The co-AI is 0.350 for HKG and 0.304 for the predecessor strain (Refer Results, Table 1). The co-AI values were also calculated for the successors.
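The bookkeeping behind the mutation analysis above (whether a codon change is synonymous, and how it shifts GC content) can be illustrated with a short sketch. The function names `classify_change` and `summarise` are hypothetical, the standard genetic code is assumed, and the example codon pairs in the usage note are made up for illustration; they are not the changes reported in Table 2. The summary returns raw counts only; a proper pn/ps estimate would also normalize by the numbers of synonymous and non-synonymous sites, which is not attempted here.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def gc_count(codon):
    """Number of G or C bases in a codon."""
    return sum(base in "GC" for base in codon)


def classify_change(old_codon, new_codon):
    """Describe a single codon substitution."""
    old = old_codon.upper().replace("U", "T")
    new = new_codon.upper().replace("U", "T")
    old_aa, new_aa = CODON_TABLE[old], CODON_TABLE[new]
    return {
        "from_aa": old_aa,
        "to_aa": new_aa,
        "kind": "synonymous" if old_aa == new_aa else "non-synonymous",
        "gc_delta": gc_count(new) - gc_count(old),  # net change in G+C bases
    }


def summarise(changes):
    """Count synonymous and non-synonymous changes and the net G+C shift."""
    results = [classify_change(old, new) for old, new in changes]
    return {
        "synonymous": sum(r["kind"] == "synonymous" for r in results),
        "non_synonymous": sum(r["kind"] == "non-synonymous" for r in results),
        "net_gc_delta": sum(r["gc_delta"] for r in results),
    }
```

For instance, the made-up pair `("TTT", "TTC")` is classified as synonymous (both encode phenylalanine) with a GC change of +1, while `("GAT", "GGT")` is non-synonymous (aspartate to glycine).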
It was found that all the successors have approximately the same values, which are very close to that of the predecessor strain (Table 1). This strongly supports a close match between the codons in SARS-CoV-2 and the human tRNA pool. The results signify that the codon composition of SARS-CoV-2 is highly optimized to the anticodons in Homo sapiens. It can therefore be proposed that SARS-CoV-2 shows high codon optimization even under mutation; hence, during infection of the host, its expression might be higher than that of host genes.

mRNA stability is a determinant of ribosomal abundance during translation. Greater mRNA stability results in higher ribosomal abundance, leading to higher translational efficiency (13). There is a positive correlation between the ORFs' stability and CAI in HKG (Refer Results; Table S2.

CoV-2 which implies that with increasing stability of the ORFs the gene expression level increases. It was seen that SARS-CoV-2 holds a stronger and significant correlation between ORFs' stability and CAI compared to the human HKG, which indicates that the virus's capacity for expression is greater than that of humans.

Viruses have a very high rate of mutation, and in order to examine the scope of plausible codon changes, a synonymous codon randomization study was carried out on the predecessor strain. This was an in-silico study simulating synonymous mutations (Refer S1.1, Supplementary materials and methods; Results; Table S2.4, Supplementary tables). As the codon composition in each randomization is identical to that of the predecessor strain, the CAI values will also be identical among all the randomized sequences and the predecessor strain. Nonetheless, nucleotide positions have been swapped in the randomized sequences, and they will show changes in their folding free energies that will influence the secondary structures and hence the ribosomal abundance.
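The statement that CAI is unchanged by the randomization follows from how CAI is defined: it is the geometric mean of per-codon relative adaptiveness weights, so it depends only on which codons are present and not on their order. The sketch below illustrates this with a hypothetical function `cai`. The weights in the usage note are toy values chosen for illustration, not weights derived from the human reference set used in the study.

```python
import math


def cai(codons, weights):
    """Codon adaptation index: geometric mean of the codon weights.

    `codons` is a list of codons; `weights` maps each codon to its relative
    adaptiveness (0 < w <= 1) with respect to a reference gene set.
    """
    if not codons:
        raise ValueError("cannot compute CAI of an empty sequence")
    log_sum = sum(math.log(weights[codon]) for codon in codons)
    return math.exp(log_sum / len(codons))
```

With the toy weights `{"GCT": 1.0, "GCC": 0.5, "AAA": 1.0, "AAG": 0.25}`, the codon list `["GCT", "GCC", "AAA", "AAG"]` and any reordering of it give the same CAI, up to floating-point rounding, because the sum of log-weights does not depend on order. Folding free energy, by contrast, depends on nucleotide positions and therefore can differ between the native and shuffled sequences.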
A previous study has established that native mRNAs have the highest stability compared to their in-silico randomized variants (37). On the contrary, we found that 50% of the synonymously randomized ORFs exhibited folding free energy maxima greater than their native ORFs (Figure 3). We tested the significance of the difference between the initial (native) stability and the stability maxima of the randomized sequences through a t-test and found it to be significant (Refer

anticodons with a marginally lower tRNA copy number (Table 2). The result clearly states that, at the genomic level, SARS-CoV-2 is thoroughly tuned to be highly compatible with the translational machinery of its host, i.e. Homo sapiens.

The findings presented here reveal a strong universal connection between the codon usage patterns in SARS-CoV-2 and the Homo sapiens tRNA pool. Even with a high mutation rate, the choice of codons has a strong inclination toward the high copy number of the cognate and near-

is seen (13). It is interesting to find that the virus, even with a higher bias than the human HKG, has considerable codon optimization with the latter. As bias is not the only essential criterion for heightened gene expression, the virus has notably balanced both bias and optimization. We also found that the virus has undergone purifying selection and is exhibiting polymorphism with changes in its ORF1ab polyprotein, surface glycoprotein, and nucleocapsid phosphoprotein.

Forces that influence the evolution of codon bias
Codon usage and tRNA content in unicellular and multicellular organisms