key: cord-0075907-ncl4029v
authors: Mitra, Debanjan; Pal, Aditya K.; Das Mohapatra, Pradeep Kr.
title: Intra-protein interactions of SARS-CoV-2 and SARS: a bioinformatic analysis for plausible explanation regarding stability, divergency, and severity
date: 2022-03-21
journal: Syst Microbiol and Biomanuf
DOI: 10.1007/s43393-022-00091-x
sha: 67125aa01f93c1e586d705d4fbe470c831c67cb3
doc_id: 75907
cord_uid: ncl4029v
COVID-19 is currently a nightmare for the whole world. A cluster of pneumonia cases in Wuhan city, Hubei province, China, was first reported on December 30, 2019. SARS-CoV was first identified in 2002 but did not spread worldwide. Eighteen years later, in 2020, a related coronavirus, SARS-CoV-2, emerged and spread worldwide, causing COVID-19, the most dangerous viral disease in the world today. Can favorable evolution occur within such a short time (18 years)? If so, which properties or factors changed in SARS-CoV-2 to make it so hard to defeat? What are the fundamental differences between SARS-CoV-2 and SARS? This study is one attempt to answer these questions. Here, four types of protein sequences from SARS-CoV-2 and SARS were retrieved from databases to study their physicochemical and structural properties. The results showed that charged residues play a pivotal role in SARS-CoV-2 evolution and contribute to helix stabilization. The formation of a cyclic salt bridge and other intra-protein interactions, especially networks of aromatic–aromatic interactions, also play a crucial role in SARS-CoV-2. This comparative study will help to understand the evolution from SARS to SARS-CoV-2 and will be helpful in protein engineering.
The disease caused by SARS-CoV-2 has been named Coronavirus Disease 2019 (COVID-19). SARS-CoV first emerged in Guangdong province, China, in 2002 and spread to five countries, infecting 8098 people and causing 774 deaths, a mortality rate of 11% [1].
In 2012, MERS-CoV appeared in the Arabian Peninsula and spread to 27 countries, infecting a total of 2494 individuals and claiming 858 lives, a mortality rate of 34% [2]. More recently, SARS-CoV-2 emerged in Wuhan city, Hubei province, China, in December 2019. As of 11 January 2022, over 300 million cases of COVID-19 and over 5.4 million deaths (mortality rate around 3.40%) had been reported from 222 countries. A new variant named "Omicron", with a high transmission rate, has also been reported in many countries. On March 11, 2020, the World Health Organization declared COVID-19 a pandemic and a public health emergency of global concern. People of all ages can catch this infection, but immunocompromised people with comorbidities are the most vulnerable. Older people and males with chronic diseases (e.g., diabetes, heart disease, cancer) are more vulnerable than other groups [3]. The virus is easily transmitted through droplets generated when infected people cough and sneeze [4]. These infectious droplets can travel 1–2 m and settle on surfaces. The virus can survive on metal surfaces for several hours, even days, under favorable conditions, but it can be inactivated by disinfectants such as hydrogen peroxide and sodium hypochlorite [5]. The incubation period varies from 2 to 14 days. Common clinical symptoms are fever (except in asymptomatic cases), dry cough, sore throat, fatigue, headache, breathlessness, and sudden loss of smell and taste. Without proper treatment, the disease can cause pneumonia, respiratory failure, and even death. Generally, recovery starts after --week. In patients, disease progression has been observed to increase the release of cytokines, including interleukin (IL)-6 and IL-10, whereas CD4+ T and CD8+ T cell levels are reduced [6].
Treatment options for COVID-19 remain limited, but drugs such as the antiviral remdesivir and the immunomodulator tocilizumab are in use [7]. Molecular docking studies have also identified many chemical and bioactive compounds as candidate drugs for COVID-19 [8, 9]. Coronaviruses are enveloped viruses with a positive-sense single-stranded RNA genome and spike proteins on their surface; the virions are 60–140 nm in size [10]. There are four genera: alpha, beta, gamma, and delta coronaviruses. The most highly pathogenic coronaviruses, severe acute respiratory syndrome coronavirus (SARS-CoV), Middle East respiratory syndrome coronavirus (MERS-CoV), and SARS-CoV-2, all belong to the β-coronaviruses [11]. Generally, the β-coronavirus genome contains six open reading frames (ORFs); the first ORFs (ORF1a/b) occupy about two-thirds of the genome and encode 16 nonstructural proteins (nsps). A frameshift between ORF1a and ORF1b produces two polypeptides, pp1a and pp1ab. The main protease (Mpro, also called the chymotrypsin-like or 3C-like protease, 3CLpro) is involved in processing these polypeptides [12, 13]. The other ORFs, near the 3′ terminus, encode the four main structural proteins: spike glycoprotein, membrane, envelope, and nucleocapsid proteins [14]. Genome analysis revealed that SARS-CoV-2 shares 79.5% and 97% whole-genome sequence similarity with SARS-CoV and bat SARS-CoV, respectively [3]. SARS-CoV-2 enters the host respiratory mucosa by binding its spike glycoprotein to the angiotensin-converting enzyme 2 (ACE2) receptor [15]. A recent study showed that SARS-CoV-2 binds ACE2 with about tenfold higher affinity than SARS-CoV [16]. The basic reproduction number (R0), the average number of secondary infections produced by one infected person, is between 2.47 and 2.86 for SARS-CoV-2, 2.2–3.6 for SARS-CoV, and 2.0–6.7 for MERS-CoV [17–19].
Together with its higher ACE2 affinity, these values suggest that SARS-CoV-2 has a comparatively high transmission ability, although its R0 range overlaps with those of SARS-CoV and MERS-CoV. Sequence analysis of the spike glycoproteins of SARS-CoV-2, SARS-CoV, and other SARS-related coronaviruses (SARSr-CoV) showed that four amino acids are inserted at positions 681–684, between the S1 and S2 subunits of SARS-CoV-2 [20]. The SARS-CoV ORF 3b, ORF 6, and N proteins inhibit the expression of interferon beta (IFN-β) [21]. The envelope (E) protein of coronaviruses is a small membrane protein with several functions in virion assembly and ion-channel activity, through which it can interact with the host [22]. Given the lack of effective antiviral drugs for nCoV, sincere efforts in drug design and discovery for COVID-19 are needed [23, 24].
SARS has been present on Earth since 2002, yet a dangerous pandemic arose only 18 years later. Why? Why is this virus so harmful to us? What are the fundamental differences between SARS-CoV-2 and SARS? How did evolution make SARS-CoV-2 stronger than SARS? How does it gain stability in such extreme environments? Do intra-protein interactions play a vital role in SARS-CoV-2? This study attempts to answer these questions.
A detailed investigation of the sequences and structures of SARS-CoV-2 proteins was performed with reference to SARS. Four types of reviewed protein sequences from SARS-CoV-2 and SARS were considered: spike proteins, membrane proteins, nucleoproteins, and ORF proteins (ORF 3, ORF 6, ORF 7, ORF 8, and ORF 9). All annotated protein sequences of SARS-CoV-2 and SARS were retrieved from the UniProt [25] database. Crystal structures of SARS-CoV-2 and SARS proteins were retrieved from the RCSB Protein Data Bank (PDB) [26] and selected on the basis of crystal-structure criteria. The protein sequences were subjected to multiple sequence alignment (MSA) with Clustal Omega [27].
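Similarity figures such as the whole-genome percentages quoted above, and comparisons read off an MSA, come down to counting matching columns in an alignment. The sketch below computes percent identity for two sequences that have already been aligned (for example by Clustal Omega), with gaps written as '-'. It assumes one common convention (identical columns divided by columns where neither sequence has a gap); other conventions divide by the full alignment length, and this is not necessarily the definition used in [3]. The function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences.

    Assumed convention: identical positions divided by the number of
    columns in which neither sequence has a gap ('-').
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns are not compared
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy alignment, not real SARS or SARS-CoV-2 data.
    print(percent_identity("MKT-AYIA", "MKTQAWIA"))
```

In the toy alignment, seven columns are compared (the gapped column is skipped) and six are identical, giving about 85.7%.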
Both block and non-block FASTA [28] formats of the sequences were analyzed. Blocks were prepared from the MSA with Block Maker [29]. Both formats were analyzed with the ProtParam [30–32] and ProtScale [33] servers to calculate physicochemical properties such as amino acid composition, GRAVY, aliphatic index, bulkiness, and polarity. The value reported for ORF proteins is the average over all ORFs (ORF 3, ORF 6, ORF 7, ORF 8, and ORF 9). The total amounts of disorder-forming residues (E, P, K, S) and order-forming residues (I, F, W, Y) were calculated from amino acid compositions based on previous reports [34, 35]. Intrinsically disordered regions were analyzed with the DisEMBL [36] server. The SARS-CoV-2 protease (PDB 5R80) and SARS protease (PDB 2H2Z) structures were extracted from the RCSB PDB for structural comparison. All structures were energy-minimized for 1000 steps in UCSF Chimera with a force field [37]. Secondary structure was analyzed with the CFSSP [38] server to determine amino acid abundance in coil, helix, sheet, and turn. The number of salt bridges was obtained from the WHAT IF server [39]. Intra-protein interactions were determined with the Protein Interactions Calculator [40] and Arpeggio [41]. Solvation free energy was calculated with the ProWaVE server [42]. Surface area and volume were determined with CASTp [43]. Phosphorylation sites were identified with the NetPhos server [44]. The effects of mutations were analyzed with DUET [45]. Here, D, E, H, R, and K were considered charged residues, and C, S, T, N, Q, Y, and W uncharged polar residues. Amino acid compositions were calculated from the non-block format, whereas the block format was used to calculate disorder-forming residues, order-forming residues, bulkiness, aliphatic index (AI), and polarity. GRAVY (grand average of hydropathy) is calculated by summing the hydropathy values [46] of all residues and dividing by the length of the sequence.
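GRAVY, the aliphatic index, and the residue-class percentages described above are simple functions of sequence composition. The following Python sketch computes them for one sequence. Assumptions: gap characters from the block format and non-standard letters are dropped before counting (ProtParam's own handling may differ); the aliphatic index uses Ikai's formula, X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)) with X in mole percent [51], the form documented for ProtParam; the residue classes are the ones defined in the text. Function names and the toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values [46].
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Residue classes as defined in the text.
RESIDUE_CLASSES = {
    "charged": set("DEHRK"),
    "uncharged_polar": set("CSTNQYW"),
    "disorder_forming": set("EPKS"),
    "order_forming": set("IFWY"),
}


def standard_residues(seq: str) -> str:
    """Uppercase the sequence and keep only the 20 standard residues.

    Assumption: gaps ('-') and non-standard letters are simply dropped.
    """
    residues = "".join(r for r in seq.upper() if r in KYTE_DOOLITTLE)
    if not residues:
        raise ValueError("no standard residues in sequence")
    return residues


def gravy(seq: str) -> float:
    """Sum of hydropathy values divided by the number of residues."""
    residues = standard_residues(seq)
    return sum(KYTE_DOOLITTLE[r] for r in residues) / len(residues)


def aliphatic_index(seq: str) -> float:
    """Ikai's aliphatic index from mole percents of A, V, I and L."""
    residues = standard_residues(seq)
    n = len(residues)
    mole_pct = {aa: 100.0 * residues.count(aa) / n for aa in "AVIL"}
    return (mole_pct["A"] + 2.9 * mole_pct["V"]
            + 3.9 * (mole_pct["I"] + mole_pct["L"]))


def class_percentages(seq: str) -> dict:
    """Percentage of residues belonging to each class in RESIDUE_CLASSES."""
    residues = standard_residues(seq)
    n = len(residues)
    return {
        name: 100.0 * sum(r in members for r in residues) / n
        for name, members in RESIDUE_CLASSES.items()
    }


if __name__ == "__main__":
    toy = "MAIKE-VLS"  # hypothetical fragment with one gap, not a viral sequence
    print(round(gravy(toy), 3))
    print(round(aliphatic_index(toy), 2))
    print(class_percentages(toy))
```

For the toy fragment the gap is ignored, leaving eight residues: GRAVY is 1.0, the aliphatic index is 146.25 (12.5% each of A, V, I, and L), and 25% of residues are charged. A lower GRAVY or higher aliphatic index for one protein set than another is the kind of difference the comparison in this study relies on.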
Fig. 1 Comparative analysis of physicochemical properties (amino acid composition, disorder-forming residues, order-forming residues, GRAVY, and aliphatic index) of spike proteins (SP), nucleoproteins (NP), membrane proteins (MP), and ORF proteins (OP) from SARS-CoV-2 (red bars) and SARS (green bars)
Is there a preference for particular amino acids in SARS-CoV-2 relative to SARS? To find out, all physicochemical properties were calculated. Spike proteins of SARS-CoV-2 showed a higher abundance of charged residues (except D) (Fig. 1). Polar residues in spike proteins were also more abundant in SARS-CoV-2 (except T and W). In SARS-CoV-2 nucleoproteins, the charged residues D, K, and R were more abundant, whereas E and H were less abundant. Polar residues in nucleoproteins were also more plentiful in SARS-CoV-2 (except T and N). Surprisingly, C is absent from the nucleoproteins of both viruses. The membrane proteins and ORF proteins showed trends similar to these. Polar residues also help proteins tolerate temperature [47]. Disorder-forming residues are more abundant in SARS-CoV-2 than in SARS, whereas order-forming residues are less abundant (in spike and ORF proteins). The higher number of disorder-forming residues suggests that SARS-CoV-2 can more easily increase its pathogenicity or virulence. Proline may give a preadaptive advantage by enhancing antioxidant defenses, which in the setting of disease would extend cell viability, raise colonization efficiency, and enhance virulence [48]. Disorder-forming residues such as S and E have also been reported to increase pathogenicity [49, 50]. The aliphatic index is higher in every SARS-CoV-2 protein, which indicates that SARS-CoV-2 proteins are more thermally stable than those of SARS [51]. The polarity of these proteins was slightly higher in SARS-CoV-2 than in SARS (Fig. 2).
Likewise, bulkiness is higher in SARS-CoV-2 than in SARS. The higher bulkiness of SARS-CoV-2 proteins indicates that they need longer heating periods for hydrolysis [52], suggesting that they tolerate heat better than SARS proteins. The Kyte–Doolittle hydrophobicity profile suggests that SARS-CoV-2 proteins are hydrophilic (Fig. 3), and their lower GRAVY values (except for nucleoproteins) point the same way. This hydrophilic nature suggests that SARS-CoV-2 can interact quickly with water or aqueous media and spread more easily than SARS [53, 54]. Intrinsically disordered regions are far more abundant in SARS-CoV-2 than in SARS, which suggests that they aid protein folding in SARS-CoV-2 and allow more interactions with other proteins than in SARS. Many intrinsically disordered proteins (IDPs) undergo a disorder-to-order transition, implying that their folding processes are inherently distinct from those of globular proteins. After binding to their natural partners, certain IDPs fold into a unique 3D structure. Many IDPs and intrinsically disordered protein regions (IDPRs) fold when they engage their binding partners and have diverse binding specificities, allowing them to participate in one-to-many and many-to-one interactions [34, 55–60]. At various levels, viral IDPs mediate successful infection and govern pathogenesis. Because of their widespread involvement in host-pathogen regulation and their high prevalence in viral proteomes, viral IDPs are being investigated as possible therapeutic targets [61].
Amino acids, the building blocks of proteins, occur in four secondary-structure states: coil, helix, sheet, and turn. Charged residues were more abundant in every state (turn, helix, coil, and sheet) in SARS-CoV-2 than in SARS (Table 1), and in both viruses they were most abundant within helices.
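The per-state comparison summarized in Table 1 combines a sequence with a per-residue secondary-structure assignment. The sketch below shows one way to compute it, assuming the assignment is available as a string of the same length as the sequence with one letter per residue (H = helix, E = sheet, T = turn, C = coil); these letter codes and the input format are assumptions, not a description of CFSSP output. It reports the share of residues in each state and, within each state, the percentage belonging to a residue class (charged by default, as defined in the text). Function names and toy inputs are hypothetical.

```python
from collections import Counter

CHARGED = set("DEHRK")  # charged residues as defined in the text


def state_fractions(ss: str) -> dict:
    """Percentage of all residues assigned to each secondary-structure state."""
    counts = Counter(ss)
    n = len(ss)
    return {state: 100.0 * c / n for state, c in counts.items()}


def class_abundance_by_state(seq: str, ss: str, members=CHARGED) -> dict:
    """For each state, the % of residues in that state that belong to `members`."""
    if len(seq) != len(ss):
        raise ValueError("sequence and state string must have equal length")
    totals = Counter(ss)
    hits = Counter(state for aa, state in zip(seq.upper(), ss) if aa in members)
    return {state: 100.0 * hits[state] / totals[state] for state in totals}


if __name__ == "__main__":
    seq = "DKAAEF"  # hypothetical toy sequence
    ss = "HHHCCE"   # assumed codes: H helix, C coil, E sheet
    print(state_fractions(ss))
    print(class_abundance_by_state(seq, ss))
```

In the toy input, half of the residues are helical, two of the three helical residues (D, K) are charged, and the single sheet residue (F) is not. Figures such as the share of residues in sheet versus helix reported in this study correspond to `state_fractions`.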
Introducing more charged residues into helices makes proteins more resistant to acidic environments and thermal denaturation, which helps increase stability [62, 63]. Hydrophobic residues are more abundant in SARS than in SARS-CoV-2 (except in coil). Polar residues were also more abundant in every secondary-structure state of SARS-CoV-2 than of SARS. Polar amino acids on the surface have been shown to influence helix formation and increase helix stability [64]. However, polar residues were most abundant in sheets in both SARS-CoV-2 and SARS. More than 50% of residues were in sheets in SARS, whereas SARS-CoV-2 has 39.33% and 31.54% of its residues in sheet and helix, respectively. Thus, SARS-CoV-2 appears to increase the helix propensity of its amino acids to increase stability.
Salt bridges have a significant effect on protein stability [65–68], and charged residues participate in their formation. Two types of salt bridges are usually found in proteins: isolated salt bridges and network salt bridges. The increased number of charged residues in SARS-CoV-2 suggests that they may enhance salt-bridge formation to gain stability. Other intra-protein interactions, such as metal-ion binding sites [69] and aromatic–aromatic interactions [70–72], also help stabilize proteins. The SARS-CoV-2 protease has a larger pocket area than the SARS protease (Fig. 4A, B), which offers more possibilities for protein–protein or protein–ligand interactions (Table 2). The protein volume is also larger in SARS-CoV-2 than in SARS. The SARS-CoV-2 protease possesses 9 isolated salt bridges and 1 network salt bridge, whereas the SARS protease has 8 isolated and 1 network salt bridge. This suggests that salt bridges contribute substantially to the stability of the SARS-CoV-2 protease. Although the SARS-CoV-2 and SARS proteases each have only one network salt bridge, the SARS-CoV-2 network is a special cyclic salt bridge (Fig. 4E, F) formed by R131-E290, K137-E290, R131-D197, K137-D197, and R131-D289. R131 participates most often in this cyclic salt bridge (in three of the five pairs). This novel cyclic salt bridge may play a major role in protein stability [68, 72]. The number of metal-ion binding sites is also higher in SARS-CoV-2 than in SARS; these three metal-ion binding sites in the SARS-CoV-2 protein contain dimethyl sulfoxide. Solvation free energy is a thermodynamic quantity that determines protein solvation and the nature of denaturation [73], so it can be used to estimate the rate of protein denaturation. The solvation free energy is higher in SARS-CoV-2 than in SARS, which indicates that the SARS-CoV-2 protein is not easily denatured on contact with solvent. Aromatic–aromatic interactions are more numerous in SARS-CoV-2 than in SARS (Table 3). Beyond their number, some of the participating residues form a very long network, which has not previously been reported in any viral protein. SARS-CoV-2 has 3 isolated and 2 network aromatic–aromatic interactions, whereas SARS has only 9 isolated aromatic–aromatic interactions. SARS-CoV-2 has 54 phosphorylation sites (Fig. 4C, D), whereas SARS has 45. The higher number of phosphorylation sites in SARS-CoV-2 may strengthen protein–protein interactions and aid stability [74].
The MSA of both structures revealed several point mutations in SARS-CoV-2, so their effects on protein stability were analyzed. In total, 11 mutations were identified, of which 8 are favorable and 3 unfavorable for SARS-CoV-2 (Table 4). At residue 35, threonine in SARS is substituted by valine in SARS-CoV-2 (T35V), which contributes the highest energy, −2.24 kcal/mol. The S63N mutation contributes the second-highest energy, −1.16 kcal/mol.
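The distinction between isolated salt bridges, network salt bridges, and the cyclic bridge of the SARS-CoV-2 protease can be made concrete as a graph problem: residues are nodes and each salt bridge is an edge. The Python sketch below is an illustration under stated assumptions and not how WHAT IF or the authors classified bridges. Assumed rules: a connected component of exactly two residues is an isolated bridge, a larger component is a network, and a component with at least as many bridges as residues contains a closed loop (is cyclic). The only data used are the five pairs listed in the text; the function name and output fields are hypothetical.

```python
from collections import defaultdict


def classify_salt_bridges(pairs):
    """Group salt-bridge pairs into isolated bridges and networks (assumed rules)."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        n_edges = sum(len(graph[node]) for node in component) // 2
        # Most connected residue; ties resolved alphabetically.
        hub = max(sorted(component), key=lambda node: len(graph[node]))
        components.append({
            "residues": sorted(component),
            "type": "isolated" if len(component) == 2 else "network",
            # A connected simple graph with edges >= nodes contains a cycle.
            "cyclic": n_edges >= len(component),
            "hub": hub,
        })
    return components


# Salt-bridge pairs of the SARS-CoV-2 protease network as listed in the text.
PAIRS = [("R131", "E290"), ("K137", "E290"), ("R131", "D197"),
         ("K137", "D197"), ("R131", "D289")]

if __name__ == "__main__":
    for comp in classify_salt_bridges(PAIRS):
        print(comp)
```

Running the script reports a single five-residue network flagged as cyclic (R131-E290-K137-D197-R131 closes a loop), with R131 as the most connected residue, consistent with the description in the text. The same grouping could be applied to aromatic–aromatic contacts to separate isolated pairs from networks.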
In contrast to T35V and S63N, the A46S, K180N, and I286L mutations destabilize the SARS-CoV-2 protein. Six polar and five non-polar amino acids of SARS were mutated to five polar and six non-polar amino acids in SARS-CoV-2. The predicted point mutations in SARS-CoV-2 contribute a total energy of −7.46 kcal/mol, which appears to be the main driving force behind its greater stability compared with SARS.
Acidic and basic residues play a significant role in evolution. Charged residues in helical regions contribute to increased protein stability. Increased hydrophilicity may help SARS-CoV-2 spread easily through air droplets. Disorder-forming residues increase SARS-CoV-2 pathogenicity. The high bulkiness of SARS-CoV-2 proteins makes them heat tolerant. Long networks of aromatic–aromatic interactions are an added advantage for protein stability. This is the first report of a cyclic salt bridge and a long aromatic–aromatic interaction network in a viral protein. The increased numbers of metal-ion binding sites and phosphorylation sites also play a crucial role in SARS-CoV-2 protein stability. The point mutations show how SARS-CoV-2 has engineered itself to gain stability. They also suggest a way to reduce the severity of SARS-CoV-2 infection by eliminating those favorable mutant residues, a process in which protein engineering can help. The findings of this investigation provide insights essential for drug and vaccine development against SARS-CoV-2.
The authors declare that they have no conflict of interest.
[1] Epidemiology, virology, and clinical features of severe acute respiratory syndrome-coronavirus-2 (SARS-CoV-2; Coronavirus Disease-19)
[2] A review of coronavirus disease-2019 (COVID-19)
[3] Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study
[4] The first 2019 novel coronavirus case in Nepal
[5] Persistence of coronaviruses on inanimate surfaces and their inactivation with biocidal agents
[6] Characteristics of lymphocyte subsets and cytokines in peripheral blood of 123 hospitalized patients with 2019 novel coronavirus pneumonia (NCP)
[7] Triple combination of interferon beta-1b, lopinavir-ritonavir, and ribavirin in the treatment of patients admitted to hospital with COVID-19: an open-label, randomised, phase 2 trial
[8] Study of potentiality of dexamethasone and its derivatives against Covid-19
[9] Remarkable effect of natural compounds that have therapeutic effect to stop COVID-19. In: Recent Advances in Pharmaceutical Sciences
[10] evaluation and treatment coronavirus (COVID-19)
[11] Interspecies transmission and emergence of novel viruses: lessons from bats and birds
[12] Virus-encoded proteinases and proteolytic processing in the Nidovirales
[13] The molecular biology of coronaviruses
[14] The proteins of severe acute respiratory syndrome coronavirus-2 (SARS CoV-2 or n-COV19), the cause of COVID-19
[15] The novel coronavirus 2019 (2019-nCoV) uses the SARS-coronavirus receptor ACE2 and the cellular protease TMPRSS2 for entry into target cells
[16] Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
[17] Modelling environmentally-mediated infectious diseases of humans, transmission dynamics of schistosomiasis in China
[18] Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study
[19] Transmission dynamics and control of severe acute respiratory syndrome
[20] Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein
[21] Severe acute respiratory syndrome coronavirus open reading frame (ORF) 3b, ORF 6, and nucleocapsid proteins function as interferon antagonists
[22] The coronavirus E protein: assembly and beyond
[23] Inhibition of SARS-CoV-2 protein by bioactive compounds of edible mushroom; a bioinformatics insight
[24] Supercomputer-based ensemble docking drug discovery pipeline with application to COVID-19
[25] UniProt, a hub for protein information
[26] The RCSB Protein Data Bank: new resources for research and education
[27] scalable generation of high-quality protein multiple sequence alignments using Clustal Omega
[28] Using the FASTA program to search protein and DNA sequence databases
[29] Automated construction and graphical presentation of protein blocks from unaligned sequences
[30] Protein identification and analysis tools on the ExPASy server. The Proteomics Protocols Handbook
[31] Bioactive compounds as a potential inhibitor of colorectal cancer; an in silico study of gallic acid and pyrogallol
[32] Cold adaptation strategy of psychrophilic bacteria: an in-silico analysis of isocitrate dehydrogenase
[33] ExPASy: the proteomics server for in-depth protein knowledge and analysis
[34] Intrinsic protein disorder, amino acid composition, and histone terminal domains
[35] How disordered is my protein and what is its disorder for? A guide through the "dark side" of the protein universe
[36] Protein disorder prediction: implications for structural proteomics
[37] UCSF Chimera: a visualization system for exploratory research and analysis
[38] Chou and Fasman secondary structure prediction server
[39] WHAT IF: a molecular modeling and drug design program
[40] PIC: protein interactions calculator
[41] Arpeggio: a web server for calculating and visualising interatomic interactions in protein structures
[42] Structural and thermodynamic investigations on the aggregation and folding of acylphosphatase by molecular dynamics simulations and solvation free energy analysis
[43] CASTp 3.0: computed atlas of surface topography of proteins
[44] Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence
[45] DUET: a server for predicting effects of mutations on protein stability using an integrated computational approach
[46] A simple method for displaying the hydropathic character of a protein
[47] Investigation of the effect of temperature on the structure of SARS-CoV-2 spike protein by molecular dynamics simulations. Front Mol Biosci
[48] Role of proline in pathogen and host interactions
[49] A single amino acid change in rabies virus glycoprotein increases virus spread and enhances virus pathogenicity
[50] Serine-aspartate repeat protein D increases Staphylococcus aureus virulence and survival in blood
[51] Thermostability and aliphatic index of globular proteins
[52] Amino acid bulkiness defines the local conformations and dynamics of natively unfolded α-synuclein and tau
[53] Hydrophilicity of cavities in proteins. Proteins Struct Funct Bioinform
[54] Atomic and residue hydrophilicity in the context of folded protein structures
[55] Predicting intrinsic disorder in proteins: an overview
[56] The alphabet of intrinsic disorder: II. Various roles of glutamic acid in ordered and intrinsically disordered proteins
[57] Structural disorder and induced folding within two cereal, ABA stress and ripening (ASR) proteins. Sci Rep
[58] Intrinsic disorder, protein-protein interactions, and disease
[59] Functional anthology of intrinsic disorder. 1. Biological processes and functions of proteins with long disordered regions
[60] Utilization of protein intrinsic disorder knowledge in structural proteomics
[61] Intrinsically disordered proteins of viruses: involvement in the mechanism of cell regulation and pathogenesis
[62] Stabilization of α-helix structure by polar side-chain interactions: complex salt bridges, cation-π interactions, and C-H…O H-bonds
[63] Stabilization of proteins by rational design of α-helix stability using helix/coil transition theory
[64] Helix stabilizing factors and stabilization of thermophilic proteins: an X-ray based study
[65] Defining the role of salt bridges in protein stability
[66] Salt-bridge networks within globular and disordered proteins: characterizing trends for designable interactions
[67] Protein stability governed by its structural plasticity is inferred by physicochemical factors and salt bridges
[68] Effects of salt bridges on protein structure and design
[69] Studying metal ion-protein interactions: electronic absorption, circular dichroism, and electron paramagnetic resonance
[70] Aromatic-aromatic interaction: a mechanism of protein structure stabilization
[71] Electrostatic interactions in aromatic oligopeptides contribute to protein stability
[72] Discovery of novel cyclic salt bridge in thermophilic bacterial protease and study of its sequence and structure
[73] Protein model structure evaluation using the solvation free energy of folding
[74] Phosphorylation in protein-protein binding: effect on stability and function