key: cord-0129891-rcguue89
authors: Hameed, Madiha; Majiid, Abdul; Khan, Asifullah
title: FANCA: In-Silico deleterious mutation analysis for early prediction of leukemia
date: 2021-07-19
journal: nan
DOI: nan
sha: 1d99569a48592e49a32f654fca7b25d5cc468e55
doc_id: 129891
cord_uid: rcguue89

As a novel biomarker from the Fanconi anemia complementation group (FANC) family, FANCA is an antigen for leukemia. Overexpression of FANCA has predicted the second most common cancer in the world that is responsible for cancer-related deaths. Non-synonymous SNPs are an essential group of SNPs that lead to alterations in encoded polypeptides, and changes in the amino acid sequences of gene products lead to leukemia. First, we study individual SNPs in the coding region of FANCA and use computational tools such as PROVEAN, PolyPhen2, MuPro, and PANTHER to compute deleterious mutation scores. The three-dimensional structural and functional prediction is conducted using I-TASSER, and the predicted structure is then refined using the GalaxyWeb tool. The proteomic data are retrieved from UniProtKB. The coding region of the dataset contains 100 non-synonymous single nucleotide polymorphisms (nsSNPs), and 24 missense SNPs are determined to be deleterious by all analyses. In this work, six well-known computational tools were employed to study leukemia-associated nsSNPs. It is inferred that these nsSNPs could play a role in the up-regulation of FANCA, which in turn provokes leukemia advancement. The current research would benefit researchers and practitioners in handling cancer-associated diseases related to FANCA. The proposed study would also help to develop precision medicine in the field of drug discovery.

Leukemia is associated with many other cancers; it starts in the bone marrow and rapidly produces a large number of anomalous blood cells. Leukemia arises from immature blast cells, which results in abnormal leukocytes [1].
Based on cell origin, function, and appearance, leukemia is divided into different types. The four major types are 1) acute lymphocytic leukemia (ALL), 2) acute myelocytic leukemia (AML), 3) chronic myelocytic leukemia (CML), and 4) chronic lymphocytic leukemia (CLL). The chronic type of leukemia is commonly found in children and grows gradually, whereas acute leukemia grows rapidly compared to the chronic type and gets worse quickly [1]. Timely detection of novel biomarkers and curative targets is an efficient route to the treatment of leukemia. Predicting novel leukemia-related amino acid changes helps in identifying the protein sequences that develop cancerous cells [2].

These eight family members do not share sequence similarities [5]. The FANCA protein consists of 1455 amino acids and is assembled into a joint nuclear protein complex. This leads to multiple genetic irregularities of FANCA that become the cause of leukemia and other cancers. It is widely accepted that variation in FANCA genes can be associated with disease [6]. The proteins found in the human body are: a) FAA, b) FACA, and c) FANCA. It is hypothesized that FANCA acts in post-replication repair or as a cell cycle checkpoint [7]: the FANCA protein repairs cross-links in DNA (deoxyribonucleic acid) and maintains normal chromosome stability, which normalizes the development of hematopoietic embryonic cells into developed blood cells.

The non-synonymous single nucleotide polymorphism (nsSNP) is considered one of the most common types of genetic variation [6]. In an nsSNP, a single-point mutation alters the amino acid sequence of the protein. SNPs are common mutations that induce genomic alterations within the human body. Previous studies showed that about 92%-93% of human genes contain at least one SNP [7][8]. Mutations may arise from SNPs, duplications, or deletions that affect multiple gene functions.
Coding non-synonymous SNPs (nsSNPs) can be deleterious because they change the physical and chemical properties of the amino acids encoded at the site of the mutation [9][10]. The change in the nature of the amino acids affects protein translation, polarity, stability, and accessibility. Disorder in a protein eventually causes it to malfunction and alters its molecular dynamics [11]. In recent years, in-silico annotations have been used to evaluate the effect of SNPs on genomics and proteomics, which in turn helps in the early prediction of cancer and its grading [12].

To address the challenges mentioned above, our contributions in this study are as follows: (i) we identify the deleterious nsSNPs; (ii) we analyze the effect of each mutation on protein stability to conclude whether the SNP is deleterious or non-deleterious; (iii) we categorize the deleterious mutations into four amino acid groups, which helps to understand their biological function based on functional and structural properties; and (iv) we find multiple ligand sites that are useful for biological annotation.

This protein acts as a diagnostic biomarker in several inherited human diseases such as neural abnormalities, vascular conditions [16], tumors [17], etc. In the current work, the proteomic data are obtained from UniProtKB. We found that the coding region contains 100 non-synonymous single nucleotide polymorphisms (nsSNPs), of which 24 missense SNPs are deleterious, and we divide them into four amino acid groups. It is inferred that nsSNPs could help the up-regulation of FANCA [18]. The main contribution of this study is the exploration of a new dimension for early prediction of leukemia using deleterious mutation information in amino acid sequences.

The proposed framework for deleterious mutation identification and structural prediction is reported in Figure 2. The input data for the FANCA protein, related to amino acid composition variations, are obtained from UniProtKB.
The Basic Local Alignment Search Tool (BLAST) algorithm is employed to deduce functional and evolutionary relationships between the sequences of FANCA, and homologs of the query sequences are identified using BLAST [19]. FANCA has ten transcripts in the database, but we chose only the known transcripts, and the non-synonymous SNPs of these recognized transcripts are selected. Afterwards, various tools are used to ascertain the functional and structural effects of all non-synonymous SNPs. These tools rely on diverse procedures to find the impact on protein sequences. In the proposed framework, five different tools are employed for SNP annotation, each for a different purpose: PolyPhen2 [20], PROVEAN [21], Mupro [22], I-Mutant [23], and PANTHER [24].

PolyPhen2 predicts the damage caused by missense mutations. It uses an iterative greedy algorithm to identify the sensitivity and specificity scores of a mutation, which help to determine the severity of the damage.

The Protein Variation Effect Analyzer (PROVEAN) tool, on the other hand, uses a location-specific scoring approach that takes a protein sequence and amino acid variations as input. A cutoff value of -2.5 is set for the binary prediction to achieve balanced accuracy; amino acid substitutions with a score at or below this threshold are considered deleterious. PROVEAN runs BLAST and clusters the hits with the CD-HIT algorithm at 75% global sequence identity. The top 30 clusters of closely related sequences form the supporting sequence set that is used to produce the prediction results. A delta alignment score is calculated for each supporting sequence [26]. PROVEAN computes the prediction score from three quantities: Q, the query sequence; S, a collected protein sequence; and V, the variant of the specific protein sequence.
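The scoring and thresholding just described can be illustrated with a minimal Python sketch. It assumes that the per-sequence delta scores have already been computed and grouped by cluster; the function names and the input layout are hypothetical and are not part of the PROVEAN software. The only values taken from the text are the -2.5 cutoff and the idea of averaging within and then across clusters.

```python
from statistics import mean

# Cutoff quoted in the text for the binary prediction.
PROVEAN_CUTOFF = -2.5


def provean_like_score(cluster_deltas):
    """Average delta scores within each cluster, then across clusters.

    cluster_deltas: list of clusters, each a non-empty list of
    per-sequence delta scores (hypothetical input layout).
    """
    if not cluster_deltas or any(len(c) == 0 for c in cluster_deltas):
        raise ValueError("need at least one non-empty cluster")
    per_cluster = [mean(c) for c in cluster_deltas]
    return mean(per_cluster)


def classify(score, cutoff=PROVEAN_CUTOFF):
    """Label a score as 'deleterious' if it is at or below the cutoff."""
    return "deleterious" if score <= cutoff else "neutral"


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    score = provean_like_score([[-4.0, -6.0], [-1.0]])
    print(score, classify(score))
```

Averaging per cluster first keeps a large cluster of near-identical sequences from dominating the final score, which is the usual motivation for a two-level average.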
The final PROVEAN score is generated by averaging the accumulated scores within and across the clusters. If the PROVEAN score is ≤ -2.5, the protein variant is predicted to be "deleterious"; otherwise, the variant is predicted to be "neutral".

The Mupro tool is used to predict protein stability changes from the sequence for a single-site mutation [27]. It relies on Support Vector Machine (SVM) and neural network algorithms to predict an increase or decrease in stability, with a confidence score, for a single-site amino acid mutation. The algorithm takes as input the original amino acid, the mutation location, and the mutated amino acid of the protein sequence.

I-Mutant is another SVM-based prediction tool, used to calculate the stability of nsSNPs based on the function and structure of the target protein [28][29]. I-Mutant uses a Delta-Delta Gibbs (DDG) free energy range of -0.5 to 0.5 as its reference. The DDG value indicates how strongly a single-site mutation affects protein stability; the more negative the value, the larger the decrease in stability [30].

The PANTHER tool, on the other hand, finds mutations using algorithms such as statistical modelling, Multiple Sequence Alignment (MSA), and the Hidden Markov Model (HMM) [31]. It is a suite of tools that identifies the functions of query sequences and analyzes large-scale experimental data with several statistical tests. Biologists widely use PANTHER to identify protein mutation stability.

In the proposed framework, Iterative Threading Assembly Refinement (I-TASSER) is used to predict the 3D structure of FANCA and its biological functions from the amino acid sequence [32][33]. I-TASSER works in three steps: a) structural template identification, b) iterative structure assembly, and c) structure-based function annotation. The confidence score (C-score) is the selection criterion for the model [34]; a higher C-score indicates a better model.
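The DDG interpretation given above for I-Mutant can be sketched as a small helper. The text gives only the -0.5 to 0.5 range and the rule that more negative values mean a larger loss of stability; treating values inside that range as roughly neutral is an assumption made here for illustration, and the function name is hypothetical.

```python
def stability_effect(ddg, band=0.5):
    """Map a DDG value to a coarse stability label.

    Assumption: values within [-band, band] are treated as neutral;
    the text states only the range and the direction of the effect.
    """
    if ddg < -band:
        return "decrease"
    if ddg > band:
        return "increase"
    return "neutral"


if __name__ == "__main__":
    # Illustrative values only.
    for value in (-1.2, 0.1, 0.9):
        print(value, stability_effect(value))
```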
The FANCA structure is evaluated for both the wild-type and the mutated protein [35]. The protein model is refined using the GalaxyWeb tool, which takes a file in Protein Data Bank (PDB) format as input for model refinement [36]. GalaxyWeb employs the Z-score values of the HHsearch results. The HHsearch algorithm is associated with three well-known search tools: BLAST, PSI-BLAST, and HMMER [36][37]. These models are further used to compare different databases, to identify pairwise sequences, and to improve the global and local structure quality [38]. The re-ranking is the accumulated result of the Z-scores, which can be obtained from the HHsearch scores Zss and Zseq. COACH, a meta-server approach that combines ligand-binding site predictions from multiple function annotation programs, builds on the COFACTOR algorithm and the TM-SITE and S-SITE programs and is used to find ligand-binding sites. Ligands with a higher C-score give more confidence in a consistent prediction.

In this section, we first analyze the mutation results derived using the five software tools mentioned above in order to identify the deleterious nsSNPs. The deleterious mutations are then categorized into four amino acid groups, which helps to understand their biological function based on functional and structural properties. Finally, multiple ligand sites are found that are useful for biological annotation.

The damage prediction for missense mutations gives the likelihood that a submitted variant is damaging, based on the obtained results. A higher value with high sensitivity and specificity indicates a more damaging missense mutation. In Figure 3, the black vertical line denotes the Q387V mutation, predicted with a possible damage score, sensitivity, and specificity of 0.9468, 0.80, and 0.95, respectively. The figure shows that the score ranges from 0 to 1. Table 2 indicates the higher deleterious prediction score of -4.513 for variant Q387V.
and W209T are computed as deleterious, with scores of -2.707 and -6.856, respectively. A Delta-Delta Gibbs (DDG) free energy range of -0.5 to 0.5 is used to identify how strongly a single-site mutation affects protein stability. The more negative value of -1.87 indicates lower stability and, with pH level 25, an alarming condition of protein mutation.

The 3D structure of FANCA is predicted, and its biological functions are analyzed from the amino acid sequence. The FANCA structure is evaluated for both the wild-type and the mutated protein. The root-mean-square deviation (RMSD) and all related results are used to refine the FANCA structure. The RMSD of a ligand is needed to check whether it is stable in the active sites and to identify possible binding modes. Among the values associated with the refined models of the FANCA protein structure, the initial values are given in the first row, and a comparison of the obtained values shows that model c in Figure 7 has the highest RMSD, 0.501, which indicates that model c has a strong binding site. From this it is inferred that model c is suitable for precision medicine and can help in leukemia treatment. Likewise, GDT-HA, MolProbity, clash score, poor rotamers, and Rama favored each influence the refined protein structure model. Variation in these values affects the model structure accordingly, as detailed in the table below.

For further study, nsSNPs were chosen from among all other variants. More than 100 mutations across four amino acid groups are detected in the FANCA protein, and 24 of them are identified as deleterious. Table 2 reports the details of the four amino acid groups that were analyzed as deleterious. Chemical properties divide amino acids into four groups: the basic group, the polar group, the nonpolar group, and the acidic group. The deleterious mutations are categorized into these four groups, which helps to understand their biological function based on functional and structural properties.
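As an illustration of this grouping step, the following Python sketch parses a substitution label such as E38R and looks up the group of the original and the substituted residue. The group table is a common textbook assignment and is an assumption; it may not match the grouping used for Table 2 in every case (for example, the placement of cysteine, tyrosine, or glutamine varies between sources). All names in the code are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed textbook grouping by side-chain chemistry (one-letter codes).
GROUPS = {
    "basic": "RHK",
    "acidic": "DE",
    "polar": "STNQCY",
    "nonpolar": "GAVLIMFWP",
}
AA_GROUP = {aa: group for group, letters in GROUPS.items() for aa in letters}

_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


class Substitution(NamedTuple):
    wild_type: str
    position: int
    mutant: str


def parse_substitution(label):
    """Split a label like 'E38R' into wild type, position, and mutant."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    wild_type, position, mutant = match.groups()
    if wild_type not in AA_GROUP or mutant not in AA_GROUP:
        raise ValueError(f"unknown amino acid code in {label!r}")
    return Substitution(wild_type, int(position), mutant)


def group_change(label):
    """Return (group of wild-type residue, group of mutant residue)."""
    sub = parse_substitution(label)
    return AA_GROUP[sub.wild_type], AA_GROUP[sub.mutant]


if __name__ == "__main__":
    for label in ("E38R", "V378I", "W209T"):
        print(label, group_change(label))
```

A substitution whose two groups differ changes the charge or polarity class of the residue, which is the kind of change the discussion of individual mutations focuses on.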
Three variants lie in the basic group of amino acid properties. In the first mutation, E38R, glutamic acid is substituted with arginine, which changes its group from acidic to basic; this results in a variation in the charge of ligands and/or other residues, and the mutation causes abnormal folding [44]. In L406R, leucine is substituted with arginine at location 406. This deleterious mutation increases ionic bonding, which causes malfunction and disturbs the particular conformation [58]. In the third mutation, P597R, proline is replaced with arginine at location 597, with an overall charge of +1 at physiological pH. This deleterious mutation causes abnormal folding and affects the functionality of the core protein, with a damaging score of 0.993 [63].

The acidic group comprises amino acids whose side chains carry a negative charge at certain pH values; 6 of the 24 mutations lie in this group. In G66Q, glycine is substituted with glutamine at location 66. This mutation shifts the polarity toward the acidic property and causes a loss of charge; hence, the molecular interaction is also lost [49]. In V77D, valine is substituted with aspartic acid at location 77, which causes external interaction. This mutation also causes a loss of hydrophobicity that reduces oxygen affinity [50]. In A403D, alanine is substituted by aspartic acid at location 403. This mutation causes the loss of hydrophobic interaction and also reduces the toxicity and activity of the enzyme [57]. In L587E, leucine is substituted with glutamic acid at location 587. This mutation causes the loss of hydrophobic interaction in the core protein [62]. In L1319Q, leucine is replaced with glutamine at location 1319. By moving into the acidic group, this mutation abolishes the function and proper binding of the protein [68].
In the last mutation of this group, V566N, valine is substituted with asparagine at location 566. This mutation destabilizes the local conformation and causes a significant decrease in some specific activities [59][60].

In the uncharged (polar) amino acid group, the side chains possess a spectrum of functional groups. In the first mutation, H54S, histidine is replaced with serine at location 54; its properties change from basic to acidic, so that the low surface tension at the core of the protein would vanish [41]. In R49G, arginine is substituted by glycine at location 49. Arginine lies in the basic group of amino acids; after this mutation the residue falls in the acidic group, whose side chains contain a carboxylic acid group with a pKa low enough to lose a proton and become negatively charged in the process. The transformation would break hydrogen bonds and/or result in improper folding [42][43]. In W209T, tryptophan is substituted with threonine at location 209, which causes the loss of hydrophobic interactions [51].

Nonpolar side chains consist mainly of hydrocarbon. Any functional groups they contain are uncharged at physiological pH and are incapable of participating in hydrogen bonding, which removes the polarity of the amino acid group and affects the hydrogen bonds in it. For example, in Y35M, tyrosine is replaced with methionine, which loses its charge at physiological pH; this causes the failure of hydrogen bonds and disturbs correct folding. With the nonpolar property, the negative charge increases [45]. In C625A, cysteine is substituted with alanine at location 625. This damaging mutation affects the charge of the ligands [64]. In G638I, glycine is replaced with isoleucine at location 638. This deleterious mutation causes the loss of positive charge [65]. Mutation L43A changes leucine into alanine at location 43.
This mutation can interrupt the particular conformation and revoke the function, while the residue stays in the nonpolar group and its carboxylic acid remains the same [46]. In V378I, valine is replaced with isoleucine at location 378, which disrupts the local conformation and leads to loss of interaction [52]. In V397L, valine is replaced with leucine at location 397, which upsets the stability of the region and causes loss of interaction [54]. In P401A, proline is substituted by alanine at location 401. The substitution can interrupt the particular chemical structure and eliminate its function. Mutation of proline into alanine affects the reputed integral membrane protein segments 6 and 10 responsible for glucose transport (GLUT1) [55][56]. In F1108A, phenylalanine is substituted with alanine at location 1108. This deleterious mutation causes the loss of hydrophobic interaction [67]. Another mutation involving a basic residue is H780A, in which histidine is substituted with alanine at location 780; the residue moves into the nonpolar group and disturbs the domain by losing hydrophilicity [66]. In I573V, isoleucine is substituted with valine at location 573, which causes loss of interaction; α-helical secondary structure is not preferred by this type of residue [61]. In E63L, glutamic acid is mutated into leucine at location 63. This changes its acidic properties into nonpolar ones and activates its carboxylic acid. It affects metabolic function and may also cause intellectual disability [47][48]. In Q387V, glutamine is replaced with valine at location 387, which causes the loss of proper folding and contributes to the loss of hydrogen bonds [53]. From Table 2, it is observed that L1319Q and W209T are the most deleterious mutations in the FANCA protein.

Ligand (L) binding sites have a significant role in protein functionality.
Mutation in the protein sequence disrupts the collaboration between protein ligands and the transmembrane region. Ligands with a higher C-score give more confidence in a consistent prediction. The ligand R1P has a higher C-score than the remaining ligands and their potential binding sites. The above-mentioned table shows that ligand-binding sites 492 and 517 have a higher C-score value, 3, than the others, which means they are stronger sites for residue binding. These binding sites are useful in precision medicine.

In this modern era, proteomic analysis for disease classification related to single nucleotide polymorphisms (SNPs) is tremendously inspiring. Bioinformatics helps to reduce genotyping cost, which increases the number of omics association studies. We conducted an in-silico study to predict FANCA-related nsSNP disease, and multiple ligand sites were found in order to analyze the biological annotation of ligand-binding sites. We analyzed the association of nsSNPs with leukemia. In this study, 24 mutations are found that are predicted as deleterious by all structural and sequence-based bioinformatics prediction tools. The deleterious mutation analysis is carried out across four amino acid groups (basic, polar, nonpolar, and acidic) to understand their physicochemical properties. It is inferred that L1319Q and W209T both change their amino acid groups, one moving into the acidic group and the other into the polar group, and that these changes, which alter the charge at physiological pH, are major causes of leukemia. The results show that these nsSNPs affect the functional and structural mechanism of FANCA, which plays a significant role in acute myeloid leukemia. This increases the chance of mitotic cell division, specifically in the bone marrow. Ligand-binding site prediction has a vital role in precision medicine for leukemia. This study will therefore be a valuable addition to the research on predicting the consequences of FANCA nsSNPs in the up-regulation of leukemia.
It is anticipated that the prediction of deleterious mutations in the genomic functionality helps in the early detection of leukemia. The proposed study would also help to develop precision medicine in the field of drug discovery. Acute leukemia classification by ensemble particle swarm model selection Structural basis of the fanconi anemia-associated mutations within the FANCA and FANCG complex The Fanconi anemia protein FANCF forms a nuclear complex with FANCA, FANCC and FANCG Fanconi Anemia Proteins FANCA, FANCC, and FANCG/XRCC9 Interact in a Functional Nuclear Complex The causes of Fanconi anemia in South Asia and the Middle East: A case series and review of the literature Machine learning applications in cancer prognosis and prediction Alanine, arginine, cysteine, and proline, but not glutamine, are substrates for, and acute mediators of, the liver-α-cell axis in female mice Feature selection and tumor classification for microarray data using relaxed Lasso and generalized multi-class support vector machine Computational and Pathway Analysis of nsSNPs of MED23 Gene Involved in Human Congenital Diseases Development of a Bayesian belief network model for personalized prognostic risk assessment in colon carcinomatosis In-silico analysis of non-synonymous-SNPs of STEAP2: To provoke the progression of prostate cancer Prevalence and Associated Risk Factors of Mortality Among COVID-19 Patients: A Meta-Analysis PROVEAN web server: A tool to predict the functional effect of amino acid substitutions and indels Disease-related mutations predicted to impact protein function Computational Analysis and Polymorphism study of Tumor Suppressor Candidate Gene-3 for Non Syndromic Autosomal Recessive Mental Retardation Cancer prevalence in Pakistan: Meta-analysis of various published studies to determine variation in cancer figures resulting from marked population heterogeneity in different parts of the country Rationales, design and recruitment of the Taizhou longitudinal study 
Current methods of mutation detection Large-scale predictions of gram-negative bacterial protein subcellular locations Lung Cancer Classification Models Using Discriminant Information of Mutated Genes in Protein Amino Acids Sequences In silico prediction of deleterious single nucleotide polymorphisms in human interleukin 27 (IL-27) gene Predicting the functional and structural consequences of nsSNPs in human methionine synthase gene using computational tools A General Sequence Processing and Analysis Program for Protein Engineering In silico prediction of a disease-associated STIL mutant and its affect on the recruitment of centromere protein J (CENPJ) LOCSVMPSI: A web server for subcellular localization of eukaryotic proteins using SVM and profile of PSI-BLAST Tailored Mutants of Phenylalanine Ammonia-Lyase from Petroselinum crispum for the Synthesis of Bulky l-and d-Arylalanines Glutamine-451 Confers Sensitivity to Oxidative Inhibition and Heme-Thiolate Sulfenylation of Cytochrome P450 4B1 Prediction of cancer outcome with microarrays: A multiple random validation strategy An SVM-based system for predicting protein subnuclear localizations Autoinflammatory mutation in NLRC4 reveals a leucine-rich repeat (LRR)-LRR oligomerization interface Deletion of ribosomal protein genes is a common vulnerability in human cancer, especially in concert with TP 53 mutations Using SIFT and PolyPhen to predict lossof-function and gain-of-function mutations Role of cancer immunology in chronic myelogenous leukemia Co-mutation of histone H2AX S139A with Y142A rescues Y142A-induced ionising radiation sensitivity Identification of rare heterozygous missense mutations in FANCA in esophageal atresia patients using next-generation sequencing GalaxyWEB Prediction of protein cellular attributes using pseudo-amino acid composition Prediction of protease types in a hybridization space Binary tomography reconstruction based on shape orientation In vitro and in vivo activity of gallic acid and 
Toona sinensis leaf extracts against HL-60 human premyelocytic leukemia The protein histidine phosphatase LHPP is a tumour suppressor Acquisition of the T790M resistance mutation during afatinib treatment in EGFR tyrosine kinase inhibitor-naïve patients with non-small cell lung cancer harboring EGFR mutations Evolution of proline biosynthesis: Enzymology, bioinformatics, genetics, and transcriptional regulation Transcriptomic and metabolomics analyses reveal metabolic characteristics of L-leucine-and L-valineproducing Corynebacterium glutamicum mutants Cytoplasmic localization of proline, glutamic acid, leucinerich protein 1 (PELP1) induces breast epithelial cell migration through up-regulation of inhibitor of κb kinase ϵ and inflammatory cross-talk with macrophages Alanine Scanning Effects on the Biochemical and Biophysical Properties of Intrinsically Disordered Proteins: A Case Study of the Histidine to Alanine Mutations in Amyloid-β 42 Open Gate of Corynebacterium glutamicum Threonine Deaminase for Efficient Synthesis of Bulky α-Keto Acids Structural details of the enzymatic catalysis of carbonic anhydrase II via a mutation of valine to isoleucine Simultaneous analysis of D-alanine, D-aspartic acid, and D-serine using chiral high-performance liquid chromatography-tandem mass spectrometry and its application to the rat plasma and tissues Leucine Rich Repeat Proteins: Sequences, Mutations, Structures and Diseases Role of the conserved valine 236 in access of ligands to the active site of Thermus thermophilus ba3 cytochrome oxidase Physician's Guide to the Diagnosis, Treatment, and Follow-Up of Inherited Metabolic Diseases Proline, glutamic acid and leucine-rich protein-1 is essential for optimal p53-mediated DNA damage response Functional characterization of the alanine-serine-cysteine exchanger of Carnobacterium sp AT7 Production of growth factors by malignant lymphoma cell lines Familial hypomagnesaemia, Hypercalciuria and Nephrocalcinosis associated with a 
novel mutation of the highly conserved leucine residue 116 of Claudin 16 in a Chinese patient with a delayed diagnosis: A case report RBPro-RF: Use Chou's 5-steps rule to predict RNA-binding proteins via random forest with elastic net iPSW(2L)-PseKNC: A two-layer predictor for identifying promoters and their strength by hybrid features via pseudo K-tuple nucleotide composition The management of chronic myeloid leukaemia -A case history Identification of Protein-Ligand Binding Sites by Sequence Information and Ensemble Classifier Multi-parametric evolution of conditions leading to cancer invasion in biological systems P2X4 receptor-eNOS signaling pathway in cardiac myocytes as a novel protective mechanism in heart failure Detection and localization of surgically resectable cancers with a multi-analyte blood test Structural basis of the Fanconi anaemia-associated mutations within the FANCA and FANCG complex Gene expression inference with deep learning Molecular characterization of a putative serine protease from Trichinella spiralis and its elicited immune protection Structural basis of the fanconi anemia-associated mutations within the FANCA and FANCG complex Non-invasive early detection of cancer four years before conventional diagnosis using a blood test