key: cord-0907743-xqgouabj authors: Christopher, Meera; Kooloth-Valappil, Prajeesh; Sreeja-Raju, Athiraraj; Sukumaran, Rajeev K. title: Repurposing proteases: An in-silico analysis of the binding potential of extracellular fungal proteases with selected viral proteins date: 2021-07-01 journal: Bioresour Technol Rep DOI: 10.1016/j.biteb.2021.100756 sha: ef94761b6e90222a02f838fe6ee1775fd0d1d6ba doc_id: 907743 cord_uid: xqgouabj Proteases have long been the target of many drugs, but their potential as therapeutic agents is well known yet under-explored. Due to the heightened threat from new and emerging infectious agents, it is worthwhile to tap into the vast microbial protease resource to identify potential therapeutics. By docking proteases of the fungus Penicillium janthinellum NCIM 1366 with the proteins encoded by the SARS-CoV-2 virus, enzymes that have the potential to bind with, and thereby degrade, viral proteins were identified. In-silico docking analysis revealed that both fungal and commercially available proteases belonging to the A1A, M20A, S10, S8A and T1A families were able to bind the viral spike, envelope, ORF-7a and Nsp2 proteins (binding energy < −50 kJ/mol), thereby opening up the possibility of developing additional therapeutic applications for these enzymes. In keeping with global efforts to reduce dependency on potentially hazardous chemicals and chemical processes, there has been a gradual shift towards establishing biological and biochemical processes to replace them. Various enzymes, especially proteases, are now used in diverse industries such as textiles, detergents, leather, feed, pharmaceuticals and bioremediation. Proteases are one of the most widely used classes of industrial enzymes, accounting for up to 20% of the enzymes marketed worldwide (Singhal et al., 2012). Because of their versatility in performing both synthetic and degradative functions, proteases enjoy a ubiquitous distribution in nature.
In particular, microbial proteases have excellent potential for commercial applications because of their robust nature and tolerance to harsh conditions. Microbes account for nearly two-thirds of the commercial proteases produced worldwide (Beg and Gupta, 2003). Microbial proteases are usually extracellular, which simplifies their purification and other downstream processes (Nisha and Divakaran, 2014). In comparison, production of proteases from animals and plants is more labor-intensive. Additionally, owing to their broad biochemical variety, higher yield, lower time consumption, smaller space requirement, ease of genetic manipulation and cost-effectiveness, microbes are the preferred source for commercial proteases (Ali et al., 2016). Among the microbes, Bacillus sp. are extensively studied for protease production on a large scale; other proficient producers include Pseudomonas and Streptomyces sp. Fungal species like Aspergillus, Penicillium, Rhizopus, Mucor and Endothia have been studied thoroughly for the production of acid, neutral and alkaline proteases. In spite of usually having lower reaction rates and lesser heat tolerance than their bacterial counterparts, one of as a pretreatment for enhancing adenovirus-mediated cancer gene therapy (de Souza et al., 2015). In the case of viral infections, to date, nearly 20 different chemotherapeutic agents (mostly nucleoside analogs) have been approved for treatment via the inhibition of viral DNA synthesis/reverse transcription. These chemicals are used primarily for alleviating infections caused by herpes virus, the human immunodeficiency virus, respiratory syncytial virus and the influenza A virus (Burrell et al., 2017). However, most of these agents have limited clinical efficacy, adverse side effects, and suboptimal pharmacokinetics, which results in the use of chronic therapy that in turn leads to the emergence of drug-resistant viral strains that limit subsequent treatment options.
In the last 20 years, there have been seven significant viral threats: Nipah virus, SARS, MERS, Ebola, avian influenza, swine flu and now the SARS-CoV-2 mediated COVID-19. Estimates place about 60% of infectious diseases and 70% of emerging human infections as zoonotic in origin, with two-thirds originating in wildlife (Vorou et al., 2007). Due to human encroachment on the natural world, experts worldwide agree that this pandemic will not be the last; ideal conditions exist for diseases from wildlife to spill over into humans and spread quickly around the world. To date, the global COVID-19 pandemic has caused 3.48 million deaths worldwide, including more than 300,000 deaths in India. The number of Indians who have or have had infections stands at nearly 27 million. In addition to its devastating effects on healthcare systems worldwide, reports from the World Bank estimated that an outbreak of this scale could push about 49 million people into extreme poverty, almost half of whom will be in Sub-Saharan Africa, with an additional 16 million in South Asia. In India, due to factors such as where people live, where they work, high dependence on public services, limited savings and unavailability of insurance, it is estimated that 260 million people will be back in poverty due to the pandemic, of whom approximately 40 million will be in "extreme poverty" (Blake and Wadhwa, 2020). Since recent technological advances have facilitated greater understanding of the essential viral enzymes, these proteins represent potential therapeutic targets. Because of the ease with which microbial proteases can be obtained, the recognition that proteases are an established class of safe and efficient drugs, and the fact that industrial scale processes already exist for the commercial production of many of them, it is worthwhile to explore their potential in degrading the viral enzymes, and thereby unearth additional therapeutic applications for these enzymes.
The overall objective of this study is to identify microbial proteases that have the potential to bind with, and degrade, viral proteins. Since microbial proteases are grouped into 83 families based on structure, functionality and substrate specificity, and as each family contains hundreds of entries, it would be a herculean task to analyze representatives of each subgroup. Therefore, to reduce the size of the dataset, the proteins of the fungus Penicillium janthinellum NCIM 1366 (henceforth referred to as PJ-1366) have been used as a model group, and the proteins encoded by SARS-CoV-2 have been used as representatives of enveloped viral proteins. From the whole genome sequence of PJ-1366 obtained by paired end sequencing, genes were predicted using AUGUSTUS (Stanke et al., 2004). From the list of predicted genes, proteins with signal sequences were identified using SignalP-5.0 (Almagro Armenteros et al., 2019). A database of proteases reported from Penicillium sp. was created using information available on the MEROPS database (Rawlings et al., 2018). A sequence similarity search between the extracellular proteins of PJ-1366 and the custom peptidase database was performed using blastp. The resulting matches were annotated using the UniProt KB (Magrane and Consortium, 2011). Any sequence not annotated as a protease was removed from the list. analysis. The structures of the proteins of the SARS-CoV-2 virus were obtained from the UniProt databank. Using PJ-1366 proteases as the receptor and SARS-CoV-2 proteins as the ligand, molecular docking based on shape complementarity principles was performed using PatchDock (Schneidman-Duhovny et al., 2005) with default parameters and a clustering RMSD of 4.0. The docking solutions were refined and scored according to an energy function using FireDock (Mashiach et al., 2008).
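The post-docking selection step can be illustrated with a minimal Python sketch. It assumes the refined solutions have already been collected into simple records; the record type `RefinedSolution`, the identifiers `protease_A`/`protease_B` and all energy values are hypothetical and are not taken from the study. The cutoff of -50 follows the favorable-binding threshold quoted in the text, and no real PatchDock or FireDock output format is parsed here.

```python
from dataclasses import dataclass

# Cutoff quoted in the text: global energies below -50 are treated as favorable.
FAVORABLE_CUTOFF = -50.0


@dataclass(frozen=True)
class RefinedSolution:
    """One refined docking solution (hypothetical record layout)."""
    protease_id: str
    viral_protein: str
    global_energy: float


def best_energy_per_pair(solutions):
    """Keep the most favorable (lowest) global energy for each protease/protein pair."""
    best = {}
    for s in solutions:
        key = (s.protease_id, s.viral_protein)
        if key not in best or s.global_energy < best[key]:
            best[key] = s.global_energy
    return best


def favorable_pairs(best, cutoff=FAVORABLE_CUTOFF):
    """Return the pairs whose best energy is strictly below the cutoff, sorted by name."""
    return sorted(pair for pair, energy in best.items() if energy < cutoff)


# Hypothetical values, for illustration only.
example = [
    RefinedSolution("protease_A", "spike", -42.1),
    RefinedSolution("protease_A", "spike", -63.5),
    RefinedSolution("protease_A", "Nsp1", -12.0),
    RefinedSolution("protease_B", "envelope", -51.2),
]
best = best_energy_per_pair(example)
print(favorable_pairs(best))
```

Keeping only the lowest energy per pair mirrors the idea of noting the most favorable refined solution for each complex; a real analysis would also need to inspect where on the protease the viral protein binds.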
The structures of other microbial proteases belonging to the same families as PJ-1366 proteins with good binding potential (-50 to -80 kcal/mol) to viral proteins were selected from the MEROPS database, and their binding affinities to SARS-CoV-2 proteins were studied using PatchDock and FireDock. From the 37.6 Mbp genome of PJ-1366, 11,828 genes were predicted by AUGUSTUS. Out of these, 1007 sequences had putative eukaryotic signal sequences as per SignalP analysis. From the MEROPS peptidase database, a custom list of 2146 proteases from 49 Penicillium species was created. Blastp alignment gave 175 PJ-1366 proteins with significant similarity. Based on UniProt annotation, 109 sequences that were described as proteases/hypothetical proteins were selected. BlastKOALA analysis of these sequences annotated 17 sequences as Peptidases and Inhibitors, based on BRITE hierarchical analysis (Table 1). From the orthology analysis, it was seen that some of the enzymes, such as tripeptidyl peptidase 1, cerevisin and cathepsin, are lysosomal components whose natural function is protein degradation (He et al., 2015; Sohar et al., 2013), while others, like carboxypeptidase D, kexin and leucyl aminopeptidases, are involved in the pre-processing of proteins, especially hormones (Cawley et al., 2014; Fuller, 2013). Interestingly, one of the proteins matched to deuterolysin, which has long been used in the food industry, and is usually known to be thermostable (Maeda et al., 2016). In the MEROPS database, proteases are grouped into 84 families according to their evolutionary relationship, under seven catalytic types: serine, aspartic, cysteine, threonine, glutamic acid and metallo-proteases, and asparagine peptide lyases (which catalyze via an elimination reaction rather than by hydrolysis).
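The filtering funnel reported above (gene prediction, signal peptide screening, similarity search, annotation, and BRITE classification) can be summarized with a short Python sketch that computes how much of each set survives the next step. The counts are those stated in the text; the function name `retention` and the step labels are illustrative choices.

```python
# Counts reported in the text for each successive filtering step.
funnel = [
    ("genes predicted by AUGUSTUS", 11828),
    ("putative signal sequences (SignalP)", 1007),
    ("significant blastp similarity", 175),
    ("proteases/hypothetical proteins (UniProt)", 109),
    ("Peptidases and Inhibitors (BlastKOALA)", 17),
]


def retention(steps):
    """Return (label, count, percent retained from the previous step) per transition."""
    rows = []
    for (_, previous), (label, count) in zip(steps, steps[1:]):
        rows.append((label, count, 100.0 * count / previous))
    return rows


rows = retention(funnel)
for label, count, percent in rows:
    print(f"{label}: {count} ({percent:.2f}% of previous step)")
```

Such a summary makes it easy to see which step removes the most candidates, which in this dataset is the signal peptide screen.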
Using this information from MEROPS, the proteases/inhibitors of PJ-1366 were further sorted into different protease families based on structure-based classification. It was seen that serine proteases were the most prevalent (10 sequences), followed by metalloproteases (5 sequences) and one each of aspartic and threonine proteases. Proteolytic enzymes are generally classified either based on the site of their action (as exopeptidases and endopeptidases), or by the optimal pH in which they are active (as acid/neutral/alkaline proteases). Based on UniProt descriptions (Table 2), it was seen that carboxypeptidases were the most prevalent. Since these enzymes generally are zinc-containing exopeptidases that remove single amino acids or dipeptides from the carboxyl end of oligopeptides, their applicability in destroying viral proteins may be limited. Consequently, proteases like the tripeptidyl peptidases, Penicillolysin (which is involved in degradation of proteins for nutrient uptake (Ichishima, 2004)), Aorsin and proteasome subunits are of more interest since, being endopeptidases, they can potentially disrupt viral protein structures by hydrolyzing peptide bonds in the interior of polypeptide chains. 3D models of the selected PJ-1366 proteases were obtained by homology modelling from the SWISS-MODEL server. While GMQE values of all models were >0, 11 models had QMEAN < -4.00, indicating lower quality. Also, it was observed that multiple models were built on the same template, the most common of which were 1ac5, which is the kex1 delta-p subunit, and 3edy, which is a tripeptidyl peptidase 1 from Homo sapiens (Fig. 1). In yeast, the KEX1 protease is involved in apoptosis caused by defective N-glycosylation (Hauptmann and Lehle, 2008), while TPP1 in humans is found in lysosomes, where it digests and recycles different types of molecules (Stumpf et al., 2017). From Fig. 1, it was seen that in most cases, the sequence identity was less than 50%.
Better matches were obtained only for 3 sequences: ctg7180000009963.g312 (carboxypeptidase), ctg7180000014384.g300 (proteasome sub-unit) and ctg7180000015102.g232 (nexin). Nexin is a component of fibroblasts that links thrombin and plasminogen activator and mediates their binding to cells (Baker et al., 1980). Positive-strand RNA viruses like the SARS-CoV-2 virus are a group of related viruses that have positive-sense, single-stranded RNA genomes. The RNA genome acts as an mRNA that is translated into viral proteins using the host cell's ribosomes. Coronaviruses have the largest known RNA genomes, between 27 and 32 kilobases in length. Such viruses account for a significant fraction of known viruses. In humans and birds, these viruses are known to cause respiratory tract infections. Some of the coronaviruses cause mild illnesses in humans including some cases of the common cold, while more lethal varieties can cause SARS, MERS, and COVID-19. In the case of SARS-CoV-2, each virion is 50-200 nm wide and has four structural proteins: the S (spike), E (envelope), M (membrane), and N (nucleocapsid) proteins (Supplementary File 1). The N protein holds the RNA genome, which is a single-stranded 29.9 kb long mRNA encoding 13 ORFs; the S protein allows attachment and fusion with the host cell membrane; the M proteins are responsible for virion morphogenesis; and the envelope small membrane proteins participate in ESCRT-independent budding for the formation of new virus particles (Neuman et al., 2011; ViralZone, 2020). The SARS-CoV-2 genome encodes a ~7096 residue long polyprotein which consists of the structural and non-structural proteins (Wu et al., 2020). Expression of the viral proteins is either through a primary translation of the polyprotein that initiates infection, or after some replication, through sub-genomic mRNA expression which produces all structural proteins (Kim et al., 2020).
A summary of the interactome of SARS-CoV-2 proteins with human proteins is provided in Supplementary File 2. Eight viral proteins whose structures were resolved were used for docking: the Spike Glycoprotein (involved in viral attachment and entry), the ORF-7a Protein (which disrupts Tetherin antiviral activity), the Envelope protein (participates in viral budding), the Nucleoprotein (involved in viral genome packaging), the non-structural proteins Nsp1, Nsp2 and Nsp14 (which interfere with host cellular processes), and the ORF-6 protein (which disrupts interferon signaling by preventing nuclear import of proteins). Docking using PatchDock gave results sorted on the basis of the shape complementarity score, the interface area of the docked molecule, and atomic contact energy. The top 10 results were analyzed using FireDock, and the most favorable global binding energy of the complex was noted. The energies were color coded from red (most favorable) to green (least favorable) in order to better visualize the comparison between them (Fig. 2). It was observed that while most of the non-structural viral proteins did not have favorable binding energies with PJ-1366 proteases, structural proteins like the spike glycoprotein and envelope protein, which are crucial for viral entry to host cells, were capable of being bound by the fungal proteases. Of the 17 fungal proteases that were analyzed, 7 structures showed favorable binding potential. These were ctg7180000009929.g55 (carboxypeptidase), and Nowak, 1995). However, unlike in bacterial and mammalian systems, the mechanisms by which the fungal hosts tolerate or overcome these infections are not precisely known. Mycophages often have barely detectable effects on the host's fitness, a neutral co-existence that might be the result of co-evolution.
Also, in certain yeasts and Ustilago species, retaining the virus actually proves to be beneficial as it increases the fungal pathogenicity (Pearson et al., 2009). Even though their roles in overcoming mycoviral attacks are not known, several bioactive compounds, mostly polysaccharides, terpenoids and phenolics that are beneficial for human health, have been derived from fungi, especially mushrooms (Seo and Choi, 2021). Proteases are architecturally diverse, ranging from small enzymes (∼20 kDa) to sophisticated multi-domain structures like proteasomes and meprin metalloproteinases (0.7-6 MDa). This multiplicity of enzymes results in an outstanding diversity in protease functions. Diversity is also observed in the case of specificity towards the targets, with some proteases exhibiting exquisite substrate/bond preferences; however, most proteases are relatively non-specific and can target multiple substrates (López-Otín and Bond, 2008). Using information from the MEROPS family classification of peptidases, it was observed that the PJ-1366 proteases with the most favorable binding energies to viral proteins belonged to diverse families of endopeptidases: one aspartic protease (A1A), one metalloprotease (M20A), four serine proteases belonging to families S8A and S10, and a threonine protease (T1A) (Table 3). Based on structure, the aspartic proteinases (APs) are classified into five superfamilies: AA, AC, AD, AE, and AF. The A1 family of eukaryotic APs is part of the AA clan, and contains many well-characterized enzymes with industrial and therapeutic uses (Rawlings et al., 2018). Not only do these peptidases hydrolyze proteins for nutrition and recycling, but they also perform many essential post-translational processing events for the activation/inactivation of enzymes and peptide hormones. Pepsin-like enzymes are aspartic proteases, which belong to the A1 family of peptidases. Pepsin hydrolyses proteins into water-soluble fragments called peptones.
Partial digestion by pepsin has been commercially used for processing proteins in food industries. Medically, it has also been employed as a laxative (Summers, 2017). Amongst the various peptidases in family M20 are carboxypeptidases such as the glutamate carboxypeptidase from Pseudomonas (M20.001), the thermostable carboxypeptidase Ss1 of broad specificity from archaea such as Sulfolobus sp. (M20.008) and the yeast Gly-X carboxypeptidase (M20.002). Bacterial glutamate carboxypeptidases, which have high affinity for folic acid, have been developed for anti-cancer regimes in two settings: to eliminate methotrexate from circulation rapidly, and to remove the glutamate residue from pro-drugs to release a cytotoxic agent at tumor sites. The peptidase family S10 is active only at acidic pH, unlike most other serine peptidase families. Carboxypeptidase Y (CPY) is a glycoprotein exopeptidase with a broad amino acid specificity that can retain its activity under the denaturing conditions used for polypeptide sequencing (Nielsen et al., 1990). CPY has also been used as a sensing element in biosensors for the direct detection of ochratoxin A in olive oil (Dridi et al., 2015). Most members of the family S8A are neutral/alkaline endopeptidases. Many peptidases in the family are thermostable. Because of this, these proteases, especially engineered subtilisins, have extensive applications in various industrial sectors such as detergent and leather industries, cosmetics, food processing, skin care ointments, metal scavenging and waste treatment (Sharma et al., 2019). The ubiquitin-proteasome system participates in the regulation of most fundamental cellular processes via intracellular protein degradation. However, its proteolytic core, the 20S proteasome, has also been found attached to the cell plasma membrane, and certain observations suggest that it may be released into the extracellular medium (Sixt and Dahlmann, 2008).
The eukaryotic proteasome has three different activities (trypsin-like, chymotrypsin-like and cleavage after glutamate). Each activity resides in a different β subunit. The archaeal and bacterial proteasomes, which are included in T1A, have only chymotrypsin-like activity. While the proteasome is an established anticancer drug target (Osmulski et al., 2017), utilizing the proteasome for degradation of heterologous proteins is an as yet unexplored area. The structures of industrially-produced proteases from the five protease families with high binding potential were obtained from the MEROPS database, and were used for docking with viral proteins. After refinement of docking results using FireDock, it was seen that while all the proteases were able to bind the viral proteins in their catalytic pockets, the binding energies were noticeably higher (Fig. 3). This shows that in spite of being evolutionarily related, changes in amino acid sequences can still cause significant differences in the tertiary structure, which in turn affects the binding potential. Another factor to be reckoned with is the imperfect modelling of the fungal proteases. Yet another possibility is that, due to evolution and adaptation, and their ability to colonize different environments and life forms, fungal proteins simply might have better chances of binding to viral proteins than the prokaryotic ones. This might be the reason for viral attacks not having highly deleterious effects on fungi. Due to the unprecedented and unchecked spread of the COVID-19 outbreak globally, current treatments have mostly focused on alleviating symptoms and providing respiratory support. The development of different vaccines against SARS-CoV-2 is either completed or nearing completion, and at least 13 different vaccines (across 4 platforms) are now being administered globally (Prompetchara et al., 2020).
Still, the threat persists, mainly due to the rapidly evolving nature of the virus, and also due to the possible emergence of other zoonotic diseases. The situation is more dire in a country like India, with extreme geo-climatic and socio-economic diversity, and the nation faces a constant threat of emerging and re-emerging viral infections of public health importance. Therefore, repurposing and re-evaluating existing drugs and commercially available inhibitors against druggable targets of the virus could effectively accelerate the drug discovery process. In this regard, targeting essential proteins (viral and/or host) involved in viral entry and proliferation can be considered as a practical approach to alleviate the epidemic. An important aspect to consider would be the specificity of the proteases against the targeted protein, since non-specific inhibition can adversely affect the regular physiological functioning of the host, either by activation of endogenous proteases or through degradation of protease inhibitors. Also, the proteases themselves can be immunogenic, and can induce inflammatory responses. For example, certain protease-activated receptors (PARs) have been known to alter the permeability of epithelial barriers which leads to inflammation (Enjoji et al., 2014). Some allergens also elicit an immune response through the protease-mediated cleavage of PARs, which induces proinflammatory cytokines and chemokines (Maeda et al., 2013). Therefore, any assessment of a protease-based strategy should take into consideration the availability, effectiveness, safety and cost of alternative measures, including checking the spread of infection, immunization or treatment with existing drugs. In addition to in-vivo effects, the possibility of using these proteases as external antiviral agents, either to hydrolyze or competitively bind viral proteins, needs to be explored.
New technologies for rationally engineering proteases, as well as improved delivery options, will significantly expand the efficacy of these enzymes. Since proteases are already being used worldwide for a multitude of commercial applications, a combination of modeling studies was performed to identify proteases that could bind, and potentially degrade, SARS-CoV-2 proteins. Binding energy evaluation identified 7 proteases belonging to 5 different families that are suitable candidates for further evaluation. Based on our current understanding of the roles and physiological effects of proteases, it is proposed that a two-pronged clinical approach, aimed at either destroying or inhibiting viral proteins, could be applied for a more robust response against SARS-CoV-2, with due consideration given to the dosage and site of protease activity.
The energies are ranked from red (most favorable) to green (least favorable). Gene IDs of proteins with highly favorable binding energies (<-50 kcal/mol) are highlighted in yellow.
Molecular characterization and growth optimization of halo-tolerant protease producing Bacillus Subtilis Strain BLK-1.5 isolated from salt mines of Karak
SignalP 5.0 improves signal peptide predictions using deep neural networks
Basic local alignment search tool
Protease-nexin: A cellular component that links thrombin and plasminogen activator and mediates their binding to cells
Purification and characterization of an oxidation-stable, thiol-dependent serine alkaline protease from Bacillus mojavensis
SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information
Year in Review: The impact of COVID-19 in 12 charts
Biochemical features of microbial keratinases and their production and applications
Proteases as therapeutics
Bacterial and Fungal Proteolytic Enzymes: Production, Catalysis and Potential Applications
A biotechnology perspective of fungal proteases
Comparison of carboxypeptidase y and thermolysin for ochratoxin A electrochemical biosensing
Oyster Mushroom Laccase Inhibits Hepatitis C Virus Entry into Peripheral Blood Cells and Hepatoma Cells
Regulation of epithelial cell tight junctions by protease-activated receptor 2
Kexin, in: Handbook of Proteolytic Enzymes
Kex1 protease is involved in yeast cell death induced by defective N-glycosylation, acetic acid, and chronological aging
Disruption of Cerevisin via Agrobacterium tumefaciens
BlastKOALA and GhostKOALA: KEGG Tools for Functional Characterization of Genome and Metagenome Sequences
The Architecture of SARS-CoV-2 Transcriptome
Proteases: Multifunctional enzymes in life and disease
Tyrosinase from mushroom Agaricus bisporus as an inhibitor of the Hepatitis C virus
Nebrodeolysin, a novel hemolytic protein from mushroom Pleurotus nebrodensis with apoptosis-inducing and anti-HIV-1 effects
A novel non-thermostable deuterolysin from Aspergillus oryzae
Protease-activated receptor-2 induces proinflammatory cytokine and chemokine gene expression in canine keratinocytes
UniProt Knowledgebase: A hub of integrated protein data
A structural analysis of M protein in coronavirus assembly and morphology
Regulated overproduction and secretion of yeast carboxypeptidase Y
Optimization of alkaline protease production from Bacillus subtilis NS isolated from sea water
Anticancer applications of allosteric inhibitors of proteasome
Protease inhibitors as antiviral agents
Mycoviruses of filamentous fungi and their relevance to plant pathology
Immune responses in COVID-19 and potential vaccines: Lessons learned from SARS and MERS epidemic
The MEROPS database of proteolytic enzymes, their substrates and inhibitors in 2017 and a comparison with peptidases in the PANTHER database
PatchDock and SymmDock: Servers for rigid and symmetric docking
Antiviral Bioactive Compounds of Mushrooms and Their Antiviral Mechanisms: A Review
A review on protease: An essential tool for various industrial approaches
Studies on production, characterization and applications of microbial alkaline proteases
Extracellular, circulating proteasomes and ubiquitin - Incidence and relevance
Tripeptidyl Peptidase I, in: Handbook of Proteolytic Enzymes
AUGUSTUS: A web server for gene finding in eukaryotes
A tripeptidyl peptidase 1 is a binding partner of the Golgi pH regulator (GPHR) in Dictyostelium. DMM
Industrial Uses of Pepsin
A quick guide to diagnosis and treatment of pneumonia with novel coronavirus infections
SARS-Cov-2 genome
Emerging zoonoses and vector-borne infections affecting humans in Europe
A new coronavirus associated with human respiratory disease in China
☐The authors declare the following financial interests/personal relationships which may be considered as potential competing interests