key: cord-1041395-tw7rygbs
authors: Scott, Benjamin M.; Lacasse, Vincent; Blom, Ditte G.; Tonner, Peter D.; Blom, Nikolaj S.
title: Predicted Coronavirus Nsp5 Protease Cleavage Sites in the Human Proteome: A Resource for SARS-CoV-2 Research
date: 2021-06-08
journal: bioRxiv
DOI: 10.1101/2021.06.08.447224
sha: 8b343389f03b8c4475a71a93a4fefc5fd1c4e5a8
doc_id: 1041395
cord_uid: tw7rygbs

Background
The coronavirus nonstructural protein 5 (Nsp5) is a cysteine protease required for processing the viral polyprotein and is therefore crucial for viral replication. Nsp5 from several coronaviruses has also been found to cleave host proteins, disrupting molecular pathways involved in innate immunity. Nsp5 from the recently emerged SARS-CoV-2 virus interacts with and can cleave human proteins, which may be relevant to the pathogenesis of COVID-19. Given the continuing global pandemic and the emerging understanding of coronavirus Nsp5-human protein interactions, we set out to predict which human proteins are cleaved by the coronavirus Nsp5 protease using a bioinformatics approach.

Results
Using a previously developed neural network trained on coronavirus Nsp5 cleavage sites (NetCorona), we predicted Nsp5 cleavage sites in all human proteins. Structures of human proteins in the Protein Data Bank containing a predicted Nsp5 cleavage site were then examined, generating a list of 92 human proteins with a highly predicted and accessible cleavage site. Of those, 48 are expected to be found in the same cellular compartment as Nsp5. Analysis of this targeted list of proteins revealed molecular pathways susceptible to Nsp5 cleavage and therefore relevant to coronavirus infection, including pathways involved in mRNA processing, cytokine response, cytoskeleton organization, and apoptosis.

Conclusions
This study combines predictions of Nsp5 cleavage sites in human proteins with protein structure information and protein network analysis.
We predicted cleavage sites in proteins recently shown to be cleaved in vitro by SARS-CoV-2 Nsp5, and we discuss how other potentially cleaved proteins may be relevant to coronavirus-mediated immune dysregulation. The data presented here will assist in the design of more targeted experiments to determine the role of coronavirus Nsp5 cleavage of host proteins, which is relevant to understanding the molecular pathology of SARS-CoV-2 infection.

Nsp10-Nsp11 cleavage site resulted in a higher score versus SARS-CoV (0.865 vs 0.65), because leucine is more common than methionine at P2. This mutation may result in more rapid cleavage at this site in SARS-CoV-2 than in SARS-CoV, as Nsp5 favors leucine above all other residues at P2 [17, 25].

To investigate whether NetCorona can distinguish between cleaved and uncleaved motifs, NetCorona scores for all glutamine motifs in the SARS-CoV-2 1ab polyprotein were also determined. To gather context from the ongoing pandemic and to investigate glutamine motifs across different viral variants, 8017 SARS-CoV-2 1ab polyprotein sequences obtained from patient samples were scored with NetCorona (Fig. 1c, Additional File 1: Table S1). Apart from two motifs present in only 40 sequences, all glutamine motifs not naturally processed by Nsp5 received a NetCorona score <0.5, indicating they were correctly predicted not to be cleaved. Mutations at native Nsp5 cleavage sites were also rare, with only 28 such mutated cleavage sites present in 63 sequences. Except for three mutations present in one sequence each, mutations at native Nsp5 cleavage sites were conservative and only modestly changed the NetCorona score. One sequence contained a histidine at Nsp8-Nsp9 P1 (QIO04366), so NetCorona did not score the motif. SARS-CoV and SARS-CoV-2 Nsp5 may be able to cleave motifs with histidine at P1, albeit with reduced efficiency [17, 47].
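The sequence counts above can be turned into the percentages quoted in the next paragraph (0.8% and 0.5% of sequences). A minimal Python check, assuming that the 0.5% figure refers to the 40 sequences carrying one of the two high-scoring non-native motifs (the text does not state this mapping explicitly):

```python
# Counts reported in the text for the 8017 scored 1ab polyprotein sequences.
TOTAL_SEQUENCES = 8017
SEQS_WITH_MUTATED_NATIVE_SITE = 63  # sequences carrying the 28 mutated native sites
SEQS_WITH_HIGH_SCORING_NEW_MOTIF = 40  # assumption: the two non-native motifs scored >0.5


def percent(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


print(percent(SEQS_WITH_MUTATED_NATIVE_SITE, TOTAL_SEQUENCES))  # 0.8
print(percent(SEQS_WITH_HIGH_SCORING_NEW_MOTIF, TOTAL_SEQUENCES))  # 0.5
```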
These combined results indicate that, despite not being trained on the SARS-CoV-2 sequence, NetCorona was able to correctly distinguish between cleaved and uncleaved motifs in the 1ab polyprotein, except for Nsp5-Nsp6. The rarity of mutated canonical cleavage sites and of mutations introducing new cleavage sites (0.8% and 0.5% of sequences, respectively) indicates stabilizing selection for a distinction between Nsp5 cleavage sites and all other glutamine motifs.

To generate a global view of Nsp5 cleavage sites in the human proteome, datasets were batch analyzed using NetCorona (Fig. 2). Every 9-residue motif flanking a glutamine was scored, with glutamine at P1 and four residues analyzed on either side (P5-P4'). Using a NetCorona score cutoff of >0.5, 15057 proteins (~20%) in the "All Human Proteins" dataset contained a predicted cleavage site, as did 6056 (~29%) proteins in the "One Protein Per Gene" dataset and 2167 (~32%) proteins in the "Proteins With PDB" dataset (Additional File 1: Tables S2-S4; raw data sets in Additional Files 2-4).

To help interpret these results, we compared the output from "One Protein Per Gene" to proteins that have been directly tested in vitro for cleavage by a coronavirus Nsp5 protease (Additional File 1: Table S5). There are six human proteins whose cleavage sites have been mapped to the protein sequence (GOLGA3, NEMO, NLRP12, PAICS, PNN, TAB1) [40, 46, 48], as well as two proteins from pigs (NEMO, STAT2) [41, 43] and one from cats (NEMO) [42]. NetCorona accurately scored 6 out of the 12 unique cleavage sites mapped in these proteins. NetCorona struggled with an identical cleavage motif at Q231 in NEMO from cats, pigs, and humans, which contains an uncommon valine at P1'.
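The motif definition above (glutamine at P1, four residues on each side) can be sketched in Python. This is a minimal illustration, not NetCorona's own preprocessing: the function names are hypothetical, and skipping glutamines that lack a full nine-residue window is an assumption, since the text does not say how termini are handled.

```python
from typing import Dict, Iterator, List, Tuple


def glutamine_motifs(sequence: str, flank: int = 4) -> Iterator[Tuple[int, str]]:
    """Yield (1-based P1 position, motif) for each Q with a full P5-P4' window.

    The 9-residue motif covers four residues before the glutamine (P5-P2),
    the glutamine itself (P1) and four residues after it (P1'-P4').
    Glutamines too close to either terminus are skipped (an assumption).
    """
    seq = sequence.upper()
    for i, residue in enumerate(seq):
        if residue != "Q":
            continue
        start, end = i - flank, i + flank + 1
        if start < 0 or end > len(seq):
            continue
        yield i + 1, seq[start:end]


def predicted_sites(scores: Dict[int, float], cutoff: float = 0.5) -> List[int]:
    """Return P1 positions whose score is strictly above the cutoff."""
    return sorted(pos for pos, score in scores.items() if score > cutoff)
```

For example, `list(glutamine_motifs("TSAVLQSGFRK"))` on this toy sequence yields `[(6, "SAVLQSGFR")]`, a single window with the glutamine at P1. Motifs collected this way would then be scored, and `predicted_sites` applies the >0.5 cutoff to a hypothetical mapping of P1 positions to scores.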
Interestingly, NetCorona predicted a cleavage site in PNN at Q495, which was not identified in the original study but matches the size of a reported secondary cleavage product [48].

varied depending on where the BioID tag was located on on the C145A catalytically inactive mutant), were plotted against the NetCorona score from our study, as illustrated in Additional File 5: Figure S1 (raw data in Additional File 1: Table S8). Although statistically significant, the negative correlation between the strength of the human protein interaction and the maximum NetCorona score was small: ρ ranged from -0.18 to -0.29 and r² from 0.03 to 0.08, depending on where the BioID tag was located on Nsp5.

We next sought to incorporate available structural information on potential protein substrates into our analysis, to address the discrepancy between the cleavage events predicted by NetCorona and the cleavage sites that have been directly mapped in vitro.

"Proteins With PDB" and proteins in the other two datasets was assessed through the non-

The distribution of NetCorona scores for "Proteins With PDB" proteins did not differ significantly from the scores for "All Human Proteins" and "One Protein Per Gene" (p=0.121 and p=0.856, respectively), indicating no detectable bias in the distribution of NetCorona scores.

NetCorona scores are derived from the primary amino acid sequence, but targeted proteolysis also depends on the 3D structural context of the potential substrate peptide within a protein [56, 57]. Many methods have been developed to quantify this structural context in silico, and solvent accessibility has been shown to be a strong predictor of proteolysis [57]. Accessible surface area (ASA) is commonly used to measure solvent accessibility: a probe that approximates a water molecule is rolled around the surface of the protein, and the path traced out by the center of the probe is the accessible surface [63].
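The reported ρ values are Spearman rank correlations. The authors used GraphPad Prism for statistics; the Python below is only a from-scratch sketch of the standard definition (Pearson correlation of ranks, with tied values given their average rank), and the function names are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(x), average_ranks(y))
```

For reference, 0.18² ≈ 0.03 and 0.29² ≈ 0.08, so the reported r² range is consistent with the square of the ρ range; whether r² was computed that way or from a separate linear fit is not stated in the text.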
Thin slices are then cut through this path to calculate the accessible surface of individual atoms. After obtaining PDB files containing motifs predicted to be cleaved by NetCorona, the total ASA of each 9-amino-acid motif was calculated using the Protein Structure and Interaction Analyzer (PSAIA) [64]. This ASA was then multiplied by the motif's NetCorona score to provide a "Nsp5 access score", which represents both solvent accessibility and substrate sequence preference. A Nsp5 access score was obtained for 914 glutamine motifs in 794 unique human proteins (Additional File 1: Table S9); the process for selecting PDB files to analyze is described in Additional File 6.

Specific examples are presented to illustrate the utility of the Nsp5 access score (Fig. 3b-e). Acetylcholinesterase (ACHE) contains a motif at Q259 that was highly scored by protein, the low ASA (38.4 Å²) results in a similarly low Nsp5 access score (34.1) and is therefore unlikely to be cleaved by Nsp5 (Fig. 3b). TGF-beta-activated kinase 1-binding protein 1 (TAB1) is one of the few human proteins with both a structure and experimental evidence of SARS-CoV-2 cleavage at specific sites (Q132 and Q444) [46]. As illustrated in Fig. 3c, the nearby motif at Q108 was scored higher than Q132 by NetCorona, but the greater ASA of the Q132 motif gives it a higher Nsp5 access score, which matches the experimental evidence. The human protein with the highest Nsp5 access score was DEAH box protein 15 (DHX15): the motif surrounding Q788 was highly scored by NetCorona, and its location near the C-terminus of the protein makes it highly solvent exposed (Fig. 3d).

To focus analysis on the human proteins most likely to be cleaved by Nsp5, we determined a relevant cut-off for the Nsp5 access score.
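Written as a formula (the symbols are ours, not the authors'), with N the NetCorona score of a motif and ASA_r the accessible surface area of residue r as computed by PSAIA, the Nsp5 access score of a nine-residue motif is:

```latex
S_{\mathrm{access}} = N \times \sum_{r \in \{\mathrm{P5}, \dots, \mathrm{P4'}\}} \mathrm{ASA}_r,
\qquad 0 \le N \le 1
```

As a worked check on the ACHE example, the reported motif ASA of 38.4 Å² and access score of 34.1 imply a NetCorona score of about 34.1 / 38.4 ≈ 0.89 (back-calculated here, not reported in this section). Likewise, a marginal score of 0.5 combined with an ASA of about 1000 Å² gives 0.5 × 1000 = 500, the cut-off value discussed below.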
Using available structures and homology models, the Nsp5 access score of SARS-CoV-2 native cleavage sites was calculated; it ranged from 487 (Nsp15-Nsp16) to 923 (Nsp4-Nsp5) (Additional File 1: Table S10). The Nsp15-Nsp16 site

Based on these comparisons to available experimental data, a Nsp5 access score cut-off of 500 was selected, which is further illustrated in Additional File 7: Figure S2 (full data in Additional File 1: Table S12). This cut-off accommodates motifs with marginal NetCorona scores (~0.5) but maximal observed ASA (~1000 Å²), as well as the opposite scenario, where a low ASA comparable to Nsp15-Nsp16 (~500 Å²) is matched with a high NetCorona score (~0.9). In total, 92 motifs in 92 human proteins had a Nsp5 access score >500 (Fig. 3f) and were forwarded to the next rounds of analysis.

Proteins with a Nsp5 access score above 500 were input into STRING within the Cytoscape environment [65] [66] [67]. The STRING app computes protein interaction networks by integrating information from publicly available databases, such as Reactome and UniProt. Through text mining of the articles reported in those databases, it also compiles scores for multiple tissues and cellular compartments. The nucleus and cytosol were the top locations for human proteins with a highly predicted Nsp5 cleavage site (Fig. 4a), and the highest expression was in the nervous system and liver (Fig. 4b). Neither the mean nor the summed expression score correlated with the Nsp5 access score (ρ = 0.03 and 0.05, respectively), nor was there a correlation between the Nsp5 access score and subcellular localization scores (ρ = -0.08 for mean and -0.17 for sum).

Nsp5 may exist in infected cells, and thus what human proteins it may be exposed to.
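The cut-off step can be sketched in Python as follows. The `Motif` record, its field names, and the example values are hypothetical; the only rules taken from the text are that the access score is the motif ASA multiplied by the NetCorona score and that motifs must score strictly above 500.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Motif:
    protein: str       # hypothetical identifier (e.g. a UniProt ID)
    p1_position: int   # position of the P1 glutamine
    netcorona: float   # NetCorona score, 0-1
    asa: float         # summed ASA of the 9-residue motif, in square angstroms

    @property
    def access_score(self) -> float:
        """Nsp5 access score: motif ASA multiplied by its NetCorona score."""
        return self.asa * self.netcorona


def above_cutoff(motifs: List[Motif], cutoff: float = 500.0) -> List[Motif]:
    """Keep motifs whose access score is strictly above the cutoff, highest first."""
    kept = [m for m in motifs if m.access_score > cutoff]
    return sorted(kept, key=lambda m: m.access_score, reverse=True)
```

With these rules, a motif exactly at the boundary (NetCorona 0.5, ASA 1000 Å²) scores 500.0 and is excluded by the strict inequality, while an ACHE-like case (0.888 × 38.4 ≈ 34.1) falls far below the threshold.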
Flanked by the transmembrane proteins Nsp4 and Nsp6 in the polyprotein, Nsp5 is exposed to the cytosol when first expressed, and once released it colocalizes with Nsp3 [68] [69] [70]. Recent studies have indicated that SARS-CoV-2 Nsp5 activity can be detected throughout the cytosol of a patient's cells ex vivo [25], and Nsp5 is also found in the nucleus and ER [51, 71].

Through the Human Protein Atlas (HPA), we obtained information on protein expression in tissue by immunohistochemistry (IHC), together with intracellular localization obtained by confocal imaging, for most of the proteins in our dataset [72]. Proteins that are not found in the same cellular compartment as Nsp5, or whose intracellular localization was unknown, were filtered out. Of the initial 92 proteins with a Nsp5 access score over 500, and based on current knowledge, only 48 proteins were likely to be found in the same cellular compartment as Nsp5 (Fig. 5, Additional File 1: Tables S13-S14); these have the greatest potential to interact with and be cleaved by the protease. Proteins involved in apoptosis, such as CASP2, E2F1, and FNTA, had both a high Nsp5 access score and above-average expression.

Input of these 48 human proteins with a Nsp5 access score over 500 and plausible colocalization into STRING revealed multiple pathways of interest (Fig. 6, Additional File 1: Table S15). The pathway containing the most proteins that may be targeted by and colocalize with directly in apoptosis or its regulation (CASP2, E2F1, FNTA, MAPT, PTPN13). The DNA damage response, mediated through ATF2, NEIL1, PARP2, and RAD50, may also be targeted by Nsp5. PARP2 had the second highest Nsp5 access score in our analysis, and its predicted cleavage site at Q352 is located between the DNA-binding domain and the catalytic domain [73].
Proteins involved in membrane trafficking (RAB27B and SNX10) or in microtubule organization (DNM1, HTT, MAPRE3, TSC1) were also enriched in this focused dataset and were grouped together under the descriptor "vesicle trafficking". Two proteins related to ubiquitination (UBA1 and USP4) were also among these potential Nsp5 targets. Nsp3-mediated modulation of ubiquitination has been shown to be important for IFN antagonism [32-39], and there is also evidence for Nsp5-mediated reduction of ubiquitination [45, 47]. Finally, a group of proteins implicated in cytokine response was also strongly predicted to be cleaved

The NetCorona neural network generated long lists of potentially cleaved human proteins, but mismatches between these predictions and the in vitro mapping of Nsp5 cleavage sites indicated that NetCorona scores alone were insufficient for accurate predictions. Similar to previous reports [58-60], solvent accessibility helped to filter predictions based on primary sequence alone. This was made possible by the PSAIA tool, which automates the measurement of motif ASA through an easy-to-use GUI that handles batch input of PDB files [64].

Predicted Nsp5 cleavage of human proteins did not correlate with the Nsp5-human protein-protein interactions identified in vitro, and overall Nsp5 appears to interact with fewer human proteins than other Nsps and structural proteins [51]. This may be because the proteolytic activity of Nsp5 reduces the efficiency of proximity labeling/affinity purification: Nsp5 may cleave the proteins it interacts with most favorably, reducing the apparent host protein interactions. The small but statistically significant negative correlation between the strength of the Nsp5-human protein interaction and the human protein's maximum NetCorona score may be evidence of this.
Indeed, different sets of interacting proteins are obtained when using the catalytically inactive Nsp5 mutant C145A versus wild-type Nsp5 [49, 51, 54]. We therefore hypothesize that the interactions observed by proximity labeling/affinity purification do not reflect Nsp5-mediated proteolysis and instead represent non-proteolytic protein-protein interactions, which may still be important to understanding Nsp5's role in modulating host protein networks.

N-terminomics-based approaches have identified many potential Nsp5 cleavage sites in human proteins [47, 48], but they have some limitations that bioinformatics can complement. Trypsin is used in the preparation of samples for mass spectrometry, and it cleaves after lysine and arginine residues that are not followed by a proline. Lysine and arginine appear in many cleavage sites predicted by NetCorona, so cleavage by trypsin may mask true cleavage sites by artificially generating an N-terminus close to a P1 glutamine residue. Only one protein overlaps between the Koudelka et al. and Meyer et al. results, as these studies used different cell lines, and thus different proteins will be expressed,

August 4 2020. Sequences with inconclusive "X" residues were filtered out because they were not correctly handled by NetCorona, leaving 8017 SARS-CoV-2 1ab polyprotein sequences to be analyzed.

The command-line version of NetCorona was used to predict Nsp5 cleavage site scores for human and viral protein sequences [55]. To overcome the input file limit of 50,000 amino acids per submission and to handle sequences with non-standard amino acids, a Python script was developed. This script partitions the input data, runs the NetCorona neural network on each subset, and parses and concatenates the output data.
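The authors' partitioning script is not reproduced in the text. The Python below is a minimal sketch, under stated assumptions, of two preprocessing steps described here: dropping sequences that contain "X" and packing whole sequences into batches of at most 50,000 residues. The function names are hypothetical, the sketch does not address other non-standard amino acids, and it stops at producing FASTA text because NetCorona's command-line flags and output format are not given here.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (accession, sequence)


def drop_ambiguous(records: Iterable[Record]) -> List[Record]:
    """Remove sequences containing inconclusive 'X' residues."""
    return [(acc, seq) for acc, seq in records if "X" not in seq.upper()]


def partition(records: Iterable[Record], max_residues: int = 50_000) -> List[List[Record]]:
    """Greedily pack whole sequences into batches of at most max_residues.

    A single sequence longer than max_residues ends up alone in its own
    batch and would still need splitting before submission (not handled).
    """
    batches: List[List[Record]] = []
    current: List[Record] = []
    size = 0
    for acc, seq in records:
        # Start a new batch if adding this sequence would exceed the limit.
        if current and size + len(seq) > max_residues:
            batches.append(current)
            current, size = [], 0
        current.append((acc, seq))
        size += len(seq)
    if current:
        batches.append(current)
    return batches


def to_fasta(batch: List[Record]) -> str:
    """Serialize one batch as FASTA text, e.g. for a per-batch input file."""
    return "".join(f">{acc}\n{seq}\n" for acc, seq in batch)
```

Each batch's FASTA text would then be written to its own input file and scored separately, with the parsed outputs concatenated afterwards.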
The output file includes the sequence accession number, the position of the P1 glutamine (Q) residue, the NetCorona score (0.000-1.000) and a Microsoft Excel (Additional File 1: Table S1). Statistical analysis and the generation of graphs were performed using GraphPad Prism (version 9.1.0).

PDB metadata associated with proteins in the "Proteins With PDB" dataset that also contained a predicted Nsp5 cleavage site (NetCorona score >0.5) were downloaded from the RCSB PDB website by generating a custom report in .csv format. Homology models, structures with a resolution value above 8 Å, and structures with no reported resolution were removed. Each Nsp5 cleavage site predicted by NetCorona was matched with one PDB file by searching the PDB metadata for the predicted 9-amino-acid cleavage motif using Microsoft Excel (Additional File 6). The entire predicted 9-amino-acid motif had to appear in the PDB file to be considered a match. Matches between a PDB file and a predicted cleavage motif were manually corrected when the motif sequence appeared by chance in a PDB containing the incorrect

PDB files containing a predicted Nsp5 cleavage site were then batch downloaded from the RCSB PDB and analyzed 100 at a time using the Protein Structure and Interaction Analyzer (PSAIA) tool with default settings [64], with the chains in each PDB analyzed independently. The total accessible surface area (ASA) of each residue was calculated using a Z slice of 0.25 Å and a probe radius of 1.4 Å. XML files output by PSAIA were combined in Microsoft Excel to create searchable datasets for each 9-amino-acid motif predicted to be cleaved by NetCorona, and the ASA of all atoms in each 9-amino-acid motif was summed. The motif's ASA was then multiplied by the NetCorona score to provide a Nsp5 access score.
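The authors performed these matching and summation steps in Microsoft Excel. As an illustrative alternative, the same logic can be sketched in Python with the standard `csv` module. The report column names (`pdb_id`, `method`, `resolution`, `sequence`) and the per-residue ASA mapping are hypothetical stand-ins for the RCSB custom report and the PSAIA XML output, and the sketch assumes PDB residue numbering matches the P1 position reported by NetCorona, which is often not the case in practice.

```python
import csv
import io
from typing import Dict, List, Optional

MAX_RESOLUTION = 8.0  # angstroms; larger values (or no value) are rejected


def usable_entries(report_csv: str) -> List[Dict[str, str]]:
    """Keep non-model entries with a reported resolution of at most 8 angstroms.

    The column names used here are hypothetical placeholders for the
    headers of the downloaded custom report.
    """
    kept = []
    for row in csv.DictReader(io.StringIO(report_csv)):
        if "homology" in (row.get("method") or "").lower():
            continue  # homology models were excluded
        try:
            resolution = float(row["resolution"])
        except (TypeError, ValueError):
            continue  # blank or missing: resolution not reported
        if resolution <= MAX_RESOLUTION:
            kept.append(row)
    return kept


def match_motif(motif: str, entries: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Return the first entry whose sequence contains the entire 9-residue motif."""
    for row in entries:
        if motif in row["sequence"]:
            return row
    return None


def motif_asa(residue_asa: Dict[int, float], p1: int, flank: int = 4) -> float:
    """Sum per-residue ASA over P5..P4' around the P1 residue number."""
    positions = range(p1 - flank, p1 + flank + 1)
    missing = [p for p in positions if p not in residue_asa]
    if missing:
        raise KeyError(f"no ASA for residues {missing}")
    return sum(residue_asa[p] for p in positions)
```

Entries without a numeric resolution (for example, NMR structures) are dropped, mirroring the rule that structures with unreported resolution were removed. The summed motif ASA would then be multiplied by the NetCorona score, as described above.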
Proteins known to be cleaved by mammalian chymotrypsin-like proteases were independently obtained from the RCSB PDB, and the known cleaved motifs were analyzed as above. Protein structures and homology models of SARS-CoV-2 proteins were obtained from the RCSB PDB and from SWISS-MODEL [93] and were analyzed as above. Publication-quality figures were generated using PyMOL 2.3.0.

Proteins with a Nsp5 access score above 500 were loaded into the STRING app [65] within Cytoscape [66] (versions 1.6.0 and 3.8.2, respectively) using UniProt IDs, a Homo sapiens background, a 0.80 confidence score cut-off and no additional interactors for the pathway expression scores and compartments score for each protein) was exported to R for wrangling and data visualization using the tidyverse and ggrepel packages [94] [95] [96].

To increase confidence, tissue expression and subcellular localization data were obtained from the Human Protein Atlas, where they are based on immunohistochemistry (tissue expression) or confocal microscopy (subcellular localization) [72, 97]. Each entry was then matched in R, with tables joined on UniProt IDs. Expression levels noted as "Not detected", "Low", "Medium" or "High" were replaced by numeric values from 0 to 3. Mean expression was calculated across all tissues, excluding missing values.

The following intracellular locations were used to encompass the nucleus, cytoplasm and endoplasmic reticulum: "Cytosol", "Nucleoplasm", "Endoplasmic reticulum", "Microtubules", "Nuclear speckles", "Intermediate filaments", "Nucleoli", "Nuclear bodies". All proteins that did not include one or more of these locations in the HPA database were excluded from further analysis.
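The authors did this wrangling in R (tidyverse). For consistency with the other examples, the Python below sketches the same two rules: encoding HPA expression levels as 0-3 and averaging across tissues while ignoring missing values, and keeping proteins that have at least one of the listed locations. Treating unrecognized level strings as missing is an assumption, and the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable, Optional

LEVELS = {"Not detected": 0, "Low": 1, "Medium": 2, "High": 3}

# Locations listed in the text as overlapping Nsp5's compartments.
NSP5_LOCATIONS = {
    "Cytosol", "Nucleoplasm", "Endoplasmic reticulum", "Microtubules",
    "Nuclear speckles", "Intermediate filaments", "Nucleoli", "Nuclear bodies",
}


def mean_expression(levels_by_tissue: Dict[str, Optional[str]]) -> Optional[float]:
    """Mean of 0-3 encoded levels, skipping missing or unrecognized entries."""
    values = [LEVELS[v] for v in levels_by_tissue.values() if v in LEVELS]
    return mean(values) if values else None


def colocalizes(locations: Iterable[str]) -> bool:
    """True if at least one HPA location is in the Nsp5-compatible set."""
    return not NSP5_LOCATIONS.isdisjoint(locations)
```

For example, a protein recorded as "High" in liver, "Low" in brain and missing in lung gets a mean expression of 2, and a protein annotated only to "Mitochondria" would be excluded.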
The 48 proteins with a Nsp5 access score >500 that could potentially be found in the same cellular compartment as Nsp5 were imported into the STRING app (again within Cytoscape), this time allowing a maximum of 5 additional interactors for network generation instead of none. All other parameters were left unchanged. Individual nodes that had no protein-protein interactions with other proteins in the network were manually moved closer to other nodes representing the same or similar pathways. When proteins could participate in multiple pathways represented here, a "main pathway" was assigned based on a literature search. Node color was a gradient based on Nsp5 access score. Node size increased with mean expression. Edges represent protein-protein interactions (confidence > 0.80). Gene name labels were colored based on the Nsp5 access score for readability only.

The proximal origin of SARS-CoV-2
Coronavirus Closely Related to SARS-CoV-2 Contains Natural Insertions at the S1/S2 Cleavage Site of the Spike Protein
Can we contain the COVID-19 outbreak with the same measures as for SARS?
Virus-encoded proteinases and proteolytic processing in the
Profiling of substrate specificities of 3C-like proteases from
Porcine Epidemic Diarrhea Virus Protease Regulates Its Interferon Antagonism by Cleaving NEMO
Porcine deltacoronavirus nsp5 inhibits interferon-beta production through the cleavage of NEMO
Feline Infectious Peritonitis Virus Nsp5 Inhibits Type I Interferon Production by Cleaving NEMO at Multiple Sites. Viruses
Deltacoronavirus nsp5 Antagonizes Type I Interferon Signaling by Cleaving STAT2
Evasion of Type I Interferon by SARS-CoV-2
SARS-CoV-2 serves as a bifunctional molecule in restricting type I interferon antiviral signaling
SARS-CoV-2 proteases PLpro and 3CLpro cleave IRF3 and critical modulators of inflammatory pathways (NLRP12 and TAB1): implications for disease presentation across species
Terminomics for the Identification of In Vitro Substrates and Cleavage Site Specificity of the SARS-CoV-2 Main Protease
Characterisation of protease activity during SARS-CoV-2 infection identifies novel viral cleavage sites and cellular targets for drug repurposing
A SARS-CoV-2 protein interaction map reveals targets for drug repurposing
Proteomics of SARS-CoV-2-infected host cells reveals therapy targets
A SARS-CoV-2-host proximity interactome
Global BioID-based SARS-CoV-2 proteins proximal interactome unveils novel ties between viral polypeptides and host factors involved in multiple COVID19-associated mechanisms
Proteomic Survey Reveal Potential Virulence Factors Influencing SARS-CoV-2 Pathogenesis
Batra J et al: Comparative host-coronavirus protein interaction networks reveal pan-viral disease mechanisms
Coronavirus 3CLpro proteinase cleavage sites: possible relevance to SARS virus pathology
Molecular recognition. Conformational analysis of limited proteolytic sites and serine proteinase protein inhibitors
Smith JW: Structural determinants of limited proteolysis
PoPS: a computational tool for modeling and predicting protease specificity
SitePredicting the cleavage of proteinase
Exploring bias in the Protein Data Bank using contrast classifiers
PSAIA - protein structure and interaction analyzer
Cytoscape StringApp: Network Analysis and Visualization of Proteomics Data
Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks
Biological network exploration with Cytoscape 3. Curr Protoc Bioinformatics
Topology and membrane anchoring of the coronavirus replication complex: not all hydrophobic domains of nsp3 and nsp6 are membrane spanning
Ultrastructure and origin of membrane vesicles associated with the severe acute respiratory syndrome coronavirus replication complex
Snijder EJ: SARS-coronavirus replication is supported by a reticulovesicular network of modified endoplasmic reticulum
A systemic and molecular study of subcellular localization of SARS-CoV-2 proteins
Breckels LM et al: A subcellular map of the human proteome
PARP-2 domain requirements for DNA damage-dependent activation and localization to sites of DNA damage
Potentiates TH1 Polarization and Is Critical for Effective Antitumor and Antiviral Immunity. Front Immunol
Degradation of AIMP1/p43 induced by hepatitis C virus E2 leads to upregulation of TGF-beta signaling and increase in surface expression of gp96
The Global Phosphorylation Landscape of SARS-CoV-2 Infection
Identification of a Novel Susceptibility Marker for SARS-CoV-2 Infection in Human Subjects and Risk Mitigation with a Clinically Approved JAK Inhibitor in Human/Mouse Cells
DEAD-Box Helicases: Sensors, Regulators, and Effectors for Antiviral Defense. Viruses
Nlrp6 regulates intestinal antiviral innate immunity
The DEAH-box RNA helicase DHX15 activates NF-kappaB and MAPK signaling downstream of MAVS during antiviral responses
DHX15 Is a Coreceptor for RLR Signaling That Promotes Antiviral Defense Against RNA Virus Infection
Specific Protease Family Substrates in SUMOylation
PARP inhibitors: Synthetic lethality in the clinic
Severe acute respiratory syndrome coronavirus 3C-like
PTPN1/2-mediated dephosphorylation of MITA/STING promotes its 20S proteasomal degradation and attenuates innate antiviral response
The role of TC-PTP (PTPN2) in modulating sensitivity to imatinib and interferon-alpha in CML cell line, KT-1 cells
T-cell protein tyrosine phosphatase deletion results in progressive systemic inflammatory disease
SARS-CoV-2 3CLpro whole human proteome cleavage prediction and enrichment/depletion analysis
R: A language and environment for statistical computing
ggrepel: Automatically Position Non-Overlapping Text Labels with 'ggplot2'
Proteomics. Tissue-based map of the human proteome. Science

This is the raw data output by NetCorona following analysis of the "All Human Proteins" dataset.

Accessible surface area (ASA) of predicted and known Nsp5 motifs plotted against NetCorona scores, with data published by Moustaqil et al. and Koudelka et al. highlighted [46, 47], and the Nsp5 access score cut-off displayed.