key: cord-0733080-7phdxzqd
authors: Prescott, Lucas
title: SARS-CoV-2 3CLpro whole human proteome cleavage prediction and enrichment/depletion analysis
date: 2022-03-28
journal: Comput Biol Chem
DOI: 10.1016/j.compbiolchem.2022.107671
sha: 1ce88714ecd99bf691d3566b2e599a640dcf8b47
doc_id: 733080
cord_uid: 7phdxzqd

A novel coronavirus (SARS-CoV-2) has devastated the globe as a pandemic that has killed millions of people. Widespread vaccination is still uncertain, so many scientific efforts have been directed toward discovering antiviral treatments. Many drugs are being investigated to inhibit the coronavirus main protease, 3CLpro, from cleaving its viral polyprotein, but few publications have addressed this protease’s interactions with the host proteome or their probable contribution to virulence. Too few host protein cleavages have been experimentally verified to fully understand 3CLpro’s global effects on relevant cellular pathways and tissues. Here, I set out to determine this protease’s targets and corresponding potential drug targets. Using a neural network trained on cleavages from 392 coronavirus proteomes, which achieved a Matthews correlation coefficient of 0.985, I predict that a large proportion of the human proteome is vulnerable to 3CLpro, with 4,898 of approximately 20,000 human proteins containing at least one putative cleavage site. These cleavages are nonrandomly distributed: they are enriched in the epithelium along the respiratory tract, brain, testis, plasma, and immune tissues and depleted in olfactory and gustatory receptors despite the prevalence of anosmia and ageusia in COVID-19 patients. Affected cellular pathways include cytoskeleton/motor/cell adhesion proteins, nuclear condensation and other epigenetics, host transcription and RNAi, ribosomal stoichiometry and nascent-chain detection and degradation, ubiquitination, pattern recognition receptors, coagulation, lipoproteins, redox, and apoptosis.
This whole-proteome cleavage prediction demonstrates the importance of 3CLpro in expected and nontrivial pathways affecting virulence, led me to propose more than a dozen potential therapeutic targets against coronaviruses, and should therefore be applied to all viral proteases and subsequently verified experimentally.

Coronaviruses are enveloped, positive-sense, single-stranded RNA viruses with giant genomes (26-32 kb) that cause diseases in many mammals and birds. Since 2002, three human coronavirus outbreaks have occurred: severe acute respiratory syndrome (SARS) in 2002-2004, Middle East respiratory syndrome (MERS) from 2012 to present, and coronavirus disease 2019 (COVID-19) from 2019 to present. The virus that causes the latter disease, SARS-CoV-2, was first thought to directly infect the lower respiratory epithelium and cause pneumonia in susceptible individuals. The most common symptoms include fever, fatigue, nonproductive or productive cough, myalgia, anosmia, ageusia, and shortness of breath. More recently, however, correlations between atypical symptoms (chills, arthralgia, diarrhea, conjunctivitis, headache, dizziness, nausea, severe confusion, stroke, and seizure) and the severity of subsequent respiratory symptoms and mortality have motivated researchers to investigate additional tissues that may be infected. One way to explain these symptoms and associated cellular pathways is to [...] modulators IFI6 and IRAK-1, the epithelial ion channels CFTR and SCNN1D, the tumor suppressors p53BP1/2 (although not p53 itself), RNA polymerase subunits RPA1 and RPC1, eIF4G1, the cytoskeletal proteins MAP4 and MAPRE1/3, and many members of the ubiquitin pathway (USP1/4/5/9X/9Y/13/26 and SOCS6). Additionally, Yang's decision trees [58] were trained on four-amino-acid sliding windows with embeddings based on substitution matrix similarity scores and achieved MCCs up to 0.95, but they were limited to only 18 coronavirus polyproteins.
The embedding-derived non-orthogonality made the predictions somewhat robust to small changes in sequence, assuming the substitution matrix reflects how the cleavages evolve. Decision trees have the benefit of being symbolic and explainable but often predict suboptimally when presented with interpolated or extrapolated inputs, making alternative machine learning techniques more attractive for human protein cleavage prediction. For example, Narayanan et al. [61] and later Singh et al. [62] demonstrated that neural networks outperform decision trees for HIV and hepatitis C virus (HCV) protease cleavage prediction. Additional mixed methods, such as Li et al.'s nonlinear dimensionality reduction followed by a support vector machine (SVM), retain some of the benefits of both linear and nonlinear classifiers. [63] Rognvaldsson et al. [64, 65] argue that nonlinear models, including neural networks, should not be used for cleavage prediction; however, the HIV dataset from Cai et al. [66] that they used and their expanded dataset included only 299 and 746 samples, respectively. Additionally, physicochemical or structural encodings have outperformed one-hot encoding (also called orthogonal encoding) on their small HIV datasets [67] and have moreover eliminated differences between linear and nonlinear classifiers on an equivalent HCV dataset with 891 samples. [68] To my knowledge, no one has expanded the 3CLpro cleavage dataset to the point where nonlinearity becomes significant, searched the entire human proteome for 3CLpro cleavage sites with any method, or performed enrichment analysis and classification of these affected proteins. A complete, manually reviewed human proteome containing 20,350 sequences (not including alternative isoforms) was retrieved from UniProt/Swiss-Prot (proteome:up000005640 AND reviewed:yes). [69] Coronavirus polyprotein sequences were collected from GenBank.
[70] Searching for "orf1ab," "pp1ab," and "1ab" within the family Coronaviridae returned 391 different, complete polyproteins, and an additional polyprotein sequence from the monotypic Microhyla letovirus 1 was derived from accession number GECV01031551. [71] The 11 cleavage sites in each of these polyproteins were identified manually using the Clustal Omega multiple sequence alignment server, [72] [73] [74] totaling 4,312 balanced cleavages (Figure 1). P1 glutamines and histidines were unambiguously conserved when aligned to known cleavages in SARS, SARS-CoV-2, MERS, IBV, etc., and all remaining glutamines and histidines were considered uncleaved. Although some of the ten-amino-acid sequences surrounding the cleavages were identical (805 distinct sites in total), all 4,312 balanced positive cleavages were used for subsequent classifier training, together with all other distinct uncleaved sequences with P1 glutamines (18,477) and histidines (12,128), totaling 34,917 samples. Here I assumed that SARS-CoV-2 3CLpro is capable of cleaving all aligned cleavage sites across all genera of coronaviruses (Alpha-, Beta-, Gamma-, and Deltacoronavirus and the monotypic Alphaletovirus) because variation in cleavage sequences is greater within polyproteins than between them (Figures 2 and 3), regardless of any protease/cleavage cophylogeny (Figure 4). [76] Figures 1c and 1d demonstrate that the same 11 clusters appear when a lower-dimensional physicochemical encoding is used (40 dimensions containing normalized volumes, interface and octanol hydrophobicity scales, and isoelectric points); however, this dataset is large enough that one-hot encoding (a 200-dimensional binary input) outperforms it.
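The one-hot (orthogonal) encoding can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the alphabetical ordering of the 20 standard amino acids and the helper name `one_hot_window` are assumptions. Each of the ten window positions (P5 to P5') gets its own block of 20 bits with exactly one bit set, giving the 200-dimensional binary input. The sample counts are consistent with this setup: 392 polyproteins × 11 sites = 4,312 positives, and 4,312 + 18,477 + 12,128 = 34,917 samples.

```python
# Minimal sketch (assumption): one-hot encoding of a ten-residue P5..P5' window.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_window(window: str) -> list:
    """Hypothetical helper: flatten a window into 20 * len(window) binary features."""
    n = len(AMINO_ACIDS)
    bits = [0] * (n * len(window))
    for pos, aa in enumerate(window):
        # Each position owns a block of 20 bits; exactly one bit per block is set.
        # Nonstandard letters (e.g. X or U) raise KeyError in this sketch.
        bits[pos * n + AA_INDEX[aa]] = 1
    return bits


if __name__ == "__main__":
    # Cleavage window from the text (VSKLL^AGFKK), written without the caret.
    vec = one_hot_window("VSKLLAGFKK")
    print(len(vec), sum(vec))  # 200 10
```

A physicochemical encoding would instead replace each 20-bit block with a few real-valued scale values per position (four per position gives the 40 dimensions mentioned above).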
The NetCorona 1.0 server from Kiemer et al.'s work, [59] my reproductions of their sequence logo-derived rules and NN, and my improved sequence logo-based logistic regression, naïve Bayes, and NN classifiers were optimized and compared to decide which model to use for predicting human cleavage sites. [78] Kiemer et al.'s seven-genome sequence logo and multilayer perceptron used one-hot encoding of the 10-amino-acid window surrounding each cleavage (linearizing 10 positions of 20 possible amino acids yields a 200-bit input). [59] First, logistic regression was performed on the logit of the sequence logo's probability output (as opposed to Chou et al.'s manual probability cutoff, set by maximizing an unbalanced measure of accuracy [79]) with a nonzero but optimally extremely small pseudocount, returning an MCC of 0.825 with 74.0% recall. Updating the sequence logo with all known cleavages (Figure 1) improved its MCC to 0.931 with 94.1% recall. A naïve Bayes classifier was additionally constructed from both the positive and negative sequence logos and slightly improved the MCC to 0.935 with 94.0% recall. Figure 5 shows correlations between positions that are not captured by simple sequence logos or by classifiers assuming independence; these are represented as symmetric uncertainties (also known as total entropy correlation coefficients), a variant of mutual information. [80, 81] NNs, however, can capture 2D and higher-order correlations that are not easily visualized and therefore often improve accuracy. Finally, in addition to information content, Figure 6 shows a charge-polarity-hydrophobicity scale with no obvious trend, reaffirming why one-hot encoding can achieve a higher MCC than any lower-dimensional physicochemical encoding for NNs when the training set is large enough. As for my improvements to the NN, note that Kiemer et al.'s MCC of 0.840 is an average from threefold cross-validation (CV).
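The sequence-logo classifiers above treat window positions as independent. A minimal naïve Bayes sketch under that assumption follows: per-position amino acid frequencies are estimated separately for cleaved and uncleaved windows with a small additive pseudocount, and a window's log-odds is the prior log-odds plus the sum of per-position log-ratios. The function names, the pseudocount value, and the toy two-residue windows are illustrative assumptions; the paper's logo-based logistic regression instead fits a logistic model to the logit of the logo probability.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_probs(windows, pseudocount=1e-3):
    """Per-position residue probabilities (a 'sequence logo') with an additive pseudocount."""
    probs = []
    for pos in range(len(windows[0])):
        counts = Counter(w[pos] for w in windows)
        total = len(windows) + pseudocount * len(AMINO_ACIDS)
        probs.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return probs


def log_odds(window, pos_probs, neg_probs, prior_pos=0.5):
    """Naive Bayes log-odds of cleavage, assuming positions are independent."""
    score = math.log(prior_pos / (1.0 - prior_pos))
    for pos, aa in enumerate(window):
        score += math.log(pos_probs[pos][aa] / neg_probs[pos][aa])
    return score


def sigmoid(x):
    # Numerically stable logistic function for large positive or negative scores.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_proba(window, pos_probs, neg_probs, prior_pos=0.5):
    """Posterior probability of cleavage under the naive Bayes model."""
    return sigmoid(log_odds(window, pos_probs, neg_probs, prior_pos))


if __name__ == "__main__":
    # Toy two-residue windows: position 0 separates the classes, position 1 (P1 Q) does not.
    pos = position_probs(["AQ", "AQ", "SQ"])
    neg = position_probs(["GQ", "GQ", "PQ"])
    print(round(predict_proba("AQ", pos, neg), 3), round(predict_proba("GQ", pos, neg), 3))
```

Because every position contributes an independent term, correlated positions such as those in Figure 5 are effectively double-counted, which is one reason an NN can do better.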
[59] Because the known cleavage dataset is small, no data went unused; the three NN output scores were averaged, and, as in that work, sites with an average score greater than 0.5 were considered cleavages. Retraining the same NN structures (each with one hidden layer of 2 neurons) on the larger dataset resulted in a threefold CV average MCC of 0.968, a significant improvement even though the larger dataset is less balanced. This MCC was maintained after adding all other histidines (which precede 20 of the 805 distinct cleavages) as negatives. Interestingly, two infectious bronchitis viruses (Igacovirus, Gammacoronavirus) and one wigeon coronavirus HKU20 (Andecovirus, Deltacoronavirus) contained cleavages following leucine, methionine, and arginine (VSKLL^AGFKK in APY26744.1, LVDYM^AGFKK and DAALR^NNELM in ADV71773.1, and AIRCR^NNELM in YP_005352870.1). To my knowledge, synthetic tetra/octapeptides have been cleaved following histidine, phenylalanine, tryptophan, methionine, and possibly proline residues, [56, 83] but only one natural histidine substitution has been documented, in HCoV-HKU1, [84] and it likely does not affect function. [85] [86] [87] To optimize hyperparameters, the whole dataset was repeatedly split into 80% training/20% testing sets, with the 80% training set further split for cross-validation. The optimal settings (no oversampling within training folds, [88] a limited-memory Broyden-Fletcher-Goldfarb-Shanno (lbfgs) solver, rectifier (ReLU) activation, 0.00001 regularization, and 1 hidden layer with 10 neurons) had an average 20% test set MCC of 0.976 over many repeated splits and retrainings. Figure 7, in which the data were repeatedly split into train/test sets with different ratios, demonstrates that the entire dataset is not required for adequate performance for any of the three classification methods, although my final method used all the data to maximize accuracy.
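Because the negatives outnumber the positives roughly 7:1 (30,605 uncleaved versus 4,312 cleaved windows), plain accuracy would be misleading, which is why MCC and recall are reported throughout. A minimal sketch of both metrics from confusion-matrix counts follows; returning 0.0 when a row or column total is zero is a common convention assumed here, and the counts in the usage block are hypothetical.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when any row or column total is zero
    return (tp * tn - fp * fn) / denom


def recall(tp: int, fn: int) -> float:
    """Fraction of true cleavages that are predicted as cleaved."""
    return tp / (tp + fn) if (tp + fn) else 0.0


if __name__ == "__main__":
    # Hypothetical counts for an imbalanced test set (not from the paper).
    tp, fn, fp, tn = 90, 10, 10, 900
    print(round(mcc(tp, tn, fp, fn), 3), recall(tp, fn))
```

MCC ranges from -1 (perfectly inverted) through 0 (chance) to 1 (perfect), and unlike accuracy it cannot be inflated by predicting "uncleaved" for every window.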
Note that any errors in these predictions are amplified when the model is applied to the whole human proteome, as below, but the enrichment/depletion statistics proved robust against this variability. Similarly careful optimization and bias and variance characterization should be performed if this type of analysis is repeated on other protease datasets. Figure 7 also shows that a physicochemical encoding (also used in Figures 1c and 1d) underperforms one-hot encoding even at relatively small training sizes. Of the four physicochemical scales used, octanol hydrophobicity alone reached an MCC of 0.959, and adding volume, interface hydrophobicity, and isoelectric point features (in order of importance) increased the maximum MCC to 0.977. Figure 7: Random train/test split fraction vs. MCC, demonstrating that performance quickly approaches a limit for all classifiers. Given that protease cleavage datasets are relatively small and training individual models is computationally inexpensive, combining multiple models into ensembles is recommended to reduce variability and at least slightly improve accuracy. The cross-validation described above is itself an ensemble that improves accuracy by introducing diversity through resampling, as in bootstrap aggregating. In addition to resampling methods, averaging networks trained on the same dataset but initialized differently has recently been shown to improve accuracy on benchmark datasets. [89] With no obvious upper bound on ensemble size, the final model used for subsequent analyses was an average of 10 sets of 100-fold cross-validated networks. The very few misclassified sequences varied with retraining and were not overrepresented in any lineage; in essence, there was no distinction between easy-to-learn and hard-to-learn samples. An ensemble of this size had an average 20% test MCC of 0.985, although the final ensemble was trained on the entire dataset.
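The final ensemble (10 repeats of 100-fold cross-validation, with member scores averaged and thresholded at 0.5) can be sketched generically. The trainer is passed in as a function so the sketch stays dependency-free; `train_fn`, `kfold_indices`, and the toy constant-rate model are hypothetical. If the networks were built with scikit-learn (an assumption suggested by the "lbfgs" solver name, not stated in the text), the reported settings would correspond to `MLPClassifier(hidden_layer_sizes=(10,), activation="relu", solver="lbfgs", alpha=1e-5)`, with each fitted model's `predict_proba` supplying the score.

```python
import random
from statistics import mean


def kfold_indices(n, k, seed):
    """Shuffle indices 0..n-1 with a fixed seed and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_ensemble(X, y, train_fn, k=100, repeats=10):
    """Train one model per left-out fold for each reshuffle; return all k * repeats models.

    train_fn(X_train, y_train) must return a callable mapping one sample to a score in [0, 1].
    """
    models = []
    for r in range(repeats):
        for fold in kfold_indices(len(X), k, seed=r):
            held_out = set(fold)
            keep = [i for i in range(len(X)) if i not in held_out]
            models.append(train_fn([X[i] for i in keep], [y[i] for i in keep]))
    return models


def ensemble_predict(models, x, threshold=0.5):
    """Average the member scores; call the site cleaved if the mean reaches the threshold."""
    score = mean(m(x) for m in models)
    return score, score >= threshold


if __name__ == "__main__":
    # Hypothetical stand-in trainer: a "model" that always returns the training positive rate.
    def train_fn(X_train, y_train):
        rate = sum(y_train) / len(y_train)
        return lambda x: rate

    X = [[0]] * 20
    y = [1] * 14 + [0] * 6
    models = cv_ensemble(X, y, train_fn, k=5, repeats=2)
    print(len(models), ensemble_predict(models, [0]))
```

Each member sees a slightly different training set, so averaging smooths out fold-specific quirks much as bagging does; different random initializations inside `train_fn` would add the second source of diversity mentioned above.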
Even with the extremely high accuracies of models trained on this large dataset, random train/test splitting does not account for taxonomic biases. One can easily imagine that extending this training dataset to the entire order Nidovirales or even the class Pisoniviricetes may not improve SARS-CoV-2 protease prediction without some (co)phylogenetic weighting or complex resampling algorithms and experimental verification. A novel leave-one-(sub)genus-out resampling analysis (using the final NN architecture and one-hot encoding), summarized in Table 1, affirms that more divergent lineages are more difficult to predict accurately, but leaving out the whole Sarbecovirus subgenus or Betacoronavirus genus still yielded MCCs of 0.865 and 0.835, respectively, rivaling accuracies in previous publications. Alternatively, training initially on only Sarbecovirus sequences and progressively expanding the training set phylogenetically to Milecovirus reduced Sarbecovirus-specific MCCs only from 0.996 to 0.989 while increasing all other subgenus-specific MCCs to similar values. This again affirms that the entire dataset should be used and that diversity between the 11 cleavages is more important than diversity between lineages. Some predicted cleavage sites were close enough to the N- and C-termini that the ten-amino-acid window input into the neural network was not filled. Sites with a P1 glutamine fewer than four amino acids from the N-terminus or fewer than five amino acids from the C-terminus were omitted because, although they may lie within important localization sequences, their cleavage kinetics are likely significantly slowed by truncation. Of the 20,350 manually reviewed human proteins, 4,898 were predicted to be cleaved at least once with a final average NN score greater than or equal to 0.5. To test whether the cleavages were nonrandomly distributed among human proteins, random sequences with weighted amino acid frequencies were checked for cleavages.
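Scanning a human protein then amounts to collecting every P1 candidate whose full ten-residue window fits inside the sequence, one-hot encoding it, and scoring it with the ensemble. The sketch below shows only the window extraction with the terminal exclusion described above (at least four residues before P1 and five after it). Scanning both glutamine and histidine P1 residues by default is an assumption based on the training set composition, and the toy sequences are invented.

```python
from typing import Iterator, Tuple

RESIDUES_BEFORE_P1 = 4  # P5, P4, P3, P2
RESIDUES_AFTER_P1 = 5   # P1', P2', P3', P4', P5'


def candidate_windows(seq: str, p1_residues: str = "QH") -> Iterator[Tuple[int, str]]:
    """Yield (0-based P1 index, ten-residue window) for each P1 candidate whose window fits.

    Candidates fewer than 4 residues from the N-terminus or fewer than 5 from the
    C-terminus are skipped, mirroring the terminal exclusion described in the text.
    """
    for i, aa in enumerate(seq):
        if aa not in p1_residues:
            continue
        if i < RESIDUES_BEFORE_P1 or i + RESIDUES_AFTER_P1 >= len(seq):
            continue
        yield i, seq[i - RESIDUES_BEFORE_P1 : i + RESIDUES_AFTER_P1 + 1]


if __name__ == "__main__":
    # Toy sequence: the first Q is too close to the N-terminus, the second fits.
    for site, window in candidate_windows("MQAAAVSKLQAGFKKLL"):
        print(site, window)
```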
In these random sequences, cleavages occurred at 1.28% of glutamines (which make up 4.77% of amino acids [90]), or about once every 1,640 amino acids. Most proteins are shorter than this, so most would be expected to carry fewer than one random cleavage, and randomly distributed cleavages would follow a Poisson distribution; the observed deviation from this distribution indicates that many cleavages are intentional. Protein annotation, classification, and enrichment analyses were performed using the Database for Annotation, Visualization, and Integrated Discovery (DAVID) 6.8. [91, 92] Tissue (UP_TISSUE and UNIGENE_EST_QUARTILE), InterPro, direct Gene Ontology (GO, which includes cellular component (CC), biological process (BP), and molecular function (MF)), Reactome pathway, sequence feature, and keyword annotations were all explored, and only annotations with Benjamini-Hochberg-corrected p-values less than 0.05 were considered statistically significant. Both enriched and depleted (no cleavages) annotations are listed in Tables S2-S10, and my training data, prediction methods, and results can be found on GitHub (https://github.com/Luke8472NN/NetProtease). Enrichment and depletion analyses are often used to probe the importance of annotations in many disease states, yet quantification is not possible without experimentation. Table 2 summarizes cleavages within, and hypotheses about, noteworthy pathways; however, many caveats exist. First, if a protein is central to a pathway, a single cleavage may be all that is required to generate equivalent downstream outcomes. Cleaved proproteins such as coagulation factors or complement proteins may even be activated by 3CLpro cleavage. Additional exhaustive analysis or inclusion of some measure of centrality is required to determine whether pathways that are not significantly enriched or depleted are still affected at central nodes (i.e., false negatives).
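DAVID reports its own modified Fisher exact (EASE) p-values alongside Benjamini-corrected values, so the following is only an illustration of the Benjamini-Hochberg step-up adjustment applied to a list of raw p-values, using the 0.05 cutoff above. The function names and the example p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Flag annotations whose adjusted p-value is below alpha."""
    return [q < alpha for q in benjamini_hochberg(pvalues)]


if __name__ == "__main__":
    raw = [0.001, 0.01, 0.02, 0.2, 0.6]  # hypothetical per-annotation p-values
    print([round(q, 4) for q in benjamini_hochberg(raw)])
    print(significant(raw))
```

The correction matters here because thousands of tissue, GO, InterPro, Reactome, and keyword annotations are tested at once; without it, many annotations would pass an uncorrected 0.05 threshold by chance alone.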
Second, cleavage kinetics depend on the protease, [56] substrate sequence, [76, 83, 93-95] substrate truncation, [96] pH, temperature, inhibitor type and concentration, and time after infection, which converts this classification problem into a regression problem. Cleavage rates among the 11 cleavages per pp1ab vary by at least 50-fold and are uncorrelated with the scores from the classifier described here, so these predictions assume that 3CLpro exists in high enough concentrations and for long enough that rate constants do not matter because cleavage reactions are complete. Third, longer proteins are more likely to be randomly cleaved and may confound conclusions about annotations containing them. Cleavages in longer proteins (e.g., cytoskeletal or cell-cell adhesion components) are no less important than those in shorter sequences, and annotations containing proteins whose multiple cleavages deviate from the Poisson expectation are more likely explained by highly conserved sequences than by protein length alone. Lastly, convergent evolution within the host may also result in false positives, which may be partially avoided by investigating correlations between domains, motifs, repeats, compositionally biased regions, or other sequence or structural similarities and other functional and ontological annotations. Ideally, a negative control proteome from an uninfectible species could prevent false positives, but coronaviruses are extremely zoonotic. Here, depletions in the human proteome are taken to be negative controls. Comparison with a bat proteome with deficiencies in many immune pathways, however, may show which human cleavages are unintentional or exerted little or no selective pressure before cross-species transmission.
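The random-sequence baseline can be turned into a length-aware null expectation. Assuming random cleavages arise independently at the reported per-residue rate (1.28% of glutamines × 4.77% glutamine frequency, roughly one per 1,640 residues, glutamine sites only), the number of random sites in a protein of length L is approximately Poisson with mean L/1,640. The sketch below computes that mean and the upper-tail probability of seeing at least k sites; the function names are hypothetical, and this is not presented as the paper's own test.

```python
import math

# Per-residue random cleavage rate reported for weighted random sequences:
# 1.28% of glutamines cleaved x 4.77% glutamine frequency (about 1 site per ~1,640 residues).
RATE_PER_RESIDUE = 0.0128 * 0.0477


def expected_sites(length: int, rate: float = RATE_PER_RESIDUE) -> float:
    """Poisson mean (lambda) for a protein of the given length."""
    return length * rate


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events under Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def prob_at_least(k: int, length: int, rate: float = RATE_PER_RESIDUE) -> float:
    """P(X >= k) for X ~ Poisson(length * rate): chance of >= k sites arising at random."""
    lam = expected_sites(length, rate)
    return 1.0 - sum(poisson_pmf(j, lam) for j in range(k))


if __name__ == "__main__":
    for length in (400, 1640, 5000):
        print(length, round(prob_at_least(1, length), 3), round(prob_at_least(3, length), 4))
```

Under this null, a 400-residue protein has only about a one-in-five chance of carrying even one random site, so annotations dominated by shorter proteins with several predicted sites are hard to explain by length alone, while long cytoskeletal proteins need the length-aware comparison.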
As expected, the most significant tissue enrichment of 3CLpro cleavages is in the epithelium, but central and peripheral nervous tissues are also affected owing to their similar expression and enrichment of complex structural and cell junction proteins. It is noteworthy that major proteins associated with multiple neurodegenerative diseases (Alzheimer's disease, Parkinson's disease, amyotrophic lateral sclerosis, and spinocerebellar ataxia type 1) are also predicted to be cleaved. Testis has somewhat similar expression to epithelium and brain, highly expresses ACE2, and is enriched in movement/motility-related (a subset of structural proteins) and meiosis-related (chromosome segregation) proteins, further increasing the likelihood that this tissue is infectible. Spleen, however, does not express much ACE2, and its enrichment is likely due to genes with immune function and mutagenesis sites. Proteins with greater tissue specificity (3rd quartile) show additional enrichments along the respiratory tract (tongue, pharynx, larynx, and trachea), in immune tissues (lymph node and thymus), and in other sensory tissues (eye and ear). Across all tissues, tobacco use disorder is the only significantly enriched disease, whereas acquired immunodeficiency syndrome (AIDS) and atherosclerosis are surprisingly depleted. Cleavages are also surprisingly depleted in olfactory and gustatory pathways given the virus's ability to infect related cells and to present as anosmia and ageusia. Olfactory receptors are transmembrane rhodopsin-like G protein-coupled receptors that, when bound to an odorant, stimulate production of cAMP via the G protein and adenylate cyclase. The G proteins GNAL and GNAS are not cleaved, and some but not all adenylate cyclases are cleaved, likely resulting in an increase in cAMP. cAMP is mainly used in these cells to open their respective ligand-gated ion channels and cause depolarization, but it is also known to inhibit inflammatory responses through PKA and EPAC.
Multiple cAMP-degrading PDEs are cleaved, but PDE4, the major PDE in inflammatory and immune cells, is not. PDE4 inhibitors have been shown to reduce destructive respiratory syncytial virus-induced inflammation in the lung, [96] but olfactory receptor neurons are quickly regenerated and sacrifice themselves when infected by influenza A virus. [98] The depletion of cleavages and resulting increase in cAMP in these neurons likely inhibit their programmed cell death long enough for the virus to be transmitted through the glomeruli to mitral cells and the rest of the olfactory bulb. Tongue infection may involve similar mechanisms, and herpes simplex virus has been shown to be transmitted to the brainstem through the facial and trigeminal nerves. [99]

Gene Ontology

Cleaved proteins are depleted in the extracellular space (except for structural collagen, laminin, and fibronectin mainly associated) and enriched in the cytoplasm and many of its components, indicating that the selective pressure for cleavage is weaker once cells are lysed and the protease is released. In the cytoplasm, the most obviously enriched sets are in the cytoskeleton, motor proteins, cell adhesion molecules, and relevant Ras GTPases, particularly in microtubule organizing centers (MTOCs) including centrosomes, an organelle central to cell cycle pathways including sister chromatid segregation. More specifically, cleavage of cilia-associated proteins may contribute to the ciliary dyskinesia and reduced mucociliary escalator effectiveness associated with many respiratory viruses, including HCoV-229E and SARS, and their resulting bacterial pneumonias. [100, 101] Additionally, ciliary dysfunction in olfactory cells in COVID-19 leads to anosmia, although the main reported mechanism is the nsp13 (helicase/triphosphatase)-centrosome interaction. [102] Coiled coils account for many of these cleavages and are primarily expressed in corresponding cellular compartments in the epithelium, testis, and brain.
Only the coronavirus nsp1, nsp13, and spike proteins have so far been shown to interact with the cytoskeleton, [103] [104] [105] although many other viruses, including influenza A virus, [106] herpes simplex virus, rabies virus, vesicular stomatitis virus, and adeno-associated virus, [107] also modulate the cytoskeleton. [108] In neurons, this allows axonal and trans-synaptic transport of viruses, which can often be inhibited but sometimes enhanced by cytoskeletal drugs commonly used in oncology. [109] [110] [111] [112] Modulation of these structural and motor proteins is required for formation of the double-membrane vesicles surrounding replicase complexes [113, 114] and for egress. Also required for vesicular transport, the coatomer COPI, clathrin, and caveolae pathways are untouched by 3CLpro, but COPII components are likely cleaved because of their function in selecting cargo [115, 116] and their contribution to membrane curvature, which prevents inward nucleocapsid engulfment. [117] Cleavage of many adaptor subunits, which often target cargo for degradation, leaves only the poorly characterized AP4 or other unknown pathways to handle egress. Modulators of any of these vesicle trafficking pathways may be effective treatments for COVID-19. The nucleus is enriched because its nuclear localization signals and scaffolding proteins are cleaved. Additionally, many nuclear pore complex proteins and importins/exportins associated with RNA transport are also cleaved. Lamins, which are cleaved by caspases during apoptosis to allow chromosome detachment and condensation, are also cleaved by 3CLpro. Chromatin-remodeling proteins, including HATs (often containing bromodomains), HDACs, SMC proteins (also containing coiled coils), separase, and topoisomerase III alpha, are cleaved, but CTCF and the other topoisomerases are not, complicating the effects on chromosome condensation and global gene expression.
HDAC inhibitors have been shown to decrease or increase virulence depending on the virus, [118] [119] [120] [121] [122] and some but not all DNA methyltransferases and demethylases are cleaved, further complicating these effects. Viruses benefit from preventing programmed cell death and its corresponding chromosomal compaction in response to viral infection (pyknosis), but they also attempt to reduce host transcription by condensing chromosomes and to reroute translation machinery toward their own open reading frames. [123, 124] Relatedly, 28S rRNA has been shown to be cleaved by murine coronavirus, and ribosomes with altered activity are likely redirected from host to viral RNAs. [125] Cleavages in ribosomal proteins are depleted here, likely because ribosomes are required for viral translation, but the few ribosomal proteins that are cleaved tend to be more represented in monosomes than in polysomes, [126] indicating that ribosomes that initiate faster than they elongate are preferred because they likely frameshift more frequently, allowing control of the stoichiometric ratio of pp1a to pp1ab. [127] Even if slower ribosomes are not directly more likely to frameshift, they are still less likely to participate in frameshift-induced traffic jams, collision-stimulated translation abortion and splitting, [128] and subsequent 60S subunit obstruction sensing and nascent-chain ubiquitylation, which is especially noteworthy because multiple proteins involved in this quality control are predicted to be cleaved. [129] The ribosome-associated signal recognition particle (SRP) subunits SRP68 and SRP72 are also predicted to be cleaved, and the uncleaved SRP9 and SRP14 are known to encourage translation elongation arrest to allow translocation, including transmembrane domain insertion (e.g., of the coronavirus envelope protein), and have been associated with frameshifts.
[130] [131] [132] In fact, frameshifting is a highly enriched keyword in cleaved proteins, mainly because of endogenous retroviral (ERV) elements, some of which can activate an antiviral response via pattern recognition receptors (PRRs). [133] Some also resemble reverse transcriptases and may, like the CRISPR system in prokaryotes, be capable of copying coronavirus genomic RNA to produce an RNAi response via the similarly cleaved DICER and AGO. [134] If the latter is true, individuals with distinct ERV alleles and loci may respond differently to SARS-CoV-2 infection and/or treatment, especially exogenous RNAi. Lastly, ribosomal proteins are also included in the nonsense-mediated decay (NMD) pathway, which is likely depleted in cleavages because NMD has been shown to be a host defense against coronavirus genomic and subgenomic RNAs with multiple ORFs and large 3' UTRs. [135] It was also shown that the nucleocapsid protein inhibits this degradation but often cannot protect newly synthesized RNAs early in infection. The selective pressure on 3CLpro may be reversed by this nucleocapsid inhibition and by the preferential degradation of host mRNAs, such that host resources can again be directed toward viral translation. In addition to affecting large organelles, 3CLpro is predicted to cleave all known components of the vault complex. Vault function has not been completely described, but vaults have known interactions with other viruses. [136] [137] [138] TERT, which is associated with the vault protein TEP1, is also cleaved but is more frequently reported to be activated by other viral infections and/or to promote oncogenesis. [139] Other common viral process proteins are enriched in the epithelium and adaptive immune cells, and those that are cleaved may affect the heat shock response and small RNA processing. Lactoferrin, an antiviral protein that is upregulated in SARS infection, [140] is also cleaved, although one of its fragments, lactoferricin, has known antiviral activity.
[141] Many PRRs, their downstream effectors, related pathways (PI3K/AKT/mTOR, MAPK, and nitric oxide synthesis, where nitric oxide has conflicting effects on viral infection [142, 143]), and transcription factors are cleaved, yet no interferons or interferon receptors are cleaved, likely because of their redundancy. Downstream of interferon, however, multiple STATs and ISGs are cleaved. Finally, complicating the effects of infection on apoptosis, cleavages exist in both pro-apoptotic caspases and the anti-apoptotic Bcl-2 and inhibitors of apoptosis. Lipoproteins are a depleted keyword, but multiple apolipoproteins, lipid transfer proteins, and their receptors are predicted to be cleaved and, other than the pro-apoptotic APOL1, [144] are associated with chylomicrons, VLDL, and LDL as opposed to HDL, indicating that lipoproteins may contribute to the correlations between COVID-19 symptom severity, dyslipidemia, and cardiovascular disease. It was recently discovered that the SARS-CoV-2 spike protein binds cholesterol, allowing it to associate with HDL and reducing HDL serum concentration. These findings, combined with the 3CLpro cleavages, suggest an opportunity for HDL receptor inhibitor treatment, especially antagonists of the uncleaved scavenger receptor SR-B1. [145] Cleavage of the adipokines leptin and IL-6 and of the leptin receptor provides a mechanism for COVID-19 comorbidity with obesity independent of lipoproteins and suggests another potential treatment: anti-leptin antibodies. [146, 147] Ubiquitinating and deubiquitinating enzymes (DUBs) are most enriched in the epithelium and the nucleus, and cleavages exist in E3 ubiquitin ligases such as NEDD4, E3-supporting cullins, and DUBs such as proteasomal base and lid subunits, but not in ubiquitin itself. NEDD4 has been shown to enhance influenza infectivity by inhibiting IFITM3 [148, 149] and Japanese encephalitis virus infectivity by inhibiting autophagy, [150] but its ubiquitination of many diverse human viruses promotes their egress.
IFITMs generally have antiviral activity (against other viruses including HIV-1, [151] dengue virus, [152] and filoviruses [153]), but their use as a treatment for COVID-19 should be carefully considered given their varying effects among other coronaviruses. [154, 155] SARS-CoV-2 has two probable NEDD4 binding sites: the proline-rich, N-terminal PPAY and LPSY [156] in the spike protein and nsp8, respectively. Although the former sequence is APNY and is likely not ubiquitinated in SARS-CoV, small molecule drugs targeting this interaction or related kinases may be useful treatments for COVID-19, as they have been for other RNA viruses. [157] [158] [159] Further research is required to compare these cleavages to the PLpro deubiquitinating activity and to the specificity and function of distinct ubiquitin and other ubiquitin-like protein linkage sites. [160, 161] Helicases make up approximately 1% of eukaryotic genes and are enriched in cleavages, with many containing RNA-specific DEAD/DEAH boxes. Most viruses except for retroviruses have their own helicase (nsp13 in SARS-CoV-2), and multiple human RNA helicases have been shown to sense viral RNA or enhance viral replication. [162] [163] [164] SARS nsp13 and nsp14 have been shown to be enhanced by the uncleaved human DDX5 and DDX1, respectively; [165, 166] however, subunits of the antiviral, RNA-degrading SKI and NEXT complexes and the catalytic subunit of the interacting exosome complex are cleaved. Additionally, DHX36 cleavage may be motivated by its importance in dsRNA sensing when complexed with DDX1 and DDX21, signaling through the similarly cleaved TRIF to type I interferons. [167] The remaining cleaved DEAD/DEAH-box helicases tend to interact with RIG-I-like receptor dsRNA sensing or are involved in ribosome biogenesis or translation initiation. Their varying proviral and antiviral activities make recommending possible therapeutic targets impossible without further characterization.
[168] The coagulation cascade contains many predicted cleavages (coagulation factors II, III, VIII (also an acute-phase protein secreted in response to infection), XII, and XIII, plasmin(ogen), von Willebrand factor, plasma kallikrein, kininogen-1, and fibronectin), but it is not trivial to predict whether these cleavages are similar enough to those in the normal pathway to be activating or inhibiting, even though 3CLpro is structurally similar to factors IIa and Xa. [169] Additionally, multiple cleaved serpin suicide protease inhibitors (PAI-2, megsin, and A1AT, as well as the less relevant angiotensinogen, PZI, CBG, LEI, and HSP47) are related to coagulation, hinting that 3CLpro may increase both thrombosis and fibrinolysis rates or have dose-dependent effects. [170, 171] Angiotensinogen is, however, unrelated to coagulation and is cleaved far from its N-terminus, so its effects on the renin-angiotensin system remain unknown. The structurally similar A2M has a predicted cleavage outside its protease bait region; however, introducing the missense mutation Q694S would allow cleavage at the same site as factor XIII without reducing protease-trapping ability as much as large deletions do. [172, 173] Additional support for this potential exogenous replacement includes the presence of serine at the same position in PZP, which shares 71% identity with A2M and contains a neighboring GAG site resembling known PLpro cleavages in its primary bait region. Most other antiproteases, however, are too small to have many potential cleavage sites, even though they are a very important response to respiratory virus infection. Serpin or alpha globulin replacement therapy, or treatment with modified, small competitive inhibitors of 3CLpro, may be useful for COVID-19. [174] In addition to coagulation factors, the complement system can induce expulsion of neutrophil extracellular traps (NETs) intended to bind and kill pathogens.
[175] NETs, however, simultaneously trap platelets expressing tissue factor and contribute to hypercoagulability. The complement pathway is not obviously enriched, but many central proteins (C1, C3, C4, and C5) are cleaved or have cleaved subunits, indicating viral adaptation to the classical, alternative, and likely lectin pathways. [176] [177] [178] Neutrophilia and NET-associated host damage are known to occur in severe SARS-CoV-2 infection, so inhibitors of the pathway are currently in clinical trials: inhibitors of histone citrullination, neutrophil elastase, and gasdermin D to prevent NET release, and DNases to degrade chromatin after release. [179, 180] Complement inhibition would likely similarly reduce the risks of hypercoagulability and other immune-mediated inflammation associated with COVID-19, but its effects may vary widely with sex and age. [181, 182] Redox-active centers, including proteins involved in selenocysteine synthesis, are also depleted in cleavages, likely because of their involvement in avoiding cell death and in the innate immune response. Respiratory viruses differentially modulate redox pathways, balancing lysis-enhanced virion proliferation against the interferon response induced by DUOX2-derived reactive oxygen species (ROS). [183] In addition to the depletion of cleavages in antioxidant proteins, the cleavage of DUOX1, NOX5, and XO, the former of which are upregulated in chronic obstructive pulmonary disease (COPD) [184], suggests that coronaviruses tend to reduce oxidative stress in infected cells, contrary to most COVID-19 symptoms. Given the diversity of responses to respiratory virus infections, each proposed antioxidant should be thoroughly evaluated before being recommended as a treatment for COVID-19. The impact of post-translational modifications on viral protease cleavage frequency remains uncharacterized.
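Both the proposed A2M Q694S change and the open question of modifications come down to whether a local sequence change creates or destroys a recognizable site. The sketch below is not the neural network used in this work; it assumes a crude, commonly cited rule of thumb for the canonical 3CLpro site (P2 Leu, Phe, Met, or Val; P1 Gln; P1' Ser, Ala, or Gly), which misses many real sites. The peptide and the helper names `candidate_sites` and `substitute` are hypothetical and chosen only to show how a single substitution can create a match.

```python
import re

# Crude rule-of-thumb pattern (an assumption, not the paper's model):
# P2 in {L, F, M, V}, P1 = Q, P1' in {S, A, G}. The zero-width match sits
# on the scissile bond between P1 and P1'.
CANONICAL_SITE = re.compile(r"(?<=[LFMV]Q)(?=[SAG])")


def candidate_sites(seq: str) -> list[int]:
    """Return 1-based positions of P1 (the Q) for every canonical match."""
    # m.start() is the 0-based index of P1', which equals the 1-based index of P1.
    return [m.start() for m in CANONICAL_SITE.finditer(seq)]


def substitute(seq: str, mutation: str) -> str:
    """Apply a point substitution written like 'N5S' (1-based position)."""
    wt, pos, mut = mutation[0], int(mutation[1:-1]), mutation[-1]
    if not 1 <= pos <= len(seq):
        raise ValueError(f"position {pos} is outside the sequence")
    if seq[pos - 1] != wt:
        raise ValueError(f"expected {wt} at position {pos}, found {seq[pos - 1]}")
    return seq[: pos - 1] + mut + seq[pos:]


peptide = "AVLQNTGK"  # hypothetical peptide: LQ followed by N, so no canonical site
print(candidate_sites(peptide), candidate_sites(substitute(peptide, "N5S")))
```

A position-specific model such as a trained classifier would replace the regular expression, but the substitution step stays the same: mutate, rescan, and compare.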
Glutamine and leucine, the two most important residues in the cleavage sequence logo, are rarely modified, but serine, the next most important residue, is the most frequently phosphorylated amino acid. Keyword analysis showed enrichment of phosphoproteins and depletion of disulfide-crosslinked, lipid-anchored, and other transmembrane proteins. Lastly, the keywords polymorphism and alternative splicing were enriched, indicating that additional variability between cell lines and between individuals is likely. Once health systems are no longer so burdened by the number of cases and multiple treatments have been developed, personalized interventions will likely differ significantly between individuals. Many expected and novel protein annotations were found to be enriched or depleted in cleavages, indicating that 3CLpro is a much more important virulence factor than previously believed. 3CLpro cleavages are enriched in the epithelium (especially along the respiratory tract), brain, testis, plasma, and immune tissues and depleted in olfactory and gustatory receptors. Affected pathways with discussed connections to viral infection include cytoskeleton/motor/cell adhesion proteins, nuclear condensation and other epigenetics, host transcription and RNAi, coagulation, pattern recognition receptors, growth factors, lipoproteins, redox, ubiquitination, and apoptosis. These pathways point toward many potential therapeutic mechanisms against COVID-19: cytoskeletal drugs frequently used against cancer, modulators of ribosomal stoichiometry to enrich monosomes, upregulation of DICER1 and AGO1/2, exogenous lactoferrin and modified antiproteases including alpha globulins, upregulation of serpins potentially via dietary antioxidants, complement inhibition, reduction of LDL and inhibition of HDL receptors (e.g. by antagonizing SR-B1), anti-leptin antibodies, and downregulation of NEDD4 or related kinases together with upregulation of IFITMs.
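The relative importance of glutamine, leucine, and serine is read off the letter-stack heights of a sequence logo, which plot for each aligned position the information content R_i = log2(20) - H_i, where H_i is the Shannon entropy of the residue frequencies in column i. The sketch below assumes equal-length P4 to P1' windows; the windows are invented for illustration, and, unlike corrections that logo tools such as WebLogo can apply, it uses no small-sample correction, so a handful of windows inflates every value.

```python
from collections import Counter
from math import log2


def information_content(windows: list[str], alphabet_size: int = 20) -> list[float]:
    """Per-column information content (bits) of equal-length aligned windows.

    R_i = log2(alphabet_size) - H_i, where H_i is the Shannon entropy of column i.
    No small-sample correction is applied.
    """
    if not windows:
        raise ValueError("need at least one window")
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    n = len(windows)
    bits = []
    for i in range(length):
        counts = Counter(w[i] for w in windows)
        entropy = -sum((c / n) * log2(c / n) for c in counts.values())
        bits.append(log2(alphabet_size) - entropy)
    return bits


# Hypothetical P4-P3-P2-P1-P1' windows, invented for illustration only.
windows = ["AVLQS", "TKLQA", "SRLQS", "ATLQG", "PAFQS", "VELQA"]
for label, r in zip(["P4", "P3", "P2", "P1", "P1'"], information_content(windows)):
    print(f"{label}: {r:.2f} bits")
```

In this toy set the invariant P1 glutamine reaches the maximum of log2(20) bits, the mostly leucine P2 column comes next, and the P1' column, dominated by serine, follows.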
Pathway components with more complex disruption that may also yield therapeutic targets but require experimental elucidation include PDEs, histone acetylation, nitric oxide, and vesicle coatomers. It is also worth further investigating how, if at all, 3CLpro contributes to the correlations between obesity and severity of infection or to viral induction of autoimmune and potentially oncological conditions. Expanding the training dataset to the whole order Nidovirales or class Pisoniviricetes may provide more diversity to improve classification methods, provided that additional protease/cleavage coevolution does not invalidate the assumption of cross-reactivity. Issues requiring in vitro and in vivo experimentation include characterization of cleavage kinetics, any functional differences between proteases, the molecular effects of post-translational modifications, and the individual and population effects of polymorphisms in cleavage sequences on susceptibility to or severity of infection. Even though many caveats remain without experimentation, similar prediction, enrichment/depletion analysis, and therapeutic target identification should be performed for every other viral protease.
Supplementary Information S1
Olfactory and gustatory dysfunction as a clinical presentation of mild to moderate forms of COVID-19: A multicenter European study.
Evidence of the COVID-19 virus targeting the CNS: Tissue distribution, host-virus interaction, and proposed neurotropic mechanisms.
Possible central nervous system infection by SARS coronavirus. Emerg Infect Dis.
Severe acute respiratory syndrome coronavirus infection causes neuronal death in the absence of encephalitis in mice transgenic for human ACE2.
The neuroinvasive potential of SARS-CoV2 may play a role in the respiratory failure of COVID-19 patients.
Liver injury in COVID-19: management and challenges.
SARS-associated viral hepatitis caused by a novel coronavirus: Report of three cases.
Epithelial cells lining salivary gland ducts are early target cells of severe acute respiratory syndrome coronavirus infection in the upper respiratory tract of rhesus macaques.
The spleen as a target in severe acute respiratory syndrome.
The novel coronavirus 2019 epidemic and kidneys.
ACE2 expression in kidney and testis may cause kidney and testis infection in COVID-19 patients. Front Med (Lausanne).
Clinical characteristics and intrauterine vertical transmission potential of COVID-19 infection in nine pregnant women: a retrospective review of medical records.
COVID-19 and the cardiovascular system.
Immunopathogenesis of coronavirus infections: implications for SARS.
Multiple organ infection and the pathogenesis of SARS.
Two things about COVID-19 might need attention.
High prevalence of obesity in severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) requiring invasive mechanical ventilation.
Fatty airways: Implications for obstructive disease.
Virus-encoded proteinases and proteolytic processing in the Nidovirales.
The SARS-coronavirus papain-like protease: Structure, function and inhibition by designed antiviral compounds.
An overview of severe acute respiratory syndrome-coronavirus (SARS-CoV) 3CL protease inhibitors: Peptidomimetics and small molecule chemotherapy.
Design of wide-spectrum inhibitors targeting coronavirus main proteases.
Coronavirus main proteinase (3CLpro) structure: Basis for design of anti-SARS drugs.
The papain-like protease determines a virulence trait that varies among members of the SARS-coronavirus species.
The papain-like protease of severe acute respiratory syndrome coronavirus has deubiquitinating activity.
Proteolytic processing, deubiquitinase and interferon antagonist activities of Middle East respiratory syndrome coronavirus papain-like protease.
Crystal structure of the Middle East respiratory syndrome coronavirus (MERS-CoV) papain-like protease bound to ubiquitin facilitates targeted disruption of deubiquitinating activity to demonstrate its role in innate immune suppression.
Severe acute respiratory syndrome coronavirus papain-like protease suppressed alpha interferon-induced responses through downregulation of extracellular signal-regulated kinase 1-mediated signalling pathways.
The papain-like protease of porcine epidemic diarrhea virus negatively regulates type I interferon pathway by acting as a viral deubiquitinase.
The SARS coronavirus papain like protease can inhibit IRF3 at a post activation step that requires deubiquitination activity.
Regulation of IRF-3-dependent innate immunity by the papain-like protease domain of the severe acute respiratory syndrome coronavirus.
Positive selection of a serine residue in bat IRF3 confers enhanced antiviral protection. iScience.
A pneumonia outbreak associated with a new coronavirus of probable bat origin.
Structures of the Middle East respiratory syndrome coronavirus 3C-like protease reveal insights into substrate specificity.
Structures of two coronavirus main proteases: Implications for substrate binding and antiviral drug design.
Structure of coronavirus main proteinase reveals combination of a chymotrypsin fold with an extra α-helical domain.
Porcine deltacoronavirus nsp5 antagonizes type I interferon signaling by cleaving STAT2.
Porcine epidemic diarrhea virus 3C-like protease regulates its interferon antagonism by cleaving NEMO.
SARS-CoV-2 proteases PLpro and 3CLpro cleave IRF3 and critical modulators of inflammatory pathways (NLRP12 and TAB1): implications for disease presentation across species. Emerg Microbes Infect.
The genome organization of the Nidovirales: Similarities and differences between arteri-, toro-, and coronaviruses. Sem Virol.
Identification and characterization of Iflavirus 3C-like protease processing activities.
Calicivirus 3C-like proteinase inhibits cellular translation by cleavage of poly(A)-binding protein.
Substrate requirements of human rhinovirus 3C protease for peptide cleavage in vitro.
Cleavage of synthetic peptides by purified poliovirus 3C proteinase.
Expression of virus-encoded proteinases: Functional and structural similarities with cellular enzymes.
Foot-and-mouth disease virus protease 3C inhibits cellular transcription and mediates cleavage of histone H3.
Foot-and-mouth disease virus protease 3C induces specific proteolytic cleavage of host cell histone H3.
An RNA polymerase II transcription factor inactivated in poliovirus-infected cells copurifies with transcription factor TFIID.
A transcriptionally active form of TFIIIC is modified in poliovirus-infected HeLa cells.
Poliovirus proteinase 3C converts an active form of transcription factor IIIC to an inactive form: A mechanism for inhibition of host cell polymerase III transcription by poliovirus.
Direct cleavage of human TATA-binding protein by poliovirus protease 3C in vivo and in vitro.
DNA binding domain and subunit interactions of transcription factor IIIC revealed by dissection with poliovirus 3C protease.
Poliovirus infection results in structural alteration of a microtubule-associated protein.
Poliovirus protease 3C mediates cleavage of microtubule-associated protein 4.
Mechanisms and enzymes involved in SARS coronavirus genome expression.
Profiling of substrate specificities of 3C-like proteases from group 1, 2a, 2b, and 3 coronaviruses. PLOS One.
Prediction of proteinase cleavage sites in polyproteins of coronaviruses and its applications in analyzing SARS-CoV genomes.
Mining SARS-CoV protease cleavage data using non-orthogonal decision trees: a novel method for decisive template selection.
Coronavirus 3CLpro proteinase cleavage sites: Possible relevance to SARS virus pathology.
Cleavage site analysis in picornaviral polyproteins: Discovering cellular targets by neural networks. Protein Science.
Mining viral protease data to extract cleavage knowledge.
Prediction of HIV-1 protease cleavage site using a combination of sequence, structural, and physiochemical features.
Predicting human immunodeficiency virus protease cleavage sites in nonlinear projection space.
Why neural networks should not be used for HIV-1 protease cleavage site prediction.
Comprehensive bioinformatic analysis of the specificity of human immunodeficiency virus type 1 protease.
Artificial neural network model for predicting HIV protease cleavage sites in protein.
The importance of physiochemical characteristics and nonlinear classifiers in determining HIV-1 protease specificity.
A comparison of machine learning algorithms for the prediction of hepatitis C NS3 protease cleavage sites.
UniProt: a worldwide hub of protein knowledge.
Description and initial characterization of metatranscriptomic nidovirus-like genomes from the proposed new family Abyssoviridae, and from a sister group to the Coronavirinae, the proposed genus Alphaletovirus. Virology.
Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega.
A new bioinformatics analysis tools framework at EMBL-EBI.
Analysis tool web services from the EMBL-EBI.
WebLogo: A sequence logo generator.
Prediction and biochemical analysis of putative cleavage sites of the 3C-like protease of Middle East respiratory syndrome coronavirus.
Visualizing data using t-SNE.
Scikit-learn: Machine Learning in Python.
A vector projection approach to predicting HIV protease cleavage sites in proteins.
CorreLogo: an online server for 3D sequence logos of RNA and DNA alignments.
Medical image registration using mutual information.
Sequence Bundles: a novel method for visualizing, discovering and exploring sequence motifs.
Substrate specificity profiling and identification of a new class of inhibitor for the major protease of the SARS coronavirus.
In silico analysis of ORF1ab in coronavirus HKU1 genome reveals a unique putative cleavage site of coronavirus HKU1 3C-like protease.
Structural basis and functional analysis of the SARS coronavirus nsp14-nsp10 complex.
Atlas of coronavirus replicase structure.
Proteolytic processing of polyproteins 1a and 1ab between non-structural proteins 10 and 11/12 of Coronavirus infectious bronchitis virus is dispensable for viral replication in cultured cells.
Cross-validation for imbalanced datasets: Avoiding overoptimistic and overfitting approaches.
Deep ensembles: A loss landscape perspective.
Proteome-pI: proteome isoelectric point database.
Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources.
Bioinformatics enrichment tools: Paths toward the comprehensive functional analysis of large gene lists.
Biosynthesis, purification, and substrate specificity of severe acute respiratory syndrome coronavirus 3C-like proteinase.
Conservation of substrate specificities among coronavirus main proteases.
Evaluating the 3C-like protease activity of SARS-coronavirus: Recommendations for standardized assays for drug discovery.
The substrate specificity of SARS coronavirus 3C-like proteinase.
Type 4 phosphodiesterase inhibitors attenuate respiratory syncytial virus-induced airway hyper-responsiveness and lung eosinophilia.
Olfactory receptor neurons prevent dissemination of neurovirulent influenza A virus into the brain by undergoing virus-induced apoptosis.
Invasion of cranial nerves and brain stem by herpes simplex virus inoculated into the mouse tongue.
The effects of coronavirus on human nasal ciliated respiratory epithelium.
First contact: the role of respiratory cilia in host-pathogen interactions in the airways.
COVID-19, cilia, and smell.
A SARS-CoV-2 protein interaction map reveals targets for drug repurposing.
Porcine hemagglutinating encephalomyelitis virus activation of the integrin α5β1-FAK-cofilin pathway causes cytoskeletal rearrangement to promote its invasion of N2a cells.
Tubulins interact with porcine and human S proteins of the genus Alphacoronavirus and support successful assembly and release of infectious viral particles. Virol.
Actin and RIG-I/MAVS signaling components translocate to mitochondria upon influenza A virus infection of human primary macrophages.
The role of the cytoskeleton during viral infection. Curr Top Microbiol.
Microtubule regulation and function during virus infection.
Neuritic transport of herpes simplex virus in rat sensory neurons in vitro. Effects of substances interacting with microtubular function and axonal flow.
Effect of inhibitors that destroy cytoskeleton structures on the antiviral and antiproliferative activity of interferons.
Discovery of novel small-molecule inhibitors of LIM domain kinase for inhibiting HIV-1.
Disruption of the actin cytoskeleton can complement the ability of Nef to enhance HIV-1 infectivity.
Double-membrane vesicles as platforms for viral replication.
Does form meet function in the coronavirus replicative organelle? Trends in Microbiol.
Cargo selection into COPII vesicles is driven by the Sec24p subunit.
Structural basis of cargo membrane protein discrimination by the human COPII coat machinery.
Structure of the Sec13/31 COPII coat cage.
Influenza A virus dysregulates host histone deacetylase 1 that inhibits viral infection in lung epithelial cells.
HDAC6 restricts influenza A virus by deacetylation of the RNA polymerase PA subunit.
Histone deacetylase inhibitors potentiate vesicular stomatitis virus oncolysis in prostate cancer cells by modulating NF-κB-dependent autophagy.
Histone deacetylase inhibitors suppress RSV infection and alleviate virus-induced airway inflammation.
Histone deacetylase inhibitors increase virus gene expression but decrease CD8+ cell antiviral function in HTLV-1 infection.
To kill or be killed: how viruses interact with the cell death machinery.
Mitotic transcription repression in vivo in the absence of nucleosomal chromatin condensation.
RNase L-independent specific 28S rRNA cleavage in murine coronavirus-infected cells.
Differential stoichiometry among core ribosomal proteins.
Achieving a golden mean: Mechanisms by which coronaviruses ensure synthesis of the correct stoichiometric ratios of viral proteins.
Inverted translational control of eukaryotic gene expression by ribosome collisions.
Mechanisms and functions of ribosome-associated protein quality control.
Elongation arrest is not a prerequisite for secretory protein translocation across the microsomal membrane.
Signal recognition particle-dependent insertion of coronavirus E1, an intracellular membrane glycoprotein.
The signal recognition particle receptor alpha subunit assembles cotranslationally on the endoplasmic reticulum membrane during an mRNA-encoding translation pause in vitro.
Human endogenous retroviruses are ancient acquired elements still shaping innate immune responses. Front Immunol.
Viral infection impacts transposable element transcript amounts in Drosophila.
Interplay between coronavirus, a cytoplasmic RNA virus, and nonsense-mediated mRNA decay pathway.
Major vault protein plays important roles in viral infection.
The major vault protein is responsive to and interferes with interferon-γ-mediated STAT1 signals.
Robust expression of vault RNAs induced by influenza A virus plays a critical role in suppression of PKR-mediated innate immunity.
Regulation of telomerase and telomeres: Human tumor viruses take control.
Expression profile of immune response genes in patients with severe acute respiratory syndrome.
Antiviral properties of lactoferrin-A natural immunity molecule.
Harnessing nitric oxide for preventing, limiting and treating the severe pulmonary consequences of COVID-19. Nitric Oxide.
Inducible nitric oxide contributes to viral pathogenesis following highly pathogenic influenza virus infection in mice.
Apolipoprotein L1, a novel Bcl-2 homology domain 3-only lipid-binding protein, induces autophagic cell death.
Cholesterol metabolism-Impact for SARS-CoV-2 infection prognosis, entry, and antiviral therapies. medRxiv.
Obesity, the most common comorbidity in SARS-CoV-2: is leptin the link?
Leptin mediates the pathogenesis of severe 2009 pandemic influenza A (H1N1) infection associated with cytokine dysregulation in mice with diet-induced obesity.
E3 ubiquitin ligase NEDD4 promotes influenza virus infection by decreasing levels of the antiviral protein IFITM3.
mTOR inhibitors lower an intrinsic barrier to virus infection mediated by IFITM3.
E3 ubiquitin ligase Nedd4 promotes Japanese encephalitis virus replication by suppressing autophagy in human neuroblastoma cells. Sci Rep.
IFITM proteins restrict HIV-1 infection by antagonizing the envelope glycoprotein.
IFITM3-containing exosome as a novel mediator for anti-viral response in dengue virus infection.
Distinct patterns of IFITM-mediated restriction of filoviruses, SARS coronavirus, and influenza A virus.
Identification of residues controlling restriction versus enhancing activities of IFITM proteins on entry of human coronaviruses.
Interferon-induced transmembrane protein (IFITM3) is upregulated explicitly in SARS-CoV-2 infected lung epithelial cells. Front Immunol.
Nedd4 and Nedd4-2: closely related ubiquitin-protein ligases with distinct physiological functions.
Small-molecule probes targeting the viral PPxY-host Nedd4 interface block egress of a broad range of RNA viruses.
Crosstalk between kinases and Nedd4 family ubiquitin ligases.
SARS-CoV-2 encodes a PPxY late domain motif that is known to enhance budding and spread in enveloped RNA viruses.
Ubiquitination, ubiquitin-like modifiers, and deubiquitination in viral infection.
Ubiquitin in the immune system.
RNA helicases in infection and disease.
Determination of host RNA helicases activity in viral replication.
Genome-wide comprehensive analysis of human helicases.
The cellular RNA helicase DDX1 interacts with coronavirus nonstructural protein 14 and enhances viral replication.
Interaction between SARS-CoV helicase and a multifunctional cellular protein (Ddx5) revealed by yeast and mammalian cell two-hybrid systems.
DDX1, DDX21, and DHX36 helicases form a complex with the adaptor molecule TRIF to sense dsRNA in dendritic cells.
DEAD-box helicases: the Yin and Yang roles in viral infections.
Coagulation modifiers targeting SARS-CoV-2 main protease Mpro for COVID-19 treatment: an in silico approach.
COVID-19-related severe hypercoagulability in patients admitted to intensive care unit for acute respiratory failure.
Elevated plasmin(ogen) as a common risk factor for COVID-19 susceptibility.
The α-macroglobulin bait region: Sequence diversity and localization of cleavage sites for proteinases in five mammalian α-macroglobulins.
α2-macroglobulin bait region variants: A role for the bait region in tetramer formation.
Respiratory protease/antiprotease balance determined susceptibility to viral infection and can be modified by nutritional antioxidants.
NETosis, complement, and coagulation: a triangular relationship.
The case of complement activation in COVID-19 multiorgan impact.
Complement evasion strategies of viruses: An overview. Front Microbiol.
Mannose binding lectin in severe acute respiratory syndrome coronavirus infection.
Neutrophilia and NETopathy as key pathologic drivers of progressive impairment in patients with COVID-19.
Neutrophil extracellular traps in COVID-19.
Complement inhibition in coronavirus disease (COVID)-19: A neglected therapeutic option. Front Immunol.
Age and sex-associated changes of complement activity and complement levels in healthy Caucasian population. Front Immunol.
Redox biology of respiratory viral infections. Viruses.
Increased cytokine response of rhinovirus-infected airway epithelial cells in chronic obstructive pulmonary disease.

CRediT Authorship Contribution Statement
Lucas Prescott: Conceptualization, Methodology, Data Curation, Software, Validation, Formal Analysis, Writing, Visualization.

I am very grateful to my mother, Victoria Prescott, Esq., and to the friends who have given me invaluable help and advice throughout my work on this project.

The author certifies that there is NO affiliation with or involvement in any organization or entity with any financial or non-financial interest in the subject matter or materials discussed in this manuscript titled "SARS-CoV-2 3CLpro whole human proteome cleavage analysis prediction and enrichment/depletion analysis." This project was conducted in spare time outside my full-time employment and without any funding.

• Gathered a large 3CLpro cleavage dataset.
• Optimized multiple machine learning (ML) models to predict 3CLpro cleavages.
• Applied the best model to the whole human proteome and performed enrichment/depletion analysis.
• Discussed the possible consequences of predicted cleavages in many tissues and cellular pathways and proposed more than a dozen potential therapeutic targets.