Capsule network for protein ubiquitination site prediction

Qiyi Huang1,2¶, Jiulei Jiang3¶, Yin Luo2*, Weimin Li4&, Ying Wang3

1 School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, Ningxia, China
2 School of Life Sciences, East China Normal University, Shanghai 200444, China
3 School of Computer Science and Engineering, Changshu Institute of Technology, Suzhou 215500, Jiangsu, China
4 School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China

* Corresponding author. E-mail: yluo@bio.ecnu.edu.cn (YL)
¶ These authors contributed equally to this work.
& This author also contributed equally to this work.

Copyright: © 2020 Huang et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This project is supported by the National Key R&D Program of China (2018YFE0194000), the National Natural Science Foundation of China (61762002), and the National Statistical Science Research Project (2020LY074).

Competing Interests: The authors have declared that no competing interests exist.

Abstract

Ubiquitination is one of the most important protein posttranslational modifications and is involved in many biological processes. Traditional experimental methods for determining ubiquitination sites are expensive and time-consuming, whereas computational prediction methods can predict ubiquitination sites accurately and efficiently. This study used a convolutional neural network and a capsule network to design a deep learning model, "Caps-Ubi," for multispecies ubiquitination site prediction. Two encoding methods, one-of-K and amino acid continuous encoding, were used to characterize the sequence pattern of ubiquitination sites.
The proposed Caps-Ubi predictor achieved an accuracy of 0.91, a sensitivity of 0.93, a specificity of 0.89, a Matthews correlation coefficient of 0.83, and an area under the receiver operating characteristic curve of 0.96, outperforming the other tested predictors.

bioRxiv preprint doi: https://doi.org/10.1101/2021.01.07.425697; this version posted January 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license (http://creativecommons.org/licenses/by/4.0/).

Introduction

Ubiquitination is an important posttranslational modification of proteins, consisting of the covalent binding of ubiquitin to a variety of cellular proteins. Ubiquitin was discovered in 1975 by Goldstein et al. [1]; it is a small protein composed of 76 amino acids [2]. Ubiquitination is the process of covalently attaching the small ubiquitin molecule to a lysine of a substrate protein under the action of a series of enzymes. Three enzymes are involved in the process: the E1 activating, E2 conjugating, and E3 ligating enzymes. Ubiquitination plays a very important role in basic processes such as signal transduction, cell diseases, DNA repair, and transcription regulation [3–6]. Because of these important biological characteristics, identifying potential ubiquitination sites helps to understand protein regulation and molecular mechanisms. Determining ubiquitination sites with traditional experimental techniques such as mass spectrometry [7] and antibody recognition [8] is costly and time-consuming. Therefore, it is necessary to develop a computational method that can recognize protein ubiquitination sites accurately and efficiently.

In recent years, several computational methods have been developed to predict potential ubiquitination sites. Huang et al.
[9] used amino acid composition (AAC), a position weighting matrix, amino acid pair composition (AAPC), a position-specific scoring matrix (PSSM), and other information to develop a predictor called UbiSite using a support vector machine (SVM). Nguyen et al. [10] used an SVM to combine three kinds of information (AAC, evolutionary information, and AAPC) to develop a predictor. Qiu et al. [11] developed a predictor called "iUbiq-Lys" that applies sequence evolution information and a gray system model. Chen et al. [12] also applied an SVM to build the UbiProber predictor. Wang et al. [13] introduced physical–chemical attributes into an SVM to develop the ESA-UbiSite predictor. Radivojac et al. [14] developed the predictor UbPred using a random forest algorithm. Lee et al. [15] developed UbSite using efficient radial basis functions. All of these machine learning-based predictors have advanced ubiquitination site prediction research and achieved good prediction performance. However, most of them rely on manual feature selection, which may lead to imperfect features [16], and their datasets are small despite the large volume of accumulated biomedical data.

Deep learning can handle large-scale data well. Its multilayer networks and nonlinear mapping operations can fit the complex structure of data. In recent years, deep learning has developed rapidly [16] and has been applied successfully in various fields of bioinformatics [17,18]. Some deep learning methods have been used for ubiquitination site identification. For example, Fu et al. [19] applied one-hot and composition of k-spaced amino acid pairs encoding methods to develop DeepUbi with text-CNN. Liu et al. [20] used deep transfer learning to develop the DeepTL-Ubi predictor for multispecies ubiquitination site prediction. He et al.
[21] established a multimodal predictor using one-hot encoding, physical–chemical properties of amino acids, and a PSSM.

Although various ubiquitination site predictors and tools have been developed, they still have limitations, and their accuracy and other performance measures must be further improved. In this paper, a deep learning model, "Caps-Ubi," is proposed that uses a capsule network for protein ubiquitination site prediction. In Caps-Ubi, the protein fragments are first encoded by the one-of-K and amino acid continuous methods. Then three convolutional layers and a capsule network layer are used as a feature extractor to obtain the functional domains in the protein fragments and, finally, the prediction result. Relative to existing tools, the prediction performance of Caps-Ubi is a significant improvement. Researchers could use the predictor to select candidate ubiquitination sites and verify them experimentally, which will reduce the range of protein candidates and save time.

Materials and methods

Benchmark dataset

The ubiquitination dataset came from the largest online protein lysine modification database, PLMD 3.0, which contains 20 types of protein lysine modifications. The database has 53,501 proteins and 284,780 protein lysine modification sites, including 25,103 proteins and 121,742 ubiquitination sites. To eliminate errors caused by homologous sequences, we used CD-HIT [22] to filter out homologous sequences with sequence similarities greater than 40%.
We obtained 12,100 proteins and 54,586 ubiquitination sites, which were used as the positive sample set. Based on those annotated sequences, 427,305 nonubiquitinated sites were extracted from the proteins as the negative sample set, and CD-HIT-2D [23] was used to filter out sequences whose similarity to the positive sample set exceeded 50%. To establish a balanced training model, we randomly selected as many negative samples as there were positive samples, then used 90% of the data as the training and validation sets and 10% as the independent test set. Finally, 53,999 data on ubiquitination sites and 50,315 data on nonubiquitination sites were obtained. The final data division is shown in Table 1.

Table 1. Data of protein ubiquitination sites

| Dataset | No. of positive data | No. of negative data |
|---|---|---|
| Training | 44,214 | 44,214 |
| Validation | 4,913 | 4,913 |
| Testing | 5,459 | 5,459 |

Input sequence coding

The encoding method directly affects the quality of the prediction results; a good feature can capture the correlation between the ubiquitination characteristics and the targets from peptide sequences [24]. Encoding converts the sequence information of a protein into numerical information, which is then used for deep learning. In this study, two methods were used to encode the amino acid sequence around the protein ubiquitination site: one-of-K encoding and amino acid continuous encoding.
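As a concrete illustration of the two encodings named above (both are detailed in the subsections that follow), here is a minimal NumPy sketch. The alphabet ordering and the function names `one_of_k` and `continuous` are assumptions made for illustration, not the authors' implementation. The five principal-component values per amino acid are not reproduced here; `descriptors` is a caller-supplied table that would be filled in from Table 1 of [25].

```python
import numpy as np

# Assumed symbol order: the 20 common amino acids plus the gap symbol "-".
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def one_of_k(fragment):
    """Encode a fragment of length m as an m x 21 one-hot matrix."""
    encoded = np.zeros((len(fragment), len(ALPHABET)), dtype=np.float32)
    for row, aa in enumerate(fragment):
        encoded[row, INDEX[aa]] = 1.0
    return encoded


def continuous(fragment, descriptors):
    """Encode a fragment of length m as an m x 6 matrix.

    `descriptors` maps each amino acid to its five principal-component
    values (caller-supplied, e.g. from Table 1 of [25]). The sixth column
    is the gap flag: 1 for "-", 0 for a real residue.
    """
    encoded = np.zeros((len(fragment), 6), dtype=np.float32)
    for row, aa in enumerate(fragment):
        if aa == "-":
            encoded[row, 5] = 1.0
        else:
            encoded[row, :5] = descriptors[aa]
    return encoded
```

In this sketch, symbols outside the 21-letter alphabet (for example `X` or `U`) raise a `KeyError`; how the original pipeline treats such residues is not stated in the text.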
One-of-K encoding

The one-of-K encoding method was adopted for protein fragments, and each protein fragment was encoded into an m × k 2D matrix, where m is the number of amino acids in each sequence (that is, the length of the input sequence) and k is the number of amino acid types. There are 20 kinds of common amino acids. When the length of the input sequence did not reach the window length, it was padded with "-" on the left or right side of the protein fragment, and "-" was treated as an additional amino acid type, so the alphabet consisted of 21 symbols.

Continuous coding of amino acids

The continuous amino acid coding method [25] was proposed by Venkatarajan and Braun; it uses 237 physical–chemical properties to characterize the 20 amino acids quantitatively. They used five principal components to capture the variation in the 237 physical–chemical properties of amino acids. In this paper, each amino acid is represented by a 6D vector, in which the first five dimensions are the five principal components shown in Table 1 of [25] and the last dimension represents a gap in the input protein fragment of length m. The gap is represented by a dash, "-": when the sequence length does not reach the window length, the bit at a padded position is coded as 1; otherwise, it is 0. Finally, each protein fragment is coded into an m × 6 2D matrix. This continuous coding scheme takes the physical and chemical properties of amino acids into account and has a smaller dimension than one-of-K coding. The smaller input dimension leads to a relatively simple network structure, which helps to avoid overfitting.

Capsule network

In a CNN, the pooling layer can extract valuable information from the data, but some location information is lost [26].
Also, a CNN outputs scalar values in its neurons, and the information represented by scalar neurons is limited and cannot reflect the spatial relations among the internal features of the neural network. To solve the problems of scalar neurons, in 2017 Hinton and colleagues proposed a deep learning architecture called a capsule network [27]. The main building block of a capsule network is the capsule [28], which is a vector of neurons. The length of the capsule represents the probability that an entity exists (the longer the capsule, the greater the probability), and the direction of the capsule represents the state of the entity. The capsule network provides a unique and powerful deep learning building block that can better model the complex relations within a neural network.

A CNN uses scalar activation functions such as the rectified linear unit (ReLU), sigmoid, and tanh, whereas the capsule network uses a vector activation function called squash:

$$v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \, \frac{s_j}{\|s_j\|} \tag{1}$$

where $v_j$ is the output of capsule $j$ and $s_j$ is the weighted sum of the input vectors of capsule $j$. This function compresses the vector length to the interval [0,1], which can be regarded as a compression and reallocation of the vector length. Except in the first capsule layer, the input $s_j$ of a capsule is the weighted sum of the prediction vectors $\hat{u}_{j|i}$ from the capsules in the lower layer, and each prediction vector $\hat{u}_{j|i}$ is calculated by multiplying the output $u_i$ of a lower-layer capsule by the weight matrix $w_{ij}$:

$$s_j = \sum_i c_{ij} \, \hat{u}_{j|i} \tag{2}$$

$$\hat{u}_{j|i} = w_{ij} \, u_i \tag{3}$$

where $c_{ij}$ is the coupling coefficient, which is obtained from $b_{ij}$ by a softmax transformation:

$$c_{ij} = \frac{\exp(b_{ij})}{\sum_k \exp(b_{ik})} \tag{4}$$

By Eq. (4), the coupling coefficients between capsule $i$ in the previous layer and all capsules in the next layer sum to 1. The coupling coefficients are obtained through a dynamic routing mechanism; the pseudocode is as follows:

procedure ROUTING($\hat{u}_{j|i}$, r, l)
    for all capsules i in layer l and capsules j in layer (l + 1): $b_{ij} \leftarrow 0$
    for r iterations do
        for all capsules i in layer l: $c_i \leftarrow \mathrm{softmax}(b_i)$
        for all capsules j in layer (l + 1): $s_j \leftarrow \sum_i c_{ij} \hat{u}_{j|i}$
        for all capsules j in layer (l + 1): $v_j \leftarrow \mathrm{squash}(s_j)$
        for all capsules i in layer l and capsules j in layer (l + 1): $b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j$
    return $v_j$

The loss function of the capsule network is the margin loss:

$$L_k = T_k \max(0, m^+ - \|V_k\|)^2 + \lambda (1 - T_k) \max(0, \|V_k\| - m^-)^2 \tag{5}$$

where $K$ is the number of categories; $T_k$ is the true label (1 for ubiquitinated, 0 for nonubiquitinated); and $\|V_k\|$ is the output length of the $k$th capsule, which is the predicted probability of the $k$th class. The upper boundary $m^+$ is 0.9 and penalizes false negatives (a true class whose capsule is shorter than $m^+$), and the lower boundary $m^-$ is 0.1 and penalizes false positives (an absent class whose capsule is longer than $m^-$).
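The squash function of Eq. (1), the routing procedure, and the margin loss of Eq. (5) can be sketched together in NumPy. This is a minimal sketch, not the authors' Keras code: the tensor layout `(n_in, n_out, dim)`, the small `eps` added for numerical stability, and all function names are assumptions. The prediction vectors of Eq. (3) are taken as given, and `lam` is the coefficient λ of Eq. (5).

```python
import numpy as np


def squash(s, axis=-1, eps=1e-9):
    """Eq. (1): shrink vector length into [0, 1) while keeping direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def softmax(b, axis):
    """Numerically stable softmax along one axis."""
    e = np.exp(b - b.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def routing(u_hat, r=3):
    """Dynamic routing over prediction vectors of shape (n_in, n_out, dim)."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                    # routing logits b_ij
    for _ in range(r):
        c = softmax(b, axis=1)                     # Eq. (4): sums to 1 over j
        s = np.einsum("ij,ijd->jd", c, u_hat)      # Eq. (2): weighted sum over i
        v = squash(s)                              # Eq. (1)
        b = b + np.einsum("ijd,jd->ij", u_hat, v)  # agreement u_hat . v
    return v


def margin_loss(t, v_len, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Eq. (5), summed over the K classes.

    t is the 0/1 label vector; v_len holds the capsule lengths ||V_k||.
    """
    present = t * np.maximum(0.0, m_plus - v_len) ** 2
    absent = lam * (1.0 - t) * np.maximum(0.0, v_len - m_minus) ** 2
    return float(np.sum(present + absent))
```

Calling `routing` on an array of shape `(n_in, 2, 10)` returns two 10-dimensional output capsules whose lengths can be passed to `margin_loss` as `v_len`.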
$\lambda$ is a proportional coefficient of 0.5 that down-weights the loss for categories that do not appear, which prevents the capsule vector lengths of all categories from shrinking in the early stage of training. The total loss is the sum of the losses of the $K$ categories.

Architecture design

As shown in Figure 1, the proposed model contains two identical subnetworks that process the one-of-21 and amino acid continuous encodings, respectively. After each subnetwork is trained, the features of the two are merged as the final output. Each subnetwork consists of the same three 1D convolutional layers (Conv1, Conv2, Conv3) and a capsule network layer. The first convolutional layer, Conv1, comprises 256 1D convolution kernels with a length of 1 and a stride of 1 and uses the ReLU activation function. A convolution kernel with a length of 1 first appeared in Network in Network [29]; it can reduce the complexity of the model and allows the network to be made deeper and wider. In this study, it acted as a feature filter and could pool features in the two encoding modes. The second convolutional layer, Conv2, is a conventional convolutional layer with 256 1D convolution kernels with a length of 7 and a stride of 1; it functions as a local feature detector that converts the protein sequence input to the corresponding local features.
The output of Conv2 can be understood as the functional domain characteristics of the protein, and it is used as the input of the next layer, Conv3. The third convolutional layer, Conv3, has 256 1D convolution kernels with a length of 11 and a stride of 1. It uses the ReLU activation function and a dropout mechanism with a rate of 0.3. Dropout is used to prevent the model from overfitting and to increase its generalization ability. These two convolutional layers increase the feature representation ability of the capsule network and convert the original features of protein fragments into more advanced and abstract features. The resulting local features are then used as the input of the PrimaryCapsule layer. The dimension of each capsule in the PrimaryCapsule layer is 8, the stride is 1, the convolution kernel length is 20, and the squash activation function is used. The last layer, LabelCapsule, contains capsules with a dimension of 10 that represent the two states of the input protein fragment: ubiquitination site or nonubiquitination site. Finally, the outputs of the two subnetworks are merged as the final prediction result.

Figure 1. Network structure of the proposed model

Model training

For model training, we used the Adam [30] optimization algorithm. Adam can automatically adjust the learning rate of the parameters, improve the training speed, and improve the stability of the model. The learning rate was 0.003, the exponential decay rate of the first-moment estimate was 0.9, and the exponential decay rate of the second-moment estimate was 0.999. The dynamic routing mechanism was consistent with that in the original paper [26], and the number of routing iterations was 3. The margin loss function shown in Eq. (5) was used as the loss function of the model, and the model was trained for 50 epochs. The deep learning framework used by this model was Keras 2.1.4. Keras is a highly modular deep learning framework written in Python that runs on top of backends such as Theano; it supports both CPU and GPU. The programming language was Python 3.5, and the model was trained and tested on a Windows 10 system equipped with an Nvidia RTX 2060 GPU.

Results

Model evaluation and performance indicators

A confusion matrix is a visual display tool used to evaluate the quality of classification models. Each row of the matrix represents the actual condition of the sample, and each column represents the sample condition predicted by the model. There are four values in the matrix: FN is the number of false negatives, FP is the number of false positives, TN is the number of true negatives, and TP is the number of true positives. The following indicators based on the confusion matrix are usually used to evaluate the prediction performance of the model:

$$Sn = \frac{TP}{TP + FN}$$

$$Sp = \frac{TN}{TN + FP}$$

$$Acc = \frac{TP + TN}{TP + TN + FP + FN}$$

$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FN)(TP + FP)(TN + FP)(TN + FN)}}$$

Among them, Sn stands for sensitivity, which evaluates the prediction performance on positive samples; Sp is the specificity, which evaluates the prediction performance on negative samples; Acc is the accuracy, which is the proportion of all samples that are predicted correctly; and MCC is the Matthews correlation coefficient, which is an overall evaluation of the model.
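The four confusion-matrix indicators described above can be computed directly from the counts. The following is a minimal sketch using the standard definitions; the function name and the choice to return 0 for MCC when its denominator is zero are assumptions made for illustration.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Return Sn, Sp, Acc, and MCC from confusion-matrix counts."""
    sn = tp / (tp + fn)                    # sensitivity: recall on positives
    sp = tn / (tn + fp)                    # specificity: recall on negatives
    acc = (tp + tn) / (tp + tn + fp + fn)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report MCC as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}
```

The sketch assumes that the evaluated set contains at least one positive and one negative sample; otherwise Sn or Sp is undefined and the division raises `ZeroDivisionError`.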
The receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) are usually used to evaluate binary classifiers: the larger the AUC value, the better the model performance.

Experimental results

First, we did many experiments on the selection of the window size of protein fragments. Because the correlation information between amino acids had a direct effect on the prediction results, we needed to determine an appropriate window size. Previous studies directly used empirical values such as 21, 33, or 49. However, different data, models, and classifiers tend to have different optimal window sizes [31]. Therefore, a window length n was selected from the range of 21 to 75, and we did a series of experiments with the different window lengths. For each window length, we encoded all training data in the two input modes and trained the respective subnetworks. According to the prediction results on the validation set, we selected the appropriate window size. Figure 2 shows the performance of various window sizes in the one-of-21 and amino acid continuous encoding modes.

Figure 2. Accuracy of the validation set for various window lengths

In Figure 2, the abscissa represents the window length, and the ordinate represents the accuracy of the model. It can be seen from Figure 2 that when the window length was 51, the two encoding modes had the highest accuracy. Therefore, we set the window length of this model to 51.
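The fragments used in these window-length experiments can be produced with a short helper. This is a minimal sketch under one stated assumption: the candidate lysine sits at the centre of an odd-length window (the text says only that short fragments are padded with "-" on the left or right side). The function names are hypothetical.

```python
def extract_window(sequence, site, window=51, pad="-"):
    """Return the fragment of odd length `window` centred on 0-based `site`."""
    if window % 2 == 0:
        raise ValueError("window length must be odd")
    half = window // 2
    left = sequence[max(0, site - half):site]
    right = sequence[site + 1:site + 1 + half]
    # Pad whichever side runs past the end of the protein.
    return (pad * (half - len(left)) + left + sequence[site]
            + right + pad * (half - len(right)))


def lysine_windows(sequence, window=51):
    """Yield (position, fragment) for every lysine (K) in the sequence."""
    for site, residue in enumerate(sequence):
        if residue == "K":
            yield site, extract_window(sequence, site, window)
```

Sweeping `window` over the odd values from 21 to 75 and re-encoding the resulting fragments would reproduce the shape of the experiment described above, although the exact set of window lengths tried is not listed in the text.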
To compare the performance of the model under different encoding schemes, we compared the capsule network with a CNN that had a similar hierarchical structure and the same training set size. The CNN structure replaced only the PrimaryCapsule layer with the Conv3 layer and the LabelCapsule layer with a 128 × 1 fully connected layer. The comparison results are shown in Table 2.

Table 2. Comparison of various coding schemes

| Feature | Model | Acc (%) | Sn (%) | Sp (%) | AUC | MCC |
|---|---|---|---|---|---|---|
| One-of-21 | CapsNet | 89.51 | 93.70 | 85.31 | 0.96 | 0.80 |
| One-of-21 | CNN | 84.93 | 86.39 | 82.93 | 0.93 | 0.70 |
| Amino acid continuous | CapsNet | 90.06 | 91.88 | 88.23 | 0.96 | 0.80 |
| Amino acid continuous | CNN | 83.83 | 85.25 | 82.41 | 0.91 | 0.68 |
| One-of-21 and amino acid continuous | CapsNet | 90.47 | 93.66 | 87.27 | 0.96 | 0.81 |
| One-of-21 and amino acid continuous | CNN | 84.67 | 82.62 | 86.72 | 0.93 | 0.70 |

Acc: accuracy of the model; Sn: sensitivity of the model; Sp: specificity of the model; AUC: area under curve; MCC: Matthews correlation coefficient.

From Table 2, it can be concluded that the capsule network's accuracies were 5.39%, 7.43%, and 6.85% higher in relative terms (4.58, 6.23, and 5.80 percentage points) than those of the CNN under the one-of-21, amino acid continuous, and combined one-of-21 and amino acid continuous encodings, respectively. This indicates that the capsule network, which models hierarchical relations internally, has advantages over the CNN.
Among them, the capsule network performed best under the combined one-of-21 and amino acid continuous encoding modes: the proposed Caps-Ubi model achieved an accuracy, sensitivity, specificity, area under curve, and Matthews correlation coefficient of 91.23%, 93.11%, 89.34%, 0.96, and 0.83, respectively. The proposed Caps-Ubi was obtained from balanced data. The ROC curve of Caps-Ubi on the test set is shown in Figure 3, which shows that the predictions were very close to the real situation.

Figure 3. Receiver operating characteristic curve of Caps-Ubi on the test set

The model was trained on balanced data, whereas in an experimentally verified dataset of ubiquitination and nonubiquitination sites [19] the ratio of positive peptides to negative peptides was 1:8, so we also tested Caps-Ubi using natural-distribution data. The test results are shown in Table 3. According to the test results, the performance was lower than that under the balanced data.

Table 3. Results of testing Caps-Ubi under natural-distribution data

| Protein fragments | Acc (%) | Sn (%) | Sp (%) | AUC | MCC | Positive–negative ratio |
|---|---|---|---|---|---|---|
| 1,000 | 53.75 | 0.08 | 0.99 | 0.70 | 0.19 | 1:8 |
| 10,000 | 53.30 | 0.12 | 0.95 | 0.59 | 0.12 | 1:8 |

Acc: accuracy of the model; Sn: sensitivity of the model; Sp: specificity of the model; AUC: area under curve; MCC: Matthews correlation coefficient.

Comparison with other methods

In the past 10 years, many researchers have contributed to the prediction and study of protein ubiquitination sites. We compared the proposed model with other sequence-based prediction tools.
The corresponding data and results are shown in Table 4, which shows that the performance of the Caps-Ubi model exceeded that of the best-performing deep learning model, DeepUbi, and several other prediction models. The accuracy, sensitivity, and specificity of Caps-Ubi were 2.36, 3.31, and 1.24 percentage points higher than those of DeepUbi, respectively, and its area under curve and Matthews correlation coefficient were each 0.05 higher.

Table 4. Proposed Caps-Ubi compared with other methods

| Predictor | Acc (%) | Sn (%) | Sp (%) | AUC | MCC |
|---|---|---|---|---|---|
| UbiPred | 84.44 | 83.44 | 85.43 | 0.85 | 0.69 |
| UbSite | 74.5 | 65.5 | 74.8 | – | – |
| CKSAAP_UbSite | 73.4 | 69.85 | 76.96 | 0.81 | 0.47 |
| UbiProber | – | 37.0 | 90.0 | 0.77 | 0.63 |
| iUbiq-Lys | 82.14 | 80.56 | 99.39 | – | 0.50 |
| DeepUbi | 88.98 | 89.8 | 88.10 | 0.91 | 0.78 |
| Caps-Ubi | 91.34 | 93.11 | 89.34 | 0.96 | 0.83 |

Acc: accuracy of the model; Sn: sensitivity of the model; Sp: specificity of the model; AUC: area under curve; MCC: Matthews correlation coefficient.

Conclusion and outlook

In this paper, a new deep learning model for predicting protein ubiquitination sites is proposed, using the one-of-K and amino acid continuous coding modes. We used the largest available protein ubiquitination site dataset, and the experimental results above verify the effectiveness of this model. The operation of the model has four main steps: encoding protein sequences, constructing convolutional layers, constructing a capsule network layer, and constructing an output layer. The capsule network introduces a new building block for deep learning. Relative to a CNN, the capsule network, which uses a dynamic routing mechanism to update parameters, requires more training time, but the time required for prediction is similar. The capsule network can also characterize the complex relations among amino acids in various sequence positions and can explore the internal data distribution related to biochemical significance. The proposed Caps-Ubi prediction tool will facilitate the sequence analysis of ubiquitination and can also be used to identify other posttranslational modification sites in proteins. In the future, we will study other features that may better capture sample attributes and construct deeper models.

References

1. Goldstein G, Scheid M, Hammerling U, Schlesinger DH, Niall HD, Boyse EA. Isolation of a polypeptide that has lymphocyte-differentiating properties and is probably represented universally in living cells. Proc Natl Acad Sci U S A. 1975;72:11-15.
2. Wilkinson KD. The discovery of ubiquitin-dependent proteolysis. Proc Natl Acad Sci U S A. 2005;102:15280-15282.
3. Hicke L, Schubert HL, Hill CP. Ubiquitin-binding domains. Nat Rev Mol Cell Biol. 2005;6:610-621.
4. Hicke L. Protein regulation by monoubiquitin. Nat Rev Mol Cell Biol. 2001;2:195-201.
5. Pickart CM. Ubiquitin enters the new millennium. Mol Cell. 2001;8:499-504.
6. Haglund K, Dikic I. Ubiquitylation and cell signaling. EMBO J. 2005;24:3353-3359.
7. Peng J, Schwartz D, Elias JE, et al. A proteomics approach to understanding protein ubiquitination. Nat Biotechnol. 2003;21:921-926.
8. Gentry MS, Worby CA, Dixon JE. Insights into Lafora disease: malin is an E3 ubiquitin ligase that ubiquitinates and promotes the degradation of laforin. Proc Natl Acad Sci U S A. 2005;102(24):8501-8506.
9. Huang CH, Su MG, Kao HJ, Jhong JH, Weng SL, Lee TY. UbiSite: incorporating two-layered machine learning method with substrate motifs to predict ubiquitin-conjugation site on lysines. BMC Syst Biol. 2016;10 Suppl 1(Suppl 1):6.
10. Nguyen VN, Huang KY, Huang CH, Lai KR, Lee TY.
A New Scheme to Characterize and Identify Protein Ubiquitination Sites. IEEE/ACM Trans Comput Biol Bioinform. 2017;14:393-403.
11. Qiu WR, Xiao X, Lin WZ, Chou KC. iUbiq-Lys: prediction of lysine ubiquitination sites in proteins by extracting sequence evolution information via a gray system model. J Biomol Struct Dyn. 2015;33:1731-1742.
12. Chen X, Qiu JD, Shi SP, Suo SB, Huang SY, Liang RP. Incorporating key position and amino acid residue features to identify general and species-specific Ubiquitin conjugation sites. Bioinformatics. 2013;29:1614-1622.
13. Wang JR, Huang WL, Tsai MJ, Hsu KT, Huang HL, Ho SY. ESA-UbiSite: accurate prediction of human ubiquitination sites by identifying a set of effective negatives. Bioinformatics. 2017;33:661-668.
14. Radivojac P, Vacic V, Haynes C, et al. Identification, analysis, and prediction of protein ubiquitination sites. Proteins. 2010;78(2):365-380.
15. Lee TY, Chen SA, Hung HY, Ou YY. Incorporating distant sequence features and radial basis function networks to identify ubiquitin conjugation sites. PLoS One. 2011;6:e17331.
16. Wang D, Zeng S, Xu C, et al. MusiteDeep: a deep-learning framework for general and kinase-specific phosphorylation site prediction. Bioinformatics. 2017;33:3909-3916.
17. Shaw D, Chen H, Jiang T. DeepIsoFun: a deep domain adaptation approach to predict isoform functions. Bioinformatics. 2019;35(15):2535-2544.
18. Sun D, Wang M, Feng H, Li A. Prognosis prediction of human breast cancer by integrating deep neural network and support vector machine: Supervised feature extraction and classification for breast cancer prognosis prediction. 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI). IEEE; 2018.
19. Fu H, Yang Y, Wang X, Wang H, Xu Y. DeepUbi: a deep learning framework for prediction of ubiquitination sites in proteins. BMC Bioinformatics. 2019;20:86.
20. Liu Y, Li A, Zhao XM, Wang M. DeepTL-Ubi: A novel deep transfer learning method for effectively predicting ubiquitination sites of multiple species. Methods. 2020;S1046-2023(20)30156-0.
21. He F, Wang R, Li J, Bao L, Xu D, Zhao X. Large-scale prediction of protein ubiquitination sites using a multimodal deep architecture. BMC Syst Biol. 2018;12(Suppl 6):109.
22. Huang Y, Niu B, Gao Y, Fu L, Li W. CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics. 2010;26:680-682.
23. Huang CH, Su MG, Kao HJ, Jhong JH, Weng SL, Lee TY. UbiSite: incorporating two-layered machine learning method with substrate motifs to predict ubiquitin-conjugation site on lysines. BMC Syst Biol. 2016;10 Suppl 1(Suppl 1):6.
24. Plewczynski D, Tkacz A, Wyrwicz LS, Rychlewski L. AutoMotif server: prediction of single residue post-translational modifications in proteins. Bioinformatics. 2005;21:2525-2527.
25. Venkatarajan MS, Braun W. New quantitative descriptors of amino acids based on multidimensional scaling of a large number of physical–chemical properties. Molecular modeling annual. 2001;7(12):445-453.
26. Dombetzki LA. An overview over capsule networks. Network Architectures and Services. 2018.
27. Sabour S, Frosst N, Hinton GE. Dynamic routing between capsules. 2017.
28. Hinton GE, et al. Transforming auto-encoders. International Conference on Artificial Neural Networks. Springer, Finland; 2011. pp. 44–51.
29. Lin M, Chen Q, Yan S. Network in network. arXiv preprint arXiv:1312.4400. 2013.
30. Kingma D, Ba J. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. 2014.