id	author	title	date	pages	extension	mime	words	sentences	flesch	summary	cache	txt
cord-321386-u1imic5l	Li, Chun	Protein Sequence Comparison and DNA-binding Protein Identification with Generalized PseAAC and Graphical Representation	2018-02-17		.txt	text/plain	5503	311	59	METHODS: Based on two physicochemical properties of amino acids, a protein primary sequence was converted into a three-letter sequence, from which a graph without loops or multiple edges and its geometric line adjacency matrix were obtained. A generalized PseAAC (pseudo amino acid composition) model was thus constructed to characterize a protein sequence numerically. In addition, an SVM (support vector machine) model using the generalized PseAAC was developed to distinguish DNA-binding from non-binding proteins on three datasets. By combining these elements with the frequencies of occurrence of the 20 standard amino acids (the conventional amino acid composition, AAC) and of the three representative letters, a feature vector was constructed that numerically characterizes a protein sequence as a generalized PseAAC model. Numerical characterization of protein sequences based on the generalized Chou's pseudo amino acid composition.	./cache/cord-321386-u1imic5l.txt	./txt/cord-321386-u1imic5l.txt