key: cord-0036752-udbyrms7 authors: Basu, Kaustubh; Sriraam, N.; Richard, R. J. A title: Block based semi-global alignment scheme for the analysis of Given DNA sequences date: 2007 journal: World Congress on Medical Physics and Biomedical Engineering 2006 DOI: 10.1007/978-3-540-36841-0_50 sha: 3b68f25c3129451160f9ba7b620b44e04dd024d4 doc_id: 36752 cord_uid: udbyrms7 Pair wise sequence alignment scheme has been emerged as an efficient computational tool to find region of similarity among sequences of proteins and nucleic acids. As new disease causing viruses are emerging rapidly, new alignment schemes with the advent of fast computers have gained its importance recently. In this paper, we have proposed a block based semi-global alignment scheme to evaluate the optimal alignment between any given two DNA sequences. DNA sequences are divided into blocks of equal length and alignment between the block is determined using dynamic programming. The performances are evaluated in terms of overall matrix score and percentage of similarity. It is inferred from the results, that higher the percentage of similarity between any two blocks, it may code for the same protein/amino acids. The computational complexity of the proposed algorithm is much less compared to that of the general global alignment scheme with O (M, N). Genomic sequence analysis involves the study of similarities between known and unknown DNA sequences to predict the structure and function of proteins [1] [2] [3] [4] [5] [6] [7] [8] .Pairwise alignment algorithms, such as global and local alignments which involves whole sequence or only substrings for analysis respectively provides the best alignment path. Several work based on global alignment of sequences using dynamic programming were reported in [4] [5] [6] [7] [8] [9] [10] where the computational complexity is directly related to the length product of two sequences. In [11] , authors proposed global alignment algorithm which involves inclusion of gaps and showed the improvement on computational efficiency. It has been shown in [10] that alignment involving only prefixes and suffixes of any given sequence improves the computational requirement. This scheme is referred as semiglobal alignment [10] .In this paper we have proposed a block based semi-global alignment scheme (BGA) to evaluate the optimal alignment between any given two DNA sequences. Two pairs of DNA sequences, plant species (Rhizobium sp) and disease causing virus (AIDS and SARS) are used for alignment of the sequences. The optimal block is identified based on two factors, namely, overall matrix score and similarity percentage between the DNA pair sequences. II. BLOCK BASED SEMI-GLOBAL ALIGNMENT SCHEME In BGA scheme, the pair DNA sequences are divided into several blocks of equal size and alignment is performed using dynamic programming as reported in [11] .The main objective of this scheme is to find the similarity between sequence with less computational time thereby the proteomic structure of the unknown DNA can be determined immediately. For the purpose of analysis, two types of DNA sequence, namely, plant species and disease causing virus are considered [11, 12] .Alignment is performed between DNA sequence of Agrobacterium radiobacter and between DNA sequence of Rhizobium sp. for histamine dehydrogenase (String 1 and String 2 respectively) and between DNA sequence of virus causing AIDS and DNA sequence of virus causing SARS (String 3 and String 4 respectively). Fig 1-4 The following notations as shown in Table 1 are used for DNA sequences of different block size. The performance of the proposed scheme is evaluated in terms of overall matrix score and percentage of similarity between the DNA pairs [10, 11] .The overall matrix score (F) is defined as [10] : where 'm' denotes the match between two bases , 's ' denotes mismatch between two bases , 'd ' denotes deletions (gaps) . For matching, and mismatch +1 and -1 points are assigned whereas for gaps -2 point is assigned. 02 18 12 09 12 13 12 14 17 01 21 12 11 07 19 18 15 20 13 23 16 06 18 23 09 13 14 13 07 18 12 25 10 10 21 11 18 11 27 14 27 21 17 21 19 16 09 22 14 15 17 16 12 12 15 20 19 11 18 11 20 15 19 20 25 26 27 23 17 23 17 14 13 20 17 27 21 25 16 27 24 14 27 17 17 10 22 15 27 14 25 11 21 16 1 It can be noticed from the above matrices that as the block size increases, the mismatch between the base pairs increases. This results in more negative score. Among the different block sizes, Block size 100 and Block size 400 yields the best result, for strings 1-2, 3-4 respectively. Table 2 shows the percentage of similarity between the base pairs. For each block, only the highest value for 12x12 , 6x6,3x3(string1-2)and5x5,4x4, 2x2 (string 3-4)combination are shown. It can be noticed from the above table that as the block size increases, the percentage of similarity decreases. It can be noticed that as the block sizes increases, the computation time required for computing using BGA scheme also increases. This paper discusses a block based semi-global alignment scheme (BGA).Two DNA categories, namely, plant species and disease causing virus are used and the base pairs were divided into different blocks of equal length. Dynamic programming with inclusion of gaps was used to evaluate the overall matrix score and percentage of strings similarity. It has been found that as the block size increases, overall matrix score tends to become more negative and hence results in less similarity between the strings. It has been further shown the computation time requires for implementing the algorithm seems to be less for small block sizes and further has to be tested with large databases for obtaining the optimization for alignment between the DNA sequence. It can be concluded that by using this BGA algorithm the structure and function of protein for the unknown sequence can be predicted with less computational time. Improved tools for biological sequence comparison Basic local alignment search tool A generalized Global alignment algorithm AVID: a global alignment program A new approach to sequence comparison: normalized sequence alignment Alignment of whole genomes Fast algorithms for large-scale genome alignment and comparison On global sequence alignment Bio-sequence comparison and applications Introduction to Computational molecular biology Global alignment of two DNA sequences: Mechanism and Strategy