title: Selection for 3′-End Triplets for Polymerase Chain Reaction Primers
authors: Onodera, Kenji
date: 2007
journal: PCR Primer Design
DOI: 10.1007/978-1-59745-528-2_3

Primer extension by thermostable DNA polymerase in PCR starts from the 3′-end of a primer. If this starting step fails, the entire PCR fails. The sequence at the 3′-end of a primer therefore often affects the success of a PCR experiment. Over 2000 primer sequences from successful PCR experiments, used with a variety of templates and conditions, were analyzed to find the frequencies of their 3′-end triplets. This chapter discusses a trend in 3′-end triplet frequencies in primers used in successful PCR experiments and proposes requirements for the 3′-end of a primer. Finally, a method to select primers with the best 3′-end triplets is introduced based on the result of the 3′-end analysis.

The 5′- and 3′-ends of a primer play different roles in PCR. Complementarity to the PCR template is less critical at the 5′-end of a primer than at the 3′-end, and it is known that extending a primer at the 5′-end (to 30 nt or longer, for example) does not improve its specificity. The 5′-end of a primer also tolerates the addition of a tagging sequence. Complete binding between primer and template is not required at the 5′-end. The 3′-end is different. Thermostable DNA polymerase starts attaching nucleotides at the 3′-end of a primer during the extension step, and this requires complete annealing of the 3′-end to the template. Incomplete binding at the 3′-end results in inefficient PCR or sometimes in no PCR product at all.
Conversely, overly stable annealing of the 3′-end to the template may allow a PCR product to be generated even when the rest of the primer is not completely bound, and this tolerance of incomplete binding may amplify unexpected products when the primer binds to other templates or to different regions of the target template. Thus, the 3′-end of a primer is important in PCR primer design for successful PCR experiments.

Several kinds of recommendations for the 3′-end of a primer can be found in the literature. One recommends one or two S (S stands for C or G) in the 3′-end triplet of a primer to promote strong annealing at the 3′-end (1,2). Another also recommends C or G at the 3′-end of a primer but no CG or GC, because of the potential formation of hairpins and primer-dimers (3). Although incorporation of S at the 3′-end is recommended, others recommend addition of W (W stands for A or T) at the 3′-end (4). One recommends low G + C content at the 3′-end (5,6). Another recommends at least one W in the 3′-end triplet but no T at the 3′-end of a primer (4). The recommendations listed here suggest that S and W provide complete annealing of the 3′-end and specificity of the primer, respectively.

Many primer design programs have functions to define the 3′-end of a primer. DS Gene searches for S at the 3′-end in its default configuration. However, most programs, such as Primer3, Primer Premier, and Vector NTI, leave such functions inactive in their default configurations. Although primer design programs check for dimer and hairpin formation, which partly concerns the 3′-end sequences of primers, commercial software currently includes few 3′-end considerations. Although the 3′-end of a primer is recognized as a key element of primer design, the recommendations in the literature were mostly based on theory, and the 3′-ends of primers did not seem to have been well studied. Further studies of the 3′-end were necessary for primer design.
It is difficult to test comprehensively the effects of all 3′-end triplet types in actual PCR experiments. There are 64 triplet types, and every primer used to test a 3′-end triplet amplifies a different region of the template. The optimized PCR conditions may differ for each primer pair because the primer pairs and the amplified regions of the template differ. In this case, PCR results are confounded by many critical factors: 3′-end triplets, primer and template sequences, and PCR conditions. Thus, another approach was taken: examining the frequencies of the 3′-end triplets in successful PCR experiments with a variety of templates and under a variety of conditions. Experimental conditions and primer sequences from successful PCR experiments have been deposited in, and are available through, the VirOligo database (7). From the VirOligo database, 2137 PCR primer sequences were retrieved for detailed analysis of the 3′-end triplets of successful PCR primers (8; see Note 1).

Primer sequences were obtained for analysis from the VirOligo database on February 6, 2002. On that date, the VirOligo database covered all 1685 articles whose abstracts appeared in PubMed before December 19, 2001 as query results for PCR experiments targeting alcelaphine herpesvirus, bovine adenovirus, bovine viral diarrhea virus (BVDV), bovine herpesvirus (BHV), bovine respiratory syncytial virus, bovine rotavirus, bovine coronavirus, foot-and-mouth disease virus (FMV), variola (smallpox) virus, cowpox virus, and human adenovirus. The search results were not filtered; all PCR conditions and oligonucleotides listed in the articles were simply deposited into the VirOligo database. The articles contained 3985 published virus-specific oligonucleotide sequences and 2300 PCR and hybridization conditions for detection of the viruses. All primer sequences were retrieved from the VirOligo database on that date.
Finally, duplicated primer sequences were eliminated from the analyzed set.

Primers in the VirOligo database were mostly between 18 and 22 nt long (65.8%; see Fig. 1), and 20 nt was the most frequent primer length (30.4%). The G + C content of primers in the VirOligo database was mostly 40-60% (78.0%; see Fig. 2). The most frequent G + C content was 50% (G + C content 47.5-52.4%; 23.7% of all primers). The product sizes expected from the distance between primer pairs were most frequently 200-299 bp (18.3%; see Fig. 3). Products shorter than 100 bp were not used frequently (2.0%). Most expected PCR product sizes were less than 1 kb (82.3%), and 57.8% of PCR products were less than 500 bp.

In general, the primers in the VirOligo database did not seem to have been selected based on primer Tm. The Tm distribution for FMV-specific primers (see Fig. 4), for instance, was similar to that for all 20 nt fragments from the entire FMV genome sequence (GenBank Accession No. AF377945, 53.1% G + C content, 7813 bp). In this comparison, 7794 sequences were obtained from the entire genome by sliding a 20 nt frame from the 5′-end to the 3′-end one nucleotide at a time, and the Tm of each 20 nt fragment was calculated by the nearest-neighbor method (10). Only when the genome was rich in G + C (e.g., BHV-1, GenBank Accession No. AJ004801, 72.4% G + C content, 135,301 bp) or poor in G + C (e.g., BVDV, GenBank Accession No. AF220247.1, 45.5% G + C content, 12,294 bp) were primers with a Tm higher than 64 °C or lower than 50 °C, respectively, less frequent than expected from the genome sequence. Frequencies of primers with a Tm between 50 and 64 °C were proportional to those of random 20 nt fragments in the target genome sequences.

The analysis of the 3′-end triplets of primers (8; see Fig. 5 and Table 1) showed that all 64 types were used in successful PCR experiments from the VirOligo database. No triplet completely prevented the generation of PCR products.
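The tally behind this analysis can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used for the VirOligo set: the function name `three_prime_triplet_frequencies` is hypothetical, primers are assumed to be plain 5′→3′ strings, duplicates are removed as described above, and any primer whose last three characters are not unambiguous A, C, G, or T is skipped (an assumption made here for simplicity).

```python
from collections import Counter
from itertools import product


def three_prime_triplet_frequencies(primers):
    """Return {triplet: percentage} for all 64 triplets.

    `primers` is an iterable of primer sequences written 5'->3'.
    Duplicate sequences are counted once, mirroring the duplicate
    removal described in the text.
    """
    all_triplets = ["".join(t) for t in product("ACGT", repeat=3)]

    # Normalize case/whitespace, then drop duplicated primer sequences.
    unique_primers = {seq.strip().upper() for seq in primers}

    counts = Counter()
    for seq in unique_primers:
        triplet = seq[-3:]  # the last three nucleotides are the 3'-end triplet
        # Assumption: skip primers that are too short or contain ambiguity codes.
        if len(triplet) == 3 and set(triplet) <= set("ACGT"):
            counts[triplet] += 1

    total = sum(counts.values())
    if total == 0:
        return {t: 0.0 for t in all_triplets}
    return {t: 100.0 * counts[t] / total for t in all_triplets}


if __name__ == "__main__":
    # Illustrative sequences only; they are not taken from VirOligo.
    demo = ["ACGTAGG", "ttttagg", "GGGCTG", "ACGTAGG"]
    freqs = three_prime_triplet_frequencies(demo)
    for triplet, pct in sorted(freqs.items(), key=lambda kv: -kv[1])[:3]:
        print(triplet, round(pct, 2))
```

Returning all 64 keys, including those with zero counts, makes it easy to see whether any triplet type is absent from a primer set.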
However, preferred and not-preferred triplets existed. The possible combinations of three nucleotides in a 3′-end triplet give a total of 64 types, as mentioned above. If there were no preferences and target sequences were selected completely at random, each frequency would be 1/64, which is approximately 1.56%. The mean of the distribution was 1.56%, as expected, and the standard deviation (SD) was 0.63%. The difference between the most frequent triplet, AGG (3.27%), and the least frequent triplet, TTA (0.42%), was large: 2.85 percentage points, or a 7.8-fold difference in frequency. The most frequently reported triplets (frequencies greater than the mean plus SD) were of eleven types: AGG, TGG, CTG, TCC, ACC, CAG, AGC, TTC, GTG, CAC, and TGC. The least frequently reported triplets (frequencies less than the mean minus SD) were of six types: TTA, TAA, CGA, ATT, CGT, and GGG.

Two concerns about the approach used here were addressed. One was that the frequency of a 3′-end triplet might be determined predominantly by its frequency in the target genome sequences. To address this concern, the genome sequences of the three most common templates in the analyzed set (BHV, BVDV, and FMV) were tested to find whether they biased the 3′-end triplet frequencies. Although the details of the genome comparison are not discussed here, the trends in 3′-end triplet frequencies were not due to the target genome sequences (8). The other concern was that the trends in triplet frequencies might be an exact copy of the recommendations for primer design in the literature. If one design method were used by all primer designers, the trends in triplet frequencies would reflect the choices made by that method even though the method might not choose the best 3′-end. Because all the recommendations have theoretical support, it was not surprising that the trends in triplet frequencies met some of the recommendations.
However, no single set of recommendations was able to predict the most and the least frequent triplets for the analyzed set. The analysis actually revealed triplet frequencies that could not be expected from the recommendations in the literature. Thus, neither concern was valid for the analyzed set.

The recommendation here is derived from the trend in 3′-end triplet frequencies. In the VirOligo database, WSS (19.0%; six of the top 11) and SWS (18.0%; four of the top 11) were the two most common kinds of 3′-end triplets. Whereas two WSS triplets, the WGC pair, were preferred (7th and 11th from the top), the other two WSS triplets, the WCG pair, were not favored (14th and 26th from the bottom). Triplets containing "CG" were generally less frequent in VirOligo, and CGW triplets accounted for two of the six least frequent triplets. It should be noted that TTS triplets were also preferred (8th and 14th from the top), and all of the top 16 triplets were NNS, that is, they had S at the 3′-end (N stands for any of the four nucleotides). Thus, primers should be designed to have SWS, WSS, or TTS as the 3′-end triplet but no CG in the triplet. If such primers are not available, one should select primers with S at the 3′-end. However, one should never select primers with the worst triplet types at the 3′-end: WWW, CGW, or GGG.

Any primer design program can be used with the 3′-end selection method described here. However, every primer design program has a different search algorithm, and the output primer pairs may not be the same. User settings in the programs also change the output. If more than one primer design program is available, the investigator should select one with a simple and easy function for exporting candidate primer sequences. The 3′-end selection requires a number of primer sequences, and exporting them for comparison of their 3′-ends can be laborious without a helpful export function. Alternatively, a list of primer pairs can be prepared manually.
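The 3′-end recommendation derived above (prefer SWS, WSS, or TTS without CG; otherwise accept S at the 3′-end; never choose WWW, CGW, or GGG) can be expressed as a small classifier applied to an exported or manually prepared primer list. The sketch below is one possible reading of those rules, not code from the chapter. The function name `classify_three_prime_triplet` and the four category labels are hypothetical, and WCG is placed in the "avoid" group because the selection steps of this chapter also discard it.

```python
S = set("CG")  # IUPAC code S: strong (C or G)
W = set("AT")  # IUPAC code W: weak (A or T)


def classify_three_prime_triplet(primer):
    """Classify a 5'->3' primer by its 3'-end triplet.

    Returns one of (hypothetical labels):
      "preferred" - SWS, WSS without CG, or TTS
      "avoid"     - WWW, CGW, GGG, or WCG
      "s_end"     - any other triplet ending in S (fallback choice)
      "other"     - everything else
    """
    triplet = primer.strip().upper()[-3:]
    if len(triplet) != 3 or not set(triplet) <= set("ACGT"):
        raise ValueError(f"cannot read a 3'-end triplet from {primer!r}")
    first, second, third = triplet

    is_sws = first in S and second in W and third in S
    is_wss = first in W and second in S and third in S
    is_tts = triplet[:2] == "TT" and third in S
    if is_sws or is_tts or (is_wss and "CG" not in triplet):
        return "preferred"

    is_www = set(triplet) <= W
    is_cgw = triplet[:2] == "CG" and third in W
    is_wcg = first in W and triplet[1:] == "CG"
    if is_www or is_cgw or is_wcg or triplet == "GGG":
        return "avoid"

    # Checked after the "avoid" group because WCG and GGG also end in S.
    return "s_end" if third in S else "other"


if __name__ == "__main__":
    # Illustrative primers only; the sequences are made up for this demo.
    for seq in ["GATTACAGATTACAAGG", "GATTACAGATTACATTA", "GATTACAGATTACACCC"]:
        print(seq[-3:], classify_three_prime_triplet(seq))
```

Testing the "avoid" group before the S-at-the-3′-end fallback matters, because WCG and GGG would otherwise pass as acceptable fallback triplets.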
A spreadsheet program (e.g., Excel) is helpful if the automated 3′-end selection method (Subheading 3.2.2.) is preferred.

Primer design based on the 3′-end triplet recommendation above is simple and straightforward. The investigator filters the primer candidates obtained from primer design programs (or designed manually) according to the 3′-end recommendations. The more primer candidates are available, the easier it is to find one that matches the recommendations well. Thus, more primer pairs should be obtained from the primer design programs than are provided by default.

Use a primer design program as usual, but obtain at least twice as many primer pairs as its default settings return (see Notes 2 and 3).

• Case of Primer3 (by Steve Rozen and Helen J. Skaletsky): "GC clamp" should be set to 0. "Number To Return" should be set to 20 or more.
• Case of Primer Premier 5 (PREMIER Biosoft International): The "GC clamp" function does not need to be turned off. GC clamp in Primer Premier checks the primer's stability at the 5′-end, and it does not interfere with the 3′-end recommendations. Primer Premier returns all primer pairs that pass its requirements. Thus, the number of outputs cannot be set. However, if there are too few, conditions such as melting temperature can be relaxed.
• Case of DS Gene 1.5 (Accelrys): "S" in the "3′-dinucleotide" field should be removed (see Note 4). DS Gene returns all primer pairs that pass its requirements, so the number of outputs does not need to be set.

After obtaining enough primer pairs, the next step is to filter and rank the primers. The best-suited triplets are TTS, SWS, and WSS (no WCG), and they should be searched for in the program's output, a list of primer pairs.

1. Find TTS, SWS, and WSS, but not WCG, at the 3′-ends of primers near the top of the list of primer pair candidates (see Note 5). If suitable pairs are found, the selection process ends here; use them for PCR. If not, the investigator needs to find suitable pairs while avoiding the disfavored 3′-end triplets.
2. Find WCG, WWW, CGW, and GGG in the 3′-end triplets of primers and discard those primers from the list of primer pair candidates.
3. Find primers with S at the 3′-end near the top of the list of primer pair candidates.
4. If no suitable primer pairs are found by step 3, select a primer pair from the top of the list (see Note 6).

Instead of following steps 1-4 in Subheading 3.2.1, primers can be selected based on the 3′-end triplet frequencies. The recommendation discussed in this chapter may miss some important triplets near the top of the frequency list or some avoidable triplets near the bottom of the list. By focusing on the list itself, the success rate of PCR experiments can be increased. However, it should be noted that other properties of primers scored by primer design programs may be neglected when the 3′-end triplet frequencies are emphasized, although primers with extremely poor properties, such as a Tm that is too high or too low, have probably already been discarded and do not appear in the list of primer candidates produced by primer design programs.

Prepare a spreadsheet as per the following steps (see Fig. 6).

5. Look up the frequency value of the triplet from columns J and K by entering the formula "=VLOOKUP(D3,$J$3:$K$66,2)" in cell F3.
6. Copy cell F3 and paste it to all cells between F3 and G22.
7. Multiply the triplet frequencies of the forward and reverse primers by entering the formula "=F3*G3" in cell H3.
8. Copy cell H3 and paste it to all cells between H3 and H22.
9. Sort the table based on the products of the frequencies in column H.

Then, report the top-scoring pairs in the spreadsheet. Because it can be automated, this enhancement allows large sets of primer pairs to be entered.

In this chapter, a primer selection method based on the 3′-end triplet sequence was introduced.
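For readers who prefer a script to a spreadsheet, the frequency-based ranking of Subheading 3.2.2 can be sketched as follows. This is a rough equivalent under stated assumptions, not the author's implementation: the function name `rank_primer_pairs` is hypothetical, primers are 5′→3′ strings, and the caller supplies the triplet frequency table (Table 1) as a dictionary. Only the two frequencies quoted in this chapter (AGG 3.27% and TTA 0.42%) appear in the demo, and the primer sequences are made up.

```python
def rank_primer_pairs(pairs, triplet_freq):
    """Rank primer pairs by the product of their 3'-end triplet frequencies.

    pairs        : iterable of (pair_id, forward_primer, reverse_primer),
                   with both primers written 5'->3'
    triplet_freq : dict mapping each 3'-end triplet to its frequency (%),
                   for example the values of Table 1
    Returns a list of (pair_id, forward_triplet, reverse_triplet, score),
    sorted from the highest score to the lowest.
    """
    scored = []
    for pair_id, forward, reverse in pairs:
        fwd_triplet = forward.strip().upper()[-3:]   # like =RIGHT(B3,3)
        rev_triplet = reverse.strip().upper()[-3:]   # like =RIGHT(C3,3)
        # Exact dictionary lookup; a missing triplet raises KeyError
        # instead of silently returning a neighbouring value.
        score = triplet_freq[fwd_triplet] * triplet_freq[rev_triplet]
        scored.append((pair_id, fwd_triplet, rev_triplet, score))
    # Highest product first, matching "report the top-scoring pairs".
    return sorted(scored, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    # Only these two frequencies are quoted in the text; a real run
    # would load all 64 values from Table 1.
    freq = {"AGG": 3.27, "TTA": 0.42}
    # Hypothetical primer pairs for illustration.
    demo_pairs = [
        ("pair_2", "GATTACAGATTACAAGG", "CCATTGACCATTGATTA"),
        ("pair_3", "GATTACAGATTACATTA", "CCATTGACCATTGATTA"),
        ("pair_1", "GATTACAGATTACAAGG", "CCATTGACCATTGAAGG"),
    ]
    for row in rank_primer_pairs(demo_pairs, freq):
        print(row)
```

One design difference from the spreadsheet is worth noting: the dictionary lookup is always exact, whereas a three-argument VLOOKUP performs an approximate match and therefore relies on the triplet column being sorted in ascending order.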
A disfavored triplet does not mean that a primer carrying it will not work at all in PCR experiments; it only means that such a primer is more likely to fail to amplify the target template. The output of a primer design program should therefore not be discarded even if none of the primer candidates obtained from the program meets the 3′-end recommendations described in this chapter. If a pair that matches the 3′-end recommendations is found, the success rate of PCR experiments increases, which reduces the costs and the stress caused by unsuccessful PCR experiments and low amplification efficiencies. Such effects are especially significant in large-scale PCR experiments such as the amplification of genes or gene fragments for the preparation of microarrays for gene expression analysis.

[...] as an output. The favored triplets are 11 of 64 types (17%). Because there are two primers in a PCR pair, the chance that both the forward and the reverse primer have recommended triplets is 17% multiplied by itself, or as small as 3%. Fifty output pairs are expected to contain at least one pair that matches the requirements if nothing interferes with the 3′-end sequence. A larger primer output is better, but more lower-ranked pairs then appear in the output, and low-ranked pairs have undesirable properties in terms of the G + C content or Tm of the primers. In addition, it is simply not easy to handle very large numbers of primers. So, try to obtain about twice the default number.

3. When primers are obtained with a primer design program, the user needs to note that some primer search settings, such as a 3′-end "GC clamp," interfere with the 3′-end triplet selection. Such settings should be turned off. Otherwise, all 3′-ends in the output primer set will have uniform triplets as a result of the search configuration. GC clamps have been used for point mutation detection by adding GC-rich sequences to the 5′-end of a PCR primer (11).
When PCR products made with primers carrying G + C-rich 5′-ends are used to distinguish mutated sequences from wild type, the GC clamp helps prevent complete denaturation of the DNA strands during denaturing gradient gel electrophoresis. For point mutation detection, the GC clamp is simply added to the 5′-end of a primer, and it is not complementary to the gene sequence. In primer design programs, "GC clamp" has a different meaning from that used in point mutation detection. A G + C-rich sequence creates strong annealing of a primer to the template. Thus, some primer design programs search for a G + C-rich or stable region in the target template. Some of them look for a GC clamp at the 5′-ends of primers (e.g., Primer Premier), just as the clamp for point mutation detection sits at the 5′-end. In contrast, some programs (e.g., Primer3) have a function to search for a G + C-rich region at the 3′-ends of primers. Although such a function can be used to ensure complete annealing of the 3′-end of a primer to the template, it interferes with the recommendations in this chapter. Please consult the manual of the primer design program in use and find out which type of "GC clamp" the program searches for. If the program has a function that searches for a GC clamp at the 3′-end, that function should be turned off or the number of GC clamp nucleotides required should be set to zero.

4. Both Vector NTI Advance 9.1 and DS Gene 1.5 let the user specify a 3′-end sequence when searching for primer candidates. Because the recommendations described here consist of three types, SWS, WSS, and TTS, it is not possible to search for all three types in one primer search session using this function. The user may try one triplet type at a time using such a 3′-end restriction function, but the user should note that the better primer pair might be a forward primer of the SWS type combined with a reverse primer of the WSS type.
Thus, such functions are not easy to apply to the present 3′-end recommendations, but it may be worth trying to find "S" at the 3′-end, as all of the top 16 triplets have "S" at the 3′-end. The following are the steps for finding "S" at the 3′-end:

• Case of Vector NTI Advance 9.1: In the "3′-end" tab of the "find primer" window of Vector NTI, click off A and T for the first nucleotide and leave the rest unchanged for both the sense and anti-sense primer 3′-ends.
• Case of DS Gene 1.5: Use the default settings ("S" should appear in the "3′-dinucleotide" field).

5. In most programs, e.g., in Primer3, the list of primer candidates is sorted from the best to the worst. So, try to find primers with TTS, SWS, and WSS at the 3′-ends near the top of the list to find the best-scored primer pairs with the best triplet selection.

6. Primer pairs that should be avoided have been discarded by step 3. The remaining primer pairs are neither strongly preferred nor strongly not-preferred. Thus, an average success rate in the PCR experiment can be expected from these primer pairs.

7. Recording the primer pair ID number from the primer design program will help in tracking primer pairs after the spreadsheet is sorted in later steps.

1. Copy and paste, or type in, pair ID numbers, forward primer sequences, and reverse primer sequences in columns A, B, and C of a spreadsheet, respectively.
2. Enter the triplet frequencies from Table 1, with triplet types in column J and percentages in column K.
3. Obtain the right-hand three characters as a substring of the forward primer by entering the formula "=RIGHT(B3,3)" in cell D3.
4. Copy cell D3 and paste it to all cells between D3 and E22.

References

1. Technical report: Part 2. Basic requirements for designing optimal PCR primers
2. The design of primers for PCR
3. In vitro amplification of DNA by the polymerase chain reaction
4. Optimizing PCR, in PCR 2: A Practical Approach
5. Software to determine optimal oligonucleotide sequences based on hybridization simulation data
6. The basics of RT-PCR
7. VirOligo: a database of virus-specific oligonucleotides
8. Selection for 3′ end triplets for polymerase chain reaction primers
9. Construction, application and analysis of the oligonucleotide database, VirOligo, PhD thesis
10. A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics
11. Use of denaturing gradient gel electrophoresis to detect point mutations in the factor VIII gene

The author expresses appreciation to Dr. Ulrich Melcher of the Department of Biochemistry and Molecular Biology, Oklahoma State University, for advice on and review of the manuscript.