key: cord-0000024-bbvxu8op authors: Karaman, Mazen W.; Groshen, Susan; Lee, Chi-Chiang; Pike, Brian L.; Hacia, Joseph G. title: Comparisons of substitution, insertion and deletion probes for resequencing and mutational analysis using oligonucleotide microarrays date: 2005-02-18 journal: Nucleic Acids Res DOI: 10.1093/nar/gni034 sha: 867e1b0f6ca8757f2a32a625d99b23888ab40d49 doc_id: 24 cord_uid: bbvxu8op Although oligonucleotide probes complementary to single nucleotide substitutions are commonly used in microarray-based screens for genetic variation, little is known about the hybridization properties of probes complementary to small insertions and deletions. It is necessary to define the hybridization properties of these latter probes in order to improve the specificity and sensitivity of oligonucleotide microarray-based mutational analysis of disease-related genes. Here, we compare and contrast the hybridization properties of oligonucleotide microarrays consisting of 25mer probes complementary to all possible single nucleotide substitutions and insertions, and one and two base deletions in the 9168 bp coding region of the ATM (ataxia telangiectasia mutated) gene. Over 68 different dye-labeled single-stranded nucleic acid targets representing all ATM coding exons were applied to these microarrays. We assess hybridization specificity by comparing the relative hybridization signals from probes perfectly matched to ATM sequences to those containing mismatches. Probes complementary to two base deletions displayed the highest average specificity followed by those complementary to single base substitutions, single base deletions and single base insertions. In all cases, hybridization specificity was strongly influenced by sequence context and possible intra- and intermolecular probe and/or target structure. 
Furthermore, single nucleotide substitution probes displayed the most consistent hybridization specificity data followed by single base deletions, two base deletions and single nucleotide insertions. Overall, these studies provide valuable empirical data that can be used to more accurately model the hybridization properties of insertion and deletion probes and improve the design and interpretation of oligonucleotide microarray-based resequencing and mutational analysis.
Oligonucleotide microarrays are a powerful technological platform for large-scale screens of common genetic variation and disease-causing mutations (1–5). In most published studies (6–21), oligonucleotide microarrays are designed to screen specific sequence tracts, up to megabases in length (11, 15, 22, 23), for all possible single nucleotide substitutions. With some exceptions (24–31), the same emphasis was not placed on identifying all possible small insertions and deletions in the heterozygous state. Nevertheless, it is crucial to detect such small insertions and deletions since they can play a major role in inactivating or altering gene function by disrupting functional elements (e.g. splice junctions, cis-acting elements and open reading frames) and also represent another class of common genetic variation. Two fundamental approaches are commonly used to analyze data sets from oligonucleotide microarrays tailored to identify genetic variation in specific DNA segments purely by hybridization (1, 3–5, 9). One approach involves identifying statistically significant gains of target hybridization signal to oligonucleotide probes complementary to specific sequence variants (9). In theory, the gain of signal approach has the advantage of both detecting the presence of genetic variation and identifying the nature of the sequence change in the target. 
However, it is not feasible to screen for virtually all possible insertions and deletions due to the overwhelming number of mutation-specific probes needed for this analysis. Furthermore, little effort has been made to systematically assess the hybridization properties of probes complementary to these small insertions and deletions. The second approach involves identifying losses of hybridization signal to perfect match (PM) probes that are fully complementary to the DNA segment of interest (8, 25, 27, 30, 31). In theory, the loss of signal approach allows one to screen for all possible sequence changes, including insertions and deletions, that cause a given target nucleic acid sequence to contain mismatches with specific PM probes. However, this necessitates the sequencing of specific DNA regions to identify the nature of the sequence changes (8, 25, 27, 30, 31). Thus, a combination of the gain and loss of hybridization signal analysis could provide the most robust means of identifying and characterizing mutations using non-enzymatic oligonucleotide microarray assays. 
Here, we analyze the specificity and reproducibility of nucleic acid hybridization to oligonucleotide microarrays used in the large-scale mutational analysis of the ATM (ataxia telangiectasia mutated) gene, which is responsible for an autosomal recessive disorder involving cerebellar degeneration, immunodeficiency, radiation sensitivity and cancer predisposition and is also commonly mutated in certain lymphoid malignancies (32, 33). These microarrays include 25mer oligonucleotide probes complementary to all possible single base substitutions and insertions as well as one and two base deletions on both strands of the ATM coding region. This provides the first comparative analysis of the hybridization properties of substitution, insertion and deletion probes in an oligonucleotide microarray-based mutational analysis of a large gene.
A series of 120 DNA samples derived from biopsies of lymphoma patients were previously screened for all possible ATM mutations using oligonucleotide microarrays (30). Here, we have selected a total of 68 samples that showed robust amplification signals in all 62 coding exons for further analysis (30). A total of 17 unique mutations, each in a one-to-one mixture with wild-type sequence, occurred once in these samples. The impact of any given mutation in a single sample is minimal given that 67 other samples with wild-type sequences in the region encompassing a given mutation are included in this analysis. Several single nucleotide polymorphisms (SNPs) were present multiple times: 735 C/T, 2572 T/C and 4258 C/T in two samples; 3161 C/G in four samples; and 5557 G/A in five samples. Likewise, these SNPs have a minimal effect on our global analyses given the large number of samples and bases interrogated in this study. 
As previously described (30), individual ATM coding exons were amplified from genomic DNA using primers containing T3 and T7 RNA polymerase tails, pooled, and then in vitro transcribed using T3 or T7 RNA polymerase to create biotin-labeled sense and antisense strand targets, respectively. Fluorescein-labeled reference target was made using genomic DNA from an unaffected individual. Reference and test sample targets were fragmented, diluted in hybridization buffer [3 M TMA-Cl (tetramethylammonium chloride), 1× TE, pH 7.4, 0.001% Triton X-100] and hybridized to the ATM microarrays as described previously (30). Afterwards, the microarray was stained with a phycoerythrin-streptavidin conjugate and digitized hybridization images from both reference and test targets were acquired using the Gene Array Scanner (Hewlett Packard, Palo Alto, CA) equipped with the appropriate emission filters. Custom software was used to quantify hybridization signals for each probe and subtract background hybridization signals. We exclusively focused on raw data from the biotin-labeled test targets since they provide approximately seven times the hybridization signal of the fluorescein-labeled wild-type reference target in this system (28). This enhanced signal provides greater sensitivity toward detecting weak hybridization.
For each sample, for each base and for each potential type of mutation (i.e. substitution, one or two base deletion or one base insertion), the specificity was calculated as the ratio of the hybridization signal of the wild-type target to the PM probe to that of its cognate insertion, deletion or single base substitution probe on each strand. The logarithm of these ratios was plotted as a function of the position within the gene. To illustrate the spatial patterns and to smooth out random variation, running averages of data from 10 bases were used. 
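The specificity calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' custom software: the function names are hypothetical, the signal values are made-up numbers, and the running average is assumed to be a simple unweighted mean over each run of 10 consecutive bases (the text does not say whether the window is centered or trailing).

```python
import math


def log_specificity(pm_signals, mm_signals):
    """Return log10(PM / MM) for each interrogated base.

    pm_signals and mm_signals are background-subtracted signals for the
    perfect match probe and its cognate mismatch probe at each base.
    """
    return [math.log10(pm / mm) for pm, mm in zip(pm_signals, mm_signals)]


def running_average(values, window=10):
    """Unweighted mean of every run of `window` consecutive values.

    Assumption: the window simply slides one base at a time, so the
    result has len(values) - window + 1 entries.
    """
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical signals for 12 consecutive bases on one strand.
pm = [300.0] * 12
mm = [100.0] * 12
smoothed = running_average(log_specificity(pm, mm))
```

With 12 bases and a window of 10, `smoothed` holds three values, each equal to log10(3) because every base here has the same 3-fold ratio.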
To capture the variability, at each base, the sample-to-sample standard deviation was again calculated using data derived from a running average of 10 bases for each sample. To estimate the mean hybridization specificity for each type of mutation, the geometric mean (i.e. the antilog of the average of the logged ratios) over all bases and over all specimens was calculated (Table 1). To further examine the variability of the specificity ratios, the coefficient of variation (cv) was calculated in two ways. The cv is the standard deviation divided by the mean; it is useful for understanding the amount of variability relative to the magnitude of the mean or typical value. For the intra-sample cv, the cv was calculated for each of the 68 samples (using the running average of 10 at each ATM base) and the average of the 68 coefficients of variation was taken. For the inter-sample cv, at each of the bases, the cv was calculated using the 68 samples, and the average of the coefficients of variation was taken. For both calculations, the moving average of 10 was used, instead of the original value, since the goal was to understand how the specificity varied over bases and across samples, rather than to estimate the experimental (or measurement) error.
Table 1 footnotes: (a) Hybridization specificity ratio is defined as the ratio of PM probe hybridization signal to that of the brightest mismatch probe within a given category. The global average of all hybridization specificity ratios for each base in all samples for a given probe type is provided. (b) Determined for hybridization specificity ratios averaged across windows of 10 bases either within (intra) or across (inter) samples. 
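The summary statistics defined above (geometric mean, intra-sample cv and inter-sample cv) can be written down compactly. The sketch below is illustrative only: the function names and the toy data are hypothetical, the input is assumed to be a table of already-smoothed specificity ratios indexed as `smoothed[sample][base]`, and the sample standard deviation (n - 1 denominator) is assumed because the text does not state which form was used.

```python
import math
import statistics


def geometric_mean(ratios):
    """Antilog of the mean of the log10 ratios."""
    logs = [math.log10(r) for r in ratios]
    return 10 ** statistics.mean(logs)


def cv(values):
    """Coefficient of variation: standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def intra_sample_cv(smoothed):
    """cv across bases within each sample, averaged over samples."""
    return statistics.mean([cv(row) for row in smoothed])


def inter_sample_cv(smoothed):
    """cv across samples at each base, averaged over bases."""
    return statistics.mean([cv(column) for column in zip(*smoothed)])


# Toy data: two samples, two base positions (hypothetical values).
toy = [[1.0, 3.0],
       [1.0, 3.0]]
intra = intra_sample_cv(toy)
inter = inter_sample_cv(toy)
```

In this toy table the two samples are identical, so the inter-sample cv is zero, while the ratio varies from base to base within each sample and the intra-sample cv is not.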
In order to determine the relative specificity of the hybridization of complex nucleic acid targets to oligonucleotide probes complementary to single base substitutions, insertions and deletions, we analyzed data generated from oligonucleotide microarray-based mutational analysis of the 9168 bp ATM coding region (30) . These studies used a pair of oligonucleotide microarrays (Affymetrix, Santa Clara, CA) containing over 250 000 probes (25 nt in length) specifically designed to screen the sense and antisense strands of the ATM coding region for genetic variation (27, 30) . Collectively, the ATM sense and antisense microarrays contain 55 008 probes complementary to all possible single base substitutions, 73 344 probes complementary to all possible one base insertions, and 18 336 probes complementary to all possible one base deletions and 18 336 probes complementary to all possible two base deletions in the ATM coding sequence (Figures 1 and 2 ). These microarrays have been used to screen for sequence variation in the ATM gene in over 100 DNA samples (30) . SNPs and gene inactivating mutations were uncovered by screening for localized losses of hybridization signal to PM probes complementary to every 25 nt segment of the ATM coding region (8, 25, 27, 30) . However, hybridization data from deletion and insertion probes were not relied upon in this analysis. Therefore, this data set provides a unique opportunity to examine the relative hybridization specificity of nucleic acid targets to each of these classes of mismatch probes. In order to gain a global overview of hybridization specificity, we determined the average ratio of PM probe hybridization signal of wild-type target (see Materials and Methods) to their cognate insertion, deletion and single base substitution probes on each strand (Table 1 ). In these calculations, we considered data for all 9168 interrogated bases in all 68 DNA samples (see Materials and Methods). 
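The probe counts quoted above follow from the number of mismatch probes needed per base per strand: three for substitutions, four for one base insertions and one for a deletion of a given length. A short check of that arithmetic, with illustrative variable names:

```python
CODING_BASES = 9168   # length of the ATM coding region in bp
STRANDS = 2           # sense and antisense microarrays

# Mismatch probes required per interrogated base on one strand.
PROBES_PER_BASE = {
    "single_base_substitution": 3,  # the three non-wild-type bases
    "one_base_insertion": 4,        # any of the four bases inserted
    "one_base_deletion": 1,
    "two_base_deletion": 1,
}

probe_counts = {
    kind: per_base * CODING_BASES * STRANDS
    for kind, per_base in PROBES_PER_BASE.items()
}
```

This reproduces the 55 008, 73 344, 18 336 and 18 336 probes listed for the four mismatch classes.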
For deletion probes, we report the ratio of the PM probe signal to the signal from its cognate 1 or 2 bp deletion probe. However, for single base substitutions, we report the ratio of the PM probe signal to that of the cognate substitution probe with the highest hybridization signal. This provides the most rigorous assessment of cross-hybridization to single base substitution probes. Likewise, for single base insertion probes, we report the ratio of the PM probe signal to that of the cognate insertion probe with the highest hybridization signal. For both sense and antisense strands, we found that the two base deletion probes had the highest average PM to cognate MM hybridization specificity ratio (3.26-fold sense and (Table 1). To provide a finer-scale analysis of hybridization specificity, we determined the relative frequencies of hybridization specificity ratios in defined bins. There was a similar distribution of specificity ratios for single base substitution and two base deletion probes on both strands (Figure 3). The overall lower hybridization specificities of single base deletion and insertion probes are reflected by the increased frequencies of probes within the lower specificity bins (i.e. <2-fold ratio) and decreased frequencies of probes within higher specificity bins (i.e. >3-fold ratio) on both strands.
Next, we sought to uncover underlying trends in the hybridization specificity of different classes of mismatch probes across the entire ATM coding region within a given sample (intra-sample variation). This provides insights into sequence context effects that may influence the hybridization specificity of each class of mismatch probe. To approach this problem, we plotted the average hybridization specificity ratios of substitution, deletion and insertion probes for all 9168 bases across the 68 samples (Figure 4 and Supplementary Figure 1). 
We analyzed data determined over running averages of 10 bases in order to maximize our ability to detect trends and minimize the effect of randomly dispersed confounding factors (e.g. intra- or intermolecular secondary structure) that may skew data for any given base. As expected from Table 1 and Figure 3, the two base deletion probes consistently showed a higher average hybridization specificity ratio followed by single base substitution, single base deletion and single base insertion probes on both strands of exon 50 (Figure 4). Nevertheless, the hybridization specificity ratios for all classes of mismatch probes fluctuate across the exon 50 sequence (Figure 4). For example, two base deletion probes showed a peak value of 6.76 (unlogged) centered at base 7071 and a trough value of 1.90 (unlogged) centered at base 7002 on the sense strand. We also found similar fluctuations in specificity ratios for all mismatch probe types in the remaining 61 ATM coding exons (Supplementary Figure 1). To assess intra-sample variability in hybridization specificity by a different means, we determined the average cv for substitution, deletion and insertion probes within a given experiment (Table 1). Again, we analyzed data from running averages of 10 bases in order to maximize our ability to detect trends and maintain consistency in our data analysis. Substitution probes had the lowest average intra-sample cv, 0.31 and 0.23 for sense and antisense strands, respectively. One base deletion, two base deletion and insertion probes showed comparable intra-sample coefficients of variation on the sense strand, 0.37, 0.39 and 0.38, respectively. However, insertion probes showed relatively higher variability than the deletion probes on the antisense strand. Coupled with plots shown in Supplementary Figure 1, it is evident that of all the mismatch probe types, the hybridization specificities of base substitution probes were least affected by target sequence context. 
Intrigued by the above observations, we next searched for specific target sequence tracts that produced the lowest hybridization specificity among and between the different classes of mismatch probes. To approach this problem, we determined how many mismatch probes within running windows of 10 bases gave poor hybridization specificity, previously defined as a hybridization specificity ratio <1.2 (26). In Table 2, we report nucleotide tracts where at least 8 probes within a given 10 base window showed poor hybridization specificity ratios. A comprehensive listing of probes with poor hybridization specificity is provided in Supplementary Table 1. Repetitive sequence tracts, including homopolymer, homopurine and homopyrimidine tracts, are highly represented in Table 2. Upon closer inspection, it became apparent why the cross-hybridization is strong for probes in homopolymeric regions. In these sequence contexts, substitution and deletion probes can form duplexes with wild-type target that are longer than 12 bp. For example, the probe designed to detect a single base deletion at position 633 is designed to form one 12 bp and one 13 bp duplex with wild-type target. However, this probe can form duplexes that range from 12 to 18 bp in length with wild-type sense strand target due to slippage (Figure 5). This type of ambiguity leads to increased stability of these DNA-RNA heteroduplexes (34). In principle, the homopurine and homopyrimidine tracts uncovered have the capacity to form higher order structures, such as triple helices (35). These tracts are known to alter the conformation and stabilities of RNA-DNA heteroduplexes (36, 37), such as those formed between RNA targets and DNA probes in our system. Finally, we expect the ATM target to be especially rich in such sequence tracts given that both strands of the 3′-splice acceptor sequences, typically containing homopyrimidine tracts, for all 62 coding exons are included in the ATM target. 
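The window criterion used to flag low-specificity tracts (at least 8 probes with a specificity ratio below 1.2 within a 10 base window) can be sketched as a simple scan. This is an illustrative reading of the criterion rather than the authors' code: the function name is hypothetical, the input is assumed to be the per-base specificity ratios for one mismatch probe class on one strand, and windows are reported by their zero-based start index.

```python
def poor_specificity_windows(ratios, window=10, threshold=1.2, min_poor=8):
    """Return start indices of windows dominated by poorly specific probes.

    A probe counts as "poor" when its PM/MM specificity ratio is below
    `threshold`; a window is flagged when at least `min_poor` of its
    `window` probes are poor.
    """
    poor = [ratio < threshold for ratio in ratios]
    flagged = []
    for start in range(len(ratios) - window + 1):
        # Booleans sum as 0/1, giving the number of poor probes in the window.
        if sum(poor[start:start + window]) >= min_poor:
            flagged.append(start)
    return flagged


# Hypothetical profile: a run of 8 poorly discriminated bases (ratio 1.1)
# embedded in well-discriminated sequence (ratio 3.0).
profile = [3.0] * 5 + [1.1] * 8 + [3.0] * 7
flagged_starts = poor_specificity_windows(profile)
```

Overlapping flagged windows would normally be merged into a single tract before reporting; that step is omitted here for brevity.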
The inclusion of these splice acceptor sequences increases the likelihood that highly related sequence tracts in the ATM target can cross-hybridize to probes interrogating a particular homopurine or homopyrimidine sequence tract and reduce the overall hybridization specificity in this region. Next, we screened for potential structures that can form in the PM probes listed in Table 2 or their targets that could explain their poor hybridization specificity. To do this, we used Mfold (38) to calculate Gibbs free energies for intramolecular structures that can form in these PM probes and targets. Based on these Gibbs free energy values, we classified the probes and targets as having strong (S) [ΔG < −3 kcal/mol], medium (M) [−1 kcal/mol > ΔG > −3 kcal/mol] and weak (W) [ΔG > −1 kcal/mol] potential for secondary structure. We found that several target and probe sequences could form substantial secondary structures, as displayed in Figure 6. This could artificially lower the affinity of target to PM probes and thus lower the hybridization specificity. It is more difficult to model intermolecular structure in the solution-phase complex target and in the solid-phase oligonucleotide probes. However, it appears likely that such structures could also have a similar negative impact on hybridization specificity.
The relative variability in hybridization specificity ratios across samples (inter-sample variability) represents another important issue that should be considered in resequencing analysis (9). To uncover general trends in inter-sample variability for each type of mismatch probe, we calculated an average cv for mismatch probe hybridization specificity ratios determined over running windows of 10 bases (Table 1). Interestingly, on both strands, the single base substitution probes showed the lowest inter-sample cv. The one and two base deletion probes showed at least 2-fold higher coefficients of variation on both strands, relative to the substitution probes. 
Surprisingly, the one base insertion probes showed significantly higher coefficients of variation than any of the other classes of mismatch probes across samples. In fact, they are 3.5-fold higher than the corresponding substitution probes on each strand. The relative levels of inter-sample variation for all mismatch probes across exon 50 are displayed graphically in Figure 4. The error bars represent one standard deviation from the mean of the hybridization specificity ratio determined over a running window of 10 bases in each of the 68 samples. Note that the substitution probes show lower inter-sample variability than one base deletion, two base deletion and one base insertion probes, in agreement with Table 1.
Figure 4. Hybridization specificities of mismatch probes. A 10-base running window of the log10 hybridization specificity ratios of substitution (red), one base deletion (green), two base deletion (blue) and one base insertion (black) was plotted for the sense (A) and antisense (B) strands of ATM exon 50. The light red, light green, light blue and gray shaded areas represent ±1 SD of the log10 hybridization specificity ratios for the substitution, one base deletion, two base deletion and one base insertion probes, respectively.
The variability in hybridization specificity measurements is consistent across all 62 ATM coding exons (Supplementary Figure 1). Overall, our analyses indicate that, on average, single base insertion probes show substantially lower reproducibility across experiments than base substitution, one base deletion and two base deletion probes. The increased inter- and intra-sample variability in hybridization specificity of single base insertion and deletion probes relative to single base substitution and two base deletion probes should be considered when designing and interpreting microarray-based screens for genetic variation. 
For a given microarray design, substantially more control hybridization experiments may be needed to determine baseline fluctuations in the hybridization specificities of insertion and deletion probes relative to those of substitution probes. In contrast to single nucleotide mismatches, detailed thermodynamic analyses of double helical nucleic acids with bulged nucleotides have only recently been conducted (34, 39–41). In such cases, the bulged nucleotide is unpaired on only one of the nucleic acid strands. These studies are relevant to understanding the properties of the deletion and insertion probes since they can form duplexes containing bulges with target nucleic acid. For deletion probes, the bulged nucleotide is located on the target strand (Figure 7). Conversely, the insertion probes contain the bulged nucleotide in duplexes with wild-type target (Figure 7). Although subject to sequence context effects, duplexes containing a single base bulge are predicted to be more stable than those containing single nucleotide mismatches (34, 39–41). This is reflected in the lower average hybridization specificity of single base deletion and insertion probes relative to that of substitution probes (Table 1 and Figure 4). Conversely, duplexes containing two base bulges are predicted to be generally less stable than those containing a single base mismatch (40, 41). In part, this is due to the assumption that helical stacking is interrupted by bulges of two or greater bases in length while it is preserved for one base bulges (40, 41). The higher average hybridization specificity ratios of two base deletion probes relative to substitution probes are in agreement with the predicted properties of these probes (Table 1).
Table 2 footnotes: Mfold (38) was used to predict the intramolecular structures with the lowest Gibbs free energy (ΔG) for either the 25-30 base stretches that encompass each listed sequence tract in the target or for the PM probes complementary to each sequence tract. We use these ΔG values to predict the stability of these structures. ΔG > −1 kcal/mol = weak (W); −1 kcal/mol > ΔG > −3 kcal/mol = medium (M); and ΔG < −3 kcal/mol = strong (S). (c) Type of mismatch probe that provided poor hybridization specificity ratios. (d) Low hybridization specificity found on both sense and antisense strands. (e) Immediately following the 3′ end of this segment is a (T)5 sequence tract.
The considerably lower average inter-sample variability of substitution probes relative to deletion and insertion probes was unexpected given that the same target was hybridized to all mismatch probes simultaneously in the same experiment. The sources of inter-sample variation include sample preparation, hybridization conditions and the microarrays themselves. It is reasonable to assume that the microarrays themselves are not the major source of variability since the combinatorial manufacturing processes should lead to roughly equivalent synthesis quality for all the arrayed probes (42, 43). It seems more likely that the insertion and deletion probes are more sensitive to subtle changes in target preparation (e.g. amount of fragmentation and dye incorporation) and hybridization conditions (e.g. target concentration, temperature and wash conditions) than the substitution probes. However, a definitive explanation for our observations will require further investigations (44–52). In addition to their potential value, it is important to note some of the caveats when relying upon mismatch probes for mutation detection. For example, it is important to screen for all possible sequence changes, including multiple base insertions and deletions, in mutational analyses of disease-related loci, such as the ATM, BRCA1 and BRCA2 genes. 
Given that 4^N probes per base per strand are needed to screen for insertions of length N in a mixed sequence, it is unlikely that oligonucleotides complementary to insertions of two or more base pairs will be represented on microarrays screening large sequence tracts for mutations in the near future. Deletions represent a more tenable situation since only one probe per base per strand is needed to screen for a deletion of a given length in a mixed sequence. Nevertheless, there will still be limitations as to the number of deletion probes that can be realistically represented in a given microarray. Finally, it is often critical to precisely determine the nature of a sequence change within a given sample in order to properly assess its functional significance. Thus, it is important to consider error rates when assigning the identity of a mutation based on mismatch probe data. When dealing with clinical samples, it will be especially important to confirm the identity
1. Sequence variation in genes and genomic DNA: methods for large-scale analysis
2. Analysis of SNPs and other genomic variations using gel-based chips
3. Mutational analysis using oligonucleotide microarrays
4. Resequencing and mutational analysis using oligonucleotide microarrays
5. New developments in high-throughput resequencing and variation detection using high density microarrays
6. Rapid p53 sequence analysis in primary lung cancer using an oligonucleotide probe array
7. Characterization of single-nucleotide polymorphisms in coding regions of human genes
8. Accessing genetic information with high-density DNA arrays
9. High-throughput variation detection and genotyping using microarrays
10. Flexible use of high-density oligonucleotide arrays for single-nucleotide polymorphism discovery and validation
11. Evolutionarily conserved sequences on human chromosome 21
12. Extensive polymorphisms observed in HIV-1 clade B protease gene using high-density oligonucleotide arrays
13. Large-scale discovery and genotyping of single-nucleotide polymorphisms in the mouse
14. The Human MitoChip: a high-throughput sequencing microarray for mitochondrial mutation detection
15. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21
16. Clinical application of oligonucleotide probe array for full-length gene sequencing of TP53 in colon cancer
17. Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome
18. Identification of TP53 mutations in human cancers using oligonucleotide microarrays
19. Evaluation of the performance of a p53 sequencing microarray chip using 140 previously sequenced bladder tumor samples
20. DNA analysis and diagnostics on oligonucleotide microchips
21. Tracking the evolution of the SARS coronavirus using high-throughput, high-density resequencing arrays
22. Genomic DNA insertions and deletions occur frequently between humans and nonhuman primates
23. Segmental phylogenetic relationships of inbred mouse strains revealed by fine-scale analysis of sequence variation across 4.6 mb of mouse genome
24. Cystic fibrosis mutation detection by hybridization to light-generated DNA probe arrays
25. Detection of heterozygous mutations in BRCA1 using high density oligonucleotide arrays and two-colour fluorescence analysis
26. Enhanced high density oligonucleotide array-based sequence analysis using modified nucleoside triphosphates
27. Strategies for mutational analysis of the large multiexon ATM gene using high-density oligonucleotide arrays
28. Two color hybridization analysis using high density oligonucleotide arrays and energy transfer dyes
29. Oligonucleotide microarray based detection of repetitive sequence changes
30. Oligonucleotide microarrays demonstrate the highest frequency of ATM mutations in the mantle cell subtype of lymphoma
31. The MLH1 D132H variant is associated with susceptibility to sporadic colorectal cancer
32. ATM and related protein kinases: safeguarding genome integrity
33. ATM gene and lymphoid malignancies
34. Thermodynamic parameters for an expanded nearest-neighbor model for the formation of RNA duplexes with single nucleotide bulges
35. Structural features and stability of an RNA triple helix in solution
36. Solution structures of DNA.RNA hybrids with purine-rich and pyrimidine-rich strands: comparison with the homologous DNA and RNA duplexes
37. Sequence specific thermodynamic and structural properties for DNA.RNA duplexes
38. Mfold web server for nucleic acid folding and hybridization prediction
39. The effect of base sequence on the stability of RNA and DNA single base bulges
40. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure
41. Prediction of hybridization and melting for double-stranded nucleic acids
42. Light-directed, spatially addressable parallel chemical synthesis
43. High-density genechip oligonucleotide probe arrays
44. Improving the sensitivity and specificity of gene expression analysis in highly related organisms through the use of electronic masks
45. Prioritized selection of oligodeoxyribonucleotide probes for efficient hybridization to RNA transcripts
46. Thermodynamic calculations and statistical correlations for oligo-probes design
47. OligoArray 2.0: design of oligonucleotide probes for DNA microarrays using a thermodynamic approach
48. A model of molecular interactions on short oligonucleotide microarrays
49. Probe selection for high-density oligonucleotide arrays
50. Modeling of DNA microarray data by using physical properties of hybridization
51. High-density nucleoside analog probe arrays for enhanced hybridization
52. Sequence-independent and linear variation of oligonucleotide DNA binding stabilities
We would like to thank Nathaniel Hunt at the National Institutes of Health for programming assistance in the early stages of the project and Juergen Reichardt at the University of Southern California for thoughtful discussion. This work was partially funded by National Institutes of Health Grants P50-HG002790 and P30-CA014089. 
Funding to pay the Open Access publication charges for this article was provided by a USC Institute for Genetic Medicine gift account. Supplementary Material is available at NAR Online.