Triplex and other DNA motifs show motif-specific associations with mitochondrial DNA deletions and species lifespan

Authors
Kamil Pabis1
1. Georg August University of Göttingen, Göttingen, Germany.
Mail: Kamil.pabis@gmail.com

ABSTRACT
The “theory of resistant biomolecules” posits that long-lived species show resistance to molecular damage at the level of their biomolecules. Here, we test this hypothesis in the context of mitochondrial DNA (mtDNA) as it implies that predicted mutagenic DNA motifs should be inversely correlated with species maximum lifespan (MLS).

First, we confirmed that guanine-quadruplex and direct repeat (DR) motifs are mutagenic, as they associate with mtDNA deletions in the human major arc of mtDNA, while also adding mirror repeat (MR) and intramolecular triplex motifs to a growing list of potentially mutagenic features. What is more, triplex motifs showed disease-specific associations with deletions and an apparent interaction with guanine-quadruplex motifs.

Surprisingly, even though DR, MR and guanine-quadruplex motifs were associated with mtDNA deletions, their correlation with MLS was explained by the biased base composition of mtDNA. Only triplex motifs negatively correlated with MLS even after adjusting for body mass, phylogeny, mtDNA base composition and effective number of codons.

Taken together, our work highlights the importance of base composition for the comparative biogerontology of mtDNA and suggests that future research on mitochondrial triplex motifs is warranted.

bioRxiv preprint. doi: https://doi.org/10.1101/2020.11.13.381475; this version posted January 5, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

ABBREVIATIONS
BPs, mtDNA deletion break points
DR, direct repeats
ER, everted repeats
GQ, guanine-quadruplexes
IR, inverted repeats
MLS, species maximum lifespan
MR, mirror repeats
nBMST, non-B DNA motif search tool
Nc, number of effective codons
PGLS, phylogenetic generalized least squares
SD, standard deviation
Trip, Triplex forming motif
XR, any repeat half-site or motif
mtDNA, mitochondrial DNA

INTRODUCTION
Macromolecular damage to lipids, proteins and DNA accumulates with aging (Richardson and Schadt 2014, Gladyshev 2013), whereas cells isolated from long-lived species are resistant to genotoxic and cytotoxic drugs, giving rise to the multistress resistance theory of aging (Miller 2009, Hamilton and Miller 2016). By extension of this idea, the “theory of resistant biomolecules” posits that lipids, proteins and DNA itself should be resilient in long-lived species (Pamplona and Barja 2007). In support of this theory, it was shown that long-lived species possess membranes that contain fewer lipids with reactive double bonds (Valencak and Ruf 2007) and perhaps a lower content of oxidation-prone cysteine and methionine in mitochondrially encoded proteins (see Aledo et al. 2012 for a discussion).

Mitochondrial DNA (mtDNA) mutations constitute one type of macromolecular damage that accumulates over time. Point mutations accumulate in proliferative tissues like the colon and in some progeroid mice (Kauppila et al. 2017), while the accumulation of mtDNA deletions in postmitotic tissues may underpin certain age-related diseases like Parkinson’s and sarcopenia (Lawless et al. 2020, Bender et al. 2006).

If the theory of resistant biomolecules can be generalized, the mtDNA of long-lived species should resist both point mutation and deletion formation. However, we will focus on deletions because they are more pathogenic than point mutations at the same level of heteroplasmy (Gammage et al. 2014) and because human tissues do not accumulate the high levels of point mutations observed in progeroid mouse models (Khrapko et al. 2006).

Since deletion formation depends on the primary sequence of the mtDNA (sequence motifs), it is amenable to bioinformatic methods. Ever since a link between direct repeat (DR) motifs and deletion formation became known, variations of the theory of resistant biomolecules have been tested, although not necessarily under this name. It was reasoned that long-lived species evolved to resist deletion formation and mtDNA instability by reducing the number of mutagenic motifs in their mtDNA (Khaidakov et al. 2006, Yang et al. 2013).

We aim to extend these findings by re-evaluating and establishing new candidate motifs, which we then correlate with species maximum lifespan (MLS). Studying multiple motif classes at once also allows us to reveal relationships between potentially overlapping mtDNA motifs that may affect the data. We define candidate motifs as those that are associated with deletion formation inside the major arc of human mtDNA, because during asynchronous replication the major arc is single stranded for extended periods of time (Persson et al.
2019), which should favor the formation of secondary structures. Finally, we test if these motifs correlate with the MLS of mammals, birds and ray-finned fishes after correcting for potential biases, especially global mtDNA base composition, which is an important confounder (Aledo et al. 2012) yet is neglected in some studies (Yang et al. 2013).

The choice of motifs to study is based on biological plausibility and published literature that will be briefly reviewed below. Mutagenic motifs include repeats as well as guanine-quadruplex (GQ)- and triplex-forming motifs. DR motifs can lead to DNA instability through strand-slippage if two DR motifs mispair during replication (Persson et al. 2019). Inverted repeat (IR), G-quadruplex and triplex motifs instead destabilize progression of the replication fork through the formation of stable secondary structures. Some of the structures formed include hairpins for IR motifs (Tremblay-Belzile et al. 2015), triple stranded DNA for triplex motifs and bulky stacks of guanines for G-quadruplex motifs (Bacolla et al. 2016; Fig. 1). Mirror repeat (MR) and everted repeat (ER) motifs, in contrast, do not allow stable Watson-Crick base pairing and are thus less likely to be mutagenic, although a subset of MR motifs may form triplex structures (Kamat et al. 2016).

Thus, many motifs can be mutagenic in principle, but what is the evidence that these motifs are related to mtDNA instability, particularly deletions, and MLS?

Paradoxically, while DRs are the motif most consistently associated with mtDNA deletion breakpoints (BPs), despite preliminary reports (Khaidakov et al.
2006, Lakshmanan et al. 2012, Yang et al. 2013), no correlation with species MLS was seen in recent studies (Lakshmanan et al. 2015). In contrast, with the exception of one preprint (Mikhailova et al. 2020), IRs are not known to be associated with mtDNA deletions (Dong et al. 2014), although they do show a negative relationship with species MLS (Yang et al. 2013) and may contribute to inversions (Tremblay-Belzile et al. 2015). Whether age-related mtDNA inversions underlie any pathology, however, requires further study. Finally, G-quadruplex motifs are associated with both deletions (Dong et al. 2014) and point mutations (Butler et al. 2020), but no study tested if they correlate with MLS. Triplex motifs are poorly studied, with one report finding no association between these motifs and deletions (Oliveira et al. 2013).

Based on these studies we decided to test the theory of resistant biomolecules by quantifying DR, MR, IR, ER, G-quadruplex- and triplex-forming motifs. We stipulate that if a motif class played a causal role in aging, it should be involved in deletion formation and its abundance should be negatively correlated with species MLS.

Figure 1
A. Direct repeat, both half-sites have the same orientation.
B. Inverted repeat, the half-sites are complementary and have mirror symmetry.
C. Everted repeat, the half-sites are complementary.
D. Mirror repeat, the half-sites have mirror symmetry.
E. Triplex motifs can form a triple helical DNA structure also called H-DNA.
F. In a G-quadruplex multiple G-quartets (depicted as blue rectangles) stack on top of each other.
Adapted from Gurusaran et al. (2013) and Khristich and Mirkin (2020) with permission. Half-sites shown in red.

METHODS
Detection of DNA motifs
Repeats were detected by a script written in R (v3.6.3). Briefly, to find all repeats with N basepairs (bps), the mtDNA light strand is truncated by 0 to N bps and each of the N truncated mtDNAs is then split every N bps. This generates every possible substring (and thus repeat) of length N. In the next step, duplicate strings are removed. Afterwards we can find DR (a substring with at least two matches in the mtDNA), MR (at least one match in the mtDNA and on its reverse), IR (at least one match in the mtDNA and on its reverse-complement) and ER motifs (at least one match in the mtDNA and on its complement). Overlapping and duplicate repeats were not counted for the correlation between repeats and MLS. The code for the analyses performed in this paper can be found on GitHub (pabisk/aging_triplex2).

Unless stated otherwise, all analyses were performed in R. G-quadruplex motifs were detected by the pqsfinder package (v2.2.0, Hon et al. 2017). Intramolecular triplex-forming motifs were detected by the triplex package (v1.26.0, Hon et al. 2013) and duplicates were removed. We also compared the data with two other publicly available tools, Triplexator (Buske et al. 2013) and the non-B DNA motif search tool (nBMST; Cer et al. 2011). Triplexator was run on a virtual machine in an Oracle VM VirtualBox (v6.1) in -ss mode on the human mitochondrial genome and its reverse complement; the results were combined and overlapping motifs from the output were removed. We used the web interface of nBMST to detect mirror repeats/triplexes (v1.0).

Association between motifs and major arc deletions
The major arc was defined as the region between position 5747 and 16500 of the human mtDNA (NC_012920.1).
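
To make the four repeat definitions given under "Detection of DNA motifs" concrete, the following is a minimal Python sketch, not the author's R script. It enumerates substrings with a sliding window, which yields the same set of substrings as the truncate-and-split procedure, and it does not filter overlapping half-sites. The function name `classify_repeats`, the rule that a substring only pairs with itself when it occurs twice, and the toy sequence are all assumptions made for illustration.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def classify_repeats(seq, n):
    """Group the distinct length-n substrings of seq into DR, MR, IR and ER."""
    # A sliding window yields the same substrings as truncating and splitting.
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))

    def has_partner(kmer, partner):
        # Assumption: a substring only pairs with itself if it occurs twice.
        if partner == kmer:
            return counts[kmer] >= 2
        return partner in counts

    classes = {"DR": set(), "MR": set(), "IR": set(), "ER": set()}
    for kmer in counts:
        reverse = kmer[::-1]
        complement = kmer.translate(COMPLEMENT)
        reverse_complement = complement[::-1]
        if counts[kmer] >= 2:                      # same sequence, same strand
            classes["DR"].add(kmer)
        if has_partner(kmer, reverse):             # match on the reversed sequence
            classes["MR"].add(kmer)
        if has_partner(kmer, reverse_complement):  # match on the reverse complement
            classes["IR"].add(kmer)
        if has_partner(kmer, complement):          # match on the complement
            classes["ER"].add(kmer)
    return classes


toy = "AAGCTCGAATAAGC"  # made-up sequence, not mtDNA
repeats = classify_repeats(toy, 4)
print(sorted(repeats["DR"]))
print(sorted(repeats["ER"]))
```

In this toy sequence the only direct repeat is AAGC, while AAGC and CGAA form a mirror-repeat pair. This shows how one sequence can belong to more than one repeat class, the situation treated later as "hybrid" motifs.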
The following deletions and their breakpoints were located in this region and included: 1066 deletions from the MitoBreak database (Damas et al. 2014, mtDNA Breakpoints.xlsx), 1114 from Persson et al. (2019) and 1894 from Hjelm et al. (2019).

Each deletion is defined by two breakpoints. A breakpoint pair was considered to associate with a motif if the motif fell within a defined window around one or both breakpoints, depending on the analysis. The window size was chosen in relation to the length of the studied motifs (30 bp for repeats and 50 bp for other motifs).

Three different motif orientations relative to the breakpoints were considered. Two orientations applied to motifs with half-sites (i.e. repeats): either both half-sites at any one breakpoint of a deletion, or one half-site per breakpoint of a deletion. Motifs with overlapping half-sites were not counted. In the third case, distinct G-quadruplex and triplex motifs could associate with one or both breakpoints of a deletion, but were at most counted once, since the latter case is sufficiently rare.

In order to exclude overlapping “hybrid” motifs, MR and DR motifs with the same sequence were removed, whereas triplex and G-quadruplex motifs were removed if they were in proximity.

To generate controls, the mtDNA deletions as a whole were randomly redistributed inside the major arc which, because of the fixed deletion size, allowed us to approximate the original distribution of breakpoints (as suggested by Oliveira et al. 2013).
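
The breakpoint association and the size-preserving controls described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the published R code. The window is treated as a maximum distance between a breakpoint and a motif midpoint, which the text does not specify. The helper names `count_associated` and `shuffle_deletions`, the deletion coordinates and the motif positions are all hypothetical.

```python
import random

MAJOR_ARC = (5747, 16500)  # major arc coordinates as defined in the text


def near(position, motif_midpoints, window):
    """True if any motif midpoint lies within `window` bp of the position."""
    return any(abs(position - m) <= window for m in motif_midpoints)


def count_associated(deletions, motif_midpoints, window):
    """Count deletions with a motif near one or both breakpoints (once each)."""
    return sum(
        1
        for start, end in deletions
        if near(start, motif_midpoints, window) or near(end, motif_midpoints, window)
    )


def shuffle_deletions(deletions, rng, arc=MAJOR_ARC):
    """Move each deletion to a random place in the arc, keeping its size."""
    lo, hi = arc
    shuffled = []
    for start, end in deletions:
        size = end - start
        new_start = rng.randint(lo, hi - size)  # deletion must fit inside the arc
        shuffled.append((new_start, new_start + size))
    return shuffled


# Hypothetical inputs for illustration only.
deletions = [(8470, 13447), (6000, 9000)]
motif_midpoints = [8480, 13400]

observed = count_associated(deletions, motif_midpoints, window=50)
rng = random.Random(1)
control_counts = [
    count_associated(shuffle_deletions(deletions, rng), motif_midpoints, window=50)
    for _ in range(20)  # 20 randomized controls, as in the text
]
print(observed, control_counts)
```

Because each deletion keeps its length when it is moved, the two shuffled breakpoints stay coupled. That coupling is why the controls approximate the original breakpoint distribution more closely than shuffling single positions would.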
Significance was determined via one-sample t-test in Prism (v7.04) by comparing actual breakpoints to 20 such randomized controls. Alternative controls were generated by shifting each breakpoint by 200 bp towards the midpoint of the major arc or as in Fig. S9.

Cancer associated breakpoints
We obtained all autosomal breakpoints available from the Catalogue Of Somatic Mutations In Cancer (COSMIC; release v92, 27th August 2020), which includes deletions, inversions, duplications and other abnormalities (n=587515 in total). After removing breakpoints whose sequences could not be retrieved (<1.7%), we quantified the number of predicted G-quadruplex and triplex motifs in a 500 bp window centered on the breakpoints using default settings for the detection of these motifs. Sequences of breakpoint regions were obtained from the GRCh38 build of the human genome using the BSgenome package (v1.3.1). Each breakpoint shifted by +3000 bps served as its own control.

Lifespan, base composition and life history traits
We included three phylogenetic classes in our analysis for which we had sufficient data (n>100): mammals, birds and ray-finned fishes (Actinopterygii). MLS and body mass were determined from the AnAge database (Tacutu et al. 2018) and, for mammals, supplemented with data from Pacifici et al. (2013). The mtDNA accessions were obtained from an updated version of MitoAge (unpublished; Toren et al. 2016). Species were excluded if body mass data was unavailable, if the sequence could not be obtained using the genbankr package (v1.14.0), or if the extracted cytochrome B DNA sequence did not allow for an alignment, precluding phylogenetic correction. The species data can be found in the supplementary (Species Data.xlsx).

We analyzed the full mtDNA sequence, heuristically defined as the mtDNA sequence between the first and last encoded tRNA, excluding the D-loop, which is rarely involved in repeat-mediated deletion formation (Yang et al. 2013). The effective number of codons was calculated using Wright’s Nc (Smith et al. 2019). Base composition was calculated for the light-strand. GC skew was calculated as the fraction (G − C)/(G + C) and AT skew as (A − T)/(A + T). All correlations are Pearson’s R. Partial correlations were performed using the ppcor package (v1.1).

Phylogenetic generalised least squares and phylogenetic correction
Observed correlations between traits and lifespan can be spurious due to shared species ancestry (Speakman 2005). To correct for this, we use phylogenetic generalised least squares (PGLS) implemented in the caper package (v1.0.1). Species phylogenetic trees were constructed via neighbor joining based on aligned cytochrome B DNA sequences using Clustal Omega from the msa package (v1.18.0). In the resulting mammalian and bird tree, four branch edge lengths were equal to zero; these were set to the lowest non-zero value in the dataset.

RESULTS
Direct repeats and mirror repeats are over-represented at mtDNA deletion breakpoints
In order to define candidate mtDNA motifs that could be linked with lifespan, we started by reanalyzing motifs that associate with mtDNA deletion breakpoints reported in the MitoBreak database (Damas et al. 2014; Fig. S1; mtDNA Breakpoints.xlsx).
In the below analysis, we consider DR and IR motifs, thought to be mutagenic, as well as MR and ER motifs, so far not known to be mutagenic, and we pool all 6 to 15 bp long repeats, since the data is similar between different repeat lengths (Fig. S2).

As shown by others, we found that DR motifs often flank mtDNA deletions (Fig. 2A). In contrast, no strong association was seen for ER and IR motifs, even considering a larger window around the breakpoint to allow for the fact that IRs could bridge and destabilize mtDNA over long distances (Persson et al. 2019; Fig. S3).

Surprisingly, we also found MR motifs flanking deletion breakpoints more often than expected by chance (Fig. 2A). However, DR and MR motifs are known to correlate with each other (Shamanskiy et al. 2019; Fig. 5B) and indeed we noticed a large sequence overlap between MR and DR motifs (Fig. 2B), which could explain an apparent over-representation of MRs at breakpoints. Removal of overlapping MR-DR hybrid motifs confirmed this suspicion. After this correction, the degree of enrichment was strongly attenuated (Fig. 2C) and the total number of breakpoints flanked by MR motifs was reduced by >80%. Nevertheless, long MR motifs remained particularly over-represented around deletions (Fig. S4).

Since the prior analysis only considered motifs that flank both breakpoints, we next tested the idea that IR and other motifs could be mutagenic if both half-sites are found at any of the breakpoints. However, in this analysis no motif class showed enrichment around breakpoints (Fig. 2D).

Figure 2
Direct repeat (DR) and mirror repeat (MR) motifs are significantly enriched around actual deletion breakpoints (BPs) compared to reshuffled BPs, but the same is not true for inverted repeat (IR) and everted repeat (ER) motifs (A, D). The surprising correlation between MR motifs and deletion BPs is attenuated when MRs that have the same sequence as DR motifs are removed (B, C). Controls were generated by reshuffling the deletion BPs while maintaining their distribution (n=20, mean ±SD shown). The schematic drawings above (A, D) depict the orientation of the repeat (XR) half-sites in relation to the BPs. *** p < 0.001; ** p < 0.01 by one sample t-test.
A) The number of deletions associated with DR, MR, IR or ER motifs at both BPs compared with reshuffled controls.
B) Venn diagram showing the number of MR, DR and hybrid MR-DR motifs that were identified within the major arc.
C) The number of deletions associated with MR motifs, before (MR) and after removal of hybrid MR-DR motifs (MRDR-), compared with reshuffled controls.
D) The number of deletions associated with DR, MR, IR or ER motifs at either BP compared with reshuffled controls.

Predicted triplex-forming motifs are over-represented at mtDNA breakpoints
Given the association between MR motifs and breakpoints we decided to analyze triplex motifs, a special case of homopurine and homopyrimidine mirror repeats (Khristich and Mirkin 2020, Bissler 2007), and their association with deletion breakpoints in the MitoBreak database.

Here, we use the triplex package to predict intramolecular triplex motifs because it has several advantages compared to other software (Hon et al. 2013). For example, using the nBMST tool, as in a previous study of mtDNA instability (Oliveira et al. 2013), we only identified two potential triplex motifs within the major arc that did not overlap with the six motifs identified by the triplex package (Table S1). In contrast, using Triplexator (Buske et al. 2013) we were able to detect four of the six triplex motifs and the motifs detected by Triplexator were also enriched at breakpoints (Table S2).

We noticed that predicted triplexes are G-rich and thus could be related to G-quadruplex motifs (Doluca et al. 2013). In a comparison of the two motif types, however, we found several differences (Table S1, S3). Triplex motifs were shorter and less abundant than predicted G-quadruplexes and associated with fewer breakpoints altogether (Fig. 3). In addition, in contrast to G-quadruplexes, which were almost exclusive to the G-rich mtDNA heavy-strand, triplex motifs were also common on the light-strand.

The six triplex motifs detected by the triplex package were significantly enriched around deletion breakpoints and when we excluded triplex-G-quadruplex hybrid motifs the result was attenuated but remained significant (Fig. 3A). Given the higher risk of spurious findings with only six motifs, we repeated the analysis using a relaxed definition of triplex and the results were fundamentally unchanged (Fig. 3B). Furthermore, our results were not sensitive to reasonable changes in the size of the search window around breakpoints (Fig. S5A, B), motif quality scores (Fig. S5C, D) or inclusion of overlapping motifs (Fig. S5E-G).

Analogous to the situation with MR motifs, we tested if overlapping triplex-DR hybrid motifs could bias our results. Given the rarity of triplex motifs and the many DRs in the mitochondrial genome, we chose an alternative approach rather than excluding triplex motifs that overlapped any DR half-site. We compared the fraction of triplex and G-quadruplex positive deletions associated with DRs (GQ+, DR+ and Trip+, DR+) and not associated with DRs (GQ+, DR- and Trip+, DR-). We considered a deletion to be DR+ if both breakpoints were flanked by the same DR sequence. In this case, only 44% of Trip+ deletions associated with DRs whereas 66% of GQ+ deletions did (Table S4).

Figure 3
Triplex motifs are significantly enriched around actual breakpoints (BPs) compared to reshuffled BPs (A, B) even after removal of G-quadruplex (GQ)-triplex hybrid motifs (TripGQ-). The number of unique triplex motifs, GQ motifs and of hybrid triplex-GQ motifs, within the mtDNA major arc, is shown in the Venn diagrams above (A, B). Enrichment of GQ motifs around BPs is shown for comparison in (C). Controls were generated by reshuffling the deletion BPs while maintaining their distribution (n=20, mean ±SD shown). The schematic drawing above (C) depicts the orientation of the GQ and triplex motifs (XR) in relation to the BPs. *** p < 0.0001 by one sample t-test.
A) The number of deletion BPs associated with triplex motifs compared with reshuffled controls. Analysis including (left side) or excluding triplex-GQ hybrid motifs (right side).
B) Same as (A) but with relaxed criteria for the detection of triplex motifs (min score=12) and GQ motifs (min score=26).
C) The number of deletion BPs associated with GQ motifs compared with reshuffled controls. Relaxed settings (left side, min score=26) and default settings (right side, min score=47).

Triplex forming motifs may be associated with mitochondrial disease breakpoints
Next, we sought to validate our findings on two recently published next generation sequencing datasets (Hjelm et al. 2019, Persson et al. 2019; mtDNA Breakpoints.xlsx; Table S5). We were able to confirm the enrichment of DR (Fig. S6A, S7A), MR (Fig. S6A, S7A) and G-quadruplex motifs (Fig. 4A, B; S6C, D) around deletion breakpoints. Additionally, we confirmed that hybrid MR-DR motifs are responsible in large part for the enrichment of MR motifs around breakpoints (Fig. S6B, S7B).

In contrast, we found that triplex motifs were not consistently enriched around breakpoints in the dataset of Hjelm et al. (Fig. S6C, D), which is based on post-mortem brain samples from patients without overt mitochondrial disease, whereas we saw enrichment in the dataset by Persson et al. (Fig. 4A, B), which is based on muscle biopsies from patients with mitochondrial disease. This unexpected discrepancy prompted us to take a second look at the MitoBreak data. In this dataset triplex motifs were significantly more enriched at breakpoints in the mtDNA single deletion subgroup compared to the healthy tissues subgroup (Fig. S8). In addition, we found more broadly that mitochondrial disease status might explain the heterogeneous results we have seen across datasets (Fig. 4C).

Further strengthening our findings, triplex motifs were enriched in the MitoBreak and Persson et al. dataset regardless of the breakpoint shuffling method chosen and of our statistical assumptions (Fig. S9).
What is more, triplex motifs were also enriched at breakpoints when we pooled all three datasets (Fig. 4D), although to a lesser extent.

Finally, G-quadruplex motifs close to triplex motifs were more strongly enriched at deletion breakpoints than solitary G-quadruplex motifs (Fig. 4E; Fig. S10), suggesting that triplex formation may further contribute to DNA instability.

Figure 4
In the Persson et al. (2019) dataset, triplex and G-quadruplex (GQ) motifs are enriched around deletion breakpoints (BPs), using either default (A) or relaxed scoring criteria (B). Although triplex motifs predominate in mitochondrial disease datasets (C), we also find that triplex motifs are significantly enriched around BPs (D) after pooling the data from MitoBreak, Persson et al. (2019) and Hjelm et al. (2019). Finally, GQ and triplex motifs show stronger enrichment around BPs than either of them in isolation (E). Controls were generated by reshuffling the deletion BPs while maintaining their distribution (n=20, mean ±SD shown). The schematic drawing above (D) depicts the orientation of the motifs (XR) in relation to the BPs. *** p<0.0001, **p<0.001 by one sample t-test.
A) The number of deletion BPs associated with GQ and triplex motifs compared with reshuffled controls (min score = default).
B) The number of deletion BPs associated with GQ and triplex motifs compared with reshuffled controls (min score = relaxed).
C) The number of deletion BPs associated with triplex motifs (relaxed settings, min score=12) stratified by mitochondrial disease status.
MitoBreak data includes single and multiple mitochondrial deletion syndromes.
D) The number of deletion BPs associated with triplex motifs, or with triplex motifs excluding triplex-GQ hybrid motifs (TripGQ-), compared with reshuffled controls. Default settings (left side, min score=15) and relaxed settings (right side, min score=12).
E) The fold-enrichment of GQ and triplex motifs around deletion BPs is shown. Motifs were considered overlapping if their midpoints were within 50 bp.

Repeats and lifespan: no support for the theory of resistant biomolecules
For our analysis, we focus on 11 bp long repeat motifs because short repeats are less likely to allow stable base pairing, longer repeats are rare (Fig. S11), and results considering repeat motifs of different lengths usually agree with each other (Table S6; Yang et al. 2013). To allow comparability with other studies (Lakshmanan et al. 2015) we analyzed non D-loop motifs, but results for major arc motifs are numerically similar (Table S7).

First, consistent with Yang et al. (2013) we found that IR motifs show a negative correlation with the MLS of mammals in the unadjusted model. In addition, we identified ER motifs, a class of symmetrically related repeats, that show an even stronger inverse relationship with longevity (Fig. 5A; Table 1). However, these inverse correlations vanished after taking into account body mass, base composition and phylogeny in a PGLS model (Table 1). Second, in agreement with Lakshmanan et al. (2015) we found that DR motifs do not correlate with the MLS of mammals.
The same was true for the symmetrically related MR motifs. Just as with IR motifs, modest inverse correlations vanished in the fully adjusted model (Table 1). We also found the same null results in two other vertebrate classes, birds and ray-finned fishes (Table S6). To gain hints as to causality, we finally tested if longer repeats, allowing more stable base pairing, show stronger correlations with MLS, but to our surprise we noticed the opposite (Fig. S12A-D).

Considering all four types of repeats together, we noticed that repeats with both half-sites on the same strand (DR and MR) or half-sites on opposite strands (IR and ER) were correlated with each other (Fig. 5B) and with the same mtDNA compositional biases (Fig. 5C). Thus, for DR and MR motifs, an apparent relationship with MLS may be explained by their inverse relationship with GC content and for IR and ER motifs by an inverse relationship with GC content and a positive relationship with GC skew.

Figure 5
The number of everted repeat (ER) motifs is negatively correlated with species MLS in an unadjusted analysis (A). Repeats with a similar orientation correlate with each other (B). Direct repeat (DR) and mirror repeat (MR) motifs have a similar orientation since both half-sites are found on the same strand, and in the case of ER and inverted repeat (IR) motifs the half-sites are on opposite strands. Finally, we show the major mtDNA compositional biases that co-vary with the four repeat classes (C) and may explain an apparent correlation with MLS. Data is for 11 bp long repeats and Pearson’s R is shown in (A-C).

Table 1.
Correlation between potentially mutagenic motifs and species lifespan

Motif     Type      Raw       Adjusted
DR11      11bp      -0.113    0.055
MR11      11bp      -0.155    -0.002
IR11      11bp      -0.336    0.105
ER11      11bp      -0.356    -0.047
triplex   default   -0.296    -0.211**
triplex   relaxed   -0.190    -0.127^
GQ        default   0.264     0.068
GQ        relaxed   0.283     -0.097**

The adjusted model takes into account body mass, GC content, GC skew, AT skew and number of effective codons. Significant correlations in the raw or adjusted model are bolded/underlined (p<0.05). The PGLS model additionally considers phylogeny. ^denotes p-values of 0.05
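
As a rough illustration of what the adjustment in Table 1 does, the sketch below computes a first-order partial correlation in plain Python. It is a simplification: the paper used the ppcor package in R with several covariates, whereas this example adjusts for a single covariate with the standard formula. The function names, variable names and toy numbers are assumptions for illustration and are not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of one covariate z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy values constructed so that the covariate drives both variables.
covariate = [1.0, 2.0, 3.0, 4.0]     # stands in for e.g. GC content
motif_count = [2.0, 1.0, 2.0, 5.0]   # hypothetical motif abundance
lifespan = [0.5, 3.5, 1.5, 4.5]      # hypothetical lifespan measure

raw = pearson(motif_count, lifespan)
adjusted = partial_correlation(motif_count, lifespan, covariate)
print(round(raw, 3), round(adjusted, 3))
```

In this constructed example the raw correlation of about 0.53 disappears once the shared covariate is accounted for. This is the same kind of attenuation that Table 1 reports for the repeat motifs after adjusting for base composition.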