key: cord-0297229-l5rsifc7 authors: Duveau, Fabien; Zande, Petra Vande; Metzger, Brian P. H.; Diaz, Crisandra J.; Walker, Elizabeth A.; Tryban, Stephen; Siddiq, Mohammad A.; Yang, Bing; Wittkopp, Patricia J. title: Mutational sources of trans-regulatory variation affecting gene expression in Saccharomyces cerevisiae date: 2021-02-22 journal: bioRxiv DOI: 10.1101/2021.02.22.432283 sha: c0d88457db7b456d3a5995d2a9718c4641fc2abe doc_id: 297229 cord_uid: l5rsifc7 Heritable variation in a gene’s expression arises from mutations impacting cis- and trans-acting components of its regulatory network, with expression variation often derived from trans-regulatory mutations within species. Here, we investigate how trans-regulatory mutations are distributed within the genome and within a gene regulatory network by identifying and characterizing 69 mutations with trans-regulatory effects on expression of the same focal gene in Saccharomyces cerevisiae. Relative to 1766 mutations without effects on expression of this focal gene, we found that these trans-regulatory mutations were enriched in coding sequences of transcription factors previously predicted to regulate expression of the focal gene. However, over 90% of the trans-regulatory mutations identified mapped to other types of genes involved in diverse biological processes including chromatin state, metabolism and signal transduction. Finally, we find that the genomic distribution of these new regulatory mutations significantly overlaps with the genomic distribution of trans-regulatory variants segregating within S. cerevisiae. The regulation of gene expression is a complex process, essential for cellular function, that 47 impacts development, physiology, and evolution. Expression of each gene is regulated by its 48 cis-regulatory DNA sequences (e.g., promoters, enhancers) interacting either directly or 49 indirectly with trans-acting factors (e.g. transcription factors, signaling pathways) encoded by 50 genes throughout the genome. Genetic variants affecting both cis-and trans-acting components 51 of regulatory networks contribute to expression differences within and between species ( Leeuwen et al., 2016) . However, the extent to which the genomic sources of trans-regulatory 79 mutations can be predicted from such networks is generally unknown (Flint & Ideker, 2019) . 80 Recently, a chemical mutagen was used to induce mutations throughout the genome of S. 81 cerevisiae, and hundreds of mutant genotypes were collected that all altered expression of the 82 same gene, providing the biological resources needed to systematically characterize properties 83 of new trans-regulatory mutations and to test the predictive power of inferred regulatory 84 networks. 85 86 Here, we use genetic mapping, candidate gene sequencing and functional validation to identify 87 69 trans-regulatory mutations that alter expression of the focal gene from this set of mutants 88 and contrast their properties with a comparable set of 1766 mutations that did not affect 89 expression of the focal gene. Using these data, we determined how these trans-regulatory 90 mutations were distributed within the genome and within regulatory networks. For example, we 91 asked how frequently trans-regulatory mutations were located in coding or non-coding 92 sequences because trans-regulatory variants are often predicted to affect coding sequences 93 (Hill et al., 2020) but some non-coding variants have been shown to be associated with trans-94 regulatory effects on gene expression (Consortium, 2020; Yao et al., 2017; Yvert et al., 2003) . 95 We also asked whether genes encoding transcription factors were the primary source of trans-96 regulatory variation, which is often assumed ( how well an inferred regulatory network can predict genomic sources of expression changes, we 100 mapped the trans-regulatory mutations to a network of transcription factors predicted by 101 functional genomic data to regulate expression of the focal gene and examined the molecular 102 functions and biological processes impacted by trans-regulatory mutations that did not map to 103 genes in this network. By systematically examining the properties and identity of new trans-104 regulatory mutations, this work fills a key gap in our understanding of how expression 105 differences arise and may help predict sources of trans-regulatory variation segregating in 106 natural populations. Indeed, we found that the genomic distribution of new trans-regulatory 107 mutations overlaps significantly with the genomic distribution of trans-regulatory variants 108 segregating among wild isolates of S. cerevisiae that affect expression of the same gene 109 (Metzger & Wittkopp, 2019) , suggesting that the mutational process generating new trans-110 regulatory variation significantly shaped the regulatory variation we see in the wild. 111 114 To characterize properties of new trans-regulatory mutations affecting expression of a focal 115 gene, we took advantage of three previously collected sets of haploid mutants that all showed 116 altered expression of the same reporter gene ( Figure 1A , Gruber et al., 2012; Metzger et al., 117 2016). This reporter gene (PTDH3-YFP) encodes a yellow fluorescent protein whose expression 118 is regulated by the S. cerevisiae TDH3 promoter, which natively drives constitutive expression 119 of a glyceraldehyde-3-phosphate dehydrogenase involved in glycolysis and gluconeogenesis 120 (McAlister & Holland, 1985) . Mutations in these mutants were caused by exposure to the 121 chemical mutagen ethyl methanesulfonate (EMS), which induces primarily G:C to A:T point 122 mutations randomly throughout the genome (Shiwa et al., 2012) . Together, these collections 123 contain ~1500 mutants isolated irrespective of their fluorescence levels ("unenriched" mutants) 124 and ~1200 mutants isolated after enriching for cells with the largest changes in fluorescence 125 ( Figure 1A ). When we started this work, expression level of PTDH3-YFP in these mutant 126 genotypes had been described (Gruber et al., 2012; Metzger et al., 2016) , but the specific 127 mutations present within each mutant as well as which mutation(s) alter(s) PTDH3-YFP 128 expression in each genotype were unknown. 129 130 From these collections, we selected 82 EMS-treated mutants for genetic mapping to identify 131 individual causal mutations ( Figure 1A ). Sanger sequencing of the reporter gene in these 132 mutants showed that none had mutations in the TDH3 promoter or any other part of the reporter 133 gene, indicating that they harbored mutations affecting PTDH3-YFP expression in trans. 39 of 134 these mutants were selected based on previously published fluorescence data, with 11 mutants 135 selected from the collections enriched for large effects (red points in Figure 1B ,C) and 28 136 mutants selected from the unenriched collection (red points in Figure 1D ). Each selected mutant 137 showed changes in average YFP fluorescence greater than 1% relative to the un-mutagenized 138 progenitor strain. Another 197 mutants from the unenriched collection (blue points in Figure 1D ) 139 were subjected to a secondary fluorescence screen, from which an additional 43 mutants with a 140 change in fluorescence greater than 1% (red points in Figure 1E ) were chosen. A 1% change in 141 YFP fluorescence has previously been shown to correspond to a ~3% change in YFP mRNA 142 abundance (see Methods and Duveau et al., 2018) , although changes in fluorescence caused 143 by trans-regulatory mutations in these mutants could affect either transcription driven by the 144 TDH3 promoter or post-transcriptional regulation of YFP synthesis or stability. 145 146 To identify mutations within the 82 selected EMS mutants, and to determine which of these 147 mutation(s) were most likely to affect YFP expression in each mutant, we performed bulk-148 segregant analysis followed by whole-genome sequencing (BSA-Seq) as described in Duveau 149 et al. (2014) with minor modifications (see Methods). Briefly, each mutant strain was crossed to 150 a common mapping strain expressing the PTDH3-YFP reporter gene, and large populations of 151 random haploid spores were isolated after inducing meiosis in the resulting diploids ( Figure 2A ). 152 For each of the 82 segregant populations, a low fluorescent bulk and a high fluorescent bulk of 153 ~1.5 x 10 5 cells each were isolated using fluorescence-activated cell sorting (FACS) ( Figure 154 2B). Genomic DNA extracted from each bulk was then sequenced to an average coverage of 155 ~105x (ranging from 75x to 134x among samples, Supplementary File 1) to identify the 156 mutations present within each mutant genotype and to quantify the frequency of mutant and 157 non-mutant alleles in both bulks ( Figure 2C) . A mutation causing a change in fluorescence is 158 expected to be found at different frequencies in the two populations of segregant cells. 159 Conversely, a mutation with no effect on fluorescence that is not genetically linked to a mutation 160 affecting fluorescence is expected to be found at similar frequencies in these two populations. 161 Prior work suggested this protocol should have 95% power to identify mutations altering 162 fluorescence by 1% or more (Duveau et al., 2014) . 163 164 Using a stringent approach for calling sequence variants (see Methods), we identified a total of 165 1819 mutations (Supplementary File 2), among which 1768 mutations (97.2%) were single 166 nucleotide changes ( Figure 2D ). Of these single nucleotide changes, 96.3% were one of the two 167 types of point mutations (G:C to A:T transitions) known to be primarily induced by EMS (Shiwa 168 et al., 2012) . 48 small indels and 3 aneuploidies, which could have arisen spontaneously or 169 been introduced by EMS, were also identified. Of these 3 mutants with aneuploidies, 2 were 170 found to have an extra copy of chromosome I and 1 was found to have an extra copy of 171 chromosome V based on ~1.5 fold higher sequencing coverage of these chromosomes relative 172 to the rest of the genome in the BSA-seq data from segregant populations (shown in 173 Supplementary Table 1) . We identified an average of 23.9 mutations per strain, which is within 174 the 95% confidence interval of 21 to 45 mutations per strain estimated previously from the 175 frequency of canavanine resistant mutants (Metzger et al., 2016) . Surprisingly, the number of 176 mutations per strain did not follow a Poisson distribution: we observed more strains with a 177 number of mutations far from the average than expected for a Poisson process (P-value < 10 -5 , 178 resampling test; Figure 2 - The remaining 36 mutants did not have any mutations significantly associated with fluorescence 200 (Supplementary File 2). These mutants tended to show smaller changes in fluorescence than 201 mutants with one or more associated mutations (Figure 2 -figure supplement 2), suggesting 202 that our power to map mutations causing 1% changes in fluorescence might have been lower 203 than anticipated. These 36 mutants might also harbor multiple mutations with small effects on 204 expression, each of which was below our detection threshold. Consistent with this possibility, we 205 observed a small but significant correlation (r 2 = 0.127, P = 0.03) between the total number of 206 mutations in these 36 EMS mutants and their expression level (Figure 2 -figure supplement 3 ). 207 It is also possible that we failed to find associated mutations in some of these mutants because 208 their change in fluorescence was initially overestimated by the "winner's curse" (Xiao & 209 Boehnke, 2009). Accordingly, 71% of mutants selected for mapping after two independent 210 fluorescence screens had at least one mutation significantly associated with fluorescence 211 compared to only 30% of mutants selected after a single fluorescence screen. Some changes in 212 fluorescence observed in these 36 mutants might also have been caused by non-genetic 213 variation and/or undetected mutations. 214 215 Additional trans-regulatory mutations identified by sequencing candidate genes 216 217 We noticed in the BSA-seq data that three mutations increasing fluorescence more than 5% 218 relative to the un-mutagenized progenitor strain mapped to two genes (ADE4 and ADE5) in the 219 same biochemical pathway (de novo purine biosynthesis) (Supplementary File 2). We therefore 220 used Sanger sequencing to test whether these genes or other genes in this pathway were also 221 mutated in 15 additional EMS mutants with fluorescence at least 5% higher than the progenitor 222 strain. We first looked for mutations in ADE4, then ADE5 if no mutation was found in ADE4, and 223 then ADE6 if no mutation was found in the other genes. At least one nonsynonymous mutation 224 was identified by Sanger sequencing in one of these three genes in 14 of the 15 EMS mutants 225 (green points in Figure 1C ,E; Supplementary File 4). For the remaining mutant (brown point in 226 Figure 1E ), we sequenced a fourth purine biosynthesis gene, ADE8, but again found no 227 mutation. In two additional EMS mutants with smaller increases in fluorescence (2.1% and 228 4.6%, purple points in Figure 1D ,E) and a reddish color characteristic of ADE2 loss of function 229 mutants (Roman, 1956 mutants showed a significant change in expression relative to the progenitor strain ( Figure 2E ). 263 In each case, the single-site mutant and the EMS mutant showed changes in expression in the 264 same direction relative to the progenitor strain ( Figure 2E ). The mutation affecting expression 265 was always the mutation with the larger G-value in the BSA-Seq data, consistent with the 266 results of the statistical tests described above (Supplementary File 3) . In the last case (YPW54 267 in Figure 2E ), both mutations affected expression in the single-site mutants, consistent with our 268 inability to statistically predict which mutation was more likely to impact expression from the 269 BSA-Seq data for this mutant as well as both mutations being nonsynonymous changes in the 270 same gene (CHD1) (Supplementary File 3) . The BSA-seq data also accurately predicted 271 whether a mutation increased or decreased fluorescence for 27 (93%) of the 29 mutations with 272 significant effects on fluorescence in single-site mutants ( Figure 2F ). For the other two 273 mutations, effects on expression in the same direction were observed in the single-site mutants 274 and the corresponding EMS mutants (Supplementary File 5), suggesting that the different 275 growth conditions used for the mapping experiment (see Methods) might have modified the 276 effects of these mutations. 277 278 Comparing PTDH3-YFP expression in the 40 single-site mutants that significantly altered 279 fluorescence to that in the 40 EMS mutants from which these mutations were identified showed 280 that expression was very similar overall between single-site and EMS mutants sharing the same 281 mutation ( Figure 2G , linear regression: r 2 = 0.944, P = 2.4 x 10 -25 ), although significant 282 differences in expression were observed for some pairs ( Figure 2G , Figure 2 - figure 283 supplement 5). These data suggest that (1) the vast majority of the mutations we identified by First, we asked whether the mutational spectra of trans-regulatory mutations differed from non-302 regulatory mutations ( Figure 3A) . We found that G:C to A:T transitions most commonly 303 introduced by EMS occurred at similar frequencies in the two groups (G-test, P = 0.84). No 304 indels were associated with expression in the BSA-seq data (Supplementary File 6), which was 305 not statistically different from the frequency of indels among non-regulatory mutations (0% vs 306 2.7%, G-test, P = 0.056). By contrast, aneuploidies were highly over-represented in the set of 307 trans-regulatory mutations since all three extra copies of a chromosome observed in the BSA-308 Seq data were found to be associated with fluorescence (G-test, P = 8.6 x 10 -6 ). We also found 309 a significant difference in the genomic distribution of the two sets of mutations (G-test, P = 2.4 x 310 10 -3 ), with non-regulatory mutations appearing to be randomly distributed throughout the 311 genome but trans-regulatory mutations enriched on chromosomes VII and XIII ( Figure 3B , 312 Figure 3 -figure supplement 1). However, these two chromosomes contain the purine 313 biosynthesis genes in which multiple trans-regulatory mutations were identified, and there was 314 no significant difference in genomic distributions between trans-regulatory and non-regulatory 315 mutations when mutations in purine biosynthesis genes were excluded (G-test, P = 0.35). 316 317 Trans-regulatory mutations are often assumed to be located in coding sequences, but they can 318 also be located in non-coding, presumably cis-regulatory, sequences of trans-acting genes (Hill 319 et al., 2020). We therefore asked whether trans-regulatory mutations affecting PTDH3-YFP 320 expression were more often found in coding or non-coding regions of the genome than 321 expected by chance. Of the 1766 non-regulatory mutations, 1257 (71.3%) were coding 322 mutations located in exons, and 506 (28.7%) were non-coding mutations located in intergenic (n 323 = 500) or intronic (n = 6) regions ( Figure 3C ). This paucity of mutations in introns is consistent 324 with the rarity of introns in S. cerevisiae, and the overall frequency of non-coding mutations 325 (28.7%) is similar to the fraction of the S. cerevisiae genome (30.6% of 12.1 Mb) considered 326 non-coding (www.yeastgenome.org). By contrast, of the 66 trans-regulatory point mutations, 327 only one was located in a non-coding sequence. This non-coding mutation was located in the 328 intergenic sequence between IOC2 and KIN2, presumably affecting expression of one or both 329 genes with a downstream effect on PTDH3-YFP expression. The 3 aneuploidies were excluded 330 from this and subsequent analyses because they affected both coding and non-coding 331 sequences of a large number of genes. The underrepresentation of non-coding changes among 332 regulatory mutations was statistically significant (1.5% of regulatory mutations are non-coding vs 333 28.4% of non-regulatory mutations; G-test, P = 4.3 x 10 -9 ), suggesting that new mutations 334 affecting PTDH3-YFP expression in trans are more likely to alter coding than non-coding 335 sequences. This enrichment in coding sequences might be because coding sequences tend to 336 have a higher density of functional sites than non-coding sequences. 337 338 Finally, we examined how trans-regulatory mutations located in coding sequences impacted the 339 amino acid sequences of the corresponding proteins. Among mutations identified in coding 340 sequences, we found that all trans-regulatory mutations changed the amino acid sequence of 341 proteins whereas only 70% of non-regulatory mutations did ( Figure 3D ; G-test, P = 1.4 x 10 -4 ). 342 This difference was primarily driven by mutations that introduced stop codons (nonsense 343 mutations) rather than mutations that substituted one amino acid for another (nonsynonymous 344 mutations): 20% of trans-regulatory mutations in coding sequences were nonsense mutations 345 versus 3% of non-regulatory mutations ( Figure 3D ; G-test, P = 4.8 x 10 -6 ), and 80% of trans-346 regulatory mutations were nonsynonymous versus 67% of non-regulatory mutations ( Figure 3D ; 347 G-test, P = 0.07). Nonsense mutations always altered an arginine, glutamine, or tryptophan 348 codon ( Figure test, P < 10 -4 ), perhaps because glycine is the smallest amino acid, making its substitution likely 358 to modify protein structure (Bhate et al., 2002; Miller, 2007 supports evidence of a transcription factor binding to a gene's promoter and regulating its 379 expression, we constructed a network (Figure 4 ) of potential direct regulators of TDH3 as well 380 as potential direct regulators of these direct regulators (1 st and 2 nd level regulators of TDH3) and 381 asked how often the trans-regulatory mutations we identified mapped to these genes. We found 382 that 4 trans-regulatory mutations mapped to three genes in this network, with 2 mutations 383 affecting the 1 st level regulator TYE7, 1 mutation affecting the 1 st level regulator GCR2, and 1 384 mutation affecting the 2 nd level regulator TUP1 (Supplementary File 6). This number of 385 mutations mapping to genes in the predicted TDH3 regulatory network was 12-fold greater than 386 expected by chance (6% for trans-regulatory vs 0.5% for non-regulatory mutations; G-test, P = 387 0.0037), thus the inferred regulatory network had predictive power even though the vast majority 388 of trans-regulatory coding mutations (61 of 65, or 94%) mapped to other genes. Only one of 389 these other trans-regulatory mutations mapped to a transcription factor. This mutation was a 390 nonsynonymous substitution affecting ROX1, which is predicted in the YEASTRACT database 391 to directly regulate expression of the indirect TDH3 regulator TUP1. In other words, ROX1 is 392 predicted by existing functional genomic data to be a 3 rd level regulator of TDH3 ( Figure 4) are both known to bind directly to the TDH3 promoter ( Figure 5A ), and mutations in these 405 binding sites cause large decreases in TDH3 expression (Metzger et al., 2015) . These 406 observations strongly suggest that mutations in RAP1 and GCR1 should also cause detectable 407 changes in TDH3 expression, yet no mutations were observed in these genes in our set of 408 trans-regulatory mutations. To investigate why we did not recover trans-regulatory mutations in 409 RAP1 or GCR1, we used error-prone PCR to generate mutant alleles of these genes with 410 mutations in either the promoter or coding sequence of RAP1 or the second exon of GCR1, 411 which includes 99.7% of the GCR1 coding sequence ( Figure 5B ). Hundreds of these RAP1 and 412 GCR1 mutant alleles were then introduced individually into the un-mutagenized strain carrying 413 the PTDH3-YFP reporter gene using CRISPR/Cas9-guided allelic replacement. Sequencing the 414 mutated regions of RAP1 and GCR1 in a random subset of transformants showed that each 415 strain harbored an average of 1.8 mutations in the RAP1 gene ( Figure 5C ) or 2.4 mutations in 416 the GCR1 gene ( Figure 5D ). As expected for PCR-based mutagenesis, the number of mutations 417 per strain appeared to follow a Poisson distribution both for RAP1 mutants ( Figure 5C , Chi-418 square goodness of fit, P = 0.14) and GCR1 mutants ( Figure 5D , Chi-square goodness of fit, P 419 = 0.79). 420 Among the RAP1 mutant strains, only 9.1% (43 of 470 strains) showed a significant change in 422 PTDH3-YFP expression greater than 3% (corresponding to a ~1% change in fluorescence) 423 relative to the un-mutagenized progenitor strain ( Figure 5E) For the GCR1 mutant strains, 37.7% showed a significant change in PTDH3-YFP expression 448 greater than 3% relative to the un-mutagenized progenitor strain ( Figure 5F ). Several of these 449 mutant alleles decreased the expression driven by the TDH3 promoter by ~80%, which is 450 similar to the previously reported effects of mutations in the Gcr1p binding sites of the TDH3 451 promoter (Metzger et al., 2015) , suggesting that they were null alleles. Indeed, resequencing 452 these large effect alleles revealed that one of them had a single nucleotide insertion in the 28 th 453 codon of the GCR1 ORF, which led to a frame shift eliminating 96% of amino acids (757 of we hypothesized that the fitness effects of mutations in GCR1 might also have caused them to 458 be underrepresented in the population from which the EMS mutants analyzed were derived. To 459 test this hypothesis, we measured the relative fitness of 62 of the 220 GCR1 mutants, including 460 all mutants with decreased PTDH3-YFP expression. GCR1 mutants causing the largest changes 461 in PTDH3-YFP expression showed strong defects in growth rate; however, several GCR1 mutants 462 with changes in PTDH3-YFP expression greater than 3% did not strongly affect fitness ( Figure 463 5G). This observation suggests that some of the coding mutations in GCR1 decreasing PTDH3-464 YFP expression could have been sampled among the EMS mutants used for mapping. We 465 therefore conclude that mutations in GCR1 were most likely not recovered in our set of 466 regulatory mutations because of the wide diversity of mutations that can affect TDH3 expression 467 and the limited number of EMS mutants included in the mapping experiment. 468 469 Properties of genes harboring regulatory mutations 470 471 With only 5 of the 65 trans-regulatory point mutations in coding sequences mapping to 472 transcription factors, we used gene ontology (GO) analysis to examine the types of genes 473 harboring trans-regulatory mutations affecting PTDH3-YFP expression more systematically. In all, 474 these 65 mutations mapped to 42 different genes, with 9 genes affected by more than one 475 mutation, 4 of which were genes involved in the de novo purine biosynthesis pathway ( Figure 476 6A). Several gene ontology terms were significantly enriched among genes affected by trans-477 regulatory mutations relative to genes affected by non-regulatory mutations. Supplementary File 478 8 includes all enriched GO terms, whereas Figure 6B only includes enriched GO terms that are 479 not parent to other GO terms in the GO hierarchy. Of the 33 GO terms enriched for trans-480 regulatory mutations shown in Figure 6B , 11 terms (including 13 of the 42 genes with trans-481 regulatory mutations) were related to chromatin structure ( Figure 6B ), which is known to play an 482 important role in the regulation of gene expression (Li et al., 2007 ). An additional 5 GO terms 483 (including 6 genes with trans-regulatory mutations) were related to metabolism, and 4 terms 484 (including 9 genes with trans-regulatory mutations) were related to transcriptional regulation 485 ( Figure 6B ). Three GO terms related to glucose signaling, including regulation of transcription by 486 glucose, carbohydrate transmembrane transport and glucose metabolic process, were also 487 significantly enriched for genes affected by trans-regulatory mutations ( Figure 6B ). When we 488 broadened this category of genes based on a review of glucose signaling (Santangelo, 2006) , 489 the enrichment included 5 genes implicated in glucose signaling (Supplementary File 9; 12.2% 490 of genes affected by trans-regulatory mutations were involved in glucose signaling vs 2.7% of 491 genes affected by non-regulatory mutations; Fisher's exact test: P = 6.2 x 10 -3 ). 492 493 At the pathway level, we found that genes involved in glycolysis and de novo purine 494 biosynthesis were also significantly enriched for trans-regulatory mutations ( Figure 6B ), with the 495 latter driven by the mutations in ADE2, ADE4, ADE5 and ADE6 genes described above 496 (Supplementary File 10). Genes involved in iron homeostasis also emerged as an over-497 represented group, with 5 GO terms (including 7 genes) being related to the regulation of 498 intracellular iron concentration ( Figure 6B ). Diverse cellular processes implicated in iron 499 homeostasis were represented among genes harboring trans-regulatory mutations, such as iron 500 transport (FTR1, CCC2), iron trafficking and maturation of iron-sulfur proteins (CIA2, NAR1), 501 transcriptional regulation of the iron regulon (FRA1) and post-transcriptional regulation of iron 502 homeostasis (TIS11). Remarkably, nearly half of all trans-regulatory point mutations in coding 503 sequences (31 of 65) were located in genes involved either in purine biosynthesis or iron 504 homeostasis. Moreover, 6 of the 8 genes harboring more than one trans-regulatory mutation 505 ( Figure 6A ) were involved in one of these two processes. Mutations in purine biosynthesis 506 genes tended to cause large increases in expression, whereas mutations in iron homeostasis 507 genes tended to cause large decreases in expression (Supplementary File 10). Although the 508 mechanistic relationship between these pathways and TDH3 expression is not known, changing 509 cellular conditions, including concentrations of metabolites (Pinson et al., 2009) Ultimately, our data suggest that although mutations affecting PTDH3-YFP expression map to 512 genes with diverse functions, genes involved in a small number of well-defined biological 513 processes are particularly likely to harbor such trans-regulatory mutations. 514 515 Trans-regulatory mutations are enriched in genomic regions harboring natural variation affecting 516 TDH3 expression 517 518 Because new mutations affecting gene expression provide the raw material for regulatory 519 variation segregating within a species, we asked whether the trans-regulatory mutations we 520 observed were enriched in genomic regions associated with naturally occurring trans-regulatory 521 variation affecting expression driven by the TDH3 promoter. Specifically large effects ( Figure 7B ; G-test: P = 3.3 x 10 -5 ). The enrichment of trans-regulatory mutations in 536 eQTL regions was thus not only driven by the effect size of these mutations or by the fact that 537 several of the trans-regulatory mutations with large effects were located in the same genes. 538 When we considered eQTL regions identified from each cross separately, we observed a 539 significant enrichment of trans-regulatory mutations in eQTL regions identified in SK1 x BY and 540 YPS1000 x BY crosses, but not in eQTL regions identified in the M22 x BY cross ( Figure 7B ; G-541 tests: P = 0.016 for SK1 x BY, P = 6.5 x 10 -3 for YPS1000 x BY, P = 0.70 for M22 x BY). This 542 pattern might be explained by the close genetic relatedness between BY and M22 (Metzger & 543 Wittkopp, 2019) or by the specific ecological niche of M22 isolated from an Italian vineyard 544 (Capece et al., 2012) . Overall, the enrichment of trans-regulatory mutations in eQTL regions 545 suggests that biases in the mutational sources of regulatory variation have shaped genetic 546 sources of expression variation segregating in wild populations. 547 By systematically isolating and characterizing 69 trans-regulatory mutations that all affect 549 expression of the same focal gene, this study reveals how trans-regulatory mutations are 550 distributed within a genome and within a regulatory network. For example, we found that these 551 trans-regulatory mutations were widely spread throughout the genome, with all except one 552 located in coding sequences. These data also allowed us to determine how well a regulatory 553 network inferred from integrating functional genomic and genetic data can predict sources of 554 trans-regulatory variation. Like many biological networks, transcriptional regulatory networks 555 have been inferred with the promise of explaining relationships between genetic variants and 556 the higher order trait of gene expression, but the predictive power of such networks remains 557 sparsely tested (Flint & Ideker, 2019) . 558 559 We found that although the trans-regulatory mutations in coding regions were not enriched in 560 transcription factors generally, they were overrepresented among transcription factors inferred 561 to be regulators of TDH3. None of these transcription factors are known to directly bind to the 562 TDH3 promoter, however, and mutations in RAP1 and GCR1, which have well characterized 563 binding sites in the TDH3 promoter, were notably missing from our set of trans-regulatory 564 mutations affecting PTDH3-YFP expression. Targeted mutagenesis of RAP1 and GCR1 565 suggested that most mutations in these genes (particularly RAP1) cause severe growth defects 566 that might have prevented their recovery in mutagenesis screens. Over 90% of the trans-567 regulatory mutations examined were located in genes outside of this network encoding proteins 568 with diverse molecular functions involved in chromatin remodeling, nonsense-mediated mRNA 569 decay, translation regulation, purine biosynthesis, iron homeostasis, and glucose sensing. 570 Surprisingly, nearly half of the trans-regulatory mutations mapped to genes involved in either the 571 purine biosynthesis or iron homeostasis pathways. Although not anticipated, finding so many 572 trans-regulatory mutations in genes that are not transcription factors is consistent with the 573 transcriptomic effects of gene deletions showing that transcription factors tend not to affect 574 expression of more genes than other types of proteins (Featherstone & Broadie, 2002) . 575 Consequently, it seems that regulatory networks describing the relationships between 576 transcription factors and target genes might capture only a small fraction of the potential 577 sources of trans-regulatory variation. 578 579 Understanding the properties of trans-regulatory mutations is important because these 580 mutations provide the raw material for natural trans-regulatory variation. We found that 581 mutations affecting PTDH3-YFP expression were enriched in genomic regions associated with 582 expression variation among wild isolates of S. cerevisiae, suggesting that mutational sources of 583 regulatory variation have helped shape the sources of genetic variation affecting gene 584 expression segregating in natural populations. Differences in the genomic distribution of new 585 regulatory mutations and polymorphisms are presumably due to natural selection, which 586 influences the evolutionary fate of new regulatory mutations based on their fitness 587 consequences. The fitness consequences of trans-regulatory mutations include not only 588 changes in growth rate caused by altering expression of the focal gene, but also their pleiotropic 589 effects on activity of other genes. Ultimately, explaining the variation in gene expression we see 590 in natural populations will require studies like this elucidating the mutational input as well as 591 studies describing the fitness and pleiotropic effects of these mutations in native environments. 592 Mutant strains selected for mapping 595 596 To identify mutations associated with expression changes, we selected 82 haploid mutant 597 strains for bulk segregant analysis ( Figure 1A ) from three collections of mutants obtained in 598 Gruber on YPD + G418 + Nat medium (YPD agar with 350 mg/L geneticin (G418) and 100 mg/L 726 Nourseothricin) to select diploid hybrids. After growth, cells were streaked on another YPD + 727 G418 + Nat agar plate, one colony was patched on YPG agar for each mutant and the diploid 728 strain was kept frozen at -80°C. Bulk segregant populations were then collected for batches of 8 729 mutants in parallel as follows. Diploid strains were thawed and revived on YPG plates, grown for 730 12 hours at 30°C on GNA plates (50 g D-glucose, 30 g Difco nutrient broth, 10 g yeast extract 731 and 20 g agar per liter) and sporulation was induced for 4 days at room temperature on KAc 732 plates ( not included in the BSA-Seq assays and that showed increased fluorescence by more than 5% 833 relative to the progenitor strain. Two of the sequenced mutants had a mutation in the ADE4 834 coding sequence. We then sequenced the ADE5 coding sequence in the remaining 12 mutants 835 and found a mutation in five of the sequenced mutants. We continued by sequencing the ADE6 836 coding sequence in the remaining seven mutants. Five of the sequenced mutants had a single 837 mutation and one mutant had two mutations in the ADE6 coding sequence. We sequenced the 838 ADE8 coding sequence in the last mutant but we found no candidate mutation in this mutant. 839 Finally, we sequenced the ADE2 coding sequence in two mutants that showed a reddish color 840 when growing on YPD plates. For all genes, the sequenced region was amplified by PCR from 841 cell lysates, PCR products were cleaned up using Exo-AP treatment (7.5 µl PCR product mixed 842 with 0.5 µl Exonuclease-I (NEB), 0.5 µl Antarctic Phosphatase (NEB), 1 µl Antarctic 843 Phosphastase buffer and 0. at the target site. One positive clone was grown in YPD and stored at -80°C in 15% glycerol. For 873 the pop-out step, a genomic region of ~240 bp centered on the mutation was amplified from the 874 EMS-treated mutant containing the desired mutation. The amplicon was transformed into the 875 strain with Ura3-hphMX4 inserted at the target site. Cells were plated on synthetic complete 876 medium containing 0.9 g/l of 5-fluoroorotic acid (SC + 5-FOA) to counterselect cells expressing 877 Ura3. After growth, a dozen [Ura-] colonies were streaked on SC + 5-FOA plates and one 878 colony from each streak was patched on a YPG plate. Cell patches were screened by PCR 879 using oligonucleotides that flanked the sequence of the transformed region and amplicons of 880 expected size (~350 bp) were sequenced to confirm the insertion of the desired mutation and 881 the absence of PCR-induced mutations. When possible two independent clones were stored at -882 80°C in 15% glycerol, but in some cases only one positive clone could be retrieved and stored. 883 A "one-step" CRISPR-Cas9 approach was used to insert mutations impairing a NGG or 884 CCN motif in the genome (22 mutations oligonucleotides that flanked the mutation site and amplicons of expected size (~350 bp) were 904 sequenced to confirm the insertion of the desired mutation and the absence of secondary 905 mutations. Then, one or two positive clones were patched on SC + 5-FOA to counterselect the 906 Cas9/sgRNA plasmid, grown in YPD and stored at -80°C in 15% glycerol. 907 A "two-steps" CRISPR-Cas9 approach was used to insert mutations located near but 908 outside a PAM sequence (4 mutations). Each step was performed as described above for the 909 "one-step" CRISPR-Cas9 approach. In the first step, Cas9 was targeted by the sgRNA to a 910 PAM sequence (the initial PAM) located close to the mutation site (up to 20 bp In our mutagenesis approach, we introduced a mutation that impaired the target PAM 1018 sequence in all RAP1 and GCR1 mutants. To determine the effect of this mutation alone, we 1019 generated strains YPW2701 and YPW2732 that carried the PAM mutation in the RAP1 1020 terminator or in the GCR1 intron, respectively, without any other mutation in RAP1 or GCR1. 1021 The fluorescence level of these two strains was not significantly different from the fluorescence 1022 level of the progenitor strain YPW1139 in flow cytometry assays. 1023 1024 Estimation of RAP1 and GCR1 mutation rates 1025 1026 The The network of potential TDH3 regulators shown on Figure 4 was established using data 1096 available in July 2019 on the YEASTRACT (www.yeastract.com) repository of regulatory 1097 associations between transcription factors and target genes in Saccharomyces cerevisiae 1098 (Teixeira et al., 2018) . We used the tool "Regulation Matrix" to obtain three matrices in which 1099 rows corresponded to the 220 transcription factor genes in YEASTRACT and columns 1100 corresponded to the 6886 yeast target genes included in the database. In the first matrix 1101 obtained using the option "Only DNA binding evidence", an element had a value of 1 if the 1102 transcription factor at the corresponding row was reported in the literature to bind to the 1103 promoter of the target gene at the corresponding column and a value of 0 otherwise. The two 1104 other matrices were obtained using the option "Only Expression evidence" with either "TF acting 1105 as activator" or "TF acting as inhibitor". An element had a value of 1 only in the "TF acting as 1106 activator" matrix if perturbation of the transcription factor at the corresponding row was reported 1107 to increase expression of the target gene at the corresponding column. An element had a value 1108 of 1 only in the "TF acting as inhibitor" matrix if perturbation of the transcription factor at the 1109 corresponding row was reported to decrease expression of the target gene at the corresponding 1110 column. An element had a value of 1 in both matrices if perturbation of the transcription factor at 1111 the corresponding row was reported to affect expression of the target gene at the corresponding 1112 column in an undetermined direction. Finally, an element had a value of 0 in both matrices if 1113 perturbation of the transcription factor at the corresponding row was not reported to alter 1114 expression of the target gene at the corresponding column in the literature. We then used a 1115 custom R script (Supplementary File 16) to generate a smaller matrix that only contained first 1116 level and second level regulators of TDH3 and TDH3 itself. A transcription factor was 1117 considered to be a first level regulator of TDH3 if a regulatory association with TDH3 was 1118 supported both by DNA binding evidence and expression evidence. A transcription factor was 1119 considered to be a second level regulator of TDH3 if a regulatory association with a first level 1120 regulator of TDH3 was supported both by DNA binding evidence and expression evidence. The 1121 network shown on Figure 4 was drawn using Adobe Illustrator based on regulatory interactions 1122 included in the matrix of TDH3 regulators (in Supplementary File 11) . To determine whether 1123 mutations in the TDH3 regulatory network constituted a significant mutational source of 1124 regulatory variation affecting PTDH3 activity, we compared the proportions of trans-regulatory and 1125 non-regulatory mutations that were located in a TDH3 regulator gene (first or second level) 1126 using a G-test (likelihood.ratio function in R package Deducer). Competitive fitness assays 1129 1130 We performed competitive growth assays to quantify the fitness of 62 strains with random 1131 mutations in the second exon of GCR1. These 62 strains corresponded to all GCR1 mutants 1132 that showed a significant decrease of PTDH3-YFP expression as quantified by flow cytometry as 1133 well as GCR1 mutants for which GCR1 exon 2 was sequenced and the location of mutations 1134 was known. The 62 strains were thawed on YPG plates as well as reference strains YPW1139 1135 and YPW2732 and strain YPW1182 that expressed a GFP (Green Fluorescent Protein) reporter 1136 instead of YFP. After three days of incubation at 30°C, strains were arrayed in four replicate 96-1137 well plates containing 0.5 ml of YPG per well. In parallel, the [GFP+] strain YPW1182 was also 1138 arrayed in four replicate 96-well plates. The eight plates were incubated on a wheel at 30°C for 1139 32 hours. We then measured the optical density at 620 nm of all samples using a Sunrise plate 1140 reader (Tecan) and calculated the average cell density for each plate. Samples were then 1141 transferred to 1. 95% confidence intervals of fluorescence levels measured among these replicates. P-values 1207 were obtained using the permutation tests described in Methods. (B-E) Mutants analyzed by 1208 BSA-Seq are highlighted in red. All of these mutants showed fluorescence changes greater than 1209 0.01 (vertical dotted lines) and P-value below 0.05 (horizontal dotted lines); percentages of all 1210 mutants that met these selection criteria in each collection are also shown. Mutants selected for 1211 Sanger sequencing of the ADE4, ADE5, and/or ADE6 candidate genes are highlighted in green. 1212 The mutant analyzed with both BSA-seq and Sanger sequencing is both red and green in panel 1213 C). Two mutants selected for Sanger sequencing of the ADE2 gene are highlighted in purple, 1214 one in D and one in E. was more frequent in the low fluorescence bulk and a positive sign if the mutation was more 1247 frequent in the high fluorescence bulk. One single-site mutant (NAP1, red) showed no 1248 significant change in expression relative to the wild-type progenitor strain (t-test, P-value > 1249 0.05); the mutation it carries is therefore considered to be a false positive in the BSA-seq data. 1250 For the remaining 29 mutations tested in single-site mutants, effects on expression were in the 1251 same direction as predicted by the signed G-value in all but two cases (ATP23 and IRA2, showed that the average magnitude of expression differences between independent clones was 1379 higher for single site mutants with a statistically significant expression difference between the 1380 single-site and EMS mutant sharing the same mutation (Mann-Whitney-Wilcoxon test, P = 1381 0.008). Results from E and F suggest that secondary mutation(s) and/or epigenetic changes 1382 that unintentionally occurred in some of the single-site mutant clones likely contributed to 1383 expression differences between some EMS and single-site mutants. It is important to 1384 emphasize, however, that these expression differences were small in magnitude and that 1385 overall the expression level of single-site mutants was strongly correlated with the expression 1386 level of EMS mutants (Figure 3 ). Trans-regulatory mutations were significantly enriched on chromosome VII that contained the 1421 purine biosynthesis genes ADE5 and ADE6 in which several mutations were identified (24.3% 1422 of trans-regulatory mutations located on chromosome VII vs 9.3% of non-regulatory mutations; 1423 G-test, P = 3.4 x 10 -4 ). Trans-regulatory mutations were also enriched on chromosome XIII that 1424 contained the purine synthesis gene ADE4, although this enrichment was not statistically 1425 significant (13.0% of trans-regulatory mutations located on chromosome XIII vs 7.8% of non-1426 regulatory mutations; G-test, P = 0.15). 1427 1428 Regions of RAP1 (purple) and GCR1 (green) genes that were subjected to random 1455 mutagenesis using error-prone PCR. 470 RAP1 mutants and 220 GCR1 mutants were obtained 1456 by integration of random PCR fragments at the native RAP1 or GCR1 loci using CRISPR/Cas9 1457 allelic replacement. Trans-regulatory mutations from: Proportion of mutations Trans-regulatory mutation The role of regulatory variation in complex traits and 1549 disease Genetics of 1551 trans-regulatory variation in gene expression Exploring the phenotypic consequences of 1556 tissue specific gene expression variation inferred from GWAS summary statistics Folding and Conformational 1559 Consequences of Glycine to Alanine Replacements at Different Positions in a Collagen 1560 Diversity of Saccharomyces 1562 cerevisiae yeasts associated to spontaneously fermenting grapes from an Italian "heroic 1563 vine-growing area FASTER MT : Isolation of Pure Populations of a and α Ascospores from Saccharomyces cerevisiae Glycolysis mutants in Saccharomyces 1569 cerevisiae The GTEx Consortium atlas of genetic regulatory effects across 1571 human tissues A global genetic interaction network maps a wiring diagram of cellular function Quantitative trait loci mapped to single-nucleotide 1579 resolution in yeast Genes Contribute to the Spontaneous Mitochondrial Genome Instability of 1582 Fitness effects of altering gene expression noise in 1586 Mapping Small Effect Mutations in Saccharomyces cerevisiae : Impacts of Experimental Design and Mutational Properties. G3: Genes, Genomes Wrestling with pleiotropy : Genomic and topological 1592 analysis of the yeast gene expression network Transcriptomic signatures across human tissues identify functional rare genetic 1599 variation The great hairball gambit Using an 1605 atlas of gene regulation across 44 human tissues to inform complex disease-and trait-1606 associated variation Haplotype-based variant detection from short-read 1609 sequencing Functional organization of the yeast proteome by systematic analysis of protein 1614 complexes High-efficiency yeast transformation using the LiAc/SS 1616 carrier DNA/PEG method Contrasting Properties of Specific Regulatory, Coding, and Copy Number Mutations in Saccharomyces 1620 cerevisiae : Frequency, Effects, and Dominance Molecular and evolutionary processes 1623 generating variation in gene expression Empirical measures of 1626 mutational effects define neutral models of regulatory evolution in Saccharomyces 1627 cerevisiae Nucleosome organization affects the sensitivity of 1630 gene expression to promoter mutations Posttranscriptional Regulation of Gcr1 Expression 1634 and Activity Is Crucial for Metabolic Adjustment in Response to Glucose Availability Mapping Yeast Transcriptional Networks Functional Discovery via a Compendium of Expression 1642 Characterization of the DNA-binding activity of GCR1 : In vivo evidence for 1645 two GCR1-binding sites in the upstream activating sequence of TPI of Saccharomyces 1646 cerevisiae Gene 1648 regulatory network reconstruction using single-cell RNA sequencing of barcoded 1649 genotypes in diverse environments. eLife, 9 The Cost of Protein Production Scale Genetic Perturbations Reveal Regulatory Networks and an Abundance of Gene Spectrum of disease-causing mutations in protein secondary 1659 structures The crystal structure of the DNA-1661 binding domain of yeast RAP1 in complex with telomeric DNA Complex effects 1664 of nucleotide variants in a mammalian cis-regulatory element Fast gapped-read alignment with Bowtie 2 New Vectors for Simple and Streamlined CRISPR-Cas9 Genome Editing Exploring genetic suppression interactions on a global scale Genetic Architecture of Ethanol-1679 Responsive Transcriptome Variation in Saccharomyces cerevisiae Strains The Role of Chromatin during Transcription Heterogeneity in Eukaryotic Homologous Recombination Rate A large accessory protein interactome is 1687 rewired across environments. eLife, 9 Understanding the Growth Phenotype of the Yeast gcr1 Mutant in Terms of Global Genomic Expression Patterns DNA variants affecting the expression 1692 of numerous genes in trans have diverse mechanisms of action and evolutionary 1693 histories A genome-integrated massively 1696 parallel reporter assay reveals DNA sequence determinants of cis-regulatory activity in 1697 neural cells Cutadapt removes adapter sequences from high-throughput sequencing 1699 reads Integrating 1705 genotypic and expression data in a segregating mouse population to identify 5-1706 lipoxygenase as a susceptibility gene for obesity and bone traits Systematic dissection and optimization of inducible enhancers in human cells using a 1711 massively parallel reporter assay Contrasting Frequencies and Effects of cis-and trans-Regulatory Mutations Affecting 1715 Compensatory trans-regulatory alleles minimizing 1718 variation in TDH3 expression are common within Saccharomyces cerevisiae Selection on 1721 noise constrains variation in a eukaryotic promoter PANTHER version 14 : 1724 More genomes, a new PANTHER GO-slim and improvements in enrichment analysis 1725 tools The mutability of enzyme active-site shape determinants Regulatory Variation at Glypican-3 Underlies a Major 1734 Iron sensing and regulation in Saccharomyces 1737 cerevisiae : Ironing out the mechanistic details High-1740 resolution analysis of DNA regulatory elements by synthetic saturation mutagenesis The different 1743 (sur)faces of Rap1p Metabolic intermediates selectively stimulate transcription factor interaction and 1747 modulate phosphate and purine pathways Comprehensive Genome-wide Protein-DNA Interactions 1750 Detected at Single-Nucleotide Resolution A system selective for mutations affecting the synthesis of adenine in yeast Glucose Signaling in Saccharomyces cerevisiae An 1760 integrative genomics approach to infer causal associations between gene expression 1761 and disease Inferring gene regulatory logic from high-throughput 1764 measurements of thousands of systematically designed promoters Homotypic cooperativity and 1767 collective binding are determinants of bHLH specificity and function Genome Profiling of a Novel Mutagenesis Technique Using Proofreading-Deficient DNA In Vivo Site-Specific Mutagenesis and Gene 1775 Collage Using the Delitto Perfetto System in Yeast Saccharomyces cerevisiae YEASTRACT : An 1784 upgraded database for the analysis of transcription The role of Gcr1p in the 1788 transcriptional activation of glycolytic genes in yeast Saccharomyces cerevisiae Stochastic activation of a DNA damage response causes cell-to-cell mutation rate 1792 variation The amino-acid mutational spectrum of human 1794 genetic disease The UAS of the yeast GAPDH promoter 1796 consists of multiple general functional elements including RAP1 and GRF2 binding sites The exchangeability of amino acids in proteins Dynamic Role of trans Regulation of Gene 1803 Expression in Relation to Complex Traits Trans-acting regulatory variation in Saccharomyces cerevisiae and 1807 the role of transcription factors Genetic analysis of 1810 variation in transcription factor binding in yeast High-resolution DNA-1815 binding specificity analysis of yeast transcription factors Sequencing depth in BSA-seq data 1825 Supplementary File 2. List of all mutations identified by BSA-Seq or Sanger sequencing in this 1826 study 1827 Supplementary File 3. Linked mutations associated with fluorescence level in BSA-Seq 1828 experiments 1829 Supplementary File 4. Mutations identified by Sanger sequencing of candidate genes 1830 Supplementary File 5. Mutations tested in single-site mutants 1831 Supplementary File 6. Mutations associated with fluorescence level in BSA-Seq experiments 1832 Supplementary File 7 YFP expression because they all produce the same YFP transcript. However, we observed that 1313 the mutations in purine synthesis genes increased fluorescence level when YFP expression was 1314 driven by the TDH3 or the GPD1 promoter but not when YFP expression was driven by the 1315 RNR1 or the STM1 promoter, indicating that the effects of these mutations on YFP expression 1316 were promoter specific. observed between EMS and single-site mutants (bars). To assess the statistical significance of 1321 these expression differences, we estimated the magnitude of expression differences expected 1322 to arise by chance between genetically identical strains grown at different positions of a 96-well 1323 plate (red line). This null distribution was obtained from the differences in expression measured 1324 for 10,440 pairs of the un-mutagenized progenitor strain grown at different well positions in four 1325 replicate populations. We next randomly permuted 10 5 times the expression values between i) 1326 each pair of EMS and single-site mutants and ii) random pairs of the progenitor strain to 1327 calculate the one-sided p-value for each pair of mutants (i.e. the proportion of randomized 1328 expression differences greater than the observed expression difference). After Benjamini-1329Hochberg correction for multiple testing, we found that the expression difference between the 1330 single-site mutant and the EMS mutant carrying the same mutation was statistically significant 1331 (adjusted p-value < 0.05) for 14 out of the 40 pairs of mutants (35%, red and blue bars), but 1332 highly significant (adjusted p-value < 0.01) for only 1 pair (2.5%, red bar). Because mutant 1333 strains were exposed to the same micro-environmental and technical variation as the control 1334 samples used to establish the null distribution, these sources of variation are unlikely to explain 1335 the significant differences of expression observed between EMS and single-site mutants. 1336