key: cord-0317799-kk6cq79w
authors: Long, Phillip N; Cook, Vanessa J; Majumder, Arundhati; Barbour, Alan G; Long, Anthony D
title: The utility of a closed breeding colony of Peromyscus leucopus for dissecting complex traits
date: 2021-08-15
journal: bioRxiv
DOI: 10.1101/2021.08.14.456359
sha: 00593aa9e2e39eec5f40cd78b8c580b231b64c05
doc_id: 317799
cord_uid: kk6cq79w

Although Peromyscus leucopus (deermouse) is not considered a genetic model system, its genus is well suited for addressing several questions of biological interest, including the genetic bases of longevity, behavior, physiology, adaptation, and its ability to serve as a disease vector. Here we explore a diversity outbred approach for dissecting complex traits in Peromyscus leucopus, a non-traditional genetic model system. We take advantage of a closed colony of deermice founded from 38 individuals between 1982 and 1985 and subsequently maintained for 35+ years (~40-60 generations). From 405 low-pass (~1X) short-read sequenced deermice we accurately imputed genotypes at 17,751,882 SNPs. Conditional on observed genotypes for a subset of 297 individuals, simulations were conducted in which a QTL contributes 5% to a complex trait under three different genetic models. The power of either a haplotype- or marker-based statistical test to detect the hidden QTL was estimated to be 15-25%. Although modest, this power estimate is consistent with that of DO/HS mouse and rat experiments with ~300 individuals. This limitation in QTL detection is mostly associated with the stringent significance threshold required to hold the genome-wide false positive rate low: in all cases we observe considerable linkage signal at the location of the simulated QTL, suggesting a larger panel would exhibit greater power. For the subset of cases where a QTL was detected, localization appeared precise, at ~1-2 Mb. Finally, we carried out a GWAS on a demonstration trait, bleeding time. No tests exceeded the threshold for genome-wide significance, but one of four suggestive regions co-localizes with Von Willebrand factor. Our work suggests that complex traits can be dissected in founders-unknown P. leucopus colony mice in much the same manner as founders-known DO/HS mice and rats, with genotypes obtained from low-pass sequencing data. Our results further suggest that the DO/HS approach can be powerfully extended to any system in which a founders-unknown closed colony has been maintained for several dozen generations.

Variation in complex genetic traits is due to the action of many genes as well as the environment. Although complex genetic traits (e.g., risk of certain mental disorders, heart disease, stroke, and diabetes) account for the bulk of US health spending, in most cases we do not yet understand their precise genetic architecture. Over the last decade, human geneticists have largely employed a Genome-Wide Association Study (GWAS) approach, using [...]

[...] nest building), and adaptive spatial changes in coat coloration to avoid predation, among other traits (beautifully reviewed in Bedford and Hoekstra 2015). Different deermouse species are also major reservoirs for several tick-borne diseases including Lyme disease, Borrelia miyamotoi relapsing fever, the malaria-like protozoan disease babesiosis, and hantavirus (Barbour 2017). The role of P. leucopus as the likely primary reservoir for the bacterium that causes Lyme disease (Borrelia burgdorferi) and several other tick-borne diseases is analogous to that of bats as reservoirs for SARS coronaviruses and Ebola virus. In fact, P. leucopus's role as the primary reservoir for the bacterium that causes Lyme disease has led to proposals that it be the first mammal considered for natural release gene drive experiments in North America (Najjar et al. 2017).

Our previous effort to create infrastructure to strengthen P. leucopus's role as an emerging [...]

Bleeding time assay: The method was a modification of that of Broze et al. (2001). The animals were lightly anesthetized with 3% isoflurane by inhalation with 2 L/min flow of oxygen in a small animal veterinary induction chamber.
A sterilized, small animal nail clipper (Conair PRO small; item PGRDNCS) equipped with a guard set at 2 mm was used to sever the tip of the tail, and the timer was started. The exposed tissue was briefly touched every 0.5 min with Whatman No. 2 filter paper. The recorded bleeding time was the number of minutes, in half-minute intervals, until bleeding of the exposed tail tip had ceased for at least 0.5 min after the last touch of filter paper. To prevent further bleeding after the animal was returned to its cage [...]

[...] K is the number of founder haplotypes in the population. Although the number of founder haplotypes in the colony is 76, far more than the 8 we employed, increasing K much beyond 8 seemed to result in poorer-quality imputations, so we adopted this lower number.

Kinship matrix: The individuals of this study are from a closed colony. Although the colony employs a breeding design to minimize inbreeding, some mice are more closely related to one another than others, and the genetic constitution of the colony is slowly changing over time. It is common in such situations to employ a kinship matrix and statistical models to control for the impact of cryptic relatedness on test statistics. We derived a kinship matrix at every thousandth locus using the eight STITCH-generated haplotype dosages. To calculate a kinship matrix, we first had to convert the dosage of the jth individual at the lth locus for the hth haplotype into the three genotypic probabilities. To do this, we define: [...] with g taking the values 0, 1, or 2, corresponding to the three possible genotypes, and "0.18" a somewhat arbitrarily chosen constant scaled to reflect uncertainty in those genotypes proportional to the spread of the dosage estimates about 0, 1, or 2.
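As an illustration of how a haplotype dosage might be turned into three genotype probabilities, the following minimal Python sketch assumes a Gaussian kernel centred on each genotype value, with the 0.18 constant used as its width. The functional form, the function name `dosage_to_genotype_probs`, and the example dosages are assumptions made for illustration and may differ from the definition used in this study.

```python
import numpy as np

SPREAD = 0.18  # constant quoted in the text; using it as a Gaussian width is an assumption


def dosage_to_genotype_probs(dosage, spread=SPREAD):
    """Map haplotype dosages in [0, 2] to probabilities for genotypes g = 0, 1, 2.

    Assumed form (not taken from the paper): each genotype gets a Gaussian
    weight based on its distance from the dosage, and the three weights are
    normalised to sum to one.
    """
    dosage = np.asarray(dosage, dtype=float)
    g = np.arange(3.0)  # the three possible genotype values
    weights = np.exp(-0.5 * ((dosage[..., None] - g) / spread) ** 2)
    return weights / weights.sum(axis=-1, keepdims=True)


# Hypothetical dosages for one haplotype in four individuals
probs = dosage_to_genotype_probs([0.0, 0.5, 1.0, 1.9])
print(np.round(probs, 3))
```

Under this assumed kernel, a dosage close to 0, 1, or 2 yields a near-certain genotype, whereas a dosage midway between two values splits the probability between them.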
We calculate the proportion of alleles identical by state between individuals j and j' for the hth haplotype at the lth locus as: [...]

[...] (with relmatLmer from the lme4qtl package). We could not employ the same approach for haplotype-based scans and were forced to fit a full model at every locus. We obtained a 10X speed-up by only carrying out a test at every tenth marker. This is a reasonable compromise, as haplotype scores change very little over 10 marker [...]

[...] Age and DOB affect the trait as first-order polynomials. Although bleeding time was measured in colony animals over 6 years, 75% of the animals were assayed within less than two years, and we do not know whether our assay was changing slowly over time or the colony itself was changing.

We estimate the heritability, using lme4qtl::relmatLmer (below), of the residual quantile-normalized bleeding phenotype or a single replicate of our simulation, using either the haplotype- or SNP-based kinship matrices described above.

[...] genotypes from low coverage data and the highly confident RNAseq calls. Figure 1A [...] pass sequencing data appears to be routine in P. leucopus closed colony deermice, as there is likely extensive linkage disequilibrium.

We estimated a kinship matrix at every 1000th marker genome-wide for the set of 8 haplotype dosages for the 405 animals, and normalized the relatedness matrix using a parent-offspring trio (Supp. Figure 2). Although there is the potential for relatedness among the P. leucopus colony animals of our study, the kinship matrix subsetted to the 297 individuals we examine more carefully here suggested that mating between closely related animals is largely avoided in the colony, with only a small number of pairwise comparisons suggesting close relatedness. We did not attempt to remove these individuals from the study, and instead used the kinship matrix and mixed models to map QTL.
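One way to turn per-haplotype genotype probabilities into a kinship matrix evaluated at every 1000th marker is to average the expected identity-by-state (IBS) sharing over loci and haplotypes. The Python sketch below assumes that the genotype probabilities of two individuals are independent and that a genotype pair (g, g') scores 1 - |g - g'|/2. The function name `expected_ibs_kinship` and the array layout are hypothetical, and the trio-based normalization is not included.

```python
import numpy as np

# IBS score for a pair of genotypes (g, g'): 1 if identical, 0.5 if they
# differ by one allele, 0 if they are opposite homozygotes.
IBS_SCORE = 1.0 - np.abs(np.subtract.outer(np.arange(3), np.arange(3))) / 2.0


def expected_ibs_kinship(geno_probs, step=1000):
    """Average expected IBS sharing between all pairs of individuals.

    geno_probs: array of shape (n_individuals, n_loci, n_haplotypes, 3)
                holding P(g = 0, 1, 2) per individual, locus and haplotype
                (layout assumed for this sketch).
    step:       keep every `step`-th locus, mirroring the thinning in the text.
    """
    thinned = geno_probs[:, ::step]
    n_loci, n_hap = thinned.shape[1], thinned.shape[2]
    # Sum over loci, haplotypes and both genotype axes of P_j(g) * score(g, g') * P_k(g')
    shared = np.einsum("jlhg,gf,klhf->jk", thinned, IBS_SCORE, thinned)
    return shared / (n_loci * n_hap)
```

The result is a symmetric individuals-by-individuals matrix. A matrix of this kind could then be rescaled and passed to a mixed model as a relatedness covariance.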
[...] intuitive, that the marker-based test seems to be similarly powered to detect QTL whose underlying genetics is NOT a single causative site (Table 1). We suspect both tests are really detecting haplotypes or markers that happen to tag a region fairly efficiently (and the marker- [...]

We tabulated summary statistics for the scans carried out to detect a QTL located on chromosome 23 (Table 1; thresholds are 7.5 and 6.0 for marker- and haplotype-based tests [...] (cf. Figure 3) suggest block-like patterns of highly significant markers. Finally, the median distance of the most significant marker (MSM) from the causative SNP/gene is less than the mean distance (Table 1), suggesting that occasionally the MSM is very far from the causative site, [...]

[...] when marker- or haplotype-based tests show "peaks" they do so via large blocks likely greater than 1 Mb in size. Figure 5, a Manhattan plot of the bleeding time GWAS, shows no significant hits (consistent with the QQ plot). Furthermore, it is not obvious from a visual comparison of the actual GWAS with the GWAS on permuted phenotypes that the real GWAS is producing highly meaningful results. Any interpretation of the results must therefore be tempered by the idea that observed peaks are only suggestive.

We chose to focus on four regions located on chromosomes 3, 13, 15, and 22 (Table 2). Although not significant, these peaks are elevated, show consistent signals across the two tests, show the expected block-like pattern of significance, and are not located near the tips of chromosomes (nor on the X chromosome), where imputation is more suspect. Supplementary Figure 7 shows the associations under the two tests for these four chromosomes, and Figure 6 shows 10 Mb regions centered on the peaks. Although the peaks are not significant, they are consistent across tests and suggest localizations within about 2 Mb.
We thus tested a set of 83 genes whose mouse GO term matched "platelet function", plus two additional genes associated with warfarin resistance in rodents, to see if they were located within 2 Mb of the candidate peaks. The peak on chromosome 3 was associated with three coagulation candidate genes, namely Von Willebrand factor (VWF), CD9 antigen, and protein tyrosine phosphatase, non-receptor type 6. There are no obvious polymorphisms in VWF that could explain our mapping result, but it is an interesting candidate gene. We further hypothesized that variation in VKOR1 (or a paralog) could be associated with a bleeding time QTL, as deermice are undoubtedly exposed to warfarin "rat" poison, but neither gene was associated with the four suggestive peaks.

Here we explore a diversity outbred approach for dissecting complex traits in Peromyscus leucopus, a non-traditional genetic model system. We take advantage of a colony of deermice founded from 38 individuals between 1982 and 1985 and subsequently maintained as a closed colony for 35+ years (~40-60 generations). We speculate that this P. leucopus colony shares many [...]

[...] suggesting that, as with DO mice, on the order of 500-1000 mice are likely necessary to routinely map QTL contributing 5% to variation in the complex trait (Chitre et al. 2020 [...]

* percent of replicates with at least one test above 6.0 (haplotype) or 7.5 (marker)

[...] Left panels from top to bottom are raw dosage values colored by haplotype for a lower coverage individual (0.7X), higher coverage individual (1.7X), or average over all 297 individuals

Supplementary Figure 2: Histogram of relatedness values in the kinship matrix used in QTL scans.
Two arrows point to relatedness values of known parent-offspring relationships.

Supplementary Figure 3: Example of chromosome 23 scans with a simulated QTL on chromosome 19 (control).

Supplementary Figure 4: Distance of most significant marker (MSM) in a scan from the simulated causative gene as a function of the MSM LOD score. Only replicates having one or more "hits" are displayed.

Supplementary Figure 5: QQ plots of LOD scores from genome-wide scans for bleeding time with actual or permuted phenotypes. Left, marker-based tests [...]

Supplementary Figure 6: Manhattan plot of a control genome-wide scan for bleeding time under permuted phenotypes (negative control). Faceted to match Figure 5.

Supplementary Figure 7: Chromosome scans of chromosomes containing notable peaks from the genome-wide scan for bleeding time.

Genetic analysis of complex traits in the emerging Collaborative Cross
Infection resistance and tolerance in Peromyscus spp., natural reservoirs of microbes that are virulent for humans
Genomes, expression profiles, and diversity of mitochondria of the White Peromyscus leucopus, reservoir of Lyme disease and other zoonoses
Peromyscus mice as a model for studying natural variation
An Expanded View of Complex Traits: From Polygenic to Omnigenic
A tail vein bleeding time model and delayed bleeding in hemophiliac mice
[...] EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019
[...] Study in 3,173 Outbred Rats Identifies Multiple Loci for Body Weight, Adiposity, and Fasting Glucose
The variant call format and VCFtools
Rapid genotype imputation from sequence without reference panels
Quantitative trait locus mapping methods for diversity outbred mice
Development of the National Institutes of Health genetically heterogeneous rat stock
Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype
[...] Synthetic Population Resource for the routine dissection of complex traits
Back to the Future: Multiparent Populations Provide the Key to Unlocking the Genetic Basis of Complex Traits
Mapping mendelian factors underlying quantitative traits using RFLP linkage maps
Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM
[...] Drosophila Synthetic Population Resource
The genome of Peromyscus leucopus, natural host for Lyme disease and other emerging infections
Joint estimates of quantitative trait locus effect and frequency using synthetic recombinant populations of Drosophila melanogaster
Charting the genotype-phenotype map: lessons from the [...]
Finding the missing heritability of complex diseases
[...] Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data
Genetic properties of the maize nested association mapping population
Lactobacilli and other gastrointestinal microbiota of Peromyscus leucopus, reservoir host for agents of Lyme disease and other zoonoses in North America
A method for fine mapping quantitative trait loci in outbred animal stocks
Driving towards ecotechnologies
Persistence of platelet thrombus formation in arterioles of mice lacking both von Willebrand factor and fibrinogen
Estimating FST and kinship for arbitrary population structures
Peromyscus leucopus White-footed mouse LL Stock, Peromyscus Genetic Stock Center
Are rare variants responsible for susceptibility to complex diseases?
Combined sequence-based and genetic mapping analysis of complex traits in outbred rats
High-resolution genetic mapping using the Mouse Diversity outbred population
Properties and modeling of GWAS when complex disease risk is due to non-complementing, deleterious mutations in genes of large effect
Heterogeneous Stock Populations for Analysis of Complex [...]
lme4qtl: linear mixed models with flexible covariance structure for genetic studies of related individuals