key: cord-0274781-mu699x3k authors: Randolph, Haley E; Mu, Zepeng; Fiege, Jessica K; Thielen, Beth K; Grenier, Jean-Christophe; Cobb, Mari S; Hussin, Julie G; Li, Yang I; Langlois, Ryan A; Barreiro, Luis B title: Single-cell RNA-sequencing reveals pervasive but highly cell type-specific genetic ancestry effects on the response to viral infection date: 2020-12-21 journal: bioRxiv DOI: 10.1101/2020.12.21.423830 sha: 520884665adbf01ddfaba65a7bca7d21fff075f9 doc_id: 274781 cord_uid: mu699x3k Humans vary in their susceptibility to infectious disease, partly due to variation in the immune response following infection. Here, we used single-cell RNA-sequencing to quantify genetic contributions to this variation in peripheral blood mononuclear cells, focusing specifically on the transcriptional response to influenza infection. We find that monocytes are the most responsive to influenza infection, but that all cell types mount a conserved interferon response, which is stronger in individuals with increased European ancestry. By comparing European American and African American individuals, we show that genetic ancestry effects on expression are common, influencing 29% of genes, but highly cell type-specific. Further, we demonstrate that much of this population-associated expression variation is explained by cis expression quantitative trait loci, which are enriched for signatures of recent positive selection. Our findings establish common cis-regulatory variants—including those that are differentiated by genetic ancestry—as important determinants of the antiviral immune response. genetic contributions to this variation in peripheral blood mononuclear cells, focusing specifically 23 on the transcriptional response to influenza infection. We find that monocytes are the most 24 responsive to influenza infection, but that all cell types mount a conserved interferon response, 25 which is stronger in individuals with increased European ancestry. By comparing European 26 American and African American individuals, we show that genetic ancestry effects on expression 27 are common, influencing 29% of genes, but highly cell type-specific. Further, we demonstrate that 28 much of this population-associated expression variation is explained by cis expression 29 quantitative trait loci, which are enriched for signatures of recent positive selection. Our findings 30 establish common cis-regulatory variants-including those that are differentiated by genetic 31 ancestry-as important determinants of the antiviral immune response. were probably rare. Due to the restricted potential for long-distance (especially intercontinental) 41 exchange, earlier viral epidemics are thought to have been strongly stratified by geography (Enard 42 and Petrov, 2020). Consequently, although most human genetic variation is shared between 43 populations, differences in viral-mediated selection pressures may have driven divergence in the 44 frequencies of polymorphisms that mediate the viral host response (either because of differences 45 in the viruses that caused epidemic outbreaks or heterogeneity in the timing of epidemic events 46 between populations). If so, the pattern of past epidemic outbreaks may have contributed to 47 variation in viral susceptibility observed within and among modern-day human populations-48 perhaps interacting with or compounding known health disparities that contribute to substantially 49 higher rates of influenza and COVID-19 hospitalization in Black versus non-Hispanic white 50 Americans (e.g., the Centers for Disease Control and Prevention (CDC) estimates a 79% higher 51 rate of influenza-related hospitalizations for Black versus white Americans: (CDC, 2020)). 52 Genetics is thought to play an important role in explaining population variation in 53 susceptibility to influenza and other viral pathogens (Albright et al., 2008; Kenney et al., 2017) . 54 Supporting this view, a study exploring the impact of regulatory genetic variation on gene 55 expression levels (i.e. expression quantitative trait loci [eQTL] studies) following influenza A virus 56 (IAV) infection of human dendritic cells revealed 121 genetic variants that are significantly 57 associated with the immune response to IAV, including one cis-acting variant associated with the 58 interferon regulatory factor 7 gene (IRF7) that also acts as a trans-regulator responsible for 59 genetic effects on a hub of IRF7-induced antiviral genes (Lee et al., 2014) . Variation in the gene 60 expression response to IAV in vitro is also correlated with genetic ancestry in monocytes derived 61 cell type most susceptible to viral entry, the most permissible to intracellular replication of the 113 virus, or both. 114 We then explored the extent to which the infection response was concordant or discordant 115 across the five major PBMC cell types. Overall, we found a strong correlation between the 116 response to IAV infection across cell types (Pearson's r range 0.65 -0.95 across pairwise cell 117 type comparisons, Fig. S1C ). However, we also observed many genes for which the response to 118 infection was discordant between cell types. For example, among differentially-expressed genes 119 shared by monocytes and NK cells (n = 822), 138 genes (16.8%, Fig. S1D ) responded to IAV 120 infection in the opposite direction (Fig. 1F) . To further dissect cell type-specific versus shared 121 responses, we generated a per-gene specificity score based on the coefficient of variation of the 122 log2 fold-change response across cell types for each gene that was significantly differentially-123 expressed in at least one cell type (high values indicate highly cell type-specific responses to IAV, 124 low values indicate shared responses to IAV) (Table S4) . Genes with highly cell type-specific 125 patterns of response were enriched for roles in translational initiation, co-translational protein 126 targeting to membrane, and viral gene expression (FDR < 1x10 -10 for all terms, Fig. 1G , left, Table 127 S4). In contrast, genes with low specificity scores were enriched for pathways related to type I 128 interferon (IFN) signaling (FDR < 1x10 -10 ) and response to type I IFN (FDR = 7.1x10 -4 ) (Fig. 1G, 129 right, Table S4 ). Thus, induction of IFN-related genes appears to be a fundamental component of 130 the antiviral response that is shared across immune cell types (Fig. 1H, top) . One notable 131 exception to this observation is the gene IFNG, which encodes the type II IFN cytokine IFN-γ. 132 IFN-γ is a crucial mediator of antiviral immunity (Kang et al., 2018) , but shows an expression 133 pattern that is almost exclusive to NKT cells in the IAV-infected condition (Fig. 1H, bottom) . 134 Collectively, our results underscore the importance of considering the immune responses of single 135 cell types independently. Not only does this approach reveal distinct, cell type-specific responses 136 to viral infection, but also highlights responses that would be undetectable or potentially 137 misleading in more heterogeneous immune cell populations (e.g. PBMCs). 138 We next identified genes for which gene expression levels are correlated with quantitative 141 estimates of genetic ancestry in either the mock condition, the IAV-infected condition, or both 142 (after controlling for age, batch, and other technical covariates). To increase power and improve 143 our effect size estimates for these "population differentially-expressed" (popDE) genes, we 144 implemented a multivariate adaptive shrinkage method (mash) (Urbut et al., 2019) , which 145 leverages the correlation structure of genetic ancestry effect sizes across cell types (see Methods 146 for details in the statistical models used). Across conditions and cell types, we identified 1,949 147 popDE genes (local false sign rate (lfsr) < 0.10), ranging from 830 in NK cells to 1,235 genes in 148 CD4 + T cells ( Fig. 2A , Table S5 ). Within each cell type, most popDE genes were shared between 149 mock and IAV-infected conditions (52.9% in monocytes -77.4% in CD8 + T cells). In contrast, 150 across cell types, we found that genetic ancestry effects on gene expression were highly cell type-151 specific, such that the majority of popDE genes were identified in only one or two cell types (52.2% 152 in mock, 51.4% in IAV-infected) (Fig. 2B ). For example, CXCL8, which encodes IL-8, an important 153 mediator of the inflammatory response (Bickel, 1993) , is more highly expressed with increasing 154 African ancestry following IAV infection only in monocytes (lfsr = 0.051 in monocytes and lfsr > 155 0.25 in all other cell types) (Fig. 2C, top) . There are, however, a minority of genes that exhibit 156 shared genetic ancestry effects across all five cell types (17.8% in mock, 24.7% in IAV-infected). 157 One such gene is IL32, which encodes a cytokine that induces other proinflammatory cytokines 158 to activate the NF-κB pathway (Ca and Sh, 2006; Yan et al., 2018) . IL32 is more highly expressed 159 with increasing African ancestry in all cell types following infection (lsfr < 8.8x10 -6 ) (Fig. 2C, 160 bottom). 161 To identify the functional pathways most closely associated with genetic ancestry, we 162 performed enrichment analysis on the popDE effects using the Molecular Signatures Database 163 hallmark gene sets (Liberzon et al., 2015) (Fig. 2D , Table S6 ). In monocytes, we identified 164 significant enrichment for multiple immune pathways prior to infection, including the IFN-α 165 response (FDR = 1.9x10 -3 ), IFN-γ response (FDR = 5.4x10 -4 ), TNFα signaling via NF-κB (FDR = 166 6.1x10 -4 ), IL-2/STAT5 signaling (FDR = 2.1x10 -3 ), and inflammatory response (FDR = 0.012) (Fig. 167 2D). In all cases, these enrichments were identified for genes that were more highly expressed at 168 baseline (i.e., in the mock treatment condition) in individuals with a greater proportion of African 169 ancestry. Intriguingly, in IAV-infected cells, this pattern shifts: post-infection, we observed 170 enrichment of type I and II IFN pathways (IFN-α response FDR = 0.014, IFN-γ response FDR = 171 0.040 in monocytes) (Fig. 2D ) in genes more highly expressed with increasing European ancestry, 172 as opposed to African ancestry. To better characterize this shift, we constructed a per-individual, 173 per-condition score of interferon signaling activity (termed IFN score) by calculating the average 174 mean-centered expression of genes belonging to the hallmark IFN-α and IFN-γ gene sets 175 (Liberzon et al., 2015) . This simple summary statistic revealed that increased European ancestry 176 strongly correlates with increased IFN score, but only in the IAV condition (mean Pearson's r 177 across cell types = -0.26, Fisher's meta-p = 2.9x10 -6 in the IAV condition, Fisher's meta-p = 0.746 178 in the mock condition) (Fig. 2E) . 179 These findings suggest that, for some immune pathways (particularly interferon signaling 180 pathways), genetic ancestry may also predict the magnitude of the response to IAV infection. To 181 explicitly test this possibility, we identified significant interactions between treatment condition 182 (mock versus IAV) and genetic ancestry levels. After mash estimation of interaction effect sizes 183 across cell types, we identified 609 genes for which ancestry was associated with the response 184 to infection (i.e., "population differentially-responsive" [popDR] genes, lfsr < 0.10). PopDR genes 185 were found for all five cell types (number of popDR genes range = 84 -334), but were most 186 common in monocytes (n popDR genes = 334) (Fig. 2F) ; a core set of 27 popDR genes were also 187 shared across cell types (Fig. S2A , Table S7 ). In agreement with our previous results, we found 188 that increased European genetic ancestry predicts a stronger type I/II IFN response (measured 189 Polygenic selection has shaped genetic ancestry-associated differences in ribosomal 241 We next sought to evaluate if the intersection set of popDE genes and eGenes clustered 243 into specific biological pathways. We identified a strong enrichment for Gene Ontology (GO) terms 244 related to transcriptional and translational processes, including "ribosomal small subunit 245 biogenesis" and "viral transcription" (FDR < 3x10 -10 in mock-infected and IAV-infected) (Fig. 3E) , 246 as well as more canonical immune-related pathways, such as myeloid/leukocyte activation and 247 degranulation (Table S9 ). The observed gene expression divergence between populations in 248 genes linked to similar biological functions could be explained by two hypotheses: 1) genes 249 associated with such biological processes have evolved under relaxed evolutionary constraint, 250 allowing them to accumulate cis-regulatory variants that have randomly diverged in allele 251 frequencies via neutral genetic drift, or 2) cis-variants regulating these genes have undergone 252 non-neutral shifts in allele frequencies, resulting in the accumulation of alleles that systematically 253 influence the behavior of enriched pathways in a directional manner -a pattern consistent with 254 polygenic selection. 255 To test for polygenic selection, for each of the enriched pathways, we calculated the 256 median ancestry-associated differential expression effect (i.e., the estimated difference in gene 257 expression between European-and African-ancestry individuals) across all popDE genes 258 contained in a given pathway (limited to those with an eQTL). Under neutrality, we expect the 259 direction of the ancestry-associated effects to be randomly distributed (i.e., some genes will be 260 more highly expressed in European-ancestry individuals whereas others will be more highly 261 expressed in African-ancestry individuals). In contrast, under polygenic selection, we expect to 262 find a directional effect, such that most genes for a given pathway show higher expression in one 263 ancestry group versus the other. Interestingly, most of the GO terms for ribosomal protein-related 264 pathways (e.g. ribosomal biogenesis, viral transcription, etc.) show median population-associated 265 differences in gene expression levels that are consistently higher in individuals with increased 266 European ancestry across cell types, in both IAV-infected cells (Fig. 3F, 3G ) and mock-exposed 267 cells (Fig. S4A) . Importantly, these differences are attenuated when regressing out the effects of 268 the associated top cis-eQTLs for the genes (Fig. 3F, 3G) , suggesting that such differences are 269 driven by the cumulative effect of regulatory variants impacting the expression of ribosomal contributed to genetic ancestry-associated differences in vulnerability to these diseases. To 281 address this question, we performed colocalization analysis between the union set of eQTLs 282 detected across all cell types and conditions and 14 publicly available genome-wide association 283 study (GWAS) hits for 11 autoimmune diseases (Table S10 ). Colocalized eQTLs are expected to 284 be strongly enriched for causal drivers of variation in disease susceptibility across individuals. 285 Across all autoimmune diseases, we colocalized eQTL in our study with a total of 95 GWAS 286 variants (Fig. 4A , Table S10 ). 287 To analyze a broader array of immune-related colocalization signals, we combined our This approach allowed us to identify 1,030 colocalized GWAS hits across the 11 traits (mapping 292 to 536 eGenes, Table S10 ). We then asked if these putative causal variants were enriched for 293 signatures of natural selection in the 1000 Genomes Project CEU and YRI populations, using 294 either the integrated haplotype score (iHS, a within-population measure of recent positive 295 selection based on haplotype homozygosity (Voight et al., 2006) ) or extreme values of population 296 differentiation (F ST ). Far more colocalized loci display high |iHS| scores (values > 95 th percentile 297 of the genome-wide distribution) in the CEU population than expected by chance (p = 0.008, Fig. 298 4B), while no significant enrichment was detected in YRI. Our results thus suggest that natural 299 selection has acted on these cis-regulatory autoimmune risk variants, particularly in Europeans, observed that colocalized genes are more likely to be differentially-expressed between 304 populations than expected by chance (35.8% are classified as popDE, p = 0.007) (Fig. 4C) , 305 pointing to a potential genetic contribution for the differences in the incidence of autoimmune and 306 inflammatory disorders reported between African and European-ancestry individuals (Brinkworth 307 and Barreiro, 2014) . 308 Within our set of colocalized eGene-SNP pairs, 48 eGenes carried a signature of recent 309 positive selection in either the CEU or YRI populations (|iHS| or F ST > 95 th percentile of the 310 genome-wide distribution) (Fig. 4D) . Many of these genes involve crucial immune-related 311 functions. For example, the Crohn's disease-susceptibility risk variant rs2284553 colocalized with 312 IFNGR2, the gene encoding the beta chain of the IFN-γ receptor, in naïve CD8 + T cells (Fig. S5A) . 313 This variant is found at much higher frequency in the CEU population (MAF = 0.38) than the YRI 314 population (MAF = 0.05) and shows a signature of recent positive selection in the CEU (iHS = 315 2.22). Another variant detected in the allergic disease GWAS, rs5743618, maps to a non-316 synonymous SNP located in TLR1 that is also an eQTL for the nearby gene TLR6 (Fig. S5B) . studies are now needed to define whether ancestry-associated variation in the interferon 344 response to viral infection in vivo is associated with differential viral clearance, disease severity, 345 and disease outcome. 346 Many of the genetic ancestry-associated differences in immune regulation we observe are 347 driven by allele frequency differences at cis-regulatory variants. Among popDE genes in which 348 we identify at least one cis-eQTL across cell types and conditions, we estimate that, on average, 349 cis-eQTLs explain approximately 53% of the variance in the observed ancestry-associated 350 differences. This is likely an underestimate, given that it assumes that each gene has only a single 351 cis-eQTL, when in fact many genes have been shown to have two or more independent cis-eQTL 352 (Lappalainen et al., 2013) . Further, we are underpowered to detect trans associations, and we do 353 not consider non-SNP regulatory variants (e.g., indels and copy number variation), which also 354 influence gene expression variation in humans (Gymrek et al., 2016) . In addition, we provide 355 evidence for ancestry-associated directional shifts in molecular traits (i.e., gene expression 356 phenotypes related to specific biological pathways) that are under cis-regulatory genetic control, 357 highlighting the potential role of polygenic selection in the history of these phenotypes. levels (x-axis) predict the IFN score response (difference in IFN score between the IAV-infected 442 and mock-infected conditions, y-axis) in PBMCs (adj R 2 = 0.553, p = 2.8x10 -17 ), and most 443 individual cell types (Fig. S2C) . In (E) and (G), lines show the best-fit slope and intercept from a 444 linear model. 445 healthy were included in the study. All individuals had detectable levels of IAV-specific serum IgG 498 antibodies, but no differences in antibody titers were identified between European and African-499 ancestry individuals ( Figure S2C, S2D) . (1 µg/mL). For the following two days, 500 µL of Opti-MEM containing TPCK trypsin (2 µg/mL) 507 was added to the culture. One day later, the supernatant was harvested, centrifuged to remove 508 cellular debris, and stored at -80°C. Cal/04/09 was amplified on MDCK cells to generate a stock. 509 Uninfected MDCK cells were cultured for 48 -72 hours and supernatant was harvested to 510 generate the control, mock-conditioned media. Stocks were plaqued on MDCK cells. Cells were 511 infected in infection media (PBS with 10% Ca/Mg, 1% penicillin/streptomycin, 5% BSA) at 37°C 512 for 1 hour. Infection media was replaced with an agar overlay (2X MEM, 1 µg/mL TPCK trypsin, 513 1% DEAE-dextran, 5% NaCo 3 , 2% oxoid agar), and cells were cultured at 37°C for 40 hours then 514 fixed with 4% formaldehyde. Blocking and immunostaining were done for 1 hour at 25°C in 5% 515 milk. Primary stain was mouse anti-Cal/04/09 (1:5000), secondary stain was peroxidase sheep confounded with genetic ancestry. The morning of the experiment, 1M PBMCs were plated at a 525 concentration of 1M/ml for each condition, and exposed to either mock-conditioned media 526 (negative control) or Cal/04/09 IAV at an MOI of 0.5. After 30 minutes of exposure, the control 527 media or virus was washed from PBMC cultures, cells were replated, and cells were then 528 incubated for 6 hours at 37°C in 5% CO 2 and 20% O 2 . Following the 6 hour incubation, cells were 529 collected, washed, and prepared for single-cell capture using the 10X workflow. Immediately prior 530 to the capture, cells from samples were combined into two pools (6 samples per pool) each 531 balanced for infection status (mock-infected and IAV-infected) and genetic ancestry (Table S1) . 532 Multiplexed cell pools were used as input for the single-cell captures, and for each cell pool, 533 10,000 cells were targeted for collection using the Chromium Single Cell 3' Reagent (v2 We performed two versions of clustering analysis and cell type assignment: 1) in which 608 IAV genes were kept in the raw count matrix (used as input for pseudobulk calculations), and 2) 609 in which IAV genes were subset out of the raw count matrix (for visualization of the UMAP in Fig. 610 1B). All other steps of the clustering workflow (implemented in Seurat v3.1.5) remained the same. HLA-DRB1 + , CCR7 + , CST3 + , CD83 + ). A small group of cells, which were identified as B cells, 627 clustered with CD4 + T cells in the UMAP (Fig. 1B) , and we investigated this further to see whether 628 this subset represented a distinct, rare cell type. Further analysis revealed that these cells express 629 markers typical of NKT cells, including CD3D, NKG7, IL2, TNF, and IFNG, and thus, these cells 630 were manually annotated as NKT cells. In the UMAP constructed from input data containing IAV 631 genes, we excluded 1,832 cells for which we could not confidently assign a cell type, as they 632 independently. For each cell type, lowly-expressed genes were filtered using cell-type specific 653 cutoffs (removed genes with a median logCPM < 1.5 in CD4 + T cells, monocytes, and PBMCs, < individuals were introduced as additional covariates into the following differential infection effect 687 model that was run per cell type: 688 Here, E i,j represents the capture-corrected expression estimate of gene i for individual j and Prior to modeling genetic ancestry effects, capture-corrected expression estimates were 725 quantile-normalized within condition using qqnorm in R. The following nested linear model was 726 used to identify genes for which expression levels are correlated with the proportion of African 727 ancestry across individuals within condition (i.e. popDE genes): 728 Here, E i,j represents the capture-corrected expression estimate of gene i for individual j, β 0 (i) is Genes for which the response to IAV infection is correlated with the proportion of African 739 ancestry (i.e. popDR genes) were detected using the following model: 740 This model is similar to M 1 (differential effect of IAV infection), in that it allows us to obtain 742 estimates based on within-individual variability, with the difference that the IAV infection effect is 743 no longer built in a genetic ancestry-independent manner as in model M 1 , since it is now 744 dependent on genetic ancestry as follows: β IAV + β AA IAV (i)⋅AA. In this context, β AA IAV denotes the 745 genetic ancestry-infection interaction effect induced by IAV infection, which represents variation 746 in the response to infection that is correlated with the proportion of African ancestry. 747 To assess sharing of genetic ancestry effects across cell types and to increase our power 748 to detect these effects, we applied Multivariate Adaptive Shrinkage in R (mashr v0. quantile-normalized within condition prior to running the association. Mock-exposed and IAV-779 infected eQTL were mapped separately across all cell types. All regressions were performed 780 using the R package MatrixEQTL (v2.3) (Shabalin, 2012) . Only SNPs with a minor allele 781 frequency > 5% across all individuals were tested, and SNPs with > 10% of missing data or 782 deviating from Hardy-Weinberg equilibrium at p < 10 -5 were excluded (--maf 0.05 --geno 0.10 --783 hwe 0.00001 PLINK v1.9 filters, www.cog-genomics.org/plink/1.9/) (Chang et al., 2015) . In total, 784 6,305,923 SNPs passed our quality-control filters. Local associations (i.e. putative cis-eQTL) were 785 tested against all SNPs located within the gene body or 100kb upstream and downstream of the 786 transcription start site (TSS) and transcription end site (TES) for each gene tested. We recorded 787 the minimum p-value (i.e. the strongest association) observed for each gene, which we used as 788 statistical evidence for the presence of at least one eQTL for that gene. To estimate an FDR, we 789 permuted the genotype data ten times, re-performed the linear regressions, and recorded the 790 Table S11 . Of note, while PC corrections increase our power to detect eQTL, they do 801 not affect the underlying structure of the expression data. considered shared genes that were tested across all cell types (n = 6,573). For each of these 816 genes, we chose a single, top cis-SNP, defined as the SNP with the lowest FDR across all cell 817 types (n = 5) and conditions (n = 2), to input into mashr, yielding a total of 6,573 gene-SNP pairs. 818 We extracted the prior effect sizes (betas) and computed the standard errors (SEs) of these betas 819 (defined as the beta divided by the t-statistic) from the Matrix eQTL outputs for each gene-SNP 820 pair across cell types and conditions. We defined a set of "strong" tests (i.e. the 6,573 top gene-821 SNP associations) as well as a set of random tests, including both null and non-null tests, which 822 we obtained from randomly sampling 200,000 rows of a matrix containing all gene-SNP pairs 823 tested by Matrix eQTL merged across conditions. Our mashr workflow was as follows: i) the 824 correlation structure among the null tests was learned using the random test subset, ii) the data-825 driven covariance matrices were learned using the strong test subset, iii) the mash model was fit 826 to the random test subset using canonical and data-driven covariance matrices, with two 827 additional "infection" covariance matrices (i.e. one matrix capturing shared effects in only the 828 mock-exposed samples and another matrix capturing shared effects in only the IAV-infected 829 samples), and iv) the posterior summaries were computed for the strong test subset. We used 830 the estimated local false sign rate (lfsr) to assess significance of our posterior eQTL effects and 831 considered a gene-SNP pair to have a significant eQTL effect if the lfsr of the posterior mean was 832 < 0.10, which we defined as an eGene. 833 834 Identification of condition-specific popDE genes and eGenes 835 Within each cell type, we considered either popDE genes or eGenes as condition-specific 836 (i.e. only showing an effect in either the mock or IAV infection condition) if they had an lfsr < 0.10 837 in only one condition. Here, we assume that the risk of identifying a true effect in both mock and 838 IAV-infected cells (i.e. a shared popDE gene/eGene) as falsely condition-specific due to lack of 839 power is low, specifically because we employed the multivariate adaptive shrinkage framework, 840 which draws information across conditions to make better-informed posterior estimates about the 841 sharing of effects, so we do not expect to see many posterior effects called as "condition-specific" 842 when, in fact, they are not. 843 To assess the impact of cis-regression on population-associated expression differences, 870 we used two models evaluating the effect of continuous genetic ancestry (African ancestry 871 proportion) on gene expression: i) a model analogous to M 2 (model evaluating the popDE effects, 872 "Modeling genetic ancestry effects and integration with mashr"), and ii) a model in which, for each 873 gene, the top cis SNP for that gene was regressed by including the genotype dosage for that SNP 874 across individuals as a covariate in the model. The models were fit using limma and mashr was 875 applied (as described in the section "Modeling genetic ancestry effects and integration with 876 mashr") to the prior effect sizes and standard errors derived from both models. The mashr 877 posterior summaries were used to directly obtain the observed population differences in 878 expression for each gene. 879 For each significantly enriched GO term (FDR < 0.01) identified in Fig. 3E (see 880 "Enrichment analyses" section below), we calculated summaries of the observed population 881 difference in expression among the genes that belong to each term that are also popDE genes 882 with evidence of an eQTL in at least one cell type. To do this, for each cell type for each term, we 883 collected the observed population differences among these term-specific genetically-driven 884 popDE genes and calculated the median and standard error (SE) for these values (plotted on the 885 x-axis in Fig. 3G ). This was performed for both the observed ("real") model outputs (model i) as 886 well as the cis-regressed model outputs (model ii). For each cell type, we obtained a p-value for 887 the real effects using a permutation method. To obtain a null distribution, we performed 1,000 888 permutations where, for each iteration, we: 1) sampled the same number of observed term-889 specific, genetically-driven popDE genes for that cell type from a background set of all genetically-890 driven popDE for that cell type, 2) obtained the population differences in expression among these 891 genes, and 3) calculated the median for these null values. We then computed a one-sided, 892 empirical p-value, where we considered the number of instances more extreme in the median null 893 difference compared to the median observed difference in the real data given the sign of this 894 difference (i.e. if the observed difference in the real data was < 0, we counted the number of 895 observations in the null distribution equal to or less than the observed value, and if the observed 896 difference in the real data was > 0, we counted the number of observations in the null distribution 897 equal to or greater than the observed value), where p = number of instances more extreme divided 898 by the number of permutations (n = 1000). Similarly, we obtained a p-value for the cis-regressed 899 effects using the same method, except for that in steps 2 and 3, we considered the cis-regressed 900 population differences as opposed to those seen in the real data. Notably, to calculate the 901 directional p-value for the cis-regressed case, we used the magnitude of the median cis-regressed 902 population difference but still considered the sign of the median observed population difference. 903 904 Specifically for the colocalization analysis, eQTL were remapped in each cell type with 906 Matrix eQTL (Shabalin, 2012) using a 1 megabase (Mb) cis-window, with all other modeling 907 parameters kept constant, to broaden our search space and increase our probability of detecting 908 colocalized variants. We assessed colocalization between our identified eQTLs in each cell type, 909 condition pair and 14 publicly-available GWAS summary statistics for 11 autoimmune diseases 910 (Table S10) to the posterior probability of having two independent signals (one for the eQTL and one for the 919 GWAS) and PP4 corresponds to the posterior probability of colocalization between the eQTL and 920 To expand the number of colocalized genes considered for downstream analyses, we also 922 downloaded colocalization results between harmonized bulk eQTLs in 18 immune cell types from 923 3 studies (DICE, DGN, and BLUEPRINT) and the same 14 autoimmune GWAS. (Weir and Cockerham, 1984) . 947 948 Enrichment of colocalized hits with popDE genes and iHS statistics 949 With popDE genes 950 We tested for an enrichment of popDE genes among genes with a colocalization signal 951 across the 14 autoimmune traits included in the colocalization analysis in aggregate. Considering 952 all colocalized signals, we collected a list of the unique genes associated with colocalization hits, 953 corresponding to the eGenes driving the eQTL signature. Among these genes, we calculated the 954 proportion that are also identified as popDE genes (here, a gene is considered popDE if it is called 955 as significantly (lfsr < 0.10) popDE in at least one cell type and condition) and consider this our 956 "observed proportion". To obtain a null distribution, we performed 1,000 permutations where, for 957 each iteration, we: i) sampled the same number of unique genes associated with colocalization 958 hits from a list of all genes tested for the eQTL analysis that were shared across cell types (n = 959 6205), and ii) calculated the proportion of these genes that are also popDE (our "null percentage"). 960 We calculated a p-value by evaluating the number of permutations in which the null percentage 961 was greater than or equal to the observed percentage divided by the number of total permutations 962 (n = 1,000). 963 We then tested for an enrichment of variants with high iHS values (defined as those with 965 an |standardized iHS| > 95 th percentile of the genome-wide distribution among our tested SNPs 966 for the population being considered) among colocalized SNPs for all of the autoimmune traits as 967 a group. For both the 95 th percentile iHS calculations and our null distribution sampling approach 968 described below, we only considered SNPs that were tested for an eQTL association in Matrix 969 eQTL (i.e. those with a minor allele frequency > 0.05). The 95 th percentile |standardized iHS| 970 cutoffs were as follows: CEU = 1.92 and YRI = 1.95. For each tested SNP, we obtained 971 population-specific allele frequencies calculated from the 1000GP CEU or YRI individuals using 972 the vcftools (v0.1) (Danecek et al., 2011) --freq flag. These alleles frequencies were converted to 973 minor allele frequencies (MAFs) (i.e. if the allele frequency for a SNP was > 0.5, we subtracted it 974 from 1), and SNPs were subsequently partitioned into 5% MAF bins (e.g. bin 1 = 0 -5% MAF, 975 bin 2 = 5 -10% MAF, etc. with the rightmost interval closed). 976 Among the unique colocalized SNPs we identified across traits, we noticed that a subset 977 appeared to be in high linkage disequilibrium (LD) with one another, suggesting that these SNPs 978 likely did not represent independent colocalization signals. To account for this, we systemically 979 identified SNPs with squared inter-variant allele count correlations (r 2 ) > 0.8 using PLINK (v1.9, -980 -r2 --ld-window-r2 0.8) (Chang et al., 2015) among all the colocalized hits with multiple tag SNPs 981 for a single eGene. To obtain a list of independent colocalized SNPs, we included: 1) those 982 identified as unlinked loci, and 2) only the SNP with the highest |iHS| value among each set of 983 SNPs with an r 2 > 0.8. We then calculated the proportion of those with an |iHS| > 95 th percentile 984 and considered this our "observed percentage". To obtain a null distribution, we used a sampling 985 approach that mimicked both the MAF distribution and the underlying LD structure of our true 986 data. We performed 1,000 permutations, where, for each iteration, we treated the sampling for 987 the "independent" and "linked" SNPs separately. To create a null distribution for the "independent" 988 SNPs, we sampled the same number of independent SNPs observed in the real data from the set 989 of all tested SNPs in a MAF bin-matched manner, e.g. if there were 10 colocalized SNPs in MAF 990 bin 3 in the observed data, we sampled 10 SNPs from the set of tested SNPs in MAF bin 3. To 991 obtain a null distribution for the "linked" SNPs, we simulated the underlying LD structure of these 992 variants where, for each eGene in the observed data with multiple tag SNPs (i.e. those SNPs with 993 an r 2 > 0.8), we: i) counted the number of tag SNPs for that gene and obtained the corresponding 994 MAF bin for the SNP with the highest |iHS| value, ii) randomly sampled a set of SNPs from 995 chromosome 1 with r 2 > 0.8 from the MAF bin identified in i), where the number of SNPs in the 996 set was equal to the number of tag SNPs in the observed data, and iii) picked the SNP with the 997 ancestry (x-axis) and IFN score response (y-axis) across individuals (mean Pearson's r across 1059 FDR < 0.05) upon IAV infection in either of the cell types being 1053 compared. (D) Pairwise comparisons of the percentage of DE genes in both cell types being 1054 compared that show discordant effect sizes Cis-genetic effects regulate gene expression. (A) Sharing of significant eGenes (lfsr < 1075 0.10) across cell types and treatment conditions. (B) Correlation of the cis-predicted population 1076 differences in expression (x-axis) versus the observed population differences in expression (y-1077 axis) among popDE genes with an eQTL across all cell types in the mock The black line shows the best-fit line from a linear model References 1096 Detection of Mycobacterium avium 1097 subspecies paratuberculosis from patients with Crohn's disease using nucleic acid-based 1098 techniques: a systematic review and meta-analysis Evidence for a heritable predisposition to death due to influenza Enhancements to the ADMIXTURE algorithm for 1102 individual ancestry estimation A global reference for human 1105 genetic variation Evolutionary Dynamics of Human Toll-Like Receptors 1108 and Their Different Contributions to Host Defense Deciphering 1110 the genetic architecture of variation in the immune response to Mycobacterium tuberculosis 1111 infection Characterizing the genetic basis of 1114 transcriptome diversity through RNA-sequencing of 922 individuals Ribosome biogenesis restricts innate immune responses to virus 1116 infection and DNA The role of interleukin-8 in inflammation and mechanisms of regulation ClueGO: a Cytoscape plug-in to decipher 1121 functionally grouped gene ontology and pathway annotation networks Variation in antiviral 2',5'-oligoadenylate 1125 synthetase (2'5'AS) enzyme activity is controlled by a single-nucleotide polymorphism at a splice-1126 acceptor site in the OAS1 gene The contribution of natural selection to present-day 1128 susceptibility to chronic inflammatory and autoimmune disease Genomics for the world IL-32, a novel cytokine with a possible role in disease Racial and Ethnic Minority Groups Second-1135 generation PLINK: rising to the challenge of larger and richer datasets Genetic Drivers of Epigenetic and Transcriptional Variation in 1138 Human Immune Cells The variant call format and VCFtools GOrilla: a tool for discovery 1143 and visualization of enriched GO terms in ranked gene lists Single nucleotide polymorphism at 1146 exon 7 splice acceptor site of OAS1 gene determines response of hepatitis C virus patients to 1147 interferon therapy Evidence that RNA Viruses Drove Adaptive Introgression 1149 between Neanderthals and Modern Humans Ancient RNA virus epidemics through the lens of recent 1151 adaptation in human genomes Viruses are a dominant driver of protein 1154 adaptation in mammals Innate Immune Activity Conditions the Effect of Regulatory 1157 Variants upon Monocyte Gene Expression Mycobacterium avium subspecies paratuberculosis and 1160 Crohn's disease: a systematic review and meta-analysis Rescue of Influenza A Virus from Recombinant DNA Ribosome assembly in 1164 eukaryotes Signatures of Environmental Genetic Adaptation Pinpoint Pathogens as the Main Selective 1167 Pressure through Human Evolution Bayesian Test for Colocalisation between Pairs of Genetic Association Studies 1170 Using Summary Statistics Abundant contribution of short tandem repeats to gene 1173 expression variation in humans Pandemic H1N1 Influenza Virus Has Minimal Impact on Virulence in Animal Models Interaction of Hantavirus Nucleocapsid Protein with Ribosomal 1179 Protein S19 Souporcell: robust clustering of single-cell RNA-seq data by genotype 1182 without reference genotypes Ensembl comparative genomics resources A DNA 1187 transfection system for generation of influenza A virus from eight plasmids Attenuation of 40S 1189 Ribosomal Subunit Abundance Differentially Affects Host and HCV Translation and Suppresses 1190 HCV Replication Systemic lupus erythematosus in adults is associated with previous Epstein-Barr virus exposure Direct Antiviral Mechanisms of Interferon-Gamma Ethnic differences in 1198 C-reactive protein concentrations Human Genetic Determinants of 1201 Viral Diseases Fast gene set enrichment analysis The ribonuclease L-dependent 1205 antiviral roles of human 2',5'-oligoadenylate synthetase family members against hepatitis C virus Transcriptome and 1209 genome sequencing uncovers functional variation in humans Common Genetic Variants Modulate Pathogen-Sensing 1212 Responses in Human Dendritic Cells A statistical framework for SNP calling, mutation discovery, association mapping 1214 and population genetical parameter estimation from sequencing data Regulation of Ribosomal Proteins on Viral Infection The Molecular Signatures Database Hallmark Gene Set Collection Genetic variation in OAS1 is a risk factor for initial infection 1221 with West Nile virus in man hapbin: An Efficient Program 1223 for Performing Haplotype-Based Scans for Positive Selection in Large Genomic Datasets Impact of cell-type and context-1226 dependent regulatory variants on human immune traits Genetic Ancestry and Natural Selection Drive 1229 Population Differences in Immune Responses to Pathogens Tracing the peopling of the world through genomics Basal Expression of Interferon-Stimulated Genes 1234 Drives Population Differences in Monocyte Susceptibility to Influenza Infection Group 1237 differences in proneness to inflammation Distinctive roles of age, sex, and genetics in shaping 1240 transcriptional variation of human immune responses to microbial challenges Living in an adaptive world: Genomic dissection of the 1243 genus Homo and its immune response Genetic Adaptation and Neandertal Admixture Shaped 1246 the Immune System of Human Populations Systemic autoimmune diseases co-existing with chronic 1249 hepatitis C virus infection (the HISPAMEC Registry): patterns of clinical and immunological 1250 expression in 180 cases limma 1252 powers differential expression analyses for RNA-sequencing and microarray studies edgeR: a Bioconductor package for 1255 differential expression analysis of digital gene expression data Inflammatory bowel disease 1257 associated with Yersinia enterocolitica O:3 infection Adaptively introgressed Neandertal haplotype at the OAS locus functionally 1261 impacts innate immune responses in humans Genetic and evolutionary determinants of 1263 human population variation in immune responses Impact of Genetic 1267 Polymorphisms on Human Immune Cell Gene Expression Matrix eQTL: ultra fast eQTL analysis via large matrix operations Cytoscape: A Software Environment for Integrated Models of 1272 Biomolecular Interaction Networks The Red Queen's long race: human adaptation to 1274 pathogen pressure Social status alters immune 1277 regulation and response to infection in macaques Statistical significance for genomewide studies Comprehensive Integration of Single-Cell Data Gene set enrichment analysis: a 1285 knowledge-based approach for interpreting genome-wide expression profiles Flexible statistical methods for 1288 estimating and testing effects in genomic studies with multiple conditions Immunology of COVID-19: Current State of the Science A Map of Recent Positive 1294 Selection in the Human Genome Genome-Wide Polysome Profiling Reveals an Inflammation-Responsive 1297 Eliciting priors and relaxing the single causal variant assumption in 1299 colocalisation analyses IKKβ 1301 phosphorylation regulates RPS3 nuclear translocation and NF-κB function during Escherichia coli 1302 O157:H7 infection Immunoribosomes: Where's there's fire, there's fire. Molecular Estimating F-Statistics for the Analysis of Population 1306 Structure Elevated immunoglobulin G antibodies to the proline-rich amino-terminal 1309 region of Epstein-Barr virus nuclear antigen-2 in sera from patients with systemic connective 1310 tissue diseases and from a subgroup of Sjögren's syndrome patients with pulmonary 1311 involvements Role 1313 of interleukin-32 in cancer biology Plumbing the sources of endogenous MHC class I peptide ligands CrossMap: a versatile 1317 tool for coordinate conversion between genome assemblies A high-1319 performance computing toolset for relatedness and principal component analysis of SNP data We tested for an enrichment of eGenes among the genes identified as popDE genes 846 within each cell type and condition. For each cell type, condition pair, we created two vectors: i) 847 a popDE gene vector, where significant popDE genes (lfsr < 0.1) were coded as a 1 and non-848 significant popDE genes were coded as a 0, and ii) an eGene vector, where significant eGenes 849 (lfsr < 0.10) were coded as a 1 and non-significant eGenes were coded as a 0. The logistic 850 regression was performed on the popDE gene and eGene vectors using glm in R, where the 851 eGene vector was used as the predictor variable and the popDE gene vector was used as the 852 response variable (popDE[0,1] ~ eGene[0,1]). The odds ratios output by glm were converted to 853 log2 fold enrichments with a 95% confidence interval (plotted along the x-axis in Fig. 3C ). 854 855 We estimated the predicted cis-genetic population differences in gene expression using a 857 method in which we first computed the predicted expression of each gene considering only the 858 posterior effect size of the top cis SNP for that gene and an individual's genotype dosage (a vector 859 of 0, 1, or 2), where, for gene i, individual j: 860 predicted expression i = eQTL effect size i * genotype j 861We then modeled these predicted expression values using a model analogous to that of M 2 862 (model evaluating the popDE effects, "Modeling genetic ancestry effects and integration with 863 mashr") to obtain the predicted genetic ancestry effects (plotted on the x-axis for genetically-864 driven popDE genes in Fig. 3D ). The observed population differences in expression were taken 865 directly from the post-mash beta estimates of M 2 (plotted on the y-axis for genetically-driven 866 popDE genes in Fig. 3D ). 867 highest |iHS| value among this SNP set. We then combined our simulated independent and linked 998SNPs and calculated the proportion of those SNPs with an |iHS| > 95 th percentile and considered 999 this our "null percentage". To calculate a p-value, we evaluated the number of permutations in 1000 which the null percentage was greater than or equal to the observed percentage divided by the 1001 number of total permutations (n = 1,000). All of the above analyses were performed twice, once