key: cord-0255773-rewebngu authors: Kumasaka, Natsuhiko; Rostom, Raghd; Huang, Ni; Polanski, Krzysztof; Meyer, Kerstin B.; Patel, Sharad; Boyd, Rachel; Gomez, Celine; Barnett, Sam N.; Panousis, Nikolaos I; Schwartzentruber, Jeremy; Ghoussaini, Maya; Lyons, Paul A.; Calero-Nieto, Fernando J.; Göttgens, Berthold; Barnes, Josephine L.; Worlock, Kaylee B.; Yoshida, Masahiro; Nikolic, Marko Z.; Stephenson, Emily; Reynolds, Gary; Haniffa, Muzlifah; Marioni, John; Stegle, Oliver; Hagai, Tzachi; Teichmann, Sarah A. title: Mapping interindividual dynamics of innate immune response at single-cell resolution date: 2021-09-01 journal: bioRxiv DOI: 10.1101/2021.09.01.457774 sha: 1cb4215aedbed1db827b6c564d7924f2bdb1d288 doc_id: 255773 cord_uid: rewebngu Common genetic variants modulate the cellular response to viruses and are implicated in a range of immune pathologies, including infectious and autoimmune diseases. The transcriptional antiviral response is known to vary between infected cells from a single individual, yet how genetic variants across individuals modulate the antiviral response (and its cell-to-cell variability) is not well understood. Here, we triggered the antiviral response in human fibroblasts from 68 healthy donors, and profiled tens of thousands of cells using single-cell RNA-seq. We developed GASPACHO (GAuSsian Processes for Association mapping leveraging Cell HeterOgeneity), the first statistical approach designed to identify dynamic eQTLs across a transcriptional trajectory of cell populations, without aggregating single-cell data into pseudo-bulk. This allows us to uncover the underlying architecture and variability of antiviral response across responding cells, and to identify more than two thousands eQTLs modulating the dynamic changes during this response. Many of these eQTLs colocalise with risk loci identified in GWAS of infectious and autoimmune diseases. As a case study, we focus on a COVID-19 susceptibility locus, colocalised with the antiviral OAS1 splicing QTL. We validated it in blood cells from a patient cohort and in the infected nasal cells of a patient with the risk allele, demonstrating the utility of GASPACHO to fine-map and functionally characterise a genetic locus. In summary, our novel analytical approach provides a new framework for delineation of the genetic variants that shape a wide spectrum of transcriptional responses at single-cell resolution. The innate immune response is a cell-autonomous program that induces an antiviral state in infected and nearby cells, and alerts the immune system of the invading pathogen 1 . Dysregulation of this response can affect a wide range of inflammatory and autoimmune diseases and determine the outcome of infection [2] [3] [4] [5] [6] . Common genetic variants have been shown to modulate transcriptional responses to various viral and bacterial stimuli, and to contribute to disease onset and progression [7] [8] [9] [10] [11] . Most past gene expression-focused studies of this program are based on bulk RNA-sequencing technologies, which do not fully elucidate the continuous dynamics of transcriptional changes during the innate immune response. Single-cell genomic technologies are powerful approaches to study cell heterogeneity and transcriptional variability across cells 12 . Furthermore, by utilising single-cell RNA-seq (scRNA-seq) profiling of tissues composed of several cell lineages, previous studies have successfully performed genetic association mapping of cell-type specific expression [13] [14] [15] [16] . We here use full-length scRNA-seq of dermal fibroblasts from different human individuals, challenged with immune stimuli. Based on the pseudo-temporal reconstruction of this data, we map the transcriptional variation of the innate immune response at single-cell resolution. This provides the foundation for superimposing human genetic variation onto the transcriptional dynamics of this response. To this end, we develop a novel statistical approach based on a Gaussian process latent variable model 17, 18 called GASPACHO (GAuSsian Processes for Association mapping leveraging Cell HeterOgeneity). This allows us to identify expression quantitative trait loci (eQTL) that manifest at different stages of the response to stimuli. We find several thousand eQTLs, hundreds of which colocalise with known risk loci of diverse autoimmune and infectious diseases. We perform fine-mapping of the OAS locus, associated with COVID-19, to reveal the imbalanced expression of OAS1 and OAS3 genes during the antiviral innate immune response. We further integrate this data with eQTLs from a COVID-19 patient cohort dataset of Peripheral Blood Mononuclear Cell scRNA-seq, as well as with scRNA-seq data of infected nasal epithelial cells from a COVID-19 patient. Overall, our study illustrates how coupling single-cell transcriptomics with a cutting-edge statistical approach can identify dynamic effects of human trait-associated genetic variants in different contexts of activation of antiviral innate immunity and, in general, in diverse cellular dynamic processes. To study the innate immune expression program that is triggered upon viral infection, we exposed primary dermal fibroblasts from 68 donors from the HipSci 19 to two stimulants: (1) poly(I:C) -a synthetic dsRNA that is rapidly recognized by viral sensors and elicits primary antiviral and inflammatory responses, and (2) Interferon-beta (IFNB), a cytokine that upregulates a secondary wave of response in both infected and bystander cells, and shifts the cells into an antiviral mode, where hundreds of Interferon Stimulated Genes (ISGs) are upregulated in order to contain the infection. We collected cells exposed to each of the two stimuli after 2 and 6 hours of stimulation (Fig 1a) . Following this, single-cell RNA sequencing (scRNA-seq) profiling was performed using a plate-based full-length transcript approach (Online Methods). After QC, 22, 188 high-quality cells were obtained across 128 plates, with each plate containing cells from three donors. The donor identity for each cell was inferred from scRNA-seq read data using known genotypes made available by the HipSci consortium (Extended Data Fig. 1a ; Online Methods). Preliminary analysis showed that our data displays high cell-to-cell variability in gene expression both within and across donors, as observed in previous studies by us and others [20] [21] [22] . In fact, our data was confounded by various technical and biological factors, including library preparation in different batches, and cell cycle effects (Extended Data Fig. 1b) . The complex nature of this data, along with its confounders, motivated us to develop a new approach that reveals the genetic and physiologically-relevant variation, while computationally masking confounding factors. Single-cell transcriptomics (as compared to bulk) enables us to uncover hidden states of complex biological processes, while also requiring regression of technical effects and biological variation that is not of interest (e.g. proliferation). We developed GASPACHO (GAuSsian Processes for Association mapping leveraging Cell HeterOgeneity), which utilises a Gaussian Process latent variable model (GPLVM) to uncover the dynamic cell states of interest, while adjusting periodic cell cycle variation and both known and unknown technical variations simultaneously ( Fig. 1b; Methods) . The use of GPLVM allows us to capture smooth and continuous non-linear trends in gene expression along the latent variables, for which other methods such as the standard linear PCA will not work well. Although there are other models that utilise GPLVM to study single-cell dynamics 23, 24 , the novel aspect of our GPLVM approach is that it explicitly takes into account the donor variation as well as other known confounding effects (such as technical batches) as additional random effect terms ( Fig. 1b; Methods) . These confounders are known to inflate type-I error in downstream analyses, such as in differential expression 25 , leading to false discovery of differentially expressed genes. As detailed below, the model output not only enabled us to look at the architecture of the antiviral response in the cell-state space, but also provided a rigorous statistical framework of (1) spatial differential expression (DE) analysis and (2) genetic association mapping using genotype data obtained from the donor of origin for each cell. Specifically, the gene expression variation in the target cell-state space was inferred by a Gaussian Process (GP) mixture model in which an additional GP component is introduced into the model to capture hidden spatial DE patterns 26 of gene expression in the latent space ( Fig. 1b; Methods) . The genetic association mapping was also carried out by using a GP regression model in which the effect size of a quantitative trait locus (QTL) was modeled as a GP in the target cell-state space. Here, the additional GP was multiplied by the genotype dosage (the number of alternative alleles for each donor) to capture the gene-environment interaction 27 (Fig. 1b; Methods) . Importantly, the eQTL effect is obtained at single-cell resolution, and the model does not require aggregation of single-cell data into pseudo-bulk data, which is a common eQTL mapping strategy. Thus, we can study the effect of genetic variants without losing the continuum of transcriptional dynamics and its spectrum across individual cells. We have implemented the software in R, which is available from github (https://github.com/natsuhiko/GASPACHO). We first applied the GPLVM to adjust for the cell cycle and unknown batch effects in our data (Extended Data Fig. 1c-d) and successfully extracted the innate immune state embedded in the data (Fig. 2a) . We also confirmed that the extracted immune state was orthogonal to cell cycle or the unknown batch variations (Extended Data Fig. 1e ). We observed two major cell trajectories: one for response to IFN-b from the naive state (x-axis) and the other for response to Poly (I:C) (y-axis). We then applied the GP mixture model which revealed two independent innate immune responses, the primary response by virus infection and the secondary response for bystander cells due to IFN-b secretion by the infected cells or direct IFN-b stimulation ( Fig. 2b; Extended Data Fig. 2a) . Those responses were highly overlapping on the UMAP suggesting those two processes are independently and simultaneously happening in each cell. Interestingly, the primary response was also correlated with the predicted cell viability by CEVIChE ( Fig. 2c ; Online Methods). In total, the GP mixture model discovered 903 and 636 genes upregulated during the primary and secondary responses respectively (hereafter referred to as primary response genes and secondary response genes), while 1,020 genes were expressed uniformly across all cells in different experimental conditions (referred to as stationary genes) ( Fig. 2d; Extended Data Fig. 2b ). Many cytokine and chemokine genes were upregulated along the primary response, while interferon stimulated genes (IGSs) were upregulated along the secondary response (Fig. 2e ). The GO enrichment analysis for the primary and secondary genes clearly demonstrated primary response genes are enriched for cell death and inflammatory response, while secondary response genes are enriched for type I interferon response ( Fig. 2f; Extended Data Fig. 2c) . Dynamic genetic e ect on transcriptional response to innate immune stimuli We then mapped expression quantitative trait loci (eQTLs) along innate immune response using the GP regression model to assess genetic association in single cell resolution (Online Methods). We discovered 2,662 eQTL genes (local FDR 10%) among 10,748 genes expressed in, at least, 10% of total cells. In order to demonstrate that we chose a sensible approach, we first compared our GP approach with pseudo bulk approaches and the standard marginal effect approach to confirm that our approach provides both the highest sensitivity and specificity for mapped eQTLs (Extended Data Fig. 3a-b ; Online Methods). Among our eQTLs, we found 16% and 13% are primary and secondary response genes ( Fig. 3a, Extended Data Fig. 3c) . Those genes are strongly enriched with the discovered eQTLs ( Fig. 3a, Extended Data Fig. 3d ). We also found the primary response eQTLs are depleted in GTEx fibroblast eQTLs while the stationary eQTLs are enriched (Fig. 3a, Extended Data Fig. 3e-f ; Online Methods), suggesting our eQTLs are highly context specific. As an example, the CXCL1 gene is a primary response eQTL mostly expressed in later time points of Poly (I:C) stimulated cells (Fig. 3b ) and the expression level is higher for 6 the alternative allele T at rs1358594 compared with the reference allele G (Fig. 3c) . This eQTL signal was discovered more than 100Kb downstream of the gene TSS (transcription start site) and only present upon cell stimulation by Poly (I:C), but not in the naive condition as clearly shown in GTEx fibroblast eQTL data (Fig. 3d) . Note however that this eQTL was discovered in eQTLGen data with tens of thousands of blood samples (Extended Data Fig. 3g ). This might suggest that the eQTL signal is present in-vivo, although the effect size could be too small to discover with only hundreds of GTEx samples. We have also performed fine-mapping using epigenetic data (histone modification ChIP-seq of active promoters and enhancers) originating in dsRNA-stimulation of human dermal fibroblasts 22 (Methods). We identified more than 10% of putative causal eQTL variants which are found in each ChIP-seq peak and enriched for promoter peaks of Poly (I:C) stimulated cells characterised by H3K4me3 antibody (Fig. 3e) . Note that our eQTLs were also strongly enriched around TSS, thereby the number of eQTLs was reduced by 34% every 100Kb further away from TSS (Fig. 3e) . We next tested whether promoter architecture affects the variability between individuals. It was previously shown by us and others 22,28 that genes containing TATA-boxes in their promoters tend to vary more in transcription between species, conditions and between individual cells responding to immune stimulus, whereas promoters containing CpG-Islands (CGIs) tend to vary less and be transcriptionally more homogenous. We observe that genes with TATA-containing promoters are 1.4 times more highly enriched with eQTLs in comparison with genes with CGI-containing promoters ( Fig. 3f) . Using the fine-mapped eQTL variants using ChIP-seq annotations, we finally examined which transcription factor motifs were disrupted by the lead eQTL variants (Methods). We found interferon regulatory factor 1 and 4 (IRF1 and IRF4) as well as REL and ATF4 were significantly enriched (Fig. 3g ). An example of putative TF binding disruption was discovered in RTP4 eQTL, where the alternative allele of a promoter flanking eQTL variant (rs62292793T>A) may disrupt an IRF1 motif that significantly reduces putative TF binding affinity, which subsequently downregulates the RTP4 expression (Extended Data Fig. 3e) . Furthermore, the TATA-motif is also found to be disrupted by eQTL variants, further suggesting the importance of TATA-regulation in modulating the response and its variability among individuals. [mention TATA and Landry's Science 2008 in Discussion] One of the advantages of eQTL mapping is to uncover the target genes and related cell states at each genetic locus implicated by genome-wide association studies (GWAS) of common complex traits. We here tested for colocalization of our eQTLs with risk loci from 701 GWAS with 5 or more genome-wide significant loci, of which 112 were broadly immune related, including autoimmune and chronic inflammatory diseases such as Crohn's disease and infectious diseases such as COVID-19 (Online Methods). We discovered 3,132 unique gene-trait combinations with the posterior probability of a single shared causal variant between an eQTL and a GWAS locus greater than 0.5. The combinations consisted of 495 different GWAS traits and 823 unique genes. We observed an excess of colocalised eQTLs for immune related traits over non-immune traits ( Fig. 4a; P=2.0×10 -5 ; Online Methods), likely reflecting the known involvement of innate immunity in each of the disease pathologies. We discovered 36 primary and secondary response eQTLs that were specifically colocalised with 41 autoimmune and infectious diseases, some of which were colocalised with multiple traits (Fig. 4b) . For example, we detected an eQTL for the ETV7 gene which produces a transcription factor in the ETS family and plays a key role in hematopoiesis 29 . The eQTL was colocalised with rheumatoid arthritis and hayfever, allergic rhinitis or eczema (Fig. 4c) . The gene is an ISG and the expression is upregulated during secondary response (Fig. 4d) . The lead eQTL variant (rs1998266T>C) is shared with the GWAS traits, whose alternative allele C upregulates gene expression in stimulated conditions and also increases the risks of those GWAS traits (Fig. 4d) . The alternative allele C also modifies the binding motif of the transcription factor ATF6 putatively bound at the promoter region of ETV7, thereby potentially increasing the expression level (Extended Data Fig. 4 ). In conjunction with the fibroblast system, we used two additional in vivo systems ( Fig. 5a ) to further finemap the 12q24.13 (OAS) locus which was reported in a genome-wide association study of reported SARS-CoV-2 positive-infected individuals against population controls 30 (index SNP: rs10774671G>A). The locus is colocalised with the OAS1 eQTL in fibroblasts with a posterior probability of 0.89 (Fig. 4b, Fig. 5b ). OAS1 is a secondary response gene, and is highly expressed upon IFN-b (at 2h and later) and Poly (I:C) stimulation (at 6h) (Fig. 5c) . The alternative allele A of rs10774671 down-regulates the expression level (Fig. 5d) . We investigated our recently published PBMC scRNA-seq data 31 Fig. 5b ; Online Methods). As expected, OAS1 is highly expressed as a secondary response gene in PBMCs (Fig. 5e) and we confirmed that OAS1 is also a strong eQTL in PBMCs and colocalises well with the COVID-19 GWAS locus with the posterior probability of 0.99 (Fig. 5b) . The GWAS index variant rs10774671G>A is the lead eQTL variant in PBMCs whose alternative allele A is strongly negatively correlated with OAS1 expression. This is especially clear in CD16+ monocytes, among other immune cell types (Fig. 5f, Extended Data Fig 5a) . The index SNP rs10774671 is known to be a splicing QTL 32 that disrupts the splicing motif right next to the last exon of OAS1 gene (Fig. 5g) . In our fibroblast data, this variant also increased the intron expression between the last two exons and created the three different isoforms, all of which are known to cause impaired OAS1 protein expression 32 . These alternative isoforms are also observed in nasal epithelial cells in the nasal brushing sample of a COVID-19 positive patient with the alternative homozygous genotype (Fig. 5a, h) . The cells from this patient contain SARS-CoV2 viral reads, although the same cells express OAS1 and SARS-CoV2 at relatively low levels 33 . Since the alternative allele A of rs10774671 is also the risk allele in COVID-19 GWAS, this implies that impaired OAS1 RNA, and hence, protein expression may cause dysregulation of SARS-CoV-2 virus RNA degradation and clearance in host cells. In this work we developed GASPACHO -a novel statistical framework that allows for the first time, to infer how genetic variants affect the trajectory of gene expression over a dynamic process such as a stimulation time course across individual cells. Using GASPACHO, we integrated scRNA-seq data from fibroblasts from 68 donors stimulated by innate immune stimuli, and obtained a low dimensional gene expression space representing the response dynamics across stimulated cells. This procedure also provides us with a map of interindividual transcriptional variation at single-cell resolution, which were also enriched for regulatory regions (such as TF binding sites) profiled during fibroblast stimulation 22 . This approach discovered 2,662 eQTL loci, of which 823 were colocalised with one or more GWAS associated loci of autoimmune and infectious diseases including COVID-19 at the OAS locus. In conjunction with the OAS1 eQTL, OAS3 eQTL in fibroblasts was also colocalised with COVID-19 GWAS (PP=0.53) (Fig. 4b, Extended Data Fig. 6a) . Because OAS1 and OAS3 are both interferon stimulated genes, the expression patterns of OAS1 and OAS3 along the innate immune response trajectory are very similar (Fig. 5c, Extended Data Fig. 6b) . However the eQTL effect direction was opposite for the two genes (Fig. 5d The GPLVM implemented in GASPACHO was applicable for more than 20K cells, the current implementation in R is not scalable for hundreds of thousands of cells. In order to overcome this issue, we need a cutting-edge Bayesian inference technique, such as the stochastic variational inference implemented on modern GPU machines. In summary, our study demonstrates how an in vitro system combined with single-cell RNA transcriptomics, allows us to chart the transcriptional landscape of complex innate immune responses. Our single-cell datasets combined with the Gaussian process based approach shed light on the common genetic basis of autoimmune and infectious diseases during this challenging period of the COVID-19 pandemic. The cell viability was predicted by the web based tool called CEVIChE (CEll VIability Calculator from gene Expression; https://saezlab.shinyapps.io/ceviche/). Because the tool is designed for bulk RNA-seq data, we aggregated gene expression levels for neighbouring cells based on the UMAP in Fig. 2a . We constructed 30 × 30 equispaced grids and took geometric means of log CPM values within each grid. We preprocessed the Smart-seq2 data exactly the same as in Kumasaka et al. (REF) . We used demuxlet (https://github.com/statgen/demuxlet) to identify the genetic origin of each cell as well as to remove doublets using the genotype data from HIPSCI (see below). We obtained the SNP genotype data from HipSci 19 . We converted the genome coordinate from hg19 to GRCh38 using CrossMap (version 0.5.2; http://crossmap.sourceforge.net/). The GASPACHO framework incorporated a GPLVM as a core model to estimate the latent variables and model parameters subsequently used in the differential expression analysis and eQTL mapping. We assumed the gene expression vector = ( ; = 1, ..., ) for the gene across cells is independently drawn from where is a baseline GP governed by three different kernel matrices, periodic kernel α matrix for the cell cycle state ( ) and two other squared exponential kernel matrices θ θ and for unknown batch effects (B) and the target cell state (X), respectively. Here is a design matrix for the known covariates, such as donor and sequencing plates (Fig. 1b) , and is a random effect to adjust the known confounding effects whose mean and γ variance were defined by and the diagonal matrix shared across all genes . ζ ∆ = 1, ..., The residual expression was determined by the gene specific residual variance and σ The Bayes factor of genetic association can be obtained by: where we set (see Supplementary Note for more details). δ = 0. 1 We tested genetic variants whose minor allele frequency is greater than 0.05 in 1Mb cis regulatory window centred at each gene TSS. In order to control the false discovery rate in a Bayesian framework, we used the hierarchical model 34 to obtain the posterior probability that a gene is an eQTL as well as the posterior probability that a variant is an eQTL variant within the cis window. The model allows incorporating various genomic annotations in the gene-level and variant-level as demonstrated previously 34 . We used the ChIP-seq peak annotations obtained from Hagai et al. 22 in conjunction with TSS proximity to estimate the contribution of epigenetic information to the eQTL variant discovery. eQTL enrichment in di erentially expressed genes Any enrichment analysis was carried out based on posterior probability that the gene j is an eQTL obtained from the hierarchical model. We then computed a 2×2 ) for , where denotes the probability that the gene j is differentially expressed , = 0, 1 (defined above), denotes the probability that the gene j is an eQTL in our data and (1) not in GTEx fibroblasts, denotes the probability that the gene j is eQTLs in our data (3) and GTEx fibroblasts and not sharing the putative causal variant, and denotes the (4) probability that the gene j is eQTLs in our data and GTEx fibroblasts and colocalised. We used the same pairwise hierarchical model 35 We used the TATA motifs from CIS-BP (Data Availability) and the CpG annotation from UCSC (Data Availability) to annotate genes that have a TATA motif and/or a CpG site 100bp upstream from TSS (we referred to as TATA genes and CpG genes). To perform the hypothesis testing that a TF motif is significantly overlapping with eQTL variants, we set and to be the binary variable whose value is if = =1,..., max { } = 1 the lead eQTL variant l is overlapping with a TF motif; otherwize . We then = 0 computed the 2x2 table to perform the enrichment analysis as described above. We reduced the full GASPACHO approach to accommodate the PBMC single cell data of over 700K cells in a reasonable time scale. The kernel functions used in the model were restricted to the linear kernel without the cyclic kernel for the cell cycle effect. The latent factors were estimated with the covariates of the number of genes expressed, the number of mapped reads, the sequencing center, sex, age, COVID19 status, COVID19 sevierity, patient ID, and the first 3 genotype principal components. The latent factors were then used to define the two GPs The annotation of CpG site was downloaded from the UCSC website Pathogen Recognition by the Innate Immune System Signatures of Environmental Genetic Adaptation Pinpoint Pathogens as the Main Selective Pressure through Human Evolution Gain-of-function mutations in IFIH1 cause a spectrum of human disease phenotypes associated with upregulated type I interferon signaling The contribution of natural selection to present-day susceptibility to chronic inflammatory and autoimmune disease The A946T variant of the RNA sensor IFIH1 mediates an interferon program that limits viral infection but increases the risk for autoimmunity Innate immune activity conditions the effect of regulatory variants upon monocyte gene expression Common genetic variants modulate pathogen-sensing responses in human dendritic cells Shared genetic effects on chromatin and gene expression indicate a role for enhancer priming in immune response Genetic analysis of isoform usage in the human anti-viral response reveals influenza-specific regulation of ERAP2 transcripts under balancing selection Population variation in miRNAs and isomiRs and their impact on human immunity to infection Deep generative modeling for single-cell transcriptomics Single-cell RNA sequencing identifies celltype-specific cis-eQTLs and co-expression QTLs Single-cell RNA-sequencing of differentiating iPS cells reveals dynamic genetic effects on gene expression Population-scale single-cell RNA-seq profiling across dopaminergic neuron differentiation Optimizing expression quantitative trait locus mapping workflows for single-cell studies Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data Gaussian process latent variable model Common genetic variation drives molecular heterogeneity in human iPSCs Single-cell RNA-seq reveals dynamic paracrine control of cellular variation Extreme heterogeneity of influenza virus infection in single cells Gene expression variability across cells and species shapes innate immunity Pseudotime estimation: deconfounding single cell time series GrandPrix: scaling up the Bayesian GPLVM for single-cell data A practical solution to pseudoreplication bias in single-cell studies SpatialDE: identification of spatially variable genes A linear mixed-model approach to study multivariate gene-environment interactions Conservation and divergence in Toll-like receptor 4-regulated gene expression in primary human versus mouse macrophages The ETS factor TEL2 is a hematopoietic oncoprotein Mapping the human genetic architecture of COVID-19 Single-cell multi-omics analysis of the immune response in putative causal interactions between regions of open chromatin A map of transcriptional heterogeneity and regulatory variation in human microglia Barplot shows the number of cells for each donor (cell line). b. UMAP calculated from the first XXX principal components from the data. Points are coloured by unknown batch effect (well correlated with experimental date), cell cycle phase estimated from known marker genes (Methods) and experimental conditions. c. Histogram shows the distribution of estimated cell cycle phase by GPLVM (Methods). d. Scatterplots show scaled expression of known cell cycle genes (UBE2C and CDC6) and a gene highly Differential expression result from GPMM. a. Seven different mixture components estimated from the GP mixture model (Methods). b. The number of genes categorized in each of the seven components. c. Top GO terms enriched with the genes Extended Data Figure 4. Fine-mapping of ETV7 eQTL. a. ATF6 expression on the UMAP ATF motif (M6155_1.02; CIS-BP version 1.02). The nucleotide C coloured by red indicates the location of the eQTL variant rs1998266T>C. c. Locus zoom plots of hayfever, allergic rhinitis or eczema Extended Data Figure 6. Association and effect directions of OAS3 eQTL in fibroblasts and colocalisation with COVID-19 GWAS. a. UMAP shows the OAS3 gene expression. b. UMAP shows the eQTL effect size at rs10774671. c. Locus zoom plot shows the COVID-19 both in fibroblasts) associations. d. Effect direction of OAS1/3 eQTLs and the risk allele of COVID-19 GWAS at the lead variant rs10774671 Extended Data Figure 7. Interferon receptor gene expression. a. UMAP shows interferon alpha receptor 1 (IFNAR1) gene expression. b. UMAP shows interferon alpha receptor 2 (IFNAR2) gene expression. c. UMAP shows interferon gamma receptor 1 from Genentech, Biogen, Roche and GlaxoSmithKline, as well as Foresite Labs over the past 3 years. All other authors declare no competing interests.