key: cord-0259577-g10tb4em authors: Korsunsky, Ilya; Wei, Kevin; Pohin, Mathilde; Kim, Edy Y.; Barone, Francesca; Kang, Joyce B.; Friedrich, Matthias; Turner, Jason; Nayar, Saba; Fisher, Benjamin A.; Raza, Karim; Marshall, Jennifer L.; Croft, Adam P.; Sholl, Lynette M.; Vivero, Marina; Rosas, Ivan O.; Bowman, Simon J.; Coles, Mark; Frei, Andreas P.; Lassen, Kara; Filer, Andrew; Powrie, Fiona; Buckley, Christopher D.; Brenner, Michael B.; Raychaudhuri, Soumya title: Cross-tissue, single-cell stromal atlas identifies shared pathological fibroblast phenotypes in four chronic inflammatory diseases date: 2021-02-18 journal: bioRxiv DOI: 10.1101/2021.01.11.426253 sha: d618dcfd9bd4a547e8056349b47d2f2d2bfa1d0a doc_id: 259577 cord_uid: g10tb4em Pro-inflammatory fibroblasts are critical to pathogenesis in rheumatoid arthritis, inflammatory bowel disease, interstitial lung disease, and Sjögren’s syndrome, and represent a novel therapeutic target for chronic inflammatory disease. However, the heterogeneity of fibroblast phenotypes, exacerbated by the lack of a common cross-tissue taxonomy, has limited the understanding of which pathways are shared by multiple diseases. To investigate, we profiled patient-derived fibroblasts from inflamed and non-inflamed synovium, intestine, lung, and salivary glands with single-cell RNA-sequencing. We integrated all fibroblasts into a multi-tissue atlas to characterize shared and tissue-specific phenotypes. Two shared clusters, CXCL10+CCL19+ immune-interacting and SPARC+COL3A1+ vascular-interacting fibroblasts were expanded in all inflamed tissues and additionally mapped to dermal analogues in a public atopic dermatitis atlas. We further confirmed these human pro-inflammatory fibroblasts in animal models of lung, joint, and intestinal inflammation. This work represents the first cross-tissue, single-cell fibroblast atlas revealing shared pathogenic activation states across four chronic inflammatory diseases. Fibroblasts are present in all tissues and adopt specialized phenotypes and activation states to perform both 48 essential functions in development, wound-healing, and maintenance of tissue architecture, as well as 49 pathological functions such as tissue inflammation, fibrosis, and cancer responses (Koliaraki et al., 2020) . (Friedrich et al., 2020) . Lung investigations identified that COL3A1 + ACTA2 + myofibroblasts, PLIN2 + 59 lipofibroblast-like cells, and FBN1 + HAS1 + fibroblasts are expanded in lung biopsies from patients with idiopathic Fibroblast heterogeneity within tissues. We next examined the heterogeneity of fibroblast cell states within individual tissues. We performed a separate Table 5 ). In the intestine, we were able to recapitulate 7 of 8 139 populations identified in (Smillie et al., 2019) : crypt-associated WNT2B + Fos hi and WNT2B + Fos lo , epithelial-140 supportive WNT5B + -1 and WNT5B + -2, stem cell niche supporting RSPO3 + , inflammatory, and myofibroblasts. We note that our data did not support the 2 subtypes of WNT2B + Fos lo fibroblasts identified originally in (Smillie 142 et al., 2019) . In the lung, Habermann et al., 2020 described 4 states: HAS1 + , PLIN2 + , fibroblasts, and 143 myofibroblasts. However, in their analysis, HAS1 + cells were identified in only 1 of 30 donors. When we re-144 analyzed their data to identify clusters shared by multiple donors, we could not distinguish the HAS1 + from 145 PLIN2 + population and thus merged these two in our annotation. In the salivary gland, the only single-cell study 146 of fibroblasts to date was performed with multi-channel flow cytometry (Nayar et al., 2019) , not scRNAseq. The 147 findings here represent the first set of scRNA-seq data in this context. In our single-cell clusters, we identified 148 the two populations previously described (CD34 + and CCL19 + ) and confirmed the expression of key 149 distinguishing cytokines and morphogens that they measured by qPCR (Supplementary Figure 2b) . In the 150 synovium, we clustered 55,143 fibroblasts into 5 major states described in three scRNAseq studies (Croft et al., Next, we asked whether fibroblast states defined within one tissue shared similar expression 155 profiles with states defined in other tissues. We performed cluster marker analysis within each tissue, 156 quantifying the overexpression of each gene in each cluster in terms of the log2 fold change with other 157 clusters. We plotted 4,897 genes that were overexpressed in at least one cluster and labeled the top 158 3 markers per cluster (Figure 2b) . We noticed that many marker genes were present in clusters from 159 different tissues. To find which pairs of clusters across tissues were most similar, we correlated 160 (differential) expression profiles (Methods) for cross-tissue clusters (Figure 2c) . The most correlated 161 (Pearson = 0.44, = 10 )*+ ) pair of clusters contained CD34 + fibroblasts in the salivary gland and 162 CD34 + sublining (SC-F1) fibroblasts in the synovium (Figure 2d) . Although they shared multiple 163 marker genes (PAMR1, MFAP5, CD34, CD70, DPP4, FABP3, and FNDC1), they also had tissue-164 related, cluster-specific genes (POSTN, RAMP1, PRG4, PI16, and TNMD). The shared markers 165 suggest a shared function. The cluster-specific genes may have arisen from a technical artefact, such 166 as different clustering parameters in the tissue-specific analyses, or from true biological signal, such 167 as a tissue-specific microenvironment. In order to distinguish between the two possibilities, we decided 168 to perform a single integrative clustering analysis with fibroblasts from all tissues. 169 We designed an analytical pipeline for integrative clustering to address the two concerns described 209 above (Figure 3a) . In this pipeline, we select genes that were informative in the tissue-specific analyses 210 (Methods), associated with either cluster identity (Supplementary Table 5 Table 6 , n=6,476) within tissue, for a total of 9,521 unique genes. To minimize the impact of 212 different cell numbers, we performed weighted PCA analysis, giving less weight to cells from over-represented 213 tissues (e.g. synovium) and more to cells from under-represented tissues (e.g. lung), such that the sum of 214 weights from each tissue is equivalent (Methods). Compared to unweighted PCA, this approach results in 215 principal components whose variation is more evenly distributed among tissues (Supplementary Figure 3a) . As expected, in this PCA space, cells group largely by donor and tissue (Supplementary Figure 3b,c) . In order 217 to appropriately align cell types, we removed the effect of donor and tissue from the cells' PCA embedding 218 coordinates with a novel, weighted implementation of the Harmony algorithm that we developed for this specific 219 application (Methods). UMAP visualization of the harmonized embeddings shows that cells from different 220 tissues are well mixed (Figure 3b ). In contrast, fibroblast states identified in tissue-specific analyses are well 221 separated (Supplementary Figure 3d) , suggesting that the integrated embedding faithfully preserves cellular 222 composition. In this integrated space, we performed standard graph-based clustering to partition the cells into C14. Points denote log fold change (cluster vs other fibroblast) and error bars mark the 95% confidence interval for the 266 fold change estimate. (g) The number of shared genes for each cluster, ranked from most to least, prioritizes clusters with 267 large evidence of shared gene expression (in red) from those with little (in black). Marker genes for the 5 shared clusters 268 plotted in a heatmap. Each block represents the (differential) gene expression of a gene (column) in a cluster, for a tissue 269 (row). Identification of shared and tissue-specific marker genes in integrated clusters. Next, we modeled gene expression to define active gene programs in the 14 integrative fibroblast clusters. In 272 particular, we wanted to distinguish between two types of cluster markers: tissue-shared and tissue-specific. Tissue-shared markers are highly expressed in the cluster for all four tissues. Tissue-specific markers are highly 274 expressed in the cluster for at least one tissue but not highly expressed in at least one other tissue. In our 275 expression modeling analysis, we needed to allow for the possibility that tissue gene expression will be 276 consistent in clusters and variable in others (Figure 3d ). As we explain in our approach below, we will use ADAM12 expression in cluster C4 as an example of a tissue-shared gene and MYH11 expression in cluster 278 C13 as an example of a tissue-specific gene. Typically, cluster marker analysis is done with regression, to associate gene expression with cluster 280 identity. To address the complex interaction between cluster and tissue identity in our data, we used mixed-281 effects regression to perform hierarchical cluster marker analysis (Methods). This analysis estimated two sets 282 of differential expression statistics for each gene: mean log2 fold change (e.g. cluster 0 vs all other clusters) and 283 tissue-specific log2 fold change (e.g. cluster 0 in lung vs all other clusters in lung). This approach distinguishes 284 shared marker genes, defined by minimal tissue-specific contributions, from tissue-specific marker genes, 285 defined by large tissue-specific fold changes, relative to the mean fold change. To demonstrate, we plotted the 286 estimated log2 fold changes, with a 95% confidence interval, for one shared (Figure 3e ) and one tissue-specific 287 (Figure 3f ) cluster marker. ADAM12, a shared marker for cluster C4, has significant (log2 fold-change = 1.6, 288 = 6.5 × 10 )+ ) mean differential expression in C4, while the tissue-specific effects (in color) are not significantly 289 different for any one tissue (Figure 3e ). In contrast, MYH11, is differentially overexpressed in cluster C13 for 290 intestinal (log2 fold-change = 3.7, = 8.5 × 10 )23 ) and lung fibroblasts (log2 fold-change = 2.6, = 5.9 × 10 )6 ) 291 but not for synovial or salivary gland cells (Figure 3f) . Because MYH11 is so strongly overexpressed in intestinal 292 and lung fibroblasts, the mean log2 fold-change is also significant (log2 fold-change = 1.7, = 5.7 × 10 )+ ) and 293 therefore is not a good metric alone to determine whether a marker is shared or tissue-specific. We defined tissue-shared cluster markers conservatively by requiring a marker gene to be significantly 295 overexpressed in all four tissues, such as ADAM12 above. With this criterion, we quantified the number of 296 shared marker genes per cluster (Figure 3g ). Clusters C0, C1, C2, C3, C6, C7, C10, C12, and C13 each had 297 fewer than 20 shared markers. Based on this cutoff, we decided that these clusters had too little evidence of 298 shared marker genes to be reliably called shared clusters. We assigned names for the remaining clusters based 299 on their shared gene markers: SPARC + COL3A1 + C4, FBLN1 + C5, PTGS2 + SEMA4A + C8, CD34 + MFAP5 + C9, 300 and CXCL10 + CCL19 + C11. We then plotted the log2 fold change values of all 1,524 shared markers for these 301 clusters in Figure 3h and report the results of the full differential expression analysis in Supplementary Table 302 7. Identification of fibroblast states expanded in inflamed tissue. We next addressed which cross-tissue fibroblast states were expanded in inflamed tissues. In order to perform 305 this association across tissues, we first needed to define a common measure of tissue inflammation. While 306 histology is often the gold standard to assess inflammation, histological features are inherently biased to tissue-307 specific pathology. Instead, we decided to define inflammation in a tissue-agnostic way, as the relative 308 abundance of immune cells in each sample. While immune cell abundance alone oversimplifies complex 309 pathological processes, it is a ubiquitous and quantifiable measure of chronic inflammation. We quantified the 310 fraction of immune cells based on previously labeled scRNAseq clusters (Figure 1b) , for salivary gland and 311 lung samples, and based on the proportion of CD45 + cells by flow cytometry (Supplementary Figure 1a) , for 312 synovium and intestine (Figure 4a ). We note that these estimates are quantified with dissociated cells from 313 cryopreserved tissue (Methods) and thus lack granulocytes, such as neutrophils, which constitute an important 314 part of tissue inflammation. In order to get comparable results across tissues, we standardized the raw tissue- Standardized Score Fraction of fibroblasts e Using these standardized inflammation scores, we performed a separate association analysis with 336 mixed-effects logistic regression for each tissue. This analysis provided, for each tissue and fibroblast state, the 337 effect of increased inflammation on cluster abundance (Figure 4c) . Positive log odds ratios denote expansion 338 with inflammation whereas negative ratios denote a diminishing population. Some clusters, such as C2, C3, C7, 339 PTGS2 + SEMA4A + C8, and C12, were significantly (FDR<5%, red) expanded in only one tissue. Others, such 340 as CXCL10 + CCL19 + C11 and SPARC + COL3A1 + C4, were significantly expanded in multiple tissues. We 341 confirmed that association with normalized inflammation scores did not change the qualitative results within 342 tissue but did make the results more interpretable across tissues (Supplementary Figure 4) . We then 343 performed a meta-analysis of these tissue-specific effects (Methods) to prioritize clusters expanded 344 consistently across all tissues (Figure 4d ). This meta-analysis identified two fibroblast states significantly Together, this suggests that SPARC + COL3A1 + fibroblasts may be driven by conserved developmental Since we have previously identified Notch3 signaling as a key driver in differentiation of disease-associated perivascular fibroblasts in RA synovia (Wei et al., 2020) , we predict this cluster may represent a similar 410 endothelium-driven, activated fibroblast state across inflammatory diseases involving other organ tissues. We Correspondence between fibroblast clusters defined in integrative analysis and single-tissue analyses. We determined how the clusters labeled in the single-tissue analyses ( between DKK3 + and THY1 + sublining fibroblasts in the synovium, mapped exclusively to myofibroblasts in the 435 lung, split between inflammatory fibroblasts and myofibroblasts in the intestine, and mapped to CD34 + 436 fibroblasts in the salivary gland. Notably, none of these associations was one-to-one. HLA-DRA + synovial fibroblasts, CCL19 + salivary gland fibroblasts, and RSPO3+ and WNT2B + Fos hi intestinal fibroblasts mapped to 438 multiple clusters that were expanded in one or more tissues: C3 (lung and synovium), C2 (synovium), C12 (intestine), and C8 (salivary gland and synovium). Similarly, the myofibroblasts in the lung and intestine, as well 440 as DKK3 + synovial fibroblasts mapped to both C13 and to vascular fibroblasts (C4). Cluster C13 In the synovium and intestine, several clusters have previously been shown to be associated with distinct Validation in an alternative tissue: dermal fibroblasts in atopic dermatitis. As a proof of principle, we next explored whether the fibroblast states discovered in the four tissues could 465 generalize to a tissue not explored in this study by examining cells from an independent dataset. We analyzed 466 data from a study (He et We wanted to compare dermal fibroblasts directly to clusters defined in our fibroblast atlas. To do this, we leveraged a novel algorithm, Symphony (Kang et al., 2020) (Methods), designed to quickly and accurately 500 map new scRNAseq profiles into a harmonized atlas to compare them with annotated reference cells. Using 501 Symphony, we mapped dermal fibroblasts into our multi-tissue fibroblast atlas and projected them into the 502 reference UMAP space for visual comparison (Figure 6c) . For quantitative comparison of fibroblast subtypes, we labeled individual dermal fibroblasts by their most similar reference clusters (Figure 6d) . Dermal fibroblasts 504 mapped primarily to all clusters except C6, C12, and C13, three clusters which we identified as more tissue-505 specific (Figure 3g) . We computed marker genes for these clusters in skin (Supplementary Table 9 ) and Cross-species mapping identifies shared fibroblast activation states in disease animal models of pulmonary, 526 synovial, and intestinal inflammation. Next, we tested whether our two shared inflammation associated fibroblast subtypes were identifiable in single-528 cell datasets from mouse models of tissue inflammation. By defining which aspects of fibroblast-driven 529 pathology are reproduced in mouse models, it may be possible to elucidate which pathological processes in 530 murine models best parallel human fibroblast cell states. We found three single-cell RNAseq data sets that Within each study, we identified fibroblasts (6,979 intestinal, 10,320 pulmonary, and 5,704 synovial) with 564 clustering and marker analyses (Supplementary Figure 7a,b) . We then mapped these fibroblasts to our human 565 cross-tissue reference with the Symphony pipeline (Methods) and labeled mouse cells with the most similar 566 reference fibroblast subtypes (Figure 7b) . While most clusters were well-represented across tissues, two 567 appeared more tissue-specific (Supplementary Figure 7c) . Myofibroblast-enriched C13 was mostly absent in 568 synovium, which is known to lack myofibroblasts. Cluster C12, which mapped well to the intestinal WNT5B + 2 569 cluster in our initial analyses (Supplementary Figure 5a) , was enriched in intestinal fibroblasts in this mouse 570 analysis. To test the degree to which gene markers are conserved between mouse and human, we performed 571 cluster marker analysis in the mouse fibroblasts (Supplementary Table 10 ) and compared cluster expression 572 profiles between mouse genes and human orthologs (Supplementary Figure 7d) . Importantly, the most similar 573 gene expression profiles were between corresponding clusters in mouse and human. Moreover, for most 574 clusters, expression profiles were even more similar between matched tissues. We next asked whether the same fibroblast subtypes were expanded in inflamed tissues in human 576 disease and mouse models. Thus, we performed differential abundance analysis within each mouse dataset 577 (Supplementary Figure 7e) , comparing inflamed cases to matched controls (Methods) to determine whether 578 the SPARC + COL3A1 + and CXCL10 + CCL19 + populations expanded in human tissues were also expanded in 579 mouse models (Figure 7c) . In bleomycin treated lungs, the most expanded populations were SPARC + COL3A1 + Temporal ordering of C4 and C11 activation in DSS-induced colitis. We were surprised that SPARC + COL3A1 + fibroblasts were not significantly expanded in a DSS-induced colitis cross-tissue analysis, we were able to identify fibroblast phenotypes that were shared by all tissues as well as 627 fibroblast adaptations unique to a subset of tissues. The lack of universal definitions for key concepts such as fibroblast identity and inflammation scoring 629 that apply equally well to all tissues presented a major challenge to our effort to associate fibroblast phenotypes 630 with inflammation. In particular, the lack of a universal, pan-fibroblast surface marker prevented us from directly 631 isolating fibroblasts with flow cytometry. We addressed this problem with negative selection, using specific 632 markers to filter out non-fibroblast populations, and thus defining fibroblasts based on high-dimensional single-633 cell-RNA-seq data as non-epithelial, non-immune, non-endothelial, and non-mural cells with some known 634 tissue-specific fibroblast markers, such as PDPN, PDGFRA, and COL1A1. The lack of a quantifiable score for 635 inflammation impeded us from directly using standard tools from meta-analysis, which assume a standardized 636 phenotype that can be measured equally well across all organ tissues. Inflammation in each disease is defined 637 by disease-specific pathological processes, reflected in tissue-specific histological scores, such as the Krenn The complexity of our study design, with cells measured from multiple donors, tissues, and diseases, 650 presented a second major challenge to our study. Algorithms to identify shared clusters in scRNAseq datasets Based on marker gene profiles, we believe that some of the clusters named in our analysis have been In interpreting clusters with more tissue-specific than tissue-shared genes, we noticed that tissue- One highly correlated pair of clusters from salivary gland (x-axis) and synovium (y-axis) represented by scatter 754 plots of (differential) gene expression. Blue genes are shared by the two clusters, while red genes are unique 755 to one cluster. is also reported in Table 1 . All OASIS participants provided written informed consent and the study was Cryotube were quickly thawed in water bath at 37°C and washed twice in pre-warmed 5%FBS RPMI media. The salivary gland biopsies were then enzymatically digested as previously described (PMID: 31213547). Dead 968 cells were removed using the EasySepTM Dead Cell Removal (Annexin V) kit from the digested samples 969 following manufacturer's instructions before proceeding for the scRNA sequencing using the 10x platform. Inflammation score normalization across tissues. Inflammation scores computed within each tissue had ranges and distributions. To be able to compare 004 inflammation associated phenotypes across tissues, we normalized the distributions by performing quantile 005 normalization. Because the number of samples was relatively small, we did not use an empirical distribution. Instead, we normalized to the quantiles of a parametric distribution. We chose the beta distribution ( = 3, = 007 3) to map the scores to an interpretable interval, between 0 (low inflammation) and 1 (high inflammation). Gene selection For analyses with one tissue, we used the VST method for variable gene selection, reimplemented from the 010 Seurat package (Butler et al., 2018) as a stand along function in our github at 011 immunogenomics/singlecellmethods. We used default parameters and kept the top 2000 genes, ranked by 012 standardized variance. For the multi-tissue integrated analysis, we used genes that we found informative in at least one of the tissue-specific analyses of lung, salivary gland, intestine, and synovium. We defined informative 014 genes with two analyses. The first analysis is differential expression of cluster-markers for tissue-specific 015 fibroblast subtypes (Figure 4a) . We kept cluster-informative genes with < 0.05 and | | ≥ 0.5. The second 016 analysis found broadly inflammation associated genes by fitting a Poisson log-normal GLMMs to each gene. We kept inflammation associated genes with < 0.05 and | | ≥ 0.1. Weighted PCA. We implemented principle components analysis that gives equal weight to each tissue while preserving the total 020 cell number (∑ U U = ). The weights given to each cell were determined to meet this equal weight condition. These weights were then used in the scaling and SVD steps. For scaling, we computed weighted means and 022 variance with the following formulas: . For SVD, we modified the PCA 023 covariance decomposition formula to allow for observation weights with a diagonal matrix : m = m . This decomposition is achieved by performing SVD on the weighted matrix 2/* = m . Because is 025 diagonal, its square root is the element-wise square root. This SVD solution now represents the original data 026 as = m )2/* , with gene loadings and cell embeddings m )2/* . Weighted PCA is implemented on 027 our github at immunogenomics/singlecellmethods with the weighted_pca function. Weighted Harmony. We modified the Harmony algorithm to include observation weights. To achieve this, we modified the clustering We used the UMAP algorithm to visualize cells in two dimensional embeddings. We used the uwot R package 037 (Melville, 2020) Clustering. We performed graph based clustering with the Louvain algorithm (Blondel et al., 2008) , implemented in Seurat 045 (Butler et al., 2018) . Instead of constructing the kNN and sNN graphs from scratch, we used the uniform manifold 046 graph estimated in the UMAP algorithm. In the uwot package (Melville, 2020) , this data structure is directly 047 available in the fgraph field when umap is run with option ret_extra = c('fgraph'). Hierarchical gene expression modeling. Statistical model. We modeled the expression of each gene using Poisson lognormal GLMM regression. This 050 framework allows us to model the hierarchical design in our multi-tissue, multi-donor dataset. We fit the following 051 GLMM for the integrated, multi-tissue analysis, regressing to the frequency of gene in observation . log TU~‚ + ƒ"… † ‡ˆ‰ + Š‹OE‹‰ + Š‹OE‹‰:ƒ"… † ‡ˆ‰ + mU † †…ˆ+ mU † †…ˆ:ƒ"… † ‡ˆ‰ + (log ' "U " ) We chose to model the cluster interaction terms with donor and tissue. As many papers have observed There is generally no analytical way to compute Σ for random 069 effects, so we estimate it with simulation, using the arm R package(Gelman and Su, 2020), with 1000 070 simulations. Tissue-specific cluster effects take into account both the cluster and tissue-cluster interaction term. For instance, if we wanted to know how a gene is associated with cluster 3 in the lung, we would compute we found it difficult to tie model fitting and simulation seamlessly with differential expression analysis. For 080 instance, building contrasts for nested effects and estimating significance for multiple gene queries was difficult 081 to do. Moreover, the memory footprint of lme4 models makes it impractical to fit and save models for 1000s of 082 genes for downstream inference. To make lme4 and arm more accessible for gene expression analysis, we 083 created the Presto package. Presto extracts the necessary components from lme4 models, saves them in 084 efficient data structures, and has all necessary functions to do efficient contrast analysis for differential 085 expression. We made Presto available as an R package, available on github at immunogenomics/presto under 086 the GLMM branch. To make the models more numerically stable, we enforced a minimum value for the size of random For our analyses, we consider significance with respect to these shrunken p values, estimated with random 095 effects, without doing additional post hoc shrinkage. We made two decisions to make Presto scale to large datasets. First, we fit the model with pseudobulk, 097 rather than single-cell RNAseq profiles. Note that in the formula above, the cluster, tissue, and donor covariates 098 are not unique to single cells. Therefore, we collapse reads from cells with same cluster, donor, and tissue 099 identity into one observation. This approach has strong precedent (Lun and Marioni, 2017) . It is important to 100 note that in this strategy, the number of parameters to estimate is equal to the number of observations. With 101 fixed effects, this model is under-determined. However, because we shrink estimates to 0 with Gaussian priors, 102 the effective number of independent parameters shrinks too. The second decision is with the choice of 103 generative model. Many RNAseq differential expression tools used the Negative Binomial distribution, which 104 uses Gamma rather than lognormal priors to model over-dispersion. For completeness, we also included 105 negative binomial GLMMs in Presto. In practice, we found that this error model yielded almost identical results but took ten times longer to run. Tissue heterogeneity. We took a very simple approach to labeling genes as conserved or heterogeneous 108 cluster makers. Conserved markers were significantly ( < 0.05) overexpressed ( > 0) in all four tissues. If a 109 gene was not upregulated in at least one tissue, we considered it to be a heterogeneous marker. Effect 110 heterogeneity has a rich statistical treatment, especially in meta-analysis. We decided to not use these more 111 sophisticated techniques, although the parameters learned in Presto could be used for such analyses. Analyses. To find marker genes for dermal fibroblasts, we fit the same model as above but omitted the Tissue 113 terms: log TU~‚ + ƒ"… † ‡ˆ‰ + Š‹OE‹‰ + Š‹OE‹‰:ƒ"… † ‡ˆ‰ + (log ∑ "U " ). For the mouse scRNAseq analyses, we used the same hierarchical formula with all Tissue terms. Pathway analysis. All formal geneset enrichment was done with the GSEA algorithm, implemented in the fgsea R 117 package (Sergushichev, 2016) . To enrich pathways for marker analyses (Figure 5d) We thank David Lee for having the vision and organizing this Roche network to study stromal biology across 720 tissues. K.W. is supported by a NIH-NIAMS have received support from the National 722 Institute for Health Research (NIHR) Birmingham Biomedical Research Centre and the NIHR/Wellcome Trust 723 is supported by funding from the National Institutes of Health 724 (U19AI111224, U01 HG009379, and R01AI049313) References 154 Single-cell RNA-seq reveals ectopic and aberrant lung-resident cell 156 populations in idiopathic pulmonary fibrosis Projecting 158 single-cell transcriptomics data onto a reference T cell atlas to interpret immune responses Pericytes: developmental, physiological, and pathological 160 perspectives, problems, and promises Gene ontology: tool for the unification of biology. The Gene Ontology 163 Consortium Fitting Linear Mixed-Effects Models Using lme4 Patients with inflammatory bowel disease (IBD) reveal increased induction capacity of intracellular 168 interferon-gamma (IFN-gamma) in peripheral CD8+ lymphocytes co-cultured with intestinal epithelial cells Fast Unfolding of Communities in Near-optimal probabilistic RNA-seq 173 quantification Interleukin-2-175 and interferon-gamma-secreting T cells in normal and diseased human intestinal mucosa Integrating single-cell transcriptomic 178 data across different conditions, technologies, and species Diversity, topographic differentiation, and positional memory in human fibroblasts Distinct fibroblast subsets drive inflammation and damage in arthritis Conserved transcriptomic profile between mouse and human colitis allows unsupervised patient stratification Pathogenic stromal 189 cells as therapeutic targets in joint inflammation Meta-analysis in clinical trials msigdbr: MSigDB gene sets for multiple organisms in a tidy data format Methods for high-dimensional analysis of cells dissociated from 194 cryopreserved synovial tissue Mixed-effects association of single cells identifies an 197 expanded effector CD4+ T cell subset in rheumatoid arthritis GENCODE reference annotation for the human and mouse genomes IL-1-driven stromal-neutrophil interaction in deep ulcers 202 identifies a pathotype of therapy non-responsive inflammatory bowel disease arm: Data Analysis Using Regression and Multilevel/Hierarchical Models Why We (Usually) Don't Have to Worry About Multiple 205 Comparisons scDblFinder: scDblFinder Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: 208 multitissue gene regulation in humans Single-cell RNA sequencing reveals profibrotic roles of distinct epithelial 211 and mesenchymal lineages in pulmonary fibrosis Batch effects in single-cell RNA-213 sequencing data are corrected by matching mutual nearest neighbors Single-cell transcriptome analysis of human skin identifies novel fibroblast 216 subpopulation and enrichment of immune subsets in atopic dermatitis Mucosal Profiling of Pediatric-Onset Colitis and IBD Reveals Common Pathogenics and Therapeutic 220 Pathways Efficient and precise single-cell reference atlas mapping with Symphony Structural Remodeling of the Human Colonic The mesenchymal context in inflammation, 227 immunity and cancer Fast, sensitive and accurate integration of single-cell data with Harmony Synovitis score: discrimination between chronic low-grade and high-grade synovitis Molecular signatures database (MSigDB) 3.0 Query to reference single-cell integration with transfer learning Overcoming confounding plate effects in differential expression 239 analyses of single-cell RNA-seq data Mediterranean diet and risk of Sjögren's syndrome Development and validation of the Nancy histological 244 index for UC Single-Cell Analysis of Crohn's Disease Lesions Identifies a Pathogenic 247 The barcode, UMI, set format and BUStools uwot: The Uniform Manifold Approximation and Projection (UMAP) Method for 251 Dimensionality Reduction Functionally distinct disease-associated fibroblast subsets in 254 rheumatoid arthritis The K/BxN mouse model of inflammatory arthritis: theory and practice Immunofibroblasts are pivotal drivers of tertiary lymphoid structure 260 formation and local pathology Fibroblast-specific IL11 signaling drives chronic inflammation in murine fibrotic lung disease Autocrine Loop Involving IL-6 Family Member LIF, LIF Receptor, and STAT4 Drives Sustained Fibroblast 266 Production of Inflammatory Mediators A Single-Cell Tumor Immune Atlas for Precision Oncology Angiocrine functions of organ-specific endothelial cells A draft network of ligand-receptor-mediated multicellular signalling in 273 human A review on Eph/ephrin, angiogenesis and lymphangiogenesis in gastric, colorectal and 276 pancreatic cancers Evaluation of GRCh38 and de novo haploid genome 279 assemblies demonstrates the enduring quality of the reference assembly An algorithm for fast preranked gene set enrichment analysis using cumulative 281 statistic calculation CUX1 and IκBζ mediate the synergistic inflammatory response 284 to TNF and IL-17A in stromal fibroblasts Intra-and Inter-cellular Rewiring of the Human Colon during 287 Single-cell transcriptomics of human T cells reveals tissue and activation signatures 290 in health and disease A benchmark of 292 batch-effect correction methods for single-cell RNA sequencing data Collagen-producing lung cell atlas identifies multiple 295 subsets with distinct localization and relevance to fibrosis Methods to estimate the between-study variance and its uncertainty in 298 meta-analysis Midkine acts as proangiogenic cytokine in hypoxia-301 induced angiogenesis Notch signalling drives synovial fibroblast identity and arthritis pathology Coordination of Immune-Stroma Crosstalk by IL-6 Family Cytokines Oncostatin M drives intestinal inflammation and predicts response to tumor 309 necrosis factor-neutralizing therapy in patients with inflammatory bowel disease Defining inflammatory cell states in rheumatoid arthritis joint 312 synovial tissues by integrating single-cell transcriptomics and mass cytometry IFN-γ and TNF-α drive a CXCL10 + CCL2 + macrophage phenotype expanded in 315 severe COVID-19 and other diseases with tissue inflammation Massively parallel digital transcriptional profiling of single cells We associated inflammation score with cluster abundance using logistic regression, following the MASC method 123 (Fonseka et al., 2018) , with the following formula: log Ÿ (ƒ"… † ‡ˆ‰•v) Ÿ (ƒ"… † ‡ˆ‰¡v)~1 + + (1| ) + ( + 124 | ). As in MASC, the response variable models the log odds of being in cluster vs not, to test for 125 which factors contribute to cluster abundance. This probability is a function of (1) an intercept, which reflects 126 the average abundance of cluster in the data, (2) fixed effect for , the normalized inflammation score for Cluster correspondence analysis. To compare the co-occurrence of the fibroblast cluster labels, within-tissue (Figure 3 ) and integrative ( Figure 136 4), we used a similar framework to abundance modeling above. We used the following formula: , a categorical variables that encodes the within-tissue cluster 139 identity. We chose to model this with a random effect for numerical stability. To estimate significance, we used 140 Wald's approximation and simulated covariance for the levels of (1| mU † †…ˆ) with the R arm package. Symphony projection. The Symphony pipeline is described in detail in a separate manuscript (Kang et al., 2020) . In order to infer 143 reference cluster identity in query cells, we used a k-NN classifier. K=10 nearest neighbors were estimated with 144 Symphony projected low dimensional embeddings, based on cosine distance ( = 0.1). Ligand receptor analysis. We