key: cord-0327729-sph2zazk authors: Ibáñez-Solé, Olga; Ascensión, Alex M.; Araúzo-Bravo, Marcos J.; Izeta, Ander title: Lack of evidence for increased transcriptional noise in aged tissues date: 2022-05-19 journal: bioRxiv DOI: 10.1101/2022.05.18.492432 sha: 4ede141ef8c9b0a2700ece94b8b76bb16d89ab32 doc_id: 327729 cord_uid: sph2zazk Aging is often associated with a loss of cell type identity that results in an increase in transcriptional noise in aged tissues. If this phenomenon reflects a fundamental property of aging remains an open question. Transcriptional changes at the cellular level are best detected by single-cell RNA sequencing (scRNAseq). However, the diverse computational methods used for the quantification of age-related loss of cellular identity have prevented reaching meaningful conclusions by direct comparison of existing scRNAseq datasets. To address these issues we created Decibel, a Python toolkit that implements side-to-side four commonly used methods for the quantification of age-related transcriptional noise in scRNAseq data. Additionally, we developed Scallop, a novel computational method for the quantification of membership of single cells to their assigned cell type cluster. Cells with a greater Scallop membership score are transcriptionally more stable. Application of these computational tools to seven aging datasets showed large variability between tissues and datasets, suggesting that increased transcriptional noise is not a universal hallmark of aging. To understand the source of apparent loss of cell type identity associated with aging, we analyzed cell type-specific changes in transcriptional noise and the changes in cell type composition of the mammalian lung. No robust pattern of cell type-specific transcriptional noise alteration was found across aging lung datasets. In contrast, age-associated changes in cell type composition of the lung were consistently found, particularly of immune cells. These results suggest that claims of increased transcriptional noise of aged tissues should be reformulated. Concomitant to the large repertoire of known age-associated changes at the cellular level, an reports that deserve further scrutiny. 45 Early studies were based on the quantification of the variance associated with the expression of 46 a few pre-selected transcripts by real-time PCR, on bulk cell and tissue samples (Bahar et al., 2006 ; 47 With the advent of single-cell RNA sequencing (scRNAseq) technologies, whole-48 transcriptome variability on aged tissues was studied at the single-cell level. A pioneering study on 49 human pancreas by Quake and colleagues found an age-related increase in transcriptional noise 50 specific to pancreatic cells (Enge et al., 2017) . The authors introduced a definition of transcriptional 51 noise that was based on whole-transcriptome variability: the ratio between biological and technical 52 variation, where the latter was inferred from External RNA Controls Consortium (ERCC) spike-in 53 variability. As ERCC spike-in controls are not included in every scRNAseq experiment, they proposed 54 two alternative methods that were based on the notion of "distance to centroid": the greater the dependency over k iterations. We implemented an extension of this method so that it could be used 115 in datasets containing several cell types, by computing the GCL averaged over 50 iterations for each 116 cell type of the same individual. Therefore, our implementation outputs a GCL score per cell type 117 and individual rather than a transcriptional variability measure per cell. The Python implementation 118 of these four methods is available at https://gitlab.com/olgaibanez/decibel. Scallop membership score accurately identifies transcriptionally noisy cells 120 In addition to implementing existing methods, we developed Scallop, a novel tool for the quantifi-121 cation of the degree of loss of cell type identity in scRNAseq data ( Figure 1B ). Scallop measures 122 the membership of each cell to a particular cluster by iteratively running a clustering algorithm on 123 randomly selected subsets of cells and computing the fraction of iterations a cell was assigned to a 124 particular cluster. Thus, cluster membership takes values between 0 and 1. Scallop relies on the 125 assumption that the more consistently a cell is assigned to a particular cluster across bootstrap 126 iterations, the greater its transcriptional stability. Conversely, a cell being assigned to different 127 clusters across iterations is indicative of a greater transcriptional variation. Therefore, we quantify 128 loss of cell type identity as 1 − membership. 129 In order to characterize and validate the performance of our method, we conducted three 130 experiments. First, we compared the output of Scallop to the transcriptional noise measured using In addition, we plotted the 10% most transcriptionally stable and unstable cells according to the 141 Euclidean distance to the cell type mean and Scallop methods (Figure 1 -Supplement 1C) . These 142 analyses suggested that Scallop's membership score outperforms distance-to-centroid methods at 143 discriminating between noisy cells lying in between clusters and more transcriptionally robust cells 144 constituting the core of T cell subtypes. 145 Next, we analyzed Scallop's robustness in response to input parameters, namely, the number 146 of bootstrap iterations and the fraction of cells used in each iteration. We ran Scallop on five 147 independent scRNAseq datasets with different size and cluster composition (see Appendix 1), 148 and studied the convergence of Scallop membership scores for a wide range of values ( Figure 149 1 -Supplement 2). The median correlation distance between membership scores decreased as 150 we increased the number of bootstrap iterations (n_trials) and the fraction of cells used in 151 each iteration (frac_cells). We concluded that Scallop's output is robust to changes in its input 152 parameter values, the results suggesting that frac_cells > 0.8 and n_trials > 30 are appropriate 153 parameter values for most datasets (Figure 1 -Supplement 2) . 154 Finally, we studied the relationship between Scallop membership score and robust gene marker Then, a subset of cells is randomly selected and subjected to unsupervised clustering n_trials=10 times (cells not selected in each bootstrap iteration are shown in grey). The cluster labels across bootstrap iterations are harmonized by mapping the cluster labels with the greatest overlap, using the Hungarian method (Munkres, 1957) . A consensus clustering solution is derived by selecting the most frequently-assigned cluster label per cell, and the membership score is computed as the frequency with which the consensus label was assigned to each cell. Scallop measures noise as a 1 − membership value assigned to each cell. Figure 3A ; bubble #22). Similarly, lung interstitial fibroblasts' transcriptional noise appeared to 211 increase with age, although with a larger range of membership scores (3-17%; bubble #24). In 212 both cases, the cell type abundance was not affected by aging. In contrast, alveolar macrophages 213 showed a decrease in age-associated transcriptional noise (5-12% increase in median membership; 214 bubble #10). Finally, several cell identities appeared not to change significantly with regard to 215 their transcriptional noise related to aging. That was clearly the case for capillary endothelial cells 216 (bubble #9) and plasma cells (bubble #5). Vascular endothelial cells (bubble #6) showed less than 217 2% of change in noise in 3 out of 4 datasets, but increased up to 8% in one dataset. Therefore, and 218 contrary to expectation, quantitative analysis of age-associated transcriptional noise did not show a 219 consistent pattern across diverse lung cell types in the four available datasets. In contrast, the cell abundance analysis did reveal a strikingly consistent enrichment of immune The area of the bubbles represents the expected proportion of each cell type in the whole dataset according to the binomial GLM fitted for that dataset. (B) Immune cell type enrichment, but not age-associated increase in transcriptional noise, is consistently detected in old mice lungs. The increase in transcriptional noise associated with aging (left) and the cell type enrichment (right) are shown for several lung cell identities classified on the left as immune and non-immune. Cell identities present in at least 3 out of the 4 studied datasets are shown, and the age-related difference in transcriptional noise of missing cell identities is imputed from the remaining three measurements (mean difference across datasets). More specifically, Angelidis_TN is the transcriptional noise per cell identity on the Angelidis dataset (from their figure 2); Kimmel_OD is the gene overdispersion per cell type on the Kimmel dataset (from their figure 2B); and Kimmel_DC is the cell-cell heterogeneity per cell identity measured as the Euclidean distance to the centroid of the cell identity for a particular age. Columns 4-7 summarize the results of our analysis of age-related loss of cell type identity in the murine lung. Specifically, Angelidis_S, Kimmel_S, TMS_FACS_S and TMS_drop_S report the transcriptional noise per cell identity on the four datasets, measured as the difference in median membership score between young and old individuals. The cell identities used are those drawn from Angelidis. Since some cell identities from Kimmel dataset did not have a 1:1 correspondence to the Angelidis cell identities, they are shown using their original notation at the bottom of the table ("Additional cell identities"). UP/DOWN: age-related increase/decrease in noise, NS: the difference in noise between young and old individuals is not statistically significant. NP: the cell identity was not present in the dataset in sufficient amounts to perform the analysis. For most cell types, it can be concluded that there is little overlap between cell identity-specific noise measurements across datasets and methods For each cell type, its age-related increase in noise (difference in 1 − ℎ between old and young individuals per cell type) and the Old/Young OR are shown. Only cell types whose enrichment/depletion are statistically significant in at least one of the datasets are shown, and the OR-s associated with a p-value > 0.01 are shown as a triangle. The color-bar for the enrichment is shown in a logarithmic scale. aging from a single transcriptionally homogeneous cluster (see Figure 5 and Supplement 1). The (Schneider et al., 2021) . Interestingly, we measured the age-associated 296 transcriptional noise in the alveolar macrophage community using a distance-to-centroid method 297 (Euclidean distance to the cell type mean) and Scallop, and observed that only the latter algorithm (Li et al., 2020) , all established hallmarks of aging. Some authors consider transcriptional noise to 312 be a hallmark of aging in and of itself (Mendenhall et al., 2021) . Since aging is multifactorial and 313 mutational load most likely leads to clonal expansion of aberrant cells that accumulate throughout 314 the lifetime of the individual, other authors suggest that aging traits may be associated with cell type 315 imbalance in aged organs (Cagan et al., 2022) . Another recent hypothesis is inter-tissue convergence 316 through age-associated loss of specialization (Izgi et al., 2022) . 317 In this work, we made a systematic comparison of the most important families of methods 318 that have been used to quantify age-related transcriptional noise through the implementation of 319 Decibel, a novel Python toolkit. Since we were not convinced of the utility of these methods to In order to investigate cell type-specific effects in transcriptional noise, it is crucial to compare 333 between different datasets of the same aging tissue. Otherwise, it is difficult to ascertain whether 334 the variability observed between cell types is due to a pattern that is conserved in that tissue or is In fact, the age-associated increase in immune cell infiltration of solid organs may be generalized. Specifically, one study found neutrophil and plasma cell infiltration in adipose tissue, aorta, liver 354 and kidneys of aged rats of both sexes, and the immune cell infiltration was reversed by caloric 355 restriction (Ma et al., 2020a) . Another study found a subtype of highly secretory plasma cells 356 infiltrated in the aged bone marrow, spleen, fat, kidney, heart, liver, muscle, and lungs (Schaum (Yousefzadeh et al., 2021) , in what has been proposed to be a feed-forward circuit (Salminen, 2021) . 359 Therefore, the importance of immune cell infiltration of the aged lungs cannot be overlooked. In fact, 360 age-associated immune cell type enrichment has also been observed in two independent studies 361 of macaque lungs. One study found increased mast cells, plasma cells and CD8+ T cells in aged 362 lung tissue (Ma et al., 2020b) , while the other found increased alveolar and interstitial macrophage 363 numbers in bronchoalveolar lavages of old macaques (Rhoades et al., 2022) . The significance of the 364 shift in cellular composition of the aged lungs in relation to the appearance of aging traits remains 365 to be determined. Of note, alternative explanations for transcriptional changes associated with 366 aging such as tissue convergence are compatible with shifts in the cellular composition of aging 367 tissues and organs being a primary cause of convergence (Izgi et al., 2022) . 368 In summary, the sources of the apparent increase in transcriptional noise reported by previous 369 studies may be multiple, and are mostly related to the computational methods used to characterize Decibel: Python toolkit for the quantification of transcriptional noise 376 We developed a Python toolkit for the quantification of loss of cell type identity associated with 377 aging. We implemented methods as they were originally described in the literature. cells -is computed. Then, the biological variation is measured as the Euclidean distance from each 385 cell to its cell type mean for that individual. The technical variation is computed using the same 386 procedure, but using only the ERCC spike-ins in the calculation of the distance to cell type mean. Finally, the transcriptional noise is calculated by dividing the biological variation by the technical 388 variation per cell. Euclidean distance to cell type mean 390 The distance to the cell type mean is measured as the second method described by Enge et al.. 391 For each cell type and individual mouse or donor, we compute the average whole-transcriptome Taking the Matlab code provided by the authors, we implemented the GCL in Python. As the 404 original formulation was used in datasets with a single cell type, here we computed the GCL for 405 each cell type separately and then calculated the average GCL for the tissue. For each cell type, the 406 GCL was calculated by splitting the whole transcriptome into two random halves and computing 407 the batch-corrected distance correlation between them (Levy et al., 2020) . The GCL per cell type 408 was averaged over k times. Following the authors' recommendation, we used k=50 in all of our 409 calculations. score: 2. The maximum score would correspond to a total overlap between the two clusters. The score is computed for every pair of clusters between the reference solution and each of 445 the bootstrap iterations to obtain a contingency matrix ( _ × _ ). In order to find columns that maximizes the trace of the contingency matrix. We do this by using Munkres, a Python 448 implementation of the Hungarian method (Munkres, 1957) . 449 As the reference clustering solution is computed on the whole dataset but each of the bootstrap In order to validate our method for transcriptional noise quantification, we conducted three analyses: 485 1) we graphically evaluated the output of Scallop on a dataset of human T cells, 2) we analyzed 486 its robustness to input parameters, and 3) we studied the relationship between membership and 487 robust marker expression, using a PBMC dataset. Stable and unstable cells in the 8K human T cells 489 We downloaded a 23,766 PBMC dataset from from 10X Genomics. We ran the standard processing 490 pipeline including highly variable gene detection, dimensionality reduction through PCA and UMAP, 491 and clustering. We annotated the dataset according to PBMC marker expression and selected the 492 cluster of T lymphocytes. We obtained a dataset of 8,278 cells. We ran the processing pipeline on 493 the T lymphocyte dataset and obtained three main clusters of cells, which we annotated as 0/CD4+ Robustness to input parameters 501 We selected a set of five scRNAseq datasets of various sizes and depths (Table 1) of a routine downstream analysis (differential expression) when given each of the sets as input. We computed the 100 most differentially expressed genes (DEGs) between each cell type and the 522 rest of the cells using 1) all cells, 2) only the stable cells and 3) only the unstable cells. B cells and 523 megakaryoctes were excluded from the analysis as the former were highly stable (so we could not 524 compare between the stable and the unstable fraction) and the latter consisted of very few cells. 525 We compared the distribution of log-fold changes and p-values associated with those DEGs when 526 using only the stable, only the unstable and all the cells. 1,5K murine aging CD4+ T cells 540 We downloaded the raw data and metadata files from (Martinez-Jimenez et al., 2019) from the 541 authors' GitHub. We created an annData object with the raw count matrix and the metadata 542 (mouse strain, age-group, stimulus, individual and cell type). We identified and flagged the counts 543 corresponding to ERCC spike-in controls. We ran a standard processing pipeline: filtering out low 544 quality cells and genes, normalization and log-transformation of counts, selection of highly variable 545 genes, batch-effect correction between mouse strains (Mus musculus domesticus and Mus musculus 546 castaneus) and dimensionality reduction was conducted (PCA and UMAP). 14,8K murine aging lung cells 548 We downloaded the raw count matrix and the metadata file from (Angelidis et al., 2019) from the 549 Gene Expression Omnibus (accession number: GSE124872). We created an annData object with the 550 raw count matrix and the available metadata (cell type annotation, age group, cluster and mouse). We ran a standard processing pipeline: quality control, normalization and log-transformation of 552 counts, selection of highly variable genes, batch-effect correction between individual mice using 553 bbknn (Park et al., 2018) and dimensionality reduction (PCA and UMAP). In our analysis, we used 554 the cell type annotations provided by the authors. We also annotated the rest of the murine aging 555 lung datasets using their annotation as a reference. In order to do that, we computed the DEGs 556 between each lung cell type and the rest of the dataset to obtain a set of gene markers for each 557 cell type. We then used those markers to annotate the rest of the datasets using scoreCT (Seninge, The 3,2K TMS FACS-sorted and the 4,4 TMS droplet lung cell datasets were downloaded from figshare. A standard preprocessing pipeline was run on the two datasets, and cluster labels were harmonized 587 with the rest of the murine aging lung datasets by using the genes differentially expressed between 588 cell types from the Angelidis dataset as input for the automated cell-type annotation through 589 scoreCT (Seninge, 2020) 590 Human Lung Cell Atlas We downloaded the full lung and blood 10X dataset from the HLCA (Travaglini et al., 2020) from Acknowledgments 675 We thank Iñaki Inza for his thorough revision of the manuscript, Laura Yndriago for her feedback, 676 and Sandra Fuertes for useful discussions. We thank Valentine Svensson for support with the 677 application of GLMs to cell type abundance analysis of scRNAseq data. Stripplots showing the distribution of noise values, as measured by the four alternative methods (Biological variation over technical variation, Euclidean distance to cell type mean, Euclidean distance to tissue mean using invariant genes, and Global Coordination Level -GCL) in the seven datasets used in the analysis of the age-related transcriptional noise at the tissue level. Boxplots and their whiskers represent the interquartile range (IR) and 1.5*IR respectively. (a) Experimental approach. Four murine aging lung datasets were preprocessed and cell type-annotated. The cell-type labels from Angelidis were used as a reference to annotate the rest of the datasets. Differences in cell-type abundance between young and old mice were quantified using GLMs. From each dataset, eight subsets of related cell-types were created to classify the 31 cell types into 8 categories, which were used as input for Scallop to analyze the differences in cell-to-cell variability. The 31 detected lung cell types were classified in the Noise ranking (left) according to their greater age-related increase in noise. They were also classified in the Enrichment ranking (right) according to their greater enrichment in old samples. Cell categories that were represented by fewer than 100 cells were excluded from the transcriptional noise evaluation, and therefore do not appear in the plot. Specific cell types are shown in the same color and with the same numbers as specified in the legend. and Kimmel_DC is the cell-cell heterogeneity per cell identity measured as the Euclidean distance to the centroid of the cell identity for a particular age. Columns 4-7 summarize the results of our analysis of age-related loss of cell type identity in the murine lung. Specifically, Angelidis_S, Kimmel_S, TMS_FACS_S and TMS_drop_S report the transcriptional noise per cell identity on the four datasets, measured as the difference in median membership score between young and old individuals. The cell identities used are those drawn from Angelidis. Since some cell identities from Kimmel dataset did not have a 1:1 correspondence to the Angelidis cell identities, they are shown using their original notation at the bottom of the table ("Additional cell identities"). UP/DOWN: agerelated increase/decrease in noise, NS: the difference in noise between young and old individuals is not statistically significant. NP: the cell identity was not present in the dataset in sufficient amounts to perform the analysis. For most cell types, it can be concluded that there is little overlap between cell identity-specific noise measurements across datasets and methods Muris Consortium. A single-cell transcriptomic atlas characterizes ageing 685 tissues in the mouse An atlas of the aging lung mapped by single cell transcriptomics and 687 deep tissue proteomics Design and computational analysis of single-cell RNA-sequencing experiments Increased cell-to-cell variation in gene expression in ageing mouse heart Pattern Recognition with Fuzzy Objective Function Algorithms Fast unfolding of communities in large networks Statistical Mechanics: Theory and Experiment Somatic mutation rates scale with lifespan across mammals The eroding chromatin landscape of aging stem cells A synopsis on aging-Theories, mechanisms 705 and future prospects A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well A systematic performance evaluation of clustering methods for single-cell RNA-seq data Single-Cell Analysis of Human Pan-712 creas Reveals Transcriptional Signatures of Aging and Somatic Mutation Patterns Single-cell transcriptomics allows novel insights into 715 aging and circadian processes Inflamm-aging: An Evolution-717 ary Perspective on Immunosenescence Inflammaging decreases adaptive and innate immune responses in mice and humans Comparison of clustering tools in R for medium-sized 10x Genomics single-cell 722 RNA-sequencing data The hoverfly and the wasp: A critique of the hallmarks of aging as a paradigm Multi-omic 726 rejuvenation of human cells by maturation phase transient reprogramming. eLife Aging: progressive decline in fitness due to the rising deleteriome adjusted by genetic, environ-729 mental, and stochastic processes Dissecting the cellular specificity of smoking effects and reconstructing lineages in the 732 human airway epithelium Quantification of age-related decline in transcriptional 734 homeostasis Pathway dynamics can delineate the sources of transcriptional noise in gene 736 expression. eLife Ageing affects DNA methylation drift and transcriptional cell-to-cell variability in mouse muscle stem cells Inter-tissue convergence of 741 gene expression during ageing suggests age-related loss of tissue and cellular identity. eLife Shape Epidermal and Hair Follicle Heterogeneity Murine single-cell RNA-seq 746 reveals cell-identity-and tissue-specific trajectories of aging On the Programmed/Non-Programmed Nature of Ageing within the Life History Challenges in unsupervised clustering of single-cell RNA-seq data Fast, sensitive and accurate integration of single-cell data with Harmony Inflammaging and the lung Age-related loss of gene-758 to-gene transcriptional coordination among single cells A single-cell transcriptomic atlas of primate pancreatic islet aging Reprogram-764 ming to recover youthful epigenetic information and restore vision Repro-768 gramming to recover youthful epigenetic information and restore vision A step-by-step workflow for low-level analysis of single-cell RNA-seq data with 771 The Hallmarks of Caloric restriction reprograms the single-cell tran-776 scriptional landscape of Rattus Norvegicus aging Single-cell transcriptomic atlas of 778 primate cardiopulmonary aging Aging increases cell-to-cell transcriptional variability 781 upon immune stimulation An outline of generalized linear models. Generalized Linear Models UMAP: Uniform manifold approximation and projection The lung microenvironment shapes a dysfunctional response of alveolar macrophages in aging Cell-to-cell variation in gene expression and the aging 790 process Monocyte-derived alveolar macrophages drive lung fibrosis and persist in the lung 793 over the life span Improving gene network inference with graph wavelets and making in-795 sights about ageing-associated regulatory changes in lungs Decoding the regulatory network of early blood development from 798 single-cell gene expression measurements Algorithms for the Assignment and Transportation Problems Aging of intestinal stem cells Ageing and sources of transcriptional heterogeneity Stem cell aging: Mechanisms, regulators and therapeutic opportunities A 808 transcriptomic atlas of aged human microglia Distinct and diverse chromatin 811 proteomes of ageing mouse organs reveal protein signatures that correlate with physiological functions. eLife Fast Batch Alignment of Single Cell Transcriptomes Unifies Multiple Mouse 814 Transcriptional Heterogeneity and Lineage Commitment in Myeloid Progenitors Hallmarks of human "immunosenescence": adaptation or dysregulation? Tabula muris senis. Processed files (to use with scanpy) Molecular hallmarks of heterochronic parabiosis at single-cell resolution A 871 molecular cell atlas of the human lung from single-cell RNA sequencing Single-874 cell analyses of aging, inflammation and senescence From DNA damage to mutations: All roads lead to aging A smart local moving algorithm for large-scale modularity-based community 879 detection Transcriptional instability is not a universal 881 attribute of aging Factors associated with COVID-19-related death using OpenSAFELY SCANPY: large-scale single-cell gene expression data analysis Single-cell transcriptomic 889 profiling of the aging mouse brain Erosion of the Epigenetic Landscape and Loss of Cellular Identity as a Cause of Aging in Mammals. 893 bioRxiv An 896 aged immune system drives senescence and ageing of solid organs A single-cell transcriptomic 899 landscape of primate arterial aging Semisoft clustering of single-cell data A Single-Cell Transcriptomic 903 Atlas of Human Skin Aging Appendix 3 Table 1 . The general criteria for inclusion in the aging datasets used in this study was to include all samples from young and old individuals and to exclude newborn or pediatric individuals, as we did for the human pancreatic cell dataset (Enge et al., 2017) and the murine dermal fibroblast dataset (Salzer et al., 2018) . Care was taking to make all aging datasets sex-balanced. This was not possible for some datasets, as they consisted of same-sex individuals. However, same-sex datasets were included in our study as sex could not be a confounding factor in the aging analysis.