key: cord-0837776-d8wg8t2n authors: Alexander, Michael J.; Budinger, G.R. Scott; Reyfman, Paul A. title: Breathing fresh air into respiratory research with single-cell RNA sequencing date: 2020-07-03 journal: Eur Respir Rev DOI: 10.1183/16000617.0060-2020 sha: 863ac18fc788c57cb7dd1375afa2a44b6156267e doc_id: 837776 cord_uid: d8wg8t2n The complex cellular heterogeneity of the lung poses a unique challenge to researchers in the field. While the use of bulk RNA sequencing has become a ubiquitous technology in systems biology, the technique necessarily averages out individual contributions to the overall transcriptional landscape of a tissue. Single-cell RNA sequencing (scRNA-seq) provides a robust, unbiased survey of the transcriptome comparable to bulk RNA sequencing while preserving information on cellular heterogeneity. In just a few years since this technology was developed, scRNA-seq has already been adopted widely in respiratory research and has contributed to impressive advancements such as the discoveries of the pulmonary ionocyte and of a profibrotic macrophage population in pulmonary fibrosis. In this review, we discuss general technical considerations when considering the use of scRNA-seq and examine how leading investigators have applied the technology to gain novel insights into respiratory biology, from development to disease. In addition, we discuss the evolution of single-cell technologies with a focus on spatial and multi-omics approaches that promise to drive continued innovation in respiratory research. Every cell in the body shares a similar genome, but the epigenome, transcriptome, proteome and metabolome of each cell varies dramatically between tissues and cells. These "omes" beyond the genome dynamically change in response to environmental challenges, disease states and ageing. 
While technological advances increasingly allow measurement of epigenome, proteome and metabolome in small tissue samples that can be collected as part of clinical care, none are as robust, reproducible or low cost as next-generation sequencing (NGS) technologies to measure the transcriptome [1] [2] [3] . NGS technologies first allowed direct measurement of gene expression in composite tissues via sequencing of messenger RNA (RNA-seq) in 2008 [4] [5] [6] . Applying these technologies to ever-smaller samples allowed profiling of gene expression in a single cell within a year [7] . Since then, commercialisation and standardisation have made these technologies available in most advanced laboratories, supporting an explosion of publications using single-cell RNA-seq (scRNA-seq). Reductions in cost and advances in computational approaches have allowed the number of cells profiled in these studies to increase exponentially over time reaching >1 million per study [8, 9] . Boosted by these enabling technologies, scRNA-seq is being used in large-scale efforts to provide a high-resolution map of every cell in the human body, offering unparalleled opportunities to explore cellular interactions and trajectories over the course of disease. The community of respiratory researchers, long hampered by the cellular complexity of the lung, have been leaders in applying scRNA-seq to the study of human disease. These studies have supported a broad array of findings, including insights into respiratory system development, the identification of novel cell types in the human lung and profiles of heterogeneity in respiratory system cell populations in health and disease [10] [11] [12] [13] [14] [15] . The ability to address fundamental biological questions is continuously expanding as technologies to collect and process respiratory specimens for scRNA-seq are refined, costs for reagents and sequencing fall and computational platforms become more robust. 
Rapid advances in spatial transcriptomics, epigenomics, proteomics and metabolomics provide the opportunity for an integrated multi-omic approach to investigating lung disease. Nevertheless, techniques to leverage data generated from scRNA-seq technologies for respiratory research are evolving, and the limitations of these technologies for profiling respiratory samples are incompletely understood. In this review, we aim to provide an overview of scRNA-seq technologies focused on their applications and limitations when applied to studies of the respiratory system. We begin with some illustrative examples from our own group and others that address disease-focused questions that can be specifically answered using scRNA-seq. The understanding of alveolar macrophages as a homogeneous, nonreplicating cell population continuously replenished from a reservoir of peripheral monocytes changed dramatically when a series of lineage-tracing studies in mice showed that alveolar macrophages are a long-lived, self-renewing population that populates the lung immediately after birth and persists without input from circulating monocytes over prolonged periods of time [16] [17] [18] [19] [20]. In murine models of bleomycin- and asbestos-induced fibrosis, we found that monocyte-derived alveolar macrophages recruited in response to lung injury were necessary for fibrosis, while tissue-resident alveolar macrophages were dispensable [21, 22]. We used genetic lineage tracing systems to sort tissue-resident and monocyte-derived alveolar macrophages by flow cytometry for bulk RNA-seq, which showed that monocyte-derived alveolar macrophages exhibit a profibrotic transcriptomic signature distinct from their tissue-resident counterparts. These findings predicted the presence of at least two transcriptionally distinct populations of alveolar macrophages in the lungs of patients with pulmonary fibrosis, a question that could only be addressed using scRNA-seq [14].
Applying this technology to the human lung, we identified two populations of alveolar macrophages in the lungs of patients with pulmonary fibrosis, one of which resembled macrophages from normal lungs and one of which differentially expressed profibrotic genes homologous to those we observed in mice. We were able to definitively show this in a remarkably small group of patients (eight patients with lung fibrosis and eight controls), suggesting that cellular heterogeneity rather than true biological variability might have masked signals in previous studies using bulk RNA-seq. Similar results were found by two independent laboratories, highlighting the reproducibility of scRNA-seq even when applied to human disease [23, 24] . Two groups used scRNA-seq to sequence a very large number of cells from samples of normal human airway, thereby describing the transcriptional landscape of the airway at unprecedented resolution. They identified a transcriptionally unique cell type that expressed large quantities of CFTR (cystic fibrosis transmembrane conductance regulator), the causal gene of cystic fibrosis [11, 12] . In addition, this cell was characterised by increased expression of several other ion transporters and the transcription factor FOXI1, a transcription factor with homology to foxi1, the canonical transcription factor of Xenopus larval skin ionocytes. The studies describing the discovery of the human airway ionocyte also reported extensive validation to demonstrate the presence of airway ionocytes in the human and murine lung using single-molecule fluorescence in situ hybridisation. They further validated their findings by performing functional studies in mice to show that this population of pulmonary ionocytes contributes disproportionately to CFTR function in the airway. 
These data challenged the existing paradigm that ciliated cells expressing FOXJ1 are the major source of CFTR protein in the lung, with potential implications for gene-therapy approaches to cystic fibrosis. Immunotherapy has quickly become a frontline therapy in patients with nonsmall cell lung cancers (NSCLC) lacking driver mutations. A minority of patients receiving immune checkpoint blockade will exhibit a durable response, an outcome that was previously unheard of. Unfortunately, for the majority of patients, progression of their lung cancer is inevitable. The tumour immune microenvironment has been found to have a large influence on response to immunotherapy, and by understanding the unique compositions of those environments researchers hope to better predict responsiveness and reveal novel targets [25]. Using scRNA-seq, a group of researchers characterised the tumour T-cell landscape of 14 patients with NSCLC compared to surrounding lung and peripheral blood [26]. They found significant heterogeneity in tumour-infiltrating CD8+ T-cells and regulatory T-cells (Tregs), with increasing proportions of exhausted CD8+ subtypes and activated Tregs portending a worse prognosis. A key limitation of bulk RNA-seq applied to whole tissue or to purified cell populations is that this approach necessarily averages the gene expression signals of all the cells in a sample (figure 1b). This is a particular concern for complex tissues like the lung, which includes 40 or more cell types [27]. Because many of these populations are relatively rare or difficult to dissociate intact from the underlying tissue, their contributions to disease signatures from bulk RNA-seq analysis are lost. Furthermore, scRNA-seq analyses have raised doubt about the reliability of methods of "deconvolution", or computational estimation of the composition of individual cell populations in bulk RNA-seq data.
While this limitation can be mitigated by purification of cell types or subpopulations using flow cytometry, often no such method of purification is available. Analysis using scRNA-seq estimates the transcriptomes of individual cells in a sample, enabling attribution of gene expression signals to cell types or subpopulations and allowing recognition of rare cells or pathogenic populations of cells that appear only during disease (figure 1). Compared with flow cytometry, which measures levels of surface or intracellular proteins with single-cell resolution, scRNA-seq offers a less biased portrayal of gene expression because it measures RNA corresponding to all genes rather than measuring only a prespecified panel of protein markers. One frequently discussed limitation of scRNA-seq data when compared with bulk RNA-seq data relates to the relatively shallow depth of sequencing within each cell. As a result, single-cell data is susceptible to the problem of "dropout" where a zero value for a gene may or may not reflect a lack of expression in an individual cell. However, recent work suggests that this sensitivity problem in scRNA-seq data can be overcome by sampling and sequencing sufficiently large numbers of cells [28]. Several workflows for scRNA-seq have been developed, and most scRNA-seq experiments are now performed using commercial platforms (table 1 and figure 2). A critical step of any scRNA-seq experiment is generation of a single-cell suspension from a sample of interest (figure 2b). For solid tissues, this is usually done using a combination of mechanical dissociation and enzymatic digestion. Existing studies have generally used fresh tissue, but protocols have been developed for cryopreserved samples or for cells isolated using laser-capture microdissection from formalin-fixed, paraffin-embedded samples [41] [42] [43].
One study demonstrated that intact lung tissue could be cold-preserved at 4°C for up to 72 h prior to processing for scRNA-seq without substantial evidence of degradation in the resulting data quality [44] . Following dissociation, individual cells are collected and then lysed in either wells on a plate, oil droplets or nanowells on a chip (figure 2c). This enables capture of messenger RNA (mRNA) molecules, generation of complementary DNA (cDNA) by reverse transcription and barcoding of cDNA for each individual cell in isolation from all other cells. A barcode incorporated into the cDNA during the process of preparing a cDNA "library" for each cell consists of a unique nucleotide sequence for each cell, and often another sequence called a unique molecular identifier (UMI) for each mRNA molecule captured from that cell. Afterwards, the cDNA from all cells can be pooled and sequenced using NGS technology. The sensitivity of scRNA-seq which enables its unparalleled resolution also renders the technology susceptible to technical and experimental biases that may obscure true biological signals. Differences in tissue dissociation protocols contribute significantly to batch effect and technical variation. The method of mechanical and enzymatic disaggregation, the processing time and the strengths of reagents can all affect downstream analysis. Aggressive or prolonged digestion protocols can cause cell death or cell fragmentation that release ambient RNA into the media (figure 2b and c). Because this RNA is included in each droplet, genes from these dead cells appear to be "expressed" in all of the sequenced cells (figure 2d). Gentle dissociation protocols lead to overrepresentation of cells that are easily liberated from the tissue [29, 45] . This is a particular problem in the lung where diseases (fibrosis, emphysema and pneumonia) dramatically alter the cellular composition and matrix characteristics of the tissue. 
Even in normal lung tissue this becomes evident as cell types that are more difficult to liberate intact, such as broad, flat, sail-like alveolar type I (AT1) cells and matrix-embedded fibroblasts, are relatively undersampled compared with cuboidal alveolar type II (AT2) cells and alveolar macrophages in published datasets. Efforts are under way to improve consistency and reliability of scRNA-seq methods and decrease technical variation, including the implementation of automated tissue processing. Data structures designed for storing data from single-cell experiments generally permit the storage of metadata detailing experimental conditions under which the data were obtained. The standardisation and meticulous documentation of implemented protocols will support improved consistency across operators and institutions. The initial processing of raw scRNA-seq data entails sorting reads based on their cell barcodes and aligning the reads to the reference transcriptome of the organism being studied to produce a table of gene counts. In this table, each column corresponds to a cell barcode and each row corresponds to a gene so that each entry contains an integer reflecting the number of reads with that cell barcode mapped to that gene (figure 2d). If libraries have been prepared for sequencing using UMIs, then all reads with the same UMI are treated as amplification duplicates and are collapsed into a single read in the counts table. A threshold is often selected to perform filtering on the counts table which removes cell barcodes with low sequencing depth, or total number of unique reads. These removed barcodes are attributed to capture, during cell isolation, of partial or damaged cells or ambient mRNA. Additional filtering may be performed to remove cell identities based on other metrics reflecting poor quality of input cells, such as having greater than a specific proportion of reads mapping to mitochondrial genes. 
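The counting and filtering steps described above can be illustrated with a minimal sketch in Python. It is not the pipeline of any particular platform. It assumes that reads have already been aligned and reduced to (cell barcode, UMI, gene) triples, that amplification duplicates are reads sharing all three values, and that mitochondrial genes can be recognised by the human-style "MT-" prefix. The function names, thresholds and toy reads are hypothetical, and the counts table is stored as a nested dictionary (barcode, then gene) rather than as a genes-by-barcodes matrix.

```python
from collections import defaultdict


def build_counts(reads):
    """Collapse amplification duplicates and count molecules per cell and gene.

    reads: iterable of (cell_barcode, umi, gene) tuples, one per aligned read.
    Assumption: reads sharing barcode, UMI and gene are duplicates of one molecule.
    """
    unique_molecules = set(reads)  # duplicates collapse to a single entry
    counts = defaultdict(lambda: defaultdict(int))
    for barcode, _umi, gene in unique_molecules:
        counts[barcode][gene] += 1
    return {barcode: dict(genes) for barcode, genes in counts.items()}


def filter_cells(counts, min_depth, max_mito_fraction, mito_prefix="MT-"):
    """Keep barcodes with enough unique reads and an acceptable mitochondrial share."""
    kept = {}
    for barcode, genes in counts.items():
        depth = sum(genes.values())  # total unique reads for this barcode
        if depth < min_depth:
            continue  # likely a partial or damaged cell, or ambient mRNA
        mito = sum(n for gene, n in genes.items() if gene.startswith(mito_prefix))
        if mito / depth > max_mito_fraction:
            continue  # high mitochondrial proportion suggests a poor-quality cell
        kept[barcode] = genes
    return kept


# Toy input (hypothetical barcodes and reads, for illustration only)
reads = [
    ("AAAC", "u1", "SFTPC"),
    ("AAAC", "u1", "SFTPC"),   # amplification duplicate of the read above
    ("AAAC", "u2", "SFTPC"),
    ("AAAC", "u3", "MT-CO1"),
    ("AAAC", "u4", "AGER"),
    ("CCCG", "u1", "MT-CO1"),
    ("CCCG", "u2", "MT-ND1"),
    ("CCCG", "u3", "SFTPC"),
    ("GGGT", "u1", "SFTPC"),
]
counts = build_counts(reads)
kept = filter_cells(counts, min_depth=3, max_mito_fraction=0.5)
print(sorted(kept))  # ['AAAC']
```

In this toy example, barcode GGGT is removed for low depth and CCCG for an excessive mitochondrial fraction. In practice both thresholds are chosen per dataset rather than fixed in advance.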
Following filtering and other quality-control procedures, machine-learning tools can be used to group cells. Most often this entails clustering using either a similarity- or distance-based metric, and then assigning a cell identity to each resulting cluster based on the most highly expressed genes in that cluster compared with the other clusters [46] [47] [48]. Machine-learning algorithms for projecting high-dimensional data in low-dimensional space, such as t-distributed stochastic neighbour embedding (t-SNE) or uniform manifold approximation and projection (UMAP), are usually used for generating visualisations of clustered data [49, 50]. The results of a clustering analysis could provide evidence for a novel cell type or cell state if a distinct cluster was identified that expressed relatively high levels of unexpected genes. Additionally, if an scRNA-seq experiment was designed to compare two different conditions, differential expression analysis could be used to estimate which genes were expressed at a statistically higher level in one condition compared with another between clusters corresponding to the same cell type in samples from the respective conditions. Several computational tools have been developed to assist in combining scRNA-seq data from different experimental conditions or batches or data produced using varying protocols or by distinct groups [46, 51, 52]. These techniques are necessary for generating aggregate datasets and increasing the power to detect small expression differences, but they have to be applied thoughtfully. It is possible to remove expression signatures of true biological differences using these integration techniques, such as when a distinct cell type exists exclusively in one experimental condition. Auxiliary analyses using scRNA-seq data can answer a number of biologically meaningful questions.
If the transition of cells through a biological process such as differentiation is of interest, a pseudotime analysis can be performed where cells are arranged sequentially on a graph according to their similarity, computationally recapitulating a temporal phenomenon occurring in a cross-sectional sampling of cells [53] [54] [55]. Another procedure to infer these trajectories, RNA velocity, compares the ratios of spliced, mature mRNA to unspliced, nascent mRNA between cells to identify likely transitional states between cells [56]. If the interest is in uncovering which gene pathways or gene regulatory networks are differentially modulated between cell types or experimental conditions, differentially expressed genes from scRNA-seq clustering analysis can be analysed using a statistical test for enrichment of curated gene sets from a collection such as GO or MSigDB [57, 58]. And if the focus is on uncovering novel cell type interactions, a curated list of ligand-receptor pairs combined with lists of highly expressed genes in different cell clusters from scRNA-seq data can be used to generate hypotheses about signalling interactions between different cell types, such as alveolar macrophage-epithelial cell interactions in the lung [59]. Collaborative efforts to generate and publish data using scRNA-seq on the cellular compositions of normal tissues have coalesced around several "atlases". These efforts are motivated by the goal of making raw scRNA-seq data available as quickly and as widely as possible with the recognition that the value of a single dataset can be increased through careful reanalysis in combination with other datasets. Consistent with such a fundamentally collaborative approach, many researchers using scRNA-seq have adopted the practice of making publications available before peer review on preprint servers such as arXiv (https://arxiv.org), bioRxiv (www.biorxiv.org/) and medRxiv (www.medrxiv.org/).
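As an aside on the ligand-receptor analysis described earlier in this section, the following Python sketch shows one simple way such hypotheses might be enumerated. It assumes a curated list of (ligand, receptor) gene pairs and a set of highly expressed genes per cluster are already available. The function name, the pair list and the per-cluster gene sets are hypothetical toy inputs chosen for illustration, not results from any study discussed here, and real tools typically add expression thresholds and statistical testing that this sketch omits.

```python
def candidate_interactions(ligand_receptor_pairs, top_genes_by_cluster):
    """List (sender, ligand, receiver, receptor) combinations worth following up.

    A combination is reported when the ligand is among the highly expressed
    genes of one cluster and the receptor is among those of another (or the
    same) cluster. Output is a hypothesis list, not evidence of signalling.
    """
    hits = []
    for ligand, receptor in ligand_receptor_pairs:
        for sender, sender_genes in top_genes_by_cluster.items():
            if ligand not in sender_genes:
                continue
            for receiver, receiver_genes in top_genes_by_cluster.items():
                if receptor in receiver_genes:
                    hits.append((sender, ligand, receiver, receptor))
    return sorted(hits)


# Hypothetical toy inputs, for illustration only
pairs = [("CSF1", "CSF1R"), ("TGFB1", "TGFBR2")]
top_genes = {
    "epithelial": {"CSF1", "SFTPC"},
    "alveolar_macrophage": {"CSF1R", "TGFB1", "MARCO"},
    "fibroblast": {"TGFBR2", "COL1A1"},
}
for hit in candidate_interactions(pairs, top_genes):
    print(hit)
# ('alveolar_macrophage', 'TGFB1', 'fibroblast', 'TGFBR2')
# ('epithelial', 'CSF1', 'alveolar_macrophage', 'CSF1R')
```

The sketch deliberately allows sender and receiver to be the same cluster, so putative autocrine signalling is reported alongside signalling between cell types.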
The idea that the free sharing and synergistic integration of scRNA-seq datasets can accelerate the pace of discovery is predicated on the quality of those datasets. Collaboration is only possible through corroboration. Computational genomics is not immune to the reproducibility challenges in other branches of science, but in data science any result should be easily reproducible from the raw data and code used to generate the final analysis. We believe that the full disclosure of the experimental procedures, raw data, metadata and code used in a study is critical to the integrity of research. In the United States, the National Institutes of Health (NIH) has published requirements for genomic data sharing for all NIH-funded research and maintains several large databases to meet this need [60, 61] . In Europe, the European Molecular Biology Laboratory through the European Bioinformatics Institute operates similar repositories committed to the free sharing of genomic data [62] . Our practice is to upload our raw sequencing data to repositories maintained by the NIH including the Sequence Read Archive or the Database of Genotypes and Phenotypes (dbGaP, for human data) and to post our code to public repositories. Many journals now mandate that data and code generated in support of an article be uploaded to a repository and freely accessible to other researchers [63, 64] . The computational identification of a cell population or cell state using any of the commonly employed clustering algorithms always requires independent validation with a complementary technology (table 2) . Ideally for tissue analyses, this validation should add spatial information that defines the "neighbourhood" in which a putative new cell population or cell state resides. To this end, the single-cell research community is sharply focused on developing multiplexed, high-resolution spatial transcriptomic and proteomic methods to both validate and extend data from scRNA-seq. 
To date, most studies in the lung have used variants of single-molecule RNA fluorescence in situ hybridisation to validate their findings. However, these techniques are limited by the small number of probes that can be imaged simultaneously. To address this concern, multiplexed techniques that allow sequential imaging of multiple probes on a tissue section are being optimised [68] [69] [70]. Tissue clearing and permeabilisation protocols combined with light sheet microscopy offer the opportunity for three-dimensional imaging. All of these spatial transcriptomic techniques will necessitate advances in terms of processing and storage given the size and high complexity of these datasets [71] [72] [73] [74] [87]. While the technology to measure the transcriptome at single-cell resolution is more robust and less costly when compared with those that measure the epigenome, proteome or metabolome, these other technologies are advancing rapidly. Examples of integrated single-cell transcriptomics and single-cell ATAC-seq (assay for transposase-accessible chromatin, which measures regions of open chromatin) are already present [80, 81]. While not yet available at single-cell resolution, reduced representation bisulfite sequencing to query the DNA methylome and "cut and run" technology to measure chromatin modifications can be done with very small numbers of cells [88] [89] [90]. Multiparameter flow cytometry and mass cytometry (CyTOF) can currently measure several proteins simultaneously to complement scRNA-seq with proteomic analysis [91]. Even more exciting is a method for incorporating oligonucleotide-tagged antibodies into a droplet-based scRNA-seq workflow called CITE-seq (cellular indexing of transcriptomes and epitopes by sequencing) [75]. This technology permits simultaneous measurement of gene expression and surface protein levels in the same single cell. The liberation of cells from tissues for scRNA-seq differs depending on the tissue digestion protocols used.
Therefore, scRNA-seq cannot reliably recapitulate the cellular composition of tissues [14, 92]. Single-nucleus RNA-seq (snRNA-seq) offers a solution to this problem. Protocols using this technology typically take advantage of the relative resistance of the nuclear membrane compared to the plasma membrane to fracture during a freeze-thaw cycle. The intact nucleus is then used in typical droplet-based sequencing protocols. While this procedure necessarily increases contamination by ambient RNA, overrepresents long noncoding and small nuclear RNA and reduces the number of measured genes, it has shown promise when applied to the brain [65, 93, 94]. Most importantly, snRNA-seq can be performed on frozen tissue, offering the promise to generate transcriptomic information at single-cell resolution from archival tissues [65] [93] [94] [95] [96] [97]. Costs of scRNA-seq remain high because of costs of reagents in library preparation workflows and costs of sequencing [8, 36]. Sequencing costs are likely to continue to decrease, but current technologies may reach a limit such that further lowering of cost will require the development of novel NGS platforms [98]. Innovations in experimental protocols have produced great decreases in library preparation cost and are likely to continue to do so. Techniques for adding barcodes to samples prior to single-cell isolation, or for using single nucleotide variants to computationally determine sample genotypes will allow pooling of multiple samples for library preparation [39, 40, 99, 100]. In addition, these techniques are helpful for mitigating the effect of doublets, a single shared cell ID barcode in a scRNA-seq dataset that in actuality corresponds to two distinct cells. Preparation of libraries for scRNA-seq using droplet-based methods results in the capture of ambient RNA that is present in the input single-cell suspension [14].
Thus, genes, particularly those expressed at high levels in cells that are highly prevalent in the input sample, will be counted for cells in which they may not actually be expressed. Computational tools have been developed to address this issue by estimating contamination and adjusting counts accordingly [101, 102] . However, it is challenging to validate these tools, and the effect of ambient RNA contamination may vary between different tissues, experimental conditions and single-cell library preparation protocols. According to an often-cited estimate, the lung is composed of ~40 different cell types [27] . However, this estimate is based largely on studies that used microscopy, which may not be able to distinguish between cell populations similar in appearance but having distinct transcriptional or proteomic states and correspondingly distinct functions. It is likely that scRNA-seq will expand this number. Indeed, a recent preprint describes the detection of 58 cell populations from scRNA-seq of human lung [103] . Furthermore, technologies like snRNA-seq offer promise to more precisely quantify the changes in cellular composition in the lung that develop in response to environmental challenge, disease and ageing. While still in the future, three-dimensional imaging techniques with automated quantification may enable precisely localising and quantifying cell populations within the three-dimensional volume of the lung, obviating the need for statistical estimates based on stereology [104] [105] [106] [107] . Two prominent atlas projects using scRNA-seq, focused on mouse and human data respectively, are the Tabula Muris project and the Human Cell Atlas (HCA) project [108] [109] [110] [111] . Each of these atlases aims to produce data about normal structure and function of organs and tissues. 
The Tabula Muris is a single publicly available dataset including scRNA-seq from almost 100 000 cells spanning 20 murine organs and tissues, and it includes data from a substantial number of respiratory system cells, which can be included as a basis of comparison in studies of disease models (https://tabula-muris.ds.czbiohub.org). The HCA is an ongoing collaborative project aimed at producing data from many different organs and tissues from healthy humans (www.humancellatlas.org/). Furthermore, the HCA contains 38 different seed networks spanning a number of different organs and tissues including a dedicated HCA lung network focused on the respiratory system [112]. Major strengths of the HCA lung network include a welcoming stance that allows any investigator to participate and incorporation of multiple assay modalities including single-cell epigenetic and spatial transcriptomic tools together with scRNA-seq. Recently, during the coronavirus disease (COVID-19) global pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), members of the HCA lung network have demonstrated the power and speed of this consortium in being able to respond to novel biological challenges. Rapidly, members of the HCA lung network produced several studies reporting analyses of datasets combined from different groups detailing anatomical expression of the SARS-CoV-2 receptor ACE2 and identifying clinical covariates associated with different expression levels in various tissues of viral entry mediators [113] [114] [115]. The first published study using scRNA-seq in the respiratory field by Treutlein et al. [10] attempted to elucidate the process of lung epithelial cell development in a murine embryonic model. The authors performed clustering on gene expression profiles of lung epithelial cells from embryonic and adult mice to support their finding of a common progenitor cell population of AT1 and AT2 epithelial cells during development.
A more recent study reported on scRNA-seq of ~2 million cells from 61 mouse embryos between embryonic days 9.5 and 13.5 [9]. This study was labelled by its authors as an "atlas" in acknowledgement of it being descriptive and of the data it produced being potentially useful for incorporation into future studies. Cells belonging to lung epithelium, a lung epithelial developmental trajectory and other developmental trajectories related to components of the respiratory system are all identified in this study. Ageing is a critical risk factor for the development of many lung diseases including cancer, lung fibrosis, COPD and lung infections [116]. Studying ageing in the lung using scRNA-seq is attractive because it offers the promise of comparing distributions of transcriptional states within cell types at different extremes of age. A study that combined scRNA-seq with proteomics of lungs from young and old mice found that ageing was associated with increased transcriptional noise, an increased ratio of ciliated cells to club cells and increased cholesterol biosynthesis [117]. Additionally, an integrated analysis of the scRNA-seq and proteomic data from that study allowed the investigators to identify the most likely cellular sources of differentially regulated proteins with age. In another study, investigators analysed scRNA-seq data derived from lung as well as kidney and spleen tissue from young and old mice. The findings from this study suggested that the effect of ageing on transcriptional noise varied between different cell types and was greater, for example, in lung stromal cell types than in AT2 cells. Investigating the biology of ageing in the lung using scRNA-seq of human tissue is particularly challenging because obtaining normal lung tissue is difficult, particularly from multiple individuals of different ages over the lifespan.
As more and more scRNA-seq datasets of human lung tissue are published, such as within the HCA, a meta-analysis of human lung scRNA-seq data over the lifespan will become feasible, one that can include enough samples from different groups to control for batch effects. Tissue samples from patients with lung cancer are relatively accessible because treatment of lung cancer frequently includes surgical resection. Accordingly, several studies have focused on scRNA-seq of lung tissue obtained from patients with NSCLC [26] [118] [119] [120] [121]. The majority of these studies highlight the heterogeneity of tumour cells and, as in the study highlighted in case study 3, tumour-associated immune cells [122] [123] [124]. In addition, the results of two studies suggested that the presence of certain subsets of tumour-associated immune cell markers was associated with prognosis in patients with NSCLC [26, 121]. While stromal, epithelial and immune cells have all been implicated in contributing to the pathogenesis of pulmonary fibrosis, the first published study leveraging scRNA-seq to investigate pulmonary fibrosis pathobiology focused on epithelial cells [13, 125]. This study of human lung tissue demonstrated, and work from our group and others later confirmed, that pulmonary fibrosis was characterised by the emergence of transcriptionally distinct epithelial cell populations [14, 24, 126]. Our group was among the first to apply scRNA-seq to the analysis of patients with pulmonary fibrosis to address the question of macrophage heterogeneity during fibrosis [14]. Our findings were subsequently confirmed and extended in two independent datasets, highlighting the reproducibility of these approaches [23, 24]. Importantly, all of the groups responsible for these studies have already made their data available or committed to doing so, and some have made these data available in a format that can be explored by the community [15, 23, 24, 44] [126] [127] [128] [129] [130].
While most investigators have used scRNA-seq at a single time point during the course of disease, time-series single-cell data offer particular promise. For example, Schiller and colleagues performed scRNA-seq on murine lungs harvested over the course of bleomycin-induced pulmonary fibrosis [131]. The investigators identified and validated a novel Krt8-positive transitional epithelial cell progenitor population that receives input from AT2 and club cells and is responsible for replacing AT1 cells during alveolar repair. This finding was supported by an enrichment analysis indicating higher expression of genes related to proliferation in this novel Krt8-positive population and by an RNA velocity analysis indicating convergence of the transcriptional states of AT2 and club cells on the Krt8-positive state.

Mouse models of lung injury and repair have been used to understand acute lung injury in humans and also to gain insights into related processes involved in lung development and fibrosis. In one study, investigators used scRNA-seq of lung macrophages to confirm a previous observation that following lung injury induced by lipopolysaccharide (LPS), a population of recruited alveolar macrophages expresses increased levels of inflammatory genes compared with resident alveolar macrophages [132, 133]. In another study, investigators performed scRNA-seq on AT2 cells after LPS-induced lung injury and found that a relative decrease in Tgfb2 expression was associated with AT2 cells occupying a transdifferentiating state rather than a cell cycle arrest state during repair [134].

One study has reported on scRNA-seq analysis of samples from patients with asthma. The investigators used a distance-based trajectory analysis, termed "pseudotime", to estimate developmental trajectories among epithelial cells from healthy controls and patients with asthma [15].
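The idea behind a distance-based pseudotime can be illustrated with a toy example: cells are connected to their nearest neighbours in expression space, and each cell's pseudotime is its shortest-path distance from a chosen root cell. This is a minimal sketch under those assumptions, not the specific method used in the study cited above. The function name `pseudotime`, the choice of root cell, the value of `k` and the expression values are all hypothetical.

```python
import heapq
import math


def pseudotime(cells, root, k=2):
    """Shortest-path distance from `root` over a k-nearest-neighbour graph."""
    n = len(cells)
    # Build a symmetric kNN graph weighted by Euclidean distance
    graph = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted(
            (math.dist(cells[i], cells[j]), j) for j in range(n) if j != i
        )
        for d, j in nearest[:k]:
            graph[i][j] = d
            graph[j][i] = d
    # Dijkstra's algorithm from the root cell
    time = {root: 0.0}
    queue = [(0.0, root)]
    while queue:
        t, i = heapq.heappop(queue)
        if t > time.get(i, math.inf):
            continue  # stale queue entry
        for j, d in graph[i].items():
            if t + d < time.get(j, math.inf):
                time[j] = t + d
                heapq.heappush(queue, (t + d, j))
    # Cells unreachable from the root get infinite pseudotime
    return [time.get(i, math.inf) for i in range(n)]


# Hypothetical two-gene profiles along one differentiation path, shuffled
cells = [[3.0, 1.0], [0.0, 4.0], [1.0, 3.0], [4.0, 0.0], [2.0, 2.0]]
times = pseudotime(cells, root=1)
order = sorted(range(len(cells)), key=lambda i: times[i])
print(order)
```

Measuring distance along the neighbour graph rather than in a straight line lets the ordering follow a curved trajectory. The result depends on which cell is designated as the root, which in practice requires prior biological knowledge.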
In that study, the investigators found altered epithelial cell differentiation pathways in asthma and increased numbers of goblet and mucous ciliated cells compared with healthy controls. Additionally, there were increased numbers of type 2 helper T (Th2) effector cells in asthma and increased cell-to-cell signalling involving Th2 cells. This finding is further supported by a study using scRNA-seq to analyse Th cells in a murine house dust mite (HDM) model of allergic airway disease, which identified a distinct Th2 cell gene expression signature in the airways of mice exposed to HDM [135].

We anticipate that scRNA-seq studies are probably forthcoming for any disease affecting the respiratory system for which none has yet been published. For example, for cystic fibrosis, COPD and pulmonary hypertension, analyses of scRNA-seq data have not yet been published, but there have been multiple studies utilising bulk RNA-seq approaches [136–141]. As techniques for tissue processing, library preparation and data analysis improve, and as cost continues to decrease, more studies containing larger numbers of cells can be expected. The discovery of the pulmonary ionocyte, a cell type that expresses high levels of CFTR, provides an exceptional motivation for using scRNA-seq to gain insight into the function of this cell type in cystic fibrosis [11, 12].

Bulk RNA-seq has been used for biomarker development for a variety of respiratory system diseases, including NSCLC and idiopathic pulmonary fibrosis [142–147]. Developing biomarkers using scRNA-seq to assist in diagnosis, prognosis and selection of appropriate therapy in a variety of lung diseases may be feasible, but several questions will have to be clarified.
It is not known whether tissue proximal to the disease is required for biomarker development using scRNA-seq, or whether samples obtained by less invasive means, such as blood or nasal epithelial samples, can be used. Furthermore, it is not known what approaches to scRNA-seq analysis can support biomarker development. The extent to which scRNA-seq can be part of a successful platform for developing clinically relevant biomarkers will need to be determined through future research, although the existing studies hinting at the prognostic relevance of certain tumour-associated immune cell populations are promising [26, 121].

In a relatively short period of time, research using scRNA-seq has substantially advanced knowledge of respiratory biology and disease. The opportunity for data sharing and meta-analysis of data from multiple studies is encouraging the formation of large-scale collaborations [112]. The influence of scRNA-seq on respiratory research will grow as the underlying technologies continue to improve along with experimental and computational approaches for more closely integrating scRNA-seq data with spatial, proteomic and epigenomic data.

Funding support was received from the ATS Foundation/Boehringer Ingelheim Pharmaceuticals Inc. Research Fellowship in IPF, the National Heart, Lung, and Blood Institute (HL071643, K08HL146943, T32HL076139), the National Institute on Aging (AG049665), the Parker Foundation (Parker B. Francis Fellowship) and the U.S. Department of Veterans Affairs (BX000201). Funding information for this article has been deposited with the Crossref Funder Registry.

Single-cell RNA-sequencing is able to resolve distinct individual cellular transcriptomes, in contrast to bulk RNA-sequencing. a) Whole lung tissue with distinct cellular components represented by blue circles, red squares and yellow triangles; b) the transcriptional output of bulk RNA-sequencing experiments results in an averaging of the individual cellular signals.
While some marker genes unique to cell types may indicate the presence of that cell type (protruding corners of red square and arcs of blue circle), the overall signal will be a mixture of the cell types (purple) and may obscure the transcriptomic signature of rare cell types (dashed triangle); c) in a single-cell experiment, every cell type (red square, blue circle, yellow triangle) is represented.

Typical workflow of a single-cell RNA-sequencing experiment and possible pitfalls. a) Whole-lung tissue with highly abundant cellular components (blue circle, red square) and a rare cell type (yellow triangle). b) A single-cell suspension is created through the mechanical and enzymatic disaggregation of the lung. Cell types that are fragile or difficult to liberate intact from the tissue (grey squares) may be underrepresented in the final dataset. In the lung, these cell types typically include alveolar type 1 epithelial cells and mesenchymal cells including fibroblasts. c) Individual cells are isolated into vessels (plate wells or droplets) with barcoded RNA primers and the cells are lysed to create a unique complementary DNA (cDNA) library for each cell. A number of errors can be introduced at this step, including the inclusion of two or more cells in a vessel, the inclusion of a cell fragment, the creation of a library from an empty vessel (*) or one containing ambient RNA (droplet with all colours), the inclusion of apoptotic cells or the induction of transcriptional changes as a result of processing (purple circle). d) cDNA libraries are sequenced and separated by barcode to recover the sequences belonging to each individual cell. Sequences are aligned to a reference genome and successful alignments with known gene sequences are counted. The resulting expression matrix is filtered by several quality criteria and used for downstream analyses.
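Steps c) and d) of the workflow above can be sketched in a few lines: aligned reads are grouped by cell barcode, counted per gene and then filtered by quality criteria. This is a minimal illustration under stated assumptions, not the pipeline used by any particular platform. Collapsing duplicate reads by unique molecular identifier (UMI), the minimum number of detected genes, the mitochondrial fraction cut-off and all barcodes and read records are hypothetical choices that the caption does not specify.

```python
from collections import defaultdict


def count_matrix(aligned_reads):
    """Turn (barcode, umi, gene) records into per-cell gene counts."""
    seen = set()
    counts = defaultdict(lambda: defaultdict(int))
    for barcode, umi, gene in aligned_reads:
        if gene is None:
            continue  # read did not align to a known gene
        if (barcode, umi, gene) in seen:
            continue  # amplification duplicate of an already counted molecule
        seen.add((barcode, umi, gene))
        counts[barcode][gene] += 1
    return {barcode: dict(genes) for barcode, genes in counts.items()}


def passes_qc(genes, min_genes=2, max_mito_fraction=0.5):
    """Keep cells with enough detected genes and few mitochondrial counts."""
    total = sum(genes.values())
    mito = sum(n for gene, n in genes.items() if gene.startswith("MT-"))
    return len(genes) >= min_genes and mito / total <= max_mito_fraction


# Hypothetical aligned reads: (cell barcode, UMI, gene or None if unaligned)
reads = [
    ("AAAC", "u1", "SFTPC"), ("AAAC", "u1", "SFTPC"),  # duplicate read
    ("AAAC", "u2", "SFTPC"), ("AAAC", "u3", "NKX2-1"),
    ("CCGT", "u1", "MT-CO1"), ("CCGT", "u2", "MT-ND1"), ("CCGT", "u3", "SFTPC"),
    ("GGTA", "u1", "SCGB1A1"), ("GGTA", "u2", None),
]

matrix = count_matrix(reads)
filtered = {bc: genes for bc, genes in matrix.items() if passes_qc(genes)}
print(filtered)
```

In this toy input, the barcode with mostly mitochondrial counts stands in for a damaged or apoptotic cell, and the barcode with a single detected gene stands in for an empty vessel or cell fragment. Real datasets need thresholds tuned to the tissue and platform.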
Initial sequencing and analysis of the human genome
The sequence of the human genome
Ten years of next-generation sequencing technology
Highly integrated single-base resolution maps of the epigenome in Arabidopsis
The transcriptional landscape of the yeast genome defined by RNA sequencing
Mapping and quantifying mammalian transcriptomes by RNA-Seq
mRNA-Seq whole-transcriptome analysis of a single cell
Exponential scaling of single-cell RNA-seq in the past decade
The single-cell transcriptional landscape of mammalian organogenesis
Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq
A revised airway epithelial hierarchy includes CFTR-expressing ionocytes
A single-cell atlas of the airway epithelium reveals the CFTR-rich pulmonary ionocyte
Single-cell RNA sequencing identifies diverse roles of epithelial cells in idiopathic pulmonary fibrosis
Single-cell transcriptomic analysis of human lung provides insights into the pathobiology of pulmonary fibrosis
A cellular census of human lungs identifies novel cell states in health and in asthma
Alveolar macrophages develop from fetal monocytes that differentiate into long-lived cells in the first week of life via GM-CSF
Fate mapping reveals origins and dynamics of monocytes and tissue macrophages under homeostasis
Mononuclear phagocytes of the intestine, the skin, and the lung
The development and function of lung-resident macrophages and dendritic cells
Yolk sac macrophages, fetal liver, and adult monocytes can colonize an empty niche and develop into functional tissue-resident macrophages
Monocyte-derived alveolar macrophages drive lung fibrosis and persist in the lung over the life span
A spatially restricted fibrotic niche in pulmonary fibrosis is sustained by M-CSF/M-CSFR signalling in monocyte-derived alveolar macrophages
Single-cell analysis reveals fibroblast heterogeneity and myofibroblasts in systemic sclerosis-associated interstitial lung disease
Single-cell RNA-sequencing reveals profibrotic roles of distinct epithelial and mesenchymal lineages in pulmonary fibrosis
Understanding the tumor immune microenvironment (TIME) for effective therapy
Global characterization of T cells in non-small-cell lung cancer by single-cell sequencing
Resident cellular components of the human lung: current knowledge and goals for research on cell phenotyping and function
Quantitative single-cell interactomes in normal and virus-infected mouse lungs
Experimental considerations for single-cell RNA sequencing approaches
Single-cell RNA-seq technologies and related computational data analysis
Tutorial: guidelines for the experimental design of single-cell RNA sequencing studies
Mapping the mouse cell atlas by microwell-seq
Seq-Well: portable, low-cost RNA sequencing of single cells at high throughput
Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets
Amplification of multiple genomic loci from single cells isolated by laser micro-dissection of tissues
A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications
The RIN: an RNA integrity number for assigning integrity values to RNA measurements
Quantitative single-cell RNA-seq with unique molecular identifiers
Multiplexed droplet single-cell RNA-sequencing using natural genetic variation
Cell hashing with barcoded antibodies enables multiplexing and doublet detection for single cell genomics
Single-cell transcriptome conservation in cryopreserved cells and tissues
DMSO cryopreservation is the method of choice to preserve cells for droplet-based single-cell RNA sequencing
Gene expression profiling of single cells from archival tissue with laser-capture microdissection and Smart-3SEQ
scRNA-seq assessment of the human lung, spleen, and esophagus tissue stability after cold preservation
Single-cell sequencing reveals dissociation-induced gene expression in tissue subpopulations
Comprehensive integration of single-cell data
SC3: consensus clustering of single-cell RNA-seq data
Single-cell mRNA quantification and differential analysis with census
Visualizing data using t-SNE
UMAP: uniform manifold approximation and projection for dimension reduction
scAlign: a tool for alignment, integration, and rare cell identification from scRNA-seq data
Fast, sensitive and accurate integration of single-cell data with Harmony
The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells
Single-cell trajectory detection uncovers progression and regulatory coordination in human B cell development
Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics
RNA velocity of single cells
Gene ontology: tool for the unification of biology. The Gene Ontology Consortium
Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles
CellPhoneDB v2.0: Inferring cell-cell communication from combined expression of multi-subunit receptor-ligand complexes
National Institutes of Health Office of Science Policy. NIH Genomic Data Sharing
Genomic Data Sharing Policy
Data Submission: EMBL-EBI
Reporting Standards and Availability of Data, Materials, Code and Protocols
RNA-sequencing from single nuclei
Using single nuclei for RNA-seq to capture the transcriptome of postmortem neurons
Imaging individual mRNA molecules using multiple singly labeled probes
Automated cell-type classification in intact tissues by single-cell molecular profiling
Multiplexed detection of RNA using MERFISH and branched DNA amplification
SCRINSHOT, a spatial method for single-cell resolution mapping of cell states in tissue sections
Three-dimensional intact-tissue sequencing of single-cell transcriptional states
Visualization and analysis of gene expression in tissue sections by spatial transcriptomics
Laser capture microscopy coupled with Smart-seq2 for precise spatial transcriptomic profiling
Spatial transcriptomic analysis of cryosectioned tissue samples with Geo-seq
Simultaneous epitope and transcriptome measurement in single cells
Multiplexed quantification of proteins and transcripts in single cells
Large-scale detection of antigen-specific T cells using peptide-MHC-I multimers labeled with DNA barcodes
Single-cell methylome landscapes of mouse embryonic stem cells and early embryos analyzed using reduced representation bisulfite sequencing
Single-cell genome-wide bisulfite sequencing for assessing epigenetic heterogeneity
Simultaneous profiling of transcriptome and DNA methylome from a single cell
Parallel single-cell sequencing links transcriptional and epigenetic heterogeneity
Single-cell triple omics sequencing reveals genetic, epigenetic, and transcriptomic heterogeneity in hepatocellular carcinomas
Single-cell ChIP-seq reveals cell subpopulations defined by chromatin state
Single-cell chromatin accessibility reveals principles of regulatory variation
Mass cytometry: technique for real time single cell multitarget immunoassay based on inductively coupled plasma time-of-flight mass spectrometry
Mass cytometry: single cells, many features
Integrating microarray-based spatial transcriptomics and single-cell RNA-seq reveals tissue architecture in pancreatic ductal adenocarcinomas
Reduced representation bisulfite sequencing for comparative high-resolution DNA methylation analysis
A practical guide to the measurement and analysis of DNA methylation
An efficient targeted nuclease strategy for high-resolution mapping of DNA binding sites
CyTOF supports efficient detection of immune cell subsets from small samples
Single-cell transcriptomics of a human kidney allograft biopsy specimen defines a diverse inflammatory response
Massively parallel single-nucleus RNA-seq with DroNc-seq
Advantages of single-nucleus over single-cell RNA sequencing of adult kidney: rare cell types and novel cell states revealed in fibrosis
Div-Seq: single-nucleus RNA-Seq reveals dynamics of rare adult newborn neurons
Neuronal subtypes and diversity revealed by single-nucleus RNA sequencing of the human brain
Nuclear RNA-seq of single neurons reveals molecular signatures of activation
Coming of age: ten years of next-generation sequencing technologies
Multiplexed single-cell RNA-seq via transient barcoding for simultaneous expression profiling of various drug perturbations
MULTI-seq: sample multiplexing for single-cell RNA sequencing using lipid-tagged indices
SoupX removes ambient RNA contamination from droplet based single cell RNA sequencing data
Decontamination of ambient RNA in single-cell RNA-seq with DecontX
A molecular cell atlas of the human lung from single cell RNA sequencing
Morphometric study of rat lung cells. I. Numerical and dimensional characteristics of parenchymal cell population
Cell number and cell characteristics of the normal human lung
Morphometric characteristics of cells in the alveolar region of mammalian lungs
European Respiratory Society: standards for quantitative assessment of lung structure
Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris
The Human Cell Atlas: from vision to reality
The Human Cell Atlas
The Human Cell Atlas white paper
The Human Lung Cell Atlas: a high-resolution reference map of the human lung in health and disease
SARS-CoV-2 entry factors are highly expressed in nasal epithelial cells together with innate immune genes
SARS-CoV-2 receptor ACE2 is an interferon-stimulated gene in human airway epithelial cells and is detected in specific cell subsets across tissues
Integrated analyses of single-cell atlases reveal age, gender, and smoking status associations with cell type-specific expression of mediators of SARS-CoV-2 viral entry and highlights inflammatory programs in putative target cells
Blue Journal Conference. Aging and susceptibility to lung disease
An atlas of the aging lung mapped by single cell transcriptomics and deep tissue proteomics
Identification of distinct tumor subpopulations in lung adenocarcinoma via single-cell RNA-seq
Single-cell RNA sequencing of lung adenocarcinoma reveals heterogeneity of immune response-related genes
Integration of single-cell RNA-seq data into population models to characterize cancer metabolism
Single-cell transcriptomics of human and mouse lung cancers reveals conserved myeloid populations across individuals and species
Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma
Single-cell RNA-seq enables comprehensive tumour and immune cell profiling in primary breast cancer
Single-cell map of diverse immune phenotypes in the breast tumor microenvironment
Cellular interactions in the pathogenesis of interstitial lung diseases
Single cell RNA-seq reveals ectopic and aberrant lung resident cell populations in idiopathic pulmonary fibrosis
Proliferating SPP1/MERTK-expressing macrophages in idiopathic pulmonary fibrosis
Longitudinal single cell transcriptomics reveals Krt8+ alveolar epithelial progenitors in lung regeneration
Cell origin dictates programming of resident versus recruited macrophages during acute lung injury
Single cell RNA sequencing identifies unique inflammatory airspace macrophage subsets
Single cell RNA sequencing identifies TGFβ as a key regenerative cue following LPS-induced lung injury
Single-cell RNA sequencing of the T helper cell response to house dust mites defines a distinct gene expression signature in airway Th2 cells
Transcriptomic profile of cystic fibrosis patients identifies type I interferon response and ribosomal stalk proteins as potential modifiers of disease severity
Transcriptomic profile of cystic fibrosis airway epithelial cells undergoing repair
Integrated genomics reveals convergent transcriptomic networks underlying chronic obstructive pulmonary disease and idiopathic pulmonary fibrosis
RNA-sequencing across three matched tissues reveals shared and tissue-specific gene expression and pathway signatures of COPD
RNA sequencing analysis detection of a novel pathway of endothelial dysfunction in pulmonary arterial hypertension
Distinct differences in gene expression patterns in pulmonary arteries of patients with chronic obstructive pulmonary disease and idiopathic pulmonary fibrosis with pulmonary hypertension
Classification of usual interstitial pneumonia in patients with interstitial lung disease: assessment of a machine learning approach using high-dimensional transcriptional data
Usual interstitial pneumonia can be detected in transbronchial biopsies using machine learning
Use of a molecular classifier to identify usual interstitial pneumonia in conventional transbronchial lung biopsy samples: a prospective validation study
Derivation of a bronchial genomic classifier for lung cancer in a prospective study of patients undergoing diagnostic bronchoscopy
A bronchial genomic classifier for the diagnostic evaluation of lung cancer
Shared gene expression alterations in nasal and bronchial epithelium for lung cancer detection