key: cord-0318520-forvxtn9 authors: Moses, Lambda; Pachter, Lior title: Museum of Spatial Transcriptomics date: 2021-05-25 journal: bioRxiv DOI: 10.1101/2021.05.11.443152 sha: c5577aa4942f5ba1c34e7ea15ffd2c2d6ca580e1 doc_id: 318520 cord_uid: forvxtn9 The function of many biological systems, such as embryos, liver lobules, intestinal villi, and tumors depends on the spatial organization of their cells. In the past decade high-throughput technologies have been developed to quantify gene expression in space, and computational methods have been developed that leverage spatial gene expression data to identify genes with spatial patterns and to delineate neighborhoods within tissues. To assess the ability and potential of spatial gene expression technologies to drive biological discovery, we present a curated database of literature on spatial transcriptomics dating back to 1987, along with a thorough analysis of trends in the field such as usage of experimental techniques, species, tissues studied and computational approaches used. Our analysis places current methods in historical context, and we derive insights about the field that can guide current research strategies. A companion supplement offers a more detailed look at the technologies and methods analyzed: https://pachterlab.github.io/LP_2021/. It has long been recognized that in biological systems ranging from the Drosophila embryo to the hepatic lobule, many genes need to be properly regulated in space for the system to function. In order to study the spatial patterns of gene expression, many different spatial transcriptomics methods, which produce spatially localized quantification of mRNA transcripts as proxies for gene expression, have been developed. Thanks to growing interest in the field, several reviews have been written in the past 5 years, providing overviews of experimental techniques for data collection [1, 2] , and describing how such techniques can be applied to specific biological systems, e.g. tumors [3] , brain [4] , and liver [5] . These reviews typically begin with either laser capture microdissection (LCM) or single encoding each gene with a combination of colors so transcripts of more genes than easily discernible colors (up to 5) can be quantified simultaneously. Combinatorial barcoding was first reported in immunological DNA FISH in 1989 [41] and was first used for transcripts in 2002 [47] . The first unequivocal demonstration of smFISH showing each mRNA molecule as a spot was reported in 1998 [45] . Highly multiplexed smFISH would not have been possible without the development of these technologies. (WM)ISH was the technology of choice in the late 1990s and the 2000s before the rise of highly multiplexed, high resolution, and more quantitative technologies, and has been used to create gene expression atlases in embryos of several species such as Drosophila melanogaster [25] , Mus musculus, and Gallus gallus [28] , in various mouse organs such as the brain [30] , genitourinary tract [34] , and lung [35] , and for specific types of genes such as miRNAs [29] ( Figure 1B ). For species other than mice and humans, organs other than the brain, and miR-NAs, the only spatial transcriptomics resources currently available are for the most part (WM)ISH atlases. Model organism databases collecting the proliferating gene expression patterns from various sources were also established in this period, such as gene expression database (GXD) [57] and Zebrafish Information Network (ZFIN) [58] ( Figure 1B) . The golden age of (WM)ISH seems to have ended in the 2010s ( Figure 1B) , perhaps due to some of the disadvantages of (WM)ISH, such as requiring stereotypical tissue structure, the need for thousands of animals to generate an atlas, and the largely qualitative nature of results. Early motivating applications for spatial transcriptomics included identification of genes with restricted patterns which indicated function in development, identification of novel cell type markers, and identification of novel cell types not evident from tissue morphology [14, 15] . In the 1980s and 1990s, analyses were typically done manually, although more recently automated methods have been developed (Supplementary Material Chapter 3). Recent developments in machine learning, coupled to more powerful computing infrastructure and the availability of more quantitative data, have opened up new possibilities. However, the legacy of the prequel era still lives on; current era studies still frequently reference prequel atlases [59] [60] [61] . . Developers of such technologies often seek to enable a trifecta of transcriptome wide profiling, single cell resolution, and high gene detection efficiency. While this achievement appears to be increasingly within reach, current era technologies are characterized by trade offs between these goals. Current era microdissection constitutes the compilation of spatial locations of samples during the course of microdissection, and the assaying of transcriptomes of samples by cDNA microarray or RNA-seq. Since 1999, by far the most widely used microdissection technology is LCM, which has been applied to various biological fields such as oncology, neuroscience, immunology, developmental biology, and botany (see Supplementary Material Chapter 6 for topic modeling of PubMed LCM literature). In UV LCM, sometimes also called laser microbeam microdissection (LMM) or laser microdissection (LMD), a UV laser ablates a narrow strand of tissue around the desired area, which is then collected into tubes by gravity (Leica LMD) or laser pressure catapult (LPC, as in Zeiss PALM microbeam) ( Figure 2B , C). In IR LCM (Arcturus PixCell II), the tissue section is mounted on a plastic membrane on tube caps. The IR laser briefly heats the desired area, melting the membrane so the tissue in the area is fused to the cap and captured (Figure 2A ). IR and UV can be combined; for example UV can be used to cut the tissue and IR to remove the desired area with the plastic membrane (recent versions of Arcturus). Advantages of LCM include transcriptome wide profiling, precise cuts informed by histology, compatibility with formalin fixed paraffin embedded (FFPE) tissues [62] , and the possibility of applying the method to single cells from both frozen [63] and FFPE sections [64] . LCM can also be applied to 3D tissues by microdissecting non-overlapping domains as in Geo-seq [65] . Disadvantages of LCM include difficulty to scale to larger number of samples, exclusion of whole mount samples, and potential RNA degradation [66] . While LCM is widely used to capture targeted areas according to histology, it can also cut tissue into an untargeted grid [67] . Other microdissection techniques generally fall into two categories: Mechanical (Supplementary Material 5.1.4) and optical (Supplementary Material 5.1.5). The former includes 2000s voxelation [68] (Figure 4) , and Tomo-seq [69] , which sections a tissue with a cryotome along an axis of interest, followed by RNAseq on each section. Optical microdissection includes GeoMX DSP [70] , which shines UV light on regions of interest (ROIs) to release photo-cleavable gene barcodes for quantification, and Niche-seq [71] , which uses fluorescence activated cell sorting (FACS) to isolate cells expressing photoactivated GFP in transgenic mice for scRNA-seq. Chronologically, the next technology developed in the current era is highly multiplexed single-molecule FISH (smFISH), which began with a 2012 prototype (seqFISH) that relied on super-resolution microscopy (SRM) to simultaneously profile 32 genes in yeast by hybridizing probes with different colors to transcripts, and then deducing relative locations of the colors present [72] . SRM is no longer needed; in 2014 seqFISH [51] was published, in which one color per gene is visualized per round of hybridization and the probes are stripped before the next round for the next color in the barcode. With 4 colors, 8 rounds of hybridization (4 8 = 65536) are more than enough to encode all genes in the human or mouse genome. In practice, an error correcting round of hybridization is performed, so that genes can still be distinguished if signal from one round of hybridization is missing [73] ( Figure 2D ). More recently in a version of seqFISH based on RNA SPOTS [74] , the "colors" themselves are one hot encoded by a sequence of hybridizations, expanding the palette to 20 "colors" per channel and enabling the profiling of 10,000 genes [75] . Another smFISH technique is multiplexed error-robust FISH MERFISH [52] , which uses a different barcoding strategy, in which each gene is encoded by a binary code. The color codes in each experiment must be separated by a Hamming distance (HD) of 4 to allow for correction of missing signal in one round, and by 2 to identify error without the facility for correcting it ( Figure 2D ). The length of barcodes can be increased to encode 10,000 genes [76] . As only the fluorophores are removed but the probes are not stripped, numerous rounds of hybridization in MERFISH are less time consuming than those in seqFISH. Most other smFISH based techniques, such as HybISS [77] and split-FISH [78] , use either seqFISH-like or MERFISH-like barcoding. Advantages of smFISH based techniques include high gene detection efficiency ( 95% for Hamming distance 4 MERFISH [79] compared to smFISH), single cell resolution, and subcellular transcript localization, which can be biologically relevant [80, 81] . Single round smFISH has nearly 100% detection efficiency [72] , and multiple rounds of hybridization tend to decrease the efficiency in part because barcodes with incorrigible errors are discarded. Disadvantages include requirement of pre-defined gene panel and probes, difficulty in probing shorter transcripts, lengthy imaging time, limited scalability to large area of tissues, possible challenges in cell segmentation, and the need to process terabytes of images. Challenges with smFISH have been addressed by various methods: Signal to noise ratio can be improved with rolling circle amplification (RCA) [77] , branched DNA (bDNA) [82] , hybridization chain reaction (HCR) [73] , and tissue clearing [83] . With increasing number of genes profiled, the transcript spots are increasing likely to overlap, causing optical crowding. This can be mitigated by expansion microscopy (ExM) [84] , only imaging a subset of probes at a time and using computational super-resolution [75] , imaging highly expressed genes without combinatorial barcoding [52] , and computationally resolving overlapping spots [85] . While smFISH based techniques are typically designed for frozen sections, SCRINSHOT is designed for FFPE sections [86] . ISS methods yield spatial transcriptome information by sequencing, typically by ligation (SBL), gene barcodes (targeted) or short fragments of cDNAs (untargeted) in situ. Such methods rely on ligase only joining two pieces of DNAa primer with known sequence and a probe -if they match the template, with non-matching probes washed away. The probes used are degenerate except for one or two query bases encoded by a color. The 2013 ISS [50] , later commercialized by Cartana, uses one query base per probe as in cPAL [87] to sequence gene barcodes ( Figure 2E ). FISSEQ [88] and a later adaptation with ExM called ExSeq [89] use SOLiD, which uses two query bases per probe to sequence circularized and RCA amplified cDNAs. In STARmap [90] , gene barcodes are sequenced by SEDAL, in which SOLiD-like two query bases are also used to reject error, but one base encoding can also be used [90] . BARseq also RCA amplifies probes with gene barcodes, but uses sequencing by synthesis (SBS) instead of SBL to sequence the barcodes [91] . Advantages of ISS include single cell resolution and subcellular transcript localization, as ISS displays each mRNA molecule as a spot. The ISS technologies mentioned above use RCA, which greatly amplifies signal, but misses many transcripts. This can be an advantage, as brighter and less crowded spots allow ISS to be applied to larger tissue areas such as whole mouse brain sections, with lower magnification (20x, while MERFISH uses 60x) [92, 93] . However, ISS techniques tend to have low detection efficiency. Whereas detection efficiency of scRNA-seq techniques is between 3% -25% [94] [95] [96] [97] [98] , the detection efficiencies of Cartana ISS and FISSEQ [99] are 5% and 1% respectively, with STARmap only marginally better. However, ExSeq claims up to 62% efficiency compared to smFISH, i.e. for genes of interest in the same cell type, ExSeq detects around 62% as many transcripts as smFISH. Untargeted ExSeq can be transcriptome wide, but the targeted ISS techniques have only been used for up to 1020 genes [90] though more typically fewer than 300 genes [89] , perhaps due to limited read length of SBL and challenges of imaging and image processing as in smFISH. Spatial locations of transcripts can also be preserved by capturing the transcripts from tissue sections on in situ arrays. Such arrays can be manufactured by printing spot barcodes, UMIs, and poly-T oligos on commercial microarray slides to capture polyadenylated transcripts, as in the ST and Visium technologies ( Figure 2G , H). They can also be Drop-seq-like beads [94] with split pool barcodes, UMIs, and poly-T oligos spread on slides in a single layer (e.g., Slide-Seq [100] ) or confined in wells etched on the slides (e.g., HDST [101] ), with bead barcodes subsequently located using in situ SBL. Alternatively, in DBiT-seq [102] , an array is generated by microfluidic channels, which are used to deposit one type of barcode in one direction, and then another in a perpendicular direction, with the orthogonal barcodes ligated so each spot can be identified with a unique pairwise combination. While array based techniques are typically designed for frozen sections and 3' end Illumina sequencing, Visium has recently been adapted to FFPE sections [103] and Nanopore long read sequencing [104] . Array based techniques have been applied to large areas of tissue [61] , and their use is increasing ( Figure 3A ). Nevertheless, they do not have single cell resolution, and the spot diameter of ST is 100 µm with spots 200 µm apart ( Figure 2G ). Visium, which is an improved version of ST released after 10X Genomics acquired ST, has spots in a hexagonal array with diameter 55 µm ( Figure 2H ). Bead diameter is 10 µm in Slide-seq, and 2 µm in HDST. Slideseq and HDST use bead size smaller than single cells, however they still don't provide single cell resolution because one bead can span two or more cells. Resolution of DBiT-seq is determined by channel width (either 50, 25, or 10 µm). More recently, the spot size can be reduced to below 1 µm, with RCA amplified DNA nanoballs as small as 0.22 µm across with spot barcodes deposited in wells 0.5 or 0.715 µm apart in Stereo-seq [105] , and in Seq-Scope polonies with spatial barcodes 0.5 µm in diameter on an Illumina flow cell re-purposed to capture transcripts from tissue sections [106] . Another polony based method PIXEL-seq achieves spot diameter of about 1.22 µm but unlike in the flow cell, PIXEL-seq does not have much spacing around each polony [107] . Array based techniques tend to have low detection efficiency. The efficiency of ST is estimated to be 6.9% compared to smFISH for select genes in the same tissue type, comparable to that of scRNA-seq. Visium's efficiency seems to be moderately higher than that of ST, and DBiT-seq's is even higher, at 15.5% compared to smFISH. Slide-seq and HDST are much less efficient. Efficiency of Slide-seq1 is only 2.7% of that of Drop-seq, while Slide-seq2 is on par with Drop-seq [108] . Efficiency of HDST is 1.3% per bead compared to smFISH. Efficiencies of the submicron techniques, in number of UMIs per unit area in the same tissue, might be comparable to that of Visium [107] . Some technologies have been developed to preserve information necessary to computationally reconstruct spatial gene expression patterns without imaging. One such technology is DNA microscopy [109] , which records proximity between cDNAs. This information can be used to reconstruct relative locations of transcripts. At the cellular level, gene expression in rare cell types can be reconstructed by deliberately assaying multiplets, and then mapping them to locations in a spatial reference based on gene expression of cells from common cell types attached to cells from the rare cell types [110, 111] . Variants of the term "spatial transcriptomics" have also been used to describe techniques localizing transcripts to organelles (e.g., APEX-seq [112] ), although no spatial coordinates are recorded. Another direction of development is multi-omics (Supplementary Material 4.6). Oligo-tagged antibodies are used to detect proteins of interest, and the oligonucleotide signifying the protein species can be detected with smFISH based methods. Such antibody panels have been combined with a variant of ST as SM-Omics [113] , GeoMX DSP [70] , and MERFISH [90] . A disadvantage of antibody panels is that the number of proteins profiled is limited to at most a few dozens. MERFISH [114] and seqFISH+ [115] have been adapted to visualize chromatin structure, by targeting DNA genomic loci [114] or introns of nascent transcripts [114, 115] . Multiplexed transcript quantification can also be combined with neuron projection tracing. For instance, cholera toxin subunit b (CTb) retrograde tracing has been used in conjunction with MERFISH to visualize axons [116] . Also, BARseq was originally designed to use ISS for axon tracing by sequencing neuron specific barcodes introduced by a virus injected into the brain, but was later adapted to sequence gene barcodes as well. The processing and analysis of high-throughput spatial transcriptomics data requires novel methods and tools, especially for problems such as image preprocessing, spatial reconstruction of scRNA-seq data, cell type deconvolution of array-based data, identification of spatially variable genes, and inference of cell-cell interactions. For smFISH and ISS based data, the raw data consists of images of fluorescent spots, which must be processed to identify transcript spots, match spots to genes, and assign spots to cells (Section 5.1). SmFISH and ISS studies often use classical image processing tools such as top-hat filtering to remove background, translation to align images from different rounds of hybridization, and watershed for cell segmentation [73, 79, 90] . Machine learning in Ilastik, deep learning packages like DeepCell [117] , and alternative tools incorporating scRNA-seq data [118] , can also be used for cell segmentation. However, without visualizing the plasma membrane, accuracy of cell segmentation is limited. Some analyses, such as identification of tissue regions, can be performed without cell segmentation [118] . Until 2019, image processing was typically performed with poorly documented and technique specific code written in the proprietary language MATLAB, but more recently such code is increasingly written in the open source language Python. The package starfish [119] was developed as an attempt to provide a unified and well-documented user interface to process images from different techniques such as seqFISH, MERFISH, and ISS, but it has not yet been widely adopted. Recently, improvements in scRNA-seq technology have inspired new methods for leveraging the complementary nature of high-resolution transcriptome quantification with spatial transcriptomics data. For smFISH and ISS data that is not transcriptome wide, expression patterns of genes not profiled in the spatial data can be imputed with scRNA-seq data, either by mapping dissociated scRNA-seq cells to the spatial reference or by directly imputing gene expression in space using expression profiles from scRNA-seq (Supplementary Material 7.3). Cells can be mapped to spatial locations on an existing spatial dataset with genes shared by the two datasets, with an ad hoc score favoring similarity between cell and location [60] or via optimal transport modeling [120] . While ad hoc scoring is simple to implement, the results tend to be qualitative. Alternatively, spatial locations of scRNA-seq cells can be reconstructed de novo, by optimal transport modeling, or projection into 2 or 3 dimensions corresponding to the spatial dimensions [121] . Optimal transport exploits spatial autocorrelation of gene expression to map cells to locations. However, spatial autocorrelation in tissues breaks down when different cell types are co-localized. When a spatial reference is not available, the de novo approach often does not yield results resembling the original tissue [121] . Gene expression in space can also be imputed from scRNA-seq without explicitly mapping scRNA-seq cells to locations. A common approach is to project the spatial and scRNA-seq data into a shared low dimensional and batch-free latent space, and to subsequently estimate gene expression by projecting the spatial cells into the latent space. Examples of this approach include Seurat3 [122] and gimVI [123] . These methods may also be used to add spatial context to single cell multi-omics data when spatial techniques for some of the multi-omics data are not available. In spatial data without single cell resolution, such as those derived from ST and Visium, scRNA-seq data can inform cell type composition of the spots or voxels (Supplementary Material 7.4). A common approach is to explicitly model observed gene expression at the spots as a weighted sum of mean gene expression of each cell type from scRNA-seq. Gene counts can then be modeled with negative binomial (NB) or Poisson distributions, and cell type proportions in each spot can then be estimated from the parameters of the model. Examples of such methods include stereoscope [124] , and RCTD [124] . Given the relevance of scRNA-seq to spatial data, popular scRNA-seq exploratory data analysis (EDA) ecosystems such as Seurat [122] , SCANPY (Squidpy) [125] , and SingleCellExperiment (SpatialExperiment) [126] have added functionalities for spatial data, such as updates to data containers and functions to facilitate visualization of gene expression and cell/spot metadata at spatial locations (Supplementary Material 7.2). EDA packages dedicated to spatial data with beautiful graphics and good documentation have also been written, such as Giotto [127] , STUtility [128] , and SPATA [129] . Seurat, Giotto, and SPATA also implement basic methods to identify spatially variable genes. In addition, Giotto implements methods to identify cell type enrichment in ST and Visium spots, identify gene coexpression and association between gene expression and cell type colocalization, and to identify spatial regions [130] . Spatially variable genes are genes whose expression is associated with spatial location (Supplementary Material 7.5). Two approaches are commonly used: Gaussian process regression (GPR) [131] and its generalization to Poisson [132] and NB [133] , and Laplacian score [134] . The former models normalized gene expression or the rate parameter of Poisson or NB gene expression as a GPR and finds whether the model better describes the data with the spatial term than without. To speed up computation, the cells or spots can be aggregated with self-organizing map before the GPR approach is applied [135] . The latter approach identifies genes whose expression better reflects the structure of a spatial neighborhood graph. The locations of cells can also be modeled as a spatial point process with gene expression as marks; spatially variable genes can be identified as marks associated with locations [136] . GPR based meth-ods approximate the p-values of genes from theory, while other methods use permutation testing, which makes them less scalable. However, the Gaussian kernel commonly used for GPR based methods doesn't account for anisotropy observed in tissues. Spatial information also enables identification of potential cell-cell interaction ( Supplementary Material 7.8) . This is commonly done with knowledge of ligand-receptor (L-R) pairs and testing of whether L-R pairs are more likely to be expressed in neighboring cells or spots [137] . Expression of genes of interest can also be modeled, including a term for cell-cell co-localization; the gene is considered associated with cell-cell co-localization if the model better describes the data with this term than without [138, 139] . There are many other types of analyses that are useful for spatial transcriptomics analysis, including identification of archetypal gene patterns The quality vs. quantity trade off inherent in existing technologies means that there is no single "best" solution currently available, and the difficulty in implementing methods has resulted in many technologies never spreading beyond their institutions of origin. LCM, ST, Visium, ISS, and Tomo-seq have been the most widely adopted ( Figure 3A) , and in almost all cases in the US and western Europe (Supplementary Figures 4.9 , 5.25, 5.31). In terms of tissues analyzed, multiplexed current era techniques have been used widely to characterize human tissues [140] , tumors [53] (especially breast tumors and squamous cell carcinoma), and pathological tissues that don't necessarily have a stereotypical structure [141] (Figure 3B , C). In the SARS-CoV-2 pandemic, GeoMX DSP has been used for spatial transcriptomic profiling in lung autopsy of COVID victims [142, 143] . Some of the processed data, and associated spatially variable genes, can be downloaded and visualized from SpatialDB [144] . Excluding LCM, the vast majority of current era studies were performed on either humans or mice ( Figure 3D ), and the brain is the most studied organ ( Figure 3B , E, F). In particular, the international project Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative -Cell Census Network (BICCN) is constructing a multi-modal atlas for the human, mouse, and non-human primate brain, including spatial data such as MERFISH and seqFISH [145] . While the smFISH techniques being utilized for this project can in principle scale to many genes, in practice they have for the most part been used for limited numbers of genes ( Figure 3G ) and cells ( Figure 3H ). All packages mentioned in the Data analysis section are open source and written in languages such as R, Python, and Julia. Downstream analyses in While technologies of the prequel are rapidly being deprecated, the ideas and methods that underlie them are fundamental to current era spatial transcriptomics. The field has dramatically expanded over the past 5 years ( Figure 4A) , with a plethora of new techniques and popularization of Visium driving growth ( Figure 4B, Supplementary Figures 4.8, 5.34 ). The popularity of these techniques may be attributable to applicability to diverse tissues and the availability of commercial kits and core facilities translating to less work and cost to set up instruments and train personnel. What lies ahead of the rising curves? First, more can be done to improve data collection techniques. For example, most current era techniques require tissue sections. Highly multiplexed whole mount smFISH and tissue clearing protocols, and more efficient computational tools for aligning multiple sections that may come from multiple individuals or even developmental stages, should be developed to extend current era techniques to 3D and to spatiotemporal analysis. Furthermore, smFISH and ISS techniques, with signal amplification to reduce the number of probes per transcript, can be adapted to target isoform specific exons or untranslated regions rather than all transcripts of a gene. Second, current era data has not yet been integrated into comprehensive databases. Prequel databases such as GXD, e-Mouse Atlas and Gene Expression (EMAGE) [148] , and FlyExpress [149] include data from multiple sources and can be queried by gene symbol and developmental and spatial ontologies. In addition, ABA [30] , EMAGE, and FlyExpress aligned ISH images to common coordinates and can be queried with expression patterns. While some current era authors provide online interactive visualization of datasets from their studies [61] , comprehensive databases integrating data from multiple sources as in the prequel era have not yet been developed. Furthermore, while prequel ontologies are still used in current era studies, such ontologies may be improved with the transcriptome wide quantitative data from the current era. Third, outside of LCM, the current era is highly focused on the brain in human and mice, with potential spatial transcriptomics investigations of other organs such as the liver and the leaf lagging behind. Technological modernization of prequel consortia for organisms other than human and mice, and for organs other than the brain, holds much promise for the development of useful spatial transcriptomics atlases. Fourth, an open source, well-documented, interoperable, and scalable workflow with an integrated easy-to-use interface would greatly simplify spatial transcriptomics data collection and analysis. At present, for tasks beyond EDA, users still often need to learn new syntax, convert object types, and even learn new languages to use some data analysis tools. Finally, our survey of methods shows that spatial transcriptomics methods need to be more open and accessible so that they become adopted around the world, and are not restricted to Western elite institutions. The criterion for including studies in the database was that the study used a method for quantification of transcripts while recording spatial context of samples within a tissue or cell. Methods were required to be able to quantify more transcripts and genes than possible with one round of FISH or immunofluorescence. Reviews and protocols were excluded. For publications on data analysis methods, the criterion for inclusion was that the proposed method went beyond the use of existing packages, and demonstrated a novel methodological approach or technique on a spatial transcriptomic dataset. In addition to collating publications through extensive reading and citation searching, keywords such as "spatial transcriptomics", "digital spatial profiling", "in situ sequencing", as well as technology specific terms such as "visium", "seqfish", and "merfish", were searched on PubMed and bioRxiv. The search results, as well as related papers and other papers citing the publication of interest on PubMed were manually screened and entered into the database. The sharing of the database is intended to foster crowd sourcing of literature in the future. All analyses were performed with R version 4.0.4. The timelines were generated with the R package ggtext [150] . The tissue maps in Figure 3B -C, E-F were generated with the R package gganatogram [151] . All trend lines were estimated by linear regression, and significance of the slope for each was computed by a t-test. The pictograms in Figure 3I -J were generated with the R package ggtextures [152] . In the supplement, the medium resolution world map was generated with the R package rnaturalearth using the Robinson projection option. Cities and institutions were geocoded with the Google geocoding API. For LCM text mining, PubMed abstracts and metadata were downloaded by searching for the expression "((laser capture microdissection) OR (laser microdissection)) AND ((microarray) OR (transcriptome) OR (RNA-seq))" with the PubMed API, with R package easyPubMed [153] , BioRxiv abstracts were downloaded by web scraping with Python package biorxiv-retriever [154] and metadata was downloaded with the bioRxiv API. The abstracts were tokenized into unigrams while preserving common phrases. Stopwords and punctuations were removed and the words were stemmed. With token counts in each abstract, the R package stm [155] was used for topic modeling of LCM abstracts from PubMed and bioRxiv. The date of publication, city of first authors (PubMed) or corresponding authors (bioRxiv), and journal (including bioRxiv) were used as covariates to model topic prevalence. Due to the large number of cities and journals, cities and journals with fewer than 5 publications were grouped into "Other" prior to modeling. The number of topics was chosen based on a trade off between hold out likelihood and residual, and between topic exclusivity and semantic coherence; 50 topics were used. Global vector (GloVe) embedding [156] of words from the abstracts was performed with the R package text2vec [157] , using 125 dimensions. The word embeddings were then Louvain clustered [158] with the R package igraph [159] and projected to lower dimensions for visualization with principal component analysis (PCA) using the R function prcomp, and uniform manifold approximation and projection (UMAP) [160] with the R package uwot [161] . The paper and supplement can be continuously updated to reflect the contents of the database. All code for generating the figures is available at https://github.com/pachterlab/LP_2021, and the version as of submission is archived at https://doi.org/10.5281/zenodo.4795375. The curated database can be accessed at https://docs.google.com/spreadsheets/d/1sJDb9B7AtYmfKv4-m8XR7uc3XXw_ k4kGSout8cqZ8bY/. Uncovering an Organ's Molecular Architecture at Single-Cell Resolution by Spatially Resolved Transcriptomics Spatially Resolved Transcriptomes-Next Generation Tools for Tissue Exploration The Spatial and Genomic Hierarchy of Tumor Ecosystems Revealed by Single-Cell Technologies The promise of spatial transcriptomics for neuroscience in the era of molecular cell typing Single-cell genomics and spatial transcriptomics: Discovery of novel cell states and cellular interactions in liver physiology and disease biology Formation and Detecction of RNA-DNA Hybrid Molecules in Cytological Preparations RNA-DNA hybrids at the cytological level Localisation of cellular globin messenger RNA by in situ hybridisation to complementary DNA Immunological method for mapping genes on Drosophila polytene chromosomes High resolution detection of DNA-RNA hybrids in situ by indirect immunofluorescence A non-radioactive in situ hybridization method for the localization of specific RNAs in Drosophila embryos reveals translational control of the segmentation gene hunchback Whole-mount in situ hybridization in the mouse embryo: gene expression in three dimensions Actin gene expression visualized in chicken muscle tissue culture by using in situ hybridization with a biotinated nucleotide analog Detection in situ of genomic regulatory elements in Drosophila Mouse embryonic stem cells and reporter constructs to detect developmentally regulated genes Promoter trapping' in Caenorhabditis elegans tech. rep Soma-germline asymmetry in the distributions of embryonic RNAs in Caenorhabditis elegans Efficient Isolation of Novel Mouse Genes Differentially Expressed in Early Postimplantation Embryos Gene expression screening in Xenopus identifies molecular pathways, predicts gene function and provides a global view of embryonic patterning GXD: a Gene Expression Database for the laboratory mouse MAGEST: MAboya Gene Expression patterns and Sequence Tags Large-scale analysis of gene function in Caenorhabditis elegans by high-throughput RNAi Gene expression profiles in Ciona intesti-nalis tailbud embryos A transcriptome atlas of the mouse brain at cellular resolution Systematic determination of patterns of gene expression during Drosophila embryogenesis MEPD: a Medaka gene expression pattern database The Zebrafish Information Network (ZFIN): the zebrafish model organism database GEISHA, a whole-mount in situ hybridization gene expression screen in chicken embryos MicroRNA Expression in Zebrafish Embryonic Development Genome-wide atlas of gene expression in the adult mouse brain Three-dimensional morphology and gene expression in the Drosophila blastoderm at cellular resolution I: Data acquisition pipeline Global Analysis of mRNA Localization Reveals a Prominent Role in Organizing Cellular Architecture and Function Xenbase: a Xenopus biology and genomics resource The GUDMAP database -an online resource for genitourinary research The Molecular Atlas of Lung Development Program ZEBrA: Zebra finch Expression Brain Atlas-A resource for comparative molecular neuroanatomy and brain evolution studies The laser in the Lowry technique for microdissection of freeze-dried tissue slices A ligase-mediated gene detection technique PCR-based cDNA library construction: general cDNA libraries at the level of a few cells Amplified RNA synthesized from limited quantities of heterogeneous cDNA Multiple fluorescence in situ hybridization Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray Laser Capture Microdissection Single-cell mutation analysis of tumors from stained histologic slides Visualization of Single RNA Transcripts in Situ Gene expression profiles of laser-captured adjacent neuronal subtypes Single-Cell Gene Expression Profiling Highly Integrated Single-Base Resolution Maps of the Epigenome in Arabidopsis Transcriptome Tomography for Brain Analysis in the Web-Accessible Anatomical Space In situ sequencing for RNA analysis in preserved tissue and cells Singlecell in situ RNA profiling by sequential hybridization Mar Spatially resolved, highly multiplexed RNA profiling in single cells Visualization and analysis of gene expression in tissue sections by spatial transcriptomics Long walk to genomics: History and current approaches to genome sequencing and assembly A GAL4-Driver Line Resource for Drosophila Neurobiology A gene expression atlas of the central nervous system based on bacterial artificial chromosomes A database for mouse development The Zebrafish Model Organism Database: New support for human disease models, mutation details, gene expression phenotypes and searching Spatial reconstruction of single-cell gene expression data The Drosophila embryo at single-cell transcriptome resolution Molecular atlas of the adult mouse brain. Science Advances 6, eabb3446. issn: 2375-2548 Identification of mRNAs and lincRNAs associated with lung cancer progression using next-generation RNA sequencing from laser micro-dissected archival FFPE tissue specimens Laser capture microscopy coupled with Smart-seq2 for precise spatial transcriptomic profiling Gene expression profiling of single cells from archival tissue with laser-capture microdissection and Smart-3SEQ Spatial Transcriptome for the Molecular Annotation of Lineage Fates and Cell Identity in Mid-gastrula Mouse Embryo Combining laser capture microdissection with quantitative real-time PCR: Effects of tissue manipulation on RNA quality and gene expression Topographical transcriptome mapping of the mouse medial ganglionic eminence by spatially resolved RNA-seq Multiplex Three-Dimensional Brain Gene Expression Mapping in a Mouse Model of Parkinson's Disease Genome-wide RNA Tomography in the Zebrafish Embryo High multiplex, digital spatial profiling of proteins and RNA in fixed tissue using genomic detection methods Spatial reconstruction of immune niches by combining photoactivatable reporters and scRNA-seq Single-cell systems biology by super-resolution imaging and combinatorial labeling Situ Transcription Profiling of Single Cells Reveals Spatial Organization of Cells in the Mouse Hippocampus Profiling the transcriptome with RNA SPOTs Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH+ Spatial transcriptome profiling by MERFISH reveals subcellular RNA compartmentalization and cell cycle-dependent gene expression Hybridization-based In Situ Sequencing (HybISS): spatial transcriptomic detection in human and mouse brain tissue. bioRxiv Highly specific multiplexed RNA imaging in tissues with split-FISH A computational framework to study sub-cellular RNA localization Localization and abundance analysis of human lncR-NAs at single-cell and single-molecule resolution Image-based transcriptomics in thousands of single human cells at single-molecule resolution Three-dimensional intact-tissue sequencing of single-cell transcriptional states Expansion microscopy Dense transcript profiling in single cells by image correlation decoding SCRINSHOT, a spatial method for single-cell resolution mapping of cell states in tissue sections Accurate Multiplex Polony Sequencing of an Evolved Bacterial Genome Highly Multiplexed Subcellular RNA Sequencing in Situ Expansion Sequencing: Spatially Precise In Situ