key: cord-0016657-qdagoqnz
authors: Swanson, Elliott; Lord, Cara; Reading, Julian; Heubeck, Alexander T; Genge, Palak C; Thomson, Zachary; Weiss, Morgan DA; Li, Xiao-jun; Savage, Adam K; Green, Richard R; Torgerson, Troy R; Bumol, Thomas F; Graybuck, Lucas T; Skene, Peter J
title: Simultaneous trimodal single-cell measurement of transcripts, epitopes, and chromatin accessibility using TEA-seq
date: 2021-04-09
journal: nan
DOI: 10.7554/elife.63632
sha: 1a9514050a7066a8543507ccef7bfa38b0d32676
doc_id: 16657
cord_uid: qdagoqnz

Single-cell measurements of cellular characteristics have been instrumental in understanding the heterogeneous pathways that drive differentiation, cellular responses to signals, and human disease. Recent advances have allowed paired capture of protein abundance and transcriptomic state, but a lack of epigenetic information in these assays has left a missing link to gene regulation. Using the heterogeneous mixture of cells in human peripheral blood as a test case, we developed a novel scATAC-seq workflow that increases signal-to-noise and allows paired measurement of cell surface markers and chromatin accessibility: integrated cellular indexing of chromatin landscape and epitopes, called ICICLE-seq. We extended this approach using a droplet-based multiomics platform to develop a trimodal assay that simultaneously measures transcriptomics (scRNA-seq), epitopes, and chromatin accessibility (scATAC-seq) from thousands of single cells, which we term TEA-seq. Together, these multimodal single-cell assays provide a novel toolkit to identify type-specific gene regulation and expression grounded in phenotypically defined cell types.

Peripheral blood mononuclear cells (PBMCs) purified using gradient centrifugation are a major source of clinically relevant cells for the study of human immune health and disease (Bö yum, 1968) . Like most other human tissues, PBMCs are a complex, heterogeneous mixture of cell types derived from common stem cell progenitors (Laurenti and Gö ttgens, 2018) . Despite the genome being mostly invariant between different PBMC cell types, each immune cell type performs an important and distinct function. Understanding the genomic regulatory landscape that controls lineage specification, cellular maturation, activation state, and functional diversity in response to intra-and extracellular signals is key to understanding the immune system in both health and disease (Satpathy et al., 2019; Wang et al., 2020; Zheng et al., 2020) .

Recent improvements in single-cell genomic methods have enabled profiling of the regulatory chromatin landscape of complex cell-type mixtures. In particular, droplet-based single-nucleus or single-cell assays for transposase-accessible chromatin (snATAC-seq, scATAC-seq, dscATAC-seq, mtscATAC-seq) allow profiling of open chromatin at single-cell resolution (Buenrostro et al., 2015; Lareau et al., 2019) . Promising new methods have combined scATAC-seq with simultaneous measurement of nuclear mRNAs (e.g., sci-CAR, Cao et al., 2018; SNARE-seq, Chen et al., 2019; SHARE-seq, Ma et al., 2020) or in combination with cell surface epitopes (ASAP-seq, Mimitou et al., 2020) . However, a unifying approach for all three modalities that can be applied to highly specified functional immune cell types has yet to emerge in the landscape of single-cell methods. We systematically tested whole cell and nuclear purification and preparation methods for PBMCs to overcome limitations that restricted previous assays to measurement of only nuclear components (ATAC and nuclear RNAs) or proteins on the cell surface. We found that intact, permeabilized cells perform extremely well for scATAC-seq, exceeding conventional scATAC-seq on nuclei by some measures (Figure 1b) . This insight enables a new protocol analogous to Cellular Indexing of Transcriptomes and Epitopes (CITE-seq; Stoeckius et al., 2017) to measure both surface protein abundance and chromatin accessibility: integrated cellular indexing of chromatin landscape and epitopes (ICICLE-seq, Figures 1a and 3) . Finally, we demonstrate that our optimized permeable cell approach can be combined with a droplet-based multiomics platform to enable the simultaneous measurement of three different molecular compartments of the cell: mRNA (by scRNA-seq), protein (using oligo-tagged antibodies), and DNA (by scATAC-seq), which we term TEA-seq after Transcription, Epitopes, and Accessibility ( Figure 4) . These assays enable a new, more unified view into the molecular underpinnings of gene regulation and expression at the single-cell level.

Optimization of single-nucleus and single-cell ATAC-seq of PBMCs As a baseline for optimization and cell surface retention, we performed snATAC-seq as recommended by 10x Genomics, with a protocol based on the Omni-ATAC workflow (Corces et al., 2017) . This single-nucleus assay utilizes a combination of hypotonic lysis, detergents, and a saponin to isolate nuclei without retaining mitochondrial DNA. After performing snATAC-seq using this method, sequencing, and tabulating data quality metrics (Materials and methods), we identified two major populations of cell barcodes (Figure 1b , left panel): (1) a large number of barcodes, shown in gray, that have a low number of unique fragments and a low fraction of reads in peaks (FRIP). These barcodes contain little useful information but consume 80% of total sequenced reads (Figure 1e , non-cell barcodes) at a sequencing depth of 200 million reads per library (20,000 reads per expected barcode). (2) Barcodes with higher quality as measured by FRIP (red points) that contain enough information to attempt downstream analysis.

The loss of 80% of sequenced reads to non-cell barcodes is costly. Previous studies of scRNA-seq data have shown that cellular lysis can release ambient RNA that increases the abundance of lowquality barcodes and contaminates droplets, yielding barcodes with both cellular and ambient RNAs that reduces the accuracy of the transcriptional readout (Marquina-Sanchez et al., 2020) . We reasoned that nuclear isolation protocols may cause the release of ambient DNA, causing a similar effect in scATAC-seq datasets. Optimization of nuclear lysis protocols, especially changing to less stringent detergents, provided increased FRIP and decreased non-cell barcodes (Figure 1-figure supplement 1, Figure 1 -figure supplement 2a). Hypotonic lysis conditions used in these protocols may also be a biophysical stressor to the native chromatin state, as previously observed (Lima et al., 2018) . To reduce perturbation of chromatin and retain the cell surface for multimodal assays, we performed cell membrane permeabilization under isotonic conditions to allow access to the nuclear DNA without isolating nuclei through hypotonic lysis. The saponin digitonin was used to cause concentration-dependent selective permeabilization of cholesterol-containing membranes while leaving inner mitochondrial membranes intact, preventing high levels of Tn5 transposition in mitochondrial DNA (Adam et al., 1990; Colbeau et al., 1971) . Digitonin has previously been used for ATAC-seq assays under hypotonic conditions in Fast-ATAC (Corces et al., 2016) and plate-based scATAC-seq (Chen et al., 2018b) 

We observed that PBMCs purified by leukapheresis rather than Ficoll gradient centrifugation had consistently higher FRIP scores and fewer non-cell barcodes (Figure 1-figure supplement 2c) . A major difference between Ficoll-purified PBMCs and leukapheresis-purified PBMCs was the presence of residual neutrophils in Ficoll-purified samples. We tested removal of dead cells and debris with Figure 1 . Improvements to scATAC-seq methods to enable permeabilized cell profiling. (a) Schematic overview of major steps in snATAC, scATAC, and ICICLE-seq methods. (b) Comparison of quality control characteristics of ATAC-seq libraries generated from nuclei isolation and permeabilized cells, with and without fluorescence-activated cell sorting. Top panels show signal-to-noise as assessed by fraction of reads in peaks on the y-axis and quantity of unique fragments per cell barcode on the x-axis. Lower panels display fragment length distributions obtained from paired-end sequencing of ATAC libraries. Colored lines represent barcodes that pass QC filters; gray lines represent barcodes failing QC (non-cell barcodes). All libraries were equally downsampled to 200 million total sequenced reads for comparison. Colors in (b) are reused in remaining panels. (c) Total coverage of Tn5 footprints summed across all transcription start sites (TSS). Tn5 footprints are 29 bp regions comprising the 9 bp target-site duplication (TSD) and 10 bp on either side, which represent accessible chromatin for each transposition event. (d) Total coverage of TSD centers summed over a set of 100,000 genomic CTCF motifs found in previously published DNase-hypersensitive sites (Meuleman et al., 2020) . TSD centers are obtained by shifting +4 and 

We assessed the quantitative and qualitative differences between nuclei and permeabilized cell protocols with and without sorting by performing both protocols on a single set of input cells. To fairly compare data quality across methods, we equally downsampled raw data at the level of raw sequenced reads per well for each sample that we compare directly (see Materials and methods for details). We then utilized a uniform set of transcription start site (TSS) regions (TSS ± 2 kb) and a previously published PBMC peak set (Lareau et al., 2019 , Materials and methods) as reference regions for computation of fraction of reads in TSS (FRITSS) and FRIP, respectively. Permeabilized cells yielded many more high-quality cell barcodes than nuclear preps using equal loading of cells or nuclei (15,000 loaded, expected 10,000 captured, Table 1 ). scATAC-seq libraries prepared from nuclei contained many more reads originating from nucleosomal DNA fragments (Figure 1b , lower panels), and non-cell barcodes from nuclei (gray lines) contained more of these fragments than cell barcodes. Thus, an overabundance of mononucleosomal fragments may indicate non-cell fragment contamination. Libraries from permeabilized cells consisted almost entirely of short fragments, suggesting that permeabilization under isotonic conditions did not loosen or release native chromatin structure at the time of tagmentation (Figure 1b , lower panels). Previous bulk ATAC-seq studies have shown that differing nuclear isolation protocols lead to varying amounts of mononucleosomal fragments (Li et al., 2019a) . In agreement with in vitro experiments studying the effects of low salt on nucleosomal arrays (Allahverdi et al., 2015) , this further suggests that hypotonic lysis leads to alteration of chromatin structure, raising the possibility of artifactual measurements of accessibility in nuclei-based ATAC-seq. To assess the effect that this difference has on the data obtained by each Figure 1 continued À5 bp from the 5' and 3' ends of uniquely aligned fragments, respectively. (e) Barplot representations of the fraction of total aligned reads in various QC categories. Fragments overlapping a previously published peak set for peripheral blood mononuclear cell dscATAC-seq (Lareau et al., 2019) are in the 'Overlap Peaks' category. Unique fragments are the remaining uniquely aligned fragments that do not overlap peak regions. 'Waste' reads were not aligned or were assigned to cell barcodes with fewer than 1000 total reads. (f) Violin plots showing distributions of QC metrics. Median (wide bar) and 25th and 75th quantiles (whiskers and narrow bars) are overlaid on violin plots. Median values are also in Table 1 . Note that the y-axis of the first panel is on a logarithmic scale; remaining panels are linear. snATAC: single-nucleus assays for transposase-accessible chromatin; scATAC: single-cell assays for transposase-accessible chromatin; ICICLE-seq: integrated cellular indexing of chromatin landscape and epitopes. The online version of this article includes the following source data and figure supplement(s) for figure 1:

Source data 1. Single cell metadata and QC metrics for scATAC-seq experiments . Source data 2. Fragment size distribution data . Source data 3. TSS Footprint pileups. Source data 4. CTCF Tn5 target site duplication center pileups. Source data 5. Fraction of reads per alignment category. Figure supplement 1. Nuclei were isolated from peripheral blood mononuclear cells using two buffer compositions and varying detergent concentrations. Figure supplement 2 . Quality control plots for single-nucleus assays for transposase-accessible chromatin (snATAC-seq) and single-cell assays for transposase-accessible chromatin ( scATAC-seq) experimental conditions. method, we overlaid Tn5 footprints near TSS ( Figure 1c ) and CTCF transcription factor binding sites (TFBS, Figure 1d ). The signal at TSS was retained in permeabilized cells, but positions flanking the TSS (occupied by neighboring nucleosomes) had reduced signal compared to isolated nuclei (examined in detail in Figure 1 -figure supplement 6). At CTCF motifs, we observed nearly identical patterns of accessibility in both nuclei and permeabilized cells, suggesting that scATAC-seq signal at regulatory TFBS is retained in permeabilized cells. Neutrophil and dead cell removal improved the quality of nuclear scATAC-seq libraries, which yielded the highest number of unique fragments, reads in TSS, and reads in peaks, though at the cost of fewer high-quality barcodes captured when compared to permeabilized cells (Table 1) . Overall, permeabilized intact cells obtained by FACS had the highest FRIP and FRITSS scores, fewest non-cell barcodes, and greatest cell capture efficiency with only a modest increase in mitochondrial reads (Table 1 , Figure 1e , f).

We next examined the effect of methodological differences on downstream biological analyses (Figure 2) . Removal of neutrophils greatly improved the ability to separate various cell types in uniform manifold approximation and projection (UMAP) projections of both nuclei and cells (Figure 2a, b) .

To provide ground truth for label transfer, we performed flow cytometry on an aliquot of the same PBMC sample used for scATAC-seq above. A panel of 25 antibodies (Supplementary file 3) was used to determine the proportion of each of the 12 cell types used to label the scATAC-seq cells in the PBMC sample (Figure 2-figure supplement 1 and Supplementary file 4). Label transfer was performed using the ArchR package to generate gene scores that enabled label transfer from a reference scRNA-seq dataset using the method provided in the Seurat package ) (Materials and methods). Using these tools, removal of neutrophils improved label transfer scores, and permeabilized cells yielded more cells with high label transfer scores than nuclei-based approaches (Figure 2b , c). In addition, permeabilized cells provided labels most similar to the cell-type proportions identified by flow cytometry (Figure 2d) , with identification of CD8 effector cells only observed in scATAC-seq with permeabilized cells. All methods yielded fewer CD16+ monocytes than observed by flow cytometry, suggesting that CD16+ monocytes may be lost during scATAC-seq using either nuclei or permeabilized cells, or that label transfer methods were not conducive to identifying this cell type ( Figure 2d ). After labeling cell types, we used ArchR to call peaks for each cell type and perform pairwise tests of differential accessibility between each pair of cell types ( Figure 2-figure supplement 2a) . We found many more differentially accessible sites in both cells and nuclei after removal of neutrophils. Differential accessibility was also used to identify differentially enriched TFBS motifs in each cell type ( Figure 2-figure supplement 2b) . Without neutrophil removal (nuclei unsorted, top panel), we were unable to identify significantly enriched motifs in B cells and NK cells that were readily apparent in data from clean nuclei or permeabilized cells (bottom two panels). Together, these results demonstrate that neutrophil removal and the use of permeabilized cells allow for identification of specific cell types and TFBS motifs that are involved in regulation of gene expression. For all metrics to the right of median N fragments, both the absolute number and a percentage are provided. Median % mitochondrial and median % unique were calculated as a fraction of total fragments; % in TSS and % in peaks were calculated as a fraction of unique fragments. TSS: transcription start sites; FACS: fluorescence-activated cell sorting.

Under standard scATAC-seq protocols, removal of the cell membrane severs the connection between the cell surface and the chromatin state of cells. To test our ability to measure cell surface proteins and chromatin state simultaneously on permeabilized cells, we modified our optimized permeabilized cell scATAC-seq methodology to incorporate measurements using commercially available barcoded antibody reagents, and we term this new method ICICLE-seq . Antibody-derived tags (ADTs) from oligo-antibody conjugates and ATAC-seq fragments can then be selectively amplified by PCR to generate separate libraries for sequencing ( Figure 3-figure supplement 1) . Due to the nature of fragment capture in this system, we obtain both a cell barcode and a single-end scATAC-seq read from the two ends of the paired-end sequencing reaction. We performed ICICLE-seq on a leukapheresis-purified PBMC sample using a 46-antibody panel (Supplementary file 6) and were able to obtain 10,227 single cells with both scATAC-seq and ADT data from three capture wells that passed adjusted QC criteria: >500 unique ATAC fragments (median = 761), FRIP >0.65 (median = 0.725). Cells passing ATAC QC had a median of 3871 ADT unique molecule indexes (UMIs) per cell (Figure 3-figure supplement 2b) . UMAP projection and ATAC label transfer on ICICLE-seq data had resolution similar to scATAC-seq on intact permeabilized cells after dead cell and debris removal ( Figure 3b ). However, the data quality was limited by the poly-A-based capture method and the single-end readout of transposed fragments. Nonetheless, the ICICLE-seq results demonstrated that permeabilized cells enabled simultaneous capture of chromatin accessibility in the nucleus and highquality cell surface epitope quantification. We were able to leverage the additional ADT data to Max. cluster median 1.7 10 2 1.3 10 3 6.7 10 2 1.8 10 2 3.0 10 2 3.0 10 0 9.0 10 0 3.4 10 1 2.1 10 2 1.0 10 3 1.3 10 2 10 2 4.7 10 2 1.8 10 2 1.0 10 2 1.4 10 1 9.3 10 1 2.7 10 1 1.9 10 2 1.2 10 3 10 3 2.2 10 3 2.3 10 2 10 2 10 1 1.2 10 2 1.3 10 3 8.0 10 1 1.2 10 3 1.4 10 1 10 1 3.0 10 2 8.7 10 2 1.2 10 2 1.4 10 2 10 1 1.8 10 2 10 1 8. 0 Source data 1. Cell type labels and UMAP coordinates for ICICLE-seq cells. Source data 2. ADT count data for ICICLE-seq. Trimodal measurement of transcripts, epitopes, and accessibility with TEA-seq With the release of a commercially available platform for simultaneous capture of RNA-seq and ATAC-seq from single nuclei, we reasoned that permeabilized cells could be used to perform simultaneous capture of three major molecular compartments: DNA can be captured using scATAC-seq, RNA could be captured using scRNA-seq, and protein epitope abundance can be captured using polyadenylated antibody barcodes, which we term TEA-seq after Transcripts, Epitopes, and Accessibility. After trials and optimization of key steps, we were able to obtain libraries on the 10x Genomics Multiome ATAC plus Gene Expression platform that combined all three of these measurements for thousands of single cells ( Figure 4a ) using a panel of 46 oligo-tagged antibodies (Supp Table 6 ). After initial data processing for cells loaded into four wells, we identified 29,264 cell barcodes that passed the QC criteria for scATAC-seq described above and had >2,500,500 unique ATAC-seq fragments (median = 8762 unique ATAC fragments, Figure 4b , d) and had >500 genes detected by scRNA-seq (median = 2399 RNA UMIs; median = 1,249,249 genes detected, Figure 4b ), and >500 ADT UMIs (median = 1171 ADT UMIs, Figure 4c ). Additional QC metrics are provided in Figure 4 figure supplement 1.

With the addition of scRNA-seq data, we were able to perform cell-type label transfer from RNA to RNA using Seurat label transfer rather than cross-modality transfer from RNA to ATAC (Materials and methods). For each modality, we were able to perform dimensionality reduction and UMAP projection to get separate views of the relationships between cells (Figure 4e) , as for ICICLE-seq, above (Figure 3a, b) . However, these views do not leverage the strength of linked multimodal measurements. To begin to take advantage of our tri-modal measurements, we extended a method recently described by Hao and Hao et al. for paired weighted nearest-neighbors (WNN) analysis to allow an arbitrary number of simultaneously measured modalities to contribute to the WNN network (Materials and methods). After applying this three-way WNN, we could generate a UMAP embedding with contributions from all three simultaneously measured modalities (Figure 4f ), which enhanced cell-type separation. We found that marker expression is consistent across all three modalities for some markers, such as CD8A, though we found that consistency across all modalities is not a universal characteristic of functional markers ( 

Identification of putative cis-regulatory modules (CRMs) that control the functional state of cells is a distinct advantage of directly linked measurements of chromatin accessibility and gene expression (Cao et al., 2018; Ma et al., 2020) . This linkage is dependent on reliable detection of gene (or protein) expression, which is greatly enhanced by the high sensitivity of epitope detection in methods like CITE-seq Stoeckius et al., 2017) , and now TEA-seq. Because TEA-seq allows direct linkage of chromatin accessibility to both gene expression and protein detection, we used these complementary modalities to investigate peak-to-gene (and peak-to-protein) correlations among the proteins we measured in our TEA-seq epitope panel. We leveraged the analytical capabilities of the ArchR package to compute peak-to-gene correlation , then substituted the more sensitive measurements afforded by ADT counts to compute peak-to-protein correlations. In cases where both gene and protein expression were reliably detected, these comparisons were confirmatory -many of the same putative CRMs that were identified in each comparison ( Figure 4-figure supplement 3) . For some targets, however, more sensitive protein abundance measures allowed detection of many additional correlated CRMs. We present two examples -one of CCR2/CD192 (Figure 4g , h), a chemokine receptor expressed on monocytes (Tsou et al., 2007) ADT Only and CD38 (Figure 4i , j), a glycoprotein expressed on the surface of multiple immune cell types (Ferrero and Malavasi, 2002) , and a target of therapies for multiple myeloma (Dima et al., 2020; Sanchez et al., 2016) . In both cases, peak-to-protein correlations identified many CRMs were missed with peak-to-gene correlation alone. The converse is also observed in our experiment: reliably detected genes may enable detection of some regulatory regions that are not identified using protein detection alone -and peak-to-gene links based on scRNA-seq can be generated genomewide for all expressed genes, whether or not they are targeted by the epitope panel.

To perform a direct assessment of TEA-seq data quality relative to previously published assays, we performed a comparison to standalone 10x scATAC-seq on permeabilized cells, CITE-seq using the 10x 3 0 scRNA-seq kit, and both unstained nuclei and unstained permeabilized cells on the 10x Multiome kit (RNA + ATAC). All samples originated from the same pool of thawed and FACS sorted PBMCs (with dead cells, debris, and neutrophils removed). Separate aliquots were taken for scA-TAC-seq and the two unstained Multiome kit preps, while CITE-seq and TEA-seq samples were taken from the same pool of ADT-stained PBMCs (Figure 4-figure supplement 4a). In each case, 15,000 cells were used as input, and we downsampled a single well from the resulting libraries to equal sequencing depth for a fair comparison (200 Â 10 6 total fragments per well for RNA and ATAC libraries; 48 Â 10 6 for ADT libraries). We performed four wells of TEA-seq to generate a large reference dataset (shown in Figure 4 ), and one well for each other assay. For all comparisons presented in Figure 4 -figure supplement 4, we used a single well of TEA-seq. In these direct comparisons, we found that TEA-seq results appeared qualitatively similar to CITE-seq and scATAC-seq across assays ( Figure 4 -figure supplement 4b, e, i). The alignment method used for these datasets allowed us to quantify the number of RNA UMIs that aligned to intronic vs. exonic regions of the transcriptome. We found that the fraction of reads mapping to exons was lower than that found in whole-cell CITE-seq data, similar to that seen using a lysed-cell nuclei preparation (median fraction of reads with exonic mapping: TEA-seq, 51.8%, Nuclei Multiome, 52.2%; CITE-seq, 69.5% . We found that gene detection was most like nuclei and lower in many cases than CITE-seq. One group of genes stands out as (d) scATAC-seq QC scatterplot comparing unique scATAC-seq fragments and fraction of reads in peaks scores for each cell barcode. n = 34,757,757 total cells are displayed (those with >1,000,000 unique ATAC fragments); 29,264 passing QC criteria are represented by purple points. (e) Uniform manifold approximation and projection (UMAP) projections generated using each of the three modalities separately. Only cells passing QC (n = 7,939,939 barcodes) are presented in (e, f). (f) A joint UMAP projection generated using three-way weighted nearest neighbors that leverages all three of the measured modalities. (g) Scatterplot showing the peak-to-RNA correlations (x-axis) and peak-to-protein (y-axis) correlation values for each peak (points) that was found to be correlated with the CCR2 gene or CD192 antibody in TEA-seq data. Histograms at the margins show the distribution of scores for RNA correlations (top) and protein correlations (right). Dashed line shows 1:1 correspondence. Peaks not found to be correlated in each method were assigned a score of 0. (h) Genome tracks showing links between peaks (red hashes) and the CCR2 gene based on protein expression (above peaks) and gene expression (below peaks). Correlations are represented by arcs colored based on the correlation score (color scale for both panels to the left). The bottom panel shows the gene neighborhood around CCR2. All coordinates are from the Hg38 genome assembly. (i), as in (g), for the CD38 gene and correlated peaks. (j), as in (h), for the CD38 gene locus. The online version of this article includes the following source data and figure supplement(s) for figure 4:

Source data 1. Single cell quality metrics for TEA-seq samples. Source data 2. Cell type labels and UMAP coordinates for TEA-seq samples. Source data 3. Peak to gene and peak to protein link correlations. particularly low, originating from ribosomal gene transcripts (with gene symbols starting with RPL and RPS, highlighted in blue). In comparisons of scATAC-seq metrics, we found that TEA-seq performed comparably to standalone permeabilized cell scATAC-seq or Multiome RNA + ATAC using permeabilized cells, with higher FRIP than purified nuclei used for the same assay ( Figure 4 -figure supplement 4e, f). Finally, in comparisons of protein detection by ADTs, we found that antibody performance in TEA-seq varied by target. Many bimodal distributions were retained (e.g., CD4, CD8a, CD86), while some separations were less apparent (e.g., CD45RO and CD45RA; Figure 4 figure supplement 4h), resulting in reduced separation in UMAP analysis for some types (i.e., naïve vs. mature CD4 cells), though most major cell types were able to be separated (Figure 4 -figure supplement 4i). Together, these results show that TEA-seq data is of high quality but may not always be the method of choice if a particular modality is of greatest interest to researchers. The strength of TEA-seq is its ability to measure all three aspects -transcription, epitopes, and accessibility -simultaneously on each cell, enabling deeper insights into the connections between regulation, expression, and function as demonstrated in Figure 4 .

Multimodal data collection from permeabilized PBMCs will be of use to many researchers in the immunology field and beyond who seek to get the most high-quality data from precious clinical samples and to facilitate cross-datastream integration with flow cytometry. We found that isotonic cell permeabilization allows the generation of scATAC-seq libraries with high quality as measured by FRIP and low nucleosomal content, suggesting that chromatin state in the nucleus is unperturbed ( Figure 1, Figure 2 ). While a previous study utilized FACS followed by scATAC on individually sorted cells (Pi-ATAC; Chen et al., 2018c) , the use of permeabilized cells enables simultaneous interrogation of chromatin accessibility state in the nucleus and the functional state of cells based on their cell surface proteins (ICICLE-seq, Figure 3 ) together with mRNA (TEA-seq, Figure 4 ) at unprecedented scale for the first time. Our methods utilize permeabilized cells to allow the addition of protein measurements that provide a direct link between high-quality scRNA-seq data and scATAC-seq data using truly paired methods extending beyond SNARE-seq (Chen et al., 2019) and SHARE-seq by taking advantage of 10x Genomics Single Cell Multiome sequencing reagents as a platform for nucleic acid capture. The use of a droplet-based method allows multimodal measurements at a much higher scale by capturing thousands of single cells at a time. We expect that the datasets generated by our study will be useful beyond researchers studying PBMC biology, including those seeking to develop new analytical tools to remove background noise from existing sn/scA-TAC-seq data (e.g., SCALE, Xiong et al., 2019 , or scOpen, Li et al., 2019b and tools to accurately identify links across modalities (e.g., AI-TAC, Tasaki et al., 2020) .

We have focused on PBMCs during this initial development and demonstration of the utility of TEA-seq. PBMCs are a widely used source of clinically relevant heterogeneous cell populations for monitoring clinical disease status and research. As such, TEA-seq optimized for PBMCs will provide unprecedented biological insight into the molecular underpinnings of human immune health and disease. It is likely that the use of TEA-seq in other cell types will require sample-specific optimization. For example, the requirement for neutrophil depletion may be limited to studies related to the immune system or immune-infiltrated tissues. However, the value of debris and dead cell removal is likely to be applicable to disaggregated tissue samples from biopsies (Donlin et al., 2018) . Likewise, researchers should take care when assessing fragment length distributions as a quality control metric. In PBMCs, we found that high abundance of mononucleosomal fragments may indicate neutrophil-driven background, but this indicator of poor data quality may not be observed in other cell types or tissues. Early versions of nuclei-based ATAC-seq suffered issues high levels of mitochondrial reads (Corces et al., 2017) , which may be used to profile clonal populations based on their mitochondrial genomes . Here, we have used low concentrations of digitonin to selectively permeabilize cholesterol-containing membranes, thereby leaving the inner mitochondrial membrane intact (Adam et al., 1990; Colbeau et al., 1971) . Different cell types have been shown to have differing sensitivity to digitonin concentrations, perhaps reflecting natural variance in membrane cholesterol composition (Holden and Horton, 2009) . As such, cell permeabilization for TEAseq may need to be optimized for other cell types. This can be readily tested using bulk ATAC-seq and by measuring library complexity, FRIP score, and mitochondrial contamination to determine data quality.

In addition to the Methods presented in this article, we are maintaining an online protocol to aid the research community in further development and adoption of TEA-seq on the Protocols.io platform (https://doi.org/10.17504/protocols.io.bqagmsbw). Additional development could allow simultaneous measurement of additional aspects of cell biology (e.g., CpG sequencing methods, scCUT&Tag; Bartosovic et al., 2020) , as well as improvements to the modalities in the TEA-seq trio (e.g., s3-ATAC, Mulqueen et al., 2021) . In direct comparisons of TEA-seq to existing methods (CITE-seq and scATAC-seq), we found that the sensitivity of gene detection was lower and distributions of antibody detections were altered relative to CITE-seq; however, ATAC-seq data quality remained comparable to the standalone assay ( Figure 4-figure supplement 4) . We anticipate that simultaneous, truly multimodal measurement across molecular compartments will be an essential tool to expand our view of the full picture of immune cell state in health and disease, with many applications extending throughout the genomics research community. 

Biological specimens were purchased from BioIVT as cryopreserved PBMCs and Bloodworks NW as freshly drawn whole blood. All sample collections were conducted by BioIVT and Bloodworks NW under IRB-approved protocols, and all donors signed informed consent forms. See Supplementary file 7 for a list of sources and samples used for data displayed in each figure.

PBMCs sourced from BioIVT were isolated using either Ficoll-Paque or leukapheresis. Following isolation, PBMCs were subjected to RBC lysis, washing, and counting. PBMC aliquots were cryopreserved in Cryostor CS10 (StemCell Technologies, 07930) and stored in vapor phase liquid nitrogen.

For fresh blood samples from Bloodworks NW, PBMC processing occurred in-house. Blood tubes were pooled, gently swirled until fully mixed, about 30 times, and diluted with an equivalent volume of room temperature PBS (Thermo Fisher Scientific, 14190235). PBMCs were isolated using one or more Leucosep tubes (Greiner Bio-One, 227290) loaded with 15 mL of Ficoll Premium (GE Healthcare, 17-5442-03) to which a 3 mL cushion of PBS had been slowly added on top of the Leucosep barrier. Diluted whole blood (24-30 mL) was slowly added to each tube and spun at 1000Âg for 10 min at 20˚C with no brake (Beckman Coulter Avanti J-15RIVD with JS4.750 swinging bucket, B99516). PBMCs were recovered from the Leucosep tube by quickly pouring all volume above the barrier into a sterile 50 mL conical tube (Corning, 352098). A 15 mL cold phosphate-buffered saline (PBS) +0.2% bovine serum albumin (BSA) (Sigma, A9576; 'PBS+BSA') was added and the cells were pelleted at 400Âg for 5-10 min at 4-10˚C. The supernatant was quickly decanted, the pellet dispersed by flicking the tube, and the cells washed with 25-50 mL cold PBS+BSA. Cell pellets were combined as needed, the cells were pelleted as before, supernatant quickly decanted, and residual volume was carefully aspirated. PBMCs were resuspended in 1 mL cold PBS+BSA per 15 mL whole blood processed and counted with a ViCell (Beckman Coulter) using VersaLyse reagent (Beckman Coulter, A09777) or with a Cellometer Spectrum Cell Counter (Nexcelom) using ViaStain acridine orange/propidium iodide solution (Nexcelom, C52-0106-5). PBMCs were cryopreserved in Cryo-stor10 (StemCell Technologies, 07930) or 90% fetal bovine serum (FBS, Thermo Fisher Scientific, 10438026)/10% dimethyl sulfoxide (DMSO, Fisher Scientific, D12345) at 5 Â 10 6 cells/mL by slow freezing in a Coolcell LX (VWR, 75779-720) overnight in a À80˚C freezer followed by transfer to liquid nitrogen.

Cryopreserved PBMCs were removed from liquid nitrogen storage and thawed in a 37˚C water bath for 3-5 min until no ice was visible. Cells were diluted to 10 mL in 37˚C AIM V medium (Gibco, 12055091) with the first 3 mL added dropwise. Cells were then washed once with 10 mL DPBS without calcium and magnesium (Corning, 21-031 CM) supplemented with 0.2% w/v BSA (Sigma-Aldrich, A2934). Cells were counted on a Cellometer Spectrum Cell Counter (Nexcelom) using ViaStain acridine orange/propidium iodide solution (Nexcelom, C52-0106-5) and stored on ice.

To remove dead cells, debris, and neutrophils, PBMC samples were sorted by FACS prior to nuclei isolation or cell permeabilization. Cells were incubated with Fixable Viability Stain 510 (BD, 564406) for 15 min at room temperature and washed with AIM V medium (Gibco, 12055091) plus 25 mM HEPES before incubating with TruStain FcX (BioLegend, 422302) for 5 min on ice, followed by staining with anti-CD45 (BioLegend, 304038) and anti-CD15 (BD, 562371) antibodies for 20 min on ice. Cells were washed with AIM V medium plus 25 mM HEPES and sorted on a BD FACSAria Fusion. A standard viable CD45+ cell gating scheme was employed; FSC-A vs. SSC-A (to exclude sub-cellular debris), two FSC-A doublet exclusion gates (FSC-W followed by FSC-H), dead cell exclusion gate (BV510 LIVE/DEAD negative) followed by CD45+ inclusion gate. Neutrophils (defined as SSC high , CD15+) were then excluded in the final sort gate (Figure 1-figure supplement 3 ). An aliquot of each post-sort population was used to collect 50,000 events to assess post-sort purity.

Bead-based neutrophil depletion was performed using a biotin conjugated monoclonal anti-CD15 antibody in combination with streptavidin-coated magnetic beads. A high neutrophil content (approximately 1.1%) Ficoll-isolated PBMC sample was processed to evaluate efficacy, and a low neutrophil leukapheresis-isolated PBMC sample was processed to control for off-target effects. Briefly, 1 Â 10 7 PBMCs were resuspended in 100 mL of chilled DPBS without calcium and magnesium (Corning, 21-031 CM) supplemented with 0.2% w/v BSA (Sigma-Aldrich, A2934). A 10 mL TruStain FcX (BioLegend 422302) was added to the cell suspension, mixed by pipette, and incubated on ice for 10 min. Anti-CD15 antibody (BioLegend, 301913) was added to the cell suspension, mixed by pipette, and incubated on ice for 30 min. Following antibody binding, 25 mL of Dynabeads MyOne Streptavidin T1 magnetic beads (Invitrogen, 65601) was added to the cell suspension, mixed by pipette, and incubated at room temperature for 5 min. The cell suspension was then diluted with 900 mL of room temperature Dulbecco's phosphate-buffered saline (DPBS) +0.2% w/v BSA and placed on an EasySep magnet (Stemcell Technologies, 18103) for 3 min. The supernatant (approximately 1 mL) was transferred to a new tube and stored on ice until further processing.

Non-depleted and neutrophil-depleted PBMCs from each sample were analyzed by flow cytometry using an eight-color panel to assess the effects of the bead-based depletion on major PBMC populations. For each sample and condition, 1 Â 10 6 cells were centrifuged (750Âg for 5 min at 4˚C) using a swinging bucket rotor (Beckman Coulter Avanti J-15RIVD with JS4.750 swinging bucket, B99516), the supernatant was removed using a vacuum aspirator pipette, and the cell pellet was resuspended in 100 mL of DPBS without calcium and magnesium (Corning, 21-031 CM) supplemented with 0.2% w/v BSA (Sigma-Aldrich, A2934). Cells were incubated with Fixable Viability Stain 510 (BD, 564406) and TruStain FcX (BioLegend, 422302) for 30 min on ice, and washed in chilled FACS buffer (DPBS, 0.2% w/v BSA, 0.1% sodium azide; VWR, BDH7465-2). Cells were stained with a cocktail of antibodies (Supplementary file 1) including 10 mL of Brilliant Stain Buffer Plus (BD, 566385) at a staining volume of 100 mL for 30 min on ice, then washed twice with chilled FACS buffer. Cells were passed through 35 mm Falcon Cell Strainers (Corning, 352235) and analyzed on a BD FACS Symphony flow cytometer. Gating analysis was performed using FlowJo cytometry software (version 10.7).

A sequential gating scheme was used to identify viable singlet CD45+/CD15+/CD16+ neutrophils: (1) a time vs. SSC-A gate (to confirm that no abnormalities occurred in the fluidics), (2) a FSC-A vs. SSC-A gate (to exclude sub-cellular debris), (3) two FSC-A doublet exclusion gates (FSC-W followed by FSC-H), (3) a dead cell exclusion gate (BV510 LIVE/DEAD negative), and (4) a CD45+ inclusion gate. Neutrophils were defined as either SSC-A high /CD15+ or CD15+/CD16+. The neutrophil population defined by SSC-A high /CD15+ was larger than that defined by CD15+/CD16+ due to the presence of some contaminating CD15 low monocytes. Therefore, we used the CD45+/CD15+/CD16 + gate for subsequent analysis including summary statistics (Figure 1-figure supplement 4 and Supplementary file 2).

Isolation of nuclei suspensions was performed according to the Demonstrated Protocol: Nuclei Isolation for Single Cell ATAC Sequencing (10x Genomics, CG000169 Rev C). Briefly, 8 Â 10 5 to 1 Â 10 6 cells were added to a 1.5 mL low binding tube (Eppendorf, 022431021) and centrifuged (300Âg for 5 min at 4℃) using a swinging bucket rotor (Beckman Coulter Avanti J-15RIVD with JS4.750 swinging bucket, B99516). The supernatant was removed using a vacuum aspirator pipette, and the cell pellet was resuspended in 100 mL of chilled 10x Genomics Nuclei Isolation Buffer (10 mM Tris-HCl pH 7.4, 10 mM NaCl, 3 mM MgCl 2 , 0.1% Tween-20, 0.1 % NP-40 Substitute CAS 9016-45-9 [BioVision 2127-50], 0.01% digitonin [MP Biomedicals 0215948082], 1% BSA) by pipette-mixing 10 times. Cells were incubated on ice for 3 min, followed by dilution with 1 mL of chilled 10x Wash Buffer (10 mM Tris-HCl pH 7.4, 10 mM NaCl, 3 mM MgCl 2 , 0.1% Tween-20 [Bio-Rad 1610781], 1% BSA) by pipette-mixing five times. Nuclei were centrifuged (500Âg for 3 min at 4˚C), and the supernatant was slowly removed using a vacuum aspirator pipette. Nuclei were resuspended in chilled 1x Nuclei Buffer (10x Genomics, 2000207) to a target concentration of 3000-6000 nuclei per mL. Nuclei suspensions were passed through 35 mm Falcon Cell Strainers (Corning, 352235) and counted on a Cellometer Spectrum Cell Counter (Nexcelom) using ViaStain acridine orange/propidium iodide staining solution (Nexcelom, C52-0106-5).

In addition to 10x Nuclei Isolation Buffer (10xNIB), we tested an alternative Nuclei Isolation Buffer (ANIB) as described previously (Mulqueen et al., 2019) 10 mM Tris-HCl pH 7.4, 10 mM NaCl, 3 mM MgCl 2 , 0.1% Tween-20, 0.1% IGEPAL CAS 9002-93-1 (Sigma, I8896), 1x Protease Inhibitor (Roche, 11836170001) . For each buffer, we generated a titration series of detergent concentrations relative to the concentrations described above but did not alter the concentration of other buffer ingredients: 1Â, 0.5Â, 0.25Â, 0.1Â for 10ÂNIB, and 1Â and 0.1Â for ANIB. The resulting nuclei were imaged using an EVOS M5000 Imaging System (Thermo Fisher Scientific, AMF5000) in transmitted light mode at Â40 magnification to visually evaluate nuclear integrity (Figure 1-figure supplement  1) . The 1 Â 10 ÂNIB, 0.25 Â 10 ÂNIB, 0.1 Â 10 ÂNIB, and 1Â ANIB were used for 10X scATAC-seq (Figure 1-figure supplement 2 ).

We prepared a 5% w/v digitonin stock by diluting powdered digitonin (MP Biomedicals, 0215948082) with 100% DMSO (Fisher Scientific, D12345) and creating 20 mL aliquots, which were stored at À20˚C. To permeabilize, 1,000,000 cells were added to a 1.5 mL low binding tube (Eppendorf, 022431021) and centrifuged (400Âg for 5 min at 4˚C) using a swinging bucket rotor (Beckman Coulter Avanti J-15RIVD with JS4.750 swinging bucket, B99516). The supernatant was removed using a vacuum aspirator pipette, and the cell pellet was resuspended in 100 mL of chilled isotonic Perm Buffer (20 mM Tris-HCl pH 7.4, 150 mM NaCl, 3 mM MgCl 2 , 0.01% digitonin) by pipette-mixing 10 times. Cells were incubated on ice for 5 min, after which they were diluted with 1 mL of isotonic Wash Buffer (20 mM Tris-HCl pH 7.4, 150 mM NaCl, 3 mM MgCl 2 ) by pipette-mixing five times. Cells were centrifuged (400Âg for 5 min at 4˚C) using a swinging bucket rotor, and the supernatant was slowly removed using a vacuum aspirator pipette. The cell pellet was resuspended in chilled TD1 buffer (Illumina, 15027866) by pipette-mixing to a target concentration of 2300-10,000 cells per mL. Cells were passed through 35 mm Falcon Cell Strainers (Corning, 352235) and counted on a Cellometer Spectrum Cell Counter (Nexcelom) using ViaStain acridine orange/propidium iodide solution (Nexcelom, . For optimization, we used varying final digitonin concentrations in the Perm Buffer: 0.01% w/v, 0.05% w/v, 0.1% w/v, and 0.2% w/v. The optimal concentration observed was 0.01% w/v. snATAC-seq and scATAC-seq sn/scATAC-seq library preparation scATAC-seq libraries were prepared according to the Chromium Single Cell ATAC v1.1 Reagent Kits User Guide (CG000209 Rev B) with several modifications. In total, 15,000 cells or nuclei were loaded into each tagmentation reaction. Nuclei were brought up to a volume of 5 mL in 1x Nuclei Buffer (10x Genomics, 2000207), mixed with 10 mL of a transposition master mix consisting of ATAC Buffer B (10x Genomics, 2000193) and ATAC Enzyme (Tn5 transposase; 10x Genomics, 2000123). Permeabilized cells were brought up to a volume of 9 mL in TD1 buffer (Illumina, 15027866) and mixed with 6 mL of Illumina TDE1 Tn5 transposase (Illumina, 15027916) . Transposition was performed by incubating the prepared reactions on a C1000 Touch thermal cycler with 96-Deep Well Reaction Module (Bio-Rad, 1851197) at 37˚C for 60 min, followed by a brief hold at 4˚C. A Chromium NextGEM Chip H (10x Genomics, 2000180) was placed in a Chromium Next GEM Secondary Holder (10x Genomics, 3000332), and 50% glycerol (Teknova, G1798) was dispensed into all unused wells. A master mix composed of Barcoding Reagent B (10x Genomics, 2000194), Reducing Agent B (10x Genomics, 2000087) , and Barcoding Enzyme (10x Genomics, 2000125) was then added to each sample well, pipette-mixed, and loaded into row 1 of the chip. Chromium Single Cell ATAC Gel Beads v1.1 (10x Genomics, 2000210) were vortexed for 30 s and loaded into row 2 of the chip, along with Partitioning Oil (10x Genomics, 2000190) in row 3. A 10x Gasket (10x Genomics, 370017) was placed over the chip and attached to the Secondary Holder. The chip was loaded into a Chromium Single Cell Controller instrument (10x Genomics, 120270) for GEM generation. At the completion of the run, GEMs were collected and linear amplification was performed on a C1000 Touch thermal cycler with 96-Deep Well Reaction Module: 72˚C for 5 min, 98˚C for 30 s, 12 cycles of 98˚C for 10 s, 59˚C for 30 s, and 72˚C for 1 min.

GEMs were separated into a biphasic mixture through addition of Recovery Agent (10x Genomics, 220016), and the aqueous phase was retained and removed of barcoding reagents using Dynabeads MyOne SILANE (10x Genomics, 2000048) and SPRIselect reagent (Beckman Coulter, B23318) bead clean-ups. Sequencing libraries were constructed by amplifying the barcoded ATAC fragments in a sample indexing PCR consisting of SI-PCR Primer B (10x Genomics, 2000128), Amp Mix (10x Genomics, 2000047) , and Chromium i7 Sample Index Plate N, Set A (10x Genomics, 3000262) as described in the 10x scATAC User Guide. Amplification was performed in a C1000 Touch thermal cycler with 96-Deep Well Reaction Module: 98˚C for 45 s, for 9-11 cycles of 98˚C for 20 s, 67˚C for 30 s, 72˚C for 20 s, with a final extension of 72˚C for 1 min. Final libraries were prepared using a dual-sided SPRIselect size-selection cleanup. SPRIselect beads were mixed with completed PCR reactions at a ratio of 0.4Â bead:sample and incubated at room temperature to bind large DNA fragments. Reactions were incubated on a magnet, and the supernatant was transferred and mixed with additional SPRIselect reagent to a final ratio of 1.2Â bead:sample (ratio includes first SPRI addition) and incubated at room temperature to bind ATAC fragments. Reactions were incubated on a magnet, the supernatant containing unbound PCR primers and reagents was discarded, and DNAbound SPRI beads were washed twice with 80% v/v ethanol. SPRI beads were resuspended in Buffer EB (Qiagen, 1014609), incubated on a magnet, and the supernatant was transferred, resulting in final, sequencing-ready libraries.

Final libraries were quantified using a Quant-iT PicoGreen dsDNA Assay Kit (Thermo Fisher Scientific, P7589) on a SpectraMax iD3 (Molecular Devices). Library quality and average fragment size were assessed using a Bioanalyzer (Agilent, G2939A) High Sensitivity DNA chip (Agilent, . Libraries were sequenced on the Illumina NovaSeq platform with the following read lengths: 51 nt read 1, 8 nt i7 index, 16 nt i5 index, and 51 nt read 2.

Demultiplexing of raw base call files into FASTQ files was performed using 10x cellranger-atac mkfastq (10x Genomics v.1.1.0). To assess samples at an equal sequencing depth, FASTQ files were downsampled to a uniform total raw read count among compared samples: 2 Â 10 8 fragments for comparison of nuclei and cells across FACS conditions (Figures 1 and 2) ; 1.25 Â 10 8 fragments for optimization experiments (Figure 1-figure supplement 2) due to lower available total read depth after sequencing. A 10x cellranger-atac count was used to process sequencing reads by performing adapter trimming and sequence alignment to the GRCh38 (hg38) reference genome (refdata-cellranger-atac-GRCh38-1.1.0). The output files fragments.tsv.gz and singlecell.csv were utilized for downstream processing and quality control analysis.

To evaluate quality control metrics across all scATAC-seq datasets, we utilized bedtools (v2.29.1) and GNU parallel (Tange, 2011) v20161222 to generate overlap counts and feature count matrices for a panel of reference genomic regions: 518,766 peaks from a previous study of PBMCs by scA-TAC-seq (Lareau et al., 2019 ; supplementary file GSE123577_pbmc_peaks.bed.gz from GEO accession GSE123577) were converted from hg19 to hg38 coordinates using the UCSC liftOver tool (Hinrichs et al., 2006 ; kent source v402) and used to compute a standardized fraction of reads in peaks score for cells in each dataset (FRIP); 33,496 transcription start site regions (TSS ±2 kb) from Hg38 ENSEMBL release 93 (Yates et al., 2020) were filtered to select genes used in the 10x Genomics cellranger GRCh38 reference for scRNA-seq (refdata-cellranger-GRCh38-3.0.0) and used to compute the fraction of reads in TSS (FRITSS); and a set of 3,591,898 reference DNase hypersensitive sites from ENCODE (Meuleman et al., 2020 ; ENCODE file ID ENCFF503GCK) were used to assess distal regulatory element accessibility. In addition, we generated tiled window counts across the genome in 5k, 20k, 100k bins.

Custom R scripts were used to assess and filter preprocessed scATAC-seq data along a variety of quality metrics. Cells or nuclei with >1,000,000 uniquely aligned fragments, FRIP >0.2, FRITSS > 0.2, and fraction of fragments overlapping ENCODE reference regions > 0.5 were retained for downstream analysis. Cells or nuclei that passed these QC cutoffs were used to generate sparse count matrices and filtered fragments.tsv.gz files for downstream analysis.

To examine aggregate TSS accessibility, we selected fragments from fragments.tsv.gz that overlapped TSS regions describe above (TSS ±2 kb). For plotting, fragments were separated using cell barcodes (and fragment length in the case of Figure 1-figure supplement 5) into separate groups. Fragment positions were converted to positions relative to TSS (sensitive to transcript strand orientation), and the number of fragments overlapping each position was calculated.

To examine CTCF motif accessibility, CTCF motif locations were obtained from genome-wide motif scans of non-redundant TF motifs (Vierstra, 2020 ; https://resources.altius.org/jvierstra/projects/motif-clustering/releases/v1.0). Motifs were filtered to select CTCF motifs that overlapped ENCODE reference regions (Meuleman et al., 2020 ; ENCODE file ID ENCFF503GCK). Selected were ranked by their MOODS match, and the top 100,000 motifs were selected for analysis. Motif locations were expanded to a total of 4 kb centered on the middle of each CTCF motif (using the resize function from GenomicRanges in R). Fragments from cells that passed QC filtering were converted to target site duplication (TSD) center positions (+5 bp from the 5' end and À4 bp from the 3' end of each fragment). All TSD centers that overlapped expanded CTCF motif regions were selected, and the number of TSD centers that overlapped each position relative to the CTCF motif was calculated (sensitive to CTCF motif strand orientation).

To assess sequencing library saturation, we used the fragments.tsv.gz output file from cellrangeratac, which provides the observed count of each unique fragment position, to calculate how many fragments were observed with each frequency. This count-of-counts table was used as input for the R package preseqR function preseqR.rSAC.bootstrap, which predicts the number of unique species (unique ATAC fragments in our case) that are represented at least r times based on an initial sample (Daley and Smith, 2013; Deng et al., 2018) . For each dataset, we ran 100 rounds of bootstrapped estimations at 10 million-fragment steps, from 0 to 2 billion total fragments per sample. To assess the fraction of additional unique fragments obtained at each 10 million read step (as shown in Figure 1-figure supplement 3) , we subtracted the estimated number of unique fragments from the number of unique fragments at the previous step and divided by 10 million (the number of additional raw sequenced fragments).

For 2D projections of scATAC-seq data, we used binarized sparse matrices of 20 kb window accessibility across the hg38 genome (excluding mitochondrial regions, chrM). Independently for each dataset, we selected features found in >3% of cells/nuclei, weighted features using term frequencyinverse document frequency, log-transformed the resulting weights, and performed principle component analysis (PCA) using singular value decomposition to generate 50 reduced dimensions as described previously Hill, 2019) . We then removed the first PC, which was strongly correlated with the number of available fragments and retained the remaining PCs up to PC 30. For display, we further reduced the dimensionality of selected PCs using UMAP (Becht et al., 2019; McInnes et al., 2018 ) (R package uwot, v0.1.8, parameters: scale = TRUE, min_dist = 0.2).

Labeling of scATAC-seq datasets was performed using the ArchR package v0.9.4. In brief, filtered fragments.tsv.gz files after quality control were used to generate an ArchR GeneScore matrix and a tiled genome feature matrix for each dataset. Cells were grouped by performing iterative latent semantic indexing (LSI) on the tile matrix, followed by the shared nearestneighbor clustering approach implemented in Seurat v3.1.5. GeneScore data was then used to compare scATAC-seq clusters to a labeled reference scRNA-seq dataset consisting of 9380 PBMCs generated by 10x Genomics, with labels provided by the Satija lab (https://www. dropbox.com/s/zn6khirjafoyyxl/pbmc_10k_v3.rds?dl=0) using ArchR's implementation of the Find-TransferAnchors method from Seurat. The best-scoring labels for each scATAC-seq cluster were used for downstream analysis and display (Figure 2) , and label transfer scores for individual cells were used to compare label transfer between methods (Figure 2c ).

After labeling cell types in each dataset, peaks for each cell type were generated using the ArchR functions addGroupCoverages and addReproduciblePeakSet. Within each dataset, peaks scores from each pair of cell types were compared using getMarkerFeatures performed in each direction separately by swapping foreground and background cell types. To identify differentially accessible markers, the Wilcoxon rank sum test was used (testMethod = 'wilcoxon'), and multiple hypothesis test correction was performed using the Benjamini and Hochburg method for false discovery rate estimation (FDR; Benjamini and Hochberg, 1995) . Differentially accessible peaks (DAPs) from each comparison were selected with filter string "FDR <= 0.05 and Log2FC >= log2(1.5)".

To identify enriched TFBS motifs, CIS-BP motif annotations (Weirauch et al., 2014) were attached to each peak identified by ArchR using the addMotifAnnotations. Marker peaks for each cell type were identified using getMarkerFeatures without specifying foreground and background groups using the Wilcoxon rank sum test and FDR correction, as described above. Enriched motifs were identified using the ArchR function peakAnnoEnrichment, which performs a hypergeometric test for enrichment of motif annotations in marker peaks, with FDR correction for multiple hypothesis testing (parameters: peakAnnotation = 'Motif', cutOff = "FDR <= 0.01 and Log2FC >= log2(1.5)"). Up to the top 10 enriched motifs for each cell type were plotted using the ArchR function plotEnrichHeatmap.

To assess cell-type proportions, PBMCs were analyzed with a 25-color immunophenotyping flow cytometry panel. 1 Â 10 6 thawed PBMCs were centrifuged (750Âg for 5 min at 4˚C) using a swinging bucket rotor (Beckman Coulter Avanti J-15RIVD with JS4.750 swinging bucket, B99516), the supernatant was removed using a vacuum aspirator pipette, and the cell pellet was resuspended in DPBS without calcium and magnesium (Corning 21-031 CM). Cells were incubated with Fixable Viability Stain 510 (BD, 564406) and TruStain FcX (BioLegend, 422302) for 30 min at 4˚C, then washed with chilled PBS+0.2% BSA (Sigma, A9576; 'PBS+BSA'). Cells were stained with a cocktail of antibodies (Supplementary file 3) at a staining volume of 100 mL for 30 min at 4˚C, then washed with PBS +0.2% BSA. Fixation was performed by resuspending cells in 100 mL of 4% paraformaldehyde (Electron Microscopy Sciences, 15713) and incubating for 15 min at 25˚C, protected from light. Following fixation, cells were washed twice with PBS+0.2% BSA and resuspended in 100 mL PBS (without BSA). Stained cells were analyzed on a five-laser Cytek Aurora spectral flow cytometer. Spectral unmixing was calculated with pre-recorded reference controls using Cytek SpectroFlo software (version 2.0.2). Cell types were quantified by traditional bivariate gating analysis performed with FlowJo cytometry software (version 10.7, Figure 2-figure supplement 1) .

Custom oligonucleotides for Tn5 complexes, indexing, and amplification were ordered from Integrated DNA Technologies. Sequences and modifications are provided in Supplementary file 5. ADT i7 indexing primers contain a phosphorothioate bond linking the final two 3 0 bases and were purified using polyacrylamide gel electrophoresis (PAGE). The MOSAIC_Bot oligo used in Tn5 complexing contains a 5 0 phosphate group and was purified using standard desalting. All remaining oligos were produced without chemical modifications and purified using standard desalting.

The assembly of Tn5 transposomes was performed as previously described (Mulqueen et al., 2019) . DNA complexes containing mosaic-end sequences with either a poly-T or Nextera R2N 5' overhang (Poly-T Top-L/MOSAIC_Bot, Tn5ME-s7_Top/MOSAIC_Bot) were created by annealing equimolar amounts of top and bottom oligos (Figure 3-figure supplement 1 and Supplementary file 5) on a C1000 Touch thermal cycler with 96-Deep Well Reaction Module (Bio-Rad, 1851197) at 95˚C for 5 min followed by 5˚C decreases every 2 min until the temperature reached 20˚C. Oligos were annealed at a concentration of 16 mM in 2Â Dialysis Buffer (100 mM HEPES-KOH pH 7.5 [Teknova, 550000-016], 200 mM NaCl, 0.2 mM EDTA, 2 mM DTT [IBI Scientific, 21040], 0.2% Triton X-100 [Sigma-Aldrich, T8787], 20% glycerol [Teknova, G1798] ). Annealed complexes were mixed 1:1 for a final concentration of 8 nM. Tn5 transposase (Beta Lifescience, TN5-BL01) was supplemented with 5 M NaCl at a final volume ratio of 1:8 NaCl to Tn5. The resulting NaCl/Tn5 mixture was mixed with the annealed complexes at a volume ratio of 1.2:1 ratio of DNA complexes to Tn5 and incubated at 25˚C for 60 min to form final, reaction-ready Tn5 complexes, which were stored at À20˚C until use.

Final libraries were quantified using qPCR (KAPA Biosystems Library Quantification Kit for Illumina, KK4844) on a CFX96 Touch Real-Time PCR Detection System (Bio-Rad, 1855195). Library quality and average fragment size were assessed using a Bioanalyzer (Agilent, G2939A) High Sensitivity DNA chip (Agilent, . Libraries were sequenced on the Illumina NovaSeq platform with the following read lengths: 28 bp read 1 (Cell barcode and UMI), 8 bp i7 index, 100 bp read 2 (ATACseq sequence or ADT barcode). A Truseq read 1 primer (0.3 mM, Supplementary file 5) was included as a Custom Read 1 primer to mitigate the risk of off-target priming of the standard Illumina Nextera read 1 primer on the partial Nextera R1N sequence included in the mosaic end portion of the Poly-T Tn5 insertion.

Demultiplexing of raw base call files into FASTQ files was performed using bcl2fastq2 (Illumina v2.20.0.422) . Read 2 was trimmed of adapter sequences, low-quality bases and reads, and polyA tailing using fastp (Chen et al., 2018a) v0.21.0 (parameters: -adapter_sequence=CTGTCTCTTA TACACATCT -cut_tailX-trim_poly_x) and the resulting read 2 sequences were aligned to the GRCh38 (hg38) reference genome (Illumina iGenomes, https://support.illumina.com/sequencing/ sequencing_software/igenome.html) using Bowtie 2 (Langmead and Salzberg, 2012; v2.3.0, parameters: -local, -sensitive, -no-unal, -phred33) . Aligned reads in SAM format were filtered by alignment score (!30) then tagged with cell barcode and UMI sequence and quality scores using custom python code (python3 v3.7.3). Barcode sequences were compared against the 10x Genomics v3 3 0 GEX barcode whitelist (3M-february-2018.txt.gz). Sequences not included in the whitelist were corrected to a valid whitelist barcode by allowing a single base mismatch (Hamming distance of 1). Sequences with more than one possible match were corrected at the position with the lowest sequencing quality score. Reads with barcodes that could not be corrected were excluded from further analysis. Filtered and tagged SAM files were converted to sorted, indexed BAM files using GATK (McKenna et al., 2010; Broad Institute v4.1.4.0) . Genomic coordinates were converted to BED format using bedtools (Quinlan and Hall, 2010; v2.26.0) . Custom python code was used to collapse aligned fragments into a list of fragments with unique cell barcode and genomic coordinate combinations. These fragments were then written as a fragments.tsv.gz file in the format: chr, start position, end position, cell barcode, UMI count, and strand (+/-).

Methods for ADT counting were developed in-house and were implemented as an optimized, highly efficient C program named BarCounter (available at https://github.com/AllenInstitute/Barcounterrelease). BarCounter was used for single-cell ADT counting as follows: firstly, barcode sequences were compared against the 10x Genomics v3 3 0 GEX barcode whitelist (3M-february-2018.txt.gz). Sequences not included in the whitelist were corrected to a valid whitelist barcode by allowing a single base mismatch (hamming distance of 1) at a low-quality basecall (sequencing quality score <20). Reads with barcodes that could not be corrected were excluded from further analysis. Next, ADT barcode sequences were compared against a CSV taglist containing ADT barcode/antibody associations. Antibody barcodes in the current TotalSeq-A catalog (BioLegend) have a Hamming distance from all other barcodes of at least 3. Therefore, a single base mismatch (Hamming distance of 1) was allowed. Reads containing ADT sequences that could not be assigned to an antibody in the taglist were excluded. Finally, UMI sequences that were unique within their assigned ADT for their assigned cell barcode were counted. Final ADT UMI counts were written by cell barcode to a CSV file for use in downstream analysis.

ADT features were filtered by comparison to the mouse IgG1k isotype control, which should not bind to human cell surface proteins. The distribution of counts for each antibody was compared to the control using a Mann-Whitney test (R function wilcox.test with parameter alternative = 'greater'). Any features for which the test returned a p-value>1Â10 À9 were considered similar to the control and were removed from downstream analysis (Supplementary file 6).

Aligned ICICLE-seq chromatin accessibility fragments.tsv.gz files were preprocessed as for 10Â scA-TAC-seq samples, above. QC filtering was performed as described, with modified cutoffs: >500 uniquely aligned reads, FRIP >0.65, FRITSS > 0.2, and fraction of fragments overlapping ENCODE reference regions > 0.5 were retained for downstream analysis. For 2D projections of the scATACseq data, we used binarized sparse matrices of 20 kb window accessibility across the hg38 genome, selected features found in >0.5% of cells, weighted features using LogTF-IDF, and performed PCA as described above. We then removed the first PC and retained the remaining PCs up to PC 20. UMAP was performed with adjusted parameters (scale = FALSE, min_dist = 0.2). To assign cell-type labels, filtered fragments.tsv.gz files were used as input to ArchR. ArchR functions addIterativeLSI and addGeneIntegrationMatrix (parameters transferParams = list(dims = 1:10, k.weight = 20) and nGenes = 4000) were used to transfer labels from the scRNA-seq PBMC reference described above (scATAC Cell Type Labeling).

Count matrices for ADT data were scaled for each cell by dividing by the thousands of total ADT UMIs per cell, then transformed using Log 10 (scaled count + 1). Normalized features were used for PCA using the R function prcomp with default parameters. Filtered and normalized features were used as direct input to UMAP (R package uwot with parameter min_dist = 0.2) with the first two PCs from PCA used as initial coordinates to aid reproducibility of UMAP projection. Cells were clustered using a Jaccard-Louvain method (parameters k = 15, radius = 1) using UMAP coordinates. Clusters with high signal from the mouse IgG1k isotype control antibody were removed from subsequent analysis (one cluster, n = 32 cells). The remaining clusters were manually labeled by examination of cell-type marker enrichment. The R package scratch.vis (Tasic et al., 2018; https://github.com/alleninstitute/scrattch.vis; Graybuck, 2021 ; copy archived at swh:1:rev:aa9094fb3d92f05264-c9aa0911eb4c4967862609) was used to generate the cluster median heatmap plot and a river/ alluvial plot comparing the cell-type labels obtained from ATAC-seq and ADT-based analyses.

To compare peaks between cell types, all filtered fragments for cells in each cell type were aggregated and used as input to the MACS2 peak caller (Zhang et al., 2008; parameters -f BED, -g hs,no-model) . The top 2500 peaks from each cell type were selected for comparison (except for Plasmablasts, for which all 592 peaks were used). A master set of peaks across all types was constructed by combining all narrowPeak results files and combining the outer coordinates of overlapping peaks (GenomicRanges function reduce). A binary matrix of peak overlaps for each cell type was generated and used to construct the peak comparison figure inspired by UpSet plots (Lex et al., 2014) .

Consistent with ICICLE-seq, cryopreserved PBMCs were thawed, stained for FACS, depleted of neutrophils, dead cells, and debris through FACS, stained with TotalSeq-A antibodies (BioLegend, Supplementary file 6), and permeabilized using 0.01% digitonin as described above. ATAC and Gene Expression libraries were prepared according to the Chromium Next GEM Single Cell Multiome ATAC + Gene Expression User Guide (CG000338 Rev A) with several modifications. To generate a high-quality reference dataset, we generated four 10x Genomics wells of TEA-seq. For each well, transposition was performed by diluting 15,400 permeabilized cells to a final volume of 5 mL in isotonic Tagmentation Buffer (20 mM Tris-HCl pH 7.4, 150 mM NaCl, 3 mM MgCl 2 , RNase Inhibitor 1 U/mL; Lucigen NxGen, F83923-1). Cells were mixed with 10 mL of a transposition master mix consisting of 7 mL ATAC Buffer B (10x Genomics, 2000193) and 3 mL ATAC Enzyme B (Tn5 transposase; 10x Genomics, 2000265) per reaction. Transposition was performed by incubating the prepared reactions on a C1000 Touch thermal cycler with 96-Deep Well Reaction Module (Bio-Rad, 1851197) at 37˚C for 60 min, followed by a brief hold at 4˚C. A Chromium NextGEM Chip J (10x Genomics, 2000264) was placed in a Chromium Next GEM Secondary Holder (10x Genomics, 3000332) and 50% glycerol (Teknova, G1798) was dispensed into all unused wells. A master mix composed of Barcoding Reagent Mix (10x Genomics, 2000267), Reducing Agent B (10x Genomics, 2000087) , Template Switch Oligo (10x Genomics, 3000228), and Barcoding Enzyme Mix (10x Genomics, 2000266) was then added to each sample well, pipette-mixed, and loaded into row 1 of the chip. Chromium Single Cell Multiome Gel Beads v1.1 (10x Genomics, 2000261) were vortexed for 30 s and loaded into row 2 of the chip, along with Partitioning Oil (10x Genomics, 2000190) in row 3. A 10x Gasket (10x Genomics, 3000072) was placed over the chip and attached to the Secondary Holder. The chip was loaded into a Chromium Single Cell Controller instrument (10x Genomics, 120270) for GEM generation. At the completion of the run, GEMs were collected, and reverse transcription and barcoding were performed by incubating GEMs on a C1000 Touch thermal cycler with 96-Deep Well Reaction Module at 37˚C for 45 min, 25˚C for 30 min, followed by a brief hold at 4˚C. Upon completion of the GEM incubation, 5 mL of Quenching Agent (10x Genomics, 2000269) was immediately added and mixed with the reaction solution.

Final libraries were quantified using qPCR (KAPA Biosystems Library Quantification Kit for Illumina, KK4844) on a CFX96 Touch Real-Time PCR Detection System (Bio-Rad, 1855195). Library quality and average fragment size were assessed using a Bioanalyzer (Agilent, G2939A) High Sensitivity DNA chip (Agilent, . Libraries were sequenced on the Illumina NovaSeq platform with the following read lengths: 50 bp read 1, 10 bp i7 index, 16 bp i5 index, 90 bp read 2.

Demultiplexing of raw base call files into FASTQ files was performed using bcl2fastq2 (Illumina v2.20.0.422, parameters: -create-fastq-for-index-reads, -minimum-trimmed-read-length=8, -mask-short-adapter-reads=8, -ignore-missing-positions, -ignore-missing-filter, -ignore-missing-bcls, -r 24 w 24 p 80). TheX-use-bases-mask option was used to adjust the read lengths for each library type as follows: Y28n*,I10,I10n*,Y90n* for Gene Expression, Y50n*,I8n*,Y16,Y50n* for ATAC, and Y28n*,I8n*,n*,Y90n* for ADT.

For analysis of TEA-seq data alone, the full available sequencing depth from each library for each well was used for alignment. RNA and ATAC libraries were aligned using cellranger-arc software (v1.0.0, 10x Genomics) against 10x genomics reference refdata-cellranger-arc-GRCh38-2020-A using default parameters. ADT barcode counting was performed using BarCounter as described above for ICICLE-seq. After alignment, ATAC-seq data was used as input for scATAC-seq Quality Control analysis and filtering, as described above. The 10x droplet barcodes that were called cells by the cellranger-arc software passed ATAC-seq QC criteria (>2,500,500 unique fragments, FRIP >0.2, FRITSS > 0.2, and fraction of fragments overlapping ENCODE reference regions > 0.5) and had >500 genes detected were used for downstream analysis.

Selected scATAC-seq barcodes were further processed using ArchR (v1.0.1) for Iterative LSI, Clustering, Group Coverage computation, Reproducible Peak Set annotation, and sparse Peak Matrix generation using default settings according to the ArchR documentation (https://archrproject.com). We then used the Peak Matrix as input for an additional round of Iterative LSI based on these features (Peak LSI), then used the ArchR addUMAP function for two-dimensional UMAP projection.

Next, the scRNA-seq count matrix generated by cellranger-arc, the Peak Matrixand, Peak LSI data (for analysis), and UMAP projection, and Gene Score Matrix (for visualization) generated using ArchR and the ADT count matrix were filtered and combined as assays in a Seurat object (Seurat v4.0.0-beta), and dimensionality reduction for scRNA-seq and ADTs was carried out using parameters specific to each assay as described below.

For scRNA-seq, the matrix was normalized (NormalizeData function, default settings), variable features were identified (FindVariableFeatures function, defaults), the data was scaled (ScaleData function, defaults), PCA was performed (RunPCA function, defaults), and a UMAP projection was generated (RunUMAP, dims = 1:30). For label transfer, we performed SCTransform to normalize the data in order to match the Seurat Multimodal Reference Dataset for PBMCs (available from the Satija lab at https://atlas.fredhutch.org/data/nygc/multimodal/pbmc_multimodal.h5seurat, Hao et al., 2020) . Label transfer was then performed using the Seurat functions FindTransferAnchors and Trans-ferData as described in the Seurat v4 Vignettes. For consistency with other figures in the paper, we mapped the reference labels in the 'celltype.l2' categories onto the cell-type categories presented in the reference used for scATAC-seq label transfer, above.

For ADT counts, the data was normalized (NormalizeData function, scale.factor = 1000) and scaled (ScaleData function, defaults), then used directly as inputs for UMAP projection (RunUMAP function).

For scATAC-seq data, we imported the results from ArchR for presentation and analysis, described above.

In order to perform weighted nearest neighbors (WNN) analysis with input from all three assays above, we extended the functions provided by Seurat v4.0.0-beta , which utilize only two assays, to allow an arbitrary number of reduced dimension sets to be used as input. To do so, we performed within-modality and cross-modality predictions and computed cross-modal affinities as described by Y. Hao and S. Hao et al. We then generalized the computation of cell-specific modality affinity ratio, S, for each cell i, in each modality, a, using a small value for e (10 À4 ) as suggested by Y. Hao and S. Hao et al.

Here, x i and x i;knna represent a generic form of the actual and predicted values of the lowdimensionality representation of each cell in each modality, as described for RNA and protein in Y. Hao and S. Hao et al. (e.g ., r i and r i;knna for scRNA), and b is used to represent all modalities that are not the modality under consideration (e.g., ADT and scATAC-seq when a is RNA). The formula for a x i ; x i;knna À Á and a x i ; x i;knnb À Á is given in Y. Hao (Swanson, 2021 ; copy archived at swh:1:rev:d49409e847ba787cb483d12bd5712d1083f41ac1). In the specific case of TEA-seq integration, we used scRNA-seq PCA dimensions 1:30, ATAC Peak LSI dimensions 1:30, and all normalized and scaled ADT features passing QC criteria (1:38).

To identify peak-to-RNA and peak-to-protein correlations in TEA-seq data, we first generated lowdimensional embeddings based on joint analysis for RNA and ATAC data using Archr. We generated an ArchR project using all four of the TEA-seq wells generated and filtered, above. We used the 500 bp tile matrix generated by ArchR to perform LSI for ATAC data using addIterativeLSI. We then imported the RNA-seq data using import10xFeatureMatrix, added the matched RNA data to the project using addGeneExpressionMatrix, and reduced dimensions with addIterativeLSI with modified parameters (clusterParams = list(resolution = 0.2, sampleCells = 1e4, n.start = 10, varFeatures = 2500, firstSelection = 'variable', binarize = FALSE)). We then integrated LSI results using addCombined-Dims, then clustered cells using the combined LSI dimensions using addClusters. We then called peaks for each cluster using addGroupCoverages and addReproduciblePeakSet, before generating a per-cell peak matrix using addPeakMatrix. Next, we recomputed the ATAC-seq LSI based on the peak matrix with addIterativeLSI and generated a refined set of joint reduced dimensions in combination with the RNA LSI results with addCombinedDims. These combined, peak-based dimensions were used to identify peak-to-RNA correlations using addPeak2GeneLinks. For loop visualizations, we filtered the peak-to-RNA correlations using getPeak2GeneLinks with adjusted parameters (cor-CutOff = 0.05, FDRCutOff = 0.01, resolution = 1000, returnLoops = TRUE), and results were filtered for epitope antibody targets (Supplementary file 6). To compute peak-to-Protein correlations, we substituted the ADT count matrix for the RNA matrix by using the ADT matrix as an input for addG-eneExpressionMatrix, then re-ran the addPeak2Gene links and getPeak2GeneLinks functions with the same parameters listed, above. For comparisons shown in scatter plots, we used the base-specific (non-binned) findPeak2GeneLinks results stored in the ArchR object, and selected all peak-to-RNA or peak-to-Protein links identified for each gene with the same corCutOff and FDRCutOff parameters shown above.

Cross-modal comparison of TEA-seq, CITE-seq, and scATAC-seq Joint sample processing and aliquots

To allow direct comparison of TEA-seq data to CITE-seq, scATAC-seq, and 10x Multiome RNA + ATAC performed on unstained nuclei and permeabilized cells, we thawed leukapheresis-purified PBMCs and performed dead cell, debris, and neutrophil removal by FACS as described above. We then separated this pool into separate aliquots for antibody staining (for CITE-seq and TEA-seq), nuclei isolation (for 10x Multiome RNA + ATAC), and two separate cell permeabilization aliquots (for scATAC-seq and 10x Multiome RNA + ATAC). scATAC-seq was performed using 0.01% digitonin permeabilization as described above. We then stained the CITE-seq and TEA-seq aliquot using the same 46 antibody panel as described above for ICICLE-seq and TEA-seq, and divided this stained pool for whole-cell CITE-seq (described below) and TEA-seq. Finally, we permeabilized the TEA-seq portion of this aliquot to perform TEA-seq as described above. For 10x Multiome RNA + ATAC libraries, we either permeabilized cells with 0.01% digitonin as describe above or purified nuclei as described by the 10x Genomics Nuclei Isolation for Single Cell Multiome ATAC + Gene Expression Sequencing Demonstrated Protocol (CG000365 Rev B). Protocol steps following sample preparation for TEA-seq or Multiome ATAC + Gene Expression were performed as described above. For Multiome ATAC + Gene Expression on nuclei or permeabilized cells, we excluded steps for ADT processing. A diagrammatic overview of this experimental workflow is provided in Figure 4 -figure supplement 4a.

CITE-seq library generation was performed using 10x Genomics Single Cell 3 0 v3.1 kit, based on the 10 Â 3 0 Gene Expression Library Construction Guide (CG000204 Rev B) and the New York Genome Center Technology Innovation Lab CITE-seq protocol (version 2019-02-13). Briefly, a Chromium Next GEM Chip G (10x Genomics, 2000177) was placed in a Chromium Next GEM Secondary Holder and 50% glycerol was dispensed into all unused wells. A barcoding master mix was prepared, which consisted of RT Reagent B (10x Genomics, 2000165) , Template Switch Oligo (10x Genomics, 3000228), Reducing Agent B (10x Genomics, 2000087), and RT Enzyme C (10x Genomics, 2000085). The master mix was added to the sample well, pipette-mixed, and loaded into row 1 of the chip. Chromium Single Cell 3' v3.1 Gel Beads (10x Genomics, 2000164) were vortexed for 30 s and loaded into row 2 of the chip, along with Partitioning Oil (10x Genomics, 2000190) in row 3. A 10x Gasket (10x Genomics, 370017) was placed over the chip and attached to the Secondary Holder. The chip was loaded into a Chromium Single Cell Controller instrument for GEM generation. GEMs were separated into a biphasic mixture through addition of Recovery Agent (10x Genomics, 220016), the aqueous phase was retained and removed of barcoding reagents using Dynabeads MyOne SILANE beads. cDNA amplification was performed using the 10x Genomics cDNA Primers (2000089) on a C1000 Touch thermal cycler with 96-Deep Well Reaction Module: 53˚C for 45 min, 85˚C for 5 min, followed by a hold at 4˚C.

To separate ADT and cDNA libraries, we performed a 0.6Â bead:sample SPRIselect reagent size selection. The bead fraction from this separation was retained as the cDNA fraction, and supernatant separation was retained as the ADT-containing fraction. To further process ADTs, we added an additional 1.4Â SPRI volumes to the ADT-containing supernatant to a final ratio of 2Â SPRI:sample, then retained the bead fraction of this second step. We then repeated a fresh 2Â bead:sample cleanup, retaining the bead fraction as the ADT library. ADTs were amplified in a 100 mL reaction consisting of Buffer EB, KAPA HiFi HotStart ReadyMix, SI-PCR-Oligo primer (10 mM, Supplementary file 5), and ADT i7 primer (10 mM, Supplementary file 5). ADT PCR was performed in a C1000 Touch thermal cycler with 96-Deep Well Reaction Module: 95˚C for 3 min, 12 cycles of 95˚C for 20 s, 60˚C for 30 s, and 72˚C for 20 s, followed by a final extension of 72˚C for 5 min. SPRIselect reagent cleanups were performed with a 1.6Â bead:sample ratio to generate final ADT libraries.

cDNA Fragmentation, End-Repair, A-tailing, and indexed amplification were performed as described in the 10 Â 3 0 Gene Expression Library Construction Guide. These steps match the 10x Multiome guide and are described above in the TEA-seq library construction methods.

10x Multiome and TEA-seq library preparation As described above, unstained FACS-sorted PBMCs were used as input to nuclei and permeabilized cell Multiome ATAC plus Gene Expression library preparation. Nuclei were isolated according to 10x Genomics Demonstrated Protocol CG000365 Rev A exactly as recommended for cryopreserved PBMCs. Cells were permeabilized as described above. Libraries were prepared according to the 10x Genomics Chromium Next GEM Single Cell Multiome ATAC plus Gene Expression user guide CG000338 Rev A.

TEA-seq libraries were prepared as described previously in the methods above.

All Multiome ATAC plus Gene Expression and TEA-seq libraries were sequenced on an Illumina NovaSeq S4 v1.0 flowcell with the following read lengths: 50 bp read 1, 10 bp i7 index, 16 bp i5 index, 90 bp read 2. Following sequencing, read lengths were adjusted to the applicable length for each datatype (ATAC, gene expression, ADT) during demultiplexing. Demultiplexing of raw base call files into FASTQ files was performed using bcl2fastq2 (Illumina v2.20.0.422, parameters: -create-fastq-for-index-reads, -minimum-trimmed-read-length=8, -mask-shortadapter-reads=8, -ignore-missing-positions, -ignore-missing-filter, -ignoremissing-bcls, -r 24 w 24 p 80). The -use-bases-mask option was used to adjust the read lengths for each library type as follows: Y28n*,I10,I10n*,Y90n* for Gene Expression, Y50n*,I8n*,Y16, Y50n* for ATAC, and Y28n*,I8n*,n*,Y90n* for ADT.

CITE-seq libraries were sequenced on the Illumina NextSeq platform using a high-output flowcell with the following read lengths: 28 bp read 1, 8 bp i7 index, 91 bp read 2. Demultiplexing of CITEseq raw base call files into FASTQ files was performed using 10x Genomics cellranger mkfastq (v.3.0.2).

The scATAC library was sequenced on the Illumina NextSeq platform using a high-output flowcell with the following read lengths: 51 bp read 1, 8 bp i7 index, 16 bp i5 index, 51 bp read 2. Demultiplexing of raw base call files into FASTQ files was performed using 10x cellranger-atac mkfastq (10x Genomics v.1.1.0).

To perform comparisons between these methodologies, all scRNA-seq and scATAC-seq libraries were downsampled to 200 million FASTQ entries (200 million of each sequenced read;~20,000,000 reads per expected cell recovered at~10,000 cell expected recovery), and all ADT libraries were downsampled to 48 million reads (limited by the available read count for one experiment;~48,00,800 reads per expected cell recovered). For scATAC-seq data, we used cellranger-atac (v 1.2.0) to align the data to 10x Genomics reference refdata-cellranger-atac-GRCh38-1.2.0. For CITE-seq data, RNA alignment was performed using cellranger (v5.0.0) with the -includeintrons parameter against 10x Genomics reference refdata-gex-GRCh38-2020-A. For 10x Multiome and TEA-seq, we aligned RNA and ATAC data using cellranger-arc (v1.0.0) against 10x Genomics reference refdata-cellranger-arc-GRCh38-2020-A. To analyze CITE-seq and TEA-seq ADT libraries, we used BarCounter to compute the counts of ADT barcodes for each 10x cell barcode, as described for ICICLE-seq and TEA-seq, above.

After alignment or tabulation of each dataset, cell barcodes were filtered using modality and method-specific criteria based on inspection of the distribution of QC data for each library. Filtering was performed to remove low-quality barcodes and putative doublets. For transcription (RNA-seq data), we filtered barcodes based on number of genes detected per barcode: TEA-seq: >500 genes, <2,750,750 genes, 7098 barcodes retained; nuclei on 10x Multiome: >300 genes, <4,000,000 genes, 6717 barcodes retained; permeabilized cells on 10x Multiome: >300 genes, <2,750,750 genes, 8334 barcodes retained; CITE-seq: >500 genes, <5,000,000 genes, 7399 barcodes retained. For epitopes (ADT data), both TEA-seq and CITE-seq were filtered based on number of UMIs (>300 UMIs and <5000 UMIs): TEA-seq, 7661 barcodes retained; CITE-seq, 7493 barcodes retained. For accessibility (ATAC-seq data), barcodes were filtered using the same criteria as other scATAC-seq libraries, above (>1,000,000 unique fragments, FRIP > 0.2, FRITSS > 0.2, Fraction ENCODE overlap > 0.5), and doublets were removed using the ArchR package : TEA-seq: 6725 barcodes retained; nuclei on 10x Multiome: 6541 barcodes retained; permeabilized cells on 10x Multiome: 7238 barcodes retained; standalone scATAC-seq: 11,963 barcodes retained.

After filtering, we performed modality-specific analyses to compare the performance of each method in each separate modality. For transcription, scRNA-seq datasets were analyzed using the Seurat package (v.4.0-beta, Hao et al., 2020; Stuart et al., 2019) . Data was normalized using the NormalizeData function, features were selected using the FindVariableFeatures function, and data was then scaled using the ScaleData function. Scaled data for selected features was used as input for PCA to reduce dimensionality, then projected to two dimensions using the RunUMAP function on the first 30 principal component dimensions. All parameters were defaults. To label cells with PBMC cell types, we utilized the same scRNA-seq reference from the Satija lab that was used for scATAC-seq analysis, above. Label transfer was performed using FindTransferAnchors function (with the parameter reduction = 'cca'), followed by the TransferData function. Canonical correlation analysis was used instead of a PCA-based transfer because the scRNA-seq datasets were aligned using introns, while the reference was aligned using exons only. To compare gene detection across experimental methods, we computed the fraction of filtered cell barcodes with !1 UMI for each gene.

For epitope (ADT) data from filtered cell barcodes, we filtered features to remove antibodies that were similar to the Mouse IgG1k Isotype background control. For each cell barcode, we divided the UMI counts for each feature by the number of UMI counts for the control, and flagged features with 16-fold enrichment relative to the control. We then removed the following features that were flagged in >0.5% of cells in the CITE-seq dataset: CD269 (TNFRSF17), CD304 (NRP-1), CD66b (CEA-CAM8), CD80, and TCR Va24-Ja18. The TEA-seq dataset was filtered to remove these features to allow comparison between TEA-seq and CITE-seq data. After feature filtering, each cell barcode was normalized by dividing the UMI counts by the total UMI count, and multiplying by 10,000. Values for each feature were then scaled using the R scale function, and all features were used as input for two-dimensional UMAP projection using the uwot R package (Melville, 2020) .

For accessibility (scATAC-seq) data from filtered cell barcodes, we analyzed each dataset using the ArchR package (v1.0.1 Granja et al., 2020) . For each dataset, we performed dimensionality reduction using the addIterativeLSI function (with parameter varFeatures = 50,000), then generated two-dimensional UMAP projections using the addUMAP function. To label cells, we used the addGe-neIntegraitonMatrix function to compare the ArchR GeneScore matrix to the same scRNA-seq reference used for scRNA-seq, above.

PBMCs were depleted of neutrophils, dead cells, and debris through FACS as described above

ATAC PCR was performed in a C1000 Touch thermal cycler with 96-Deep Well Reaction Module: 98˚C for 45 s, seven cycles of 98˚C for 20 s, 54˚C for 30 s, and 72˚C for 20 s, followed by a final extension of 72˚C for 1 min. ADT fragments were amplified in a 100 mL reaction consisting of Buffer EB (Qiagen, 1014609), KAPA HiFi HotStart ReadyMix (KAPA Biosystems, KM2602), SI-P5-22 primer (10 mM, Supplementary file 5), and ADT i7 primer (10 mM, Supplementary file 5). ADT PCR was performed in a C1000 Touch thermal cycler with 96-Deep Well Reaction Module: 95˚C for 3 min, 15 cycles of 95˚C for 20 s, 60˚C for 30 s, and 72˚C for 20 s, followed by a final extension of 72˚C for 5 min. SPRIselect reagent cleanups were performed with a 1.2Â bead:sample ratio for ADT libraries and a dual-sided size selection of 0.4Â/1.2Â bead:sample ratio for ATAC libraries

Amplification was performed in a C1000 Touch thermal cycler with 96-Deep Well Reaction Module: 98˚C for 45 s, then nine cycles of 98˚C for 20 s, 67˚C for 30 s, and 72˚C for 20 s, with a final extension of 72˚C for 1 min

B23318) size-selection clean-up. Large cDNA fragments were retained in an initial 0.6Â bead:sample SPRIselect incubation, reactions were incubated on a magnet, and small unbound fragments containing ADTs were transferred to new wells where they were subjected to an additional 2.0Â bead:sample SPRIselect cleanup. Consistent with ICICLE-seq, ADT fragments were amplified in a 15 cycle indexing PCR, substituting SI-PCR-Oligo primer (10 mM, Supplementary file 5) for SI-P5-22. Final ADT libraries were prepared using a dual-sided 1.6Â bead:sample SPRIselect reagent

Amplification was performed in a C1000 Touch thermal cycler with 96-Deep Well Reaction Module: 98˚C for 45 s, for 14 cycles of 98˚C for 20 s, 54˚C for 30 s, and 72˚C for 20 s, with a final extension of 72˚C for 1 min. Final Gene Expression libraries were prepared using a dual-sided 0.6Â/0.8Â bead:sample SPRIselect reagent (Beckman Coulter, B23318) size-selection clean-up. A bench protocol for TEA-seq is available on protocols

Post-processing analysis of summary statistics and visualization of snATAC-seq, scATAC-seq, and ICICLE-seq was performed using R v.3.6.3 and greater (R Development Core Team, 2020) in the Rstudio IDE (RStudio Team, 2020; Integrated Development Environment for R) or using the Rstudio Server Open Source Edition as well as the following packages: for scATAC-seq-specific analyses and comparisons to scRNA-seq data

and reshape2 (Wickham, 2007); for data visualization, ggplot2 (Wickham, 2016) and cowplot (Wilke, 2018); for dimensionality reduction and clustering, igraph

2018); for manipulation of genomic region data, bedtools2 (Quinlan and Hall, 2010) and GenomicRanges (Lawrence et al., 2013); and for prediction of sequencing saturation, preseqR

Data and code availability Raw scATAC-seq and ICICLE-seq, and TEA-seq data are in the process of deposition to dbGaP for controlled access

Code related to processing, analysis, and visualization of data in this study is

Writingoriginal draft, Writing -review and editing

Formal analysis, Investigation, Visualization; Morgan DA Weiss, Investigation, Writing -review and editing

Supervision, Writing -review and editing

Supervision, Investigation, Project administration, Writing -review and editing

Conceptualization, Software, Formal analysis, Investigation, Visualization, Methodology, Writing -original draft, Writing -review

Conceptualization, Supervision, Funding acquisition, Writing -original draft, Project administration, Writing -review and editing Author ORCIDs Elliott Swanson

Ethics Human subjects: Biological specimens purchased from BioIVT were collected under BioIVT Protocol SERATRIALS-18002 (WIRB Protocol #20190318), and were provided as cryopreserved PBMCs. Specimens purchased from Bloodworks NW were collected under BloodWorks Protocol BT001 (WIRB Protocol #20141589) as freshly drawn whole blood. All sample collections were conducted by BioIVT and Bloodworks NW under these IRB protocols

Nuclear protein import in permeabilized mammalian cells requires soluble cytoplasmic factors

Chromatin compaction under mixed salt conditions: opposite effects of sodium and potassium ions on nucleosome array folding

Single-cell profiling of histone modifications in the mouse brain

Matrix: Sparse and Dense Matrix Classes and Methods

Dimensionality reduction for visualizing single-cell data using UMAP

matrixStats: Functions That Apply to Rows and Columns of Matrices (And to Vectors

Controlling the false discovery rate: a practical and powerful approach to multiple testing

Isolation of mononuclear cells and granulocytes from human blood. isolation of monuclear cells by one centrifugation, and of granulocytes by combining centrifugation and sedimentation at 1 g

Single-cell chromatin accessibility reveals principles of regulatory variation

Joint profiling of chromatin accessibility and gene expression in thousands of single cells

fastp: an ultra-fast all-in-one FASTQ preprocessor

A rapid and robust method for single cell chromatin accessibility profiling

Joint single-cell DNA accessibility and protein epitope profiling reveals environmental regulation of epigenomic heterogeneity

High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell

Enzymac characterization and lipid composition of rat liver subcellular membranes

Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution

An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues

The igraph software package for complex network research

A Single-Cell atlas of in Vivo Mammalian Chromatin Accessibility

Predicting the molecular complexity of sequencing libraries

Applications of species accumulation curves in large-scale biological data analysis

Estimating the number of species to attain sufficient representation in a random sample

Evaluating daratumumab in the treatment of multiple myeloma: safety, efficacy and place in therapy

Methods for high-dimensional analysis of cells dissociated from cryopreserved synovial tissue

Extension of 'data

Cyclic ADP-Ribose and NAADP: Structures, Metabolism and Functions

ArchR: an integrative and scalable software package for single-cell chromatin accessibility analysis

scrattch.vis. Software Heritage. swh:1:rev:aa9094fb3d92f05264c9aa0911eb4c4967862609

Integrated analysis of multimodal single-cell data

Dimensionality reduction for scATAC data

The UCSC genome browser database: update

Crude subcellular fractionation of cultured mammalian cell lines

Fast gapped-read alignment with bowtie 2

Droplet-based combinatorial indexing for massive-scale single-cell chromatin accessibility

Massively parallel single-cell mitochondrial DNA genotyping and chromatin profiling

From haematopoietic stem cells to complex differentiation landscapes

Software for computing and annotating genomic ranges

UpSet: visualization of intersecting sets

Identification of transcription factor binding sites using ATAC-seq

ScOpen: chromatin-accessibility estimation of single-cell ATAC data

Osmotic modulation of chromatin impacts on efficiency and kinetics of cell fate modulation

Chromatin potential identified by shared single cell profiling of RNA and chromatin

Single-cell RNA-seq with spike-in cells enables accurate quantification of cell-specific drug effects in pancreatic islets

UMAP: uniform manifold approximation and projection

The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data

Uwot: The Uniform Manifold Approximation and Projection (UMAP) Method for Dimensionality Reduction

Index and biological spectrum of human DNase I hypersensitive sites

Scalable multimodal profiling of chromatin accessibility and protein levels in single cells

Improved single-cell ATAC-seq reveals chromatin dynamics of in vitro corticogenesis

High-content single-cell combinatorial indexing

BEDTools: a flexible suite of utilities for comparing genomic features

R: A Language and Environment for Statistical Computing

RStudio: Integrated Development Environment for

Daratumumab: a first-in-class CD38 monoclonal antibody for the treatment of multiple myeloma

Massively parallel single-cell chromatin landscapes of human immune cell development and intratumoral T cell exhaustion

Simultaneous epitope and transcriptome measurement in single cells

Comprehensive integration of Single-Cell data

2021. seurat. Software Heritage. swh:1:rev:d49409e847ba787cb483d12bd5712d1083f41ac1. https:// archive.softwareheritage.org/swh:1:dir:23b245ae7edc4628526e3c0de51faa9e5c8418f8

GNU Parallel: The Command-Line Power Tool

Deep learning Decodes the principles of differential gene expression

Shared and distinct transcriptomic cell types across neocortical areas

Critical roles for CCR2 and MCP-3 in monocyte mobilization from bone marrow and recruitment to inflammatory sites

Non-Redundant TF Motif Matches Genome-Wide : Altius Institute for Biomedical Sciences

Joint profiling of chromatin accessibility and CAR-T integration site analysis at population and single-cell levels

Determination and Inference of Eukaryotic Transcription Factor Sequence Specificity

Reshaping Data with the reshape Package

ggplot2: elegant graphics for data analysis

Cowplot: Streamlined Plot Theme and Plot Annotations for "Ggplot2

SCALE method for single-cell ATAC-seq analysis via latent feature extraction

Model-based analysis of ChIP-Seq (MACS)

A human circulating immune cell landscape in aging and COVID-19

The authors thank Nina Kondza, Tanja Smith, Leila Shiraiwa, and Ernie Coffey for operational support, Olivia Fong for data management support, and the Allen Institute for Immunology Software Development team for computational support. The authors thank the Allen Institute founder, Paul G. Allen, for his vision, encouragement, and support.

The following dataset was generated: 

Cara Lord: Cara Lord is affiliated with GlaxoSmithKline. The author has no financial interests to declare. The other authors declare that no competing interests exist.

No external funding was received for this work.

Supplementary files . Supplementary file 1. Antibody-fluorophore conjugates used to characterize peripheral blood mononuclear cell cell-type abundance by flow cytometry before and after anti-CD15 bead-based neutrophil depletion.. Supplementary file 2. Quantification of major cell populations by flow cytometry before (pre-depletion) and after (post-depletion) anti-CD15 bead-based neutrophil depletion of either Ficoll-purified or leukapheresis-purified peripheral blood mononuclear cells. The gating strategy used to assess these populations is presented in Figure 1 -figure supplement 4.. Supplementary file 3. Antibody-fluorophore conjugates used to assess peripheral blood mononuclear cell cell-type populations for evaluation of cell-type labeling.. Supplementary file 4. Quantification of peripheral blood mononuclear cell cell-type populations using the antibody panel provided in Supplementary file 3. The gating strategy used to assess these populations is presented in Figure 2 -figure supplement 1. Type labels and proportions used for comparisons of scATAC-seq cell-type labeling in Figure 2d are provided in the last two columns. When two gated populations are combined to tabulate a single cell type, the same cell-type label is listed beside each gate. Cell-type proportions are listed only at the first instance of each cell type.. Supplementary file 5. Custom oligonucleotide sequences used to generate ICICLE-seq libraries.. Supplementary file 6. Antibody-oligo conjugates (BioLegend TotalSeq-A) used for ICICLE-seq and TEA-seq experiments to label peripheral blood mononuclear cell cell types.. Supplementary file 7. Sources and presentation of human biological specimens utilized in this study.. Transparent reporting form Data availability Raw scATAC-seq and ICICLE-seq, and TEA-seq data are deposited dbGaP for controlled access (dbGaP Study Accession phs002316.v1.p1). Processed data has been deposited in GEO (Accession Number GSE158013).