key: cord-0322144-gqsrxul6
authors: Mylka, Viacheslav; Aerts, Jeroen; Matetovici, Irina; Poovathingal, Suresh; Vandamme, Niels; Seurinck, Ruth; Hulselmans, Gert; Van Den Hoecke, Silvie; Wils, Hans; Reumers, Joke; Van Houdt, Jeroen; Aerts, Stein; Saeys, Yvan
title: Comparative analysis of antibody- and lipid-based multiplexing methods for single-cell RNA-seq
date: 2020-11-17
journal: bioRxiv
DOI: 10.1101/2020.11.16.384222
sha: ffdc50d239f8e531dc159993723c0cbc176a632e
doc_id: 322144
cord_uid: gqsrxul6

Multiplexing of samples in single-cell RNA-seq studies allows a significant reduction of experimental costs, straightforward identification of doublets, increased cell throughput, and reduction of sample-specific batch effects. Recently published multiplexing techniques using oligo-conjugated antibodies or lipids allow sample-specific barcoding of cells, a process called ‘hashing’. Here, we compare the hashing performance of TotalSeq-A and -C antibodies, custom synthesized lipids and MULTI-seq lipid hashes in four cell lines, both for single-cell RNA-seq and single-nucleus RNA-seq. Hashing efficiency was evaluated using the intrinsic genetic variation of the cell lines. Benchmarking of different hashing strategies and computational pipelines indicates that correct demultiplexing can be achieved with both lipid- and antibody-hashed human cells and nuclei, with MULTISeqDemux as the preferred demultiplexing function and antibody-based hashing as the most efficient protocol on cells. Antibody hashing was further evaluated on clinical samples using PBMCs from healthy and SARS-CoV-2 infected patients, where we demonstrate a more affordable approach for large clinical single-cell sequencing studies that simultaneously reduces batch effects.

Recent advances in single-cell and single-nucleus RNA sequencing (scRNA-seq and snRNA-seq) have had an unprecedented impact on our understanding of heterogeneous cell populations (Aizarani et al. 2019; Davie et al. 2018; Han et al. 2018; Van Hove et al. 2019; Lake et al. 2016; Schaum et al. 2018).

Current scRNA-seq experiments make it possible to routinely assay many thousands of cells at once, with recent datasets reporting hundreds of thousands to millions of cells (Davie et al. 2018; Park et al. 2020; Schaum et al. 2018). In standard single-cell workflows, individual samples need to be processed in parallel, which limits throughput, increases reagent costs and can introduce batch effects. Several approaches for multiplexing have recently been described, either exploiting pre-existing genetic diversity (Kang et al. 2018) or introducing sample-specific barcodes through oligo-labelled antibodies (Stoeckius et al. 2018), oligo-labelled lipid anchors (McGinnis et al. 2019), chemical labelling with oligos (Gehring) or genetic cell labelling (Guo et al. 2019). Labelling cells or nuclei with sample-specific barcodes before pooling and single-cell compartmentalization, a technique called 'hashing', allows accurate detection of two (doublets) or more (multiplets) cells that originate from different samples but are captured in the same compartment, which inevitably occurs in standard single-cell workflows.

Because such multiplets can be identified and removed, a barcoding multiplexing paradigm allows users to drastically increase the number of cells or nuclei loaded per reaction, which in turn decreases the per-cell library preparation cost.
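The benefit of hashing for doublet detection can be made concrete with simple arithmetic. The sketch below is our own illustration, not part of the original analysis, and it assumes that doublets form by drawing two cells independently from the pool. A doublet then escapes hashtag-based detection only when both cells come from the same sample, which happens with probability equal to the sum of p_i squared, where p_i is the fraction of pooled cells contributed by sample i. The overall doublet rate in the example is hypothetical.

```python
"""Illustrative multiplet arithmetic for sample hashing.

Assumption (not from the study): doublets form by drawing two cells
independently from the pooled suspension.
"""


def detectable_doublet_fraction(sample_fractions):
    """Fraction of doublets whose two cells come from different samples."""
    total = sum(sample_fractions)
    if total <= 0:
        raise ValueError("sample fractions must sum to a positive value")
    probs = [f / total for f in sample_fractions]
    # Probability that both cells of a doublet carry the same hashtag.
    same_sample = sum(p * p for p in probs)
    return 1.0 - same_sample


def undetected_doublet_rate(overall_doublet_rate, sample_fractions):
    """Share of droplets that are same-sample doublets, invisible to hashing."""
    return overall_doublet_rate * (1.0 - detectable_doublet_fraction(sample_fractions))


if __name__ == "__main__":
    four_lines = [1, 1, 1, 1]  # four equally pooled samples, e.g. four cell lines
    print(detectable_doublet_fraction(four_lines))  # 0.75
    # Hypothetical 20% overall doublet rate after loading many more cells.
    print(undetected_doublet_rate(0.20, four_lines))
```

With four equally represented samples, three quarters of all doublets carry two different hashtags; unequal pooling lowers that fraction, so balanced pools make super-loading safer.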

The development of oligo-labelled antibodies directed against cell surface proteins for sample multiplexing is a direct evolution of the Abseq (Shahi et al. 2017), REAP-seq (Peterson et al. 2017) and CITE-seq (Stoeckius et al. 2017) protocols. One of the most widely used methods to date for detecting cell surface epitopes combines TotalSeq antibodies from Biolegend with the scRNA-seq technologies from 10x Genomics. Several types of TotalSeq antibodies are available for cell labelling. TotalSeq-A antibodies contain a poly-A sequence mimicking a natural mRNA and are designed to work with any sequencing platform that relies on poly-dT oligonucleotides for mRNA capture, whereas TotalSeq-B and TotalSeq-C antibodies contain capture sequences compatible with the 10x Genomics 3' scRNA-seq (v3 or v3.1) and 5' scRNA-seq workflows, respectively. For human samples, the pre-mixed TotalSeq hashtag reagents recognize the cell surface markers CD298 and β2-microglobulin. The success of antibody-based hashing depends on the ubiquitous expression of these target antigens, which can be problematic for some samples or species (Garrido 2019; Hermiston, Xu, and Weiss 2003) and limits the sample-agnostic, universal application of this method. An elegant way to overcome this limitation is the use of lipid anchors, which are antigen independent and insert universally into the cell or nucleus membrane, irrespective of sample type (McGinnis et al. 2019). Both antibody-based and lipid-based methods are simple, straightforward and generally applicable to a wide range of single-cell applications and platforms, whereas genetic cell labelling and chemical labelling with oligos tend to be more challenging. In terms of labour intensity, both hashing methods are comparable. It is still unclear which method separates samples most accurately on the basis of the inserted hashtags.

In this study, we compared antibody-based and lipid-based sample barcoding methods by multiplexing four distinct human cancer cell lines. Because these cell lines carry intrinsic genetic variation, demultiplexing by genotype serves as a 'ground truth' and allows us to determine the hashing accuracy of each method.

Single-cell suspensions can only be prepared from fresh tissues, which is a major roadblock for analysing clinical samples, archived materials and tissues such as the brain, from which cells cannot be readily extracted (Grindberg et al. 2013; Habib et al. 2016; Lake et al. 2016). To overcome these limitations, single nuclei can be extracted and analysed in a workflow similar to the standard single-cell workflow (Habib et al. 2017). We therefore included nuclei samples in our comparison of hashing methods.

Finally, we evaluated the hashing accuracy of TotalSeq-A antibodies on human PBMCs from different healthy and diseased individuals, a setting of direct relevance to single-cell sequencing in clinical studies.

MCF7, PC3, DU145 and MDA-MB-231 cells were maintained according to standard procedures in RPMI-1640 (Gibco, #21875034), F-12K (LGCstandards, ATCC 30-2004), EMEM (LGCstandards, ATCC 30-2003) and DMEM:F-12 (LGCstandards, ATCC 30-2006) medium, respectively, supplemented with 10% fetal bovine serum (Gibco, #10082147) and 1% penicillin/streptomycin (100 U/ml and 100 μg/ml, respectively, Gibco, #15140122) at 37 °C with 5% CO2. The RPMI-1640 medium used to culture the MCF7 cells was additionally supplemented with 0.01 mg/ml human recombinant insulin (Sigma Aldrich, #I3536).

To prepare single-cell suspensions of the cultured cell lines, the culture medium was removed and the cells were washed with 1X PBS. The cells were then trypsinized (0.05% trypsin-EDTA, Gibco, #25300054) and pelleted at 200 x g for 5 min. The cells were resuspended in 2 ml of culture medium, gently mixed with 8 ml of 1X PBS and put on ice. Afterwards, the cells were pelleted at 200 x g for 5 min at 4 °C and resuspended in 1 ml 1X PBS with 0.04% BSA. Cells were passed through a 40 µm cell strainer (Corning, #CLS431750-50EA) and counted with a LUNA FL counter, using 1 μl of acridine orange and propidium iodide dye (F23001, Westburg).

Single-nucleus suspensions were prepared using a modified nuclei isolation protocol from 10x Genomics (demonstrated protocol: isolation of nuclei from single cell suspensions, #CG000124, revD). We replaced the original lysis buffer with Nuclei EZ Prep buffer (Sigma Aldrich, #NUC101-1KT), used a total lysis time of 5 min and otherwise proceeded according to the referenced protocol. Loss of cell viability, as a proxy for nuclei extraction quality, and the number of nuclei were assessed with a LUNA FL counter, using 1 μl of acridine orange and propidium iodide dye (F23001, Westburg).

Cell and nuclei labelling were performed according to an adapted Biolegend cell hashing protocol (TotalSeq™-A Antibodies and Cell Hashing with 10x Single Cell 3' Reagent Kit v3 3.1 Protocol, Biolegend). Briefly, cells were incubated for 10 min with Fc receptor block (PN 422301, TruStain FcX, BioLegend). Nuclei were labelled with oligo-tagged anti-nucleoporin antibodies (#Mab414, Biolegend) (Sup. Table 1). MCF7, PC3, DU145 and MDA-MB-231 nuclei (840,000 of each line) were labelled with 0.5 µg TotalSeq-A0456, -A0457, -A0458 and -A0459, respectively, in 2% BSA solution with 0.02% Tween-20, 10 mM Tris, 146 mM NaCl, 1 mM CaCl2 and 21 mM MgCl2 (Gaublomme et al. 2019). After the last wash, nuclei were resuspended in the same labelling buffer with a lower BSA concentration (1%) and pooled prior to loading on chip B from 10x Genomics.

The DNA library preparation was performed according to the manufacturer's guidelines: the CG000186 Rev A protocol for 5' 10x Genomics chemistry and the CG000185 Rev B protocol for 3' 10x Genomics (v3 chemistry). The anchor and co-anchor of the commercial LMOs are available through Integrated DNA Technologies. The anchor and co-anchor of the CMOs were conjugated to cholesterol at the 3′ or 5′ ends via a triethylene glycol (TEG) linker and are also commercially available from Integrated DNA Technologies (Sup.

The generated libraries were pooled targeting 85% gene expression (mRNA) and 15% hashtag oligo (HTO) library reads and paired-end sequenced on individual lanes of a HiSeq4000 (Illumina) instrument, using the following read lengths: 28 bp Read1, 8 bp I7 Index and 91 bp Read2.
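Because the gene expression and hashtag libraries share a lane in a fixed ratio, the expected hashtag depth per cell follows from three numbers. The sketch below assumes that reads are split exactly in the targeted proportion and spread evenly across cells; the lane output and cell count in the example are hypothetical, not values from this study.

```python
def hto_reads_per_cell(lane_reads, hto_fraction, n_cells):
    """Mean hashtag (HTO) reads per cell for a given library split."""
    if not 0.0 <= hto_fraction <= 1.0:
        raise ValueError("hto_fraction must lie between 0 and 1")
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    return lane_reads * hto_fraction / n_cells


def lane_reads_needed(target_hto_reads_per_cell, hto_fraction, n_cells):
    """Total reads needed so that the HTO library reaches a per-cell target."""
    if not 0.0 < hto_fraction <= 1.0:
        raise ValueError("hto_fraction must be in (0, 1]")
    return target_hto_reads_per_cell * n_cells / hto_fraction


if __name__ == "__main__":
    # Hypothetical lane yielding 300 million reads, 10,000 captured cells and
    # the 85% / 15% gene expression / HTO split targeted here.
    print(hto_reads_per_cell(300e6, 0.15, 10_000))
    # Reads required for ~400 HTO reads per cell at the same split.
    print(lane_reads_needed(400, 0.15, 10_000))
```

In practice the realized split deviates from the target, so the per-cell HTO depth reported in Table 1 should be read from the sequencing output rather than derived this way.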

PBMCs from 3 healthy control individuals and 3 SARS-CoV-2 patients were isolated as follows. Whole 

The raw sequencing data were demultiplexed and further converted into a single-cell level gene 

The four cell lines were deconvoluted with the genotype-free tool freemuxlet, an extension of demuxlet (Kang et al. 2018). GRCh38 genetic variant data (Auton et al. 2015) were adapted to the index used for mapping, and variants with an allele frequency > 0.9 or < 0.1 were discarded. Next, freemuxlet with default parameters was used to determine sample identity and identify doublets.
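The allele-frequency filter described above can be sketched in a few lines. This is a hypothetical stand-alone illustration that reads plain-text VCF records and keeps sites whose population allele frequency lies between 0.1 and 0.9. Reading the frequency from the INFO `AF` field and skipping multi-allelic sites are assumptions about the reference VCF, not steps documented in the study, and the adaptation to the mapping index is not shown.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'AC=3;AF=0.12' into a dict."""
    entries = {}
    for item in info_field.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            entries[key] = value
        elif item:
            entries[item] = True  # flag-type INFO entry without a value
    return entries


def keep_variant(record, low=0.1, high=0.9):
    """True if the site's allele frequency lies within [low, high]."""
    fields = record.rstrip("\n").split("\t")
    alt, info = fields[4], fields[7]
    if "," in alt:  # assumption: skip multi-allelic sites
        return False
    af = parse_info(info).get("AF")
    if af is None or af is True:
        return False  # no usable allele frequency annotation
    return low <= float(af) <= high


def filter_vcf(records, low=0.1, high=0.9):
    """Yield header lines unchanged and only the retained variant records."""
    for record in records:
        if record.startswith("#"):
            yield record
        elif keep_variant(record, low, high):
            yield record
```

Near-fixed variants (AF > 0.9) and rare variants (AF < 0.1) carry little information for distinguishing a handful of samples, which is the usual motivation for this kind of filter.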

To assign hashtags to each cell, two strategies implemented in the Seurat 3.1.4 package, the HTODemux and MULTISeqDemux functions, were used. Importantly, the tested Seurat demultiplexing functions use antibody- or lipid-derived hashtag oligo (HTO) expression data for cell line annotation, whereas freemuxlet uses transcriptome data and a Bayes factor approach to evaluate the likelihood of a hashed droplet being a doublet and to estimate its cell line of origin. Freemuxlet uses an algorithm similar to that of demuxlet (Kang et al. 2018) but does not require externally genotyped data (popscle package).
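To make hashtag-based classification concrete, here is a simplified sketch in the spirit of Seurat's HTODemux: for each hashtag, a negative binomial distribution is fitted to background counts, cells above its `positive.quantile` are called positive, and each cell is labelled by how many hashtags it is positive for. It is not Seurat's implementation. The clustering step that defines background cells is replaced by the assumption that the lowest `background_fraction` of cells for each hashtag are background, the fit uses the method of moments (falling back to a Poisson when counts are not overdispersed), and all function names are hypothetical.

```python
from math import exp, lgamma, log


def fit_background(counts):
    """Method-of-moments fit: returns (mean, size); size None means Poisson."""
    n = len(counts)
    mu = sum(counts) / n
    var = sum((c - mu) ** 2 for c in counts) / (n - 1) if n > 1 else 0.0
    size = mu * mu / (var - mu) if var > mu else None
    return mu, size


def log_pmf(k, mu, size):
    """Log P(k) under NB(mean=mu, size), or Poisson(mu) if size is None."""
    if size is None:
        return -mu + k * log(mu) - lgamma(k + 1)
    p = size / (size + mu)
    return (lgamma(k + size) - lgamma(size) - lgamma(k + 1)
            + size * log(p) + k * log(1.0 - p))


def background_quantile(q, mu, size, max_k=1_000_000):
    """Smallest count k whose cumulative background probability reaches q."""
    if mu <= 0:
        return 0
    cdf = 0.0
    for k in range(max_k + 1):
        cdf += exp(log_pmf(k, mu, size))
        if cdf >= q:
            return k
    return max_k


def classify_cells(counts, names, positive_quantile=0.99, background_fraction=0.5):
    """Label each cell with a hashtag name, 'Doublet' or 'Negative'.

    counts: one list of raw HTO UMI counts per cell, ordered like `names`.
    """
    cutoffs = []
    for j in range(len(names)):
        column = sorted(row[j] for row in counts)
        n_bg = max(2, int(len(column) * background_fraction))
        mu, size = fit_background(column[:n_bg])  # assumed background cells
        cutoffs.append(background_quantile(positive_quantile, mu, size))
    labels = []
    for row in counts:
        hits = [name for x, cut, name in zip(row, cutoffs, names) if x > cut]
        if not hits:
            labels.append("Negative")
        elif len(hits) == 1:
            labels.append(hits[0])
        else:
            labels.append("Doublet")
    return labels


if __name__ == "__main__":
    hto_counts = [[200, 2], [180, 4], [5, 190], [3, 210],
                  [4, 3], [6, 5], [150, 170], [2, 6]]
    print(classify_cells(hto_counts, ["HTO1", "HTO2"]))
    # ['HTO1', 'HTO1', 'HTO2', 'HTO2', 'Negative', 'Negative', 'Doublet', 'Negative']
```

Lowering `positive_quantile` lowers each cutoff, so fewer cells fall into "Negative"; this is the lever that matters when hashtag signal is weak.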

One of the first characteristics evaluated for all hashing techniques in our study was the expression level of each HTO. When the expression of each antibody- or lipid-derived hashtag oligo was overlaid on the Uniform Manifold Approximation and Projection (UMAP) derived from the gene expression data, we observed a distinct barcode for each of the four cell line clusters. Cells with more than one HTO generally grouped together between the large clusters, suggesting that these groups of cells are indeed doublets/multiplets (Figure 2A). Each cell cluster expresses an intrinsic gene marker relevant to its cell line (Sup. Fig. 1). These findings are corroborated when each cluster is identified on the basis of inherent genetic variation using freemuxlet (Figure 2B), which clearly separates the four cell lines and doublets. We also evaluated sample multiplexing on the same cells using 5' scRNA-seq chemistry and compatible TotalSeq-C hashing antibodies (Figure 3). Cells demultiplexed using MULTISeqDemux showed high concordance with the freemuxlet-based annotation of singlets (95% for TotalSeq-A and 96% for TotalSeq-C hashing) (Figure 2B, 3B and Table 2). The custom lipid (LMO) and MULTI-Seq lipid hashing techniques reached 68.5% and 84.9% hashing efficiency, respectively (Table 2) (Figure 2D, 3D). With antibody hashing, "Negatives" had approximately 2-fold fewer unique molecular identifiers (UMIs) and 3-fold fewer genes per cell than singlets (Figure 2D, 3D), whereas for the lipid hashing (LMO) experiments this ratio was only about 1.1 for both the number of UMIs and genes per cell (Figure 4D, 5D). These differences arose even though the hashtag libraries of at least the "TotalSeq-A cell" and "custom LMO cell" samples were sequenced to the same depth, around 400 hashtag reads per cell (Table 1). Notably, the lower hashing efficiency obtained with lipids compared to TotalSeq antibodies coincides with a higher number of cells in the lipid hashing samples (Table 1, Sup. Fig. 2). Nevertheless, a repeated TotalSeq-A experiment with an increased number of cells (17611) still demonstrated better hashing efficiency (90.9%) than MULTI-Seq lipid hashing (84.9%, 16827 cells) (Table 1 and 2). Remarkably, the number of "Negatives" was relatively higher in DU145 cells hashed with both types of lipid hashes (custom and MULTI-seq) (Figure 4C, 5C), in line with the relatively low hashtag expression in this cell line (Figure 4A, 5A). Table 4).
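The concordance values above can be computed as the share of genotype-defined singlets whose hashtag call maps to the same sample. The sketch below follows that reading; the label strings ('Doublet', 'Ambiguous', 'Negative'), the dictionary inputs and the function name are assumptions for illustration. In practice the genetic calls would come from freemuxlet output and the hashtag calls from MULTISeqDemux or HTODemux.

```python
def hashing_efficiency(hashtag_calls, genetic_calls, hto_to_sample,
                       non_singlet=("Doublet", "Ambiguous"), exclude=()):
    """Fraction of genotype-defined singlets whose hashtag call agrees.

    hashtag_calls: cell barcode -> hashtag label ('HTO1', 'Negative', ...)
    genetic_calls: cell barcode -> sample name, or a label in `non_singlet`
    hto_to_sample: hashtag label -> sample stained with that hashtag
    exclude: sample names left out of the calculation entirely
    """
    total = matched = 0
    for cell, truth in genetic_calls.items():
        if truth in non_singlet or truth in exclude:
            continue  # only genotype singlets of included samples count
        total += 1
        # Negatives, doublets and missing calls map to no sample: a miss.
        if hto_to_sample.get(hashtag_calls.get(cell)) == truth:
            matched += 1
    return matched / total if total else float("nan")
```

For example, with four genotype singlets of which two receive the matching hashtag, the function returns 0.5; the `exclude` argument mirrors calculations in which one hashtag is left out.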

Regarding the performance of the Seurat demultiplexing functions, we noticed that the MULTISeqDemux function (autoThresh = TRUE) overall correctly deconvoluted more cells than the HTODemux function (default parameters) (Table 2). 11).
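The `autoThresh` option can be understood as a search over quantiles. The sketch below is a simplified illustration of the idea behind MULTI-seq classification rather than Seurat's code: each barcode's threshold is placed a fraction `q` of the way between its low (background) and high (positive) modes, and the `q` yielding the largest share of singlets is kept. Mode estimation (done with kernel density estimation in MULTI-seq) and the iterative removal of negatives are omitted, so the modes are passed in directly. The 0.1 to 0.9 grid in steps of 0.05 mirrors what we understand Seurat's default `qrange` to be; treat it as an assumption.

```python
def classify(norm_counts, names, low_modes, high_modes, q):
    """Label cells using one threshold per barcode: low + q * (high - low)."""
    thresholds = [lo + q * (hi - lo) for lo, hi in zip(low_modes, high_modes)]
    labels = []
    for row in norm_counts:
        hits = [name for x, t, name in zip(row, thresholds, names) if x >= t]
        if not hits:
            labels.append("Negative")
        elif len(hits) == 1:
            labels.append(hits[0])
        else:
            labels.append("Doublet")
    return labels


def auto_threshold(norm_counts, names, low_modes, high_modes, qrange=None):
    """Return (best_q, labels) for the q that maximises the singlet share."""
    if qrange is None:
        qrange = [round(0.1 + 0.05 * i, 2) for i in range(17)]  # 0.1 ... 0.9
    best_q, best_labels, best_share = None, None, -1.0
    for q in qrange:
        labels = classify(norm_counts, names, low_modes, high_modes, q)
        share = sum(lab not in ("Negative", "Doublet") for lab in labels) / len(labels)
        if share > best_share:  # strict '>' keeps the first q on ties
            best_q, best_labels, best_share = q, labels, share
    return best_q, best_labels
```

Optimising the singlet share adapts the cutoff to each dataset, which is one plausible reason why an automatically thresholded function can recover more correctly labelled cells than a fixed quantile.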

In addition to comparing hashing techniques on cells, we also estimated the expression levels of CD298 and β2-microglobulin in different human cells (mainly cancer cell lines) using flow cytometry. Both antigens are targeted by the human TotalSeq-A and -C hashing antibodies, and for their detection we used flow cytometry antibodies with the same clones. All 17 tested human cell types, including PBMCs, HEK293A and THP-1 cells, express both antigens, potentially enabling antibody-based hashing of these cells (Sup. Fig. 7). In conclusion, the TotalSeq-A and -C hashing antibodies provide excellent hashing capabilities for a wide range of human sample types.

A. Gene-cell matrices were generated using CellRanger, followed by log-transformation of gene UMI counts and cell clustering (gene expression, Principal Component Analysis (PCA) reduction) using Seurat. The hashtag UMI counts were CLR-transformed (centered log-ratio) and visualised in blue on the gene expression UMAP plots. B. Cell annotation (4 cell lines) was performed using freemuxlet (gene expression) or Seurat (MULTISeqDemux function applied to hashtag count data) and visualized on the gene expression UMAP plots. C. MULTISeqDemux-annotated singlets (MCF7, PC3, DU145 or MDAMB231) and negatives (cells with background expression for each hashtag) were matched with the freemuxlet-annotated cells (MCF7, PC3, DU145 or MDAMB231). The remaining (unmatched) freemuxlet-annotated singlets were assigned as doublets, and all groups were visualized in barplots. D. Gene expression values were log-transformed using Seurat, and detected genes (left plot), UMIs (middle plot) and the percentage of mitochondrial gene expression (right plot) per cell were visualised as violin-box plots with median values highlighted in red, across MULTISeqDemux-annotated groups (singlets, doublets and negatives on the basis of hashtag expression).
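For reference, the textbook centred log-ratio transform of the hashtag counts of one cell is shown below, with K hashtags and a pseudocount (written here as 1) to handle zero counts. Seurat's CLR normalization is a pseudocount-based variant of this idea and can be applied per cell or per hashtag depending on its `margin` argument, so the plotted values may differ from this idealized form.

$$
\mathrm{CLR}(x_{c,h}) = \ln\left(x_{c,h} + 1\right) - \frac{1}{K}\sum_{k=1}^{K} \ln\left(x_{c,k} + 1\right)
$$

Here x_{c,h} is the UMI count of hashtag h in cell c. A cell dominated by one hashtag obtains a large positive value for that hashtag and values near or below zero for the others, which is what makes single-hashtag clusters stand out on the UMAP overlays.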

For each of the four cell lines, we extracted the nuclei to compare antibody-based and lipid-based hashing efficiency. Nuclei hashing using lipids (cholesterol-modified oligos, CMO) and TotalSeq-A antibodies demonstrated an overall lower signal-to-noise ratio of hashtag expression compared to TotalSeq antibody cell hashing (Figure 6A, 7A vs Figure 2A, 3A). Nevertheless, nuclei hashed with cholesterol oligos and demultiplexed using MULTISeqDemux showed 84% concordance with the reference annotation by freemuxlet (singlets correctly assigned to one of the four cell lines), or 79% when using the HTODemux function (Table 2). TotalSeq-A-based nuclei hashing was less accurate (50% of all freemuxlet-annotated singlets for MULTISeqDemux and 54% for the HTODemux function), mainly due to the high number of nuclei recognized as "Negatives" (cells with a background signal for each hashtag) by the demultiplexing functions used in this benchmarking study (Figure 7B, 7C). This was especially prominent in the MCF7 cell cluster (Figure 7B, 7C). However, the hashing efficiency could be improved by adjusting the parameters of the demultiplexing functions. For example, changing the "positive.quantile" parameter of the HTODemux function from its default value of 0.99 to 0.9 increased the hashing efficiency of the TotalSeq antibody nuclei dataset by more than 10 percentage points (from 54% to 65%) (Sup. Fig. 6). We also observed that the large number of negatives in the MCF7 cluster (Figure 7B) is more likely explained by an issue with the "hashtag 1" antibody: when other cells (DU145) were stained with antibodies from the same vial in a repeated experiment, many DU145 cells were also recognized as "Negatives" by Seurat (Sup. Fig. 11B). It is also worth mentioning that in this repeated "TotalSeq A nuclei rep2" experiment we reduced the number of captured nuclei to 9868 (compared to 23451 nuclei in the first experiment). This potentially improved the hashing efficiency, resulting in a higher number of correctly annotated singlets using MULTISeqDemux (63% versus 50% in the first "TotalSeq A nuclei" experiment) (Sup. Fig. 2).
Importantly, CMO nuclei hashing still outperformed TotalSeq-A nuclei hashing (85% vs 77%) when hashtag 1 (DU145) was excluded from the hashing efficiency calculation and the comparison was restricted to CMO nuclei hashing of the remaining three cell lines.

A. Gene-nuclei matrices were generated using CellRanger v3, followed by log-transformation of gene UMI counts and nuclei clustering (gene expression, PCA reduction) using Seurat. The hashtag UMI counts were CLR-transformed and visualised in blue on the gene expression UMAP plots. B. Nuclei annotation (4 cell lines) was performed using freemuxlet (gene expression) or Seurat (MULTISeqDemux function applied to hashtag count data) and visualized on the gene expression UMAP plots. C. MULTISeqDemux-annotated singlets (MCF7, PC3, DU145 or MDAMB231) and negatives (cells with background expression for each hashtag) were matched with the freemuxlet-annotated nuclei (MCF7, PC3, DU145 or MDAMB231). The remaining (unmatched) freemuxlet-annotated singlets were assigned as doublets, and all groups were visualized in barplots. D. Gene expression values were log-transformed using Seurat, and detected genes (left plot), UMIs (middle plot) and the percentage of mitochondrial gene expression (right plot) per nucleus were visualised as violin-box plots with median values highlighted in red, across MULTISeqDemux-annotated groups (singlets, doublets and negatives on the basis of hashtag expression).

A. Gene-nuclei matrices were generated using CellRanger, followed by log-transformation of gene UMI counts and nuclei clustering (gene expression, PCA reduction) using Seurat. The hashtag UMI counts were CLR-transformed and visualised in blue on the gene expression UMAP plots. B. Nuclei annotation (4 cell lines) was performed using freemuxlet (gene expression) or Seurat (MULTISeqDemux function applied to hashtag count data) and visualized on the gene expression UMAP plots. C. MULTISeqDemux-annotated singlets (MCF7, PC3, DU145 or MDAMB231) and negatives (cells with background expression for each hashtag) were matched with the freemuxlet-annotated nuclei (MCF7, PC3, DU145 or MDAMB231). The remaining (unmatched) freemuxlet-annotated singlets were assigned as doublets, and all groups were visualized in barplots. D. Gene expression values were log-transformed using Seurat, and detected genes (left plot), UMIs (middle plot) and the percentage of mitochondrial gene expression (right plot) per nucleus were visualised as violin-box plots with median values highlighted in red, across MULTISeqDemux-annotated groups (singlets, doublets and negatives on the basis of hashtag expression).

In the first experiment, which contained more nuclei, the antibody hashing efficiency excluding the MCF7 nuclei (hashtag 1) reached only 64%.

Interestingly, the hashing techniques tested on nuclei showed the opposite pattern to cell hashing with respect to the background expression of each hashtag (reflected in the number of detected "Negatives"). The MULTISeqDemux function annotated more droplets as "Negatives" in the antibody nuclei hashing experiment (33%) than in the CMO nuclei hashing experiment (8.1%) (Figure 5C vs 6C). This contrasts with cell hashing, where antibody hashing yielded fewer "Negatives" than lipid hashing (Figure 2C, 3C vs 4C, 5C).

Additionally, in the lipid nuclei hashing (CMO) experiment, "Negatives" had ~1.6-fold fewer genes and UMIs per nucleus than singlets (Figure 5D), whereas for the antibody nuclei hashing experiment this ratio was 1.2 for both the number of UMIs and genes (Figure 6D). The hashtag mislabelling rate for the nuclei samples ranged from 0.89% of swapped labels among singlets for the CMO technique to ~4% for TotalSeq-A nuclei hashing (Sup. Table 4).
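The mislabelling rate can be read as the share of droplets called singlets by both methods whose hashtag-derived sample disagrees with the genotype-derived one. The sketch assumes that denominator, which the text does not spell out; the input dictionaries and the function name are hypothetical.

```python
def swapped_label_rate(hashtag_calls, genetic_calls, hto_to_sample, samples):
    """Share of droplets, singlets by both methods, whose assignments differ.

    Only droplets whose genetic call and hashtag-derived sample both fall in
    `samples` enter the denominator; negatives and doublets are ignored.
    """
    compared = swapped = 0
    for cell, truth in genetic_calls.items():
        predicted = hto_to_sample.get(hashtag_calls.get(cell))
        if truth in samples and predicted in samples:
            compared += 1
            swapped += predicted != truth  # True counts as 1
    return swapped / compared if compared else float("nan")
```

Unlike the hashing efficiency, this metric ignores "Negatives", so a method can combine many negatives with a very low swap rate, as observed for some nuclei datasets.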

We also compared gene expression-related metrics between samples from the hashing experiments of this study and non-hashed samples of a single cell line, normalized to the same sequencing depth. We did not observe major differences in the number of genes and UMIs per nucleus when comparing MCF7 nuclei from hashing (antibody and lipid) and non-hashed experiments (Sup. Fig. 9). The same conclusion held for all tested hashing techniques on MCF7 cells (Sup. Fig. 10).

In general, and in contrast to cell hashing, we observed better hashing results on nuclei with lipid-based hashing (CMO) than with antibody-based (TotalSeq-A) nuclei hashing.

In the context of the COVID-19 pandemic, we also tested TotalSeq-A antibody hashing on PBMCs from healthy individuals and SARS-CoV-2 patients as part of a COVID-19 clinical study (NCT04326920). All samples additionally included a large 277-antibody CITE-seq panel (TotalSeq-A antibodies), making this PBMC hashing evaluation particularly relevant for other multiomic scRNA-seq experiments. Using the MULTISeqDemux function (autoThresh = TRUE) and freemuxlet annotation as a reference, we could correctly assign 84% of all singlets to the appropriate patient in both pools (3 healthy individuals and 3 SARS-CoV-2 patients) (Table 2). The MULTISeqDemux function annotated 5.2% and 6.3% of cells as "Negatives" in antibody hashing of healthy and diseased samples, respectively (Figure 8C, 9C). In the healthy control sample, "Negatives" had 2.6-fold fewer genes per cell (and 3.3-fold fewer UMIs) than singlets (Figure 8D), whereas for the antibody hashing of diseased patients this ratio was 1.2 for both the number of UMIs and genes per cell (Figure 9D). Despite this difference, the hashing efficiency for both groups of PBMC samples was 84%, as mentioned above (Table 2). According to freemuxlet, the hashtag mislabelling rate was 4.76% of swapped singlet labels for the healthy sample and 1.53% for the SARS-CoV-2 sample (Sup. Table 4).
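Fold differences such as "2.6-fold fewer genes" can be expressed as a ratio of group medians. The sketch below assumes medians (the statistic highlighted in the violin plots) rather than means, and its label strings and function name are hypothetical.

```python
from statistics import median


def singlet_to_negative_ratio(values, labels, negative="Negative", doublet="Doublet"):
    """Median of a QC metric in singlets divided by its median in negatives."""
    singlets = [v for v, lab in zip(values, labels) if lab not in (negative, doublet)]
    negatives = [v for v, lab in zip(values, labels) if lab == negative]
    if not singlets or not negatives:
        raise ValueError("need at least one singlet and one negative")
    return median(singlets) / median(negatives)


if __name__ == "__main__":
    genes_per_cell = [3000, 2800, 1200, 1000, 5000]  # hypothetical values
    calls = ["PBMC1", "PBMC2", "Negative", "Negative", "Doublet"]
    print(singlet_to_negative_ratio(genes_per_cell, calls))
```

A ratio close to 1, as for the diseased PBMC pool, suggests that "Negatives" are not simply low-quality droplets, whereas a large ratio points to empty or damaged droplets being labelled negative.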

A. Gene-cell matrices were generated using CellRanger, followed by log-transformation of gene UMI counts and cell clustering (gene expression, PCA reduction) using Seurat. The hashtag UMI counts were CLR-transformed and visualised in blue on the gene expression UMAP plots. B. Cell annotation (3 patients) was performed using freemuxlet (gene expression) or Seurat (MULTISeqDemux function applied to hashtag count data) and visualized on the gene expression UMAP plots. C. MULTISeqDemux-annotated singlets (PBMC1, PBMC2 or PBMC3) and negatives (cells with background expression for each hashtag) were matched with the freemuxlet-based annotation (PBMC1, PBMC2 or PBMC3). The remaining (unmatched) freemuxlet-annotated singlets were assigned as doublets, and all groups were visualized in barplots. D. Gene expression values were log-transformed using Seurat, and detected genes (left plot), UMIs (middle plot) and the percentage of mitochondrial gene expression (right plot) per cell were visualised as violin-box plots with median values highlighted in red, across MULTISeqDemux-annotated groups (singlets, doublets and negatives on the basis of hashtag expression).

A. Gene-cell matrices were generated using CellRanger, followed by log-transformation of gene UMI counts and cell clustering (gene expression, PCA reduction) using Seurat. The hashtag UMI counts were CLR-transformed and visualised in blue on the gene expression UMAP plots. B. Cell annotation (3 patients) was performed using freemuxlet (gene expression) or Seurat (MULTISeqDemux function applied to hashtag count data) and visualized on the gene expression UMAP plots. C. MULTISeqDemux-annotated singlets (PBMC1, PBMC2 or PBMC3) and negatives (cells with background expression for each hashtag) were matched with the freemuxlet-based annotation (PBMC1, PBMC2 or PBMC3). The remaining (unmatched) freemuxlet-annotated singlets were assigned as doublets, and all groups were visualized in barplots. D. Gene expression values were log-transformed using Seurat, and detected genes (left plot), UMIs (middle plot) and the percentage of mitochondrial gene expression (right plot) per cell were visualised as violin-box plots with median values highlighted in red, across MULTISeqDemux-annotated groups (singlets, doublets and negatives on the basis of hashtag expression).

It is important to mention that our comparison of hashing strategies was applied to cells from the filtered gene-cell matrices produced by CellRanger v3, without further outlier filtering (e.g. based on mitochondrial gene expression). In our PBMC dataset, this filtered matrix contained almost the same number of cells as a "raw" matrix after removing all droplets with fewer than 200 UMIs (gene expression) and all genes detected in fewer than 3 droplets. The tested Seurat demultiplexing functions assigned some cells with a relatively low number of genes and UMIs to the "Negatives" group. Hence, not surprisingly, hashing efficiency can be improved by filtering out cell outliers. For example, correct annotation of PBMCs (healthy control patients) improved from 84.1% to 89.4% after filtering out 2660 of 14630 cells using the "scater" Bioconductor package (outliers on the basis of the number of genes and UMIs per cell). On the other hand, stringent filtering might cause a loss of biologically relevant cell subtypes. Thus, sample demultiplexing should be optimised in parallel with a deeper biological analysis of the samples.
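The outlier filtering mentioned above can be sketched with the median absolute deviation (MAD) rule around which scater's `isOutlier` is built: a cell is flagged when a QC metric lies more than `nmads` scaled MADs from the median, here on a log scale. This Python version is an approximation under stated assumptions (three MADs, the 1.4826 consistency constant used by R's `mad`, `log1p` to tolerate zero counts, and discarding a cell if either its gene or its UMI count is an outlier); it does not reproduce the exact filtering that removed 2660 cells in this study.

```python
from math import log1p
from statistics import median


def mad_outliers(values, nmads=3.0, side="both", use_log=True):
    """Flag values more than `nmads` scaled MADs away from the median."""
    x = [log1p(v) for v in values] if use_log else list(values)
    centre = median(x)
    spread = 1.4826 * median([abs(v - centre) for v in x])
    lower, upper = centre - nmads * spread, centre + nmads * spread
    flags = []
    for v in x:
        low_hit = v < lower and side in ("both", "lower")
        high_hit = v > upper and side in ("both", "higher")
        flags.append(low_hit or high_hit)
    return flags


def qc_keep(n_genes, n_umis, nmads=3.0):
    """Keep cells that are outliers for neither genes nor UMIs per cell."""
    gene_out = mad_outliers(n_genes, nmads)
    umi_out = mad_outliers(n_umis, nmads)
    return [not (g or u) for g, u in zip(gene_out, umi_out)]
```

Working on the log scale makes the rule symmetric in fold-change terms, so a cell with ten times fewer genes than the median is flagged as readily as one with ten times more; restricting `side` to "lower" keeps large cells while still removing near-empty droplets.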

We conclude that the hashing strategy must be chosen on the basis of hashtag antigen expression, since some cells might not express both antigens targeted by the available TotalSeq hashing antibodies. This is the case, for instance, for the CD45 and MHC II antigens (targeted by the BioLegend mouse hashing antibodies) in mouse C3 and B16-BL6 melanoma cells (Sup. Fig. 8).

However, some cells can also show a relatively low affinity for particular types of lipid hashes. Nevertheless, lipid-based hashing might be the preferred strategy for multiplexing samples with low or unknown expression of hashing antigens, provided that sample demultiplexing is well optimized.

In the present study on human cancer cell lines, antibody-based hashing of cells performed better overall than lipid-based hashing. When labelling nuclei from the same cell lines, cholesterol-based hashes (CMO) demonstrated better hashing efficiency than antibody-based labelling (TotalSeq-A). This result may be partially explained by a potential detrimental effect of the lysis buffer used for nuclei isolation (Nuclei EZ Lysis Buffer) on nuclear surface proteins. TotalSeq-A and TotalSeq-C cell hashing, compatible with 3' and 5' 10x Genomics gene expression sequencing respectively, demonstrated similarly good hashing efficiency. Other antibody-based hashing methods are also available on the market, e.g. hashing antibodies from BD Biosciences compatible with the BD Rhapsody platform for scRNA-seq experiments. Overall, we conclude that various hashing techniques can be successfully applied to cells and nuclei, albeit with different hashing efficiencies, to reduce the costs and batch effects of scRNA-seq experiments.

A Human Liver Cell Atlas Reveals Heterogeneity and Epithelial Progenitors

A Global Reference for Human Genetic Variation

Integrating Single-Cell Transcriptomic Data across Different Conditions, Technologies, and Species

A Single-Cell Transcriptome Atlas of the Aging Drosophila Brain

DoubletDecon: Deconvoluting Doublets from Single-Cell RNA-Sequencing Data

MHC Class-I Loss and Cancer Immune Escape

Nuclei Multiplexing with Barcoded Antibodies for Single-Nucleus Genomics

Highly Multiplexed Single-Cell RNA-Seq for Defining Cell Population and Transcriptional Spaces

RNA-Sequencing from Single Nuclei

CellTag Indexing: Genetic Barcode-Based Sample Multiplexing for Single-Cell Genomics

Massively Parallel Single-Nucleus RNA-Seq with DroNc-Seq

Div-Seq: Single-Nucleus RNA-Seq Reveals Dynamics of Rare Adult Newborn Neurons

Mapping the Mouse Cell Atlas by Microwell-Seq

CD45: A Critical Regulator of Signaling Thresholds in Immune Cells

A Single-Cell Atlas of Mouse Brain Macrophages Reveals Unique Transcriptional Identities Shaped by Ontogeny and Tissue Environment

Multiplexed Droplet Single-Cell RNA-Sequencing Using Natural Genetic Variation

Neuronal Subtypes and Diversity Revealed by Single-Nucleus RNA Sequencing of the Human Brain

MULTI-Seq: Sample Multiplexing for Single-Cell RNA Sequencing Using Lipid-Tagged Indices

DoubletFinder: Doublet Detection in Single-Cell RNA Sequencing Data Using Artificial Nearest Neighbors

A Cell Atlas of Human Thymic Development Defines T Cell Repertoire Formation

Multiplexed Quantification of Proteins and Transcripts in Single Cells

Single-Cell Transcriptomics of 20 Mouse Organs Creates a Tabula Muris

Abseq: Ultrahigh-Throughput Single Cell Protein Profiling with Droplet Microfluidic Barcoding

Simultaneous Epitope and Transcriptome Measurement in Single Cells

Cell Hashing with Barcoded Antibodies Enables Multiplexing and Doublet Detection for Single Cell Genomics

Comprehensive Integration of Single-Cell Data

Scrublet: Computational Identification of Cell Doublets in Single-Cell Transcriptomic Data

We gratefully acknowledge Gert Van Isterdael for help with the flow cytometry experiments and analysis, Christopher McGinnis from UCSF for providing the MULTI-seq reagents, Mark Fiers for providing the pre-mRNA genome, Kris Davie for providing us with the vcf files, Kevin Verstaen for help with optimising CellRanger and for valuable discussions on the analysis, and Junbin Qian for optimising and running demuxlet. This work was funded by the "VIB Tech Watch - Janssen" collaboration.

No potential conflict of interest was reported by the authors.