Copy-scAT: Deconvoluting single-cell chromatin accessibility of genetic subclones in cancer


Copy-scAT: Deconvoluting single-cell chromatin accessibility of genetic subclones in cancer 1 

 2 

Ana Nikolic1,2,3, Divya Singhal1,2,3, Katrina Ellestad1,2,3, Michael Johnston1,2,3, Aaron Gillmor1,2,3, Sorana 3 
Morrissy1,2,3, Jennifer A Chan1,2,4, Paola Neri1,4, Nizar Bahlis1,4, Marco Gallo1,2,3* 4 

 5 
1Arnie Charbonneau Cancer Institute 6 
2Alberta Children’s Hospital Research Institute 7 
3Department of Biochemistry and Molecular Biology 8 
4Department of Oncology 9 

Cumming School of Medicine, University of Calgary, Calgary, AB, Canada 10 

 11 

*Corresponding author:  Marco Gallo  12 

    marco.gallo@ucalgary.ca 13 
 14 
 15 
ABSTRACT 16 

Single-cell epigenomic assays have tremendous potential to illuminate mechanisms of transcriptional control 17 
in functionally diverse cancer cell populations. However, application of these techniques to clinical tumor 18 
specimens has been hampered by the current inability to distinguish malignant from non-malignant cells, 19 
which potently confounds data analysis and interpretation. Here we describe Copy-scAT, an R package that 20 
uses single-cell epigenomic data to infer copy number variants (CNVs) that define cancer cells. Copy-scAT 21 
enables studies of subclonal chromatin dynamics in complex tumors like glioblastoma. By deploying Copy-22 
scAT, we uncovered potent influences of genetics on chromatin accessibility profiles in individual subclones. 23 
Consequently, some genetic subclones were predisposed to acquire stem-like or more differentiated 24 
molecular phenotypes, reminiscent of developmental paradigms. Copy-scAT is ideal for studies of the 25 
relationships between genetics and epigenetics in malignancies with high levels of intratumoral heterogeneity 26 
and to investigate how cancer cells interface with their microenvironment. 27 
 28 
 29 
INTRODUCTION 30 
 31 
Single-cell genomic technologies have made enormous contributions to the deconvolution of complex 32 
cellular systems, including cancer (1). Single-cell RNA sequencing (scRNA-seq) in particular has been widely 33 
employed to understand the implications of intratumoral transcriptional heterogeneity for tumor growth, 34 
response to therapy and patient prognosis (2–6). This field has hugely benefited from an emerging ecosystem 35 
of computational tools that have enabled complex analyses of scRNA data. Since copy number variants 36 
(CNVs) mostly accrue in malignant cells and are rare in non-malignant tissues, computational platforms that 37 
use scRNA data to call CNVs have resulted in improved understanding of the behavior of genetic subclones 38 
in tumors (7–9).  39 

On the other hand, the application of single-cell epigenomic techniques, including the assay for transposase 40 
accessible chromatin (scATAC) (10, 11), to study cancer has been slowed by computational bottlenecks. For 41 
instance, unlike scRNA-seq, currently no dedicated tool exists to call CNVs using scATAC data. This 42 
technical gap has prevented scATAC studies of clinical tumor specimens, which often are surgical resections 43 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


2 

that include both malignant and non-malignant cells. Inability to deconvolute these cell populations after the 44 
generation of scATAC datasets would confound downstream analyses and interpretation of this data type.  45 

In this report, we describe Copy-scAT (Copy number inference using scATAC-seq data), a new 46 
computational tool that uses scATAC datasets to call CNVs at the single-cell level. Using scATAC datasets 47 
from adult glioblastoma (aGBM), pediatric GBM (pGBM) and multiple myeloma (MM), we demonstrate the 48 
effectiveness of Copy-scAT in calling (A) focal amplifications, (B) segmental gains and losses and (C) 49 
chromosome arm-level gains and losses. At the most basic level, Copy-scAT can therefore discriminate 50 
between malignant and non-malignant cells in scATAC datasets based on the presence or absence, 51 
respectively, of CNVs. This distinction is fundamental to ensure that downstream analyses include only the 52 
appropriate tumor or microenvironment cell populations. At a more sophisticated level, we show that 53 
implementation of Copy-scAT allows investigations of the relationship between genetic and epigenetic 54 
principles governing the behavior of individual subclones. In this regard, we show that each genetic subclone 55 
has characteristic accessible chromatin profiles, indicating that genetics imparts information that determines 56 
key epigenetic features. Strong influence of genetics on chromatin states is demonstrated by the 57 
predisposition of genetic subclones to have stem-like or more differentiated molecular profiles in GBM.   58 

 59 
RESULTS 60 
 61 
Design and implementation of Copy-scAT 62 

We designed Copy-scAT, an R package that uses scATAC-seq information to infer copy number alterations. 63 
Copy-scAT uses fragment files generated by cellranger-atac (10xGenomics) as input to generate chromatin 64 
accessibility pileups, keeping only barcodes with a minimum number of fragments (defaulting to 5,000 65 
fragments). It then generates a pileup of total coverage (number of reads × read lengths) over bins of 66 
determined length (1 million bp as default) (Fig. 1a). Binned read counts then undergo linear normalization 67 
over the total signal in each cell to account for differences in read depth, and chromosomal bins which 68 
consist predominantly of zeros (at least 80% zero values) are discarded from further analysis. All parameters, 69 
including reference genome, bin size, and minimum length cut-off are user-customizable. Copy-scAT then 70 
implements different algorithms to detect focal amplifications and larger-scale copy number variation. 71 

To call focal amplifications (Fig. 1b), Copy-scAT generates a linear scaled profile of density over normalized 72 
1 Mbp bins along each chromosome on a single-cell basis, centering on the median and scaling using the 73 
range. Copy-scAT then uses changepoint analysis (12) (see Methods) to identify segments of abnormally high 74 
signal (Z score > 5) along each chromosome in each single cell. These calls are then pooled together to 75 
generate consensus regions of amplification, in order to identify putative double minutes and 76 
extrachromosomal amplifications. Each cell is scored as positive or negative for each amplified genomic 77 
region.  78 

Segmental losses are called in a similar fashion, by calculating a quantile for each bin on a chromosome, 79 
running changepoint analysis to identify regions with abnormally low average signal, and then using Gaussian 80 
decomposition of total signal in that region to identify distinct clusters of cells.  81 

For larger copy number alterations, Copy-scAT pools the bins further at the chromosome arm level using a 82 
trimmed mean, while normalizing the data on the basis of length of CpG islands contained in each bin (Fig. 83 
1c). Data is then scaled for each chromosome arm, compared to a pseudodiploid control (expected signal 84 
distribution for a diploid genotype) that is modeled for each sample, and cluster assignments are generated 85 
using Gaussian decomposition. Cluster assignments are then normalized to get an estimate of copy number 86 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


3 

for each cell (Fig. 1d). These assignments can be optionally combined with clustering information to 87 
generate consensus genotypes for each cluster of cells and further filter false positives (Fig. 1e) For full 88 
details regarding the execution of Copy-scAT, see Methods. A step-by-step tutorial for Copy-scAT is 89 
available on GitHub (see Methods). 90 

 91 

Fig. 1. Copy-scAT workflow. 92 
(a) Copy-scAT accepts barcode fragment matrices generated by cellranger (10xGenomics) as input. 93 
(b) Large peaks in normalized coverage matrices can be used to infer focal CNVs. 94 
(c) Normalized matrices can be used to infer segmental and chromosome-arm level CNVs. 95 
(d) Example of chromosome-arm level CNV (chromosome 10p loss) called by Copy-scAT 96 
(e) Consensus clustering is used to finalize cell assignment. 97 
 98 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


4 

Copy-scAT effectively calls CNVs in diverse malignancies 99 
We have tested the ability of Copy-scAT to use scATAC data to call CNVs with three different approaches 100 
and with different tumor types. First, we benchmarked Copy-scAT against CNV calls made with whole-101 
genome sequencing (WGS) data for adult GBM (aGBM) surgical resections (n = 4 samples, 3,647 cells). This 102 
approach consisted in isolating nuclei from flash-frozen aGBM samples, mixing nuclei in suspension, and 103 
then using these nuclei for either scATAC or WGS library construction (Fig. 2a). This was meant to ensure 104 
similar representation of genetic subclones, which are usually regionally contiguous in this solid tumor, in 105 
both scATAC and WGS libraries. Second, we benchmarked Copy-scAT against CNV calls made using 106 
pediatric GBM (pGBM) surgical resections (n = 6 patient-matched diagnostic-relapse samples, 33,695 cells). 107 
In this case, scATAC and WGS libraries were generated from separate geographical regions of the same 108 
tumor (Fig. 2b). Third, we benchmarked Copy-scAT against CNV calls made with the single-cell CNV 109 
(scCNV) assay (10xGenomics) using multiple myeloma (MM) clinical samples (n = 10 samples, 31,266 cells). 110 
Overall, we observed that Copy-scAT correctly inferred all or most of the CNVs that were called with WGS 111 
(Figs. 2a,b; Figs. S1, S2) or scCNV data (Fig. 2c; Fig S3). In total, we profiled 51,571 cells from 20 112 
malignancies from 17 patients, and were able to infer CNV status for a total of 39,486 cells (Table S1).  On 113 
average, we were able to call CNVs for 78.09% of cells in each sample (range: 29.16 – 91.22%) (Table S1). 114 
For chromosome-arm level CNV gains, sensitivity ranged from 0.51 for MM to 1.0 for aGBM and specificity 115 
ranged from 0.93 to 0.94 (Table S2). For chromosome arm-level losses, sensitivity ranged from 0.67 to 0.79 116 
and specificity from 0.89 to 0.95. The sensitivity and specificity of focal amplifications were very high 117 
(>0.975, Table S2). The variation observed may reflect technical differences between the strategies used for 118 
benchmarking. As expected, the calls of Copy-scAT for aGBM were the most accurate, likely because 119 
scATAC and WGS datasets were generated by relatively homogeneous starting material, as described above. 120 
Because of its design, it is also possible that Copy-scAT is more sensitive at inferring CNVs that occur in 121 
relatively rare subclones compared to WGS, potentially explaining (in addition to true false positives) why 122 
the number of CNVs inferred by our new tool is sometime higher than inferences made with WGS.  123 
 124 
scATAC data can be used to distinguish malignant from non-malignant cells 125 
Tumor cells often harbor CNVs, and we reasoned that the use of Copy-scAT should enable the use of 126 
scATAC data to infer CNVs and therefore to distinguish between malignant and non-malignant cells. To 127 
test this hypothesis, we overlayed CNVs called by Copy-scAT onto scATAC datasets displayed in uniform 128 
manifold approximation and projections (UMAP) plots. This exercise led to the identification of cells that 129 
were clearly positive for multiple CNVs and others that appeared to have a normal genome. As an illustrative 130 
example, we found that the aGBM sample CGY4349 was composed of discrete cell populations that 131 
harbored focal amplifications at the MDM4 (Fig. 2d), PDGFRA (Fig. 2e) and EGFR (Fig. 2f) loci, as well 132 
as chromosome 10p deletion (Fig. 2g) and chromosome 7 gain (Fig. 2h,i). Copy-scAT results suggest 133 
specific lineage relationships between subclones. For instance, chromosome 7 amplifications are clonal in 134 
this sample (Fig. 2h,i), whereas the chromosome 10 deletion is subclonal (Fig. 2g). In addition, our 135 
computational tool predicts that PDGFRA (Fig. 2e) and EGFR (Fig. 2f) focal amplifications are mutually 136 
exclusive, a phenomenon that has been reported in aGBM (13).  137 

 138 
Altogether, these results illustrate one specific population of cells (shaded green in Fig. 2i) that harbors 139 
several CNVs and are therefore putative cancer cells. At the same time, we also identified cells (labeled in 140 
dark blue in Fig. 2i) that did not appear to have any CNVs and are therefore likely to be cells from the tumor 141 
microenvironment. Equivalent results were obtained for pGBM (Fig. S4) and MM samples (Fig. S5). Since 142 
the latter appear as multiple scATAC clusters, it is possible that our strategy detects multiple distinct non-143 
neoplastic cell clusters. Differential motif analysis with ChromVAR confirmed high scores for neural 144 
progenitor cell-associated motifs like NFIC and ASCL1 in CNV+ cells (Fig. 2j,k), while the putative non-145 
neoplastic clusters showed increased occupancy at transcription factor motifs associated with hematopoietic 146 
lineages, such as IKZF1 (Fig. 2l). Another CNV- cluster showed enrichment of FOXG1 binding motifs in 147 
accessible chromatin, in keeping with a non-neoplastic neural cell identity (Fig. 2). Using this approach, it 148 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


5 

was possible to discriminate between malignant and cells from the tumor microenvironment in all tumor 149 
samples analyzed (Extended Figs. S6-S8). Copy-scAT therefore effectively uses scATAC data to infer 150 
CNVs, which can then be used to distinguish malignant from non-malignant cells and to infer lineage 151 
relationships between genetic subclones that coexist in a tumor. 152 
 153 

 154 
 155 

Fig. 2. Benchmarking of Copy-scAT with three methods involving clinical samples from three 156 
distinct malignancies. 157 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


6 

(a) Banked frozen aGBM samples were used for both scATAC and WGS. Nuclei were isolated from the 158 
samples, mixed, and used for both scATAC and WGS. Number of chromosome-arm level gains detected in 159 
adult GBM samples identified using both methods, versus total numbers of gains detected by scATAC or 160 
WGS. 161 
(b) Surgical pGBM resections were split, and one section was used for scATAC and the other for WGS. 162 
Number of chromosome-arm level gains detected in adult GBM samples identified using both methods, 163 
versus total numbers of gains detected by scATAC or WGS. 164 
(c) Multiple myeloma samples were profiled by both scATAC and the single-cell CNV assay. Number of 165 
chromosome-arm level gains detected in adult GBM samples identified using both methods, versus total 166 
numbers of gains detected by scATAC or scCNV assay. 167 
(d) MDM4 amplification in an adult GBM sample (CGY4349). Amplified cells are coloured dark blue, and 168 
normal cells in pale blue.  169 
(e) PDGFRA amplification in an adult GBM sample (CGY4349). Amplified cells are coloured dark blue, 170 
and normal cells in pale blue. 171 
(f) EGFR amplification in an adult GBM sample (CGY4349). Amplified cells are coloured dark blue, and 172 
normal cells in pale blue. 173 
(g) Chromosome 10p loss in an adult GBM sample. 174 
(h) Chromosome 7P gain in an adult GBM sample. 175 
(i) Chromosome 7Q gain in an adult GBM sample. 176 
(j) ChromVAR activity score for ASCL1. 177 
(k) ChromVAR activity score for NFIC. 178 
(l) ChromVAR activity score for IKZF1. 179 
(m) ChromVAR activity score for FOXG1. 180 
 181 
 182 
 183 
Subclonal genetics shapes chromatin accessibility profiles in aGBM 184 
We noticed that in most tumors we analyzed, cells harboring a given CNV had a tendency to cluster together 185 
(Fig. 2d-i). Individual clusters were in fact defined by the presence of specific CNVs (Fig. 3a-c). This was 186 
an unexpected observation, because it is widely assumed that clustering of scATAC data reflects the global 187 
patterns of chromatin accessibility. One possible explanation for this observation could be that chromosomal 188 
regions affected by a CNV display imbalances in the fragment depth distribution of scATAC datasets, and 189 
that these patterns have a dominant effect on cluster assignment. Most scATAC-seq workflows rely on some 190 
variant of term-frequency inverse document frequency (TF-IDF) normalization rather than feature scaling, 191 
and this may amplify the effects of CNV-driven DNA content imbalances. For instance, it is possible that 192 
focal amplifications of the PDGFRA locus result in increased frequency of transposition events that are 193 
mapped to this site. A dominant effect of chromatin accessibility at this amplified locus could result in 194 
PDGFRA-amplified cells clustering together in UMAP representations of scATAC data (Fig 3d,e). Indeed 195 
we found that compared to a random selection of peaks, the chromosomes which carried CNVs had 196 
significantly different numbers of peaks ranked as highly variant than chromosomes that did not have CNVs, 197 
leading to a markedly uneven distribution of top peaks (p < 2.2E-16; Chi-squared test; Fig. S9a) This was 198 
not seen in non-neoplastic cells, which had relatively even top fragment distribution patterns (p = 0.05472, 199 
Chi-squared test; Fig. S9b).  To test this hypothesis, we used Copy-scAT to call CNVs in our tumor samples, 200 
then removed all peaks mapping to chromosomes predicted to harbor CNVs, and finally re-clustered all cells 201 
in each sample (Fig. 3f). We found that although removing chromosomes with CNVs from our analyses 202 
changed the overall cluster structure of a sample (Fig. 3g), PDGFRA-amplified cells still clustered close to 203 
each other (Fig. 3h). In fact, our results indicate that clustering after CNV removal is more granular but 204 
overall very stable (Fig. 3i). In this case, PDGFRA-amplified cells localized to a single cluster before 205 
removing chromosomes affected by CNVs. Following removal of CNV+ chromosomes and re-clustering, 206 
most PDGFRA-amplified cells still clustered together, with only a few cells merging into a cluster that 207 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


7 

included both amplified and non-amplified cells. Comparing the most variable peaks after chromosome 208 
CNV removal showed a distribution closer to normal, supporting the marked effect of the CNVs on the 209 
identification of variant peaks (p = 2.418E-8; Fig. S9c). Contrary to current views of cancer epigenomics, 210 
these data indicate that genetic subclones may have characteristic patterns of chromatin accessibility, and 211 
that a cell’s genetic background has significant influence on its likelihood of attaining specific epigenetic 212 
states. 213 

 214 

 215 
Fig. 3. Subclonal genetics influences clustering of scATAC-seq data. 216 
(a-c) CNVs in adult GBM CGY4218 segregate within specific scATAC clusters. 217 
(d, e) PDGFRA-amplified cells cluster together in adult GBM CGY4349. 218 
(f) Diagram summarizing our strategy to remove CNVs from clustering of scATAC data. All chromosomes 219 
or regions with putative CNVs were removed from downstream analyses, and cells were re-clustered. 220 
(g) Reclustering of (d) following removal of chromosomes and regions affected by CNVs in CGY 4349. 221 
(h) Distribution of PDGFRA-amplified cells following re-clustering. 222 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


8 

(i) Cluster assignments of cells in CGY4349 (aGBM specimen) before and after removal of CNV-containing 223 
regions (purple: PDGFRA-amplified cells). 224 
 225 
 226 
Genetic events predispose subclones to the acquisition of developmental chromatin states 227 
We further explored the notion that CNVs may shape chromatin accessibility profiles and its possible 228 
implications for cell fate determination. As an illustrative example, we focused on an aGBM sample 229 
(CGY4218) where CNVs at chromosome 1p characterized three genetic subclones, as determined with 230 
Copy-scAT: (i) A subclone with two copies of chromosome 1p; (ii) a subclone with loss of 1p; (iii) a subclone 231 
with gain of 1p (Fig. 4a).  232 

 233 
We were interested in determining whether the major genetic subclones in this tumor had similar cycling 234 
properties. Unlike scRNA-seq, we found it is not possible to use scATAC profiles at cell cycle genes to 235 
determine whether a cell is proliferating. We reasoned that cells that are actively going through cell division 236 
have to replicate their DNA. Given that cancer cells have numerous CNVs on autosomes and could lead to 237 
noisy data, we decided to use Copy-scAT to identify cells that have doubled the number of their X 238 
chromosomes and defined them as actively cycling cells. To validate this approach, we determined the 239 
number of cells with double the number of expected X chromosomes – ie putative cycling cells – in 240 
previously published scATAC datasets for mouse brain and peripheral blood mononuclear cells (PBMCs). 241 
We hypothesized that we should be able to identify cycling cells in fetal mouse brain, but not in PBMCs. In 242 
fact, we detected numerous cycling cells (with twice the expected number of X chromosomes) in brain tissue 243 
but not in PBMCs (Fig. S10). This method detected putative cycling cells in our datasets (Fig. 4b). We used 244 
scATAC data to arrange cells from this tumor along pseudotime with the package STREAM (14) (Fig. 4c) 245 
and then superimposed cell cycle status determined with our X chromosome doubling method (Fig. 4d). 246 
The results show that cells along branch 2, which is strongly enriched for cells with chromosome 1p gains, 247 
are also the most proliferative (Fig. 4e), with over 25% of the cells actively going through replication (P = 248 
7.776 × 10-14; Chi-square test). On the other hand, ~5% of cells along branch 1 and ~15% of cells along 249 
branch 3 were cycling. These data therefore indicate functional differences between cells with gain or loss of 250 
chromosome 1p.  251 

 252 
We then used ChromVAR(15) and STREAM-ATAC to calculate scores for transcription factor (TF) binding 253 
motifs that are associated with neurodevelopmental processes. This analysis revealed that motifs bound by 254 
TFs that are associated with stem-like phenotypes, including OLIG2 and HOXA2, are enriched in accessible 255 
chromatin regions in cells that have one copy of chromosome 1p (Fig. 4f). Motifs bound by TFs associated 256 
with progenitor (Fig 4g) and differentiated states (Fig. 4h) were enriched in the branch with more cells 257 
showing gain of chromosome 1p. This was associated with a significant shift in the overall distribution of 258 
enrichment of these motifs in cells along the different branches of the trajectory (Fig. 4i-k). A distribution 259 
of genetic subclones along developmental chromatin accessibility states was observed in other tumor samples 260 
we studied (Fig. S11-S13). Overall, the data support the notion that tumor cells sample a discrete number 261 
of chromatin states, but their transition probabilities differ based on genotype. Consequently, chromatin 262 
states associated with each genetic subclone manifest as different functional properties, here demonstrated 263 
at the level of cell proliferation and stemness profiles. 264 

 265 
 266 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


9 

 267 
Figure 4. Subclonal genetic alterations predispose cells to adopt developmental chromatin states. 268 
(a) Cells were clustered based on scATAC ChromVAR motif scores, then shaded based on the presence of 269 
1, 2 or 3 copies of chromosome 1P. 270 
(b) Cells were shaded based on their predicted cycling properties. 271 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


10 

(a) Data shown in (a) projected onto pseudotime. The resulting three branches are populated preferentially 272 
by cells with gain or loss of chromosome 1P respectively. 273 
(d) Proliferation status as shown in (b), overlaid onto pseudotime. 274 
(e) Branches enriched for 1P gain show greater proportions of proliferative cells (statistics: Chi-squared test). 275 
(f) Scaled chromatin accessibility at binding motifs for OLIG2 and HOXA2, two TFs associated with 276 
stemness. 277 
(g) Scaled chromatin accessibility at binding motifs for RFX2 and NFIX, two TFs associated with 278 
progenitor-like phenotypes. 279 
(h) Scaled chromatin accessibility at binding motifs for RARA::RXRA and STAT3, two TFs associated with 280 
differentiated phenotypes. 281 
(i) Enrichment plot for motif Z scores for OLIG2 and HOXA2. 282 
(j) Enrichment plot for motif Z scores for RFX and NFIX. 283 
(k) Enrichment plot for motif Z scores for RARA::RXRA and STAT3. P values calculated by Kruskal-Wallis 284 
test. 285 
  286 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


11 

DISCUSSION  287 

Here we describe Copy-scAT, the first computational tool dedicated to inferring CNVs using scATAC data. 288 
Copy-scAT resolves a computational bottleneck that has restricted the application of single-cell epigenomic 289 
techniques to the study of clinical tumor samples, which are often mixtures of malignant and non-malignant 290 
cells. The presence of non-malignant cells can severely confound the analyses of these samples and 291 
downstream data interpretation. Cell admixture is a particular problem for scATAC data because of the 292 
inherent sparsity of these datasets and because they do not provide direct information on the expression 293 
status of cell lineage markers that could be used to solve cellular identities. Because most tumor types harbor 294 
CNVs, Copy-scAT provides a simple way of solving this problem.  295 

It is important to note that Copy-scAT enables users to perform analyses on both malignant and non-296 
malignant cells from a tumor sample, because cell barcodes associated with both presence or absence of 297 
CNVs can be selected for downstream analyses. Implementation of Copy-scAT will therefore be beneficial 298 
to groups interested in defining the epigenomes of both tumor cells and their microenvironment. Because 299 
chromatin accessibility datasets provide information on mechanisms of transcriptional regulation by distal 300 
and proximal enhancer and super enhancer elements, Copy-scAT could be useful in clarifying epigenetic 301 
mechanisms involved in immune suppression and T cell exhaustion, for instance. Copy-scAT also allows 302 
scATAC studies of frozen banked cancer specimens (see Methods), because it requires no prior knowledge 303 
of cell composition. 304 

We show that the underlying CNV architecture plays a significant role in clustering of scATAC data, a 305 
problem that is amplified by the use of TF-IDF algorithms for normalization. These effects are less 306 
pronounced when clustering is based on motif activity scores (e.g. ChromVAR), likely as this incorporates 307 
data from multiple chromosomes, thus dampening the effect of variation at any one specific locus. Further 308 
studies are needed to identify the optimal way to address the effects of CNVs in downstream analyses, as 309 
they may present a significant confounder and potentially mask significant biological relationships.  310 

In this report, we provide evidence that Copy-scAT can be used to shed new light on how genetics and 311 
epigenetics interface in cancer. We show that genetic subclones tend to have unique chromatin accessibility 312 
landscapes that can promote or antagonize stem-like phenotypes. Consequently, we report that some genetic 313 
subclones have greater proportions of stem-like cells, and others appear more differentiated. These results 314 
offer a radically different view of functional hierarchies in GBM, where stem-like properties were thought to 315 
be programmed by epigenetic factors, independently of genotype. These findings provide a simple 316 
explanation for the observed intra-tumoral transcriptional heterogeneity in GBM ((5, 16)), by suggesting that 317 
each genetic subclone achieves specific chromatin accessibility profiles, which in turn result in subclone-318 
specific transcriptional outcomes.  319 

Copy-scAT will enable future studies of subclonal chromatin dynamics in complex tumor types and may be 320 
an important tool in better understanding the functional relationships between subclones, their 321 
microenvironment and therapy response. 322 
 323 
 324 
MATERIALS AND METHODS 325 

 326 

Ethics and consent statement 327 

All samples were collected and used for research with appropriate informed consent and with approval by 328 
the Health Research Ethics Board of Alberta. 329 

 330 

scATAC-seq sample processing 331 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


12 

GBM samples were either frozen surgical resections (pediatric GBM) or cells dissociated from fresh surgical 332 
specimens and cryopreserved (adult GBM). Samples were dissociated in a 1.5 mL microcentrfuge tube, using 333 
a wide-bore P1000 pipette followed by a narrow bore P1000 pipette in nuclear resuspension buffer (10 mM 334 
Tris-HCl; 10 mM NaCl; 3 mM MgCl2; 0.1% IGEPAL, 0.1% Tween-20, 0.01% Digitonin, 1% BSA in PBS), 335 
then vortexed briefly, chilled on ice for 10 minutes, then pipetted again, and spun at 4°C, 500 g for 5 minutes. 336 
This step was repeated, and the sample was then resuspended in Tween wash buffer (10 mM Tris-HCl; 10 337 
mM NaCl; 3 mM MgCl2; 0.1% IGEPAL, 0.1% Tween-20; 1% BSA in PBS), then strained though a 35 μm 338 
cell strainer FACS tube (Fisher Scientific 08-771-23) to remove debris. Nuclei were then quantified by trypan 339 
blue on the Countess II (Invitrogen), spun down at 500 g at 4°C for 5 minutes, resuspended in the nuclear 340 
isolation buffer (10X Genomics), and the rest of the scATAC was performed as per the 10X Genomics 341 
protocol. MM samples were from bone marrow aspirates collected from patients; tumor cells were isolated 342 
from mononuclear cell fractions through Ficoll gradients coupled with magnetic bead sorting of CD138+ 343 
cells. scATAC libraries were prepared from GBM and MM samples using a Chromium controller 344 
(10xGenomics). Libraries were sequenced on NextSeq 500 or Novaseq 6000 instruments (Illumina) at the 345 
Centre for Health Genomics and Informatics (CHGI; University of Calgary) using the recommended 346 
settings. 347 

 348 

scATAC-seq initial data analysis 349 

The raw sequencing data was demultiplexed using cellranger-atac mkfastq (Cell Ranger ATAC, version 1.1.0, 350 
10x Genomics). Single cell ATAC-seq reads were aligned to the hg38 reference genome (GRCh38, version 351 
1.1.0, 10x Genomics) and quantified using cellranger-atac count function with default parameters (Cell 352 
Ranger ATAC, version 1.1.0, 10x Genomics). 353 

 354 

Single-cell CNV analysis 355 

Fragment pileup and normalization 356 

The fragment file was processed and signal was binned into bins of a preset size (default 1 Mb) across the 357 
hg38 chromosomes to generate a genome-wide read-depth map. Only barcodes with a minimum of 5000 358 
reads were retained, in order to remove spurious barcodes. This flattened barcode-fragment matrix pileup 359 
was cleaned by removal of genomic intervals which were uninformative (greater than 80% zeros) and 360 
barcodes with greater than a certain number of zero intervals. Cells passing this first filter were normalized 361 
with counts-per-million normalization using cpm in the edgeR package (17). 362 

 363 

Chromosome arm CNV analysis 364 

The normalized barcode-fragment matrix was collapsed to the chromosome arm level, using chromosome 365 
arm information from the UCSC (UCSC table: cytoBand), centromeres were removed, and signal in each 366 
bin was normalized using the number of basepairs in CpG islands in the interval using the UCSC CpG islands 367 
table (UCSC table: cpgIslandExtUnmasked). The signal was then summarized using a quantile-trimmed-368 
mean (between the 50th and 80th quantiles). Only chromosome arms with a minimum trimmed mean signal 369 
were kept for analysis. 370 

The chromosome arm signal matrix is mixed with a generated set proportion of pseudodiploid control cells, 371 
defined using the mean of chromosome segment medians with a defined standard deviation. This cell-signal 372 
matrix is then scaled across each chromosome arm and centered on the median signal of all chromosomes. 373 
Each chromosome arm segment is then analyzed using Gaussian decomposition with Mclust (18). The 374 
subsequent clusters are filtered based on Z scores and mixing proportions, and redundant clusters are 375 
combined. These Z scores are then translated into estimated copy numbers for each segment for each 376 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


13 

barcode. The barcode CNV assignments can be optionally used to assign consensus CNVs to clusters 377 
generated in other software packages such as Loupe or Seurat/Signac.  378 

 379 

Detection of amplifications 380 

The normalized barcode-fragment matrix was scaled and mean-variance changepoint analysis using the 381 
Changepoint package was performed for each cell and each chromosome to identify areas of abnormally 382 
high signal (Z score greater than 5) (19). The consensus coordinates of each amplification region were 383 
generated across all cells and only abnormalities affecting a minimum number of cells were kept for analysis. 384 

 385 

Detection of loss of heterozygosity 386 

The normalized barcode-fragment matrix was scaled as above. As overall coverage levels in these samples 387 
are quite sparse, a chromosome-wide coverage profile was generated for the entire sample in bulk, using the 388 
30% quantile as a cut-off, and then changepoint analysis was used to find inflection points. This was followed 389 
by Gaussian decomposition of the values using Mclust to identify putative areas of loss or gain, thresholded 390 
by a minimum difference in signal between the clusters identified by Mclust. 391 

 392 

scATAC trajectory analysis 393 

STREAM-ATAC and STREAM (20) were used to generate pseudotime trajectories based on motif 394 
occupancy profiles generated using ChromVAR (21) with the JASPAR 2018 motif database as reference (22). 395 
Dimensionality reduction was performed using the top 20 components and 50 neighbours, and an initial 396 
elastic graph was generated on the 2D UMAP projection using 10 clusters, using the kmeans method with 397 
n_neighbours = 30. An elastic principal graph was constructed using the parameters epg_alpha = 0.02, 398 
epg_mu = 0.05, epg_lambda = 0.02 and epg_trimmingradius = 1.2, with branch extension using 399 
‘QuantDists’. Trees were rooted using the branch with highest motif activities for OLIG2 and ETV motifs 400 
as root. 401 

 402 

Whole genome sequencing 403 

DNA was extracted from residual nuclei from the same samples and tissue fragments used for scATAC-seq 404 
of adult GBM samples, using the Qiagen DNEasy Blood and Tissue DNA extraction kit (Qiagen # 69504).  405 
Libraries were prepared using the NEBNext Ultra II DNA Library Prep Kit (#E7645) and sequenced on 406 
the Novaseq 6000 (Illumina) at the CHGI (University of Calgary), in paired-end mode.  407 

 408 

Whole genome data processing 409 

Genome data was aligned to the hg38 assembly using bwa mem (bwa  0.7.17)(23). Samtools was used to 410 
extract high-quality reads (Q > 30) and picard tools (Broad Institute) was used to remove duplicates (24).  411 

 412 

Whole genome SNV and CNV detection 413 

Gatk mutect2 (Broad Institute) was run on the filtered data to detect SNVs with low stringency using the 414 
following settings: --disable-read-filter MateOnSameContigOrNoMappedMateReadFilter. CNVkit was subsequently 415 
used to call copy number variants using the following parameters: --filter cn -m clonal –purity 0.7 (25). Adjacent 416 
segments were further combined and averaged using bedtools (26). 417 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


14 

Data visualization and clustering 418 

Data was visualized and UMAP plots were generated using Seurat 3.0.0 and Signac 1.0.0 (27) and Cell Loupe 419 
version 4.0.0 (28). 420 

 421 

Statistical analysis 422 

Between-group differences in discrete values (e.g. chromosome peaks, branch assignments) were calculated 423 
using the Chi-squared test. Differences in non-parametric distributions (motif accessibility in clusters) were 424 
quantified using the Kruskal-Wallis test. 425 
 426 
 427 
References 428 

1.  B. Lim, Y. Lin, N. Navin, Advancing Cancer Research and Medicine with Single-Cell Genomics. 429 
Cancer Cell. 37, 456–470 (2020). 430 

2.  A. P. Patel, I. Tirosh, J. J. Trombetta, A. K. Shalek, S. M. Gillespie, H. Wakimoto, D. P. Cahill, B. V. 431 
Nahed, W. T. Curry, R. L. Martuza, D. N. Louis, O. Rozenblatt-Rosen, M. L. Suvà, A. Regev, B. E. 432 
Bernstein, Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science 433 
(80-. ). (2014), doi:10.1126/science.1254257. 434 

3.  S. Darmanis, S. A. Sloan, D. Croote, M. Mignardi, S. Chernikova, P. Samghababi, Y. Zhang, N. 435 
Neff, M. Kowarsky, C. Caneda, G. Li, S. D. Chang, I. D. Connolly, Y. Li, B. A. Barres, M. H. 436 
Gephart, S. R. Quake, Single-Cell RNA-Seq Analysis of Infiltrating Neoplastic Cells at the Migrating 437 
Front of Human Glioblastoma. Cell Rep., 1399–1410 (2017). 438 

4.  J. Gojo, B. Englinger, L. Jiang, J. M. Hübner, M. L. Shaw, O. A. Hack, S. Madlener, D. Kirchhofer, 439 
I. Liu, J. Pyrdol, V. Hovestadt, E. Mazzola, N. D. Mathewson, M. Trissal, D. Lötsch, C. Dorfer, C. 440 
Haberler, A. Halfmann, L. Mayr, A. Peyrl, R. Geyeregger, B. Schwalm, M. Mauermann, K. W. 441 
Pajtler, T. Milde, M. E. Shore, J. E. Geduldig, K. Pelton, T. Czech, O. Ashenberg, K. W. 442 
Wucherpfennig, O. Rozenblatt-Rosen, S. Alexandrescu, K. L. Ligon, S. M. Pfister, A. Regev, I. 443 
Slavc, W. Berger, M. L. Suvà, M. Kool, M. G. Filbin, Single-Cell RNA-Seq Reveals Cellular 444 
Hierarchies and Impaired Developmental Trajectories in Pediatric Ependymoma. Cancer Cell. 38, 445 
44–59 (2020). 446 

5.  C. Neftel, J. Laffy, M. G. Filbin, T. Hara, M. E. Shore, G. J. Rahme, A. R. Richman, D. Silverbush, 447 
M. L. Shaw, C. M. Hebert, J. Dewitt, S. Gritsch, E. M. Perez, L. N. Gonzalez Castro, X. Lan, N. 448 
Druck, C. Rodman, D. Dionne, A. Kaplan, M. S. Bertalan, J. Small, K. Pelton, S. Becker, D. Bonal, 449 
Q.-D. Nguyen, R. L. Servis, J. M. Fung, R. Mylvaganam, L. Mayr, J. Gojo, C. Haberler, R. 450 
Geyeregger, T. Czech, I. Slavc, B. V. Nahed, W. T. Curry, B. S. Carter, H. Wakimoto, P. K. 451 
Brastianos, T. T. Batchelor, A. Stemmer-Rachamimov, M. Martinez-Lage, M. P. Frosch, I. 452 
Stamenkovic, N. Riggi, E. Rheinbay, M. Monje, O. Rozenblatt-Rosen, D. P. Cahill, A. P. Patel, T. 453 
Hunter, I. M. Verma, K. L. Ligon, D. N. Louis, A. Regev, B. E. Bernstein, I. Tirosh, M. L. Suvà, An 454 
Integrative Model of Cellular States, Plasticity, and Genetics for Glioblastoma. Cell. 178, 835-849.e21 455 
(2019). 456 

6.  M. C. Vladoiu, I. El-Hamamy, L. K. Donovan, H. Farooq, B. L. Holgado, Y. Sundaravadanam, V. 457 
Ramaswamy, L. D. Hendrikse, S. Kumar, S. C. Mack, J. J. Y. Lee, V. Fong, K. Juraschka, D. 458 
Przelicki, A. Michealraj, P. Skowron, B. Luu, H. Suzuki, A. S. Morrissy, F. M. G. Cavalli, L. Garzia, 459 
C. Daniels, X. Wu, M. A. Qazi, S. K. Singh, J. A. Chan, M. A. Marra, D. Malkin, P. Dirks, L. Heisler, 460 
T. Pugh, K. Ng, F. Notta, E. M. Thompson, C. L. Kleinman, A. L. Joyner, N. Jabado, L. Stein, M. 461 
D. Taylor, Childhood cerebellar tumours mirror conserved fetal transcriptional programs. Nature. 462 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


15 

572, 67–73 (2019). 463 

7.  I. Tirosh, A. S. Venteicher, C. Hebert, L. E. Escalante, A. P. Patel, K. Yizhak, J. M. Fisher, C. 464 
Rodman, C. Mount, M. G. Filbin, C. Neftel, N. Desai, J. Nyman, B. Izar, C. C. Luo, J. M. Francis, 465 
A. A. Patel, M. L. Onozato, N. Riggi, K. J. Livak, D. Gennert, R. Satija, B. V. Nahed, W. T. Curry, 466 
R. L. Martuza, R. Mylvaganam, A. J. Iafrate, M. P. Frosch, T. R. Golub, M. N. Rivera, G. Getz, O. 467 
Rozenblatt-Rosen, D. P. Cahill, M. Monje, B. E. Bernstein, D. N. Louis, A. Regev, M. L. Suvà, 468 
Single-cell RNA-seq supports a developmental hierarchy in human oligodendroglioma. Nature. 539, 469 
309–313 (2016). 470 

8.  A. S. Venteicher, I. Tirosh, C. Hebert, K. Yizhak, C. Neftel, M. G. Filbin, V. Hovestadt, L. E. 471 
Escalante, M. L. Shaw, C. Rodman, S. M. Gillespie, D. Dionne, C. C. Luo, H. Ravichandran, R. 472 
Mylvaganam, C. Mount, M. L. Onozato, B. V. Nahed, H. Wakimoto, W. T. Curry, A. J. Iafrate, M. 473 
N. Rivera, M. P. Frosch, T. R. Golub, P. K. Brastianos, G. Getz, A. P. Patel, M. Monje, D. P. Cahill, 474 
O. Rozenblatt-Rosen, D. N. Louis, B. E. Bernstein, A. Regev, M. L. Suvà, Decoupling genetics, 475 
lineages, and microenvironment in IDH-mutant gliomas by single-cell RNA-seq. Science (80-. ). 355 476 
(2017), doi:10.1126/science.aai8478. 477 

9.  S. Müller, A. Cho, S. J. Liu, D. A. Lim, A. Diaz, CONICS integrates scRNA-seq with DNA 478 
sequencing to map gene expression to tumor sub-clones. Bioinformatics (2018), 479 
doi:10.1093/bioinformatics/bty316. 480 

10.  J. D. Buenrostro, P. G. Giresi, L. C. Zaba, H. Y. Chang, W. J. Greenleaf, Transposition of native 481 
chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins 482 
and nucleosome position. Nat. Methods (2013), doi:10.1038/nmeth.2688. 483 

11.  J. D. Buenrostro, B. Wu, U. M. Litzenburger, D. Ruff, M. L. Gonzales, M. P. Snyder, H. Y. Chang, 484 
W. J. Greenleaf, Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 485 
(2015), doi:10.1038/nature14590. 486 

12.  R. Killick, I. A. Eckley, Changepoint: An R package for changepoint analysis. J. Stat. Softw. (2014), 487 
doi:10.18637/jss.v058.i03. 488 

13.  M. Snuderl, L. Fazlollahi, L. P. Le, M. Nitta, B. H. Zhelyazkova, C. J. Davidson, S. Akhavanfard, D. 489 
P. Cahill, K. D. Aldape, R. A. Betensky, D. N. Louis, A. J. Iafrate, Mosaic amplification of multiple 490 
receptor tyrosine kinase genes in glioblastoma. Cancer Cell (2011), doi:10.1016/j.ccr.2011.11.005. 491 

14.  H. Chen, L. Albergante, J. Y. Hsu, C. A. Lareau, G. Lo Bosco, J. Guan, S. Zhou, A. N. Gorban, D. 492 
E. Bauer, M. J. Aryee, D. M. Langenau, A. Zinovyev, J. D. Buenrostro, G. C. Yuan, L. Pinello, 493 
Single-cell trajectories reconstruction, exploration and mapping of omics data with STREAM. Nat. 494 
Commun. 10, 1903 (2019). 495 

15.  A. N. Schep, B. Wu, J. D. Buenrostro, W. J. Greenleaf, ChromVAR: Inferring transcription-factor-496 
associated accessibility from single-cell epigenomic data. Nat. Methods. 14, pages975–978 (2017). 497 

16.  A. P. Patel, I. Tirosh, J. J. Trombetta, A. K. Shalek, S. M. Gillespie, H. Wakimoto, D. P. Cahill, B. V. 498 
Nahed, W. T. Curry, R. L. Martuza, D. N. Louis, O. Rozenblatt-Rosen, M. L. Suvà, A. Regev, B. E. 499 
Bernstein, Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science. 500 
344, 1396–1401 (2014). 501 

17.  M. D. Robinson, D. J. McCarthy, G. K. Smyth, edgeR: A Bioconductor package for differential 502 
expression analysis of digital gene expression data. Bioinformatics. 26, 139–140 (2010). 503 

18.  L. Scrucca, M. Fop, T. B. Murphy, A. E. Raftery, Mclust 5: Clustering, classification and density 504 
estimation using Gaussian finite mixture models. R J. 8, 289–317 (2016). 505 

19.  R. Killick, I. A. Eckley, Changepoint: An R package for changepoint analysis. J. Stat. Softw. 58 506 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


16 

(2014), doi:10.18637/jss.v058.i03. 507 

20.  H. Chen, L. Albergante, J. Y. Hsu, C. A. Lareau, G. Lo Bosco, J. Guan, S. Zhou, A. N. Gorban, D. 508 
E. Bauer, M. J. Aryee, D. M. Langenau, A. Zinovyev, J. D. Buenrostro, G. C. Yuan, L. Pinello, 509 
Single-cell trajectories reconstruction, exploration and mapping of omics data with STREAM. Nat. 510 
Commun. (2019), doi:10.1038/s41467-019-09670-4. 511 

21.  A. N. Schep, B. Wu, J. D. Buenrostro, W. J. Greenleaf, ChromVAR: Inferring transcription-factor-512 
associated accessibility from single-cell epigenomic data. Nat. Methods (2017), 513 
doi:10.1038/nmeth.4401. 514 

22.  A. Khan, O. Fornes, A. Stigliani, M. Gheorghe, J. A. Castro-Mondragon, R. Van Der Lee, A. Bessy, 515 
J. Chèneby, S. R. Kulkarni, G. Tan, D. Baranasic, D. J. Arenillas, A. Sandelin, K. Vandepoele, B. 516 
Lenhard, B. Ballester, W. W. Wasserman, F. Parcy, A. Mathelier, JASPAR 2018: Update of the 517 
open-access database of transcription factor binding profiles and its web framework. Nucleic Acids 518 
Res. (2018), doi:10.1093/nar/gkx1126. 519 

23.  H. Li, R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler transform. 520 
Bioinformatics. 25, 1754–60 (2009). 521 

24.  H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, 522 
The Sequence Alignment/Map format and SAMtools. Bioinformatics. 25, 2078–2079 (2009). 523 

25.  E. Talevich, A. H. Shain, T. Botton, B. C. Bastian, CNVkit: Genome-Wide Copy Number Detection 524 
and Visualization from Targeted DNA Sequencing. PLoS Comput. Biol. 12 (2016), 525 
doi:10.1371/journal.pcbi.1004873. 526 

26.  A. R. Quinlan, I. M. Hall, BEDTools: A flexible suite of utilities for comparing genomic features. 527 
Bioinformatics. 26, 841–2 (2010). 528 

27.  T. Stuart, A. Srivastava, C. Lareau, R. Satija, bioRxiv, in press, doi:10.1101/2020.11.09.373613. 529 

28.  A. Butler, P. Hoffman, P. Smibert, E. Papalexi, R. Satija, Integrating single-cell transcriptomic data 530 
across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018). 531 

 532 
 533 

ACKNOWLEDGMENTS 534 

Funding: A Canada Research Chair in Brain Cancer Epigenomics (tier 2) from the Government of Canada, 535 
Project grants from the Canadian Institutes of Health Research (CIHR; PJT-156278, PJT-173475), a 536 
Discovery grant from the Natural Sciences and Engineering Research Council (NSERC) and an Azrieli 537 
Future Leader in Canadian Brain Research grant to MG; a Clinician Investigator Program fellowship from 538 
Alberta Health Services and a fellowship from Alberta Innovates to AN; an Eyes High Scholarship from the 539 
University of Calgary to DS; a Clark Smith postdoctoral fellowship and a CIHR postdoctoral fellowship to 540 
MJ; a Canada Research Char in Precision Oncology (tier 2) and a CIHR grant (PJT-438802) to SM; an Alberta 541 
Graduate Excellence Scholarship and Alberta Innovates scholarship to AG. This project has been made 542 
possible by the Brain Canada Foundation through the Canada Brain Research Fund, with the financial 543 
support of Health Canada and the Azrieli Foundation. 544 

 545 

Author contributions: Conception and experimental design: AN, MG. Generation of datasets: AN, KE, 546 
JC, PN, NB. Data acquisition and analysis: AN, DS, MJ, AG, SM, NB, MG. Data interpretation and creation 547 
of new software: AN, MG. Manuscript preparation: All co-authors. 548 
 549 
Competing interests: The authors declare no competing interests. 550 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


17 

 551 

Data and materials availability: The Copy-scAT package and a sample tutorial are available on Github at 552 
http://github.com/spcdot/CopyscAT. All datasets will be made available upon publication in a peer 553 
reviewed journal. 554 
 555 
 556 
 557 
 558 
 559 
 560 
 561 
 562 
 563 
 564 
 565 
 566 
 567 
 568 
 569 
 570 
 571 
 572 
 573 
 574 
 575 
 576 
 577 
 578 
 579 
 580 
 581 
 582 
 583 
 584 
 585 
 586 
 587 
 588 
 589 
 590 
 591 
 592 
 593 
 594 
 595 
 596 
 597 
 598 
 599 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


18 

SUPPLEMENTAL MATERIAL 600 
 601 
 602 
Table S1. Summary of samples and cells profiled by Copy-scAT 603 
 604 
Sample Unique barcodes 

after pileup 
Unique barcodes 
after filtering 

Percent passing 
filters 

CGY4218 1542 1335 86.58% 
CGY4250 1371 947 69.07% 
CGY4275 1004 609 60.66% 
CGY4349 961 756 78.67% 
pCGY2932 1203 802 66.67% 
pCGY2937 1445 1200 83.04% 
pCGY3103 2189 956 43.67% 
pCGY3402 3162 2318 73.31% 
pCGY3749 2774 2503 90.23% 
pCGY4021 1963 1382 70.40% 
MM1217 890 792 88.99% 
MM1388 2538 2219 87.43% 
MM1389 7438 6607 88.83% 
MM1438 1774 1578 88.95% 
MM1460 2048 1683 82.18% 
MM1479 5135 4564 88.88% 
MM1498 7408 2160 29.16% 
MM1555 7220 6586 91.22% 
MM1643 3141 2794 88.95% 
MM1698 2844 2283 80.27% 
Total cells profiled 58050 44074 76.86% 

 605 
 606 
 607 
 608 
 609 
 610 
 611 
 612 
 613 
 614 
 615 
 616 
 617 
 618 
 619 
 620 
 621 
  622 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


19 

Table S2. Sensitivity and Specificity of Copy-scAT in aGBM, pGBM and MM samples  623 
 624 
 Gains Losses Amplifications 
Samples Sensitivity Specificity Sensitivity Specificity Sensitivity Specificity 
aGBM (n = 3) 1.0 0.94 0.79 0.89 1.0 1.0 
pGBM (n= 6) 0.73 0.93 0.73 0.95 N/A 0.975 
MM (n = 10) 0.51 0.94 0.67 0.89 N/A N/A 

 625 
 626 
 627 
 628 
 629 
 630 
 631 
 632 
 633 
 634 
 635 
 636 
 637 
 638 
 639 
 640 
 641 
 642 
 643 
 644 
 645 
 646 
 647 
 648 
 649 
 650 
 651 
 652 
 653 
 654 
 655 
 656 
 657 
 658 
 659 
 660 
 661 
 662 
 663 
 664 
 665 
 666 
 667 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


20 

Fig. S1. Comparison of CNVs inferred by Copy-scAT and by WGS for adult GBM samples. 668 
(A) Comparison of chromosome arm level losses detected in three adult GBM samples by single cell 669 
ATAC, WGS, or both methods.  670 
(B) Comparison of focal amplifications detected in three adult GBM sample by scATAC, WGS, or both 671 
methods. 672 
 673 

 674 
 675 
 676 
 677 
 678 
 679 
 680 
 681 
 682 
 683 
 684 
 685 
 686 
 687 
 688 
 689 
 690 
 691 
 692 
 693 
 694 
 695 
 696 
 697 
 698 
 699 
 700 
 701 
 702 
 703 
 704 
 705 
  706 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


21 

Fig. S2. Comparison of CNVs inferred by Copy-scAT or WGS in pediatric GBM samples. 707 
(a) Gains detected in three pediatric GBM samples compared to linked-reads WGS. 708 
(b) Losses detected in three pediatric GBM samples compared to linked-reads WGS. 709 
 710 

 711 
 712 
 713 
 714 
 715 
 716 
 717 
 718 
 719 
 720 
 721 
 722 
 723 
 724 
 725 
 726 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


22 

Fig. S3. Comparison of CNVs inferred by Copy-scAT or with the scCNV assay in multiple 727 
myeloma samples. 728 
(a) Comparison of gains seen in additional myeloma samples versus 10x single-cell CNV sequencing. (b) 729 
Comparison of chromosome losses seen in additional myeloma samples versus 10x single-cell CNV 730 
sequencing. (c,d) Number of gains and losses detected by both methods compared to number of cells in 731 
scATAC-seq sample. (e-f) Number of shared gains or losses detected between the two methods, plotted 732 
versus the number of cells in the scATAC-seq experiment.  (g-h) Number of shared gains or losses 733 
detected between the two methods, plotted versus the number of reads per cell in the scATAC. 734 

 735 
 736 
 737 
 738 
 739 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


23 

Fig. S4. CNVs are detected in scATAC clusters with Copy-scAT in pediatric GBM samples.  740 
(a) Overview of cell assignments in two paired patient libraries.  741 
(b-d) Representative WGS-confirmed alterations detected in pCGY2932 and pCGY2937. 742 
 743 

 744 
  745 
 746 
 747 
 748 
 749 
 750 
 751 
 752 
 753 
 754 
 755 
 756 
 757 
 758 
 759 
 760 
 761 
 762 
  763 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


24 

Fig. S5. CNVs are identified by Copy-scAT in specific scATAC clusters in multiple myeloma 764 
samples. (a) Gain of chromosome 11p restricted to neoplastic cell populations. (b) Similar pattern with 765 
gain of chromosome 11q. (c) Similar pattern with loss of chr13q. 766 
 767 

 768 
 769 
 770 
 771 
 772 
 773 
 774 
 775 
 776 
 777 
 778 
 779 
 780 
 781 
 782 
 783 
 784 
 785 
 786 
  787 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


25 

Fig. S6. Additional chromosome copy number analyses for CGY4218.  788 
(a) Initial neighbourhood clustering results from Signac.  789 
(b-f) Representative chromosome-level copy number alteration profiles for tumour and normal cells. (g-n) 790 
Representative motif scores from ChromVAR for different motifs, including (g) ELF5, (h) SPIB, (i) 791 
ASCL1, (j) IKZF1, (k) NEUROD1, (l) NFIC, (m) NFYA, (n) ELK3.792 

 793 
 794 
 795 
 796 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


26 

Fig. S7. Representative copy number information and distribution for aGBM sample CGY4250.  797 
(a) Neighbourhood clustering results from Signac.  798 
(b-c) Distribution of amplifications in EGFR and MDM2.  799 
(d-i) Representative chromosome-level copy number alteration profiles for tumour and normal cells. (j-l) 800 
Representative motif scores from ChromVAR for different motifs, including (j) NFIC, (k) SPIB and (l) 801 
FOXG1.802 

 803 
 804 
 805 
 806 
 807 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


27 

Fig. S8. Representative copy number information and distribution for aGBM sample CGY4275.  808 
(a) Neighbourhood clustering results from Signac.  809 
(b) Distribution of amplifications in EGFR.  810 
(c-j) Representative chromosome-level copy number alteration profiles for tumour and normal cells. (g-l) 811 
Representative motif scores from ChromVAR for different motifs, including (g) NFIC, (h) FOS::JUN, (i) 812 
NEUROD1, (j) ELF5, (k) SPIB, and (l) IKZF1. 813 
 814 

 815 
 816 
 817 
 818 
 819 
  820 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


28 

Fig. S9. Effects of removing CNVs on variance in aGBM sample CGY4349.  821 
(a) Distribution of the top 2000 most variable peaks in the tumour cells after filtering out non-neoplastic 822 
cells; p value from Chi-squared test.  823 
(b) Distribution of top 2000 most variable peaks in non-neoplastic cells after filtering (P VALUE FROM 824 
CHI-SQUARED TEST). Chromosomes with CNVs or amplification regions are highlighted in pink. 825 
(c) Distribution of top 2000 most variable peaks in tumour cells after filtering of non-neoplastic cells and 826 
removal of regions containing CNVs (P VALUE FROM CHI-SQUARED TEST). 827 
 828 

 829 
 830 
 831 
 832 
 833 
 834 
 835 
 836 
 837 
 838 
 839 
 840 
 841 
 842 
 843 
 844 
 845 
  846 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


29 

Fig. S10. Validation of Copy-scAT and identification of putative proliferative cells in non-847 
neoplastic datasets. (a) Chromosome copy number distribution in a 10X dataset of 5000 human PBMCs.  848 
(b) Seurat clusters for the 10X dataset of 5000 human PBMCs.  (c) Estimate of cycle status for the 10X 849 
dataset of 5000 human PBMCs. (d) Chromosome copy number distribution in a 10X dataset of mouse 850 
embryonic brain at E18. (e,f) Predicted cycle status and cluster assignments in E18 mouse brain. (g,h) 851 
Predicted cell cycle status and cluster profile in P50 mouse brain dataset from 10X. 852 
 853 

 854 
 855 
 856 
 857 
 858 
 859 
 860 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


30 

Fig. S11. Pseudotime trajectory analysis of  aGBM sample CGY4250. Distribution of EGFR 861 
amplification (a) and cell cycle status (b) amongst branches. Distribution of ChromVAR motif scores in 862 
branches for proneural motifs ASCL1 and OLIG2 (c,d), ETV1 (e), NFIX (f), and mesenchymal motifs 863 
JUN::JUNB (g) and STAT3 (h). 864 
 865 

 866 
 867 
 868 
 869 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


31 

Fig. S12. Pseudotime trajectory analysis of aGBM sample CGY4349.  Distribution of PDGFRA 870 
amplification (a) and cycling status (b) amongst branches. Distribution of ChromVAR motif scores in 871 
branches for proneural motifs ASCL1 and OLIG2 (c,d), ETV1 (e), NFIX (f), and mesenchymal motifs 872 
JUN::JUNB (g) and STAT3 (h). 873 
 874 

 875 
 876 
 877 
 878 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/


32 

Fig. S13. Pseudotime trajectory analysis of aGBM sample CGY4275.  Distribution of ChromVAR 879 
motif scores in branches for proneural motifs ASCL1 and OLIG2 (a,b), ETV1 (c), NFIX (d), and 880 
mesenchymal motifs JUN::JUNB (e) and STAT3 (f). 881 
 882 

 883 
 884 
 885 
 886 
 887 
 888 

.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 13, 2021. ; https://doi.org/10.1101/2020.09.21.305516doi: bioRxiv preprint 

https://doi.org/10.1101/2020.09.21.305516
http://creativecommons.org/licenses/by-nc-nd/4.0/