Sample-wise unsupervised deconvolution of complex tissues


 1 

Sample-wise unsupervised deconvolution of complex tissues 

 
Lulu Chen1, Chia-Hsiang Lin2,*, Chiung-Ting Wu1,*, Robert Clarke3, Guoqiang Yu1, Jennifer E. 

Van Eyk4, David M. Herrington5, and Yue Wang1,# 

 
* Equal contribution 

 
1Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State 

University, Arlington, VA 22203, USA; 2Department of Electrical Engineering, National Cheng 

Kung University, Tainan City, Taiwan 70101; 3The Hormel Institute, University of Minnesota, 

Austin, MN 55912; 4Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical 

Center, Los Angeles, CA 90048, USA; 5Department of Internal Medicine, Wake Forest 

University, Winston-Salem, NC 27157, USA 

 
Article Format:   bioRxiv 

 
Running title:    Sample-wise unsupervised deconvolution 

 
Open source software: R code is available  

at https://github.com/Lululuella/swCAM 

 
#Author for correspondence:  Yue Wang, Ph.D. 

Virginia Polytechnic Institute and State University 

900 N. Glebe Road, Arlington, VA 22203 

E-mail: yuewang@vt.edu 

  
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

mailto:yuewang@vt.edu
https://doi.org/10.1101/2021.01.04.425315


 2 

Abstract 

We report a sample-wise fully unsupervised deconvolution method, namely sample-wise Convex 

Analysis of Mixtures (swCAM), that can estimate constituent proportions and subtype-specific 

expressions in individual samples using tissue-level bulk data (Chen 2019). The swCAM software 

tool enables statistically-principled subtype-level downstream analyses, such as detecting subtype-

specific differentially expressed genes (sDEG) and differential dependency networks (DDN) 

(Zhang, Li et al. 2009, Chen, Lu et al. 2020). Significantly different from population-level 

deconvolution, individual-level deconvolution is mathematically an underdetermined problem 

because there are more variables than observations. We therefore extend the existing CAM 

framework by adding an extra term of between-sample variations and formulate swCAM as a 

nuclear-norm regularized low-rank matrix factorization problem (Wang, Hoffman et al. 2016). We 

determine hyperparameter value by random entry exclusion based cross-validation scheme and 

obtain swCAM solution using a modified efficient alternating direction method of multipliers 

(ADMM). Experimental results on realistic simulation data sets show that swCAM can accurately 

perform sample-wise unsupervised deconvolution of complex tissues and successfully recover 

subtype-specific correlation networks that are otherwise unobtainable using existing methods. 

  
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315


 3 

Introduction 

Decades of research on molecule regulatory mechanisms have provided a rich framework with 

which we can extract molecule expression patterns to gain insight into the organization and 

structure of the large biological networks (Zhang and Horvath 2005, Zhang, Li et al. 2009, Tian, 

Zhang et al. 2014). However, most discoveries are concluded from measured molecule expressions 

in heterogeneous tissues, in which the underlying changes of constituent components could 

obscure molecule regulations and corrupt network inference that only occur in particular tissue 

subtypes. The inference of subtype-specific molecule expression patterns becomes an essential 

problem for understanding complex molecule functions and the role of each subtype during the 

dynamic biological process, such as cell fate specification (Sonawane, Platig et al. , Chasman and 

Roy 2017, Gal, London et al. 2017). The ability to obtain sample-wise expression variation in each 

subtype is critical prior to infer subtype-specific molecular networks (Shen-Orr, Tibshirani et al. 

2010, Junttila and de Sauvage 2013). Single-cell expression profiling techniques have become 

popular to investigate cell-type-specific network but may lose critical information of cell-cell 

interactions and is prone to cell-cycle/state confounders (Buettner, Natarajan et al. 2015, Gal, 

London et al. 2017). 

While the current CAM tool can dissect mixed signals of multiple samples into the ‘averaged’ 

expression profiles of subtypes, many subsequent molecular analyses of complex tissues require 

sample-specific signal deconvolution where each sample is a mixture of ‘individualized’ subtype-

specific expression profiles. Here we propose a new algorithm called swCAM as an extension of 

CAM to solve sample-specific Blind Source Separation (sBSS) problem. The sBSS problem, 

because the number of variables is much larger than the number of observations, is ill-posed and 

underdetermined. As a result, simple yet highly regularized approaches often become the methods 

of choice (Hastie, Tibshirani et al. 2001). The sBSS problems have received increasing interest in 

hyperspectral imagery area where the spatially smooth and variation sparsity regularization can be 

exploited to unmix spectral signals (Thouvenin, Dobigeon et al. 2016). In the context of biological 

process, transcriptional regulatory networks connect regulatory proteins, such as transcription 

factors (TFs) and signaling proteins, to target genes and thus form co-expressed gene sets as 

function modules in each subtype. Based on such underlying cellular mechanisms, we impose and 

exploit the low-rank assumption on between-sample variations of molecule expressions in each 

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315


 4 

subtype. While estimating subtype-specific signals from a single mixture for each gene 

independently is an underdetermined problem, the low-rank assumption can aggregate information 

from gene sets within function modules to help find a biologically plausible solution.  

General rank minimization is a challenging nonconvex optimization problem for which all existing 

finite-time algorithms have at least doubly exponential running times in both theory and practice 

(Recht, Fazel et al. 2010). Minimizing the nuclear norm, or the sum of the singular values of the 

matrix, over the affine subset, have multiple advantages. The nuclear norm is a convex function 

and can be optimized efficiently, thus can provide the best convex approximation of the rank 

function over the unit ball of matrices with norm less than one (Recht, Fazel et al. 2010, Candes, 

Sing-Long et al. 2013). Nuclear norm regularization has been successfully applied in many 

practical applications with low-rank modeling, such as image denoising (Candes, Sing-Long et al. 

2013) and matrix completion (Cai, Candès et al. 2010). swCAM will adopt nuclear norm 

regularization to optimize the estimation of between-sample variations in each subtype to recover 

sample-specific signals for each subtype. This Chapter introduces mathematical modeling of 

sample-specific deconvolution and optimization solver used in swCAM algorithm, followed by 

validation in simulations and discussion on further possible improvement by sparsity 

regularization. 

Method 

Problem formulation and nuclear norm regularization 

A fundamental assumption for the conventional linear mixing model, 𝑿 = 𝑨𝑺 + 𝑬, is that all the 

mixture samples share a common source matrix 𝑺 . However, each sample may have its 

‘individualized’ sample-specific sources as the sampled realizations in additional sample-specific 

subtype proportions (Fig. 1): 

𝑺𝑖 = �̅� + ∆𝑺𝑖 , 𝑖 = 1, … , 𝑀. 

The associated sample-specific BSS (sBSS) model is given by 

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315


 5 

𝒙𝑖 = 𝒂𝑖 (�̅� + ∆𝑺𝑖 ) + 𝒏𝑖 ∈ ℝ
𝐿 , ∀𝑖 = 1, … , 𝑀 (1)

where 𝒂𝑖  is the row vector in proportion matrix 𝑨, and 𝒏𝑖 is noise term. ∆𝑺𝑖 is expected to have 

small-valued entries so that source matrices among different samples are similar. 

 
Fig. 1. Sample-specific deconvolution problem formulation and the assumption of hidden low-rank pattern 

in each source. (For convenient illustration, 𝑻 matrix in all figures are the transposed version of those in 
the text and equations.) 

If some expression units (molecular features) are highly correlated in a particular source, the rank 

of the matrix consisting of ∆𝑺𝑖 entries in this source will be low, leading to the following swCAM 

objective function: 

min
𝑨,{∆𝑺𝑖}𝑖=1

𝑀

1

2
∑‖𝒙𝑖 − 𝒂𝑖 (�̅� + ∆𝑺𝑖 )‖2

2

𝑀

𝑖=1

+ 𝜆 ∑‖𝑻𝑘 ‖∗

𝐾

𝑘=1

(2) 

𝑠. 𝑡.  𝒂𝑖 ≽ 𝟎𝐾 , 𝒂𝑖 𝟏𝐾 = 1, �̅� + ∆𝑺𝑖 ≽ 𝟎𝐾×𝐿 , 

𝑻𝑘 = [∆𝑺1
𝑇 (𝑘), … , ∆𝑺𝑀

𝑇 (𝑘)] ∈ ℝ𝐿×𝑀,  𝑘 = 1, … , 𝐾, 

where 𝑻𝑘  consists of the 𝑘 th column in all ∆𝑺𝑖 , 𝑖 = 1, … , 𝑀 , representing between-sample 

variation in source 𝑘, namely the variation matrix for the 𝑘th subtype. ‖𝑻𝑘 ‖∗ is the nuclear norm 

of 𝑻𝑘; and 𝜆 > 0 is the regularization parameter of nuclear norms. The hyperparameter λ actually 

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315


 6 

can be different among different source 𝑘  and thus the regularizer in (2) can be replaced by 

∑ 𝜆𝑘 ‖𝑻𝑘 ‖∗
𝐾
𝑘=1  if necessary. Please see supplementary information for the reason to select subtype-

specific regularization terms. 

Optimization of swCAM objective function 

The objective function in (2) is bi-convex w.r.t. the two block-wise variables, i.e. 𝑨 ≜

[𝒂1
𝑇 , … , 𝒂𝑀

𝑇 ]𝑇  and 𝑻 ≜ [𝑻1
𝑇 , … , 𝑻𝑀

𝑇 ]𝑇 ∈ ℝ𝐾𝐿×𝑀 . Accordingly, we can solve (2) by alternatively 

solving the following two convex subproblems until convergence:  

𝑻𝑝+1 ∈ argmin
∆𝑺i≽−𝑺,∀𝒊

𝒥(𝑨𝑝, 𝑻) (3) 

𝑨𝑝+1 ∈ argmin
𝑨≽0𝑀×𝐾,𝑨𝟏𝐾=𝟏𝑀 

𝒥(𝑨, 𝑻𝑝+1) (4) 

where 

𝒥(𝑨, 𝑻) ≜
1

2
∑‖𝒙𝑖 − 𝒂𝑖 (�̅� + ∆𝑺i)‖2

2

𝑀

𝑖=1

+ 𝜆 ∑‖𝑻𝑘 ‖∗

𝐾

𝑘=1

 
CAM-estimated subtype-specific expression matrix serves as the initial reference 𝑺. Note that in 

(3) (4), we have implicitly used the following relationship for concise representation: 

𝑻 ≜ [𝑣𝑒𝑐(Δ𝑺1
𝑇 ), … , 𝑣𝑒𝑐(Δ𝑺𝑀

𝑇 )], 

where (4) can be decoupled w.r.t each row of 𝑨: 

𝒂𝑖
𝑝+1

∈ argmin
𝒂𝑖≽𝟎𝐾,𝒂𝑖𝟏𝐾=1

1

2
‖𝒙𝑖 − 𝒂𝑖 (�̅� + ∆𝑺𝑖

𝑝+1
)‖

2

2
 

which can be solved using quadratic programming. If a prior proportion matrix or CAM-estimated 

proportion matrix has already been of high quality, we can skip the alternative optimization on 𝑨 

matrix, and obtain 𝑻 matrix by optimizing the subproblem (3) only once. 

To solve (3), we notice that the main bottleneck is its huge dimension of variables (typically, L is 

several ten thousand), preventing conventional convex solvers from being readily applicable here. 

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315


 7 

We propose to solve (3) by adapting the alternating direction method of multipliers (ADMM), 

which has been widely applied to many large-scale problems in areas such as statistical learning, 

image processing and computational biology (Boyd, Parikh et al. 2011).  

ADMM naturally allows decoupling the non-smooth regularization term from the smooth loss term, 

which is computationally advantageous. Specifically, we reformulate (3) in the form that the 

primal variable can be “split” into several parts, with the associated objective function “separable” 

across this splitting (Boyd, Parikh et al. 2011). We will use the following definitions: 

𝑻 ≜ [𝑣𝑒𝑐(Δ𝑺1
𝑇 ), … , 𝑣𝑒𝑐(Δ𝑺𝑀

𝑇 )] = [
𝑻1
…
𝑻𝐾

] ∈ ℝ𝐾𝐿×𝑀 

𝑺 ≜ [𝑣𝑒𝑐(𝑺1
𝑇 ), … , 𝑣𝑒𝑐(𝑺𝑀

𝑇 )] ∈ ℝ𝐾𝐿×𝑀  

𝑽 ≜ 𝑿𝑇 ∈ ℝ𝐿×𝑀 

𝑾 ≜ [
𝑻
𝑺

] ∈ ℝ2𝐾𝐿×𝑀 

𝑪1 ≜ [
𝑰𝐾𝐿
𝑰𝐾𝐿

] ∈ ℝ2𝐾𝐿×𝐾𝐿 

𝑪2 ≜ −𝑰2𝐾𝐿 ∈ ℝ
2𝐾𝐿×2𝐾𝐿  

𝑪3 ≜ [
𝟏𝑀

𝑇 ⨂𝑣𝑒𝑐(�̅�𝑇)
𝟎𝐾𝐿×𝑀

] ∈ ℝ2𝐾𝐿×𝑀 

𝑩0 ≜ [𝟎𝐾𝐿×𝐾𝐿 , 𝑰𝐾𝐿 ] ∈ ℝ
𝐾𝐿×2𝐾𝐿  

𝑩𝑘 ≜ [𝟎𝐿×(𝑘−1)𝐿 , 𝑰𝐿 , 𝟎𝐿×(𝐾−𝑘)𝐿 , 𝟎𝐿×𝐾𝐿 ] ∈ ℝ
𝐿×2𝐾𝐿 , 𝑘 = 0, … , 𝐾 

Then we can simplify (3) as the equivalent form: 

min
𝑼∈ℝ𝐾𝐿×𝑀,𝑾∈ℝ2𝐾𝐿×𝑀

1

2
‖𝒜(𝑼) − 𝑽‖𝐹

2 + 𝜆 ∑‖𝑩𝑘 𝑾‖∗

𝐾

𝑘=1

+ 𝐼+(𝑩0𝑾) (5) 

𝑠. 𝑡. 𝑪1𝑼 + 𝑪2𝑾 = 𝑪3, 

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315


 8 

where 𝐼+(∙) is the indicator function for the non-negative orthant;  𝐼+(𝑩0𝑾) = 𝐼+(𝑺) = 0 if 𝑺 ≽

𝟎𝐾𝐿×𝑀  ( 𝐼+(𝑼) = +∞ , otherwise). The linear transformation in the first term is 𝒜(𝑼) =

𝒜([𝒖1, … , 𝒖𝑀]) = [𝑯1𝒖1, … , 𝑯𝑀𝒖𝑀] with 𝑯𝑖 = [𝒂𝑖
𝑝

⨂𝐼𝐿 ], 𝑖 = 1, … , 𝑀 . Note that (5) has been 

with the ADMM form w.r.t. the two split block variables 𝑼 and 𝑾, and, as (5) is solved, the 

solution of (3) can be obtained by 𝑻𝑝+1 = [ 𝑰𝐾𝐿 , 𝟎𝐾𝐿×𝐾𝐿 ]𝑾
∗. 

Given a penalty parameter 𝛾 > 0  (empirically, 𝛾 ≔ 1  generally guarantees good convergence 

speed), the augmented Lagrangian (ignoring some irrelevant terms) of problem (5) is defined by 

ℒ(𝑼, 𝑾, 𝒁) =
1

2
‖𝒜(𝑼) − 𝑽‖𝐹

2 + 𝜆 ∑‖𝑩𝑘 𝑾‖∗

𝐾

𝑘=1

+ 𝐼+(𝑩0𝑾) +
𝛾

2
‖𝑪1𝑼 + 𝑪2𝑾 − 𝑪3 − 𝒁‖𝐹

2  

where “−𝛾𝒁”∈ ℝ2𝐾𝐿×𝑀 is the dual variable (or Lagrange multiplier) associated with the constraint 

𝑪1𝑼 + 𝑪2𝑾 = 𝑪3. Then, ADMM solves (5) via the following iterative procedure: 

𝑼𝑞+1𝜖 argmin
𝑼∈ℝ𝐾𝐿×𝑀

ℒ(𝑼, 𝑾𝑞 , 𝒁𝑞 ) (6𝑎) 

𝑾𝑞+1𝜖 argmin
𝑾∈ℝ2𝐾𝐿×𝑀

ℒ(𝑼𝑞+1, 𝑾, 𝒁𝑞 ) (6𝑏) 

𝒁𝑞+1 = 𝒁𝑞 − (𝑪1𝑼
𝑞+1 + 𝑪2𝑾

𝑞+1 − 𝑪3) (6𝑐) 

where 𝑾0 can be initialized by [𝑻0
𝑇 , 𝑼0

𝑇 ]𝑇  with 𝑻0 = 𝟎𝐾𝐿×𝑀  and 𝑼0 = 𝟏𝑀
𝑇 ⨂𝑣𝑒𝑐(�̅�𝑇 ); 𝒁0 can be 

simply initialized by 𝟎2𝐾𝐿×𝑀. As we will show, both (6a) and (6b) can be solved with closed-form 

expressions, thanks to the decomposability of ADMM. 

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315


 9 

 
Fig. 2. The objective function of swCAM for sample-specific deconvolution problem and its reformulation 

by ADMM. (For convenient illustration, 𝑻 matrix in all figures are the transposed version of those in the 
text and equations.) 

Notice that (6a) is a column-wise separable optimization problem, so we can decouple w.r.t each 

column of 𝑼: 

𝒖𝑖
𝑞+1

∈ argmin
𝒖𝑖∈ℝ

𝐾𝐿

1

2
‖𝑯𝑖 𝒖𝑖 − 𝒗𝑖 ‖2

2 +
𝛾

2
‖𝑪1𝒖𝑖 + 𝒚𝒊

𝑞
‖

𝐹

2
(7) 

where [𝒚1
𝑞

, … , 𝒚𝑀
𝑞

] ≜ 𝑪2𝑾
𝑞 − 𝑪3 − 𝒁

𝑞 . The subproblem (7) is an unconstrained quadratic 

problem, which can be solved by  

𝒖𝑖
𝑞+1

= (𝑯𝑖
𝑇 𝑯𝑖 + 𝛾𝑪1

𝑇 𝑪1)
−1(𝑯𝑖

𝑇 𝒗𝑖 − 𝛾𝑪1
𝑇 𝒚𝒊

𝑞
). (8) 

The matrix inversion can speed up by 

(𝑯𝑖
𝑇 𝑯𝑖 + 𝛾𝑪1

𝑇 𝑪1)
−1 = ((𝒂𝑖

𝑝
)

𝑇
𝒂𝑖

𝑝
+ 2𝛾𝑰𝐾 )

−1

⨂𝑰𝐿 . 

The right term in (8) can also be simplified as 

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315


 10 

𝑯𝑖
𝑇 𝒗𝑖 − 𝛾𝑪1

𝑇 𝒚𝒊
𝑞

= (𝒂𝑖
𝑝

)
𝑇

⨂𝒙𝑖
𝑇 − 𝛾 (𝒚

𝒊
𝑞

+ 𝒚𝒊
𝑞

), 

where 𝒚𝒊
𝑞

= [(𝒚
𝒊
𝑞

)
𝑇

, (𝒚𝒊
𝑞

)
𝑇

]
𝑇

with 𝒚
𝒊
𝑞

∈ ℝ𝐾𝐿  and 𝒚𝒊
𝑞

∈ ℝ𝐾𝐿  being the first and second half vector 

of 𝒚𝒊
𝑞
, respectively.  

Finally, the column vectors of 𝑼𝑞+1 in (6a) can be computed fast by 

𝒖𝑖
𝑞+1

= 𝑣𝑒𝑐 {𝑑𝑒𝑣𝑒𝑐 {(𝒂𝑖
𝑝

)
𝑇

⨂𝒙𝑖
𝑇 − 𝛾 (𝒚

𝒊
𝑞

+ 𝒚𝒊
𝑞

) |𝐿, 𝐾} ((𝒂𝑖
𝑝

)
𝑇

𝒂𝑖
𝑝

+ 2𝛾𝑰𝐾 )
−1

} (9) 

To solve (4.6b), we remove some irrelevant terms from its objective function: 

min
𝑾∈ℝ2𝐾𝐿×𝑀

𝜆 ∑‖𝑩𝑘 𝑾‖∗

𝐾

𝑘=1

+ 𝐼+(𝑩0𝑾) +
𝛾

2
‖𝑪1𝑼

𝑞+1 + 𝑪2𝑾 − 𝑪3 − 𝒁
𝑞 ‖𝐹

2 , (10) 

And then, by defining 𝑼𝑘
𝑞+1

∈ ℝ𝐿×𝑀, 𝑘 = 1, … , 𝐾 as block matrices from top to bottom in 𝑼𝑞+1 ∈

ℝ𝐾𝐿×𝑀 , 𝒁𝑘 ∈ ℝ
𝐿×𝑀, 𝑘 = 1, … , 𝐾 and 𝒁0 ∈ ℝ

𝐾𝐿×𝑀 as block matrices from top to bottom in 𝒁 ∈

ℝ2𝐾𝐿×𝑀 , respectively (i.e., 𝒁 ≜ [𝒁1
𝑇 , … , 𝒁𝐾

𝑇 , 𝒁0
𝑇 ]𝑇 ), we decouple the objective function (10) as 

functions of 𝑻𝑘 , 𝑘 = 1, … , 𝐾 and 𝑺: 

min
𝑾∈ℝ2𝐾𝐿×𝑀

∑ {𝜆‖𝑻𝑘 ‖∗ +
𝛾

2
‖𝑼𝑘

𝑞+1
− 𝑻𝑘 − 𝟏𝑀

𝑇 ⨂�̅�𝑘 − 𝒁𝑘
𝑞

‖
𝐹

2
}

𝐾

𝑘=1

+ {𝐼+(𝑺) +
𝛾

2
‖𝑼𝑞+1 − 𝑺 − 𝒁0

𝑞
‖

𝐹

2
} 

Therefore, 𝑾𝑞+1 can be solved by the proximal point algorithm (PPA) (Parikh and Boyd 2014). 

Specifically, we have 

𝑾𝑞+1 = [(𝑻1
𝑞+1

)
𝑇

, … , (𝑻𝐾
𝑞+1

)
𝑇

, (𝑺𝑞+1)𝑇 ]
𝑇

 
in which 

𝑻𝑘
𝑞+1

∈ argmin
𝑻∈ℝ𝐾𝐿×𝑀

𝜆‖𝑻𝑘 ‖∗ +
𝛾

2
‖𝑼𝑘

𝑞+1
− 𝑻𝑘 − 𝟏𝑀

𝑇 ⨂�̅�𝑘 − 𝒁𝑘
𝑞

‖
𝐹

2
(11𝑎) 

𝑺𝑞+1 ∈ argmin
𝑻∈ℝ𝐾𝐿×𝑀

𝐼+(𝑺) +
𝛾

2
‖𝑼𝑞+1 − 𝑺 − 𝒁0

𝑞
‖

𝐹

2
(11𝑏) 

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315


 11 

Note that (4.11a) and (4.11b) are exactly the proximal operators of ‖𝑻𝑘 ‖∗ and 𝐼+(𝑺), respectively 

(Parikh and Boyd 2014), and their closed-form solutions are given by 

𝑻𝑘
𝑞+1

= ∑ (𝜎𝑘ℓ −
𝜆

𝛾
)

+

𝝁𝑘ℓ𝝂𝑘ℓ
𝑇

𝑟

ℓ=1

, 𝑘 = 1, … , 𝐾, (12) 

𝑺𝑞+1 = [𝑼𝑞+1 − 𝒁0
𝑞

]
+

, (13) 

where the singular value decomposition (SVD) of is performed ahead of the computation of (12), 

i.e. 𝑼𝑘
𝑞+1

− 𝑻𝑘 − 𝟏𝑀
𝑇 ⨂�̅�𝑘 − 𝒁𝑘

𝑞
= ∑ 𝜎𝑘ℓ𝝁𝑘ℓ𝝂𝑘ℓ

𝑇𝑟
ℓ=1 . 

A reasonable termination criterion is that the primal residual, 𝑝𝑟𝑖 = ‖𝑪1𝑼 + 𝑪2𝑾 − 𝑪3‖2, and 

dual residual, 𝑑𝑢𝑎𝑙 = ‖𝛾𝑪1
𝑇 𝑪2(𝑾

𝑞+1 − 𝑾𝑞 )‖2, are smaller than a predefined tolerance. 

Model parameter tuning 

In noisy scenarios, the penalty parameter 𝜆 setting is critical to determine how much variation is 

persevered as patterns of interest or ignored as noise. An extremely large  𝜆  will coerce the 

individual variation to be zero. Decreasing 𝜆  will allow more subtype-specific patterns to be 

detected until overfitting. 

Cross-validation is a popular strategy in parameter tuning for the balance of underfitting and 

overfitting. One round of cross-validation excludes a certain portion of samples and uses the model 

learned from other samples to predict the excluded ones. Then every model is assessed by 

summarizing prediction performances across multiple rounds. However, our sample-specific 

deconvolution estimates the individual expression of each sample in each subtype, which cannot 

be used to predict the excluded samples directly. Thus, we proposed to randomly exclude entries 

rather than samples in 𝑿 matrix (Fig. 3), similar to the strategy used in missing value imputation. 

The foundation of success is that the low-rank patterns in 𝑻𝑘  matrix are detectable by only a 

portion of 𝑿 entries and able to predict the excluded 𝑿 entries. 

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315


 12 

 
Fig. 3. 10-fold cross-validation strategy for model parameter tuning. A part of entries is randomly removed 

before applying swCAM. The removed entries are reconstructed by estimated 𝑻 matrix and compared to 
observed expressions for computing RMSE to decide the optimal parameter 𝜆. 

Specifically, we fix the 𝑨 and 𝑺 at the initialization values (from CAM-estimation or a priori 

knowledge) and randomly remove entries in 𝑿 matrix, leading to the objective function w.r.t 

∆𝑺𝑖 , 𝑖 = 1, … , 𝑀: 

min
{∆𝑺𝑖}𝑖=1

𝑀

1

2
∑‖𝑃Ω𝑖 (𝒙𝑖 ) − 𝑃Ω𝑖 (𝒂𝑖 (�̅� + ∆𝑺𝑖 ))‖2

2
𝑀

𝑖=1

+ 𝜆 ∑‖𝑻𝑘 ‖∗

𝐾

𝑘=1

(14) 

𝑠. 𝑡. �̅� + ∆𝑺𝑖 ≽ 𝟎𝐾×𝐿 , 

𝑻𝑘 = [∆𝑺1
𝑇 (𝑘), … , ∆𝑺𝑀

𝑇 (𝑘)] ∈ ℝ𝐿×𝑀, 𝑘 = 1, … , 𝐾, 

where 𝑃Ω𝑖 (𝒙𝑖) ∈ ℝ
𝐿  denote a vector with the entries in Ω𝑖 left alone, and all other entries set to 

zero. The workflow of our proposed 10-fold cross-validation strategy is:  

(1) Randomly split all entries into 10 folds; 

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315


 13 

(2) Remove one fold of entries and use the remaining 9 folds of entries to solve (14) with 

different 𝜆 values [𝜆1, 𝜆2, …]; 

(3) Use estimated ∆𝑺𝑖 (𝜆𝜃 ), 𝑖 = 1, … , 𝑀,  𝜃 = 1,2, …,  together with fixed 𝑨  and 𝑺  matrix to 

reconstruct 𝑿 matrix and only record the reconstructed values for the removed entries in 𝑿; 

(4) Repeat Step (2)-(3) and obtained a reconstructed �̃�(𝜆𝜃 ) matrix in which all entry values 

are reconstructed when their original values are absent in optimization processes with 𝜆 =

𝜆𝜃.  

(5) Calculate Root Mean Square Error (RMSE) by 

𝑅𝑀𝑆𝐸(𝜆𝜃 ) = √
1

𝑀𝐿
∑ ∑ (𝑿𝑖𝑗 − �̃�𝑖𝑗 (𝜆𝜃 ))

2
𝐿

𝑗=1

𝑀

𝑖=1

(15) 

(6) Choose the 𝜆𝜃 yielding the minimum RMSE. 

Warm start can be used in Step (2) with the decreasing parameter 𝜆1 > 𝜆2 > ⋯, which use the 

estimation with 𝜆𝜃 as the initialization of next optimization with 𝜆𝜃+1. 

The optimization problem (14) can be solved using a similar ADMM algorithm in (5-13) that have 

solved (3). The only modification is that (7) becomes 

𝒖𝑖
𝑞+1

∈ argmin
𝒖𝑖∈ℝ

𝐾𝐿

1

2
‖𝑃Ω𝑖

′ (𝑯𝑖 𝒖𝑖 ) − 𝑃Ω𝑖
′ (𝒗𝑖 )‖

2

2
+

𝛾

2
‖𝑪1𝒖𝑖 + 𝒚𝒊

𝑞
‖

𝐹

2
(16) 

where 𝑃Ω𝑖
′ (∙) = [𝟏𝐾

𝑇 ⨂ 𝑃Ω𝑖 (∙)
𝑇 ]

𝑇
∈ ℝ𝐾𝐿  makes all excluded-entry related variables be optimized 

only by the second term, which is still an unconstrained quadratic problem that can be solved easily. 

The remaining variables unrelated to excluded entries can still be optimized following (8-9). 

Sparsity regularization 

In addition to low-rank assumption, we could also reasonably assume only limited genes are 

involved in functional modules and thus impose a row-sparsity regularization by ℓ2,1 -norm 

minimization. The alternative swCAM formulation will be: 

min
𝑨,{∆𝑺𝑖}𝑖=1

𝑀

1

2
∑‖𝒙𝑖 − 𝒂𝑖 (�̅� + ∆𝑺𝑖 )‖2

2

𝑀

𝑖=1

+ 𝜆 ∑‖𝑻𝑘 ‖∗

𝐾

𝑘=1

+ 𝛿‖𝑻‖2,1 (17) 

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315


 14 

where 𝛿 > 0 is the regularization parameter of ℓ2,1 norm of 𝑻, defined as 

‖𝑻‖2,1 ≜ ∑‖𝒕𝑖 ‖2

𝐾𝐿

𝑖=1

 
accounting for the row-sparsity of 𝑻. If necessary, the parameter 𝛿  actually can be varied for 

different rows based on the character of each gene, such as mean-variance trend. The 

supplementary information gives more details on the optimization of (17) by ADMM method. The 

ℓ1 or ℓ2-norm minimization, as common-used sparsity regularization methods, could impose the 

entry sparsity in 𝑻  matrix. We also provide ADMM optimization for sample-specific 

deconvolution with ℓ1 or ℓ2-norm minimization, which could be useful in other sBSS problems. 

Results 

As swCAM focuses on subtype-specific variation estimation, simulating biological variance 

within each subtype and technical variance for each observation is important for validating 

swCAM performance. We conduct two sets of simulations. The first is in an ideal scenario where 

the variance is not related to mean value. The second is more realistic where genes with larger 

mean usually have larger variance.  

Validation on ideal simulations 

In the first simulations, we design twelve function modules, with four in each of three subtypes. 

The observations for 300 genes in 50 samples were simulated with subtype-specific expression 

baseline, �̅� , sampled from the purified cell populations in real benchmark microarray gene 

expression data GSE19380 (Kuhn, Thu et al. 2011). 𝒂𝒊, 𝑖 = 1, … , 𝑀, are drawn randomly from a 

flat Dirichlet distribution. Between-sample variation, ∆𝑺𝑖 (𝑘, 𝑗), 𝑖 = 1, … , 𝑀, for the kth subtype 

and jth gene was drawn from normal distribution 𝒩(0, 𝜎𝑘𝑗
(𝑠)

) if the jth gene was involved in a 

function module in the kth subtype; otherwise zero (Fig. 4a). The genes in the same function 

module has pairwise correlation coefficient equal to one, thus generating a highly correlated gene 

set in each module. 𝜎𝑘𝑗
(𝑠)

 are drawn from uniform distribution 𝑈[50, 300]. The technical noise, 

𝒏𝑖 , 𝑖 = 1, … , 𝑀, was drawn from zero-mean normal distribution with the variance 𝜎𝑖𝑗
(𝑛)

=10.  

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315


 15 

The twelve functional modules can be recognized in the variation matrix from swCAM when 𝜆 

falls into a certain range (Fig. 4b~4i). Increasing the penalty parameter of the nuclear norm will 

filter more noise but at the cost of the possibility of missing the true variation signal. RMSE derived 

by 10-fold cross-validation strategy is relatively small when 𝜆 = 1~50 and reach the minimum at 

𝜆 = 5 (Fig. 5a). The estimated variation matrix looks quite similar when 1 ≤ 𝜆 ≤ 50 (Fig. 4e~4g), 

with 12 clear patterns and some artifacts. The artifacts are formed when the signal variation in one 

subtype spreads to other subtypes for the same genes, which are much lower than detected true 

signals if 𝜆 is not extremely small. (As shown in the supplementary information, the nuclear norm 

minimization for each subtype’s variation matrix is a good option to reduce artifacts compared to 

other regularization terms.) 

It is interesting to find 𝜆 = 5 is also the point where both primal and dual residuals surge in 

ADMM algorithm (Fig. 5c~5f). It is because larger 𝜆 tends to train an over-simplified model and 

thus approach the optimum solution more easily in ADMM. 

The recovery of sample-specific signals in a subtype is also affected by the mixing proportions of 

this subtype within the sample. When a subtype accounts for a very small portion in a certain 

sample, its true signal in this sample will be very weak and thus underestimated (green points in 

Fig. 6). On the contrary, the major subtype in a sample can be estimated very well by CAM-SS 

(red points in Fig. 6). 

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315


 16 

 
Fig. 4. Heatmap of estimated 𝑇 matrix with varied 𝜆 parameters compared to ground truth in the ideal 
simulation. Increasing the penalty parameter of the nuclear norm will filter more noise but at the cost of the 

possibility of missing true signal variation. 

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315


 17 

 
Fig. 5. 10-fold cross-validation results under different 𝜆 parameter in the ideal simulation. (a) RMSE; (c) 
Residuals for primal feasibility condition; (e) Residuals for dual feasibility condition; (b), (d), (f) are 

zoomed curves of (a), (c), (e). 

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315


 18 

 
Fig. 6. Estimated 𝑻 matrix versus ground truth when 𝜆=5 in the ideal simulation. The mixing proportions 
associated with estimated entries are colored to show the sample-specific expression estimations for high-

proportion subtypes can be estimated more accurately than those for low-proportion ones 

Validation on realistic simulations 

Mean-variance trend is widely existing in molecular expression data. In our second simulation, all 

settings are the same as above except that the variance of subtype-specific expression, 𝜎𝑘𝑗
(𝑠)

, and 

the technical variance of observations, 𝜎𝑖𝑗
(𝑛)

, are proportional to the subtype-specific expression 

mean and mixed expression level, respectively. The coefficient of variation (CV), as the ratio of 

the standard deviation to the mean, is drawn from uniform distribution 𝑈[0.15, 0.3]  and 

𝑈[0.02, 0.05], respectively. 

10-fold cross-validation strategy still obtains the minimum RMSE at 𝜆 = 5 (Fig. 8a~8b) when 

both primal and dual residuals also surge (Fig. 8c~8f). However, the estimated variation matrix by 

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315


 19 

swCAM is blurred by artifacts trained from noise (Fig. 7). Some high-expressed genes have 

relatively large variance, which could be falsely modeled as subtype-specific signal variations. As 

shown in Fig. 9, the entries with zero value in Ground Truth variation matrix could be 

overestimated.  

Though the absolute expression values estimated by swCAM could deviate from Ground Truth, 

we can still clearly detect 12 functional modules defined by the Weighted Gene Correlation 

Network Analysis (WGCNA) (Zhang and Horvath 2005, Langfelder and Horvath 2008) on the 

estimated sample-specific expressions (Fig. 10). WGCNA constructs weighted networks based on 

correlation patterns among genes across samples and thus detects function modules of highly-

correlated gene sets. In Fig. 10, the second and third subtype finds the exact four true modules 

with very few genes are missed. The first subtype detects an extra false module, but it is a less 

significant pattern compared to other modules and can be undetectable with stricter tree height cut 

threshold. More importantly, without swCAM based deconvolution (Fig. 10d), WGCNA on 

mixture expression profiles can find none of the true modules, but three false modules that are 

related to the mixing process of three subtypes. 

Incorporation of L21-norm regularization 

In the above simulations, the deconvoluted sample-specific signals contain artifacts trained from 

signals of other subtypes and artifacts trained from noise (Fig. 4 and Fig. 7). We can use a ℓ2,1-

norm regularization to enforce the sparsity of genes that have signal variation across samples. It is 

supposed to reduce artifacts while it also follows the assumption that genes contributing to source 

variation in hidden modules are limited. Figure 11 shows the alleviated artifacts with 𝜆 = 5 and 

𝛿 = 10, 1, or 0.1. The true function modules are correctly detected with 𝜆 = 5 and 𝛿 = 1 or 0.1, 

where the false module in the first subtype is suppressed when 𝛿 = 1 (Fig. 12). 

Increasing the penalty parameter 𝛿 will force more genes to have zero variance, which suppresses 

the artifacts and false function modules but brings the risk of missing the true signals. It is critical 

to propose a parameter tuning method for 𝛿. However, the cross-validation strategy with randomly 

excluding entries for tuning parameter 𝜆 is based on the low-rank assumption, where the hidden 

low-rank patterns can be trained from a part of entries and then used to reconstruct the remaining 

entries. This strategy is not applicable to 𝛿 selection, which needs further study.  

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315


 20 

 
Fig. 7. Heatmap of estimated 𝑻 matrix scaled by associated means compared to ground truth in the realistic 
simulation with varied 𝜆 parameters. Increasing the penalty parameter of the nuclear norm will filter more 
noise but at the cost of the possibility of missing true variation signal. 

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315


 21 

 
Fig. 8. 10-fold cross-validation results under different 𝜆 parameter in the realistic simulation. (a) RMSE; 
(c) Residuals for primal feasibility condition; (e) Residuals for dual feasibility condition; (b), (d), (f) are 

zoomed curves of (a), (c), (e). 

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315


 22 

 
Fig. 9. Estimated 𝑻 matrix scaled by associated means versus ground truth in the realistic simulation (𝜆=5). 
The mixing proportions associated with estimated entries are colored to show the sample-specific 

expression estimations for high-proportion subtypes can be estimated more accurately than those for low-

proportion ones. 

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315


 23 

 
Fig. 10. Gene co-expressed function modules detected by WGCNA on swCAM estimated sample-specific 

expression for each subtype (a~c) or on originally observed expressions without deconvoluton (d).  

(Network interconnectedness is measured by topological overlap; cutHeight = 0.7; minSize = 8.) 

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315


 24 

 
Fig. 11. Heatmap of estimated T matrix scaled by associated means compared to ground truth in the realistic 

simulation with 𝜆 = 5 and varied 𝛿. Increasing the penalty of L21 norm will enforce more zero columns in 
𝛥𝑆𝑘 matrix. 

 
Fig. 12. Gene co-expressed function modules detected by WGCNA on swCAM estimated sample-

specific expression for each subtype with λ=5 and δ=1 or 0.1.  (Network interconnectedness is 

measured by topological overlap; cutHeight = 0.7; minSize = 8.) 

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315


 25 

Discussion 

Most existing tissue deconvolution methods ignore the expression variability of subtypes across 

individual samples. swCAM will significantly expand the utility of CAM by producing subtype-

specific expression profiles in each sample. The success of swCAM depends on the low-rank 

assumption, which takes advantage of biologically expected cooperation among genes and thus 

sheds light on solving the seemingly underdetermined sample-specific deconvolution problem. 

The low-rank assumption holds naturally in molecule expression data when there exist activated 

functional modules required by particular biological processes or pathways in different subtypes. 

The detection of such subtype-specific associations or networks is one of the major targets in the 

analysis of molecule expression profiles. After our sample-specific deconvolution by swCAM, 

conventional network analysis methods can be applied directly to the estimated sample-subtype-

specific signals to construct subtype-specific networks, e.g. weighted correlation network analysis 

(WGCNA (Zhang and Horvath 2005, Langfelder and Horvath 2008)) and differential dependency 

network analysis (DDN (Zhang, Li et al. 2009, Zhang, Tian et al. 2011, Tian, Zhang et al. 2014, 

Tian, Zhang et al. 2015)). 

The cross-validation strategy of excluding entries randomly is inspired by the similar ideas in 

matrix imputation methods that commonly assume the matrix to be recovered has a low rank. Our 

results consistently show a U-curve over parameter 𝜆, demonstrating the feasibility of the proposed 

cross-validation strategy. Meanwhile, CAM is not sensitive to the choice of 𝜆, as the U-curve has 

a wide platform where the recovered sample-subtype-specific signals are similar and detected 

modules are close.  

It is also reasonable to assume that genes involved in biological associations or networks are sparse. 

Therefore, it deserves our further study to use ℓ2,1-norm regularization for reducing artifacts and 

improving function module detection.  

When group information is available, we can also apply basic CAM algorithm to each group to 

obtain group-wise expression profiles of subtypes. Compared to sample-specific deconvolution, 

group-specific deconvolution aims at a lower resolution of underlying subtype signals and thus 

could obtain more robust results. If grouping is fine enough, group-specific deconvolution can also 

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315


 26 

acquire signal variation in each subtype and thus help detect function modules and construct 

biological networks. 

Though swCAM can solve a seemingly underdetermined problem theoretically based on a low-

rank assumption. It still needs improvement and validations. First, the improvement of swCAM 

by sparsity regularization. The sparsity assumption is practically reasonable, and we already show 

some preliminary results after imposing ℓ2,1 norm regularization. However, introducing one more 

regularization term will increase the difficulty of parameter tuning. Besides, the current cross-

validation strategy with matrix entry sampling is not applicable to selecting the coefficient of ℓ2,1 

norm term. Therefore, the integration of sparsity regularization still needs our further study. 

Second, the improvement of function module detection based on swCAM estimated sample-

specific signals in each subtype. Recovering the exact values of sample-specific signals is 

impossible unless there are more strong assumptions. Luckily, our goal is to detect function 

module or networks from the between-sample variations in each subtype. Thus, increasing the 

accuracy of estimated intercorrelations among molecules can be regarded as our target of further 

efforts. Third, the validation of Validate swCAM in real data analysis. We have demonstrated the 

capacity of swCAM to estimate sample-specific signals in each subtype using simulations where 

the between-sample variation matrices are low-rank. Validation of swCAM in real molecule 

expression data would be difficult, as the benchmark datasets with true subtype-specific signals 

are unavailable. One possible direction is to verify the constructed subtype-specific networks 

through biological experiments. 

Conclusion 

We propose a sample-specific deconvolution algorithm to estimate simple-specific molecule 

expressions for each subtype, from which between-sample variation can be used to detect 

biological associations and construct networks in each subtype. The contributions of this work 

include: We formulate the objective function for swCAM with a penalty term to minimize the 

nuclear norm of between-sample variation matrix in each subtype, based on our expectation on the 

existence of subtype-specific networks. We design an efficient method based on ADMM to solve 

swCAM’s optimization problem in large-scale biological data. We design a 10-fold cross-

validation strategy to select the coefficient of nuclear norm term, and demonstrate its feasibility in 

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315


 27 

simulations where a U-curve of RMSE is obtained to determine the optimal selection. We validate 

swCAM in simulations to demonstrate sample-specific signals can be well estimated when low-

rank assumption holds. Even though artificial signal variances exist in swCAM estimations, the 

intercorrelations among genes can still be well preserved for function module detection and 

biological network construction. We propose to use extra ℓ2,1 norm regularization to enforce the 

sparsity of genes involved in networks and thus reduce the artifacts trained from noise or from 

signals of other subtypes. 

 
ACKNOWLEDGMENTS 

This work has been supported by the National Institutes of Health under Grants HL111362-

05A1, HL133932, NS115658-01, and the Department of Defence under Grant W81XWH-18-1-

0723 (BC171885P1).  

 
COMPETING FINANCIAL INTERESTS 

The authors declare no competing financial interests. 

 
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315


 28 

Reference 

Boyd, S., N. Parikh, E. Chu, B. Peleato and J. Eckstein (2011). "Distributed Optimization and 

Statistical Learning via the Alternating Direction Method of Multipliers." Found. Trends Mach. 

Learn. 3(1): 1-122. 

Buettner, F., K. N. Natarajan, F. P. Casale, V. Proserpio, A. Scialdone, F. J. Theis, S. A. Teichmann, 

J. C. Marioni and O. Stegle (2015). "Computational analysis of cell-to-cell heterogeneity in single-

cell RNA-sequencing data reveals hidden subpopulations of cells." Nat Biotechnol 33(2): 155-160. 

Cai, J.-F., E. J. Candès and Z. Shen (2010). "A Singular Value Thresholding Algorithm for Matrix 

Completion." SIAM Journal on Optimization 20(4): 1956-1982. 

Candes, E. J., C. A. Sing-Long and J. D. Trzasko (2013). "Unbiased Risk Estimates for Singular 

Value Thresholding and Spectral Estimators." Trans. Sig. Proc. 61(19): 4643-4657. 

Chasman, D. and S. Roy (2017). "Inference of cell type specific regulatory networks on 

mammalian lineages." Current Opinion in Systems Biology 2(Supplement C): 130-139. 

Chen, L. (2019). Mathematical Modeling and Deconvolution for Molecular Characterization of 

Tissue Heterogeneity. Ph.D. Doctoral Dissertation, Virginia Polytechnic Institute and State 

University. 

Chen, L., Y. Lu, C.-T. Wu, R. Clarke, G. Yu, J. E. Van Eyk, D. Herrington and Y. Wang (2020). 

"Data-driven detection of subtype-specific differentially expressed genes." Scientific Reports. 

Gal, E., M. London, A. Globerson, S. Ramaswamy, M. W. Reimann, E. Muller, H. Markram and 

I. Segev (2017). "Rich cell-type-specific network topology in neocortical microcircuitry." Nature 

Neuroscience 20: 1004. 

Hastie, T., R. Tibshirani and J. Friedman (2001). The Elements of Statistical Learning. New York, 

NY, USA, Springer New York Inc. 

Junttila, M. R. and F. J. de Sauvage (2013). "Influence of tumour micro-environment heterogeneity 

on therapeutic response." Nature 501: 346. 

Kuhn, A., D. Thu, H. J. Waldvogel, R. L. Faull and R. Luthi-Carter (2011). "Population-specific 

expression analysis (PSEA) reveals molecular changes in diseased brain." Nat Methods 8(11): 

945-947. 

Langfelder, P. and S. Horvath (2008). "WGCNA: an R package for weighted correlation network 

analysis." BMC Bioinformatics 9: 559. 

Parikh, N. and S. Boyd (2014). "Proximal Algorithms." Foundations and Trends® in Optimization 

1(3): 127-239. 

Recht, B., M. Fazel and P. A. Parrilo (2010). "Guaranteed Minimum-Rank Solutions of Linear 

Matrix Equations via Nuclear Norm Minimization." SIAM Review 52(3): 471-501. 

Shen-Orr, S. S., R. Tibshirani, P. Khatri, D. L. Bodian, F. Staedtler, N. M. Perry, T. Hastie, M. M. 

Sarwal, M. M. Davis and A. J. Butte (2010). "Cell type-specific gene expression differences in 

complex tissues." Nat Methods 7(4): 287-289. 

Sonawane, A. R., J. Platig, M. Fagny, C.-Y. Chen, J. N. Paulson, C. M. Lopes-Ramos, D. L. 

DeMeo, J. Quackenbush, K. Glass and M. L. Kuijjer "Understanding Tissue-Specific Gene 

Regulation." Cell Reports 21(4): 1077-1088. 

Thouvenin, P. A., N. Dobigeon and J. Y. Tourneret (2016). "Hyperspectral Unmixing With 

Spectral Variability Using a Perturbed Linear Mixing Model." IEEE Transactions on Signal 

Processing 64(2): 525-538. 

(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315


 29 

Tian, Y., B. Zhang, E. P. Hoffman, R. Clarke, Z. Zhang, I.-M. Shih, J. Xuan, D. M. Herrington 

and Y. Wang (2014). "Knowledge-fused differential dependency network models for detecting 

significant rewiring in biological networks." BMC Systems Biology 8(1): 87. 

Tian, Y., B. Zhang, E. P. Hoffman, R. Clarke, Z. Zhang, M. Shih Ie, J. Xuan, D. M. Herrington 

and Y. Wang (2015). "KDDN: an open-source Cytoscape app for constructing differential 

dependency networks with significant rewiring." Bioinformatics 31(2): 287-289. 

Wang, N., E. P. Hoffman, L. Chen, L. Chen, Z. Zhang, C. Liu, G. Yu, D. M. Herrington, R. Clarke 

and Y. Wang (2016). "Mathematical modelling of transcriptional heterogeneity identifies novel 

markers and subpopulations in complex tissues." Scientific Reports 6: 18909. 

Zhang, B. and S. Horvath (2005). "A general framework for weighted gene co-expression network 

analysis." Stat Appl Genet Mol Biol 4: Article17. 

Zhang, B., H. Li, R. B. Riggins, M. Zhan, J. Xuan, Z. Zhang, E. P. Hoffman, R. Clarke and Y. 

Wang (2009). "Differential dependency network analysis to identify condition-specific topological 

changes in biological networks." Bioinformatics 25(4): 526-532. 

Zhang, B., Y. Tian, L. Jin, H. Li, M. Shih Ie, S. Madhavan, R. Clarke, E. P. Hoffman, J. Xuan, L. 

Hilakivi-Clarke and Y. Wang (2011). "DDN: a caBIG(R) analytical tool for differential network 

analysis." Bioinformatics 27(7): 1036-1038. 

 
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 
The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.04.425315