Sample-wise unsupervised deconvolution of complex tissues 1 Sample-wise unsupervised deconvolution of complex tissues Lulu Chen1, Chia-Hsiang Lin2,*, Chiung-Ting Wu1,*, Robert Clarke3, Guoqiang Yu1, Jennifer E. Van Eyk4, David M. Herrington5, and Yue Wang1,# * Equal contribution 1Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA; 2Department of Electrical Engineering, National Cheng Kung University, Tainan City, Taiwan 70101; 3The Hormel Institute, University of Minnesota, Austin, MN 55912; 4Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA; 5Department of Internal Medicine, Wake Forest University, Winston-Salem, NC 27157, USA Article Format: bioRxiv Running title: Sample-wise unsupervised deconvolution Open source software: R code is available at https://github.com/Lululuella/swCAM #Author for correspondence: Yue Wang, Ph.D. Virginia Polytechnic Institute and State University 900 N. Glebe Road, Arlington, VA 22203 E-mail: yuewang@vt.edu (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint mailto:yuewang@vt.edu https://doi.org/10.1101/2021.01.04.425315 2 Abstract We report a sample-wise fully unsupervised deconvolution method, namely sample-wise Convex Analysis of Mixtures (swCAM), that can estimate constituent proportions and subtype-specific expressions in individual samples using tissue-level bulk data (Chen 2019). The swCAM software tool enables statistically-principled subtype-level downstream analyses, such as detecting subtype- specific differentially expressed genes (sDEG) and differential dependency networks (DDN) (Zhang, Li et al. 2009, Chen, Lu et al. 2020). Significantly different from population-level deconvolution, individual-level deconvolution is mathematically an underdetermined problem because there are more variables than observations. We therefore extend the existing CAM framework by adding an extra term of between-sample variations and formulate swCAM as a nuclear-norm regularized low-rank matrix factorization problem (Wang, Hoffman et al. 2016). We determine hyperparameter value by random entry exclusion based cross-validation scheme and obtain swCAM solution using a modified efficient alternating direction method of multipliers (ADMM). Experimental results on realistic simulation data sets show that swCAM can accurately perform sample-wise unsupervised deconvolution of complex tissues and successfully recover subtype-specific correlation networks that are otherwise unobtainable using existing methods. (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315 3 Introduction Decades of research on molecule regulatory mechanisms have provided a rich framework with which we can extract molecule expression patterns to gain insight into the organization and structure of the large biological networks (Zhang and Horvath 2005, Zhang, Li et al. 2009, Tian, Zhang et al. 2014). However, most discoveries are concluded from measured molecule expressions in heterogeneous tissues, in which the underlying changes of constituent components could obscure molecule regulations and corrupt network inference that only occur in particular tissue subtypes. The inference of subtype-specific molecule expression patterns becomes an essential problem for understanding complex molecule functions and the role of each subtype during the dynamic biological process, such as cell fate specification (Sonawane, Platig et al. , Chasman and Roy 2017, Gal, London et al. 2017). The ability to obtain sample-wise expression variation in each subtype is critical prior to infer subtype-specific molecular networks (Shen-Orr, Tibshirani et al. 2010, Junttila and de Sauvage 2013). Single-cell expression profiling techniques have become popular to investigate cell-type-specific network but may lose critical information of cell-cell interactions and is prone to cell-cycle/state confounders (Buettner, Natarajan et al. 2015, Gal, London et al. 2017). While the current CAM tool can dissect mixed signals of multiple samples into the ‘averaged’ expression profiles of subtypes, many subsequent molecular analyses of complex tissues require sample-specific signal deconvolution where each sample is a mixture of ‘individualized’ subtype- specific expression profiles. Here we propose a new algorithm called swCAM as an extension of CAM to solve sample-specific Blind Source Separation (sBSS) problem. The sBSS problem, because the number of variables is much larger than the number of observations, is ill-posed and underdetermined. As a result, simple yet highly regularized approaches often become the methods of choice (Hastie, Tibshirani et al. 2001). The sBSS problems have received increasing interest in hyperspectral imagery area where the spatially smooth and variation sparsity regularization can be exploited to unmix spectral signals (Thouvenin, Dobigeon et al. 2016). In the context of biological process, transcriptional regulatory networks connect regulatory proteins, such as transcription factors (TFs) and signaling proteins, to target genes and thus form co-expressed gene sets as function modules in each subtype. Based on such underlying cellular mechanisms, we impose and exploit the low-rank assumption on between-sample variations of molecule expressions in each (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315 4 subtype. While estimating subtype-specific signals from a single mixture for each gene independently is an underdetermined problem, the low-rank assumption can aggregate information from gene sets within function modules to help find a biologically plausible solution. General rank minimization is a challenging nonconvex optimization problem for which all existing finite-time algorithms have at least doubly exponential running times in both theory and practice (Recht, Fazel et al. 2010). Minimizing the nuclear norm, or the sum of the singular values of the matrix, over the affine subset, have multiple advantages. The nuclear norm is a convex function and can be optimized efficiently, thus can provide the best convex approximation of the rank function over the unit ball of matrices with norm less than one (Recht, Fazel et al. 2010, Candes, Sing-Long et al. 2013). Nuclear norm regularization has been successfully applied in many practical applications with low-rank modeling, such as image denoising (Candes, Sing-Long et al. 2013) and matrix completion (Cai, Candès et al. 2010). swCAM will adopt nuclear norm regularization to optimize the estimation of between-sample variations in each subtype to recover sample-specific signals for each subtype. This Chapter introduces mathematical modeling of sample-specific deconvolution and optimization solver used in swCAM algorithm, followed by validation in simulations and discussion on further possible improvement by sparsity regularization. Method Problem formulation and nuclear norm regularization A fundamental assumption for the conventional linear mixing model, 𝑿 = 𝑨𝑺 + 𝑬, is that all the mixture samples share a common source matrix 𝑺 . However, each sample may have its ‘individualized’ sample-specific sources as the sampled realizations in additional sample-specific subtype proportions (Fig. 1): 𝑺𝑖 = �̅� + ∆𝑺𝑖 , 𝑖 = 1, … , 𝑀. The associated sample-specific BSS (sBSS) model is given by (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315 5 𝒙𝑖 = 𝒂𝑖 (�̅� + ∆𝑺𝑖 ) + 𝒏𝑖 ∈ ℝ 𝐿 , ∀𝑖 = 1, … , 𝑀 (1) where 𝒂𝑖 is the row vector in proportion matrix 𝑨, and 𝒏𝑖 is noise term. ∆𝑺𝑖 is expected to have small-valued entries so that source matrices among different samples are similar. Fig. 1. Sample-specific deconvolution problem formulation and the assumption of hidden low-rank pattern in each source. (For convenient illustration, 𝑻 matrix in all figures are the transposed version of those in the text and equations.) If some expression units (molecular features) are highly correlated in a particular source, the rank of the matrix consisting of ∆𝑺𝑖 entries in this source will be low, leading to the following swCAM objective function: min 𝑨,{∆𝑺𝑖}𝑖=1 𝑀 1 2 ∑‖𝒙𝑖 − 𝒂𝑖 (�̅� + ∆𝑺𝑖 )‖2 2 𝑀 𝑖=1 + 𝜆 ∑‖𝑻𝑘 ‖∗ 𝐾 𝑘=1 (2) 𝑠. 𝑡. 𝒂𝑖 ≽ 𝟎𝐾 , 𝒂𝑖 𝟏𝐾 = 1, �̅� + ∆𝑺𝑖 ≽ 𝟎𝐾×𝐿 , 𝑻𝑘 = [∆𝑺1 𝑇 (𝑘), … , ∆𝑺𝑀 𝑇 (𝑘)] ∈ ℝ𝐿×𝑀, 𝑘 = 1, … , 𝐾, where 𝑻𝑘 consists of the 𝑘 th column in all ∆𝑺𝑖 , 𝑖 = 1, … , 𝑀 , representing between-sample variation in source 𝑘, namely the variation matrix for the 𝑘th subtype. ‖𝑻𝑘 ‖∗ is the nuclear norm of 𝑻𝑘; and 𝜆 > 0 is the regularization parameter of nuclear norms. The hyperparameter λ actually (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315 6 can be different among different source 𝑘 and thus the regularizer in (2) can be replaced by ∑ 𝜆𝑘 ‖𝑻𝑘 ‖∗ 𝐾 𝑘=1 if necessary. Please see supplementary information for the reason to select subtype- specific regularization terms. Optimization of swCAM objective function The objective function in (2) is bi-convex w.r.t. the two block-wise variables, i.e. 𝑨 ≜ [𝒂1 𝑇 , … , 𝒂𝑀 𝑇 ]𝑇 and 𝑻 ≜ [𝑻1 𝑇 , … , 𝑻𝑀 𝑇 ]𝑇 ∈ ℝ𝐾𝐿×𝑀 . Accordingly, we can solve (2) by alternatively solving the following two convex subproblems until convergence: 𝑻𝑝+1 ∈ argmin ∆𝑺i≽−𝑺,∀𝒊 𝒥(𝑨𝑝, 𝑻) (3) 𝑨𝑝+1 ∈ argmin 𝑨≽0𝑀×𝐾,𝑨𝟏𝐾=𝟏𝑀 𝒥(𝑨, 𝑻𝑝+1) (4) where 𝒥(𝑨, 𝑻) ≜ 1 2 ∑‖𝒙𝑖 − 𝒂𝑖 (�̅� + ∆𝑺i)‖2 2 𝑀 𝑖=1 + 𝜆 ∑‖𝑻𝑘 ‖∗ 𝐾 𝑘=1 CAM-estimated subtype-specific expression matrix serves as the initial reference 𝑺. Note that in (3) (4), we have implicitly used the following relationship for concise representation: 𝑻 ≜ [𝑣𝑒𝑐(Δ𝑺1 𝑇 ), … , 𝑣𝑒𝑐(Δ𝑺𝑀 𝑇 )], where (4) can be decoupled w.r.t each row of 𝑨: 𝒂𝑖 𝑝+1 ∈ argmin 𝒂𝑖≽𝟎𝐾,𝒂𝑖𝟏𝐾=1 1 2 ‖𝒙𝑖 − 𝒂𝑖 (�̅� + ∆𝑺𝑖 𝑝+1 )‖ 2 2 which can be solved using quadratic programming. If a prior proportion matrix or CAM-estimated proportion matrix has already been of high quality, we can skip the alternative optimization on 𝑨 matrix, and obtain 𝑻 matrix by optimizing the subproblem (3) only once. To solve (3), we notice that the main bottleneck is its huge dimension of variables (typically, L is several ten thousand), preventing conventional convex solvers from being readily applicable here. (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315 7 We propose to solve (3) by adapting the alternating direction method of multipliers (ADMM), which has been widely applied to many large-scale problems in areas such as statistical learning, image processing and computational biology (Boyd, Parikh et al. 2011). ADMM naturally allows decoupling the non-smooth regularization term from the smooth loss term, which is computationally advantageous. Specifically, we reformulate (3) in the form that the primal variable can be “split” into several parts, with the associated objective function “separable” across this splitting (Boyd, Parikh et al. 2011). We will use the following definitions: 𝑻 ≜ [𝑣𝑒𝑐(Δ𝑺1 𝑇 ), … , 𝑣𝑒𝑐(Δ𝑺𝑀 𝑇 )] = [ 𝑻1 … 𝑻𝐾 ] ∈ ℝ𝐾𝐿×𝑀 𝑺 ≜ [𝑣𝑒𝑐(𝑺1 𝑇 ), … , 𝑣𝑒𝑐(𝑺𝑀 𝑇 )] ∈ ℝ𝐾𝐿×𝑀 𝑽 ≜ 𝑿𝑇 ∈ ℝ𝐿×𝑀 𝑾 ≜ [ 𝑻 𝑺 ] ∈ ℝ2𝐾𝐿×𝑀 𝑪1 ≜ [ 𝑰𝐾𝐿 𝑰𝐾𝐿 ] ∈ ℝ2𝐾𝐿×𝐾𝐿 𝑪2 ≜ −𝑰2𝐾𝐿 ∈ ℝ 2𝐾𝐿×2𝐾𝐿 𝑪3 ≜ [ 𝟏𝑀 𝑇 ⨂𝑣𝑒𝑐(�̅�𝑇) 𝟎𝐾𝐿×𝑀 ] ∈ ℝ2𝐾𝐿×𝑀 𝑩0 ≜ [𝟎𝐾𝐿×𝐾𝐿 , 𝑰𝐾𝐿 ] ∈ ℝ 𝐾𝐿×2𝐾𝐿 𝑩𝑘 ≜ [𝟎𝐿×(𝑘−1)𝐿 , 𝑰𝐿 , 𝟎𝐿×(𝐾−𝑘)𝐿 , 𝟎𝐿×𝐾𝐿 ] ∈ ℝ 𝐿×2𝐾𝐿 , 𝑘 = 0, … , 𝐾 Then we can simplify (3) as the equivalent form: min 𝑼∈ℝ𝐾𝐿×𝑀,𝑾∈ℝ2𝐾𝐿×𝑀 1 2 ‖𝒜(𝑼) − 𝑽‖𝐹 2 + 𝜆 ∑‖𝑩𝑘 𝑾‖∗ 𝐾 𝑘=1 + 𝐼+(𝑩0𝑾) (5) 𝑠. 𝑡. 𝑪1𝑼 + 𝑪2𝑾 = 𝑪3, (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315 8 where 𝐼+(∙) is the indicator function for the non-negative orthant; 𝐼+(𝑩0𝑾) = 𝐼+(𝑺) = 0 if 𝑺 ≽ 𝟎𝐾𝐿×𝑀 ( 𝐼+(𝑼) = +∞ , otherwise). The linear transformation in the first term is 𝒜(𝑼) = 𝒜([𝒖1, … , 𝒖𝑀]) = [𝑯1𝒖1, … , 𝑯𝑀𝒖𝑀] with 𝑯𝑖 = [𝒂𝑖 𝑝 ⨂𝐼𝐿 ], 𝑖 = 1, … , 𝑀 . Note that (5) has been with the ADMM form w.r.t. the two split block variables 𝑼 and 𝑾, and, as (5) is solved, the solution of (3) can be obtained by 𝑻𝑝+1 = [ 𝑰𝐾𝐿 , 𝟎𝐾𝐿×𝐾𝐿 ]𝑾 ∗. Given a penalty parameter 𝛾 > 0 (empirically, 𝛾 ≔ 1 generally guarantees good convergence speed), the augmented Lagrangian (ignoring some irrelevant terms) of problem (5) is defined by ℒ(𝑼, 𝑾, 𝒁) = 1 2 ‖𝒜(𝑼) − 𝑽‖𝐹 2 + 𝜆 ∑‖𝑩𝑘 𝑾‖∗ 𝐾 𝑘=1 + 𝐼+(𝑩0𝑾) + 𝛾 2 ‖𝑪1𝑼 + 𝑪2𝑾 − 𝑪3 − 𝒁‖𝐹 2 where “−𝛾𝒁”∈ ℝ2𝐾𝐿×𝑀 is the dual variable (or Lagrange multiplier) associated with the constraint 𝑪1𝑼 + 𝑪2𝑾 = 𝑪3. Then, ADMM solves (5) via the following iterative procedure: 𝑼𝑞+1𝜖 argmin 𝑼∈ℝ𝐾𝐿×𝑀 ℒ(𝑼, 𝑾𝑞 , 𝒁𝑞 ) (6𝑎) 𝑾𝑞+1𝜖 argmin 𝑾∈ℝ2𝐾𝐿×𝑀 ℒ(𝑼𝑞+1, 𝑾, 𝒁𝑞 ) (6𝑏) 𝒁𝑞+1 = 𝒁𝑞 − (𝑪1𝑼 𝑞+1 + 𝑪2𝑾 𝑞+1 − 𝑪3) (6𝑐) where 𝑾0 can be initialized by [𝑻0 𝑇 , 𝑼0 𝑇 ]𝑇 with 𝑻0 = 𝟎𝐾𝐿×𝑀 and 𝑼0 = 𝟏𝑀 𝑇 ⨂𝑣𝑒𝑐(�̅�𝑇 ); 𝒁0 can be simply initialized by 𝟎2𝐾𝐿×𝑀. As we will show, both (6a) and (6b) can be solved with closed-form expressions, thanks to the decomposability of ADMM. (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315 9 Fig. 2. The objective function of swCAM for sample-specific deconvolution problem and its reformulation by ADMM. (For convenient illustration, 𝑻 matrix in all figures are the transposed version of those in the text and equations.) Notice that (6a) is a column-wise separable optimization problem, so we can decouple w.r.t each column of 𝑼: 𝒖𝑖 𝑞+1 ∈ argmin 𝒖𝑖∈ℝ 𝐾𝐿 1 2 ‖𝑯𝑖 𝒖𝑖 − 𝒗𝑖 ‖2 2 + 𝛾 2 ‖𝑪1𝒖𝑖 + 𝒚𝒊 𝑞 ‖ 𝐹 2 (7) where [𝒚1 𝑞 , … , 𝒚𝑀 𝑞 ] ≜ 𝑪2𝑾 𝑞 − 𝑪3 − 𝒁 𝑞 . The subproblem (7) is an unconstrained quadratic problem, which can be solved by 𝒖𝑖 𝑞+1 = (𝑯𝑖 𝑇 𝑯𝑖 + 𝛾𝑪1 𝑇 𝑪1) −1(𝑯𝑖 𝑇 𝒗𝑖 − 𝛾𝑪1 𝑇 𝒚𝒊 𝑞 ). (8) The matrix inversion can speed up by (𝑯𝑖 𝑇 𝑯𝑖 + 𝛾𝑪1 𝑇 𝑪1) −1 = ((𝒂𝑖 𝑝 ) 𝑇 𝒂𝑖 𝑝 + 2𝛾𝑰𝐾 ) −1 ⨂𝑰𝐿 . The right term in (8) can also be simplified as (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315 10 𝑯𝑖 𝑇 𝒗𝑖 − 𝛾𝑪1 𝑇 𝒚𝒊 𝑞 = (𝒂𝑖 𝑝 ) 𝑇 ⨂𝒙𝑖 𝑇 − 𝛾 (𝒚 𝒊 𝑞 + 𝒚𝒊 𝑞 ), where 𝒚𝒊 𝑞 = [(𝒚 𝒊 𝑞 ) 𝑇 , (𝒚𝒊 𝑞 ) 𝑇 ] 𝑇 with 𝒚 𝒊 𝑞 ∈ ℝ𝐾𝐿 and 𝒚𝒊 𝑞 ∈ ℝ𝐾𝐿 being the first and second half vector of 𝒚𝒊 𝑞 , respectively. Finally, the column vectors of 𝑼𝑞+1 in (6a) can be computed fast by 𝒖𝑖 𝑞+1 = 𝑣𝑒𝑐 {𝑑𝑒𝑣𝑒𝑐 {(𝒂𝑖 𝑝 ) 𝑇 ⨂𝒙𝑖 𝑇 − 𝛾 (𝒚 𝒊 𝑞 + 𝒚𝒊 𝑞 ) |𝐿, 𝐾} ((𝒂𝑖 𝑝 ) 𝑇 𝒂𝑖 𝑝 + 2𝛾𝑰𝐾 ) −1 } (9) To solve (4.6b), we remove some irrelevant terms from its objective function: min 𝑾∈ℝ2𝐾𝐿×𝑀 𝜆 ∑‖𝑩𝑘 𝑾‖∗ 𝐾 𝑘=1 + 𝐼+(𝑩0𝑾) + 𝛾 2 ‖𝑪1𝑼 𝑞+1 + 𝑪2𝑾 − 𝑪3 − 𝒁 𝑞 ‖𝐹 2 , (10) And then, by defining 𝑼𝑘 𝑞+1 ∈ ℝ𝐿×𝑀, 𝑘 = 1, … , 𝐾 as block matrices from top to bottom in 𝑼𝑞+1 ∈ ℝ𝐾𝐿×𝑀 , 𝒁𝑘 ∈ ℝ 𝐿×𝑀, 𝑘 = 1, … , 𝐾 and 𝒁0 ∈ ℝ 𝐾𝐿×𝑀 as block matrices from top to bottom in 𝒁 ∈ ℝ2𝐾𝐿×𝑀 , respectively (i.e., 𝒁 ≜ [𝒁1 𝑇 , … , 𝒁𝐾 𝑇 , 𝒁0 𝑇 ]𝑇 ), we decouple the objective function (10) as functions of 𝑻𝑘 , 𝑘 = 1, … , 𝐾 and 𝑺: min 𝑾∈ℝ2𝐾𝐿×𝑀 ∑ {𝜆‖𝑻𝑘 ‖∗ + 𝛾 2 ‖𝑼𝑘 𝑞+1 − 𝑻𝑘 − 𝟏𝑀 𝑇 ⨂�̅�𝑘 − 𝒁𝑘 𝑞 ‖ 𝐹 2 } 𝐾 𝑘=1 + {𝐼+(𝑺) + 𝛾 2 ‖𝑼𝑞+1 − 𝑺 − 𝒁0 𝑞 ‖ 𝐹 2 } Therefore, 𝑾𝑞+1 can be solved by the proximal point algorithm (PPA) (Parikh and Boyd 2014). Specifically, we have 𝑾𝑞+1 = [(𝑻1 𝑞+1 ) 𝑇 , … , (𝑻𝐾 𝑞+1 ) 𝑇 , (𝑺𝑞+1)𝑇 ] 𝑇 in which 𝑻𝑘 𝑞+1 ∈ argmin 𝑻∈ℝ𝐾𝐿×𝑀 𝜆‖𝑻𝑘 ‖∗ + 𝛾 2 ‖𝑼𝑘 𝑞+1 − 𝑻𝑘 − 𝟏𝑀 𝑇 ⨂�̅�𝑘 − 𝒁𝑘 𝑞 ‖ 𝐹 2 (11𝑎) 𝑺𝑞+1 ∈ argmin 𝑻∈ℝ𝐾𝐿×𝑀 𝐼+(𝑺) + 𝛾 2 ‖𝑼𝑞+1 − 𝑺 − 𝒁0 𝑞 ‖ 𝐹 2 (11𝑏) (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315 11 Note that (4.11a) and (4.11b) are exactly the proximal operators of ‖𝑻𝑘 ‖∗ and 𝐼+(𝑺), respectively (Parikh and Boyd 2014), and their closed-form solutions are given by 𝑻𝑘 𝑞+1 = ∑ (𝜎𝑘ℓ − 𝜆 𝛾 ) + 𝝁𝑘ℓ𝝂𝑘ℓ 𝑇 𝑟 ℓ=1 , 𝑘 = 1, … , 𝐾, (12) 𝑺𝑞+1 = [𝑼𝑞+1 − 𝒁0 𝑞 ] + , (13) where the singular value decomposition (SVD) of is performed ahead of the computation of (12), i.e. 𝑼𝑘 𝑞+1 − 𝑻𝑘 − 𝟏𝑀 𝑇 ⨂�̅�𝑘 − 𝒁𝑘 𝑞 = ∑ 𝜎𝑘ℓ𝝁𝑘ℓ𝝂𝑘ℓ 𝑇𝑟 ℓ=1 . A reasonable termination criterion is that the primal residual, 𝑝𝑟𝑖 = ‖𝑪1𝑼 + 𝑪2𝑾 − 𝑪3‖2, and dual residual, 𝑑𝑢𝑎𝑙 = ‖𝛾𝑪1 𝑇 𝑪2(𝑾 𝑞+1 − 𝑾𝑞 )‖2, are smaller than a predefined tolerance. Model parameter tuning In noisy scenarios, the penalty parameter 𝜆 setting is critical to determine how much variation is persevered as patterns of interest or ignored as noise. An extremely large 𝜆 will coerce the individual variation to be zero. Decreasing 𝜆 will allow more subtype-specific patterns to be detected until overfitting. Cross-validation is a popular strategy in parameter tuning for the balance of underfitting and overfitting. One round of cross-validation excludes a certain portion of samples and uses the model learned from other samples to predict the excluded ones. Then every model is assessed by summarizing prediction performances across multiple rounds. However, our sample-specific deconvolution estimates the individual expression of each sample in each subtype, which cannot be used to predict the excluded samples directly. Thus, we proposed to randomly exclude entries rather than samples in 𝑿 matrix (Fig. 3), similar to the strategy used in missing value imputation. The foundation of success is that the low-rank patterns in 𝑻𝑘 matrix are detectable by only a portion of 𝑿 entries and able to predict the excluded 𝑿 entries. (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315 12 Fig. 3. 10-fold cross-validation strategy for model parameter tuning. A part of entries is randomly removed before applying swCAM. The removed entries are reconstructed by estimated 𝑻 matrix and compared to observed expressions for computing RMSE to decide the optimal parameter 𝜆. Specifically, we fix the 𝑨 and 𝑺 at the initialization values (from CAM-estimation or a priori knowledge) and randomly remove entries in 𝑿 matrix, leading to the objective function w.r.t ∆𝑺𝑖 , 𝑖 = 1, … , 𝑀: min {∆𝑺𝑖}𝑖=1 𝑀 1 2 ∑‖𝑃Ω𝑖 (𝒙𝑖 ) − 𝑃Ω𝑖 (𝒂𝑖 (�̅� + ∆𝑺𝑖 ))‖2 2 𝑀 𝑖=1 + 𝜆 ∑‖𝑻𝑘 ‖∗ 𝐾 𝑘=1 (14) 𝑠. 𝑡. �̅� + ∆𝑺𝑖 ≽ 𝟎𝐾×𝐿 , 𝑻𝑘 = [∆𝑺1 𝑇 (𝑘), … , ∆𝑺𝑀 𝑇 (𝑘)] ∈ ℝ𝐿×𝑀, 𝑘 = 1, … , 𝐾, where 𝑃Ω𝑖 (𝒙𝑖) ∈ ℝ 𝐿 denote a vector with the entries in Ω𝑖 left alone, and all other entries set to zero. The workflow of our proposed 10-fold cross-validation strategy is: (1) Randomly split all entries into 10 folds; (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315 13 (2) Remove one fold of entries and use the remaining 9 folds of entries to solve (14) with different 𝜆 values [𝜆1, 𝜆2, …]; (3) Use estimated ∆𝑺𝑖 (𝜆𝜃 ), 𝑖 = 1, … , 𝑀, 𝜃 = 1,2, …, together with fixed 𝑨 and 𝑺 matrix to reconstruct 𝑿 matrix and only record the reconstructed values for the removed entries in 𝑿; (4) Repeat Step (2)-(3) and obtained a reconstructed �̃�(𝜆𝜃 ) matrix in which all entry values are reconstructed when their original values are absent in optimization processes with 𝜆 = 𝜆𝜃. (5) Calculate Root Mean Square Error (RMSE) by 𝑅𝑀𝑆𝐸(𝜆𝜃 ) = √ 1 𝑀𝐿 ∑ ∑ (𝑿𝑖𝑗 − �̃�𝑖𝑗 (𝜆𝜃 )) 2 𝐿 𝑗=1 𝑀 𝑖=1 (15) (6) Choose the 𝜆𝜃 yielding the minimum RMSE. Warm start can be used in Step (2) with the decreasing parameter 𝜆1 > 𝜆2 > ⋯, which use the estimation with 𝜆𝜃 as the initialization of next optimization with 𝜆𝜃+1. The optimization problem (14) can be solved using a similar ADMM algorithm in (5-13) that have solved (3). The only modification is that (7) becomes 𝒖𝑖 𝑞+1 ∈ argmin 𝒖𝑖∈ℝ 𝐾𝐿 1 2 ‖𝑃Ω𝑖 ′ (𝑯𝑖 𝒖𝑖 ) − 𝑃Ω𝑖 ′ (𝒗𝑖 )‖ 2 2 + 𝛾 2 ‖𝑪1𝒖𝑖 + 𝒚𝒊 𝑞 ‖ 𝐹 2 (16) where 𝑃Ω𝑖 ′ (∙) = [𝟏𝐾 𝑇 ⨂ 𝑃Ω𝑖 (∙) 𝑇 ] 𝑇 ∈ ℝ𝐾𝐿 makes all excluded-entry related variables be optimized only by the second term, which is still an unconstrained quadratic problem that can be solved easily. The remaining variables unrelated to excluded entries can still be optimized following (8-9). Sparsity regularization In addition to low-rank assumption, we could also reasonably assume only limited genes are involved in functional modules and thus impose a row-sparsity regularization by ℓ2,1 -norm minimization. The alternative swCAM formulation will be: min 𝑨,{∆𝑺𝑖}𝑖=1 𝑀 1 2 ∑‖𝒙𝑖 − 𝒂𝑖 (�̅� + ∆𝑺𝑖 )‖2 2 𝑀 𝑖=1 + 𝜆 ∑‖𝑻𝑘 ‖∗ 𝐾 𝑘=1 + 𝛿‖𝑻‖2,1 (17) (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315 14 where 𝛿 > 0 is the regularization parameter of ℓ2,1 norm of 𝑻, defined as ‖𝑻‖2,1 ≜ ∑‖𝒕𝑖 ‖2 𝐾𝐿 𝑖=1 accounting for the row-sparsity of 𝑻. If necessary, the parameter 𝛿 actually can be varied for different rows based on the character of each gene, such as mean-variance trend. The supplementary information gives more details on the optimization of (17) by ADMM method. The ℓ1 or ℓ2-norm minimization, as common-used sparsity regularization methods, could impose the entry sparsity in 𝑻 matrix. We also provide ADMM optimization for sample-specific deconvolution with ℓ1 or ℓ2-norm minimization, which could be useful in other sBSS problems. Results As swCAM focuses on subtype-specific variation estimation, simulating biological variance within each subtype and technical variance for each observation is important for validating swCAM performance. We conduct two sets of simulations. The first is in an ideal scenario where the variance is not related to mean value. The second is more realistic where genes with larger mean usually have larger variance. Validation on ideal simulations In the first simulations, we design twelve function modules, with four in each of three subtypes. The observations for 300 genes in 50 samples were simulated with subtype-specific expression baseline, �̅� , sampled from the purified cell populations in real benchmark microarray gene expression data GSE19380 (Kuhn, Thu et al. 2011). 𝒂𝒊, 𝑖 = 1, … , 𝑀, are drawn randomly from a flat Dirichlet distribution. Between-sample variation, ∆𝑺𝑖 (𝑘, 𝑗), 𝑖 = 1, … , 𝑀, for the kth subtype and jth gene was drawn from normal distribution 𝒩(0, 𝜎𝑘𝑗 (𝑠) ) if the jth gene was involved in a function module in the kth subtype; otherwise zero (Fig. 4a). The genes in the same function module has pairwise correlation coefficient equal to one, thus generating a highly correlated gene set in each module. 𝜎𝑘𝑗 (𝑠) are drawn from uniform distribution 𝑈[50, 300]. The technical noise, 𝒏𝑖 , 𝑖 = 1, … , 𝑀, was drawn from zero-mean normal distribution with the variance 𝜎𝑖𝑗 (𝑛) =10. (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315 15 The twelve functional modules can be recognized in the variation matrix from swCAM when 𝜆 falls into a certain range (Fig. 4b~4i). Increasing the penalty parameter of the nuclear norm will filter more noise but at the cost of the possibility of missing the true variation signal. RMSE derived by 10-fold cross-validation strategy is relatively small when 𝜆 = 1~50 and reach the minimum at 𝜆 = 5 (Fig. 5a). The estimated variation matrix looks quite similar when 1 ≤ 𝜆 ≤ 50 (Fig. 4e~4g), with 12 clear patterns and some artifacts. The artifacts are formed when the signal variation in one subtype spreads to other subtypes for the same genes, which are much lower than detected true signals if 𝜆 is not extremely small. (As shown in the supplementary information, the nuclear norm minimization for each subtype’s variation matrix is a good option to reduce artifacts compared to other regularization terms.) It is interesting to find 𝜆 = 5 is also the point where both primal and dual residuals surge in ADMM algorithm (Fig. 5c~5f). It is because larger 𝜆 tends to train an over-simplified model and thus approach the optimum solution more easily in ADMM. The recovery of sample-specific signals in a subtype is also affected by the mixing proportions of this subtype within the sample. When a subtype accounts for a very small portion in a certain sample, its true signal in this sample will be very weak and thus underestimated (green points in Fig. 6). On the contrary, the major subtype in a sample can be estimated very well by CAM-SS (red points in Fig. 6). (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315 16 Fig. 4. Heatmap of estimated 𝑇 matrix with varied 𝜆 parameters compared to ground truth in the ideal simulation. Increasing the penalty parameter of the nuclear norm will filter more noise but at the cost of the possibility of missing true signal variation. (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315 17 Fig. 5. 10-fold cross-validation results under different 𝜆 parameter in the ideal simulation. (a) RMSE; (c) Residuals for primal feasibility condition; (e) Residuals for dual feasibility condition; (b), (d), (f) are zoomed curves of (a), (c), (e). (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315 18 Fig. 6. Estimated 𝑻 matrix versus ground truth when 𝜆=5 in the ideal simulation. The mixing proportions associated with estimated entries are colored to show the sample-specific expression estimations for high- proportion subtypes can be estimated more accurately than those for low-proportion ones Validation on realistic simulations Mean-variance trend is widely existing in molecular expression data. In our second simulation, all settings are the same as above except that the variance of subtype-specific expression, 𝜎𝑘𝑗 (𝑠) , and the technical variance of observations, 𝜎𝑖𝑗 (𝑛) , are proportional to the subtype-specific expression mean and mixed expression level, respectively. The coefficient of variation (CV), as the ratio of the standard deviation to the mean, is drawn from uniform distribution 𝑈[0.15, 0.3] and 𝑈[0.02, 0.05], respectively. 10-fold cross-validation strategy still obtains the minimum RMSE at 𝜆 = 5 (Fig. 8a~8b) when both primal and dual residuals also surge (Fig. 8c~8f). However, the estimated variation matrix by (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315 19 swCAM is blurred by artifacts trained from noise (Fig. 7). Some high-expressed genes have relatively large variance, which could be falsely modeled as subtype-specific signal variations. As shown in Fig. 9, the entries with zero value in Ground Truth variation matrix could be overestimated. Though the absolute expression values estimated by swCAM could deviate from Ground Truth, we can still clearly detect 12 functional modules defined by the Weighted Gene Correlation Network Analysis (WGCNA) (Zhang and Horvath 2005, Langfelder and Horvath 2008) on the estimated sample-specific expressions (Fig. 10). WGCNA constructs weighted networks based on correlation patterns among genes across samples and thus detects function modules of highly- correlated gene sets. In Fig. 10, the second and third subtype finds the exact four true modules with very few genes are missed. The first subtype detects an extra false module, but it is a less significant pattern compared to other modules and can be undetectable with stricter tree height cut threshold. More importantly, without swCAM based deconvolution (Fig. 10d), WGCNA on mixture expression profiles can find none of the true modules, but three false modules that are related to the mixing process of three subtypes. Incorporation of L21-norm regularization In the above simulations, the deconvoluted sample-specific signals contain artifacts trained from signals of other subtypes and artifacts trained from noise (Fig. 4 and Fig. 7). We can use a ℓ2,1- norm regularization to enforce the sparsity of genes that have signal variation across samples. It is supposed to reduce artifacts while it also follows the assumption that genes contributing to source variation in hidden modules are limited. Figure 11 shows the alleviated artifacts with 𝜆 = 5 and 𝛿 = 10, 1, or 0.1. The true function modules are correctly detected with 𝜆 = 5 and 𝛿 = 1 or 0.1, where the false module in the first subtype is suppressed when 𝛿 = 1 (Fig. 12). Increasing the penalty parameter 𝛿 will force more genes to have zero variance, which suppresses the artifacts and false function modules but brings the risk of missing the true signals. It is critical to propose a parameter tuning method for 𝛿. However, the cross-validation strategy with randomly excluding entries for tuning parameter 𝜆 is based on the low-rank assumption, where the hidden low-rank patterns can be trained from a part of entries and then used to reconstruct the remaining entries. This strategy is not applicable to 𝛿 selection, which needs further study. (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315 20 Fig. 7. Heatmap of estimated 𝑻 matrix scaled by associated means compared to ground truth in the realistic simulation with varied 𝜆 parameters. Increasing the penalty parameter of the nuclear norm will filter more noise but at the cost of the possibility of missing true variation signal. (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315 21 Fig. 8. 10-fold cross-validation results under different 𝜆 parameter in the realistic simulation. (a) RMSE; (c) Residuals for primal feasibility condition; (e) Residuals for dual feasibility condition; (b), (d), (f) are zoomed curves of (a), (c), (e). (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315 22 Fig. 9. Estimated 𝑻 matrix scaled by associated means versus ground truth in the realistic simulation (𝜆=5). The mixing proportions associated with estimated entries are colored to show the sample-specific expression estimations for high-proportion subtypes can be estimated more accurately than those for low- proportion ones. (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315 23 Fig. 10. Gene co-expressed function modules detected by WGCNA on swCAM estimated sample-specific expression for each subtype (a~c) or on originally observed expressions without deconvoluton (d). (Network interconnectedness is measured by topological overlap; cutHeight = 0.7; minSize = 8.) (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315 24 Fig. 11. Heatmap of estimated T matrix scaled by associated means compared to ground truth in the realistic simulation with 𝜆 = 5 and varied 𝛿. Increasing the penalty of L21 norm will enforce more zero columns in 𝛥𝑆𝑘 matrix. Fig. 12. Gene co-expressed function modules detected by WGCNA on swCAM estimated sample- specific expression for each subtype with λ=5 and δ=1 or 0.1. (Network interconnectedness is measured by topological overlap; cutHeight = 0.7; minSize = 8.) (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315 25 Discussion Most existing tissue deconvolution methods ignore the expression variability of subtypes across individual samples. swCAM will significantly expand the utility of CAM by producing subtype- specific expression profiles in each sample. The success of swCAM depends on the low-rank assumption, which takes advantage of biologically expected cooperation among genes and thus sheds light on solving the seemingly underdetermined sample-specific deconvolution problem. The low-rank assumption holds naturally in molecule expression data when there exist activated functional modules required by particular biological processes or pathways in different subtypes. The detection of such subtype-specific associations or networks is one of the major targets in the analysis of molecule expression profiles. After our sample-specific deconvolution by swCAM, conventional network analysis methods can be applied directly to the estimated sample-subtype- specific signals to construct subtype-specific networks, e.g. weighted correlation network analysis (WGCNA (Zhang and Horvath 2005, Langfelder and Horvath 2008)) and differential dependency network analysis (DDN (Zhang, Li et al. 2009, Zhang, Tian et al. 2011, Tian, Zhang et al. 2014, Tian, Zhang et al. 2015)). The cross-validation strategy of excluding entries randomly is inspired by the similar ideas in matrix imputation methods that commonly assume the matrix to be recovered has a low rank. Our results consistently show a U-curve over parameter 𝜆, demonstrating the feasibility of the proposed cross-validation strategy. Meanwhile, CAM is not sensitive to the choice of 𝜆, as the U-curve has a wide platform where the recovered sample-subtype-specific signals are similar and detected modules are close. It is also reasonable to assume that genes involved in biological associations or networks are sparse. Therefore, it deserves our further study to use ℓ2,1-norm regularization for reducing artifacts and improving function module detection. When group information is available, we can also apply basic CAM algorithm to each group to obtain group-wise expression profiles of subtypes. Compared to sample-specific deconvolution, group-specific deconvolution aims at a lower resolution of underlying subtype signals and thus could obtain more robust results. If grouping is fine enough, group-specific deconvolution can also (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315 26 acquire signal variation in each subtype and thus help detect function modules and construct biological networks. Though swCAM can solve a seemingly underdetermined problem theoretically based on a low- rank assumption. It still needs improvement and validations. First, the improvement of swCAM by sparsity regularization. The sparsity assumption is practically reasonable, and we already show some preliminary results after imposing ℓ2,1 norm regularization. However, introducing one more regularization term will increase the difficulty of parameter tuning. Besides, the current cross- validation strategy with matrix entry sampling is not applicable to selecting the coefficient of ℓ2,1 norm term. Therefore, the integration of sparsity regularization still needs our further study. Second, the improvement of function module detection based on swCAM estimated sample- specific signals in each subtype. Recovering the exact values of sample-specific signals is impossible unless there are more strong assumptions. Luckily, our goal is to detect function module or networks from the between-sample variations in each subtype. Thus, increasing the accuracy of estimated intercorrelations among molecules can be regarded as our target of further efforts. Third, the validation of Validate swCAM in real data analysis. We have demonstrated the capacity of swCAM to estimate sample-specific signals in each subtype using simulations where the between-sample variation matrices are low-rank. Validation of swCAM in real molecule expression data would be difficult, as the benchmark datasets with true subtype-specific signals are unavailable. One possible direction is to verify the constructed subtype-specific networks through biological experiments. Conclusion We propose a sample-specific deconvolution algorithm to estimate simple-specific molecule expressions for each subtype, from which between-sample variation can be used to detect biological associations and construct networks in each subtype. The contributions of this work include: We formulate the objective function for swCAM with a penalty term to minimize the nuclear norm of between-sample variation matrix in each subtype, based on our expectation on the existence of subtype-specific networks. We design an efficient method based on ADMM to solve swCAM’s optimization problem in large-scale biological data. We design a 10-fold cross- validation strategy to select the coefficient of nuclear norm term, and demonstrate its feasibility in (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315 27 simulations where a U-curve of RMSE is obtained to determine the optimal selection. We validate swCAM in simulations to demonstrate sample-specific signals can be well estimated when low- rank assumption holds. Even though artificial signal variances exist in swCAM estimations, the intercorrelations among genes can still be well preserved for function module detection and biological network construction. We propose to use extra ℓ2,1 norm regularization to enforce the sparsity of genes involved in networks and thus reduce the artifacts trained from noise or from signals of other subtypes. ACKNOWLEDGMENTS This work has been supported by the National Institutes of Health under Grants HL111362- 05A1, HL133932, NS115658-01, and the Department of Defence under Grant W81XWH-18-1- 0723 (BC171885P1). COMPETING FINANCIAL INTERESTS The authors declare no competing financial interests. (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315 28 Reference Boyd, S., N. Parikh, E. Chu, B. Peleato and J. Eckstein (2011). "Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers." Found. Trends Mach. Learn. 3(1): 1-122. Buettner, F., K. N. Natarajan, F. P. Casale, V. Proserpio, A. Scialdone, F. J. Theis, S. A. Teichmann, J. C. Marioni and O. Stegle (2015). "Computational analysis of cell-to-cell heterogeneity in single- cell RNA-sequencing data reveals hidden subpopulations of cells." Nat Biotechnol 33(2): 155-160. Cai, J.-F., E. J. Candès and Z. Shen (2010). "A Singular Value Thresholding Algorithm for Matrix Completion." SIAM Journal on Optimization 20(4): 1956-1982. Candes, E. J., C. A. Sing-Long and J. D. Trzasko (2013). "Unbiased Risk Estimates for Singular Value Thresholding and Spectral Estimators." Trans. Sig. Proc. 61(19): 4643-4657. Chasman, D. and S. Roy (2017). "Inference of cell type specific regulatory networks on mammalian lineages." Current Opinion in Systems Biology 2(Supplement C): 130-139. Chen, L. (2019). Mathematical Modeling and Deconvolution for Molecular Characterization of Tissue Heterogeneity. Ph.D. Doctoral Dissertation, Virginia Polytechnic Institute and State University. Chen, L., Y. Lu, C.-T. Wu, R. Clarke, G. Yu, J. E. Van Eyk, D. Herrington and Y. Wang (2020). "Data-driven detection of subtype-specific differentially expressed genes." Scientific Reports. Gal, E., M. London, A. Globerson, S. Ramaswamy, M. W. Reimann, E. Muller, H. Markram and I. Segev (2017). "Rich cell-type-specific network topology in neocortical microcircuitry." Nature Neuroscience 20: 1004. Hastie, T., R. Tibshirani and J. Friedman (2001). The Elements of Statistical Learning. New York, NY, USA, Springer New York Inc. Junttila, M. R. and F. J. de Sauvage (2013). "Influence of tumour micro-environment heterogeneity on therapeutic response." Nature 501: 346. Kuhn, A., D. Thu, H. J. Waldvogel, R. L. Faull and R. Luthi-Carter (2011). "Population-specific expression analysis (PSEA) reveals molecular changes in diseased brain." Nat Methods 8(11): 945-947. Langfelder, P. and S. Horvath (2008). "WGCNA: an R package for weighted correlation network analysis." BMC Bioinformatics 9: 559. Parikh, N. and S. Boyd (2014). "Proximal Algorithms." Foundations and Trends® in Optimization 1(3): 127-239. Recht, B., M. Fazel and P. A. Parrilo (2010). "Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization." SIAM Review 52(3): 471-501. Shen-Orr, S. S., R. Tibshirani, P. Khatri, D. L. Bodian, F. Staedtler, N. M. Perry, T. Hastie, M. M. Sarwal, M. M. Davis and A. J. Butte (2010). "Cell type-specific gene expression differences in complex tissues." Nat Methods 7(4): 287-289. Sonawane, A. R., J. Platig, M. Fagny, C.-Y. Chen, J. N. Paulson, C. M. Lopes-Ramos, D. L. DeMeo, J. Quackenbush, K. Glass and M. L. Kuijjer "Understanding Tissue-Specific Gene Regulation." Cell Reports 21(4): 1077-1088. Thouvenin, P. A., N. Dobigeon and J. Y. Tourneret (2016). "Hyperspectral Unmixing With Spectral Variability Using a Perturbed Linear Mixing Model." IEEE Transactions on Signal Processing 64(2): 525-538. (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315 29 Tian, Y., B. Zhang, E. P. Hoffman, R. Clarke, Z. Zhang, I.-M. Shih, J. Xuan, D. M. Herrington and Y. Wang (2014). "Knowledge-fused differential dependency network models for detecting significant rewiring in biological networks." BMC Systems Biology 8(1): 87. Tian, Y., B. Zhang, E. P. Hoffman, R. Clarke, Z. Zhang, M. Shih Ie, J. Xuan, D. M. Herrington and Y. Wang (2015). "KDDN: an open-source Cytoscape app for constructing differential dependency networks with significant rewiring." Bioinformatics 31(2): 287-289. Wang, N., E. P. Hoffman, L. Chen, L. Chen, Z. Zhang, C. Liu, G. Yu, D. M. Herrington, R. Clarke and Y. Wang (2016). "Mathematical modelling of transcriptional heterogeneity identifies novel markers and subpopulations in complex tissues." Scientific Reports 6: 18909. Zhang, B. and S. Horvath (2005). "A general framework for weighted gene co-expression network analysis." Stat Appl Genet Mol Biol 4: Article17. Zhang, B., H. Li, R. B. Riggins, M. Zhan, J. Xuan, Z. Zhang, E. P. Hoffman, R. Clarke and Y. Wang (2009). "Differential dependency network analysis to identify condition-specific topological changes in biological networks." Bioinformatics 25(4): 526-532. Zhang, B., Y. Tian, L. Jin, H. Li, M. Shih Ie, S. Madhavan, R. Clarke, E. P. Hoffman, J. Xuan, L. Hilakivi-Clarke and Y. Wang (2011). "DDN: a caBIG(R) analytical tool for differential network analysis." Bioinformatics 27(7): 1036-1038. (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted January 5, 2021. ; https://doi.org/10.1101/2021.01.04.425315doi: bioRxiv preprint https://doi.org/10.1101/2021.01.04.425315