VIA: Generalized and scalable trajectory inference in single-cell omics data


/

1 VIA:   Generalized   and   scalable   trajectory   inference   in   single-cell   omics   data  
 

2 Shobana   V.   Stassen 1 ,   Gwinky   G.   K.   Yip 1 ,   Kenneth   K.   Y.   Wong 1,3 ,   Joshua   W.   K.   Ho 2,4    and   Kevin   K.   Tsia 1,3  
3 1 Department   of   Electrical   &   Electronic   Engineering,   The   University   of   Hong   Kong,   Pokfulam   Road,   Hong   Kong  
4 2 School   of   Biomedical   Sciences,   Li   Ka   Shing   Faculty   of   Medicine,   The   University   of   Hong   Kong,   Pokfulam,   Hong   Kong  
5 3 Advanced   Biomedical   Instrumentation   Centre,   Hong   Kong   Science   Park,   Shatin,   New   Territories,   Hong   Kong  
6 4 Laboratory   of   Data   Discovery   for   Health,   Hong   Kong   Science   Park,   Shatin,   New   Territories,   Hong   Kong  

7 Abstract  
8 Inferring  cellular  trajectories  using  a  variety  of  omic  data  is  a  critical  task  in  single-cell  data  science.                  
9 However,  prediction  and  thus  biologically  meaningful  discovery  of  cell  fates  are  challenged  by  the  sheer                

10 size  of  single-cell  data,  diverse  omic  data  types,  and  their  complex  data  topologies.  We  present  VIA,  a                  
11 scalable  trajectory  inference  algorithm  that  uses  lazy-teleporting  random  walks  to  accurately  reconstruct             
12 complex  cellular  trajectories  beyond  tree-like  pathways  (e.g.  cyclic  or  disconnected  structures),  and  to              
13 discover  less  populous  lineages  or  those  otherwise  obscured  in  other  methods.  VIA  outperforms  existing               
14 algorithms  in  recapitulating  cell  fates/lineages,  and  also  mitigates  loss  of  global  connectivity  information              
15 in  large  datasets  beyond  a  million  cells.  Furthermore,  VIA  demonstrates  versatility  by  distilling  cellular               
16 trajectories  in  single-cell  transcriptomic,  epigenomic,  proteomic  and  morphological  data  –  showing  new             
17 promise   in   scalable,   multifaceted   single-cell   analysis   to   explore   novel   biological   processes.  

18 Introduction  
19 Single-cell  omics  data  captures  snapshots  of  cells  that  catalog  cell  types  and  molecular  states  with  high                 
20 precision.  These  high-content  single-cell  readouts  can  be  harnessed  to  model  evolving  cellular             
21 heterogeneity  and  track  dynamical  changes  of  cell  fates  in  tissue,  tumour,  and  cell  population.  However,                
22 current  computational  methods  face  four  critical  challenges.  First,  it  remains  difficult  to  accurately              
23 reconstruct  high-resolution  cell  trajectories  and  detect  cell  fates  embedded  within  them.  Even  the  few               
24 algorithms  which  automate  cell  fate  detection  (e.g.,  SlingShot 1  and  Palantir 2 )  exhibit  low  sensitivity  and               
25 are  highly  susceptible  to  changes  in  input  parameters.  Second,  current  trajectory  inference  (TI)  methods               
26 predominantly  work  well  on  tree-like  trajectories  (e.g.  Slingshot,  Monocle2 3 ),  but  lack  the  generalisability              
27 to  infer  disconnected,  cyclic  or  hybrid  topologies  without  imposing  restrictions  on  transitions  and              
28 causality 4 .  Third,  the  growing  scale  of  single-cell  data,  notably  cell  atlases  of  whole  organisms 6 ,7 ,               
29 embryos 8 ,9  and  human  organs 10 ,  exceeds  the  existing  TI  capacity,  not  just  in  runtime  and  memory,  but  in                  
30 preserving  global  connectivity,  which  is  often  lost  after  extensive  dimension  reduction  or  subsampling.              
31 Fourth,  fueling  the  advance  in  single-cell  technologies  is  the  ongoing  pursuit  to  understand  cellular               
32 heterogeneity  from  a  broader  perspective  beyond  transcriptomics.  However,  the  applicability  of  TI  to  a               
33 broader   spectrum   of   single-cell   data   has   yet   to   be   fully   exploited.  

 
34 To  overcome  these  recurring  challenges,  we  present  VIA,  a  graph-based  TI  algorithm  that  uses  a  new                 
35 strategy  to  compute  pseudotime,  and  reconstruct  cell  lineages  based  on  lazy-teleporting  random  walks              
36 integrated  with  Markov  chain  Monte  Carlo  (MCMC)  refinement.  VIA  relaxes  common  constraints  on              
37 traversing  the  graph  by  allowing  cyclic  and  temporally  reversed  movements,  and  thus  robustly  detects  cell                
38 fates  involving  complex  transitions  that  are  otherwise  obscured  in  other  methods.  VIA  outperforms              
39 popular  TI  algorithms  in  terms  of  capturing  cellular  trajectories  not  limited  to  multi-furcations  and  trees,                
40 but  also  disconnected  and  cyclic  topologies  ( Supplementary  Fig.  S1) .  Compared  to  existing  TI  methods,               

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430705doi: bioRxiv preprint 

https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=s4aw0ujvuwn8
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=14ruyzz6p81
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=i7cha45y61s6
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=pwzzqt446v1
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=2glf5ij7qkb6
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=mtxpwbv8o1fn
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=gsc2l2p4cfhh
https://doi.org/10.1101/2021.02.10.430705
http://creativecommons.org/licenses/by-nc/4.0/


/

41 VIA  is  highly  scalable  with  respect  to  number  of  cells  (10 2  to  >10 6  cells)  and  features,  without  requiring                   
42 extensive  dimensionality  reduction  or  subsampling  which  compromise  global  information.  We           
43 demonstrate  VIA’s  accuracy,  scalability,  topological-generalizability  and  multi-omic  versatility  across          
44 multiple  modalities  by  investigating  10  simulated  and  experimental  datasets  ( Supplementary  Table  S1 ),             
45 ranging  from  single-cell  RNA-sequencing  (scRNA-seq),  single-cell  sequencing  assay  for          
46 transposase-accessible   chromatin   (scATAC-seq),   multi-omics   integration,   to   mass   and   imaging   cytometry.   

 
47 Figure  1.  General  workflow  of  VIA  algorithm. Step  1: Single-cell  level  graph  is  clustered  such  that  each  node                   
48 represents  a  cluster  of  single  cells  (computed  by  our  clustering  algorithm  PARC 11 ).  The  resulting  cluster  graph  forms                  
49 the  basis  for  subsequent  random  walks. Step  2: 2-stage  pseudotime  computation:  (i)  The  pseudotime  (relative  to  a                  
50 user  defined  start  cell)  is  first  computed  by  the  expected  hitting  time  for  a  lazy-teleporting  random  walk  along  an                    
51 undirected  graph.  At  each  step,  the  walk  (with  small  probability)  can  remain  (orange  arrows)  or  teleport  (red  arrows)                   
52 to  any  other  state.  (ii)  Edges  are  then  forward  biased  based  on  the  expected  hitting  time  (See  forward  biased  edges                     
53 illustrated  as  the  imbalance  of  double-arrowhead  size).  The  pseudotime  is  further  refined  on  the  directed  graph  by                  
54 running  Markov  chain  Monte  Carlo  (MCMC)  simulations  (See  3  highlighted  paths  starting  at  root). Step  3: Consensus                  
55 vote  on  terminal  states  based  on  vertex  connectivity  properties  of  the  directed  graph. Step  4 :  lineage  likelihoods                  
56 computed  as  the  visitation  frequency  under  lazy-teleporting  MCMC  simulations. Step  5 :  visualization  that  combines               
57 network  topology  and  single-cell  level  pseudotime/lineage  probability  properties  onto  an  embedding  using  GAMs,  as               
58 well   as   unsupervised   downstream   analysis   (e.g.   gene   expression   trend   along   pseudotime   for   each   lineage).  

  
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430705doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.10.430705
http://creativecommons.org/licenses/by-nc/4.0/


/

59 Results  

60 Algorithm  
61 VIA  first  represents  the  single-cell  data  as  a  cluster  graph  (i.e.  each  node  is  a  cluster  of  single  cells),                    
62 computed  by  our  recently  developed  data-driven  community-detection  algorithm,  PARC,  which  allows            
63 scalable  clustering  whilst  preserving  global  properties  of  the  topology  needed  for  accurate  TI 11  ( Step  1  in                 
64 Fig.  1) .  The  cell  fates  and  their  lineage  pathways  are  then  computed  by  a  two-stage  probabilistic  method,                  
65 which  is  the  key  algorithmic  contribution  of  this  work  ( Step  2  in  Fig.  1 ,  see Methods ).  In  the  first  stage,                     
66 VIA  models  the  cellular  process  as  a  modified  random  walk  that  allows  degrees  of laziness  (remaining  at                  
67 a  node/state)  and teleportation  (jumping  to  any  other  node/state)  with  pre-defined  probabilities.  The              
68 pseudotime,  and  thus  the  graph  directionality,  can  be  computed  based  on  the  theoretical  hitting  times  of                 
69 nodes  (See  the  theory  and  derivation  in Methods  and  Supplementary  Note  2 ).  The  lazy-teleporting               
70 behavior  prevents  the  expected  hitting  time  from  converging  to  a  local  distribution  in  the  graph  as                 
71 otherwise  occurs  in  regular  random  walks,  especially  when  the  sample  size  grows 12 .  More  specifically,  the                
72 laziness  and  teleportation  factors  regulate  the  weights  given  to  each  eigenvector-value  pair  in  the  expected                
73 hitting  time  formulation  such  that  the  stationary  distribution  (given  by  the  local-node  degree-properties  in               
74 regular  walks)  does  not  overwhelm  the  global  information  provided  by  other  ‘eigen-pairs’.  Moreover,  the               
75 computation  does  not  require  subsetting  the  first k eigenvectors  (bypassing  the  need  for  the  user  to  select                  
76 a  suitable  threshold  or  subset  of  eigenvectors)  since  the  dimensionality  is  not  on  the  order  of  number  of                   
77 cells,  but  equal  to  the  number  of  clusters.  Hence  all  eigenvalue-eigenvector  pairs  can  be  incorporated                
78 without  causing  a  bottleneck  in  runtime.  Consequently  in  VIA,  the  modified  walk  on  a  cluster-graph  not                 
79 only  enables  scalable  pseudotime  computation  for  large  datasets  in  terms  of  runtime,  but  also  preserves                
80 information  about  the  global  neighborhood  relationships  within  the  graph.  In  the  second  stage  of  Step  2,                 
81 VIA  infers  the  directionality  of  the graph  by  biasing  the  edge-weights  with  the  initial  pseudotime                
82 computations,  and  refines  the  pseudotime  through  MCMC  simulations.  Next (Step  3  in  Fig . 1),  the                
83 MCMC-refined  graph-edges  of  the  lazy-teleporting  random  walk  enable  accurate  predictions  of  terminal             
84 cell  fates  through  a  consensus  vote  of  various  vertex  connectivity  properties  derived  from  the  directed                
85 graph.  The  cell  fate  predictions  obtained  using  this  approach  are  more  robust  to  changes  in  input  data  and                   
86 parameters  compared  to  other  TI  methods  ( Supplementary  Fig.  S1  and  Fig.  S16) .  Trajectories  towards               
87 identified  terminal  states  are  resolved  using  lazy-teleporting  MCMC  simulations  ( Step  4  in  Fig.  1 ).  The                
88 probabilistic  approach  and  relaxation  of  edge  constraints  allowed  by  VIA  in  computing  differentiation              
89 pathways  and  pseudotime  enables  greater  sensitivity  to  cell  fates  and  complex  trajectories,  and  makes               
90 allowances  for  asynchrony  in  differentiation  processes  by  avoiding  prematurely  imposing  constraints  on             
91 node-to-node  mobility.  Other  methods  resort  to  constraints  such  as  reducing  the  graph  to  a  tree,  imposing                 
92 unidirectionality  by  thresholding  edges  based  on  pseudotime  directionality,  removing  outgoing  edges            
93 from  terminal  states 13 , 2  and  computing  shortest  paths  for  pseudotime 2 ,1 .  VIA’s  probabilistic  approach  to              
94 graph-traversal  allows  it  to  infer  cell  fates  when  the  underlying  data  spans  combinations  of  multifurcating                
95 trees  and  cyclic/disconnected  topologies  -  topologies  and  hence  lineages  often  obscured  in  existing  TI               
96 methods  ( Supplementary  Fig.  S1 ).  Together,  these  four  steps  facilitate  holistic  topological  visualization             
97 of  TI  on  the  single-cell  level  (e.g.  using  UMAP  or  PHATE  embeddings 14 ,15 )  and  other  data-driven                
98 downstream   analyses   such   as   recovering   gene   expression   trends   ( Methods ).   ( Step   5   in   Fig.   1 ).  

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430705doi: bioRxiv preprint 

https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=e9uxoipvwota
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=onqq15xmlfrf
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=jbvdrwuod0wd
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=14ruyzz6p81
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=14ruyzz6p81
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=igxr9liha0sr
https://doi.org/10.1101/2021.02.10.430705
http://creativecommons.org/licenses/by-nc/4.0/


/

99 VIA   accurately   infers   trajectories   in   diverse   scRNA-seq   data  
100 VIA  recapitulates  differentiation  topologies  and  identifies  elusive  cell  fates  across  a  wide  range  of               
101 transcriptomic  data.  We  first  showcase  the  ability  of  VIA  to  explore  large  single-cell  transcriptomic               
102 datasets  by  employing  the  1.3-million-cell  mouse  organogenesis  cell  atlas  (MOCA) 8 .  While  this  dataset  is               
103 inaccessible  to  most  TI  methods  from  a  runtime  and  memory  perspective,  VIA  can  efficiently  resolve  the                 
104 underlying  developmental  heterogeneity,  including  9  major  trajectories  ( Fig.  2a,b )  with  a  runtime  of  ~40               
105 min,  compared  to  the  next  fastest  method  which  has  a  runtime  of  at  least  4  hours 2  ( Supplementary  Table                   
106 S3 ).  VIA  preserves  wider  neighborhood  information  and  reveals  a  globally  connected  topology  of  MOCA               
107 which  is  otherwise  lost  in  the  previous  method.  Broadly  speaking,  the  overall  cluster  graph  of  VIA                 
108 consists  of  three  main  branches  that  concur  with  the  known  developmental  process  at  early               
109 organogenesis. 16  ( Fig.  2a) .  It  starts  from  the  root  stem  which  has  a  high  concentration  of  E9.5  early                  
110 epithelial  cells  made  of  multiple  sub-trajectories  (e.g.  epidermis,  nose  and  foregut/hindgut  epithelial  cells              
111 derived  from  the  ectoderm  and  endoderm).  The  stem  is  connected  to  two  distinct  lineages:  1)                
112 mesenchymal  cells  originated  from  the  mesoderm  which  arises  from  interactions  between  the  ectoderm              
113 and  endoderm 17  and  2)  neural  tube/crest  cells  derived  from  neurulation  when  the  ectoderm  folds  inwards 1 .                
114 The  sparsity  of  early  cells  (only  ~8%  are  E9.5)  and  the  absence  of  earlier  ancestral  cells  make  it                   
115 particularly  challenging  to  capture  the  simultaneous  development  of  trajectories.  However,  the  overall             
116 pseudotime  structure  presented  by  VIA  is  reasonable.  For  instance,  at  the  junction  of  the               
117 epithelial-mesenchymal  branch,  we  find  early  mesenchymal  cells  from  E9.5-E10.5.  Cells  from  later             
118 mesenchymal  developmental  stages  (e.g.  myocytes  from  E12.5-  E13.5)  reside  at  the  leaves  of  branches.               
119 Similarly,  at  the  junction  of  epithelial-neural  tube,  we  find  dorsal  tube  neural  cells  and  notochord  plate                 
120 cells  which  are  predominantly  from  E9.5-E10.5  and  more  developed  neural  cells  at  the  tips  (e.g.                
121 excitatory  and  inhibitory  neurons  from  E12.5-E13.5).  VIA  also  places  the  other  dispersed  groups  of               
122 trajectories  (e.g.  endothelial,  hematopoietic)  in  biologically  relevant  neighborhoods  ( Supplementary          
123 Notes  3,  Supplementary  Fig.  S11 ).  While  VIA’s  connected  topology  offers  a  coarse-grained  holistic              
124 view,  it  does  not  compromise  the  ability  to  delineate  individual  lineage  pathways  (consistent  with  those                
125 found  by  Cao  et  al., 8 )  as  shown  in Fig.  2c  and  Supplementary  Fig.  S11. TI  using  VIA  uniquely                   
126 preserves  both  the  global  and  local  structures  of  the  data  and  is  thus  particularly  favorable  for  biological                  
127 exploration  involving  large  datasets,  especially  for  comparative  studies  involving  cell  atlases 19 .  Whilst             
128 manifold-learning  methods  are  often  used  to  extensively  reduce  dimensionality  to  mitigate  the             
129 computational  burden  of  large  single-cell  datasets,  they  tend  to  incur  loss  of  global  information  and  be                 
130 sensitive  to  input  parameters.  VIA  is  sufficiently  scalable  to  bypass  such  a  step,  and  therefore  retains  a                  
131 higher  degree  of  neighborhood  information  when  mapping  large  datasets.  This  is  in  contrast  to               
132 Monocle3’s 8  UMAP-reduced  inputs  that  reveal  different  disconnected  super-groups  and  fluctuating           
133 connectivity  depending  on  input  parameters  (see Supplementary  Fig.  S12-15 for  the  biologically             
134 consistent  structures  proposed  by  VIA  across  a  range  of  parameters  compared  to  the  contradicting  cell                
135 super   groups   and   connectivity   suggested   by   a   UMAP   based   TI   interpretation ).   

 
136 We  next  demonstrated  the  applicability  of  VIA  in  single-cell  multi-omics  analysis  by  inferring  murine               
137 Isl1+  cardiac  progenitor  cell  (CPC)  transition  states  using  both  single-cell  transcriptomic  and  chromatin              
138 accessibility  information 20  ( Fig.  2d-i ).  VIA  consistently  uncovers  the  bifurcating  lineages  towards  the             

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430705doi: bioRxiv preprint 

https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=mtxpwbv8o1fn
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=14ruyzz6p81
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=s2f3i1x6uus6
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=ic935kiviuls
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=c4gwvsmlicg
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=mtxpwbv8o1fn
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=mtxpwbv8o1fn
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=dt4acr7aal4q
https://doi.org/10.1101/2021.02.10.430705
http://creativecommons.org/licenses/by-nc/4.0/


/

139 endothelial  and  cardiomyocyte  fates  based  on  the  scRNA-seq,  scATAC-seq  datasets  and  their  data              
140 integration ( see  Methods for  data  integration).  Other  methods  such  as  Palantir  and  Slingshot,  that  are  also                 
141 applicable  to  non-transcriptomic  data,  fail  to  uncover  the  two  main  lineages  in  the  individual  as  well  as                  
142 the  more  challenging  integrated  multi-omic  data.  They  typically  only  detect  one  of  the  two  lineages  and                 
143 instead  falsely  detect  several  intermediate  and  early  stages  as  final  cell  fates  ( see  Fig.  2i for  prediction                  
144 accuracy).  PAGA  does  not  offer  automated  cell  fate  prediction  and  is  therefore  not  benchmarked  for  this                 
145 dataset.  VIA  detects  lineage  pathways  in  both  the  scRNA-seq  and  scATAC-seq  that  can  be  used  to                 
146 interpret  relationships  between  transcription  factor  dynamics  and  gene  expression  in  an  unsupervised             
147 manner.  VIA  automatically  generates  a  pseudotemporal  ordering  of  cells  (without  requiring  manual             
148 selection  of  relevant  cells  as  done  in  Jia  et  al. 20 )  along  respective  lineages  and  their  marker-TF  pairs  ( see                   
149 Fig.  2f  and  Supplementary  Fig.  S8e) .  The  highlighted  gene  and  TF  pairs  in  the  cardiac  lineage  show  a                   
150 strong  correlation  between  expression  and  accessibility  of Gata and  Homeobox Hox  genes  which  are               
151 known  to  be  related  to  the  regulation  of  cardiomyocyte  proliferation 23,24,25 .  VIA’s  reliable  performance              
152 against  user-reconfiguration  (choice  of  components,  individual  or  integrated  omic  data)  suggests  it  can  be               
153 used   for   transferable   interpretation   between   scRNA-seq   and   chromatin   accessibility   data.  

 
154 We  further  tested  VIA  on  a  wider  scope  of  (small-to  mid-sized)  scRNA-seq  datasets,  including  B-cell                
155 differentiation 26 ,  hematopoiesis 2 , 27 ,  embryonic  stem  (ES)  cell  differentiation  in  embryoid  bodies 15 ,  and            
156 endocrine  differentiation  (~10 2  -  10 4  cells).  By  comparing  VIA  with  top-performing  and  popular  TI               
157 algorithms,  e.g.  PAGA 28 ,  Palintir,  SlingShot  and  CellRank 13  (See  Methods ,  and Supplementary  Fig.             
158 S1-7  for  full  analysis),  we  showed  that  VIA  consistently  outperforms  other  methods  in  terms  of  both                 
159 runtime  (in  some  cases  by  several  magnitudes see  Supplementary  Table  S3  for  runtime  comparison ),               
160 and  more  robust  and  accurate  lineage  prediction  across  a  wide  range  of  pre-processing  and  algorithmic                
161 parameters.  VIA’s  relaxation  of  graph  traversal  to  permit  cyclic  sub-paths (see  Supplementary  Fig.  S1)               
162 and  movements  that  are  temporally  reversed,  augments  its  sensitivity  to  lineages.  Notably,  VIA  more               
163 consistently  across  a  wide  range  of  input  parameter  choice  identified  less  populous  lineages  that  were  at                 
164 best  detected  by  other  methods  for  a  narrow  sweet  spot  of  parameters.  For  example,  VIA  reliably                 
165 delineates  the  megakaryocyte,  conventional  and  plasmacytoid  dendritic  cell  (cDC  and  pDC)  lineages  in              
166 human  hematopoiesis  ( Fig.  2m-o,  Supplementary  Fig.  S3-4 for  pseudotime  and  graph-topological  gene             
167 trends  for  all  lineages);  and  Delta  cells  (3%)  during  the  endocrine  progenitor  cells  differentiation  ( Fig.                
168 2j-l,  Supplementary  Fig.  S6 for  pseudotime  and  topological  gene  trends  for  all  lineages),  as  evidenced                
169 by  the  corresponding  gene-expression  trend  analysis  and  parameter  stress  tests.  Interestingly,  we  find  that               
170 VIA  often  detects  2  Beta  cell  subpopulations (Supplementary  Fig.  S6b,d,f) that  express  typical  Beta               
171 markers  like Dlk1,  Pdx1 , but  differ  in  their  expression  of Ins1  and Ins2 (Supplementary  Fig.  S6d) .  Such                  
172 a  Beta  cell  heterogeneity 29 ,30 ,  whereby  the  immature  Beta-2  population  strongly  expresses Ins2 ,  and              
173 weakly  expresses Ins1 ,  and  the  mature  Beta-1  population  expresses  both  types  of Ins 30 ,  can  also  be                 
174 reconciled   based   on   the   position   of   the   Beta-2   cluster   on   the   VIA   graph    (Supplementary   Fig.   S6f).  

 
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430705doi: bioRxiv preprint 

https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=dt4acr7aal4q
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=hpu96rs05k0s
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=14ruyzz6p81
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=y5ixemnm3tac
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=9ohfvgceg55
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=s2mv1z2n9zoj
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=jbvdrwuod0wd
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=hn4iy4esq5sd
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=thb0kc2o9554
https://doi.org/10.1101/2021.02.10.430705
http://creativecommons.org/licenses/by-nc/4.0/


/

175

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430705doi: bioRxiv preprint 

https://doi.org/10.1101/2021.02.10.430705
http://creativecommons.org/licenses/by-nc/4.0/


/

176 Figure  2  VIA  accurately  infers  trajectories  in  diverse  scRNA-seq  datasets.  (a)  VIA  cluster-graph  trajectory  where                
177 nodes  are  colored  by  pseudotime,  and  branches  are  shaded  according  to  major  lineages  of  1.3-million-cell  mouse                 
178 organogenesis  cell  atlas  (MOCA).  The  VIA  analysis  (which  is  independent  of  the  choice  of  visualization)  produces  a                  
179 connected  structure  with  linkages  between  some  of  the  major  cell  types  that  have  a  tendency  to  become  segregated                   
180 in  a  UMAP  based  TI  analysis  (see Supplementary  Fig.  S12-15 ).  The  stem  (root)  branch  consists  of  epithelial  cells                   
181 derived  from  ectoderm  and  endoderm,  leading  to  two  main  branches:  1)  the  mesenchymal  and  2)  the  neural  tube  and                    
182 neural  crest.  Other  major  groups  are  placed  in  the  biologically  relevant  neighborhoods,  such  as  the  adjacencies                 
183 between  hepatocyte  and  epithelial  trajectories;  the  neural  crest  (comprising  glial  cells  and  PNS  neurons)  and  the                 
184 neural  tube;  as  well  as  the  links  between  early  mesenchyme  with  both  the  hematopoietic  cells  and  the  endothelial                   
185 cells  ( see  Supplementary  Note  3 ). (b)  (Left)  Single-cell  PHATE  embedding  colored  by  major  cell  groups.  (Right)                 
186 Single-cell  PHATE  embedding  colored  by  VIA  pseudotime. (c)  Lineage  pathways  and  probabilities  of  neuronal,               
187 myocyte  and  WBC  lineages  ( see  Supplementary  Fig.  S6  for  other  lineages ). (d)  scRNA-seq  and  scATAC-seq  data                 
188 of Isl1+  cardiac  progenitors  (CPs)  integrated  using  Seurat3  before  VIA  TI  analysis  and  PHATE  visualization.  Cells  are                  
189 colored  by  annotated  cell-type  and  experimental  modality (e)  Cells  are  colored  by  VIA  pseudotime  with  the                 
190 VIA-inferred  trajectory  towards  endothelial  and  myocyte  lineages  projected  on  top. (f)  Marker  gene  expression  and                
191 chromatin  accessibility  for  gene-TF  pairs  along  pseudotime  axis  for  Cardiomyocyte  lineage (g) VIA-graph  trajectory               
192 with  nodes  colored  by  pseudotime  shows  bifurcation  to  endothelial  and  myocyte  cells  in  scRNA-seq  cells  (h)                 
193 scATAC-seq  of Isl1+  CPs:  VIA-graph  again  shows  bifurcation  after  intermediate  CP  stage.  (i) Lineage  prediction                
194 accuracy  (F1-score)  for  methods  that  offer  automated  lineage  detection  and  are  not  limited  to  transcriptomic  data.  (k)                  
195 Pancreatic  Islets:  Colored  by  VIA  pseudotime  with  detected  terminal  states  shown  in  red  and  annotated  based  on                  
196 known  cell  type  as  Alpha,  Beta-1,  Beta-2,  Delta  and  Epsilon  lineages  where  Beta-2  is Ins1 low Ins2+  Beta  subtype                  
197 ( Supplementary  Fig.  S7 ). (l)  VIA  inferred  cluster-level  pathway  shows  gene  regulation  along  endocrine  progenitor               
198 (EP)  to  Delta  lineage  specification  (top)  and  Sst  gene-expression  trend  shows  rise  of Sst  in  Delta  lineage  ( See                   
199 Supplementary  Fig.  S7  for  remaining ). (m)  Prediction  Accuracy  of  the  4  major  endocrine  cell  types  when  varying                  
200 the  number  of  HVGs  selected  in  pre-processing,  and  the  number  of  PCs. (n)  Human CD34+  hematopoiesis  with  6                   
201 detected  cell  fates  annotated (o)  lineage  pathway  and  gene-pseudotime  trend  shown  for  the CD41  Megakaryocytic                
202 cells  ( see  Supplementary  Fig.  S3  for  other  lineages ). (p) Prediction  accuracy  of  6  cell  fates  when  varying  number                   
203 of  K  (nearest  neighbors)  and  PCs.  Note  Slingshot  on  default  mode  (“V2”)  uses  GMM  clustering  and  “V1”  uses                   
204 K-means  clustering  (allowing  for  over-clustering  K=15,  to  increase  sensitivity).  Runtime  of  each  method  is  also                
205 highlighted   below   the   chart.  

206 VIA   enables   multi-omic   analysis    beyond   transcriptomic   data  
207 Broad  applicability  of  TI  beyond  transcriptomic  analysis  is  increasingly  critical,  but  existing  methods              
208 have  limitations  contending  with  the  disparity  in  the  data  structure  (e.g.  sparsity  and  dimensionality)               
209 across  a  variety  of  single-cell  data  types  and  oftentimes  are  designed  with  a  view  to  only  handling                  
210 transcriptomic   data   (e.g.   methods   using   RNA   velocity   to   infer   directionality).   

 
211 First,  we  employ  VIA  to  analyze  human  scATAC-seq  profiles  (from CD34+  human  bone  marrow)  ( Fig.                
212 3a ),  and  find  that  the  continuous  landscape  of  hematopoiesis  generally  mirrors  the  scRNA-seq  human               
213 hematopoietic  data  ( Fig.  2c ). The  intrinsic  sparsity  of  scATAC-seq  data  poses  a  challenge  that  can  be                 
214 alleviated  by  choice  of  pre-processing  pipelines,  and  we  see  that  VIA  consistently  predicts  the  expected                
215 hierarchy  of  furcations  that  leads  to  the  lymphoid,  myeloid  and  erythroid  lineages  for  two  commonly                
216 accepted  pre-processing  protocols 31 ,27 ( Methods ) .  This  again  holds  across  a  wide  range  of  input              
217 parameters,  as  shown  by  the  changes  in  the  accessibility  of  TF  motifs  associated  with  known  regulators,                 
218 e.g.    Gata1    (erythroid),    Cebpd    (myeloid)    (Fig   3b-d,   Supplementary   Fig.   S7).   

 
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430705doi: bioRxiv preprint 

https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=cpy816x2qcr
https://doi.org/10.1101/2021.02.10.430705
http://creativecommons.org/licenses/by-nc/4.0/


/

219 We  next  investigated  whether  VIA  can  cope  with  a  significant  drop  in  data  dimensionality  (10-100),  as                 
220 often  presented  in  flow/mass  cytometry  data,  and  still  delineate  continuous  biological  processes.  We  run               
221 VIA  on  a  time-series  mass  cytometry  data  (28  antibodies,  90K  cells)  capturing  murine  embryonic  stem                
222 cells  (ESCs)  differentiation  toward  mesoderm  cells  (Day  0  -  Day  11) 32 .  Unlike  previous  analysis 32  of  the                 
223 same  data  which  required  chronological  labels  to  visualize  the  developmental  hierarchy,  we  ran  VIA               
224 without  such  supervised  adjustments  and  accurately  captured  the  sequential  development.  VIA  computed             
225 the  trajectories  with  faster  runtime  (running  in  2  minutes  versus  Slingshot  which  required  6  hours see                 
226 Table  S3 ),  detecting  3  terminal  states  corresponding  to  cells  in  the  final  developmental  stages:  2                
227 corresponding  to  the  main  region  of  Day  10-11  (marked  by Pdgfra , Cd44  and Gata4  expressions),  and  a                  
228 small  population  of  Day  10-11  cells  expressing  EpCAM,  which  are  otherwise  obscured  in  other  methods                
229 (e.g.   Palantir,   Slingshot),   especially   the   small   EpCAM   population   (~0.5%   of   cells)    (Fig   3e-h,   Fig.   S9e,f) .   

 
230 Finally  we  tested  the  adaptability  of  VIA  to  infer  cell-cycle  stages  based  on  label-free  single-cell                
231 biophysical  morphology  (38  features, see  Supplementary  Table  S4  and  Table  S5 )  profiled  by  our               
232 recently  developed  high-throughput  imaging  flow  cytometer,  called  FACED 33 .  VIA  reliably  reconstructed            
233 the  continuous  cell-cycle  progressions  from  G1-S-G2/M  phase  of  two  different  types  of  live  breast  cancer                
234 cells  as  validated  by  the  single-cell  fluorescent  (DNA  dye)  images  captured  by  the  same  system                
235 ( Methods )( Fig.  3i-k for  MCF7 ,  Supplementary  Fig.  S10 for  MDA-MB231 ) .  Intriguingly,  according  to             
236 the  pseudotime  ordered  by  VIA,  not  only  can  it  reveal  the  known  cell  growth  in  size  and  mass 34 ,  and                    
237 general  conservation  of  cell  mass  density 35  (as  derived  from  the  FACED  images  ( Methods ))  throughout               
238 the  G1/S/G2  phases,  but  also  a  slow-down  trend  during  the  G1/S  transition,  consistent  with  the  lower                 
239 protein-accumulation  rate  during  S  phase 36  ( Fig.  3l,  Supplementary  Fig,  S10  f,g ).  The  variation  in               
240 biophysical  textures  (e.g.  phase  entropy)  along  the  VIA  pseudotime  likely  relates  to  known  architectural               
241 changes  of  chromosomes  and  cytoskeletons  during  the  cell  cycles  ( Fig.  3l,  Fig.  S10  f,g ).  These  results                 
242 further  substantiate  the  growing  body  of  work 37 ,38,39,40  on  imaging  biophysical  cytometry  for  gaining  a               
243 mechanistic   understanding   of   biological   systems,   especially   when   combined   with   omics   analysis 41 .   

244 Concluding   Remarks  
245 Overall,  VIA  offers  an  advancement  to  TI  methods  to  study  a  diverse  range  of  single-cell  omic  data,                  
246 including  those  targeted  by  many  cell-atlas  initiatives.  By  combining  lazy-teleporting  random  walks  and              
247 MCMC  simulations,  VIA  relaxes  common  constraints  on  graph  traversal  and  causality.  This  enables              
248 accurate  lineage  prediction  that  is  robust  to  parameter  configuration  for  a  variety  of  complex  topologies                
249 and  rarer  lineages  obscured  in  other  methods.  Our  stress  tests  showed  that  the  modeled  developmental                
250 landscape  in  other  methods  is  vulnerable  to  user  parameter  choice  which  can  incur  fragmentation  or                
251 spurious  linkages,  and  consequently  only  yield  biologically  sensible  lineages  for  a  narrow  sweet  spot  of                
252 parameters  (See  the  summary  in Supplementary  Fig.  S16 ).  For  example,  due  to  algorithmic  measures               
253 taken  to  restrict  permissible  graph-edge  transitions  and  progressively  reduce  the  inherent  dimensionality             
254 (e.g.  PCA  followed  by  subsetting  the  number  of  diffusion  components)  other  algorithms  struggle  to               
255 delineate  obscure  lineages  and  maintain  neighborhood  relationships.  VIA’s  wider  bandwidth  of  accuracy,             
256 superior  runtime  and  preservation  of  global  graph  properties  for  very  large  datasets,  offers  a  unique  and                 
257 well-suited  approach  for  multifaceted  exploratory  analysis  to  uncover  novel  biological  processes,            
258 potentially   those   deviated   from   healthy   trajectories.   

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430705doi: bioRxiv preprint 

https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=29jzsvekzs4z
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=29jzsvekzs4z
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=1ezm5xa09jlh
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=tbqfcii5qr09
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=wlju6tjkbefs
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=qj4499p96ubk
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=z8feu0u2i2yt
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=d11sb02yalb
https://doi.org/10.1101/2021.02.10.430705
http://creativecommons.org/licenses/by-nc/4.0/


/

 
259 Figure  3  VIA  infers  trajectories  in  single-cell  multi-omic  and  image  datasets  (a) Major  lineages  of  human                 
260 hematopoiesis  (profiled  by  scATAC-seq)  projected  onto  the  UMAP  embedding.  Lineages  are  colored  by  FACS  sorted                
261 labels 27 . (b)  VIA  cluster-graph  topology  colored  by  VIA  pseudotime. (c)  Trajectory,  pseudotime  and  detected  terminal                
262 states  (red)  projected  onto  the  UMAP  embedding. (d)  F1-scores  (on  the  k-mer  Z-score  input)  for  terminal  state                  
263 prediction  by  different  TI  methods  (for  a  fixed  KNN  =  20).  Terminal  states  include  megakaryocyte–erythroid  progenitor                 
264 (MEP),  common  lymphoid  progenitor  (CLP),  plasmacytoid  dendritic  cell  (pDC)  and  monocytes  (Mono)  lineages.  The               
265 comparisons  show  that  VIA's  accuracy  remains  high  across  a  wide  range  of  PCs. (e)  Differentiation  of  mESC  to                   
266 mesoderm  cells  measured  by  single-cell  mass  cytometry.  UMAP  embedding  is  colored  by  different  measurement  time                
267 points  (Day  0-11). (f)  VIA  cluster  graph  with  3  detected  terminal  nodes  (red)  and  colored  by  pseudotime. (g)  VIA                    
268 results  projected  onto  single-cell  UMAP  embedding  shows  3  terminal  states  correspond  to  Day  10/11  regions.  (h)                 
269 Correlation  of  inferred  pseudotime  and  day-labels  achieved  by  different  TI  methods.  The  benchmark  was  done  across                 
270 different  numbers  of  KNN  (using  all  28  antibodies).  (i)  Label-free  cell  cycle  progression  tracking  based  on  FACED                  
271 imaging  cytometry.  The  PHATE  embedding  is  constructed  using  38  biophysical/morphological  features  computed             
272 from  images  of  human  breast  cancer  cells  (MCF7)  (See Supplementary  Fig.  S10  for  additional  results  using  another                  
273 breast  cancer  cell  type  (MDA-MB231)).  The  embedding  is  colored  by  the  known  cell  cycle  stages  given  by  the  DNA                    
274 fluorescence  images  (obtained  from  the  same  system). (j)  VIA  graph  topology  colored  by  pseudotime.  (k)  VIA                 
275 trajectory  and  pseudotime  projected  on  embedding. (l)  “Biophysical”  feature  expressions  (Z-score  normalized)  over              
276 pseudotime.   (See    Supplementary   Table   S4-5    for   detailed   definitions   of   the   features).  

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430705doi: bioRxiv preprint 

https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=y5ixemnm3tac
https://doi.org/10.1101/2021.02.10.430705
http://creativecommons.org/licenses/by-nc/4.0/


/

277 Methods  

278 VIA   Algorithm  
279 VIA  applies  a  scalable  probabilistic  method  to  infer  cell  state  dynamics  and  differentiation  hierarchies  by                
280 organizing  cells  into  trajectories  along  a  pseudotime  axis  in  a  nearest-neighbor  graph  which  is  the  basis                 
281 for  subsequent  random  walks.  Single  cells  are  represented  by  graph  nodes  that  are  connected  based  on                 
282 their  feature  similarity,  e.g.  gene  expression,  transcription  factor  accessibility  motif,  protein  expression  or              
283 morphological   features   of   cell   images.   A   typical   routine   in   VIA   mainly   consists   of   four   steps:  

 
284 1. Accelerated  and  scalable  cluster-graph  construction .  VIA  first  represents  the  single-cell  data  in  a              
285 k-nearest-neighbor  (KNN)  graph  where  each  node  is  a  cluster  of  single  cells.  The  clusters  are                
286 computed  by  our  recently  developed  clustering  algorithm,  PARC 11. .  In  brief,  PARC  is  built  on               
287 hierarchical  navigable  small  world  (HNSW 58 )  accelerated  KNN  graph  construction  and  a  fast             
288 community-detection  algorithm  (Leiden  method 42 ),  which  is  further  refined  by  data-driven  pruning.            
289 The  combination  of  these  steps  enables  PARC  to  outperform  other  clustering  algorithms  in              
290 computational  run-time,  scalability  in  data  size  and  dimension  (without  relying  on  subsampling  of              
291 large-scale,  high-dimensional  single-cell  data  (>1  million  cells)),  and  sensitivity  of  rare-cell  detection.             
292 We  employ  the  cluster-level  topology,  instead  of  a  single-cell-level  graph,  for  TI  as  it  provides  a                 
293 coarser  but  clearer  view  of  the  key  linkages  and  pathways  of  the  underlying  cell  dynamics  without                 
294 imposing  constraints  on  the  graph  edges.  Together  with  the  strength  of  PARC  in  clustering  scalability                
295 and  sensitivity,  this  step  critically  allows  VIA  to  faithfully  reveal  complex  topologies  namely  cyclic,               
296 disconnected   and   multifurcating   trajectories   ( Supplementary   Fig.   S1 ).   

 
297 2. Probabilistic  pseudotime  computation .  The  trajectories  are  then  modeled  in  VIA  as  (i)             
298 lazy-teleporting  random  walk  paths  along  which  the  pseudotime  is  computed  and  further  refined  by               
299 (ii)  MCMC  simulations.  The  root  is  a  single  cell  chosen  by  the  user.These  two  sub-steps  are  detailed                  
300 as   follows:  
301 (i) Lazy-teleporting  random  walk :  We  first  compute  the  pseudotime  as  the  expected  hitting  time               
302 of  a  lazy-teleporting random  walk  on  an  undirected  cluster-graph  generated  in  Step  1.  The               
303 lazy-teleporting  nature  of  this  random  walk  ensures  that  as  the  sample  size  grows,  the  expected                
304 hitting  time  of  each  node  does  not  converge  to  the  stationary  probability  given  by  local  node                 
305 properties,  but  instead  continues  to  incorporate  the  wider  global  neighborhood  information 12 .            
306 Here  we  highlight  the  derivation  of  the  closed  form  expression  of  the  hitting  time  of  this  modified                  
307 random   walk   with   a   detailed   derivation   in    Supplementary   Note   2 .  

 
308 The  cluster  graph  constructed  in  VIA  is  mathematically  defined  as  a  weighted  connected  graph G                
309 ( V , E , W )  with  a  vertex  set V  of n  vertices  (or  nodes),  i.e. and  an  edge  set E ,               V =  {v , ,  }1 ⋯ vn       
310 i.e.  a  set  of  ordered  pairs  of  distinct  nodes. W  is  an  weight  matrix  that  describes  a  set  of            n ×n         
311 edge  weights  between  node i  and j ,  are  assigned  to  the  edges .  For  an  undirected        ≥0wij        v ,( i vj)     
312 graph, The  probability  transition  matrix, P,  of  a  standard  random  walk  on  this  wwij =  ji   ×nn             
313 graph    G    can   be   given   by  

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430705doi: bioRxiv preprint 

https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=e9uxoipvwota
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=nlu4dvpyxpr5
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=w4963vuo4u2f
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=onqq15xmlfrf
https://doi.org/10.1101/2021.02.10.430705
http://creativecommons.org/licenses/by-nc/4.0/


/

314 D WP =  −1 (1)  
315 where D  is  the  degree  matrix,  which  is  a  diagonal  matrix  of  the  weighted  sum  of  the  degree      ×nn                
316 of   each   node,   i.e.   the   matrix   elements   are   expressed   as   

 
317 where k  are  the  neighbouring  nodes  connected  to  node i .  Hence,  (which  can  be  reduced  as )            dii       di  
318 is   the   degree   of   node    i .   We   next   consider   a    lazy    random   walk,   defined   as    Z ,   with   probability   
319 ( )   of   being   lazy   (where   0   ),   i.e.   staying   at   the   same   node,   then 1 − x < x < 1  

 
320 xP 1 )IZ =  + ( − x (3)  

 
321 where I  is  the  identity  matrix.  When  teleportation  occurs  with  a  probability  ( ),  the  modified             1 − α    
322 lazy-teleporting   random   walk     Z'    can   be   written   as   follows,   where   is   an     matrix   of   ones.  J  ×nn   

 
323 αZ 1 ) JZ ′ =  + ( − α n

1 (4)  
 

324 Here  we  adapt  the  concept  of  personalized  PageRank  vector,  originally  used  for  recording  (or               
325 ranking )  personal  preferences  of  a  web-surfer  toward  particular  website  pages 43 ,  to rank  the              
326 importance  of  other  nodes  (clusters  of  cells)  to  a  given  node,  depending  on  the  similarities  among                 
327 nodes  (related  to P in  the  graph),  and  the  lazy-teleporting  random  walk  characteristics  in  the                
328 graph  (set  by  probabilities  of  teleporting  and  being  lazy).  Based  on  this  concept,  one  could  model                 
329 the  likelihood  to  transit  from  one  node  (cluster  of  cells)  to  another,  and  thus  construct  the                 
330 pseudotime  based  on  the  hitting  time,  which  is  a  parameter  describing  the  expected  number  of                
331 steps  it  takes  for  a  random  walk  that  starts  at  node i  and  visit  node j  for  the  first  time.  Consider                      
332 the  teleporting  probability  of  ( )  and  a  seed  vector s  specifying  the  initial  probability     1 − α           

333 distribution  across  the n  nodes  (such  that ,  where s m  is  the  probability  of  starting  at        ∑
 

m
sm = 1          

334 node m )  the  personalized  PageRank  vector (which  is  defined  as  a  column  vector)  is  the       prα (s)           
335 unique   solution   to 56  

336 . αpr Z 1 )sprα (s)
T =  α (s)

T + ( − α T (5)  
 

337 Substituting Z  (Eq.  (3))  into  Eq.  (5),  we  can  express  the  personalized  PageRank  vector  in               prα (s)   
338 terms  of  the  inverse  of  the 𝛃 -normalized  Laplacian, of  the  modified  random  walk         Rβ,N L      
339 ( Supplementary   Note   2),   i.e.  
340   , s D R  Dprα (s)

T = β T −0.5 β,N L
0.5 (6)  

341 where ,  and .  and  are  the m th eigenvector  and  β = (2−α)
2(1−α)   Rβ,N L = ∑

 
m=1

Φ Φm
T
m

β+2x(1−β)η[ m]
 Φm   ηm    

   
342 eigenvalue  of  the  normalized  Laplacian.  In  the  expression  of R 𝛃,NL, the  𝛃  and x  regulate  the                  
343 weight  of  contribution  in  each  eigenvalue-eigenvector  pair  of  the  summation  such  that  the  first               
344 eigenvalue-eigenvector  pair  (corresponding  to  the  stationary  distribution  and  given  by  the            
345 local-node  degree-properties)  remains  included  in  the  overall  expression,  but  does  not  overwhelm             
346 the  global  information  provided  by  subsequent  ‘eigen-pairs’.  Moreover,  computation  of R 𝛃,NL  is              
347 not  limited  to  a  subset  of  the  first k eigenvectors  (bypassing  the  need  for  the  user  to  select  a                    
348 suitable  threshold  or  subset  of  eigenvectors)  since  the  dimensionality  is  not  on  the  order  of                

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430705doi: bioRxiv preprint 

https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=uzyf68ddp60p
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=fi93hnl2oym0
https://doi.org/10.1101/2021.02.10.430705
http://creativecommons.org/licenses/by-nc/4.0/


/

349 number  of  cells,  but  equal  to  the  number  of  clusters  and  hence  all  eigenvalue-eigenvector  pairs                
350 can   be   incorporated   without   causing   a   bottleneck   in   runtime.  

 
351 The   expected   hitting   time   from   node    q    to   node    r    is   given   by 44 ,  

 
352 hα (q, )r = dr

pr (e ) (r)[ α r T ] − dq
pr (e ) (q)[ α r T ]  (7)  

 
353 where is  an  indicator  vector  with  1  in  the i th  entry  and  0  elsewhere  (i.e.  if  and  ei               sm = 1   m = i   
354 if ).  We  can  substitute  Eq.  (6)  into  Eq.  (7),  making  use  of  the  fact  that  sm = 0  ≠im                
355 ,  and  is  symmetric,  to  obtain  a  closed  form  expression  of  the  1dr = D e[ −1 r] (r)   R  DD−0.5 β,N L −0.5            
356 hitting   time   in   terms   of   Rβ,N L  

 
357 (e ) D R  D ehα (q, )r = β r − eq

T −0.5
β,N L

−0.5
r (8)  

 
358 (ii) MCMC  simulation :  The  hitting  time  metric  computed  in  Step-1  is  used  to  infer               
359 graph-directionality.  Instead  of  pruning  edges  in  the  ‘reverse’  direction,  edge-weights  are  biased             
360 based   on   the   time   difference   between   nodes   using   the   logistic   function   with   growth   factor   b   =1.   
361 (t) f =  1

1+e −b (t − t )1 0  
  

362 We  then  recompute  the  pseudotimes  on  the  forward  biased  graph:  Since  there  is  no  closed  form                 
363 solution  of  hitting  times  on  a  directed  graph,  we  perform  MCMC  simulations  (parallely  processed               
364 to  enable  fast  simulations  of  1000s  of  teleporting,  lazy  random  walks  starting  at  the  root  node  of                  
365 the  cluster  graph)  and  use  the  first  quartile  of  the  simulated  pseudotime  values  for  a  respective                 
366 node  as  the  refined  pseudotime  for  that  node  relative  to  the  root.  This  refinement  step  ensures  that                  
367 the  pseudotime  is  robust  to  the  spurious  links  (or  conversely,  links  that  are  too  weakly  weighted)                 
368 that  can  distort  calculations  based  purely  on  the  closed  form  solution  of  hitting  times               
369 ( Supplementary  Fig.  S9d ).  By  using  this  2-step  pseudotime  computation, VIA  mitigates  the             
370 issues  of  convergence  issues  and  spurious  edge-weights,  both  of  which  are  common  in              
371 random-walk   pseudotime   computation   on   large   and   complex   datasets 12. .  

 
372 3. Automated  terminal-state  detection.  The  algorithm  then  uses  the  refined  directed  and  weighted             
373 graph  (the  edges  are  re-weighted  using  the  refined  pseudotimes)  to  predict  which  nodes  represent  the                
374 terminal  states  based  on  a  consensus  vote  of  pseudotime  and  multiple  vertex  connectivity  properties,               
375 including  out-degree  (i.e.  the  number  of  edges  directing  out  from  the  node),  closeness C( q ) ,  and                
376 betweenness    B( q ).    

377
C (q) = 1

(q,r)∑
 

q≠r
l

 
378 B (q) = ∑
 

r=q≠t/
σrt

σ (q)rt  

 
379  is  the  distance  between  node q  and  node r  (i.e.  the  sum  of  edges  in  a  shortest  path  connecting l (q, )r                     
380 them). is  the  total  number  of  shortest  paths  from  node r  to  node t .  is  the  number  of  these  σrt              σrt (q)       

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430705doi: bioRxiv preprint 

https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=hy8nvy7h6bta
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=onqq15xmlfrf
https://doi.org/10.1101/2021.02.10.430705
http://creativecommons.org/licenses/by-nc/4.0/


/

381 paths  passing  through  node q .  The  consensus  vote  is  performed  on  nodes  that  score  above  (or  below                  
382 for  out-degree)  the  median  in  terms  of  connectivity  properties.  We  show  on  multiple  simulated  and                
383 real  biological  datasets  that  VIA  more  accurately  predicts  the  terminal  states  across  a  range  of  input                 
384 data  dimensions  and  key  algorithm  parameters  than  other  methods  attempting  the  same             
385 (Supplementary   Fig.   S16).  

 
386 4. Automated  trajectory  reconstruction .  VIA  then  identifies  the  most  likely  path  of  each  lineage  by               
387 computing  the  likelihood  of  a  node  traversing  towards  a  particular  terminal  state  (e.g.  differentiation).               
388 These  lineage  likelihoods  are  computed  as  the  visitation  frequency  under  lazy-teleporting  MCMC             
389 simulations  from  the  root  to  a  particular  terminal  state,  i.e.  the  probability  of node  i reaching                 
390 terminal-state  j as  the  number  of  times cell  i  is  visited  along  a  successful  path  (i.e. terminal-state  j  is                    
391 reached)  divided  by  the  number  of  times cell  i  is  visited  along  all  of  the  simulations.  In  contrast  to                    
392 other  trajectory  reconstruction  methods  which  compute  the  shortest  paths  between  root  and  terminal              
393 node 1 ,2 ,  the  lazy-teleporting  MCMC  simulations  in  VIA  offer  a  probabilistic  view  of  pathways  under               
394 relaxed  conditions  that  are  not  only  restricted  to  the  random-walk  along  a  tree-like  graph,  but  can  also                  
395 be  generalizable  to  other  types  of  topologies,  such  as  cyclic  or  connected/disconnected  paths.  In  the                
396 same  vein,  we  avoid  confining  the  graph  to  an  absorbing  Markov  chain 13,3  (AMC)  as  this  places                 
397 prematurely  strict  /  potentially  inaccurate  constraints  on  node-to-node  mobility  and  can  impede             
398 sensitivity  to  cell  fates  (as  demonstrated  by  VIA’s  superior  cell  fate  detection  across  numerous               
399 datasets   ( Supplementary   Fig.   S16 ).   

400 Downstream   visualization   and   analysis  
401 VIA  generates  a  visualization  that  combines  the  network  topology  and  single-cell  level             
402 pseudotime/lineage  probability  properties  onto  an  embedding  based  on  UMAP  or  PHATE.  Generalized             
403 additive  models  (GAMs)  are  used  to  draw  edges  found  in  the  high-dimensional  graph  onto  the  lower                 
404 dimensional  visualization  ( Fig.  1 ).  An  unsupervised  downstream  analysis  of  cell  features  (e.g.  marker              
405 gene  expression,  protein  expression  or  image  phenotype)  along  pseudotime  for  each  lineage  is  performed               
406 ( Fig.  1 ).  Specifically,  VIA  plots  the  expression  of  features  across  pseudotime  for  each  lineage  by  using                 
407 the  lineage  likelihood  properties  to  weight  the  GAMs.  A  cluster-level  lineage  pathway  is  automatically               
408 produced  by  VIA  to  visualize  feature  heat  maps  at  the  cluster-level  along  a  lineage-path  to  see  the                  
409 regulation  of  genes.  VIA  provides  the  option  of  gene  imputation  before  plotting  the  lineage  specific  gene                 
410 trends.  The  imputation  is  fast  as  it  relies  on  the  single-cell  KNN  (scKNN)  graph  computed  in  Step  1.                   
411 Using  an  affinity-based  imputation  method 45 ,  this  step  computes  a  “diffused”  transition  matrix  on  the               
412 scKNN   graph   used   to   impute   and   denoise   the   original   gene   expressions.   

413 Benchmarked   Methods   
414 The  methods  were  mainly  chosen  based  on  their  superior  performance  in  a  recent  large-scale               
415 benchmarking  study 4 ,  including  a  select  few  recent  methods  claiming  to  supersede  those  in  the  study.                
416 Specifically,  recent  and  popular  methods  exhibiting  reasonable  scalability,  and  automated  cell  fate             
417 prediction  in  multi-lineage  trajectories  were  favoured  as  candidates  for  benchmarking  (See            
418 Supplementary  Table  S1  for  the  key  characteristics  of  methods).  Performance  stress-tests  in  terms  of               
419 lineage  detection  of  each  biological  dataset,  and  pseudotime  correlation  for  time-series  data  were              

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430705doi: bioRxiv preprint 

https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=s4aw0ujvuwn8
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=jbvdrwuod0wd
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=c4deqc1by20h
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=pwzzqt446v1
https://doi.org/10.1101/2021.02.10.430705
http://creativecommons.org/licenses/by-nc/4.0/


/

420 conducted  over  a  range  of  key  input  parameters  (e.g.  numbers  of  k-nearest  neighbors,  highly  variable                
421 genes  (HVGs),  and  principal  components  (PCs))  and  pre-processing  protocols  (see Fig.  2m,p,             
422 Supplementary  Fig.  16 ).  All  comparisons  were  run  on  a  computer  with  an  Intel(R)  Xeon  (R)  W-2123                 
423 central   processing   unit   (3.60GHz,   8   cores)   and   126   GB   RAM.  

 
424 Quantifying  terminal  state  prediction  accuracy  for  parameter  tests  was  done  using  the  F1-score,  defined               
425 as   the   harmonic   mean   of   recall   and   precision   and   calculated   as:  

426 F 1 =  
tp

tp + 0.5(f p+f n)   
 

427 Where tp  is  a  true-positive:  the  identification  of  a  terminal  cluster  that  is  in  fact  a  final  differentiated  cell                    
428 fate; fp  is  a  false  positive  identification  of  a  cluster  as  terminal  when  in  fact  it  represents  an  intermediate                     
429 state;   and    fn    is   a   false   negative   where   a   known   cell   fate   fails   to   be   identified   

 
430 PAGA 28 . It  uses  a  cluster-graph  representation  to  capture  the  underlying  topology.  PAGA  computes  a               
431 unified  pseudotime  by  averaging  the  single-cell  level  diffusion  pseudotime  computed  by  DPT,  but              
432 requires  manual  specification  of  terminal  cell  fates  and  clusters  that  contribute  to  lineages  of  interest  in                 
433 order   to   compare   gene   expression   trends   across   lineages.   

 
434 Palantir 2 . It  uses  diffusion-map 46.  components  to  represent  the  underlying  trajectory.  Pseudotimes  are             
435 computed  as  the  shortest  path  along  a  KNN-graph  constructed  in  a  low-dimensional  diffusion  component               
436 space,  with  edges  weighted  such  that  the  distance  between  nodes  corresponds  to  the  diffusion               
437 pseudotime 47.  (DPT).  Terminal  states  are  identified  as  extrema  of  the  diffusion  maps  that  are  also  outliers                 
438 of  the  stationary  distribution.  The  lineage-likelihood  probabilities  are  computed  using  Absorbing  Markov             
439 Chains   (constructed   by   removing   outgoing   edges   of   terminal   states,   and   thresholding   reverse   edges).   

 
440 Slingshot 1 . It  is  designed  to  process  low-dimensional  embeddings  of  the  single-cell  data.  By  default               
441 Slingshot  runs  clustering  based  on  Gaussian  mixture  modeling  and  recommends  using  the  first  few  PCs  as                 
442 input.  Slingshot  connects  the  clusters  using  a  minimum  spanning  tree  and  then  fits  principle  curves  for                 
443 each  detected  branch.  It  uses  the  orthogonal  projection  against  each  principal  curve  to  fit  a  separate                 
444 pseudotime  for  each  lineage,  and  hence  the  gene  expressions  cannot  be  compared  across  lineages.  Also,                
445 the   runtimes   are   prohibitively   long   for   large   datasets   or   high   input   dimensions.   

 
446 CellRank 13 . This  method  combines  the  information  of  RNA  velocity  (computed  using  scVelo 48. )  and              
447 gene-expression  to  infer  trajectories.  Given  it  is  mainly  suited  for  the  scRNA-seq  data,  with  the                
448 RNA-velocity  computation  limiting  the  overall  runtime  for  larger  dataset,  we  limit  our  comparison  to  the                
449 pancreatic   dataset   which   the   authors   of   CellRank   used   to   highlight   its   performance.   

450 Simulated   Data  
451 We  employed  the  DynToy 4  ( https://github.com/dynverse/dyntoy )  package,  which  generates  synthetic          
452 single-cell  gene  expression  data  (~1000  cells  x  1000  ‘genes’),  to  simulate  different  complex  trajectory               
453 models.  Using  these  datasets,  we  tested  that  VIA  consistently  and  more  accurately  captures  both  tree  and                 
454 non-tree  like  structures  (multifurcating,  cyclic,  and  disconnected)  compared  to  other  methods            

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430705doi: bioRxiv preprint 

https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=s2mv1z2n9zoj
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=14ruyzz6p81
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=6th1gotw3ydi
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=l3wpb1nev23n
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=s4aw0ujvuwn8
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=jbvdrwuod0wd
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=xpkwepv274v
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=pwzzqt446v1
https://github.com/dynverse/dyntoy
https://doi.org/10.1101/2021.02.10.430705
http://creativecommons.org/licenses/by-nc/4.0/


/

455 (Supplementary  Fig.  S1) .  All  methods  are  subject  to  the  same  data  pre-processing  steps,  PCA  dimension                
456 reduction   and   root-cell   to   initialize   the   path.  

 
457 Multifurcating  structure .  This  dataset  consists  of  1000  ‘cells’  multifurcating  into  4  terminal  states.  VIA               
458 robustly  captures  all  four  terminal  cell  fates  across  a  range  of  input  PCs  and  the  pseudotimes  are  well                   
459 inferred  relative  to  the  root  node (Supplementary  Fig.  S1a) .  Note  that  two  terminal  states  (M2  and  M8),                  
460 which  are  very  close  to  each  other,  are  easily  merged  by  the  other  methods  (Slingshot,  Palantir  and                  
461 PAGA).  
462 Cyclic  structure. We  ran  VIA  and  other  methods  for  different  values  of  K  nearest  neighbors.  VIA                 
463 unambiguously  shows  a  cyclic  network  for  a  range  of  K  (in  KNN). Slingshot  does  not  use  a  KNN                   
464 parameter  and  shows  3  fragmented  different  lineages  (top  to  bottom).  PAGA  fails  to  capture  the                
465 connected  cyclic  structure  at  K  =  10  and  5,  while  Palantir  visually  shows  a  linear  (K  =  10,  30)  or                     
466 disconnected  structure  (K  =  5).  Van  den  Berge  et  al 57  note  that  the  challenge  of  cyclic  trajectory                  
467 reconstruction  is  also  common  in  other  popular  methods,  such  as  Monocle3  that  consistently  fragments  or                
468 fits   branching   structures   onto   cyclic   simulated   datasets.   
469 Disconnected  structure. This  dataset  comprises  two  disconnected  trajectories  (T1  and  T2).  T1  is  cyclic               
470 with  an  extra  branch  (M5  to  M6),  T2  has  a  bifurcation  at  M3  ( Supplementary  Fig.  S1c) .  VIA  captures                   
471 the  two  disconnected  structures  as  well  as  the  M6  branch  in  the  cyclic  structure,  and  the  bifurcation  in  the                    
472 smaller  structure.  PAGA  captures  the  underlying  structure  at  PC  =  20  but  becomes  fragmented  for  other                 
473 numbers  of  PCs.  Palantir  also  yields  multiple  fragments  and  is  not  able  to  capture  the  overall  structure,                  
474 while  Slingshot  (using  the  default  clustering  based  on  Gaussian  mixture  modeling)  connects  T1  and  T2,                
475 and   only   captures   one   of   the   bifurcations   in   T1.  

476 Biological   Data  
477 The  pre-processing  steps  described  below  for  each  dataset  are  not  included  in  the  reported  runtimes  as                 
478 these  steps  are  typically  very  fast,  (typically  less  than  1-10%  of  the  total  runtime  depending  on  the                  
479 method.  E.g.  only  a  few  minutes  for  pre-processing  100,000s  of  cells)  and  only  need  to  be  performed                  
480 once  as  they  remain  the  same  for  all  subsequent  analyses.  It  should  also  be  noted  that  visualization  (e.g.                   
481 UMAP,  t-SNE)  are  not  included  in  the  runtimes.  VIA  provides  a  subsampling  option  at  the  visualization                 
482 stage  to  accelerate  this  process  for  large  datasets  without  impacting  the  previous  computational  steps.               
483 However,  to  ensure  fair  comparisons  between  TI  methods  (e.g.  other  methods  do  not  have  an  option  to                  
484 compute  the  embedding  on  a  subsampled  input  and  transfer  the  results  between  the  full  trajectory  and  the                  
485 sampled  visualization,  or  rely  on  a  slow  version  of  tSNE),  we  simply  provide  each  TI  method  with  a                   
486 pre-computed   visualization   embedding   on   which   the   computed   results   are   projected.   

 
487 ScRNA-seq  of  mouse  pre-B  cells. This  dataset 26  models  the  pre-BI  cell  (Hardy  fraction  C’)  process                
488 during  which  cells  progress  to  the  pre-BII  stage  and  B  cell  progenitors  undergo  growth  arrest  and                 
489 differentiation.  Measurements  were  obtained  at  0,  2,  6,  12,  18  and  24  hours  (h)  for  a  total  of  313  cells  x                      
490 9,075  genes.  We  follow  a  standard  Scanpy  preprocessing  recipe 49  that  filters  cells  with  low  counts,  and                 
491 genes  that  occur  in  less  than  3  cells.  The  filtered  cells  are  normalized  by  library  size  and  log  transformed.                    
492 The  top  5000  highly  variable  genes  (HVG)  are  retained.  Cells  are  renormalized  by  library  count  and                 
493 scaled  to  unit  variance  and  zero  mean.  VIA  identifies  the  terminal  state  at  18-24  h  and  accurately                  

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430705doi: bioRxiv preprint 

https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=mfw3e1p8x8hj
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=hpu96rs05k0s
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=s9izyt6361ye
https://doi.org/10.1101/2021.02.10.430705
http://creativecommons.org/licenses/by-nc/4.0/


/

494 recapitulates  the  gene  expression  trends 26  along  inferred  pseudotime  of IgII1 , Slc7a5 , Fox01 , Myc , Ldha               
495 and Lig4 .  ( Supplementary  Fig.  S2a). We  show  the  results  generalize  across  a  range  of  PCs  for  two                  
496 values  of  K  of  the  graph  with  higher  accuracy  in  locating  the  later  cell  fates  than  Slingshot  and  Palantir.                    
497 ( Supplementary   Fig.   S2b).  

 
498 ScRNA-seq  of  human  CD34+  bone  marrow  cells. This  is  a  scRNA-seq  dataset  of  5800  cells                
499 representing  human  hematopoiesis 2. .  We  used  the  filtered,  normalized  and  log-transformed  count  matrix             
500 provided  by  Setty  et  al 2. .,  with  PCA  performed  on  all  the  remaining  genes.  The  cells  were  annotated  using                   
501 SingleR 50.  which  automatically  labeled  cells  based  on  the  hematopoietic  reference  dataset  Novershtern             
502 Hematopoietic  Cell  Data  - GSE24759 51. .  The  annotations  are  in  agreement  with  the  labels  inferred  by                
503 Setty  et  al.  for  the  7  clusters,  including  the  root  HSCs  cluster  that  differentiates  into  6  different  lineages:                   
504 monocytes,  erythrocytes,  and  B  cells,  as  well  as  the  less  populous  megakaryocytes,  cDCs  and  pDCs.  VIA                 
505 consistently  identifies  these  lineages  across  a  wider  range  of  input  parameters  and  data  dimensions  (e.g.                
506 the  number  of  K  and  PCs  provided  as  input  to  the  algorithms  see Fig.  2p,  and  Supplementary  Fig.  S3c ).                    
507 Notably,  the  upregulated  gene  expression  trends  of  the  small  populations  can  be  recovered  in  VIA,  i.e.                 
508 pDC  and  cDC  show  elevated  CD123  and  CSF1R  levels  relative  to  other  lineages,  and  the  upregulated                 
509 CD41   expression   in   megakaryocytes   ( Supplementary   Fig.   S3-S4) .   

 
510 ScRNA-seq  of  human  embryoid  body. This  is  a  midsized  scRNA-seq  dataset  of  16,825  human  cells  in                 
511 embryoid  bodies  (EBs) 15 .  We  followed  the  same  pre-processing  steps  as  Moon  et  al.  to  filter  out  dead                  
512 cells  and  those  with  too  high  or  low  library  count.  Cells  are  normalized  by  library  count  followed  by                   
513 square  root  transform.  Finally  the  transformed  counts  are  scaled  to  unit  variance  and  zero  mean.  The                 
514 filtered  data  contained  16825  cells  ×  17580  genes.  PCA  is  performed  on  the  processed  data  before                 
515 running  each  TI  method.  VIA  identifies  6  cell  fates,  which,  based  on  the  upregulation  of  marker  genes  as                   
516 cells  proceed  towards  respective  lineages,  are  in  accord  with  the  annotations  given  by  Moon  et  al.,  (See                  
517 the  gene  heatmap  and  changes  in  gene  expression  along  respective  lineage  trajectories  in Supplementary               
518 Fig.  S5).  Note  that  Palantir  and  Slingshot  do  not  capture  the  cardiac  cell  fate,  and  Slingshot  also  misses                    
519 the   neural   crest    ( see   the   F1-scores   summary   for   terminal   state   detection    Supplementary   Fig.   S5).  

 
520 ScRNA-seq  of  mouse  organogenesis  cell  atla s . This  is  a  large  and  complex  scRNA-seq  dataset  of  mouse                 
521 organogenesis  cell  atlas  (MOCA)  consisting  of  1.3  million  cells 6. .  The  dataset  contains  cells  from  61                
522 embryos  spanning  5  developmental  stages  from  early  organogenesis  (E9.5-E10.5)  to  organogenesis            
523 (E13.5).  Of  the  2  million  cells  profiled,  1.3  million  are  ‘high-quality’  cells  that  are  analysed  by  VIA.  The                   
524 runtime  is  approximately  40  minutes  which  is  in  stark  contrast  to  the  next  fastest  tool  Palantir  which  takes                   
525 4  hours  (excluding  visualization).  The  authors  of  MOCA  manually  annotated  38  cell-types  based  on  the                
526 differentially  expressed  genes  of  the  clusters.  In  general,  each  cell  type  exclusively  falls  under  one  of  10                  
527 major  and  disjoint  trajectories  inferred  by  applying  Monocle3  to  the  UMAP  of  MOCA.  The  authors                
528 attributed  the  disconnected  nature  of  the  10  trajectories  to  the  paucity  of  earlier  stage  common                
529 predecessor  cells.  We  followed  the  same  steps  as  Cao  et  al. 6  to  retain  high-quality  cells  (i.e.  remove  cells                   
530 with  less  than  400  mRNA,  and  remove  doublet  cells  and  cells  from  doubled  derived  sub-clusters).  PCA                 
531 was  applied  to  the  top  2000  HVGs  with  the  top  30  PCs  selected  for  analysis.  VIA  analyzed  the  data  in  the                      
532 high-dimensional  PC  space.  We  bypass  the  step  in  Monocle3 6  which  applies  UMAP  on  the  PCs  prior  to                  
533 TI  as  this  incurs  an  additional  bias  from  choice  of  manifold-learning  parameters  and  a  further  loss  in                  

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430705doi: bioRxiv preprint 

https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=hpu96rs05k0s
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=14ruyzz6p81
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=14ruyzz6p81
https://bioconductor.org/packages/3.11/SingleR
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=gqvc4cw37qlq
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE24759
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=8w1hbpz1mcwd
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=9ohfvgceg55
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=2glf5ij7qkb6
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=2glf5ij7qkb6
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=2glf5ij7qkb6
https://doi.org/10.1101/2021.02.10.430705
http://creativecommons.org/licenses/by-nc/4.0/


/

534 neighborhood  information.  As  a  result,  VIA  produces  a  more  connected  structure  with  linkages  between               
535 some  of  the  major  cell  types  that  become  segregated  in  UMAP  (and  hence  Monocle3),  and  favors  a                  
536 biologically  relevant  interpretation  ( Fig.  2,  Supplementary  Fig.  S11 ).  A  detailed  explanation  of  these              
537 connections  (graph-edges)  extending  between  certain  major  groups  using  references  to  literature  on             
538 organogenesis   is   presented   in    Supplementary   Note   3.  

 
539 ScRNA-seq  of  murine  endocrine  development 5 . This  is  an  scRNA-seq  dataset  of  E15.5  murine              
540 pancreatic  cells  spanning  all  developmental  stages  from  an  initial  endocrine  progenitor-precursor  (EP)             
541 state  (low  level  of Ngn3  ,  or Ngn3 low ),  to  the  intermediate  EP  (high  level  of Ngn3  ,  or Ngn3 high )  and  Fev +                    
542 states,  to  the  terminal  states  of  hormone-producing  alpha,  beta,  epsilon  and  delta  cells5. 5.  Following  steps                
543 by  Lange  et  al 13. ,  we  preprocessed  the  data  using  scVelo  to  filter  genes,  normalize  each  cell  by  total  counts                    
544 over  all  genes,  keep  the  top  most  variable  genes,  and  take  the  log-transform.  PCA  was  applied  to  the                   
545 processed  gene  matrix.  We  assessed  the  performance  of  VIA  and  other  TI  methods  (CellRank,  Palantir,                
546 Slingshot)   across   a   range   of   number   of   retained   HVGs   and   input   PCs   ( Fig.   2m ,    Supplementary   Fig.   S6) .   

 
547 ScATAC-seq  of  human  bone  marrow  cells. This  scATAC-seq  data  profiles  3072  cells  isolated  from               
548 human  bone  marrow  using  fluorescence  activated  cell  sorting  (FACS),  yielding  9  populations 27 : HSC,              
549 MPP,  CMP,  CLP,  LMPP,  GMP,  MEP,  mono  and  plasmacytoid  DCs  ( Fig.  3a  and  Supplementary  Fig.                
550 S7 ).  We  examined  TI  results  for  two  different  preprocessing  pipelines  to  gauge  how  robust  VIA  is  on  the                   
551 scATAC-seq  analysis  which  is  known  to  be  challenging  for  its  extreme  intrinsic  sparsity.  We  used  the                 
552 pre-processed  data  consisting  of  PCA  applied  to  the  z-scores  of  the  transcription  factor  (TF)  motifs  used                 
553 by  Buenrostro  et  a 27. .  Their  approach  corrects  for  batch  effects  in  select  populations  and  weighting  of  PCs                  
554 based  on  reference  populations  and  hence  involves  manual  curation.  We  also  employed  a  more  general                
555 approach  used  by Chen  et  al. 31.  which  employs  ChromVAR  to  compute  k-mer  accessibility  z-scores  across                
556 cells.  VIA  infers  the  correct  trajectories  and  the  terminal  cell  fates  for  both  of  these  inputs,  again  across  a                    
557 wide   range   of   input   parameters   ( Fig.   3d   and   Supplementary   Fig.   S7 ).   

 
558 ScRNA-seq  and  scATAC-seq  of Isl1+  cardiac  progenitor  cells. This  time-series  dataset  captures             
559 murine Isl1+  cardiac  progenitor  cells  (CPCs)  from  E7.5  to  E9.5  characterized  by  scRNA-seq  (197  cells)                
560 and  scATAC-seq  (695  cells) 20. .  The Isl1+ CPCs  are  known  to  undergo  multipotent  differentiation  to               
561 cardiomyocytes  or  endothelial  cells.  For  the  scRNA-seq  data,  the  quality  filtered  genes  and  the  size-factor                
562 normalized  expression  values  are  provided  by  Jia  et  al. 20  as  a  “Single  Cell  Expression  Set”  object  in  R.                   
563 Similarly,  the  cells  in  the  scATAC-seq  experiment  were  provided  in  a  “SingleCellExperiment”  object  with               
564 low  quality  cells  excluded  from  further  analysis.  The  accessibility  of  peaks  was  transformed  to  a  binary                 
565 representation  as  input  for  TF-IDF  (term  frequency-inverse  document  frequency)  weighting  prior  to             
566 singular  value  decomposition  (SVD).  The  highlighted  TF  motifs  in  the  heatmap  ( Fig.  2j )  correspond  to                
567 those  highlighted  by  Jia  et  al.  We  tested  the  performance  when  varying  the  number  of  SVDs  used.  We                   
568 also  considered  the  outcome  when  merging  the  scATAC-seq  and  scRNA-seq  data  using  Seurat3 52. .              
569 Despite  the  relatively  low  cell  count  of  both  datasets,  and  the  relatively  under-represented  scRNA-seq  cell                
570 count,  the  two  datasets  overlapped  reasonably  well  and  allowed  us  to  infer  the  expected  lineages  in  an                  
571 unsupervised  manner  ( Fig.  2d  and  Supplementary  Fig.  S8 .  In  contrast,  Jia  et  al.,  performed  a  supervised                 
572 TI  by  manually  selecting  cells  relevant  to  the  different  lineages  (for  the  scATAC-seq  cells)  and  choosing                 
573 the   two   diffusion   components   that   best   characterize   the   developmental   trajectories   in   low   dimension 20 .  

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430705doi: bioRxiv preprint 

https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=4l8bltp8p9u2
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=4l8bltp8p9u2
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=jbvdrwuod0wd
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=y5ixemnm3tac
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=y5ixemnm3tac
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=cpy816x2qcr
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=dt4acr7aal4q
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=dt4acr7aal4q
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=hkri0zw1jhb3
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=dt4acr7aal4q
https://doi.org/10.1101/2021.02.10.430705
http://creativecommons.org/licenses/by-nc/4.0/


/

 
574 Mass  cytometry  data  of  mouse  embryonic  stem  cells  (mESC) . This  is  a  mass  cytometry  (or  CyTOF)                 
575 dataset,  consisting  of  90,000  cells  and  28  antibodies  (corresponding  to  ~7000  cells  each  from  Day  0-11                 
576 measurements),  that  represents  differentiation  of  mESC  to  mesoderm  cells 32. .  An  arcsinh  transform  with  a               
577 scaling  factor  of  5  was  applied  on  all  features  -  a  standard  procedure  for  CyTOF  datasets,  followed  by                   
578 normalization  to  unit  variance  and  zero  mean.  Given  the  small  feature  set,  no  PCA  is  required                 
579 (Supplementary  Fig.  S9) .  VIA  identifies  3  main  terminal  states  corresponding  to  Day  11  and  Day  10,                 
580 Palantir  on  the  other  hand  identifies  three  Terminal  states  that  all  correspond  to  Days  in  the  first  half  of                    
581 the  experiment  and  the  pseudotime  is  heavily  influenced  by  the  root  node  being  very  weakly  connected  to                  
582 the  other  stages  of  the  process.  Slingshot  appears  to  capture  the  overall  pseudotime  but  the  6  lineages                  
583 imposed  onto  the  low  dimensional  representation  are  difficult  to  interpret  and  distinguish.  To  improve               
584 Palantir  performance  we  used  5000  waypoints  but  this  takes  almost  20  minutes  to  complete  (excluding                
585 time  taken  for  embedding  the  visualization).  VIA  runs  in  ~3  minutes  and  produces  results  consistent  with                 
586 the  known  ordering.  The  pseudotime  reflects  the  range  of  Days  very  well,  even  capturing  the  small                 
587 population  of  Day  11  cells  on  the  left  hand  side  of  the  Day  6  cells  in  the  embedding (Fig.  3,  and                      
588 Supplementary   Fig.   S9) .  

 
589 Single-cell  biophysical  phenotypes  derived  from  imaging  flow  cytometry.  This  is  the  in-house  dataset              
590 of  single-cell  biophysical  phenotypes  of  two  different  human  breast  cancer  types  (MDA-MB231  and              
591 MCF7).  Following  our  recent  image-based  biophysical  phenotyping  strategy 53 , 54 ,  we  defined  the            
592 spatially-resolved  biophysical  features  of  a  cell  in  a  hierarchical  manner  based  on  both  bright-field  and                
593 quantitative  phase  images  captured  by  the  FACED  imaging  flow  cytometer  (i.e.,  from  the  bulk  features  to                 
594 the  subcellular  textures).  At  the  bulk  level,  we  extracted  the  cell  size,  dry  mass  density,  and  cell  shape.  At                    
595 the  subcellular  texture  level,  we  parameterized  the  global  and  local  textural  characteristics  of  optical               
596 density  and  mass  density  at  both  the  coarse  and  fine  scales  (e.g.,  local  variation  of  mass  density,  its                   
597 higher-order  statistics,  phase  entropy  radial  distribution  etc.).  This  hierarchical  phenotyping  approach 53 , 54            
598 allowed  us  to  establish  a  single-cell  biophysical  profile  of  38  features,  which  were  normalized  based  on                 
599 the  z-score  ( See  Supplementary  Table  S4  and  Table  S5 ).  All  these  features,  without  any  PCA,  are  used                  
600 as  input  to  VIA.  In  order  to  weigh  the  features,  we  use  a  mutual  information  classifier  to  rank  the  features,                     
601 based  on  the  integrated  fluorescence  intensity  of  the  fluorescence  FACED  images  of  the  cells  (which                
602 serve  as  the  ground  truth  of  the  cell-cycle  stages).  Following  normalization,  the  top  3  features  (which                 
603 relate   to   cell   size)   are   weighted   (using   a   factor   between   3-10).   

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430705doi: bioRxiv preprint 

https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=29jzsvekzs4z
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=msgdsdsh4sty
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=qiw3flfp2a4l
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=v1z8vlisxo3y
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=7jdtkd13bs9a
https://doi.org/10.1101/2021.02.10.430705
http://creativecommons.org/licenses/by-nc/4.0/


/

604 Imaging   flow   cytometry   experiment  

605 FACED   imaging   flow   cytometer   setup   

606 A  multimodal  FACED  imaging  flow  cytometry  (IFC)  platform  was  used  to  obtain  the  quantitative  phase                
607 and  fluorescence  images  of  single  cells  in  microfluidic  flow  at  an  imaging  throughput  of  ~70,000                
608 cells/sec.  The  light  source  consisted  of  an  Nd:YVO  picosecond  laser  (center  wavelength  =  1064  nm,                
609 Time-Bandwidth)  and  a  periodically-poled  lithium  niobate  (PPLN)  crystal  (Covesion)  for  second            
610 harmonic  generation  of  a  green  pulsed  beam  (center  wavelength  =  532  nm)  with  a  repetition  rate  of  20                   
611 MHz.  The  beam  was  then  directed  to  the  FACED  module,  which  mainly  consists  of  a  pair  of                  
612 almost-parallel  plane  mirrors.  This  module  generated  a  linear  array  of  50  beamlets  (foci)  which  were                
613 projected  by  an  objective  lens  (40X,  0.6NA,  MRH08430,  Nikon)  on  the  flowing  cells  in  the  microfluidic                 
614 channel  for  imaging.  Each  beamlet  was  designed  to  have  a  time  delay  of  1  ns  with  the  neighboring                   
615 beamlet  in  order  to  minimize  the  fluorescence  crosstalk  due  to  the  fluorescence  decay.  Detailed               
616 configuration  of  the  FACED  module  can  be  referred  to  Wu  et  al. 33. .  The  epi-fluorescence  image  signal                 
617 was  collected  by  the  same  objective  lens  and  directed  through  a  band-pass  dichroic  beamsplitter  (center:                
618 575nm,  bandwidth:  15nm).  The  filtered  orange  fluorescence  signal  was  collected  by  the  photomultiplier              
619 tube  (PMT)  (rise  time:  0.57  ns,  Hamamatsu).  On  the  other  hand,  the  transmitted  light  through  the  cell  was                   
620 collected  by  another  objective  lens  (40X,  0.8NA,  MRD07420,  Nikon).  The  light  was  then  split  equally  by                 
621 the  50:50  beamsplitter  into  two  paths,  each  of  which  encodes  different  phase-gradient  image  contrasts  of                
622 the  same  cell  (a  concept  similar  to  Scherlien  photography 55. ).  The  two  beams  are  combined,               
623 time-interleaved,  and  directed  to  the  photodetector  (PD)  (bandwidth:  >10  GHz,  Alphalas)  for  detection.              
624 The  signals  obtained  from  both  PMT  and  PD  were  then  passed  to  a  real-time  high-bandwidth  digitizer  (20                  
625 GHz,   80   GS/s,   Lecroy)   for   data   recording.  

 
626 Cell   culture   and   preparation   
627 MDA-MB231  (ATCC)  and  MCF7  (ATCC),  which  are  two  different  breast  cancer  cell  lines,  were  used  for                 
628 the  cell  cycle  study.  The  culture  medium  for  MDA-MB231was  ATCC  modified  RPMI  1640  (Gibco)               
629 supplemented  with  10%  fetal  bovine  serum  (FBS)  (Gibco)  and  1%  antibiotic-antimycotic  (Anti-Anti)             
630 (Gibco),  while  that  for  MCF7  was  DMEM  supplemented  with  10%  FBS  (Gibco)  and  1%  Anti-Anti                
631 (Gibco).  The  cells  were  cultured  inside  an  incubator  under  5%  CO 2  and  37°C,  and  subcultured  twice  a                  
632 week.  1e6  cells  were  pipetted  out  from  each  cell  line  and  stained  with  Vybrant  DyeCycle  orange  stain                  
633 (Invitrogen).   

  
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430705doi: bioRxiv preprint 

https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=1ezm5xa09jlh
https://docs.google.com/document/d/17hKqD3B5gmaBgnhrDUYigPD_-Ue37OFupUwD5jHxSOU/edit#smartreference=k1dnflcadtcx
https://doi.org/10.1101/2021.02.10.430705
http://creativecommons.org/licenses/by-nc/4.0/


/

634 Data   Availability  
635 Data   used   in   Figures   1-3   as   well   as   Supplementary   Figures   S1-S15)   is   available   on:  
636 1. Pancreatic   data:   Gene   Expression   Omnibus   (GEO)   under   accession   code   GSE132188.   
637 2. Cardiac  progenitor  data  is  available  from  the  ENA  repository  under  the  accession  code              
638 PRJEB23303   or   from   [ https://github.com/loosolab/cardiac-progenitors ].   
639 3. B-cell:   STATegraData   GitHub   repository.   [ https://github.com/STATegraData/STATegraData ]  
640 4. Mass   cytometry   mesoderm:   Cytobank  
641 [ https://community.cytobank.org/cytobank/experiments/71953 ].   
642 5. Raw   and   processed   data   for   scRNA-seq   Human   Hematopoeisis   are   available   through   the   Human  
643 Cell   Atlas   data   portal   at  
644 https://data.humancellatlas.org/explore/projects/091cf39b-01bc-42e5-9437-f419a66c8a45 .  
645 6. Embryoid   Body:   Mendeley   Data   repository   at   https://doi.org/10.17632/v6n743h5ng.1.  
646 7. Mouse   Organogenesis   :   NCBI   Gene   Expression   Omnibus   under   accession   number    GSE119945  
647 8. FACED  cell  cycle: https://github.com/ShobiStassen/VIA  and  on  FigShare        
648 https://doi.org/10.6084/m9.figshare.13601405.v1   
649 9. scATAC-seq   Hematopoiesis:   GEO:   GSE96772.   Processed   scATAC-seq   data,   which   include   PC  
650 values   and   TF   scores   per   cell   can   be   found   in   Data   S1.   of  
651 https://doi.org/10.1016/j.cell.2018.03.074   
652 10. Toy   Data:    https://github.com/ShobiStassen/VIA  

 
653 Code   Availability  
654 VIA   is   available   as   a   pip   installable   python   library   “pyVIA”   with   tutorials   and   sample   data   available   on  
655 https://github.com/ShobiStassen/VIA    and    https://pypi.org/project/pyVIA/  

 
656 References  
657 1. Street,  K.  et  al.  Slingshot:  cell  lineage  and  pseudotime  inference  for  single-cell  transcriptomics.              
658 BMC   Genomics   19,   477   (2018).  
659 2. Setty  M,  Kiseliovas  V,  Levine  J,  Gayoso  A,  Mazutis  L,  Pe'er  D.  Characterization  of  cell  fate                 
660 probabilities  in  single-cell  data  with  Palantir  [published  correction  appears  in  Nat  Biotechnol.             
661 2019   Oct;37(10):1237].   Nat   Biotechnol.   2019;37(4):451-460.   doi:10.1038/s41587-019-0068-4  
662 3. Qiu,  X.,  Mao,  Q.,  Tang,  Y. et  al.  Reversed  graph  embedding  resolves  complex  single-cell               
663 trajectories.    Nat   Methods    14,   979–982   (2017).    https://doi.org/10.1038/nmeth.4402  
664 4. Saelens,  W.,  Cannoodt,  R.,  Todorov,  H.  et  al.  A  comparison  of  single-cell  trajectory  inference               
665 methods.   Nat   Biotechnol   37,   547–554   (2019).    https://doi.org/10.1038/s41587-019-0071-9  
666 5. Bastidas-Ponce,  A.  et  al.  Comprehensive  single  cell  mRNA  profiling  reveals  a  detailed  roadmap              
667 for   pancreatic   endocrinegenesis.   Development   146,   (2019).  
668 6. Cao,  J.  et  al.  Comprehensive  single-  cell  transcriptional  profiling  of  a  multicellular  organism.              
669 Science   357,661–667   (2017).  

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430705doi: bioRxiv preprint 

https://github.com/loosolab/cardiac-progenitors
https://github.com/STATegraData/STATegraData
https://community.cytobank.org/cytobank/experiments/71953
https://data.humancellatlas.org/explore/projects/091cf39b-01bc-42e5-9437-f419a66c8a45
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE119945
https://github.com/ShobiStassen/VIA
https://doi.org/10.6084/m9.figshare.13601405.v1
https://doi.org/10.1016/j.cell.2018.03.074
https://github.com/ShobiStassen/VIA
https://github.com/ShobiStassen/VIA
https://pypi.org/project/pyVIA/
https://doi.org/10.1038/nmeth.4402
https://doi.org/10.1038/s41587-019-0071-9
https://doi.org/10.1101/2021.02.10.430705
http://creativecommons.org/licenses/by-nc/4.0/


/

670 7. Packer,  J.  S.  et  al.  A  lineage-  resolved  molecular  atlas  of  C.  elegans  embryogenesis  at  single-  cell                  
671 resolution.Science   365,   eaax1971   (2019).  
672 8. Cao,  J.,  Spielmann,  M.,  Qiu,  X.  et  al.  The  single-cell  transcriptional  landscape  of  mammalian               
673 organogenesis.   Nature   566,   496–502   (2019).   
674 9. Briggs,  J.  A.  et  al.  The  dynamics  of  gene  expression  in  vertebrate  embryogenesis  at  single-  cell                 
675 resolution.Science   360,   eaar5780   (2018).  
676 10. Litviňuková,  M.,  Talavera-López,  C.,  Maatz,  H.  et  al.  Cells  of  the  adult  human  heart.  Nature                
677 (2020).  
678 11. Stassen  SV,  Siu  DMD,  Lee  KCM,  Ho  JWK,  So  HKH,  Tsia  KK.  PARC:  ultrafast  and  accurate                 
679 clustering  of  phenotypic  data  of  millions  of  single  cells.  Bioinformatics.  2020  May             
680 1;36(9):2778-2786.   doi:   10.1093/bioinformatics/btaa042.   
681 12. Ulrike  von  Luxburg,  Agnes  Rad,  Matthias  Hein.  Hitting  and  Commute  Times  in  Large  Random               
682 Neighborhood   Graphs.   Journal   of   Machine   Learning   Research   15,   1751-1798   (2014)   
683 13. Marius  Lange,  Volker  Bergen,  Michal  Klein,  Manu  Setty,  Bernhard  Reuter,  Mostafa  Bakhti,             
684 Heiko  Lickert,  Meshal  Ansari,  Janine  Schniering,  Herbert  B.  Schiller,  Dana  Pe’er,  Fabian  J.              
685 Theis.  CellRank  for  directed  single-cell  fate  mapping.  bioRxiv  2020.10.19.345983;  doi:           
686 https://doi.org/10.1101/2020.10.19.345983  
687 14. McInnes,  L.,  Healy,  J.,  Saul,  N.  &  Großberger,  L.  UMAP:  uniform  manifold  approximation  and               
688 projection.    J.   Open   Source   Software.    3,   861   (2018).  
689 15. Moon,  K.R.,  van  Dijk,  D.,  Wang,  Z.  et  al.  Visualizing  structure  and  transitions  in               
690 high-dimensional  biological  data.  Nat  Biotechnol  37,  1482–1492  (2019).         
691 https://doi.org/10.1038/s41587-019-0336-3  
692 16. Tam  PP,  Behringer  RR.  Mouse  gastrulation:  the  formation  of  a  mammalian  body  plan.  Mech  Dev.                
693 1997;68(1-2):3-25.   doi:10.1016/s0925-4773(97)00123-8  
694 17. Chin  AM,  Hill  DR,  Aurora  M,  Spence  JR.  Morphogenesis  and  maturation  of  the  embryonic  and                
695 postnatal  intestine.  Semin  Cell  Dev  Biol.  2017  Jun;66:81-93.  doi:  10.1016/j.semcdb.2017.01.011.           
696 Epub   2017   Feb   1.   
697 18. Gilbert  SF.  Developmental  Biology.  6th  edition.  Sunderland  (MA):  Sinauer  Associates;  2000.  The             
698 Neural   Crest.   Available   from:    https://www.ncbi.nlm.nih.gov/books/NBK10065/  
699 19. The  human  body  at  cellular  resolution:  the  NIH  Human  Biomolecular  Atlas  Program,  Nature              
700 (2019)    https://doi.org/10.1038/s41586-019-1629-x  
701 20. Jia   G,   Preussner   J,   Chen   X,   Guenther   S,   Yuan   X,   Yekelchyk   M,   Kuenne   C,   Looso   M,   Zhou   Y,  
702 Teichmann   S,   Braun   T.   Single   cell   RNA-seq   and   ATAC-seq   analysis   of   cardiac   progenitor   cell  
703 transition   states   and   lineage   settlement.   Nat   Commun.   2018   Nov   19;9(1):4877.   
704 21. Tanya  E.  Foley,  Bradley  Hess,  Joanne  G.  A.  Savory,  Randy  Ringuette,  David  Lohnes.Role  of  Cdx                
705 factors  in  early  mesodermal  fate  decisions.Development  2019  146:  dev170498  doi:           
706 10.1242/dev.170498   Published   1   April   2019  
707 22. Yao  Y,  Yao  J,  Boström  KI.  SOX  Transcription  Factors  in  Endothelial  Differentiation  and              
708 Endothelial-Mesenchymal  Transitions.  Front  Cardiovasc  Med.  2019;6:30.  Published  2019  Mar          
709 28.   doi:10.3389/fcvm.2019.00030  
710 23. Potta  SP,  Liang  H,  Winkler  J,  Doss  MX,  Chen  S,  Wagh  V,  Pfannkuche  K,  Hescheler  J,  Sachinidis                  
711 A.  Isolation  and  functional  characterization  of  alpha-smooth  muscle  actin  expressing           

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430705doi: bioRxiv preprint 

https://doi.org/10.1101/2020.10.19.345983
https://doi.org/10.1038/s41587-019-0336-3
https://www.ncbi.nlm.nih.gov/books/NBK10065/
https://doi.org/10.1038/s41586-019-1629-x
https://doi.org/10.1101/2021.02.10.430705
http://creativecommons.org/licenses/by-nc/4.0/


/

712 cardiomyocytes  from  embryonic  stem  cells.  Cell  Physiol  Biochem.  2010;25(6):595-604.  doi:           
713 10.1159/000315078.   Epub   2010   May   18.   PMID:   20511704.  
714 24. Warkman  AS,  Whitman  SA,  Miller  MK,  Garriock  RJ,  Schwach  CM,  Gregorio  CC,  Krieg  PA.               
715 Developmental  expression  and  cardiac  transcriptional  regulation  of  Myh7b,  a  third  myosin  heavy             
716 chain  in  the  vertebrate  heart.  Cytoskeleton  (Hoboken).  2012  May;69(5):324-35.  doi:           
717 10.1002/cm.21029.  Epub  2012  Apr  30.  Erratum  in:  Cytoskeleton  (Hoboken).  2012           
718 Dec;69(12):1086.   PMID:   22422726;   PMCID:   PMC4734749.  
719 25. Mahmoud  AI,  Kocabas  F,  Muralidhar  SA,  et  al.  Meis1  regulates  postnatal  cardiomyocyte  cell              
720 cycle   arrest.   Nature.   2013;497(7448):249-253.   doi:10.1038/nature12054  
721 26. Gomez-Cabrero,  D.,  Tarazona,  S.,  Ferreirós-Vidal,  I.  et  al.  STATegra,  a  comprehensive            
722 multi-omics  dataset  of  B-cell  differentiation  in  mouse.  Sci  Data  6,  256  (2019).             
723 https://doi.org/10.1038/s41597-019-0202-7  
724 27. Jason  D.  Buenrostro,  M.  Ryan  Corces,  Caleb  A.  Lareau,  Beijing  Wu,  Alicia  N.  Schep,  Martin  J.                 
725 Aryee,  Ravindra  Majeti,  Howard  Y.  Chang,  William  J.  Greenleaf,  Integrated  Single-Cell  Analysis             
726 Maps  the  Continuous  Regulatory  Landscape  of  Human  Hematopoietic  Differentiation,  Cell,  173,            
727 1535-1548.e16,   (2018)    https://doi.org/10.1016/j.cell.2018.03.074 .  
728 28. Wolf,  F.  A.  et  al.  PAGA:  graph  abstraction  reconciles  clustering  with  trajectory  inference  through               
729 a   topology   preserving   map   of   single   cells.   Genome   Biol.   20,   59   (2019).  
730 29. Gutierrez  GD,  Gromada  J,  Sussel  L.  Heterogeneity  of  the  Pancreatic  Beta  Cell.  Front  Genet.               
731 2017;8:22.   Published   2017   Mar   6.   doi:10.3389/fgene.2017.00022  
732 30. Krentz  NAJ,  Lee  MYY,  Xu  EE,  Sproul  SLJ,  Maslova  A,  Sasaki  S,  Lynn  FC.  Single-Cell                
733 Transcriptome  Profiling  of  Mouse  and  hESC-Derived  Pancreatic  Progenitors.  Stem  Cell  Reports.            
734 2018  Dec  11;11(6):1551-1564.  doi:  10.1016/j.stemcr.2018.11.008.  PMID:  30540962;  PMCID:         
735 PMC6294286.  
736 31. Chen,  H.,  Lareau,  C.,  Andreani,  T. et  al.  Assessment  of  computational  methods  for  the  analysis  of                 
737 single-cell  ATAC-seq  data. Genome  Biol  20,  241  (2019).         
738 https://doi.org/10.1186/s13059-019-1854-5  
739 32. Ko,  M.E.,  Williams,  C.M.,  Fread,  K.I. et  al.  FLOW-MAP:  a  graph-based,  force-directed  layout              
740 algorithm  for  trajectory  mapping  in  single-cell  time  course  datasets. Nat  Protoc  15,  398–420              
741 (2020).    https://doi.org/10.1038/s41596-019-0246-3  
742 33. Wu  J.  L.,  Xu  Y.  Q.,  Xu  J.  J.,  Wei  X.  X.,  Chan  A.  C.  S.,  Tang  A.  H.  L.,  Lau  A.  K.  S.,  Chung  B.  M.                            
743 F.,  Cheung  Shum  H.,  Lam  E.  Y.,  Wong  K.  K.  Y.,  Tsia  K.  K.,  “Ultrafast  laser-scanning  time-stretch                  
744 imaging   at   visible   wavelengths,”   Light   Sci.   Appl.   6(1),   e16196   (2016).10.1038/lsa.2016.196  
745 34. Popescu  G,  Park  Y,  Lue  N,  Best-Popescu  C,  Deflores  L,  Dasari  RR,  Feld  MS,  Badizadegan  K.                 
746 Optical  imaging  of  cell  mass  and  growth  dynamics.  Am  J  Physiol  Cell  Physiol.  2008               
747 Aug;295(2):C538-44.   doi:   10.1152/ajpcell.00121.2008.   Epub   2008   Jun   18.   
748 35. Kyoohyun  Kim,  Jochen  Guck  The  Relative  Densities  of  Cytoplasm  and  Nuclear  Compartments             
749 Are  Robust  against  Strong  PerturbationBiophysical  Journal.  Volume  119,  Issue  10,  17  November             
750 2020,   Pages   1946-1957  
751 36. Kafri   R,   Levy   J,   Ginzberg   MB,   Oh   S,   Lahav   G,   Kirschner   MW.   Dynamics   extracted   from   fixed  
752 cells   reveal   feedback   linking   cell   growth   to   cell   cycle.   Nature.   2013   Feb   28;494(7438):480-3.   doi:  
753 10.1038/nature11897.   PMID:   23446419;   PMCID:   PMC3730528.  

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430705doi: bioRxiv preprint 

https://doi.org/10.1038/s41597-019-0202-7
https://doi.org/10.1016/j.cell.2018.03.074
https://doi.org/10.1186/s13059-019-1854-5
https://doi.org/10.1038/s41596-019-0246-3
https://doi.org/10.1101/2021.02.10.430705
http://creativecommons.org/licenses/by-nc/4.0/


/

754 37. Park   SR,   Namkoong   S,   Friesen   L,   Cho   CS,   Zhang   ZZ,   Chen   YC,   Yoon   E,   Kim   CH,   Kwak   H,  
755 Kang   HM,   Lee   JH.   Single-Cell   Transcriptome   Analysis   of   Colon   Cancer   Cell   Response   to  
756 5-Fluorouracil-Induced   DNA   Damage.   Cell   Rep.   2020   Aug   25;32(8):108077.   doi:  
757 10.1016/j.celrep.2020.108077.   
758 38. Zangle   TA,   Teitell   MA.   Live-cell   mass   profiling:   an   emerging   approach   in   quantitative  
759 biophysics.   Nat   Methods.   2014   Dec;11(12):1221-8.   doi:   10.1038/nmeth.3175.   PMID:   25423019;  
760 PMCID:   PMC4319180.  
761 39. Tse   HT,   Gossett   DR,   Moon   YS,   Masaeli   M,   Sohsman   M,   Ying   Y,   Mislick   K,   Adams   RP,   Rao   J,  
762 Di   Carlo   D.   Quantitative   diagnosis   of   malignant   pleural   effusions   by   single-cell  
763 mechanophenotyping.   Sci   Transl   Med.   2013   Nov   20;5(212):212ra163.   doi:  
764 10.1126/scitranslmed.3006559.   PMID:   24259051.  
765 40. Otto,   O.,   Rosendahl,   P.,   Mietke,   A.    et   al.    Real-time   deformability   cytometry:   on-the-fly   cell  
766 mechanical   phenotyping.    Nat   Methods    12,   199–202   (2015).    https://doi.org/10.1038/nmeth.3281  
767 41. Kimmerling,   R.J.,   Prakadan,   S.M.,   Gupta,   A.J.    et   al.    Linking   single-cell   measurements   of   mass,  
768 growth   rate,   and   gene   expression.    Genome   Biol    19,   207   (2018).  
769 https://doi.org/10.1186/s13059-018-1576-0  
770 42. Traag,  V.A.,  Waltman,  L.  &  van  Eck,  N.J.  From  Louvain  to  Leiden:  guaranteeing  well-connected               
771 communities.   Sci   Rep   9,   5233   (2019).    https://doi.org/10.1038/s41598-019-41695-z  
772 43. Langville,  Amy  N.,  and  Carl  D.  Meyer.  Google's  PageRank  and  Beyond:  The  Science  of  Search                
773 Engine   Rankings.   Princeton   University   Press,   2006.  
774 44. Chung  F.,  Zhao  W.  (2010)  PageRank  and  Random  Walks  on  Graphs.  In:  Katona  G.O.H.,               
775 Schrijver  A.,  Szőnyi  T.,  Sági  G.  (eds)  Fete  of  Combinatorics  and  Computer  Science.  Bolyai               
776 Society   Mathematical   Studies,   vol   20.   Springer,   Berlin,   Heidelberg.   
777 45. van  Dijk  D,  Sharma  R,  Nainys  J,  et  al.  Recovering  Gene  Interactions  from  Single-Cell  Data                
778 Using   Data   Diffusion.   Cell.   2018;174(3):716-729.e27.   doi:10.1016/j.cell.2018.05.061  
779 46. Coifman,  R.  R.  et  al.  Geometric  diffusions  as  a  tool  for  harmonic  analysis  and  structure  definition                 
780 of   data:   diffusion   maps.   Proc.   Natl   Acad.   Sci.   USA   102,7426–7431   (2005).  
781 47. Haghverdi  L,  Büttner  M,  Wolf  FA,  Buettner  F,  Theis  FJ.  Diffusion  pseudotime  robustly              
782 reconstructs   lineage   branching.   Nat   Methods.   2016;13(10):845-848.   doi:10.1038/nmeth.3971  
783 48. Bergen,  V.,  Lange,  M.,  Peidli,  S.  et  al.  Generalizing  RNA  velocity  to  transient  cell  states  through                 
784 dynamical  modeling.  Nat  Biotechnol  38,  1408–1414  (2020).        
785 https://doi.org/10.1038/s41587-020-0591-3  
786 49. Zheng  GX,  Terry  JM,  et  al.  Massively  parallel  digital  transcriptional  profiling  of  single  cells.  Nat                
787 Commun.   2017   Jan   16;8:14049.   doi:   10.1038/ncomms14049.  
788 50. Aran   D   et   al.,    (2019).   “Reference-based   analysis   of   lung   single-cell   sequencing   reveals   a  
789 transitional   profibrotic   macrophage.”   Nat.   Immunol.,   20,   163-172  
790 51. Novershtern   N.   et   al.,   Densely   interconnected   transcriptional   circuits   control   cell   states   in   human  
791 hematopoiesis.   Cell.   2011   Jan   21;144(2):296-309.  
792 52. Stuart  T,  Butler  A,  Hoffman  P,  Hafemeister  C,  Papalexi  E,  Mauck  WM  3rd,  Hao  Y,  Stoeckius  M,                  
793 Smibert  P,  Satija  R.  Comprehensive  Integration  of  Single-Cell  Data.  Cell.  2019            
794 Jun13;177(7):1888-1902.e21.   doi:   10.1016/j.cell.2019.05.031.   Epub   2019   Jun   6.   

.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430705doi: bioRxiv preprint 

https://doi.org/10.1038/nmeth.3281
https://doi.org/10.1186/s13059-018-1576-0
https://doi.org/10.1038/s41598-019-41695-z
https://doi.org/10.1038/s41587-020-0591-3
https://doi.org/10.1101/2021.02.10.430705
http://creativecommons.org/licenses/by-nc/4.0/


/

795 53. Siu,  KCM  Lee,  MCK  Lo,  SV  Stassen,  M  Wang,  IZQ  Zhang,  HKH  So.  Deep-learning-assisted               
796 biophysical  imaging  cytometry  at  massive  throughput  delineates  cell  population  heterogeneity.           
797 Lab   on   a   Chip   20   (20),   3696-3708  
798 54. KCM  Lee,  M  Wang,  KSE  Cheah,  GCF  Chan,  HKH  So,  KKY  Wong,  KK  Tsia..  Quantitative                
799 phase  imaging  flow  cytometry  for  ultra‐large‐scale  single‐cell  biophysical  phenotyping.          
800 Cytometry   Part   A   95   (5),   510-520  
801 55. Wenwei  Yan  Jianglai  Wu  Kenneth  K.  Y.  Wong  Kevin  K.  Tsia,  A  high‐throughput  all‐optical               
802 laser‐scanning  imaging  flow  cytometer  with  biomolecular  specificity  and  subcellular  resolution,           
803 J.   Biophotonics   (2017)    https://onlinelibrary.wiley.com/doi/abs/10.1002/jbio.201700178  
804 56. F.  Chung  and  S.-T.  Yau,  Discrete  Green’s  Functions.  Journal  of  Combinatorial  Theory,  Series  A,               
805 91(1-2)   (2000),   pp.   191–214  
806 57. Van  den  Berge,  K.,  Roux  de  Bézieux,  H.,  Street,  K.  et  al.  Trajectory-based  differential  expression                
807 analysis   for   single-cell   sequencing   data.   Nat   Communications  
808 58. Yury  A.  Malkov,  D.  Yashunin.  Efficient  and  Robust  Approximate  Nearest  Neighbor  Search  Using              
809 Hierarchical  Navigable  Small  World  Graph,  Computer  Science,  Medicine,  Mathematics,  IEEE           
810 Transactions   on   Pattern   Analysis   and   Machine   Intelligence,   2020  

 
.CC-BY-NC 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made 

The copyright holder for this preprintthis version posted February 11, 2021. ; https://doi.org/10.1101/2021.02.10.430705doi: bioRxiv preprint 

https://onlinelibrary.wiley.com/doi/abs/10.1002/jbio.201700178
https://doi.org/10.1101/2021.02.10.430705
http://creativecommons.org/licenses/by-nc/4.0/