Computing the Riemannian curvature of image patch and single-cell RNA sequencing data manifolds using extrinsic differential geometry

Duluxan Sritharan*1,2, Shu Wang*1,3, and Sahand Hormoz†2,4,5

1 Harvard Graduate Program in Biophysics, Harvard University, Cambridge, MA, USA
2 Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
3 Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, USA
4 Department of Systems Biology, Harvard Medical School, Boston, MA, USA
5 Broad Institute of MIT and Harvard, Cambridge, MA, USA

Abstract

Most high-dimensional datasets are thought to be inherently low-dimensional; that is, datapoints are constrained to lie on a low-dimensional manifold embedded in a high-dimensional ambient space. Here we study the viability of two approaches from differential geometry for estimating the Riemannian curvature of these low-dimensional manifolds. The intrinsic approach relates curvature to the Laplace-Beltrami operator using the heat-trace expansion, and is agnostic to how a manifold is embedded in a high-dimensional space. The extrinsic approach relates the ambient coordinates of a manifold's embedding to its curvature using the Second Fundamental Form and the Gauss-Codazzi equation. Keeping in mind practical constraints of real-world datasets, such as small sample sizes and measurement noise, we found that estimating curvature is feasible, even for simple, low-dimensional toy manifolds, only when the extrinsic approach is used. To test the applicability of the extrinsic approach to real-world data, we computed the curvature of a well-studied manifold of image patches, and recapitulated its topological classification as a Klein bottle. Lastly, we applied the approach to study single-cell transcriptomic sequencing (scRNAseq) datasets of blood, gastrulation, and brain cells, revealing for the first time the intrinsic curvature of scRNAseq manifolds.

*Equal contribution
†To whom correspondence should be addressed (Sahand Hormoz@hms.harvard.edu)

1 Introduction

High-dimensional biological datasets have become prevalent in recent decades because of new technologies such as high-throughput scRNAseq [1, 2, 3], mass cytometry [4, 5] and multiplex imaging [6, 7]. Interpretation and visualization of such high-dimensional datasets have been challenging, however, prompting the development of tools for non-linear projection of datapoints onto 2 or 3 dimensions [8]. These tools, such as Isomap [9], t-SNE [10] and UMAP [11], appeal to the ansatz that datapoints in a high-dimensional ambient space are constrained to lie on a low-dimensional manifold. Unfortunately, determining the geometry of a low-dimensional manifold from these visualizations is difficult, since many geometric properties are lost after projecting onto 2 or 3 dimensions.
For example, the cartographic projections used in an atlas to flatten Earth's curved surface tear apart continuous neighborhoods and non-uniformly stretch distances.

Fortunately, topology and differential geometry provide a wealth of concepts for characterizing a manifold's shape directly, without confounding projections. In particular, homology [12, 13] categorizes a manifold according to the number of holes it contains and the dimensionality of each hole (whereas, for example, the hole in a hollow sphere does not survive projection onto a 2-dimensional plane). Similarly, metrics [14] and geodesics [9] determine shortest-distance paths between pairs of points on a manifold without any distortion from a projection (whereas, for example, most atlases exaggerate distances at the poles). Curvature [15] is a local manifold property that quantifies the extent to which a manifold deviates from the tangent plane at each point p; projecting a manifold onto a plane for visualization destroys this property by definition. Recent methods have emerged for estimating homology [16, 17], metrics [14] and geodesics [18] from noisy, sampled data, with accompanying statistical guarantees [18, 19, 20]. These methods have been applied to analyze images [21, 22] and biological datasets [23, 24]. However, estimating curvature has received less attention, although it is fundamental to quantifying geometry.

Curvature arises from two sources. On the one hand, a manifold itself can be curved, resulting in Riemannian or intrinsic curvature. A sphere has intrinsic curvature because it cannot be flattened so that all geodesics on its surface correspond to straight lines on a Euclidean plane (see Figure 1A). On the other hand, the embedding of a manifold in an ambient space can give rise to extrinsic curvature, a property that is not inherent to the manifold itself. For example, a scroll has extrinsic curvature because it is formed by rolling a piece of parchment, but the parchment itself is not inherently curved (see Figure 1B). It is important to note that both types of curvature scale inversely with the global length scale (L) associated with a manifold. It is for this reason that a marble (L ≈ 1 cm) is visibly round, but the Earth (L ≈ 10,000 km) is still mistaken by some to be flat. Since intrinsic curvature is an inherent property of a manifold, while extrinsic curvature is incidental to an embedding, we will restrict our attention to the former.

[Figure 1, panels A-D, with panel titles: Intrinsic (Riemannian) Curvature; Extrinsic Curvature; Intrinsic Differential Geometry; Extrinsic Differential Geometry; and the surface z = ±√(1 − x² − y²).]

Figure 1: Riemannian curvature is an intrinsic property of a manifold while extrinsic curvature depends on the embedding. (A) (Left) N = 10^4 points uniformly sampled from the 2-dimensional hollow unit sphere, S^2, embedded in the 3-dimensional ambient space R^3, colored according to the z-coordinate. S^2 has Riemannian or intrinsic curvature because there is no projection onto 2-dimensional Euclidean space that preserves geodesic (shortest-path) distances.
(Right) For example, a stereographic projection through the point z = (0, 0, 1) onto the plane z = 0 introduces distortions, since the geodesic distance between any pair of points in the lower hemisphere is (non-uniformly) larger than the Euclidean distance in this projection. (B) (Left) N = 10^4 points uniformly sampled from a scroll, which is also a 2-dimensional manifold embedded in R^3. The scroll has extrinsic curvature because it curls away from the tangent plane at any point. (Right) However, it does not have intrinsic curvature, because it can be projected onto 2-dimensional Euclidean space in a way that preserves geodesic distances, by unfurling. (C) Intrinsic differential geometry treats manifolds as self-contained objects that can be described using only intrinsic coordinates, which do not depend on any embedding or ambient space. One possible set of intrinsic coordinates for S^2 are polar coordinates, where θ_1 and θ_2 are the azimuthal and elevation angles respectively. While this representation superficially resembles the unfurled scroll in (B), distances in this plane are non-Euclidean: any line segment along θ_2 = ±π/2 has zero length, for example. (D) Extrinsic differential geometry defines manifolds in the coordinate system of the ambient space, which requires a privileged vantage point off the manifold itself. Both intrinsic and extrinsic differential geometry can be used to compute intrinsic curvature, whereas only extrinsic differential geometry can be used to compute extrinsic curvature (as indicated by the black arrows).

A precise description of intrinsic curvature is provided by the Riemannian curvature tensor, R^l_{kij}(p). For a given basis {v}, this tensor quantifies how much a vector initially pointing in direction v_k is displaced in direction v_l after parallel transport around an infinitesimal parallelogram defined by directions v_i and v_j. The simplest intrinsic curvature descriptor is the scalar curvature, S(p), which is formed by contracting R^l_{kij}(p) to a scalar quantity, as its name suggests. When S(p) is greater (less) than 0, the sum of the angles of a triangle formed by connecting three points near p by geodesics is greater (less) than π. Likewise, when S(p) is greater (less) than 0, a small ball centred at p has a smaller (larger) volume than a ball of the same radius in Euclidean space. We furnish toy examples in the main text to provide stronger intuition for this quantity.
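The ball-volume statement can be made quantitative. For a geodesic ball B_r(p) of small radius r on a d-dimensional manifold, a standard expansion (stated here for additional intuition; it is not used in the derivations below) gives

\mathrm{Vol}(B_r(p)) = \omega_d \, r^d \left( 1 - \frac{S(p)}{6(d+2)} r^2 + O(r^4) \right),

where ω_d is the volume of the unit ball in R^d: positive scalar curvature shrinks small geodesic balls relative to Euclidean space, while negative scalar curvature inflates them.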
In theory, intrinsic curvature can be computed equivalently using tools from either of the two branches of differential geometry. Intrinsic differential geometry makes no recourse to an external vantage point off a manifold, just as the polygonal characters in Edwin Abbott's classic Flatland [25] were confined to traversing R^2, and found the notion of R^3 unfathomable. In this branch, a manifold is therefore represented in intrinsic coordinates, which are agnostic to any ambient space or embedding. A hollow sphere represented in polar coordinates and k-nearest neighbor (kNN) graph representations of a dataset, for instance, are in this spirit (see Figure 1C). Conversely, in extrinsic differential geometry, a manifold is treated as a surface embedded in an ambient space, and is represented in ambient coordinates (see Figure 1D). The surface of an organ is parameterized this way, for example, in a surgical robot suturing an incision.

In this work, we explore two approaches for estimating intrinsic curvature based on these twin views, keeping in mind practical limitations of real-world datasets, which may be comprised of a relatively small number of noisy measurements. The first approach uses the Laplace-Beltrami operator, which is well-studied in previous applications of differential geometry to data analysis [14, 26, 27, 28, 29], and is theoretically appealing as an intrinsic quantity that is embedding-invariant. However, we find that this approach cannot accurately estimate even the average scalar curvature of the simplest low-dimensional toy manifolds at small sample sizes, despite the history and ubiquity of the Laplace-Beltrami operator in geometric data analysis. Meanwhile, the second approach uses the Second Fundamental Form and the Gauss-Codazzi equation [15], identities that rely on information from the ambient space. We find that this extrinsic approach is not only more robust to small sample sizes and noise, but permits computation of the full Riemannian curvature tensor, though we focus on the scalar curvature for simplicity. Using these insights, we developed a software package to compute the scalar curvature (and associated uncertainty) at each sampled point on a manifold, and applied this tool to investigate the curvature of image and scRNAseq datasets.

2 Results

2.1 Estimators of the Laplace-Beltrami Operator Yield Inaccurate Scalar Curvatures

Intrinsic differential geometry treats a d-dimensional manifold, M, as a self-contained object, and is agnostic to how M may be represented in ambient coordinates under any particular embedding (see Figure 1C). Conceptually, this is accomplished by considering M only as a collection of local, overlapping neighborhoods. The geometry of these neighborhoods is encoded using tools such as the Laplace-Beltrami operator, Δ_M, which captures diffusion dynamics across neighborhoods. For most practical applications, we do not have direct access to M but instead to a finite number (N) of points sampled from M. In these cases, estimators of Δ_M are used instead. These estimators are well-studied [14, 26, 27, 28, 29], and the convergence rates of some have been characterized [30].
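To make these estimators concrete, the sketch below builds one common variant, a random-walk graph Laplacian with a Gaussian kernel, from a point cloud (the paper's exact normalization is Equation 15 in Methods Section 4.2.5; the bandwidth eps and the sign/scale convention here are illustrative assumptions):

```python
import numpy as np

def random_walk_laplacian(X, eps):
    """Kernel-based estimate of the Laplace-Beltrami operator from a point
    cloud X (N x n). Returns an N x N matrix approximating -Delta_M up to a
    kernel-dependent constant as eps -> 0 and N -> infinity."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    W = np.exp(-D2 / eps)                                # Gaussian kernel weights
    P = W / W.sum(axis=1, keepdims=True)                 # row-normalized random walk
    return (np.eye(len(X)) - P) / eps                    # graph Laplacian
```

The eigenvalues of such a matrix play the role of the λ_k of Δ_M in the heat-trace calculations that follow.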
The scalar curvature averaged across M has a well-known connection to Δ_M via the heat-trace expansion [27, 31], which relates the eigenvalues, λ_k, of Δ_M to the geometry of M:

Z(t) \equiv \sum_{k=1}^{\infty} e^{-\lambda_k t} = (4\pi t)^{-d/2} \left( \sum_{i=0}^{n} c_i t^{i/2} + o(t^{(n+1)/2}) \right), \quad \lambda_k \le \lambda_{k+1}    (1)

The first few coefficients, c_i, are given by [27]:

c_0 = \int_M dM, \quad c_1 = -\frac{\sqrt{\pi}}{2} \int_{\partial M} d(\partial M), \quad c_2 = \frac{1}{6} \int_M S \, dM - \frac{1}{6} \int_{\partial M} J \, d(\partial M)    (2)

where ∂M is the boundary of the manifold and J is the mean curvature on ∂M. Recall that S is the point-wise scalar curvature. By inspection, c_0 is the volume, c_1 is proportional to the boundary area, and c_2 is directly related to the average scalar curvature.

We reasoned that if the average scalar curvature cannot be accurately computed for a manifold of constant scalar curvature using these relations, then computing the point-wise scalar curvature for more complex manifolds is intractable. To investigate this, we considered the 2-dimensional hollow unit sphere, S^2, for which the true scalar curvature is S(p) = 2 ∀p ∈ M, and uniformly sampled N = 10^4 points to mirror the typical size of current scRNAseq datasets (see Figure 1A; Methods Section 4.4.1.1).

Since common estimators of Δ_M only yield as many eigenvalues as datapoints (N), we cannot compute the infinite set of eigenvalues needed in Equation 1. Therefore, we introduced a truncated series with m eigenvalues, z_m(x), where we have substituted x = √t and divided through by the prefactor on the RHS of Equation 1 to isolate the c_i, following the approach in [27]:

z_m(x) = (4\pi)^{d/2} x^d \sum_{k=1}^{m} e^{-\lambda_k x^2}    (3)

The scalar curvature can then be approximated by fitting the truncated series, z_m(x), to a second-order polynomial, p_2(x), over intervals of small x:

z_m(x) \approx p_2(x), \quad \text{where } p_2(x) = c_0 + c_1 x + c_2 x^2    (4)

We estimated Δ_M using the N sampled points (see Methods Section 4.2.5), substituted the eigenvalues of the estimate into Equation 3, and numerically fit z_m(x) to p_2(x) (see Figure S1A-G; Methods Section 4.2.1). We obtained the scalar curvature by inspecting the resulting c_2 coefficient, and compared the result to the true value of 2. We found that the scalar curvature was always over-estimated (S > 3) regardless of m, the number of eigenvalues used in the truncated series (see Methods Section 4.2.3), or the choice of estimator for Δ_M (see Methods Section 4.2.5). We identified the poor convergence of the estimated eigenvalues of Δ_M as the source of error (see Methods Section 4.2.4), and found that at least N ≈ 10^7 points are required to reduce the error to ±0.5, so that S ≈ 2.5 (see Figure S1H).

Therefore, despite the prevalence of the Laplace-Beltrami operator in geometric data analysis, our example shows that an intrinsic approach relying on the operator is not practical for computing scalar curvatures. Even for noise-free datapoints uniformly sampled from S^2, the sample size needed to compute the average scalar curvature to within ±0.5 is several orders of magnitude greater than what is typically feasible in current scRNAseq experiments. Noise and non-uniform sampling would confound the issue further.
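For orientation, the ideal-case numbers for S^2 can be written down exactly (a worked example added here; it uses the well-known spectrum of the unit sphere, whose eigenvalues are ℓ(ℓ + 1) with multiplicity 2ℓ + 1 for ℓ = 0, 1, 2, ...):

(4\pi t) \, Z(t) = 4\pi t \sum_{\ell=0}^{\infty} (2\ell + 1) \, e^{-\ell(\ell+1)t} = 4\pi + \frac{4\pi}{3} t + O(t^2)

Comparing with Equation 1 for d = 2, c_0 = 4π is the area of S^2, and c_2 = (1/6)∫_M S dM = (1/6)(2)(4π) = 4π/3, consistent with S = 2. The failure reported above is therefore not in Equations 1-2 themselves, but in how slowly the empirical eigenvalues of estimated Δ_M converge to this spectrum.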
Most importantly, we would eventually like to compute local values of S(p) ∀p ∈ M, but this approach failed to correctly recover even the average scalar curvature, which one might have expected to be feasible. To find an alternative approach, we next considered tools from extrinsic differential geometry.

2.2 Curvature Can Be Computed Accurately Using the Second Fundamental Form

In extrinsic differential geometry, a manifold is described in the coordinates of the ambient space in which it is embedded, usually R^n (see Figure 1D). Since the shape of the sphere in Figure 1A is visually unambiguous to the eye (thanks to its extrinsic view from a vantage point off the manifold), we reasoned that an extrinsic approach would be more fruitful.

A d-dimensional manifold, M, embedded in R^n can be described at each point p in terms of a d-dimensional tangent space, T_M(p), and an (n − d)-dimensional normal space, N_M(p), as shown in Figure 2A. Given orthonormal bases for T_M(p) and N_M(p), points in the neighborhood of p can be expressed as Y = [t_1, ..., t_d, n_1, ..., n_{n−d}], where t_i is Y's coordinate along the i-th basis vector of T_M(p) and n_k is Y's coordinate along the k-th basis vector of N_M(p). The n_k can then be locally approximated as functions of the t_i, i.e. n_k ≈ f_k(t_1, ..., t_d), as shown in Figure 2B. The Riemannian curvature of M is related to the quadratic terms in the Taylor expansion of each f_k with respect to the t_i. Specifically, the Second Fundamental Form of M, h^k_{ij}, gives the second-order coefficient relating each f_k to the quadratic term t_i t_j [32]:

h^k_{ij}(p) = \left. \frac{\partial^2 f_k}{\partial t_i \partial t_j} \right|_p    (5)

The Riemannian curvature tensor is related to the Second Fundamental Form according to the Gauss-Codazzi equation [15]:

R_{ijkl} = \left( h^{\alpha}_{ik} h^{\beta}_{jl} - h^{\alpha}_{il} h^{\beta}_{jk} \right) g_{\alpha\beta}    (6)

where g_{αβ} is the metric of the ambient space, which we take to be the usual Euclidean metric δ_{αβ} going forward. The scalar curvature can be obtained by contracting the Riemannian curvature tensor:

S = \sum_{i,j} R_{ijij}    (7)

This suggests a conceptually simple procedure for estimating the scalar curvature of a data manifold at each point p: (i) estimate T_M(p) and N_M(p), (ii) determine h^k_{ij}(p) in local coordinates, and (iii) compute S using Equations 6 and 7.

We developed a computational tool that implements this procedure. Briefly, given a set of datapoints {X} ⊂ R^n and a manifold dimension d, a neighborhood around each point p is selected to be the n-dimensional ball centred on p of radius r encompassing N_p(r) points (see Methods Section 4.3.2). For each point p, Principal Component Analysis (PCA) [33] is performed on the N_p(r) points in its neighborhood, and the first d (last n − d) Principal Components (PCs), accounting for the most (least) variance, are taken as an orthonormal basis for T_M(p) (N_M(p)). The normal coordinates, n_k, of the N_p(r) points in each neighborhood are fit by regression to a quadratic model in terms of the tangent coordinates, t_i, to obtain h^k_{ij}(p) with associated uncertainties (see Figure 2B; Methods Section 4.3.1).
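The procedure above can be sketched compactly (a minimal illustration, not the authors' released MATLAB tool: it uses a fixed k-nearest-neighbor neighborhood rather than the adaptive radius r(p) described below, ordinary least squares without uncertainty propagation, and assumes a Euclidean ambient metric):

```python
import numpy as np

def scalar_curvatures(X, d, k=100):
    """Estimate the scalar curvature S(p) at every point of a dataset X
    (N x n) assumed to lie near a d-dimensional manifold, using the k
    nearest neighbors of each point as its fitting neighborhood."""
    N, n = X.shape
    iu = np.triu_indices(d)                 # quadratic monomials t_i * t_j, i <= j
    S = np.empty(N)
    for p in range(N):
        # (i) neighborhood and local PCA: first d PCs span T_M(p), the rest N_M(p)
        nbrs = X[np.argsort(np.linalg.norm(X - X[p], axis=1))[:k]] - X[p]
        _, _, V = np.linalg.svd(nbrs - nbrs.mean(axis=0), full_matrices=False)
        t, nrm = nbrs @ V[:d].T, nbrs @ V[d:].T      # tangent / normal coordinates
        # (ii) regress each normal coordinate on a quadratic in the tangent
        # coordinates; the constant and linear columns absorb offset and tilt
        A = np.hstack([np.ones((k, 1)), t, t[:, iu[0]] * t[:, iu[1]]])
        coef, *_ = np.linalg.lstsq(A, nrm, rcond=None)
        # (iii) unpack h^k_ij (f = 0.5 * sum_ij h_ij t_i t_j, so the fitted
        # coefficient of t_i^2 is h_ii / 2) and contract via Gauss-Codazzi
        Sp = 0.0
        for a in range(n - d):
            h = np.zeros((d, d))
            h[iu] = coef[1 + d:, a]
            h = h + h.T                              # symmetrize; doubles the diagonal
            Sp += np.trace(h) ** 2 - np.sum(h ** 2)  # sum_ij (h_ii h_jj - h_ij^2)
        S[p] = Sp
    return S

# For points sampled from the unit sphere S^2 in R^3, the true value is 2:
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
print(np.median(scalar_curvatures(X, d=2)))          # close to 2
```

For points sampled from the unit sphere, this recovers values near the true S = 2, with the small systematic bias discussed later in this section.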
The choice of r(p) is an important one, since it sets the length scale at which curvature is computed at point p (see Methods Section 4.3.5). Our tool allows interrogation of curvature at any length scale of interest by allowing the user to set r(p) manually, a feature we use to inspect real-world datasets later in the paper. However, since the local geometry of the manifold may be non-trivial and unknown a priori, we also provide the ability to set r(p) according to statistical rather than geometric principles. Specifically, our tool algorithmically chooses r at each p so that the uncertainty in h^k_{ij}(p) from regression is less than a user-specified global parameter, σ_h (see Methods Section 4.3.2). Since a larger number of points reduces the uncertainty in regression, a smaller σ_h requires a larger r(p) ∀p ∈ M. Setting σ_h therefore allows neighborhood sizes to vary dynamically over the manifold based on the local density of the data, which means the algorithm can gracefully handle non-uniform sampling of the manifold. The choice of σ_h will depend on the global length scale, L, of the datapoints (see Methods Section 4.3.5), the average density of sampled points, and of course, the desired uncertainty in the estimates of h^k_{ij}. These uncertainties are in turn used to compute a standard error, σ_S, accompanying the scalar curvature estimate at each point, using standard error propagation formulas (see Methods Section 4.3.4). We specify σ_h instead of σ_S as the global parameter for choosing neighborhood sizes, since the latter depends non-linearly on the values of h^k_{ij}(p), which makes determining r(p) more difficult.

Our algorithm also computes a goodness-of-fit (GOF) p-value at each p, by comparing residuals from regression against a normal distribution, to quantify how well the normal coordinates are fit by a quadratic function (see Methods Section 4.3.3). We tested this p-value at significance level α = 0.05, declaring fits to be poor when the residuals are significantly non-Gaussian. The p-value can be disregarded if the neighborhood size is manually specified to be larger than a length scale for which a quadratic fit is appropriate. However, when σ_h is specified instead, a uniform distribution of these p-values over M indicates that the desired uncertainty results in neighborhoods that are well-approximated using quadratic regression. We adopted this heuristic when choosing σ_h for the datasets studied in this paper (see Methods Sections 4.4.3, 4.5.7 and 4.6.4). The software is available at https://gitlab.com/hormozlab/ManifoldCurvature.
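The σ_h rule can be sketched as follows (our illustration, not the released tool's exact procedure in Methods Section 4.3.2: here the neighborhood grows by adding distance-sorted neighbors one at a time, and coefficient standard errors come from the usual least-squares covariance):

```python
import numpy as np

def fit_until_precise(t, nrm, sigma_h, k_min=30):
    """Grow the neighborhood until every quadratic (h) coefficient's standard
    error falls below sigma_h. t: (N_p, d) tangent coordinates and nrm: (N_p,)
    one normal coordinate, both sorted by distance from p; assumes N_p >= k_min."""
    d = t.shape[1]
    iu = np.triu_indices(d)
    for k in range(k_min, len(t) + 1):
        A = np.hstack([np.ones((k, 1)), t[:k], t[:k, iu[0]] * t[:k, iu[1]]])
        coef, *_ = np.linalg.lstsq(A, nrm[:k], rcond=None)
        resid = nrm[:k] - A @ coef
        s2 = resid @ resid / max(k - A.shape[1], 1)          # residual variance
        se = np.sqrt(np.diag(s2 * np.linalg.pinv(A.T @ A)))  # coefficient SEs
        if se[1 + d:].max() < sigma_h:                       # only h-terms matter
            return k, coef
    return len(t), coef   # fall back to the full neighborhood
```

Propagating these coefficient uncertainties to the standard error σ_S of the scalar curvature then follows the first-order formulas referenced in Methods Section 4.3.4.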
We first applied our algorithm to compute scalar curvatures for the same N = 10^4 points uniformly sampled from S^2 for which the intrinsic approach failed (see Figure 2C; Methods Section 4.4.1.1). The algorithm yielded scalar curvature estimates at each point with mean error −0.17 (computed by averaging the difference between the point-wise scalar curvature estimates and the ground-truth value of 2 across all points), using neighborhoods that contained only N_p(r) ≈ 10^2 points. This is already superior to the intrinsic approach, which failed to compute even the average scalar curvature to within ±1 for the same sample size. The non-zero value of the mean error indicates that our estimator is biased. The values of h^k_{ij} are not biased, because they are obtained using regression.

Figure 2: Scalar curvature is accurately estimated using the Second Fundamental Form and the Gauss-Codazzi equation. (A) A hypothetical manifold (shown in grey) from which datapoints are sampled (shown as colored dots). The manifold at any given point p (shown in red) can be decomposed into a tangent space T_M(p) (the cyan plane) and a normal space N_M(p) (the cyan line). Points in the neighborhood around p (shown in green) can be expressed in terms of orthonormal bases for T_M(p) and N_M(p) (see (B) below). (B) The set of points in the neighborhood of p (shown as green dots in (A)) are represented here in local tangent (t_1, t_2) and normal (n_1) coordinates, corresponding to orthonormal bases for T_M(p) and N_M(p) respectively. Coloring corresponds to magnitude in the normal direction. The normal coordinate (n_1) can be locally approximated as a quadratic function (the translucent surface) of the tangent coordinates (t_1, t_2), according to the Second Fundamental Form, h^k_{ij}. (C) Scalar curvatures computed using the extrinsic approach for N = 10^4 points uniformly sampled from the 2-dimensional hollow unit sphere, S^2. The true value is 2 at all points on the manifold. See Methods Section 4.4.1.1. (D) Scalar curvatures (S) computed in (C) are plotted against their associated standard errors (σ_S). Points enclosed by the red lines have a 95% confidence interval (CI), computed as S ± 2σ_S, containing the true value of 2. (E) As in (C) but for N = 10^4 points uniformly sampled from a one-sheet hyperboloid, H^2_2, which is also a 2-dimensional manifold. Due to the radial symmetry of the manifold, the scalar curvature varies only along the z-direction. See Methods Section 4.4.1.2. (F) Scalar curvatures (black) computed in (E), with their associated 95% CIs (shown in grey), plotted as a function of the z-coordinates of the datapoints. The true value is shown as a dashed red line. (G) As in (C) but for N = 10^4 points uniformly sampled from a 2-dimensional ring torus, T^2. T^2 is constructed by revolving a circle parameterized by θ, oriented perpendicular to the xy-plane, through an angle φ around the z-axis. The scalar curvature depends only on the value of θ. See Methods Section 4.4.1.3. (H) Scalar curvatures computed in (G), with their associated 95% CIs, plotted as a function of the θ values of the datapoints. Colors as in (F). (I) Distribution of computed scalar curvatures for N = 10^4 points uniformly sampled from the d-dimensional unit hypersphere, S^d, for d = 2, 3, 5, 7. As with S^2, these manifolds are isotropic and have constant scalar curvature. The true values are shown as dashed red lines. See Methods Section 4.4.1.1.
Even so, the components of the Riemannian curvature tensor, R_{ijkl}, may still be biased, because they are non-linear functions of h^k_{ij}. Note that for S^2, this bias is the same across all datapoints (because of the isotropic nature of the manifold) and therefore results in a systematic under-estimation of scalar curvature (see Figure 2C; Methods Section 4.3.4). We also computed 95% confidence intervals (CIs) for our estimates as S ± 2σ_S, and despite the mean error, 73% of points still reported a 95% CI containing the true value of 2 (see Figure 2D).

We next tested our algorithm on a 2-dimensional manifold with negative scalar curvature, by uniformly sampling N = 10^4 points from the one-sheet hyperboloid, H^2_2 (see Figure 2E; Methods Section 4.4.1.2). Here, 71% of points reported a 95% CI containing the true scalar curvature (see Figure 2F). Lastly, we considered the 2-dimensional ring torus, T^2 (see Figure 2G; Methods Section 4.4.1.3). As a manifold with regions of positive, zero, and negative scalar curvature, T^2 is a useful toy model for understanding more complex 2-dimensional manifolds and for gaining intuition about higher-dimensional manifolds. In 2 dimensions, regions of a manifold with positive scalar curvature (θ = 0, 2π in Figure 2H) are dome-shaped, regions with zero scalar curvature (θ = π/2, 3π/2 in Figure 2H) are planar, and regions with negative scalar curvature (θ = π in Figure 2H) are saddle-shaped. We applied our tool to N = 10^4 points uniformly sampled from T^2 and found that 88% of points reported a 95% CI containing the true scalar curvature (see Figure 2H).

To test the applicability of our algorithm to higher-dimensional manifolds, we uniformly sampled N = 10^4 points from unit hyperspheres, S^d, and found that 90%, 84% and 78% of points reported a 95% CI containing the true scalar curvature for d = 3, 5 and 7 respectively (see Figure 2I; Methods Section 4.4.1.1). The number of terms, h^k_{ij}, in the Second Fundamental Form grows as d^2. For larger d, a greater number of datapoints, and hence larger neighborhoods, are needed for regression, but these are no longer well-approximated by quadratic fits according to our GOF measure. More generally, higher-dimensional manifolds require a higher density of data to estimate scalar curvatures accurately.
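For reference, the ground-truth scalar curvatures used in these comparisons have simple closed forms (standard results stated here for convenience; the paper's exact parameterizations of H^2_2 and T^2 are given in Methods Sections 4.4.1.2 and 4.4.1.3, so the conventions below are illustrative). For the unit d-sphere,

S_{S^d} = d(d - 1),

giving 2, 6, 20 and 42 for d = 2, 3, 5, 7; for a ring torus with tube radius r and central radius R (with θ = 0 on the outermost circle),

S_{T^2}(\theta) = \frac{2 \cos\theta}{r \, (R + r \cos\theta)};

and for the one-sheet hyperboloid of revolution x² + y² − z² = 1,

S_{H}(z) = -\frac{2}{(1 + 2 z^2)^2}.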
We additionally characterized how our algorithm performs when datapoints are non-uniformly sampled (see Figure S2A; Methods Section 4.4.2.1) or convoluted by observational noise (see Figure S2B; Methods Section 4.4.2.2), when the dimension of the ambient space is large (see Figure S2C; Methods Section 4.4.2.3), and when the specified manifold dimension differs from the ground truth (see Figure S2D; Methods Section 4.4.2.4). We found that the algorithm is robust to non-uniform sampling, large ambient dimension and small observational noise, and provides signatures indicating when the manifold dimension may be mis-specified. However, when the noise scale is large, the resulting manifold is no longer trivially related to the noise-free manifold, consistent with existing literature [34, 35, 36, 37], so that scalar curvature cannot be accurately computed. Lastly, we note that since the full Riemannian curvature tensor is computed as an intermediate step in our algorithm, more intricate geometric features in the data can also be analyzed using our tool, though we defer such investigation to future studies.

Taken together, these examples demonstrate the utility of the algorithm in recovering curvature, with specified uncertainties, for manifolds with positive and/or negative scalar curvature. Next, we tested our algorithm on real-world data.

2.3 Curvature of Image Patch Manifold is Consistent with a Noisy Klein Bottle

Pixel intensity values in images of natural scenes are not independently or uniformly distributed. Understanding the statistics of such images is important for designing compression algorithms [38] and for addressing challenges in the field of computer vision such as segmentation [39]. Lee et al. discovered that 3x3-pixel patches extracted from greyscale images of natural scenes, whose pixels have high contrast (i.e. the differences between the intensity values of adjacent pixels in a patch are large), are not uniformly distributed in R^9, but are instead concentrated on a low-dimensional manifold [40]. This is because high-contrast regions in a natural scene usually correspond to the edges of objects in the scene. High-contrast image patches consequently tend to be comprised of gradients, and not simply random speckle. Subsequent work using topological data analysis revealed that after appropriate normalization (which takes image patches from R^9 to S^7 ⊂ R^8, so that the global length scale is L = 1; see Methods Section 4.5.2), dense regions of high-contrast image patches have the same homology as a 2-dimensional manifold called a Klein bottle [21].

A Klein bottle, K^2, is a canonical manifold typically introduced in the context of orientability, where it is often visualized in R^3 (as shown in Figure 3A) to highlight that it is non-orientable. From a topological perspective, K^2 is the manifold parameterized by θ, φ ∈ [0, 2π] as shown in Figure 3B, in which the vertical edges are θ = 0 and θ = π, and the horizontal edges are φ = 0 and φ = 2π. To make a closed surface, the vertical (horizontal) edges are glued together according to the red (blue) arrows in Figure 3B. K^2 is therefore 2π-periodic in φ, since a point corresponding to θ on the bottom horizontal edge (φ = 0) is the same as the point corresponding to θ on the top horizontal edge (φ = 2π). Similarly, a point corresponding to φ on the left vertical edge (θ = 0) is the same as the point corresponding to 2π − φ on the right vertical edge (θ = π). In short, points on K^2 obey the similarity relation (θ, φ) ∼ (θ + π, 2π − φ).

K^2 captures the dominant features in high-contrast image patches because θ can be treated as a parameter controlling rotation and φ as a parameter controlling the relative contribution of linear vs. quadratic gradients (see Figure 3B). An embedding of K^2 into R^9 with an analytical form, k^0, was proposed by Carlsson et al. in [21] to model image patches (see Equation 31 in Methods Section 4.5.3). This embedding takes points from (θ, φ) to image patches in R^9, as shown in Figure 3B.
K2 captures the dominant features in high-contrast image patches because θ can be treated as a parameter controlling rotation and φ as a parameter controlling the relative contribution of linear vs. quadratic gradients (see Figure 3B). An embedding of K2 into R9 with an analytical form, k0, was proposed by Carlsson et al. in [21] to model image patches (see Equation 31 in Methods Section 4.5.3). This embedding takes points from (θ,φ) into 11 .CC-BY-NC-ND 4.0 International licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprintthis version posted January 8, 2021. ; https://doi.org/10.1101/2021.01.08.425885doi: bioRxiv preprint https://doi.org/10.1101/2021.01.08.425885 http://creativecommons.org/licenses/by-nc-nd/4.0/ image patches in R9 as shown in Figure 3B. For example, θ = 0 (θ = π 2 ) corresponds to patches with vertical (horizontal) stripes and φ = π 2 , 3π 2 (φ = 0,π) corresponds to patches with linear (quadratic) gradients. As θ increases, stripes in the image patches are rotated clockwise. As φ increases, image patches oscillate between having quadratic and linear gradients. Importantly, the image patches constructed by this embedding obey the same similarity relation (θ,φ) ∼ (θ+π, 2π−φ) topologically required of a Klein bottle. Whereas Carlsson et al. studied the global topology of image patches using this embedding, here we study their local geometry instead. First, we analytically calculated the scalar curvature of k0 as a function of (θ,φ) as shown in Figure 3C (see Methods Section 4.1). Next, we used our algorithm to compute the scalar curvature on a data manifold of N ≈ 4.2×105 high-contrast 3x3-pixel image patches randomly sampled from the same van Hateren dataset used to propose k0 (see Methods Section 4.5.2). We picked σh so that the distribution of GOF p-values was flat, and fixed this value for all subsequent simulations (see Methods Section 4.5.7). To visualize the results, we associated each image patch to its closest point on k0 (see Methods Section 4.5.4), and plotted the scalar curvatures on the resulting (θ0,φ0) coordinates (see Figure 3D). Most image patches map to φ = π 2 , 3π 2 or θ = 0, π 2 because linear gradients (of any orientation) and quadratic gradients that are vertically or horizontally oriented are the dominant features in the data as previously reported [21, 40]. The scalar curvatures computed for the image patches did not match the analytical scalar curvature of k0 (cf. Figures 3C and 3D). To identify the cause of this discrepancy, we first validated our algorithm by computing scalar curvatures on the set of N ≈ 4.2 × 105 (θ0,φ0) points on k0 associated with the image patches (see Figure 3E); we found close agreement with the analytical calculation (75% of points reported a 95% CI containing the true scalar curvature). Next, observing that the neighborhood sizes used for computing the scalar curvature of image patches were larger than those used for computing the scalar curvature of the associated (θ0,φ0) points (cf. Figures S3A and S3B), we recomputed the scalar curvatures of these (θ0,φ0) points, but now with the same neighborhood sizes used for the image patches. The results agreed with the analytical calculation, but still did not match the scalar curvatures computed for the image patches (see Figure S3C). 
Having ruled out these two possibilities, we hypothesized that the discrepancy was caused by fluctuations in the positions of the image patches with respect to the (θ_0, φ_0) points on the k^0 manifold (real image patches are noisy and the Klein bottle embedding is only an idealization). We found that adding isotropic Gaussian noise of increasing magnitude in R^9 to the set of (θ_0, φ_0) points on k^0 indeed resulted in scalar curvatures that resemble the data (see Figure 3F; Methods Section 4.5.6). The best agreement between the scalar curvatures of the image patches and the noisy (θ_0, φ_0) points was achieved when the magnitude of noise was σ = 0.03.

Figure 3: Scalar curvature computed for image patches is consistent with that of a Klein bottle with added isotropic Gaussian noise. (A) The Klein bottle, K^2, is a 2-dimensional manifold, shown here in R^3. (B) k^0 is an analytical embedding given by Carlsson et al. in [21] relating parameter values θ, φ ∈ [0, 2π] to 3x3-pixel patches of greyscale images (see Equation 31 in Methods Section 4.5.3). θ controls the rotation of stripes in the image patches and φ determines the relative contribution of linear vs. quadratic gradients. Importantly, as shown in the figure, this embedding has boundary conditions consistent with the topology of a Klein bottle (depicted by the blue/red arrows). In particular, the embedding produces image patches that obey the similarity relation (θ, φ) ∼ (θ + π, 2π − φ). Adapted from Figure 6 of [21]. (C) The analytical scalar curvature of k^0 (derived as described in Methods Section 4.1). (D) Scalar curvatures computed for N ≈ 4.2 × 10^5 high-contrast 3x3-pixel patches sampled from the greyscale images in the van Hateren dataset [41] are plotted here as a function of (θ_0, φ_0), the parameter values of the closest point on k^0 associated with each image patch (see Methods Section 4.5.4). (E) Scalar curvatures computed for the set of N ≈ 4.2 × 10^5 closest points on k^0 associated with the image patches. Note the close correspondence with Figure 3C, indicating that our algorithm correctly recapitulates the analytical scalar curvature. (F) As in (E), but after adding isotropic Gaussian noise in R^9 to the set of closest points on k^0 (see Methods Section 4.5.6). Left to right corresponds to increasing levels of noise, σ = 0.007, 0.01, 0.03. (G) The distribution of Euclidean distances in R^8 between each image patch and its closest point on k^0 is shown in blue. The distribution of distances to k^0 after adding Gaussian noise to these closest points on k^0 is also shown. (H) k^1 is the analytical embedding from θ, φ ∈ [0, 2π] to R^9 that minimizes the sum of Euclidean distances from the image patches to the closest point on the embedding (see Methods Section 4.5.5). Each of the N ≈ 4.2 × 10^5 image patches was associated with its closest point on k^1, given by parameter values (θ_1, φ_1) (see Methods Section 4.5.4). Scalar curvatures computed on this set of N ≈ 4.2 × 10^5 points on k^1 are shown.
(I) The same scalar curvatures computed for the image patches and visualized on (θ_0, φ_0) coordinates in (D) are shown here plotted on (θ_1, φ_1) coordinates. (J) Scalar curvatures computed for a densely sampled manifold comprised of the full set of N ≈ 1.3×10^8 high-contrast 3x3-pixel image patches in the van Hateren image dataset (see Methods Section 4.5.2), visualized on (θ_1, φ_1) coordinates.

Notably, in this case, the median Euclidean distance of the noisy (θ_0, φ_0) points to k^0 was 0.132, which is comparable to 0.148, the median Euclidean distance of the image patches to k^0 (see Figure 3G). Furthermore, the neighborhood sizes chosen by our algorithm when σ = 0.03 (see Figure S3A) matched those chosen for the image patches (see Figure S3B).

To find an embedding of the Klein bottle that might better explain the scalar curvature of the image patches without needing to add noise, we incorporated higher-order terms into k^0 (see Methods Section 4.5.3). The coefficients of the higher-order terms were determined by fitting the data, resulting in a new embedding, which we refer to as k^1 (see Methods Section 4.5.5). The median Euclidean distance of the image patches to k^1 was 0.115, versus 0.148 to k^0. As was done for k^0, we associated each image patch with its closest point (θ_1, φ_1) on k^1, and used our algorithm to compute the scalar curvature of these (θ_1, φ_1) points (see Figure 3H). Despite the reduction in the median Euclidean distance of image patches to the embedding, the scalar curvature of k^1 was even less similar to that of the image patches (visualized in Figure 3I on these new (θ_1, φ_1) coordinates for k^1) than was the scalar curvature of k^0; the range of scalar curvature values for k^1 was much larger than for either the image patches or k^0, and the scalar curvature of k^1 fluctuates on smaller length scales.

Lastly, we reasoned that there might be fine-scale scalar curvature fluctuations in the image patches that are masked by the larger neighborhood sizes used to compute scalar curvature for the image patches (see Figure S3B) relative to k^1 (see Figure S3D). To decrease the neighborhood sizes chosen by the algorithm for the same σ_h, we augmented the image patch dataset using the full set of N ≈ 1.3 × 10^8 datapoints from the van Hateren dataset (see Methods Section 4.5.2). This resulted in neighborhood sizes comparable to those determined for k^1 (cf. Figures S3D and S3E), but failed to recapitulate the fine-scale scalar curvature fluctuations observed in k^1 (see Figure 3J). As a sanity check, we confirmed that the scalar curvature of the augmented image patch dataset matched that of the original image patch dataset when computed using the same neighborhood sizes as the latter (see Figure S3F). Therefore, including higher-order terms in the embedding does not yield scalar curvatures that better agree with the data. Taken together, our analysis of curvature suggests that the image patch dataset is best modelled by adding noise to the simplest embedding, k^0.
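To make the Klein bottle model concrete, the sketch below generates 3x3 patches from a k^0-style embedding and numerically checks the similarity relation. The functional form is our illustration of the structure described above (θ rotates the gradient direction; φ mixes a linear profile with a quadratic one); the paper's actual k^0 is Equation 31 in Methods Section 4.5.3, and the pipeline additionally mean-centers and contrast-normalizes patches onto S^7 (Methods Section 4.5.2):

```python
import numpy as np

def patch(theta, phi):
    """3x3 patch from a Klein-bottle-style embedding: a linear gradient
    (weight sin(phi)) plus a quadratic gradient (weight cos(phi)) along the
    direction set by theta. Illustrative form, not the paper's Equation 31."""
    x, y = np.meshgrid([-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0])
    u = np.cos(theta) * x + np.sin(theta) * y   # coordinate along the gradient
    return np.sin(phi) * u + np.cos(phi) * (2 * u**2 - 1)

# Klein-bottle similarity relation: (theta, phi) ~ (theta + pi, 2*pi - phi)
th, ph = 0.7, 2.1
assert np.allclose(patch(th, ph), patch(th + np.pi, 2 * np.pi - ph))
```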
Having applied our algorithm to real-world manifold-valued data that is well-modelled by an analytical embedding, we next turned our attention to scRNAseq datasets, which are generally regarded as low-dimensional manifolds but have no known analytical form.

2.4 scRNAseq Datasets have Non-Trivial Intrinsic Curvature

In scRNAseq datasets, each datapoint corresponds to a cell, and each coordinate to the abundance of a different gene. Here we consider the data manifold after basic preprocessing and linear dimensionality reduction using PCA (see Methods Section 4.6.1). Since many common analyses in the field, such as clustering, visualization, and inference of cell differentiation trajectories, are performed in this reduced space, it is natural to compute curvature in this space as well. We set the ambient dimension, n, to be the number of PCs needed to explain 80% of the variance. The manifold dimension, d, of a scRNAseq dataset is not well-defined and must be chosen heuristically. As a simple heuristic, we specified d as the number of PCs needed to explain 80% of the variance in the ambient space, i.e. 64% of the original variance (we show later that our computations are relatively insensitive to the choice of d; a sketch of this heuristic is given below).

We considered three datasets. The first consists of N ≈ 10^4 peripheral blood mononuclear cells (PBMCs) collected from a healthy human donor [42]. The second is a gastrulation dataset comprised of N ≈ 1.2×10^5 cells pooled from 9 embryonic mice sacrificed at 6-hour intervals from embryonic day 6.5 to 8.5 [43]. The final dataset is a benchmark in the field consisting of N ≈ 1.3 × 10^6 brain cells pooled from 2 embryonic mice sacrificed at embryonic day 18 [44]. Refer to Figures S4A, S5A and S6A for cell type annotations for the three datasets. The PBMC dataset is characteristic of the sample size of current scRNAseq data. The other two are larger than most scRNAseq datasets, and we included them to verify whether geometric features seen in the first dataset could be reproduced for more densely sampled manifolds. For the PBMC, gastrulation and brain datasets, the ambient (manifold) dimensions were determined to be 8, 11 and 9 (3, 3 and 5) respectively, according to the aforementioned heuristic (see Methods Section 4.6.4). For all three datasets, the global length scale happened to be L ≈ 20 (see Methods Section 4.3.5). As before, we picked σ_h for each dataset according to the distribution of GOF p-values (see Figures S4B, S5B and S6B; Methods Section 4.6.4).

We visualized the computed scalar curvatures on standard plots employed in the field (UMAP and t-SNE; shown in Figure 4A,D,G) and observed non-trivial scalar curvature in all three datasets. We found statistically significant correlations between the scalar curvature reported by each point and by its kNNs for k ≤ 250 (ρ_Pearson = 0.58, 0.18 and 0.38 for the PBMC, gastrulation and brain datasets respectively at k = 250, p < 10^−6; see Figures S4C, S5C and S6C), indicating that our algorithm yields scalar curvatures that vary continuously over the data manifolds.
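The dimension heuristic referenced above can be sketched in a few lines (our illustration; the paper's preprocessing details are in Methods Section 4.6.1):

```python
import numpy as np

def pca_dims(X, frac=0.8):
    """Return (n, d): n = number of PCs explaining `frac` of the variance of
    the preprocessed matrix X (cells x genes); d = number of PCs explaining
    `frac` of the variance retained by those n PCs (frac^2 = 64% overall)."""
    var = np.linalg.svd(X - X.mean(axis=0), compute_uv=False) ** 2
    cum = np.cumsum(var) / var.sum()
    n = int(np.searchsorted(cum, frac)) + 1
    cum_n = np.cumsum(var[:n]) / var[:n].sum()
    d = int(np.searchsorted(cum_n, frac)) + 1
    return n, d
```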
By plotting scalar curvatures against their standard errors, σ_S, we verified that regions with non-zero scalar curvature are statistically significant (see Figure 4B,E,H). As a consistency check, we confirmed that the percentage of points with 95% CIs containing the scalar curvatures reported by their respective kNNs (i) decayed with increasing k for k ≤ 250, and (ii) was significantly larger than expected by chance (67%, 72% and 61% for the PBMC, gastrulation and brain datasets respectively at k = 250, p < 0.001; see Figures S4D, S5D and S6D; Methods Section 4.6.3.1).

To rule out the possibility that the localization of non-zero scalar curvature in certain regions of the UMAP/t-SNE plots is an artifact caused by other features of the data that are also localized, we considered several factors. First, we plotted the GOF p-value at each point on UMAP/t-SNE coordinates and noted that poor GOFs were not localized on the data manifolds, let alone to regions of non-zero scalar curvature (see Figures S4B, S5B and S6B). Therefore, the computed scalar curvatures are not due to poor fits. Next, we plotted the neighborhood size, r(p), used for fitting and observed that in some regions, non-zero scalar curvatures seemed to correspond to small r (see Figures S4E, S5E and S6E). Since σ_h is fixed, these regions necessarily have a larger number of neighbors N_p(r) and are hence more dense (see Figures S4F, S5F and S6F). To rule out the possibility that the non-zero scalar curvatures were an artifact of smaller neighborhood size, we recomputed the scalar curvature at three fixed neighborhood sizes (see Figure 4C,F,I), corresponding to the 25, 50, and 75%-ile values of r(p) that arose from setting σ_h (see Figures S4E, S5E and S6E). In general, the scalar curvatures decreased in magnitude as neighborhood sizes increased. However, regions that had statistically significant non-zero scalar curvatures (zero falls outside of the 95% CI) using variable neighborhood sizes also had non-zero scalar curvatures at all three fixed neighborhood sizes. Additionally, statistically significant non-zero scalar curvature also emerged in other parts of the manifolds when using small fixed neighborhood sizes. These regions are therefore curved at small length scales, but do not have a sufficient density of points to resolve curvature to the desired uncertainty σ_h (see Methods Section 4.3.5). This is analogous to the image patch dataset, for which we could resolve scalar curvatures of larger magnitude at a smaller length scale once the dataset was augmented with enough points to attain smaller neighborhood sizes for a fixed σ_h. We also checked how computed scalar curvatures change with density in a toy model with zero scalar curvature. Importantly, we did not observe the artifactual appearance of statistically significant non-zero scalar curvature, either for variable neighborhood sizes chosen by the algorithm to achieve σ_h, or for fixed neighborhood sizes (see Figure S2A; Methods Section 4.4.2.1).
Taken together, although higher density allows us to resolve statistically significant non-zero scalar curvatures in scRNAseq data, these computed scalar curvatures are not an artifact of the smaller neighborhood sizes used in regions with higher density. To ensure that the computed scalar curvatures were not sensitively dependent on the heuristically chosen manifold dimension, d, we also recomputed scalar curvatures for d − 1 and d + 1 and observed similar qualitative results (see Figures S4G, S5G and S6G). Lastly, we verified that the computed scalar curvatures were not correlated with the number of transcripts in each cell (see Figures S4H, S5H and S6H).

Figure 4: scRNAseq datasets have localized regions of non-zero scalar curvature. (A) Scalar curvatures were computed for a scRNAseq dataset with N ≈ 10^4 peripheral blood mononuclear cells (PBMCs) collected from a healthy human donor. The ambient (n) and manifold (d) dimensions were specified to be 8 and 3 respectively, and variable neighborhood sizes were chosen by setting σ_h (see Methods Section 4.6.4). The scalar curvatures are shown here overlaid onto UMAP coordinates, after smoothing the values over k = 250 nearest neighbors in the ambient space. (B) Scatter plot of (unsmoothed) scalar curvatures, S, and associated standard errors, σ_S, for each datapoint in the PBMC dataset. Points enclosed by the red lines reported a 95% CI (S ± 2σ_S) including 0. (C) As in (A) but with scalar curvatures computed using a fixed neighborhood size, r, for all datapoints. The value of r was set to be the 25, 50, and 75-%ile values (left to right) of the neighborhood sizes used in (A) (see Figure S4E). Points for which a neighborhood of size r does not include enough neighbors for regression are not shown. (D-F) As in (A-C) for a mouse gastrulation dataset with N ≈ 1.2 × 10^5, d = 3 and n = 11. (G-I) As in (A-C) for a mouse brain dataset with N ≈ 1.3 × 10^6, d = 5 and n = 9, plotted on t-SNE coordinates.

To confirm the robustness of our results to sampling, we randomly discarded f% of points in the ambient space determined for each dataset, and recomputed scalar curvatures using the same values of n, d and r(p) used for the original dataset. We found that a statistically significant percentage of downsampled points (82% for the PBMC dataset with f = 75, 78% for the gastrulation dataset with f = 75, and 76% for the brain dataset with f = 50; p < 0.001) had a 95% CI containing the scalar curvature reported by the same point for the original dataset (see Figures S4I, S5I and S6I; Methods Section 4.6.3.2).
This suggests that if the datasets were more highly sampled, and scalar curvatures were recomputed using the same neighborhood sizes, the results would reliably fall within the currently reported 95% CIs. Unlike the two other datasets, the brain dataset could not be downsampled to f = 75 while still having at least 75% of points report 95% CIs containing the originally reported scalar curvatures, despite having the most points. This might be because the brain dataset has a larger manifold dimension according to our heuristic, and therefore requires a greater number of terms, h^k_{ij}, to be estimated in the Second Fundamental Form.

For the PBMC dataset, we additionally downsampled the single-cell count matrix by discarding f% of transcripts at random and preprocessing in the same way. We recomputed scalar curvatures for this downsampled dataset with the same n, d and r(p) values used for the original dataset. Here too, we found that for f = 50 (f = 75), 70% (65%) of the downsampled points had a 95% CI containing the originally reported scalar curvature (p < 0.001; see Figure S4J; Methods Section 4.6.3.3). Therefore, the computed scalar curvature is robust to changes in capture efficiency and sequencing depth. Taken together, our computational analysis reveals non-trivial intrinsic geometry in scRNAseq data.

3 Discussion

In this study, we explored two approaches to computing the curvature of data manifolds using tools from the twin branches of differential geometry. Despite the prevalence of the Laplace-Beltrami operator in geometric data analysis [14, 26, 27, 28, 29], an intrinsic approach to computing scalar curvature relying on this operator's eigenvalues was determined to be infeasible for sample sizes of N ≈ 10^4 typical of current scRNAseq datasets. Although methods such as MAGIC [45] and diffusion pseudotime [46] apply the Laplace-Beltrami operator to smooth scRNAseq data and infer cell differentiation trajectories respectively, using information intrinsic to the manifold, our results suggest that the embedding of the manifold in the ambient space provides valuable information necessary for estimating intrinsic curvature. This observation is perhaps implicit in recent tools for estimating the Laplace-Beltrami operator, which first use moving local least-squares to approximate a surface, thereby incorporating information from the ambient space [29].

Certainly, we found that an extrinsic approach, in which the embedding is retained and curvature is determined by local quadratic fitting of datapoints in ambient coordinates, is feasible given the sample size and degree of noise in real-world datasets. To obtain the scalar curvature of data manifolds, our algorithm first computes the full Riemannian curvature tensor. For other applications, this tensor can be used to compute other geometric quantities, such as Ricci curvature, or may itself be of interest. More generally, we focused on intrinsic curvature because we were interested in geometric properties of the manifolds independent of their embeddings.
However, the Second Fundamental Form used in our approach to compute the intrinsic curvature can also be used to obtain all of the information about the extrinsic curvature. Indeed, h^k_ij(p) exactly quantifies the extent to which the manifold deviates in the kth normal direction from the ij-tangent plane at point p.

A key limitation of our algorithm is that the manifold dimension must be specified by the user. We also assumed that the manifold dimension is the same at every point in a dataset. Extending the algorithm to determine the manifold dimension from the data itself, potentially in a position-dependent manner, may prove useful. In addition, there is no inherently correct length scale over which curvature should be computed for a data manifold. Our algorithm chooses a length scale that varies from one part of the data manifold to another according to the density of points, and is tuned to achieve a user-specified level of uncertainty in the computed curvature. For some applications, it might be more sensible to fix a desired length scale for computing the curvature.

As a demonstration of our algorithm, we computed the scalar curvature of image patches, and found that it was consistent with that of a Klein bottle. This observation further validates the claim by Carlsson et al., who showed that image patches have the topology of a Klein bottle [21]. Unlike the Klein bottle parameterization of image patches, however, no definitive analytical form has been established for scRNAseq datasets. Recent work has suggested the use of hyperbolic geometry to model branching cell differentiation trajectories [47], and specific manifolds have been proposed to model reaction networks [48], which may be applicable to scRNAseq data. These proposed manifolds can be validated or improved using knowledge of the intrinsic geometry of scRNAseq datasets. Finally, incorporating information about curvature may provide a more principled approach for developing dimensionality reduction and visualization tools.

4 Methods

4.1 Differential Geometry of Theoretical Manifolds

Here we briefly discuss how to compute the scalar curvature of, and sample from, theoretical manifolds given a parameterization. For a d-dimensional manifold, M, with intrinsic coordinates {x¹, ..., x^d} and embedding in R^n given by f(x¹, ..., x^d), the metric is:

g_{ij} = (∂f^T/∂x^i)(∂f/∂x^j)    (8)

The scalar curvature of M can then be derived analytically in intrinsic coordinates in terms of the metric as

S = g^{ij} ( Γ^k_{ij,k} − Γ^k_{ik,j} + Γ^l_{ij} Γ^k_{kl} − Γ^l_{ik} Γ^k_{jl} )    (9)

where the Γ^i_{jk} are Christoffel symbols given by

Γ^i_{jk} = (g^{il}/2) ( ∂g_{lj}/∂x^k + ∂g_{lk}/∂x^j − ∂g_{jk}/∂x^l )    (10)

and Γ^i_{jk,l} = ∂Γ^i_{jk}/∂x^l. To draw points from M with a_i ≤ x^i ≤ b_i so that the embedded manifold is uniformly sampled in R^n, we use rejection sampling: for paired random variables x ∼ Uniform(a, b) and y ∼ Uniform(0, max √det g), we retain x as a sample point if y ≤ √det g |_x.
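To make the sampling procedure concrete, below is a minimal MATLAB sketch of this rejection sampler for the ring torus of Methods Section 4.4.1.3 (R = 2.5, r = 0.5), for which √det g = r(R + r cos θ); the torus is only an illustrative choice, and N here is an arbitrary sample size.

% Rejection sampling of N points, uniform w.r.t. surface area, from a ring torus.
R = 2.5; r = 0.5; N = 1e4;
sqrtDetG = @(theta) r*(R + r*cos(theta));   % sqrt(det g) for the torus metric
maxG = r*(R + r);                           % upper bound on sqrt(det g)
samples = zeros(N, 3); count = 0;
while count < N
    theta = 2*pi*rand; phi = 2*pi*rand;     % x ~ Uniform over intrinsic coords
    y = maxG*rand;                          % y ~ Uniform(0, max sqrt(det g))
    if y <= sqrtDetG(theta)                 % accept with prob. prop. to sqrt(det g)
        count = count + 1;
        samples(count, :) = [(R + r*cos(theta))*cos(phi), ...
                             (R + r*cos(theta))*sin(phi), r*sin(theta)];
    end
end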
4.2 Details of Intrinsic Approach to Curvature Estimation

Here we explain how we used Equations 2-4 on the simplest of toy manifolds, the noise-free 2-dimensional hollow unit sphere, S², to obtain an estimate of the average scalar curvature. The true scalar curvature is S(p) = 2 ∀p ∈ M. For the remainder of this section, we adopt the convention that symbols with overbars are estimates of the corresponding unaccented quantities.

4.2.1 Approach for S²

Our approach mirrors the treatment in [27], in which heat-traces are fit over various intervals [x₁, x₂] with x₁ ≥ 0, to quadratic polynomials p₂(x) = c₀ + c₁x + c₂x² to estimate the geometric quantities in Equation 2. Here, we constrained the form of p₂(x) for fitting by assuming that (i) the manifold is boundary-less (so that c₁ = c̄₁ = 0 and the second boundary term for c₂ vanishes), (ii) the volume is known (so that c₀ = c̄₀ = 4π), and (iii) the scalar curvature is constant (so that c₂ = (2π/3)S), yielding p₂(x) = 4π + (2π/3) S̄ x². These are strong assumptions that will not hold for an arbitrary manifold, which already precludes this as a generic procedure. Nonetheless, we proceeded for S² to see if, even with this privileged information, the scalar curvature could be estimated accurately. We declared an estimate to be accurate on the interval [x₁, x₂] if S̄ has error within ±0.5, i.e., S̄ ∈ [1.5, 2.5]. All quadratic fits were performed in MATLAB using the lsqnonlin function ('StepTolerance'=1e-3, 'FunctionTolerance'=1e-6).

First, we evaluated z_m(x) using the analytical eigenvalues for S², given by λ_{(ℓ−1)²+1} = · · · = λ_{ℓ²} = ℓ(ℓ − 1) for ℓ > 0, and let D_m be the collection of all intervals for which fits to p₂(x) yielded accurate S̄. D_m corresponds to intervals where Equation 4 is accurate to our desired tolerance when the eigenvalues are known exactly. Next, we uniformly sampled N = 10⁴ points from S² (see Figure 1A; Methods Section 4.4.1.1), estimated ∆_M using the random walk Graph Laplacian with Gaussian kernel (see Equation 15 in Methods Section 4.2.5), and computed empirical eigenvalues, λ̄_k, from ∆̄_M. We selected N = 10⁴ as it is the same order of magnitude as the sample size of current scRNAseq experiments, and is sufficient to identify M as S² by eye (see Figure 1A). We then verified whether estimates z̄_m(x), obtained by evaluating Equation 3 using λ̄_k, when fit as described above to p₂(x) over intervals in D_m, recapitulated the accurate S̄ obtained using z_m(x). We restricted our attention to D_m for calculations using empirical eigenvalues, since it is only over intervals in D_m that it is even theoretically possible to compute scalar curvature to the desired accuracy. Below, we report our findings for different m.

4.2.2 Infinite Series

We first applied this approach to the ideal case in Equation 3, where infinitely many analytical eigenvalues are available. We computed z_∞(x) (shown as a black line in Figure S1A) and obtained S̄ by fitting p₂(x) over various intervals as described above. Figure S1B shows that D_∞ is comprised of intervals with 0 ≤ x₁ < x₂ ≲ 1.15. For x₂ ≳ 1.15, errors from neglecting higher-order terms o(x³) in Equation 4 dominate.
Since z_m(x) converges to z_∞(x) as m → ∞, x₂ ≲ 1.15 necessarily holds for any interval in D_m, for all m.

4.2.3 Truncated Series

We next considered z_m(x) for m < N, since in practice we will only have access to as many eigenvalues as datapoints (N). We computed z₁₀₀₀(x) using Equation 3 (shown as a solid blue line in Figure S1A), and obtained S̄ by fitting p₂(x) (see Figure S1C). Intervals in D₁₀₀₀ roughly satisfy 0.25 ≲ x₁ < x₂ ≲ 1.15. However, we found that z̄₁₀₀₀(x) (shown as a dashed blue line in Figure S1A) deviated markedly from z₁₀₀₀(x) in the rough interval [0.1, 0.75], which has significant overlap with D₁₀₀₀. Consequently, when we fit p₂(x) to z̄₁₀₀₀(x) on D₁₀₀₀, the resulting S̄ was not accurate for any interval in D₁₀₀₀ (see Figure S1D). Note that this inaccuracy was not a consequence of not using all N available eigenvalues. While picking m = N would reduce the lower bound on valid intervals in D_m (since z_m(x) converges to z_∞(x) from below as m increases), it is exactly for small x₁ that S̄ obtained from z̄₁₀₀₀(x) is already over-estimated, as shown in Figure S1D. Since z̄_{m₂}(x) > z̄_{m₁}(x) ∀x when m₂ > m₁, using a truncated series with a larger m would simply exaggerate the difference between z_m(x) and z̄_m(x) for small x, and cause scalar curvatures estimated using the latter to be further over-estimated.

Following this line of thought, we reasoned that picking a smaller number of eigenvalues might ameliorate the issue. We selected m = 49 (instead of a round number like m = 50, so that all eigenvalues of a given multiplicity are included) and repeated this analysis for the same set of N = 10⁴ points. z₄₉(x) is shown as a solid red line in Figure S1A, and the intervals over which fits to p₂(x) yield accurate S̄, D₄₉, are shown in Figure S1E. While z̄₄₉(x) (shown as a dashed red line in Figure S1A) has a much smaller deviation from z₄₉(x) than z̄₁₀₀₀(x) did from z₁₀₀₀(x), once again no estimate of S̄ obtained from fits of z̄₄₉(x) to p₂(x) on D₄₉ was sufficiently accurate (see Figure S1F).
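For concreteness, the following MATLAB sketch reproduces the analytical side of this computation: it evaluates the truncated heat-trace z₄₉(x) from the analytical S² eigenvalues and fits the constrained p₂(x) = 4π + (2π/3) S̄ x²; the fitting interval chosen here is illustrative rather than one of the intervals reported in Figure S1E.

% Truncated heat-trace for S^2 from analytical eigenvalues, and quadratic fit.
m = 49; lambda = zeros(m, 1);
for l = 1:7                                   % eigenvalue l(l-1), multiplicity 2l-1
    lambda((l-1)^2+1 : min(l^2, m)) = l*(l-1);
end
zm = @(x) (4*pi) * x.^2 .* sum(exp(-lambda * x.^2), 1);   % Equation 3 with d = 2
x = linspace(0.5, 1.1, 50);                   % an illustrative interval [x1, x2]
resid = @(S) 4*pi + (2*pi/3)*S*x.^2 - zm(x);  % constrained p2(x) minus heat-trace
Sbar = lsqnonlin(resid, 2, [], [], optimoptions('lsqnonlin', ...
        'StepTolerance', 1e-3, 'FunctionTolerance', 1e-6));
fprintf('Estimated scalar curvature: %.3f (true value 2)\n', Sbar);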
4.2.4 Eigenvalue Convergence

We refrained from reducing m further to improve agreement between z_m(x) and z̄_m(x) after noting that the size of the intervals in D_m shrinks with m. Though we may have a better chance of computing accurate S̄ with z̄_m(x) on D_m for smaller m, recall that in practice we will not have D_m available to us, since the analytical eigenvalues will be unknown. We would therefore simply shift the problem to one of choosing an interval that will yield an accurate S̄, from a shrinking pool of intervals that could even theoretically yield an accurate estimate.

Instead, we compared the estimated λ̄_k with their true values, λ_k, and observed that the former consistently under-estimate the latter (see Figure S1H). Furthermore, we found that the fractional error grows with k, exceeding 60% for k = 37, ..., 49. Therefore, z̄₄₉(x) will only be accurate if N is large enough to limit the fractional error.

To determine the required tolerance on the fractional error, we constructed a truncated series analogous to Equation 3, but with eigenvalues interpolated between the analytical eigenvalues and the empirical eigenvalues determined for N = 10⁴, according to a parameter f:

z̃_m(x; f) = (4π)^{d/2} x^d Σ_{k=1}^{m} e^{−λ̃_k(f) x²},    λ̃_k(f) = λ_k + f (λ̄_k − λ_k)    (11)

f signifies that the fractional error of the interpolated eigenvalues is reduced by 1 − f relative to the empirical eigenvalues determined for N = 10⁴. We found that f ≤ 0.23 is needed so that z̃₄₉(x; f) (shown as a green line in Figure S1A), fit to p₂(x), yields accurate S̄ on half the intervals in D₄₉ (see Figure S1G).

Given that the fractional error in estimating λ₃₇, ..., λ₄₉ by λ̄₃₇, ..., λ̄₄₉ is 60% when N = 10⁴, how large does N have to be to reduce this fractional error to 60% × 0.23 ≈ 14%? A convergence rate for the fractional error is given in Theorem 1 of [30]. For 2-dimensional manifolds:

|λ̄_k − λ_k| / λ_k = O( (log N)^{3/8} / N^{1/4} )    (12)

Assuming that the big-O bound is sharp at N = 10⁴ for k = 37, ..., 49 (i.e., the prefactor is given by 0.6 × 10000^{1/4} / (log 10000)^{3/8} ≈ 2.61), we extrapolated that at least N = 10⁷ datapoints are needed to reduce the fractional error to 14% (see Figure S1H). Equation 12 also applies to empirical eigenvalues of ∆̄_M constructed from weighted kNN and r-neighborhood kernels instead of Gaussian kernels (see Methods Section 4.2.5). However, the prefactor in Equation 12 is actually worse for these estimators, since their empirical eigenvalues have larger fractional errors at N = 10⁴ (see Figure S1H), so that even larger N would be required to attain the desired fractional error. Lastly, note that while we had analytical eigenvalues available with which to ascertain m = 49 as suitable, the naive approach of simply using all available eigenvalues (m = N) would require sample sizes that are even larger by several more orders of magnitude.
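This extrapolation amounts to a one-line root-finding problem; a minimal MATLAB check, using the 0.6 and 0.14 values quoted above and assuming natural logarithms in Equation 12:

% Extrapolate the sample size N needed to reach a 14% fractional eigenvalue error,
% assuming the bound in Equation 12 is sharp at N = 1e4 (60% error).
C = 0.6 * 1e4^(1/4) / log(1e4)^(3/8);            % prefactor, approx. 2.61
fracErr = @(N) C * log(N).^(3/8) ./ N.^(1/4);    % Equation 12 with sharp prefactor
logN = fzero(@(v) fracErr(10.^v) - 0.14, 7);     % solve for log10(N)
fprintf('Required N is approximately 10^%.1f\n', logN);   % about 1e7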
4.2.5 Estimating the Laplace-Beltrami Operator from Data

For N points, {X_i} ∈ R^n, sampled from M, we estimated ∆_M by normalizing the weight matrix W (see below) using the random walk normalization [30, 49]. ∆̄_M constructed using this normalization converges to ∆_M when samples are drawn uniformly from the embedding of M in R^n, as was done in our analysis:

∆̄_M = (4/ε) (I_N − D⁻¹W),    D = diag{W1}    (13)

I_N is the N × N identity matrix, 1 ∈ R^N is a vector of ones, and the kernel width, ε, is set to match that used in Theorem 1 of [30]:

ε = (log N)^{3/8} / N^{1/4}    (14)

Throughout our analysis, we used W = W_g, the weight matrix with entries given by a Gaussian kernel:

[W_g]_{i,j} = exp(−‖X_i − X_j‖₂² / ε) − δ_{i,j}    (15)

To check whether other estimators had more benign prefactors for eigenvalue convergence (see Figure S1H), we also considered the weighted kNN kernel, W_kNN, and the r-neighborhood kernel, W_r, with r = ε [50]:

[W_kNN]_{i,j} = [W_g]_{i,j} [ 1_{kNN(j)}(i) OR 1_{kNN(i)}(j) ],    [W_r]_{i,j} = 1_{B_{X_i}(r)}(X_j) − δ_{i,j}    (16)

kNN(i) is the set of indices of the k-nearest neighbors of point i in R^n, B_{X_i}(r) is the n-dimensional ball of radius r centred at X_i, and 1_A(x) is the indicator function for x ∈ A.

4.3 Details of Extrinsic Approach to Curvature Estimation

4.3.1 Quadratic Regression on Local Neighborhoods of Data

Here we describe the regression model for computing the coefficients of the Second Fundamental Form, h^k_ij, at a particular point p. As described in the main text, after performing PCA on a neighborhood of N_p points around p in R^n, each point in the neighborhood can be described in terms of d tangent coordinates, t_i, and n − d normal coordinates, n_k. We defer discussion of how the neighborhood is selected to Methods Section 4.3.2. The n_k are treated as dependent variables that can be modelled as quadratic functions of the t_i, which are taken to be independent variables (see Equation 17 below). Linear terms are excluded since they ought to have zero coefficients in the tangent basis. Constant terms, C_k, are included to account for affine shifts. Since h^k_ij = h^k_ji according to Equation 5, in practice we only consider t_i t_j and h^k_ij for j ≥ i, so that t and h in Equation 17 have linearly independent columns, though we write the full form here for simplicity:

n = th + E    (17)

where the rows of n hold the normal coordinates of each of the N_p neighborhood points, n^(i) = (n₁^(i), ..., n_{n−d}^(i)); the rows of the design matrix t hold an intercept followed by the pairwise products of tangent coordinates, t^(i) = (1, t₁^(i)t₁^(i), ..., t₁^(i)t_d^(i), t₂^(i)t₁^(i), ..., t_d^(i)t_d^(i)); the columns of h hold the corresponding coefficients for each normal direction, (C_k, h^k_{1,1}, ..., h^k_{1,d}, h^k_{2,1}, ..., h^k_{d,d})^T; and the rows of E hold the residuals, ε^(i) = (ε₁^(i), ..., ε_{n−d}^(i)).

Regression yields the following least-squares solution:

ĥ = (t^T t)⁻¹ t^T n,    Σ_ε = (n − tĥ)^T (n − tĥ) / N_p,    Σ_h = Σ_ε ⊗ (t^T t)⁻¹    (18)

where ĥ is the matrix of estimates of the Second Fundamental Form, Σ_ε is the estimated covariance structure of the residuals so that ε^(i) ∼ N(0, Σ_ε), and Σ_h is the covariance matrix for ĥ. ⊗ denotes the Kronecker product. We used the mvregress function in MATLAB to perform this regression in our code.
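As an illustration, the following MATLAB sketch performs this regression for a single neighborhood using ordinary least squares, which is equivalent to Equation 18; here nbhd (an N_p × n matrix of ambient coordinates for the points near p) and the manifold dimension d are hypothetical inputs.

% Estimate Second Fundamental Form coefficients for one neighborhood (sketch).
% nbhd : Np x n ambient coordinates of points near p (hypothetical input)
[Np, n] = size(nbhd);
[~, score] = pca(nbhd);                 % rotate into local PCA coordinates
tc = score(:, 1:d);                     % d tangent coordinates
nc = score(:, d+1:n);                   % n - d normal coordinates
% Design matrix: intercept plus quadratic terms t_i t_j with j >= i
t = ones(Np, 1);
for i = 1:d
    for j = i:d
        t = [t, tc(:,i) .* tc(:,j)];    %#ok<AGROW>
    end
end
h = t \ nc;                             % least-squares solution (Equation 18)
res = nc - t*h;                         % residuals
SigmaEps = (res' * res) / Np;           % residual covariance, Sigma_eps
SigmaH = kron(SigmaEps, inv(t' * t));   % coefficient covariance, Sigma_h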
When datapoints are sampled exactly from an analytical manifold, Σ_ε measures the contribution of higher-order terms. In the limit of infinite sampling and infinitesimally small neighborhoods, Σ_ε → 0. When observational noise is present (discussed in Methods Section 4.4.2.2), Σ_ε also depends on the magnitude of the noise (σ in Equation 28).

4.3.2 Selecting Local Neighborhoods for Regression

Here we describe the procedure for selecting a neighborhood around each point p for computing the Second Fundamental Form. We adopt the simplest approach of selecting the neighborhood to be a ball of radius r centred at p, B_p(r). If r(p) is not specified, we set it according to statistical rather than geometric principles, since the geometry of the manifold may be non-trivial and unknown a priori. Specifically, we set r(p) so that the elements in the covariance matrix, Σ_h, are upper-bounded by σ_h², the square of the specified target uncertainty. The largest elements in Σ_h are the variance terms on the main diagonal, corresponding to the squares of the standard errors, σ_{h^k_ij}, for the coefficients h^k_ij. By inspection of Equation 18:

σ²_{h^k_ij} = [diag Σ_ε]_k [diag (t^T t)⁻¹]_{(ij)}    (19)

where [diag Σ_ε]_k is the diagonal entry of Σ_ε corresponding to the kth normal direction, and [diag (t^T t)⁻¹]_{(ij)} is the diagonal entry in (t^T t)⁻¹ for which the corresponding entry in t^T t is ∼ Σ_l (t_i^(l) t_j^(l))². Increasing r(p) monotonically increases both N_p(r), the number of points in B_p(r), and the average magnitude of elements in t, both of which reduce σ_{h^k_ij}. To avoid sweeping r(p) to find the minimum value such that max σ_{h^k_ij} < σ_h, which is computationally expensive, for each point we instead model the dependence of N_p(r) on r as

N_p(r) ∼ r^{d′}    (20)

so that

σ²_{h^k_ij} ∼ 1 / r^{d′+4}    (21)

To determine d′, N_p(r) is counted at 10 log-spaced distances, r_i, and a line is fit to the (log r_i, log N_p(r_i)) pairs for i ∈ {2, ..., 8}. r₁ is set to be the distance from p to the (d(d+1)/2 + 1)-closest point to p (the minimum number of points needed for regression). r₁₀ is set to be the distance from p to the furthest point from p. To solve for r, we first guess r_g = r₁, perform regression on the set of points in B_p(r_g), and assign σ_g² to be the largest diagonal entry in Σ_h. If |σ_g/σ_h − 1| is within a desired tolerance, we set r = r_g; otherwise, we update r_g as shown below and iterate to convergence:

r_g ← r_g (σ_g / σ_h)^{2/(d′+4)}    (22)
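A minimal MATLAB sketch of this iteration is shown below; sigmaG (a function returning the largest standard error from the regression of Methods Section 4.3.1 on B_p(r)), dPrime (the exponent from the log-log fit of Equation 20), sigmaH (the target uncertainty) and r1 are all hypothetical inputs standing in for the quantities defined above.

% Iteratively choose the neighborhood radius achieving the target uncertainty.
rg = r1;                                  % initial guess: minimal viable radius
tol = 0.05;                               % illustrative convergence tolerance
for iter = 1:50
    sg = sigmaG(rg);                      % regression on Bp(rg) -> max sigma_{h^k_ij}
    if abs(sg/sigmaH - 1) < tol
        break;                            % within tolerance of target sigma_h
    end
    rg = rg * (sg/sigmaH)^(2/(dPrime+4)); % Equation 22 update
end
r = rg;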
For large datasets, we speed up computation by only selecting r in this manner for a subset of N_calib ≤ N randomly selected calibration points. All datapoints in the Voronoi cell of each calibration point are then assigned the same r as the calibration point. Unless otherwise specified, N_calib = N.

4.3.3 Goodness-of-Fit Test for Quadratic Regression

For a fixed density of points, there is a fundamental trade-off between reducing uncertainty in the h^k_ij and the validity of approximating local neighborhoods with quadratic fits. To reduce σ_h, more points must be included in the fit, but a larger neighborhood may not be well-modelled by only quadratic terms. Conversely, d(d+1)/2 + 1 points are sufficient to perform the regression, but there is then large uncertainty in the estimate of h^k_ij. Since our approach is to choose a neighborhood size to achieve a target σ_h, we include a companion goodness-of-fit (GOF) statistic measuring how well the neighborhood is fit by a quadratic. Namely, we use Mardia's test on the residuals from regression (ε^(i) in Equation 17), which yields a p-value for the null hypothesis that the residuals are normally distributed [51]. When the p-values are small, the quadratic regression model is unlikely to be valid. In this case, curvatures computed using the resulting h^k_ij may be suspect regardless of the tightness of the errorbars, and the user may want to consider increasing σ_h to reduce the neighborhood size. However, a poor GOF may not be of concern if the length scale of interest is larger than the fluctuations in the manifold which give rise to the non-Gaussian residuals (see Methods Section 4.3.5). Note that Mardia's test is relatively weak, since it may yield false negatives for heteroskedastic residuals. This GOF measure is therefore only provided as a computationally cheap consistency check. Ideally, the density of sampled points is sufficiently high to (i) permit small σ_h and (ii) produce GOF p-values that are uniformly distributed (consistent with the null model) and spatially uncorrelated.

4.3.4 Standard Error and Bias of Scalar Curvature Estimate

Here we discuss how we compute the standard error, σ_S, of the estimate for S, and note sources of estimator bias. Since the Riemannian curvature tensor in Equation 6 is a bilinear form and the tensor contraction in Equation 7 is a straightforward sum, σ_S can be computed using simple error propagation formulas in terms of the uncertainties from regression. Specifically, the standard error we report is the first-order approximation to the second moment of a function of random variables:

σ_S = √(J^T Σ_h J)    (23)

where J = ∂S/∂h^k_ij |_ĥ.

It is important to note that our estimate for S is biased and not normally distributed. First, the h^k_ij are only normally distributed when the residuals (ε^(i) in Equation 17) themselves are normally distributed. Second, even when the h^k_ij are normally distributed, our estimate of S will not be, due to its bilinear dependence on h^k_ij. Lastly, estimates for S can be biased in a manifold-dependent and even position-dependent way. For instance, the analytical scalar curvature of S² embedded in R³ is given by S = 8(h¹₁₁h¹₂₂ − h¹₁₂h¹₂₁), with h¹₁₁ = h¹₂₂ = 1/2 and h¹₁₂ = h¹₂₁ = 0.
Numerically, however, the symmetric off-diagonal terms will never be exactly 0, so S will be systematically under-estimated. This is apparent in the left tail of the blue histogram in Figure 2I. In our experience, adding isotropic noise of small magnitude tends to remove the skew, presumably because the residuals then more closely match the regression assumptions (see for example Figure S2B, where the left tail disappears for σ = 0.001). Furthermore, in our examples, we observed that computed scalar curvatures were less biased when the ambient and/or manifold dimensions were large. We speculate that this is because the increased number of terms (with alternating signs) in Equations 6 and 7 leads to cancellation of errors, which is likely why the accuracy of computed scalar curvatures was higher for S³, S⁵ and S⁷ than for S², and the distribution of scalar curvatures less skewed (see Figure 2I).

4.3.5 Note on Length Scales

Here we make three remarks regarding length scales, relevant both for considering curvature theoretically and for applying our algorithm.

First, note that scalar curvature has units of inverse length squared. Therefore, scaling all the coordinates of the points on a manifold by a factor L changes the scalar curvature at all points by L⁻². Thus, it is always important to contextualize the scalar curvature in terms of the global length scale associated with the manifold. For example, the scalar curvature of S^d with radius R is S(p) = d(d−1)/R² ∀p ∈ M (here L = R). In the case of the toy models shown in Figure 2, the global length scale is L ≈ 1 (see Methods Section 4.4.1). For the image patch dataset, a normalization is applied which places all patches on S⁷ (see Methods Section 4.5.2), so that the global scale is again L = 1. For scRNAseq data, we computed scalar curvature on the datapoints after preprocessing (see Methods Section 4.6.1), without imposing any additional scaling correction to achieve a standardized global length scale. Since other custom analyses also use these same boilerplate preprocessing steps, computing scalar curvatures in the context of the global length scale of the preprocessed data is sensible. For all three scRNAseq datasets, the global length scale happened to be L ≈ 20 (see Methods Section 4.6.4).

Second, since h^k_ij is a dimension-ful quantity (which scales as L⁻¹), to keep the ratio of σ_S to S fixed when all coordinates are scaled by L, σ_h needs to be scaled by L⁻¹.

Lastly, we note that our choice of σ_h sets local length scales that are statistically rather than geometrically informed: neighborhoods are chosen to upper-bound the uncertainty in estimates obtained from regression. This length scale can also be understood in terms of a bias-variance trade-off. Large length scales reduce variance, but may introduce a bias if the resulting neighborhoods are larger than features on the manifold. This manifests as poor GOFs and can be corrected by finer sampling.
However, for manifolds with features at different length scales (such as a golf ball, which can be treated as dimples superimposed on S²), neighborhoods chosen by this heuristic can also be much smaller than the feature of interest, so that fine-scale curvature fluctuations are detected (dimples) while coarser features are neglected (S²). Regardless, we default to this statistical approach because, in general, the length scale of relevant features on a data manifold will not be uniform across the manifold or known a priori. However, we also provide the ability to manually set position-dependent r(p) in the software, to facilitate ad hoc computation of curvatures at any length scale of interest.

4.4 Details of Toy Manifold Curvature Computations

4.4.1 Analytical Forms

Here we provide analytical forms for the toy manifolds shown in Figures 2 and S2.

4.4.1.1 Hypersphere The d-dimensional unit hypersphere, S^d, has intrinsic coordinates θ₁ ∈ [0, 2π], θ₂, ..., θ_d ∈ [−π/2, π/2] and ambient coordinates in R^{d+1} given by:

x_i = ∏_{j=1}^{d} cos θ_j for i = 1;    x_i = sin θ_{i−1} ∏_{j=i}^{d} cos θ_j for 1 < i ≤ d + 1    (24)

Using the relations in Methods Section 4.1, the scalar curvature is given by S(p) = d(d − 1) ∀p ∈ M. To draw uniform samples from S^d, instead of applying rejection sampling on these intrinsic coordinates as described in Methods Section 4.1, it is more straightforward to let x_i ∼ N(0, 1) and scale the resulting vector (x₁, ..., x_{d+1}) to have unit norm (see the sketch at the end of this section).

4.4.1.2 One-Sheet Hyperboloid The one-sheet hyperboloid, H²₂, has intrinsic coordinates θ ∈ [0, 2π], u ∈ R and ambient coordinates in R³ given by:

x = a cos θ √(u² + 1),    y = b sin θ √(u² + 1),    z = cu    (25)

For Figure 2E,F, we used a = b = 2 and c = 1. Using the relations in Methods Section 4.1, the scalar curvature is given by S(z) = −2/(5z² + 1)². To avoid edge effects in the z-direction, we constrained u ∈ [−2, 2], and sampled points as described in Methods Section 4.1 until a subset of N = 10⁴ had u ∈ [−1, 1]. Scalar curvature was computed and visualized for these N = 10⁴ points.

4.4.1.3 Ring Torus The 2-dimensional ring torus, T², has intrinsic coordinates θ, φ ∈ [0, 2π] and ambient coordinates in R³ given by:

x = (R + r cos θ) cos φ,    y = (R + r cos θ) sin φ,    z = r sin θ    (26)

For Figure 2G,H, we used R = 2.5 and r = 0.5. Using the relations in Methods Section 4.1, the scalar curvature is given by S(θ) = 8 cos θ / (5 + cos θ).

4.4.1.4 Hypercube The m-dimensional cube of side length r, D^m_r, has intrinsic coordinates z₁, ..., z_m ∈ [−r/2, r/2], and ambient coordinates in R^n for n ≥ m given by:

x_i = z_i for 1 ≤ i ≤ m;    x_i = 0 for m < i ≤ n    (27)

Using the relations in Methods Section 4.1, the scalar curvature is given by S(p) = 0 ∀p ∈ M.
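As referenced in Section 4.4.1.1, a minimal MATLAB sketch of uniform sampling from S^d via normalized Gaussian draws (d and N are illustrative values):

% Uniform sampling from the unit hypersphere S^d embedded in R^(d+1).
d = 2; N = 1e4;
X = randn(N, d+1);                 % isotropic Gaussian draws in R^(d+1)
X = X ./ vecnorm(X, 2, 2);         % scale each row to unit norm
% Rotational invariance of the Gaussian makes these draws uniform on S^d.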
4.4.2 Practical Issues for Curvature Estimation on Real-World Datasets

For real-world data, small sample size is only one of the potential confounders for accurately estimating curvature. Here, we report how our algorithm fares when four other real-world confounders are applied to toy manifolds: non-uniform sampling, observational noise, large ambient dimension n, and uncertainty in the manifold dimension d.

4.4.2.1 Non-Uniform Sampling We expect our approach to handle non-uniform sampling of the manifold gracefully: smaller (larger) neighborhoods will be used on densely (sparsely) sampled portions of the manifold to encapsulate the number of points needed to achieve σ_h. To computationally verify the robustness of our tool to non-uniform sampling, we constructed a toy model to roughly match the (n, d, L) parameters of the scRNAseq datasets explored in this paper, for which non-zero scalar curvatures seemed to appear at smaller length scales/higher densities. Specifically, we wanted to verify that non-zero scalar curvatures do not appear artifactually at specific length scales due to sharp changes in the local density of points sampled from a flat manifold. To this end, we formed a dataset with a sparse periphery and a dense core by uniformly sampling N₁ = 10⁴ points from D³₁₀ to establish a background density equal to 10 points per unit volume, and N₂ = 10³ points from D³₁ to create a core density roughly equal to 10³ points per unit volume (see Methods Section 4.4.1.4). We embedded these points in R¹¹ by adding isotropic Gaussian noise with σ = 0.01 to the eight normal directions, for all datapoints. We computed scalar curvature on this dataset for a fixed σ_h (see Methods Section 4.4.3) and found no significant deviation from the true value of zero in either the sparse or the dense regions (see Figure S2A). We next computed scalar curvatures at three fixed length scales corresponding to the 5th, 50th and 95th percentile r values obtained using the specified σ_h (r = 0.54, 0.90 and 1.22 respectively), and again saw no deviation from zero scalar curvature for points in either the sparse or the dense region (see Figure S2A). We repeated this analysis for N₂ = 10⁴ and again saw no deviation from zero scalar curvature, regardless of whether variable neighborhood sizes or fixed length scales (r = 0.37, 0.63 and 1.42, corresponding to the same percentiles) were used (see Figure S2A).

4.4.2.2 Observational Noise Every ambient coordinate can be considered a measured observable with its own observational noise. Assuming each observable is distorted by independent, isotropic Gaussian noise with variance σ² (sometimes referred to as convolutional noise [37]), datapoints X ∈ R^n sampled from an embedded manifold M are modelled by:

X = x + N(0, σI),    x ∈ M    (28)

To study the sensitivity of our algorithm to noise, we uniformly sampled N = 10⁴ datapoints from S² ⊂ R³, added convolutional noise with σ ranging over several orders of magnitude, and estimated scalar curvatures using a fixed σ_h (see Methods Section 4.4.3). For small σ, the distribution of scalar curvatures was centred
on the true value of 2, but once σ became large (≈ 10% of S²'s radius), the estimated scalar curvatures approached 0 (see Figure S2B). Noise in the regression context does not change the expectation value of any estimated parameter. The apparent flattening that is observed therefore indicates that X (obtained by convolving M with noise) has a geometry that is not trivially related to that of M. Certainly for σ ≈ 1, X does not even preserve the topology of M as S². From a practical perspective, it suffices to say that small convolutional noise can be handled by simple quadratic regression, while large convolutional noise obfuscates the original manifold. These observations are consistent with literature defining a manifold's reach [34, 35], a noise scale beyond which noisy samples cannot be uniquely associated to a point on the noise-free manifold. When σ exceeds the manifold's reach, the relationship between the empirical density of sampled points and the original manifold is non-trivial, even for a relatively forgiving model of manifold-orthogonal noise. The ridge manifold [36, 37] of an empirical density has also been defined as an alternative to the unwieldy task of deconvolving noisy samples to recover a noise-free manifold. This definition avoids the notion of a noise-free manifold altogether, and instead defines manifolds as ridges: contours along which the empirical density of points is maximized.

4.4.2.3 Large Ambient Dimension A high-dimensional dataset may have an ambient space comprised of tens of thousands of observables, i.e., n is very large. Meanwhile, the underlying manifold dimension, d, may be small. Since convolutional noise occurs in n dimensions, will a low-dimensional manifold still be discernible? To explore this, we uniformly sampled N = 10⁴ datapoints from S² ⊂ R³, embedded these points in R^n for a range of n up to 100, and added convolutional noise of magnitude σ = 0.01, 0.03, and 0.05 in the n-dimensional ambient space. We computed curvatures for all combinations of n and σ using a fixed σ_h (see Methods Section 4.4.3). As n or σ increased, the algorithmically chosen neighborhood sizes, r(p), expanded to include enough datapoints to maintain the desired σ_h. The distribution of estimated scalar curvatures (shown in Figure S2C) is centred on the true value of 2 for n < 80 and σ ≤ 0.05. However, we observed that r was far less sensitive to changes in n than to changes in σ. For example, increasing n from 3 to 100 at σ = 0.01 and tripling σ from 0.01 to 0.03 at n = 3 required comparable increases in r (see Figure S2C). Therefore, consistent with the results of Methods Section 4.4.2.2, as long as the noise scale σ is small, a large ambient dimension n is not a confounder. Practically, however, to reduce computational overhead and avoid the large-n-and-σ case, it is still helpful to reduce the ambient dimension by projecting datapoints onto an affine subspace containing the manifold (e.g., by PCA). Such a transformation does not change the intrinsic curvature.
4.4.2.4 Choice of Manifold Dimension The last practical consideration is accurate selection of the manifold dimension, d, which we have so far assumed to be known. There is no consensus on the definition of d for a dataset, so various disciplines have devised different heuristics to determine d in a data-driven fashion [52]. From the regression perspective, any d > 0 corresponds to a well-defined regression problem. The choice of d merely determines how local coordinates are partitioned into independent (tangent) and dependent (normal) variables. However, in our algorithm we noticed that some choices of d result in excessively large r(p) for a fixed σ_h. We explored this further using two toy manifolds and discovered a signature indicating that the specified manifold dimension may be incorrect.

The manifolds considered were S³ ⊂ R⁵ convolved with isotropic Gaussian noise of σ = 0.01, and S² × S² ⊂ R⁶, for which the true manifold dimension, d*, is d* = 3 and d* = 4 respectively. We uniformly sampled N = 10⁴ points from each manifold and estimated scalar curvatures by holding σ_h fixed for different d (see Methods Section 4.4.3). For both manifolds, the average neighborhood size, r̄, was much larger for d > d* and d < d* than for d = d* (see Figure S2D). In the case of S³, for d < d*, the average neighborhood size was even larger than the global length scale, L, of the manifold. Since neighborhood sizes are chosen to achieve a target σ_h, manually decreasing r(p) is counter-productive and simply increases the uncertainty from regression above σ_h.

The large neighborhood sizes that emerged for both d > d* and d < d* can be understood in terms of the mis-assignment of normal vectors to the tangent space, or vice versa. According to Equation 19, σ_{h^k_ij} increases with large variation in the normal direction ([diag Σ_ε]_k), or with small variation in the tangent directions ([diag (t^T t)⁻¹]_{(ij)}). When we choose d > d*, we mis-attribute a normal direction with small variation [diag Σ_ε]_k as an independent variable, whereas variation along the true tangent space is ≫ [diag Σ_ε]_k. r must therefore be increased to compensate for the lack of variation along this direction mis-classified as tangent. When d < d*, we have spuriously assigned a tangent direction with large variation to be a normal direction. Since this spurious normal coordinate cannot be well-approximated as a function of the tangent coordinates, from which it is linearly independent, the perceived noise scale ([diag Σ_ε]_k) is exaggerated, so that a larger neighborhood is needed to attain σ_h.

This suggests a crude, operational definition of what constitutes an incorrect choice of d. When σ_{h^k_ij} is large relative to the uncertainty in other coefficients, there is either too little variation along the ith and jth tangent directions, or too much variation along the kth normal direction. In the former case, the ith or jth tangent direction might be more appropriately classified as a normal direction (d is too large and should be decreased), while in the latter case, the kth normal direction might be more appropriately classified as a tangent direction (d is too small and should be increased).
When this criterion is applied point-wise, there may be a different acceptable choice of d for different parts of the manifold. When this criterion is generalized over the entire manifold, a σ_h yielding a flat distribution of GOF p-values when the manifold dimension is specified to be d will also yield a flat distribution for d + 1, but not necessarily for d − 1: if residuals in n − d dimensions are well-modelled by a multivariate Gaussian, so too will be residuals in n − d − 1 dimensions, but not necessarily residuals in n − d + 1 dimensions (see Figure S2D). Our observations are consistent with manifolds in the literature that admit multiple possible manifold dimensions (like the helix manifold in [36]), which could generally arise from non-isotropic noise or non-uniform sampling.

4.4.3 Parameters for Curvature Estimation

For each manifold in Figure 2, we chose σ_h so that the fraction of points with GOF p-value ≤ α = 0.05 most closely matched the null model of normally distributed residuals, consistent with neighborhood sizes well-approximated by quadratic regression (see Methods Section 4.3.3). σ_h = (0.017, 0.020, 0.028, 0.055, 0.022, 0.030) for (S², S³, S⁵, S⁷, H²₂, T²) resulted in (7.4, 3.5, 1.4, 2.8, 7.4, 4.0)% of points having GOF p-values ≤ α = 0.05. Theoretically, max |h^k_ij| = (0.5, 2, 2.5) for (S^d, H²₂, T²), so our choices for σ_h result in small fractional errors in all cases. For Figure S2A, we set σ_h = (0.02, 0.01) for N₂ = (10³, 10⁴) respectively, which resulted in (1.6, 2.5)% of points having GOF p-values ≤ α = 0.05. For all other panels in Figure S2, where we were interested in ascertaining the sensitivity to different confounders rather than minimizing uncertainty per se, we used a fixed value of σ_h = 0.05. This choice resulted in neighborhoods small enough to be well-approximated by quadratic regression, manifesting as a roughly uniform distribution of GOF p-values in all cases.

4.5 Details of Image Patch Dataset and Klein Bottle Manifolds

4.5.1 Notation and Preliminaries

First we introduce some notation needed to describe the image patch dataset. We refer readers to [21, 40] for a more detailed exposition. Let P be the space of all bivariate polynomials p : R × R → R; h : P → R⁹ the vectorization operator given by h(p) = [p(−1, 1), p(−1, 0), p(−1, −1), p(0, 1), p(0, 0), p(0, −1), p(1, 1), p(1, 0), p(1, −1)]^T; u : R^m → S^{m−1} the normalization operator given by u(v) = v/‖v‖₂; and c : R⁹ → R⁸ the projection operator given by c(y) = ΛA^T y, where A = [e₁ ... e₈], Λ = diag{1/‖e₁‖₂², ..., 1/‖e₈‖₂²}, and {e_i}
are vectorized basis vectors for the 2-dimensional discrete cosine transform (DCT) applied to 3×3 patches:

e₁ = [1, 0, −1, 1, 0, −1, 1, 0, −1]^T / √6
e₂ = [1, 1, 1, 0, 0, 0, −1, −1, −1]^T / √6
e₃ = [1, −2, 1, 1, −2, 1, 1, −2, 1]^T / √54
e₄ = [1, 1, 1, −2, −2, −2, 1, 1, 1]^T / √54
e₅ = [1, 0, −1, 0, 0, 0, −1, 0, 1]^T / √8
e₆ = [1, 0, −1, −2, 0, 2, 1, 0, −1]^T / √48
e₇ = [1, −2, 1, 0, 0, 0, −1, 2, −1]^T / √48
e₈ = [1, −2, 1, −2, 4, −2, 1, −2, 1]^T / √216    (29)

By inspection, e₁ is the basis vector for patches with horizontal stripes and linear gradients, e₂ for patches with vertical stripes and linear gradients, e₃ for patches with horizontal stripes and quadratic gradients, e₄ for patches with vertical stripes and quadratic gradients, and e₅ for diagonally-oriented patches with quadratic gradients. All the patches produced by the embedding k⁰ in Equation 31 below and visualized in Figure 3B can be written as a linear combination of these 5 basis vectors. Next, note that the components in each e_i sum to 0, so that the projection operator, c, additionally serves to remove the mean. Finally, observe that the vector norm formed under D = AΛ²A^T (referred to hereafter as the D-norm, following [40]) measures the contrast in a 3×3 patch, since

‖v‖_D = √(v^T D v) = (1/2) √( Σ_i Σ_{j∼i} (v_i − v_j)² )    (30)

where j ∼ i refers to all vertical and horizontal neighbors, j, of a pixel i in the preimage of v under h. The e_i are normalized so that ‖e_i‖_D = 1.

4.5.2 Image Dataset

We used the same van Hateren IML dataset [41], consisting of 4167 greyscale images of size 1532×1020 pixels, studied by Carlsson et al. in [21], and followed the same preprocessing steps used there. In short, we applied a log1p transformation to all pixel values and randomly sampled 5 × 10³ (possibly overlapping) 3×3 patches from each image. We indexed the pixels in each patch using standard Cartesian coordinates with the middle pixel as the origin, so that log-transformed pixel values are given by p(x, y), x ∈ {−1, 0, 1}, y ∈ {−1, 0, 1}. We then applied h to vectorize each patch p, and retained the high-contrast patches comprising the top quintile of D-norms for each image, resulting in N ≈ 4.2 × 10⁶ datapoints. Next, we normalized these high-contrast vectorized patches using the composition u ∘ c, resulting in a set of datapoints on S⁷ ⊂ R⁸. We determined the density of these datapoints in R⁸ using the kNN density estimator with k = 100, and retained the densest decile, which yielded N ≈ 4.2 × 10⁵ datapoints. This dense subset of high-contrast normalized patches was found using topological data analysis in [21] to be a Klein bottle, K² ⊂ S⁷, and is studied in Figures 3D,I and S3B.

To generate the augmented image patch dataset used in Figures 3J and S3E,F, we first considered all N ≈ 1.3 × 10⁹ vectorized high-contrast patches in the van Hateren IML dataset, obtained using the same procedure described above (each of the 4167 images yields 1530 × 1018 patches, of which the top 20% by D-norm are retained per image).
These were normalized by u ∘ c as before to place them on S⁷ ⊂ R⁸. We again wanted to retain the densest decile of points, since only these have the topology of a Klein bottle. Mirroring the approach in [21], where the k used in the kNN estimator was scaled with sample size, the k = 10² used for N ≈ 4.2 × 10⁶ corresponds to k = 10² × (1.3 × 10⁹)/(4.2 × 10⁶) ≈ 3 × 10⁴ for N ≈ 1.3 × 10⁹. Computing k ≈ 3 × 10⁴ neighbors for all N ≈ 1.3 × 10⁹ points is prohibitive, however. To determine a reasonable smaller value of k, we randomly selected 2 × 10⁴ points from the set of N ≈ 1.3 × 10⁹ on which to compare estimators, and found that 90% of points in the densest decile as computed with k = 3 × 10⁴ also appeared in the densest decile computed using k = 6 × 10². We therefore used the latter value for density estimation and retained the N ≈ 1.3 × 10⁸ datapoints comprising the densest decile.

4.5.3 Parametric Family of Klein Bottle Embeddings

Let θ, φ ∈ [0, 2π]. Bivariate polynomials parameterized by (θ, φ), k_{θ,φ} ∈ K_{θ,φ} ⊂ P, that satisfy k_{θ,φ} = k_{θ+π, 2π−φ} form a Klein bottle, K²: the (θ, φ) ∼ (θ + π, 2π − φ) similarity relation results in edges being glued together in the manner definitional of a Klein bottle's topology (shown in Figure 3B). The candidate Klein bottle embedding supplied in [21] to model image patch data satisfies the similarity relation ∀x, y:

k⁰ ≡ k⁰_{θ,φ}(x, y) = cos φ [x cos θ + y sin θ]² + sin φ [x cos θ + y sin θ]    (31)

Note that any k_{θ,φ} ∈ K_{θ,φ} can be decomposed as:

k_{θ,φ} = C + κ_θ + κ_φ + κ_{θ,φ}    (32)

where κ_θ = κ_{θ+π}, κ_φ = κ_{2π−φ} and κ_{θ,φ} = κ_{θ+π, 2π−φ}. The first three terms can be understood as constant, θ-dependent and φ-dependent phases respectively. We sought an embedding of the Klein bottle for which the sum of Euclidean distances from each image patch to its closest point on the embedding is minimized. To accomplish this, we constructed a parametric family of models for each of the four terms in Equation 32. The first three of these are most conveniently expressed directly in the DCT basis:

(c ∘ h)(C) = N_C Σ_{i=1}^{8} μ_i e_i

(c ∘ h)(κ_θ) = Σ_{i=1}^{8} [ Σ_{j=2, j even}^{N_θ} ( β_{i,j} cos(jθ) + γ_{i,j} sin(jθ) ) ] e_i

(c ∘ h)(κ_φ) = Σ_{i=1}^{8} [ Σ_{j=1}^{N_φ} ζ_{i,j} cos(jφ) ] e_i    (33)

N_C is a Boolean variable, and N_θ and N_φ control the number of terms in the inner sums for (c ∘ h)(κ_θ) and (c ∘ h)(κ_φ) respectively. The expression for (c ∘ h)(κ_θ) only includes even coefficients for θ, so that the similarity relation (θ) ∼ (θ + π) is satisfied. The expression for (c ∘ h)(κ_φ) only includes cosine terms, so that the similarity relation (φ) ∼ (2π − φ) is satisfied. For κ_{θ,φ}, we refrained from writing a Fourier series-like expansion, because we wanted to preserve the interpretation of θ and φ as parameters controlling the orientation and gradient respectively [21].
Instead, we devised the following form, which we motivate further below:

κ_{θ,φ}(x, y) = Σ_{l=1}^{M_φ} cos^l(φ) [ Σ_{0 ≤ s,t ≤ M_θ, 0 < s+t ≤ M_θ, s+t even} η_{l,s,t} cos^s(θ) sin^t(θ) x^s y^t ]
             + Σ_{l=1}^{M_φ} sin(φ) cos^{l−1}(φ) [ Σ_{0 ≤ s,t ≤ M_θ, s+t ≤ M_θ, s+t odd} ϑ_{l,s,t} cos^s(θ) sin^t(θ) x^s y^t ]    (34)

Under c ∘ h, each monomial x^s y^t maps to a fixed linear combination of the DCT basis vectors that depends only on the parities of s and t (since x^{s+2} = x^s on the pixel grid {−1, 0, 1} for s > 0):

(c ∘ h)(x^s y^t) =
  √6 e₁, if s = 0 and t odd
  −√6 e₂, if s odd and t = 0
  √6 e₃, if s = 0 and t > 0 even
  √6 e₄, if s > 0 even and t = 0
  −2√2 e₅, if s odd and t odd
  2√6/3 e₁ + 4√3/3 e₆, if s > 0 even and t odd
  −2√6/3 e₂ − 4√3/3 e₇, if t > 0 even and s odd
  2√6/3 (e₃ + e₄ + e₈), if s > 0 even and t > 0 even    (35)

Note that the first inner sum in Equation 34 is a linear combination of basis vectors encoding purely quadratic gradients (e₃, e₄, e₅ and e₈), weighted by even trigonometric functions of θ. The prefactors on this inner sum are functions that are even in φ. This inner sum and its prefactor therefore jointly satisfy the similarity relation (θ, φ) ∼ (θ + π, 2π − φ) by independently satisfying (θ) ∼ (θ + π) and (φ) ∼ (2π − φ). Meanwhile, the second inner sum in Equation 34 is a linear combination of basis vectors containing linear gradients (e₁, e₂, e₆ and e₇), weighted by odd trigonometric functions of θ. The prefactors on this inner sum are functions that are odd in φ. This inner sum and its prefactor therefore jointly satisfy the similarity relation (θ, φ) ∼ (θ + π, 2π − φ), by independently satisfying (θ) ∼ −(θ + π) and (φ) ∼ −(2π − φ). Since the trigonometric functions of θ are coupled to (x, y), θ controls the rotation of stripes in the image patches, just as in k⁰. Similarly, since the prefactors on the inner sums are functions of φ, φ controls the relative contribution of quadratic gradients (e₃, e₄, e₅ and e₈ in the first inner sum) and linear gradients (e₁, e₂, e₆ and e₇ in the second inner sum). Lastly, the boundary conditions for θ and φ in this parameterization of κ_{θ,φ} yield patches with vertical (horizontal) stripes when θ = 0 (θ = π/2), and linear (quadratic) gradients when φ = π/2, 3π/2 (φ = 0, π), just as in k⁰.

A Klein bottle embedding belonging to this parametric family, k^α_{θ,φ} ∈ K_{θ,φ}, can therefore be specified in terms of a vector F = [N_C, N_θ, N_φ, M_θ, M_φ] defining its functional form, and a corresponding coefficient vector α = [μ_i, ..., β_{i,j}, ..., γ_{i,j}, ..., ζ_{i,j}, ..., η_{l,s,t}, ..., ϑ_{l,s,t}]. In this parametric family of Klein bottle embeddings, k⁰ corresponds to F = [0, 0, 0, 2, 1] with α = [η_{1,2,0}, η_{1,1,1}, η_{1,0,2}, ϑ_{1,0,1}, ϑ_{1,1,0}] = [1, 2, 1, 1, 1]. Note that since curvatures are only computed on the embedding after normalization, α is only meaningfully defined up to a multiplicative constant.

4.5.4 Associating Image Patches to a Klein Bottle Embedding

For a given Klein bottle embedding, k^α_{θ,φ} ∈ K_{θ,φ}, we associated each datapoint v_i (already vectorized and normalized by u ∘ c ∘ h) to the closest point on k^α_{θ,φ} by minimizing the Euclidean distance in R⁸:

(θ̂_i, φ̂_i) = argmin_{θ,φ} ‖(u ∘ c ∘ h)(k^α_{θ,φ}) − v_i‖₂²    (36)

We solved this minimization using the lsqnonlin function ('StepTolerance'=1e-3, 'FunctionTolerance'=1e-6) in MATLAB, supplying initial conditions corresponding to analytical values for a point on k⁰:

θ̂_i = arctan( (e₁^T v_i) / (−e₂^T v_i) )    ( = arctan( sin φ̂_i sin θ̂_i / (sin φ̂_i cos θ̂_i) ) for v_i ∈ (u ∘ c ∘ h)(k⁰) )

φ̂_i = arctan( √((e₁^T v_i)² + (e₂^T v_i)²) / ((e₃^T v_i) + (e₄^T v_i)) )    ( = arctan( √(sin² φ̂_i / cos² φ̂_i) ) for v_i ∈ (u ∘ c ∘ h)(k⁰) )    (37)

We constrained solutions to θ̂_i ∈ [0, π] and φ̂_i ∈ [0, 2π] according to the (θ, φ) similarity relation.
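To tie Sections 4.5.1-4.5.4 together, the MATLAB sketch below generates points on the normalized k⁰ embedding; the grid ordering of h and the basis vectors follow Equation 29, the (θ, φ) values are arbitrary illustrative draws, and the D-norm assertion simply checks the normalization claimed after Equation 30.

% Generate points on the normalized Klein bottle embedding k0 in R^8.
E = [ 1  0 -1  1  0 -1  1  0 -1;            % e1 ... e8 as rows (Equation 29)
      1  1  1  0  0  0 -1 -1 -1;
      1 -2  1  1 -2  1  1 -2  1;
      1  1  1 -2 -2 -2  1  1  1;
      1  0 -1  0  0  0 -1  0  1;
      1  0 -1 -2  0  2  1  0 -1;
      1 -2  1  0  0  0 -1  2 -1;
      1 -2  1 -2  4 -2  1 -2  1 ]';
E = E ./ sqrt([6 6 54 54 8 48 48 216]);     % column i is e_i
A = E; Lambda = diag(1 ./ sum(A.^2, 1));    % c(y) = Lambda * A' * y
% Sanity check of Equation 30's normalization: each e_i has unit D-norm
D = A * Lambda^2 * A';
assert(max(abs(diag(E' * D * E) - 1)) < 1e-12);
x = [-1 -1 -1 0 0 -0 1 1 1]';               % pixel x-coords in the ordering of h
y = [ 1  0 -1 1 0 -1 1 0 -1]';              % pixel y-coords in the ordering of h
N = 1000; V = zeros(N, 8);
for i = 1:N
    th = pi*rand; ph = 2*pi*rand;           % illustrative (theta, phi) draws
    p = cos(ph)*(x*cos(th) + y*sin(th)).^2 ...
      + sin(ph)*(x*cos(th) + y*sin(th));    % h(k0_{theta,phi}), Equation 31
    v = Lambda * (A' * p);                  % project: c
    V(i, :) = v' / norm(v);                 % normalize: u
end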
4.5.5 Optimal Klein Bottle Embedding

Let k^α̂_{θ,φ} ∈ K_{θ,φ} be the Klein bottle embedding that minimizes the sum of Euclidean distances in R⁸ between each image patch and the closest point on the embedding. To determine k^α̂_{θ,φ} given a functional form F, we initialized the coefficient vector α̂ to have zero entries everywhere except for the values used in k⁰. We then iterated between optimizing for (θ̂_i, φ̂_i) according to Equation 36 and for α̂ as shown below using least-squares, until convergence:

α̂ = argmin_α Σ_i ‖(u ∘ c ∘ h)(k^α_{θ̂_i, φ̂_i}) − v_i‖₂²    (38)

k¹ ≡ k^α̂_{θ,φ} is the optimized Klein bottle embedding corresponding to F = [1, 10, 10, 20, 10], for which results are shown in Figures 3H and S3D.

4.5.6 Noisy Klein Bottle Embeddings

The set of N ≈ 4.2 × 10⁵ image patches was associated to k⁰ according to the procedure described in Methods Section 4.5.4, yielding (θ̂_i, φ̂_i) values. Isotropic Gaussian noise of magnitude sσ was added element-wise in R⁹ (prior to normalization by u ∘ c) to h(k⁰_{θ̂_i, φ̂_i}), where s = median_i{‖h(k⁰_{θ̂_i, φ̂_i})‖₂} ≈ 2.451. Figures 3F,G and S3A correspond to noise with σ = 0.007, 0.01 and 0.03.

4.5.7 Parameters for Curvature Estimation

For all scalar curvature computations on image patch datasets and Klein bottle embeddings, we set d = 2 and N_calib = 10⁴. Unless the neighborhoods were manually specified, we used σ_h = 0.1, which yielded a flat distribution of GOF p-values (2.5% of points reported GOF p-values ≤ α = 0.05) for the set of N ≈ 4.2 × 10⁵ points on k⁰ closest to the image patches (shown in Figure 3E).

4.6 Details of scRNAseq Datasets

The PBMC dataset provided by 10x Genomics is comprised of N = 10194 PBMCs collected from a healthy donor [42]. The mouse gastrulation dataset consists of N = 116312 cells collected at nine 6-hour intervals from embryonic day 6.5 to 8.5 [43]. The mouse brain dataset is a benchmark from 10x Genomics consisting of N = 1306127 cells collected from the cortex, hippocampus and ventricular zone of two embryonic mice sacrificed at embryonic day 18 [44].

4.6.1 Preprocessing

For the PBMC dataset, we applied standard preprocessing steps using Seurat v3.1.2 [53] with default function arguments, to extract PC projections and UMAP coordinates ourselves. Specifically, we removed cells for which the percentage of transcripts corresponding to mitochondrial genes exceeded 15%, or which had fewer than 500 transcripts. This reduced the number of cells from 10194 to 9385. On this filtered set, we normalized the data (NormalizeData(normalization.method='LogNormalize', scale.factor=10000)), retained the 2000 most variable genes (FindVariableFeatures(selection.method='vst', nfeatures=2000)), and scaled the data (ScaleData). Next, we performed linear dimensionality reduction using PCA down to 50 dimensions (RunPCA(npcs=50)) and generated UMAP coordinates for visualization (RunUMAP(dims=1:30)). For the gastrulation (brain) dataset, we did not preprocess the data ourselves, but instead directly used the 50 (20) PC projections and UMAP (t-SNE) visualization coordinates provided with the dataset. Please refer to [43, 44] for additional details.
4.5.6 Noisy Klein Bottle Embeddings

The set of $N \approx 4.2 \times 10^5$ image patches was associated to $k^0$ according to the procedure described in Methods Section 4.5.4, yielding $(\hat{\theta}_i, \hat{\phi}_i)$ values. Isotropic Gaussian noise of magnitude $s\sigma$ was added element-wise in $\mathbb{R}^9$ (prior to normalization by $u \circ c$) to $h(k^0_{\hat{\theta}_i,\hat{\phi}_i})$, where $s = \mathrm{median}_i\{\|h(k^0_{\hat{\theta}_i,\hat{\phi}_i})\|_2\} \approx 2.451$. Figures 3F, 3G and S3A correspond to noise with $\sigma = 0.007$, $0.01$ and $0.03$.
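A sketch of this noise model in R, assuming the points $h(k^0_{\hat{\theta}_i,\hat{\phi}_i})$ are stacked as the rows of a matrix `H`:

```r
# Add isotropic Gaussian noise of magnitude s*sigma element-wise in R^9.
# H is an N x 9 matrix whose i-th row is h(k^0) at the associated (theta_i, phi_i);
# normalization by (u . c) is applied afterwards, as described above.
add_noise <- function(H, sigma) {
  s <- median(sqrt(rowSums(H^2)))   # s ~ 2.451 for the image patch dataset
  H + matrix(rnorm(length(H), mean = 0, sd = s * sigma), nrow = nrow(H))
}
```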
4.5.7 Parameters for Curvature Estimation

For all scalar curvature computations on image patch datasets and Klein bottle embeddings, we set $d = 2$ and $N_{\mathrm{calib}} = 10^4$. Unless the neighborhoods were manually specified, we used $\sigma_h = 0.1$, which yielded a flat distribution of GOF p-values (2.5% of points reported GOF p-values $\leq \alpha = 0.05$) for the set of $N \approx 4.2 \times 10^5$ points on $k^0$ closest to the image patches (shown in Figure 3E).

4.6 Details of scRNAseq Datasets

The PBMC dataset provided by 10x Genomics comprises N = 10194 PBMCs collected from a healthy donor [42]. The mouse gastrulation dataset consists of N = 116312 cells collected at nine 6-hour intervals from embryonic day 6.5 to 8.5 [43]. The mouse brain dataset is a benchmark from 10x Genomics consisting of N = 1306127 cells collected from the cortex, hippocampus and ventricular zone of two embryonic mice sacrificed at embryonic day 18 [44].

4.6.1 Preprocessing

For the PBMC dataset, we applied standard preprocessing steps using Seurat v3.1.2 [53] with default function arguments, to extract PC projections and UMAP coordinates ourselves. Specifically, we removed cells in which the percentage of transcripts corresponding to mitochondrial genes exceeded 15%, or which had fewer than 500 transcripts; this reduced the number of cells from 10194 to 9385. On this filtered set, we normalized the data (NormalizeData(normalization.method='LogNormalize', scale.factor=10000)), retained the 2000 most variable genes (FindVariableFeatures(selection.method='vst', nfeatures=2000)), and scaled the data (ScaleData). Next, we performed linear dimensionality reduction using PCA down to 50 dimensions (RunPCA(npcs=50)) and generated UMAP coordinates for visualization (RunUMAP(dims=1:30)). For the gastrulation (brain) dataset, we did not preprocess the data ourselves but instead directly used the 50 (20) PC projections and UMAP (t-SNE) visualization coordinates provided with the dataset. Please refer to [43, 44] for additional details.
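Assembled into a single pipeline, these Seurat calls read as follows. The `counts` matrix, the mitochondrial gene pattern "^MT-", and the exact filtering expression are assumptions about glue code (standard Seurat conventions), while the quoted function arguments are those stated above.

```r
library(Seurat)

# counts: gene x cell UMI count matrix loaded from the 10x PBMC dataset (loading omitted)
pbmc <- CreateSeuratObject(counts = counts)
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
# Remove cells with > 15% mitochondrial transcripts or < 500 transcripts
pbmc <- subset(pbmc, subset = percent.mt <= 15 & nCount_RNA >= 500)
pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", scale.factor = 10000)
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)
pbmc <- ScaleData(pbmc)
pbmc <- RunPCA(pbmc, npcs = 50)      # PC projections used as ambient coordinates
pbmc <- RunUMAP(pbmc, dims = 1:30)   # 2D coordinates for visualization only
```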
4.6.2 Cell Type Annotations

For the PBMC dataset, the AddModuleScore(ctrl=5) function was used to compute the per-cell average expression of marker genes corresponding to seven different cell types [54]. To prepare Figure S4A, each cell was assigned the cell type for which its average marker gene expression was highest. Cell type annotations for the gastrulation dataset (see Figure S5A) were sourced from Figure 1C of [43]. Cell type annotations for the brain dataset (see Figure S6A) are predicted labels sourced from [55].
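In Seurat, this scoring and assignment step can be sketched as below; `marker_sets` is a hypothetical named list of marker gene vectors (one per cell type, drawn from [54]), and the score column names follow Seurat's convention of appending an index to the `name` argument.

```r
# Score each cell against every marker set, then assign the highest-scoring type
pbmc <- AddModuleScore(pbmc, features = marker_sets, ctrl = 5, name = "celltype")
# AddModuleScore stores one metadata column per list element: celltype1, celltype2, ...
scores <- as.matrix(pbmc@meta.data[, paste0("celltype", seq_along(marker_sets))])
assigned <- names(marker_sets)[max.col(scores)]   # highest average marker expression
```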
4.6.3 Statistical Tests

Here we describe the statistical tests applied to the scalar curvatures computed for the scRNAseq datasets.

4.6.3.1 Spatial Precision of Errorbars

Let $m$ be the fraction of datapoints with 95% CIs containing the scalar curvatures reported by their respective kNNs. To check whether $m$ was significantly larger than chance, we used a permutation test: we randomly assigned the kNN of each datapoint to be one of the $N$ datapoints in the dataset and computed $m$, repeating the procedure $T = 1000$ times to generate an empirical distribution of $m$ under the null model of random neighbors. The reported p-value for each $k$ is the fraction of the $T$ trials for which $m$ was greater than the value computed for the data. See Figures S4D, S5D and S6D.
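A sketch of this test in R, assuming per-point curvatures `S`, CI bounds `ci_lo` and `ci_hi`, and an index vector `knn` giving each point's kth-nearest neighbor:

```r
# Permutation test for spatial precision: is the kNN coverage fraction m larger
# than expected when neighbors are assigned at random?
spatial_precision_test <- function(S, ci_lo, ci_hi, knn, n_trials = 1000) {
  coverage <- function(idx) mean(S[idx] >= ci_lo & S[idx] <= ci_hi)
  m_data <- coverage(knn)                                            # observed m
  m_null <- replicate(n_trials, coverage(sample(length(S), replace = TRUE)))
  mean(m_null > m_data)   # p-value: fraction of null trials exceeding the data
}
```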
4.6.3.2 Sensitivity to Cell Downsampling

To check the sensitivity of the computed scalar curvatures to the average density of cells, we discarded $f\%$ of cells at random from the ambient space computed using the original set of $N$ datapoints, and recomputed scalar curvatures using the same ambient dimension, manifold dimension and neighborhood sizes as for the original dataset (see Methods Section 4.6.4). Let $m$ be the fraction of downsampled datapoints with 95% CIs containing the scalar curvatures originally reported. Since the CIs grow as $f$ increases, we checked whether $m$ was significantly larger than chance using a permutation test (sketched after the next subsection): we randomly paired each of the 95% CIs computed after downsampling with one of the scalar curvatures reported by the downsampled points for the original dataset, and computed $m$. We repeated the procedure $T = 1000$ times to generate an empirical distribution of $m$ for the null model. The reported p-value for each $f$ is the fraction of the $T$ trials for which $m$ was greater than the value computed for the data. See Figures S4I, S5I and S6I.

4.6.3.3 Sensitivity to Transcript Downsampling

To check the sensitivity of the computed scalar curvatures to the capture efficiency and sequencing depth of the data, we discarded $f\%$ of transcripts at random from the single-cell count matrix for the PBMC dataset, then performed the same preprocessing steps described in Methods Section 4.6.1. We recomputed scalar curvatures using the same ambient dimension, manifold dimension and neighborhood sizes as for the original dataset (see Methods Section 4.6.4). Let $m$ be the fraction of datapoints with 95% CIs containing the scalar curvatures originally reported. To check whether $m$ was significantly larger than chance, we used the same permutation test as above: we randomly paired each of the 95% CIs computed after downsampling transcripts with one of the scalar curvatures computed for the original dataset, and computed $m$. We repeated the procedure $T = 1000$ times to generate an empirical distribution of $m$ for the null model. The reported p-value for each $f$ is the fraction of the $T$ trials for which $m$ was greater than the value computed for the data. See Figure S4J.
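Both downsampling tests share the same random-pairing null, sketched here in R with `S_orig` the originally reported curvatures for the surviving points and `ci_lo`, `ci_hi` the 95% CI bounds recomputed after downsampling:

```r
# Permutation test for downsampling sensitivity: compare the observed coverage m
# against a null in which CIs and original curvatures are randomly paired.
downsampling_test <- function(S_orig, ci_lo, ci_hi, n_trials = 1000) {
  m_data <- mean(S_orig >= ci_lo & S_orig <= ci_hi)
  m_null <- replicate(n_trials, {
    idx <- sample(length(S_orig))     # random pairing, without replacement
    mean(S_orig[idx] >= ci_lo & S_orig[idx] <= ci_hi)
  })
  mean(m_null > m_data)
}
```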
4.6.4 Parameters for Curvature Estimation

Let the variance explained by the $i$th PC be $\sigma_i^2$ and the cumulative fractional variance of the first $m$ PCs be $c_m = \sum_{i=1}^{m} \sigma_i^2 / \sum_i \sigma_i^2$. For each dataset, we selected the ambient dimension as $n = \mathrm{argmax}_m\{c_m \mid c_m \leq 0.8\}$, the manifold dimension as $d = \mathrm{argmax}_m\{c_m \mid c_m \leq 0.64\}$, and considered the global length scale to be $L = 3\sigma_d$. $(n, d, L) = (8, 3, 18.3)$, $(11, 3, 19.1)$ and $(9, 5, 24.9)$ for the PBMC, gastrulation and brain datasets respectively. For the three datasets, we computed scalar curvatures for manifold dimensions $d-1$, $d$ and $d+1$. It was not always possible to select $\sigma_h$ for each dataset and manifold dimension so that the distribution of GOF p-values was flat, according to our usual heuristic. For consistency, we therefore picked $\sigma_h$ so that 1/3 of points had GOF p-values $\leq \alpha = 0.05$. For manifold dimensions $(d-1, d, d+1)$, $\sigma_h = (0.031, 0.041, 0.045)$, $(0.036, 0.044, 0.053)$ and $(0.034, 0.050, 0.055)$ for the PBMC, gastrulation and brain datasets respectively.
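In R, with `sigma2` the vector of variances explained by each PC, this selection rule is:

```r
# Choose the ambient dimension n, manifold dimension d, and global length scale L
# from the per-PC explained variances, using the thresholds stated above.
pick_dimensions <- function(sigma2) {
  cm <- cumsum(sigma2) / sum(sigma2)   # cumulative fractional variance c_m
  n <- max(which(cm <= 0.80))          # ambient dimension
  d <- max(which(cm <= 0.64))          # manifold dimension
  L <- 3 * sqrt(sigma2[d])             # L = 3 * sigma_d (sigma_d is a std. dev.)
  list(n = n, d = d, L = L)
}
```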
5 Acknowledgements

DS was funded in part by the Natural Sciences and Engineering Research Council of Canada (NSERC PGSD2-517131-2018). SW was supported by NCI U54-CA225088 and NIH NIGMS T32 GM008313. DS and SH acknowledge funding from NIH NIGMS R00GM118910, a U19 Systems Immunology Pilot Project Grant at Harvard University, and the Harvard University William F. Milton Fund. The authors would like to thank Peter Kharchenko and Allon Klein for helpful discussions. Portions of this research were conducted on the O2 High Performance Compute Cluster, supported by the Research Computing Group at Harvard Medical School. See http://rc.hms.harvard.edu for more information.

6 Data and Code Availability

The van Hateren IML dataset is available at http://bethgelab.org/datasets/vanhateren and was loaded according to the instructions there. The PBMC dataset is available at https://support.10xgenomics.com/single-cell-gene-expression/datasets/4.0.0/Parent_NGSC3_DI_PBMC. The gastrulation dataset can be retrieved using instructions found at https://github.com/MarioniLab/EmbryoTimecourse2018. The brain dataset is available at https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.3.0/1M_neurons. The software package described here to compute scalar curvature is available at https://gitlab.com/hormozlab/ManifoldCurvature. All code and instructions to reproduce the numerics and figures in this study will be made available upon publication.

7 Supplementary Figures

Figure S1: The scalar curvature of $S^2$ is poorly estimated using the Laplace-Beltrami operator. (A) The heat-trace with $m$ terms ($z_m(x)$ in Equation 3) is shown for $m = \infty$ (black), $m = 1000$ (solid blue) and $m = 49$ (solid red), when evaluated with analytical eigenvalues for $S^2$. Empirical eigenvalues were obtained by uniformly sampling $N = 10^4$ points from $S^2$ (see Figure 1A; Methods Section 4.4.1.1) and estimating the Laplace-Beltrami (LB) operator using Equations 13-15. The heat-trace evaluated using these empirical eigenvalues, $\hat{z}_m(x)$, is shown for $m = 1000$ (dashed blue) and $m = 49$ (dashed red). The heat-trace evaluated using eigenvalues obtained by interpolating between the analytical and empirical values ($\tilde{z}_m(x; f)$ in Equation 11) is shown for $m = 49$ and $f = 0.23$ (solid green). $f$ signifies that the fractional error of the interpolated eigenvalues is reduced by $1-f$ relative to the empirical eigenvalues: $f = 0$ corresponds to the analytical eigenvalues, while $f = 1$ corresponds to the empirical eigenvalues. The white region bounded by $[x_1, x_2]$ indicates a candidate interval over which to fit a heat-trace to a quadratic in order to extract an estimate for the scalar curvature (see Equations 2-4; Methods Section 4.2.1). On the one hand, since the knee of $z_m(x)$ shifts to the left as $m$ increases (i.e. as $z_m(x)$ converges to $z_\infty(x)$), larger $m$ results in more intervals for which $z_m(x)$ well-approximates $z_\infty(x)$ and will therefore yield accurate scalar curvature estimates. On the other hand, $\hat{z}_m(x)$ becomes a worse estimator for $z_m(x)$ as $m$ increases. (B) Scalar curvatures estimated by fitting $z_\infty(x)$ to a quadratic over different intervals $[x_1, x_2]$ as defined in (A). Scalar curvatures are shown in color for intervals yielding accurate estimates ($S \in [1.5, 2.5]$). This colored region corresponds to $D_\infty$. (C) As in (B), but with estimates obtained by fitting a quadratic to $z_{1000}(x)$. The colored region corresponds to $D_{1000}$. By inspection, $D_{1000} \subset D_\infty$. (D) Scalar curvatures estimated by fitting $\hat{z}_{1000}(x)$ to a quadratic over each interval in $D_{1000}$. Though $D_{1000}$ was constructed using only intervals which yielded an accurate scalar curvature estimate when analytical eigenvalues were used in the heat-trace, no interval in $D_{1000}$ yields an accurate scalar curvature estimate when the same number of empirical eigenvalues is used in the heat-trace instead. (E) As in (B), but with estimates obtained by fitting a quadratic to $z_{49}(x)$. The colored region corresponds to $D_{49}$. By inspection, $D_{49} \subset D_{1000}$. (F) As in (D), but with estimates obtained by fitting $\hat{z}_{49}(x)$ to a quadratic over each interval in $D_{49}$. No estimate is accurate, just as in (D). (G) As in (F), but with estimates obtained by fitting $\tilde{z}_{49}(x; f = 0.23)$ to a quadratic over each interval in $D_{49}$. $f = 0.23$ was chosen so that half the intervals in $D_{49}$ yield an accurate scalar curvature estimate. (H) (Left) The fractional error in the first 49 empirical eigenvalues of the LB estimator from (A) is shown in red. This operator was computed using the Gaussian kernel ($W_g$ in Equation 15). Eigenvalues 37-49 have a fractional error of 60%. The fractional error of the eigenvalues of LB estimators computed on the same $N = 10^4$ points but using the weighted kNN and r-neighborhood kernels ($W_{kNN}$ and $W_r$ respectively in Equation 16) is also plotted. Positive error indicates under-estimation. (Right) Projected fractional error for eigenvalues 37-49 of the LB estimator with Gaussian kernel computed using a larger sample size ($N$). The projection is based on the convergence rate given in Theorem 1 of [30], assuming that the big-O bound is sharp at $N = 10^4$ for eigenvalues 37-49. The dashed green line corresponds to the 14% fractional error needed for scalar curvatures to be accurately estimated for half the intervals in $D_{49}$. This corresponds to $f = 0.23$ in (G), since 60% $\times f \approx$ 14%. For the LB estimator computed using the Gaussian kernel, achieving this fractional error requires $N \approx 10^7$. Since LB estimators computed using the other kernels have the same convergence rate but larger fractional error at $N = 10^4$, these estimators would require even larger $N$ to achieve the desired 14% fractional error.
Figure S2: Sensitivity of the algorithm to real-world confounders. (A) (Left) A dataset with a sparse periphery and a dense core was formed by uniformly sampling $N_1 = 10^4$ points from the 3-dimensional cube of side-length 10, $D^3_{10}$, and $N_2 = 10^3$ points from the 3-dimensional cube of side-length 1, $D^3_1$ (see Methods Section 4.4.1.4). These points were embedded in $\mathbb{R}^{11}$ and padded with isotropic Gaussian noise of magnitude $\sigma = 0.01$ in the 8 normal directions. Scalar curvatures ($S$) were computed on this dataset of $N_1 + N_2$ points by setting $\sigma_h$, and are plotted against their standard errors ($\sigma_S$) in the leftmost panel. Curvature computations were also performed at fixed length scales corresponding to the 5th, 50th and 95th-percentile values for neighborhood size (left to right) used in the leftmost panel ($r = 0.54$, $0.90$ and $1.22$ respectively). Here, points for which the chosen $r$ led to neighborhoods with insufficient points for regression are not shown. For large length scales, all points in the dense region are able to report curvatures but are crowded into the apex of the plots. The $N_1$ ($N_2$) sparse (dense) points are shown in blue (green). Points enclosed by the red lines have 95% CIs including the true value of zero. The right four panels show analogous results when $N_2 = 10^4$. Here the 5th, 50th and 95th-percentile values for neighborhood size are $r = 0.37$, $0.63$ and $1.42$ respectively. See Methods Section 4.4.2.1. (B) Distribution of scalar curvatures computed for $N = 10^4$ points uniformly sampled from $S^2 \subset \mathbb{R}^3$ and convolved with isotropic Gaussian noise of magnitude $\sigma$ in $\mathbb{R}^3$. Noise confounds accurate scalar curvature computation when $\sigma$ is roughly 10% of the sphere's radius. The deviation of the estimated scalar curvatures from the true value of 2 (shown as a dashed red line) for $\sigma \geq 0.1$ reflects the nontrivial geometry of a manifold convolved with noise. See Methods Section 4.4.2.2. (C) (Left) $N = 10^4$ points were uniformly sampled from $S^2$ and embedded in $\mathbb{R}^n$. Isotropic Gaussian noise of magnitude $\sigma$ was applied to each of the $n$ ambient dimensions. Scalar curvatures computed by keeping $\sigma_h$ fixed for all $n$ and $\sigma$ recapitulated the true value of 2 (shown as dashed red lines) for $n \leq 80$ and $\sigma \leq 0.05$. (Right) The neighborhood size ($r$) necessary to attain $\sigma_h$ is less sensitive to changes in $n$ than to changes in $\sigma$. See Methods Section 4.4.2.3. (D) $N = 10^4$ points were uniformly sampled from (left) $S^3 \subset \mathbb{R}^5$ convolved with isotropic Gaussian noise in the ambient space with $\sigma = 0.01$ and (right) $S^2 \times S^2 \subset \mathbb{R}^6$. To investigate the effects of choosing the manifold dimension, $d$, differently than the true value, $d^*$, $\sigma_h$ was kept fixed, and scalar curvatures were computed for $d = d^*-1$ (cyan), $d = d^*+1$ (magenta) and $d = d^*$ (green). The panels show the distribution of (left to right) scalar curvatures ($S$), standard errors ($\sigma_S$) and GOF p-values. The true value of the scalar curvature (at $d = d^*$) is constant across both manifolds and shown as a dashed red line. The average neighborhood size ($r$ averaged over all points) is much larger for both $d = d^*-1$ and $d = d^*+1$ than for $d = d^*$, as shown in the legend. For the same $\sigma_h$, $d = d^*-1$ also leads to a more skewed distribution of GOF p-values relative to $d = d^*$, while the distribution for $d = d^*+1$ is still flat. See Methods Section 4.4.2.4.

Figure S3: Additional details of the image patch dataset and Klein bottle embeddings (related to Figure 3). (A) To compute scalar curvatures for Figure 3E, each image patch was associated to the $(\theta^0, \phi^0)$ coordinates of the closest point on $k^0$. Here we select a handful of these associated points on $k^0$ (shown in black) and visualize how neighborhoods chosen in $\mathbb{R}^8$ to compute scalar curvatures for Figure 3E appear in $(\theta^0, \phi^0)$ coordinates (shown in red). When noise of increasing magnitude, $\sigma$, is added to the set of closest points on $k^0$ (see Methods Section 4.5.6), the neighborhood size at each point grows until $\sigma_h$ is attained. (B) As in (A), but showing neighborhoods used in computing the scalar curvatures in Figure 3D for the image patch dataset. Note the close correspondence in neighborhood size with $\sigma = 0.03$ in (A). (C) Scalar curvatures computed for the set of closest points $(\theta^0, \phi^0)$ on $k^0$ as in Figure 3E, but using the same neighborhood sizes determined for the image patch dataset shown in Figure 3D, some of which are visualized in (B). (D) As in (A), but showing neighborhoods used in computing the scalar curvatures in Figure 3H for the set of closest points on $k^1$. Neighborhoods are visualized in $(\theta^0, \phi^0)$ coordinates instead of $(\theta^1, \phi^1)$ coordinates for ease of comparison. (E) As in (B), but showing neighborhoods used in computing the scalar curvatures in Figure 3J for the augmented image patch dataset. (F) Scalar curvatures computed for the augmented image patch dataset with $N \approx 1.3 \times 10^8$ points as in Figure 3J, but using the same neighborhood sizes determined for the original image patch dataset with $N \approx 4.2 \times 10^5$ points shown in Figure 3D and (B). Note the close correspondence with Figure 3D.
Figure S4: Additional details of the PBMC scRNAseq dataset (related to Figure 4). (A) Cell types overlaid onto UMAP coordinates and sorted in decreasing order of abundance in the legend. Cells were annotated as described in Methods Section 4.6.2. (B) A goodness-of-fit p-value was computed for each point by applying Mardia's test to the residuals obtained from fitting the neighborhood around the point to a quadratic function (see Methods Section 4.3.3). These p-values are visualized on the UMAP coordinates corresponding to each point (left) and their empirical distribution is shown using a histogram (right). Small p-values suggest that the residuals are non-normal, so that approximating local neighborhoods as quadratic may not be valid. (C) Pearson correlation between the scalar curvature reported by each point and its kth-nearest neighbor (kNN) for different $k$ (shown in blue). The red bar shows the mean and standard deviation of the Pearson correlation when neighbors are chosen randomly over 1000 trials (*$p < 10^{-6}$). (D) The percentage of points with 95% CIs containing the scalar curvatures reported by their respective kNNs (shown in blue). The red bar shows the mean and standard deviation of this percentage when neighbors are chosen randomly over 1000 trials (*$p < 0.001$; see Methods Section 4.6.3.1). (E) The neighborhood size ($r$) used for computing scalar curvature at each point, overlaid onto UMAP coordinates (left), and a corresponding histogram of the empirical distribution (right). The dashed red lines correspond to the 25th, 50th and 75th-percentile values of $r(p)$ used for computing scalar curvatures at fixed neighborhood sizes for Figure 4C. See Methods Section 4.3.2. (F) The number of points in each neighborhood (corresponding to the neighborhood sizes in (E)) overlaid onto UMAP coordinates (left) and a corresponding histogram of the empirical distribution (middle). (Right) The set of neighbors used for computing scalar curvature (purple) is visualized on UMAP coordinates for a handful of points (black). (G) Scalar curvatures were computed for manifold dimension $d-1$ (left) and $d+1$ (right). They are plotted here on UMAP coordinates after smoothing over the same set of $k = 250$ neighbors used in Figure 4A. See Methods Section 4.6.4. (H) The total number of transcripts observed in each cell, overlaid onto UMAP coordinates. (I) Scalar curvatures were computed after downsampling the number of cells in the ambient space by a factor of 2 (left) and 4 (middle), using the same ambient dimension, manifold dimension and neighborhood sizes determined for the original dataset. They are plotted here on UMAP coordinates after smoothing over the same set of neighbors (which survive downsampling) used in Figure 4A. (Right) The percentage of points in the downsampled datasets with a 95% CI containing the originally reported scalar curvature (blue), and likewise for a negative control obtained by randomly pairing 95% CIs and originally reported scalar curvatures for points in the downsampled dataset (red). Errorbars for the negative control are the standard deviation of this percentage over 1000 trials with different random pairings (*$p < 0.001$; see Methods Section 4.6.3.2). (J) Scalar curvatures were computed after downsampling the number of transcripts by a factor of 2 (left) and 4 (middle), using the same ambient dimension, manifold dimension and neighborhood sizes determined for the original dataset. They are plotted here on UMAP coordinates after smoothing over the same set of $k = 250$ neighbors used in Figure 4A. (Right) The percentage of points in the downsampled datasets with a 95% CI containing the originally reported scalar curvature (blue), and likewise for a negative control obtained by randomly pairing 95% CIs and originally reported scalar curvatures for points in the downsampled dataset (red). Errorbars for the negative control are the standard deviation of this percentage over 1000 trials with different random pairings (*$p < 0.001$; see Methods Section 4.6.3.3).

Figure S5: Additional details of the gastrulation scRNAseq dataset (related to Figure 4). Panels as in Figure S4.

Figure S6: Additional details of the brain scRNAseq dataset (related to Figure 4). Panels as in Figure S4, but with t-SNE instead of UMAP plots.
References

[1] A. M. Klein, L. Mazutis, I. Akartuna, N. Tallapragada, A. Veres, V. Li, L. Peshkin, D. A. Weitz, and M. W. Kirschner. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell, 161(5):1187–1201, 2015.

[2] E. Z. Macosko, A. Basu, R. Satija, J. Nemesh, K. Shekhar, M. Goldman, I. Tirosh, A. R. Bialas, N. Kamitaki, E. M. Martersteck, J. J. Trombetta, D. A. Weitz, J. R. Sanes, A. K. Shalek, A. Regev, and S. A. McCarroll. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell, 161(5):1202–1214, 2015.

[3] G. X. Y. Zheng, J. M. Terry, P. Belgrader, P. Ryvkin, Z. W. Bent, R. Wilson, S. B. Ziraldo, T. D. Wheeler, G. P. McDermott, J. Zhu, M. T. Gregory, J. Shuga, L. Montesclaros, J. G. Underwood, D. A. Masquelier, S. Y. Nishimura, M. Schnall-Levin, P. W. Wyatt, C. M. Hindson, R. Bharadwaj, A. Wong, K. D. Ness, L. W. Beppu, H. J. Deeg, C. McFarland, K. R. Loeb, W. J. Valente, N. G. Ericson, E. A. Stevens, J. P. Radich, T. S. Mikkelsen, B. J. Hindson, and J. H. Bielas. Massively parallel digital transcriptional profiling of single cells. Nature Communications, 8(1):1–12, 2017.

[4] D. R. Bandura, V. I. Baranov, O. I. Ornatsky, A. Antonov, R. Kinach, X. Lou, S. Pavlov, S. Vorobiev, J. E. Dick, and S. D. Tanner. Mass cytometry: technique for real time single cell multitarget immunoassay based on inductively coupled plasma time-of-flight mass spectrometry. Analytical Chemistry, 81(16):6813–6822, 2009.

[5] C. Giesen, H. A. O. Wang, D. Schapiro, N. Zivanovic, A. Jacobs, B. Hattendorf, P. J. Schüffler, D. Grolimund, J. M. Buhmann, S. Brandt, Z. Varga, P. J. Wild, D. Günther, and B. Bodenmiller. Highly multiplexed imaging of tumor tissues with subcellular resolution by mass cytometry. Nature Methods, 11(4):417–422, 2014.

[6] J-R. Lin, M. Fallahi-Sichani, J-Y. Chen, and P. K. Sorger. Cyclic immunofluorescence (CycIF), a highly multiplexed method for single-cell imaging. Current Protocols in Chemical Biology, 8(4):251–264, 2016.

[7] J-R. Lin, B. Izar, S. Wang, C. Yapp, S. Mei, P. M. Shah, S. Santagata, and P. K. Sorger. Highly multiplexed immunofluorescence imaging of human tissues and tumors using t-CyCIF and conventional optical microscopes. eLife, 7, 2018.

[8] L. H. Nguyen and S. Holmes. Ten quick tips for effective dimensionality reduction. PLoS Computational Biology, 15(6):e1006907, 2019.

[9] J. B. Tenenbaum. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.

[10] L. van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.

[11] E. Becht, L. McInnes, J. Healy, C-A. Dutertre, I. W. H. Kwok, L. G. Ng, F. Ginhoux, and E. W. Newell. Dimensionality reduction for visualizing single-cell data using UMAP. Nature Biotechnology, 37(1):38–44, 2018.

[12] A. Hatcher. Algebraic Topology. Cambridge University Press, 2001.

[13] R. Ghrist. Barcodes: the persistent topology of data. Bulletin of the American Mathematical Society, 45(1):61–76, 2007.

[14] D. Perrault-Joncas and M. Meilă. Non-linear dimensionality reduction: Riemannian metric estimation and the problem of geometric discovery. arXiv, 2013.

[15] J. M. Lee. Riemannian Manifolds: An Introduction to Curvature (Graduate Texts in Mathematics). Springer, 1997.

[16] A. Zomorodian and G. Carlsson. Computing persistent homology. Discrete & Computational Geometry, 33(2):249–274, 2004.

[17] G. Carlsson. Topology and data. Bulletin of the American Mathematical Society, 46(2):255–308, 2009.

[18] M. Bernstein, V. De Silva, J. C. Langford, and J. B. Tenenbaum. Graph approximations to geodesics on embedded manifolds. Technical report, Department of Psychology, Stanford University, 2000.

[19] F. Chazal, M. Glisse, C. Labruère, and B. Michel. Convergence rates for persistence diagram estimation in topological data analysis. Journal of Machine Learning Research, 16(1):3603–3635, 2015.

[20] C. R. Genovese, M. Perone-Pacifico, I. Verdinelli, and L. Wasserman. Minimax manifold estimation. Journal of Machine Learning Research, 13(1):1263–1291, 2012.

[21] G. Carlsson, T. Ishkhanov, V. De Silva, and A. Zomorodian. On the local behavior of spaces of natural images. International Journal of Computer Vision, 76(1):1–12, 2008.

[22] P. Lawson, A. B. Sholl, J. Q. Brown, B. T. Fasy, and C. Wenk. Persistent homology for the quantitative evaluation of architectural features in prostate cancer histology. Scientific Reports, 9(1):1–15, 2019.

[23] J. M. Chan, G. Carlsson, and R. Rabadan. Topology of viral evolution. Proceedings of the National Academy of Sciences, 110(46):18566–18571, 2013.

[24] P. G. Cámara, A. J. Levine, and R. Rabadán. Inference of ancestral recombination graphs through topological data analysis. PLoS Computational Biology, 12(8):e1005071, 2016.

[25] E. Abbott. Flatland: A Romance of Many Dimensions. Princeton University Press, 1991.

[26] M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. Advances in Neural Information Processing Systems, 14:585–591, 2001.
[27] M. Reuter, F-E. Wolter, and N. Peinecke. Laplace–Beltrami spectra as 'Shape-DNA' of surfaces and solids. Computer-Aided Design, 38(4):342–366, 2006.

[28] M. Belkin, J. Sun, and Y. Wang. Constructing Laplace operator from point clouds in R^d. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1031–1040, 2009.

[29] J. Liang, R. Lai, T. W. Wong, and H. Zhao. Geometric understanding of point clouds using Laplace-Beltrami operator. In IEEE Conference on Computer Vision and Pattern Recognition, pages 214–221, 2012.

[30] N. G. Trillos, M. Gerlach, M. Hein, and D. Slepčev. Error estimates for spectral convergence of the graph Laplacian on random geometric graphs toward the Laplace–Beltrami operator. Foundations of Computational Mathematics, 20(4):827–887, 2020.

[31] H. P. McKean Jr. and I. M. Singer. Curvature and the eigenvalues of the Laplacian. Journal of Differential Geometry, 1(1-2):43–69, 1967.

[32] B. Andrews. Lectures on Differential Geometry. Australian National University. https://maths-people.anu.edu.au/~andrews/DG.

[33] I. T. Jolliffe and J. Cadima. Principal component analysis: a review and recent developments. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374(2065):20150202, 2016.

[34] H. Federer. Curvature measures. Transactions of the American Mathematical Society, 93(3):418–491, 1959.

[35] P. Niyogi, S. Smale, and S. Weinberger. Finding the homology of submanifolds with high confidence from random samples. Discrete & Computational Geometry, 39(1-3):419–441, 2008.

[36] U. Ozertem and D. Erdogmus. Locally defined principal curves and surfaces. Journal of Machine Learning Research, 12:1249–1286, 2011.

[37] C. R. Genovese, M. Perone-Pacifico, I. Verdinelli, and L. Wasserman. Nonparametric ridge estimation. The Annals of Statistics, 42(4):1511–1545, 2014.

[38] R. W. Buccigrossi and E. P. Simoncelli. Image compression via joint statistical characterization in the wavelet domain. IEEE Transactions on Image Processing, 8(12):1688–1701, 1999.

[39] J. Malik, S. Belongie, T. Leung, and J. Shi. Contour and texture analysis for image segmentation. International Journal of Computer Vision, 43(1):7–27, 2001.

[40] A. B. Lee, K. S. Pedersen, and D. Mumford. The nonlinear statistics of high-contrast patches in natural images. International Journal of Computer Vision, 54(1-2):83–103, 2003.

[41] J. H. van Hateren and A. van der Schaaf. Independent component filters of natural images compared with simple cells in primary visual cortex. Proceedings: Biological Sciences, 265(1394):359–366, 1998.

[42] 10x Genomics. PBMCs from a Healthy Donor: Whole Transcriptome Analysis. https://support.10xgenomics.com/single-cell-gene-expression/datasets/4.0.0/Parent_NGSC3_DI_PBMC, 2020.

[43] B. Pijuan-Sala, J. A. Griffiths, C. Guibentif, T. W. Hiscock, W. Jawaid, F. J. Calero-Nieto, C. Mulas, X. Ibarra-Soria, R. C. V. Tyser, D. L. L. Ho, W. Reik, S. Srinivas, B. D. Simons, J. Nichols, J. C. Marioni, and B. Göttgens. A single-cell molecular map of mouse gastrulation and early organogenesis. Nature, 566(7745):490–495, 2019.

[44] 10x Genomics. 1.3 Million Brain Cells from E18 Mice. https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.3.0/1M_neurons, 2017.

[45] D. van Dijk, R. Sharma, J. Nainys, K. Yim, P. Kathail, A. J. Carr, C. Burdziak, K. R. Moon, C. L. Chaffer, D. Pattabiraman, B. Bierie, L. Mazutis, G. Wolf, S. Krishnaswamy, and D. Pe'er. Recovering gene interactions from single-cell data using data diffusion. Cell, 174(3):716–729, 2018.

[46] L. Haghverdi, M. Büttner, F. A. Wolf, F. Buettner, and F. J. Theis. Diffusion pseudotime robustly reconstructs lineage branching. Nature Methods, 13(10):845–848, 2016.

[47] A. Klimovskaia, D. Lopez-Paz, L. Bottou, and M. Nickel. Poincaré maps for analyzing complex hierarchies in single-cell data. Nature Communications, 11(1):1–9, 2020.

[48] S. Wang, J-R. Lin, E. D. Sontag, and P. K. Sorger. Inferring reaction network structure from single-cell, multiplex data, using toric systems theory. PLoS Computational Biology, 15(12):e1007311, 2019.

[49] M. Hein, J-Y. Audibert, and U. von Luxburg. Graph Laplacians and their convergence on random neighborhood graphs. Journal of Machine Learning Research, 8(48):1325–1370, 2007.

[50] D. Ting, L. Huang, and M. Jordan. An analysis of the convergence of graph Laplacians. arXiv, 2011.

[51] K. V. Mardia. Measures of multivariate skewness and kurtosis with applications. Biometrika, 57(3):519–530, 1970.

[52] P. Campadelli, E. Casiraghi, C. Ceruti, and A. Rozza. Intrinsic dimension estimation: Relevant techniques and a benchmark framework. Mathematical Problems in Engineering, 2015:1–21, 2015.

[53] A. Butler, P. Hoffman, P. Smibert, E. Papalexi, and R. Satija. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nature Biotechnology, 36(5):411–420, 2018.

[54] Y. Hu, M. Ranganathan, C. Shu, X. Liang, S. Ganesh, A. Osafo-Addo, C. Yan, X. Zhang, B. E. Aouizerat, J. H. Krystal, D. C. D'Souza, and K. Xu. Single-cell transcriptome mapping identifies common and cell-type specific genes affected by acute delta9-tetrahydrocannabinol in humans. Scientific Reports, 10(1):1–14, 2020.

[55] K. Xie, Y. Huang, F. Zeng, Z. Liu, and T. Chen. scAIDE: clustering of large-scale single-cell RNA-seq data reveals putative and rare cell types. NAR Genomics and Bioinformatics, 2(4), 2020.