key: cord-0331423-xdjf3h44
authors: Xia, Cedric Huchuan; Barnett, Ian; Tapera, Tinashe M.; Cui, Zaixu; Moore, Tyler M.; Adebimpe, Azeez; Rush-Goebel, Sage; Piiwaa, Kayla; Murtha, Kristin; Linguiti, Sophia; Leibenluft, Ellen; Brotman, Melissa A.; Martin, Melissa Lynne; Calkins, Monica E.; Roalf, David R.; Wolf, Daniel H.; Bassett, Danielle S.; Lydon-Staley, David M.; Baker, Justin T.; Ungar, Lyle; Satterthwaite, Theodore D.
title: Mobile Footprinting: Linking Individual Distinctiveness in Mobility Patterns to Mood, Sleep, and Brain Functional Connectivity
date: 2021-05-18
journal: bioRxiv
DOI: 10.1101/2021.05.17.444568
sha: 9b589b8b58023827eff760942cfb384f0808c116
doc_id: 331423
cord_uid: xdjf3h44

Mapping individual differences in behavior is fundamental to personalized neuroscience. Here, we establish that statistical patterns of smartphone-based mobility features represent unique “footprints” that allow individual identification. Critically, mobility footprints exhibit varying levels of person-specific distinctiveness and are associated with individual differences in affective instability, circadian irregularity, and brain functional connectivity. Together, this work suggests that real-world mobility patterns may provide an individual-specific signature linking brain, behavior, and mood.

Linking individual differences in behavior to brain function is a central task of behavioral neuroscience [1]. However, quantifying complex human behavior in real-world settings remains a challenge. One alternative to standard behavioral assessment is digital phenotyping, which uses mobility data from personal smartphones to quantify moment-by-moment human behavior [2]. Prior work has associated geolocation features with important clinical outcomes in psychiatric disorders such as bipolar disorder and schizophrenia [3], and has linked accelerometer metrics to post-surgical recovery [4,5].
Furthermore, researchers have recently begun to capitalize on the substantial variability of behavior assessed with digital phenotyping to link individual differences in brain and behavior. For example, lower prefrontal activity during processing of negative emotions has been associated with individual exposure to urban green space [6], while greater functional coupling of the hippocampus and striatum has been linked to location variability [7].

While these studies suggest that digital phenotyping can be a powerful tool for studying individual differences, it remains unknown whether mobility patterns are in fact person-specific. Recent high-impact work has established that individual humans have unique patterns of functional brain connectivity [8,9]. The uniqueness of such brain-based "fingerprints" (also called "connectotypes" [10]) has been associated with development, cognition, and psychiatric conditions [11]. Establishing analogous person-specific mobility patterns, or mobility "footprints", would constitute an important advance in behavioral neuroscience and provide the foundation for targeted, individual-specific interventions. Accordingly, here we test the hypothesis that mobility patterns derived from personal smartphones can be used to create person-specific behavioral footprints. Furthermore, we evaluate whether the distinctiveness of these footprints is related to individual differences in mood, sleep, and brain functional connectivity.

As part of a study of trans-diagnostic affective instability in youth, we tracked 3,317 person-days of geolocation and 2,972 person-days of accelerometer data from 41 adolescents and young adults (28 females; mean [s.d.] age = 23.4 [3.5] years, range 17-30 years), approximately 3 months per individual (Fig. 1a, Supplementary Fig. 1).
In this sample, 93% of participants reported clinically significant affective instability in the context of psychiatric disorders (especially borderline personality disorder; see Supplementary Table 1). After applying hot-deck imputation to missing GPS data as implemented in the Smartphone Sensor Pipeline [13], we constructed the daily mobility trajectory for each participant (Fig. 1b & c; see Online Methods). Instead of using raw coordinates, which would allow trivial individual identification (and raise privacy concerns) given a participant's exact location, we extracted high-level summary statistics of mobility features. These features (15 geolocation-based and 7 accelerometer-based) included time spent at home, number of locations visited, and many others (Fig. 1d, Supplementary Table 2).

to-2D standard space projection and were further imputed for missing data. b) For each individual, we constructed daily personal mobility trajectories, which consist of flights (movement) and pauses (stationary segments). The length of each line represents the duration of a flight and the size of each circle represents the duration of a pause. Warm and cold colors indicate daytime and nighttime, respectively. c) A representative week of trajectories is shown, which demonstrates the rich characteristics of personal mobility patterns formed over time. d) We extracted timeseries of mobility statistics (e.g., daily time spent at home) from geolocation and accelerometer data that parameterize movement characteristics over weeks to months. The example represented all 110 days of participants' geolocation metrics recorded. e) For each individual, we constructed a covariance matrix from the mobility metric timeseries. Each cell of the matrix was populated by the Pearson correlation between a given pair of mobility metrics. Warm and cold colors indicate positive and negative correlations, respectively.
f) We randomly divided the data into two equally sized parts, called the reference and target sets. Subject X from the target set was matched to the subject in the reference set whose footprint correlated most highly with it (argmax(r1, r2, ..., rN)). The identification was considered correct when the underlying data came from the same subject; otherwise, the identification was considered incorrect. We quantified individual identification accuracy as the proportion of correct identifications across the entire sample; this procedure was repeated 1,000 times across different random partitions of the data.

When tracked over weeks to months, these timeseries of mobility statistics captured rich characteristics of individual mobility patterns. One illustrative example of the sensitivity of the timeseries to changes in mobility patterns came when the COVID-19 pandemic hit the Philadelphia area towards the end of the study period. Participants who were still engaged in active data collection (n=3) exhibited dramatic shifts in mobility features (Supplementary Fig. 2). Of note, as the data points during COVID-19 represented merely 1.1% of all data, the findings reported below did not change significantly when these data were removed.

Drawing on prior work on brain connectome "fingerprinting" [8,10], we created a covariance matrix of each participant's fifteen geolocation-based and seven accelerometer-based mobility feature timeseries to identify individuals (Fig. 1e), akin to a person-specific mobility "footprint." Data from each individual were partitioned into two groups: the target partition and the reference partition. For each individual, the data in the target partition were separately correlated with every individual's data in the reference partition; this procedure yielded 41 correlation values.
A correct identification was declared only when the maximum correlation was between target and reference data belonging to the same individual (Fig. 1f). To ensure that the random partitioning of the data did not impact results, this matching procedure was repeated for each individual 1,000 times (Online Methods).

Initial inspection across random partitions of the data revealed substantially greater correlation between mobility footprints within participants than between participants (Fig. 2a). Permutation testing on the entire sample revealed that individuals could be successfully identified using their mobility footprints (p < 0.001; Fig. 2b). Across 1,000 random data partitions, the mean individual identification accuracy was 63%. Critically, this accuracy was far better than chance performance determined by a permuted null distribution (mean: 3% accuracy; see

Moving beyond aggregate measures of accuracy across the group, we next investigated whether certain individuals could be consistently identified more accurately than others. Similar to prior studies of brain connectome fingerprinting [8,10,11], we refer to this measure as an individual's "footprint distinctiveness". Notably, individuals exhibited a wide distribution of footprint distinctiveness, ranging from 4% to 99% (Fig. 2c). In other words, certain participants had mobility patterns so distinct that they were correctly identified nearly every time; other participants were difficult to identify. Nonetheless, permutation testing showed that all participants had significant footprint distinctiveness compared to the null distribution.

As the group- and individual-level accuracy results reported thus far were based on the combination of geolocation and accelerometer features, we next examined each feature set separately.
Individual footprint distinctiveness derived from geolocation was not correlated with that derived from accelerometer data (r = 0.18; p = 0.26). Interestingly, while accelerometer data alone yielded lower identification accuracy (28%) than geolocation data (55%), combining these features resulted in higher identification accuracy, suggesting that they encode complementary information (Fig. 2d). Importantly, individual identification accuracy was stable across different inclusion thresholds for data missingness and was robust to removal of individual mobility features (Supplementary Fig. 3).

was readily apparent that mobility features were more highly correlated within participants (on diagonal) across data partitions than between participants (off diagonal). Note that this visualization was not used in statistical analysis or individual identification. b) Across 1,000 random partitions, mobility footprinting enabled successful individual identification (mean: 63%, S.D.: 6%). In contrast, the mean chance accuracy from 1,000 permutations was 3% (inset, p < 0.001). c) For each individual, we calculated the footprint distinctiveness, or the percentage of correct identifications across the 1,000 random partitions of the data. Ranked in ascending order, participants' footprint distinctiveness exhibited a wide range, from 4% to 99%. However, even the lowest footprint distinctiveness was significantly higher than the null distribution. d) Individual identification based on geolocation alone had higher accuracy than identification based on accelerometer data alone. However, the two appeared to encode complementary features, as performance was maximal when both measures were used in footprinting.

We next investigated participant factors that influenced footprint distinctiveness. We found that data quantity (i.e., number of days recorded) was associated with footprint distinctiveness (Supplementary Fig. 4).
In contrast, the amount of missing data within a given day was unrelated to footprint distinctiveness. Based on this result, all subsequent analyses of individual differences related to footprint distinctiveness controlled for the number of days of data available.

As a next step, we evaluated whether footprint distinctiveness was related to age or sex in our sample of adolescents and young adults. We found that geolocation-based footprints became more distinct with age across the transition from adolescence to adulthood (partial r = 0.33, p < 0.05, Supplementary Fig. 5). Furthermore, female sex was associated with higher accelerometer-based footprint distinctiveness (Cohen's d = 1.27, p < 0.001, Supplementary Fig. 5).

We next evaluated how footprint distinctiveness was related to a key domain of psychopathology: affective instability. Affective instability is a major feature of many psychiatric disorders [14], including borderline personality disorder. Affective instability is particularly prominent in youth [15], and is an important predictor of suicide [16]. However, affective instability is often challenging to quantify using standard tools as it is fundamentally a dynamic measure [17]. To quantify affective instability, we capitalized on participant ratings of multiple mood features collected three times a day for two weeks using ecological momentary assessment. We hypothesized that individuals who had less predictable patterns of mobility (i.e., reduced footprint distinctiveness) would have higher levels of affective instability. While controlling for data quantity, age, sex, and the mean of mood ratings, we found that affective instability (measured by the root mean square of successive differences [18]) was associated with reduced footprint distinctiveness (partial r = -0.37, p < 0.05, Fig. 3a).
Furthermore, given well-established links between sleep disturbance and mood disorders [19], we also evaluated whether variability in sleep duration was associated with footprint distinctiveness. While controlling for covariates as above, we found that variability in sleep duration was similarly associated with reduced footprint distinctiveness (partial r = -0.36, p < 0.05, Fig. 3b).

items acquired three times a day, was associated with reduced footprint distinctiveness (r = -0.37, p < 0.05), after controlling for data quantity, age, sex, and mean level of mood ratings. b) Similarly, we found that increased variability in sleep duration was associated with reduced footprint distinctiveness (r = -0.36, p < 0.05), after controlling for covariates. c) Across functional brain networks, only greater connectivity within the somatomotor network had a significant association with footprint distinctiveness (r = 0.46, p < 0.05, corrected for multiple comparisons with the false discovery rate). d) Patterns of brain functional connectivity significantly predicted individual footprint distinctiveness using leave-one-out

As a final step, we investigated whether footprint distinctiveness was related to patterns of functional connectivity. Initially, we examined associations with a simple summary measure of high-dimensional functional connectivity data: the mean connectivity within each of seven canonical large-scale functional networks [20]. While controlling for covariates as above (as well as in-scanner motion) and correcting for multiple comparisons with the false discovery rate, we found that footprint distinctiveness was associated with greater connectivity within the somatomotor network (r = 0.46, pfdr = 0.03, Fig. 3c).
Previous work has demonstrated that somatomotor network connectivity develops over the lifespan (years) [21] and is altered acutely (days) during limb disuse [22]; our results further suggest that mobility patterns over weeks to months can be predicted by somatomotor network connectivity.

Lastly, we moved beyond the simple summary measure of mean network connectivity and investigated whether complex multivariate patterns of functional connectivity could predict footprint distinctiveness in unseen data. Given that there were far more features than participants, we used regularized regression with leave-one-out cross-validation and nested parameter tuning, followed by permutation testing to determine significance (Online Methods). We found that multivariate patterns of functional connectivity could predict footprint distinctiveness in unseen data (r = 0.29, p = 0.025; Fig. 3d). The predictive model yielded results that aligned with the mass-univariate analyses (Fig. 3c), suggesting that the multivariate model was driven in part by features linked to the somatomotor network (67% of edges selected by the model). Moreover, this model also revealed important features beyond the motor system, including increased connectivity between the frontoparietal and default mode systems (Fig. 3e).

Taken together, these results establish that mobility patterns collected from smartphones can be used to create a person-specific footprint. Notably, the distinctiveness of this footprint increased with age, was reduced in association with both affective instability and circadian irregularity, and was related to patterns of functional brain connectivity. These results align with impactful prior work on "connectome fingerprinting" [8] (or "connectotyping" [10]), which has shown that individuals can be identified based on their pattern of functional connectivity.
Interestingly, results from these prior studies have shown that, like the footprint distinctiveness examined here, connectome distinctiveness increases with age and is reduced in association with psychiatric symptoms [11].

Our finding that footprint distinctiveness is related to data quantity recalls recent work demonstrating that the ability to delineate person-specific functional brain networks depends in large part on the quantity of data available [23,24]. However, while accruing large amounts of functional imaging data is often difficult and expensive, passive collection of long timeseries of mobility data is both tolerable for participants and inexpensive. The high degree of scalability enabled by ubiquitous usage of personal smartphones will allow future studies to test the generalizability of these findings across different age groups and clinical samples. Moving forward, mobility-based digital biomarkers that combine objective measurement and individual-specific analysis of behavior may accelerate advances in personalized diagnostics for diverse psychiatric illnesses.

range 17-30 years) were enrolled as part of a study of affective instability in youth. Participants were recruited via the Penn/CHOP Lifespan Brain Institute or through the Outpatient Psychiatry Clinic at the University of Pennsylvania. Of these 41 participants, 38 met criteria for an Axis I psychiatric diagnosis based on a semi-structured clinical interview [1]; 33 met criteria for more than one disorder (Supplementary Table 1). Additionally, 16 of the 41 participants met criteria for a personality disorder (mainly borderline personality disorder) based on assessment with the SCID-II [1]. All participants provided informed consent to all study procedures; for minors, a parent or guardian provided informed consent and the minor assented as well.
This study was approved by the University of Pennsylvania Institutional Review Board.

Mobility data acquisition

Global Positioning System (GPS) geolocation data were acquired via the Beiwe platform [2]. Participants were asked to download the Beiwe application on their personal smartphone. The application recorded the location of the participant's phone in latitude, longitude, and altitude, as well as the precision of that measure. To conserve battery and minimize degradation of phone performance, Beiwe was designed to track participants' geolocation in a periodic fashion. Specifically, Beiwe tracked GPS for 2 minutes every 20 minutes, resulting in 144 minutes of data recording and 1,296 minutes of dormancy in a 24-hour cycle. Due to user- and device-related factors in the naturalistic setting, such as the phone being powered off, having no cell signal, or being in airplane mode, longer periods of recording dormancy were possible. Mobility data were automatically uploaded via WiFi to a cloud-based data management system daily.

In total, 3,317 days of GPS tracking across all participants were obtained (mean (s.d.) = 77 (26) days, range 14-132 days, see Supplementary Figure 1). After removing the first and last days of each participant's study period, when only partial data were recorded, and days containing no data, 3,156 days remained available for analysis. Accelerometer data were also acquired via the Beiwe platform.

Raw GPS data were processed using the Smartphone Sensor Pipeline [3], a validated pipeline specifically designed to handle GPS data while accounting for data missingness. First, each subject's GPS longitude and latitude coordinates on the spherical Earth's surface were transformed to a standardized two-dimensional Cartesian plane, thus deidentifying the subject's real-world locations.
Second, the data were converted to a sequence of flights and pauses, where flights were defined as segments of linear movement and pauses were defined as periods of no movement. Finally, missing flights and pauses were imputed by the hot-deck method [4], which resamples from observed events over each missing interval.

Mobility metrics calculation

Using the constructed subject mobility traces and the Smartphone

Mobility footprint construction

Inspired by person-specific connectome fingerprints [6,7], we constructed a mobility footprint for each participant using the covariance matrix of mobility metrics. First, we extracted the mobility metric time series by concatenating the daily mobility metric output from the Smartphone Sensor Pipeline. Then we computed the pairwise Pearson correlation for all the mobility metrics to construct a covariance matrix. The nodes of the network were the mobility metrics, and the edges of the network were the Pearson correlation coefficients between metrics. We refer to the resulting covariance matrix as the "Mobility Footprint." This procedure was carried out separately for GPS- and accelerometer-based mobility data. For the main analysis, the upper triangles of the resulting covariance matrices from GPS and accelerometer metrics were concatenated and used as input features for the individual identification procedure. We also repeated the identification procedure using GPS or accelerometer features alone.

As a sensitivity analysis to test the performance of alternative features for individual identification, we also computed the mean and the stability of each measure and used these features to identify participants. Stability was defined as the root mean square of the successive differences (RMSSD) [8] of each measure (Supplementary Figure 6).
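The footprint construction and the RMSSD stability measure described above can be sketched in a few lines. This is a minimal illustration in Python using only the standard library, not the authors' R implementation: the function names (`pearson`, `mobility_footprint`, `rmssd`) and the input layout (one list of metric values per day) are assumptions made for the example, and returning 0.0 for a zero-variance metric is likewise an illustrative choice rather than something the text specifies.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Assumption for this sketch: a constant metric is treated as uncorrelated.
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mobility_footprint(days):
    """Upper triangle of the metric-by-metric correlation matrix.

    `days` is a list of per-day metric vectors (rows = days, columns = metrics).
    The result has k * (k - 1) / 2 entries for k metrics.
    """
    n_metrics = len(days[0])
    # Turn the day-by-metric table into one time series per metric.
    series = [[day[j] for day in days] for j in range(n_metrics)]
    return [
        pearson(series[i], series[j])
        for i in range(n_metrics)
        for j in range(i + 1, n_metrics)
    ]


def rmssd(values):
    """Root mean square of successive differences of a time series."""
    if len(values) < 2:
        raise ValueError("RMSSD needs at least two observations")
    squared = [(b - a) ** 2 for a, b in zip(values, values[1:])]
    return math.sqrt(sum(squared) / len(squared))
```

Under this layout, 15 geolocation metrics give 105 edges and 7 accelerometer metrics give 21 edges, so the combined feature vector would be the concatenation of the two upper triangles, e.g. `mobility_footprint(gps_days) + mobility_footprint(accel_days)` for hypothetical inputs `gps_days` and `accel_days`.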
Individual identification procedure ("footprinting")

We randomly partitioned each subject's data into two equally sized parts, named the "reference" and the "target", respectively [6]. The objective of the individual identification procedure was to match the subject from the target group to the same subject in the reference group. For a given subject, $S$, we computed the Pearson correlation ($r$) between that subject's features in the target group, $S_T$, and everyone's features in the reference group, $S_{R_1}, S_{R_2}, \dots, S_{R_N}$, where $N$ is the total number of participants.

Individual identification was operationalized as the maximum of the resulting $r_1, r_2, \dots, r_N$. In other words, the subject in the reference group whose mobility features correlated maximally with those of the target subject was taken as the match:

$$\hat{S} = \underset{j \in \{1, \dots, N\}}{\arg\max}\; r\left(S_T, S_{R_j}\right)$$

The two were declared correctly matched when the matched reference subject was the target subject. The individual identification accuracy was the number of correct identifications divided by the total number of random data partitions P.

The above individual identification procedure was repeated 1,000 times, each time with a new random data partition (P). We calculated the average individual identification accuracy across the 1,000 runs, which yielded a distribution of sample-wise identification accuracy. Furthermore, we also calculated the accuracy for each participant, defined as the number of correct identifications for that specific participant divided by the number of data partitions (B). We refer to this participant-specific identification accuracy as the individual footprint distinctiveness:

$$D_i = \frac{1}{B} \sum_{b=1}^{B} \mathbb{1}\left[ M_i^{(b)} = i \right]$$

where $T_i^{(b)}$ is the target in partition $b$ for subject $i$, and $M_i^{(b)}$ is the subject matched to it. We conducted the individual identification procedure using the covariance matrix of the GPS data, the accelerometer data, and the combined feature set. Sensitivity analyses examined the mean and variance of each feature.
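To make the matching and scoring steps concrete, the procedure above can be sketched as follows. This is a hedged, standard-library Python illustration rather than the published R code: `run_footprinting`, `split_half`, and the other names are hypothetical, the per-partition accuracy is computed as the proportion of correctly matched subjects (as in the Fig. 1f description), and the zero-variance guard in `pearson` is an assumption of the sketch.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant input (sketch assumption)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def footprint(days):
    """Upper triangle of the metric-by-metric correlation matrix."""
    k = len(days[0])
    series = [[day[j] for day in days] for j in range(k)]
    return [pearson(series[i], series[j]) for i in range(k) for j in range(i + 1, k)]


def split_half(days, rng):
    """Randomly split one subject's days into reference and target halves."""
    order = list(range(len(days)))
    rng.shuffle(order)
    half = len(order) // 2
    return [days[i] for i in order[:half]], [days[i] for i in order[half:]]


def run_footprinting(subject_days, n_partitions=1000, seed=0):
    """Return (mean identification accuracy, per-subject distinctiveness)."""
    rng = random.Random(seed)
    n = len(subject_days)
    hits = [0] * n          # correct identifications per subject
    accuracies = []         # proportion correct in each partition
    for _ in range(n_partitions):
        halves = [split_half(days, rng) for days in subject_days]
        references = [footprint(ref) for ref, _ in halves]
        targets = [footprint(tgt) for _, tgt in halves]
        correct = 0
        for i, target in enumerate(targets):
            r = [pearson(target, ref) for ref in references]
            match = max(range(n), key=lambda j: r[j])  # argmax over r_1..r_N
            if match == i:
                hits[i] += 1
                correct += 1
        accuracies.append(correct / n)
    mean_accuracy = sum(accuracies) / n_partitions
    distinctiveness = [h / n_partitions for h in hits]
    return mean_accuracy, distinctiveness
```

In this sketch the mean of the per-subject distinctiveness values equals the mean accuracy across partitions, since both count the same correct matches. Each subject needs enough days for both halves to contain variation in every metric.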
Similarity matrix construction

To visualize the individual footprint distinctiveness, we constructed similarity matrices between participants' mobility covariance features [9]. First, we concatenated the daily mobility metrics for a participant from multiple random data partitions. Next, a similarity matrix was constructed by computing the Pearson correlation coefficients between every pair of participants. The resulting matrix was symmetric, where the nodes were the participants and the edges were the correlation coefficients between any two participants' mobility metrics. This grouping procedure was performed solely for visualization, highlighting the within-individual, across-partition block structures on the diagonal of the matrix. This grouping was not used in any statistical analysis.

Permutation testing

To assess the statistical significance of individual identification accuracy, we used a permutation testing procedure to create a null distribution of accuracy. Specifically, we randomly scrambled the identity of the daily mobility metrics, thus disrupting the linkage between the mobility data and the corresponding participant. We repeated the individual identification procedure for each random permutation. The empirical p-value was then calculated as the proportion of times when the permuted data yielded higher accuracy than the original data:

$$p = \frac{1}{M} \sum_{m=1}^{M} \mathbb{1}\left[ A_m^{\mathrm{perm}} > A \right]$$

where $A$ is the individual identification accuracy of the original data, $A_m^{\mathrm{perm}}$ is the accuracy obtained in the $m$-th permutation, and $M$ is the total number of permutations.

Sensitivity analysis of data missingness

To understand the effect of data missingness on our ability to identify participants' mobility footprint, we conducted sensitivity analyses that used four sets of data constructed using different thresholds for data missingness [3,10].
Specifically, we applied four thresholds with diminishing tolerance for the number of missing samples (i.e., minutes recorded) in a day's worth of data to be included in analysis (Supplementary Figure 1). percentile, a further 171 days were removed, resulting in 2,413 days remaining for analysis. Using these four sub-samples constructed with different inclusion criteria, we then repeated the individual identification procedure and permutation testing as described above.

Feature lesion analysis

To further investigate the influence of any single feature on the individual identification accuracy, we conducted a feature lesion analysis. We sequentially removed one metric (out of the total 15 geolocation mobility metrics available) and constructed a new covariance matrix that had one node (and 14 edges) fewer than the original feature covariance matrix. Using this reduced feature set, we repeated the individual identification and permutation testing procedures as described above (Supplementary Figure 3).

Ecological momentary assessment

Using the Beiwe platform application on personal smartphones, participants completed daily questionnaires specifically designed to assess mood variability at three timepoints throughout the day [11]. In each survey, participants rated, on a scale from 1 ("not at all") to 7 ("extremely"), their endorsement of seven statements assessing mood variability, aggression, impulsivity, and self-esteem since the last time they had answered the survey (Supplementary Table 3). All seven items were concatenated to create an overall mood scale. Additionally, every morning, participants were asked about their sleep patterns and quality from the night before. To quantify the variability of answers to the mood survey, we calculated the root mean square of successive differences (RMSSD) between concatenated answers.
Similarly, we also calculated the RMSSD of sleep duration as a measurement of its stability. We built a generalized additive model (GAM) to investigate the association between mood and sleep duration stability while accounting for covariate effects including data quantity, sex, age, and mean levels of the measure. Age was modeled using penalized splines within the GAM, using restricted maximum likelihood (REML) to estimate linear and nonlinear developmental effects without over-fitting the data [12,13].

Functional Connectivity Analysis

Imaging Acquisition

As previously described [14], structural and functional MRI scans were acquired in a single session on a clinically approved 3 Tesla Siemens Prisma (Erlangen, Germany) quadrature body-coil scanner and a Siemens receive-only 64-channel head coil at the Hospital of the University of Pennsylvania.

and used as T1w-reference throughout the workflow. The T1w-reference was then skull-stripped with a Nipype implementation of the antsBrainExtraction.sh workflow (from ANTs), using OASIS30ANTs as target template. Brain tissue segmentation of cerebrospinal fluid (CSF), white matter (WM), and gray matter (GM) was performed on the brain-extracted T1w using FAST in FSL 5.0.9 [21]. Volume-based spatial normalization to MNI2009c standard space was performed through nonlinear registration with antsRegistration (ANTs 2.2.0), using brain-extracted versions of both the T1w reference and the T1w template.

BOLD runs were first slice-time corrected using 3dTshift from AFNI 20160207 [22] and then motion corrected using mcflirt (FSL 5.0.9) [21]. A fieldmap was estimated based on a phase-difference map calculated with a dual-echo GRE sequence, processed with a custom workflow of SDCFlows inspired by the epidewarp.fsl script and further improvements in HCP Pipelines [23].
The fieldmap was then co-registered to the target EPI reference run and converted to a displacement field map with FSL's fugue and other SDCFlows tools. Based on the estimated susceptibility distortion, a corrected BOLD reference was calculated for a more accurate co-registration with the anatomical reference. The BOLD reference was then co-registered to the T1w reference using bbregister (FreeSurfer), which implements boundary-based registration [24]. Co-registration was configured with nine degrees of freedom to account for distortions remaining in the BOLD reference. Six head-motion parameters (corresponding rotation and translation parameters) were estimated before any spatiotemporal filtering using mcflirt. Finally, the motion correcting transformations, field distortion correcting warp, BOLD-to-T1w transformation, and T1w-to-template (MNI) warp were concatenated and applied to the BOLD timeseries in a single step using antsApplyTransforms (ANTs) with Lanczos interpolation.

After pre-processing with fMRIPrep, confound regression was carried out in XCP Engine. Preprocessed timeseries were despiked and then de-noised using a 36-parameter confound regression model that has been shown to minimize the impact of motion artifact [25]. Specifically, the confound regression model included the six framewise estimates of motion, the mean signal extracted from eroded white matter and cerebrospinal fluid compartments, the global signal, the derivatives of each of these nine parameters, and quadratic terms of each of the nine parameters as well as their derivatives. Both the BOLD-weighted time series and the confound regressor timeseries were temporally filtered simultaneously using a first-order Butterworth filter with a passband between 0.01 and 0.08 Hz to avoid mismatch in the temporal domain [26]. Confound regression was performed using AFNI's 3dTproject.
Note that in-scanner head motion was also included as a covariate in all regression models (see below).

Functional network and community connectivity

Functional connectivity between each pair of brain regions was quantified as the Fisher-transformed Pearson correlation coefficient between the mean regional BOLD time series. For each participant, a 200 × 200 weighted adjacency matrix encoding the connectome was constructed 27. Each node was assigned to one of seven canonical functional brain modules or communities defined by Yeo et al 28. The within-community connectivity is defined as

$$\frac{\sum_{j \neq j';\; j, j' \in C_k} A^{(i)}_{jj'}}{|C_k|\,(|C_k| - 1)}$$

where $A^{(i)}_{jj'}$ is the weighted edge strength between node $j$ and node $j'$, both of which belong to the same community $C_k$, for the $i$-th subject. The cardinality of the community assignment vector, $|C_k|$, represents the number of nodes in the $k$-th community 29.

Mass-univariate analysis

For each of the seven canonical networks, we fit a generalized additive model (GAM) to investigate the relationship between within-network connectivity and footprint distinctiveness, while controlling for in-scanner motion, mobility data quantity, sex, and age. Specifically, we used penalized splines with restricted maximum likelihood (REML) within GAM to estimate linear and nonlinear age-related changes 12,13. We controlled for multiple comparisons using the False Discovery Rate (Q<0.05).

Predicting footprint distinctiveness using functional connectivity

We fit a penalized regression model to predict footprint distinctiveness using brain functional connectivity 6. In each iteration of leave-one-out cross-validation, one subject was left out as the testing set and the rest served as the training set. Using the training set, we computed residualized footprint distinctiveness from a GAM with covariates as above (linear terms for in-scanner motion, data quantity, and sex; age was modeled as a penalized spline).
Then we fit a lasso regression model to predict the residualized footprint distinctiveness using a sparse collection of functional connectivity edges. The L1 lasso hyperparameter was tuned in a nested leave-one-out fashion. Next, we calculated the predicted footprint distinctiveness for the unseen subject in the testing set. After all iterations, we obtained predicted footprint distinctiveness for all participants and then calculated the Pearson correlation between the actual footprint distinctiveness and predicted values.

Code availability and data access

The code for GPS data preprocessing, mobility metric extraction, individual identification, additional analysis, and data visualization is available in R on GitHub:

Persistence of psychosis spectrum symptoms in the Philadelphia Neurodevelopmental Cohort: a prospective two-year follow-up
New Tools for New Research in Psychiatry: A Scalable and Customizable Platform to Empower Data Driven Smartphone Research
Inferring mobility measures from GPS traces with missing data
A Review of Hot Deck Imputation for Survey Non-response
RAPIDS: Reproducible Analysis Pipeline for Data Streams Collected with Mobile Devices
Functional connectome fingerprinting: identifying individuals using patterns of brain connectivity
Delayed stabilization and individualization in connectome development are related to psychiatric disorders
An Overview of Heart Rate Variability Metrics and Norms
Functional Brain Networks Are Dominated by Stable Group and Individual Factors, Not Cognitive or Daily Variation
Sociodemographic Characteristics of Missing Data in Digital
Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models
Stable and Efficient Multiple Smoothing Parameter Estimation for Generalized Additive Models
Accelerated cortical thinning within structural brain networks is associated with irritability in youth
fMRIPrep: a robust preprocessing pipeline for functional MRI
Nipype: A Flexible, Lightweight and Extensible Neuroimaging Data Processing Framework in Python
PennBBL/xcpEngine: atlas in MNI2009
Mitigating head motion artifact in functional connectivity MRI
N4ITK: improved N3 bias correction
Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain
Improved Optimization for the Robust and Accurate Linear Registration and Motion Correction of Brain Images
AFNI: software for analysis and visualization of functional magnetic resonance neuroimages
The minimal preprocessing pipelines for the Human Connectome Project
Accurate and robust brain image alignment using boundary-based registration
Benchmarking of participant-level confound regression strategies for the control of motion artifact in studies of functional connectivity
The nuisance of nuisance regression: Spectral misspecification in a common approach to resting-state fMRI preprocessing reintroduces noise and obscures functional connectivity
Local-Global Parcellation of the Human Cerebral Cortex from Intrinsic Functional Connectivity MRI
The organization of the human cerebral cortex estimated by intrinsic functional connectivity
Multi-scale network regression for brain-phenotype associations