key: cord-0552862-5rsn1emn
authors: Arcolezi, Héber H.; Couchot, Jean-François; Renaud, Denis; Bouna, Bechara Al; Xiao, Xiaokui
title: Differentially Private Multivariate Time Series Forecasting of Aggregated Human Mobility With Deep Learning: Input or Gradient Perturbation?
date: 2022-05-01
journal: nan
DOI: nan
sha: a23fe664343cdd7d7b6c9b67fc4e3db8f2be8d96
doc_id: 552862
cord_uid: 5rsn1emn

This paper investigates the problem of forecasting multivariate aggregated human mobility while preserving the privacy of the individuals concerned. Differential privacy, a state-of-the-art formal notion, has been used as the privacy guarantee in two different and independent steps when training deep learning models. On the one hand, we considered gradient perturbation, which uses the differentially private stochastic gradient descent algorithm to guarantee the privacy of each time series sample in the learning stage. On the other hand, we considered input perturbation, which adds differential privacy guarantees to each sample of the series before applying any learning. We compared four state-of-the-art recurrent neural networks: Long Short-Term Memory, Gated Recurrent Unit, and their Bidirectional architectures, i.e., Bidirectional-LSTM and Bidirectional-GRU. Extensive experiments were conducted with a real-world multivariate mobility dataset, which we published openly along with this paper. As shown in the results, differentially private deep learning models trained under gradient or input perturbation achieve nearly the same performance as non-private deep learning models, with a loss in performance ranging from 0.57% to 2.8%. The contribution of this paper is significant for those involved in urban planning and decision-making, providing a solution to the human mobility multivariate forecast problem through differentially private deep learning models.

Decisions such as efficiently planning a road network or choosing the optimal location for a hospital all rest on a precise understanding of human mobility. Mobile phone data such as call detail records (CDRs) have proven to be one of the most promising ways to analyze human mobility on a large scale due to the high penetration rates of cell phones [35, 14, 13]. CDRs are a type of metadata that describes users' activities in a cellular network (e.g., phone calls, SMS) with information such as the duration of communication, the antennas that handled the service (coarse-level location), and so on. For this reason, CDRs are commonly used by mobile network operators (MNOs) to enhance their services and for billing and legal purposes [39, 13]. Because both temporal and spatial information is available in CDRs, these data have become one of the most important data sources for research on human mobility [14, 30, 35, 8, 20]. Indeed, human mobility analysis can benefit individuals and society by enabling local authorities to improve urban planning, enhance the transportation system, and assist in decision-making to respond to critical situations (e.g., natural disasters [20, 28]). Within a recent context, due to the ongoing Coronavirus Disease 2019 (COVID-19) pandemic [56], on 8 April 2020 the European Commission asked MNOs in the European region to share anonymized and aggregated mobility data to help fight the outbreak [54, 24], which has also been done in other parts of the world as described in [8]. This vision is also shared by, e.g., Buckee et al. [14] and Oliver et al.
[39], who highlight the importance of aggregate mobility data and mobile phone data such as CDRs for fighting the COVID-19 outbreak. For instance, on analyzing the dataset at our disposal (further explained in Subsection 2.1), Fig. 1 illustrates aggregated mobility analytics in Paris, France, for two 14-day periods: from the beginning of 2020-04-21 to the end of 2020-05-03 and from the beginning of 2020-09-23 to the end of 2020-10-06. The left-side plot corresponds to mobility analytics during the first national lockdown period in France [1], and the right-side plot corresponds to a period with no lockdown measures. As one can notice, there is a clear difference between the first period of analysis (low mobility activity) and the second one (high mobility activity). This type of mobility analysis provides important insights into mobility patterns for public authorities and policymakers, for example [54, 8].

However, on analyzing mobility data, some studies have shown that humans follow particular patterns with high identifiability [36, 38] and, hence, users' location privacy is a major concern [36, 38, 11, 35, 7, 5]. Indeed, even though in CDRs the location information is at a coarse level (antennas that handled the service), collecting many imprecise locations can still lead to privacy leaks, such as revealing home or work addresses. Also, this is a scenario in which users cannot sanitize their data locally, since CDRs are automatically generated on MNOs' servers through the use of a service (e.g., making/receiving phone calls). To tackle these issues, the General Data Protection Regulation (GDPR) [3] as well as some data protection authorities, such as the Commission Nationale de l'Informatique et des Libertés (CNIL) [2] in France, require that MNOs anonymize "on-the-fly" CDRs used for purposes other than billing. More precisely, if CDRs are used for mobility analytics, these data must be processed within a required time interval (e.g., 15 minutes) if and only if there is a sufficient number of users present to reach a specific level of anonymity (i.e., "hide in the crowd"). This way, MNOs tend to publish aggregated mobility data [8, 57, 54, 53, 40], e.g., the number of users by coarse location at a given timestamp, which, in other words, represents a multivariate time series dataset that can be used for predictive mobility [30]. Nevertheless, as recent studies have shown, even aggregated mobility data can be subject to membership inference attacks [42, 43] and users' trajectory recovery attacks [53, 57], thus requiring proper sanitization. To tackle privacy concerns in data releases, research communities have proposed different methods to preserve privacy, with differential privacy (DP) [21, 22] standing out as a formal definition that allows quantifying the privacy-utility trade-off. Differential privacy has also been at the core of many privacy-preserving machine learning (ML) and deep learning (DL) models [48, 4, 60, 17, 31, 33], since predictive models are also subject to privacy attacks [16, 58, 50, 15, 49].

With these elements in mind, this paper contributes a comparative analysis of adding DP guarantees at two different and independent steps of training DL models to forecast multivariate aggregated human mobility data. On the one hand, we consider using gradient perturbation, which can be achieved by training DL models over original time-series data with the differentially private version of the stochastic gradient descent algorithm (DP-SGD) [4].
On the other hand, we consider using input data perturbation, i.e., training DL models with differentially private time series data. Notice that, while aggregated time-series data provides some anonymity-based protection, with the latter input perturbation setting, DP also adds a layer of protection against, e.g., data breaches [32], membership inference attacks [42, 43], and users' trajectory recovery attacks [53, 57]. It is worth mentioning that human mobility forecast information is of great importance for public and/or private organizations to identify strategies and propose better decision-making solutions for society [55, 35, 8, 39, 20, 28, 14, 54]. Therefore, in this paper, extensive experiments were carried out with a real-world mobility dataset collected by Orange [40] by analyzing CDRs of 6 coarse regions in Paris, France, which we publish as an open mobility dataset. More precisely, this paper benchmarks four state-of-the-art DL models with this dataset, i.e., recurrent neural networks (RNNs): Long Short-Term Memory (LSTM) [27], which is capable of learning long-term dependencies while overcoming the vanishing gradient problem of standard RNNs; Gated Recurrent Unit (GRU) [19], which is similar to LSTM but with a simpler architecture; and their Bidirectional [46] architectures, i.e., BiLSTM and BiGRU. Moreover, we also took users' privacy into consideration, adding DP guarantees to the predictive models and evaluating their utility loss in comparison with non-private DL models. To summarize, this paper makes the following contributions:

- Publish the real-world, CDRs-based, and multivariate (i.e., 6 coarse regions) aggregated mobility dataset openly in a Github repository.
- Benchmark four state-of-the-art RNNs (LSTM, GRU, BiLSTM, and BiGRU) with this dataset for one-step-ahead multivariate forecasting.
- Provide the first comparative evaluation of the impact of differential privacy guarantees when training DL models in both input and gradient perturbation settings, for multivariate time series forecasting.

Therefore, we intend this study to serve as a baseline against which other classical time series forecasting, ML, and privacy-preserving ML techniques can be tested and compared.

Outline. The remainder of this paper is organized as follows. In Section 2, we describe the material and methods used in this work, i.e., the mobility data and the problem statement, the DL methods, and the privacy guarantee, namely differential privacy, in both input and gradient perturbation settings. In Section 3, we present the experimental setup, our results with non-private DL models, and our results with differentially private DL models. In Section 4, we discuss our work with its limitations and future directions. Lastly, in Section 5, we present the concluding remarks.

In this section, we first describe the mobility dataset and the problem we intend to solve (Subsection 2.1). Next, we briefly describe the DL methods we consider in our experiments (Subsection 2.2). Lastly, we recall the privacy notion that we are considering, i.e., differential privacy, in both input and gradient perturbation settings (Subsection 2.3). The dataset at our disposal was provided by an MNO in France [40]; it contains anonymized and aggregated human mobility data resulting from analyzing CDRs "on-the-fly", following recommendations from both GDPR and CNIL.
This dataset comprises information for two periods: from 2020-04-20 to 2020-05-03 and from 2020-08-24 to 2020-11-04, with a time granularity of 30 minutes (min) and a spatial granularity of 6 coarse regions in Paris, France. More formally, this is a multivariate time series dataset $X_{(t_1, t_\tau)}$ with the aggregate number of people per region and the corresponding time period $t \in [1, \tau]$. That is, $X_{(t_1, t_\tau)} = [\langle t_1, \mathbf{x}_1 \rangle, \langle t_2, \mathbf{x}_2 \rangle, \ldots, \langle t_\tau, \mathbf{x}_\tau \rangle]$, in which $\mathbf{x}_t$ is a vector with each position representing the number of users per region at time $t \in [1, \tau]$. In this paper, we aim at forecasting the future number of people at the next 30-min interval in each of the 6 regions. Thus, given $X_{(t_1, t_\tau)}$, the goal is to forecast $X_{(t_{\tau+1})}$, i.e., one-step-ahead forecasting, which is unknown at time $\tau$. For the rest of this paper, we only utilize the second period of this dataset (i.e., from 2020-08-24 to 2020-11-04), which has aggregated mobility data for 72 days. For each week, coarse region, and 30-min interval, we used the interquartile range technique to detect outliers and missing data. These values were replaced with the average value for that respective week, coarse region, and 30-min interval. Table 1 presents descriptive statistics about this processed dataset with the following measures per region (labeled R1-R6): min, max, mean, standard deviation (std), and median.

To predict the number of users in each coarse region in a multivariate time series forecasting framework, we compared the performance of four state-of-the-art RNNs: LSTM [27], GRU [19], and their Bidirectional [46] architectures, i.e., BiLSTM and BiGRU. Indeed, RNNs are a specialized class of neural networks used to process sequential data (e.g., time-series data). RNNs have at least one feedback connection that provides the ability to use contextual information when mapping between input and output sequences [26]. The LSTM, GRU, and Bidirectional RNN methods are briefly described in the following subsections.

Long Short-Term Memory [27] is a type of RNN that overcomes the vanishing gradient problem of standard RNNs. The memory cell of LSTM divides its state into a long-term state $c^{(t)}$ and a short-term state $h^{(t)}$. The learning process is controlled by three gates: the input $i^{(t)}$, forget $f^{(t)}$, and output $o^{(t)}$ gates, which give LSTM the ability to learn which data in a sequence are important to keep or to discard. More precisely, both the input $x^{(t)}$ and the previous short-term state $h^{(t-1)}$ are fed to four different and fully connected layers. Then, the first layer computes the internal hidden state $g^{(t)}$, using $x^{(t)}$ and $h^{(t-1)}$, and partially stores $g^{(t)}$ in the long-term state. The remaining three layers are the nonlinear gating units. The forget gate $f^{(t)}$ controls which past information must be discarded or kept. The input gate $i^{(t)}$ controls which new information is to be added to the long-term state. Lastly, the output gate $o^{(t)}$ controls which information can be utilized for the output of the memory cell $y^{(t)}$. The mathematical formulation is as follows [27]:

$i^{(t)} = \sigma(W_{xi} x^{(t)} + W_{hi} h^{(t-1)} + b_i)$
$f^{(t)} = \sigma(W_{xf} x^{(t)} + W_{hf} h^{(t-1)} + b_f)$
$o^{(t)} = \sigma(W_{xo} x^{(t)} + W_{ho} h^{(t-1)} + b_o)$
$g^{(t)} = \tanh(W_{xg} x^{(t)} + W_{hg} h^{(t-1)} + b_g)$
$c^{(t)} = f^{(t)} \otimes c^{(t-1)} + i^{(t)} \otimes g^{(t)}$
$y^{(t)} = h^{(t)} = o^{(t)} \otimes \tanh(c^{(t)})$

in which $\otimes$ means element-wise multiplication, $\sigma$ is the sigmoid function, $W_*$ are weight matrices, and $b_*$ are the bias vectors.

Gated Recurrent Unit [19] is also a type of RNN, which works using the same principle as LSTM. GRU utilizes two gates: the update $z^{(t)}$ and reset $r^{(t)}$ gates, which decide what information should be passed to the output. More specifically, the reset gate $r^{(t)}$ controls how to combine the new input with the previous memory.
The update gate $z^{(t)}$ controls how much of the previous memory to keep. The mathematical formulation is as follows [19]:

$z^{(t)} = \sigma(W_{xz} x^{(t)} + W_{hz} h^{(t-1)} + b_z)$
$r^{(t)} = \sigma(W_{xr} x^{(t)} + W_{hr} h^{(t-1)} + b_r)$
$g^{(t)} = \tanh(W_{xg} x^{(t)} + W_{hg} (r^{(t)} \otimes h^{(t-1)}) + b_g)$
$h^{(t)} = z^{(t)} \otimes h^{(t-1)} + (1 - z^{(t)}) \otimes g^{(t)}$

Bidirectional RNN (BiRNN) [46] is a combination of two RNNs: one RNN moves forward while the other moves backward. That is, BiRNN connects two hidden layers of opposite directions to the same output. The RNN cells in a BiRNN can be standard RNNs, LSTMs, GRUs, and so on. This Bidirectional architecture allows the networks to have both backward and forward information about the sequence at every time step.

In recent years, differential privacy [21] has been increasingly accepted as the current standard for data privacy, with several large-scale implementations in the real world (e.g., [45, 6]). One key reason is that DP addresses the paradox of learning about a population while learning nothing about single individuals [22]. More specifically, the idea is that removing (or adding) a single row from the database should not affect the statistical results much. A formal definition of DP is given in the following [22]:

Definition 1 (($\epsilon, \delta$)-DP [22]) Given $\epsilon > 0$ and $0 \leq \delta < 1$, a randomized algorithm $\mathcal{A} : \mathcal{D} \rightarrow \mathcal{R}$ is said to provide $(\epsilon, \delta)$-differential privacy ($(\epsilon, \delta)$-DP) if, for all neighbouring datasets $D_1, D_2 \in \mathcal{D}$ that differ on the data of one user, and for all sets $R$ of outputs: $\Pr[\mathcal{A}(D_1) \in R] \leq e^{\epsilon} \Pr[\mathcal{A}(D_2) \in R] + \delta$.

The additive $\delta$ is interpreted as a probability of failure. Normally, a common choice is to set $\delta$ significantly smaller than $1/n$, where $n$ is the number of users in the database [22]. Throughout this paper, if $\delta = 0$, we will simply say that $\mathcal{A}$ is $\epsilon$-DP. Differential privacy possesses several important properties, highlighting its strength in comparison with other privacy models. For instance, with DP, there is no need to define the background knowledge that attackers might have, which is equivalent to assuming an attacker with unlimited resources. In addition, DP is immune to post-processing, which means it is not possible to make an $\epsilon$-DP mechanism less differentially private by evaluating any function $f$ of the response of the mechanism, given that there is no additional information about the database.

Proposition 1 (Post-Processing of DP [22]) If $\mathcal{A} : \mathcal{D} \rightarrow \mathcal{R}$ is $\epsilon$-DP, then $f(\mathcal{A})$ is also $\epsilon$-DP for any function $f$.

Furthermore, DP also composes well, which is one of the most powerful features of this privacy model. For instance, accounting for the overall privacy loss consumed in a pipeline of several DP algorithms applied to the same database is feasible due to composition. We recall the sequential composition that will be used in this paper in the following.

Proposition 2 (Sequential Composition [22]) Let $\mathcal{A}_1$ be an $\epsilon_1$-DP mechanism and $\mathcal{A}_2$ be an $\epsilon_2$-DP mechanism. Then, the mechanism $\mathcal{A}_{1,2}(D) = (\mathcal{A}_1(D), \mathcal{A}_2(D))$ is $(\epsilon_1 + \epsilon_2)$-DP.

Any mechanism that respects Definition 1 can be considered differentially private. Two widely used DP mechanisms for numeric queries (i.e., functions $f : \mathcal{D} \rightarrow \mathbb{R}$) are the Laplace mechanism [21] and the Gaussian mechanism [22]. One important parameter that determines how accurately we can answer the queries is their sensitivity. We recall the definition of $\ell_2$-sensitivity and the Gaussian mechanism that will be used in this paper in the following.

Definition 2 ($\ell_2$-sensitivity [22]) The $\ell_2$-sensitivity of a function $f : \mathcal{D} \rightarrow \mathbb{R}$, for all neighbouring datasets $D_1, D_2 \in \mathcal{D}$ that differ on the data of one user, is: $\Delta_2 = \max_{D_1, D_2} \| f(D_1) - f(D_2) \|_2$.

Definition 3 (Gaussian mechanism [22]) For a query function $f : \mathcal{D} \rightarrow \mathbb{R}$ over a dataset $D \in \mathcal{D}$ and for $\sigma = \frac{\Delta_2 \sqrt{2 \ln(1.25/\delta)}}{\epsilon}$, the Gaussian mechanism is defined as: $\mathcal{A}(D) = f(D) + \mathcal{N}(0, \sigma^2)$, in which $\mathcal{N}(0, \sigma^2)$ is the normal distribution centered at 0 with variance $\sigma^2$. For $\epsilon \in (0, 1)$, the Gaussian mechanism provides $(\epsilon, \delta)$-DP [22].
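To make Definition 3 concrete, the following is a minimal Python sketch of the Gaussian mechanism (Python being the language used in this paper's experiments). This is not the authors' code: the function name, the example counts, and the assumed sensitivity value of 1 (i.e., one user contributing to at most one regional count per release) are illustrative assumptions only.

```python
import numpy as np

def gaussian_mechanism(query_value, l2_sensitivity, epsilon, delta, rng=None):
    """Release query_value with (epsilon, delta)-DP via the Gaussian mechanism.

    sigma follows Definition 3: sigma = l2_sensitivity * sqrt(2 ln(1.25/delta)) / epsilon,
    which guarantees (epsilon, delta)-DP for epsilon in (0, 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = l2_sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return query_value + rng.normal(loc=0.0, scale=sigma, size=np.shape(query_value))

# Illustrative usage: sanitize one 30-min release of counts for 6 regions,
# assuming (hypothetically) an l2-sensitivity of 1 for the count vector.
counts = np.array([120.0, 85.0, 240.0, 60.0, 310.0, 95.0])  # hypothetical values
private_counts = gaussian_mechanism(counts, l2_sensitivity=1.0, epsilon=0.5, delta=1e-7)
```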
In this paper, on the one hand, we consider that noise is added to each sample in the time series data (input perturbation). Once the data is differentially private, following Proposition 1, any DL or pre-processing method can be applied to the data. On the other hand, we consider the case where noise is added in the learning stage (gradient perturbation). In this case, the raw time-series data is used as input to a DL method trained with the differentially private stochastic gradient descent algorithm. These two settings are briefly described in the following.

Input data perturbation. Input perturbation (or data perturbation) consists in adding DP noise to each data sample $x_i \in D$. For example, let $x$ be a real-valued vector; then a differentially private version of it using the Gaussian mechanism (cf. Subsection 2.3.2) is: $\tilde{x} = x + \mathcal{N}(0, \sigma^2)$. On the one hand, input perturbation is the easiest method to apply and it is independent of any ML and post-processing techniques. On the other hand, the perturbation of each sample in the dataset may have a negative impact on the utility of the trained model. In this paper, we will use the Gaussian mechanism [22] of Definition 3 to sanitize each release of multivariate mobility data.

Gradient perturbation. Another solution to guarantee DP for the trained model is to perturb intermediate values in iterative algorithms. In this case, the authors in [4] proposed a differentially private version of the stochastic gradient descent algorithm (i.e., DP-SGD). Indeed, DL models trained with DP-SGD [4] provide provable DP guarantees for their input data. Two new parameters are added to the standard stochastic gradient descent algorithm, namely, the clipping norm and the noise multiplier. The former is used to bound how much each training point can impact the model's parameters, and the latter is used to add controlled Gaussian noise to the clipped gradients in order to ensure a DP guarantee for each data sample in the training dataset. In this paper, we use the DP-SGD implementation from the Tensorflow Privacy (TFP) library [33] to implement our DL models.

We divide this section in the following way. First, we describe the general settings for our experiments (Subsection 3.1). Next, we present the development and evaluation of non-private DL models (Subsection 3.2). Lastly, we present the development of differentially private DL models, which includes both gradient and input perturbation settings (Subsection 3.3).

Environment. All algorithms were implemented in Python 3.8.8 with the Keras and TFP [33] libraries.

Temporal features. We added the time of the day and the time of the week as cyclical features to help models recognize low and high peak values of human mobility patterns.

Training and testing sets. We split the dataset analyzed in Table 1 from Subsection 2.1 into mutually exclusive training (first 65 days, i.e., $n_l = 3120$ intervals of 30 min) and testing (last 7 days, i.e., $n_t = 336$ intervals of 30 min) sets. Fig. 2 exemplifies the data separation into train and test sets for region R1.

Forecasting methodology. We used 6 prior time steps (i.e., lag values), which showed autocorrelation higher than 0.5, to predict a single step ahead in the future (i.e., a short forecasting horizon). More specifically, the forecasting models take into account the number of people in each coarse region over the previous 3 hours to make one-step-ahead predictions for each coarse region in the next 30-min interval.
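As a companion to the forecasting methodology above, the following is a minimal sketch (not the authors' code) of how the multivariate series can be framed into supervised samples with 6 lag values, together with a cyclical time-feature encoding. The column names, array shapes, and the particular sine/cosine encoding are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def add_cyclical_features(df, timestamp_col="timestamp"):
    """Encode time-of-day and time-of-week as sine/cosine pairs (illustrative encoding)."""
    ts = pd.to_datetime(df[timestamp_col])
    seconds_in_day = 24 * 60 * 60
    day_pos = (ts.dt.hour * 3600 + ts.dt.minute * 60 + ts.dt.second) / seconds_in_day
    week_pos = (ts.dt.dayofweek + day_pos) / 7.0
    df = df.copy()
    df["day_sin"], df["day_cos"] = np.sin(2 * np.pi * day_pos), np.cos(2 * np.pi * day_pos)
    df["week_sin"], df["week_cos"] = np.sin(2 * np.pi * week_pos), np.cos(2 * np.pi * week_pos)
    return df

def make_supervised(values, n_lags=6, n_regions=6):
    """Frame a (T, n_features) array into (samples, n_lags, n_features) inputs and
    (samples, n_regions) one-step-ahead targets; assumes the region counts are the
    first n_regions columns."""
    X, y = [], []
    for t in range(n_lags, len(values)):
        X.append(values[t - n_lags:t])   # previous 3 hours (6 intervals of 30 min)
        y.append(values[t, :n_regions])  # counts at the next 30-min interval
    return np.array(X), np.array(y)
```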
Finally, we compute the performance metrics.

Performance metrics. All models were evaluated with standard time-series metrics, namely, the root mean square error (RMSE), calculated as $RMSE = \sqrt{\frac{1}{n_t} \sum_{t=1}^{n_t} (y_t - \hat{y}_t)^2}$, and the mean absolute error (MAE), calculated as $MAE = \frac{1}{n_t} \sum_{t=1}^{n_t} |y_t - \hat{y}_t|$; in which $y_t$ is the real output, $\hat{y}_t$ is the predicted output, and $n_t$ is the total number of samples in the test set, for $t \in [1, n_t]$. RMSE was the primary metric used to select the final DL models. As this is a multi-output scenario (i.e., 6 coarse regions), we present the metrics per coarse region as well as their averaged values. In all experiments, due to randomness, we report the results of the best model over 10 runs.

Baseline model. We compared the performance of the four DL methods (i.e., LSTM, GRU, BiLSTM, and BiGRU) with a naive forecasting technique, a.k.a. the "persistence model", which, for each region, returns the current number of people at time $t$ as the forecasted value, i.e., $\hat{x}_{t+1} = x_t$. Notice that this is a fairly accurate baseline since, in general, the number of people per coarse region varies slowly over 30 min (i.e., walking people may take more time to move from one coarse region to another).

Model selection. To select the best hyperparameters per DL method, we used Bayesian optimization [12] with 100 iterations to minimize $loss = RMSE_{avg} + RMSE_{std}$; the subscripts avg and std indicate the average and standard deviation of the RMSE metric over the 6 coarse regions. For each method, we only used a single hidden layer followed by a dense (output) layer, since RNNs generally perform well with a low number of hidden layers [26]. So, we searched the following hyperparameters: number of neurons ($h_1$), batch size ($bs$), and learning rate ($\eta$). All models used "relu" (rectified linear unit) as the activation function, which resulted in better performance than the default "tanh" activation in prior tests. Lastly, models were trained using the adam (adaptive moment estimation) optimizer for 100 epochs by minimizing the MAE loss function. Table 2 exhibits the hyperparameters' search space and the final value used per DL method.

Results and analysis. Table 3 presents the performance of the developed DL models in comparison with the Baseline model based on the RMSE and MAE metrics per region and the resulting mean. Notice that the metrics are in the real scale according to the number of users per region (cf. Table 1). That said, although R1 presents higher metric values, this does not necessarily mean worse results. One solution could be normalizing the data. Besides, Fig. 3 illustrates, for each region, the forecasting results for the last day of our testing set, which include the real number of people and the values predicted by each RNN: LSTM, GRU, BiLSTM, and BiGRU. As one can notice, all DL models consistently outperform the Baseline model. On average, the BiGRU model outperformed all other forecasting methods, with results highlighted in bold in Table 3, which shows the benefit of developing DL models for this multivariate forecasting task. Similar scores were achieved by the GRU and BiLSTM models, with an average RMSE around 1241. The worst-performing DL method on our dataset was the LSTM model. Extending the architectures, the hyperparameter ranges, the number of lag values (i.e., testing with fewer or more input time steps), or adding dropout layers, for example, could probably improve our models and change the resulting best technique. However, we will focus our attention on a comparative analysis of differentially private DL methods in the next subsection and, thus, these possible extensions are left as future work.
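As a reference point for the model selection just described, here is a minimal Keras sketch of a BiGRU with a single hidden layer, "relu" activation, adam optimizer, and MAE loss. The hyperparameter values and input shapes shown are placeholders, not the tuned values reported in Table 2.

```python
import tensorflow as tf

def build_bigru(n_lags=6, n_features=10, n_outputs=6, h1=64, learning_rate=1e-3):
    """Single Bidirectional-GRU hidden layer followed by a dense output layer
    (one output per coarse region); hyperparameter values here are placeholders."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_lags, n_features)),
        tf.keras.layers.Bidirectional(tf.keras.layers.GRU(h1, activation="relu")),
        tf.keras.layers.Dense(n_outputs),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="mae",
    )
    return model

# Illustrative training call, assuming X_train has shape (n_samples, 6, n_features)
# and y_train has shape (n_samples, 6):
# model = build_bigru()
# model.fit(X_train, y_train, epochs=100, batch_size=32)
```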
Methods evaluated. We consider the two privacy-preserving ML settings presented in Subsection 2.3, namely, input perturbation (IP) and gradient perturbation (GP). Thus, we selected only the DL method that performed best with the original data, i.e., BiGRU (cf. Table 3). For the model selection stage, we first start with BiGRU[GP], since it allows defining a range of $\epsilon$, which depends on several hyperparameters of DP-SGD. For a fair comparison between both settings, we utilize the resulting range of $\epsilon$ to develop the BiGRU[IP] models too. Notice, however, that in both scenarios, $(\epsilon, \delta)$-DP can be ensured for each time series data sample. On the other hand, this also means that the same user may have contributed to all $n_l = 3120$ training samples and, thus, in the worst case, the sequential composition in Proposition 2 applies. With these elements in mind, we considered high privacy regimes ($\epsilon \ll 1$) such that the maximum $\check{\epsilon} = \sum_{i=1}^{n_l} \epsilon_i$ is compatible with real-world deployed DP systems [45].

BiGRU[GP] model selection. In addition to the standard hyperparameters $h_1$, $bs$, and $\eta$ (cf. Subsection 3.2), we also included the TFP hyperparameters in the Bayesian optimization, with 100 iterations to minimize $loss = (RMSE_{avg} + RMSE_{std}) \times e^{\epsilon}$; the multiplicative factor $e^{\epsilon}$ is a penalization on high values of $\epsilon$, which varies depending on the hyperparameters used per iteration. More specifically, given the number of training samples $n_l = 3120$, we fix the following hyperparameters: the number of epochs equal to 100, num_microbatches = 5, the noise multiplier equal to {35, 70, 140, 500}, respectively, and $\delta = 10^{-7}$, which respects $\sum_{i=1}^{n_l} \delta_i < 1/n_l$ [22]. This way, we varied $h_1$, $bs$, $\eta$, and l2_norm_clip according to Table 4, which exhibits the hyperparameters' search space, the final value used per BiGRU[GP] model, the resulting privacy guarantee $\epsilon$ calculated with the compute_dp_sgd_privacy function [33], and the overall $\check{\epsilon} = \sum_{i=1}^{n_l} \epsilon_i$. Lastly, all BiGRU[GP] models also used "relu" as the activation function and were trained using the differentially private adam optimizer by minimizing the MAE loss function.

BiGRU[IP] model selection. We fix $\delta = 10^{-7}$ and apply the Gaussian mechanism [22], varying $\epsilon$ according to Table 4, to the whole time series data, as would be done if such a system had been deployed in real life. The metrics, however, are computed in comparison with the original raw time series data. Because input perturbation allows using any post-processing technique, we used the same model selection methodology as for the non-private BiGRU models to select the best hyperparameters for the BiGRU[IP] models.

Table 5 presents the performance of the differentially private BiGRU models trained under input and gradient perturbation regarding the RMSE and MAE metrics per region and the resulting mean values. We also included in Table 5 the utility loss of the differentially private BiGRU models in comparison with the non-private one, for both the RMSE and MAE averaged metrics $E$, calculated as:

$E_{loss} (\%) = 100 \cdot \frac{E_{DP} - E_{NP}}{E_{NP}}$ (2)

in which $E_{NP}$ is the result of the non-private BiGRU (cf. the averaged metric values in bold in Table 3) and $E_{DP}$ refers to the results of either the BiGRU[GP] or the BiGRU[IP] models. Indeed, Eq. (2) will be positive unless the differentially private model outperforms the non-private one (which is not the case in our results). We observed in our experiments that, since there is a sufficient number of users per time series sample (cf. Table 1), it was still possible to make accurate forecasts in both privacy-preserving ML settings with the experimented range of $(\epsilon, \delta)$-DP.
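To illustrate how the gradient perturbation setting above can be instantiated, the following is a minimal sketch using TensorFlow Privacy's DP Keras optimizer and its privacy accountant. The class and function names reflect common TFP releases and may differ across versions, and every hyperparameter value shown is a placeholder rather than a value from Table 4.

```python
import tensorflow as tf
import tensorflow_privacy
from tensorflow_privacy.privacy.analysis.compute_dp_sgd_privacy import compute_dp_sgd_privacy

# Differentially private Adam: gradients are clipped per microbatch and Gaussian
# noise scaled by noise_multiplier is added (gradient perturbation).
dp_optimizer = tensorflow_privacy.DPKerasAdamOptimizer(
    l2_norm_clip=1.0,        # placeholder clipping norm
    noise_multiplier=35.0,   # placeholder noise multiplier
    num_microbatches=5,
    learning_rate=1e-3,
)

# model = build_bigru(...)  # e.g., the BiGRU sketch above
# With DP optimizers, the loss is typically kept per-example (no reduction) so it
# can be split into microbatches:
# loss = tf.keras.losses.MeanAbsoluteError(reduction=tf.keras.losses.Reduction.NONE)
# model.compile(optimizer=dp_optimizer, loss=loss)
# model.fit(X_train, y_train, epochs=100, batch_size=32)

# Privacy accounting for the chosen hyperparameters (per training sample):
eps, opt_order = compute_dp_sgd_privacy(
    n=3120, batch_size=32, noise_multiplier=35.0, epochs=100, delta=1e-7
)
print(f"Each training sample is protected with epsilon = {eps:.4f} (delta = 1e-7)")
```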
One can notice that all differentially private BiGRU models achieved an averaged RMSE lower than 1250, in which the worst result, achieved by BiGRU[IP] with $\epsilon_4$, is just 2.8% less precise than the non-private BiGRU model. What is more, in both gradient and input perturbation settings, the differentially private BiGRU models achieved smaller error metrics than the non-private LSTM, BiLSTM, and GRU models (cf. Table 3). For instance, both BiGRU[GP] with $\epsilon_2$ and BiGRU[IP] with $\epsilon_2$ reached similar scores in comparison with the non-private BiGRU model, with a loss in performance of about 0.57% and 0.62%, respectively. These results are highlighted in underlined font, representing our best results in terms of utility with differentially private BiGRU models. Interestingly, the accuracy (measured with the RMSE metric) of the differentially private BiGRU models did not necessarily decrease with stricter privacy, i.e., lower $\epsilon$ values. One can note that results with $\epsilon_2$ and $\epsilon_3$ were more accurate than with $\epsilon_1$. This way, in terms of a good privacy-utility trade-off, both BiGRU[GP] with $\epsilon_3$ (0.92% less accurate) and BiGRU[IP] with $\epsilon_3$ (1.53% less accurate) presented good metric scores while satisfying a low value of $\epsilon$, with results highlighted in bold. Indeed, in the worst-case scenario, a user who was present in each data point would have leaked $\check{\epsilon}_3 = 111.384$ at the end of 65 days (i.e., ~1.7 per day), which follows real-world DP systems deployed by industry nowadays [45, Section 8.4]. Lastly, Fig. 4 illustrates, for each region, the forecasting results for the last day of our testing set, which include the real number of people and the values predicted by the following models: Baseline, non-private BiGRU, BiGRU[GP] with $\epsilon_3$, and BiGRU[IP] with $\epsilon_3$. As one can notice, similar forecasting results were achieved by both the non-private and the DP-based BiGRU models, which clearly outperform the Baseline model.

Mobile phone CDRs have been largely used to analyze human mobility in several contexts, e.g., the spread of infectious diseases [55, 8], natural disasters [28, 20], tourism [40], and so on. However, on analyzing mobility data, de Montjoye et al. [36] showed that humans follow particular patterns, which allows predicting human mobility with high accuracy. For instance, in a CDRs dataset of 1.5 million users, the authors showed that 95% of this population can be re-identified using four approximate locations and their timestamps. Indeed, the uniqueness of mobility data has also been studied in [38], for example, in which the authors concluded that a location trace has higher identifiability than a face matcher in a partial-knowledge model. For this reason, MNOs tend to publish aggregated mobility data [8, 57, 54, 53, 40], which provides some form of anonymity-based protection. However, as recent studies have shown, even aggregated mobility data (e.g., heatmaps) can be subject to membership inference attacks [42, 43] and users' trajectory recovery attacks [53, 57]. More precisely, the authors in [53, 57] showed that their attacks reach accuracies as high as 73%-91%. Therefore, it is vital to design privacy-preserving techniques that allow analyzing human mobility [35]. Moreover, along with collecting time-series data, extracting meaningful forecasts is also of great interest. Time series forecasting has been a key area of ML research and application across many domains, e.g., medicine [44], finance [47], electrical power [52, 37], mobility [30], and so on.
However, even ML models trained with raw data can also indirectly reveal sensitive information [16, 50, 15, 49], in particular RNNs [58]. To protect ML models against such threats, under the state-of-the-art DP guarantee [21, 22], there exist several privacy-preserving ML alternatives adopted in the literature, e.g., input [18, 31, 23, 29, 9, 25], gradient [4, 33, 51, 60, 48], and objective perturbation [17]. The contribution of our research is significant for those involved in urban planning and decision-making [35], providing a solution to the human mobility multivariate forecast problem through RNNs and differentially private BiGRUs. In addition, we point the research community to the Github page mentioned in the introduction section, in which we release the mobility dataset used in this paper for further experimentation with time series, machine learning, and privacy-preserving methods. The literature related to our work includes the generation of synthetic mobility data [41, 34, 10], the development of Markov models to infer travelers' activity patterns [59], and the development of privacy-preserving methods to analyze CDRs-based data [7, 11, 5]. Besides, the work in [30] surveys non-private deep learning applications to mobility datasets in general. Concerning differentially private deep learning, one can find the application of gradient perturbation-based DL models for load forecasting [51], an evaluation of differentially private DL models in federated settings [29], the proposal of locally differentially private DL architectures [31], practical libraries for differentially private DL [33, 60], and theoretical research works [4, 48, 18].

In this work, accurate multivariate forecasts were achieved with four non-private RNNs (i.e., LSTM, GRU, BiLSTM, and BiGRU), with BiGRU standing out among the four methods. Thus, this paper further evaluated both the input and gradient perturbation settings to forecast multivariate aggregated mobility time series data using the BiGRU neural network. Between the two settings, although not measured, the BiGRU[GP] models took more time to execute than the BiGRU[IP] models due to DP-SGD. In terms of accuracy, the BiGRU[GP] models consistently outperformed the BiGRU[IP] models for the same $(\epsilon, \delta)$-DP privacy level in our experiments. One reason for this result is that the input data perturbation setting adds DP guarantees to each time series point in the data, trading utility for privacy. This is indeed one fundamental problem in DP theory [22], in which local DP algorithms have lower utility in comparison with centralized DP algorithms. On the other hand, as BiGRU[GP] is trained over non-DP time-series data, the mobility dataset is still subject to data leakage [32] and, consequently, to membership inference attacks [42, 43] and users' trajectory recovery attacks [53, 57], which requires strong security measures. Therefore, training ML models over differentially private multivariate time series mobility data provides the best privacy-utility trade-off. In practice, the input perturbation setting allows applying centralized DP mechanisms to the final aggregate data (e.g., the Laplace mechanism [21] or the Gaussian mechanism [22]), essentially refreshing the privacy budget on a regular basis, and using the published data for any purpose (cf. Proposition 1). Finally, some limitations and prospective directions of this paper are described in the following.
For differentially private BiGRU models, we only provided lower and upper $\check{\epsilon}$ bounds for the privacy guarantee of each sample in the time-series data. Using, however, advanced composition theorems [22] to account for the final privacy budget of each user was out of the scope of this paper. Indeed, CDRs are event-based [39, 20], which means that data are only available when users actively make phone calls (or connect to the internet, or send SMS). This way, there may be users who make several calls (e.g., business people) and, thus, have higher values of $\check{\epsilon}$, while other groups do not. Besides, although the developed DL models outperform the Baseline model ($\hat{x}_{t+1} = x_t$), there is plenty of room for improvements to be carried out on hyperparameter optimization, data scaling, the number of lag values, etc. For instance, some high-peak values were missed by both the non-private and DP-based DL models (see Figs. 3 and 4). In addition, we fixed the number of lagged values to 6 to predict a single step ahead in the future (i.e., the forecasting horizon), in which the former can be tuned for performance improvement and the latter can be increased for multi-step forecasting tasks. Thus, besides the aforementioned directions, for future work we suggest and intend to investigate more complex DL architectures to improve the results of the DL models proposed in this paper for this multivariate time series forecasting task. Lastly, investigating the data leakage of both privacy-preserving ML settings through membership inference attacks [49, 58] is also a prospective and intended direction.

This paper provides the first comparative evaluation of differentially private DL models in both input and gradient perturbation settings to forecast multivariate aggregated mobility time series data. Experiments were first carried out with four non-private DL models (i.e., LSTM, GRU, BiLSTM, and BiGRU). The BiGRU model best fitted our data and, thus, it was selected for building the differentially private DL models. Under the gradient and input perturbation settings, i.e., BiGRU[GP] and BiGRU[IP], respectively, four values of $\epsilon \ll 1$ were evaluated. As shown in the results, the differentially private BiGRU models achieve nearly the same performance as the non-private BiGRU models, with a loss in performance ranging from 0.57% to 2.8% (for the RMSE metric). Thus, we conclude that it is still possible to obtain accurate multivariate forecasts in both privacy-preserving ML settings. More specifically, although the gradient perturbation setting preserved more accuracy than the input perturbation setting, input perturbation guarantees stronger privacy protection (i.e., both for the ML model and for the data itself), thus providing the best privacy-utility trade-off.

Acknowledgements. This work was supported by the EIPHI-BFC Graduate School (contract "ANR-17-EURE-0002") and by the Region of Bourgogne Franche-Comté CADRAN Project. The work of Héber H. Arcolezi has been partially supported by the ERC project Hypatia, grant agreement Nº 835294. The authors would also like to thank the "Orange Application for Business" team for their continuous collaboration and useful feedback. All computations have been performed on the "Mésocentre de Calcul de Franche-Comté". No funding was received to assist with the preparation of this manuscript.

[1] Confinements liés à la pandémie de COVID-19 en France
[2] Commission nationale de l'informatique et des libertés (CNIL)
[3] General data protection regulation (GDPR)
[4] Deep learning with differential privacy. CCS '16
[5] A case study: Privacy preserving release of spatio-temporal density in Paris
[6] Google COVID-19 community mobility reports: anonymization process description (version 1.1)
[7] Sanitization of call detail records via differentially-private bloom filters
[8] The contribution of telco data to fight the COVID-19 pandemic: Experience of Telefonica throughout its footprint
[9] Preserving geo-indistinguishability of the emergency scene to predict ambulance response time
[10] Mobility modeling through mobile data: generating an optimized and open dataset respecting privacy
[11] Longitudinal collection and analysis of mobile phone data with local differential privacy
[12] Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures
[13] A survey of results on mobile phone datasets analysis
[14] Aggregated mobility data could help fight COVID-19
[15] The secret sharer: Evaluating and testing unintended memorization in neural networks
[16] Extracting training data from large language models
[17] Differentially private empirical risk minimization
[18] RNN-DP: A new differential privacy scheme base on recurrent neural network for dynamic trajectory privacy protection
[19] Empirical evaluation of gated recurrent neural networks on sequence modeling
[20] Mobile phone data for urban climate change adaptation: Reviewing applications, opportunities and key challenges
[21] Calibrating noise to sensitivity in private data analysis
[22] The algorithmic foundations of differential privacy
[23] The influence of differential privacy on short term electric load forecasting
[24] Commission recommendation (EU) 2020/518 of 8 April 2020 on a common Union toolbox for the use of technology and data to combat and exit from the COVID-19 crisis, in particular concerning mobile applications and the use of anonymised mobility data
[25] Privacy-preserving prediction of victim's mortality and their need for transportation to health facilities
[26] Recurrent neural networks for time series forecasting: Current status and future directions
[27] Long short-term memory
[28] Towards understanding communication behavior changes during floods using cell phone data
[29] Privacy preserving timeseries forecasting of user health data streams
[30] A survey on deep learning for human mobility
[31] Local differential privacy for deep learning
[32] World's biggest data breaches & hacks
[33] A general approach to adding differential privacy to iterative training procedures
[34] DP-WHERE: Differentially private modeling of human mobility
[35] On the privacy-conscientious use of mobile phone data
[36] Unique in the crowd: The privacy bounds of human mobility
[37] Multi-step wind speed forecasting based on hybrid multi-stage decomposition model and long short-term memory neural network
[38] Toward evaluating reidentification risks in the local privacy model
[39] Mobile phone data for informing public health actions across the COVID-19 pandemic life cycle
[40] Flux vision: real time statistics on mobility patterns
[41] A non-parametric generative model for human trajectories
[42] What does the crowd say about you? Evaluating aggregation-based location privacy
[43] Measuring membership privacy on aggregate location time-series
[44] A review on COVID-19 forecasting models. Neural Computing and Applications
[45] Linkedin's audience engagements API: A privacy preserving data analytics system at scale
[46] Bidirectional recurrent neural networks
[47] Financial time series forecasting with deep learning: A systematic literature review
[48] Privacy-preserving deep learning. CCS '15
[49] Membership inference attacks against machine learning models
[50] Machine learning models that remember too much
[51] Differentially private deep learning for load forecasting on smart grid
[52] Wavelet group method of data handling for fault prediction in electrical power insulators
[53] A new privacy breach: User trajectory recovery from aggregated mobility data
[54] On the use of data from multiple mobile network operators in Europe to fight COVID-19
[55] Commentary: Containing the ebola outbreak - the potential and challenge of mobile network data
[56] WHO announces COVID-19 outbreak a pandemic
[57] Trajectory recovery from ASH
[58] On the privacy risks of deploying recurrent neural networks in machine learning
[59] A generative model of urban activities from cellular data
[60] Opacus: User-friendly differential privacy library in pytorch

The authors have no conflicts of interest to declare that are relevant to the content of this article.
Data can be accessed on the following Github page (https://github.com/hharcolezi/ldp-protocols-mobility-cdrs).
Codes are available on the following Github page (https://github.com/hharcolezi/ldp-protocols-mobility-cdrs).
This article does not contain any studies with human or animal subjects performed by any of the authors.
As this article does not contain any studies with human participants or animals, informed consent is not applicable.