key: cord-0996934-x9x7z7bc
authors: Chen, Langqiao; Lu, Yuhuan; He, Zhaocheng; Chen, Yixian
title: Online Trajectory Estimation Based on a Network-Wide Cellular Fingerprint Map
date: 2022-02-18
journal: Sensors (Basel)
DOI: 10.3390/s22041605
sha: 41485f6d6358c56ce069af55c66d7202916d012e
doc_id: 996934
cord_uid: x9x7z7bc

Cellular signaling data is widely available in mobile communications and contains abundant information about the movements of individual travelers. Using cellular signaling data to estimate the trajectories of mobile users can benefit many location-based applications, including infectious disease tracing and screening, network flow sensing, and traffic scheduling. However, conventional methods rely heavily on heuristic hypotheses or on hardware-dependent network fingerprinting approaches. To address these issues, NF-Track (Network-wide Fingerprinting based Tracking) is proposed to realize accurate online map-matching of cellular location sequences. In particular, the proposed framework requires neither prior assumptions, such as arterial preference and less-turn preference, nor extra hardware-relevant parameters, such as RSS and SNR. It therefore generalizes well and can be flexibly deployed in the cloud computing environment of telecom operators. In this architecture, a novel segment-granularity fingerprint map is put forward to provide sufficient prior knowledge. Then, a real-time trajectory estimation process is developed for precise positioning and tracking. In experiments on an urban road network, NF-Track achieves a recall of 91.68% and a precision of 90.35% in complex traffic scenes, outperforming state-of-the-art model-based unsupervised learning approaches.

Mobile phones have become universal instant messaging devices, and their users generate huge amounts of real-time records that form the backbone of ubiquitous computing in heterogeneous sensor networks [1] [2] [3] . As the most widely used of these records, GPS positioning data has provided vital information for many applications, including traffic sensing, traffic incident detection, travel prediction, and route recommendation [4] [5] [6] . However, mobile users such as regular commuters often prefer to keep GPS off when it is not needed. Hence, the backend systems of GPS service providers can observe only partial location data of users. As a supplement to GPS localization, cellular signaling data has become a promising type of mobile phone data that continuously records the signaling between base stations and every mobile device. It has been widely used in location-based services, including route recommendation and traffic scheduling. Thus, a large number of technologies have been proposed for sensing user locations based on cellular data [7] [8] [9] [10] [11] .

As an extensive application of user localization, real-time and accurate estimation of cellular-based trajectories is of great significance for monitoring and managing individual movement in cities (e.g., infectious disease tracing and screening [12] [13] [14] and network flow sensing and control [15] [16] [17] [18] [19] [20] ), because cellular signaling data can be passively sent back to cloud systems in real time and contains a wealth of movement information about individual travelers. Online trajectory estimation based on cellular signaling sequences has been investigated for years. Some studies have attempted to filter out large localization errors and directly decouple the movement trajectory under network constraints [21] [22] [23] [24] . However, cleaned cellular localization remained unsatisfactory in dense urban road networks, because the positioning errors may be larger than the spacing between adjacent roads. Other researchers applied advanced deep learning approaches to trajectory estimation and achieved good performance in their field test [25] . Nonetheless, deep learning-based models need large amounts of ground-truth trajectories to converge, which limits their utility. For these reasons, most existing solutions are based on the Hidden Markov Model (HMM) framework, which combines the interfering noise and map-matching rules in one probabilistic model to reinforce the inference process [26] [27] [28] .

Most of the above methods are unsupervised and require a certain amount of prior knowledge to track a mobile device accurately. Many manually defined rules, including arterial preference, less-turn preference, and same-direction preference, are predefined in such unsupervised approaches. This weakens the generalization ability of the models, because travelers do not always behave according to these predefinitions, owing to their various transport modes and dynamic route choices. Hence, how to free the estimation process from these prior assumptions is a meaningful open question.

Nowadays, fingerprint mapping is a mainstream localization technique that associates locations in the testing field with unique wireless signal features [29] [30] [31] [32] . On this basis, a mobile device can match itself to a specific location using real-time measured signal features. A fingerprint map is therefore capable of providing sufficient prior knowledge, and many previous studies endeavored to migrate it to road network scenarios for precise positioning and tracking [33] [34] [35] [36] . However, most location fingerprinting methods are hardware-dependent, because constructing the fingerprint map and performing positioning require RSS (Received Signal Strength), SNR (Signal-to-Noise Ratio) and TA (Timing Advance) information, which is only available on special devices and violates the ubiquity requirement [24, 37] . For this reason, these approaches are not lightweight enough to be applied to the cloud computing backends of telecom operators, where RSS is absent: conventional fingerprinting methods can only let a mobile phone locate itself actively, rather than compute locations passively on the network side at a large scale. Conventional fingerprinting methods therefore need to be improved to overcome these restrictions. In this paper, we revisit the map-matching task from the data perspective and propose to utilize the power of data to help solve the aforementioned problems that obstruct conventional unsupervised methods. Accordingly, a novel online trajectory estimation framework, NF-Track (Network-wide Fingerprinting based Tracking), is presented. Unlike current fingerprinting technologies for mobile device localization, which calibrate many extra hardware-relevant parameters at the grid scale, NF-Track provides more accurate localization by dividing the area of interest into road segments and constructing fingerprint features with road segments as the basic units.
This innovation not only avoids the effort of standing at each grid point for a certain time to draw the signal strength histogram but also benefits trajectory estimation, as mobile devices are mostly on the road network. On the basis of the resulting segment-granularity fingerprint map, an anchor-based similarity calculation model is developed to achieve online prediction. Compared with unsupervised methods, the proposed online prediction architecture relies on the prior knowledge of segment-level signal features provided by the fingerprint map, which enhances the stability and precision of trajectory estimation.

To sum up, the major innovations of NF-Track are as follows:

(1) Distinct from current fingerprinting technologies that are hardware-dependent, NF-Track is supported by a data-driven fingerprint map, which not only improves the efficiency of signal collection but also benefits online map-matching of cellular location sequences. (2) On the basis of this fingerprint map, the proposed trajectory estimation algorithm is independent of both the hardware-relevant information used in conventional fingerprinting approaches and the heuristic hypotheses widely leveraged by unsupervised methods. Therefore, NF-Track is suitable for deployment on cloud computing backends where only cellular localization is available. (3) We conduct our experiments on a real-world urban dataset. The results demonstrate the significant advantages of our real-time trajectory estimation approach over the current state-of-the-art online map-matching algorithm, especially for irregular trajectories, which are more twisted than regular trajectories that follow the shortest and straightest paths.

The rest of this paper is organized as follows. In Section 2, we discuss the relevant literature. In Section 3, we elaborate on the data preprocessing and the problem formulation. Section 4 details the design of our NF-Track. Section 5 presents the performance evaluation of our framework. Finally, Section 6 concludes this paper.

Wireless device positioning and tracking are of great significance for monitoring and managing individual movement in sensor networks. Because GPS signals are unavailable in indoor areas, location fingerprinting was proposed to determine device locations in wireless areas based on measured signal features, e.g., Received Signal Strength (RSS), Signal-to-Noise Ratio (SNR) and Timing Advance (TA) [29] [30] [31] [32] . Specifically, location fingerprinting is the process of associating locations in an environment with a wireless signal feature that is unique to each location. Location fingerprinting-based positioning methods can be divided into two stages [38] : offline fingerprint map construction and online location estimation. In the offline stage, a rectangular grid of points is cast over the two-dimensional space, as Figure 1 shows. Each spatial point is calibrated with a basic fingerprint, that is, a unique feature constructed from the signal propagation characteristics at that point. Once all points are calibrated and stored, the fingerprint map is complete. Then, in the online stage, while signal features such as RSS and SNR are measured by the device, its location is inferred as the spatial point whose fingerprint is most similar to the measurement. If the signal features are obtained continuously, the device can be tracked by connecting its estimated locations.
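The two-stage procedure above can be sketched as a nearest-neighbour lookup in signal space. This is a minimal illustration, not the method of any cited work: it assumes RSS (dBm) is the only feature, Euclidean distance is the similarity measure, and a hypothetical floor value `MISSING_RSS` stands in for base stations not heard at a point. The grid and readings are made up.

```python
import math
from typing import Dict, List, Tuple

Fingerprint = Dict[str, float]   # base-station ID -> measured RSS (dBm)
GridPoint = Tuple[float, float]  # (x, y) of a calibrated point, in metres

MISSING_RSS = -110.0  # hypothetical floor for a base station not heard at a point


def signal_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Euclidean distance in signal space over the union of observed base stations."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, MISSING_RSS) - b.get(k, MISSING_RSS)) ** 2 for k in keys))


def locate(measurement: Fingerprint, fingerprint_map: Dict[GridPoint, Fingerprint]) -> GridPoint:
    """Online stage: return the grid point whose stored fingerprint is most similar."""
    return min(fingerprint_map, key=lambda p: signal_distance(measurement, fingerprint_map[p]))


def track(measurements: List[Fingerprint],
          fingerprint_map: Dict[GridPoint, Fingerprint]) -> List[GridPoint]:
    """Continuous measurements -> chain of estimated locations."""
    return [locate(m, fingerprint_map) for m in measurements]


# Offline stage: a toy 2 x 2 grid calibrated with made-up RSS readings.
fingerprint_map = {
    (0.0, 0.0): {"BS1": -60.0, "BS2": -85.0},
    (10.0, 0.0): {"BS1": -70.0, "BS2": -75.0},
    (0.0, 10.0): {"BS1": -72.0, "BS3": -68.0},
    (10.0, 10.0): {"BS2": -65.0, "BS3": -80.0},
}

print(track([{"BS1": -61.0, "BS2": -84.0}, {"BS2": -66.0, "BS3": -79.0}], fingerprint_map))
# [(0.0, 0.0), (10.0, 10.0)]
```

Real systems usually replace the single nearest neighbour with k-nearest neighbours or a probabilistic match, but the offline/online split stays the same.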


Later studies migrated location fingerprinting-based positioning methods to urban outdoor environments, where each location in the communication space is associated with a unique fingerprint built from the signal features observed from the surrounding cellular base stations, so that outdoor mobile devices can be located online. For example, to cope with the absence of GPS positioning in some circumstances, Ibrahim and Youssef [8] built an RSS-based location fingerprinting system for mobile phone positioning in an area of interest. Chen et al. [39] created a cellular location fingerprinting system in a metropolitan environment with higher base station density and achieved good performance in mobile device position estimation.

Beyond open outdoor areas, some researchers applied fingerprinting localization technology to road networks, because localizing mobile devices along a route is meaningful for many traffic management applications, such as travel time estimation [31] . They first divided the road into several segments and then measured the distribution of RSS from different base stations as a segment-associated fingerprint for each segment. Thereafter, segment-granularity positioning of mobile devices could be realized by observing their RSS distributions online.

These methods provided a new perspective on individual positioning and tracking based on prior knowledge extracted from observed features, and they show that location fingerprinting technologies have great potential in urban traveler management.

Online trajectory estimation is the task of mapping a stream of real-time locations with errors to the corresponding points on the road network. The location data come from a variety of moving objects, such as vehicles and mobile phones. In current transportation research, GPS data are the most commonly used spatiotemporal locations. Online trajectory estimation technologies were first applied to align low-sampling-rate GPS series to the digital road network [40] [41] [42] . Online trajectory processing is generally based on sliding window methods, which divide the trajectory into several input sequences and handle them independently. The window size is a significant parameter that affects overall performance, and previous studies investigated the estimation accuracy and computing latency under different window sizes [42, 43] . To improve system performance, researchers proposed various sliding window algorithms, including the fixed sliding window, the bounded variable sliding window and the adjustable sliding window [4, 42, 44] . These works benefit many real-time continuous location-based services.
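A fixed sliding window, the simplest of the variants listed above, can be sketched as a generator over the incoming location stream. This is an illustrative assumption rather than the algorithm of any cited work: `size` and `step` are hypothetical parameter names, and `match_window` stands for whatever per-window matcher (for example an HMM decoder) is plugged in.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def fixed_sliding_windows(stream: Iterable[T], size: int, step: int) -> Iterator[List[T]]:
    """Yield every full window of `size` consecutive locations, advancing by `step`.

    With step < size, consecutive windows overlap by size - step locations, which
    gives the matcher some context from the previous window.
    """
    if not 0 < step <= size:
        raise ValueError("require 0 < step <= size")
    window: Deque[T] = deque(maxlen=size)  # the oldest location is dropped automatically
    since_last = 0
    for location in stream:
        window.append(location)
        since_last += 1
        if len(window) == size and since_last >= step:
            yield list(window)
            since_last = 0


def online_map_match(stream, size, step, match_window):
    """Hand each window to an independent matcher as soon as it fills up."""
    for window in fixed_sliding_windows(stream, size, step):
        yield match_window(window)


print(list(fixed_sliding_windows(range(1, 8), size=3, step=2)))
# [[1, 2, 3], [3, 4, 5], [5, 6, 7]]
```

Locations left over after the last full window are not emitted here; a production system would flush them when the trip ends. Larger windows give the matcher more context at the cost of latency, which is the trade-off studied in [42, 43].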

Besides GPS data, cellular data contain large-scale spatiotemporal movement information about urban travelers, because mobile phones have a very high penetration rate among citizens. Therefore, how to estimate travelers' trajectories from cellular signaling sequences has become a subject of growing interest.

Owing to the sparseness and fuzziness of cellular data, map-matching of cellular signaling sequences mainly targets the issue of uncertainty. Some researchers tried to decouple individual movement information from a cellular signaling sequence by means of clustering, interpolation, etc. to reconstruct trajectories on road networks [21] [22] [23] [24] . These methods can directly clean and smooth cellular signaling sequences, quickly analyze the staying and moving behavior of travelers and extract their approximate spatial trajectories. Here, density-based clustering algorithms are mainly used to partition cellular localizations into several groups, so that low-density localizations can be identified. Meanwhile, many studies improved the clustering performance by further investigating the intrinsic clustering structure [45, 46] , feature employment [47] and noise robustness [48] . These efforts were meant to capture more accurate localizations of travelers. However, due to the complexity of road networks, there may be many reasonable candidate paths for a given cellular signaling sequence. Thus, many of these methods could only approximate an individual cellular signaling sequence to the main traffic corridors of the city rather than to a specific path, which hinders refined management and guidance of travelers. The individual movement information extracted from the cellular signaling sequence therefore needs to be reinforced extensively to achieve accurate trajectory estimation. The fundamental way is to introduce prior rules to aid fine-grained map-matching. Hence, many studies developed rule-based methods that incorporate prior rules, such as travelers' path selection preferences, to achieve well-estimated individual trajectories. Yuan Y et al. [49] matched cellular signaling sequences to the suburban road network well based on the assumption that drivers tend to choose high-grade roads. In urban environments, many studies used HMM algorithms to decode the cellular sequence into a mobile trajectory, which takes the noise of cellular data into account and enhances map-matching performance [24, 28] . However, these methods only operated in offline mode and did not handle a live stream of input locations, which limited the value of real-time positioning data.

Inspired by existing online map-matching algorithms that use a sliding window to handle the input sequence incrementally [4, 42] , several HMM-based models were modified to introduce heuristic mechanisms into the online trajectory inference process. Mohamed et al. [26] realized real-time trajectory estimation with an HMM that incorporated several heuristics, such as major-road and shortest-path preferences, to reduce the noise in the map-matching process. Because travelers decide their routes based on their estimation of traffic status and travel time on roads [50, 51] , Jagadeesh et al. [27] proposed an HMM-based map-matching framework with a pre-estimated route choice model, which was robust to inaccurate and sparse location data. Besides HMM-based methods, advanced deep learning approaches were brought into online map-matching of cellular signaling sequences. For instance, Shen Z et al. [25] trained recurrent neural networks to capture mobility patterns from enormous amounts of cellular data and then realized trajectory estimation. Nonetheless, to ensure model convergence, the deep learning algorithm needed not only large amounts of sample trajectories for offline training but also many heuristic hypotheses, such as arterial preference, same-direction preference and less-turn preference.

These online map-matching methods have two main limitations. First, some of them achieve good results only in wide-range suburban areas or on urban high-grade road networks whose topology is relatively simple, where the heuristic assumptions mentioned above hold; this is not the case for crisscrossed urban road networks. Second, such methods depend heavily on prior assumptions to ensure estimation accuracy, which weakens the generalization ability of the models. It is therefore worthwhile to realize flexible tracking of free-moving and irregular travelers in urban environments purely based on cellular signaling data.

In contrast to the aforementioned unsupervised methods that rely heavily on heuristic hypotheses, some researchers attempted to develop cellular trajectory map-matching methods based on location fingerprinting technology, which do not rely on any assumptions about travelers' preferences. For example, Thiagarajan et al. [34] divided the area of interest into uniform square grids and associated each grid offline with a set of observed base stations and their RSS values. The cellular signaling sequence was first matched to a sequence of traversed grids through HMM methods, and that grid sequence was then matched to road segments. Dalla et al. [35] observed the surrounding base stations and their signal strengths at spatial points along the road, regarded them as basic cellular fingerprints, and exploited them in the trajectory estimation process.

The generalization of these fingerprint-based trajectory estimation methods is restricted in two respects. On the one hand, these approaches are hardware-relevant, because they rely on extra parameters such as RSS and SNR. Hence, they are difficult to apply to the cloud computing systems of telecom operators, whose signaling data contain cellular locations only [37] . On the other hand, to build a location fingerprinting system, these methods commonly associate the road network with massive numbers of observation points, so fingerprinting based on point-grained fingerprint maps is quite inefficient. Furthermore, point-grained location fingerprinting has another drawback: the signal features of densely concentrated fingerprint points can be hard to differentiate from each other, which increases the chance of positioning errors.

Inspired by this line of research, we build a novel network-wide fingerprinting system for real-time map-matching of cellular signaling sequences, which performs well without either heuristic hypotheses or extra RSS information.

Referring to previous network fingerprinting approaches [33] [34] [35] , the benchmark data of this study is collected through war-driving on the road network, as Figure 2 illustrates.


Table 1 presents some record samples of the acquired dataset. A signaling record contains the Device ID, the recorded timestamp and the connecting base station. Besides the cellular information, the GPS coordinates (longitude and latitude) are recorded simultaneously and regarded as the true location of the mobile device for modeling and validation. Data are acquired at around 1 Hz, and the records form the benchmark dataset of this study.

The main notations in this paper are shown in Table 2 .

A cellular signaling sequence obtained from telecom operators is a series of chronologically ordered locations generated by a mobile device, e.g., $p_1, p_2, \ldots, p_k$, where each location contains the geospatial coordinates of the signaling base station and a timestamp, $p = (lon, lat, t)$. The task of this study is to transform the cellular signaling sequence into a road segment series $E_1, E_2, \ldots, E_n$. The accuracy of the estimated trajectory can then be evaluated against the ground-truth GPS sequence.
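One common way to score an estimated segment series against the ground truth is a length-weighted precision and recall. The sketch below assumes that definition (the exact metric used in Section 5 may differ), identifies segments by hypothetical string IDs, and treats both series as sets of traversed segments.

```python
from typing import Dict, List, Tuple


def length_precision_recall(estimated: List[str], truth: List[str],
                            seg_len: Dict[str, float]) -> Tuple[float, float]:
    """Length-weighted precision and recall of an estimated segment series.

    precision = length of correctly matched segments / length of estimated segments
    recall    = length of correctly matched segments / length of ground-truth segments
    """
    est, tru = set(estimated), set(truth)
    correct = sum(seg_len[e] for e in est & tru)
    precision = correct / sum(seg_len[e] for e in est)
    recall = correct / sum(seg_len[e] for e in tru)
    return precision, recall


# Hypothetical segment lengths in metres.
seg_len = {"E1": 100.0, "E2": 200.0, "E3": 50.0, "E4": 150.0}
p, r = length_precision_recall(["E1", "E2", "E3"], ["E1", "E2", "E4"], seg_len)
print(round(p, 4), round(r, 4))  # 0.8571 0.6667
```

Weighting by length keeps a long, correctly matched arterial from counting the same as a short connector.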

Table 2. Main notations (excerpt):
$S(P)$: integrated SBSS of $P$
$R_{Seq_{F_{infer}}}$: estimated CMD proportion of $B_{Seq_{F_{infer}}}$
$R(P)$: integrated fingerprint of $P$
$B_{Seq_{F_{true}}}$: base station set of $Seq_{F_{true}}$
AP: anchor point
$R_{Seq_{F_{true}}}$: estimated CMD proportion of $B_{Seq_{F_{true}}}$

In this section, we discuss the trajectory estimation architecture of NF-Track in detail. An overview of the estimation framework is shown in Figure 3.

Offline stage. In this stage, we first align the historical cellular series to the GPS series at the road segment scale. Then, the signal features of each road segment are extracted from the aligned cellular data as a unique fingerprint, and lastly, the obtained fingerprints constitute a network-wide cellular fingerprint map.

Online stage. For an input real-time cellular sequence, the online tracking process is implemented in the cloud computing backend of the telecom operator as follows. First, a spatial clustering method is applied to acquire a handful of high-confidence anchors. Since each anchor is likely to have several candidate segments for map-matching, the preconstructed fingerprint map is used as a reference to determine the most likely one. After that, the identified road segments are connected into a final trajectory.

The fingerprint is, in essence, a multidimensional feature vector extracted from the original cellular signal [38] . Thus, our innovative segment-granularity fingerprint map is built in the following steps.

After network segmentation, we summarize the fingerprint feature vector of each road segment. In practice, the observable information of cellular signaling data includes only the user ID, the signal recording time and the connected base station. To characterize the cellular system in road space using only these basic fields, the fingerprint feature vector of a segment consists of two components: the set of stable impacting base stations and the corresponding set of impacting weights.

If a Stable impacting Base Station (SBS) has a stronger signal covering the current segment, mobile devices on that segment are more likely to connect with it. The Stable impacting Base Station Sets (SBSSs) of different segments are likely to be distinct from one another owing to the spatial heterogeneity of segments. Therefore, it is reasonable to consider the SBSS as a fingerprint feature of a road segment. The SBSS extraction algorithm is as follows.

Suppose $n$ cellular sequences $Seq_1, Seq_2, \ldots, Seq_n$ are collected on road segment $E$, and $m$ base stations $BS_1, BS_2, \ldots, BS_m$ have been recorded in them. Then, the frequency $\tau$ ($1 \le \tau \le n$) of each base station, that is, the number of the $n$ sequences in which it appears, is counted. If $\tau > \theta \cdot n$, the base station is designated as an SBS. Here, $\theta$ is a critical threshold for SBS selection, set to 0.375 according to the sensitivity analysis in Section 5. Finally, all SBSs of $E$ are put into a set, denoted by $S(E)$.
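The SBS selection rule can be sketched in a few lines. The sketch assumes that each sequence collected on $E$ is given as an iterable of base-station IDs and that $\tau$ counts sequences (not individual records), which is consistent with $1 \le \tau \le n$; the function name `extract_sbss` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set


def extract_sbss(sequences: List[Iterable[str]], theta: float = 0.375) -> Set[str]:
    """Return S(E): base stations that appear in more than theta * n of the n sequences on E."""
    n = len(sequences)
    if n == 0:
        return set()
    # tau[bs] = number of sequences in which bs appears at least once
    tau = Counter(bs for seq in sequences for bs in set(seq))
    return {bs for bs, count in tau.items() if count > theta * n}


seqs = [["A", "B"], ["A", "B", "C"], ["A", "C"], ["A"],
        ["B", "D"], ["A", "B"], ["C"], ["A", "B", "A"]]
print(sorted(extract_sbss(seqs)))  # ['A', 'B']
```

With $n = 8$ and $\theta = 0.375$ the cut-off is $\tau > 3$, so base station C (seen in exactly three sequences) is excluded by the strict inequality.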

It is insufficient to use $S(E)$ alone as the fingerprint feature, because two road segments that are very close to each other may have the same SBSS. Therefore, after the SBSs are sifted out, their impacting weights on the road segment are also modeled. Intuitively, larger transmission power and smaller propagation loss give a base station a higher impact on the road segment: mobile devices it serves receive a stronger signal and keep more stable connections, so the distance a traveling device moves while connected to such a base station is probably longer. We therefore propose the Cumulative Moving Distance (CMD) to indicate an SBS's impacting weight on a road segment. For example, suppose a mobile device passes through road segment $E$, and SBS is one of the base stations in $S(E)$. Then, $CMD(Seq, E, SBS)$ denotes the CMD of that mobile device while it communicates with SBS and moves on $E$, where $Seq$ is the cellular signaling sequence generated at that time. To eliminate the influence of randomness, the average CMD, denoted by $CMD(E, SBS)$, is obtained from the $n$ collected cellular signaling sequences $Seq_1, Seq_2, \ldots, Seq_n$ as follows to model the impacting weight of SBS on $E$.

$$CMD(E, SBS) = \frac{1}{k} \sum_{i=1}^{k} CMD(Seq_i, E, SBS)$$
where $Seq_i$ represents the $i$th of the $k$ ($1 \le k \le n$) cellular signaling sequences that contain records of SBS. Furthermore, $CMD(E, SBS)$ needs to be normalized into $R(E, SBS)$, as shown below, because segment lengths greatly affect its value.

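Suppose segment $E$ has $m$ SBSs $SBS_1, SBS_2, \ldots, SBS_m$; then $R(E, SBS_i)$ ($1 \le i \le m$) can be calculated as follows:

$$R(E, SBS_i) = \frac{CMD(E, SBS_i)}{\sum_{j=1}^{m} CMD(E, SBS_j)}$$

which is regarded as $SBS_i$'s impacting weight on segment $E$. Therefore, the weights of $SBS_1, SBS_2, \ldots, SBS_m$ on segment $E$ can be represented by a one-dimensional vector $(R_1, R_2, \ldots, R_m)$, which is
$$R(E) = \big(R(E, SBS_1), R(E, SBS_2), \ldots, R(E, SBS_m)\big) = (R_1, R_2, \ldots, R_m).$$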

Herein, R(E) is defined as the fingerprint feature of segment E.
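The averaging and normalization steps can be sketched as follows. The input format is an assumption: for each collected sequence, a dictionary mapping a base-station ID to the distance (in metres) the device moved on $E$ while connected to it, i.e. $CMD(Seq_i, E, SBS)$. The normalization follows the CMD-proportion reading of $R(E, SBS_i)$ given above; `segment_fingerprint` is a hypothetical name.

```python
from typing import Dict, List, Set


def segment_fingerprint(per_sequence_cmd: List[Dict[str, float]], sbss: Set[str]) -> Dict[str, float]:
    """Compute R(E) from per-sequence CMD(Seq, E, SBS) values.

    per_sequence_cmd[i][bs] is the distance moved on E while connected to bs in
    sequence i; base stations absent from a sequence are simply missing keys.
    """
    avg_cmd: Dict[str, float] = {}
    for bs in sbss:
        # CMD(E, SBS): average over the k sequences that contain records of bs
        values = [cmd[bs] for cmd in per_sequence_cmd if bs in cmd]
        avg_cmd[bs] = sum(values) / len(values) if values else 0.0
    total = sum(avg_cmd.values())
    if total == 0:
        raise ValueError("no CMD observed for the given SBSS")
    # R(E, SBS_i): share of SBS_i in the total CMD, so the entries of R(E) sum to 1
    return {bs: cmd / total for bs, cmd in avg_cmd.items()}


cmds = [{"A": 120.0, "B": 80.0}, {"A": 150.0}, {"A": 90.0, "B": 110.0, "C": 20.0}]
r_e = segment_fingerprint(cmds, {"A", "B"})
print({bs: round(w, 4) for bs, w in sorted(r_e.items())})  # {'A': 0.5581, 'B': 0.4419}
```

Base station C is ignored because it is not in the supplied SBSS; its CMD does not leak into the weights of A and B.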

After the fingerprint feature vectors of all road segments are calculated via previous steps, the network-wide fingerprint map can be constructed.

This section further shows that the fingerprint features of several segments can be combined to form the fingerprint of a path, which is the reference for cellular signaling sequence map-matching. Suppose a path $P$ consists of a consecutive segment series $E_1, E_2, \ldots, E_n$. The SBSS of $P$ is derived as follows:
$$S(P) = \bigcup_{i=1}^{n} S(E_i)$$

Suppose there are $\lambda$ base stations in $S(P)$, that is, $S(P) = \{SBS_1, SBS_2, \ldots, SBS_\lambda\}$. Let $R(P)$ denote the corresponding set of impacting weights of $S(P)$, which is designated as the fingerprint feature of path $P$. It is calculated as follows.

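First, to facilitate the computation, $S(E_i)$ and $R(E_i)$ are aligned to the $\lambda$ base stations of $S(P)$, so that the fingerprint of each segment becomes a $\lambda$-dimensional vector
$$R'(E_i) = (r_{i,1}, r_{i,2}, \ldots, r_{i,\lambda}).$$

Here,
$$r_{i,\rho} = \begin{cases} R_\pi(E_i), & SBS_\rho \in S(E_i) \\ 0, & \text{otherwise} \end{cases}$$

where $1 \le \rho \le \lambda$, and $\pi$ is the index of $SBS_\rho$ in $S(E_i)$. Then, $R(P)$ can be calculated from a weight vector $\alpha$ and an impacting weight matrix $\beta$ as below:
$$R(P) = \alpha \beta, \qquad \alpha = (\theta_1, \theta_2, \ldots, \theta_n), \qquad \beta = \begin{pmatrix} r_{1,1} & \cdots & r_{1,\lambda} \\ \vdots & \ddots & \vdots \\ r_{n,1} & \cdots & r_{n,\lambda} \end{pmatrix}$$

subject to
$$\sum_{i=1}^{n} \theta_i = 1,$$

where $\theta_i$ stands for the weight of the $i$th segment in $P$. It can be calculated as follows:
$$\theta_i = \frac{L(E_i)}{\sum_{j=1}^{n} L(E_j)}$$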

where $L(E_i)$ denotes the length of the $i$th segment in $E_1, E_2, \ldots, E_n$. In general, our fingerprint map has the following two highlights: (a) A novel segment-grained fingerprinting method based on cellular signaling sequences is put forward to assist with mobile device positioning and city-wide trajectory estimation. (b) Two inventive features are created to replace the commonplace fingerprinting characteristics, such as RSS and SNR, commonly leveraged by fingerprint maps. This improvement enables real-time and accurate tracking of mobile devices in cloud computing backends. On the basis of this fingerprint map, base station handover sequences can be matched to the urban road network directly.
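Combining segment fingerprints into a path fingerprint amounts to a length-weighted average of zero-padded vectors. The sketch below assumes the union form of $S(P)$ and the length-proportional weights $\theta_i$ described above, and represents each segment as a hypothetical `(length, R(E_i))` pair with $R(E_i)$ as a dictionary; the inner loop computes the row vector $\alpha \beta$ entry by entry.

```python
from typing import Dict, List, Tuple


def path_fingerprint(segments: List[Tuple[float, Dict[str, float]]]) -> Dict[str, float]:
    """R(P) = alpha * beta for a path given as [(L(E_i), R(E_i)), ...] in travel order."""
    total_len = sum(length for length, _ in segments)
    # S(P): union of the segments' SBSSs, kept in first-seen order
    s_p: List[str] = []
    for _, r_e in segments:
        for bs in r_e:
            if bs not in s_p:
                s_p.append(bs)
    r_p = {bs: 0.0 for bs in s_p}
    for length, r_e in segments:
        theta_i = length / total_len  # entry of alpha
        for bs in s_p:
            # entry of beta: R(E_i, bs), or 0 when bs is not an SBS of E_i
            r_p[bs] += theta_i * r_e.get(bs, 0.0)
    return r_p


path = [(100.0, {"A": 0.6, "B": 0.4}), (300.0, {"B": 0.5, "C": 0.5})]
print({bs: round(w, 3) for bs, w in path_fingerprint(path).items()})
# {'A': 0.15, 'B': 0.475, 'C': 0.375}
```

Because every $R(E_i)$ sums to 1 and the $\theta_i$ sum to 1, $R(P)$ also sums to 1, so path fingerprints of different lengths remain directly comparable.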

The online stage realizes trajectory estimation in three steps: anchor clustering, anchor segment-matching and trajectory reconstruction. In the first step, all base stations correlated with the mobile device in a specific time span are spatially clustered into several groups to decrease the positioning error resulting from cellular handover. The group centers with timestamps are denoted as Anchor Points (APs). The second step projects all APs onto road segments and generates the sub-trajectories between every two consecutive APs. In the final step, the real-time updated trajectory is produced by linking all the sub-trajectories.

The signal of a mobile device is likely to switch back and forth between several neighboring base stations. This is known as the "Ping-Pong effect", and it interferes with both spatial locating and trajectory estimation. Therefore, we adopt the DBSCAN (Density-based spatial clustering of applications with noise) algorithm [52] to incrementally generate spatially robust anchor points as signal sources from the input cellular locations within a time span. In our model, neighboring base stations are clustered according to the geodesic distances between them.

The DBSCAN algorithm has two main input parameters: MinPts and ε. MinPts is the minimum number of base stations required to form a cluster; its value is set to 2 to smooth the "Ping-Pong effect" as much as possible. ε is the search radius around a base station location; since the distance between neighboring base stations is generally less than 300 m, ε is set to 300 m.

For each clustering group with N base stations, the geometric center of its base stations' positions is calculated by:

Base stations that cannot be assigned to any cluster are still treated as anchor locations, because mobile devices also establish effective connections to them. In that case, the position of the base station BS itself is taken as the anchor location.
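
A sketch of how anchor locations could be derived from the cluster labels. It assumes that the geometric center is the arithmetic mean of the member base stations' longitudes and latitudes (a reasonable approximation for clusters a few hundred metres wide) and that unclustered base stations carry the label -1, as in common DBSCAN implementations. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in degrees


def anchor_locations(positions: List[Point], labels: List[int]) -> List[Point]:
    """One anchor per cluster (its geometric center) plus one per unclustered BS."""
    groups: Dict[int, List[Point]] = defaultdict(list)
    singles: List[Point] = []
    for pos, lab in zip(positions, labels):
        if lab == -1:
            singles.append(pos)  # unclustered BS: its own position is the anchor
        else:
            groups[lab].append(pos)
    centers = [
        (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
        for _, pts in sorted(groups.items())
    ]
    return centers + singles
```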

Furthermore, the timestamp of each anchor point is estimated, which paves the way for cellular signaling sequence map-matching in the following steps. We compute the timestamp of each anchor point by referring to the connection durations of the base stations in the corresponding group. In the instance shown in Figure 4, three base stations, $BS_1$, $BS_2$ and $BS_3$, form an anchor point, whose timestamp $T$ can be derived from the longest periods of $BS_1$, $BS_2$ and $BS_3$ as below:

Here, N = 3 in this instance.

If the anchor point only corresponds to a base station BS, its timestamp T is given as follows:

$T = \mathrm{MedianTimestamp}(BS)$.
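
The single-station rule above can be sketched as follows. As a hypothetical reading of MedianTimestamp(BS), we assume it returns the median of the timestamps at which the device was recorded on BS within the current time span; the record layout and function name are also assumptions.

```python
from statistics import median
from typing import List, Tuple

Record = Tuple[float, str]  # (timestamp in seconds, base-station id)


def median_timestamp(records: List[Record], bs_id: str) -> float:
    """Hypothetical MedianTimestamp(BS): median time the device was seen on bs_id."""
    times = [t for t, bs in records if bs == bs_id]
    if not times:
        raise ValueError(f"base station {bs_id!r} does not occur in the records")
    return median(times)
```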

The pseudocode of anchor point location capture and timestamp inference is given in Algorithm 1.

In the second step, we match the output anchor points onto the road network chronologically. The first anchor point, $AP_1$, is matched to its nearest road segment as the origin, and subsequent APs are matched to specific road segments through the fingerprinting process. As Figure 5 illustrates, $AP_f$ and $AP_l$ are two consecutive APs. The blue $AP_f$ has already found its matching place, while the orange $AP_l$ is waiting to be matched. Due to its large signal coverage, $AP_l$ probably has more than one candidate segment. Therefore, there are two optional matching paths from $AP_f$ to $AP_l$: $Path_A$ and $Path_B$. It should be noted that the AP's searching radius directly affects the number of options, which is discussed in detail in Section 5.

Figure 5. The orange anchor point probably has more than one matching option.

Next, these two candidate paths are further investigated as below. By performing the following steps, the cellular signaling fragment between AP f and AP l is intercepted and compared with the fingerprint features of these two candidate paths.

Firstly, we split out the target cellular signaling fragment $Seq^F_{infer}$ by the timestamps of $AP_f$ and $AP_l$, which are denoted by $T^f_{infer}$ and $T^l_{infer}$, respectively. Suppose the input cellular signaling sequence is represented by $\langle \langle T_1, BS_1 \rangle, \langle T_2, BS_2 \rangle, \ldots, \langle T_n, BS_n \rangle \rangle$, the connecting base station at time $T^f_{infer}$ is $BS_k$ and that at $T^l_{infer}$ is $BS_{k+\alpha}$. Thus, we have:

satisfying $T_k \le T^f_{infer} < T_{k+1}$ and $T_{k+\alpha-1} < T^l_{infer} \le T_{k+\alpha}$. Then, the set of base stations recorded in $Seq^F_{infer}$ and their CMD proportion $R_{Seq^F_{infer}}$ are extracted and compared with the fingerprint map in order to select the most likely candidate path.
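
The interception of $Seq^F_{infer}$ can be sketched with binary search over the sorted timestamps. The sketch assumes records are (timestamp in seconds, base-station id) tuples sorted by time; the returned slice starts at the index $k$ and ends at the index $k+\alpha$ defined by the two inequalities above. The function name is hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Record = Tuple[float, str]  # (timestamp in seconds, base-station id)


def cut_fragment(records: List[Record], t_f: float, t_l: float) -> List[Record]:
    """Return records k..k+alpha with T_k <= t_f < T_{k+1} and T_{k+alpha-1} < t_l <= T_{k+alpha}."""
    if t_f >= t_l:
        raise ValueError("t_f must precede t_l")
    times = [t for t, _ in records]
    k = bisect_right(times, t_f) - 1  # last record at or before t_f
    end = bisect_left(times, t_l)     # first record at or after t_l
    if k < 0 or end >= len(records):
        raise ValueError("anchor timestamps fall outside the recorded sequence")
    return records[k:end + 1]
```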

The base stations can be extracted into a set $B_{Seq^F_{infer}}$ as follows:

Suppose there are $m$ base stations in $B_{Seq^F_{infer}}$; that is,

The cumulative connection time of each base station in $B_{Seq^F_{infer}}$ is calculated, and these times can be organized as a vector:

$$W_{Seq^F_{infer}} = (w_1, w_2, \ldots, w_m).$$

Suppose the traveler moves at velocity $v$; then $R_{Seq^F_{infer}}$ can be evaluated by

Note that this evaluation metric is designed for smooth travel. Some travelers may encounter traffic jams, red-light stops or bus boarding during their journeys, which cause some base stations to receive unreasonable CMD estimates. Nonetheless, these errors can be adaptively eliminated by our segment-grained fingerprinting, as further demonstrated in Section 5.
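
Under the constant-velocity assumption, the distance covered while connected to base station $i$ is roughly $v \cdot w_i$, so the proportion $v w_i / \sum_j v w_j$ reduces to $w_i / \sum_j w_j$ and $v$ cancels out. The sketch below computes such proportions from an intercepted fragment, assuming that each record's connection lasts until the next record's timestamp and clipping those intervals to $[T^f_{infer}, T^l_{infer}]$. This is our reading of the CMD proportion, not the paper's exact equation; base stations with zero clipped duration are omitted, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

Record = Tuple[float, str]  # (timestamp in seconds, base-station id)


def cmd_proportion(fragment: List[Record], t_f: float, t_l: float) -> Dict[str, float]:
    """Proportion of cumulative connection time per base station inside [t_f, t_l]."""
    weights: Dict[str, float] = {}  # cumulative connection time w_i per BS
    for (t0, bs), (t1, _) in zip(fragment, fragment[1:]):
        # assumption: the device stays on `bs` from t0 until the next record at t1
        start, end = max(t0, t_f), min(t1, t_l)
        if end > start:
            weights[bs] = weights.get(bs, 0.0) + (end - start)
    total = sum(weights.values())
    if total == 0:
        raise ValueError("fragment has no connection time inside [t_f, t_l]")
    # with constant speed v, CMD_i = v * w_i, so v cancels in the proportion
    return {bs: w / total for bs, w in weights.items()}
```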

Suppose $Path_A$ consists of a series of segments $\langle E_1, E_2, \ldots, E_\omega \rangle$. $S(Path_A)$ represents the SBS set derived from the pre-calibrated fingerprint map, and the number of base stations it includes is denoted by $\phi$, that is,

The candidate path selection criterion is as follows. For instance, the matching degree between candidate $Path_A$ and $Seq^F_{infer}$ is evaluated by the Jensen-Shannon divergence [53].

Here, $p$ denotes the proportional distribution of the base stations' connection ranges obtained from the cellular signaling sequence, which is

$q$ is the distribution constructed from the pre-calibrated fingerprint map for the similarity comparison with $p$. The construction procedure is illustrated by Algorithm 2.
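
For reference, the Jensen-Shannon divergence between two discrete distributions $p$ and $q$ is $JS(p \| q) = \frac{1}{2} KL(p \| m) + \frac{1}{2} KL(q \| m)$ with $m = (p + q)/2$. The sketch below uses base-2 logarithms so that the value lies in $[0, 1]$; the logarithm base used by the authors is not stated here. Ranking candidate paths by the smallest divergence (equivalently, by the largest $1 - JS$ as a hypothetical Similarity score) is our assumption about how the comparison becomes a decision, and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    # Only called with q = m = (p + q') / 2, which is positive wherever p_i > 0.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Base-2 Jensen-Shannon divergence, bounded in [0, 1]."""
    if len(p) != len(q):
        raise ValueError("p and q must be defined over the same base stations")
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def best_path(p: Sequence[float], candidates: Dict[str, Sequence[float]]) -> str:
    """Pick the candidate path whose distribution q is closest to p (smallest JS)."""
    return min(candidates, key=lambda name: js_divergence(p, candidates[name]))
```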

Algorithm 2 (construction of $q$). For each base station in $B_{Seq^F_{infer}}$ with index $j$: if it also appears in $S(Path_A)$ with index $i$, set $r_j = R_i$; otherwise, set $r_j = 0$.
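
A sketch of the step of Algorithm 2 described above: $q$ is aligned index-by-index with $p$, taking the fingerprint weight $R_i$ wherever a base station of $B_{Seq^F_{infer}}$ also appears in $S(Path_A)$, and 0 otherwise. The input layout (an ordered list of base-station ids and a dictionary of path weights) and the final renormalization so that $q$ sums to 1 are our assumptions, not details given in the text; the function name is hypothetical.

```python
from typing import Dict, List


def build_q(b_seq: List[str], path_weights: Dict[str, float]) -> List[float]:
    """b_seq: base stations of B_Seq in the same order as p.
    path_weights: SBS id -> fingerprint weight R_i for S(Path_A)."""
    # r_j = R_i if the j-th BS of B_Seq is also an SBS of the path, else 0
    q = [path_weights.get(bs, 0.0) for bs in b_seq]
    total = sum(q)
    # assumption: renormalize so that q is a distribution comparable with p
    return [r / total for r in q] if total > 0 else q
```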

Finally, we compare the matching degrees between $Seq^F_{infer}$ and the candidate paths and choose the path with the largest one as the maximum-likelihood matching path. For instance, if $Similarity(Path_A)$ is larger than $Similarity(Path_B)$, the anchor point is matched to the nearest position on $Path_A$, marked as the orange triangle in Figure 6. Simultaneously, a sub-trajectory connecting the two consecutive APs is generated.

After the series of anchor points is matched to the road network, the generated sub-trajectories are concatenated to form the shortest path shown in Figure 7, which is the update of the mobile device's estimated trajectory. The road segments traversed by the trace in the digital map are extracted and chronologically ordered. Therefore, the trajectory can be transformed into an ordered sequence of road segments $\langle E_1, E_2, \ldots, E_n \rangle$, which are joined by intersections or road junctions.
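
Assuming each sub-trajectory is given as an ordered list of road-segment ids from one matched AP to the next, consecutive sub-trajectories share the segment on which their common AP was matched, so concatenation only needs to drop that repeated boundary segment. The sketch below is a hypothetical illustration of this bookkeeping; the shortest-path search that produces each sub-trajectory is not shown.

```python
from typing import List


def concatenate(sub_trajectories: List[List[str]]) -> List[str]:
    """Join per-AP sub-trajectories into one ordered road-segment sequence."""
    route: List[str] = []
    for sub in sub_trajectories:
        for seg in sub:
            # skip a segment identical to the previous one (the shared AP segment)
            if not route or route[-1] != seg:
                route.append(seg)
    return route
```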

We conduct our experiments in Minzhi and Bantian, two subdistricts of Shenzhen, China, where there are 209.9 km of urban roads in an […]. The whole network is divided into 603 segments, which are […] fingerprint map.

Our dataset is divided into two categories: a training set and a testing set. The former is used for fingerprint map construction and the latter for online trajectory estimation.

The training set and the testing set contain 172 and 306 trajectories, respectively. We implemented the system in Python on a machine with 8 GB of RAM and a 2.30 GHz Core i5 processor.

To assess the robustness of the fingerprint map, we studied 24 segments with an average length of 438 m, each of which has eight cellular signaling sequences. The following two parts evaluate robustness from the perspectives of the SBS set and the impact strength of SBSs, respectively.

In the first part, we investigate the variation of the SBSS with different numbers of cellular signaling sequences. Let $S^{(k)}$ be the SBSS extracted from $k$ ($1 \le k \le 8$) sequences; for instance, $S^{(3)}$ represents the SBSS calibrated from three cellular signaling sequences. The variation of $S^{(k)}$ can be measured by the Jaccard index:

$$J\left(S^{(k)}, S^{(8)}\right) = \frac{\left|S^{(k)} \cap S^{(8)}\right|}{\left|S^{(k)} \cup S^{(8)}\right|},$$

where $S^{(8)}$ is regarded as the baseline in our case.

The variation of the SBSS is quantified by the average of $J\left(S^{(k)}, S^{(8)}\right)$ over $N$ (here, $N = 24$) segments, denoted by $Similarity^{(k)}$:

$$Similarity^{(k)} = \frac{1}{N} \sum_{i=1}^{N} J\left(S^{(k)}_i, S^{(8)}_i\right). \qquad (25)$$

Figure 9 shows the variation of $Similarity^{(k)}$ as the number of cellular sequences used for fingerprint calibration varies from one to eight.
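
The Jaccard-based stability measure described above can be sketched as follows. Each segment's SBSS is assumed to be a Python set of base-station ids, and the inputs are two lists aligned by segment; treating two empty sets as identical (J = 1) is our convention for an edge case the text does not address. The function names are hypothetical.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity_k(sbss_k: List[Set[str]], sbss_8: List[Set[str]]) -> float:
    """Average Jaccard index over N segments against the 8-sequence baseline."""
    if len(sbss_k) != len(sbss_8) or not sbss_k:
        raise ValueError("need one SBSS per segment in both lists")
    return sum(jaccard(a, b) for a, b in zip(sbss_k, sbss_8)) / len(sbss_k)
```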

It can be seen that the majority of SBSs can be identified from three cellular signaling sequences, because the similarity with the baseline is close to 0.8, and most SBSs can be obtained from five sequences, because the similarity exceeds 0.9.

The influence of SBSs on each cellular signaling sequence is further inspected through the SBSs' temporal ratio, as illustrated in Figure 10. The average SBS temporal ratio over the 24 × 8 = 192 cellular signaling sequences is 83.34%, and the ratio reaches 90% for 61.80% of the sequences, as shown in Figure 11.

In conclusion, the SBSs are not only the base stations with high-frequency connections but also the dominant base stations in terms of connection duration. Therefore, it is reasonable to designate them as SBSs.

In the second part, we study the stability of the SBSs' influence on the segments by analyzing the proportional distribution of SBSs' impacting weights with different numbers of cellular signaling sequences.

Let $R^{(k)}(E_i)$ denote the proportional distribution of SBSs' impacting weights, where $E_i$ is the $i$th road segment and $k$ ($1 \le k \le 8$) is the number of cellular signaling sequences. Then, a weighted vector of $N$ segments can be represented as

Without loss of generality, we take $k$ to be 3, 5 and 8. First, letting $k$ be 3 and 8, we evaluate the similarity between $R^{(3)}$ and $R^{(8)}$ by their JS divergence, denoted by $JS\left(R^{(3)} \,\|\, R^{(8)}\right)$ and formulated as Equation (21). The value of $JS\left(R^{(3)} \,\|\, R^{(8)}\right)$ is 0.08, which indicates a clear resemblance between $R^{(3)}$ and $R^{(8)}$.

To show the similarity intuitively, Figure 12 plots these two distributions of SBS impacting weights. Likewise, we evaluate the similarity between $R^{(5)}$ and $R^{(8)}$ by $JS\left(R^{(5)} \,\|\, R^{(8)}\right)$, whose value drops further to 0.02. $R^{(5)}$ and $R^{(8)}$ are compared in Figure 13, which shows that the proportions of SBS weights obtained from five cellular sequences are nearly identical to those obtained from eight.

Figure 13. The visualization of $R^{(5)}$ and $R^{(8)}$.

In summary, the major fingerprint features of the road segment can be extracted from three cellular signaling sequences, and five cellular signaling sequences can provide extremely accurate fingerprint features. In our case, each road segment has no less than five cellular sequences for fingerprint feature calibration.

In the experimental region, we tested 306 cellular signaling sequences whose corresponding trajectories ranged from 1 km to 6 km in length, with a total length of 1367 km.

The performance of the trajectory estimation algorithm is evaluated by the precision rate and the recall rate:

$$Precision = \frac{|X \cap Y|}{|X|}, \qquad Recall = \frac{|X \cap Y|}{|Y|}$$

Herein, X represents the estimated trajectory (the solid line in Figure 14) , Y represents the true trajectory (the green dash line) and X ∩ Y represents the overlapping part of the two trajectories (the green solid line), namely the correct estimation. Their lengths are denoted by |X|, |Y| and |X ∩ Y| respectively.
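
The length-based metrics can be sketched by representing both trajectories as sets of road-segment ids with known lengths, taking precision as $|X \cap Y| / |X|$ and recall as $|X \cap Y| / |Y|$. This assumes overlap is counted at whole-segment granularity; partially overlapping segments are not handled, and the names are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def precision_recall(estimated: Iterable[str], truth: Iterable[str],
                     seg_len: Dict[str, float]) -> Tuple[float, float]:
    """Length-weighted precision |X ∩ Y| / |X| and recall |X ∩ Y| / |Y|."""
    est, tru = set(estimated), set(truth)
    if not est or not tru:
        raise ValueError("both trajectories must contain at least one segment")
    overlap = sum(seg_len[s] for s in est & tru)  # |X ∩ Y|
    precision = overlap / sum(seg_len[s] for s in est)  # divide by |X|
    recall = overlap / sum(seg_len[s] for s in tru)  # divide by |Y|
    return precision, recall
```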

The overall performance of our model is affected by many parameters. In this section, the critical parameters of NF-Track are further investigated as follows.

(1) The θ-value of segment fingerprinting

A small θ-value results in underfitting of the segment fingerprints, while a large one leads to overfitting. Therefore, the effect of the θ-value on overall accuracy was investigated at a 180-s time span, at which the algorithm was later shown to perform best. Figure 15 shows that, as the θ-value increases, the accuracy first increases and then decreases. Hence, the θ-value was set to 0.375, the optimum shown in Figure 15.

(2) The ε-value of anchor clustering

The number of APs obtained from the cellular trajectory is mainly affected by the ε-neighborhood in the density-based clustering algorithm. More APs bring more locating information to real-time tracking, but they also introduce more noise. Thus, the ε-value needs tuning. Figure 16 shows the effect of changing the ε-value on the estimation accuracy: within the tested range, the system achieves the best performance at ε = 300 m.

(3) The searching radius of anchor map matching

The searching radius of anchor map matching mentioned in Section 4.2.2 was set to 800 m, because a test shown in Figure 17 indicated that the transmission range of most base stations does not exceed 800 m. Hence, this searching radius ensures accurate map-matching of anchor points on ordinary urban road networks. An anchor point was discarded when no segment could be found within 800 m.
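
Candidate segments for an anchor point can be sketched as a radius query. The sketch assumes each road segment is a straight line between two (longitude, latitude) endpoints and uses a local equirectangular projection around the anchor, which is accurate enough over 800 m; real segments are polylines, and a production system would use a spatial index. All names are hypothetical, and an empty result corresponds to the case in which the anchor is discarded.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in degrees
EARTH_RADIUS_M = 6_371_000.0


def to_local_xy(lon: float, lat: float, lon0: float, lat0: float) -> Tuple[float, float]:
    """Equirectangular projection in metres around (lon0, lat0)."""
    x = math.radians(lon - lon0) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


def point_segment_distance(p, a, b) -> float:
    """Euclidean distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    t = 0.0 if length_sq == 0 else ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def candidate_segments(anchor: Point, segments: Dict[str, Tuple[Point, Point]],
                       radius_m: float = 800.0) -> List[Tuple[float, str]]:
    """Return (distance_m, segment_id) pairs within radius_m, nearest first."""
    lon0, lat0 = anchor
    result = []
    for seg_id, (a, b) in segments.items():
        d = point_segment_distance((0.0, 0.0), to_local_xy(*a, lon0, lat0),
                                   to_local_xy(*b, lon0, lat0))
        if d <= radius_m:
            result.append((d, seg_id))
    return sorted(result)
```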

NF-Track is compared with three types of RSS-independent trajectory estimation algorithms: (a) SnapNet [26], the current state-of-the-art online cellular trajectory estimation algorithm, which map-matches cellular-based locations with several innovative filters; (b) standardHMM, a plain HMM technique that map-matches the cellular locations without any of the additional filters; and (c) NearRd, which map-matches the anchor points to nearby roads and connects them through the shortest path without the fingerprint map. Figure 18a,b shows their recall and precision rates for various trajectory update spans $w$. All algorithms struggle to obtain high accuracy when the update span is less than 60 s, due to the high noise and sparseness of cellular-based positioning. NF-Track achieves higher accuracy than the other online map-matching algorithms in all update spans and performs best, with 91.68% recall and 90.35% precision at $w$ = 180 s. Without a fingerprint map (NearRd), accuracy generally declines by more than ten percent, which suggests that the fingerprint map substantially improves trajectory estimation.

To illustrate the advantage over the current state-of-the-art unsupervised methods, we further compare NF-Track with SnapNet in two categories of trajectory: regular trajectory (RT) and irregular trajectory (iRT), shown in Figure 19 . The regular trajectories are generated by travelers that follow the shortest and straightest paths. Besides, some trajectories in our testing dataset are longer and more twisted because of many irregular circumstances, e.g., congestion avoidance and bus detouring, and we call them irregular trajectories.

Figure 19. Regular trajectory (RT) and irregular trajectory (iRT).

Figure 20 shows the estimation accuracy of the two algorithms on the two kinds of trajectories at $w$ = 180 s. Both the supervised and the unsupervised model achieve satisfactory results for regular trajectories. However, for irregular trajectories, NF-Track is clearly more effective, whereas the unsupervised model is not robust, which makes it unsuitable for map-matching in complicated urban environments. Moreover, NF-Track has low sensitivity to the patterns of moving trajectories, which indicates that the fingerprinting mechanism helps enhance the stability and generalization of cellular trajectory map-matching.

AP clustering and its timestamp inference are vital to the trajectory estimation algorithm. The correctness of AP map-matching is mainly affected by the inferred timestamps of the previous AP. Hence, it is essential to evaluate the accuracy of the inferred timestamps and their influence on map-matching, so as to assess the validity of the trajectory estimation algorithm.

In this study, 976 APs are extracted for analysis from the 306 cellular signaling sequences at $w$ = 180 s. For each cellular signaling sequence, its extracted AP series is the backbone for reconstructing the spatiotemporal trajectory of the mobile device. As mentioned above, each recorded GPS location has already been matched to a position in the road network. Therefore, for an AP, its closest GPS point can be taken as the true position to which it should be matched, and the timestamp of that GPS point, denoted by $T_{true}$, is regarded as the true value of the AP's timestamp.

Referring to its inferred timestamp $T_{infer}$ derived by Equations (13) and (14), the timestamp inference error of an AP is expressed as:

Through the analysis of these 976 APs, the distribution of their timestamp inference errors is obtained, as shown in Figure 21: 88% of the anchor points have errors within 10 s. The large inference errors of a small fraction of the APs are caused by the devices' long rest behavior mentioned above, which perturbs the timestamp inference. Overall, the error distribution shows that such perturbations, caused by the unsteady movement of urban travelers, can be tolerated, because the average time gap between two successive APs is as long as 93 s.

To further investigate the influence of timestamp inferring errors on fingerprint comparison and map-matching of APs, the cellular signaling fragments truncated by two consecutive APs are explored.

Suppose $AP_f$ and $AP_l$ are two successive anchor points.

For all truncated cellular signaling fragments, the average consistency is 0.78, which indicates that the errors of inferred base station sets are quite small.

The proportions of base stations' CMD in $Seq^F_{infer}$ and $Seq^F_{true}$, denoted by $R_{Seq^F_{infer}}$ and $R_{Seq^F_{true}}$, respectively, were further evaluated by the JS divergence $JS\left(R_{Seq^F_{infer}} \,\|\, R_{Seq^F_{true}}\right)$ to check their consistency. The average divergence was 0.13, and 80% of the cellular signaling fragments had a divergence of less than 0.3, as shown in Figure 22.

In general, the current inference errors of APs have little influence on AP map-matching and trajectory estimation.

Cellular signals are affected, to some extent, by time periods and weather conditions. Urban traffic changes over time, which strongly influences the network load and radio signal quality [54]. In addition, outdoor weather conditions can cause severe degradation of network performance [55]. Thus, the influence of these external factors on estimation accuracy is further investigated in this study. Table 3 summarizes the results at w = 180 s. The overall estimation accuracy of cellular trajectories during non-rush hours is slightly higher than during rush hours (7-10 a.m. and 5-7 p.m.). The main reason is that, during non-rush hours, the cellular network has sufficient capacity and localization is less disturbed by the "Ping-Pong effect" (frequent back-and-forth handovers between neighboring base stations). Referring to historical weather information [56], we found that weather conditions have little impact on the performance of NF-Track, because base stations are densely deployed in urban areas and serve mobile users well.
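A breakdown like the one in Table 3 requires assigning each trajectory to a rush or non-rush period. The sketch below does this with the rush-hour windows named in the text; treating the windows as half-open hour intervals ([7:00, 10:00) and [17:00, 19:00)), keying each trajectory by its start time, and pooling hypothetical (correct, total) counts are all assumptions made for illustration.

```python
from datetime import datetime
from typing import Dict, Iterable, Optional, Tuple

# Rush-hour windows from the text (7-10 a.m. and 5-7 p.m.), ASSUMED half-open.
RUSH_WINDOWS = [(7, 10), (17, 19)]


def is_rush_hour(ts: datetime) -> bool:
    """True if the timestamp falls inside one of the rush-hour windows."""
    return any(start <= ts.hour < end for start, end in RUSH_WINDOWS)


def accuracy_by_period(
    records: Iterable[Tuple[datetime, int, int]]
) -> Dict[str, Optional[float]]:
    """Pool hypothetical (start_time, n_correct, n_total) records per period."""
    totals = {"rush": [0, 0], "non-rush": [0, 0]}
    for ts, correct, total in records:
        key = "rush" if is_rush_hour(ts) else "non-rush"
        totals[key][0] += correct
        totals[key][1] += total
    # None marks a period with no data rather than a zero accuracy.
    return {k: (c / t if t else None) for k, (c, t) in totals.items()}
```

Keying on the start time is the simplest choice; trajectories that straddle a window boundary could instead be split or weighted by the time spent in each period.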

The computational cost of the online trajectory estimation methods mentioned above is further investigated. Figure 23 presents the computational efficiency of NF-Track and SnapNet, measured as the latency to process a given timespan of input cellular locations. Although NF-Track takes slightly longer than the unsupervised method because of the fine-grained fingerprint comparison performed in dense urban road networks, this extra cost is justified by the significant improvement in estimation accuracy.
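Latency of the kind plotted in Figure 23 can be measured with a simple harness that cuts the location stream into consecutive time windows and times an estimator on each. In this sketch the `estimate` callable, the `(timestamp_s, x, y)` tuple layout, and the window length are hypothetical; for instance, 180 s mirrors the w = 180 s setting of Table 3, but whether Figure 23 used the same windows is not stated.

```python
import time
from typing import Callable, List, Sequence, Tuple

Location = Tuple[float, float, float]  # (timestamp_s, x, y), hypothetical layout


def latency_per_window(estimate: Callable[[Sequence[Location]], object],
                       locations: Sequence[Location],
                       window_s: float) -> List[float]:
    """Time `estimate` on consecutive windows of `window_s` seconds.

    Locations must be sorted by timestamp. Returns wall-clock latencies in
    seconds, one per window (the last window may be partial).
    """
    if not locations:
        return []
    latencies: List[float] = []
    start = locations[0][0]
    window: List[Location] = []
    for loc in locations:
        if loc[0] - start >= window_s and window:
            # Close the current window and time the estimator on it.
            t0 = time.perf_counter()
            estimate(window)
            latencies.append(time.perf_counter() - t0)
            start, window = loc[0], []
        window.append(loc)
    # Flush the final, possibly partial, window.
    t0 = time.perf_counter()
    estimate(window)
    latencies.append(time.perf_counter() - t0)
    return latencies
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock intended for measuring short durations.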

Cellular signaling data contains rich movement information about urban travelers. In this paper, we proposed NF-Track to realize online trajectory estimation in urban road networks based on a network-wide fingerprint map. Unlike previous cellular signaling sequence map-matching systems, NF-Track depends neither on the hardware-relevant information used in conventional fingerprinting approaches nor on the heuristic hypotheses widely leveraged by unsupervised methods. Therefore, NF-Track is better suited to cloud computing backend deployment for real-time tracking, which benefits many applications, such as infectious disease tracing and screening, network flow sensing, and traffic scheduling.

NF-Track was tested on a real-world urban dataset. The experiments show that our novel fingerprint map is robust enough to support accurate map-matching of cellular signaling sequences. NF-Track achieves a recall rate of 91.68% and a precision rate of 90.35% in complex traffic scenes, outperforming state-of-the-art model-based unsupervised learning approaches.
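For readers reproducing such figures, one common way to score a map-matched route is length-weighted precision and recall over road segments: precision is the matched length that is correct divided by the total matched length, and recall is the same overlap divided by the true route length. The sketch below implements that convention; it is an assumption, since this section does not restate NF-Track's exact formulas, and the segment IDs and lengths are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def segment_precision_recall(matched: Iterable[str], truth: Iterable[str],
                             length: Dict[str, float]) -> Tuple[float, float]:
    """Length-weighted precision and recall of a map-matched route.

    matched, truth: road-segment IDs of the estimated and true routes.
    length: segment ID -> length in metres.
    ASSUMPTION: one common convention; the paper's formula may differ.
    """
    m, t = set(matched), set(truth)
    overlap = sum(length[s] for s in m & t)
    precision = overlap / sum(length[s] for s in m) if m else 0.0
    recall = overlap / sum(length[s] for s in t) if t else 0.0
    return precision, recall
```

Weighting by length keeps short connector segments from dominating the score; an unweighted variant simply counts segments instead.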

NF-Track has great potential to be extended to larger areas. To this end, we are currently working to enhance the generalization and portability of NF-Track. In particular, we have incorporated base station parameters into the fingerprinting process, which can further facilitate the deployment of our system and maintain estimation accuracy over larger urban areas.

[1] The sensable city: A survey on the deployment and management for smart city monitoring

[2] An intelligent big data collection technology based on micro mobile data centers for crowdsensing vehicular sensor network

[3] Urban sensing based on mobile phone data: Approaches, applications, and challenges

[4] Online map-matching based on hidden Markov model for real-time traffic sensing applications

[5] Dual attentive graph neural network for metro passenger flow prediction

[6] Learning trajectories as words: A probabilistic generative model for destination prediction

[7] Combining GPS with TOA/TDOA of Cellular Signals to Locate Terminal

[8] CellSense: An Accurate Energy-Efficient GSM Positioning System

[9] 5G mmWave Positioning for Vehicular Networks

[10] A Survey of Enabling Technologies for Network Localization, Tracking, and Navigation

[11] An Overview on Integrated Localization and Communication towards 6G

[12] Cellphone tracking could help stem the spread of coronavirus. Is privacy the price? Science, 2020

[13] The need for GIScience in mapping COVID-19

[14] Mitigating COVID-19 Transmission in Schools with Digital Contact Tracing

[15] A tailored machine learning approach for urban transport network flow estimation

[16] Cellpath: Fusion of cellular and traffic sensor data for route flow estimation via convex optimization

[17] Travel mode classification of intercity trips using cellular network data

[18] Parallel distance: A new paradigm of measurement for parallel driving

[19] Vehicle motion prediction at intersections based on the turning intention and prior trajectories model

[20] Solving Traffic Signal Scheduling Problems in Heterogeneous Traffic Network by Using Meta-Heuristics

[21] Estimating real human trajectories through mobile phone data

[22] Reconstruction of human movement trajectories from large-scale low-frequency mobile phone data

[23] Map-matching cell phone trajectories of low spatial and temporal accuracy

[24] TRANSIT: Fine-grained human mobility trajectory inference at scale with mobile network signaling data

[25] Fast map matching for cellular data

[26] Accurate Real-time Map Matching for Challenging Environments

[27] Online map-matching of noisy and sparse location data with hidden Markov and route choice models

[28] Can We Map-Match Individual Cellular Network Signaling Trajectories in Urban Environments? Data-Driven Study

[29] RADAR: An In-building RF-based User Location and Tracking System

[30] Modeling of indoor positioning systems based on location fingerprinting

[31] Wi-Fi Fingerprint-Based Topological Map Building for Indoor User Tracking

[32] Indoor Localization Based on CSI Fingerprint by Siamese Convolution Neural Network

[33] RSSI-fingerprinting-based mobile phone localization with route constraints

[34] Low-Energy Trajectory Mapping for Mobile Devices

[35] A map-matching algorithm dealing with sparse cellular fingerprint observations

[36] Integrated tracking and route classification for travel time estimation based on cellular network signalling data

[37] CLSTERS: A General System for Reducing Errors of Trajectories Under Challenging Localization Situations

[38] Advanced Location-Based Technologies and Services

[39] Practical metropolitan-scale positioning for GSM phones

[40] Matching GPS observations to locations on a digital map

[41] On Map-Matching Vehicle Tracking Data

[42] On-line map-matching framework for floating car data with low sampling rate in urban road networks

[43] Map-matching for low-sampling-rate GPS trajectories

[44] Non-uniform sampling strategies for continuous correction based trajectory estimation

[45] Discovering Density-Based Clustering Structures Using Neighborhood Distance Entropy Consistency

[46] Intelligent Geodemographic Clustering Based on Neural Network and Particle Swarm Optimization

[47] Fuzzy clustering based on automated feature pattern-driven similarity matrix reduction

[48] Cubature Kalman Filter Under Minimum Error Entropy With Fiducial Points for INS/GPS Integration

[49] Map matching of mobile probes based on handover location technology

[50] Impacts of Sensor Spacing on Accurate Freeway Travel Time Estimation for Traveler Information

[51] Systematic Relation of Estimated Travel Speed and Actual Travel Speed

[52] A density-based algorithm for discovering clusters in large spatial databases with noise

[53] Divergence measures based on the Shannon entropy

[54] Understanding traffic dynamics in cellular data networks

[55] Effects of Temperature and Humidity on Radio Signal Strength in Outdoor Wireless Sensor Networks

[56] The Historical Weather in Shenzhen

This work was supported by Huawei Technologies Co., Ltd., Shenzhen, China.

The authors declare no conflict of interest.