Strzheletska, Elena V.; Tsotras, Vassilis J.
Reachability and Top-k Reachability Queries with Transfer Decay
2021-05-18

Abstract-The prevalence of location tracking systems has resulted in large volumes of spatiotemporal data generated every day.
Addressing reachability queries on such datasets is important for a wide range of applications (surveillance, public health, social networks, etc.). A spatiotemporal reachability query identifies whether a physical item (or information, etc.) could have been transferred from the source object O S to the target object O T during a time interval I (either directly, or through a chain of intermediate transfers). In previous research on spatiotemporal reachability queries, the number of such transfers is not limited, and the weight of a piece of transferred information remains the same. This paper introduces novel reachability queries, which assume a scenario of information decay. Such queries arise when the value of information that travels through the chain of intermediate objects decreases with each transfer. To address such queries efficiently over large spatiotemporal datasets, we introduce the RICCdecay algorithm. Further, the decay scenario leads to an important extension: if there are many different sources of information, the aggregate value of information an object can obtain varies. As a result, we introduce a top-k reachability problem, identifying the k objects with the highest accumulated information. We also present the RICCtopK algorithm that can efficiently compute top-k reachability with transfer decay queries. An experimental evaluation shows the efficiency of the proposed algorithms over previous approaches.

Index Terms-spatio-temporal data, reachability query, top-k, access methods.

I. INTRODUCTION

Answering reachability queries on large spatiotemporal datasets is important for a wide range of applications, such as security monitoring, surveillance, public health, epidemiology, social networks, etc. Nowadays, with the persistence of Covid-19, reachability and trajectory analysis are as important as ever, since efficient contact tracing helps to control the spread of the disease.
Given two objects O S and O T , and a time interval I, a spatiotemporal reachability query identifies whether information (or a physical item, etc.) could have been transferred from O S to O T during I (typically indirectly, through a chain of intermediate transfers). The time to exchange information (or physical items, etc.) between objects affects the problem solution and is application specific. An 'instant exchange' scenario (where information can be instantly transferred and retransmitted between objects) is assumed in [1], but may not be the case in many real world applications. On the other hand, [2] and [3] consider two reachability scenarios without the 'instant exchange' assumption: reachability with processing delay and with transfer delay. After two objects encounter each other, the contacted object may have to spend some time to process the received information before being able to exchange it again (processing delay). In other applications, for the transfer of information to occur, two objects are required to stay close to each other for some period of time (transfer delay); we call such an elongated contact a meeting. To contract a virus, one has to be exposed to an infected person for a certain period of time; to exchange messages through Bluetooth, two cars have to travel closely together for some time. While the problems discussed above considered different reachability scenarios, they had a common feature: the value of information carried by the object that initiated the transmission, and the value of information obtained by any reached object, were assumed to remain unchanged. In this paper, we remove this assumption, since for some applications it may not be valid. For example, if two persons communicate over the phone (or a Bluetooth-enabled device), some information may be lost due to a faulty connection.
We name a reachability problem where the value of the transmitted item experiences a decay with each transfer as reachability with transfer decay. The formal definition of the new problem is given in Section III. Note that in this paper we still assume the transfer delay scenario, as it is more realistic. The information decay scenario leads to the second problem we introduce, namely top-k reachability with decay. Consider a group of objects (people, cars, etc.), each of which possesses a different piece of information and starts transmitting it to other objects independently of the others. The objects that initiated the process form a set of source objects. Each of the source objects may carry information of a different value (and different weight), and during a contact, the decay of each piece of information may not be the same. As time progresses, any object may receive one or more items that originally came from different sources. It is reasonable to compute the combined weight of all the items collected by each object and rank the objects according to their aggregate weights. Objects with the most aggregated information may be of special interest. A top-k reachability query with decay finds the k objects with the highest aggregate weights. In this paper we present two algorithms, RICCdecay and RICCtopK, that can efficiently compute reachability and top-k reachability with transfer decay queries on large spatiotemporal datasets. RICCdecay consists of two stages, preprocessing and query processing, while RICCtopK performs top-k query processing using the index from RICCdecay. The rest of the paper is structured as follows: Section II is an overview of related work, while Section III defines the two problems. Section IV describes the RICCdecay algorithm and its preprocessing phase, while Sections V and VI present the query processing for the reachability with decay and the top-k reachability problems respectively.
Section VII contains the experimental evaluation and Section VIII concludes the paper.

II. RELATED WORK

Graph Reachability. A large number of works has been proposed for the static graph reachability problem. The efficient approaches balance the preprocessing with the query processing, and are categorized in [4] as those that use: (i) transitive closure compression [5], [6], (ii) hop labeling [7], [8], [9], and (iii) refined online search [10], [11]. In our model, the reachability question can be represented as a variation of a shortest path query. The state-of-the-art algorithm for solving shortest path problems on road networks is Contraction Hierarchies (CH) [12]. CH benefits from creating a hierarchy of nodes on the basis of their importance for the given road network. In our problem, there is no preference between the graph nodes, and thus applying CH to it would be inefficient. Evolving Graphs. Evolving graphs have recently received increased attention. The DeltaGraph [13] is an external hierarchical index structure used for efficiently storing and retrieving historical graph snapshots. For analyzing distance and reachability on temporal graphs, [14] utilizes graph reachability labeling, while for large dynamic graphs, [15] constructs a reachability index based on a combination of labeling, ordering, and updating techniques. These methods work with datasets of a different nature compared with spatiotemporal data. Spatiotemporal Databases. A survey on spatiotemporal access methods appears in [16]. They often involve some variation on hierarchical trees [17]-[21], some form of a grid-based structure [22], [23], or indexing in parametric space [24], [25]. The existing spatiotemporal indexes support traditional range and nearest neighbor queries, and not the reachability queries we examine here.
Some of the more recent, complex queries focus on querying/identifying the behavior and patterns of moving objects: discovering moving clusters [26], [27], flock patterns [28], and convoy queries [29]. Spatiotemporal Reachability Queries. The first disk-based solutions for the spatiotemporal reachability problem, ReachGrid and ReachGraph, appeared in [1]. These are indexes on the contact datasets that enable faster query times. In ReachGrid, during query processing only a necessary portion of the contact network is constructed and traversed. In ReachGraph, the reachability at different scales is precomputed and then reused at query time. ReachGraph makes the assumption that a contact between two objects can be instantaneous, and thus during one time instance a chain of contacts may occur, which allows it to be smaller in size and thus reduces query time. ReachGrid does not require the 'instant exchange' assumption. In [2], two novel types of 'no instant exchange' spatiotemporal reachability queries were introduced: reachability queries with processing and transfer delays (meetings). The proposed solution to the first type utilized CH [12] for path contraction. Later, [3] considered the second type of delays and introduced two algorithms, RICCmeetMin and RICCmeetMax. To reduce query processing time, these algorithms precompute the shortest valid meetings (RICCmeetMin) and the longest possible meetings (RICCmeetMax), respectively. Neither of them can accommodate reachability queries with decay. Spatiotemporal Top-k Queries. While many works have considered variations of spatial and spatiotemporal top-k queries [30]-[36], no previous work addresses the decay scenario.

III. PROBLEM DEFINITION

In this section, we define two novel spatiotemporal reachability problems: the problem of reachability with decay and its extension, the problem of top-k reachability with decay.
Let O = {O 1 , O 2 , ..., O n } be a set of moving objects, whose locations are recorded for a long period of time at discrete time instants t 1 , t 2 , ..., t i , ..., with the time interval between consecutive location recordings ∆t = t k+1 − t k (k = 1, 2, ...) being constant. A trajectory of a moving object O i is a sequence of pairs (l i , t k ), where l i is the location of object O i at time t k . Two objects, O i and O j , that at time t k are respectively at positions l i and l j , have a contact (denoted as < O i , O j , t k >) if dist(l i , l j ) ≤ d cont , where d cont is the contact distance (a distance threshold given by the application), and dist(l i , l j ) is the Euclidean distance between the locations of objects O i and O j at time t k . The reachability with transfer delay scenario (which we follow here) requires discretizing the time interval between consecutive position readings [t k , t k+1 ) by dividing it into a series of non-overlapping subintervals [τ 0 , τ 1 ), ..., [τ i , τ i+1 ), ..., [τ r−1 , τ r ) of equal duration ∆τ = τ i+1 − τ i , such that τ 0 = t k and τ r = t k+1 . We say that two objects, O i and O j , had a meeting < O i , O j , I m > during the time interval I m = [τ s , τ f ] if they had been within the threshold distance d cont from each other at each time instant τ k ∈ [τ s , τ f ]. The duration of this meeting is m = τ f − τ s . We call a meeting valid if its duration m ≥ m q ∆τ (where m q ∆τ is the query-specified required meeting duration, i.e., the time needed for the objects to complete the exchange). We say that object O T is (m q )-reachable from object O S during the time interval I if there exists a chain of valid meetings < O S , O i1 , I m0 >, < O i1 , O i2 , I m1 >, ..., < O ik , O T , I mk >, such that each I mj = [τ sj , τ fj ] is contained in I, and τ sj+1 ≥ τ fj for j = 0, 1, ..., k−1. A reachability query determines whether object O T (the target) is reachable from object O S (the source) during time interval I. Consider the example in Fig. 1. Table (a) shows the actual meetings between all objects during one time block, which is given as a meetings graph in (b). A materialized reachability graph shows how the information is being dispersed.
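The contact and meeting definitions above can be sketched in code. The following is a minimal illustration, not the paper's implementation; it assumes trajectories are given as per-instant snapshots mapping object ids to (x, y) positions, and expresses m_q in units of ∆τ:

```python
import math

def contacts_at(positions, d_cont):
    """All pairs of objects within distance d_cont at one time instant."""
    ids = sorted(positions)
    pairs = set()
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            (x1, y1), (x2, y2) = positions[ids[i]], positions[ids[j]]
            if math.hypot(x1 - x2, y1 - y2) <= d_cont:
                pairs.add((ids[i], ids[j]))
    return pairs

def valid_meetings(snapshots, d_cont, m_q):
    """Merge consecutive contacts into meetings [tau_s, tau_f] and keep
    the valid ones, i.e. those with duration m = tau_f - tau_s >= m_q."""
    open_meetings = {}          # pair -> tau_s of the ongoing meeting
    valid = []
    for tau, positions in enumerate(snapshots):
        current = contacts_at(positions, d_cont)
        for pair in list(open_meetings):
            if pair not in current:      # meeting ended at tau - 1
                tau_s = open_meetings.pop(pair)
                if (tau - 1) - tau_s >= m_q:
                    valid.append((pair, tau_s, tau - 1))
        for pair in current:
            open_meetings.setdefault(pair, tau)
    tau_f = len(snapshots) - 1           # flush meetings open at block end
    for pair, tau_s in open_meetings.items():
        if tau_f - tau_s >= m_q:
            valid.append((pair, tau_s, tau_f))
    return valid
```

Note that the boundary-meeting handling described later (keeping block-boundary meetings regardless of duration) is omitted from this sketch.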
Suppose object O 1 is the source object and the required meeting duration m q = 2∆τ . Then graph G 2 in (c) is the materialized (m q )-reachability graph for O 1 on the data from (a). By looking at G 2 , one can discover all objects that can be (m q )-reached by object O 1 during the time interval I = [τ 0 , τ 8 ]. In the reachability with transfer delay scenario, to complete the transfer, it is necessary for the objects to stay within the contact distance for a time interval that is at least as long as the required meeting duration m q . However, even if a meeting between objects O i and O j satisfied the m q requirement, under some circumstances the transfer may still fail to occur, or the value of the transferred item may go down (e.g., a complete or partial signal loss during the communication). We consider a new type of reachability scenario, namely reachability with transfer decay, that accounts for such events. Let d denote the rate of transfer decay - the part of the information lost during one transfer (d ∈ [0, 1)). Then p = 1 − d (p ∈ (0, 1]) defines the portion of the information that is transferred. Suppose the weight of the item carried by a source object O S is w. Then, during a valid meeting, O S can transfer this item to some object O i . However, considering the decay, if d > 0, the value of the information obtained by O i lessens and becomes wp. With each further transfer, the value of the received item will continue to decrease. This process can be modeled with an exponential decay function. We denote the number of transfers (hops) required to pass the information from object O S to object O i as h, and let g w (h) be a function that calculates the weight of an item after h transfers.
Assuming that the transfer decay d, and thus p, are constant for the same item, g w (h) can be defined as follows:

g w (h) = w p^h. (1)

The number of transfers h in equation (1), which an item has to complete in order to be delivered from object O S to object O i , depends on the time τ j when it is being evaluated, and is thus denoted as h(O i (τ j ) ) (e.g., in Fig. 1, h(O 3 ) = 1). The case with p = 1 corresponds to the reachability with transfer delay problem [3]. If p < 1, the value of g w (h) decreases exponentially with each transfer. Let ν denote the threshold weight. If after some transfer the weight of the item becomes smaller than the threshold weight ν, we disregard that event by assigning to the newly transferred item the weight of 0. We say that h is the allowed number of hops (transfers) if it satisfies the threshold weight inequality

w p^h ≥ ν. (2)

We denote the maximum allowed number of transfers that satisfies inequality (2) as h max . Let f w be a function that assigns the weight to an item carried by object O i at time τ j , denoted as f w (O i (τ j ) ), and defined as follows:

f w (O i (τ j ) ) = g w (h) if h ≤ h max , and 0 otherwise. (3)

The table in Fig. 1 (a) shows the meetings between objects. Here we assume again that O 1 is the source object, m q = 2∆τ , and d = 0.2 (thus p = 0.8). To illustrate the difference between the actual weight of an item g w and its assigned weight f w , the values g w , f w1 , and f w2 are computed for each object at time instants from τ 0 to τ 8 and recorded in the table (see Fig. 2). The values for the assigned weight functions f w1 and f w2 are computed for ν = 0.6 and ν = 0.7 respectively. The graph G 3 in Figure 1 is the materialized (m q , d)-reachability graph: object O T is (m q , d)-reachable from object O S during I if there exists a chain of subsequent valid and successful (under the m q , d conditions) meetings. We assume that the values of d and ν are query specified.
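Equations (1)-(3) translate directly into code. The sketch below uses function names of our own choosing; h_max is derived from inequality (2) as the largest integer h with w p^h ≥ ν:

```python
import math

def g_w(w, p, h):
    """Eq. (1): actual weight of an item after h transfers, p = 1 - d."""
    return w * p ** h

def h_max(w, p, nu):
    """Largest h satisfying the threshold inequality (2): w * p^h >= nu."""
    if p == 1.0:
        return math.inf               # no decay: any number of hops
    return math.floor(math.log(nu / w, p))

def f_w(w, p, nu, h):
    """Eq. (3): assigned weight -- the actual weight while h <= h_max,
    and 0 once the item has decayed below the threshold nu."""
    return g_w(w, p, h) if h <= h_max(w, p, nu) else 0.0
```

For example, with w = 1, d = 0.2 (p = 0.8), and ν = 0.6 as in Fig. 2, h_max is 2: the weights 0.8 and 0.64 are kept, while the third-hop weight (0.512) is assigned 0.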
An (m q , d)-reachability query Q md : {O S , O T , w, d, I, m q , ν} determines whether the target object O T is reachable from the source object O S , which carries an item whose weight is w, during the time interval I = [τ s , τ f ], given the required meeting duration m q , rate of transfer decay d, and threshold weight ν, and reports the earliest time instant when O T was reached. We now consider the problem of top-k reachability with transfer decay. Let S = {O S1 , O S2 , ..., O Sq }, W = {w 1 , w 2 , ..., w q }, and D = {d 1 , d 2 , ..., d q } be the sets of source objects, weights, and decays respectively. Each object O Sr ∈ S carries a different piece of information (or physical item), whose weight is w r , and is able to transfer this information following the (m q , d)-reachability scenario. The transfer decay for the item carried by object O Sr is d r . As the objects move through the network, source objects O Sr encounter other objects, and may pass information to them. Since each source object owns a different piece of information, the transferred weight depends on both the number of hops and the source that it came from. Let h r (h r ≥ 0) be the number of hops required for object O Sr to pass the information to object O i . Then we can calculate the actual weight of item r after h r transfers using equation (1) as

g w(r) (h r ) = w r p r ^h r , where p r = 1 − d r and r = 1, 2, ..., q.

As in the previous problem, we require that each threshold weight inequality has been satisfied:

w r p r ^h r ≥ ν, for r = 1, 2, ..., q and threshold weight ν.

Let h max(r) be the maximum allowed number of transfers that satisfies the inequality above for each r = 1, 2, ..., q. Similarly to (3), function f w(r) assigns a weight to the r-th item carried by object O i at time τ j , denoted f w(r) (O i (τ j ) ), as follows:

f w(r) (O i (τ j ) ) = g w(r) (h r ) if h r ≤ h max(r) , and 0 otherwise. (4)

Furthermore, each object may receive more than one item.
We denote the aggregate weight function F w that assigns a weight to the collection of items carried by object O i at time τ j as F w (O i (τ j ) ), and define it as follows:

F w (O i (τ j ) ) = Σ r=1..q f w(r) (O i (τ j ) ). (5)

A top-k reachability with decay query Q topK is given in the form {S, W, D, I, m q , ν, k}. The goal of Q topK is to find the k objects with the highest aggregate weight F w (computed according to (5)) obtained during the time interval I.

IV. PREPROCESSING

As with other reachability problems discussed above, there are two naive approaches to solve the (m q , d)-reachability problem: (i) 'no preprocessing', and (ii) 'precompute all'. Neither one of them is feasible for large graphs: the first does not involve any preprocessing, and is thus too slow during query processing, while the second requires too much time for preprocessing and too much space for storing the preprocessed data. To overcome the disadvantages of the second approach and still achieve fast query processing, we precompute and store only some data, as described below. In order to simplify the presentation, we assume that the minimum meeting duration µ (µ ≤ m q ) is known before the preprocessing, and set m q = µ, thus fixing it. However, the proposed algorithm can be extended to work with any query-specified m q by combining it with RICCmeetMax [3]. Suppose our datasets contain records of objects' locations in the form (t, object id, location), ordered by the location reporting time t. We start the preprocessing by dividing the time domain into non-overlapping time intervals of equal duration (time blocks). Each time block (denoted as B k ) contains all records whose reporting times belong to the corresponding time period. The number of reporting times in each block is the contraction parameter C. How to find an optimal value of C will be discussed in Section VII.
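Equation (5) and the top-k ranking it induces can be sketched as follows. This is illustrative only; `hops` is a hypothetical structure mapping each source id to the hop count at which that source's item arrived at the object:

```python
def aggregate_weight(hops, weights, decays, nu):
    """Eq. (5): sum of the assigned per-source weights of the items an
    object holds; items decayed below nu contribute 0."""
    total = 0.0
    for r, h in hops.items():
        value = weights[r] * (1.0 - decays[r]) ** h
        if value >= nu:               # per-item threshold inequality
            total += value
    return total

def top_k(objects_hops, weights, decays, nu, k):
    """Rank objects by aggregate weight F_w and return the k best ids."""
    scored = [(aggregate_weight(h, weights, decays, nu), oid)
              for oid, h in objects_hops.items()]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [oid for _, oid in scored[:k]]
```

An object holding two decayed items (say 3·0.8 and 4·0.8) thus outranks an object holding a single undecayed item of weight 3.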
For each time block, during the preprocessing, the following steps have to be completed: (i) computing candidate contacts, (ii) verifying contacts (performed for each t k ), (iii) identifying meetings, (iv) computing reachability, and (v) index construction. Steps (i), (ii), (iii), and (v) are similar to those in [3]; we discuss them briefly, while concentrating on step (iv), which is the most challenging step of the preprocessing. During the preprocessing, information regarding each object O i is saved in a data structure named objectRecord(O i ), which is created at the beginning of each time block B k and deleted after all the needed information is written to disk at the end of B k . ObjectRecord(O i ) has the following fields: Object id, Cell id (the object's placement in the grid with side H when it was first seen during B k ), ContactsRec (a list of the contacts of O i during B k ), MeetingsRec (a list of meetings of O i during B k ). The grid side H is another parameter (in addition to the contraction parameter C) which needs to be optimized. We will discuss this question in Section VII. In addition, for each time block we maintain a hashing scheme that enables accessing each object's information by the object's id. Two objects O i and O j are candidate contacts at reporting time t k if the distance between them at that time is no greater than the candidate contact distance d cc = 2d max + d cont (where d max is the largest distance that can be covered by any object during ∆t). Candidate contacts can potentially have a contact between t k and t k+1 . To force all candidate contacts of a given object O i to be in the same cell as O i or in a neighboring cell, at each t k we partition the area covered by the dataset into cells with side d cc . Now, to find all candidate contacts of object O i , we only need to compute the (Euclidean) distance between O i and the objects in the same and neighboring cells.
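A minimal sketch of this grid filter (with names of our own choosing, not the paper's code): objects are hashed into cells of side d_cc, and only pairs in the same or adjacent cells are distance-checked.

```python
import math
from collections import defaultdict

def candidate_contacts(positions, d_max, d_cont):
    """Pairs whose distance at t_k is at most d_cc = 2*d_max + d_cont.
    The grid restricts the pairwise checks to the same cell and the
    eight neighboring cells."""
    d_cc = 2 * d_max + d_cont
    grid = defaultdict(list)
    for oid, (x, y) in positions.items():
        grid[(math.floor(x / d_cc), math.floor(y / d_cc))].append(oid)
    pairs = set()
    for (cx, cy), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for a in members:
                    for b in grid.get((cx + dx, cy + dy), ()):
                        if a < b:     # each unordered pair once
                            (xa, ya), (xb, yb) = positions[a], positions[b]
                            if math.hypot(xa - xb, ya - yb) <= d_cc:
                                pairs.add((a, b))
    return pairs
```

The grid reduces the candidate generation from quadratic in the number of objects to roughly linear when objects are spread over many cells.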
Using our assumption that between consecutive reporting times objects move linearly, at t k+1 we can verify whether there were indeed any contacts between each pair of candidate contacts during the time interval [t k , t k+1 ). If a contact occurred, it is saved in the ContactsRec list of the objectRecord of each contacted object. If an object O i had O j as its contact at two or more consecutive time instants, these contacts are merged into a meeting, and written in the MeetingsRec list of objectRecord(O i ). At the end of each time block, a meeting duration m is computed for each meeting. All meetings with m < µ (with the exception of boundary meetings) are pruned, while all the remaining meetings are recorded into the file Meetings. Boundary meetings (meetings that either start at the beginning or finish at the end of B k ) are recorded regardless of their duration, since they may span more than one block, which needs to be verified during the query processing. To speed up the query time, during the preprocessing, for each object O i we precompute all objects that are (µ, d)-reachable from O i during B k . Here we face a challenge: to find which objects can be (µ, d)-reached by O i , we need to know the transfer decay d and the weight threshold ν, which are assumed to be unknown at preprocessing time. To overcome the issue of unknown d and ν, we turn our problem of reachability with decay into a hop-reachability problem. Recall that one of the requirements for object O T to be reachable from object O S is that each meeting in the chain of meetings from O S to O T has to be a successful meeting. It follows from (2) that after each meeting, for each companion object O i , the following condition must hold: w p^h(O i ) ≥ ν. Thus, the allowed number of transfers (or hops) h for a successful meeting should satisfy the following inequality:

h ≤ log p (ν/w). (6)

To compute hop-reachability within a time block, we use a sweep-line approach: the sweep line moves along the plane, stopping at some points where information needs to be analyzed.
In our case, the x-dimension is the time dimension, and the y-dimension is the order in which the meetings are discovered. We demonstrate how the algorithm works on the example in Fig. 3, and later provide a pseudo-code and detailed explanation. Consider the data in table (a1). It contains records of the actual meetings between all objects during one time block. (a2)-(a6) describe how reached objects and meetings are being discovered. The information about the 'reachability' status of each object is recorded into a temporary table, which is created at the beginning of each block. A row is added to the table for each reached object at the time when it is reached, and it is updated with any new event. The development of the reachability table is shown in (b1)-(b6). We show how to compute all objects that are reached by object O 1 during the given time block, assuming that µ = 2∆τ . At the beginning of the block, the sweep line is positioned at τ = 0, and only object O 1 is reached (with h min = 0), which is recorded in table (b1). During the given time block, O 1 has only one meeting, < O 1 , O 3 , [0, 3] >, which is placed on the plane (a2). As a result of this meeting, object O 3 becomes reached (with h min = 1). The process for computing all objects that are (h min )-reachable by O S during one time block is generalized in Algorithm 1. Procedure UpdateHmin initializes and then updates the table that records the reachability status of each reached object. The S ReachHop set keeps all objects for which all h min values, as well as the earliest reached time, have been computed and finalized. Those objects that were found to be reached, but are not in S ReachHop yet, are placed in the priority queue S PQ , where priority is given to the objects according to their 'reached' times. When an object (say object O i ) that has the earliest reached time (τ R (O i )) is extracted from S PQ , it is placed into S ReachHop (lines 10, 11).
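The idea behind Algorithm 1 resembles Dijkstra's algorithm, with reached times in place of path lengths. A simplified sketch under our own assumptions (meetings given as ((a, b), tau_s, tau_f) in units of ∆τ; ties broken by hop count; block-boundary handling omitted):

```python
import heapq

def reach_in_block(meetings, source, m_q):
    """Earliest reach time and minimum hop count (h_min) for every object
    reachable from `source` through chains of valid meetings.  A transfer
    over meeting [tau_s, tau_f] completes at max(t, tau_s) + m_q, provided
    that still fits inside the meeting."""
    adj = {}
    for (a, b), tau_s, tau_f in meetings:
        adj.setdefault(a, []).append((b, tau_s, tau_f))
        adj.setdefault(b, []).append((a, tau_s, tau_f))
    reached = {source: (0, 0)}        # object -> (reach time, h_min)
    pq = [(0, 0, source)]
    while pq:
        t, h, obj = heapq.heappop(pq)
        if (t, h) > reached[obj]:     # stale queue entry, already finalized
            continue
        for other, tau_s, tau_f in adj.get(obj, []):
            start = max(t, tau_s)     # transfer can only begin here
            if start + m_q <= tau_f:  # enough of the meeting remains
                cand = (start + m_q, h + 1)
                if other not in reached or cand < reached[other]:
                    reached[other] = cand
                    heapq.heappush(pq, (cand[0], cand[1], other))
    return reached
```

Popping an object from the queue plays the role of placing it into S ReachHop: its reach time and hop count are final from that point on.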
At this time, all meetings of objects that can be reached by O i (but are not in S ReachHop ) are analyzed (lines 13-23). As a result, both τ R (O j ) (and their priority in S PQ ) and their h min values can be changed (lines 19 and 21). This algorithm has to be performed for each object of the dataset that is active during the given time block.

V. REACHABILITY WITH DECAY: QUERY PROCESSING

The set of reached objects S ′ reached is initialized with object O S at the beginning of the query processing. We start reading the file Reached(Hop). The process continues until O T is added to S ′ reached while reading some block B i (i < f ), or the last block B f is reached. If at the end of processing B f , S ′ reached does not contain the target O T , the query processing can be aborted; otherwise it moves to the file Meetings. Now the process of identifying reached objects inside each block is the same as the one described in Algorithm 1. If there is a meeting between objects O i and O j that ends at the end of the time block but is shorter than m q , we check if it continues in the next block, and merge the two meetings into one if needed. If, by the end of B i , O T was not found to be reached, and i < f , the search switches to Reached(Hop). This process continues until O T is confirmed to be reached by the information from Meetings, or the last block B f is processed.

VI. TOP-K REACHABILITY: QUERY PROCESSING

To process top-k reachability queries efficiently, we will use the preprocessed data and index structure from RICCdecay, described in the previous section. For that reason, we named our top-k reachability query processing algorithm RICCtopK. The top-k query Q topK is issued in the form {S, W, D, [τ s , τ f ], µ, ν, k}, where S = {O S1 , O S2 , ..., O Sq }, W = {w 1 , w 2 , ..., w q }, and D = {d 1 , d 2 , ..., d q } are the sets of source objects, weights, and decays respectively.
To make use of the precomputed data from RICCdecay, the top-k reachability with decay problem has to be translated into a top-k hop-reachability problem. Hence, for each source object O Sr ∈ S, we compute h max(r) by applying inequality (6) to each triple {O Sr , w r , d r } as follows:

h max(r) = ⌊log p r (ν/w r )⌋, where p r = 1 − d r and r = 1, 2, ..., q.

Now each top-k query can be thought of as written in the form {S, [τ s , τ f ], µ, k, h max(1) , h max(2) , ..., h max(q) }. Note that the top-k query processing is an extension of the reachability with decay query processing algorithm, and thus we will avoid repeating some details concerning the use of the index structure during the query processing that were described earlier. First, the set of Top-k Candidates is initialized by adding to it all source objects. We start reading the file Reached(Hop) from time block B s , checking all records for each source object from set S (in order of their appearance in the file). Once an object that was reached by at least one source is discovered, it is added to Top-k Candidates. For each top-k candidate O i , we keep the information about the source object(s) that it was reached by, and the h min(r) required to transfer information from each source to O i . The search continues in this manner until time block B f is processed, after which the weight of each object from Top-k Candidates is computed. Note that this is not the actual weight F w of an object, but the maximum weight F max that this object may receive. Next, the query processing moves to the file Meetings. Here, the algorithm maintains two structures, Top-k Candidates and Top-k, that have to be updated at the end of each block. Top-k Candidates contains: (i) the ids of all reached objects, (ii) their corresponding maximum weights F max , as well as (iii) the current weight F w of each candidate top-k object. At the beginning, the weight F w of each source object O Sr is set to its initial weight w r , while the weights F w of the rest of the objects are set to 0.
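The translation from each triple (w_r, d_r) and threshold ν to a hop budget is a one-liner per source. A sketch of inequality (6) (function names are ours):

```python
import math

def hop_budget(w, d, nu):
    """h_max(r) = floor(log_p(nu / w)) with p = 1 - d, per inequality (6).
    Assumes d > 0 and nu <= w."""
    return math.floor(math.log(nu / w, 1.0 - d))

def hop_budgets(weights, decays, nu):
    """One h_max per source triple {O_Sr, w_r, d_r}."""
    return [hop_budget(w, d, nu) for w, d in zip(weights, decays)]
```

For example, with w = 3, d = 0.2, and ν = 0.6 (our illustrative values), the item may travel 7 hops before its weight drops below the threshold.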
Top-k is initialized by adding to it the k source objects from set S with the top k weights; the weight F w of each top-k object is recorded as well. Let us denote the lowest weight F w among the objects in Top-k as F w min . If Top-k contains k objects, and the object with the smallest value carries weight F w min , then any object O i such that F max (O i ) < F w min cannot be among the top-k. In the file Meetings, the query processing starts from time block B s . After one time block is processed, the aggregate weight F w of the objects from Top-k Candidates that were involved in some transfers may increase, and has to be updated. This may lead to changes in Top-k. After Top-k and F w min are updated, all objects O i from Top-k Candidates such that F max (O i ) < F w min can be removed from the set of candidates. When the work on B s is completed, we proceed to the next block. This process continues until either the last time block B f of the query is reached or the size of Top-k Candidates is reduced to the size of Top-k. The final state of Top-k answers the query. For example, consider the top-k query with three source objects O 1 , O 2 , and O 7 , whose corresponding weights are 3, 4, and 3. Suppose the query interval [τ s , τ f ] is contained in time blocks B 1 -B 5 . Fig. 5 illustrates the example. The query answering begins in Reached(Hop). The relevant data is read from blocks B 1 -B 5 , and by the end of B 5 , the superset of all objects that can be reached by the source objects is identified. These objects are the Top-k Candidates. They are recorded in the Top-k Candidates table, together with their maximum possible aggregate weight F max (b1). Since at this stage the aggregate weight F w is known only for the source objects, the objects O 1 , O 2 , and O 7 are placed in Top-k (c1). The query processing moves to B 1 in the file Meetings (a2). At the end of B 1 , the aggregate weights F w of some objects are updated.
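The pruning rule F max (O i ) < F w min can be sketched as follows. This is illustrative only; `candidates` is a hypothetical structure mapping each object id to its (F_max, F_w) pair:

```python
def prune_candidates(candidates, top_k_weights, k):
    """Remove every candidate whose maximum possible aggregate weight
    F_max cannot exceed the smallest weight currently in Top-k."""
    if len(top_k_weights) < k:
        return dict(candidates)       # Top-k not yet full: prune nothing
    f_w_min = min(top_k_weights)
    return {oid: vals for oid, vals in candidates.items()
            if vals[0] >= f_w_min}
```

Pruning after each block shrinks Top-k Candidates, and the search can stop early once only the Top-k objects themselves remain.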
VII. EXPERIMENTAL EVALUATION

We proceed with the results of the experimental evaluation of RICCdecay and RICCtopK. Since there are no other algorithms for processing spatiotemporal reachability queries with decay, we compare against a modified version of RICCmeetMin [3] that enables it to answer such queries. All experiments were performed on a system running Linux with a 3.4 GHz Intel CPU, 16 GB RAM, a 3 TB disk, and a 4K page size. All programs were written in C++ and compiled using gcc version 4.8.5 with optimization level 3.

All experiments were performed on six realistic datasets of two types: Moving Vehicles (MV) and Random Walk (RW). The MV datasets were created by the Brinkhoff data generator [37], which generates traces of objects moving on real road networks. For the underlying network we used the San Francisco Bay road network, which covers an area of about 30,000 km². These sets contain information about 1000, 2000, and 4000 vehicles, respectively (denoted as MV_1, MV_2, and MV_4). The location of each vehicle is recorded every ∆t = 5 seconds during 4 months, which results in 2,040,000 records for each object. The size of each dataset (in GB) appears in Table II. For the experiments on these sets, d_cont = 100 meters (the range of a class 1 Bluetooth connection).

For the RW datasets, we created our own generator, which utilizes the modified random waypoint model [38], often used for modeling the movement of mobile users. In our model, 90% of the individuals are moving, while the remaining 10% are stationary. At the beginning of the first trip, each user chooses whether to move or not (in the ratio of 9:1). Each moving user chooses the direction, speed (between 1.5 m/s and 4 m/s), and duration of the next trip, and then completes it. At the end of the trip, each person determines the parameters of the next trip, and so on. The RW datasets consist of the trajectories of 10000, 20000, and 40000 individuals, respectively (denoted as RW_1, RW_2, and RW_4).
Each set covers an area of 100 km². The location of each user is recorded every ∆t = 6 sec for a period of one month (432,000 records for each person). We set d_cont = 10 meters (to identify physical contacts or contacts in the range of Bluetooth-enabled devices).

The performance was evaluated in terms of disk I/Os during query processing. The ratio of a sequential I/O to a random I/O is system dependent; for our experiments this ratio is 20:1 (20 sequential I/Os take the same time as 1 random). We thus present the equivalent number of random I/Os using this ratio.

The values of the contraction parameter C and the grid resolution H used for the preprocessing depend on the dataset. For each dataset, the parameters C and H were tuned on a 5% subset as follows. We performed the preprocessing of each subset for different values of (C, H), and tested the performance of RICCdecay on a set of 200 queries. The length of each query was picked uniformly at random between 500 and 3500 sec for the MV, and between 600 and 4200 sec for the RW datasets. The h_max value was picked uniformly at random from 1 to 4 (we stopped at h_max = 4 since the higher the h_max, the less information is carried by the reached object, and thus the less interesting it is). The parameters C and H were varied as follows: the grid resolution H from 500 to 40000 meters for the MV datasets, and from 250 to 2000 m for the RW datasets; the contraction parameter C from 0.5 to 30 min. For each dataset, the pair (C, H) that minimized the number of I/Os was used for the rest of the experiments. For example, for MV_1 we used C = 14 min and H = 20000 m, while for RW_4 we used C = 2 min and H = 500 m.

The sizes of the auxiliary files and the index sizes for RICCmeetMin and RICCdecay appear in Table II. RICCdecay uses about 13.5% more space compared to RICCmeetMin, since it records more information into the file Reached(Hop). (For each reached object, in addition to its id, it saves its hop value.)
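A minimal sketch of one user's trajectory under the modified random-waypoint model described above, using the stated parameters (speed in [1.5, 4] m/s, ∆t = 6 s, a 10 km × 10 km area); the trip-duration bounds and the clamping at the area border are our assumptions, since the text does not specify them:

```python
import math
import random

def random_walk_trace(n_steps, side=10_000.0, dt=6.0,
                      v_min=1.5, v_max=4.0, moving=True, seed=0):
    """One user's (x, y) positions sampled every dt seconds on a
    side x side meter square. A stationary user (10% of the
    population) never moves; a mover repeatedly picks a direction,
    a speed in [v_min, v_max], and a trip duration, completes the
    trip, then picks the next trip's parameters.
    Trip-duration bounds (60-600 s) are illustrative assumptions."""
    rng = random.Random(seed)
    x, y = rng.uniform(0, side), rng.uniform(0, side)
    trace = [(x, y)]
    trip_left, vx, vy = 0.0, 0.0, 0.0
    for _ in range(n_steps - 1):
        if moving:
            if trip_left <= 0:                       # start a new trip
                theta = rng.uniform(0, 2 * math.pi)
                speed = rng.uniform(v_min, v_max)
                trip_left = rng.uniform(60, 600)
                vx, vy = speed * math.cos(theta), speed * math.sin(theta)
            x = min(max(x + vx * dt, 0.0), side)     # stay inside the area
            y = min(max(y + vy * dt, 0.0), side)
            trip_left -= dt
        trace.append((x, y))
    return trace
```

Generating 432,000 steps per user, as in the RW datasets, is then a matter of calling this function once per individual with `moving=True` for 90% of them.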
The time needed to preprocess one hour of data for RICCdecay ranges from 14 sec for MV_1 to 91 min for RW_4. For comparison, the preprocessing time for RICCmeetMin ranges from 13 sec for MV_1 to 56 min for RW_4. The performance of RICCdecay was tested on sets of 100 queries of different time intervals and various h_max = 1, 2, 3, 4, while µ was set to 2 sec, and the initial weight w of the item carried by O_S was set to 1 for all the experiments.

Increasing h_max. (Recall that RICCmeetMin was modified to answer reachability queries with decay.) We ran a set of 100 queries varying h_max from 1 to 4; each query's interval was picked uniformly at random from 500 to 3500 sec for the MV datasets, and from 600 to 4200 sec for the RW datasets. The results are presented in Fig. 6 (a1-b3). RICCdecay accesses from 1.8 (for the MV_2 dataset) to 11.5 (for the RW_4 dataset) times fewer pages than RICCmeetMin. The biggest advantage of RICCdecay is achieved for h_max = 1 for all datasets, and in general, the smaller the h_max, the better the performance of the RICCdecay algorithm. When answering a query Q_mh, it reads file Reached(Hop) first. File Meetings needs to be read only if, while traversing file Reached(Hop), the target object appears among the objects reached by the source (i.e., if O_T ∈ S′_Reached). However, S′_Reached is a superset of the set of objects that can be reached by O_S during the query interval I. We say that a query is pruned if it aborts after reading file Reached(Hop) because the target was not found among the reached objects. By precomputing the hop value of each reached object, Reached(Hop) gives more accurate information than RICCmeet, which reduces the size of S′_Reached. The smaller the h_max, the fewer objects are in S′_Reached, and thus the higher the percentage of queries that can be pruned.

Increasing Query Length. Now we test the performance of RICCdecay for various query lengths and compare it with that of RICCmeetMin.
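The pruning test described above can be illustrated as a simple membership check on the precomputed hop values; this is a simplification of the actual block-by-block scan of Reached(Hop), and the dictionary layout is ours:

```python
def try_prune(reached_hop, target, h_max):
    """reached_hop: object id -> minimum hop count recorded in file
    Reached(Hop) for the query's source and interval.
    Returns True if the query can abort ("is pruned") without reading
    file Meetings: the target is either never reached, or only
    reachable in more than h_max hops."""
    h = reached_hop.get(target)
    return h is None or h > h_max
```

The smaller the h_max, the more often this check succeeds, which is why the advantage of RICCdecay peaks at h_max = 1.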
Each test was run on a set of 100 queries, varying the query length from 500 to 3500 sec for the MV, and from 600 to 4200 sec for the RW datasets. The h_max value for each query was picked uniformly at random from 1 to 4. The results are shown in Fig. 7. For these sets of queries, RICCdecay outperforms RICCmeetMin in all the tests, accessing about 44% fewer pages on average, and this result does not change significantly from one dataset to another.

Top-k Reachability Queries. All the queries considered in this section until now were one-to-one queries: they had one source and one target object. The top-k queries that we described in Section III may have multiple source objects and multiple, undefined target objects. Multiple sources increase the search space, while the undefined targets prevent early query termination. In addition, the need to calculate and compare the aggregate weights of the reached objects makes it impossible to prune a query (abort it after searching only the file Reached(Hop)).

For each of our top-k experiments, we used sets of 100 queries, where the query length was 3500 sec for the MV datasets and 4200 sec for the RW datasets. The number of source objects was set to 4: S = {O_S1, O_S2, O_S3, O_S4}, and each weight was assigned a value of 1. Further, D = {0.10, 0.15, 0.20, 0.25}, ν = 0.6, and k was randomly picked from 4 to 20. The area covered by each dataset is very large, so to force objects to be reached by several sources, for each query we picked the source objects from the same cell (with side length d_cc) at the beginning of the query interval. The results (see Fig. 8) indicate that for top-k queries RICCmeetMin accesses on average about 37% more pages than RICCtopK for the MV datasets, and about 30% more pages for the RW datasets. The advantage of RICCtopK owes to both the RICCdecay index and the RICCtopK query processing.
Information from the preprocessing allows computing the maximum possible aggregate score F_max using information from file Reached(Hop), while RICCtopK reduces the number of objects that have to be accessed when the query reads the file Meetings.

VIII. CONCLUSIONS

We presented two novel reachability problems: reachability with transfer decay and top-k reachability with transfer decay. To process these queries efficiently, we designed two new algorithms: RICCdecay and RICCtopK. The RICCmeetMin algorithm [3] was modified to answer the same types of queries, and served as a benchmark. We tested our algorithms on six realistic datasets, varying the query duration and the maximum allowed number of hops. The performance comparison showed that RICCdecay and RICCtopK can answer the new types of queries more efficiently than RICCmeetMin.

REFERENCES

[1] Efficient reachability query evaluation in large spatiotemporal contact datasets
[2] RICC: fast reachability query processing on large spatiotemporal datasets
[3] Efficient processing of reachability queries with meetings
[4] Scarab: scaling reachability computation on large graphs
[5] Efficient management of transitive relationships in large data and knowledge bases
[6] Dual labeling: answering graph reachability queries in constant time
[7] Reachability and distance queries via 2-hop labels
[8] 3-hop: a high-compression indexing scheme for reachability query
[9] Path-hop: efficiently indexing large graphs for reachability queries
[10] GRAIL: scalable reachability index for large graphs
[11] PReaCH: a fast lightweight reachability index using pruning and contraction hierarchies
[12] Contraction hierarchies: faster and simpler hierarchical routing in road networks
[13] Efficient snapshot retrieval over historical graph data
[14] Characterising temporal distance and reachability in mobile and online social networks
[15] Reachability queries on large dynamic graphs: a total order approach
[16] Spatio-temporal access methods: Part 2 (2003-2010)
[17] On indexing mobile objects
[18] Novel approaches in query processing for moving object trajectories
[19] Efficient indexing of spatiotemporal objects
[20] The B^dual-tree: indexing moving objects by space filling curves in dual space
[21] ST2B-tree: a self-tunable spatio-temporal B+-tree index for moving objects
[22] STRIPES: an efficient index for predicted trajectories
[23] LUGrid: update-tolerant grid-based indexing for moving objects
[24] Indexing spatiotemporal trajectories with efficient polynomial approximation
[25] Efficient trajectory joins using symbolic representations
[26] Continuous clustering of moving objects
[27] On discovering moving clusters in spatio-temporal data
[28] On-line discovery of flock patterns in spatio-temporal data
[29] Discovery of convoys in trajectory databases
[30] Joint top-k spatial keyword query processing
[31] Efficient continuously moving top-k spatial keyword query processing
[32] Spatial keyword query processing: an experimental evaluation
[33] Efficient processing of top-k spatial preference queries
[34] Top-k spatial preference queries in directed road networks
[35] Efficient computation of top-k frequent terms over spatio-temporal ranges
[36] Scalable top-k spatiotemporal term querying
[37] Generating traffic data
[38] Dynamic source routing in ad hoc wireless networks