key: cord-0230243-cik7eqau
authors: Arenas, Marcelo; Bahamondes, Pedro; Aghasadeghi, Amir; Stoyanovich, Julia
title: Temporal Regular Path Queries
date: 2021-07-02
journal: nan
DOI: nan
sha: 0b04a1a6b33afc0f7ed2b2dfc186055554ea9f5b
doc_id: 230243
cord_uid: cik7eqau

In the last decade, substantial progress has been made towards standardizing the syntax of graph query languages, and towards understanding their semantics and complexity of evaluation. In this paper, we consider temporal property graphs (TPGs) and propose temporal regular path queries (TRPQs) that incorporate time into TPG navigation. Starting with design principles, we propose a natural syntactic extension of the MATCH clause of popular graph query languages. We then formally present the semantics of TRPQs, and study the complexity of their evaluation. We show that TRPQs can be evaluated in polynomial time if TPGs are time-stamped with time points, and identify fragments of the TRPQ language that admit efficient evaluation over a more succinct interval-annotated representation. Finally, we implement a fragment of the language in a state-of-the-art dataflow framework, and experimentally demonstrate that TRPQ can be evaluated efficiently.

The importance of networks in scientific and commercial domains is undeniable. Networks are represented by graphs, and we will use the terms network and graph interchangeably. Considerable research and engineering effort is devoted to the development of effective and efficient graph representations and query languages. Property graphs have emerged as the de facto standard, and have been studied extensively, with efforts underway to unify the semantics of query languages for these graphs [1] , [2] . Many interesting questions about graphs are related to their evolution rather than to their static state [3] - [11] . Consequently, several recent proposals seek to extend query languages for property graphs with time [12] - [16] .

Our focus in this paper is on incorporating time into path queries. More precisely, we (a) outline the design principles for a temporal extension of Regular Path Queries (RPQs) with time; (b) propose a natural syntactic extension of state of the art query languages for conventional (non-temporal) property graphs, which supports temporal RPQs (TRPQs); (c) formally present the semantics of this language; (d) study the complexity of evaluation of several variants of this language; (e) implement a practical fragment of this language in a dataflow framework; and (f) empirically demonstrate that TRPQs can be evaluated efficiently. We show that, by adhering to the design principles that draw on decades of work on graph databases and on temporal relational databases, we are able to achieve polynomial-time complexity of evaluation, paving the way to implementations that are both usable and practical, as supported by our implementation and experiments.

As a preview of our proposed methods, consider Figure 1 that depicts a contact tracing network for a communicable disease with airborne transmission between people in enclosed locations on a university campus. In this network, different actors and their interactions are presented as a temporal property graph or TPG for short. (We will define temporal property graphs formally in Section III).

As in conventional property graphs [1] , nodes and edges in a TPG are labeled. The graph in Figure 1 contains two types of nodes, Person and Room (representing a classroom), and three types of edges: bi-directional edges meets and cohabits (lives together), and directed edge visits. Nodes and edges have optional properties that are associated with values. For example, node n 1 of type Person has properties name with value 'Ann' and risk with value 'low'. As another example, edge e 2 of type meets has property loc with value 'park'.

The purpose of the graph in Figure 1 is to allow identification of individuals who may have been exposed to the disease. In particular, we are interested in identifying potentially infected individuals who are considered high risk, due to age or pre-existing conditions. These types of questions can be naturally phrased as temporal regular path queries (TRPQs) that interrogate reachability over time. We will give an example of a TRPQ momentarily.

To support TRPQs, all nodes and edges in a TPG are associated with time intervals of validity (or intervals for short) that represent consecutive time points during which no change occurred for a node or an edge, in terms of its existence or property values. For example, node n 1 (Ann) is associated with the interval [1, 9] , signifying that n 1 was present in the graph and took on the specified property values during 9 consecutive time points. As another example, node n 2 (Bob) exists during the same interval as n 1 , but undergoes a change in the value of the property risk at time 4, when it changes from 'low' to 'high'. We represent a change in the state of an entity (a node or an edge) with nested boxes inside an outer box that denotes the entity in Figure 1 . Now, consider an example of a TRPQ that extends the syntax of Cypher to retrieve the list of high-risk people (x) who met someone (y), who subsequently tested positive for an infectious disease: name = Ann risk = low time = [1, 9] n1 : Person name = Bob risk = low time = [1, 4] name = Bob risk = high time = [5, 9] n2 : Person loc = cafe time = [3, 3] loc = park time = [5, 6] e1 : meets name = Mia risk = high time = [1, 7] n3 : Person num = 750 bldg = CS time = [3, 8] n4 : Room num = 1101 bldg = MATH time = [3, 7] n5 : Room time = [3, 7] e5 : cohabits name = Eve risk = low time = [2, 8] name = Eve risk = low test = pos time = [9, 9] name = Eve risk = low time = [10, 11] n6 : Person name = Zoe risk = high time = [1, 8] n7 : Person loc = park time = [1, 2] e2 : meets time = [6, 7] e3 : visits time = [5, 6] e6 : visits time = [5, 6] e7 : visits time = [7, 8] e8 : visits time = [6, 8] e9 : visits loc = cafe time = [5, 6] e10 : meets loc = park time = [4, 4] e11 : meets This contact tracing query produces the following temporal binding table when evaluated over the TPG in Figure 1 :

x x_time y y_time n 7 5 n 6 9 n 7 6 n 6 9 n 3 4 n 6 9 B. Summary of our approach

In the remainder of this paper, we formally develop the concepts that are necessary to evaluate this and other useful TRPQs over TPGs. We adopt a conceptual TPG model that naturally extends property graphs with time, and is both simple and sufficiently flexible to support the evolution of graph topology and of the properties of its nodes and edges. We evaluate TRPQs on TPGs under point-based semantics [17] , in which operators adhere to two principles: snapshot reducibility and extended snapshot reducibility, discussed in Section II. Our conceptual TPG model admits two logical representations that differ in the kind of time-stamping they use [18] . One associates objects with time points, while the other associates them with time intervals, for a more compact representation.

Design principles. We carefully designed our TRPQ language based on the following principles:

Navigability: Include operators that refer to the dynamics of navigating through the TPG: temporal navigation refers to movements on the graph over time, and structural navigation refers to movements across locations in its topology.

Navigation orthogonality: Temporal and structural navigation operators must be orthogonal, allowing non-simultaneous single-step temporal and structural movement.

Node-edge symmetry: The language should treat nodes and edges as first-class citizens, supporting equivalent operations.

Static testability: Testing is independent of navigation.

Snapshot reducibility: When time is removed from a query, pairs of temporal objects satisfying the query should correspond to pairs of objects in a single snapshot of the graph, and every pair satisfying the query in the snapshot of the TPG should correspond to a path satisfying it in the TPG.

By adhering to these principles, we achieved polynomialtime complexity of evaluation for TPGs that are time-stamped with time points, and also identified a significant fragment of the language that can be efficiently evaluated for interval time-stamped TPGs. In addition to theoretical results, these principles also allowed us to efficiently implement TRPQs by decoupling non-temporal and temporal processing.

Paper organization: We first give some background on temporal graph models and path query languages in Section II. We then formally define temporal graphs in Section III. We go on to propose a syntax for adding time to a practical graph query language in Section IV. Next, in Section V, we give the precise syntax and semantics of the language, and study the complexity of evaluating it. We describe an implementation of our language over an interval-based TPG in Section VI, and present results of an experimental evaluation in Section VII. We conclude in Section VIII. Additional complexity results and proofs, and supplementary experiments are available in the Appendix. System implementation and experimental evaluation are available at https://github.com/amirpouya/tpath.

Substantial research has been undertaken in the area of temporal relational databases since the 1980s, producing a significant body of work [19] , which includes representation of time [20] - [22] , semantics of temporal models [23] , temporal algebras [24] , and access methods [25] . Results of some of this work are part of the SQL:2011 standard [26] . a) Temporal graph models: Temporal graph models differ in what temporal semantics they encode, what time representation they use (time point, interval, or implicitly with a sequence), what entities they time-stamp (graphs, nodes, edges, or attribute-value assignments), and whether they represent evolution of topology only or also of the attributes. With a few exceptions, discussed next, the current de facto standard representation of temporal graphs is the snapshot sequence, where a state of a graph is associated with either a time point or an interval during which the graph was in that state [27] - [37] . This representation supports operations within each snapshot under the principle of snapshot reducibility, namely, that applying a temporal operator to a database is equivalent to applying the non-temporal variant of the operator to each database state [17] . For example, the G* system [15] stores a temporal graph as a snapshot sequence and provides two query languages, the procedural PGQL and the declarative DGQL. PGQL includes operators such as retrieving graph vertices and their edges at a given time point, along with nongraph operators like aggregation, union, projection, and join. Neither PGQL nor DGQL support temporal path queries.

The fundamental disadvantage of using the snapshot sequence as the conceptual representation of a temporal graph is that it does not support operations that explicitly reference temporal information. Semantics of operations that make explicit references to time are formalized as the principle of extended snapshot reducibility, where timestamps are made available to operators by propagating time as data [17] . Considering that our goal in this work is to support temporal regular path queries, having access to temporal information during navigation is crucial.

In response to this important limitation of the snapshot sequence representation, proposals have been made to annotate graph nodes, edges, or attributes with time. Moffitt and Stoyanovich [16] proposed to model property graph evolution by associating intervals of validity with nodes, edges, and property values. They also developed a compositional temporal graph algebra that provides a temporal generalization of common graph operations including subgraph, node creation, union, and join, but does not include reachability or path constructs. In our work, we adopt a similar representation of temporal graphs, but focus on temporal regular path queries. b) Paths in temporal graphs: Specific kinds of path queries over temporal graphs have been considered in the literature. Wu et al. [38] - [40] studied path query variants over temporal graphs, in which nodes are time-invariant and edges are associated with a starting time and an ending time. (Nodes and edges do not have type labels or attributes.) The authors introduced four types of "minimum temporal path" queries, including the earliest-arriving path and the fastest path, which can be seen as generalizations of the shortest path query for temporal graphs. They proposed algorithms and indexing methods to process minimum temporal path and temporal reachability queries efficiently.

Byun et al. [12] introduced ChronoGraph, a temporal graph traversal system in which edges are traversed time-forward. The authors show three use cases: temporal breadth-first search, temporal depth-first search, and temporal single-source shortest-path, instantiated over Apache Tinkerpop. Johnson et al. [14] introduced Nepal, a query language that has SQLlike syntax and supports regular path queries over temporal multi-layer communication networks, represented by temporal graphs that associate a sequence of intervals of validity with each node and edge. The key novelty of this work are timetravel path queries to retrieve past network states. Finally, Debrouvier et al. [13] introduced T-GQL, a query language for TPGs with Cypher-like syntax [41] . T-GQL operates over graphs in which (a) nodes persists but their attributes (with values) can change over time, and so are associated with periods of validity; and (b) edges are associated with periods of validity but their attributes are time-invariant. This asymmetry in the handling of nodes and edges is due to the authors' commitment to a specific (lower-level) representation of such TPGs in a conventional property graph system. Specifically, they assume that Objects (representing nodes), Attributes, and Values are stored as conventional property graph nodes, whereas time intervals are stored as properties of these nodes. Temporal edges are, in turn, stored as conventional edges, with time interval as one of their properties. T-GQL supports three types of path queries over such graphs, syntactically specified with the help of named functions: (1) "Continuous path" queries retrieve paths valid during each time pointsnapshot semantics. (2) "Pairwise continuous paths" require that the incoming and the outgoing edge for a node being traversed must exist during some overlapping time period. (3) "Consecutive paths" encode temporal journeys; for example, to indicate a way to fly from Tokyo to Buenos Aires with a couple of stopovers in a temporal graph for flight scheduling. Consecutive paths are used in T-GQL for encoding earliest arrival, latest departure, fastest, and shortest path queries.

A more detailed comparison of our proposal with other temporal query languages is given in Section V-C. In summary, our proposal differs from prior work in that we develop a general-purpose query language for temporal paths, which works over a simple conceptual definition of temporal property graphs and is nonetheless general enough to represent different kinds of temporal and structural evolution of such graphs. Our language is syntactically simple: it directly, and minimally, extends the MATCH clause of popular graph query languages, and does not rely on custom functions. In fact, as we show in Section V-B, there is a simple way to define its formal semantics, which allows us to develop efficient algorithms for query evaluation.

In this section, we formalize the notion of temporal property graph, which extends the widely used notion of property graph [1] , [2] , [41] to include explicit access to time. In this way, we can model the evolution of the topology of such a graph, as well as the changes in node and edge properties.

A temporal property graph defines a point-based representation of the evolution of a property graph, which is a simple and suitable framework to represent and reason about this evolution. However, time-stamping objects with time points may be impractical in terms of space overhead. This motivates the development of interval-based representations, which are common for temporal models for both relations (e.g., [18] , [42] ) and graphs (e.g., [12] , [13] , [16] ). In this section, we also define a succinct representation of temporal property graphs that uses interval time-stamping. Notice that point-based temporal semantics requires this succinct representation to be temporally coalesced: a pair of value-equivalent temporally adjacent intervals should be stored as a single interval, and this property should be maintained through operations [43] .

Assume Lab, Prop and Val to be sets of label names, property names and actual values, respectively. We define temporal property graphs over finite sets of time points. Time points can take on values that correspond to the units of time as appropriate for the application domain, and may represent seconds, weeks, or years. For the sake of presentation, we represent the universe of time points by N: a temporal domain Ω is a finite set of consecutive natural numbers, that is,

• Ω is a temporal domain; N is a finite set of nodes, E is a finite set of edges, and V ∩ E = ∅; • ρ : E → (N × N ) is a function that maps an edge to its source and destination nodes; • λ : (N ∪ E) → Lab is a function that maps a node or an edge to its label; • ξ : (N ∪ E) × Ω → {true, false} is a function that maps a node or an edge, and a time point to a Boolean. Moreover, if ξ(e, t) = true and ρ(e) = (v 1 , v 2 ), then ξ(v 1 , t) = true and ξ(v 2 , t) = true. • σ : (N ∪ E) × Prop × Ω → Val is a partial function that maps a node or an edge, a property name, and a time point to a value. Moreover, there exists a finite number of triples

Observe that Ω in Definition III.1 denotes the temporal domain of G, a finite set of linearly ordered time points starting from the time associated with the earliest snapshot of G, and ending with the time associated with its latest snapshot, where a snapshot of G refers to a conventional (non-temporal) property graph that represents the state of G at a given time point. Function ρ in Definition III.1 is used to provide the starting and ending nodes of an edge, function λ provides the label of a node or an edge, and function ξ indicates whether a node or an edge exists at a given time point in Ω (which corresponds to true). Finally, function σ indicates the value of a property for a node or an edge at a given time point in Ω.

Two conditions are imposed on TPGs to enforce that they conceptually correspond to sequences of valid conventional property graphs. In particular, an edge can only exist at a time when both of the nodes it connects exist, and that a property can only take on a value at a time when the corresponding object exists. Moreover, observe that by imposing that σ(o, p, t) be defined for a finite number of triples (o, p, t), we are ensuring that each node or edge can have values for a finite number of properties, so that each TPG has a finite representation. Finally, Definition III.1 assumes, for simplicity, that property values are drawn from the infinite set Val. That is, we do not distinguish between different data types. If a distinction is necessary, then Val can be replaced by a domain of values of some k different data types, Val 1 , . . . , Val k .

Recall our running example discussed in Section I-A and shown in Figure 1 . This example illustrates Definition III.1; it shows a TPG used for contact tracing for a communicable disease, with airborne transmission between people (represented by nodes with label Person) in enclosed locations (e.g., nodes with label Room). This TPG has a temporal domain Ω = {1, . . . , 11}, although any set of consecutive natural numbers containing Ω can serve as the temporal domain of this TPG, for example the set {0, . . . , 15}. The TPG is a multigraph: n 2 and n 3 are connected by two edges, e 2 and e 5 .

In the TPG in Figure 1 , Person nodes have properties name, risk ('low' or 'high'), and test ('pos' or 'neg'). For example, Eve, represented by node n 6 , is known to have tested positive for the disease at time 9. Note that each node and edge refers to a specific time-invariant real-life object or event. A TPG records observed states of these objects. In fact, real-life objects correspond to a sequence of temporal objects, each with a set of properties. For instance, node n 2 corresponds to a sequence of 9 temporal objects, one for each time point 1 through 9. These are represented in the figure by two boxes inside the outer box for n 2 , one for each interval during which no change occurred: [1, 4] with name Bob, and low risk, and [5, 9] with name Bob, and high risk. To simplify the figure, we do not show internal boxes for nodes or edges associated with a single time interval, such as n 1 and e 6 .

An interval of N is a term of the form [a, b] with a, b ∈ N and a ≤ b, which is used as a concise representation of the set {i ∈ N | a ≤ i ≤ b} between its starting point a and its ending point b. Each TPG G = (Ω, N, E, ρ, λ, ξ, σ) can be transformed into an Interval-timestamped Temporal Property Graph (ITPG), by putting the consecutive time points with the same values into the interval. More precisely, an ITPG I = (Ω , N, E, ρ, λ, ξ , σ ) encoding G is defined in the following way. The temporal domain Ω = {i ∈ N | a ≤ i ≤ b} of G is replaced by the interval Ω = [a, b], and N , E, ρ, λ are the same as in G. Moreover, ξ is a function that maps each object o ∈ (N ∪E) to a set of maximal intervals where o exists according to function ξ. For example, for Ω = {1, 2, 3, 4, 5} and node n such that ξ(n, 1) = ξ(n, 2) = ξ(n, 3) = ξ(n, 5) = true and ξ(n, 4) = false, it holds that Ω = [1, 5] and ξ (n) = { [1, 3] , [5, 5] }. Notice that ξ (n) could not be defined as { [1, 2] , [3, 3] , [5, 5] } since [1, 2] is not a maximal interval where n exists. In other words, the set of intervals in ξ (n) has to be coalesced. Finally, function σ is generated from σ in a similar way as ξ . The formal definition of ITPG can be found in the Appendix.

The main goal of this paper is to introduce a simple yet general query language for temporal property graphs. In this section, we give a guided tour of the query language, using the TPG shown in Figure 1 as the running example. All queries, except those presented alongside their equivalent rewritings, are numbered Q1 through Q12, and will be used in the experimental evaluation in Section VII.

The MATCH clause is a fundamental construct in popular graph query languages such as Cypher [41] , PGQL [44] , and G-Core [2] . By using graph patterns, the MATCH clause allows to bind variables with objects in a property graph, giving rise to binding tables that are subsequently processed by the other components of the query language. As an important step towards the construction of a temporal graph query language, we show how the MATCH clause can be extended to bind variables with temporal objects in a TPG. In particular, we show how the syntax and semantics of the query language G-Core [2] can be extended to accommodate temporal graph patterns. As the syntax and semantics of G-Core are compatible with those of Cypher [41] and PGQL [44] , these languages can accommodate such temporal graph patterns as well. These languages play a fundamental role in the ongoing graph query language standardization effort [45] , and our proposal can provide a natural temporal extension for this standard.

Our proposed syntax for temporal regular path queries can be summarized as the following extension of the MATCH clause:

Here, graph is either a TPG or an ITPG, and path is an expression that can contain temporal and structural navigation operators, together with some other functionalities like testing the label of a node or an edge, and verifying the value of a property of a node or an edge. We will present the formal semantics of the language in Section V.

As a first example, assume that contact_tracing is the TPG shown in Figure 1 . Then, the following G-Core expression extracts the list of people from contact_tracing:

The operator ON specifies that contact_tracing is the input graph, and (x:Person) indicates that x is a variable to be assigned nodes with label Person from the input graph. The evaluation of a MATCH clause in G-Core results in a table consisting of bindings that assign to each variable an object from the input graph: a node, an edge, a label, or a property value. The result of evaluating Q1 is the binding table:

x n 1 n 2 n 3 n 6 n 7

At this point, two observations should be made: (i) G-Core does not consider contact_tracing as a temporal property graph, so no explicit time is associated with the objects in a binding table; (ii) Cypher [41] and PGQL [44] produce the same bindings as G-Core when evaluating the previous MATCH clause. How should this clause be evaluated if contact_tracing is considered as a temporal property graph? The first issue is that variables in the MATCH clause are to be assigned temporal objects; for example, (x:Person) indicates that x is a variable to be assigned a temporal object (v, t), where v is a node with label Person that exists at time point t. This issue is addressed by adding an extra column for each variable to indicate the time point when that variable exists (table entries appear side-by-side to save vertical space):

x x_time n 1 1 · · · n 1 9

x x_time n 2 1 · · · n 7 8

Observe that the time point t for each value v of x is stored in the column x_time. Hence, the binding x → n 1 , x_time → 1 is in the resulting table, since n 1 is a node with label Person that exists at time point 1 in contact_tracing, and similarly for the other bindings. This illustrates that TRPQs without temporal navigation operate under snapshot reducibility, a design principle discussed in Section I-B.

Having explained how bindings to temporal objects are represented, we can now illustrate the main features of our query language. As in other popular graph query languages, we use curly brackets to indicate restrictions on property values. As our first example, consider the following MATCH clause:

The expression {risk = 'low'} is used to indicate that the value of property risk must be 'low'. The following binding table is the result of evaluating the previous MATCH clause:

x x_time n 1 1 · · · n 1 9

x x_time n 2 1 · · · n 2 4

x x_time n 6 2 · · · n 6 11 Observe that the binding x → n 2 , x_time → 4 is in this table, since n 2 is a node such that the label of n 2 is Person, n 2 exists at time point 4, and the value of property risk is 'low' for n 2 at time point 4, and likewise for the other bindings in this table. As a second example, consider the following query: In this case, we use the reserved word time to indicate that we are considering temporal objects at time point 1. The following is the result of evaluating this MATCH clause:

x x_time n 1 1 n 2 1

Other operators can limit the time under consideration, for example, to consider temporal objects at time less than 10:

Now, suppose that we want to retrieve the pairs of low-and high-risk people who have met, along with information about their meeting. For this, we can use the following query: The result of evaluating this MATCH clause is:

As in other popular graph query languages [2] , [41] , [44] , an expression of the form -[:meets]-> indicates the existence of an edge with label meets. We assign the variable z to the temporal object that represents that edge. Importantly, an expression of the form -[...]-> represents the structural navigation operator that is conceptually evaluated over the snapshots (temporal states) of the graph. This is the reason why each binding in the resulting table has the same value in columns x_time, z_time, and y_time . For example, the binding x → n 1 , x_time → 5, z → e 1 , z_time → 5, y → n 2 , y_time → 5 is in this table, since n 1 is a low-risk person at time point 5, n 2 is a high-risk person at time point 5, and there exists an edge e 1 with label meets between n 1 and n 2 at time point 5.

To ensure that our proposal is practically useful, a minimum requirement is that queries can be evaluated in polynomial time over TPGs. Hence, we have to choose very carefully how structural navigation is combined with temporal navigation, and how we refer to time in the query language, as the complexity can quickly become intractable when navigation patterns are combined with functionalities for comparing property values [46] . In fact, there is even a fixed query Q for which this negative result holds [46] . This means that the problem of computing, given a graph G as input, the answer to Q over G is intractable in data complexity [47] .

The basic temporal navigation operators in our language are PREV and NEXT that move by one unit of time into the past and into the future, respectively. Consider the following query: Here, x and y are temporal objects that correspond to the same real-world object -a node of type Person. In this case, x has the value 'pos' in the property test, meaning that x tested positive at some time point, and y denotes the same node at the time immediately before testing positive.

Temporal navigation allows single-step temporal movement, and is orthogonal to structural navigation, following navigation orthogonality, discussed in Section I-B. Note that PREV and NEXT reference timestamps, operating under extended snapshot reducibility [17] , discussed in Section II.

This example illustrates the use of notation -/.../-to specify a pattern that a path connecting objects x and y must satisfy. In general, such a pattern is a regular expression that can include temporal and structural operators (see formal definition in Section V). In this example, assuming that the temporal object (o 1 , t 1 ) corresponds to (x:Person {test = 'pos'}), and the temporal object (o 2 , t 2 ) corresponds to (y:Person), then the expression -/ PREV/-indicates that (o 1 , t 1 ) must be connected with (o 2 , t 2 ) through a path conforming to PREV, that is, t 2 = t 1 − 1. Importantly, -/PREV/-is evaluated under the restriction that no structural navigation must have occurred, given the separation between temporal and structural navigation that we are arguing for in this work. Hence, we conclude that o 2 = o 1 . The following binding table is the result of evaluating Q6:

x x_time y y_time n 6 9 n 6 8 Temporal and structural navigation can be combined to retrieve information about which room person x was visiting immediately before she received a positive test result:

The result of evaluating this MATCH clause is:

x x_time y y_time z z_time n 6 9 n 6 8 n 4 8

Observe that the temporal operator PREV moves from (x, x_time) to (y, y_time), while the structural operator -[:visits]-> moves from (y, y_time) to (z, z_time). Hence, temporal and structural navigation are carried out separately. Besides, observe that the intermediate variable y is not needed when retrieving the list of rooms that person x was visiting, we just included it to show the paths that are constructed when using different operators. The following simplified MATCH clause

can be used to obtain the desired answer:

x x_time z z_time n 6 9 n 4 8

At this point the reader may be wondering why the language is asymmetric, and it includes different notation for temporal and structural navigation. We have kept the notation -[...]-> to be compatible with graph query languages used today [2] , [41] , [44] , but an important feature of our proposal is the use of notation -/.../-to include regular expressions combining temporal and structural operators. Hence, we include two basic structural navigation operators, BWD ("backward") and FWD ("forward"), that are analogous to the temporal operators PREV and NEXT. Assume that an edge is given

which, in the formal TPGs notation (see Definition III.1), represents the fact that ρ(e) = (n, n ), ξ(n, t) = true, ξ(e, t) = true, and ξ(n , t) = true. Then, operator FWD moves forward from node n to edge e, or from edge e to node n , while keeping time t unchanged. That is, FWD operates in a TPG snapshot corresponding to time t. Similarly, operator BWD moves backwards from node n to edge e, and from edge e to node n in a TPG snapshot corresponding to time t. Thus, we can rewrite the previous MATCH clause as follows: The regular expression PREV/FWD/:meets/FWD uses the concatenation operator / to indicate that operator PREV has to be executed first followed by the expression FWD/:visits /FWD, which is executed in the same way. (The precise syntax and semantics of such expressions are presented in Section V.) Observe that in our query language, the expression -[:visits]-> is equivalent to -/FWD/:visits/FWD /-. This is because, given an edge of the form of Expression (1), the first operator FWD moves from n to e, then :visits checks that the label of e is visits, and finally the last operator FWD moves from e to n , thus obtaining the same result as using the operator -[:visits]-> in an edge of the form of Expression (1).

So far we only looked at expressions that navigate one step at a time, temporally or structurally. Our language also supports the Kleene star, indicating zero or more occurrences of an operator. For example, Q8 retrieves the list of rooms person x visited at any time prior to receiving a positive test (including also at the time when x received the test): producing the following temporal bindings:

x x_time z z_time n 6 9 n 4 8 n 6 9 n 4 7 n 6 9 n 5 6 n 6 9 n 5 5

As another example, we can retrieve the high-risk people who met someone who subsequently tested positive for an infectious disease: Recall that the temporal operator NEXT moves in time by one unit into the future. This query returns the following temporal bindings when evaluated over the graph in Figure 1 :

Observe that the term ({test = 'pos'}) does not include a variable, as we are not storing the contacts who tested positive to avoid stigmatizing them, and only record those who are potentially at risk for complications. Moreover, our query language allows to specify the number of times an operator is used. Thus, assuming that the time unit in contact_tracing is 5 minutes, we can retrieve the list of high-risk people who met someone who tested positive for an infectious disease 1 hour prior to the meeting: Next, consider the following notion of close contact for an infectious disease: If person a visits the same room as person b, and b tests positive for this disease at most two weeks after they visited the same room as a, then a is considered to have been in close contact with an infected person. The MATCH clause below retrieves high-risk people who have been in close contact with an infected person:

Observe that, as was the case for edge labels, node labels can be used inside an expression -/.../-, and so -/:Room/-in the expression above is equivalent to -(:Room)-. The query Q11 produces the following binding table:

x x_time n 3 7 n 7 7 n 7 8

As the final example, assume that if person a meets with person b, and b tests positive for an infections disease at most two weeks after their meeting, then a should also be considered to have been in close contact with an infected person. Q11 can be extended to consider this additional case:

This query produces the following bindings: In this section, we illustrated the main features of our proposed language and showed how popular graph query languages [2] , [41] , [44] can be extended to include these features. We will define the syntax and the semantics of our language next.

In this section, we provide a formal syntax and semantics for the expression path described in the previous section, and study the complexity of evaluating it. In Section V-A, we extend the widely used notion of regular path query [1] , [48] - [50] to deal with temporal objects in TPGs, which gives rise to the language NavL[PC,NOI]. Moreover, we show in Section V-A how NavL[PC,NOI] provides a formalization of the practical query language proposed in the previous section. Then we define the semantics of NavL [PC,NOI] in Section V-B, by following the definition of widely used query languages such as XPath and regular path queries [1] , [48] - [54] . Moreover, we study in Section V-B the complexity of the evaluation problem for NavL[PC,NOI] for TPGs and ITPGs. Finally, we provide in Section V-C a comparison of our proposal with other temporal query languages. Proofs and additional results can be found in the Appendix.

Recall that labels, property names, and property values are drawn from the sets Lab, Prop, and Val, respectively. Then the expressions in NavL[PC,NOI], which are called temporal regular path queries (TRPQs), are defined by the grammar:

where n and m are natural numbers such that n ≤ m. Intuitively, test checks a condition on a given node or edge at a given time point, axis allows structural or temporal navigation, (path/path) is used for the concatenation of two TRPQs, (path + path) allows for the disjunction of two TRPQs, path[n, m] allows path to be repeated a number of times that is between n and m, whereas path[n, _] only imposes a lower bound of at least n repetitions of expression path. The Kleene star path * can be expressed as path[0, _], and the expression path[_, n] is equivalent to path[0, n].

Conditions on temporal objects are defined by the grammar:

where ∈ Lab, p ∈ Prop, v ∈ Val, and k ∈ N. Intuitively, test is meant to be applied to a temporal object, that is, to a pair (o, t) with object o and time point t. Node and Edge test whether the object is a node or an edge, respectively; the term checks whether the label of the object is ; the term p → v checks whether the value of property p is v for the object at the given time point; ∃ checks whether the object exists at the given time point; and < k checks whether the current time point is less than k. Further, test can be (?path), where path is an expression satisfying grammar (2) , meaning that there is a path starting on the tested temporal object that satisfies path. Finally, test can be a disjunction or a conjunction of a pair of test expressions, or a negation of a test expression.

Furthermore, the following grammar defines navigation:

Operators F, B move structurally in a TPG: F moves forward in the direction of an edge, and B moves backward in the reverse direction of an edge. Operators N, P move temporally in a TPG: N moves to the next time point, and P moves to the previous time point.

Having a formal definition of the syntax of NavL[PC,NOI], we show that this language provides a formalization of the practical query language of Section IV. More precisely, temporal navigation operators PREV and NEXT in the practical query language correspond to the analogous operators P and N in NavL[PC,NOI], respectively, while structural navigation operators BWD and FWD in the practical query language correspond to the operators B and F in NavL[PC,NOI], respectively. Then consider the following MATCH clause over an arbitrary TPG:

Our task is to construct a query path in NavL[PC,NOI] such that the evaluation of this MATCH clause over graph is equivalent to the evaluation of path over this TPG. The following expression satisfies this condition:

Observe that (Node ∧ Person ∧ test → pos) is used to check whether the following conditions are satisfied for a temporal object (o, t): o is a node with label Person and with value pos in the property test at time point t. Notice that, by definition of TPGs, the fact that test → pos holds at time t implies that node o exists at this time point. Hence, (Node ∧ Person ∧ test → pos) is used to represent the expression (x:Person {test = 'pos'}). Moreover, temporal navigation operator P is used to move from the temporal object (o, t) to a temporal object (o, t ) such that t = t−1, so that it is used to represent the expression -/PREV/-. Finally, the condition (Node ∧ ∃) is used to test that o is a node that exists at time t . Observe that we explicitly need to mention the condition ∃, as expressions in NavL[PC,NOI] do not enforce the existence of temporal objects by default. The main reason to choose such a semantics is that there are many scenarios where moving through temporal objects that do not exists is useful, in particular when these temporal objects only exist at certain time points. For example, if a room is unavailable for some time, then the temporal path expression

can be used to look for the next time the room is available. Here, (N/¬∃)[0, _] moves through an arbitrary number of time points during which the room is unavailable, until the condition ∃ holds, and the room becomes available.

As a second example, consider query Q8 from Section IV. Based on the previous discussion, such a query can be represented as the following TRPQ:

where all temporal objects must exist, as required in Section IV. Note that we have not explicitly included the existence condition on the last room node, as the existence of an edge at time point t implies, according to the definition of TPGs, the existence of its starting and ending nodes.

As an additional example, consider query Q12 from Section IV, which uses many of the features of NavL[PC,NOI]. This query corresponds to the temporal path expression:

As our final example, consider query Q4 from Section IV. The use of a condition over the reserved word time is represented in NavL[PC,NOI] by the condition < k. For example, time < '10' is represented by the condition < 10, as a temporal object (o, t) satisfies < 10 if, and only if, t < 10. Hence, Q4 is equivalent to the following query in NavL[PC,NOI]:

Notice that abbreviations can be introduced for some of the operators described in this section, and some other common operators, to make notation of the formal language easier to use. For example, we could use condition = k, which is written in NavL[PC,NOI] as (< k + 1 ∧ ¬(< k)), and operator NE that moves by one unit into the future if the object that is reached exists. However, as such operators are expressible in NavL[PC,NOI], we prefer to use a minimal notation in this formal language to simplify its definition and analysis.

Let G = (Ω, N, E, ρ, λ, ξ, σ) be a TPG. Given an expression path in NavL[PC,NOI], the evaluation of path over G, denoted by path G , is defined by the set of tuples (o, t, o , t ) such that there exists a sequence of temporal objects starting in (o, t), ending in (o , t ), and conforming to path. More precisely, assume that src(e) = v 1 and tgt(e) = v 2 whenever ρ(e) = (v 1 , v 2 ), and assume that PTO

Then the evaluation of the axes in grammar (2) is defined as:

Moreover, assuming that, path, path 1 and path 2 are expressions in NavL[PC,NOI], we have that:

where path k is defined as the concatenation of path with itself k times. Finally, the evaluation of an expression test, defined according to grammar (3), is a navigation expression that stays in the same temporal object if test is satisfied:

Hence, to conclude the definition of the semantic of NavL[PC,NOI], we need to indicate when a temporal object (o, t) satisfies a condition test, which is denoted by (o, t) |= test. Formally, this is recursively defined as follows (omitting the usual semantics for Boolean connectives):

To define the evaluation of an expression path over a intervaltimestamped temporal property graph I, we just need to translate I into an equivalent TPG and consider the previous definition. Formally, assuming that can(·) is a canonical translation from an ITPG into an equivalent TPG, we have that: path I = path can(I) .

Having a formal definition of TRPQs allows not only to provide an unambiguous definition of the practical query language of Section IV, but also to formally study the complexity of evaluating this language. Assuming that G is a class of graphs and L is a query language, define Eval(G, L) as the problem of verifying whether (o, t, o , t ) ∈ path G , for an input consisting of a graph G ∈ G, an expression path in L and a pair (o, t), (o , t ) of temporal objects in G. 

T-GQL is a recently proposed temporal query language [13] developed on top of Cypher [41] , a popular graph query language. We now compare our TRPQs with T-GQL, and with the alternative of implementing a temporal graph query language that encodes time intervals as lists directly in Cypher.

First, consider the five design principles of our language, described in Section I-B. Since Cypher's data model does not explicitly consider time, it is not surprising that it does not satisfy navigability, navigation orthogonality, static testability, or snapshot reducibility, and only node-edge symmetry is satisfied. T-GQL satisfies navigability, navigation orthogonality and snapshot reducibility, but it treats nodes and edges differently, violating node-edge symmetry. Moreover, T-GQL test conditions do not satisfy static testability.

Second, consider the complexity of the query evaluation problem. As shown in Theorem V.1, our query language can be evaluated in polynomial time over temporal property graphs. In contrast, the evaluation problem for Cypher is intractable, even if we focus on non-temporal property graphs (i.e., a temporal property graph consisting of a single timestamp). In fact, a fixed query that checks for the existence of two disjoint paths from the same source node to the same destination node can be expressed in Cypher and is known to be NP-hard [41] . Whether these intractability results carry over T-GQL is not clear, as an exact characterization of T-GQL as a fragment of Cypher has not yet been provided.

Finally, we compare the expressive power of our proposal with Cypher and T-GQL. As Cypher is a general purpose graph query language, it is not surprising that every query in our proposal can be expressed in it, but at the cost of using unnatural and expensive time interval encodings. However, we can show that some natural TRPQs cannot be expressed in T-GQL. First, consider a graph for travel scheduling that includes different transportation services, such as flights, trains, and buses. By the definition of consecutive path in [13] , it is not possible to express a query in T-GQL that indicates how to go from one city to another combining different transportation services, which can be easily expressed in our proposal. As a more fundamental example, consider a query that retrieves paths that combine an arbitrary number of temporal journeys, some of them moving to the future and some to the past. Such a combination of temporal journeys cannot be specified in T-GQL, while it can be handled by our proposal.

We implement a fragment of NavL[PC,NOI] that includes all queries of Section IV over interval-timestamped TPGs. We use Rust and the Itertools library [55] , which efficiently implements dataflow operators, supports lazy evaluation of expressions, and collects data only when necessary. For multithreaded implementation, we use Rayon-Rs [56] , an interface over dataflow operators. Our algorithms can be implemented using any system that supports the dataflow model, such as Apache Spark [57] , Apache Flink [58] , Timely [59] and Differential dataflow [60] .

We represent a TPG as a pair of interval-timestamped temporal relations Nodes(id, label, properties, time) and Edges(id, src, tgt, label, properties, time), where properties are a set of key-value pairs. For example, for node n 2 and edge e 1 from Figure 1 , we have: [1, 4] n 2 Person {name = 'Bob', risk = 'high'} [5, 9] Edges id src tgt label properties time e 1 n 1 n 2 meets {loc = 'cafe'} [3, 3] e 1 n 1 n 2 meets {loc = 'park'} [5, 6] By the formal definition of TRPQs in Section V, we know that temporal and structural navigation operators are orthogonal, in the sense that the language allows non-simultaneous single-step time and structural movements. Hence, we break down the evaluation of a TRPQ into Evaluation of conventional path queries in Step 1 is a wellstudied problem [1] , [50] . In this work, we select an optimized select-project-join execution plan for each query in Section IV, and then implement these plans using Itertools operators in Rust.We implement in-memory hash-join that uses intervalbased reasoning to identify temporally-aligned [24] matches. For example, for Q5, we compute the intersection of the validity intervals for x, y and z. For TRPQs without temporal navigation (Q1-Q5), the final bindings table can be returned The interpretation of this temporally coalesced result is snapshot-based: we bind x = n 1 , z = e 1 , y= n 2 , with x_time = y_time = z_time = 5, and similarly for time 6.

Step 2: To evaluate the temporal navigation portion of the path expression, we use interval-based reasoning to join and prune out potential matches that do not satisfy the temporal constraint. For example, for Q7, we can limit the validity interval of z to the time immediately before x was tested positive. Note that interval intersection and union can be computed in constant time based on interval boundaries.

Step 3: For the final portion of query evaluation, we may need to use point-wise reasoning for temporal navigation. For example, Q8 retrieves the list of rooms z that person x visited at or prior to the time of testing positive. The PREV operator is defined over time points, and we need to compare pairs of time points of x and z to correctly identify person-room pairs. Furthermore, result generation for TRPQs that use temporal navigation must compute point-based bindings. Returning to our example, in the result of Q8, x_time may or may not be the same as z_time, and so we cannot use an interval representation for the output bindings such as (n 6 , [5, 6] , n 5 , [3, 5] ), because such a representation is inherently snapshotbased and it does not uniquely map to a set of point-wise temporal bindings over n 6 and n 5 .

An exception are TRPQs that return a single variable, such as Q9-Q12. Results of such queries can be returned temporally coalesced for compactness, although this rarely translates to savings in the running time of query execution, because temporal constraints must be check over a point-based representation for these queries in Step 3, as discussed above.

All experiments were run as a multi-threaded Rust application on a single cluster node with 64 GB of RAM and an Intel Xeon Platinum 8268 CPU, using the Slurm scheduler [61] . According to our results (Figure 3 ), performance for demanding queries was best at 16 CPU cores, and we use this setting in all experiments, unless noted otherwise. Reported execution times are averages of 5 runs. In most cases, the coefficient of variation of the running time was less than 6% (max 10%). 

We built interval-timestamped TPGs (per Sec. III-B) similar to Figure 1 using a trajectory dataset generated by Ojagh et al. [62] to study COVID-19 contact tracing. The authors tracked 20 individuals on the University of Calgary campus, and used that data to simulate trajectories of individuals visiting campus locations, recording the times when individuals entered and exited those locations. The synthetic dataset of Ojagh et al. records time up to a second. To make this data more realistic, we (i) made temporal resolution coarser, mapping timestamps to 5-min windows, and (ii) associated individuals with locations where they spent at least 2.5 min.

Our goal was to have an interval-timestamped graph with two types of nodes, Person and Room (representing classrooms), and two types of edges, visits and meets. To achieve this, we represented 100,000 individuals as Person nodes, with their periods of validity corresponding to visits of classrooms. Next, from among 410 unique locations in the dataset, we selected 100 most frequently visited as nodes of type Room, with periods of validity defined by the times of first entrance and last exit. Then, we added a visits edge between each person and each room they visit, with an appropriate time interval. We used information about the remaining 310 locations to add bi-directional meets edges between a pair of individuals who were at the same location at the same time. Finally, we randomly selected 18% of the Person nodes (proportion of the Canadian population aged 65+) as high risk for disease complications, and fixed this property over the lifespan of those nodes.

To study the impact of graph size on performance, we created graphs at different scale factors by randomly selecting a subset of the Person nodes of a given size, and keeping only the valid edges. To study the impact of query selectivity on performance, we selected between 2% and 10% of the Person nodes as positive for COVID-19, assigning the time of a positive test uniformly at random from the temporal domain of the graph, and keeping the selected nodes as positive for the remainder of their lifespan. Table I summarized the temporal graphs used in our experiments. The largest graph has 100,000 unique Person nodes, 100 unique Room nodes, and a temporal domain of 48 time points, each representing a 5-minute window. This corresponds to 340,000 temporal nodes and over 32 million temporal edges. 

For the first experiment, we executed queries Q1-Q12, discussed in Section IV, over graph G10 (Table I) . Table II shows the execution time of each query in seconds, and its output size in the number of tuples in the bindings table. Recall from Section VI that Steps 1 and 2 of query evaluation act on the interval representation or TRPG, while Step 3 expands the output of Step 2 into a point-based representation to check any remaining temporal constraints. Our implementation uses lazy evaluation. Decoupling the execution times of Steps 1 and 2 for the purpose of measurement would degrade performance, and we report these times jointly as "interval-based time" in Table II . Queries Q1-Q5 do not use temporal navigation, and so interval-based time and total time coincide and the output can remain temporally coalesced. In contrast, Q6-Q12 use temporal navigation; they require both interval-based and point-based processing, and the output for these queries is point-based.

We observe that most queries execute in less than 1 sec. The most challenging queries, Q11 and Q12, both produce over 22 million tuples in the output and take at most 6.5 sec.

In the second experiment, we execute all queries over graphs G1-G10 to study the impact of graph size on query performance. Figure 2 shows this result, with the number of unique Person nodes on the x-axis, and execution time in seconds on the y-axis. Observe that the running time increases linearly for all queries except Q5, Q9, and Q10 where the time increases approximately quadratically with increasing graph size. Increase in the running time is nearly perfectly explained by the increase in the size of the output. For example, increasing input size by a factor of x10 nodes and x100 edges (G6 to G10) increases output size of Q11 (resp. Q12) by a factor of 18.39 (resp. 19.29), and it increases the execution time by a factor of 18.89 (resp. 19.29).

In our third experiment, we studied the impact of parallelism on performance. Figure 3 shows the result of this experiment over the largest graph, G10, with the number of CPU cores on the x-axis and execution time in seconds on the y-axis. (The number of threads is the number of CPUs + 1.) Observe that the most demanding queries Q5, Q10, Q11, and Q12 substantially benefit from increased parallelism, with best performance at 16 cores. For example, Q12 executes in 6.45 sec on 16 cores, down from 13 sec on 1 core.

Queries Q6-Q11 all select Person nodes that at some point had a positive COVID-19 test. In our next experiment, we vary the positivity rate from 2% to 10%, thus impacting query selectivity, and study its effect on execution time. Figure 5 shows the result of this experiment over the largest graph, G10, with positivity rate on the x-axis and execution time on the y-axis. We observe a linear relationship between positivity rate and execution time for all queries.

In our final experiment, we consider the effect of temporal navigation on query performance. We select queries Q10, Q11 and Q12 because they all contain a temporal navigation operator with a numerical occurrence indicator (PREV[n,m] in Q10 and NEXT[n,m] in Q11 and Q12). We set n = 0, and vary the maximum number of temporal navigation steps m between 4 and 48 in increments of 4. Figure 4 shows the result over G10 with m on the x-axis and query execution time on the y-axis. We observe that increasing the number of temporal navigation steps increases the execution time. This increase is initially linear, but plateaus when m reaches 16, because of the cumulative effect of increasing m.

We considered temporal property graphs (TPGs) and proposed temporal regular path queries (TRPQs) that incorporate time into TPG navigation. Starting with design principles, we proposed a natural syntactic extension of the MATCH clause of popular query languages, formally presented the semantics of TRPQs, and studied the complexity of their evaluation. We also demonstrated that a fragment of the TRPQ language can be implemented efficiently. We hope that our work on the syntax and semantics, the positive complexity results, and our implementation and evaluation will pave the way to usable and practical production-level implementations of TRPQs.

An interesting future direction is to add support for aggregation and grouping. Another natural direction is to incorporate our methods into existing graph processing systems like GraphX [63] , Portal [64] or Neo4j [65] , and to investigate a range of systems questions, including the impact of different object timestamping strategies, temporal coalescing strategies, and indexing methods on performance.

An interval of N is a term of the form [a, b] with a, b ∈ N and a ≤ b, which is used as a concise representation of the set of natural numbers {i ∈ N | a ≤ i ≤ b} (that is, to specify this interval, we just need to mention its starting point a and its ending point b). Using Allen's interval algebra [66] , given two intervals [a 1 , b 1 ] and [a 2 , b 2 ], we say that [a 1 , b 1 ] occurs during

A finite family F of intervals is said to be coalesced [67] [3, 4] , [6, 8] } is not, because [1, 2] meets [3, 4] . The set of all finite coalesced families of intervals is denoted by FC. Observe that ∅ ∈ FC. Moreover, given

Finally, given an interval Ω, we use FC(Ω) to denote the set of all families F ∈ FC such that for every [a, b] ∈ F, it holds that [a, b] occurs during Ω.

Given an interval [a, b] and v ∈ Val, the pair [a 1 , b 1 ]) , . . ., (v n , [a n , b n ])} and for every j ∈ {1, . . . , [1, 2] ), (v, [5, 8] )} and F 2 = {(v, [1, 2] ), (w, [3, 4] )} are both coalesced (assuming that v = w). On the other hand, F 3 = {(v, [1, 2] ), (v, [3, 4] )} is not coalesced because [1, 2] meets [3, 4] and these intervals have the same value in F 3 . Moreover, the set of all finite coalesced families of valued intervals is denoted by vFC. Finally, given an interval Ω, we use vFC(Ω) to denote the set of all families F ∈ vFC such that for every

With these ingredients, we can introduce the notion of interval-timestamped temporal property graph. • Ω is an interval of N;

is a function that maps a node or an edge to a finite coalesced family of intervals occurring during Ω;

is a function that maps a node or an edge, and a property name to a finite coalesced family of valued intervals occurring during Ω.

In addition, I satisfies the following conditions:

• If ρ(e) = (n 1 , n 2 ), then ξ(e) ξ(n 1 ) and ξ(e) ξ(n 2 ). (1 ≤ j ≤ n). Moreover, observe that two additional conditions are imposed on I, which enforce that an ITPG conceptually corresponds to a finite sequence of valid conventional property graphs. In particular, as was the case for TPGs, an edge can only exist at a time when both of the nodes it connects exist, and a property can only take on a value at a time when the corresponding node or edge exists. For instance, assume that I = (Ω, N, E, ρ, λ, ξ, σ) is an ITPG corresponding to our running example in Figure 1 . Then, we have that Ω = [1, 11] , ξ(n 2 ) = { [1, 9] }, ξ(n 3 ) = { [1, 7] } and ξ(e 2 ) = {[1, 2]}, so that ξ(e 2 ) ξ(n 2 ) and ξ(e 2 ) ξ(n 3 ). Moreover, for the property risk, we have that σ(n 2 , risk) = {(low, [1, 4] ), (high, [5, 9] )}.

We conclude this section by observing that there is a one-to-one correspondence between TPGs and ITPGs. On the one hand, each TPG can be transformed in polynomial-time into a ITPG, by putting in the same interval consecutive time points with the same values. On the other hand, each ITPG can be transformed in exponential-time into a TPG, by replacing each interval by the set of time points represented by it.

Removing numerical occurrence indicators. We start by considering a restriction of our query language in which numerical occurrence indicators are not allowed. Formally, this means that grammar (2) is replaced by:

path ::= test | axis | (path/path) | (path + path)

The resulting language is called NavL

Person test → pos Node ∃ P Fig. 6 . A parsing tree of NavL[PC,NOI]-expression path = (Node ∧ Person ∧ test → pos)/P [5, 9] /(Node ∧ ∃).

Removing path conditions. Consider a second restriction of our language in which there are no path conditions. Formally, this means that instead of grammar (3), we use:

The resulting language is called NavL[NOI].

Allowing numerical occurrence indicators only in the axes. Consider a grammar for tests as in (6), where path conditions are not allowed, and a grammar for path expressions where numerical occurrence indicators are only used in the axes:

The resulting language is called NavL[ANOI], where ANOI refers to numerical occurrence indicators used only in the axes.

In this section, we show that NavL[PC, NOI] can be evaluated in polynomial time, considering a computational model where accessing the distinct elements of a TPG takes time O(1). More precisely, for a TPG G = (Ω, N, E, ρ, λ, ξ, σ), it is assumed that the following operations can be performed in time O(1): given e ∈ E and n 1 , n 2 ∈ N , check whether ρ(e) = (n 1 , n 2 ); given e ∈ E, compute src(e) and tgt(G); given o ∈ (N ∪ E) and ∈ Lab, check whether λ(o) = ; given o ∈ (N ∪ E) and t ∈ Ω, check whether ξ(o, t) = true; and given o ∈ (N ∪ E), p ∈ Prop and v ∈ Val, check whether σ(o, p, t) = v. Moreover, we use notation path for the length of NavL[PC, NOI]-expression path as an input string over an appropriate alphabet. Then, it is possible to prove the following.

Theorem C.1. There exists an algorithm that, given a temporal property graph G and a NavL[PC, NOI]-expression path, computes path G in timeÕ( path 2 · |Ω| 2 · (|N | + |E|) 2 ).

In the rest of this section, we describe the polynomial-time algorithm in the statement of Theorem C.1. Let G = (Ω, N, E, ρ, λ, ξ, σ) be a TPG and path be an expression in NavL[PC,NOI]. Moreover, assume that M = |Ω| · (|N | + |E|) is the number of distinct (existing or non-existing) temporal objects in G. The algorithm constructs a parsing tree of path, where each node is associated with an operator in NavL[PC,NOI], and then, by using a bottom-up approach, computes, for each node u, the set of tuples (o, t, o , t ) that satisfy the operator labeling u. For example, a parsing tree of NavL[PC,NOI]expression (Node ∧ Person ∧ test → pos)/P [5, 9] /(Node ∧ ∃) is shown in Figure 6 . The algorithm starts by evaluating the leaves: Node G , Person G , test → pos G , P G and ∃ G , according to the semantics defined in Section V-A, and then it combines the resulting tables by using the operators ∧, [5, 9] and / in the order specified by the parsing tree. For instance, in the right-hand side of the tree, once P G , Node G and ∃ G have been computed, the algorithm continues by constructing Node ∧ ∃ G , followed by P [5, 9] G and then by P [5, 9] /(Node ∧ ∃) G . Notice that at any given moment, the result of at most path nodes is stored in the form of a table, each one with as many pairs of temporal objects as there are available, i.e., with at most M 2 tuples.

We Input : A TPG G, a table T 1 such that T 1 = path 1 G for some NavL[PC,NOI]-expression path 1 , and n ≥ 0

constructed just by considering all objects o ∈ V ∪ E and then generating tuples (o, t, o, t − 1) such that t ∈ Ω and t − 1 ∈ Ω, while F G can be constructed by considering all edges e ∈ E, and then generating tuples (src(e), t, e, t) and (e, t, tgt(e), t) for each t ∈ Ω.

For each internal node u of the parsing tree of path, we must consider one of the following two cases. Assume first that the label of u is either ∧, ∨, ¬ or ?, so that u represents a more complex test expression (test 1 ∧ test 2 ), (test 1 ∨ test 2 ), (¬test 1 ) or (?path 1 ). If u represents test (test 1 ∧ test 2 ), then the algorithm has already computed T 1 = test 1 G and T 2 = test 2 G . Hence, to construct test 1 ∧ test 2 G , the algorithm needs to compute the intersection of T 1 and T 2 , which can be done in timeÕ(M 2 ) by sorting both tables (each of size at most M 2 ) and iterating with two pointers, one on each table, to see which elements occur in both. Recall that the notationÕ(M 2 ) ignores the logarithmic factors, which in this case appear when sorting tables T 1 and T 2 . The case where u represents either (test 1 ∨ test 2 ) or (¬test 1 ) can be treated in a similar way. Finally, if u represents test (?path 1 ), for each tuple (o, t, o , t ) ∈ path 1 G , we need to include the tuple Assume now that the label of u is either /, +, [n, m] or [n, _], so that u represents a more complex path expression (path 1 /path 2 ), (path 1 + path 2 ), path 1 [m, n] or path 1 [m, _]. If u represents expression (path 1 /path 2 ), then the algorithm has already computed T 1 = path 1 G and T 2 = path 2 G . Hence, to construct path 1 /path 2 G , the algorithm just need to sort T 1 by the third and fourth columns (the second pair of temporal objects), sort T 2 by the first and second column (the first pair of temporal objects), and then join T 1 with T 2 by looking at matching temporal objects on those columns. The overall time for this construction isÕ(M 2 ), as it corresponds to a sort-merge join on two tables with at most M 2 tuples. If u represents the expression (path 1 + path 2 ), the algorithm computes the union of T 1 and T 2 . If u represents the expression path 1 [n, m], then the algorithm proceeds as follows, assuming that T 1 = path 1 G has already been computed. Given that path 1 [n, m] is equivalent to the expression path 1 [n, n]/path 1 [0, m − n], the procedure first runs Algorithm 1 COMPUTEREPETITION(G, T 1 , n) to compute path 1 [n, n] G in a similar way to the exponentiation by squaring algorithm [68] . If n = m, then we are ready in timeÕ( path 1 · M 2 ), since the most expensive operation is the sort-merge join, which is carried out in timeÕ(M 2 ) and at most O(log(n)) times, that is, O( path 1 ) times. Otherwise, Algorithm 2 COMPUTEINTERVALREPETITION(G, T 1 , T 2 , m − n) is called to compute path 1 [n, n]/path 1 [0, m − n] G , where T 2 = path 1 [n, n] G is the result of invoking COMPUTEREPETITION(G, T 1 , n). Here, O(log(m − n)) sort-merge joins have to be carried out, which is again O( path 1 ), so this takes a total time ofÕ( path 1 · M 2 ).

Finally, if u represents the expression path 1 [n, _], then the computation process is similar to the previous one, assuming that T 1 = path 1 G has already been computed. As before, we act as if we have to compute the table for another, but equivalent, expression, path 1 [n, n]/path 1 [0, M 2 ], which is done by first computing T 2 = COMPUTEREPETITION(G, T 1 , n), and then invoking COMPUTEINTERVALREPETITION(G, T 1 , T 2 , M 2 ). This takes timeÕ( path 1 · M 2 ) as the sort-merge join is carried out in timeÕ(M 2 ), and O(log(n) + log(M 2 )) such joins need to be computed, that isÕ( path 1 ) such joins. Notice that In summary, the table associated to each node of the parsing tree of path can be computed in timeÕ( path ·M 2 ). Given that there are at most O( path ) such nodes, the total computation time isÕ( path 2 ·M 2 ), that is,Õ( path 2 ·|Ω| 2 ·(|N |+|E|) 2 ). This concludes the proof of Theorem C.1. Input : A TPG G, a table T i such that T i = path i G for some NavL[PC,NOI]-expression path i for i = 1, 2, and n > 0 Output:

First of all notice that basic tests such as Node, Edge, , p → v, < k and ∃ can be checked in O(1), so in absence of numerical occurrence indicators, checking a test is equivalent to checking the satisfaction of a Boolean formula on a given valuation, which can be done efficiently in time O(n), where n is the size of the formula. We will assume in the following then that we have a linear-time function CHECKTESTNOPC(C, (o, t), test) that takes as input an ITPG C, a temporal object (o, t) in C and a test expression test in NavL[PC], and returns true if (o, t) |= test in C, and false otherwise.

To show that Eval(ITPG, NavL[PC]) is in PTIME, we present a polynomial-time procedure in Algorithm 3, called TUPLEEVALSOLVEONLYPC, that, given an ITPG C, a tuple representing a pair of temporal objects (o 1 , t 1 , o 2 , t 2 ) and an expression r in NavL[PC], checks whether (o 1 , t 1 , o 2 , t 2 ) ∈ r C . In what follows, we show that Algorithm 3 works in polynomial time.

Notice first that we do not directly return the result. Instead, we first look at a hashing table that stores previously stored results, and only if this was not previously computed, we compute the result for the input, and store it in the table before returning the value. We employ this to avoid an exponential number of calls when recursively calling the algorithm. This is possible since, in absence of numerical occurrence indicators, navigation is done at most one step at a time. Thus, if (o 2 , t 2 ) is a temporal object reached from (o 1 , t 1 ) using an expression r, then |t 1 − t 2 | is at most the number of symbols N and P occurring in r. Hence, if r is the length of expression r, then there are at most O( r · |N ∪ E|) temporal objects that we will need to consider for this call, which means, at most O( r 2 · |N ∪ E| 2 ) tuples representing pairs of temporal objects. Hence, given that there are at most r sub-expressions of r that can be reached in the tree decomposition of r, we need to store at most O( r 3 · |N ∪ E| 2 ) different results for TUPLEEVALSOLVEONLYPC.

The rest of the algorithm is quite straightforward, and it considers the case when the result has not been precomputed. If r is a temporal navigation operator, then by the definitions of N and P, a single temporal object can be reached, (o 1 , t 1 + 1) or (o 1 , t 1 − 1), respectively, so we check that (o 2 , t 2 ) is equal to that respective temporal object. This can be easily done in O(1). If r is a spatial navigation operator, then we have to look at the mapping from edges to source and destination nodes, ρ, to determine the set of objects that can be reached. If o is an edge, and we move forward, we look for its destination node, if we move backward, for its source node. If o is a node, and we move forward, we are looking for the edges that have o as their source, and if we move backward, then we look for those who have o as their destination. The whole process can be done in time O( ρ ), which is O( C ). When r is testing a condition, we just call the previously mentioned algorithm CHECKTESTNOPC to check whether (o, t) |= r in C. This last base case can be done in time O( r ).

When r is of the form (r 1 + r 2 ), where r 1 and r 2 are expressions in NavL[PC], we have that (o 1 , t 1 , o 2 , t 2 ) ∈ (r 1 + r 2 ) C if and only if (o 1 , t 1 , o 2 , t 2 ) ∈ r 1 C or (o 1 , t 1 , o 2 , t 2 ) ∈ r 2 C , so if suffices that the call to TUPLEEVALSOLVEONLYPC with any of inputs r 1 or r 2 returns true, and this can be easily checked by the algorithm. The last case is when r is of the form (r 1 / r 2 ) where r 1 and r 2 are expressions in NavL[PC], where we have that (o 1 , t 1 , o 2 , t 2 ) ∈ r C if there exists a temporal object (o, t) such that (o 1 , t 1 , o, t) ∈ r 1 C and (o, t, o 2 , t 2 ) ∈ r 2 C . We know that in absence of numerical occurrence indicators, t will be at distance at most r 1 from t 1 and at most r 2 from t 2 , since one can move only as many times as there are N and P symbols in the respective formulas, so this gives us a polynomial-size set from which we can extract candidates to satisfy this condition.

Finally, notice that at every call to TUPLEEVALSOLVEONLYPC(C, (o 1 , t 1 , o 2 , t 2 ), r), we either already have computed the value for the key (o 1 , t 1 , o 2 , t 2 , r), in which case we can give an answer immediately, or we are computing a new value to store in the hash table H, which has size bounded by O( r 3 · |N ∪ E| 2 ). In any case, since the most expensive step performs C. Eval(ITPG, NavL[NOI]) is Σ p 2 -hard Consider the following decision problem called Generalized Subset Sum (G-SUBSET-SUM) which is known to be Σ p 2complete [69] :

G-SUBSET-SUM Input:

Natural numbers vectors u and w of dimensions dim(u) and dim(w), respectively, and a positive integer S ∈ N Output:

true if there exists x ∈ {0, 1} dim(u) such that, for all y ∈ {0, 1} dim(w) , it holds that x · u + y · w = S, and false otherwise.

In this problem, a · b represents the inner product between vectors a and b. Given vectors u = (u 1 , . . . , u n ) ∈ N n and w = (w 1 , . . . , w m ) ∈ N m , and the integer S ∈ N, the goal is to provide a polynomial-time algorithm that returns a ITPG C, a tuple (o 1 , t 1 , o 2 , t 2 ), and an expression r in NavL[NOI] such that (o 1 , t 1 , o 2 , t 2 ) ∈ r C if and only if ∃x ∈ {0, 1} n ∀y ∈ {0, 1} m x · u + y · w = S. Let M = 2 · n i=1 u i + m j=1 w j , which can be easily computed in polynomial time from u and w. Then C will be the ITPG C = (Ω, N, E, ρ, λ, ξ, σ) where Ω = [0, 2 · M ], N = {v}, E = ∅, ρ is an empty function, λ(v) = l, ξ(v) = {[0, 2 · M ]} and σ is an empty function. In other words, C is a ITPG consisting of only one node existing from time 0 to time 2 · M , with no edges or properties. The tuple (o 1 , t 1 , o 2 , t 2 ) in our reduction will be given by (v, M, v, 2 · M ). As for the expression r, it will be defined recursively as follows. First define an expression for each component u i of u that will represent whether u i · x i will be chosen to be u i or 0:

The idea is that the time t of the temporal object that is being reached will store the sum given by x · u + y · w, plus M to avoid having negative numbers on the time dimension when testing that the result is different from S (this will be explained in more detail later). Define then an expression for u, representing the sum accumulated by the ∃x ∈ {0, 1} n part of the problem:

We will now use a recursive construction to represent the sum accumulated by the ∀y ∈ {0, 1} m part of the problem. First define condition r t =S+M , that represents that the accumulated sum is not S:

By taking r 0 := r t =S+M , now recursively define r j+1 from r j , for j ∈ {0, . . . , m − 1}, as follows:

The formula r w = r m will allow to iterate over all the accumulated sums implied by the ∀y ∈ {0, 1} m part of the problem. Finally, r is defined as follows:

Assume that both t and t are in Ω. By induction on the definition of numerical occurrence indicators, it is easy to see that

In fact, it can be proved by induction that (v, t, v, t ) ∈ r u C if and only if ∃x ∈ {0, 1} n such that t = t + x · u. We will demonstrate something stronger, which is that (v, t, v, t ) ∈ r u1 / . . . /r ui C if and only if there exists (x 1 , . . . ,

Our base case will be checking the property for r u1 , which, by the same exact reasoning as above, satisfies that (v, t, v, t ) ∈ r u1 C if and only if there exists x 1 ∈ {0, 1} such that t = t + x 1 · u 1 . Suppose now that (v, t, v, t ) ∈ r u1 / . . . /r ui C if and only if there exists (x 1 , . . . , x i ) ∈ {0, 1} i such that t = t + i k=1 x i · u i . We also know by the previous reasoning that (v, t , v, t ) ∈ r ui+1 C if and only if there exists x i+1 ∈ {0, 1} such that t = t + x i+1 · u i+1 . Hence, by definition of (r 1 /r 2 ), we get that (v, t, v, t ) ∈ r u1 / . . . /r ui+1 C if and only if there exists (v, t ) such that there exists (x 1 , . . . , x i ) ∈ {0, 1} i such that t = t + i k=1 x i · u i and there exists x i+1 ∈ {0, 1} such that t = t + x i+1 · u i+1 , i.e., if and only if there exists (x 1 , . . . , x i+1 ) ∈ {0, 1} i+1 such that t = t + i+1 k=1 x i · u i . This yields the result. Moreover, by definition of (r 1 /r 2 ) C , we get that (v, M, v, 2 · M ) ∈ r C if and only if there exists x ∈ {0, 1} n such that:

Notice also that the right part of this formula is built in the following way:

and only if t is any time point in Ω and t = 2 · M . Thus, all we need to prove now is that there exists some time

First, we show by induction that if (v, t, v, t ) ∈ r j C , then t = t . The base case is trivial, since r 0 is a test. For the inductive case, assume that if (v, t, v, t ) ∈ r j C , then t = t. For conciseness, define a := w j+1 . In the case of j + 1, we know

Notice then that these conditions only hold respectively if t 3 = t 1 + a, by definition of operator N and numerical occurrence indicators, t 3 = t 4 by induction hypothesis and t 2 = t 4 − 2a. Thus,

Given the conclusion in the previous paragraph, all we need to prove now is that (v, M + x · u, v, M + x · u) ∈ r v C if and only if ∀y ∈ {0, 1} m x · u + y · w = S. In fact, we will prove by induction a stronger condition:

for every k ∈ {0, . . . , m}, it holds that ∀y ∈ {0, 1} k t + k j=1 y j · w j = S if and only if (v, M + t, v, M + t) ∈ r k C The case when t = x · u and k = m yields Σ p 2 -hardness of Eval(ITPG, NavL[NOI]) as a result. In the base case k = 0, we need to prove that t = S if and only if (v, M + t, v, M + t) ∈ r 0 C . Recall that r 0 = r t =S+M . It can be easily checked that (v, t) |= r 0 if and only if t = S + M . Also, r 0 is a test, so (v, t, v, t ) ∈ r 0 C if and only if t = t and t = S + M . Hence, we have that (v, M + t, v, M + t) ∈ r 0 C if and only if M + t = M + t (which is trivially satisfied) and M + t = M + S, i.e., if and only if t = S. For the inductive case, assume that for k ∈ {0, . . . , m}, it holds that:

Then we have to prove that:

To prove this, define again a := w k+1 for conciseness. Recall that r k+1 = ((N[a, 

The first condition is equivalent to having that t 1 = M + t + a. The second condition implies that t 1 = t 2 , given what we proved in the previous paragraphs. Hence, given that (v, M +t+a, v, M +t+a) ∈ r k C , we conclude by induction hypothesis that ∀y ∈ {0, 1} k t+a+ k j=1 y j ·w j = S. The third condition is equivalent to having that t = t 2 −2a. Altogether, this means that

and only if t = M + t and ∀y ∈ {0, 1} k+1 t + k+1 j=1 y j · w j = S. Finally, this gives us the result we are trying to prove, that is,

To conclude the proof of the theorem, notice that r can be constructed in polynomial time with respect to the sizes of u, v and S, so the entire reduction can be computed in polynomial time.

Consider the following well-known decision problem called True Quantified Boolean Formula (TQBF), which is well known to be PSPACE-complete [70] :

TQBF Input:

A quantified Boolean formula ψ = Q 1 x 1 . . . Q n x n ϕ(x 1 , . . . , x n ) in prenex normal form where ϕ(x 1 , . . . , x n ) is a Boolean formula on variables x 1 , . . . , x n , and Q 1 , . . . , Q n are quantifiers (∀ or ∃). Output:

true if ψ is valid, and false otherwise.

Without loss of generality, ϕ can be assumed to be in conjunctive normal form. We will show that TQBF can be reduced to our problem Eval(ITPG, NavL[PC,NOI]) by proceeding in three steps. Let ψ = Q 1 x 1 . . . Q n x n ϕ(x 1 , . . . , x n ). First, we will show that a predicate bit(i, t) (defined below) can be written in our language. Then, by using that predicate, we will show that ϕ can be encoded in our language. Finally, we will show that an expression representing the quantifiers of ψ can be added to the expression encoding ϕ, which yields the result.

We start with a QBF formula ψ as described above to build an input for the problem Eval(ITPG, NavL[PC,NOI]). More precisely, the input ITPG will be C = (Ω, N, E, ρ, λ, ξ, σ) where Ω = [0, 2 n − 1], N = {v}, E = ∅, ρ is an empty function, λ(v) = l, ξ(v) = {[0, 2 n − 1]} and σ is an empty function. In other words, C is an ITPG consisting of only one node existing from time 0 to time 2 n − 1, with no edges or properties. Moreover, we will build an expression r such that (v, 0, v, 0) ∈ r C if and only if ψ is valid, which concludes the reduction. The steps to construct r are shown next.

Step 1: Expressing the predicate bit with an expression in NavL[PC, NOI]. Consider predicate bit(i, t) that tests whether the i-th bit of time t (from right to left when written in its binary representation) is 1. For instance, bit(1, 0) is false, and bit(5, 30) is true, since the first bit of 0 is 0, whereas 30 is 11110 in binary, and its fifth bit is 1. Now, consider the following expression:

Notice that r i is a test. Thus, for a pair of temporal objects (o 1 , t 1 , o 2 , t 2 ) to satisfy r i , (o 1 , t 1 ) must be equal to (o 2 , t 2 ). The expression to satisfy is a path test, so there must be some temporal object (o 3 , t 3 ), such that (o 1 , t 1 , o 3 , t 3 ) ∈ P 2 i , 2 i [0, _] / < 2 i ∧ ¬ < 2 i−1 C . Since the right part is a test as well, we can split the expression into two parts. Firstly, we must have that (o 1 , t 1 , o 3 , t 3 ) ∈ P 2 i , 2 i [0, _] C , which implies that t 3 = t 1 − k * 2 i for some integer k ≥ 0. Secondly, we must have that (o 3 , t 3 ) |= < 2 i ∧ ¬ < 2 i−1 , which implies that 2 i−1 ≤ t 3 < 2 i . This means that by writing t 3 in its binary form, we get 1 as its i-th bit. Together, these two conditions imply that t 1 also has 1 as its i-th bit, when written in its binary form. In consequence, we get that

Besides, we trivially get that:

(o, t, o, t) ∈ ¬r i C if and only if bit(i, t) is false Finally, notice that both r i and ¬r i have linear length with respect to i, which will be important later.

Step 2: Expressing any CNF formula in NavL[PC, NOI]. Assume that

where for every j and k, l j,k is a literal, i.e., either a variable in {x 1 , . . . , x n } or its negation. Then, for every j ∈ {1, . . . , m} and k ∈ {1, . . . , m j }, we define:

We can use these expressions to build a regular expression that tests the satisfiability of our formula ϕ(x 1 , . . . , x n ) by any valuation σ : {x 1 , . . . , x n } → {false, true}. To do this, we will use a time value t ∈ [0, 2 n − 1] to represent a valuation σ t , where σ t (x i ) = true if and only if the i-th bit of t is 1. We can do so by employing the expressions L i,k on tests along with conjunctions (∧) and disjunctions (∨), which are also present in NavL[PC, NOI]: Here again, r ϕ(x1,...,xn) is a test, so if it is satisfied by (o 1 , t 1 , o 2 , t 2 ), then (o 1 , t 1 ) = (o 2 , t 2 ) Furthermore, we show that, because of how the expressions L j,k are defined, we have:

(o, t, o, t) ∈ r ϕ(x1,...,xn) C if and only if σ t (ϕ(x 1 , . . . , x n )) = true

To show the previous assertion, first assume that (o, t, o, t) ∈ r ϕ(x1,...,xn) C . Because of how the expression is defined, for each j ∈ {1, . . . , m} we must have that (o, t, o, t) ∈ mj k=1 L j,k C . Hence, for any arbitrary j ∈ {1, . . . , m}, we immediately get that there must exist k ∈ {1, . . . , m j } such that (o, t, o, t) ∈ L j,k C . If l j,k = x i , then we must also have that (o, t, o, t) ∈ r i C , which, as we already showed, is equivalent to having that bit(i, t) is true, which in turn is equivalent to σ t (x i ) = true. Otherwise, if l j,k = ¬x i , then we must have that (o, t, o, t) ∈ ¬r i C , which, as we also showed, is equivalent to having that bit(i, t) is false, which in turn is equivalent to σ t (x i ) = false. In both cases, we get that σ t (l j,k ) = true, which means that σ t mj k=1 l j,k = true. Since this holds for an arbitrary value j, it holds for the entire conjunction. Hence, σ t (ϕ(x 1 , . . . , x n )) = true. Now, to show that the inverse is also true, assume that there is some t ∈ Ω such that σ t (ϕ(x 1 , . . . , x n )) = true. By definition, this means that for every j ∈ {1, . . . , m}, σ t mj k=1 l j,k = true. In turn, this means that for every j there is some k ∈ {1, . . . , m j } such that σ t (l j,k ) = true. As we saw earlier, this condition is equivalent to having that (o, t, o, t) ∈ L j,k C , hence, for every j ∈ {1, . . . , m}, we also have that (o, t, o, t) ∈ mj k=1 L j,k C . Given that this is true for every j, by definition, it is also true for the conjunction of them. Hence, we get that (o, t, o, t) ∈ r ϕ(x1,...,xn) C . This concludes the proof in Step 2.

Observation: Notice that the previous results already implies NP-hardness and CONP-hardness for the problem. Since every valuation σ : {x 1 , . . . , x n } → {false, true} has a corresponding time point t such that σ = σ t , and a Boolean CNF formula ϕ(x 1 , . . . , x n ) on n variables x 1 , . . . , x n is satisfiable if and only if there exists a valuation σ such that σ (ϕ(x 1 , . . . , x n )) = true, it is also true that ϕ(x 1 , . . . , x n ) is satisfiable if and only if (v, 0, v, 0) ∈ ?(N[0, _] / r ϕ(x1,...,xn) ) C (that is, advance in time to an arbitrary time point t and check the condition that implies that σ t (ϕ(x 1 , . . . , x n )) = true for the temporal object (v, t)). Similarly, ϕ(x 1 , . . . , x n ) is a tautology if and only if (v, 0, v, 0) ∈ ¬?(N[0, _] / ¬r ϕ(x1,...,xn) ) C (there is no path to a time point t such that σ t (ϕ(x 1 , . . . , x n )) = false, hence there is no valuation that makes ϕ to be false). In what follows, we show PSPACE-hardness of the problem.

Step 3: Expressing satisfiability of quantified Boolean formulae with an expression in NavL[PC,NOI]. Consider a quantified Boolean formula in prenex normal form

Since we already have a way to express ϕ(x 1 , . . . , x n ), we only need to express the possible valuations (i.e., time points) generated by the sequence of quantifiers Q 1 , . . . , Q n .

First, assume Q i is the existential quantifier (∃). This means that we can either make x i take valuation true or false to satisfy our formula. Considering time points, this is equivalent to have either 1 or 0 at the i-th bit of the time t at which we are standing. The intuition is that this can easily be expressed by starting at time 0, and then deciding whether to move into a future time point with the expression (N[2 i−1 , 2 i−1 ] + N[0, 0]) to set the i-th bit of the time point. Notice that if there are only expressions of the form N[2 i−1 , 2 i−1 ], and they are only mentioned once (at least, as prefixes of our test expressions) for each i, they will not affect other bits of t. Hence, if s i+1 represents the part of the subformula Q i+1 x i+1 . . . Q n x n ϕ(x 1 , . . . , x n ), then the subformula ∃x i Q i+1 x i+1 . . . Q n x n ϕ(x 1 , . . . , x n ) can be represented by first navigating through time, only affecting the first i − 1 bits, and then testing that the reached temporal object satisfies the following test:

In turn, if Q i is the universal quantifier (∀), we will employ the fact that ∀x ψ(x) is equivalent to ¬∃x ¬ψ(x) for every formula ψ(x) with free variable x. Hence, if s i+1 represents the part of the subformula Q i+1 x i+1 . . . Q n x n ϕ(x 1 , . . . , x n ), then the subformula ∀x i Q i+1 x i+1 . . . Q n x n ϕ(x 1 , . . . , x n ) can be represented by first only navigating through time, only affecting the first i − 1 bits, and then testing whether the reached temporal object satisfies the following test:

Finally, define s n+1 = r ϕ(x1,...,xn) . We claim that, for i ∈ {n + 1, n, . . . , 1} (i.e., 0 to n quantifiers), if t < 2 i−1 , then:

In particular, for n quantifiers, i.e., when i = 1, this result gives us PSPACE-hardness for Eval(ITPG, NavL[PC,NOI]), since we will have that ψ is true if and only if (v, 0, v, 0) ∈ s 1 C , where C and s 1 can be constructed in polynomial time in the size of ψ. We will prove this claim by induction over the number of quantifiers preceding ϕ(x 1 , . . . , x n ).

The base case consists of the formula with no quantified variables, i.e., when i = n + 1. We must show that if t < 2 n , then (v, t, v, t) ∈ r ϕ(x1,...,xn) C if and only if ϕ(σ(x 1 ), . . . , σ(x n )) is true. Notice that ϕ(σ(x 1 ), . . . , σ(x n )) is true is equivalent to having that σ t (ϕ(x 1 , . . . , x n )) is true. Hence, by step 2, this is equivalent to having that (v, t, v, t) ∈ r ϕ(x1,...,xn) C .

For the inductive case, assume that the claim holds for k quantifiers, for some k such that 0 ≤ k ≤ n. This means that for i = n − k + 1, if t < 2 i−1 , then the following condition holds:

We then have to prove that the condition holds for k + 1 quantifiers, i.e., for i − 1 = n − k. That is, if t < 2 i−2 , then we have to show that:

Let t < 2 i−2 , and consider the following cases.

Notice then that by definition of s i−1 , we have that (v, t , v, t ) ∈ s i−1 C if and only if there exists

By definition of s i , the previous condition holds if and only if

To prove the direction (⇒) of (7) assume that (v, t , v, t ) ∈ s i−1 C , which implies that there exists t 1 ∈ {t , t + 2 i−2 } satisfying that (v, t 1 , v, t 1 ) ∈ s i C . Given that t is an integer with i − 2 bits, t 1 comes from either putting 1 or 0 as the (i − 1)-th bit of t , which means that t 1 < 2 i−1 must hold. By induction hypothesis, this implies that Q i x i . . . Q n x n ϕ(σ t1 (x 1 ), . . . , σ t1 (x i−1 ), x i , . . . , x n ) is valid. Since t 1 and t share the same first i − 2 bits, we get that σ t (x j ) = σ t1 (x j ) for j ∈ {1, ..., i − 2}. Therefore, we conclude that

To prove the direction (⇐) of (7) suppose that ∃x i−1 Q i x i . . . Q n x n ϕ(σ t (x 1 ), . . . , σ t (x i−2 ), x i−1 , . . . , x n ) is valid. Then we know that Q i x i . . . Q n x n ϕ(σ t (x 1 ), . . . , σ t (x i−2 ), b, x i , . . . , x n ) is valid for some value b ∈ {true, false}. Define 1 b as 1 if b = true, and as 0 otherwise. Notice then that by taking t 1 = t + 2 i−2 · 1 b , we get that σ t1 (x i−1 ) = b.

Moreover, σ t1 (x j ) = σ t (x j ) for every j ∈ {1, . . . , i − 2} since t < 2 i−2 and t 1 shares all its first i − 2 bits with t . In consequence, this gives us that Q i x i . . . Q n x n ϕ(σ t1 (x 1 ), . . . , σ t1 (x i−1 ), x i , . . . , x n ) is valid. By induction, this means that (v, t 1 , v, t 1 ) ∈ s i C . Since t 1 = t + 2 i−2 · 1 b , we have that either t 1 = t or t 1 = t + 2 i−2 . In any case, we get that

By definition of s i−1 , we conclude that (v, t , v, t ) ∈ s i−1 C , which was to be shown.

This, in turn, is equivalent to the fact that there is no time point

Hence, we know that (v, t , v, t ) ∈ s i−1 C if and only if, for every time point

This condition means that each

Notice then that t < 2 i−2 , so t and t + 2 i−2 are both smaller than 2 i−1 . By induction hypothesis, we get then that both quantified Boolean formulae

. , x n ) are valid. Furthermore, since t is smaller than 2 i−2 , σ t (x i−1 ) = false, and since t + 2 i−2 only differs from t in its (i − 1)-th bit, which is 1, we get that σ t +2 i−2 (x i−1 ) = true and, for every j ∈ {1, . . . , i − 2}, it holds that σ t (x j ) = σ t +2 i−2 (x j ). Hence, both quantified Boolean formulae Q i x i . . . Q n x n ϕ(σ t (x 1 ), . . . , σ t (x i−2 ), false, x i , . . . , x n ) and Q i x i . . . Q n x n ϕ(σ t (x 1 ), . . . , σ t (x i−2 ), true, x i , . . . , x n ) are valid. Therefore, we conclude that the quantified Boolean formula ∀x i−1 Q i x i . . . Q n x n ϕ(σ t (x 1 ), . . . , σ t (x i−2 ), x i−1 , x i , . . . , x n ) is valid. To show the direction (⇐) of (7) suppose that ∀x i−1 Q i x i . . . Q n x n ϕ(σ t (x 1 ), . . . , σ t (x i−2 ), x i−1 , x i , . . . , x n ) is valid. Then we get that both quantified Boolean formulae Q i x i . . . Q n x n ϕ(σ t (x 1 ), . . . , σ t (x i−2 ), false, x i , . . . , x n ) and Q i x i . . . Q n x n ϕ(σ t (x 1 ), . . . , σ t (x i−2 ), true, x i , . . . , x n ) are valid. As before, since t is smaller than 2 i−2 , σ t (x i−1 ) = false, and since t + 2 i−2 only differs from t in its (i − 1)-th bit, which is 1, we get that σ t +2 i−2 (x i−1 ) = true and for every j ∈ {1, . . . , i − 2}, it holds that σ t (x j ) = σ t +2 i−2 (x j ). This allows us to conclude that both Q i x i . . . Q n x n ϕ(σ t (x 1 ), . . . , σ t (x i−1 ), x i , . . . , x n ) and Q i x i . . . Q n x n ϕ(σ t +2 i−2 (x 1 ), . . . , σ t +2 i−1 (x i−1 ), x i , . . . , x n ) are valid. Finally, by induction hypothesis, this implies that (v, t , v, t ) ∈ s i C and (v, t + 2 i−2 , v, t + 2 i−2 ) ∈ s i C , which, as shown before, holds if and only if (v, t , v, t ) ∈ s i−1 C , which concludes the proof for this case. As we mentioned, all this together implies that Eval(ITPG, NavL[PC,NOI]) is PSPACE-hard, since ITPG C, expression s 1 and tuple (v, 0, v, 0) can be constructed in polynomial time in the size of ψ, and the problem of determining whether ψ is valid can be reduced to the problem of verifying whether (v, 0, v, 0) ∈ s 1 C .

Thus, it only remains to show that Eval(ITPG, NavL[PC,NOI]) is in PSPACE. To do this, we will provide an algorithm in PSPACE that, given an ITPG C, an expression r in NavL[PC,NOI], and a tuple (o, t, o , t ) ∈ PTO(C), computes whether (o, t, o , t ) ∈ r C . More precisely, Algorithm TUPLEEVALSOLVE (C, r, (o 1 , t 1 , o 2 , t 2 )) is defined as follows.

Next we show that for every ITPG C = (Ω, N, E, ρ, λ, ξ, σ), expression r in NavL[PC, NOI] and tuple (o 1 , t 1 , o 2 , t 2 ) ∈ PTO(C), it holds that (o 1 , t 1 , o 2 , t 2 ) ∈ r C if and only if TUPLEEVALSOLVE (C, r, (o 1 , t 1 , o 2 , t 2 )) returns true. Besides, we will prove that TUPLEEVALSOLVE works in polynomial space in the size of the input.

Notice firstly that the algorithm is recursive, and that the depth of the recursion is polynomial, since at every step on which the algorithm is called, the size of the path expression strictly decreases. There is one exception, that happens when r is of the form path[n, _]. Notice that here this expression is treated as if it was path[n, m], where m = n + |Ω| · |N ∪ E| (a term with polynomial size with respect to the input). Thus, the whole expression r can be thought as an equivalent expression r where all terms of the form path[n, _] are replaced with similar ones, in a manner that makes the whole input remain polynomial to the original. Although it might seem that path[n, m] could reach an exponential number of recursive calls, notice that this expression is always parsed as two expressions, path[n, n] and path[0, m − n], and then each of those is solved in a way similar to that of exponentiation by squaring, which allows to always get rid of the numerical occurrence indicator [n, m] after at most O(log m) recursive calls, a number that is polynomial in the size of the input. Hence the recursion tree has polynomial height.

Secondly, assume that conjunctions (∧) and disjunctions (∨) are computed from left to right, i.e., A(x) ∧ A(y) first computes A(x) and then computes A(y). Hence, at the most, we will need to have in memory as many calls to the algorithm as the recursion tree height. This number is polynomial, and since every non-recursive step is either a non-deterministic guess or clearly in polynomial time in the size of the input, we get that the whole algorithm gives an answer in PSPACE.

Now, let C = (Ω, N, E, ρ, λ, ξ, σ) be an ITPG, let r be an expression in NavL[PC, NOI] and let (o 1 , t 1 , o 2 , t 2 ) ∈ PTO(C) be a tuple concatenating two temporal objects. First suppose that (o 1 , t 1 , o 2 , t 2 ) ∈ r C . We will show, by induction on the recursion level of the recursion tree, that there exists an execution of algorithm TUPLEEVALSOLVE (C, r, (o 1 , t 1 , o 2 , t 2 )) that returns true. Notice that the base case is given for those cases where there is no recursion. The base test cases, i.e., when r is equal to either Node, Edge, , p → v, < k, or ∃, are easily checked, since all the algorithm does is checking their definitions over the temporal object (o 1 , t 1 ) after checking that it is equal to the temporal object (o 2 , t 2 ). Notice that for property-value checking and existence checking, the default value is false, returned at the end of the algorithm. The base navigation operators N, P, F and B are also easily checked by their definitions. For time navigation, we check that the objects are the same and that their associated times are consecutive, whereas for spatial navigation, we check that the times are equal, and that the respective objects are consecutive, by looking at the functions tgt and src, as defined by the operators.

As for the recursive cases, assume that the property holds up to recursion level n and we want to prove that it holds at recursion level n−1 (one level higher in the recursion tree). Firstly, a path expression matching any of the regular expressions (test∧test), (test ∨ test) or (¬test) can also be checked quite straightforwardly by definition, and since the flow is deterministic, we will omit further formal proofs. Secondly, if the path expression r is of the form (?r ), then we know that (o 1 , t 1 , o 2 , t 2 ) ∈ r C if and only (o 1 , t 1 ) = (o 2 , t 2 ) and there exists a temporal object (o , t ) ∈ PTO(C) such that (o 1 , t 1 , o , t ) ∈ r C . In such case, the algorithm iterates one by one over the possible temporal objects to find one that satisfies the condition. If such temporal object exists, the call to TUPLEEVALSOLVE returns true, since we then know that the tuple (o 1 , t 1 , o , t ) satisfies r if and only if there exist an execution of the call that returns true. Conversely, if no such temporal object exists, all recursive calls to TUPLEEVALSOLVE will return false by induction hypothesis. In this case, the algorithm will finish the loop without returning and it will then reach the last line (in part II), returning false.

As for regular path expressions r of the form (r 1 + r 2 ) where r 1 and r 2 are also regular path expressions, we know that by definition (r 1 + r 2 ) C = r 1 C ∪ r 2 C . Hence, (o 1 , t 1 , o 2 , t 2 ) ∈ r C if and only if (o 1 , t 1 , o 2 , t 2 ) ∈ r 1 C or (o 1 , t 1 , o 2 , t 2 ) ∈ r 2 C . By induction hypothesis, this means that (o 1 , t 1 , o 2 , t 2 ) ∈ r C if and only if either TUPLEEVALSOLVE(C, r 1 , (o 1 , t 1 , o 2 , t 2 )) returns true or TUPLEEVALSOLVE(C, r 2 , (o 1 , t 1 , o 2 , t 2 )) returns true. As the algorithm returns the disjunction of this two results, we get that (o 1 , t 1 , o 2 , t 2 ) ∈ r C if and only if TUPLEEVALSOLVE(C, r, (o 1 , t 1 , o 2 , t 2 )) returns true For regular path expressions r of the form (r 1 / r 2 ) where r 1 and r 2 are also TRPQs, we know that if (o 1 , t 1 , o 2 , t 2 ) ∈ r C , then there must exist a temporal object (o , t ) ∈ PTO(C) such that (o 1 , t 1 , o , t ) ∈ r 1 C and (o , t , o 2 , t 2 ) ∈ r 2 C . Thus, by iterating over all temporal object (o , t ), when we reach that exact temporal object, both TUPLEEVALSOLVE(C, r 1 , (o 1 , t 1 , o , t )) and TUPLEEVALSOLVE(C, r 2 , (o , t , o 2 , t 2 )) will be true, so the algorithm will return true and true = true. On the other hand, if (o 1 , t 1 , o 2 , t 2 ) / ∈ r C , then no matter what temporal object (o , t ) is being considered, we will either have that (o 1 , t 1 , o , t ) / ∈ r 1 C or (o , t , o 2 , t 2 ) / ∈ r 2 C . By induction hypothesis, this means that, for every execution, either the first call will be false or the second will, so the condition that makes the algorithm return true will not be met. Hence, the last line is reached and the algorithm returns false.

For expressions r matching regular expressions with numerical occurrence indicators of the form r [n, m], recall that, by definition, r [n, m] C = m k=n r k C . This implies that (o 1 , t 1 , o 2 , t 2 ) ∈ r [n, m] C if and only if there exists an integer k ∈ [n, m] such that (o 1 , t 1 , o 2 , t 2 ) ∈ r k C . We split this case into three cases. 1) When n = m, then (o 1 , t 1 , o 2 , t 2 ) ∈ r [n, m] C if and only if (o 1 , t 1 , o 2 , t 2 ) ∈ r n C . Recall that the concatenation operator is associative, and that r n C = r / r n−1 C = r / . . . / r C (n repetitions). Hence, if we define l = n/2 , then if n is even, r n C = (r l / r l ) C , whereas if n is odd, r n C = (r l / r / r l ) C . In the first case then, by the definition of concatenation, (o 1 , t 1 , o 2 , t 2 ) ∈ (r l / r l ) C if and only if there exists a temporal object (o , t ) such that (o 1 , t 1 , o , t ) ∈ r l C and (o , t , o 2 , t 2 ) ∈ r l C . By induction hypothesis this is equivalent to having that there exists a temporal object (o , t ) such that both TUPLEEVALSOLVE(C, r [l, l], (o 1 , t 1 , o , t )) and TUPLEEVALSOLVE(C, r [l, l], (o , t , o 2 , t 2 )) return true, since r l C = r [l, l] C . Since the algorithm iterates over all possible temporal objects to check this condition, if such temporal object exists, it will return true, if it does not, then it will reach the last line and return false.

The second case is similar, except that now we need two temporal objects as there are two concatenations, which means that (o 1 , t 1 , o 2 , t 2 ) ∈ (r l / r / r l ) C if and only if there exist two temporal objects (o , t ) and (o , t ) such that (o 1 , t 1 , o , t ) ∈ r l C , (o , t , o , t ) ∈ r C and (o , t , o 2 , t 2 ) ∈ r l C . By induction hypothesis, and recalling again that r l C = r [l, l] C , that means that (o 1 , t 1 , o 2 , t 2 ) ∈ r C if and only if there exist two temporal objects (o , t ) and (o , t ) such that the calls TUPLEEVALSOLVE(C, r [l, l], (o 1 , t 1 , o , t )), TUPLEEVALSOLVE(C, r , (o , t , o , t )) and TUPLEEVALSOLVE(C, r [l, l], (o , t , o 2 , t 2 )) return true. Again, since the algorithm iterates over all possible pairs of temporal objects (o , t ) and (o , t ) to check this condition, if such temporal objects exist, it will return true, if it does not, then it will reach the last line and return false. Since this recursion must stop at some point, the base case n = 1 is included, in which case r is r [1, 1] , which is equivalent to r , since in such case we know that (o 1 , o 2 , t 1 , t 2 ) ∈ r C if and only if (o 1 , t 1 , o 2 , t 2 ) ∈ r C . By hypothesis induction, this happens if and only if TUPLEEVALSOLVE(C, r , (o 1 , t 1 , o 2 , t 2 )) returns true, which is why the algorithm returns that result. Finally, the recursion works for n ≥ 2. The case when n = 1 is covered as a base case, so it only remains to look for the case when n = 0. For such case, recall that r 0 = (∃ ∨ ¬∃), i.e., a test that is a tautology. Hence,

, which is what the algorithm tests. 2) When n = 0, the base case will be slightly different. We can assume that n = m since the case where n = m was already covered. Hence, the base case only needs to consider the value m = 1. In this case, we have that r C = r [0, 1] C = r 0 C ∪ r 1 C . As we already discussed, checking whether (o 1 , t 1 , o 2 , t 2 ) ∈ r 0 C comes down to checking whether (o 1 , t 1 ) = (o 2 , t 2 ), and also r 1 C = r C . By induction, we know that (o 1 , t 1 , o 2 , t 2 ) ∈ r C if and only if TUPLEEVALSOLVE(C, r , (o 1 , t 1 , o 2 , t 2 )) returns true. Since the algorithm returns true if either of these conditions hold, this case is correctly covered. As for the recursive case, it is very similar to the previous one. If we define again l = m/2 , we can notice that, if m is even, then r For the inverse inclusion, notice that if (o 1 , t 1 , o 2 , t 2 ) ∈ (r [0, l] / r [0, l]) C , then there must exist a temporal object (o , t ) and two integers j and k in [0, l] such that (o 1 , t 1 , o , t ) ∈ r j C and (o , t , o 2 , t 2 ) ∈ r k C , which implies that (o 1 , t 1 , o 2 , t 2 ) ∈ (r j / r k ) C . Again, since concatenation is associative, (r i / r j ) C = r i+j C , and because i + j is at most n, (o 1 , t 1 , o 2 , t 2 ) ∈ n i=0 r i C , i.e., (o 1 , t 1 , o 2 , t 2 ) ∈ r C . As a result, we get that (r [0, l] / r [0, l]) C ⊆ n i=0 (r ji / r ki ) C . Combining these two inclusions with the equality r C = n i=0 (r ji / r ki ) C , we get that r C = (r [0, l] / r [0, l]) C . To show the part where n is odd, notice that (o 1 , t 1 , o 2 , t 2 ) ∈ r C if and only if there exists an integer i ∈ [0, m] such that (o 1 , t 1 , o 2 , t 2 ) ∈ r i C . Notice then that i can be written as the sum of three integers, k i := i/2 (∈ [0, l]), j i =: i/2 (∈ [0, l]) and s i := (imod2) (∈ [0, 1]). Since the concatenation operator is associative, this means that r i C = (r ji / r si / r ki ) C , which in turn implies that r C = m i=0 r i C = m i=0 (r ji / r si / r ki ) C . Then, by definition of the concatenation operator, (o 1 , t 1 , o 2 , t 2 ) ∈ m i=0 (r ji / r s1 / r ki ) C if and only if there exist two temporal objects (o , t ) and (o , t ) such that (o 1 , t 1 , o , t ) ∈ r ji C , (o , t , o , t ) ∈ r si C and (o , t , o 2 , t 2 ) ∈ r ki C . Since both j i and k i are bounded by l, r ji C ⊆ l i=0 r i C , and r ki C ⊆ l i=0 r i C , so any tuple (o, t, o , t ) in r ki C or r ji C will also be in l i=0 r i C = r [0, l] C . Similarly, any tuple in r s1 C will also be in For the inverse inclusion, notice that if (o 1 , t 1 , o 2 , t 2 ) ∈ (r [0, l] / r [0, 1] / r [0, l]) C , then there must exist two temporal objects (o , t ) and (o , t ), and three integers j ∈ [0, l], s ∈ [0, 1] and k ∈ [0, l] such that (o 1 , t 1 , o , t ) ∈ r j C , (o , t ) ∈ r s C and (o , t , o 2 , t 2 ) ∈ r k C , which implies that (o 1 , t 1 , o 2 , t 2 ) ∈ (r j / r s / r k ) C . Again, since concatenation is associative, (r i / r s / r j ) C = r i+s+j C , and because i + s + j is at most n, (o 1 , t 1 , o 2 , t 2 ) ∈ n i=0 r i C , i.e., (o 1 , t 1 , o 2 , t 2 ) ∈ r C . As a result, we get that (r [0, l] / r [0, 1] / r [0, l]) C ⊆ n i=0 (r ji / r ki ) C . As before, combining these two inclusions with the equality r C = n i=0 (r ji / r li / r ki ) C , we get that r C = (r ). Since the algorithm iterates over all temporal objects (o , t ) and checks if these conditions are met to return true, it will return true if (o 1 , t 1 , o 2 , t 2 ) ∈ r C and it will reach the last line and return false otherwise. Finally, for an expression r matching a regular expressions with numerical occurrence indicators of the form r [n, _], recall that we showed in Section C-A that r [n, _] C = r [n, m] C , where the expression m := n + (|Ω| + |V ∪ E|) 2 is polynomial in the size of the original input. By induction, (o 1 , t 1 , o 2 , t 2 ) ∈ r [n, m] C if and only if TUPLEEVALSOLVE(C, r [n, m], (o 1 , t 1 , o 2 , t 2 )) returns true, which is what the algorithm returns for this case. Altogether, we proved that TUPLEEVALSOLVE works in polynomial space, and that TUPLEEVALSOLVE(C, r, (o 1 , t 1 , o 2 , t 2 )) returns true if and only if (o 1 , t 1 , o 2 , t 2 ) ∈ r C . This concludes the proof of the theorem.

A natural question is whether there is a restriction on NavL[NOI] that can reduce the complexity of the evaluation problem but is still expressive enough to represent some useful queries. At this point, a restriction used in the study of XPath comes to the rescue [52] . In what follows, we show that the complexity of the evaluation problem is lower if numerical occurrence indicators are only allowed in the axes. Proof. To show NP-hardness, consider the following decision problem called Subset Sum (SUBSET-SUM), which is known to be NP-complete [71] :

We have that Eval(ITPG, NavL[PC]) can be solved in polynomial time by Theorem V.1, and we know that Eval(ITPG, NavL[PC]) is NP-complete by Theorem D.1. A natural question then is whether the complexity remains the same if these functionalities are combined. Notice that Eval(ITPG, NavL[PC,NOI]) is PSPACE-complete, so a positive answer to this question means a significant decrease in the complexity of the query evaluation problem. Unfortunately, we show that the complexity of the entire language does not decrease by restricting numerical occurrence indicators to occur only in the axes.

Theorem D.2. Eval(ITPG, NavL[PC,ANOI]) is PSPACE-complete.

Proof. Notice that every expression in NavL[PC, ANOI] is also an expression in NavL[PC, NOI], so PSPACE-membership follows immediately from Theorem V.1. Hence, we only need to prove PSPACE-hardness for NavL[PC, ANOI].

To show this, we replace test expressions r i in the proof in Section C-D by an expression in NavL[PC, ANOI] that will be denoted by q i . Expression q i is defined in such a way that, for every time t, it holds that (v, t, v, t) ∈ q i C if and only if (v, t, v, t) ∈ r i C , i.e., if and only if bit(i, t) is true, where bit(i, t) holds if the i-th bit of time t (from right to left when written in its binary representation) is 1. More precisely, expression q i is defined as follow: Notice that the length of the representation 2 k is k, so the whole expression q i has length O(n 2 ), which is polynomial with respect to the size of ψ. Also, notice that as before, we only need a polynomial number of these expressions for the reduction, and no further nesting of numerical occurrence indicators is required for the proof. Hence, we only need to prove that (v, t, v, t) ∈ q i C if and only if bit(i, t) is true. Recall that for this reduction, C is an ITPG consisting of only one node v, existing from time 0 to time 2 n − 1, with no edges or properties, so any temporal object considered will be of the form (v, t).

First, notice that q i is a path test, so (v, t, v, t) ∈ q i C if and only if there exists a time point t such that 

We now prove that (v, t, v, t) ∈ q i C if and only if bit(i, t) is true. To show direction (⇐), suppose that bit(i, t) is true. Notice then that (v, t 1 , v, t 2 ) ∈ (P[0, 0] + P[2 k , 2 k ]) C , if and only if t 2 = t 1 or t 2 = t 1 − 2 k . In particular, if the (k + 1)-th bit of t 1 is 1, then t 2 = t 1 − 2 k has a binary representation that is equal to that of t 1 except on the (k + 1)-th bit, and (v, t 1 , v, t 2 ) ∈ (P[0, 0] + P[2 k , 2 k ]) C . Similarly, if the (k + 1)-th bit of t 1 is 0, then t 2 = t 1 has a binary representation that is equal to that of t 1 , and also (v, t 1 , v, t 2 ) ∈ (P[0, 0] + P[2 k , 2 k ]) C . As in Section C-D, given b ∈ {true, false}, let 1 b be 1 if b = true, and be 0 otherwise. Moreover, define the sequence of time points t n+1 , . . . , t i such that t n+1 = t and t k = t k+1 − 1 bit(k+1,t) · 2 k for k ∈ {i, . . . , n}. Then for every k ∈ {i, . . . , n}, it holds that (v, t k , v, t k+1 ) ∈ (P[0, 0] + P[2 k , 2 k ]) C , and in particular, t = t − n k=i 1 bit(k+1,t) · 2 k satisfies (8) . Therefore, if the i-th bit of t is 1, then the i-th bit of t will be 1 as well. Hence,

given bit(i, t) is true, we conclude that (v, t, v, t) ∈ q i C , since for t = t − n k=i 1 bit(k+1,t) · 2 k , equation (8) holds and

(v, t ) |= (< 2 i ∧ ¬ < 2 i−1 ).

To show direction (⇒), suppose that (v, t, v, t) ∈ q i C . Then there exists a time point t such that (8) holds, which only holds if there exists a sequence of time points t n+1 , . . . , t i where t n+1 = t and either t k = t k+1 or t k = t k+1 − 2 k for k ∈ {i, . . . , n}, and t i = t . Notice that for such values for k, 2 k is a multiple of 2 i , so t = t + d · 2 i for some integer d. We conclude that the i-th bit of t is equal to 1 if and only if the i-th bit of t is equal to 1. Moreover, (v, t ) |= (< 2 i ∧ ¬ < 2 i−1 ), so the i-th bit of t is indeed equal to 1, so bit(i, t) must be equal to true.

From the previous paragraphs, we conclude that (v, t, v, t) ∈ q i C if and only if bit(i, t) is true. Hence, by replacing r i with q i in the proof of Section C-D, we deduce that NavL[PC, ANOI] is also PSPACE-hard, which was to be shown.

In section VII-A we discussed the impact of TGraphs size on query execution, and pointed out that the trends presented in Figure 2 can be explained by the size of output. To study this, we computed the increase in output size for graphs G2-G6 (with between 2,000 and 10,000 nodes, as summarized in Table I ) relative to the size of the output for G1 (with 1,000 nodes), for each query. Figures 7 (a) and (b) show this result. In these figures, the x-axis shows the number of nodes in each graph, and the y-axis shows the relative size of output bindings table, in comparison to the output of the same query over G1. Similarly to Figure 2 , it can be observed that the output size for all queries except Q5, Q9, Q10, Q11, and Q12 follows a linear trend. For Q5, Q9, Q10, Q11 and Q12, the output size increases quadratically. Figures 7 (c) gives another presentation of these results. Here, in addition to computing the output size relative to G1 for each query (shown on the y-axis), we also computed the execution time relative to G1 (shown on the x-axis). This plot show that relative query execution time and relative increase in output size are highly correlated for all queries, and for the majority of our queries we have perfect correlation. (a) (b) (c) Fig. 7 . Relationship between input size, output size, and query execution time for all queries. Execution time and output size are computed for graphs G2-G6 (with between 2,000 and 10,000 nodes) in Table I , in proportion to these quantities for graph G1 (with 1,000 nodes).

Foundations of modern query languages for graph databases

G-core: A core for future graph query languages

Modeling blog dynamics

The dynamics of viral marketing

Microscopic evolution of social networks

Nonparametric link prediction in dynamic networks

An event-based framework for characterizing the evolutionary behavior of interaction graphs

Mechanistic insights into metabolic disturbance during type-2 diabetes and obesity using qualitative networks

A gene-coexpression network for global discovery of conserved genetic modules

Discovering correlated spatiotemporal changes in evolving graphs

Web graph similarity for anomaly detection

Chronograph: Enabling temporal graph traversals for efficient information diffusion analysis over time

A model and query language for temporal graph databases

Nepal: a path query language for communication networks

The G* graph database: efficiently managing large distributed dynamic graphs

Temporal graph algebra

Temporal Statement Modifiers

Time Domain

Encyclopedia of Database Systems

On an algebra for historical relational databases: Two views

Unifying temporal data models via a conceptual model

A taxonomy of time databases

Point Versus Intervalbased Temporal Data Models

Temporal alignment

Comparison of access methods for time-evolving data

Temporal features in SQL: 2011

Pattern mining in frequent dynamic subgraphs

Towards Efficient Query Processing on Massive Time-Evolving Graphs

Building a reference combinatorial model for MANETs

A query based approach for mining evolving graphs

Efficient snapshot retrieval over historical graph data

Storing and Analyzing Historical Graph Data at Scale

Mining Periodic Behavior in Dynamic Social Networks

On Querying Historical Evolving Graph Sequences

Timereach: Historical reachability queries on evolving graphs

Localizing anomalous changes in time-evolving graphs

Link analysis using time series of web graphs

Path problems in temporal graphs

Efficient algorithms for temporal path computation

Reachability and time-based path queries in temporal graphs

Cypher: An evolving query language for property graphs

Temporal alignment

Coalescing in temporal databases

Pgql: A property graph query language

Association of ISO Graph Query Language Proponents

Querying graphs with data

The complexity of relational query languages (extended abstract)

Regular path queries with constraints

Rewriting of regular expressions and regular path queries

Querying graph databases

XML path language (XPath) version 1.0," W3C Recommendation

Conditional XPath

Efficient algorithms for processing XPath queries

XML path language (XPath) 3.1," W3C Recommendation

rust-itertools/itertools

Rayon-rs/rayon: Rayon: A data parallelism library for rust

Apache spark: a unified engine for big data processing

Apache flink: Stream and batch processing in a single engine

Naiad: a timely dataflow system

Differential dataflow

Slurm: Simple linux utility for resource management

A person-to-person and person-toplace covid-19 contact tracing system based on ogc indoorgml

Powergraph: Distributed graphparallel computation on natural graphs

Zooming out on an evolving graph

Neo4j: What is a graph database

Maintaining Knowledge about Temporal Intervals

Temporal Coalescing

On the complexity of pattern matching for highly compressed twodimensional texts

Word problems requiring exponential time(preliminary report)

Computers and Intractability; A Guide to the Theory of NP-Completeness

SUBSET-SUM Input:A finite set of integers A N, and a positive integer S ∈ N Output:true if there exists a subset A ⊆ A of A such that a∈A a = S.Given a set A N, and an integer S ∈ N, the goal is to provide a polynomial-time algorithm that returns an ITPG C, a tuple (o 1 , t 1 , o 2 , t 2 ), and an expression r in NavL [ANOI] such that (o 1 , t 1 , o 2 , t 2 ) ∈ r C if and only if there exists A ⊆ A such that a∈A a = S. More specifically, C will be the ITPG (Ω, N, E, ρ, λ, ξ, σ) where Ω = [0, S], N = {v}, E = ∅, ρ is an empty function, λ(v) = l, ξ(v) = {[0, S]} and σ is an empty function. In other words, C is an ITPG consisting of only one node existing from time 0 to time S, with no edges or properties. The tuple (o 1 , t 1 , o 2 , t 2 ) in our reduction will be given by (v, 0, v, S). Moreover, assuming that A = {a 1 , . . . , a n }, expression r is defined as follows:Notice that ITPG C, expression r in NavL[ANOI] and tuple (v, 0, v, S) can be computed in polynomial time in the sizes of A and S. Besides, it is straightforward to prove that (v, 0, v, S) ∈ r C if and only if there exists A ⊆ A such that a∈A a = S. This concludes the of NP-hardness of Eval(ITPG, NavL[ANOI]).To show that this problem is NP-complete, it only remains to show that the problem is also in NP. We present a nondeterministic algorithm that works in polynomial time, TUPLEEVALSOLVE_ANOI, that, given an ITPG C, an expression r in NavL[ANOI] and a pair of temporal objects (o 1 , t 1 , o 2 , t 2 ), has a run that returns true if and only if (o, t, o , t ) ∈ r C . This procedure is presented in Algorithm 6.: An ITPG C = (Ω, N, E, ρ, λ, ξ, σ), an expression r in NavL[ANOI] and a pair of temporal objects TUPLEEVALSOLVE_ANOI is very similar to TUPLEEVALSOLVE, so we will not discuss in detail what it does. Instead, we give an intuition of what the differences are that allow to return the right answer in non-deterministic polynomial time, instead of polynomial space. First, notice that if r is a test, then the algorithm works by solving basic tests efficiently, and then conjunctions, disjunctions and negations of tests are solved just by using directly the definition of these Boolean connectives.Hence, unlike what happens in the presence of path conditions, where we can have nested expressions with existential conditions and negations of existential conditions, sub-expressions for tests are efficiently solved by TUPLEEVALSOLVE_ANOI. Second, notice that for spatial navigation, we write the problem in terms of the reachability problem for graphs in a number of steps in a set {n, . . . , m}. This problem can be efficiently solved by using exponentiation by squaring on the adjacency matrix. Besides, notice that for spatial navigation expressions in NavL[ANOI], we need to consider as many new objects as there are in V ∪ E since the time is fixed, which is why expressions F[n, _] and B[n, _] are equivalent to F[n, m] and B[n, m], respectively, with m = n + |N ∪ E| = n + |N | + |E|. Finally, polynomial time executions are ensured by the non-deterministic guess for (o , t ) in Line 64, and the fact that the depth of the recursion tree is linear with respect to the size of the input expression r. In particular, we do a single non-deterministic guess in Line 64, instead of an exponential number of attempts (with respect to the size of the representation of Ω) that would be necessary to find the right pair (o , t ) in a deterministic algorithm. Let G = (N , E ) be the graph where: