Weighted Graph-Based Signal Temporal Logic Inference Using Neural Networks

Nasim Baharisangari, Kazuma Hirota, Ruixuan Yan, Agung Julius, Zhe Xu

September 16, 2021

Abstract—Extracting spatial-temporal knowledge from data is useful in many applications. It is important that the obtained knowledge is human-interpretable and amenable to formal analysis. In this paper, we propose a method that trains neural networks to learn spatial-temporal properties in the form of weighted graph-based signal temporal logic (w-GSTL) formulas. For learning w-GSTL formulas, we introduce a flexible w-GSTL formula structure in which the user's preference can be applied in the inferred w-GSTL formulas. In the proposed framework, each neuron of the neural networks corresponds to a subformula in a flexible w-GSTL formula structure. We initially train a neural network to learn the w-GSTL operators and then train a second neural network to learn the parameters in a flexible w-GSTL formula structure. We use a COVID-19 dataset and a rain prediction dataset to evaluate the performance of the proposed framework and algorithms. We compare the performance of the proposed framework with four baseline classification methods: K-nearest neighbors, decision trees, support vector machines, and artificial neural networks. The classification accuracy obtained by the proposed framework is comparable with that of the baseline classification methods.

I. INTRODUCTION

Learning spatial-temporal properties from data is useful in many applications, especially where the data is based on an underlying graph structure. It is preferable that the learned properties are interpretable for humans and amenable to formal analysis [1]. Various logics have been introduced to express and analyze spatial-temporal properties [2]-[4], and temporal logics are one of the major groups. Temporal logics, which are categorized as formal languages [5], are both interpretable and amenable to formal analysis; thus, temporal logics are used to express the temporal and logical properties of systems in a human-interpretable way [6]. In addition, graph-based logic (GL), which is used to express spatial properties, is understandable for humans and preserves the rigorous aspect of formal logics. Besides interpretability and amenability to formal analysis, efficiency and expressiveness [7] are also important when learning spatial-temporal properties from data [8]. One approach to increase the efficiency of the learning task is to integrate neural networks into the process [9], [10]. We can expand the capacity of the learning task to handle more complex spatial-temporal datasets by combining the distinct advantages of temporal logics, graph-based logics, and neural networks. Signal temporal logic (STL) is a temporal logic that deals with real-valued data over a real-time domain [11]. In this paper, we combine STL and GL to introduce graph-based signal temporal logic (GSTL) formulas that express spatial-temporal properties. Furthermore, we assign importance weights to the subformulas of a GSTL formula and name the result weighted graph-based signal temporal logic (w-GSTL), where each importance weight quantifies the importance of a subformula of a w-GSTL formula.
Contributions: In this paper, we propose a methodology that trains neural networks to learn spatial-temporal properties from data in the form of w-GSTL formulas. The contributions of this paper are as follows: (a) we introduce a flexible w-GSTL formula structure that allows the user's preference to be applied in the inferred w-GSTL formula; in this structure, the w-GSTL operators are free to be inferred; (b) we propose a framework and algorithms to learn w-GSTL formulas from data using neural networks. In the proposed framework and algorithms, the neurons of the neural networks represent the quantitative satisfaction of the w-GSTL operators and Boolean connectives. For a given flexible w-GSTL formula structure, we first construct and train a neural network to learn the w-GSTL operators through back-propagation; then, we construct and train another neural network to learn the parameters of the flexible w-GSTL formula structure through back-propagation. We evaluate the performance of the proposed framework and algorithms on real-life examples: predicting COVID-19 lockdown measures in a geographical region of Italy, and rainfall prediction in a geographical region of Australia. The obtained results show that the proposed method achieves classification accuracy comparable with four baseline classification methods, K-nearest neighbors (KNN), decision trees (DT), support vector machines (SVM), and artificial neural networks (ANN), while interpretability is improved through the learned w-GSTL formulas.

Related work: Recently, learning spatial-temporal properties from data has been employed in different applications such as swarm robotics [4]. Different methods have been adopted to carry out this learning task. Many researchers have developed logics to learn spatial-temporal properties from data. For example, Xu et al. [2] introduce graph temporal logic (GTL), and Liu et al. [8] introduce graph-based spatial temporal logic (GSTL). Other researchers propose frameworks that conduct the learning task with neural networks. For instance, Wu et al. [10] develop a convolutional neural network (CNN)-based method named Graph WaveNet, and Ziat et al. [12] introduce the Spatio-Temporal Neural Network (STNN). The approach proposed in this paper benefits from the advantages of both formal logics and neural networks: human-interpretability and efficiency. Moreover, combining temporal logic and neural networks to carry out learning tasks has been gaining attention [13]-[15]. One way to realize this combination is to connect the temporal operators and Boolean connectives to the activation functions of neural networks [15]. Most of the standard algorithms used to conduct logic inference solve a non-convex optimization problem to find the parameters of the formula, where the loss function of a neural network is not differentiable with respect to the parameters at every point. In [13], Yan et al. propose a loss function that addresses this differentiability issue. However, the frameworks proposed in [13]-[15] do not extract spatial-temporal properties from data.

II. PRELIMINARIES

Graph: We denote a graph by $G = (V, E)$, where $V = \{v_1, v_2, \dots, v_{n_N}\}$ is a finite set of nodes, $E = \{e_1, e_2, \dots, e_{n_E}\}$ is a finite set of edges, and $n_N, n_E \in \mathbb{N} = \{1, 2, \dots\}$. We also denote a set of (possibly infinite) node values by $\mathcal{Z}$, where $\mathcal{Z} \subseteq \mathbb{R}$.
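For concreteness, one minimal way to encode such a graph and its neighbor sets in Python is sketched below; the layout and names are illustrative choices, not the toolbox released with the paper:

```python
from typing import Dict, List, Tuple

# Illustrative encoding of a graph G = (V, E): nodes are labels,
# edges are undirected pairs, and neighbor sets are derived from E.
V: List[str] = ["v1", "v2", "v3", "v4"]
E: List[Tuple[str, str]] = [("v1", "v2"), ("v2", "v3"), ("v2", "v4")]

def neighbors(v: str) -> List[str]:
    """Return the set of neighbors of node v induced by the edge set E."""
    return [b if a == v else a for (a, b) in E if v in (a, b)]

# Node values drawn from Z ⊆ R, one value per node; time-varying,
# d-dimensional values are formalized as graph-based trajectories below.
values: Dict[str, float] = {v: 0.0 for v in V}

print(neighbors("v2"))  # ['v1', 'v3', 'v4']
```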
Graph-based trajectory: We define a finite $d$-dimensional graph-based trajectory $g : V \times \mathbb{T} \to \mathcal{Z}^d$ that assigns a node value to each node $v$ at time-step $k \in \mathbb{T} = \{0, 1, 2, \dots, J\}$, where $\mathbb{T}$ is a finite discrete time domain and $J \in \mathbb{N}$. We denote the value of the $d$-th dimension of the graph-based trajectory $g$ at time-step $k$ and node $v$ by $g_d(v, k)$.

III. WEIGHTED GRAPH-BASED SIGNAL TEMPORAL LOGIC

In this section, we introduce weighted graph-based signal temporal logic (w-GSTL) as the weighted extension of graph-based logic (GL), which is modified from the graph temporal logic in [2].

A. Graph-Based Logic

In this subsection, we define the syntax and Boolean semantics of graph-based logic (GL) formulas. In GL, we encode different locations in a graph-structured dataset as nodes in a graph $G(V, E)$, and we use the edges to encode the neighbor connections of a location represented as a node. For GL formulas, we define $\mathcal{N}_v$ to denote the set of neighbors of a node $v$, where $\mathcal{N}$ stands for "neighbor". The number of neighbors of the node $v$ is denoted by $|\mathcal{N}_v|$. We define the syntax of GL formulas as follows:

$\varphi_G := \top \mid \pi \mid \neg\varphi_G \mid \varphi_G \wedge \varphi_G \mid \forall_G \mathcal{N}\varphi_G \mid \exists_G \mathcal{N}\varphi_G$,

where $\top$ stands for the Boolean constant True; $\pi$ is an atomic predicate in the form of an inequality $f(g(v, k)) > 0$; $\neg$ (negation) and $\wedge$ (conjunction) are standard Boolean connectives; $\forall_G$ is a GL operator called the graph-based universal quantifier, and $\forall_G \mathcal{N}\varphi_G$ reads as "all the neighbors of the current node satisfy $\varphi_G$"; $\exists_G$ is a GL operator called the graph-based existential quantifier, and $\exists_G \mathcal{N}\varphi_G$ reads as "there exists at least one neighbor of the current node that satisfies $\varphi_G$".

We define the Boolean semantics of GL formulas as follows: $(g, v, k) \models \pi$ if and only if $f(g(v, k)) > 0$; $(g, v, k) \models \neg\varphi_G$ if and only if $(g, v, k) \not\models \varphi_G$; $(g, v, k) \models \varphi_{G,1} \wedge \varphi_{G,2}$ if and only if $(g, v, k) \models \varphi_{G,1}$ and $(g, v, k) \models \varphi_{G,2}$; $(g, v, k) \models \forall_G \mathcal{N}\varphi_G$ if and only if $(g, \bar{v}, k) \models \varphi_G$ for all $\bar{v} \in \mathcal{N}_v$; and $(g, v, k) \models \exists_G \mathcal{N}\varphi_G$ if and only if $(g, \bar{v}, k) \models \varphi_G$ for at least one $\bar{v} \in \mathcal{N}_v$. The quantitative satisfaction of GL formulas at node $v$ and at time-step $k$ is defined as follows: $r(g, v, \pi, k) = f(g(v, k))$; $r(g, v, \neg\varphi_G, k) = -r(g, v, \varphi_G, k)$; $r(g, v, \varphi_{G,1} \wedge \varphi_{G,2}, k) = \min\big(r(g, v, \varphi_{G,1}, k), r(g, v, \varphi_{G,2}, k)\big)$; $r(g, v, \forall_G \mathcal{N}\varphi_G, k) = \min_{\bar{v} \in \mathcal{N}_v} r(g, \bar{v}, \varphi_G, k)$; and $r(g, v, \exists_G \mathcal{N}\varphi_G, k) = \max_{\bar{v} \in \mathcal{N}_v} r(g, \bar{v}, \varphi_G, k)$.

B. Graph-Based Signal Temporal Logic

The syntax of a GSTL formula $\phi$ is defined recursively as

$\phi := \varphi_G \mid \neg\phi \mid \phi_1 \wedge \phi_2 \mid G_I\phi \mid F_I\phi$,

where $\varphi_G$ is a GL formula; $\neg$ (negation) and $\wedge$ (conjunction) are standard Boolean connectives; $G_I$ is the temporal operator "always"; and $F_I$ is the temporal operator "eventually". The Boolean semantics of GSTL is based on the Boolean semantics of STL [11] and is evaluated using graph-based trajectories. The Boolean semantics of $\varphi_G$ is as described in Subsection III-A.

Example 1. In Figure 1, $V = \{v_1, \dots, v_6\}$, and the graph-based trajectory $g$ satisfies the GL formula $\exists_G \mathcal{N}(g > 2)$ only at node $v_4$. For the time interval $I = [1, 3]$, if the node value of node $v_6$ stays greater than 2 in the time interval $k + I$, then the GSTL formula $G_{[1,3]}\exists_G \mathcal{N}(g > 2)$ is satisfied by the graph-based trajectory $g$ at node $v_4$ at time-step $k$.

C. Weighted GSTL (w-GSTL)

An extension of STL is weighted STL (wSTL), in which we assign a weight to each subformula of a wSTL formula based on its importance [16], [13]. We refer to these weights as importance weights. In this paper, we extend wSTL to weighted GSTL (w-GSTL). In w-GSTL, in addition to defining importance weights for the subformulas, we define importance weights for both the temporal operators and the GL operators. In other words, we assign an importance weight to each time-step $k \in I$, and we assign an importance weight to each neighbor node $\bar{v} \in \mathcal{N}_v$ of a node $v$. We define the syntax of w-GSTL formulas as follows:

$\tilde{\phi} := {}^{w}\varphi_G \mid \neg\tilde{\phi} \mid {}^{w_1}\tilde{\phi}_1 \wedge {}^{w_2}\tilde{\phi}_2 \mid {}^{w_1}\tilde{\phi}_1 \vee {}^{w_2}\tilde{\phi}_2 \mid G^{\Omega}_I\tilde{\phi} \mid F^{\Omega}_I\tilde{\phi}$,

where $w_1$ and $w_2$ are positive importance weights on $\tilde{\phi}_1$ and $\tilde{\phi}_2$, respectively; $\Omega = [w_{k_1}, w_{k_1+1}, \dots, w_{k_2}]^T \in \mathbb{R}^{k_2 - k_1 + 1}$ assigns a positive weight $w_k$ to each $k \in [k_1, k_2]$ in the temporal operators; and $W = [w_1, w_2, \dots, w_{|\mathcal{N}_v|}]^T \in \mathbb{R}^{|\mathcal{N}_v|}$ assigns a positive weight to each $\bar{v} \in \mathcal{N}_v$. In the syntax of w-GSTL formulas, ${}^{w}\varphi_G$ is defined as

${}^{w}\varphi_G := \pi \mid \neg{}^{w}\varphi_G \mid {}^{w_1}\varphi_{G,1} \wedge {}^{w_2}\varphi_{G,2} \mid \forall^{W}_G \mathcal{N}\varphi_G \mid \exists^{W}_G \mathcal{N}\varphi_G$.
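Before adding weights, it helps to see the unweighted min/max semantics in code. The following sketch evaluates the quantitative satisfaction of the GL operators and the GSTL temporal operators; the function names and the dictionary-based trajectory layout are illustrative choices:

```python
from typing import Callable, Dict, List
import numpy as np

# g[v] is a (J+1, d) array: the graph-based trajectory restricted to node v.
Trajectory = Dict[str, np.ndarray]

def r_pi(g: Trajectory, v: str, k: int, f: Callable[[np.ndarray], float]) -> float:
    """Quantitative satisfaction of an atomic predicate f(g(v, k)) > 0."""
    return f(g[v][k])

def r_forall(g: Trajectory, v: str, k: int, f, nbrs: Callable[[str], List[str]]) -> float:
    """forall_G N pi: minimum over the neighbors of v."""
    return min(r_pi(g, u, k, f) for u in nbrs(v))

def r_exists(g: Trajectory, v: str, k: int, f, nbrs) -> float:
    """exists_G N pi: maximum over the neighbors of v."""
    return max(r_pi(g, u, k, f) for u in nbrs(v))

def r_always(r_sub: Callable[[int], float], k: int, I) -> float:
    """G_I phi at time-step k: minimum of the subformula robustness over k + I."""
    k1, k2 = I
    return min(r_sub(k + t) for t in range(k1, k2 + 1))

def r_eventually(r_sub: Callable[[int], float], k: int, I) -> float:
    """F_I phi at time-step k: maximum of the subformula robustness over k + I."""
    k1, k2 = I
    return max(r_sub(k + t) for t in range(k1, k2 + 1))

# Example 1 revisited: G_[1,3] exists_G N (g > 2) at node v4 and time k = 0
# holds iff the robustness below is positive (g and neighbors as in the
# earlier graph sketch):
# rob = r_always(lambda t: r_exists(g, "v4", t, lambda x: float(x[0]) - 2, neighbors), 0, (1, 3))
```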
IV. PROBLEM FORMULATION

In this section, we formalize the problem of classifying graph-based trajectories by inferring w-GSTL formulas using neural networks. We denote a set of labeled graph-based trajectories by $\mathcal{D} = \{(g^i, l^i)\}_{i=1}^{N_D}$, where $l^i \in \{+1, -1\}$ is the label of the graph-based trajectory $g^i$. We assume that the set $\mathcal{D}$ is composed of two subsets: a positive subset $\mathcal{D}_P = \{g^i \mid l^i = +1\}$, which contains the graph-based trajectories representing the desired behavior, and a negative subset $\mathcal{D}_N = \{g^i \mid l^i = -1\}$, which contains the graph-based trajectories representing the undesired behavior. For the cardinalities of the defined sets, we have $|\mathcal{D}_P| = \tilde{p}$, $|\mathcal{D}_N| = \tilde{n}$, and $N_D = \tilde{p} + \tilde{n}$. Inspired by [17], we define the following.

Definition 1. We define a w-GSTL formula structure, denoted by $\mathcal{F}$, as a w-GSTL formula in which the importance weights of the subformulas, the variables of the atomic predicates, and the importance weights of the GL operators and the temporal operators are replaced by free parameters. In this structure, we assume that we always have at least one temporal operator and one GL operator.

Definition 2. We define a flexible w-GSTL formula structure $\mathcal{F}_f$ as a flexible extension of the w-GSTL formula structure such that the types of the temporal operators and the types of the GL operators are to be inferred from data, but the types of the Boolean connectives in the structure are fixed. In this structure, we represent the set of GL operators as $P_{GL} = \{\forall_G, \exists_G\}$, from which the proper operator is to be inferred from data. Similarly, we represent the set of temporal operators as $P_{temporal} = \{F^{\Omega}_I, G^{\Omega}_I\}$.

Example 2. In the flexible w-GSTL formula structure ${}^{w_1}(\mathcal{T}^{\Omega_1}_{I_1}(\mathcal{G}^{W_1}\mathcal{N}\pi_1)) \vee {}^{w_2}(\mathcal{T}^{\Omega_2}_{I_2}(\mathcal{G}^{W_2}\mathcal{N}\pi_2))$, with $\mathcal{T}^{\Omega_1}_{I_1}, \mathcal{T}^{\Omega_2}_{I_2} \in P_{temporal}$ and $\mathcal{G}^{W_1}, \mathcal{G}^{W_2} \in P_{GL}$, the types of the temporal operators and the GL operators, the importance weights of the subformulas and the variables of the atomic predicates $\pi_1$ and $\pi_2$, and the importance weights of the GL operators and the temporal operators are to be inferred from data, but the Boolean connective $\vee$ is fixed.

After determining the proper w-GSTL operators in a given $\mathcal{F}_f$, we obtain a w-GSTL formula $\tilde{\phi}$ that is consistent with $\mathcal{F}_f$. In order to state the problem, we define the classification accuracy, denoted by $AC$, as $AC = \frac{l_{predicted}}{N_D} \times 100$, where $l_{predicted} \in \mathbb{N}$ is the number of correctly classified graph-based trajectories $g^i$.

Problem 1. Given a set of labeled graph-based trajectories $\mathcal{D}$ and a flexible w-GSTL formula structure $\mathcal{F}_f$, infer a w-GSTL formula $\tilde{\phi}$ (i.e., select the w-GSTL operators and compute the parameters of that structure) to classify $\mathcal{D}$ such that the classification accuracy $AC$ is maximized.

In order to solve Problem 1, we introduce w-GSTL neural networks (w-GSTL-NN), which combine the characteristics of w-GSTL and neural networks. In the first step, we construct and train a w-GSTL-NN to learn the proper w-GSTL operators in the flexible w-GSTL formula structure $\mathcal{F}_f$. In the second step, we construct and train another w-GSTL-NN to learn the parameters in $\mathcal{F}_f$. In w-GSTL-NN, we combine the activation functions of a neural network with the quantitative satisfaction of w-GSTL. We define the quantitative satisfaction $r_w$ of a w-GSTL formula $\tilde{\phi}$ recursively through a set of activation functions $\mathcal{A} = \{\otimes_{\wedge}, \otimes_{\vee}, \otimes_{G}, \otimes_{F}, \otimes_{\forall_G}, \otimes_{\exists_G}\}$, where $\otimes_{\wedge}, \otimes_{\vee} : \mathbb{R}^2 \to \mathbb{R}$, $\otimes_{G}, \otimes_{F} : \mathbb{R}^{k_2 - k_1 + 1} \to \mathbb{R}$, and $\otimes_{\forall_G}, \otimes_{\exists_G} : \mathbb{R}^{|\mathcal{N}_v|} \to \mathbb{R}$ are activation functions corresponding to the $\wedge$, $\vee$, $G^{\Omega}_I$, $F^{\Omega}_I$, $\forall_G$, and $\exists_G$ operators, respectively. We denote a generic activation function in $\mathcal{A}$ by $\otimes$. For defining the activation functions, we use a variable $\sigma$, where $\sigma$ is a positive real number.
We define the activation function corresponding to each operator in the set $\mathcal{A}$ in terms of normalized importance weights $\bar{w}_m$, signed robustness values $r_m$, and soft-min coefficients $s_m$, where $\bar{w}_m$ denotes the normalized weight and $I = [k_1, k_2]$. In the activation functions of the $\wedge$ and $\vee$ Boolean connectives, $\bar{w}_m = w_m / \sum_{j=1}^{2} w_j$; in the activation functions of the $G^{\Omega}_I$ and $F^{\Omega}_I$ operators, $\bar{w}_m = w_m / \sum_{j=k_1}^{k_2} w_j$; in the $\forall_G$ and $\exists_G$ graph operators, $\bar{w}_m = w_m / \sum_{j=1}^{|\mathcal{N}_v|} w_j$. This normalization maps the importance weight of the $m$-th subformula in $\tilde{\phi}$ to the range $[0, 1]$ such that $\bar{w}_m$ can reflect the importance of the $m$-th subformula in determining the quantitative satisfaction of the whole w-GSTL formula $\tilde{\phi}$. In the activation function of the $\wedge$ operator, $r_m = r_w(g, v, \tilde{\phi}_m, k)$; in the activation function of the $\vee$ operator, $r_m = -r_w(g, v, \tilde{\phi}_m, k)$; in the activation function of the $\forall_G$ operator, $r_m = r_w(g, \bar{v}, \varphi_G, k)$; in the activation function of the $\exists_G$ operator, $r_m = -r_w(g, \bar{v}, \varphi_G, k)$; in the activation function of the $G^{\Omega}_I$ operator, $r_m = r_w(g, v, \tilde{\phi}, m)$; and in the activation function of the $F^{\Omega}_I$ operator, $r_m = -r_w(g, v, \tilde{\phi}, m)$. In the activation functions of the $\wedge$ and $\vee$ operators, $s_m = e^{-r_m/\sigma} / \sum_{j=1}^{2} e^{-r_j/\sigma}$, and $s_m$ is defined analogously over the corresponding index sets for the temporal and graph operators.

V. FRAMEWORK AND ALGORITHMS

In this section, we introduce a framework and algorithms to solve Problem 1 for a given flexible w-GSTL formula structure $\mathcal{F}_f$ with $M$ undetermined temporal operators and $L$ undetermined GL operators using w-GSTL-NN. For the training of w-GSTL-NN, we design a loss function that meets the following two requirements: 1) the loss should be small when the inferred formula is satisfied by the graph-based trajectories in $\mathcal{D}_P$ and violated by the graph-based trajectories in $\mathcal{D}_N$; 2) the loss should be large when the inferred formula is not satisfied by the graph-based trajectories in $\mathcal{D}_P$ and not violated by the graph-based trajectories in $\mathcal{D}_N$. We define the loss function as

$J(\tilde{\phi}) = \sum_{i=1}^{N_D} e^{-\eta\, l^i\, r_w(g^i, \tilde{\phi})}, \quad (1)$

where $g^i$ denotes the $i$-th graph-based trajectory in $\mathcal{D}$, and $\eta > 0$ is a tuning parameter. $J(\tilde{\phi})$ is small in the cases where $l^i r_w(g^i, \tilde{\phi}) > 0$ and increases exponentially when $l^i r_w(g^i, \tilde{\phi}) < 0$.

Algorithm 1: Two-step procedure of inferring a w-GSTL formula
Input: a set of labeled graph-based trajectories $\mathcal{D}$; a flexible w-GSTL formula structure $\mathcal{F}_f$; number of iterations $K$; number of subformulas $J$
Output: learned w-GSTL formula $\tilde{\phi}$
1: Construct a w-GSTL-NN based on the given structure $\mathcal{F}_f$
2: Initialize $w_{\mathcal{F}_f}$, $a_S$, and $c$
⋮
13: $\mathcal{F} \leftarrow$ selected operators from $P_{GL}$ and $P_{temporal}$

We compute a w-GSTL formula in two steps: 1) determining the proper temporal and GL operators in the given flexible w-GSTL formula structure $\mathcal{F}_f$; 2) learning the parameters of the flexible w-GSTL formula structure $\mathcal{F}_f$. For each step, we construct and train a separate w-GSTL-NN. Algorithm 1 illustrates the two-step procedure of learning a w-GSTL formula from a given set of labeled graph-based trajectories $\mathcal{D}$ and a given flexible w-GSTL formula structure $\mathcal{F}_f$.

Step 1: In step 1, we initialize two sets of coefficients, $\{b_i\}_{i=1}^{M}$ for the $M$ undetermined temporal operators and $\{b_{i'}\}_{i'=1}^{L}$ for the $L$ undetermined GL operators (Alg. 1). In this step, for the activation functions of $G^{\Omega}_I$ and $F^{\Omega}_I$, we define the coefficients $q_G = +1$ and $q_F = -1$, respectively. Similarly, for the activation functions of $\forall_G$ and $\exists_G$, we define the coefficients $q_{\forall_G} = +1$ and $q_{\exists_G} = -1$, respectively. We construct and train a w-GSTL-NN (demonstrated in Alg. 2) to select the proper operators:

Algorithm 2: w-GSTL-NN
⋮
7: Compute $J(\tilde{\phi})$ using (1)
8: $Parameters$, $r_w(g^i, \tilde{\phi}) \leftarrow F_{backward}(\mathcal{D}_k, Parameters, J(\tilde{\phi}))$
9: end for
10: return $Parameters$

Step 2: After determining the proper operators, we construct and train another w-GSTL-NN (demonstrated in Alg. 2) to learn the parameters of the flexible w-GSTL formula structure $\mathcal{F}_f$, including $a_S$, $c$, and all the importance weights in $\mathcal{F}_f$, which we denote by $w_{\mathcal{F}_f}$ (Lines 13 to 16 in Alg. 1).
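Both steps train against the same activation functions and loss. The sketch below gives a compact NumPy realization of one such neuron and of the loss (1); the closed-form output $\otimes = \pm\sum_m \bar{w}_m s_m r_m$ is our assumption, a weighted soft-min aggregation in the spirit of the wSTL network of [13], so this should be read as one plausible realization rather than the paper's exact neuron:

```python
import numpy as np

def soft_neuron(r, w, sigma: float = 1.0, sign: float = +1.0) -> float:
    """One w-GSTL-NN neuron: weighted soft-min over robustness values.

    r    : signed robustness values r_m (already negated for the 'or',
           'eventually', and 'exists_G' cases, per the conventions above)
    w    : raw positive importance weights w_m
    sign : +1 for conjunction-like operators, -1 for disjunction-like;
           the output form is an assumption, following [13].
    """
    r = np.asarray(r, dtype=float)
    w = np.asarray(w, dtype=float)
    w_bar = w / w.sum()                  # normalized weights in [0, 1]
    s = np.exp(-r / sigma)
    s /= s.sum()                         # soft-min coefficients s_m
    return float(sign * np.sum(w_bar * s * r))

def loss_J(robustness, labels, eta: float = 1.0) -> float:
    """Loss (1): small when l_i * r_w > 0, exponentially large otherwise."""
    robustness = np.asarray(robustness, dtype=float)
    labels = np.asarray(labels, dtype=float)
    return float(np.exp(-eta * labels * robustness).sum())

# As sigma -> 0, s_m concentrates on the smallest r_m, so the neuron
# approaches the hard min (conjunction-like) semantics.
print(soft_neuron([0.5, -0.2, 1.0], [1.0, 1.0, 1.0], sigma=0.1))  # ~ min(r)/3
```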
Algorithm 2 illustrates the w-GSTL-NN training procedure that we use to learn w-GSTL formulas. In Algorithm 2, $Parameters$ denotes the set of parameters that we calculate at each step of the two-step process of learning a w-GSTL formula. In step 1, $Parameters$ includes $\{b_i\}_{i=1}^{M}$ and $\{b_{i'}\}_{i'=1}^{L}$. In step 2, $Parameters$ includes the parameters of the flexible w-GSTL formula structure $\mathcal{F}_f$ (including $a_S$, $c$, and $w_{\mathcal{F}_f}$). In Algorithm 2, we denote the forward-propagation operation by $F_{forward}(\mathcal{D}_k, subf(j), \otimes_j, Parameters)$, where $\mathcal{D}_k$ denotes the selected mini-batch from the set $\mathcal{D}$, $subf(j)$ is the $j$-th subformula in $\mathcal{F}_f$, $\otimes_j$ is the activation function corresponding to the w-GSTL operator or Boolean connective in $subf(j)$, $Parameters$ is to be calculated, and $r_w(g^i, \tilde{\phi})$ is the output of forward-propagation (Line 5 in Alg. 2). More precisely, after determining the w-GSTL operators, w-GSTL-NN calculates the quantitative satisfaction of the inferred $\tilde{\phi}$ with respect to $g^i$ through forward-propagation. We denote the back-propagation operation by $F_{backward}(\mathcal{D}_k, Parameters, J(\tilde{\phi}))$, where $Parameters$ is to be updated (Line 8 in Alg. 2). The proposed algorithms are implemented in a Python toolbox. The time complexity of learning a w-GSTL-NN for a given graph $G(V, E)$ is $O(u_1 u_2 u_3 n_E u_4)$, where $u_1$ is the number of training samples, $u_2$ is the number of graph-based trajectories in each sample, $u_3$ is the number of epochs, $n_E$ is the number of edges in the graph $G(V, E)$, and $u_4$ is the total number of temporal and graph-based operators in the flexible w-GSTL formula structure $\mathcal{F}_f$.

VI. CASE STUDIES

In this section, we assess the performance of w-GSTL-NN. We first use a meteorological dataset from Australia to predict rainfall. Then, we predict the severity of lockdown measures using COVID-19 data from Italy. The performance of w-GSTL-NN is compared with other standard classification algorithms. The flexible w-GSTL formula structure that we use for these two case studies is ${}^{w_1}(\mathcal{T}_1{}^{\Omega_1}_{I_i}(\mathcal{G}_1^{W}\mathcal{N}\pi)) \vee (\neg{}^{w_2}(\mathcal{T}_2{}^{\Omega_2}_{I_{ii}}(\mathcal{G}_2^{W}\mathcal{N}\pi)))$, where $\mathcal{T}_1, \mathcal{T}_2 \in P_{temporal}$ and $\mathcal{G}_1, \mathcal{G}_2 \in P_{GL}$ are to be inferred. For both case studies, we use the Adam optimization algorithm [18] to train the neural networks. The parameter $c$ is initialized as 0 to prevent initial bias. Furthermore, all of the importance weights are initialized at the same value of 0.5, also to prevent initial bias.

A. Rain Prediction in Australia

In this subsection, we use w-GSTL-NN to predict rainfall in regions of Australia. The dataset is acquired from the Australian Government Bureau of Meteorology [19]. It is composed of weather-related data in 49 regions of Australia, measured daily from March 1st, 2013 to June 25th, 2017, including sunshine ($g_1$), cloud at 9 am ($g_2$), cloud at 3 pm ($g_3$), and whether there was any rain on the given day, encoded as −1 or 1 ($g_4$). Only the most important attributes are mentioned due to space limitations. The criterion for the binary classification is the rain prediction for the following day (−1 for no rain, 1 for rain). We construct a graph structure over the Australian regions, considering any region within a 300 km radius of another region to be its neighbor. A separate neural network is learned for each region and its neighbors to prevent scalability issues when broader regions are considered. We use zero imputation for missing values in the input data. The dataset is divided in a proportion of 0.8:0.2 into a training dataset and a testing dataset, respectively, resulting in 1262 data points per region for training and 316 data points per region for testing.
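Each per-region network is then trained by gradient descent on the loss $J$. The following PyTorch-style sketch shows how one training loop in the spirit of Algorithm 2 might look, with a simplified single-operator robustness standing in for the full structure $\mathcal{F}_f$; all names and the toy data are illustrative assumptions, not the authors' toolbox:

```python
import torch

K_I, d = 15, 4                                       # window length, input dims
a = torch.zeros(d, requires_grad=True)               # predicate coefficients a_S
c = torch.zeros(1, requires_grad=True)               # threshold c (initialized 0)
omega = torch.full((K_I,), 0.5, requires_grad=True)  # temporal weights (init 0.5)
opt = torch.optim.Adam([a, c, omega], lr=1e-2)       # Adam, as in the paper
eta, sigma = 1.0, 1.0

def robustness(x: torch.Tensor) -> torch.Tensor:
    """Stand-in for F_forward: a weighted soft-'always' over one predicate.

    x is a (K_I, d) window; a single temporal operator is a simplification
    of the full flexible structure F_f.
    """
    r_pred = x @ a - c                               # r(pi) at each time-step
    w_bar = omega / omega.sum()                      # normalized Omega weights
    s = torch.softmax(-r_pred / sigma, dim=-1)       # soft-min coefficients
    return (w_bar * s * r_pred).sum()

# Toy mini-batch D_k: 32 windows with labels in {-1, +1}.
X = torch.randn(32, K_I, d)
y = (torch.rand(32) > 0.5).float() * 2 - 1

for _ in range(100):                                 # training epochs
    r = torch.stack([robustness(X[i]) for i in range(len(X))])
    loss = torch.exp(-eta * y * r).sum()             # loss J from Eq. (1)
    opt.zero_grad()
    loss.backward()                                  # F_backward: update params
    opt.step()
```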
To demonstrate how the importance weights and predicates can be interpreted, we consider the learned neural network for Albury and its neighbors: Wagga Wagga, Canberra, Tuggeranong, Mount Ginini, Bendigo, Sale, Melbourne Airport, Melbourne, and Watsonia. We set $K_I$ consecutive days of data as one instance of the dataset in our experiment. The dataset is passed through two separate neural networks. In the first step, we determine the temporal operators and the GL operators to be applied on the intervals $I_i$ and $I_{ii}$, respectively. In the second step, we learn the parameters of the flexible w-GSTL formula structure $\mathcal{F}_f$. In the experiment, we set $K_I = 15$, $I_i = [0, 6]$, $I_{ii} = [7, 14]$, $\eta = 1$, and $\sigma = 1$. The learned w-GSTL formula is $\tilde{\phi} = {}^{w_1}(G^{\Omega_1}_{[0,6]}(\exists^{W}_G \mathcal{N}\pi)) \vee (\neg{}^{w_2}(F^{\Omega_2}_{[7,14]}(\exists^{W}_G \mathcal{N}\pi)))$, and the inferred predicate is $\pi := -0.0298 g_1 + 0.0226 g_2 + \dots$

We evaluate the performance of the proposed algorithm by applying standard classification methods, namely K-nearest neighbors (KNN) and decision trees (DT), a kernel method, namely the support vector machine (SVM), and an artificial neural network (ANN) to the same dataset. The learned w-GSTL-NN produces higher accuracy than both KNN and DT (Table I). Although SVM and ANN produce higher accuracy than w-GSTL-NN because they are not restricted by the w-GSTL formula structure, w-GSTL-NN can generate human-readable results, unlike the other algorithms. Using the learned w-GSTL formula, the learned importance weights, and the predicates, we can interpret the decision-making of the classifier, rather than merely interpreting the results generated by a black box. The signs and magnitudes of the predicate coefficients can be used to analyze the correlation between each meteorological input and rain prediction. A greater magnitude suggests a stronger correlation between the input and rain prediction, and the sign indicates whether the input has a positive or a negative relationship with rain prediction. For instance, the coefficients associated with the predicate suggest that a larger amount of cloud at 9 am ($g_2$), a larger amount of cloud at 3 pm ($g_3$), and more rainfall during the interval ($g_4$) correlate with a higher probability of rainfall on the target date, while a larger amount of sunshine ($g_1$) correlates with a higher probability of no rainfall on the target date. Furthermore, the learned importance weights associated with $\exists_G$ suggest that the meteorological conditions in Mount Ginini and Melbourne Airport are the best indicators for predicting rain in Albury.

B. COVID-19 Lockdown Prediction in Italy

In this subsection, we use simulated COVID-19 datasets of 20 Italian regions from the DiBernardo Group Repository [20]. The dataset is composed of a time series for each region of Italy. The inputs of each time series are the percentages of people infected ($g_1$), quarantined ($g_2$), deceased ($g_3$), and hospitalized ($g_4$) due to COVID-19. The data is simulated with a social-distancing parameter of 0.3 for strict lockdown measures and 1 for no lockdown measures. Each input is recorded daily for 365 days. We turn this case study into a binary classification problem by labeling "strict lockdown measures" with −1 and "no lockdown measures" with 1. To demonstrate the learned importance weights and predicates, we consider the learned spatial-temporal properties of Abruzzo and its neighboring regions: Lazio, Marche, and Molise.
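In both case studies, each instance is a window of $K_I$ consecutive days; a minimal sketch of that slicing, assuming stride-1 sliding windows (the stride is not stated in the paper), is:

```python
import numpy as np

def make_windows(series: np.ndarray, K_I: int, stride: int = 1) -> np.ndarray:
    """Slice a (T, d) daily series into (K_I, d) instances.

    Stride-1 sliding windows are an assumption; the paper only states
    that K_I consecutive days of data form one instance.
    """
    T = series.shape[0]
    return np.stack([series[t:t + K_I] for t in range(0, T - K_I + 1, stride)])

# e.g., 365 days of the 4 COVID-19 inputs for one region, K_I = 30:
windows = make_windows(np.random.randn(365, 4), K_I=30)   # shape (336, 30, 4)
split = int(0.8 * len(windows))     # 0.8 : 0.2 split, as in the rain study
train, test = windows[:split], windows[split:]
```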
In this case study, we use $K_I = 30$, $I_i = [0, 14]$, $I_{ii} = [15, 29]$, $\eta = 1$, and $\sigma = 1$. We divide the dataset into 472 sets of $K_I$ time instances for the training dataset and 200 sets of $K_I$ time instances for the testing dataset. The w-GSTL formula with the learned operators is $\tilde{\phi} = {}^{w_1}(G^{\Omega_1}_{[0,14]}(\exists^{W}_G \mathcal{N}\pi)) \vee (\neg{}^{w_2}(F^{\Omega_2}_{[15,29]}(\exists^{W}_G \mathcal{N}\pi)))$, where the inferred predicate is $\pi := (-4.4617 g_1 - 4.3504 g_2 - 3.2291 g_3 - 4.4045 g_4 \le -0.1421)$; the normalized importance weights for $I_{ii}$ are $\Omega_2 = [0.0006, 0.0007, 0.0008, 0.0008, 0.0007, 0.0008, 0.0007, 0.0007, 0.0007, 0.0006, 0.0007, 0.0008, 0.0007, 0.1663, 0.8064]$; the normalized importance weights for Lazio, Marche, and Molise, respectively, are $W = [0.0002, 0.0001, 0.9997]$; and the normalized importance weights for the $\vee$ operator are $w_1 = 0.9998$ and $w_2 = 0.0002$. We apply the K-nearest neighbors, decision tree, and support vector machine algorithms to this dataset to conduct the binary classification. The learned w-GSTL formula provides us with the spatial-temporal properties of the dataset and determines whether there is a strict lockdown measure in a region or not. Furthermore, the accuracy of w-GSTL-NN matches that of K-nearest neighbors, decision tree, and support vector machine on the COVID-19 test dataset, all achieving an accuracy of 100%. Nevertheless, w-GSTL-NN provides important information for analysis that cannot be retrieved with the other algorithms, which makes w-GSTL-NN useful for applications where interpreting the decision-making of computers is necessary.

VII. CONCLUSION

In this paper, we proposed a framework that combines neural networks and w-GSTL for learning spatial-temporal properties from data. The proposed approach represents the learned knowledge in a human-readable form. As a future direction, we plan to extend this approach to scenarios where only positive data is available. We also aim to apply w-GSTL-NN in deep reinforcement learning (deep RL) settings that deal with graph-structured problems, to improve the interpretability of deep RL.

REFERENCES

[1] Towards verified artificial intelligence.
[2] Graph temporal logic inference for classification and identification.
[3] Robotic swarm control from spatio-temporal specifications.
[4] Probabilistic swarm guidance subject to graph temporal logic specifications.
[5] Introduction to Automata Theory, Languages, and Computation.
[6] Formalizing trajectories in human-robot encounters via probabilistic STL inference.
[7] Expressivity of deep neural networks.
[8] A graph-based spatial temporal logic for knowledge representation and automated reasoning in cognitive robots.
[9] Structured sequence modeling with graph convolutional recurrent networks.
[10] Graph WaveNet for deep spatial-temporal graph modeling.
[11] Parametric identification of temporal properties.
[12] Spatio-temporal neural networks for space-time series forecasting and relations discovery.
[13] Neural network for weighted signal temporal logic.
[14] Logic tensor networks: Deep learning and logical reasoning from data and knowledge.
[15] Logical neural networks.
[16] Specifying user preferences using weighted signal temporal logic.
[17] Temporal logics for learning and detection of anomalous behavior.
[18] Adam: A method for stochastic optimization.
[19] Rain in Australia.
[20] A network model of Italy shows that intermittent regional strategies can alleviate the COVID-19 epidemic.