key: cord-0299459-1mzsk3cx authors: Bogaerts, Bart; Jakubowski, Maxime title: Fixpoint Semantics for Recursive SHACL date: 2021-09-17 journal: nan DOI: 10.4204/eptcs.345.14 sha: d2cd550043b6363ad590b4f7ba4fd7322da2ef9e doc_id: 299459 cord_uid: 1mzsk3cx SHACL is a W3C-proposed language for expressing structural constraints on RDF graphs. The recommendation only specifies semantics for non-recursive SHACL; recently, some efforts have been made to allow recursive SHACL schemas. In this paper, we argue that for defining and studying semantics of recursive SHACL, lessons can be learned from years of research in non-monotonic reasoning. We show that from a SHACL schema, a three-valued semantic operator can directly be obtained. Building on Approximation Fixpoint Theory (AFT), this operator immediately induces a wide variety of semantics, including a supported, stable, and well-founded semantics, related in the expected ways. By building on AFT, a rich body of theoretical results becomes directly available for SHACL. As such, the main contribution of this short paper is providing theoretical foundations for the study of recursive SHACL, which can later enable an informed decision for an extension of the W3C recommendation. The Semantic Web [2] extends the World-Wide Web with machine-interpretable data. The de facto standard of this web is that data is stored and published in the Resource Description Framework [22] , i.e., as a set of triples, referred to as a graph, often extended with semantic information expressed in OWL [20] . While in principle the Semantic Web is open and every agent can represent their data however they want, from the perspective of applications, or when consuming RDF data in some other way, it can often be useful to know which structural properties an RDF graph in question satisfies. In other words, there is a need for a declarative language for describing integrity constraints on RDF graphs. Several proposals have emerged to fill this need, the most prominent of them being ShEx [9] and SHACL [23] . Both approaches start from the notion of a shape: a structural property that a node in an RDF graph can satisfy, e.g., the shape of "things that have a name" will be satisfied for those nodes that appear as the subject in a triple with predicate foaf:name. On top of a language for defining such shapes, the two proposals also have a mechanism for targeting: for specifying which nodes should satisfy which shapes, e.g., to declare that "All persons should satisfy the has-a-name shape". In our paper, we will build on a formalization of SHACL [12] , which has revealed a striking similarity between shapes and concepts descriptions, as known from description logics [3] ; we recently deepened this connection further [7] . Following this line of research, we formalize a SHACL schema as a tuple (Def , T), where Def is a set of rules of the form s ← φ and T is a set of concept inclusions of the form φ ⊆ s with φ a shape (i.e., a concept description) and s a shape name. Def defines the shapes in terms of relations in the RDF graph and T contains the so-called targeting constraints. The W3C recommendation for SHACL only specifies the semantics when Def is non-recursive, i.e., if no shape is defined in terms of itself, but recently, some efforts have been made to lift this restriction, i.e., to define a supported and a stable semantics for recursive SHACL [12, 1] . In this paper, we put forward another principled way to define semantics of recursive SHACL, building on Approximation Fixpoint Theory (AFT), an abstract lattice-theoretic framework originally designed to unify semantics of non-monotonic logics [14] with applications, among others, in (extensions of) logic programming, autoepistemic logic, default logic, abstract argumentation, and active integrity constraints [15, 21, 24, 19, 11, 5] . There are several advantages to defining semantics of SHACL in this way: • It is simple and straightforward: the power of AFT, comes largely from the fact that all that is required to apply it, is to define a (three-valued) semantic operator (similar to Fitting's immediate consequence operator for logic programs [18] ). In many domains (including SHACL), there is a natural choice for such an operator; AFT then immediately induces all major classes of semantics. • It provides confidence: AFT guarantees that the developed semantics follow well-established principles in nonmonotonic reasoning. Even in case semantics are already defined, applying AFT can be a sanity check. A striking example of this is the fact that applications of AFT uncovered some issues in the semantics of (weighted and non-weighted) Abstract Dialectic Frameworks [24, 10, 4] . • It provides access to a large body of theoretical results, including theorems on stratification [27, 6] , predicate introduction [28] , and strong equivalence [25] , thereby eliminating the need to "reinvent the wheel" by rediscovering these results in each of the separate domains. In a nutshell, our main contribution is establishing formal foundations for the study of recursive SHACL. In actual SHACL, semantics is defined in terms of RDF graphs, but we recently showed how to reduce this to the logical setting [7] . As such, for the purpose of the current paper, we focus on the logical setting and take abstraction of RDF graphs. Throughout this paper, we (implicitly) fix a vocabulary, i.e., a set of node names (denoted N), property names (denoted P), and shape names (denoted S). A node name c is also referred to as a constant, a property name p as a binary predicate symbol and a shape name s as a unary predicate symbol. Path expressions E and shapes φ are defined as follows: where p, c, and s represent property names, node names, and shape names respectively, and n stands for non-zero natural numbers. When developing our semantics, we will treat φ 1 ∨ φ 2 as an abbreviation of ¬(¬φ 1 ∧ ¬φ 2 ) and ∀E.φ as an abbreviation for ¬ ≥ 1 E.¬φ . The reason for having them in the syntax is to enable the comparison with existing semantics in Section 5. As usual, an interpretation I consists of (i) a set ∆ I , called the domain of I; (ii) for each constant c, an element c I ∈ ∆ I ; (iii) for each shape name s, a set s I ⊆ ∆ I ; and (iv) for each property name p, a binary relation p I on ∆ I . When we say a graph-interpretation, we mean an interpretation that only consists of a domain and an interpretation for node and property names (not shape names). Intuitively, such an interpretation corresponds to an RDF graph. A path expression E evaluates in I to a binary relation E I on ∆ I , and a shape φ to a subset φ I of ∆ I , as defined in Tables 1 and 2 • Def a set of rules of the form s ← φ with s a shape name (referred to as the head of the rule) and φ a shape (the body of the rule), such that each s ∈ S appears exactly once in the head of a rule, and • T a set of (concept) inclusions of the form φ ⊆ s, with φ a shape that does not mention any shape names 1 , and s a shape name. 1 The condition that φ does not mention any shape names will entail that the target query can be evaluated on the underlying Table 1 : Semantics of a path expression E in an interpretation I. Table 2 : Semantics of a shape φ in an interpretation I. We use X to denote the cardinality of X. For a binary relation R and an element a, we use R(a) to denote {b | (a, b) ∈ R}. A shape name s 1 depends on shape name s 2 in Def if there is a rule s 1 ← φ in Def , and s 2 or some shape name that depends on s 2 occurs in φ . A schema is non-recursive if no shape name depends on itself. If (Def , T) is a non-recursive SHACL schema, and I a graph-interpretation, then I can be uniquely extended to an interpretation I such that for each s ∈ S, s I = φ I if s ← φ ∈ Def . In that case, we say that I validates with respect to (Def , T) if φ I ⊆ s I for each inclusion φ ⊆ s in T. For the recursive case, the situation is less obvious; in Section 4, we use approximation fixpoint theory to study the different semantic options that arise, but before doing so, we recall preliminaries on AFT. A complete lattice L, ≤ is a set L equipped with a partial order ≤, such that every set S ⊆ L has a least upper bound and a greatest lower bound. A complete lattice has a least element ⊥ and a greatest element Every monotone operator O in a complete lattice has a least fixpoint, denoted lfp(O). Given a lattice L, AFT uses a bilattice L 2 . We define projections for pairs as usual: (x, y) 1 = x and (x, y) 2 = y. Pairs (x, y) ∈ L 2 are used to approximate elements in the interval is non-empty. We use L c to denote the set of consistent elements. The precision order on L 2 is defined as ( is consistent, this means that (x, y) approximates all elements approximated by (u, v). In its original form, AFT makes use of approximators, which are operators on L 2 , but [16] showed that all the consistent fixpoints studied in AFT are uniquely determined by an approximator's restriction to L c and developed a theory of consistent approximators. An operator A : . AFT studies fixpoints of O using fixpoints of A. • The A-Kripke-Kleene fixpoint is the ≤ p -least fixpoint of A; it approximates all fixpoints of O. • A partial A-stable fixpoint is a pair (x, y) such that x = lfp(A(·, y) 1 ) and y = lfp(A(x, ·) 2 ), where A(·, y) 1 : L → L maps z to A(z, y) 1 and similarly for A(x, ·) 2 . • The A-well-founded fixpoint is the least precise (≤ p -least) partial A-stable fixpoint. graph, without knowledge of the shape definitions. In actual SHACL, the condition is even more limited; there only very specific queries are allowed as targets. These definitions allow reconstructing all major logic programming semantics by taking for O Van Emden and Kowalski's immediate consequence operator T P [26] and for A Fitting's three-(or four-) valued extension Ψ P [18] . For the rest of this paper, we fix a SHACL schema (Def , T) and a graph-interpretation I. We already mentioned that if Def is non-recursive, it uniquely induces a complete interpretation I in which all constraints in T are to be verified. When Def is recursive, however, the situation becomes more complex. On the one hand, there is a range of possible semantics dealing with recursion. On the other hand, some of the semantics yield not a single interpretation I , but either a set of them, or a three-valued interpretation. This will give us a choice between brave and cautious validation; the focus of this paper is on the treatment of negation, but we briefly discuss brave and cautious validation below. To apply AFT, the first step to take is to determine a suitable lattice. In our case, the obvious candidate is the lattice L I (from now on, denoted L) of all interpretations I with domain ∆ I that agree with I on N ∪ P, or in other words, the set of interpretations that expand I. This set is equipped with the standard truth order, I 1 ≤ t I 2 if s I 1 ⊆ s I 2 for all s ∈ S. The role of the semantic operator is to update the value of the interpretation of the shapes. In analogy with logic programming, its definition is straightforward: it maps the interpretation I to T Def (I ) such that for each shape name s with defining rule s ← φ , we define T Def (I )(s) = φ I . With the lattice L, ≤ t , elements of L c are pairs I = (I 1 , I 2 ) of two interpretations with I 1 ≤ t I 2 ; such pairs correspond one-to-one to three-valued interpretations that assign each s ∈ S a function ∆ → {t, f, u}, mapping a to t if a in s I 1 , to f if a ∈ s I 2 and to u otherwise (in other words, I 1 represents what is certainly true and I 2 what is possibly true). From now on, we simply refer to elements of L c as three-valued interpretations. We can evaluate a shape φ in a three-valued interpretation I with a straightforward extension of Kleene's truth tables, as also used in previous studies of recursive SHACL [12, 1] ; for completeness, this is included in Table 3 . This table makes use of the truth order ≤ t on truth values defined as f ≤ t u ≤ t t, and the negation on truth values defined as usual: ¬t = f; ¬f = t; ¬u = u. Once a three-valued evaluation of shapes is defined, an approximator is obtained directly: like the operator, the approximator updates the value of each shape symbol according to its defining rule: it maps I to Ψ Def (I ) where for each shape s ∈ S defined by the rule s ← φ , Ψ Def (I )(s) = φ I . Theorem 4.1. Ψ Def is a consistent approximator of T Def . At this point, AFT dictates what the supported models (fixpoints of T Def ), (partial) stable models (Ψ Def -stable fixpoints), well-founded model (Ψ Def -well-founded fixpoint), and Kripke-Kleene model (Ψ Def -KK fixpoint) of Def are. It is worth stressing that to arrive to this point, we made two choices. The first was our choice of order on the lattice. We opted here for the truth order, but its inverse would also have been a possible choice. Several of the semantics induced by AFT aim to minimize models in the chosen order for reasons of groundedness [8] , e.g., if s has s ← s as defining rule, in stable and wellfounded semantics, our chosen order would result in no nodes satisfying s. The second choice we made is which three-valued truth evaluation to use; we opted for the most obvious choice: a direct extension of Kleene's three-valued truth tables, which was used in other studies of recursive SHACL as well [12, 1] . Given these choices, models of the different types (stable, well-founded, ... ) are defined by AFT, and hence semantics for brave and cautious validation under of each semantics are established. Definition 4.2. Let σ ∈ {KK, WF} and let I be the σ -model. We say that I cautiously (resp. bravely) σ -validates with respect to (Def , T) if φ ∧ ¬s I = f (resp. φ ∧ ¬s I ≤ t u) for every φ ⊆ s in T. Let σ ∈ {St, Sup} and let M be the set of σ -models. We say that I cautiously (resp. bravely) σ -validates with respect to (Def , T) if φ ∧ ¬s I = f for all (resp. for some) I ∈ M for every φ ⊆ s in T. Let us illustrate the differences between the various types of models on a small example. . . , f represent people (divided in two cliques of three "close" friends); one person (c) is vaccinated and one person (d) shows Covid symptoms. This interpretation is visually depicted in Fig. 1 . We define two shapes: the shape of people at risk (those who (i) are not vaccinated and (ii) have symptoms or are close to someone at risk) and the shape of people who can go to office (those who are not at risk), as formalized below: For this set of shape definitions, the unique stable model equals the well-founded model and states that d, e, and f are at risk, while a, b, and c can work. In the Kripke-Kleene model, d, e, and f are again at risk, c is not at risk (and hence can work), but for a and b it is unknown whether they are at risk. There are two supported models: the stable model and one in which everyone except for c is at risk. Corman et al. [12] already defined a supported semantics and Andreşel et al. [1] a stable semantics for SHACL. For clarity, we refer to the existing semantics as CRS-supported and ACORSS-stable semantics, and to the semantics induced by AFT, e.g., as AFT-stable. Both of them focus on brave validation, but Andreşel et al. also mention the possibility of cautious validation. The main results on correspondence between the semantics are summarized here. In fact, Corman et al. [12, Definition 5] already defined the three-valued immediate consequence operator Ψ Def (there denoted T). While the focus of that work was on supported semantics, we now showed that in fact, by defining the approximator Ψ Def , they had everything at hand to define the full. Since Corman et al. also characterized supported models as fixpoints of Ψ Def (in their Definition 17), our semantics and theirs coincide. The situation is somewhat different for stable semantics, which Andreşel et al. [1] defined in terms of level mappings. ). Let I be an interpretation. A level assignment for I is a function level that maps tuples in {(φ , a) | φ I (a) = t} to integers and satisfies (i) level(φ 1 ∧φ 2 , a) = max{level(φ 1 , a), level(φ 2 , a)}, A supported model I is an ARCOSS-stable model if there exists a level assignment for I such that level(s, a) > level(φ s , a) for each rule s ← φ s in Def and each a with s I (a) = t. For the correspondence in case of stable semantics, we recall a normal form of Andreşel et al. [1] : Def is in shape normal form if all rules in Def have one of the following forms: Theorem 5.4. If I is an AFT-stable model of Def , then it is also an ACORSS-stable model. If Def is in shape normal form, the converse also holds. The difference between our stable semantics and the ACORSS-stable semantics is a semantic (in terms of the standard three-valued truth evaluation) versus a syntactic (the level mappings are defined in terms of the syntactic structure of the shapes) treatment of negation and is illustrated in the next example. Example 5.5 (Example 4.3 continued). Suppose that in our same interpretation, we wish to define a shape that identifies possible superspreaders. To do this, we say that a person is "safe" if they are vaccinated, or in contact with at most 1 non-safe person. This can be formalized as: Where ≤ 1 is an abbreviation for ¬ ≥ 2 . With the interpretation of Fig. 1 , there is a single AFT-stable model in which a, b, and c are safe, but d, e, and f are not. However, there are two ACROSS-stable models: the one mentioned above, and one in which everyone is safe, including the three-clique of non-vaccinated people. When Corman et al. [12] defined the supported model semantics for SHACL, they showed how to translate actual SHACL expressions (as specified by the W3C recommendation) into logical expressions in a language akin to description logics. For studying recursive SHACL expressions, they even already defined the operator Ψ Def . Hence, the only work left to obtain a rich family of semantics, was observing that this operator is indeed an approximators and applying AFT. As such, we believe this paper establishes strong and formal foundations for the study of recursive SHACL. Indeed, AFT does not just dictate how the semantics are to be defined, but immediately provides guarantees such as stratification and predicate introduction results that can be instrumental when developing concrete validators for recursive SHACL: we immediately obtain results about which transformations can safely be applied to our theories. Our results have been presented without taking a stance on the choice of semantics. It is important to realize though, that if one wants to view Def as an (inductive) definition of the shapes, it has been argued repeatedly that the well-founded semantics correctly formalizes this [17] . Under the well-founded semantics, SHACL integrates first-order constraints (of a restricted form) with inductive definitions and aggregates, and hence can be seen as a fragment of the language FO(ID, Agg), the formal foundation of the IDP language [13] . It is a topic for future work to investigate whether this can be exploited for either extending SHACL, or for developing alternative validation mechanisms, building on IDP. Stable model semantics for recursive SHACL A Semantic Web Primer An Introduction to Description Logic Weighted Abstract Dialectical Frameworks through the Lens of Approximation Fixpoint Theory Fixpoint Semantics for Active Integrity Constraints Stratification in Approximation Fixpoint Theory and Its Application to Active Integrity Constraints SHACL: A Description Logic in Disguise Grounded fixpoints and their applications in knowledge representation Semantics and validation of shape schemas for RDF Abstract Dialectical Frameworks Revisited Approximation Fixpoint Theory and the Well-Founded Semantics of Higher-Order Logic Programs Semantics and validation of recursive SHACL Predicate Logic as a Modelling Language: The IDP System. CoRR abs/1401.6312v2 Well-Founded Fixpoints and Applications in Nonmonotonic Reasoning Uniform semantic treatment of default and autoepistemic logics Ultimate approximation and its application in nonmonotonic knowledge representation systems The Well-Founded Semantics Is the Principle of Inductive Definition, Revisited Fixpoint semantics for logic programming -A survey Flexible Approximators for Approximating Fixpoint Theory OWL 2 Web Ontology Language: Structural Specification and Functional-Style Syntax Well-founded and Stable Semantics of Logic Programs with Aggregates Shapes Constraint Language (SHACL). W3C Recommendation. Available at Approximating operators and semantics for abstract dialectical frameworks Strong and uniform equivalence of nonmonotonic theories -an algebraic approach The Semantics of Predicate Logic as a Programming Language Splitting an operator: Algebraic modularity results for logics with fixpoint semantics Predicate Introduction for Logics With a Fixpoint Semantics. Parts I and II Acknowledgements This research was supported by the Flemish Government in the "Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen" programme and by FWO Flanders project G0B2221N.