key: cord-0105309-al92w637
authors: Kammuller, Florian
title: Dependability Engineering in Isabelle
date: 2021-12-08
journal: nan
DOI: nan
sha: ff4c6d24911e68c87b1bcf41e6a8112e88c5ec11
doc_id: 105309
cord_uid: al92w637

In this paper, we introduce a process of formal system development supported by interactive theorem proving in a dedicated Isabelle framework. This Isabelle Infrastructure framework implements specification and verification in a cyclic process supported by attack tree analysis closely inter-connected with formal refinement of the specification. The process is cyclic: in a repeated iteration the refinement adds more detail to the system specification. It is a known hard problem how to find the next refinement step: this problem is addressed by the attack based analysis using Kripke structures and CTL logic. We call this cyclic process the Refinement-Risk cycle (RR-cycle). It has been developed for security and privacy of IoT healthcare systems initially but is more generally applicable for safety as well, that is, dependability in general. In this paper, we present the extensions to the Isabelle Infrastructure framework implementing a formal notion of property preserving refinement interleaved with attack tree analysis for the RR-cycle. The process is illustrated on the specification development and privacy analysis of the mobile Corona-virus warning app.

Dependability [1] is a notoriously difficult property for system development because already security -which is part of dependability -is not compositional: given secure components, a system created from those components is not necessarily secure. Therefore, the usual divide-and-conquer approach from system and software design does not apply for dependability engineering. At the same time, it is mandatory for the design of dependable systems to introduce security in the early phases of the development since it cannot be easily "plugged in" at later stages. However, even if security is introduced in early phases, a classical stepwise development of refining abstract specifications by concretizing the design does not preserve security properties. Take for example, the implementation of sending a message from a client A to a server B such that the communication is encrypted to protect its content. In the abstract system specification, we do not consider a concrete protocol nor the architecture of the client and server. Using common refinement methods from software engineering provides a possible implementation by passing the message from a client system AS connected by a secure channel to a server system BS. However, this implementation does not exclude that other processes running on either AS or BS can eavesdrop on the plain text message because the confidentiality protection is only on the secure channel from AS to BS. This example is a simple illustration of what is known as the security refinement paradox [2] .

Why is security so hard? A simple explanation is that it talks about negative properties: something (loss or damage of information or functionality) must not happen. Negation is also in logic a difficult problem as it needs exclusion of possibilities. If the space to consider is large, this proof can be hard or infeasible. In security, the attacks often come from "outside the model". That is, for a given specification we can prove some security property and yet an attack occurs which uses a fact or observation or loophole that just has not been considered in the model. This known practical attack problem is similar to the refinement paradox. Intuitively, the attacker exploits a refinement of the system that has not been taken into account in the specification but is actually part of the real system (an implementation of the specification). In the above example, the real system allows that other applications can be run on the client within the security boundary. This additional feature of multi-processing systems has not been taken into account in the abstract specification where we considered processes and systems -the client and the server -as abstract entities without considering crucial features of their internal architectures.

To facilitate the formal stepwise development of dependable systems, we propose to make use of these intricacies by including the attacks into the process. As in common engineering practice, where testing for failures is used for improvement of prototypes, we use attack tree analysis within the same formal framework to provide guidance for the refinement of the system specification. Since the interleaved attack analysis steps are formalized in the same expressive formal framework, the found attack paths lead to system states that provide concrete information about the loopholes. This information allows adding details to data types, system functions, as well as policies thereby providing a next level of a more refined specification closer to an implementation. This is a cyclic process: in turn the newly refined specification can be scrutinized by attack tree analysis since the added detail may provide additional new loopholes. However, there is a stepwise improvement. Since we are using the highly expressive Higher Order Logic of Isabelle as a basis for our formal framework, we can implement the framework so that it allows expressing infrastructures including actors and policies, attack trees based on Kripke structures and temporal logic CTL, as well as a closed notion of dependability refinement that encompasses the idea that a refined specification after attack tree analysis excludes previous attacks.

This paper for the first time completes the RR-cycle by presenting (a) a full formal definition of the RR-cycle including (b) a formal notion of refinement with a property preservation theory, and (c) a formal notion of termination of the RR-cycle. All of those definitions and related theorems are validated on the application example of the Corona-cirus Warning App (CWA) completing a preliminary workshop publication [3] . It thus completes and extends previous works on establishing the RR-cycle. A more detailed account of this completion and how it relates to previous works is given in Section VIII-A.

In this paper, we first give a general overview over different approaches to refinement in formal methods in Section III to provide a better understanding and foundation for our formal framework. In Section IV, we then proceed to present the technical core of the paper, the formal definition of refinement including property preservation and mainly the definition of the RR-Cycle itself by providing a termination condition. Section V introduces the case study, the CWA. Then, we illustrate application of the Isabelle Infrastructure framework for modeling and initially analyzing the application in Section VI. The methodology of the RR-Cycle is illustrated on the CWA case study in Section VII. That is, we repeatedly refine the initial specification based on the attack tree results until the termination condition of the RR-Cycle is met and the global policy is thus valid. Section VIII discusses related work and presents conclusions. An overview over the Isabelle Infrastructure framework that is used as the vehicle for the formalisation of the RR-cycle is given next in Section II.

The Isabelle Infrastructure framework is implemented as an instance of Higher Order Logic in the interactive generic theorem prover Isabelle/HOL [4] . The framework enables formalizing and proving of systems with physical and logical components, actors and policies. It has been designed for the analysis of insider threats. However, the implemented theory of temporal logic combined with Kripke structures and its generic notion of state transitions are a perfect match to be combined with attack trees into a process for formal security engineering [5] including an accompanying framework [6] . An overview of the layers of object-logics of the Isabelle Infrastructure framework is provided in Figure 1 : each level above is embedded into the one below; the novel contribution of the Corona-virus warning app that will be presented in this paper are the highlighted two top boxes of the figure. This formula states that in Kripke structure K there is a path (E) on which the property s (given as the set of states in which the property is true) will eventually (F) hold. • Attack trees: attack trees are defined as a recursive datatype in Isabelle having three constructors: ⊕ ∨ creates or-trees and ⊕ ∧ creates and-trees. And-attack trees l⊕ s ∧ and or-attack trees l⊕ s ∨ consist of a list of sub-attacks which are themselves recursively given as attack trees. The third constructor takes as input a pair of state sets constructing a base attack step between two state sets. For example, for the sets I and s this is written as N (I,s) . As a further example, a two step and-attack leading from state set I via si to s is expressed as [N (I,si) ,N (si,s) ]⊕ (I,s) ∧ • Attack tree refinement, validity and adequacy:

Attack trees can be constructed also by a refinement process but this is different from the system refinement presented in this paper (see Section IV). An abstract attack tree may be refined by spelling out the attack steps until a valid attack is reached: A :: (σ:: state) attree).

The validity is defined constructively so that code can be generated from it. Adequacy with respect to a formal semantics in CTL is proved and can be used to facilitate actual application verification. This will be used for the stepwise system refinements central to the methodology presented in this paper.

Refinement is intuitively thought of as a systematic way of developing system designs from some abstract specification to a more detailed one. In the process of doing so, details that are left out in the abstraction are filled in. Refinement is a very natural way of step by step engineering systems starting from an abstract system design. However, whereas in traditional engineering disciplines validation is done through simulation, prototyping and testing, software-based systems (including IoT systems) allow formally specifying abstract system designs with logic. Then the concept of refinement can be explicitly formalized providing means to prove properties of the formal system design (also known as system specification) as well as reasoning in general about refinement, most importantly, how the refinement preserves properties or achieves design goals.

We consider systems as IoT system or cyber-physical systems, that is, systems constituted of hardware and software components. We emphasize that these systems should be designed in a human centric way. Although humans cannot be considered as system parts, their interaction can be specified as well by considering policies that specify the possible behaviour of actors as parts of the system design. If behavioural descriptions of human behaviour exist, they may well be integrated into the system specification. An example are insider threats where a taxonomy and empirical knowledge about human behaviour has been specified to provide a more detailed model of psychological dispositions of actors on when they start being insider attackers in order to analyze attacks and countermeasures. System specifications consist of datatypes for actors, physical system aspects (hardware, credentials, locations) and logical aspects, e.g., data, access control labels, software functions (types of functions, other descriptions of functionality like logical (axiomatic) descriptions) and policies. To summarize this heterogeneous set of entities, we use the simple description "infrastructures with (human) actors and policies".

Refinement is the process that starts with an initial more or less formal description of such a system adding more detail. This process is iterative, best described as a cycle in which a range of design and analysis activities lead to a more definite, concrete specification and thus a more specific, tangible description of the system.

The concept of formal refinement has a long standing in software system engineering in communicating and reactive system methods, for example, state charts, CSP [8] , CCS [9] , the Pi-calculus [10] , as well as in data-based system specification methods like Z [11] , [12] , B [13] , VDM. The field is well-explored and the literature is consequently vast so it is beyond the scope of this paper to give a comprehensive survey. Nevertheless, we are providing here an informal overview of a range of major techniques commonly used in these formal methods. This overview serves conveying the conceptual ideas to prepare the way for their implementation in the formal refinement of the RR-cycle and illustrate them on the application. The related work, Section VIII, will, however, contrast in more detail to similar formal refinement approaches.

1) State based trace refinement of Isabelle Infrastructure framework: The Isabelle Infrastructure framework enables a state-based specification (sometimes called model-driven) of systems where the overall system behaviour is defined (by an inductive definition) as a set of traces of system states (that include all aspects of infrastructures, actors, and policies). The system behaviour is the set of all possible traces. The general idea of refinement is to make the behaviour more concrete, that is, to reduce the set of all possible traces. Then a refinement can be simply defined by trace inclusion: in other words, system specification S 0 refines to system specification S 1 , formally,

(the inversion of the inclusion signs is deliberate not accidental). It turns out that this inclusion based refinement is quite versatile compared to existing forms of refinement simply because it combines a rich state based view with the idea of traces of system changes that include events (or actions) while the events themselves are abstract but can be refined by adding detail to the context conditions in the inductive definitions associated to the events. We now consider the various forms of refinement informally to prepare the ground and to give reference points to later stages when we explain in the application of the Isabelle Infrastructure framework which form of refinement we are looking at in a specific instance.

2) Event based trace refinement: Formally, the definition of trace refinement is given above: more refined behavioural models have a subset of the traces of the abstract model. The intuition behind the trace inclusion based refinement is that a more abstract specification allows various choices how system states may be reached while a more definite system description reduces those choices by selecting the system traces corresponding to a more specific system implementation. This general notion of reduction of system traces may be a design decision motivated by detected failures or simply motivated by making the behaviour more deterministic and consequently needs to go along with various changes to the design. This need for changes leads to the following category of data refinement.

3) Data refinement: This refinement is defined by making an abstract data type more concrete in the refinement step. A classical example is refining sets into the data type of lists which leads to making choices on order and replication of elements: since lists are ordered, for example, the set {1, 2, 3} could be represented in the refinement by the list [1,2,3] but also [3,2,1] -besides four other orders. In addition, also [1, 1, 2, 3] could be a possible refinement if replication is deemed a targeted feature. As a consequence of this choice for refining the data, certain traces are excluded in the refinement that could have been possible system traces under different choices for order and replication.

To provide a closed representation of process behaviour, event based process calculi like CSP [8] , [14] or CCS/Picalculus introduce choices between different events that are possible in the course of the execution of a process. In the trace model it is simple to see the extension of those choices: there will be a number of different continuations of a trace t, for example tˆ a and tˆ b represent the potential continuation of a trace of events t by either an event a or an event b if those two events are provided by a choice at the point of execution of the process after trace t has been executed. The syntaxˆrepresents trace concatenation and a creates a trace consisting of just the event a. In a purely event-based calculus, like CSP, CCS, or Pi, the internal system behaviour needs to be expressed in the specification purely based on the reaction, since state information is not part of the model by contrast to a model like the one used in the Isabelle Infrastructure framework (see previous subsection). So to distinguish between different ways the specified system may react to the events that the environment offers, CSP, for example, offers different kinds of choices that correspond to two levels of non-determinism. For any processes P and Q the process P Q is the so-called external choice between P and Q. This process is the process that offers the first events of P and Q to the environment 1 and then behaves either like P or like Q depending on the choice the environment made. Only if the environment offers events that may be communicated by both processes there is a real choice of the process.

The process P Q is the so-called internal choice or non-deterministic choice. In contrast to the former, it expresses that the choice lies with the system. That is, even if the environment offers only events that only one of P and Q can communicate, the process may refuse to communicate one of the offered events. Hence, it represents proper non-determinism as the decision cannot be determined by the environment. In a refinement, non-deterministic choices can be replaced by deterministic choices: making a process more deterministic moves it closer to an implementation. However, the traces model is not sufficient to show the difference. In this model both processes above have the same set of traces and are thus semantically equivalent: traces(P Q) = traces(P Q) .

This motivates thinking about Failures. 1 The notion of environment means all other processes.

In terms of modeling these different types of choices, it turns out that the pure trace model is not sufficient because traces record only what processes can do but not what they must do or in other words what they can refuse to do. In the example above, both processes have the trace with a and b included in their trace set because for both this continuation is a possibility. However, the traces do not reflect that the process using the non-deterministic choice could have stopped and refused to do anything in most cases. Therefore, to integrate this additional view on choices, the CSP calculus offers a second semantics adding refusals. Refusals are sets of events that a process can refuse at some point of execution, that is, for a certain trace. Failures are the semantic model in which traces are combined with sets of refusals. In this semantics our two processes above become distinguishable because P Q can only refuse what both P and Q can refuse whereas P Q only must accept if neither P nor Q can refuse and thus in the failures model failures(P Q) ⊃ failures(P Q) and consequently P Q F P Q .

2) Divergences and Termination: Undefined behaviour may be represented by a system internal invisible action, for example, τ in CSP. Nontermination can then be defined by an infinite sequence of such τ actions. For those traces where a program (or system) may diverge, the system does not communicate anything so there is no definite behaviour. In order to arrive at a meaningful refinement order that augments failures and divergences, all system traces are considered possible from the point of divergence onwards. Reducing non-termination leads to concretizing a program corresponding to extending the range for which the program produces meaningful output. The most general behaviour is represented by the process that diverges from the beginning. The divergences intuitively corresponds more closely to a view of input-output program where classically termination is part of total correctness but is less suitable for the wider system view we have here.

3) Action refinement: In reactive systems languages as well as human centric system formalisms, events or actions are a central modeling entity. The terms "action" or "event" are mostly used synonymously but in some formalism, like statecharts, actions and events are different entities -one representing pre-requisites of transitions, the other their consequences. Actions can be made more definite by combining them or making their occurrence dependent on conditions (or guards) of the context.

The formal notion of refinement for Isabelle infrastructures with actors naturally enables all the different notions of refinement because this refinement is simply based on an Isabelle type map and an inductive definition of behaviour. Isabelle's datatypes are rich enough to support most data refinements. Termination-based and non-deterministic refinement notions are typical for purely reactive systems where the entire state is abstract and system behaviour is purely described by traces of events. In our case, we might have the possibility of nontermination, for example by a loop in a state that may occur infinitely often for all actions so that the state does not change. Similarly, failures means that in a state some actions may not occur which can be modeled in the inductive definition of the system behaviour.

A language like CSP is an explicit calculus for constructing specific specified communication patterns using "standardized" constructors. However, using Higher Order Logic in Isabelle, we can simulate all choices, failures, etc, explicitly in the logic. Even though we do not have standardized constructors, we can mimic each and any of the CSP constructors. Moreover, we are not limited by a pure event-based view and can additionally simulate those constructors that CSP cannot offer, like action refinement. The latter is possible because our actions are more abstract and a vast portion of the action semantics is implemented in the additional preconditions of the corresponding rule of the semantics for this action (see Section VI-A) which provides fine grained tuning possibilities.

As described initially, Dependability Engineering consists of two complimentary activities: protection goal specification and attack (or, more generally, risk) analysis. These activities can be defined informally by "natural language" descriptions or using semi-formal modeling languages. Attack trees and Misuse Cases are commonly employed and recommended for this. For the design, UML diagrams can be used. However, we take this general, informal or semi-formal methodology further in our Isabelle Infrastructure framework that embeds attack trees in a fully formal way equipping it with a temporal logic CTL semantics that supports finding attacks based on negating global policies. Moreover, we adopt the system specification refinement idea from classical software engineering whereby we overcome a crucial shortcoming of refinement that has -in our opinion -impeded the success and the adoption of formal notions of refinement: how to find the right refinement relation for a refinement step in a concrete application! Finding the right refinement relation is in the security engineering process evidently provided by the attacks found through the attack tree analysis. Since the attack properties as well as attack paths are an outcome of the adequacy translation of the attack trees, these properties (in which the global security property is violated) can be used. We want to avoid them. Since refinement is provided by trace sets, we do this simply by excluding the attack traces and thereby we direct the refinement. Whether it is a data refinement, action refinement of another form of refinement (see Section III) may vary but all forms are possible. We will illustrate the most relevant ones by our case study, after introducing the formal notion of Dependability Refinement in the remainder of this section.

Intuitively, a refinement changes some aspect of the type of the state, for example, replaces a data type by a richer datatype or restricts the behaviour of the actors. The former is expressed directly by a mapping of datatypes, the latter is incorporated into the state transition relation of the Kripke structure that corresponds to the transformed model. In other words, we can encode a refinement within our framework as a relation on Kripke structures that is parameterized additionally by a function that maps the refined type to the abstract type. The direction "from refined to abstract" of this type mapping may seem curiously counter-intuitive. However, the actual refinement is given by the refinement that uses this function as an input. The refinement then refines an abstract to a more concrete system specification. The additional layer for the refinement can be formalized in Isabelle as a refinement relation E . The relation refinement is typed as a relation over triples -a function from a threefold Cartesian product to bool, the type containing true and false only. The type variables σ and σ input to the type constructor Kripke represent the abstract state type and the concrete state type. Consequently, the middle element of the triples selected by the relation refinement is a function of type σ ⇒ σ mapping elements of the refined state to the abstract state. The expression in quotation marks after the type is again the infix syntax in Isabelle that allows the definition of mathematical notation instead of writing refinement in prefix manner. This nicer infix syntax is already used in the actual definition. Finally, the arrow =⇒ is the implication of Isabelle's meta-logic while −→ is the one of the object logic HOL. They are logically equivalent but of different types: within a HOL formula P , for example, as below ∀x.P −→Q, only the implication −→ can be used.

The definition of K E K' states that for any state s of the refined Kripke structure that can be reached by the state transition in zero or more steps from an initial state s of the refined Kripke structure, the mapping E from the refined to the abstract model's state must preserve this reachability, i.e., the image of s must also be an initial state and from there the image of s under E must be reached with 0 or n steps.

A first direct consequence of this definition is the following lemma where the operator ' in E'(init K') represents function image, that is the set, {E(x).x ∈ init K'}.

A more prominent consequence of the definition of refinement is that of property preservation. Here, we show that refinement preserves the CTL property of EFs which means that a reachability property true in the refined model K' is already true in the abstract model. A state set s represents a property in the predicate transformer view of properties as sets of states. The additional condition on initial states ensures that we cannot "forget" them.

theorem prop_pres:

It is remarkable, that our definition of refinement by Kripke structure refinement entails property preservation and makes it possible to prove this as a theorem in Isabelle once for all, i.e., as a meta-theorem. However, this is due to the fact that our generic definition of state transition allows to explicitly formalize such sophisticated concepts like reachability. For practical purposes, however, the proof obligation of showing that a specific refinement is in fact a refinement is rather complex justly because of the explicit use of the transitive closure of the state transition relation. In most cases, the refinement will be simpler. Therefore, we offer additional help by the following theorem that uses a stronger characterization of Kripke structure refinement and shows that our refinement follows from this.

theorem strong_mt:

This simpler characterization is in fact a stronger one: we could have s→ σ s in the refined Kripke structure K' and ¬(E(s)→ σ E(s )) but neither s nor s are reachable from initial states in K'. For cases, where we want to have the simpler onestep proviso but still need reachability we provide a slightly weaker version of strong_mt.

The idea of property preservation coincides with the classical idea of trace refinement as defined for process algebra semantics like CSP's. In these semantics, the behaviour of a system is defined as the set of its traces, which are all possible sequences of events reflecting each moment of system execution. Consequently, a system S' is a refinement of another system S if the semantics of S' is a subset of the traces of the former one. We adopt this idea in principle, but extend it substantially: our notion additionally incorporates structural refinement. Since we include a state map σ'⇒σ in our refinement map, we additionally allow structural refinement: the state map generalizes the basic idea of trace refinement by traces corresponding to each other but allows additionally an exchange of data types thus allowing exchange of the structure of the system. As we see in the application to the case study in Section VII-A, the refinement steps may sometimes just specialize the traces: in this case the state map σ'⇒σ is just identity.

In addition, we also have a simple implicit version of action refinement. In an action refinement, traces may be refined by combining consecutive system events into atomic events thereby reducing traces. We can observe this kind of refinement in the second refinement step of CWA considered in Section VII-B.

Given the closed formal definition of refinement provided in the previous section, we can now formalize the RR-Cycle by the following predicate.

This predicate encompasses that the Kripke K' is a dependable system refinement of the Kripke structure K if there exists a data type map E such that the refined Kripke structure K' avoids the attack. The RR-cycle is an iterative process. Since refinement is a transitive relation, the individual refinement steps found in each iteration line up to one final refinement step. The formal predicate RR_cycle constitutes the termination condition for the cycle for the specified security goal s and for the lined up individual refinement steps. Only when a sufficient number of interleaved refinement and attack tree iterations have eradicated all attacks on the security goal s will the system specification be secure and this is verified by RR_cycle.

The German Chancellor Angela Merkel has strongly supported the publication of the mobile phone Corona-virus warning app (CWA; [15] ) by publicly proclaiming that "this App deserves your trust" [16] . Many millions of mobile phone users in Germany have downloaded the app with 6 million on the first day. CWA is one amongst many similar applications that aim at the very important goal to "break infection chains" by providing timely information to users of whether they have been in close proximity to a person who tested positive for COVID-19.

The app was designed with great attention on privacy: a distributed architecture [17] has been adopted that is based on a very clever application design whereby clients broadcast highly anonymized identifiers (ids) via Bluetooth and store those ids of people in close proximity. Infected persons report their infection by uploading their ids to a central server, providing all clients the means to assess exposure risk locally, hence, stored contact data has never to be shared.

We are mainly concerned with the architecture and protocols proposed by the DP-3T (Decentralized Privacy-Preserving Proximity Tracing) project [18] . The main reason to focus on this particular family of protocols is the Exposure Notification Framework (ENF), jointly published by Apple and Google [19] , that adopts core principles of the DP-3T proposal. This API is not only used in CWA but has the potential of being widely adopted in future app developments that might emerge due to the reach of players like Apple and Google.

There are, however, competing architectures noteworthy, namely protocols developed under the roof of the Pan-European Privacy-Preserving Proximity Tracing project (PEPP-PT) [20] , e.g. PEPP-PT-ROBERT [21] , that might be characterized as centralized architectures.

Neither DP-3T nor PEPP-PT are synonymous for just one single protocol. Each project endorses different protocols with unique properties in terms of privacy and data protection.

There is a variety of noteworthy privacy and security issues. The debate among advocates of centralized architectures and those in favor of a decentralized approach in particular yields a lot of interesting material detailing different attacks and possible mitigation strategies: [22] , [23] , [24] .

In terms of attack scenarios, we focus on, what might be classified as deanonymisation attacks: Tracking a device (see [24] [p9], [22] [p8]) and identifying infected individuals (see [24] [p5] [22] [p9]).

1) Basic DP-3T architecture: Upon installation, the app generates secret daily seeds to derive so-called Ephemeral Ids (EphIDs) from them. EphIDs are generated locally with cryptographic methods and cannot be connected to one another but only reconstructed from the secret seed they were derived from.

During normal operation each client broadcasts its EphIDs via Bluetooth whilst scanning for EphIDs broadcasted by other devices in the vicinity. Collected EphIDs are stored locally along with associated meta-data such as signal attenuation and date. In DP-3T the contact information gathered is never shared but only evaluated locally.

If patients test positive for the Corona-virus, they are entitled to upload specific data to a central backend server. This data is aggregated by the backend server and redistributed to all clients regularly to provide the means for local risk scoring, i. e., determining whether collected EphIDs match those broadcasted by now-confirmed Corona-virus patients during the last, e. g., 14, days.

In the most simple (and insecure) protocol proposed by DP-3T this basically translates into publishing the daily seeds used to derive EphIDs from. The protocol implemented by ENF and, hence, CWA adopts this low-cost design [17] . DP-3T proposes two other, more sophisticated protocols that improve privacy and data protection properties to different degrees but are more costly to implement. Figure 2 illustrates the basic system architecture along with some of the mitigation measures either implemented in CWA or proposed by DP-3T.

The formal model of CWA uses the Isabelle Infrastructure framework instantiating it by reusing its concept of actors for users and smartphones whereby locations correspond to physical locations. The Ephemeral Ids, their sending and change is added to Infrastructures by slightly adapting the basic state type of infrastructure graphs and accordingly the semantic rules for the actions move, get, and put. The details of the newly adapted Infrastructure are presented in Section VI. Technically, an Isabelle theory file Infrastructure.thy builds on top of the theories for Kripke structures and CTL (MC.thy), attack trees (AT.thy), and security refinement (Refinement.thy). Thus all these concepts can be used to specify the formal model for CWA, express relevant and interesting properties, and conduct interactive proofs (with the full support of the powerful and highly automated proof support of Isabelle). All Isabelle sources are available online [25] .

The Isabelle Infrastructure framework supports the representation of infrastructures as graphs with actors and policies attached to nodes. These infrastructures are the states of the Kripke structure.

The transition between states is triggered by nonparameterized actions get, move, and put executed by actors. Actors are given by an abstract type actor and a function Actor that creates elements of that type from identities (of type string written ''s'' in Isabelle). Actors are contained in an infrastructure graph type igraph constructed by its constructor Lgraph. datatype igraph = Lgraph (location × location)set location ⇒ identity set identity ⇒ (string set × string set × efid) location ⇒ string × (dlm × data) set location ⇒ efid set identity ⇒ location ⇒ (identity × efid) set

In the current application of the framework to the CWA case study, this graph contains a set of location pairs representing the topology of the infrastructure as a graph of nodes and a function that assigns a set of actor identities to each node (location) in the graph. The third component of an igraph assigns the credentials to each actor: a triple-valued function whose first range component is a set describing the credentials in the possession of an actor and the second component is a set defining the roles the actor can take on; most prominently the third component is the efid assigned to the actor. This is initially just a natural number but will be refined to actually represent lists of Ephemeral Ids later when refining the specification. Corresponding projection functions for each of the components of an infrastructure graph are provided; they are named gra for the actual set of pairs of locations, agra for the actor map, cgra for the credentials, lgra for the data at that location, egra for the assignment of current efids to locations, and kgra for the knowledge set for each actor for each location.

To start the CWA case study, we provide an example scenario: the initial values for the igraph components use two locations pub and shop to define the following constants (we omit the data map component ex_locs). The map ex_loc_ass is a function, defined using the λ-calculus of Isabelle, mapping each location to the set of actor identities currently at this location. The component ex_creds is a function assigning actor identities to triplets of which the most important part is the third element that contains the actors' ephemeral ids. The ephemeral ids of all actors present at a location are collected in the component ex_efids that maps each location to the set of these efids. Infrastructures are given by the following datatype that contains an infrastructure graph of type igraph and a policy given by a function that assigns local policies over a graph to all locations of the graph.

There are projection functions graphI and delta when applied to an infrastructure return the graph and the local policies, respectively. The function local_policies gives the policy for each location x over an infrastructure graph G as a pair: the first element of this pair is a function specifying the actors y that are entitled to perform the actions specified in the set which is the second element of that pair. For CWA, the initial infrastructure contains the graph ex_graph with its two locations pub and shop and is then wrapped up with the local policies into the infrastructure Corona_scenario that represents the "initial" state for the Kripke structure.

Corona_scenario ≡ Infrastructure ex_graph local_policies

Policies specify the expected behaviour of actors of an infrastructure. They are given by pairs of predicates (conditions) and sets of (enabled) actions. They are defined by the enables predicate: an actor h is enabled to perform an action a in infrastructure I, at location l if there exists a pair (p,e) in the local policy of l (delta I l projects to the local policy) such that the action a is a member of the action set e and the policy predicate p holds for actor h. enables I l h a ≡ ∃ (p,e) ∈ delta I l. a ∈ e ∧ p h

The privacy protection goal is to avoid deanonymization. That is, an attacker should not be able to disambiguate the set of pairs of real ids and EphIDs. This is abstractly expressed in the predicate identifiable. The clue is to filter the set A of all pairs of real ids and ephemeral ids eid for the given eid. If this set happens to have only one element, that is, there is only one pair featuring this specific eid, then the real id of that efid must be the first component, the identity Id. Gathering the "observations" of the actor Eve over all locations in the above policy formula is achieved by combining all pairs of ids and efids that Eve could infer on her movements through different locations, then building the intersection over those sets to condense the information and finally substracting all pairs in which Eve's own identity appears as first element.

The resulting set of pairs should not allow identification of any eid, that is, it must not be singleton for the global policy to hold, and thus, to protect privacy of users.

The state transition relation is defined as a syntactic infix notation I → i I' and denotes that infrastructures I and I' are in this relation. To give an impression of this definition, we show first the rule defining the state transition for the action get. Initially, this rule expresses that an actor that resides at a location l in graph G (denoted as "a @ G l") and is enabled by the local policy in this location to "get", can combine all ids at the current location (contained in egra G l) with all actors at the current location (contained in agra G l) and add this set of pairs to his knowledge set kgra G using the function update f(l := n) redefining the function f for the input l to have now the new value n.

get: G = graphI I =⇒ a @G l =⇒ enables I l (Actor a) get =⇒ I' = Infrastructure (Lgraph (gra G)(agra G)(cgra G)(lgra G)(egra G) ((kgra G)((Actor a) := ((kgra G (Actor a))(l:= {(x,y). x ∈ agra G l ∧ y ∈ egra G l}))))) (delta I) =⇒ I →i I' Another interesting rule for the state transition is the one for move whose structure resembles the previous one. move: G = graphI I =⇒ a @G l =⇒ a ∈ actors_graph(graphI I) =⇒ l ∈ nodes G =⇒ l' ∈ nodes G =⇒ enables I l' (Actor h) move =⇒ I' = Infrastructure (move_graph_a a l l' (graphI I))(delta I) =⇒ I →i I'

The semantics of this rule is implemented in the function move_graph_a that adapts the infrastructure state so that the moving actor a is now associated to the target location l' in the actor map agra and not any more at the previous location l and also the association of efids is updated accordingly. move_graph_a n l l' g ≡ Lgraph (gra g) (if n ∈ ((agra g) l) ∧ n / ∈ ((agra g) l') then ((agra g)(l := (agra g l) -{n})) (l' := (insert n (agra g l'))) else (agra g)) (cgra g)(lgra g) (if n ∈ ((agra g) l) ∧ n / ∈ ((agra g) l') then ((egra g)(l := (egra g l) -{cgra g n})) (l' := (insert cgra g n)(egra g l'))) else egra g) (kgra g) Based on this state transition and the above defined Corona_scenario, we define the first Kripke structure.

For the analysis of attacks, we negate the security property that we want to achieve, usually expressed as the global policy.

Since we consider a predicate transformer semantics, we use sets of states to represent properties. The invalidated global policy is given by the set scorona. The attack we are interested in is to see whether for the scenario Kripke_scenario ≡ Infrastructure ex_graph local_policies from the initial state Icorona ≡{corona_scenario}, the critical state scorona can be reached, that is, is there a valid attack (Icorona,scorona)?

As an initial step, we derive a valid and-attack for the Kripke structure corona_Kripke using the attack tree proof calculus. From these preparations, we can exhibit that an attack is possible using the attack tree calculus [7] .

To this end, the Correctness theorem AT_EF is simply applied to immediately prove the above CTL statement. This application of the meta-theorem of Correctness of attack trees saves us proving the CTL formula tediously by exploring the state space in Isabelle proofs. Alternatively, we could use the generated code for the function is_attack_tree in Scala (see [7] ) to check that a refined attack of the above is valid.

Clearly, fixed Ephemeral Ids are not really ephemeral. The model presented in Section VI has deliberately been designed abstractly to allow focusing on the basic system architecture and finding an initial deanonymization attack. We now introduce "proper" Ephemeral Ids and show how the system datatype can be refined to a system that uses those instead of the fixed ones.

For the DP-3T Ephemeral Ids [26] , for each day t a seed SK t is used to generate a list of length n = 24 * 60/L, where L is the duration for which the Ephemeral Ids are posted by the smart phone "where PRF is a pseudo-random function (e.g., HMAC-SHA256), "broadcast key" is a fixed, public string, and PRG is a pseudorandom generator (e.g. AES in counter mode)" [26] . From a cryptographic point of view, the crucial properties of the Ephemeral Ids are that they are purely random, therefore, they cannot be guessed, but at the same time if -after the actual encounter between sender and receiver -the seed SK t is published, it is feasible to relate any of the EphIDi to SK t for all i ∈ {1, . . . , n}. For a formalization of this crucial cryptographic property in Isabelle it suffices to define a new type of list of Ephemeral Ids efidlist containing the root SK t (the first efid), a current efid indicated by a list pointer of type nat, and the actual list of efids given as a function from natural numbers to efids. datatype efidlist = Efids "efid" "nat" "nat ⇒ efid"

We define functions for this datatype: efidsroot returning the first of the three constituents in an efidlist (the root SK t ); efids_index giving the second component, the index of the current efid; efids_inc_ind applied to an efidlist increments the index; efids_cur returning the current efid from the list and efids_list for the entire list (the third component).

The first step of refinement replaces the simple efid in the type igraph of the infrastructure type by the new type efidlist. Note, that in the new datatype igraph this change affects only the third component, the credentials, to become identity ⇒ efidlist

The refined state transition relation implements the possibility of changing the Ephemeral Ids by the rule for the action put that resembles very much the rule for get. The important change to the infrastructure state is implemented in the function put_graph_efid that increases the current index in the efidlist in the credential component cgra g n for the "putting" actor identity n and inserts the current efid from that credential component into the egra component, the set of currently "present" Ephemeral Ids at the location l. put_graph_efid n l g ≡ Lgraph (gra g)(agra g) ((cgra g)(n := efids_inc_ind(cgra g n))) (lgra g) ((egra g) (l := insert (efids_cur(efids_inc_ind(cgra g n))) ((egra g l) -{efids_cur(cgra g n)}))) (kgra g)

We can now apply the refinement by defining a datatype map from the refined infrastructure type InfrastructureOne.infrastructure to the former one Infrastructure.infrastructure. definition refmap :: InfrastructureOne.infrastructure ⇒ Infrastructure.infrastructure where refmap I = Infrastructure.Infrastructure (Infrastructure.Lgraph (InfrastructureOne.gra (graphI I)) (InfrastructureOne.agra (graphI I)) (λ h. repl_efr (InfrastructureOne.cgra (graphI I)) h) (InfrastructureOne.lgra (graphI I)) (λ l. λ a. efids_root (InfrastructureOne.cgra (graphI I) a) '(InfrastructureOne.agra (graphI I) l) (λ a. λ l.

if (a ∈ actors_graph(graphI I) ∧ l ∈ nodes (graphI I)) then (λ (x,y). (x, efids_root( InfrastructureOne.cgra (graphI I) (anonymous_actor I y))) '(InfrastructureOne.kgra (graphI I)) a l) else )))

The function anonymous_actor maps an efid back to the actor it is assigned to if that exists which can be expressed in Isabelle using the Hilbert-operator SOME.

anonymous_actor I e = (SOME a :: identity. a ∈ actors_graph (InfrastructureOne.graphI I) ∧ e ∈ range(efids_list (cgra (graphI I) a)))

This refinemnent map is then plugged into the parameter E of the refinement operator allowing to prove corona_Kripke refmap corona_KripkeO where the latter is the refined Kripke structure. The proof of this refinement that adds quite some structure requires a great deal of proving of invariants over the datatype. Surprisingly, we can still prove 

The persistent attack can be abbreviated informally by the action sequence [get,move,move,get] performed by actors Eve, Bob, Eve, and Eve again, respectively. How can a second refinement step avoid that Eve can get identification information for Bob? Can we impose that after the first move of Bob a put action, increasing Bob's efid and thus obfuscating his id, must happen before Eve can do another get? A very simple remedy to impose this and thus exclude the persistent attack is to enforce a put action after every move by an action refinement that binds these actions together. That is, when actors move they simultaneously change their efid. We can implement that change by a minimal update to the function move_graph_a (see Section VI) by adding an increment (highlighted as efids inc ind in the code snippet) of the currently used Ephemeral Id before updating the egra component of the target location. In other words, if Bob moves his egra component is changed automatically, that is, he gets a new efid, to stay with the example. move_graph_a n l l' g ≡ Lgraph (gra g) (if n ∈ ((agra g) l) ∧ n / ∈ ((agra g) l') then ((agra g)(l := (agra g l) -{n})) (l' := (insert n (agra g l'))) else (agra g)) (cgra g)(lgra g) (if n ∈ ((agra g) l) ∧ n / ∈ ((agra g) l') then ((egra g)(l := (egra g l) -{cgra g n})) (l' := insert (efids_cur( efids inc ind (cgra g n)))(egra g l'))) else egra g) (kgra g) This is an action refinement because the move action is changed. It is a refinement, since any trace of the refined model can still be mapped to a trace in the more abstract model just omitting a few steps (the refinement relation is defined using the reflexive transitive closure → * i ): The refinement map refmapI is trivial since the datatypes do not change. Because of → * i intermediate dangerous states in the abstract may be swallowed in the refined model. For example, the state s, where Bob has moved but not yet put the new efid, does not exist in the refined model any more.

As we have seen in Section IV-D and as has been illustrated by the application example, the RR-cycle is an iterative process. The formal predicate constitutes the termination condition for the cycle for the specified security goal: in our case study this has been "non-identifiability". Applying the formal characterization of the predicate RR-cycle to the case study after the two steps detailed in this section, we see that the second refinement step still does not avoid the deanonymization attack. Trying to verify the predicate RR_cycle fails. We find another attack possibility: actor Alice is on her own with the attacker in a location. As a remedy, the final refinement steps ensure that action move is constrained by (a) no actor can move to a location where there are less than three actors and (b) also no actor leaves a location when there are not at least four actors.

Step (a) is achieved by refining the specification such that it is only permitted to move if there are three or more actors at the destination location l'. Isabelle's mathematical library provides the necessary arithmetic and set theory to express this constraint as card (agra g l') ≥ 3. The constraint can be added as an additional precondition to the function move_graph_a for the field agra (and similarly for the egra field, see previous Section VII-B). Similarly,

Step (b) can be achieved by refining the specification further to allow an actor to move if the source destination l has at least three actors by adding card (agra g l') ≥ 4 to the agra field and similarly for egra. Summarizing, these two steps lead to a third refinement step that does not change the datatypes (as in the previous refinement step) but restricts the set of possible traces of state transition relations as is manifested in the refined definition of move_graph_a. The additional conditions are highlighted. move_graph_a n l l' g ≡ Lgraph (gra g) (if n ∈ ((agra g) l) ∧ n / ∈ ((agra g) l') ∧ card (agra g l') ≥ 3 ∧ card (agra g l) ≥ 4 then ((agra g)(l := (agra g l) -{n})) (l' := (insert n (agra g l'))) else (agra g)) (cgra g)(lgra g) (if n ∈ ((agra g) l) ∧ n / ∈ ((agra g) l') ∧ card (agra g l') ≥ 3 ∧ card (agra g l) ≥ 4 then ((egra g)(l := (egra g l) -{cgra g n})) (l' := insert (efids_cur(efids_inc_ind (cgra g n)))(egra g l'))) else egra g) (kgra g)

The proved meta-theory for attack trees can be applied to make this theorem more intuitive: the contraposition of the Correctness property grants that if there is no attack on (I,¬f), then EF ¬f does not hold in the Kripke structure. This yields the theorem in the following more understandable form since the AG f statement corresponds to ¬EF ¬f. [29] first define the fundamental notions of insiderness, policies, and behaviour showing how these concepts are able to express the classical insider threat patterns identified in the seminal CERT guide on insider threats [30] . This Isabelle Insider framework has been applied to auction protocols [31] , [32] illustrating that the Insider framework can embed the inductive approach to protocol verification [33] . An Airplane case study [34] , [35] revealed the need for dynamic state verification leading to the extension of adding a mutable state. Meanwhile, the embedding of Kripke structures and CTL into Isabelle have enabled the emulation of Modelchecking and to provide a semantics for attack trees [6] , [7] , [36] - [38] . Attack trees have provided the leverage to integrate Isabelle formal reasoning for IoT systems as has been illustrated in the CHIST-ERA project SUCCESS [5] where attack trees have been used in combination with the Behaviour Interaction Priority (BIP) component architecture model to develop security and privacy enhanced IoT solutions. This development has emphasized the technical rather than the psychological side of the framework development and thus branched off the development of the Isabelle Insider framework into the Isabelle Infrastructure framework. Since the strong expressiveness of Isabelle allows to formalize the IoT scenarios as well as actors and policies, the latter framework can also be applied to evaluate IoT scenarios with respect to policies like the European data privacy regulation GDPR [39] . Application to security protocols first pioneered in the auction protocol application [31] , [32] has further motivated the analysis of Quantum Cryptography which in turn necessitated the extension by probabilities [40] - [42] .

Requirements raised by these various security and privacy case studies have shown the need for a cyclic engineering process for developing specifications and refining them towards implementations. A first case study takes the IoT healthcare application and exemplifies a step-by-step refinement interspersed with attack analysis using attack trees to increase privacy by ultimately introducing a blockchain for access control [6] . First ideas to support a dedicated security refinement process are available in a preliminary arxive paper [43] but the current publication is the first to fully formalize the RRcycle and illustrate its application completely on the Coronavirus Warn App (CWA). The earlier workshop publication [3] provided the formalisation of the CWA illustrating the first two steps but it did not introduce the fully formalised RRcycle nor did it apply it to arrive at a solution satisfying the global privacy policy.

As has been illustrated on a large scale view in Section III, the concept of refinement features in various other formal methods. The most closely related one is the CSP failure/divergences-refinement as we have seen in the earlier section. Compared to the RR-cycle approach in the Isabelle Infrastructure framework, CSP offers a range of standardized constructors providing a calculus for event-based specification refinements. Clearly, CSP has thus a constructive advantage but as has already been discussed in Section III-C, Isabelle is expressive enough to emulate those constructors.

In general in CSP, the refinement is focused on events whereas in data-oriented specification formalism, for example, the B-method [13] , [44] , Z and its object-oriented exension Object-Z [45] , refinement means data refinement, that is, is represented as a relation between abstract and concrete datatypes. In early works, automated techniques have already been used to derive a concrete specification from an abstract one provided the concrete data types and refinement relation are given [46] - [48] . To find a refinement, that is, design the refined specification and define the refinement relation is a creative and ambigious process; it reflects the development skill of a programmer. An approach to derive refinement relations and refined specification for Object-Z specifications based on heuristics is [49] . By contrast to these earlier work, we make a substantial leap forward as we have a constructive way of finding a next refinement step guiding this process by attack tree analysis. The resulting RR-cycle thus provides a systematic way to construct refinements.

Statecharts [50] are a formal method focused on graphically modeling system states and their state transitions and allowing a structured view on the state and its data. Statecharts thus also combine data in states as well as dynamic behaviour; their semantics resembles our model. The transitions may be annotated with events that trigger a transition, conditions which guard it, and actions that are executed when the transition "fires". Statecharts may contain data and this data has various layers of abstraction but this is not a refinement rather a view of the same system at different levels of granularity. Concerning verification, an embedding of Extended Hierarchical Automata in Isabelle/HOL [51] , [52] is related to our work but the focus is on finding property preserving abstractions of the data within the Statechart [53] to enable model checking. Even though model checking is also involved there, abstraction is precisely the opposite of refinement. The relationship of Statecharts with CSP and model checking is also investigated [54] albeit in a dedicated extension of the FDR model checker thus not admitting explicit expression of data refinement and reasoning on system property preserving specification refinement and general security properties as we offer here.

In software engineering, dependability aggregates the security attributes confidentiality, integrity, and availability with safety, reliability, and maintainability [1] . In addition to security, we explicitly address safety through formal methods and verification with temporal logic since we can explicitly formalize relevant system properties and prove them on the system specification. However, maintainability (ability of easy maintenance (repair)) and reliability (continuity of correct service) are provided only indirectly because specification and formal proof imply higher quality.

Security verification is quite advanced when it comes to the automated analysis of security (or authentication) protocols. In terms of verification of security using process calculi, the Pi-calculus offers the dedicated model checker Proverif [55] . This work is relevant for our approach since it mechanizes a process calculus with actor-based communication and provides model checking verification but it does not address the idea of refinement let alone providing systematic support for dependability engineering. Security protocols have also been formalized in Isabelle [33] and a similar approach has been shown to integrate well with the Isabelle Insider framework [31] .

In this paper we have presented a formalization of the Refinement-Risk cycle (RR-cycle) within the Isabelle Infrastructure framework which encompasses a notion of formal refinement with a property preservation theory as well as attack trees based on Kripke structures and CTL. The RR-cycle iterates the activities of refinement and attack tree analysis until the termination condition is reached which shows that the global policy is true in the refined model. The application of this formalization of the RR-cycle is illustrated on the case study of the Corona-virus Warning App (CWA). To contrast the development of the RR-cycle we present an account of related refinement techniques in other formal methods. The comparison helps to make some of the used concepts clearer for a larger audience as well as showing up avenues for future work, for example, emulating some of the ready-made refinement constructors available in process calculi like CSP in the Isabelle Infrastructure framework.

The real novelty of the RR-cycle approach presented in this paper is guiding the dependability refinement process by attack trees. It is the expressivity of Isabelle's Higher Order logic that permits constructing -within the logic -a theory for refinement and attack trees. Thereby, the actual mechanism used in the framework for the analysis of security applications can be explicitly defined from first principles and its properties can be mathematically proved in the framework. At the same time, this generic meta-theory can be applied to application examples. That is, the defined constructs and proved theorems of the Infrastructure framework can now be instantiated to represent a concrete application example, verify security policies on it, find attacks using attack trees, and define refinement relations to new models and apply the RR-cycle constructs to verify -all this within the same framework and thus with the highest consistency and correctness guarantees due to both.

Basic concepts and taxonomy of dependable and secure computing

On the derivation of secure components

Modeling and analyzing the coronavirus warning app with the isabelle infrastructure framework

Isabelle/HOL -A Proof Assistant for Higher-Order Logic, ser

Success: Secure accessibility for the internet of things

Combining secure system design with risk assessment for iot healthcare systems

Attack trees in isabelle

Communicating Sequential Processes

A Calculus of Communicating Systems, ser. Lecture Notes in Computer Science

Communicating and mobile systems -the Pi-calculus

Data semantics

The Z notation -a reference manual, ser. Prentice Hall International Series in Computer Science

The B-book -assigning programs to meanings

The Theory and Practice of Concurrency, ser. Prentice Hall Series in Computer Science

The Corona-Warn-App Project

2020, german government announcement and support of Coronavirus warning app

Corona-warn-app solution architecture

Decentralized Privacy-Preserving Proximity Tracing

Exposure notification framework

ROBust and privacy-presERving proximity Tracing protocol

Analysis of DP3T: Between Scylla and Charybdis

Response to 'Analysis of DP3T: Between Scylla and Charybdis

Privacy and security risk evaluation of digital proximity tracing systems

3T/documents/blob/master/Security analysis/Privacy and Security Attacks on Digital Proximity Tracing Systems

2020, isabelle sources for IBC formalisation

Decentralized privacy-preserving proximity tracing -White Paper

Invalidating policies using structural information

Combining generated data models with formal invalidation for insider threat analysis

Modeling and verification of insider threats using logical analysis

The CERT Guide to Insider Threats: How to Prevent, Detect, and Respond to Information Technology Crimes (Theft, Sabotage, Fraud)

Towards formal analysis of insider threats for auctions

Insider threats for auctions: Formal modeling, proof, and certified code

The inductive approach to verifying cryptographic protocols

Investigating airplane safety and security against insider threats using logical modeling

Applying the isabelle insider framework to airplane security

A proof calculus for attack trees

Human centric security and privacy for the iot using formal techniques

Formal models of human factors for security and privacy

Formal modeling and analysis of data protection for gdpr compliance of iot healthcare systems

Qkd in isabelle -bayesian calculation

Attack trees in isabelle extended with probabilities for quantum cryptography

Formalizing probabilistic quantum security protocols in the isabelle infrastructure framework

A formal development cycle for security engineering in isabelle

The B tool (abstract)

The Object-Z Specification Language

Program derivation using the refinement calculator

Data refinement by calculation

A calculus of refinements for program derivation

Heuristics for refinement relations

Statecharts: A visual formalism for complex systems

Representing hierarchical automata in interactive theorem provers

Formalizing statecharts using hierarchical automata

Formal Techniques for Networked and Distributed Systems -FORTE 2005, 25th IFIP WG 6.1 International Conference

Verifying statemate statecharts using CSP and FDR

Automatic verification of security protocols in the symbolic model: the verifier ProVerif