key: cord-0048942-h2l6ozkv authors: Hamers, Ruben; Jongmans, Sung-Shik title: Discourje: Runtime Verification of Communication Protocols in Clojure date: 2020-03-13 journal: Tools and Algorithms for the Construction and Analysis of Systems DOI: 10.1007/978-3-030-45190-5_15 sha: bcba5ee37fb265cdabe360b3995c71ec102c8fb5 doc_id: 48942 cord_uid: h2l6ozkv This paper presents Discourje: a runtime verification framework for communication protocols in Clojure. Discourje guarantees safety of protocol implementations relative to specifications, based on an expressive new version of multiparty session types. The framework has a formal foundation and is itself implemented in Clojure to offer a seamless specification–implementation experience. Benchmarks show Discourje’s overhead can be less than 5% for real/existing concurrent programs. Background. To take advantage of today's and tomorrow's multi-core processors, shared-memory concurrent programming-a notoriously complex enterprise-is becoming increasingly important. To alleviate some of the complexities, in addition to low-level synchronization primitives, several modern programming languages have started to offer core support for higher-level communication primitives as well, in the guise of message passing through channels (e.g., Go [25] , Rust [42] , Clojure [17] ). The idea is that, beyond their usage in distributed computing, channels can also serve as a programming abstraction for shared memory, supposedly less prone to concurrency bugs than locks, semaphores, and the like. However, in a recent study of 171 concurrency bugs in popular open source Go programs [48] , Tu et al. found that "message passing does not necessarily make multi-threaded programs less error-prone than shared memory." From a programmer's perspective, a key problem is this: if we already know which roles (threads), infrastructure (channels between threads), and protocols (communications through channels) our program should consist of, then how can we ensure our implementation is indeed safe relative to our specification? Safety means "bad" channel actions never occur: if a send, receive, or close happens in the implementation, then it is allowed by the protocol in the specification. For instance, typical protocols rule out common message-passing concurrency bugs [48] , such as sends without receives, receives without sends, and type mismatches (actual type sent = expected type received). Essentially, thus, we face a classical verification problem, with classical ingredients: an implementation language I, a specification language S, and an inclusion relation . Over the past years, a significant body of research in this area has been based on multiparty session types (MPST) [27] . The idea is to specify protocols as behavioral types [1, 30] against which threads are subsequently type-checked; the impl. Fig. 2 . This paper theory guarantees that static well-typedness of threads at compile-time implies dynamic safety of their channel actions at run-time. Originally [27] , I was a dialect of pi-calculus, S was a calculus of behavioral types, and was defined through formal typing rules, but more recently, practical implementations were developed as well [14, 28, 29, 37, 38, 44] , where I is an existing general-purpose language (GPL; Erlang, F#, Go, Java, Scala), S is a new domain-specific language (DSL; Scribble), and encodes behavioral types in S as non-behavioral types in I (e.g., through custom communication API generation [29] ). These works highlight two key strengths of the MPST methodology, namely it supports: #1 fully automated verification of concrete programs (vs. abstract models); #2 user-friendly programming language-based notation to write specifications of protocols (vs. dynamic logic or temporal logic). Problem. One of the key open problems of MPST concerns expressiveness. For instance, suppose we need to write a program in which messages are repeatedly communicated from threads I 1 and I 2 to thread I 3 , non-deterministically ordered (i.e., standard producers-consumer); this protocol is not supported by MPST. We identify two reasons why expressiveness is limited. First, MPST were originally developed for distributed computing (service choreographies [10, 11] ); accordingly, decoupled verification of roles (per-service type-checking) has always been a key requirement [14] . This is reflected in the MPST workflow ( Fig. 1) : first, the programmer writes a global protocol specification; then, an MPST tool projects it onto every role to infer local protocol specifications; then, the implemented threads are type-checked. However, role-based decomposition of global behavior into equivalent local behaviors often cannot be done statically (e.g., [12] ), so expressiveness is limited by "projectability". Second, MPST prescribes static type-checking, which limits expressiveness, because: (a) type-checking is sound, but not complete, so the static MPST approach rejects implementations that are conservatively ill-typed but actually safe; (b) protocols whose execution relies on value-dependent control flow are supported only in limited circumstances. To alleviate (b), value-dependent type constructors can be added to S [20, 47] , but this raises practical issues (i.e., dependent types are only scarcely supported by mainstream GPLs). Contributions. To simplify shared-memory concurrent programming in languages with channels, we aim to consolidate strengths #1 and #2 (page 267), but alleviate MPST's expressiveness issues. Specifically, this paper is founded on two tenets that depart from existing work in significant ways (Fig. 2) . First, we exploit the fact that in our context, channels serve "merely" as programming abstractions for shared memory; there is no distribution whatsoever. Thus, whereas MPST-based verification for distributed computing requires projection, this is not the case in our setting, opening the door to fully automated projection-free MPST and eliminating a significant source of restrictions. Second, instead of adopting MPST-based verification through static typechecking at compile-time, we explore MPST-based verification through dynamic monitoring at run-time. This enables soundness and completeness, while it also supports value-dependent protocols in a generally implementable way (i.e., we are not aware of a mainstream GPL that does not support our monitoring approach). In this paper, we present our practical embodiment of these ideas: Discourje (pronounced "discourse"), a runtime verification framework for communication protocols in Clojure [17, 26] . Discourje consists of two components: a DSL to specify protocols and construct monitors, and an API to implement protocols (supplementing Clojure) and add instrumentation. While we could have developed this framework for any language with channel-based programming abstractions, including Go and Rust, Clojure is particularly interesting, because: (1) Clojure has a powerful macro system that enabled us to develop the Discourje DSL as an extension to Clojure, thereby offering programmers a seamless specificationimplementation experience; (2) contrasting Go and Rust, Clojure is not a systems language but an applications language, so runtime verification overheads might be more tolerable. We summarize our contributions as follows: Our artifact is available at https://github.com/discourje. Clojure (in a nutshell). Clojure [17, 26] is a general-purpose, impure functional language that compiles to Java bytecode. As a dialect of Lisp, Clojure follows the code-as-data philosophy, provides a powerful macro system, and adopts parenthesized prefix notation. Clojure offers asynchronous channel-based programming abstractions through core library clojure.core.async [16] . In the annual Clojure survey [15] , Clojure programmers indicate "ease of development" is more important than "runtime performance"; this makes Clojure an interesting target for runtime verification (viz. overheads). To introduce the core features of Clojure relevant to this paper, Fig. 3 shows a channel-based concurrent Tic-Tac-Toe program in Clojure, 3 while Fig. 4 summarizes the meaning of every primitive (";;" indicates comments). Lines 1-9 define constants (blank, cross, nought, initial-grid) and functions (get-blank, add, not-final?) to represent Tic-Tac-Toe concepts. Lines 11-12 define two channels (a->b and b->a) that implement the infrastructure through which players Alice and Bob communicate. Channels in Clojure are bounded: sends/receives block until a channel is not full/empty. Lines 14-24 and 25-35 define threads that implement Alice and Bob. Both players execute a loop, starting with a blank grid. In each iteration, Alice first gets the index of some blank space on the grid, then plays a cross in that space, then sends a message to Bob to communicate the index, then awaits a message from Bob, and then updates the grid accordingly; Bob acts symmetrically. After every grid update, Alice or Bob checks if it has reached a final configuration; if so, the loop is exited and channels are closed. Every Clojure data structure, including the vector that implements the grid, is persistent, and therefore, effectively immutable. This means that every operation on an existing data structure leaves it intact, and instead, it returns a new data structure. Thus, Alice and Bob initially share the same initial grid, but because it cannot be modified in-place, modifications need to be explicitly communicated. Persistence of Clojure data structures is also why we can guarantee freedom from data races in pure Clojure (= Clojure without Java objects): if users communicate only Clojure data through channels, race freedom is guaranteed (if Java objects are communicated, the user is responsible to avoid races). Basic Discourje: Tic-Tac-Toe. A basic Discourje specification of the Tic-Tac-Toe protocol for Alice and Bob is shown in Fig. 5 . We typeset Discourje "keywords" (which are actually just Clojure functions and macros) bold violet bold violet bold violet. Lines 1-2 define two roles (role role role) to represent Alice and Bob. Lines 4-6 define an auxiliary specification, inserted twice into the main specification (ins ins ins); it states that the channels between Alice and Bob are closed (-## -## -##), in parallel (par par par). Lines 7-13 define the main specification; it states that recursively (fix fix fix), first a message of type Long (the index of a grid) is communicated from Alice to Bob (--> --> -->), and then from Bob to Alice, unless the channels are closed (the game is done). Square brackets are used to build lists of sub-specifications (sequencing). The Tic-Tac-Toe protocol depends on value-dependent control flow, as Alice and Bob close the channels only once the grid has reached a final configuration. This is a non-protocol-related property that no existing MPST tool supports. 1 (def def def blank " ") (def def def cross "x") (def def def nought "o") ...)) ;; returns a blank space in g 8 (def def def add (fn fn fn [g i x-or-o] ...)) ;; returns g, but with i set to x-or-o 9 (def def def not-final? (fn fn fn [g] ...)) ;; returns true iff g is not final 10 11 (def def def a->b (chan chan chan 1)) (def def def b<-a a->b) ;; b<-a is an alias of a->b 12 (def def def b->a (chan chan chan 1)) (def def def a<-b b->a) ;; a<-b is an alias of b->a 13 14 (thread thread thread ;; alice (recur recur recur g)))))) 24 (close! close! close! a->b)) 25 (thread thread thread ;; bob (recur recur recur g)))))) 35 (close! close! close! b->a)) 1 (def def def alice (role role role "alice")) ;; roles 2 (def def def bob (role role role "bob")) 3 4 (def def def ttt-close (dsl dsl dsl ;; auxiliary spec 5 (par par par (-## -## -## alice bob) 6 (-## -## -## bob alice)))) 7 (def def def ttt (dsl dsl dsl ;; main spec 8 (fix fix fix :X 9 [(--> --> --> alice bob Long) 10 (alt alt alt (ins ins ins ttt-close) 11 [(--> --> --> bob alice Long) 12 (alt alt alt (ins ins ins ttt-close) 13 (fix fix fix :X))])]))) Discourje specification of Tic-Tac-Toe 10 (def def def m (moni moni moni (spec spec spec ttt))) 11 (def def def a->b (chan chan chan 1 alice bob m)) (def def def b<-a a->b) 12 (def def def b->a (chan chan chan 1 bob alice m)) (def def def a<-b b->a) To monitor the implementations of Alice and Bob against this specification, first, we need to load library discourje.core.async instead of clojure.core.async (implicitly loaded in Fig. 3 ). All other code modifications are shown in Fig. 6 : on line 10, the specification is evaluated to an internal form (spec spec spec) and wrapped in a new monitor (moni moni moni), while on lines 11-12, we associate the intended sender, receiver, and monitor with the channels. No other changes are needed: notably, the code for Alice (Fig. 3 , lines [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] and Bob (lines 25-35) is unaffected; Discourje is non-invasive to start using. Running the monitor alongside the implementation guarantees safety: if a non-compliant channel action were to be attempted, the monitor prevents it from happening and throws an exception. The implementation in Fig 3 can indeed violate the specification in Fig. 5 : the specification states channels are allowed to be closed only after (the receive of) the previous communication is done, but in the implementation, Alice or Bob could attempt to close already before. In our artifact, we have a solution where we mix channels with barrier synchronization from the standard java.util.concurrent library (readily usable in Clojure), to let Alice and Bob first await each other and then close. Thus, channel-based programming abstractions monitored through Discourje can be mixed seamlessly with other concurrency libraries, which happens regularly in message passing programs [46, 48] . Advanced Discourje: common patterns. Discourje specifications of common patterns of communication are shown in Fig. 7 ; they make use of Discourje's role indexing and finite repetition (rep rep rep) features. Imagine we have a sequence of worker threads, organized in a pipeline (i.e., the i-th worker receives from its predecessor, i−1, and sends to it successor, i+1). Lines 1-2 define the specification of a communication from a worker to its successor. Intuitively, succ is a function that maps three parameters to a specification. For instance, (ins ins ins succ bob 5 Turn) inserts (--> --> --> (bob 5) (bob 6) Turn), where (bob 5) and (bob 6) are indexed roles. We note that every role created with role role role allows indexing (with arbitrary types), and that specifications can be 1 (def def def succ (dsl dsl dsl :w :i :t 2 (--> --> --> (:w :i) (:w (inc :i)) t))) 3 4 (def def def pipe (dsl dsl dsl :w :k :t 5 (rep rep rep seq [:i (range (dec :k))] 6 (ins ins ins succ :w :i :t)))) 7 8 (def def def ring (dsl dsl dsl :w :k :t 9 [(ins ins ins pipe :w :k :t) 10 (--> --> --> (:w (dec :k)) (:w 0) :t)])) 11 (def def def one-one-one (dsl dsl dsl :m :w :k :t :u We also note that any Clojure function can be used in specifications (e.g., inc, to manipulate indices). Lines 4-6 define the specification of a pipeline communication pattern; it states that specification (ins ins ins succ :w :i :t) is repeated for each value :i from 0 to k-1, and the iterations are composed sequentially (seq). Lines 8-10 extend the pipeline to a ring, where the last worker also communicates with the first. Lines 11-14 define the specification of a communication from a "master" to one of k workers, and back. Similarly, lines 16-19 define the specification of a communication from a master to all of k workers, and back. In these specifications, loop iterations are composed alternatively (alt) and in parallel (par). Implementation calculus. To formalize our verification problem, we first define a calculus to model Clojure implementations. Let range over heap locations, x over variables, v over values, and I over implementations. The calculus is generated by the following grammar: v ::= nil | | fn x I | true | false | 0 | 1 | 2 | ... Calculus notation corresponds closely with Clojure notation (Fig. 4) , with the exception of application (I 1 I 2 ), sequencing (I 1 · I 2 ), and threading (I 1 I 2 ). The operational semantics of the calculus is defined in terms of labeled reductions of triples (I, E, H): I is an implementation, E is a global environment (from variables to values), and H is a heap (from heap locations to channel states). Channel states are represented as pairs (w, n), where w is a list of values (messages in transit, from left to right), and n the buffer size. Labels, ranged over by α, are of the form !v (send), ?v (receive), # (close), and τ (anything else; we verify only channel actions). The reduction rules are shown in Fig. 8 . Rule [I-Ctxt] executes the first step of implementation I in context C: it first substitutes I for in C (notation: C[I]), and then executes the first step. H( ) = (w, n) and |w| < n Contexts are generated by the following grammar: Specification calculus. Next, we define a calculus to model Discourje specifications. Let p, q range over roles, f over boolean functions (from the implementation calculus), n, m over number expressions (from the implementation calculus), [S↓-One] S i∈{1,2} ↓ S1 + S2 ↓ [S↓-Alt] S1 ↓ and S2 ↓ S1 · S2 ↓ [S↓-Seq] S1 ↓ and S2 ↓ S1 S2 ↓ [S↓-Par] Fig. 9 . Operational semantics of the specification calculus (termination) and ⊗ over {+, ·, }. The calculus is generated by the following grammar: ?v. We note that the specification calculus has no τ -reductions (which are not monitored; we verify only channel actions). We also note that it can express some, but not all, context-free languages: it can count (using ), but it cannot encode a stack. Inclusion relation. Finally, we define a relation to decide if the behavior of an implementation I is included in the behavior of a specification S. First, let † range over functions from heap locations to sender-receiver pairs; informally, † establishes a correspondence between channel references in the implementation (characterized by their heap locations) and channel references in the specification (characterized by the roles that use them as sender/receiver). Next, let → I ⊆ →. We call → I an execution of I if it satisfies these conditions: In words, (Î, E, H) R S iff wheneverÎ can reduce toÎ , S can reduce accordingly to S (andÎ and S are again related by R), up to τ -reductions (R is weak [24] ). Implementation I is safe relative to specification S, denoted as I S, if for every execution → I of I, there is a ( †, → I )-simulation R such that (I, ∅, ∅) R S. The DSL. The DSL consists of: Clojure macros to write specifications (cf. syntax of the specification calculus; Sect. 3); Clojure data structures to represent specifications as state machines (cf. operational semantics of the specification calculus); Clojure functions to instantiate these data structures and construct monitors. The workflow is shown in Fig. 11 : first, the programmer writes a specification S using the macros; then, at run-time, function spec spec spec is applied to S to expand and evaluate the macros to a data structure S ; then, function moni moni moni is applied to S to construct a monitor. Essentially, the monitor provides two operations, depicted as "lollipops" in Fig. 11 : checking if a given channel action α is allowed by S (formally: S α − → S for some S ), and subsequently updating S to its successor. In this way, effectively, the monitor incrementally builds a formal simulation to ensure safety (Sect. 3, page 275). We note that checking/updating is protected by lock-free synchronization (compare-and-set): an α reduction happens only if it was already checked if α is allowed, and the state has not yet been updated after that check. The API. The API consists of Clojure functions that act as proxies for Clojure's own functions to send, receive, close channels, and construct channels. The workflow is shown in Fig. 12 : first, the programmer writes an implementation I using Clojure's own functions; then, by loading library discourje.core.async instead of clojure.core.async, the programmer adds instrumentation to the implementation that allows channel actions to be monitored. As the signatures of Discourje's send, receive, and close functions are identical to Clojure's, adding instrumentation in this way is non-invasive and nearly effortless; the only changes needed, pertain to channel creation (Sect. 2, Fig. 6 ), since we require the programmer also to specify which roles will use the channel and associate a monitor (this is the practical embodiment of function † in Sect. 3, page 275). 4 Discourje's send function works as follows. When invoked, first, it waits until the underlying channel c is not full (recall channels in Clojure are bounded and blocking). Then, at time t 1 , it calls the monitor associated with c to check if the send is allowed. If yes, at time t 2 , it calls the monitor to update accordingly and the "actual send" happens through c; if no, only an exception is thrown. If, between t 1 and t 2 , multiple threads call the monitor to update, only one will succeed; the others need to retry from the start. Discourje's receive and close functions work similarly. In this way, Discourje detects safety violations in a way that is both sound (if an exception is thrown, the violating action really was not allowed) and complete (if no exception is thrown, all actions were really allowed). -Multi-cast: Adding to Clojure's send, receive, and close functions, the API also contains a multi-cast function to send the same value through n>1 channels, along with special monitoring support in the DSL (more efficient than monitoring individual communications). Also, the API contains a "multireceive" function that optionally synchronizes all receivers of a multi-cast. -Java interoperability: Clojure compiles to Java bytecode and runs on the JVM; this enabled us to extend Discourje to Java. Specifically, we wrote a thin Java wrapper around Discourje, so Java programmers can easily construct and use Discourje channels, write specifications, and have them monitored from inside their Java programs, regardless of the threading mechanism (e.g., classical Java threads, thread pools, and parallel streams can be used). General setup. We developed Discourje for two primary usage types: 4 We currently support the following main channel operations of clojure.core.async: sending, receiving, and closing. Discourje works out-of-the-box for all Clojure programs, except those that use unsupported clojure.core.async features; mixing Discourje with other concurrency libraries is fine (Sect. 2). An interesting next step is to also support clojure.core.async's transducers (operations on data-in-transit): to our knowledge, no existing work on MPST supports transducers, so supporting those requires significant new theoretical work. A. as a testing/debugging tool for concurrent programs in development, to reliably find/diagnose communication-related concurrency bugs; B. as a fail-safe mechanism for concurrent programs in production, to prevent propagation of spurious results caused by concurrency bugs to end-users (i.e., it is better to throw a runtime error, cf. ArrayIndexOutOfBoundsException.) A key factor that determines Discourje's fitness for purpose is overhead. We therefore conducted two kinds of benchmarks: microbenchmarks to study the scalability of Discourje and whole-program benchmarks to study the slowdown it inflicts relative to unmonitored code. We used two different hardware configurations to run our benchmarks: vm2 is an instance of the TACAS'20 Artifact Evaluation Virtual Machine for Vir-tualBox, configured with 2 virtual cores and 8 GB of virtual memory; lisa is a high-end machine with 16 physical cores (Intel Xeon 6130 processor; hyperthreading disabled) and 96 GB of physical memory (far more than needed for our benchmarks). We hosted vm2 on a machine with 4 physical cores (Intel Core i7-8569U; hyper-threading enabled) and 16 GB of physical memory. Microbenchmarks. In the microbenchmarks, we studied Discourje's scalability under extreme circumstances where threads perform only sends/receives and no real computations; this is the worst-case scenario for the lock-free algorithm to synchronize monitor access, as it gives rise to maximal thread contention. We considered three specifications to investigate the core features/operators offered by the Discourje DSL in isolation, using our built-in common patterns (Fig. 7) : ring for sequential composition, one-one-one (OOO) for alternative composition, and one-all-one (OAO) for parallel composition. Each pattern was recursively repeated (i.e., wrapped in (fix fix fix :X [... (fix fix fix :X)]). For Ring and OAO, a round consists of 1000 repetitions; for OOO, a round consists of 1000·n repetitions, where n is the number of worker threads. For each implementation I ∈ {Ring, OOO, OAO} with n ∈ {2, 4, 6, 8, 10, 12, 14, 16} worker threads, 5 we recorded the mean round latency µ I n in eight hours of execution on lisa, the standard deviation σ I n , and the coefficient of variation c I n = µ I n σ I n . We found c I n ≤ 6% for all I and n, except c OOO 6 = 14% and c OOO 8 = 8%. As a measure of scalability, we computed normalized means |µ I n | = µ I n 0.5·n·µ I 2 : this metric is a dimensionless number that indicates the extent to which implementations scale linearly in the number of worker threads, relative to n = 2. For instance, if |µ I 16 | = 1, I with 16 workers threads is exactly 8× as slow as I with 2 worker threads; this is reasonable, because the worker threads perform 8× more sends and receives in each round (due to the adversarial microbenchmark conditions, the sends and receives are effectively linearized by the monitor, which can check and update at most one channel action at a time). The normalized means are shown in Fig. 13 ; our raw data (including standard deviations) are included in our artifact. We summarize the findings: -Ring (blue) scales sub-linearly. This is because at any point in time, only one worker thread contends for monitor access (the current receiver or sender; the others are blocked, waiting for incoming channels to become non-empty). -OOO (red) scales linearly, stabilizing around a constant factor of 1.4. This is because the number of branches in the monitor's internal state machine grows linearly in the number of worker threads. Thus, the cost of using the monitor grows proportionately, but the factor is constant. -OAO (yellow) scales super-linearly, getting progressively worse as the number of worker threads increases. This is because all worker threads contend for monitor access all the time, and the number of branches in the monitor's state machine increases linearly. To conclude, Ring (which exercises sequential composition) enjoys excellent scalability, while OOO (which exercises alternative composition) enjoys decent scalability, even under the adversarial microbenchmark conditions. Scalability of OAO (parallel composition) can be improved; we discuss one avenue in Sect. 7. Whole-program benchmarks. In our whole-program benchmarks, we studied Discourje's possible slowdown in five real(istic)/existing concurrent programs: -Chess: Simulates a game of chess between two player threads. -Conjugate Gradient (CG-n): Computes an estimate of the largest eigenvalue of a symmetric positive definite sparse matrix with a random pattern of nonzeros, using the conjugate gradient algorithm, with n worker threads. We also wrote specifications for these implementations in the Discourje DSL. For Chess, the specification is the same as the Tic-Tac-Toe specification (Sect. 2); for CG, FT, IS, and MG, the specifications consist of recursively repeated choices among various instances of the one-all-one pattern (each of which involves different subsets of worker threads and message types); the key difference between the specifications, then, is the frequency in which repetitions occur. We recorded execution times of each of the implementations without and with monitoring enabled, using existing/standardized workloads. For Chess, the workload is controlled by the total amount of time each player has to compute its moves during the entire game; we used the four smallest such workloads supported by the open source chess server Lichess (https://lichess.org), namely {15, 30, 45, 60} seconds, and we limited games to a maximum of 40 turns per player (UltraBullet chess). 6 For CG, FT, IS, and MG, the workload is controlled by the input size; we used the standardized inputs that are predefined by NPB. We ran Chess on vm2; we ran CG-n, FT-n, IS-n, and MG-n on vm2 for n = 2 and on lisa for n ∈ {2, 4, 6, 8, 10, 12, 14, 16}. We repeated each of the runs 50 times to smooth out variability; the resulting coefficients of variation are below 5% for CG, FT, IS, and MG, and between 19%-22% for Chess (because moves are not computed deterministically, which affects the number of turns per game). As a measure of slowdown, we computed normalized means of execution times with monitoring, µ w , against those without monitoring, µ wo (i.e., µw µwo ): this metric is a dimensionless number that indicates the factor by which monitoring slows down the implementation. The normalized means are shown in Figs. 14-15; the raw data (including standard deviations) are included in our artifact. We summarize the findings: -For Chess, for three workloads, slowdowns are <1. As the number of instructions per channel action is, objectively, higher with monitoring than without, we suspect these observed speedups might be an artifact of the variability in the measurements. That said, the general trend suggests both usage types of Discourje (page 277) are very well possible for Chess. -For FT and IS, the slowdowns are low: less than 5% and 2% respectively. This seems low enough not only for Discourje's usage type A (testing/debugging in development), but even usage type B (fail-safe mechanism in production). -For CG and MG, the slowdowns are higher: less than 5× and 2.5× respectively. Although this might be too much for Discourje's usage type B, it seems low enough for usage type A (cf. the industrial-strength Valgrind tool for memory debugging [35] , which typically inflicts a ≥10× slowdown). The difference in performance between {FT, IS} and {CG, MG} may be explained by the fact the latter are more communication-intensive than the former, so the overhead of monitoring communications is more pronounced. -For CG, FT, IS, and MG, the slowdowns grow only linearly as the number of threads increases. This shows that the super-linear scalability we observed under the adversarial microbenchmark conditions for the one-all-one pattern, does not manifest in these real programs. To conclude, we believe it is encouraging to see that even (extended versions of) the specification that scaled poorest in our microbenchmarks, can give well enough performance in real concurrent programs for both usage types A and B. Expressiveness issues of multiparty session types (MPST) have received some attention, but efforts have primarily been geared towards adding more advanced features (e.g., time [5, 36] , security [7, 8, 9, 13] , and parametrisation [14, 20, 39] ); in contrast, restrictions on the usage of core features like choice and interleaving have remained, even though they limit MPST's applicability in practice (e.g., our Tic-Tac-Toe specification cannot be expressed; Fig. 5 ). Recently, work has been done to improve MPST's expressiveness in this regard using static techniques [31] , but our specification language in this paper is still more expressive. Closest to our work, then, are hybrid MPST approaches that combine static type-checking with a form of distributed runtime monitoring and/or assertion checking [3, 4, 19, 36, 37] . In contrast to this paper, however, these dynamic techniques still rely on projection, which limits expressiveness (Sect. 1); none of the specifications in this paper are supported. Projection-free MPST has also been explored by López et al. [34, 43] . Their idea is to specify MPI communication protocols in an MPI-tailored DSL, inspired by MPST, and verify the implementation against the specification using deductive verification tools (VCC [18] and Why3 [21] ). However, this approach does not support push-button verification: considerable manual effort is required. In contrast, our approach is fully automated. We are aware of only two other works that use formal techniques to reason about Clojure programs: Bonnaire-Sergeant et al. [6] formalized the optional type system for Clojure and proved soundness, while Pinzaru et al. [41] developed a translation from Clojure to Boogie [2] to verify Clojure programs annotated with pre/post-conditions. Ours is the first paper that targets concurrency in Clojure. Verification of shared-memory concurrency with channels has received attention in the context of Go [40, 32, 33, 45] . However, emphasis in these works is on checking deadlock-freedom, liveness, and generic safety properties, while we focus on program-specific protocol compliance. Castro et al. [14] also consider protocol compliance, but their specification language (of global types) is less expressive than ours and does not support this paper's examples. We presented Discourje: a runtime verification framework for channel-based communication protocols in Clojure. Discourje is based on a projection-free interpretation of multiparty session types, trading static type-checking for dynamic runtime monitoring to alleviate expressiveness issues. A key design principle of Discourje has been ergonomics: we aim to make Discourje's use as comfortable as possible. Specifically, programmers can decide to start using Discourje at any stage of development (and doing so requires little effort); Discourje is itself implemented in Clojure (so no need to use a different IDE, learn completely new syntax, or install special compilers); and Discourje can be used seamlessly alongside other concurrency libraries. The framework has a formal foundation, and benchmarks indicate that monitoring overhead can be less than 5% for real/existing concurrent programs. This makes Discourje suitable both as a testing/debugging tool in development, and as a fail-safe mechanism in production. We list two interesting avenues for future work. First, we want to refine our lock-free synchronization algorithm to enhance the way parallel composition is handled. Second, a much more profound extension pertains to feedback and recovery. Specifically, we want to explore the idea that whenever a monitor detects a violation, instead of throwing an exception, it should simply delay the violating action as a corrective measure, in an attempt to steer the implementation toward safe behavior. When done naively, such delays can easily yield deadlocks, so our plan is to combine this with runtime model-checking/reachability analysis to check if eventually, the violating action is allowed (if yes, delay; if no, throw). Behavioral types in programming languages Boogie: A modular reusable verifier for object-oriented programs Monitoring networks through multiparty session types A theory of design-by-contract for distributed multiparty interactions Timed multiparty session types Practical optional types for clojure Typing access control and secure information flow in sessions Information flow safety in multiparty sessions Session types for access and information flow control Structured communication-centred programming for web services Structured communication-centered programming for web services On global types and multiparty session Self-adaptation and secure information flow in multiparty communications Distributed programming using role-parametric session types in go: statically-typed endpoint apis for dynamically-instantiated communication structures Clojure Team: Clojure -State of Clojure Clojure Team: Clojure -Clojure core.async Channels Clojure Team: Clojure (nd) VCC: A practical system for verifying concurrent C Practical interruptible conversations: distributed dynamic verification with multiparty session types and python Parameterised multiparty session types Why3 -where programs meet provers Introduction to Process Algebra Performance and scalability of the NAS parallel benchmarks in java Branching time and abstraction in bisimulation semantics The Go Programming Language The clojure programming language Multiparty asynchronous session types Hybrid session verification through endpoint API generation Explicit connection actions in multiparty session types Foundations of session types and behavioural contracts Exploring type-level bisimilarity towards more expressive multiparty session types Fencing off go: liveness and safety for channel-based programming A static verification framework for message passing in go using behavioural types Protocol-based verification of message-passing parallel programs Valgrind: a framework for heavyweight dynamic binary instrumentation Timed runtime monitoring for multiparty conversations A session type provider: compiletime API generation of distributed protocols with refinements in f# Let it recover: multiparty protocol-induced recovery Pabble: parameterised scribble Static deadlock detection for concurrent go by global session graph synthesis Towards static verification of clojure contract-based programs Deductive verification of parallel programs using why3 A linear decomposition of multiparty sessions for safe distributed programming Static trace-based deadlock analysis for synchronous mini-go Why do scala developers mix the actor model with other concurrency models? Certifying data in multiparty session types Understanding real-world concurrency bugs in go ), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use