Hunting Superfluous Locks with Model Checking

Viet-Anh Nguyen (1), Wendelin Serwe (2), Radu Mateescu (2), and Eric Jenn (1)

(1) IRT Saint Exupéry, Toulouse, France
(2) Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP (Institute of Engineering Univ. Grenoble Alpes), LIG, 38000 Grenoble, France

Abstract. Parallelization of existing sequential programs to increase their performance and exploit recent multi- and many-core architectures is a challenging but inevitable effort. One increasingly popular parallelization approach is based on OpenMP, which enables the designer to annotate a sequential program with constructs specifying the parallel execution of code blocks. These constructs are then interpreted by the OpenMP compiler and runtime, which assigns blocks to threads running on a parallel architecture. Although this scheme is very flexible and not (very) intrusive, it does not prevent the occurrence of synchronization errors (e.g., deadlocks) or data races on shared variables. In this paper, we propose an iterative method to assist OpenMP parallelization by using formal methods and verification. In each iteration, potential data races are identified by applying to the OpenMP program a lockset analysis, which computes the set of shared variables that potentially need to be protected by locks. To avoid the insertion of superfluous locks, an abstract, action-based formal model of the OpenMP program is extracted and analyzed using the ACTL on-the-fly model checker of the CADP formal verification toolbox. We describe the method, compare it with existing work, and illustrate its practical use.

1 Introduction

Nowadays, to take full advantage of modern hardware architectures (multi-core and many-core processors, Systems-on-Chip, etc.), it is necessary to parallelize applications, even in constrained environments, such as avionics. Designing correct parallel programs on shared-memory architectures is a difficult task facing classical synchronization issues, such as the presence of data races (concurrent accesses to shared variables that make the program nondeterministic [20]) or deadlocks, both of which are unacceptable for critical systems. These difficulties occur not only in the design of new parallel programs, but also in the parallelization of existing sequential programs, which have been optimized over the years and for which it is too costly to redevelop parallel versions from scratch.
An increasingly popular approach to parallelize sequential code is based on OpenMP [33], which does not require modifying the code, but simply annotating it with parallelization constructs expressing a variety of mechanisms (creating parallel regions executed by teams of threads, inserting locks on variables and array elements, introducing synchronizations, etc.). The underlying compiler and execution framework are in charge of implementing these constructs, building and executing the parallel program on a given architecture. Unfortunately, OpenMP is not equipped with a formal semantics suitable for reasoning about OpenMP programs and ensuring the correctness of annotation-based parallelization: [33, Section 1.1] explicitly states that "OpenMP-compliant implementations are not required to check for data dependencies, data conflicts, race conditions, or deadlocks, any of which may occur in conforming programs."

A naive way to eliminate data races in a parallel program is to protect all shared variables by locks that serialize the accesses of parallel threads to these variables. Although safe, this approach may introduce deadlocks, and also increase the overhead, negatively impacting the performance of the program. Even more importantly, the approach may make the execution of the program too sequential and thus not (fully) exploit the benefits of parallelization. In this paper, we refine this naive lock-based approach and propose an iterative method to prevent data races in safety-critical parallel applications. The method combines a simple lockset analysis [36] to detect all the shared variables potentially unprotected by locks (which may produce false positives about variables that actually do not need to be protected) and a model checking analysis to reduce the number of false positives and consequently avoid introducing superfluous locks.

Lockset analysis is based on the application of a "locking discipline", by considering that a race condition may occur if a shared variable is not protected by an appropriate lock. Lockset-based race detectors are easy to implement and never produce false negatives, i.e., they detect all potential data races, which is essential for safety-critical applications. However, these detectors are pessimistic, since data races can be prevented not only by using locks, but also by performing accesses to shared variables sequentially. We exhibit such sequentiality using model checking, by extracting from the parallel program a formal model capturing (an abstraction of) the concurrency and data dependencies, and detecting the presence of concurrent accesses to shared variables using temporal properties in ACTL (Action Computation Tree Logic) [9,8]. The precision of the model has no impact on the soundness of the method (since the lockset analysis has already produced a data race free, albeit not optimal, parallel program), but only on the efficiency of the parallel code (a better model precision will yield a more accurate analysis and hence a more drastic elimination of superfluous locks). We instantiated this method on top of the CADP (Construction and Analysis of Distributed Processes, http://cadp.inria.fr) [14] verification toolbox and illustrated it for the design of data race free OpenMP applications.

Stefania's work on defining action-based temporal logics and various extensions thereof, as well as on designing efficient on-the-fly verification algorithms, was a source of inspiration in this field.
The implementation of ACTL used in this paper relies on the translation from ACTL to the modal µ-calculus (Lµ) proposed by Stefania and colleagues [10]. For checking ACTL formulas, we used the on-the-fly model checker EVALUATOR [29] of CADP, which handles formulas of MCL, an extension of alternation-free Lµ with data and generalized Büchi automata. Although it relies on different techniques based on the local resolution of Boolean equation systems [27], EVALUATOR is similar in spirit to the on-the-fly model checkers FMC [16] and UMC [5] developed by Stefania and colleagues, which handle formulas of µACTL (ACTL extended with fixed point operators) and UCTL (µACTL extended with state-based and data-aware operators), respectively. Also, Stefania's contributions on ACTL characteristic formulas [12] and the adequacy of action-based logics with bisimulations [11] paved the way towards an Lµ fragment adequate with divergence-sensitive branching bisimulation [30]. Recently, this Lµ fragment, which subsumes µACTL\X (µACTL without the next-time operator), was extended with strong modalities and equipped with an improved compositional verification technique [24] applicable to the ACTL formulas we use for detecting concurrent accesses to shared variables.

The paper is organized as follows. Section 2 gives a brief overview of OpenMP and data races. Section 3 presents our parallelization workflow and its practical implementation using CADP. Section 4 reviews existing work on data race prevention. Finally, Section 5 gives some concluding remarks and directions for future work.

2 OpenMP

OpenMP [33] (http://www.openmp.org) is an API (Application Programming Interface) for developing portable parallel programs using a shared memory communication paradigm in the C, C++, and Fortran programming languages. The OpenMP API consists of directives to extend the base languages with portable parallel programming constructs (to be implemented by an OpenMP-compliant compiler) and functions and environment variables (to be implemented by a corresponding runtime library). In the C/C++ languages, OpenMP directives are pragmas of the form #pragma omp .... Directives and calls to library routines are grouped under the generic designation of constructs. OpenMP supports both parallel and sequential execution, the latter being achieved by simply ignoring the OpenMP constructs. (Note, however, that OpenMP neither requires nor guarantees that parallel and sequential executions produce the same results; also, executing the same program with a different number of threads may yield different results [33, Section 1.3].)

The execution model of OpenMP follows a fork-join discipline. Initially, an OpenMP program begins with a single thread of execution. Whenever a thread encounters an OpenMP parallel construct, the thread creates a team of threads (containing at least itself) and becomes the master thread of the team. The code executed by each of these threads depends on the code inside the parallel construct. These threads then execute independently and synchronize (using an implicit barrier) on termination; only the master thread continues afterwards. OpenMP supports nested parallelism: each thread of a team can itself create a new team when it encounters a parallel construct. Hence, several teams may exist simultaneously. Initially designed for the parallelization of regular loops, OpenMP also supports, since version 3.0, the notion of task.
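Before turning to tasks, the following minimal sketch (ours, not taken from the paper) illustrates the fork-join model just described: the parallel construct forks a team of threads, which join again at the implicit barrier closing the construct.

#include <omp.h>
#include <stdio.h>

int main(void)
{
  printf("sequential part: a single initial thread\n");
  #pragma omp parallel                 // fork: create a team of threads
  {
    printf("thread %d of a team of %d\n",
           omp_get_thread_num(), omp_get_num_threads());
  }                                    // implicit barrier, then join
  printf("only the master thread continues\n");
  return 0;
}

Compiled with OpenMP support (e.g., gcc -fopenmp), the block between the braces is executed once by every thread of the team; ignoring the pragma yields the sequential version of the same program.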
In OpenMP, a task combines a piece of code with a specific piece of data. Tasks can be generated explicitly or implicitly, and are to be executed by the threads. As an example, the for construct implicitly creates a task for each iteration of the associated for-loop. The single construct implicitly creates a task for its associated code, so that this code is executed exactly once; other threads encountering the construct wait for its termination (using an implicit barrier). A task can be suspended (and resumed later) only at so-called task scheduling points, e.g., when creating new task(s) or when waiting on a barrier. OpenMP provides constructs to express further constraints on task creation and execution, such as the fact that a task can only be executed if another task has completed.

All OpenMP threads have access to the same shared memory. Each thread may read or write any shared variable, and there is no constraint as to when those operations are allowed to occur. OpenMP uses a relaxed-consistency memory model: each thread has a temporary view of the shared memory, and this temporary view is not required to be consistent with the memory at all times. Each thread also has access to a local, private memory, to which other threads have no access.

OpenMP provides synchronization constructs (locks, barriers, etc.), variable attributes defining data sharing (private, shared, etc.), a flush operation (enforcing the consistency of the temporary views with the shared memory), and data dependencies between tasks (enforcing the execution of a task computing some value before the execution of a task using this value).

In OpenMP, parallelization is user-directed, i.e., the programmer explicitly uses the OpenMP constructs to specify how to parallelize the execution of the program. Hence, OpenMP relies on the programmer to ensure the correctness of the program [33, Section 1.1], e.g., to ensure the absence of memory management errors, such as data races. However, this is a heavy responsibility for the programmer, given the inherent complexity of parallelization, the rich set of constructs, and the fact that the creation and ordering of tasks might depend on information only available at runtime. Consequently, any assistance to the programmer is more than welcome.

A first approach to assist programmers is to make a list of common pitfalls and to derive a set of recommended best practices and coding guidelines [38]. These common mistakes can be classified into two categories: errors (leading to an incorrect behavior) and performance issues (leading to inefficient programs). For instance, using the clause default(none) implies that the data-sharing attributes of all variables in all parallel regions of the OpenMP program must be explicitly specified. Although following carefully chosen coding guidelines may ensure correctness, it might be difficult to apply these guidelines to an existing sequential code and obtain a parallel version with acceptable performance.
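As an illustration of the default(none) guideline (a sketch of ours, not from the paper), the following loop compiles only because every variable used in the parallel region is given an explicit data-sharing attribute; omitting one of the clauses turns the oversight into a compile-time error instead of a potential data race.

int main(void)
{
  int a[100], i, n = 100, sum = 0;
  for (i = 0; i < n; i++)
    a[i] = i;
  // default(none): no implicit sharing; a, n, and sum must be classified
  // explicitly (the loop variable i is predetermined private).
  #pragma omp parallel for default(none) shared(a) firstprivate(n) reduction(+: sum)
  for (i = 0; i < n; i++)
    sum += a[i];
  return (sum == 99 * 100 / 2) ? 0 : 1;
}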
A second approach is the development of analysis tools that detect errors. However, due to the expressive power of the OpenMP constructs and in particular the fact that the parallel execution might depend on data values, developing such analysis tools is extremely challenging. As for the first approach, limiting the scope of the tools to a subset of OpenMP constructs is not always acceptable in an industrial context (the constructs have been included for good reasons). Similarly, applying coarse data abstractions increases the rate of false positives/negatives and reduces the practical usefulness of the analysis.

In the following section, we present a method to assist the programmer in ensuring the absence of data races. To illustrate this method, we use the following simple example, which computes the sum of the squares of the elements of an array a, counting the last element (a[4]) twice. Figure 1 shows the OpenMP code (the meaning of "work unit" in the comments is explained in Section 3.2). The construct #pragma omp parallel (line 5) creates a team of threads to execute the block from lines 6-15 in parallel; the number of threads in the team is determined at runtime, depending on the available hardware. The construct #pragma omp for schedule (static, 1) (line 7) indicates that the body of the for-loop should be statically split into as many tasks as there are iterations (i.e., 5). Obviously, there is a data race on the update of variable sum in the for-loop (line 11). However, there is no data race for the accesses to the array a and variable sum between the body of the for-loop and the assignment at line 14, because the assignment is executed after the termination of all iterations of the for-loop (i.e., there is an implicit barrier at the end of the #pragma omp for construct), and the assignment is executed by a single thread (this is ensured by the single construct at line 13).

 1  int a[5] = {2, 3, 4, 5, 6};
 2  int main()
 3  {
 4    int i, sum = 0;                       // work unit WU0
 5    #pragma omp parallel
 6    {
 7      #pragma omp for schedule (static, 1)
 8      for (i = 0; i < 5; i++)
 9      {                                   // work units WU1 to WU5
10        a[i] = a[i] * a[i];
11        sum += a[i];
12      }
13      #pragma omp single
14        sum += a[4];                      // work unit WU6
15    }
16    return 0;
17  }

Fig. 1. Minimalist OpenMP example

3 Parallelization Workflow

Figure 2 depicts the suggested workflow for the parallelization of a sequential application, guaranteeing the absence of data races by enforcing a locking discipline and avoiding superfluous locks by means of model checking. This iterative flow comprises several activities in each iteration:

1. Lockset analysis is used to build the set SUV (Set of Unprotected Variables) of shared variables that might present the risk of a data race. It is sufficient to use the simplest version of lockset analysis presented in [36], which raises an alarm if any access to a shared variable is not protected by a lock.

2. A formal model expressed in the LNT language [15] is built to capture all the possible control flows of OpenMP threads and their synchronizations. The process of building the LNT model is presented in Section 3.2.

3. The verification tools provided by CADP are used to identify the "sequentiality constraints" that prevent some data race conditions from occurring. The set of unprotected variables is updated accordingly.

4. The refined list of unprotected variables is given to the programmer, who can add new locks in the program to protect them.

Fig. 2. Parallelization workflow (an iterative loop: lockset analysis builds the SUV, formal modeling and model checking derive sequentiality constraints to refine the SUV, and locks are added until the SUV is empty)

Considering that the programmers may make a mistake when adding locks to the program, the verification process is repeated until the list of unprotected variables is empty. At the end, we ensure that the program is free from data races. These steps are detailed in the following.

3.1 Lockset Algorithm

Lockset analysis [36] is a technique for dynamic detection of possible data races.
The technique has been successfully implemented in the Eraser tool. Rather than checking the absence of data races, lockset analysis checks whether a program adheres to a locking discipline, which requires that each access to a shared variable is protected by at least one lock. Notice that respecting this locking discipline guarantees the absence of data races, because all accesses to a shared variable are mutually exclusive.

The original definition of the simplest possible version of the lockset algorithm [36, Section 2] is the following. "For each shared variable v, the algorithm maintains the set C(v) of candidate locks for v. This set contains those locks that have protected v for the computation so far. That is, a lock l is in C(v) if, in the computation up to that point, every thread that has accessed v was holding l at the moment of the access. When a new variable v is initialized, its candidate set C(v) is considered to hold all possible locks. When the variable is accessed, the algorithm updates C(v) with the intersection of C(v) and the set of locks held by the current thread. This process, called lockset refinement, ensures that any lock that consistently protects v is contained in C(v). If some lock l consistently protects v, it will remain in C(v) as C(v) is refined. If C(v) becomes empty this indicates that there is no lock that consistently protects v."

Under the hypothesis that the set of locks held by a thread at each point of the program is deterministic, a single run of the lockset algorithm is sufficient. Otherwise, for instance if the operations on locks are data-dependent or vary in different branches of conditional statements, several runs might be necessary. The basic lockset algorithm can be refined [36] to reduce the number of false alarms, for instance to take into account that read accesses need not be protected if there is no concurrent write access. For the purpose of this paper, the basic algorithm is sufficient.
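To make the lockset refinement concrete, here is a small C sketch of ours (not Eraser's implementation; all names are hypothetical): candidate sets are encoded as bit masks, one bit per lock, and every monitored access intersects C(v) with the locks currently held by the accessing thread.

#include <stdint.h>
#include <stdio.h>

#define ALL_LOCKS UINT32_MAX            /* initially, C(v) contains all locks */

typedef struct {
  const char *name;
  uint32_t candidates;                  /* C(v), encoded as a bit mask */
} shared_var;

/* Called on every access to v by a thread currently holding the locks in
   'held'; refines C(v) to the intersection of C(v) and the held locks. */
void lockset_refine(shared_var *v, uint32_t held)
{
  v->candidates &= held;
  if (v->candidates == 0)
    printf("alarm: no lock consistently protects %s\n", v->name);
}

int main(void)
{
  shared_var sum = { "sum", ALL_LOCKS };
  lockset_refine(&sum, 0x1);            /* access while holding lock L1        */
  lockset_refine(&sum, 0x2);            /* access holding only L2: C(sum) = {} */
  return 0;
}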
3.2 OpenMP to LNT

In order to build an LNT model that captures the control flows of threads and their synchronizations, the OpenMP program is broken into work units (or blocks). A work unit is defined as a part of a task containing neither conditional branches, synchronizations, nor task scheduling points. Thus, the execution of a work unit is never interrupted by the runtime scheduler. A work unit graph may contain two types of nodes: basic nodes represent work units of the program, and synchronization nodes represent synchronizations between threads enforced by OpenMP constructs (e.g., #pragma omp critical, omp_set_lock(), omp_unset_lock(), ...).

An edge between a pair of nodes represents the execution order. For a pair of basic nodes, this edge reflects the order of the corresponding work units, which is imposed by the control flow. For a pair of a basic node and a synchronization node, the edge reflects that the work unit denoted by the basic node starts (respectively, ends) with a synchronization construct corresponding to the synchronization node. The work unit graph can be obtained by static analysis of the code, akin to the construction of a control flow graph in an optimizing compiler.

Figure 3 represents the work unit graph of the program given in Figure 1. This work unit graph contains no synchronization nodes, but the one shown later in Figure 5 contains basic nodes (depicted as circles) and synchronization nodes (lock and unlock nodes, depicted as rectangles with rounded corners). Inspection of the OpenMP source code (Figure 1) yields the following information about the variables accessed by the various work units. All work units access (read/write) the variable sum. All work units but WU0 access (read/write) the array a; however, they access separate elements, the only exception being a[4], which is read and written by WU5 and read by WU6. Thus, there might be a data race for variables a[4] and sum.

Fig. 3. Work unit graph for the program of Figure 1 (WU0 precedes the parallel nodes WU1 to WU5, which all precede WU6)

To analyze the work unit graph using model checking, we first transform it into an LNT model by applying the following rules:

– Basic nodes are modeled as LNT processes.
– Lock/unlock nodes are modeled as synchronizations on gates ACQUIRE/RELEASE representing acquire/release actions on the lock; two further actions INIT and DESTROY denote the creation and deletion of the lock.
– Barrier nodes are modeled as multiway rendezvous on dedicated LNT synchronization gates.
– Edges are modeled as LNT sequential composition.
– Branch conditions are modeled as nondeterministic choice using the select operator.

For example, Table 1 shows the LNT code for the work unit graph of Figure 3.

Table 1. LNT code for work unit graph of Figure 3

module OMP is
   process MAIN [WU0, WU1, WU2, WU3, WU4, WU5, WU6: none] is
      WU0;
      par
         WU1 || WU2 || WU3 || WU4 || WU5
      end par;
      WU6
   end process
end module

3.3 Sequentiality Detection

A data race may occur on a shared variable x if at least two work units WUi and WUj accessing x can execute concurrently at some moment. If the two work units always execute in a deterministic order, they cannot cause a data race on x, meaning that it is not necessary to protect x by a lock.

To detect the sequential execution of two work units, we exploit the work unit graph of the OpenMP program. The LNT model of this graph represents all the possible interleavings of work units (encoded as basic nodes in the graph, and simply as gate names in the LNT model) permitted by the OpenMP parallelization constructs (encoded as synchronization nodes in the graph and as gate names or implicit synchronizations in the LNT model). The behavior of the LNT model is represented by an LTS, in which every state corresponds to a global state of the work unit graph (i.e., an abstract state of the OpenMP program), each action denotes the execution of a basic node or a synchronization node, and each transition indicates that the program can move from one state to another by performing a certain action.

In terms of this LTS, the sequential execution of two basic nodes corresponding to work units WUi and WUj can be ensured by checking that, in every state, it is not possible to execute both basic nodes immediately. This property can be expressed in ACTL [8] as follows:

   ¬ EF_true ( EX_{WUi} true ∧ EX_{WUj} true )

The formula expresses the absence of a transition sequence leading (from the initial state of the LTS) to a state having both an outgoing transition labeled by WUi and an outgoing transition labeled by WUj. For the LNT model of the work unit graph shown in Table 1, the above formula holds for all pairs of work units (WU0, WUi) and (WUi, WU6) with 1 ≤ i ≤ 5, and fails for all pairs of work units (WUi, WUj) with 1 ≤ i, j ≤ 5 and i ≠ j. This reflects the structure of the work unit graph (WU0 is executed before WU1, ..., WU5, which are executed before WU6) and indicates the possibility of data races on the shared variables accessed by work units WU1, ..., WU5, which therefore must be protected by locks.
3.4 Inserting Locks

To protect the access to sum in the for-loop, the programmer declares its body as critical. The resulting code is shown in Figure 4, the corresponding work unit graph in Figure 5 (the principal difference with Figure 3 being the addition of Lock/Unlock nodes), and the corresponding LNT code in Table 2. The principal difference between Tables 1 and 2 is that the execution of the work units WU1, WU2, WU3, WU4, and WU5 is protected by a lock (represented by process LOCK), which has to be acquired before the execution of the work unit, and released afterwards: these steps are grouped into the process PROTECTED_WU (used similarly to a procedure). Process LOCK executes as an additional process. It has a local variable FREE indicating the status of the lock, and ensures that the lock can only be acquired when it is free and that only the process holding the lock can release it. Gates INIT (respectively, DESTROY) are used to start (respectively, terminate) the execution of the lock.

int a[5] = {2, 3, 4, 5, 6};
int main()
{
  int i, sum = 0;                       // work unit WU0
  #pragma omp parallel
  {
    #pragma omp for schedule (static, 1)
    for (i = 0; i < 5; i++)
    {                                   // work units WU1 to WU5
      #pragma omp critical
      {
        a[i] = a[i] * a[i];
        sum += a[i];
      }
    }
    #pragma omp single
      sum += a[4];                      // work unit WU6
  }
  return 0;
}

Fig. 4. Corrected OpenMP code for Figure 1 (added #pragma omp critical inside the for-loop)

Fig. 5. Work unit graph for the program of Figure 4 (as Figure 3, but each of the nodes WU1 to WU5 is preceded by a Lock L node and followed by an Unlock L node)

Table 2. LNT code for work unit graph of Figure 5

module OMP2 is
   channel LOCK_CHANNEL is (Nat) end channel

   process LOCK [INIT, DESTROY: none, ACQUIRE, RELEASE: LOCK_CHANNEL] is
      var FREE: Bool, TID: Nat in
         INIT;
         FREE := true;   -- initially the lock is free
         TID := 1;       -- to make LNT2LOTOS happy
         loop L in
            select
               only if FREE then ACQUIRE (?TID) end if
            [] only if not (FREE) then RELEASE (TID) end if
            [] DESTROY; break L
            end select;
            FREE := not (FREE)
         end loop
      end var
   end process

   process PROTECTED_WU [WU: none, ACQUIRE, RELEASE: LOCK_CHANNEL] (id: Nat) is
      ACQUIRE (id);   -- acquire the lock
      WU;             -- work
      RELEASE (id)    -- release the lock
   end process

   process MAIN [WU0, WU1, WU2, WU3, WU4, WU5, WU6, INIT, DESTROY: none,
                 ACQUIRE, RELEASE: LOCK_CHANNEL] is
      par INIT, ACQUIRE, RELEASE, DESTROY in
         WU0;
         INIT;
         par
            PROTECTED_WU [WU1, ACQUIRE, RELEASE] (1)
         || PROTECTED_WU [WU2, ACQUIRE, RELEASE] (2)
         || PROTECTED_WU [WU3, ACQUIRE, RELEASE] (3)
         || PROTECTED_WU [WU4, ACQUIRE, RELEASE] (4)
         || PROTECTED_WU [WU5, ACQUIRE, RELEASE] (5)
         end par;
         DESTROY;
         WU6
      || LOCK [INIT, DESTROY, ACQUIRE, RELEASE]
      end par
   end process
end module

Rerunning lockset analysis on the modified program still reports the accesses to a and sum in work unit WU6 as unprotected by a lock, but model checking shows the sequentiality of work units WU1 to WU5 with work unit WU6, and thus the absence of a data race, without the need to add any further lock.

4 Related Work

Much effort has been spent on detecting data races in parallel programs. These efforts can be classified into dynamic and static approaches [4].

Dynamic techniques rely on observations of the running program. Such techniques have been implemented in several tools for race detection in OpenMP programs. Happens-before analysis monitors accesses to shared variables. If an access to a shared variable is logically concurrent with any previous conflicting access, the tool will raise an alarm; a pair of concurrent accesses to the same variable is conflicting if and only if at least one of them is a write. This technique leads to no false positives (i.e., each detected issue is indeed a data race), but might produce false negatives, because its precision depends on the definition of logically concurrent accesses, which is often expensive to compute. The happens-before technique has been implemented for instance in RaceStand [23].

More closely related to our suggested method are techniques based on lockset analysis, which we also use in our flow. Lockset analysis [36] (see also Section 3.1) aims at enforcing a locking discipline, rather than checking for the absence of data races in general. Tools based on lockset analysis raise an alarm when some access to a shared variable is not protected by an appropriate lock. Lockset-based race detectors are safe (i.e., they do not produce false negatives), but too pessimistic, because locking is not the only way to provide safe synchronization. Thus, the Eraser tool [36] implements various improvements of the simple lockset algorithm to reduce the number of false positives, taking into account frequent programming patterns. Our workflow has a similar goal, but rather than trying to improve the algorithm, we use model checking to filter the results. This has the advantage of taking into account dependencies, without any prior knowledge about the kind of dependency.
To improve precision, some tools combine several approaches. The Adaptive Dynamic Analysis Tool (ADAT) [22] selects, for a given program, suitable race detection techniques, based on their scalability and their efficiency in terms of thread labeling, access filtering, and access detection. Intel® Thread Checker [35] emulates the sequential version of the application and uses it to derive the happens-before relation. The tool checks the data dependency of accesses to shared variables (whenever an OpenMP directive is detected) by using sequentially traced information, and reports the accesses as races if their dependency is an anti (write-after-read), flow (read-after-write), or output (write-after-write) data dependency. Oracle® Developer Studio Thread Analyzer [34] is based on similar techniques and yields results comparable to those of Thread Checker [18].

Static analysis tools do not require running the program. To reduce the complexity of the analysis, some approaches limit their scope to subsets of OpenMP. For instance, the race avoidance tool [37] is limited to OpenMP programs using only the #pragma omp parallel for construct. ompVerify [3] detects data races using a polyhedral model (used to describe the execution order of statement instances, and the relation of statement instances to the memory cells they read or write). The tool covers a class of program fragments called Affine Control Loops. The OpenMP Analysis ToolKit (OAT) [26] uses an SMT (Satisfiability Modulo Theories) solver-based symbolic analysis to detect data races. In the tool, every parallel code region of an OpenMP program is encoded into a first-order logic formula, which is then solved by the SMT solver. The solution reported by the SMT solver is interpreted to point out errors and to generate a feasible execution trace that reveals the errors. Nonconcurrency analysis [25] statically detects whether two statements in an OpenMP program will not be executed concurrently by different threads in a team.
The RacerD tool [6,17] of the Infer static analyser (https://fbinfer.com/) for concurrent Java code aims at easily usable and understandable bug reports. Thus, RacerD favors the absence of false positives over the guarantee of the absence of data races.

Model checking is another static analysis technique, based on the analysis of the reachable state space derived from a model of the program. In the context of parallel programs, this state space is in general huge, because it has to take into account every possible execution scenario. Thus, it is usually necessary to apply property-preserving abstractions to reduce the state space to a tractable size, but still preserve the control flow and the operations on shared variables. The more concrete the model, the more precise the results reported by the model checker, but the more resources are required. Hence, the principal challenge of model checking based approaches is to find the right abstraction level, with just the right balance between precision and analysis complexity. This challenge can be somewhat circumvented by using the model checker not for the verification of the core property, but to supplement another analysis technique, for instance to construct a more precise happens-before relation [32] or to refine the set of variables that need a lock, as in our approach.

In order to further enhance the accuracy and the scalability of data race analysis, some approaches employ static analysis to provide guiding information for the dynamic race detectors. For example, ARCHER [2] first identifies data race-free code regions (i.e., regions that do not contain data dependencies) with a static analysis, and then instruments only the remaining, potentially unsafe regions for data race detection. Another approach is the combination of a thread labeling scheme (to maintain the logical concurrency of thread segments) with the happens-before technique (to analyze the happens-before relations to detect conflicting accesses to every shared memory location) [19,21]. The ThreadSafe [1] tool (for Java code) applies the principles of the lockset algorithm in the setting of a static analysis: locksets are computed for abstract summaries of methods. A generic and formal OpenMP epoch model, which describes the memory events of all OpenMP threads that occur between two synchronization events, has been used as a basis for determining the happens-before relations, which are then applied for detecting data races [7].

5 Conclusion

We proposed an iterative method to ensure the absence of data races in parallel programs using a combination of lockset analysis (to identify unprotected shared variables) and ACTL model checking (to detect superfluous locks). Although simple, our method is modular, separating the concerns of parallelization and verification on a formal model of the program. This makes it possible to balance the precision of the model with the quality of the resulting data race free parallel program: a more detailed model will increase the accuracy in detecting superfluous locks, but also require more computing resources to analyze it by model checking. We illustrated the method on the parallelization of programs using OpenMP, by proposing an intermediate representation of the concurrent blocks and their synchronizations as a work unit graph, which is transformed into an LNT model. The proposed method could be applied to parallel programs in other languages as well, provided that a suitable translation to LNT is available (e.g., AADL2LNT [31]).
Once an abstract LNT model of the parallel program is available, it can be used not only for checking sequentiality constraints and deadlocks as in our analysis flow, but also for other properties, both qualitative (safety, liveness, fairness) and quantitative ones [28]. The method can be further refined by tackling an industrial case study involving parallelization using OpenMP. For model checking large work unit graphs, the compositional verification techniques for ACTL provided by CADP [30,13] (and notably the recent combined bisimulation technique [24], suitable for the ACTL sequentiality detection formulas given in Section 3.3) can be experimented with to find appropriate composition strategies. Also, alternative ways of translating an OpenMP application into an LNT program can be investigated, with various degrees of abstraction.

Acknowledgements. This work has been supported by the CAPHCA (Critical Applications on Predictable High-Performance Computing Architectures) project funded by the PIA programme of the French government.

References

1. Atkey, R., Sannella, D.: ThreadSafe: Static Analysis for Java Concurrency. Electronic Communications of the EASST 72 (2015). https://doi.org/10.14279/tuj.eceasst.72.1025
2. Atzeni, S., Gopalakrishnan, G., Rakamaric, Z., Ahn, D.H., Laguna, I., Schulz, M., Lee, G.L., Protze, J., Müller, M.S.: ARCHER: Effectively Spotting Data Races in Large OpenMP Applications. In: 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS). pp. 53–62 (May 2016). https://doi.org/10.1109/IPDPS.2016.68
3. Basupalli, V., Yuki, T., Rajopadhye, S., Morvan, A., Derrien, S., Quinton, P., Wonacott, D.: ompVerify: Polyhedral Analysis for the OpenMP Programmer. In: Chapman, B.M., Gropp, W.D., Kumaran, K., Müller, M.S. (eds.) OpenMP in the Petascale Era, vol. 6665, pp. 37–53. Springer (2011). https://doi.org/10.1007/978-3-642-21487-5_4, https://hal.inria.fr/hal-00752626
4. Beckman, N.E.: A Survey of Methods for Preventing Race Conditions (2006)
5. ter Beek, M.H., Fantechi, A., Gnesi, S., Mazzanti, F.: A State/Event-based Model-checking Approach for the Analysis of Abstract System Properties. Sci. Comput. Program. 76(2), 119–135 (2011). https://doi.org/10.1016/j.scico.2010.07.002
6. Blackshear, S., Gorogiannis, N., O'Hearn, P.W., Sergey, I.: RacerD: Compositional Static Race Detection. Proc. ACM Program. Lang. 2(OOPSLA), 144:1–144:28 (Oct 2018). https://doi.org/10.1145/3276514
7. Cramer, T., Schwitanski, S., Münchhalfen, F., Terboven, C., Müller, M.S.: An OpenMP Epoch Model for Correctness Checking. In: 2016 45th International Conference on Parallel Processing Workshops (ICPPW). pp. 299–308 (Aug 2016). https://doi.org/10.1109/ICPPW.2016.51
8. De Nicola, R., Fantechi, A., Gnesi, S., Ristori, G.: An Action-Based Framework for Verifying Logical and Behavioural Properties of Concurrent Systems. Computer Networks and ISDN Systems 25(7), 761–778 (Feb 1993). https://doi.org/10.1016/0169-7552(93)90047-8
9. De Nicola, R., Vaandrager, F.: Action versus State Based Logics for Transition Systems. In: Guessarian, I. (ed.) Semantics of Systems of Concurrent Processes: Proceedings of the LITP Spring School on Theoretical Computer Science, La Roche Posay, France. Lecture Notes in Computer Science, vol. 469, pp. 407–419. Springer (Apr 1990)
10. Fantechi, A., Gnesi, S., Ristori, G.: From ACTL to Mu-Calculus. In: Proceedings of the ERCIM Workshop on Theory and Practice in Verification (Pisa, Italy). pp. 3–10 (Jan 1992)
11. Fantechi, A., Gnesi, S., Ristori, G.: Model Checking for Action-Based Logics. Formal Methods in System Design 4(2), 187–203 (1994). https://doi.org/10.1007/BF01384084
12. Fantechi, A., Gnesi, S., Ristori, G.: Modelling Transition Systems within an Action Based Logic. Technical report, IEI-CNR, Pisa (1996)
13. Garavel, H., Lang, F., Mateescu, R.: Compositional Verification of Asynchronous Concurrent Systems Using CADP. Acta Informatica 52(4), 337–392 (Apr 2015)
14. Garavel, H., Lang, F., Mateescu, R., Serwe, W.: CADP 2011: A Toolbox for the Construction and Analysis of Distributed Processes. Springer International Journal on Software Tools for Technology Transfer (STTT) 15(2), 89–107 (Apr 2013)
15. Garavel, H., Lang, F., Serwe, W.: From LOTOS to LNT. In: Katoen, J.P., Langerak, R., Rensink, A. (eds.) ModelEd, TestEd, TrustEd – Essays Dedicated to Ed Brinksma on the Occasion of His 60th Birthday. Lecture Notes in Computer Science, vol. 10500, pp. 3–26. Springer (Oct 2017)
16. Gnesi, S., Mazzanti, F.: On the Fly Verification of Network of Automata. In: Arabnia, H.R. (ed.) Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications PDPTA'1999 (Las Vegas, Nevada, USA). pp. 1040–1046. CSREA Press (June–July 1999)
17. Gorogiannis, N., O'Hearn, P.W., Sergey, I.: A True Positives Theorem for a Static Race Detector. Proc. ACM Program. Lang. 3(POPL), 57:1–57:29 (Jan 2019). https://doi.org/10.1145/3290370
18. Ha, O.K., Kim, Y.J., Kang, M.H., Jun, Y.K.: Empirical Comparison of Race Detection Tools for OpenMP Programs. In: Ślezak, D., Kim, T.h., Yau, S.S., Gervasi, O., Kang, B.H. (eds.) Grid and Distributed Computing. pp. 108–116. Communications in Computer and Information Science, Springer (2009)
19. Ha, O.K., Kuh, I.B., Tchamgoue, G.M., Jun, Y.K.: On-the-fly Detection of Data Races in OpenMP Programs. In: Proceedings of the 2012 Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging. pp. 1–10. PADTAD 2012, ACM (2012). https://doi.org/10.1145/2338967.2336808
20. Henzinger, T.A., Jhala, R., Majumdar, R.: Race checking by context inference. In: Pugh, W., Chambers, C. (eds.) Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation PLDI'2004 (Washington, DC, USA). pp. 1–13. ACM (June 2004). https://doi.org/10.1145/996841.996844
21. Kang, M.H., Ha, O.K., Jun, S.W., Jun, Y.K.: A Tool for Detecting First Races in OpenMP Programs. In: Malyshkin, V. (ed.) Parallel Computing Technologies. pp. 299–303. Springer (2009)
22. Kim, Y., Song, S., Jun, Y.: ADAT: An Adaptable Dynamic Analysis Tool for Race Detection in OpenMP Programs. In: 2011 IEEE Ninth International Symposium on Parallel and Distributed Processing with Applications. pp. 304–310 (May 2011). https://doi.org/10.1109/ISPA.2011.49
23. Kim, Y.J., Park, M.Y., Park, S.H., Jun, Y.K.: A Practical Tool for Detecting Races in OpenMP Programs. In: Malyshkin, V. (ed.) Parallel Computing Technologies. pp. 321–330. Springer (2005)
24. Lang, F., Mateescu, R., Mazzanti, F.: Compositional Verification of Concurrent Systems by Combining Bisimulations. In: Proceedings of the 3rd World Congress on Formal Methods FM'2019 (Porto, Portugal). Lecture Notes in Computer Science, Springer (Oct 2019), to appear
25. Lin, Y.: Static Nonconcurrency Analysis of OpenMP Programs. In: Proceedings of the 2005 and 2006 International Conference on OpenMP Shared Memory Parallel Programming. pp. 36–50. IWOMP'05/IWOMP'06, Springer (2008), http://dl.acm.org/citation.cfm?id=1892830.1892835
26. Ma, H., Diersen, S.R., Wang, L., Liao, C., Quinlan, D., Yang, Z.: Symbolic Analysis of Concurrency Errors in OpenMP Programs. In: 2013 42nd International Conference on Parallel Processing. pp. 510–516 (Oct 2013). https://doi.org/10.1109/ICPP.2013.63
27. Mateescu, R.: CAESAR_SOLVE: A Generic Library for On-the-Fly Resolution of Alternation-Free Boolean Equation Systems. Springer International Journal on Software Tools for Technology Transfer (STTT) 8(1), 37–56 (Feb 2006), full version available as INRIA Research Report RR-5948, July 2006
28. Mateescu, R., Serwe, W.: Model Checking and Performance Evaluation with CADP Illustrated on Shared-Memory Mutual Exclusion Protocols. Science of Computer Programming 78(7), 843–861 (Jul 2013)
29. Mateescu, R., Thivolle, D.: A Model Checking Language for Concurrent Value-Passing Systems. In: Cuellar, J., Maibaum, T., Sere, K. (eds.) Proceedings of the 15th International Symposium on Formal Methods (FM'08), Turku, Finland. Lecture Notes in Computer Science, vol. 5014, pp. 148–164. Springer (May 2008)
30. Mateescu, R., Wijs, A.: Property-Dependent Reductions Adequate with Divergence-Sensitive Branching Bisimilarity. Sci. Comput. Program. 96(3), 354–376 (2014)
31. Mkaouar, H., Zalila, B., Hugues, J., Jmaiel, M.: From AADL Model to LNT Specification. In: de la Puente, J.A., Vardanega, T. (eds.) Proceedings of the 20th ADA-Europe International Conference on Reliable Software Technologies (Ada-Europe'15), Madrid, Spain. Lecture Notes in Computer Science, vol. 9111, pp. 146–161. Springer (2015)
32. Nakade, R., Mercer, E., Aldous, P., McCarthy, J.: Model-Checking Task Parallel Programs for Data-Race. In: Dutle, A., Muñoz, C., Narkawicz, A. (eds.) NASA Formal Methods. pp. 367–382. Springer International Publishing (2018)
33. OpenMP Architecture Review Board: OpenMP Application Programming Interface (Nov 2018), https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf
34. Oracle Studio 12.6: Thread Analyzer User's Guide (Jun 2017), https://docs.oracle.com/cd/E77782_01/html/E77800/index.html
35. Petersen, P., Shah, S.: OpenMP Support in the Intel® Thread Checker. In: Voss, M.J. (ed.) OpenMP Shared Memory Parallel Programming. pp. 1–12. Springer (2003)
36. Savage, S., Burrows, M., Nelson, G., Sobalvarro, P., Anderson, T.: Eraser: A Dynamic Data Race Detector for Multithreaded Programs. ACM Trans. Comput. Syst. 15(4), 391–411 (Nov 1997). https://doi.org/10.1145/265924.265927
37. Shah, D.: Analysis of an OpenMP Program for Race Detection. Master's thesis, San Jose State University (2009)
38. Süß, M., Leopold, C.: Common Mistakes in OpenMP and How to Avoid Them: A Collection of Best Practices.
In: Proceedings of the 2005 and 2006 International Conference on OpenMP Shared Memory Parallel Programming. pp. 312–323. IWOMP'05/IWOMP'06, Springer (2008), http://dl.acm.org/citation.cfm?id=1892830.1892863