An SMT Theory of Fixed-Point Arithmetic

Marek Baranowski, Shaobo He, Mathias Lechner, Thanh Son Nguyen, Zvonimir Rakamarić

Automated Reasoning (2020-05-30). DOI: 10.1007/978-3-030-51074-9_2

Abstract. Fixed-point arithmetic is a popular alternative to floating-point arithmetic on embedded systems. Existing work on the verification of fixed-point programs relies on custom formalizations of fixed-point arithmetic, which makes it hard to compare the described techniques or reuse the implementations. In this paper, we address this issue by proposing and formalizing an SMT theory of fixed-point arithmetic. We present an intuitive yet comprehensive syntax of the fixed-point theory, and provide formal semantics for it based on rational arithmetic. We also describe two decision procedures for this theory: one based on the theory of bit-vectors and the other on the theory of reals. We implement the two decision procedures, and evaluate our implementations using existing mature SMT solvers on a benchmark suite we created. Finally, we perform a case study of using the theory we propose to verify properties of quantized neural networks.

1 Introduction

Algorithms based on real arithmetic have become prevalent. For example, the mathematical models in machine learning algorithms operate on real numbers. Similarly, signal processing algorithms often implemented on embedded systems (e.g., fast Fourier transform) are almost always defined over real numbers. However, real arithmetic is not implementable on computer systems due to its unlimited precision. Consequently, we use implementable approximations of real arithmetic, such as floating-point and fixed-point arithmetic, to realize these algorithms in practice.

Floating-point arithmetic is the dominant approximation of real arithmetic with mature hardware support. Although it enjoys the benefits of representing a large spectrum of real numbers and providing high precision for arithmetic over small numbers, floating-point arithmetic, due to its complexity, can be too expensive in terms of speed and power consumption on certain platforms. These platforms are often deployed in embedded systems such as mobile phones, video game consoles, and digital controllers. Recently, the machine learning community revived interest in fixed-point arithmetic, since popular machine learning algorithms and models can be implemented using (even very low bit-width) fixed-points with little accuracy loss [11, 27, 37]. Fixed-point arithmetic has therefore been a popular alternative to floating-point arithmetic on such platforms, since it can be efficiently realized using integer arithmetic. There are several software implementations of fixed-point arithmetic in different programming languages [22, 28, 34]; moreover, some programming languages, such as Ada and GNU C, have built-in fixed-point types.

While fixed-point arithmetic is less popular in mainstream applications than floating-point arithmetic, the systems employing the former are often safety-critical. For example, fixed-point arithmetic is often used in medical devices, cars, and robots. Therefore, there is a need for formal methods that can rigorously ensure the correctness of these systems.
Although techniques that perform automated verification of fixed-point programs already exist [1, 3, 15], all of them implement a custom dedicated decision procedure without formalizing the details of fixed-point arithmetic. As a result, it is hard to compare these techniques or reuse the implemented decision procedures. On the other hand, ever since the SMT theory of floating-point numbers was formalized [8, 44] in SMT-LIB [46], there has been a flurry of research in developing novel and faster decision procedures for the theory [6, 7, 14, 29, 35, 50]. Meanwhile, the floating-point theory has also been used by a number of approaches that require rigorous reasoning about floating-point arithmetic [2, 36, 39, 41]. The published formalization of the theory enables fair comparison between the decision procedures, sharing of benchmarks, and easy integration of decision procedures within tools that need this functionality. In this paper, we propose and formalize an SMT theory of fixed-point arithmetic, in the spirit of the SMT theory of floating-point arithmetic, with the hope that it will lead to similar outcomes and advances.

Contributions. We summarize our main contributions as follows:
- We present an intuitive and comprehensive syntax of fixed-point arithmetic (Sect. 3) that captures common use cases of fixed-point operations.
- We provide formal semantics of the fixed-point theory based on rational arithmetic (Sect. 4).
- We propose and implement two decision procedures for the fixed-point theory: one that leverages the theory of fixed-size bit-vectors and the other the theory of real numbers (Sect. 5).
- We evaluate the two decision procedures on a set of benchmarks using mature SMT solvers (Sect. 6), and perform a case study of verifying quantized neural networks that uses our theory of fixed-point arithmetic (Sect. 7).

2 Background

Fixed-point arithmetic, like floating-point arithmetic, is used as an approximation for computations over the reals. Both fixed-point and floating-point numbers (excluding the special values) can be represented using rational numbers. However, unlike floating-point numbers, fixed-point numbers in a given format maintain a fixed divisor, hence the name fixed-point. Consequently, fixed-point numbers have a reduced range of values. In exchange, this format allows custom-precision systems to be implemented efficiently in software: fixed-point arithmetic operations can be implemented with far fewer integer arithmetic operations than their floating-point counterparts. For example, a fixed-point addition simply amounts to an integer addition instruction, provided that wrap-around is the intended behavior when overflows occur. This efficiency gives rise to the popularity of fixed-point arithmetic on embedded systems, where computing resources are fairly constrained.

A fixed-point number is typically interpreted as a fraction whose numerator is an integer with fixed bit-width in its two's complement representation and whose denominator is a power of 2. Therefore, a fixed-point format is parameterized by two natural numbers: tb, which defines the bit-width of the numerator, and fb, which defines the power of the denominator. A fixed-point number in this format can be treated as a bit-vector of length tb that is the two's complement representation of the numerator and has an implicit binary point between the (fb + 1)-th and fb-th least significant bits.
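To make this representation concrete, the following minimal Python sketch maps a tb-bit pattern and a fractional bit count fb to the rational value it denotes. The helper names ufxp_value and sfxp_value are our own illustration, not part of any library or of the theory itself:

```python
from fractions import Fraction

def ufxp_value(bv: int, fb: int) -> Fraction:
    """Rational value of an unsigned fixed-point number: numerator / 2^fb."""
    return Fraction(bv, 2 ** fb)

def sfxp_value(bv: int, tb: int, fb: int) -> Fraction:
    """Rational value of a signed fixed-point number, where the tb-bit
    pattern bv is decoded as a two's complement integer."""
    assert 0 <= bv < 2 ** tb
    n = bv - 2 ** tb if bv >= 2 ** (tb - 1) else bv  # two's complement decode
    return Fraction(n, 2 ** fb)

# The bit pattern 10110 in a (tb = 5, fb = 2) format:
print(ufxp_value(0b10110, 2))     # 11/2  (i.e., 22/4)
print(sfxp_value(0b10110, 5, 2))  # -5/2  (i.e., -10/4)
```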
We focus on the binary format (as opposed to, e.g., decimal) of fixed-point arithmetic, since it is widely adopted in hardware and software implementations in practice. Moreover, depending on the intended usage, developers leverage both signed and unsigned fixed-point formats. The signed or unsigned format determines whether the bit pattern representing the fixed-point number should be interpreted as a signed or unsigned integer, respectively. Therefore, signed and unsigned fixed-point formats having the same tb and fb have different ranges, namely [−2^(tb−1) / 2^fb, (2^(tb−1) − 1) / 2^fb] and [0, (2^tb − 1) / 2^fb], respectively.

Fixed-point addition (resp. subtraction) is typically implemented by adding (resp. subtracting) the two bit-vector operands (i.e., two's complements), amounting to a single operation. Because the denominators of the two operands are the same, we do not need to perform rounding. However, we still have to take care of potential overflows that occur when the result exceeds the allowed range of the chosen fixed-point format. Fixed-point libraries typically implement two methods for handling overflows: saturation and wrap-around. Saturation entails fixing overflowed results to either the minimal or maximal representable value. The advantage of this method is that the final fixed-point result is the representable value closest to the actual result unconstrained by finite precision. Wrap-around allows the overflowing result to wrap according to two's complement arithmetic. The advantage of this method is that it is efficient, and it ensures that the sum of a set of (signed) numbers has a correct final value despite potential intermediate overflows (if the final value falls within the supported range). Note that addition is commutative under both methods, but only addition using the wrap-around method is associative. The multiplication and division operations are more involved, since they have to include a rounding step as well.

3 Syntax

In this section, we describe the syntax of our proposed theory of fixed-point arithmetic. It is inspired by the syntax of the SMT theory of floating-points [8, 44] and the ISO/IEC TR 18037 standard [23].

Fixed-Points. We introduce the indexed SMT nullary sorts (_ SFXP tb fb) to represent signed fixed-point sorts, where tb is a natural number specifying the total bit-width of the scaled integer in its two's complement form and fb is a natural number specifying the number of fractional bits; tb is greater than or equal to fb. Similarly, we represent unsigned fixed-point sorts with (_ UFXP tb fb). Following the SMT-LIB notation, we define two indexed functions, (_ sfxp fb) and (_ ufxp fb), for constructing fixed-point literals, where (_ sfxp fb) (resp. (_ ufxp fb)) produces a function that takes a bit-vector (_ BitVec tb) and constructs a fixed-point (_ SFXP tb fb) (resp. (_ UFXP tb fb)).

Rounding Modes. Similarly to the theory of floating-point arithmetic, we also introduce the RoundingMode sort (abbreviated as RM) to represent the rounding mode, which controls the direction of rounding when an arithmetic result cannot be precisely represented in the specified fixed-point format. However, unlike the floating-point theory, which specifies five different rounding modes, we adopt only two rounding mode constants, namely roundUp and roundDown, as they are common in practice.

Overflow Modes. We introduce the nullary sort OverflowMode (abbreviated as OM) to capture the behaviors of fixed-point arithmetic when the result of an operation is beyond the representable range of the used fixed-point format. We adopt two constants, saturation and wrapAround, to represent the two common behaviors. The saturation mode rounds any out-of-bound result to the maximum or minimum value of the representable range, while the wrapAround mode wraps the result around, similar to bit-vector addition.
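The two overflow-handling behaviors can be sketched on the underlying scaled integers as follows. This is an illustrative Python fragment, not part of the proposed theory, and the function name handle_overflow is our own:

```python
def handle_overflow(n: int, tb: int, signed: bool, mode: str) -> int:
    """Clamp (saturation) or wrap (wrapAround) a scaled-integer result n
    so that it fits into tb bits."""
    lo, hi = (-2 ** (tb - 1), 2 ** (tb - 1) - 1) if signed else (0, 2 ** tb - 1)
    if mode == "saturation":
        return min(max(n, lo), hi)
    n %= 2 ** tb                                    # wrapAround: modulo 2^tb ...
    return n - 2 ** tb if signed and n > hi else n  # ... mapped back into [lo, hi]

# Adding the numerators 100 + 100 in a signed 8-bit format:
print(handle_overflow(200, 8, True, "saturation"))  # 127 (clamped to the maximum)
print(handle_overflow(200, 8, True, "wrapAround"))  # -56 (wrapped around)
```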
Comparisons. The following operators return a Boolean by comparing two fixed-point numbers of the same sort: sfxp.lt, sfxp.leq, sfxp.gt, and sfxp.geq for signed fixed-points, and their ufxp.* counterparts for unsigned fixed-points.

Arithmetic. We support the following binary arithmetic operators over fixed-point sorts parameterized by tb and fb: sfxp.add and sfxp.sub (resp. ufxp.add and ufxp.sub), which take an overflow mode and two operands, and sfxp.mul and sfxp.div (resp. ufxp.mul and ufxp.div), which take an overflow mode, a rounding mode, and two operands. Note that we force the sorts of the operands and the return value to be the same. According to our semantics in Sect. 4, the addition and subtraction operations never introduce error into a computation; hence, these operators do not take a rounding mode as input like multiplication and division do.

Conversions. We introduce two types of conversions between sorts. First, there are conversions between different fixed-point sorts, which resize a value from one fixed-point format into another. Second, there are conversions between the real and fixed-point sorts: (_ to_sfxp tb fb) and (_ to_ufxp tb fb), which convert a real into a fixed-point given an overflow mode and a rounding mode, and sfxp.to_real and ufxp.to_real, which convert a fixed-point into a real.

4 Semantics

In this section, we formalize the semantics of the fixed-point theory by treating fixed-points as rational numbers. We first define fixed-points as indexed subsets of the rationals. Then, we introduce two functions, rounding and overflow, that are crucial for the formalization of the fixed-point arithmetic operations. Finally, we present the formal semantics of the arithmetic operations based on rational arithmetic and the two aforementioned functions.

Let F_fb = { n / 2^fb | n ∈ Z } be the infinite set of rationals that can be represented as fixed-points using fb fractional bits. We interpret a signed fixed-point sort (_ SFXP tb fb) as the finite subset S_tb,fb = { n / 2^fb | n ∈ Z, −2^(tb−1) ≤ n ≤ 2^(tb−1) − 1 }, and an unsigned fixed-point sort (_ UFXP tb fb) as the finite subset U_tb,fb = { n / 2^fb | n ∈ Z, 0 ≤ n ≤ 2^tb − 1 }. The rational value of an unsigned fixed-point constructed using (ufxp bv fb) is bv2nat(bv) / 2^fb, where function bv2nat converts a bit-vector to its unsigned integer value. The rational value of its signed counterpart constructed using (sfxp bv fb) is bv2int(bv) / 2^fb, where function bv2int converts a bit-vector to its signed value. Since we treat fixed-point numbers as subsets of the rational numbers, we interpret the fixed-point comparison operators, such as =, fxp.lt, and fxp.leq, simply as the corresponding rational comparison relations =, <, and ≤, respectively.

To be able to formalize the semantics of the arithmetic operations, we first introduce the round and overflow helper functions. We interpret the rounding mode sort RoundingMode as the set rmode = {ru, rd}, where roundUp = ru and roundDown = rd. Let rnd_F_fb : rmode × R → F_fb be a family of round functions parameterized by fb that map a rounding mode and a real number to an element of F_fb. Then, we define rnd_F_fb as

rnd_F_fb(ru, r) = min { x ∈ F_fb | x ≥ r },
rnd_F_fb(rd, r) = max { x ∈ F_fb | x ≤ r }.

We interpret the overflow mode sort OverflowMode as the set omode = {sat, wrap}, where saturation = sat and wrapAround = wrap. Let ovf_F : omode × F_fb → F be a family of overflow functions parameterized by F that map an overflow mode and an element of F_fb to an element of F; here, F is either S_tb,fb or U_tb,fb, depending on whether we are using signed or unsigned fixed-point numbers, respectively. Then, we define ovf_F as

ovf_F(sat, x) = max(F) if x > max(F), min(F) if x < min(F), and x otherwise;
ovf_F(wrap, x) = the unique y ∈ F such that y · 2^fb ≡ x · 2^fb (mod 2^tb).

Note that x · 2^fb, y · 2^fb ∈ Z according to the definitions of F_fb and F, and also that there is always exactly one y satisfying the constraint.

Now that we have introduced our helper round and overflow functions, it is easy to define the interpretation of the fixed-point arithmetic operations:

(sfxp.add om x y) = ovf_F(om, x + y)
(sfxp.sub om x y) = ovf_F(om, x − y)
(sfxp.mul om rm x y) = ovf_F(om, rnd_F_fb(rm, x · y))
(sfxp.div om rm x y) = ovf_F(om, rnd_F_fb(rm, x / y)) for y ≠ 0,

where F = S_tb,fb, and analogously for the ufxp.* operations with F = U_tb,fb. Since F_fb is closed under addition and subtraction, the exact sum and difference of two fixed-point values always belong to F_fb. Therefore, we do not need to round the results of the addition and subtraction operations. In the case of division by zero, we adopt the semantics of other SMT theories such as the reals: (= x (sfxp.div om rm y 0)) and (= x (ufxp.div om rm y 0)) are satisfiable for every x, y ∈ F, om ∈ omode, rm ∈ rmode. Furthermore, for every x, y ∈ F, om ∈ omode, rm ∈ rmode, if (= x y) then (= (sfxp.div om rm x 0) (sfxp.div om rm y 0)) and (= (ufxp.div om rm x 0) (ufxp.div om rm y 0)).
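The definitions above can be transcribed directly into Python over exact rationals, which makes them easy to sanity-check. The function names, the boolean signed flag, and the string encodings of the modes below are our own illustrative choices, not part of the theory:

```python
import math
from fractions import Fraction

def rnd(rm: str, r: Fraction, fb: int) -> Fraction:
    """rnd_F_fb: the closest element of F_fb in the direction given by rm."""
    n = math.ceil(r * 2 ** fb) if rm == "roundUp" else math.floor(r * 2 ** fb)
    return Fraction(n, 2 ** fb)

def ovf(om: str, x: Fraction, tb: int, fb: int, signed: bool) -> Fraction:
    """ovf_F: map an element of F_fb into the finite set F (S_tb,fb or U_tb,fb)."""
    lo = Fraction(-(2 ** (tb - 1)), 2 ** fb) if signed else Fraction(0)
    hi = Fraction(2 ** (tb - 1) - 1, 2 ** fb) if signed else Fraction(2 ** tb - 1, 2 ** fb)
    if om == "saturation":
        return min(max(x, lo), hi)
    n = int(x * 2 ** fb) % 2 ** tb  # wrapAround: y * 2^fb = x * 2^fb (mod 2^tb)
    if signed and n >= 2 ** (tb - 1):
        n -= 2 ** tb
    return Fraction(n, 2 ** fb)

def sfxp_mul(om: str, rm: str, x: Fraction, y: Fraction, tb: int, fb: int) -> Fraction:
    """Multiplication: round the exact rational product, then handle overflow."""
    return ovf(om, rnd(rm, x * y, fb), tb, fb, signed=True)

# In S_8,4: 3.25 * 2.5 = 8.125 = 130/16 exceeds the maximum 127/16,
# so with saturation the result clamps to 127/16.
print(sfxp_mul("saturation", "roundUp", Fraction(13, 4), Fraction(5, 2), 8, 4))
```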
Note that the order of applying the rnd and ovf functions to results computed in real arithmetic matters. We choose rnd followed by ovf, since it matches the typical real-world fixed-point semantics; reversing the order can lead to out-of-bound results. For example, assume that we extend the signature of the ovf function to omode × R → R while preserving its semantics as a modulo operation over 2^(tb−fb). Then, ovf_U_3,2(wrap, 7.5/4) evaluates to 7.5/4, and applying rnd_F_2 to it when the rounding mode is ru evaluates to 8/4; this is greater than the maximum number in U_3,2, namely 7/4. On the other hand, evaluating ovf_U_3,2(wrap, rnd_F_2(ru, 7.5/4)) produces 0, which is the expected result. We could apply the ovf function again to the out-of-bound results, but the current semantics achieves the same without this additional operation.

Let cast_F,F_fb : omode × rmode × R → F be a family of cast functions parameterized by F and F_fb that map an overflow mode, a rounding mode, and a real number to an element of F; as before, F is either S_tb,fb or U_tb,fb, depending on whether we are using signed or unsigned fixed-point numbers, respectively. Then, we define cast_F,F_fb(om, rm, r) = ovf_F(om, rnd_F_fb(rm, r)), and the interpretation of the conversions between reals and fixed-points as

(_ to_sfxp tb fb)(om, rm, r) = cast_S_tb,fb,F_fb(om, rm, r)
(_ to_ufxp tb fb)(om, rm, r) = cast_U_tb,fb,F_fb(om, rm, r)
sfxp.to_real(r) = r
ufxp.to_real(r) = r

5 Decision Procedures

In this section, we propose two decision procedures for the fixed-point theory: one leverages the theory of fixed-size bit-vectors, and the other the theory of reals.

Bit-Vector Encoding. The decision procedure based on the theory of fixed-size bit-vectors is akin to the existing software implementations of fixed-point arithmetic that use machine integers. More specifically, a fixed-point sort parameterized by tb is encoded as a bit-vector sort of length tb. Therefore, the encoding of the constructors of fixed-point values simply amounts to identity functions. Similarly, the encoding of the comparison operators uses the corresponding bit-vector relations. For example, the comparison operator sfxp.lt is encoded as bvslt. The essence of the encoding of the arithmetic operations is expressing the numerator of the result, after rounding and overflow handling, using bit-vector arithmetic. We leverage the following two observations in our encoding. First, rounding a real value v to a value in the set F_fb amounts to rounding v · 2^fb to an integer following the same rounding mode; this observation also explains why rounding is not necessary for the linear arithmetic operations. Second, we can encode the wrap-around of a rounded result as simply extracting tb bits from the encoded result, thanks to the wrap-around nature of the two's complement SMT representation.

We model the behavior of division-by-zero using uninterpreted functions of the form (RoundingMode OverflowMode (_ BitVec tb) (_ BitVec tb)), with one such function for each fixed-point sort appearing in the query. The result of a division-by-zero is then the result of applying this function to the numerator, conditioned on the denominator being equal to zero. This ensures that all divisions-by-zero with equal numerators produce equal results when the overflow and rounding modes are also equal.
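The following Python fragment mirrors the arithmetic that the bit-vector encoding of signed multiplication with wrapAround performs on numerators. It is a sketch of the two observations above, with hypothetical helper names; an actual encoding would express the same computation with SMT bit-vector operations (sign extension, bvmul, shifting, and extraction):

```python
def bv_mul_wrap(nx: int, ny: int, tb: int, fb: int, rm: str) -> int:
    """Arithmetic behind the bit-vector encoding of signed multiplication
    with wrapAround; nx and ny are tb-bit two's complement numerators."""
    def bv2int(bv: int) -> int:      # decode a tb-bit two's complement pattern
        return bv - 2 ** tb if bv >= 2 ** (tb - 1) else bv
    prod = bv2int(nx) * bv2int(ny)   # exact product, scaled by 2^(2*fb)
    if rm == "roundUp":
        num = -((-prod) >> fb)       # ceiling of prod / 2^fb
    else:
        num = prod >> fb             # arithmetic shift = floor (roundDown)
    return num % 2 ** tb             # extracting tb bits wraps around for free

# 1.5 * 1.5 in S_8,4: numerators 24 * 24 = 576, and 576 >> 4 = 36, i.e., 36/16 = 2.25
print(bv_mul_wrap(24, 24, 8, 4, "roundDown"))  # 36
```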
Real Encoding. The decision procedure based on the theory of reals closely mimics the semantics defined in Sect. 4. We encode all fixed-point sorts as the real sort, and we represent fixed-point values as rational numbers. Therefore, we can simply encode fixed-point comparisons as real relations; for example, both sfxp.lt and ufxp.lt are translated into the < relation. We rely on the first observation above to implement the rounding function rnd_F_fb using an SMT real-to-integer conversion. We implement the overflow function ovf_F using the SMT remainder function. Note that the encodings of both functions involve non-linear real functions, such as the real-to-integer conversion. Finally, we model division as the rounded, overflow-corrected result of the real theory's division operation. Since the real theory's semantics ensures that equivalent division operations produce equivalent results, this suffices to capture the fixed-point division-by-zero semantics.

Implementation. We implemented the two decision procedures within the pySMT framework [25]: the two encodings are rewriting classes of pySMT. We made our implementations publicly available. We also implemented a random generator of queries in our fixed-point theory, and used it to perform thorough differential testing of our decision procedures.

6 Evaluation

We generated the benchmarks we use to evaluate the two encodings described in Sect. 5 by translating the SMT-COMP non-incremental QF_FP benchmarks [45]. The translation accepts benchmarks that contain only basic arithmetic operations defined in both theories. Moreover, we exclude all the benchmarks in the wintersteiger folder, because they are mostly simple regressions for testing the correctness of an implementation of the floating-point theory. In the end, we managed to translate 218 QF_FP benchmarks in total. We translate each QF_FP benchmark into four benchmarks in the fixed-point theory, which differ in the configuration of the rounding and overflow modes. We denote a configuration as a (rounding mode, overflow mode) tuple. Note that changing a benchmark's configuration alters the semantics of its arithmetic operations, which might affect its satisfiability. Our translation replaces floating-point sorts with fixed-point sorts that have the same total bit-widths; the number of fractional bits is half of the bit-width. This establishes a mapping from single-precision floats to the Q16.16 fixed-points implemented by popular software libraries such as libfixmath [34]. The translation turns arithmetic operations into their corresponding fixed-point counterparts, using the chosen configuration uniformly across a benchmark, and it replaces floating-point comparison operations with their fixed-point counterparts. Finally, we convert floating-point constants by treating them as reals and performing real-to-fixed-point casts. We made our fixed-point benchmarks publicly available.
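For illustration, the constant-conversion step of the translation just described can be sketched by reusing the rnd and ovf sketches from Sect. 4. The helper float_to_sfxp is our own hypothetical name, with the Q16.16 format as the default:

```python
from fractions import Fraction

def float_to_sfxp(c: float, tb: int = 32, fb: int = 16,
                  om: str = "saturation", rm: str = "roundDown") -> Fraction:
    """Treat a float constant as the real it denotes and cast it to a signed
    fixed-point value, mirroring (_ to_sfxp tb fb): cast = ovf after rnd."""
    r = Fraction(c)  # every finite float denotes an exact rational
    return ovf(om, rnd(rm, r, fb), tb, fb, signed=True)

# 0.1 is not representable in F_16; rounding down yields 6553/65536.
print(float_to_sfxp(0.1))  # Fraction(6553, 65536)
```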
The SMT solvers that we use in the evaluation are Boolector [9] (version 3.1.0), CVC4 [4] (version 1.7), MathSAT [13] (version 5.5.1), Yices2 [19] (version 2.6.1), and Z3 [17] (version 4.8.4) for the decision procedure based on the theory of bit-vectors. For the evaluation of the decision procedure based on the theory of reals, we use CVC4, MathSAT, Yices2, and Z3. We ran the experiments on a machine with four Intel E7-4830 sockets, for a total of 32 physical cores, and 512 GB of RAM, running Ubuntu 18.04. Each benchmark run was limited to 1200 s of wall time and 8 GB of memory; no run of any benchmark exceeded the memory limit. We set processor affinity for each solver instance in order to reduce variability due to cache effects.

Table 1 shows the results of running the SMT solvers on each configuration with both encodings (bit-vector and real). We do not observe any inconsistencies in the satisfiability reported among all the solvers and between the two encodings. The performance of the solvers on the bit-vector encoding is typically better than on the real encoding, as it leads to fewer timeouts and crashes. Moreover, all the solvers demonstrate similar performance for the bit-vector encoding across all the configurations, whereas they generally produce more timeouts for the real encoding when the overflow mode is wrap-around. We believe that this can be attributed to the usage of non-linear operations (e.g., real-to-int casts) in the handling of wrap-around behaviors. This hypothesis could also explain the observation that the bit-vector encoding generally outperforms the real encoding when the overflow mode is wrap-around, since wrap-around comes at almost no cost for the bit-vector encoding (see Sect. 5).

Table 1. The results of running SMT solvers on the four different configurations of the benchmarks using both encodings. Boolector and MathSAT are denoted by Btor and MSAT, respectively. Column "All" indicates the number of benchmarks for which any solver answered sat or unsat; benchmarks for which no solver gave an answer are counted as unknown.

Table 2. Comparison of the number of benchmarks (considering all configurations) solved by a solver but not solved by another solver. Each row shows the number of benchmarks solved by the row's solver but not solved by the column's solver. We mark the bit-vector (resp. real) encoding with B (resp. R).

Column "All" captures the performance of the solvers when treated as one portfolio solver. This improves the overall performance, since the number of solved benchmarks increases, indicating that each solver has different strengths and weaknesses. Table 2 further analyzes this behavior, and we identify two reasons for it when we consider the unique instances solved by each individual solver. First, when the overflow mode is saturation, Yices2 is the only solver to solve unique instances for both encodings. Second, when the overflow mode is wrap-around, the uniquely solved instances come from solvers used on the bit-vector encoding, except one that comes from Yices2 on the real encoding. These results provide further evidence that the saturation configurations are somewhat easier to solve with reals, and that wrap-around is easier with bit-vectors.

Figure 1 uses quantile plots [5] to visualize our experimental results in terms of runtimes. A quantile plot shows, on the y-axis, the minimum runtime within which each of the x-axis benchmarks is solved. Several characteristics of a quantile plot are helpful in analyzing runtimes. First, the rightmost x coordinate is the number of benchmarks for which a solver returns meaningful results (i.e., sat or unsat). Second, the uppermost y coordinate is the maximum runtime over all the benchmarks. Third, the area under a curve approximates the total runtime. Although the semantics of the benchmarks vary for each configuration, we can observe that the shapes of the bit-vector encoding curves are similar, while those of the real encoding differ based on the chosen overflow mode. More precisely, solvers tend to solve benchmarks faster when the overflow mode is saturation as opposed to wrap-around. We observe the same behavior in Table 1, and it is likely due to the fact that we introduce non-linear operations to handle wrap-around behaviors when using the real encoding.
7 Case Study: Quantized Neural Networks

Neural networks have experienced a significant increase in popularity in the past decade. Such networks, realized as compositions of non-linear layers, can efficiently solve a large variety of previously unsolved learning tasks. However, neural networks are often viewed as black boxes whose causal structure cannot be interpreted easily by humans [40]. This property makes them unfit for applications where guaranteed correctness has a high priority. Advances in formal methods, in particular SMT solvers leveraging the piece-wise linear structure of neural networks [20, 31, 47], have made it possible to verify certain formal properties of neural networks of reasonable size. While these successes provide an essential step towards applying neural networks to safety-critical tasks, these methods leave out one crucial aspect: neural networks are usually quantized before being deployed to production systems [30].

Quantization converts a network that operates over 32-bit floating-point semantics into a fewer-bit fixed-point representation. This process serves two goals: compressing the memory requirements and increasing the computational efficiency of running the network. Quantization introduces additional non-linear rounding operations into the semantics of a neural network. Recently, Giacobbe et al. [26] have shown that, in practice, this can lead to situations where a network that satisfies formal specifications before quantization violates them afterwards. Therefore, when checking formal properties of quantized neural networks, we need to take their fixed-point semantics into account.

We derive a set of example fixed-point problem instances based on two machine learning tasks to demonstrate the capabilities of our fixed-point SMT theory on realistic problems. For all tasks, we train multi-layer perceptron models [43] with the ReLU-7 activation function [32] using quantization-aware training [30]. This way, we avoid a considerable loss of accuracy due to quantization. To encode a neural network into an SMT formula, we rely on Giacobbe et al.'s [26] approach for encoding the summations and activation functions. We quantize all neural networks using the signed fixed-point format with 8 total bits and 4 fractional bits. We use the bit-vector encoding decision procedure in combination with the Boolector SMT solver.
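To illustrate the fixed-point semantics being checked, here is a Python sketch of a single quantized neuron. It reuses the rnd and ovf sketches from Sect. 4, assumes ReLU-7 clips activations to [0, 7], and places a rounding/overflow step after every operation, which is one plausible reading of the quantization scheme of [30] rather than its exact definition:

```python
from fractions import Fraction

def relu7(x: Fraction) -> Fraction:
    """ReLU-7: clip the activation to [0, 7] (assumed definition)."""
    return min(max(x, Fraction(0)), Fraction(7))

def neuron_fxp(xs, ws, b, tb=8, fb=4, om="saturation", rm="roundDown"):
    """One neuron in the signed (tb = 8, fb = 4) format: each product is
    rounded and overflow-corrected, each addition is overflow-corrected
    (additions need no rounding), and ReLU-7 is applied at the end."""
    acc = b
    for x, w in zip(xs, ws):
        prod = ovf(om, rnd(rm, x * w, fb), tb, fb, signed=True)
        acc = ovf(om, acc + prod, tb, fb, signed=True)
    return relu7(acc)
```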
In our first task, we train a neural network controller using the cart-pole environment of OpenAI's "gym" reinforcement learning suite. In this task, an agent has to balance a pole mounted on a movable cart in an upright position. The cart provides four observation variables x, ẋ, ϕ, ϕ̇ to the controller, where x is the position of the cart and ϕ the angle of the pole. The controller then steers the cart by discrete actions (move left or right). Our neural network agent, composed of three layers (4, 8, 1), solves the task by achieving an average score of the maximal 500 points. We analyze what our black-box agent has learned by using our decision procedure. In particular, we are interested in how much our agent relies on the input variable x compared to ϕ for making a decision. Moreover, we are interested in which parts of the input space the agent's decision is constant.

We assume the dynamics of the cart are bounded, i.e., −0.3 ≤ ẋ ≤ 0.3 and −0.02 ≤ ϕ̇ ≤ 0.2, and partition the input space of the remaining two input variables into a grid of 64 tiles. We then check for each tile whether there exists a situation in which the network would output a certain action (left, right) by invoking our decision procedure. Figure 2 shows that the agent primarily relies on the input variable ϕ for making a decision. If the angle of the pole exceeds a certain threshold, the network is guaranteed to make the vehicle move left; on the other hand, if the angle of the pole is below a different threshold, the network moves the vehicle right. Interestingly, this pattern is non-symmetric, despite the task being entirely symmetric.

For our second task, we checked the fairness specification proposed by Giacobbe et al. [26] to evaluate the maximum influence of a single input variable on the decision of a network. We train a neural network on student data to predict the score on a math exam. Among other personal features, the gender of a person is fed into the network for making a decision. As the training data contains a bias in the form of a higher average math score for male participants, the network might learn to underestimate the math score of female students. We employ our decision procedure to compute the maximum influence of the gender of a person on their predicted math score. First, we create encodings of the same network (3 layers of size 6, 16, and 1) that share all input variables except the gender as a single fixed-point theory formula. We then constrain the predicted scores such that one network outputs a score that is c higher than the score predicted by the other network. Finally, we perform binary search by iteratively invoking our decision procedure to find the bias c at which the formula changes from satisfiable to unsatisfiable. Table 3 shows that there exists a hypothetical person whose predicted math score would drop by 11.5 points out of 100 if the person were female instead of male. Moreover, our results also show that no person's predicted math score would change by 11.75 points if their gender were changed.
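The binary search over the bias c can be sketched as follows; is_sat stands for a hypothetical callback that invokes our decision procedure on the two-copy encoding, and the tolerance is illustrative:

```python
def max_bias(is_sat, lo: float, hi: float, tol: float = 0.25) -> float:
    """Binary search for the largest output gap c that is still satisfiable."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if is_sat(mid):   # some input yields outputs differing by >= mid
            lo = mid      # a larger gap may still exist; search above
        else:
            hi = mid      # unsatisfiable: the maximum influence is below mid
    return lo

# With the results reported above, is_sat(11.5) holds while is_sat(11.75)
# does not, so the search brackets the maximum influence between the two.
```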
8 Related Work

Rümmer and Wahl [44] and Brain et al. [8] propose and formalize the SMT theory of IEEE-754 floating-point arithmetic. We were inspired by these papers both in terms of the syntax and the formalization of the semantics of our theory. There are several decision procedures for the floating-point theory. In particular, Brain et al. [7] present an efficient and verified reduction from the theory of floating-points to the theory of bit-vectors, while Leeser et al. [33] solve the floating-point theory by reducing it to the theory of reals. These two decision procedures are much more complicated than the ones we describe in Sect. 5, due to the more complex nature of floating-point arithmetic.

In the rest of this section, we introduce related approaches that perform verification or synthesis of programs that use fixed-point arithmetic. Many of these approaches, and in particular the SMT-based ones, could benefit from our unified formalization of the theory of fixed-point arithmetic. For example, they could leverage our decision procedures instead of developing their own from scratch. Moreover, having the same format allows for easier sharing of benchmarks and comparison of results among different decision procedures.

Eldib et al. [21] present an SMT-based method for synthesizing optimized fixed-point computations that satisfy certain acceptance criteria, which they rigorously verify using an SMT solver. Similarly to our paper, their approach encodes fixed-point arithmetic operations using the theory of bit-vectors. Anta et al. [3] tackle the verification problem of the stability of fixed-point controller implementations. They provide a formalization of fixed-point arithmetic semantics using bit-vectors, but unlike our paper they do not formalize rounding and overflows. Furthermore, they encode fixed-point arithmetic using unbounded integer arithmetic, arguing that it is a better fit for their symbolic analysis. We could also reduce our bit-vector encoding to unbounded integers following a scheme similar to Anta et al.'s. The bounded model checker ESBMC [15, 24] supports fixed-point arithmetic and has been used to verify safety properties of fixed-point digital controllers [1]. Like us, it employs a bit-vector encoding. However, it is unclear exactly which fixed-point operations it supports. UppSAT [50] is an approximating SMT solver that leverages fixed-point arithmetic as an approximation theory for floating-point arithmetic. Like the aforementioned work, UppSAT encodes fixed-point arithmetic using the theory of bit-vectors. Its encoding ignores rounding modes, but adds special values such as infinities.

In addition to SMT-based verification, another important aspect of reasoning about fixed-point computations is error bound analysis, which is often used for the synthesis of fixed-point implementations. Majumdar et al. [38] synthesize Pareto-optimal fixed-point implementations of control software with respect to performance criteria and error bounds. They reduce error bound computation to an optimization problem solved by mixed-integer linear programming. Darulova et al. [16] compile real-valued expressions to fixed-point expressions, and rigorously show that the generated expressions satisfy given error bounds. Their error bound analysis is static and based on affine arithmetic. Volkova et al. [48, 49] propose an approach to determine the fixed-point format that ensures the absence of overflows and minimizes errors; their error analysis is based on the Worst-Case Peak Gain measure. TAFFO [12] is an LLVM plugin that performs precision tuning by replacing floating-point computations with their fixed-point counterparts. The quality of the precision tuning is determined by a static error propagation analysis.

9 Conclusions

In this paper, we propose an SMT theory of fixed-point arithmetic to facilitate SMT-based verification of fixed-point programs and systems by promoting the development of decision procedures for the proposed theory. We introduce the syntax of fixed-point sorts and operations in the SMT-LIB format, similar to that of the SMT floating-point theory. Then, we formalize the semantics of the fixed-point theory, including rounding and overflow, based on exact rational arithmetic. We develop two decision procedures for the fixed-point theory that encode it into the theories of bit-vectors and reals. Finally, we study the performance of our prototype decision procedures on a set of benchmarks, and perform a realistic case study by proving properties of quantized neural networks. As future work, we plan to add more complex operations to the fixed-point theory, such as conversions to/from floating-points and the remainder operation.
Moreover, we would like to apply the fixed-point theory to verify existing software implementations of fixed-point arithmetic in different programming languages. We plan to do this by integrating it into the Boogie intermediate verification language [18] and the SMACK verification toolchain [10, 42].

References

1. Bounded model checking for fixed-point digital filters
2. Towards verified, constant-time floating point operations
3. Automatic verification of control system implementations
4. CVC4 (CAV 2011)
5. Software verification and verifiable witnesses
6. Deciding floating-point logic with abstract conflict driven clause learning
7. Building better bit-blasting for floating-point problems
8. An automatable formal semantics for IEEE-754 floating-point arithmetic
9. Boolector: an efficient SMT solver for bit-vectors and arrays
10. SMACK software verification toolchain
11. Learning in practice: reasoning about quantization
12. TAFFO: tuning assistant for floating to fixed point optimization
13. The MathSAT5 SMT solver
14. A three-tier strategy for reasoning about floating-point numbers in SMT
15. SMT-based bounded model checking for embedded ANSI-C software
16. Synthesis of fixed-point programs
17. Z3: an efficient SMT solver
18. BoogiePL: a typed procedural language for checking object-oriented programs
19. Yices 2.2
20. Formal verification of piece-wise linear feed-forward neural networks
21. An SMT based method for optimizing arithmetic computations in embedded software code
24. ESBMC 5.0: an industrial-strength C model checker
25. PySMT: a solver-agnostic library for fast prototyping of SMT-based algorithms
26. How many bits does it take to quantize your neural network?
27. Deep learning with limited numerical precision
28. Signed 15.16 precision fixed-point arithmetic
29. Stochastic local search for solving floating-point constraints
30. Quantization and training of neural networks for efficient integer-arithmetic-only inference
31. Reluplex: an efficient SMT solver for verifying deep neural networks
32. Convolutional deep belief networks on CIFAR-10
33. Make it real: effective floating-point reasoning via exact arithmetic
34. Cross platform fixed point maths library
35. Just fuzz it: solving floating-point constraints using coverage-guided fuzzing
36. Floating-point symbolic execution: a case study in n-version programming
37. Fixed point quantization of deep convolutional networks
38. Synthesis of minimal-error control software
39. Alive-FP: automated verification of floating point based peephole optimizations in LLVM
40. The building blocks of interpretability (Distill)
41. Verifying (in-)stability in floating-point programs by increasing precision, using SMT solving
42. SMACK: decoupling source language details from verifier implementations
43. Learning representations by back-propagating errors
44. An SMT-LIB theory of binary floating-point arithmetic
45. SMT-LIB benchmarks in the QF_FP theory
46. SMT-LIB: the satisfiability modulo theories library
47. Evaluating robustness of neural networks with mixed integer programming
48. Determining fixed-point formats for a digital filter implementation using the worst-case peak gain measure
49. Arithmetic approaches for rigorous design of reliable fixed-point LTI filters
50. Exploring approximations for floating-point arithmetic using UppSAT