key: cord-0045895-bgbkgdbo
authors: Klęsk, Przemysław; Bera, Aneta; Sychel, Dariusz
title: Reduction of Numerical Errors in Zernike Invariants Computed via Complex-Valued Integral Images
date: 2020-05-22
journal: Computational Science - ICCS 2020
DOI: 10.1007/978-3-030-50420-5_24
sha: b435fef5a351e5bd887d1068a9391ad5be12cfa5
doc_id: 45895
cord_uid: bgbkgdbo

Floating-point arithmetic may lead to numerical errors when numbers involved in an algorithm vary strongly in their orders of magnitude. In this paper we study the numerical stability of Zernike invariants computed via complex-valued integral images according to a constant-time technique from [2], suitable for object detection procedures. We indicate numerically fragile places in these computations and identify their cause, namely binomial expansions. To reduce numerical errors we propose piecewise integral images and derive a numerically safer formula for Zernike moments. Apart from algorithmic details, we provide two object detection experiments. They confirm that the proposed approach improves the accuracy of detectors based on Zernike invariants.

The classical approach to object detection is based on sliding-window scans. It is computationally expensive, involves a large number of image fragments (windows) to be analyzed, and in practice precludes the applicability of advanced methods for feature extraction. In particular, many moment functions [9], commonly applied in image recognition tasks, are often precluded from detection, as they involve inner products, i.e. linear-time computations with respect to the number of pixels. Also, deep learning approaches cannot be applied directly in detection and require preliminary stages of prescreening or region proposal.

There exist a few feature spaces (or descriptors) that have managed to bypass the mentioned difficulties owing to constant-time techniques discovered for them within the last two decades. Haar-like features (HFs), local binary patterns (LBPs) and the HOG descriptor are state-of-the-art examples from this category [1,4,14]. The crucial algorithmic trick that underlies these methods and allows for constant-time, O(1), feature extraction is integral images. They are auxiliary arrays storing cumulative pixel intensities or other pixel-related expressions. Having prepared them before the actual scan, one is able to compute the wanted sums fast via so-called 'growth' operations. Each growth involves two subtractions and one addition using four entries of an integral image.

In our research we try to broaden the existing repertoire of constant-time techniques for feature extraction. In particular, we have managed to construct such new techniques for Fourier moments (FMs) [7] and Zernike moments (ZMs) [2]. In both cases a set of integral images is needed. For FMs, the integral images cumulate products of the image function and suitable trigonometric terms, and have the following general forms: $\sum_{j,k} f(j,k)\cos\bigl(-2\pi(jt/\mathrm{const}_1 + ku/\mathrm{const}_2)\bigr)$ and $\sum_{j,k} f(j,k)\sin\bigl(-2\pi(jt/\mathrm{const}_1 + ku/\mathrm{const}_2)\bigr)$, where $f$ denotes the image function and $t$, $u$ are order-related parameters. With such integral images prepared, each FM requires only 21 elementary operations (including 2 growths) to be extracted during a detection procedure. In the case of ZMs, complex-valued integral images need to be prepared, having the form $\sum_{j,k} f(j,k)(k - ij)^t (k + ij)^u$, where $i$ stands for the imaginary unit ($i^2 = -1$).
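To make the 'growth' mechanics concrete, the following minimal C++ sketch builds a plain real-valued integral image and evaluates a rectangle sum in O(1); it is our illustration of the general trick (names are ours), not code from [2] or the cited detectors.

#include <vector>

// A real-valued integral image: ii(j,k) cumulates f over rows 0..j, columns 0..k.
// The array is padded with a leading zero row/column to keep growths branchless.
struct IntegralImage {
    int rows, cols;
    std::vector<double> ii;  // (rows+1) x (cols+1)

    explicit IntegralImage(const std::vector<std::vector<double>>& f)
        : rows((int)f.size()), cols((int)f[0].size()),
          ii((size_t)(rows + 1) * (cols + 1), 0.0) {
        for (int j = 1; j <= rows; ++j)
            for (int k = 1; k <= cols; ++k)
                ii[j * (cols + 1) + k] = f[j - 1][k - 1]
                    + ii[(j - 1) * (cols + 1) + k]
                    + ii[j * (cols + 1) + k - 1]
                    - ii[(j - 1) * (cols + 1) + k - 1];
    }

    // Growth over rows j1..j2 and columns k1..k2: four lookups,
    // two subtractions and one addition, independent of rectangle size.
    double growth(int j1, int k1, int j2, int k2) const {
        auto at = [&](int j, int k) { return ii[(j + 1) * (cols + 1) + (k + 1)]; };
        return at(j2, k2) - at(j1 - 1, k2) - at(j2, k1 - 1) + at(j1 - 1, k1 - 1);
    }
};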
The formula to extract a single ZM of order (p, q) is more intricate and requires roughly $\frac{1}{24}p^3 - \frac{1}{8}pq^2 + \frac{1}{12}q^3$ growths, but the calculation time is still not proportional to the number of pixels. It should be remarked that in [2] we flagged up, but did not tackle, the problem of numerical errors that may occur when computations of ZMs are backed with integral images. ZMs are complex numbers, hence the natural data type for them is the complex type with real and imaginary parts stored in the double precision of the IEEE-754 standard for floating-point numbers (a precision of approximately 16 decimal digits). The main culprit behind possible numerical errors are binomial expansions. As we shall show, the algorithm must explicitly expand two binomial expressions to benefit from integral images, which leads to numbers of different magnitudes being involved in the computations. When multiple additions on such numbers are carried out, digits of the smaller-magnitude numbers can be lost.

In this paper we address the topic of numerical stability. The key new contribution is piecewise integral images. Based on them, we derive a numerically safer formula for the computation of a single Zernike moment. The resulting technique introduces some computational overhead, but remains a constant-time technique. Recent literature confirms that ZMs are still being applied in many image recognition tasks, e.g. human age estimation [8], electrical symbols recognition [16], traffic sign recognition [15], and tumor diagnostics from magnetic resonance [13]. Yet, it is quite difficult to find examples of detection tasks applying ZMs directly. Below we describe the constant-time approach to extract ZMs within detection, together with the proposition of numerically safe computations.

ZMs can be defined in both polar and Cartesian coordinates as:

$$M_{p,q} = \frac{p+1}{\pi}\int_0^{2\pi}\!\!\int_0^1 f(r,\theta)\,\overline{V_{p,q}(r,\theta)}\;r\,dr\,d\theta \qquad (1)$$

$$\phantom{M_{p,q}} = \frac{p+1}{\pi}\iint_{x^2+y^2\le 1} f(x,y)\sum_{s=0}^{(p-|q|)/2}\beta_{p,q,s}\,(x-iy)^{\frac{1}{2}(p+q)-s}\,(x+iy)^{\frac{1}{2}(p-q)-s}\,dx\,dy, \qquad (2)$$

where $i$ is the imaginary unit ($i^2 = -1$), and $f$ is a mathematical or an image function defined over the unit disk [2,17]. The indexes $p$ and $q$, representing the moment order, must be simultaneously even or odd, and $p \ge |q|$. ZMs are in fact the coefficients of an expansion of the function $f$, given in terms of the Zernike polynomials $V_{p,q}$ as the orthogonal base:

$$f(r,\theta) = \sum_{p=0}^{\infty}\ \sum_{\substack{|q|\le p\\ p-|q|\ \mathrm{even}}} M_{p,q}\,V_{p,q}(r,\theta), \qquad (4)$$

where $V_{p,q}(r,\theta) = \sum_{s=0}^{(p-|q|)/2}\beta_{p,q,s}\,r^{p-2s}e^{iq\theta}$. As one can note, $V_{p,q}$ combines a standard polynomial defined over the radius $r$ and a harmonic part defined over the angle $\theta$. In applications, finite partial sums of the expansion (4) are used. Suppose $\rho$ and $\sigma$ denote the imposed maximum orders, polynomial and harmonic, respectively, with $\sigma \le \rho$. Then the partial sum that approximates $f$ can be written down as:

$$f(r,\theta) \approx \sum_{0\le p\le\rho}\ \sum_{\substack{|q|\le\min\{p,\sigma\}\\ p-|q|\ \mathrm{even}}} M_{p,q}\,V_{p,q}(r,\theta). \qquad (5)$$

ZMs are invariant to scale transformations but, as such, are not invariant to rotation. Yet, they do allow to build suitable expressions with that property. Suppose $f'$ denotes a version of the function $f$ rotated by an angle $\alpha$, i.e. $f'(r,\theta) = f(r,\theta+\alpha)$. It is straightforward to check, deriving from (1), that the following identity holds:

$$M'_{p,q} = e^{iq\alpha}M_{p,q}, \qquad (6)$$

where $M'_{p,q}$ represents a moment of the rotated function $f'$. Hence in particular, the moduli of ZMs are one type of rotational invariants, since

$$|M'_{p,q}| = |e^{iq\alpha}|\,|M_{p,q}| = |M_{p,q}|. \qquad (7)$$

Apart from the moduli, one can also look at the following products of moments, with the first factor raised to a natural power:

$$\bigl(M_{p,q}\bigr)^{n} M_{v,s}. \qquad (8)$$

After rotation one obtains

$$\bigl(M'_{p,q}\bigr)^{n} M'_{v,s} = e^{i(nq+s)\alpha}\bigl(M_{p,q}\bigr)^{n} M_{v,s}. \qquad (9)$$

Hence, by forcing $nq+s=0$ one can obtain many rotational invariants, because the angle-dependent term $e^{i(nq+s)\alpha}$ disappears. In practical tasks it is more convenient to work with rectangular, rather than circular, image fragments.
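The radial coefficients $\beta_{p,q,s}$ are not restated above; for completeness, the C++ sketch below computes them from the classical factorial formula for Zernike radial polynomials (a standard formula, quoted here on our own account rather than from this paper).

#include <cmath>

// Standard Zernike radial-polynomial coefficient:
// beta_{p,q,s} = (-1)^s (p-s)! / ( s! ((p+|q|)/2 - s)! ((p-|q|)/2 - s)! ),
// defined for p >= |q|, p - |q| even, and 0 <= s <= (p - |q|)/2.
static double factorial(int n) {          // exact in double for n <= 18,
    double r = 1.0;                       // which covers the low orders used here
    for (int i = 2; i <= n; ++i) r *= i;
    return r;
}

double beta(int p, int q, int s) {
    int aq = std::abs(q);
    double sign = (s % 2 == 0) ? 1.0 : -1.0;
    return sign * factorial(p - s)
         / (factorial(s) * factorial((p + aq) / 2 - s) * factorial((p - aq) / 2 - s));
}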
Singh and Upneja [12] proposed a workaround to this problem: a square of size $w \times w$ pixels ($w$ even) becomes inscribed in the unit disc, and zeros are "laid" over the square-disc complement. This reduces integration over the disc to integration over the square. The inscription implies that widths of pixels become $\sqrt{2}/w$ and their areas $2/w^2$ (a detail important for integration). By iterating over pixel indexes $0 \le j,k \le w-1$, one generates the following Cartesian coordinates within the unit disk:

$$x_k = \frac{\sqrt{2}}{w}\Bigl(k - \frac{w-1}{2}\Bigr), \qquad y_j = \frac{\sqrt{2}}{w}\Bigl(\frac{w-1}{2} - j\Bigr). \qquad (10)$$

In detection, it is usually sufficient to replace the integration involved in $M_{p,q}$ by a suitable summation, thereby obtaining a zeroth-order approximation. In subsequent sections we use the formula below, which represents such an approximation (hat symbol) and is adjusted to have a convenient indexing for our purposes:

$$\widehat M_{2p+o,2q+o} = \frac{4p+2o+2}{\pi w^2}\sum_{j=0}^{w-1}\sum_{k=0}^{w-1} f(j,k)\sum_{s=q}^{p}\beta_{2p+o,2q+o,p-s}\,(x_k - iy_j)^{s+q+o}\,(x_k + iy_j)^{s-q}. \qquad (11)$$

Namely, we have introduced the substitutions $p := 2p+o$, $q := 2q+o$. They play two roles: they make it clear via the flag $o \in \{0,1\}$ whether a moment is even or odd, and they allow to construct integral images, since the exponents $\frac{1}{2}(p \mp q) - s$ present in (2) are now suitably reduced.

Suppose a digital image of size $n_x \times n_y$ is traversed by a $w \times w$ sliding window. For clarity we discuss only a single-scale scan. The situation is sketched in Fig. 1. Let $(j,k)$ denote global coordinates of a pixel in the image. For each window under analysis, its offset (top-left corner) will be denoted by $(j_0,k_0)$. Thus, the indexes of pixels that belong to the window are $j_0 \le j \le j_0+w-1$, $k_0 \le k \le k_0+w-1$. Given a global index $(j,k)$ of a pixel, the local Cartesian coordinates corresponding to it (mapped to the unit disk) can be expressed as:

$$(j_c,k_c) = \Bigl(j_0 + \frac{w-1}{2},\ k_0 + \frac{w-1}{2}\Bigr), \qquad (12)$$

$$x_k = \frac{\sqrt{2}}{w}(k - k_c), \qquad y_j = \frac{\sqrt{2}}{w}(j_c - j), \qquad (13)$$

where $(j_c,k_c)$ is the central index of the window. Let $\{ii_{t,u}\}$ denote a set of complex-valued integral images:

$$ii_{t,u}(j,k) = \sum_{0\le j'\le j}\ \sum_{0\le k'\le k} f(j',k')\,(k'-ij')^t\,(k'+ij')^u, \qquad (14)$$

where the pairs of indexes $(t,u)$ generating the set satisfy $t,u \ge 0$ and $t+u \le \rho$. For any integral image we also define the growth operator over a rectangle spanning from $(j_1,k_1)$ to $(j_2,k_2)$:

$$\Delta_{(j_1,k_1)}^{(j_2,k_2)}(ii) = ii(j_2,k_2) - ii(j_1-1,k_2) - ii(j_2,k_1-1) + ii(j_1-1,k_1-1),$$

performed with two complex-valued subtractions and one addition (entries with negative indexes are treated as zeros). The main result from [2] (see there for proof) is as follows.

Proposition 1. Suppose the set of integral images $\{ii_{t,u}\}$, defined by (14), has been prepared prior to the detection procedure. Then, for any square window in the image, each of its Zernike moments (11) can be calculated in constant time, O(1), regardless of the number of pixels in the window, as follows:

$$\widehat M_{2p+o,2q+o} = \frac{4p+2o+2}{\pi w^2}\sum_{s=q}^{p}\beta_{2p+o,2q+o,p-s}\Bigl(\frac{\sqrt{2}}{w}\Bigr)^{2s+o}\cdot\sum_{t=0}^{s-q}\binom{s-q}{t}(-k_c + ij_c)^{s-q-t}\sum_{u=0}^{s+q+o}\binom{s+q+o}{u}(-k_c - ij_c)^{s+q+o-u}\,\Delta_{(j_0,k_0)}^{(j_0+w-1,\,k_0+w-1)}(ii_{t,u}). \qquad (16)$$

Floating-point additions or subtractions are dangerous operations [6,10], because when numbers of different magnitudes are involved, the right-most digits in the mantissa of the smaller-magnitude number can be lost when widely spaced exponents are aligned to perform the operation. When ZMs are computed according to Proposition 1, such situations can arise in two places. The connection between the definition-style ZM formula (11) and the integral-images-based formula (16) is given by the mappings (13), which relate global coordinates to local unit disks. When the mapping formulas are plugged into $x_k$ and $y_j$ in (11), the following subexpression arises under the summations:

$$\Bigl(\frac{\sqrt{2}}{w}\Bigr)^{2s+o}\bigl((k+ij) + (-k_c - ij_c)\bigr)^{s+q+o}\,\bigl((k-ij) + (-k_c + ij_c)\bigr)^{s-q}.$$

Now, to benefit from integral images one has to explicitly expand the two binomial expressions, distinguishing two groups of terms: $k \mp ij$, dependent on the global pixel index, and $-k_c \pm ij_c$, independent of it. By doing so, the coordinates of the particular window can be isolated out and formula (16) is reached. Unfortunately, this also creates two numerically fragile places. The first one are the integral images themselves, defined by (14).
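A direct, non-piecewise rendering of the integral images (14) and of the growth operator may look as follows in C++ (our sketch of the construction, not the authors' implementation); note that it is precisely this variant whose summands can span many orders of magnitude, as analyzed next.

#include <complex>
#include <vector>

using cplx = std::complex<double>;
using Grid = std::vector<std::vector<cplx>>;

// One complex-valued integral image ii_{t,u} from (14):
// ii_{t,u}(j,k) = sum over j' <= j, k' <= k of f(j',k') (k'-ij')^t (k'+ij')^u.
Grid makeIntegralImage(const std::vector<std::vector<double>>& f, int t, int u) {
    const int ny = (int)f.size(), nx = (int)f[0].size();
    Grid ii(ny, std::vector<cplx>(nx));
    for (int j = 0; j < ny; ++j)
        for (int k = 0; k < nx; ++k) {
            // power term for the current pixel; grows like k^(t+u)
            cplx s = f[j][k] * std::pow(cplx(k, -j), t) * std::pow(cplx(k, j), u);
            if (j > 0) s += ii[j - 1][k];
            if (k > 0) s += ii[j][k - 1];
            if (j > 0 && k > 0) s -= ii[j - 1][k - 1];
            ii[j][k] = s;
        }
    return ii;
}

// Growth over rows j1..j2, columns k1..k2 (two subtractions, one addition).
cplx growth(const Grid& ii, int j1, int k1, int j2, int k2) {
    cplx r = ii[j2][k2];
    if (j1 > 0) r -= ii[j1 - 1][k2];
    if (k1 > 0) r -= ii[j2][k1 - 1];
    if (j1 > 0 && k1 > 0) r += ii[j1 - 1][k1 - 1];
    return r;
}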
Global pixel indexes $j$, $k$, present in the power terms $(k-ij)^t(k+ij)^u$, can reach values of the order of $10^3$. Hence, for an image of size e.g. $640 \times 480$, the summands in (14) vary in magnitude roughly from $10^{0\cdot(t+u)}$ to $10^{3\cdot(t+u)}$. To fix an example, suppose $t+u = 10$ (achievable e.g. when $\rho = \sigma = 10$) and assume a roughly constant image function. Then the integral image $ii_{t,u}$ has to cumulate values ranging from $10^0$ up to $10^{30}$. Obviously, the round-off errors amplify as the $ii_{t,u}$ sum progresses towards the bottom-right image corner. The second fragile place are the expressions $(-k_c + ij_c)^{s-q-t}$ and $(-k_c - ij_c)^{s+q+o-u}$ involving the central index, see (16). Their products, too, can become very large in magnitude as computations move towards the bottom-right image corner.

In error reports we shall observe relative errors, namely:

$$\frac{|\phi - \phi^*|}{|\phi^*|},$$

where $\phi$ denotes a feature (we skip indexes for readability) computed via integral images, while $\phi^*$ is its value computed by the definition. To give the reader an initial outlook, we remark that in our C++ implementation noticeable errors start to be observed already for the setting $\rho = \sigma = 8$, but they do not yet affect the detection accuracy significantly. For $\rho = \sigma = 10$, relevant errors are already frequent and they clearly deteriorate accuracy. For example, for the sliding window of size $48 \times 48$, about 29.7% of all features have relative errors of at least 25%. The numerically safe approach we are about to present reduces this fraction to 0.7%.

The technique we propose for the reduction of numerical errors is based on integral images that are defined piecewise. We partition every integral image into a number of adjacent pieces, say of size $W \times W$ (border pieces may be smaller due to remainders), where $W$ is chosen to exceed the maximum allowed width of the sliding window. Each piece obtains its own "private" coordinate system. Informally speaking, the $(j,k)$ indexes present in formula (14) become reset to $(0,0)$ at the top-left corners of successive pieces. Similarly, the values cumulated so far in each integral image $ii_{t,u}$ become zeroed at those points. Algorithm 1 demonstrates this construction.

During the detection procedure, global coordinates $(j,k)$ can still be used in the main loops to traverse the image, but once the window position gets fixed, say at $(j_0,k_0)$, we recalculate that position to new coordinates $(j_0',k_0')$, valid for the current piece, in the following manner:

$$j_0' = j_0 \bmod W, \qquad k_0' = k_0 \bmod W.$$

Note that due to the introduced partitioning, the sliding window may cross partitioning boundaries and reside in either one, two or four pieces of integral images. The last situation is illustrated in Fig. 2. Therefore, in the general case, the outcomes of growth operations $\Delta$ for the whole window have to be combined from four parts, denoted in the figure by $P_1, P_2, P_3, P_4$. An important role in that context is played by the central index $(j_c,k_c)$. In the original approach its position was calculated only once, using global coordinates and formula (12). The new technique requires that we "see" the central index differently from the point of view of each part $P_i$. We give the suitable formulas below, treating the first part $P_1$ as the reference:

$$(j_{c,P_1},\,k_{c,P_1}) = \Bigl(j_0' + \frac{w-1}{2},\ k_0' + \frac{w-1}{2}\Bigr),$$
$$(j_{c,P_2},\,k_{c,P_2}) = (j_{c,P_1},\ k_{c,P_1} - W), \qquad (j_{c,P_3},\,k_{c,P_3}) = (j_{c,P_1} - W,\ k_{c,P_1}), \qquad (j_{c,P_4},\,k_{c,P_4}) = (j_{c,P_1} - W,\ k_{c,P_1} - W).$$

Note that the above formulation can, in particular, yield negative coordinates. More precisely, depending on the window position and the part $P_i$, the coordinates of the central index can range within $-W + w/2 < j_{c,P_i},\,k_{c,P_i} < 2W - w/2$.
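The digit-loss effect is easy to reproduce in isolation. The toy program below (ours, purely illustrative) mimics what happens when an $ii_{t,u}$ entry of large magnitude absorbs a pixel-sized summand that a later growth operation is supposed to recover:

#include <cstdio>

int main() {
    // Doubles carry about 16 significant decimal digits. At magnitude 1e16
    // the spacing between representable values is 2, so a summand of 1.0
    // vanishes when added to the accumulator, and no later subtraction of
    // large entries (which is what a growth does) can bring it back.
    double big = 1e16;   // like a high-order ii_{t,u} entry near the image corner
    double small = 1.0;  // like a single-pixel contribution
    std::printf("%.17g\n", (big + small) - big);  // prints 0: digits lost
    std::printf("%.17g\n", small + (big - big));  // prints 1: magnitudes kept apart
    return 0;
}

Preventing the mixed-magnitude additions from happening in the first place, rather than reordering them afterwards, is what the piecewise construction described above aims at.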
Algorithm 1. Construction of a piecewise integral image.

procedure PiecewiseIntegralImage(f, t, u, W)
  create array ii_{t,u} of size n_x × n_y and auxiliary array ii of size n_y
  k := 0, j := 0
  for x := 0, ..., n_x − 1 do
    if x mod W = 0 then k := 0
    for y := 0, ..., n_y − 1 do
      if y mod W = 0 then j := 0

As the starting point for the derivation we use a variant of formula (11), taking into account the general window position, with global pixel indexes ranging as follows: $j_0 \le j \le j_0+w-1$, $k_0 \le k \le k_0+w-1$. In the first pass we substitute the local coordinates (13), with the constant $\frac{4p+2o+2}{\pi w^2}$ factored out in front of the summations.

The second pass in the derivation of (25) splits the original summation into four smaller summations over window parts lying within different pieces of integral images, see Fig. 2. We write this down for the most general case, when the window crosses partitioning boundaries along both axes. We remind that three simpler cases are also possible: $\{P_1\}$ (no crossings), $\{P_1, P_2\}$ (only a vertical boundary crossed), $\{P_1, P_3\}$ (only a horizontal boundary crossed). The parts are directly implied by the offset $(j_0,k_0)$ of the detection window, its width $w$, and the piece size $W$. For strictness, we could have written a functional dependence of the form $P(j_0,k_0,w,W)$, which we skip for readability. Also in this pass we switch from global coordinates $(j,k)$ to piecewise coordinates $(j_P,k_P)$, valid for the given part $P$. The connection between those coordinates can be expressed as follows:

$$j = W\lfloor j/W\rfloor + j_P, \qquad k = W\lfloor k/W\rfloor + k_P.$$

In the third pass we apply two binomial expansions. In both of them we distinguish two groups of terms: $-k_{c,P} \pm ij_{c,P}$ (independent of the current pixel index) and $k_P \mp ij_P$ (dependent on it). Lastly, we change the order of summations and apply the constant-time $\Delta$ operation.

The key idea behind formula (25), regarding its numerical stability, lies in the fact that it uses decreased values of indexes $j_P, k_P, j_{c,P}, k_{c,P}$ compared to the original approach, while maintaining the same differences $k_P - k_{c,P}$ and $j_{c,P} - j_P$. Recall the initial expressions $(k - k_c)\sqrt{2}/w$ and $(j_c - j)\sqrt{2}/w$ and note that one can introduce arbitrary shifts, e.g. $k := k \pm \alpha$ and $k_c := k_c \pm \alpha$ (analogically for the pair $j$, $j_c$), as long as the differences remain intact. Later, when the indexes in such a pair become separated from each other due to binomial expansions, their numerical behaviour is safer because the orders of magnitude are decreased.
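To illustrate the bookkeeping used in this derivation, the C++ sketch below (a hypothetical helper of ours, assuming the part layout of Fig. 2: $P_1$ top-left, $P_2$ top-right, $P_3$ bottom-left, $P_4$ bottom-right) recasts a window offset into piece-local coordinates and enumerates the growth rectangles together with the central index as seen from each part:

#include <vector>

struct Part {
    int j1, k1, j2, k2;  // growth rectangle, in piece-local coordinates
    double jc, kc;       // central index as "seen" from this part
};

// Decompose a w x w window at global offset (j0,k0) over pieces of size W x W.
std::vector<Part> decompose(int j0, int k0, int w, int W) {
    int j0p = j0 % W, k0p = k0 % W;     // piece-local window offset
    double jc = j0p + 0.5 * (w - 1);    // central index w.r.t. part P1
    double kc = k0p + 0.5 * (w - 1);
    bool crossH = j0p + w > W;          // horizontal boundary crossed (rows)
    bool crossV = k0p + w > W;          // vertical boundary crossed (columns)
    int jEnd = crossH ? W : j0p + w;    // one past the last row within P1
    int kEnd = crossV ? W : k0p + w;
    std::vector<Part> parts;
    parts.push_back({j0p, k0p, jEnd - 1, kEnd - 1, jc, kc});                        // P1
    if (crossV)
        parts.push_back({j0p, 0, jEnd - 1, k0p + w - 1 - W, jc, kc - W});           // P2
    if (crossH)
        parts.push_back({0, k0p, j0p + w - 1 - W, kEnd - 1, jc - W, kc});           // P3
    if (crossH && crossV)
        parts.push_back({0, 0, j0p + w - 1 - W, k0p + w - 1 - W, jc - W, kc - W});  // P4
    return parts;
}

Each returned rectangle is then fed to the growth operator of the corresponding piece, and the per-part central indexes enter the binomial factors of (25).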
Features with high errors can be regarded as damaged, or even useless, for machine learning. In Table 1 and Fig. 3 we report percentages of such features for a given image, comparing the original and the numerically safer approach. The figure shows how the percentage of features with relative errors greater than 0.25 increases as the sliding window moves towards the bottom-right image corner.

In subsequent experiments we apply the RealBoost (RB) learning algorithm, producing ensembles of weak classifiers: shallow decision trees (RB+DT) or bins (RB+B); for details see [2,5,11]. Ensemble sizes are denoted by T. Two types of feature spaces are used: one based solely on the moduli of Zernike moments (ZMs-M), and one based on extended Zernike product invariants (ZMs-E); see formulas (7) and (8), respectively. The suffix '-NER' stands for 'numerical errors reduction'. Feature counts are specified in parentheses; hence, e.g., ZMs-E-NER (14250) represents extended Zernike invariants involving over 14 thousand features, computed according to formula (25) based on piecewise integral images.

Experiment: "Synthetic A letters". For this experiment we have prepared a synthetic data set based on font material gathered by T.E. de Campos et al. [2,3]. A subset representing 'A' letters was treated as positives (targets). In train images, only objects with limited rotations were allowed (±45° with respect to their upright positions). In contrast, in test images rotations within the full range of 360° were allowed. Table 2 lists the details of the experimental setup. We report results in the following forms: examples of detection outcomes (Fig. 4), ROC curves over a logarithmic FAR axis (Fig. 5a), and test accuracy measures for selected best detectors (Table 4). Accuracy results were obtained by a batch detection procedure on 200 test images including 417 targets (within the total of 3 746 383 windows). The results show that detectors based on features obtained by the numerically safe formula achieved higher classification quality. This can be well observed in the ROCs, where the solid curves related to the NER variant dominate the dashed curves; this is particularly evident for the higher orders ρ and σ (blue and red lines in Fig. 5a). These observations are also confirmed by Table 4.

Experiment: "Faces". The learning material for this experiment consisted of 3 000 images with faces (in upright position) looked up using Google Images, which produced a train set with 7 258 face examples (positives). The testing phase was a two-stage process, consisting of preliminary and final tests. Preliminary tests were meant to generate ROC curves and make an initial comparison of detectors. For this purpose we used another set of faces (also in upright position) containing 1 000 positives and 2 000 000 negatives. The final batch tests were meant to verify the rotational invariance. For this purpose we prepared a third set, looking up faces in unnatural rotated positions (example search queries: "people lying down in circle", "people on floor", "face leaning to shoulder", etc.). We stopped after finding 100 such images, containing 263 faces in total (Table 3).

The ROC curves (Fig. 5b) indicate that sensitivities at satisfactory levels of ≈90% are coupled with false alarm rates of ≈2·10⁻⁴, which, though improved with respect to [2], is still too high to accept. Therefore, we decided to apply the obtained detectors as rotation-invariant prescreeners. Candidate windows selected via prescreening were then analyzed by angle-dependent classifiers. We prepared 16 such classifiers, each responsible for an angular section of 22.5°, trained using 10 125 HFs and the RB+B algorithm. Figure 5b shows an example ROC for an angle-dependent classifier (90° ± 11.25°). The prescreening eliminated about 99.5% of all windows for ZMs-M and 97.5% for ZMs-E.

Table 4 reports detailed accuracy results, while Fig. 6 shows detection examples. It is fair to remark that this experiment turned out to be much harder than the former one. Our best face detectors based on ZMs and invariant to rotation have a sensitivity of about 90% together with about 7% of false alarms per image, which indicates that further fine-tuning or larger training sets should be considered. As regards the comparison between the standard and NER variants, as previously, the ROC curves show a clear advantage of the NER prescreeners (solid curves dominate their dashed counterparts). The accuracy results presented in Table 4 also show such a general tendency, but less evidently. One should remember that these results pertain to combinations 'prescreener (ZMs) + postscreener (HFs)': an improvement in the prescreener alone may, but does not have to, directly influence the quality of the whole system.
We have improved the numerical stability of floating-point computations for Zernike moments (ZMs) backed with piecewise integral images. We hope this proposition can pave the way to more numerous detection applications involving ZMs. A possible future direction for this research could be to analyze the computational overhead introduced by the proposed approach. Such an analysis can be carried out in terms of the expected number of needed growth operations.

References

[1] Embedded face detection application based on local binary patterns
[2] Constant-time calculation of Zernike moments for detection with rotational invariance
[3] Character recognition in natural images
[4] Histograms of oriented gradients for human detection
[5] Additive logistic regression: a statistical view of boosting
[6] What every computer scientist should know about floating-point arithmetic
[7] Constant-time Fourier moments for face detection - can accuracy of Haar-like features be beaten?
[8] Facial age estimation using Zernike moments and multi-layer perceptron
[9] Moment Functions in Image Analysis - Theory and Applications
[10] IEEE standard for floating point numbers
[11] Response binning: improved weak classifiers for boosting
[12] Accurate computation of orthogonal Fourier-Mellin moments
[13] Segmentation and classification of brain tumor using wavelet and Zernike based features on MRI
[14] Rapid object detection using a boosted cascade of simple features
[15] Traffic sign detection and recognition using color standardization and Zernike moments
[16] Feature extraction and image recognition for the electrical symbols based on Zernike moment
[17] Beugungstheorie des Schneidenverfahrens und seiner verbesserten Form, der Phasenkontrastmethode