key: cord-0044773-8d9ig9y4 authors: Ribal, Christophe; Lermé, Nicolas; Le Hégarat-Mascle, Sylvie title: Thin Structures Segmentation Using Anisotropic Neighborhoods date: 2020-05-18 journal: Information Processing and Management of Uncertainty in Knowledge-Based Systems DOI: 10.1007/978-3-030-50146-4_44 sha: f187df682b7f8a87a4b76d6896833b22017d585f doc_id: 44773 cord_uid: 8d9ig9y4 Bayesian and probabilistic models are widely used in image processing to handle noise due to various alteration phenomena. To benefit from the spatial information in a tractable way, Markov Random Fields (MRF) are often assumed with isotropic neighborhoods, that is however at the detriment of the preservation of thin structures. In this study, we aim at relaxing this assumption on stationarity and isotropy of the neighborhood shape in order to get a prior probability term that is relevant not only within the homogeneous areas but also close to object borders and within thin structures. To tackle the issue of neighborhood shape estimation, we propose to use tensor voting, that allows for the estimation of structure direction and saliency at various scales. We propose three main ways to derive anisotropic neighborhoods, namely shape-based, target-based and cardinal-based neighborhood. Then, having defined the neighborhood field, we introduce an energy that will be minimized using graph cuts, and illustrate the benefits of our approach against the use of isotropic neighborhoods in the applicative context of crack detection. First results on such a challenging problem are very encouraging. Image segmentation is a challenging task in the computer vision field, which deals with the problem of partitioning an image (or video) into multiple regions with labels that may later be used in higher level tasks, like object classification, detection or tracking. This is an ill-posed problem since at the pixel level, such operation is prone to noise, corrupted data and all kind of optic phenomena altering the original image, resulting in multiple valid solutions. A common way to overcome these difficulties is to take into account spatial relationships between close pixels in order to favor some solutions exhibiting slow variations in the label field. Classically, one may model the 2-dimensional field of labels as a MRF [10] and compute the segmentation using the Maximum A Posteriori (MAP). Variational approaches are widely used to provide solutions minimizing a functional which uses energy terms representing data fidelity and regularization terms. The numerous energy models for reducing the impact of image artifacts over the output segmentation nevertheless tend to share a common drawback: They behave poorly on thin structures 1 , because of the small size and complex geometry of the latter with respect to neighborhood ones. The early removal of such structures is a well known effect of Total Variation (TV) regularization (e.g. in image reconstruction [20] ) and Potts regularization (e.g. in image segmentation [14] ). Thin structures are however ubiquitous in a number of applications (such as medical imaging or quality control) and detecting them as accurately as possible is therefore of great interest. Alternatively, superpixel decomposition methods have been developed for grouping pixels sharing similar radiometric intensities into regions of controlled spatial extent. Superpixel partitions are generally seen as oversegmentations that preserve small structures but also noise. The benefits of superpixel decomposition is thus to drastically reduce the number of elements to process while keeping the geometrical information that is often lost with multi-resolution approaches and leaving noise removal for further processing steps. Dealing with further processing, a major drawback of a superpixel segmentation is that the usual hypothesis of a regular lattice is lost (i.e. pixels are all of the same size and shape). As a result, image segmentation approaches taking advantage of superpixels must cope with these problems and introduce new spatial relationships. This induces a neighborhood construction step even for isotropic neighborhoods: For instance, a simple criterion is that superpixels are considered as neighbors when they share a common border [6, 9, 15, 21] . The authors of [21] propose to minimize an energy using graph cuts on the adjacency graph obtained from the watershed of the input image. In this graph, edges connecting two adjacent regions are weighted upon their common border length, similarly to [6] . Those neighborhood fields based on adjacency do not favor specific orientations of neighborhoods with respect to superpixel context and/or location. The approach of [11] is to gather all superpixels whose centroid belongs to a disc centered on it and is therefore isotropic as in most of the other superpixel-based segmentation approaches. At the pixel level however, anisotropic approaches have been introduced to minimize the alteration of thin structures by regularization processes [7, 13] . This corresponds to the relaxation of the isotropic hypothesis often introduced when formulating the problem as a MRF. As an alternative to the weighted Total Variation (TV) [19] , the authors of [17] introduce a directional TV approach, based on a "vesselness feature" which aims to detect thin structures. Finally, since we believe that structure orientation estimation is a key aspect of anisotropic regularization approaches, let us mention different ways to estimate it: tensor voting approaches [5, 16] , vesselness operators like RORPO [18] , the Frangi vesselness [8] , or structure aware regression filters [23] to perform structure-dependent image smoothing. By analogy with typical probabilistic modeling, the uniform hypothesis widely used in the absence of prior knowledge corresponds to an isotropic neighborhood, and the specific prior distribution corresponds to an anisotropic neighborhood which can be derived from the observation of the local orientation in our case. We thus propose a methodology that both allows for the relaxation of the isotropic neighborhood which is all the more relevant when we consider the superpixel level, and provides regularized results robust to noise. We consider in this context the construction of elliptic neighborhoods, that originate from [7] and [11] , and of two path-based neighborhoods. Similarly to [17] , we expect these anisotropic neighborhoods to take into account the orientations of image's structures. Thus, we introduce a new field embedding these orientations computed from tensor voting [16] . Finally, we formulate the segmentation problem in an energy minimization framework, and solve it using graph cuts. The rest of the paper is organized as follows. The problem formulation is presented in Sect. 2 and the construction of isotropic and anisotropic neighborhoods is detailed in Sect. 3. Section 4 introduces the energy terms implemented and Sect. 5 compares our results against those obtained with isotropic neighborhoods on real and simulated images. Finally, Sect. 6 outlines the contributions of the paper and discusses future work. A superpixel is a group of pixels, defined by their coordinates in the ndimensional space, n ∈ N >0 the set of positive integers (n = 2 in the experiments presented in Sect. 5). Since each pixel belongs to one and only one superpixel, the set S of K ∈ N >0 superpixels is a partition of the original image. Denoting by P the set of pixels, then each superpixel s is an element of S ⊂ 2 P , i.e. s ∈ 2 P where 2 X denotes the powerset of a set X. The partition constraint implies that ∀p ∈ P, ∃!s ∈ S such that p ∈ s. Notice that the shape of any superpixel is also usually constrained to be composed of a single connected component. We denote by F the feature space holding the spectral information associated to any pixel or superpixel, for instance R (grayscale images), R 3 (color images) or a higher dimensional space (hyperspectral images). To stress that our approach can apply equally to an image of pixels or of superpixels, we define the position and the feature vector of a superpixel s ∈ S. In our case (but other choices could have been done depending on the application), they are the barycenter of the coordinates (in n-dimensional space) of the pixels that compose s and the feature barycenter (in F) of these same pixels. Given a finite set C = {1, . . . , C} of C ∈ N >0 classes, segmenting the image is equivalent to finding a field of labels u ∈ C S . We use the MAP criterion to assign, given the image of superpixels (with superpixels possibly reduced to a single pixel), denoted I∈ F S , a label u s ∈ C to s, ∀s ∈ S. To this end, we set up a functional E, to be minimized over the field of labels u ∈ C S , that encompasses different priors on the labeling u: where α ∈ R >0 is a parameter controlling the balance between the data fidelity term E 1 and the smoothness term E 2 , and V : S → 2 S is the neighborhood field that is fixed. Note that E 1 only depends on the image data and on u. Smoothness prior on the labeling u yields the smoothness term E 2 that is itself based on neighborhood field definition. In this study, we only consider the second order cliques, and we denote by N ⊂ S 2 the set of second order cliques of superpixels. Note that using such a definition, the superpixels of any pair (s, t) ∈ N are not required to have a common boundary. For any superpixel s ∈ S, we define the With the relaxation of isotropy and stationarity constraints on V comes the need to introduce additional priors. First, we formulate the hypothesis that the structure of the neighborhood of a superpixel depends on the structure of the objects in the image. We model such information of structure through a symmetric second order tensor field T ∈ (R n×n ) S and assume that it can be built from the field of labels u. Other considered priors yield different ways to construct the neighborhood field V , presented in the next section. In this section, we aim at defining the neighborhood field V : S → 2 S , possibly anisotropic and non stationary. To each site s ∈ S, we associate a set of sites V (s) ∈ 2 S , where s is either a superpixel or a pixel, such definition being consistent with S = P. To underline the genericity of our formulation, we consider both cases in our experiments. The construction of our anisotropic neighborhoods aims at encouraging strong relationships between sites aligned with respect to the directions of the thin structures of the image. As explained in Sect. 1, the characteristics thereby depicted for these structures, namely orientation and saliency, may be retrieved from vesselness operators [8, 18, 23] or Tensor Voting [16] . In this study, we consider this latter approach where a scale parameter σ ∈ R >0 sets the span of the voting field. Whatever the way they have been estimated, let us represent the thin structure features in a field of second order tensors T ∈ (R n×n ) S . For any site s ∈ S, local orientation and saliency of structure are derived from the eigenvectors and the eigenvalues of the tensor T s . Eigenvectors are ranked by decreasing order of their corresponding eigenvalue. More precisely, for any site s ∈ S, the construction of V is achieved using a set of vectors, We distinguish two families of anisotropic neighborhoods, namely shapebased neighborhoods and path-based neighborhoods, both compared (see Sect. 5) against the following neighbourhoods: Stawiaski's boundary-based neighborhood [21] and Giraud's neighborhood [11] . Note that the latter can be seen as an isotropic restriction of our shape-based neighborhood. Path-based neighborhoods stem from the idea of adapting the neighborhood structure to 1-dimensional thin structures, represented by paths. Formally, for any k ∈ N >0 , we define a path of cardinality k as a set of sites (s 1 , . . . , s k ) ∈ S k such that, in our case, (s i , s i+1 ) have a common boundary (see [21] ), ∀i ∈ {1, . . . , k − 1}. Moreover, we denote by P K the set of paths of cardinality K ∈ N >0 and by P the set of paths of any cardinality k ∈ N >0 . In what follows, we detail three different ways to construct the neighborhood field V using the tensor field T . We are inspired by [11] , using superpixel centroid relative locations and ndimensional shapes instead of discs to settle the shape-based neighborhoods (shape). Whenever the centroid of a superpixel belongs to the computed neighborhood shape of a second one, it is added to the neighborhood of the latter. For computational reasons, we discretize the orientations of the parametric shapes based on the one of − → v 0 (T s ) for any site s ∈ S, which boils down to the use of a dictionary of neighborhood shapes. Notice that the neighborhood V (s) of any site s ∈ S is not necessarily connected with such an approach. Target-based neighborhood (target) is a path-based neighborhood that aims at constructing the neighborhood V (s) of a site s ∈ S by connecting it to two distant sites t 0 , t 1 ∈ S (named "target") through paths of minimal energy. Hence, the connectedness along these paths (and so V (s)) is thus ensured by definition. We propose to find these paths in two stages. Firstly, for any j ∈ {0, 1}, targets connecting s are found with where β ∈ R >0 is a free parameter, V (s) denotes a shape-based neighborhood (see Sect. 3.1), sign (.) denotes the sign of a real number, ., . denotes the scalar product, . denotes the Euclidean norm and − → st denotes the vector connecting any pair of sites (s, t) ∈ S 2 . In Eq. (2), the first term favors the sites s and t to have similar image intensities while the second term favors far targets from s that are aligned with − → v 0 (T s ). Secondly, paths of minimal energy connecting the site s ∈ S to either targets t * 0 or t * 1 (see Eq. (2)) are obtained with where stands for the cardinality of a set. The term to be minimized in Eq. (3) is large when image intensities of successive sites along a path are dissimilar and small otherwise. Finally, the neighborhood V (s) of the site s can be now constructed as follows: V (s) = (p * 0 ∪ p * 1 ) \ {s}. Cardinal-based neighborhood (cardinal) is a path-based neighborhood that aims at constructing the neighborhood V (s) of a site s ∈ S by finding two paths of minimal energy starting from s, in opposite directions (according to − → v 0 (T (s))) and of fixed length K ∈ N >0 . For any j ∈ {0, 1}, these paths are In the above expression, l j C (p) provides a measure of the length of the path p starting from s. For any path p = (s 1 = s, . . . , s K ) ∈ P K , l j C (p) is defined by where β ∈ R >0 is a free parameter and measures the angle between the vectors − → u and − → v and discriminates whether the scalar product between them is positive or not. The first term of Eq. (5) encourages the image intensities of any site s i to be similar to s while the second term aims at aligning the path with − → v 0 (T s ). This allows for ensuring that two paths in opposite directions are selected to establish the neighborhood V (s) of s. Finally, the neighborhood V (s) of the site s can be now constructed as follows: V (s) = (p * 0 ∪ p * 1 ) \ {s}. In the next section, the segmentation model using anisotropic and isotropic neighborhoods is detailed. The data fidelity term E 1 (u) in the functional E(u, V ) (see Eq. (1)) is the energy term derived from the likelihood P (I | u). At the pixel level, popular models rely on statistical assumptions, especially by assuming site conditional independence. Relying on the same assumption but at the superpixel level, the probability P (I | u) is the product, over S, of probabilities P (I(s) | u s ), I(s) ∈ F being the observation and u s ∈ C the class of s. In this study, we adopt a color model assuming that image intensities are Gaussian-distributed for each class c ∈ C with mean value μ c ∈ F and standard deviation σ c ∈ R >0 [4] . Then, the data term E s 1 for any superpixel s ∈ S and any label u s ∈ C is written and, for any u ∈ C S , E 1 in Eq. (1) is: The energy term E 2 (u, V ) corresponds to the smoothness prior on the label field u and requires the definition of a neighborhood field V , as introduced in Sect. 2. Then, this neighborhood field being fixed, we assume u is an MRF so that a prior probability on u can be computed from 'elementary' energy terms. In this study, we adopt the Potts model [25] , weighted according to the strength of interaction between neighboring superpixels. The definition of E 2 (u, V ) is thus the following: For instance, in our implementation of the neighborhood of [21] , the weighting function W is defined for any pair (s, p) ∈ N as W (s, p) = ∂(s,p) ∂(s) ∈]0, 1], where ∂(s, p) and ∂(s) denote the common boundary between s and p and the perimeter of s, respectively. In the other neighborhood fields we compare, the cliques N can connect non adjacent superpixels. Thus, we propose to define for any pair (s, p) ∈ N the weighting function W as W (s, p) = ( V (s)) −1 ∈]0, 1]. The Potts model preserves its properties, in particular submodularity, and the data fidelity term E 1 is convex. Numerous works have proven the efficiency of graph cuts [3, 12] . According to [12] and [3] respectively, the energy function defined in Eq. (1) can be exactly minimized when C = 2 (this is the case in our experiments) and approximately minimized when C > 2. Finally, note that the estimation of the neighborhood field V itself requires a segmentation u. In this study, we use the blind segmentation (i.e. α = 0). Let us introduce the data and experiments carried out within our application context, namely crack detection. We aim at segmenting a crack, which is a thin structure over a highly textured and noisy background, e.g. some asphalt road or concrete wall as in the cracktree dataset [27] . In this study, in addition to images from this dataset, we consider a simulated image with arbitrary shapes and textured noise as shown in Fig. 1 . Images intensities are normalized in [0, 1]. A variety of algorithms for generating superpixels exist and exhibit different properties [1, 22] . Besides, the requirement of providing a partition of the image into connected sets of pixels, the main desirable properties are the preservation of image boundaries, the control of the compactness of superpixels and their number, in addition to computational efficiency of the algorithms. In order to study the benefit of our approach also regardless the superpixel decomposition, we propose a "perfectly shaped" set of superpixels generated from the dilated ground truth. Then, for results derivation using actual superpixels, we require the following properties for superpixels: good compactness to be efficiently modeled by their centroid, regularity in size while at the same time allowing the grouping of crack pixels into thin superpixels. Given those prerequisites, we have considered Extended Topology Preserving Segmentation superpixels (ETPS) [26] after image smoothing with median filtering with a square window of size (7 × 7) . Concerning the construction of anisotropic neighborhoods, the parameters are fixed so that there are 6 neighbors per superpixel in average. For shape, this reduces to setting the ellipse's area to 7 times the mean area of a superpixel, while their flattening is set to 0.6. With cardinal, we set K = 4 and β = 5 × 10 −3 . Finally for target, β = 5 × 10 −3 × β R where β R ∈ R >0 is the radius of the shape-based neighborhood ellipsis. Notice, these parameters are related to the scale of the thin structures to detect with respect to the superpixel size, so that some priors can help setting them. Finally, we estimate the mean μ c of each class c ∈ C (here, C = 2) from the ground truth, and we set σ c = 1 for simplicity. To compensate the effect of variation of texture size due to perspective, we set the mean value of classes as an affine function of the vertical position of the superpixel. Future works can handle the parameter estimation in an unsupervised context. Because our ground truth can be composed of 1 pixel width objects, in order to distinguish between slight mislocation errors and non-detection of some parts of the cracks, we compute the F-measure (FM) at scale = 2, based on the number of true positives (TP), false positives (FP) and false negatives (FN), like in [2, 24] . In addition, the crack region and the non-crack area being highly unbalanced (in favor to the non-crack area), we use a high value of γ = 5 in FM to increase sensitivity to FN with respect to FP: The results are presented at pixel level in Fig. 1 and at superpixel level in Fig. 2 , respectively. For each image, that corresponds to a set of parameters including the type of superpixels and the type of neighborhood, we select the best result, according to the FM criterion, among the results obtained varying the parameters σ (the scale parameter of tensor voting) and α (the regularization parameter in Eq. (1)). The automatic estimation of these parameters along with the analysis of their impact on performance (FM criterion for instance) will be carried in future work in order to make the proposed method more effective. Fig. 1 . Evaluation performance against ground truths at pixel level for a crack image (top row) and a simulated one (bottom row). The three last columns are segmentations without regularization ("blind") and with regularization (isotropic with 4-connectivity or shape-based anisotropic neighborhoods with ellipses). For each image, in both regularized cases, the results achieving the largest FM with respect to tensor voting scale σ and regularization parameter α, are depicted. FM measurements are also provided in percents for γ = 5. At pixel level, Fig. 1 illustrates the clear improvement of the quality of the results with the use of anisotropic neighborhoods. In the first image of crack, anisotropic regularization allows for enhancing the continuity of the detected cracks, even if some small gaps still fragment it. In the simulated image, the improvement is significant with the correct segmentation of the six discontinuities in the cracks, without loss of precision on more complex shapes. However, at superpixel level (Fig. 2) , while exhibiting better blind results thanks to the averaging of information at pixel level, superpixel anisotropic neighborhoods seem to suffer in general from the fact that it is difficult to establish the right neighborhood V even with a correct estimation of its orientation (see last column). Our experiments reveal that even if we are far from "Optimal" neighborhood performances, path-based neighborhoods tend to outperform the shape-based ones. Unfortunately, the anisotropic approach benefits exhibited in Fig. 2 do not seem to improve the segmentation of the crack image in a so significant way: Path-based approach outperforms the other approaches when superpixels are perfectly shaped, but are still sensitive to the degradation of the quality of the superpixels. Evaluation performance against ground truths at superpixel level for a crack image and a simulated one. The segmentations achieving the largest FM with respect to parameters σ and α for γ = 5, are depicted. The last two columns correspond to the use of the "perfectly shaped" superpixel. The last column represents, for one site highlighted in blue, the sites that are in its neighborhood (in red), depending on the method used for constructing the latter one: Each row shows a different type of neighborhood, specified in header lines. The last row is a ground truth-based neighborhood for comparison purpose. (Color figure online) In this paper, we introduced three anisotropic neighborhoods, in order to make them able to fit the thin structures of the image and thus to improve segmentation results. They rely on the estimation of the orientations of such structures, based here on tensor voting that is efficient in estimating dense map of orientations from a sparse field of labeled sites in the blind segmentation. We then perform the minimization of our energy functional via graph cuts. We tested our results with a simulated image and an actual difficult crack image, to validate the improvements brought by anisotropic regularization. While our results exhibit a high gain of performances at pixel level, superpixel segmentation suffers from the challenging task to estimate neighborhood at superpixel level, that seems to weaken the benefits of anisotropic regularization. Finally, we plan to investigate the possible refinement of the neighborhood field estimation after computing the regularized segmentation to introduce an alternative minimization procedure, and to explore extensions of our approach with thin structures in shape from focus in 3D-space [20] for future works. Superpixels and polygons using simple non-iterative clustering Robust crack detection for unmanned aerial vehicles inspection in an a-contrario decision framework Fast approximate energy minimization via graph cuts Active contours without edges N-dimensional tensor voting and application to epipolar geometry estimation Superpixel-based extended random walker for hyperspectral image classification Recovering thin structures via nonlocal-means regularization with application to depth from defocus Multiscale vessel enhancement filtering Class segmentation and object localization with superpixel neighborhoods Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images SuperPatchMatch: an algorithm for robust correspondences using superpixel patches What energy functions can be minimized via graph cuts? Ant colony optimization for image regularization based on a nonstationary Markov modeling Multilayer joint segmentation using MRF and graph cuts Convex formulation for multiband image classification with superpixel-based spatial regularization Tensor voting: theory and applications A variational model for thin structure segmentation based on a directional regularization Curvilinear structure analysis by ranking the orientation responses of path operators Variational method combined with Frangi vesselness for tubular object segmentation Efficient graph cut optimization for shape from focus Region merging via graph-cuts Superpixels: an evaluation of the state-of-theart Relative reductive structureaware regression filter Crack detection based on a Marked Point Process model The Potts model Real-time coarse-to-fine topologically preserving segmentation CrackTree: automatic crack detection from pavement images