Variational Coupling Revisited: Simpler Models, Theoretical Connections, and Novel Applications
Aaron Wewior and Joachim Weickert
2019-12-12

Abstract. Variational models with coupling terms are becoming increasingly popular in image analysis. They involve auxiliary variables, such that their energy minimisation splits into multiple fractional steps that can be solved more easily and more efficiently. In our paper we show that coupling models offer a number of interesting properties that go far beyond their obvious numerical benefits. We demonstrate that discontinuity-preserving denoising can be achieved even with quadratic data and smoothness terms, provided that the coupling term involves the $L^1$ norm. We show that such an $L^1$ coupling term provides additional information as a powerful edge detector that has remained unexplored so far. While coupling models in the literature approximate higher order regularisation, we argue that already first order coupling models can be useful. As a specific example, we present a first order coupling model that outperforms classical TV regularisation. It also establishes a theoretical connection between TV regularisation and the Mumford-Shah segmentation approach. Unlike other Mumford-Shah algorithms, it is a strictly convex approximation, for which we can guarantee convergence of a split Bregman algorithm.

1. Introduction. In image analysis applications, it is desirable to use variational models with regularisers that involve higher order derivatives: These higher order models offer better smoothness and avoid e.g. the staircasing artifacts that are present in various first order variational approaches. Unfortunately, higher order variational models are numerically unpleasant. For instance, minimising a second order energy gives rise to a fourth order partial differential equation (PDE) as Euler-Lagrange equation. Higher order PDEs are more cumbersome to discretise, and their numerical approximations suffer from higher condition numbers, such that algorithms converge more slowly or even become unstable. As a remedy, coupling models have been proposed. They introduce coupling terms with auxiliary variables.
For instance, one approximates the gradient of the processed image by an auxiliary variable. In this way, a second order energy in a single variable is replaced by a first order energy in two variables. As Euler-Lagrange equations, one solves a system of second order PDEs, which is numerically more pleasant than solving a fourth order PDE.

Coupling models have a long history. Already in 1990, Horn came up with a coupling model for shape from shading [19]. In the case of denoising and deblurring, Chambolle and Lions [13] have investigated coupling models by combining several convex functionals into one regulariser. With this they were able to modify the total variation regularisation model [24] and reduce unpleasant staircasing artifacts. A transfer of this infimal convolution approach to the discrete setting was proposed by Setzer et al. [27]. Bredies et al. [7] also pursued this direction and introduced a total generalised variation regulariser based on symmetric tensors of varying order, which allows a selective regularisation with different smoothness constraints. More recent coupling models have become highly sophisticated. Hewer et al. [18] introduced three coupling variables to come up with a coupling model that replaces fourth order energy minimisation. Hafner et al. [17] considered coupling models that are anisotropic both in the coupling and the smoothness term, and Brinkmann et al. [8] used coupling models in connection with regularisers based on natural vector field operations, including divergence and curl terms. Nevertheless, in spite of their popularity and sophistication, the main motivation behind all these coupling approaches was numerical efficiency. Additional benefits of the coupling formulation have hardly been investigated so far.

Our Contributions. The goal of our paper is to show that the coupling formulation is much more than just a numerical acceleration strategy. We will see that it provides valuable benefits from three different perspectives that have remained unexplored so far:
• From a modelling perspective, we argue that coupling terms may free the user from designing nonquadratic smoothness terms if discontinuity-preserving solutions are needed. Moreover, we show that it can be worthwhile to replace even first order energies by coupling models.
• From a theoretical perspective, we demonstrate that coupling models may give insights into the connections between two of the most widely used variational approaches in image analysis: the total variation (TV) regularisation of Rudin et al. [24] and the Mumford-Shah functional [22] for image segmentation. These insights also give rise to a new algorithm for Mumford-Shah segmentation that differs from existing ones in that it minimises a strictly convex energy. Thus, we can guarantee convergence of a split Bregman algorithm.
• From an application perspective, we show that sparsity-promoting coupling terms may provide useful information as an edge detector that comes at no additional cost.

Paper Structure. Our paper is organised as follows: In Section 2, we derive the framework for first and second order models that incorporate the coupling term. We show the relation to the Mumford-Shah functional and elaborate why the coupling term can be interpreted as an edge detector. Section 3 deals with the numerical aspects that lead to the minimisation of our approach in the discrete setting and showcases the advantages of the split Bregman formulation.
In Section 4, we present experimental results, and we conclude the paper in Section 5.

2. Our Models. In this section we propose two coupling models for edge-preserving image enhancement. We relate them to total variation regularisation and the segmentation approach of Mumford and Shah, and we argue that the $L^1$ coupling term provides a novel edge detector.

Let $f : \Omega \to \mathbb{R}$ denote a greyscale image that contains noise with zero mean. Our goal is to find a noise-free version of that image. As this is in general an ill-posed problem, we employ variational approaches to find a reconstruction $u$ that is as close as possible to the noise-free image and at the same time satisfies certain a priori smoothness conditions. One of the earliest models, by Whittaker [29] and Tikhonov [28], obtains $u$ as minimiser of the energy functional

(1) $E(u) = \int_\Omega \left( (u-f)^2 + \alpha\, \|\nabla u\|^2 \right) dx,$

where $\|\cdot\|$ denotes the Euclidean norm. The first term guarantees that $u$ is close to the given data $f$ and is called the data term. The second term penalises deviations from a smooth solution and is therefore denoted as the smoothness term. The regularisation parameter $\alpha > 0$ steers the smoothness of the solution. In practice we deal with a rectangular, finite domain $\Omega$. This prompts us to choose homogeneous Neumann boundary conditions. These coincide with the natural boundary conditions that arise when we derive the Euler-Lagrange equations of (1).

We modify (1) into a constrained problem by introducing an auxiliary vector field $v \in L^2(\Omega, \mathbb{R}^2)$. We set $v = \nabla u$ and obtain

(2) $\min_{u,v} \int_\Omega \left( (u-f)^2 + \alpha\, \|v\|^2 \right) dx \quad \text{subject to} \quad v = \nabla u.$

Enforcing the constraint weakly in the Euclidean norm yields the coupling model

(3) $E(u,v) = \int_\Omega \left( (u-f)^2 + \alpha\, \|v\|^2 + \beta\, \|\nabla u - v\|^2 \right) dx$

with parameter $\beta > 0$. Since the newly introduced term inherently couples $\nabla u$ and $v$, we call it the coupling term. It is obvious that, apart from trivial cases, the vector field $v$ is not equal to the gradient of the function $u$. However, by following the proof in [4], one can show that

(4) $\int_\Omega v \, dx = \int_\Omega \nabla u \, dx,$

which means that they are "equal in the mean". The vector field $v$ inherits its boundary conditions from $u$. We will discuss the details in the next section.

At first sight the coupling model does not offer any advantage over the energy (1) and even seems to complicate it. Indeed, by calculating the Euler-Lagrange equations and performing a simple substitution, we see that the minimiser for $u$ of the functional in (3) is equivalent to the minimiser of energy (1) with parameter $\frac{\alpha\beta}{\alpha+\beta}$. For the limit case $\beta \to \infty$, which corresponds to the hard constraint $\nabla u = v$, we thus obtain the same solution. In the other limit case $\alpha \to \infty$, the energy (3) corresponds to (1) with regularisation parameter $\beta$.
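This reduction can be verified directly by eliminating $v$: since (3) is quadratic in $v$, the pointwise minimisation over $v$ is exact. The following short calculation is our own addition and not part of the original text. Setting the derivative with respect to $v$ to zero gives

$$0 = 2\alpha\, v - 2\beta\, (\nabla u - v) \quad \Longrightarrow \quad v = \frac{\beta}{\alpha+\beta}\, \nabla u,$$

and substituting this minimiser back yields

$$\alpha\, \|v\|^2 + \beta\, \|\nabla u - v\|^2 = \left( \frac{\alpha\beta^2}{(\alpha+\beta)^2} + \frac{\beta\alpha^2}{(\alpha+\beta)^2} \right) \|\nabla u\|^2 = \frac{\alpha\beta}{\alpha+\beta}\, \|\nabla u\|^2,$$

such that (3) indeed collapses to (1) with regularisation parameter $\frac{\alpha\beta}{\alpha+\beta}$. Both limit cases follow immediately from this expression.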
In order to obtain our first order coupling model, we only apply a small modification to the coupling term in (3). We recall that a function $u \in L^1(\Omega)$ is said to be of bounded variation ($u \in BV(\Omega)$) if it has a (distributional) gradient in form of a Radon measure of finite total mass. Then we can define the total variation seminorm of $u$ as [2]

(5) $TV(u) = \sup \left\{ \int_\Omega u \, \mathrm{div}\, \varphi \; dx \;\middle|\; \varphi \in C_c^1(\Omega, \mathbb{R}^2), \ \|\varphi\|_\infty \le 1 \right\}.$

If we consider the difference $\nabla u - v$ as an argument, we extend (5) and obtain

(6) $TV(\nabla u - v) = \sup \left\{ \int_\Omega \left( u \, \mathrm{div}\, \varphi + v^\top \varphi \right) dx \;\middle|\; \varphi \in C_c^1(\Omega, \mathbb{R}^2), \ \|\varphi\|_\infty \le 1 \right\}.$

With this we formulate our first order coupling model:

(7) $E(u,v) = \int_\Omega \left( (u-f)^2 + \alpha\, \|v\|^2 \right) dx + \beta\, TV(\nabla u - v).$

It is obvious that the model (7) is strictly convex due to the quadratic data term. Moreover, we also have a quadratic smoothness term, and only the coupling term contains a more sophisticated penaliser. It is therefore no surprise that for the hard constraint $\beta \to \infty$, the energies (7) and (3) are still identical. The more interesting case arises in the limit $\alpha \to \infty$: In order to fulfil the constraint, the vector field $v$ should be zero, which yields the regulariser (5) for the coupling term. Energy (7) is then equivalent to the classical TV denoising model by Rudin et al. [24]. However, we will later demonstrate how the structure of the coupling model is beneficial both for an efficient minimisation and for superior denoising results. We note that it can be shown that model (7) is equivalent to denoising with a Huber TV penaliser [10].
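The Huber equivalence can be made plausible by a short pointwise computation; the following sketch is our own addition, not the proof from [10]. Since $v$ enters (7) only pointwise (the smoothness term penalises the magnitude of $v$, not its derivatives), it can be eliminated exactly. For fixed $w := \nabla u(x)$, the optimal $v(x)$ is parallel to $w$ with norm $\min\left(\|w\|, \frac{\beta}{2\alpha}\right)$, which yields

$$\min_{v} \left( \alpha\, \|v\|^2 + \beta\, \|w - v\| \right) = \begin{cases} \alpha\, \|w\|^2 & \text{if } \|w\| \le \frac{\beta}{2\alpha}, \\ \beta\, \|w\| - \frac{\beta^2}{4\alpha} & \text{otherwise}. \end{cases}$$

This is exactly a Huber penaliser: quadratic for small gradient magnitudes and linear, i.e. TV-like, beyond the threshold $\frac{\beta}{2\alpha}$. Note that both branches and their derivatives agree at $\|w\| = \frac{\beta}{2\alpha}$.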
While TV regularisation promises a good compromise between image reconstruction and edge preservation, it introduces "blocky" structures into the reconstructed images, the so-called staircasing artifacts. One way to avoid these artifacts is to use second order models [25, 30, 21]. Due to the coupling formulation, a higher regularisation order can easily be realised in our framework: Instead of the vector field $v$ itself, we penalise its Jacobian in the Frobenius norm. The resulting energy is given by

(8) $E(u,v) = \int_\Omega \left( (u-f)^2 + \alpha\, \|\nabla v\|_F^2 \right) dx + \beta\, TV(\nabla u - v).$

In order to see that (8) approximates a second order model, we discuss the limiting case $\beta \to \infty$ again. The smoothness term favours a constant vector field $v$. For a hard coupling this means that constant first derivatives of $u$ lead to the smallest energy. Hence, the solution $u$ corresponds to an affine function, which is characteristic for second order regularisation methods. The model resembles the total generalized variation (TGV) of second order [7]:

(9) $\mathrm{TGV}^2(u) = \min_{v} \left( \beta \int_\Omega \|\nabla u - v\| \, dx + \alpha \int_\Omega \|\mathcal{E} v\|_F \, dx \right),$

where $\mathcal{E}$ denotes the symmetrised gradient, which appears in a TV-like norm in the smoothness term. However, the smoothness term in our first and second order models is still quadratic. In Section 3 we will see that this allows very efficient algorithms.

Relation to the Mumford-Shah Functional. Let $\Omega$ be defined as before and let $K$ be a compact curve in $\Omega$. The segmentation model by Mumford and Shah minimises the following energy [22]:

(10) $E(u, K) = \int_\Omega (u-f)^2 \, dx + \alpha \int_{\Omega \setminus K} \|\nabla u\|^2 \, dx + \beta\, \ell(K).$

The function $u : \Omega \to \mathbb{R}$ approximates $f$ as a sufficiently smooth function in $\Omega \setminus K$, but admits discontinuities across $K$. The curve $K$ denotes the edge set between the segments. The expression $\ell(K)$ represents the length of $K$ and can be expressed as $\mathcal{H}^1(K)$, the one-dimensional Hausdorff measure in $\mathbb{R}^2$. The parameter $\beta > 0$ controls the length of the edge set, whereas $\alpha > 0$ steers the smoothness of $u$ in $\Omega \setminus K$. Unfortunately, the Mumford-Shah energy is highly non-convex and difficult to minimise. Ambrosio and Tortorelli [3] showed that the solution can be approximated by a sequence of regular functions defined on Sobolev spaces and proved $\Gamma$-convergence. However, their energy is still non-convex, and the quality of the solution depends on the initial parameter setting of the $\varepsilon$-approximation. Cai et al. [11] proposed a variant that approximates the edge length by the total variation seminorm. They were able to approximate (10) by a convex minimisation problem over the whole domain:

(11) $E(u) = \int_\Omega \left( (u-f)^2 + \alpha\, \|\nabla u\|^2 \right) dx + \beta\, TV(u).$

Nevertheless, the price for convexity is a smooth minimiser $u \in W^{1,2}(\Omega)$ that cannot contain any discontinuities, such that a postprocessing step is necessary for a segmentation. In order to relate equation (11) to our first order model (7), we perform the image decomposition $u = u_1 + u_2$ such that the regularisers affect different parts of the image. This kind of approach, based on an infimal convolution, was introduced by Chambolle and Lions in [13]. With this we obtain

(12) $E(u_1, u_2) = \int_\Omega \left( (u_1 + u_2 - f)^2 + \alpha\, \|\nabla u_1\|^2 \right) dx + \beta\, TV(u_2).$

The decomposition lets us define the parts on different spaces. While $u_1 \in W^{1,2}(\Omega)$, we have $u_2 \in BV(\Omega)$. Thus, we admit jumps in $u$, which makes a postprocessing unnecessary. By introducing an auxiliary vector field $v \in L^2(\Omega, \mathbb{R}^2)$ which is designed to approximate $\nabla u_1$, we consequently obtain for the argument of the last term:

(13) $\nabla u_2 = \nabla u - \nabla u_1 \approx \nabla u - v.$

Substituting this in (12) brings us back to the coupling model (7). Therefore, we observe that our first order coupling model approximates the Mumford-Shah segmentation model. It is obvious that this convex approximation is smoother than other variants, as we do not perform an enhancement at edges. However, we still allow jumps in the regularised image. Moreover, we stress that the approximation is strictly convex and admits a unique solution.

The reasoning for the first order model can easily be transferred to the second order model. Energy (8) can thus be understood as an approximation to a second order Mumford-Shah model. One example of a second order approximation is the model of Blake and Zisserman [5]:

(14) $E(u, K_0, K_1) = \int_\Omega (u-f)^2 \, dx + \alpha \int_{\Omega \setminus (K_0 \cup K_1)} \|\nabla^2 u\|^2 \, dx + \beta\, \mathcal{H}^1(K_0) + \gamma\, \mathcal{H}^1(K_1 \setminus K_0).$

Here, $\nabla^2$ denotes the Hessian, and the Hausdorff measures on the curves $K_0$ and $K_1$ penalise not only the edge length between segments as before, but also the length of the curve that describes creases in the image. More recent variants were discussed in [9, 15]. They all have in common that they require the approach of Ambrosio and Tortorelli to derive a minimiser. Our second order model provides an alternative that manages without the $\varepsilon$-approximation.

Interpretation of the Coupling Term as an Edge Detector. In the previous subsection we established a connection between our coupling framework and the Mumford-Shah functional. The difficulty in the original formulation lies in the computation of the edge set $K$. In the following we argue how the edge set is related to the introduced coupling term, which is directly accessible without additional effort. We observe that the smoothness terms in the energies (7) and (8) are quadratic. That means that, on the one hand, $v$ is still a smooth approximation to $\nabla u$. On the other hand, the coupling term contains the sparsity-promoting $L^1$ norm. As a result, $v$ deviates from $\nabla u$ only at a small number of positions, which are represented as peaks in the energy of the coupling term. Since these peaks correspond to discontinuous jumps in the solution $u$, we see that the coupling term corresponds to the desired edge set and can naturally be utilised as an edge detector.

We distinguish two limiting cases. For $\beta \to \infty$ we end up with the hard constraint $\nabla u = v$, such that the coupling energy (6) evaluates to 0. This makes sense, as the model simplifies to energy (1) again, and $u$ will be a smooth solution without jumps. For $\beta \to 0$, the functions $u$ and $v$ are decoupled. A minimiser is reached when $u$ corresponds to the given image $f$ and $v = 0$. Here the coupling term represents a simple gradient-based edge detector without any presmoothing. Such a detector is not reliable in the presence of high frequency oscillations and noise, as it responds to many false edges. In between these limiting cases, an increasing value of $\beta$ allows the coupling term to partition the image into more and more homogeneous regions, since this gives a smaller energy. Thus, on each scale $\beta$, we solve a convex variational model and obtain a unique solution. The coupling term serves as a novel edge detector that provides us with a globally optimal edge set. The discontinuity-preserving $L^1$ norm in the underlying model allows the coupling term to respond at edges, but makes it robust against false edges produced by noise in the image. As it also promotes sparsity, it limits the detector's response to the most relevant edges, which are well localised.
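The discontinuity-preserving effect of the $L^1$ coupling term can be quantified by a heuristic one-dimensional calculation (our own addition). Suppose $u$ approximates a step of height $s$ at $x_0$ by a linear ramp of width $w$, and let $v$ remain bounded there. A quadratic smoothness penalty on $\nabla u$ would charge the ramp

$$\alpha \int_{x_0}^{x_0+w} \left( \frac{s}{w} \right)^2 dx = \frac{\alpha\, s^2}{w} \;\xrightarrow{\;w \to 0\;}\; \infty,$$

so a purely quadratic model must blur the edge. The $L^1$ coupling term, however, charges the same ramp only

$$\beta \int_{x_0}^{x_0+w} \left| \frac{s}{w} - v \right| dx \;\approx\; \beta\, |s| \quad \text{for small } w,$$

a finite price that is independent of the edge sharpness. Hence jumps survive in $u$, and the positions where $\|\nabla u - v\|$ concentrates mark the edge set.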
3. Numerical Aspects. In this section we explain why the discrete variables in our models live on different grids in order to obtain consistent approximations. We speed up a conventional alternating minimisation by a split Bregman scheme [16] that offers a tremendous acceleration of the minimisation process. We discuss the scalar-valued problem and then explain the modification needed to solve the problem for vector-valued data.

3.1. Discretisation of the Variables. In order to implement the model on a computer, we introduce a discrete setting based on a finite difference discretisation. We consider the domain $\Omega$ to be rectangular, i.e. $\Omega = (a,b) \times (c,d)$. Then the discrete domain $\Omega_d$ is given on a regular Cartesian grid with $M \times N$ nodes:

(15) $\Omega_d = \{ 1, \dots, M \} \times \{ 1, \dots, N \}.$

The discrete locations $(i,j)$ within $\Omega_d$ correspond w.l.o.g. to the continuous locations $\left( a + \left( i - \frac{1}{2} \right) h, \; c + \left( j - \frac{1}{2} \right) h \right)$ within $\Omega$, where $h$ denotes the grid size. Discretising the scalar-valued functions $u$ and $f$ at the positions $(i,j)$ yields variables $u, f \in \mathbb{R}^{M \times N}$. We approximate the derivatives with a finite difference scheme. As we require the consistency order of our approximation to be at least two, we use central differences around a grid shifted by $\frac{h}{2}$. In order to be consistent with this approximation, we consequently need the discretisation of $v = (v_1, v_2)^\top$ to live on dual grids shifted by exactly this half grid size [20]. More precisely, entries $(v_1)_{i,j}$ are defined at positions $(i + \frac{1}{2}, j)$, whereas entries $(v_2)_{i,j}$ are defined at positions $(i, j + \frac{1}{2})$, as they approximate the partial derivatives in different directions. This also extends to the boundary conditions. As already mentioned, $v$ inherits its boundary conditions from the homogeneous Neumann boundary conditions of $u$. This means that the components of $v$ have zero Dirichlet boundary conditions in the direction of the shifted grid and homogeneous Neumann boundary conditions otherwise.

3.2. Discretisation of the Energies. For sufficiently smooth data $u$, we can define a regularised penaliser of (5) by

(16) $\int_\Omega \varphi_\varepsilon(\nabla u) \, dx = \int_\Omega \sqrt{ \|\nabla u\|^2 + \varepsilon^2 } \; dx$

with small $\varepsilon \ge 0$ [1]. For the case $\varepsilon = 0$, it reduces to (5), but otherwise it offers the advantage of being differentiable when $\nabla u = 0$. We consider this regularisation for our discrete implementations. The discretisations of our first order model (7) and second order model (8) subsume to the minimisation problem

(17) $\min_{u, v} \; \|u - f\|_2^2 + \alpha\, S_\ell(v) + \beta\, \Phi_\varepsilon(\nabla_u u - v),$

where $\|\cdot\|_2$ denotes the discrete Euclidean norm. The coupling term contains the discrete counterpart of the regularised penalising function (16) in form of

(18) $\Phi_\varepsilon(w) = \sum_{i,j} \varphi_\varepsilon(w_{i,j}),$

which evaluates the regulariser (16) pixelwise with

(19) $\varphi_\varepsilon(w_{i,j}) = \sqrt{ (w_1)_{i,j}^2 + (w_2)_{i,j}^2 + \varepsilon^2 }.$

The approximation $\nabla_u u = (\partial_{x+} u, \partial_{y+} u)^\top$ of the discrete gradient components lives on a grid shifted by half a grid size in the corresponding positive coordinate direction and incorporates the homogeneous Neumann boundary conditions. In the function $S_\ell(v)$, the parameter $\ell$ defines the regularisation order of the problem. For $\ell = 1$, we have $S_1(v) = \|v\|_2^2$, and for $\ell = 2$ we use $S_2(v)$ given by

(20) $S_2(v) = \|\partial_{x-} v_1\|_2^2 + \|\partial_{y+} v_1\|_2^2 + \|\partial_{x+} v_2\|_2^2 + \|\partial_{y-} v_2\|_2^2.$

We employ the same discretisation for $\partial_{y+} v_1$ and $\partial_{x+} v_2$ as above, whereas we introduce the approximations $\partial_{x-} v_1$ and $\partial_{y-} v_2$ as central differences on a grid which is shifted in the respective negative coordinate direction, with Dirichlet boundary conditions.
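To make this staggered-grid discretisation concrete, the following NumPy sketch is our own illustration (not the authors' code); it assumes grid size h = 1, and the names grad and div are our own. It implements the forward-difference gradient with homogeneous Neumann boundary conditions and a discrete divergence chosen as its negative adjoint:

```python
import numpy as np

def grad(u):
    """Forward differences: (v1)[i,j] lives at (i+1/2, j), (v2)[i,j] at (i, j+1/2).
    The homogeneous Neumann condition on u makes the last entry in the shifted
    direction zero -- exactly the Dirichlet condition on v from Section 3.1."""
    v1 = np.zeros_like(u)
    v2 = np.zeros_like(u)
    v1[:-1, :] = u[1:, :] - u[:-1, :]
    v2[:, :-1] = u[:, 1:] - u[:, :-1]
    return v1, v2

def div(v1, v2):
    """Discrete divergence, chosen such that <grad(u), v> = -<u, div(v1, v2)>
    for fields v that respect the zero Dirichlet condition of the dual grid."""
    d1 = np.empty_like(v1)
    d1[0, :] = v1[0, :]
    d1[1:, :] = v1[1:, :] - v1[:-1, :]
    d2 = np.empty_like(v2)
    d2[:, 0] = v2[:, 0]
    d2[:, 1:] = v2[:, 1:] - v2[:, :-1]
    return d1 + d2

# Adjointness check on random data:
u = np.random.rand(6, 5)
w1, w2 = np.random.rand(6, 5), np.random.rand(6, 5)
w1[-1, :] = 0.0   # zero Dirichlet condition in the shifted direction
w2[:, -1] = 0.0
g1, g2 = grad(u)
assert np.isclose(np.sum(g1 * w1 + g2 * w2), -np.sum(u * div(w1, w2)))
```

The adjointness of the two operators is the discrete analogue of integration by parts with the natural boundary conditions, and it only holds on the subspace of dual-grid fields with vanishing last entry, which is precisely the inherited boundary condition described above.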
If we solve problem (17) in a naive way, we set the derivatives with respect to the components $u$ and $v$ to zero and perform an alternating iterative minimisation, where the nonlinearity is always evaluated at the old iterate. The minimisation problem for $u$ leads to

(21) $0 = 2\, (u - f) - \beta\, \mathrm{div}_u \big( g \odot (\nabla_u u - v) \big)$

with the diffusivity $g := 1 / \varphi_\varepsilon(\nabla_u u - v)$, evaluated pixelwise at the old iterate, and a discrete divergence operator $\mathrm{div}_u$. For $\ell = 1$, the auxiliary vector field $v$ requires us to solve the equations

(22) $0 = 2\alpha\, v - \beta\, g \odot (\nabla_u u - v),$

whereas for $\ell = 2$ we obtain a system of equations of the form

(23) $0 = 2\alpha\, \Delta_v v + \beta\, g \odot (\nabla_u u - v).$

Here, $\Delta_v$ denotes a direct sum of discrete Laplace operators that are applied to each component of $v$ separately.

3.3. The Split Bregman Method. Unfortunately, the solution of (21) is a bottleneck in the alternating minimisation process. The derivatives of the regularised penalisation that serve as a diffusivity slow down the convergence speed drastically when we apply iterative solvers. As a remedy for this problem, we employ the split Bregman method by Goldstein and Osher [16]. This operator splitting approach is equivalent to the alternating direction method of multipliers (ADMM) and the Douglas-Rachford splitting algorithm, which are suitable for convex optimisation problems such as (17) and whose convergence is guaranteed [6, 26]. We introduce the discrete vector field $p$ that lives on the same grid as $v$. We set $p := \nabla_u u - v$ and obtain the constrained problem

(24) $\min_{u, v, p} \; \|u - f\|_2^2 + \alpha\, S_\ell(v) + \beta\, \Phi_\varepsilon(p) \quad \text{subject to} \quad p = \nabla_u u - v.$

Enforcing the constraint strictly with a numerical parameter $\lambda > 0$ yields the split Bregman iteration at step $k+1$:

(25) $(u^{k+1}, v^{k+1}, p^{k+1}) = \operatorname*{argmin}_{u,v,p} \; \|u - f\|_2^2 + \alpha\, S_\ell(v) + \beta\, \Phi_\varepsilon(p) + \frac{\lambda}{2}\, \big\| p - (\nabla_u u - v) - b^k \big\|_2^2, \qquad b^{k+1} = b^k + \nabla_u u^{k+1} - v^{k+1} - p^{k+1}.$

Even though the split Bregman method introduces an additional minimisation problem, the previous nonlinear equations are simplified substantially due to the simple quadratic structure of the data and smoothness terms. The subproblems for $u$ and $v$ are then by design linear and yield the optimality conditions

(26) $0 = 2\, (u - f) - \lambda\, \mathrm{div}_u \big( \nabla_u u - v - p + b^k \big),$

(27) $0 = 2\alpha\, \Delta_v v + \lambda\, \big( \nabla_u u + b^k - p - v \big) \qquad (\ell = 2),$

(28) $(2\alpha + \lambda)\, v = \lambda\, \big( \nabla_u u + b^k - p \big) \qquad (\ell = 1).$

Equation (28) leads to a simple pointwise computation. For the other two subproblems we may only apply a few steps of an iterative solver. This is in accordance with Goldstein and Osher, who state that a complete convergence of the subproblems is not necessary for the convergence of the whole algorithm [16]. The last subproblem, for $p$, contains the nonlinearity, which now only occurs pointwise and not as a diffusivity in a divergence term:

(29) $\min_p \; \beta\, \Phi_\varepsilon(p) + \frac{\lambda}{2}\, \|p - q\|_2^2 \quad \text{with} \quad q := \nabla_u u^{k+1} - v^{k+1} + b^k.$

For $\varepsilon = 0$ we obtain the update rule of a pixelwise vectorial soft shrinkage,

(30) $p_{i,j} = \max\left( \|q_{i,j}\|_2 - \frac{\beta}{\lambda},\; 0 \right) \frac{q_{i,j}}{\|q_{i,j}\|_2}.$

We terminate the algorithm if the residual of equations (21)-(23), in relation to an initial residual, gets smaller than a prescribed threshold. If we are also interested in the resulting edge set, we can obtain it without additional effort by evaluating $\|\nabla_u u - v\|_2^2$ pixelwise.

3.4. Vector-valued Extension. An extension to vector-valued images is straightforward. The Euclidean norm in the data and smoothness terms allows a separate calculation for each channel, which opens up the possibility of a parallel implementation. The only coupling between the channels occurs in (19). Therefore, only the nonlinearities in the pointwise update rule (30) are affected.
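For the first order model ($\ell = 1$), the complete algorithm consists of remarkably simple building blocks. The following NumPy sketch is our own minimal illustration under stated assumptions, not the authors' implementation: it assumes grid size h = 1 and $\varepsilon = 0$ (so that the p-update is the exact shrinkage (30)), reuses grad and div from the previous sketch, and the iteration counts are arbitrary choices:

```python
import numpy as np

def grad(u):
    v1 = np.zeros_like(u); v1[:-1, :] = u[1:, :] - u[:-1, :]
    v2 = np.zeros_like(u); v2[:, :-1] = u[:, 1:] - u[:, :-1]
    return v1, v2

def div(v1, v2):
    d1 = np.vstack([v1[:1], v1[1:] - v1[:-1]])
    d2 = np.hstack([v2[:, :1], v2[:, 1:] - v2[:, :-1]])
    return d1 + d2

def shrink(q1, q2, t):
    # Pointwise vectorial soft shrinkage (30): minimiser of t*||p|| + 0.5*||p - q||^2.
    n = np.sqrt(q1**2 + q2**2)
    s = np.maximum(n - t, 0.0) / np.maximum(n, 1e-12)
    return s * q1, s * q2

def coupling_denoise(f, alpha, beta, lam, iters=200, jacobi=10):
    """Split Bregman sketch for the first order coupling model (l = 1).
    Returns the denoised image u and the edge indicator ||grad(u) - v||."""
    u = f.copy()
    v1 = np.zeros_like(f); v2 = np.zeros_like(f)
    p1 = np.zeros_like(f); p2 = np.zeros_like(f)
    b1 = np.zeros_like(f); b2 = np.zeros_like(f)
    for _ in range(iters):
        # u-subproblem (26): (2I - lam*Laplacian) u = 2f - lam*div(v + p - b),
        # approximated by a few Jacobi sweeps (interior stencil diagonal 2 + 4*lam).
        rhs = 2.0 * f - lam * div(v1 + p1 - b1, v2 + p2 - b2)
        for _ in range(jacobi):
            g1, g2 = grad(u)
            u = (rhs + lam * (div(g1, g2) + 4.0 * u)) / (2.0 + 4.0 * lam)
        g1, g2 = grad(u)
        # v-subproblem (28): exact pointwise solve.
        v1 = lam * (g1 + b1 - p1) / (2.0 * alpha + lam)
        v2 = lam * (g2 + b2 - p2) / (2.0 * alpha + lam)
        # p-subproblem (30): vectorial shrinkage with threshold beta/lam.
        p1, p2 = shrink(g1 - v1 + b1, g2 - v2 + b2, beta / lam)
        # Bregman update from (25).
        b1 += g1 - v1 - p1
        b2 += g2 - v2 - p2
    g1, g2 = grad(u)
    return u, np.sqrt((g1 - v1)**2 + (g2 - v2)**2)
```

Note how the design of the splitting pays off here: every nonlinearity is confined to the pointwise shrink step, while the remaining subproblems are linear with constant coefficients.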
4. Experiments.

4.1. Denoising Experiment. In this section we discuss the application of our first and second order coupling models for denoising. We compare our models to TV denoising [24] and to the combined first and second order approach TV-TV² by Papafitsoros et al. [23]. We set the regularisation parameter of the penalising function to $\varepsilon = 10^{-6}$ and stop the algorithms if the relative residual gets smaller than $10^{-6}$. For the solution of the linear subproblems we perform an inner loop of 10 Jacobi steps per iteration. The numerical parameter $\lambda$ is important for the number of iterations until convergence is reached. Empirically we found that $\lambda$ should be chosen in dependence of $\beta$ and the regularisation order $\ell$. We fix $\lambda := \ell \cdot \beta$. We choose the parameters optimally for each method such that the mean squared error (MSE) w.r.t. the ground truth is minimised.

In Fig. 1 we consider an affine test image of size 256 × 256 that is corrupted by Gaussian noise with σ = 40. We first consider denoising by first order models. Our model is able to beat the classical TV model w.r.t. the MSE. We also see a qualitative improvement when we consider a horizontal cross section of the images in the last row: We reduce the staircasing artifacts significantly. We compare the reconstructions by second order models in Fig. 2. While the TV-TV² model beats our first order coupling model only slightly w.r.t. the MSE, our second order model presents a large improvement. In the 1D plot we see that no staircasing artifacts arise.

This experiment also illustrates the efficiency of the split Bregman implementation for solving our problems. We compare two implementations: On the one hand, we employ an alternating minimisation strategy for equations (21) and (22) resp. (23), realised with a standard conjugate gradient method. On the other hand, we apply a split Bregman implementation based on the subproblems that arise from equation (25). Figure 3 shows the decay of the relative residual as a function of the runtime. For both the first and the second order model, we gain a speed-up of more than one order of magnitude with the split Bregman scheme.

4.2. Segmentation. [Figure 4: our first order model applied to a test image from [8], together with the sets of detected edges.] To counteract the segment fusion steered by $\beta$, the parameter $\alpha$ should be chosen relatively large to benefit from the TV-like effect inside each segment. In Fig. 4 we show our first order model for varying values of $\beta$, together with the edge set that we obtain from the coupling term. For better visualisation we perform a hysteresis thresholding of the edge set [12]. Our method is able to produce smooth segments. For an increasing value of $\beta$, more and more segments are merged. For instance, we see that the smaller hole in the rock vanishes for a higher value. As already mentioned, the resulting images are rather smooth in each segment due to the quadratic smoothness term, but we still have sharp edges in between. For an increasing $\beta$ value we obtain a sparser edge set.

4.3. Edge Detection. [Figure 5: original image from [14]; results for α = 750, β = 50 and α = 750, β = 500.] The previous subsection shows that an edge detector can be obtained directly from the coupling term. In Fig. 5 we illustrate its high robustness under noise. We compare it to a Canny edge detector [12], which is a classical edge detection method. To compensate for the noise, the Canny edge detector presmoothes the image by a convolution with a Gaussian kernel. While most large scale structures are detected, edges of smaller details are dislocated or merged together. This can be seen e.g. at the branch below the bird. Our method works in an edge-preserving manner and is able to capture these small details.
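A possible post-processing pipeline for the coupling-term edge detector, assuming the coupling_denoise sketch from Section 3, might look as follows (our own illustration; the test image, noise level, and threshold fractions are hypothetical choices, and apply_hysteresis_threshold is the scikit-image routine corresponding to the hysteresis step of [12]):

```python
import numpy as np
from skimage import data, filters, util

# Test setup (our own): a noisy greyscale image in [0, 1].
f = util.img_as_float(data.camera())
f = f + 40.0 / 255.0 * np.random.randn(*f.shape)

# u and the edge indicator e from the coupling_denoise sketch above;
# lam = l * beta as suggested in Section 4.1, parameters are illustrative only.
u, e = coupling_denoise(f, alpha=750.0, beta=50.0, lam=50.0)

# Hysteresis thresholding of the coupling-term edge indicator:
# weak responses survive only if they are connected to strong ones.
edges = filters.apply_hysteresis_threshold(e, low=0.05 * e.max(), high=0.2 * e.max())
```

The edge map e comes at no extra cost, since it is just the pixelwise magnitude of the coupling term that the minimisation produces anyway.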
5. Conclusions. We have shown that coupling terms in variational models are useful for more than just easing the numerics. Already first order models that contain a coupling term in the sparsity-promoting $L^1$ norm are sufficient for edge-preserving denoising. For the remaining data and smoothness terms we can use simple quadratic $L^2$ norms. This does not only simplify the numerical minimisation scheme, it also challenges the common belief that edge-preserving denoising requires a nonquadratic smoothness term. The first order coupling model also exhibits a theoretical connection to the classical segmentation model of Mumford and Shah. As it works on a strictly convex energy, we obtain an approximation of the Mumford-Shah functional that has a unique solution and is guaranteed to converge. Furthermore, the introduced coupling term itself carries valuable information: It has the properties of a global edge detector that is robust against noise and comes at literally no cost as a byproduct of the minimisation.

In our ongoing research we investigate how the coupling formulation can be extended to applications in deblurring or inpainting, or how it can be combined with anisotropic regularisation. Moreover, we intend to exploit the simplicity of the quadratic terms by using parallel implementations to obtain more efficient solvers.

References
[1] Analysis of bounded variation penalty methods for ill-posed problems.
[2] Functions of Bounded Variation and Free Discontinuity Problems.
[3] Approximation of functionals depending on jumps by elliptic functionals via Γ-convergence.
[4] An alternative approach towards the higher order denoising of images: Analytical aspects.
[5] Visual Reconstruction.
[6] Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning.
[7] Total generalized variation.
[8] Unified models for second-order TV-type regularisation in imaging: a new perspective based on vector operators.
[9] Second-order edge-penalization in the Ambrosio-Tortorelli functional.
[10] Infimal convolution regularisation functionals of BV and L^p spaces. Part I: The finite p case.
[11] A two-stage image segmentation method using a convex variant of the Mumford-Shah model and thresholding.
[12] A computational approach to edge detection.
[13] Image recovery via total variation minimization and related problems.
[14] Berkeley Segmentation and Boundary Detection Benchmark and Dataset.
[15] Second order Mumford-Shah model for image denoising.
[16] The split Bregman method for L1-regularized problems.
[17] Introducing maximal anisotropy into second order coupling models.
[18] Lagrangian strain tensor computation with higher order variational models.
[19] Height and gradient from shading.
[20] Natural discretizations for the divergence, gradient, and curl on logically rectangular grids.
[21] Noise removal using fourth-order partial differential equations with applications to medical magnetic resonance images in space and time.
[22] Optimal approximation of piecewise smooth functions and associated variational problems.
[23] A combined first and second order variational approach for image reconstruction.
[24] Nonlinear total variation based noise removal algorithms.
[25] Denoising with higher order derivatives of bounded variation and an application to parameter estimation.
[26] Split Bregman algorithm, Douglas-Rachford splitting and frame shrinkage. In: Scale Space and Variational Methods in Computer Vision.
[27] Infimal convolution regularizations with discrete ℓ1-type functionals.
[28] Solution of incorrectly formulated problems and the regularization method.
[29] A new method of graduation.
[30] Fourth-order partial differential equations for noise removal.

Acknowledgements. This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement no. 741215, ERC Advanced Grant INCOVID).