Network Architecture Search for Face Enhancement
Rajeev Yasarla, Hamid Reza Vaezi Joze, Vishal M. Patel
* Work performed during an internship at Microsoft.

Various factors such as ambient lighting conditions, noise, motion blur, etc. affect the quality of captured face images. Poor-quality face images often reduce the performance of face analysis and recognition systems. Hence, it is important to enhance the quality of face images collected in such conditions. We present a multi-task face restoration network, called Network Architecture Search for Face Enhancement (NASFE), which can enhance poor-quality face images containing a single degradation (i.e., noise or blur) or multiple degradations (noise+blur+low-light). During training, NASFE uses clean face images of the person present in the degraded image to extract identity information, in the form of features, for restoring the image. Furthermore, the network is guided by an identity loss so that the identity information is maintained in the restored image. Additionally, we propose a network architecture search-based fusion network in NASFE which fuses the task-specific features extracted by the task-specific encoders. We introduce FFT-op and deveiling operators in the fusion network to efficiently fuse the task-specific features. Comprehensive experiments on synthetic and real images demonstrate that the proposed method outperforms many recent state-of-the-art face restoration and enhancement methods in terms of quantitative and visual performance.

In the era of COVID-19, the use of video communication tools such as Zoom, Skype, Webex, MS Teams, Google Meet, etc. has increased drastically. In many cases, images/videos captured by these video conferencing tools are of poor quality due to low-light ambient conditions, noise, motion artifacts, etc. Figure 1 shows an example of such an image taken during a video conference. Hence, it is important to enhance the quality of face images collected in such conditions. Furthermore, restoration of degraded face images is an important problem in many applications such as human-computer interaction (HCI), biometrics, authentication, and surveillance.

Figure 1: Sample results on a face with multiple degradations such as blur, noise, and low-light conditions, comparing the degraded image, Super-FAN [2], Shen et al. [66], UMSN [83], DeblurGANv2 [28], DFDNet [35], HiFaceGAN [80], and NASFE (ours). Restoration methods such as [48, 66, 2, 83, 35, 80] fail to reconstruct a high-quality clean face image. In contrast, the proposed NASFE network produces a high-quality face image.

Existing face image restoration and enhancement methods are designed to address only a single type of degradation, such as blur, noise, or low-light. However, in practice, face images might be collected in the presence of multiple degradations (i.e., noise + blur + low-light). In this paper, we address the problem of restoring a single face image degraded by multiple degradations (noise+blur+low-light). To the best of our knowledge, we are the first to address such a multi-task image restoration problem in which a single network is able to remove the effects of low-light conditions, noise, and blur simultaneously. Face images are structured and informative compared to natural images.
Methods such as [48, 66, 2, 83] extract structured information from faces in the form of semantic maps or exemplar masks in order to super-resolve or remove blur from degraded face images. Extracting facial semantic information in this way is extremely difficult when multiple degradations are present in the image, and as a result may lead to poor restoration performance. Recently, HiFaceGAN [80] addressed face restoration using a multi-stage semantic generation framework. The performance of this method relies heavily on the features extracted from the degraded faces for capturing semantic information; however, it is difficult to learn semantic structure from the noisy features corresponding to multiple degradations. Recently, DFDNet [35] proposed a dictionary-based method to produce a high-resolution face image from a low-resolution input. Note that super-resolution is a relatively easy task compared to restoring an image with multiple degradations, since semantic information cannot be easily extracted from such images. As shown in Figure 1, even when state-of-the-art face image restoration methods such as [48, 66, 2, 83, 35, 80] are retrained on multiple degradations, they fail to reconstruct a high-quality clean face image. Recently, Li et al. [36] proposed a blind face restoration method that utilizes multi-exemplar images and adaptive fusion of features from guidance and degraded images. An important point to note here is that computation of the guidance image also relies on extracting structured information (i.e., landmarks) from the degraded and clean face images. Since structured information extracted from an image with multiple degradations is not reliable, this way of computing a guidance image is not helpful for the proposed multi-task face image restoration problem. To address this problem, we propose to use clean face images of the same person present in the degraded image to restore it. These clean images, which may be taken in different scenarios at different times, help us extract identity information in the form of VGGFace features [51]. Since these images may have different styles due to contrast or illumination, we apply Adaptive Instance Normalization (AdaIN) [21] on the extracted VGGFace features to transform their style to that of the degraded face image. We also use these extracted features to define a novel identity loss, L_iden, for training the network. The proposed multi-task face restoration problem can be considered a many-to-one feature mapping problem, i.e., extracting task-specific features (denoising, deblurring, and low-light enhancement features) and fusing them to obtain features corresponding to the clean image. The fused features can then be used by a decoder to restore the face image. One can clearly see the importance of fusion in this framework. Rather than naively using Res2Blocks [15] or convolutional blocks in the fusion network, one can learn an architecture for fusion, which may lead to better restoration. To this end, we propose a neural architecture search-based approach [40, 41] for learning the fusion network architecture. Additionally, we introduce the FFT-op and deveiling operators to process the task-specific features efficiently; these operators are motivated by the image formation models of the individual degradations.
The FFT-op is motivated by Wiener deconvolution and helps in learning weights that efficiently fuse the task-specific features and remove the effect of blur from them. The deveiling operator is introduced to learn weights that efficiently enhance features captured under low-light conditions. Furthermore, we use a classification network to classify the input degraded image into classes that indicate which degradations are present in the input face image. This class-specific information is used as a prior in the fusion network when fusing the task-specific features. Fig. 1 shows sample results from the proposed Network Architecture Search for Face Enhancement (NASFE) method, where one can see that NASFE provides better restoration results than state-of-the-art face restoration methods. To summarize, this paper makes the following contributions:
• We propose a way of extracting identity information from different clean images of the person present in the degraded image to restore the face image.
• We propose a novel loss, called the identity loss (L_iden), which uses the aforementioned identity information to train the NASFE network.
• We propose a neural architecture search-based method for designing the fusion network.

Denoising. Earlier methods such as [9, 93, 11, 17] make use of image priors to perform denoising; they require knowledge of the amount of noise present in the noisy image to obtain denoised images. In contrast, blind image denoising methods such as [90, 29, 38] model the noise using techniques like non-local Bayes and low-rank mixtures of Gaussians. CSF [61] and TNRD [7] proposed optimization-based stage-wise inference methods. With the advent of convolutional networks for image restoration, DnCNN [87], FFDNet [85], RED30 [44], and BM3D-Net [79] achieved impressive denoising performance. Noise2Noise [30] does not require paired noisy-clean images to train the network; it relies on statistical reasoning and trains the network using pairs of noisy images. The authors of CBDNet [19] proposed an elegant way to model realistic noisy images and use them to train a network with asymmetric learning to suppress under-estimation of the noise level.

Deblurring. Classical image deblurring methods first estimate the blur kernel from the blurry image and then use a deconvolution technique to obtain the deblurred image. Methods such as [77, 13, 25, 58, 64, 49, 68, 47, 52] compute different priors, such as sparsity, the L0 gradient prior, patch priors, manifold priors, and low-rank priors, in order to estimate the blur kernel from a given blurry image. In recent years, neural network-based methods have also been proposed for deblurring [76, 59, 43, 66, 82] and super-resolution [84, 6, 2, 36, 78, 69]. Deblurring methods are classified into blind and non-blind methods based on whether blur kernel information is used while restoring the image. Recent non-blind image deblurring methods such as [5, 26, 63, 64, 62, 70] assume and use some knowledge of the blur kernel. Several face deblurring methods extract structural information in the form of facial fiducial or key points, face exemplar masks, or semantic masks. Pan et al. [48] extract exemplar face images and use them as a global prior to estimate the blur kernel. Recently, [66, 67] proposed to use the semantic maps of a face to deblur the image.
HiFaceGAN [80] addressed face restoration using a multi-stage semantic generation framework. DFDNet [35] proposed a dictionary-based method to produce high-resolution face images.

Low-light enhancement. Various methods have been proposed in the literature for addressing the low-light enhancement problem, including histogram equalization [54], matching region templates [22], contrast statistics [60], bilateral learning [16], intermediate HDR supervision [81], reinforcement learning [50, 55], and adversarial learning [8, 23]. [12, 65] view the inverse of a low-light image as a hazy image in order to estimate a low-light enhanced image. Wei et al. [73] proposed a realistic noise formation model based on the characteristics of CMOS photosensors to synthesize realistic low-light images. Guo et al. [18] proposed a zero-reference method that uses pixel-wise high-order curves for dynamic range adjustment to enhance dark images.

Neural architecture search methods focus on automatically designing or constructing neural network architectures that efficiently address a specific problem to achieve optimal performance. Several architecture search methods have been proposed that use reinforcement learning [91, 3, 89] and evolutionary algorithms [57, 74, 40, 45]. Recent architecture search methods such as [53, 92, 56] focus on searching for a repeatable cell structure while keeping the network-level structure fixed; these methods are more efficient and computationally less expensive than the earlier ones. PNAS [39] proposed a progressive search strategy that notably reduces the computational cost. Motivated by [37, 41], we propose an efficient way to design our fusion network, which fuses the task-specific features for removing multiple degradations such as noise and blur and for enhancing low-light conditions.

An observed image y with multiple degradations can be modeled as

y = r ⊙ (k * x) + n,    (1)

where x is the clean image, r is the irradiance map, k is the blur kernel, and n is the additive noise. Here, * and ⊙ denote convolution and element-wise multiplication, respectively. Figure 2 gives an overview of the proposed NASFE network, which consists of three task-specific encoders, a fusion network, and a decoder. The deblur encoder (E_B(·)), denoise encoder (E_N(·)), and low-light encoder (E_L(·)) are trained to address the corresponding tasks of deblurring, denoising, and low-light enhancement, respectively. These encoders are used to extract task-specific features, which are then fused using a network architecture search (NAS)-based fusion block. Finally, the fused features are passed through the decoder network to restore the face image. Additionally, with the help of a classification network, we determine which degradations are present in the input image and use this information as a prior for the task-specific encoders and the fusion network. Furthermore, to improve the quality of, and preserve the identity in, the restored face image, we extract identity information, in the form of VGGFace features, from a set of clean images corresponding to the identity present in the degraded image. We denote this as the identity information (I_iden) and pass it as input, along with the degraded image, to the NASFE network to restore the face image. Besides using this identity information, we construct an identity loss L_iden to train the NASFE network.
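The degradation model in Eq. (1) can be simulated in a few lines of Python. This is a minimal sketch under our reading of Eq. (1) (blur, then low-light attenuation, then Photon-Gaussian noise); the kernel, low-light factor, and noise levels below are illustrative placeholders rather than the paper's exact settings, and the full pipeline described later additionally involves the camera response function and Bayer conversion.

```python
import numpy as np
from scipy.signal import convolve2d

def degrade(x, k, r, sigma_c=0.03, sigma_s=0.08, rng=None):
    """Simulate y = r ⊙ (k * x) + n for a clean grayscale image x in [0, 1].
    k: blur kernel; r: low-light factor (scalar or per-pixel map)."""
    rng = np.random.default_rng() if rng is None else rng
    blurred = convolve2d(x, k, mode="same", boundary="symm")  # k * x
    low_light = r * blurred                                   # r ⊙ (k * x)
    # Photon-Gaussian noise: stationary component (variance σ_c²) plus a
    # signal-dependent component (spatially varying variance L·σ_s²).
    noise = (rng.normal(0.0, sigma_c, x.shape)
             + rng.normal(0.0, 1.0, x.shape) * np.sqrt(low_light * sigma_s**2))
    return np.clip(low_light + noise, 0.0, 1.0)

x = np.random.rand(176, 144)   # stand-in for a clean CelebA-sized face image
k = np.ones((7, 7)) / 49.0     # stand-in blur kernel (a box filter)
y = degrade(x, k, r=0.2)       # the paper samples r uniformly from [0.05, 0.5]
```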
Due to space limitations, details regarding the task-specific encoders, the classification network, and the decoder network are provided in the supplementary document. As can be seen from Fig. 2, the task-specific features are concatenated and then fed into the fusion network. Rather than using a simple 1 × 1 convolution or a residual block to fuse these features, we propose a NAS-based approach for designing the fusion network [53, 56, 41, 37]. We define a fusion cell as the smallest repeatable module used to construct the fusion network (see Fig. 3). In our approach, the network search space includes both a network-level search (i.e., searching the connections between different fusion cells) and a cell-level search (i.e., exploring the structure of the fusion cell). We adopt the cell design of [37] and define a fusion cell (represented as Cell(·)) as a directed acyclic graph consisting of B blocks. Each block i in the l-th fusion cell F^l is a two-tensor-to-one-tensor mapping, determined by a tuple (I_1, I_2, O_1, O_2, M), where I_1, I_2 ∈ I^l are the input tensors, O_1, O_2 ∈ O are the layer types applied to the corresponding input tensors, and M ∈ M is the method used to combine the outputs of O_1 and O_2 to form the block F_i^l's output tensor, Z_i^l. The fusion cell F^l's output Z^l is the concatenation of the outputs of all blocks {Z_1^l, Z_2^l, ..., Z_B^l} in the cell F^l. The input tensor set I^l consists of the outputs of the previous cell F^{l-1} and the previous-previous cell F^{l-2}. We follow [37] and use element-wise addition as the only possible combining method in M. The set of possible layer types O consists of ten operators, which include:
• the deveiling operator,
• identity or skip connection,
• the FFT operator,
• no (zero) connection.
Along with conventional convolution operators such as dilated and separable convolutions, no and skip connections, the Res2Block [15], and the self-attention block [71, 86], we introduce the Res-op, deveiling, and FFT-op operators (shown in Fig. 4) to efficiently process the task-specific features. These operators are based on the image formation models of the individual degradations (noise, blur, and low-light conditions). In what follows, we explain the design of these new operators in detail; a code sketch of the FFT and deveiling operators is given after this section.

FFT operator. Classical deblurring methods make use of the Wiener deconvolution technique to restore the image from blurry observations. Motivated by Wiener deconvolution, we split the features x_in into two parts, x_in1 and x_in2, and apply convolution operations to obtain x_1 and x_2, respectively. Finally, as shown in Fig. 4(b), we apply Wiener deconvolution to obtain x_out = F^{-1}(X_out), where

X_out = (conj(X_2) ⊙ X_1) / (|X_2|² + ε).    (2)

Here, F^{-1} denotes the inverse Fourier transform operator, conj(·) denotes the complex conjugate, and X_1, X_2, and X_out are the Fourier transforms of x_1, x_2, and x_out, respectively. Here, ε resembles the inverse of the signal-to-noise ratio used during Wiener deconvolution, which we set equal to 0.01.

Deveiling operator. [12, 65] viewed low-light enhancement as an image dehazing problem, since the two have similar mathematical models, and [12] addressed the low-light enhancement problem using a dark channel prior. Motivated by these methods, we use a deveiling operator [31, 34] to learn a latent mask A that enhances features captured under low-light conditions:

x_out = A ⊙ x_in,    (3)

where A is a learnable mask and a function of the input. As shown in Fig. 4(c), A can be learned using a convolution layer given the input features x_in, and element-wise multiplication of A with x_in yields the enhanced features x_out.
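The following PyTorch sketch illustrates the two new operators. The channel split, the 3×3 convolutions, and the sigmoid on the mask are our assumptions about details the text leaves unstated, so this should be read as a minimal illustration rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

class FFTOp(nn.Module):
    """Sketch of the FFT operator (Eq. 2): Wiener-style fusion of two feature halves."""
    def __init__(self, channels, eps=0.01):
        super().__init__()
        half = channels // 2
        self.conv1 = nn.Conv2d(half, half, 3, padding=1)
        self.conv2 = nn.Conv2d(half, half, 3, padding=1)
        self.eps = eps  # plays the role of the inverse SNR in Wiener deconvolution

    def forward(self, x_in):
        x_in1, x_in2 = x_in.chunk(2, dim=1)  # split the features into two parts
        X1 = torch.fft.fft2(self.conv1(x_in1))
        X2 = torch.fft.fft2(self.conv2(x_in2))
        X_out = torch.conj(X2) * X1 / (X2.abs() ** 2 + self.eps)  # Wiener deconvolution
        # Note: this returns half the input channels under this split; the paper's
        # actual design may restore the full channel count differently.
        return torch.fft.ifft2(X_out).real

class DeveilOp(nn.Module):
    """Sketch of the deveiling operator (Eq. 3): a learned latent mask A, applied element-wise."""
    def __init__(self, channels):
        super().__init__()
        self.mask = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.Sigmoid())

    def forward(self, x_in):
        A = self.mask(x_in)  # latent enhancement mask, a function of the input features
        return A * x_in      # x_out = A ⊙ x_in
```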
Residual operator. Inspired by the image deraining work [33], which uses a residual operator to remove rain streaks, we use a similar residual operator, shown in Fig. 4(a), to estimate the noise from the latent features and thereby obtain noise-free features.

Fusion cell search space. We use the continuous relaxation approach [40, 41], in which every output tensor Z_i^l of block F_i^l is connected to all input tensors in I_i^l through operators O_{j→i} as follows:

Z_i^l = Σ_{Z_j ∈ I_i^l} O_{j→i}(Z_j).    (4)

We define Ō_{j→i} as an approximation of the best choice of operator O_{j→i} using continuous relaxation:

Ō_{j→i}(Z_j) = Σ_{k=1}^{|O|} α_{j→i}^k O^k(Z_j),    (5)

where Σ_{k=1}^{|O|} α_{j→i}^k = 1 and α_{j→i}^k ≥ 0 ∀ i, j; this can be easily implemented using a softmax. Hence, using Eq. 4 and Eq. 5, we can summarize the fusion cell architecture as

Z^l = Cell(Z^{l-1}, Z^{l-2}, α).    (6)

Given the degraded image y and a set of clean images D_C = {C_i}_{i=1}^n, we compute pool3 features using the VGGFace network [51]. Note that D_C contains clean images of the same person present in the degraded image y. Let F_y and {F_{C_i}}_{i=1}^n denote the VGGFace features corresponding to y and {C_i}_{i=1}^n, respectively. Since the clean images in D_C may have different styles and characteristics, as they may have been taken in different scenarios and at different times, we apply Adaptive Instance Normalization (AdaIN) before passing them as input to the network to reduce the effect of the different styles:

F̂_{C_i} = σ(F_y) · (F_{C_i} − μ(F_{C_i})) / σ(F_{C_i}) + μ(F_y),    (7)

where σ(·) and μ(·) denote the standard deviation and mean, respectively. The mean of the F̂_{C_i} is defined as the identity information, i.e., I_iden = mean({F̂_{C_i}}). I_iden is used along with y as input to the NASFE network to restore the face image. Note that, since the identity information is extracted from clean images, it is much more reliable and provides a stronger prior than face exemplar masks [48] or semantic maps [66, 2, 83] extracted from degraded images. Additionally, to preserve the identity of the subject in the restored image, we construct an identity loss L_iden to train the NASFE network.

Identity loss L_iden. Let x̂ denote the restored face image produced by the NASFE network. We construct the identity loss as

L_iden = (1/n) Σ_{i=1}^n ||F_x̂ − F̂_{C_i}||²_2,    (8)

where F_x̂ denotes the VGGFace features corresponding to x̂ and n denotes the number of clean images in D_C. The NASFE network is trained using a combination of the L2 loss, the perceptual loss [24], and the identity loss:

L_final = L_2 + λ_per L_per + λ_iden L_iden,    (9)

where L_2 = ||x̂ − x||²_2, L_iden denotes the identity loss defined in (8), and L_per denotes the perceptual loss [24], defined as

L_per = (1 / (N·H·W)) ||F_x̂ − F_x||²_2,    (10)

where F_x̂ and F_x denote the pool3 layer features of the VGGFace network [51], and N, H, and W are the number of channels, height, and width of F_x̂, respectively. We set λ_per = 0.04 and λ_iden = 0.003 in our experiments. Note that the multiple clean images are required only during training; once the network is trained, a degraded image is fed into the network and NASFE produces an identity-preserving restored image as the output.
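A minimal PyTorch sketch of the training objective, tying together AdaIN (Eq. 7) and L_final (Eq. 9), is given below. The feature extractor feat_net stands in for the VGGFace pool3 network, and the reduction modes (sum vs. mean) are our assumptions about details not stated in the text.

```python
import torch
import torch.nn.functional as F

def adain(f_c, f_y, eps=1e-5):
    """Eq. (7): re-style clean-image features f_c toward the degraded image's features f_y."""
    mu_c, sd_c = f_c.mean(dim=(2, 3), keepdim=True), f_c.std(dim=(2, 3), keepdim=True)
    mu_y, sd_y = f_y.mean(dim=(2, 3), keepdim=True), f_y.std(dim=(2, 3), keepdim=True)
    return sd_y * (f_c - mu_c) / (sd_c + eps) + mu_y

def nasfe_loss(x_hat, x, clean_feats, f_y, feat_net, lam_per=0.04, lam_iden=0.003):
    """Eq. (9): L_final = L_2 + λ_per·L_per + λ_iden·L_iden.
    clean_feats: list of VGGFace features F_{C_i} for the n clean images."""
    f_xhat, f_x = feat_net(x_hat), feat_net(x)
    l_2 = F.mse_loss(x_hat, x, reduction="sum")        # L_2 = ||x̂ − x||²
    l_per = F.mse_loss(f_xhat, f_x, reduction="mean")  # Eq. (10): averaged over N·H·W
    styled = [adain(f_c, f_y) for f_c in clean_feats]  # F̂_{C_i}, Eq. (7)
    l_iden = torch.stack(                              # Eq. (8): mean over the n clean images
        [F.mse_loss(f_xhat, s, reduction="sum") for s in styled]).mean()
    return l_2 + lam_per * l_per + lam_iden * l_iden
```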
Given x, we first convolve it with k to get a blurry image. Here, k can be a motion blur kernel [1, 27] or an anisotropic Gaussian blur kernel [88]. To generate a degraded image with blur + low-light + noise conditions, we follow [19, 73] and convert the obtained blurry image to an irradiance image L. We then multiply the low-light factor r with L and synthesize the observation using f(·), the camera response function (CRF), and M(·), the function that converts an RGB image to a Bayer image. Finally, we add realistic Photon-Gaussian noise [19], where n consists of two components: stationary noise n_c with noise variance σ_c² and signal-dependent noise n_s with spatially varying noise variance Lσ_s².

Training dataset. We conduct our experiments using clean images from the CelebA [42] and VGGFace2 [4] face datasets. We randomly selected 30000 images from the training set of CelebA [42] and 30000 images from VGGFace2 [4] and generated synthetic images with multiple degradations. Images in the CelebA and VGGFace2 datasets are of size 176 × 144 and 224 × 224, respectively. Given a clean face image x, we first convolve it with a blur kernel k sampled from 25000 motion kernels generated using [1, 27] and 8 anisotropic Gaussian kernels [88]; then, following [73, 75], we multiply with a low-light factor r sampled uniformly from [0.05, 0.5] to obtain images with low-light conditions. Finally, we add realistic noise [19] n (with σ_s ∈ [0.01, 0.16] and σ_c ∈ [0.01, 0.06]) to obtain the degraded image y. Based on the degradations present in y, we create a class label c, a vector of length three, c = {b, n, l}, where b, n, and l are binary values: b, n, and l are one if y contains blur, noise, and low-light, respectively, and zero otherwise.

Test datasets. We create test datasets using 100 randomly sampled images from the test sets of CelebA [42] and VGGFace2 [4]. Using these clean images, we create the test datasets Test-B, Test-N, Test-L, Test-BN, and Test-BNL, with the amounts of degradation shown in Table 4. Additionally, we collected a real-world face image dataset with multiple degradations corresponding to 20 subjects from YouTube.

Training details. The classification network (CN) is trained to produce a class label ĉ_i which indicates the degradation(s) present in y_i; the training and network details of the classification network are provided in the supplementary document. NASFE contains three encoders (deblur (E_B(·)), denoise (E_N(·)), and low-light (E_L(·))), one fusion network (Fn(·)), and a decoder (De(·)). The encoders (E_B, E_N, E_L) are initially trained to address the corresponding individual tasks of deblurring, denoising, and low-light enhancement, respectively; more details are provided in the supplementary document. Given a degraded image y, we compute the class label ĉ (using the CN) and the identity information I_iden and pass them as input to NASFE to compute a restored image x̂. We set the number of blocks B to 12 in the fusion cell of NASFE. Following [37], we update α and the weights of NASFE alternately during training (a code sketch of this alternating update is given at the end of this section). NASFE is trained using L_final with the Adam optimizer and a batch size of 40. The learning rate is set to 0.0005, and NASFE is trained for one million iterations.

We compare the performance of our network against state-of-the-art (SOTA) denoising [9, 93, 7, 87, 19], deblurring [48, 2, 66, 83, 28, 35, 80], and low-light enhancement [14, 20, 32, 18, 73] methods. The Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity index (SSIM) [72] are used to compare the performance of the different methods on synthetic images. Note that we retrain the SOTA methods [2, 66, 83, 28, 35, 80] using the training data discussed earlier, following the procedures explained in the respective papers.
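The alternating update of the architecture parameters α and the network weights can be sketched as follows. Here model, loss_final, and the two data loaders are placeholders, and the use of a separate held-out batch for the α update follows common NAS practice [37, 41] rather than a detail stated in the paper; the Adam settings mirror those given above.

```python
import torch

def train_nasfe(model, loss_final, loader_w, loader_a, max_iters=1_000_000):
    """Alternately update the network weights and the architecture parameters α.
    model: NASFE with architecture parameters named '...alpha...' (an assumed convention).
    loader_w / loader_a: iterables of (degraded, clean) batches for the two updates."""
    w_params = [p for n, p in model.named_parameters() if "alpha" not in n]
    a_params = [p for n, p in model.named_parameters() if "alpha" in n]
    opt_w = torch.optim.Adam(w_params, lr=5e-4)  # Adam, lr = 0.0005, as in the paper
    opt_a = torch.optim.Adam(a_params, lr=5e-4)
    for step, ((y_w, x_w), (y_a, x_a)) in enumerate(zip(loader_w, loader_a)):
        opt_w.zero_grad()                        # 1) update the network weights ...
        loss_final(model(y_w), x_w).backward()
        opt_w.step()
        opt_a.zero_grad()                        # 2) ... then update α on another batch
        loss_final(model(y_a), x_a).backward()
        opt_a.step()
        if step + 1 >= max_iters:
            break
```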
Additionally, we provide visual comparisons of NASFE against the SOTA methods using the real blurry images provided by [66] and the real low-light exposure images published by [18, 3].

Denoising experiments. We perform a quantitative analysis of NASFE against the SOTA denoising methods [9, 93, 7, 87, 19] using Test-N. As can be seen from Table 1, our method outperforms the SOTA method by 0.6 dB in PSNR and 0.02 in SSIM. Note that methods such as [87, 19] are specifically designed for image denoising; even though our method is designed to deal with multiple degradations, it provides better performance than the SOTA denoising methods.

Deblurring experiments. We compare the performance of NASFE against the SOTA deblurring methods [48, 46, 2, 66, 83, 28] using Test-B. Methods such as [48, 46, 2, 66, 83] use facial exemplar or semantic information as priors to perform deblurring, whereas our network uses identity information as a prior. As can be seen from Table 2, our method outperforms the SOTA method by 1.93 dB in PSNR and 0.04 in SSIM.

Low-light enhancement experiments. We evaluate the performance of NASFE against the SOTA low-light enhancement methods [14, 20, 32, 18, 73] using Test-L. Even though our method is trained to address multiple degradations, NASFE performs significantly better than [14, 20, 32, 18, 73]. As can be seen from Table 3, our method outperforms these SOTA methods by 3 dB in PSNR and 0.10 in SSIM. More qualitative results are provided in the supplementary document.

Blur + noise + low-light experiments. We compare the performance of different methods on Test-BNL, which contains the multiple degradations of blur, noise, and low-light conditions. Note that we retrain [2, 66, 83, 28, 35, 80] using degraded images that contain all degradations. The results corresponding to this experiment are shown in Table 5. As can be seen from this table, NASFE performs better by 2.1 dB in PSNR and 0.07 in SSIM compared to the second-best performing method. Fig. 5 shows the qualitative results of NASFE against the other methods. The outputs of the other methods are blurry or contain artifacts near the eyes, nose, and mouth, whereas NASFE produces clear and sharp face images. Furthermore, we can observe from Fig. 5 and Fig. 6 that the outputs of the other methods still exhibit low-light conditions, whereas NASFE produces sharp face images with enhanced lighting.

We conducted face recognition experiments on Test-BNL using ArcFace [10] to show the effect of the various face restoration methods on recognition: the Top-K most similar faces to the restored face image are picked from the gallery set and used to compute the accuracy. Table 5 shows the face recognition accuracies corresponding to the different methods. We can clearly see that NASFE achieves an improvement of 7% over the second-best performing method.
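The Top-K recognition protocol used here can be sketched as follows; the choice of cosine similarity over the face embeddings and the exact accuracy bookkeeping are our assumptions about a protocol the paper only outlines.

```python
import torch
import torch.nn.functional as F

def top_k_accuracy(probe_feats, probe_ids, gallery_feats, gallery_ids, k=5):
    """For each restored probe face, retrieve the K most similar gallery faces by
    cosine similarity of face embeddings (e.g., from ArcFace) and count a hit
    if the probe's true identity appears among them."""
    probe = F.normalize(probe_feats, dim=1)
    gallery = F.normalize(gallery_feats, dim=1)
    sims = probe @ gallery.T                    # cosine similarities, shape (P, G)
    topk = sims.topk(k, dim=1).indices          # indices of the K nearest gallery faces
    hits = (gallery_ids[topk] == probe_ids[:, None]).any(dim=1)
    return hits.float().mean().item()
```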
We conduct ablation studies using the test sets Test-BN and Test-BNL to show the improvements achieved by the different components of NASFE. We start with a baseline network (BN), defined as a combination of the three encoders (E_B, E_N, and E_L), a fusion network composed of 4 Res2Blocks [15], and a decoder (De). As shown in Table 6, BN performs very poorly due to its inability to process the task-specific features efficiently. We then introduce network architecture search in the fusion network, using fusion cells to process the task-specific features efficiently; the resulting BN-NAS improves over BN by ∼2 dB. Next, we use the class labels c (computed using the classification network) as input to BN-NAS, which increases the performance of the network by ∼0.5 dB. We then use the proposed identity information I_iden of the identity present in the degraded image (refer to Section 3.2), which further improves the performance of the network by ∼1.5 dB; the resultant network corresponds to NASFE. Note that BN and BN-NAS are trained using L_mse. Training NASFE with L_mse and L_per further improves the performance by 0.3 dB. Finally, we use the proposed L_iden to construct L_final and train NASFE; the proposed L_iden improves the performance of NASFE by ∼0.5 dB.

We proposed a multi-task face restoration network, called NASFE, that can enhance poor-quality face images containing a single degradation (i.e., noise or blur) or multiple degradations (noise+blur+low-light). NASFE makes use of clean face images of the person present in the degraded image to extract identity information and uses it to train the network weights. Additionally, we use network architecture search to design the fusion network in NASFE, which fuses the task-specific features obtained from the encoders. Extensive experiments show that the proposed method performs significantly better than SOTA image restoration/enhancement methods on both synthetic degraded images and real-world images with multiple degradations (noise+blur+low-light).

References
[1] Modeling the performance of image restoration from motion blur.
[2] Super-FAN: Integrated facial landmark localization and super-resolution of real-world low resolution faces in arbitrary poses with GANs.
[3] Efficient architecture search by network transformation.
[4] VGGFace2: A dataset for recognising faces across pose and age.
[5] Deep convolutional neural network for image deconvolution.
[6] Gated context aggregation network for image dehazing and deraining.
[7] Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration. IEEE Transactions on Pattern Analysis and Machine Intelligence.
[8] Deep photo enhancer: Unpaired learning for image enhancement from photographs with GANs.
[9] Image denoising by sparse 3-D transform-domain collaborative filtering.
[10] ArcFace: Additive angular margin loss for deep face recognition.
[11] Nonlocally centralized sparse representation for image restoration.
[12] A fast wavelet algorithm for image deblurring.
[13] Removing camera shake from a single photograph.
[14] A weighted variational model for simultaneous reflectance and illumination estimation.
[15] Res2Net: A new multi-scale backbone architecture.
[16] Deep bilateral learning for real-time image enhancement.
[17] Weighted nuclear norm minimization with application to image denoising.
[18] Zero-reference deep curve estimation for low-light image enhancement.
[19] Toward convolutional blind denoising of real photographs.
[20] LIME: Low-light image enhancement via illumination map estimation.
[21] Arbitrary style transfer in real-time with adaptive instance normalization.
[22] Context-based automatic local image enhancement.
[23] DSLR-quality photos on mobile devices with deep convolutional networks.
[24] Perceptual losses for real-time style transfer and super-resolution. ECCV.
[25] Blind deconvolution using a normalized sparsity measure.
[26] Learning to push the limits of efficient FFT-based image deconvolution.
[27] DeblurGAN: Blind motion deblurring using conditional adversarial networks.
[28] DeblurGAN-v2: Deblurring (orders-of-magnitude) faster and better.
[29] Multiscale image blind denoising.
[30] Noise2Noise: Learning image restoration without clean data.
[31] AOD-Net: All-in-one dehazing network.
[32] Structure-revealing low-light image enhancement via robust Retinex model.
[33] Robust optical flow in rainy scenes.
[34] RainFlow: Optical flow under rain streaks and rain veiling effect.
[35] Blind face restoration via deep multi-scale component dictionaries.
[36] Enhanced blind face restoration with multi-exemplar images and adaptive spatial feature fusion.
[37] Auto-DeepLab: Hierarchical neural architecture search for semantic image segmentation.
[38] Automatic estimation and removal of noise from a single image.
[39] Progressive neural architecture search.
[40] Hierarchical representations for efficient architecture search.
[41] DARTS: Differentiable architecture search.
[42] Deep learning face attributes in the wild.
[43] Unsupervised domain-specific deblurring via disentangled representations.
[44] Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections.
[45] Evolving deep neural networks.
[46] Deep multi-scale convolutional neural network for dynamic scene deblurring.
[47] Example-driven manifold priors for image deconvolution.
[48] Deblurring face images with exemplars.
[49] Blind image deblurring using dark channel prior.
[50] Donggeun Yoo, and In So Kweon. Distort-and-recover: Color enhancement using deep reinforcement learning.
[51] Deep face recognition.
[52] Shearlet-based deconvolution.
[53] Efficient neural architecture search via parameter sharing.
[54] Adaptive histogram equalization and its variations. Computer Vision, Graphics, and Image Processing.
[55] Content-aware dark image enhancement through channel division.
[56] Regularized evolution for image classifier architecture search.
[57] Large-scale evolution of image classifiers.
[58] Image deblurring via enhanced low-rank prior.
[59] Face video deblurring using 3D facial priors.
[60] Content-aware dark image enhancement through channel division.
[61] Shrinkage fields for effective image restoration.
[62] Shrinkage fields for effective image restoration.
[63] Discriminative non-blind deblurring.
[64] A machine learning approach for non-blind image deconvolution.
[65] Robust patch-based HDR reconstruction of dynamic scenes.
[66] Deep semantic face deblurring.
[67] Exploiting semantics for face image deblurring.
[68] Edge-based blur kernel estimation using patch priors.
[69] ImagePairs: Realistic super resolution dataset via beam splitter camera rig.
[70] Non-blind deblurring: Handling kernel uncertainty with CNNs.
[71] Attention is all you need.
[72] Image quality assessment: From error visibility to structural similarity.
[73] A physics-based noise formation model for extreme low-light raw denoising.
[74] Genetic CNN.
[75] Learning to restore low-light images via decomposition-and-enhancement.
[76] Deep convolutional neural network for image deconvolution.
[77] Unnatural L0 sparse representation for natural image deblurring.
[78] Learning to super-resolve blurry face and text images.
[79] BM3D-Net: A convolutional neural network for transform-domain collaborative filtering.
[80] HiFaceGAN: Face renovation via collaborative suppression and replenishment.
[81] Image correction via deep reciprocating HDR transformation.
[82] Learning to restore a single face image degraded by atmospheric turbulence using CNNs.
[83] Deblurring face images using uncertainty guided multi-stream semantic networks.
[84] Face super-resolution guided by facial component heatmaps.
[85] Multi-style generative network for real-time transfer.
[86] Dimitris Metaxas, and Augustus Odena. Self-attention generative adversarial networks.
[87] Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising.
[88] Learning a single convolutional super-resolution network for multiple degradations.
[89] Practical block-wise neural network architecture generation.
[90] From noise modeling to blind image denoising.
[91] Neural architecture search with reinforcement learning.
[92] Learning transferable architectures for scalable image recognition.
[93] From learning models of natural image patches to whole image restoration.