Journal of Electronic Imaging 13(2), 264–277 (April 2004).

Adaptive hybrid mean and median filtering of high-ISO long-exposure sensor noise for digital photography

Tamer Rabie
UAE University, College of Information Technology
P.O. Box 15551, Al-Ain, United Arab Emirates
E-mail: tamer@cs.toronto.edu

Abstract. This paper presents a new methodology for the reduction of sensor noise from images acquired using digital cameras at high-ISO (International Organization for Standardization) and long-exposure settings. The problem lies in the fact that the algorithm must deal with hardware-related noise that affects certain color channels more than others and is thus nonuniform over the color channels. A new adaptive center-weighted hybrid mean and median filter is formulated and used within a novel optimal-size windowing framework to reduce the effects of two types of sensor noise, namely blue-channel noise and JPEG blocking artifacts, common in high-ISO digital camera images. A third type of digital camera noise, which affects long-exposure images and causes a type of sensor noise commonly known as "stuck-pixel" noise, is dealt with by preprocessing the image with a new stuck-pixel prefilter formulation. Experimental results are presented with an analysis of the performance of the various filters in comparison with other standard noise reduction filters. © 2004 SPIE and IS&T. [DOI: 10.1117/1.1668279]

1 Introduction

With the advent of the inexpensive charge-coupled device (CCD) on a chip (Fig. 1), the widespread move from traditional 35 mm film photography to digital photography is becoming increasingly apparent, especially among journalists and professional photographers. This has prompted digital camera manufacturers to implement most of the legacy techniques common among traditional film cameras, such as high-ISO film, long exposures, and high-speed shutters, in digital cameras.
One technique that is of utmost importance to a large community of photographers is the digital camera equivalent of the traditional high-speed silver-based film sensitivity, commonly known as the ISO sensitivity number. The ISO number that appears on regular camera film packages specifies the speed, or sensitivity, of that type of silver-based film: the higher the number, the "faster," or more sensitive to light, the film is. Typical ISO speeds for silver-based film include 100, 200, and 400. Each doubling of the ISO number indicates a doubling in film speed, so each of these films is twice as fast as the next-slowest. Image sensors used in digital cameras are also rated using equivalent ISO numbers. Just as with film, an image sensor with a lower ISO needs more light for a good exposure than one with a higher ISO. In poorly lit conditions, a longer exposure of the image sensor is needed for more light to enter. This, however, leads to blurred images unless the scene being imaged is completely still. It is therefore better to set the image sensor to a higher ISO setting, because this allows freezing scene motion and shooting in low light. Typical digital image sensor ISOs range from 100 (fairly slow) to 3200 or higher (very fast), and some digital cameras have more than one ISO rating. In low-light situations, the sensor's ISO can be increased by amplifying the image sensor's signal (increasing its gain); some cameras even increase the gain automatically. This not only increases the sensor's sensitivity but, unfortunately, also increases the noise, or "grain," generating images that are contaminated with random noise effects.

Paper 03-014 received Feb. 3, 2003; revised manuscript received Apr. 30, 2003 and Sep. 11, 2003; accepted for publication Sep. 15, 2003. 1017-9909/2004/$15.00 © 2004 SPIE and IS&T.
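The gain-noise trade-off just described can be sketched with a toy simulation, modeling the amplified sensor noise as additive zero-mean Gaussian values whose variance grows with the gain; this anticipates the noise model adopted in Sec. 2. The function name and the proportionality constant k are illustrative and not part of any camera API.

```python
# Sketch of high-ISO sensor noise: additive zero-mean Gaussian noise
# whose variance (noise power) is proportional to the gain applied to
# the sensor signal. Operates on a plain list of 8-bit pixel values.
import random

def add_high_iso_noise(pixels, gain, k=2.0, seed=0):
    """Corrupt pixel values with N(0, k*gain) noise, clipped to [0, 255].
    k is an assumed proportionality constant between gain and variance."""
    rng = random.Random(seed)
    sigma = (k * gain) ** 0.5        # std dev grows with the sensor gain
    return [min(255, max(0, p + rng.gauss(0.0, sigma))) for p in pixels]

clean = [128] * 1000                 # a flat mid-gray scanline
noisy = add_high_iso_noise(clean, gain=8.0)
```

Doubling the gain doubles the noise variance, which is why the "grain" becomes visible at high ISO settings even though the mean signal level is preserved.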
2 Sensor Noise Types

Noise can be summarized as the visible effect of an electronic error (or interference) in the final image from a digital camera. It is a function of how prone the image sensor and digital signal processing systems inside the camera are to these errors (or interference), and of how well they can cope with or remove them. Noise significantly degrades image quality and increases the difficulty of discriminating fine details in the image. It also complicates further image processing, such as image segmentation and edge detection. The high-ISO sensor noise produced by a typical digital camera CCD imaging sensor can be modeled as additive white Gaussian noise with zero mean and a variance (noise power) proportional to the amount of amplification applied to the image sensor's signal to boost its gain.1–3

Visible noise in a digital image is often affected by temperature (high is worse, low is better) and ISO sensitivity (high is worse, low is better). Some cameras exhibit almost no noise, and some exhibit a lot all the time. It has certainly been a challenge for digital camera developers to reduce noise and produce a "cleaner" image, and indeed some recent digital cameras are improving this situation greatly, allowing higher and higher ISOs to be used without too much noise. In general, image artifacts produced by digital cameras can be divided into three types.

• Stuck-pixel noise: also known as impulse-type noise, is created by many digital cameras and is caused by long exposure of the CCD elements in dim lighting conditions, when a bright image is required without the use of the flash. During the exposure, some CCD cells become saturated and are stuck at a bright color, which shows up in the acquired image as bright impulse noise pixels [see Fig. 2(a)]. Removing this type of noise is usually not a difficult task, but it comes at the cost of blurring the other, uncorrupted pixels.
• Blue-channel noise: a common problem in digital photographs, especially those created by high-ISO professional news and sports digital cameras. A typical digital camera sensor [CCD or complementary metal-oxide-semiconductor (CMOS)] is more sensitive to certain primary colors than to others (often sensors are less sensitive to blue light), and so to compensate, these channels are amplified more than the others [see Fig. 2(d)]. This noise has hampered the acceptance of digital cameras for quality reasons, and it has limited their use for some techniques, such as automasking based on chrominance (e.g., "blue screen" backgrounds).

• JPEG artifacts: also known as JPEG blocking artifacts, are due to the 8-by-8 block size used by the JPEG compression standard. High compression ratios result in images with blockiness in the blue and red channels. These blocks are especially obvious in the flat areas of an image. In high-detail areas, artifacts called "mosquito noise" become noticeable; this term comes from the ripple effect that mosquitoes make when their legs touch water.

Fig. 1 The CCD imaging sensor chip used in the "Kodak Professional DCS720x" digital camera (courtesy http://www.dpreview.com/).

The sensor noise filtering techniques that will be described shortly present a solution to these types of artifacts, which are common in most color images acquired by digital cameras.

3 Background of Noise Reduction Techniques

The primary concern in digital photography is the visual fidelity of the acquired images. What professional photographers demand from a digital camera is fast and precise image acquisition (high sensitivity in low-light conditions and exact-moment capture) coupled with the best visual results. Previous noise reduction techniques available in the literature do not take into account the physics of the CCD photo-capture element used in a majority of video and digital still cameras today.
The CCD image sensor tends to be highly sensitive to green light frequencies and less sensitive to blue and red light. The CCD hardware controller is thus tuned to increase the signal gain of the blue CCD elements more than that of the green elements. In normal, good lighting conditions there are no visible effects due to this difference in signal gain, but in low-light conditions, where the CCD signal gain is increased more for the less sensitive color channels, this produces high-frequency noise that contaminates the blue channel, and to a lesser extent the red channel, more severely than the green channel. In general, the chrominance channels of the acquired images will be more severely affected by this noise than the luminance channel, as shown in Fig. 3. The resulting effect is the visibility of random noise artifacts in the acquired image that differ in severity from acceptable (at low-ISO settings, < ISO 400) to completely contaminating the picture (at very high ISO settings, > ISO 2000) such that it becomes visually unacceptable. More details on separating a color image into luminance (brightness) and chrominance (color) channels will be presented in Sec. 4.

Fig. 2 (a) Long-exposure stuck-pixel noise (courtesy http://www.dpreview.com/), (b) red channel, (c) green channel, and (d) blue channel showing excessive noise.

Fig. 3 An image acquired in low light at an ISO-400 setting with a Minolta Dimage digital camera, then separated into its component channels using the L*a*b* color space to show the severity of noise in the chrominance a and b channels as compared to the luminance channel, due to the limitations of the CCD image sensor.

Statistical characteristics of images are of fundamental importance in many areas of image processing.
Incorporation of a priori statistical knowledge of spatial correlation in an image can, in essence, lead to considerable improvement in many image processing algorithms. For noise filtering, the well-known Wiener filter for minimum mean-squared error (MMSE) estimation is derived from a measure or an estimate of the power spectrum of the image, as well as the transfer function of the spatial degradation phenomenon and the noise power spectrum.4 Unfortunately, the Wiener filter is designed under the assumption of wide-sense stationary signal and noise. Although the stationarity assumption for additive, zero-mean, white Gaussian noise is valid in most cases, it is not reasonable for most realistic images, apart from the uninteresting case of uniformly gray image fields. What this means in the case of the Wiener filter is that we will experience uniform filtering throughout the image, with no allowance for changes between edges and flat regions, resulting in unacceptable blurring of high-frequency detail across edges and inadequate filtering of noise in relatively flat areas.

Noise reduction filters have been designed in the past with this stationarity assumption. These have the effect of removing noise at the expense of signal structure. Examples such as the fixed-window Wiener filter and the fixed-window mean and median filters have been the standard in noise smoothing for the past two decades.4–6 These filters typically smooth out the noise but destroy the high-frequency structure of the image in the process. This is mainly due to the fact that these filters treat the fixed-window region as having sample points that are stationary (belonging to the same statistical ensemble). For natural scenes, any given part of the image generally differs sufficiently from the other parts that the stationarity assumption over the entire image, or even inside a fixed-window region, is not generally valid. Newer adaptive Wiener filtering techniques that take into account the nonstationary nature of most realistic images have been used as an alternative to preserve signal structure as much as possible.7 Many, however, do this at the expense of proper noise reduction: the high-frequency areas will be insufficiently filtered, which results in a large amount of high-amplitude noise remaining around edges in the image. Another shortcoming is the failure of these filters to remove stuck-pixel (impulse-type) noise that appears in the acquired images due to long CCD exposure times in dim light. A fixed-window median filter will remove this type of impulse noise but will also alter important signal structure, due to the same assumption that the image samples in the fixed window can be modeled by a stationary random field, which is not valid for a fixed window that cannot inherently differentiate between edge and flat image regions.8

Another shortcoming of many noise filtering techniques that deal with color digital images is the application of the same filter evenly to the three color channels (R,G,B)9,10 under the assumption that the sensor noise is equally distributed among the three color channels, which is an erroneous assumption, as explained at the start of this section. To the author's knowledge, the issue of high-ISO noise reduction for digital cameras has not received much attention in the literature. Although current work in the literature on adaptive noise reduction filters (such as Smolka et al.,11 Eng and Ma12) may be used to reduce high-ISO digital camera noise, these filters have been developed without the specific needs of professional digital cameras in mind and as such do not take into consideration the different types of noise degradation generated by these digital cameras, as mentioned previously.
The result is either insufficient noise reduction in the chrominance channels or too much smoothing in the luminance channel of the filtered image. Also, many filters only deal with gray-level images6,8,13–17 and, as such, are not very effective for the type of images acquired by the digital cameras discussed in this work. We will compare one of the commonly used adaptive spatial noise reduction filters, namely the adaptive local-statistic MMSE filter, with our work to show the effectiveness of our technique in producing visually superior filtered images that can be used directly for further analysis, such as edge detection and image understanding.

Digital camera manufacturers are only now beginning to realize the importance of incorporating noise reduction filters in the hardware image acquisition pipeline of their digital cameras. To the author's knowledge, the only digital camera known to actually incorporate a noise reduction algorithm is the Kodak Professional DCS720x.18 Kodak rates the camera as "calibrated" up to ISO 4000 and capable of ISO 6400. This means that shooting in extremely low-light/high-shutter-speed conditions is possible with this camera, but at the expense of increased noise in the acquired images. With noise reduction activated, the images are acquired with much less noise and are more visually pleasing. Images from this camera are used as a comparison with the techniques described in this paper.

In the next sections, a new adaptive technique is described that is highly tuned to produce visually pleasing filtered color digital images acquired using digital sensor-based cameras. This new technique differs from previous filtering methods in that it is geared towards the type of color images obtained from digital cameras, and thus takes into account the physical limitations of the CCD and the specific types of CCD sensor noise produced.
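As a concrete reference point, the adaptive local-statistic MMSE filter used as a comparison baseline can be sketched as follows. This is a minimal pure-Python version of the classic local-statistics estimator, x̂ = μ + [σx²/(σx² + σn²)](y − μ), computed over a small sliding window; the border handling (clamped windows) and a known noise variance σn² are simplifying assumptions.

```python
# Sketch of an adaptive local-statistics MMSE filter. For each pixel,
# the local mean and variance are measured inside a small window, the
# signal variance is estimated as max(local_var - sn2, 0), and the
# output is  mu + (sx2 / (sx2 + sn2)) * (y - mu).
def mmse_filter(img, sn2, radius=1):
    """img: 2-D list of floats; sn2: assumed-known noise variance."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            win = [img[y][x]
                   for y in range(max(0, i - radius), min(h, i + radius + 1))
                   for x in range(max(0, j - radius), min(w, j + radius + 1))]
            mu = sum(win) / len(win)
            var = sum((v - mu) ** 2 for v in win) / len(win)
            sx2 = max(var - sn2, 0.0)            # estimated signal variance
            gain = sx2 / (sx2 + sn2) if (sx2 + sn2) > 0 else 0.0
            out[i][j] = mu + gain * (img[i][j] - mu)
    return out
```

In flat regions the local variance approaches σn², the gain tends to zero, and the output tends to the local mean; around edges the local variance dominates, the gain tends to one, and the noisy pixel passes through largely unfiltered, which is precisely the residual edge noise criticized above.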
4 Color Spaces

Before going into the details of the new color filtering methods, it is important to give a brief background of the most popular color spaces used to separate color images into their component color channels before filtering each channel.

A color space is a model for representing color in terms of intensity values. It defines a one-, two-, three-, or four-dimensional space whose dimensions, or components, represent intensity values. A color component is also referred to as a color channel. Color spaces can be divided into two general categories: device-dependent and device-independent color spaces.

• Device-dependent color spaces: These include the family of RGB spaces. The RGB space is a three-dimensional color space whose components are the red, green, and blue intensities that make up a given color. Most CCD- and CMOS-based digital camera imaging sensors use the RGB color space, reading the amounts of red, green, and blue light reflected from the scene that fall on the CCD elements and then converting those amounts into digital values. These values are device dependent, and one CCD may produce a different RGB value from another depending on how it is manufactured. HSV space and HLS space are transformations of RGB space that can describe colors in terms more natural to an artist. The name HSV stands for hue, saturation, and value; HLS stands for hue, lightness, and saturation. The CMY color space is sometimes used in CCD and CMOS image sensors. The name CMY refers to cyan, magenta, and yellow, the three primary colors in this color space; red, green, and blue are the three secondaries.

• Device-independent color spaces: Some color spaces can express color in a device-independent way.
Whereas RGB colors vary with CCD and CMOS sensor hardware characteristics, device-independent colors are meant to be true representations of colors as perceived by the human eye. These color representations, called device-independent color spaces, result from work carried out in 1931 by the Commission Internationale de l'Éclairage (CIE) and for that reason are also called CIE-based color spaces. The CIE created a set of color spaces that specify color in terms of human perception. It then developed algorithms to derive three imaginary primary constituents of color, namely X, Y, and Z, that can be combined at different levels to produce all the colors the human eye can perceive. The resulting color model, and other CIE color models, form the basis for all color management systems. Although the RGB and CMY values differ from device to device, human perception of color remains consistent across devices. Colors can be specified in the CIE-based color spaces in a way that is independent of the characteristics of any particular imaging device. The goal of this standard is for a given CIE-based color specification to produce consistent results on different devices, up to the limitations of each device.19

One problem with representing colors using the XYZ color space is that it is perceptually nonlinear: it is not possible to accurately evaluate the perceptual closeness of colors based on their relative positions in XYZ space. Colors that are close together in XYZ space may seem very different to observers, and colors that seem very similar to observers may be widely separated in XYZ space.
L*a*b* space is a nonlinear transformation of XYZ space that creates a perceptually linear color space, designed to match perceived color difference with quantitative distance in color space.20,21

As stated earlier, the CCD sensor of a typical digital camera is less sensitive to the blue and red channels, and this causes amplified noise artifacts in the chromatic channels in low light or at high-ISO settings. Moreover, there seems to be general agreement that spatial resolution is markedly lower in the chromatic channels than in the achromatic one (see Fig. 3); hence, high-frequency information, i.e., edges, comes mainly from the achromatic channel.22 Another important consideration is that, in order to avoid chromatic artifacts in the filtered image, a nonlinear operator cannot be applied to each RGB component separately.22 These two considerations and experimental results suggest that a color model that separates luminance from chrominance is suitable. We thus choose to separate our acquired images using the L*a*b* color space because of its merits stated earlier. The L*a*b* color space separates the RGB image into a luminance channel L and two chrominance channels (a, b). This allows us to use different filter parameters specifically tuned for each channel. In general, the luminance channel suffers fewer noise artifacts than the (a, b) chrominance channels. We therefore take this into consideration when filtering each channel, allowing more smoothing in the (a, b) channels to correct for color artifacts, while passing more high frequency in the filtered luminance channel. This will be further emphasized when presenting the experimental results in a later section.
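The luminance/chrominance separation described above can be sketched per pixel, assuming the common sRGB-to-XYZ (D65) conversion path; the constants are the standard sRGB and D65 reference-white values, and a real pipeline would vectorize this over whole channels.

```python
# Per-pixel sketch of the L*a*b* separation: sRGB -> linear RGB ->
# CIE XYZ (D65) -> L*a*b*. L is the luminance channel; a and b are
# the green-red and blue-yellow chrominance channels.
def srgb_to_lab(r8, g8, b8):
    def lin(c):                       # undo the sRGB gamma curve
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = lin(r8), lin(g8), lin(b8)
    # linear RGB -> XYZ (sRGB primaries, D65 white point)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    xn, yn, zn = 0.95047, 1.0, 1.08883    # D65 reference white
    def f(t):                             # CIE nonlinearity
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29
    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    L = 116 * fy - 16                 # luminance
    a = 500 * (fx - fy)               # green-red chrominance
    b_chroma = 200 * (fy - fz)        # blue-yellow chrominance
    return L, a, b_chroma
```

Once the channels are separated this way, stronger smoothing can be applied to a and b than to L, matching the per-channel tuning described above.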
5 Adaptive-Window Signal Equalization Hybrid Filter

The quantity of light falling on an image sensor array (e.g., a CCD array) is a real-valued function q(x, y) of two real variables x and y. An image is typically a degraded measurement of this function, where degradations may be divided into two categories: those that act on the domain (x, y) and those that act on the range q. Sampling, aliasing, and blurring act on the domain, while noise (including quantization noise) and the nonlinear response function of the camera act on the range.23 We are concerned with the latter type of camera sensor degradation.

Digital camera sensor noise reduction is the process of removing unwanted noise from a digital image. It falls into two main categories: reduction or removal of noise from high-ISO images, including JPEG compression artifacts, and reduction or removal of noise from long-exposure images (with "stuck pixels"). In this section, a detailed description of the adaptive hybrid filter for sensor noise removal is presented, with experimental results showing its performance in comparison with other standard noise reduction filters.

5.1 Hybrid Mean and Adaptive Center-Weighted Median Filter

The median filter is a class of order-statistic filters, where filter statistics are derived from ordering (ranking) the elements of a set rather than computing means, etc. The median filter is a nonlinear neighborhood operation, similar to convolution, except that the calculation is not a weighted sum. Instead, the pixels in the neighborhood are ranked in the order of their gray levels, and the midvalue of the group is stored in the output pixel. In probability theory, the median M of a random variable x is the value for which the probability of the outcome x < M is 0.5.6 Median filtering is normally a slower process than convolution, due to the requirement of sorting all the pixels in each neighborhood by value.
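The ranking operation just described can be sketched directly as a naive implementation that sorts each neighborhood; border pixels using a clamped (shrunken) window is an implementation choice, not part of the definition.

```python
# Minimal sketch of the fixed-window median filter: pixels in each
# neighborhood are ranked by value and the midvalue becomes the output.
def median_filter(img, radius=1):
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            win = sorted(img[y][x]
                         for y in range(max(0, i - radius), min(h, i + radius + 1))
                         for x in range(max(0, j - radius), min(w, j + radius + 1)))
            out[i][j] = win[len(win) // 2]   # rank ordering, take the midvalue
    return out

# A single stuck ("impulse") pixel in a flat region is removed outright:
flat = [[10] * 5 for _ in range(5)]
flat[2][2] = 255
assert median_filter(flat)[2][2] == 10
```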
There are, however, algorithms that speed up the process.24,25 The median filter is popular because of its demonstrated ability to reduce random impulsive noise without blurring edges as much as a comparable linear low-pass filter. However, it often fails to perform as well as linear filters in providing sufficient smoothing of nonimpulsive noise components such as additive Gaussian noise. In order to achieve removal of noise with various distributions, as well as detail preservation, it is often necessary to combine linear and nonlinear operations.26–29 In this section we introduce a hybrid filter combining the best of both worlds: proper smoothing in flat regions and detail preservation in busy regions of the image.

One of the main disadvantages of the basic median filter is that it is location-invariant in nature, and thus also tends to alter the pixels not disturbed by noise. The center-weighted median filter (CWMF) was developed to address this limitation of the basic median filter.30,31 This filter gives the pixel at the center of the window more weight (> 1) than the other pixels in the window before determining the median. This has the effect of preferentially preserving that pixel's value, so both fine detail and noise are better preserved. In the extreme, one could make the center pixel's weight equal to the entire weight of the rest of the window, in which case the value of the center pixel is assured of being the output of the median operation. This is the identity filter, where the output is equal to the input. In general, a CWMF can be varied over the range from the median filter to the identity filter by varying the central weight. This corresponds to the range from strong noise and detail removal (basic median filtering) to none (identity filtering). In the original CWMF implementation, the central weight is constant over the entire image.
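A minimal sketch of the CWMF replicates the center pixel `weight` times before ranking; the clamped border windows are an assumption, and weight = 1 recovers the basic median filter.

```python
# Sketch of the center-weighted median filter (CWMF): the center pixel
# is counted `weight` times before ranking, biasing the median toward
# the original value and thus preserving fine detail (and some noise).
def cwmf(img, weight, radius=1):
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            win = [img[y][x]
                   for y in range(max(0, i - radius), min(h, i + radius + 1))
                   for x in range(max(0, j - radius), min(w, j + radius + 1))]
            win += [img[i][j]] * (weight - 1)   # replicate the center pixel
            win.sort()
            out[i][j] = win[len(win) // 2]
    return out
```

With an impulse-corrupted flat patch, weight = 1 removes the stuck pixel as a basic median would, while a weight equal to the window size returns the pixel unchanged, spanning the median-to-identity range described above.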
In this paper, we make use of the CWMF concept and implement it as an adaptive CWMF (ACWMF) by varying the central weight based on signal and noise estimates inside an adaptive window framework, which will be described in detail in the next section.

In formulating our center-weighted median-based filter, we use an image model with additive noise as follows:

y(k,l) = x(k,l) + n(k,l),   (1)

where k ∈ [0, M−1] and l ∈ [0, N−1] for an M × N sized image. n(k,l) is a zero-mean additive white Gaussian noise random variable of variance σn², uncorrelated with the ideal image x(k,l), which is assumed to be of zero mean and variance σx², and y(k,l) is the noise-corrupted input image.

For the purpose of the following analysis, we assume that both x(k,l) and n(k,l) are ergodic random variables. The implication of this assumption is that although we do not have a priori knowledge of the signal and noise statistical variance and mean, we can still capture samples of x(k,l) and n(k,l) and determine their variance and mean, which are, in turn, representative of their respective ensembles. It should also be noted that although the noise variance σn² is not known a priori, it is easily estimated from a window in a flat area of the degraded image y(k,l).32

We begin by setting an objective criterion of optimality for deriving the central weight at each pixel location. We use a criterion similar to that used in deriving the power spectrum equalization filter,33 by seeking a linear estimate x̂(k,l) such that the signal variance of the estimate is equal to the variance of the ideal image x(k,l). Assuming this estimate is of the form

x̂(k,l) = a · y(k,l),   (2)

we can express our criterion as

σx² = E{x̂²} = E{(a · y)²},   (3)

where E{·} is the expectation operator.
In general, the acquired images have a nonzero mean, and we can account for this by subtracting the mean of each image from both random variables of Eq. (2). For zero-mean noise, the a posteriori sample mean (the local mean inside the adaptive window) of the degraded pixel y(k,l), denoted by μy, is equal to the a priori sample mean of the ideal pixel x(k,l). After dropping the (k,l) notation for readability, we have

x̂ − μy = a · (y − μy),   (4)

and we can write our criterion, after accounting for the mean, as follows:

σx² = E{(x̂ − μy)²} = E{[a · (y − μy)]²} = a² · σy² = a² · (σx² + σn²).   (5)

Therefore, the signal equalization estimator a becomes

a = √[σx² / (σx² + σn²)].   (6)

If the number of pixels in the adaptive window of size Lx × Ly is (Lx · Ly), then the central weight for the pixel under analysis at position (k,l) is given as

Cw = a · (Lx · Ly − 1) + 1.   (7)

This central weight can be used to give the value of the pixel at (k,l) more weight than the other pixels in the adaptive window before determining the median; i.e., we count it as if it were Cw pixels rather than just one pixel. Thus, a = 0 is the basic median filter, while a = 1 is the identity filter, and 0