J Braz Comput Soc (2013) 19:341–359
DOI 10.1007/s13173-013-0102-1
ORIGINAL PAPER

A survey on automatic techniques for enhancement and analysis of digital photography

Claudio S. V. C. Cavalcanti · Herman Martins Gomes · José Eustáquio Rangel De Queiroz

Received: 14 September 2012 / Accepted: 7 February 2013 / Published online: 26 March 2013
© The Brazilian Computer Society 2013

Abstract The fast growth of the consumer digital photography industry during the past decade has led to the acquisition and storage of large personal and public digital collections containing photos with different quality levels and redundancy, among other aspects. This naturally increased the difficulty of selecting or modifying those photos. Within this context, this survey systematically reviews the state of the art in techniques for the enhancement and analysis of digital photos. It is not within the scope of this survey to review image quality metrics for evaluating degradation due to compression, digital sensor noise, and related issues. Assuming the photos have good quality in those aspects, this review is centered on techniques that might be useful to automate the task of selecting photos from large collections or to enhance the visual aspect of imperfect photos by using some perceptual measure.

Keywords Image enhancement · Photographic analysis · Computational aesthetics · Survey

C. S. V. C. Cavalcanti · H. M. Gomes · J. E. R. De Queiroz
Universidade Federal de Campina Grande, Rua Aprigio Veloso, 882 Bodocongo, Campina Grande, PB 58429-140, Brazil
e-mail: claudio.cavalcanti@gmail.com

1 Introduction

In the late 1990s, there was an immense growth in the digital photography industry. Manufacturers began to produce digital cameras on a large scale and at decreasing prices [119]. Great changes have been noticed in photographic technology and practice since then. When using consumer analog film, the number of photos was limited by the roll size (which usually allowed at most 36 photos). Nowadays, with large-capacity rewritable memory cards (e.g., 256 GB), the number of photos that can be acquired and stored has increased by approximately three orders of magnitude (considering digital images captured at a resolution of 8 MP). Digital photography also changed the way photos are printed. With film, photos had to be developed before they could be seen, whereas with a digital camera printing is no longer a requirement, since it is possible to preview images in the camera viewer or on a monitor screen and then decide which ones to print.

One consequence of those changes is that taking photos has become an almost costless task. Thus, judging what could be a good shot and taking care to adjust camera settings for a specific scene have become less usual for most consumers and even for some professional photographers. As a result, large amounts of photos are taken and stored daily, which makes it difficult to select which ones to print or to publish, e.g., in digital albums. In summary, this results in a scenario involving a large number of stored photos of which just a small part will be printed. In this survey, consumer photos are considered to be those obtained (1) with minor adjustments in camera settings, (2) aiming at portraying daily events, and (3) barely exploring basic art techniques. Professional photos, on the other hand, differ from consumer ones by the use of more elaborate techniques and better equipment utilization, which might improve photo quality. In this survey, professional photos are not necessarily obtained by a professional photographer, and the term does not encompass other connotations of professional photos (e.g., artistic or journalistic).
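The "three orders of magnitude" estimate in the Introduction can be checked with a back-of-the-envelope calculation. The sketch below assumes, for illustration only, that an 8 MP JPEG occupies about 3 MB and that the card capacity is given in decimal gigabytes; neither figure comes from the survey.

```python
import math

# Illustrative assumptions (not figures from the survey):
CARD_BYTES = 256 * 10**9   # 256 GB card, decimal gigabytes as marketed
PHOTO_BYTES = 3 * 10**6    # assumed ~3 MB per 8 MP JPEG
FILM_ROLL_PHOTOS = 36      # upper limit of a typical consumer film roll


def capacity_gain(card_bytes: int, photo_bytes: int, roll_photos: int) -> float:
    """Return how many times more photos fit on the card than on one film roll."""
    photos_per_card = card_bytes // photo_bytes
    return photos_per_card / roll_photos


gain = capacity_gain(CARD_BYTES, PHOTO_BYTES, FILM_ROLL_PHOTOS)
# prints: 85333 photos per card, 2370x a film roll, 3.37 orders of magnitude
print(f"{CARD_BYTES // PHOTO_BYTES} photos per card, {gain:.0f}x a film roll, "
      f"{math.log10(gain):.2f} orders of magnitude")
```

Halving or doubling the assumed file size shifts the result by only about 0.3 orders of magnitude, so the survey's approximate claim is robust to the exact JPEG size.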
There are several recent applications for which photo processing is an essential intermediate task, e.g., photo collage [149], slide shows [44], browsing [68,85,112,150], storytelling [53], and photo summarization [28,110,117,141].

The algorithms reviewed in this survey are organized into two categories: enhancement and analysis. Each category is further divided into sub-categories, as illustrated in Fig. 1. While enhancement algorithms are intended to modify the image so that it becomes better-looking or more appealing, analysis algorithms are designed to assess photos according to some criterion, such as composition, aesthetics, or overall quality.

Fig. 1 Block diagram illustrating the sub-areas into which this work is subdivided

A number of papers have been written on both image enhancement and analysis in the past years. This survey focuses on image processing techniques that have already been tested on, or may be directly used in, specific problems of photo enhancement and analysis. To avoid ambiguity, in this survey the words photo and photography refer strictly to consumer and professional photography.

Image enhancement algorithms can be classified as on the fly and off-line. While on the fly algorithms modify the photo conditions before the photo is taken, off-line algorithms perform changes after acquisition has taken place. Although on the fly algorithms might lead to better results than off-line algorithms, they must run faster, since they may need to operate in real time. Off-line algorithms are limited in the sense that they do not allow scene changes, e.g., it is not possible to zoom out from a photo or to ask someone to open his/her eyes. However, there is no a priori time constraint for producing the enhancement result.
Image analysis algorithms can be classified as assessment, information extraction, and grouping algorithms. Assessment algorithms analyze the visual aspect of a photo in two main facets: aesthetics and image quality assessment (IQA). Formally, the main goal of IQA algorithms is to predict quality ratings in a human-like manner [21]. Although this definition is very broad, the term IQA is typically used to denote the evaluation of image degradation (e.g., due to lossy compression or noise) [21,62,63,118,167,178]. Therefore, in this survey, IQA is used with this latter meaning. There is also some ambiguity regarding the expression aesthetics quality assessment. In this survey, aesthetics quality assessment algorithms are defined as those whose goal is to assign a score (or a class of scores, such as professional and amateur) to a photo based on the analyzed features, e.g., photographic composition rules or the number of faces found, as used by other authors [7,74,126,145]. Information extraction algorithms, in turn, search for elements of interest, such as the place where a photo was taken, the existence of faces, and the presence of specific people, among others. Finally, grouping algorithms are defined in this survey as those which analyze images in order to find similarities between them.

It is not within the scope of this survey to review image quality assessment for evaluating degradation due to compression, digital sensor noise, and related issues. Assuming the photos have good quality in those aspects, this review focuses on techniques that might be useful to automate the task of photo selection from large collections or to enhance the visual aspect of imperfect photos by using a perceptual measure.

This survey is organized as follows: in Sect. 2, the methodology employed for finding related work is presented. In Sect. 3, work on image enhancement is reviewed, in particular enhancement that could be performed to increase the quality of a photo in a printing or selection scenario. In Sect. 4, work on image (and photo) analysis is reviewed. In Sect. 5, the main issues found in the reviewed approaches are discussed and summarized. Finally, in Sect. 6, some conclusions are given.

2 Methodology of the research

This section presents the methodology adopted for searching the related work in the area. More specifically, information on the search engines, digital libraries, and keywords used in the search process is provided.

Two search strategies were employed, inspired by the traversal order of breadth-first and depth-first search, respectively. Breadth-first search was performed within a set of predefined conference proceedings and journals in a specified time period, but only the first level of the search tree was considered for reviewing. Depth-first search was performed by using a search engine to find papers given a set of keywords. A subsequent search was then performed by using the references of those papers as a starting point. This process was repeated until a maximum depth of 3 was reached. In the following two subsections, more details on each type of search are given.

2.1 Breadth-first search

This strategy aimed at finding related papers in a set of recent technical publications, such as journals and conference proceedings, within a specified time frame. The search was performed in the databases of conferences and journals of IEEE, ACM, Springer, and Elsevier. In addition, based on the search results, all papers published in a given conference or journal were also searched by using its table of contents.

The keywords used for the automatic search were (consumer OR personal OR digital) AND (image(s) OR photo(s) OR photograph(s) OR photographic archive) AND (value OR quality OR aesthetics OR visual quality) AND (evaluation OR assessment OR analysis OR estimation). A publication period between 2006 and 2012 was defined.

Table 1 Literature search results. The first column corresponds to the publication name, the second column indicates whether the publication is a conference (C) or a journal (J), the third column shows the publisher name, and the fourth column shows the number of papers related to this survey

Publication                                     C/J   Publisher         No. of papers
CVPR                                            C     IEEE              18
ICME                                            C     IEEE              15
CVIU                                            J     Elsevier          12
ICIP                                            C     IEEE              9
IJCV                                            J     Springer          8
Pattern Recognition                             J     Elsevier          8
MM                                              C     ACM               6
TIP                                             J     IEEE              6
IET-CV                                          J     IET               4
CIVR                                            C     ACM               3
Expert Systems with Applications                J     Elsevier          3
ECCV                                            C     Springer          2
Eurographics                                    C     Wiley-Blackwell   2
ICASSP                                          C     IEEE              2
IJPRAI                                          J     World Scientific  2
Transactions on Consumer Electronics            J     IEEE              2
Transactions on Graphics                        J     ACM               2
Transactions on Multimedia                      J     IEEE              2
Visual Communication and Image Representation   J     Elsevier          2
Other                                           J/C   -                 42
Total                                                                   150

2.2 Depth-first search

In this strategy, the relevant papers were found by using the following methodology: (1) based on a set of keywords, for every result returned by a given search engine, (2) the bibliography was analyzed, and (3) relevant cited work was reviewed, including the root paper itself. This is a practical and useful method for reviewing the literature, since the search is seeded with papers already considered relevant by other researchers. Its great advantage is that it dramatically reduces the time needed to find relevant papers. On the other hand, there are some drawbacks. First, some stop criteria have to be defined, otherwise this becomes an almost endless process.
Second, not every citation is directly related to the research area, since it is common to find papers from correlated areas such as Artificial Intelligence and Neurobiology. In order to perform this search, some constraints have to be defined: the search is performed in a single level, and another level is considered if and only if a cited paper is strictly related to the area.

2.3 Search results

Table 1 lists the conferences and journals returned by the search methods described above, together with the publication type (conference or journal), the publisher name, and the number of papers selected for this survey. In Table 1, Other refers to conferences or journals with only one related publication. Figure 2 illustrates the balance between conference and journal papers reviewed in this survey.

Fig. 2 Number of works published in conferences (91, 61 %) and journals (59, 39 %) that are studied in this survey

2.4 Considerations on the methodology

The methodology previously presented was defined in order to cover relevant papers in the research taxonomy defined in Sect. 1. Of course, work published prior to 2006, published in low-impact conferences or journals, or indexed with inadequate keywords might not have been included in this survey. Nonetheless, the number of relevant papers included in this survey (150) indicates that a good sample of the relevant work was considered.

3 Enhancement

This section focuses on research on enhancement techniques applied to digital photography. Usually, the areas of enhancement and analysis work side by side, e.g., enhancement is often performed in order to obtain more precise analysis, and a good analysis may help identify which aspects of a digital photo should be enhanced. In spite of that, and for didactic purposes, these areas are discussed separately in this survey. As mentioned in Sect. 1, enhancement work may be divided into on the fly and off-line.
On the fly approaches are those in which it is possible to modify the environment during image acquisition, while in off-line approaches this is not possible; thus, photos are usually modified or enhanced after acquisition. Nonetheless, both expressions, off-line and on the fly, are also used with other connotations. Chartier and Renaud employed off-line in a noise filtering context [22], while Ercegovac and Lang used the same term in a digital arithmetic context [46]. In the following two sections, more details on photo enhancement approaches are given.

3.1 On the fly enhancement

Although it is generally possible to improve photos by means of a wide range of enhancement algorithms (e.g., red-eye correction and histogram processing, among others), there are some particular scenarios in which some information is completely lost during acquisition, making post-processing useless. Photograph acquisition is naturally a lossy process, which disregards factors such as color, temperature, the environment, time and space, and the depth of the scene, among several others. For instance, a photo may be considered inadequate due to the zoom choice, e.g., a close-up should not be used when the goal is to show that the subject is in a given location. After the photo is shot, zooming out is not possible, and a good photo might be lost. In some specific situations, some correction is possible. However, the results are usually far inferior to those obtained when a new photo can be taken. For example, image brightness can be adjusted after acquisition in order to improve image aesthetics, but this may result in intensity clipping.

On the fly (or dynamic, live, real-time) enhancement algorithms are proposed to automatically perform or advise adjustments to the camera settings before the photo is taken.
The performed adjustments are intended to improve the photo quality or to avoid undesired conditions, such as inadequate focus and lighting.

Most modern digital cameras have embedded on the fly enhancement mechanisms, such as an exposure meter, automatic focus adjustment, and white-balance adjustment. Since those mechanisms are mostly based on low-level information, high-level information about scene contents is usually provided by the user through an appropriate scene mode selection. For example, one may set the camera scene mode to motion when shooting a sports scene. A fully automated system may require high-level information, such as the location of people in the scene, to increase the overall understanding of the scene, as well as to help algorithms decide where and how to perform changes.

A first example of a fully automatic approach is face-priority auto-focus [125], which has been commercially used by several camera manufacturers. The goal is to set the focus of the camera on regions where there are faces, in order to avoid incorrect focus priority. For example, the Nikon D90 camera uses face position to correctly adjust focus on people present in the scene [107]. Another example, which recently became very popular, is the smile shutter function, which takes the photo only when every detected face in the image is smiling. Several cameras and prototypes incorporating the above features have been developed by camera manufacturers, such as Sony and Canon [47].

Photographic composition rules are also considered important features for the dynamic adjustment of an image. Conformity with photographic composition rules can be achieved with slight movements of the camera. Building an autonomous robot photographer is a possible direction to address this aspect. The robot developed by Byers et al. [13,14] was designed to be placed at an event, move towards possible subjects, compose the shot, and, finally, take the photo. The subject of a photo may be identified by several approaches, such as combining the output of a skin detection algorithm with a laser range-finder sensor [13,14]. This information may also be used for finding the path that the robot must follow to reach the subjects. After reaching the desired place, the scene is analyzed once again to achieve a good composition for the photo. Four composition rules (rule of thirds, empty space, no middle, and edge rule) were used to guide the system towards a good composition. The system performance was assessed at some real-world events, such as a wedding reception and the SIGGRAPH 2003 conference.

Another example of a robot photographer (though less independent than the one previously presented) is the Sony Party Shot [143]. The Sony Party Shot apparatus can be attached to the camera to locate and photograph people by moving in three degrees of freedom (pan, tilt, and zoom). The limitation of this approach is the need to place the robot in a fixed position, from which it locates people and takes photos.

Some approaches may consider acquiring multiple images with different camera settings in order to detect and/or correct issues after image acquisition. For example, two subsequent photos may be obtained with different camera apertures [4]. The acquired images are then combined for finding the subject and analyzing photo composition. Besides improving the photographic composition of the image, it is also possible to locate mergers, which occur due to the projection of a 3D world scene onto a 2D representation, in which background objects appear to be connected to objects in the foreground [4].

Despite the obvious advantages of on the fly approaches to digital photography, there are a number of limitations.
Since most algorithms demand high processing times and some level of scene understanding, on the fly processing might be impractical due to battery consumption. A high processing time can be understood as a period of time that exceeds the time between two scene setups (e.g., changes in people's positions and lighting conditions). In order to improve scene understanding, stereo vision might be employed to find interesting objects, to provide depth estimates, and to improve image segmentation, which may help with the acquisition of better photos. In a dynamic environment, stereo vision may be obtained by the use of Pan–Tilt–Zoom (PTZ) cameras through simple-region SSD (sum of squared differences) matching [158]. Finally, on the fly composition might be used for automatic and semi-automatic panorama creation [23].

3.2 Off-line enhancement

This section discusses existing methods for the enhancement of photos that have already been acquired. The goal is to enhance a photo using only the information existing in the image file (i.e., pixels, Exchangeable Image File format (EXIF) information, detected faces, etc.). Changes typically occur at the pixel level, and are required when it is not possible to obtain another photo and the resulting photo has room for improvement. For off-line enhancement, the image might be represented in any color space; however, Benoit et al. [5] have shown some advantages of using models based on the human visual system for low-level photo enhancement.

Generally, enhancement changes can be used to make a photo that presents some type of imperfection more attractive. For instance, after removing the imperfection that lens dust may cause, a photo may look more attractive [188].
It is also possible to improve photos by smoothing the subject's skin [77], by adjusting some general aspect, such as contrast and brightness, for both generic [183] and specific types of photos (e.g., improving contours in nature photos [128]), or by removing an undesired object (e.g., an unknown person or a light pole [6,170]). This last type of enhancement may be achieved by image inpainting, as discussed next.

Image inpainting has been widely used for enhancing photos [170]. Through inpainting, one might remove an element which harms the composition of a photo [170]. Inpainting algorithms remove elements from photos by statistically analyzing the area surrounding the indicated region and filling this region with a texture similar to its surroundings. Inpainting may be obtained by combining texture synthesis, geometric partial differential equations (PDEs), and coherence among neighboring pixels [6]. Patch sparsity has also been used to improve image inpainting [170].

Enhancement by example has also been employed [72]. When a user classifies a photo, he/she indirectly classifies the features he/she considers important. Processes such as sharpening, super-resolution, inpainting, white balance, and deblurring are applied to photos so that they reflect the features present in example images. Other types of enhancement are photo compositing, in which a face can be replaced by another [86], and collage algorithms, which group selected images into a new one [149].

Cropping algorithms are designed to obtain an image with smaller dimensions than the original one. There are several methods for automatic [24,138,147,186,3,1,179] and semi-automatic [133] photo cropping. Cropping is performed to extract an area of interest from an original photo [147,82,179], to improve the quality of the photographic composition [186,133,18], to retarget a photo to smaller displays [24,138,3,1,179], and to recompose the photo [87].
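To make the PDE-based flavor of inpainting more concrete, the sketch below fills a masked region by repeatedly replacing each masked pixel with the average of its four neighbors, which approximates a solution of the Laplace equation with the known pixels as fixed boundary values. This is a deliberately simple diffusion scheme chosen for illustration; it is not the method of [6] or [170], it does not synthesize texture, and the function name and plain-list image representation are assumptions.

```python
from typing import List

Image = List[List[float]]  # grayscale image, row-major
Mask = List[List[bool]]    # True marks pixels to be filled (the undesired object)


def diffusion_inpaint(image: Image, mask: Mask, iterations: int = 500) -> Image:
    """Fill masked pixels by iteratively averaging their 4-connected neighbors.

    Known pixels are never modified, so they act as boundary conditions; the
    masked region converges towards a smooth (harmonic) interpolation of them.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]  # work on a copy, leave the input untouched
    # Start the hole at the mean of the known pixels so iteration begins sensibly.
    known = [out[y][x] for y in range(h) for x in range(w) if not mask[y][x]]
    start = sum(known) / len(known)
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                out[y][x] = start
    for _ in range(iterations):
        for y in range(h):
            for x in range(w):
                if not mask[y][x]:
                    continue
                neighbors = [out[ny][nx]
                             for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                             if 0 <= ny < h and 0 <= nx < w]
                out[y][x] = sum(neighbors) / len(neighbors)
    return out
```

Diffusion works well for small, smooth holes (e.g., a dust spot) but blurs large or textured regions, which is why the reviewed methods combine it with texture synthesis or patch-based filling.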
Most cropping methods have in common the use of content-aware strategies. Content detection may range from face detection [147,24,186,138,18] and saliency detection [147,24,186,3,1,87] to user interaction with tracking of the user's gaze [133].

Region of interest (ROI) cropping methods aim to extract a fraction of the original image which contains some element of interest. The dimensions of the resulting image depend on the original image contents. Some restrictions may apply, such as maintaining the original image proportions or leaving some margin between the element of interest and the photo edges [147,82]. Common applications for such methods are thumbnail cropping and image summarization. ROI cropping may be improved with face detection for images containing people. However, other elements of interest (e.g., animals) may be detected by using specific detectors [165] or a generic detector such as saliency maps. Saliency and spatial priors have also been used for content-aware thumbnail cropping [82].

Cropping for improving composition may help the photographer achieve better results by modifying the image dimensions or proportions. As further discussed in Sect. 4.1, there are rules for analyzing the compositional quality of a photo, such as the rule of thirds, which may be used in conjunction with small changes in the image dimensions, aiming at better composition by just removing a few pixel rows (or columns) of the photo. On the other hand, there are methods that make direct changes to the image contents. They will be referred to in this paper as recomposition methods, as discussed next.

Cropping algorithms are limited in the sense that they crop images from the borders. Cropping columns or rows in the middle of the image usually results in distortions. However, in some cases the important content of the image is close to the borders.
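The ROI cropping restrictions mentioned above (keeping the original proportions and leaving a margin around the element of interest) can be sketched as a small geometric routine. The box format, the 10 % default margin, and the centering strategy are illustrative assumptions, not the procedure of [147] or [82]; the ROI itself would come from a face or saliency detector and is assumed to be non-empty.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def roi_crop(img_w: int, img_h: int, roi: Box, margin: float = 0.1) -> Box:
    """Crop around `roi` with a relative margin, keeping the image aspect ratio."""
    l, t, r, b = roi
    # 1. Enlarge the ROI by a relative margin on every side.
    mx, my = round((r - l) * margin), round((b - t) * margin)
    l, t, r, b = l - mx, t - my, r + mx, b + my
    w, h = r - l, b - t
    # 2. Grow the shorter side so that w / h matches the original aspect ratio.
    aspect = img_w / img_h
    if w / h < aspect:
        w = round(h * aspect)
    else:
        h = round(w / aspect)
    # 3. Never exceed the image itself.
    w, h = min(w, img_w), min(h, img_h)
    # 4. Center the crop on the enlarged ROI, then shift it back inside the frame.
    cx, cy = (l + r) / 2, (t + b) / 2
    left = min(max(0, round(cx - w / 2)), img_w - w)
    top = min(max(0, round(cy - h / 2)), img_h - h)
    return (left, top, left + w, top + h)
```

For example, `roi_crop(400, 300, (100, 100, 140, 140))` yields a 64 × 48 crop, i.e., the 4:3 proportions of the original. Step 4 also shows the limitation discussed in the text: when the ROI touches a border, the crop can only be shifted, never placed outside the frame.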
There is, however, a class of algorithms named retargeting algorithms, which may remove image regions other than the borders. Retargeting algorithms are mainly designed for adapting an image to different rendering devices, such as mobile phones [24,138,3,1]. The goal is to preserve the main content of the image while discarding unnecessary or redundant information, in such a way that the main content is more visible than if a simple resampling were applied. Global energy optimization over the whole image may be used for image retargeting [127]. Face detection, text detection, and visual attention, all combined, may also be used for finding content in photos [24].

Another class of algorithms, similar to retargeting algorithms, is the class of recomposition algorithms. Recomposition is a very challenging area of research: the goal is to automatically change the image in order to obtain a more pleasant composition. Examples of such changes include modifying the subject proportions, removing elements from the photo, and cropping the image. Most approaches are, as yet, semi-automatic in the sense that they require human intervention to indicate which areas need improvement. In this survey, recomposition algorithms are considered different from retargeting algorithms, since they do not necessarily imply changing the original image dimensions or proportions. The changes are usually artistic ones. Liu et al. [87] proposed a recomposition method based on finding elements of interest and applying composition rules (such as the rule of thirds, diagonal guidance, and visual balance) to produce a better composed image. In a similar approach, Bhattacharya et al. [7] proposed a semi-automatic recomposition method which uses stress points (adapted from the rule of thirds) for optimal object placement and visual balance for improving composition.
Experiments showed that 73 % of the recomposed images were considered better than their original counterparts by human observers.

4 Analysis

In this section, relevant work on photography analysis is presented. Methods in this area may be organized according to their purposes, as follows:

1. Assessment. The goal is to score photos on a given scale (e.g., from zero to ten, or good/bad) according to some criterion: the image quality (related to some degradation in the image) or the aesthetics of the image could be assessed; and
2. Information extraction. The goal is to detect the presence and location of some pre-defined elements of interest, e.g., people and faces, in a photo. The relationship between photos could also be extracted.

In the following sections, each of these groups of methods is described in more detail.

4.1 Assessment

Assessment algorithms typically assign a score to a photo based on some metric, which allows photos to be ordered by the returned metric values. Assessing (or ranking) a photo is a very difficult and controversial task, especially for consumer photos. Two main aspects can be evaluated: the quality and the aesthetics of the photo. While image quality analysis is understood in this survey as the assessment of image degradation (e.g., sensor noise, resolution, and compression artifacts), aesthetics analysis is related to the visual appearance and appeal of the photo (e.g., color harmony and photo composition). IQA is out of the scope of this survey.

Several photo composition techniques and rules of thumb have been defined by experienced photographers based on heuristics, and are considered responsible for improving the aesthetic quality of a photo. Those rules, known as photographic composition rules, may be used to identify higher-quality photos, based on the assessment of features.
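The idea that assessment algorithms assign a score which then induces an ordering can be sketched as follows. The feature names, the weights, and the choice of a weighted average are illustrative assumptions; the reviewed works use a variety of scoring and learning schemes, and the per-feature scores would come from rule detectors such as those discussed below.

```python
from typing import Dict, List, Tuple

# Hypothetical per-photo feature scores in [0, 1], e.g. produced by detectors
# for the rule of thirds, the zoom rule, or low-level sharpness.
Features = Dict[str, float]


def aesthetic_score(features: Features, weights: Dict[str, float]) -> float:
    """Weighted average of the available feature scores (missing features are ignored)."""
    used = [name for name in weights if name in features]
    total_weight = sum(weights[name] for name in used)
    if total_weight == 0:
        return 0.0
    return sum(weights[name] * features[name] for name in used) / total_weight


def rank_photos(photos: Dict[str, Features],
                weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (photo_id, score) pairs ordered from best to worst."""
    scored = [(pid, aesthetic_score(f, weights)) for pid, f in photos.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Ignoring missing features lets photos without a detected face still be compared on the remaining criteria, which matters because several rules depend on high-level detections that may fail.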
Photo composition may be regarded as the most determinant factor for consumers when considering photo quality [134]. Applying a photo composition rule does not necessarily ensure the best aesthetic results. Notwithstanding, photos obeying such rules are likely to look more appealing to consumers than if they were shot without attention to the rules [134,17]. However, a photo does not necessarily have to obey composition rules to be considered appealing by consumers. This apparent contradiction may be explained by the existence of factors other than composition, e.g., the people involved, photogeny, and the place where the photo was taken.

Some of the photo composition rules were later explained by theories of perception. The rule of thirds is a good example: it is known that when the subject of the photo is placed on one of the thirds of the image, the viewer is stimulated, due to the nature of the human visual system, to perceive other regions of the photo. Other rules are not well defined in the specialized photography literature or are defined in terms of more subjective concepts, e.g., trying to obtain a more casual and spontaneous picture [12].

Photographic composition rules have been adopted for ranking photos in many studies [87,17,18,39,4,14,73,139,95,81,41], in which relationships between some predefined rules and human judgment have been identified. Rules may come from human visual system theories as well as from professional photographers' expertise.

The rule of thirds is the most explored photographic composition rule in the literature [87,17,18,39,4,14,74]. One of the main reasons is that it is easily translated into algorithms. The rule of thirds states that one should preferentially place the subject on one of the lines that divide the image width or height into thirds (depending on the image orientation).
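Because the rule of thirds reduces to geometry once the subject is located, a score can be written in a few lines. The sketch below assumes the subject center is already known (e.g., from a face detector) and uses the distance to the nearest thirds intersection ("power point") with a Gaussian falloff; the Gaussian form and the `sigma` tolerance are illustrative assumptions, not the formulation of any specific cited work.

```python
import math
from typing import Tuple


def rule_of_thirds_score(subject_center: Tuple[float, float],
                         width: float, height: float,
                         sigma: float = 0.17) -> float:
    """Score in (0, 1]: 1 when the subject lies on a thirds intersection.

    Coordinates are normalized by the image size so the score does not depend
    on resolution; `sigma` is an arbitrary tolerance chosen for illustration.
    """
    x, y = subject_center[0] / width, subject_center[1] / height
    power_points = [(px, py) for px in (1 / 3, 2 / 3) for py in (1 / 3, 2 / 3)]
    d = min(math.hypot(x - px, y - py) for px, py in power_points)
    return math.exp(-(d ** 2) / (2 * sigma ** 2))
```

A subject placed exactly at the center of the frame, the classic violation of this rule, scores about 0.38 with these settings, whereas a subject on a power point scores 1.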
Existing works differ on how the subject of the photo is located, for example by using (1) face detection algorithms [17,18,14]; (2) low-level information, such as borders and regions found by the mean shift algorithm [87,4]; or (3) by evaluating the differences of pixels positioned in those interest areas [39]. Other rules are also explored but with less consensus between authors. The zoom rule can be applied to classify photos according to the distance from the camera to the subject. Excessive or insufficient distances are penalized by the algorithm as inadequate compositions. Since a precise detection of the subject is required for this type of analysis, Cavalcanti et al. [18] and Byers et al. [14] used face detec- tion as the main information to identify subject position. In a similar approach, Kahn et al. [74], used the ratio between the area of the face and the area of the image. The integrity rule was proposed to identify undesired chopping of the main subject. The great drawback of this rule is the high cost of precisely detecting the subject in a photo. The use of anthropometric measures were shown to be effective to subjects in an upright frontal position. Using some reliable information, such as the coordinates, and the dimensions of a detected face [139,18], it is possible to infer the position of the rest of the subject body, and detect possible chops. Both zoom and integrity rules were designed consider- ing that there is trusty high-level information such as the face coordinates. This is a great drawback, since an impre- cise detection may lead to wrong conclusions. There are approaches that mainly rely on the intensities of pixels rather than on high-level information. One disadvantage is that color images may have their channels treated independently and may result in redundancy which must be treated by clas- sification algorithms. For instance, in the work of Datta et al. 
[39], 56 initial rules were reduced to just 15 after applying support vector machines (SVM) [33] and pruning, since applying the same data extraction algorithm to all images had introduced redundancy.

Finally, the visual balance rule is used to analyze whether the photo elements are well balanced, i.e., placed so that the observer’s attention is equally divided among them [87,7,176].

Besides photographic rules, some authors evaluate low-level features (e.g., sharpness, brightness, and contrast) in order to identify the overall appearance of a photo [108,73,39,81,80].

Higher-level image analysis may also be employed for photo ranking. For instance, aesthetic analysis may be achieved by learning how humans classify photos according to subjective criteria. Although this might be difficult, some studies focus on the emotions that artwork evokes in humans [174]. The criteria may be diverse; for instance, the time a person spends evaluating an image can serve as a measure of confidence in that person’s assessment [45]. It is believed that the emotions evoked by a natural image can be understood by means of the aesthetics gap concept. According to Datta et al. [40], “The aesthetics gap is the lack of coincidence between the information that one can extract from low-level visual data and the interpretation of emotions that the visual data may arouse in a particular user in a given scenario.” Color harmony is also an important feature [176]. Low-level information such as lighting, color [95,80,74], luminance [74], edges, and range of lightness [48] is used for judging the harmony (a high-level, subjective aspect) of photos and videos [104].

Besides the factors presented above, some other common-sense factors might influence human judgement.
Below is a list of these additional factors:
• People involved. A photo may be considered more or less appealing depending on the identity of the people shown; e.g., even a badly composed and poorly illuminated photo might be considered good if it contains people for whom the consumer has affection, such as the photographer’s child or a famous person. The opposite might also happen: a well-composed photo might be discarded if the person in it is unknown;
• Place where the photo was taken. Some photos depict rarely visited places. Thus, even if such a photo has problems, e.g., in composition or illumination, it is unlikely to be discarded because of its uniqueness;
• Photogeny. Well-composed photos do not necessarily contain photogenic people. Especially in group photos, one or more group members may be talking or looking elsewhere at the moment the photo is shot; and
• Personal preferences. Some people might prefer photos that do not obey composition rules.

Despite the factors discussed above, photo ranking might be useful for helping consumers identify (at least within a group of pre-selected photos) the ones with more attributes related to a better-looking or more appealing impression.

4.2 Information extraction

This section discusses approaches for extracting elements of interest that might be important to a photographic analysis system. The reviewed work involves approaches for face and people detection, landscape analysis (e.g., horizon tilt evaluation), and identification of the image class (e.g., whether it is a photo or a graphic image). The goal is neither to rank nor to classify the images but to extract information, which may serve as an auxiliary source of information for the image ranking methods discussed in the previous section.
Elements of interest might be anything the user is searching for: (1) a face; (2) a person; (3) regions with unwanted features such as dissection lines [139]; (4) unfocused or blurred regions [151]; (5) a sunset area [9]; (6) text [153]; and many others.

Generally, information extraction approaches involve building a classification model for the targeted element (for example, through a learning process using a set of reference patterns). It is commonly accepted that the best technique for building a model for a given problem depends on the specific features of that problem.

Decision trees [130,43] are typically employed to identify classes characterized by a small number of numerical and categorical constraints, such as the number of colors or the number of people. The ID3 classification algorithm [124] was used to classify an image as either a digital photo or artwork (e.g., a logo, a drawing, or another artificially generated image). The decision tree was trained with 1,200 images, and an accuracy of 95.6 % was achieved when distinguishing the two classes, as verified through tenfold cross-validation.

The SVM [33] is widely used for classification tasks relevant to photographs, such as detecting indoor or outdoor scenes [137], the presence of a sunset [9], the level of expertise of the photographer [152], and the presence of skin regions [64], among others. The SVM is normally used when the set of constraints is not small and there is no clear linear separation between the classes. When defining an SVM model, a kernel must be specified. For instance, Serrano et al. [137] used a radial basis function kernel, and Boutell et al. [9] and Li et al. [80] used a Gaussian kernel, which is the most common form of radial basis function kernel.

When there is a large amount of data with a large number of components, high correlation between those components may harm classification.
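When features are strongly correlated, most of their variance lies along a few directions, which is what dimensionality reduction exploits. As a minimal, standard-library-only sketch (not the procedure used in [152]), the hypothetical function below finds the leading principal component of a feature matrix by power iteration on its covariance matrix and projects each sample onto it; a practical system would normally use a linear-algebra library and keep several components.

```python
import math


def first_principal_component(samples, iterations=200):
    """Leading principal component of `samples` (a list of equal-length feature rows).

    Returns (direction, explained_variance_ratio, scores), where `scores` are the
    1-D projections of the mean-centred samples onto `direction`.
    """
    n, d = len(samples), len(samples[0])
    means = [sum(row[k] for row in samples) / n for k in range(d)]
    centred = [[row[k] - means[k] for k in range(d)] for row in samples]
    # Sample covariance matrix (d x d); assumes at least two samples.
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    total_variance = sum(cov[k][k] for k in range(d))
    v = [1.0 / math.sqrt(d)] * d
    if total_variance == 0.0:  # constant features: no informative direction
        return v, 0.0, [0.0] * n
    # Power iteration: repeated multiplication by the covariance matrix converges to
    # the eigenvector with the largest eigenvalue (unless the start is orthogonal to it).
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient: the variance captured along v.
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    scores = [sum(r[k] * v[k] for k in range(d)) for r in centred]
    return v, eigenvalue / total_variance, scores


# Two perfectly correlated features: one component carries all the variance.
features = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, ratio, scores = first_principal_component(features)
print(direction, ratio)
```

In this toy case the second feature is exactly twice the first, so the two original dimensions collapse to a single score per sample without losing variance.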
Many authors use principal component analysis (PCA) [65] to reduce the dimensionality of the feature space [152].

Some information extraction methods are designed to detect human-related information, such as faces, eyes, skin, and pose. Since most photos contain people, such information is very important to any photographic analysis system. Recent work in face detection has focused on multi-view, rotation-invariant, and scale-invariant face and eye detectors. Discriminant features [162], low-level features [42], and Sobel edge detection, morphological operations, and thresholding [154] may be used for this goal.

Face recognition algorithms can be used to identify photos that do or do not contain a specific individual, as well as to find relationships between images due to the presence of a given person or group [55]. According to a rule defined by Loui et al. [92], the identification of a specific person might be used to infer the relevance of a given photo based on the relationship of the people in it to the photo owner. Recognition can also be used, along with human tagging and some logic formalism (e.g., Markov logic), to retrieve social connections in photo repositories [187,75,140,57].

In a photo selection scenario, it is very important to identify the relationships between the people present in an image set, since such relationships may be used to predict the significance of the images to that set [91]. The underlying face recognition can be performed using local patterns of Gabor magnitude and phase [166]. Face recognition relies heavily on face detection; thus, imprecise face detection may result in poor face recognition. There are, however, approaches for misalignment-robust face recognition [171]. Additional cues, such as birthmarks [120] and clothes [54], are also used to improve face recognition, and a Markov random field has been used to recognize people based on contextual clues such as clothing [2].
Gender can also serve as a clue for face recognition, by means of spatial Gaussian mixture models (SGMM) [84].

Since consumers take low-level image features into account when determining whether a given photo is better than another [137], detecting such features may be very useful for ranking photos. One of these features is blur, which may be used for automatically ranking photos [73,95]. Blurred images can be identified from features such as image color, gradient, and spectrum information [88]. Spectral analysis of image gradients has also been used to identify blurring kernels [69]. Other features, such as clarity, complexity, and color composition, have also been explored [94,39].

Besides the face, skin regions are another important piece of evidence for the presence of a human in a photo, and several approaches have been proposed to detect them. Skin tone may be detected by a pixel-wise or a region-based approach [76]; both use a color model [64]. Additionally, skin tone may be decomposed into hemoglobin and melanin components [169], which can be used for a better understanding of skin texture. Skin classification may be performed using SVM and region segmentation [64], as indicated earlier in this section, and the approaches can be compared with receiver operating characteristic (ROC) analysis [136].

While evidence for people can be obtained by skin or face detection, there are also approaches by which humans are directly detected in images. Recent approaches use local binary patterns (LBP) for human detection through two variants, semantic-LBP and Fourier-LBP [105]. People detection can also be achieved using quantified fuzzy temporal rules to represent knowledge about human spatial data; this kind of data is learned with an evolutionary approach [106].
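For reference, the basic 3×3 local binary pattern operator, on which LBP variants such as those in [105] build, compares each pixel with its eight neighbours and packs the results into an 8-bit code; a normalised histogram of the codes then serves as a texture descriptor. The sketch below assumes a grayscale image given as a list of rows and skips border pixels; the neighbour ordering is an arbitrary convention, and the semantic-LBP and Fourier-LBP variants themselves are not shown.

```python
def lbp_codes(gray):
    """Basic 3x3 LBP codes for the interior pixels of a grayscale image (list of rows)."""
    # Neighbour offsets (row, col), clockwise from the top-left; bit k = k-th neighbour.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    height, width = len(gray), len(gray[0])
    codes = []
    for r in range(1, height - 1):
        row = []
        for c in range(1, width - 1):
            centre = gray[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the centre.
                if gray[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def lbp_histogram(gray):
    """Normalised 256-bin histogram of LBP codes, usable as a texture feature vector."""
    hist = [0] * 256
    for row in lbp_codes(gray):
        for code in row:
            hist[code] += 1
    total = sum(hist)
    return [count / total for count in hist] if total else [0.0] * 256


print(lbp_codes([[9, 0, 0], [0, 5, 0], [0, 0, 0]]))
```

In a detector, such histograms are typically computed over cells of a detection window and fed to a classifier.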
A head-and-shoulders detector can also be built from a watershed and a border detector, whose outputs are used to train a classifier with AdaBoost [168].

Some research has been conducted on detecting people in specific contexts, which might be extended to more general scenarios. For instance, in the specific scenario of pedestrian detection, region covariance features with a radial basis function kernel SVM, and histograms of oriented gradients (HOG) with a quadratic kernel SVM, were shown to outperform local receptive fields with a quadratic kernel SVM [116]. Similarly, human activities such as ‘fighting’ or ‘assault’ can be recognized and encoded with context-free grammars through a description-based approach [131].

Besides people detection, other types of information may be useful for photography analysis. For example, the social context might be inferred by analyzing the distribution of people within the image, and a graph-based approach has been shown to be useful for finding rows of people [56]. People’s poses are also important information about a photo. Each body part can take only a limited number of positions relative to the other body parts. For instance, the head is directly connected to the shoulders and should not appear connected to the feet; thus, if a face is found, the shoulders should appear right below it. There are several approaches to human pose estimation. In video sequences, pose may be estimated using multi-dimensional boosting regression on Haar features [8]. In static images, pose can be classified through angular constraints and variations of body joints with SVM [97], with an observation-driven Gaussian process latent variable model (ODGPLVM) [61], with non-tree graph models [70], and with a conditional random field (CRF) when multiple views are available.
A bottom-up parsing approach, which segments multiple images, can be used to recognize the human body and perform pose estimation.

Besides human subjects, other types of subjects may appear in a photo. Different subjects (e.g., natural or man-made objects and animals) might appear alone or interacting with humans, resulting in a more complex photo. A shape-driven object detector [129,59], SIFT [78], and sets of mattes [144] may be employed for more general object detection. It is also possible to identify the region of interest using camera information captured and stored in EXIF metadata [83].

Instead of detecting a specific type of object, it can also be effective to identify regions within the image that share some correspondence. In this sense, image segmentation is of fundamental importance for photography analysis. There are several approaches to image segmentation. Since subjects and scenarios vary widely in photography, the more general the segmentation algorithm, the better the result.

The main image segmentation methods are based on edge information [159,19], fragment-based approaches [36,79], point-wise repetition [182], tree partitioning under a normalized cut criterion [160], a nonparametric Bayesian model [115], a geometric active contour model [180], Markov random fields with region growing [123], Markov random fields and graph cuts [25], and the local Chan–Vese (LCV) model [163]. Most algorithms deal with both color and gray-level images, while some are specific to color images [19,181]. It is normally difficult to compare different image segmentation algorithms, but unsupervised objective assessment methods have been proposed for this task [185].

4.3 Grouping

Photo grouping is designed to establish associations between photos.
The associations may be based on either information extracted from the images (such as the number of faces detected or the number of colors) or high-level information (e.g., the Global Positioning System (GPS) position stored in the EXIF data of some images).

4.3.1 Classification and clustering

Classification algorithms are designed to identify the class to which a given image belongs. Image classification has several goals, e.g., (1) identifying which files in a set of images are photos and which are graphics [114]; (2) identifying whether photos were taken in an indoor or an outdoor environment [137]; and (3) identifying whether images were taken by an amateur or a professional photographer [151,80,94,111], among others.

It is not completely known how humans perform classification tasks. Vogel et al. [157] have shown, however, that humans use both local and global region-based configurations for scene categorization. This implies that human-inspired algorithms may consider both local and global region-based information for better image classification results. It was also shown that color plays an important role in image categorization by humans: natural images were better classified when presented in color than in gray levels [157].

Classification is closely related to information extraction, discussed in Sect. 4.2. One of the main steps for accurate image classification is the representation of the image that will later be used as input to a classifier. Representation can be performed by local descriptors [101], a topic histogram using probabilistic latent semantic analysis (pLSA) and expectation–maximization (EM) [93], multilevel representation [164], triangular representation, which is robust to viewpoint changes [67], resolution-invariant image representation [161], and the scale invariant feature transform (SIFT) [164], among other methods.
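One common way to turn a variable number of local descriptors (such as SIFT vectors) into the fixed-length input a classifier expects is a bag-of-visual-words histogram. The sketch below is illustrative and not taken from the cited works: it assumes the descriptors have already been extracted and that a codebook of "visual words" is given (typically learned beforehand, e.g., by k-means clustering of training descriptors), and it uses hard nearest-codeword assignment.

```python
import math


def bovw_histogram(descriptors, codebook):
    """L1-normalised histogram of nearest-codeword assignments (hard assignment)."""
    counts = [0] * len(codebook)
    for desc in descriptors:
        # Index of the visual word closest to this descriptor (Euclidean distance).
        nearest = min(range(len(codebook)), key=lambda k: math.dist(desc, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    # An image without descriptors yields an all-zero vector instead of dividing by zero.
    return [c / total for c in counts] if total else [0.0] * len(codebook)


codebook = [(0.0, 0.0), (10.0, 10.0)]  # two toy 2-D "visual words"
descriptors = [(1.0, 1.0), (9.0, 9.0), (0.0, 2.0)]
print(bovw_histogram(descriptors, codebook))
```

Because the output length equals the codebook size, images with different numbers of keypoints become directly comparable inputs for classifiers such as the SVM.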
Some relevant classifiers proposed in the literature are AdaBoost [93], SVM [101,164], multiple kernel learning (MKL) [93], Bayesian belief networks [38], Bayesian active learning [122], and conditional random field models [15,184].

The main challenges in image classification are computational cost and classification accuracy. A local adaptive active learning (LA-AL) method was used to lower the number of training samples needed [93]. Within-category confusion can be dealt with by probabilistic patch descriptors, which encode the appearance of an image fragment and the variability within a category [101].

Clustering algorithms are intended to automatically group images based on their extracted features. Given a set of photos, clustering can be used to identify the relationships among them. Cooper et al. [30] presented an automatic temporal similarity-based method using EXIF data. Graph-based algorithms [52] and local discriminant models and global integration (LDMGI) [173] are common methods for image clustering. Since clustering is usually an unsupervised process, additional mechanisms are needed to reduce errors; for instance, user feedback has been used to obtain relevant information about system performance [11].

4.3.2 Summarization

Another recent area of interest is finding relationships between photos to produce summaries. Summaries are useful because finding information in large sets of images can be time consuming. Summaries are used for producing condensed displays of tourist destinations [117], simplifying photo browsing in personal collections [141,155,28], indexing [156], and storytelling [53,110], among other applications. A specific problem related to producing summaries, or to filtering out redundant information from a collection of photographs, is the detection of near duplicates [28,148,126].
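As a deliberately simple baseline for temporal grouping (far simpler than the similarity-based method of Cooper et al. [30]), photos can be split into events wherever the gap between consecutive capture times exceeds a threshold. The sketch assumes capture times are read from the EXIF DateTimeOriginal tag, whose text format is "YYYY:MM:DD HH:MM:SS"; the two-hour threshold and the function names are illustrative choices.

```python
from datetime import datetime, timedelta


def parse_exif_datetime(value):
    """Parse an EXIF date/time string such as '2012:09:14 10:00:00'."""
    return datetime.strptime(value, "%Y:%m:%d %H:%M:%S")


def split_into_events(capture_times, max_gap=timedelta(hours=2)):
    """Group capture times into events; a gap larger than max_gap starts a new event."""
    events = []
    for t in sorted(capture_times):
        if events and t - events[-1][-1] <= max_gap:
            events[-1].append(t)  # close enough to the previous photo: same event
        else:
            events.append([t])    # first photo, or a long gap: new event
    return events


stamps = ["2012:09:14 10:00:00", "2012:09:14 10:30:00", "2012:09:14 15:00:00"]
events = split_into_events(parse_exif_datetime(s) for s in stamps)
print([len(e) for e in events])
```

A fixed threshold ignores that gaps scale with the pace of an event, which is one reason adaptive, similarity-based methods are preferred in the literature.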
Photos matching specific keywords [142] or GPS-tagged information [135] have been grouped to build 3D models of tourist sights. Online tools, such as Bing Maps [102], have used some of these technologies to build 3D models of such places.

4.3.3 Image retrieval

According to Marshall et al. [99], image retrieval techniques can be classified as content-based image retrieval (CBIR) or annotation-based image retrieval (ABIR). In CBIR, the images are processed to obtain information, while in ABIR, images are often annotated with textual information, such as place, time, or photographer, which is used to retrieve them.

Most detection algorithms, such as those for recognizing faces [187,121] or detecting events [37], can be used as an intermediate step for retrieving images in CBIR [99,90].

Since manually tagging photos can be time consuming, recent work considers the use of information automatically obtained from EXIF [83,148,85,132], SIFT [148,26], face recognition and connections found in social networks [27,113,155], georeferences obtained from GPS devices [16], people clues such as faces and clothes [146], and other high-level information [132].

4.4 Discussion

In this section, algorithms for photo analysis have been organized into three categories: assessment, information extraction, and grouping. From the performed review, assessment seems to be the least explored area. This may be explained by the highly subjective nature of the task, which makes precise or universal analyses difficult. The other two areas are more explored in the literature and present a richer set of approaches. Despite the underlying limitations discussed in the next section, these approaches seem very promising for inclusion in a photography analysis system.

5 Critical analysis

This section considers the main issues covered in the studies reviewed in this survey.
To better discuss such issues, the following information about the articles was summarized: the source of the image set, the size of the image set, the main goal, the metrics used for assessment, and the results achieved. The photo analysis algorithms are shown in Table 2 and the enhancement algorithms in Table 3. This section contains two subsections: the first reviews the image sets used in the experiments, and the second comments on the validation processes.

5.1 Image sets

For most of the image analysis algorithms reviewed in this survey, the purpose is to perform tasks in a human-like manner. Thus, it is fundamentally important to ensure that the photo sample used for testing is representative.

Some studies were performed to identify user behaviour when photographing [96], sharing [103], analyzing [49], and managing [35] photos. However, the literature review did not reveal strong evidence about user preferences, and most algorithms for photo enhancement and analysis are assessed by means of subjective evaluation.

According to the literature review, there is no established methodology for carrying out subjective experiments for photo analysis. A methodology ought to be defined because of the number of factors that might influence subjective assessment. Some of these factors are:
1. People involved. While in professional photos the people present are usually part of the subject, in consumer photos they are mostly known and significant to the photo owner. Therefore, a photo assessment performed by consumers might be too strict in the absence of a known person and too lenient in the presence of, for instance, a family member;
2. Place and event. In some situations, the photo might not be technically good but may have captured a rarely visited place or a rare event, which could positively influence the judgement of the photo;
3. Style used. Different users adopt different photographic habits, and the individual style of one user might not be appreciated by others;
4. Number of images. There is an endless number of poses, camera settings, and subject positions. Therefore, it is practically infeasible to represent this diversity of possibilities in a small set of images.

In Tables 2 and 3, the second column (Image sources) indicates the databases from which the images were obtained; in this column, Web refers to web-crawled images and Own refers to the authors’ or contributors’ own photos. The third column (Set size) gives the number of images used in the experiments (if any). The fourth column indicates the main goal of the work. The fifth column briefly indicates how the approach was evaluated, where Obj. denotes an objective and Sub. a subjective assessment method. The final column shows the best reported performance of the proposed algorithms. Tables 2 and 3 were built strictly from what was described in the papers. NI (Not Informed) is used whenever the information was not explicitly given in the paper, the results were reported only in a non-numeric way, or the information is not applicable to the discussed problem. Both tables are sorted by the total number of images and then alphabetically by author name.

By analyzing Tables 2 and 3, it is possible to draw some conclusions about the number of images and their sources. First, there is no consensus on the database to be used. This makes it impossible to directly compare the results in Tables 2 and 3, as well as to reproduce the experiments. Second, the number of images employed in the evaluations varies drastically. The average number of images employed in photo analysis work is 45,344 with a standard deviation of 212,556, and the median is 3,581 with an interquartile range of 12,278.
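As a consistency check, the summary statistics quoted in this subsection can be recomputed from the numeric set sizes in Tables 2 and 3, with rows reported as NI excluded. The quoted values are reproduced when the sample standard deviation and linearly interpolated ("inclusive") quartiles are used; this is an inference from the numbers, since the computation method is not stated. The sketch uses only Python’s standard `statistics` module.

```python
import statistics

# Numeric "Set size" values transcribed from Table 2 (analysis) and Table 3
# (enhancement); rows reported as NI are left out.
ANALYSIS_SIZES = [
    1_300_000, 70_000, 50_000, 40_000, 29_540, 25_000, 23_774, 19_101, 17_613,
    13_302, 12_000, 12_000, 12_000, 7_672, 6_000, 5_770, 4_500, 3_700, 3_581,
    3_141, 2_597, 2_355, 2_000, 2_000, 1_449, 1_200, 1_199, 1_024, 975, 943,
    589, 564, 500, 500, 500, 450, 200,
]
ENHANCEMENT_SIZES = [3_008, 1_627, 900, 632, 600, 150, 100, 56, 50, 40]


def summarize(sizes):
    """Mean, sample standard deviation, median and interquartile range."""
    # "inclusive" quartiles interpolate linearly between order statistics.
    q1, _, q3 = statistics.quantiles(sizes, n=4, method="inclusive")
    return {
        "mean": statistics.mean(sizes),
        "stdev": statistics.stdev(sizes),  # n - 1 in the denominator
        "median": statistics.median(sizes),
        "iqr": q3 - q1,
    }


for name, sizes in (("analysis", ANALYSIS_SIZES), ("enhancement", ENHANCEMENT_SIZES)):
    s = summarize(sizes)
    print(f"{name}: n={len(sizes)}, mean={s['mean']:.1f}, sd={s['stdev']:.1f}, "
          f"median={s['median']}, IQR={s['iqr']}")
```

With these conventions, the analysis group gives a mean of about 45,344, a median of 3,581 and an IQR of 12,278, while the enhancement group gives a mean of 716.3, a standard deviation just under 953, a median of 375 and an IQR of 766.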
If only photo enhancement work is considered, the average is 716 with a standard deviation of 952, and the median is 375 with an interquartile range of 766. Third, no work presented a categorization of its image set; e.g., the distribution of the number of people per photo in a given set is not known. Finally, some papers presented only a simple visual verification of the results (e.g., Achanta et al. [1] and Banerjee et al. [4]).

Table 2 Summary of the reviewed work on analysis techniques

Authors | Image sources | Set size | Main goal | Assess. method: used metrics | Results
Liu et al. [90] | Photosig [10], NUS-WIDE [29], Kodak [172] | 1,300,000 | CBIR | Obj.: precision | 14.5 %
Li et al. [83] | NI | 70,000 | ROI detection | Obj.: precision and recall | NI
Pang et al. [117] | Flickr [50] | 50,000 | Grouping | Sub.: scaled (1–5) | Average rank > 4
Sinha [141] | Flickr [50], Picasa [58] | 40,000 | Grouping | Obj.: JS divergence | JS div. < 0.3
Tong et al. [152] | Corel [32], MS | 29,540 | Assess.: home user x photographer | Obj.: MSE | 11.1
Marshall [99] | MIRFLICKR 25000 [51] | 25,000 | CBIR | NI | NI
O’Hare [113] | Own | 23,774 | Grouping | Obj.: H-hit rule | NI
Dao et al. [37] | Picasa [58] | 19,101 | Grouping | Obj.: F-measure | NI
Luo et al. [94] | Web | 17,613 | Assess.: high x low quality | Obj.: accuracy | 95 %
Yao et al. [175] | Photo.net [60] | 13,302 | Assess.: ranking | Sub.: scale (0–100) | 75.33 %
Ke et al. [73] | DPChallenge [20] | 12,000 | Assess. | Sub.: scale (1–10) | 72 %
Yeh et al. [177] | DPChallenge [20], Flickr [50] | 12,000 | Assess.: ranking | Sub.: scale (1–10) | 81 %
Yeh et al. [176] | DPChallenge [20], Flickr [50] | 12,000 | Assess.: ranking | Sub.: scale (1–10) | 93 %
Sandnes [132] | Own | 7,672 | Grouping | Obj.: accuracy | 88.1 %
Su et al. [145] | DPChallenge [20] | 6,000 | Assess. | Sub.: scale (1–10) | 92.06 %
Boutell et al. [9] | Corel [32]/Own | 5,770 | Class.: sunset | accuracy | 96.4 %
Singla et al. [140] | Own | 4,500 | Summ. | Obj.: precision and recall | NI
Oliveira et al. [114] | Web | 3,700 | Class.: photo x graphic | Obj.: cross-validation | 95.6 %
Datta et al. [39,41] | Photo.net [60] | 3,581 | Assess.: ranking | Sub.: scale (1–7) | 70.12 %
Obrador et al. [111] | Photo.net [60] | 3,141 | Class.: high x low aesthetics | Sub.: scale (1–7) | 66.5 %
Zhang et al. [187] | Own | 2,597 | Grouping | NI | NI
Tong et al. [151] | Corel [32] | 2,355 | Class.: blur | Obj.: accuracy | 98.6 %
Obrador [108] | NI | 2,000 | Assess.: ranking | Sub.: 6 grades | 37.5 %
Shen et al. [139] | Web, Flickr [50] | 2,000 | Detect.: dissection lines | Sub.: TP + FP | 80.87 and 33.61 %
Cooper et al. [30] | Own | 1,449 | Class.: event | Obj.: F-measure | 0.8568
Serrano et al. [137] | Web | 1,200 | Class.: indoor x outdoor | Obj.: accuracy | 90.2 %
Chu et al. [26] | Own | 1,199 | Grouping | Obj.: precision | 0.68
Chu et al. [27,28] | Flickr [50] | 1,024 | Grouping | Sub.: scale (1–5) | Satisfaction > 4
Tang et al. [148] | Picasa [58] | 975 | Grouping | Obj.: precision and recall | NI
Loui et al. [92] | NI | 943 | Grouping | Sub.: correlation | 0.84
Lo Presti et al. [121] | Gallagher [54] | 589 | Retrieval | Obj.: error rate | 27.68 %
Kim et al. [75] | Own | 564 | Grouping | Obj.: precision at top-N | MAP > 0.4
Li et al. [81] | Flickr [50] | 500 | Assess.: ranking | Sub.: choice | 51 %
Li et al. [80] | Flickr [50] | 500 | Assess. & Class. | Sub.: scale (0–10) | Residual sum-of-squares: 2.38
Khan et al. [74] | Li et al. [81] | 500 | Assess.: ranking | Sub.: choice | 61.10 %
Jiang et al. [71] | Flickr [50], Kodak [172], Own | 450 | Assess.: ranking | Sub.: scale (0–100) | MSE < 17
Obrador et al. [110] | Own | 200 | Grouping | Sub.: choice | 75 %

Table 3 Summary of the reviewed work on enhancement techniques

Authors | Image sources | Set size | Main goal | Assess. method: used metrics | Results
Byers [13] | Own | 3,008 | In-camera photo composition | Sub.: user selection | 35 %
Tian et al. [149] | Own | 1,627 | Photo collage | Sub.: professional | Most results considered good
Liu et al. [87] | Web | 900 | Recomposition | Sub.: forced choice | 93.7 %
Bhattacharya et al. [7] | Web | 632 | Recomposition | Sub.: forced choice | 93.7 %
Yin et al. [179] | Own | 600 | Media adaptation | NI | NI
Suh et al. [147] | Corbis [31] | 150 | Cropping | Sub.: recognition time | Faster using the approach
Zhang et al. [186] | Own | 100 | Cropping | Sub.: scaled | 41 %
Chen et al. [24] | Web | 56 | Recomposition | Sub.: scaled | 71.28 %
Santella et al. [133] | NI | 50 | Cropping | Sub.: forced choice | 58.4 %
Setlur et al. [138] | NI | 40 | Retargeting | Sub.: forced choice | 89.1 %
Achanta et al. [1] | Berkeley [100] and MSRA [89] | NI | Retargeting | NI | NI
Banerjee et al. [4] | NI | NI | Recomposition | NI | NI
Lim et al. [86] | NI | NI | Composite | NI | NI

It is important to highlight that no labeled and representative public image database has been used for photographic analysis. Therefore, most authors crawled images from online repositories. Web crawlers can be employed to create image sets that cover a richer and more diverse range of situations, with higher resolutions [114,137,24,139]. Nevertheless, the major drawback is the lack of copyright licenses for public experiments.
There are some public image databases that are free for academic research use (such as Flickr [50] and other databases under the Creative Commons license [34]), yet they are not labeled. Regarding photo analysis, some web databases have been used as ground truth for subjective quality analysis (e.g., DPChallenge [20] and Photo.net [60]). However, since those databases were designed for photo contests, they typically do not represent consumer photography, which usually has lower quality and less demanding evaluators.

Two groups of authors have built datasets in order to make them available to the community. The first work, from Luo et al. [94], presented a dataset of more than 17,000 labeled photos. The set was built to be diverse: the photos are distributed over seven categories and labeled as high or low quality. The weakness of this photo set lies in the labeling process. Some important information is not given, such as the exact number of votes for each category, the origin and background of the photographers, and personal information about the voters. Moreover, a more precise ranking (instead of only classifying images as high or low quality) would make the set more useful for enhancement and analysis algorithms in general. The other work, from Bhattacharya et al. [7], presented a smaller photo set (only 632 images). Factors such as those analyzed for the Luo et al. [94] dataset could not be evaluated, since the image set built by Bhattacharya et al. [7] could not be downloaded due to a Web server error; the image set may therefore no longer be available.

One might suggest that if it is possible to learn an expert’s opinion about a photo, it would be possible to analyze any photo. Surprisingly, however, this is not always true. Since average photography consumers have no training in what makes a good photo, they often do not agree with the advice given by experts.
There are several other factors that might influence a photography user's opinion, such as photo effects and the event at which the photo was taken, rather than photographic rules.

In conclusion, it was not possible to identify comparative studies that evaluated different approaches on publicly available photo datasets. This makes it difficult to reliably compare techniques for consumer photography. Using image sets from photography contests also has disadvantages, since both photographers and voters may have professional skills or a strong interest in photography. This may lead to results that do not reflect the preferences of ordinary photography consumers.

5.2 Validation

This section discusses validation approaches. Photography might be considered an art form [66], and there is no simple way of deciding whether a photo is aesthetically pleasing or not. However, it might be possible to identify metrics that would help photo assessment, which would be a step forward in this area.

Another important aspect to be considered is how approaches were validated. Since the reviewed work is about photo enhancement and analysis, the results are usually images (in the case of photo enhancement algorithms) or abstract information, e.g., color/gray-scale maps, statistics, and scores. Both have a very high subjective component, although some metrics might be defined to obtain a more objective analysis in a specific scenario.

Validation methods can be classified as subjective or objective. Subjective methods involve experiments in which humans are asked to give their opinion on the photos of a predefined test set with respect to a given attribute or criterion. A participant may give his/her opinion using one of the following methods [98]:

– Single-stimulus rating. The participant gives a score to a photo or a group of photos. The score might be continuous (such as 0–10) or categorical (e.g., excellent, good, fair, bad, and poor). During the rating process, each photo is typically shown to the participant for a fixed presentation time (e.g., 3 s);
– Double-stimulus rating. Analogous to single-stimulus rating, except that a reference photo and a test photo are presented in random order, one after another, each for a fixed presentation time (e.g., 3 s);
– Forced choice. The participant is forced to choose only one photo within a group of photos, according to a given criterion;
– Pairwise similarity judgement. Similar to forced choice but, besides choosing one of two photos, the participant also has to indicate on a continuous scale how large the difference in quality between the two photos is; and
– Indirect. The participant does not directly give his/her opinion. The quality may be inferred from some measurement, such as the time the participant needs to choose a photo.

Further details can be found in the work of Mantiuk et al. [98], which compares the first four of the above-mentioned methods. The better method is usually the one with higher correlation between human and automatic labeling. It was shown, however, that for comparing IQA algorithms, where differences between images might be small, forced-choice pairwise comparison is the most accurate and time-efficient method [98].

Besides the comparison method, other factors also influence the experimental assessment, since such experiments involve humans. Some of these factors are:

– Number of participants. Since opinions about the quality of a photo may vary from one person to another, it is important to have a large number of participants in order to identify the features that are most significant in human analysis;
– Equipment used. When the experiment is conducted in an uncontrolled environment, the equipment might harm the results (e.g., the calibration of the screen in a color experiment might change a participant's opinion);
– Knowledge of photography. Experts evaluate photos differently from consumers. For example, professional photos might be considered good by both experts and consumers, while a consumer photo might be considered good by a consumer but bad by experts;
– Cultural diversity. The style and subject of the photo might influence the judgement, depending on the participant's background and origin; and
– Number of photos. The number of photos in the experiment is as crucial a factor as the number of participants. On the one hand, a larger number of photos might better represent photo diversity; on the other hand, it might reduce the number of volunteer participants, since the experiment becomes more laborious.

Since, according to the literature review, no database considers all those factors, most of the conclusions drawn from subjective experiments might be considered partially biased. In addition to the strong influence of such factors, there is no consensus on their ideal values. Thus, most papers make some questionable decisions in the validation step, regarding the number of participants (e.g., three participants [108]), their knowledge of photography (e.g., most participants are experts [73]), and the number of images used (e.g., only 34 photos to represent the analysis sample [49]).

On the other hand, objective metrics present a set of well-defined criteria, and proposals are evaluated against those criteria. For instance, in a face detection scenario, the best algorithm might be the one with the lowest false-positive rate. Objective methods are usually less expensive, since they do not rely on the availability and classification coherence of human participants.
Nevertheless, some important features, such as the global visual aspect of a photo, are not yet well assessed by computational algorithms. Even humans may disagree with a classification result, which also makes subjective assessment harder. Both approaches (objective and subjective) are important, each in its specific application scenario.

As can be seen in Tables 2 and 3, the assessment methodology differs widely across the reviewed work with regard to (1) the assessment method, (2) the metric used for assessment, and (3) the source of the photo set. Although the results reported in those tables were obtained with different algorithms and different goals, it is possible to conclude that most approaches have opted for subjective assessment when dealing with image enhancement and analysis. The reason is probably the lack of consensus on the image set to be used as ground truth and the essentially subjective task of comparing images. There is also a lack of clarity regarding the number of people involved in the subjective experiments, their confidence in the labeling, and the methodology of the experiment.

6 Conclusions

This survey reviewed state-of-the-art methods for photo enhancement and analysis. For a better understanding of this research area, a taxonomy was defined based on the related work.
The main conclusions of this survey are discussed next:

• According to the conducted literature review, this is the first survey on consumer photographic enhancement and analysis techniques;
• Interest in algorithms for photo enhancement and analysis has been growing, as indicated by the number of recent papers published in this area;
• There is no consensus on a methodology for conducting subjective photo analysis experiments;
• Although the results were obtained with different algorithms and different goals, it is possible to conclude that most approaches have opted for subjective assessment, due to the lack of a public, labeled image set that could serve as ground truth for an objective assessment, and due to the inherently subjective task of comparing images. Therefore, in this scenario, direct comparisons between existing approaches might be unfair;
• Even studies that indicate their photo sources are often not reproducible, since the photos used for testing are not clearly identified, mostly due to copyright reasons or the large number of images;
• There is no consensus on the number of images to be used in the experiments; and
• There is a lack of clarity regarding the number of people involved in the subjective experiments, their confidence in the labeling provided, and the assessment methodology.

Thus, it is possible to conclude that, although photo enhancement and analysis techniques have grown recently, this remains an area with large potential. Experimental assessment needs to be improved, and assessment methodologies are also required in order to draw strong conclusions about methods and results.

Acknowledgments

The authors wish to thank Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) for the financial support of part of this research.

References

1. Achanta R, Süsstrunk S (2009) Saliency detection for content-aware image resizing. In: Proceedings of the IEEE ICIP 2009. IEEE, Piscataway, pp 1005–1008
2. Anguelov D, Lee KC, Gokturk SB, Sumengen B (2007) Contextual identity recognition in personal photo albums. In: Proceedings of the IEEE CVPR 2007. IEEE Computer Society, pp 1–7
3. Avidan S, Shamir A (2007) Seam carving for content-aware image resizing. ACM Trans Graphics 26(3):10.1–10.9
4. Banerjee S, Evans BL (2004) Unsupervised automation of photographic composition rules in digital still cameras. In: Proceedings of the SPIE Conference on sensors, color, cameras, and systems for digital photography VI, pp 364–373
5. Benoit A, Caplier A, Durette B, Herault J (2010) Using human visual system modeling for bio-inspired low level image processing. Comput Vis Image Underst 114(7):758–773
6. Bertalmio M, Bugeau A, Caselles V, Sapiro G (2010) A comprehensive framework for image inpainting. IEEE Trans Image Process 19(10):2634–2645
7. Bhattacharya S, Sukthankar R, Shah M (2010) A framework for photo-quality assessment and enhancement based on visual aesthetics. In: Proceedings of the ACM MM 2010, pp 271–280
8. Bissacco A, Yang M, Soatto S (2007) Fast human pose estimation using appearance and motion via multi-dimensional boosting regression. In: Proceedings of the IEEE CVPR 2007, pp 1–8
9. Boutell M, Luo J, Gray RT (2003) Sunset scene classification using simulated image recomposition. In: Proceedings of the IEEE ICME 2003, pp 37–40
10. Boyce W, Wilkie S (2013) Photosig. http://www.photosig.com. Accessed 31 January 2013
11. Bruneau P, Picarougne F, Gelgon M (2010) Interactive unsupervised classification and visualization for browsing an image collection. Pattern Recogn 43(2):485–493
12. Busselle M (1999) Better picture guide to photographing people. RotoVision, Hove
13. Byers Z, Dixon M, Goodier K, Grimm CM, Smart WD (2003) An autonomous robot photographer. In: Proceedings of the IEEE/RSJ IROS 2003, pp 2636–2641
14. Byers Z, Dixon M, Smart W, Grimm C (2004) Say cheese!: experiences with a robot photographer. AAAI Mag 25(3):37–46 (this is an invited paper that wraps up all of the other Lewis papers)
15. Cao L, Luo J, Kautz H, Huang T (2008) Annotating collections of photos using hierarchical event and scene models. In: Proceedings of the IEEE CVPR 2008, pp 1–8
16. Cao L, Luo J, Kautz H, Huang T (2009) Image annotation within the context of personal photo collections using hierarchical event and scene models. IEEE Trans Multimed 11(2):208–219
17. Cavalcanti C, Gomes H, Veloso L, Carvalho J, Lima Jr O (2010) Automatic single person composition analysis. In: Skala V (ed) Proceedings of the WSCG 2010. UNION Agency-Science Press, Plzen, pp 229–236
18. Cavalcanti CSVC, Gomes H, Meireles R, Guerra W (2006) Towards automating photographic composition of people. In: Proceedings of the IASTED VIIP 2006. ACTA Press, Anaheim, pp 25–30
19. Celik T, Tjahjadi T (2010) Unsupervised colour image segmentation using dual-tree complex wavelet transform. Comput Vis Image Underst 114(7):813–826
20. Challenging Technologies: DPChallenge, a digital photography contest (2013) http://www.dpchallenge.com. Accessed 31 January 2013
21. Charrier C, Knoblauch K, Moorthy AK, Bovik AC, Maloney LT (2010) Comparison of image quality assessment algorithms on compressed images. In: Proceedings of the SPIE, Image Quality and System Performance VII, 2010, pp 75290B-1–75290B-11
22. Chartier S, Renaud P (2008) An online noise filter for eye-tracker data recorded in a virtual environment. In: Proceedings of the ACM ETRA 2008, pp 153–156
23. Chen H (2008) Note: Focal length and registration correction for building panorama from photographs. Comput Vis Image Underst 112(2):225–230
24. Chen Lq, Xie X, Fan X, Ma WY, Zhang Hj, Zhou HQ (2003) A visual attention model for adapting images on small displays. Multimed Syst 9:353–364
25. Chen S, Cao L, Wang Y, Liu J, Tang X (2010) Image segmentation by MAP-ML estimations. IEEE Trans Image Process 19(9):2254–2264
26. Chu WT, Lee YL, Yu JY (2009) Using context information and local feature points in face clustering for consumer photos. In: Proceedings of the IEEE ICASSP 2009, pp 1141–1144
27. Chu WT, Li CJ, Tseng SC (2011) Travelmedia: an intelligent management system for media captured in travel representation. J Vis Commun Image Represent 22(1):93–104
28. Chu WT, Lin CH (2010) Consumer photo management and browsing facilitated by near-duplicate detection with feature filtering. J Vis Commun Image Represent 21(3):256–268
29. Chua TS, Tang J, Hong R, Li H, Luo Z, Zheng YT (2009) NUS-WIDE: a real-world web image database from National University of Singapore. In: Proceedings of the ACM CIVR 2009, July 8–10
30. Cooper M, Foote J, Girgensohn A, Wilcox L (2005) Temporal event clustering for digital photo collections. ACM Trans Multimed Comput Commun Appl 1(3):269–288
31. Corbis: Corbis image gallery (2001–2009). http://www.corbis.com. Accessed 31 January 2013
32. Corel Images: Corel images (2013) http://elib.cs.berkeley.edu/photos/corel/. Accessed 31 January 2013 (currently unavailable)
33. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
34. Creative Commons: Creative commons (2012) http://creativecommons.org/. Accessed 31 January 2013
35. Cunningham SJ, Masoodian M (2007) Identifying personal photo digital library features. In: Proceedings of the ACM/IEEE-CS JCDL 2007, pp 400–401
36. Daliri MR, Torre V (2009) Classification of silhouettes using contour fragments. Comput Vis Image Underst 113(9):1017–1025
37. Dao MS, Dang-Nguyen DT, De Natale FG (2011) Signature-image-based event analysis for personal photo albums. In: Proceedings of the ACM MM 2011, pp 1481–1484
38. Das M, Loui AC (2009) Event classification in personal image collections. In: Proceedings of the IEEE ICME 2009. IEEE Press, New York, pp 1660–1663
39. Datta R, Joshi D, Li J, Wang JZ (2006) Studying aesthetics in photographic images using a computational approach. In: Proceedings of the ECCV 2006, pp 7–13
40. Datta R, Li J, Wang JZ (2008) Algorithmic inferencing of aesthetics and emotion in natural images: an exposition. In: Proceedings of the IEEE ICIP 2008, pp 105–108
41. Datta R, Wang JZ (2010) Acquine: aesthetic quality inference engine—real-time automatic rating of photo aesthetics. In: Proceedings of the ACM MIR 2010, pp 421–424
42. Destrero A, Mol C, Odone F, Verri A (2009) A regularized framework for feature selection in face detection and authentication. Int J Comput Vis 83(2):164–177
43. Duda RO, Stork DG, Hart PE (2000) Pattern classification and scene analysis. Part 1, Pattern classification, 2nd edn. Wiley, New York
44. Dunker P, Popp P, Cook R (2011) Content-aware auto-soundtracks for personal photo music slideshows. In: Proceedings of the IEEE ICME 2011, pp 1–5
45. Engelke U, Maeder AJ, Zepernick HJ (2009) On confidence and response times of human observers in subjective image quality assessment. In: Proceedings of the IEEE ICME 2009, pp 910–913
46. Ercegovac M, Lang T (1992) On-the-fly rounding (computing arithmetic). IEEE Trans Comput 41(12):1497–1503
47. Etchells D (2005) Canon expo 2005—a one-company trade show. http://www.imaging-resource.com/NEWS/1126887991.html. Accessed 31 January 2013
48. Fedorovskaya E, Neustaedter C, Hao W (2008) Image harmony for consumer images. In: Proceedings of the IEEE ICIP 2008, pp 121–124
49. Fedorovskaya E, Neustaedter C, Hao W (2008) Image harmony for consumer images. In: Proceedings of the IEEE ICIP 2008, pp 121–124. doi:10.1109/ICIP.2008.4711706
50. Flickr (2013) Flickr photo sharing. http://www.flickr.com/. Accessed 31 January 2013
51. Flickr (2013) Mirflickr-25000. http://www.flickr.com/photos/tags/. Accessed 31 January 2013
52. Foggia P, Percannella G, Sansone C, Vento M (2008) A graph-based algorithm for cluster detection. Int J Pattern Recogn Artif Intell 22(5):843–860
53. Fujita H, Arikawa M (2007) Creating animation with personal photo collections and map for storytelling. In: Proceedings of the ACM EATIS 2007. ACM, New York, pp 1:1–1:8
54. Gallagher A, Chen T (2008) Clothing cosegmentation for recognizing people. In: Proceedings of the IEEE CVPR 2008, pp 1–8
55. Gallagher AC, Chen T (2007) Using group prior to identify people in consumer images. In: Proceedings of the IEEE CVPR 2007, vol 0. IEEE Computer Society, pp 1–8
56. Gallagher AC, Chen T (2009) Finding rows of people in group images. In: Proceedings of the IEEE ICME 2009. IEEE Press, New York, pp 602–605
57. Golder S (2008) Measuring social networks with digital photograph collections. In: Proceedings of the ACM HT 2008, pp 43–48
58. Google: Picasa (2013) http://picasa.google.com/. Accessed 31 January 2013
59. Gorelick L, Basri R (2009) Shape based detection and top–down delineation using image segments. Int J Comput Vis 83(3):211–232
60. Greenspun P (2013) Photo.net photography community. http://photo.net. Accessed 31 January 2013
61. Gupta A, Chen F, Kimber D, Davis LS (2008) Context and observation driven latent variable model for human pose estimation. In: Proceedings of the IEEE CVPR 2008, pp 1–8
62. Haddad Z, Beghdadi A, Serir A, Mokraoui A (2010) Image quality assessment based on wave atoms transform. In: Proceedings of the IEEE ICIP 2010, pp 305–308
63. Han HS, Kim DO, Park RH (2009) Structural information-based image quality assessment using LU factorization. IEEE Trans Consum Electron 55(1):165–171
64. Han J, Awad G, Sutherland A (2009) Automatic skin segmentation and tracking in sign language recognition. IET-CV 3(1):24–35
65. Haykin S (1999) Neural networks: a comprehensive foundation. Prentice Hall, Englewood Cliffs
66. Hedgecoe J (2009) New manual of photography. Dorling Kindersley, New York
67. Hoàng NV, Gouet-Brunet V, Rukoz M, Manouvrier M (2010) Embedding spatial information into image content description for scene retrieval. Pattern Recogn 43(9):3013–3024
68. Hsu SH, Jumpertz S, Cubaud P (2008) A tangible interface for browsing digital photo collections. In: Proceedings of the ACM TEI 2008, pp 31–32
69. Ji H, Liu C (2008) Motion blur identification from image gradients. In: Proceedings of the IEEE CVPR 2007, vol 0. IEEE Computer Society, pp 1–8
70. Jiang H, Martin D (2008) Global pose estimation using non-tree models. In: Proceedings of the IEEE CVPR 2008, pp 1–8
71. Jiang W, Loui A, Cerosaletti C (2010) Automatic aesthetic value assessment in photographic images. In: Proceedings of the IEEE ICME 2010, pp 920–925
72. Joshi N, Matusik W, Adelson EH, Kriegman DJ (2010) Personal photo enhancement using example images. ACM Trans Graphics 29(2):1–15
73. Ke Y, Tang X, Jing F (2006) The design of high-level features for photo quality assessment. In: Proceedings of the IEEE CVPR 2006, pp 419–426
74. Khan SS, Vogel D (2012) Evaluating visual aesthetics in photographic portraiture. In: Proceedings of the CAe 2012. Eurographics Association, pp 55–62
75. Kim HN, Saddik AE, Jung JG (2012) Leveraging personal photos to inferring friendships in social network services. Expert Syst Appl 39(8):6955–6966
76. Kruppa H, Bauer MA, Schiele B (2002) Skin patch detection in real-world images. In: Proceedings of the 24th DAGM Symposium on Pattern Recognition. Springer LNCS, pp 109–117
77. Lee C, Schramm MT, Boutin M, Allebach JP (2009) An algorithm for automatic skin smoothing in digital portraits. In: Proceedings of the IEEE ICIP 2009. IEEE Press, New York, pp 3113–3116
78. Lee S, Kim K, Kim JY, Kim M, Yoo HJ (2010) Familiarity based unified visual attention model for fast and robust object recognition. Pattern Recogn 43(3):1116–1128
79. Levin A, Weiss Y (2009) Learning to combine bottom–up and top–down segmentation. Int J Comput Vis 81(1):105–118
80. Li C, Gallagher AC, Loui AC, Chen T (2010) Aesthetic quality assessment of consumer photos with faces. In: Proceedings of the IEEE ICIP 2010, pp 3221–3224
81. Li C, Loui AC, Chen T (2010) Towards aesthetics: a photo quality assessment and photo selection system. In: Proceedings of the ACM MM 2010, pp 827–830
82. Li X, Ling H (2009) Learning based thumbnail cropping. In: Proceedings of the IEEE ICME 2009. IEEE Press, New York, pp 558–561
83. Li Z, Luo H, Fan J (2009) Incorporating camera metadata for attended region detection and consumer photo classification. In: Proceedings of the ACM MM 2009, pp 517–520
84. Li Z, Zhou X, Huang TS (2009) Spatial Gaussian mixture model for gender recognition. In: Proceedings of the IEEE ICIP 2009. IEEE Press, New York, pp 45–48
85. Liao WH (2009) A framework for attention-based personal photo manager. In: Proceedings of the IEEE SMC 2009. IEEE Press, New York, pp 2128–2132
86. Lim SH, Lin Q, Petruszka A (2010) Automatic creation of face composite images for consumer applications. In: Proceedings of the IEEE ICASSP 2010, pp 1642–1645
87. Liu L, Chen R, Wolf L, Cohen-Or D (2010) Optimizing photo composition. In: Proceedings of the Eurographics, vol 29, pp 469–478
88. Liu R, Li Z, Jia J (2008) Image partial blur detection and classification. In: Proceedings of the IEEE CVPR 2007, vol 0. IEEE Computer Society, Los Alamitos, pp 1–8
89. Liu T, Yuan Z, Sun J, Wang J, Zheng N, Tang X, Shum HY (2011) Learning to detect a salient object. IEEE Trans Pattern Anal Mach Intell 33(2):353–367
90. Liu Y, Xu D, Tsang IW, Luo J (2011) Textual query of personal photos facilitated by large-scale web data. IEEE Trans Pattern Anal Mach Intell 33(5):1022–1036
91. Loui A, Wood M, Scalise A, Birkelund J (2008) Multidimensional image value assessment and rating for automated albuming and retrieval. In: Proceedings of the IEEE ICIP 2008, pp 97–100
92. Loui AC, Wood MD, Scalise A, Birkelund J (2008) Multidimensional image value assessment and rating for automated albuming and retrieval. In: Proceedings of the IEEE ICIP 2008, pp 97–100
93. Lu F, Yang X, Zhang R, Yu S (2009) Image classification based on pyramid histogram of topics. In: Proceedings of the IEEE ICME 2009. IEEE Press, New York, pp 398–401
94. Luo W, Wang X, Tang X (2011) Content-based photo quality assessment. In: Proceedings of the IEEE ICCV 2011, vol 0. IEEE Computer Society, Los Alamitos, pp 2206–2213
95. Luo Y, Tang X (2008) Photo and video quality evaluation: focusing on the subject. In: Proceedings of the ECCV 2008. Springer, Heidelberg, pp 386–399
96. Lux M, Kogler M, del Fabro M (2010) Why did you take this photo: a study on user intentions in digital photo productions. In: Proceedings of the ACM SAPMIA 2010, pp 41–44
97. Maik V, Paik D, Lim J, Park K, Paik J (2010) Hierarchical pose classification based on human physiology for behaviour analysis. IET-CV 4(1):12–24
98. Mantiuk RK, Tomaszewska A, Mantiuk R (2012) Comparison of four subjective methods for image quality assessment. Comput Graphics Forum 31(8):2478–2491
99. Marshall B (2010) Taking the tags with you: digital photograph provenance. In: Proceedings of the IEEE symposium on data, privacy, and E-Commerce 2010. IEEE Computer Society, Los Alamitos, pp 72–77
100. Martin D, Fowlkes C, Tal D, Malik J (2001) A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: Proceedings of the ICCV 2001, vol 2, pp 416–423
101. Mele K, Suc D, Maver J (2009) Local probabilistic descriptors for image categorisation. IET-CV 3(1):8–23
102. Microsoft Corporation: Bing maps (2012) http://www.bing.com/maps/. Accessed 31 January 2013
103. Miller AD, Edwards WK (2007) Give and take: a study of consumer photo-sharing culture and practice. In: Proceedings of the ACM SIGCHI 2007, pp 347–356
104. Moorthy AK, Obrador P, Oliver N (2010) Towards computational models of the visual aesthetic appeal of consumer videos. In: Proceedings of the ECCV 2010. Springer, Berlin/Heidelberg, pp 1–14
105. Mu Y, Yan S, Liu Y, Huang T, Zhou B (2008) Discriminative local binary patterns for human detection in personal album. In: Proceedings of the IEEE CVPR 2008, vol 0. IEEE Computer Society, New York, pp 1–8
106. Mucientes M, Bugarín A (2010) People detection through quantified fuzzy temporal rules. Pattern Recogn 43(4):1441–1453
107. Nikon Corporation: Nikon d90 advanced function (2008). http://chsvimg.nikon.com/products/imaging/lineup/d90/en/advanced-function/. Accessed 31 January 2013
108. Obrador P (2008) Region based image appeal metric for consumer photos. In: Proceedings of the IEEE Workshop on multimedia signal processing 2008, pp 696–701
109. Obrador P, Moroney N (2009) Automatic image selection by means of a hierarchical scalable collection representation. In: Proceedings of the SPIE visual communications and image processing, San Jose, vol 7257, pp 0W.1–0W.12
110. Obrador P, de Oliveira R, Oliver N (2010) Supporting personal photo storytelling for social albums. In: Proceedings of the ACM MM 2010, pp 561–570
111. Obrador P, Schmidt-Hackenberg L, Oliver N (2010) The role of image composition in image aesthetics. In: Proceedings of the IEEE ICIP 2010, pp 3185–3188
112. O'Hare N, Lee H, Cooray S, Gurrin C, Jones G, Malobabic J, O'Connor N, Smeaton AF, Uscilowski B (2006) Mediassist: using content-based analysis and context to manage personal photo collections. In: Proceedings of the CIVR 2006, vol 4071. Springer, Heidelberg, pp 529–532
113. O'Hare N, Smeaton AF (2009) Context-aware person identification in personal photo collections. IEEE Trans Multimed 11(2):220–228
114. Oliveira CJS, Araújo AdeA, Severiano CA Jr, Gomes DR (2002) Classifying images collected on the World Wide Web. In: Proceedings of the SIBGRAPI 2002. IEEE Computer Society Press, Fortaleza, pp 327–334
115. Orbanz P, Buhmann JM (2008) Nonparametric Bayesian image segmentation. Int J Comput Vis 77(1–3):25–45
116. Paisitkriangkrai S, Shen C, Zhang J (2008) Performance evaluation of local features in human classification and detection. IET-CV 2(4):236–246
117. Pang Y, Hao Q, Yuan Y, Hu T, Cai R, Zhang L (2011) Summarizing tourist destinations by mining user-generated travelogues and photos. Comput Vis Image Underst 115(3):352–363
118. Park HJ, Har DH (2011) Subjective image quality assessment based on objective image quality measurement factors. IEEE Trans Consumer Electron 57(3):1176–1184
119. Peres M (2007) Focal encyclopedia of photography: digital imaging, theory and applications, history, and science. Elsevier Science Inc./Focal Press, Boston
120. Pierrard JS, Vetter T (2007) Skin detail analysis for face recognition. In: Proceedings of the IEEE CVPR 2007, pp 1–8
121. Presti LL, Cascia ML (2012) An on-line learning method for face association in personal photo collection. Image Vis Comput 30(4–5):306–316
122. Qi GJ, Hua XS, Rui Y, Tang J, Zhang HJ (2008) Two-dimensional active learning for image classification. In: Proceedings of the IEEE CVPR 2008, pp 1–8
123. Qin AK, Clausi DA (2010) Multivariate image segmentation using semantic region growing with adaptive edge penalty. IEEE Trans Image Process 19(8):2157–2170
124. Quinlan JR (1986) Induction of decision trees. Mach Learn 1(1):81–106
125. Rahman M, Gamadia M, Kehtarnavaz N (2008) Real-time face-based auto-focus for digital still and cell-phone cameras. In: Proceedings of the IEEE SSIAI 2008. IEEE Computer Society, Los Alamitos, pp 177–180
126. Redi JA, Heynderickx I (2012) Image integrity and aesthetics: towards a more encompassing definition of visual quality. In: Proceedings of the SPIE human vision and electronic imaging XVII 2012, vol 8291. SPIE, San Jose, pp 15.1–15.10
127. Ren T, Liu Y, Wu G (2009) Image retargeting based on global energy optimization. In: Proceedings of the IEEE ICME 2009. IEEE Press, New York, pp 406–409
128. Ren X, Fowlkes CC, Malik J (2008) Learning probabilistic models for contour completion in natural images. Int J Comput Vis 77(1–3):47–63
129. Rousson M, Paragios N (2008) Prior knowledge, level set representations & visual grouping. Int J Comput Vis 76(3):231–243
130. Russell SJ, Norvig P (2009) Artificial intelligence: a modern approach, 3rd edn. Prentice Hall, New Delhi
131. Ryoo MS, Aggarwal JK (2009) Semantic representation and recognition of continued and recursive human activities. Int J Comput Vis 82(1):1–24
132. Sandnes F (2010) Unsupervised and fast continent classification of digital image collections using time. In: Proceedings of the ICSSE 2010, pp 516–520
133. Santella A, Agrawala M, Decarlo D, Salesin D, Cohen M (2006) Gaze-based interaction for semi-automatic photo cropping. In: Proceedings of the ACM SIGCHI 2006. ACM Press, New York, pp 771–780
134. Savakis AE, Etz SP, Loui ACP (2000) Evaluation of image appeal in consumer photography. In: Proceedings of the SPIE human vision and electronic imaging V, vol 3959. SPIE, San Jose, pp 111–120
135. Schindler G, Krishnamurthy P, Lublinerman R, Liu Y, Dellaert F (2008) Detecting and matching repeated patterns for automatic geo-tagging in urban environments. In: Proceedings of the IEEE CVPR 2008, pp 208–219
136. Schmugge SJ, Jayaram S, Shin MC, Tsap LV (2007) Objective evaluation of approaches of skin detection using ROC analysis. Comput Vis Image Underst 108(1–2):41–51
137. Serrano N, Savakis A, Luo J (2002) A computationally efficient approach to indoor/outdoor scene classification. In: Proceedings of the IEEE ICPR 2002. IEEE Computer Society, Los Alamitos, pp 146–149
138. Setlur V, Takagi S, Raskar R, Gleicher M, Gooch B (2005) Automatic image retargeting. In: Proceedings of the ACM MUM 2005. ACM Press, New York, pp 59–68
139. Shen CT, Liu JC, Shih SW, Hong JS (2009) Towards intelligent photo composition-automatic detection of unintentional dissection lines in environmental portrait photos. Expert Syst Appl 36(5):9024–9030
140. Singla P, Kautz H, Gallagher A (2008) Discovery of social relationships in consumer photo collections using Markov logic. In: Proceedings of the IEEE CVPR 2008 Workshops, pp 1–7
141. Sinha P (2011) Summarization of archived and shared personal photo collections. In: Proceedings of the ACM WWW 2011, pp 421–426
142. Snavely N, Seitz SM, Szeliski R (2008) Modeling the world from internet photo collections. Int J Comput Vis 80(2):189–210
143. Sony Corporation: Sony party-shot automatic photographer (2009). http://store.sony.com/webapp/wcs/stores/servlet/ProductDisplay?catalogId=10551&storeId=10151&langId=-1&partNumber=IPTDS1. Accessed 31 January 2013
144. Stein A, Stepleton T, Hebert M (2008) Towards unsupervised whole-object segmentation: combining automated matting with boundary detection. In: Proceedings of the IEEE CVPR 2008, pp 1–8
145. Su HH, Chen TW, Kao CC, Hsu WH, Chien SY (2011) Scenic photo quality assessment with bag of aesthetics-preserving features. In: Proceedings of the ACM MM 2011, pp 1213–1216
146. Suh B, Bederson BB (2007) Semi-automatic photo annotation strategies using event based clustering and clothing based person recognition. Interact Comput 19(4):524–544
147. Suh B, Ling H, Bederson BB, Jacobs DW (2003) Automatic thumbnail cropping and its effectiveness. In: Proceedings of the ACM UIST 2003. ACM Press, New York, pp 95–104
148. Tang F, Gao Y (2009) Fast near duplicate detection for personal image collections. In: Proceedings of the ACM MM 2009, pp 701–704
149. Tian A, Zhang X, Tretter DR (2011) Content-aware photo-on-photo composition for consumer photos. In: Proceedings of the ACM MM 2011, pp 1549–1552
150. Tómasson G, Sigurþórsson H, Jónsson B, Amsaleg L (2011) Photocube: effective and efficient multi-dimensional browsing of personal photo collections. In: Proceedings of the ACM ICMR 2011, pp 70:1–70:2
151. Tong H, Li M, Zhang H, Zhang C (2004) Blur detection for digital images using wavelet transform. In: Proceedings of the IEEE ICME 2004, pp 17–20
152. Tong H, Li M, Zhang HJ, He J, Zhang C (2004) Classification of digital photos taken by photographers or home users. In: Proceedings of the Pacific Rim Conference on Multimedia. Springer, Heidelberg, pp 198–205
153. Tran C, Wijnhoven R, de With P (2011) Text detection in personal image collections. In: Proceedings of the IEEE ICCE 2011, pp 85–86
154. Tsao WK, Lee AJT, Liu YH, Chang TW, Lin HH (2010) A data mining approach to face detection. Pattern Recogn 43(3):1039–1049
155. Tsay KE, Wu YL, Hor MK, Tang CY (2009) Personal photo organizer based on automated annotation framework. In: Proceedings of the International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp 507–510
156. Valle E, Cord M, Philipp-Foliguet S, Gorisse D (2010) Indexing personal image collections: a flexible, scalable solution. IEEE Trans Consumer Electron 56(3):1167–1175
157. Vogel J, Schwaninger A, Wallraven C, Bülthoff HH (2007) Categorization of natural scenes: local versus global information and the role of color. ACM Trans Appl Percept 4(3):19.1–19.21
158. Wan D, Zhou J (2008) Stereo vision using two PTZ cameras. Comput Vis Image Underst 112(2):184–194
159. Wang H, Oliensis J (2010) Generalizing edge detection to contour detection for image segmentation. Comput Vis Image Underst 114(7):731–744
160. Wang J, Jia Y, Hua XS, Zhang C, Quan L (2008) Normalized tree partitioning for image segmentation. In: Proceedings of the IEEE CVPR 2008, vol 0. IEEE Computer Society, Los Alamitos, pp 1–8
161. Wang J, Zhu S, Gong Y (2009) Resolution-invariant image representation for content-based zooming. In: Proceedings of the IEEE ICME 2009. IEEE Press, New York, pp 918–921
162. Wang P, Ji Q (2007) Multi-view face and eye detection using discriminant features. Comput Vis Image Underst 105(2):99–111
163. Wang XF, Huang DS, Xu H (2010) An efficient local Chan–Vese model for image segmentation. Pattern Recogn 43(3):603–618
164. Wang Y, Huang Q, Gao W (2009) Pornographic image detection based on multilevel representation. IJPRAI 23(8):1633–1655
165. Wichmann FA, Drewes J, Rosas P, Gegenfurtner KR (2010) Animal detection in natural scenes: critical features revisited. J Vis 10(4):6.1–27
166. Xie S, Shan S, Chen X, Chen J (2010) Fusing local patterns of Gabor magnitude and phase for face recognition. IEEE Trans Image Process 19(5):1349–1361
167. Xie ZX, Wang ZF (2010) Color image quality assessment based on image quality parameters perceived by human vision system. In: Proceedings of the ICMT 2010, pp 1–4
168. Xin H, Ai H, Chao H, Tretter D (2011) Human head-shoulder segmentation. In: Proceedings of the IEEE FG 2011, pp 227–232
169. Xu S, Ye X, Wu Y, Giron F, Leveque JL, Querleux B (2008) Automatic skin decomposition based on single image. Comput Vis Image Underst 110(1):1–6
170. Xu Z, Sun J (2010) Image inpainting by patch propagation using patch sparsity. IEEE Trans Image Process 19(5):1153–1165
171. Yan S, Wang H, Liu J, Tang X, Huang TS (2010) Misalignment-robust face recognition. IEEE Trans Image Process 19(4):1087–1096
172. Yanagawa A, Loui AC, Luo J, Chang SF, Ellis D, Jiang W, Kennedy L, Lee K (2008) Kodak consumer video benchmark data set: concept definition and annotation. Technical report, Columbia University
173. Yang Y, Xu D, Nie F, Yan S, Zhuang Y (2010) Image clustering using local discriminant models and global integration. IEEE Trans Image Process 19(10):2761–2773
174. Yanulevskaya V, van Gemert J, Roth K, Herbold A, Sebe N, Geusebroek J (2008) Emotional valence categorization using holistic image features. In: Proceedings of the IEEE ICIP 2008, pp 101–104
175. Yao L, Suryanarayan P, Qiao M, Wang JZ, Li J (2012) OSCAR: on-site composition and aesthetics feedback through exemplars for photographers. Int J Comput Vis 96(3):353–383
176. Yeh CH, Ho YC, Barsky BA, Ouhyoung M (2010) Personalized photograph ranking and selection system. In: Proceedings of the ACM MM 2010, pp 211–220
177.
Yeh CH, Ng WS, Barsky BA, Ouhyoung M (2009) An esthetics rule-based ranking system for amateur photos. In: Proceedings of the ACM SIGGRAPH 2009, pp 24:1–24:1 178. Yi Y, Yu X, Wang L, Yang Z (2008) Image quality assessment based on structural distortion and image definition. In: Proceed- ings of the international conference on computer science and soft- ware engineering 2008(6):253–256 179. Yin W, Luo J, Chen CW (2010) Semantic adaptation of consumer photo for mobile device access. In: Proceedimgs of the ISCAS 2010, pp 1173–1176 180. YingZ,GuangyaoL,XiehuaS,XinminZ(2009)Geometricactive contours without re-initialization for image segmentation. Pattern Recogn 42(9):1970–1976 181. Yu Z, Au OC, Zou R, Yu W, Tian J (2010) An adaptive unsuper- vised approach toward pixel clustering and color image segmen- tation. Pattern Recogn 43(5):1889–1906 182. Zeng G, Gool LV (2008) Multi-label image segmentation via point-wise repetition. In: Proceedings of the IEEE CVPR 2008, vol 0. IEEE Computer Society, Los Alamitos, pp 1–8 183. Zeng YC (2009) Automatic local contrast enhancement using adaptive histogram adjustment. In: Proceedings of the IEEE ICME 2009. IEEE Press, New York, pp 1318–1321 184. Zha ZJ, Hua XS, Mei T, Wang J, Qi GJ, Wang Z (2008) Joint multi-label multi-instance learning for image classification. In: Proceedings of the IEEE CVPR 2008, vol 0. IEEE Computer Society, Los Alamitos, pp 1–8 185. Zhang H, Fritts JE, Goldman SA (2008) Image segmentation eval- uation: a survey of unsupervised methods. Comput Vis Image Underst 110(2):260–280 186. Zhang M, Zhang L, Sun Y, Feng L, Ma W (2005) Auto Cropping for Digital Photographs. In: Proceedings of the IEEE ICME 2005, pp 438–441 187. Zhang T, Chao H, Willis C, Tretter D (2010) Consumer image retrieval by estimating relation tree from family photo collections. In: Proceedings of the ACM CIVR 2010, pp 143–150 188. Zhou C, Lin S (2007) Removal of image artifacts due to sen- sor dust. 
In: Proceedings of the IEEE CVPR 2007, vol 0. IEEE Computer Society, Los Alamitos, pp 1–8 123 A survey on automatic techniques for enhancement and analysis of digital photography Abstract 1 Introduction 2 Methodology of the research 2.1 Breadth-first search 2.2 Depth-first search 2.3 Search results 2.4 Considerations on the methodology 3 Enhancement 3.1 On the fly enhancement 3.2 Off-line enhancement 4 Analysis 4.1 Assessment 4.2 Information extraction 4.3 Grouping 4.3.1 Classification and clustering 4.3.2 Summarization 4.3.3 Image retrieval 4.4 Discussion 5 Critical analysis 5.1 Image sets 5.2 Validation 6 Conclusions Acknowledgments References