INTERACTIVE DIGITAL PHOTOGRAPHY AT SCALE

by

Brian Mark Summa

A dissertation submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science

School of Computing
The University of Utah
May 2013

Copyright © Brian Mark Summa 2013
All Rights Reserved

THE UNIVERSITY OF UTAH GRADUATE SCHOOL

STATEMENT OF DISSERTATION APPROVAL

The dissertation of Brian Mark Summa has been approved by the following supervisory committee members:

    Valerio Pascucci, Chair (approved 2/25/2013)
    Paolo Cignoni, Member (approved 12/13/2012)
    Charles Hansen, Member (approved 12/14/2012)
    Christopher Johnson, Member (approved 12/14/2012)
    Paul Rosen, Member (approved 12/14/2012)

and by Alan Davis, Chair of the Department of the School of Computing, and by Donna M. White, Interim Dean of The Graduate School.

ABSTRACT

Interactive editing and manipulation of digital media is a fundamental component in digital content creation. One medium in particular, digital imagery, has seen a recent increase in the popularity of its large or even massive image formats. Unfortunately, current systems and techniques are rarely concerned with scalability or usability for these large images. Moreover, processing massive (or even large) imagery is assumed to be an offline, automatic process, although many problems associated with these datasets require human intervention for high-quality results.

This dissertation details how to design interactive image techniques that scale. In particular, massive imagery is typically constructed as a seamless mosaic of many smaller images. The focus of this work is the creation of new technologies to enable user interaction in the formation of these large mosaics. While an interactive system for all stages of the mosaic creation pipeline is a long-term research goal, this dissertation concentrates on the last phase of the mosaic creation pipeline: the composition of registered images into a seamless composite. The work detailed in this dissertation provides the technologies to fully realize interactive editing of mosaic composition on image collections ranging from the very small to the massive in scale.

To Delila

CONTENTS

ABSTRACT
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS

CHAPTERS

1. MOTIVATION AND CONTRIBUTIONS
   1.1 The Panorama Creation Pipeline
   1.2 Boundaries
   1.3 Color Correction
2. RELATED WORK
   2.1 Image Boundaries: Seams
      2.1.1 Pairwise Boundaries
      2.1.2 Graph Cuts
      2.1.3 Alternative Boundary Techniques
      2.1.4 Out-of-Core and Distributed Computation
   2.2 Color Correction: Gradient Domain Editing
      2.2.1 Poisson Image Processing
      2.2.2 Poisson Solvers
      2.2.3 Out-of-Core Computation
      2.2.4 Distributed Computation
      2.2.5 Cloud Computing - MapReduce and Hadoop
      2.2.6 Out-of-Core Data Access

3. SCALABLE AND EFFICIENT DATA ACCESS
   3.1 Z- and HZ-Order Background
   3.2 Efficient Multiresolution Range Queries
   3.3 Parallel Write
   3.4 ViSUS Software Framework
      3.4.1 LightStream Dataflow and Scene Graph
      3.4.2 Portable Visualization Layer - ViSUS AppKit
      3.4.3 Web-Server and Plug-In
      3.4.4 Additional Application: Real-Time Monitoring

4. INTERACTIVE SEAM EDITING AT SCALE
   4.1 Optimal Image Boundaries
      4.1.1 Optimal Boundaries
      4.1.2 Min-Cut and Min-Path
      4.1.3 Graph Cuts
   4.2 Pairwise Seams and Seam Trees
   4.3 From Pairwise to Global Seams
      4.3.1 The Dual Adjacency Mesh
      4.3.2 Branching Points and Intersection Resolution
         4.3.2.1 Branching points
         4.3.2.2 Removing invalid intersections
   4.4 Out-of-Core Seam Processing
      4.4.1 Branching Point and Shared Edge Phase
      4.4.2 Intersection Resolution Phase
   4.5 Weaving Interactive System
      4.5.1 System Specifics
         4.5.1.1 Input
         4.5.1.2 Initial parallel computation
         4.5.1.3 Seam network import
      4.5.2 Interactions
         4.5.2.1 Seam bending
         4.5.2.2 Seam splitting
         4.5.2.3 Branching point movement
         4.5.2.4 Branching point splitting and merging
         4.5.2.5 Improper user interaction
   4.6 Scalable Seam Interactions
   4.7 Results
      4.7.1 Panorama Creation
         4.7.1.1 In-core results
         4.7.1.2 Out-of-core results
      4.7.2 Panorama Editing
         4.7.2.1 Editing bad seams
         4.7.2.2 Multiple valid seams
   4.8 Limitations and Future Work

5. INTERACTIVE GRADIENT DOMAIN EDITING AT SCALE
   5.1 Gradient Domain Image Processing
   5.2 Progressive Poisson Solver
      5.2.1 Progressive Framework
         5.2.1.1 Initial solution
         5.2.1.2 Progressive refinement
         5.2.1.3 Local preview
         5.2.1.4 Progressive full solution
         5.2.1.5 Out-of-core solver
      5.2.2 Data Access
      5.2.3 Interactive Preview and Out-of-Core Solver Results
   5.3 Parallel Distributed Gradient Domain Editing
      5.3.1 Parallel Solver
         5.3.1.1 Data distribution as tiles
         5.3.1.2 Coarse solution
         5.3.1.3 First phase: progressive solution
         5.3.1.4 Second phase: overlap solution
         5.3.1.5 Parallel implementation details
      5.3.2 Results
         5.3.2.1 NVIDIA cluster
         5.3.2.2 Longhorn cluster
         5.3.2.3 Heterogeneous cluster
   5.4 Gradient Domain Editing on the Cloud
      5.4.1 MapReduce and Hadoop
         5.4.1.1 Input
         5.4.1.2 MapReduce transfer
      5.4.2 MapReduce for Gradient Domain
         5.4.2.1 Tiles
         5.4.2.2 Coarse solution
         5.4.2.3 First (map) phase
         5.4.2.4 Second (reduce) phase
         5.4.2.5 Storage in the HDFS
      5.4.3 Results
         5.4.3.1 Scalability
         5.4.3.2 Fault tolerance

6. FUTURE WORK

APPENDIX: MASSIVE DATASETS

REFERENCES

LIST OF FIGURES

1.1 A 360-degree panorama of the city of Toronto, Ontario, Canada, credited to Armstrong, Beere and Hime in 1856.

1.2 Massive imagery is typically constructed as a mosaic of many smaller images. (a) A panorama of Salt Lake City comprised of 624 individual images; the combined image is over 3.2 gigapixels in size. (b) The panorama after being composited into a single seamless image.

1.3 The three stages of panorama (mosaic) creation. First, the individual images must be acquired. Second, they are registered into a common coordinate system. Third, they are composited (blended) into a single seamless image. My dissertation research has been to provide technologies to enable interactivity for the final composition stage while a high-performance back-end provides a final image.

1.4 An example of the three stages of a panorama's creation. First, the individual images are acquired, typically from a handheld camera. Second, for registration, the common coordinate system for the images is computed. Third, they are composited (blended) into a single seamless image.

1.5 A diagram to illustrate the main options available during the composition stage of panorama creation. The most basic process is to merge the images directly from the registration; the simplest transition approach is an alpha-blend of the overlap areas to achieve a smooth transition between images.

1.6 A simple blending approach is usually not sufficient in mosaics with moving elements. In these cases, the elements produce "ghosts" (circled here in red) in the final blend.

1.7 (a, e) Two examples (Canoe: 6842 x 2853, 2 images and Lake Path: 4459 x 4816, 2 images) of undesirable, yet exactly optimal, seams (unique pairwise overlaps) for the pixel difference energy. (b, f) A zoom of visual artifacts caused by this optimal seam. (c, g) The pixel labeling. (d, h) The result produced by Adobe Photoshop™. Images courtesy of City Escapes Nature Photography.

1.8 Hierarchical Graph Cuts has only been shown to work well on hierarchies of two to three levels. For this four-picture panorama example, we can see that Hierarchical Graph Cuts produces a solution that passes through a dynamic scene element when using four levels of the hierarchy.
A typical input value of a ten-pixel dilation was used for this example. While a larger dilation parameter could be used, this would require a larger memory and computational cost, which negates the benefits of the technique.

1.9 Even when seams are visually acceptable, moving elements in the scene may cause multiple visually valid seam configurations. On the top, this figure shows a four-image panorama (Crosswalk: 4705 x 3543, four images) with three valid configurations. On the bottom, this figure shows a two-image panorama (Apollo-Aldrin: 3432 x 2297, two images) with two valid configurations. Images courtesy of NASA.

2.1 Although the result is a seamless, smooth image, without coarse upsampling the final image will fail to account for large trends that span beyond a single overlap and can lead to unwanted, unappealing shifts in color.

3.1 (a) The first four levels of the Z-order space filling curve; (b) a 4x4 array indexed using standard Z-order.

3.2 (a) Address transformation from row-major index (i, j) to Z-order index I (Step 1) and then to hierarchical Z-order index (Step 2); (b) levels of the hierarchical Z-order for a 4x4 array. The samples on each level remain ordered by the standard Z-order.

3.3 Our fast-stack Z-order traversal of a 4x4 array with concurrent index computation.

3.4 (a) The naive parallel strategy, where each process writes its piece of the overall dataset into the underlying file; (b) each process transmits each contiguous data segment to an intermediate aggregator; once the aggregator's buffer is complete, the data are written to disk; (c) several noncontiguous memory accesses are bundled into a single message to decrease communication overhead.

3.5 The ViSUS software framework. Arrows denote external and internal dependences of the main software components. Additionally, this figure illustrates the relationship with several example applications that have been successfully developed with this framework.

3.6 The LightStream Dataflow used for analysis and visualization of a three-dimensional combustion simulation (Uintah code). (a) Several dataflow modules chained together to provide a light and flexible stream processing capability. (b) One visualization that is the result of this dataflow.

3.7 The same application and visualization of a Mars panorama running on an iPhone 3G mobile device (a) and a powerwall display (b). Data courtesy of NASA.

3.8 Remote visualization and monitoring of simulations. (a) An S3D combustion simulation visualized from a desktop in the Scientific Computing and Imaging (SCI) Institute (Salt Lake City, Utah) during its execution on the HOPPER2 high performance computing platform at Lawrence Berkeley National Laboratory (Berkeley, California).
(b) Two ViSUS demonstrations of LLNL simulation codes (Miranda and Raptor) visualized in real time while executed on the BlueGene/L prototype installed at the IBM booth of the Supercomputing exhibit.

4.1 The four-neighborhood min-cut solution (a) with its dual min-path solution (b). The min-cut labeling is colored in red/blue and the min-path solution is highlighted in red.

4.2 (a) Given a simple overlap configuration, a seam can be thought of as a path that connects a pair of boundary intersections u and v. (b) Even in a more complicated case, a valid seam configuration is still computable by taking pairs of intersections with a consistent winding about an image boundary. Note that there is an alternate configuration, denoted in gray.

4.3 Given the two min-path trees associated with a seam's endpoints (u, v), a new seam that passes through any point in the overlap (yellow) is a simple linear walk up each tree.

4.4 (a) A solution to the panorama boundary problem can be considered as a network of pairwise boundaries between images. (b) Our adjacency mesh representation is designed with this property in mind. Nodes correspond to panorama images, edges correspond to boundaries, and branching points (intersections in red) correspond to faces of the mesh. (c) Graph Cuts optimization can provide more complex pixel assignments, where "islands" of pixels assigned to one image can be completely bounded by another image. Our approach simplifies the solution by removing such islands.

4.5 (a) A three-overlap adjacency mesh representation. (b) A four-overlap initial quadrilateral adjacency mesh with its two valid mesh subdivisions. (c) A five-overlap pentagon adjacency mesh with an example subdivision.

4.6 Considering the full neighborhood graph of a panorama (a), where an edge exists if an overlap exists between a pair of images, an initial valid adjacency mesh (b) can be computed by finding all nonoverlapping, maximal cliques in the full graph, then activating and deactivating edges based on the boundary of each clique.

4.7 (a) Pairwise seam endpoints closest to a multioverlap (red) are considered a branching point. (b) This can be determined by finding a minimum point in the multioverlap with respect to min-path distance from the partner endpoints. (c) After the branching point is found, the new seams are computed by a linear lookup up the partner endpoint's seam tree. (d) To enable parallel computation, each branching point is computed using the initial endpoint location (green) even if it was moved via another branching point calculation (red).
4.8 (a) Pairwise seams may produce invalid intersections or crossings in a multioverlap, which leads to an inconsistent labeling of the domain. The gray area on the top can be given the labels A or B, and on the bottom either C or D. (b) Choosing a label is akin to collapsing one seam onto the other. This leads to new image boundaries, which were based on energy functions that do not correlate to this new boundary. The top collapse results in a B-C boundary using an A-B seam (a C-D seam for the bottom). (c and d) Our technique performs a better collapse, where each intersection point is connected to the branching point via a minimal path that corresponds to the proper boundary (B-C). One can think of this as a virtual addition of a new adjacency mesh edge (B-C) at the time of resolution to account for the new boundary.

4.9 The phases of out-of-core seam computation. (a) First, branching points are computed. The seams for all unshared edges can also be computed during this pass. (b) Second, once the corresponding branching points are computed, all shared edges can be computed with a single min-path calculation. (c) Third, once all the seams for the edges of a given face have been computed, the intersections can be resolved. Note that the three passes do not necessarily need to be three separate phases, since they can be interleaved when the proper input data are ready.

4.10 The low-memory branching point calculation for our out-of-core seam creation technique. (a) Given a face for which a branching point needs to be computed, (b) the computation proceeds "round-robin" on the edges of the face to compute the needed seam trees. The images that correspond to the edge endpoints and the overlap energy are only needed during the seam tree calculation for a given edge on the face. Therefore, by loading and unloading these data during the "round-robin" computation, the memory overhead for the branching point computation is the cost of storing two images, one energy overlap buffer, and one buffer for the seam trees of the given face.

4.11 For intersections that require a resolution seam, the two images that correspond to the overlap needed for the seam must be loaded. In the figure above, these images are the ones that correspond to the endpoints of the diagonal, resolution adjacency mesh edge.

4.12 Overview of Panorama Weaving. The initial computation is given by steps one through four, after which the solution is ready and presented to the user. Interactions, steps five and six, use the tree update in step four as a background process. Additionally, step six updates the dual adjacency mesh.

4.13 Importing a seam network from another algorithm. The user is allowed to import the result generated by Graph Cuts (a) and adjust the seam between the green and purple regions to unmask a moving person (b). Note that this edit has only a local effect and that the rest of the imported network is unaltered.

4.14 Improper user constraints are resolved or, if resolution is not possible, visual feedback is given. (a) Resolution of an intersection caused by a user moving a constraint. (b) Resolution of an intersection caused by a user moving a branching point.
(c) A nonresolvable case, where the user is just provided a visual cue of the problem.

4.15 Given the inherent locality of the seam editing interactions, only a very small subset of the adjacency mesh needs to be considered. (a) For operations on an adjacency mesh face (i.e., branching point operations), only the images and overlaps of the corresponding face and its one-face neighborhood need to be loaded and computed. (b) For edge operations (i.e., bending), we need consider only the faces that share the edge.

4.16 When a user selects an area of a panorama to edit, the system must determine which overlaps intersect with the selected area. This can be accomplished with (a) a bounding hierarchy of the overlaps. During selection, this hierarchy is traversed to isolate the proper overlaps for the selection. This gives a logarithmic lookup with respect to the number of adjacency mesh faces in the panorama. Alternatively, (b) if a pixel-to-image labeling is provided, it can be used to isolate a fixed neighborhood that needs to be tested for overlap intersection. This labeling is commonly computed if the panorama is to be fed into a color correction routine after seam computation.

4.17 Fall Salt Lake City, 126,826 x 29,633, 3.27 gigapixels, 611 images. (a) An example window computed with the out-of-core Graph Cuts technique introduced in Kopf et al. [94]. This single window took 50 minutes for Graph Cuts to converge, with the initial iteration requiring 10.2 minutes. Since the full dataset contains 495 similar windows, using the windowed technique would take days (85.15 hours) at best and weeks (17.2 days) in the worst case. (b) The full-resolution Panorama Weaving solution was computed in 68.4 minutes on a single core and 9.5 minutes on eight cores. Our single-core implementation required a peak memory footprint of only 290 megabytes, while using eight cores had a peak memory of only 1.4 gigabytes.

4.18 Lake Louise, 187,069 x 40,202, 7.52 gigapixels, 1512 images. The Panorama Weaving results for the Lake Louise panorama. Our out-of-core seam computation produces this full-resolution solution in as little as 37.7 minutes while requiring at most only 2.0 gigabytes of memory. Panorama courtesy of City Escapes Nature Photography.

4.19 Panorama Weaving on a challenging dataset (Nation, 12848 x 3821, nine images) with moving objects during acquisition, registration issues, and varying exposure. Our initial automatic solution (b) was computed in 4.6 seconds at full resolution, with a result of lower seam energy than Graph Cuts. Additionally, we present a system for the interactive user exploration of the seam solution space (c), easily enabling: (d) the resolution of moving objects, (e) the hiding of registration artifacts (split pole) in low-contrast areas (scooter), or (f) the fixing of semantic notions for which automatic decisions can be unsatisfactory (stoplight colors are inconsistent after the automatic solve). The user editing session took only a few minutes. (a) The final, color-corrected panorama.

4.20 Repairing nonideal seams may give multiple valid seam configurations.
(a) The initial seam configuration for the Skating dataset (9400 x 4752, six images) based on gradient energy. (b and c) Its two major problem areas. (d and e) Using our technique, a user can repair the panorama but also has the choice of two valid seam configurations. Panorama courtesy of City Escapes Nature Photography.

4.21 A panorama taken by Neil Armstrong during the Apollo 11 moon landing (Apollo-Armstrong: 6,913 x 1,014, eleven images). (a) Registration artifacts exist on the horizon. (b) Our system can be used to hide these artifacts. (c) The final color-corrected image. Panorama courtesy of NASA.

4.22 In this example (Graffiti: 10,899 x 3,355, ten images), (a) the user fixed a few recoverable registration artifacts and tuned the seam location for improved gradient-domain processing, yielding a colorful color-corrected graffiti. (b) Our initial automatic solution (energy function based on pixel gradients). (c) The user-edited panorama. The editing session took 2 minutes.

4.23 The color-corrected, user-edited examples from Figure 1.7. The artifacts caused by the optimal seams can be repaired by a user. Images courtesy of City Escapes Nature Photography.

4.24 A lake vista panorama (Lake: 7,626 x 1,231, 22 images) with canoes that move during acquisition. In all, there are six independent areas of movement; therefore, there are 64 possible seam configurations of different canoe positions. Here we illustrate two of these configurations with color-corrected versions of the full panorama (a and c) and a zoomed-in portion of each panorama (b and d) showing the differing canoe positions. Panorama courtesy of City Escapes Nature Photography.

4.25 Splitting a five-valence branching point based on gradient energy for the Fall-5way dataset (5211 x 5177, 5 images): as the user splits the pentagon, the resulting seams mask/unmask the dynamic elements. Note that each branching point that has a valence higher than 3 can be further subdivided.

5.1 Our adaptive refinement scheme using simple difference averaging. (a) Global progressive upsampling of the edited image computed by a background process. (b) View-dependent local refinement based on a 2k x 2k window. In both cases, we speed up the SOR solver with an initial solution obtained by smooth refinement of the solution.

5.2 Subsampled and tiled hierarchies. (a) A subsampled hierarchy. As expected, subsampling has the tendency to produce high-frequency aliasing, though details such as the cars on the highway and in the parking lots are preserved. (b) A tiled hierarchy. This produces a more visually pleasing image at all resolutions, but at the cost of potentially losing information. The cars are now completely smoothed away. Data courtesy of the U.S. Geological Survey.

5.3 Our progressive framework using subsampled and tiled hierarchies.
(a) A composite satellite image of Atlanta, over 100 gigapixels at full resolution, overlaid on the Blue Marble background, subsampled; (b) a tiled version of the same satellite image; (c) the seamless cloning solution using subsampling; (d) the same solution computed using a tiled hierarchy; (e) the solution offset computed using subsampling; (f) the solution computed using tiles; (g) a full-resolution portion computed using subsampling; (h) the same portion using tiling. Note that even though there is a slight difference in the computed solution, both the tiled and the subsampled hierarchies produce a seamless stitch with our framework. Data courtesy of the U.S. Geological Survey and NASA's Earth Observatory.

5.4 The Edinburgh panorama, 16,950 x 2,956 pixels. (a) Our coarse solution computed at a resolution of 0.7 megapixels; (b) the same panorama solved at full resolution with our progressive global solver, scaled to approximately 12 megapixels for publication; (c) a detail view of a particularly bad seam from the original panorama; (d) the problem area previewed using our adaptive local refinement; (e) the problem area solved at full resolution using our global solver in 3.48 minutes.

5.5 The RMS error, when compared to the ideal analytical solution, as iterations increase for both methods. Streaming multigrid has better convergence and less error for the Edinburgh example (a), though our method remains stable for the larger Salt Lake City panorama (b). Notice that every plot has been scaled independently to best illustrate the convergence trends of each method.

5.6 Panorama of Salt Lake City, 3.27 gigapixels, obtained by stitching 611 images. (a) Mosaic of the original images. (b) Our solution computed at 0.9 megapixel resolution. (c) The full solution provided by our global solver. (d) The difference image between our preview and the full solution at the preview resolution. Both (a) and (c) have been scaled for publication to approximately 12.9 megapixels.

5.7 A comparison of our adaptive local preview on a portion of the Salt Lake City panorama at one half of the full resolution; (a) the original mosaic, (b) our adaptive preview, (c) the full solution from our global solver, and (d) the difference image between the adaptive preview and the full solution.

5.8 A comparison of our system with the best known out-of-core method [Kazhdan and Hoppe 2008] and a full analytical solution on a portion of the Salt Lake City panorama, 21201 x 24001 pixels, 485 megapixels. (a) The full analytical solution; (b) our solution, computed in 28.1 minutes; (c) the solution from [Kazhdan and Hoppe 2008], computed in 24.9 minutes; (d) the analytical solution where the solver is allowed to harmonically fill the boundary; (e) our solution with harmonic fill; (f) the solution from [Kazhdan and Hoppe 2008] with harmonic fill; (g) the map image used by all solvers to construct the panorama, where red indicates the image that provides the pixel color and white denotes the panorama boundary.
5.9 Application of our method to HDR image compression: (a) The original synthetic HDR image of an adaptively refined Sierpinski sponge generated with POV-Ray. (b) The tone-mapped image, with recovery of detailed information previously hidden in the shadows. (c) The Belgium House image solved using our coarse-to-fine method with an initial 16 x 12 coarse solution (α = 0.01, β = 0.7, compression coefficient = 0.5). (d) The direct analytical solution. Image courtesy of Raanan Fattal.

5.10 A satellite imagery collection with a background given by a 3.7-gigapixel image from NASA's Blue Marble Collection. The Progressive Poisson solver allows the application of the seamless cloning method to two copies of the city of Atlanta, each of 116 gigapixels. An artist can interactively place a copy of Atlanta under shallow water and recreate the lost city of Atlantis. Data courtesy of the U.S. Geological Survey and NASA's Earth Observatory.

5.11 Our tile-based approach: (a) An input image is divided into equally spaced tiles. In the first phase, after a symbolic padding by a column and row in all dimensions, a solver is run on a window denoted by a collection of four labeled tiles. Data are sent and collected for the next phase to create new data windows with a 50% overlap. (b) An example tile layout for the Fall Panorama example.

5.12 Windows are distributed as evenly as possible across all nodes in the distributed system. Windows assigned to a specific node are denoted by color above. Given the overlap scheme, data transfer only needs to occur one-way, denoted by the red arrows and boundary above. To avoid starvation between phases and to hide as much data transfer as possible, windows are processed in inverse order (white arrows) and the tiles needed by other nodes are transferred immediately.

5.13 Fall Panorama, 126,826 x 29,633, 3.27 gigapixels. (a) The panorama before seamless blending and (b) the result of the parallel Poisson solver run on 480 cores with 124 x 29 windows, computed in 5.88 minutes.

5.14 Winter Panorama, 92,570 x 28,600, 2.65 gigapixels. (a) The result of the parallel Poisson solver run on 480 cores with 91 x 28 windows, computed in 6.02 minutes; (b) the panorama before seamless blending; and (c) the coarse panorama solution.

5.15 The two phases of a MapReduce job. In the figure, three map tasks produce key/value pairs that are hashed into two bins corresponding to the two reduce tasks in the job. When the data are ready, the reducers grab their needed data from the mappers' local disks.

5.16 A diagram of the job control and data flow for one Task Tracker in a Hadoop job.
The dotted red arrows indicate data flow over the network; dashed arrows represent communication; the blue arrow indicates a local data write; and the black arrows indicate an action taken by the node.

5.17 Although the result is a smooth image, without coarse upsampling the final image will fail to account for large trends that span beyond a single overlap and can lead to unwanted shifts in color. Notice the vertical banding denoted by the red arrows.

5.18 The 512 x 512 tiles used in our Edinburgh (a), Redrock (b), and Salt Lake City (c) examples.

5.19 Our tile-based approach: An input image is divided into equally spaced tiles. In the map phase, after a symbolic padding by a column and row in all dimensions, a solver is run on a collection of four tiles, labeled by numbers above. After the mapper finishes, it assigns a key such that each reducer runs its solver on a collection of four tiles that have a 50% overlap with the previous solutions.

5.20 The results of our cloud implementation, from top to bottom: Edinburgh, 25 images, 16,950 x 2,956, 50 megapixels, and the solution to Edinburgh from our cloud implementation; Redrock, nine images, 19,588 x 4,457, 87 megapixels, and the solution to Redrock from our cloud implementation; Salt Lake City, 611 images, 126,826 x 29,633, 3.27 gigapixels, and the solution to Salt Lake City from our cloud implementation.

5.21 (a) The scalability plot for the Edinburgh (50 megapixel) panorama on our one-node, 8-core test desktop; (b) the scalability plot for the Redrock (87 megapixel) panorama on the same machine.

6.1 A typical example of interaction during panorama registration, from the open-source Hugin [77] software tool. Current interaction is limited to the manual selection and deletion of feature points used during registration.

LIST OF TABLES

4.1 Performance results comparing Panorama Weaving to Graph Cuts for our test datasets that contain more than simple pairwise overlaps. Panorama Weaving run serially (PW-S) computes solutions quickly. When run in parallel, runtimes are reduced to just a few seconds. The energy ratio (E. ratio) between the final seam energy produced by Panorama Weaving and Graph Cuts (PW Energy / GC Energy) is shown. For all but one dataset (Fall-5way), Panorama Weaving produces a lower energy result; it is comparable otherwise. Panorama image sizes are reported in megapixels (MP).

4.2 Strong scaling results for the Fall Salt Lake City panorama, 126,826 x 29,633, 3.27 gigapixels, 611 images. Our out-of-core Panorama Weaving technique scales very well in terms of efficiency percentage, compared to ideal scaling, up to the physical cores of our test system (eight cores). At eight cores, our technique loses a slight amount of efficiency due to our implementation having a dedicated thread for handling the seam scheduling.
Using the full eight cores to process this panorama provides a full-resolution seam solution in just 9.5 minutes. The system is extremely light on memory, using at most 1.4 gigabytes.

4.3 Strong scaling results for the Lake Louise panorama, 187,069 x 40,202, 7.52 gigapixels, 1512 images. Like the smaller Fall Salt Lake City panorama, our implementation shows very good efficiency up to the physical number of cores on our test system. Using the full eight cores, the full-resolution seam solution for this panorama requires 37.7 minutes of compute time and at most 2.0 gigabytes of memory.

5.1 The strong scaling results for the Fall Panorama run on the NVIDIA cluster from 2-60 nodes, up to a total of 480 cores. Overhead (O/H) due to MPI communication and I/O is also provided, along with its percentage of actual running time. The Fall Panorama, due to its larger size, begins to lose efficiency at around 32 nodes, when I/O overhead begins to dominate. Even with this overhead, the efficiency (Eff.) remains acceptable.

5.2 The strong scaling results for the Winter Panorama run on the NVIDIA cluster from 2-60 nodes, up to a total of 480 cores. Overhead (O/H) due to MPI communication and I/O is also provided, along with its percentage of actual running time. For the Winter Panorama, the I/O overhead does not affect performance up to 60 nodes, and the implementation maintains efficiency (Eff.) throughout all of our runs.

5.3 Weak scaling tests run on the NVIDIA cluster for the Fall Panorama dataset. As the number of cores increases, so does the image resolution to be solved. The image was expanded from the center of the full image. Iterations of the solver for all windows were locked at 1000 for testing, to ensure no variation is due to slower-converging image areas. As is shown, our implementation shows good efficiency even when running on the maximum number of cores.

5.4 To demonstrate the portability of our implementation, we have run strong scalability testing for the Fall Panorama on the Longhorn cluster from 2-128 nodes, up to a total of 1024 cores. As the numbers show, we maintain good scalability and efficiency even when running on all available nodes and cores.

5.5 Weak scaling tests run on the Longhorn cluster for the Fall Panorama dataset.

5.6 Our simulated heterogeneous system. This test example is a simulated mixed system of two 8-core nodes, four 4-core nodes, and eight 2-core nodes. The weights for our framework are the number of cores available in each node. The timings and window distributions are for the Fall Panorama dataset. As shown, with the proper weightings our framework can distribute windows proportionally based on the performance of the system. The maximum runtime of 32.70 minutes for this 48-core system is on par with the timings for the 32-core (40.08 minutes) and 64-core (20.83 minutes) runs from the strong scaling test.
A.1 Massive panorama data acquired and used in this dissertation work.

A.2 Massive satellite data acquired and used in this dissertation work.

ACKNOWLEDGEMENTS

This dissertation was made possible through the support of others whom I would like to thank:

First, I would like to thank my family, whose endless support made this work possible. Thank you Mom, Dad, Chris, Jason, and Amira for your wholehearted support of my decision to quit my job and move across the country to go back to school (in Utah of all places). Most importantly, I would like to thank Delila, my partner in all of this, for taking this adventure with me. You are the source of my inspiration in all things.

I would like to thank my advisor and mentor, Valerio Pascucci, for his continued guidance and encouragement. We both took a very big chance in our first semester as student and professor at Utah, which, I believe, reaped a very large reward. Since the start of my work, he has given me the support and confidence I needed to succeed, and for this I am grateful. I hope this dissertation is not the end, but the beginning of a long collaboration.

I would also like to thank the other members of my committee, Chris, Chuck, Paul, and Paolo, for their feedback on this work. I would especially like to thank Krzysztof Sikorski, who was a member of my committee for most of my time at Utah. Despite his illness, he was a constant source of guidance and encouragement, for which I am grateful.

Finally, I would like to thank the many collaborators I have had while here at Utah, and I hope you forgive me for not writing out the full list. This work was only possible through these collaborations. I cannot express my perpetual astonishment at what we were, and are, able to accomplish together.

CHAPTER 1

MOTIVATION AND CONTRIBUTIONS

Interactive editing and manipulation of digital media is a fundamental component in digital content creation. One medium in particular, digital imagery, has seen a recent increase in the popularity of its large or even massive image formats. Unfortunately, current systems and techniques are rarely concerned with scalability or usability for these large images. For example, the support for large imagery in the most prevalent interactive image editing application, Adobe Photoshop™, lacks true viability for today's massive images: the application's large image format has a 90-gigapixel maximum image size, limited editing functionality beyond 900 megapixels, and tedious processing times during an interactive session. Moreover, the creation and processing of large imagery is assumed to be an offline, automatic process, though many of the problems associated with these datasets require human intervention for repair. The work outlined in this dissertation will show that this expensive, offline assumption need not be true and that real-time interaction provides new and powerful environments for the creation and editing of massive images. Specifically, this work will detail how to design interactive image processing algorithms that scale.

There has always been an inherent human desire to document or replicate large vistas of our natural world or to document historical events in detail. Panoramic paintings reached the height of their popularity in the early 19th century due to improvements in perspective drawing techniques.
A few decades later, the advent of modern photography was closely followed by the earliest work in the creation of panoramic images; see Figure 1.1. In the years since, the popularity of panoramas has not waned; see Figure 1.2. These large, sweeping images capture the feeling of being an observer, whether of a beautiful natural view, a historic event such as a Presidential inauguration,¹ or a reminder of the destruction of war.² Consequently, there exists a significant interest in creating and using large mosaics for personal, scientific, and/or commercial applications. Examples include medical imaging, where electron microscopy data are composited into ultra-high-resolution images [159], or the study of phenology and genomics.³ Massive imagery is also common in geographic information systems (GIS) in the form of aerial or satellite data, used for anything from urban planning to global climate research.

¹ Barack Obama Presidential Inauguration: http://gigapan.org/gigapans/15374/
² Hiroshima Panorama Project: http://www.iwu.edu/rwilson/hiroshima/

Figure 1.1: A 360-degree panorama of the city of Toronto, Ontario, Canada, credited to Armstrong, Beere and Hime in 1856.

Figure 1.2: Massive imagery is typically constructed as a mosaic of many smaller images. (a) A panorama of Salt Lake City comprised of 624 individual images. The combined image is over 3.2 gigapixels in size. (b) The panorama after being composited into a single seamless image.

While many-megapixel cameras do exist,⁴ they are overly expensive and unwieldy to use. Therefore, massive imagery is typically constructed as a mosaic of many smaller images.

³ GigaVision Project: http://www.gigavision.org/
⁴ Seitz 6x17 Digital: http://www.roundshot.ch/xmL1/internet/de/application/d438/d925/f934.cfm

At one time, images such as the one in Figure 1.1 were painstakingly constructed by hand. Recent innovations in algorithms and available hardware have drastically simplified the creation of small-scale panoramas. This process can be computed simply offline and can now be embedded in commodity cameras (e.g., the Sony Cyber-shot 3D Sweep Panorama) or mobile devices such as Apple's iPhone. The panoramas for these algorithms are assumed to be small and, therefore, the algorithms are not designed to scale. For example, the iPhone's panorama feature, released in September 2012, has a strict 28-megapixel upper limit on the panorama size; this is small by today's standards. An online search for the word "gigapixel" powerfully demonstrates the increasing desire to create ever larger panoramas. To date, the largest panorama contains roughly 272 gigapixels, yet if the current trend continues, this record is bound to be broken within a few months. This trend is aided by the introduction of new, high-resolution image sensors. For example, with current state-of-the-art 36.3 active megapixel CMOS sensors, it would take as little as 70 images to produce a gigapixel panorama (a rough estimate follows below). Creating panoramas at large scales has become exponentially more difficult than the simple, small cases for which panorama techniques were originally designed.
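As a rough check on these sensor numbers (a back-of-the-envelope estimate of my own, not a figure from this dissertation), the sensor resolution bounds the image count from below, and the gap between that bound and the quoted 70 images is explained by the overlap that registration requires:

\[
\frac{10^9~\text{pixels}}{36.3 \times 10^6~\text{pixels/image}} \approx 28~\text{images with zero overlap},
\qquad
70 \times 36.3~\text{MP} \approx 2.54~\text{GP}.
\]

In other words, roughly 60% of the pixels captured in a 70-image gigapixel panorama are redundant, which is consistent with the substantial pairwise overlaps that registration and seam computation rely on.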
For example, the 3.2-gigapixel panorama shown in Figure 1.2 took several hours to capture and an order of magnitude more time to process on conventional hardware. Furthermore, this timeline assumes a perfect capture and one-time processing. In practice, the process of setting up an automated camera is complicated and error prone, and problems such as unanticipated occlusion or global misalignment often occur. Unfortunately, many of these issues are only subtly expressed in the individual images and become apparent only after the creation of the final panorama. Additionally, today's processing pipelines are less than ideal and typically involve a large number of interdependent and unintuitive parameter choices. A mistake or unlucky choice in the setup can easily cause unacceptable artifacts in the image, requiring a repeat of the process. Consequently, it may take several weeks and significant computational resources to produce one large-scale panorama. This makes it difficult for all but a select few to create such images and makes this imagery impractical for many interesting applications. For example, acquiring imagery from unusual locations, such as national parks, or covering transient events, like an aurora, becomes a significant logistical and monetary challenge. Furthermore, in security applications, waiting hours or days for viable results defeats the primary purpose of acquiring the images. Finally, in scientific applications, while typically less time constrained, the personal and computational resources necessary to create a large-scale image of the night sky, for example, are beyond the reach of all but the largest projects. Therefore, despite significant interest, creating these massive images remains an esoteric hobby or a closely guarded research project. Work must be done to close the gap between the desire to create large-scale panoramas (and their potential applications) and the ability to capture, process, and utilize such imagery.

An ideal panorama system should allow a user to browse individual images as they are acquired, to set up and preview the processing pipeline and results through accurate, real-time approximations, and it should include a flexible and scalable offline component to produce a final image. The system should be divided into two components: first, a real-time framework to supervise and steer the acquisition process and to guide the postprocessing; and second, a computational back-end based on a distributed or cloud computing framework. The real-time system should be able to run on devices as small as an iPad or netbook computer and be designed to be used in the field to detect any problems as early as possible. The back-end fills the gap between commonly available but slow commodity hardware and specialized distributed computing resources. By implementing a flexible framework able to run on a wide variety of heterogeneous systems, the back-end scales gracefully from a single multicore machine, or a small cluster, to more powerful hardware. My dissertation research has been the creation of technologies to aid in the creation of such a system.

1.1 The Panorama Creation Pipeline

Creating large-scale panoramas can be divided into three stages: acquisition, image registration, and composition; see Figures 1.3 and 1.4. Each stage has individually been the focus of a large amount of research, but little effort has been spent on real-time performance or on their interdependence.
Traditionally, all three stages are treated as separate postprocesses, making performance or pipelining a secondary consideration. However, as discussed above, this approach is rapidly becoming unsustainable as long acquisition and processing times lead to errors as well as an increased number of hardware and software failures. To this end, my research goals are and have been to develop algorithms for each stage that produce high quality approximations in real-time and to provide a scalable infrastructure to create full solutions exploiting all available hardware. Such new technologies would enable a wide variety of applications currently infeasible. For example, the ability to quickly and cheaply produce high resolution images of art galleries, historical events, or national parks would greatly benefit schools, universities, and the public in general. Enabling the military to combine footage from multiple security cameras, satellites, or flying drones into a seamless overview would allow operators to more accurately spot changes in a secure area or direct ground operations.

Figure 1.3: The three stages of panorama (mosaic) creation. First, the individual images must be acquired. Second, they are registered into a common coordinate system. Third, they are composited (blended) into a single seamless image. My dissertation research has been to provide technologies to enable interactivity for the final composition stage while a high-performance back-end provides a final image.

Figure 1.4: An example of the three stages of a panorama's creation (acquisition, registration, composition). First, the individual images are acquired, typically from a handheld camera. Second, for registration, the common coordinate system for the images is computed. Third, they are composited (blended) into a single seamless image.

While a full system is a long-term research goal, my dissertation work has focused on the last phase of the mosaic creation pipeline, specifically the composition stage. After registration, image mosaics are combined in order to give the illusion of a seamless, massive image. Images acquired with inexpensive robots and consumer cameras pose an interesting challenge for image processing techniques. Often, panorama robots can take seconds between each photograph, causing gigapixel-sized images to be taken over the course of hours. Due to this delay, images can vary significantly in lighting conditions and/or exposure, and when registered they can form an unappealing patchwork. Dynamic objects may also move during acquisition, ruining the illusion of a single, seamless image. Images acquired by air or satellite suffer from an extreme version of this problem, where the time of acquisition can vary from hours to days for a single composite. Therefore, minimizing the transition between images is the fundamental step in the composition stage, see Figure 1.5. The simplest transition approach is an alpha-blend of the overlap areas. Szeliski [155] provides an excellent introduction to this and other blending techniques. Such an approach does not work well in the presence of dynamic elements which move between captures, artifacts from poor registration, or varying exposures across images, see Figure 1.6. Often, it is preferable to compute a "hard" boundary, or seam, between the images as a final step, or as the preprocess for a technique such as gradient domain blending [133, 103].
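For reference, the alpha-blend just mentioned is a per-pixel convex combination of the registered images; the notation below is the standard formulation, not one reproduced from this chapter:

    \[
    C(x) = \alpha(x)\,A(x) + \bigl(1 - \alpha(x)\bigr)\,B(x), \qquad \alpha(x) \in [0, 1],
    \]

where A and B are the two registered images and the weight \alpha ramps smoothly from 1 on A's side of the overlap to 0 on B's side.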
Techniques exist to compute these seams based purely on distance [174, 132], but like blending, these perform poorly when the scene contains moving elements. A more sophisticated approach is to compute the boundaries between images through the minimization of an energy function to produce a pleasing transition between the mosaic images. These boundaries often provide the illusion of a seamlessly composited image. If exposure or lighting conditions vary among the images, a final color correction is necessary to produce a smooth image. Techniques such as gradient domain blending [133, 103], mean value coordinates [54], or bilateral upsampling [93] have been shown to provide adequately smooth images. Gradient domain blending remains the most popular, but also the most computationally expensive, technique for color correction due to the quality of its final results. The techniques associated with panorama boundaries and blending are typically computationally expensive and are considered an offline postprocess for large (and even small) panoramas. As the focus of my dissertation work, I provide novel algorithms and techniques to bring these operations into an interactive setting for massive imagery.

Figure 1.5: A diagram illustrating the main options available during the composition stage of panorama creation. The simplest process is to merge the images directly from the registration; an alpha-blend of the overlap areas achieves a smoother transition between images.

Figure 1.6: A simple blending approach is usually not sufficient in mosaics with moving elements. In these cases, the elements produce "ghosts" (circled here in red) in the final blend.

1.2 Boundaries

In the past, panorama image collections were captured in one sweeping motion (i.e., with image overlaps in only one dimension, as in Figure 1.7). Today's images are often collections of multiple rows and columns, or occur in even more unstructured configurations. Consequently, more sophisticated panorama processing techniques continue to be developed to account for these more complex configurations. After the initial registration, the panorama's individual images are blended to give the illusion of a single seamless image. As a usual first step, a boundary between images must be computed as input for a color correction technique such as gradient domain blending [133, 103]. These boundaries are often called seams.

Figure 1.7: (a, e) Two examples (Canoe: 6842 x 2853, 2 images, and Lake Path: 4459 x 4816, 2 images) of undesirable, yet exactly optimal seams (unique pairwise overlaps) for the pixel difference energy. (b, f) A zoom of visual artifacts caused by this optimal seam. (c, g) The pixel labeling. (d, h) The result produced by Adobe Photoshop™. Images courtesy of City Escapes Nature Photography.

Using a global optimization technique, these seams can be optimized to minimize visual artifacts due to the transition between images. The objective is typically a pixel-based energy function such as color or color-gradient variation across the boundary. Currently, the most widely used technique for global seam computation in a panorama is the Graph Cuts algorithm [26, 24, 92]. This is a popular and robust computer vision technique and has been adapted [101, 4] to compute the boundary between a collection of images.
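A representative pairwise seam energy, in the spirit of the pixel-difference costs used in graph-cut compositing [101, 4] (the exact functional varies from method to method), charges for color mismatch on both sides of the cut. For a seam separating adjacent overlap pixels p and q,

    \[
    E(p, q) = \lVert A(p) - B(p) \rVert + \lVert A(q) - B(q) \rVert,
    \]

and the total seam cost is the sum of E over all adjacent pixel pairs split by the seam.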
While this technique has been used with good success for a variety of panoramic or similar graphics applications [101, 4, 5, 3, 2, 94, 89, 44, 90], it can be problematic due to its high computational cost and memory requirements. Moreover, Graph Cuts applied to digital panoramas is typically a serial operation. Since computing the globally optimal boundaries between images is known to be NP-hard when the panorama is composed of more than a collection of unique pairwise overlaps [26], Graph Cuts aims to efficiently approximate the optimal solution and can therefore fall into local minima of the solution space. There has been a large body of work to reduce some of the costs associated with Graph Cuts [25, 24, 142, 5, 107, 61, 124, 167, 138, 69, 106], but each of these works primarily focuses on Graph Cuts' typical image segmentation or denoising applications. The success of many of these algorithms has yet to be demonstrated for digital panoramas. Those which have been used in a panorama context can suffer from limitations. For example, the popular Hierarchical Graph Cuts technique [107, 5] has been shown to operate well only on hierarchies of up to two to three levels in digital panoramas [5], which can be observed in practice in panoramas such as the one shown in Figure 1.8. Given the recent trend of increasing panorama resolution (many megapixels to gigapixels), one can see that this limited hierarchy would not be sufficient to compute the seams of images of these sizes. As a second example, Graph Cuts often needs an integer-based energy function to guarantee convergence. This can prove problematic for high dynamic range (HDR) panoramas. To overcome these types of limitations, a new approach was designed based on the following observations:

• A minimal energy seam does not necessarily give visually pleasing results. Figure 1.7 provides two examples of panoramas with an exact pairwise optimal energy boundary based on pixel difference across the seam. Such an energy should be sensitive to dynamic, moving objects that appear in the overlap. As one can see, neither seam would be considered ideal by a user, since both cut through moving objects in the scene. Additionally, to further argue the importance of this observation, the figure also shows very similar seams computed by Adobe Photoshop™, a widely used image editing application.

Figure 1.8: Hierarchical Graph Cuts has only been shown to work well on hierarchies of two to three levels (panels: full resolution; two, three, and four levels). For this four picture panorama example, we can see that Hierarchical Graph Cuts produces a solution that passes through a dynamic scene element when using four levels of the hierarchy. A typical input value of a ten pixel dilation was used for this example. While a larger dilation parameter could be used, this would require a larger memory and computational cost, which negates the benefits of the technique.

• There can be more than one valid seam solution. Even if the initial seam solution is visually acceptable to the user, there may be a large number of additional, valid solutions. Some of these alternative seams may be preferable, and this determination is completely subjective. For example, a user may have wished that the high energy in a seam occurred in an area where it is less likely to be noticed, such as the grassy area or the water in the images in Figure 1.7.
Given moving elements in a scene, such elements may occur entirely within the area of an overlap. Therefore, there can be acceptable seams where the element is included and ones where it is not. Figure 1.9 provides examples.

Figure 1.9: Even when seams are visually acceptable, moving elements in the scene may cause multiple visually valid seam configurations. On the top, this figure shows a four image panorama (Crosswalk: 4705 x 3543, four images) with three valid configurations. On the bottom, this figure shows a two image panorama (Apollo-Aldrin: 3432 x 2297, two images) with two valid configurations. Images courtesy of NASA.

• An interactive technique is necessary and attainable. Given the subjective nature of the image boundaries and the possibility of techniques falling into bad local minima, a user must be interjected into the seam boundary problem. Currently, finding panorama boundaries with Graph Cuts is an offline process with only one solution presented to the user. The only existing alternative is the manual editing, pixel by pixel, of the individual image boundaries. This is a time-consuming and tedious process where the user relies on perception alone to determine if the manual seam is acceptable. Therefore, a guided interactive technique for image boundaries is necessary for panorama processing. This technique should allow users to include or remove dynamic elements, move an image seam out of possible local minima into a lower error state, move the seam into a higher error state (but one with more acceptable visual coherency), or hide errors in locations where they feel they are less noticeable. During these edits, the user should be provided the optimal seams given these new constraints.

• A solution based on pairwise boundaries can achieve good results for panoramas while giving a fast, highly parallel, and light system. Computing pairwise-only optimal boundaries is both fast and exact (i.e., guaranteed to find the global minimum). It is then of no surprise that these boundaries have been used often in past work for pan- or tilt-only panoramas [145, 46, 157, 165]. Although this approach had been thought not to generalize beyond such cases, there had simply been no technique to use pairwise boundaries in panoramas with more complex structure, save for efforts to combine them via a distance metric [69] or sequentially [53]. This dissertation work not only provides a global solution based on pairwise boundaries, but also shows that this solution often produces lower energy seams than Graph Cuts for panoramas. Given a technique to combine pairwise boundaries into a coherent seam network, each disjoint seam can be computed separately and trivially in parallel. Moreover, the solution produced for each is typically independent, and therefore memory and resources for each can be allocated and released as needed. In addition, the solution domain is only the overlap between pairs of images, in contrast to some previous applications of Graph Cuts for panoramas [101, 4], which often consider the entire composite image as the solution domain. All of these properties give the potential for a very fast and light system, even when operating on full resolution imagery. Moreover, such a system has the ability to be extended to an out-of-core or distributed setting. (A sketch of a pairwise seam computation is given below.)
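The following is a minimal sketch of such a pairwise computation, assuming a vertical overlap strip and a seam that crosses each row exactly once (the classic dynamic-programming special case; the general min-path formulation on the overlap graph is discussed in Chapter 4). All names are illustrative.

    #include <vector>
    #include <algorithm>
    #include <cmath>

    // Exact pairwise seam for a w x h grayscale overlap of images `a` and `b`
    // (row-major), restricted to seams monotonic in y. Returns one x per row.
    std::vector<int> pairwise_seam(const std::vector<float>& a,
                                   const std::vector<float>& b, int w, int h)
    {
        auto e = [&](int x, int y) {            // pixel-difference energy
            return std::fabs(a[y * w + x] - b[y * w + x]);
        };
        std::vector<float> cost(w * h);
        for (int x = 0; x < w; ++x) cost[x] = e(x, 0);
        for (int y = 1; y < h; ++y)             // forward DP pass
            for (int x = 0; x < w; ++x) {
                float best = cost[(y - 1) * w + x];
                if (x > 0)     best = std::min(best, cost[(y - 1) * w + x - 1]);
                if (x < w - 1) best = std::min(best, cost[(y - 1) * w + x + 1]);
                cost[y * w + x] = e(x, y) + best;
            }
        std::vector<int> seam(h);               // backtrack from cheapest end
        seam[h - 1] = (int)(std::min_element(cost.begin() + (h - 1) * w, cost.end())
                            - (cost.begin() + (h - 1) * w));
        for (int y = h - 2; y >= 0; --y) {
            int x = seam[y + 1], best = x;
            for (int dx = -1; dx <= 1; ++dx) {
                int nx = x + dx;
                if (nx >= 0 && nx < w && cost[y * w + nx] < cost[y * w + best])
                    best = nx;
            }
            seam[y] = best;
        }
        return seam;
    }

Each row relaxes only three predecessors, so the run time is O(wh) on a w x h overlap, which is what makes per-overlap seams cheap enough to recompute during interaction.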
This dissertation describes a new image boundary technique called Panorama Weaving. First, Panorama Weaving provides an automatic technique to create approximately optimal boundaries that is fast, has low memory requirements, and is easy to parallelize. Second, it provides the first interactive technique to enable the exploration of the seam solution space. This gives the end-user a powerful editing system for panorama seams. In particular, the contributions of this work on a technical level are:

• A novel technique to merge independently computed pairwise boundaries into a global, consistent seam network that does not cascade to a global calculation.

• A panorama seam creation technique based purely on pairwise boundary solutions. This technique is fast and highly parallel and shows significant speed-ups compared to previous work, even when run sequentially. More importantly, it achieves all of this even with full resolution imagery.

• Out-of-core and distributed seam creation algorithms which extend the creation technique to mosaics massive in size. These algorithms provide speed-ups compared to the state-of-the-art.

• The first system that allows interactive editing of seams in panoramas. This system guarantees minimal user input thanks to an efficient exploration of the solution space.

• An intuitive mesh specialization of a region adjacency graph that encodes seam and image relations. This adjacency mesh provides a way to guarantee the global consistency of the seam network during the interactions and also enables robust editing of the network's topology.

1.3 Color Correction

Creating a single seamless image from a mosaic has been the subject of a large body of work, for which gradient-domain (Poisson) techniques currently provide the best solution. Only one method exists to operate on the gradient domain of massive images: the streaming multigrid technique [89]. However, processing the three gigapixel image of Figure 1.2 using this technique still takes well over an hour, which does not support an interactive trial-and-error artistic process. An additional disadvantage of traditional out-of-core methods is their tendency to achieve a low memory footprint at the cost of significantly proliferating the disk storage requirements. For example, the multigrid method [89] requires auxiliary storage an order of magnitude greater than the input size, almost half of which is due to gradient computation. In contrast, our approach completely avoids such data proliferation, thereby allowing the processing of data which already pushes the boundaries of available storage. The multigrid method [89] is also limited by main memory usage, since it is proportional to the number of iterations of the solver. This can cause the method to not achieve acceptable results for images that may require a large number of iterations, as shown in Section 4. This work introduces a new method with memory usage independent of the number of iterations of the Poisson solver that, therefore, scales gracefully in these cases. One option to reduce run times is to design a similar scheme to run in a distributed environment. Consequently, there has been recent work to extend the multigrid solver [90] to a parallel implementation, reducing the time to compute a gigapixel solution to mere minutes. However, this approach is primarily a proof-of-concept since it does not supply the classic tests of scalability (weak or strong), nor is it tested significantly.
Like many out-of-core methods, the proliferation of disk storage requirements is a major drawback. For example, testing was only possible with a full 16-node cluster for some of the streaming multigrid test data due to excessive storage demands. Finally, the technique assumes that a small number of predetermined iterations is sufficient to achieve a solution, which may not always be the case. This implementation was also optimized for a single distributed system and, therefore, is unlikely to port well to other environments. History has shown that levels of abstraction that remove complexity from a code base can be instrumental in the advancement of technologies. Abstraction that allows simple and portable code accelerates innovation and reduces the time needed to develop new ideas. The cloud should be explored as such an abstraction, allowing a developer to ignore much of the more tedious and complex elements of implementing a distributed graphics algorithm. Admittedly, a general scheme cannot beat the performance of highly specialized and optimized code, and for organizations with ample resources there may be cases where speed and efficiency are more important than the cost to create and maintain a typical implementation. However, with the increased availability of cloud commodities, there is now the opportunity to offer more members of our community the ability to develop new algorithms for a distributed environment.

In particular, in the area of color correction, this dissertation work introduces a simple and light-weight framework that provides the user with the illusion of a full Poisson system solve at interactive frame rates for image editing. This framework also allows for the computation of a full solution on a single machine with a simple approach, rivaling the run time of the current best out-of-core technique [89], while producing equal or higher quality results on images that require a large number of iterations. The system is flexible enough to handle different hierarchical image formats, such as tiling for higher quality images or HZ-order for greater input/output (I/O) speed. In particular, by exploiting a new implicit kd-tree hierarchy for HZ-order, the framework needs only to access and solve the visible pixels. This allows an artist to interactively apply gradient-based techniques to images gigapixels in size. This new framework is straightforward and requires neither complicated spatial indexing nor advanced caching schemes. Additionally, this work introduces a framework for parallel gradient-domain processing inspired by the out-of-core technique, with a novel reformulation that provides an efficient parallel distributed algorithm. This new framework has a straightforward implementation and shows both strong and weak parallel scalability. When implemented in standard MPI (Message Passing Interface), the same code base ports well to multiple distributed systems. Furthermore, this distributed algorithm can be wrapped in a level of abstraction to be run on the cloud, allowing for a simple implementation, as well as allowing it to be distributed to the community at large. Specifically, the contributions of this work are:

• A coarse-to-fine progressive Poisson solver running at interactive frame rates, extended to a wide variety of gradient domain tasks, with the ability to scale to gigapixel images. This cascadic solver entirely avoids the coarsening stage of the V-cycle yet produces high quality results.
• A method to locally refine solutions, with time and space requirements that depend linearly on the screen resolution rather than on the resolution of the input image.

• A full out-of-core solver that maintains strict control over system resources, rivals the run-times of the best known method, and consistently achieves quality results where previous methods may not converge well in practice.

• A light-weight streaming framework that provides adaptive multiresolution access to out-of-core images in a cache coherent manner, without using intricate indexing data structures or precaching schemes.

• A distributed algorithm based on the out-of-core scheme, which has a straightforward implementation and shows both strong and weak parallel scalability.

• The first distributed Poisson solver for imaging implemented in the cloud.

CHAPTER 2

RELATED WORK

This chapter outlines the previous work for the two major portions of the composition step of the panorama creation pipeline: image boundaries, Section 2.1, and color correction, Section 2.2. In particular, these sections address the related work for the state-of-the-art in image boundaries (minimal image seams) and in color correction (gradient domain editing).

2.1 Image Boundaries: Seams

This section details the related previous work on computing image boundaries for an image mosaic. In particular, this section focuses on the current state-of-the-art, the computation of minimal mosaic seams.

2.1.1 Pairwise Boundaries

Some of the seminal works in digital panoramas assume that an image collection is acquired in a single sweep (either pan, tilt, or a combination of the two) of the scene. In such panoramas, only pairwise overlaps of images need be considered [119, 120, 157, 145, 46, 165]. The pairwise boundaries which have globally minimal energy can be computed quickly and exactly using a min-cut or min-path algorithm. There is an intuitive and proven duality between min-cut and single-source/single-destination min-path [72]. These pairwise techniques were thought not to be general enough to handle the many configurations possible in modern panoramas. Recent work [69] has dealt with the combination of these seams for more complex panoramas, although the seam combinations are still based on an image distance metric. Other recent work [53] combined these separate seams sequentially for the purposes of texture synthesis; for that application, this was sufficient to provide good results. The combination and intersection of seams in a digital panorama can be more complex and, therefore, a more expressive combination is necessary. In addition, interaction was not considered as a necessary functionality in these works. This dissertation presents a novel technique to combine these disjoint seams into a global panorama seam network and to allow for manual user interaction.

2.1.2 Graph Cuts

The Graph Cuts technique [26, 24] computes a k-labeling of a graph, typically an image, to minimize an energy function on the domain. An algorithm that guarantees finding the global minimum is considered to be NP-hard [26] and, therefore, Graph Cuts was designed to efficiently compute a good approximation. Graph Cuts has been shown to give good results for a variety of energy functions [92]. Thus, it is of no surprise given this versatility that it has been adapted to the image mosaic and panorama boundary problem [101, 4].
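For context, the k-labeling energy minimized by Graph Cuts is typically written as a sum of data and smoothness terms (the notation below is the standard form from the Graph Cuts literature, not a formula reproduced from this dissertation):

    \[
    E(L) = \sum_{p} D_p(L_p) + \sum_{(p,q) \in \mathcal{N}} V_{p,q}(L_p, L_q),
    \]

where L_p is the image label assigned to pixel p, D_p measures how well that label fits p, and the pairwise term V_{p,q} charges for label transitions across neighboring pixels; in panorama compositing, this is the cost of placing a seam between p and q.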
However, Graph Cuts is both a computationally expensive and memory intensive technique. Given these requirements, there has been work on accelerating the Graph Cuts process by, for instance, adapting the technique to run on the GPU [167], in parallel [106], or in parallel-distributed [48] environments. Building a hierarchy for the Graph Cuts computation [107, 5] has proven popular due to its reduction of memory and computation costs. For panoramas, this strategy has only been shown to provide good results for a hierarchy of two to three levels [5]. There has also been work on bringing Graph Cuts into an interactive setting [25, 142, 104, 61, 124], although these works have focused only on user-guided image segmentation. This dissertation provides the first technique to allow interactive editing of panorama boundaries.

2.1.3 Alternative Boundary Techniques

While Graph Cuts still maintains its popularity as a solution to the minimal boundary problem, there has been other ongoing work on alternative techniques. For example, there has been work on techniques based on luminance voting [82]. There has also been recent work using geodesics to interactively compute pairwise minimal boundaries [45].

2.1.4 Out-of-Core and Distributed Computation

While there has been previous work to bring the Graph Cuts technique to massive grids, it has only dealt with extending the algorithm to a combined distributed and out-of-core environment [48]; no current technique decouples the two. The work of this dissertation has the flexibility to operate in-core, out-of-core, or distributed depending on the application or available resources. Moreover, the inherent parallelism of the new technique is likely to outperform the previous work. Finally, there has been no work to allow interaction with these seams at massive scales. Hierarchical Graph Cuts has been used on large images [2], although given the documented limitation on the viable number of levels in the hierarchy [5], this will not scale massively. Applying standard Graph Cuts as a sweeping window over an image neighborhood [94] has been used to produce boundaries for gigapixel imagery. Such a process has yet to be formulated in parallel and is potentially very computationally expensive.

2.2 Color Correction: Gradient Domain Editing

This section details the related previous work on the most popular and sophisticated color correction procedure for image mosaics, called gradient domain image editing or, by its alternative name, Poisson image editing.

2.2.1 Poisson Image Processing

A variety of gradient-based methods provide a popular, but computationally expensive, set of techniques for advanced image manipulation. Given a guiding gradient field constructed from one or multiple source images, these techniques attempt to find a closest-fit image using some predetermined distance metric. This basic concept has been adapted for standard image editing [133], as well as more advanced matting operations [152] and high level drag-and-drop functionality [79]. Furthermore, gradient-based techniques can tone map high dynamic range images to display favorably on standard monitors [55] or hide the seams in panoramas [133, 103, 4, 89]. Other applications include detecting lighting [76] or shapes [173] from images, removing shadows [57] or reflections [6], and gradient domain painting [114].
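The common mathematical core of these methods can be stated compactly (this is the standard formulation from the Poisson image editing literature, summarized here for reference): given a guiding gradient field g, find the image u minimizing

    \[
    \min_u \int_\Omega \lVert \nabla u - \mathbf{g} \rVert^2 \, dA,
    \]

whose minimizer satisfies the Poisson equation

    \[
    \Delta u = \nabla \cdot \mathbf{g}
    \]

subject to the chosen (Dirichlet or Neumann) boundary conditions.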
Recently, an alternative approach using mean value coordinates has smoothly interpolated the boundary offset between source images, thereby mimicking Dirichlet boundary conditions [54]. This promising new line of research has yet to show support for Poisson techniques such as tone mapping, the ability to work well out-of-core, or consistently acceptable results for methods that typically require Neumann boundary conditions.

2.2.2 Poisson Solvers

The solution to a two-dimensional (2D) Poisson problem lies at the core of gradient-based image processing. Poisson equations have wide utility in many engineering and science applications. Computing their solution efficiently has been the focus of a large body of work, and even a cursory review is beyond the scope of this dissertation. For small images, methods exist to find the direct analytical solution using Fast Fourier transforms [75, 7, 8, 113]. Simchony [146] provides a survey of these methods for computer vision applications. Often the problem is simplified by discretization into a large linear system whose dimension is typically the number of pixels in an image. If this system is small enough to fit into memory, methods exist to find the direct solution, and we refer the reader to Dorr [51], who provides an extensive review of direct methods. Typically, iterative Krylov subspace methods, such as conjugate gradient, are used due to their fast convergence. For much larger systems, memory consumption is the limiting factor, and iterative solvers such as Successive Over-Relaxation (SOR) [13] become more attractive. Depending on the application, different levels of accuracy may be required. Sometimes, a coarse approximation is sufficient to achieve the desired result. Bilateral upsampling methods [93] operating on a coarse solution have produced good results for applications such as tone mapping. Such methods have not yet been shown to handle applications such as image stitching, where the interpolated values are typically not smooth at the seams between images. When pure upsampling is insufficient, the system must be solved fully. Multigrid methods are often employed to aid the convergence of an iterative solver. Such methods have proven particularly effective by dealing with the large scale trends at coarse resolutions. These techniques include preconditioners [66, 155] and multigrid solvers [27, 28]. There exist different variants of multigrid algorithms using either adaptive [20, 88, 21, 2, 139] or nonadaptive meshes [87, 89]. As a first step in a complete multigrid system, the mesh is coarsened. The Poisson equation can then be solved in a coarse-to-fine manner. One full iteration, from fine to coarse and back, is typically called a V-cycle. Most recently, a V-cycle was implemented in a streaming fashion for large panoramas [89]. However, other systems implement only parts of the V-cycle. Kopf et al. [94] implement only the second half in a pure upsampling procedure, while Bolitho et al. [21] implement a purely coarse-to-fine solver, also called cascadic [22]. Lischinski et al. [105] applied this pure coarse-to-fine approach for interactive tonal adjustment. The technique outlined in this dissertation shows, for the first time, that a cascadic approach has applications well beyond the adjustment of tonal values and can be used for a wide variety of gradient-based image processing techniques. This work also extends such techniques to allow the interactive editing and processing of gigapixel images. The solver propagates sufficient information from coarse to fine, allowing us to achieve local solutions at interactive rates that are virtually indistinguishable from the full-resolution solution.
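As a concrete instance of the iterative relaxations discussed above, the sketch below performs one Gauss-Seidel sweep of the discrete Poisson equation on a regular grid; it illustrates the textbook method, not this dissertation's solver.

    #include <vector>

    // One Gauss-Seidel sweep for  laplacian(u) = div g  on a w x h grid with
    // fixed (Dirichlet) boundary samples. `u` holds the current estimate and
    // `divg` the divergence of the guiding gradient field, both row-major.
    void gauss_seidel_sweep(std::vector<double>& u,
                            const std::vector<double>& divg, int w, int h)
    {
        for (int y = 1; y < h - 1; ++y)
            for (int x = 1; x < w - 1; ++x) {
                int i = y * w + x;
                // 5-point Laplacian stencil solved for the center sample;
                // updated neighbors are reused immediately (Gauss-Seidel).
                u[i] = 0.25 * (u[i - 1] + u[i + 1] + u[i - w] + u[i + w] - divg[i]);
            }
    }

Repeating such sweeps converges to the Poisson solution, but the iteration count grows quickly with image size, which is precisely why the multigrid and cascadic strategies above matter at scale.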
2.2.3 Out-of-Core Computation

Toledo [160] presents a survey of general out-of-core algorithms for linear systems. The majority of the algorithms surveyed assume that at least the solution vectors can be kept in main memory, which is not the case for large images. For out-of-core processing of large images, the streaming multigrid method of Kazhdan and Hoppe [89] has so far provided the only solution. However, processing a three gigapixel image using this technique still takes well over an hour, which does not support an interactive trial-and-error artistic process. Many algorithms, such as tone mapping, require careful parameter tuning to achieve good results. Thus, waiting several hours to examine the effects of a single parameter change is not feasible in this context. An additional disadvantage of traditional out-of-core methods is their tendency to achieve a low memory footprint at the cost of significantly proliferating the disk storage requirements. For example, the multigrid method [89] requires auxiliary storage an order of magnitude greater than the input size, almost half of which is due to gradient computation. In contrast, in this dissertation, I, with my collaborators, introduce an approach that completely avoids such data proliferation, thereby allowing the processing of data which already pushes the boundaries of available storage. The multigrid method [89] is also limited by main memory usage, since it is proportional to the number of iterations of the solver. This can cause the method to not achieve acceptable results for images that may require a large number of iterations, as shown in Section 4. This work provides a new method with memory usage independent of the number of iterations that, therefore, scales gracefully in these cases.

2.2.4 Distributed Computation

Recently, the streaming multigrid method has been extended to a distributed environment [90], reducing the time to process gigapixel images from hours to minutes. The distributed multigrid requires 16 bytes/pixel of disk space in temporary storage for the solver, as well as 24 bytes/pixel to store the solution and gradient constraints. For the terapixel example of Kazhdan et al. [90], the method had a minimum requirement of 16 nodes in order to accommodate the needed disk space for fast local caching. In contrast, the approach outlined in this work needs no temporary storage and is implemented in standard MPI. Streaming multigrid also assumes a precomputed image gradient, which would add substantial overhead during initialization to transfer the color float or double data. Our new approach is initialized using the original image data plus an extra byte for image boundary information, which equates to 1/3 less data transfer in initialization than the previous method. Data transfers between this solver's phases, while floating point, deal only with the boundaries between compute nodes, which are substantially smaller than the full image and, therefore, are rarely required to be cached to disk.
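To put the storage figures above in perspective, a back-of-the-envelope calculation for the terapixel case (my arithmetic, not a figure from [90]):

    \[
    10^{12} \ \text{pixels} \times (16 + 24) \ \text{bytes/pixel} = 40 \ \text{TB},
    \]

or 2.5 TB of disk per node on a 16-node cluster, before the input image itself is counted.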
The multigrid method [89, 90] may also be limited by main memory, since the number of iterations of the solver is directly proportional to the memory footprint. For large images, this limits the solver to only a few Gauss-Seidel iterations, and it therefore may not necessarily converge for challenging cases. Our method's memory usage is independent of the number of iterations and can therefore solve images that have slow convergence. Often large images are stored as tiles at the highest resolution; therefore, methods that exploit this structure would be advantageous. Stookey et al. [148] use a tile-based approach to compute an over-determined Laplacian PDE (partial differential equation). By using tiles that overlap in all dimensions, the method solves the PDE on each tile separately and then blends the solution via a weighted average. Unfortunately, this method cannot account for large scale trends beyond a single overlap and, therefore, can only be used on problems which have no large (global) trends. Figure 2.1 illustrates why this would be a problem for panorama processing. The coarse upsampling of our approach fixes this issue.

Figure 2.1: Although the result is a seamless, smooth image, without coarse upsampling the final image will fail to account for large trends that span beyond a single overlap, which can lead to unwanted, unappealing shifts in color.

2.2.5 Cloud Computing - MapReduce and Hadoop

MapReduce [47] was developed by Google as a simple framework to process massive data on large distributed systems. It is an abstraction that owes its inspiration to functional programming languages such as Lisp. At its core, the framework relies on two simple operations:

• Map: Given input, create a key/value pair.

• Reduce: Process all values of a given key.

All the complexity of a typical distributed implementation due to data distribution, load balancing, fault-recovery, and communication lies under this abstraction layer and, therefore, can be ignored by a developer. This framework, when combined with a distributed file system, can be a simple yet powerful tool for data processing. Hadoop is an open source implementation of MapReduce maintained by the Apache Software Foundation and can be optionally coupled with its own distributed file system (HDFS). Pavlo et al. [131] found that Hadoop was easy to deploy and use, offered adequate scalability, had very effective fault-tolerance, and, most importantly, was easy to adapt for complex analytic tasks. Hadoop is also widely available as a commodity resource. For example, Amazon Web Services, a service suite that has become nearly synonymous with cloud computing in the media, provides Hadoop as a native capability [10]. Companies have begun to use Hadoop as a simple alternative for data processing on large clusters [71]. For instance, The New York Times has used Hadoop for large scale image-to-PDF conversion [68]. Google, IBM, and NSF have also partnered to provide a Hadoop cluster for research [41].

2.2.6 Out-of-Core Data Access

Given an image, it is well known that the standard row-major order exhibits good locality in only one dimension and is therefore ill-suited for an unconstrained out-of-core storage scheme [168]. Previous out-of-core Poisson methods [89] have been noted to be severely limited by this constraint.
Instead, indexing based on various space-filling curves [143] has been proposed in different applications [126, 70, 17, 102] to exploit their inherent geometric locality. Of particular interest is the Z-order (also called Lebesgue-order) [17, 128], since it allows an especially simple conversion to and from standard row-major indices. While Z-order exhibits good locality in all dimensions, it does so only at a fixed resolution and does not support hierarchical access. Instead, this work will utilize the hierarchical variant, called HZ-order, proposed by Pascucci and Frank [128].

CHAPTER 3

SCALABLE AND EFFICIENT DATA ACCESS

At the core of any large data processing system is an efficient scheme for data access. In this chapter, I will detail the technology used in the systems outlined in this work. In Section 3.1, I will review the fundamentals of the hierarchical Z-order (HZ-order) for two-dimensional arrays, our chosen format for large image processing. I will also provide a new, simple algorithm in Section 3.2 for accessing data organized in HZ-order, while avoiding the repeated index conversions used in [128]. Section 3.3 will provide a new parallel write scheme for HZ-order data. Finally, in Section 3.4, I will give an outline of the ViSUS software infrastructure, the core system behind much of the massive image processing outlined in this dissertation. Conversion of large images into the ViSUS format requires no additional storage, compared to the typical 1/3 data increase common for tiled image hierarchies. From our test data, we have found that there is only a 27% overhead due to the conversion compared to just copying the raw data, which makes this conversion very light. The conversion requires no operations on the pixel data and will outperform even the simplest tiled hierarchies, which require some manipulation of the pixel data. Section 5.2.2 will show that this new I/O infrastructure reduces the overhead by 28%-40% when compared to a standard tiled image hierarchy. These numbers reflect the theoretical bound of 1/3 overhead, made worse by the inability to constrain real queries to perfect alignment with the boundaries of a quadtree.

3.1 Z- and HZ-Order Background

The data access routine of our system achieves high performance on our image data by utilizing a hierarchical variant of a standard Z-order (Lebesgue) space filling curve to lay out our two-dimensional data in one-dimensional memory. In the two-dimensional case, the Z-order curve can be defined recursively by a Z shape whose vertices are replaced by Z shapes half its size (see Figure 3.1 (a)). Given the binary row-major index of a pixel (i_n ... i_1 i_0, j_n ... j_1 j_0), the corresponding Z-order index I is computed by interleaving the indices, I = j_n i_n ... j_1 i_1 j_0 i_0 (see Figure 3.2 (a), Step 1). While Z-order exhibits good locality in all dimensions, it does so only at full resolution and does not support hierarchical access. Instead, our system uses the hierarchical variant, called HZ-order, proposed by Pascucci and Frank [128].

Figure 3.1: (a) The first four levels of the Z-order space filling curve; (b) a 4x4 array indexed using standard Z-order.
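A minimal sketch of Step 1, the bit interleaving, assuming 32-bit row-major coordinates (a naive loop for clarity; production implementations typically use lookup tables or bit tricks):

    #include <cstdint>

    // Interleave the bits of (i, j) into a single Z-order index:
    // i-bits occupy the even positions, j-bits the odd positions,
    // matching I = ... j1 i1 j0 i0 above.
    inline uint64_t z_order_index(uint32_t i, uint32_t j)
    {
        uint64_t I = 0;
        for (int b = 0; b < 32; ++b) {
            I |= (uint64_t)((i >> b) & 1u) << (2 * b);
            I |= (uint64_t)((j >> b) & 1u) << (2 * b + 1);
        }
        return I;
    }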
This new index changes the standard Z-order of Figure 3.1 (b) to be organized by levels corresponding to a subsampling binary tree, in which each level doubles the number of points in one dimension (see Figure 3.2 (b)). This pixel order is computed by adding a second step to the index conversion. To compute an HZ-order index, the binary representation of a given Z-order index I is shifted to the right until the first 1-bit exits. During the first shift, a 1-bit is added on the left, and 0-bits are added in all following shifts (see Figure 3.2 (a)). This conversion could have a potentially very simple and efficient hardware implementation. The software C++ version can be implemented as follows:

    // adhocindex: an unsigned integer type wide enough for the index.
    // last_bit_mask: a constant with only the most significant bit set.
    inline adhocindex remap(adhocindex i)
    {
        i |= last_bit_mask; // set leftmost one
        i /= i & -i;        // remove trailing zeros
        return i >> 1;      // remove rightmost one
    }

Figure 3.2: (a) Address transformation from row-major index (i, j) to Z-order index I (Step 1) and then to hierarchical Z-order index (Step 2); (b) levels of the hierarchical Z-order for a 4x4 array. The samples on each level remain ordered by the standard Z-order.

We store the data in a way guaranteeing efficient access to any subregion without internal caching and without opening a data block more than once. Furthermore, we allow for the storage of incomplete arrays. In our storage format, we first sort the data in HZ-order and group consecutive samples in blocks of constant size. A sequence of consecutive blocks is grouped into a record, and records are clustered in groups, which are organized hierarchically. Each record has a header specifying which of its blocks are actually present and whether the data are stored raw or compressed. Groups can miss entire records or subgroups, implying that all their respective blocks and records are missing. The file format is implemented via a header file describing the various parameters (dimension, block size, record size, etc.) and one file per record. The hierarchy of groups is implemented as a hierarchy of directories, each containing a predetermined maximum number of subdirectories. The leaves of each directory contain only records. To open a file, one needs only to reconstruct the path of a record and defer its search to the file system. In particular, the path of a record is constructed as follows: we take the HZ-address of the first sample in the record, represent it as a string, and partition it into chunks of characters naming directories, subdirectories, and the record file. Note that, since blocks, records, and groups can be missing, one is not restricted to arrays of data that cover the entire index space. In fact, we can easily store even images with different regions sampled at different resolutions.

3.2 Efficient Multiresolution Range Queries

One of the key components of our framework is the ability to quickly extract rectangular subsets of the input image in a progressive manner. Computing the row-major indices of all samples residing within a given query box is straightforward. However, efficiently calculating their corresponding HZ-indices is not. Transforming each address individually results in a large number of redundant computations by repeatedly converting similar indices. To avoid this overhead, we introduce a recursive access scheme that traverses an image in HZ-order, while concurrently computing the corresponding row-major indices.
This traversal implicitly follows a kd-tree style subdivision, allowing us to quickly skip large portions of the image. To better illustrate the algorithm, I will first describe how to recursively traverse an array in plain Z-order using the 4x4 array of Figure 3.1 (b) as an example. Subsequently, I will discuss how to restrict the traversal to a given query rectangle and, finally, how the scheme is adapted to HZ-order. We use a stack containing tuples of type (split_dimension, I_start, min_i, max_i, min_j, max_j, num_elements). To start the process, we push the tuple t0 = (1, 0, 0, 3, 0, 3, 16) onto the stack. At each iteration, we pop the top-most element t from the stack. If t contains only a single element, we output the current I_start as HZ-index and fetch the corresponding sample. Otherwise, we split the region represented by t into two pieces along the axis given by split_dimension and create the corresponding tuples t1 = (0, 0, 0, 3, 0, 1, 8) and t2 = (0, 8, 0, 3, 2, 3, 8). Note that all elements of t1 and t2 can be computed from t by simple bit manipulation. In the case of a square array, we simply flip the split dimension each time a tuple is split. However, one can also store a specific split order to accommodate rectangular arrays. Figure 3.3 shows the first eight iterations of the algorithm, outputting the first four elements of the array in Figure 3.1 (b). To use this algorithm for fast range queries, each tuple is tested against the query box as it comes off the stack and discarded if no overlap exists. Since the row-major indices describing the bounding box of each tuple are computed concurrently, the intersection test is straightforward. Furthermore, the scheme applies, virtually unchanged, to traverse samples in Z-order that subsample an array uniformly along each axis, where the subsampling rate along each axis can be different. Finally, to adapt the algorithm to HZ-order (see Figure 3.2 (b)), one exploits the following two important facts:

• One can directly compute the starting HZ-index of each level. For example, in a square array, level 0 contains one sample and every other level h contains 2^(h-1) samples. Therefore, the starting HZ-index of level h is also 2^(h-1), where the levels range from 0 to m and m is the number of bits of the largest HZ-index.

• Within each level, samples are ordered according to plain Z-order and can be traversed with the stack algorithm described above, using the appropriate subsampling rate.

Using these two facts, one can iterate through an array in HZ-order by processing one level at a time, adding the level's starting HZ-index to the I_start index of each tuple. In practice, we avoid subdividing the stack tuples to the level of a single sample. Instead, depending on the platform, we choose a parameter n and build a table with the sequence of Z-order indices for an array with 2^n elements. When running the stack algorithm, each time a tuple t with 2^n elements appears, we loop through the table instead of splitting t. By accessing only the necessary samples in strict HZ-order, the stack-based algorithm guarantees that only the minimal number of disk blocks are touched and each block is loaded exactly once. For progressively refined zooms in a given area, we can apply this algorithm with a minor variation. In particular, one needs only to reduce the size of the bounding box represented in a tuple each time it is pushed back onto the stack. In this way, even for a progressively refined zoom, one accesses only the needed data blocks, each being accessed only once. (A sketch of the basic traversal is given below.)

Figure 3.3: Our fast-stack Z-order traversal of a 4x4 array with concurrent index computation.
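The sketch below is a minimal, illustrative implementation of this fast-stack traversal for a square 2^n x 2^n array; the query-box test, the per-level HZ offsets, and the table optimization described above are omitted, and all names are mine.

    #include <cstdint>
    #include <stack>

    struct Tuple {
        int      split_dim;                 // 0: split along i, 1: split along j
        uint64_t I_start;                   // Z-order index of the first sample
        uint32_t min_i, max_i, min_j, max_j;
        uint64_t num_elements;
    };

    template <typename Visit>               // Visit(I, i, j) receives each sample
    void z_traverse(uint32_t n, Visit visit)
    {
        uint32_t side = 1u << n;
        std::stack<Tuple> s;
        s.push({1, 0, 0, side - 1, 0, side - 1, (uint64_t)side * side});
        while (!s.empty()) {
            Tuple t = s.top(); s.pop();
            if (t.num_elements == 1) { visit(t.I_start, t.min_i, t.min_j); continue; }
            Tuple a = t, b = t;              // split the region into two halves
            a.num_elements = b.num_elements = t.num_elements / 2;
            b.I_start = t.I_start + a.num_elements;
            if (t.split_dim == 1) {          // split along j
                a.max_j = (t.min_j + t.max_j) / 2;
                b.min_j = a.max_j + 1;
            } else {                         // split along i
                a.max_i = (t.min_i + t.max_i) / 2;
                b.min_i = a.max_i + 1;
            }
            a.split_dim = b.split_dim = 1 - t.split_dim;
            s.push(b); s.push(a);            // visit the low half first
        }
    }

Running z_traverse(2, ...) on the 4x4 example reproduces the tuples t1 and t2 above and visits the samples in the order of Figure 3.1 (b).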
i"T 0 t 2 FT t7 t8 t2 t2 • 1-3 (i,j)=(l,l) t4 =(1, 4, 2, 3,0, 1,4) t2 =(0, 8, 0, 3, 2, 3, 8) Figure 3.3: Our fast-stack Z-order traversal of a 4x4 array with concurrent index computation 28 in a tuple each time it is pushed back into the stack. In this way, even for a progressively refined zoom, one would access only the needed data blocks, each being accessed only once. 3.3 Parallel Write The multiresolution data layout outlined above is a progressive, linear format and therefore has a write routine that is inherently serial. When processing a large image on a distributed system, or even on a single multicore system, it would be ideal for each node, or process, to be able to write out its piece of the data directly in this layout. Therefore a parallel write strategy must be employed. Figure 3.4 illustrates different possible parallel strategies. As shown in Figure 3.4 (a), each process can naively write its own data directly to the underlying binary file. This is inefficient due to the large number of small file accesses. As data gets large, it becomes disadvantageous to store the entire dataset as a single, large file and typically the entire dataset is partitioned into a series of smaller more manageable files. This disjointness can be used by a parallel write routine. As each simulation process produces simulation data, it can store its piece of the overall dataset locally and pass the data on to an aggregator process. These aggregator processes can be used to gather the individual pieces and composite the entire dataset. In Figure 3.4 (b), each process transmits each contiguous data segment to an intermediate aggregator. Once the aggregator’s buffer is complete, the data are written to disk using a single large I/O operation. Figure 3.4 (c), illustrates a strategy where several noncontiguous memory accesses from each process are bundled into into a single message. This approach reduces the number of small network messages needed to transfer data to aggregators. This last strategy has been shown to exhibit good throughput performance and weak scaling for S3D combustion simulation applications when compared to standard Fortran I/O benchmark [98, 99]. 3.4 ViSUS Software Framework The ViSUS (Visual Streams for Ultimate Scalability) software framework6 has been designed as an environment to allow the interactive exploration of massive datasets on a variety of hardware, possibly over platforms distributed geographically. The system and I/O infrastructure is designed to handle n-dimensional datasets but is typically used on two-dimensional and three-dimensional image data. This two-dimensional portion of this 6h t t p : / / v i s u s . c o or h t t p : / / v i s u s . u s http://visus.co http://visus.us 29 | Rank One || Rank Two 11Rank Three| | Rank One || Rank Two 11Rank Three| \ \ \ i \ i j / / / / / \w'\«> i"/ - • Aggregator Process ^ Rank Z e r o f 1 f f s / • ^ Rank Zero f Aggregator Process (a) ^ IDX Binary File (b) ^ IDX Binary File (c) Forming indexed datatype MPI file writes ♦ ------------- MPI puts O MPI indexed datatype • Data chunks F igure 3.4: (a) Naive parallel strategy where each process writes its piece of the overall dataset into the underlying flle, (b) each process transmits each contiguous data segment to an intermediate aggregator. Once the aggregator’s buffer is complete, the data are written to disk, (c) several noncontiguous memory accesses are bundled into a single message to decrease communication overhead. 
3.4 ViSUS Software Framework

The ViSUS (Visual Streams for Ultimate Scalability) software framework6 has been designed as an environment to allow the interactive exploration of massive datasets on a variety of hardware, possibly over platforms distributed geographically. The system and I/O infrastructure is designed to handle n-dimensional datasets but is typically used on two-dimensional and three-dimensional image data. The two-dimensional portion of this system is the core application on which the massive applications outlined in this dissertation are built.

6 http://visus.co or http://visus.us

The ViSUS software framework was designed with the primary philosophy that the visualization and/or processing of massive data need not be tied to specialized hardware or infrastructure. In other words, a visualization environment for large data can be designed to be lightweight, highly scalable, and able to run on a variety of platforms or hardware. Moreover, if designed generally, such an infrastructure can have a wide variety of applications, all from the same code base. Figure 3.5 details example applications and the major components of the ViSUS infrastructure. The components can be grouped into three major categories. First, a lightweight and fast out-of-core data management framework using multiresolution space filling curves, which I have outlined in Sections 3.1, 3.2, and 3.3. This allows the organization of information in an order that exploits the cache hierarchies of any modern data storage architecture. Second, ViSUS contains a dataflow framework that allows data to be processed during movement. Processing massive datasets in their entirety would be a long and expensive operation which hinders interactive exploration. By designing new algorithms to fit within this framework, data can be processed as they move. The Progressive Poisson technique outlined in Section 5.2 is one such new algorithm. Third, ViSUS provides a portable visualization layer that was designed to scale from mobile devices to powerwall displays with the same code base. Figure 3.5 provides a diagram of the ViSUS software architecture. In this section, I will detail three of ViSUS's major components and how they couple with the efficient data access detailed in the previous sections to achieve a fast, scalable, and highly portable data processing and visualization environment. Finally, I will illustrate an important additional use of this infrastructure, the real-time monitoring of scientific simulations.

Figure 3.5: The ViSUS software framework. Arrows denote external and internal dependences of the main software components (including image conversion via FreeImage, compression via zlib, networking via curl, threading via pthreads or the Windows native API, OpenGL rendering with the glew extensions, and the Juce GUI library). Additionally, this figure illustrates the relationship with several example applications that have been successfully developed with this framework.

3.4.1 LightStream Dataflow and Scene Graph

Even simple manipulations can be overly expensive when applied to each variable in a large scale dataset. Instead, it would be ideal to process the data based on need by pushing data through a processing pipeline as the user interacts with different portions of the data. The ViSUS multiresolution data layout enables efficient access to different regions of the data at varying resolutions. Therefore, different compute modules can be implemented using progressive algorithms to operate on these data streams. Operations such as binning, clustering, or rescaling are trivial to implement on this hierarchy given some known statistics on the data, such as the function value range. These operators can be applied to the data stream as-is, while the data are moving to the user, progressively refining the operation as more data arrive. More complex operations can also be reformulated to work well using the hierarchy.
For instance, using the layout for image data produces a hierarchy which is identical to a subsampled image pyramid on the data. Moreover, as data are requested progressively, the transfer will traverse this pyramid in a coarse-to-fine manner. Techniques such as gradient-domain image editing can be reformulated to use this progressive stream and produce visually acceptable solutions, as will be detailed in Section 5.2. These adaptive, progressive solutions allow the user to explore a full resolution solution as if it were fully available, without the expensive, full computation. ViSUS LightStream facilitates this stream processing model by providing definable modules within a dataflow framework with a well understood API. Figure 3.6 gives an example of a dataflow for the analysis and visualization of a scientific simulation. This particular example is the dataflow for a Uintah combustion simulation used by the C-SAFE Center for the Simulation of Accidental Fires and Explosions at the University of Utah. Each LightStream module provides streaming capability through input and output data ports that can be used in a variety of data transfer/sharing modes. In this way, groups of modules can be chained to provide complex processing operations as the data are transferred from the initial source to the final data analysis and visualization stages. This data flow is typically driven by user demands and interactions. A variety of "standard" modules, such as data differencing (for change detection), content based image clustering (for feature detection), or volume rendering with multiple, science-centric transfer functions, are part of the base system. These can be used by new developers as templates for their own progressive streaming data processing modules.

Figure 3.6: The LightStream dataflow used for analysis and visualization of a three-dimensional combustion simulation (Uintah code). (a) Several dataflow modules chained together to provide a light and flexible stream processing capability. (b) One visualization that is the result of this dataflow.

ViSUS also provides a scene graph hierarchy for both organizing objects in a particular environment and sharing and inheriting parameters. Each component in a model is represented by a node in this scene graph and inherits the transformations and environment parameters of its parents. Three-dimensional volume or two-dimensional slice extractors are children of a data set node. As an example of inheritance, a scene graph parameter for a transfer function can be applied to the scene graph node of a data set; if an extractor on this data set does not provide its own transfer function, it will inherit this one.

3.4.2 Portable Visualization Layer - ViSUS AppKit

The visualization component of ViSUS was built with the philosophy that a single code base can be designed to run on a variety of platforms and hardware, ranging from mobile devices to powerwall displays. To enable this portability, the basic draw routines were designed to be OpenGL ES compatible. This is a limited subset of OpenGL used primarily for mobile devices. More advanced draw routines can be enabled if a system's hardware can support them. In this way, the data visualization can scale in quality depending on the available hardware. Beyond the display of the data, the underlying graphical user interface (GUI) library can hinder portability to multiple devices.
3.4.2 Portable Visualization Layer - ViSUS AppKit

The visualization component of ViSUS was built with the philosophy that a single code base can be designed to run on a variety of platforms and hardware ranging from mobile devices to powerwall displays. To enable this portability, the basic draw routines were designed to be OpenGL ES compatible. This is a limited subset of OpenGL used primarily for mobile devices. More advanced draw routines can be enabled if a system's hardware can support it. In this way, the data visualization can scale in quality depending on the available hardware. Beyond the display of the data, the underlying graphical user interface (GUI) library can hinder portability to multiple devices. At this time ViSUS has made use of the Juce library (http://www.rawmaterialsoftware.com), which is lightweight and supports mobile platforms such as iOS and Android in addition to major operating systems. ViSUS provides a demo viewer that contains standard visualizations such as slicing, volume rendering, and isosurfacing. Similarly to the example LightStream modules, these routines can be expanded through a well-defined application programming interface (API). Additionally, the base system can display two-dimensional and three-dimensional time-varying data. As mentioned above, each of these visualizations can operate on the end result of a LightStream dataflow. The system considers a two-dimensional dataset as a special case of a slice renderer and therefore the same code base is used for two-dimensional and three-dimensional datasets. Combining all of the above design decisions allows the same code base to be used on multiple platforms seamlessly for data of arbitrary dimensions. Figure 3.7 shows the same application and visualization running on an iPhone 3G mobile device and a powerwall display.

Figure 3.7: The same application and visualization of a Mars panorama running on an iPhone 3G mobile device (a) and a powerwall display (b). Data courtesy of NASA.

3.4.3 Web-Server and Plug-In

ViSUS has been extended to support a client-server model in addition to the traditional viewer. The ViSUS server can be used as a standalone application or a web server plugin module. The ViSUS server uses HTTP (a stateless protocol) in order to support many clients. A traditional client/server infrastructure, where the client establishes and maintains a stable connection to the server, can only handle a limited number of clients robustly. Using HTTP, the ViSUS server can scale to thousands of connections. The ViSUS client keeps a number (normally 48) of connections alive in a pool using the "keep-alive" option of HTTP. The use of lossy or lossless compression is configurable by the user. For example, ViSUS supports JPEG and EXR for lossy compression of byte and floating point data, respectively. The ViSUS server is an open client/server architecture; therefore, it is possible to port the plugin to any web server which supports a C++ module (e.g., Apache, IIS). The ViSUS client can be enabled to cache data to local memory or to disk. In this way, a client can minimize transfer time by referencing data already sent, as well as having the ability to work offline if the server becomes unreachable. The ViSUS portable visualization framework (AppKit) also has the ability to be compiled as a Google Chrome, Microsoft Internet Explorer, or Mozilla Firefox web browser plugin. This allows a ViSUS framework based viewer to be easily integrated into web visualization portals.
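A minimal sketch of such a client is shown below, assuming a hypothetical tile endpoint and query parameters (the real ViSUS URL scheme is not shown here). It keeps a pool of keep-alive connections and a local cache so previously received data is never re-transferred:

```python
import requests

SERVER = "http://example.com/visus"  # hypothetical endpoint

session = requests.Session()  # reuses TCP connections via HTTP keep-alive
session.mount("http://", requests.adapters.HTTPAdapter(pool_connections=1,
                                                       pool_maxsize=48))
cache = {}  # client-side cache; also enables offline work on cached regions

def fetch(dataset, level, box):
    """Request one multiresolution block, caching the reply locally."""
    key = (dataset, level, box)
    if key not in cache:
        reply = session.get(f"{SERVER}/{dataset}",
                            params={"level": level, "box": box}, timeout=10)
        reply.raise_for_status()
        cache[key] = reply.content
    return cache[key]
```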
3.4.4 Additional Application: Real-Time Monitoring

In addition to the ones provided in this dissertation, the ViSUS framework has an additional ideal application in the real-time monitoring of large scientific simulations. Ideally, for these simulations a user-scientist would like to view a simulation as it is computed, in order to steer or correct the simulation as unforeseen events arise. Simulation data are often very large. For instance, a single field of a time-step from the S3D combustion simulation in Figure 3.8 (a) is approximately 128 gigabytes in size. In the time needed to transfer this single time-step, the user-scientist would have lost any chance for significant steering/correction of an ongoing simulation, or at least the chance to avoid wasting further resources by terminating the simulation early. By using the parallel ViSUS data format in simulation checkpointing [98, 99], we can link this data directly with an Apache server using a ViSUS plug-in running on a node of the cluster system. By doing this, user-scientists can visualize simulation data as checkpoints are reached. ViSUS can handle missing or partial data; therefore, the data can be visualized even as it is being written to disk by the system. ViSUS's support for a wide variety of clients (a stand-alone application, a web-browser plug-in, or an iOS application for the iPad or iPhone) allows the application scientist to monitor a simulation as it is produced, on practically any system that is available, without any need to transfer the data off the computing cluster. As mentioned above, Figure 3.8 (a) is an S3D large-scale combustion simulation visualized remotely from a high performance computing platform (data courtesy of Jackie Chen at Sandia National Laboratories, Combustion Research Facility, http://ascr.sandia.gov/people/Chen.htm).

Figure 3.8: Remote visualization and monitoring of simulations. (a) An S3D combustion simulation visualized from a desktop in the Scientific Computing and Imaging (SCI) Institute (Salt Lake City, Utah) during its execution on the HOPPER 2 high performance computing platform at Lawrence Berkeley National Laboratory (Berkeley, California). (b) Two ViSUS demonstrations of LLNL simulation codes (Miranda and Raptor) visualized in real-time while executed on the BlueGene/L prototype installed at the IBM booth of the Supercomputing exhibit.

This work is the natural evolution of the ViSUS approach of targeting practical applications for out-of-core data analysis and visualization. This approach has been used for direct streaming and real-time remote monitoring of early large scale simulations such as those executed on the IBM BG/L supercomputers at Lawrence Livermore National Laboratory (LLNL) [130], shown in Figure 3.8 (b). This work continues its evolution towards the deployment of high performance tools for in situ and postprocessing data management and analysis for the software and hardware resources of the future, including the exascale DOE platforms of the next decade (Center for Exascale Simulation of Combustion in Turbulence (ExaCT), http://science.energy.gov/ascr/research/scidac/co-design/).

CHAPTER 4

INTERACTIVE SEAM EDITING AT SCALE

This chapter outlines the Panorama Weaving technique which brings the boundary computation phase of the panorama creation pipeline into an interactive environment. Section 4.1 gives the relevant background and formulation for the computation of optimal image boundaries. Section 4.2 discusses how to achieve interaction with pairwise boundaries. Section 4.3 introduces the adjacency mesh data structure and how it can be used to bring pairwise seams to a global seam solution. Section 4.4 details how to extend the Panorama Weaving technique to an out-of-core environment, thereby scaling the technique to gigapixel images.
In Section 4.5, I will detail how to design an interactive system using this technique, and in Section 4.6 I will show how to scale it to large images. Finally, Section 4.7 provides results for the technique and in Section 4.8 I discuss its limitations.

4.1 Optimal Image Boundaries

In this section, we discuss the technical background for boundary calculations of both pairwise and many-image panoramas.

4.1.1 Optimal Boundaries

Given a collection of n panorama images I_1, I_2, ..., I_n and the panorama P, the image boundary problem can be thought of as finding a discrete labeling L(p) ∈ {1, ..., n} for all panorama pixels p ∈ P, which minimizes the transition between each image. If L(p) = k, this indicates that the pixel value for location p in the panorama comes from image I_k. This transition can be defined by an energy on the piecewise smoothness E_s(p, q) of the labeling of neighboring elements p, q ∈ N, where N is the set of all neighboring pixels. We would like to minimize the sum of the energy of all neighbors, E. For the panorama boundary problem, this energy is typically [4] defined as:

$$E(L) = \sum_{p,q \in N} E_s(p, q)$$

If minimizing the transition in pixel values:

$$E_s(p, q) = \|I_{L(p)}(p) - I_{L(q)}(p)\| + \|I_{L(p)}(q) - I_{L(q)}(q)\|$$

or if minimizing the transition in the gradient:

$$E_s(p, q) = \|\nabla I_{L(p)}(p) - \nabla I_{L(q)}(p)\| + \|\nabla I_{L(p)}(q) - \nabla I_{L(q)}(q)\|$$

where L(p) and L(q) are the labelings of the two pixels. Notice that L(p) = L(q) implies E_s(p, q) = 0. Minimizing the change in pixel value works well in the context of poor registration or moving objects in the scene, while minimizing the gradient produces a nice input for techniques such as gradient domain blending. In addition, techniques can use a linear combination of the two energies.

4.1.2 Min-Cut and Min-Path

When computing the optimal boundary between two images, the binary labeling is equivalent to computing a min-cut of a graph whose nodes are the pixels and whose arcs connect a pixel to its neighbors. The arc weights are then the energy function being minimized, see Figure 4.1 (a). If we consider a four-neighborhood and the dual-graph of the planar min-cut graph, as we show in Figure 4.1 (b), we can see that there is an equivalent min-path to the min-cut solution on the dual-graph. This has been shown to be true for all single source, single destination paths on planar graphs [72]. The approaches are equivalent in the sense that the final solution of a min-cut calculation defines the pixel labeling L(p) while the min-path solution defines the path that separates pixels of different labeling.

Figure 4.1: The four-neighborhood min-cut solution (a) with its dual min-path solution (b). The min-cut labeling is colored in red/blue and the min-path solution is highlighted in red.

4.1.3 Graph Cuts

This technique provides good solutions to pixel labeling problems for more than two images. The intricacies of the algorithm [26, 24, 92] are beyond the scope of this dissertation, but at a high level, Graph Cuts finds a labeling L which minimizes an energy function E'(L). This function consists of the term E_s(p, q) augmented with an energy associated with individual pixel locations, E_d(p):

$$E'(L) = \sum_{p} E_d(p) + \sum_{p,q \in N} E_s(p, q)$$

For the panorama boundary problem, this data energy E_d is typically [4] defined as being 0 if location p is a valid pixel of I_{L(p)}. Otherwise, it has infinite energy.
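As a concrete illustration of the smoothness term defined in Section 4.1.1, the sketch below evaluates the pixel-value energy for horizontal neighbor pairs in the overlap of two registered images. This is a minimal Python sketch under simplifying assumptions (grayscale images, a single neighbor direction); the gradient variant would compare image gradients instead.

```python
import numpy as np

def smoothness_energy(img_a, img_b):
    """E_s(p, q) for horizontal neighbors p, q in the overlap of two
    registered grayscale images, with L(p) = a and L(q) = b:
    E_s(p, q) = |I_a(p) - I_b(p)| + |I_a(q) - I_b(q)|."""
    diff = np.abs(img_a.astype(float) - img_b.astype(float))
    return diff[:, :-1] + diff[:, 1:]  # pair each pixel with its right neighbor

# Two synthetic 'overlap' images; low values mark good seam locations.
a = np.random.rand(8, 8)
b = np.random.rand(8, 8)
print(smoothness_energy(a, b).shape)  # (8, 7): one energy per neighbor pair
```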
4.2 Pairwise Seams and Seam Trees

Figure 4.2 illustrates two example pairwise image seams. In the simplest and most common case, Figure 4.2 (a), the boundary lines of the two images intersect at two points u and v connected by the seam s. The other simple, but more general, case in Figure 4.2 (b) shows two overlapping images where the intersection of their boundary lines results in an even number of intersection points. A set of seams can be built by connecting pairs of points with a consistent winding. The seams computed in this way define a complete partition of the space between the two images. In nonsimple cases, i.e., with co-linear boundary intervals, we can achieve the same result by choosing one representative point (possibly optimized to minimize an energy). Notice that the case in Figure 4.2 (b) produces more than a single set of valid seams, denoted by the purple and grey dashed lines. For clarity in the discussion, we will focus on the case in Figure 4.2 (a), since we can treat each seam of the case in Figure 4.2 (b) as independent.

Figure 4.2: (a) Given a simple overlap configuration, a seam can be thought of as a path s that connects pairs of boundary intersections u and v. (b) Even in a more complicated case, a valid seam configuration is still computable by taking pairs of intersections with a consistent winding about an image boundary. Note that there is an alternate configuration denoted in gray.

Assuming the dual-path energy representation in Figure 4.1 (b), a seam is a path that connects the intersection points (u, v). Computing the minimal path of a given energy function will give an optimal seam s, which can be computed efficiently with Dijkstra's algorithm [50]. With minimal additional overhead, we can compute both min-path trees T_u and T_v from u and v (single source, all paths). These trees provide all minimal seams which originate from either endpoint and define the dual seam tree of our technique. Given a point in the image overlap, we can find its minimal paths to u and v with a linear walk up the trees T_u and T_v, as shown in Figure 4.3. If this point is a user constraint, the union of the two minimal paths forms a new constrained optimal seam. Due to the simplicity of the lookup, this path computation is fast enough to achieve interactive rates even for large image overlaps. Note that two min-paths on the same energy function are guaranteed not to cross. However, since each dual seam tree is computed independently, the minimal paths from a constraint (to u and v) can cross. In particular, the trees computed by Dijkstra's algorithm depend on the order in which the edges are processed; if there are multiple paths in an overlap that share the same energy, the paths on each tree to a user constraint can cross. To avoid this problem, we enforce an ordering based on the edge index and we are guaranteed to achieve noncrossing solutions.

Figure 4.3: Given two min-path trees associated with a seam's endpoints (u, v), a new seam that passes through any point in the overlap (yellow) is a simple linear walk up each tree.

Moving an endpoint is also a simple walk up its partner endpoint's seam tree. Therefore, a user can change an endpoint location at will, interactively. However, after the movement the shifted endpoint's seam tree is no longer valid, since it was based on the previous location. If future interactions are desired, the tree must be recomputed.
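The sketch below illustrates this machinery under simplifying assumptions (Python, a node-weighted 4-neighborhood grid standing in for the dual graph): Dijkstra computes a min-path tree from each endpoint, and a constrained seam through a user point is recovered by walking each tree's parent pointers.

```python
import heapq

def min_path_tree(cost, source):
    """Dijkstra on a 4-neighborhood grid; cost[y][x] is the per-node energy
    of the dual graph. Returns distance and parent maps keyed by (y, x)."""
    h, w = len(cost), len(cost[0])
    dist = {source: cost[source[0]][source[1]]}
    parent = {source: None}
    heap = [(dist[source], source)]
    while heap:
        d, (y, x) = heapq.heappop(heap)
        if d > dist[(y, x)]:
            continue  # stale heap entry
        for n in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= n[0] < h and 0 <= n[1] < w:
                nd = d + cost[n[0]][n[1]]
                if nd < dist.get(n, float("inf")):
                    dist[n], parent[n] = nd, (y, x)
                    heapq.heappush(heap, (nd, n))
    return dist, parent

def walk_up(parent, node):
    """Linear walk from a node to the tree root."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

# Dual seam trees from both endpoints; a constrained seam through c is the
# union of the two walks: u -> c followed by c -> v.
cost = [[1, 4, 1, 1], [1, 9, 9, 1], [1, 1, 1, 1]]
u, v, c = (0, 0), (2, 3), (1, 0)
_, pu = min_path_tree(cost, u)
_, pv = min_path_tree(cost, v)
seam = list(reversed(walk_up(pu, c))) + walk_up(pv, c)[1:]
print(seam)
```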
This can be computed as a background process after the user finishes the initial interaction, without any loss of responsiveness in the system.

4.3 From Pairwise to Global Seams

To avoid incurring the cost associated with the solution of a global optimization, we build the panorama as a proper collection of pairwise seams. This is based on the observation, illustrated in Figure 4.4 (a), that the label assignment in a Graph Cut optimization mostly forms a simple collection of regions partitioned by pairwise image seams (denoted in the picture by the double-arrows). Our technique is designed with this property in mind and independently computes each seam constrained by the pairwise intersections, called branching points. These are colored in red in Figure 4.4 (b).

Figure 4.4: (a) A solution to the panorama boundary problem can be considered as a network of pairwise boundaries between images. (b) Our adjacency mesh representation is designed with this property in mind. Nodes correspond to panorama images, edges correspond to boundaries, and branching points (intersections in red) correspond to faces of the mesh. (c) Graph Cuts optimization can provide more complex pixel assignments where "islands" of pixels assigned to one image can be completely bounded by another image. Our approach simplifies the solution by removing such islands.

Note that the solution of a Graph Cuts optimization can provide more complex pixel assignments, where "islands" of pixels assigned to one image can be completely bounded by another image, as shown in Figure 4.4 (c). Obviously, our approach simplifies the solution by removing such islands and makes each region simply connected. We have checked how the energy optimized by our technique would change with this assumption (see Section 4.7). In all cases, we have noticed that the energy of the seams produced by our system remains in the same order of magnitude as Graph Cuts, actually being reduced in all cases but one. Limitations of this assumption are detailed in Section 4.8.

4.3.1 The Dual Adjacency Mesh

To construct a seam network, our computations are driven by an abstract structure that we call the dual adjacency mesh. We draw the inspiration for our adjacency mesh representation from the traditional region adjacency graph used in computer vision, as well as the regions of difference (ROD) graphs of Uyttendaele et al. [165]. In Figure 4.4 (b and c), we have the adjacency graph for a global Graph Cuts computation. This graph can be considered the dual to the seam network: each node corresponds to an image in the panorama, whereas each edge describes an overlap relation between images. Edges are then orthogonal to the seam they represent. If we consider this graph as having the structure of a mesh, the duals of the panorama branching points are the faces of this mesh representation. In Figure 4.4 (b), the branching points are highlighted in red. Seams which exit this mesh representation correspond to pairwise overlaps on the panorama boundary. These are illustrated in Figure 4.4 (b) with a single yellow endpoint. Connecting the branching points on adjacent faces in the mesh and/or the external endpoints gives a global seam network of pairwise image boundaries. In addition to the branching points in the seam network, the faces of the adjacency mesh are also an intuitive representation for overlap clusters. Specifically, clusters are groups of overlaps that share a common area that we call a multioverlap.
These multioverlaps are areas where branching points must occur. The simplest multioverlap beyond the pairwise case consists of three overlaps and is represented by a triangle, see Figure 4.5 (a). A multioverlap with four pairwise overlaps can be represented by a quadrilateral, indicating that all four pairwise seams branch at a mutual point. An important property of this representation is that this quadrilateral can be split into two triangles, a classic subdivision, see Figure 4.5 (b). Any valid (no edge crossing) subdivision of a polygon in this mesh will result in a valid seam configuration. In this way, the representation can handle a wide range of seam combinations while keeping the overall network valid. Figure 4.5 (c) shows an example subdivision of a five-way intersection.

Figure 4.5: (a) A three overlap adjacency mesh representation. (b) A four overlap initial quadrilateral adjacency mesh with its two valid mesh subdivisions. (c) A five overlap pentagon adjacency mesh with an example subdivision.

As a precomputation, we calculate the initial adjacency mesh consisting of simple n-gon face representations for every n-way cluster. This precomputation stage enables the conversion of the initial nonplanar full neighborhood graph into a planar mesh representation, see Figure 4.6. Clusters (and their corresponding multioverlaps) are by definition nonoverlapping, maximal cliques of the full neighborhood graph. This computation is a classic clique problem and is known to be NP-complete [43]. For most panoramas, we have found the neighborhood graph is small enough that a brute-force search can be computed quickly. However, previous work has shown that given a graph with a polynomial bound on the number of maximal cliques, they can be found in polynomial time [141]. This is indeed the case for the neighborhood graph, which has a maximal boxicity [140] dimension of two [36]. After the maximal cliques have been found, each n-gon face is extracted by finding the fully spanning cycle of clique vertices on the boundary in relation to the centroids of the images. The boundary edges of the n-gon face are marked as active, while the interior (intersecting) edges are marked as inactive, as shown in Figure 4.6. This adjacency mesh is used to drive the computation and combination of the pairwise boundaries as well as user manipulation. As we will illustrate in Section 4.5, it can be completely hidden from a user of the interactive system with intuitive editing concepts.

Figure 4.6: Considering the full neighborhood graph of a panorama (a), where an edge exists if an overlap exists between a pair of images, an initial valid adjacency mesh (b) can be computed by finding all nonoverlapping, maximal cliques in the full graph, then activating and deactivating edges based on the boundary of each clique.
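For the brute-force search on small neighborhood graphs described above, a standard maximal-clique enumeration suffices. A minimal sketch (Bron-Kerbosch in Python; illustrative, not the dissertation's implementation):

```python
def maximal_cliques(adj):
    """Bron-Kerbosch enumeration of maximal cliques. adj maps each image
    (node) to the set of images whose overlaps intersect it."""
    out = []
    def expand(r, p, x):
        if not p and not x:
            out.append(r)  # r cannot be extended: it is a maximal clique
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    expand(set(), set(adj), set())
    return out

# Four mutually overlapping images form one 4-clique, i.e., one
# quadrilateral face of the initial adjacency mesh.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
print(maximal_cliques(adj))  # -> [{0, 1, 2, 3}]
```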
4.3.2 Branching Points and Intersection Resolution

Given a collection of seam trees that correspond to active edges in the adjacency mesh, we can now combine the seams into a global seam network. To do this, we need to compute the branching points which correspond to each adjacency mesh face, adjust each seam given a possible new endpoint, and resolve any invalid intersections that may arise (in order to maintain consistency).

4.3.2.1 Branching points. Assuming for each pairwise seam there exist only two endpoints, for each multioverlap one endpoint must be adapted into a branching point. We refer to this endpoint as being inside in relation to the adjacency mesh face. The other seam endpoint is considered to be outside in relation to the multioverlap. These can be computed by finding the endpoints which are closest (Euclidean distance) to the multioverlap associated with the face. Figure 4.7 (a) displays these endpoints in red and the multioverlap area with a blue shading. Although it is possible to create a pathological overlap configuration where this distance metric fails, we have found that this strategy works well in practice. If we use the dual seam tree distances, i.e., the path distance values associated with the outside endpoints, we can compute a branching point which is optimal with respect to these paths, as illustrated in Figure 4.7 (b). This can be accomplished with a simple lookup of the distance values in the trees. We have found that minimizing the sum of the least squared error provides a nice low energy solution. The new path associated with a moved endpoint is determined by a simple walk up the dual seam tree, see Figure 4.7 (c). Additionally, each seam tree associated with the branching point is recalculated given its location. As Figure 4.7 (d) illustrates, the branching point is always computed using the distance field of the initial endpoint location, even if this point had been previously adjusted by an adjacent face. In practice, we have found that the contribution of the root location to the overall structure of the seam tree is minimal towards the leaves of the tree. Since using the initial starting endpoints allows the branching points to be computed independently and in a single parallel pass, we have adopted this into our technique.

Figure 4.7: (a) Pairwise seam endpoints closest to a multioverlap (red) are considered a branching point. (b) This can be determined by finding a minimum point in the multioverlap with respect to min-path distance from the partner endpoints. (c) After the branching point is found, the new seams are computed by a linear lookup up the partner endpoint's seam tree. (d) To enable parallel computation, each branching point is computed using the initial endpoint location (green) even if it was moved via another branching point calculation (red).

The seams produced by this initial process in the four-overlap case are similar to the sequential techniques introduced by Efros and Freeman [53] and Cohen et al. [42]. With the additional adjacency mesh, our technique is much more expressive in the possible seam configurations (especially allowing arbitrary valence branching points). In addition, as we will illustrate next, for panoramas, and especially in an interactive setting, one cannot assume that a seam's path to a branching point respects the paths of other seams.
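Under the assumption that each outside endpoint's seam tree provides a per-pixel path-distance field over the multioverlap, the optimal branching point described above reduces to an argmin lookup. A minimal sketch (Python; the least-squares objective follows the text, the arrays are synthetic):

```python
import numpy as np

def branching_point(multioverlap_mask, dist_fields):
    """Pick the branching point as the multioverlap pixel minimizing the
    sum of squared min-path distances to the outside endpoints (one
    distance field per incident seam tree)."""
    total = sum(d.astype(float) ** 2 for d in dist_fields)
    total[~multioverlap_mask] = np.inf  # restrict to the multioverlap
    return np.unravel_index(np.argmin(total), total.shape)

# Three incident seams -> three distance fields over a toy 32x32 overlap.
rng = np.random.default_rng(0)
fields = [rng.random((32, 32)) for _ in range(3)]
mask = np.zeros((32, 32), dtype=bool)
mask[8:24, 8:24] = True  # the multioverlap region
print(branching_point(mask, fields))
```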
4.3.2.2 Removing invalid intersections. Since each seam is computed using a separate energy function, seam-to-seam intersections beyond the branching points are possible. Small intersections of this type must be allowed to ensure solutions are computable in a four-neighborhood configuration. For instance, there would be no nonintersecting way to combine five seams into a single branching point. This allowance is defined by an ε-neighborhood around the branching point which can be set by the user. We have found that allowing an intersection neighborhood of one or two pixels gives good results with no visible artifacts from the intersection. The intersections in this neighborhood are collapsed to be co-linear to the shortest of the intersecting paths.

Intersections that occur outside of this ε-neighborhood must be resolved due to the inconsistent pixel labeling that they imply. Figure 4.8 (a, c) shows an example of intersections in a four-way image overlap. The areas highlighted in gray have conflicting image assignments. Enforcing no intersections at the time of the seam computation would complicate parallelism and be overly expensive. This corresponds to a k-way planar escape problem with multiple energies (where k is the number of seams incoming to the branching point), for which variants have been shown to be NP-complete [179]. This could also lead to possibly unstable interactions, since small movements may lead to extremely large changes in the overall seam paths. The simplest solution is to choose one assignment per conflict area. This is equivalent to collapsing the area and making the two seams co-linear at points where they "cross." Each collapse introduces a new image boundary for which the wrong energy function has been minimized, Figure 4.8 (a, b).

Figure 4.8: (a) Pairwise seams may produce invalid intersections or crossings in a multioverlap, which leads to an inconsistent labeling of the domain. The gray area on the top can be given the labels A or B and on the bottom either C or D. (b) Choosing a label is akin to collapsing one seam onto the other. This leads to new image boundaries, which were based on energy functions that do not correlate to this new boundary. The top collapse results in a B-C boundary using an A-B seam (C-D seam for the bottom). (c and d) Our technique performs a better collapse where each intersection point is connected to the branching point via a minimal path that corresponds to the proper boundary (B-C). One can think of this as a virtual addition of a new adjacency mesh edge (B-C) at the time of resolution to account for the new boundary.

In our technique, we perform a more sophisticated collapse. For a given pair of intersecting seams, multiple intersections can be resolved by taking into account only the furthest intersection from the branching point in terms of distance on the seam. Given that each seam divides the domain, this intersection can only occur between seams that divide a common image. If presented with a seam-to-seam intersection, we can easily compute the new boundary that is introduced during the collapse. This is simply a resolution seam on an overlap between the images which is not common between the intersecting seams. The resolution seams connect the intersection points with the branching points. Often, if multiple resolution seams share the same overlap, as in Figure 4.8, only one min-path calculation from the branching point is needed to fill in all min-paths. The new resolution seams are constrained by the other seams in the cluster in order to not introduce new intersections with the new paths. The constraints are also given the ability to gradually increase the allowed intersection neighborhood beyond the user defined ε-neighborhood in the chance that no solution path exists. The crossings and intersections are collapsed in this neighborhood. Due to the rarity of this occurrence, the routine adds minimal overhead to the overall technique in practice.
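The sketch below shows the first step of this resolution, locating the furthest crossing of two seams from their shared branching point, under the simplifying assumption that seams are pixel lists ordered from the branching point. The resolution seam itself would then be a min-path (e.g., with the Dijkstra sketch of Section 4.2) from this crossing back to the branching point on the appropriate overlap.

```python
def furthest_intersection(seam_a, seam_b, eps=2):
    """Return (index on seam_a, index on seam_b, pixel) of the crossing of
    two seams furthest along seam_a, ignoring crossings inside the
    eps-neighborhood of the shared branching point (index 0). Returns None
    if there is nothing to resolve."""
    on_b = {p: j for j, p in enumerate(seam_b)}
    hit = None
    for i, p in enumerate(seam_a):
        if i > eps and p in on_b and on_b[p] > eps:
            hit = (i, on_b[p], p)  # keep the last (furthest) crossing
    return hit

# Two toy seams sharing branching point (0, 0) and crossing at (3, 3).
seam_a = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
seam_b = [(0, 0), (0, 1), (1, 2), (2, 3), (3, 3)]
print(furthest_intersection(seam_a, seam_b))  # -> (3, 4, (3, 3))
```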
Order matters in both finding the intersections and computing the resolution seams, and therefore must be consistent. We have found that ordering based on the overlap size works well. Resolution seams and expanded ε-neighborhoods are considered to be temporary states. Figure 4.8 (c, d) shows an example of an intersection resolution. This technique robustly handles possible seam intersections at the branching points. Most importantly, since we are only adjusting the seam from the intersection point on, we can resolve each adjacency mesh face in parallel. In addition, since the seam is not changed outside of the multioverlap within a cluster, the resolution is local and will not cascade to a global resolution. However, it is possible for a user to introduce unresolvable intersections through added constraints, as we will discuss in Section 4.5.

4.4 Out-of-Core Seam Processing

While designed to be light on resources, the technique outlined in the previous sections assumes all images can be stored in-core. This assumption holds true for many panoramas, but not for images gigapixels in size. In this section, I will detail how the original technique can be modified to handle large panoramas.

As illustrated in Figure 4.9, the initial seam solution for the Panorama Weaving technique can be thought to occur in three phases. First, for each adjacency mesh face, the corresponding branching point must be computed. After the branching point is found, any seams which occur on the panorama boundary can be computed. These correspond to edges in our mesh that belong to only one face. See Figure 4.9 (a). As mentioned previously, the branching point computations for each face, given our simple pairwise seam assumption, are completely independent and can be computed in parallel. As a second phase, the seams for the shared edges can be found by connecting the newly computed branching points. Each shared edge can be processed independently in parallel. See Figure 4.9 (b). Finally, once all edges for a face are computed, the seam intersections for the given face can be resolved. This resolution is also independent and parallel for each adjacency mesh face. See Figure 4.9 (c). Note that the three phases need not be entirely distinct since they can be interleaved. If we bundle the branching point and shared seam calculations into a single operation, the logic of our out-of-core computation need only deal with the adjacency mesh faces. This can drastically reduce the system complexity even when working in a multicore environment.

Figure 4.9: The phases of out-of-core seam computation. (a) First, branching points are computed. The seams for all unshared edges can also be computed during this pass. (b) Second, once the corresponding branching points are computed, all shared edges can be computed with a single min-path calculation. (c) Third, once all the seams for the edges of a given face have been computed, the intersections can be resolved. Note, the three passes do not necessarily need to be three separate phases since they can be interleaved when the proper input data are ready.

By doing this, we also cast our problem into the simple problem of graph traversal for the two remaining phases of our seam processing system. For our traversal strategy, we chose a simple row-major traversal of the mesh faces.
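A single-threaded sketch of this interleaved schedule is shown below (Python; the three callbacks are hypothetical stand-ins for the actual branching point, shared-edge, and resolution computations). Each edge is represented as the set of faces that share it, so a boundary edge contains one face and a shared edge contains two:

```python
def weave_out_of_core(faces, face_edges, compute_branch, compute_edge, resolve):
    """Visit faces in row-major order; an edge's seam is computed once every
    face sharing it has a branching point (so the 'second' face fills in a
    shared edge); a face is resolved once all of its edges are done."""
    branch_done, edges_done, resolved = set(), set(), set()
    for face in faces:                                   # row-major order
        compute_branch(face)                             # phase 1
        branch_done.add(face)
        for edge in face_edges[face]:                    # edge = frozenset of faces
            if edge not in edges_done and edge <= branch_done:
                compute_edge(edge)                       # phase 2
                edges_done.add(edge)
        for f in list(branch_done - resolved):
            if set(face_edges[f]) <= edges_done:
                resolve(f)                               # phase 3
                resolved.add(f)

# Toy run: two faces sharing one edge; callbacks just record the schedule.
faces = ["F0", "F1"]
edges = {"F0": [frozenset({"F0"}), frozenset({"F0", "F1"})],
         "F1": [frozenset({"F1"}), frozenset({"F0", "F1"})]}
log = []
weave_out_of_core(faces, edges,
                  lambda f: log.append(("branch", f)),
                  lambda e: log.append(("edge", sorted(e))),
                  lambda f: log.append(("resolve", f)))
print(log)
```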
4.4.1 Branching Point and Shared Edge Phase

The design goal for this phase is to keep the memory requirement low and predictable. The reasoning for this approach is twofold. First, low memory requirements enable the portability of our out-of-core system to a wide variety of systems. Such a technique has the ability to run on systems from laptops to HPC computers. Second, when moving to a multicore implementation, a low, predictable memory requirement per face makes the logic and scheduling of the many threads operating on the adjacency mesh faces very simple. Given the resources of the system, we can predict how many faces the system can compute in parallel.

We achieve our low memory footprint by computing the branching point for a given face in a "round-robin" fashion, as shown in Figure 4.10. For each edge, the images that correspond to its endpoints are loaded and the overlap energy is calculated. Next, the seam tree needed for the branching point is calculated. After this calculation, the overlap energy is no longer needed and is therefore ejected from memory. The calculation then moves on to the next edge that shares an image with the previous computation. The rest of the branching point calculation proceeds in the same way until all seam trees have been computed to determine the location of the new branching point.

Figure 4.10: The low memory branching point calculation for our out-of-core seam creation technique. (a) Given a face for which a branching point needs to be computed, (b) the computation proceeds "round-robin" on the edges of the face to compute the needed seam trees. The images that correspond to the edge endpoints and the overlap energy are only needed during the seam tree calculation for a given edge on the face. Therefore, by loading and unloading these data during the "round-robin" computation, the memory overhead for the branching point computation is the cost of storing two images, one energy overlap buffer, and the seam trees for the given face.

As mentioned above, we couple the shared edge phase into this phase of the calculation. This is done with a flag per face to indicate whether the branching point has been computed. After the branching point has been computed for a face, the flag is set and the process checks to see if the flag is also set for the other faces on the face's shared edges. If the face calculation is the second to set this flag, it knows it can fill in the new seam for the shared edge. For the multicore implementation, this check is made atomic with locks to ensure there is always a clear first- and second-face calculation when setting flags for the endpoints of a shared seam. Note that the memory overhead for this computation is equal to the space required to store two images, one buffer for the overlap energy, and one for the seam trees for the edges of the mesh face. This overhead is quite low and, given an average image size and overlap percentage, very predictable. The geometry of the seams is stored in-core for the final phase of the calculation.
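A minimal sketch of this pass over one face follows (Python; toy images, with a cumulative-sum distance field standing in for a real seam tree computation):

```python
import numpy as np

def round_robin_branching(edge_ring, load_image):
    """Low-memory branching point pass over one face: visit edges in ring
    order, keeping only the current edge's two images and one energy buffer
    resident; per-edge distance fields (stand-ins for seam trees) accumulate
    for the final least-squares pick."""
    resident, dist_fields = {}, []
    for i, (a, b) in enumerate(edge_ring):
        for name in (a, b):
            if name not in resident:
                resident[name] = load_image(name)
        energy = np.abs(resident[a] - resident[b])   # overlap energy
        dist_fields.append(energy.cumsum(axis=0))    # toy 'seam tree' distances
        del energy                                   # eject the energy buffer
        keep = set(edge_ring[(i + 1) % len(edge_ring)])
        resident = {k: v for k, v in resident.items() if k in keep}
    total = sum(d ** 2 for d in dist_fields)
    return np.unravel_index(np.argmin(total), total.shape)

# Toy face whose ring of edges is (A,B), (B,C), (C,A): at most two images
# are ever resident at once.
imgs = {n: np.random.rand(16, 16) for n in "ABC"}
print(round_robin_branching([("A", "B"), ("B", "C"), ("C", "A")], imgs.__getitem__))
```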
4.4.2 Intersection Resolution Phase

When all seams for a face have been computed, the seam intersections that correspond to the face can be resolved. The intersection test is computed on the seam geometry, which is already in memory. If an intersection occurs within a threshold distance on the seam from the branching point, it is considered small and collapsed. This is an equivalent operation to the intersection collapse from the in-core Panorama Weaving technique. For larger intersections, a resolution seam must be computed and the two images that correspond to the overlap of the resolution seam need to be loaded before computation. See Figure 4.11 for an example.

Figure 4.11: For intersections that require a resolution seam, the two images which correspond to the overlap needed for the seam must be loaded. In the figure above, these images are the ones that correspond to the endpoints of the diagonal resolution adjacency mesh edge.

4.5 Weaving Interactive System

In this section, we outline how to create a light and fast interactive system using the Panorama Weaving technique. A simplified diagram of the operation of the system is given in Figure 4.12. In Section 4.7, we provide examples of this application editing a variety of panoramas.

Figure 4.12: Overview of Panorama Weaving. The initial computation is given by steps one through four, after which the solution is ready and presented to the user. Interactions, steps five and six, use the tree update in step four as a background process. Additionally, step six updates the dual adjacency mesh.

4.5.1 System Specifics

In this subsection, I will detail specifics for the prototype system.

4.5.1.1 Input. The system inputs for our prototype are flat, registered raster images with no geometry save the image offset. Any input can be converted into this format and therefore it is the most general. The initial image intersection computation is computed using the rasterized boundaries. Due to aliasing, there may be many intersections found. If the intersections are contiguous, they are treated as the same intersection and a representative point is chosen. In practice, we have found this choice has little effect on the seam outside a small neighborhood (less than 10 pixels from the intersection). Therefore, the system picks the minimal point in this group in terms of the energy. Pairs of intersections that are very close in terms of Euclidean distance (less than 20 pixels) are considered to be infinitesimally small seams and are ignored. The user is also allowed to dictate the energy function for the entire panorama, an image, or an overlap. This can be done as an initial input parameter or within the interactive program itself. Specifically, our prototype allows the user to switch between pixel difference or gradient difference energies.

4.5.1.2 Initial parallel computation. Parallel computation is accomplished using a thread pool equal to the number of available cores. The initial dual seam tree and branching point computations can be run trivially in parallel. In the presence of two adjacent faces in the adjacency mesh, a mutex flag must be used on their shared seam since both faces may attempt to write this data simultaneously. As a final phase, each adjacency mesh face resolves intersections in parallel. In order to compute these resolutions in parallel, we split a seam's data into three separate structures for the start, middle, and end of the seam. The middle seam contains the original seam before intersection resolution and its extent is maintained by pointers. The structure's start and end are updated with the intersection resolution seams by the faces associated with their respective branching points. Either vector can be associated with only one face; therefore, we run no risk of multiple threads writing to the same location. Each seam tree is stored as two buffers: one for node distance and one which encodes the tree itself. The tree buffer encodes the tree with each pixel referencing its parent. This can be done in 2 bits (rounded to 1 byte for speed) for a four-pixel neighborhood. Therefore, for float distances we need only 5 bytes per pixel to store a seam tree.
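In array terms (a sketch of the layout, not the actual ViSUS structures), this is one byte-wide parent-direction buffer plus one 4-byte float distance buffer:

```python
import numpy as np

# Parent directions in a four-pixel neighborhood fit in 2 bits, but are
# rounded to one byte for fast access; together with a 4-byte float
# distance this gives 5 bytes per overlap pixel.
UP, DOWN, LEFT, RIGHT = 0, 1, 2, 3  # the root pixel is marked separately

h, w = 1024, 768  # size of a hypothetical overlap
parents = np.zeros((h, w), dtype=np.uint8)      # 1 byte/pixel
distances = np.zeros((h, w), dtype=np.float32)  # 4 bytes/pixel
print(parents.nbytes + distances.nbytes == 5 * h * w)  # -> True
```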
4.5.1.3 Seam network import. It is possible to import a seam network computed with an alternative technique (such as Graph Cuts, see Figure 4.13) and edit it with our system. Our import procedure works as follows. Given a labeling of the pixels of the panorama, the algorithm first extracts the boundaries of the regions. Then, branching points (boundary intersections) are extracted. Next, each boundary segment (bounded by two branching points) is identified as a seam and the connected components of the resulting seam network are identified. To be compatible with our framework, only seam networks made of a single connected component can be imported. Thus, we only consider the biggest connected component of the network and small islands are discarded. Finally, our seam data-structures are fed with the seam network and the adjacency mesh is updated if necessary. Since the editing operations do not cascade globally, a user can edit a problem area locally and maintain much of the original solution if desired.

Figure 4.13: Importing a seam network from another algorithm. The user is allowed to import the result generated by Graph Cuts (a) and adjust the seam between the green and purple regions to unmask a moving person (b). Note that this edit has only a local effect, and that the rest of the imported network is unaltered.

4.5.2 Interactions

In this subsection, I will detail some possible interactions that can be accomplished with our system.

4.5.2.1 Seam bending. The adding of a constraint and its movement is called a bending interaction in our system and operates as outlined in Section 4.2. A user is allowed to add a constraint to a seam and is instantly provided the optimal seam which must pass through it. The constraint can be moved interactively to explore the seam solution space. Intersections in any adjacency mesh face containing the corresponding edge are resolved, which can be done in parallel. Most importantly, given how the technique resolves intersections, seams cannot change beyond the multioverlap area in these faces. Therefore, the seam resolution does not cascade globally.

4.5.2.2 Seam splitting. Adding more than one constraint is akin to splitting the seam into segments. After a bending interaction, the seam trees are split into four, where there were previously two. Two of the trees (corresponding to the endpoints) are inherited by the new seams. The two trees associated with the new constraint are identical; therefore, only one tree computation is necessary. Splitting occurs in our prototype when a user releases the mouse after a bending interaction. Editing is locked for this seam until the corresponding trees are resolved. This is a quick process and it is very rare for a user to be fast enough to beat the computation.

4.5.2.3 Branching point movement. The user is given the ability to grab and move the branching point associated with a selected face of the adjacency graph. As I have detailed in Section 4.2, a movement of an endpoint is a simple lookup on its partner's dual seam tree. As the user moves a branching point, intersections for both the selected face and all adjacent faces are resolved.
Given that the intersection resolution does not adjust seam geometry beyond the multioverlap, we need only look at this one-face neighborhood and not globally. To enable further interaction, the seam trees associated with this endpoint need to be recalculated after movement. When the user releases the mouse, the seam tree data for all the endpoints associated with the active seams for the face are recomputed as a background process in parallel. Like splitting, editing is locked for each seam until it completes the seam tree update.

4.5.2.4 Branching point splitting and merging. The user can add and remove additional panorama seams by splitting and merging branching points. Addition and removal of seams is equivalent to subdividing and merging faces of the adjacency mesh. Improper requests for a subdivision or merge correspond to an invalid seam network and are therefore restricted. If splitting is possible for a selected branching point, the user can iterate and choose from all possible subdivisions of the corresponding face. To maintain consistent seams, merging is only possible between branching points resulting from a previous split. In other words, merging faces associated with different initial adjacency mesh faces would lead to an invalid seam configuration since the corresponding images do not overlap. If a seam is added, its dual seam tree is computed. In addition, the other active seams associated with this face will need to be updated, much like a branching point movement.

4.5.2.5 Improper user interaction. Given the editing freedom allowed to users, they may move a seam into an inconsistent configuration. Figure 4.14 illustrates some examples. Rather than constrain the user, the prototype system either tries to resolve the improper seams or, if that is not possible, gives the user visual feedback indicating a problem configuration. For example, if the user introduces a seam intersection, our intersection routine is launched to resolve it, see Figure 4.14 (a). Crossing branching points, Figure 4.14 (b), can be resolved similarly. Figure 4.14 (c) illustrates a configuration with no resolution. In this instance, the crossing edges are collapsed and the user is given a visual hint that there is a problem.

Figure 4.14: Improper user constraints are resolved or, if resolution is not possible, given visual feedback. (a) Resolution of an intersection caused by a user moving a constraint. (b) Resolution of an intersection caused by a user moving a branching point. (c) A nonresolvable case where a user is just provided a visual cue of a problem.

4.6 Scalable Seam Interactions

Given the locality of the interactions in the Panorama Weaving technique, extending the interactions to gigapixel sized images does not require a large change to the base interactions. Our interaction scheme is based on a standard large image viewer and on-the-fly loading and computation of the data needed for seam editing. As Figure 4.15 shows, due to the technique's simple pairwise seam assumption, the data which need to be loaded and computed are local given an interaction. Our system works as follows: we leverage the large image, out-of-core ViSUS system outlined in Chapter 3 to provide exploration of a flattened gigapixel image created from the seams of our initial out-of-core seam computation. If a user wishes to have a seam or branching point interaction, she/he can initiate this edit by selecting the seam area she/he wishes to edit. The action is determined similarly to the in-core seam system in that the overlap bounding boxes are tested against the user selected area. If all the overlap bounding boxes for a given face are selected, then it is assumed the user wishes to have a branching point manipulation. Otherwise, it is assumed the user wishes a seam bending interaction and a single overlap from the selection is chosen.
A user can cycle through the single selection options if more than one is present for a given input.

Figure 4.15: Given the inherent locality of the seam editing interactions, only a very small subset of the adjacency mesh needs to be considered. (a) For operations on an adjacency mesh face (i.e., branching point operations), only the images and overlaps of the corresponding face and its one-face neighborhood need to be loaded and computed. (b) For edge operations (i.e., bending), we need consider only the faces that share the edge.

A brute-force bounding box intersection test has the possibility of being overly expensive for panoramas which contain thousands of images and overlaps; therefore, we have designed two more scalable options for the selection. As Figure 4.16 (a) illustrates, a bounding hierarchy of overlaps can be built by merging the bounding boxes of pairs of neighboring adjacency mesh faces. During a user selection, the hierarchy is traversed to determine which faces need to be considered for selection. Once this is determined, the selected face's overlaps are tested against the user selection. This provides a selection runtime that is logarithmic in the number of faces in our panorama. Alternatively, as Figure 4.16 (b) shows, if there is a pixel-to-image map, this can be leveraged to determine the neighborhood of overlaps that need to be tested for selection. This neighborhood consists of the edges of faces that share the node that corresponds to the pixel-map image. After the user finishes their interaction, the seams are saved and the loaded images are masked and saved to the flattened image.

Figure 4.16: When a user selects an area of a panorama to edit, the system must determine which overlaps intersect with the selected area. This can be accomplished with (a) a bounding hierarchy of the overlaps. During selection this hierarchy is traversed to isolate the proper overlaps for the selection. This gives a logarithmic lookup with respect to the number of adjacency mesh faces in the panorama. Alternatively, (b) if a pixel-to-image labeling is provided, this can be used to isolate a fixed neighborhood that needs to be tested for overlap intersection. This labeling is commonly computed if the panorama is to be fed into a color correction routine after seam computation.
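A sketch of such a bounding hierarchy follows (Python; boxes are (xmin, ymin, xmax, ymax) tuples, and pairwise merging by list order is a stand-in for true spatial neighbor merging over row-major faces):

```python
def intersects(a, b):
    """Axis-aligned box overlap test for (xmin, ymin, xmax, ymax) boxes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def build_hierarchy(leaves):
    """Bottom-up hierarchy over (face_id, box) leaves; each node is a
    (box, payload) pair where payload is a face id or a list of children."""
    nodes = [(box, face) for face, box in leaves]
    while len(nodes) > 1:
        nodes = [(union(nodes[i][0], nodes[i + 1][0]), [nodes[i], nodes[i + 1]])
                 if i + 1 < len(nodes) else nodes[i]
                 for i in range(0, len(nodes), 2)]
    return nodes[0]

def select(node, query, out):
    """Collect candidate faces whose boxes meet the user's selection box."""
    box, payload = node
    if intersects(box, query):
        if isinstance(payload, list):
            for child in payload:
                select(child, query, out)
        else:
            out.append(payload)  # candidate face; test its overlaps next

faces = [("F0", (0, 0, 10, 10)), ("F1", (8, 0, 20, 10)), ("F2", (18, 0, 30, 10))]
root = build_hierarchy(faces)
hits = []
select(root, (9, 2, 12, 5), hits)
print(hits)  # -> ['F0', 'F1']
```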
4.7 Results

In this section, we detail the results of both the creation and editing phases of our system. In-core results were performed on a 3.07 GHz Intel i7 four-core processor (with Hyperthreading) with 24 gigabytes of memory. The large system memory was required in order to run the Graph Cuts implementation, as is, on all datasets. Panorama Weaving performed well for all datasets on test systems including laptops with only 4 gigabytes of memory. Out-of-core results were run on a 2.67 GHz Xeon X5550 (eight core) system with 24 gigabytes of memory.

4.7.1 Panorama Creation

This subsection details the results for in-core and out-of-core initial seam creation.

4.7.1.1 In-core results. We compare the panorama creation phase of our system to the implementation provided by the authors of the Graph Cuts technique [26, 24, 92], which many consider the exemplary implementation. Both the α-expansion and swap algorithms were run until convergence to guarantee minimal errors, and the best time is reported. Since this implementation has various ways of passing data and smoothness terms, we tested all and report the fastest, which is precomputed arrays for the costs with a function pointer acting as a lookup. Not having an equally well-established in-core parallel implementation for Graph Cuts, we use a serial version of our algorithm for comparison. Timings for Graph Cuts are based on the implementation's reported runtime. Due to the parallel option of Panorama Weaving, its timings are based on wall-clock time. Datasets which contain more than simple pairwise overlaps were run at full resolution, and the running times and energy comparisons are provided in Table 4.1. Our technique produces lower energy seams for all but one example, Fall-5way, and even in this case the techniques have comparable energy. In terms of performance, serial Panorama Weaving computes its solution faster than Graph Cuts for all datasets (at the same resolution). As the Graph Cuts results show, a hierarchical approach would be necessary to achieve similar performance by trading quality for speed. Parallel Panorama Weaving further reduces the runtime down to mere seconds for all datasets at full resolution. On average, we see that the scaling between Panorama Weaving's serial and parallel implementations is about a five times speedup. This is in sync with the number of physical cores in the test system. Hyperthreading is effective when data access is a main bottleneck; a speedup corresponding to the number of physical cores should be expected when an algorithm is compute-bound, which is true for Panorama Weaving. Therefore, our implementation is scaling quite well on our test system.

Table 4.1: Performance results comparing Panorama Weaving to Graph Cuts for our test datasets that contain more than simple pairwise overlaps. Panorama Weaving run serially (PW-S) computes solutions quickly. When run in parallel, runtimes are reduced to just a few seconds. The energy ratio (E. ratio) between the final seam energy produced by Panorama Weaving and Graph Cuts (PW Energy / GC Energy) is shown. For all but one dataset (Fall-5way), Panorama Weaving produces a lower energy result. It is comparable otherwise. Panorama image sizes are reported in megapixels (MP); times are in seconds.

  Dataset     Size (MP)   Images   PW Parallel (s)   PW Serial (s)   GC Serial (s)   E. Ratio
  Crosswalk   16.7        4        1.3               7.2             369.6           0.995
  Fall-5way   30.0        5        2.4               12.1            735.4           1.220
  Skating     44.7        6        3.2               16.8            734.0           0.851
  Lake        9.4         22       0.5               2.9             337.2           0.503
  Graffiti    36.6        10       4.3               19.6            983.7           0.707
  Nation      49.1        9        4.6               23.2            1168.7          0.800

4.7.1.2 Out-of-core results. To test the performance of our out-of-core implementation, we computed the seams for two large panoramas on our eight core test system. The Fall Salt Lake City panorama consists of 611 overlapping images and is 126,826 x 29,633 (3.27 gigapixel) when combined. The final image that is the result of our seam computation is provided in Figure 4.17 (b). Additionally, in Table 4.2 we provide a strong scaling test of our implementation for this panorama as we vary the core count from one to eight. As this table illustrates, our implementation shows very good efficiency up to the number of cores of our test system.
At eight cores, our efficiency takes a slight dip due to our implementation using a dedicated thread to schedule the face computation for each phase of the out-of-core Panorama Weaving technique. On a single core, our system can produce a seam solution in only 68.5 minutes. As I have discussed in Section 2.1, Hierarchical Graph Cuts [2] does not scale beyond two to three levels of the hierarchy. At three levels, the coarse version of this panorama is still approximately 100 megapixels in size with 611 labels and could not be run on our test system. Therefore, to provide context for our running times, we compare our technique to the predicted runtime of a similar technique [94] which relies on a moving window of a Graph Cuts solution. Figure 4.17 (a) provides an example of one of these windows. In our tests, a Graph Cuts solution took 3003.86 seconds to converge for this window. Even more problematic is that the first iteration for this window still took a very long time to compute: 612.99 seconds. Therefore, if this window is a good representation for this panorama dataset (which our testing indicates it is), computing all 495 windows in this dataset would take days (85.15 hours) at best, and weeks (17.2 days) in the worst case. Our implementation can compute a full resolution solution in a little over an hour on a single core and in only a few minutes when run on eight cores.

Figure 4.17: Fall Salt Lake City, 126,826 x 29,633, 3.27 gigapixel, 611 images. (a) An example window computed with the out-of-core Graph Cut technique introduced in Kopf et al. [94]. This single window took 50 minutes for Graph Cuts to converge, with the initial iteration requiring 10.2 minutes. Since the full dataset contains 495 similar windows, using the windowed technique would take days (85.15 hours) at best, and weeks (17.2 days) in the worst case. (b) The full resolution Panorama Weaving solution was computed in 68.4 minutes on a single core and 9.5 minutes on eight cores. Our single core implementation required a peak memory footprint of only 290 megabytes, while using eight cores had a peak memory of only 1.4 gigabytes.

Table 4.2: Strong scaling results for the Fall Salt Lake City panorama, 126,826 x 29,633, 3.27 gigapixel, 611 images. Our out-of-core Panorama Weaving technique scales very well in terms of efficiency compared to ideal scaling up to the physical cores of our test system (eight cores). At eight cores, our technique loses a slight amount of efficiency due to our implementation having a dedicated thread to handle the seam scheduling. Using the full eight cores to process this panorama provides a full resolution seam solution in just 9.5 minutes. The system is extremely light on memory and uses at most 1.4 gigabytes.

  Cores   Time (s)   Ideal (s)   Efficiency   Max Mem.
  1       4109       NA          NA           290 MB
  2       2079       2054.5      98.8%        443 MB
  3       1403       1369.7      97.6%        599 MB
  4       1049       1027.3      97.9%        791 MB
  5       840        821.8       97.8%        881 MB
  6       706        684.8       97.0%        1.1 GB
  7       601        587.0       97.7%        1.2 GB
  8       573        513.6       89.6%        1.4 GB

Also of note is the small use of memory inherent in our scheme. At any given time, we need only hold the cost of computing the seams for one face per core. Therefore, the per-core memory footprint is only the cost of holding two images, an energy buffer, and the seam trees for a given face in memory. Even with floating point precision and eight cores, for this dataset our technique uses at most 1.4 gigabytes of memory. To further test the scalability of our system, our implementation was run on an even larger image.
The Lake Louise panorama consists of 1512 images and is 187,069 x 40,202 (7.52 gigapixel) when combined. In Table 4.3 we provide a strong scaling test of our implementation for this dataset. Like the previous example, our implementation shows very good efficiency for one to eight cores on our test system. The system is very light on memory resources and needs only 2.0 gigabytes of memory to operate on all eight cores. When using all cores, our implementation can provide a seam solution in only 37.7 minutes. The final image resulting from our seam calculation is provided in Figure 4.18.

Table 4.3: Strong scaling results for the Lake Louise panorama, 187,069 x 40,202, 7.52 gigapixel, 1512 images. Like the smaller Fall Salt Lake City panorama, our implementation shows very good efficiency up to the physical number of cores on our test system. Using the full eight cores, the full resolution seam solution for this panorama requires 37.7 minutes of compute time and at most 2.0 gigabytes of memory.

  Cores   Time (s)   Ideal (s)   Efficiency   Max Mem.
  1       16279      NA          NA           382 MB
  2       8263       8139.5      98.51%       627 MB
  3       5516       5426.3      98.37%       877 MB
  4       4132       4069.8      98.49%       1.1 GB
  5       3306       3255.8      98.48%       1.4 GB
  6       2778       2713.2      97.67%       1.6 GB
  7       2383       2325.6      97.59%       1.8 GB
  8       2259       2034.9      90.08%       2.0 GB

Figure 4.18: Lake Louise, 187,069 x 40,202, 7.52 gigapixel, 1512 images. The Panorama Weaving results for the Lake Louise panorama. Our out-of-core seam computation produces this full resolution solution in as little as 37.7 minutes while requiring at most only 2.0 gigabytes of memory. Panorama courtesy of City Escapes Nature Photography.

4.7.2 Panorama Editing

We provide additional results of the interactive portion of our technique editing a variety of panoramas. Images which are color-corrected were processed using gradient domain blending [133, 103].

4.7.2.1 Editing bad seams. In Figure 4.19, the Nation dataset is a highly dynamic scene of a busy intersection with initial seams that pass through moving cars/people, see Figure 4.19 (d). In addition, there are various registration artifacts, see Figure 4.19 (e). Before our technique, a user would consider this panorama unsalvageable or be required to manually edit the boundary masks pixel-by-pixel. In just a few minutes using our system, a user can produce an appealing panorama by adjusting seams to account for the moving objects and pulling registration artifacts into areas where they are less noticeable.

Figure 4.19: Panorama Weaving on a challenging dataset (Nation, 12848 x 3821, nine images) with moving objects during acquisition, registration issues, and varying exposure. Our initial automatic solution (b) was computed in 4.6 seconds at full resolution for a result with lower seam energy than Graph Cuts. Additionally, we present a system for the interactive user exploration of the seam solution space (c), easily enabling: (d) the resolution of moving objects, (e) the hiding of registration artifacts (split pole) in low contrast areas (scooter), or (f) the fix of semantic notions for which automatic decisions can be unsatisfactory (stoplight colors are inconsistent after the automatic solve). The user editing session took only a few minutes. (a) The final, color-corrected panorama.

Figure 4.20 (a, b and c) shows the initial seam configuration for the Skating dataset with two problem areas. The initial seams pass through people who change position on the ice and produce
(a) The initial seam configuration for the Skating dataset (9400 x 4752, six images) based on gradient energy. (b and c) Its two major problem areas. (d and e) Using our technique, a user can repair the panorama but also has the choice of two valid seam configurations. Panorama courtesy of City Escapes Nature Photography.

As shown in the companion video, repairing these seams takes only a few seconds of interaction; see Figure 4.20 (d and e) for the edited results. Figure 4.21 illustrates how a user can correct registration artifacts that appear on the moon's horizon in the Apollo-Armstrong dataset.

4.7.2.2 Multiple valid seams. Along with repairing non-ideal seams, Figures 4.19 and 4.20 (Nation and Skating) are also examples of a user choosing between multiple valid seam configurations. In Figure 4.19 (f), the initial seam calculation for the Nation dataset produces an intersection with four red stoplights, an inconsistent configuration. With our system, a user can turn two stoplights green, creating a more realistic setting. Figure 4.20 (bottom) shows two valid seam configurations that the user can choose between while fixing the Skating dataset. Each was repaired with a simple bend of the panorama seam. In Figure 4.22, we provide an example of how a user can fix registration artifacts in the Graffiti dataset while tuning the seam location for improved results in the final color correction. For gradient-domain blending, smooth, low-gradient areas provide the best results; therefore, the user placed the seams in the smooth wall locations, Figure 4.22 (c). This editing session required just two minutes of interaction. Finally, in Figure 4.23 we show the color-corrected edits of the originally optimal, but not visually pleasing, seams of Figure 1.7 for the two datasets Canoe and Lake Path. Both interactions required only a few seconds of user input. Figure 4.24 is a lake vista with multiple dynamic objects moving in the scene during acquisition. In all, there are six independent areas in the panorama where a canoe, or group of canoes, changes position in overlap areas. Figure 4.24 shows two examples of alternative edits. A user editing with our technique has the choice of 64 valid seam combinations of canoes. In Figure 4.25, we show a user iterating through the valid splitting options of a five-valence branching point in the Fall-5way dataset. In this way, we allow users the freedom to add and remove seams as they see fit. Finally, the images Crosswalk and Apollo-Aldrin in Figure 1.9 were created and edited in our system to show how panoramas can have multiple valid seam configurations.

Figure 4.21: A panorama taken by Neil Armstrong during the Apollo 11 moon landing (Apollo-Armstrong: 6,913 x 1,014, eleven images). (a) Registration artifacts exist on the horizon. (b) Our system can be used to hide these artifacts. (c) The final color-corrected image. Panorama courtesy of NASA.
Figure 4.22: In this example (Graffiti: 10,899 x 3,355, ten images), (a) the user fixed a few recoverable registration artifacts and tuned the seam location for improved gradient-domain processing, yielding a colorful color-corrected graffiti. (b) Our initial automatic solution (energy function based on pixel gradients). (c) The user-edited panorama. The editing session took 2 minutes.

Figure 4.23: The color-corrected, user-edited examples from Figure 1.7. The artifacts caused by the optimal seams can be repaired by a user. Images courtesy of City Escapes Nature Photography.

Figure 4.24: A lake vista panorama (Lake: 7,626 x 1,231, 22 images) with canoes that move during acquisition. In all, there are six independent areas of movement; therefore, there are 64 possible seam configurations of different canoe positions. Here we illustrate two of these configurations with color-corrected versions of the full panorama (a and c) and a zoomed-in portion of each panorama (b and d) showing the differing canoe positions. Panorama courtesy of City Escapes Nature Photography.

Figure 4.25: Splitting a five-valence branching point based on gradient energy in the Fall-5way dataset (5211 x 5177, 5 images): as the user splits the pentagon, the resulting seams mask/unmask the dynamic elements. Note that each branching point with a valence higher than 3 can be further subdivided.

4.8 Limitations and Future Work

Our technique is versatile and can robustly handle a multitude of panorama configurations. However, there are currently limits on the configurations we can handle. The adjacency mesh data structure in its current form relies on the fact that the intersection of pairwise overlaps yields an area of exactly one connected component (which is needed to guarantee the manifold structure of the mesh). For example, fewer than one connected component would arise in a situation where one overlap is completely enclosed inside another, and more than one can be caused by an overlap's area passing through the middle of another overlap. Both of these cases break the pairwise seam network assumption. In addition, an image whose boundary is completely enclosed by another image's boundary (100% overlap) is currently considered invalid. These are pathological cases that we have yet to encounter in practice. Overall, the authors feel that these limitations are only temporary and that the data structures and methods outlined in this work are general enough to support these cases as a future extension.

The serial and parallel out-of-core implementations discussed above show that seam processing scales well to images gigapixels in size. The schemes show good scaling for our test datasets even though there is some redundancy in the file I/O: each image is loaded once for every multioverlap of which it is a member. In other words, an image is loaded (and reloaded) a number of times equal to the valence of its adjacency mesh node. As future work, my collaborators and I wish to explore caching strategies for images and overlaps, as well as how these strategies interplay with different face traversal orders.

CHAPTER 5

INTERACTIVE GRADIENT DOMAIN EDITING AT SCALE

This chapter introduces the Progressive Poisson technique, which brings the color-correction phase of the panorama creation pipeline into an interactive environment. This technique provides gradient domain processing, the most sophisticated and computationally expensive correction method, interactively. The operation is inherently global; therefore, before this work, there were no known techniques for applying it to massive images interactively. Section 5.2 outlines the interactive technique and a full resolution, out-of-core Progressive Poisson solver. Section 5.3 extends the full solver to a parallel distributed environment, and Section 5.4 shows how it can be redesigned as a cloud-based resource.
5.1 Gradient Domain Image Processing

Gradient domain image processing encompasses a family of techniques that manipulate an image based on the values of a gradient field rather than operating directly on the pixel values. Seamless cloning, panorama stitching, and high dynamic range tone mapping are all techniques that belong to this class. Given a gradient field $G(x,y)$, defined over a domain $\Omega \subset \mathbb{R}^2$, we seek an image $P(x,y)$ whose gradient $\nabla P$ fits $G(x,y)$. In order to minimize $\|\nabla P - G\|$ in a least squares sense, one has to solve the following optimization problem:

$$\min_P \iint_\Omega \|\nabla P - G\|^2 \quad (5.1)$$

It is well known that minimizing (5.1) is equivalent to solving the Poisson equation $\Delta P = \operatorname{div} G(x,y)$, where $\Delta$ denotes the Laplace operator $\Delta P = \frac{\partial^2 P}{\partial x^2} + \frac{\partial^2 P}{\partial y^2}$ and $\operatorname{div} G(x,y)$ denotes the divergence of $G$. To adapt these equations to discrete images, we apply a standard finite difference approach which approximates the Laplacian as:

$$\Delta P(x,y) = P(x+1,y) + P(x-1,y) + P(x,y+1) + P(x,y-1) - 4P(x,y) \quad (5.2)$$

and the divergence of $G(x,y) = (G_x(x,y), G_y(x,y))$ as:

$$\operatorname{div} G(x,y) = G_x(x,y) - G_x(x-1,y) + G_y(x,y) - G_y(x,y-1).$$

The differential form $\Delta P = \operatorname{div} G(x,y)$ can therefore be discretized into the following sparse linear system:

$$Lp = b. \quad (5.3)$$

Each row of the matrix $L$ stores the weights of the standard five-point Laplacian stencil given by (5.2), $p$ is the vector of pixel colors, and $b$ encodes the guiding gradient field as well as the boundary constraints. The choice of guiding gradient field $G(x,y)$ and boundary conditions for the system determines which image processing technique is applied. In the case of seamless cloning, it is necessary to use Dirichlet boundary conditions, set to the color values of the background image at the boundaries, with the guiding gradient being the gradient of the source image (see [133] for a detailed description). For tone mapping and image stitching, Neumann boundary conditions are used. The guiding gradient field for image stitching is the composited gradient field of the original images; the unwanted gradient between images is commonly set to zero or averaged across the stitch. The guiding gradient for tone mapping is adjusted from the original pixel values to compress the high dynamic range (HDR) (see [55] for more detail). Methods such as gradient domain painting [114] allow the guiding gradient to be user defined.
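To make the discretization concrete, here is a minimal sketch in Python with NumPy/SciPy (the function name and the pinning of one pixel to fix the free constant are our own illustrative choices, not the dissertation's implementation). It assembles the sparse system of (5.3) using the stencil of (5.2), dropping off-image taps in the spirit of Neumann boundaries, and solves it directly; at massive scales, this direct solve is precisely what the following sections avoid.

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import spsolve

def poisson_solve(gx, gy):
    """Assemble and solve L p = b (Eq. 5.3): L holds the five-point
    Laplacian stencil of Eq. 5.2, b the discrete divergence of the
    guiding field (gx, gy); off-image stencil taps are simply dropped
    (Neumann-style boundaries)."""
    n, m = gx.shape
    idx = lambda y, x: y * m + x
    L = lil_matrix((n * m, n * m))
    b = np.zeros(n * m)
    for y in range(n):
        for x in range(m):
            i = idx(y, x)
            # div G(x,y) = Gx(x,y) - Gx(x-1,y) + Gy(x,y) - Gy(x,y-1)
            b[i] = gx[y, x] - (gx[y, x - 1] if x > 0 else 0.0) \
                 + gy[y, x] - (gy[y - 1, x] if y > 0 else 0.0)
            for dy, dx in ((0, 1), (0, -1), (1, 0), (-1, 0)):
                yy, xx = y + dy, x + dx
                if 0 <= yy < n and 0 <= xx < m:
                    L[i, idx(yy, xx)] = 1.0
                    L[i, i] -= 1.0   # center weight = -(in-bounds neighbors)
    # The pure-Neumann system fixes p only up to an additive constant;
    # pin one pixel so the sparse direct solve is well posed.
    L[0, :] = 0.0
    L[0, 0] = 1.0
    b[0] = 0.0
    return spsolve(L.tocsr(), b).reshape(n, m)
```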
5.2 Progressive Poisson Solver

This section discusses a progressive framework for solving very large Poisson systems in massive image editing. The technique allows for a simple implementation, yet is highly scalable and performs well even with limited storage and processing resources.

5.2.1 Progressive Framework

For an image $P$ of $n \times n$ pixels, the Laplace system (5.3) has $n^2$ independent variables, one per pixel. Computing the entire solution is therefore expensive in both space and time. For large images, the space requirements easily exceed the main memory available on most computers. Moreover, the long computation times make any interactive application unfeasible. Acceleration methods try to address either or both of these issues. The recent adaptive formulation by Agarwala [2] has been particularly insightful: by exploiting the smoothness of the solution, this method was the first to reduce both the cost of the computation and its memory requirements. The approach by Kazhdan and Hoppe [89] demonstrates how a streaming approach can achieve high performance by optimizing the memory access patterns. We extend these acceleration techniques and show how to achieve high quality local solutions without the need to solve the entire system. Moreover, we show that coarse approximations are of acceptable visual quality without the cost of the typical coarsening stage used in the V-cycle. These new features, coupled with a simple multiresolution framework, enable a data-driven interactive environment that exploits the fact that interactive editing sessions are always limited by screen resolution. At any given time, a user only sees either a low resolution view of the entire image or a high resolution view of a small area. We take advantage of this practical restriction and solve the Poisson equation only for the visible pixels. This provides performance advantages for interactive sessions, as well as tight control over memory usage. For example, even the simple step of computing the gradient of the full resolution image can be problematic due to its significant processing time and storage requirements. In our approach, we avoid this problem by estimating gradients on-the-fly using only the available pixels. Overall, our interactive system is based on a simple two-tier approach:

• A global progressive solver provides a near-instant coarse approximation of the full solution. This approximation can be refined up to a desired solution by a lightweight process, often running in the background and possibly out-of-core. Any time the user changes input parameters, this process is restarted.

• A local progressive solver provides a quick solution for the visible pixels. This process is driven by the interactive viewer and uses as a coarse approximation the best solution available from the global solver.

These components can be coupled with different multiresolution hierarchies, as discussed in the next section.

5.2.1.1 Initial solution. At launch, the system computes a coarse image for the initial view. A fast two-dimensional direct method using cosine and Fast Fourier transforms by Agrawal [7, 8] is used for this initial solve for techniques that require Neumann boundaries (stitching, HDR compression). For methods that require Dirichlet boundaries (seamless cloning), we use an iterative method such as SOR. To provide the user with a meaningful preview, we use an initial coarse resolution of one to two megapixels, depending on the physical display. We have found, in practice, that the Fast Fourier solver usually gives us this approximation in under 2 seconds. This initial solution is at the core of the progressive refinement defined in the next paragraph.

5.2.1.2 Progressive refinement. The goal of progressive refinement is to increase the resolution of our solution either locally or globally. This requires injecting color transport information from coarser to finer resolutions. In doing so, we exploit the fact that the solution, away from the seams, tends to be smooth [2], so up-sampling the coarse solution gives high quality results in large areas of the image. To improve the solution and resolve the problems at the seams, we run an iterative method, estimating new gradients from the original pixel data of the finer resolution and using the up-sampled values as the initial solution estimate. The finer resolution gradient field allows the iterative solver to reconstruct the detailed features of the original image. For the iterative method, we allow the use of either conjugate gradient (for faster convergence) or SOR (for minimal memory overhead). The iterative solver is assumed to have converged when the $L^2$ norm of the residual between iterations is less than $1.0 \times 10^{-3}$. In practice, there is no perceptible difference between iterations after this condition is met.
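As a concrete illustration of one refinement step, here is a minimal sketch anticipating the two-neighbor difference averaging of Figure 5.1, described next (the function and the row-only doubling are our own simplification of the scheme, not the system's code):

```python
import numpy as np

def refine_rows(coarse_solved, coarse_orig, fine_orig):
    """One refinement step on a subsampled hierarchy (rows only, as in
    Figure 5.1): keep the solved rows, and initialize each newly added
    row from its original pixels offset by the average residual
    P_r = P_s - P_n of its two solved neighbors. A few SOR or CG
    iterations, with gradients from fine_orig, then smooth the result."""
    n, m = coarse_solved.shape
    residual = coarse_solved - coarse_orig          # P_r per solved pixel
    fine = np.empty((2 * n, m), dtype=np.float64)
    fine[0::2] = coarse_solved                      # solved rows carry over
    for y in range(n):
        below = residual[min(y + 1, n - 1)]         # clamp at the last row
        fine[2 * y + 1] = fine_orig[2 * y + 1] + 0.5 * (residual[y] + below)
    return fine
```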
Figure 5.1 shows the refinement process, where we assume for simplicity that each resolution doubles each dimension separately and that our data is a subsampled hierarchy. In this case, computing each finer resolution is equivalent to adding new rows (or columns) to the coarse resolution. Therefore, we know that each new pixel added has two neighbors from the coarse solution. We can take the average difference from these two neighbors and apply it to the original RGB value of the pixel from the new resolution (see Figure 5.1 (a)). Since the image is subsampled, computing the average difference and applying it to the new pixel is trivial. In a tiled hierarchy, one would need to double both dimensions at the same time, requiring a simple adjustment to the interpolation. Each new resolution is treated as new data, and the offset is based on the solution from the previous resolution and the transform between levels.

Figure 5.1: Our adaptive refinement scheme using simple difference averaging, with $P_r = P_s - P_n$, where $P_s$ is a solved pixel and $P_n$ is a new pixel from the original image; a newly inserted pixel is initialized as $P_n(x,y) + \frac{1}{2}(P_r(x-1,y) + P_r(x+1,y))$, and analogously with vertical neighbors, with resolutions progressing $n \times n \to 2n \times n \to 2n \times 2n \to \dots$ (a) Global progressive up-sampling of the edited image computed by a background process. (b) View-dependent local refinement based on a $2k \times 2k$ window. In both cases, we speed up the SOR solver with an initial solution obtained by smooth refinement of the coarser solution.

5.2.1.3 Local preview. Combining the coarse, global solve with a progressively refined local preview is all that is necessary for our interactive system. For data requests at resolutions equal to or less than our coarse solution, we simply display the available data. As the user zooms into an area, the image is progressively refined in a local region. Since the resolution increase is directly coupled with the decrease in the extent of the local view, the number of pixels that must be processed remains constant (see Figure 5.1 (b)). This results in a logarithmic run-time complexity and a constant storage requirement, which allows our system to gracefully scale to images orders of magnitude larger than previously possible.

5.2.1.4 Progressive full solution. The progressive refinement can be applied globally to compute a full solution. Since the method requires very little overhead, it can easily be run as a background process during the interactive preview. When a new resolution has been solved, the interactive preview uses the solution as a new coarse approximation, thereby saving computation during the local adaptive phase. Like other in-core methods, this progressive global solver is limited by available system memory. To address this issue, the global solver has the ability to switch modes to a moving-window, out-of-core progressive solver.

5.2.1.5 Out-of-core solver. The out-of-core solver maintains strict control over memory usage by sweeping the data with a sliding window. The window traverses the finest desired resolution, which can be several levels down the hierarchy from the currently available solution.
If the jump in resolution is too high, the process can be repeated several times. Within each window, the coarse solution is up-sampled and the new resolution image is solved using the gradients from the desired resolution. Since the window lives at the desired resolution, we never need to expand memory beyond the size of the window. Furthermore, windows are arranged such that they overlap with the previously solved data in both dimensions to produce a smooth solution. The overlap causes the solver to load and compute some of the data multiple times, an inherent overhead compared to an idealized in-core solver. For instance, given a $1/x$ overlap, the four corners, each $1/x \times 1/x$ in size, are executed four times. The four strips on the edges of the window, not including the corners, each $1/x \times (1 - 2/x)$ in size, are executed two times. All other pixels, covering $(1 - 2/x) \times (1 - 2/x)$, are executed once. Counting only the extra executions, the corners contribute $12/x^2$ and the edge strips $4/x(1 - 2/x)$; therefore, the computational overhead for this $1/x$ overlap is given by $4/x(1 + 1/x)$. Moreover, the I/O overhead can be reduced to $1/x$, since we can retain pixel data from the previous window in the primary traversal direction. In principle, a larger overlap between windows results in higher quality solutions, though in practice we have found that for a $1024 \times 1024$ window a $1/32$ overlap is sufficient for good results. This overlap requires only a 12.8% compute overhead and a 3.1% I/O overhead. A larger window can be used to reduce the percentage overlap while achieving the same quality results. For instance, by doubling the window size in both dimensions, a $2048 \times 2048$ window can be computed with a $1/64$ overlap, incurring only a 6.3% compute overhead and a 1.5% I/O overhead. Compared to the exact analytical solution, our method produces even higher quality results than the best known method [89] for equivalent run times.
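A quick numeric check of these overlap overheads (a throwaway sketch; the function is ours, the quantities are those quoted above):

```python
def overlap_overhead(x: float) -> tuple[float, float]:
    """Extra work of the sliding-window solver for a 1/x overlap:
    corners run four times, edge strips twice, interior pixels once,
    so the extra compute is 12/x**2 + (4/x)(1 - 2/x) = 4/x * (1 + 1/x);
    retaining pixels from the previous window leaves 1/x extra I/O."""
    compute = 12 / x**2 + (4 / x) * (1 - 2 / x)
    return compute, 1 / x

print(overlap_overhead(32))  # ~(0.129, 0.031): the 12.8%/3.1% above, up to rounding
print(overlap_overhead(64))  # ~(0.063, 0.016): the 6.3%/1.5% above
```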
5.2.2 Data Access

Our progressive solver can operate well on multiple hierarchical schemes. Tiled hierarchies are often used to produce smoother, antialiased images, though high contrast areas in the original image may be lost in the smoothing. As Figure 5.2 (b) shows, the tiled image is visually pleasing, but details such as the cars on the highway are lost. This visual smoothness can also come at the cost of significant preprocessing, reduced flexibility when dealing with missing data, and increased I/O when traversing the data. These costs can be especially significant for massive data if one has to process it with very limited resources. The least costly image hierarchy can be computed by subsampling. Subsampling is simple and lightweight but is prone to high frequency aliasing. It does, though, retain higher contrast at the coarse resolutions. Figure 5.2 (a) shows how the subsampled hierarchy has aliasing artifacts but also retains enough contrast to see the cars on the highway. This contrast may be beneficial for some applications, such as an analyst studying satellite imagery. To show the flexibility of our interactive system, we support both a filtered tiled hierarchy and a subsampled hierarchy (see Figure 5.3).

Figure 5.2: Subsampled and tiled hierarchies. (a) A subsampled hierarchy. As expected, subsampling has the tendency to produce high-frequency aliasing, though details such as the cars on the highway and in the parking lots are preserved. (b) A tiled hierarchy. This produces a more visually pleasing image at all resolutions, but at the cost of potentially losing information. The cars are now completely smoothed away. Data courtesy of the U.S. Geological Survey.

Figure 5.3: Our progressive framework using subsampled and tiled hierarchies. (a) A composite satellite image of Atlanta, over 100 gigapixels at full resolution, overlaid on a subsampled Blue Marble background; (b) a tiled version of the same satellite image; (c) the seamless cloning solution using subsampling; (d) the same solution computed using a tiled hierarchy; (e) the solution offset computed using subsampling; (f) the solution computed using tiles; (g) a full resolution portion computed using subsampling; (h) the same portion using tiling. Note that even though there is a slight difference in the computed solutions, both the tiled and the subsampled hierarchies produce a seamless stitch with our framework. Data courtesy of the U.S. Geological Survey and NASA's Earth Observatory.

For a tiled scheme, we compute the image hierarchy using a Gaussian kernel to produce a smooth, antialiased image (Figure 5.3, right column). With a minor variation to the underlying I/O layer, our system also supports a faster, subsampled Hierarchical Z-order as proposed by Pascucci and Frank [2002] (Figure 5.3, left column). For an overview of the HZ data format, see Section 3.1. To achieve the level of scalability necessary in the current system, we further simplify the HZ data access scheme. We use a lightweight recursive algorithm that avoids repeated index computations, provides progressive and adaptive access, guarantees cache coherency, and minimizes the number of I/O operations without using any explicit caching mechanisms. In particular, computing the HZ index with this new algorithm attains a thirty-times speedup compared to the previous work. For example, to compute the indices for a 0.8 gigapixel image, the new algorithm requires 4.7 seconds where the previous method would take 144.1 seconds. Moreover, since the traversal follows the HZ storage order exactly for any query window, we guarantee that each file is accessed only once without the need to hold any block of data in cache. For details on our new recursive algorithm, see Section 3.2. This approach makes the system intrinsically cache friendly for any realistic caching architecture and, therefore, very flexible in exploiting modern hardware. Conversion into HZ-order requires no additional storage. On the other hand, for tiled hierarchies a 1/3 data increase is common. Due to our new data access scheme, conversion to HZ-order is straightforward and inexpensive. For our test data, we have found that there is only a 27% overhead for the conversion compared to just copying the raw data. In essence, the conversion is strictly a reordering of the data and requires no operations on the pixel data. This conversion will outperform even the simplest tiled hierarchies, which require some manipulation of the pixel data. Each resolution in the HZ-hierarchy is in plain Z-order, which allows for fast, cache coherent access to subregions of the image. HZ is not tied to a specific data traversal order, such as the row-major order imposed by traditional file formats, as previously observed in [89]. In fact, HZ maintains a high degree of cache coherency even during adaptive local traversals.
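For intuition, the plain Z-order component of such an index can be computed by bit interleaving; a minimal sketch (our own illustrative function, omitting the hierarchical level prefix that Section 3.1 describes on top, and not the recursive algorithm of Section 3.2):

```python
def z_index(x: int, y: int, bits: int = 16) -> int:
    """Plain Z-order (Morton) index: interleave the bits of x and y so
    that pixels close in 2D stay close in the 1D file layout. Each
    resolution level of an HZ hierarchy is stored in this order."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)        # even bits come from x
        z |= ((y >> i) & 1) << (2 * i + 1)    # odd bits come from y
    return z
```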
The locality of our data access provides graceful performance degradation even in extreme conditions. In particular, we demonstrate accessing a dataset of roughly a terabyte in size by simply mounting a remote file system over an encrypted VPN channel via a wireless connection. Even in normal running conditions, we have found that the I/O overhead caused by using a tiled hierarchy increased the running time by 39%-67%. These numbers reflect the theoretical bound of 1/3 overhead, made worse by the inability to constrain real queries to align perfectly with the boundaries of a quadtree. The effect of this overhead is detrimental to the scalability of the system under more difficult running conditions such as the one mentioned above. Moreover, HZ easily handles partially converted data, as we show in one portion of the accompanying video for the editing of the Salt Lake City panorama. In a tiled scheme, the entire hierarchy may need to be recomputed as new data is added.

5.2.3 Interactive Preview and Out-of-Core Solver Results

We demonstrate the scalability and interactivity of our approach on several applications, using a number of images ranging from megapixels to hundreds of gigapixels in size. To further illustrate the responsiveness of our system, the accompanying video shows screen captures of live demonstrations. To highlight particular details and validate the approach, the figures in this section show previews and close-ups from our interactive system alongside the results of our full out-of-core progressive solver. We also provide running times of our full out-of-core solver compared with the best current method, streaming multigrid [89], which we have verified to use the same gradient information. All timings and demos were performed on a 64-bit 2.67 GHz Intel Quad Core desktop with 8 gigabytes of memory. All streaming multigrid timings were computed from code provided by the authors and include the timing for the gradient preprocess along with the timing to produce a solution. Our simple framework provides the illusion of a fully solved Poisson system at interactive frame rates and under continuous parameter changes with only a simple GL texture for display and no special hardware acceleration; therefore, our code is platform independent. Our simple progressive out-of-core solver produces robust solutions with run times that rival [89]. Unlike the previous method, our out-of-core solver does not use hardware acceleration and did not undergo heavy code optimization to achieve the following runtimes. The solver is also sequential and uses no threading to accelerate the computation. If further optimization of the run times is desired, there is nothing in our system to prevent the addition of these acceleration techniques. Different from other out-of-core methods, we do not rely on large external memory data structures and we do not need to precompute gradients for the entire image. For the Salt Lake City panorama, for example, the streaming multigrid method [89] creates 75.2 gigabytes of auxiliary information for a 7.9 gigabyte input image. While disk space is generally assumed to be plentiful, such an explosion in disk space is unsustainable for images hundreds of gigapixels in size.
The collection of satellite imagery we use in our video is more than one terabyte in size and would, therefore, require more than 9.5 terabytes of temporary storage. The Edinburgh example is 25 images with resolution 16,950 x 2,956 (50 megapixel). At launch, our system performs a seamless-stitch Poisson solve of a global 0.7 megapixel image in 1.26 seconds using our direct analytical solve; see Figure 5.4 (a). From this point on, the system can pan and zoom interactively as if the full solution were already available. Our local adaptive refinement gives a solution that is visually equivalent to a solution of the entire system; see Figure 5.4 (c, d, and e). In the accompanying video, we demonstrate interactive editing and solving of the Poisson system after the repeated user-selected replacement of pixels of a particular color. We also perform a seamless clone of a 2000 x 1600 airplane onto Edinburgh's cloudy sky. The plane is animated along a linear path across the panorama. As evident in the video, our framework shows the entire sequence in real time. We also demonstrate similar interactive editing with the Redrock panorama (data courtesy of Aseem Agarwala): nine images, 19,588 x 4,457; 87 megapixels.

Figure 5.4: The Edinburgh panorama, 16,950 x 2,956 pixels. (a) Our coarse solution computed at a resolution of 0.7 megapixels; (b) the same panorama solved at full resolution with our progressive global solver, scaled to approximately 12 megapixels for publication; (c) a detail view of a particularly bad seam from the original panorama; (d) the problem area previewed using our adaptive local refinement; (e) the problem area solved at full resolution using our global solver in 3.48 minutes.

Given this initial coarse solution, our method can produce a full solution of Edinburgh, see Figure 5.4 (b), in 3.48 minutes. The streaming multigrid method requires 3.52 minutes. Figure 5.5 (a) shows the convergence and error for our method and streaming multigrid when compared to the ideal direct solution. The Salt Lake City example is 611 images with a resolution of 126,826 x 29,633 (3.27 gigapixel). This significantly larger example is a panorama captured with a simple camera mounted on a GigaPan robot [63]. To maximize individual image quality, the pictures were taken with automatic exposure times, which inherently increases the color differences between images that need to be corrected by the Poisson solver. An initial coarse preview of 0.87 megapixels is computed by our direct analytical solver in 2.07 seconds. Figure 5.6 shows the original set of images (a), the panorama that our system stitches in real time (b), the global solution provided by our out-of-core solver (c), and the difference image between the interactive preview and the final solution at the coarse resolution (d). There are slight deviations at some of the more challenging seams, but overall there is negligible visible difference. Our local adaptive preview mimics the global solution well, as shown in Figure 5.7. To test the accuracy of the methods, we have run a full analytical Poisson solver on a 485 megapixel subset of the panorama on an HPC computer. Figures 5.8 (a) and (b) show how close our out-of-core solution comes to the exact analytical solution. Figure 5.8 (c) shows that the multigrid method has yet to converge to an acceptable solution given an equivalent amount of running time.
All solutions were computed using the map shown in Figure 5.8 (g). Allowing the solver to harmonically fill the panorama boundary (Figure 5.8 (d-f)) increases the memory usage of the method.

Figure 5.5: The RMS error compared to the ideal analytical solution as we increase iterations for both methods: (a) streaming multigrid and (b) our progressive out-of-core solver, each on the SLC portion, plotting time (s) and RMS error against iterations. Streaming multigrid has better convergence and less error for the Edinburgh example (a), though our method remains stable for the larger Salt Lake City panorama (b). Notice that every plot has been scaled independently to best illustrate the convergence trends of each method.

Figure 5.6: Panorama of Salt Lake City, 3.27 gigapixel, obtained by stitching 611 images. (a) Mosaic of the original images. (b) Our solution computed at 0.9 megapixel resolution. (c) The full solution provided by our global solver. (d) The difference image between our preview and the full solution at the preview resolution. Both (a) and (c) have been scaled for publication to approximately 12.9 megapixels.

Figure 5.7: A comparison of our adaptive local preview on a portion of the Salt Lake City panorama at one half of the full resolution: (a) the original mosaic, (b) our adaptive preview, (c) the full solution from our global solver, and (d) the difference image between the adaptive preview and the full solution.

Figure 5.8: A comparison of our system with the best known out-of-core method [Kazhdan and Hoppe 2008] and a full analytical solution on a portion of the Salt Lake City panorama, 21,201 x 24,001 pixels, 485 megapixel: (a) the full analytical solution; (b) our solution computed in 28.1 minutes; (c) the solution from [Kazhdan and Hoppe 2008] computed in 24.9 minutes; (d) the analytical solution where the solver is allowed to harmonically fill the boundary; (e) our solution with harmonic fill; (f) the solution from [Kazhdan and Hoppe 2008] with harmonic fill; (g) the map image used by all solvers to construct the panorama, where red indicates the image that provides the pixel color and white denotes the panorama boundary.

The Sierpinski Sponge example has resolution 128k x 128k (16 gigapixel). We have tested the tone mapping application on a synthetic high dynamic range image generated with MegaPOV [118]. In this image we use a partially refined model of a Sierpinski Sponge to create high variations in level-of-detail. Such details can be completely hidden in the dark areas under projected shadows. We follow the approach introduced by Fattal [55] to reconstruct the information hidden in the dark regions. To validate the approach, we ran a typical HDR test image, the Belgium House, progressively refined from a 16 x 12 coarse solution. Even with such a coarse initial solution, we achieve results very close to the exact solution (see Figure 5.9 (c) and (d)). Figure 5.9 shows the original sponge model (a) and the processed version (b), where all the details under the shadows have been recovered.
Figure 5.9: Application of our method to HDR image compression: (a) original synthetic HDR image of an adaptively refined Sierpinski sponge generated with Povray; (b) tone mapped image with recovery of detailed information previously hidden in the shadows; (c) the Belgium House image solved using our coarse-to-fine method with an initial 16 x 12 coarse solution (α = 0.01, β = 0.7, compression coefficient 0.5); (d) the direct analytical solution. Image courtesy of Raanan Fattal.

The satellite example contains a Blue Marble background image that is 3.7 gigapixel and imagery of Atlanta and other cities that is well over 100 gigapixels. To demonstrate the scalability of our system, we have run the seamless cloning algorithm for entire cities over a variety of realistic backgrounds from NASA's Blue Marble Collection [125] (see Figure 5.10). We show how a user can take advantage of these capabilities to achieve artistic effects and create virtual worlds from real data. We also create a dynamic environment by animating the background world map over 12 months and concurrently using the Poisson solver to show how the appearance of a city would change across the seasons.

Figure 5.10: Satellite imagery collection with a background given by a 3.7 gigapixel image from NASA's Blue Marble Collection. The Progressive Poisson solver allows the application of the seamless cloning method to two copies of the city of Atlanta, each of 116 gigapixels. An artist can interactively place a copy of Atlanta under shallow water and recreate the lost city of Atlantis. Data courtesy of the U.S. Geological Survey and NASA's Earth Observatory.

5.3 Parallel Distributed Gradient Domain Editing

In the following, we provide details of our parallel Progressive Poisson algorithm and its MPI implementation. This new algorithm reduces the time to compute a full resolution gradient domain solution from hours, in the case of a single out-of-core solver, to minutes when run on a distributed cluster.

5.3.1 Parallel Solver

Commonly, large images are stored as tiles, which gives one an underlying structure to divide an image among the nodes/processors of a distributed solver. Tile-based distributed solvers have been shown to work well when only local trends are present, but seamless stitching commonly contains large scale trends, for which a naive tile-based approach will provide poor results. The addition of the Progressive Poisson method's coarse upsampling allows for a simple, tile-based parallel solver that can account for large trends. Our algorithm works in two phases: the first phase performs the progressive upsample of a precomputed coarse solution for each tile; the second phase solves for a smooth image on tiles that significantly overlap the solution tiles from the first phase. In this way, the second phase smooths any seams not captured, or even introduced, by the first phase, producing a complete, seamless image.

5.3.1.1 Data distribution as tiles. Although a tile-based approach leverages a common image storage format, it is not typically how methods are designed to handle seamless stitching of large panoramas. For instance, methods like streaming multigrid [89, 90] often assume precomputed gradients for the whole image. Our system is designed to take tiles directly as input and therefore must be able to handle the gradient computation on-the-fly. An important and often undocumented component of panorama stitching is the map or label image.
Given an ordered set of images which compose the panorama, the map image gives the correspondence of a pixel location in the overall panorama to the smaller image that supplies its color. This map file is necessary to determine the difference between actual gradients and those due to seams. The map also defines the boundaries of the panorama, which are commonly irregular. This file, along with each individual image that composes the mosaic, is needed for a traditional out-of-core scheme [89, 149] for gradient computation. If the gradient across the seams is assumed to be zero, which is a common technique we adopt for this solver, each tile can be composited in advance and the map file is only needed to denote image seams or boundary. As noted above, this composited tile is often already provided if used in a traditional large image system. The map file can then be encoded as an extra channel of color information, typically the alpha channel. For mosaics of many hundreds of images, such as the examples provided in this dissertation, we cannot encode an index for each image in a single byte of data. However, since in practice a tile is very unlikely to overlap more than 256 individual images, each image is given a unique 0-255 number on a per-tile basis.

We have chosen an overlap of 50% in both dimensions for the second phase windowing scheme of the parallel solver for simplicity of implementation. Each window is composed of a 2 x 2 collection of tiles. To avoid undefined windows in the second phase, we add a symbolic padding of one row/column of tiles to all sides of the image, which the solver regards as pure boundary. Figure 5.11 gives an example of a tile layout. The overlapping window size used for our testing was 1024 x 1024 pixels (assuming 512 x 512 tiles), which we found to be a good compromise between a low memory footprint and image coverage. Each node receives a partition of windows equivalent to a contiguous subimage, with no overlap necessary between nodes during the same phase. Data can be distributed evenly across all nodes in the case of a homogeneous distributed system, or dependent on weights due to available resources in the case of heterogeneous hardware. We provide a test case for a heterogeneous system in Section 5.3.2.3.

5.3.1.2 Coarse solution. As a first step, the first phase of our solver upsamples a 1-2 megapixel coarse solution via bilinear interpolation. Much like the Progressive Poisson method [149], each node computes this solution in just a few seconds using a direct FFT solver on a coarsely sampled version of our large image. In tiled hierarchies, this coarse image is typically already present and can be encoded with the map information in much the same way as the tiles.

5.3.1.3 First phase: progressive solution. This phase computes a Progressive Poisson solution for each window, which is composed of tiles read off of a distributed file system.

Figure 5.11: Our tile-based approach: (a) An input image is divided into equally spaced tiles. In the first phase, after a symbolic padding by a column and row in all dimensions, a solver is run on a window denoted by a collection of four labeled tiles. Data are sent and collected for the next phase to create new data windows with a 50% overlap. (b) An example tile layout for the Fall Panorama example.
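The per-tile label encoding described in Section 5.3.1.1 might look like the following minimal sketch (our own illustration, assuming NumPy arrays; the function name and the derivation of the local ids are ours):

```python
import numpy as np

def encode_map_in_alpha(tile_rgb, tile_map):
    """Store the label map in the tile's alpha channel. tile_map holds
    the globally unique source-image id per pixel; ids are remapped to
    tile-local 0-255 labels, assuming (as noted in the text) that no
    single tile overlaps more than 256 source images."""
    ids = np.unique(tile_map)
    assert len(ids) <= 256, "tile overlaps more than 256 source images"
    alpha = np.zeros(tile_map.shape, dtype=np.uint8)
    for local, gid in enumerate(ids):         # per-tile 0-255 numbering
        alpha[tile_map == gid] = local
    return np.dstack((tile_rgb, alpha))       # RGBA tile ready for storage
```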
To progressively solve a window, an image hierarchy is necessary. For our implementation, a standard power-of-two image pyramid was used. As a first step, the solver upsamples the solution to a finer resolution in the image pyramid using a coarse solution image and the original pixel values. An iterative solver is then run for several iterations to smooth this upsample, using the original pixel gradients as the guiding field. This process is repeated down the image hierarchy until the full resolution is reached. The solver is considered to have converged at a resolution when the $L^2$ norm falls below $10^{-3}$, which is based on the range of byte color data. From our testing, we have found that SOR gives both good running times and low memory consumption and is therefore our default solver. As noted above, each window is logically composed of four tiles, which are computed and saved in memory for the next phase as floating point color data. This leads to 12 bytes/pixel (three floating point color channels) to transfer between phases. Given the data distribution, one node may process many windows. If this is the case, only the tiles that border a node's domain are prepared to be transferred to another node, thereby keeping data communication between phases to a relatively small zone.

5.3.1.4 Second phase: overlap solution. The second phase gathers the four tiles (both solution and original) that make up an overlapping window. After the data are gathered, the gradients are computed from the original pixel values, and an iterative solver (SOR) is run after being initialized with the solutions from the first phase. The iterative solver is constrained to work only on interior pixels to prevent this phase from introducing new seams at the window boundary. Technically, there may be errors at the pixels around the midpoints of the boundary edges of these windows, though we have not encountered this in practice. Again, this solver is run until convergence as given by the $L^2$ norm. Note that even though the tile gradients are computed in the first phase, we have chosen to recompute them on the fly in the second phase. Passing the gradients would cost at least an additional 12 bytes/pixel of overhead. As node counts increase, data transfer and communication become a significant bottleneck in most distributed schemes; therefore, we chose to pay the cost of increased computation and of reading the less expensive byte image data from the distributed file system instead of the costly transfer.

5.3.1.5 Parallel implementation details. Each node has one master thread which coordinates all processing and communication. The core component of this thread is a priority queue of windows and tiles to be processed. At launch, this queue is initialized by a separate seeding thread with the initial domain of windows to be solved in the first phase. Because the seeding of the queue is separated from the main thread, the main thread can begin processing windows immediately. Each window is given a first phase id, which is the window's row and column location in the subimage to be processed by a node.
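Under our reading of Figure 5.11, the window ids of both phases could be enumerated as in the following sketch (the tile indices, the placement of the symbolic padding at index -1, and the function name are our own illustrative choices):

```python
def phase_windows(tile_rows: int, tile_cols: int):
    """Enumerate the 2 x 2 tile windows of both phases for a grid of
    tile_rows x tile_cols tiles padded by one symbolic boundary tile
    (index -1) on all sides; assumes an even tile count per dimension.
    Phase-2 windows are shifted by one tile in each dimension, giving
    the 50% overlap with the phase-1 solutions."""
    phase1 = [(r, c) for r in range(-1, tile_rows, 2)
                     for c in range(-1, tile_cols, 2)]
    phase2 = [(r, c) for r in range(0, tile_rows - 1, 2)
                     for c in range(0, tile_cols - 1, 2)]
    return phase1, phase2  # each (r, c) is a window's top-left tile
```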
Communication between nodes need only be one-way in our system; therefore, we have chosen for communication to be "upstream" between nodes, i.e., toward the nodes operating on a subimage with a horizontal or vertical location greater than that of the current node. In order to avoid starvation in the second phase, the queue is loaded with windows in reverse order in terms of the tile id. Figure 5.12 gives an example of the traversal and communication. All initially seeded windows are given equal low priority in the queue; in essence, the initial queue operates much like a first-in-first-out (FIFO) queue. As windows are removed from the queue, the main thread launches a progressive solver thread which is handed off to an intra-node dynamic scheduler. Our implementation uses a HyperFlow [170] scheduler to execute the solver on all available cores. HyperFlow has been shown to efficiently schedule the execution of workflows on multicore systems and is therefore a perfect solution for our intra-node scheduling. In all, there are two distinct sequential stages in each phase: (1) the loading of the tile data and the computation of the image gradient, and (2) the progressive solution. This flow information allows HyperFlow to exploit data, task, and pipeline parallelism to maximize throughput. After a solution is computed, the progressive solver thread partitions the window into the tiles that comprise it. This allows the second phase to recombine the tiles needed for the 50% overlap window. All four tiles are loaded into the queue with high priority. If the main thread removes a tile (as opposed to a window) from the queue and the tile is needed by another node, the main thread immediately sends the data asynchronously to the proper node. Otherwise, if the node itself needs the tile for phase two, the second phase id of the window which needs the tile is computed and hashed with a two-dimensional hash function the same size as the window domain for the second phase. If all four tiles for a given second phase window have been hashed, the main thread knows the second phase window is ready and immediately passes it to a solver thread for processing. If the main thread receives a solved tile from another node, this tile is also immediately hashed.

Figure 5.12: Windows are distributed as evenly as possible across all nodes in the distributed system. Windows assigned to a specific node are denoted by color above. Given the overlap scheme, data transfer only needs to occur one-way, denoted by the red arrows and boundary above. To avoid starvation between phases and to hide as much data transfer as possible, windows are processed in inverse order (white arrows) and the tiles needed by other nodes are transferred immediately.
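The tile hashing just described might be sketched as follows (a minimal illustration; `make_tile_collector`, `solve`, and the window-id convention are our own hypothetical names, matching the enumeration sketch above):

```python
from collections import defaultdict
from typing import Callable, Dict, Tuple

Tile = Tuple[int, int]   # (row, col) of a tile in the grid

def make_tile_collector(solve: Callable[[Tile, Dict[Tile, object]], None]):
    """Master-thread bookkeeping for phase two: hash each incoming
    solved tile into the phase-2 window that needs it, and hand the
    window to a solver thread (via `solve`) once all four tiles of
    its 2 x 2 block have arrived."""
    pending: Dict[Tile, Dict[Tile, object]] = defaultdict(dict)

    def on_tile(tile: Tile, data: object) -> None:
        r, c = tile
        window = (r // 2 * 2, c // 2 * 2)   # top-left tile of its window
        pending[window][tile] = data
        if len(pending[window]) == 4:       # all 2 x 2 tiles gathered
            solve(window, pending.pop(window))

    return on_tile
```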
5.3.2 Results

To demonstrate the scalability and adaptability of the approach, we have tested our implementation using two panorama datasets, gigapixels in size. To illustrate the portability of the system, we have also measured its running times and scalability on two distributed systems. Our main system, the NVIDIA Center of Excellence cluster in the Scientific Computing and Imaging Institute at the University of Utah, consists of 60 active nodes with 2.67 GHz Xeon X5550 processors (8 cores), 24 GB of RAM per node, and 750 GB of local scratch disk space. The second system, the Longhorn visualization cluster at the Texas Advanced Computing Center at the University of Texas at Austin, consists of 256 nodes (of which 128 were available for our tests) with 2.5 GHz Nehalem processors (8 cores), 48 GB of RAM per node, and 73 GB of local scratch disk space. Weak and strong scalability tests were performed on both systems. Given the proven scalability of HyperFlow on one node, we have tested the scalability of the MPI implementation from 2-60 and 2-128 nodes on the NVIDIA and Longhorn clusters, respectively. Timings are taken as the best over several runs to discount external effects on the cluster from shared resources such as the distributed file system. The datasets used for testing were:

• Fall Panorama: 126,826 x 29,633, 3.27 gigapixel. When tiled, this dataset is composed of 124 x 29 windows of size 1024². See Figure 5.13 for image results from a 480-core test run on the NVIDIA cluster.

• Winter Panorama: 92,570 x 28,600, 2.65 gigapixel. When tiled, this dataset is composed of 91 x 28 windows of size 1024². See Figure 5.14 for image results from a 480-core test run on the NVIDIA cluster.

5.3.2.1 NVIDIA cluster. To show the MPI scalability of our framework and implementation, strong and weak scaling tests were performed for 2-60 nodes. As shown in Tables 5.1 and 5.2, both datasets scale close to ideal and with high efficiency under strong scaling.

Figure 5.13: Fall Panorama, 126,826 x 29,633, 3.27 gigapixel. (a) The panorama before seamless blending and (b) the result of the parallel Poisson solver run on 480 cores with 124 x 29 windows, computed in 5.88 minutes.

Figure 5.14: Winter Panorama, 92,570 x 28,600, 2.65 gigapixel. (a) The result of the parallel Poisson solver run on 480 cores with 91 x 28 windows, computed in 6.02 minutes, (b) the panorama before seamless blending, and (c) the coarse panorama solution.

Table 5.1: Strong scaling results for the Fall Panorama run on the NVIDIA cluster from 2-60 nodes, up to a total of 480 cores. Overhead (O/H) due to MPI communication and I/O is also provided, along with its percentage of actual running time. The Fall Panorama, due to its larger size, begins to lose efficiency at around 32 nodes, when I/O overhead begins to dominate. Even with this overhead, the efficiency (Eff.) remains acceptable.

Strong Scaling - Fall Panorama - NVIDIA cluster
Nodes   Cores   Ideal (m)   Actual (m)   Eff. %   O/H (m)   % O/H
2       16      79.35       79.35        100.0    18.80     23.7
4       32      39.68       40.08        97.1     9.05      22.2
8       64      19.84       20.83        95.2     7.28      35.0
16      128     9.92        11.43        78.9     6.50      51.7
32      256     4.96        6.20         53.8     6.20      67.3
48      384     3.31        6.40         51.7     6.40      100.0
60      480     2.65        5.88         45.0     5.88      100.0

Table 5.2: Strong scaling results for the Winter Panorama run on the NVIDIA cluster from 2-60 nodes, up to a total of 480 cores. Overhead (O/H) due to MPI communication and I/O is also provided, along with its percentage of actual running time. For the Winter Panorama, the I/O overhead does not affect performance up to 60 nodes, and the implementation maintains efficiency (Eff.) throughout all of our runs.
Strong Scaling - Winter Panorama - NVIDIA cluster
Nodes   Cores   Ideal (m)   Actual (m)   Eff. %   O/H (m)   % O/H
2       16      128.87      128.87       100.0    8.63      6.7
4       32      64.43       77.68        82.9     4.70      6.1
8       64      32.22       40.63        79.3     4.28      10.5
16      128     16.11       21.17        76.1     4.17      19.7
32      256     8.05        10.88        74.0     4.08      37.5
48      384     5.37        6.98         76.9     4.10      58.7
60      480     4.30        6.02         71.4     4.00      66.5

The Fall Panorama, due to its larger size, begins to lose efficiency at around 32 nodes, when I/O overhead begins to dominate; even with this overhead, the efficiency remains acceptable. For the Winter Panorama, the I/O overhead does not affect performance up to 60 nodes, and the implementation maintains efficiency throughout the test. Weak scaling tests were performed using a subimage of the Fall Panorama dataset; see Table 5.3 for the results. As the number of cores increases, so does the image resolution to be solved. The subimage was expanded from the center of the full image, and the iterations of the solver for all windows were locked at 1000 during testing to ensure that no variation is due to slower converging image areas. As the table shows, our implementation exhibits good weak scaling efficiency even for 60 nodes with 480 cores. In all, we have produced a gradient domain solution for a dataset which the best known methods in previous work [89, 149] took hours to compute.

Table 5.3: Weak scaling tests run on the NVIDIA cluster for the Fall Panorama dataset. As the number of cores increases, so does the image resolution to be solved. The image was expanded from the center of the full image. Iterations of the solver for all windows were locked at 1000 for testing to ensure that no variation is due to slower converging image areas. As is shown, our implementation maintains good efficiency even when running on the maximum number of cores.

Weak Scaling - NVIDIA cluster
Nodes   Cores   Size (MP)   Time (min.)   Efficiency
2       16      100.66      5.55          100.00%
4       32      201.33      5.55          100.00%
8       64      402.65      5.53          100.30%
16      128     805.31      5.68          97.65%
32      256     1610.61     5.77          96.24%
60      480     3019.90     6.57          84.52%

5.3.2.2 Longhorn cluster. To show the portability and MPI scalability of our framework and implementation, strong and weak scaling tests were performed on the largest dataset (Fall Panorama) on a second cluster. The strong scaling tests were performed from 2-128 nodes, and the weak scaling tests, limited by the size of the image, were performed from 2-64 nodes. As shown in Table 5.4, our implementation maintains very good efficiency and timings for our strong scaling test up to the full 1024 cores available on the system.

Table 5.4: To demonstrate the portability of our implementation, we have run strong scalability tests for the Fall Panorama on the Longhorn cluster from 2-128 nodes, up to a total of 1024 cores. As the numbers show, we maintain good scalability and efficiency even when running on all available nodes and cores.
Strong Scaling - Fall Panorama - Longhorn
Nodes   Cores   Ideal (m)   Actual (m)   Efficiency
2       16      84.07       84.07        100%
4       32      42.03       43.18        97%
8       64      21.02       21.85        96%
16      128     10.51       12.08        87%
32      256     5.25        6.93         76%
64      512     2.63        3.89         68%
128     1024    1.31        2.73         48%

Much like on the NVIDIA cluster, weak scaling tests were performed on a portion of the Fall Panorama with the iterations of the solver locked at 1000. To ensure that each node received a reasonably sized subimage to solve, the tests were limited to 64 nodes. Table 5.5 demonstrates our implementation's ability to weak scale on this cluster, maintaining good efficiency for up to 512 cores.

Table 5.5: Weak scaling tests run on the Longhorn cluster for the Fall Panorama dataset.

Weak Scaling - Longhorn cluster
Nodes   Cores   Size (MP)   Time (min.)   Efficiency
2       16      75.5        5.50          100.00%
4       32      151         6.13          89.67%
8       64      302         6.15          89.43%
16      128     604         6.15          89.43%
32      256     1208        6.13          89.67%
64      512     2416        6.15          89.43%

5.3.2.3 Heterogeneous cluster. As a final test of portability and adaptability, we presented our implementation with a simulated heterogeneous distributed system. Our parallel framework provides the ability to assign weights to nodes; these weights are typically even, resulting in an even distribution of windows across all nodes. For this example, a simple weighting scheme can easily load-balance a mixed network, giving the nodes with more resources more windows to compute. Table 5.6 gives an example mixed system of two 8-core nodes, four 4-core nodes, and eight 2-core nodes; in all, this system has 48 available cores. The weights for our framework are simply the number of cores available in each node. This network was simulated using the NVIDIA cluster by overriding HyperFlow's knowledge of available resources with our desired properties. While this is not a perfect simulation, since the main thread handling MPI communication would not be limited to residing on the desired cores, the strong scaling tests show that even with evenly distributed data on 8-16 nodes the implementation is not yet I/O bound; therefore, we should still have a good approximation of a real, limited system. Table 5.6 details the window distribution and timings for the Fall Panorama for all nodes in this test. As is shown, we maintain good load balancing given proper node weighting when dealing with heterogeneous systems. The maximum runtime of 32.70 minutes for this 48-core system is on par with the runtimes for the 32-core (40.08 minutes) and 64-core (20.83 minutes) strong scaling results.

Table 5.6: Our simulated heterogeneous system. This test example is a simulated mixed system of 2 8-core nodes, 4 4-core nodes, and 8 2-core nodes. The weights for our framework are the number of cores available in each node. The timings and window distributions are for the Fall Panorama dataset. With the proper weightings, our framework distributes windows proportionally based on the performance of the system. The maximum runtime of 32.70 minutes for this 48-core system is on par with the timings for the 32-core (40.08 minutes) and 64-core (20.83 minutes) runs from the strong scaling test.
5.4 Gradient Domain Editing on the Cloud

The parallel algorithm outlined in the previous section provides a full resolution gradient domain solution for massive images in only a few minutes of processing time. In this section, we explore redesigning this technique as a cloud-based application. For this work, we chose to target the MapReduce framework and its open source implementation, Hadoop. MapReduce and Hadoop have emerged in recent years as popular and widely supported cloud technologies. Therefore, they are the logical targets for this work.

5.4.1 MapReduce and Hadoop

This subsection briefly reviews some of the fundamentals of the MapReduce framework and how to design graphics algorithms to work well with Hadoop and Hadoop's Distributed File System (HDFS). We provide a high level view to justify the design decisions outlined in the next section. The map function operates on key/value pairs, producing one or more key/value pairs for the reduce phase. The reduce function is a per-key operation that works on the output of the mapper (see Figure 5.15). Hadoop's scheduler will interleave their execution as data become available. Currently, Hadoop does not support job chaining; therefore, any algorithm that requires two passes will likely require two separate MapReduce jobs. While this will likely change in the future, at this time minimizing the number of passes is an important consideration, since the overhead incurred by launching new jobs in Hadoop is significant.

Figure 5.15: The two phases of a MapReduce job. In the figure, three map tasks produce key/value pairs that are hashed into two bins corresponding to the two reduce tasks in the job. When the data are ready, the reducers grab their needed data from the mappers' local disks.

In Section 5.4.2 we detail our algorithm, which requires only one pass. Hadoop has been optimized to handle large files and to process/transfer small chunks of data. For many applications, including the one outlined in the next section, understanding Hadoop's data flow is vital for an efficient implementation, much as random memory access must be considered on a GPU.

5.4.1.1 Input. The Hadoop distributed file system stripes data across all available nodes on a per-block basis, with replication, to guarantee a certain level of locality for the map phase and to be able to handle system faults. When a job is launched, Hadoop will split the input data evenly across all map instances. For our example, allowing Hadoop to arbitrarily split the input data could result in fragmented images. Therefore, the system allows the developer to specialize the function reading the input, which we use to constrain the split to occur only at image boundaries.

5.4.1.2 MapReduce transfer. During execution, each mapper hashes the key of each key/value pair into bins. The number of bins equals the number of reducers (see Figure 5.15), and each bin is also sorted by key. The map first stores and sorts the data in a buffer in memory but will spill to disk if this is exceeded (the default buffer size is 512 MB). This spill can lead to poor mapper performance and should be avoided if possible.
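At its core, this binning is a hash partition followed by a per-bin sort. A minimal Python sketch of the idea (an illustration only, not Hadoop's implementation):

    # Hash each emitted key/value pair into one bin per reducer and keep
    # each bin sorted by key, mimicking the map-side shuffle described above.
    from collections import defaultdict

    def partition(pairs, num_reducers):
        bins = defaultdict(list)
        for key, value in pairs:
            bins[hash(key) % num_reducers].append((key, value))
        return {r: sorted(b) for r, b in bins.items()}

    pairs = [(3, "tile-a"), (1, "tile-b"), (3, "tile-c"), (2, "tile-d")]
    print(partition(pairs, num_reducers=2))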
After a mapper completes execution, the intermediate data are stored on a node's local disk. Each mapper informs the control node that its data are finished and ready for the reducers. Since Hadoop assumes that any mapper is equally likely to produce any key, there is no assumed locality for the reducers. Each reducer must pull its data from multiple mappers in the cluster (see Figures 5.15 and 5.16). If a reducer must grab key/value pairs from many local disks on the cluster (possibly an N-to-N mapping), this phase can have a drastic effect on performance.

Job coordination is handled with a master/slave model in which the control node, called the Job Tracker, distributes and manages the map and reduce tasks. When a program is launched, the Job Tracker initiates Task Trackers on nodes in the cluster. The Job Tracker then schedules tasks on the Task Trackers, maintaining a communication link to handle system faults (see Figure 5.16).

Figure 5.16: A diagram of the job control and data flow for one Task Tracker in a Hadoop job. The dotted, red arrows indicate data flow over the network; dashed arrows represent communication; the blue arrow indicates a local data write; and the black arrows indicate an action taken by the node.

5.4.2 MapReduce for Gradient Domain

Commonly, large images are stored as tiles, which gives us the underlying structure for our scheme. However, a tile-based approach by itself would not account for the large scale trends common in panoramas (see Figure 5.17). Therefore, we add upsampling of a coarse solution, similar to the approach used in Summa et al. [149], to capture these trends. Our algorithm works in two phases: the first phase performs the upsample of a precomputed coarse solution and solves each tile to produce a smooth solution over the extent of the tile. The second phase solves for a smooth image on tiles that significantly overlap the smoothed tiles from the first phase. In this way, the second phase smooths any seams not captured, or even introduced, by the first phase solvers. This algorithm can be implemented as a single MapReduce job in Hadoop.

5.4.2.1 Tiles. We have chosen an overlap of 50% in both dimensions for the second phase due to the simplicity of implementation, although Summa et al. [149] have shown that a good solution can be found with much less. To accomplish this overlap easily, we divide the data into tiles 1/4 of the proper size. Figure 5.18 shows the tile layout for our test images. Each phase operates on four of these smaller tiles, which are combined to construct the larger tiles. To avoid undefined tiles in the second phase, we add a symbolic padding of one row/column to all sides of the image. Figure 5.19 gives an example of a tile layout.

An important component of panorama stitching is a map file, which gives the correspondence from a pixel location in the overall panorama to the smaller image that supplies the color. This map file is necessary to determine the difference between actual gradients and those due to seams. The map also defines the boundaries of the panorama, which are commonly irregular and do not usually follow the actual image boundary. The panorama boundary is a seam we would like to preserve; a small sketch of the role the map plays in the gradient computation follows below.
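A minimal Python illustration of that role (the single-byte labels and array layout here are hypothetical, chosen only to show how the map separates true gradients from seam and boundary gradients):

    # Hypothetical per-pixel label map: 0 marks pixels outside the panorama,
    # any other value identifies the source image supplying the color.
    BOUNDARY = 0

    def gradient_x(pixels, labels, x, y):
        """Horizontal gradient, zeroed when it would cross a seam or boundary."""
        if labels[y][x] == BOUNDARY or labels[y][x + 1] == BOUNDARY:
            return 0.0  # preserve the panorama boundary
        if labels[y][x] != labels[y][x + 1]:
            return 0.0  # seam between two source images: not a true gradient
        return float(pixels[y][x + 1]) - float(pixels[y][x])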
We encode the map file into each individual tile as an alpha channel. For images such as the Salt Lake City example, we cannot encode an index for each image in a byte of data. However, the map is only used to denote whether two pixels are from the same source image or whether a pixel is on the boundary; therefore, a byte is more than enough to encode this correspondence. The symbolic padding is encoded as boundary, and images that are not evenly divisible by our tile size are also padded with boundary. The overlapping window size used for our tests was 1024 x 1024 pixels, which we found to be a good compromise between a low memory footprint and image coverage.

Figure 5.17: Although the result is a smooth image, without coarse upsampling the final image will fail to account for large trends that span beyond a single overlap, which can lead to unwanted shifts in color. Notice the vertical banding denoted by the red arrows.

Figure 5.18: The 512 x 512 tiles used in our Edinburgh (a), Redrock (b), and Salt Lake City (c) examples.

Figure 5.19: Our tile-based approach: an input image is divided into equally spaced tiles. In the map phase, after a symbolic padding by a column and row in all dimensions, a solver is run on a collection of four tiles labeled by the numbers in the figure. After the mapper finishes, it assigns a key such that each reducer runs its solver on a collection of four tiles that have a 50% overlap with the previous solutions.

5.4.2.2 Coarse solution. As a first step, the first phase of our solver upsamples a 1-2 megapixel coarse solution via bilinear interpolation. Much like the method from Summa et al. [149], we precompute the coarse solution in just a few seconds using a direct FFT solver on a coarsely sampled version of our large image. In tiled hierarchies, this coarse image is typically already present. In Hadoop, this coarse solution is sent along with the MapReduce job when launched. The Job Tracker stores this image on the distributed file system for Task Trackers to pull and store locally.

5.4.2.3 First (map) phase. After loading/combining the smaller tiles and performing the upsample, the first phase runs an iterative solver initialized with the upsampled pixel colors. From our testing, we have found that SOR gives good running times and low memory consumption; it is therefore our default solver. The solver is considered to have converged when the L2 norm falls below 10^-3, which is based on the range of byte data. After a smooth image is computed, the solution is split back into its four smaller tiles and sent to the next phase as byte data. Some precision is lost in the solution data by this truncation of bits, which can cause slower convergence in the next phase. However, in many distributed systems the bottleneck is data transfer; therefore, it is preferable to use smaller data at the cost of increased computation.
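For concreteness, one sweep of such an SOR solve might look as follows. This is a minimal Python sketch; the relaxation factor and the convergence bookkeeping are our assumptions, not the production solver:

    # One in-place SOR sweep for the discrete Poisson equation
    # laplacian(u) = b over a tile's interior pixels.
    def sor_sweep(u, b, omega=1.9):
        """Returns the squared L2 norm of the corrections applied."""
        correction_sq = 0.0
        for y in range(1, len(u) - 1):
            for x in range(1, len(u[0]) - 1):
                gauss_seidel = 0.25 * (u[y - 1][x] + u[y + 1][x]
                                       + u[y][x - 1] + u[y][x + 1] - b[y][x])
                delta = omega * (gauss_seidel - u[y][x])
                u[y][x] += delta
                correction_sq += delta * delta
        return correction_sq
    # Sweeps repeat until the norm drops below the 10^-3 threshold.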
For the Hadoop implementation, this first phase of our algorithm fits well with Hadoop's map phase. Each mapper emits a key/value pair, where the value is the data from a small tile and the key is computed in such a way that we achieve the desired 50% overlap in the next phase. The key is computed as a row/column pair in the space of the larger tiles. This key is stored in 4 bytes before being emitted: the high word contains the row and the low word contains the column. For a tile at location (x, y), the key for sub-tile (i, j) is computed as:

    key_row = x * 2 + i    (5.4)
    key_col = y * 2 + j    (5.5)

Below we provide pseudocode for the map phase, and Figure 5.19 provides an example.

proc Map(blockId, image) =
    row := blockId >> 16;                    // high word holds the row
    col := blockId & 0xFFFF;                 // low word holds the column
    solver.compute_gradient(image);
    solver.upsample_coarse(image, row, col);
    solver.SOR(image);
    for i := 0 to 1 do
        for j := 0 to 1 do
            keyRow := row * 2 + i;
            keyCol := col * 2 + j;
            key := (keyRow << 16) + keyCol;  // pack row/column into 4 bytes
            emit(key, solver.tiles[i][j]);

5.4.2.4 Second (reduce) phase. The second phase now gathers the four smaller tiles that make up the overlapping window. These tiles sit as intermediate data on the local disks of the cluster. If the system accounted for locality, each instance would only need to gather three tiles, since the nodes could be placed such that one tile is always stored locally. After the data are gathered, the gradients are computed from the original pixel values, and an iterative solver (SOR) is run after being initialized with the solutions from the first phase. The iterative solver is constrained to work only on interior pixels to prevent this phase from introducing new seams. Technically, there may be errors at the pixels around the midpoints of the boundary edges of these tiles, though in practice we have not seen this affect the solution.

This second phase fits well with Hadoop's reduce phase, with some considerations. Hadoop does not account for data locality for the reducers; therefore, we must assume the worst case gather of four tiles. Also, the reducers do not have access to the HDFS, nor can any task request specific data. The mappers in the first phase modify the pixel values, but the reducer needs the original values to compute the gradient vector for the iterative solver. Therefore, the mapper must also concatenate the original pixel values to the solved data before it emits the key/value pair. This leads to a 6 bytes/pixel transfer between phases. Below we provide pseudocode for the reduce phase.

proc Reduce(blockId, [(map1, org1), ..., (map4, org4)]) =
    mapper_output := merge(map1, map2, map3, map4);    // solved tiles
    original_tile := merge(org1, org2, org3, org4);    // untouched pixel values
    solver.compute_gradient(original_tile);
    solver.SOR(mapper_output);
    emit(blockId, solver.tiles);

5.4.2.5 Storage in the HDFS. In Hadoop, saving the image in standard row-major order would lead to poor performance in the mappers, since there is good locality in only one dimension. Saving individual tiles would also not be efficient, since Hadoop's HDFS is optimized for large files. Therefore, we save the data as the large tiles, comprised of the four smaller tiles, which the mapper needs in the first phase. We concatenate the tiles together, row-by-row, into a single large file.
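A minimal Python sketch of that packing (the row-level byte handling is an illustrative assumption, not the production format):

    # Pack four small tiles into one contiguous large-tile record so that a
    # mapper reads a single sequential chunk; each tile is a list of row
    # byte-strings of equal length.
    def pack_large_tile(top_left, top_right, bottom_left, bottom_right):
        top = [l + r for l, r in zip(top_left, top_right)]
        bottom = [l + r for l, r in zip(bottom_left, bottom_right)]
        return b"".join(top + bottom)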
5.4.3 Results

We demonstrate the quality of our approach on three test panoramas, which range from megapixels to gigapixels in size. We also demonstrate the generality of the abstraction by running our code, without modification, on a single desktop and on a large cluster. Finally, we test Hadoop's scalability with two of our test panoramas. The single node tests were performed on a 2x quad-core Intel Xeon W5580 3.2GHz desktop with 24GB of memory. For our large distributed tests, we ran our method on the NSF CLuE [41] cluster, which consists of 275 nodes, each with dual Intel Xeon 2.8GHz processors with HyperThreading and 8GB of memory. While still a valuable resource for research, as far as modern clusters are concerned CLuE's hardware is outdated: it is a retired system based on 6-year-old technology originally produced in 2004. Moreover, CLuE is a shared resource, and all timings were certainly affected by other researchers using the machines.

The Edinburgh panorama consists of 25 images with a full resolution of 16,950 x 2,956 pixels (50 megapixel) and was broken into 48 tiles. For our single node test, our method produced a solution in 81 seconds with eight mappers and four reducers. The Redrock panorama consists of nine images with a full resolution of 19,588 x 4,457 pixels (87 megapixel) and was partitioned into 96 tiles. Our method running on a single node solved the panorama in 156 seconds with nine mappers and nine reducers. The solver running on the cluster ran in 199 seconds with 96 mappers and 96 reducers. Due to the small size of these panoramas, the extra parallelization given to us by the distributed system did not increase performance. Quite the opposite was true: the runtimes were worse due to the overhead of Hadoop launching and coordinating many tasks. Also, because the cluster was a shared resource, this increase in compute time could easily have come from external influences. See Figure 5.20 for the original and solved panoramas.

The Salt Lake City panorama consists of 611 images with a full resolution of 126,826 x 29,633 pixels (3.27 gigapixel) and was split into 3,444 tiles. Our method took 3 hours and 5 minutes to compute a solution on our one node test desktop. On the distributed cluster with 492 mappers and 492 reducers, the time to compute a solution dropped to 28 minutes and 44 seconds, of which 3 minutes and 24 seconds was due to Hadoop overhead and 15 minutes was due to I/O and data transfer between the map and reduce phases. Running Salt Lake City with 246 mappers and 246 reducers produced a solution in 39 minutes and 49 seconds, of which 2 minutes and 7 seconds was due to Hadoop overhead and 30 minutes was due to I/O and data transfer. Note that these are all wall clock times and include the activity of other people on a shared system. Moreover, the configuration, which we could not change, required running at least three processes on every node, and the nodes have only two cores. Therefore, we can only hope for 2/3 compute efficiency out of this cluster. See Figure 5.20 for the original and solved panorama.

Figure 5.20: The results of our cloud implementation, from top to bottom: Edinburgh, 25 images, 16,950 x 2,956, 50 megapixel, and the solution to Edinburgh from our cloud implementation; Redrock, nine images, 19,588 x 4,457, 87 megapixel, and the solution to Redrock from our cloud implementation; Salt Lake City, 611 images, 126,826 x 29,633, 3.27 gigapixel, and the solution to Salt Lake City from our cloud implementation.

Based on our timing and the pricing available online, running the 492 mapper/reducer job would have cost approximately $50 on Amazon's Elastic MapReduce [10]. This is orders of magnitude less expensive and less time consuming than operating and maintaining a proprietary cluster and would allow any researcher in the field to experiment with new ideas.
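To put these timings in perspective, a quick back-of-envelope calculation (Python) from the wall-clock numbers quoted above:

    # Salt Lake City with 492 mappers/reducers; all values in minutes.
    single_node = 3 * 60 + 5      # 3 h 5 m on the test desktop
    cluster     = 28 + 44 / 60    # 28 m 44 s on CLuE
    io_transfer = 15.0            # I/O and map-to-reduce transfer
    overhead    = 3 + 24 / 60     # Hadoop job overhead

    print(f"speedup over one node:       {single_node / cluster:.1f}x")  # ~6.4x
    print(f"share spent in I/O+transfer: {io_transfer / cluster:.0%}")   # ~52%
    print(f"share in Hadoop overhead:    {overhead / cluster:.0%}")      # ~12%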
5.4.3.1 Scalability. Due to the shared nature of the CLuE cluster, we restricted our scalability tests to the single node test desktop. Figure 5.21 plots the runtime to solve both the Edinburgh and Redrock panoramas as a function of the number of mappers and reducers. We varied the number of mappers and reducers from one to the number of cores. The plot shows that as both the mappers and reducers increase, so does our performance; but as the total number of mappers and reducers meets or exceeds the available cores of our system, the performance gain flattens. This is an important observation and must be remembered when choosing an optimal number of mappers and reducers, especially when purchasing time and cores as a commodity.

Figure 5.21: (a) The scalability plot for the Edinburgh (50 megapixel) panorama on our one node, 8-core test desktop; (b) the scalability plot for the Redrock (87 megapixel) panorama on the same machine.

5.4.3.2 Fault tolerance. Hadoop has been developed to robustly handle failures in the cluster. Achieving a fault tolerant implementation is a major challenge on its own and is not easily available in other distributed frameworks such as MPI. The tremendous advantage of fault tolerance comes at the cost of high variability in running times, though jobs are guaranteed to finish. In fact, all runs on the distributed cluster had some kind of failure in the system at some time during the execution, and still we were able to get results, which would not have been possible with a traditional distributed implementation. In particular, the running time stated above for the Salt Lake City example with 492 mappers/reducers was based on the job with the minimum number of failures (95 failed tasks). In practice, we have seen this example run as long as 49 minutes to account for the 133 failed tasks that occurred during the job.

CHAPTER 6

FUTURE WORK

The work outlined in this dissertation provides both the justification for and the solutions to bringing the composition stage of the panorama processing pipeline into an interactive setting. The future of this work is to bring the entire pipeline into an interactive environment. A system built with this guiding principle would allow the user to add and remove images, fix registration problems, or adjust image boundaries, all while having a preview of the final color-corrected, composited panorama. This will give users an unprecedented amount of control over the creation of new panoramas, increasing both the accessibility of panorama creation and the quality of the final results.

Due to the work completed for this dissertation, the logical next step to achieving my ultimate goal is to provide new and interactive solutions for the registration phase of the panorama pipeline. Currently, interaction for registration is typically nonexistent. Even when it is provided by a system, the interaction is rudimentary at best. For instance, the only interaction possible in a panorama processing system such as Hugin [77] is the manual selection and deletion of image feature points between pairs of images; see Figure 6.1. This is a tedious process for small images and a completely unwieldy one for larger image collections. The scaling of registration algorithms will also need further study.
Despite significant previous work, many current methods have been shown to work only with relatively small collections of images. Although such methods are assumed to scale well, only recently has work shown an extension of current techniques to extremely large collections of photographs [1]. Often these assumptions of scaling prove false; for instance, some commercial and open-source products, although advertised to handle an arbitrary number of images, have failed when presented with panoramas of hundreds of images. General purpose algorithms for automatic registration of extremely large image collections remain an open avenue of investigation. In addition, state-of-the-art stitching software often needs to reduce complexity by strictly enforcing that the images are acquired in a regular pattern (columns and rows) in order to shrink the search space of possible registrations. My collaborators and I have found that these programs will often fail when presented with large image collections with no assumed structure.

Figure 6.1: A typical example of interaction during panorama registration, from the open-source Hugin [77] software tool. Current interaction is limited to the manual selection and deletion of feature points used during registration.

The focus of my dissertation work was to bring the composition stage of the panorama creation pipeline into an interactive setting, not only for small images but for images massive in size. The Progressive Poisson and Panorama Weaving algorithms elegantly achieve this goal. Although panoramas were the primary focus of the work, the methods and frameworks developed throughout my dissertation provide new paradigms for interacting with high resolution imagery. For instance, Progressive Poisson provides a working proof-of-concept of how to reformulate global algorithms to work in an interactive setting for large data by computing screen resolution previews in real time and using out-of-core computation for full resolution solutions. One can envision expanding the frameworks and techniques outlined in this work with other data processing tools to allow comprehensive editing of massive datasets on regular desktop computers.

APPENDIX

MASSIVE DATASETS

Table A.1: Massive panorama data acquired and used in this dissertation work.

Dataset                Images  Format          Individual Image Size
Lake Louise Large      5794    RAW NEF 16-bit  4288 x 2848 (12 megapixel)
Lake Louise Small 1    2805    RAW NEF 16-bit  4288 x 2848 (12 megapixel)
Lake Louise Winter 1   1983    JPEG Fine       4288 x 2848 (12 megapixel)
Lake Louise Winter 2   1876    JPEG Fine       4288 x 2848 (12 megapixel)
Lake Louise Morning    1656    JPEG Fine       4288 x 2848 (12 megapixel)
Lake Louise Small 2    1440    RAW NEF 16-bit  4288 x 2848 (12 megapixel)
Salt Lake City Large   1311    JPEG Fine       3456 x 2592 (9 megapixel)
Lake Louise Evening    1220    JPEG Fine       4288 x 2848 (12 megapixel)
Salt Lake City Winter  1219    JPEG Fine       3456 x 2592 (9 megapixel)
Salt Lake City Fall    624     JPEG Fine       3456 x 2592 (9 megapixel)
Mount Rushmore         300     JPEG Fine       4288 x 2848 (12 megapixel)
Salt Lake City Small   132     JPEG Fine       3264 x 2448 (8 megapixel)

Table A.2: Massive satellite data acquired and used in this dissertation work.
Dataset             Resolution       Gigapixels
New York, NY        80000 x 80000    6.40
Chattanooga, TN     120000 x 100000  12.00
Washington, DC      131350 x 159375  20.93
Hamilton County, SC 240000 x 232000  55.68
Philadelphia, PA    250000 x 230000  57.50
Indianapolis, IN    260000 x 260000  67.60
San Diego, CA       200000 x 365000  73.00
San Francisco, CA   225000 x 330000  74.25
New Orleans, LA     330000 x 290000  95.70
Olympia, WA         501059 x 329220  164.96
San Antonio, TX     521640 x 492480  256.90
Atlanta, GA         524288 x 524288  274.88
Seattle, WA         411280 x 693528  285.23
Phoenix, AZ         720000 x 540000  388.80

REFERENCES

[1] Agarwal, S., Snavely, N., Seitz, S., and Szeliski, R. Bundle adjustment in the large. ECCV '10: Proceedings of the 11th European Conference on Computer Vision: Part II (2010), 29-42.
[2] Agarwala, A. Efficient gradient-domain compositing using quadtrees. ACM Trans. Graph. 26, 3 (2007), 94.
[3] Agarwala, A., Agrawala, M., Cohen, M. F., Salesin, D., and Szeliski, R. Photographing long scenes with multi-viewpoint panoramas. ACM Trans. Graph. 25, 3 (2006), 853-861.
[4] Agarwala, A., Dontcheva, M., Agrawala, M., Drucker, S. M., Colburn, A., Curless, B., Salesin, D., and Cohen, M. F. Interactive digital photomontage. ACM Trans. Graph. 23, 3 (2004), 294-302.
[5] Agarwala, A., Zheng, K. C., Pal, C., Agrawala, M., Cohen, M. F., Curless, B., Salesin, D., and Szeliski, R. Panoramic video textures. ACM Trans. Graph. 24, 3 (2005), 821-827.
[6] Agrawal, A., Raskar, R., Nayar, S. K., and Li, Y. Removing photography artifacts using gradient projection and flash-exposure sampling. In SIGGRAPH '05: ACM SIGGRAPH 2005 Papers (New York, NY, USA, 2005), ACM, pp. 828-835.
[7] Agrawal, A. K., Chellappa, R., and Raskar, R. An algebraic approach to surface reconstruction from gradient fields. In ICCV (2005), pp. I: 174-181.
[8] Agrawal, A. K., Raskar, R., and Chellappa, R. What is the range of surface reconstructions from a gradient field? In ECCV (2006), pp. I: 578-591.
[9] Agrawal, A. K., Xu, Y., and Raskar, R. Invertible motion blur in video. ACM Trans. Graph. 28, 3 (2009).
[10] Amazon. Elastic MapReduce, 2012. http://aws.amazon.com/elasticmapreduce.
[11] Anandan, P., Burt, P. J., Dana, K., Hansen, M. W., and van der Wal, G. S. Real-time scene stabilization and mosaic construction. In Image Understanding Workshop (1994), pp. I: 457-465.
[12] AutoPano. http://www.autopano.net.
[13] Axelsson, O. Iterative Solution Methods. Cambridge University Press, New York, NY, 1994.
[14] Badra, F., Qumsieh, A., and Dudek, G. Rotation and zooming in image mosaicing. In WACV (1998), pp. 50-55.
[15] Bae, S., Paris, S., and Durand, F. Two-scale tone management for photographic look. In SIGGRAPH '06: ACM SIGGRAPH 2006 Papers (New York, NY, USA, 2006), ACM, pp. 637-645.
[16] Bai, X., Wang, J., Simons, D., and Sapiro, G. Video SnapCut: Robust video object cutout using localized classifiers. ACM Trans. Graph. 28, 3 (2009).
[17] Balmelli, L., Kovacevic, J., and Vetterli, M. Quadtrees for embedded surface visualization: Constraints and efficient data structures. In Proc. of IEEE International Conference on Image Processing (1999), pp. 487-491.
[18] Bay, H., Tuytelaars, T., and Gool, L. J. V. SURF: Speeded up robust features. In ECCV (2006), pp. I: 404-417.
[19] Ben-Ezra, M., and Nayar, S. K. Motion-based motion deblurring. IEEE Trans. Pattern Anal. Mach. Intell. 26, 6 (2004), 689-698.
[20] Berger, M. J., and Colella, P. Local adaptive mesh refinement for shock hydrodynamics. Journal of Computational Physics 82 (1989), 64-84.
[21] Bolitho, M., Kazhdan, M., Burns, R., and Hoppe, H. Multilevel streaming for out-of-core surface reconstruction. In SGP '07: Proceedings of the Fifth Eurographics Symposium on Geometry Processing (Aire-la-Ville, Switzerland, 2007), Eurographics Association, pp. 69-78.
[22] Bornemann, F. A., and Krause, R. Classical and cascadic multigrid - a methodological comparison. In Proceedings of the 9th International Conference on Domain Decomposition Methods (1996), Domain Decomposition Press, pp. 64-71.
[23] Boukerche, A., and Pazzi, R. W. N. Remote rendering and streaming of progressive panoramas for mobile devices. In ACM Multimedia (2006), K. Nahrstedt, M. Turk, Y. Rui, W. Klas, and K. Mayer-Patel, Eds., ACM, pp. 691-694.
[24] Boykov, Y., and Kolmogorov, V. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Trans. Pattern Anal. Mach. Intell. 26, 9 (2004), 1124-1137.
[25] Boykov, Y. Y., and Jolly, M. P. Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In ICCV (2001), pp. I: 105-112.
[26] Boykov, Y. Y., Veksler, O., and Zabih, R. Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Analysis and Machine Intelligence 23, 11 (Nov. 2001), 1222-1239.
[27] Brandt, A. Multi-level adaptive solutions to boundary-value problems. Mathematics of Computation 31, 138 (1977), 333-390.
[28] Briggs, W. L., Henson, V. E., and McCormick, S. F. A Multigrid Tutorial (2nd Ed.). Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2000.
[29] Brown, D. C. Close-range camera calibration. Photogrammetric Engineering 37, 8 (Aug. 1971), 855-866.
[30] Brown, M., and Lowe, D. Automatic panoramic image stitching using invariant features. International Journal of Computer Vision (Jan. 2007).
[31] Brown, M., and Lowe, D. G. Recognising panoramas. In ICCV (2003), IEEE Computer Society, pp. 1218-1227.
[32] Brown, M., Szeliski, R. S., and Winder, S. A. J. Multi-image matching using multi-scale oriented patches. In CVPR (2005), pp. I: 510-517.
[33] Buchanan, A., and Fitzgibbon, A. W. Combining local and global motion models for feature point tracking. In CVPR (2007), pp. 1-8.
[34] Capel, D., and Zisserman, A. Automated mosaicing with super-resolution zoom. Proc. CVPR (Jan. 1998).
[35] Cham, T. J., and Cipolla, R. A statistical framework for long-range feature matching in uncalibrated image mosaicing. In CVPR (1998), pp. 442-447.
[36] Chandran, L. S., and Sivadasan, N. Geometric representation of graphs in low dimension. In Proceedings of the 12th Annual International Conference on Computing and Combinatorics (Berlin, Heidelberg, 2006), COCOON '06, Springer-Verlag, pp. 398-407.
[37] Chartrand, G., and Harary, F. Planar permutation graphs. Ann. Inst. Henri Poincare 3, 4 (1967), 433-438.
[38] Chen, S. E. QuickTime VR: An image-based approach to virtual environment navigation. In SIGGRAPH (1995), pp. 29-38.
[39] Cho, S., and Lee, S. Fast motion deblurring. ACM Trans. Graph. 28, 5 (2009).
[40] Chow, E., Falgout, R. D., Hu, J. J., Tuminaro, R. S., and Yang, U. M. A survey of parallelization techniques for multigrid solvers. In Parallel Processing for Scientific Computing, M. A. Heroux, P. Raghavan, and H. D. Simon, Eds., vol. 20 of Software, Environments, and Tools. SIAM, Philadelphia, PA, Nov. 2006, ch. 10, pp. 179-201.
[41] CLuE. CLuE program, 2008. http://www.nsf.gov/pubs/2008/nsf08560/nsf08560.htm.
[42] Cohen, M. F., Shade, J., Hiller, S., and Deussen, O. Wang tiles for image and texture generation. ACM Trans. Graph. 22, 3 (2003), 287-294.
[43] Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. Introduction to Algorithms, Third Edition, 3rd ed. The MIT Press, Cambridge, MA, 2009.
[44] Correa, C. D., and Ma, K.-L. Dynamic video narratives. ACM Trans. Graph. 29, 4 (2010).
[45] Criminisi, A., Sharp, T., Rother, C., and Perez, P. Geodesic image and video editing. ACM Trans. Graph. 29, 5 (2010), 134.
[46] Davis, J. E. Mosaics of scenes with moving objects. In CVPR (1998), pp. 354-360.
[47] Dean, J., and Ghemawat, S. MapReduce: Simplified data processing on large clusters. CACM 51, 1 (2008), 107-113.
[48] Delong, A., and Boykov, Y. A scalable graph-cut algorithm for N-D grids. In CVPR (2008), IEEE Computer Society.
[49] Demmel, J. W. Applied Numerical Linear Algebra. SIAM, 1997.
[50] Dijkstra, E. W. A note on two problems in connexion with graphs. Numerische Mathematik 1 (1959), 269-271.
[51] Dorr, F. W. The direct solution of the discrete Poisson equation on a rectangle. SIAM Review 12, 2 (April 1970), 248-263.
[52] Edelsbrunner, H., and Mucke, E. P. Simulation of simplicity: A technique to cope with degenerate cases in geometric algorithms. ACM Trans. Graphics 9, 1 (Jan. 1990), 67-104.
[53] Efros, A. A., and Freeman, W. T. Image quilting for texture synthesis and transfer. In SIGGRAPH (2001), pp. 341-346.
[54] Farbman, Z., Hoffer, G., Lipman, Y., Cohen-Or, D., and Lischinski, D. Coordinates for instant image cloning. In SIGGRAPH '09: Proceedings of the 36th Annual Conference on Computer Graphics and Interactive Techniques (New York, NY, USA, 2009), ACM.
[55] Fattal, R., Lischinski, D., and Werman, M. Gradient domain high dynamic range compression. In SIGGRAPH '02: Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques (New York, NY, USA, 2002), ACM, pp. 249-256.
[56] Fergus, R., Singh, B., Hertzmann, A., Roweis, S. T., and Freeman, W. T. Removing camera shake from a single photograph. ACM Transactions on Graphics 25, 3 (July 2006), 787-794.
[57] Finlayson, G. D., Hordley, S. D., and Drew, M. S. Removing shadows from images. In ECCV '02: Proceedings of the 7th European Conference on Computer Vision - Part IV (London, UK, 2002), Springer-Verlag, pp. 823-836.
[58] Flusser, J., Boldys, J., and Zitova, B. Moment forms invariant to rotation and blur in arbitrary number of dimensions. IEEE Trans. Pattern Anal. Mach. Intell. 25, 2 (2003), 234-246.
[59] Flusser, J., and Suk, T. Degraded image analysis: An invariant approach. IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 6 (1998), 590-603.
[60] Ford, L. R., and Fulkerson, D. R. Maximal flow through a network. Canadian Journal of Mathematics 8 (1956), 399-404.
[61] Freedman, D., and Zhang, T. Interactive graph cut based segmentation with shape priors. In CVPR (2005), pp. I: 755-762.
[62] Gall, D. L. MPEG: A video compression standard for multimedia applications. Communications of the ACM (Jan. 1991).
[63] GigaPan. http://www.gigapan.org/about.php.
[64] Goldman, D. B. Vignette and exposure calibration and compensation. IEEE Trans. Pattern Anal. Mach. Intell. 32, 12 (2010), 2276-2288.
[65] Google. Google Earth. http://earth.google.com/.
[66] Gortler, S., and Cohen, M. Variational modeling with wavelets. In Symposium on Interactive 3D Graphics (1995), pp. 35-42.
[67] Goshtasby, A. A. 2-D and 3-D Image Registration: For Medical, Remote Sensing, and Industrial Applications. Wiley-Interscience, New York, NY, Mar. 2005.
[68] Gottfrid, D. Self-service, prorated super computing fun!, 2007. http://open.blogs.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun.
[69] Gracias, N. R. E., Mahoor, M. H., Negahdaripour, S., and Gleason, A. C. R. Fast image blending using watersheds and graph cuts. Image and Vision Computing 27, 5 (Apr. 2009), 597-607.
[70] Griebel, M., and Zumbusch, G. Parallel multigrid in an adaptive PDE solver based on hashing and space-filling curves. Parallel Comput. 25, 7 (1999), 827-843.
[71] Hadoop. Applications and organizations using Hadoop, 2012. http://wiki.apache.org/hadoop/PoweredBy.
[72] Hassin, R. Maximum flow in (s, t)-planar networks. Inform. Proc. Lett. 13 (1981), 107.
[73] Heath, Ng, and Peyton. Parallel algorithms for sparse linear systems. SIREV: SIAM Review 33 (1991).
[74] HiRISE. High Resolution Imaging Science Experiment. http://hirise.lpl.arizona.edu/.
[75] Hockney, R. W. A fast direct solution of Poisson's equation using Fourier analysis. Journal of the ACM 12, 1 (Jan. 1965), 95-113.
[76] Horn, B. K. P. Determining lightness from an image. Comput. Graphics Image Processing 3, 1 (Dec. 1974), 277-299.
[77] Hugin. http://hugin.sourceforge.net.
[78] Ikeda, S., Sato, T., and Yokoya, N. High-resolution panoramic movie generation from video streams acquired by an omnidirectional multi-camera system. In Proceedings of IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (Aug. 2003), p. 155.
[79] Jia, J., Sun, J., Tang, C.-K., and Shum, H.-Y. Drag-and-drop pasting. In SIGGRAPH '06: ACM SIGGRAPH 2006 Papers (New York, NY, 2006), ACM, pp. 631-637.
[80] Jia, J., and Tang, C.-K. Tensor voting for image correction by global and local intensity alignment. IEEE Trans. Pattern Anal. Mach. Intell. 27, 1 (2005), 36-50.
[81] Jia, J., and Tang, C.-K. Image stitching using structure deformation. IEEE Trans. Pattern Anal. Mach. Intell. 30, 4 (2008), 617-631.
[82] Jia, J. Y., and Tang, C. K. Image registration with global and local luminance alignment. In ICCV (2003), pp. 156-163.
[83] Jia, J. Y., and Tang, C. K. Eliminating structure and intensity misalignment in image stitching. In ICCV (2005), pp. II: 1651-1658.
[84] Johnson, D. B. Efficient algorithms for shortest paths in sparse networks. Journal of the ACM 24, 1 (Jan. 1977), 1-13.
[85] Joshi, N., Kang, S. B., Zitnick, C. L., and Szeliski, R. Image deblurring using inertial measurement sensors. ACM Trans. Graph. 29, 4 (2010).
[86] Joshi, N., Szeliski, R., and Kriegman, D. J. PSF estimation using sharp edge prediction. In CVPR (2008), pp. 1-8.
[87] Kazhdan, M. Reconstruction of solid models from oriented point sets. In Eurographics Symposium on Geometry Processing (2005), pp. 73-82.
[88] Kazhdan, M., Bolitho, M., and Hoppe, H. Poisson surface reconstruction. In Eurographics Symposium on Geometry Processing (2006), pp. 61-70.
[89] Kazhdan, M., and Hoppe, H. Streaming multigrid for gradient-domain operations on large images. ACM ToG 27, 3 (2008).
[90] Kazhdan, M., Surendran, D., and Hoppe, H. Distributed gradient-domain processing of planar and spherical images. Transactions on Graphics (TOG) 29, 2 (Mar. 2010).
[91] Kazhdan, M. M., and Hoppe, H. Streaming multigrid for gradient-domain operations on large images. ACM Trans. Graph. 27, 3 (2008).
[92] Kolmogorov, V., and Zabih, R. What energy functions can be minimized via graph cuts? IEEE Trans. Pattern Anal. Mach. Intell. 26, 2 (2004), 147-159.
[93] Kopf, J., Cohen, M. F., Lischinski, D., and Uyttendaele, M. Joint bilateral upsampling. ACM Trans. Graph. 26, 3 (2007), 96.
[94] Kopf, J., Uyttendaele, M., Deussen, O., and Cohen, M. F. Capturing and viewing gigapixel images. ACM Trans. Graph. 26, 3 (2007), 93.
[95] Kourogi, M., Kurata, T., Hoshino, J., and Muraoka, Y. Real-time image mosaicing from a video sequence. In ICIP (4) (1999), pp. 133-137.
[96] Kruger, S., and Calway, A. Image registration using multiresolution frequency domain correlation. Proc. British Machine Vision Conf. (Jan. 1998).
[97] Kuglin, C. D., and Hines, D. C. The phase correlation image alignment method. Assorted Conferences and Workshops (Sept. 1975), 163-165.
[98] Kumar, S., Pascucci, V., Vishwanath, V., Carns, P., Hereld, M., Latham, R., Peterka, T., Papka, M., and Ross, R. Towards parallel access of multi-dimensional, multi-resolution scientific data. In Petascale Data Storage Workshop (PDSW), 2010 5th (Nov. 2010), pp. 1-5.
[99] Kumar, S., Vishwanath, V., Carns, P., Summa, B., Scorzelli, G., Pascucci, V., Ross, R., Chen, J., Kolla, H., and Grout, R. PIDX: Efficient parallel I/O for multi-resolution multi-dimensional scientific datasets. In Proceedings of IEEE Cluster 2011 (Sep. 2011).
[100] Kundur, D., and Hatzinakos, D. Blind image deconvolution. IEEE Signal Processing Magazine 13, 3 (May 1996), 43-64.
[101] Kwatra, V., Schodl, A., Essa, I., Turk, G., and Bobick, A. Graphcut textures: Image and video synthesis using graph cuts. ACM Transactions on Graphics 22, 3 (July 2003), 277-286.
[102] Lawder, J. K., and King, P. J. H. Using space-filling curves for multi-dimensional indexing. In LNCS (2000), Springer Verlag, pp. 20-35.
[103] Levin, A., Zomet, A., Peleg, S., and Weiss, Y. Seamless image stitching in the gradient domain. In ECCV (2004), pp. Vol IV: 377-389.
[104] Li, Y., Sun, J., Tang, C.-K., and Shum, H.-Y. Lazy snapping. ACM Trans. Graph. 23, 3 (2004), 303-308.
[105] Lischinski, D., Farbman, Z., Uyttendaele, M., and Szeliski, R. Interactive local adjustment of tonal values. ACM ToG 25, 3 (2006), 646-653.
[106] Liu, J., and Sun, J. Parallel graph-cuts by adaptive bottom-up merging. In CVPR (2010), IEEE, pp. 2181-2188.
[107] Lombaert, H., Sun, Y. Y., Grady, L., and Xu, C. Y. A multilevel banded graph cuts method for fast image segmentation. In ICCV (2005), pp. I: 259-265.
[108] Lowe, D. G. Object recognition from local scale-invariant features. In ICCV (1999), pp. 1150-1157.
[109] Lucas, B., and Kanade, T. An iterative image registration technique with an application to stereo vision. International Joint Conference on Artificial Intelligence 3 (1981), 674-679.
[110] Maling, D. H. Coordinate Systems and Map Projections. Butterworth-Heinemann, Woburn, MA, 1993.
[111] Mann, S., and Picard, R. W. Virtual bellows: Constructing high quality stills from video. In ICIP (1) (1994), pp. 363-367.
[112] Matungka, R., Zheng, Y., and Ewing, R. Image registration using adaptive polar transform. Image Processing, IEEE Transactions on 18, 10 (2009), 2340-2354.
[113] McCann, J. Recalling the single-FFT direct Poisson solve. In SIGGRAPH Posters (2008), ACM, p. 71.
[114] McCann, J., and Pollard, N. S. Real-time gradient-domain painting. In SIGGRAPH '08: ACM SIGGRAPH 2008 Papers (New York, NY, USA, 2008), ACM, pp. 1-7.
[115] McGuire, M. An image registration technique for recovering rotation, scale and translation parameters. NEC Res. Inst. Tech. Rep., TR (1998), 98-018.
[116] McLauchlan, P., and Jaenicke, A. Image mosaicing using sequential bundle adjustment. Image and Vision Computing (Jan. 2002).
[117] Meehan, J. Panoramic Photography. Amphoto, Oct. 1990.
[118] MegaPOV. http://megapov.inetart.net.
[119] Milgram, D. L. Computer methods for creating photomosaics. IEEE Trans. Computer 23 (1975), 1113-1119.
[120] Milgram, D. L. Adaptive techniques for photomosaicking. IEEE Trans. Computer 26 (1977), 1175-1180.
[121] Mills, A., and Dudek, G. Image stitching with dynamic elements. Image and Vision Computing 27, 10 (Sept. 2009), 1593-1602.
[122] Mortensen, E. N., and Barrett, W. A. Intelligent scissors for image composition. In SIGGRAPH (1995), pp. 191-198.
[123] Mortensen, E. N., and Barrett, W. A. Interactive segmentation with intelligent scissors. Graphical Models and Image Processing: GMIP 60, 5 (Sept. 1998), 349-384.
[124] Nagahashi, T., Fujiyoshi, H., and Kanade, T. Image segmentation using iterated graph cuts based on multi-scale smoothing. In ACCV (2007), pp. II: 806-816.
[125] NASA. NASA Blue Marble. http://earthobservatory.nasa.gov/Features/BlueMarble/.
[126] Niedermeier, R., Reinhardt, K., and Sanders, P. Towards optimal locality in mesh indexings. In Proc. Fundamentals of Computation Theory (1997), vol. 1279 of LNCS, Springer, pp. 364-375.
[127] Ojansivu, V., and Heikkila, J. Image registration using blur-invariant phase correlation. IEEE Signal Processing Letters 14, 7 (July 2007), 449-452.
[128] Pascucci, V., and Frank, R. J. Hierarchical indexing for out-of-core access to multi-resolution data. In Hierarchical and Geometrical Methods in Scientific Visualization, Mathematics and Visualization. Springer, New York, NY.
[129] Pascucci, V., and Frank, R. J. Global static indexing for real-time exploration of very large regular grids. In Supercomputing (SC '01) (2001), p. 2.
[130] Pascucci, V., Laney, D. E., Frank, R. J., Gygi, F., Scorzelli, G., Linsen, L., and Hamann, B. Real-time monitoring of large scientific simulations. In SAC (2003), ACM, pp. 194-198.
[131] Pavlo, A., Paulson, E., Rasin, A., Abadi, D. J., DeWitt, D. J., Madden, S. R., and Stonebraker, M. A comparison of approaches to large scale data analysis. In SIGMOD (Providence, RI, USA, 2009).
[132] Peleg, S., Rousso, B., Acha, A. R., and Zomet, A. Mosaicing on adaptive manifolds. IEEE Trans. Pattern Analysis and Machine Intelligence 22, 10 (Oct. 2000), 1144-1154.
[133] Perez, P., Gangnet, M., and Blake, A. Poisson image editing. ACM Trans. Graph. 22, 3 (2003), 313-318.
[134] Philip, S., Summa, B., Bremer, P.-T., and Pascucci, V. Parallel gradient domain processing of massive images. In Eurographics Symposium on Parallel Graphics and Visualization (Llandudno, Wales, UK, 2011), T. Kuhlen, R. Pajarola, and K. Zhou, Eds., Eurographics Association, pp. 11-19.
[135] Philip, S., Summa, B., Pascucci, V., and Bremer, P.-T. Hybrid CPU-GPU solver for gradient domain processing of massive images. In ICPADS '11: Proceedings of the 2011 IEEE 17th International Conference on Parallel and Distributed Systems (Washington, DC, USA, 2011), IEEE Computer Society, pp. 244-251.
[136] Pretto, A., Menegatti, E., Bennewitz, M., Burgard, W., and Pagello, E. A visual odometry framework robust to motion blur. In ICRA (2009), IEEE, pp. 2250-2257.
[137] PTgui, 2012. http://www.ptgui.com.
[138] Rastogi, A., and Krishnamurthy, B. Localized hierarchical graph cuts. In ICVGIP (2008), IEEE, pp. 163-170.
[139] Ricker, P. M. A direct multigrid Poisson solver for oct-tree adaptive meshes. The Astrophysical Journal Supplement Series 176 (2008), 293-300.
[140] Roberts, F. On the boxicity and cubicity of a graph. Recent Progress in Combinatorics, 1969.
[141] Rosgen, B., and Stewart, L. Complexity results on graphs with few cliques. Discrete Mathematics & Theoretical Computer Science 9, 1 (2007).
[142] Rother, C., Kolmogorov, V., and Blake, A. GrabCut: Interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. 23, 3 (2004), 309-314.
[143] Sagan, H. Space-Filling Curves. Springer-Verlag, New York, NY, 1994.
[144] Sand, P., and Teller, S. Video matching. ToG 23, 3 (2004), 592-599.
[145] Shum, H. Y., and Szeliski, R. S. Construction and refinement of panoramic mosaics with global and local alignment. In ICCV (1998), pp. 953-956.
[146] Simchony, T., and Chellappa, R. Direct analytical methods for solving Poisson equations in computer vision problems. IEEE Trans. Pattern Anal. Mach. Intell. 12 (1990), 435-446.
[147] Snavely, N., Garg, R., Seitz, S. M., and Szeliski, R. Finding paths through the world's photos. In SIGGraph-08 (2008), pp. xx-yy.
[148] Stookey, J., Xie, Z., Cutler, B., Franklin, W. R., Tracy, D. M., and Andrade, M. V. A. Parallel ODETLAP for terrain compression and reconstruction. In GIS (2008), W. G. Aref, M. F. Mokbel, and M. Schneider, Eds., ACM, p. 17.
[149] Summa, B., Scorzelli, G., Jiang, M., Bremer, P.-T., and Pascucci, V. Interactive editing of massive imagery made simple: Turning Atlanta into Atlantis. ACM Trans. Graph. 30, 2 (Apr. 2011), 7:1-7:13.
[150] Summa, B., Tierny, J., and Pascucci, V. Panorama weaving: Fast and flexible seam processing. ACM Trans. Graph. 31, 4 (July 2012), 83:1-83:11.
[151] Summa, B., Vo, H. T., Silva, C., and Pascucci, V. Massive image editing on the cloud. In IASTED International Conference on Computational Photography (CPhoto 2011) (2011).
[152] Sun, J., Jia, J., Tang, C.-K., and Shum, H.-Y. Poisson matting. ACM Trans. Graph. 23, 3 (2004), 315-321.
[153] Szeliski, R. Image mosaicing for tele-reality applications. In Proceedings of the Second IEEE Workshop on Applications of Computer Vision (1994), pp. 44-53.
[154] Szeliski, R. Video mosaics for virtual environments. Computer Graphics and Applications, IEEE 16, 2 (1996), 22-30.
[155] Szeliski, R. Image alignment and stitching: A tutorial. Foundations and Trends in Computer Graphics and Vision 2, 1 (2006).
[156] Szeliski, R., and Shum, H.-Y. Creating full view panoramic image mosaics and environment maps. SIGGRAPH '97: Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques (Aug. 1997).
[157] Szeliski, R. S. Video mosaics for virtual environments. IEEE Computer Graphics and Applications 16, 2 (Mar. 1996), 22-30.
[158] Tai, Y. W., Du, H., Brown, M. S., and Lin, S. Image/video deblurring using a hybrid camera. In CVPR (2008), pp. 1-8.
[159] Tasdizen, T., Koshevoy, P., Grimm, B. C., Anderson, J. R., Jones, B. W., Watt, C. B., Whitaker, R. T., and Marc, R. E. Automatic mosaicking and volume assembly for high-throughput serial-section transmission electron microscopy. Journal of Neuroscience Methods 193, 1 (2010), 132-144.
[160] Toledo, S. A survey of out-of-core algorithms in numerical linear algebra. In External Memory Algorithms, DIMACS Series in Discrete Mathematics and Theoretical Computer Science. American Mathematical Society, Boston, MA, 1999, pp. 161-179.
[161] Triggs, B., McLauchlan, P., Hartley, R. I., and Fitzgibbon, A. W. Bundle adjustment: A modern synthesis. In Vision Algorithms Workshop: Theory and Practice (1999), pp. 298-372.
[162] Tuytelaars, T., and Mikolajczyk, K. Local invariant feature detectors: A survey. Foundations and Trends in Computer Graphics and Vision 3, 3 (2007), 177-280.
[163] Tzimropoulos, G., Argyriou, V., Zafeiriou, S., and Stathaki, T. Robust FFT-based scale-invariant image registration with image gradients. Pattern Analysis and Machine Intelligence, IEEE Transactions on PP, 99 (2010), 1-1.
[164] USGS. United States Geological Survey. http://www.usgs.gov/.
[165] Uyttendaele, M. T., Eden, A., and Szeliski, R. S. Eliminating ghosting and exposure artifacts in image mosaics. In CVPR (2001), pp. II: 509-516.
[166] Valgren, C., and Lilienthal, A. J. SIFT, SURF & seasons: Appearance-based long-term localization in outdoor environments. Robotics and Autonomous Systems 58, 2 (2010), 149-156.
[167] Vineet, V., and Narayanan, P. J. CUDA cuts: Fast graph cuts on the GPU. In Computer Vision on GPU (2008), pp. 1-8.
[168] Vitter, J. S. External memory algorithms and data structures: Dealing with massive data. ACM Comput. Surv. 33, 2 (2001), 209-271.
[169] Vo, H., Bronson, J., Summa, B., Comba, J., Freire, J., Howe, B., Pascucci, V., and Silva, C. Parallel visualization on large clusters using MapReduce. In Proceedings of the 2011 IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV) (2011), p. (to appear).
[170] Vo, H. T., Osmari, D. K., Summa, B., Comba, J. L. D., Pascucci, V., and Silva, C. T. Streaming-enabled parallel dataflow architecture for multicore systems. Comput. Graph. Forum 29, 3 (2010), 1073-1082.
[171] Wang, B., Summa, B., Pascucci, V., and Vejdemo-Johansson, M. Branching and circular features in high dimensional data. Visualization and Computer Graphics, IEEE Transactions on 17, 12 (Dec. 2011), 1902-1911.
[172] Ward, G. Hiding seams in high dynamic range panoramas. In APGV (2006), R. W. Fleming and S. Kim, Eds., vol. 153 of ACM International Conference Proceeding Series, ACM, p. 150.
[173] Weiss, Y. Deriving intrinsic images from image sequences. In International Conference on Computer Vision (2001), pp. 68-75.
[174] Wood, D. N., Finkelstein, A., Hughes, J. F., Thayer, C. E., and Salesin, D. Multiperspective panoramas for cel animation. In SIGGRAPH (1997), pp. 243-250.
[175] Wu, C. SiftGPU: A GPU implementation of scale invariant feature transform (SIFT), 2007. http://cs.unc.edu/~ccwu/siftgpu.
[176] Xiong, Y., and Pulli, K. Fast image labeling for creating high-resolution panoramic images on mobile devices. In ISM (2009), IEEE Computer Society, pp. 369-376.
[177] Xiong, Y., and Pulli, K. Fast panorama stitching for high-quality panoramic images on mobile phones. IEEE Transactions on Consumer Electronics (Jan. 2010).
[178] Xiong, Y., Wang, X., Tico, M., and Liang, C. Panoramic imaging system for mobile devices. SIGGRAPH '09: Posters (Jan. 2009).
[179] Xu, D., Chen, Y., Xiong, Y., Qiao, C., and He, X. On the complexity of/and algorithms for finding shortest path with a disjoint counterpart. IEEE/ACM Trans. on Networking 14, 1 (2006), 147-158.
[180] Yahoo! Yahoo! expands its M45 cloud computing initiative, adding top universities to supercomputing research cluster. http://research.yahoo.com/news/3374.
[181] Yoon, S.-E., and Lindstrom, P. Mesh layouts for block-based caches. IEEE Transactions on Visualization and Computer Graphics 12, 5 (2006), 1213-1220.
[182] Yoon, S.-E., Lindstrom, P., Pascucci, V., and Manocha, D. Cache-oblivious mesh layouts. ACM Trans. Graph. 24, 3 (2005), 886-893.
[183] Yuan, L., Sun, J., Quan, L., and Shum, H.-Y. Image deblurring with blurred/noisy image pairs. ACM Trans. Graph. 26, 3 (2007), 1.
[184] Zitova, B., and Flusser, J. Image registration methods: A survey. Image and Vision Computing 21, 11 (Oct. 2003), 977-1000.
[185] Zoghlami, I., Faugeras, O., and Deriche, R. Using geometric corners to build a 2D mosaic from a set of images. Computer Vision and Pattern Recognition, 1997. Proceedings., 1997 IEEE Computer Society Conference on (1997), 420-425.