INTERACTIVE DIGITAL PHOTOGRAPHY AT SCALE

by

Brian Mark Summa

A dissertation submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science

School of Computing
The University of Utah
May 2013

Copyright © Brian Mark Summa 2013
All Rights Reserved

THE UNIVERSITY OF UTAH GRADUATE SCHOOL

STATEMENT OF DISSERTATION APPROVAL

The dissertation of Brian Mark Summa has been approved by the following supervisory committee members:

    Valerio Pascucci, Chair (approved 2/25/2013)
    Paolo Cignoni, Member (approved 12/13/2012)
    Charles Hansen, Member (approved 12/14/2012)
    Christopher Johnson, Member (approved 12/14/2012)
    Paul Rosen, Member (approved 12/14/2012)

and by Alan Davis, Chair of the Department of the School of Computing, and by Donna M. White, Interim Dean of The Graduate School.

ABSTRACT

Interactive editing and manipulation of digital media is a fundamental component in digital content creation. One medium in particular, digital imagery, has seen a recent increase in the popularity of its large or even massive image formats. Unfortunately, current systems and techniques are rarely concerned with scalability or usability for these large images. Moreover, processing massive (or even large) imagery is assumed to be an offline, automatic process, although many problems associated with these datasets require human intervention for high-quality results.

This dissertation details how to design interactive image techniques that scale. In particular, massive imagery is typically constructed as a seamless mosaic of many smaller images. The focus of this work is the creation of new technologies to enable user interaction in the formation of these large mosaics. While an interactive system for all stages of the mosaic creation pipeline is a long-term research goal, this dissertation concentrates on the last phase of the mosaic creation pipeline: the composition of registered images into a seamless composite. The work detailed in this dissertation provides the technologies to fully realize interactive editing of mosaic composition on image collections ranging from the very small to the massive in scale.

To Delila

CONTENTS

ABSTRACT
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS

CHAPTERS

1. MOTIVATION AND CONTRIBUTIONS
   1.1 The Panorama Creation Pipeline
   1.2 Boundaries
   1.3 Color Correction
2. RELATED WORK
   2.1 Image Boundaries: Seams
      2.1.1 Pairwise Boundaries
      2.1.2 Graph Cuts
      2.1.3 Alternative Boundary Techniques
      2.1.4 Out-of-Core and Distributed Computation
   2.2 Color Correction: Gradient Domain Editing
      2.2.1 Poisson Image Processing
      2.2.2 Poisson Solvers
      2.2.3 Out-of-Core Computation
      2.2.4 Distributed Computation
      2.2.5 Cloud Computing - MapReduce and Hadoop
      2.2.6 Out-of-Core Data Access

3. SCALABLE AND EFFICIENT DATA ACCESS
   3.1 Z- and HZ-Order Background
   3.2 Efficient Multiresolution Range Queries
   3.3 Parallel Write
   3.4 ViSUS Software Framework
      3.4.1 LightStream Dataflow and Scene Graph
      3.4.2 Portable Visualization Layer - ViSUS AppKit
      3.4.3 Web-Server and Plug-In
      3.4.4 Additional Application: Real-Time Monitoring

4. INTERACTIVE SEAM EDITING AT SCALE
   4.1 Optimal Image Boundaries
      4.1.1 Optimal Boundaries
      4.1.2 Min-Cut and Min-Path
      4.1.3 Graph Cuts
   4.2 Pairwise Seams and Seam Trees
   4.3 From Pairwise to Global Seams
      4.3.1 The Dual Adjacency Mesh
      4.3.2 Branching Points and Intersection Resolution
         4.3.2.1 Branching points
         4.3.2.2 Removing invalid intersections
   4.4 Out-of-Core Seam Processing
      4.4.1 Branching Point and Shared Edge Phase
      4.4.2 Intersection Resolution Phase
   4.5 Weaving Interactive System
      4.5.1 System Specifics
         4.5.1.1 Input
         4.5.1.2 Initial parallel computation
         4.5.1.3 Seam network import
      4.5.2 Interactions
         4.5.2.1 Seam bending
         4.5.2.2 Seam splitting
         4.5.2.3 Branching point movement
         4.5.2.4 Branching point splitting and merging
         4.5.2.5 Improper user interaction
   4.6 Scalable Seam Interactions
   4.7 Results
      4.7.1 Panorama Creation
         4.7.1.1 In-core results
         4.7.1.2 Out-of-core results
      4.7.2 Panorama Editing
         4.7.2.1 Editing bad seams
         4.7.2.2 Multiple valid seams
   4.8 Limitations and Future Work

5. INTERACTIVE GRADIENT DOMAIN EDITING AT SCALE
   5.1 Gradient Domain Image Processing
   5.2 Progressive Poisson Solver
      5.2.1 Progressive Framework
         5.2.1.1 Initial solution
         5.2.1.2 Progressive refinement
         5.2.1.3 Local preview
         5.2.1.4 Progressive full solution
         5.2.1.5 Out-of-core solver
      5.2.2 Data Access
      5.2.3 Interactive Preview and Out-of-Core Solver Results
   5.3 Parallel Distributed Gradient Domain Editing
      5.3.1 Parallel Solver
         5.3.1.1 Data distribution as tiles
         5.3.1.2 Coarse solution
         5.3.1.3 First phase: progressive solution
         5.3.1.4 Second phase: overlap solution
         5.3.1.5 Parallel implementation details
      5.3.2 Results
         5.3.2.1 NVIDIA cluster
         5.3.2.2 Longhorn cluster
         5.3.2.3 Heterogeneous cluster
   5.4 Gradient Domain Editing on the Cloud
      5.4.1 MapReduce and Hadoop
         5.4.1.1 Input
         5.4.1.2 MapReduce transfer
      5.4.2 MapReduce for Gradient Domain
         5.4.2.1 Tiles
         5.4.2.2 Coarse solution
         5.4.2.3 First (map) phase
         5.4.2.4 Second (reduce) phase
         5.4.2.5 Storage in the HDFS
      5.4.3 Results
         5.4.3.1 Scalability
         5.4.3.2 Fault tolerance

6. FUTURE WORK

APPENDIX: MASSIVE DATASETS

REFERENCES

LIST OF FIGURES

1.1 A 360-degree panorama of the city of Toronto, Ontario, Canada, credited to Armstrong, Beere and Hime in 1856.

1.2 Massive imagery is typically constructed as a mosaic of many smaller images. (a) A panorama of Salt Lake City comprised of 624 individual images; the combined image is over 3.2 gigapixels in size. (b) The panorama after being composited into a single seamless image.

1.3 The three stages of panorama (mosaic) creation. First, the individual images must be acquired. Second, they are registered into a common coordinate system. Third, they are composited (blended) into a single seamless image. My dissertation research has been to provide technologies to enable interactivity for the final composition stage while a high-performance back-end provides a final image.

1.4 An example of the three stages of a panorama's creation. First, the individual images are acquired, typically from a handheld camera. Second, for registration, the common coordinate system for the images is computed. Third, they are composited (blended) into a single seamless image.

1.5 A diagram to illustrate the main options available during the composition stage of panorama creation. The most basic process is to merge the images directly from the registration; the simplest transition approach is an alpha-blend of the overlap areas to achieve a smooth transition between images.

1.6 A simple blending approach is usually not sufficient in mosaics with moving elements. In these cases, the elements produce "ghosts" (circled here in red) in the final blend.

1.7 (a, e) Two examples (Canoe: 6842 x 2853, 2 images and Lake Path: 4459 x 4816, 2 images) of undesirable, yet exactly optimal, seams (unique pairwise overlaps) for the pixel difference energy. (b, f) A zoom of visual artifacts caused by this optimal seam. (c, g) The pixel labeling. (d, h) The result produced by Adobe Photoshop™. Images courtesy of City Escapes Nature Photography.

1.8 Hierarchical Graph Cuts has only been shown to work well on hierarchies of two to three levels. For this four-picture panorama example, we can see that Hierarchical Graph Cuts produces a solution that passes through a dynamic scene element when using four levels of the hierarchy.
A typical input value of a ten-pixel dilation was used for this example. While a larger dilation parameter could be used, this would require a larger memory and computational cost, which negates the benefits of the technique.

1.9 Even when seams are visually acceptable, moving elements in the scene may cause multiple visually valid seam configurations. On the top, this figure shows a four-image panorama (Crosswalk: 4705 x 3543, four images) with three valid configurations. On the bottom, this figure shows a two-image panorama (Apollo-Aldrin: 3432 x 2297, two images) with two valid configurations. Images courtesy of NASA.

2.1 Although the result is a seamless, smooth image, without coarse upsampling the final image will fail to account for large trends that span beyond a single overlap and can lead to unwanted, unappealing shifts in color.

3.1 (a) The first four levels of the Z-order space filling curve; (b) a 4x4 array indexed using standard Z-order.

3.2 (a) Address transformation from row-major index (i, j) to Z-order index I (Step 1) and then to hierarchical Z-order index (Step 2); (b) levels of the hierarchical Z-order for a 4x4 array. The samples on each level remain ordered by the standard Z-order.

3.3 Our fast-stack Z-order traversal of a 4x4 array with concurrent index computation.

3.4 (a) The naive parallel strategy, where each process writes its piece of the overall dataset into the underlying file; (b) each process transmits each contiguous data segment to an intermediate aggregator; once the aggregator's buffer is complete, the data are written to disk; (c) several noncontiguous memory accesses are bundled into a single message to decrease communication overhead.

3.5 The ViSUS software framework. Arrows denote external and internal dependences of the main software components. Additionally, this figure illustrates the relationship with several example applications that have been successfully developed with this framework.

3.6 The LightStream Dataflow used for analysis and visualization of a three-dimensional combustion simulation (Uintah code). (a) Several dataflow modules chained together to provide a light and flexible stream processing capability. (b) One visualization that is the result of this dataflow.

3.7 The same application and visualization of a Mars panorama running on an iPhone 3G mobile device (a) and a powerwall display (b). Data courtesy of NASA.

3.8 Remote visualization and monitoring of simulations. (a) An S3D combustion simulation visualized from a desktop in the Scientific Computing and Imaging (SCI) Institute (Salt Lake City, Utah) during its execution on the HOPPER2 high performance computing platform at Lawrence Berkeley National Laboratory (Berkeley, California).
(b) Two ViSUS demonstrations of LLNL simulation codes (Miranda and Raptor) visualized in real time while executed on the BlueGene/L prototype installed at the IBM booth of the Supercomputing exhibit.

4.1 The four-neighborhood min-cut solution (a) with its dual min-path solution (b). The min-cut labeling is colored in red/blue and the min-path solution is highlighted in red.

4.2 (a) Given a simple overlap configuration, a seam can be thought of as a path that connects a pair of boundary intersections u and v. (b) Even in a more complicated case, a valid seam configuration is still computable by taking pairs of intersections with a consistent winding about an image boundary. Note that there is an alternate configuration, denoted in gray.

4.3 Given the two min-path trees associated with a seam's endpoints (u, v), a new seam that passes through any point in the overlap (yellow) is a simple linear walk up each tree.

4.4 (a) A solution to the panorama boundary problem can be considered as a network of pairwise boundaries between images. (b) Our adjacency mesh representation is designed with this property in mind. Nodes correspond to panorama images, edges correspond to boundaries, and branching points (intersections in red) correspond to faces of the mesh. (c) Graph Cuts optimization can provide more complex pixel assignments, where "islands" of pixels assigned to one image can be completely bounded by another image. Our approach simplifies the solution by removing such islands.

4.5 (a) A three-overlap adjacency mesh representation. (b) A four-overlap initial quadrilateral adjacency mesh with its two valid mesh subdivisions. (c) A five-overlap pentagon adjacency mesh with an example subdivision.

4.6 Considering the full neighborhood graph of a panorama (a), where an edge exists if an overlap exists between a pair of images, an initial valid adjacency mesh (b) can be computed by finding all nonoverlapping, maximal cliques in the full graph, then activating and deactivating edges based on the boundary of each clique.

4.7 (a) Pairwise seam endpoints closest to a multioverlap (red) are considered a branching point. (b) This can be determined by finding a minimum point in the multioverlap with respect to min-path distance from the partner endpoints. (c) After the branching point is found, the new seams are computed by a linear lookup up the partner endpoint's seam tree. (d) To enable parallel computation, each branching point is computed using the initial endpoint location (green) even if it was moved via another branching point calculation (red).
4.8 (a) Pairwise seams may produce invalid intersections or crossings in a multioverlap, which leads to an inconsistent labeling of the domain. The gray area on the top can be given the labels A or B, and on the bottom either C or D. (b) Choosing a label is akin to collapsing one seam onto the other. This leads to new image boundaries, which were based on energy functions that do not correlate to this new boundary. The top collapse results in a B-C boundary using an A-B seam (a C-D seam for the bottom). (c and d) Our technique performs a better collapse, where each intersection point is connected to the branching point via a minimal path that corresponds to the proper boundary (B-C). One can think of this as a virtual addition of a new adjacency mesh edge (B-C) at the time of resolution to account for the new boundary.

4.9 The phases of out-of-core seam computation. (a) First, branching points are computed. The seams for all unshared edges can also be computed during this pass. (b) Second, once the corresponding branching points are computed, all shared edges can be computed with a single min-path calculation. (c) Third, once all the seams for the edges of a given face have been computed, the intersections can be resolved. Note that the three passes do not necessarily need to be three separate phases, since they can be interleaved when the proper input data are ready.

4.10 The low-memory branching point calculation for our out-of-core seam creation technique. (a) Given a face for which a branching point needs to be computed, (b) the computation proceeds "round-robin" on the edges of the face to compute the needed seam trees. The images that correspond to the edge endpoints and the overlap energy are only needed during the seam tree calculation for a given edge on the face. Therefore, by loading and unloading these data during the "round-robin" computation, the memory overhead for the branching point computation is the cost of storing two images, one energy overlap buffer, and one buffer for the seam trees of the given face.

4.11 For intersections that require a resolution seam, the two images that correspond to the overlap needed for the seam must be loaded. In the figure above, these images are the ones that correspond to the endpoints of the diagonal, resolution adjacency mesh edge.

4.12 Overview of Panorama Weaving. The initial computation is given by steps one through four, after which the solution is ready and presented to the user. Interactions, steps five and six, use the tree update in step four as a background process. Additionally, step six updates the dual adjacency mesh.

4.13 Importing a seam network from another algorithm. The user is allowed to import the result generated by Graph Cuts (a) and adjust the seam between the green and purple regions to unmask a moving person (b). Note that this edit has only a local effect and that the rest of the imported network is unaltered.

4.14 Improper user constraints are resolved or, if resolution is not possible, visual feedback is given. (a) Resolution of an intersection caused by a user moving a constraint. (b) Resolution of an intersection caused by a user moving a branching point.
(c) A nonresolvable case, where the user is just provided a visual cue of the problem.

4.15 Given the inherent locality of the seam editing interactions, only a very small subset of the adjacency mesh needs to be considered. (a) For operations on an adjacency mesh face (i.e., branching point operations), only the images and overlaps of the corresponding face and its one-face neighborhood need to be loaded and computed. (b) For edge operations (i.e., bending), we need consider only the faces that share the edge.

4.16 When a user selects an area of a panorama to edit, the system must determine which overlaps intersect with the selected area. This can be accomplished with (a) a bounding hierarchy of the overlaps. During selection, this hierarchy is traversed to isolate the proper overlaps for the selection. This gives a logarithmic lookup with respect to the number of adjacency mesh faces in the panorama. Alternatively, (b) if a pixel-to-image labeling is provided, it can be used to isolate a fixed neighborhood that needs to be tested for overlap intersection. This labeling is commonly computed if the panorama is to be fed into a color correction routine after seam computation.

4.17 Fall Salt Lake City, 126,826 x 29,633, 3.27 gigapixels, 611 images. (a) An example window computed with the out-of-core Graph Cuts technique introduced in Kopf et al. [94]. This single window took 50 minutes for Graph Cuts to converge, with the initial iteration requiring 10.2 minutes. Since the full dataset contains 495 similar windows, using the windowed technique would take days (85.15 hours) at best and weeks (17.2 days) in the worst case. (b) The full-resolution Panorama Weaving solution was computed in 68.4 minutes on a single core and 9.5 minutes on eight cores. Our single-core implementation required a peak memory footprint of only 290 megabytes, while using eight cores had a peak memory of only 1.4 gigabytes.

4.18 Lake Louise, 187,069 x 40,202, 7.52 gigapixels, 1512 images. The Panorama Weaving results for the Lake Louise panorama. Our out-of-core seam computation produces this full-resolution solution in as little as 37.7 minutes while requiring at most only 2.0 gigabytes of memory. Panorama courtesy of City Escapes Nature Photography.

4.19 Panorama Weaving on a challenging dataset (Nation, 12848 x 3821, nine images) with moving objects during acquisition, registration issues, and varying exposure. Our initial automatic solution (b) was computed in 4.6 seconds at full resolution, with a result of lower seam energy than Graph Cuts. Additionally, we present a system for the interactive user exploration of the seam solution space (c), easily enabling: (d) the resolution of moving objects, (e) the hiding of registration artifacts (split pole) in low-contrast areas (scooter), or (f) the fixing of semantic notions for which automatic decisions can be unsatisfactory (stoplight colors are inconsistent after the automatic solve). The user editing session took only a few minutes. (a) The final, color-corrected panorama.

4.20 Repairing nonideal seams may give multiple valid seam configurations.
(a) The initial seam configuration for the Skating dataset (9400 x 4752, six images) based on gradient energy. (b and c) Its two major problem areas. (d and e) Using our technique, a user can repair the panorama but also has the choice of two valid seam configurations. Panorama courtesy of City Escapes Nature Photography.

4.21 A panorama taken by Neil Armstrong during the Apollo 11 moon landing (Apollo-Armstrong: 6,913 x 1,014, eleven images). (a) Registration artifacts exist on the horizon. (b) Our system can be used to hide these artifacts. (c) The final color-corrected image. Panorama courtesy of NASA.

4.22 In this example (Graffiti: 10,899 x 3,355, ten images), (a) the user fixed a few recoverable registration artifacts and tuned the seam location for improved gradient-domain processing, yielding a colorful color-corrected graffiti. (b) Our initial automatic solution (energy function based on pixel gradients). (c) The user-edited panorama. The editing session took 2 minutes.

4.23 The color-corrected, user-edited examples from Figure 1.7. The artifacts caused by the optimal seams can be repaired by a user. Images courtesy of City Escapes Nature Photography.

4.24 A lake vista panorama (Lake: 7,626 x 1,231, 22 images) with canoes that move during acquisition. In all, there are six independent areas of movement; therefore, there are 64 possible seam configurations of different canoe positions. Here we illustrate two of these configurations with color-corrected versions of the full panorama (a and c) and a zoomed-in portion of each panorama (b and d) showing the differing canoe positions. Panorama courtesy of City Escapes Nature Photography.

4.25 Splitting a five-valence branching point based on gradient energy for the Fall-5way dataset (5211 x 5177, 5 images): as the user splits the pentagon, the resulting seams mask/unmask the dynamic elements. Note that each branching point that has a valence higher than 3 can be further subdivided.

5.1 Our adaptive refinement scheme using simple difference averaging. (a) Global progressive upsampling of the edited image computed by a background process. (b) View-dependent local refinement based on a 2k x 2k window. In both cases, we speed up the SOR solver with an initial solution obtained by smooth refinement of the solution.

5.2 Subsampled and tiled hierarchies. (a) A subsampled hierarchy. As expected, subsampling has the tendency to produce high-frequency aliasing, though details such as the cars on the highway and in the parking lots are preserved. (b) A tiled hierarchy. This produces a more visually pleasing image at all resolutions, but at the cost of potentially losing information. The cars are now completely smoothed away. Data courtesy of the U.S. Geological Survey.

5.3 Our progressive framework using subsampled and tiled hierarchies.
(a) A composite satellite image of Atlanta, over 100 gigapixels at full resolution, overlaid on the Blue Marble background, subsampled; (b) a tiled version of the same satellite image; (c) the seamless cloning solution using subsampling; (d) the same solution computed using a tiled hierarchy; (e) the solution offset computed using subsampling; (f) the solution computed using tiles; (g) a full-resolution portion computed using subsampling; (h) the same portion using tiling. Note that even though there is a slight difference in the computed solution, both the tiled and the subsampled hierarchies produce a seamless stitch with our framework. Data courtesy of the U.S. Geological Survey and NASA's Earth Observatory.

5.4 The Edinburgh panorama, 16,950 x 2,956 pixels. (a) Our coarse solution computed at a resolution of 0.7 megapixels; (b) the same panorama solved at full resolution with our progressive global solver, scaled to approximately 12 megapixels for publication; (c) a detail view of a particularly bad seam from the original panorama; (d) the problem area previewed using our adaptive local refinement; (e) the problem area solved at full resolution using our global solver in 3.48 minutes.

5.5 The RMS error, when compared to the ideal analytical solution, as iterations increase for both methods. Streaming multigrid has better convergence and less error for the Edinburgh example (a), though our method remains stable for the larger Salt Lake City panorama (b). Notice that every plot has been scaled independently to best illustrate the convergence trends of each method.

5.6 Panorama of Salt Lake City, 3.27 gigapixels, obtained by stitching 611 images. (a) Mosaic of the original images. (b) Our solution computed at 0.9 megapixel resolution. (c) The full solution provided by our global solver. (d) The difference image between our preview and the full solution at the preview resolution. Both (a) and (c) have been scaled for publication to approximately 12.9 megapixels.

5.7 A comparison of our adaptive local preview on a portion of the Salt Lake City panorama at one half of the full resolution; (a) the original mosaic, (b) our adaptive preview, (c) the full solution from our global solver, and (d) the difference image between the adaptive preview and the full solution.

5.8 A comparison of our system with the best known out-of-core method [Kazhdan and Hoppe 2008] and a full analytical solution on a portion of the Salt Lake City panorama, 21201 x 24001 pixels, 485 megapixels. (a) The full analytical solution; (b) our solution, computed in 28.1 minutes; (c) the solution from [Kazhdan and Hoppe 2008], computed in 24.9 minutes; (d) the analytical solution where the solver is allowed to harmonically fill the boundary; (e) our solution with harmonic fill; (f) the solution from [Kazhdan and Hoppe 2008] with harmonic fill; (g) the map image used by all solvers to construct the panorama, where red indicates the image that provides the pixel color and white denotes the panorama boundary.
5.9 Application of our method to HDR image compression: (a) The original synthetic HDR image of an adaptively refined Sierpinski sponge generated with POV-Ray. (b) The tone-mapped image, with recovery of detailed information previously hidden in the shadows. (c) The Belgium House image solved using our coarse-to-fine method with an initial 16 x 12 coarse solution (α = 0.01, β = 0.7, compression coefficient = 0.5). (d) The direct analytical solution. Image courtesy of Raanan Fattal.

5.10 A satellite imagery collection with a background given by a 3.7-gigapixel image from NASA's Blue Marble Collection. The Progressive Poisson solver allows the application of the seamless cloning method to two copies of the city of Atlanta, each of 116 gigapixels. An artist can interactively place a copy of Atlanta under shallow water and recreate the lost city of Atlantis. Data courtesy of the U.S. Geological Survey and NASA's Earth Observatory.

5.11 Our tile-based approach: (a) An input image is divided into equally spaced tiles. In the first phase, after a symbolic padding by a column and row in all dimensions, a solver is run on a window denoted by a collection of four labeled tiles. Data are sent and collected for the next phase to create new data windows with a 50% overlap. (b) An example tile layout for the Fall Panorama example.

5.12 Windows are distributed as evenly as possible across all nodes in the distributed system. Windows assigned to a specific node are denoted by color above. Given the overlap scheme, data transfer only needs to occur one-way, denoted by the red arrows and boundary above. To avoid starvation between phases and to hide as much data transfer as possible, windows are processed in inverse order (white arrows) and the tiles needed by other nodes are transferred immediately.

5.13 Fall Panorama, 126,826 x 29,633, 3.27 gigapixels. (a) The panorama before seamless blending and (b) the result of the parallel Poisson solver run on 480 cores with 124 x 29 windows, computed in 5.88 minutes.

5.14 Winter Panorama, 92,570 x 28,600, 2.65 gigapixels. (a) The result of the parallel Poisson solver run on 480 cores with 91 x 28 windows, computed in 6.02 minutes; (b) the panorama before seamless blending; and (c) the coarse panorama solution.

5.15 The two phases of a MapReduce job. In the figure, three map tasks produce key/value pairs that are hashed into two bins corresponding to the two reduce tasks in the job. When the data are ready, the reducers grab their needed data from the mappers' local disks.

5.16 A diagram of the job control and data flow for one Task Tracker in a Hadoop job.
The dotted red arrows indicate data flow over the network; dashed arrows represent communication; the blue arrow indicates a local data write; and the black arrows indicate an action taken by the node.

5.17 Although the result is a smooth image, without coarse upsampling the final image will fail to account for large trends that span beyond a single overlap and can lead to unwanted shifts in color. Notice the vertical banding denoted by the red arrows.

5.18 The 512 x 512 tiles used in our Edinburgh (a), Redrock (b), and Salt Lake City (c) examples.

5.19 Our tile-based approach: An input image is divided into equally spaced tiles. In the map phase, after a symbolic padding by a column and row in all dimensions, a solver is run on a collection of four tiles, labeled by numbers above. After the mapper finishes, it assigns a key such that each reducer runs its solver on a collection of four tiles that have a 50% overlap with the previous solutions.

5.20 The results of our cloud implementation, from top to bottom: Edinburgh, 25 images, 16,950 x 2,956, 50 megapixels, and the solution to Edinburgh from our cloud implementation; Redrock, nine images, 19,588 x 4,457, 87 megapixels, and the solution to Redrock from our cloud implementation; Salt Lake City, 611 images, 126,826 x 29,633, 3.27 gigapixels, and the solution to Salt Lake City from our cloud implementation.

5.21 (a) The scalability plot for the Edinburgh (50 megapixel) panorama on our one-node, 8-core test desktop; (b) the scalability plot for the Redrock (87 megapixel) panorama on the same machine.

6.1 A typical example of interaction during panorama registration, from the open-source Hugin [77] software tool. Current interaction is limited to the manual selection and deletion of feature points used during registration.

LIST OF TABLES

4.1 Performance results comparing Panorama Weaving to Graph Cuts for our test datasets that contain more than simple pairwise overlaps. Panorama Weaving run serially (PW-S) computes solutions quickly. When run in parallel, runtimes are reduced to just a few seconds. The energy ratio (E. ratio) between the final seam energy produced by Panorama Weaving and Graph Cuts (PW Energy / GC Energy) is shown. For all but one dataset (Fall-5way), Panorama Weaving produces a lower energy result; it is comparable otherwise. Panorama image sizes are reported in megapixels (MP).

4.2 Strong scaling results for the Fall Salt Lake City panorama, 126,826 x 29,633, 3.27 gigapixels, 611 images. Our out-of-core Panorama Weaving technique scales very well in terms of efficiency percentage, compared to ideal scaling, up to the physical cores of our test system (eight cores). At eight cores, our technique loses a slight amount of efficiency due to our implementation having a dedicated thread for handling the seam scheduling.
Using the full eight cores to process this panorama provides a full-resolution seam solution in just 9.5 minutes. The system is extremely light on memory, using at most 1.4 gigabytes.

4.3 Strong scaling results for the Lake Louise panorama, 187,069 x 40,202, 7.52 gigapixels, 1512 images. Like the smaller Fall Salt Lake City panorama, our implementation shows very good efficiency up to the physical number of cores on our test system. Using the full eight cores, the full-resolution seam solution for this panorama requires 37.7 minutes of compute time and at most 2.0 gigabytes of memory.

5.1 The strong scaling results for the Fall Panorama run on the NVIDIA cluster from 2-60 nodes, up to a total of 480 cores. Overhead (O/H) due to MPI communication and I/O is also provided, along with its percentage of actual running time. The Fall Panorama, due to its larger size, begins to lose efficiency at around 32 nodes, when I/O overhead begins to dominate. Even with this overhead, the efficiency (Eff.) remains acceptable.

5.2 The strong scaling results for the Winter Panorama run on the NVIDIA cluster from 2-60 nodes, up to a total of 480 cores. Overhead (O/H) due to MPI communication and I/O is also provided, along with its percentage of actual running time. For the Winter Panorama, the I/O overhead does not affect performance up to 60 nodes, and the implementation maintains efficiency (Eff.) throughout all of our runs.

5.3 Weak scaling tests run on the NVIDIA cluster for the Fall Panorama dataset. As the number of cores increases, so does the image resolution to be solved. The image was expanded from the center of the full image. Iterations of the solver for all windows were locked at 1000 for testing, to ensure no variation is due to slower-converging image areas. As is shown, our implementation shows good efficiency even when running on the maximum number of cores.

5.4 To demonstrate the portability of our implementation, we have run strong scalability testing for the Fall Panorama on the Longhorn cluster from 2-128 nodes, up to a total of 1024 cores. As the numbers show, we maintain good scalability and efficiency even when running on all available nodes and cores.

5.5 Weak scaling tests run on the Longhorn cluster for the Fall Panorama dataset.

5.6 Our simulated heterogeneous system. This test example is a simulated mixed system of two 8-core nodes, four 4-core nodes, and eight 2-core nodes. The weights for our framework are the number of cores available in each node. The timings and window distributions are for the Fall Panorama dataset. As shown, with the proper weightings our framework can distribute windows proportionally based on the performance of the system. The maximum runtime of 32.70 minutes for this 48-core system is on par with the timings for the 32-core (40.08 minutes) and 64-core (20.83 minutes) runs from the strong scaling test.
A.1 Massive panorama data acquired and used in this dissertation work.

A.2 Massive satellite data acquired and used in this dissertation work.

ACKNOWLEDGEMENTS

This dissertation was made possible through the support of others whom I would like to thank:

First, I would like to thank my family, whose endless support made this work possible. Thank you Mom, Dad, Chris, Jason, and Amira for your wholehearted support of my decision to quit my job and move across the country to go back to school (in Utah of all places). Most importantly, I would like to thank Delila, my partner in all of this, for taking this adventure with me. You are the source of my inspiration in all things.

I would like to thank my advisor and mentor, Valerio Pascucci, for his continued guidance and encouragement. We both took a very big chance in our first semester as student and professor at Utah, which, I believe, reaped a very large reward. Since the start of my work, he has given me the support and confidence I needed to succeed, and for this I am grateful. I hope this dissertation is not the end, but the beginning of a long collaboration.

I would also like to thank the other members of my committee, Chris, Chuck, Paul, and Paolo, for their feedback on this work. I would especially like to thank Krzysztof Sikorski, who was a member of my committee for most of my time at Utah. Despite his illness, he was a constant source of guidance and encouragement, for which I am grateful.

Finally, I would like to thank the many collaborators I have had while here at Utah, and I hope you forgive me for not writing out the full list. This work was only possible through these collaborations. I cannot express my perpetual astonishment at what we were, and are, able to accomplish together.

CHAPTER 1

MOTIVATION AND CONTRIBUTIONS

Interactive editing and manipulation of digital media is a fundamental component in digital content creation. One medium in particular, digital imagery, has seen a recent increase in the popularity of its large or even massive image formats. Unfortunately, current systems and techniques are rarely concerned with scalability or usability for these large images. For example, the support for large imagery in the most prevalent interactive image editing application, Adobe Photoshop™, lacks true viability for today's massive images: the application's large image format has a 90-gigapixel maximum image size, limited editing functionality beyond 900 megapixels, and tedious processing times during an interactive session. Moreover, the creation and processing of large imagery is assumed to be an offline, automatic process, though many of the problems associated with these datasets require human intervention for repair. The work outlined in this dissertation will show that this expensive, offline assumption need not be true and that real-time interaction provides new and powerful environments for the creation and editing of massive images. Specifically, this work will detail how to design interactive image processing algorithms that scale.

There has always been an inherent human desire to document or replicate large vistas of our natural world or to document historical events in detail. Panoramic paintings reached the height of their popularity in the early 19th century due to improvements in perspective drawing techniques.
A few decades later, the advent of modern photography was closely followed by the earliest work in the creation of panoramic images; see Figure 1.1. In the years since, the popularity of panoramas has not waned; see Figure 1.2. These large, sweeping images capture the feeling of being an observer, whether of a beautiful natural view, a historic event such as a Presidential inauguration,¹ or a reminder of the destruction of war.² Consequently, there exists a significant interest in creating and using large mosaics for personal, scientific, and/or commercial applications. Examples include medical imaging, where electron microscopy data are composited into ultra-high-resolution images [159], or the study of phenology and genomics.³ Massive imagery is also common in geographic information systems (GIS) in the form of aerial or satellite data, used for anything from urban planning to global climate research.

¹ Barack Obama Presidential Inauguration: http://gigapan.org/gigapans/15374/
² Hiroshima Panorama Project: http://www.iwu.edu/rwilson/hiroshima/

Figure 1.1: A 360-degree panorama of the city of Toronto, Ontario, Canada, credited to Armstrong, Beere and Hime in 1856.

Figure 1.2: Massive imagery is typically constructed as a mosaic of many smaller images. (a) A panorama of Salt Lake City comprised of 624 individual images. The combined image is over 3.2 gigapixels in size. (b) The panorama after being composited into a single seamless image.

While many-megapixel cameras do exist,⁴ they are overly expensive and unwieldy to use. Therefore, massive imagery is typically constructed as a mosaic of many smaller images.

³ GigaVision Project: http://www.gigavision.org/
⁴ Seitz 6x17 Digital: http://www.roundshot.ch/xmL1/internet/de/application/d438/d925/f934.cfm

At one time, images such as the one in Figure 1.1 were painstakingly constructed by hand. Recent innovations in algorithms and available hardware have drastically simplified the creation of small-scale panoramas. This process can be computed simply offline and can now be embedded in commodity cameras (e.g., the Sony Cyber-shot 3D Sweep Panorama) or mobile devices such as Apple's iPhone. The panoramas for these algorithms are assumed to be small and, therefore, the algorithms are not designed to scale. For example, the iPhone's panorama feature, released in September 2012, has a strict 28-megapixel upper limit on the panorama size; this is small by today's standards. An online search for the word "gigapixel" powerfully demonstrates the increasing desire to create ever larger panoramas. To date, the largest panorama contains roughly 272 gigapixels, yet if the current trend continues, this record is bound to be broken within a few months. This trend is aided by the introduction of new, high-resolution image sensors. For example, with current state-of-the-art 36.3 active megapixel CMOS sensors, it would take as little as 70 images to produce a gigapixel panorama (a rough estimate follows below). Creating panoramas at large scales has become exponentially more difficult than the simple, small cases for which panorama techniques were originally designed.
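As a rough check on these sensor numbers (a back-of-the-envelope estimate of my own, not a figure from this dissertation), the sensor resolution bounds the image count from below, and the gap between that bound and the quoted 70 images is explained by the overlap that registration requires:

\[
\frac{10^9~\text{pixels}}{36.3 \times 10^6~\text{pixels/image}} \approx 28~\text{images with zero overlap},
\qquad
70 \times 36.3~\text{MP} \approx 2.54~\text{GP}.
\]

In other words, roughly 60% of the pixels captured in a 70-image gigapixel panorama are redundant, which is consistent with the substantial pairwise overlaps that registration and seam computation rely on.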
For example, the 3.2-gigapixel panorama shown in Figure 1.2 took several hours to capture and an order of magnitude more time to process on conventional hardware. Furthermore, this timeline assumes a perfect capture and one-time processing. In practice, the process of setting up an automated camera is complicated and error prone, and problems such as unanticipated occlusion or global misalignment often occur. Unfortunately, many of these issues are only subtly expressed in the individual images and become apparent only after the creation of the final panorama. Additionally, today's processing pipelines are less than ideal and typically involve a large number of interdependent and unintuitive parameter choices. A mistake or unlucky choice in the setup can easily cause unacceptable artifacts in the image, requiring a repeat of the process. Consequently, it may take several weeks and significant computational resources to produce one large-scale panorama. This makes it difficult for all but a select few to create such images and makes this imagery impractical for many interesting applications. For example, acquiring imagery from unusual locations, such as national parks, or covering transient events, like an aurora, becomes a significant logistical and monetary challenge. Furthermore, in security applications, waiting hours or days for viable results defeats the primary purpose of acquiring the images. Finally, in scientific applications, while typically less time constrained, the personal and computational resources necessary to create a large-scale image of the night sky, for example, are beyond the reach of all but the largest projects. Therefore, despite significant interest, creating these massive images remains an esoteric hobby or a closely guarded research project. Work must be done to close the gap between the desire to create large-scale panoramas (and their potential applications) and the ability to capture, process, and utilize such imagery.

An ideal panorama system should allow a user to browse individual images as they are acquired, to set up and preview the processing pipeline and results through accurate, real-time approximations, and it should include a flexible and scalable offline component to produce a final image. The system should be divided into two components: first, a real-time framework to supervise and steer the acquisition process and to guide the postprocessing; and second, a computational back-end based on a distributed or cloud computing framework. The real-time system should be able to run on devices as small as an iPad or netbook computer and be designed to be used in the field to detect any problems as early as possible. The back-end fills the gap between commonly available but slow commodity hardware and specialized distributed computing resources. By implementing a flexible framework able to run on a wide variety of heterogeneous systems, the back-end scales gracefully from a single multicore machine, or a small cluster, to more powerful hardware. My dissertation research has been the creation of technologies to aid in the creation of such a system.

1.1 The Panorama Creation Pipeline

Creating large-scale panoramas can be divided into three stages: acquisition, image registration, and composition; see Figures 1.3 and 1.4. Each stage has individually been the focus of a large amount of research, but little effort has been spent on real-time performance or on their interdependence.
Traditionally, all three stages are treated as separate postprocesses, making performance or pipelining a secondary consideration. However, as discussed above, this approach is rapidly becoming unsustainable as long acquisition and processing times lead to errors as well as an increased number of hardware and software failures. To this end, my research goals are and have been to develop algorithms for each stage that produce high quality approximations in real-time and to provide a scalable infrastructure to create full solutions exploiting all available hardware. Such new technologies would enable a wide variety of applications currently infeasible. For example, the ability to quickly and cheaply produce high resolution images of art galleries, historical events, or national parks would greatly benefit schools, universities, and the public in general. Enabling the military to combine footage from multiple security cameras, satellites, or flying drones into a seamless overview would allow operators to more accurately spot changes in a secure area or direct ground operations.

Figure 1.3: The three stages of panorama (mosaic) creation. First, the individual images must be acquired. Second, they are registered into a common coordinate system. Third, they are composited (blended) into a single seamless image. My dissertation research has been to provide technologies to enable interactivity for the final composition stage while a high-performance back-end provides a final image.

Figure 1.4: An example of the three stages of a panorama's creation (acquisition, registration, composition). First, the individual images are acquired, typically from a handheld camera. Second, for registration, the common coordinate system for the images is computed. Third, they are composited (blended) into a single seamless image.

While a full system is a long-term research goal, my dissertation work has focused on the last phase of the mosaic creation pipeline, specifically the composition stage. After registration, image mosaics are combined in order to give the illusion of a seamless, massive image. Images acquired with inexpensive robots and consumer cameras pose an interesting challenge for image processing techniques. Often, panorama robots can take seconds between each photograph, causing gigapixel-sized images to be taken over the course of hours. Due to this delay, images can vary significantly in lighting conditions and/or exposure, and when registered they can form an unappealing patchwork. Dynamic objects may also move during acquisition, ruining the illusion of a single, seamless image. Images acquired by air or satellite suffer from an extreme version of this problem, where the time of acquisition can vary from hours to days for a single composite. Therefore, minimizing the transition between images is the fundamental step in the composition stage, see Figure 1.5. The simplest transition approach is an alpha-blend of the overlap areas. Szeliski [155] provides an excellent introduction to this and other blending techniques. Such an approach does not work well in the presence of dynamic elements which move between captures, artifacts from poor registration, or varying exposures across images, see Figure 1.6. Often, it is preferable to compute a "hard" boundary, or seam, between the images as a final step, or as the preprocess for a technique such as gradient domain blending [133, 103].
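For reference, the alpha-blend just mentioned is a per-pixel convex combination of the registered images; the notation below is the standard formulation, not one reproduced from this chapter:

    \[
    C(x) = \alpha(x)\,A(x) + \bigl(1 - \alpha(x)\bigr)\,B(x), \qquad \alpha(x) \in [0, 1],
    \]

where A and B are the two registered images and the weight \alpha ramps smoothly from 1 on A's side of the overlap to 0 on B's side.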
Techniques exist to compute these seams based purely on distance [174, 132], but like blending, these perform poorly when the scene contains moving elements. A more sophisticated approach is to compute the boundaries between images through the minimization of an energy function to produce a pleasing transition between the mosaic images. These boundaries often provide the illusion of a seamlessly composited image. If exposure or lighting conditions vary among the images, a final color correction is necessary to produce a smooth image. Techniques such as gradient domain blending [133, 103], mean value coordinates [54], or bilateral upsampling [93] have been shown to provide adequately smooth images. Gradient domain blending remains the most popular, but also the most computationally expensive, technique for color correction due to the quality of its final results. The techniques associated with panorama boundaries and blending are typically computationally expensive and are considered an offline postprocess for large (and even small) panoramas. As the focus of my dissertation work, I provide novel algorithms and techniques to bring these operations into an interactive setting for massive imagery.

Figure 1.5: A diagram illustrating the main options available during the composition stage of panorama creation. The simplest process is to merge the images directly from the registration; an alpha-blend of the overlap areas achieves a smoother transition between images.

Figure 1.6: A simple blending approach is usually not sufficient in mosaics with moving elements. In these cases, the elements produce "ghosts" (circled here in red) in the final blend.

1.2 Boundaries

In the past, panorama image collections were captured in one sweeping motion (i.e., with image overlaps in only one dimension, as in Figure 1.7). Today's images are often collections of multiple rows and columns, or occur in even more unstructured configurations. Consequently, more sophisticated panorama processing techniques continue to be developed to account for these more complex configurations. After the initial registration, the panorama's individual images are blended to give the illusion of a single seamless image. As a usual first step, a boundary between images must be computed as input for a color correction technique such as gradient domain blending [133, 103]. These boundaries are often called seams.

Figure 1.7: (a, e) Two examples (Canoe: 6842 x 2853, 2 images, and Lake Path: 4459 x 4816, 2 images) of undesirable, yet exactly optimal seams (unique pairwise overlaps) for the pixel difference energy. (b, f) A zoom of visual artifacts caused by this optimal seam. (c, g) The pixel labeling. (d, h) The result produced by Adobe Photoshop™. Images courtesy of City Escapes Nature Photography.

Using a global optimization technique, these seams can be optimized to minimize visual artifacts due to the transition between images. The objective is typically a pixel-based energy function such as color or color-gradient variation across the boundary. Currently, the most widely used technique for global seam computation in a panorama is the Graph Cuts algorithm [26, 24, 92]. This is a popular and robust computer vision technique and has been adapted [101, 4] to compute the boundary between a collection of images.
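A representative pairwise seam energy, in the spirit of the pixel-difference costs used in graph-cut compositing [101, 4] (the exact functional varies from method to method), charges for color mismatch on both sides of the cut. For a seam separating adjacent overlap pixels p and q,

    \[
    E(p, q) = \lVert A(p) - B(p) \rVert + \lVert A(q) - B(q) \rVert,
    \]

and the total seam cost is the sum of E over all adjacent pixel pairs split by the seam.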
While this technique has been used with good success for a variety of panoramic or similar graphics applications [101, 4, 5, 3, 2, 94, 89, 44, 90], it can be problematic due to its high computational cost and memory requirements. Moreover, Graph Cuts applied to digital panoramas is typically a serial operation. Since computing the globally optimal boundaries between images is known to be NP-hard when the panorama is composed of more than a collection of unique pairwise overlaps [26], Graph Cuts aims to efficiently approximate the optimal solution and can therefore fall into local minima of the solution space. There has been a large body of work to reduce some of the costs associated with Graph Cuts [25, 24, 142, 5, 107, 61, 124, 167, 138, 69, 106], but each of these works primarily focuses on Graph Cuts' typical image segmentation or denoising applications. The success of many of these algorithms has yet to be demonstrated for digital panoramas. Those which have been used in a panorama context can suffer from limitations. For example, the popular Hierarchical Graph Cuts technique [107, 5] has been shown to operate well only on hierarchies of up to two to three levels in digital panoramas [5], which can be observed in practice in panoramas such as the one shown in Figure 1.8. Given the recent trend of increasing panorama resolution (many megapixels to gigapixels), one can see that this limited hierarchy would not be sufficient to compute the seams of images of these sizes. As a second example, Graph Cuts often needs an integer-based energy function to guarantee convergence. This can prove problematic for high dynamic range (HDR) panoramas. To overcome these types of limitations, a new approach was designed based on the following observations:

• A minimal energy seam does not necessarily give visually pleasing results. Figure 1.7 provides two examples of panoramas with an exact pairwise optimal energy boundary based on pixel difference across the seam. Such an energy should be sensitive to dynamic, moving objects that appear in the overlap. As one can see, neither seam would be considered ideal by a user, since both cut through moving objects in the scene. Additionally, to further argue the importance of this observation, the figure also shows very similar seams computed by Adobe Photoshop™, a widely used image editing application.

Figure 1.8: Hierarchical Graph Cuts has only been shown to work well on hierarchies of two to three levels (panels: full resolution; two, three, and four levels). For this four picture panorama example, we can see that Hierarchical Graph Cuts produces a solution that passes through a dynamic scene element when using four levels of the hierarchy. A typical input value of a ten pixel dilation was used for this example. While a larger dilation parameter could be used, this would require a larger memory and computational cost, which negates the benefits of the technique.

• There can be more than one valid seam solution. Even if the initial seam solution is visually acceptable to the user, there may be a large number of additional, valid solutions. Some of these alternative seams may be preferable, and this determination is completely subjective. For example, a user may have wished that the high energy in a seam occurred in an area where it is less likely to be noticed, such as the grassy area or the water in the images in Figure 1.7.
Given moving elements in a scene, such elements may occur entirely within the area of an overlap. Therefore, there can be acceptable seams where the element is included and ones where it is not. Figure 1.9 provides examples.

Figure 1.9: Even when seams are visually acceptable, moving elements in the scene may cause multiple visually valid seam configurations. On the top, this figure shows a four image panorama (Crosswalk: 4705 x 3543, four images) with three valid configurations. On the bottom, this figure shows a two image panorama (Apollo-Aldrin: 3432 x 2297, two images) with two valid configurations. Images courtesy of NASA.

• An interactive technique is necessary and attainable. Given the subjective nature of the image boundaries and the possibility of techniques falling into bad local minima, a user must be interjected into the seam boundary problem. Currently, finding panorama boundaries with Graph Cuts is an offline process with only one solution presented to the user. The only existing alternative is the manual editing, pixel by pixel, of the individual image boundaries. This is a time-consuming and tedious process where the user relies on perception alone to determine if the manual seam is acceptable. Therefore, a guided interactive technique for image boundaries is necessary for panorama processing. This technique should allow users to include or remove dynamic elements, move an image seam out of possible local minima into a lower error state, move the seam into a higher error state (but one with more acceptable visual coherency), or hide errors in locations where they feel they are less noticeable. During these edits, the user should be provided the optimal seams given these new constraints.

• A solution based on pairwise boundaries can achieve good results for panoramas while giving a fast, highly parallel, and light system. Computing pairwise-only optimal boundaries is both fast and exact (i.e., guaranteed to find the global minimum). It is then of no surprise that these boundaries have been used often in past work for pan- or tilt-only panoramas [145, 46, 157, 165]. Although this approach had been thought not to generalize beyond such cases, there had simply been no technique to use pairwise boundaries in panoramas with more complex structure, save for efforts to combine them via a distance metric [69] or sequentially [53]. This dissertation work not only provides a global solution based on pairwise boundaries, but also shows that this solution often produces lower energy seams than Graph Cuts for panoramas. Given a technique to combine pairwise boundaries into a coherent seam network, each disjoint seam can be computed separately and trivially in parallel. Moreover, the solution produced for each is typically independent, and therefore memory and resources for each can be allocated and released as needed. In addition, the solution domain is only the overlap between pairs of images, in contrast to some previous applications of Graph Cuts for panoramas [101, 4], which often consider the entire composite image as the solution domain. All of these properties give the potential for a very fast and light system, even when operating on full resolution imagery. Moreover, such a system has the ability to be extended to an out-of-core or distributed setting. (A sketch of a pairwise seam computation is given below.)
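The following is a minimal sketch of such a pairwise computation, assuming a vertical overlap strip and a seam that crosses each row exactly once (the classic dynamic-programming special case; the general min-path formulation on the overlap graph is discussed in Chapter 4). All names are illustrative.

    #include <vector>
    #include <algorithm>
    #include <cmath>

    // Exact pairwise seam for a w x h grayscale overlap of images `a` and `b`
    // (row-major), restricted to seams monotonic in y. Returns one x per row.
    std::vector<int> pairwise_seam(const std::vector<float>& a,
                                   const std::vector<float>& b, int w, int h)
    {
        auto e = [&](int x, int y) {            // pixel-difference energy
            return std::fabs(a[y * w + x] - b[y * w + x]);
        };
        std::vector<float> cost(w * h);
        for (int x = 0; x < w; ++x) cost[x] = e(x, 0);
        for (int y = 1; y < h; ++y)             // forward DP pass
            for (int x = 0; x < w; ++x) {
                float best = cost[(y - 1) * w + x];
                if (x > 0)     best = std::min(best, cost[(y - 1) * w + x - 1]);
                if (x < w - 1) best = std::min(best, cost[(y - 1) * w + x + 1]);
                cost[y * w + x] = e(x, y) + best;
            }
        std::vector<int> seam(h);               // backtrack from cheapest end
        seam[h - 1] = (int)(std::min_element(cost.begin() + (h - 1) * w, cost.end())
                            - (cost.begin() + (h - 1) * w));
        for (int y = h - 2; y >= 0; --y) {
            int x = seam[y + 1], best = x;
            for (int dx = -1; dx <= 1; ++dx) {
                int nx = x + dx;
                if (nx >= 0 && nx < w && cost[y * w + nx] < cost[y * w + best])
                    best = nx;
            }
            seam[y] = best;
        }
        return seam;
    }

Each row relaxes only three predecessors, so the run time is O(wh) on a w x h overlap, which is what makes per-overlap seams cheap enough to recompute during interaction.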
This dissertation describes a new image boundary technique called Panorama Weaving. First, Panorama Weaving provides an automatic technique to create approximately optimal boundaries that is fast, has low memory requirements, and is easy to parallelize. Second, it provides the first interactive technique to enable the exploration of the seam solution space. This gives the end-user a powerful editing system for panorama seams. In particular, the contributions of this work on a technical level are:

• A novel technique to merge independently computed pairwise boundaries into a global, consistent seam network that does not cascade to a global calculation.

• A panorama seam creation technique based purely on pairwise boundary solutions. This technique is fast and highly parallel and shows significant speed-ups compared to previous work, even when run sequentially. More importantly, it achieves all of this even with full resolution imagery.

• Out-of-core and distributed seam creation algorithms which extend the creation technique to mosaics massive in size. These algorithms provide speed-ups compared to the state-of-the-art.

• The first system that allows interactive editing of seams in panoramas. This system guarantees minimal user input thanks to an efficient exploration of the solution space.

• An intuitive mesh specialization of a region adjacency graph that encodes seam and image relations. This adjacency mesh provides a way to guarantee the global consistency of the seam network during the interactions and also enables robust editing of the network's topology.

1.3 Color Correction

Creating a single seamless image from a mosaic has been the subject of a large body of work, for which gradient-domain (Poisson) techniques currently provide the best solution. Only one method exists to operate on the gradient domain of massive images: the streaming multigrid technique [89]. However, processing the three gigapixel image of Figure 1.2 using this technique still takes well over an hour, which does not support an interactive trial-and-error artistic process. An additional disadvantage of traditional out-of-core methods is their tendency to achieve a low memory footprint at the cost of significantly proliferating the disk storage requirements. For example, the multigrid method [89] requires auxiliary storage an order of magnitude greater than the input size, almost half of which is due to gradient computation. In contrast, our approach completely avoids such data proliferation, thereby allowing the processing of data which already pushes the boundaries of available storage. The multigrid method [89] is also limited by main memory usage, since it is proportional to the number of iterations of the solver. This can cause the method to not achieve acceptable results for images that may require a large number of iterations, as shown in Section 4. This work introduces a new method with memory usage independent of the number of iterations of the Poisson solver that, therefore, scales gracefully in these cases. One option to reduce run times is to design a similar scheme to run in a distributed environment. Consequently, there has been recent work to extend the multigrid solver [90] to a parallel implementation, reducing the time to compute a gigapixel solution to mere minutes. However, this approach is primarily a proof-of-concept since it does not supply the classic tests of scalability (weak or strong), nor is it tested significantly.
Like many out-of-core methods, the proliferation of disk storage requirements is a major drawback. For example, testing was only possible with a full 16-node cluster for some of the streaming multigrid test data due to excessive storage demands. Finally, the technique assumes that a small number of predetermined iterations is sufficient to achieve a solution, which may not always be the case. This implementation was also optimized for a single distributed system and, therefore, is unlikely to port well to other environments. History has shown that levels of abstraction that remove complexity from a code base can be instrumental in the advancement of technologies. Abstraction that allows simple and portable code accelerates innovation and reduces the time needed to develop new ideas. The cloud should be explored as such an abstraction, allowing a developer to ignore much of the more tedious and complex elements of implementing a distributed graphics algorithm. Admittedly, a general scheme cannot beat the performance of highly specialized and optimized code, and for organizations with ample resources there may be cases where speed and efficiency are more important than the cost to create and maintain a typical implementation. However, with the increased availability of cloud commodities, there is now the opportunity to offer more members of our community the ability to develop new algorithms for a distributed environment.

In particular, in the area of color correction, this dissertation work introduces a simple and light-weight framework that provides the user with the illusion of a full Poisson system solve at interactive frame rates for image editing. This framework also allows for the computation of a full solution on a single machine with a simple approach, rivaling the run time of the current best out-of-core technique [89], while producing equal or higher quality results on images that require a large number of iterations. The system is flexible enough to handle different hierarchical image formats, such as tiling for higher quality images or HZ-order for greater input/output (I/O) speed. In particular, by exploiting a new implicit kd-tree hierarchy for HZ-order, the framework needs only to access and solve the visible pixels. This allows an artist to interactively apply gradient-based techniques to images gigapixels in size. This new framework is straightforward and requires neither complicated spatial indexing nor advanced caching schemes. Additionally, this work introduces a framework for parallel gradient-domain processing inspired by the out-of-core technique, with a novel reformulation that provides an efficient parallel distributed algorithm. This new framework has a straightforward implementation and shows both strong and weak parallel scalability. When implemented in standard MPI (Message Passing Interface), the same code base ports well to multiple distributed systems. Furthermore, this distributed algorithm can be wrapped in a level of abstraction to be run on the cloud, allowing for a simple implementation, as well as allowing it to be distributed to the community at large. Specifically, the contributions of this work are:

• A coarse-to-fine progressive Poisson solver running at interactive frame rates, extended to a wide variety of gradient domain tasks, with the ability to scale to gigapixel images. This cascadic solver entirely avoids the coarsening stage of the V-cycle yet produces high quality results.
• A method to locally refine solutions, with time and space requirements that depend linearly on the screen resolution rather than on the resolution of the input image.

• A full out-of-core solver that maintains strict control over system resources, rivals the run-times of the best known method, and consistently achieves quality results where previous methods may not converge well in practice.

• A light-weight streaming framework that provides adaptive multiresolution access to out-of-core images in a cache coherent manner, without using intricate indexing data structures or precaching schemes.

• A distributed algorithm based on the out-of-core scheme, which has a straightforward implementation and shows both strong and weak parallel scalability.

• The first distributed Poisson solver for imaging implemented in the cloud.

CHAPTER 2

RELATED WORK

This chapter outlines the previous work for the two major portions of the composition step of the panorama creation pipeline: image boundaries, Section 2.1, and color correction, Section 2.2. In particular, these sections address the related work for the state-of-the-art in image boundaries (minimal image seams) and in color correction (gradient domain editing).

2.1 Image Boundaries: Seams

This section details the related previous work on computing image boundaries for an image mosaic. In particular, this section focuses on the current state-of-the-art, the computation of minimal mosaic seams.

2.1.1 Pairwise Boundaries

Some of the seminal works in digital panoramas assume that an image collection is acquired in a single sweep (either pan, tilt, or a combination of the two) of the scene. In such panoramas, only pairwise overlaps of images need be considered [119, 120, 157, 145, 46, 165]. The pairwise boundaries which have globally minimal energy can be computed quickly and exactly using a min-cut or min-path algorithm. There is an intuitive and proven duality between min-cut and single-source/single-destination min-path [72]. These pairwise techniques were thought not to be general enough to handle the many configurations possible in modern panoramas. Recent work [69] has dealt with the combination of these seams for more complex panoramas, although the seam combinations are still based on an image distance metric. Other recent work [53] combined these separate seams sequentially for the purposes of texture synthesis; for that application, this was sufficient to provide good results. The combination and intersection of seams in a digital panorama can be more complex and, therefore, a more expressive combination is necessary. In addition, interaction was not considered as a necessary functionality in these works. This dissertation presents a novel technique to combine these disjoint seams into a global panorama seam network and to allow for manual user interaction.

2.1.2 Graph Cuts

The Graph Cuts technique [26, 24] computes a k-labeling of a graph, typically an image, to minimize an energy function on the domain. An algorithm that guarantees finding the global minimum is considered to be NP-hard [26] and, therefore, Graph Cuts was designed to efficiently compute a good approximation. Graph Cuts has been shown to give good results for a variety of energy functions [92]. Thus, it is of no surprise given this versatility that it has been adapted to the image mosaic and panorama boundary problem [101, 4].
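For context, the k-labeling energy minimized by Graph Cuts is typically written as a sum of data and smoothness terms (the notation below is the standard form from the Graph Cuts literature, not a formula reproduced from this dissertation):

    \[
    E(L) = \sum_{p} D_p(L_p) + \sum_{(p,q) \in \mathcal{N}} V_{p,q}(L_p, L_q),
    \]

where L_p is the image label assigned to pixel p, D_p measures how well that label fits p, and the pairwise term V_{p,q} charges for label transitions across neighboring pixels; in panorama compositing, this is the cost of placing a seam between p and q.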
However, Graph Cuts is both a computationally expensive and memory intensive technique. Given these requirements, there has been work on accelerating the Graph Cuts process by, for instance, adapting the technique to run on the GPU [167], in parallel [106], or in parallel-distributed [48] environments. Building a hierarchy for the Graph Cuts computation [107, 5] has proven popular due to its reduction of memory and computation costs. For panoramas, this strategy has only been shown to provide good results for a hierarchy of two to three levels [5]. There has also been work on bringing Graph Cuts into an interactive setting [25, 142, 104, 61, 124], although these works have focused only on user-guided image segmentation. This dissertation provides the first technique to allow interactive editing of panorama boundaries.

2.1.3 Alternative Boundary Techniques

While Graph Cuts still maintains its popularity as a solution to the minimal boundary problem, there has been other ongoing work on alternative techniques. For example, there has been work on techniques based on luminance voting [82]. There has also been recent work using geodesics to interactively compute pairwise minimal boundaries [45].

2.1.4 Out-of-Core and Distributed Computation

While there has been previous work to bring the Graph Cuts technique to massive grids, it has only dealt with extending the algorithm to a combined distributed and out-of-core environment [48]; no current technique decouples the two. The work of this dissertation has the flexibility to operate in-core, out-of-core, or distributed depending on the application or available resources. Moreover, the inherent parallelism of the new technique is likely to outperform the previous work. Finally, there has been no work to allow interaction with these seams at massive scales. Hierarchical Graph Cuts has been used on large images [2], although given the documented limitation on the viable number of levels in the hierarchy [5], this will not scale massively. Applying standard Graph Cuts as a sweeping window over an image neighborhood [94] has been used to produce boundaries for gigapixel imagery. Such a process has yet to be formulated in parallel and is potentially very computationally expensive.

2.2 Color Correction: Gradient Domain Editing

This section details the related previous work on the most popular and sophisticated color correction procedure for image mosaics, called gradient domain image editing or, by its alternative name, Poisson image editing.

2.2.1 Poisson Image Processing

A variety of gradient-based methods provide a popular, but computationally expensive, set of techniques for advanced image manipulation. Given a guiding gradient field constructed from one or multiple source images, these techniques attempt to find a closest-fit image using some predetermined distance metric. This basic concept has been adapted for standard image editing [133], as well as more advanced matting operations [152] and high level drag-and-drop functionality [79]. Furthermore, gradient-based techniques can tone map high dynamic range images to display favorably on standard monitors [55] or hide the seams in panoramas [133, 103, 4, 89]. Other applications include detecting lighting [76] or shapes [173] from images, removing shadows [57] or reflections [6], and gradient domain painting [114].
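The common mathematical core of these methods can be stated compactly (this is the standard formulation from the Poisson image editing literature, summarized here for reference): given a guiding gradient field g, find the image u minimizing

    \[
    \min_u \int_\Omega \lVert \nabla u - \mathbf{g} \rVert^2 \, dA,
    \]

whose minimizer satisfies the Poisson equation

    \[
    \Delta u = \nabla \cdot \mathbf{g}
    \]

subject to the chosen (Dirichlet or Neumann) boundary conditions.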
Recently, an alternative approach using mean value coordinates has smoothly interpolated the boundary offset between source images, thereby mimicking Dirichlet boundary conditions [54]. This promising new line of research has yet to show support for Poisson techniques such as tone mapping, the ability to work well out-of-core, or consistently acceptable results for methods that typically require Neumann boundary conditions.

2.2.2 Poisson Solvers

The solution to a two-dimensional (2D) Poisson problem lies at the core of gradient-based image processing. Poisson equations have wide utility in many engineering and science applications. Computing their solution efficiently has been the focus of a large body of work, and even a cursory review is beyond the scope of this dissertation. For small images, methods exist to find the direct analytical solution using Fast Fourier transforms [75, 7, 8, 113]. Simchony [146] provides a survey of these methods for computer vision applications. Often the problem is simplified by discretization into a large linear system whose dimension is typically the number of pixels in an image. If this system is small enough to fit into memory, methods exist to find the direct solution, and we refer the reader to Dorr [51], who provides an extensive review of direct methods. Typically, iterative Krylov subspace methods, such as conjugate gradient, are used due to their fast convergence. For much larger systems, memory consumption is the limiting factor, and iterative solvers such as Successive Over-Relaxation (SOR) [13] become more attractive. Depending on the application, different levels of accuracy may be required. Sometimes, a coarse approximation is sufficient to achieve the desired result. Bilateral upsampling methods [93] operating on a coarse solution have produced good results for applications such as tone mapping. Such methods have not yet been shown to handle applications such as image stitching, where the interpolated values are typically not smooth at the seams between images. When pure upsampling is insufficient, the system must be solved fully. Multigrid methods are often employed to aid the convergence of an iterative solver. Such methods have proven particularly effective by dealing with the large scale trends at coarse resolutions. These techniques include preconditioners [66, 155] and multigrid solvers [27, 28]. There exist different variants of multigrid algorithms using either adaptive [20, 88, 21, 2, 139] or nonadaptive meshes [87, 89]. As a first step in a complete multigrid system, the mesh is coarsened. The Poisson equation can then be solved in a coarse-to-fine manner. One full iteration, from fine to coarse and back, is typically called a V-cycle. Most recently, a V-cycle was implemented in a streaming fashion for large panoramas [89]. However, other systems implement only parts of the V-cycle. Kopf et al. [94] implement only the second half in a pure upsampling procedure, while Bolitho et al. [21] implement a purely coarse-to-fine solver, also called cascadic [22]. Lischinski et al. [105] applied this pure coarse-to-fine approach for interactive tonal adjustment. The technique outlined in this dissertation shows, for the first time, that a cascadic approach has applications well beyond the adjustment of tonal values and can be used for a wide variety of gradient-based image processing techniques. This work also extends such techniques to allow the interactive editing and processing of gigapixel images. The solver propagates sufficient information from coarse to fine, allowing us to achieve local solutions at interactive rates that are virtually indistinguishable from the full-resolution solution.
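As a concrete instance of the iterative relaxations discussed above, the sketch below performs one Gauss-Seidel sweep of the discrete Poisson equation on a regular grid; it illustrates the textbook method, not this dissertation's solver.

    #include <vector>

    // One Gauss-Seidel sweep for  laplacian(u) = div g  on a w x h grid with
    // fixed (Dirichlet) boundary samples. `u` holds the current estimate and
    // `divg` the divergence of the guiding gradient field, both row-major.
    void gauss_seidel_sweep(std::vector<double>& u,
                            const std::vector<double>& divg, int w, int h)
    {
        for (int y = 1; y < h - 1; ++y)
            for (int x = 1; x < w - 1; ++x) {
                int i = y * w + x;
                // 5-point Laplacian stencil solved for the center sample;
                // updated neighbors are reused immediately (Gauss-Seidel).
                u[i] = 0.25 * (u[i - 1] + u[i + 1] + u[i - w] + u[i + w] - divg[i]);
            }
    }

Repeating such sweeps converges to the Poisson solution, but the iteration count grows quickly with image size, which is precisely why the multigrid and cascadic strategies above matter at scale.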
2.2.3 Out-of-Core Computation

Toledo [160] presents a survey of general out-of-core algorithms for linear systems. The majority of the algorithms surveyed assume that at least the solution vectors can be kept in main memory, which is not the case for large images. For out-of-core processing of large images, the streaming multigrid method of Kazhdan and Hoppe [89] has so far provided the only solution. However, processing a three gigapixel image using this technique still takes well over an hour, which does not support an interactive trial-and-error artistic process. Many algorithms, such as tone mapping, require careful parameter tuning to achieve good results. Thus, waiting several hours to examine the effects of a single parameter change is not feasible in this context. An additional disadvantage of traditional out-of-core methods is their tendency to achieve a low memory footprint at the cost of significantly proliferating the disk storage requirements. For example, the multigrid method [89] requires auxiliary storage an order of magnitude greater than the input size, almost half of which is due to gradient computation. In contrast, in this dissertation, I, with my collaborators, introduce an approach that completely avoids such data proliferation, thereby allowing the processing of data which already pushes the boundaries of available storage. The multigrid method [89] is also limited by main memory usage, since it is proportional to the number of iterations of the solver. This can cause the method to not achieve acceptable results for images that may require a large number of iterations, as shown in Section 4. This work provides a new method with memory usage independent of the number of iterations that, therefore, scales gracefully in these cases.

2.2.4 Distributed Computation

Recently, the streaming multigrid method has been extended to a distributed environment [90], reducing the time to process gigapixel images from hours to minutes. The distributed multigrid requires 16 bytes/pixel of disk space in temporary storage for the solver, as well as 24 bytes/pixel to store the solution and gradient constraints. For the terapixel example of Kazhdan et al. [90], the method had a minimum requirement of 16 nodes in order to accommodate the needed disk space for fast local caching. In contrast, the approach outlined in this work needs no temporary storage and is implemented in standard MPI. Streaming multigrid also assumes a precomputed image gradient, which would add substantial overhead during initialization to transfer the color float or double data. Our new approach is initialized using the original image data plus an extra byte for image boundary information, which equates to 1/3 less data transfer in initialization than the previous method. Data transfers between this solver's phases, while floating point, deal only with the boundaries between compute nodes, which are substantially smaller than the full image and, therefore, are rarely required to be cached to disk.
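To put the storage figures above in perspective, a back-of-the-envelope calculation for the terapixel case (my arithmetic, not a figure from [90]):

    \[
    10^{12} \ \text{pixels} \times (16 + 24) \ \text{bytes/pixel} = 40 \ \text{TB},
    \]

or 2.5 TB of disk per node on a 16-node cluster, before the input image itself is counted.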
The multigrid method [89, 90] may also be limited by main memory, since the number of iterations of the solver is directly proportional to the memory footprint. For large images, this limits the solver to only a few Gauss-Seidel iterations, and it therefore may not necessarily converge for challenging cases. Our method's memory usage is independent of the number of iterations and can therefore solve images that have slow convergence. Often large images are stored as tiles at the highest resolution; therefore, methods that exploit this structure would be advantageous. Stookey et al. [148] use a tile-based approach to compute an over-determined Laplacian PDE (partial differential equation). By using tiles that overlap in all dimensions, the method solves the PDE on each tile separately and then blends the solution via a weighted average. Unfortunately, this method cannot account for large scale trends beyond a single overlap and, therefore, can only be used on problems which have no large (global) trends. Figure 2.1 illustrates why this would be a problem for panorama processing. The coarse upsampling of our approach fixes this issue.

Figure 2.1: Although the result is a seamless, smooth image, without coarse upsampling the final image will fail to account for large trends that span beyond a single overlap, which can lead to unwanted, unappealing shifts in color.

2.2.5 Cloud Computing - MapReduce and Hadoop

MapReduce [47] was developed by Google as a simple framework to process massive data on large distributed systems. It is an abstraction that owes its inspiration to functional programming languages such as Lisp. At its core, the framework relies on two simple operations:

• Map: Given input, create a key/value pair.

• Reduce: Process all values of a given key.

All the complexity of a typical distributed implementation due to data distribution, load balancing, fault-recovery, and communication lies under this abstraction layer and, therefore, can be ignored by a developer. This framework, when combined with a distributed file system, can be a simple yet powerful tool for data processing. Hadoop is an open source implementation of MapReduce maintained by the Apache Software Foundation and can be optionally coupled with its own distributed file system (HDFS). Pavlo et al. [131] found that Hadoop was easy to deploy and use, offered adequate scalability, had very effective fault-tolerance, and, most importantly, was easy to adapt for complex analytic tasks. Hadoop is also widely available as a commodity resource. For example, Amazon Web Services, a service suite that has become nearly synonymous with cloud computing in the media, provides Hadoop as a native capability [10]. Companies have begun to use Hadoop as a simple alternative for data processing on large clusters [71]. For instance, The New York Times has used Hadoop for large scale image-to-PDF conversion [68]. Google, IBM, and NSF have also partnered to provide a Hadoop cluster for research [41].

2.2.6 Out-of-Core Data Access

Given an image, it is well known that the standard row-major order exhibits good locality in only one dimension and is therefore ill-suited for an unconstrained out-of-core storage scheme [168]. Previous out-of-core Poisson methods [89] have been noted to be severely limited by this constraint.
Instead, indexing based on various space-filling curves [143] has been proposed in different applications [126, 70, 17, 102] to exploit their inherent geometric locality. Of particular interest is the Z-order (also called Lebesgue-order) [17, 128], since it allows an especially simple conversion to and from standard row-major indices. While Z-order exhibits good locality in all dimensions, it does so only at a fixed resolution and does not support hierarchical access. Instead, this work will utilize the hierarchical variant, called HZ-order, proposed by Pascucci and Frank [128].

CHAPTER 3

SCALABLE AND EFFICIENT DATA ACCESS

At the core of any large data processing system is an efficient scheme for data access. In this chapter, I will detail the technology used in the systems outlined in this work. In Section 3.1, I will review the fundamentals of the hierarchical Z-order (HZ-order) for two-dimensional arrays, our chosen format for large image processing. I will also provide a new, simple algorithm in Section 3.2 for accessing data organized in HZ-order, while avoiding the repeated index conversions used in [128]. Section 3.3 will provide a new parallel write scheme for HZ-order data. Finally, in Section 3.4, I will give an outline of the ViSUS software infrastructure, the core system behind much of the massive image processing outlined in this dissertation. Conversion of large images into the ViSUS format requires no additional storage, compared to the typical 1/3 data increase common for tiled image hierarchies. From our test data, we have found that there is only a 27% overhead due to the conversion compared to just copying the raw data, which makes this conversion very light. The conversion requires no operations on the pixel data and will outperform even the simplest tiled hierarchies, which require some manipulation of the pixel data. Section 5.2.2 will show that this new I/O infrastructure reduces the overhead by 28%-40% when compared to a standard tiled image hierarchy. These numbers reflect the theoretical bound of 1/3 overhead, made worse by the inability to constrain real queries to perfect alignment with the boundaries of a quadtree.

3.1 Z- and HZ-Order Background

The data access routine of our system achieves high performance on our image data by utilizing a hierarchical variant of a standard Z-order (Lebesgue) space filling curve to lay out our two-dimensional data in one-dimensional memory. In the two-dimensional case, the Z-order curve can be defined recursively by a Z shape whose vertices are replaced by Z shapes half its size (see Figure 3.1 (a)). Given the binary row-major index of a pixel (i_n ... i_1 i_0, j_n ... j_1 j_0), the corresponding Z-order index I is computed by interleaving the indices, I = j_n i_n ... j_1 i_1 j_0 i_0 (see Figure 3.2 (a), Step 1). While Z-order exhibits good locality in all dimensions, it does so only at full resolution and does not support hierarchical access. Instead, our system uses the hierarchical variant, called HZ-order, proposed by Pascucci and Frank [128].

Figure 3.1: (a) The first four levels of the Z-order space filling curve; (b) a 4x4 array indexed using standard Z-order.
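A minimal sketch of Step 1, the bit interleaving, assuming 32-bit row-major coordinates (a naive loop for clarity; production implementations typically use lookup tables or bit tricks):

    #include <cstdint>

    // Interleave the bits of (i, j) into a single Z-order index:
    // i-bits occupy the even positions, j-bits the odd positions,
    // matching I = ... j1 i1 j0 i0 above.
    inline uint64_t z_order_index(uint32_t i, uint32_t j)
    {
        uint64_t I = 0;
        for (int b = 0; b < 32; ++b) {
            I |= (uint64_t)((i >> b) & 1u) << (2 * b);
            I |= (uint64_t)((j >> b) & 1u) << (2 * b + 1);
        }
        return I;
    }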
This new index changes the standard Z-order of Figure 3.1 (b) to be organized by levels corresponding to a subsampling binary tree, in which each level doubles the number of points in one dimension (see Figure 3.2 (b)). This pixel order is computed by adding a second step to the index conversion. To compute an HZ-order index, the binary representation of a given Z-order index I is shifted to the right until the first 1-bit exits. During the first shift, a 1-bit is added on the left, and 0-bits are added in all following shifts (see Figure 3.2 (a)). This conversion could have a potentially very simple and efficient hardware implementation. The software C++ version can be implemented as follows:

    // adhocindex: an unsigned integer type wide enough for the index.
    // last_bit_mask: a constant with only the most significant bit set.
    inline adhocindex remap(adhocindex i)
    {
        i |= last_bit_mask; // set leftmost one
        i /= i & -i;        // remove trailing zeros
        return i >> 1;      // remove rightmost one
    }

Figure 3.2: (a) Address transformation from row-major index (i, j) to Z-order index I (Step 1) and then to hierarchical Z-order index (Step 2); (b) levels of the hierarchical Z-order for a 4x4 array. The samples on each level remain ordered by the standard Z-order.

We store the data in a way guaranteeing efficient access to any subregion without internal caching and without opening a data block more than once. Furthermore, we allow for the storage of incomplete arrays. In our storage format, we first sort the data in HZ-order and group consecutive samples in blocks of constant size. A sequence of consecutive blocks is grouped into a record, and records are clustered in groups, which are organized hierarchically. Each record has a header specifying which of its blocks are actually present and whether the data are stored raw or compressed. Groups can miss entire records or subgroups, implying that all their respective blocks and records are missing. The file format is implemented via a header file describing the various parameters (dimension, block size, record size, etc.) and one file per record. The hierarchy of groups is implemented as a hierarchy of directories, each containing a predetermined maximum number of subdirectories. The leaves of each directory contain only records. To open a file, one needs only to reconstruct the path of a record and defer its search to the file system. In particular, the path of a record is constructed as follows: we take the HZ-address of the first sample in the record, represent it as a string, and partition it into chunks of characters naming directories, subdirectories, and the record file. Note that, since blocks, records, and groups can be missing, one is not restricted to arrays of data that cover the entire index space. In fact, we can easily store even images with different regions sampled at different resolutions.

3.2 Efficient Multiresolution Range Queries

One of the key components of our framework is the ability to quickly extract rectangular subsets of the input image in a progressive manner. Computing the row-major indices of all samples residing within a given query box is straightforward. However, efficiently calculating their corresponding HZ-indices is not. Transforming each address individually results in a large number of redundant computations by repeatedly converting similar indices. To avoid this overhead, we introduce a recursive access scheme that traverses an image in HZ-order, while concurrently computing the corresponding row-major indices.
This traversal implicitly follows a kd-tree style subdivision, allowing us to quickly skip large portions of the image. To better illustrate the algorithm, I will first describe how to recursively traverse an array in plain Z-order using the 4x4 array of Figure 3.1 (b) as an example. Subsequently, I will discuss how to restrict the traversal to a given query rectangle and, finally, how the scheme is adapted to HZ-order. We use a stack containing tuples of type (split_dimension, I_start, min_i, max_i, min_j, max_j, num_elements). To start the process, we push the tuple t0 = (1, 0, 0, 3, 0, 3, 16) onto the stack. At each iteration, we pop the top-most element t from the stack. If t contains only a single element, we output the current I_start as HZ-index and fetch the corresponding sample. Otherwise, we split the region represented by t into two pieces along the axis given by split_dimension and create the corresponding tuples t1 = (0, 0, 0, 3, 0, 1, 8) and t2 = (0, 8, 0, 3, 2, 3, 8). Note that all elements of t1 and t2 can be computed from t by simple bit manipulation. In the case of a square array, we simply flip the split dimension each time a tuple is split. However, one can also store a specific split order to accommodate rectangular arrays. Figure 3.3 shows the first eight iterations of the algorithm, outputting the first four elements of the array in Figure 3.1 (b). To use this algorithm for fast range queries, each tuple is tested against the query box as it comes off the stack and discarded if no overlap exists. Since the row-major indices describing the bounding box of each tuple are computed concurrently, the intersection test is straightforward. Furthermore, the scheme applies, virtually unchanged, to traverse samples in Z-order that subsample an array uniformly along each axis, where the subsampling rate along each axis can be different. Finally, to adapt the algorithm to HZ-order (see Figure 3.2 (b)), one exploits the following two important facts:

• One can directly compute the starting HZ-index of each level. For example, in a square array, level 0 contains one sample and every other level h contains 2^(h-1) samples. Therefore, the starting HZ-index of level h is also 2^(h-1), where the levels range from 0 to m and m is the number of bits of the largest HZ-index.

• Within each level, samples are ordered according to plain Z-order and can be traversed with the stack algorithm described above, using the appropriate subsampling rate.

Using these two facts, one can iterate through an array in HZ-order by processing one level at a time, adding the level's starting HZ-index to the I_start index of each tuple. In practice, we avoid subdividing the stack tuples to the level of a single sample. Instead, depending on the platform, we choose a parameter n and build a table with the sequence of Z-order indices for an array with 2^n elements. When running the stack algorithm, each time a tuple t with 2^n elements appears, we loop through the table instead of splitting t. By accessing only the necessary samples in strict HZ-order, the stack-based algorithm guarantees that only the minimal number of disk blocks are touched and each block is loaded exactly once. For progressively refined zooms in a given area, we can apply this algorithm with a minor variation. In particular, one needs only to reduce the size of the bounding box represented in a tuple each time it is pushed back onto the stack. In this way, even for a progressively refined zoom, one accesses only the needed data blocks, each being accessed only once. (A sketch of the basic traversal is given below.)

Figure 3.3: Our fast-stack Z-order traversal of a 4x4 array with concurrent index computation.
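The sketch below is a minimal, illustrative implementation of this fast-stack traversal for a square 2^n x 2^n array; the query-box test, the per-level HZ offsets, and the table optimization described above are omitted, and all names are mine.

    #include <cstdint>
    #include <stack>

    struct Tuple {
        int      split_dim;                 // 0: split along i, 1: split along j
        uint64_t I_start;                   // Z-order index of the first sample
        uint32_t min_i, max_i, min_j, max_j;
        uint64_t num_elements;
    };

    template <typename Visit>               // Visit(I, i, j) receives each sample
    void z_traverse(uint32_t n, Visit visit)
    {
        uint32_t side = 1u << n;
        std::stack<Tuple> s;
        s.push({1, 0, 0, side - 1, 0, side - 1, (uint64_t)side * side});
        while (!s.empty()) {
            Tuple t = s.top(); s.pop();
            if (t.num_elements == 1) { visit(t.I_start, t.min_i, t.min_j); continue; }
            Tuple a = t, b = t;              // split the region into two halves
            a.num_elements = b.num_elements = t.num_elements / 2;
            b.I_start = t.I_start + a.num_elements;
            if (t.split_dim == 1) {          // split along j
                a.max_j = (t.min_j + t.max_j) / 2;
                b.min_j = a.max_j + 1;
            } else {                         // split along i
                a.max_i = (t.min_i + t.max_i) / 2;
                b.min_i = a.max_i + 1;
            }
            a.split_dim = b.split_dim = 1 - t.split_dim;
            s.push(b); s.push(a);            // visit the low half first
        }
    }

Running z_traverse(2, ...) on the 4x4 example reproduces the tuples t1 and t2 above and visits the samples in the order of Figure 3.1 (b).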
i"T 0 t 2 FT t7 t8 t2 t2 • 1-3 (i,j)=(l,l) t4 =(1, 4, 2, 3,0, 1,4) t2 =(0, 8, 0, 3, 2, 3, 8) Figure 3.3: Our fast-stack Z-order traversal of a 4x4 array with concurrent index computation 28 in a tuple each time it is pushed back into the stack. In this way, even for a progressively refined zoom, one would access only the needed data blocks, each being accessed only once. 3.3 Parallel Write The multiresolution data layout outlined above is a progressive, linear format and therefore has a write routine that is inherently serial. When processing a large image on a distributed system, or even on a single multicore system, it would be ideal for each node, or process, to be able to write out its piece of the data directly in this layout. Therefore a parallel write strategy must be employed. Figure 3.4 illustrates different possible parallel strategies. As shown in Figure 3.4 (a), each process can naively write its own data directly to the underlying binary file. This is inefficient due to the large number of small file accesses. As data gets large, it becomes disadvantageous to store the entire dataset as a single, large file and typically the entire dataset is partitioned into a series of smaller more manageable files. This disjointness can be used by a parallel write routine. As each simulation process produces simulation data, it can store its piece of the overall dataset locally and pass the data on to an aggregator process. These aggregator processes can be used to gather the individual pieces and composite the entire dataset. In Figure 3.4 (b), each process transmits each contiguous data segment to an intermediate aggregator. Once the aggregator’s buffer is complete, the data are written to disk using a single large I/O operation. Figure 3.4 (c), illustrates a strategy where several noncontiguous memory accesses from each process are bundled into into a single message. This approach reduces the number of small network messages needed to transfer data to aggregators. This last strategy has been shown to exhibit good throughput performance and weak scaling for S3D combustion simulation applications when compared to standard Fortran I/O benchmark [98, 99]. 3.4 ViSUS Software Framework The ViSUS (Visual Streams for Ultimate Scalability) software framework6 has been designed as an environment to allow the interactive exploration of massive datasets on a variety of hardware, possibly over platforms distributed geographically. The system and I/O infrastructure is designed to handle n-dimensional datasets but is typically used on two-dimensional and three-dimensional image data. This two-dimensional portion of this 6h t t p : / / v i s u s . c o or h t t p : / / v i s u s . u s http://visus.co http://visus.us 29 | Rank One || Rank Two 11Rank Three| | Rank One || Rank Two 11Rank Three| \ \ \ i \ i j / / / / / \w'\«> i"/ - • Aggregator Process ^ Rank Z e r o f 1 f f s / • ^ Rank Zero f Aggregator Process (a) ^ IDX Binary File (b) ^ IDX Binary File (c) Forming indexed datatype MPI file writes ♦ ------------- MPI puts O MPI indexed datatype • Data chunks F igure 3.4: (a) Naive parallel strategy where each process writes its piece of the overall dataset into the underlying flle, (b) each process transmits each contiguous data segment to an intermediate aggregator. Once the aggregator’s buffer is complete, the data are written to disk, (c) several noncontiguous memory accesses are bundled into a single message to decrease communication overhead. 
3.4 ViSUS Software Framework

The ViSUS (Visual Streams for Ultimate Scalability) software framework6 has been designed as an environment to allow the interactive exploration of massive datasets on a variety of hardware, possibly over platforms distributed geographically. The system and I/O infrastructure is designed to handle n-dimensional datasets but is typically used on two-dimensional and three-dimensional image data. The two-dimensional portion of this system is the core application on which the massive applications outlined in this dissertation are built.

6 http://visus.co or http://visus.us

The ViSUS software framework was designed with the primary philosophy that the visualization and/or processing of massive data need not be tied to specialized hardware or infrastructure. In other words, a visualization environment for large data can be designed to be lightweight, highly scalable, and able to run on a variety of platforms or hardware. Moreover, if designed generally, such an infrastructure can have a wide variety of applications, all from the same code base. Figure 3.5 details example applications and the major components of the ViSUS infrastructure. The components can be grouped into three major categories. First, a lightweight and fast out-of-core data management framework using multiresolution space filling curves, which I have outlined in Sections 3.1, 3.2, and 3.3. This allows the organization of information in an order that exploits the cache hierarchies of any modern data storage architecture. Second, ViSUS contains a dataflow framework that allows data to be processed during movement. Processing massive datasets in their entirety would be a long and expensive operation which hinders interactive exploration. By designing new algorithms to fit within this framework, data can be processed as they move. The Progressive Poisson technique outlined in Section 5.2 is one such new algorithm. Third, ViSUS provides a portable visualization layer that was designed to scale from mobile devices to powerwall displays with the same code base. Figure 3.5 provides a diagram of the ViSUS software architecture. In this section, I will detail three of ViSUS's major components and how they couple with the efficient data access detailed in the previous sections to achieve a fast, scalable, and highly portable data processing and visualization environment. Finally, I will illustrate an important additional use of this infrastructure, the real-time monitoring of scientific simulations.

Figure 3.5: The ViSUS software framework. Arrows denote external and internal dependences of the main software components (including image conversion via FreeImage, compression via zlib, networking via curl, threading via pthreads or the Windows native API, OpenGL rendering with the glew extensions, and the Juce GUI library). Additionally, this figure illustrates the relationship with several example applications that have been successfully developed with this framework.

3.4.1 LightStream Dataflow and Scene Graph

Even simple manipulations can be overly expensive when applied to each variable in a large scale dataset. Instead, it would be ideal to process the data based on need by pushing data through a processing pipeline as the user interacts with different portions of the data. The ViSUS multiresolution data layout enables efficient access to different regions of the data at varying resolutions. Therefore, different compute modules can be implemented using progressive algorithms to operate on these data streams. Operations such as binning, clustering, or rescaling are trivial to implement on this hierarchy given some known statistics on the data, such as the function value range. These operators can be applied to the data stream as-is, while the data are moving to the user, progressively refining the operation as more data arrive. More complex operations can also be reformulated to work well using the hierarchy.
For instance, using the layout for image data produces a hierarchy which is identical to a subsampled image pyramid on the data. Moreover, as data are requested progressively, the transfer will traverse this pyramid in a coarse-to-fine manner. Techniques such as gradient-domain image editing can be reformulated to use this progressive stream and produce visually acceptable solutions, as will be detailed in Section 5.2. These adaptive, progressive solutions allow the user to explore a full resolution solution as if it were fully available, without the expensive, full computation. ViSUS LightStream facilitates this stream processing model by providing definable modules within a dataflow framework with a well understood API. Figure 3.6 gives an example of a dataflow for the analysis and visualization of a scientific simulation. This particular example is the dataflow for a Uintah combustion simulation used by the C-SAFE Center for the Simulation of Accidental Fires and Explosions at the University of Utah. Each LightStream module provides streaming capability through input and output data ports that can be used in a variety of data transfer/sharing modes. In this way, groups of modules can be chained to provide complex processing operations as the data are transferred from the initial source to the final data analysis and visualization stages. This data flow is typically driven by user demands and interactions. A variety of "standard" modules, such as data differencing (for change detection), content based image clustering (for feature detection), or volume rendering with multiple, science-centric transfer functions, are part of the base system. These can be used by new developers as templates for their own progressive streaming data processing modules.

Figure 3.6: The LightStream dataflow used for analysis and visualization of a three-dimensional combustion simulation (Uintah code). (a) Several dataflow modules chained together to provide a light and flexible stream processing capability. (b) One visualization that is the result of this dataflow.

ViSUS also provides a scene graph hierarchy for both organizing objects in a particular environment and sharing and inheriting parameters. Each component in a model is represented by a node in this scene graph and inherits the transformations and environment parameters of its parents. Three-dimensional volume or two-dimensional slice extractors are children of a data set node. As an example of inheritance, a scene graph parameter for a transfer function can be applied to the scene graph node of a data set; if an extractor on this data set does not provide its own transfer function, it will inherit this one.

3.4.2 Portable Visualization Layer - ViSUS AppKit

The visualization component of ViSUS was built with the philosophy that a single code base can be designed to run on a variety of platforms and hardware, ranging from mobile devices to powerwall displays. To enable this portability, the basic draw routines were designed to be OpenGL ES compatible. This is a limited subset of OpenGL used primarily for mobile devices. More advanced draw routines can be enabled if a system's hardware can support them. In this way, the data visualization can scale in quality depending on the available hardware. Beyond the display of the data, the underlying graphical user interface (GUI) library can hinder portability to multiple devices.
3.4.2 Portable Visualization Layer - ViSUS AppKit

The visualization component of ViSUS was built with the philosophy that a single code base can be designed to run on a variety of platforms and hardware ranging from mobile devices to powerwall displays. To enable this portability, the basic draw routines were designed to be OpenGL ES compatible. This is a limited subset of OpenGL used primarily for mobile devices. More advanced draw routines can be enabled if a system's hardware can support it. In this way, the data visualization can scale in quality depending on the available hardware. Beyond the display of the data, the underlying graphical user interface (GUI) library can hinder portability to multiple devices. At this time ViSUS has made use of the Juce library (http://www.rawmaterialsoftware.com), which is lightweight and supports mobile platforms such as iOS and Android in addition to major operating systems. ViSUS provides a demo viewer that contains standard visualizations such as slicing, volume rendering, and isosurfacing. Similarly to the example LightStream modules, these routines can be expanded through a well-defined application programming interface (API). Additionally, the base system can display two-dimensional and three-dimensional time-varying data. As mentioned above, each of these visualizations can operate on the end result of a LightStream dataflow. The system considers a two-dimensional dataset as a special case of a slice renderer and therefore the same code base is used for two-dimensional and three-dimensional datasets. Combining all of the above design decisions allows the same code base to be used on multiple platforms seamlessly for data of arbitrary dimensions. Figure 3.7 shows the same application and visualization running on an iPhone 3G mobile device and a powerwall display.

Figure 3.7: The same application and visualization of a Mars panorama running on an iPhone 3G mobile device (a) and a powerwall display (b). Data courtesy of NASA.

3.4.3 Web-Server and Plug-In

ViSUS has been extended to support a client-server model in addition to the traditional viewer. The ViSUS server can be used as a standalone application or a web server plugin module. The ViSUS server uses HTTP (a stateless protocol) in order to support many clients. A traditional client/server infrastructure, where the client establishes and maintains a stable connection to the server, can only handle a limited number of clients robustly. Using HTTP, the ViSUS server can scale to thousands of connections. The ViSUS client keeps a number (normally 48) of connections alive in a pool using the "keep-alive" option of HTTP. The use of lossy or lossless compression is configurable by the user. For example, ViSUS supports JPEG and EXR for lossy compression of byte and floating point data, respectively. The ViSUS server is an open client/server architecture; therefore, it is possible to port the plugin to any web server which supports a C++ module (e.g., Apache, IIS). The ViSUS client can be enabled to cache data to local memory or to disk. In this way, a client can minimize transfer time by referencing data already sent, as well as having the ability to work offline if the server becomes unreachable. The ViSUS portable visualization framework (AppKit) also has the ability to be compiled as a Google Chrome, Microsoft Internet Explorer, or Mozilla Firefox web browser plugin. This allows a ViSUS framework based viewer to be easily integrated into web visualization portals.
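A minimal sketch of such a client is shown below, assuming a hypothetical tile endpoint and query parameters (the real ViSUS URL scheme is not shown here). It keeps a pool of keep-alive connections and a local cache so previously received data is never re-transferred:

```python
import requests

SERVER = "http://example.com/visus"  # hypothetical endpoint

session = requests.Session()  # reuses TCP connections via HTTP keep-alive
session.mount("http://", requests.adapters.HTTPAdapter(pool_connections=1,
                                                       pool_maxsize=48))
cache = {}  # client-side cache; also enables offline work on cached regions

def fetch(dataset, level, box):
    """Request one multiresolution block, caching the reply locally."""
    key = (dataset, level, box)
    if key not in cache:
        reply = session.get(f"{SERVER}/{dataset}",
                            params={"level": level, "box": box}, timeout=10)
        reply.raise_for_status()
        cache[key] = reply.content
    return cache[key]
```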
3.4.4 Additional Application: Real-Time Monitoring

In addition to the ones provided in this dissertation, the ViSUS framework has an additional ideal application in the real-time monitoring of large scientific simulations. Ideally, for these simulations a user-scientist would like to view a simulation as it is computed, in order to steer or correct the simulation as unforeseen events arise. Simulation data are often very large. For instance, a single field of a time-step from the S3D combustion simulation in Figure 3.8 (a) is approximately 128 gigabytes in size. In the time needed to transfer this single time-step, the user-scientist would have lost any chance for significant steering/correction of an ongoing simulation, or at least the chance to avoid wasting further resources by terminating the simulation early. By using the parallel ViSUS data format in simulation checkpointing [98, 99], we can link this data directly with an Apache server using a ViSUS plug-in running on a node of the cluster system. By doing this, user-scientists can visualize simulation data as checkpoints are reached. ViSUS can handle missing or partial data; therefore, the data can be visualized even as it is being written to disk by the system. ViSUS's support for a wide variety of clients (a stand-alone application, a web-browser plug-in, or an iOS application for the iPad or iPhone) allows the application scientist to monitor a simulation as it is produced, on practically any system that is available, without any need to transfer the data off the computing cluster. As mentioned above, Figure 3.8 (a) is an S3D large-scale combustion simulation visualized remotely from a high performance computing platform (data courtesy of Jackie Chen at Sandia National Laboratories, Combustion Research Facility, http://ascr.sandia.gov/people/Chen.htm).

Figure 3.8: Remote visualization and monitoring of simulations. (a) An S3D combustion simulation visualized from a desktop in the Scientific Computing and Imaging (SCI) Institute (Salt Lake City, Utah) during its execution on the HOPPER 2 high performance computing platform at Lawrence Berkeley National Laboratory (Berkeley, California). (b) Two ViSUS demonstrations of LLNL simulation codes (Miranda and Raptor) visualized in real-time while executed on the BlueGene/L prototype installed at the IBM booth of the Supercomputing exhibit.

This work is the natural evolution of the ViSUS approach of targeting practical applications for out-of-core data analysis and visualization. This approach has been used for direct streaming and real-time remote monitoring of early large scale simulations such as those executed on the IBM BG/L supercomputers at Lawrence Livermore National Laboratory (LLNL) [130], shown in Figure 3.8 (b). This work continues its evolution towards the deployment of high performance tools for in situ and postprocessing data management and analysis for the software and hardware resources of the future, including the exascale DOE platforms of the next decade (Center for Exascale Simulation of Combustion in Turbulence (ExaCT), http://science.energy.gov/ascr/research/scidac/co-design/).

CHAPTER 4

INTERACTIVE SEAM EDITING AT SCALE

This chapter outlines the Panorama Weaving technique which brings the boundary computation phase of the panorama creation pipeline into an interactive environment. Section 4.1 gives the relevant background and formulation for the computation of optimal image boundaries. Section 4.2 discusses how to achieve interaction with pairwise boundaries. Section 4.3 introduces the adjacency mesh data structure and how it can be used to bring pairwise seams to a global seam solution. Section 4.4 details how to extend the Panorama Weaving technique to an out-of-core environment, thereby scaling the technique to gigapixel images.
In Section 4.5, I will detail how to design an interactive system using this technique, and in Section 4.6 I will show how to scale it to large images. Finally, Section 4.7 provides results for the technique and in Section 4.8 I discuss its limitations.

4.1 Optimal Image Boundaries

In this section, we discuss the technical background for boundary calculations of both pairwise and many-image panoramas.

4.1.1 Optimal Boundaries

Given a collection of n panorama images I_1, I_2, ..., I_n and the panorama P, the image boundary problem can be thought of as finding a discrete labeling L(p) ∈ {1, ..., n} for all panorama pixels p ∈ P, which minimizes the transition between each image. If L(p) = k, this indicates that the pixel value for location p in the panorama comes from image I_k. This transition can be defined by an energy on the piecewise smoothness E_s(p, q) of the labeling of neighboring elements p, q ∈ N, where N is the set of all neighboring pixels. We would like to minimize the sum of the energy of all neighbors, E. For the panorama boundary problem, this energy is typically [4] defined as:

$$E(L) = \sum_{p,q \in N} E_s(p, q)$$

If minimizing the transition in pixel values:

$$E_s(p, q) = \|I_{L(p)}(p) - I_{L(q)}(p)\| + \|I_{L(p)}(q) - I_{L(q)}(q)\|$$

or if minimizing the transition in the gradient:

$$E_s(p, q) = \|\nabla I_{L(p)}(p) - \nabla I_{L(q)}(p)\| + \|\nabla I_{L(p)}(q) - \nabla I_{L(q)}(q)\|$$

where L(p) and L(q) are the labelings of the two pixels. Notice that L(p) = L(q) implies E_s(p, q) = 0. Minimizing the change in pixel value works well in the context of poor registration or moving objects in the scene, while minimizing the gradient produces a nice input for techniques such as gradient domain blending. In addition, techniques can use a linear combination of the two energies.

4.1.2 Min-Cut and Min-Path

When computing the optimal boundary between two images, the binary labeling is equivalent to computing a min-cut of a graph whose nodes are the pixels and whose arcs connect a pixel to its neighbors. The arc weights are then the energy function being minimized, see Figure 4.1 (a). If we consider a four-neighborhood and the dual-graph of the planar min-cut graph, as we show in Figure 4.1 (b), we can see that there is an equivalent min-path to the min-cut solution on the dual-graph. This has been shown to be true for all single source, single destination paths on planar graphs [72]. The approaches are equivalent in the sense that the final solution of a min-cut calculation defines the pixel labeling L(p) while the min-path solution defines the path that separates pixels of different labeling.

Figure 4.1: The four-neighborhood min-cut solution (a) with its dual min-path solution (b). The min-cut labeling is colored in red/blue and the min-path solution is highlighted in red.

4.1.3 Graph Cuts

This technique provides good solutions to pixel labeling problems for more than two images. The intricacies of the algorithm [26, 24, 92] are beyond the scope of this dissertation, but at a high level, Graph Cuts finds a labeling L which minimizes an energy function E'(L). This function consists of the term E_s(p, q) augmented with an energy associated with individual pixel locations, E_d(p):

$$E'(L) = \sum_{p} E_d(p) + \sum_{p,q \in N} E_s(p, q)$$

For the panorama boundary problem, this data energy E_d is typically [4] defined as being 0 if location p is a valid pixel of I_{L(p)}. Otherwise, it has infinite energy.
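As a concrete illustration of the smoothness term defined in Section 4.1.1, the sketch below evaluates the pixel-value energy for horizontal neighbor pairs in the overlap of two registered images. This is a minimal Python sketch under simplifying assumptions (grayscale images, a single neighbor direction); the gradient variant would compare image gradients instead.

```python
import numpy as np

def smoothness_energy(img_a, img_b):
    """E_s(p, q) for horizontal neighbors p, q in the overlap of two
    registered grayscale images, with L(p) = a and L(q) = b:
    E_s(p, q) = |I_a(p) - I_b(p)| + |I_a(q) - I_b(q)|."""
    diff = np.abs(img_a.astype(float) - img_b.astype(float))
    return diff[:, :-1] + diff[:, 1:]  # pair each pixel with its right neighbor

# Two synthetic 'overlap' images; low values mark good seam locations.
a = np.random.rand(8, 8)
b = np.random.rand(8, 8)
print(smoothness_energy(a, b).shape)  # (8, 7): one energy per neighbor pair
```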
4.2 Pairwise Seams and Seam Trees

Figure 4.2 illustrates two example pairwise image seams. In the simplest and most common case, Figure 4.2 (a), the boundary lines of the two images intersect at two points u and v connected by the seam s. The other simple, but more general, case in Figure 4.2 (b) shows two overlapping images where the intersection of their boundary lines results in an even number of intersection points. A set of seams can be built by connecting pairs of points with a consistent winding. The seams computed in this way define a complete partition of the space between the two images. In nonsimple cases, i.e., with co-linear boundary intervals, we can achieve the same result by choosing one representative point (possibly optimized to minimize an energy). Notice that the case in Figure 4.2 (b) produces more than a single set of valid seams, denoted by the purple and grey dashed lines. For clarity in the discussion, we will focus on the case in Figure 4.2 (a), since we can treat each seam of the case in Figure 4.2 (b) as independent.

Figure 4.2: (a) Given a simple overlap configuration, a seam can be thought of as a path s that connects pairs of boundary intersections u and v. (b) Even in a more complicated case, a valid seam configuration is still computable by taking pairs of intersections with a consistent winding about an image boundary. Note that there is an alternate configuration denoted in gray.

Assuming the dual-path energy representation in Figure 4.1 (b), a seam is a path that connects the intersection points (u, v). Computing the minimal path of a given energy function will give an optimal seam s, which can be computed efficiently with Dijkstra's algorithm [50]. With minimal additional overhead, we can compute both min-path trees T_u and T_v from u and v (single source, all paths). These trees provide all minimal seams which originate from either endpoint and define the dual seam tree of our technique. Given a point in the image overlap, we can find its minimal paths to u and v with a linear walk up the trees T_u and T_v, as shown in Figure 4.3. If this point is a user constraint, the union of the two minimal paths forms a new constrained optimal seam. Due to the simplicity of the lookup, this path computation is fast enough to achieve interactive rates even for large image overlaps. Note that two min-paths on the same energy function are guaranteed not to cross. However, since each dual seam tree is computed independently, the minimal paths from a constraint (to u and v) can cross. In particular, the trees computed by Dijkstra's algorithm depend on the order in which the edges are processed; if there are multiple paths in an overlap that share the same energy, the paths on each tree to a user constraint can cross. To avoid this problem, we enforce an ordering based on the edge index and we are guaranteed to achieve noncrossing solutions.

Figure 4.3: Given two min-path trees associated with a seam's endpoints (u, v), a new seam that passes through any point in the overlap (yellow) is a simple linear walk up each tree.

Moving an endpoint is also a simple walk up its partner endpoint's seam tree. Therefore, a user can change an endpoint location at will, interactively. However, after the movement the shifted endpoint's seam tree is no longer valid, since it was based on the previous location. If future interactions are desired, the tree must be recomputed.
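The sketch below illustrates this machinery under simplifying assumptions (Python, a node-weighted 4-neighborhood grid standing in for the dual graph): Dijkstra computes a min-path tree from each endpoint, and a constrained seam through a user point is recovered by walking each tree's parent pointers.

```python
import heapq

def min_path_tree(cost, source):
    """Dijkstra on a 4-neighborhood grid; cost[y][x] is the per-node energy
    of the dual graph. Returns distance and parent maps keyed by (y, x)."""
    h, w = len(cost), len(cost[0])
    dist = {source: cost[source[0]][source[1]]}
    parent = {source: None}
    heap = [(dist[source], source)]
    while heap:
        d, (y, x) = heapq.heappop(heap)
        if d > dist[(y, x)]:
            continue  # stale heap entry
        for n in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= n[0] < h and 0 <= n[1] < w:
                nd = d + cost[n[0]][n[1]]
                if nd < dist.get(n, float("inf")):
                    dist[n], parent[n] = nd, (y, x)
                    heapq.heappush(heap, (nd, n))
    return dist, parent

def walk_up(parent, node):
    """Linear walk from a node to the tree root."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

# Dual seam trees from both endpoints; a constrained seam through c is the
# union of the two walks: u -> c followed by c -> v.
cost = [[1, 4, 1, 1], [1, 9, 9, 1], [1, 1, 1, 1]]
u, v, c = (0, 0), (2, 3), (1, 0)
_, pu = min_path_tree(cost, u)
_, pv = min_path_tree(cost, v)
seam = list(reversed(walk_up(pu, c))) + walk_up(pv, c)[1:]
print(seam)
```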
This can be computed as a background process after the user finishes the initial interaction, without any loss of responsiveness in the system.

4.3 From Pairwise to Global Seams

To avoid incurring the cost associated with the solution of a global optimization, we build the panorama as a proper collection of pairwise seams. This is based on the observation, illustrated in Figure 4.4 (a), that the label assignment in a Graph Cut optimization mostly forms a simple collection of regions partitioned by pairwise image seams (denoted in the picture by the double-arrows). Our technique is designed with this property in mind and independently computes each seam constrained by the pairwise intersections, called branching points. These are colored in red in Figure 4.4 (b).

Figure 4.4: (a) A solution to the panorama boundary problem can be considered as a network of pairwise boundaries between images. (b) Our adjacency mesh representation is designed with this property in mind. Nodes correspond to panorama images, edges correspond to boundaries, and branching points (intersections in red) correspond to faces of the mesh. (c) Graph Cuts optimization can provide more complex pixel assignments where "islands" of pixels assigned to one image can be completely bounded by another image. Our approach simplifies the solution by removing such islands.

Note that the solution of a Graph Cuts optimization can provide more complex pixel assignments, where "islands" of pixels assigned to one image can be completely bounded by another image, as shown in Figure 4.4 (c). Obviously, our approach simplifies the solution by removing such islands and makes each region simply connected. We have checked how the energy optimized by our technique would change with this assumption (see Section 4.7). In all cases, we have noticed that the energy of the seams produced by our system remains in the same order of magnitude as Graph Cuts, actually being reduced in all cases but one. Limitations of this assumption are detailed in Section 4.8.

4.3.1 The Dual Adjacency Mesh

To construct a seam network, our computations are driven by an abstract structure that we call the dual adjacency mesh. We draw the inspiration for our adjacency mesh representation from the traditional region adjacency graph used in computer vision, as well as the regions of difference (ROD) graphs of Uyttendaele et al. [165]. In Figure 4.4 (b and c), we have the adjacency graph for a global Graph Cuts computation. This graph can be considered the dual to the seam network: each node corresponds to an image in the panorama, whereas each edge describes an overlap relation between images. Edges are then orthogonal to the seam they represent. If we consider this graph as having the structure of a mesh, the duals of the panorama branching points are the faces of this mesh representation. In Figure 4.4 (b), the branching points are highlighted in red. Seams which exit this mesh representation correspond to pairwise overlaps on the panorama boundary. These are illustrated in Figure 4.4 (b) with a single yellow endpoint. Connecting the branching points on adjacent faces in the mesh and/or the external endpoints gives a global seam network of pairwise image boundaries. In addition to the branching points in the seam network, the faces of the adjacency mesh are also an intuitive representation for overlap clusters. Specifically, clusters are groups of overlaps that share a common area that we call a multioverlap.
These multioverlaps are areas where branching points must occur. The simplest multioverlap beyond the pairwise case consists of three overlaps and is represented by a triangle, see Figure 4.5 (a). A multioverlap with four pairwise overlaps can be represented by a quadrilateral, indicating that all four pairwise seams branch at a mutual point. An important property of this representation is that this quadrilateral can be split into two triangles, a classic subdivision, see Figure 4.5 (b). Any valid (no edge crossing) subdivision of a polygon in this mesh will result in a valid seam configuration. In this way, the representation can handle a wide range of seam combinations while keeping the overall network valid. Figure 4.5 (c) shows an example subdivision of a five-way intersection.

Figure 4.5: (a) A three overlap adjacency mesh representation. (b) A four overlap initial quadrilateral adjacency mesh with its two valid mesh subdivisions. (c) A five overlap pentagon adjacency mesh with an example subdivision.

As a precomputation, we calculate the initial adjacency mesh consisting of simple n-gon face representations for every n-way cluster. This precomputation stage enables the conversion of the initial nonplanar full neighborhood graph into a planar mesh representation, see Figure 4.6. Clusters (and their corresponding multioverlaps) are by definition nonoverlapping, maximal cliques of the full neighborhood graph. This computation is a classic clique problem and is known to be NP-complete [43]. For most panoramas, we have found the neighborhood graph is small enough that a brute-force search can be computed quickly. However, previous work has shown that given a graph with a polynomial bound on the number of maximal cliques, they can be found in polynomial time [141]. This is indeed the case for the neighborhood graph, which has a maximal boxicity [140] dimension of two [36]. After the maximal cliques have been found, each n-gon face is extracted by finding the fully spanning cycle of clique vertices on the boundary in relation to the centroids of the images. The boundary edges of the n-gon face are marked as active, while the interior (intersecting) edges are marked as inactive, as shown in Figure 4.6. This adjacency mesh is used to drive the computation and combination of the pairwise boundaries as well as user manipulation. As we will illustrate in Section 4.5, it can be completely hidden from a user of the interactive system with intuitive editing concepts.

Figure 4.6: Considering the full neighborhood graph of a panorama (a), where an edge exists if an overlap exists between a pair of images, an initial valid adjacency mesh (b) can be computed by finding all nonoverlapping, maximal cliques in the full graph, then activating and deactivating edges based on the boundary of each clique.
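For the brute-force search on small neighborhood graphs described above, a standard maximal-clique enumeration suffices. A minimal sketch (Bron-Kerbosch in Python; illustrative, not the dissertation's implementation):

```python
def maximal_cliques(adj):
    """Bron-Kerbosch enumeration of maximal cliques. adj maps each image
    (node) to the set of images whose overlaps intersect it."""
    out = []
    def expand(r, p, x):
        if not p and not x:
            out.append(r)  # r cannot be extended: it is a maximal clique
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    expand(set(), set(adj), set())
    return out

# Four mutually overlapping images form one 4-clique, i.e., one
# quadrilateral face of the initial adjacency mesh.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
print(maximal_cliques(adj))  # -> [{0, 1, 2, 3}]
```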
4.3.2 Branching Points and Intersection Resolution

Given a collection of seam trees that correspond to active edges in the adjacency mesh, we can now combine the seams into a global seam network. To do this, we need to compute the branching points which correspond to each adjacency mesh face, adjust each seam given a possible new endpoint, and resolve any invalid intersections that may arise (in order to maintain consistency).

4.3.2.1 Branching points. Assuming for each pairwise seam there exist only two endpoints, for each multioverlap one endpoint must be adapted into a branching point. We refer to this endpoint as being inside in relation to the adjacency mesh face. The other seam endpoint is considered to be outside in relation to the multioverlap. These can be computed by finding the endpoints which are closest (Euclidean distance) to the multioverlap associated with the face. Figure 4.7 (a) displays these endpoints in red and the multioverlap area with a blue shading. Although it is possible to create a pathological overlap configuration where this distance metric fails, we have found that this strategy works well in practice. If we use the dual seam tree distances, i.e., the path distance values associated with the outside endpoints, we can compute a branching point which is optimal with respect to these paths, as illustrated in Figure 4.7 (b). This can be accomplished with a simple lookup of the distance values in the trees. We have found that minimizing the sum of the least squared error provides a nice low energy solution. The new path associated with a moved endpoint is determined by a simple walk up the dual seam tree, see Figure 4.7 (c). Additionally, each seam tree associated with the branching point is recalculated given its location. As Figure 4.7 (d) illustrates, the branching point is always computed using the distance field of the initial endpoint location, even if this point had been previously adjusted by an adjacent face. In practice, we have found that the contribution of the root location to the overall structure of the seam tree is minimal towards the leaves of the tree. Since using the initial starting endpoints allows the branching points to be computed independently and in a single parallel pass, we have adopted this into our technique.

Figure 4.7: (a) Pairwise seam endpoints closest to a multioverlap (red) are considered a branching point. (b) This can be determined by finding a minimum point in the multioverlap with respect to min-path distance from the partner endpoints. (c) After the branching point is found, the new seams are computed by a linear lookup up the partner endpoint's seam tree. (d) To enable parallel computation, each branching point is computed using the initial endpoint location (green) even if it was moved via another branching point calculation (red).

The seams produced by this initial process in the four-overlap case are similar to the sequential techniques introduced by Efros and Freeman [53] and Cohen et al. [42]. With the additional adjacency mesh, our technique is much more expressive in the possible seam configurations (especially allowing arbitrary valence branching points). In addition, as we will illustrate next, for panoramas, and especially in an interactive setting, one cannot assume that a seam's path to a branching point respects the paths of other seams.
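Under the assumption that each outside endpoint's seam tree provides a per-pixel path-distance field over the multioverlap, the optimal branching point described above reduces to an argmin lookup. A minimal sketch (Python; the least-squares objective follows the text, the arrays are synthetic):

```python
import numpy as np

def branching_point(multioverlap_mask, dist_fields):
    """Pick the branching point as the multioverlap pixel minimizing the
    sum of squared min-path distances to the outside endpoints (one
    distance field per incident seam tree)."""
    total = sum(d.astype(float) ** 2 for d in dist_fields)
    total[~multioverlap_mask] = np.inf  # restrict to the multioverlap
    return np.unravel_index(np.argmin(total), total.shape)

# Three incident seams -> three distance fields over a toy 32x32 overlap.
rng = np.random.default_rng(0)
fields = [rng.random((32, 32)) for _ in range(3)]
mask = np.zeros((32, 32), dtype=bool)
mask[8:24, 8:24] = True  # the multioverlap region
print(branching_point(mask, fields))
```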
4.3.2.2 Removing invalid intersections. Since each seam is computed using a separate energy function, seam-to-seam intersections beyond the branching points are possible. Small intersections of this type must be allowed to ensure solutions are computable in a four-neighborhood configuration. For instance, there would be no nonintersecting way to combine five seams into a single branching point. This allowance is defined by an ε-neighborhood around the branching point which can be set by the user. We have found that allowing an intersection neighborhood of one or two pixels gives good results with no visible artifacts from the intersection. The intersections in this neighborhood are collapsed to be co-linear to the shortest of the intersecting paths.

Intersections that occur outside of this ε-neighborhood must be resolved due to the inconsistent pixel labeling that they imply. Figure 4.8 (a, c) shows an example of intersections in a four-way image overlap. The areas highlighted in gray have conflicting image assignments. Enforcing no intersections at the time of the seam computation would complicate parallelism and be overly expensive. This corresponds to a k-way planar escape problem with multiple energies (where k is the number of seams incoming to the branching point), for which variants have been shown to be NP-complete [179]. This could also lead to possibly unstable interactions, since small movements may lead to extremely large changes in the overall seam paths. The simplest solution is to choose one assignment per conflict area. This is equivalent to collapsing the area and making the two seams co-linear at points where they "cross." Each collapse introduces a new image boundary for which the wrong energy function has been minimized, Figure 4.8 (a, b).

Figure 4.8: (a) Pairwise seams may produce invalid intersections or crossings in a multioverlap, which leads to an inconsistent labeling of the domain. The gray area on the top can be given the labels A or B and on the bottom either C or D. (b) Choosing a label is akin to collapsing one seam onto the other. This leads to new image boundaries, which were based on energy functions that do not correlate to this new boundary. The top collapse results in a B-C boundary using an A-B seam (C-D seam for the bottom). (c and d) Our technique performs a better collapse where each intersection point is connected to the branching point via a minimal path that corresponds to the proper boundary (B-C). One can think of this as a virtual addition of a new adjacency mesh edge (B-C) at the time of resolution to account for the new boundary.

In our technique, we perform a more sophisticated collapse. For a given pair of intersecting seams, multiple intersections can be resolved by taking into account only the furthest intersection from the branching point in terms of distance on the seam. Given that each seam divides the domain, this intersection can only occur between seams that divide a common image. If presented with a seam-to-seam intersection, we can easily compute the new boundary that is introduced during the collapse. This is simply a resolution seam on an overlap between the images which is not common between the intersecting seams. The resolution seams connect the intersection points with the branching points. Often, if multiple resolution seams share the same overlap, as in Figure 4.8, only one min-path calculation from the branching point is needed to fill in all min-paths. The new resolution seams are constrained by the other seams in the cluster in order to not introduce new intersections with the new paths. The constraints are also given the ability to gradually increase the allowed intersection neighborhood beyond the user defined ε-neighborhood in the chance that no solution path exists. The crossings and intersections are collapsed in this neighborhood. Due to the rarity of this occurrence, the routine adds minimal overhead to the overall technique in practice.
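The sketch below shows the first step of this resolution, locating the furthest crossing of two seams from their shared branching point, under the simplifying assumption that seams are pixel lists ordered from the branching point. The resolution seam itself would then be a min-path (e.g., with the Dijkstra sketch of Section 4.2) from this crossing back to the branching point on the appropriate overlap.

```python
def furthest_intersection(seam_a, seam_b, eps=2):
    """Return (index on seam_a, index on seam_b, pixel) of the crossing of
    two seams furthest along seam_a, ignoring crossings inside the
    eps-neighborhood of the shared branching point (index 0). Returns None
    if there is nothing to resolve."""
    on_b = {p: j for j, p in enumerate(seam_b)}
    hit = None
    for i, p in enumerate(seam_a):
        if i > eps and p in on_b and on_b[p] > eps:
            hit = (i, on_b[p], p)  # keep the last (furthest) crossing
    return hit

# Two toy seams sharing branching point (0, 0) and crossing at (3, 3).
seam_a = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
seam_b = [(0, 0), (0, 1), (1, 2), (2, 3), (3, 3)]
print(furthest_intersection(seam_a, seam_b))  # -> (3, 4, (3, 3))
```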
Order matters in both finding the intersections and computing the resolution seams, and therefore must be consistent. We have found that ordering based on the overlap size works well. Resolution seams and expanded ε-neighborhoods are considered to be temporary states. Figure 4.8 (c, d) shows an example of an intersection resolution. This technique robustly handles possible seam intersections at the branching points. Most importantly, since we are only adjusting the seam from the intersection point on, we can resolve each adjacency mesh face in parallel. In addition, since the seam is not changed outside of the multioverlap within a cluster, the resolution is local and will not cascade to a global resolution. However, it is possible for a user to introduce unresolvable intersections through added constraints, as we will discuss in Section 4.5.

4.4 Out-of-Core Seam Processing

While designed to be light on resources, the technique outlined in the previous sections assumes all images can be stored in-core. This assumption holds true for many panoramas, but not for images gigapixels in size. In this section, I will detail how the original technique can be modified to handle large panoramas.

As illustrated in Figure 4.9, the initial seam solution for the Panorama Weaving technique can be thought to occur in three phases. First, for each adjacency mesh face, the corresponding branching point must be computed. After the branching point is found, any seams which occur on the panorama boundary can be computed. These correspond to edges in our mesh that belong to only one face. See Figure 4.9 (a). As mentioned previously, the branching point computations for each face, given our simple pairwise seam assumption, are completely independent and can be computed in parallel. As a second phase, the seams for the shared edges can be found by connecting the newly computed branching points. Each shared edge can be processed independently in parallel. See Figure 4.9 (b). Finally, once all edges for a face are computed, the seam intersections for the given face can be resolved. This resolution is also independent and parallel for each adjacency mesh face. See Figure 4.9 (c). Note that the three phases need not be entirely distinct since they can be interleaved. If we bundle the branching point and shared seam calculations into a single operation, the logic of our out-of-core computation need only deal with the adjacency mesh faces. This can drastically reduce the system complexity even when working in a multicore environment.

Figure 4.9: The phases of out-of-core seam computation. (a) First, branching points are computed. The seams for all unshared edges can also be computed during this pass. (b) Second, once the corresponding branching points are computed, all shared edges can be computed with a single min-path calculation. (c) Third, once all the seams for the edges of a given face have been computed, the intersections can be resolved. Note, the three passes do not necessarily need to be three separate phases since they can be interleaved when the proper input data are ready.

By doing this, we also cast our problem into the simple problem of graph traversal for the two remaining phases of our seam processing system. For our traversal strategy, we chose a simple row-major traversal of the mesh faces.
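A single-threaded sketch of this interleaved schedule is shown below (Python; the three callbacks are hypothetical stand-ins for the actual branching point, shared-edge, and resolution computations). Each edge is represented as the set of faces that share it, so a boundary edge contains one face and a shared edge contains two:

```python
def weave_out_of_core(faces, face_edges, compute_branch, compute_edge, resolve):
    """Visit faces in row-major order; an edge's seam is computed once every
    face sharing it has a branching point (so the 'second' face fills in a
    shared edge); a face is resolved once all of its edges are done."""
    branch_done, edges_done, resolved = set(), set(), set()
    for face in faces:                                   # row-major order
        compute_branch(face)                             # phase 1
        branch_done.add(face)
        for edge in face_edges[face]:                    # edge = frozenset of faces
            if edge not in edges_done and edge <= branch_done:
                compute_edge(edge)                       # phase 2
                edges_done.add(edge)
        for f in list(branch_done - resolved):
            if set(face_edges[f]) <= edges_done:
                resolve(f)                               # phase 3
                resolved.add(f)

# Toy run: two faces sharing one edge; callbacks just record the schedule.
faces = ["F0", "F1"]
edges = {"F0": [frozenset({"F0"}), frozenset({"F0", "F1"})],
         "F1": [frozenset({"F1"}), frozenset({"F0", "F1"})]}
log = []
weave_out_of_core(faces, edges,
                  lambda f: log.append(("branch", f)),
                  lambda e: log.append(("edge", sorted(e))),
                  lambda f: log.append(("resolve", f)))
print(log)
```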
4.4.1 Branching Point and Shared Edge Phase

The design goal for this phase is to keep the memory requirement low and predictable. The reasoning for this approach is twofold. First, low memory requirements enable the portability of our out-of-core system to a wide variety of systems. Such a technique has the ability to run on systems from laptops to HPC computers. Second, when moving to a multicore implementation, a low, predictable memory requirement per face makes the logic and scheduling of the many threads operating on the adjacency mesh faces very simple. Given the resources of the system, we can predict how many faces the system can compute in parallel.

We achieve our low memory footprint by computing the branching point for a given face in a "round-robin" fashion, as shown in Figure 4.10. For each edge, the images that correspond to its endpoints are loaded and the overlap energy is calculated. Next, the seam tree needed for the branching point is calculated. After this calculation, the overlap energy is no longer needed and is therefore ejected from memory. The calculation then moves on to the next edge that shares an image with the previous computation. The rest of the branching point calculation proceeds in the same way until all seam trees have been computed to determine the location of the new branching point.

Figure 4.10: The low memory branching point calculation for our out-of-core seam creation technique. (a) Given a face for which a branching point needs to be computed, (b) the computation proceeds "round-robin" on the edges of the face to compute the needed seam trees. The images that correspond to the edge endpoints and the overlap energy are only needed during the seam tree calculation for a given edge on the face. Therefore, by loading and unloading these data during the "round-robin" computation, the memory overhead for the branching point computation is the cost of storing two images, one energy overlap buffer, and the seam trees for the given face.

As mentioned above, we couple the shared edge phase into this phase of the calculation. This is done with a flag per face to indicate whether the branching point has been computed. After the branching point has been computed for a face, the flag is set and the process checks to see if the flag is also set for the other faces on the face's shared edges. If the face calculation is the second to set this flag, it knows it can fill in the new seam for the shared edge. For the multicore implementation, this check is made atomic with locks to ensure there is always a clear first- and second-face calculation when setting flags for the endpoints of a shared seam. Note that the memory overhead for this computation is equal to the space required to store two images, one buffer for the overlap energy, and one for the seam trees for the edges of the mesh face. This overhead is quite low and, given an average image size and overlap percentage, very predictable. The geometry of the seams is stored in-core for the final phase of the calculation.
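A minimal sketch of this pass over one face follows (Python; toy images, with a cumulative-sum distance field standing in for a real seam tree computation):

```python
import numpy as np

def round_robin_branching(edge_ring, load_image):
    """Low-memory branching point pass over one face: visit edges in ring
    order, keeping only the current edge's two images and one energy buffer
    resident; per-edge distance fields (stand-ins for seam trees) accumulate
    for the final least-squares pick."""
    resident, dist_fields = {}, []
    for i, (a, b) in enumerate(edge_ring):
        for name in (a, b):
            if name not in resident:
                resident[name] = load_image(name)
        energy = np.abs(resident[a] - resident[b])   # overlap energy
        dist_fields.append(energy.cumsum(axis=0))    # toy 'seam tree' distances
        del energy                                   # eject the energy buffer
        keep = set(edge_ring[(i + 1) % len(edge_ring)])
        resident = {k: v for k, v in resident.items() if k in keep}
    total = sum(d ** 2 for d in dist_fields)
    return np.unravel_index(np.argmin(total), total.shape)

# Toy face whose ring of edges is (A,B), (B,C), (C,A): at most two images
# are ever resident at once.
imgs = {n: np.random.rand(16, 16) for n in "ABC"}
print(round_robin_branching([("A", "B"), ("B", "C"), ("C", "A")], imgs.__getitem__))
```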
4.4.2 Intersection Resolution Phase

When all seams for a face have been computed, the seam intersections that correspond to the face can be resolved. The intersection test is computed on the seam geometry, which is already in memory. If an intersection occurs within a threshold distance on the seam from the branching point, it is considered small and collapsed. This is an equivalent operation to the intersection collapse from the in-core Panorama Weaving technique. For larger intersections, a resolution seam must be computed and the two images that correspond to the overlap of the resolution seam need to be loaded before computation. See Figure 4.11 for an example.

Figure 4.11: For intersections that require a resolution seam, the two images which correspond to the overlap needed for the seam must be loaded. In the figure above, these images are the ones that correspond to the endpoints of the diagonal resolution adjacency mesh edge.

4.5 Weaving Interactive System

In this section, we outline how to create a light and fast interactive system using the Panorama Weaving technique. A simplified diagram of the operation of the system is given in Figure 4.12. In Section 4.7, we provide examples of this application editing a variety of panoramas.

Figure 4.12: Overview of Panorama Weaving. The initial computation is given by steps one through four, after which the solution is ready and presented to the user. Interactions, steps five and six, use the tree update in step four as a background process. Additionally, step six updates the dual adjacency mesh.

4.5.1 System Specifics

In this subsection, I will detail specifics for the prototype system.

4.5.1.1 Input. The system inputs for our prototype are flat, registered raster images with no geometry save the image offset. Any input can be converted into this format and therefore it is the most general. The initial image intersection computation is computed using the rasterized boundaries. Due to aliasing, there may be many intersections found. If the intersections are contiguous, they are treated as the same intersection and a representative point is chosen. In practice, we have found this choice has little effect on the seam outside a small neighborhood (less than 10 pixels from the intersection). Therefore, the system picks the minimal point in this group in terms of the energy. Pairs of intersections that are very close in terms of Euclidean distance (less than 20 pixels) are considered to be infinitesimally small seams and are ignored. The user is also allowed to dictate the energy function for the entire panorama, an image, or an overlap. This can be done as an initial input parameter or within the interactive program itself. Specifically, our prototype allows the user to switch between pixel difference or gradient difference energies.

4.5.1.2 Initial parallel computation. Parallel computation is accomplished using a thread pool equal to the number of available cores. The initial dual seam tree and branching point computations can be run trivially in parallel. In the presence of two adjacent faces in the adjacency mesh, a mutex flag must be used on their shared seam since both faces may attempt to write this data simultaneously. As a final phase, each adjacency mesh face resolves intersections in parallel. In order to compute these resolutions in parallel, we split a seam's data into three separate structures for the start, middle, and end of the seam. The middle seam contains the original seam before intersection resolution and its extent is maintained by pointers. The structure's start and end are updated with the intersection resolution seams by the faces associated with their respective branching points. Either vector can be associated with only one face; therefore, we run no risk of multiple threads writing to the same location. Each seam tree is stored as two buffers: one for node distance and one which encodes the tree itself. The tree buffer encodes the tree with each pixel referencing its parent. This can be done in 2 bits (rounded to 1 byte for speed) for a four-pixel neighborhood. Therefore, for float distances we need only 5 bytes per pixel to store a seam tree.
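In array terms (a sketch of the layout, not the actual ViSUS structures), this is one byte-wide parent-direction buffer plus one 4-byte float distance buffer:

```python
import numpy as np

# Parent directions in a four-pixel neighborhood fit in 2 bits, but are
# rounded to one byte for fast access; together with a 4-byte float
# distance this gives 5 bytes per overlap pixel.
UP, DOWN, LEFT, RIGHT = 0, 1, 2, 3  # the root pixel is marked separately

h, w = 1024, 768  # size of a hypothetical overlap
parents = np.zeros((h, w), dtype=np.uint8)      # 1 byte/pixel
distances = np.zeros((h, w), dtype=np.float32)  # 4 bytes/pixel
print(parents.nbytes + distances.nbytes == 5 * h * w)  # -> True
```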
4.5.1.3 Seam network import. It is possible to import a seam network computed with an alternative technique (such as Graph Cuts, see Figure 4.13) and edit it with our system. Our import procedure works as follows. Given a labeling of the pixels of the panorama, the algorithm first extracts the boundaries of the regions. Then, branching points (boundary intersections) are extracted. Next, each boundary segment (bounded by two branching points) is identified as a seam and the connected components of the resulting seam network are identified. To be compatible with our framework, only seam networks made of a single connected component can be imported. Thus, we only consider the biggest connected component of the network and small islands are discarded. Finally, our seam data-structures are fed with the seam network and the adjacency mesh is updated if necessary. Since the editing operations do not cascade globally, a user can edit a problem area locally and maintain much of the original solution if desired.

Figure 4.13: Importing a seam network from another algorithm. The user is allowed to import the result generated by Graph Cuts (a) and adjust the seam between the green and purple regions to unmask a moving person (b). Note that this edit has only a local effect, and that the rest of the imported network is unaltered.

4.5.2 Interactions

In this subsection, I will detail some possible interactions that can be accomplished with our system.

4.5.2.1 Seam bending. The adding of a constraint and its movement is called a bending interaction in our system and operates as outlined in Section 4.2. A user is allowed to add a constraint to a seam and is instantly provided the optimal seam which must pass through it. The constraint can be moved interactively to explore the seam solution space. Intersections in any adjacency mesh face containing the corresponding edge are resolved, which can be done in parallel. Most importantly, given how the technique resolves intersections, seams cannot change beyond the multioverlap area in these faces. Therefore, the seam resolution does not cascade globally.

4.5.2.2 Seam splitting. Adding more than one constraint is akin to splitting the seam into segments. After a bending interaction, the seam trees are split into four, where there were previously two. Two of the trees (corresponding to the endpoints) are inherited by the new seams. The two trees associated with the new constraint are identical; therefore, only one tree computation is necessary. Splitting occurs in our prototype when a user releases the mouse after a bending interaction. Editing is locked for this seam until the corresponding trees are resolved. This is a quick process and it is very rare for a user to be fast enough to beat the computation.

4.5.2.3 Branching point movement. The user is given the ability to grab and move the branching point associated with a selected face of the adjacency graph. As I have detailed in Section 4.2, a movement of an endpoint is a simple lookup on its partner's dual seam tree. As the user moves a branching point, intersections for both the selected face and all adjacent faces are resolved.
Given that the intersection resolution does not adjust seam geometry beyond the multioverlap, we need only look at this one-face neighborhood and not globally. To enable further interaction, the seam trees associated with this endpoint need to be recalculated after movement. When the user releases the mouse, the seam tree data for all the endpoints associated with the active seams for the face are recomputed as a background process in parallel. Like splitting, editing is locked for each seam until it completes the seam tree update.

4.5.2.4 Branching point splitting and merging. The user can add and remove additional panorama seams by splitting and merging branching points. Addition and removal of seams is equivalent to subdividing and merging faces of the adjacency mesh. Improper requests for a subdivision or merge correspond to an invalid seam network and are therefore restricted. If splitting is possible for a selected branching point, the user can iterate and choose from all possible subdivisions of the corresponding face. To maintain consistent seams, merging is only possible between branching points resulting from a previous split. In other words, merging faces associated with different initial adjacency mesh faces would lead to an invalid seam configuration since the corresponding images do not overlap. If a seam is added, its dual seam tree is computed. In addition, the other active seams associated with this face will need to be updated, much like a branching point movement.

4.5.2.5 Improper user interaction. Given the editing freedom allowed to users, they may move a seam into an inconsistent configuration. Figure 4.14 illustrates some examples. Rather than constrain the user, the prototype system either tries to resolve the improper seams or, if that is not possible, gives the user visual feedback indicating a problem configuration. For example, if the user introduces a seam intersection, our intersection routine is launched to resolve it, see Figure 4.14 (a). Crossing branching points, Figure 4.14 (b), can be resolved similarly. Figure 4.14 (c) illustrates a configuration with no resolution. In this instance, the crossing edges are collapsed and the user is given a visual hint that there is a problem.

Figure 4.14: Improper user constraints are resolved or, if resolution is not possible, given visual feedback. (a) Resolution of an intersection caused by a user moving a constraint. (b) Resolution of an intersection caused by a user moving a branching point. (c) A nonresolvable case where a user is just provided a visual cue of a problem.

4.6 Scalable Seam Interactions

Given the locality of the interactions in the Panorama Weaving technique, extending the interactions to gigapixel sized images does not require a large change to the base interactions. Our interaction scheme is based on a standard large image viewer and on-the-fly loading and computation of the data needed for seam editing. As Figure 4.15 shows, due to the technique's simple pairwise seam assumption, the data which need to be loaded and computed are local given an interaction. Our system works as follows: we leverage the large image, out-of-core ViSUS system outlined in Chapter 3 to provide exploration of a flattened gigapixel image created from the seams of our initial out-of-core seam computation. If a user wishes to have a seam or branching point interaction, she/he can initiate this edit by selecting the seam area she/he wishes to edit. The action is determined similarly to the in-core seam system in that the overlap bounding boxes are tested against the user selected area. If all the overlap bounding boxes for a given face are selected, then it is assumed the user wishes to have a branching point manipulation. Otherwise, it is assumed the user wishes a seam bending interaction and a single overlap from the selection is chosen.
A user can cycle through the single selection options if more than one is present for a given input.

Figure 4.15: Given the inherent locality of the seam editing interactions, only a very small subset of the adjacency mesh needs to be considered. (a) For operations on an adjacency mesh face (i.e., branching point operations), only the images and overlaps of the corresponding face and its one-face neighborhood need to be loaded and computed. (b) For edge operations (i.e., bending), we need consider only the faces that share the edge.

A brute-force bounding box intersection test has the possibility of being overly expensive for panoramas which contain thousands of images and overlaps; therefore, we have designed two more scalable options for the selection. As Figure 4.16 (a) illustrates, a bounding hierarchy of overlaps can be built by merging the bounding boxes of pairs of neighboring adjacency mesh faces. During a user selection, the hierarchy is traversed to determine which faces need to be considered for selection. Once this is determined, the selected face's overlaps are tested against the user selection. This provides a selection runtime that is logarithmic in the number of faces in our panorama. Alternatively, as Figure 4.16 (b) shows, if there is a pixel-to-image map, this can be leveraged to determine the neighborhood of overlaps that need to be tested for selection. This neighborhood consists of the edges of faces that share the node that corresponds to the pixel-map image. After the user finishes their interaction, the seams are saved and the loaded images are masked and saved to the flattened image.

Figure 4.16: When a user selects an area of a panorama to edit, the system must determine which overlaps intersect with the selected area. This can be accomplished with (a) a bounding hierarchy of the overlaps. During selection this hierarchy is traversed to isolate the proper overlaps for the selection. This gives a logarithmic lookup with respect to the number of adjacency mesh faces in the panorama. Alternatively, (b) if a pixel-to-image labeling is provided, this can be used to isolate a fixed neighborhood that needs to be tested for overlap intersection. This labeling is commonly computed if the panorama is to be fed into a color correction routine after seam computation.
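A sketch of such a bounding hierarchy follows (Python; boxes are (xmin, ymin, xmax, ymax) tuples, and pairwise merging by list order is a stand-in for true spatial neighbor merging over row-major faces):

```python
def intersects(a, b):
    """Axis-aligned box overlap test for (xmin, ymin, xmax, ymax) boxes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def build_hierarchy(leaves):
    """Bottom-up hierarchy over (face_id, box) leaves; each node is a
    (box, payload) pair where payload is a face id or a list of children."""
    nodes = [(box, face) for face, box in leaves]
    while len(nodes) > 1:
        nodes = [(union(nodes[i][0], nodes[i + 1][0]), [nodes[i], nodes[i + 1]])
                 if i + 1 < len(nodes) else nodes[i]
                 for i in range(0, len(nodes), 2)]
    return nodes[0]

def select(node, query, out):
    """Collect candidate faces whose boxes meet the user's selection box."""
    box, payload = node
    if intersects(box, query):
        if isinstance(payload, list):
            for child in payload:
                select(child, query, out)
        else:
            out.append(payload)  # candidate face; test its overlaps next

faces = [("F0", (0, 0, 10, 10)), ("F1", (8, 0, 20, 10)), ("F2", (18, 0, 30, 10))]
root = build_hierarchy(faces)
hits = []
select(root, (9, 2, 12, 5), hits)
print(hits)  # -> ['F0', 'F1']
```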
4.7 Results

In this section, we detail the results of both the creation and editing phases of our system. In-core results were performed on a 3.07 GHz Intel i7 four-core processor (with Hyperthreading) with 24 gigabytes of memory. The large system memory was required in order to run the Graph Cuts implementation, as is, on all datasets. Panorama Weaving performed well for all datasets on test systems including laptops with only 4 gigabytes of memory. Out-of-core results were run on a 2.67 GHz Xeon X5550 (eight core) system with 24 gigabytes of memory.

4.7.1 Panorama Creation

This subsection details the results for in-core and out-of-core initial seam creation.

4.7.1.1 In-core results. We compare the panorama creation phase of our system to the implementation provided by the authors of the Graph Cuts technique [26, 24, 92], which many consider the exemplary implementation. Both the α-expansion and swap algorithms were run until convergence to guarantee minimal errors, and the best time is reported. Since this implementation has various ways of passing data and smoothness terms, we tested all and report the fastest, which is precomputed arrays for the costs with a function pointer acting as a lookup. Not having an equally well-established in-core parallel implementation for Graph Cuts, we use a serial version of our algorithm for comparison. Timings for Graph Cuts are based on the implementation's reported runtime. Due to the parallel option of Panorama Weaving, its timings are based on wall-clock time. Datasets which contain more than simple pairwise overlaps were run at full resolution, and the running times and energy comparisons are provided in Table 4.1. Our technique produces lower energy seams for all but one example, Fall-5way, and even in this case the techniques have comparable energy. In terms of performance, serial Panorama Weaving computes its solution faster than Graph Cuts for all datasets (at the same resolution). As the Graph Cuts results show, a hierarchical approach would be necessary to achieve similar performance by trading quality for speed. Parallel Panorama Weaving further reduces the runtime down to mere seconds for all datasets at full resolution. On average, we see that the scaling between Panorama Weaving's serial and parallel implementations is about a five times speedup. This is in sync with the number of physical cores in the test system. Hyperthreading is effective when data access is a main bottleneck; a speedup corresponding to the number of physical cores should be expected when an algorithm is compute-bound, which is true for Panorama Weaving. Therefore, our implementation is scaling quite well on our test system.

Table 4.1: Performance results comparing Panorama Weaving to Graph Cuts for our test datasets that contain more than simple pairwise overlaps. Panorama Weaving run serially (PW-S) computes solutions quickly. When run in parallel, runtimes are reduced to just a few seconds. The energy ratio (E. ratio) between the final seam energy produced by Panorama Weaving and Graph Cuts (PW Energy / GC Energy) is shown. For all but one dataset (Fall-5way), Panorama Weaving produces a lower energy result. It is comparable otherwise. Panorama image sizes are reported in megapixels (MP); times are in seconds.

  Dataset     Size (MP)   Images   PW Parallel (s)   PW Serial (s)   GC Serial (s)   E. Ratio
  Crosswalk   16.7        4        1.3               7.2             369.6           0.995
  Fall-5way   30.0        5        2.4               12.1            735.4           1.220
  Skating     44.7        6        3.2               16.8            734.0           0.851
  Lake        9.4         22       0.5               2.9             337.2           0.503
  Graffiti    36.6        10       4.3               19.6            983.7           0.707
  Nation      49.1        9        4.6               23.2            1168.7          0.800

4.7.1.2 Out-of-core results. To test the performance of our out-of-core implementation, we computed the seams for two large panoramas on our eight core test system. The Fall Salt Lake City panorama consists of 611 overlapping images and is 126,826 x 29,633 (3.27 gigapixel) when combined. The final image that is the result of our seam computation is provided in Figure 4.17 (b). Additionally, in Table 4.2 we provide a strong scaling test of our implementation for this panorama as we vary the core count from one to eight. As this table illustrates, our implementation shows very good efficiency up to the number of cores of our test system.
At eight cores, our efficiency takes a slight dip due to our implementation using a dedicated thread to schedule the face computation for each phase of the out-of-core Panorama Weaving technique. On a single core, our system can produce a seam solution in only 68.5 minutes. As I have discussed in Section 2.1, Hierarchical Graph Cuts [2] does not scale beyond two to three levels of the hierarchy. At three levels, the coarse version of this panorama is still approximately 100 megapixels in size with 611 labels and could not be run on our test system. Therefore, to provide context for our running times, we compare our technique to the predicted runtime of a similar technique [94] which relies on a moving window of a Graph Cuts solution. Figure 4.17 (a) provides an example of one of these windows. In our tests, a Graph Cuts solution took 3003.86 seconds to converge for this window. Even more problematic is that the first iteration for this window still took a very long time to compute: 612.99 seconds. Therefore, if this window is a good representation for this panorama dataset (which our testing indicates it is), computing all 495 windows in this dataset would take days (85.15 hours) at best, and weeks (17.2 days) in the worst case. Our implementation can compute a full resolution solution in a little over an hour on a single core and in only a few minutes when run on eight cores.

Figure 4.17: Fall Salt Lake City, 126,826 x 29,633, 3.27 gigapixel, 611 images. (a) An example window computed with the out-of-core Graph Cut technique introduced in Kopf et al. [94]. This single window took 50 minutes for Graph Cuts to converge, with the initial iteration requiring 10.2 minutes. Since the full dataset contains 495 similar windows, using the windowed technique would take days (85.15 hours) at best, and weeks (17.2 days) in the worst case. (b) The full resolution Panorama Weaving solution was computed in 68.4 minutes on a single core and 9.5 minutes on eight cores. Our single core implementation required a peak memory footprint of only 290 megabytes, while using eight cores had a peak memory of only 1.4 gigabytes.

Table 4.2: Strong scaling results for the Fall Salt Lake City panorama, 126,826 x 29,633, 3.27 gigapixel, 611 images. Our out-of-core Panorama Weaving technique scales very well in terms of efficiency compared to ideal scaling up to the physical cores of our test system (eight cores). At eight cores, our technique loses a slight amount of efficiency due to our implementation having a dedicated thread to handle the seam scheduling. Using the full eight cores to process this panorama provides a full resolution seam solution in just 9.5 minutes. The system is extremely light on memory and uses at most 1.4 gigabytes.

  Cores   Time (s)   Ideal (s)   Efficiency   Max Mem.
  1       4109       NA          NA           290 MB
  2       2079       2054.5      98.8%        443 MB
  3       1403       1369.7      97.6%        599 MB
  4       1049       1027.3      97.9%        791 MB
  5       840        821.8       97.8%        881 MB
  6       706        684.8       97.0%        1.1 GB
  7       601        587.0       97.7%        1.2 GB
  8       573        513.6       89.6%        1.4 GB

Also of note is the small use of memory inherent in our scheme. At any given time, we need only hold the cost of computing the seams for one face per core. Therefore, the per-core memory footprint is only the cost of holding two images, an energy buffer, and the seam trees for a given face in memory. Even with floating point precision and eight cores, for this dataset our technique uses at most 1.4 gigabytes of memory. To further test the scalability of our system, our implementation was run on an even larger image.
The Lake Louise panorama consists of 1512 images and is 187,069 x 40,202 (7.52 gigapixel) when combined. In Table 4.3 we provide a strong scaling test of our implementation for this dataset. Like the previous example, our implementation shows very good efficiency for one to eight cores on our test system. The system is very light on memory resources and needs only 2.0 gigabytes of memory to operate on all eight cores. When using all cores, our implementation can provide a seam solution in only 37.7 minutes. The final image resulting from our seam calculation is provided in Figure 4.18.

Table 4.3: Strong scaling results for the Lake Louise panorama, 187,069 x 40,202, 7.52 gigapixel, 1512 images. Like the smaller Fall Salt Lake City panorama, our implementation shows very good efficiency up to the physical number of cores on our test system. Using the full eight cores, the full resolution seam solution for this panorama requires 37.7 minutes of compute time and at most 2.0 gigabytes of memory.

  Cores   Time (s)   Ideal (s)   Efficiency   Max Mem.
  1       16279      NA          NA           382 MB
  2       8263       8139.5      98.51%       627 MB
  3       5516       5426.3      98.37%       877 MB
  4       4132       4069.8      98.49%       1.1 GB
  5       3306       3255.8      98.48%       1.4 GB
  6       2778       2713.2      97.67%       1.6 GB
  7       2383       2325.6      97.59%       1.8 GB
  8       2259       2034.9      90.08%       2.0 GB

Figure 4.18: Lake Louise, 187,069 x 40,202, 7.52 gigapixel, 1512 images. The Panorama Weaving results for the Lake Louise panorama. Our out-of-core seam computation produces this full resolution solution in as little as 37.7 minutes while requiring at most only 2.0 gigabytes of memory. Panorama courtesy of City Escapes Nature Photography.

4.7.2 Panorama Editing

We provide additional results of the interactive portion of our technique editing a variety of panoramas. Images which are color-corrected were processed using gradient domain blending [133, 103].

4.7.2.1 Editing bad seams. In Figure 4.19, the Nation dataset is a highly dynamic scene of a busy intersection with initial seams that pass through moving cars/people, see Figure 4.19 (d). In addition, there are various registration artifacts, see Figure 4.19 (e). Before our technique, a user would consider this panorama unsalvageable or be required to manually edit the boundary masks pixel-by-pixel. In just a few minutes using our system, a user can produce an appealing panorama by adjusting seams to account for the moving objects and pulling registration artifacts into areas where they are less noticeable.

Figure 4.19: Panorama Weaving on a challenging dataset (Nation, 12848 x 3821, nine images) with moving objects during acquisition, registration issues, and varying exposure. Our initial automatic solution (b) was computed in 4.6 seconds at full resolution for a result with lower seam energy than Graph Cuts. Additionally, we present a system for the interactive user exploration of the seam solution space (c), easily enabling: (d) the resolution of moving objects, (e) the hiding of registration artifacts (split pole) in low contrast areas (scooter), or (f) the fix of semantic notions for which automatic decisions can be unsatisfactory (stoplight colors are inconsistent after the automatic solve). The user editing session took only a few minutes. (a) The final, color-corrected panorama.

Figure 4.20 (a, b and c) shows the initial seam configuration for the Skating dataset with two problem areas. The initial seams pass through people who change position on the ice and produce
(a) The initial seam configuration for the Skating dataset (9400 x 4752, six images) based on gradient energy. (b and c) Its two major problem areas. (d and e) Using our technique, a user can repair the panorama but also has the choice of two valid seam configurations. Panorama courtesy of City Escapes Nature Photography.

As shown in the companion video, repairing these seams takes only a few seconds of interaction; see Figure 4.20 (d and e) for the edited results. Figure 4.21 illustrates how a user can correct registration artifacts that appear on the moon's horizon in the Apollo-Armstrong dataset.

4.7.2.2 Multiple valid seams. Along with repairing non-ideal seams, Figures 4.19 and 4.20 (Nation and Skating) are also examples of a user choosing between multiple valid seam configurations. In Figure 4.19 (f), the initial seam calculation for the Nation dataset produces an intersection with four red stoplights, an inconsistent configuration. With our system, a user can turn two stoplights green, creating a more realistic setting. Figure 4.20 (bottom) shows two valid seam configurations that the user can choose between while fixing the Skating dataset. Each was repaired with a simple bend of the panorama seam. In Figure 4.22, we provide an example of how a user can fix registration artifacts in the Graffiti dataset while tuning the seam location for improved results in the final color correction. For gradient-domain blending, smooth, low-gradient areas provide the best results; therefore, the user placed the seams in the smooth wall locations, Figure 4.22 (c). This editing session required just two minutes of interaction. Finally, in Figure 4.23 we show the color-corrected edits of the originally optimal, but not visually pleasing, seams of Figure 1.7 for the two datasets Canoe and Lake Path. Both interactions required only a few seconds of user input. Figure 4.24 is a lake vista with multiple dynamic objects moving in the scene during acquisition. In all, there are six independent areas in the panorama where a canoe, or group of canoes, changes position in overlap areas. Figure 4.24 shows two examples of alternative edits. A user editing with our technique has the choice of 64 valid seam combinations of canoes. In Figure 4.25, we show a user iterating through the valid splitting options of a five-valence branching point in the Fall-5way dataset. In this way, we allow users the freedom to add and remove seams as they see fit. Finally, the images Crosswalk and Apollo-Aldrin in Figure 1.9 were created and edited in our system to show how panoramas can have multiple valid seam configurations.

Figure 4.21: A panorama taken by Neil Armstrong during the Apollo 11 moon landing (Apollo-Armstrong: 6,913 x 1,014, eleven images). (a) Registration artifacts exist on the horizon. (b) Our system can be used to hide these artifacts. (c) The final color-corrected image. Panorama courtesy of NASA.
Figure 4.22: In this example (Graffiti: 10,899 x 3,355, ten images), (a) the user fixed a few recoverable registration artifacts and tuned the seam location for improved gradient-domain processing, yielding a colorful color-corrected graffiti. (b) Our initial automatic solution (energy function based on pixel gradients). (c) The user-edited panorama. The editing session took 2 minutes.

Figure 4.23: The color-corrected, user-edited examples from Figure 1.7. The artifacts caused by the optimal seams can be repaired by a user. Images courtesy of City Escapes Nature Photography.

Figure 4.24: A lake vista panorama (Lake: 7,626 x 1,231, 22 images) with canoes that move during acquisition. In all, there are six independent areas of movement; therefore, there are 64 possible seam configurations of different canoe positions. Here we illustrate two of these configurations with color-corrected versions of the full panorama (a and c) and a zoomed-in portion of each panorama (b and d) showing the differing canoe positions. Panorama courtesy of City Escapes Nature Photography.

Figure 4.25: Splitting a five-valence branching point based on gradient energy in the Fall-5way dataset (5211 x 5177, 5 images): as the user splits the pentagon, the resulting seams mask/unmask the dynamic elements. Note that each branching point with a valence higher than 3 can be further subdivided.

4.8 Limitations and Future Work

Our technique is versatile and can robustly handle a multitude of panorama configurations. However, there are currently limits on the configurations we can handle. The adjacency mesh data structure in its current form relies on the fact that the intersection of pairwise overlaps yields an area of exactly one connected component (which is needed to guarantee the manifold structure of the mesh). For example, fewer than one connected component would arise in a situation where one overlap is completely enclosed inside another, and more than one can be caused by an overlap's area passing through the middle of another overlap. Both of these cases break the pairwise seam network assumption. In addition, an image whose boundary is completely enclosed by another image's boundary (100% overlap) is currently considered invalid. These are pathological cases that we have yet to encounter in practice. Overall, the authors feel that these limitations are only temporary and that the data structures and methods outlined in this work are general enough to support these cases as a future extension.

The serial and parallel out-of-core implementations discussed above show that seam processing scales well to images gigapixels in size. The schemes show good scaling for our test datasets even though there is some redundancy in the file I/O: each image is loaded once for every multioverlap of which it is a member. In other words, an image is loaded (and reloaded) a number of times equal to the valence of its adjacency mesh node. As future work, my collaborators and I wish to explore caching strategies for images and overlaps, as well as how these strategies interplay with different face traversal orders.

CHAPTER 5

INTERACTIVE GRADIENT DOMAIN EDITING AT SCALE

This chapter introduces the Progressive Poisson technique, which brings the color-correction phase of the panorama creation pipeline into an interactive environment. This technique provides gradient domain processing, the most sophisticated and computationally expensive correction method, interactively. The operation is inherently global; therefore, before this work, there were no known techniques for applying it to massive images interactively. Section 5.2 outlines the interactive technique and a full resolution, out-of-core Progressive Poisson solver. Section 5.3 extends the full solver to a parallel distributed environment, and Section 5.4 shows how it can be redesigned as a cloud-based resource.
5.1 Gradient Domain Image Processing

Gradient domain image processing encompasses a family of techniques that manipulate an image based on the values of a gradient field rather than operating directly on the pixel values. Seamless cloning, panorama stitching, and high dynamic range tone mapping are all techniques that belong to this class. Given a gradient field $G(x,y)$, defined over a domain $\Omega \subset \mathbb{R}^2$, we seek an image $P(x,y)$ whose gradient $\nabla P$ fits $G(x,y)$. In order to minimize $\|\nabla P - G\|$ in a least squares sense, one has to solve the following optimization problem:

$$\min_P \iint_\Omega \|\nabla P - G\|^2 \quad (5.1)$$

It is well known that minimizing (5.1) is equivalent to solving the Poisson equation $\Delta P = \operatorname{div} G(x,y)$, where $\Delta$ denotes the Laplace operator $\Delta P = \frac{\partial^2 P}{\partial x^2} + \frac{\partial^2 P}{\partial y^2}$ and $\operatorname{div} G(x,y)$ denotes the divergence of $G$. To adapt these equations to discrete images, we apply a standard finite difference approach which approximates the Laplacian as:

$$\Delta P(x,y) = P(x+1,y) + P(x-1,y) + P(x,y+1) + P(x,y-1) - 4P(x,y) \quad (5.2)$$

and the divergence of $G(x,y) = (G_x(x,y), G_y(x,y))$ as:

$$\operatorname{div} G(x,y) = G_x(x,y) - G_x(x-1,y) + G_y(x,y) - G_y(x,y-1).$$

The differential form $\Delta P = \operatorname{div} G(x,y)$ can therefore be discretized into the following sparse linear system:

$$Lp = b. \quad (5.3)$$

Each row of the matrix $L$ stores the weights of the standard five-point Laplacian stencil given by (5.2), $p$ is the vector of pixel colors, and $b$ encodes the guiding gradient field as well as the boundary constraints. The choice of guiding gradient field $G(x,y)$ and boundary conditions for the system determines which image processing technique is applied. In the case of seamless cloning, it is necessary to use Dirichlet boundary conditions, set to the color values of the background image at the boundaries, with the guiding gradient being the gradient of the source image (see [133] for a detailed description). For tone mapping and image stitching, Neumann boundary conditions are used. The guiding gradient field for image stitching is the composited gradient field of the original images; the unwanted gradient between images is commonly set to zero or averaged across the stitch. The guiding gradient for tone mapping is adjusted from the original pixel values to compress the high dynamic range (HDR) (see [55] for more detail). Methods such as gradient domain painting [114] allow the guiding gradient to be user defined.
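To make the discretization concrete, here is a minimal sketch in Python with NumPy/SciPy (the function name and the pinning of one pixel to fix the free constant are our own illustrative choices, not the dissertation's implementation). It assembles the sparse system of (5.3) using the stencil of (5.2), dropping off-image taps in the spirit of Neumann boundaries, and solves it directly; at massive scales, this direct solve is precisely what the following sections avoid.

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import spsolve

def poisson_solve(gx, gy):
    """Assemble and solve L p = b (Eq. 5.3): L holds the five-point
    Laplacian stencil of Eq. 5.2, b the discrete divergence of the
    guiding field (gx, gy); off-image stencil taps are simply dropped
    (Neumann-style boundaries)."""
    n, m = gx.shape
    idx = lambda y, x: y * m + x
    L = lil_matrix((n * m, n * m))
    b = np.zeros(n * m)
    for y in range(n):
        for x in range(m):
            i = idx(y, x)
            # div G(x,y) = Gx(x,y) - Gx(x-1,y) + Gy(x,y) - Gy(x,y-1)
            b[i] = gx[y, x] - (gx[y, x - 1] if x > 0 else 0.0) \
                 + gy[y, x] - (gy[y - 1, x] if y > 0 else 0.0)
            for dy, dx in ((0, 1), (0, -1), (1, 0), (-1, 0)):
                yy, xx = y + dy, x + dx
                if 0 <= yy < n and 0 <= xx < m:
                    L[i, idx(yy, xx)] = 1.0
                    L[i, i] -= 1.0   # center weight = -(in-bounds neighbors)
    # The pure-Neumann system fixes p only up to an additive constant;
    # pin one pixel so the sparse direct solve is well posed.
    L[0, :] = 0.0
    L[0, 0] = 1.0
    b[0] = 0.0
    return spsolve(L.tocsr(), b).reshape(n, m)
```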
5.2 Progressive Poisson Solver

This section discusses a progressive framework for solving very large Poisson systems in massive image editing. The technique allows for a simple implementation, yet is highly scalable and performs well even with limited storage and processing resources.

5.2.1 Progressive Framework

For an image $P$ of $n \times n$ pixels, the Laplace system (5.3) has $n^2$ independent variables, one per pixel. Computing the entire solution is therefore expensive in both space and time. For large images, the space requirements easily exceed the main memory available on most computers. Moreover, the long computation times make any interactive application unfeasible. Acceleration methods try to address either or both of these issues. The recent adaptive formulation by Agarwala [2] has been particularly insightful: by exploiting the smoothness of the solution, this method was the first to reduce both the cost of the computation and its memory requirements. The approach by Kazhdan and Hoppe [89] demonstrates how a streaming approach can achieve high performance by optimizing the memory access patterns. We extend these acceleration techniques and show how to achieve high quality local solutions without the need to solve the entire system. Moreover, we show that coarse approximations are of acceptable visual quality without the cost of the typical coarsening stage used in the V-cycle. These new features, coupled with a simple multiresolution framework, enable a data-driven interactive environment that exploits the fact that interactive editing sessions are always limited by screen resolution. At any given time, a user only sees either a low resolution view of the entire image or a high resolution view of a small area. We take advantage of this practical restriction and solve the Poisson equation only for the visible pixels. This provides performance advantages for interactive sessions, as well as tight control over memory usage. For example, even the simple step of computing the gradient of the full resolution image can be problematic due to its significant processing time and storage requirements. In our approach, we avoid this problem by estimating gradients on-the-fly using only the available pixels. Overall, our interactive system is based on a simple two-tier approach:

• A global progressive solver provides a near-instant coarse approximation of the full solution. This approximation can be refined up to a desired solution by a lightweight process, often running in the background and possibly out-of-core. Any time the user changes input parameters, this process is restarted.

• A local progressive solver provides a quick solution for the visible pixels. This process is driven by the interactive viewer and uses as a coarse approximation the best solution available from the global solver.

These components can be coupled with different multiresolution hierarchies, as discussed in the next section.

5.2.1.1 Initial solution. At launch, the system computes a coarse image for the initial view. A fast two-dimensional direct method using cosine and Fast Fourier transforms by Agrawal [7, 8] is used for this initial solve for techniques that require Neumann boundaries (stitching, HDR compression). For methods that require Dirichlet boundaries (seamless cloning), we use an iterative method such as SOR. To provide the user with a meaningful preview, we use an initial coarse resolution of one to two megapixels, depending on the physical display. We have found, in practice, that the Fast Fourier solver usually gives us this approximation in under 2 seconds. This initial solution is at the core of the progressive refinement defined in the next paragraph.

5.2.1.2 Progressive refinement. The goal of progressive refinement is to increase the resolution of our solution either locally or globally. This requires injecting color transport information from coarser to finer resolutions. In doing so, we exploit the fact that the solution, away from the seams, tends to be smooth [2], so up-sampling the coarse solution gives high quality results in large areas of the image. To improve the solution and resolve the problems at the seams, we run an iterative method, estimating new gradients from the original pixel data of the finer resolution and using the up-sampled values as the initial solution estimate. The finer resolution gradient field allows the iterative solver to reconstruct the detailed features of the original image. For the iterative method, we allow the use of either conjugate gradient (for faster convergence) or SOR (for minimal memory overhead). The iterative solver is assumed to have converged when the $L^2$ norm of the residual between iterations is less than $1.0 \times 10^{-3}$. In practice, there is no perceptible difference between iterations after this condition is met.
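As a concrete illustration of one refinement step, here is a minimal sketch anticipating the two-neighbor difference averaging of Figure 5.1, described next (the function and the row-only doubling are our own simplification of the scheme, not the system's code):

```python
import numpy as np

def refine_rows(coarse_solved, coarse_orig, fine_orig):
    """One refinement step on a subsampled hierarchy (rows only, as in
    Figure 5.1): keep the solved rows, and initialize each newly added
    row from its original pixels offset by the average residual
    P_r = P_s - P_n of its two solved neighbors. A few SOR or CG
    iterations, with gradients from fine_orig, then smooth the result."""
    n, m = coarse_solved.shape
    residual = coarse_solved - coarse_orig          # P_r per solved pixel
    fine = np.empty((2 * n, m), dtype=np.float64)
    fine[0::2] = coarse_solved                      # solved rows carry over
    for y in range(n):
        below = residual[min(y + 1, n - 1)]         # clamp at the last row
        fine[2 * y + 1] = fine_orig[2 * y + 1] + 0.5 * (residual[y] + below)
    return fine
```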
Figure 5.1 shows the refinement process, where we assume for simplicity that each resolution doubles each dimension separately and that our data is a subsampled hierarchy. In this case, computing each finer resolution is equivalent to adding new rows (or columns) to the coarse resolution. Therefore, we know that each new pixel added has two neighbors from the coarse solution. We can take the average difference from these two neighbors and apply it to the original RGB value of the pixel from the new resolution (see Figure 5.1 (a)). Since the image is subsampled, computing the average difference and applying it to the new pixel is trivial. In a tiled hierarchy, one would need to double both dimensions at the same time, requiring a simple adjustment to the interpolation. Each new resolution is treated as new data, and the offset is based on the solution from the previous resolution and the transform between levels.

Figure 5.1: Our adaptive refinement scheme using simple difference averaging, with $P_r = P_s - P_n$, where $P_s$ is a solved pixel and $P_n$ is a new pixel from the original image; a newly inserted pixel is initialized as $P_n(x,y) + \frac{1}{2}(P_r(x-1,y) + P_r(x+1,y))$, and analogously with vertical neighbors, with resolutions progressing $n \times n \to 2n \times n \to 2n \times 2n \to \dots$ (a) Global progressive up-sampling of the edited image computed by a background process. (b) View-dependent local refinement based on a $2k \times 2k$ window. In both cases, we speed up the SOR solver with an initial solution obtained by smooth refinement of the coarser solution.

5.2.1.3 Local preview. Combining the coarse, global solve with a progressively refined local preview is all that is necessary for our interactive system. For data requests at resolutions equal to or less than our coarse solution, we simply display the available data. As the user zooms into an area, the image is progressively refined in a local region. Since the resolution increase is directly coupled with the decrease in the extent of the local view, the number of pixels that must be processed remains constant (see Figure 5.1 (b)). This results in a logarithmic run-time complexity and a constant storage requirement, which allows our system to gracefully scale to images orders of magnitude larger than previously possible.

5.2.1.4 Progressive full solution. The progressive refinement can be applied globally to compute a full solution. Since the method requires very little overhead, it can easily be run as a background process during the interactive preview. When a new resolution has been solved, the interactive preview uses the solution as a new coarse approximation, thereby saving computation during the local adaptive phase. Like other in-core methods, this progressive global solver is limited by available system memory. To address this issue, the global solver has the ability to switch modes to a moving-window, out-of-core progressive solver.

5.2.1.5 Out-of-core solver. The out-of-core solver maintains strict control over memory usage by sweeping the data with a sliding window. The window traverses the finest desired resolution, which can be several levels down the hierarchy from the currently available solution.
If the jump in resolution is too high, the process can be repeated several times. Within each window, the coarse solution is up-sampled and the new resolution image is solved using the gradients from the desired resolution. Since the window lives at the desired resolution, we never need to expand memory beyond the size of the window. Furthermore, windows are arranged such that they overlap with the previously solved data in both dimensions to produce a smooth solution. The overlap causes the solver to load and compute some of the data multiple times, an inherent overhead compared to an idealized in-core solver. For instance, given a $1/x$ overlap, the four corners, each $1/x \times 1/x$ in size, are executed four times. The four strips on the edges of the window, not including the corners, each $1/x \times (1 - 2/x)$ in size, are executed two times. All other pixels, covering $(1 - 2/x) \times (1 - 2/x)$, are executed once. Counting only the extra executions, the corners contribute $12/x^2$ and the edge strips $4/x(1 - 2/x)$; therefore, the computational overhead for this $1/x$ overlap is given by $4/x(1 + 1/x)$. Moreover, the I/O overhead can be reduced to $1/x$, since we can retain pixel data from the previous window in the primary traversal direction. In principle, a larger overlap between windows results in higher quality solutions, though in practice we have found that for a $1024 \times 1024$ window a $1/32$ overlap is sufficient for good results. This overlap requires only a 12.8% compute overhead and a 3.1% I/O overhead. A larger window can be used to reduce the percentage overlap while achieving the same quality results. For instance, by doubling the window size in both dimensions, a $2048 \times 2048$ window can be computed with a $1/64$ overlap, incurring only a 6.3% compute overhead and a 1.5% I/O overhead. Compared to the exact analytical solution, our method produces even higher quality results than the best known method [89] for equivalent run times.
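A quick numeric check of these overlap overheads (a throwaway sketch; the function is ours, the quantities are those quoted above):

```python
def overlap_overhead(x: float) -> tuple[float, float]:
    """Extra work of the sliding-window solver for a 1/x overlap:
    corners run four times, edge strips twice, interior pixels once,
    so the extra compute is 12/x**2 + (4/x)(1 - 2/x) = 4/x * (1 + 1/x);
    retaining pixels from the previous window leaves 1/x extra I/O."""
    compute = 12 / x**2 + (4 / x) * (1 - 2 / x)
    return compute, 1 / x

print(overlap_overhead(32))  # ~(0.129, 0.031): the 12.8%/3.1% above, up to rounding
print(overlap_overhead(64))  # ~(0.063, 0.016): the 6.3%/1.5% above
```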
5.2.2 Data Access

Our progressive solver can operate well on multiple hierarchical schemes. Tiled hierarchies are often used to produce smoother, antialiased images, though high contrast areas in the original image may be lost in the smoothing. As Figure 5.2 (b) shows, the tiled image is visually pleasing, but details such as the cars on the highway are lost. This visual smoothness can also come at the cost of significant preprocessing, reduced flexibility when dealing with missing data, and increased I/O when traversing the data. These costs can be especially significant for massive data if one has to process it with very limited resources. The least costly image hierarchy can be computed by subsampling. Subsampling is simple and lightweight but is prone to high frequency aliasing. It does, though, retain higher contrast at the coarse resolutions. Figure 5.2 (a) shows how the subsampled hierarchy has aliasing artifacts but also retains enough contrast to see the cars on the highway. This contrast may be beneficial for some applications, such as an analyst studying satellite imagery. To show the flexibility of our interactive system, we support both a filtered tiled hierarchy and a subsampled hierarchy (see Figure 5.3).

Figure 5.2: Subsampled and tiled hierarchies. (a) A subsampled hierarchy. As expected, subsampling has the tendency to produce high-frequency aliasing, though details such as the cars on the highway and in the parking lots are preserved. (b) A tiled hierarchy. This produces a more visually pleasing image at all resolutions, but at the cost of potentially losing information. The cars are now completely smoothed away. Data courtesy of the U.S. Geological Survey.

Figure 5.3: Our progressive framework using subsampled and tiled hierarchies. (a) A composite satellite image of Atlanta, over 100 gigapixels at full resolution, overlaid on a subsampled Blue Marble background; (b) a tiled version of the same satellite image; (c) the seamless cloning solution using subsampling; (d) the same solution computed using a tiled hierarchy; (e) the solution offset computed using subsampling; (f) the solution computed using tiles; (g) a full resolution portion computed using subsampling; (h) the same portion using tiling. Note that even though there is a slight difference in the computed solutions, both the tiled and the subsampled hierarchies produce a seamless stitch with our framework. Data courtesy of the U.S. Geological Survey and NASA's Earth Observatory.

For a tiled scheme, we compute the image hierarchy using a Gaussian kernel to produce a smooth, antialiased image (Figure 5.3, right column). With a minor variation to the underlying I/O layer, our system also supports a faster, subsampled Hierarchical Z-order as proposed by Pascucci and Frank [2002] (Figure 5.3, left column). For an overview of the HZ data format, see Section 3.1. To achieve the level of scalability necessary in the current system, we further simplify the HZ data access scheme. We use a lightweight recursive algorithm that avoids repeated index computations, provides progressive and adaptive access, guarantees cache coherency, and minimizes the number of I/O operations without using any explicit caching mechanisms. In particular, computing the HZ index with this new algorithm attains a thirty-times speedup compared to the previous work. For example, to compute the indices for a 0.8 gigapixel image, the new algorithm requires 4.7 seconds where the previous method would take 144.1 seconds. Moreover, since the traversal follows the HZ storage order exactly for any query window, we guarantee that each file is accessed only once without the need to hold any block of data in cache. For details on our new recursive algorithm, see Section 3.2. This approach makes the system intrinsically cache friendly for any realistic caching architecture and, therefore, very flexible in exploiting modern hardware. Conversion into HZ-order requires no additional storage. On the other hand, for tiled hierarchies a 1/3 data increase is common. Due to our new data access scheme, conversion to HZ-order is straightforward and inexpensive. For our test data, we have found that there is only a 27% overhead for the conversion compared to just copying the raw data. In essence, the conversion is strictly a reordering of the data and requires no operations on the pixel data. This conversion will outperform even the simplest tiled hierarchies, which require some manipulation of the pixel data. Each resolution in the HZ-hierarchy is in plain Z-order, which allows for fast, cache coherent access to subregions of the image. HZ is not tied to a specific data traversal order, such as the row-major order imposed by traditional file formats, as previously observed in [89]. In fact, HZ maintains a high degree of cache coherency even during adaptive local traversals.
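For intuition, the plain Z-order component of such an index can be computed by bit interleaving; a minimal sketch (our own illustrative function, omitting the hierarchical level prefix that Section 3.1 describes on top, and not the recursive algorithm of Section 3.2):

```python
def z_index(x: int, y: int, bits: int = 16) -> int:
    """Plain Z-order (Morton) index: interleave the bits of x and y so
    that pixels close in 2D stay close in the 1D file layout. Each
    resolution level of an HZ hierarchy is stored in this order."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)        # even bits come from x
        z |= ((y >> i) & 1) << (2 * i + 1)    # odd bits come from y
    return z
```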
The locality of our data access provides graceful performance degradation even in extreme conditions. In particular, we demonstrate accessing a dataset of roughly a terabyte in size by simply mounting a remote file system over an encrypted VPN channel via a wireless connection. Even in normal running conditions, we have found that the I/O overhead caused by using a tiled hierarchy increased the running time by 39%-67%. These numbers reflect the theoretical bound of 1/3 overhead, made worse by the inability to constrain real queries to align perfectly with the boundaries of a quadtree. The effect of this overhead is detrimental to the scalability of the system under more difficult running conditions such as the one mentioned above. Moreover, HZ easily handles partially converted data, as we show in one portion of the accompanying video for the editing of the Salt Lake City panorama. In a tiled scheme, the entire hierarchy may need to be recomputed as new data is added.

5.2.3 Interactive Preview and Out-of-Core Solver Results

We demonstrate the scalability and interactivity of our approach on several applications, using a number of images ranging from megapixels to hundreds of gigapixels in size. To further illustrate the responsiveness of our system, the accompanying video shows screen captures of live demonstrations. To highlight particular details and validate the approach, the figures in this section show previews and close-ups from our interactive system alongside the results of our full out-of-core progressive solver. We also provide running times of our full out-of-core solver compared with the best current method, streaming multigrid [89], which we have verified to use the same gradient information. All timings and demos were performed on a 64-bit 2.67 GHz Intel Quad Core desktop with 8 gigabytes of memory. All streaming multigrid timings were computed from code provided by the authors and include the timing for the gradient preprocess along with the timing to produce a solution. Our simple framework provides the illusion of a fully solved Poisson system at interactive frame rates and under continuous parameter changes with only a simple GL texture for display and no special hardware acceleration; therefore, our code is platform independent. Our simple progressive out-of-core solver produces robust solutions with run times that rival [89]. Unlike the previous method, our out-of-core solver does not use hardware acceleration and did not undergo heavy code optimization to achieve the following runtimes. The solver is also sequential and uses no threading to accelerate the computation. If further optimization of the run times is desired, there is nothing in our system to prevent the addition of these acceleration techniques. Different from other out-of-core methods, we do not rely on large external memory data structures and we do not need to precompute gradients for the entire image. For the Salt Lake City panorama, for example, the streaming multigrid method [89] creates 75.2 gigabytes of auxiliary information for a 7.9 gigabyte input image. While disk space is generally assumed to be plentiful, such an explosion in disk space is unsustainable for images hundreds of gigapixels in size.
The collection of satellite imagery we use in our video is more than one terabyte in size and would, therefore, require more than 9.5 terabytes of temporary storage. The Edinburgh example is 25 images with resolution 16,950 x 2,956 (50 megapixel). At launch, our system performs a seamless-stitch Poisson solve of a global 0.7 megapixel image in 1.26 seconds using our direct analytical solve; see Figure 5.4 (a). From this point on, the system can pan and zoom interactively as if the full solution were already available. Our local adaptive refinement gives a solution that is visually equivalent to a solution of the entire system; see Figure 5.4 (c, d, and e). In the accompanying video, we demonstrate interactive editing and solving of the Poisson system after the repeated user-selected replacement of pixels of a particular color. We also perform a seamless clone of a 2000 x 1600 airplane onto Edinburgh's cloudy sky. The plane is animated along a linear path across the panorama. As evident in the video, our framework shows the entire sequence in real time. We also demonstrate similar interactive editing with the Redrock panorama (data courtesy of Aseem Agarwala): nine images, 19,588 x 4,457; 87 megapixels.

Figure 5.4: The Edinburgh panorama, 16,950 x 2,956 pixels. (a) Our coarse solution computed at a resolution of 0.7 megapixels; (b) the same panorama solved at full resolution with our progressive global solver, scaled to approximately 12 megapixels for publication; (c) a detail view of a particularly bad seam from the original panorama; (d) the problem area previewed using our adaptive local refinement; (e) the problem area solved at full resolution using our global solver in 3.48 minutes.

Given this initial coarse solution, our method can produce a full solution of Edinburgh, see Figure 5.4 (b), in 3.48 minutes. The streaming multigrid method requires 3.52 minutes. Figure 5.5 (a) shows the convergence and error for our method and streaming multigrid when compared to the ideal direct solution. The Salt Lake City example is 611 images with a resolution of 126,826 x 29,633 (3.27 gigapixel). This significantly larger example is a panorama captured with a simple camera mounted on a GigaPan robot [63]. To maximize individual image quality, the pictures were taken with automatic exposure times, which inherently increases the color differences between images that need to be corrected by the Poisson solver. An initial coarse preview of 0.87 megapixels is computed by our direct analytical solver in 2.07 seconds. Figure 5.6 shows the original set of images (a), the panorama that our system stitches in real time (b), the global solution provided by our out-of-core solver (c), and the difference image between the interactive preview and the final solution at the coarse resolution (d). There are slight deviations at some of the more challenging seams, but overall there is negligible visible difference. Our local adaptive preview mimics the global solution well, as shown in Figure 5.7. To test the accuracy of the methods, we have run a full analytical Poisson solver on a 485 megapixel subset of the panorama on an HPC computer. Figures 5.8 (a) and (b) show how close our out-of-core solution comes to the exact analytical solution. Figure 5.8 (c) shows that the multigrid method has yet to converge to an acceptable solution given an equivalent amount of running time.
All solutions were computed using the map shown in Figure 5.8 (g). Allowing the solver to harmonically fill the panorama boundary (Figure 5.8 (d-f)) increases the memory usage of the method.

Figure 5.5: The RMS error compared to the ideal analytical solution as we increase iterations for both methods: (a) streaming multigrid and (b) our progressive out-of-core solver, each on the SLC portion, plotting time (s) and RMS error against iterations. Streaming multigrid has better convergence and less error for the Edinburgh example (a), though our method remains stable for the larger Salt Lake City panorama (b). Notice that every plot has been scaled independently to best illustrate the convergence trends of each method.

Figure 5.6: Panorama of Salt Lake City, 3.27 gigapixel, obtained by stitching 611 images. (a) Mosaic of the original images. (b) Our solution computed at 0.9 megapixel resolution. (c) The full solution provided by our global solver. (d) The difference image between our preview and the full solution at the preview resolution. Both (a) and (c) have been scaled for publication to approximately 12.9 megapixels.

Figure 5.7: A comparison of our adaptive local preview on a portion of the Salt Lake City panorama at one half of the full resolution: (a) the original mosaic, (b) our adaptive preview, (c) the full solution from our global solver, and (d) the difference image between the adaptive preview and the full solution.

Figure 5.8: A comparison of our system with the best known out-of-core method [Kazhdan and Hoppe 2008] and a full analytical solution on a portion of the Salt Lake City panorama, 21,201 x 24,001 pixels, 485 megapixel: (a) the full analytical solution; (b) our solution computed in 28.1 minutes; (c) the solution from [Kazhdan and Hoppe 2008] computed in 24.9 minutes; (d) the analytical solution where the solver is allowed to harmonically fill the boundary; (e) our solution with harmonic fill; (f) the solution from [Kazhdan and Hoppe 2008] with harmonic fill; (g) the map image used by all solvers to construct the panorama, where red indicates the image that provides the pixel color and white denotes the panorama boundary.

The Sierpinski Sponge example has resolution 128k x 128k (16 gigapixel). We have tested the tone mapping application on a synthetic high dynamic range image generated with MegaPOV [118]. In this image we use a partially refined model of a Sierpinski Sponge to create high variations in level-of-detail. Such details can be completely hidden in the dark areas under projected shadows. We follow the approach introduced by Fattal [55] to reconstruct the information hidden in the dark regions. To validate the approach, we ran a typical HDR test image, the Belgium House, progressively refined from a 16 x 12 coarse solution. Even with such a coarse initial solution, we achieve results very close to the exact solution (see Figure 5.9 (c) and (d)). Figure 5.9 shows the original sponge model (a) and the processed version (b), where all the details under the shadows have been recovered.
Figure 5.9: Application of our method to HDR image compression: (a) original synthetic HDR image of an adaptively refined Sierpinski sponge generated with Povray; (b) tone mapped image with recovery of detailed information previously hidden in the shadows; (c) the Belgium House image solved using our coarse-to-fine method with an initial 16 x 12 coarse solution (α = 0.01, β = 0.7, compression coefficient 0.5); (d) the direct analytical solution. Image courtesy of Raanan Fattal.

The satellite example contains a Blue Marble background image that is 3.7 gigapixel and imagery of Atlanta and other cities that is well over 100 gigapixels. To demonstrate the scalability of our system, we have run the seamless cloning algorithm for entire cities over a variety of realistic backgrounds from NASA's Blue Marble Collection [125] (see Figure 5.10). We show how a user can take advantage of these capabilities to achieve artistic effects and create virtual worlds from real data. We also create a dynamic environment by animating the background world map over 12 months and concurrently using the Poisson solver to show how the appearance of a city would change across the seasons.

Figure 5.10: Satellite imagery collection with a background given by a 3.7 gigapixel image from NASA's Blue Marble Collection. The Progressive Poisson solver allows the application of the seamless cloning method to two copies of the city of Atlanta, each of 116 gigapixels. An artist can interactively place a copy of Atlanta under shallow water and recreate the lost city of Atlantis. Data courtesy of the U.S. Geological Survey and NASA's Earth Observatory.

5.3 Parallel Distributed Gradient Domain Editing

In the following, we provide details of our parallel Progressive Poisson algorithm and its MPI implementation. This new algorithm reduces the time to compute a full resolution gradient domain solution from hours, in the case of a single out-of-core solver, to minutes when run on a distributed cluster.

5.3.1 Parallel Solver

Commonly, large images are stored as tiles, which gives one an underlying structure to divide an image among the nodes/processors of a distributed solver. Tile-based distributed solvers have been shown to work well when only local trends are present, but seamless stitching commonly contains large scale trends, for which a naive tile-based approach will provide poor results. The addition of the Progressive Poisson method's coarse upsampling allows for a simple, tile-based parallel solver that can account for large trends. Our algorithm works in two phases: the first phase performs the progressive upsample of a precomputed coarse solution for each tile; the second phase solves for a smooth image on tiles that significantly overlap the solution tiles from the first phase. In this way, the second phase smooths any seams not captured, or even introduced, by the first phase, producing a complete, seamless image.

5.3.1.1 Data distribution as tiles. Although a tile-based approach leverages a common image storage format, it is not typically how methods are designed to handle seamless stitching of large panoramas. For instance, methods like streaming multigrid [89, 90] often assume precomputed gradients for the whole image. Our system is designed to take tiles directly as input and therefore must be able to handle the gradient computation on-the-fly. An important and often undocumented component of panorama stitching is the map or label image.
Given an ordered set of images which compose the panorama, the map image gives the correspondence of a pixel location in the overall panorama to the smaller image that supplies its color. This map file is necessary to determine the difference between actual gradients and those due to seams. The map also defines the boundaries of the panorama, which are commonly irregular. This file, along with each individual image that composes the mosaic, is needed for a traditional out-of-core scheme [89, 149] for gradient computation. If the gradient across the seams is assumed to be zero, which is a common technique we adopt for this solver, each tile can be composited in advance and the map file is only needed to denote image seams or boundary. As noted above, this composited tile is often already provided if used in a traditional large image system. The map file can then be encoded as an extra channel of color information, typically the alpha channel. For mosaics of many hundreds of images, such as the examples provided in this dissertation, we cannot encode an index for each image in a single byte of data. However, since in practice a tile is very unlikely to overlap more than 256 individual images, each image is given a unique 0-255 number on a per-tile basis.

We have chosen an overlap of 50% in both dimensions for the second phase windowing scheme of the parallel solver for simplicity of implementation. Each window is composed of a 2 x 2 collection of tiles. To avoid undefined windows in the second phase, we add a symbolic padding of one row/column of tiles to all sides of the image, which the solver regards as pure boundary. Figure 5.11 gives an example of a tile layout. The overlapping window size used for our testing was 1024 x 1024 pixels (assuming 512 x 512 tiles), which we found to be a good compromise between a low memory footprint and image coverage. Each node receives a partition of windows equivalent to a contiguous subimage, with no overlap necessary between nodes during the same phase. Data can be distributed evenly across all nodes in the case of a homogeneous distributed system, or dependent on weights due to available resources in the case of heterogeneous hardware. We provide a test case for a heterogeneous system in Section 5.3.2.3.

5.3.1.2 Coarse solution. As a first step, the first phase of our solver upsamples a 1-2 megapixel coarse solution via bilinear interpolation. Much like the Progressive Poisson method [149], each node computes this solution in just a few seconds using a direct FFT solver on a coarsely sampled version of our large image. In tiled hierarchies, this coarse image is typically already present and can be encoded with the map information in much the same way as the tiles.

5.3.1.3 First phase: progressive solution. This phase computes a Progressive Poisson solution for each window, which is composed of tiles read off of a distributed file system.

Figure 5.11: Our tile-based approach: (a) An input image is divided into equally spaced tiles. In the first phase, after a symbolic padding by a column and row in all dimensions, a solver is run on a window denoted by a collection of four labeled tiles. Data are sent and collected for the next phase to create new data windows with a 50% overlap. (b) An example tile layout for the Fall Panorama example.
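The per-tile label encoding described in Section 5.3.1.1 might look like the following minimal sketch (our own illustration, assuming NumPy arrays; the function name and the derivation of the local ids are ours):

```python
import numpy as np

def encode_map_in_alpha(tile_rgb, tile_map):
    """Store the label map in the tile's alpha channel. tile_map holds
    the globally unique source-image id per pixel; ids are remapped to
    tile-local 0-255 labels, assuming (as noted in the text) that no
    single tile overlaps more than 256 source images."""
    ids = np.unique(tile_map)
    assert len(ids) <= 256, "tile overlaps more than 256 source images"
    alpha = np.zeros(tile_map.shape, dtype=np.uint8)
    for local, gid in enumerate(ids):         # per-tile 0-255 numbering
        alpha[tile_map == gid] = local
    return np.dstack((tile_rgb, alpha))       # RGBA tile ready for storage
```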
To progressively solve a window, an image hierarchy is necessary. For our implementation, a standard power-of-two image pyramid was used. As a first step, the solver upsamples the solution to a finer resolution in the image pyramid using a coarse solution image and the original pixel values. An iterative solver is then run for several iterations to smooth this upsample, using the original pixel gradients as the guiding field. This process is repeated down the image hierarchy until the full resolution is reached. The solver is considered to have converged at a resolution when the $L^2$ norm falls below $10^{-3}$, which is based on the range of byte color data. From our testing, we have found that SOR gives both good running times and low memory consumption and is therefore our default solver. As noted above, each window is logically composed of four tiles, which are computed and saved in memory for the next phase as floating point color data. This leads to 12 bytes/pixel (three floating point color channels) to transfer between phases. Given the data distribution, one node may process many windows. If this is the case, only the tiles that border a node's domain are prepared to be transferred to another node, thereby keeping data communication between phases to a relatively small zone.

5.3.1.4 Second phase: overlap solution. The second phase gathers the four tiles (both solution and original) that make up an overlapping window. After the data are gathered, the gradients are computed from the original pixel values, and an iterative solver (SOR) is run after being initialized with the solutions from the first phase. The iterative solver is constrained to work only on interior pixels to prevent this phase from introducing new seams at the window boundary. Technically, there may be errors at the pixels around the midpoints of the boundary edges of these windows, though we have not encountered this in practice. Again, this solver is run until convergence as given by the $L^2$ norm. Note that even though the tile gradients are computed in the first phase, we have chosen to recompute them on the fly in the second phase. Passing the gradients would cost at least an additional 12 bytes/pixel of overhead. As node counts increase, data transfer and communication become a significant bottleneck in most distributed schemes; therefore, we chose to pay the cost of increased computation and of reading the less expensive byte image data from the distributed file system instead of the costly transfer.

5.3.1.5 Parallel implementation details. Each node has one master thread which coordinates all processing and communication. The core component of this thread is a priority queue of windows and tiles to be processed. At launch, this queue is initialized by a separate seeding thread with the initial domain of windows to be solved in the first phase. Because the seeding of the queue is separated from the main thread, the main thread can begin processing windows immediately. Each window is given a first phase id, which is the window's row and column location in the subimage to be processed by a node.
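Under our reading of Figure 5.11, the window ids of both phases could be enumerated as in the following sketch (the tile indices, the placement of the symbolic padding at index -1, and the function name are our own illustrative choices):

```python
def phase_windows(tile_rows: int, tile_cols: int):
    """Enumerate the 2 x 2 tile windows of both phases for a grid of
    tile_rows x tile_cols tiles padded by one symbolic boundary tile
    (index -1) on all sides; assumes an even tile count per dimension.
    Phase-2 windows are shifted by one tile in each dimension, giving
    the 50% overlap with the phase-1 solutions."""
    phase1 = [(r, c) for r in range(-1, tile_rows, 2)
                     for c in range(-1, tile_cols, 2)]
    phase2 = [(r, c) for r in range(0, tile_rows - 1, 2)
                     for c in range(0, tile_cols - 1, 2)]
    return phase1, phase2  # each (r, c) is a window's top-left tile
```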
Communication between nodes need only be one-way in our system; therefore, we have chosen for communication to be "upstream" between nodes, i.e., toward the nodes operating on a subimage with a horizontal or vertical location greater than that of the current node. In order to avoid starvation in the second phase, the queue is loaded with windows in reverse order in terms of the tile id. Figure 5.12 gives an example of the traversal and communication. All initially seeded windows are given equal low priority in the queue; in essence, the initial queue operates much like a first-in-first-out (FIFO) queue. As windows are removed from the queue, the main thread launches a progressive solver thread which is handed off to an intra-node dynamic scheduler. Our implementation uses a HyperFlow [170] scheduler to execute the solver on all available cores. HyperFlow has been shown to efficiently schedule the execution of workflows on multicore systems and is therefore a perfect solution for our intra-node scheduling. In all, there are two distinct sequential stages in each phase: (1) the loading of the tile data and the computation of the image gradient, and (2) the progressive solution. This flow information allows HyperFlow to exploit data, task, and pipeline parallelism to maximize throughput. After a solution is computed, the progressive solver thread partitions the window into the tiles that comprise it. This allows the second phase to recombine the tiles needed for the 50% overlap window. All four tiles are loaded into the queue with high priority. If the main thread removes a tile (as opposed to a window) from the queue and the tile is needed by another node, the main thread immediately sends the data asynchronously to the proper node. Otherwise, if the node itself needs the tile for phase two, the second phase id of the window which needs the tile is computed and hashed with a two-dimensional hash function the same size as the window domain for the second phase. If all four tiles for a given second phase window have been hashed, the main thread knows the second phase window is ready and immediately passes it to a solver thread for processing. If the main thread receives a solved tile from another node, this tile is also immediately hashed.

Figure 5.12: Windows are distributed as evenly as possible across all nodes in the distributed system. Windows assigned to a specific node are denoted by color above. Given the overlap scheme, data transfer only needs to occur one-way, denoted by the red arrows and boundary above. To avoid starvation between phases and to hide as much data transfer as possible, windows are processed in inverse order (white arrows) and the tiles needed by other nodes are transferred immediately.
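The tile hashing just described might be sketched as follows (a minimal illustration; `make_tile_collector`, `solve`, and the window-id convention are our own hypothetical names, matching the enumeration sketch above):

```python
from collections import defaultdict
from typing import Callable, Dict, Tuple

Tile = Tuple[int, int]   # (row, col) of a tile in the grid

def make_tile_collector(solve: Callable[[Tile, Dict[Tile, object]], None]):
    """Master-thread bookkeeping for phase two: hash each incoming
    solved tile into the phase-2 window that needs it, and hand the
    window to a solver thread (via `solve`) once all four tiles of
    its 2 x 2 block have arrived."""
    pending: Dict[Tile, Dict[Tile, object]] = defaultdict(dict)

    def on_tile(tile: Tile, data: object) -> None:
        r, c = tile
        window = (r // 2 * 2, c // 2 * 2)   # top-left tile of its window
        pending[window][tile] = data
        if len(pending[window]) == 4:       # all 2 x 2 tiles gathered
            solve(window, pending.pop(window))

    return on_tile
```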
5.3.2 Results

To demonstrate the scalability and adaptability of the approach, we have tested our implementation using two panorama datasets, gigapixels in size. To illustrate the portability of the system, we have also measured its running times and scalability on two distributed systems. Our main system, the NVIDIA Center of Excellence cluster in the Scientific Computing and Imaging Institute at the University of Utah, consists of 60 active nodes with 2.67 GHz Xeon X5550 processors (8 cores), 24 GB of RAM per node, and 750 GB of local scratch disk space. The second system, the Longhorn visualization cluster at the Texas Advanced Computing Center at the University of Texas at Austin, consists of 256 nodes (of which 128 were available for our tests) with 2.5 GHz Nehalem processors (8 cores), 48 GB of RAM per node, and 73 GB of local scratch disk space. Weak and strong scalability tests were performed on both systems. Given the proven scalability of HyperFlow on one node, we have tested the scalability of the MPI implementation from 2-60 and 2-128 nodes on the NVIDIA and Longhorn clusters, respectively. Timings are taken as the best over several runs to discount external effects on the cluster from shared resources such as the distributed file system. The datasets used for testing were:

• Fall Panorama: 126,826 x 29,633, 3.27 gigapixel. When tiled, this dataset is composed of 124 x 29 windows of size 1024². See Figure 5.13 for image results from a 480-core test run on the NVIDIA cluster.

• Winter Panorama: 92,570 x 28,600, 2.65 gigapixel. When tiled, this dataset is composed of 91 x 28 windows of size 1024². See Figure 5.14 for image results from a 480-core test run on the NVIDIA cluster.

5.3.2.1 NVIDIA cluster. To show the MPI scalability of our framework and implementation, strong and weak scaling tests were performed for 2-60 nodes. As shown in Tables 5.1 and 5.2, both datasets scale close to ideal and with high efficiency under strong scaling.

Figure 5.13: Fall Panorama, 126,826 x 29,633, 3.27 gigapixel. (a) The panorama before seamless blending and (b) the result of the parallel Poisson solver run on 480 cores with 124 x 29 windows, computed in 5.88 minutes.

Figure 5.14: Winter Panorama, 92,570 x 28,600, 2.65 gigapixel. (a) The result of the parallel Poisson solver run on 480 cores with 91 x 28 windows, computed in 6.02 minutes, (b) the panorama before seamless blending, and (c) the coarse panorama solution.

Table 5.1: Strong scaling results for the Fall Panorama run on the NVIDIA cluster from 2-60 nodes, up to a total of 480 cores. Overhead (O/H) due to MPI communication and I/O is also provided, along with its percentage of actual running time. The Fall Panorama, due to its larger size, begins to lose efficiency at around 32 nodes, when I/O overhead begins to dominate. Even with this overhead, the efficiency (Eff.) remains acceptable.

Strong Scaling - Fall Panorama - NVIDIA cluster
Nodes   Cores   Ideal (m)   Actual (m)   Eff. %   O/H (m)   % O/H
2       16      79.35       79.35        100.0    18.80     23.7
4       32      39.68       40.08        97.1     9.05      22.2
8       64      19.84       20.83        95.2     7.28      35.0
16      128     9.92        11.43        78.9     6.50      51.7
32      256     4.96        6.20         53.8     6.20      67.3
48      384     3.31        6.40         51.7     6.40      100.0
60      480     2.65        5.88         45.0     5.88      100.0

Table 5.2: Strong scaling results for the Winter Panorama run on the NVIDIA cluster from 2-60 nodes, up to a total of 480 cores. Overhead (O/H) due to MPI communication and I/O is also provided, along with its percentage of actual running time. For the Winter Panorama, the I/O overhead does not affect performance up to 60 nodes, and the implementation maintains efficiency (Eff.) throughout all of our runs.
Strong Scaling - Winter Panorama - NVIDIA cluster
Nodes   Cores   Ideal (m)   Actual (m)   Eff. %   O/H (m)   % O/H
2       16      128.87      128.87       100.0    8.63      6.7
4       32      64.43       77.68        82.9     4.70      6.1
8       64      32.22       40.63        79.3     4.28      10.5
16      128     16.11       21.17        76.1     4.17      19.7
32      256     8.05        10.88        74.0     4.08      37.5
48      384     5.37        6.98         76.9     4.10      58.7
60      480     4.30        6.02         71.4     4.00      66.5

The Fall Panorama, due to its larger size, begins to lose efficiency at around 32 nodes, when I/O overhead begins to dominate; even with this overhead, the efficiency remains acceptable. For the Winter Panorama, the I/O overhead does not affect performance up to 60 nodes, and the implementation maintains efficiency throughout the test. Weak scaling tests were performed using a subimage of the Fall Panorama dataset; see Table 5.3 for the results. As the number of cores increases, so does the image resolution to be solved. The subimage was expanded from the center of the full image, and the iterations of the solver for all windows were locked at 1000 during testing to ensure that no variation is due to slower converging image areas. As the table shows, our implementation exhibits good weak scaling efficiency even for 60 nodes with 480 cores. In all, we have produced a gradient domain solution for a dataset which the best known methods in previous work [89, 149] took hours to compute.

Table 5.3: Weak scaling tests run on the NVIDIA cluster for the Fall Panorama dataset. As the number of cores increases, so does the image resolution to be solved. The image was expanded from the center of the full image. Iterations of the solver for all windows were locked at 1000 for testing to ensure that no variation is due to slower converging image areas. As is shown, our implementation maintains good efficiency even when running on the maximum number of cores.

Weak Scaling - NVIDIA cluster
Nodes   Cores   Size (MP)   Time (min.)   Efficiency
2       16      100.66      5.55          100.00%
4       32      201.33      5.55          100.00%
8       64      402.65      5.53          100.30%
16      128     805.31      5.68          97.65%
32      256     1610.61     5.77          96.24%
60      480     3019.90     6.57          84.52%

5.3.2.2 Longhorn cluster. To show the portability and MPI scalability of our framework and implementation, strong and weak scaling tests were performed on the largest dataset (Fall Panorama) on a second cluster. The strong scaling tests were performed from 2-128 nodes, and the weak scaling tests, limited by the size of the image, were performed from 2-64 nodes. As shown in Table 5.4, our implementation maintains very good efficiency and timings for our strong scaling test up to the full 1024 cores available on the system.

Table 5.4: To demonstrate the portability of our implementation, we have run strong scalability tests for the Fall Panorama on the Longhorn cluster from 2-128 nodes, up to a total of 1024 cores. As the numbers show, we maintain good scalability and efficiency even when running on all available nodes and cores.
Strong Scaling - Fall Panorama - Longhorn
Nodes   Cores   Ideal (m)   Actual (m)   Efficiency
2       16      84.07       84.07        100%
4       32      42.03       43.18        97%
8       64      21.02       21.85        96%
16      128     10.51       12.08        87%
32      256     5.25        6.93         76%
64      512     2.63        3.89         68%
128     1024    1.31        2.73         48%

Much like on the NVIDIA cluster, weak scaling tests were performed on a portion of the Fall Panorama with the iterations of the solver locked at 1000. To ensure that each node received a reasonably sized subimage to solve, the tests were limited to 64 nodes. Table 5.5 demonstrates our implementation's ability to weak scale on this cluster, maintaining good efficiency for up to 512 cores.

Table 5.5: Weak scaling tests run on the Longhorn cluster for the Fall Panorama dataset.

Weak Scaling - Longhorn cluster
Nodes   Cores   Size (MP)   Time (min.)   Efficiency
2       16      75.5        5.50          100.00%
4       32      151         6.13          89.67%
8       64      302         6.15          89.43%
16      128     604         6.15          89.43%
32      256     1208        6.13          89.67%
64      512     2416        6.15          89.43%

5.3.2.3 Heterogeneous cluster. As a final test of portability and adaptability, we presented our implementation with a simulated heterogeneous distributed system. Our parallel framework provides the ability to assign weights to nodes; these weights are typically even, resulting in an even distribution of windows across all nodes. For this example, a simple weighting scheme can easily load-balance a mixed network, giving the nodes with more resources more windows to compute. Table 5.6 gives an example mixed system of two 8-core nodes, four 4-core nodes, and eight 2-core nodes; in all, this system has 48 available cores. The weights for our framework are simply the number of cores available in each node. This network was simulated using the NVIDIA cluster by overriding HyperFlow's knowledge of available resources with our desired properties. While this is not a perfect simulation, since the main thread handling MPI communication would not be limited to residing on the desired cores, the strong scaling tests show that even with evenly distributed data on 8-16 nodes the implementation is not yet I/O bound; therefore, we should still have a good approximation of a real, limited system. Table 5.6 details the window distribution and timings for the Fall Panorama for all nodes in this test. As is shown, we maintain good load balancing given proper node weighting when dealing with heterogeneous systems. The maximum runtime of 32.70 minutes for this 48-core system is on par with the runtimes for the 32-core (40.08 minutes) and 64-core (20.83 minutes) strong scaling results.

Table 5.6: Our simulated heterogeneous system. This test example is a simulated mixed system of 2 8-core nodes, 4 4-core nodes, and 8 2-core nodes. The weights for our framework are the number of cores available in each node. The timings and window distributions are for the Fall Panorama dataset. With the proper weightings, our framework distributes windows proportionally based on the performance of the system. The maximum runtime of 32.70 minutes for this 48-core system is on par with the timings for the 32-core (40.08 minutes) and 64-core (20.83 minutes) runs from the strong scaling test.
5.4 Gradient Domain Editing on the Cloud

The parallel algorithm outlined in the previous section provides a full resolution gradient domain solution for massive images in only a few minutes of processing time. In this section, we explore redesigning this technique as a cloud-based application. For this work, we chose to target the MapReduce framework and its open source implementation, Hadoop. MapReduce and Hadoop have emerged in recent years as popular and widely supported cloud technologies. Therefore, they are the logical targets for this work.

5.4.1 MapReduce and Hadoop

This subsection briefly reviews some of the fundamentals of the MapReduce framework and how to design graphics algorithms to work well with Hadoop and Hadoop's Distributed File System (HDFS). We provide a high level view to justify the design decisions outlined in the next section. The map function operates on key/value pairs, producing one or more key/value pairs for the reduce phase. The reduce function is a per-key operation that works on the output of the mapper (see Figure 5.15). Hadoop's scheduler will interleave their execution as data become available. Currently, Hadoop does not support job chaining; therefore, any algorithm that requires two passes will likely require two separate MapReduce jobs. While this will likely change in the future, at this time minimizing the number of passes is an important consideration, since the overhead incurred by launching new jobs in Hadoop is significant.

Figure 5.15: The two phases of a MapReduce job. In the figure, three map tasks produce key/value pairs that are hashed into two bins corresponding to the two reduce tasks in the job. When the data are ready, the reducers grab their needed data from the mappers' local disks.

In Section 5.4.2 we detail our algorithm, which requires only one pass. Hadoop has been optimized to handle large files and to process/transfer small chunks of data. For many applications, including the one outlined in the next section, understanding Hadoop's data flow is vital for an efficient implementation, much as random memory access must be considered on a GPU.

5.4.1.1 Input. The Hadoop distributed file system stripes data across all available nodes on a per-block basis, with replication, to guarantee a certain level of locality for the map phase and to be able to handle system faults. When a job is launched, Hadoop will split the input data evenly across all map instances. For our example, allowing Hadoop to arbitrarily split the input data could result in fragmented images. Therefore, the system allows the developer to specialize the function reading the input, which we use to constrain the split to occur only at image boundaries.

5.4.1.2 MapReduce transfer. During execution, each mapper hashes the key of each key/value pair into bins. The number of bins equals the number of reducers (see Figure 5.15), and each bin is also sorted by key. The map first stores and sorts the data in a buffer in memory but will spill to disk if this is exceeded (the default buffer size is 512 MB). This spill can lead to poor mapper performance and should be avoided if possible.
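At its core, this binning is a hash partition followed by a per-bin sort. A minimal Python sketch of the idea (an illustration only, not Hadoop's implementation):

    # Hash each emitted key/value pair into one bin per reducer and keep
    # each bin sorted by key, mimicking the map-side shuffle described above.
    from collections import defaultdict

    def partition(pairs, num_reducers):
        bins = defaultdict(list)
        for key, value in pairs:
            bins[hash(key) % num_reducers].append((key, value))
        return {r: sorted(b) for r, b in bins.items()}

    pairs = [(3, "tile-a"), (1, "tile-b"), (3, "tile-c"), (2, "tile-d")]
    print(partition(pairs, num_reducers=2))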
After a mapper completes execution, the intermediate data are stored on a node's local disk. Each mapper informs the control node that its data are finished and ready for the reducers. Since Hadoop assumes that any mapper is equally likely to produce any key, there is no assumed locality for the reducers. Each reducer must pull its data from multiple mappers in the cluster (see Figures 5.15 and 5.16). If a reducer must grab key/value pairs from many local disks on the cluster (possibly an N-to-N mapping), this phase can have a drastic effect on performance.

Job coordination is handled with a master/slave model in which the control node, called the Job Tracker, distributes and manages the map and reduce tasks. When a program is launched, the Job Tracker initiates Task Trackers on nodes in the cluster. The Job Tracker then schedules tasks on the Task Trackers, maintaining a communication link to handle system faults (see Figure 5.16).

Figure 5.16: A diagram of the job control and data flow for one Task Tracker in a Hadoop job. The dotted, red arrows indicate data flow over the network; dashed arrows represent communication; the blue arrow indicates a local data write; and the black arrows indicate an action taken by the node.

5.4.2 MapReduce for Gradient Domain

Commonly, large images are stored as tiles, which gives us the underlying structure for our scheme. However, a tile-based approach by itself would not account for the large scale trends common in panoramas (see Figure 5.17). Therefore, we add upsampling of a coarse solution, similar to the approach used in Summa et al. [149], to capture these trends. Our algorithm works in two phases: the first phase performs the upsample of a precomputed coarse solution and solves each tile to produce a smooth solution over the extent of the tile. The second phase solves for a smooth image on tiles that significantly overlap the smoothed tiles from the first phase. In this way, the second phase smooths any seams not captured, or even introduced, by the first phase solvers. This algorithm can be implemented as a single MapReduce job in Hadoop.

5.4.2.1 Tiles. We have chosen an overlap of 50% in both dimensions for the second phase due to the simplicity of implementation, although Summa et al. [149] have shown that a good solution can be found with much less. To accomplish this overlap easily, we divide the data into tiles 1/4 of the proper size. Figure 5.18 shows the tile layout for our test images. Each phase operates on four of these smaller tiles, which are combined to construct the larger tiles. To avoid undefined tiles in the second phase, we add a symbolic padding of one row/column to all sides of the image. Figure 5.19 gives an example of a tile layout.

An important component of panorama stitching is a map file, which gives the correspondence from a pixel location in the overall panorama to the smaller image that supplies the color. This map file is necessary to determine the difference between actual gradients and those due to seams. The map also defines the boundaries of the panorama, which are commonly irregular and do not usually follow the actual image boundary. The panorama boundary is a seam we would like to preserve; a small sketch of the role the map plays in the gradient computation follows below.
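A minimal Python illustration of that role (the single-byte labels and array layout here are hypothetical, chosen only to show how the map separates true gradients from seam and boundary gradients):

    # Hypothetical per-pixel label map: 0 marks pixels outside the panorama,
    # any other value identifies the source image supplying the color.
    BOUNDARY = 0

    def gradient_x(pixels, labels, x, y):
        """Horizontal gradient, zeroed when it would cross a seam or boundary."""
        if labels[y][x] == BOUNDARY or labels[y][x + 1] == BOUNDARY:
            return 0.0  # preserve the panorama boundary
        if labels[y][x] != labels[y][x + 1]:
            return 0.0  # seam between two source images: not a true gradient
        return float(pixels[y][x + 1]) - float(pixels[y][x])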
We encode the map file into each individual tile as an alpha channel. For images such as the Salt Lake City example, we cannot encode an index for each image in a byte of data. However, the map is only used to denote whether two pixels are from the same source image or whether a pixel is on the boundary; therefore, a byte is more than enough to encode this correspondence. The symbolic padding is encoded as boundary, and images that are not evenly divisible by our tile size are also padded with boundary. The overlapping window size used for our tests was 1024 x 1024 pixels, which we found to be a good compromise between a low memory footprint and image coverage.

Figure 5.17: Although the result is a smooth image, without coarse upsampling the final image will fail to account for large trends that span beyond a single overlap, which can lead to unwanted shifts in color. Notice the vertical banding denoted by the red arrows.

Figure 5.18: The 512 x 512 tiles used in our Edinburgh (a), Redrock (b), and Salt Lake City (c) examples.

Figure 5.19: Our tile-based approach: an input image is divided into equally spaced tiles. In the map phase, after a symbolic padding by a column and row in all dimensions, a solver is run on a collection of four tiles labeled by the numbers in the figure. After the mapper finishes, it assigns a key such that each reducer runs its solver on a collection of four tiles that have a 50% overlap with the previous solutions.

5.4.2.2 Coarse solution. As a first step, the first phase of our solver upsamples a 1-2 megapixel coarse solution via bilinear interpolation. Much like the method from Summa et al. [149], we precompute the coarse solution in just a few seconds using a direct FFT solver on a coarsely sampled version of our large image. In tiled hierarchies, this coarse image is typically already present. In Hadoop, this coarse solution is sent along with the MapReduce job when launched. The Job Tracker stores this image on the distributed file system for Task Trackers to pull and store locally.

5.4.2.3 First (map) phase. After loading/combining the smaller tiles and performing the upsample, the first phase runs an iterative solver initialized with the upsampled pixel colors. From our testing, we have found that SOR gives good running times and low memory consumption; it is therefore our default solver. The solver is considered to have converged when the L2 norm falls below 10^-3, which is based on the range of byte data. After a smooth image is computed, the solution is split back into its four smaller tiles and sent to the next phase as byte data. Some precision is lost in the solution data by this truncation of bits, which can cause slower convergence in the next phase. However, in many distributed systems the bottleneck is data transfer; therefore, it is preferable to use smaller data at the cost of increased computation.
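For concreteness, one sweep of such an SOR solve might look as follows. This is a minimal Python sketch; the relaxation factor and the convergence bookkeeping are our assumptions, not the production solver:

    # One in-place SOR sweep for the discrete Poisson equation
    # laplacian(u) = b over a tile's interior pixels.
    def sor_sweep(u, b, omega=1.9):
        """Returns the squared L2 norm of the corrections applied."""
        correction_sq = 0.0
        for y in range(1, len(u) - 1):
            for x in range(1, len(u[0]) - 1):
                gauss_seidel = 0.25 * (u[y - 1][x] + u[y + 1][x]
                                       + u[y][x - 1] + u[y][x + 1] - b[y][x])
                delta = omega * (gauss_seidel - u[y][x])
                u[y][x] += delta
                correction_sq += delta * delta
        return correction_sq
    # Sweeps repeat until the norm drops below the 10^-3 threshold.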
For the Hadoop implementation, this first phase of our algorithm fits well with Hadoop's map phase. Each mapper emits a key/value pair, where the value is the data from a small tile and the key is computed in such a way that we achieve the desired 50% overlap in the next phase. The key is computed as a row/column pair in the space of the larger tiles. This key is stored in 4 bytes before being emitted: the high word contains the row and the low word contains the column. For a tile at location (x, y), the key for sub-tile (i, j) is computed as:

    key_row = x * 2 + i    (5.4)
    key_col = y * 2 + j    (5.5)

Below we provide pseudocode for the map phase, and Figure 5.19 provides an example.

proc Map(blockId, image) =
    row := blockId >> 16;                    // high word holds the row
    col := blockId & 0xFFFF;                 // low word holds the column
    solver.compute_gradient(image);
    solver.upsample_coarse(image, row, col);
    solver.SOR(image);
    for i := 0 to 1 do
        for j := 0 to 1 do
            keyRow := row * 2 + i;
            keyCol := col * 2 + j;
            key := (keyRow << 16) + keyCol;  // pack row/column into 4 bytes
            emit(key, solver.tiles[i][j]);

5.4.2.4 Second (reduce) phase. The second phase now gathers the four smaller tiles that make up the overlapping window. These tiles sit as intermediate data on the local disks of the cluster. If the system accounted for locality, each instance would only need to gather three tiles, since the nodes could be placed such that one tile is always stored locally. After the data are gathered, the gradients are computed from the original pixel values, and an iterative solver (SOR) is run after being initialized with the solutions from the first phase. The iterative solver is constrained to work only on interior pixels to prevent this phase from introducing new seams. Technically, there may be errors at the pixels around the midpoints of the boundary edges of these tiles, though in practice we have not seen this affect the solution.

This second phase fits well with Hadoop's reduce phase, with some considerations. Hadoop does not account for data locality for the reducers; therefore, we must assume the worst case gather of four tiles. Also, the reducers do not have access to the HDFS, nor can any task request specific data. The mappers in the first phase modify the pixel values, but the reducer needs the original values to compute the gradient vector for the iterative solver. Therefore, the mapper must also concatenate the original pixel values to the solved data before it emits the key/value pair. This leads to a 6 bytes/pixel transfer between phases. Below we provide pseudocode for the reduce phase.

proc Reduce(blockId, [(map1, org1), ..., (map4, org4)]) =
    mapper_output := merge(map1, map2, map3, map4);    // solved tiles
    original_tile := merge(org1, org2, org3, org4);    // untouched pixel values
    solver.compute_gradient(original_tile);
    solver.SOR(mapper_output);
    emit(blockId, solver.tiles);

5.4.2.5 Storage in the HDFS. In Hadoop, saving the image in standard row-major order would lead to poor performance in the mappers, since there is good locality in only one dimension. Saving individual tiles would also not be efficient, since Hadoop's HDFS is optimized for large files. Therefore, we save the data as the large tiles, comprised of the four smaller tiles, which the mapper needs in the first phase. We concatenate the tiles together, row-by-row, into a single large file.
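A minimal Python sketch of that packing (the row-level byte handling is an illustrative assumption, not the production format):

    # Pack four small tiles into one contiguous large-tile record so that a
    # mapper reads a single sequential chunk; each tile is a list of row
    # byte-strings of equal length.
    def pack_large_tile(top_left, top_right, bottom_left, bottom_right):
        top = [l + r for l, r in zip(top_left, top_right)]
        bottom = [l + r for l, r in zip(bottom_left, bottom_right)]
        return b"".join(top + bottom)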
5.4.3 Results

We demonstrate the quality of our approach on three test panoramas, which range from megapixels to gigapixels in size. We also demonstrate the generality of the abstraction by running our code, without modification, on a single desktop and on a large cluster. Finally, we test Hadoop's scalability with two of our test panoramas. The single node tests were performed on a 2x quad-core Intel Xeon W5580 3.2GHz desktop with 24GB of memory. For our large distributed tests, we ran our method on the NSF CLuE [41] cluster, which consists of 275 nodes, each with dual Intel Xeon 2.8GHz processors with HyperThreading and 8GB of memory. While still a valuable resource for research, as far as modern clusters are concerned CLuE's hardware is outdated: it is a retired system based on 6-year-old technology originally produced in 2004. Moreover, CLuE is a shared resource, and all timings were certainly affected by other researchers using the machines.

The Edinburgh panorama consists of 25 images with a full resolution of 16,950 x 2,956 pixels (50 megapixel) and was broken into 48 tiles. For our single node test, our method produced a solution in 81 seconds with eight mappers and four reducers. The Redrock panorama consists of nine images with a full resolution of 19,588 x 4,457 pixels (87 megapixel) and was partitioned into 96 tiles. Our method running on a single node solved the panorama in 156 seconds with nine mappers and nine reducers. The solver running on the cluster ran in 199 seconds with 96 mappers and 96 reducers. Due to the small size of these panoramas, the extra parallelization given to us by the distributed system did not increase performance. Quite the opposite was true: the runtimes were worse due to the overhead of Hadoop launching and coordinating many tasks. Also, because the cluster was a shared resource, this increase in compute time could easily have come from external influences. See Figure 5.20 for the original and solved panoramas.

The Salt Lake City panorama consists of 611 images with a full resolution of 126,826 x 29,633 pixels (3.27 gigapixel) and was split into 3,444 tiles. Our method took 3 hours and 5 minutes to compute a solution on our one node test desktop. On the distributed cluster with 492 mappers and 492 reducers, the time to compute a solution dropped to 28 minutes and 44 seconds, of which 3 minutes and 24 seconds was due to Hadoop overhead and 15 minutes was due to I/O and data transfer between the map and reduce phases. Running Salt Lake City with 246 mappers and 246 reducers produced a solution in 39 minutes and 49 seconds, of which 2 minutes and 7 seconds was due to Hadoop overhead and 30 minutes was due to I/O and data transfer. Note that these are all wall clock times and include the activity of other people on a shared system. Moreover, the configuration, which we could not change, required running at least three processes on every node, and the nodes have only two cores. Therefore, we can only hope for 2/3 compute efficiency out of this cluster. See Figure 5.20 for the original and solved panorama.

Figure 5.20: The results of our cloud implementation, from top to bottom: Edinburgh, 25 images, 16,950 x 2,956, 50 megapixel, and the solution to Edinburgh from our cloud implementation; Redrock, nine images, 19,588 x 4,457, 87 megapixel, and the solution to Redrock from our cloud implementation; Salt Lake City, 611 images, 126,826 x 29,633, 3.27 gigapixel, and the solution to Salt Lake City from our cloud implementation.

Based on our timing and the pricing available online, running the 492 mapper/reducer job would have cost approximately $50 on Amazon's Elastic MapReduce [10]. This is orders of magnitude less expensive and less time consuming than operating and maintaining a proprietary cluster and would allow any researcher in the field to experiment with new ideas.
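To put these timings in perspective, a quick back-of-envelope calculation (Python) from the wall-clock numbers quoted above:

    # Salt Lake City with 492 mappers/reducers; all values in minutes.
    single_node = 3 * 60 + 5      # 3 h 5 m on the test desktop
    cluster     = 28 + 44 / 60    # 28 m 44 s on CLuE
    io_transfer = 15.0            # I/O and map-to-reduce transfer
    overhead    = 3 + 24 / 60     # Hadoop job overhead

    print(f"speedup over one node:       {single_node / cluster:.1f}x")  # ~6.4x
    print(f"share spent in I/O+transfer: {io_transfer / cluster:.0%}")   # ~52%
    print(f"share in Hadoop overhead:    {overhead / cluster:.0%}")      # ~12%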
5.4.3.1 Scalability. Due to the shared nature of the CLuE cluster, we restricted our scalability tests to the single node test desktop. Figure 5.21 plots the runtime to solve both the Edinburgh and Redrock panoramas as a function of the number of mappers and reducers. We varied the number of mappers and reducers from one to the number of cores. The plot shows that as both the mappers and reducers increase, so does our performance; but as the total number of mappers and reducers meets or exceeds the available cores of our system, the performance gain flattens. This is an important observation and must be remembered when choosing an optimal number of mappers and reducers, especially when purchasing time and cores as a commodity.

Figure 5.21: (a) The scalability plot for the Edinburgh (50 megapixel) panorama on our one node, 8-core test desktop; (b) the scalability plot for the Redrock (87 megapixel) panorama on the same machine.

5.4.3.2 Fault tolerance. Hadoop has been developed to robustly handle failures in the cluster. Achieving a fault tolerant implementation is a major challenge on its own and is not easily available in other distributed frameworks such as MPI. The tremendous advantage of fault tolerance comes at the cost of high variability in running times, though jobs are guaranteed to finish. In fact, all runs on the distributed cluster had some kind of failure in the system at some time during the execution, and still we were able to get results, which would not have been possible with a traditional distributed implementation. In particular, the running time stated above for the Salt Lake City example with 492 mappers/reducers was based on the job with the minimum number of failures (95 failed tasks). In practice, we have seen this example run as long as 49 minutes to account for the 133 failed tasks that occurred during the job.

CHAPTER 6

FUTURE WORK

The work outlined in this dissertation provides both the justification for and the solutions to bringing the composition stage of the panorama processing pipeline into an interactive setting. The future of this work is to bring the entire pipeline into an interactive environment. A system built with this guiding principle would allow the user to add and remove images, fix registration problems, or adjust image boundaries, all while having a preview of the final color-corrected, composited panorama. This will give users an unprecedented amount of control over the creation of new panoramas, increasing both the accessibility of panorama creation and the quality of the final results.

Due to the work completed for this dissertation, the logical next step to achieving my ultimate goal is to provide new and interactive solutions for the registration phase of the panorama pipeline. Currently, interaction for registration is typically nonexistent. Even when it is provided by a system, the interaction is rudimentary at best. For instance, the only interaction possible in a panorama processing system such as Hugin [77] is the manual selection and deletion of image feature points between pairs of images; see Figure 6.1. This is a tedious process for small images and a completely unwieldy one for larger image collections. The scaling of registration algorithms will also need further study.
Despite significant previous work, many current methods have been shown to work only with relatively small collections of images. Although such methods are assumed to scale well, only recently has work shown an extension of current techniques to extremely large collections of photographs [1]. Often these assumptions of scaling prove false; for instance, some commercial and open-source products, although advertised to handle an arbitrary number of images, have failed when presented with panoramas of hundreds of images. General purpose algorithms for automatic registration of extremely large image collections remain an open avenue of investigation. In addition, state-of-the-art stitching software often needs to reduce complexity by strictly enforcing that the images are acquired in a regular pattern (columns and rows) in order to shrink the search space of possible registrations. My collaborators and I have found that these programs will often fail when presented with large image collections with no assumed structure.

Figure 6.1: A typical example of interaction during panorama registration, from the open-source Hugin [77] software tool. Current interaction is limited to the manual selection and deletion of feature points used during registration.

The focus of my dissertation work was to bring the composition stage of the panorama creation pipeline into an interactive setting, not only for small images but for images massive in size. The Progressive Poisson and Panorama Weaving algorithms elegantly achieve this goal. Although panoramas were the primary focus of the work, the methods and frameworks developed throughout my dissertation provide new paradigms for interacting with high resolution imagery. For instance, Progressive Poisson provides a working proof-of-concept of how to reformulate global algorithms to work in an interactive setting for large data by computing screen resolution previews in real time and using out-of-core computation for full resolution solutions. One can envision expanding the frameworks and techniques outlined in this work with other data processing tools to allow comprehensive editing of massive datasets on regular desktop computers.

APPENDIX

MASSIVE DATASETS

Table A.1: Massive panorama data acquired and used in this dissertation work.

Dataset                Images  Format          Individual Image Size
Lake Louise Large      5794    RAW NEF 16-bit  4288 x 2848 (12 megapixel)
Lake Louise Small 1    2805    RAW NEF 16-bit  4288 x 2848 (12 megapixel)
Lake Louise Winter 1   1983    JPEG Fine       4288 x 2848 (12 megapixel)
Lake Louise Winter 2   1876    JPEG Fine       4288 x 2848 (12 megapixel)
Lake Louise Morning    1656    JPEG Fine       4288 x 2848 (12 megapixel)
Lake Louise Small 2    1440    RAW NEF 16-bit  4288 x 2848 (12 megapixel)
Salt Lake City Large   1311    JPEG Fine       3456 x 2592 (9 megapixel)
Lake Louise Evening    1220    JPEG Fine       4288 x 2848 (12 megapixel)
Salt Lake City Winter  1219    JPEG Fine       3456 x 2592 (9 megapixel)
Salt Lake City Fall    624     JPEG Fine       3456 x 2592 (9 megapixel)
Mount Rushmore         300     JPEG Fine       4288 x 2848 (12 megapixel)
Salt Lake City Small   132     JPEG Fine       3264 x 2448 (8 megapixel)

Table A.2: Massive satellite data acquired and used in this dissertation work.
Dataset             Resolution       Gigapixels
New York, NY        80000 x 80000    6.40
Chattanooga, TN     120000 x 100000  12.00
Washington, DC      131350 x 159375  20.93
Hamilton County, SC 240000 x 232000  55.68
Philadelphia, PA    250000 x 230000  57.50
Indianapolis, IN    260000 x 260000  67.60
San Diego, CA       200000 x 365000  73.00
San Francisco, CA   225000 x 330000  74.25
New Orleans, LA     330000 x 290000  95.70
Olympia, WA         501059 x 329220  164.96
San Antonio, TX     521640 x 492480  256.90
Atlanta, GA         524288 x 524288  274.88
Seattle, WA         411280 x 693528  285.23
Phoenix, AZ         720000 x 540000  388.80

REFERENCES

[1] Agarwal, S., Snavely, N., Seitz, S., and Szeliski, R. Bundle adjustment in the large. ECCV '10: Proceedings of the 11th European Conference on Computer Vision: Part II (2010), 29-42.
[2] Agarwala, A. Efficient gradient-domain compositing using quadtrees. ACM Trans. Graph. 26, 3 (2007), 94.
[3] Agarwala, A., Agrawala, M., Cohen, M. F., Salesin, D., and Szeliski, R. Photographing long scenes with multi-viewpoint panoramas. ACM Trans. Graph. 25, 3 (2006), 853-861.
[4] Agarwala, A., Dontcheva, M., Agrawala, M., Drucker, S. M., Colburn, A., Curless, B., Salesin, D., and Cohen, M. F. Interactive digital photomontage. ACM Trans. Graph. 23, 3 (2004), 294-302.
[5] Agarwala, A., Zheng, K. C., Pal, C., Agrawala, M., Cohen, M. F., Curless, B., Salesin, D., and Szeliski, R. Panoramic video textures. ACM Trans. Graph. 24, 3 (2005), 821-827.
[6] Agrawal, A., Raskar, R., Nayar, S. K., and Li, Y. Removing photography artifacts using gradient projection and flash-exposure sampling. In SIGGRAPH '05: ACM SIGGRAPH 2005 Papers (New York, NY, USA, 2005), ACM, pp. 828-835.
[7] Agrawal, A. K., Chellappa, R., and Raskar, R. An algebraic approach to surface reconstruction from gradient fields. In ICCV (2005), pp. I: 174-181.
[8] Agrawal, A. K., Raskar, R., and Chellappa, R. What is the range of surface reconstructions from a gradient field? In ECCV (2006), pp. I: 578-591.
[9] Agrawal, A. K., Xu, Y., and Raskar, R. Invertible motion blur in video. ACM Trans. Graph. 28, 3 (2009).
[10] Amazon. Elastic MapReduce, 2012. http://aws.amazon.com/elasticmapreduce.
[11] Anandan, P., Burt, P. J., Dana, K., Hansen, M. W., and van der Wal, G. S. Real-time scene stabilization and mosaic construction. In Image Understanding Workshop (1994), pp. I: 457-465.
[12] AutoPano. http://www.autopano.net.
[13] Axelsson, O. Iterative Solution Methods. Cambridge University Press, New York, NY, 1994.
[14] Badra, F., Qumsieh, A., and Dudek, G. Rotation and zooming in image mosaicing. In WACV (1998), pp. 50-55.
[15] Bae, S., Paris, S., and Durand, F. Two-scale tone management for photographic look. In SIGGRAPH '06: ACM SIGGRAPH 2006 Papers (New York, NY, USA, 2006), ACM, pp. 637-645.
[16] Bai, X., Wang, J., Simons, D., and Sapiro, G. Video SnapCut: Robust video object cutout using localized classifiers. ACM Trans. Graph. 28, 3 (2009).
[17] Balmelli, L., Kovacevic, J., and Vetterli, M. Quadtrees for embedded surface visualization: Constraints and efficient data structures. In Proc. of IEEE International Conference on Image Processing (1999), pp. 487-491.
[18] Bay, H., Tuytelaars, T., and Gool, L. J. V. SURF: Speeded up robust features. In ECCV (2006), pp. I: 404-417.
[19] Ben-Ezra, M., and Nayar, S. K. Motion-based motion deblurring. IEEE Trans. Pattern Anal. Mach. Intell. 26, 6 (2004), 689-698.
[20] Berger, M. J., and Colella, P. Local adaptive mesh refinement for shock hydrodynamics. Journal of Computational Physics 82 (1989), 64-84.
[21] Bolitho, M., Kazhdan, M., Burns, R., and Hoppe, H. Multilevel streaming for out-of-core surface reconstruction. In SGP '07: Proceedings of the Fifth Eurographics Symposium on Geometry Processing (Aire-la-Ville, Switzerland, 2007), Eurographics Association, pp. 69-78.
[22] Bornemann, F. A., and Krause, R. Classical and cascadic multigrid - a methodological comparison. In Proceedings of the 9th International Conference on Domain Decomposition Methods (1996), Domain Decomposition Press, pp. 64-71.
[23] Boukerche, A., and Pazzi, R. W. N. Remote rendering and streaming of progressive panoramas for mobile devices. In ACM Multimedia (2006), K. Nahrstedt, M. Turk, Y. Rui, W. Klas, and K. Mayer-Patel, Eds., ACM, pp. 691-694.
[24] Boykov, Y., and Kolmogorov, V. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Trans. Pattern Anal. Mach. Intell. 26, 9 (2004), 1124-1137.
[25] Boykov, Y. Y., and Jolly, M. P. Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In ICCV (2001), pp. I: 105-112.
[26] Boykov, Y. Y., Veksler, O., and Zabih, R. Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Analysis and Machine Intelligence 23, 11 (Nov. 2001), 1222-1239.
[27] Brandt, A. Multi-level adaptive solutions to boundary-value problems. Mathematics of Computation 31, 138 (1977), 333-390.
[28] Briggs, W. L., Henson, V. E., and McCormick, S. F. A Multigrid Tutorial (2nd Ed.). Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2000.
[29] Brown, D. C. Close-range camera calibration. Photogrammetric Engineering 37, 8 (Aug. 1971), 855-866.
[30] Brown, M., and Lowe, D. Automatic panoramic image stitching using invariant features. International Journal of Computer Vision (Jan. 2007).
[31] Brown, M., and Lowe, D. G. Recognising panoramas. In ICCV (2003), IEEE Computer Society, pp. 1218-1227.
[32] Brown, M., Szeliski, R. S., and Winder, S. A. J. Multi-image matching using multi-scale oriented patches. In CVPR (2005), pp. I: 510-517.
[33] Buchanan, A., and Fitzgibbon, A. W. Combining local and global motion models for feature point tracking. In CVPR (2007), pp. 1-8.
[34] Capel, D., and Zisserman, A. Automated mosaicing with super-resolution zoom. Proc. CVPR (Jan. 1998).
[35] Cham, T. J., and Cipolla, R. A statistical framework for long-range feature matching in uncalibrated image mosaicing. In CVPR (1998), pp. 442-447.
[36] Chandran, L. S., and Sivadasan, N. Geometric representation of graphs in low dimension. In Proceedings of the 12th Annual International Conference on Computing and Combinatorics (Berlin, Heidelberg, 2006), COCOON '06, Springer-Verlag, pp. 398-407.
[37] Chartrand, G., and Harary, F. Planar permutation graphs. Ann. Inst. Henri Poincare 3, 4 (1967), 433-438.
[38] Chen, S. E. QuickTime VR: An image-based approach to virtual environment navigation. In SIGGRAPH (1995), pp. 29-38.
[39] Cho, S., and Lee, S. Fast motion deblurring. ACM Trans. Graph. 28, 5 (2009).
[40] Chow, E., Falgout, R. D., Hu, J. J., Tuminaro, R. S., and Yang, U. M. A survey of parallelization techniques for multigrid solvers. In Parallel Processing for Scientific Computing, M. A. Heroux, P. Raghavan, and H. D. Simon, Eds., vol. 20 of Software, Environments, and Tools. SIAM, Philadelphia, PA, Nov. 2006, ch. 10, pp. 179-201.
[41] CLuE. CLuE program, 2008. http://www.nsf.gov/pubs/2008/nsf08560/nsf08560.htm.
[42] Cohen, M. F., Shade, J., Hiller, S., and Deussen, O. Wang tiles for image and texture generation. ACM Trans. Graph. 22, 3 (2003), 287-294.
[43] Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. Introduction to Algorithms, Third Edition, 3rd ed. The MIT Press, Cambridge, MA, 2009.
[44] Correa, C. D., and Ma, K.-L. Dynamic video narratives. ACM Trans. Graph. 29, 4 (2010).
[45] Criminisi, A., Sharp, T., Rother, C., and Perez, P. Geodesic image and video editing. ACM Trans. Graph. 29, 5 (2010), 134.
[46] Davis, J. E. Mosaics of scenes with moving objects. In CVPR (1998), pp. 354-360.
[47] Dean, J., and Ghemawat, S. MapReduce: Simplified data processing on large clusters. CACM 51, 1 (2008), 107-113.
[48] Delong, A., and Boykov, Y. A scalable graph-cut algorithm for N-D grids. In CVPR (2008), IEEE Computer Society.
[49] Demmel, J. W. Applied Numerical Linear Algebra. SIAM, 1997.
[50] Dijkstra, E. W. A note on two problems in connexion with graphs. Numerische Mathematik 1 (1959), 269-271.
[51] Dorr, F. W. The direct solution of the discrete Poisson equation on a rectangle. SIAM Review 12, 2 (April 1970), 248-263.
[52] Edelsbrunner, H., and Mucke, E. P. Simulation of simplicity: A technique to cope with degenerate cases in geometric algorithms. ACM Trans. Graphics 9, 1 (Jan. 1990), 67-104.
[53] Efros, A. A., and Freeman, W. T. Image quilting for texture synthesis and transfer. In SIGGRAPH (2001), pp. 341-346.
[54] Farbman, Z., Hoffer, G., Lipman, Y., Cohen-Or, D., and Lischinski, D. Coordinates for instant image cloning. In SIGGRAPH '09: Proceedings of the 36th Annual Conference on Computer Graphics and Interactive Techniques (New York, NY, USA, 2009), ACM.
[55] Fattal, R., Lischinski, D., and Werman, M. Gradient domain high dynamic range compression. In SIGGRAPH '02: Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques (New York, NY, USA, 2002), ACM, pp. 249-256.
[56] Fergus, R., Singh, B., Hertzmann, A., Roweis, S. T., and Freeman, W. T. Removing camera shake from a single photograph. ACM Transactions on Graphics 25, 3 (July 2006), 787-794.
[57] Finlayson, G. D., Hordley, S. D., and Drew, M. S. Removing shadows from images. In ECCV '02: Proceedings of the 7th European Conference on Computer Vision - Part IV (London, UK, 2002), Springer-Verlag, pp. 823-836.
[58] Flusser, J., Boldys, J., and Zitova, B. Moment forms invariant to rotation and blur in arbitrary number of dimensions. IEEE Trans. Pattern Anal. Mach. Intell. 25, 2 (2003), 234-246.
[59] Flusser, J., and Suk, T. Degraded image analysis: An invariant approach. IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 6 (1998), 590-603.
[60] Ford, L. R., and Fulkerson, D. R. Maximal flow through a network. Canadian Journal of Mathematics 8 (1956), 399-404.
[61] Freedman, D., and Zhang, T. Interactive graph cut based segmentation with shape priors. In CVPR (2005), pp. I: 755-762.
[62] Gall, D. L. MPEG: A video compression standard for multimedia applications. Communications of the ACM (Jan. 1991).
[63] GigaPan. http://www.gigapan.org/about.php.
[64] Goldman, D. B. Vignette and exposure calibration and compensation. IEEE Trans. Pattern Anal. Mach. Intell. 32, 12 (2010), 2276-2288.
[65] Google. Google Earth. http://earth.google.com/.
[66] Gortler, S., and Cohen, M. Variational modeling with wavelets. In Symposium on Interactive 3D Graphics (1995), pp. 35-42.
[67] Goshtasby, A. A. 2-D and 3-D Image Registration: For Medical, Remote Sensing, and Industrial Applications. Wiley-Interscience, New York, NY, Mar. 2005.
[68] Gottfrid, D. Self-service, prorated super computing fun!, 2007. http://open.blogs.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun.
[69] Gracias, N. R. E., Mahoor, M. H., Negahdaripour, S., and Gleason, A. C. R. Fast image blending using watersheds and graph cuts. Image and Vision Computing 27, 5 (Apr. 2009), 597-607.
[70] Griebel, M., and Zumbusch, G. Parallel multigrid in an adaptive PDE solver based on hashing and space-filling curves. Parallel Comput. 25, 7 (1999), 827-843.
[71] Hadoop. Applications and organizations using Hadoop, 2012. http://wiki.apache.org/hadoop/PoweredBy.
[72] Hassin, R. Maximum flow in (s, t)-planar networks. Inform. Proc. Lett. 13 (1981), 107.
[73] Heath, Ng, and Peyton. Parallel algorithms for sparse linear systems. SIREV: SIAM Review 33 (1991).
[74] HiRISE. High Resolution Imaging Science Experiment. http://hirise.lpl.arizona.edu/.
[75] Hockney, R. W. A fast direct solution of Poisson's equation using Fourier analysis. Journal of the ACM 12, 1 (Jan. 1965), 95-113.
[76] Horn, B. K. P. Determining lightness from an image. Comput. Graphics Image Processing 3, 1 (Dec. 1974), 277-299.
[77] Hugin. http://hugin.sourceforge.net.
[78] Ikeda, S., Sato, T., and Yokoya, N. High-resolution panoramic movie generation from video streams acquired by an omnidirectional multi-camera system. In Proceedings of IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (Aug. 2003), p. 155.
[79] Jia, J., Sun, J., Tang, C.-K., and Shum, H.-Y. Drag-and-drop pasting. In SIGGRAPH '06: ACM SIGGRAPH 2006 Papers (New York, NY, 2006), ACM, pp. 631-637.
[80] Jia, J., and Tang, C.-K. Tensor voting for image correction by global and local intensity alignment. IEEE Trans. Pattern Anal. Mach. Intell. 27, 1 (2005), 36-50.
[81] Jia, J., and Tang, C.-K. Image stitching using structure deformation. IEEE Trans. Pattern Anal. Mach. Intell. 30, 4 (2008), 617-631.
[82] Jia, J. Y., and Tang, C. K. Image registration with global and local luminance alignment. In ICCV (2003), pp. 156-163.
[83] Jia, J. Y., and Tang, C. K. Eliminating structure and intensity misalignment in image stitching. In ICCV (2005), pp. II: 1651-1658.
[84] Johnson, D. B. Efficient algorithms for shortest paths in sparse networks. Journal of the ACM 24, 1 (Jan. 1977), 1-13.
[85] Joshi, N., Kang, S. B., Zitnick, C. L., and Szeliski, R. Image deblurring using inertial measurement sensors. ACM Trans. Graph. 29, 4 (2010).
[86] Joshi, N., Szeliski, R., and Kriegman, D. J. PSF estimation using sharp edge prediction. In CVPR (2008), pp. 1-8.
[87] Kazhdan, M. Reconstruction of solid models from oriented point sets. In Eurographics Symposium on Geometry Processing (2005), pp. 73-82.
[88] Kazhdan, M., Bolitho, M., and Hoppe, H. Poisson surface reconstruction. In Eurographics Symposium on Geometry Processing (2006), pp. 61-70.
[89] Kazhdan, M., and Hoppe, H. Streaming multigrid for gradient-domain operations on large images. ACM ToG 27, 3 (2008).
[90] Kazhdan, M., Surendran, D., and Hoppe, H. Distributed gradient-domain processing of planar and spherical images. Transactions on Graphics (TOG) 29, 2 (Mar. 2010).
[91] Kazhdan, M. M., and Hoppe, H. Streaming multigrid for gradient-domain operations on large images. ACM Trans. Graph. 27, 3 (2008).
[92] Kolmogorov, V., and Zabih, R. What energy functions can be minimized via graph cuts? IEEE Trans. Pattern Anal. Mach. Intell. 26, 2 (2004), 147-159.
[93] Kopf, J., Cohen, M. F., Lischinski, D., and Uyttendaele, M. Joint bilateral upsampling. ACM Trans. Graph. 26, 3 (2007), 96.
[94] Kopf, J., Uyttendaele, M., Deussen, O., and Cohen, M. F. Capturing and viewing gigapixel images. ACM Trans. Graph. 26, 3 (2007), 93.
[95] Kourogi, M., Kurata, T., Hoshino, J., and Muraoka, Y. Real-time image mosaicing from a video sequence. In ICIP (4) (1999), pp. 133-137.
[96] Kruger, S., and Calway, A. Image registration using multiresolution frequency domain correlation. Proc. British Machine Vision Conf. (Jan. 1998).
[97] Kuglin, C. D., and Hines, D. C. The phase correlation image alignment method. Assorted Conferences and Workshops (Sept. 1975), 163-165.
[98] Kumar, S., Pascucci, V., Vishwanath, V., Carns, P., Hereld, M., Latham, R., Peterka, T., Papka, M., and Ross, R. Towards parallel access of multi-dimensional, multi-resolution scientific data. In Petascale Data Storage Workshop (PDSW), 2010 5th (Nov. 2010), pp. 1-5.
[99] Kumar, S., Vishwanath, V., Carns, P., Summa, B., Scorzelli, G., Pascucci, V., Ross, R., Chen, J., Kolla, H., and Grout, R. PIDX: Efficient parallel I/O for multi-resolution multi-dimensional scientific datasets. In Proceedings of IEEE Cluster 2011 (Sep. 2011).
[100] Kundur, D., and Hatzinakos, D. Blind image deconvolution. IEEE Signal Processing Magazine 13, 3 (May 1996), 43-64.
[101] Kwatra, V., Schodl, A., Essa, I., Turk, G., and Bobick, A. Graphcut textures: Image and video synthesis using graph cuts. ACM Transactions on Graphics 22, 3 (July 2003), 277-286.
[102] Lawder, J. K., and King, P. J. H. Using space-filling curves for multi-dimensional indexing. In LNCS (2000), Springer Verlag, pp. 20-35.
[103] Levin, A., Zomet, A., Peleg, S., and Weiss, Y. Seamless image stitching in the gradient domain. In ECCV (2004), pp. Vol IV: 377-389.
[104] Li, Y., Sun, J., Tang, C.-K., and Shum, H.-Y. Lazy snapping. ACM Trans. Graph. 23, 3 (2004), 303-308.
[105] Lischinski, D., Farbman, Z., Uyttendaele, M., and Szeliski, R. Interactive local adjustment of tonal values. ACM ToG 25, 3 (2006), 646-653.
[106] Liu, J., and Sun, J. Parallel graph-cuts by adaptive bottom-up merging. In CVPR (2010), IEEE, pp. 2181-2188.
[107] Lombaert, H., Sun, Y. Y., Grady, L., and Xu, C. Y. A multilevel banded graph cuts method for fast image segmentation. In ICCV (2005), pp. I: 259-265.
[108] Lowe, D. G. Object recognition from local scale-invariant features. In ICCV (1999), pp. 1150-1157.
[109] Lucas, B., and Kanade, T. An iterative image registration technique with an application to stereo vision. International Joint Conference on Artificial Intelligence 3 (1981), 674-679.
[110] Maling, D. H. Coordinate Systems and Map Projections. Butterworth-Heinemann, Woburn, MA, 1993.
[111] Mann, S., and Picard, R. W. Virtual bellows: Constructing high quality stills from video. In ICIP (1) (1994), pp. 363-367.
[112] Matungka, R., Zheng, Y., and Ewing, R. Image registration using adaptive polar transform. Image Processing, IEEE Transactions on 18, 10 (2009), 2340-2354.
[113] McCann, J. Recalling the single-FFT direct Poisson solve. In SIGGRAPH Posters (2008), ACM, p. 71.
[114] McCann, J., and Pollard, N. S. Real-time gradient-domain painting. In SIGGRAPH '08: ACM SIGGRAPH 2008 Papers (New York, NY, USA, 2008), ACM, pp. 1-7.
[115] McGuire, M. An image registration technique for recovering rotation, scale and translation parameters. NEC Res. Inst. Tech. Rep., TR (1998), 98-018.
[116] McLauchlan, P., and Jaenicke, A. Image mosaicing using sequential bundle adjustment. Image and Vision Computing (Jan. 2002).
[117] Meehan, J. Panoramic Photography. Amphoto, Oct. 1990.
[118] MegaPOV. http://megapov.inetart.net.
[119] Milgram, D. L. Computer methods for creating photomosaics. IEEE Trans. Computer 23 (1975), 1113-1119.
[120] Milgram, D. L. Adaptive techniques for photomosaicking. IEEE Trans. Computer 26 (1977), 1175-1180.
[121] Mills, A., and Dudek, G. Image stitching with dynamic elements. Image and Vision Computing 27, 10 (Sept. 2009), 1593-1602.
[122] Mortensen, E. N., and Barrett, W. A. Intelligent scissors for image composition. In SIGGRAPH (1995), pp. 191-198.
[123] Mortensen, E. N., and Barrett, W. A. Interactive segmentation with intelligent scissors. Graphical Models and Image Processing: GMIP 60, 5 (Sept. 1998), 349-384.
[124] Nagahashi, T., Fujiyoshi, H., and Kanade, T. Image segmentation using iterated graph cuts based on multi-scale smoothing. In ACCV (2007), pp. II: 806-816.
[125] NASA. NASA Blue Marble. http://earthobservatory.nasa.gov/Features/BlueMarble/.
[126] Niedermeier, R., Reinhardt, K., and Sanders, P. Towards optimal locality in mesh indexings. In Proc. Fundamentals of Computation Theory (1997), vol. 1279 of LNCS, Springer, pp. 364-375.
[127] Ojansivu, V., and Heikkila, J. Image registration using blur-invariant phase correlation. IEEE Signal Processing Letters 14, 7 (July 2007), 449-452.
[128] Pascucci, V., and Frank, R. J. Hierarchical indexing for out-of-core access to multi-resolution data. In Hierarchical and Geometrical Methods in Scientific Visualization, Mathematics and Visualization. Springer, New York, NY.
[129] Pascucci, V., and Frank, R. J. Global static indexing for real-time exploration of very large regular grids. In Supercomputing (SC '01) (2001), p. 2.
[130] Pascucci, V., Laney, D. E., Frank, R. J., Gygi, F., Scorzelli, G., Linsen, L., and Hamann, B. Real-time monitoring of large scientific simulations. In SAC (2003), ACM, pp. 194-198.
[131] Pavlo, A., Paulson, E., Rasin, A., Abadi, D. J., DeWitt, D. J., Madden, S. R., and Stonebraker, M. A comparison of approaches to large scale data analysis. In SIGMOD (Providence, RI, USA, 2009).
[132] Peleg, S., Rousso, B., Acha, A. R., and Zomet, A. Mosaicing on adaptive manifolds. IEEE Trans. Pattern Analysis and Machine Intelligence 22, 10 (Oct. 2000), 1144-1154.
[133] Perez, P., Gangnet, M., and Blake, A. Poisson image editing. ACM Trans. Graph. 22, 3 (2003), 313-318.
[134] Philip, S., Summa, B., Bremer, P.-T., and Pascucci, V. Parallel gradient domain processing of massive images. In Eurographics Symposium on Parallel Graphics and Visualization (Llandudno, Wales, UK, 2011), T. Kuhlen, R. Pajarola, and K. Zhou, Eds., Eurographics Association, pp. 11-19.
[135] Philip, S., Summa, B., Pascucci, V., and Bremer, P.-T. Hybrid CPU-GPU solver for gradient domain processing of massive images. In ICPADS '11: Proceedings of the 2011 IEEE 17th International Conference on Parallel and Distributed Systems (Washington, DC, USA, 2011), IEEE Computer Society, pp. 244-251.
[136] Pretto, A., Menegatti, E., Bennewitz, M., Burgard, W., and Pagello, E. A visual odometry framework robust to motion blur. In ICRA (2009), IEEE, pp. 2250-2257.
[137] PTgui, 2012. http://www.ptgui.com.
[138] Rastogi, A., and Krishnamurthy, B. Localized hierarchical graph cuts. In ICVGIP (2008), IEEE, pp. 163-170.
[139] Ricker, P. M. A direct multigrid Poisson solver for oct-tree adaptive meshes. The Astrophysical Journal Supplement Series 176 (2008), 293-300.
[140] Roberts, F. On the boxicity and cubicity of a graph. Recent Progress in Combinatorics, 1969.
[141] Rosgen, B., and Stewart, L. Complexity results on graphs with few cliques. Discrete Mathematics & Theoretical Computer Science 9, 1 (2007).
[142] Rother, C., Kolmogorov, V., and Blake, A. GrabCut: Interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. 23, 3 (2004), 309-314.
[143] Sagan, H. Space-Filling Curves. Springer-Verlag, New York, NY, 1994.
[144] Sand, P., and Teller, S. Video matching. ToG 23, 3 (2004), 592-599.
[145] Shum, H. Y., and Szeliski, R. S. Construction and refinement of panoramic mosaics with global and local alignment. In ICCV (1998), pp. 953-956.
[146] Simchony, T., and Chellappa, R. Direct analytical methods for solving Poisson equations in computer vision problems. IEEE Trans. Pattern Anal. Mach. Intell. 12 (1990), 435-446.
[147] Snavely, N., Garg, R., Seitz, S. M., and Szeliski, R. Finding paths through the world's photos. In SIGGraph-08 (2008), pp. xx-yy.
[148] Stookey, J., Xie, Z., Cutler, B., Franklin, W. R., Tracy, D. M., and Andrade, M. V. A. Parallel ODETLAP for terrain compression and reconstruction. In GIS (2008), W. G. Aref, M. F. Mokbel, and M. Schneider, Eds., ACM, p. 17.
[149] Summa, B., Scorzelli, G., Jiang, M., Bremer, P.-T., and Pascucci, V. Interactive editing of massive imagery made simple: Turning Atlanta into Atlantis. ACM Trans. Graph. 30, 2 (Apr. 2011), 7:1-7:13.
[150] Summa, B., Tierny, J., and Pascucci, V. Panorama weaving: Fast and flexible seam processing. ACM Trans. Graph. 31, 4 (July 2012), 83:1-83:11.
[151] Summa, B., Vo, H. T., Silva, C., and Pascucci, V. Massive image editing on the cloud. In IASTED International Conference on Computational Photography (CPhoto 2011) (2011).
[152] Sun, J., Jia, J., Tang, C.-K., and Shum, H.-Y. Poisson matting. ACM Trans. Graph. 23, 3 (2004), 315-321.
[153] Szeliski, R. Image mosaicing for tele-reality applications. In Proceedings of the Second IEEE Workshop on Applications of Computer Vision (1994), pp. 44-53.
[154] Szeliski, R. Video mosaics for virtual environments. Computer Graphics and Applications, IEEE 16, 2 (1996), 22-30.
[155] Szeliski, R. Image alignment and stitching: A tutorial. Foundations and Trends in Computer Graphics and Vision 2, 1 (2006).
[156] Szeliski, R., and Shum, H.-Y. Creating full view panoramic image mosaics and environment maps. SIGGRAPH '97: Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques (Aug. 1997).
[157] Szeliski, R. S. Video mosaics for virtual environments. IEEE Computer Graphics and Applications 16, 2 (Mar. 1996), 22-30.
[158] Tai, Y. W., Du, H., Brown, M. S., and Lin, S. Image/video deblurring using a hybrid camera. In CVPR (2008), pp. 1-8.
[159] Tasdizen, T., Koshevoy, P., Grimm, B. C., Anderson, J. R., Jones, B. W., Watt, C. B., Whitaker, R. T., and Marc, R. E. Automatic mosaicking and volume assembly for high-throughput serial-section transmission electron microscopy. Journal of Neuroscience Methods 193, 1 (2010), 132-144.
[160] Toledo, S. A survey of out-of-core algorithms in numerical linear algebra. In External Memory Algorithms, DIMACS Series in Discrete Mathematics and Theoretical Computer Science. American Mathematical Society, Boston, MA, 1999, pp. 161-179.
[161] Triggs, B., McLauchlan, P., Hartley, R. I., and Fitzgibbon, A. W. Bundle adjustment: A modern synthesis. In Vision Algorithms Workshop: Theory and Practice (1999), pp. 298-372.
[162] Tuytelaars, T., and Mikolajczyk, K. Local invariant feature detectors: A survey. Foundations and Trends in Computer Graphics and Vision 3, 3 (2007), 177-280.
[163] Tzimropoulos, G., Argyriou, V., Zafeiriou, S., and Stathaki, T. Robust FFT-based scale-invariant image registration with image gradients. Pattern Analysis and Machine Intelligence, IEEE Transactions on PP, 99 (2010), 1-1.
[164] USGS. United States Geological Survey. http://www.usgs.gov/.
[165] Uyttendaele, M. T., Eden, A., and Szeliski, R. S. Eliminating ghosting and exposure artifacts in image mosaics. In CVPR (2001), pp. II: 509-516.
[166] Valgren, C., and Lilienthal, A. J. SIFT, SURF & seasons: Appearance-based long-term localization in outdoor environments. Robotics and Autonomous Systems 58, 2 (2010), 149-156.
[167] Vineet, V., and Narayanan, P. J. CUDA cuts: Fast graph cuts on the GPU. In Computer Vision on GPU (2008), pp. 1-8.
[168] Vitter, J. S. External memory algorithms and data structures: Dealing with massive data. ACM Comput. Surv. 33, 2 (2001), 209-271.
[169] Vo, H., Bronson, J., Summa, B., Comba, J., Freire, J., Howe, B., Pascucci, V., and Silva, C. Parallel visualization on large clusters using MapReduce. In Proceedings of the 2011 IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV) (2011), p. (to appear).
[170] Vo, H. T., Osmari, D. K., Summa, B., Comba, J. L. D., Pascucci, V., and Silva, C. T. Streaming-enabled parallel dataflow architecture for multicore systems. Comput. Graph. Forum 29, 3 (2010), 1073-1082.
[171] Wang, B., Summa, B., Pascucci, V., and Vejdemo-Johansson, M. Branching and circular features in high dimensional data. Visualization and Computer Graphics, IEEE Transactions on 17, 12 (Dec. 2011), 1902-1911.
[172] Ward, G. Hiding seams in high dynamic range panoramas. In APGV (2006), R. W. Fleming and S. Kim, Eds., vol. 153 of ACM International Conference Proceeding Series, ACM, p. 150.
[173] Weiss, Y. Deriving intrinsic images from image sequences. In International Conference on Computer Vision (2001), pp. 68-75.
[174] Wood, D. N., Finkelstein, A., Hughes, J. F., Thayer, C. E., and Salesin, D. Multiperspective panoramas for cel animation. In SIGGRAPH (1997), pp. 243-250.
[175] Wu, C. SiftGPU: A GPU implementation of scale invariant feature transform (SIFT), 2007. http://cs.unc.edu/~ccwu/siftgpu.
[176] Xiong, Y., and Pulli, K. Fast image labeling for creating high-resolution panoramic images on mobile devices. In ISM (2009), IEEE Computer Society, pp. 369-376.
[177] Xiong, Y., and Pulli, K. Fast panorama stitching for high-quality panoramic images on mobile phones. IEEE Transactions on Consumer Electronics (Jan. 2010).
[178] Xiong, Y., Wang, X., Tico, M., and Liang, C. Panoramic imaging system for mobile devices. SIGGRAPH '09: Posters (Jan. 2009).
[179] Xu, D., Chen, Y., Xiong, Y., Qiao, C., and He, X. On the complexity of/and algorithms for finding shortest path with a disjoint counterpart. IEEE/ACM Trans. on Networking 14, 1 (2006), 147-158.
[180] Yahoo! Yahoo! expands its M45 cloud computing initiative, adding top universities to supercomputing research cluster. http://research.yahoo.com/news/3374.
[181] Yoon, S.-E., and Lindstrom, P. Mesh layouts for block-based caches. IEEE Transactions on Visualization and Computer Graphics 12, 5 (2006), 1213-1220.
[182] Yoon, S.-E., Lindstrom, P., Pascucci, V., and Manocha, D. Cache-oblivious mesh layouts. ACM Trans. Graph. 24, 3 (2005), 886-893.
[183] Yuan, L., Sun, J., Quan, L., and Shum, H.-Y. Image deblurring with blurred/noisy image pairs. ACM Trans. Graph. 26, 3 (2007), 1.
[184] Zitova, B., and Flusser, J. Image registration methods: A survey. Image and Vision Computing 21, 11 (Oct. 2003), 977-1000.
[185] Zoghlami, I., Faugeras, O., and Deriche, R. Using geometric corners to build a 2D mosaic from a set of images. Computer Vision and Pattern Recognition, 1997. Proceedings., 1997 IEEE Computer Society Conference on (1997), 420-425.