Mont’e Scan: Effective shape and color digitization of cluttered 3D artworks
FABIO BETTIO, ALBERTO JASPE VILLANUEVA, EMILIO MERELLA, FABIO MARTON, and ENRICO GOBBETTI, CRS4 Visual Computing, Italy
RUGGERO PINTUS, CRS4 Visual Computing, Italy and Yale University, USA

We propose an approach for improving the digitization of shape and color of 3D artworks in a cluttered environment using 3D laser scanning and flash photography. In order to separate clutter from acquired material, semi-automated methods are employed to generate masks used to segment the range maps and the color photographs. This approach allows the removal of unwanted 3D and color data prior to the integration of acquired data in a 3D model. Sharp shadows generated by flash acquisition are easily handled by this masking process, and color deviations introduced by the flash light are corrected at the color blending step by taking into account the geometry of the object. The approach has been evaluated on a large-scale acquisition campaign of the Mont’e Prama complex. This site contains an extraordinary collection of stone fragments from the Nuragic era, which depict small models of prehistoric nuraghe (cone-shaped stone towers), as well as larger-than-life archers, warriors, and boxers. The acquisition campaign has covered 37 statues mounted on metallic supports. Color and shape were acquired at a resolution of 0.25mm, which resulted in over 6200 range maps (about 1.3G valid samples) and 3817 photographs.

Categories and Subject Descriptors: I.3.3 [Computer Graphics]: Picture and Image Generation; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism
General Terms: Cultural Heritage
Additional Key Words and Phrases: 3D scanning, shape acquisition, color acquisition, 3D visualization

ACM Reference Format:
Fabio Bettio, Alberto Jaspe Villanueva, Emilio Merella, Fabio Marton, Enrico Gobbetti, and Ruggero Pintus. 2014. Mont’e Scan: Effective shape and color digitization of cluttered 3D artworks. ACM J. Comput. Cult. Herit. 8, 1, Article 4 (August 2014), 22 pages. DOI: http://dx.doi.org/10.1145/2644823

This research is partially supported by the Region of Sardinia, EU FP7 grant 290277 (DIVA), and Soprintendenza per i Beni Archeologici per le Province di Cagliari ed Oristano. Authors’ address: F. Bettio, R. Pintus, A. Jaspe, E. Merella, F. Marton, and E. Gobbetti; CRS4, POLARIS Ed. 1, 09010 Pula (CA), Italy; email: {fabio,ruggero,ajaspe,emerella,marton,gobbetti}@crs4.it; www: http://www.crs4.it/vic/ http://graphics.cs.yale.edu/

1. INTRODUCTION
The increasing performance and proliferation of digital photography and 3D scanning devices is making it possible to acquire, at reasonable costs, very dense and accurate samplings of both the geometric and optical surface properties of real objects. A wide variety of cultural heritage applications stand to benefit particularly from this technological evolution. In fact, this technological progress is leading to the
possibility of constructing accurate colored digital replicas not only of single objects but at a large scale. Accurate reconstructions built from objective measures have many applications, ranging from virtual restoration to visual communication.

Fig. 1. Reassembled Nuragic statue with supports and its virtual reconstruction. The black support structure holds the fragments in the correct position, with minimal contact surface, avoiding pins and holes in the original material. A 360-degree view is possible, but color and shape capture is difficult because of clutter, occlusions, and shadows. The rightmost image depicts our 3D reconstruction. Photo courtesy of ArcheoCAOR.

The digitization approach most widely used today is a combination of laser scanning with digital photography. Using computational techniques, digital object surfaces are reconstructed from the laser-scan-generated range maps, while the apparent color sampled in digital photographs is transferred to the 3D surface by registering the photos with respect to the 3D model and mapping them using the recovered inverse projections. Since early demonstrations of the complete modeling pipeline (e.g., [Bernardini and Rushmeier 2002; Levoy et al. 2000]), most of its components have reached sufficient maturity for adoption in a variety of application domains. This approach is particularly well suited to cultural heritage digitization, since scanning and photographic acquisition campaigns can be performed quickly and easily, without the need to move objects to specialized acquisition labs. The most costly and time-consuming part of 3D reconstruction is thus moved to post-processing, which can be performed off-site. Thus, in recent years research has focused on improving and automating the post-processing steps – for instance, leading to (semi-)automated scalable solutions for range-map alignment [Pingi et al. 2005], surface reconstruction from point clouds [Kazhdan et al. 2006; Manson et al. 2008; Cuccuru et al. 2009; Calakli and Taubin 2011], photo registration [Pintus et al. 2011c; Corsini et al. 2012], and color mapping [Callieri et al. 2008; Pintus et al. 2011b; 2011a]. Even though passive image-based methods have recently emerged as a viable (and low-cost) 3D reconstruction technology [Remondino 2011], the standard pipeline based on laser scanning or other active sensors still remains a widely used general-purpose approach, mainly because of its higher reliability in a wider variety of settings (e.g., featureless surfaces) [Remondino 2011; Koutsoudis et al. 2014].
In this paper, we tackle the difficult problem of effectively adapting the 3D scanning pipeline to the on-site acquisition of the color and shape of 3D artworks in a cluttered environment. This case arises, for instance, when scanning restored and reassembled ancient statues in which (heavy) stone fragments are held in place by a custom exostructure (see Fig. 1 for an example). Digitizing statues without removing the supports allows one to perform scanning directly on location and without moving the fragments, therefore enabling a completely contactless approach. On the other hand, the presence of the supporting structure typically generates shadow-related color artifacts, holes due to occlusion effects, and extra geometry that must be removed. With the standard 3D scanning pipeline these issues lead to laborious and mostly manual post-processing steps – including cleaning the geometry and careful pixel masking (see Sec. 2 for more details and an overview of related work).

Motivated by these issues, in this work we present a practical approach for improving the digitization of the shape and color of 3D artworks in a cluttered environment. While our methods are generally applicable, the work was spurred by our involvement in the Digital Mont’e Prama project (see Sec. 3), which included the fine-scale acquisition of 37 large statues and therefore required robust and scalable methods. Our main contributions presented in this article are the following:
—an easy-to-apply acquisition protocol based on laser scanning and flash photography;
—a simple and practical semi-automatic method for clutter removal and photo masking;
—a scalable implementation of the entire masking, editing, infilling, color-correction, and color-blending pipeline, which works fully out-of-core without limits on model size and photo count;
—the evaluation of the method and tools in a large-scale real-world application involving a massive acquisition campaign covering 37 statues mounted on metallic supports, acquired at 0.25mm resolution, resulting in over 6200 range scans (approximately 1.3G valid samples) and 3817 10Mpixel photographs.

This work is an invited extended version of our Digital Heritage 2013 contribution [Bettio et al. 2013]. In addition to supplying a more thorough exposition, we provide significant new material, including an analysis of requirements, the description of an improved pipeline which performs color infilling and takes into account the position and orientation of surfaces during color mapping, and the presentation of results on the complete Mont’e Prama dataset.

2. RELATED WORK
Our system extends and combines state-of-the-art results in a number of technological areas. In the following text we only discuss the approaches most closely related to our novel contributions, which focus on effective clutter removal. For more detailed information on the entire 3D scanning pipeline we refer the reader to the recent survey by Callieri et al. [2011]. Our approach assumes that color images are reliably registered to the 3D models. Obtaining this registration is an orthogonal problem for which a variety of solutions exist (e.g., [Borgeat et al. 2009; Pintus et al. 2011c; Corsini et al. 2012; Pintus and Gobbetti 2014]). The results in this work have been obtained with the approach of Pintus et al. [2014].

Color and geometry masking.
Editing and cleaning the acquired 3D model is often the most time-consuming reconstruction task [Callieri et al. 2011]. While some techniques exist for semi-automatic clutter removal in 3D scans, they are typically limited to well-defined situations (e.g., walls vs. furniture for interior scanning [Adan and Huber 2011], or walls vs. organic models for exterior scanning [Lafarge and Mallet 2012]). Manual editing is also typically employed in approaches that work on images and range maps. For instance, Farouk et al. [2003] embedded a simple image editor into the scanning GUI. We also follow the approach of working on 2D representations, but concentrate our efforts on reducing human intervention. Interactive 2D segmentation is a well-known research topic with several state-of-the-art solutions that typically involve classification and/or editing of color image datasets (see the well-established surveys [Zhang et al. 2008; McGuinness and O’Connor 2010]). In general, the aim of these techniques is to efficiently cope with the foreground/background extraction problem with the least possible user input. The simplest tool available is the Magic Wand in Adobe Photoshop 7 [Adobe Systems Inc. 2002]: the user selects a point and the software automatically computes a connected set of pixels that belong to the same region. Unfortunately, an acceptable segmentation is rarely achieved automatically, since choosing the correct color or intensity tolerance value is a difficult or even impossible task. Many classic methods, such as intelligent scissors [Mortensen and Barrett 1999], active contours [Kass et al. 1988], and Bayes matting [Chuang et al. 2001], require a considerable degree of user input in order to achieve satisfactory results. More accurate approaches have been presented that solve the semi-automatic image segmentation problem by using Graph Cuts [Boykov and Jolly 2001]; here the user marks a small set of background and/or foreground pixels as seeds and the algorithm propagates that information to the remaining image regions. Among the large number of extensions to the Graph Cuts methodology [Zeng et al. 2008; Xu et al. 2007], the GrabCut technique [Rother et al. 2004] combines a very simple manual interaction with color modeling and an extra layer of (local) minimization on top of the Graph Cuts technique; this requires little effort from the user but proves to be very robust in different segmentation scenarios. In this work we propose an adaptation of the GrabCut approach to the problems of editing point-cloud geometries and pre-processing images for texture blending. In our approach, we perform a minimal user-assisted training on a small set of acquired range maps and images in order to automatically remove clutter from images and point clouds.

Color acquisition and blending. Most cultural heritage applications require the association of material properties to geometric reconstructions of the sampled artifact. While many methods exist for sampling Bidirectional Reflectance Distribution Functions (BRDF) [Debevec et al. 2000; Lensch et al. 2003] in sophisticated environments with controlled lighting, typical cultural heritage applications impose fast on-site acquisition and the use of low-cost and easy-to-use procedures and technologies. Color photography is the most common approach.
Since removing lighting artifacts requires knowledge of the lighting environment, one approach is to employ specific techniques that use probes [Debevec 1998; Corsini et al. 2008]. However, these techniques are hard to use in practice in typical museum settings with local lights. Dellepiane et al. [2009; 2010] proposed, instead, to use the light from camera flashes. They propose the Flash Lighting Space Sampling (FLiSS) – a correction space where a correction matrix is associated to each point in the camera field of view. Nevertheless, this method requires a laborious calibration step. Given that medium- to high-end single-lens reflex (SLR) cameras support fairly uniform flash illumination and RAW data acquisition modes that produce images where each pixel value is proportional to incoming radiance [Kim et al. 2012], we take the simpler approach of using a constant color balance correction for the entire set of photographs and apply a per-pixel intensity correction based on geometric principles. This approach effectively reduces calibration work. While our previously published results [Bettio et al. 2013] only employed a distance-based correction, in this work we employ a more complete correction that also takes into account surface orientation. The method is similar to the one originally used by Levoy et al. [2000], without the need for special fiber-optic illuminators and per-pixel color response calibration. Under our flash illumination, taken from relatively far from the statues (2.5m), the flash can be approximated as a point source, and energy deposition on the statue is negligible compared to typical ambient lighting. In addition, while previous color blending pipelines worked on large triangulated surfaces [Levoy et al. 2000; Callieri et al. 2008] or single-resolution point clouds [Pintus et al. 2011a], we blend images directly on multiresolution structures, leading to increased scalability. The pipeline presented in our original work [Bettio et al. 2013] is also combined here with inpainting and infilling methods for constructing seamless models. Our implementation is based on combining screened Poisson reconstruction [Kazhdan and Hoppe 2013] with an anisotropic color diffusion process [Wu et al. 2008] implemented in a multigrid framework.

Scalable editing and processing. Massive point-cloud processing and interactive point-cloud editing are required to produce high-quality 3D models from an input set of registered photos and merged geometries. In this work, we represent geometric datasets as a forest of out-of-core octrees of point samples [Wand et al. 2007; Wimmer and Scheiblauer 2006; Scheiblauer and Wimmer 2011] and employ the same structure for all operations, including color blending and editing. While previous works split and refine nodes based on strict per-node sample budgets, our approach is based on local density estimates, which allows us to keep the structures more balanced.

3. CONTEXT AND METHOD OVERVIEW
The design of our method, which is of general use, has taken into account requirements gathered from domain experts in the context of a large-scale project. Section 3.1 briefly illustrates the design process and the derived requirements, while Sec. 3.2 provides a general overview of our approach, justifying the design decisions in relation to the requirements.

Fig. 2.
Mont’e Prama statues on display at the CRCBC exhibition hall. Scanning was performed on-site.

3.1 Requirements
While our methods are of general use, our work is motivated by the Digital Mont’e Prama project, a collaborative effort between CRS4 and the Soprintendenza per i Beni Archeologici per le Province di Cagliari ed Oristano (ArcheoCAOR, the government department responsible for the archeological heritage of the provinces of Cagliari and Oristano), which aims to digitally document, archive, and present to the public the large and unique collection of prehistoric statues from the Mont’e Prama complex, including larger-than-life human figures and small models of prehistoric nuraghe (cone-shaped stone towers). The project covers aspects ranging from 3D digitization to visual exploration (see Balsa et al. [2014] and Marton et al. [2014] for more details on the visual exploration aspects).

The Mont’e Prama Collection. The Mont’e Prama complex is a large set of sandstone sculptures created by the Nuragic civilization in Western Sardinia. More than 5000 sculpture fragments were recovered after four excavation campaigns carried out between 1975 and 1979. According to the most recent estimates, the stone fragments came from a total of 44 statues depicting archers, boxers, warriors, and models of prehistoric nuraghe. These can be traced to an as-yet undetermined period, ranging from the tenth to the seventh century BC. Restoration, carried out at the Centro di Restauro e Conservazione dei Beni Culturali (CRCBC) of Li Punti (Sassari, Italy), resulted in the partial reassembly of 25 human figures with heights varying between 2 and 2.5 meters, and 13 approximately one-meter-sized nuraghe models. Following modern restoration criteria, reassembly was performed in a non-invasive way (no drilling or bolt insertions into the sculptures). Fragments with certain joins have been glued together using a water-soluble epoxy resin, and all the gaps on the resin-filled surface were covered with lime-mortar stucco. Custom external supports have been designed to sustain all the parts of a statue in order to ensure the stability of all the components without the use of mechanical attachments, while minimizing contact with the statue and maximizing visibility (see Fig. 1). All supports allow a 360-degree view of the statue.

Fig. 3. Pipeline. We improve digitization of 3D artworks in a cluttered environment using 3D laser scanning and flash photography. Semi-automated methods are employed to generate masks to segment the 2D range maps and the color photographs, removing unwanted 3D and color data prior to 3D integration. Sharp shadows generated by flash acquisition are handled by the masking process, and color deviations introduced by the flash light are corrected at color blending time by taking into account object geometry. A final seamless model is created by combining Poisson reconstruction with anisotropic color diffusion. User-guided phases are highlighted in yellow.
The project covers 37 out of the 38 reassembled statues – one nuraghe model was excluded since it is small and still subject to further reassembly work (see Fig. 2 for the setup of the exhibition in the CRCBC hall and Fig. 9 for the final digitally reconstructed models).

Requirement Analysis. In order to design our acquisition and 3D reconstruction technique, we embarked on a participatory design process involving domain experts with the goal of collecting the detailed requirements of the application domain; the experts included two archaeologists from ArcheoCAOR and two restoration experts from CRCBC. Additional requirements stem from our analysis of related work (see Sec. 2) and our own past experience developing capture and reconstruction systems for architecture and cultural heritage [Cuccuru et al. 2009; Pintus et al. 2011a; Mura et al. 2013]. In the following text we describe the main requirements used for guiding the development process, briefly summarizing how they were derived:

R1. On-site scanning in a limited time frame. Scanning of the entire set of statues must be performed while the statues are on display in the CRCBC exhibition hall. All the scanning has to be performed during a time frame of at most 8h/day (early morning and late evening) and in a period of no more than two months.

R2. No contact and no fragment motion. The statues are made of relatively soft sandstone and are very sensitive to contact and erosion. Therefore, the acquisitions must be performed with the statues mounted on their supports and without any contact. The only motion possible is the sliding of the base of each fully reassembled statue (i.e., 2D rotation and translation on the exhibition floor). Fragments cannot be moved relative to one another.

R3. Macro- and micro-structural shape capture and reconstruction. Like almost any decorated cultural heritage artifact, the statues present information at multiple scales (global shape and carvings). Even the finest material micro-structure carries valuable information (e.g., on the artistic decorations, as well as on the carving process). For instance, almost all the fragments of the Mont’e Prama statues have millimeter-sized carvings on various parts, and thus require sub-millimetric model precision.

R4. High-resolution color capture and reconstruction. While no results of a human coloring process are currently visible on the statues, color carries valuable information and is part of the aura of the objects. For instance, the fragments acquired in this project are often made of stones with different grains. In fact, the sandstone itself has a spatially varying texture, while color inserts and characteristic patterns are due to organic and fossil inclusions as well as limestone deposits. In addition, traces of fire are visible on multiple fragments and lead to a characteristic dark-brown coloring. Moreover, the surface finish is variable and subtly modifies stone color, while not appreciably modifying the reflectance, which remains extremely diffuse. In short, it is important to capture and reproduce color at a resolution comparable to the resolution of geometry capture.

R5. Seamless virtual models for public presentation.
While the acquisition process should lead to the creation of models accurate enough for archival, study, and further restoration activities, one of the main motivations for the project is to use them for visual communication to scholars as well as to the public at large, both in museum settings and on the web. In order to preserve the aura of the original artifacts, the 3D models should be at high resolution and seamless. The exostructures should not be present in these virtual models. Since the exostructure has contact points with the artwork, holes are unavoidable. However, for public presentation the model should be watertight, as visible holes would be confusing and unattractive. Since contact areas are small and placed in areas lacking detail, smoothly infilling and inpainting these areas is considered acceptable for public display.

R6. Low-labor approach. To be applicable to the Mont’e Prama project, and to ensure wide adoption in other cultural heritage campaigns, the proposed method should be flexible and require a reasonably low amount of labor, reducing the time and costs required to produce the models.

3.2 Approach
Requirements R1-R6 were taken as guidelines for our research, which resulted in the definition of a semi-automatic color and shape 3D scanning pipeline for the acquisition and reconstruction of cluttered objects. The pipeline is an enhancement of the well-established pipeline based on laser scanning and digital photography. This makes the method general-purpose enough to be used in different settings by operators trained to use the classic robust approach based on 3D laser scanning.

Figure 3 outlines the approach used, which consists of a short on-site phase and a subsequent, mostly automatic, off-site phase (R1 and R6). The only on-site operations are the acquisition of geometry and color, which are performed in a contactless manner by only sliding and rotating the statue while it is mounted on its support (R2). The geometry acquisition is performed with a triangulation laser scanner (R3), which produces range and reflectance maps that are incrementally and coarsely aligned during scanning in order to monitor 3D surface coverage. Color, on the other hand, is acquired in a dark environment by taking a collection of photographs with an uncalibrated camera, using the camera flash as the only source of light (R1, R4, and R6). A Macbeth color checker, visible in at least one of the photographs, is used for post-process color calibration. Analogously to the geometry acquisition step, coverage is (optionally) checked on-site by coarsely aligning the photographs using a Structure-from-Motion (SfM) pipeline.

The remainder of the work can be performed off-site using semi-automatic (R6) geometry and color pipelines that communicate only at the final merging step. In order to remove geometric clutter (R5), the user manually segments a very small subset of the input range maps, producing a training dataset that is sufficient for the algorithm to automatically mask unwanted geometry. This step exploits the reflectance channel of the laser scanner.
As commonly done in cultural heritage pipelines, the automatic masking can in principle be revised by visually inspecting and optionally manually improving the segmentation, using the same tools employed for creating the example masks. Note that this step, in contrast to previous work [Farouk et al. 2003], is entirely optional (see Sec. 8 for an evaluation of the manual labor required). In order to create a clean 3D model, the masks are applied to all range maps, which are then finely registered with a global registration method and optionally edited manually for the finishing touches. The geometry reconstruction is then performed using Poisson reconstruction [Kazhdan et al. 2006], which takes care of infilling the small holes that appear in unscanned areas (i.e., where supports touch the surface of the scanned object) using a smoothness prior on the indicator function. The effect is similar to volumetric diffusion [Davis et al. 2002].

The color pipeline follows a similar work pattern. It begins from the photographs in RAW format. After the training performed by the user on a small subset of images, the algorithm automatically masks all the input photos, removing clutter (R5). The user then optionally performs a visual check and a manual refinement; the masked photos – already coarsely aligned among themselves with SfM – are then aligned with the geometry using the method by Pintus et al. [2014]. The photos are finally mapped to the surface by color-blended projection [Callieri et al. 2008; Pintus et al. 2011a]. During the blending step, colors are calibrated using data extracted from the color checker, and the differences in illumination caused by the flash used during photography are corrected using geometric information (R5). Finally, an anisotropic diffusion process [Wu et al. 2008] is employed to perform a conservative inpainting of the areas left without color due to occlusions (R5).

It should be noted that the infilling and inpainting approaches employed in this work are minimalist (R5). We do not aim at reconstructing high-frequency details in large areas. Instead, we just smoothly extend color and geometry from the neighborhood of holes to avoid the presence of confusing and unattractive holes in public display applications. Details on semi-automatic geometry and color masking, as well as scalable data consolidation and color mapping, are provided in the following sections. The corresponding phases are highlighted in yellow in Fig. 3.

4. DATA ACQUISITION
While geometry acquisition is performed using the standard triangulation laser scanning approach, color acquisition is performed using an uncalibrated flash camera. In our context, flash illumination is a viable way to image the objects, as it provides us with sharp shadows together with information on the image-specific direction of the illumination. Since at color-mapping time the geometry of the image is known, we can correct each projected pixel according to the position of the surface on which it projects with respect to the camera and the flash light, thus obtaining a reasonable approximation of the surface albedo (see Sec. 7).
In addition, cluttering material – e.g., the supporting exostructure – generates sharp shadows which can be easily identified both by the masking process and by taking into account geometric occlusion in the color mapping process (see Sec. 5). In contrast to previous work [Dellepiane et al. 2009; 2010; Levoy et al. 2000], we handle images directly in RAW format, which allows us to correct images without prior camera calibration. We experimentally measured that on a medium/high-end camera, such as the Nikon D200 employed in our work, acquisition in RAW format produces images with pixel values proportional to the incoming radiance at medium illumination levels (as also verified elsewhere [Friedrich et al. 2010; Kim et al. 2012]), and that the flash emits a fairly uniform light within a reasonable working space.

Fig. 4. Flash illumination. Using RAW camera data, distance-based scaling provides a reasonable correction. Balance between color channels can then be ensured using color-checker-based calibration.

Sensor near-linearity has been verified by taking images of a checkerboard in a dark room with t=1/250s, f/11.0+0.0, ISO 400. As shown in the graph in Fig. 4, the values (measured on the white checkerboard squares) are proportional to 1/d², where d is the distance from the flash light. Distance-based scaling can thus be exploited at color-mapping time to provide a reasonable correction, while balance between color channels can be ensured using color-checker-based calibration (see Sec. 7). The characteristics of the flash illumination have been verified by taking a photograph of a white diffuse material at 2m using the same 50mm lens used for photographing the statues. After correcting for lighting angle and distance, the illumination varies only by a maximum of 7.6% within the view frustum.

While more accurate results may be obtainable with calibration techniques, even the most accurate ones performed off-site [Levoy et al. 2000; Dellepiane et al. 2009; 2010; Friedrich et al. 2010] do not perfectly match local shading and illumination settings since, in particular, indirect illumination is not taken into account and the photographed materials are (obviously) not perfect Lambertian scatterers. We thus consider this uncalibrated approach to be suitable for practical use. It should be noted that, whenever needed, these alternate techniques can easily be performed during post-processing using the same captured data. Indeed, the availability of RAW images in the captured database grants the ability to perform a variety of post-process enhancements [Kim et al. 2012].
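As an illustration of the inverse-square behavior exploited above, the following C++ sketch fits the model I ≈ a/d² to a set of white-patch measurements by least squares and reports the residuals. The listed measurements are illustrative placeholders, not the actual campaign data, and all names are arbitrary:

```cpp
// Minimal sketch: least-squares fit of I = a / d^2 to flash measurements.
// The (distance, intensity) pairs below are made-up illustrative values.
#include <cstdio>
#include <cmath>

int main() {
    const double d[] = {1.0, 1.5, 2.0, 2.5, 3.0};        // meters (hypothetical)
    const double I[] = {4000, 1790, 1010, 645, 448};     // mean RAW values (hypothetical)
    const int n = 5;
    // With x = 1/d^2, the best a in least-squares sense is sum(x*I)/sum(x^2).
    double sxx = 0, sxI = 0;
    for (int i = 0; i < n; ++i) {
        double x = 1.0 / (d[i] * d[i]);
        sxx += x * x;
        sxI += x * I[i];
    }
    double a = sxI / sxx;
    for (int i = 0; i < n; ++i) {
        double pred = a / (d[i] * d[i]);
        std::printf("d=%.1fm measured=%.0f predicted=%.0f (%.1f%% off)\n",
                    d[i], I[i], pred, 100.0 * std::fabs(pred - I[i]) / I[i]);
    }
    return 0;
}
```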
5. SEMI-AUTOMATIC GEOMETRY AND COLOR MASKING
Our masking process aims to separate the foreground geometry (the object to be modeled) from the cluttering data (in particular, occluding objects), under the assumption of different appearances – as captured in the reflectance and color signals. Starting from a manual segmentation of a small set of examples (Sec. 5.1), we train a histogram-based classifier of the materials (Sec. 5.2), which is then refined by finding an optimal labeling of pixels using graph cuts (Sec. 5.3). Then, a final (optional) user-assisted revision can be performed using the same tool used for manual segmentation.

Fig. 5. Automatic masking. Geometry (top) and color (bottom) results for a single image. From left to right: acquired reflectance/color image; user-generated ground truth masking; mask generated by histogram-based classification; final automatically generated mask; difference to ground truth; magnified region of difference image. In the difference image, black and white pixels are perfect matches, while yellow pixels are false positives, green pixels are false-negative points on this image, and red pixels are real false-negative points considering the entire dataset.

5.1 Manual segmentation
To perform the initial training, the user is provided with a custom segmentation tool with the same interface for range maps and color images. The tool allows the user to visually browse images/scans in the acquisition database, visually select a small subset (typically, less than 5%), and draw a segmentation in the form of a binary mask – using white for foreground and black for background. The mask layer is rendered on top of the image layer, and the user can vary the transparency of the mask to evaluate the masking results. In addition to using standard draw/erase brushes, our tool supports interactive grab-cut segmentation [Rother et al. 2004] in which the user selects a bounding box of the foreground object to initialize the segmentation method.

5.2 Histogram-based classification
The user-selected small subset of manually masked images and range maps is used to learn a rough statistical distribution of pixel values that characterize foreground objects. For artifacts made of fairly uniform materials – e.g., stone sculptures – 3-4 range maps and 4-5 images are typically sufficient. For the range maps we build a 1D histogram of reflectance values, quantized to 32 levels, by accumulating all pixels that were marked as foreground in the user-defined mask. For the color images, on the other hand, we use a 2D histogram based on hue and saturation, both quantized to 32 levels. Ignoring the value component is more robust to shading variation due to flash illumination and variable surface orientation. The process is repeated for all manually masked images, thus accumulating histogram values before a final normalization step. The histogram computed on the training set can be used for a rough classification of range map/image pixels based on reflectance/color information. This classification is simply obtained by back-projecting each image pixel to the corresponding bin location and interpreting the normalized histogram value as a foreground probability. It is worth noting that whether the histogram is computed from foreground or clutter data is not important; as long as the rest of the pipeline is consistent, the only constraint is the aforementioned assumption that the two appearances are reasonably well separable.

5.3 Graph cut segmentation
As illustrated in Fig. 5, third column, the histogram-based classification is very noisy but roughly succeeds in identifying the foreground pixels, which are generally marked with high probabilities. This justifies our use of histograms for the rough classification step rather than the more complex statistical representations typically used in soft segmentation [Ruzon and Tomasi 2000; Chuang et al. 2001].
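For concreteness, the following sketch shows how such a hue-saturation histogram can be trained and back-projected with OpenCV (which our implementation builds on, see Sec. 8). Function names and the fixed OpenCV bin ranges are illustrative choices; the 1D reflectance histogram used for the range maps is built analogously:

```cpp
// Sketch of histogram training and rough classification (Sec. 5.2).
#include <opencv2/opencv.hpp>
#include <vector>

// Accumulate a 32x32 hue-saturation histogram over the pixels marked as
// foreground in the user-drawn training masks, then normalize it so that
// back-projected values can be read as foreground probabilities (0..255).
cv::Mat trainFgHistogram(const std::vector<cv::Mat>& trainImages,
                         const std::vector<cv::Mat>& trainMasks) {
    const int bins[] = {32, 32};
    const int channels[] = {0, 1};                   // hue, saturation
    float hRange[] = {0, 180}, sRange[] = {0, 256};  // OpenCV HSV ranges
    const float* ranges[] = {hRange, sRange};
    cv::Mat hist;
    for (size_t i = 0; i < trainImages.size(); ++i) {
        cv::Mat hsv;
        cv::cvtColor(trainImages[i], hsv, cv::COLOR_BGR2HSV);
        cv::calcHist(&hsv, 1, channels, trainMasks[i], hist, 2, bins, ranges,
                     /*uniform=*/true, /*accumulate=*/true);
    }
    cv::normalize(hist, hist, 0, 255, cv::NORM_MINMAX);
    return hist;
}

// Rough per-pixel classification by histogram back-projection: each pixel
// is mapped to the bin of its hue/saturation pair and receives the bin value.
cv::Mat roughForegroundProbability(const cv::Mat& image, const cv::Mat& hist) {
    cv::Mat hsv, prob;
    cv::cvtColor(image, hsv, cv::COLOR_BGR2HSV);
    const int channels[] = {0, 1};
    float hRange[] = {0, 180}, sRange[] = {0, 256};
    const float* ranges[] = {hRange, sRange};
    cv::calcBackProject(&hsv, 1, channels, hist, prob, ranges);
    return prob;  // CV_8U map, ~foreground probability scaled to 0..255
}
```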
Segmentation is improved by using the rough histogram-based classification as the starting point for an iterated graph cut process. We initially separate all pixels into two regions: probably foreground for those with a normalized histogram value larger than 0.5, and probably background for the others. We then iteratively apply the GrabCut [Rother et al. 2004] segmentation algorithm, using a Gaussian Mixture Model with 5 components per region and estimating the segmentation using min-cut. As illustrated in Fig. 5, column 4, the process produces tight and well-regularized segmentation masks.

5.4 Morphological post-pass
Since masks must be conservative, especially at silhouette boundaries where a small misalignment is likely to occur, we found it useful to post-process the masks using morphological filters. After denoising the mask using a small median filter (5x5 in this work), we perform an erosion of the mask using an octagon kernel (4x4 in this case). This step eliminates small isolated spots and keeps the mask from getting too close to the silhouettes; the removal does not create problems given the large overlap of the images that cover the model.
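The refinement and post-pass can be sketched with OpenCV as follows. This is a sketch only: the elliptical structuring element stands in for the octagonal kernel, the 0..255 probability convention follows the histogram sketch above, and all names are illustrative (OpenCV's grabCut internally uses 5-component GMMs, matching the description above):

```cpp
// Sketch of GrabCut refinement (Sec. 5.3) and morphological post-pass
// (Sec. 5.4). 'prob' is the 0..255 back-projection map of Sec. 5.2.
#include <opencv2/opencv.hpp>

cv::Mat refineMask(const cv::Mat& image, const cv::Mat& prob) {
    // Seed GrabCut labels from the rough classification: pixels with a
    // normalized histogram value above 0.5 are "probably foreground".
    cv::Mat mask(image.size(), CV_8UC1);
    for (int y = 0; y < prob.rows; ++y)
        for (int x = 0; x < prob.cols; ++x)
            mask.at<uchar>(y, x) = prob.at<uchar>(y, x) > 127
                                       ? cv::GC_PR_FGD : cv::GC_PR_BGD;
    // Iterated GrabCut: per-region GMM color models + min-cut labeling.
    cv::Mat bgModel, fgModel;
    cv::grabCut(image, mask, cv::Rect(), bgModel, fgModel,
                /*iterCount=*/5, cv::GC_INIT_WITH_MASK);
    cv::Mat fg = (mask == cv::GC_FGD) | (mask == cv::GC_PR_FGD);
    // Morphological post-pass: 5x5 median to remove isolated spots, then a
    // 4x4 erosion (an ellipse here; the paper uses an octagonal kernel).
    cv::medianBlur(fg, fg, 5);
    cv::erode(fg, fg, cv::getStructuringElement(cv::MORPH_ELLIPSE,
                                                cv::Size(4, 4)));
    return fg;  // conservative binary foreground mask
}
```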
6. DATA CONSOLIDATION AND EDITING
The final result of the automatic masking step is a mask image associated to each range map and color image. These masks are used for pre-filtering the geometry and color information before further processing. The remaining processing steps, optionally including color mapping (see Sec. 7), are performed using an editable out-of-core structure based on a forest of octrees. The structure consists of a scene structure, hierarchically grouping models and associating to each group a rigid-body transformation positioning it with respect to its parent. The leaves of the scene structure are out-of-core octrees of points. Similarly to the system of Wand et al. [2007], the data points are stored unmodified in the leaf nodes of the octrees, while inner nodes provide grid-quantization-based multiresolution representations. In contrast to previous work, we dynamically decide whether to subdivide or merge nodes based on a local dynamic density estimation. In particular, at each insertion/deletion operation in a leaf node, we count how many grid cells are occupied in the quantization grid that would be associated to the node upon subdivision. If this number is larger than four times the number of points, we consider the node dense enough for splitting. The structure provides efficient rendering and allows for handling very large datasets using out-of-core storage, while supporting efficient online insertion, deletion, and modification of point samples.

Fig. 6. Interactive inspection and editing. Left: all captured range maps imported into the out-of-core scene structure without applying masks. Right: automatic masking removes most – if not all – clutter. Editing can thus be limited to fixing fine details.

After completing the masking process, we import all the range maps into a scene structure, using a separate octree for each one (see Fig. 6). The initial transformation of each range map is the coarse alignment transformation computed during the scanning operation, while normals and local sample spacing are computed by local differences. The hierarchical scene structure is exploited for grouping scans (e.g., separating low-res from hi-res scans and labeling semantically relevant parts). Alignment and editing operations are applied to the structure using an interactive editor. Alignment is performed by applying to (a subset of) the range scans a global registration method based on the scalable approach of Pulli [1999], using GPU-accelerated KNN to speed up local pairwise ICP [Cayton 2012]. After satisfactory global alignment, multiple octrees can optionally be merged together. Interactive editing is performed on the structure using a select-and-apply paradigm that supports an undo history. At each application of a modifier (e.g., point deletion), modified samples are moved to a temporary out-of-core structure (a memory-mapped array) prior to modification. By associating the array with the undo list, we are able to perform reversible editing.

The final colored point clouds can then be further elaborated to produce seamless surface models. To produce consolidated models represented as colored triangle meshes, the most common surface representation, there exist a number of state-of-the-art approaches [Cuccuru et al. 2009; Manson et al. 2008; Kazhdan et al. 2006]. We have adopted the recent screened Poisson surface reconstruction approach [Kazhdan and Hoppe 2013], which produces high-quality watertight reconstructions by incorporating input points as interpolating constraints, while reasonably infilling missing areas based on smoothness priors. Because the Poisson approach does not handle colored surfaces, we incorporate color in a post-processing phase (see Sec. 7.1).

7. COLOR CORRECTION, MAPPING, AND INPAINTING
The color attribute is obtained by first projecting the masked photos onto the 3D model reference frame and then performing seamless texture blending of those images onto the surface [Pintus et al. 2011c; Pintus et al. 2011a]. In contrast to previous work, we blend and map images directly to the out-of-core structure and perform color correction starting from the captured RAW images during the mapping operation.

7.1 Streaming color mapping
Our streaming photo blending implementation closely follows our previous work [Pintus et al. 2011c; Pintus et al. 2011a], which we have extended to work on the multiresolution point cloud structure. We associate a blending weight to each point, initialized to zero. We then perform photo blending, adding one image at a time. For each image, we start by rendering the point cloud from the camera point of view, using a rendering method that performs a screen-space surface reconstruction and adapting the point cloud resolution to 1 projected point/pixel. We then estimate a per-pixel blending weight with screen-space GPU operations that take as input the depth buffer as well as the stencil masks (see Pintus et al. [2011a] for details). In a second pass on the point cloud, we update the point colors and weights contained in the visible samples of the leaves of the multiresolution structure. Once all images have been blended, we consolidate the structure, recomputing bottom-up the colors and weights of inner-node samples using averaging operations. As a result, the colored models are available in our out-of-core multiresolution point cloud structure for further editing.
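The per-sample update at the core of this second pass reduces to a weighted running average; a minimal sketch follows, with field and function names of our own choosing (the actual implementation operates on the out-of-core octree):

```cpp
// Sketch of per-point color accumulation during photo blending (Sec. 7.1):
// each visible sample folds in one image's contribution at a time.
struct SplatSample {
    float color[3];   // running weighted average of projected colors
    float weight;     // accumulated blending weight (0 = never colored)
};

// 'c' is the flash-corrected pixel color projected onto the sample; 'w' is
// the per-pixel blending weight computed in screen space (from the depth
// buffer and stencil masks, as described above).
inline void accumulate(SplatSample& s, const float c[3], float w) {
    float total = s.weight + w;
    if (total <= 0.f) return;
    for (int i = 0; i < 3; ++i)
        s.color[i] = (s.color[i] * s.weight + c[i] * w) / total;
    s.weight = total;
}
```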
In order to apply this same process to triangulated surfaces, such as those produced by the Poisson reconstruction, we import the surface vertices into our octree, perform the mapping, and then map the color back to the triangulated surface. In this manner we can use the spatial partitioning structure for view-frustum and occlusion culling during mapping operations.

Fig. 7. Color correction and relighting. Left: original image under flash illumination; note the sharp shadows and uneven intensity. Center-left: projected color with distance-based correction and no synthetic illumination; notice that the flash highlight has been removed, but a darker shade remains on the slanted surface. Center-right: projected color with distance-based and orientation-based correction and no synthetic illumination; note the even distribution and good approximation of the surface albedo. Right: synthetically illuminated model based on the recovered albedo, using a different lighting setup.

7.2 Flash color correction
Color correction happens at color blending time, during the color mapping operations. At this phase of the processing, the color mapping algorithm knows the color stored in the corresponding pixel of the RAW image (the apparent color $C_{\mathrm{raw}}$), the camera parameters (camera intrinsic and extrinsic parameters, as well as the flash position), and the geometric information of the current sample (position and normal stored in the corresponding pixel of the frame buffers used to compute blending weights). As we verified, RAW data acquisition produces images where each pixel value is proportional to the incoming radiance, and the flash light is fairly uniform (see Sec. 4). Therefore, we apply a simple color correction method based on first principles, similar to the original approach of Levoy et al. [2000], but without per-pixel calibration of flash illumination and camera response. The results presented in this paper assume that the imaged surface is a Lambertian scatterer, so that the measured color, for a sufficiently distant illumination, can be approximated for each color channel $i$ by

$$C^{(i)}_{\mathrm{raw}} \;\approx\; w^{(i)}_{\mathrm{balance}} \, \frac{I_{\mathrm{flash}}}{d^2} \, C^{(i)}_{\mathrm{surface}} \, (\mathbf{n} \cdot \mathbf{l})^{+} \qquad (1)$$

where $w^{(i)}_{\mathrm{balance}}$ is the channel's scale factor used to achieve color balance, $I_{\mathrm{flash}}$ is the flash intensity, $C^{(i)}_{\mathrm{surface}}$ is the diffuse reflectance (albedo) of the colored surface sample, $\mathbf{n}$ is the surface sample normal, $\mathbf{l}$ is the flash light direction, and $d$ is the distance of the surface sample from the flash light. As in standard settings, the color balance factors are recovered by taking a single image of a calibration target (Macbeth charts in our case) using the same settings used for taking the photographs of the artifacts. Thus, to compute the color of the surface at color mapping time, we consider a user-provided desired object distance $d_0$, using Equation 1 to find

$$C^{(i)}_{\mathrm{mapped}} = \frac{d^2}{w^{(i)}_{\mathrm{balance}} \, d_0^2 \, \bigl(\epsilon + (1-\epsilon)\, \tilde{\mathbf{n}} \cdot \mathbf{l}\bigr)} \, C^{(i)}_{\mathrm{raw}} \qquad (2)$$

where $\epsilon$ is a small non-null value (0.1 in our case) and $\tilde{\mathbf{n}}$ is the smoothed normal (obtained in screen space with a 5x5 averaging filter). Normal smoothing and dot-product offsetting are introduced to reduce the effect of possible over-corrections in the presence of a small misalignment – particularly at grazing angles.
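Equation 2 translates directly into a small per-pixel routine. The following sketch, with illustrative names and types, shows the correction for one color channel:

```cpp
// Sketch of the per-pixel flash correction of Eq. (2) (Sec. 7.2).
#include <cmath>
#include <algorithm>

struct Vec3 { float x, y, z; };
inline float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// raw:      RAW pixel value of one channel (proportional to radiance)
// wBalance: per-channel color-balance factor from the Macbeth chart
// d:        distance of the surface sample from the flash
// d0:       user-provided desired object distance
// nSmooth:  screen-space 5x5-averaged surface normal (unit length)
// l:        unit direction from the surface sample toward the flash
inline float correctFlashChannel(float raw, float wBalance,
                                 float d, float d0, Vec3 nSmooth, Vec3 l) {
    const float eps = 0.1f;  // offset against over-correction at grazing angles
    float ndotl = std::max(0.f, dot(nSmooth, l));
    return (d * d) / (wBalance * d0 * d0 * (eps + (1.f - eps) * ndotl)) * raw;
}
```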
It should be noted that, since the Lambertian model does not take into account the roughness of the surface, under flash illumination it tends to over-shadow at grazing angles. As noted by Oren and Nayar [1994], this effect is due to the fact that while the brightness of a Lambertian surface is independent of the viewing direction, the brightness of a rough surface increases as the viewing direction approaches the light source direction. The small angular weight correction thus also contributes to reducing the boosting of colors near silhouettes. Figure 7 shows how a single flash image introduces sharp shadows and uneven intensity depending on distance and angle of incidence. Shadows are removed by the color masking process described in Sec. 5, as well as by shadow mapping during color projection. Distance-based correction removes flash highlights but still produces darker shades on slanted surfaces. Combining distance-based and orientation-based correction, on the other hand, produces a reasonable approximation of the surface albedo, thereby enabling a seamless combination of multiple images without illumination-dependent coloring. The resulting colored model can thus be used for synthetic relighting.

7.3 Inpainting
The points of contact between the supports and the statues generate small holes in the geometry, as well as missing colors due to occlusions and shadows (shown in white in Fig. 8, left). In order to produce final colored watertight models – useful, e.g., for public presentations – it is important to smoothly reconstruct these missing areas. We took the conservative approach of only using smoothness priors to perform geometry infilling and color inpainting, rather than applying more invasive reconstruction methods based on, for instance, non-local cloning. This conservative approach has the advantage of not introducing spurious details, while repairing the surface enough to avoid the presence of distracting surface and color artifacts during virtual exploration. Geometry infilling is simply achieved by applying the Poisson surface reconstruction method [Kazhdan and Hoppe 2013] to reasonably infill missing areas based on smoothness priors (see Fig. 8, center). Color inpainting, on the other hand, uses an anisotropic color diffusion process [Wu et al. 2008] implemented in a multigrid framework. We employ a meshless approach that can be applied either to the vertices of the triangle mesh produced by the Poisson reconstruction, or directly to a point cloud constructed from it. We assume that each color sample stores the accumulated color and weight coming from color blending. We first extract all points with a null weight, which are those requiring infilling. We then extract a neighbor graph for this point cloud (by edge connectivity when operating on a triangle mesh, or by a k-nearest-neighbor search, with k=8, when working on point clouds), growing the graph by one layer in order to include colored points in the neighborhood of holes. We then produce a hierarchy of simplified graphs using a sequence of coarsening operations on the neighbor graph, so that each level has only one quarter of the samples of the finer one. We stop simplification when the number of nodes is small enough (less than 1000 in this paper) or no more simplification edges exist.
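For illustration, the sketch below performs the diffusion as a single-level Gauss-Seidel relaxation on the neighbor graph; the multigrid V-cycle scheme described next accelerates exactly this kind of iteration. Names and the edge-weight convention are illustrative:

```cpp
// Sketch of color diffusion for inpainting (Sec. 7.3), reduced to a
// single-level relaxation on the neighbor graph. Samples with non-zero
// blending weight act as fixed boundary conditions.
#include <vector>

struct GraphNode {
    float color[3];
    float weight;                 // >0: boundary sample (keeps its color)
    std::vector<int> neighbors;   // mesh edges or k=8 nearest neighbors
    std::vector<float> edgeW;     // anisotropic edge weights (e.g., from
                                  // distance/normal similarity)
};

void diffuseColors(std::vector<GraphNode>& g, int iterations) {
    for (int it = 0; it < iterations; ++it) {
        for (GraphNode& n : g) {
            if (n.weight > 0.f) continue;        // boundary: leave unchanged
            float acc[3] = {0, 0, 0}, wsum = 0.f;
            for (size_t k = 0; k < n.neighbors.size(); ++k) {
                const GraphNode& m = g[n.neighbors[k]];
                for (int c = 0; c < 3; ++c) acc[c] += n.edgeW[k] * m.color[c];
                wsum += n.edgeW[k];
            }
            if (wsum > 0.f)
                for (int c = 0; c < 3; ++c) n.color[c] = acc[c] / wsum;
        }
    }
}
```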
The graph is used to quickly compute anisotropic diffusion using a multigrid solver based on V-cycle iterations. Boundary conditions are computed using the samples with non-zero weight that are included in the hierarchy. The anisotropic diffusion equations are then successively transferred to coarser grids by simple averaging and used in a coarse-to-fine error-correction scheme. Once the coarsest grid is reached, the problem is solved through Gauss-Seidel iterations, and the coarse-grid estimates of the residual error are propagated down to the original grid and used to refine the solution. The cycle is repeated a few times until convergence (results in this paper use 10 V-cycle iterations). As illustrated in Fig. 8, right, color diffusion combined with watertight surface reconstruction successfully masks the color and geometry artifacts due to occlusions and shadows. It is important to note that the original colors and geometry are preserved in the database and that these extra colors can easily be removed from the presentation when desired.

Fig. 8. Geometry infilling and inpainting. Left: the points of contact between the support and the statue generate small holes in the geometry as well as missing colors due to occlusions (in white). Middle: Poisson reconstruction smoothly infills holes. Right: color is diffused anisotropically for conservative inpainting.

8. IMPLEMENTATION AND RESULTS
We implemented the methods described in this paper in a C++ software library and system running on Linux. The out-of-core octree structure is implemented on top of Berkeley DB 4.8.3, while OpenMP is used for parallelizing blending operations. The automatic masking subsystem is implemented on top of OpenCV 2.4.3. RAW color images from the camera are handled using the dcraw 9.10 library. The SfM software used for image-to-image alignment is Bundler 0.4.1 [Snavely et al. 2008]. All tests were run on a PC with an 8-core Intel Core i7-3820 CPU (3.60GHz), 64GB RAM, and an NVIDIA GTX680 graphics board.

Fig. 9. Reconstructed statues of the Mont’e Prama complex. Colored reconstructions of the 37 reassembled statues.

8.1 Acquisition
The scanning campaign covered 37 statues, which were scanned and photographed directly in the museum. Fig. 9 summarizes the reconstruction results. The geometry of all the statues was acquired at a resolution of 0.25mm using a Minolta Vivid 9i in tele mode, resulting in over 6200 640x480 range scans. The number of scans includes a few (wide) coarse scans fully covering each statue, which were acquired to help with global scan registration. The scanning campaign produced over 1.3G valid position samples. Color was acquired with a Nikon D200 camera mounting a 50mm lens. All photos were taken with a flash in a dark room, with a shutter speed of 1/250s, aperture f/11.0+0.0, and ISO sensitivity 400. A total of 3817 10Mpixel photographs were produced. The on-site scanning campaign required 620 hours to complete for a team of two people, one camera, and one scanner. In practice, on-site time was reduced by parallelizing acquisition, with two scanning teams working on two statues at a time.
The acquisition time includes the scanning sessions, the flash photography sessions (in a dark room), and the coarse alignment of scans using our point cloud editor. Photo alignment using the SfM pipeline was performed after each flash acquisition session, in parallel with the scanning sessions, in order to verify whether sufficient coverage had been reached. Average bundle adjustment time was about 2 hours per statue.

8.2 Automatic geometric masking
The quality and efficiency of our automatic geometric masking process were extensively evaluated on a selected dataset, which was also manually segmented to create a ground-truth result. The digital acquisition of the selected statue, named Guerriero3 and depicted in Fig. 1, is composed of 226 range maps (54 of which contain clutter data).

                         Samples          (%)
  Model points           51.4M
  Clutter points         790K
  False-positives        240 (486)        0.03 (0.06)
  False-negatives        35757 (11219)    4.53 (1.42)
  True false-negatives   5639 (2746)      0.68 (0.35)

Fig. 10. Evaluation of automatic geometric masking. Results of the manual segmentation of a single statue (Guerriero3) compared with the results produced by automatic masking. We report the number of range map samples labeled as model (“Model points”) and clutter (“Clutter points”) in the ground-truth dataset, the samples erroneously labeled as statue (“False-positives”) or clutter (“False-negatives”) by the automatic method, as well as the number of false-negative points that really lead to missing data in the combined dataset (“True false-negatives”). Values in parentheses compare the manually refined dataset, instead of the purely automatic result, with the ground truth. Percentages are computed with respect to the number of clutter points.

Each ground-truth mask was created manually from the reflectance channel of the acquired range map using our interactive mask editor. An experienced user took about 330 minutes to complete the manual segmentation process for the entire statue. For the sake of completeness, we also measured the time required to remove clutter data from the 3D dataset by direct 3D point cloud editing, as done in typical scanning pipelines. Using our out-of-core point cloud editor, this operation was completed by an experienced user in about 300 minutes, which is comparable to the time required by the manual 2D segmentation approach. By taking into account the relative complexity of the other statues, we can estimate a total time of about 130-150 man-hours for the manual cleaning of the entire collection of statues.

The automatic segmentation process was started by manually segmenting 5 reflectance images using the same editor used for manual segmentation. This training set was used as input for the automatic classifier. The entire process took 9 minutes for the creation of the training set and 6 minutes for the automatic computation of the masks on an 8-core processor. The automatically generated masks were then manually verified and retouched using our system. This additional step, which is optional, took about 30 minutes. Applying the automatic process to the entire statue collection took only 5 hours, excluding manual cleaning, and a total of 13.5 hours including the manual post-process cleanup: this is a more than ten-fold speed-up with respect to the manual approaches. The efficiency of the automatic masking method can be seen from the results presented in Fig.
The efficiency of the automatic masking method can be appreciated from the results presented in Fig. 10, which compares the automatically segmented masks (with and without post-process manual cleaning) against the ground-truth dataset. More than 95% of the clutter samples are correctly labeled. False-positive samples represent extra points that can be easily identified and removed from the automated masks via 2D editing; they amount to only about 0.03% of the total clutter in the ground-truth dataset. False-negative points represent statue samples that have been erroneously masked; they amount to about 4.5% (1.4% in the cleaned-up dataset) of the total clutter in the ground-truth dataset. Since overlapping range maps typically acquire the geometry of the same model region from multiple points of view, a false-negative sample is not a problem if it is correctly classified in at least one mask covering the same area. Taking this into account, we verified that the points completely missed by the acquisition ("true false negatives") amount to only 0.68% of the total imaged clutter surface. This check was performed by searching the overlapping scans for samples within a radius of 1mm of each missing sample. We can therefore conclude that only a small portion of the surface is missed by the system. Further, Fig. 5 illustrates the position of the missing points; the images show that the points in question are often very sparse or lie in small boundary areas of the model. Their overall effect on dataset quality is thus quite limited.
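The 1mm redundancy check described above is a fixed-radius neighbor query: a false-negative sample is counted as truly missing only if no overlapping scan contributes a sample within the given radius. A minimal sketch based on a uniform hash grid, with cells sized to the query radius, follows; the data layout and all names are illustrative assumptions, and coordinates are assumed to stay within a moderate range.

    // Counting "true false negatives": a false-negative sample is truly
    // missing only if no sample from an overlapping scan lies within a
    // 1mm radius. Uniform hash grid sized to the query radius.
    #include <vector>
    #include <unordered_map>
    #include <cmath>
    #include <cstdint>

    struct P3 { float x, y, z; };
    constexpr float kRadius = 1.0f; // mm, as in the evaluation

    // Pack the 3D cell coordinates of a point into a single hash key.
    static uint64_t cellKey(const P3& p) {
        auto c = [](float v) {
            return (uint64_t)(int64_t)std::floor(v / kRadius) & 0x1FFFFF;
        };
        return (c(p.x) << 42) | (c(p.y) << 21) | c(p.z);
    }

    struct HashGrid {
        std::unordered_map<uint64_t, std::vector<P3>> cells;
        void insert(const P3& p) { cells[cellKey(p)].push_back(p); }

        // Any neighbor within kRadius must sit in one of the 27 cells
        // around the query point; hash collisions only cost extra
        // distance checks, never wrong answers.
        bool hasNeighbor(const P3& q) const {
            for (int dx = -1; dx <= 1; ++dx)
            for (int dy = -1; dy <= 1; ++dy)
            for (int dz = -1; dz <= 1; ++dz) {
                P3 probe{q.x + dx * kRadius, q.y + dy * kRadius, q.z + dz * kRadius};
                auto it = cells.find(cellKey(probe));
                if (it == cells.end()) continue;
                for (const P3& p : it->second) {
                    float ex = p.x - q.x, ey = p.y - q.y, ez = p.z - q.z;
                    if (ex*ex + ey*ey + ez*ez <= kRadius * kRadius) return true;
                }
            }
            return false;
        }
    };

    // True false negatives = false negatives not covered by any other scan.
    long countTrueFalseNegatives(const std::vector<P3>& falseNegatives,
                                 const HashGrid& otherScans) {
        long missing = 0;
        for (const P3& q : falseNegatives)
            if (!otherScans.hasNeighbor(q)) ++missing;
        return missing;
    }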
8.3 Automatic color masking

The quality and efficiency of the color masking process was evaluated in a manner analogous to the geometry masking procedure. The selected statue (Guerriero3, depicted in Fig. 1) was imaged by 68 photographs (33 of which contain clutter data). Manually masking the images took 181 minutes, while the automated process required 9 minutes to generate the training set, 15 minutes to automatically compute the masks on 8 CPU cores, plus a final 30 minutes for the optional manual post-process cleanup. The speed-up provided by our automated procedure is, again, substantial. The semi-automatic masking process for the entire set of statues required a total of only 41 hours (17 hours without the post-process cleaning), whereas, taking into account the relative complexity of the other statues, we estimate a total time of about 145 man-hours for the fully manual cleaning of the entire collection.

                              Samples             (%)
      Model points            220.5M
      Clutter points          12.1M
      False positives         381K (334K)         3.16 (2.77)
      False negatives         263K (253K)         2.18 (2.09)
      True false negatives    8642 (7725)         0.07 (0.06)

Fig. 11. Evaluation of automatic color masking. Results of the manual segmentation of a single statue (Guerriero3) compared with automatic masking results. We report the number of colored samples labeled as model ("Model points") and clutter ("Clutter points") in the ground-truth dataset, the samples erroneously labeled as statue ("False positives") or clutter ("False negatives") by the automatic method, as well as the number of false-negative points that actually lead to missing data in the combined dataset ("True false negatives"). Values in parentheses refer to the comparison between the manually refined dataset and the ground truth, instead of the purely automatic result. Percentages are computed with respect to the number of clutter points.

As illustrated in the table in Fig. 11, the color masking procedure achieves results similar to those obtained by geometry masking. Again, about 95% of the samples are labeled correctly. In this case, false-positive samples are points where clutter color could potentially leak onto geometry areas. These represent about 3% of the clutter area (i.e., below 0.2% of the model area). False-negative points, instead, are statue samples that do not receive color from a given image because they have been erroneously masked; they amount to about 2.2% (2.1% in the cleaned-up dataset) of the total clutter in the ground-truth dataset, but reduce to negligible amounts when overlapping photographs are considered. This is due to the large overlap between photos and to the concentration of false negatives in thin boundary areas that are covered from other angles. The sampling redundancy required for alignment purposes is thus also very beneficial to the automatic masking process.

8.4 Consolidation and coloring

The generated geometry and color masks were used to create digital 3D models of the 37 statues (see Fig. 9). After cleaning, all models were imported into our system based on forests of octrees, which was used for all the 3D editing and color blending. We use lossless compression when storing our hierarchical database, achieving an average cost of about 38B/sample for per-sample positions, normals, radii, colors, and blending weights (including database overhead). Disk footprints for our multiresolution editable representation are thus similar to those of single-resolution uncompressed data.

We compared the performance of our system to the state-of-the-art streaming color blender [Pintus et al. 2011a]. Our pipeline required a total of 23 minutes to blend the Guerriero3 statue; as already mentioned, the pipeline works directly on the editable representation of the model and includes color correction for flash illumination. The streaming color blender, on the other hand, required 2.5 minutes to pre-compute the Morton-ordered sample stream and the culling hierarchy, and 26 minutes for color blending. The increased flexibility of our system thus introduces neither processing-time overhead nor additional temporary storage, while supporting fast turnaround times during iterative editing sessions.

Flash color correction proved adequate in our evaluation. It produces visually appealing results without unwanted color variations or visible seams between acquisitions (see Fig. 7 for an example). It is important to note that, while no traces of painting are currently visible on the statues, including the natural color considerably adds to the realism of the reconstruction, as demonstrated in Fig. 12.

Fig. 12. Effect of color mapping. From left to right: original photograph (boxer 16); virtual reconstruction without color; virtual reconstruction with color.
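To give a concrete flavor of the per-sample correction involved, the sketch below compensates a color sample for inverse-square distance falloff and for surface orientation with respect to the flash position. This simple Lambertian point-light model is an illustrative assumption, not the exact correction implemented in our pipeline.

    // Illustrative flash falloff compensation for one colored sample.
    // Assumption: a point flash at 'flashPos' and a Lambertian surface,
    // so observed radiance scales with cos(theta) / d^2. The pipeline's
    // actual correction is geometry-aware in a similar spirit, but this
    // simple model is ours, not the paper's.
    #include <cmath>

    struct Vec3 {
        float x, y, z;
        Vec3 operator-(const Vec3& o) const { return {x - o.x, y - o.y, z - o.z}; }
        float dot(const Vec3& o) const { return x * o.x + y * o.y + z * o.z; }
        float length() const { return std::sqrt(dot(*this)); }
    };

    struct RGB { float r, g, b; };

    // Undo the distance and orientation attenuation so that samples seen
    // from different camera/flash positions blend to a consistent value.
    RGB correctFlashSample(RGB c, Vec3 samplePos, Vec3 sampleNormal,
                           Vec3 flashPos, float referenceDist) {
        Vec3 toFlash = flashPos - samplePos;
        float d = toFlash.length();
        float cosTheta = sampleNormal.dot(toFlash) / (sampleNormal.length() * d);
        if (cosTheta < 0.05f) cosTheta = 0.05f;   // clamp grazing angles
        // Scale back to the radiance the sample would have at the
        // reference distance under head-on illumination.
        float gain = (d * d) / (referenceDist * referenceDist * cosTheta);
        return {c.r * gain, c.g * gain, c.b * gain};
    }

Applying the inverse of the modeled attenuation before blending makes samples of the same surface point, seen from different camera and flash positions, agree on a common albedo-like value, which is what removes seams between acquisitions.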
9. CONCLUSION

We have presented an approach for improving the digitization of the shape and color of 3D artworks in a cluttered environment using 3D laser scanning and flash photography. The method was evaluated in the context of a real-world, large-scale digital reconstruction project concerning 37 statues mounted on supports. It proved capable of notably reducing both on-site acquisition times and off-site editing times, while producing good-quality results.

Our method is of general applicability and handles the difficult case of on-site acquisition in a cluttered environment, as exemplified by the problem of acquiring models of statue fragments held in place by exostructures. The technique is based on the standard combination of laser scanning and flash digital photography. One of its main advantages is that it does not need complex illumination setups (just a dark room during color acquisition), thus reducing the time required for color acquisition. Further, it drastically reduces post-processing time with respect to current procedures, thanks to the semi-automatic masking of color and geometry and to scalable color-corrected color blending. The produced colored 3D models are the starting point for a large number of applications based on visual communication, including passive approaches (e.g., still images, video, computer animations) and interactive ones (e.g., multimedia books, web distribution, interactive navigation).

In its current implementation, the method assumes that the statue material can be separated from the unwanted support structure by analyzing reflectance and color, and it thus cannot be successfully applied when the statue and the supporting structure are visually indistinguishable. This differentiation is, however, enforced in modern restoration practices. The results presented in this paper rely on the assumption that the statue material is fairly diffuse and homogeneous, a common case for ancient stone artifacts. This is not an intrinsic limitation of the method, which could also be applied, simply by reversing the masks, when the supporting material is the diffuse and homogeneous one. Regions of contact between the exostructure and the imaged object obviously cannot be recovered, since they are invisible to the imaging devices. This problem is common to all use cases involving static supports. Since these parts are small and generally uninteresting, infilling techniques are typically used; the issue is thus orthogonal to this discussion. In the pipeline presented in this work, infilling is performed conservatively, based on smoothness priors. An interesting avenue of future work would be to combine the method with more elaborate geometry and color synthesis techniques.

Our work aimed at improving the effectiveness of the standard 3D scanning pipeline for the case of cluttered 3D models. This pipeline, combining passive and active acquisition methods, is currently among the most commonly applied in the cultural heritage domain, due to its good reliability in a large variety of settings. An important avenue of future work is to evaluate whether alternative image-based approaches based on digital photography can effectively be adapted to the same difficult settings, e.g., by incorporating clutter analysis and flash light modeling in dense reconstruction pipelines.

Acknowledgments. The authors are grateful to Marco Minoja, Elena Romoli, and Alessandro Usai of ArcheoCAOR for supporting this project, and to Daniela Rovina, Alba Canu, Luisanna Usai, and all the personnel of CRCBC for their invaluable help during the scanning campaign.
We also thank Roberto Combet, Alex Tinti, Marcos Balsa, and Antonio Zorcolo of CRS4 Visual Computing for their contribution to the scanning and post-processing tasks, and our in-house native English speaker Luca Pireddu for his help in revising the manuscript.

REFERENCES

ADAN, A. AND HUBER, D. 2011. 3D reconstruction of interior wall surfaces under occlusion and clutter. In Proc. 3DIMPVT. 275–281.
ADOBE SYSTEMS INC. 2002. Adobe Photoshop User Guide.
BALSA RODRIGUEZ, M., AGUS, M., MARTON, F., AND GOBBETTI, E. 2014. HuMoRS: Huge models mobile rendering system. In Proc. ACM Web3D International Symposium. ACM Press, New York, NY, USA.
BERNARDINI, F. AND RUSHMEIER, H. 2002. The 3D model acquisition pipeline. In Computer Graphics Forum. Vol. 21. 149–172.
BETTIO, F., GOBBETTI, E., MERELLA, E., AND PINTUS, R. 2013. Improving the digitization of shape and color of 3D artworks in a cluttered environment. In Proc. Digital Heritage. 23–30.
BORGEAT, L., POIRIER, G., BERALDIN, A., GODIN, G., MASSICOTTE, P., AND PICARD, M. 2009. A framework for the registration of color images with 3D models. In Proc. ICIP. 69–72.
BOYKOV, Y. Y. AND JOLLY, M.-P. 2001. Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images. In Proc. ICCV. Vol. 1. 105–112.
CALAKLI, F. AND TAUBIN, G. 2011. SSD: Smooth signed distance surface reconstruction. In Computer Graphics Forum. Vol. 30. 1993–2002.
CALLIERI, M., CIGNONI, P., CORSINI, M., AND SCOPIGNO, R. 2008. Masked photo blending: Mapping dense photographic data set on high-resolution sampled 3D models. Computers & Graphics 32, 4, 464–473.
CALLIERI, M., DELLEPIANE, M., CIGNONI, P., AND SCOPIGNO, R. 2011. Processing sampled 3D data: reconstruction and visualization technologies. In Digital Imaging for Cultural Heritage Preservation: Analysis, Restoration, and Reconstruction of Ancient Artworks. CRC Press, 69–99.
CAYTON, L. 2012. Accelerating nearest neighbor search on manycore systems. In Proc. IEEE IPDPS. 402–413.
CHUANG, Y.-Y., CURLESS, B., SALESIN, D. H., AND SZELISKI, R. 2001. A Bayesian approach to digital matting. In Proc. CVPR. 264–271.
CORSINI, M., CALLIERI, M., AND CIGNONI, P. 2008. Stereo light probe. Computer Graphics Forum 27, 2, 291–300.
CORSINI, M., DELLEPIANE, M., GANOVELLI, F., GHERARDI, R., FUSIELLO, A., AND SCOPIGNO, R. 2012. Fully automatic registration of image sets on approximate geometry. IJCV, 1–21.
CUCCURU, G., GOBBETTI, E., MARTON, F., PAJAROLA, R., AND PINTUS, R. 2009. Fast low-memory streaming MLS reconstruction of point-sampled surfaces. In Proc. Graphics Interface. 15–22.
DAVIS, J., MARSCHNER, S. R., GARR, M., AND LEVOY, M. 2002. Filling holes in complex surfaces using volumetric diffusion. In Proc. 3DPVT. 428–441.
DEBEVEC, P. 1998. Rendering synthetic objects into real scenes: bridging traditional and image-based graphics with global illumination and high dynamic range photography. In Proc. SIGGRAPH. 189–198.
DEBEVEC, P., HAWKINS, T., TCHOU, C., DUIKER, H.-P., SAROKIN, W., AND SAGAR, M. 2000. Acquiring the reflectance field of a human face. In Proc. SIGGRAPH. 145–156.
DELLEPIANE, M., CALLIERI, M., CORSINI, M., CIGNONI, P., AND SCOPIGNO, R. 2009. Flash lighting space sampling. In Computer Vision/Computer Graphics Collaboration Techniques. 217–229.
DELLEPIANE, M., CALLIERI, M., CORSINI, M., CIGNONI, P., AND SCOPIGNO, R. 2010. Improved color acquisition and mapping on 3D models via flash-based photography. ACM JOCCH 2, 4, Article 9.
FAROUK, M., EL-RIFAI, I., EL-TAYAR, S., EL-SHISHINY, H., HOSNY, M., EL-RAYES, M., GOMES, J., GIORDANO, F., RUSHMEIER, H. E., BERNARDINI, F., ET AL. 2003. Scanning and processing 3D objects for web display. In Proc. 3DIM. 310–317.
FRIEDRICH, D., BRAUERS, J., BELL, A. A., AND AACH, T. 2010. Towards fully automated precise measurement of camera transfer functions. In Proc. IEEE Southwest Symposium on Image Analysis & Interpretation. 149–152.
KASS, M., WITKIN, A., AND TERZOPOULOS, D. 1988. Snakes: Active contour models. IJCV 1, 4, 321–331.
KAZHDAN, M., BOLITHO, M., AND HOPPE, H. 2006. Poisson surface reconstruction. In Proc. SGP. 61–70.
KAZHDAN, M. AND HOPPE, H. 2013. Screened Poisson surface reconstruction. ACM Trans. Graph. 32, 3, Article 29.
KIM, S., LIN, H., LU, Z., SUESSTRUNK, S., LIN, S., AND BROWN, M. 2012. A new in-camera imaging model for color computer vision and its application. IEEE Trans. PAMI 34, 12, 2289–2302.
KOUTSOUDIS, A., VIDMAR, B., IOANNAKIS, G., ARNAOUTOGLOU, F., PAVLIDIS, G., AND CHAMZAS, C. 2014. Multi-image 3D reconstruction data evaluation. Journal of Cultural Heritage 15, 1, 73–79.
LAFARGE, F. AND MALLET, C. 2012. Creating large-scale city models from 3D point clouds: a robust approach with hybrid representation. IJCV 99, 1, 69–85.
LENSCH, H. P. A., KAUTZ, J., GOESELE, M., HEIDRICH, W., AND SEIDEL, H.-P. 2003. Image-based reconstruction of spatial appearance and geometric detail. ACM TOG 22, 2, 234–257.
LEVOY, M., PULLI, K., CURLESS, B., RUSINKIEWICZ, S., KOLLER, D., PEREIRA, L., GINZTON, M., ANDERSON, S., DAVIS, J., GINSBERG, J., ET AL. 2000. The digital Michelangelo project: 3D scanning of large statues. In Proc. SIGGRAPH. 131–144.
MANSON, J., PETROVA, G., AND SCHAEFER, S. 2008. Streaming surface reconstruction using wavelets. In Computer Graphics Forum. Vol. 27. 1411–1420.
MARTON, F., BALSA RODRIGUEZ, M., BETTIO, F., AGUS, M., JASPE VILLANUEVA, A., AND GOBBETTI, E. 2014. IsoCam: Interactive visual exploration of massive cultural heritage models on large projection setups. ACM JOCCH 7, 2, Article 12.
MCGUINNESS, K. AND O'CONNOR, N. E. 2010. A comparative evaluation of interactive segmentation algorithms. Pattern Recognition 43, 2, 434–444.
MORTENSEN, E. N. AND BARRETT, W. A. 1999. Toboggan-based intelligent scissors with a four-parameter edge model. In Proc. CVPR. Vol. 2.
MURA, C., MATTAUSCH, O., JASPE VILLANUEVA, A., GOBBETTI, E., AND PAJAROLA, R. 2013. Robust reconstruction of interior building structures with multiple rooms under clutter and occlusions. In Proc. 13th International Conference on Computer-Aided Design and Computer Graphics. 52–59.
OREN, M. AND NAYAR, S. K. 1994. Generalization of Lambert's reflectance model. In Proc. SIGGRAPH. 239–246.
PINGI, P., FASANO, A., CIGNONI, P., MONTANI, C., AND SCOPIGNO, R. 2005. Exploiting the scanning sequence for automatic registration of large sets of range maps. Computer Graphics Forum 24, 3, 517–526.
PINTUS, R. AND GOBBETTI, E. 2014. A fast and robust framework for semi-automatic and automatic registration of photographs to 3D geometry. ACM JOCCH. To appear.
PINTUS, R., GOBBETTI, E., AND CALLIERI, M. 2011a. Fast low-memory seamless photo blending on massive point clouds using a streaming framework. ACM JOCCH 4, 2, Article 6.
PINTUS, R., GOBBETTI, E., AND CALLIERI, M. 2011b. A streaming framework for seamless detailed photo blending on massive point clouds. In Proc. Eurographics Area Papers. 25–32.
PINTUS, R., GOBBETTI, E., AND COMBET, R. 2011c. Fast and robust semi-automatic registration of photographs to 3D geometry. In Proc. VAST. 9–16.
PULLI, K. 1999. Multiview registration for large data sets. In Proc. 3D Digital Imaging and Modeling. 160–168.
REMONDINO, F. 2011. Heritage recording and 3D modeling with photogrammetry and 3D scanning. Remote Sensing 3, 6, 1104–1138.
ROTHER, C., KOLMOGOROV, V., AND BLAKE, A. 2004. GrabCut: Interactive foreground extraction using iterated graph cuts. In ACM TOG. Vol. 23. 309–314.
RUZON, M. A. AND TOMASI, C. 2000. Alpha estimation in natural images. In Proc. CVPR. 18–25.
SCHEIBLAUER, C. AND WIMMER, M. 2011. Out-of-core selection and editing of huge point clouds. Computers & Graphics 35, 2, 342–351.
SNAVELY, N., SEITZ, S. M., AND SZELISKI, R. 2008. Modeling the world from internet photo collections. IJCV 80, 2, 189–210.
WAND, M., BERNER, A., BOKELOH, M., FLECK, A., HOFFMANN, M., JENKE, P., MAIER, B., STANEKER, D., AND SCHILLING, A. 2007. Interactive editing of large point clouds. In Proc. SPBG. 37–46.
WIMMER, M. AND SCHEIBLAUER, C. 2006. Instant points: Fast rendering of unprocessed point clouds. In Proc. SPBG. 129–137.
WU, C., DENG, J., AND CHEN, F. 2008. Diffusion equations over arbitrary triangulated surfaces for filtering and texture applications. IEEE Trans. Visualization and Computer Graphics 14, 3, 666–679.
XU, N., AHUJA, N., AND BANSAL, R. 2007. Object segmentation using graph cuts based active contours. Computer Vision and Image Understanding 107, 3, 210–224.
ZENG, Y., SAMARAS, D., CHEN, W., AND PENG, Q. 2008. Topology cuts: A novel min-cut/max-flow algorithm for topology preserving segmentation in N-D images. Computer Vision and Image Understanding 112, 1, 81–90.
ZHANG, H., FRITTS, J. E., AND GOLDMAN, S. A. 2008. Image segmentation evaluation: A survey of unsupervised methods. Computer Vision and Image Understanding 110, 2, 260–280.