Visual Privacy Protection Methods: A Survey

José Ramón Padilla-López a,∗, Alexandros Andre Chaaraoui a, Francisco Flórez-Revuelta b

a Department of Computer Technology, University of Alicante, P.O. Box 99, E-03080 Alicante, Spain
b Faculty of Science, Engineering and Computing, Kingston University, Penrhyn Road, KT1 2EE, Kingston upon Thames, United Kingdom

∗Corresponding author. Email addresses: jpadilla@dtic.ua.es (José Ramón Padilla-López), alexandros@dtic.ua.es (Alexandros Andre Chaaraoui), F.Florez@kingston.ac.uk (Francisco Flórez-Revuelta)

Abstract

Recent advances in computer vision technologies have made possible the development of intelligent monitoring systems for video surveillance and ambient-assisted living. By using this technology, these systems are able to automatically interpret visual data from the environment and perform tasks that would have been unthinkable years ago. These achievements represent a radical improvement, but they also pose a new threat to individuals' privacy. The capabilities of such systems allow them to collect and index a huge amount of private information about each individual. Next-generation systems have to solve this issue in order to obtain users' acceptance. Therefore, there is a need for mechanisms or tools to protect and preserve people's privacy. This paper seeks to clarify how privacy can be protected in imagery data; as its main contribution, it provides a comprehensive classification of visual privacy protection methods together with an up-to-date review of them. A survey of existing privacy-aware intelligent monitoring systems and a discussion of important aspects of visual privacy are also provided.

Keywords: visual privacy protection, video surveillance, ambient-assisted living, computer vision, image processing

1. Introduction

The world population is ageing. It is estimated that the population over 50 will rise by 35% between 2005 and 2050, and that the population over 85 will triple by 2050 (EC, 2010). Furthermore, the number of people in long-term care living alone is expected to increase by 74% in Japan, 54% in France and 41% in the US (EC, 2008). This situation will not be sustainable in the near future unless new solutions for the support of older people, which take their needs into account, are developed.

Ambient-assisted living (AAL) aims to provide a solution to this situation. AAL applications use information and communication technologies to support people so as to increase their autonomy and well-being. Video cameras are being used more and more frequently in AAL applications because they provide rich visual information about the environment. Advances over the last decades have contributed to this: computational power has increased while, at the same time, costs have been reduced. Furthermore, advances in computer vision have given video cameras the ability to 'see', turning them into smart cameras (Fleck & Strasser, 2008). This has enabled the development of vision-based intelligent monitoring systems that are able to automatically extract useful information from visual data to analyse actions, activities and behaviours (Chaaraoui et al., 2012b), both of individuals and crowds, and to monitor, record and index video bitstreams (Tian et al., 2008).
By installing networks of cameras in people's homes or in care homes, novel vision-based telecare services are being developed in order to support older and disabled people (Cardinaux et al., 2011). But these new technologies also pose a new threat to individuals' privacy.

Traditionally, cameras are used in public spaces for surveillance services in streets, parking lots, banks, airports, train stations, shopping centres, museums, sports installations and many other places. It is estimated that there is an average of one camera for every 32 citizens in the UK, one of the most camera-covered countries in the world (Gerrard & Thompson, 2011). In short, video cameras are mainly used in outdoor environments and in public places, but they are not commonly used within private environments due to people's concerns about privacy. Generally, the use of video cameras in public places has been tolerated or accepted by citizens, whereas their use in private spaces has been rejected. There may be several reasons for this difference. On the one hand, the perceived public-safety benefits favour the use of cameras in public places for crime prevention, the fight against terrorism and other purposes. On the other hand, there is a widespread belief that, while staying in public environments, people's sensitive information will not be exposed. Finally, certain attitudes have also contributed to the acceptance of their use, for example, the assumption that anyone demanding privacy must have something to hide (Caloyannides, 2003).

In traditional video surveillance systems, cameras are managed by human operators who constantly monitor the screens searching for specific activities or incidents. As estimated by Dee & Velastin (2008), the ratio between human operators and screens is around 16 displays for every operator in four local authority installations within the UK. Although operators can only really watch 1-4 screens at once (Wallace & Diffley, 1988), this does not prevent abuses of these systems by their operators. Furthermore, the processing capacities of next-generation video surveillance systems and the increasing number of closed-circuit television cameras installed in public places are raising concerns about individuals' privacy in public spaces too.

In the near future, it is expected that cameras will surround us in both public and private spaces. Intelligent monitoring systems threaten the individual's right to privacy because of automatic monitoring (Adams & Ferryman, 2013). These systems can retain a variety of information about people's habits, visited places, relationships, and so on (Coudert, 2010). It is known that some systems already use facial recognition technology (Goessl, 2012). In this way, such systems may build a profile of each citizen in which the person's identity and related sensitive information are revealed. Therefore, this evolution of intelligent monitoring systems could be seen as approaching an Orwellian Big Brother, as people may have the feeling of being constantly monitored.

In the light of the above, it is clear that the protection of the individual's privacy is of special interest in telecare applications as well as in video surveillance, regardless of whether they operate in private or public spaces. Therefore, privacy requirements must be considered in intelligent monitoring systems by design (Langheinrich, 2001; Schaar, 2010). As mentioned above, smart cameras are becoming essential for AAL applications.
Given that security and privacy protec- tion have become critical issues for the acceptance of video cameras, a privacy- aware smart camera would make it possible to use video cameras in realms where they have never been used before. If individual’s privacy can be guaran- teed through the use of this technology, public acceptance would be increased giving the opportunity of installing these cameras in private environments to replace simpler binary sensors or, most importantly, to develop new telecare services (Olivieri et al., 2012; Chen et al., 2012; Morris et al., 2013). This breakthrough could open the door to novel privacy-aware applications for am- bient intelligence (AmI) (Augusto et al., 2010), and more specifically in AAL systems for ageing in place (O’Brien & Mac Ruairi, 2009), being beneficial to improve the quality of life and to maintain the independence of people in need of long-term care. 1.1. Related studies The focus of this review is on the protection of visual privacy. There are valuable reviews about AmI and AAL that have already considered privacy in video and have also highlighted its importance for the adoption of video-based AAL applications (Cook et al., 2009; Cardinaux et al., 2011). But these works scarcely go into detail about how visual privacy protection can be achieved. Other works from the video surveillance field have also analysed this topic but from a different point of view (Senior et al., 2003, 2005; Cavoukian, 2013). The main threats and risks of surveillance technologies like closed-circuit television cameras, number plates recognition, geolocation and drones are discussed in depth. As a consequence, some guidelines to manage privacy are also proposed, but how to protect visual privacy is not considered. In the same line, Senior & Pankanti (2011) unify their previous works and extend the review of visual privacy not only to video surveillance but also to medical images, media spaces and institutional databases. They consider some technologies to protect visual privacy and provide a classification. As far as we know, this is the first attempt to provide such a classification of protection methods for visual privacy but it is not a comprehensive one. In a more recent work (Winkler & Rinner, 2014), 3 security and privacy in visual sensor networks are reviewed. Although they per- form a detailed analysis of the security from several points of view (data-centric, node-centric, network-centric and user-centric), they do not provide an in-depth analysis of privacy protection. In this survey, we focus on giving an answer to the question of how the vi- sual privacy can be protected, and how such a kind of protection is developed by some of the existing privacy-aware intelligent monitoring systems that have been found in the literature. Because of this, a comprehensive classification of visual privacy protection methods is provided as the main contribution of this work. The remainder of this paper is organised as follows: Section 2 gives an intuitive notion of what visual privacy protection is. A comprehensive review of visual privacy protection methods is presented in section 3. In section 4, relevant privacy-aware intelligent monitoring systems are introduced. A discus- sion of important privacy-related aspects is carried out in section 5. Finally, a summary of the present work as well as future research directions are given in section 6. 2. 
Visual privacy protection

Privacy protection consists of preventing information that an individual wants to keep private from becoming publicly available. In the context of images and videos, we refer to it as visual privacy protection. In this paper, the terms visual privacy and privacy will be used interchangeably, except where indicated.

First of all, it is worth clarifying when an individual's privacy needs to be protected. When protecting privacy, a distinction can be made between a person's identity and the sensitive information that has to be kept private. Video can convey an enormous amount of information that can be regarded as sensitive. Nevertheless, if sensitive information is present in a video but the person's identity is not, there is no privacy loss. The same is true if the person's identity appears in a video without any sensitive information. In both cases, privacy is protected because no association or mapping exists between sensitive information and the person's identity.

Another important issue related to visual privacy is which sensitive information or region of interest has to be protected. In many works only the face is obscured, but that is not enough to protect visual privacy. Even when the person's face is obscured, other elements may exist in the image through which the person can be identified, for instance, using inference channels and prior knowledge (Saini et al., 2014). Visual cues like clothes, height, gait, and the like can be used to identify the person. For instance, in a pairwise-constraint identification test (Chang et al., 2006; Chen et al., 2009) where faces had been masked, observers were able to identify whether a person in one image was the same as the person in a different image. In that test, the recognition hit rate was higher than 80%. By using this information and only detecting one image where a privacy breach exists, the person may be identified and tracked in images where privacy was presumably preserved. These visual cues must be considered in order to protect privacy, as they affect the choice of which regions of interest have to be protected. So, there is actually not just one region of interest but several. In some cases a region of interest should be extended to a wider area, while in others two or more regions of interest should be created.

3. Protection methods

There are different ways to protect and preserve the privacy of individuals appearing in videos and images (see Table 1). Two approaches can be found if we consider the temporal aspect of when a protection method is used, i.e. before or after the image is acquired. On the one hand, it is possible to prevent others from successfully capturing images in which an individual appears. On the other hand, once an image exists, sensitive or private information (e.g. faces, number plates, and so on) can be removed (Figure 1). According to how privacy is protected, protection methods can be classified into five large categories: intervention (section 3.1), blind vision (section 3.2), secure processing (section 3.3), redaction (section 3.4) and data hiding (section 3.5). Although the latter is often used together with redaction methods, it has been added as a separate category since it is a large field of research in itself. In this section, a review of commonly used visual privacy protection methods is presented.

Table 1: Overview of reviewed papers according to the visual privacy protection method used. Category Count References Intervention 4 Wagstaff (2004), Patel et al.
(2009), Harvey (2010), Mitskog & Ralston (2012) Blind vision 5 Avidan & Butman (2006), Erkin et al. (2009), Avidan et al. (2009), Sadeghi et al. (2010), Shashanka (2010) Secure processing 8 Erturk (2007), Shashank et al. (2008), Park & Kautz (2008), Ito & Kiya (2009), Upmanyu et al. (2009), Ng et al. (2010), Chaaraoui et al. (2012a), Zhang et al. (2012) Redaction: Image filter 10 Hudson & Smith (1996), Zhao & Stasko (1998), Boyle et al. (2000), Neustaedter & Greenberg (2003), Kitahara et al. (2004), Mart́ınez-Ponte et al. (2005), Neustaedter et al. (2006), Zhang et al. (2006), Frome et al. (2009), Agrawal & Narayanan (2011) Continued on next page 5 Table 1 – Continued from previous page Category Count References Redaction: Encryption 27 Spanos & Maples (1995), Macq & Quisquater (1995), Agi & Gong (1996), Tang (1996), Qiao et al. (1997), Zeng & Lei (2003), Yang et al. (2004), Boult (2005), Yabuta et al. (2005), Yabuta et al. (2006), Dufaux & Ebrahimi (2006), Dufaux et al. (2006), Chattopadhyay & Boult (2007), Baaziz et al. (2007), Raju et al. (2008), Dufaux & Ebrahimi (2008a), Dufaux & Ebrahimi (2008b), Xiangdong et al. (2008), Carrillo et al. (2010), Dufaux & Ebrahimi (2010), Tong et al. (2010), Tong et al. (2011), Dufaux (2011), Sohn et al. (2011), Pande & Zambreno (2013), Li et al. (2013), Ra et al. (2013) Redaction: K-same family 7 Newton et al. (2005), Gross et al. (2006a), Gross et al. (2006b), Bitouk et al. (2008), Gross et al. (2009), De la Hunty et al. (2010), Lin et al. (2012) Redaction: Object / people removal 22 Kokaram et al. (1995), Igehy & Pereira (1997), Masnou & Morel (1998), Morse & Schwartzwald (1998), Efros & Leung (1999), Bertalmio et al. (2000), Efros & Freeman (2001), Criminisi et al. (2003), Criminisi et al. (2004), Zhang et al. (2005b), Chan et al. (2006), Cheung et al. (2006), Shiratori et al. (2006), Wexler et al. (2007), Patwardhan et al. (2007), Whyte et al. (2009), Vijay Venkatesh et al. (2009), Koochari & Soryani (2010), He et al. (2011), Ma & Ma (2011), Ghanbari & Soryani (2011), Abraham et al. (2012) Redaction: Visual abstraction 14 Hodgins et al. (1998), Lyons et al. (1998), Tansuriyavong & Hanaki (2001), Fan et al. (2005), Williams et al. (2006), Chen et al. (2007), Baran & Popović (2007), Hogue et al. (2007), Chinomi et al. (2008), Chen et al. (2009), Qureshi (2009), Sadimon et al. (2010), Borosán et al. (2012), Chen et al. (2014) Data hiding 12 Petitcolas et al. (1999), Wu (2001), Cox et al. (2002), Yabuta et al. (2005), Zhang et al. (2005a), Ni et al. (2006), Yu & Babaguchi (2007), Paruchuri & Cheung (2008), Cheung et al. (2008), Cheung et al. (2009), Paruchuri et al. (2009), Guangzhen et al. (2010) 3.1. Intervention Intervention methods deal with the problem of preventing someone to cap- ture private visual data from the environment. They aim to create capture- resistant spaces. These methods physically intervene camera devices to prevent the acquisition of an image by means of a specialised device that interferes with the camera optical lens. For instance, Patel et al. (2009) developed the BlindSpot system. It locates any number of retro-reflective CCD or CMOS cam- era lenses around a protected area and, then, it directs a pulsing light at the detected lens, spoiling any images that cameras may record as Figure 2 shows. 
Figure 1: Some privacy-protected sequences of images: the first column shows the sensitive information or region of interest; the second column shows the region of interest replaced by a silhouette; the third column shows the sensitive information scrambled; and the last column shows the sensitive areas inpainted. Reprinted from Paruchuri et al. (2009).

Similarly, Harvey (2010) proposed an anti-paparazzi device. It uses an array of high-power LEDs to produce a stream of light at over 12,000 lumens. Mitskog & Ralston (2012) have patented a camera blocker for devices with an integrated camera that uses a thin-film organic polymer to neutralise the camera lens. This camera blocker is reusable and adhesively sticks to any surface without leaving a sticky residue.

Figure 2: Light beam neutralising a camera lens. A light beam of a single colour is used in the left picture, whereas a light beam generated using colour patterns is used in the right picture. Reprinted from Patel et al. (2009).

The aforementioned approaches are suitable when, once images are recorded, no control can be enforced over their use. However, when enforcement is possible, software-based methods can be used. For instance, the firmware of a camera or an application installed on a smartphone could be responsible for preventing the capture of certain environments, such as artworks in a museum, when the device comes into the range of a Bluetooth transmitter (Wagstaff, 2004). Nevertheless, software-based intervention has some drawbacks. First of all, it can easily be thwarted by using a camera without any privacy software installed. Furthermore, users' consent and collaboration are required in order for it to work successfully. Because systems like these depend on third parties, they are likely doomed to failure. New privacy legislation requiring cameras to comply with privacy protocols would be needed; in any case, given that no system is completely secure by nature, even under this assumption it might be hacked.

3.2. Blind vision

Blind vision deals with processing images or videos in an anonymous way. It addresses privacy-related processing issues by means of secure multi-party computation (SMC) techniques applied to vision algorithms (Avidan & Butman, 2006). SMC is a subfield of cryptography that enables multiple parties to jointly compute a function in such a way that their inputs and the function itself are not revealed. When applied to vision, this means that one party, Bob, could use a vision algorithm of another party, Alice, without Bob learning anything about Alice's algorithm and, to the same extent, without Alice learning anything about Bob's images. These methods are useful for using third-party algorithms to process data in a privacy-respectful way. For instance, using blind vision techniques, common tasks such as face detection, image matching, object tracking or image segmentation could be carried out anonymously.

In this regard, Avidan & Butman (2006) developed a secure classifier that was used in a blind face detector. This classifier is based on an oblivious transfer method and a secure dot-product protocol in order to compute the face detection algorithm. However, it is very time-consuming due to its high computational load. Nevertheless, they also consider accelerating the algorithm by using histograms of oriented gradients, at the risk of revealing some information about the original image. Similarly, Erkin et al.
(2009) proposed an efficient algorithm that allows to jointly run the Eigenfaces (Turk & Pentland, 1991) recognition algorithm. This algorithm operates over homomorphically encrypted data. It allows matching an encrypted facial image with a database of facial templates in such a way that the biometric itself and the detection result are kept in secret from the server that performs the matching. Sadeghi et al. (2010) improved the communication and computation efficiency of the previous algorithm. Blind vision algorithms are also used for image matching. Avidan et al. (2009) presented a protocol for image matching related algorithms. Image matching is described as a generalisation of detection and recognition tasks. The described algorithm is built upon a secure fuzzy matching of SIFT at- tributes, which are treated as 16 character text strings, and it is used in their bag-of-features approach. A framework for privacy-preserving Gaussian Mixture Model (GMM) com- putations is proposed by Shashanka (2010). GMMs are commonly used in machine learning for clustering and classification. In computer vision they are mainly used in background subtraction (Zivkovic, 2004) for motion anal- ysis (Chan & Liu, 2009). 3.3. Secure processing There are other methods that are not based on SMC but that process visual information in a privacy respectful way. They are referred as secure processing in this work. For example, an image matching algorithm for private content- based image retrieval (PCBIR) is proposed by Shashank et al. (2008). PCBIR is related with similarity search. Conversely to blind vision, privacy is required in one direction because the whole database is usually public, while the query is kept private. Another possibility is to work with images in a domain that does not reveal visual information. Ito & Kiya (2009) presented an image matching algorithm using phase-only correlation (POC) in the frequency domain. This algorithm preserves the visual privacy of the images in a template database. In order to achieve this, all the images of the template database are converted to the frequency domain. Then, a phase scrambling using an one-time key is applied to the discrete Fourier transformation (DFT) coefficients. Afterwards, in order to match a query image with an image of the templates database, the query image is converted to the frequency domain and the matching is done with POC using the DFT coefficients as inputs. Algorithms that preserve privacy in an implicit manner, for instance, re- jecting visual information that is not necessary for the algorithm to work are 9 also considered under the umbrella of secure processing. Ng et al. (2010) pro- posed a privacy-preserving stereoscopic vision algorithm. It preserves privacy by calculating the disparity map using One-Bit Transform (Erturk, 2007). This way, each pixel in the input images is reduced to one bit. In this process, a huge amount of information (colour, texture and so on) is removed in the out- put images, complicating thus the identification of persons appearing on the images. A prototype of a privacy-preserving system for recognition of activities of daily living is proposed by Park & Kautz (2008). This prototype relies on the silhouette of detected foreground objects and the motion-map of the frame in order to analyse the activities. 
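To make this idea concrete, the following minimal sketch shows how such a binary silhouette and motion map might be obtained in-camera with standard background subtraction and frame differencing. The OpenCV calls and parameter values are illustrative assumptions and do not correspond to the actual implementation of Park & Kautz (2008).

```python
import cv2
import numpy as np

def silhouette_and_motion_stream(source=0):
    """Yield (silhouette, motion_map) pairs; the raw RGB frames never leave this function."""
    capture = cv2.VideoCapture(source)
    # Gaussian-mixture background subtractor; history and shadow settings are example values.
    bg_subtractor = cv2.createBackgroundSubtractorMOG2(history=200, detectShadows=False)
    previous_gray = None
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Binary silhouette of foreground objects: shape is kept, colour and texture are discarded.
        silhouette = bg_subtractor.apply(frame)
        silhouette = cv2.medianBlur(silhouette, 5)
        _, silhouette = cv2.threshold(silhouette, 127, 255, cv2.THRESH_BINARY)

        # Motion map: absolute difference between consecutive grey-level frames.
        motion_map = np.zeros_like(gray) if previous_gray is None else cv2.absdiff(gray, previous_gray)
        previous_gray = gray

        yield silhouette, motion_map
    capture.release()
```

Only the silhouette and the motion map are exposed to downstream activity analysis, which never sees colour or texture.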
This way, if the silhouette and motion-map gen- eration is performed within the camera and it is not possible to access the RGB signal, the visual privacy of the persons inhabiting the environment would be ensured from the algorithm point of view. Similarly, Chaaraoui et al. (2012a) use silhouettes in their efficient approach for multi-view human action recogni- tion based on bag of key poses. By using silhouettes, identifiable information is removed, hence it can also be considered a privacy-aware method. Depth infor- mation from RGB-D cameras can be used as a way to preserve privacy (Zhang et al., 2012) as well. Depth data can be obtained from low-cost structured-light cameras like Microsoft Kinect or Asus Xtion, or time-of-flight cameras like Mi- crosoft Kinect 2. Given that no colour information is involved, a depth map visualisation does not enable face recognition and prevents direct extraction of visual clues for person identification. Upmanyu et al. (2009) presented an interesting approach where an efficient framework to carry out privacy-preserving surveillance is proposed. This so- lution is inspired in a secret sharing scheme adapted to image data. In this framework an image is split into a set of random images in such a way that the shares do not contain any meaningful information about the original image. De- spite of shares being distributed between a certain number of servers, computer vision algorithms can be applied securely and efficiently. 3.4. Redaction Image and video modification or redaction methods are the most common vi- sual privacy protection methods. They modify the sensitive regions of an image such as faces, bodies, number plates, etc. to conceal private information con- cerning the subjects appearing on it. In order to determine the privacy sensitive regions in which a redaction method operates, computer vision algorithms are used. However, this section focuses only on the application of privacy-preserving methods, therefore it is assumed that sensitive regions are correctly detected. According to the way in which an image is modified, redaction methods can be classified in several categories. To begin with, there are ad-hoc distor- tion/suppression methods (section 3.4.1). These methods modify the region of interest of an image, either completely removing sensitive information from the image, or modifying the information using common image filters like blurring or pixelating. By using these filters, sensitive regions are modified in order to make them unrecognisable. 10 More robust methods like image encryption (section 3.4.2) are also used to conceal the region of interest. Using image encryption techniques the privacy sensitive region of an image is ciphered by a key. Generally encryption tech- niques based on scrambling (permutation) were commonly used in analogue video, but they can also be used in digital video. Besides of that, it must be taken into account that in the literature scrambling is used as a synonym of encryption in some cases. In this paper, we consider scrambling techniques as a subfield of image encryption. Another approach to image redaction is face de-identification (section 3.4.3). These methods are focused on de-identifying the faces appearing in an image. Although the aforementioned methods can be used for face de-identification, we focus here on those based on the k-same family of algorithms that implements the k-anonymity protection model (Sweeney, 2002). 
These algorithms alter the face of a person in such a way that the identity cannot be recognised but facial expressions are preserved. Finally, some approaches like object removal (section 3.4.4) use inpainting-based algorithms to completely remove sensitive regions of an image by filling the resulting gap with the corresponding background. Once the image has been inpainted, a visual abstraction of the removed sensitive information can be rendered, like a stick figure, a point, a silhouette, and the like (Chinomi et al., 2008).

Regarding image modification, some issues have to be outlined. Redaction methods cannot modify an image in an arbitrary way. As reported by Hudson & Smith (1996), there is a trade-off between providing privacy and intelligibility in images. For instance, when an image is modified, information needed for image understanding may also be removed. So, a modified image could lack utility, and a balance between privacy and intelligibility is needed. In other words, privacy-protected images have to retain the useful information needed by the applications built upon them, such as telecare monitoring systems or video surveillance. It is also worth mentioning that whereas some redaction methods are irreversible, i.e. the modification cannot be undone, other methods are reversible, so the original version can be recovered if needed. Furthermore, reversible methods can be developed by using irreversible redaction methods along with data hiding algorithms.

An overview of redaction methods has been given so far. In the next subsections, each of the described categories will be dealt with in depth, focusing on the most representative works and summarising how the presented methods are used.

3.4.1. Image filtering

Redaction methods based on image filtering use common image filters to apply various effects on images in order to modify privacy-sensitive regions (Figure 3). Depending on the application, image filters can be used for obscuring human faces, human bodies, number plates or even the background in video conferences (Kitahara et al., 2004; Martínez-Ponte et al., 2005; Zhang et al., 2006; Frome et al., 2009).

Figure 3: Several examples of image filtering methods: a) the original image without any modification, b) the image modified by applying a Gaussian blur filter, and c) the image modified by applying a pixelating filter.

Among the most commonly used filters, blurring (Neustaedter & Greenberg, 2003; Zhang et al., 2006; Neustaedter et al., 2006; Frome et al., 2009) and pixelating (Boyle et al., 2000; Kitahara et al., 2004; Neustaedter et al., 2006) stand out. A blurring filter applies a Gaussian function over an image. This function modifies each pixel of the image using its neighbouring pixels. As a result, a blurred image is obtained in which the details of sensitive regions have been removed. Blurring is widely used in Google Street View (Frome et al., 2009) to obscure human faces and number plates. A pixelating filter divides an image into a grid of eight-pixel-wide by eight-pixel-high blocks. The average colour of the pixels of each block is computed and the resulting colour is assigned to all of the pixels belonging to that block. As a result, an image is obtained in which the resolution of the sensitive regions has been reduced. Pixelating is commonly used in television to preserve the anonymity of suspects, witnesses or bystanders.
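To make both filters concrete, the sketch below obscures a given sensitive region either with a Gaussian blur or with block-average pixelation. The region coordinates, kernel size and block size are illustrative values and would normally come from a face or person detector.

```python
import cv2

def blur_region(image, x, y, w, h, kernel=(31, 31)):
    """Obscure a sensitive region with a Gaussian blur (kernel size is an example value)."""
    image[y:y + h, x:x + w] = cv2.GaussianBlur(image[y:y + h, x:x + w], kernel, 0)
    return image

def pixelate_region(image, x, y, w, h, block=8):
    """Obscure a sensitive region by replacing each block x block tile with its average colour."""
    roi = image[y:y + h, x:x + w]
    # Downscale so that each output pixel is the area average of one block,
    # then upscale back with nearest-neighbour interpolation.
    small = cv2.resize(roi, (max(1, w // block), max(1, h // block)), interpolation=cv2.INTER_AREA)
    image[y:y + h, x:x + w] = cv2.resize(small, (w, h), interpolation=cv2.INTER_NEAREST)
    return image
```

Both operations are cheap enough to be applied per frame to every detected sensitive region.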
However, pixelation is vulnerable to certain attacks, such as integrating pixels along trajectories over time, which may allow partial recovery of the obscured information. Lander et al. (2001) evaluated the effectiveness of pixelating and blurring in videos and static images. Results indicated that participants were still able to recognise some of the faces. Similarly, Newton et al. (2005) showed that pixelating and blurring-like filters do not thwart recognition software. By training a 'parrot' recogniser that applies the same distortion to the gallery images as to the probe, high recognition rates (near 100%) are obtained, even though the images look somewhat de-identified to humans.

3.4.2. Encryption of videos and images

Image and video encryption methods encode imagery data in such a way that the original data becomes unintelligible, as can be observed in Figure 4. The main goal of these methods is reliable security in storage and secure transmission of content over the network. Usually, security in cryptographic algorithms resides in the strength of the key used rather than in keeping the algorithm private.

Figure 4: Two examples of an encrypted image where the face of the person is considered the sensitive region. Reprinted from Boult (2005).

By using encryption, a distorted video is obtained that unauthorised viewers cannot visualise. Only users who have the proper decryption key can visualise it. Through the inverse operation and the key used, the ciphered data can be deciphered in order to retrieve the original images. This is known as conditional access through encryption. Encryption methods operate over the whole frame or over a delimited region of all the video frames. Although such methods do not provide a balance between privacy and intelligibility, they make it possible to perform data analysis over unprotected data once authorisation and the required permissions have been granted. In such cases, privacy would be protected until access to raw data is eventually requested.

Generally, naïve video encryption algorithms have treated the compressed video bitstream as text data, therefore encrypting the entire bitstream. Hence, commonly used encryption algorithms such as the Data Encryption Standard (DES), Rivest's Cipher (RC5), the Advanced Encryption Standard (AES), Rivest-Shamir-Adleman (RSA) and so on have been used. These algorithms guarantee the highest security level but, unfortunately, they are not suitable for real-time video encryption because they are very time-consuming (Yang et al., 2004; Pande & Zambreno, 2013). Due to this, selective encryption algorithms have been proposed (Spanos & Maples, 1995). These algorithms keep using text-based encryption but encrypt only a selected part of the video bitstream so as to achieve real-time encryption. Other encryption algorithms have also been proposed for real-time encryption, namely light-weight encryption algorithms (Zeng & Lei, 2003). These algorithms are suitable for real-time applications because, when encrypting, they use a simple XOR cipher or only encrypt some bits of the video bitstream. Thereby, they are much faster than the former. Finally, there are methods based on scrambling (Tang, 1996). Traditional scrambling methods modify an analogue video signal, like that produced by closed-circuit television cameras, to make it unintelligible. However, with the proliferation of digital video cameras, scrambling techniques are also applied to digital videos in the field of video encryption.
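As a toy illustration of the light-weight, key-driven approach, the sketch below XORs a sensitive region with a keystream derived from a secret key; applying the same operation with the same key restores the region. This is only a didactic example: NumPy's pseudo-random generator is not a cryptographic cipher, and real systems operate on transform coefficients inside the codec rather than on raw pixels.

```python
import numpy as np

def xor_region(image, x, y, w, h, key):
    """Toy light-weight cipher: XOR a sensitive region with a key-derived keystream.
    XOR is its own inverse, so calling this again with the same key descrambles it.
    Assumes an 8-bit, 3-channel image and an integer key; not cryptographically secure."""
    keystream = np.random.default_rng(key).integers(0, 256, size=(h, w, 3), dtype=np.uint8)
    image[y:y + h, x:x + w] ^= keystream
    return image
```

Scrambling approaches follow the same conditional-access idea but, instead of XORing values, permute data under the control of a key, as discussed next.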
Mainly, scrambling algorithms are based on permutation only methods in which transformed coefficients are then permuted in order to distort the resulting image. Each one of these methods can operate in a specific domain, like the spatial domain, frequency domain (transform domain) or code-stream domain (com- pressed video). Furthermore, it is important to note that light-weight en- cryption and scrambling-based methods are less secure than näıve encryption. For instance, scrambling video in the spatial domain is subject to efficient at- tacks (Macq & Quisquater, 1995). Generally these algorithms trade-off security for encryption speed. However, compared to blurring and pixelating, scram- bling approaches as in Dufaux & Ebrahimi (2008a) are successful at hiding identity (Dufaux & Ebrahimi, 2010; Dufaux, 2011). Regarding when encryption is performed, there are several approaches: prior to encoding, after encoding or during encoding. Each approach has advantages and disadvantages. Prior-to-encoding encryption is a very simple method that works with the original image independently from the used encoding scheme. However, it significantly changes the statistics property of the video signal, resulting in a less efficient compression later. Regarding after-encoding encryp- tion, the compressed code-stream is encrypted after video encoding. The result- ing encrypted and compressed code-stream could hardly be reproducible in a standard player, and it could even cause the player to crash. However, it avoids to fully decode and re-encode the video. Finally, during-encoding encryption has the advantage of a fine-grained control over the encoding process but it is closely linked to the used video encoding scheme. Next, some of the selective and light-weight encryption as well as scrambling methods found in the literature will be analysed. Concerning selective encryption, AEGIS (Spanos & Maples, 1995) is an algo- rithm that uses DES to encrypt only the I-frames of the MPEG video bit- stream. By ciphering I-frames, the needed B-frames and P-frames cannot be reconstructed either. However, the algorithm is not completely secure due to partial information leakage from the I-blocks in P and B frames (Agi & Gong, 1996). Similarly, the video encryption algorithm proposed by Qiao et al. (1997) works with I-frames. It divides them in two halves that are XORed and stored in one half. One of the half is encrypted using DES algorithm. Although this algorithm is secure, it is not suitable for real-time applications. Raju et al. (2008) analyse the distribution of the DCT coefficients (DC and AC) of com- pressed MPEG videos in order to develop a computationally efficient and secure encryption scheme for real-time applications. DC and AC coefficients are man- aged differently regarding their visual influence, and electronic code block and cipher block chaining modes are interleaved to adapt the encryption process to the video data. The described scheme uses RC5 for encrypting DCT coefficients. Boult (2005) used DES and AES to encrypt faces in JPEG images during com- pression in their privacy approach through invertible cryptographic obfuscation. The information required for the decrypting process is stored inside the JPEG 14 file header. Although this information can be publicly read, it cannot be used without the private key. Chattopadhyay & Boult (2007) used this technique for real-time encryption, using uCLinux on the Blackfin DSP architecture. Simi- larly, an encryption scheme for JPEG images is used by Ra et al. 
(2013). The JPEG image is divided into two parts, one public and one private. The first one is unaltered, whereas in the second one the most significant DC coefficients are encrypted during the encoding process after the quantisation step. This ap- proach is designed to obtain JPEG-compliant images that can be sent to photo sharing services under storage and bandwidth constraints. As for light-weight video encryption, Zeng & Lei (2003) presented an efficient algorithm for H263 that operates in the frequency domain. Bit scrambling is used to transform coefficients and motion vectors during video encoding without affecting the compression efficiency. By using this method each frame of the re- sulting video is completely distorted. A cryptographic key is used to control the scrambling process, thereby authorised users will be able to undo the scrambling using the key. A similar video encryption algorithm for MPEG-4 is proposed by Dufaux & Ebrahimi (2006) where security is provided by pseudo-randomly inverting the sign of selected transform coefficients (frequency domain) corre- sponding to the regions of interest. The encryption process depends on a pri- vate key that is RSA encrypted and inserted in the stream as metadata. In this method the amount of distortion introduced can be adjusted from merely fuzzy to completely noisy. This method is deeply explained for the case of Motion JPEG 2000 in (Dufaux et al., 2006), and for the case of H264/AVC in (Dufaux & Ebrahimi, 2008a). Concerning the latter, Tong et al. (2010) made a proposal to correct the drift error produced in H264/AVC during video encryption. The described method is also used by Baaziz et al. (2007) in an automated video surveillance system. Dufaux & Ebrahimi (2008b) presented an extension of their previous work based on code-stream domain scrambling. The scrambling is per- formed after the MPEG-4 encoding process directly on the resulting code-stream output by the camera. This way, it avoids to decode and re-encode the video, saving computational complexity. An encryption scheme for JPEG XR (Srini- vasan et al., 2007) working in the frequency domain is proposed by Sohn et al. (2011). It uses subband-adaptive scrambling for protecting face regions. Con- cretely, different scrambling techniques are used for each subband: random level shift for DC subbands, random permutation for LP subbands, and random sign inversion for HP subbands. A different encryption scheme based on compressive sensing (CS) (Candes et al., 2006; Donoho, 2006) and chaos theory is proposed by Tong et al. (2011). This method scrambles sensitive regions of a video by using block-based CS sampling on their transform coefficients during encoding. The scrambling process is controlled by a chaotic sequence used to form the CS measurement matrix. In order to prevent drift error they also use their previous method. Finally, regarding scrambling, Tang (1996) proposed an encryption algo- rithm that works in the frequency domain and uses the permutation of the DCT coefficients in order to replace the zig-zag order. Specifically, instead of using the zig-zag order to map the 8x8 block to a 1x64 vector, a random per- 15 mutation list is used. A different approach is proposed by Yabuta et al. (2005) where scrambling is performed in the space domain before encoding. The scram- bling only affects the moving regions and is performed by randomly permuting pixels. 
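A keyed pixel permutation of the kind just described can be sketched as follows; the use of NumPy's seeded generator is an illustrative assumption, not the actual scheme of Yabuta et al. Whoever holds the key can regenerate the permutation and invert it.

```python
import numpy as np

def scramble_region(region, key):
    """Scramble a region by permuting its pixels with a key-driven random permutation."""
    flat = region.reshape(-1, region.shape[-1])
    permutation = np.random.default_rng(key).permutation(len(flat))
    return flat[permutation].reshape(region.shape)

def descramble_region(scrambled, key):
    """Regenerate the same permutation from the key and invert it to recover the region."""
    flat = scrambled.reshape(-1, scrambled.shape[-1])
    permutation = np.random.default_rng(key).permutation(len(flat))
    restored = np.empty_like(flat)
    restored[permutation] = flat
    return restored.reshape(scrambled.shape)
```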
Before being scrambled, the original moving regions are encrypted with AES, and embedded inside the masked image using digital watermarking (Pe- titcolas et al., 1999). Thereby, the scrambled regions can be recovered later if needed. An improved version of this method is presented in (Yabuta et al., 2006) being able of real-time encoding, and decoding only one specific object by exploiting object tracking information. Xiangdong et al. (2008) presented a novel encryption scheme that relies on chaos theory. It works in the space domain and prior to video encoding. It permutes the pixels of an image row by row by using a chaotic sequence of sorted real numbers as a key. This key can be used to de-scramble the image when necessary. A similar approach that sorts the chaotic sequence as Vigenère cipher is proposed by Li et al. (2013). Carrillo et al. (2010) proposed an encryption algorithm working in the spatial domain. Before encoding, pseudo-random permutations are applied to the pix- els. A secret pass phrase which controls the permutation process is used as key. This algorithm is independent of the used compression algorithm and robust to transcoding at the cost of an increase in bitrate depending on the percentage of encrypted blocks. 3.4.3. Face de-identification Face de-identification consists in the alteration of faces to conceal person identities. The goal is to alter a face region in such a way that it cannot be recognised using face recognition software. In order to achieve this, the faces appearing in images are modified by using some of the aforementioned methods such as image filtering previously discussed in section 3.4.1. Nevertheless, there are cases in which privacy must be preserved while images must still keep their capacity of being analysed. In such cases, it is then necessary to balance privacy and intelligibility. Concerning this, there are methods in the literature that consider this trade-off, and some of them will be reviewed in this section. The k-Same family of algorithms (Figure 5) is one of the most recent and commonly used algorithms for face de-identification. K-Same was first intro- duced by Newton et al. (2005). Intuitively, k-Same performs de-identification by computing the average of k face images in a face set. Then, all of the im- ages of the given cluster are replaced by the obtained average face. Using this algorithm, a de-identified face is representative of k members of the original used face set. This way, if the probability of a de-identified face of being cor- rectly recognised by a face recognition software is no more than 1 k , it is said that this algorithm provides k-anonymity privacy protection (Sweeney, 2002). However, despite the fact that this formal model can preserve privacy, there are no guarantees of the utility of the data. Gross et al. proposed some extensions to the k-Same. On the one hand, an algorithm named k-Same-Select that extends the k-Same algorithm is presented in (Gross et al., 2006a). It guarantees the utility of the data, for example, by preserving facial expressions or gender in face images. This algorithm di- 16 Figure 5: Several examples of de-identified face images: a) faces de-identified by using the k-Same algorithm where some ghosting artefacts appear due to misalignments in the face set. b) faces de-identified by using k-Same-M algorithm. Reprinted from Gross et al. (2009). 
vides the face image set into mutually exclusive subsets using a data utility function and, then, it uses the k-Same algorithm on the different subsets. On the other hand, the k-Same-M algorithm is introduced in (Gross et al., 2006b) to fix a shortcoming of the k-Same-Select algorithm. Appearance-based algo- rithms, like k-Same-Select, work directly in the pixel level producing sometimes alignment mismatch that leads to undesirable artefacts in the de-identified face images. In order to overcome this issue k-Same-M relies on active appearance model (Cootes et al., 1998, 2001), a statistical and photo-realistic model of the 17 shape and texture of faces. This model is also used by De la Hunty et al. (2010) for real-time facial expression transferring from one’s person face to another. Despite this method is not used for face de-identification, it could be useful to transfer expressions from an already de-identified face to another one. The k-Same-based algorithms have some constraints. For instance, if a sub- ject is represented more than once in the dataset then k-Same does not provide k-anonymity privacy protection. In order to address this shortcoming, Gross et al. (2009) proposed a multi-factor model which unifies linear, bilinear and quadratic models. By using a generative multi-factor model, a face image is fac- torised into identity and non-identity factors. Afterwards, the de-identification algorithm is applied and the de-identified face is reconstructed using the multi- factor model. A different approach is described by Bitouk et al. (2008), where faces are automatically replaced in photographs. A large library of 2D faces downloaded from the Internet is built according to appearance and pose. In order to replace a detected face in an image, a similar candidate to the input is selected from the library. Lin et al. (2012) presented a similar work for face swapping where a personalised 3D head model from a frontal face is built, thereby any pose can be rendered by directly matching with the face that wants to be substituted. 3.4.4. Object / people removal Object and people removal deals with concealing persons or objects appear- ing in an image or a video in such a way that there are not trails of them in the resulting modified version (Figure 6). For the sake of clarity, the term ‘object’ will be used to refer both person and object indistinctly. When removing an object from an image a gap is left. This gap has to be filled in order to create a seamless image. Inpainting methods are used to repair regions with errors or damages. Inpainting consists in reconstructing missing parts in such a way that the modification is undetectable. Information from surrounding area is used to fill in the missing areas. Therefore, these methods can be used for visual privacy to remove people that do not commit suspicious activities from video surveillance recordings, removing inactive participants from the video stream of a video conference, concealing people from images in online social networks, and so many other applications. However, due to computational restrictions, inpainting methods are rarely used in real-time applications running on com- modity hardware (Granados et al., 2012). Inpainting can be divided into two large groups: image inpainting (Bertalmio et al., 2000) and video inpainting (Kokaram et al., 1995; Abraham et al., 2012). This distinction is mainly due to the content nature. 
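As a concrete, if simplified, illustration of inpainting-based removal, the sketch below fills a person mask using OpenCV's built-in inpainting. Note that cv2.inpaint implements diffusion-style algorithms (Telea and Navier-Stokes), so it is a stand-in for the exemplar-based methods discussed below rather than an implementation of them, and the mask is assumed to come from a person detector or manual selection.

```python
import cv2

def remove_person(image, person_mask, radius=3):
    """Fill the masked region from its surroundings so the person leaves no visible trace.
    person_mask: 8-bit single-channel image, non-zero over the region to remove."""
    return cv2.inpaint(image, person_mask, radius, cv2.INPAINT_TELEA)
```

For large or highly textured gaps, and for video, the methods reviewed below are required.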
While in an image is only needed to ensure spatial consistencies, in video, temporal consistencies between all of the frames have to be ensured as well. Regarding image inpainting, there are several approaches: texture synthe- sis (Igehy & Pereira, 1997; Efros & Leung, 1999; Efros & Freeman, 2001), partial differential equations (PDE) inspired algorithms (Masnou & Morel, 1998; Morse & Schwartzwald, 1998; Bertalmio et al., 2000; Chan et al., 2006), and exemplar based (Criminisi et al., 2003, 2004; Whyte et al., 2009; Koochari & Soryani, 18 (a) Real image (b) Modified image Figure 6: An example of a people removal method where the person has been manually selected in the real image (a), and then automatically removed in the second image (b) by filling the region concerning the person using an exemplar-based image inpainting method. Reprinted from Criminisi et al. (2004). 2010; He et al., 2011; Ma & Ma, 2011). In texture synthesis, a synthetic texture derived from one portion of the image is used to fix another portion. This syn- thetic texture is a plausible patch that does not have visible seams nor repetitive features. Algorithms based on texture synthesis are able to fill in large regions but at the cost of not preserving linear structures. PDE inspired algorithms re- construct the gap using geometry information to interpolate the missing parts. A diffusion process propagates linear structures of equal gray value (isophotes) of the surrounding area into the gap region. Although these algorithms pre- serve well linear structures, the diffusion process introduces some blurring when filling in large regions. Finally, exemplar-based methods are of particular in- terest. Instead of generating synthetic textures, these methods operate under the assumption that the information that is necessary to complete the gap has to be fetched from nearby regions of the same image. They generate new tex- tures by searching for similar patches in the image with which the gap is filled. Moreover, some exemplar-based algorithms also search the needed information in databases of millions of images in order to complete the remaining infor- mation (Whyte et al., 2009). Furthermore, these algorithms often combine the advances of texture synthesis and isophote-driven inpainting by a priority-based mechanism which determines the region filling order, thereby reducing blurring caused by prior techniques. 19 (a) Real image (b) Blur (c) Pixelating (d) Emboss (e) Solid silhouette (f) Skeleton (g) 3D avatar (h) Invisibility Figure 7: Several examples of visual abstractions where the person in the real image (a) has been replaced by a visual model. Regarding video inpainting, some of the first straightforward approaches tried to apply image inpainting methods to individual images of the underlying video data. However, they did not take full advantage of the temporal correla- tion of video sequences. Due to this, the previous methods are often modified in order to be adapted to sequences of images, reconstructing a given frame by interpolating missing parts from adjacent frames. These methods can be classified into patch-based methods (Shiratori et al., 2006; Wexler et al., 2007; Patwardhan et al., 2007; Ghanbari & Soryani, 2011) and object-based meth- ods (Zhang et al., 2005b; Cheung et al., 2006; Vijay Venkatesh et al., 2009). As patch-based methods are unable to perform both spatial and temporal aspects simultaneously, object-based methods were introduced in order to overcome these constraints. 3.4.5. 
Visual abstraction / object replacement Object replacement involves the substitution of objects (or persons) appear- ing in an image or video by a visual abstraction (or visual model) that protects the privacy of an individual while enabling activity awareness. As far as we know, the term ‘visual abstraction’ was early coined by Chinomi et al. (2008) 20 to refer to a visual model that abstracts a replaced object. Common visual models could be a point, a bounding box, a stick figure, a silhouette, a polyg- onal model, and many others as seen in Figure 7. Visual abstraction may be obtained in a a variety of ways. Hence, there is an overlap with some of the previous reviewed methods, such as image filtering or face de-identification, that could also be used as visual models. Object replacement does not necessarily imply removing the object to be replaced, but quite often the aforementioned techniques in section 3.4.4 intervene. In such cases, an object removal method is applied, followed by the rendering of the visual model over the inpainted image. The abstract object is often located in the same relative position, pose, and orientation as the original object. Although it depends on the used visual model, by using a proper abstract representation of an object, the information of the scene remains useful so as visual analysis can still be carried out, for instance, to assess the underlying activity before an accident in a home risk detection service. Then, the activity can be analysed without violating the right to privacy of people appearing in the image because it is not possible to directly identify them. However, as reported by Hodgins et al. (1998), some works in human motion perception showed that viewers easily recognise friends by their gaits, as well as the gender of unfamiliar persons when using moving light displays (Johansson, 1973). Concerning this, it seems that motion is essential for identifying human figures. Furthermore, apparently the geometric model used for rendering human motion affects the viewer’s perception of motion. For example, in experiments carried out by Hod- gins et al., subjects were able to better discriminate motion variations using the polygonal model than they were with the stick figure model. Hence, a study to determine how visual abstraction techniques actually preserve privacy must be done. Different works employ visual abstraction and object replacement. For ex- ample, a silhouette representation can be used as a visual abstraction of a per- son (Tansuriyavong & Hanaki, 2001). This removes information about textures while maintaining the shape of the person, thereby complicating the identifi- cation. A silhouette representation is used, for instance, to preserve individ- ual’s privacy in a fall detector and object finder system (Williams et al., 2006). Another representation can be obtained by using an edge motion history im- age (Chen et al., 2007, 2009). By using this pseudo-geometric model, the whole human body is obscured and the person looks like a ghost in the final image. This method detects edges of the body appearance and motion, accumulating them through the time, in order to smooth the noise, and partially preserve body contours. Chinomi et al. (2008) proposed twelve operators for visual abstrac- tions: as-is, see-through, monotone, blur, mosaic, edge, border, silhouette, box, bar, dot and transparency. In order to choose among one of them, the relation- ship between the subject being monitored and the viewer is taken into account. 
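A minimal sketch of one such abstraction, replacing a person with a solid silhouette rendered over an inpainted (person-free) background, is given below; the mask is assumed to come from a person detector or segmenter, and the colour is arbitrary.

```python
import numpy as np

def render_silhouette(background, person_mask, colour=(0, 128, 255)):
    """Render a solid-silhouette abstraction: the person-free background is used as canvas
    and the person's mask is filled with a flat colour, preserving position and pose while
    discarding all identifying texture."""
    abstraction = background.copy()
    abstraction[person_mask > 0] = colour
    return abstraction
```

Swapping the flat colour fill for a bounding box, a dot or a skeleton yields other abstractions of the kind listed above.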
In addition to the representation that have been seen so far, 3D avatars can be used too. Avatar creation (Sadimon et al., 2010) is a very interesting field with straightforward application in visual abstraction as well. Lyons et al. (1998) presented a method that uses automatic face recognition and Gabor wavelet 21 transform for creating avatars. It automatically extracts a face from an image, and create a personalised avatar by rendering the face into a generic avatar body. However, given that the avatar maintains recognisable aspects of the face, pri- vacy would not be protected. Hogue et al. (2007) proposed a method based on 3D reconstruction for whole body avatar creation. They use a portable stereo video camera to extract the geometry of the person and create a 3D model. A similar and more recent work is proposed by Chen et al. (2014), where two low- cost RGB-D cameras are used to scan the whole body of a person in order to create a mesh model. Although the resulting textured model of both methods do not protect privacy, they can constitute a building block for other methods where generic 3D avatars are used to replace persons, as proposed by Fan et al. (2005). Furthermore, automatic rigging (Baran & Popović, 2007; Borosán et al., 2012) could be used to animate 3D models, enabling the use of customised 3D models in real-time. Hence, opening the door to new protection methods that transform or deform mesh models. Finally, the use of object-video streams for preserving privacy is proposed by Qureshi (2009). A separated video bitstream is generated for each foreground object appearing in the raw video. Thereby, the original video can be reconstructed from object-video streams without any data loss. During reconstruction, each video bitstream can be rendered in several ways, for instance, obscuring people identities using a silhouette representation or just not showing an object-video stream at all. 3.5. Data hiding based methods In order to protect privacy, there are redaction methods that apart of mod- ifying the region of interest, they embed the original information inside of the modified version so as to be retrieved in the future if needed (Cheung et al., 2009). These redaction methods make use of data hiding techniques (Petitcolas et al., 1999) to develop reversible methods when the underlying redaction does not support it. In Figure 8, a classification of data hiding methods is presented. Concerning the terminology used in data hiding, the embedded data is the mes- sage that will be sent secretly. It is often hidden in another message referred to as cover message whose content can be text, audio, image or video. As a result of a hidden process, a marked message is obtained. Generally data hiding techniques are used for steganography, digital water- marking and fingerprinting. Steganography is a practice that consists in con- cealing a secret message as embedded data inside a cover message. The hiding process is often controlled by a key in order to allow the recovery of the secret message to parties that know it. Regarding digital watermarking and finger- printing, they both use steganography techniques but are focused on copyright protection. On the one hand, digital watermarking encodes information about the owner of an object by means of a visible pattern (e.g. a company logo) or hidden information. 
Generally, data hiding techniques are used for steganography, digital watermarking and fingerprinting. Steganography is a practice that consists in concealing a secret message as embedded data inside a cover message. The hiding process is often controlled by a key in order to allow the recovery of the secret message only by parties that know it. Regarding digital watermarking and fingerprinting, both use steganography techniques but are focused on copyright protection. On the one hand, digital watermarking encodes information about the owner of an object by means of a visible pattern (e.g. a company logo) or hidden information. On the other hand, fingerprinting is used to embed hidden serial numbers that uniquely identify an object in such a way that the owner of the copyright can detect violations of licence agreements. Data hiding methods have to provide different kinds of features in terms of capacity, perceptibility and robustness (Cox et al., 2002). The main difference between steganography and digital watermarking (and fingerprinting) is their application: whereas the former aims for imperceptibility to human vision, the latter is focused on robustness.

[Figure 8: A classification of data hiding techniques according to Petitcolas et al. (1999). Information hiding comprises covert channels, steganography (linguistic and technical), anonymity and copyright marking; the latter covers robust copyright marking (fingerprinting and imperceptible or visible watermarking) and fragile watermarking.]

Invisible watermarking can be used for privacy protection purposes. Instead of embedding information about the owner, the original video is embedded inside the privacy-protected version of it. Because of this, data hiding methods that provide a large embedding capacity are required. Furthermore, if the reversibility of the hiding process is considered, irreversible and reversible data hiding techniques can be found (Ni et al., 2006). In the former, the cover video cannot be fully restored after the hidden data is extracted, but this usually yields a higher hiding capacity. In the latter, the cover video can be fully restored, thereby maintaining the authenticity of the original video. Nevertheless, reversible data hiding is not required in privacy protection, because the cover video is discarded once the embedded data has been extracted. Some of the privacy protection methods that make use of non-reversible and imperceptible watermarking are described next.

For instance, Yabuta et al. (2005) extract moving objects from a motion JPEG video. These regions are scrambled in the original video, resulting in a privacy-protected video that is used as the cover video. Then, the extracted objects are JPEG compressed and encrypted with AES, followed by a hiding process that embeds them into the least significant bits of middle-frequency DCT coefficients of the cover video. The added perturbation is small and does not visually affect the reconstructed image. A similar approach is proposed by Zhang et al. (2005a), where foreground objects are extracted from an H.263 video, compressed and encrypted as a regular video bitstream. The gaps left by the foreground objects are inpainted in the original video using background replacement. Then, the encrypted foreground objects are embedded in the DCT coefficients of the inpainted video considering the perceptual quality of the marked video. However, one drawback is the increased bitrate of the resulting video. Yu & Babaguchi (2007) presented a method for hiding a face inside a newly generated face. It considers the face as the information to be embedded. Initially, an active appearance model is built and faces are characterised as parameters of this model in order to reduce the payload of the embedded information. Then, the face appearing in the original MPEG-2 video is masked. Finally, the face parameters are embedded into DCT coefficients of the video following the quantised index modulation scheme (Cox et al., 2002). A novel data hiding algorithm for M-JPEG that minimises both the output perceptual distortion and the output bitrate is proposed by Paruchuri & Cheung (2008).
The algorithm identifies the optimal locations to hide data by selecting the DCT coefficients that minimise a cost function considering both distortion and bitrate. The person appearing in a video (the privacy data) is extracted and removed from it. Then, the privacy data is embedded into the selected DCT coefficients of the inpainted video. The proposed algorithm is used by Cheung et al. (2008), but modifying the rate-distortion scheme to determine the optimal number of bits to be embedded in each DCT block. This last version is used for a reversible data hiding scheme in a video surveillance prototype (Cheung et al., 2009; Paruchuri et al., 2009). Guangzhen et al. (2010) presented a different method for JPEG images. It uses scaling and wavelet coefficients generated by the DWT. The former are used to build a low-resolution image where privacy information appears pixelated, whereas the latter are used along with the scaling coefficients to jointly recover the original image. In this way, the wavelet coefficients represent only the private information that needs to be embedded. They are embedded in the DCT coefficients of the low-resolution image after quantisation via amplitude module modulation (Wu, 2001).

4. Privacy-aware intelligent monitoring systems

This section introduces some of the existing expert and intelligent systems in the video surveillance and AAL fields that take privacy into account and, thereby, are developed as a framework to protect it. Some of the reviewed systems focus only on the way in which private data is managed, whereas others use redaction methods to protect individuals' privacy before imagery data is stored, distributed, shared or visualised. Because of this, some privacy design questions which such systems should address can be drawn (Senior et al., 2005; Yu et al., 2008):

• What data is captured?
• Has the subject given consent?
• How does the subject specify the privacy preferences?
• What form does the data take?
• Who sees the data?
• How long is the data kept?
• How is the data used?
• How raw is the data?
• Who and what should be protected?
• How should privacy be protected in a video surveillance system?
• How can privacy be protected without losing utility?

The answers to these questions lead to privacy policies and technical aspects that such systems should cope with in one way or another.
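As a toy illustration only, the answers to some of these questions could be collected in a small policy record such as the one below; the field names and values are invented for this sketch and do not correspond to any of the reviewed systems.

from dataclasses import dataclass, field
from typing import List

@dataclass
class PrivacyPolicy:
    # Toy record capturing answers to some of the design questions above.
    subject_id: str                       # who is protected
    consent_given: bool                   # has the subject given consent?
    redaction: str = "silhouette"         # how raw is the displayed data?
    authorised_viewers: List[str] = field(default_factory=list)  # who sees the data
    retention_days: int = 30              # how long is the data kept?

policy = PrivacyPolicy(
    subject_id="resident-07",
    consent_given=True,
    redaction="blur",
    authorised_viewers=["caregiver", "emergency-service"],
    retention_days=7,
)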
Along the same line, it is also important to consider when redaction is performed. As stated by Senior (2009), there are several locations where redaction can be performed in a general video-based monitoring architecture made up of cameras, a video processor, a database and user interfaces. Concretely, these locations include all of the mentioned parts except the cameras, because of technical issues that may arise in legacy systems. Nevertheless, cameras have also been included here from a conceptual point of view, so there are a total of four locations. In such an architecture, data flows from the video cameras to the user interfaces, crossing through the video processor and the database. Furthermore, depending on the privacy policies of subjects, viewers' permissions, and other factors, both redacted and unredacted data may have to be displayed. Next, we enumerate the locations where redaction can be applied:

1. User Interface. This location is the most insecure. Data crosses the system without being protected until it reaches the user interface. Then, redaction is carried out by the user interface and the protected information is presented to the viewer. An advantage of this approach is that redacted data does not need to be stored. However, metadata information needs to be delivered with the raw data.

2. Database. When the user interface requests some information from the database for viewing, the latter can redact it. Thereby, redacted data does not need to be stored, but this involves additional processing because the same data may be redacted multiple times. Besides that, latency issues may arise due to the extra processing.

3. Video Processor. Redaction can be performed by the video processor before storage. It analyses video bitstreams to detect activities and extract useful information. Nevertheless, bandwidth and storage requirements are increased because both redacted and unredacted data need to be sent to the database.

4. Video Camera. A privacy-aware smart camera can redact sequences of images itself. This is the earliest possible stage in which redaction can be applied. Similarly to the video processor, in this location bandwidth and storage requirements are increased too.

Several redaction locations can be combined to enhance security, as in the double redaction also proposed by Senior (2009), in which privacy protection is applied at the earliest stage as well as in other locations. Video is decomposed into multiple information streams containing the private data, which are encrypted and flow through the system to the database. The information can be recombined later, when needed by an authorised viewer. By using double redaction, more than one level of privacy protection can be provided, where each one could be suitable for different applications. This way, some visual information could be left unprotected but securely stored, whereas other information could be redacted according to several factors like subject, viewer, ongoing activity and so on.

Next, some of the existing systems found in the literature are described. CoMedi (Coutaz et al., 1999) is a media space prototype that facilitates remote informal communication and group awareness while assuring privacy protection. A porthole display with a fish-eye feature is used in order to provide awareness of the remote activities that are being carried out. It shows the personal information that the corresponding remote user has previously accepted to reveal, without losing awareness about peripheral activities. Regarding privacy, this prototype uses a face tracker and a privacy filter based on eigenspace coding in order to filter out captured faces not belonging to the image set of 'socially correct' faces.

NeST (Fidaleo et al., 2004), the networked sensor tapestry, is a general architecture to manage a broad range of distributed surveillance applications. It is scalable and is composed of software modules and hardware components. The core component of a NeST server is the privacy buffer. It utilises programmable plug-in privacy filters which operate on data coming from the sensors in order to prevent access to it or to remove personally identifiable information. These privacy filters are specified using a privacy grammar that is able to connect multiple low-level privacy filters to create arbitrary data-dependent privacy definitions. Moreover, the privacy of the monitored subjects is also considered. In this sense, the NeST architecture integrates individuals' privacy (according to behaviours), denying access to specific information to some or all of the modules or operators. NeST features secure sharing, capturing, distributed processing and archiving of surveillance data.
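The idea of composing low-level privacy filters into data-dependent privacy definitions, as in NeST's privacy grammar, could be sketched as follows; the filter names and their trivial bodies are placeholders invented for this example rather than NeST's actual filters.

from typing import Callable, List
import numpy as np

Frame = np.ndarray
Filter = Callable[[Frame], Frame]

def blur_faces(frame: Frame) -> Frame:
    # Placeholder: a real filter would detect faces and blur only those regions.
    return frame

def blank_windows(frame: Frame) -> Frame:
    # Placeholder: a real filter would blank out regions showing windows/screens.
    return frame

def compose(filters: List[Filter]) -> Filter:
    # Chain several low-level filters into a single privacy definition.
    def pipeline(frame: Frame) -> Frame:
        for f in filters:
            frame = f(frame)
        return frame
    return pipeline

# A 'privacy definition' built from low-level filters, in the spirit of a
# privacy grammar; applied here to a dummy black frame.
caregiver_view = compose([blur_faces, blank_windows])
protected = caregiver_view(np.zeros((480, 640, 3), dtype=np.uint8))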
Stealth Vision (Kitahara et al., 2004) is an anonymous video capturing system that protects the privacy of objects by pixelating or blurring their appearance. It determines private areas in videos captured by a mobile camera using 3D space models. The 3D position of the target object is estimated by a homographic transformation using images coming from a static overhead camera. Afterwards, a calibrated mobile camera estimates the privacy area by projecting the 3D models onto the captured image plane. Finally, the privacy area is protected. RFID sensors are employed in order to indicate which persons or objects are allowed to be captured.

PriSurv (Chinomi et al., 2008) is a system that uses abstract visual representations to protect individuals' privacy. It is composed of several modules: analyser, profile generator, profile base, access controller, abstractor and video database. In order to identify subjects, RFID tags and image processing are used. This system is focused on small communities where a certain number of members are registered and monitor each other. Given that the sense of privacy is different for everyone, privacy policies are determined according to the closeness between the subjects in the video and the viewers monitoring them. Hence, the appearance of subjects is revealed to close viewers while it is hidden from viewers that subjects feel distant from. Depending on this closeness, different abstract representations are chosen.

Respectful Cameras (Schiff et al., 2009) is a prototype system focused on addressing privacy concerns. This system works by detecting, in real time, coloured markers that people wear, such as hats or vests, in order to indicate their privacy preferences. These are expressed as their wish to remain anonymous. The system obscures the face of a person when a marker has been detected.

CareLog and BufferWare (Hayes & Truong, 2009) are two systems based on the concepts of the selective archiving model. Under this model, data is constantly buffered but an explicit input is required in order to archive it; otherwise, the data is lost. It represents a compromise whereby people can negotiate their own policies concerning control, privacy, information access and comfort. CareLog is a system aimed at education classrooms for the recording of diagnostic behavioural data of children with severe behaviour disorders. Using such a system, a teacher takes control of data archiving to document an incident after it has occurred. BufferWare, in turn, is a system aimed at semi-public spaces in which people can save recorded data, if desired, using a touch screen. They are also able to view previous recordings. In this case, BufferWare was placed in a social area of an academic building. These two systems were proposed in order to conduct an experiment to assess issues of the selective archiving model, such as ownership of data, choice of which data should be saved or deleted, visibility and awareness of recordings, and trust in the fact that policies are being followed and features were implemented as described.
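A minimal sketch of the selective archiving model could look as follows, assuming an in-memory ring buffer and an explicit archiving trigger; the capacity and the string "frames" are illustrative only and do not reflect how CareLog or BufferWare are implemented.

from collections import deque
from typing import Any, List

class SelectiveArchive:
    # Keep only a rolling window of data unless archiving is explicitly requested.

    def __init__(self, capacity: int):
        self._buffer: deque = deque(maxlen=capacity)  # old frames silently fall out
        self._archive: List[Any] = []

    def push(self, frame: Any) -> None:
        self._buffer.append(frame)          # data is constantly buffered ...

    def archive_now(self) -> None:
        self._archive.extend(self._buffer)  # ... and kept only on explicit input
        self._buffer.clear()

# 25 fps * 120 s gives roughly a two-minute rolling window (illustrative).
store = SelectiveArchive(capacity=25 * 120)
for t in range(10_000):
    store.push(f"frame-{t}")
store.archive_now()   # e.g. a teacher presses a button after an incident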
Altcare (Shoaib et al., 2010) is a monitoring system based on a static network of video cameras aimed at emergency detection at home. It focuses on fall detection and works without any manual initialisation or interaction with older persons. When a fall is detected, Altcare first attempts to get confirmation from the involved person. Afterwards, if the verification is positive, it automatically communicates the emergency to the responsible person. In addition to information concerning the involved person, the system transmits a patch of the video that shows the emergency. In order to protect privacy, the person is replaced by a silhouette in the video. Furthermore, Altcare enables system administrators to check the state of the person at any time, if so desired by the monitored person.

In Sohn et al. (2010), a privacy-preserving watch list screening system for video surveillance is proposed. Depending on the group of interest, the watch list comprises either blacklisted or white-listed identities. Hence, the goal is to verify whether a person is enrolled in the watch list or not. In order to preserve privacy, it relies on homomorphic encryption. This system is composed of a network of distributed cameras and a central management server (CMS) which owns the watch list database. This database is considered private and contains face feature vectors corresponding to the identities of interest. In this system, cameras analyse the recorded raw video in order to detect face regions. When a face is detected, it is encrypted and sent to the CMS, where face feature vectors are retrieved from the watch list database. Afterwards, a comparison between the face coming from the camera and those from the watch list is performed in the encrypted domain. Then, a result confirming whether the identity is included in the watch list or not is encrypted and sent back to the camera. Finally, according to the obtained result, the camera notifies the security service.

TrustCAM (Winkler & Rinner, 2010b,a) is a smart camera that provides security and privacy protection based on trusted computing. By using this smart camera, video bitstreams are digitally signed. Thereby, the manipulation of recorded images or videos can be detected, the origin of images is provided, and multi-level access control is supported. Each TrustCAM node is operated by a central control station (CS), which runs a protected database where cryptographic keys generated during camera setup are securely stored. Camera nodes encrypt privacy-sensitive image data such as faces or number plates using a special cryptographic key stored in each one. Moreover, an abstracted version of the sensitive regions, for example a silhouette, can also be generated and encrypted. These encrypted sensitive regions cannot be decrypted without having access to the CS. Indeed, each representation requires a different key in order to decrypt it, thereby providing multi-level access control.

Finally, PAAS (Barhm et al., 2011) is a privacy-aware surveillance system that enables monitored subjects to specify their privacy preferences via gestures. It uses face recognition technology in order to map individuals to their privacy preferences and security credentials, which are encoded using an extension of the P3P-APPEL framework (Cranor et al., 2002). Users have access to one of three privacy settings, namely no privacy, blurred face and blurred full body.
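In the spirit of PAAS's three settings, a redaction step driven by per-user preferences might look like the following sketch; the preference table, bounding boxes and blur parameters are invented for illustration and are not PAAS's actual implementation.

import cv2
import numpy as np

# Hypothetical per-user preferences echoing the three PAAS-style settings.
PREFERENCES = {"alice": "none", "bob": "blur_face", "carol": "blur_body"}

def redact(frame, person_box, face_box, user_id):
    # Apply the redaction level chosen by the recognised user.
    level = PREFERENCES.get(user_id, "blur_body")   # unknown users: safest level
    if level == "none":
        return frame
    x, y, w, h = face_box if level == "blur_face" else person_box
    roi = frame[y:y + h, x:x + w]
    frame[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (31, 31), 0)
    return frame

frame = np.zeros((480, 640, 3), dtype=np.uint8)     # dummy frame
frame = redact(frame, person_box=(100, 50, 120, 300),
               face_box=(130, 60, 60, 60), user_id="bob")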
5. Discussion

5.1. Recognition of the region of interest

As has been seen throughout this survey, privacy protection methods mainly focus on preserving visual privacy in images and videos, either modifying the whole image completely or only a region of interest. Although the detection of the region of interest has not been specifically considered in this work, correct detection is fundamental in order to make protection methods work as desired. For instance, imagine a hypothetical case in which a scrambling algorithm is used to obfuscate a person's face in a video. Supposing that the video sampling rate is 25 fps, there are 3 000 frames in 2 minutes of video where a face appears. Suppose also that the hit rate of the face detector used is around 98%. Under these assumptions, the subject's face will be missed in roughly 60 frames of video, thereby preventing the obscuring algorithm from protecting privacy properly. If the face is fully observable in a single frame, subject identification can already be performed. Not only that, the face could also be inferred by observing multiple frames in which the face detector failed. Hence, it is essential that the computer vision techniques (segmentation, object detection, tracking, recognition, etc.) involved before the obscuring process are reliable and robust enough to guarantee the effectiveness of the protection method.

5.2. Privacy and intelligibility trade-off

Although we have mentioned the privacy-intelligibility trade-off in the introduction of Section 3.4, let us provide some further discussion here. Zhao & Stasko (1998) compared several image filters (blurring, pixelating, edge-detector, live-shadow and shadow-view) and found that, whereas intelligibility is provided by almost all of the tested filters, actors were also recognised in most cases. Boyle et al. (2000) performed an in-depth evaluation of blurring and pixelating to analyse how varying the filtration level affects this balance. The obtained results suggested that blurring and, to a lesser extent, pixelating may provide a balance, but it would be a precarious balance at best. Furthermore, Neustaedter et al. (2006) state that blurring by itself does not suffice for privacy protection when a balance is needed. Therefore, it seems that image filters can hardly provide a balance between privacy and information utility, and generally privacy loses in this balance. Nevertheless, when such a balance is not needed, image filters may be used. As reported by Agrawal & Narayanan (2011), blurring is enough to hide the identity if gait is not involved. But gait and other temporal characteristics are difficult to hide if there is some familiarity between the subject and the user. In any case, it would be interesting to expand these studies to also include visual abstraction methods and all those where such a balance could be analysed.

5.3. Real-time applications

Regarding the techniques that could be used in real-time intelligent monitoring systems, apart from image filtering, the remaining techniques would be impracticable because they are very time consuming. In such systems, all the involved computer vision algorithms along with the protection method must jointly work in real time. Like inpainting techniques, encryption techniques have high computational requirements. Nevertheless, some lightweight encryption techniques may work properly. In turn, some image inpainting techniques are designed to work in real time too. The choice of which protection method to use depends on the application requirements.
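One rough way to check whether a candidate redaction filter fits a real-time budget is to time it against the per-frame period of the target frame rate, as in the sketch below; the resolution, the pixelation filter and the 25 fps target are arbitrary choices for illustration.

import time
import cv2
import numpy as np

FPS = 25
BUDGET = 1.0 / FPS          # about 40 ms per frame at 25 fps

frame = np.random.randint(0, 256, (720, 1280, 3), dtype=np.uint8)

def pixelate(img, block=16):
    # Downscale and upscale with nearest-neighbour interpolation.
    small = cv2.resize(img, (img.shape[1] // block, img.shape[0] // block))
    return cv2.resize(small, (img.shape[1], img.shape[0]),
                      interpolation=cv2.INTER_NEAREST)

start = time.perf_counter()
for _ in range(100):
    pixelate(frame)
per_frame = (time.perf_counter() - start) / 100
print(f"{per_frame * 1000:.1f} ms per frame, "
      f"{'fits' if per_frame < BUDGET else 'exceeds'} the {BUDGET * 1000:.0f} ms budget")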
5.4. Privacy model and evaluation

Relying on a formal model to guarantee privacy preservation is necessary in order to have a common framework in which privacy solutions can be developed and evaluated. Common definitions of what visual privacy should be, when privacy is protected, which areas are sensitive, and so on are needed. A model that currently fulfils some of these requirements is face de-identification based on the k-same family of algorithms. Although it guarantees that a de-identified face cannot be recognised with a probability higher than 1/k, inference channels are not considered. In that sense, Saini et al. (2014) propose a model that takes this into account. This model measures the privacy loss of the individuals that usually inhabit a surveillance area. This measure is given as a continuous variable in the range [0, 1]. As far as we know, this is the only approach that currently measures privacy loss in such a way. Regarding the remaining methods, despite not being based on a formal model, it can be intuitively realised that they provide different protection levels. For instance, naïve image filtering techniques like blurring or pixelating are not sufficient to protect privacy when intelligibility is involved, but blanking out or encrypting sensitive regions could be enough.

Also related to the previous point is how the evaluation of visual privacy protection is performed. We have not found a common framework for it. Although the aforementioned model for measuring privacy loss is very valuable, it cannot be used to build a framework that automatically performs the comparison, because current technology is not robust enough. This would have been very useful in order to perform an exhaustive comparison covering all of the reviewed methods. Moreover, the lack of manually labelled privacy datasets also hampers visual privacy evaluation. The only datasets focused on privacy that have been found in the literature are PEViD-HD (Korshunov & Ebrahimi, 2013) and PEViD-UHD (Korshunov & Ebrahimi, 2014). Both datasets consider video surveillance scenarios and provide high-definition and ultra-high-definition video sequences, respectively. A similar dataset for ambient-assisted living, in which other kinds of visual sensors (e.g. RGB-D sensors) were also included, would be appreciated. In addition to assessing the degree to which privacy is protected, the extent to which useful information is retained should also be studied. In the absence of such evaluation mechanisms, several works have been found in which some of the protection methods have been tested by users as part of the experimentation. These experiments are mainly based on conducting interviews and questionnaires with users, in which their ability to extract useful information from a privacy-protected image (subject identification, pose, activity, presence, etc.) is evaluated (Korshunov et al., 2012). Results obtained from an objective study are still missing.
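To make the 1/k guarantee of the k-same family mentioned at the beginning of this subsection more tangible, the sketch below applies a simplified k-same-like averaging to face feature vectors; it omits the bookkeeping of the original algorithms (Newton et al., 2005) that is needed for the bound to actually hold, so it should be read as an illustration only.

import numpy as np

def k_same_like(gallery: np.ndarray, k: int) -> np.ndarray:
    # Replace each face vector by the mean of its k closest gallery vectors.
    # `gallery` has one face feature vector (e.g. PCA coefficients) per row.
    # This simplified variant skips the bookkeeping the original algorithms use
    # to guarantee that every de-identified face averages exactly k distinct
    # originals, which is what yields the 1/k recognition bound.
    deidentified = np.empty_like(gallery, dtype=float)
    for i, face in enumerate(gallery):
        dists = np.linalg.norm(gallery - face, axis=1)
        closest = np.argsort(dists)[:k]          # includes the face itself
        deidentified[i] = gallery[closest].mean(axis=0)
    return deidentified

rng = np.random.default_rng(0)
faces = rng.normal(size=(20, 8))                 # toy gallery of 20 face vectors
protected = k_same_like(faces, k=5)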
5.5. User studies about privacy requirements

Concerning users, there is no agreement about privacy requirements, because privacy is highly subjective and which information is considered sensitive depends on each individual. Despite what has been previously said about blurring and pixelating not being effective in providing a balance between privacy and intelligibility, some users that have participated in studies about this matter feel satisfied with these filters when they are configured correctly (Zhao & Stasko, 1998; Boyle et al., 2000; Lander et al., 2001; Neustaedter et al., 2006). Other participants felt that there is no such balance because, as they were able to know what subjects were doing, privacy was not being suitably protected. Indeed, while some of the participants showed interest in using video cameras at work and even at their homes, under some constraints, others commented that they would not use them because video cameras are very intrusive. In any case, what can be extracted from this lack of consensus is that privacy is a very subjective topic and users' requirements vary.

Considering which protection methods preserve privacy better for a fall detection system, a user survey was conducted by Edgcomb & Vahid (2012). They analysed several visual representations: blur, silhouette, oval, box, and trailing arrows. Results indicated that silhouettes and blur were perceived to provide insufficient privacy, whereas an oval provides sufficient perceived privacy while still supporting a fall detection accuracy of 89%.

Others have also researched possible features of subjects' sense of security and privacy for video surveillance systems (Koshimizu et al., 2006; Babaguchi et al., 2009). Results showed that subjects classify the viewers that monitor them using cameras into familiar persons, unfamiliar persons and persons on duty. Moreover, subjects expect a very familiar person to protect them in an emergency. Concerning the disclosure of private information, it is affected by the closeness between subjects and viewers. Familiarity or closeness between persons facilitates the recognition of each other. It is also curious to see that people in their 50s are less sensitive to privacy than those in their 20–40s. Regarding older people, most of them agree with using silhouettes as a visual representation in order to protect their privacy. Furthermore, they show interest in customising the system operation. They demand control over the camera and over who has access to the captured or stored information (e.g. turning a camera off, setting the filtration level, seeing how they are being watched, and so on). However, there are others who consider that they do not require a monitoring system because they are independent enough (Demiris et al., 2009).

Finally, it is necessary that users demand protection methods in order to get visual privacy protection techniques widely used in video surveillance systems. Without such a demand, the main actors of the security market, more interested in making a profit, will not pay attention to privacy issues because they do not have enough motivation to kick-start self-regulation (Gutwirth et al., 2012). We consider that a strong demand for privacy-aware systems will increase research in this field, also benefiting future AAL systems. Either way, more discussion on visual privacy is needed, the privacy requirements of users according to applications need to be collected, and more research on visual privacy protection is required.
6. Conclusion

As we have seen throughout this work, the proliferation of networks of video cameras for surveillance in public spaces and the advances in computer vision have led to a situation in which the privacy of individuals is being compromised. Moreover, nowadays video cameras are being used more often in ambient-assisted living applications that operate in private spaces. Because of this, expert and intelligent systems that handle these tasks should take privacy into account by means of new tools that restore the individual's right to privacy.

In this paper, an up-to-date review of visual privacy protection methods has been provided. As far as we know, this is the first review that focuses on the protection methods and tries to give a comprehensive classification of all of them. In our classification, we have considered the following categories: intervention, blind vision, secure processing, redaction and data hiding. As has been seen, redaction methods cover the largest part of this work and they have also been classified according to the way in which imagery data is modified: image filtering, encryption, face de-identification, object removal and visual abstraction.

Given that the previous methods are required for expert and intelligent systems for video surveillance and ambient-assisted living, some of these systems have also been reviewed. Such systems are developed as a framework to provide some kind of privacy protection. However, merely using some of the described methods is not enough to build a privacy-aware system. As has been seen, they also have to support other mechanisms to implement privacy policies. These policies specify what data is protected, how it is protected, who has access, and a variety of other fundamental aspects concerning involved subjects, ongoing activities, captured data, and so on. Visual privacy protection is necessary in expert and intelligent systems for video surveillance, given their increasing power to track, store and index our public daily activities, but the use of privacy protection methods is also justified for such systems from an ambient-assisted living perspective. In that sense, privacy-aware smart cameras are essential in the construction of future low-cost systems for telecare applications aimed at older and disabled people.

As another contribution of this work, we have also provided a discussion about important factors and current limitations of visual privacy protection. For instance, a key aspect, i.e. the correct detection of the region of interest, has been highlighted. It is a very relevant topic because it can be considered the first building block of privacy protection and, therefore, it would deserve a self-standing review of the related research fields. We have also mentioned the problem faced by redaction methods, i.e. the privacy and intelligibility trade-off, and the necessity of expanding these studies to cover other protection methods. Furthermore, we have highlighted the lack of a formal model and evaluation mechanisms that would enable fair comparisons between different protection methods and, more importantly, would provide guarantees about their accuracy in protecting privacy. This work provides a comprehensive picture of how to protect visual privacy, so it can be a good starting point for novel researchers in the field of expert and intelligent systems, and more specifically in intelligent monitoring systems for video surveillance and ambient-assisted living.
For future research directions, we consider that there are two main axes that need special attention. On the one hand, as aforementioned, a standard way to quantify and evaluate visual privacy protection is needed so as to fairly compare works in this matter. It would be better if this were included as part of a formal model that considers more privacy-related aspects. On the other hand, the recognition accuracy of sensitive regions needs to be improved, so research is required on more robust computer vision algorithms for recognition, tracking and event detection.

Regarding other future research directions, protection methods would also deserve some consideration. Although a lot of work has been done so far, novel redaction methods that provide a better balance between privacy and intelligibility are welcome. For instance, a textual representation of the scene, where the essential context (e.g. individuals, events, etc.) is captured, could be enough for some applications like telecare. Privacy preferences are also very important. Mechanisms to empower users and put them in control of their private data captured by intelligent systems should be researched. It would be desirable to have a common way to let users specify their preferences, so that they can decide who, how, where and when they are watched in normal scenarios where no law infringement is involved. Finally, users should be able to know and track which systems have collected data about them, in order to enable them to take legal action in this matter should they need it. Therefore, work in these areas may lead to more effective and reliable visual privacy protection methods and systems in the near future, and also help to increase users' acceptance of video cameras in private spaces.

Acknowledgement

This work has been partially supported by the Spanish Ministry of Science and Innovation under project "Sistema de visión para la monitorización de la actividad de la vida diaria en el hogar" (TIN2010-20510-C04-02) and by the European Commission under project "caring4U - A study on people activity in private spaces: towards a multisensor network that meets privacy requirements" (PIEF-GA-2010-274649). José Ramón Padilla López and Alexandros Andre Chaaraoui acknowledge financial support from the Conselleria d'Educació, Formació i Ocupació of the Generalitat Valenciana (fellowships ACIF/2012/064 and ACIF/2011/160 respectively). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

This article was originally published in José Ramón Padilla-López, Alexandros Andre Chaaraoui, Francisco Flórez-Revuelta, Visual Privacy Protection Methods: A Survey, Expert Systems with Applications, ISSN: 0957-4174, http://dx.doi.org/10.1016/j.eswa.2015.01.041.

References

Abraham, A. R., Prabhavathy, K. A., & Shree, D. J. (2012). A survey on video inpainting. International Journal of Computer Applications, 56, 43–47.

Adams, A. A., & Ferryman, J. M. (2013). The future of video analytics for surveillance and its ethical implications. Security Journal. doi:10.1057/sj.2012.48.

Agi, I., & Gong, L. (1996). An empirical study of secure MPEG video transmissions. In Network and Distributed System Security, Proceedings of the Symposium on (pp. 137–144). IEEE.

Agrawal, P., & Narayanan, P. (2011). Person de-identification in videos.
Circuits and Systems for Video Technology, IEEE Transactions on, 21 , 299–310. Augusto, J., Nakashima, H., & Aghajan, H. (2010). Ambient intelligence and smart environments: A state of the art. In H. Nakashima, H. Aghajan, & J. Augusto (Eds.), Handbook of Ambient Intelligence and Smart Environments (pp. 3–31). Springer US. Avidan, S., & Butman, M. (2006). Blind vision. In A. Leonardis, H. Bischof, & A. Pinz (Eds.), Computer Vision - ECCV 2006 (pp. 1–13). Springer Berlin / Heidelberg volume 3953 of Lecture Notes in Computer Science. Avidan, S., Elbaz, A., Malkin, T., & Moriarty, R. (2009). Oblivious image matching. In A. Senior (Ed.), Protecting Privacy in Video Surveillance (pp. 49–64). Springer London. Baaziz, N., Lolo, N., Padilla, O., & Petngang, F. (2007). Security and privacy protection for automated video surveillance. In Signal Processing and Infor- mation Technology, 2007 IEEE International Symposium on (pp. 17–22). Babaguchi, N., Koshimizu, T., Umata, I., & Toriyama, T. (2009). Psychological study for designing privacy protected video surveillance system: PriSurv. In A. Senior (Ed.), Protecting Privacy in Video Surveillance (pp. 147–164). Springer London. Baran, I., & Popović, J. (2007). Automatic rigging and animation of 3D charac- ters. Graphics, ACM Transactions on, 26 . doi:10.1145/1276377.1276467. Barhm, M., Qwasmi, N., Qureshi, F., & el Khatib, K. (2011). Negotiating pri- vacy preferences in video surveillance systems. In K. Mehrotra, C. Mohan, J. Oh, P. Varshney, & M. Ali (Eds.), Modern Approaches in Applied Intelli- gence (pp. 511–521). Springer Berlin Heidelberg volume 6704 of Lecture Notes in Computer Science. Bertalmio, M., Sapiro, G., Caselles, V., & Ballester, C. (2000). Image inpainting. In Computer Graphics and Interactive Techniques, Proceedings of the 27th Annual Conference on SIGGRAPH ’00 (pp. 417–424). New York, NY, USA: ACM Press/Addison-Wesley Publishing Co. 34 http://dx.doi.org/10.1057/sj.2012.48 http://dx.doi.org/10.1057/sj.2012.48 http://dx.doi.org/10.1145/1276377.1276467 Bitouk, D., Kumar, N., Dhillon, S., Belhumeur, P., & Nayar, S. K. (2008). Face swapping: automatically replacing faces in photographs. Graphics, ACM Transactions on, 27 , 39:1–39:8. Borosán, P., Jin, M., DeCarlo, D., Gingold, Y., & Nealen, A. (2012). RigMesh: Automatic rigging for part-based shape modeling and deformation. Graphics, ACM Transactions on, 31 , 198:1–198:9. Boult, T. (2005). PICO: Privacy through invertible cryptographic obscuration. In Computer Vision for Interactive and Intelligent Environment, 2005 (pp. 27–38). IEEE. Boyle, M., Edwards, C., & Greenberg, S. (2000). The effects of filtered video on awareness and privacy. In Computer supported cooperative work, Proceedings of the 2000 ACM conference on CSCW ’00 (pp. 1–10). New York, NY, USA: ACM. Caloyannides, M. A. (2003). Society cannot function without privacy. IEEE Security and Privacy , 1 , 84–86. Candes, E., Romberg, J., & Tao, T. (2006). Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. Information Theory, IEEE Transactions on, 52 , 489–509. Cardinaux, F., Bhowmik, D., Abhayaratne, C., & Hawley, M. S. (2011). Video based technology for ambient assisted living: A review of the literature. Jour- nal of Ambient Intelligence and Smart Environments , 3 , 253–269. Carrillo, P., Kalva, H., & Magliveras, S. (2010). Compression independent reversible encryption for privacy in video surveillance. EURASIP Journal on Information Security , 2009 , 1–13. doi:10.1155/2009/429581. 
Article ID: 2009:429581. Cavoukian, A. (2013). Surveillance, Then and Now: Securing Privacy in Public Spaces . Technical Report Information and Privacy Commmissioner of On- tario. Chaaraoui, A. A., Climent-Pérez, P., & Flórez-Revuelta, F. (2012a). An efficient approach for multi-view human action recognition based on bag-of-key-poses. In A. A. Salah, J. Ruiz-del Solar, c. Meriçli, & P.-Y. Oudeyer (Eds.), Human Behavior Understanding (pp. 29–40). Springer Berlin Heidelberg volume 7559 of Lecture Notes in Computer Science. Chaaraoui, A. A., Climent-Pérez, P., & Flórez-Revuelta, F. (2012b). A review on vision techniques applied to human behaviour analysis for ambient-assisted living. Expert Systems with Applications , 39 , 10873–10888. Chan, C. S., & Liu, H. (2009). GMM-QNT hybrid framework for vision-based human motion analysis. In Fuzzy Systems, 2009. FUZZ-IEEE 2009. IEEE International Conference on (pp. 1820–1825). 35 http://dx.doi.org/10.1155/2009/429581 Chan, T., Shen, J., & Zhou, H.-M. (2006). Total variation wavelet inpainting. Journal of Mathematical Imaging and Vision, 25 , 107–125. Chang, Y., Yan, R., Chen, D., & Yang, J. (2006). People identification with limited labels in privacy-protected video. In Multimedia and Expo, 2006 IEEE International Conference on (pp. 1005–1008). Chattopadhyay, A., & Boult, T. (2007). PrivacyCam: a privacy preserving camera using uCLinux on the blackfin DSP. In Computer Vision and Pattern Recognition, 2007. CVPR ’07. IEEE Conference on (pp. 1–8). Chen, D., Chang, Y., Yan, R., & Yang, J. (2007). Tools for protecting the pri- vacy of specific individuals in video. EURASIP Journal on Advances in Signal Processing , 2007 , 1–9. doi:10.1155/2007/75427. Article ID: 2007:075427. Chen, D., Chang, Y., Yan, R., & Yang, J. (2009). Protecting personal identifi- cation in video. In A. Senior (Ed.), Protecting Privacy in Video Surveillance (pp. 115–128). Springer London. Chen, Y., Dang, G., Cheng, Z., & Xu, K. (2014). Fast capture of personalized avatar using two kinects. Journal of Manufacturing Systems , 33 , 233–240. Chen, Y.-Y., Cho, C.-W., Lin, S.-H., Lai, H.-Y., Lo, Y.-C., Chen, S.-Y., Chang, Y.-J., Huang, W.-T., Chen, C.-H., Jaw, F.-S., Tsang, S., & Tsai, S.-T. (2012). A vision-based regression model to evaluate parkinsonian gait from monocular image sequences. Expert Systems with Applications , 39 , 520–526. Cheung, S.-c., Paruchuri, J., & Nguyen, T. (2008). Managing privacy data in pervasive camera networks. In Image Processing, 2008. ICIP 2008. 15th IEEE International Conference on (pp. 1676–1679). Cheung, S.-C., Venkatesh, M., Paruchuri, J., Zhao, J., & Nguyen, T. (2009). Protecting and managing privacy information in video surveillance systems. In A. Senior (Ed.), Protecting Privacy in Video Surveillance (pp. 11–33). Springer London. Cheung, S.-C. S., Zhao, J., & Venkatesh, M. (2006). Efficient object-based video inpainting. In Image Processing, 2006 IEEE International Conference on (pp. 705–708). Chinomi, K., Nitta, N., Ito, Y., & Babaguchi, N. (2008). PriSurv: Privacy protected video surveillance system using adaptive visual abstraction. In S. Satoh, F. Nack, & M. Etoh (Eds.), Advances in Multimedia Modeling (pp. 144–154). Springer Berlin / Heidelberg volume 4903 of Lecture Notes in Computer Science. Cook, D. J., Augusto, J. C., & Jakkula, V. R. (2009). Ambient intelligence: Technologies, applications, and opportunities. Pervasive and Mobile Com- puting , 5 , 277–298. 36 http://dx.doi.org/10.1155/2007/75427 Cootes, T., Edwards, G., & Taylor, C. (1998). 
Active appearance models. In H. Burkhardt, & B. Neumann (Eds.), Computer Vision – ECCV’98 (pp. 484– 498). Springer Berlin Heidelberg volume 1407 of Lecture Notes in Computer Science. Cootes, T., Edwards, G., & Taylor, C. (2001). Active appearance models. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 23 , 681– 685. Coudert, F. (2010). When video cameras watch and screen: Privacy implications of pattern recognition technologies. Computer Law & Security Review , 26 , 377–384. Coutaz, J., Bérard, F., Carraux, E., Astier, W., & Crowley, J. L. (1999). CoMedi: Using computer vision to support awareness and privacy in me- diaspaces. In CHI ’99 Extended Abstracts on Human Factors in Computing Systems CHI EA ’99 (pp. 13–14). New York, NY, USA: ACM. Cox, I., Honsinger, C., Miller, M., & Bloom, J. (2002). Digital watermarking. Journal of Electronic Imaging , 11 , 414–414. Cranor, L., Langheinrich, M., & Marchiori, M. (2002). A P3P prefer- ence exchange language 1.0 (APPEL1.0). URL: http://www.w3.org/TR/ P3P-preferences/ last visited: 2015/01/08. Criminisi, A., Perez, P., & Toyama, K. (2003). Object removal by exemplar- based inpainting. In Computer Vision and Pattern Recognition, Proceedings of IEEE Computer Society Conference on (pp. II–721–II–728 vol.2). volume 2. Criminisi, A., Perez, P., & Toyama, K. (2004). Region filling and object removal by exemplar-based image inpainting. Image Processing, IEEE Transactions on, 13 , 1200 –1212. Dee, H., & Velastin, S. (2008). How close are we to solving the problem of automated visual surveillance? Machine Vision and Applications , 19 , 329– 343. Demiris, G., Oliver, D., Giger, J., Skubic, M., & Rantz, M. (2009). Older adults’ privacy considerations for vision based recognition methods of eldercare ap- plications. Technology and Health Care, 17 , 41–48. Donoho, D. (2006). Compressed sensing. Information Theory, IEEE Transac- tions on, 52 , 1289–1306. Dufaux, F. (2011). Video scrambling for privacy protection in video surveil- lance: recent results and validation framework. In Mobile Multimedia/Image Processing, Security, and Applications 2011, Proceedings of SPIE (pp. 1–12). volume 8063. doi:10.1117/12.883948 Article ID: 806302. 37 http://www.w3.org/TR/P3P-preferences/ http://www.w3.org/TR/P3P-preferences/ http://dx.doi.org/10.1117/12.883948 Dufaux, F., & Ebrahimi, T. (2006). Scrambling for video surveillance with pri- vacy. In Computer Vision and Pattern Recognition Workshop, 2006. CVPRW ’06. Conference on (p. 160). doi:10.1109/CVPRW.2006.184. Dufaux, F., & Ebrahimi, T. (2008a). H.264/AVC video scrambling for privacy protection. In Image Processing, 2008. ICIP 2008. 15th IEEE International Conference on (pp. 1688–1691). Dufaux, F., & Ebrahimi, T. (2008b). Scrambling for privacy protection in video surveillance systems. Circuits and Systems for Video Technology, IEEE Transactions on, 18 , 1168–1174. Dufaux, F., & Ebrahimi, T. (2010). A framework for the validation of privacy protection solutions in video surveillance. In Multimedia and Expo (ICME), 2010 IEEE International Conference on (pp. 66–71). Dufaux, F., Ouaret, M., Abdeljaoued, Y., Navarro, A., Vergnenègre, F., & Ebrahimi, T. (2006). Privacy enabling technology for video surveillance. In Mobile Multimedia/ Image Processing for Military and Security Applications, Proceedings of SPIE (pp. 1–12). volume 6250. doi:10.1117/12.664945 Article ID: 62500M. EC (2008). Seniorwatch 2. Assessment of the Senior Market for ICT Progress and Developments . Technical Report European Commission. 
EC (2010). Overview of the European strategy in ICT for Ageing Well . Technical Report European Commission. Edgcomb, A., & Vahid, F. (2012). Privacy perception and fall detection accuracy for in-home video assistive monitoring with privacy enhancements. ACM SIGHIT Record , 2 , 6–15. Efros, A., & Leung, T. (1999). Texture synthesis by non-parametric sampling. In Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on (pp. 1033–1038). volume 2. Efros, A. A., & Freeman, W. T. (2001). Image quilting for texture synthesis and transfer. In Computer Graphics and Interactive Techniques, Proceedings of the 28th Annual Conference on SIGGRAPH ’01 (pp. 341–346). New York, NY, USA: ACM. Erkin, Z., Franz, M., Guajardo, J., Katzenbeisser, S., Lagendijk, I., & Toft, T. (2009). Privacy-preserving face recognition. In I. Goldberg, & M. Atallah (Eds.), Privacy Enhancing Technologies (pp. 235–253). Springer Berlin / Heidelberg volume 5672 of Lecture Notes in Computer Science. Erturk, S. (2007). Multiplication-free one-bit transform for low-complexity block-based motion estimation. Signal Processing Letters, IEEE , 14 , 109– 112. 38 http://dx.doi.org/10.1109/CVPRW.2006.184 http://dx.doi.org/10.1117/12.664945 Fan, J., Luo, H., Hacid, M.-S., & Bertino, E. (2005). A novel approach for privacy-preserving video sharing. In Information and knowledge management, Proceedings of the 14th ACM international conference on CIKM ’05 (pp. 609– 616). New York, NY, USA: ACM. Fidaleo, D. A., Nguyen, H.-A., & Trivedi, M. (2004). The networked sen- sor tapestry (NeST): a privacy enhanced software architecture for interactive analysis of data in video-sensor networks. In Video surveillance & sensor networks, Proceedings of the ACM 2nd international workshop on VSSN ’04 (pp. 46–53). New York, NY, USA: ACM. Fleck, S., & Strasser, W. (2008). Smart camera based monitoring system and its application to assisted living. Proceedings of the IEEE , 96 , 1698–1714. Frome, A., Cheung, G., Abdulkader, A., Zennaro, M., Wu, B., Bissacco, A., Adam, H., Neven, H., & Vincent, L. (2009). Large-scale privacy protection in google street view. In Computer Vision, 2009 IEEE 12th International Conference on (pp. 2373–2380). Gerrard, G., & Thompson, R. (2011). Two million cameras in the UK. CCTV Image, 42 , 10–12. Ghanbari, A., & Soryani, M. (2011). Contour-based video inpainting. In Ma- chine Vision and Image Processing (MVIP), 2011 7th Iranian (pp. 1–5). IEEE. Goessl, L. (2012). New japanese security camera scans 36 million faces per second. URL: http://digitaljournal.com/article/321848 last visited: 2014/05/08. Granados, M., Tompkin, J., Kim, K., Grau, O., Kautz, J., & Theobalt, C. (2012). How not to be seen – object removal from videos of crowded scenes. Computer Graphics Forum, 31 , 219–228. Gross, R., Airoldi, E., Malin, B., & Sweeney, L. (2006a). Integrating utility into face de-identification. In G. Danezis, & D. Martin (Eds.), Privacy Enhancing Technologies (pp. 227–242). Springer Berlin / Heidelberg volume 3856 of Lecture Notes in Computer Science. Gross, R., Sweeney, L., Cohn, J., Torre, F., & Baker, S. (2009). Face de- identification. In A. Senior (Ed.), Protecting Privacy in Video Surveillance (pp. 129–146). Springer London. Gross, R., Sweeney, L., de la Torre, F., & Baker, S. (2006b). Model-based face de-identification. In Computer Vision and Pattern Recognition Workshop, 2006. CVPRW ’06. Conference on (pp. 161–168). Guangzhen, L., Yoshimichi, I., Xiaoyi, Y., Naoko, N., & Noboru, B. (2010). 
Re- coverable privacy protection for video content distribution. EURASIP Journal on Information Security , 2009 , 1–11. doi:10.1155/2009/293031. Article ID: 2009:293031. 39 http://digitaljournal.com/article/321848 http://dx.doi.org/10.1155/2009/293031 Gutwirth, S., Leenes, R., De Hert, P., & Poullet, Y. (2012). European Data Protection: In Good Health? . Springer. Harvey, A. (2010). Camoflash - anti-paparazzi clutch. URL: http:// ahprojects.com/projects/camoflash/ last visited: 2014/05/08. Hayes, G., & Truong, K. (2009). Selective archiving: A model for privacy sen- sitive capture and access technologies. In A. Senior (Ed.), Protecting Privacy in Video Surveillance (pp. 165–184). Springer London. He, L., Bleyer, M., & Gelautz, M. (2011). Object removal by depth- guided inpainting. In The Austrian Association for Pattern Recognition (OAGM/AAPR) Workshop 2011 (OAGM2011) (pp. 1–8). Hodgins, J. K., O’Brien, J. F., & Tumblin, J. (1998). Perception of human mo- tion with different geometric models. Visualization and Computer Graphics, IEEE Transactions on, 4 , 307–316. Hogue, A., Gill, S., & Jenkin, M. (2007). Automated avatar creation for 3D games. In Future Play, Proceedings of the 2007 Conference on Future Play ’07 (pp. 174–180). New York, NY, USA: ACM. Hudson, S. E., & Smith, I. (1996). Techniques for addressing fundamental pri- vacy and disruption tradeoffs in awareness support systems. In Computer sup- ported cooperative work, Proceedings of the 1996 ACM conference on CSCW ’96 (pp. 248–257). New York, NY, USA: ACM. De la Hunty, M., Asthana, A., & Goecke, R. (2010). Linear facial expression transfer with active appearance models. In Pattern Recognition (ICPR), 2010 20th International Conference on (pp. 3789–3792). Igehy, H., & Pereira, L. (1997). Image replacement through texture synthesis. In Image Processing, Proceedings of the International Conference on (pp. 186–189). volume 3. Ito, I., & Kiya, H. (2009). One-time key based phase scrambling for phase- only correlation between visually protected images. EURASIP Journal on Information Security , 2009 , 1–11. doi:10.1155/2009/841045. Article ID: 2009:841045. Johansson, G. (1973). Visual perception of biological motion and a model for its analysis. Perception & Psychophysics , 14 , 201–211. Kitahara, I., Kogure, K., & Hagita, N. (2004). Stealth vision for protecting privacy. In Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on (pp. 404–407). volume 4. Kokaram, A., Morris, R., Fitzgerald, W., & Rayner, P. (1995). Interpolation of missing data in image sequences. Image Processing, IEEE Transactions on, 4 , 1509–1519. 40 http://ahprojects.com/projects/camoflash/ http://ahprojects.com/projects/camoflash/ http://dx.doi.org/10.1155/2009/841045 Koochari, A., & Soryani, M. (2010). Exemplar-based video inpainting with large patches. Journal of Zhejiang University SCIENCE C , 11 , 270–277. Korshunov, P., Araimo, C., De Simone, F., Velardo, C., Dugelay, J., & Ebrahimi, T. (2012). Subjective study of privacy filters in video surveil- lance. In Multimedia Signal Processing (MMSP), 2012 IEEE 14th Interna- tional Workshop on (pp. 378–382). Korshunov, P., & Ebrahimi, T. (2013). PEViD: privacy evaluation video dataset. In Applications of Digital Image Processing XXXVI, Proceedings of SPIE (pp. 1–9). volume 8856. doi:10.1117/12.2030974 Article ID: 88561S. Korshunov, P., & Ebrahimi, T. (2014). UHD video dataset for evaluation of privacy. In Sixth International Workshop on Quality of Multimedia Experience (QoMEX 2014). 
Koshimizu, T., Toriyama, T., & Babaguchi, N. (2006). Factors on the sense of privacy in video surveillance. In Continuous archival and retrieval of personal experiences, Proceedings of the 3rd ACM workshop on CARPE '06 (pp. 35–44). New York, NY, USA: ACM.
Lander, K., Bruce, V., & Hill, H. (2001). Evaluating the effectiveness of pixelation and blurring on masking the identity of familiar faces. Applied Cognitive Psychology, 15, 101–116.
Langheinrich, M. (2001). Privacy by design - principles of privacy-aware ubiquitous systems. In Ubiquitous Computing, Proceedings of the 3rd international conference on UbiComp '01 (pp. 273–291). Springer-Verlag.
Li, S., Zhao, Y., Qu, B., & Wang, J. (2013). Image scrambling based on chaotic sequences and Vigenère cipher. Multimedia Tools and Applications, 66, 573–588.
Lin, Y., Wang, S., Lin, Q., & Tang, F. (2012). Face swapping under large pose variations: A 3D model based approach. In Multimedia and Expo (ICME), 2012 IEEE International Conference on (pp. 333–338).
Lyons, M., Plante, A., Jehan, S., Inoue, S., & Akamatsu, S. (1998). Avatar creation using automatic face processing. In Multimedia, Proceedings of the sixth ACM international conference on (pp. 427–434). ACM.
Ma, W., & Ma, Q. (2011). An effective method for removing large object. In Image and Signal Processing (CISP), 2011 4th International Congress on (pp. 80–84). volume 1.
Macq, B., & Quisquater, J.-J. (1995). Cryptology for digital TV broadcasting. Proceedings of the IEEE, 83, 944–957.
Martínez-Ponte, I., Desurmont, X., Meessen, J., & Delaigle, J.-F. (2005). Robust human face hiding ensuring privacy. In Image Analysis for Multimedia Interactive Services (WIAMIS), Proceedings of the International Workshop on. volume 4.
Masnou, S., & Morel, J.-M. (1998). Level lines based disocclusion. In Image Processing, 1998. ICIP 98. Proceedings. 1998 International Conference on (pp. 259–263). volume 3.
Mitskog, T. F. R., & Ralston, R. A. (2012). Camera blocker for a device with an integrated camera that uses a thin film organic polymer.
Morris, M. E., Adair, B., Miller, K., Ozanne, E., Hansen, R., Pearce, A. J., Santamaria, N., Viegas, L., Long, M., & Said, C. M. (2013). Smart-home technologies to assist older people to live well at home. Journal of Aging Science, 1. doi:10.4172/2329-8847.1000101.
Morse, B., & Schwartzwald, D. (1998). Isophote-based interpolation. In Image Processing, 1998. ICIP 98. Proceedings. 1998 International Conference on (pp. 227–231). volume 3.
Neustaedter, C., & Greenberg, S. (2003). The design of a context-aware home media space for balancing privacy and awareness. In A. Dey, A. Schmidt, & J. McCarthy (Eds.), UbiComp 2003: Ubiquitous Computing (pp. 297–314). Springer Berlin Heidelberg, volume 2864 of Lecture Notes in Computer Science.
Neustaedter, C., Greenberg, S., & Boyle, M. (2006). Blur filtration fails to preserve privacy for home-based video conferencing. Computer-Human Interaction, ACM Transactions on, 13, 1–36.
Newton, E., Sweeney, L., & Malin, B. (2005). Preserving privacy by de-identifying face images. Knowledge and Data Engineering, IEEE Transactions on, 17, 232–243.
Ng, P. L., Ang, L.-M., & Seng, K. P. (2010). Privacy preserving stereoscopic vision with one-bit transform. In Computer Science and Information Technology (ICCSIT), 2010 3rd IEEE International Conference on (pp. 70–74). volume 9.
Ni, Z., Shi, Y.-Q., Ansari, N., & Su, W. (2006). Reversible data hiding. Circuits and Systems for Video Technology, IEEE Transactions on, 16, 354–362.
O'Brien, A., & Mac Ruairi, R. (2009). Survey of assistive technology devices and applications for aging in place. In Advances in Human-oriented and Personalized Mechanisms, Technologies, and Services, 2009. CENTRIC '09. Second International Conference on (pp. 7–12).
Olivieri, D. N., Conde, I. G., & Sobrino, X. A. V. (2012). Eigenspace-based fall detection and activity recognition from motion templates and machine learning. Expert Systems with Applications, 39, 5935–5945.
Pande, A., & Zambreno, J. (2013). Securing multimedia content using joint compression and encryption. In Embedded Multimedia Security Systems (pp. 23–30). Springer London.
Park, S., & Kautz, H. (2008). Privacy-preserving recognition of activities in daily living from multi-view silhouettes and RFID-based training. In AI in Eldercare: New Solutions to Old Problems, AAAI Fall Symposium on.
Paruchuri, J., & Cheung, S.-C. (2008). Joint optimization of data hiding and video compression. In Circuits and Systems, 2008. ISCAS 2008. IEEE International Symposium on (pp. 3021–3024).
Paruchuri, J. K., Cheung, S.-C. S., & Hail, M. W. (2009). Video data hiding for managing privacy information in surveillance systems. EURASIP Journal on Information Security, 2009, 1–18. doi:10.1155/2009/236139. Article ID: 2009:236139.
Patel, S. N., Summet, J. W., & Truong, K. N. (2009). Blindspot: Creating capture-resistant spaces. In A. Senior (Ed.), Protecting Privacy in Video Surveillance (pp. 185–201). Springer London.
Patwardhan, K., Sapiro, G., & Bertalmio, M. (2007). Video inpainting under constrained camera motion. Image Processing, IEEE Transactions on, 16, 545–553.
Petitcolas, F., Anderson, R., & Kuhn, M. (1999). Information hiding - a survey. Proceedings of the IEEE, 87, 1062–1078.
Qiao, L., Nahrstedt, K., et al. (1997). A new algorithm for MPEG video encryption. In Imaging Science, Systems and Technology (CISST'97), Proceedings of First International Conference on (pp. 21–29).
Qureshi, F. (2009). Object-video streams for preserving privacy in video surveillance. In Advanced Video and Signal Based Surveillance, 2009. AVSS '09. Sixth IEEE International Conference on (pp. 442–447).
Ra, M.-R., Govindan, R., & Ortega, A. (2013). P3: Toward privacy-preserving photo sharing. In Networked Systems Design and Implementation, Proceedings of the 10th USENIX conference on (pp. 515–528).
Raju, C., Umadevi, G., Srinathan, K., & Jawahar, C. (2008). Fast and secure real-time video encryption. In Computer Vision, Graphics & Image Processing, 2008. ICVGIP '08. Sixth Indian Conference on (pp. 257–264). IEEE.
Sadeghi, A.-R., Schneider, T., & Wehrenberg, I. (2010). Efficient privacy-preserving face recognition. In D. Lee, & S. Hong (Eds.), Information, Security and Cryptology - ICISC 2009 (pp. 229–244). Springer Berlin / Heidelberg, volume 5984 of Lecture Notes in Computer Science.
Sadimon, S., Sunar, M., Mohamad, D., & Haron, H. (2010). Computer generated caricature: A survey. In Cyberworlds (CW), 2010 International Conference on (pp. 383–390).
Saini, M., Atrey, P., Mehrotra, S., & Kankanhalli, M. (2014). W3-privacy: understanding what, when, and where inference channels in multi-camera surveillance video. Multimedia Tools and Applications, 68, 135–158.
Schaar, P. (2010). Privacy by design. Identity in the Information Society, 3, 267–274.
Schiff, J., Meingast, M., Mulligan, D. K., Sastry, S., & Goldberg, K. (2009). Respectful cameras: Detecting visual markers in real-time to address privacy concerns. In A. Senior (Ed.), Protecting Privacy in Video Surveillance (pp. 65–89). Springer London.
Senior, A. (2009). Privacy protection in a video surveillance system. In A. Senior (Ed.), Protecting Privacy in Video Surveillance (pp. 35–47). Springer London.
Senior, A., & Pankanti, S. (2011). Privacy protection and face recognition. In S. Z. Li, & A. K. Jain (Eds.), Handbook of Face Recognition (pp. 671–691). Springer London.
Senior, A., Pankanti, S., Hampapur, A., Brown, L., Tian, Y.-L., & Ekin, A. (2003). Blinkering Surveillance: Enabling Video Privacy through Computer Vision. Technical Report, IBM Research Division.
Senior, A., Pankanti, S., Hampapur, A., Brown, L., Tian, Y.-L., Ekin, A., Connell, J., Shu, C. F., & Lu, M. (2005). Enabling video privacy through computer vision. Security & Privacy, IEEE, 3, 50–57.
Shashank, J., Kowshik, P., Srinathan, K., & Jawahar, C. (2008). Private content based image retrieval. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on (pp. 1–8).
Shashanka, M. (2010). A privacy preserving framework for Gaussian mixture models. In Data Mining Workshops (ICDMW), 2010 IEEE International Conference on (pp. 499–506).
Shiratori, T., Matsushita, Y., Tang, X., & Kang, S. B. (2006). Video completion by motion field transfer. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on (pp. 411–418). volume 1.
Shoaib, M., Elbrandt, T., Dragon, R., & Ostermann, J. (2010). Altcare: Safe living for elderly people. In Pervasive Computing Technologies for Healthcare (PervasiveHealth), 2010 4th International Conference on (pp. 1–4).
Sohn, H., De Neve, W., & Ro, Y. M. (2011). Privacy protection in video surveillance systems: Analysis of subband-adaptive scrambling in JPEG XR. Circuits and Systems for Video Technology, IEEE Transactions on, 21, 170–177.
Sohn, H., Plataniotis, K., & Ro, Y. (2010). Privacy-preserving watch list screening in video surveillance system. In G. Qiu, K. Lam, H. Kiya, X.-Y. Xue, C.-C. Kuo, & M. Lew (Eds.), Advances in Multimedia Information Processing - PCM 2010 (pp. 622–632). Springer Berlin / Heidelberg, volume 6297 of Lecture Notes in Computer Science.
Spanos, G. A., & Maples, T. B. (1995). Performance study of a selective encryption scheme for the security of networked, real-time video. In Computer Communications and Networks, International Conference on.
Srinivasan, S., Tu, C., Regunathan, S., & Sullivan, G. (2007). HD photo: a new image coding technology for digital photography. In Applications of Digital Image Processing XXX, Proceedings of SPIE (pp. 1–19). volume 6696. doi:10.1117/12.767840. Article ID: 66960A.
Sweeney, L. (2002). k-anonymity: a model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10, 557–570.
Tang, L. (1996). Methods for encrypting and decrypting MPEG video data efficiently. In Multimedia, Proceedings of the fourth ACM international conference on (pp. 219–229). ACM.
Tansuriyavong, S., & Hanaki, S.-I. (2001). Privacy protection by concealing persons in circumstantial video image. In Perceptive user interfaces, Proceedings of the 2001 workshop on (pp. 1–4). ACM.
Tian, Y.-L., Brown, L., Hampapur, A., Lu, M., Senior, A., & Shu, C.-F. (2008). IBM smart surveillance system (S3): event based video surveillance system with an open and extensible framework. Machine Vision and Applications, 19, 315–327.
Tong, L., Dai, F., Zhang, Y., & Li, J. (2010). Prediction restricted H.264/AVC video scrambling for privacy protection. Electronics Letters, 46, 47–49.
Tong, L., Dai, F., Zhang, Y., Li, J., & Zhang, D. (2011). Compressive sensing based video scrambling for privacy protection. In Visual Communications and Image Processing (VCIP), 2011 IEEE (pp. 1–4).
Turk, M., & Pentland, A. (1991). Face recognition using eigenfaces. In Computer Vision and Pattern Recognition, 1991. Proceedings CVPR '91., IEEE Computer Society Conference on (pp. 586–591).
Upmanyu, M., Namboodiri, A., Srinathan, K., & Jawahar, C. (2009). Efficient privacy preserving video surveillance. In Computer Vision, 2009 IEEE 12th International Conference on (pp. 1639–1646).
Vijay Venkatesh, M., Cheung, S.-C. S., & Zhao, J. (2009). Efficient object-based video inpainting. Pattern Recognition Letters, 30, 168–179.
Wagstaff, J. (2004). Using bluetooth to disable camera phones. URL: http://www.loosewireblog.com/2004/09/using_bluetooth.html, last visited: 2014/05/08.
Wallace, E., & Diffley, C. (1988). CCTV control rooms ergonomics. Technical Report 14/98, Police Scientific Development Branch (PSDB), UK Home Office.
Wexler, Y., Shechtman, E., & Irani, M. (2007). Space-time completion of video. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 29, 463–476.
Whyte, O., Sivic, J., & Zisserman, A. (2009). Get out of my picture! internet-based inpainting. In Proceedings of the 20th British Machine Vision Conference, London.
Williams, A., Xie, D., Ou, S., Grupen, R., Hanson, A., & Riseman, E. (2006). Distributed Smart Cameras for Aging in Place. Technical Report, DTIC Document.
Winkler, T., & Rinner, B. (2010a). Securing embedded smart cameras with trusted computing. EURASIP Journal on Wireless Communications and Networking, 2011, 1–20. doi:10.1155/2011/530354. Article ID: 2011:530354.
Winkler, T., & Rinner, B. (2010b). TrustCAM: Security and privacy-protection for an embedded smart camera based on trusted computing. In Advanced Video and Signal Based Surveillance (AVSS), 2010 Seventh IEEE International Conference on (pp. 593–600).
Winkler, T., & Rinner, B. (2014). Security and privacy protection in visual sensor networks: A survey. ACM Computing Surveys, 47, 1–42.
Wu, M. (2001). Multimedia data hiding. Ph.D. thesis, Princeton University.
Xiangdong, L., Junxing, Z., Jinhai, Z., & Xiqin, H. (2008). Image scrambling algorithm based on chaos theory and sorting transformation. IJCSNS International Journal of Computer Science and Network Security, 8, 64–68.
Yabuta, K., Kitazawa, H., & Tanaka, T. (2005). A new concept of security camera monitoring with privacy protection by masking moving objects. In Y.-S. Ho, & H. Kim (Eds.), Advances in Multimedia Information Processing - PCM 2005 (pp. 831–842). Springer Berlin / Heidelberg, volume 3767 of Lecture Notes in Computer Science.
Yabuta, K., Kitazawa, H., & Tanaka, T. (2006). A new concept of real-time security camera monitoring with privacy protection by masking moving objects. In Proceedings of SPIE (pp. 188–199). volume 6063.
Yang, M., Bourbakis, N., & Li, S. (2004). Data-image-video encryption. Potentials, IEEE, 23, 28–34.
Yu, X., & Babaguchi, N. (2007). Privacy preserving: Hiding a face in a face. In Y. Yagi, S. Kang, I. Kweon, & H. Zha (Eds.), Computer Vision - ACCV 2007 (pp. 651–661). Springer Berlin / Heidelberg, volume 4844 of Lecture Notes in Computer Science.
Yu, X., Chinomi, K., Koshimizu, T., Nitta, N., Ito, Y., & Babaguchi, N. (2008). Privacy protecting visual processing for secure video surveillance. In Image Processing, 2008. ICIP 2008. 15th IEEE International Conference on (pp. 1672–1675).
Zeng, W., & Lei, S. (2003). Efficient frequency domain selective scrambling of digital video. Multimedia, IEEE Transactions on, 5, 118–129.
Zhang, C., Rui, Y., & He, L.-W. (2006). Light weight background blurring for video conferencing applications. In Image Processing, 2006 IEEE International Conference on (pp. 481–484).
Zhang, C., Tian, Y., & Capezuti, E. (2012). Privacy preserving automatic fall detection for elderly using RGBD cameras. In K. Miesenberger, A. Karshmer, P. Penaz, & W. Zagler (Eds.), Computers Helping People with Special Needs (pp. 625–633). Springer Berlin Heidelberg, volume 7382 of Lecture Notes in Computer Science.
Zhang, W., Cheung, S., & Chen, M. (2005a). Hiding privacy information in video surveillance system. In Image Processing, 2005. ICIP 2005, IEEE International Conference on (pp. II-868–71). volume 3.
Zhang, Y., Xiao, J., & Shah, M. (2005b). Motion layer based object removal in videos. In Application of Computer Vision, 2005. WACV/MOTIONS '05 Volume 1. Seventh IEEE Workshops on (pp. 516–521). volume 1.
Zhao, Q. A., & Stasko, J. T. (1998). The awareness-privacy tradeoff in video supported informal awareness: A study of image-filtering based techniques. Technical Report, Georgia Institute of Technology. GVU Technical Report GIT-GVU-98-16.
Zivkovic, Z. (2004). Improved adaptive Gaussian mixture model for background subtraction. In Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on (pp. 28–31). volume 2.