key: cord-0037382-ym9j0ei0
authors: Tao, Ji; Tan, Yap-Peng
title: Probabilistic Reasoning for Closed-Room People Monitoring
date: 2005
journal: Intelligent Multimedia Processing with Soft Computing
DOI: 10.1007/3-540-32367-8_15
sha: 0218cd4a2d26f29da1590afd80c3511ad08241a8
doc_id: 37382
cord_uid: ym9j0ei0

In this chapter, we present a probabilistic reasoning approach to recognizing people entering and leaving a closed room by exploiting low-level visual features and high-level domain-specific knowledge. Specifically, people in the view of a monitoring camera are first detected and tracked so that their color and facial features can be extracted and analyzed. Then, recognition of people is carried out using a mapped feature similarity measure and exploiting the temporal correlation and constraints among each sequence of observations. The optimality of recognition is achieved in the sense of maximizing the joint posterior probability of the multiple observations. Experimental results of real and synthetic data are reported to show the effectiveness of the proposed approach.

With the increased concern for physical security in the face of global terrorism and outbreaks of infectious viruses, automated video surveillance for enhanced security in human living and work places has received unprecedented attention from industries, research institutes, and governmental agencies over the past few years [1, 2]. One main task of many video surveillance systems is to associate each person with an identity or to match the same person observed at different time instances. The results allow the derivation of such useful information as how long a person has stayed in a site, how many people are in the room during a certain period of concern, and who they are. Potential applications of such a system include, for example, understanding human activities in a monitored work place [8, 9], keeping aware of user identities in an intelligent room [3], and identifying who could possibly have been infected by a newly identified victim of Severe Acute Respiratory Syndrome (SARS) [10].

A number of related solutions have been proposed in the literature for people access control and monitoring. For example, biometrics have been increasingly used for identity recognition and have obtained satisfactory results. Representative work in biometric-based recognition ranges from fingerprint and iris identification to face and gait recognition [11-14, 16]. However, many of these methods require intrusive data collection, e.g., demanding human proactive action and collaboration during the course of identification, and thus work mainly in well-controlled environments. Although gait recognition can partly address this limitation by exploring human motion dynamics, the gait feature, by itself, has limited discriminating power and only works for people whose motion patterns have been well characterized and pre-stored in a database for matching.

Regardless of the type of features used, most existing approaches accomplish the recognition task based on some maximum likelihood classification rule [15], where a definite decision is made based on features observed at a single time instance/duration. The temporal correlation and constraints among the observations obtained over time, however, are seldom utilized even when they exist in some specific contexts; for example, in the case of closed-room monitoring, a person currently inside the room cannot enter the room again without first leaving it, and vice versa.

On the other hand, dynamic Bayesian networks (DBNs) [30, 31] are becoming popular in probabilistic inference due to their ability to incorporate various prior constraints and to deal with uncertainties in a systematic manner. In particular, hidden Markov models (HMMs), a specific form of DBNs, are well suited for modeling and identifying an event, represented as a state sequence, which can best explain a series of observations, for at least two reasons [24-26]. First, the topology among the hidden states, i.e., their interdependencies, can encode prior knowledge about how the event evolves. Second, the forward-backward and Viterbi algorithms [29], which are developed based on the HMM's lattice structure, allow one to evaluate the probabilities of different state sequences efficiently, and thus to identify the most likely state sequence.

In this chapter, we present a video-based system using probabilistic reasoning and based on the Viterbi algorithm for monitoring people entering and leaving a closed room (i.e., a room with only a single entrance/exit; e.g., a lab, class room, or meeting room). The system consists of two modules: a feature extraction module to detect/track people entering or leaving the only entrance/exit of the closed room and extract their low-level features for recognition in an unintrusive manner, and a people recognition module to match each observed person with a person who previously entered the room or to identify him/her as a new person unseen before. Figure 1 depicts the architecture of the proposed system. Rather than using only a single observation, we perform recognition by exploiting the temporal correlation and constraints among multiple people observations acquired at different time instances. Consequently, our method can effectively enhance the limited discriminating power of low-level features, such as color histograms and face features acquired using a camera from a distance. Experimental results demonstrate that the proposed system can achieve superior recognition accuracy as compared with existing systems using maximum likelihood approaches.

In this section, we briefly review preliminaries of HMM and the Viterbi algorithm, based on which our proposed system is constructed.

An HMM is specified by its state transition probabilities $A$, its observation probabilities $B$, and its initial state probabilities $\pi$, collectively denoted as the model $\lambda = \{A, B, \pi\}$. With these probabilities, three basic problems can generally be addressed with an HMM [27]: 1) evaluate $P[O \mid \lambda]$, the probability of an observation sequence $O$ given the model $\lambda$; 2) find the most likely state sequence given the model and an observation sequence; and 3) find the model $\lambda = \{A, B, \pi\}$ that maximizes $P[O \mid \lambda]$ for a given observation sequence. The first and third problems are known as model evaluation and training, respectively, and can be solved by the forward-backward algorithm and the Baum-Welch method [27]. Of relevance to our application is the second problem, in which the Viterbi algorithm plays an important role.

Consider a discrete-time dynamical system which is governed by a Markov chain and generates a sequence of observable outputs (observations) according to a number of hidden (unobservable) states. Our objective is to infer the most probable state sequence from the observation sequence. Straightforwardly, one can find the most probable state sequence by enumerating all possible state sequences and evaluating the probability of the observation sequence due to each possible state sequence. While viable, this exhaustive approach is computationally intensive even for a small number of states and observations. For example, with five observations (i.e., $T = 5$), the 3-state HMM shown in Fig. 2 will have $3^5 = 243$ possible state sequences.
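To make the cost of exhaustive enumeration concrete, the following is a minimal Python sketch that scores every state sequence of a discrete HMM and counts how many sequences it evaluates. The function name and the list-of-lists parameter layout are illustrative assumptions, not part of the original system.

```python
from itertools import product

def brute_force_best_sequence(pi, A, B, obs):
    """Enumerate all N**T state sequences and return the most probable one.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[j][k] : probability of observing symbol k in state j
    obs     : list of observation symbol indices (length T)
    """
    n_states = len(pi)
    best_seq, best_prob, n_evaluated = None, -1.0, 0
    for seq in product(range(n_states), repeat=len(obs)):
        # Joint probability of this state sequence and the observations.
        p = pi[seq[0]] * B[seq[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[seq[t - 1]][seq[t]] * B[seq[t]][obs[t]]
        n_evaluated += 1
        if p > best_prob:
            best_seq, best_prob = seq, p
    return best_seq, best_prob, n_evaluated
```

For a 3-state model and five observations the loop runs $3^5 = 243$ times, and the count grows exponentially with the number of observations.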

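Using the Viterbi algorithm, one can exploit the dynamic programming technique to simplify the computation substantially. To see this, let us first define the quantity

$$\delta_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} P[q_1 q_2 \cdots q_t = S_i,\; O_1 O_2 \cdots O_t \mid \lambda], \qquad (1)$$

which is the highest probability of a state sequence which accounts for the first $t$ observations and ends in state $S_i$ at time $t$. By induction, it is easy to show that

$$\delta_{t+1}(j) = \Big[\max_i \delta_t(i)\, a_{ij}\Big]\, b_j(O_{t+1}). \qquad (2)$$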

Hence, for each state $S_i$ at time $t$, one can find one state sequence ending in it and assuming the highest probability $\delta_t(i)$. We shall refer to this state sequence as the partial best state sequence. Once we have determined the hidden state corresponding to the observation obtained at time $t$, the uncertainties up to $t$ can be resolved. For an HMM with $N$ states, there are $N$ partial best state sequences due to each observation. At the end of the observation (i.e., time $T$), the Viterbi algorithm can find the most probable state sequence with probability $\max_i \delta_T(i)$. We shall refer to this state sequence as the best state sequence.

The procedure for finding the most probable state sequence follows the standard initialization, recursion, termination, and backtracking steps [27]. The array $\psi$ is introduced to keep track of the argument that maximizes (2) for each $t$ and $j$, indicating all the preceding states along each partial best state sequence. With $\psi$, one can retrieve the best state sequence of the whole process as well as the partial best state sequence ending at a given state at any time. The latter is particularly useful in our people monitoring application, as we shall show later.
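The recursion and the role of the backpointer array can be illustrated with a short Python sketch of the textbook Viterbi algorithm for a discrete HMM with fixed parameters. The function names, the NumPy array layout, and the 0-based time index are assumptions made for illustration; the time-variant model developed later in the chapter is not reproduced here.

```python
import numpy as np

def backtrack(psi, t, state):
    """Partial best state sequence ending in `state` at time index `t`."""
    path = [state]
    for k in range(t, 0, -1):
        state = int(psi[k, state])   # follow the stored best predecessor
        path.append(state)
    return path[::-1]

def viterbi(pi, A, B, obs):
    """Return (best_path, delta, psi) for a discrete HMM.

    pi  : (N,) initial state probabilities
    A   : (N, N) transition probabilities, A[i, j] = P[q_{t+1}=j | q_t=i]
    B   : (N, M) observation probabilities, B[j, k] = P[O_t=k | q_t=j]
    obs : sequence of observation indices of length T
    """
    pi, A, B = (np.asarray(x, dtype=float) for x in (pi, A, B))
    n_states, n_obs = len(pi), len(obs)
    delta = np.zeros((n_obs, n_states))           # delta[t, i]: best score ending in i
    psi = np.zeros((n_obs, n_states), dtype=int)  # psi[t, j]: best predecessor of j

    delta[0] = pi * B[:, obs[0]]                  # initialization
    for t in range(1, n_obs):                     # recursion
        scores = delta[t - 1][:, None] * A        # scores[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]

    last = int(delta[-1].argmax())                # termination
    return backtrack(psi, n_obs - 1, last), delta, psi
```

Calling `backtrack(psi, t, i)` for any intermediate `t` returns the partial best state sequence ending in state `i`, which is the quantity the text singles out as useful for people monitoring.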

From Fig. 3, we can see that the recursive computation of the Viterbi algorithm can be derived based on the temporal view structure of an HMM as shown in Fig. 2(b). The key to its efficiency is that since there are only $N$ possible states at each observation time, all the possible state sequences will end in these $N$ states no matter how long the observation sequences are. Apart from its tractability in computation, the Viterbi algorithm bears another important property: it does not make any maximum likelihood decision at each intermediate observation time, but obtains an overall decision by taking into account the whole sequence of observations. Ambiguity and/or misjudgment based on partial observations can be corrected later when more observations become available. This is well suited for our application, as it can exploit the temporal correlation among the observation sequence and recognize people based on multiple observations, reducing the chance of error due to making a hard decision based on features with low discriminating power.

The feature extraction module of our proposed system currently makes use of two types of low-level features as illustrated in Fig. 4: color histograms and low-resolution human faces. These features are described in detail below.

Color histogram is a popular color feature for content-based image and video analysis [20-22]. It is easy to compute and rather invariant to change in shape or size [19]. Our system detects and tracks each moving person as a foreground region and counts the color distribution of pixels within the region (i.e., color histogram) as the appearance feature of the person. The feature similarity of two observed people $c_i$ and $c_j$ can be measured by their color histogram intersection, defined as $\sum_{k=1}^{K} \min(H_i(k), H_j(k))$, where $H_i$ and $H_j$ are the normalized color histograms of the two people, respectively, and $K$ is the total number of color bins in the histogram. For the details of a color histogram based people tracking and recognition system, see our previous work [17, 18].
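The histogram intersection can be sketched in a few lines of Python. The RGB color space, the number of bins per channel, and the function names are assumptions chosen for illustration; the text does not state which color space or bin count the system uses.

```python
import numpy as np

def normalized_histogram(pixels, bins_per_channel=8):
    """Joint color histogram of an (n, 3) pixel array, normalized to sum to 1.

    `pixels` holds the foreground pixels of one person, with channel values
    in [0, 255]. The bin count is an illustrative assumption.
    """
    pixels = np.asarray(pixels, dtype=float)
    hist, _ = np.histogramdd(
        pixels, bins=(bins_per_channel,) * 3, range=[(0, 256)] * 3
    )
    return hist.ravel() / max(hist.sum(), 1.0)

def histogram_intersection(h_i, h_j):
    """Sum over bins of min(H_i(k), H_j(k)); lies in [0, 1] for normalized inputs."""
    return float(np.minimum(h_i, h_j).sum())
```

Two identical normalized histograms give an intersection of 1, and two histograms with no occupied bin in common give 0.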

For the facial feature, we make use of two functions provided in the Intel Open Source Computer Vision Library (OpenCV) [4], HaarFaceDetection and HMM-FaceRecognition, to automatically detect and model human faces in video sequences. The face detector was originally proposed by Viola [5] and further improved by Lienhart [6], while the embedded HMM (EHMM) face recognizer was developed by Nefian et al. [7]. It has been shown that the EHMM recognizer can exploit the natural structure of frontal faces and achieve outstanding performance. With a number of face images of the same person, say $c_i$, we train his/her EHMM feature using a set of observation vectors obtained from the corresponding 2D-DCT coefficients. The likelihood of an unknown face observation $c_j$ with respect to person $c_i$ can be calculated by a doubly embedded Viterbi algorithm [7].

Direct application of the color histogram and face similarity measures as defined above poses some potential problems. For example, the color histogram intersection of two different people is generally larger than zero because some of their appearance features, such as hair and skin, could share similar colors. On the other hand, the same person observed at different times may not have identical color histograms due to differences in lighting conditions, camera view angles, and segmentation results. The low-resolution face features are also subject to similar problems. Moreover, owing to successive multiplications of values less than one, the likelihood of faces calculated by [...]

[Figure panel: extracted face images for training]

Conceivably, the mapping function needs to have the following properties: 1) it should be non-decreasing; 2) it should approach 0 or 1 as $S(c_i, c_j)$ takes values near its lower or upper limit; and 3) the transition from low to high mapped values should take place where the value of $S(c_i, c_j)$ becomes evident to support that the two observed people are likely the same. After some subjective studies and comparisons, we select the sigmoid function [23] to perform the mapping. Its shape is determined by two parameters, $\alpha$ and $\beta$, with $\alpha$ controlling the steepness of the transition and $\beta$ defining the center of the transition. By experiments, we have determined the proper values of these two parameters for the similarity measures based on color histogram and face features, respectively.
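A sigmoid mapping with a steepness parameter and a center parameter can be sketched as below. The logistic form $1 / (1 + e^{-\alpha (S - \beta)})$ is an assumption that matches the stated roles of the two parameters; the exact expression and the parameter values used by the authors are not recoverable from the text, so the numbers in the usage note are purely hypothetical.

```python
import math

def map_similarity(s, alpha, beta):
    """Map a raw similarity `s` to (0, 1) with a logistic sigmoid.

    alpha : steepness of the transition (assumed positive)
    beta  : similarity value at which the mapped value equals 0.5
    """
    return 1.0 / (1.0 + math.exp(-alpha * (s - beta)))
```

With hypothetical values such as `alpha=20` and `beta=0.6`, a histogram intersection of 0.6 maps to exactly 0.5, while values well above or below 0.6 are pushed toward 1 or 0, satisfying the three properties listed above.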

It should be noted that many other features/attributes (e.g., fingerprint, iris pattern, voice, gait, etc.) for which a similarity measure is defined can be used in our proposed people monitoring system. To make the system less intrusive, we have only made use of color histogram and face features in the work reported here.

Our first attempt is to develop a suitable HMM for the recognition task in a closed room by making use of the Viterbi algorithm. However, the parameters of HMMs need to be pre-learned from a set of representative data, based on a fixed number of states. This is not applicable to our case, as the number of people as well as their activity patterns, i.e., the frequencies of entering and leaving the room, are generally different from place to place or time to time, and are hard to estimate from prior data.

To construct a framework that is well suited for the problem of our concern, we employ the lattice structure and parameter setting of HMMs, and formulate the problem of people recognition as follows. Assume that the closed room is empty when the system is first activated. When a person is entering the room, we append a new state to the state set (database) to represent his/her identity; when a person leaves at time $t$, he/she will be recorded as a new observation $O_t$. Thus the states in the state set at time $t$ represent the people who have possibly entered the room, and a state sequence $Q$ is associated with the observation sequence $O$ to maximize the joint posterior probability $P(Q, O \mid \lambda(t))$. In this way, each leaving person can be matched to one of those who are judged still inside the room. Figure 5 illustrates a recognition example of the proposed framework, where the optimal state sequence obtained is shown in bold lines. From this sequence, we can identify $O_1$ as $S_1$, $O_2$ as $S_3$, $O_3$ as $S_1$, etc. We can see that the framework has a lattice structure similar to HMMs; however, there are several key differences that distinguish our framework from conventional HMMs.

First, the parameter set $\lambda(t)$ is time variant and needs to be derived at each observation time instance based on all the previous possible states and current observations rather than from some training data. Second, the number of states in our model is not fixed but can increase over time before a decision is made. Third, the states can be indefinite because more than one state could be associated with the same person, e.g., both states $S_1$ and $S_4$ in Fig. 5 represent the person 'a'. It can be seen that when 'a' leaves again at $t = 3$, the framework recognizes him/her as $S_1$ rather than $S_4$, which is consistent with his/her identity (state) recognized at $t = 1$ in his/her first exit.

It should be noted that in our framework a state could represent the feature model or the identity of a person. For clarity, we shall use $s_i$ to denote a person's identity and $S_i$ his/her feature model.

From the problem formulation, it can be seen that the main task in constructing the proposed framework lies in the estimation of the time-variant parameter set $\lambda(t)$. Once $\lambda(t)$ is known, we can find the optimal path indicating the recognized people by a "turn-the-crank" procedure given by the Viterbi algorithm [28]. Our solutions are given as follows.

In analogy with the definition in HMMs, let $b_i(O_t)$ be the probability of observation $O_t$ generated by state $i$. We regard this probability as how likely an observed $O_t$ is due to person $s_i$, and simply approximate it by Eq. (7) using their feature similarity.

Before proceeding to the next step, we define a set of probabilities between observation times $t$ and $t+1$ as shown in Fig. 6. For conciseness, we shall refer to "observation time" as "time" hereafter when there is no confusion. In these definitions, $N_t$ is the number of states (people who possibly stay in the room) at time $t$, and $N_{t+1} - N_t$ is the number of people entering between times $t$ and $t+1$. With these auxiliary probabilities, we can now derive the transition probability using the following three steps.

i) Transition probability $a_{ij}(t)$:

The transition probability $a_{ij}(t)$ measures the odds that person $s_i$ leaves the room at time $t$ and person $s_j$ leaves at time $t+1$ (i.e., $q_t = s_i$ and $q_{t+1} = s_j$). Lacking other knowledge, it is reasonable to assume that the probability for $s_j$ to leave at time $t+1$ is proportional to his/her existence odds in the room at time $(t+1)^-$. Conceivably, a person cannot leave a room if he/she is not in the room at all. Hence, we compute the transition probability as proportional to the conditional existence odds $P[s_{j,(t+1)^-} = 1 \mid q_t = s_i]$. We compute this conditional probability with the aid of the likelihood matrix $M$ defined in Eq. (10). The calculation uses $\hat{m}_{ij}$, the entry of a modified matrix $\hat{M}$ which is derived from $M$ to indicate how likely a newly entered person is in fact an existing person re-entering the room under a given cond. We introduce the matrix $\hat{M}$ in order to incorporate the domain knowledge in the context of the cond, making our estimation more accurate.

In particular, the value $\sum_k \hat{m}_{jk}$ can be considered as the gain of the existence odds for person $s_j$ due to newly entered people between times $t$ and $t+1$.

This gain, along with the existence odds of person $s_j$ at time $t^+$, constitutes his/her existence odds at time $(t+1)^-$. For those newly added states $s_j$, where $N_t < j \le N_{t+1}$, the value $1 - \sum_i \hat{m}_{i,(j-N_t)}$ can therefore be considered as the odds for person $s_j$ being a new person. Note that in this framework we attempt to estimate these odds in an approximate manner, which has been found to be effective for improving the recognition rates.

Specifically, $\hat{m}_{ij}$ is derived from $m_{ij}$ in Eq. (15) by applying two indication factors followed by $\eta^{2D}$, a 2-D normalization operation which will be explained later. The first indication factor reflects that it is impossible for one to enter a closed room if he/she is currently inside, while $p(S_i)$ accounts for the fact that an existing person could not enter the room again if he/she has not been observed leaving the room. $S_k$ is the state for which the transition probability is to be calculated at time $t$, and $PT(S_k, t)$ is the partial best state sequence ending in state $S_k$ at time $t$, i.e., the most probable state sequence that is retrieved by array $\psi$ and includes all the people who have been observed leaving the room up to time $t$.

To illustrate the calculation in detail, we provide in the following a numerical example, in which five people enter and three leave a closed room, forming a process of three observations as shown in Fig. 7.

For the time pair $\{t, t+1\}$ shown in Fig. 7, suppose that the likelihood matrix $M$ has been obtained by Eq. (10). First, we incorporate the knowledge of the people status into the likelihood matrix $M$ by using the two indication factors. From Eq. (16), the first indication factor for $s_3$ is 0 and the third row of $M$ will be set to zero. This is because if $s_3$ is known to be inside the room, then neither $s_4$ nor $s_5$ can be $s_3$ regardless of their similarities. The second indication factor relies on the partial best state sequence ending in the state for which the transition probability is to be calculated. Assume that $S_1$ at time $t$ is the state and the partial best state sequence $PT(S_1, t)$ is as shown in Fig. 7. Since $S_3$ is not on this path, it means that person $s_3$ has not left the room since he/she entered. So neither $s_4$ nor $s_5$ could be $s_3$, and $p(S_3) = 0$ according to Eq. (17). With this domain knowledge, the original likelihood matrix $M$ is modified into the adjusted matrix $M'$. In this example, the two indication factors affect the same row of the likelihood matrix $M$. In general, the adjustment could be made on different rows depending on the cond as well as the state chosen for evaluating the transition probability.
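The row-zeroing step in this example can be sketched in Python as follows. The matrix layout (rows for existing people, columns for newly entered people), the two boolean flag vectors, and all numerical values are hypothetical stand-ins, since the matrix from the original example is not available; only the effect described in the text (a row set to zero when either indication factor is 0) is modeled.

```python
import numpy as np

def apply_indication_factors(M, known_inside, on_partial_best_path):
    """Zero the rows of M that either indication factor rules out.

    M[i, k]                 : likelihood that new entrant k is existing person i
    known_inside[i]         : True if person i is known to be inside (cannot re-enter)
    on_partial_best_path[i] : True if state i lies on the partial best state
                              sequence, i.e. person i has been observed leaving
    """
    M = np.array(M, dtype=float)
    keep = ~np.asarray(known_inside, dtype=bool) & np.asarray(
        on_partial_best_path, dtype=bool
    )
    return M * keep[:, None]

# Hypothetical likelihoods for s1..s3 (rows) against s4, s5 (columns).
M = [[0.2, 0.8], [0.6, 0.3], [0.5, 0.4]]
M_adjusted = apply_indication_factors(
    M, known_inside=[False, False, True], on_partial_best_path=[True, True, False]
)
```

Here both flags rule out the third person, so only the third row is zeroed, mirroring the situation described for $s_3$.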

Next, the likelihood matrix $M'$ is normalized so that the summations of the likelihoods corresponding to all possible circumstances are equal to one. To do this, we introduce a normalization operation, referred to as the correlated normalization operation and denoted as $\eta$, which works as follows. Consider a person $I$ and a group of candidates $C$ consisting of $N$ people, and define the similarity measure $w_n = P[C_n, I]$, $C_n \in C$. Under the constraint that at most one person (or none) of $C$ could be $I$, we can obtain a set of new similarity measures.

To make the probabilities corresponding to all possible circumstances sum up to one, the $\eta$ operation is defined to normalize these new similarity measures. This normalization operation aims to correlate likelihoods that are measured independently by imposing the above-mentioned constraint. It should be noted that this constraint can be applied to the adjusted likelihood matrix $M'$ both row-wise and column-wise. For instance, at most one of $s_4$ and $s_5$ (or none of them) could be $s_1$, while $s_4$ could be at most one of $s_1$, $s_2$ and $s_3$ (or none of them, i.e., $s_4$ is a new person) in the example shown in Fig. 7. For simplicity, we apply a 2-D $\eta$ operation ($\eta^{2D}$), a row-wise normalization followed by a column-wise normalization, to normalize the likelihood matrix $M'$ and obtain the normalized likelihood matrix $\hat{M}$. From $\hat{M}$ we can obtain useful information; for example, the existence odds of $s_1$ are increased by 0.05 + 0.72 = 0.77, the odds that $s_4$ is a new person are 1 − 0.05 − 0.40 − 0 = 0.55, etc. Consequently, the existence odds at time $(t+1)^-$ under the given cond can be obtained. As shown in Fig. 8, assume that person $s_1$ leaves the room at times $t$ and $t+1$, i.e., $i = 1$ and $j = 1$ (denoted by the shaded circles in Fig. 8). The resulting estimates are listed in Table 1.
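Because the defining equations of the correlated normalization did not survive in the text, the Python sketch below implements one plausible reading and should be treated as an assumption rather than the authors' formula: each candidate's new measure is its own likelihood times the probability that no other candidate matches, a "none of them" term is the product of all complements, and the results are divided by their total. The function names `eta` and `eta_2d` are illustrative.

```python
import numpy as np

def eta(w):
    """Correlated normalization under an 'at most one candidate matches' constraint.

    Assumed form: w'_n = w_n * prod_{m != n} (1 - w_m), w'_0 = prod_m (1 - w_m),
    output_n = w'_n / (w'_0 + sum_m w'_m).
    """
    w = np.asarray(w, dtype=float)
    none = np.prod(1.0 - w)                      # no candidate is the person
    new = np.array(
        [w[n] * np.prod(np.delete(1.0 - w, n)) for n in range(len(w))]
    )
    total = none + new.sum()
    # Degenerate input (several likelihoods equal to 1) leaves no valid outcome.
    return new / total if total > 0 else np.zeros_like(w)

def eta_2d(M):
    """Row-wise normalization followed by column-wise normalization."""
    rows = np.apply_along_axis(eta, 1, np.asarray(M, dtype=float))
    return np.apply_along_axis(eta, 0, rows)
```

Under this assumed form a zeroed row stays zero, no entry ever increases, and every row sum and column sum stays at or below one, so quantities such as "one minus the column sum" remain valid odds of being a new person.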

We first examine the gain of existence odds for each person from time $t^+$ to $(t+1)^-$: $\gamma(j) = P[s_{j,(t+1)^-} = 1 \mid q_t = s_1] - P[s_{j,t^+} = 1 \mid q_t = s_1]$. Clearly, $\sum_j \gamma(j) = 2$ because the gain in existence odds is due to the two newly entered persons. However, if we know for sure that $s_1$ leaves the room at time $t+1$, then he/she must be in the room at time $(t+1)^-$, and thus $\gamma(1)$ should be equal to 1 rather than 0.77. In other words, the gain for each person needs to be re-adjusted ($\tilde{\gamma}$ in the table) to incorporate the knowledge of this new assumption to ensure that $\tilde{\gamma}(1) = 1$. That $\tilde{\gamma}(1)$ is equal to one can also be explained as follows: since $s_1$ leaves at time $t$ and time $t+1$ successively, one of the entered persons $s_4$ and $s_5$ must be $s_1$. As there is no reason to favor any person other than $s_1$, our approach is to increase $s_1$'s existence odds from 0.77 to 1 and decrease the others' proportionally. This is sensible because once we know that the person who leaves the room at time $t+1$ is $s_1$, then $s_4$ and $s_5$ should be more likely to be $s_1$ than what we originally estimated, and consequently less likely to be other people. At time $(t+1)^+$, the existence odds of $s_1$ become zero due to his/her exit, and the others' can be obtained as the summation of their existence odds at time $t^+$ and the re-adjusted gain, i.e., $P[s_{j,t^+} = 1 \mid q_t = s_1]$ and $\tilde{\gamma}(j)$.

Table 1. Estimation of the existence odds
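The re-adjustment of gains can be illustrated with the small Python sketch below. It assumes that "decrease the others' proportionally" means scaling all other gains by one common factor so that the total gain still equals the number of newly entered people; this reading, the function name, and the value 0.10 used for the one matrix entry not quoted in the text are all assumptions.

```python
import numpy as np

def readjust_gains(gamma, leaver, target=1.0):
    """Force the leaver's gain to `target` and rescale the other gains.

    The other gains share one scale factor chosen so that the total gain
    is unchanged. Assumes at least one other gain is positive.
    """
    gamma = np.asarray(gamma, dtype=float)
    total = gamma.sum()
    others = total - gamma[leaver]
    adjusted = gamma * (total - target) / others
    adjusted[leaver] = target
    return adjusted

# Gains for s1..s5: 0.77 and 0.55 come from the text; the remaining values
# follow from a hypothetical entry of 0.10 for s2 versus s5.
gamma = [0.77, 0.40 + 0.10, 0.0, 0.55, 1.0 - 0.72 - 0.10]
gamma_tilde = readjust_gains(gamma, leaver=0)
```

In this sketch the gains still sum to 2 after re-adjustment, the gain of $s_1$ becomes exactly 1, and the ratios among the remaining gains are preserved.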

The above analysis and calculation can be summarized into general formulas.

It can be seen from these formulas that the estimation of existence odds is recursive; that is, one's existence odds at time $(t+1)^+$ depend on those at $t^+$. It should also be noted that although the above derivation may appear somewhat complex, it leads to an important underlying property of the proposed framework: the summation of the existence odds of all states at any time instance $t^{+/-}$ (i.e., $\sum_i P[s_{i,t^{+/-}} = 1 \mid q_t = s_j]$) for any $t$ and $j$ is always equal to the number of people who really stay in the room at that time.

Batch recognition of observations (people who have left the room) can be performed at any time when necessary by retrieving the state sequence with the maximum score of joint posterior probability as the best state sequence.

To use the proposed framework for recognizing people re-entering the room is equivalent to finding and merging those states associated with the same person in the best state sequence. This can be accomplished by the following local maximum likelihood scheme. Let $q_t = S_i$ and search backward in the best state sequence for $q_{t'} = S_i$, where $t' \in \{1, \ldots, t-1\}$ and $t - t'$ is minimized.

To test the proposed people monitoring system, we have captured two videos in a research laboratory using a low-cost PC camera monitoring the lab's only entrance. During a one-hour monitoring period, Video-I recorded eleven people who were unaware of the experiment, of whom four entered and left twice, another four entered and left only once, and three entered without leaving. Video-II simulated the process of people entering and leaving with the help of eight students, among whom seven entered and left the lab three times and another one two times. In Video-II, the volunteers were asked to approach the camera so that their faces could be recorded by the camera from a reasonable distance.

In our experiments described below, color histogram is tested on Video-I and Video-II, while face is tested on Video-II and the face database of Olivetti Research (400 images of 40 individuals, 10 images per individual at the resolution of 92 x 112 pixels). A synthetic process generator is also designed to randomly re-arrange the entries and exits from the two videos and synthesize processes from the face database according to the rule that one cannot enter unless he/she is outside the lab, and vice versa. This generator allows us to simulate a large number of combinations of entries and exits over time from the same group of people.
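The constraint enforced by such a generator can be sketched in Python as follows. This is a simplified stand-in that only produces a valid sequence of entry and exit labels starting from an empty room; it does not re-arrange actual video observations as the generator described above does, and the function name and event encoding are assumptions.

```python
import random

def synthesize_process(people, n_events, seed=None):
    """Generate a random entry/exit sequence that respects the closed-room rule.

    A person can enter ('I') only when outside and exit ('O') only when inside.
    The room is assumed to be empty at the start.
    """
    rng = random.Random(seed)
    inside = set()
    events = []
    for _ in range(n_events):
        person = rng.choice(people)
        if person in inside:
            inside.remove(person)
            events.append(("O", person))
        else:
            inside.add(person)
            events.append(("I", person))
    return events
```

For example, `synthesize_process([f"P{k}" for k in range(1, 9)], 44, seed=0)` yields a 44-event process over eight people in which every exit is preceded by a matching entry.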

For comparison, we also implement a recognition approach based on maximum likelihood (ML) classification [32, 33] as follows. When a person is detected entering a closed room, he/she is compared with people who are in the system's database and currently labeled as 'out'. If the maximum likelihood with respect to an 'out' person is larger than a threshold $T$, they are considered the same person. Then, the observed person is labeled as 'in' and his/her corresponding feature template in the database is updated with the latest one. Otherwise, the person is assumed to be new and then labeled as 'in' with a new identity label. When there are multiple exits without entries among them, the leaving people are recognized from the people with label 'in' by maximizing a joint likelihood.
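The entry-time decision of this baseline can be sketched in Python as below. Integer identity labels, the dictionary-based database, and the function name are illustrative assumptions; template updating and the joint-likelihood handling of multiple exits are omitted.

```python
def ml_entry_decision(similarities, status, threshold):
    """Thresholded maximum likelihood decision for a person entering the room.

    similarities : dict mapping known identity -> likelihood for the new observation
    status       : dict mapping known identity -> 'in' or 'out' (updated in place)
    threshold    : minimum likelihood needed to accept a match
    Returns the matched identity, or a newly created identity label.
    """
    candidates = {k: v for k, v in similarities.items() if status.get(k) == "out"}
    if candidates:
        best = max(candidates, key=candidates.get)
        if candidates[best] > threshold:
            status[best] = "in"          # recognized as a re-entering person
            return best
    new_id = max(status, default=0) + 1  # otherwise register a new identity
    status[new_id] = "in"
    return new_id
```

The sketch makes the dependence on the threshold explicit: a likelihood slightly below it creates a spurious new identity, and a likelihood slightly above it for the wrong person causes a false match.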

We now use an example of the synthesized entry/exit process to illustrate the superiority of our approach over the maximum likelihood approach. The process is obtained based on the eight people (P1-P8) recorded in Video-II as shown in Table 2, where 'I' and 'O' represent in (entry) and out (exit), respectively. A total of 44 entries and exits are observed, of which 22 are entries and the other 22 are exits. Note that in this example we take color histogram as the low-level feature.

Table 2. A sequence of entries and exits obtained from the synthetic generator using the eight people observed in Video-II

I I O I I O I O I I I O I I O
P5 P1 P8 P7 P3 P5 P2 P6 P4 P6 P8 P5 P8 P7 P2
(Cont'd)
O I O O I I O O I I I O O I I
P7 P2 P6 P5 P7 P4 P4 P4 P7 P1 P3 P8 P8 P5
O O O I I O I O O O O I O O

The recognition results of the eight people (at observation time) are given in Table 3. The results obtained by our proposed approach and the ML approach are compared against the ground truth. In this particular example, our approach achieves a 100% recognition rate, outperforming the ML approach. On the other hand, the ML approach wrongly recognizes P8 as P1 at time 7 and the other way around at time 19. Furthermore, P7 is incorrectly identified as P3 at time 18 and reversely at time 20. These recognition errors arise mainly because the similarity measures between these different people exceed the preset threshold $T$. On the contrary, the errors at times 10, 13 and 15, where P6 and P7 are wrongly identified as new people when they are just re-entering, arise because the similarities of their features observed at different times are lower than $T$. In other words, the ML approach is rather sensitive to the threshold $T$, inappropriate selection of which often results in false recognitions. In comparison, our proposed approach benefits from the threshold-free scheme; therefore, it is more robust to variations in feature extraction as well as changes in lighting conditions or view angles. Recall that at each observation time there is a partial best state sequence ending in each individual state with the probability score of $\delta_t(i)$. To make a recognition decision at time $t$ is thus to choose from all the partial best state sequences the one with the largest value of $\delta_t(i)$. Consequently, a confidence index can be defined in the range [0, 1] to evaluate the reliability of a decision that is made at each time:

$$\mathrm{conf}(t) = \frac{\max_i \delta_t(i)}{\sum_i \delta_t(i)}.$$

Fig. 9 shows the variation of the confidence index over time for the above example.

Fig. 9. The confidence index over observation time
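The confidence index, the largest partial-best-sequence score divided by the sum of all such scores at the same time, can be computed with a few lines of Python. The function name and the convention of returning 0 when all scores are zero are assumptions added for this sketch.

```python
import numpy as np

def confidence_index(delta_t):
    """Ratio of the largest score to the sum of all scores at one observation time.

    delta_t : scores of the partial best state sequences ending in each state.
    Returns a value in [0, 1]; 0.0 is returned if all scores are zero.
    """
    delta_t = np.asarray(delta_t, dtype=float)
    total = delta_t.sum()
    return float(delta_t.max() / total) if total > 0 else 0.0
```

With a single state the index is 1, and with several equally likely states it falls to one over the number of states, which matches the observation that confidence tends to drop as more states are added.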

In the early stage of monitoring, the system has only a few choices for making a decision (few people are observed); therefore, the confidence index is usually high. With the increase of the state number, i.e., more possible paths to choose from, the reliability of a decision may decrease. However, the advantage of our approach lies in that it is capable of maintaining the decision reliability at a later observation time by collectively considering all the available observations, a merit of the Viterbi algorithm as described in Sec. 2. In Fig. 9, when the confidence index is lower than 0.8, the ML approach is likely to make a wrong decision at the corresponding time (as indicated by the circles). Meanwhile, the proposed approach can still make the right choice since the probabilities (or scores) of the other paths are even lower.

It should be noted that the computational complexity of our framework increases rapidly when more states are generated. However, when the confidence index is sufficiently high, the total number of states can be reduced by making a definite decision to merge the states associated with the same person (e.g., at time 16 in Fig. 9). We consider this a promising topic for future investigation. Table 4 summarizes the recognition performance of the experimental results, where 20 synthesized processes of entries and exits are generated for each test data set and for each feature type considered. It is evident from the table that our proposed approach can notably improve the recognition accuracy as compared with that of the ML approach.

We have presented in this chapter a novel probabilistic reasoning framework for monitoring people in a closed room. Rather than identifying each single observation from a database, the framework is devised to recognize people based on multiple observations by exploiting the temporal correlations and constraints imposed by the application domain. In addition, the proposed framework permits its parameters to be estimated and updated at each observation instance by combining low-level features and domain-specific knowledge. Experimental results demonstrate that the proposed approach outperforms the existing maximum-likelihood approach when using the same features and being tested with the same test videos.

(1999) Security applications of computer vision

Object recognition and tracking for remote video surveillance. IEEE Trans. Circuits and Systems for Video Technology

Open source computer vision library reference manual

Robust real-time object detection

An extended set of Haar-like features for rapid object detection

(1999) An embedded HMM based approach for face detection and recognition. Proc.

A Bayesian computer vision system for modeling human interactions. IEEE Trans. Pattern Analysis and Machine Intelligence

Real time surveillance of people and their activities

Outbreak of Severe Acute Respiratory Syndrome - Worldwide

Evaluation of automated biometrics-based identification and verification systems

Human and machine recognition of faces: a survey. Proc.

Gait-based recognition of humans using continuous HMMs

Bayesian-based performance prediction for gait recognition

Pattern classification

Statistical motion model based on the change of feature relationships: human gait-based recognition. IEEE Trans. Pattern Analysis and Machine Intelligence

A color histogram based people tracking system

Proc. IEEE International Symposium on Circuits and Systems (ISCAS 2001)

Color appearance-based approach to robust tracking and recognition of multiple people. The Fourth International Conference on Information, Communications & Signal Processing and Fourth Pacific-Rim Conference on Multimedia

Color Indexing

An efficient color representation for image retrieval

Robust color histogram descriptors for video segment retrieval and identification

3D real-time head tracking using color histograms and stereovision

(1992) Why tanh: choosing a sigmoidal function. Proc.

Human action learning via hidden Markov model

Visual recognition of American sign language using hidden Markov models

Automated visual surveillance using hidden Markov models

A tutorial on hidden Markov models and selected applications in speech recognition

Comments on Computational Learning Model for Metrical Phonology

The Viterbi algorithm. Proc.

(1995) A tutorial on learning with Bayesian networks

A brief introduction to graphical models and Bayesian networks

A maximum-likelihood approach to visual event classification

Tracking groups of people. Computer Vision and Image Understanding

It should be noted that the proposed system can be readily extended to monitor multiple entrances or adjacent areas with the use of an array of cameras, all being interconnected and sharing the results obtained by each one's own analysis unit.