Using Multiple Views To Resolve Human Body Tracking Ambiguities

Alex Leykin, Florin Cutzu and Mihran Tuceryan
Computer Science Department
Indiana University
Bloomington, IN 47405-7104, USA

Abstract

This paper outlines the theoretical background of, and presents a new approach to, human body tracking with a monocular static camera. A novel "view-based representation" is introduced at the feature extraction stage. We show that ambiguities in correspondence, such as those that occur as a result of occlusion, can be resolved by using this approach. In particular, we store color information for each object in a vector of views, where the number of elements is determined online using unsupervised clustering followed by a cluster validity assessment. A tracking system was developed based on this representation. The preliminary results presented show the discriminative potential of the proposed system.

1 Introduction

The majority of modern tracking algorithms consist of three primary stages [10]. First, the objects of interest (the foreground) have to be separated from noise (the background). In the case of a static camera this can be done by creating a background model, either a priori or at run time, and subtracting this model from each frame of the tracking sequence. Second, blobs of distinct shape, potentially corresponding to the objects being tracked, have to be extracted from the resulting scene, and a descriptive feature vector has to be built for every blob. The third and last stage is concerned with matching each blob to an object (a human body in our case) over time and space.

In this work we concentrate on the second, representational stage. We approach it by treating the visual features that describe each moving blob as two distinct subsets. The first set, the view-independent features, is invariant to the position and orientation of the human body as well as to the illumination of the scene. The second set contains the view-based features, where each object is thought of as having multiple views; the tracker keeps the information about each view separately.

Having accumulated information about each view, the program can match the blob against each view of the model independently and choose the best match as the current view, as sketched below. This use of multiple views resolves ambiguities when matching the blobs in each frame to the human body objects in the system. In particular, the system recognizes humans correctly after they have been fully occluded by another moving person (Figure 3).

In the following section we discuss the representation and feature extraction techniques found in the tracking literature. After that, we outline the three stages of our tracking algorithm, with the emphasis on the multiple-view representation in Sections 3.2 and 3.3. Finally, we discuss the results obtained with our tracker, draw conclusions and outline several prospects for future work.
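The preview does not include an implementation of the view store, so the following is only a minimal sketch of the idea just described: each tracked object keeps a vector of views, a new blob is compared against every stored view, and the best-scoring view is taken as the current one. The ViewStore class, the hue-histogram descriptor, the histogram-intersection score and the match_threshold are our illustrative assumptions, not the authors' code.

```python
import numpy as np

class ViewStore:
    """Per-object vector of views; each view is a color histogram (assumed descriptor)."""

    def __init__(self, match_threshold=0.7):
        self.views = []                      # stored view descriptors for this object
        self.match_threshold = match_threshold

    @staticmethod
    def histogram(blob_pixels, bins=16):
        """Normalized hue histogram of a blob given as an (N, 3) HSV array with H in [0, 1]."""
        hist, _ = np.histogram(blob_pixels[:, 0], bins=bins, range=(0.0, 1.0))
        return hist / max(hist.sum(), 1)

    @staticmethod
    def similarity(h1, h2):
        """Histogram intersection: 1.0 means identical, 0.0 means disjoint."""
        return float(np.minimum(h1, h2).sum())

    def best_view(self, blob_pixels):
        """Match the blob against every stored view; return (index, score) of the best one."""
        h = self.histogram(blob_pixels)
        scores = [self.similarity(h, v) for v in self.views]
        if not scores:
            return None, 0.0
        best = int(np.argmax(scores))
        return best, scores[best]

    def update(self, blob_pixels):
        """Assign the blob to its best-matching view, or open a new view if none matches well."""
        best, score = self.best_view(blob_pixels)
        h = self.histogram(blob_pixels)
        if best is None or score < self.match_threshold:
            self.views.append(h)             # unseen appearance: grow the vector of views
        else:
            # running average keeps the matched view current
            self.views[best] = 0.5 * self.views[best] + 0.5 * h
```

In the paper the number of views is determined online by unsupervised clustering followed by a cluster validity assessment; the fixed match_threshold above is only a stand-in for that step.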
2 Related work

One form of view-based object representation in the computer vision literature is the eigenspace. In such a scheme, as an object undergoes affine transformations over time it is matched against a set of eigenviews. These schemes have been used primarily to track rigid objects [2], where the model for each object is acquired up front, during a learning stage. Although the eigenspace accumulates information about an object's color, shape and texture, it is rather difficult to use for real-time tracking, because these three properties can change independently while tracking. In particular, various scaling schemes have to be applied to bring the size of the object into correspondence with that of the eigenview. Given these constraints, eigenview modeling has been used successfully primarily for face tracking [4, 13].

More implementation-oriented works have used simpler feature extraction techniques. For example, in [8, 17] each object's color and position are recorded and no feature model is built. In complex tracking situations this kind of blending of the views can result in a number of ambiguities, for instance when all of the aforementioned features are identical for two objects undergoing a split event after an occlusion.

In [3, 1] the tracking feature is a distribution of colors represented by a color histogram, which is compared with the histogram of colors observed within the current region of interest. This region, of Gaussian shape, is most often obtained by some form of EM algorithm. The method proved to be very productive and tolerant to partial occlusions. It was demonstrated primarily on single moving objects (e.g. human faces) in less restricted environments (moving camera), but it was not subjected to rigorous testing on video sequences with multiple moving actors.

A dramatically different approach is to build a 3-D model of the object being tracked [16, 6]. This is a graphically and computationally intensive approach that requires more than one high-resolution camera. Therefore, most such algorithms operate in very limited environments with no occlusions. The human body is modeled by stick figures or combinations of blobs, where color does not play a crucial discriminatory role, since the authors are after recognizing human activity, not tracking.

3 Method

3.1 Background subtraction

To subtract the background we employ an adaptive, illumination-invariant method that operates in HSV space. To discriminate between the moving objects and the static background, we exploit the notions of chromaticity and brightness distortion to isolate highlights and shadows from the actual moving objects (see [7]).

3.1.1 Building the model

Each pixel $i$ of the background is modeled by a 4-tuple $\langle \mu_i, \sigma_i, \gamma_i, \beta_i \rangle$, where $\mu_i$ is the expected color value, $\sigma_i$ is the standard deviation of the color value, $\gamma_i$ is the variation of the brightness distortion, and $\beta_i$ is the variation of the chromaticity distortion of the $i$-th pixel. Let $I_i = [I_H(i), I_S(i), I_V(i)]$ be the $i$-th pixel in the current frame. If the color and brightness distortions are denoted by $C_i$ and $B_i$, then the model is given by the set of equations (1)-(6):

$$\mu_i = [\mu_H(i), \mu_S(i), \mu_V(i)] \quad (1)$$

$$\sigma_i = [\sigma_H(i), \sigma_S(i), \sigma_V(i)] \quad (2)$$

$$C_i = \frac{I_S(i)^2 + \mu_S(i)^2}{\sigma_S(i)^2} - \frac{2\, I_S(i)\, \mu_S(i)}{\sigma_S(i)^2} \cos\!\left(\frac{I_H(i) - \mu_H(i)}{\sigma_H(i)}\right) \quad (3)$$

$$B_i = \frac{|I_V(i) - \mu_V(i)|}{\sigma_V(i)} \quad (4)$$

$$\gamma_i = \frac{\sum_{n=1}^{N} C_n(i)}{N} \quad (5)$$

$$\beta_i = \frac{\sum_{n=1}^{N} B_n(i)}{N} \quad (6)$$

The variations of the color and brightness distortion in equations (5) and (6) are obtained as the averages of $C_i$ and $B_i$ over the $N$ background frames.
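Equations (1)-(6) translate directly into array operations. The sketch below is our illustration rather than the authors' code: it assumes the $N$ background frames are already converted to HSV as floating-point arrays (hue expressed as an angle), and it adds a small eps to the standard deviations so the divisions are well defined.

```python
import numpy as np

def color_distortion(frame, mu, sigma):
    """Eq. (3): law-of-cosines style distance between the (H, S) values of a frame and the model."""
    IH, IS = frame[..., 0], frame[..., 1]
    muH, muS = mu[..., 0], mu[..., 1]
    sigH, sigS = sigma[..., 0], sigma[..., 1]
    return (IS**2 + muS**2) / sigS**2 \
        - 2.0 * IS * muS / sigS**2 * np.cos((IH - muH) / sigH)

def brightness_distortion(frame, mu, sigma):
    """Eq. (4): normalized absolute deviation of the V channel from its expected value."""
    return np.abs(frame[..., 2] - mu[..., 2]) / sigma[..., 2]

def build_background_model(frames, eps=1e-6):
    """Estimate the per-pixel model <mu, sigma, gamma, beta> of eqs. (1)-(6).

    frames: array of shape (N, H, W, 3) holding N background frames in HSV.
    """
    frames = np.asarray(frames, dtype=np.float64)
    mu = frames.mean(axis=0)              # eq. (1): expected HSV value per pixel
    sigma = frames.std(axis=0) + eps      # eq. (2): per-channel standard deviation

    # Distortions of every training frame with respect to the model, eqs. (3)-(4)
    C = np.stack([color_distortion(f, mu, sigma) for f in frames])
    B = np.stack([brightness_distortion(f, mu, sigma) for f in frames])

    gamma = C.mean(axis=0)                # eq. (5): average color distortion
    beta = B.mean(axis=0)                 # eq. (6): average brightness distortion
    return mu, sigma, gamma, beta
```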

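The preview ends before the pixels are actually labeled, but the paragraph above states that the chromaticity and brightness distortions are used to separate highlights and shadows from the true foreground. The rule below is a hypothetical one in the spirit of [7], reusing color_distortion and brightness_distortion from the previous sketch; the thresholds tau_c, tau_b_low and tau_b_high are invented for illustration and do not come from the paper.

```python
import numpy as np

def classify_pixels(frame, mu, sigma, gamma, beta,
                    tau_c=3.0, tau_b_low=0.5, tau_b_high=2.0):
    """Hypothetical per-pixel labeling using the distortions of eqs. (3)-(4).

    Labels: 0 = background, 1 = foreground, 2 = shadow, 3 = highlight.
    color_distortion / brightness_distortion are the eq. (3)-(4) helpers defined above.
    """
    C = color_distortion(frame, mu, sigma) / np.maximum(gamma, 1e-6)        # normalize by eq. (5)
    B = brightness_distortion(frame, mu, sigma) / np.maximum(beta, 1e-6)    # normalize by eq. (6)

    labels = np.ones(C.shape, dtype=np.uint8)                 # default: moving foreground
    background = (C < tau_c) & (B < tau_b_low)                # matches the model in color and brightness
    shadow = (C < tau_c) & (frame[..., 2] < mu[..., 2]) & (B < tau_b_high)      # same color, darker
    highlight = (C < tau_c) & (frame[..., 2] >= mu[..., 2]) & (B < tau_b_high)  # same color, brighter
    labels[background] = 0
    labels[shadow & ~background] = 2
    labels[highlight & ~background] = 3
    return labels
```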
