A linear estimation method for 3D pose and facial animation tracking.

José Alonso Ybáñez Zepeda (E.N.S.T., 75014 Paris, France), Franck Davoine (CNRS, U.T.C., 60205 Compiègne cedex, France), Maurice Charbit (E.N.S.T., 75014 Paris, France)

Abstract

This paper presents an approach that incorporates Canonical Correlation Analysis (CCA) for monocular 3D face pose and facial animation estimation. CCA is used to find the dependency between texture residuals and the 3D face pose and facial gesture. The texture residuals are obtained from observed raw-brightness shape-free 2D image patches that we build by means of a parameterized 3D geometric face model. This method is used to correctly estimate the pose of the face and the model's animation parameters controlling the lip, eyebrow and eye movements (encoded in 15 parameters). Extensive experiments on tracking faces in long real video sequences show the effectiveness of the proposed method and the value of using CCA in the tracking context.

1. Introduction

Head pose and facial gesture estimation is a crucial task in several computer vision applications, such as video surveillance, human-computer interaction, biometrics, and vehicle automation. It is a challenging problem because of the variability of facial appearance within a video sequence. This variability is due to changes in head pose (particularly out-of-plane head rotations), facial expression, or lighting, to occlusions, or to a combination of all of them.

Different approaches exist for tracking moving objects, two of them being feature-based and model-based. Feature-based approaches rely on tracking local regions of interest, such as key points, curves, optical flow, or skin color [5, 10]. Model-based approaches use a 2D or 3D object model that is projected onto the image and matched to the object to be tracked [9, 7]. Both families establish a relationship between the current frame and the information they are looking for. Popular methods for finding this relation use a gradient-descent technique such as active appearance models (AAMs) [4, 15], a statistical technique based on support or relevance vector machines (SVMs and RVMs) [2, 14], or a regression technique based on Canonical Correlation Analysis (CCA), either linear or kernel-based. CCA is a statistical method that relates two sets of observations and is well suited for regression tasks. It has recently been used for appearance-based 3D pose estimation [11], appearance-based localization [12], and to improve the AAM search [6]. These works highlight the ability of CCA to deliver regression parameters that outperform standard methods in speed, memory requirements and accuracy (when the parameter space is not too small).

In this paper we present a model-based approach that incorporates CCA for monocular 3D face pose and facial animation estimation. The approach combines a parameterized 3D geometric face model with CCA in order to correctly track the facial gestures corresponding to the lip, eyebrow and eye movements, together with the 3D head pose, encoded in 15 parameters.

Although model-based methods and CCA are both well established in the computer vision domain, the two had not previously been combined in the tracking context. We will show experimentally, on public video sequences as well as our own, that our CCA-based approach yields a simple and effective facial pose and gesture tracker.
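To make the regression idea concrete, the following minimal sketch (illustrative only, not the authors' code) uses scikit-learn's CCA on synthetic data standing in for training pairs of texture residuals and state-vector perturbations; all dimensions and variable names below are assumptions.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Synthetic stand-ins for the training set: each row pairs a texture
# residual (X) with the state perturbation that produced it (Y).
rng = np.random.default_rng(0)
n_samples, n_pixels, n_params = 500, 256, 15
B_true = rng.normal(size=(n_pixels, n_params))      # hidden linear link
Y = rng.normal(size=(n_samples, n_params))          # perturbations of b
X = Y @ B_true.T + 0.01 * rng.normal(size=(n_samples, n_pixels))

# CCA finds paired directions in residual space and parameter space
# whose projections are maximally correlated, then regresses Y on X.
cca = CCA(n_components=10)   # number of canonical directions retained
cca.fit(X, Y)

# At tracking time, an observed residual is mapped to a predicted
# state increment, to be applied to the current estimate of b.
residual = X[:1]                   # stand-in for an observed residual
delta_b = cca.predict(residual)    # predicted 15-dim state update
print(delta_b.shape)               # (1, 15)
```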
2. Face representation

The use of a 3D generic face model for tracking purposes has been widely explored in the computer vision community. In this section we show how we use the Candide-3 face model to acquire the 3D geometry of a person's face and the corresponding texture map for tracking purposes.

2.1. 3D geometric model

The 3D parameterized face model Candide-3 [1] is controlled by Animation Units (AUs). The wireframe consists of a group of n interconnected 3D vertices describing a face as a set of triangles. The 3n-vector g is the concatenation of all the vertices and can be written in parametric form as

    g = g_s + A τ_a,    (1)

where the columns of A are face Animation Units and the vector τ_a contains 69 animation parameters [1] that control facial movements so that different expressions can be obtained. The term g_s = ḡ + Δg + S τ_s corresponds to the static geometry of a given person's face: ḡ is the standard shape of the Candide model, the columns of S are Shape Units, and the vector τ_s contains 14 shape parameters [1] used to reshape the wireframe to the most common head shapes. The vector Δg can be used, if necessary, to adapt the 3D model locally to non-symmetric faces by moving vertices individually. Δg, τ_s and τ_a are initialized manually, by fitting the Candide shape to the face facing the camera in the first video frame (see Figure 1).

Figure 1. (a) 3D Candide model aligned on the target face in the first video frame, with the 2D image patch mapped onto its surface (upper right corner) and three other semi-profile synthesized views (left side). (b), (c) and (d) Stabilized face images used for tracking the pose (SFI_1), the eyebrows and eyes (SFI_2), and the mouth (SFI_3), respectively.

The facial 3D pose and animation state vector b is then given by

    b = [θ_x, θ_y, θ_z, t_x, t_y, t_z, τ_a^T]^T,    (2)

where the θ and t components stand respectively for the model rotation about the three axes and its translation. In this work, the geometric model g(b) is used to crop out the underlying image patches from the video frames and to transform faces into a normalized facial shape for tracking purposes, as described in the next section. We limit the dimension of τ_a to 9, in order to track only the eyebrows, eyes and lips; in that case, the state vector b ∈ R^15.

2.2. Stabilized face image

We consider here a stabilized 2D shape-free image patch (also called a texture map) to represent the facial appearance of the person facing the camera and to represent observations from the incoming video frame Y. The patch is built by warping the raw-brightness image vector lying under the model g(b) onto a fixed-size 2D projection of the standard Candide model without any expression (i.e. with τ_a = 0). This patch, augmented with two semi-profile views of the face so as to track rotations over a wider range, is written as x = W(g(b), Y), where W is a warping operator (see Figure 1.b). We will see in Section 4 how other stabilized face images are used to represent and track the upper and lower facial features (Figures 1.c and 1.d).
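As a concrete illustration of Eq. (1), here is a minimal sketch with made-up dimensions; ḡ, S and A would in practice be loaded from the Candide-3 model definition, and the random stand-ins below are assumptions, not the actual model data.

```python
import numpy as np

# Hypothetical stand-ins for the Candide-3 data; in practice g_bar,
# S and A come from the model definition file.
n = 113                                  # number of 3D vertices (assumed)
rng = np.random.default_rng(1)
g_bar = rng.normal(size=3 * n)           # standard Candide shape (3n-vector)
S = rng.normal(size=(3 * n, 14))         # Shape Units: 14 shape parameters
A = rng.normal(size=(3 * n, 9))          # Animation Units, truncated to the
                                         # 9 AUs tracked (eyebrows, eyes, lips)

def candide_vertices(tau_s, tau_a, delta_g=None):
    """Eq. (1): g = g_s + A tau_a, with g_s = g_bar + delta_g + S tau_s."""
    if delta_g is None:
        delta_g = np.zeros_like(g_bar)
    g_s = g_bar + delta_g + S @ tau_s    # person-specific static geometry
    return g_s + A @ tau_a               # add the expression deformation

# State vector b = [theta_x, theta_y, theta_z, t_x, t_y, t_z, tau_a^T]^T
b = np.zeros(15)
tau_s = np.zeros(14)                     # fitted manually on the first frame
g = candide_vertices(tau_s, b[6:])       # deformed 3n-vector of vertices
print(g.reshape(n, 3).shape)             # (113, 3): one row per vertex
```

The rotation and translation components of b would then be applied to g before projection onto the image; only the 9 animation parameters enter Eq. (1) directly.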

