WKLiao_CVPR08

3D Face Tracking and Expression Inference from a 2D Sequence Using Manifold Learning

Wei-Kai Liao and Gerard Medioni
Computer Science Department
Institute for Robotics and Intelligent Systems
University of Southern California
Los Angeles, CA 90089-0273
{wliao, medioni}@usc.edu

Abstract

We propose a person-dependent, manifold-based approach for modeling and tracking rigid and nonrigid 3D facial deformations from a monocular video sequence. The rigid and nonrigid motions are analyzed simultaneously in 3D, by automatically fitting and tracking a set of landmarks. We do not represent all nonrigid facial deformations as a single complex manifold, but instead decompose them on a basis of eight 1D manifolds. Each 1D manifold is learned offline from sequences of labeled expressions, such as smile, surprise, etc. Any expression is then a linear combination of values along these 8 axes, with coefficients representing the level of activation. We experimentally verify that expressions can indeed be represented this way, and that individual manifolds are indeed 1D. The manifold dimensionality estimation, manifold learning, and manifold traversal operation are all implemented in the N-D Tensor Voting framework. Using simple local operations, this framework gives an estimate of the tangent and normal spaces at every sample, and provides excellent robustness to noise and outliers. The output of our system, besides the tracked landmarks in 3D, is a labeled annotation of the expression. We demonstrate results on a number of challenging sequences.

1. Introduction

Nonrigid deformation is an important property of human faces, as it conveys information about a human’s mental state.
Much research has been devoted to investigating deformable face models based on linear subspace analysis. The 2D Active Shape Model (ASM) and Active Appearance Model (AAM) [5, 6, 15] approximate the shape deformation as a linear combination of 2D basis shapes. The model is learned using Principal Component Analysis (PCA). The AAM inherits the idea of a deformable shape, but also learns an appearance model for texture variation. A 3D deformable model has also been proposed. In [22], Xiao et al. extended the 2D AAM to a combined 2D+3D AAM. In [2, 3], Blanz and Vetter built a 3D morphable model for facial animation and face recognition.

More recently, in [11], Gu and Kanade proposed a 3D deformable model consisting of a set of sparse 3D points and patches associated with each point. Based on this model, an EM-style algorithm is proposed to infer head pose and face shapes. In [23], Zhu and Ji proposed a normalized SVD to estimate the pose and expression. Based on this, a nonlinear optimization method is also proposed to improve the tracking result. Vogler et al. [21] proposed an integrated system that combines a 3D deformable model with a 2D ASM. The proposed system uses the ASM to track reliable features and the 3D deformable model to infer the face shape and pose from the tracked features.

In the above papers, the deformable model is built on top of the linear subspace approach. However, linear subspace methods are inadequate to represent the underlying structure of real data, and nonlinear manifold learning approaches have been proposed [19, 20]. Nonlinear dimensionality reduction techniques provide a good alternative for modeling high-dimensional visual data. In [4], Chang et al. proposed a probabilistic approach based on the appearance manifold for expression analysis.
In [9, 10], the author proposed a manifold-based approach for 3D body pose tracking and general tracking.

Most nonlinear manifold learning techniques characterize the intrinsic structure by recovering the low-dimensional embedding. For example, ISOMAP [20] finds a low-dimensional embedding that preserves geodesic distances in the input space. Locally-linear embedding (LLE) [19] searches for a manifold based on the local linearity principle. Recent works argue that this may not be the best way to parameterize the manifold, especially for the purpose of handling noisy data and out-of-sample generalization [1, 7, 8]. They propose different algorithms to estimate the local tangent hyperplane on the manifold, and use the estimated tangent to manipulate novel points. In addition, [18] proposed computational tools for statistical inference on a Riemannian manifold.

[Figure 1. Flow chart of proposed system. Stages: Offline, Online. Box labels as extracted: Training sequences; Detect and track 2D positions of landmark points; 3D embedding; Nonlinear manifold learning; Manifold-based facial deformation model; Test video; Detect and track 2D positions of landmark points; Estimate head pose; Infer 3D shapes, manifold coordinates, and expression; 3D face model; Head pose, 3D shape, and expression labels.]

Here, we propose a new framework to model the deformable shape using nonlinear manifolds. The main contribution is two-fold. First, instead of using a linear subspace analysis, we argue that 3D facial deformations are better modeled as a combination of several 1D manifolds. Each 1D manifold represents a mode of deformation or expression, such as smile, surprise, blinking, etc. By learning these manifolds, a 3D shape instance, usually represented by a very high-dimensional vector, can be mapped into a low-dimensional manifold. The coordinate on the manifold corresponds to the magnitude of facial deformation along that mode. We thus call it the “level of activation”.
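To make the notion of a coordinate on a 1D manifold concrete, the following minimal sketch assigns a scalar "level of activation" to samples on a curve. Everything in it is an illustrative assumption rather than the method of this paper: the training data is synthetic, the coordinate is taken to be normalized cumulative arc length, a novel point is mapped by a simple nearest-sample lookup, and the function names `arc_length_coordinates` and `activation_level` are hypothetical.

```python
import numpy as np

def arc_length_coordinates(samples):
    """Normalized cumulative arc length in [0, 1] for ordered samples on a curve."""
    steps = np.linalg.norm(np.diff(samples, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(steps)])
    return s / s[-1]

def activation_level(point, samples, coords):
    """Coordinate of the training sample nearest to `point` (assumed projection rule)."""
    nearest = np.argmin(np.linalg.norm(samples - point, axis=1))
    return coords[nearest]

# Synthetic stand-in for one expression mode: 50 ordered samples on a
# curved 1D path in a 6-D "shape space", from neutral (t=0) to peak (t=1).
t = np.linspace(0.0, 1.0, 50)
zeros = np.zeros_like(t)
samples = np.stack([t, t**2, np.sin(t), zeros, zeros, zeros], axis=1)
coords = arc_length_coordinates(samples)

# A slightly perturbed copy of sample 25 plays the role of a novel shape.
query = samples[25] + 1e-3
level = activation_level(query, samples, coords)
```

Here `level` is a single number between 0 and 1 that summarizes a 6-D point, which is the sense in which a high-dimensional shape vector reduces to one coordinate along a deformation mode.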
Second, we propose a novel framework of nonlinear manifold learning based on N-D Tensor Voting [16, 17]. Tensor Voting estimates the local normal and tangent spaces of the manifold at each point. The estimated tangent vectors enable us to directly navigate on the manifold.

The proposed 3D deformable shape model is applied to nonrigid face tracking. Based on the proposed model, we develop an algorithm that iteratively infers the nonrigid 3D facial deformations together with the head pose and expression. Without learning complex facial gesture dynamics, the proposed algorithm can track a rich representation of the face, including the 3D pose, 3D shape, expression label with probability, and the activation level. The flow chart of our proposed system is outlined in Figure 1.

The rest of this paper is organized as follows. We start with the offline construction of the manifold-based facial deformation model. The formulation and learned manifolds are presented in Section 2. The manifold learning and inference are implemented in the N-D Tensor Voting framework, presented in Section 3. Based on the proposed model and inference tool, we
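The idea of estimating a local tangent and using it to move along a manifold can be illustrated with a small sketch. Note that this is not Tensor Voting: as a stand-in, the tangent is estimated by local PCA over the nearest neighbors of a sample, and the data is a synthetic circle embedded in 3D. The function name `local_tangent`, the neighborhood size `k`, and the step length are all assumptions made for illustration.

```python
import numpy as np

def local_tangent(points, index, k=8):
    """Estimate the unit tangent of a 1D manifold at points[index] by local PCA."""
    dists = np.linalg.norm(points - points[index], axis=1)
    neighbors = points[np.argsort(dists)[:k + 1]]  # the point itself plus k neighbors
    centered = neighbors - neighbors.mean(axis=0)
    # The first right singular vector is the direction of largest local variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

# Synthetic 1D manifold: 200 samples on a unit circle lying in the z=0 plane.
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
circle = np.stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)], axis=1)

i = 10
tangent = local_tangent(circle, i)

# On a circle the true tangent is perpendicular to the radius.
alignment = abs(tangent @ circle[i])

# Navigate: take a small step along the tangent, then snap to the nearest sample.
# The sign of an SVD direction is arbitrary, so the step may go either way.
stepped = circle[i] + 0.05 * tangent
next_index = int(np.argmin(np.linalg.norm(circle - stepped, axis=1)))
```

The step leaves the manifold slightly, so the sketch snaps back to the nearest sample; a real traversal scheme would need its own projection step and a consistent orientation for the tangent.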

