TAMU CSCE 689 - ezzat2002videorealisticSpeechAnimationSLIDES

Trainable Videorealistic Speech Animation
Tony Ezzat, Gadi Geiger, Tomaso Poggio @ MIT
Presented by: Yinan Fan
04-26-2007

News Coverage
- (July 23, 2002) REPORTS FROM SIGGRAPH-2002 - Wendy Ju: Character Animation
- (July 2, 2002) CNN: Video Research at MIT Puts Words into Mouths
- (June 30, 2002) ASSOCIATED PRESS - Theo Emery: Video Research at MIT Puts Words into Mouths, with Startling Results
- (June 17, 2002) THE DISCOVERY CHANNEL [Toronto, Canada] - Jennifer Scott: Science, Lies & Videotape (video)
- (May 28, 2002) DER SPIEGEL [Germany] - Marco Evers: Videomanipulation: Wie Bilder Luegen Lernen
- (May 20, 2002) NBC TODAY SHOW - Katie Couric (video, 100 Kbps / 300 Kbps)
- (May 20, 2002) MIT NEWS OFFICE: TECH TALK - Deborah Halber: Realistic Animation of Human Face Makes Simulated Talking Look Real
- (May 16, 2002) NPR - "All Things Considered" - Robert Siegel: MIT Video Lipsync (audio)
- (May 16, 2002) TORONTO GLOBE & MAIL - Graeme Smith: Computers Fake Moving Mouths
- (May 15, 2002) BOSTON GLOBE - Gareth Cook: At MIT, They Can Put Words in Our Mouths

Background
- Facial Modeling
  - 3D methods
  - Image-based methods: photorealistic? videorealistic? parsimonious?
  - Video Rewrite
- Speech Animation
  - Keyframe
  - Physics-based
  - Machine learning methods
- Problem: motion, smoothness, dynamics, coarticulation effects…

MMM?
- Well, in some sense… yes…
- corpus… preprocessed and sorted…
- principal component selected…
- relationship? graph?
- MM space?
- data analysis…
- some new stuff!

MMM… seriously
- Morphable Model Representation
- A low-dimensional space, parameterized by shape parameters α and appearance parameters β
- A "black box" capable of performing
  - Synthesis
  - Analysis

System Overview

Corpus
- A human subject uttering various utterances, with a neutral expression
- 640×480 video at 29.97 fps (NTSC), 44.1 kHz audio
- 15 minutes, 30000 frames
- 152 one-syllable words
- 156 two-syllable words
- 105 short sentences

Pre-processing
- Audio phonetically aligned (using the CMU Sphinx system)
- Each image normalized: head mask
- Planar perspective deformation
- Eye mask

Masks
- The only manual work

MMM: Definition
- A set of prototype images $\{I_i\}_{i=1}^{N}$
- A set of prototype flows $\{C_i\}_{i=1}^{N}$, with $C_i(p) = \{d_x^i(p), d_y^i(p)\}$
- Computed using a coarse-to-fine, gradient-based optical flow algorithm

Building MMM
- Task: choose image prototypes and compute correspondence

Building MMM
- EM-PCA
  - 15 PCA dimensions
  - Each image $I_j$ is mapped to PCA parameters $p_j$
- K-means clustering
  - Mahalanobis distance metric
  - N = 46: no explicit relationship to visemes
- Dijkstra
  - Corpus graph
  - K-nearest-neighbor frames (k = 20), edges weighted by Mahalanobis distance
  - Dijkstra shortest path => 46 correspondences

Synthesis
- Goal: map (α, β) to an image in MMM
- α: 46-dimensional -> mouth shape
- β: 46-dimensional -> mouth texture

Synthesis
- Steps:
  - Synthesize a new correspondence
  - Forward warp a…

Analysis
- Goal: project the entire recorded corpus onto the constructed MMM, and produce a time series of parameters (α, β) that represent trajectories of the original mouth motion
- Each utterance is analyzed with respect to the 92-dimensional MMM

Analysis
- Estimate parameter α
  - N image warps are synthesized
- Estimate β

Analysis Result

Trajectory Synthesis
- Goal: map from an input phone stream $\{p_t\}$ to a trajectory of parameters $y_t = (\alpha_t, \beta_t)$ in MMM space
- Phone stream? $\{p_t\}_{t=1}^{15}$ = {\w\, \w\, \w\, \w\, \uh\, \uh\, \uh\, \uh\, \uh\, \uh\, \n\, \n\, \n\, \n\, \n\} => word 'one'

Histogram

Trajectory Synthesis
- Mathematically a regularization problem
- Minimization

Training
- Adjust the means and variances to better reflect the training data

Training Result

Post-Processing

Result
- Demos
- Interviews: Discovery, NBC
- Another Example

Evaluations

Discussions
- Viewing conditions?
- 2D -> 3D
- Emotion
- Better video-realism
- Geodesic trajectory
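The corpus-graph step on the "Building MMM" slide (k-nearest-neighbor frames, Mahalanobis-weighted edges, Dijkstra shortest path) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `knn_graph` and `shortest_path` are hypothetical, the Mahalanobis distance is assumed to use a diagonal covariance (reasonable for PCA coordinates, which are uncorrelated), and the sketch only finds the chain of frames linking two prototypes. How the optical flows along that chain are combined into a correspondence is not shown.

```python
import heapq
import math


def knn_graph(points, inv_var, k):
    """Link each frame (a point in PCA space) to its k nearest neighbours.

    Edge weight is a Mahalanobis distance with an ASSUMED diagonal
    covariance; inv_var holds 1/variance for each PCA dimension.
    """
    def dist(a, b):
        return math.sqrt(sum(iv * (x - y) ** 2 for x, y, iv in zip(a, b, inv_var)))

    graph = {i: [] for i in range(len(points))}
    for i, p in enumerate(points):
        nearest = sorted((dist(p, q), j) for j, q in enumerate(points) if j != i)[:k]
        for d, j in nearest:
            graph[i].append((j, d))
            graph[j].append((i, d))  # keep the graph undirected
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns the list of frame indices or None."""
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > best.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < best.get(v, math.inf):
                best[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in best:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Toy usage: four "frames" on a line in a 1-D PCA space, k = 1.
frames = [(0.0,), (1.0,), (2.0,), (3.0,)]
graph = knn_graph(frames, inv_var=(1.0,), k=1)
print(shortest_path(graph, 0, 3))  # [0, 1, 2, 3]
```

With the slide's settings, `points` would hold the 15-dimensional PCA parameters of every corpus frame and `k` would be 20. The path is then searched from a reference prototype to each of the other prototypes.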
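The "Trajectory Synthesis" slides describe the mapping from a phone stream to a parameter trajectory as a regularization problem, but the equations themselves did not survive in this text. The sketch below therefore assumes a generic form for a single scalar parameter: a data term pulling each frame toward the mean of its phone (weighted by the inverse variance) plus a first-difference smoothness term weighted by `lam`. The name `synthesize_trajectory`, the toy means and variances, and the choice of a first-order smoothness operator are all assumptions for illustration, and no duration weighting is modeled.

```python
def synthesize_trajectory(phones, means, variances, lam=1.0):
    """Minimise  sum_t (y_t - mu_t)^2 / var_t  +  lam * sum_t (y_{t+1} - y_t)^2.

    phones:    list of phone labels, one per frame
    means:     dict phone -> target value (ASSUMED scalar for simplicity)
    variances: dict phone -> variance of that target
    """
    n = len(phones)
    mu = [means[p] for p in phones]
    w = [1.0 / variances[p] for p in phones]  # data-term weights

    # Normal equations: (diag(w) + lam * W^T W) y = diag(w) mu, where W is
    # the first-difference operator. The system matrix is tridiagonal.
    diag = [w[t] + lam * ((t > 0) + (t < n - 1)) for t in range(n)]
    off = [-lam] * (n - 1)
    rhs = [w[t] * mu[t] for t in range(n)]

    # Thomas algorithm: forward elimination, then back substitution.
    for t in range(1, n):
        m = off[t - 1] / diag[t - 1]
        diag[t] -= m * off[t - 1]
        rhs[t] -= m * rhs[t - 1]
    y = [0.0] * n
    y[-1] = rhs[-1] / diag[-1]
    for t in range(n - 2, -1, -1):
        y[t] = (rhs[t] - off[t] * y[t + 1]) / diag[t]
    return y


# Toy usage with the 15-frame phone stream for the word 'one'.
phones = ["w"] * 4 + ["uh"] * 6 + ["n"] * 5
means = {"w": 0.0, "uh": 1.0, "n": -0.5}      # made-up values
variances = {"w": 0.1, "uh": 0.2, "n": 0.1}   # made-up values
trajectory = synthesize_trajectory(phones, means, variances, lam=1.0)
print([round(v, 3) for v in trajectory])
```

Setting `lam` to 0 reproduces the piecewise-constant phone targets, while larger values blend neighboring phones into each other, which is one way to picture the coarticulation effects listed under "Problem". In this simplified picture, the "Training" slide's adjustment of means and variances would amount to tuning the `means` and `variances` tables against recorded trajectories.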

