Video Face Replacement

Kevin Dale¹, Kalyan Sunkavalli¹, Micah K. Johnson², Daniel Vlasic³, Wojciech Matusik²,⁴, Hanspeter Pfister¹
¹Harvard University, ²MIT CSAIL, ³Lantos Technologies, ⁴Disney Research Zurich

[Figure 1 panels: (a) Source, (b) Target, (c) Aligned, (d) Three frames of the blended result]
Figure 1: Our method for face replacement requires only single-camera video of the source (a) and target (b) subject, which allows for simple acquisition and reuse of existing footage. We track both performances with a multilinear morphable model then spatially and temporally align the source face to the target footage (c). We then compute an optimal seam for gradient domain compositing that minimizes bleeding and flickering in the final result (d).

Abstract

We present a method for replacing facial performances in video. Our approach accounts for differences in identity, visual appearance, speech, and timing between source and target videos. Unlike prior work, it does not require substantial manual operation or complex acquisition hardware, only single-camera video. We use a 3D multilinear model to track the facial performance in both videos. Using the corresponding 3D geometry, we warp the source to the target face and retime the source to match the target performance. We then compute an optimal seam through the video volume that maintains temporal consistency in the final composite. We showcase the use of our method on a variety of examples and present the result of a user study that suggests our results are difficult to distinguish from real video footage.

CR Categories: I.4.3 [Image Processing and Computer Vision]: Enhancement - Filtering; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism - Animation

Keywords: face replacement, facial animation, video compositing

1 Introduction

Techniques for manipulating and replacing faces in photographs have matured to the point that realistic results can be obtained with minimal user input (e.g., [Agarwala et al. 2004; Bitouk et al. 2008; Sunkavalli et al. 2010]). Face replacement in video, however, poses significant challenges due to the complex facial geometry as well as our perceptual sensitivity to both the static and dynamic elements of faces. As a result, current systems require complex hardware and significant user intervention to achieve a sufficient level of realism (e.g., [Alexander et al. 2009]).

This paper presents a method for face replacement in video that achieves high-quality results using a simple acquisition process. Unlike previous work, our approach assumes inexpensive hardware and requires minimal user intervention. Using a single camera and simple illumination, we capture source video that will be inserted into a target video (Fig. 1). We track the face in both the source and target videos using a 3D multilinear model. Then we warp the source video in both space and time to align it to the target. Finally, we blend the videos by computing an optimal spatio-temporal seam and a novel mesh-centric gradient domain blending technique.

Our system replaces all or part of the face in the target video with that from the source video. Source and target can have the same person or two different subjects. They can contain similar performances or two very different performances. And either the source or the target can be existing (i.e., uncontrolled) footage, as long as the face poses (i.e., rotation and translation) are approximately the same. This leads to a handful of unique and useful scenarios in film and video editing where video face replacement can be applied.

For example, it is common for multiple takes of the same scene to be shot in close succession during a television or movie shoot. While the timing of performances across takes is very similar, subtle variations in the actor’s inflection or expression distinguish one take from the other.
Instead of choosing the single best take for the final cut, our system can combine, e.g., the mouth performance from one take and the eyes, brow, and expressions from another to produce a video montage.

A related scenario is dubbing, where the source and target subject are the same, and the source video depicts an actor in a studio recording a foreign language track for the target footage shot on location. The resulting video face replacement can be far superior to the common approach of replacing the audio track only. In contrast to multi-take video montage, the timing of the dubbing source is completely different and the target face is typically fully replaced, although partial replacement of just the mouth performance is possible, too.

Another useful scenario involves retargeting existing footage to produce a sequence that combines an existing backdrop with a new face or places an existing actor’s facial performance into new footage. Here the new footage is shot using the old footage as an audiovisual guide such that the timing of the performances roughly matches. Our video-based method is particularly suitable in this case because we have no control over the capture of the existing footage.

A final scenario is replacement, where the target facial performance is replaced with an arbitrary source performance by a different subject. This is useful, for example, when replacing a stunt actor’s face, captured in a dangerous environment, with the star actor’s face, recorded in a safe studio setting. In contrast to retargeting, where the source footage is shot using the target as an audiovisual guide to roughly match the timings, the performance of the source and target can be very different, similar to dubbing but with different subjects.

Furthermore, it is entertaining for amateurs to put faces of friends and family into popular movies or music videos.
Indeed, an active community of users on YouTube has formed to share such videos despite the current manual process of creating them (e.g., search for “Obama Dance Off”). Our video face replacement system would certainly benefit these users by dramatically simplifying the currently labor-intensive process of making these videos.

Video face replacement has advantages over replacing the entire body or the head in video. Full body replacement typically requires chroma key compositing (i.e., green screening) or rotoscoping to separate the body from the video. Head replacement is difficult due to the complexities of determining an appropriate matte in regions containing hair. Existing methods for both body and head replacement require expensive equipment, significant manual work, or both [Alexander et al.
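
To give some intuition for the gradient domain compositing mentioned in the abstract and in the pipeline overview, the following is a minimal one-dimensional sketch. It is not the mesh-centric blending technique or the optimal seam computation of this paper; it is the generic idea of gradient domain (Poisson-style) compositing applied to a single row of pixel intensities with a fixed, hand-picked seam. The function name `blend_1d` and its parameters are hypothetical and chosen only for illustration. Inside the replaced interval, the result keeps the second differences of the source while agreeing with the target at the two seam samples.

```python
def blend_1d(target, source, start, stop):
    """Replace target[start:stop] with values whose gradients follow the source.

    Illustrative assumption: the seam is fixed at indices start - 1 and stop,
    where the result is forced to equal the target (Dirichlet boundary).
    """
    if len(source) != len(target) or not (0 < start < stop < len(target)):
        raise ValueError("need equal lengths and 0 < start < stop < len(target)")

    n = stop - start
    # Right-hand side: discrete Laplacian of the source inside the region.
    rhs = [2.0 * source[i] - source[i - 1] - source[i + 1]
           for i in range(start, stop)]
    # Move the known boundary values (taken from the target) to the right side.
    rhs[0] += target[start - 1]
    rhs[-1] += target[stop]

    # Thomas algorithm for the tridiagonal system with diagonal 2
    # and off-diagonals -1.
    upper = [0.0] * n
    reduced = [0.0] * n
    upper[0] = -0.5
    reduced[0] = rhs[0] / 2.0
    for k in range(1, n):
        denom = 2.0 + upper[k - 1]
        upper[k] = -1.0 / denom
        reduced[k] = (rhs[k] + reduced[k - 1]) / denom

    # Back substitution.
    solution = [0.0] * n
    solution[-1] = reduced[-1]
    for k in range(n - 2, -1, -1):
        solution[k] = reduced[k] - upper[k] * solution[k + 1]

    return list(target[:start]) + solution + list(target[stop:])


if __name__ == "__main__":
    flat_target = [0.0] * 7
    bright_source = [5, 5, 6, 8, 6, 5, 5]
    print(blend_1d(flat_target, bright_source, 1, 6))
```

In this toy example the source is uniformly brighter than the target, and the blended row keeps the shape of the source bump while meeting the target at the seam. This is why gradient domain methods hide constant appearance differences between source and target. In video, the choice of seam additionally has to be consistent over time, which this sketch does not address.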

