TAMU CSCE 483 - tracking-proposal

Facial Tracking and Animation
Project Proposal

Todd Belote
David Brown
Brad Busse
Bryan Harris

CPSC 483 – Spring 2004

TABLE OF CONTENTS

INTRODUCTION
    Problem Background
    Needs Statement
    Goal and Objectives
METHOD OF SOLUTION
    Literature and Technical Survey
    Design Constraints and Feasibility
    Evaluation of Alternative Solutions
    Statement of Work
        i. Proposed Design
        ii. Approach for Design Validation
        iii. Economic Analysis and Budget
        iv. Schedule of Tasks
    Project Management and Teamwork
    Societal, Safety and Environmental Analysis
    Appendices
        i. CV / Qualifications
        ii. Product Datasheets
        iii. Bibliography

INTRODUCTION

Problem Background

In 2001 Dr. Ricardo Gutierrez-Osuna and the PRISM (Pattern Recognition and Intelligent Sensor Machines) lab at Texas A&M University published a paper, Speech Driven Facial Animation [1], which discussed the state of research in facial animation generated from processed speech and presented a plan to extend that research. One action suggested by the paper was the development of a low-cost facial motion and speech processing system; commercial systems, such as the one sold by Vicon, cost on the order of $60,000. In response, TAMU graduate students Marco Zavala and Karl Jablonski developed a sub-$1000 system capable of tracking facial points while receiving audio synchronously. The purpose of the system is to provide information relating facial movements to specific characteristics of a speech waveform.

The system runs on a personal computer (Pentium IV 2.0 GHz, 512 MB RAM). The hardware consists of a Winnov Videum 1000 Plus audiovisual capture card capable of acquiring 640×480 video at 30 fps and audio at 16 kHz, an IBM PupilCAM, and Acoustic Magic's Voice Tracker microphone array. As the system acquires data it creates a processed audio file and an (X,Y) .fap file, which together can be fed into an MPEG-4-compliant Facial Animation Engine to verify the point tracking and audiovisual synchrony.

The current system accomplishes the desired goal at a basic level. The PupilCAM consists of a basic CCD camera with two arrays of infrared LEDs. The LEDs reflect off of markers placed on the face to accentuate the points. The reflected light passes through two installed filters that reduce the overall light level and yield more sharply defined points. Currently, when the system is started the points must be initialized manually.
This is accomplished by taking a snapshot of the video feed, converting it to black and white and inverting it, and having the user manually select the points to track. Once this is done, the system begins reading video and audio. As video is captured, each frame is analyzed to find the positions of the tracked points, which are written to an (X,Y) coordinate output file. Storing coordinates rather than raw video is done because capturing and storing the entire video stream is not feasible for long recordings, and is also unnecessary for the purpose of facial animation. As each new frame is captured, the points from the previous frame must be re-identified. This is accomplished by taking an 11×11 template around each point and matching it against a 21×21 square surrounding the point's previous location. The audio is read into a buffer representing 1 second, or 16,000 samples. These samples are then processed with two algorithms: Mel Frequency Cepstral Coefficients (MFCC), to model perceptual features of speech production, and Linear Prediction Coefficients (LPC), as an alternative model of certain characteristics of the waveform.

Needs Statement

While both the audio and video capture and processing of the current system work correctly, many improvements can be made, primarily in the areas of point initialization and tracking. The point tracking only works with minimal head movement, and a point lost when the head turns is unrecoverable. The initialization system requires that the user not move while the points are being selected, so that once all points are chosen they are still in the same place on screen. Both subsystems need to be more robust and more stable in the final system. In addition, the data acquisition synchrony needs to be verified, and the .fap file generation for the Facial Animation Engine (FAE) needs to be refined.
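The template-matching step described above (an 11×11 template searched within a 21×21 window around the point's previous location) can be sketched as follows. This is an illustrative reconstruction, not the system's actual code: the function name, the sum-of-squared-differences match score, and the tie-breaking are assumptions, since the proposal does not state which similarity measure the existing system uses.

```python
import numpy as np

def track_point(frame, prev_xy, template, search_radius=5):
    """Re-locate a tracked point in a new frame by template matching.

    Slides an 11x11 template (cut from the previous frame) over every
    position inside a 21x21 window centred on the point's previous
    location (search_radius=5 gives 11 + 2*5 = 21), scoring each
    candidate by sum of squared differences (SSD, an assumed metric).
    """
    t = template.shape[0] // 2              # template half-size: 5 for 11x11
    px, py = prev_xy
    best_score, best_xy = np.inf, prev_xy
    for dy in range(-search_radius, search_radius + 1):
        for dx in range(-search_radius, search_radius + 1):
            cx, cy = px + dx, py + dy
            patch = frame[cy - t:cy + t + 1, cx - t:cx + t + 1]
            if patch.shape != template.shape:      # window ran off the frame
                continue
            score = np.sum((patch.astype(float) - template.astype(float)) ** 2)
            if score < best_score:
                best_score, best_xy = score, (cx, cy)
    return best_xy
```

Because the search is confined to a ±5-pixel window, a point that moves farther than that between frames (e.g. during a head turn) falls outside the window and is lost, which matches the limitation described in the Needs Statement.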
Goal and Objectives

The goal of data acquisition in our system is to provide audio-visual data in a manner that facilitates real-time processing. The data shall be delivered to the processing functions in a robust and timely manner. The current data acquisition system will be modified, or replaced, to satisfy our new processing and synchronization requirements.

A further objective is to improve the current point initialization system by automating all or part of the process. Currently the points are selected manually from a still image. We propose to develop an algorithm which will automatically determine the location of either all the points, or all but a couple
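One plausible sketch of the proposed automatic initialization, given that the IR-illuminated markers show up as bright blobs against the face: threshold the grayscale snapshot and take the centroid of each connected bright region as a candidate point. The function name, the threshold value, and the choice of 4-connectivity are all illustrative assumptions, not part of the proposal.

```python
import numpy as np

def find_markers(gray, threshold=200):
    """Locate bright facial markers in a grayscale frame automatically.

    Thresholds the image, then flood-fills each connected bright blob
    (4-connectivity) and returns its centroid as an (X, Y) pair.
    """
    mask = gray >= threshold
    seen = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    centroids = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy, sx] or seen[sy, sx]:
                continue
            # flood-fill one blob, collecting its pixel coordinates
            stack, pixels = [(sy, sx)], []
            seen[sy, sx] = True
            while stack:
                y, x = stack.pop()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        stack.append((ny, nx))
            ys, xs = zip(*pixels)
            centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))  # (X, Y)
    return centroids
```

Detected centroids could then seed the tracker directly, removing the requirement that the user hold still while clicking each point by hand.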

