UCI ICS 273A - Hamed Pirsiavash (hpirsiav), machine learning project report

Human Activity and Pose Recognition using Space-time Correlation

Hamed Pirsiavash, Deva Ramanan
Department of Computer Science
University of California, Irvine
Irvine, CA 92617
[email protected] [email protected]

Abstract

In this project we implement and use the space-time behavior-based correlation algorithm to recognize human activity and pose. The space-time correlation method measures the motion-based similarity between two distinct video clips. We use this algorithm in a nearest-neighbor-type classifier: given a new query, we find its best match in a database of labeled clips and recognize the activity from the known label of that match.

1 Introduction

Human activity and pose recognition is a fundamental problem in many applications such as visual surveillance, video summarization, and video indexing [3]. In a simple scenario, once we recognize the motion and pose of a human body, we can use them to index a video database and then perform efficient video search based on visual data alone. Likewise, video summarization usually looks for video segments that contain (or lack) particular types of motion, so these algorithms are useful in such applications as well.

Some previous works take a model-based approach: they use constraints from the kinematics and dynamics of the human body, model body parts such as joints explicitly, and estimate the model parameters from video data [4]. Other approaches estimate human body motion and pose from visual data without any model [3]; generally, they learn a function that maps high-dimensional visual data to the motion space. These methods perform well as initializations for the model-based approaches. In this project, we use a method closer to the second category.
We use the space-time behavior-based correlation method [1] to classify short sequences of human motion captured from different views. Section 2 explains the space-time correlation method; Section 3 describes our classifier, implementation, and results.

2 Space-time behavior-based correlation

Shechtman et al. introduced space-time behavior-based correlation as a novel method to measure the similarity between two video segments based on their motion [1]. The idea is to test the motion consistency between two space-time patches coming from two different video sequences. Some previous works compute the optical flow of the two patches and then correlate the flows, but these approaches suffer from difficulties such as the aperture problem and the high sensitivity of optical flow to noise. This method instead defines a similar correlation measurement while avoiding explicit computation of the optical flow.

They take small space-time patches, e.g., 7×7×3, from a video segment, compute the intensity gradient with respect to space and time at every pixel, and stack all the per-pixel gradients into an n×3 matrix G, so that

G (u, v, w)^T ≈ 0,

where (u, v, w) is the direction along which pixel intensities stay constant and n is the number of pixels in the patch. Figure 1 shows a patch and a (u, v, w) vector.

Figure 1. A space-time patch and the vector along which pixel intensities stay constant.

From G we form the 3×3 Gram matrix

M = G^T G.

Hence, for a patch with consistent motion, i.e., parallel equal-intensity lines, M is a 3×3 matrix whose null space is the direction of the equal-intensity lines. Therefore, computing the rank of M seems sufficient to test motion consistency within the patch. Note that to test the motion consistency between two patches, we can concatenate the two G matrices and compute the resulting M:

M12 = [G1; G2]^T [G1; G2] = M1 + M2.

If the motions of the two patches are consistent with each other, this new matrix M12 also has a null space in the same direction.
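The construction of G and M above can be sketched in a few lines of NumPy. This is a minimal sketch under our own assumptions: the function name, the patch layout (height, width, time), and the use of np.gradient for the space-time derivatives are our choices, not prescribed by [1].

```python
import numpy as np

def patch_gram_matrix(patch):
    """Gram matrix M = G^T G of space-time gradients for one patch.

    `patch` is a small space-time block of intensities, e.g. 7x7x3,
    laid out as (height, width, time). (Layout is our assumption.)
    """
    # Intensity gradients along y, x, and t (np.gradient follows axis order).
    gy, gx, gt = np.gradient(patch.astype(float))
    # Stack the n per-pixel gradients into an n x 3 matrix G = [Px Py Pt].
    G = np.stack([gx.ravel(), gy.ravel(), gt.ravel()], axis=1)
    # M = G^T G; for a patch with consistent motion, its null space is the
    # direction (u, v, w) along which intensity stays constant.
    return G.T @ G
```

As a sanity check, a static patch that is a pure intensity ramp along x yields an M whose only nonzero entry is the (x, x) one, so M is rank-deficient even without any motion.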
However, if the spatial pattern of the patch is edge-like, M loses rank regardless of the motion consistency of the patch, because it then has a null space perpendicular to the edge direction. To handle this degenerate case, they consider the upper-left 2×2 minor of M, denoted M♮, built from the purely spatial entries of M. We then compute the rank increase in going from this 2×2 matrix to the original 3×3 matrix M: if the rank increase is 0, the motion is consistent; if it is 1, there are multiple motions in the patch. They define a continuous rank-increase measure using the eigenvalues of the two matrices:

Δr = (λ1 · λ2 · λ3) / (λ1♮ · λ2♮),

where λ1 ≥ λ2 ≥ λ3 are the eigenvalues of M and λ1♮ ≥ λ2♮ are the eigenvalues of M♮. They then introduce an inconsistency measure for each pair of patches:

m12 = Δr12 / (min(Δr1, Δr2) + ε),

where the numerator is the rank increase for the concatenated gradient matrices and ε is added to eliminate any possible division by zero. By summing the inverse of this measure over all corresponding patches in two space-time windows taken from two video segments, we obtain a similarity measure for those two windows; by sliding the windows, we obtain a correlation volume between the two video sequences.

3 Implementation and results

In this project we implemented the method described above and used it to recognize human motion in short video sequences. First, we tested the correlation algorithm by searching for a query containing only one type of motion (walking) in a video segment containing multiple motions. The videos were downloaded from [5]; after computing the correlation, we superimposed a green ellipsoid on the original video at the peak value of the correlation. Figure 2 shows a sample frame of the output sequence. Input and output sequences can be downloaded from http://ics.uci.edu/~hpirsiav/machineLearning/report_video1.zip

Figure 2. Result of the correlation algorithm on a walking query and a video segment containing multiple motions. Left to right: original video, query, superimposed result.
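The rank-increase and inconsistency measures can be sketched as follows. This is a sketch assuming the Δr and m12 formulas as written in the text; the function names and the exact placement of the eps guard are our own choices, not taken from [1].

```python
import numpy as np

def rank_increase(M, eps=1e-10):
    """Continuous rank-increase measure: product of the three eigenvalues
    of the 3x3 matrix M over the product of the two eigenvalues of its
    upper-left 2x2 spatial minor (eps guards the division)."""
    lam = np.linalg.eigvalsh(M)                 # eigenvalues of M
    lam_minor = np.linalg.eigvalsh(M[:2, :2])   # eigenvalues of the minor
    return lam.prod() / (lam_minor.prod() + eps)

def inconsistency(M1, M2, eps=1e-10):
    """Motion inconsistency m12 between two patches: rank increase of the
    concatenated gradients (M12 = M1 + M2) over the smaller of the two
    individual rank increases, with eps avoiding division by zero."""
    return rank_increase(M1 + M2) / (min(rank_increase(M1),
                                         rank_increase(M2)) + eps)
```

For two rank-2 matrices with the same null-space direction, m12 stays near zero (consistent motion); when the null spaces disagree, M1 + M2 gains rank and m12 becomes large.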
For activity and pose recognition, we used the CMU MoBo database [2], which contains sequences of 25 subjects walking at four different speeds, captured from six different views. We chose six motion classes from the CMU MoBo database:

1. Side-view fast walk (fastWalk/vr03_7)
2. 45-degree-view fast walk (fastWalk/vr16_7)
3. Front-view fast walk (fastWalk/vr07_7)
4. Side-view slow walk (slowWalk/vr03_7)
5. 45-degree-view slow walk (slowWalk/vr16_7)
6. Front-view slow walk (slowWalk/vr07_7)

Some sample frames are shown in Figure 3. Then we
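The nearest-neighbor classification described in the abstract can be sketched as follows. This is a hypothetical sketch: the database is represented simply as a list of per-clip similarity scores to the query, and all names are our own, not from the report.

```python
import numpy as np

def classify_query(similarities, labels):
    """Assign the query the label of its best-matching database clip.

    `similarities[i]` is the space-time correlation similarity between
    the query clip and database clip i; `labels[i]` is that clip's
    known activity label (e.g., one of the six MoBo classes above).
    """
    best = int(np.argmax(similarities))
    return labels[best]
```

For example, with similarity scores [0.2, 0.9, 0.4] against clips labeled "side fast", "front fast", and "side slow", the query is labeled "front fast".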

