Feature Seeding for Action Recognition

This preview shows pages 1–3 of 8.


Pyry Matikainen    Rahul Sukthankar*    Martial Hebert
pmatikai@cs.cmu.edu    rahuls@cs.cmu.edu    hebert@ri.cmu.edu
The Robotics Institute, Carnegie Mellon University
(* R. Sukthankar is now at Google Research.)

Abstract

Progress in action recognition has been in large part due to advances in the features that drive learning-based methods. However, the relative sparsity of training data and the risk of overfitting have made it difficult to directly search for good features. In this paper we suggest using synthetic data to search for robust features that can more easily take advantage of limited data, rather than using the synthetic data directly as a substitute for real data. We demonstrate that the features discovered by our selection method, which we call seeding, improve performance on an action classification task on real data, even though the synthetic data from which the features are seeded differs significantly from the real data, both in terms of appearance and the set of action classes.

1. Introduction

A human researcher who designs a feature has an almost insurmountable advantage over a learning algorithm: they can appeal to an intuition built over thousands of hours of direct experience with the world to decide which parts of the visual experience are important to consider and which are noise. In contrast, an algorithm that attempts to select or learn features directly from a target dataset risks overfitting, especially if a large number of candidate features are considered. Intuitively, this problem might be avoided if a large amount of related data were used to learn the features; one promising method to produce such data is synthetic generation using computer graphics techniques. While graphics methods are not quite yet at the point where plausible synthetic images can be economically generated, in the special case of motion the widespread availability of mature motion capture technology has provided a wealth of resources from which synthetic videos of human motion can be produced.

We propose a technique that we refer to as feature seeding, in which synthetic data is used to select, or seed, features that are robust against a wide range of tasks and conditions. The actual model is learned entirely on real data; synthetic data has just guided the choice of underlying features. In this case we only need enough similarity that the same types of features are useful on both synthetic and real data. In video analysis for motion features, we suspect we can meet that requirement (see Fig. 1).

[Figure 1: Our method uses motion features from synthetic data (left) to seed features that are effective for real data (right), even though the two data sets share no common actions and are very different in terms of appearance.]

What we demonstrate is that one can leverage observations of human actions obtained from one source to classify actions observed from another, loosely related source, even if the two sets of actions differ. This transfer is possible because the two datasets are correlated: not necessarily in terms of specific actions, but because both depict humans performing visually distinctive movements.

In more concrete terms, many popular bag-of-visual-words (BoW) techniques rely on quantizing descriptors computed from video; generally, either simple unsupervised techniques such as k-means clustering [11, 15, 20, 24] or hand-crafted quantization strategies [18] are used. Our suggested seeding can be seen as employing synthetic data to drive the selection of the quantization method itself.

The basic organization of our method can be seen in Fig. 2. First, a set of synthetic video clips is generated using motion capture data. These clips are generated in groups, where each group is an independent binary classification problem. Next, raw motion descriptors are extracted from the synthetic data pool, in the form of trajectory snippets [18, 19] and histogram of optical flow (HOF) descriptors around space-time interest points (STIP) [15]. Note that we are not proposing a complete system for action recognition; we consider only motion features in a simplified recognition framework in order to isolate the effects of our feature seeding.

[Figure 2: System overview. A pool of randomly generated features (a) is filtered, or seeded, on synthetic data (b) to produce a greatly reduced number of features (e) that are likely to be informative. We extract descriptors, e.g. trajectories, on real data (c), and these descriptors are fed through the seeded feature set to produce label vectors q_i, one per descriptor. These label vectors are then accumulated into a histogram H that represents the video clip.]

Each clip produces many descriptors: trajectory descriptors produce on the order of 300 descriptors per frame of video, while STIP-HOF produces closer to 100 descriptors per frame. These descriptors are sampled to produce a candidate pool of features, where each feature is a radial basis function (RBF) classifier (we use the word "feature" in the same way as in boosting approaches [27]: a feature is an individual base classifier) whose support vectors are randomly drawn from the descriptors. Then the synthetic data is used to rank features based on their aggregate classification performance across many groups of synthetic data. We denote the highly ranked features selected in this way as the seeded features. The seeded features can then be applied to real data and used as input to conventional machine learning techniques. For evaluation, we consider the seeded features in a standard bag-of-words framework, using linear SVMs as classifiers.

2. Related work

Our proposed technique is related to both domain adaptation and feature selection, but targets a different level of information transfer than either. Domain adaptation techniques can be powerful across limited and well-characterized domains, such as in [14]. However, the gains are often modest, and as the aptly titled "frustratingly easy" domain adaptation work of Daumé [10] shows, even simple techniques can outperform sophisticated domain adaptation methods. Likewise, transfer learning methods such as transductive SVMs [8] can provide modest benefits, but are often computationally expensive and often restricted to datasets with shared classes.

In terms of feature selection, our method falls firmly into the "filtering" category of the taxonomy of Guyon and Elisseeff [13], in which features are selected without knowing the target classifier. The choice of a filtering method rather than a wrapper is
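The pipeline described above (a pool of random RBF candidate features, ranked by aggregate accuracy across independent synthetic binary tasks, with the top-ranked "seeded" features turning each clip's descriptors into a histogram) can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the random ±1 support-vector weights, the polarity handling during ranking, and the normalized per-feature histogram are simplifying choices, and all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_rbf_feature(descriptor_pool, n_support=5, gamma=1.0):
    """One candidate feature: an RBF classifier whose support vectors are
    randomly drawn from the descriptor pool (random +/-1 weights are an
    assumption made for this sketch)."""
    idx = rng.choice(len(descriptor_pool), size=n_support, replace=False)
    sv = descriptor_pool[idx]                    # (m, d) support vectors
    w = rng.choice([-1.0, 1.0], size=n_support)  # random weights
    def feature(x):
        # x: (n, d) descriptors -> (n,) binary labels
        d2 = ((x[:, None, :] - sv[None, :, :]) ** 2).sum(axis=-1)
        return (np.exp(-gamma * d2) @ w > 0).astype(int)
    return feature

def seed_features(candidates, synthetic_groups, k):
    """Rank candidates by mean binary-classification accuracy across the
    independent synthetic groups; keep the top k as the seeded features."""
    scores = []
    for f in candidates:
        accs = []
        for X, y in synthetic_groups:
            pred = f(X)
            # A feature's polarity is arbitrary, so score the better orientation.
            accs.append(max((pred == y).mean(), ((1 - pred) == y).mean()))
        scores.append(np.mean(accs))
    order = np.argsort(scores)[::-1]
    return [candidates[i] for i in order[:k]]

def clip_histogram(seeded, descriptors):
    """Feed each real-data descriptor through the seeded features to get a
    binary label vector, then accumulate the label vectors into a
    normalized histogram H representing the video clip."""
    labels = np.stack([f(descriptors) for f in seeded], axis=1)  # (n, k)
    return labels.sum(axis=0) / len(descriptors)

# Illustrative usage with random stand-ins for synthetic and real descriptors.
pool = rng.normal(size=(200, 10))                    # sampled synthetic descriptors
candidates = [make_rbf_feature(pool) for _ in range(50)]
groups = [(rng.normal(size=(40, 10)), rng.integers(0, 2, size=40))
          for _ in range(5)]                         # independent binary tasks
seeded = seed_features(candidates, groups, k=8)
H = clip_histogram(seeded, rng.normal(size=(300, 10)))  # one video clip -> H
```

In the paper's evaluation setting, per-clip histograms like H would then serve as bag-of-words representations fed to a linear SVM.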

