UT CS 395T - Visual Categorization With Bags of Keypoints

Slide 1: Visual Categorization With Bags of Keypoints. G. Csurka, C. Bray, C. Dance, and L. Fan. ECCV, 2004.
Presented by Shilpa Gulati, 2/15/2007.

Slide 2: Basic Problem Addressed
- Find a method for Generic Visual Categorization.
- Visual Categorization: identifying whether objects of one or more types are present in an image.
- Generic: the method generalizes to new object types.
- Invariant to scale, rotation, affine transformation, lighting changes, occlusion, intra-class variations, etc.

Slide 3: Main Idea
- Apply the bag-of-keywords approach from text categorization to visual categorization.
- Construct a vocabulary of feature vectors from clustered descriptors of images.

Slide 4: The Approach I: Training
- Extract interest points from a dataset of training images and attach descriptors to them.
- Cluster the keypoints and construct a set of vocabularies (why a set? next slide).
- Train a multi-class classifier using bags-of-keypoints built around the cluster centers.

Slide 5: Why a set of vocabularies?
- The approach is motivated by text categorization (spam filtering, for example).
- For text, the keywords have a clear meaning ("Lottery!", "Deal!", "Affine Invariance"), so finding a vocabulary is easy.
- For images, keypoints don't necessarily have repeatable meanings. Hence find a set of vocabularies, then experiment to find the best vocabulary and classifier.

Slide 6: The Approach II: Testing
- Given a new image, get its keypoint descriptors.
- Label each keypoint with its closest cluster center in feature space.
- Categorize the objects using the multi-class classifier learnt earlier: Naïve Bayes or Support Vector Machines (SVMs).

Slide 7: Feature Extraction and Description
- From a database of images, extract interest points using the Harris affine detector. Mikolajczyk and Schmid (2002) showed that scale-invariant interest point detectors are not sufficient to handle affine transformations.
- Attach SIFT descriptors to the interest points. A SIFT descriptor is a 128-dimensional vector. SIFT descriptors were found to be best for matching in Mikolajczyk and Schmid (2003).

Slide 8: Visual Vocabulary Construction
- Use the k-means clustering algorithm to form a set of clusters of feature vectors.
- The feature vectors associated with the cluster centers (V1 .. Vm) form a vocabulary: V = {V1, V2, .., Vm}.
- Find multiple sets of clusters using different values of k, i.e. construct multiple vocabularies.
(Slide inspired by [3])

Slide 9: Clustering Example
[Figure: all features on the left, the resulting clusters on the right. Image taken from [2].]

Slide 10: Categorization by Naïve Bayes I: Training
- Extract keypoint descriptors from a set of labeled images.
- Put each descriptor in the cluster or "bag" with minimum distance from its cluster center: if a feature in image I is nearest to cluster center Vj, we say that keypoint j has occurred in image I.
- Count the number of keypoints in each bag: nij is the total number of times a feature "near" Vj occurs in training images of category i.
(Slide inspired by [3])

Slide 11: Categorization by Naïve Bayes II: Training
- For each category Ci:
  P(Ci) = (number of images of category Ci) / (total number of images)
- Over all training images of category Ci, for each keypoint Vj:
  P(Vj | Ci) = (number of keypoints labelled Vj) / (total number of keypoints) = nij / ni
- Use Laplace smoothing to avoid probabilities near zero:
  P(Vj | Ci) = (nij + 1) / (ni + |V|)
(Slide inspired by [3])

Slide 12: Categorization by Naïve Bayes III: Testing
  P(Ci | Image) = β P(Ci) P(Image | Ci)
                = β P(Ci) P(V0, V1, .., Vm | Ci)
                = β P(Ci) Π_{j=0..m} P(Vj | Ci)
(Slide inspired by [3])
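To make slides 8 through 12 concrete, here is a minimal sketch of the vocabulary construction, bag-of-keypoints counting, and Laplace-smoothed Naïve Bayes steps. It is not the authors' implementation: it assumes the SIFT descriptors (128-dimensional vectors per image) have already been extracted, assumes NumPy and scikit-learn are available, and the names build_vocabulary, bag_of_keypoints and NaiveBayesBoK are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(descriptor_sets, k):
    """Cluster all training descriptors; the k cluster centers V1..Vk form the vocabulary (slide 8)."""
    all_desc = np.vstack(descriptor_sets)          # stack per-image (n_i, 128) descriptor arrays
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(all_desc)

def bag_of_keypoints(kmeans, descriptors):
    """Assign each descriptor to its nearest cluster center and count occurrences per center (slide 10)."""
    labels = kmeans.predict(descriptors)
    return np.bincount(labels, minlength=kmeans.n_clusters)

class NaiveBayesBoK:
    """Naive Bayes over bag-of-keypoints counts with Laplace smoothing (slides 11-12)."""
    def fit(self, bags, labels):
        # bags: (n_images, k) count matrix; labels: (n_images,) category ids
        self.classes_ = np.unique(labels)
        k = bags.shape[1]
        self.log_prior_ = np.log(np.array([(labels == c).mean() for c in self.classes_]))
        self.log_likelihood_ = np.zeros((len(self.classes_), k))
        for idx, c in enumerate(self.classes_):
            n_ij = bags[labels == c].sum(axis=0)   # keypoint counts per cluster for category c
            n_i = n_ij.sum()                       # total keypoints seen for category c
            self.log_likelihood_[idx] = np.log((n_ij + 1.0) / (n_i + k))  # Laplace smoothing
        return self

    def predict(self, bags):
        # log P(Ci | image) is, up to a constant, log P(Ci) + sum_j count_j * log P(Vj | Ci) (slide 12)
        scores = self.log_prior_ + bags @ self.log_likelihood_.T
        return self.classes_[np.argmax(scores, axis=1)]
```

In use, one would build the vocabulary from the training descriptors, convert every training and test image into its bag-of-keypoints count vector with bag_of_keypoints, and then fit and apply NaiveBayesBoK on those vectors.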
Slide 13: SVM: Brief Introduction
- An SVM classifier finds a hyperplane that separates two-class data with maximum margin.
- The maximum margin hyperplane gives the greatest separation between the classes; f(x) is the target (classifying) function.
- The data instances closest to the hyperplane are called support vectors.
[Figure: two-class dataset with linearly separable classes, the maximum margin hyperplane, and the support vectors.]

Slide 14: Categorization by SVM I: Training
- The classifying function is f(x) = sign( Σi yi βi K(x, xi) + b ), where xi is a feature vector from the training images, yi is the label for xi (yes, in category Ci, or no, not in Ci), and βi and b have to be learnt.
- Data is not always linearly separable (non-linear SVM): a function Φ maps the original data space to a higher-dimensional space, and K(x, xi) = Φ(x) · Φ(xi).

Slide 15: Categorization by SVM II: Training
- For an image of category Ci, xi is the vector formed by the number of occurrences of each keypoint V in the image.
- The parameters are sometimes learnt using sequential quadratic programming; the approach used in the paper is not mentioned.
- For the m-class problem, the authors train m SVMs, each to distinguish some category Ci from the other m-1.

Slide 16: Categorization by SVM III: Testing
- Given a query image, assign it to the category with the highest SVM output.

Slide 17: Experiments
- Two databases.
- DB1: in-house, 1779 images, 7 object classes: faces, buildings, trees, cars, phones, bikes, and books. Some images contain objects from multiple classes, but the target object occupies a large proportion of the image.
- DB2: freely available from various sites, about 3500 images, 5 object classes: faces, airplanes, cars (rear), cars (side), and motorbikes (side).

Slide 18: Performance Metrics
- Confusion Matrix, M: mij = number of images from category j identified by the classifier as category i.
- Overall Error Rate, R: Accuracy = (total number of correctly classified test images) / (total number of test images); R = 1 - Accuracy.
- Mean Rank, MR: MR for category j = E[rank of class j in the classifier output | true class is j].

Slide 19: Finding the Value of k
- The error rate decreases with increasing k; the decrease is small after k > 1000.
- Choose k = 1000: a good tradeoff between accuracy and speed.
[Figure: error rate vs. k for Naïve Bayes on DB1, with the selected operating point marked. Graph taken from [2].]

Slide 20: Naïve Bayes Results for DB1
[Table: confusion matrix and mean ranks for Naïve Bayes on DB1 (true classes: faces, buildings, trees, cars, phones, bikes, books); the individual entries are not recoverable from this preview. Table taken from [2].]
- Overall error rate = 28%.

Slide 21: SVM Results
- The linear SVM gives the best results out of linear, quadratic and cubic kernels, except for cars; the quadratic kernel gives the best results on cars.
- How do we know these will work for other categories? What if we have to use higher degrees? Only time and more experiments will tell.

Slide 22: SVM Results
- Results for ...
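As a companion to the one-SVM-per-category scheme described in slides 14-16, here is a minimal sketch of training m binary SVMs on bag-of-keypoints vectors and assigning a query image to the category with the highest SVM output. The paper does not state how the SVM parameters were learnt, so this simply uses scikit-learn's SVC (assumed available); the function names are illustrative and the kernel is left as a parameter, since slide 21 reports that linear works best for most categories and quadratic for cars.

```python
import numpy as np
from sklearn.svm import SVC

def train_one_vs_rest_svms(bags, labels, kernel="linear"):
    """Train one SVM per category Ci to separate it from the other m-1 categories (slide 15)."""
    svms = {}
    for c in np.unique(labels):
        y = np.where(labels == c, 1, -1)   # +1: in category Ci, -1: not in Ci (slide 14)
        svms[c] = SVC(kernel=kernel).fit(bags, y)
    return svms

def predict_category(svms, bag):
    """Assign the query image's bag-of-keypoints vector to the category with the highest SVM output (slide 16)."""
    scores = {c: clf.decision_function(bag.reshape(1, -1))[0] for c, clf in svms.items()}
    return max(scores, key=scores.get)
```

The input vectors here are the same bag-of-keypoints count vectors used for Naïve Bayes, so the two classifiers can be compared on identical features, which is how the paper frames its experiments.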

