UT CS 395T - Fast Discriminative Visual Codebooks

Fast Discriminative Visual Codebooks using Randomized Clustering Forests
Frank Moosmann, Bill Triggs, and Frederic Jurie
Presented by: Andrew F. Dreher
CS 395T - Spring 2007

Contributions
1) Creating visual "words" using classification trees
2) Small ensembles of randomized trees can outperform k-means clustering
   - Using stochasticity to improve accuracy

Trees as "Words"

Visual "words"
1) High-dimensional vectors; typically extracted features, or clusters of features summarized at a point
2) Cluster formation is usually performed with k-means clustering
3) Used with "bag of words" methods derived from text processing

Trees as "words"
1) Trees are trained as classifiers
2) Leaves are used as "words"
   - Each leaf represents a classified cluster of visual features
   - Provides spatial information and intuition lacking in k-means
3) Classification is a separate stage (using an SVM) over the leaves

Information Gain with Entropy
1) Useful with a limited number of values
2) Often prefers "pure" nodes
   - Randomizing the thresholds helps create different splits and trees (a short code sketch of this step appears after the Specific Parameters slide)
3) Paper parameters Smin and Tmax
   - (Smin, Tmax) = (0, 1): completely random trees
   - (Smin, Tmax) = (1, D): highly discriminative trees (classic ID3)

Basic Example of Entropy
(A worked entropy example built up over several slides; the figures are not preserved in this preview.)

Experiments

General Overview
1) Descriptors - dataset dependent
   - HSV color (768-D vector)
   - Wavelet (768-D vector), created from HSV using the Haar transform
   - SIFT (128-D vector)
2) Performance metrics
   - Receiver Operating Characteristic (ROC)
   - Equal Error Rate (EER)

Haar Wavelet
1) First known wavelet
2) Neither continuous nor differentiable
3) Described as:
   f(x) = 1 for 0 ≤ x < ½; −1 for ½ ≤ x < 1; 0 otherwise
   Source: Wikipedia (http://en.wikipedia.org/wiki/Haar_wavelet)

Specific Parameters
1) Descriptor: color wavelet
2) Tree parameters: Smin = 0.5; Tmax ≈ 50
3) Dataset: GRAZ-02
   - Three categories
   - 300 images from each category
   - ½ for training; ½ for testing
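The split rule behind the Smin/Tmax parameters can be made concrete with a short sketch. This is not the authors' code: it assumes a dense feature matrix X with integer class labels y, and it uses information gain normalized by the parent entropy as a simple stand-in for the paper's normalized score; the names entropy, split_score, choose_split, s_min, and t_max are illustrative.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def split_score(labels, left_mask):
    """Information gain of a candidate split, normalized by the parent entropy
    so that a perfectly 'pure' split scores close to 1."""
    n = len(labels)
    n_left = int(left_mask.sum())
    if n_left == 0 or n_left == n:
        return 0.0                      # degenerate split: everything on one side
    h_parent = entropy(labels)
    if h_parent == 0.0:
        return 0.0                      # node is already pure
    h_left = entropy(labels[left_mask])
    h_right = entropy(labels[~left_mask])
    gain = h_parent - (n_left / n) * h_left - ((n - n_left) / n) * h_right
    return gain / h_parent

def choose_split(X, y, s_min=0.5, t_max=50, rng=None):
    """Try up to t_max random (dimension, threshold) tests and keep the first one
    whose score reaches s_min; otherwise return the best test seen."""
    rng = np.random.default_rng(0) if rng is None else rng
    best_dim, best_thr, best_score = None, None, -1.0
    for _ in range(t_max):
        d = int(rng.integers(X.shape[1]))               # random feature dimension
        t = rng.uniform(X[:, d].min(), X[:, d].max())   # random threshold in range
        score = split_score(y, X[:, d] < t)
        if score > best_score:
            best_dim, best_thr, best_score = d, t, score
        if score >= s_min:                              # good enough: stop trying
            break
    return best_dim, best_thr, best_score
```

With (s_min, t_max) = (0, 1) the first random test is always accepted, giving completely random trees; pushing s_min toward 1 and t_max toward the feature dimension makes the search increasingly discriminative, as on the slide.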
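For the wavelet slide, a minimal sketch of the Haar pieces involved: the mother wavelet defined above and one level of the discrete Haar transform (pairwise averages and differences). How the paper assembles its 768-D color wavelet descriptor from HSV values is not spelled out in this preview, so haar_step is only an illustrative building block, not the paper's exact recipe.

```python
import numpy as np

def haar_mother(x):
    """Haar mother wavelet: 1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.where((x >= 0.0) & (x < 0.5), 1.0,
                    np.where((x >= 0.5) & (x < 1.0), -1.0, 0.0))

def haar_step(signal):
    """One level of the discrete Haar transform on a 1-D signal of even length:
    pairwise averages (low-pass half) followed by pairwise differences (high-pass half)."""
    s = np.asarray(signal, dtype=float)
    assert len(s) % 2 == 0, "signal length must be even"
    return np.concatenate([(s[0::2] + s[1::2]) / 2.0,
                           (s[0::2] - s[1::2]) / 2.0])
```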
Spatial Results

Excerpt from the paper (figures not reproduced in this preview):

Figure 3: 'Bike' visual words for 4 different images. The brightness denotes the posterior probability for the visual word at the given image position to be labelled 'bike'.

... is chosen using a validation set. For the 768-D Color Wavelet Descriptor on the GRAZ-02 dataset, Tmax ≈ 50.

Our algorithm's ability to produce meaningful visual words is illustrated in figure 3 (c.f. [16]). Each white dot corresponds to the center of an image sub-window that reached an unmixed leaf node for the given object category (i.e. all of the training vectors belonging to the leaf are labeled with that category). Note that even though they have been learned on entire images without object segmentation, the visual vocabulary is discriminative enough to detect local structures in the test images that correspond well with representative object fragments, as illustrated in figure 2 (right).

The tests here were for individual object categories versus negatives (N). We took 300 images from each category, using images with even numbers for training and ones with odd numbers for testing. For Setting 1 tests we trained on the whole image as in [19], while for Setting 2 ones we used the segmentation masks provided with the images to train on the objects alone without background.

For the GRAZ-02 database the wavelet descriptors gave the best performance. We report results for these on the two hardest categories, bikes and cars. For B vs. N we achieve 84.4% average EER classification rate for setting 1 and 84.1% for setting 2, in comparison to 76.5% from Opelt et al. [19]. For C vs. N the respective figures are 79.9%, 79.8% and 70.7%. Remarkably, using segmentation masks during training does not improve the image classification performance. This suggests that the method is able to pick out the relevant information from a significant amount of clutter.

Comparing ERC-Forests with k-means and kd-clustering trees. Unless otherwise stated, 20,000 features (67 per image) were used to learn 1000 spatial bins per tree for 5 trees, and 8000 patches were sampled per image to build the resulting 5000-D histograms. The histograms are binarized using trivial thresholding at count 1 before being fed to the global linear SVM image classifier. We also tested with histograms normalized to total sum 1, and with thresholding by maximizing the mutual information of each dimension, but neither yielded better results for ERC-Forests.

Fig. 4 gives some quantitative results on the bikes category (B vs. N). Fig. 4(a) shows the clear difference between our method and classical k-means for vocabulary construction. Note that we were not able to extend the k-means curve beyond 20,000 windows per image owing to prohibitive execution times. The figure also shows results for 'unsupervised trees' - ERC-Forests built without using the class labels during tree construction. The algorithm remains the same, but the node scoring function is defined as the ratio between the splits so as to encourage balanced trees similar to randomized KD-trees. If only a few patches are sampled this is as good as k-means and much faster. However the spatial partition is so bad that with additional test windows, the binarized histogram vectors become almost entirely filled with ones, so discrimination suffers. As the dotted line shows, using binarization thresholds that maximize the mutual information can fix this problem but the results are still far below ERC-Forests. This comparison clearly shows the advantages of using supervision during clustering.

Fig. 4(b) shows that codebooks with around 5000 entries (1000 per tree) suffice for good results. Fig. 4(c) shows that when the number of features used to build the codebooks is increased, the ...
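The pipeline described in the excerpt (push each sampled patch descriptor through the five trees, record which leaves fire, binarize the histogram at count 1, and classify the resulting ~5000-D vector with a linear SVM) can be sketched as follows. The forest interface (a leaf_index method per tree) is hypothetical, and scikit-learn's LinearSVC stands in for whichever linear SVM the authors actually used.

```python
import numpy as np
from sklearn.svm import LinearSVC   # stand-in linear SVM classifier

def image_signature(forest, descriptors, leaves_per_tree=1000):
    """Binarized bag-of-leaf-words for one image: one bin per (tree, leaf) pair,
    set to 1 if any patch descriptor from the image reaches that leaf.
    `forest` is assumed to be a list of trained trees exposing
    leaf_index(descriptor) -> int in [0, leaves_per_tree) (hypothetical interface)."""
    sig = np.zeros(len(forest) * leaves_per_tree, dtype=np.float32)
    for desc in descriptors:                    # e.g. ~8000 sampled patches per image
        for t, tree in enumerate(forest):
            sig[t * leaves_per_tree + tree.leaf_index(desc)] = 1.0   # threshold at count 1
    return sig

# Hypothetical usage, assuming per-image signatures are stacked into matrices:
#   X_train = np.stack([image_signature(forest, d) for d in train_descriptor_sets])
#   clf = LinearSVC(C=1.0).fit(X_train, y_train)
#   predictions = clf.predict(X_test)
```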

