UW-Madison CS 766 - Image Classification using Latent Patch Concepts

Image Classification using Latent Patch Concepts

Tushar Khot
Computer Sciences Department
University of Wisconsin, Madison
[email protected]

Contents: Introduction; Related Work; Implementation (Training, Testing); Experiments (Impact of overlap, Impact of scale, Impact of depth, Impact of training data, Impact of Clustering); Future Work; Conclusion.

Abstract

There have been many approaches to tackling the problem of image classification. Some sacrifice speed for a very complex model, whereas others limit the features for speed. In our approach, we try to maintain spatial information, similar to the spatial pyramid approach, and use the pyramid match kernel for efficiency. We split the images into overlapping patches at multiple scales and train a latent concept model on these patches. For testing, we find the most likely category using a naive latent topic model. Patches of different scales have different classifiers, which can be exploited to parallelize the process.

1 Introduction

The problem we are attempting to solve is image categorization over the Caltech-101 [1] and Caltech-256 [2] datasets. Over the last few years, it has been shown that simple approaches work remarkably well for image classification [5]; even a plain histogram intersection kernel with an SVM performs strongly, as shown in [4]. The spatial pyramid kernel used in [4] splits the image into partitions and builds a histogram over each partition, so the location of the features is taken into account as well. Extending this idea, we first built an image classification tree in which nodes at greater depths deal with smaller partitions of the image. Each node of the tree handles a particular partition of the image and learns a classifier from that partition across all the images. If the SVM/KNN at a node classifies with very high probability, we can avoid traveling further down the tree. Unfortunately, the recognition performance of this approach was very poor, as it did not allow matches across patches at different positions. We therefore turned to latent topic models, which allow matches between patches irrespective of their position.

2 Related Work

Grauman et al. [3] used a pyramid kernel, splitting the feature space into partitions, but their approach did not account for the location of those features. Lazebnik et al. [4] extended this idea: instead of splitting the feature space, they split the image into parts and built a histogram intersection kernel on the features. But the objects in every image may not be at the same location, so this approach may not always work.

There are many other kernels, such as [11], that can be used with KNN or SVMs, but most of them are too time-consuming, so we did not use them. There have also been approaches that apply other machine learning techniques, such as decision trees [8] or SVM/KNN [6], to classification.

Boiman et al. [5] showed that complicated features that try to reduce the size of the feature space lose a lot of information, and hence they use a simple nearest-neighbor approach. We likewise rely on the simplicity of our approach to use as much of the information as possible without taking too much time.

Desai et al. [9] use object detection to solve the image categorization problem, but they need the objects annotated in the training data, which may not always be available. Latent topic models [10], [7] are more appealing, as they do not rely on the objects being provided in the training data.

3 Implementation

Since we wanted to match patches irrespective of their location, we needed to train a classifier on all the patches at the same scale. Since nearest-neighbor approaches lose the least information, we decided to use K-Nearest Neighbor for classification. To reduce the search time during testing, we also clustered the training patches; the cluster centers form the latent topics (concepts) of the classification problem. The probability of an image under a category can thus be calculated from the probability of each of its patches being generated by some concept and the probability of that concept belonging to the given category.
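The scoring rule just described can be sketched compactly. The following is a minimal illustration under simplifying assumptions, not the implementation described here: each patch is reduced to a single descriptor vector, Euclidean distance stands in for the pyramid match similarity, and classify_image, concepts, and p_cat_given_concept are hypothetical names.

    import numpy as np

    def classify_image(patch_descriptors, concepts, p_cat_given_concept, k=5):
        """Naive latent-concept scoring: each patch is soft-assigned to its
        k nearest concepts, and per-patch category posteriors are combined
        under an independence (naive) assumption."""
        n_categories = p_cat_given_concept.shape[1]
        log_scores = np.zeros(n_categories)
        for x in patch_descriptors:
            # Distance to every concept center; Euclidean distance is a
            # stand-in for the pyramid match similarity used in the paper.
            dists = np.linalg.norm(concepts - x, axis=1)
            nearest = np.argsort(dists)[:k]
            weights = 1.0 / (dists[nearest] + 1e-8)  # P(concept | patch), up to scale
            weights /= weights.sum()
            # P(category | patch) = sum_j P(concept_j | patch) * P(category | concept_j)
            p_cat = weights @ p_cat_given_concept[nearest]
            log_scores += np.log(p_cat + 1e-12)
        return int(np.argmax(log_scores))

Because each scale keeps its own concepts and its own P_s(Category | Concept) table, the per-scale log-scores can be computed independently and summed, which is one way to realize the parallelism mentioned in the abstract.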
3.1 Training

Every training image is partitioned into overlapping patches at various scales. In our approach, a scale of 1/2 indicates that the patches have half the rows and columns of the original image. Given the boundaries of these patches, we can find the PCA-SIFT features enclosed within those boundaries; the set of PCA-SIFT features for each window forms a feature patch.
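This windowing step admits a short sketch (illustrative only: the stride controlling the amount of overlap is left as a free parameter here, and patch_windows and feature_patch are hypothetical helper names).

    def patch_windows(height, width, scale, stride_frac=0.5):
        """Yield (top, left, bottom, right) for overlapping windows whose side
        lengths are `scale` times the image's; stride_frac controls overlap
        (0.5 means consecutive windows overlap by half a window)."""
        wh, ww = max(1, int(height * scale)), max(1, int(width * scale))
        sh, sw = max(1, int(wh * stride_frac)), max(1, int(ww * stride_frac))
        for top in range(0, height - wh + 1, sh):
            for left in range(0, width - ww + 1, sw):
                yield top, left, top + wh, left + ww

    def feature_patch(keypoints, descriptors, window):
        """The feature patch for one window: all PCA-SIFT descriptors whose
        keypoint (y, x) falls inside the window boundaries."""
        top, left, bottom, right = window
        return [d for (y, x), d in zip(keypoints, descriptors)
                if top <= y < bottom and left <= x < right]

At scale 1, patch_windows yields a single window covering the whole image, so the coarsest level degenerates to a global feature set, as in the spatial pyramid.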


All patches at the same scale, but of different categories, are collected together. We perform Hierarchical Agglomerative Clustering (HAC) on these patches to find the concepts as centroids. We use the pyramid match kernel to compute the pairwise distance between all pairs of feature patches (sets of PCA-SIFT features). The pair of patches with the highest similarity is clustered, and the centroid is added back to the list of patches. To merge two feature patches, we perform KMeans clustering on the collection of features within those patches and then select, from the original set, the PCA-SIFT features nearest to each center. We do not use the centroids directly, because the pyramid match kernel uses a vocabulary tree built on the original set of PCA-SIFT features and fails on unseen features.

To save time, we cluster twenty patches at a time before recomputing the distances. We stop clustering once the number of clusters is small enough or the similarity measures are too low for any meaningful clustering.

The distance computation is skewed by the fact that some patches may have a very large number of features, which inflates the intersection score; hence we divide the similarity score by the number of features in each patch. Also, images of certain classes are inherently sparse and would always have low similarity measures, so we cluster all patches with fewer than 5 features into a default blank patch.

At the end of the HAC run, we have a set of new patches, where each patch L_j was obtained by combining n_jk patches from category C_k:

    P(C_k | L_j) = (n_{jk} + m) / Σ_k (n_{jk} + m), where m = 0.1.   (1)

For instance, with ten categories, a cluster formed from eight patches of C_1 and two of C_2 gives P(C_1 | L_j) = (8 + 0.1) / (10 + 10 × 0.1) ≈ 0.74.

Algorithm 1: Train(Images, Features, Categories)
1. Perform steps 2-4 for windows at each scale s ∈ {1, 1/2, ..., 1/2^n}.
2. Partition the image and its features into overlapping windows at scale s.
3. Perform HAC on the feature patches across categories at scale s, until the number of patches < threshold.
4. For each cluster/concept, maintain P_s(Category | Concept).
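A condensed sketch of steps 3-4 follows, under stated assumptions: the pyramid match kernel is passed in as a black-box kernel(a, b) function, distances are recomputed after every merge rather than after every twenty as in the text, the default blank patch for sparse images is omitted, and hac, merge_patches, and normalized_sim are hypothetical names.

    import numpy as np

    def normalized_sim(a, b, kernel):
        # Divide the kernel score by the feature counts so patches with
        # many features do not dominate the intersection (Section 3.1).
        return kernel(a, b) / (len(a) * len(b))

    def merge_patches(a, b, n_centers, n_iters=10, seed=0):
        """KMeans over the pooled features of two patches, with each centroid
        snapped back to its nearest original feature, so the pyramid match
        kernel's vocabulary tree never sees unknown vectors."""
        pooled = np.vstack([a, b])
        rng = np.random.default_rng(seed)
        k = min(n_centers, len(pooled))
        centers = pooled[rng.choice(len(pooled), size=k, replace=False)]
        for _ in range(n_iters):  # a few Lloyd iterations
            assign = ((pooled[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
            centers = np.array([pooled[assign == c].mean(0) if (assign == c).any()
                                else centers[c] for c in range(k)])
        snap = ((centers[:, None] - pooled[None]) ** 2).sum(-1).argmin(1)
        return pooled[snap]

    def hac(patches, counts, kernel, min_clusters, min_sim):
        """Greedy HAC over feature patches. counts[i] is a per-category count
        vector; merged counts give the n_jk of Equation (1)."""
        patches, counts = list(patches), [np.asarray(c, float) for c in counts]
        while len(patches) > min_clusters:
            best, pair = -np.inf, None
            for i in range(len(patches)):
                for j in range(i + 1, len(patches)):
                    s = normalized_sim(patches[i], patches[j], kernel)
                    if s > best:
                        best, pair = s, (i, j)
            if best < min_sim:  # no meaningful clustering left
                break
            i, j = pair
            merged = merge_patches(patches[i], patches[j], n_centers=len(patches[i]))
            merged_counts = counts[i] + counts[j]
            for idx in (j, i):  # delete the higher index first
                del patches[idx], counts[idx]
            patches.append(merged)
            counts.append(merged_counts)
        return patches, counts

Equation (1) then falls out of the returned counts: for cluster j, P(C_k | L_j) = (counts[j][k] + 0.1) / (counts[j] + 0.1).sum().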

3.2 Testing

At the
