
Empirical Evaluation of Dissimilarity Measures for Color and Texture
Jan Puzicha, Joachim M. Buhmann, Yossi Rubner, Carlo Tomasi

Presented by Dave Kauchak
Department of Computer Science, University of California, San Diego
dkauchak@cs.ucsd.edu

The Problem
- Image dissimilarity D

Where does this problem arise in computer vision?
- Image classification
- Image retrieval
- Image segmentation

Classification

Retrieval
- Jeremy S. De Bonet and Paul Viola (1997). Structure Driven Image Database Retrieval. Neural Information Processing 10 (1997).

Segmentation
- Example images from vizlab.rutgers.edu (comanici, segm images)

Histograms for image dissimilarity
- Examine the distribution of features rather than the features themselves
- General purpose, i.e., works with any distribution of features
- Resilient to variations (shadowing, changes in illumination, shading, etc.)
- Can use previous work in statistics, etc.

Histogram Example

Histogramming Image Features
- Features: color, texture, shape, others
- Create a histogram through binning (or some other procedure) to get a distribution

Color
- Which is more similar?
- L*a*b* was designed to be uniform, in that perceptual closeness corresponds to Euclidean distance in the space

L*a*b*
- L: lightness (white to black)
- a: red-greenness
- b: yellow-blueness

Texture
- Texture is not pointwise like color
- Texture involves a local neighborhood
- Gabor filters are commonly used to identify texture features

Gabor Filters
- Gabor filters are Gaussians modulated by sinusoids
- They can be tuned in both the scale (size) and the orientation
- A filter is applied to a region, and the response is characterized by some feature of the energy distribution (often the mean and standard deviation)

Examples of Gabor Filters
- Scale 3 at 72°
- Scale 4 at 108°
- Scale 5 at 144°

Creating Histograms from Features
- Regular binning
  - Simple
  - Choosing the bins is important: bins may be too large or too small
- Adaptive binning
  - Bins are adapted to the distribution (usually using some form of K-means)

Marginal Histograms
- Marginal histograms only deal with a single feature
- Figure: normal binning vs. marginal binning, the latter resulting in 2 histograms

Cumulative Histogram
- Figure: normal histogram vs. cumulative histogram

Dissimilarity Measure Using the Histograms
- Heuristic histogram distances
- Non-parametric test statistics
- Information-theoretic divergences
- Ground distance measures

Notation
- D(I, J) is the dissimilarity of images I and J
- f(i; J) is histogram entry i in the histogram of image J
- f^r(i; J) is marginal histogram entry i of image J
- F^r(i; J) is the cumulative histogram

Heuristic Histogram Distances
- Minkowski-form distance L_p:

  $$D(I, J) = \left( \sum_i \left| f(i; I) - f(i; J) \right|^p \right)^{1/p}$$

- Special cases:
  - L_1: absolute, cityblock, or Manhattan distance
  - L_2: Euclidean distance
  - L_\infty: maximum value distance

More heuristic distances
- Weighted-Mean-Variance (WMV):

  $$D^r(I, J) = \frac{\left| \mu_r(I) - \mu_r(J) \right|}{\left| \sigma(\mu_r) \right|} + \frac{\left| \sigma_r(I) - \sigma_r(J) \right|}{\left| \sigma(\sigma_r) \right|}$$

- Only includes minimal information about the distribution

Non-parametric Test Statistics
- Kolmogorov-Smirnov distance (K-S):

  $$D^r(I, J) = \max_i \left| F^r(i; I) - F^r(i; J) \right|$$

- Cramer/von Mises type (CvM):

  $$D^r(I, J) = \sum_i \left( F^r(i; I) - F^r(i; J) \right)^2$$

Cumulative Difference Example
- Figure: histogram 1, histogram 2, their difference, and the resulting K-S and CvM values

Non-parametric Test Statistics (cont.)
- \chi^2 statistic (chi-square)
- Simple statistical measure to decide if two samples came from the same underlying distribution

  $$D(I, J) = \sum_i \frac{\left( f(i; I) - \hat{f}(i) \right)^2}{\hat{f}(i)}, \qquad \hat{f}(i) = \frac{f(i; I) + f(i; J)}{2}$$

Information-theoretic divergences
- How well can one distribution be coded using the other as a codebook?
- Kullback-Leibler divergence (KL):

  $$D(I, J) = \sum_i f(i; I) \log \frac{f(i; I)}{f(i; J)}$$

- Jeffrey divergence (JD):

  $$D(I, J) = \sum_i \left[ f(i; I) \log \frac{f(i; I)}{\hat{f}(i)} + f(i; J) \log \frac{f(i; J)}{\hat{f}(i)} \right]$$

Ground Distance Measure
- Based on some metric of distance between individual features
- Earth Mover's Distance (EMD)
  - Minimal cost to transform one distribution into the other
  - Only measure that works on distributions with a different number of bins

EMD
- One distribution can be seen as a mass of earth properly spread in space, the other as a collection of holes in that same space
- Distributions are represented as a set of clusters and an associated weight
- Computing the dissimilarity then becomes the transportation problem

Transportation Problem
- Some number of suppliers with goods
- Some other number of consumers wanting goods
- Each consumer-supplier pair has an associated cost to deliver one unit of the goods
- Find the least expensive flow of goods from suppliers to consumers

Various properties of the metrics
- K-S, CvM, and WMV are only defined for marginal distributions
- L_p, WMV, K-S, CvM, and (under constraints) EMD all obey the triangle inequality
- WMV is particularly quick because the calculation is quick and the values can be precomputed offline
- EMD is the most computationally expensive

Key Components for Good Comparison
- Meaningful quality measure
- Subdivision into various tasks/applications (classification, retrieval, and segmentation)
- A wide range of parameters should be measured
- An uncontroversial ground truth should be established

Data Set: Color
- Randomly chose 94 images from a set of 2000
- The 94 images represent separate classes
- Randomly select disjoint sets of pixels from the images
  - Set sizes of 4, 8, 16, 32, 64 pixels
  - 16 disjoint samples per set per image

Data Set: Texture
- Brodatz album
  - Collection of a wide range of textures (e.g., cork, lawn, straw, pebbles, sand, etc.)
- Each image is considered a class (as in color)
- Extract sets of 16 non-overlapping blocks
  - Sizes 8x8, 16x16, ..., 256x256

Setup: Classification
- A k-Nearest Neighbor classifier is used
  - Nearest Neighbor classification: given a collection of labeled points S and a query point q, what point belonging to S is closest to q?
  - k-nearest is a majority vote of the k closest points
- k = 1, 3, 5, and 7
- Average misclassification rate (percentage) using leave-one-out

Setup: Classification (cont.)
- Bins: 4, 8, 16, 32, 64, 128, 256
- In the texture case, three sets of filters were used, of sizes 12, 24, and 40 filters
- 1000 CPU hours of computation

Results: Classification, color data set

Results: Classification, texture data set

Results: Classification
- For small sample sizes, the WMV measure performs best in the texture case
  - WMV only estimates means and variances
  - Less sensitive to sampling noise
- EMD also performs well for small sample sizes
  - Local binning provides additional information
- For large sample sizes, the \chi^2 test performs best

Results: Classification (cont.)
- For texture classification, marginal distributions do better than multidimensional distributions, except for very large sample sizes (256x256)
  - Binning is not well adapted to the data, since it is fixed for all the 94 classes
  - EMD, which uses local adaptation, does much better
- For multidimensional histograms, the more bins, the better the performance
- For texture, usually 12 filters is enough

Setup: Image Retrieval
- Vary sample size
- Vary number of images retrieved
- Performance measured based on
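As an illustration of the bin-by-bin measures listed in the slides (Minkowski form, chi-square, Kullback-Leibler, and Jeffrey), here is a minimal Python sketch. The function names are hypothetical, histograms are assumed to be equal-length lists of non-negative values that sum to 1, and the handling of empty bins (skipping them, or returning infinity for KL) is an assumption that the slides do not specify.

```python
import math

def minkowski(f_i, f_j, p=1.0):
    """Minkowski-form distance L_p; p=math.inf gives the maximum value distance."""
    if math.isinf(p):
        return max(abs(a - b) for a, b in zip(f_i, f_j))
    return sum(abs(a - b) ** p for a, b in zip(f_i, f_j)) ** (1.0 / p)

def chi_square(f_i, f_j):
    """Chi-square statistic against the mean histogram f_hat = (f_i + f_j) / 2."""
    total = 0.0
    for a, b in zip(f_i, f_j):
        mean = (a + b) / 2.0
        if mean > 0:  # assumption: bins empty in both histograms contribute 0
            total += (a - mean) ** 2 / mean
    return total

def kullback_leibler(f_i, f_j):
    """KL divergence of f_i from f_j; not symmetric, infinite if f_j misses mass."""
    total = 0.0
    for a, b in zip(f_i, f_j):
        if a > 0:
            if b == 0:
                return math.inf
            total += a * math.log(a / b)
    return total

def jeffrey(f_i, f_j):
    """Jeffrey divergence: symmetric and finite, computed against f_hat."""
    total = 0.0
    for a, b in zip(f_i, f_j):
        mean = (a + b) / 2.0
        if a > 0:
            total += a * math.log(a / mean)
        if b > 0:
            total += b * math.log(b / mean)
    return total

h1 = [0.5, 0.5, 0.0]
h2 = [0.25, 0.25, 0.5]
print(minkowski(h1, h2, 1), minkowski(h1, h2, 2), minkowski(h1, h2, math.inf))
print(chi_square(h1, h2), kullback_leibler(h1, h2), jeffrey(h1, h2))
```

In this toy example KL is finite in one direction and infinite in the other, because `h1` has an empty bin where `h2` has mass, while the Jeffrey divergence stays finite and symmetric.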

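The two non-parametric test statistics defined on cumulative marginal histograms (K-S and CvM) could be sketched as follows. This is only an illustration: the function names are hypothetical, and both inputs are assumed to be marginal histograms over the same bins.

```python
from itertools import accumulate

def cumulative(hist):
    """Cumulative histogram F(i) = f(0) + ... + f(i)."""
    return list(accumulate(hist))

def kolmogorov_smirnov(f_i, f_j):
    """K-S distance: largest absolute gap between the cumulative histograms."""
    return max(abs(a - b) for a, b in zip(cumulative(f_i), cumulative(f_j)))

def cramer_von_mises(f_i, f_j):
    """CvM-type distance: sum of squared gaps between the cumulative histograms."""
    return sum((a - b) ** 2 for a, b in zip(cumulative(f_i), cumulative(f_j)))

left = [0.5, 0.5, 0.0, 0.0]     # mass in the first two bins
middle = [0.0, 0.5, 0.5, 0.0]   # same shape, shifted by one bin
right = [0.0, 0.0, 0.5, 0.5]    # same shape, shifted by two bins

print(kolmogorov_smirnov(left, middle), cramer_von_mises(left, middle))
print(kolmogorov_smirnov(left, right), cramer_von_mises(left, right))
```

Because both measures compare cumulative histograms, shifting the mass further away increases the distance, which a bin-by-bin comparison of non-overlapping histograms would not capture.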

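The classification setup (k-nearest-neighbor voting with a leave-one-out misclassification rate) can be sketched in a few lines of Python. This is not the authors' code: the function names, the toy histograms, and the use of L1 as the dissimilarity are all assumptions for illustration, and any other measure with the same call signature could be passed in instead.

```python
from collections import Counter

def l1(f_i, f_j):
    """L1 (cityblock) distance between two histograms."""
    return sum(abs(a - b) for a, b in zip(f_i, f_j))

def knn_predict(query, samples, k, dissimilarity):
    """Majority vote over the labels of the k samples closest to the query.

    samples is a list of (histogram, label) pairs. Ties in the vote go to the
    label that appears first among the ranked neighbors (Counter ordering).
    """
    ranked = sorted(samples, key=lambda s: dissimilarity(query, s[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def leave_one_out_error(samples, k, dissimilarity):
    """Misclassification rate in percent, holding out one sample at a time."""
    errors = 0
    for idx, (hist, label) in enumerate(samples):
        rest = samples[:idx] + samples[idx + 1:]
        if knn_predict(hist, rest, k, dissimilarity) != label:
            errors += 1
    return 100.0 * errors / len(samples)

# Hypothetical toy data: two classes with three 2-bin histograms each.
toy = [
    ([0.90, 0.10], "a"), ([0.80, 0.20], "a"), ([0.85, 0.15], "a"),
    ([0.10, 0.90], "b"), ([0.20, 0.80], "b"), ([0.15, 0.85], "b"),
]

for k in (1, 3, 5):
    print(k, leave_one_out_error(toy, k, l1))
```

With only three samples per class, k = 5 always outvotes the held-out sample's own class (two remaining neighbors against three), so the toy error rate jumps to 100%. This is an artifact of the tiny data set, not a statement about the results reported in the slides.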

UCSD CSE 291 - Dissimilarity Measures for Color and Texture
