Animals on the Web

Tamara L. Berg
University of California, Berkeley
Computer Science Division

David A. Forsyth
University of Illinois, Urbana-Champaign
Department of Computer Science

Abstract

We demonstrate a method for identifying images containing categories of animals. The images we classify depict animals in a wide range of aspects, configurations and appearances. In addition, the images typically portray multiple species that differ in appearance (e.g. uakaris, vervet monkeys, spider monkeys, rhesus monkeys, etc.). Our method is accurate despite this variation and relies on four simple cues: text, color, shape and texture.

Visual cues are evaluated by a voting method that compares local image phenomena with a number of visual exemplars for the category. The visual exemplars are obtained using a clustering method applied to text on web pages. The only supervision required involves identifying which clusters of exemplars refer to which sense of a term (for example, “monkey” can refer to an animal or a band member).

Because our method is applied to web pages with free text, the word cue is extremely noisy. We show unequivocal evidence that visual information improves performance for our task. Our method allows us to produce large, accurate and challenging visual datasets mostly automatically.

1. Introduction

There are currently more than 8,168,684,336^1 web pages on the Internet. A search for the term “monkey” yields 36,800,000 results using Google text search. There must be a large quantity of images portraying “monkeys” within these pages, but retrieving them is not an easy task, as demonstrated by the fact that a Google image search for “monkey” yields only 30 actual “monkey” pictures in the first 100 results.
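The abstract's core mechanism — a voting method that compares local image features against visual exemplars for a category — can be sketched minimally as below. This is an illustrative stand-in, not the authors' implementation: real features (geometric blur shape descriptors, color and texture statistics) are replaced by plain 2-D vectors, and the function name and data are made up.

```python
import math

def exemplar_votes(test_features, exemplars):
    """Tally one vote per test feature for the label of its nearest
    exemplar feature under Euclidean distance.

    exemplars: list of (feature_vector, label) pairs.
    """
    votes = {}
    for feat in test_features:
        # Each local feature votes for the category of its closest exemplar.
        _, label = min(exemplars, key=lambda ex: math.dist(feat, ex[0]))
        votes[label] = votes.get(label, 0) + 1
    return votes

# Toy example: two "monkey" exemplar features and one background feature.
exemplars = [((0.0, 0.0), "monkey"), ((1.0, 1.0), "monkey"),
             ((5.0, 5.0), "background")]
test_features = [(0.2, 0.1), (0.9, 1.2), (4.8, 5.1)]
print(exemplar_votes(test_features, exemplars))
# -> {'monkey': 2, 'background': 1}
```

An image whose features mostly vote for the category receives a high visual score; the paper combines such per-cue evidence with nearby words.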
Animals in particular are quite difficult to identify because they pose difficulties that most vision systems are ill-equipped to handle, including large variations in aspect, appearance, depiction, and articulated limbs.

We build a classifier that uses word and image information to determine whether an image depicts an animal. This classifier uses a set of examples, harvested largely automatically but incorporating some supervision to deal with polysemy-like phenomena. Four cues are combined to determine the final classification of each image: nearby words, color, shape, and texture. The resulting classifier is very accurate despite large variation in test images. In figure 1 we show that visual information makes a substantial contribution to the performance of our classifier.

^1 Google’s last released number of indexed web pages.

We demonstrate one application by harvesting pictures of animals from the web. Since there is little point in looking for, say, “alligator” in web pages that don’t have words like “alligator”, “reptile” or “swamp”, we use Google to focus the search. Using Google text search, we retrieve the top 1000 results for each category and use our classifier to re-rank the images on the returned pages. The resulting sets of animal images (fig. 3) are quite compelling and demonstrate that we can handle a broad range of animals.

For one of our categories, “monkey”, we show that the same algorithm can be used to label a much larger collection of images. The dataset that we produce from this set of images is startlingly accurate (81% precision for the first 500 images) and displays great visual variety (fig. 5). This suggests that it should be possible to build enormous, rich sets of labeled animal images with our classifier.

1.1. Previous Work

Object recognition has been thoroughly researched, but is by no means a solved problem.
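The harvesting pipeline described above — text search to focus the collection, then classifier-based re-ranking of the returned images — might be sketched as follows. The cue weights, score values, and names here are hypothetical placeholders; this excerpt does not specify the paper's actual combination rule.

```python
# Illustrative sketch of re-ranking text-search results by a combined
# classifier score over the four cues (nearby words, color, shape,
# texture). All scores and weights below are made up for the example.

def combined_score(cue_scores, weights):
    """Weighted sum of per-cue scores (names are illustrative)."""
    return sum(weights[cue] * cue_scores[cue] for cue in weights)

def rerank(images, weights):
    """Sort (image_id, cue_scores) pairs by descending combined score."""
    return sorted(images,
                  key=lambda im: combined_score(im[1], weights),
                  reverse=True)

# Three candidate "monkey" images with hypothetical per-cue scores.
weights = {"words": 1.0, "color": 1.0, "shape": 1.0, "texture": 1.0}
images = [
    ("img_a", {"words": 0.9, "color": 0.2, "shape": 0.1, "texture": 0.3}),
    ("img_b", {"words": 0.4, "color": 0.8, "shape": 0.9, "texture": 0.7}),
    ("img_c", {"words": 0.1, "color": 0.1, "shape": 0.2, "texture": 0.1}),
]
print([im[0] for im in rerank(images, weights)])
# -> ['img_b', 'img_a', 'img_c']
```

Note how img_b, with weak word evidence but strong visual cues, outranks img_a, which words alone would have ranked first — the behavior the paper's figure 1 curves illustrate.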
There has been a recent explosion of work in appearance-based object recognition using local features, in particular on the Caltech-101 Object Categories Dataset introduced in [8]. Some methods use constellation-of-parts based models trained using EM [10, 8]. Others employ probabilistic models like pLSA or LDA [20, 19]. The closest method to ours employs nearest neighbor based deformable shape matching [4] to find correspondences between objects. Object recognition is unsolved, but we show that whole image classification can be successful using fairly simple methods.

[Figure 1. Classification performance on test images (all images except visual exemplars) for the “monkey” (left), “frog” (center) and “giraffe” (right) categories. Recall is measured over images in our collection, not all images existing on the web. “monkey” results are on a set of 12567 images, 2456 of which are true “monkey” images. “frog” results are on a set of 1964 images, 290 of which are true “frog” images. “giraffe” results are on a set of 873 images, 287 of which are true “giraffe” images. Curves show the Google text search classification (red), word based classification (green), geometric blur shape feature based classification (magenta), color based classification (cyan), texture based classification (yellow) and the final classification using a combination of cues (black). Incorporating visual information increases classification performance enormously over using word based classification alone.]

Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06) 0-7695-2597-0/06 $20.00 © 2006 IEEE

There has been some preliminary work on voting based methods for image classification on the Caltech-101 Dataset using geometric blur features [3]. In an alternative forced choice recognition task this method produces quite reasonable results (a recognition rate of 51%), as compared with the best previously reported result using deformable shape matching (45%) [4]. Our work uses a modified voting method for image retrieval that incorporates multiple sources of image and

