Animals on the Web

Tamara L. Berg
University of California, Berkeley
Computer Science Division

David A. Forsyth
University of Illinois, Urbana-Champaign
Department of Computer Science

Abstract

We demonstrate a method for identifying images containing categories of animals. The images we classify depict animals in a wide range of aspects, configurations and appearances. In addition, the images typically portray multiple species that differ in appearance (e.g. uakaris, vervet monkeys, spider monkeys, rhesus monkeys, etc.). Our method is accurate despite this variation and relies on four simple cues: text, color, shape and texture.

Visual cues are evaluated by a voting method that compares local image phenomena with a number of visual exemplars for the category. The visual exemplars are obtained using a clustering method applied to text on web pages. The only supervision required involves identifying which clusters of exemplars refer to which sense of a term (for example, “monkey” can refer to an animal or a band member).

Because our method is applied to web pages with free text, the word cue is extremely noisy. We show unequivocal evidence that visual information improves performance for our task. Our method allows us to produce large, accurate and challenging visual datasets mostly automatically.

1. Introduction

There are currently more than 8,168,684,336^1 web pages on the Internet. A search for the term “monkey” yields 36,800,000 results using Google text search. There must be a large quantity of images portraying “monkeys” within these pages, but retrieving them is not an easy task, as demonstrated by the fact that a Google image search for “monkey” yields only 30 actual “monkey” pictures in the first 100 results.
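The abstract's core mechanism — a voting method that compares local image features against visual exemplars for a category — can be sketched minimally as below. This is an illustrative stand-in, not the authors' implementation: real features (geometric blur shape descriptors, color and texture statistics) are replaced by plain 2-D vectors, and the function name and data are made up.

```python
import math

def exemplar_votes(test_features, exemplars):
    """Tally one vote per test feature for the label of its nearest
    exemplar feature under Euclidean distance.

    exemplars: list of (feature_vector, label) pairs.
    """
    votes = {}
    for feat in test_features:
        # Each local feature votes for the category of its closest exemplar.
        _, label = min(exemplars, key=lambda ex: math.dist(feat, ex[0]))
        votes[label] = votes.get(label, 0) + 1
    return votes

# Toy example: two "monkey" exemplar features and one background feature.
exemplars = [((0.0, 0.0), "monkey"), ((1.0, 1.0), "monkey"),
             ((5.0, 5.0), "background")]
test_features = [(0.2, 0.1), (0.9, 1.2), (4.8, 5.1)]
print(exemplar_votes(test_features, exemplars))
# -> {'monkey': 2, 'background': 1}
```

An image whose features mostly vote for the category receives a high visual score; the paper combines such per-cue evidence with nearby words.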
Animals in particular are quite difficult to identify because they pose difficulties that most vision systems are ill-equipped to handle, including large variations in aspect, appearance, depiction, and articulated limbs.

We build a classifier that uses word and image information to determine whether an image depicts an animal. This classifier uses a set of examples, harvested largely automatically but incorporating some supervision to deal with polysemy-like phenomena. Four cues are combined to determine the final classification of each image: nearby words, color, shape, and texture. The resulting classifier is very accurate despite large variation in test images. In figure 1 we show that visual information makes a substantial contribution to the performance of our classifier.

^1 Google’s last released number of indexed web pages.

We demonstrate one application by harvesting pictures of animals from the web. Since there is little point in looking for, say, “alligator” in web pages that don’t have words like “alligator”, “reptile” or “swamp”, we use Google to focus the search. Using Google text search, we retrieve the top 1000 results for each category and use our classifier to re-rank the images on the returned pages. The resulting sets of animal images (fig. 3) are quite compelling and demonstrate that we can handle a broad range of animals.

For one of our categories, “monkey”, we show that the same algorithm can be used to label a much larger collection of images. The dataset that we produce from this set of images is startlingly accurate (81% precision for the first 500 images) and displays great visual variety (fig. 5). This suggests that it should be possible to build enormous, rich sets of labeled animal images with our classifier.

1.1. Previous Work

Object recognition has been thoroughly researched, but is by no means a solved problem.
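The harvesting pipeline described above — text search to focus the collection, then classifier-based re-ranking of the returned images — might be sketched as follows. The cue weights, score values, and names here are hypothetical placeholders; this excerpt does not specify the paper's actual combination rule.

```python
# Illustrative sketch of re-ranking text-search results by a combined
# classifier score over the four cues (nearby words, color, shape,
# texture). All scores and weights below are made up for the example.

def combined_score(cue_scores, weights):
    """Weighted sum of per-cue scores (names are illustrative)."""
    return sum(weights[cue] * cue_scores[cue] for cue in weights)

def rerank(images, weights):
    """Sort (image_id, cue_scores) pairs by descending combined score."""
    return sorted(images,
                  key=lambda im: combined_score(im[1], weights),
                  reverse=True)

# Three candidate "monkey" images with hypothetical per-cue scores.
weights = {"words": 1.0, "color": 1.0, "shape": 1.0, "texture": 1.0}
images = [
    ("img_a", {"words": 0.9, "color": 0.2, "shape": 0.1, "texture": 0.3}),
    ("img_b", {"words": 0.4, "color": 0.8, "shape": 0.9, "texture": 0.7}),
    ("img_c", {"words": 0.1, "color": 0.1, "shape": 0.2, "texture": 0.1}),
]
print([im[0] for im in rerank(images, weights)])
# -> ['img_b', 'img_a', 'img_c']
```

Note how img_b, with weak word evidence but strong visual cues, outranks img_a, which words alone would have ranked first — the behavior the paper's figure 1 curves illustrate.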
There has been a recent explosion of work in appearance-based object recognition using local features, in particular on the Caltech-101 Object Categories Dataset introduced in [8]. Some methods use constellation-of-parts based models trained using EM [10, 8]. Others employ probabilistic models like pLSA or LDA [20, 19]. The closest method to ours employs nearest neighbor based deformable shape matching [4] to find correspondences between objects. Object recognition is unsolved, but we show that whole image classification can be successful using fairly simple methods.

[Figure 1. Classification performance on test images (all images except visual exemplars) for the “monkey” (left), “frog” (center) and “giraffe” (right) categories. Recall is measured over images in our collection, not all images existing on the web. “monkey” results are on a set of 12567 images, 2456 of which are true “monkey” images. “frog” results are on a set of 1964 images, 290 of which are true “frog” images. “giraffe” results are on a set of 873 images, 287 of which are true “giraffe” images. Curves show the Google text search classification (red), word based classification (green), geometric blur shape feature based classification (magenta), color based classification (cyan), texture based classification (yellow) and the final classification using a combination of cues (black). Incorporating visual information increases classification performance enormously over using word based classification alone.]

Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06) 0-7695-2597-0/06 $20.00 © 2006 IEEE

There has been some preliminary work on voting based methods for image classification on the Caltech-101 Dataset using geometric blur features [3]. In an alternative forced choice recognition task this method produces quite reasonable results (a recognition rate of 51%), as compared with the best previously reported result using deformable shape matching (45%) [4]. Our work uses a modified voting method for image retrieval that incorporates multiple sources of image and

