UVA CS 662 - Query by Image and Video Content

Query by Image and Video Content: The QBIC System

Myron Flickner, Harpreet Sawhney, Wayne Niblack, Jonathan Ashley, Qian Huang, Byron Dom, Monika Gorkani, Jim Hafner, Denis Lee, Dragutin Petkovic, David Steele, and Peter Yanker
IBM Almaden Research Center

QBIC lets users find pictorial information in large image and video databases based on color, shape, texture, and sketches. QBIC technology is part of several IBM products. (To run an interactive query, visit the QBIC Web server at http://wwwqbic.almaden.ibm.com.)

Picture yourself as a fashion designer needing images of fabrics with a particular mixture of colors, a museum cataloger looking for artifacts of a particular shape and textured pattern, or a movie producer needing a video clip of a red car-like object moving from right to left with the camera zooming. How do you find these images? Even though today's technology enables us to acquire, manipulate, transmit, and store vast on-line image and video collections, the search methodologies used to find pictorial information are still limited by difficult research problems (see the "Semantic versus nonsemantic information" sidebar). Typically, these methodologies depend on file IDs, keywords, or text associated with the images. Although powerful, they don't allow queries based directly on the visual properties of the images, are dependent on the particular vocabulary used, and don't provide queries for images similar to a given image.

Research on ways to extend and improve query methods for image databases is widespread, and results have been presented in workshops, conferences,1,2 and surveys. We have developed the QBIC (Query by Image Content) system to explore content-based retrieval methods. QBIC allows queries on large image and video databases based on example images, user-constructed sketches and drawings, selected color and texture patterns, camera and object motion, and other graphical information.

Two key properties of QBIC are (1) its use of image and video content in the queries, that is, computable properties of the color, texture, shape, and motion of images, videos, and their objects, and (2) its graphical query language, in which queries are posed by drawing, selecting, and other graphical means. Related systems, such as MIT's Photobook3 and the Trademark and Art Museum applications from ETL,4 also address these common issues. This article describes the QBIC system and demonstrates its query capabilities.

Sidebar: Semantic versus nonsemantic information

At first glance, content-based querying appears deceptively simple because we humans seem to be so good at it. If a program could be written to extract semantically relevant text phrases from images, the problem could be solved using currently available text-search technology. Unfortunately, in an unconstrained environment, the task of writing this program is beyond the reach of current technology in image understanding. At an artificial intelligence conference several years ago, a challenge was issued to the audience to write a program that would identify all the dogs pictured in a children's book, a task most 3-year-olds can easily accomplish. Nobody in the audience accepted the challenge, and this remains an open problem. Perceptual organization, the process of grouping image features into meaningful objects and attaching semantic descriptions to scenes through model matching, is an unsolved problem in image understanding.

Humans are much better than computers at extracting semantic descriptions from pictures. Computers, however, are better than humans at measuring properties and retaining these in long-term memory. One of the guiding principles of QBIC is to let computers do what they do best (quantifiable measurement) and let humans do what they do best (attaching semantic meaning). QBIC can find "fish-shaped objects," since shape is a measurable property that can be extracted. However, since fish occur in many shapes, the only fish found will be those whose shape is close to the drawn shape. This is not the same as the much harder semantic query of finding all the pictures of fish in a pictorial database.

QBIC SYSTEM OVERVIEW

Figure 1 illustrates a typical QBIC query. The left side shows the query specification, where the user painted a large magenta circular area on a green background using standard drawing tools. Query results are shown on the right: an ordered list of "hits" similar to the query. The order of the results is top to bottom, then left to right, to support horizontal scrolling. In general, all queries follow this model: the query is specified by graphical means (drawing, selecting from a color wheel, selecting a sample image, and so on), and results are displayed as an ordered set of images.

To achieve this functionality, QBIC has two main components: database population (the process of creating an image database) and database query. During population, images and videos are processed to extract features describing their content (colors, textures, shapes, and camera and object motion), and the features are stored in a database. During query, the user composes a query graphically. Features are generated from the graphical query and then input to a matching engine that finds images or videos from the database with similar features. Figure 2 shows the system architecture; a minimal sketch of this pipeline follows the figure captions below.

[Figure 1. QBIC query by drawn color. Drawn query specification on left; best 21 results, sorted by similarity to the query, on right. The results were selected from a 12,968-picture database.]

[Figure 2. QBIC system architecture. Still images yield scenes and objects; video yields shots and motion objects. Extracted features (color, texture, shape, multiobject, sketch, location, text, positional color/texture, object motion, camera motion, and user defined) flow from the population side into the match engine, which serves the query interface.]

[Figure 3. QBIC still image population interface. Entry for scene text at top. Tools in row are polygon outliner, rectangle outliner, ellipse outliner, paintbrush, eraser, line drawing, object translation, flood fill, and snake outliner.]
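To make the population/query pipeline concrete, here is a minimal sketch in Python. It is not QBIC's actual implementation: it assumes a toy coarse RGB color histogram as the stored feature and Euclidean distance as the match metric, and all names (extract_color_histogram, MatchEngine) are hypothetical.

import math

def extract_color_histogram(pixels, bins_per_channel=4):
    """Population-time feature: quantize RGB pixels into a normalized histogram."""
    hist = [0.0] * (bins_per_channel ** 3)
    step = 256 // bins_per_channel
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[idx] += 1
    n = max(1, len(pixels))
    return [h / n for h in hist]

class MatchEngine:
    """Population stores one feature vector per image; query ranks by distance."""
    def __init__(self):
        self.features = {}  # image id -> feature vector

    def populate(self, image_id, pixels):
        self.features[image_id] = extract_color_histogram(pixels)

    def query(self, query_pixels, k=3):
        # Extract the same feature from the (drawn) query, then return the
        # k nearest database images as an ordered list of "hits".
        q = extract_color_histogram(query_pixels)
        def dist(image_id):
            f = self.features[image_id]
            return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, f)))
        return sorted(self.features, key=dist)[:k]

# Usage: populate two tiny synthetic "images", then query with a magenta patch,
# echoing the query-by-drawn-color example of Figure 1.
engine = MatchEngine()
engine.populate("magenta_dot", [(255, 0, 255)] * 90 + [(0, 128, 0)] * 10)
engine.populate("green_field", [(0, 128, 0)] * 100)
print(engine.query([(250, 10, 250)] * 100))  # -> ['magenta_dot', 'green_field']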
Data model

For both population and query, the QBIC data model has still images or scenes (full images) that contain objects (subsets of an image), and video shots that consist of sets of contiguous frames and contain motion objects.
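As one concrete reading of this data model, here is a minimal sketch using Python dataclasses. The class and field names are assumptions for illustration, not the paper's actual schema.

from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class ImageObject:
    """An object: a subset (e.g., an outlined region) of a still image."""
    outline: List[Tuple[int, int]]                    # vertices delimiting the region
    features: Dict[str, list] = field(default_factory=dict)  # color, texture, shape, ...

@dataclass
class Scene:
    """A still image (full image) containing zero or more objects."""
    image_id: str
    objects: List[ImageObject] = field(default_factory=list)
    features: Dict[str, list] = field(default_factory=dict)

@dataclass
class MotionObject:
    """An object tracked across the frames of a shot."""
    trajectory: List[Tuple[int, int]]                 # per-frame positions
    features: Dict[str, list] = field(default_factory=dict)

@dataclass
class Shot:
    """A set of contiguous video frames containing motion objects."""
    first_frame: int
    last_frame: int
    motion_objects: List[MotionObject] = field(default_factory=list)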

