Pictures at an Eigen-Exhibition
Christopher Tsai, June Zhang, and Ignatius I. Janatra
EE 368: Digital Image Processing – Group 03 (Tsai, Zhang, Janatra), May 2007.

Abstract—"A picture is worth a thousand words," but finding the right thousand words to describe a picture has no simple adage. Camera phones and other mobile devices can easily capture a scene, but processing the image to identify its contents and describe the scene requires efficient object recognition and identification. In an effort to augment reality in a potential virtual museum guide, this document presents an efficient method to identify a painting centered in a camera-phone digital image. This expedient and accurate algorithm taps the principal component transform to map the thirty-three paintings of interest into an alternative subspace spanned by an orthogonal basis of eigenimages. Within the eigenspace, Euclidean distance computation and subsequent feature recognition simplify into comparison of the transformed image coordinates against the known eigencoefficients of our thirty-three known paintings.

Index Terms—camera, painting, object recognition and identification, principal component analysis (PCA), eigenimage

I. INTRODUCTION

Cellular phone and mobile device technology have evolved beyond mere conversational communication to support numerous forms of multimedia, the most prevalent of which is image data. Camera phones can capture scenes and store their pictures in digital formats like JPEG, at a resolution sufficiently fine for detailed object recognition, whether it involves face discrimination or, for our application, painting identification. Furthermore, by properly recognizing the features in a painting, the portable device can augment its user's reality by supplying additional information about the subject, such as the painting's title, artist, date, and background information.
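As a preview of the method the abstract describes, eigenspace matching can be sketched as follows. This is an illustrative NumPy sketch under our own assumptions, not the authors' implementation: the function names are invented, and we take the SVD route to the eigenimages rather than an explicit covariance eigendecomposition.

```python
import numpy as np

def train_eigenspace(images, k):
    """Build an eigenimage basis from a stack of training images.

    images: (n, h*w) array, one flattened grayscale painting per row.
    k: number of principal components (eigenimages) to keep.
    Returns the mean image, the eigenimage basis, and each training
    painting's eigencoefficients for later nearest-neighbor matching.
    """
    mean = images.mean(axis=0)
    centered = images - mean
    # SVD of the centered data yields the principal directions directly.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]               # (k, h*w) orthonormal eigenimages
    coeffs = centered @ basis.T  # (n, k) coordinates in the eigenspace
    return mean, basis, coeffs

def identify(query, mean, basis, coeffs):
    """Project a query image into the eigenspace and return the index of
    the nearest training painting by Euclidean distance."""
    q = (query - mean) @ basis.T
    return int(np.argmin(np.linalg.norm(coeffs - q, axis=1)))
```

With thirty-three training paintings, identification reduces to one projection and thirty-three distance computations in a k-dimensional space.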
Nevertheless, expedient identification begins with digital image processing, and the following document examines eigenanalysis – otherwise known as principal component analysis (PCA) – as an efficient means to accurately identify works of art, using examples from the European art gallery at the Cantor Arts Center to train and test our algorithm. We characterize successful identification by our algorithm's ability to distinguish thirty-three different paintings from this European art gallery.

II. PRE-PROCESSING PROCEDURE

A. Downsampling

Even at their coarse resolution (compared to dedicated camera images), phone-captured image files are saturated with information. For example, the 2048 × 1536 JPEG images taken with the Nokia N93 might provide less detail than a full-fledged digital camera photo, but the amount of pixel redundancy allows us to downsample the images eightfold while preserving the recognizability of imaged objects. Consider the following digital image of the painting Edward Becher:

Fig. 1. Original 2048 × 1536 JPEG color image of Edward Becher.

If we downsample the image by a factor of four, we obtain the following comparatively detailed version:

Fig. 2. Downsampled 512 × 384 color image of Edward Becher.

Similarly, downsampling the image by a factor of eight also reduces detail but preserves the recognizability of key features:

Fig. 3. Downsampled 256 × 192 color image of Edward Becher.

While Mr. Becher's facial features no longer seem visible to the naked eye, the shape of his hair, the colors of his jacket, the limbs of the tree, and other key objects remain distinct. In other words, the downsampled image has lost some of its fine detail, but it remains distinctly and distinguishably Edward Becher.
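The eightfold downsampling illustrated above can be sketched in a few lines of NumPy. This is a minimal sketch assuming simple decimation (keeping every eighth pixel); a production pipeline would typically low-pass filter first to suppress aliasing.

```python
import numpy as np

def downsample(image, factor):
    """Downsample a 2-D (or H x W x 3) image by keeping every
    `factor`-th pixel along each spatial dimension, so the total
    pixel count drops by a factor of factor**2."""
    return image[::factor, ::factor]

original = np.zeros((2048, 1536))  # same shape as the Nokia N93 JPEGs
small = downsample(original, 8)    # 256 x 192, i.e. 64x fewer pixels
```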
The advantages of downsampling far outweigh its minor drawbacks; while the smaller, downsampled images display less detail than their full-sized counterparts, they require far less time to process, making them crucial to an augmented-reality application in which an expedient description should quickly follow a photographed scene. The number of pixels that an image-processing algorithm must handle decreases as the square of the downsampling factor: when we sample our original 2048 × 1536 image down to its 256 × 192 rendition, the image has 64 times fewer pixels, so any global or element-wise matrix operation on the image performs 64 times fewer computations, cutting processing time by a factor of 64 for our eightfold-downsampled image, and by a factor of 256 for sixteenfold downsampling. However, downsampling by even greater factors reaps little additional benefit in processing time:

Fig. 4. Approximately inverse relationship between the speed of pre-processing and the factor by which we downsample the original image.

In the digital domain, each image is a matrix. For the human eye, the loss of detail may arguably reduce a painting's singular qualities, but, for the digital eye used to distinguish one painting from a finite number of others (such as the thirty-two other paintings in the European art gallery), downsampling retains recognizability for the simple reason that hundreds of pixels still remain to separate one image matrix from the finite number of other possibilities. As a matter of fact, we will shortly see that the reduction in pixel redundancy also facilitates edge detection, as bridging gaps between frame lines requires less smoothing and smaller structuring elements than the full image would require.

B. Grayscaling

Continuing the quest for compact representation and hence minimal computation, we prune our image further by coalescing information from the three color channels into a single grayscale intensity.
While color remains a powerful visual discriminant between different paintings, its limited value in principal component analysis does not warrant the threefold increase in computation that retaining all three channels would require. Thus, in an effort to reduce the amount of unnecessary information in our algorithm's operand, we convert each input image from the Red-Green-Blue (RGB) color space to grayscale values between 0 and 255. Following the transformation used in rgb2gray, we merge the color channels in the linear combination

Y = 0.29894·R + 0.58704·G + 0.11402·B.

Because the coefficients add to unity, this weighted sum preserves the intensity scale while accentuating the
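The channel merge above can be sketched as a single weighted sum over the color axis. A minimal sketch, using the report's coefficients; the function name is our own, not part of the original algorithm.

```python
import numpy as np

# Weights from the report's rgb2gray-style combination; they sum
# exactly to 1, so the intensity scale of the input is preserved.
W_R, W_G, W_B = 0.29894, 0.58704, 0.11402

def to_grayscale(rgb):
    """Collapse an H x W x 3 RGB image into a single H x W intensity
    channel via the weighted sum Y = wR*R + wG*G + wB*B."""
    return rgb[..., 0] * W_R + rgb[..., 1] * W_G + rgb[..., 2] * W_B
```

After this step each painting is a single-channel matrix, one third the data of the RGB original.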

