Stanford EE 368 - Pictures at an Exhibition

Pictures at an Exhibition: EE368 Project
Jacob Mattingley
Stanford University
[email protected]

Abstract—This report presents an algorithm which matches photographs of paintings to a small database. The algorithm uses a modified SIFT approach to match keypoints between paintings. The algorithm is somewhat invariant to size and rotation, highly invariant to perspective change and noise, and can tolerate multiple images in the field of view. Significant optimization was performed, leading to a typical identification time of 900 ms on an Intel Core Duo 1.67 GHz computer.

I. INTRODUCTION

There are many applications that are starting to leverage the growing market penetration of cellphones with cameras. One possible system would use image processing to interpret a photograph taken by a cellphone camera, providing context-sensitive information to the user.

A specific example of this is the identification of paintings at an art gallery. After identifying a painting automatically, a cellphone could function as a virtual tour guide—for example, by providing a short audio guide to the painting.

This report details an algorithm which successfully identifies paintings based on photographs taken with a cellphone. The algorithm borrows elements from the Scale Invariant Feature Transform (SIFT) described in [1]–[3]. A number of simplifications and modifications made implementation easier and faster, and decreased algorithm run time.

II. ALGORITHM SELECTION

A. Problem analysis

The images in the training set have, in general, a number of promising qualities:
• High resolution
• Relatively low noise
• Consistent rotational alignment
• Consistent lighting
• All parts of the painting, including the frame, present

However, there are a number of challenges. 
The images in the set sometimes have:
• Significantly changed perspective
• Variation in painting size (up to a factor of two)
• Other paintings present, especially at the sides of the images
• Other objects present
• Unusually shaped frames

The first, naïve, approach that the author attempted was to try to leverage the presence of the frame to align the initial image. If good alignment and deskewing were possible, approaches such as correlation or eigenimages could potentially work well. Various strategies were trialled, incorporating elements of morphological image processing and edge detection, but it was rapidly decided that this scheme would be difficult to make robust. Even if the images could be successfully aligned, it would still be relatively difficult and computationally intensive to determine the best match.

A second approach that was briefly considered was matching global image characteristics. For example, histograms of color values could be compared. Brief experimentation made this approach unattractive, however: all spatial information is ignored, good masking of the region of interest is still required, and the comparison would be highly sensitive to even small changes in illumination.

B. The chosen approach

The final algorithm is based on identifying stable keypoints for a particular painting, extracting descriptive information around those points, and comparing them to a database of previously identified keypoints. This approach offers good robustness to changes in alignment, perspective and lighting. It also has the attractive feature of simple and automatic training. Example code, such as in [3], [4], was tested and gave spectacularly good results.

III. ALGORITHM DESCRIPTION

This section describes the complete algorithm used to identify paintings. 
The steps used to identify a painting are:
1) Load descriptor database.
2) Preprocess image (scaling, grayscale conversion, normalization).
3) Create Difference of Gaussian (DoG) stack.
4) Find keypoints.
5) Create descriptors by inspecting gradients around keypoints.
6) Compare descriptors to database.

Typical processing time is 900 ms. The approximate portion of time taken by each of the steps is represented by the length of the bars in Figure 1.

[Fig. 1. Major processing steps, with bars indicating processing time: load database (amortized), preprocess image, create DoG stack, find keypoints, create descriptors, search database.]

Each of the steps is now described in more detail.

A. Image preprocessing

The images in the training set are all large, 2048 × 1536 color images. This gives considerably more information than is required to identify the paintings. To reduce the information and speed up processing, the algorithm first subsamples input images to 400 × 400. (The consequent change in aspect ratio is arbitrary and harmless, as it is consistently applied to all images.) Images are also converted to grayscale for simplicity and speed, with pixel values normalized to the range [0, 256].

The test set shows relatively consistently illuminated paintings, and thus, although various equalization strategies were tested, they were found to be unnecessary.

B. Robust keypoints

The most important part of the algorithm is the identification of robust points of interest, or keypoints, that can be found in multiple views of the same object. The chosen strategy is to find scale-space extrema, in a scale space created by progressively Gaussian blurring the input image, as described in [2]. In particular, suppose the input image is I(x, y). Define a filter kernel

U(x, y, σ) = k exp(−(x² + y²) / (2σ²)),

where k ∈ R is a normalizing constant and σ is the degree of blurring. 
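As a concrete illustration of the kernel above, the following NumPy sketch builds a normalized one-dimensional Gaussian and applies it separably (the report does not specify an implementation language; the function names here are illustrative, not the author's). The normalizing constant k is realized by dividing out the tap sum, and a radius of three gives a seven-tap kernel:

```python
import numpy as np
from scipy.ndimage import convolve1d

def gaussian_kernel_1d(sigma, radius=3):
    """Normalized 1-D Gaussian taps; radius=3 gives a seven-tap kernel."""
    x = np.arange(-radius, radius + 1, dtype=float)
    taps = np.exp(-x**2 / (2.0 * sigma**2))
    return taps / taps.sum()  # normalization plays the role of the constant k

def blur(image, sigma):
    """Separable Gaussian blur: two 1-D convolutions instead of one 2-D."""
    taps = gaussian_kernel_1d(sigma)
    tmp = convolve1d(image, taps, axis=0, mode="nearest")
    return convolve1d(tmp, taps, axis=1, mode="nearest")
```

Because the 2-D Gaussian is separable, blurring a W × H image costs O(WH · n) per axis for an n-tap kernel, rather than O(WH · n²) for a direct 2-D convolution.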
Then, Gaussian blurred images L(x, y, σ) are created by convolving the input image with the filter kernel, i.e.,

L(x, y, σ) = U(x, y, σ) ∗ I(x, y).

Next, a DoG stack {D(x, y, i)} is formed by finding the difference between successive blurred images, i.e.,

D(x, y, i) = L(x, y, σ_{i+1}) − L(x, y, σ_i).

This is efficient to compute, as the two-dimensional convolution can be decomposed into two one-dimensional convolutions, each with a short kernel of length seven. This scale-space construction has been found in the literature to give good performance [5].

Next, the algorithm finds local extrema in the DoG stack. A local maximum (minimum) in a layer of the DoG stack is a point that is greater than (less than) all adjacent points within the layer, and greater than (less than) all nine nearest points in the layer above, and nine nearest points in the layer below. This is the last time the DoG stack is used: all subsequent processing uses the blurred images {L(x, y, σ_i)}.

Once a local extremum is identified, a number of checks are made to see if it is a good choice for a keypoint. First, (discrete approximations to) the
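The DoG construction and the extremum test described above can be sketched as follows (a minimal illustration, not the author's code; SciPy's gaussian_filter stands in for the separable blur, and the 26-neighbour comparison uses a footprint that excludes the point itself so that the comparisons are strict):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

def dog_stack(image, sigmas):
    """Blur at each sigma, then difference successive levels:
    D(x, y, i) = L(x, y, sigma_{i+1}) - L(x, y, sigma_i)."""
    L = [gaussian_filter(image, s) for s in sigmas]
    D = [L[i + 1] - L[i] for i in range(len(L) - 1)]
    return D, L

def local_extrema(D, i):
    """Strict extrema of layer i: greater (less) than its 8 in-layer
    neighbours and the 9 nearest points in layers i-1 and i+1."""
    stack = np.stack([D[i - 1], D[i], D[i + 1]])
    fp = np.ones((3, 3, 3), dtype=bool)
    fp[1, 1, 1] = False  # exclude the candidate point itself
    neigh_max = maximum_filter(stack, footprint=fp)[1]
    neigh_min = minimum_filter(stack, footprint=fp)[1]
    return np.argwhere((D[i] > neigh_max) | (D[i] < neigh_min))
```

Each candidate returned by local_extrema would then be subjected to the stability checks the report goes on to describe before being accepted as a keypoint.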

