MIT 6 375 - Difference of Gaussian ScaleSpace Pyramids - D2398281

Home> Schools> Massachusetts Institute of Technology> Electrical Engineering and Computer Science (6) > 6 375> Difference of Gaussian ScaleSpace Pyramids

DOC PREVIEW

MIT 6 375 - Difference of Gaussian ScaleSpace Pyramids

School name Massachusetts Institute of Technology

Course 6 375- Complex Digital Systems

Pages 20

This preview shows page 1-2-19-20 out of 20 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 20 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 20 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 20 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 20 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 20 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Difference of Gaussian Scale-Space Pyramidsfor SIFT1 Feature DetectionBallard Blair and Chris MurphyIntroduction Algorithmic OverviewArchitecture OverviewMemory Management and Addressing Convolution Description and Details Image Down-sampling Implementation Notes Testing Procedure and Results Synthesis Results Place and Route Results Architectural Exploration Future ImprovementsConclusionsThanksReferencesAppendix A. Bluespec File DescriptionsAppendix B: SiftTypes.bsvDifference of Gaussian Scale-Space Pyramidsfor SIFT1 Feature DetectionBallard Blair and Chris MurphyGroup 5 6.375: Complex Digital Systems DesignFinal Project Report, Spring 2007» «Introduction The Scale-Invariant Feature Transform (SIFT1) algorithm has rapidly been adopted in the machine vision community as the "best-in-class" standard for feature detection and matching. Image Feature detection has a variety of applications across many domains, from object recognition and tracking in robotics to creating photo mosaics in consumer photography applications.Developed by David Lowe in 2003 at the University of British Colombia, the SIFT algorithm takes a single image as input and returns a set of image features. Each point in the set has a location, scale, orientation, and description vector. Compared to other feature detecting algorithms, the Sift algorithm's description vectors and identified points are relatively invariant to illumination, scale, and viewpoint changes. This makes the algorithm extremely useful for tasks like object recognition where the object or camera may be moving, or image registration, the process of determining a shared coordinate frame between images. A hardware implementing of SIFT would provide the timing and power requirements necessary for a real-time, mobile system, something current software implementations don't provide. Current SIFT software implementations typically involve the use of a high-power, general-purpose processor to achieve less-than-real-time performance (one the order of seconds per image). For many embedded applications, something that achieves sub-second performance without high power requirements is essential. With a hardware implementation, the full potential of the SIFT algorithm could be available to mobile sensing platforms without the overhead of a dedicated high-power processor.We have implemented a subset of the SIFT algorithm, a Difference of Gaussian Scale-Space image pyramid generator. This operation forms the computational "core" of SIFT. » 1 «Our implementation takes an input image, repeatedly blurs it a fixed number of times, and computes the difference between successive blurring iterations. One of the resulting images is then down-sampled and the blurring/difference operation repeated. The result is a collection of "Difference of Gaussian" images, several for each scale. The power of the "Difference of Gaussian" operation provides an efficient discrete approximation to the Laplacian. The Laplacian is formally defined as the divergence of the gradient, which in image processing translates to areas of high variation, characteristic of interesting features. These Laplacian or Difference of Gaussian pyramids were first identified in 19842, and since then have been adopted for a variety of applications in image processing well beyond their role in the SIFT algorithm.Algorithmic OverviewThe Difference of Gaussian or Laplacian pyramid is generated from a single input image. The output is a 'pyramid' of several images, each being a unique difference of Gaussian. To generate the pyramid, the input image is repeatedly blurred; the difference » 2 «Figure 1: Architectural Overviewbetween consecutive blur amounts is then output as one “Octave” of the pyramid. One of the blurred images is down-sampled by a factor of two in each direction, and the process occurs again with output in a different size. Figure 1 illustrates this process in greater detail.Architecture OverviewOur goal in implementing the difference of Gaussian module was to create an architecture that was flexible enough to allow for some architectural exploration but rigid enough to be implementable in hardware. The architecture we decided upon was five parallel convolution units, identical except for the coefficients of the Gaussian coefficients. Rather than attempting to do a two-dimensional block convolution, which would require a large number of multiply units, we exploit the fact that a two-dimensional Gaussian convolution has the property of separability; we can perform first a vertical convolution followed by a horizontal convolution, and the result is identical to performing a full 2D convolution. In the current implementation, each convolution unit requires a frame buffer large enough to hold the entire image, two address generators, one for read and one for write, a one dimensional convolution unit hard wired with coefficients, and some FIFOs and control logic to tie everything together. In addition to the convolution units we needed additional control units to handle image reading, down-sampling, and feedback, an additional memory and address generation units to store the original image for processing, and a difference unit to calculate the difference between neighboring images. Figure 2 shows a top level block description of how our blocks fit together. Later sections describe the functionality of the memory and some of the more important rules and functionality. For a detailed description of each file, see Appendix A.» 3 «» 4 «Figure 2: Difference of Gaussian image pyramid schematic and file hierarchyMemory Management and Addressing Our target design included an image size of 480 pixels by 360 pixels (pixel = 8 bits), separable convolution, and enough memory in each blur unit to store an entire image. Therefore, we needed to implement a memory that was large enough to hold 172,800 bytes. The Synopsis tool chain includes a memory generation tool, but this tool is limited to blocks of memories of 4096 words or less. In order to meet our target memory size while still leaving us the flexibility for architectural exploration through pixel precision changes we used 43 blocks of 4096 8-bit words. The Bluespec memory block instantiates 43 copies of the RAM modules, which the memory generate creates in Verilog. Wrapping each of these in a Bluespec interface, we can simply pass the bottom 12 address bits

View Full Document